Manual
No part of this document may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any
language or computer language, in any form or by any means, electronic, mechanical, magnetic, optical, chemical, manual
or otherwise without the prior written consent of Maipu Communication Technology Co., Ltd.
Maipu makes no representations or warranties with respect to the contents of this document and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. Further, Maipu reserves the right to revise this document and to make changes to its content from time to time without obligation to notify any person of such revisions or changes.
Maipu values and appreciates comments you may have concerning our products or this document. Please address comments
to:
All other products or services mentioned herein may be registered trademarks, trademarks, or service marks of their
respective manufacturers, companies, or organizations.
Accessibility (contents, index, headings, numbering):
Good    Fair    Average    Poor
Editorial (language, vocabulary, readability, clarity, technical accuracy, content):
Good    Fair    Average    Poor
Please check your suggestions to improve this document:
Improve introduction
Improve contents
Improve arrangement
Include images
Add more detail
Make more concise
Add more step-by-step procedures/tutorials
Add more technical information
Make it less technical
Improve index
Contents
Overview
  OSI Model
    Physical Layer
    Data Link Layer
    Network Layer
    Transmission Layer
    Session Layer
    Representation Layer
    Application Layer
VLAN Technology
  Overview and Principle
    Overview
    VLAN Principle
Overview
Main contents:
OSI model
OSI Model
The OSI model is composed of seven layers: the physical layer, data link layer, network layer, transmission layer, session layer, representation layer, and application layer (see Figure 1-1). Each layer handles specific communication tasks and exchanges data with the adjacent layers of the protocol stack through protocol-based communication. Communication between two network devices is implemented by passing data through the protocol stacks of the devices. For example, when a workstation communicates with a server, the task starts at the application layer of the workstation; each lower layer formats the information in turn until the data reaches the physical layer, where it is transmitted to the server over the network. The server receives the information at the physical layer of its protocol stack and passes it upward, with each layer interpreting the information, until it reaches the application layer. Each layer can be referred to by its name or identified by its position in the protocol stack; for example, the bottom layer can be called either the physical layer or the first layer.
The functions implemented at the bottom layers are related to physical communication, such as frame creation and transmission of the signals that carry packets. The middle layers coordinate network communication between nodes, ensuring uninterrupted sessions and error-free communication. The work of the highest layers affects the applications and the data representation of the software, including data formatting, encryption, and the management of data and file transmission. Collectively, these layers are called the protocol stack.
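The layered flow described above can be sketched in a few lines of Python. This is an illustrative model only (not Maipu code): each layer wraps the data from the layer above on the way down, and strips its own wrapper on the way up.

```python
# Minimal sketch of OSI encapsulation/de-encapsulation.
# Layer names follow this manual's terminology.

LAYERS = ["application", "representation", "session",
          "transmission", "network", "data link", "physical"]

def send(payload: str) -> str:
    """Each layer wraps the data from the layer above (encapsulation)."""
    for layer in LAYERS:
        payload = f"{layer}[{payload}]"
    return payload  # what travels over the wire

def receive(wire_data: str) -> str:
    """Each layer strips its own wrapper and hands the rest upward."""
    for layer in reversed(LAYERS):
        prefix, suffix = f"{layer}[", "]"
        assert wire_data.startswith(prefix) and wire_data.endswith(suffix)
        wire_data = wire_data[len(prefix):-len(suffix)]
    return wire_data

frame = send("GET /index.html")
print(frame)            # physical[data link[network[...]]] ...
print(receive(frame))   # the original application data
```

The receiver unwraps in the opposite order from the sender, which mirrors the description above: data travels down one stack, across the wire, and up the other stack.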
Physical Layer
The bottom layer of the OSI model is the physical layer. It covers items such as:
Network plugs
Network topology
Physical layer devices transmit and receive the signals that contain data, so they must generate, carry, and check voltages. During signal transmission, the physical layer handles the data transmission rate, monitors the data error frequency, and manages voltages and electrical levels.
Data Link Layer
Network Layer
In the protocol stack, the third layer from the bottom is the network layer. All networks are composed of physical routes (cable paths) and logical routes (software paths). The network layer reads the protocol address information in each packet and forwards the packet along the best path, physical or logical, so that data is transmitted efficiently. At this layer, packets can be sent from one network to another through routers. The network layer controls packet paths much like a traffic controller, routing packets over the most effective path. To determine the best path, the network layer collects information about network and node addresses; this process is called discovery.
The network layer can route data over different paths by creating virtual (logical) circuits. A virtual circuit is a logical communication path for sending and receiving data, and it exists at the network layer only. Because the network layer manages data along multiple virtual circuits, data may arrive out of sequence. The network layer checks the data sequence before the data is passed to the next layer and corrects it if necessary. The network layer also adjusts the frame size to meet the requirements of the receiving network.
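The manual does not specify which algorithm the network layer uses to pick the best path. As one common illustration, a lowest-cost path can be computed with Dijkstra's algorithm; the topology and link costs below are hypothetical.

```python
import heapq

def best_path(graph, src, dst):
    """Dijkstra's algorithm: return (cost, path) of the lowest-cost route."""
    pq = [(0, src, [src])]      # (cost so far, node, path taken)
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in graph.get(node, {}).items():
            if nbr not in seen:
                heapq.heappush(pq, (cost + w, nbr, path + [nbr]))
    return None

# Hypothetical topology: link costs between routers A..D
net = {"A": {"B": 1, "C": 5}, "B": {"C": 1, "D": 4}, "C": {"D": 1}}
print(best_path(net, "A", "D"))  # (3, ['A', 'B', 'C', 'D'])
```

The direct A-C-D route costs 6 and A-B-D costs 5, so the algorithm prefers A-B-C-D at cost 3, which is the "most effective path" in the sense used above.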
Transmission Layer
Like the data link layer and the network layer, the transmission layer ensures the reliable transmission of data from the sending node to the destination node. For example, the transmission layer ensures that data is received in the same sequence in which it was sent, and the receiving node returns a response after the transmission. When virtual links are used in the network, the transmission layer is also responsible for tracking the ID assigned to each circuit. This ID, called a port, connection ID, or socket, is assigned by the session layer. The transmission layer also determines the level of packet error detection; the highest level ensures that packets are transmitted from one node to another without error within a tolerable time.
Session Layer
The session layer is responsible for establishing and maintaining the communication link between two nodes. It also determines the correct sequence for communication between nodes; for example, it can determine which node transmits first. The session layer can also determine the length of a transmission and how to recover from transmission errors. If the connection is interrupted at a lower layer, the session layer tries to re-establish the communication.
Representation Layer
This layer handles data formatting. Different software applications use different data formatting schemes, so data formatting is necessary. To some degree, the representation layer is similar to a syntax checker: it ensures that numbers and text are sent in a format that the receiving node can recognize. For example, data sent from an IBM mainframe may use the EBCDIC character format; for a workstation running Windows 95 or Windows 98 to read the information, the data must be expressed in the ASCII character format.
Application Layer
The application layer is the highest layer of the OSI model. It controls access to application programs and network services. The network services include file transmission, file management, remote access to files and printers, email, and terminal emulation. Programmers use this layer to connect workstations to network services, for example, to link an application to email or to provide database access over the network.
The OSI model also applies to communication between network hardware and software. To meet the standard, network hardware and software must implement the layers of the OSI model. The following table lists how network hardware and software map to specific OSI model layers.
Table 1-1 Network hardware and software related to OSI model layers
Physical layer: cable circuits, cable sockets, multiplex adapters, senders, receivers, transceivers, passive hubs, passive cable connectors, repeaters, and gateways.
Simple Ping
The simple ping command can be used in both the common user mode and the privileged user mode of the Maipu switch. The method is as follows:
Switch>ping 131.199.130.3
The returned response characters are as follows:
!    successful response
.    timeout
U    destination unreachable
&    TTL timeout
The command summarizes the results of sending 5 packets, including the proportion of successful responses. If the ping succeeds, the network is working normally at the network layer and the two hosts can communicate at the network layer.
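The response characters and the success proportion above can be interpreted with a small sketch (illustrative only; the character meanings follow the list above):

```python
# Interpret the response characters printed by the ping command.
MEANING = {"!": "success", ".": "timeout",
           "U": "destination unreachable", "&": "TTL timeout"}

def summarize(responses: str):
    """Summarize a run of ping response characters, e.g. '!!.!!'."""
    sent = len(responses)
    ok = responses.count("!")
    return {"sent": sent, "success": ok,
            "success_rate": f"{100 * ok // sent}%"}

print(summarize("!!.!!"))  # 4 of 5 packets answered -> 80% success
```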
Expanded Ping
Sometimes the simple ping command cannot provide the expected tests for certain faults. In this case, the privileged mode of the Maipu switch provides the expanded ping command. Expanded ping is interactive: it prompts for the packet quantity, size, timeout value, and data format. The usage method is as follows:
Switch# ping <CR>
You are then prompted to set the parameters. You can also read the help file of the command.
show process
This command displays the major tasks and their running status.
switch#show process
Displayed Content
NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
tExcTask 2a2aa8 2ffe458 0 PEND 2b8b38 2ffe368 3d0001 0
tLogTask 2ad798 2ffbad0 0 PEND 2b8b38 2ffb9f0 0 0
tExcTrace 103050 2fe98b8 10 PEND 2bf428 2fe9450 0 0
tSysWdog 2fc2e8 2ff7178 15 DELAY 2cc8e8 2ff70f8 0 3
tShell1 1291f0 13280c0 20 PEND 2bf428 1327840 c0002 0
tSysLog 43ebdc 16173e8 40 PEND 2bf428 1617318 3d0001 0
tFwdTask 356a18 235fd78 45 PEND 2bf428 235fcd8 0 0
tMonDscc 3e9fac 1e638f0 45 DELAY 2cc8e8 1e63848 0 66
tNetTask 356984 23626a0 50 PEND 2bf428 23625e8 0 0
tSysTimer 122f88 235d410 50 PEND 2bf428 235d378 0 0
tActive 2fb32c 16087c8 55 DELAY 2cc8e8 1608738 0 8
show cpu
Display the CPU usage of each task.
switch#spy cpu
switch#show cpu
Displayed Content
INTERRUPT 0% ( 0) 0% ( 0)
IDLE 99% ( 447) 100% ( 13)
TOTAL 99% ( 450) 100% ( 13)
0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
1% 1% 1% 1% 1% 1% 1% 1% 1% 1%
1% 1% 1% 1% 1% 1% 1% 1% 1% 1%
1% 2% 1% 1% 1% 1% 1% 2% - -
- - - - - - - - - -
- - - - - - - - - -
- - - - - - - - - -
1% - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
Note
CPU utilization for five seconds: the CPU usage in the recent 5 seconds
CPU utilization for one minute: the CPU usage in the recent 1 minute
CPU utilization for five minutes: the CPU usage in the recent 5 minutes
CPU utilization per second in the past 60 seconds: the CPU usage per second in the recent 60 seconds
CPU utilization per minute in the past 60 minutes: the CPU usage per minute in the recent 60 minutes
CPU utilization per quarter in the past 96 quarters: the CPU usage per quarter-hour in the recent 96 quarter-hours
-: the time is not up yet (no sample recorded)
show stack
Display the task stacks in the system:
switch#show stack
Displayed Content
NAME ENTRY TID SIZE CUR HIGH MARGIN
------------ ------------ -------- ----- ----- ----- ------
tExcTask 0x00002a2aa8 2ffe458 7984 240 472 7512
tLogTask 0x00002ad798 2ffbad0 4984 224 376 4608
tExcTrace 0x0000103050 2fe98b8 7984 1128 1360 6624
tMonitor 0x0000102198 12f1438 2032 136 200 1832
tSysWdog 0x00002fc2e8 2ff7178 3984 128 360 3624
tShell1 0x00001291f0 13280c0 16376 2176 3552 12824
tSysLog 0x000043ebdc 16173e8 5112 208 1088 4024
tFwdTask 0x0000356a18 235fd78 9984 160 1384 8600
tMonDscc 0x00003e9fac 1e638f0 7984 168 1048 6936
tNetTask 0x0000356984 23626a0 9984 184 1064 8920
tSysTimer 0x0000122f88 235d410 10224 152 328 9896
tCheckCpu 0x00004f14dc 12f0008 8176 176 4544 3632
tActive 0x00002fb32c 16087c8 3992 144 424 3568
tSysTask 0x0000449a54 2f43d30 9984 176 240 9744
tTnd00 0x00004f774c 2feead8 10232 2544 3448 6784
tSh00 0x00004fab9c 12f8098 20472 2600 5864 14608
tTffsPTask 0x00005609b0 2ff7e88 2032 136 416 1616
tTelnetd 0x00004f83fc 16066f0 10224 256 976 9248
tSnmpd 0x00004cf0f8 1322c90 28664 2616 4800 23864
tSnmpTmr 0x00004cee20 1323ea8 4080 256 536 3544
tIdle 0x0000102304 12f0a20 2040 128 408 1632
INTERRUPT 5000 0 800 4200
show semaphore
Display the major semaphores used in the system and the status:
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
show memory
Display the memory usage in the system:
switch#show memory
Displayed Content
Memory management mechanism, types, and usage.
SUMMARY
-------
Type Used bytes Free bytes Total bytes Used percent
Note
The memory of all memory management mechanisms (such as MBUF, SLAB, and FPSS, if they exist), except the CODE segment, is part of the used memory of the HEAP.
STATISTICS
----------
Used bytes Free bytes Total bytes Used percent
Note
HEAP: heap memory, the most basic memory area in the system; the other re-allocation memory management mechanisms are carved out of this area.
CODE: code segment memory, used for saving the code segment.
SLAB: a memory re-allocation management mechanism.
show arp
Display the ARP cache of the system.
switch#show arp
Displayed Content
Protocol Address Age (min) Hardware Addr Type Interface
Internet 128.255.41.40 2 0022.153b.55e4 ARPA vlan1
Internet 128.255.41.47 - 0001.7a5c.004a ARPA vlan1
Internet 128.255.43.254 0 0001.7a58.19ba ARPA vlan1
show ip socket
Display the information about the sockets in the active status:
switch#show ip socket
Displayed Content
Active Internet connections (including servers)
PCB Proto Recv-Q Send-Q Local Address Foreign Address vrf (state)
-------- ----- ------ ------ ---------------------- ---------------------- ------- -------
Recv-Q: the quantity of data in the receiving cache of the socket.
Local Address: the local IP address and port number bound to the socket (0.0.0.0.23 indicates any local IP address with port number 23).
Foreign Address: the foreign IP address and port number corresponding to the socket.
vrf: VPN route forwarding instance.
state: the status of the socket (meaningful for TCP only).
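The dotted Local Address form above (IP followed by the port as a final dotted component) can be split as follows; this is an illustrative helper, not part of the switch software:

```python
def split_addr(addr: str):
    """Split the dotted form used by show ip socket into (ip, port).
    The last dotted component is the port; 0.0.0.0 means any local IP."""
    ip, _, port = addr.rpartition(".")
    return ip, int(port)

print(split_addr("0.0.0.0.23"))        # ('0.0.0.0', 23): telnet on any local IP
print(split_addr("128.255.41.47.161")) # ('128.255.41.47', 161)
```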
show pool
The current cache pool can be displayed with three commands.
Show pool information (show the actual information about the cache chains):
Switch# sh pool
Displayed Content
Driver pool
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage
-------------------------------------------------------------------------------
1884 11008 10496 3906
-------------------------------------------------------------------------------
Size: 21247488 bytes
Data pool
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage
-------------------------------------------------------------------------------
64 18000 17983 1611
128 36000 35943 175
256 3424 3422 40
512 2400 2394 20
1024 180 180 0
2048 300 300 0
-------------------------------------------------------------------------------
Size: 14442240 bytes
*** pool: the name of the cache pool. For example, the data pool is the cache pool used by the upper-layer protocols, and the driver pool is the cache pool used by the drivers.
All MBUF pool size: the size of the memory occupied by all cache pools.
Data pool
fact free number: the actual number of mblks of traversed mblk links
DRV8SA : 0
DRV8S : 0
DRV16A : 0
DRV4M336: 0
DRVEXTSCC: 0
TOTAL : 1024
number of mbufs: 1024
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage
-------------------------------------------------------------------------------
1556 512 256 599
-------------------------------------------------------------------------------
Link pool
256 10 10 0
512 10 10 0
1024 10 10 0
2048 100 100 0
-------------------------------------------------------------------------------
Size: 461120 bytes
sys pool
SOCKET : 0
PCB : 0
RTABLE : 0
HTABLE : 0
ATABLE : 0
SONAME : 0
ZOMBIE : 1
SOOPTS : 0
FTABLE : 0
RIGHTS : 0
IFADDR : 0
CONTROL : 0
OOBDATA : 0
IPMOPTS : 0
IPMADDR : 0
IFMADDR : 0
MRTABLE : 0
DRVSCC : 0
DRV8SA : 0
DRV8S : 0
DRV16A : 0
DRV4M336: 0
DRVEXTSCC: 0
TOTAL : 8000
number of mbufs: 8000
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage
-------------------------------------------------------------------------------
64 800 800 4
128 200 199 27520
256 200 200 0
512 100 100 0
1024 80 80 0
2048 50 50 0
-------------------------------------------------------------------------------
Size: 767000 bytes
Driver pool
Statistics for the network stack mbuf
type number
--------- ------
FREE : 1388
DATA : 112
HEADER : 0
SOCKET : 0
PCB : 0
RTABLE : 0
HTABLE : 0
ATABLE : 0
SONAME : 0
ZOMBIE : 0
SOOPTS : 0
FTABLE : 0
RIGHTS : 0
IFADDR : 0
CONTROL : 0
OOBDATA : 0
IPMOPTS : 0
IPMADDR : 0
IFMADDR : 0
MRTABLE : 0
DRVSCC : 56
DRV8SA : 0
DRV8S : 0
DRV16A : 0
DRV4M336: 4
DRVEXTSCC: 4
TOTAL : 6000
number of mbufs: 6000
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage
-------------------------------------------------------------------------------
1600 6000 5936 2446
-------------------------------------------------------------------------------
Size: 10080000 bytes
Note
number of times failed to find space: the number of failed attempts to allocate an mbuf.
number of times waited for space: the number of times allocation had to wait for an mbuf.
CLUSTER POOL TABLE: the statistics of the cluster pools of the current mbuf pool.
size / clusters / free / usage: cluster size, total clusters, free clusters, and usage count.
netstat -m
Display the statistics of the system data pool:
switch#netstat -m
Displayed Content
Statistics for the network stack mbuf
type number
FREE : 7999
DATA : 0
HEADER : 0
SOCKET : 0
PCB : 0
RTABLE : 0
HTABLE : 0
ATABLE : 0
SONAME : 0
ZOMBIE : 1
SOOPTS : 0
FTABLE : 0
RIGHTS : 0
IFADDR : 0
CONTROL : 0
OOBDATA : 0
IPMOPTS : 0
IPMADDR : 0
IFMADDR : 0
MRTABLE : 0
DRVSCC : 0
DRV8SA : 0
DRV8S : 0
DRV16A : 0
DRV4M336: 0
DRVEXTSCC: 0
TOTAL : 8000
number of mbufs: 8000
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage
-------------------------------------------------------------------------------
64 800 800 9
128 200 199 20
256 200 200 0
512 100 100 0
1024 80 80 0
2048 50 50 0
-------------------------------------------------------------------------------
Note
The command displays the statistics of the system data pool. The display format and content are the same as the data pool section of the show pool detail command, which also displays the statistics of the system data pool.
show ip statistics
Display the statistics of the IP packets:
switch#show ip statistics
Displayed Content
Statistics for the IP protocol
total 1434
badsum 0
tooshort 0
toosmall 0
badhlen 0
badlen 0
infragments 0
fragdropped 0
fragtimeout 0
forward 0
cantforward 1403
redirectsent 0
unknownprotocol 0
toupper 31
nobuffers 0
reassembled 0
outfragments 0
noroute 0
rawsockout 0
badaddress 0
fastforwardtotal 0
fastforward 0
cannotfastforward 0
show ip icmpstate
Display the statistics of the ICMP packets:
switch#show ip icmpstate
Displayed Content
Statistics for ICMP protocol
6929 calls to icmp_error
0 error not generated because old message was icmp
Output histogram:
echo reply: 5
destination unreachable: 24
0 message with bad code fields
0 message < minimum length
0 bad checksum
0 message with bad length
Input histogram:
echo: 5
#10: 2
5 message responses generated
Switch Principles
This chapter describes the switching principles so that users can understand the later chapters.
Main contents:
In the late 1980s, the rapid increase in network traffic drove the development of the technology, and LAN performance improved steadily. The 10 Mbps rate was superseded by 100BASE-T and 100VG-AnyLAN. However, in the traditional media access method, CSMA/CD, many stations still share a common transmission medium.
The development of LAN switching technology goes back to the two-port bridge. The bridge is a store-and-forward device for connecting similar LANs. In the structure of an internetwork, the bridge is a point-to-point connection of the DCE class. In terms of protocol layers, the bridge stores and forwards data frames at the logical link layer, much as a repeater works at L1 and a router at L3. The two-port bridge and Ethernet developed at the same time.
Frame forwarding: forward the frames received from the input media to
the corresponding output media;
Frame Forwarding
The switch forwards frames according to the MAC address. When the
switch forwards frames, the following rules must be observed:
If the destination address and the source address of a frame are in the same network segment, the frame is discarded and no switching is performed.
When host D sends broadcast frames, the switch receives frames with the destination address ffff.ffff.ffff from port E3 and forwards them to ports E0, E1, E2, and E4.
When host D communicates with host E, the switch receives frames with the destination address 0260.8c01.5555 from port E3. It searches the address table and finds that 0260.8c01.5555 is not in the table, so it forwards the frames to ports E0, E1, E2, and E4.
When host D communicates with host F, the switch receives frames with the destination address 0260.8c01.6666 from port E3. It searches the address table and finds that 0260.8c01.6666 is at port E3; that is, the destination address and the source address are in the same network segment. Therefore, the switch does not forward the frame but drops it directly.
When host D communicates with host A, the switch receives frames with the destination address 0260.8c01.1111 from port E3. It searches the address table and finds that 0260.8c01.1111 is at port E0, so it forwards the frames to port E0 and host A receives them.
When a frame arrives on a given port, the switch draws a conclusion from the frame's source address field: the workstation with that address can be reached through that port. The switch can therefore update its MAC address forwarding database. To accommodate changes in the network topology, each entry in the database has an aging timer, which is started when the entry is added; the default value of the timer is 30 seconds, and an entry whose timer expires is removed. When a frame is received, the switch searches the database for an entry whose address field matches the frame's source address. If such an entry exists, its content is updated and its timer is reset. If no such entry exists, a new entry is added to the database: its address is the source MAC address of the received frame, its port number is the port on which the frame was received, and its timer is set to the initial value.
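The forwarding and learning/aging behavior described above can be sketched as a simplified model (not Maipu's implementation). The port names and MAC addresses follow the host D examples above, and the 30-second entry lifetime follows this manual:

```python
import time

class MacTable:
    """Sketch of switch MAC learning, aging, flooding, and filtering."""
    AGE = 30.0   # entry lifetime in seconds, per this manual

    def __init__(self):
        self.table = {}   # mac -> (port, time learned/refreshed)

    def learn(self, src_mac, in_port, now=None):
        # Learning: a frame from src_mac arrived on in_port; add or
        # refresh the entry and restart its aging timer.
        self.table[src_mac] = (in_port, now if now is not None else time.time())

    def lookup(self, dst_mac, now=None):
        now = now if now is not None else time.time()
        entry = self.table.get(dst_mac)
        if entry and now - entry[1] <= self.AGE:
            return entry[0]
        return None   # unknown or aged out

    def forward(self, dst_mac, in_port, all_ports, now=None):
        out = self.lookup(dst_mac, now)
        if out == in_port:
            return []                      # same segment: filter (drop)
        if out is None or dst_mac == "ffff.ffff.ffff":
            return [p for p in all_ports if p != in_port]   # flood
        return [out]                       # known unicast: forward

ports = ["E0", "E1", "E2", "E3", "E4"]
t = MacTable()
t.learn("0260.8c01.1111", "E0", now=0)            # host A learned on E0
print(t.forward("0260.8c01.1111", "E3", ports, now=1))  # ['E0']
print(t.forward("0260.8c01.5555", "E3", ports, now=1))  # unknown: flood
```

Filtering (destination on the same port the frame arrived on) returns an empty port list, and an entry older than AGE behaves as unknown, so traffic to it is flooded again until the address is relearned.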
In the switching mode, the switch must receive a certain amount of data and inspect it before making a forwarding decision. By increasing the length of the inspected data, L2 switching technology can be extended to L3, or even L4, switching technology.
The widely used multilayer switching technology combines L2, L3, and L4 switching to implement the "route once, switch many times" function.
A router is more powerful than a switch, but its forwarding rate is lower and its price is higher. The L3 switch is widely used because it has the wire-speed forwarding capability of a switch together with the strong control functions of a router.
VLAN Technology
Main contents:
Overview and principle
VLAN division
Typical application
Overview and Principle
Main contents:
Overview
VLAN principle
Overview
In Ethernet communication, when the number of hosts is large, network problems such as severe collisions, flooded broadcasts, and degraded performance may be encountered. VLAN technology emerged to solve these problems. Each VLAN is a broadcast domain: hosts within a VLAN can communicate with each other, but hosts in different VLANs cannot. As a result, broadcast packets are confined to a single VLAN.
VLAN Principle
To identify packets of different VLANs, a VLAN tag is added to the packets. The encapsulation format of VLAN packets complies with IEEE 802.1Q, as shown in the following figure.
DA: destination MAC address; SA: source MAC address; Type: protocol type of the packet. IEEE 802.1Q defines that a four-byte VLAN tag is encapsulated after the destination MAC address and the source MAC address to identify the VLAN. The VLAN tag contains four fields: Tag Protocol Identifier (TPID), priority, Canonical Format Indicator (CFI), and VLAN ID.
TPID: identifies the frame as VLAN-tagged; the length is 16 bits; the value is 0x8100.
Priority: indicates the 802.1p priority of the packet; the length is 3 bits.
CFI: indicates whether the MAC address is in canonical format; the length is 1 bit.
VLAN ID: identifies the VLAN of the packet; the length is 12 bits. The value range is 0-4095, where 0 and 4095 are reserved by the protocol, so the valid VLAN ID range is 1-4094.
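The four-byte tag layout above (16-bit TPID, then 3-bit priority, 1-bit CFI, and 12-bit VLAN ID packed into the 16-bit tag control field) can be built and parsed as follows; this is an illustrative sketch, not device code:

```python
import struct

TPID = 0x8100   # identifies a VLAN-tagged frame

def build_tag(vlan_id: int, priority: int = 0, cfi: int = 0) -> bytes:
    """Pack the 4-byte 802.1Q tag: TPID(16) + priority(3)/CFI(1)/VID(12)."""
    assert 1 <= vlan_id <= 4094, "0 and 4095 are reserved"
    tci = (priority << 13) | (cfi << 12) | vlan_id
    return struct.pack("!HH", TPID, tci)   # network byte order

def parse_tag(tag: bytes):
    tpid, tci = struct.unpack("!HH", tag)
    assert tpid == TPID, "not an 802.1Q tag"
    return {"priority": tci >> 13, "cfi": (tci >> 12) & 1,
            "vlan_id": tci & 0x0FFF}

tag = build_tag(vlan_id=100, priority=5)
print(tag.hex())        # 8100a064
print(parse_tag(tag))   # {'priority': 5, 'cfi': 0, 'vlan_id': 100}
```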
VLAN Division
VLAN can be divided into different types. The common types are as follows:
Port-based VLAN
MAC-based VLAN
IP subnet-based VLAN
Protocol-based VLAN
In the default configuration, the priority (from high to low) of the four VLAN types is: MAC-based VLAN, IP subnet-based VLAN, protocol-based VLAN, and port-based VLAN. On the same port, VLAN classification takes effect according to this priority, and only one classification takes effect.
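The priority order above can be sketched as a classification function for an untagged packet arriving on one port. The rule tables here are hypothetical examples; the first matching rule, in priority order, wins:

```python
# Hypothetical per-port VLAN classification rules.
mac_vlan    = {"0260.8c01.1111": 10}
subnet_vlan = {"192.168.1": 20}                 # keyed by /24 prefix for brevity
proto_vlan  = {("ethernet_ii", 0x0806): 30}     # e.g. ARP over Ethernet II
port_default_vlan = 1

def classify(src_mac, src_ip, encap, ethertype):
    """Try the four VLAN types in default priority order (high to low)."""
    if src_mac in mac_vlan:
        return mac_vlan[src_mac]                # 1. MAC-based VLAN
    prefix = src_ip.rsplit(".", 1)[0]
    if prefix in subnet_vlan:
        return subnet_vlan[prefix]              # 2. IP subnet-based VLAN
    if (encap, ethertype) in proto_vlan:
        return proto_vlan[(encap, ethertype)]   # 3. protocol-based VLAN
    return port_default_vlan                    # 4. port-based VLAN

print(classify("0260.8c01.1111", "192.168.1.5", "ethernet_ii", 0x0800))  # 10
print(classify("aaaa.bbbb.cccc", "192.168.1.5", "ethernet_ii", 0x0800))  # 20
print(classify("aaaa.bbbb.cccc", "10.0.0.1",    "ethernet_ii", 0x0800))  # 1
```

Note in the first call that the MAC rule wins even though the subnet rule also matches, which is what "only one VLAN classification takes effect" means.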
Port-Based VLAN
In a port-based VLAN, a port is added to the VLAN as a member, and the port can then forward the packets of that VLAN.
Port Types
The port modes can be classified into three types according to how packet tags are processed.
Access:
The port belongs to one VLAN; the default VLAN ID of the port is the same as its home VLAN ID; it connects to user devices. The default port type is Access.
Trunk:
The port allows multiple VLANs and can receive and send packets of multiple VLANs; only packets of the default VLAN are permitted without a tag; it is used for interconnecting network devices.
Hybrid:
The port can be added to multiple VLANs and can receive and send packets of multiple VLANs; packets of multiple VLANs are permitted without a tag; it is used for interconnecting both user devices and network devices.
The default VLAN of an Access port is its home VLAN and cannot be configured. A Trunk port or a Hybrid port can belong to multiple VLANs, and its default VLAN can be configured.
MAC-based VLAN
A MAC-based VLAN assigns a VLAN ID to packets according to the source MAC address of the received packets.
Untagged packets received on a port are processed as follows, depending on the configuration:
1. If the source MAC address matches the MAC address of a MAC-based VLAN, and the In port of the packet has been allocated to the VLAN with the corresponding VLAN ID, the packet is assigned the VLAN ID corresponding to that MAC VLAN.
2. If the packet matches no MAC address set for a MAC VLAN, the packet is assigned the default VLAN ID of the port.
IP subnet-based VLAN
An IP subnet-based VLAN assigns a VLAN ID to packets according to the source IP address of the received packets.
Untagged packets received on a port are processed as follows, depending on the configuration:
Protocol-based VLAN
A protocol-based VLAN assigns a VLAN ID to packets according to the encapsulation format and protocol type of the received packets.
The protocol VLAN defines protocol templates; a protocol template is composed of the frame encapsulation format and the protocol type. The same port can be configured with multiple protocol templates. When the protocol VLAN is enabled on a port and the port is configured with protocol templates, received untagged packets are processed as follows, depending on the configuration:
1. If the packet matches a protocol template, and the In port of the packet has been allocated to the VLAN with the corresponding VLAN ID, the packet is assigned the VLAN ID corresponding to the protocol template configured on the port.
Typical Application
In an enterprise, hosts in the same department can communicate with one another even when located in different places, while hosts in different departments cannot. The networking diagram is as follows:
(Figure: two departments, VLAN 10 and VLAN 20)
Link Aggregation
This chapter describes the link aggregation technology and its application.
Main contents:
Link aggregation
Typical application
Link Aggregation
This section describes the concept of the link aggregation.
Main contents:
LACP protocol
LACP Protocol
LACP, based on IEEE 802.3ad, is a protocol for implementing dynamic link aggregation. The LACP protocol communicates with the opposite end through Link Aggregation Control Protocol Data Units (LACPDUs).
After LACP is enabled on a port, the port advertises its system priority, system MAC address, port priority, port number, and operation key to the opposite end by sending LACPDUs. When the opposite end receives this information, it compares it with the information saved for its other ports to select the ports that can aggregate. In this way, the two parties can agree on ports joining or exiting a dynamic aggregation group.
Manual Aggregation
1. Overview
In a manual aggregation group, a port can be in the Selected or Unselected status. Only Selected ports can receive and send user service packets; Unselected ports cannot receive or send user service packets.
The system sets the port status (Selected or Unselected) according to the following principles:
If any port in the aggregation group is in the Up status, the Up port with the highest priority is selected to serve as the root port of the group.
The ports in the Up status with the same operation key as the root port become candidates for the Selected ports. The other ports are set to the Unselected status.
In a manual aggregation group, only the ports with the same configuration as the root port can become Selected ports. The configuration covers the rate, duplex mode, and up/down status. Users need to keep the basic configuration of each port the same through manual configuration.
In an LACP aggregation group, a port can be in the Selected or Unselected status.
Both the Selected ports and the Unselected ports in the Up status can receive and send LACP packets.
Only the Selected ports can receive and send user service packets; the Unselected ports cannot receive or send user service packets.
The system sets the port status (Selected or Unselected) according to the
following principles:
The local system and the opposite system negotiate. The status of the ports at both ends is determined by the end whose device ID has the higher priority. The negotiation procedure is as follows:
Compare the device IDs of the two ends (device ID = system priority + system MAC address): first compare the system priorities; if the system priorities are the same, compare the system MAC addresses. The end with the smaller device ID is prior (a lower system priority value and a smaller system MAC address mean a smaller device ID).
Compare the port IDs on the end with the prior device ID (port ID = port priority + port number): first compare the port priorities; if the priorities are the same, compare the port numbers. The port with the smaller port ID serves as the root port of the aggregation group (a lower port priority value and a smaller port number mean a smaller port ID).
A port becomes a candidate for the Selected ports when its operation key is consistent with that of the root port, it is in the Up status, and the configuration of its opposite port is the same as that of the opposite root port. Otherwise, the port is set to the Unselected status.
In an LACP aggregation group, only the ports with the same configuration as the root port can become Selected ports. The configuration covers the rate, duplex mode, and up/down status. Users need to keep the basic configuration of each port the same through manual configuration.
The following figure illustrates LACP aggregation. The priority of device S is higher than that of device T. The member ports of aggregation group 1 are A, B, C, D, E, and F. Port F is in the Down status. The rate of port E is 10M and the rate of the other ports is 100M. In this example, one aggregation group supports at most three ports.
LACP aggregation
1. Port A has the highest priority and is set to the Selected status first. Therefore, port A is the root port of aggregation group 1.
3. The link of port F is in the Down status, so its aggregation status is set to Unselected.
4. The rate of port E is different from that of root port A, so its aggregation status is set to Unselected.
5. The rate and duplex of port D are the same as those of root port A, but its port priority is lower than that of B and C; therefore, its aggregation status is set to Unselected.
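The selection rules above can be sketched in code. This is an illustrative model only; the tuple comparisons mirror the "smaller device ID / smaller port ID wins" rules, and the port values are taken from the figure's example, not from a real device.

```python
# Sketch of the LACP comparisons described above. Device ID = (system
# priority, system MAC); port ID = (port priority, port number). Smaller
# tuples win, matching the rules in the text.

def prior_device(dev_a, dev_b):
    """Return the device with the smaller device ID (priority first, then MAC)."""
    return min(dev_a, dev_b, key=lambda d: (d["sys_prio"], d["mac"]))

def select_ports(ports, max_selected):
    """Pick the root port by port ID, then Selected ports matching its rate/duplex."""
    up = [p for p in ports if p["up"]]
    root = min(up, key=lambda p: (p["prio"], p["num"]))
    candidates = [p for p in up
                  if (p["rate"], p["duplex"]) == (root["rate"], root["duplex"])]
    candidates.sort(key=lambda p: (p["prio"], p["num"]))
    return [p["name"] for p in candidates[:max_selected]]

ports = [
    {"name": "A", "prio": 1, "num": 1, "up": True,  "rate": 100, "duplex": "full"},
    {"name": "B", "prio": 2, "num": 2, "up": True,  "rate": 100, "duplex": "full"},
    {"name": "C", "prio": 2, "num": 3, "up": True,  "rate": 100, "duplex": "full"},
    {"name": "D", "prio": 3, "num": 4, "up": True,  "rate": 100, "duplex": "full"},
    {"name": "E", "prio": 2, "num": 5, "up": True,  "rate": 10,  "duplex": "full"},
    {"name": "F", "prio": 2, "num": 6, "up": False, "rate": 100, "duplex": "full"},
]
print(select_ports(ports, 3))   # ['A', 'B', 'C']
```

As in the figure, ports A, B, and C become Selected; D loses on priority, E on rate, and F is Down.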
Typical Application
Switch A:
Switch B:
MSTP
In an L2 switching network, a loop may cause packets to circulate and proliferate, generating a broadcast storm. As a result, all available bandwidth is occupied and the network becomes unavailable. The STP protocol was introduced to solve this problem. STP is an L2 management protocol; it selectively blocks redundant links to eliminate L2 loops in the network. At the same time, the protocol provides the link backup function.
This chapter describes the protocols of STP and focuses on the MSTP.
Main contents:
STP
RSTP
MSTP protocol
STP Overview
The basic idea of the STP protocol is very simple: loops do not occur in natural trees, so if a network grows like a tree, no loop will occur. The STP protocol defines the Root Bridge, Root Port, Designated Port, and Path Cost. The purpose is to construct a tree that prunes redundant loops, backs up links, and optimizes paths. The algorithm for constructing the tree is the Spanning Tree Algorithm.
STP exchanges BPDU information between bridges. First, the root bridge is selected. The selection is based on the bridge ID, which is composed of the bridge priority and the MAC address; the bridge with the smallest ID becomes the root bridge of the network. All of its ports connect to downstream bridges, so all of its port roles become designated ports. Then, each downstream bridge connected with the root bridge selects the most robust branch to serve as its path to the root bridge, and the role of the corresponding port becomes the root port. This operation is repeated out to the edge of the network. After the designated ports and the root ports are determined, a tree is generated. After 30 seconds (the default value), the designated ports and the root ports enter the forwarding status, and the other ports enter the blocking status. The STP BPDU is transmitted from the designated port of each bridge periodically to maintain the link status. If the network topology changes, the spanning tree is recalculated and the port status changes accordingly. This is the basic principle of the spanning tree.
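The root bridge election described above can be sketched as a simple comparison. The bridge names, priorities, and MAC addresses below are made-up example values, not part of any real topology.

```python
# Sketch of STP root bridge election: the bridge ID is (priority, MAC
# address), and the bridge with the numerically smallest ID wins.

def elect_root(bridges):
    """Return the name of the bridge with the smallest (priority, MAC) bridge ID."""
    return min(bridges, key=lambda b: (b["priority"], b["mac"]))["name"]

bridges = [
    {"name": "SW1", "priority": 32768, "mac": "00:11:22:33:44:55"},
    {"name": "SW2", "priority": 4096,  "mac": "00:11:22:33:44:66"},
    {"name": "SW3", "priority": 4096,  "mac": "00:11:22:33:44:11"},
]
print(elect_root(bridges))  # SW3: ties SW2 on priority, wins on smaller MAC
```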
RSTP Overview
To overcome the slow convergence of STP, the IEEE defined RSTP in IEEE 802.1w in 2001. The RSTP protocol improves the STP protocol in the following three aspects to speed up convergence (to within one second in the best case):
1. Set Alternate Port and Backup Port for the root port and the
designated port. When the root port fails, the alternate port becomes
the new root port and enters the forwarding status without any delay.
When the designated port fails, the backup port becomes the new
designated port and enters the forwarding status without any delay.
2. The designated port can enter the forwarding status quickly through handshaking with the downstream bridge on a point-to-point link. For a shared link connecting three or more bridges, the downstream bridge does not respond to the handshaking requests sent from the upstream designated port; it waits for double the Forward Delay time to enter the forwarding status.
3. The port connected with terminals but not connected with other
bridges is defined as the Edge Port. The edge port can enter the
forwarding status without any delay.
1. There is only one spanning tree in the entire switching network. When the network scale is large, the convergence time is long.
These defects cannot be removed within the single-spanning-tree framework. MSTP, which supports VLANs, was therefore introduced.
MSTP Protocol
Terms
Multiple Spanning Tree Regions
CST is the single spanning tree connecting all MST domains in the switching network. If each MST domain is regarded as a single device, CST is the spanning tree generated among these devices by the MSTP protocol.
1. The domain concept is used in the MSTP. One switching network can be divided into multiple domains. Multiple spanning trees are generated in each domain, and each spanning tree is independent. Between domains, the MSTP uses the CIST to ensure that no loop exists in the global topology.
2. The instance concept is used in the MSTP. Multiple VLANs are mapped to one instance to reduce communication overhead and resource usage. The calculation of each MSTP instance is independent (each instance corresponds to a spanning tree). Among these instances, the load of VLAN data can be shared.
3. MSTP can implement fast port status transition similar to the RSTP.
3. MSTP can implement the port status fast transfer similar to the RSTP.
The MSTP sets up a VLAN mapping table to associate VLANs with spanning trees. At the same time, it divides a switching network into multiple domains. Multiple spanning trees are generated in each domain, and each spanning tree is independent. The MSTP prunes a loop network into a loop-free tree network, avoiding the proliferation and indefinite circulation of packets in a loop network. At the same time, multiple redundant paths for data forwarding are provided, so that the load of VLAN data can be balanced in the process of data forwarding.
For example, in the following network there are four bridges A, B, C, and D, and VLANs 10, 20, 30, 40, 50, and 60. The four bridges run the MSTP protocol. Bridges B, C, and D are in the same MST domain; bridge A can be considered to be in an isolated domain. On bridges B, C, and D, map VLAN 10 and VLAN 20 to instance 1, map VLAN 30 and VLAN 40 to instance 2, and map VLAN 50 and VLAN 60 to instance 0.
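The VLAN-to-instance mapping on bridges B, C, and D can be sketched as a lookup table. The default-to-instance-0 behavior is a common MSTP convention assumed here for illustration.

```python
# Sketch of the MSTP VLAN-to-instance mapping described in the example.
# Each instance's spanning tree is calculated independently; VLANs not
# explicitly mapped are assumed to fall into instance 0 (the CIST).

vlan_to_instance = {10: 1, 20: 1, 30: 2, 40: 2, 50: 0, 60: 0}

def instance_for_vlan(vlan_id):
    """Return the MSTP instance whose spanning tree forwards this VLAN."""
    return vlan_to_instance.get(vlan_id, 0)

# Frames of VLANs mapped to the same instance follow the same spanning tree:
print(instance_for_vlan(20))   # 1
print(instance_for_vlan(99))   # 0 (default instance)
```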
The connection of the CIST is shown by the blue links in the following figure. Frames of VLANs 50 and 60 are forwarded along the active connection. Bridge A is the overall root of the entire CIST. Bridge B is the region root of the CIST, and port 1 of bridge B is the root port toward the CIST region root.
Normally, edge ports do not receive any BPDU packets. If an attacker forges BPDUs to attack devices, network oscillation may occur.
The MSTP provides the BPDU Guard function to prevent such attacks: after the BPDU Guard function is enabled, if a port whose AdminEdge is TRUE receives a BPDU packet, the port is shut down. At the same time, log information is generated to prompt the user. A disabled port can be restored only by the network administrator; it can also be restored automatically through the port management module.
Root Protection
The root bridge and the backup root bridge of the spanning tree should be in the same domain, especially the CIST root bridge and its backup. In the network design, the CIST root bridge and the backup root bridge are usually placed in a high-bandwidth core domain. However, owing to incorrect configuration or malicious attacks in the network, the legal root bridge may receive a BPDU with a higher priority. As a result, the legal root bridge loses its position as the root bridge and the network topology changes. Such illegal changes may divert traffic from high-speed links to low-speed links, congesting the network.
For a port enabled with the Root Guard function, the port role in all instances can only be the designated port. Once the port receives a BPDU with a higher instance priority, the port is blocked. If no configuration information with a higher priority is received any more, the port is restored to its original status.
Loop Protection
By receiving the BPDU packets sent from upstream devices, a device maintains the status of its root port and other blocked ports. Owing to link congestion or a unidirectional link fault, a port may fail to receive the BPDU packets sent from the upstream device, and the spanning tree information on the port then times out. In this case, the downstream device re-selects the port roles: the downstream port that cannot receive BPDU packets becomes the designated port, and the blocked ports transition to the forwarding status. A loop then occurs in the switching network.
The Loop Guard function suppresses the generation of such loops. For a port configured with Loop Guard, when the BPDU packets from the upstream device cannot be received and the spanning tree information times out, the port is set to the Blocking status in all instances when the port roles are recalculated, and the port does not participate in the spanning tree calculation. If the port receives BPDU packets again, it re-participates in the spanning tree calculation.
After the MSTP calculation, the forwarding paths of different VLANs are as shown in Figure 5-5. As a result, the load on each link is reduced. At the same time, each VLAN has a redundant backup link. When the working link fails, the redundant link takes effect immediately, which reduces the traffic loss caused by link failure.
QinQ Technology
Main contents:
Therefore, the QinQ technology came into being. QinQ expands the VLAN technology and increases the VLAN quantity to 4K × 4K via double layers of tags.
The QinQ technology is also called VLAN dot1q tunnel, 802.1Q tunnel, or VLAN Stacking technology. The standard comes from IEEE 802.1ad and is an expansion of the 802.1Q protocol. QinQ adds one more layer of 802.1Q tag (VLAN tag) to the original 802.1Q packet header. With the double layers of tags, the VLAN quantity is increased to 4K × 4K. QinQ encapsulates the private network VLAN tag of the user inside the public network VLAN tag so that the packet with double layers of VLAN tags can cross the backbone network (public network) of the operator. In the public network, the packet is forwarded according to the outer VLAN tag (that is, the public network VLAN tag), and the private network VLAN tag of the user is shielded.
The formats of the common 802.1Q packet with one layer of VLAN tag and the QinQ packet with two layers of VLAN tags are as follows:
QinQ features:
1. Shields the VLAN ID of the user, so as to save the public network VLAN ID resources of the service provider;
2. The user can plan the private network VLAN IDs freely, avoiding conflicts with the public network and other users' VLAN IDs;
QinQ diagram
The upstream packet of the CE1 switch carries one layer of VLAN tag. When the packet reaches the QinQ port of the PE1 switch, one outer VLAN tag is added to the packet according to the configuration of the QinQ port. The packet with two layers of VLAN tags is forwarded to PE2 via the public network. On the QinQ port of PE2, the outer VLAN tag is deleted, so the packet recovers its single layer of VLAN tag and is forwarded to CE2.
Basic QinQ: When receiving a packet, the QinQ port adds the VLAN tag of the default VLAN of the port to the packet, no matter whether the packet already has a VLAN tag. Before the packet is forwarded out from the QinQ port, the outer tag is deleted and the packet is then forwarded. The disadvantage of this method is that the encapsulated outer VLAN cannot be selected according to the VLAN tag of the packet.
Selective QinQ: Selective QinQ overcomes the disadvantage of basic QinQ. When receiving a packet, the QinQ port adds the specified outer VLAN tag to the packet according to the VLAN tag of the packet. If no outer VLAN tag is specified for the packet's VLAN tag, the VLAN tag of the default VLAN of the port is added to the packet.
TPID (Tag Protocol Identifier): a field in the VLAN tag, used to indicate the protocol type of the VLAN tag. The IEEE 802.1Q protocol defines the value of the field as 0x8100. The default value of the outer TPID of QinQ is 0x8100. The TPID of the outer VLAN tag in the QinQ packets of some manufacturers' devices is 0x9100 or 0x9200. The user can modify the TPID of the port on the public network side to achieve interoperability between devices of different manufacturers.
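The tag push/pop behavior of basic and selective QinQ described above can be sketched as follows. Frames are modeled as lists of (TPID, VLAN) tags; the `mapping` parameter and all VLAN values are illustrative assumptions, not switch configuration syntax.

```python
# Sketch of QinQ outer-tag handling on a provider edge port.

OUTER_TPID = 0x8100   # default outer TPID; some vendors use 0x9100 or 0x9200

def qinq_ingress(tags, default_vlan, mapping=None):
    """Push an outer tag. Selective QinQ picks the outer VLAN from the
    inner VLAN via `mapping`; basic QinQ (mapping=None) always uses the
    port default VLAN."""
    outer = default_vlan
    if mapping and tags:
        outer = mapping.get(tags[0][1], default_vlan)
    return [(OUTER_TPID, outer)] + tags

def qinq_egress(tags):
    """Pop the outer tag before forwarding toward the customer network."""
    return tags[1:]

frame = [(0x8100, 100)]                     # customer frame, inner VLAN 100
double = qinq_ingress(frame, default_vlan=1000,
                      mapping={100: 2000})  # selective: inner 100 -> outer 2000
print([v for _, v in double])               # [2000, 100]
print([v for _, v in qinq_egress(double)])  # [100]
```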
L2 Protocol Control
Technology
Main contents:
Typical application
L2 Protocol Tunnel
With the L2 protocol tunnel, the L2 protocol packets (such as BPDU and LACPDU) of the customer network can be transmitted transparently across the operator's network.
The upper part is the operator's network and the lower part is the user network, which includes user network A and user network B. Configure the L2 Protocol Tunnel function on the packet input and output devices at the two sides of the operator's network so that the BPDU and LACPDU packets of the user network can be transmitted transparently across the operator's network. In this way, the spanning tree calculation and link aggregation functions of the whole user network can be realized.
L2 Protocol Discard
With L2 protocol discard, the port directly discards the received BPDU and
LACPDU packets so that the packets do not take part in the corresponding
protocol processing.
L2 Protocol Peer
With L2 protocol peer, the port does not tunnel or discard the received BPDU and LACPDU packets, but directly forwards them to the upper protocol module for processing. This is the default behavior.
Currently, the bmga, dot1x, gmrp, gvrp, lacp, and stp (mstp) protocols support the L2 protocol tunnel function.
Typical Application
PE1 and PE2 are the devices of the operator network. Customer A and
Customer B are the devices of the user network.
Networking
The user enables the L2 tunnel function for STP protocol packets on the edge ports Port0/0/2 of PE1 and Port0/0/2 of PE2. The network between PE1 and PE2 can then pass the tunneled packets.
L2 Multicast
Main contents:
Terms
Introduction
Terms
1. L2 multicast comprehensive table: a table that integrates the L2 multicast information obtained through static configuration and dynamic learning. Each entry contains the VLAN, the multicast MAC address, and the output port list obtained through static configuration and dynamic learning.
Introduction
The public part of the L2 multicast is the middle layer connecting the bottom-layer chips and the L2 multicast applications. It integrates the L2 multicast applications (for example, entries configured by L2 static multicast and learned by the IGMP Snooping dynamic L2 multicast application) to form the L2 multicast forwarding table, and delivers the entries to the bottom-layer chips. Consequently, the hardware forwarding table is formed. Each forwarding entry is determined by the VLAN and the multicast MAC; the forwarding port list is the collection of ports to which the L2 multicast packets should be duplicated and forwarded.
Main contents:
Terms
Introduction
Typical Application
Terms
L2 static multicast table: a table maintained by L2 static multicast. Each table entry is the L2 static multicast information generated by static configuration. The information covers the VLAN, the multicast MAC, the member port list, and the forbidden port list.
Introduction
The L2 static multicast can generate L2 multicast information through static configuration. The VLAN, multicast MAC, member port list, and forbidden port list should be specified. The L2 static multicast table entry generates the related entries through the L2 multicast public part. At last, the entries are delivered to the hardware forwarding table.
Typical Application
As shown in the preceding figure, the video server is connected to the switch and sends multicast video programs. The receivers PC1, PC2, and PC3 are also connected to the switch. The ports connected with the video server and the receiver PCs belong to the same VLAN. Create an L2 static multicast table entry according to the VLAN and the multicast MAC. Then, set the port connected with PC1 as a member port and the port connected with PC2 as a forbidden port; do not configure the port connected with PC3. As a result, PC1 can receive the video programs, while PC2 and PC3 cannot.
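The forwarding decision in this example can be sketched as follows. The port names and the entry layout are illustrative assumptions; the rule shown is simply that forbidden ports never receive the group's traffic, member ports do, and unconfigured ports receive nothing for a statically known group.

```python
# Sketch of the static L2 multicast decision described above.

static_entry = {
    "vlan": 10,
    "mac": "01:00:5e:01:01:01",
    "members": {"port_pc1"},      # PC1's port: receives the stream
    "forbidden": {"port_pc2"},    # PC2's port: explicitly blocked
}

def receives_stream(port, entry):
    """Return True if the port is on the entry's forwarding port list."""
    if port in entry["forbidden"]:
        return False
    return port in entry["members"]

for port in ("port_pc1", "port_pc2", "port_pc3"):
    print(port, receives_stream(port, static_entry))   # True only for port_pc1
```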
Main contents:
Terms
Introduction
Terms
1. IGMP: Internet Group Management Protocol, a protocol used by hosts to advertise and maintain multicast membership toward routers or switches.
Introduction
The IGMP protocol creates and maintains multicast membership between hosts and routers. The IGMP protocol runs between a host and the multicast routers it connects to. On one side, the host notifies the multicast router through the IGMP protocol that it wants to join and receive the information of a specific multicast group (or specific multicast source); on the other side, the router queries through the IGMP protocol whether any members are in the active status in the local network segment, namely, it checks whether any multicast group member exists in the network segment and then collects the member information in the local network segment. The multicast router only cares whether any multicast group member exists in the local network segment; it does not care about the number of members in the network segment. If there is at least one group member, the router will forward the service data of the specified multicast group (or specified multicast source) to the network segment.
IGMP has three versions: IGMPv1, IGMPv2, and IGMPv3. The most common version is IGMPv2. IGMPv1 is defined in RFC 1112. It describes the process of general query and membership report. IGMPv2 is defined in RFC 2236. On the basis of IGMPv1, it adds the group member fast leave mechanism and the querier election function. IGMPv3 is defined in RFC 3376. On the basis of IGMPv2, the source filtering function is added: a host can specify that it receives the traffic of a specific multicast group only from certain multicast source hosts, or exclude specified sources.
As shown in the following figure, when IGMP Snooping is not running in the L2 device, the multicast data is flooded in the VLAN: the multicast traffic is forwarded to all ports in the VLAN. When IGMP Snooping is running in the device, the known multicast data is not flooded in the VLAN, but is forwarded only to the specified multicast member ports.
After an IGMP general query packet is received, the switch forwards the packet through all ports in the VLAN except the receiving port. The switch processes the receiving port as follows:
1. If the router port list already contains this dynamic router port, reset its aging timer.
2. If the router port list does not contain this dynamic router port, add the port to the router port list and start its aging timer.
After an IGMP membership report packet is received, the switch forwards the packet through all router ports in the VLAN. It parses from the packet the multicast group address that the host wants to join and processes the receiving port as follows:
In the preceding forwarding process, the multicast packets are not forwarded back to the port from which they were received.
Main contents:
Terms
Introduction
Terms
IGMP Proxy: The switch is logically divided into two parts. One part acts as an IGMP group member, responsible for sending IGMP membership reports to the router. The other part acts as a multicast router, which sends IGMP queries to the downstream ports and collects member information to form the member database. Different from IGMP Snooping, IGMP Proxy integrates the port member information to form its own IGMP membership report.
Introduction
The preceding figure shows the working principle of IGMP Proxy. The L2 switch running IGMP Proxy is logically divided into two parts: the IGMP group member and the multicast router. The multicast router part makes the switch appear to downstream hosts as a multicast router, sending IGMP query messages and collecting IGMP member information. It integrates the group member information to form the IGMP Proxy member database. The IGMP group member part reports the IGMP member information to the real multicast router according to the IGMP Proxy member database. Different from IGMP Snooping, the IGMP membership reports and leave messages of the downstream receiving hosts are terminated in the switch running IGMP Proxy, and the query messages sent by the multicast router are also terminated in the switch running IGMP Proxy. IGMP Proxy generates and sends IGMP queries, membership reports, and leave messages itself, whereas IGMP Snooping only forwards these messages.
Typical Application
Main contents:
Terms
Introduction
Terms
MVR: Multicast VLAN Registration.
Introduction
In the traditional multicast VOD mode, when users of different VLANs select programs, the multicast data is duplicated in each VLAN. This mode wastes a large amount of bandwidth and increases the load on layer 3 equipment. To solve the problem, you can configure the multicast VLAN function in the switch, that is, add the user interfaces belonging to different VLANs to the multicast VLAN and enable the IGMP Snooping function. Through VLAN conversion, the IGMP join and leave packets received by the multicast VLAN interfaces carry the tag of the multicast VLAN, and the forwarding table of the multicast VLAN is generated in the switch. As a result, the multicast data needs to be sent only once in the multicast VLAN, and users of different VLANs can all receive it. This mode of joining the user interfaces that should receive multicast data to the multicast VLAN as members is called Multicast VLAN Registration (MVR).
Typical Application
The MVR improves multicast applications. It can save bandwidth and reduce the burden on L3 devices. The MVR can be used in all multicast application environments. The following figure describes live web broadcasting.
Main contents:
Terms
Introduction
Terms
MVP: Multicast VLAN Plus.
Introduction
In the traditional multicast distribution mode, when the users belong to different VLANs, the upstream duplicates the multicast data for each VLAN. This occupies a large amount of bandwidth and adds an extra burden to the L3 device. To solve the problem, you can configure the MVP function in the switch: the home VLAN of the receiver joins the multicast VLAN as a sub-VLAN. As a result, receivers in both the main VLAN and the sub-VLANs of the multicast VLAN can receive the multicast data flow. Compared with the traditional multicast forwarding mode, the upstream only needs to send one copy of the data to the multicast VLAN; consequently, bandwidth is saved and the upstream pressure is relieved. Compared with MVR, it does not require that all receivers join the multicast VLAN: cross-VLAN multicast duplication can still be implemented, and users of different VLANs remain isolated, which ensures security.
Typical Application
The MVP can save bandwidth and reduce the burden on L3 devices. The following figure describes live web broadcasting:
Security Technology
Main contents:
802.1X technology
Port security
Port monitoring
Port isolation
Main contents:
Related terms
Introduction
Typical application
Related Terms
Supplicant system: the client, an entity at one end of the LAN link, which is authenticated by the device at the other end of the link. The client is usually a user terminal device. The user initiates the 802.1X authentication by running the client software.
PAE (Port Access Entity): the entity that executes the algorithms and protocol operations in 802.1X.
Introduction
802.1X Authentication System Structure
Length: the data length, that is, the length of the Packet Body. If it is 0, there is no data.
When the Type of the EAPOL message is EAP-Packet, the Packet Body is the EAP packet structure, as follows:
Code: the EAP packet type, including Request, Response, Success, and Failure. Success and Failure packets have no Data field, and their Length value is 4.
The Data field format of Request and Response packets is as follows: Type is the EAP authentication type, and the content of the Type data depends on the Type.
Data: the content of the EAP packet, depending on the Code type.
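The EAP packet layout described above (Code, Identifier, Length, then Type and Data for Request/Response) can be sketched as a small parser. This is an illustrative sketch of the standard EAP header per RFC 3748, not the switch's internal parser.

```python
import struct

# Sketch of parsing the EAP packet structure described above.
EAP_CODES = {1: "Request", 2: "Response", 3: "Success", 4: "Failure"}

def parse_eap(packet: bytes):
    """Parse Code (1 byte), Identifier (1 byte), Length (2 bytes, big-endian),
    and, for Request/Response only, the Type byte and Data."""
    code, identifier, length = struct.unpack("!BBH", packet[:4])
    info = {"code": EAP_CODES.get(code), "id": identifier, "length": length}
    if code in (1, 2):                  # only Request/Response carry Type+Data
        info["type"] = packet[4]
        info["data"] = packet[5:length]
    return info

# An EAP-Request/Identity packet (Type 1), as in step 2 of the flow below:
pkt = struct.pack("!BBHB", 1, 1, 5, 1)
print(parse_eap(pkt))
```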
To support the EAP authentication, RADIUS adds two attributes, that is,
EAP-Message and Message-Authenticator.
EAP-Message
As shown in Figure 9-5, the attribute is used to encapsulate the EAP packet. The type code is 79, and the String field is 253 bytes at most. If the length of the EAP packet is larger than 253 bytes, the packet can be fragmented and encapsulated in multiple EAP-Message attributes.
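The fragmentation rule above can be sketched as simple chunking. The tuple representation of a RADIUS attribute is an assumption for the sketch; the 253-byte String limit and the 2-byte attribute header are from the text and the RADIUS attribute format.

```python
# Sketch of splitting an oversized EAP packet across multiple EAP-Message
# (type 79) RADIUS attributes, each carrying at most 253 bytes of String.

MAX_STRING = 253

def to_eap_message_attrs(eap_packet: bytes):
    """Yield (type, length, value) attribute tuples; the RADIUS attribute
    length field counts the 2-byte header plus the value."""
    for i in range(0, len(eap_packet), MAX_STRING):
        chunk = eap_packet[i:i + MAX_STRING]
        yield (79, 2 + len(chunk), chunk)

attrs = list(to_eap_message_attrs(b"\x00" * 600))   # a 600-byte EAP packet
print([(t, ln, len(v)) for t, ln, v in attrs])      # [(79, 255, 253), (79, 255, 253), (79, 96, 94)]
```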
Message-Authenticator
As shown in Figure 9-6, the attribute is used to prevent the access request packets from being tampered with during EAP and CHAP authentication. A packet with the EAP-Message attribute must also contain the Message-Authenticator attribute; otherwise, the packet is regarded as invalid and discarded.
1. When the user needs to access the network, the user starts the 802.1X client program, inputs the applied-for and registered user name and password, and initiates a connection request (EAPOL-Start packet). Here, the client program sends the authentication request packet to the device side, starting an authentication process.
2. After receiving the authentication request data frame, the device side sends a request frame (EAP-Request/Identity packet) to ask the client program of the user to send the input user name.
3. The client program answers the request of the device side and sends the user name information to the device side via a data frame (EAP-Response/Identity packet). The device side encapsulates the data
8. The client can also send an EAPOL-Logoff packet to the device side to log out actively. The device side changes the port status from the authorized state to the unauthorized state and sends an EAP-Failure packet to the client.
Auto Vlan in the port-based access control mode is valid only on the ACCESS port. Auto Vlan in the MAC-based access control mode is valid only on the HYBRID port. In other access control modes, Auto Vlan is invalid.
Auto Vlan is also called Assigned Vlan. When an 802.1X user passes the authentication on the server, the server delivers the authorized VLAN information to the device side. If the delivered VLAN is illegal (the VLAN ID is wrong or the VLAN does not exist), the authentication fails. Otherwise, the authentication port is added to the delivered VLAN. After the user logs out, the port returns to the unauthorized state and is deleted from the Auto Vlan, and the default VLAN of the port returns to the previously configured VLAN.
The authorized Auto Vlan does not change or affect the port configuration, but its priority is higher than that of the VLAN configured by the user (the Config Vlan). That is to say, the VLAN that takes effect after the user passes the authentication is the authorized Auto Vlan, and the Config Vlan takes effect again after the user logs out.
Guest Vlan:
Guest Vlan in the port-based access control mode takes effect only on the ACCESS port. Guest Vlan in the MAC-based access control mode takes effect only on the HYBRID port. It does not take effect in other access control modes.
Users in the Guest Vlan can obtain the 802.1X client software, upgrade the client, or execute other application upgrade programs (such as anti-virus software and operating system patch programs).
After 802.1X is enabled and the Guest Vlan is configured, the port is added to the Guest Vlan in untagged mode. Users of the ports in the Guest Vlan can then initiate authentication. If the authentication fails, the port remains in the Guest Vlan; if the authentication succeeds, there are two cases:
1. If the authentication server delivers a VLAN, the port leaves the Guest Vlan and is added to the delivered VLAN. After the user logs out, the port returns to the Guest Vlan.
2. If the authentication server does not deliver a VLAN, the port leaves the Guest Vlan and is added to the Config Vlan. After the user logs out, the port returns to the Guest Vlan.
802.1 X Expansion
User-based authentication:
The standard 802.1X protocol is implemented on a per-port basis; that is, as long as one user on the port passes the authentication, the other users can use the network resources without authentication, but after that user logs out, the other users are also denied use of the network. Maipu switches support user-based authentication (based on the MAC address). When the port is configured for user-based authentication, each user on the port is authenticated separately, and only the users that pass the authentication can use the network resources. After one user logs out, only that user loses network access; the other authenticated users can still use the network.
The standard 802.1X protocol defines that the client and the server interact via EAP packets, with the device acting as an EAP relay: the device encapsulates the EAP data sent by the authentication server in EAPOL packets and forwards it to the client. This interaction mode is called EAP relay, and it requires that the authentication server support the EAP protocol; otherwise, the authentication server cannot interact with the client using EAP. Considering real deployments, a previously deployed authentication server may not support EAP, so the Maipu switch extends the protocol with an EAP termination mode. In this mode, the client's EAP data is not sent directly to the authentication server; instead, the device completes the EAP interaction with the client, extracts the user's authentication information from the EAP data, and sends it to the authentication server for authentication. In EAP termination mode, only MD5-based EAP authentication is supported.
When adopting the EAP termination mode, the service interaction flow is
as follows:
Figure 9-8 The service flow of the EAP termination mode of the 802.1X
authentication system
Comparing Figure 9-8 with Figure 9-7, we can see that when EAP termination mode is adopted, the EAP protocol packet is not sent to the authentication server but terminates at the device. The device extracts the necessary information from the EAP protocol packet and sends it to the authentication server for authentication.
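Since only MD5-based EAP is supported in termination mode, the response value the client computes is the CHAP-style digest defined for EAP-MD5 (RFC 3748 reuses RFC 1994): MD5 over the EAP identifier, the password, and the server challenge. A minimal sketch:

```python
import hashlib

def eap_md5_response(identifier: int, password: bytes, challenge: bytes) -> bytes:
    """EAP-MD5 response value: MD5(identifier | password | challenge),
    as in CHAP (RFC 1994), which EAP-MD5 reuses."""
    return hashlib.md5(bytes([identifier]) + password + challenge).digest()
```

In EAP termination mode the device verifies or relays this digest itself instead of forwarding the raw EAP exchange to the server.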
In the standard 802.1X function, the client and the authentication device exchange information via EAPOL (EAP over LAN) packets. In a real network, because of its complexity, the user to be authenticated and the authentication device may need to traverse intermediate switches. If the intermediate switches do not forward the EAPOL packets transparently, the authentication cannot be completed.
In a real network, besides many PC terminal users, there are network terminals (such as network printers) that do not carry, or cannot be installed with, an 802.1X client program. Authenticating this kind of user is therefore called non-client user authentication, that is, MAC address authentication. This authentication method does not require the user to install any client software: when the device detects the user's MAC address for the first time, it starts authentication for that user at once. The process does not require the user to enter a user name and password; after passing authentication, the user can access the network. This method is suitable for terminals that cannot run client software, and for PC users who do not want to install client software or enter a user name and password to authenticate.
When performing MAC address authentication, you can select the user name type used for the authentication. There are usually two modes:
MAC address user name: Use the user's MAC address as both the user name and the password for authentication.
Fixed user name: Regardless of the user's MAC address, all users authenticate with the local user name and password pre-configured on the device.
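The two user-name modes above can be sketched as follows (the helper and the MAC formatting are illustrative, not the device's actual behavior):

```python
def mac_auth_credentials(mac: str, mode: str, fixed_user=None, fixed_pass=None):
    """Build the username/password pair used for MAC address authentication.

    mode 'mac'   -- use the MAC address itself as both username and password
    mode 'fixed' -- use a pre-configured local username/password for all users
    """
    if mode == "mac":
        name = mac.replace(":", "").replace("-", "").lower()
        return name, name
    if mode == "fixed":
        return fixed_user, fixed_pass
    raise ValueError("unknown mode: " + mode)
```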
In an 802.1X authentication environment that uses a RADIUS server, you can configure the corresponding ACL name on the RADIUS server. When the user passes authentication, the server delivers the ACL name to the authentication device, which binds the user to the ACL so that the user's subsequent traffic is controlled by the ACL. The ACL must be pre-configured on the device; passing authentication only triggers a search-and-bind process, and if the search or binding fails, the user cannot come online.
Typical Application
802.1X Client Authentication
The Supplicant connects to the network via 802.1X authentication. The authentication server is a RADIUS server. Port 0/1, connected to the Supplicant, is in VLAN 1; the authentication server is in VLAN 2; the Update Server, used to download and upgrade the client software, is in VLAN 10; port 0/2 of the switch, connected to the Internet, is in VLAN 5.
Figure 9-9 Initial topology: Supplicant on port 0/1 (VLAN 1), Internet on port 0/2 (VLAN 5), authentication server on port 0/3 (VLAN 2), Update Server on port 0/4 (VLAN 10)
Port 0/1 is added to the Guest VLAN. Now the Supplicant and the Update Server are both in VLAN 10, so the Supplicant can access the Update Server and download the 802.1X client.
Figure 9-10 Port 0/1 in the Guest VLAN (VLAN 10): the Supplicant can reach the Update Server
When the user comes online after passing authentication, the authentication server delivers VLAN 5. Now the Supplicant and port 0/2 are both in VLAN 5, so the Supplicant can access the Internet.
Figure 9-11 After authentication: port 0/1 in the delivered VLAN 5, so the Supplicant can reach the Internet
Figure 9-12
Main contents:
Related terms
Introduction
Typical application
Related Terms
Trust Port: DHCP Snooping divides ports into trust ports and un-trust ports and applies restrictions to DHCP packets on un-trust ports, so as to enforce the security policy.
Option 82: Option 82 is a DHCP option used to record the location information of the DHCP client. The administrator can locate the DHCP client according to this option, so as to apply security controls.
Introduction
DHCP Snooping is a security feature of DHCP. It ensures that the client obtains its IP address from a legal server, preventing spoofing attacks. It also records the mapping between the IP address and the MAC address of each DHCP client for the administrator to view and for other security modules to use.
DHCP Snooping records the DHCP client's MAC address and the IP address it obtained by snooping the DHCP-REQUEST and DHCP-ACK packets received on the trust ports. The administrator can use the show dhcp-snooping command to view the IP addresses obtained by DHCP clients.
A trust port is a port directly or indirectly connected to a legal DHCP server. A trust port forwards received DHCP packets normally, ensuring that the DHCP client obtains a correct IP address.
An un-trust port is a port not connected to a legal DHCP server. If DHCP-ACK or DHCP-OFFER packets returned by a DHCP server are received on an un-trust port, they are discarded, preventing the DHCP client from obtaining a wrong IP address.
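The trust/un-trust forwarding rule above can be modeled as a small predicate (an illustrative sketch, not device code; message names as used in the text):

```python
# Replies that only a DHCP server sends, per the description above.
SERVER_MSGS = {"DHCP-OFFER", "DHCP-ACK"}

def dhcp_snooping_forward(msg_type: str, port_trusted: bool) -> bool:
    """Return True if the DHCP packet may be forwarded.
    Server replies arriving on an un-trust port are dropped, so clients
    cannot obtain an address from a rogue server."""
    if msg_type in SERVER_MSGS and not port_trusted:
        return False
    return True
```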
Support Option 82
Option 82 is a DHCP option used to record the location information of the DHCP client. The administrator can locate the DHCP client according to this option and apply security controls, such as restricting the number of IP addresses distributed to one port or VLAN.
Option 82 can contain at most 255 sub-options. The SM4100 series switch supports only two sub-options: sub-option 1 (Circuit ID) and sub-option 2 (Remote ID).
The SM4100 series switch supports two filling formats: the default format and the user-configured format.
The contents of the two sub options of the default format are as follows:
The contents of the two sub options of the user-configured format are as
follows:
1. After receiving a DHCP request packet, the device processes the packet according to whether it contains Option 82, the processing policy configured by the user, and the filling format, and then forwards the processed packet to the DHCP server.
The device counts the DHCP packets received on each port every second. If the number of packets received per second exceeds the set threshold, the excess packets are dropped directly. If the number of received DHCP packets exceeds the threshold for 20 consecutive seconds, the port is shut down; whether it recovers automatically depends on the port's configuration, and you can also recover it manually.
Typical Application
The typical application of the DHCP Flooding function in the network is as
shown in the following Switch A. The port connected to the client network
is set as the un-trust port and the port connected to the relay or server is
set as the trust port. This can ensure that the client can get the IP address
from the trust port (that is the legal server).
Main contents:
Related terms
Introduction
Typical application
Related Terms
IP Source Guard: Filter IP packets via IP or IP+MAC.
Introduction
With the IP Source Guard binding function, you can filter the packets forwarded by a port, preventing packets with invalid IP and MAC addresses from passing through and improving port security. After receiving a packet, the port searches the IP Source Guard binding entries and processes the packet according to the filter mode specified on the port.
When the filter mode of the port is IP: if the source IP address of the packet matches an IP address recorded in the binding entries, the port forwards the packet; otherwise, the packet is dropped.
When the filter mode of the port is IP+MAC: if the source MAC address and source IP address of the packet match the MAC address and IP address recorded in a binding entry, the port forwards the packet; otherwise, the packet is dropped.
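The two filter modes can be sketched as one check function (an illustrative model; the dict layout is an assumption, not the device's data structure):

```python
def ipsg_permit(packet, bindings, mode):
    """IP Source Guard check.

    packet   -- dict with source 'ip' and 'mac' addresses
    bindings -- list of dicts with bound 'ip' and 'mac'
    mode     -- 'ip' or 'ip+mac' filter mode of the port
    Returns True if the port forwards the packet, False if it drops it.
    """
    for b in bindings:
        if mode == "ip" and packet["ip"] == b["ip"]:
            return True
        if mode == "ip+mac" and packet["ip"] == b["ip"] and packet["mac"] == b["mac"]:
            return True
    return False
```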
The IP Source Guard binding entries have two sources: static binding entries configured manually through IP Source Guard, and the entries maintained by DHCP Snooping.
3. When IP Source Guard static entries are added, the hardware entries are updated automatically; when they are deleted, the hardware entries are deleted. If setting a hardware entry fails, the static table marks its Writed-Flag as not written.
5. The software table (IP Source Guard static entries and DHCP Snooping dynamic entries) and the hardware table are synchronized every minute. Because of the ACL resource limitation, not all software entries may fit into the hardware table, so available resources must be checked regularly. If resources become available, for example because some entries are deleted or the ACL resources are enlarged, the legal entries in the software table are written into the hardware table. The default ACL resource is two slices, that is, 256 entries; enabling one port occupies two, and the rest are used for filter entries.
Typical Application
Application in a non-DHCP Snooping environment
Main contents:
Related terms
Introduction
Typical application
Related Terms
Dynamic ARP Inspection: A security measure that discovers and prevents ARP spoofing attacks by checking the validity of ARP packets.
Introduction
The dynamic ARP inspection function can be used to discover and prevent ARP spoofing attacks.
It redirects all ARP packets (broadcast and unicast) on ports where ARP inspection is enabled to the CPU for checking, comparison, software forwarding, log recording and so on, so when there are many ARP packets, CPU resources are consumed. Therefore, it is not recommended to enable the function in the normal state. When an ARP spoofing attack is suspected in the network, you can enable the function to confirm and locate it.
The device does not check ARP packets from ports on which the dynamic ARP inspection function is disabled, but forwards them directly. Usually, a port with dynamic ARP inspection disabled is an upstream port of the device. The device checks ARP packets from ports on which dynamic ARP inspection is enabled against the DHCP Snooping table or the IP static binding table configured manually through IP Source Guard.
ARP Detection Policy
1. When the binding of the source IP address and source MAC address in the ARP packet matches a DHCP Snooping entry or a manually configured IP static binding entry, and the ingress port of the ARP packet and its VLAN are consistent with that entry, the ARP packet is valid and is forwarded.
2. When the binding of the source IP address and source MAC address in the ARP packet does not match the DHCP Snooping entries or the manually configured IP static binding entries, or the ingress port of the ARP packet and its VLAN are inconsistent with those entries, the ARP packet is invalid and is dropped. In addition, log information is printed.
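The validity check above can be sketched as a single match against the binding table (an illustrative model; the tuple layout is an assumption):

```python
def arp_packet_valid(sender_ip, sender_mac, port, vlan, bindings):
    """Dynamic ARP Inspection check: the sender IP/MAC pair, ingress port,
    and VLAN must all match one binding entry (from DHCP Snooping or a
    manually configured IP Source Guard static entry)."""
    return any(e == (sender_ip, sender_mac, port, vlan) for e in bindings)
```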
1. If the destination MAC address of the ARP packet is the local device's, the packet is delivered to the ARP protocol stack for processing and the local ARP cache is updated.
1. Receiving VLAN
2. Receiving port
4. The MAC address of the sender and the destination MAC address
The log information is not output in real time but periodically. The user can perform further processing according to the output log, such as locating the host that initiated the ARP attack.
Typical Application
Port Security
This section describes the basic theory of the port security and its
application.
Main contents:
Introduction
Typical application
Introduction
Port security is applied at the access layer. It can limit which hosts access the network through the device, permitting specified hosts to access the network while denying all others.
The port security function flexibly binds a user's MAC address, IP address, VLAN ID and port to prevent invalid users from connecting to the network, ensuring the security of network data and sufficient bandwidth for valid users.
The user can limit the hosts that can access the network via three kinds of rules: the MAC rule, the IP rule and the MAX rule. The MAC rule has three binding modes: MAC binding, MAC+IP binding, and MAC+VID binding. The IP rule can cover a single IP address or a range of IP addresses. The MAX rule limits the maximum number of MAC addresses the port can learn (in order); this maximum does not include the valid MAC addresses generated by the MAC and IP rules.
The MAC rule and IP rule can specify whether a packet matching the rule is permitted to communicate. With the MAC rule, you can flexibly bind a MAC address to a VLAN or to an IP address. Port security is realized in software, so the number of rules is not limited by hardware resources, which makes configuration more flexible.
The port security rules are triggered by the ARP packets of the terminal device. When the device receives an ARP packet, port security extracts the relevant information from it and matches it against the configured rules, in this order: first the MAC rule, then the IP rule, and finally the MAX rule. The L2 forwarding table of the port is then controlled according to the matching result, thereby controlling how the port forwards packets.
When port security regards a packet as illegal, it performs the corresponding processing. Currently there are three processing modes: protect, restrict, and shutdown. The protect mode drops the packet; the restrict mode drops the packet and sends a trap alarm (at most one alarm within two minutes while illegal packets are received); the shutdown mode performs the actions of the restrict mode and also shuts down the port.
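The three violation modes can be summarized as a small lookup (an illustrative model of the behavior described above, not device code):

```python
def violation_action(mode):
    """Actions taken on an illegal packet for each processing mode.
    shutdown is a superset of restrict, which is a superset of protect."""
    actions = {
        "protect":  ["drop"],
        "restrict": ["drop", "trap-alarm"],
        "shutdown": ["drop", "trap-alarm", "shutdown-port"],
    }
    return actions[mode]
```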
Typical Application
Refer to the related chapter of the configuration manual.
Port Monitoring
This section describes the basic theory of the port monitoring and its
application.
Main contents:
Introduction
Typical application
Introduction
The port monitoring function monitors the packets sent to the switch CPU and filters excess packets at the bottom layer, protecting the switch from attacks by large numbers of invalid packets.
Monitoring includes port monitoring and host monitoring. When the switch is attacked, the user first enables port monitoring; the monitoring program measures the packets sent to the CPU per port. From the statistics, the user identifies the attacked port, then enables host monitoring on that port and sets an upper threshold for packets sent to the CPU in a sampling period. Packets exceeding the threshold in a sampling period from the attacking host are filtered at the bottom layer: they do not reach the IP layer for routing and are not written into the hardware route table, saving CPU and hardware table resources. While packets from the attacking host are being filtered, the other hosts can still communicate normally. The monitoring program writes any host whose packets to the CPU exceed the upper threshold in a sampling period into a blacklist. In the next sampling period, only half of the upper threshold of packets from blacklisted hosts can reach the CPU; the rest are dropped. The port monitoring program measures and drops packets according to the packet classification.
The port monitoring program calculates the sampling result at the end of each sampling period and updates the blacklist.
5. other-packet: Packets other than the previous four kinds;
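The blacklist rate limiting described above can be sketched as follows (an illustrative model; the halving rule is taken directly from the text):

```python
def allowed_to_cpu(host_count, threshold, blacklisted):
    """Packets from one host allowed up to the CPU in a sampling period:
    a blacklisted host is limited to half the configured upper threshold;
    other hosts are limited to the full threshold."""
    limit = threshold // 2 if blacklisted else threshold
    return min(host_count, limit)
```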
Typical Application
Refer to the related chapters of the configuration manual.
Port Isolation
This section describes the basic theory of the port isolation and its
application.
Main contents:
Related terms
Introduction
Typical application
Related Terms
Port isolation: A port security function that prevents packet forwarding between one port and other specified ports of the switch.
Introduction
Port isolation is a port-based security feature. The user can specify the isolated ports of a given port as desired, achieving L2 and L3 data isolation between the port and its isolated ports. This improves network security and provides a flexible networking scheme.
By default, packets can be forwarded between any two ports in the same VLAN of the switch. To prevent specified ports in a VLAN from communicating, you can configure the isolated ports of a port in the specified port mode, so that the port configured with port isolation cannot communicate with its specified isolated ports.
Port isolation is independent of the VLAN the port belongs to. Currently, the switch supports configuring isolated ports in common port and aggregation port mode, and an isolated port can be a common port or an aggregation port. Port isolation only implements uni-directional packet dropping. Suppose ports B, C, and D are set as the isolated ports of port A: if the destination port of a packet entering from port A is B, C or D, the packet is dropped directly; however, if the destination port of a packet entering from B, C or D is port A, the packet is forwarded normally.
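The uni-directional drop rule can be expressed as a single set lookup (an illustrative model of the behavior just described):

```python
def isolated_drop(ingress, egress, isolation):
    """Port isolation is uni-directional: a packet is dropped only if its
    egress port is in the ingress port's configured isolated-port set."""
    return egress in isolation.get(ingress, set())
```

With the A/B/C/D example above, packets from A toward B, C or D are dropped, while packets from B, C or D toward A are forwarded.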
Typical Application
Illustration
PC1, PC2 and PC3 must not communicate with each other, but must communicate with the public network normally. Since ports in the same VLAN can normally communicate with each other, you can use the port isolation function to meet this requirement: isolate port 0/2 and port 0/3 on port 0/1; isolate port 0/1 and port 0/3 on port 0/2; isolate port 0/1 and port 0/2 on port 0/3. After the configuration, ports 0/1, 0/2 and 0/3 cannot communicate with each other, but can all communicate with port 0/27.
SPAN Technology
This chapter describes the port mirroring SPAN technology and application.
Main contents:
SPAN technology
Typical application
SPAN Technology
Switched Port Analyzer (SPAN) is used to monitor the data flow of switch ports. With SPAN, you can copy the frames on a monitored port (the source port) to another port on the switch (the destination port) connected to a network analysis device, in order to analyze the communication on the source port. The user analyzes the packets received on the destination port for network monitoring and troubleshooting. SPAN does not affect the normal packet switching of the switch; all frames entering and leaving the source port are copied to the destination port. However, if the destination port's traffic is excessive, for example when a 100Mbps destination port monitors a 1000Mbps port, frames may be dropped.
A SPAN session is the data flow between a group of monitored ports and one destination port. The data of multiple monitored ports can be mirrored to the destination port, and the mirrored data flow can be the input flow, the output flow, or both. You can configure SPAN on a port that is shut down; the SPAN session is then inactive, but as soon as the port is enabled, SPAN becomes active. Each line card supports four rx SPAN sessions and one tx SPAN session.
Local SPAN
Local SPAN performs port mirroring within one switch: all monitored ports and the destination port are on the same switch. Local SPAN mirrors the data of one or multiple monitored ports to the destination port.
Remote SPAN
RSPAN allows the monitored port and the destination port to be on different switches, realizing remote monitoring across the network. Each RSPAN session carries the monitored traffic on a specified RSPAN VLAN. RSPAN consists of the RSPAN Source Session, the RSPAN VLAN, and the RSPAN Destination Session; the RSPAN Source Session and RSPAN Destination Session are configured on different switches. When configuring the RSPAN Source Session, you specify one or multiple monitored ports and one RSPAN VLAN, and the monitored data is sent to the RSPAN VLAN. On another switch you configure the RSPAN Destination Session, specifying the destination port and the RSPAN VLAN; the RSPAN Destination Session sends the RSPAN VLAN data to the destination port.
The switches realizing the remote port mirroring function are divided into three kinds:
Traffic Types
The data of the monitored (source) port is captured for network analysis. The monitored data flow can be input, output or bi-directional, and can belong to different VLANs.
Destination port
The destination port can only be a single physical port or aggregation group, and one destination port can be used in only one SPAN session.
The destination port does not take part in STP calculation. Local SPAN mirrors the BPDUs of the monitored traffic, so any BPDU seen on the destination port comes from the source port;
The destination port must not enable the LACP or 802.1X function, to prevent the mirrored data from being affected;
The RSPAN destination port can only be a common port, not an aggregation port;
The destination port can serve as a common forwarding port, but to prevent the monitored data from being mixed with other traffic, it is recommended to remove the destination port from all VLANs.
RSPAN VLAN
The RSPAN VLAN should be a private, otherwise idle VLAN dedicated to RSPAN; its VLAN number can be 2-4094. You can select any idle VLAN during configuration, but you must ensure that all other devices on the path to the analysis device are configured with that VLAN and that the corresponding ports are added to it.
Except for the ports used to carry the RSPAN traffic, do not add any port to the RSPAN VLAN;
Limitations
1. SPAN and flow mirroring use the same chip resource. When port mirroring is enabled, avoid enabling flow mirroring; otherwise, hardware resources may run short.
Typical Application
Local SPAN Application
The following is a simple local SPAN environment.
Illustration
In the above figure, all packets of port 0/1 are mirrored to port 0/2. The network analyzer connected to port 0/2 is not directly connected to port 0/1, but through the mirroring, port 0/2 receives the packets of port 0/1.
Illustration
In the above figure, the mirrored packets of port 0/8 on the source device, switch 1, are transmitted via RSPAN VLAN 100 to destination port 0/1 on the destination device, switch 2, so that the packets sent and received on the source switch's ports can be monitored from the destination switch.
Main contents:
M-VRF
Load balance
The source of a route can be one of three types: when the forwarding device is directly connected to a network, a directly-connected route is generated automatically; the other two types are static routes configured manually and dynamic routes learned through routing protocols.
There are many paths by which packets can travel from one host to another, so the best path must be selected to forward the packets. The path is determined from the following aspects:
Path length: the path length can be measured in hops or in cost. In a distance vector routing protocol, the path length is the number of forwarding devices between the source host and the destination host. In a link state routing protocol, the path length is the sum of the costs of the links.
Reliability: measured by the error rate between the source host and the destination host. In most routing protocols, the reliability of a link is assigned by the network engineer.
Delay: the sum of the time spent traversing all network devices, links, and switching devices. Delay also depends on network congestion and on the distance between the source end and the destination end. Because many variables affect it, delay is an important measurement standard in best-path calculation.
To find a route, the next-hop address is used as the destination and its link layer address is resolved. The next-hop address may be the address of another host directly connected to the switch, or the address of a host in the network that is not directly connected to the switch; either kind of address can be routed.
To route packets, the switch searches the routing table for the correct route. Each route in the database contains the following two elements:
1. Destination address: a network address that the switch can reach. For the same primary network address, the switch may have more than one route to the same address.
2. Destination pointer: the pointer indicates either that the network is directly connected to the switch, or the address of the next switch, namely the next-hop switch.
The switch tries to match the most specific address first. In order of decreasing specificity, the matched address may be one of the following:
Subnet
Main network ID
Default address
If the destination address of a packet does not match any entry in the routing table, the packet is discarded and an ICMP destination-unreachable message is sent to the source address.
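The lookup order above is the classic longest-prefix match, which can be sketched with the standard library (an illustrative model of the lookup, not the switch's implementation):

```python
import ipaddress

def best_route(dst, routes, default=None):
    """Pick the most specific matching route (longest prefix wins),
    falling back to the default route; None means the packet is
    discarded and ICMP destination-unreachable is sent."""
    dst = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(r) for r in routes
               if dst in ipaddress.ip_network(r)]
    if matches:
        return str(max(matches, key=lambda n: n.prefixlen))
    return default
```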
Illustration
The data flow sent from PC-1 to PC-2 reaches the default gateway, switch-a. Switch-a finds that the destination address 10.1.1.1 of the data flow is not a local address and searches the routing table. Owing to the static route 10.1.1.0/24, switch-a forwards the data flow to the next hop 10.1.2.1 (namely switch-b). Switch-b continues forwarding; the destination address of the data flow hits a directly connected route, and the data flow is successfully delivered to PC-2.
When the static route is a load-balancing route, data may be sent to the CPU continuously because the software and hardware choose different routes.
For example, suppose the hardware selects 1.1.1.3 as the next hop. If ARP for 1.1.1.3 is not yet resolved, the packets must be sent to the CPU for software forwarding. After the packets reach the CPU, if the software also uses flow load-balancing mode to select the next hop, it may select 1.1.1.2, because the software and hardware algorithms differ. As a result, ARP for 1.1.1.2 is resolved while 1.1.1.3 remains unresolved. The hardware then keeps selecting 1.1.1.3 as the next hop while the software selects 1.1.1.2; consequently, the data flow is continuously sent to the CPU and hardware forwarding cannot be performed.
Therefore, for hardware route switching devices, when static route load balancing is used, we recommend setting the software load balancing to per-packet mode, so that every next hop can have its ARP resolved by the software.
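The mismatch described above comes from two per-flow selectors using different hash functions over the same flow, which can be demonstrated abstractly (hash functions and next-hop list are hypothetical):

```python
def pick_next_hop(flow_key, next_hops, algo):
    """Per-flow next-hop selection: hash the flow key and index into the
    next-hop list. Two devices using different hash algorithms can pick
    different next hops for the same flow - the software/hardware
    disagreement described above."""
    return next_hops[algo(flow_key) % len(next_hops)]
```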
M-VRF
Main contents:
Terms
Introduction to M-VRF
Terms of M-VRF
VPN (Virtual Private Network): Through VPN technology, two or more network sites can be connected over the Internet. Within the VPN, all sites operate as if they were in a single private network.
M-VRF (Multi-VPN Routing and Forwarding): In the switch, each VPN has its own routing and forwarding table, and the customers at the sites of that VPN can only access the routes in that table.
Introduction to M-VRF
M-VRF supports VPNs. Multiple VRFs may exist in one switch, and each resource (interface, IP address, routing table) belongs to one VRF; resources in different VRFs cannot access each other. Through the Multi-VRF function, users can isolate networks, and overlapping address spaces are supported.
M-VRF does not modify the packet format; it only enhances security by partitioning resource attributes. Each resource in the system belongs to exactly one VRF. After an interface is configured with a VRF, packets sent or received through the interface can only access the resources of that VRF.
Load Balancing
Main contents:
1. Switching per packet: a good option when there are fewer than 64K concurrent links. Packets may arrive out of order, so it is unsuitable for applications that depend on packet order, such as voice traffic.
2. Switching per session: when the link used by a session carries heavy traffic but the other links are lightly loaded, the load of the different links may be unbalanced.
Note
The RIP protocol includes RIPv1 and RIPv2. RIPv1 does not support classless routes; RIPv2 does. Usually RIPv2 is used.
The RIP protocol is simple and easy to configure. The amount of routing information RIP advertises is directly proportional to the number of routes in the routing table, so a large number of routes consumes significant network resources. In addition, RIP defines a maximum of 15 hops, so it is only applicable to simple, small-to-medium networks.
As shown in the preceding figure, the RIP protocol runs over UDP: its protocol packets are encapsulated in UDP packets. On port 520, RIP receives the protocol packets sent by remote routing devices and updates the local routing table according to the routing information they carry. At the same time, it adds one to the metric and advertises the routes to the other adjacent routing devices. In this way, all routing devices in the routing domain can learn all routes.
The RIP protocol sends packets in the following three modes: broadcast,
multicast, and unicast. The usage of each mode is shown in the following
table.
There are two types of packets: Request packets and Response packets.
The RIP packet types and the functions are as follows.
As shown in the preceding figure, the RIP packets are encapsulated in the
UDP packets. In the IP header of the RIP packets, TTL is set to 1 to
prevent RIP packets from being forwarded by other routing devices.
The RIP header has two fields: the Command field identifies request packets (value 1) or response packets (value 2); the Version field identifies RIPv1 (value 1) or RIPv2 (value 2).
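The 4-byte RIP header (Command, Version, and a must-be-zero field, per RFC 2453) can be parsed with the standard struct module:

```python
import struct

def parse_rip_header(data: bytes):
    """Parse the 4-byte RIP header: Command (1 = request, 2 = response),
    Version (1 = RIPv1, 2 = RIPv2), and a 2-byte must-be-zero field."""
    command, version, _zero = struct.unpack("!BBH", data[:4])
    kind = {1: "request", 2: "response"}.get(command, "unknown")
    return kind, version
```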
RIP Entry includes three types: RIPv1 routing entry, RIPv2 routing entry,
and authentication information entry. RIP Entry types and description are
as follows.
The working flow of the RIP protocol, shown in the preceding figure, can be divided into two parts: the RIP protocol start-up flow, and the processing flow for received RIP packets.
When an interface starts to run RIP, request packets are sent out of the interface by broadcast (RIPv1) or multicast (RIPv2) to request all routing information from all adjacent routing devices, so that fast convergence can be achieved.
After response packets answering the requests are received, the routes in the route database are updated according to the routing information they contain, and the changed routes are advertised to the other adjacent RIP routing devices (triggered updates).
At the same time, the Update timer is started. Every 30 seconds by default, all routing information is advertised through response packets to the adjacent RIP routing devices. This keeps the databases of the RIP routing devices synchronized and refreshes the advertised routes, so that previously advertised routes do not time out or become invalid on other routing devices.
Route Database
The route database records all routing information about the RIP protocol.
Each routing entry is composed of the following elements:
6. Route tag: user-defined, used to mark a category of routes. For
example, it can mark that a route was obtained by redistributing BGP
routes.
In the RIP route database, the sources of the routing entries are as follows:
In RIPv1, the next-hop interface of the route is the interface on which
the route was learned. The next-hop IP address is the source IP address of
the response packet from which the route was learned.
In RIPv2, the routing information in a response packet can carry a
next-hop IP address. The next-hop interface of the route is still the
interface on which the route was learned. The next-hop IP address can be
one of the following: the source IP address of the response packet from
which the route was learned, or the next-hop IP address carried in the
routing information. If the next-hop IP address in the routing information
and the interface that receives the routing information are in the same
subnet, the next-hop IP address of the route is the one carried in the
routing information; otherwise, it is the source IP address of the
response packet. The purpose is to implement a redirection function.
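The next-hop selection rule above can be sketched as follows; the function name and arguments are illustrative, not part of any product API (a next hop of 0.0.0.0 in a RIPv2 entry conventionally means "use the packet source"):

```python
import ipaddress

def select_next_hop(src_ip: str, entry_next_hop: str, recv_if_net: str) -> str:
    """Use the next hop carried in the RIPv2 entry only if it lies on the
    same subnet as the receiving interface; otherwise fall back to the
    source IP address of the response packet."""
    if entry_next_hop != "0.0.0.0" and \
            ipaddress.ip_address(entry_next_hop) in ipaddress.ip_network(recv_if_net):
        return entry_next_hop
    return src_ip

# Next hop 10.1.1.3 is on the receiving subnet 10.1.1.0/24, so the route
# is redirected to it instead of pointing at the advertising device.
```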
As shown in the preceding figure, switch-A runs RIP, switch-B runs RIP
and OSPF, switch-C runs OSPF. In switch-B, the RIP redistributes the
learned OSPF route 11.0.0.0/8. As a result, switch-A can learn the route
11.0.0.0/8 that reaches the subnet. When switch-A learns the route, by
default, the next-hop is switch-B, namely, 10.1.1.2. Then, the packets
forwarded from switch-A to destination subnet 11.0.0.0/8 reach switch-C
through switch-B.
Route Updates
When a route is learned from an adjacent RIP routing device, the route is
used to update the route database in the following cases:
1. The route does not exist in the route database and the metric of the
route is less than 16 hops.
2. The route exists in the database, and the source IP address of the
existing route and that of the learned route are the same.
3. The route exists in the database, but its metric is equal to or
greater than the metric of the learned route.
To keep the hop count accurate, the metric is increased by 1 when the
routes in the route database are advertised. The maximum valid metric is
15; a route with a metric greater than 15 is considered unreachable.
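The update rules and the hop-count increment above can be sketched together; the data layout (a dict mapping prefix to metric and source address) is an assumption for illustration, not the device implementation:

```python
INFINITY = 16  # a RIP metric of 16 means unreachable

def accept_route(db: dict, prefix: str, learned_metric: int, learned_src: str) -> bool:
    """Apply the three update rules described above (illustrative sketch)."""
    advertised = min(learned_metric + 1, INFINITY)  # add one hop per transfer
    existing = db.get(prefix)
    if existing is None:
        ok = advertised < INFINITY            # rule 1: new route, reachable
    else:
        old_metric, old_src = existing
        ok = (old_src == learned_src          # rule 2: same advertiser
              or old_metric >= advertised)    # rule 3: equal or better metric
    if ok:
        db[prefix] = (advertised, learned_src)
    return ok
```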
RIP Timers
(Figure: RIP timer state machine. A valid route runs the invalid timer.
When the invalid timer times out, or an update sets the metric to 16
(unreachable), the route becomes invalid and held down, and the holddown
and flush timers start. When the flush timer times out, the route is
deleted from the database.)
The RIP protocol contains four timers: the update timer, invalid timer,
holddown timer, and flush timer. Each timer is described as follows.
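The state transitions shown in the timer figure can be sketched as follows; the class and method names are hypothetical, chosen only to mirror the figure:

```python
class RipRouteState:
    """Illustrative model of one route's timer-driven state transitions."""

    def __init__(self):
        self.state = "valid"  # invalid timer running on a valid route

    def invalid_timeout_or_metric_16(self):
        # Invalid timer expired, or an update set the metric to 16:
        # the route becomes unreachable; holddown and flush timers start.
        if self.state == "valid":
            self.state = "invalid+holddown"

    def flush_timeout(self):
        # Flush timer expired: delete the route from the database.
        if self.state == "invalid+holddown":
            self.state = "deleted"
```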
Counting to Infinity
The RIP protocol allows a maximum hop count of 15. A destination more
than 15 hops away is considered unreachable. This limit restricts the
network size and prevents routing information from being transferred
indefinitely. As routing information travels from one routing device to
another, the hop count increases by 1 at each transfer. When the hop
count exceeds 15, the route is deleted from the routing table.
Split Horizon
The rule of RIP split horizon is as follows: if a RIP routing device
learns routing information A from an interface, the response packets sent
out of that interface must not contain routing information A.
Poisoned Reverse
The purpose of poisoned reverse and the purpose of the split horizon are
the same, but the operations are different.
The rule of RIP poisoned reverse is as follows: if a RIP routing device
learns routing information A from an interface, the response packets sent
out of that interface still contain routing information A, but with the
metric set to 16 (namely unreachable).
Compared with split horizon, poisoned reverse has the advantage that
advertising the route as unreachable back to the source routing device
breaks an existing routing loop immediately, whereas split horizon has to
wait until the wrong route entry is deleted by timeout. The disadvantage
is that poisoned reverse increases the size of the response packets, so
the bandwidth consumed by the protocol increases.
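Both filtering rules can be sketched in one routine; the route-table layout (prefix mapped to metric and learning interface) is an assumption for illustration:

```python
INFINITY = 16  # unreachable metric in RIP

def build_response(routes: dict, out_interface: str, poisoned_reverse: bool = False):
    """Filter the routes advertised out of one interface.
    `routes` maps prefix -> (metric, interface the route was learned on)."""
    entries = []
    for prefix, (metric, learned_if) in routes.items():
        if learned_if == out_interface:
            if poisoned_reverse:
                # advertise the route back, but as unreachable
                entries.append((prefix, INFINITY))
            # plain split horizon: omit the route entirely
        else:
            entries.append((prefix, metric))
    return entries
```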
Holddown Timer
Through the holddown timer, after a routing device learns that a route is
unreachable, the route is not updated by received response packets for the
holddown period. The route entry information in those response packets may
originate from the device's own earlier advertisement.
Triggered updates
Poisoned reverse and split horizon break routing loops formed by any two
routing devices. A routing loop formed by three or more routing devices
can persist until the route metric counts up to infinity (16). Triggered
updates speed up route convergence and thus shorten the time needed to
break such a loop.
Related terms
Successor: the next router on the path from the current router to the
destination router;
The router provides only the routing information it actually uses to its
directly connected neighbors. The information sent to a neighbor can also
be filtered first and then sent.
1. IRMP saves all routes sent by all neighbors in the topology table,
rather than saving only the best route received so far;
IRMP Packet Types
Opcode   Type
1        Update
3        Query
4        Reply
5        Hello
6        IPX SAP (not supported for the moment)
The packet is retransmitted in unicast mode to any neighbor that does not
reply within the multicast timeout interval;
OSPF Features
Area: a collection of routing devices that share a topology database.
OSPF divides one AS into multiple areas; the topology of one area is
invisible to another area, which reduces the amount of routing information
in an AS. The area is used to contain link state updates and enables the
administrator to create a hierarchical network.
LSA (Link State Advertisement): the data unit describing the state of the
local routing device or network. For a routing device, it contains the
interface states and adjacency states of the device. Each link state
advertisement is flooded throughout the area. The routing device uses the
collected link state advertisements to form the link state database.
Stub Area: an area that has only one exit to the outside. Type 5 LSAs
cannot be flooded into the area.
Introduction to OSPF
Open Shortest Path First (OSPF) is a dynamic routing protocol. It can
detect network changes in the AS and form new routes after a short
convergence period. The convergence time is short and the routing traffic
involved is limited. In the OSPF protocol, each routing device maintains a
network topology database describing the AS, and every routing device has
the same database. Each record of the database is the local state of a
specific routing device. The routing device distributes its local state
in flooding mode throughout the AS.
All routing devices run the same algorithm in parallel. Each routing device
uses the link state database to generate a shortest path tree with itself as
the root. The shortest path tree provides the route to each destination in
the AS. The external routing information serves as leaves in the tree.
All OSPF exchanges are authenticated, which means that only trusted
routing devices can participate in AS routing. Multiple authentication
configurations can be used; in fact, each subnet can be configured with
independent authentication.
SW1, SW2, SW3, and SW4 comprise area 1; SW3 is the area border router
(ABR);
SW6, SW7, and SW8 comprise area 2; SW6 and SW8 are the area border
routers (ABRs);
SW8, SW9, and SW10 comprise area 3; SW8 is the area border router
(ABR);
Process of OSPF
The basic idea of OSPF: in the AS, each routing device running OSPF
collects the link state. Broadcast the link state in the entire system
through the flooding mode. Then, the entire system maintains the
synchronized link state database. Each routing device calculates a shortest
path tree with the device itself as the root and other network nodes as the
leaves through the database. Then, the best routes to many places in the
system are obtained.
The routing devices running OSPF form an AS. The AS can be divided into
multiple areas. Each routing device in an area requires the same AS
topology (link state database).
After the topology is obtained, routing device A runs the SPF algorithm to
generate one shortest path tree with itself as the root and records it in the
routing table. The route to the destination in the future is obtained from
the routing table.
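The SPF calculation is Dijkstra's shortest-path algorithm; a minimal sketch follows (the link-table layout is an assumption made for this example):

```python
import heapq

def spf(links: dict, root: str):
    """Dijkstra sketch of the SPF calculation described above.
    `links` maps node -> list of (neighbor, cost) pairs."""
    dist = {root: 0}   # shortest known cost to each node
    prev = {}          # predecessor on the shortest path tree
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in links.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, prev

# Example: with root A and links A->B (1), A->C (4), B->C (1),
# the shortest path to C is A->B->C with total cost 2.
```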
The area boundary router belongs to multiple areas at the same time.
Therefore, the topology of the home area of routing device A will be
advertised to other areas, and the topology of other areas will be
advertised into the area. Through the exchange of topology in the
boundary routing devices, the home area of routing device A learns the
network topology of the entire AS area. In the OSPF, the boundary routing
devices form the backbone area.
The first step of the route calculation is to calculate a shortest path
tree with the device itself as the root; the second step is to calculate
the leaves (routes) on each node according to the shortest path tree. The
incremental calculation of the shortest path tree after a network
topology change is called incremental SPF (ISPF); the incremental
calculation of the leaves (routes) is called Partial Route Calculation
(PRC). Incremental calculation can significantly improve the calculation
performance of the routing devices and decrease the CPU load.
Generally, the initial interval can be set to 100 milliseconds, which can
respond to burst change quickly; the incremental interval can be set to
100 milliseconds or 1 second; the maximum interval can be set to 5
seconds or 10 seconds.
The ISPF only processes the network topology information; that is, it
only calculates the shortest path tree. By reorganizing the links, the
ISPF forms a graph database reflecting the network topology, and the
calculated shortest path tree is saved in this graph. When the link state
changes, the ISPF determines the affected part of the network topology
and recalculates only the affected part instead of the entire topology.
As shown in the preceding figure, RTA is the root node (the routing
devices performing calculation). When the cost of RTC-> RTD (blue link) is
changed into 50 from 100, the affected parts are RTD and RTE. Other
routing devices are not affected. ISPF will judge the range of the effect.
Then, only the routes released by RTD and RTE are calculated.
If the position of the network topology change differs, the affected
range differs, and so does the time spent on the ISPF calculation.
Therefore, the time spent can differ even in the same network structure.
If the links at the root node change (RTA->RTB and RTA->RTF), the
affected range covers the entire topology. In this case, ISPF is similar
to a full recalculation.
PRC Technology
For an IGP, any route is a leaf of a network node; the term leaf reflects
the relation between the route and the network node. Starting from the
root node, once the shortest path to a network node is determined, the
shortest path of the routes released by that node is determined.
Therefore, PRC uses the shortest path tree calculated by ISPF to
calculate the leaf routes. When any routing information changes, PRC
determines the changed routes (leaves), and only those routes are
selected and updated (based on the existing ISPF calculation result).
Owing to the restriction of the link information format in the OSPF protocol,
the routing information and the network node (released routing devices)
are not directly associated. The same routes released by different devices
are also not directly associated. Therefore, the PRC needs to re-organize
the database.
Take the route as the base point and organize all elements that release
the route, so that the best route can be selected from all of them when
routes are calculated. From the other direction, take the releaser as the
base point and assemble all routes released by it, so that when the ISPF
announces that the shortest path of a node has changed, all routes
released by that node can be updated directly.
The LSDB is composed of link state advertisements (LSA). The LSA can be
divided into 6 categories:
The border router of an area assembles the information about the local
area into a Summary LSA and floods it to the border routers of the other
areas in the AS. Those border routing devices analyze the received
Summary LSAs, generate new Summary LSAs, and flood them into their own
areas. All border routers and the links among them form the backbone
area. Backbone routers are mutually reachable; they can be connected
physically or through virtual links. When a virtual link is configured,
the transited area must be a transit area, not a stub area.
The ASBR of the AS sends the external routing information to all nodes
except the stub area in the AS. The routing devices in the stub area are
directed to the ASBR through the default route.
Type: the type of the packet that follows the OSPF header. OSPF has five
packet types: hello packets, type=1; database description packets,
type=2; link state request packets, type=3; link state update packets,
type=4; link state acknowledgement packets, type=5.
Area ID: the area where the packet is generated; when the packet passes
the virtual link, area ID is 0.0.0.0.
Hello packets are used to create and maintain adjacencies. They contain
the parameters that must be consistent when neighbors create an adjacency.
Network Mask: the mask of the interface where the packets are generated
Router priority: it is used in the case of selecting DR and BDR. When the
router priority is 0, the routing device does not have the selecting rights.
Router Dead Interval: if no hello packets are received within the router
dead interval, the neighbor is considered down and is deleted.
Backup DR: the IP address of the BDR selected by the interface generating
the packets
Neighbor: the list of the neighbors that can receive hello packets at the
interface generating the packets in the router dead interval.
Interface MTU: the size of the largest IP packet that can be transmitted
without fragmentation on the interface generating the packets. When the
packets are transmitted over a virtual link, the interface MTU is set to 0.
I-bit: initial bit, when the packet is the initial packet of the DD packet
sequence, the bit is 1.
M-bit: More bit; set to 1 when more packets follow in the DD packet
sequence, and 0 in the last packet of the sequence.
LSA Headers: the LSA header list of the link state database
Link State ID: works with link state type and advertising router to identify
a LSA.
Advertising Router: the router ID of the routing device generating the LSA.
LSA header
Link State ID: works with link state type and advertising router to identify
a LSA.
Advertising Router: the router ID of the routing device generating the LSA
V: Virtual link endpoint bit; set when the routing device generating the
packet is an endpoint of a virtual link.
E: External bit; set when the routing device generating the packet is an
ASBR.
B: Border bit; set when the routing device generating the packet is an
ABR.
Link Data: the data of the link, the meaning varies with the link type
Number of TOS: the number of TOS (Type of Service) metrics carried, kept
for backward compatibility of the protocol
Link State ID: for the Network LSA, it is the IP address of the DR interface
Attached Router: the list of the routing devices adjacent to the DR in the
network
Figure 11-22 Format of the Network and ASBR summary LSA packet
Link State ID: for type 3 LSA, it is the IP address of the advertised
network or subnet; for type 4 LSA, it is the router ID of the advertised
ASBR.
Network Mask: for type 3 LSA, it is the mask of the advertised network or
subnet; for type 4 LSA, the domain is set to 0.
Link State ID: for the ASE LSA, it is the IP address of the destination
E: External metric bit, the type of the external cost used by the route.
If the E bit is set to 1, the cost type is E2; if the E bit is 0, the
cost type is E1.
DC: set the bit in the case of a demand circuit
EA: set the bit when the source routing device has the capability of
receiving/sending external attributes LSA
N: used only in the hello packets, set it to 1 when the NSSA external LSA
is supported; set it to 0 when the NSSA external LSA is not supported;
when N is set to 1, the E bit must be 0.
P: used only in NSSA external LSA headers. If the P bit is set, the ABR
of the NSSA must convert the type 7 LSA to a type 5 LSA.
MC: set the bit when the source routing device forwards multicast
packets.
E: set the bit when the source routing device accepts ASE LSA
packets.
OSPF Features
1. OSPF is a kind of IGP, designed for use within an AS
4. In OSPF, the AS can be divided into multiple areas. This has the
following advantages: 1) routes within an area and routes between
areas are separated; 2) dividing the AS into areas reduces the SPF
calculation.
9. Flexible metric: in the OSPF, the metric is specified as the output cost
of the routing device interface. The path cost is the total of the cost of
all interfaces. The route metric can be specified by the system
administrator according to the network features (delay, bandwidth, and
cost).
10. Multiple equal-cost paths to the same destination: OSPF finds the
paths and balances the load across them.
12. Support stub area: when the area is set to stub area, the external
LSAs cannot be flooded to the stub area. In the stub area, the route to
the external destination is specified by the default route.
Memory of routing device: the link state database of the OSPF may
become very large, especially when many external link states are
advertised. In this case, the memory of the routing device must be very
large. In the process of updating and synchronizing the link state database,
large amount of memory is used.
CPU usage: in OSPF, CPU usage is related to the time spent running the
SPF algorithm and to the number of routing devices in the OSPF system. In
addition, when the link state database is very large, a great deal of CPU
is consumed during protocol convergence if a large number of packets must
be exchanged.
Designated router: the designated router in a multi-access network
receives and sends more packets than the other routing devices. When the
designated router fails, a new designated router must be elected. For
this reason, the number of routing devices connected to a single network
should be restricted.
Within an area, the database size can be reduced as follows: 1. the area
can use a default route, reducing the external routes that must be
imported; 2. an EGP (exterior gateway protocol) can carry its own
information across the OSPF AS instead of depending on an IGP (such as
OSPF) to transmit it; 3. the area can be configured as a stub area; 4. if
the external networks have regular addresses, the addresses can be
summarized; after summarization, the external information in OSPF
decreases dramatically.
OSPF is also suitable for small independent ASs or stub ASs, because of:
1. fast convergence; 2. support for multiple equal-cost paths to the same
destination.
IS (Intermediate System): similar to a router in TCP/IP, the basic unit
that generates routes and transmits routing information. Hereinafter, IS
and router have the same meaning.
ES (End System): equivalent to a host system in TCP/IP. An ES does not
participate in the IS-IS routing protocol; ISO has a dedicated ES-IS
protocol defining the communication between the end system and the IS.
Area- the routing area divided in the IS-IS protocol, including level-1 area
and level-2 area.
LSP- Link State PDU, carries the link state information that should be
published, including adjacency information and reachable subnet
information.
PSNP (Partial Sequence Number PDU): one type of SNP packet, used for
confirming LSP packets (point-to-point networks) and requesting LSP
packets (broadcast networks).
CSNP (Complete Sequence Number PDU): one type of SNP packet, used for
advertising abbreviated description information of the LSDB.
In this chapter, the IS-IS protocol for IPv4 and IPv6 is described. OSI
routing is not widely used, so it is not described in this document.
As shown in the preceding figure, the IS-IS protocol is based on the link
layer, independent from the network layer of the IPv4, IPv6, and OSI
protocol stack. In the broadcast network, the packets are sent in the
multicast mode. In the Ethernet, IS-IS uses the following MAC addresses.
AllIntermediateSystems 09-00-2B-00-00-05 The multicast MAC address of all IS
systems
AllEndSystems 09-00-2B-00-00-04 The multicast MAC address of all ES
systems
As shown in the preceding figure, the position of the IS-IS protocol in
the network protocol stack is directly on the link layer. Therefore, the
IS-IS protocol is encapsulated in link layer packets. The routing
information carried in an IS-IS packet is organized in TLV form, which
can be organized and extended flexibly. TLV: data type (1 byte) + data
length (1 byte) + data value (0-255 bytes). According to the IS-IS
protocol, a TLV that cannot be identified should be ignored rather than
dropped.
IS-IS is based on the link layer and is independent of the network layer,
and the routing information is organized flexibly in TLV form. In
addition, TLVs that cannot be identified are ignored. This gives IS-IS
its features of easy extension and smooth upgrade.
Type                                           Code  Function
LSP    Level 1 Link State PDU                  18    Publish routing information in the level-1 area
       Level 2 Link State PDU                  20    Publish routing information in the level-2 area
CSNP   Level 1 Complete Sequence Numbers PDU   24    Advertise the abbreviated database description information to level-1 neighbors
       Level 2 Complete Sequence Numbers PDU   25    Advertise the abbreviated database description information to level-2 neighbors
PSNP   Level 1 Partial Sequence Numbers PDU    26    Request or confirm LSP packets from level-1 neighbors
       Level 2 Partial Sequence Numbers PDU    27    Request or confirm LSP packets from level-2 neighbors
NET of IS-IS
Even when the IS-IS protocol is used to route for the TCP/IP protocol
stack, it remains an ISO CLNP protocol. The OSPF protocol uses a router
ID to identify a routing device; the IS-IS protocol uses an ISO network
address, the NET (Network Entity Title), to identify a routing device
(IS). The description of the NET is shown in the preceding figure. The
example in the figure is NET 47.0000.0000.0000.0011.00.
The Area ID identifies a level-1 area. The level-2 area is the backbone
of the network; only one level-2 area is allowed, so it does not require
an ID.
SEL (NSAP Selector, also N-SEL) is similar to the protocol ID in IP;
different transport protocols correspond to different SELs. In IS-IS, all
SELs are 00.
Note that this description of the NET is for routing the TCP/IP protocol
with IS-IS. The NET is defined in ISO 8348.
The preceding figure illustrates the two-layer network topology of the IS-
IS protocol. A typical IS-IS network is composed of a level-2 area serving
as the core backbone network and multiple level-1 areas serving as the
access network. Each level-1 area uses one or more level-2 switches to
access the level-2 area. The level-1 areas are connected through the
level-2 area, forming a two-level network topology. An IS-IS network can
also consist of a single level-1 area or a single level-2 area; more
detailed area division is not required.
The LSDBs of the areas are independent, and SPF route calculation is also
performed independently per area. Dividing areas splits the entire
network into many small routing domains, which reduces the size of the
LSDB and consequently the memory consumption and SPF calculation. But a
new problem arises: the SPF calculation can only implement route learning
within an area. How should route learning be performed between areas?
Level-1 areas and the level-2 area are connected through level-2
switches. A level-2 switch runs both the level-1 and the level-2 protocol
of IS-IS at the same time. Routing between level-1 and level-2 areas is
therefore handled on the level-2 switch: it advertises the routes learned
from the level-1 area into the level-2 area, and advertises the attach
tag into the level-1 area to show that it is connected to the level-2
core network.
The attach tag is marked in the level-1 routing information published by
the level-2 switch, indicating that it is connected to the level-2 core
network. As a result, all switches in the level-1 area generate a default
route to the level-2 switch, so all switches in the level-1 area have a
default route reaching the level-2 area.
Designated IS
Pseudo-node
Neighbor ID
The network nodes in the adjacent network topology are identified using
the neighbor ID in the LSDB, as shown in the preceding figure. There are
two types of nodes in the adjacent network topology: 1. an IS: in its
neighbor ID, the system ID is its own system ID and the Circ ID is always
0x00; 2. a pseudo-node, created by the DIS: in its neighbor ID, the
system ID is the system ID of the DIS and the Circ ID is the ID of the
DIS interface generating the pseudo-node; it must be non-zero to
distinguish it from the neighbor ID of an IS.
Creation of Neighbors
The adjacency information describes the IS systems that the host can
reach directly. The generated adjacency information is described in the
point-to-point mode.
For the broadcast network, to simplify the adjacent network topology, the
DIS virtualizes a Pseudo-node in the broadcast network. All IS systems in
the broadcast network generate adjacency information to the pseudo-node.
The adjacency information of the pseudo-node is the IS systems adjacent
to the broadcast network. The adjacency information of the Pseudo-node
is generated and published by the DIS.
If the LSDBs of the ISs are not synchronized, the calculated SPF trees
are inconsistent and routing loops may occur. Therefore, when the state
is stable, the LSDBs of all IS systems in an area must be synchronized.
The LSDB is composed of LSP packets. LSDBs can get out of sync because
IS-IS packets are transmitted directly over the link layer, without a
reliable transport mechanism, so LSP packets may be dropped in
transmission. Ensuring LSDB synchronization therefore means ensuring the
reliable delivery of LSP packets. The protection mechanisms differ
between point-to-point networks and broadcast networks.
Step 1: Calculate the SPF tree through the SPF algorithm according to the
network topology composed of the adjacency information of the LSDB. As
a result, the shortest path to each network node (namely the IS) and the
next-hop are obtained.
Illustration
As shown in the preceding network topology, there are four switches (A, B,
C, and D), namely four IS systems. The following describes the process of
route learning through the example of switch A learns the subnet
10.0.0.0/8 route of switch D. The metric of each link is 10. The DIS
selected from the Ethernet network is switch B.
IS C   0000.0000.0003   0000.0000.0003.00   Adjacency to D (0000.0000.0004.00), metric 10
IS D   0000.0000.0004   0000.0000.0004.00   Adjacency to C (0000.0000.0003.00), metric 10
In IS A, according to the information in the LSDB, take A as the start
point and use the SPF algorithm to calculate the SPF tree shown in the
preceding figure. The shortest path obtained to IS D (pseudo-nodes are
ignored when the shortest path is derived) is A->C->D. If the Ethernet
interface of A is vlan1 and the IP address of the Ethernet interface of C
is 3.3.3.3, then the next-hop interface toward IS D is vlan1, the
next-hop address is 3.3.3.3, and the metric is 20.
IBGP: BGP within the same AS. An IBGP neighbor is a routing device in the
same administrative control domain.
Supernet: a network advertisement whose prefix is shorter than the
natural mask of the network. For example, the natural mask of the class C
network 202.11.1.0 is 255.255.255.0. If we use 202.11.0.0/16 to
represent the network address, the mask is 16 bits, which is shorter than
24 bits; therefore, it is a supernet.
SYN (Synchronization): before BGP advertises a route, the route must be
in the current IP routing table; that is, BGP and the IGP must be
synchronized before the route is advertised.
BGP uses the TCP as the transmission protocol (port 179). Then, reliable
data transmission is provided. The retransmission and acknowledgement
of data are implemented by the TCP, instead of BGP. As a result, the
process is simplified. The reliability need not be designed in the protocol.
Create a TCP connection between two routing devices running BGP. Then,
the two routing devices are called peers. Once the connection is created,
the two peer routing devices acknowledge the connection parameters
through exchanging the open packets. The parameters include BGP
version number, AS number, hold time, BGP identifier, and other optional
parameters. After the two peers negotiate parameters successfully, the
BGP exchanges routes by sending update packets. The update packets
contain the list of reachable destinations passing each AS system (namely
NLRI), and the path attributes of each route. When the route changes,
incremental update packets are used between peers to transmit the
information. BGP does not require refreshing routing information
periodically. If the route does not change, the BGP peers only exchange
keepalive packets. The keepalive packets are sent periodically to ensure
the valid connection.
BGP Message Header
The BGP message header contains a 16-byte marker field, a 2-byte length
field, and a 1-byte type field. The following figure illustrates the
format of the BGP message header.
Length: the length field occupies 2 bytes. It indicates the length of the
message. The minimum allowed length is 19 bytes and the maximum is
4096 bytes.
Type: The type field occupies one byte. It indicates the type of the BGP
message. The four types of the BGP message are as follows:
Number Type
1 Open
2 Update
3 Notification
4 Keepalive
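A minimal parser for the header can be sketched as follows; RFC 4271 additionally requires the 16-byte marker to be all ones, which the sketch assumes (this detail is from the RFC, not from this manual):

```python
import struct

MARKER = b"\xff" * 16  # all-ones marker per RFC 4271
TYPES = {1: "Open", 2: "Update", 3: "Notification", 4: "Keepalive"}

def parse_bgp_header(data: bytes):
    """Parse and validate the 19-byte BGP message header."""
    marker, length, msg_type = struct.unpack("!16sHB", data[:19])
    if marker != MARKER:
        raise ValueError("bad marker")
    if not 19 <= length <= 4096:
        raise ValueError("bad length")  # 19..4096 bytes allowed
    return length, TYPES.get(msg_type, "Unknown")
```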
Open Messages
After the TCP connection is created, the first packet sent is the open
message. The open message contains the BGP version number, AS number,
hold time, BGP identifier, and other optional parameters.
If the open message is acceptable, it means that the peer routing devices
agree with the parameters. In this case, the keepalive message is sent to
acknowledge the open message.
In addition to the fixed BGP header, the open message contains the
following fields:
Version: the version field occupies one byte. It indicates the version
number of the BGP protocol. When the neighbors are negotiating, the peer
routing devices agree on the BGP version numbers. Usually, the latest
version supported by the two routing devices is used.
Hold Time: the field is two bytes. It indicates the maximum time the
sender will wait between successive keepalive or update messages from the
peer. The BGP routing device negotiates with the peer and sets the hold
time to the smaller of the two advertised values.
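The negotiation can be sketched in a few lines; the one-third keepalive interval below is common practice rather than something stated in this manual:

```python
def negotiate_hold_time(local_hold: int, peer_hold: int):
    """Return the negotiated hold time and a derived keepalive interval."""
    hold = min(local_hold, peer_hold)  # the smaller advertised value wins
    keepalive = hold // 3              # common practice: one third of hold
    return hold, keepalive

# e.g. local 180 s, peer 90 s -> the session uses a 90 s hold time
```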
BGP Identifier: the field is four bytes. It indicates the identifier of
the sending BGP routing device. The field is the ID of the routing
device, namely the highest loopback interface address or, failing that,
the highest IP address among the physical interfaces. The router ID can
also be set manually.
Optional parameter Length: the field is one byte. It indicates the total
length of the optional parameter fields (the unit is byte). If there are no
optional parameters, the field is set to 0.
Update Message
The Update message is used to exchange routing information between BGP
peers. It is used both to advertise routes to a BGP peer and to withdraw
them. The Update message contains the fixed BGP header and the following
optional parts:
Total Path Attribute Length: the field occupies two bytes; it indicates the
total length of the path attribute field.
Path Attribute: this variable-length field contains the list of BGP
attributes associated with the prefixes in the NLRI. The path attributes
provide information about the advertised prefixes, such as the preference
or next hop, and are used for route filtering and route selection. Path
attributes are classified into the following types:
LOCAL_PREF: the higher the value, the higher the route preference. The
LOCAL_PREF attribute is not included in Update messages sent to EBGP
neighbors. If the attribute is present in an Update message received from
an EBGP neighbor, the Update message is ignored.
AGGREGATOR: the attribute records the BGP peer (IP address) that performed
the route aggregation and its AS number.
This differs from the processing of LOCAL_PREF: the attribute allows an
external routing device to influence the route selection of another AS,
whereas LOCAL_PREF only affects route selection within the local AS.
Network Layer Reachability Information: this variable-length field contains
the list of reachable IP address prefixes advertised by the sender.
Keepalive Message
The keepalive messages are exchanged between peers periodically to
check whether the peer is reachable.
Notification Message
Error Code: one byte; the field indicates the error type.
Error Subcode: one byte; the field provides more details about the error.
Data: variable-length field; the field contains the data related to the
error, for example an invalid message header or an illegal AS number. The
following table lists the possible error codes and error subcodes.
Table 11-8 BGP Notification message error code and error subcode
ID  Description
1   BGP starts
2   BGP ends
3   BGP transmission connection opens
4   BGP transmission connection is terminated
5   BGP transmission connection fails to open
6   BGP transmission fatal error
7   Connect retry timer expires
8   Hold timer expires
9   Keepalive timer expires
10  Open message received
11  Keepalive message received
12  Update message received
13  Notification message received
Idle: the initial state. BGP stays in the Idle state until an operation
triggers a start event, usually the creation or restart of a BGP session.
Active: in this state, BGP attempts to create a TCP connection with the
neighbor. If the connection succeeds, BGP sends an Open message and moves
to the OpenSent state. If the connect retry timer expires, BGP restarts the
timer and falls back to the Connect state to listen for connections from
the peer.
OpenSent: in this state, the Open message has been sent and BGP waits for
the Open message from the peer. The received Open message is checked; if
any error is found, the system sends a Notification message and returns to
the Idle state. If no error is found, BGP sends a Keepalive message to the
peer and resets the keepalive timer.
Established: the final phase of neighbor negotiation. In this state, the
connection between the BGP peers is established, and Update, Notification,
and Keepalive messages can be exchanged between them.
Well-Known Mandatory
Well-Known Discretionary
Optional Transitive
Optional Non-Transitive
Optional Transitive: BGP is not required to support the attribute, but it
should accept paths carrying the attribute, and those paths should be
advertised with the attribute intact.
When multiple routes with the same prefix and prefix length to the same
destination exist, BGP selects the best route according to the following rules:
9. Preferentially select the route whose next-hop has the minimum IGP
metric;
11. Preferentially select the route with the minimum BGP ROUTER-ID;
13. Preferentially select the route from the lowest neighbor address;
14. If BGP load balancing is enabled, rules 10-13 are ignored, and all
routes with the same AS_PATH length and MED value are installed in the
routing table.
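The later tie-break rules (9, 11, and 13 above) can be expressed as a simple ordered comparison. This is a sketch that assumes the earlier rules have already left a tie; the dictionary keys are illustrative, and router IDs and addresses are compared as strings here for brevity:

```python
def best_route(routes):
    """Tie-break: lowest IGP metric to the next hop, then lowest BGP
    router ID, then lowest neighbor address."""
    return min(routes, key=lambda r: (r["igp_metric"],
                                      r["router_id"],
                                      r["neighbor_addr"]))

routes = [
    {"igp_metric": 10, "router_id": "2.2.2.2", "neighbor_addr": "10.0.0.2"},
    {"igp_metric": 10, "router_id": "1.1.1.1", "neighbor_addr": "10.0.0.1"},
]
print(best_route(routes)["router_id"])  # → 1.1.1.1 (lower router ID wins)
```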
Figure 11-39 Under otherwise equal conditions, the route with the higher
LOCAL_PREF value is preferred
User AS100 obtains routes from ISP1 and ISP2, and ISP1 is the preferred
ISP. When the device connected to ISP1 announces routes to switch-F, it
sets a higher LOCAL_PREF value. For the same destination, the routes
learned from ISP1 are preferred because their LOCAL_PREF is higher.
Figure 11-40 Under otherwise equal conditions, the route with the lower
MED value is preferred
A dual-homed topology is used between a user and an ISP. LINK2 is
preferred and LINK1 serves as the backup. When the user advertises routes
to the ISP, the update packets carried over LINK2 have the lower MED
value. If the routes advertised over the EBGP sessions on LINK2 and LINK1
differ in no other attribute, the route with the lower MED is preferred.
As a result, traffic from the ISP enters the user network via LINK2.
Route Filtering
Route filtering means that a BGP speaker can control which routes it sends
to and accepts from any BGP peer. Route filtering is implemented by
defining a route policy. The procedure is as follows:
1. Identify routes
3. Operate on attributes
Route filtering can be performed through an access list, prefix list, or
AS path access list. A route map can also be used to implement both
filtering and attribute operations.
The route reflector is recommended only in a large-scale, fully meshed
internal BGP network. The route reflector increases the overhead on the
route reflector server, and if it is configured incorrectly, routing loops
or instability may result. Therefore, the route reflector is not
recommended in every topology.
Alliance
The alliance is another method of handling the rapid growth of the IBGP
full mesh within an AS. Similar to the route reflector, the alliance is
recommended only in a large-scale, fully meshed internal BGP network.
The concept of the alliance arises because one AS can be divided into
multiple sub-ASs. Within each sub-AS, all IBGP rules apply; for example,
all BGP routing devices in the sub-AS must form a full mesh. Each sub-AS
has a different AS number, so external BGP must run between them. Although
EBGP is used
The alliance has a drawback: when changing the design from non-alliance to
alliance, the routing devices must be reconfigured and the logical
topology changes. In addition, if no BGP policy is set manually, the
alliance cannot select the best route.
Route Damping
Route damping (route attenuation) is a technology for controlling route
instability. It significantly reduces the instability caused by route
oscillation.
Route damping classifies routes as well-behaved or ill-behaved. A
well-behaved route demonstrates long-term stability, whereas an
ill-behaved route is unstable over short periods. An ill-behaved route is
penalized in proportion to its expected future instability, and unstable
routes are suppressed until they become stable.
The recent history of a route is the basis for evaluating its future
stability. To learn the route's history, you first need to know how many
times the route has flapped within a given period. With route damping,
each time the route flaps it is penalized. When the accumulated penalty
reaches a predefined limit, the route is suppressed. While suppressed, the
route can continue to accumulate penalties. The more frequently the route
flaps, the earlier it will be suppressed.
Similar rules are used to unsuppress and re-advertise the route. An
algorithm decays the penalty exponentially over time, based on parameters
defined by the user.
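The penalty/suppress/reuse mechanism above can be illustrated with a half-life decay sketch. All figures (penalty per flap, suppress and reuse limits, half-life) are assumed for illustration, not the device defaults:

```python
import math

PENALTY_PER_FLAP = 1000   # assumed penalty added on each flap
SUPPRESS_LIMIT = 2000     # suppress the route above this penalty
REUSE_LIMIT = 800         # re-advertise once the penalty decays below this
HALF_LIFE = 900.0         # assumed seconds for the penalty to halve

def decayed_penalty(penalty, elapsed):
    """Exponential decay of the accumulated penalty over `elapsed` seconds."""
    return penalty * math.pow(0.5, elapsed / HALF_LIFE)

penalty = 3 * PENALTY_PER_FLAP          # three flaps in quick succession
suppressed = penalty > SUPPRESS_LIMIT   # 3000 > 2000: route is suppressed
penalty = decayed_penalty(penalty, 2 * HALF_LIFE)   # wait two half-lives
print(suppressed, round(penalty))       # → True 750 (now below REUSE_LIMIT)
```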
After a routing device fails, its neighbors at the BGP routing layer
detect the neighborship going down and coming back up, which is called BGP
neighbor oscillation. The neighborship oscillation eventually causes route
oscillation. As a result, a route black hole may occur for a while after
the routing device restarts, or the neighbor's data traffic may bypass the
restarted routing device. Consequently, the reliability of the network is
decreased.
BGP graceful restart prevents route disturbance and accelerates route
convergence when a routing device fails, which ensures network
reliability.
1. The graceful restart capability is added to the BGP Open message. The
fields are as follows:
2. The EOR (End-of-RIB) flag is added to the BGP update packets to
indicate that the update is complete.
2. When a fault occurs, the forwarding plane of switch A retains the
routes and continues forwarding traffic according to them;
5. Route calculation is delayed until the EOR flag is received from the
neighbor or the defer timer expires.
6. The routes are calculated, the core routing table is updated, and the
routes are advertised.
2. After the restarting end fails, if a TCP error is detected, go to step
3; if no TCP error is detected, go to step 4.
4. Re-establish the neighborship and delete the restart timer if it
exists, then start the stale-path timer.
5. If the restart timer expires before the session is re-established, or
the fwd-flag in the corresponding address family of the Open message is
not 1, or the corresponding address family information is not included,
go to step 8.
6. Send routes to the restarting routing device, then send the EOR flag.
7. If the stale-path timer expires before the EOR is received, go to step 8.
8. Delete the retained stale routes and return to the normal BGP flow.
ACL Technology
This chapter describes the ACL technology and its applications. The
configurations related to the ACL function on the switch include the
action group configuration, traffic meter configuration, and time range
configuration.
Main contents:
ACL classification
Typical application
number. After identifying the traffic, ACL can execute the specified
operations on it, such as preventing it from passing through an interface.
An ACL comprises a series of rules, each used to match one specified type
of traffic. The sequence number of a rule decides its position in the ACL.
The ACL checks packets against the rules in ascending sequence order. The
first rule in the ACL that matches the packet decides how the packet is
processed: permit or deny. If no rule matches the packet, the packet is
denied; that is, packets that are not explicitly permitted are denied.
This shows that rule order is important.
The following figure shows the access authority of the ACL segments. The
action of the shaded part is deny and the action of the white part is
permit. After the last rule (that is, after rule 30 above), there is one
hidden rule, deny any, whose sequence number is larger than those of all
other rules in the ACL. The hidden rule is invisible and denies all
packets that do not match any previous rule. To keep the hidden rule from
taking effect, you need to configure a permit any rule manually so that
packets matching no other rule are permitted.
ACL Classification
According to its usage, an ACL can be divided into six types:
IP standard ACL
IP extended ACL
IPV6 ACL
IP Standard ACL
The IP standard ACL builds its rules only on the source address of the
packet to analyze and process the packet. For example, the following
standard IP ACL denies the packets sent from the host 171.69.198.102 but
permits the packets sent from all other hosts.
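The first-match evaluation described earlier, including the hidden trailing deny any rule, can be sketched as follows. The rule tuples and function name are illustrative; the sample rules mirror the standard ACL example above:

```python
def acl_lookup(rules, packet_src):
    """Evaluate rules in ascending sequence order; the first match wins.
    Packets matching no rule hit the hidden trailing 'deny any'."""
    for seq, action, match in sorted(rules):
        if match == "any" or match == packet_src:
            return action
    return "deny"   # hidden rule: deny any

rules = [
    (10, "deny",   "171.69.198.102"),
    (20, "permit", "any"),
]
print(acl_lookup(rules, "171.69.198.102"))  # → deny
print(acl_lookup(rules, "10.1.1.1"))        # → permit
print(acl_lookup([], "10.1.1.1"))           # → deny (hidden rule)
```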
IP Extended ACL
The IP extended ACL filters packets according to the IP upper-layer
protocol number, source IP address, destination IP address, source
TCP/UDP port number, destination TCP/UDP port number, TCP flags, ICMP
message type and code, and TOS priority. For example, the following IP
extended ACL denies the telnet packets sent from 171.69.198.0/24 to
171.69.198.0/24 but permits other TCP packets.
MAC Standard ACL
The MAC standard ACL builds its rules on the source MAC address of the
Ethernet packet to analyze and process the packet.
Hybrid ACL
The Hybrid ACL can filter packets according to IP protocol number, source
IP address, source MAC address, DSCP, VLAN and so on.
IPV6 ACL
The IPV6 extended ACL filters the packets according to the IPV6 upper-
layer protocol number, source IP address, destination IP address, source
TCP/UDP port number, destination TCP/UDP port number, and TOS priority.
For example, the following IPV6 ACL permits the IPV6 packets sent from
the host 1:2:3:4::5.
switch(config-v6-list)#
Typical Application
One basic function of ACL is used to limit the access for the network
resources, that is, one group of limited IP addresses access one group of
limited services. The most common used method of using ACL to control
the access authority is to create ACL to permit only the legal traffic to pass,
but prevent all illegal and un-authorized traffic. The following adopts one
example to describes the ACL function.
Application requirement:
In the intranet of a company, port 0/0 of the switch is connected to the
news server and the finance server; port 0/1 of the switch is connected
to the marketing department; port 0/2 of the switch is connected to the
accounting department.
Network topology:
1. Create the extended IP ACL 1001; permit all packets to reach the news
server via port 0/0; permit only the packets sent from the accounting
department to reach the finance server via port 0/0.
2. Apply the ACL 1001 at the input direction of port 0/1 and port 0/2.
Related Terms
SRTCM (Single Rate Three Color Marker): defined in RFC 2697. It uses
three parameters (CIR, CBS, and EBS) to implement single-rate control and
packet coloring. It includes a color-blind mode and a color-aware mode.
TRTCM (Two Rate Three Color Marker): defined in RFC 2698. It uses CIR,
CBS, PIR, and PBS to implement two-rate control and packet coloring. It
includes a color-blind mode and a color-aware mode.
The meter supports two modes, SRTCM and TRTCM. The function of the meter
is to re-mark or drop packets according to the traffic rate. The meter
applies a processing action to each colored packet: when configured to
drop colored packets, it performs traffic policing; when configured to
re-mark colored packets, it classifies packets according to the traffic
rate so that different QoS policies can be applied later in the data path.
After the meter is configured to color the packets, the counter in the
action group can count them.
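The SRTCM coloring described above can be sketched with two token buckets, in the spirit of RFC 2697 (color-blind mode). The class name, units, and sample parameters are assumptions for illustration:

```python
class SrTCM:
    """Color-blind single rate three color marker sketch.
    CIR in bytes/s; CBS and EBS in bytes (assumed units)."""
    def __init__(self, cir, cbs, ebs):
        self.cir, self.cbs, self.ebs = cir, cbs, ebs
        self.tc, self.te = float(cbs), float(ebs)  # buckets start full
        self.last = 0.0

    def color(self, size, now):
        # refill the committed bucket at CIR; overflow spills to the excess bucket
        new_tc = self.tc + (now - self.last) * self.cir
        self.last = now
        if new_tc > self.cbs:
            self.te = min(self.ebs, self.te + new_tc - self.cbs)
            new_tc = self.cbs
        self.tc = new_tc
        if size <= self.tc:            # within committed burst: green
            self.tc -= size
            return "green"
        if size <= self.te:            # within excess burst: yellow
            self.te -= size
            return "yellow"
        return "red"                   # out of profile: red

m = SrTCM(cir=1000, cbs=1500, ebs=1500)
print([m.color(1000, 0.0) for _ in range(3)])  # → ['green', 'yellow', 'red']
```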
Related Terms
Time domain: a set of time periods. One time domain can contain zero or
more time periods. The time range of the time domain is the union of its
time periods.
QoS Technology
This chapter describes the port-based QoS technology and its applications.
Main contents:
Priority mapping
Dropping mode
Rate restriction
Flow shaping
Priority Mapping
This section describes the theory of the priority mapping.
Main contents:
Related terms
Typical application
Related Terms
802.1p priority: the 802.1p priority is located in the L2 packet header.
It is used when there is no need to analyze the L3 packet header but QoS
must be guaranteed in an L2 environment. As shown in Figure 13-1, the
4-byte 802.1Q header contains a 2-byte TPID (Tag Protocol Identifier,
valued 0x8100) and a 2-byte TCI (Tag Control Information). The following
figure shows the detailed contents of the 802.1Q header.
802.1Q header
As shown in Figure 13-2, the Priority field in the TCI carries the 802.1p
priority. It comprises three bits, so its value range is 0-7.
DSCP priority: RFC 2474 redefines the ToS field of the IP packet header
as the DS field. Its first six bits carry the Differentiated Services
Code Point (DSCP), with a value range of 0-63; the last two bits are
reserved, as shown in Figure 13-3.
DS field
Local priority: a priority with local meaning that the switch assigns to
the packet. By default, it maps to a CoS queue, acting as the intermediary
between the DSCP or 802.1p priority and the CoS queue.
Re-mark the DSCP value of the packet according to the DSCP value of the
packet;
Map the egress 802.1p priority of the packet according to the local
priority of the packet;
Map the egress DSCP priority of the packet according to the local
priority of the packet;
After a packet enters the switch, it is mapped to a local priority
according to its 802.1p priority or DSCP, and then to a CoS queue. If
both the DSCP-to-local-priority mapping and the 802.1p-to-local-priority
mapping are configured, the former has higher priority (that is, the
DSCP-to-local-priority mapping takes effect).
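The precedence rule above (DSCP mapping wins over 802.1p) can be sketched as a small lookup. The mapping tables are assumed example values, not the device defaults:

```python
# assumed example tables; the real default mappings are device-specific
DOT1P_TO_LOCAL = {p: p for p in range(8)}          # 802.1p 0-7 → local 0-7
DSCP_TO_LOCAL  = {0: 0, 10: 1, 18: 2, 26: 3, 34: 4, 46: 5}

def local_priority(dot1p=None, dscp=None):
    """DSCP-to-local-priority mapping takes precedence over 802.1p."""
    if dscp is not None and dscp in DSCP_TO_LOCAL:
        return DSCP_TO_LOCAL[dscp]
    if dot1p is not None:
        return DOT1P_TO_LOCAL[dot1p]
    return 0   # assumed default local priority

print(local_priority(dot1p=3, dscp=46))  # → 5 (DSCP mapping wins)
print(local_priority(dot1p=3))           # → 3 (fall back to 802.1p)
```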
Main contents:
Related terms
Typical application
Related Terms
SP (Strict Priority): one of the queue scheduling algorithms. SP sends the
packets in the queues strictly in priority order, from high to low; only
when a higher-priority queue is empty are the packets in the
lower-priority queues sent. Queue 7 has the highest priority and queue 0
the lowest.
Typical Application
Scheduling mode
Illustration
The devices in the LAN are connected to the external network via port 0/1
of the switch. The packets sent by the devices in the LAN are mapped to
the output queues of port 0/1 according to rules such as priority mapping.
Suppose the packets to be sent in queues 0, 6, and 7 have strict real-time
requirements and the other queues have the same priority. You can
configure port 0/1 to schedule by WRR with the weight of queues 0, 6, and
7 set to 0. The three queues are then scheduled by strict priority and
forward packets first.
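The mixed scheduling above (weight-0 queues served by strict priority, the rest by WRR) can be sketched for one scheduling round. The function name, the one-round WRR simplification, and the sample queue contents are the author's assumptions:

```python
def dequeue_order(queues, weights):
    """One scheduling round: serve weight-0 queues strictly by priority
    (queue 7 highest), then serve each weighted queue up to its weight."""
    out = []
    for q in sorted(weights, reverse=True):   # queue 7 first
        if weights[q] == 0:                   # weight 0 → strict priority
            out += queues.get(q, [])
    for q in sorted(weights, reverse=True):   # then one WRR pass
        if weights[q] > 0:
            out += queues.get(q, [])[:weights[q]]
    return out

queues  = {7: ["v1"], 6: ["v2"], 0: ["v3"], 1: ["d1", "d2"], 2: ["d3"]}
weights = {7: 0, 6: 0, 0: 0, 1: 1, 2: 1}
print(dequeue_order(queues, weights))  # → ['v1', 'v2', 'v3', 'd3', 'd1']
```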
Drop Mode
This section describes the drop mode of the port.
Main contents:
Related terms
Typical application
Related Terms
SRED: Simple random early detection
Typical Application
Drop mode
Illustration
The devices in the LAN are connected to the external network via port 0/1
of the switch. The packets sent by the devices in the LAN are mapped to
the output queues of port 0/1 according to rules such as priority mapping.
By default, when the network is congested, the excess packets are simply
dropped, which is unfair to the later packets. Therefore, configure the
SWRED drop mode on the port, that is, drop packets randomly at a rate that
increases with the load, before the network actually becomes congested.
Speed Restriction
The port-based rate limit on the input direction is provided with a
granularity of 64 kbit/s. Traffic exceeding the rate is dropped. The
configured parameters are the bandwidth threshold (kbit; 64 kbit is the
minimum granularity) and the burst size (bytes; the granularity of the
burst size is 4 KB). Use the port rate limit to make traffic enter the
network at an even rate, preventing congestion at the source.
Flow Shaping
The flow shaping has two kinds:
Port-based flow shaping on the output direction makes packets leave at an
even rate. The configured parameters are the bandwidth threshold (kbit;
64 kbit is the minimum granularity) and the burst size (bytes; the
granularity of the burst size is 4 KB).
Queue-based output flow shaping likewise makes packets leave at an even
rate. The configured parameters are the queue number, committed
information rate, committed burst size, peak information rate, and peak
burst size. The granularity of both the committed information rate and
the peak information rate is 64 kbit/s; the granularity of both the
committed burst size and the peak burst size is 4 KB.
The switch classifies the queues into three types according to the
relation between the queue's traffic and cir/pir: it first schedules the
queues whose traffic is below cir, then the queues whose traffic is
between cir and pir, and finally the queues whose traffic exceeds pir.
After a packet enters the switch, it enters the corresponding virtual
queue according to its VLAN number. Queue scheduling and shaping can be
applied to the virtual queue. After VLAN queue shaping, the traffic enters
queue 9 of the port.
AAA Technology
This chapter describes the AAA security service theory, the RADIUS and
TACACS protocols, the ID authentication mechanism of the MP series router,
and the commonly used debug commands and their output.
Main contents:
AAA terms
AAA Terms
AAA: short for Authentication, Authorization and Accounting. It provides
one consistent framework for configuring these three kinds of security
functions. In effect, AAA configuration manages network security. Here,
network security mainly refers to access control, including:
NAS: short for Network Access Server. The AAA security services are
enabled on the router acting as NAS. When a user wants to set up a
connection with the NAS via a network (such as the telephone network) in
order to gain access to other networks (or to use certain network
resources), the NAS is used to identify the user (or the connection).
AAA uses protocols such as RADIUS and TACACS to carry out its security
functions, setting up communication between the NAS and the RADIUS or
TACACS security server. Besides, the local username, line password, and
enable password can be used as the ID authentication methods for access
control.
As shown in the above figure, suppose one method list is defined on the
NAS. In the list, R1 is consulted first for ID authentication, then R2,
T1, T2, and finally the local username database on the NAS. If a remote
user tries to dial in to the network, the NAS first queries R1 for ID
authentication. If the user passes the ID authentication of R1, R1 sends
a PASS response to the network access server and the user gets the
authority to access the network. If R1 returns a FAIL response, the user
is denied access to the network and the session ends. If R1 does not
respond, the NAS treats it as an ERROR and queries R2 for ID
authentication. This process continues through the specified methods
until the user passes the ID authentication, is denied, or the session
ends.
Note
The NAS tries the next method only when the previous method gives no
response. If the ID authentication fails at any point, that is, the
security server or local username database responds by denying the user
access, the ID authentication ends and no other ID authentication method
is tried.
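The method-list fallthrough just described (move on only on ERROR; PASS or FAIL ends the attempt) can be sketched as follows. The function and result names are illustrative:

```python
def authenticate(method_list, user):
    """Try methods in order; move to the next method only when the
    current one gives no response (ERROR). PASS or FAIL is final."""
    for method in method_list:
        result = method(user)          # "PASS", "FAIL" or "ERROR"
        if result == "PASS":
            return "access granted"
        if result == "FAIL":
            return "access denied"     # do not try further methods
    return "access denied"             # every method was unreachable

r1 = lambda u: "ERROR"                 # R1 does not respond
r2 = lambda u: "PASS"                  # R2 authenticates the user
print(authenticate([r1, r2], "alice"))  # → access granted
```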
Introduction to RADIUS
RADIUS is a UDP-based client/server protocol. The NAS serves as the
RADIUS client, while the RADIUS server is a background process that runs
on a UNIX or Windows NT host.
A RADIUS packet is carried in the data field of a UDP packet. Its length
is variable, and its attribute fields vary with the RADIUS packet type.
The following is the structure of the RADIUS packet.
Code field
Identifier field
Length field
Authenticator field
1-Request Authenticator
2-Response Authenticator
ResponseAuth = MD5(Code+ID+Length+RequestAuth+Attributes+Secret)
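The Response Authenticator formula above (as in RFC 2865) can be computed directly with the standard hashlib module; the sample code, identifier, and secret are illustrative:

```python
import hashlib

def response_authenticator(code, identifier, length, request_auth,
                           attributes, secret):
    """ResponseAuth = MD5(Code + ID + Length + RequestAuth
                          + Attributes + Secret), per the formula above."""
    packet = (bytes([code, identifier]) + length.to_bytes(2, "big")
              + request_auth + attributes + secret)
    return hashlib.md5(packet).digest()

# sample values: Access-Accept (code 2), ID 1, 20-byte packet, no attributes
auth = response_authenticator(2, 1, 20, b"\x00" * 16, b"", b"testing123")
print(len(auth))  # → 16 (MD5 digest fills the 16-byte Authenticator field)
```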
Attribute field
Byte offset: 0 = Type, 1 = Length, 2 onward = Value
The Type field indicates the Attribute type.
The Length field indicates the length of the whole Attribute, including Type,
Length and Value.
Introduction to TACACS
TACACS provides authentication, authorization, and accounting services.
TACACS transmits data in TCP packets and uses port 49 to receive them.
The format of the TACACS packet header is as follows. The packet header
is always transmitted in plaintext.
type field
1-authentication
2-authorization
3-accounting
seq_no field
flags field
It is the flags field. The lowest bit indicates whether the packet body is encrypted.
session_id field
It is the session ID, a random 4-byte number that does not change during
one session.
length field
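The header fields just listed (type, seq_no, flags, session_id, length) can be packed as a 12-byte structure in the TACACS+ style. The leading version byte and its sample value are assumptions not detailed in the text above:

```python
import struct

TAC_PLUS_AUTHEN, TAC_PLUS_AUTHOR, TAC_PLUS_ACCT = 1, 2, 3
TAC_PLUS_UNENCRYPTED_FLAG = 0x01   # lowest flag bit: body is cleartext

def pack_tacacs_header(msg_type, seq_no, flags, session_id, body_len,
                       version=0xC0):
    """12-byte header, sent as plaintext even when the body is encrypted.
    The version byte (0xC0 here) is an assumed example value."""
    return struct.pack("!BBBBII", version, msg_type, seq_no, flags,
                       session_id, body_len)

hdr = pack_tacacs_header(TAC_PLUS_AUTHEN, 1, 0, 0x12345678, 40)
print(len(hdr))  # → 12
```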
Introduction to ID Authentication Mechanism
Login Authentication
1. If neither AAA nor Line authentication is configured, login via the
console port or telnet passes the authentication directly; for SSH, you
should use local login.
When the user logs in via an interface or line, the system authenticates
the ID according to the method list referenced by that interface or line.
If the interface or line does not reference any method list, or the
referenced method list is not defined, the system uses the default method
list; if the default method list is not configured, the default method is
used.
For the login via console port, the default method is none; for telnet and
ssh login, the default method is local.
If the user logs in with a valid username, there is no need to enter a
username when authenticating in privileged mode; only the password is
required.
If the login user has an enable password, authentication uses that
password;
If there is no enable password, a user who logs in via the console port
passes the authentication directly, but a telnet user does not.
2. Configure AAA
After the user logs in to the router and requests to enter privileged
mode, the system authenticates the ID according to the default method
list; if the method list does not exist, the default method is used:
For login via the console port, the default method list is enable none;
For telnet and ssh login, the default method is enable.
EIPS Technology
The EIPS technology supports two modes. One is the sub-ring mode: when
processing intersecting rings, the two intersecting rings are decomposed
into one master ring and one sub ring, with one public link between them.
The other mode is the hierarchical mode: when processing two intersecting
rings, one ring is chosen as the master ring; after the public link
shared with the master ring is removed, the other ring becomes a
lower-level ring attached to the master ring.
EIPS domain: the EIPS domain is identified by an integer ID. A group of
switches configured with the same domain ID and interconnected form one
EIPS domain. An EIPS domain comprises EIPS rings, EIPS control VLANs, a
master node, transmission nodes, edge nodes, and assistant edge nodes.
EIPS ring: the EIPS ring is identified by an integer ID. It physically
corresponds to one ring Ethernet topology. Each EIPS ring is one local
unit of the EIPS domain, and the EIPS protocol takes effect on the EIPS
ring. The EIPS rings in a domain are divided into the master ring and sub
rings. In one EIPS domain there is only one master ring, but there can be
one or more sub rings. A sub ring intersects with the upper-level ring
via the edge node and the assistant edge node.
EIPS sub ring: It is the EIPS ring whose level is larger than 0.
EIPS control VLAN: defined in contrast to the data VLANs. In an EIPS
domain, the control VLAN can only be used to transmit EIPS protocol
packets. Each EIPS ring has one control VLAN: the master ring protocol
packets are transmitted in the master control VLAN, and the sub ring
protocol packets in the sub control VLAN. It is not permitted to
configure an IP address on the master or sub control VLAN interfaces. On
a switch, only the ports connected to the Ethernet ring belong to the
control VLAN, and only those ports can be added to it. A port on the
master ring belongs to both the master control VLAN and the sub control
VLAN; a port on a sub ring belongs only to the sub control VLAN. The
whole master ring is regarded as one logical node of the sub ring: the
EIPS protocol packets of the sub ring are transmitted transparently
across the master ring like user packets, while the EIPS protocol packets
of the master ring do not enter the sub ring and are transmitted only
within the master ring.
EIPS node: each switch on the EIPS ring is one node on the EIPS ring.
The nodes on one ring have the same EIPS domain ID and the EIPS ring
ID. Each EIPS node has two EIPS ports connected to the EIPS ring, which
are specified as the master port and standby port by the user during the
configuration.
Master node: the master node initiates the polling of the ring network
status (it periodically sends HEALTH packets from both the master and
standby ports; if either port receives the packet sent from the other,
the ring is complete; if no HEALTH packet is received for a long time,
the ring is regarded as failed). The master node is also the node that
decides what operation to execute after the network topology changes.
Complete State:
When all links on the ring network are in the Up state, the master node
can receive the HEALTH packets it sent from its standby port, which
indicates that the master node is in the Complete state. The status of
the master node reflects the status of the EIPS ring, so the EIPS ring is
also in the Complete state. In this state, the master node blocks the
standby port to prevent packets from forming a broadcast loop on the ring
topology.
Failed State:
When a link on the ring network goes into the Down state, the master node
stops receiving its own HEALTH packets and enters the Failed state. The
master node then enables the standby port to ensure that the
communication between the nodes on the ring network is not interrupted.
PRE-UP State:
When the master node is in the Failed state and receives a HEALTH packet
again, it first moves to the Pre-up state. If it continues to receive
HEALTH packets for a period, it moves to the Complete state. This
prevents network flapping.
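The three master node states just described can be sketched as a small state machine. This is a simplification: timer handling is omitted, and moving from Pre-up to Complete on the next HEALTH packet stands in for "continuing to receive HEALTH packets within a period":

```python
class MasterNode:
    """Sketch of the master node states: Complete, Failed, Pre-up."""
    def __init__(self):
        self.state = "Complete"
        self.standby_blocked = True      # ring complete: block standby port

    def health_timeout(self):
        """No HEALTH packet for a long time: the ring has failed."""
        self.state = "Failed"
        self.standby_blocked = False     # open standby port to restore paths

    def health_received(self):
        """HEALTH packet arrives back at the master node."""
        if self.state == "Failed":
            self.state = "Pre-up"        # wait before declaring the ring whole
        elif self.state == "Pre-up":
            self.state = "Complete"
            self.standby_blocked = True  # ring complete again: block standby
```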
Transmission node: all nodes on the EIPS ring other than the master node
are transmission nodes. A transmission node is responsible for monitoring
the status of its directly connected links and reporting status changes
to the master node via EIPS protocol packets; the master node then
decides how to respond. The two transmission nodes at which a sub ring
intersects the master ring act as the edge node and the assistant edge
node (the master ring contains only transmission nodes; edge nodes and
assistant edge nodes exist only on sub rings). If a transmission node on
the master ring shares a port with the edge node of a sub ring, it must
send the sub-ring protocol channel status detection packet on that port.
If a transmission node on the master ring shares a port with the
assistant edge node of a sub ring, it must forward the received sub-ring
protocol channel status detection packets to the corresponding assistant
edge node.
When both the master port and the standby port of a transmission node are
up, the transmission node is in the Link-Up state.
When the master port or the standby port of a transmission node is down,
the transmission node is in the Link-Down state. When a transmission node
in the Link-Up state finds that its master or standby port has gone down,
it moves from the Link-Up state to the Link-Down state and informs the
master node by sending a Link-Down packet.
A transmission node cannot return directly from the Link-Down state to
the Link-Up state. When the master port and standby port of a
transmission node in the Link-Down state recover to the Up state, the
node moves to the Preforwarding state and blocks the port that recovered
last. At the moment the ports recover, the master node does not yet know
about it, and its standby port is still enabled; if the transmission node
returned to the Link-Up state immediately, packets could form a broadcast
loop on the ring network. Therefore, the transmission node first moves
from the Link-Down state to the Preforwarding state.
Edge node and assistant edge node: the edge node and assistant edge node
are used to detect the status of the sub-ring protocol packet channel in
the master ring. The edge node initiates the detection mechanism, the
assistant edge node judges the channel status and reports it to the edge
node, and finally the edge node makes a decision according to the channel
status.
The edge node and the assistant edge node are both special transmission
nodes, so they have the same three states as the transmission node, but
the meanings differ slightly, as follows:
When the edge port is in the UP state, it indicates that the edge node
(assistant edge node) is in the Link-Up state.
When the edge port is in the Down state, it indicates that the edge node
(assistant edge node) is in the Link-Down state.
The state transitions of the edge node (assistant edge node) are basically
the same as those of the transmission node. The difference is that when a
change in port link status triggers a state transition of the edge node
(assistant edge node), the transition depends only on the status of the
edge port (refer to the previous introduction of the edge node status).
The edge node and the assistant edge node are the two participants in the
mechanism of detecting the sub-ring protocol packet channel status in the
master ring: the edge node initiates the detection, the assistant edge
node judges the channel status and reports it to the edge node, and the
edge node then makes the decision according to the channel status. The
mechanism is described in detail later.
EIPS port: An EIPS port is an abstract concept corresponding to one of the
links that form the EIPS ring. The link can be a single physical link or
an aggregation link formed by multiple physical links. On each EIPS node
there are always two ports connected to the EIPS ring. Because EIPS rings
may intersect, one EIPS port may belong to multiple EIPS nodes.
EIPS master port and EIPS standby port: The ports on the master
node and on a common transmission node (one that is neither an edge node
nor an assistant edge node) are divided into a master port and a standby
port. On the master node, when the loop is complete, the user data VLAN
of the standby port must be blocked; on a transmission node, the master
port and standby port have no special meaning.
EIPS public port and EIPS edge port: The ports on the edge node and
the assistant edge node are divided into a public port and an edge port.
The public port is connected to the public link of two intersecting rings
and belongs to multiple EIPS rings. The edge port belongs to only one sub
ring. When the public port fails, the failure does not need to be
reported to the master node of the sub ring, only to the master node of
the master ring.
0                15 16               31 32               47
Destination MAC address (6 bytes)
Source MAC address (6 bytes)
Type (Ether Type) (TPID) | PRI + CFI + VLAN ID | Frame Length
Source MAC address: 48 bits, the MAC address of the sending node;
Frame Length: 16 bits, the length of the Ethernet frame, fixed as 0x48;
The master ring protocol packets are broadcast in the main control
VLAN; the sub-ring protocol packets are broadcast in the sub control
VLAN;
The EIPS ports on the master ring nodes are added to both the main
control VLAN and the sub control VLAN; the EIPS ports on the sub ring
are added only to the sub control VLAN;
The protocol packets of the sub ring are processed as data packets by
the master ring, being blocked or forwarded together with the data
packets;
There are two reasons why the master node sends HEALTH packets from
both ports at the same time:
When the standby master node function is enabled, if one link in the
loop were down and the master node sent HEALTH packets from only one
port, a standby master node located on the side of the non-sending port
would never receive them, so the standby master node function could not
take effect.
Figure 15-1 Operation of the single ring in the non-fault state
Figure 15-2 The master node cannot receive the HEALTH packets
Figure 15-3 A transmission node detects that the physical line is down
When the edge node or assistant edge node receives a COMP-FLUSH-FDB
packet of the sub ring, it turns to the LINK-UP state unconditionally.
When the assistant edge node receives an EDGE-HEALTH packet, its state
machine turns to the LINK-UP state.
To prevent the edge node from being confused when receiving the
MAJOR-FAULT and COMP-FLUSH-FDB packets and entering a wrong state, the
assistant edge node sends a MAJOR-RESUME packet to the edge node when it
turns to the LINK-UP state. After receiving the packet, the edge node
also turns to the LINK-UP state.
Figure 15-4 The sub ring detects the master ring link status
Figure 15-5 The sub ring detects a master ring link fault
As shown in Figure 15-6, there is only one ring in the network topology.
Here, you only need to define one EIPS domain and one EIPS ring. The
feature of this networking is a fast response and short convergence time
when the topology changes, which suits applications where there is only
one ring in the network.
As shown in Figure 15-7, there are two or more rings in the network
topology, with two public nodes between the rings. Here, you only need
to define one EIPS domain, select one ring as the master ring, and use
the others as sub rings. The typical application of this networking is
that the master node of the sub ring can go upstream via two edge nodes,
providing upstream link backup.
Hierarchical EIPS
Main contents:
Master Node (master, M for short): It is the main decision maker and
control node on the ring of a domain. There is only one master node on a
single ring. The two ring ports of the master node are the master port
and the assistant port. When the link of the domain controlled by the
master node is complete, the assistant port blocks all data to avoid a
loop. When a link on the ring fails and the port of the faulty link is
not the assistant port of the master node, the forwarding function of
the assistant port is enabled.
The level of the major-level ring is the highest (level 0). Here, the
major ring is a complete ring. The low-level links are the incomplete
ring link sets left after removing the public links shared with the
upper layer.
In Figure 15-8, the nodes T1, T2, T3, and M form the major-level ring
(level 0, segment 0); the node M is the master node; the nodes T1, T2,
and T3 are the transmission nodes. When the major-level ring is not
faulty, EIPS blocks the services on the secondary port S.
In Figure 15-9, choose one of the intersecting rings as the major-level
ring; the other rings degenerate into low-level segment links. The nodes
T1, T2, T3, T4, and M form the major-level ring; the node M is the master
node; the nodes T1, T2, T3, and T4 are the transmission nodes. Levels and
segments are divided for the other links; (level 1, segment 1) includes
the nodes T1, T2, T3, and T4. Here, the node T2 is the edge control node;
the nodes T1 and T2 are the transmission nodes; the node T3 is the edge
assistant node. When the (level 1, segment 1) link is not faulty, the
node T2 blocks the edge port connected to (level 1, segment 1). The
major-level ring is a single ring, and a low-level segment link is a
single link. The larger the level number, the lower the level.
Node Roles
Master Node:
The major-level ring of a domain has one master node, that is, the
master node of the major-level ring. The master node actively initiates
the detection of the major-level ring status and decides what operation
to execute after the major-level ring topology changes.
The master node sends HEALTH packets periodically from its two ports; the
packets are forwarded by the transmission nodes on the ring. If the
master node can receive the HEALTH packets it sent, the major-level ring
link is complete; if neither port receives the HEALTH packets within the
specified time, the ring network link is regarded as failed.
Complete State
The major-level ring is in the stable state and there is no broken link
on the ring. The master node blocks the service forwarding function of
the protect VLAN on the assistant port, so as to prevent the network
storm caused by a loop. Meanwhile, the master node periodically sends
the HEALTH packet, which is forwarded by the transmission nodes when the
loop is normal and returns to a port of the master node.
Failed State
When a link of the major-level ring is disconnected, the master node
enters the Failed state after receiving the link-down event. If the port
corresponding to the faulty link is not the assistant port, the
assistant port enables the data forwarding function of the protect VLAN.
Because the topology of the major-level loop has changed, the master
node needs to send COMM-FLUSH-FDB control messages from the main port
and assistant port to inform all other nodes of the level segment to
clear their address entries for the node and the protected VLAN.
Init State
When the master node starts up, the link status of the current loop is
unknown, so the current status is set to Init until the actual status of
the loop is detected.
PRE-UP State
To avoid repeated flapping of the fault point and frequent switching of
the loop status, which would interrupt service data, the master node
waits for some time before entering the Complete state from the Failed
state. During this waiting time, the master node is in the PRE-UP state.
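The four master-node states and the PRE-UP hold-down described above can be sketched as follows. The class and method names, the event model, and the 3-second delay are assumptions for illustration; the manual does not specify an implementation.

```python
# Sketch of the master-node states (Init, Complete, Failed, PRE-UP):
# a returned HEALTH packet first moves the node into PRE-UP, and only
# after the hold-down elapses does it enter Complete, so a flapping
# link cannot toggle the ring status rapidly.
import time

class MasterNode:
    PRE_UP_DELAY = 3.0                      # assumed hold-down time, seconds

    def __init__(self):
        self.state = "Init"                 # loop status unknown at start-up
        self.pre_up_since = None

    def on_health_returned(self):
        """The HEALTH packet came back on the assistant port: the ring is whole."""
        if self.state in ("Init", "Failed"):
            # wait in PRE-UP first so a flapping fault point cannot
            # switch the loop status frequently
            self.state = "PRE-UP"
            self.pre_up_since = time.monotonic()
        elif (self.state == "PRE-UP"
              and time.monotonic() - self.pre_up_since >= self.PRE_UP_DELAY):
            # re-block the assistant port, flush FDB, enter Complete
            self.state = "Complete"

    def on_link_down(self):
        """A LINK-DOWN packet arrived or HEALTH timed out: unblock assistant port."""
        self.state = "Failed"
```

Driving the object with a health event, an elapsed hold-down, and a link failure walks it through PRE-UP, Complete, and Failed in turn.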
Transmission Node:
The transmission node is responsible for monitoring the status of the
links on the directly connected loop. When a link fails, it sends a
LINK-DOWN packet to inform the control node of the level segment, and
the control node decides how to handle it. When COMP-FLUSH-FDB or
COMM-FLUSH-FDB packets from the control node are received, the
transmission node updates the FDB entries related to the protected
service VLAN.
Complete State:
Failed State:
Init State:
When the transmission node starts up, the link status of the current
loop is unknown, so the current status is set to Init and an ASK packet
is sent to query the control node of the level segment.
Pre-forwarding:
This state appears at the moment the link recovers: the original Down
port comes back up. The EIPS control VLAN is enabled and can forward
EIPS protocol packets, but the service VLAN is still blocked. After the
loop enters the Complete state and the transmission node receives the
COMP-FLUSH-FDB packet from the control node, it enables the forwarding
function of the service VLAN and turns to the Complete state. If the
transmission node does not receive the COMP-FLUSH-FDB packet within the
specified time, it turns to the Complete state automatically.
The edge control node is the control node that has only one port on the
low-level segment link. There is no master node in the level segment
link. The edge control node periodically sends the HEALTH packet onto
the level segment link from the access port. When the link is complete,
the returned HEALTH packet can be received. The edge control node is
similar to the master node and has the following four states:
Complete State
The level segment link is in the stable state and there is no broken
link. The edge control node blocks the service forwarding function of
the protect VLAN on the access port, so as to prevent the network storm
caused by a loop. Meanwhile, the access port periodically sends HEALTH
packets, which are forwarded by the nodes of the low-level segment link
when the loop is normal and return to the access port of the edge
control node.
Failed State
When the access port of the edge control node does not receive the
returned HEALTH packets within the specified time, or a link-down event
is received from the level segment link, the node enters the Failed
state. If the port corresponding to the faulty link is not the access
port of the edge control node, the data forwarding function of the
protected service VLAN is enabled on the access port. Because the
topology of the level segment link has changed, the edge control node
needs to send a COMM-FLUSH-FDB control message to inform the other
nodes on the level segment link and the related upper-level nodes to
clear the FDB entries for the node and the protected VLAN.
Init State
When the edge control node starts up, the link status of the current
level segment link is unknown, so the current status is set to Init
until the actual status of the loop is detected.
PRE-UP State
To avoid repeated flapping of the fault point and frequent switching of
the loop status, which would interrupt service data, the edge control
node waits for some time before entering the Complete state from the
Failed state. During this waiting time, the edge control node is in the
PRE-UP state.
The edge assistant node is the non-control node that has only one port
on the low-level segment link. When it receives the HEALTH packet sent
by the control node of the level segment link, it returns the packet to
the control node from the receiving port, cooperating with the control
node to detect the level segment link status. If the edge assistant node
does not receive the HEALTH packet within the specified time, the link
between the edge assistant node and the control node is regarded as
failed. When the edge assistant node receives a LINK-DOWN packet of the
level segment link, the link between the edge assistant node and the
control node is also regarded as failed. The edge assistant node is
responsible for monitoring the status of the links on the directly
connected loop. When a link fails, it sends a LINK-DOWN packet to inform
the control node of the level segment. When the edge assistant node
finds that the link between itself and the control node of the level
segment link has failed, it serves as the temporary control node and
sends a COMM-FLUSH-FDB packet to inform the other nodes on the level and
the upper-level nodes to update the FDB entries related to the protected
service VLAN.
The master node of the main ring sends the HEALTH packets from its two
ports. If at least one port can receive the packet sent from the other
port, the main ring is complete, so the data forwarding function of the
protected service VLAN needs to be blocked on the assistant port.
Conversely, if the HEALTH packet is not received within the specified
time, or a LINK-DOWN packet of the main ring is received, the
major-level ring has failed. If the port corresponding to the faulty
link is not the assistant port, the protected service VLAN forwarding
function of the assistant port needs to be enabled, so as to ensure
normal communication among all nodes on the ring. Besides, the master
node of the main ring receives address update packets from the low-level
segment links but does not forward them.
The main port and the assistant port of a transmission node do not
differ in function. The port role depends only on the user
configuration.
Edge Port
The edge node has only one port connected to a level segment link, and
that port is the edge port. When an address refresh message
(COMP-FLUSH-FDB or COMM-FLUSH-FDB) is received on the edge port, and the
upper level has not yet been notified of the status change of the level
segment link that sent the control message, the node forwards the packet
to the upper level and updates the FDB entries of the port related to
the protected service VLAN.
Block: the port is blocked and data is prohibited from being forwarded via the port;
Forward: the port is enabled and data is permitted to be forwarded via the port;
For example, when the links on the main ring are normal, the master node
of the main ring blocks the assistant port so that data in the protected
service VLAN cannot pass the assistant port of the master node, avoiding
a loop. When a link on the main ring fails and the port corresponding to
the faulty link is not the assistant port of the master node, the master
node enables the assistant port and permits data in the protected
service VLAN to pass it, recovering the communication of service data.
Table 15-4
0                15 16               31 32               47
Destination MAC address (6 bytes)
Source MAC address (6 bytes)
Type (Ether Type) (TPID) | PRI + CFI + VLAN ID | Packet Length (Frame Length)
DSAP/SSAP | CONTROL | OUI = 0x00E02B
0x00BB | 0x99 | 0x0B | ERP_LENGTH
ERP_VER | ERP_TYPE | CTRL_VLAN_ID | LEVEL_ID | SEG_ID
0x0000 | SYSTEM_MAC_ADDR (high 4 bytes)
SYSTEM_MAC_ADDR (low 2 bytes) | HEALTH_TIMER | FAIL_TIMER
STATE | 0x00 | HEALTH_SEQ | 0x0000
RESERVED (0x000000000000) (six rows)
Frame Length: 16 bits, the length of the Ethernet frame, fixed as 0x48;
LEVEL_ID: 8 bits, the level number of the segment link; the major-level
ring is 0 and a low-level link is larger than 0;
IDLE=0
COMPLETE=1
FAILED=2
LINK-UP=3
LINK-DOWN=4
PRE-FORWARDING=5
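The STATE values listed above map naturally onto an enum; a small helper can then decode the STATE byte of a received packet into its name. Only the numeric values come from the text; the enum class and helper function are illustrative assumptions.

```python
# Model the EIPS STATE field values as an IntEnum and decode a received
# STATE byte to a readable name.
from enum import IntEnum

class EipsState(IntEnum):
    IDLE = 0
    COMPLETE = 1
    FAILED = 2
    LINK_UP = 3
    LINK_DOWN = 4
    PRE_FORWARDING = 5

def decode_state(state_byte: int) -> str:
    """Map the STATE field of a received EIPS packet to its name."""
    return EipsState(state_byte).name
```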
Non-fault Status
When the links and nodes on the single ring have no fault, the master
node periodically sends HEALTH packets from the main port; the packets
are forwarded by the transmission nodes and links on the ring and reach
the assistant port of the master node. The master node blocks the
protect VLAN forwarding function of the assistant port so that data in
the protect VLAN cannot be forwarded via the assistant port of the
master node, avoiding a loop. The control VLAN is not blocked, so EIPS
protocol packets can pass the blocked assistant port of the master node.
As shown in Figure 15-10, the master node M periodically sends HEALTH
packets; because the loop is not faulty, the HEALTH packets reach the
assistant port of the master node; the master node blocks the data
forwarding function of the protect VLAN on the assistant port, avoiding
a loop.
If the master node itself fails, the processing is different: if the
main port fails, the main port is blocked and the data forwarding
function of the assistant port is enabled; if the assistant port fails,
the assistant port remains blocked.
Fault Recovery
After a link fault on the ring disappears, the neighbor nodes of the
faulty link detect that the port's link fault has disappeared. The port
of the recovered link is set to a status that forwards ring network
control packets, so the port can forward EIPS protocol packets. The port
status is set to Pre-Forwarding, but the port still cannot forward
packets of the protect VLAN.
While the link is failed, the master node keeps periodically sending the
HEALTH packet from the main port. After the link fault disappears, the
master node regards the link as recovered when the assistant port
receives the HEALTH packet. To prevent link status flapping, it turns to
the PRE-UP state, starts the PRE-UP timer, and keeps the data VLAN
enabled. After the PRE-UP timer times out, it turns to the COMPLETE
state, re-blocks the data forwarding function of the protect VLAN on the
assistant port, and sends the COMP-FLUSH-FDB packet from the main port.
Meanwhile, the master node updates the FDB address table of the port.
After a transmission node on the ring receives the COMP-FLUSH-FDB
packet, it updates the FDB table of the port, sets the two ports
adjacent to the recovered link to the Forward state, and enables the
protect VLAN data forwarding function of the port.
To prevent loss of the COMP-FLUSH-FDB packet, if a node adjacent to the
recovered link does not receive the COMP-FLUSH-FDB packet within the
specified time, it sets the Pre-Forwarding port to Forward and enables
the protect VLAN data forwarding function of the port, so that the data
of the protect VLAN is forwarded according to the new topology. To
prevent a transmission node from receiving two COMP-FLUSH-FDB packets
and updating the port FDB twice, the transmission node records the
current loop status as Complete when it receives a COMP-FLUSH-FDB
packet. If the recorded loop status is already Complete, a further
COMP-FLUSH-FDB packet is not processed, avoiding repeated updating of
the port FDB table. To keep the status of all transmission nodes on the
ring consistent, the master node sends the COMP-FLUSH-FDB packets
periodically.
As shown in Figure 15-12, after the link fault between the nodes T2 and
T3 recovers, the nodes T2 and T3 detect that the link fault of the port
has disappeared and set the port of the recovered link to a status that
permits forwarding ring network control packets, so the port can forward
the Ethernet ring network protection control packets. The port status is
set to Pre-Forwarding, but the port still cannot forward packets of the
protect VLAN. If the HEALTH packets sent by the master node from the
main port can cross the recovered link and reach the assistant port, the
loop is regarded as recovered and working again, and the status turns to
PRE-UP; the PRE-UP timer is started. After the PRE-UP timer times out,
the status turns to COMPLETE. As shown in Figure 15-13, the master node
blocks the protect VLAN data forwarding function of the assistant port
and sends the COMP-FLUSH-FDB packet to inform the other nodes of the
loop recovery and make them update the FDB tables of their ports. After
the other nodes on the ring receive the COMP-FLUSH-FDB packet, they
update the FDB tables of their ports; the nodes adjacent to the
recovered link enable the Pre-Forwarding port so that the data of the
protect VLAN can pass, and the loop completes the fault protection
switchover.
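The duplicate-flush protection above (record the loop status on the first COMP-FLUSH-FDB, ignore repeats) can be sketched as follows. The class, method, and counter names are illustrative assumptions, not the actual implementation.

```python
# Sketch of duplicate COMP-FLUSH-FDB suppression: the node records the
# loop status as Complete when the first flush packet arrives, and
# ignores further flush packets while the recorded status is Complete,
# so the port FDB is flushed only once per recovery.

class FlushHandler:
    def __init__(self):
        self.loop_state = "Failed"
        self.flushes = 0                    # how many real FDB flushes ran

    def on_comp_flush_fdb(self):
        if self.loop_state == "Complete":
            return                          # already flushed for this recovery
        self.loop_state = "Complete"        # record the status first ...
        self.flushes += 1                   # ... then flush the port FDB once

    def on_link_down(self):
        self.loop_state = "Failed"          # the next recovery flushes again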
After the edge control node and edge assistant node detect the fault
status of the level segment link, the edge control node sends the
COMM-FLUSH-FDB packets from the edge port and the two ports of the
accessed level. If the faulty port is not the edge port of the edge
control node, the data forwarding function of the protected service VLAN
is enabled on the edge port of the edge control node. When the edge
assistant node detects that the local level segment link has failed, it
sends the COMM-FLUSH-FDB packets from the edge port and the two ports of
the accessed level.
When an edge node receives the COMP-FLUSH-FDB packet from the edge
access port, and the level of the edge access port is higher than or
equal to the level of the sending source, and the upper-level node does
not yet know about the link status change of the sending source level,
the edge node forwards the COMP-FLUSH-FDB packet to the upper level and
updates the FDB entries of the port related to the protected service
VLAN. After the two ports adjacent to the faulty link detect that the
link has recovered, the EIPS protocol packets are forwarded via the
recovered port, and the port status is set to Pre-Forwarding. If the
COMP-FLUSH-FDB packet of the local level segment is received, the data
forwarding function of the protected service VLAN is enabled on the
port; if the COMP-FLUSH-FDB packet is not received within the specified
time, the port times out automatically and is enabled.
Extended Functions
Realizing the Ethernet intelligent protection switchover is the basic
and main function of EIPS. The following describes several extended
functions.
Main contents:
Reliability realization
The four switches M1, M2, M3, and M4 are interconnected with each other,
forming one physical ring. Configure four EIPS rings on the physical
ring: the master node of R1 is M1 and the protect instance is inst 1;
the master node of R2 is M2 and the protect instance is inst 2; the
master node of R3 is M3 and the protect instance is inst 3; the master
node of R4 is M4 and the protect instance is inst 4. When the physical
ring is complete, the EIPS rings R1, R2, R3, and R4 are all complete.
M1, the master node of R1, blocks the data of inst 1 at its assistant
port S; M2, the master node of R2, blocks the data of inst 2 at its
assistant port S; M3, the master node of R3, blocks the data of inst 3
at its assistant port S; M4, the master node of R4, blocks the data of
inst 4 at its assistant port S. The data traffic of each instance can
take a different link, realizing load balancing.
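A minimal sketch of the four-instance setup above: each ring's master blocks its own protect instance at its assistant port, so the instances leave the physical ring over different links. The ring-to-master-to-instance mapping is taken from the text; the data shape and helper function are assumptions for illustration.

```python
# Map each EIPS ring to the master node that blocks its protect instance,
# mirroring the load-balancing example in the text.
rings = {
    "R1": {"master": "M1", "instance": "inst 1"},
    "R2": {"master": "M2", "instance": "inst 2"},
    "R3": {"master": "M3", "instance": "inst 3"},
    "R4": {"master": "M4", "instance": "inst 4"},
}

def blocking_master(instance: str) -> str:
    """Return which switch blocks the given protect instance at its assistant port."""
    for ring in rings.values():
        if ring["instance"] == instance:
            return ring["master"]
    raise KeyError(instance)
```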
Basic Theory
Each node on the ring collects the topology separately. When EIPS is
enabled on a node, the ports of the node actively send a multicast
topology request packet. After the other nodes on the same logical ring
receive the packet, they add one to the TTL value. The receiving port
returns a unicast topology response packet to the requester. The
response packet contains the basic information of the node, including
the node type, node status, information about the contained ports, and
so on. Meanwhile, the master node and the transmission nodes continue to
forward the topology request from their other port. Each node needs to
reply after receiving a topology request sent by another node. After the
requesting node receives a topology response packet, it saves the
information and determines the responder's location on the ring from the
TTL value in the packet. After all nodes have responded, the whole
topology structure can be described completely.
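The TTL-based position discovery above can be sketched in a few lines. The packet fields and function names are illustrative assumptions; only the mechanism (increment TTL on forward, sort responders by hop count) comes from the text.

```python
# Sketch of TTL-based topology collection: each transit node bumps the
# TTL before forwarding the request, and the requester orders the
# responders by the hop count they report back.

def forward_request(packet: dict) -> dict:
    """A transit node adds one to the TTL and passes the request on."""
    return {**packet, "ttl": packet["ttl"] + 1}

def build_topology(responses: list) -> list:
    """Order responding nodes by their distance from the requester."""
    return [r["node"] for r in sorted(responses, key=lambda r: r["hops"])]
```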
The topology collection reflects the topology status of the current
ring, that is, whether it is a complete ring topology structure. The
main ring and sub ring cannot see each other's topology structure; they
can only see whether another edge node is attached to a transmission
node.
For the edge node and assistant edge node, there is only one port, so
the visible topology is the topology collected by that port. For the
master node and the transmission node, when the topology is complete,
the topologies collected by the two ports are complete and consistent;
when the topology is incomplete, for example when one link is
disconnected, each port can only collect part of the topology, and the
collected parts need to be combined to form one complete topology. What
the user sees on the node is the complete topology combined from the
topologies collected by the two ports.
The real-time requirement is not high. Each node sends one topology
request every 10 seconds, so when the topology changes, the change is
not reflected at once and is rediscovered by the next topology
collection after 10 seconds. Each collection updates the previous
topology according to the new response packets. If one node is not
updated within 10 seconds, the node is regarded as no longer in the
topology range.
The meanings of the fields in the topology information header are as follows:
ttl: one byte, indicating the location of the node relative to the
requesting node; filled with 0 in the topology request packet;
increased by one after passing each node;
baseMac: 6 bytes, indicating the MAC address of the device; in the
topology request packet it is the device MAC address of the
requesting node; it is null in the topology response packet;
SMAC: 6 bytes, indicating the MAC address of the source port; in the
topology request packet it is the MAC address of the requesting port;
in the topology response packet it is the MAC address of the
responding port;
In the standard EIPS packet fields, the destination MAC address is the
MAC address of the initiating port of the topology request initiator.
The MAC address is obtained from the SMAC field in the information
header of the received topology request packet. ERP_TYPE in the standard
EIPS packet is TOPOLOGY (15). In the topology information header, type
is 2; ttl is the number of hops from the initiator to the responder;
DMAC is the destination MAC address, that is, the MAC address of the
initiating port of the initiator, which is the value of the SMAC field
in the header of the topology request packet; SMAC is the MAC address of
the sending port.
hop: one byte, indicating the hops from the responder to the initiator,
equal to the TTL value in the packet;
nt: four bits, short for node type, indicating the type of the
responding node;
ns: three bits, short for node status, indicating the current status of
the responding node;
b: one bit, short for border, indicating whether an edge node is
connected; 0 means no; 1 means yes;
bm: four bits, short for backup master, indicating whether it is the
backup master node; 0 means no; 1 means yes;
ar: four bits, short for actor role, only valid for the backup master
node; 0 means that the backup master node is not acting as the master
node; 1 means that the backup master node is serving as the master
node;
base mac: 6 bytes, the device MAC address of the responding node;
r_role: one byte, indicating the port role of the port that receives the
request packet;
r_b: four bits, short for r_blockstatus, indicating the BLOCK status of
the port that receives the request packet on the ring of the node; 0
means non-BLOCK; 1 means BLOCK;
r_l: four bits, short for r_linkstatus, indicating the LINK status of the
port that receives the request packet; 1 means UP; 2 means DOWN;
r_i: two bytes, short for r_index, indicating the number of the port
that receives the request packet;
r_n: 16 bytes, short for r_name, indicating the name of the port that
receives the request packet. To save memory space, only part of the
port name is kept: for a common port, omit "port" (for example, save
as 0/0/1 or 0/1); for an aggregation port, omit "linkaggregation"
(for example, save aggregation port 1 as 1 and aggregation port 2
as 2);
r_mac: 6 bytes, indicating the MAC address of the port that receives
the request packet;
s_role: one byte, indicating the role of the port that forwards the
request packet;
s_b: four bits, short for s_blockstatus, indicating the BLOCK status of
the port that forwards the request packet on the ring of the node; 0
means non-BLOCK; 1 means BLOCK;
s_l: four bits, short for s_linkstatus, indicating the LINK status of
the port that forwards the request packet; 1 means UP; 2 means DOWN;
s_i: two bytes, short for s_index, indicating the number of the port
that forwards the request packet;
s_n: 16 bytes, short for s_name, indicating the name of the port that
forwards the request packet, abbreviated in the same way as r_n
above;
s_mac: 6 bytes, indicating the MAC address of the port that forwards
the request packet;
To solve this problem, the EIPS nodes send the detection packet
LINK-HELLO to each other. LINK-HELLO adopts the standard EIPS packet
format and uses the SYSTEM_MAC_ADDR field and the two fields in front of
it for detection. The destination MAC address in the standard EIPS
packet is 0001.7A4F.4AB4, but it can be learned automatically from the
peer. ERP_TYPE is LINK-HELLO (14). SYSTEM_MAC_ADDR records the MAC
address of the peer port, and the two preceding fields record the port
number of the peer port. Meanwhile, the leading fields of the reserved
field in the packet record the port number of the sending port. Before
the peer information is learned, the eight bytes of peer information are
all 0.
As shown in Figure 15-16, if a node can receive the LINK-HELLO packet of
the neighbor, and SYSTEM_MAC_ADDR in the packet is the MAC address of
the local port and the port number is the number of the local port, the
line is regarded as bidirectional.
Only a LINK-HELLO packet whose SYSTEM_MAC_ADDR is the local port MAC
address and whose port number is the local port number takes part in the
timeout judgment. When a node does not know the peer MAC address, the
SYSTEM_MAC_ADDR and port number of the LINK-HELLO packet it sends are
set to all 0.
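The bidirectional check above reduces to a simple comparison: a received LINK-HELLO proves the line works in both directions only if the peer echoed back this port's own MAC address and port number. The field and function names below are illustrative assumptions.

```python
# Sketch of the LINK-HELLO bidirectional-link check: the neighbor's
# packet must carry our own port MAC and port number, which it can only
# do if it has already heard our LINK-HELLO on this line.

def link_is_bidirectional(hello: dict, local_mac: str, local_port: int) -> bool:
    """True when the neighbor's LINK-HELLO echoes back the local port identity."""
    return (hello.get("system_mac_addr") == local_mac
            and hello.get("port_number") == local_port)
```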
When receiving times out, one direction or both directions may be
disconnected. If one direction is disconnected, the neighbor can detect
it; if both directions are disconnected, the EIPS master node can detect
it. Therefore, when receiving times out, you only need to clear the
recorded MAC address of the neighbor; no further operation is needed.
If the port belongs to multiple EIPS nodes, the control VLAN of one of
those nodes must be chosen as the VLAN field when forming the LINK-HELLO
packet. For convenience, the control VLAN of the EIPS node with the
minimum node number is selected.
Reliability Realization
In the ring topology network, if the control platform of the master node
becomes abnormal and breaks down, but the data platform is complete, it
makes the data platform become ring. To avoid the problem, back up the
master node to realize the EIPS reliability. Therefore, the concept of
backup master node is put forward. The main function of the backup
master node is to serve as the master node when the control platform of
the master node breaks down. When it is detected that the topology is
complete, block the assistant port to avoid the ring and inform other
nodes to refresh FDB.
The backup master node can only be a transmission node. The edge node and assistant edge node, as well as any transmission node connected to the edge node or assistant edge node, cannot serve as the backup master node. To avoid the impact on the link of blocking both the assistant port of the backup master node and the assistant port of the master node, the assistant port of the configured backup master node must be directly connected to the assistant port of the master node, as shown in the following figure.
On the backup master node, set the HELLO packet and LINKDOWN packet to go to the CPU and also be forwarded. When the backup master node cannot receive the HELLO packet of the master node, it sends the HELLO1 packet (the format of the HELLO1 packet is the same as that of the HELLO packet except for the destination MAC address, which is 0001.7A4F.4AB5) to detect the integrity of the master node's data platform and the complete status of the ring. If the assistant port can receive the HELLO1 packet, it indicates that the loop is complete and the data platform of the master node is complete, but its control platform has broken down. In this case, the assistant port is blocked, the COMP-FLUSH-FDB packet is sent from the main port, and the working status of the backup master node is set to master node, as shown in Figure 15-21.
When the backup master node works as the master node, its working principle is basically the same as that of the master node. When a LINKDOWN packet on the ring is received, it enables the assistant port and sends the COMM-FLUSH-FDB packet to the ring via both ports. If the HELLO packet of the master node is received while the assistant port is in the BLOCK state, it enables the assistant port and switches its working status back to transmission node.
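The backup-master behavior described above amounts to a small state machine. The following is a hedged sketch; the state, event, and action names are illustrative, not taken from the actual implementation:

```python
# Hedged sketch of the backup-master transitions described above;
# state, event, and action names are illustrative.

def backup_master_event(state, event):
    if state == "transit" and event == "hello1_seen_on_assistant_port":
        # Loop and master data platform intact, master control platform
        # down: take over as master.
        return "master", ["block_assistant_port", "send COMP-FLUSH-FDB"]
    if state == "master" and event == "linkdown_received":
        # Ring broken: open the assistant port and flush FDBs.
        return "master", ["unblock_assistant_port", "send COMM-FLUSH-FDB"]
    if state == "master" and event == "master_hello_received":
        # Real master is back: step down to transmission node.
        return "transit", ["unblock_assistant_port"]
    return state, []

assert backup_master_event("transit", "hello1_seen_on_assistant_port") == \
    ("master", ["block_assistant_port", "send COMP-FLUSH-FDB"])
assert backup_master_event("master", "master_hello_received") == \
    ("transit", ["unblock_assistant_port"])
```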
ULFD Technology
This section describes the theory and realization of the ULFD protocol.
Take fiber as an example. The uni-directional link includes two types. One
is that the fibers are cross-connected; the other is that one fiber is not
connected or one fiber is disconnected. As shown in Figure 16-1, the fibers
of the two devices are cross-connected; as shown in Figure 16-2, the
hollow wire means that one fiber is not connected or one fiber is
disconnected. A typical case of Figure 16-2 is that one fiber is not connected at all or is broken.
The ULFD protocol has the following features. ULFD is a link layer protocol; it cooperates with the physical layer protocols to monitor the link status of the devices. The auto-negotiation mechanism of the physical layer detects physical signals and faults, while ULFD identifies the peer devices and uni-directional links and closes the unreachable ports. After the auto-negotiation mechanism and ULFD are both enabled, they work together to detect and close physical and logical uni-directional connections and prevent other protocols (such as STP) from becoming invalid. If the links of the two ends can each work at the physical layer, ULFD detects whether the links are connected correctly and whether the two ends can exchange packets; this detection cannot be realized via the auto-negotiation mechanism.
Org ID: 0x00017a
Flags (1 byte, bits 0-7):
Bit 0: Recommended timeout flag (RT)
Bit 1: ReSynch flag (RSY)
Bits 2-7: Reserved
The RSY flag indicates whether the packet is a normal probe keepalive packet or a probe packet that requests re-synchronization and detection. When the RSY flag is 1, the receiving end needs to return an echo packet.
TLV format:
If the TLV type is not in the TLV type range defined by ULFD, the TLV is regarded as invalid.
Protocol Action
The work of the ULFD protocol contains the following aspects:
Neighbor discovery: A port sends its own information and re-synchronization requests via probe packets, and the peer port discovers the neighbor from the content of each probe packet it receives. After the port receives one probe packet, it judges whether the sending port is in the neighbor table. If not, the sender is a new neighbor: it is added to the neighbor table and an echo packet is returned for uni-directional detection. If the sending port is in the neighbor table but the probe packet carries the RSY flag, the neighbor is requesting re-synchronization: an echo packet is sent to the port for uni-directional detection. If the sending port is in the neighbor table and the RSY flag is not set, the probe packet is a common keepalive packet and the neighbor's information is updated.
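The probe-handling decision described above can be sketched as follows; the packet representation (a dict with "mac", "port", and "rsy" keys) is an assumption made for illustration:

```python
# Hedged sketch of ULFD probe handling; the packet representation
# (a dict with "mac", "port", "rsy" keys) is an assumption.

def handle_probe(neighbor_table, probe):
    """Return "echo" when an echo packet must be sent back."""
    key = (probe["mac"], probe["port"])
    if key not in neighbor_table:
        neighbor_table[key] = probe   # new neighbor: learn it
        return "echo"                 # start uni-directional detection
    if probe["rsy"]:
        return "echo"                 # neighbor requests re-synchronization
    neighbor_table[key] = probe       # plain keepalive: refresh the entry
    return "keepalive"

table = {}
assert handle_probe(table, {"mac": "A", "port": 1, "rsy": False}) == "echo"
assert handle_probe(table, {"mac": "A", "port": 1, "rsy": False}) == "keepalive"
assert handle_probe(table, {"mac": "A", "port": 1, "rsy": True}) == "echo"
```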
Neighbor aging: After a neighbor is added to the neighbor table, the port sets an aging time Tlf according to the Message Interval value in the received probe packet. If the port does not receive a probe keepalive packet from the neighbor within the time Tlf, the neighbor is aged and deleted from the neighbor table.
In the normal mode, if the port does not receive the packet of the peer end in the keepalive stage, the port enters the unconfirmed status; if, in the uni-directional detection stage, the port does not receive the echo packet of the peer end, or the received echo packet does not contain the local port information, the link between the local port and the peer is regarded as uni-directional. The normal mode is often used to detect the uni-directional status caused by a crossover connection.
In the aggressive mode, if the port does not receive the packets of the peer end in the keepalive stage, so that all neighbors are aged, and no neighbor is learned after the link is re-established, the local port is regarded as unreachable (not a uni-directional link in the strict sense) and the local port is shut down. If, in the uni-directional detection stage, the port does not receive the echo packet of the peer end, or the received echo packet does not contain the local port information, the link between the local port and the peer is regarded as uni-directional. The aggressive mode is used to detect uni-directional connections caused by fiber crossover connection or disconnection.
Typical Application
When using ULFD, ensure that the corresponding ports on both ends are configured with the ULFD function and work in the same detection mode, and that the ULFD global setting of the device is enabled.
Illustration
Port 0/0 of the local switch A is connected to Port 0/1 of the peer switch B
via the fiber. Now, configure the ULFD function on the connection to detect
the connection status of the link.
Command                                         Description
SwitchA(config)#port 0/0                        Enter the port configuration mode
SwitchA(config-port-0/0)#ulfd port aggressive   Configure the ULFD work mode as aggressive on port 0/0
SwitchA(config-port-0/0)#exit                   Exit the port configuration mode
SwitchA(config)#ulfd message time 16            Configure the interval of sending packets as 16s
SwitchA(config)#ulfd enable                     Enable ULFD globally
SwitchA(config)#exit                            Complete the ULFD configuration
Command                                         Description
SwitchB(config)#port 0/1                        Enter the port configuration mode
SwitchB(config-port-0/1)#ulfd port aggressive   Configure the ULFD work mode as aggressive on port 0/1
SwitchB(config-port-0/1)#exit                   Exit the port configuration mode
OAM Technology
The chapter describes the MAN OAM technology and the applications. OAM
is short for Operation, Administration and Maintenance.
Main contents:
Maintenance Association End Point (MEP): It can receive and send any CFM packet. Each MEP is identified by an integer, called the MEP ID. A MEP is configured on a port and delimits the MD range. The MA and MD to which the MEP belongs decide the VLAN attribute and level attribute of the packets sent by the MEP. According to the location of the MEP in the MA, the MEP direction is either inward or outward. If the CFM packets of the MA are received on the port on which the MEP is configured, the MEP direction is outward; similarly, an outward MEP can only send packets to the network via the port on which it is configured. Conversely, if the CFM packets of the MA are received on other ports, the MEP direction is inward; an inward MEP cannot send packets to the network via the port on which it is configured.
This section describes some basic concepts and functions of Ethernet CFM.
Maintenance Domain
The maintenance domain is a part of the network covered by the
connectivity fault management. Its limit is defined by a series of
maintenance points (MP) configured on the ports, including MEP and MIP,
as shown in figure 17-1.
Maintenance domain
Figure 17-2 shows three maintenance domains, that is, customers, service
providers, and carriers, as well as the hierarchical structure of the
maintenance domains. CE is the edge device of the customer (Customer
Edge); PE is the edge device of the service provider (Provider Edge).
Maintenance Point
One maintenance point is one function point configured on a port, which takes part in the CFM protocol operation. According to their locations in the maintenance domain, maintenance points are divided into Maintenance association End Points (MEP) and Maintenance domain Intermediate Points (MIP).
A MIP can process and respond to some CFM packets (such as LTM packets, or LBM packets whose destination is at the same level as itself), but cannot send packets on its own initiative.
Figure 17-3 shows the case that MEP and MIP are on the devices of the
customers, service providers, and carriers.
When the maintenance domain is used to locate the fault, you can first use
LT or LB to determine the fault interval on Level 5. If the fault is between
two MIPs on Level 5, continue to use LT or LB to locate the fault on Level 3.
The packets sent or received by each MP belong to its MA, have the
features of the VLAN and layer, and do not interfere with each other. The
rest is deduced by analogy until the minimum fault area is found.
Similarly, MEP sends CCM, and remote MEP receives and processes it.
When the MD and MA configured by remote MEP are inconsistent with
those configured by the MEP that sends CCM, you can find out the
configuration error in the network.
Connectivity Check
The continuity check (CC) function is the most basic function in 802.1ag, used to check connectivity failures of the Ethernet flow between MPs. A connectivity failure may be caused by a fault or by a configuration error. The connectivity check is suitable for detecting unidirectional connectivity failures. Figure 17-4 shows an example of the CC function: the maintenance domain (Provider Domain) contains two Operator Domains (Operator A and Operator B).
Connectivity checking
After the MEP receives the CCM sent by the equivalent MEP in the same
maintenance domain and analyzes it correctly, the information of the peer
MEP is saved in the CCM database. The information includes MEP ID, MAC
address of MEP, remote error ID (RDI) of MEP, Sender ID of MEP, and so
on.
The local MEP compares the MEP ID of each received CCM to ensure that there is no repeated MEP ID in the local configuration; a repeated MEP ID indicates a configuration error.
The timeout of the CCM is 3.5 times the sending interval; that is, the connection between the local MEP and the remote MEP is regarded as faulty when three successive CCMs are lost.
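The timeout rule can be expressed as a one-line computation (the interval values used below are examples, not mandated defaults):

```python
def ccm_timeout(interval_s):
    # The CCM timeout is 3.5 times the sending interval, i.e. the
    # connection is declared faulty after three successive lost CCMs.
    return 3.5 * interval_s

assert ccm_timeout(1) == 3.5
assert ccm_timeout(10) == 35.0
```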
The destination address of the CCM is 01-80-C2-00-00-3y, where the four address bits y equal the MD Level of the CCM (0 through 7).
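Deriving the CCM destination address from the MD level can be sketched as follows (the string formatting is illustrative):

```python
def ccm_multicast_mac(md_level):
    """CCM group address 01-80-C2-00-00-3y, where y equals the MD level."""
    assert 0 <= md_level <= 7
    return "01-80-C2-00-00-3%X" % md_level

assert ccm_multicast_mac(5) == "01-80-C2-00-00-35"
assert ccm_multicast_mac(0) == "01-80-C2-00-00-30"
```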
CCM can reach any MEP in one MA. When other MEPs receive the CCMs
from one MA, first get the packet information and save it in the CCM
database, and then check whether the CCMs of all other MEPs in one MA
are received within the specified time.
Suppose a MEP sends one CCM. When the CCM reaches a MIP in the MA, the MIP continues to forward it; when the CCM reaches a destination MEP of the same MA, that MEP checks whether its level is the same as that of the CCM. If the timer has not timed out, the MEP processes the packet, resets the timer, and waits for the next CCM sent by the remote MEP.
Each MEP periodically multicasts CCMs outward and receives the CCMs sent by the other MEPs in the same MA. The local MEP is responsible for checking whether any MEP in the local CCM database has timed out. If a MEP times out, the connection with that remote MEP has failed; the error is reported to the network administrator.
When the sending interval of the received CCM is inconsistent with the
configured value in MA, it triggers error notification (FNG alarm). When
the MA IDs of the received CCMs are inconsistent, it indicates that there is
cross-connection error, which also triggers FNG alarm.
Loopback Check
Loopback (LB) check function is used to check the connection status with
the remote device. It is suitable for checking the bidirectional connectivity
failure. The LB function is shown in Figure 17-5.
Loopback check
Loop Back Messages (LBM) are sent actively from a MEP by executing a command, for example via the network management system. The target can be any MP in the MA. For another remote MEP in the MA, the local MEP can get its MAC address via CCM; for a MIP, the local MEP gets its MAC address by sending a Link Trace Message (LTM).
Each LBM has a unique serial number. After sending LBM, the serial
number of the packet is reserved for at least 5 seconds, used to
distinguish whether the received Loop Back Reply (LBR) is the correct
reply packet of the sent LBM.
The source and destination MAC addresses of the LBR are swapped relative to those of the LBM; the packet type is changed to LBR; the contents are otherwise the same as those of the LBM.
When the MEP receives an LBR, it checks whether the serial number is consistent with that of the latest LBM. If inconsistent, an error is indicated. If a MIP receives an LBR, the LBR is regarded as an error packet and is dropped.
LTM is the multicast packet. The multicast address is as shown in table 1-2.
01-80-C2-00-00-3y

MD Level of LTM    Four address bits y
7                  F
6                  E
5                  D
4                  C
3                  B
2                  A
1                  9
0                  8
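Analogously to the CCM address, the LTM destination address can be derived from the MD level; a small sketch:

```python
def ltm_multicast_mac(md_level):
    """LTM group address 01-80-C2-00-00-3y, where y = MD level + 8."""
    assert 0 <= md_level <= 7
    return "01-80-C2-00-00-3%X" % (md_level + 8)

assert ltm_multicast_mac(7) == "01-80-C2-00-00-3F"
assert ltm_multicast_mac(0) == "01-80-C2-00-00-38"
```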
The TLV of the LTM contains one original address (Original MAC) and one target address (Target MAC). The original address is the address of the port where the MEP that sends the LTM is located; the target address is the MAC address of the target MEP to which the LTM is sent. These are distinct from the destination and source addresses of the Ethernet data frame itself. Each LTM packet carries a unique serial number, which is incremented by one each time an LTM is sent.
Each MP at the same level on the path to the target address sends one LTR packet to the original address. The LTR is a unicast packet whose source address equals the target address of the LTM and whose destination address equals the original address carried in the TLV of the LTM.
When the FNG alarm appears, send the LTM packet to track and locate the
error link. MEP sends one LTM and MIP decides whether to receive the LTM
packet according to the level of the maintenance domain. When receiving
the packet, MIP first checks whether the TTL value of LTM is 0. If yes, drop
the packet. Otherwise, subtract one from TTL and then search for the
egress port to forward the LTM packet according to the target address and
VLAN ID of LTM in the FDB table. If the egress port is not found in the FDB
table, drop the LTM packet. When the LTM packet is forwarded, the other
information except for the source MAC address and TTL value does not
change. The MIP on the port replies one LTR packet to the source MEP
after one random delay. When the network fails, LTM can only reach the
MP before the faulty point. The MPs between the faulty point and the
target MEP do not reply LTR. In this way, the faulty area can be found.
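The MIP forwarding rules above (TTL check, decrement, FDB lookup) can be sketched as follows, assuming a dictionary-based packet and an FDB keyed by target MAC and VLAN (illustrative representations):

```python
# Hedged sketch of MIP processing of an LTM; the dict-based packet and
# the FDB keyed by (target MAC, VLAN) are illustrative assumptions.

def mip_process_ltm(ltm, fdb):
    if ltm["ttl"] == 0:
        return "drop"                      # TTL exhausted: drop the LTM
    ltm["ttl"] -= 1                        # subtract one from TTL
    egress = fdb.get((ltm["target_mac"], ltm["vlan"]))
    if egress is None:
        return "drop"                      # no egress port in the FDB
    # Forward unchanged except source MAC and TTL; an LTR is also
    # replied to the source MEP after a random delay (not modeled).
    return ("forward", egress)

fdb = {("00-00-5E-00-00-01", 10): 3}
ltm = {"ttl": 2, "target_mac": "00-00-5E-00-00-01", "vlan": 10}
assert mip_process_ltm(ltm, fdb) == ("forward", 3)
assert ltm["ttl"] == 1
```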
CFM Packet
The CFM packet type is 0x8902. The public head of the CFM packet is as
shown in Figure 17-7.
CCM packet
LTM packet
UNI-N of E-LMI
UNI-C of E-LMI
Typical application
CE Polling:
The UNI-C device transmits the E-LMI Check message (E-LMI Check STATUS ENQUIRY) to the UNI-N device for active polling; the polling interval is T391 seconds (10s by default). After every N391 polls (360 by default), UNI-C transmits one full status request message (FULL STATUS ENQUIRY), and UNI-N responds with the status and configuration information of the UNI and EVCs. UNI-N runs the T392 timer to wait for request messages from UNI-C; the configured value of T392 must be larger than T391.
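The polling pattern above (an E-LMI Check every T391 seconds, with every N391-th poll upgraded to a full status request) can be sketched as follows; the counting convention (starting from 1) is an assumption for illustration:

```python
def report_type(poll_count, n391=360):
    """Every N391-th poll becomes a FULL STATUS ENQUIRY; the rest are
    E-LMI Check STATUS ENQUIRY messages. Counting from 1 is assumed."""
    return "FULL STATUS ENQUIRY" if poll_count % n391 == 0 else "E-LMI Check"

assert report_type(1) == "E-LMI Check"
assert report_type(360) == "FULL STATUS ENQUIRY"
assert report_type(361) == "E-LMI Check"
```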
PE Informing:
E-LMI Message Type
MEF 16 defines two message types to realize E-LMI protocol interaction: the STATUS ENQUIRY message and the STATUS message. The content type (Report Type) carried by an E-LMI packet is divided into the following four types:
Single EVC Asynchronous Status: active EVC status informing packet; the
packet can only be sent by UNI-N to inform CE of the EVC status change
information.
The STATUS ENQUIRY message is sent by UNI-C to ask the UNI-N for the
configuration and status information of EVC and UNI. After receiving one
valid STATUS ENQUIRY message, UNI-N should send one STATUS
message to reply the request message.
STATUS Message:
The contents of the STATUS messages differ according to their Report Types. The content relation is as follows.
STATUS message
Information Element
Sequence Numbers X X X
Data Instance X X X
UNI Status X
EVC Status X X X
CFM
EVC
One EVC needs to be bound with the CFM management domain instance.
The connectivity between the UNIs in EVC can be got via CFM
management domain instance.
UNI
Bundling: Multiple EVCs can be configured on one UNI and one EVC can
map with multiple CE-VLAN IDs;
All to one Bundling: One UNI can be bound with only one EVC and all
CE-VLAN IDs map to the EVC;
The port of the UNI-N end needs to be configured as the MEP node of one
CFM domain and enable the CC function of CFM. In this way, UNI-N end
can get the connection status between the UNIs of EVC configured on the
PE device via the 802.1ag protocol, so as to get the current operation
status of the EVC.
After enabling the PE mode of the E-LMI protocol on the UNI-N, the UNI-N
waits for the request of UNI-C and makes the corresponding response.
When UNI-N finds that the status of the EVC bound to the UNI changes, it
actively sends the EVC status notification message to the PE.
UNI-C of E-LMI
The UNI-C of E-LMI only needs to enable the E-LMI protocol and run in the
CE mode. After being configured as the CE mode, UNI-C periodically sends
the E-LMI Check request to UNI-N and initiates one Full Status request to
ask UNI-N for the EVC and UNI configuration and status information when
finding that the Data Instance values of EVC and UNI do not match with
each other via the E-LMI Check message. Besides, the local UNI-C
information is updated.
Typical Applications
The following is one typical application of E-LMI.
Enable the E-LMI protocol on the UNI connection UNI1 between CE1 and
PE1. CE1 gets the UNI1 configuration information, and EVC_Provider
configuration and status information from PE1 via the E-LMI protocol, so
as to complete the auto configuration function of CE1.
Currently, Ethernet OAM mainly solves the OAM problems of Ethernet devices over the last mile, including link performance monitoring, fault detection and alarming, loopback test, and remote MIB variable requests.
All functions of Ethernet OAM can become valid only after the Ethernet
OAM connection is set up.
As shown in the above figure, the Ethernet OAM is located between MAC
Control layer and the LLC layer.
Protocol Structure
As shown in the above figure, Ethernet OAM comprises the OAM sublayer
and OAM client.
The OAM sublayer is responsible for flow dividing and for the remote loopback policy processing of the packets sent and received on the interface; the OAM client is responsible for the connection maintenance and remote loopback control of the protocol; OAM Control is responsible for sending and receiving the Ethernet OAM protocol packets.
As shown in the above figure, the destination address of the Ethernet OAM
packet is 01-80-C2-00-00-02; the OAM packet belongs to the low-speed
protocol (the protocol number is 88-09); the subtype is 0x03;
Data/Pad is the data content of the Ethernet OAM packet, which varies
with Code;
Information OAMPDU:
Loopback Control OAMPDU packet is used for remote loopback control. The
device can select whether to use the packet. To realize the loopback
control, the local DTE sends the loopback control command to the remote
DTE. If the loopback control function of the remote DTE is enabled, the
sent packet is returned to the sending party. The packet format is as
follows.
The device can select Active mode or Passive mode to set up the OAM
connection. The DTE (Data Terminating Entity) processing capabilities in
active mode and passive mode are as follows.
The above figure shows the state transitions of the Ethernet OAM connection. Besides the transitions described in the figure, there are several special transitions:
1. When the connection timeout timer times out, all states return to Active or Passive;
2. When the port goes down or the OAM function is shut down, all states return to Fault.
Status        Description
Fault         Ethernet OAM has not started running.
Active        Active status: periodically sends information OAMPDU packets containing the Local Information TLV to discover the connection.
Passive       Passive status: waits for an information OAMPDU packet containing the Local Information TLV to accept the connection.
Discovered    Connection discovered: periodically sends information OAMPDU packets containing the Local Information TLV and Remote Information TLV to negotiate the connection, and runs the connection timeout timer.
Local-stable  The local end has passed attribute matching: periodically sends information OAMPDU packets containing the Local Information TLV and Remote Information TLV to negotiate the connection, and runs the connection timeout timer.
Up            Connection established: periodically sends information OAMPDU packets containing the Local Information TLV and Remote Information TLV to keep the connection alive, and runs the connection timeout timer.
Event                                    Description
Ethernet OAM port UP                     The Ethernet OAM port becomes up.
Ethernet OAM port DOWN                   The Ethernet OAM port becomes down, including port down and Ethernet OAM function shutdown.
Information OAMPDU received              An information OAMPDU packet is received.
Local attribute matching passed          According to the information OAMPDU, the local attributes are matched and the matching passes.
Local attribute matching not passed      According to the information OAMPDU, the local attributes are matched and the matching fails.
Remote attribute matching passed         According to the Flags field of the information OAMPDU packet, the remote attribute matching is judged to have passed.
Remote attribute matching not passed     According to the Flags field of the information OAMPDU packet, the remote attribute matching is judged to have failed.
Connection timeout                       The connection is invalid and the timer times out.
The serious link event types of the Ethernet OAM connection and the
definitions are as follows:
Event           Definition
Link Fault      The PHY hardware detects a link fault in the receive direction.
Dying Gasp      An unrecoverable fault occurs at the local end; for example, Ethernet OAM is shut down.
Critical Event  An unpredictable serious event occurs (currently not further defined).
The link monitoring types and the definitions of Ethernet OAM connection
are as follows:
In the remote loopback test, the packets sent by the local end are looped back by the peer end, so as to test the parameters of the link, such as packet loss rate and delay.
In the remote loopback test mode, the processing of the OAM sublayer is
as follows:
EVC Technology
Main contents:
Related terms
Application description
Typical application
Related Terms
This section describes the related terms of EVC.
The difference is that there can be multiple EVPLs on one UNI, while there
can be only one EPL on one UNI.
Maipu switch does not support this kind of EVC directly, but can
support indirectly by configuring the port separation and L3 forwarding
features between UNIs.
All to one Bundling: One UNI can be configured with only one EVC; all
CE-VLAN IDs are mapped to the EVC;
The port of the UNI-N end needs to be configured as the MEP node of one CFM domain with the CC (Continuity Check) function of CFM enabled. In this way, the UNI-N end can get the connection status between the UNI ends of the EVC configured on the PE device via 802.1ag, so as to get the current operation status of the EVC.
Application Description
EVC provides the public attributes and configurations, cooperating with the
modules to realize the service functions. For details, refer to EVC
Configuration Manual. The main attributes of EVC are described as follows:
EVC type: There are two types, that is, point-to-point and multipoint-to-multipoint. Point-to-point means that there are only two UNI ports in one EVC.
Local MEP and remote MEP of EVC: The MEP is the end point used to maintain the connection and can send/receive any CFM packet. Each MEP is identified by one integer, called the MEP ID.
QINQ type: There are two kinds, double and mapping. Double supports mapping multiple CEVLANs to one single EVC, while mapping supports mapping only one single CEVLAN to one single EVC.
QINQ mode: There are two kinds, that is, one and multiple. The one mode does not need the SVLAN and CEVLAN of the EVC to be configured, adopting the port default values; the multiple mode has no such limitation.
Associate EVC to the local port and run QinQ function on the port to set up
the EVC connection. Bind EVC on the port, get the QinQ information in EVC
according to EVC ID and convert the QinQ information to the port
configuration. The UNI mapping type of the port should match with the
information in the bound EVC. The detailed matching rules are as follows:
You can bind an EVC only to a Hybrid or Trunk port, not to an Access port. If the UNI mapping of the port is ALL-TO-ONE, the port can be bound to only one EVC and all CEVLANs are mapped to that EVC.
If the UNI mapping of the port is BUNDLING, the port can be bound to multiple EVCs and each EVC can be configured with multiple CEVLANs. The CEVLANs in the multiple bound EVCs cannot be the same, and the SVLANs cannot conflict with each other.
If the UNI mapping of the port is MULTIPLEXING, the port can be bound to
multiple EVCs, but each EVC can be configured with only one CEVLAN. The
CEVLANs in the multiple bound EVCs cannot be the same and SVLANs
cannot conflict with each other.
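The binding rules above can be summarized as a validation sketch; representing an EVC as a set of CE-VLANs and the UNI mapping modes as strings is an illustrative assumption, and SVLAN conflict checks are omitted:

```python
# Hedged sketch of the EVC binding checks; representing an EVC as a set
# of CE-VLANs and the modes as strings is an illustrative assumption.
# SVLAN conflict checks are not modeled.

def can_bind(port_mode, existing_evcs, new_evc):
    if port_mode == "ACCESS":
        return False                       # only Hybrid/Trunk ports
    if port_mode == "ALL-TO-ONE":
        return len(existing_evcs) == 0     # at most one EVC on the port
    if port_mode == "MULTIPLEXING" and len(new_evc["cevlans"]) != 1:
        return False                       # one CE-VLAN per EVC
    used = set().union(*[e["cevlans"] for e in existing_evcs])
    return not (used & new_evc["cevlans"]) # CE-VLANs must not repeat

assert not can_bind("ACCESS", [], {"cevlans": {10}})
assert can_bind("MULTIPLEXING", [{"cevlans": {10}}], {"cevlans": {20}})
assert not can_bind("BUNDLING", [{"cevlans": {10, 11}}], {"cevlans": {11}})
```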
2. The application combination of EVC and ELMI (for ELMI, refer to ELMI
Technical Manual)
Bind the E-LMI protocol on the connected ports of the PE and CE devices,
and run the E-LMI protocol as the PE and CE modes. With the E-LMI
switching, the CE device can get the configuration information and status
information of all EVCs bound on the ports connected to the CE device
from the PE device. Meanwhile, when the EVC status on the PE port
changes, actively inform the CE device to update at once via the E-LMI
protocol.
One EVC needs to be bound with the CFM management domain instance.
With the CFM management domain instance, you can get the connectivity
between the UNIs in the EVC.
The current status of EVC depends on the status of all local ports and
remote ports in EVC. The status of the remote port needs to be got via
802.1ag. Therefore, EVC needs to concern and process the following
events in 802.1ag: add remote MEP, remote MEP status UP, delete remote
MEP, remote MEP status DOWN, and delete CFM management domain
information. Process the events to update the current status of EVC.
Typical Application
The following figure shows one typical application instance of combining
EVC and E-LMI.
Enable the E-LMI protocol on UNI1 between CE1 and PE1. CE1 gets the
UNI1 configuration information and the configuration and status
information of EVC_Provider from PE1 via the E-LMI protocol, so as to
complete the auto configuration function of CE1.
LLDP Technology
Main contents:
Overview
Overview
LLDP (Link Layer Discovery Protocol) is the link layer protocol defined in
802.1ab. It organizes the information of the local device as TLV
(Type/Length/Value) to be encapsulated in LLDPDU (Link Layer Discovery
Protocol Data Unit), which is sent to the direct-connected neighbor.
Meanwhile, LLDP saves LLDPDU received from the neighbor in MIB
(Management Information Base). With LLDP, the device can save and
manage the information of itself and direct-connected neighbor device for
the network management system to query and judge the communication
status of the link. LLDP does not configure or control network elements or traffic; it only reports the L2 configuration. 802.1ab also enables network management software to use the information provided by LLDP to discover certain L2 inconsistencies.
When some LLDP configuration of the local device changes (such as holdtime or the set of advertised TLV types), or when the polling mechanism, if enabled, finds that the LLDP configuration information of the local system has changed, the rapid transmitting mechanism is used so that other devices discover the change as soon as possible: a specified number of LLDPDUs (3 by default) are transmitted continuously at once, and then the normal transmitting period is resumed.
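The rapid transmitting mechanism can be sketched as a schedule computation; the interval values below are illustrative, not Maipu defaults:

```python
def tx_schedule(change_at, fast_count=3, fast_gap=1, normal_gap=30):
    """After a local configuration change, send `fast_count` LLDPDUs in
    quick succession, then fall back to the normal transmit period."""
    times = [change_at + i * fast_gap for i in range(fast_count)]
    times.append(times[-1] + normal_gap)   # first normal-period frame
    return times

assert tx_schedule(0) == [0, 1, 2, 32]
```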
When LLDP is disabled globally, or a port on which LLDP is enabled is shut down, added into an aggregation group, has LLDP disabled, or the system is reloaded, one CLOSE TLV LLDPDU is transmitted to inform the neighbor, so that the neighbor device learns quickly that the local device's LLDP is disabled.
Set the aging time of the local information on the neighbor device by
configuring holdtime. The default value is 120s. The maximum value of
holdtime is 65535s.
The device does not support sending Protocol Identity TLV, but can receive
this type of TLV.
Link Aggregation TLV: Whether the port supports the link aggregation
and whether to enable the link aggregation;
As shown in the above figure, the port 0/0/1 of SW1 is connected with
port 0/0/1 of SW2; port 0/0/2 of SW1 is connected with port 0/0/2 of SW3.
Configure LLDP function on the three devices. The three devices can
exchange information via LLDPDU and query the neighbor information of
each other. The remote NMS can be connected to the device for network
management and topology collection, so as to realize the cluster
management.
Main contents:
Related terms
Introduction
Related Terms
Dynamic MAC address: the MAC address automatically learned from packets received by the switch. When the port receives one packet, the switch searches whether the source MAC address of the packet is in the MAC address table. If not, it associates the port, VLAN, and source MAC address and saves the entry in the MAC address table.
Static MAC address: the static forwarded MAC address configured by the
user via the shell command or snmp proxy; the static MAC address and
the dynamic MAC address have the same function, but compared with the
dynamic MAC address, the static MAC address does not age.
Filter MAC address: the static filtered MAC address configured by the user via the shell command or the SNMP proxy. When the source or destination MAC address of a packet received by the switch is a filter MAC address, the packet is directly discarded.
Aging time: the existing time of the dynamic MAC address in the MAC
address table after the switch learns the MAC address.
Introduction
A MAC address entry contains the address information used to forward
packets between ports. There are three types of MAC address entries:
static, dynamic, and filter. A MAC address entry consists of the MAC
address, VLAN, port number, and entry type.
A static MAC address can only be set manually or via other software.
Compared with dynamic MAC addresses, static MAC addresses do not age and
cannot be learned; they can only be added and deleted manually. By
function, static MAC addresses are divided into three kinds: those that
forward packets normally (FWD), those that only send the packet to the
CPU without forwarding it (TRAP), and those that both send the packet to
the CPU and forward it (F&T).
A filter MAC address is global and functions on the whole switch. If a
MAC address is configured as a filter address, the host with that address
is prohibited from accessing the network via the switch; that is, any
packet whose destination or source MAC address matches it is dropped.
A dynamic MAC address is learned from the source MAC address of packets
after the switch receives them. The MAC address entry is associated and
saved according to the MAC address, VLAN ID, and port, and the MAC
address table updates its entries in this way. When the switch receives a
packet whose destination MAC address is in the MAC address table, it
forwards the packet directly. Otherwise, it floods the packet to the
other member ports of the VLAN to which the receiving port belongs. In
either case, if the source MAC address is not yet in the table, the
switch writes it into the MAC address table, that is, it learns one MAC
address. When the number of MAC addresses learned by a port reaches the
maximum value, the port stops learning and floods packets with unknown
destinations. If, after learning a MAC address, the device does not
receive another packet with that source MAC address before the aging time
of dynamic MAC addresses expires, the MAC address entry is deleted when
the aging time arrives.
Port-based MAC address learning limitation means that the user can
configure a limit on the number of dynamic MAC addresses learned by each
port. Usually, the maximum number of MAC addresses that one port can
learn is 32767. When the number of MAC addresses learned by the port
reaches 32767, the port learns no more; new MAC addresses cannot be
learned until existing MAC addresses age out, and packets with unlearned
destination addresses are flooded.
The function of static forwarding MAC addresses and dynamic MAC addresses
is fast forwarding; that is, the MAC address table is a fast forwarding
table that makes packets be forwarded rapidly and correctly via the
specified port, so as to prevent packets from being broadcast in the
whole VLAN.
Note
Static MAC address entries configured manually by the user and filter MAC
address entries cannot be overwritten by dynamic MAC address entries, but
dynamic MAC address entries can be overwritten by static MAC address
entries and black-hole MAC address entries.
Basic concepts
Technology theory
Realizing method
Typical application
Basic Concepts
With the evolution of network technology and network convergence, the
data transmission and switching mode that uses the packet as the basic
unit will be dominant in the next generation network. Both IP networks
and MPLS networks are representatives of packet switching networks.
However, the next generation network (NGN) cannot be constructed
overnight. The current PDH/SDH networks serving PSTN public voice
communication services will exist for a long time, and the existing TDM
devices of users on the network will still be used. To protect users'
investments in TDM devices, the next generation packet switching network
must provide the capabilities of accessing TDM services and transmitting
TDM data transparently.
For transparently transmitting TDM circuit switching services over the
packet switching network, several standards organizations have put
forward their own standards and solutions. Currently, TDM circuit
emulation is the most mature.
The standards put forward by MEF focus on how to encapsulate the original
TDM service into Ethernet frames, while the MFA standards focus on how to
carry the TDM service on the MPLS network. ITU-T standards also focus on
the data layer; they provide a mode for MPLS to carry TDM service data
and a mode for IP to carry TDM service data. Besides, ITU-T defines the
clock transmission solutions that are important for the TDM service.
Commonly-used Terms
PWE3 (Pseudo Wire Emulation Edge to Edge): IETF defines the meaning of
PW in RFC3985, that is, an emulation that uses a packet switching network
to carry a native service;
IWF (Interworking Function): the device that switches data between two
different networks;
CE (Customer Edge): the device that initiates and terminates the TDM
service;
PE (Provider Edge): the device that provides the PW, which is equivalent
to the IWF;
Bundle: the bit flow sent by the TDM circuits of the PE devices at the
two sides of the PW; it can comprise any number of 64Kbps time slots in
one E1 or T1. A Bundle is a uni-directional data flow; it is usually
paired with an opposite Bundle to form full-duplex communication. There
can be several Bundles between two PE devices.
Technical Theory
The IETF PWE3 working group plays a leading role in making the standards
for transparent transmission of the TDM service.
1. SAToP protocol
RFC4553 (SAToP) provides the emulation function for low-rate PDH circuit
services such as E1/T1/E3/T3. SAToP transmits unstructured (that is,
unframed) E1/T1/E3/T3 service data. It segments the TDM service as a
serial data stream, encapsulates it, and transmits it on the PW tunnel.
Among the elements of the TDM emulation service described in the above
section, the protocol can provide transparent transmission of the TDM
service and transmission of the synchronous timing information, but it
cannot identify the TDM frame structure. Therefore, the TDM frame
structure and the signaling in the TDM frame cannot be identified or
processed, and can only be transmitted transparently. The protocol is the
simplest mode of transparently transmitting low-rate PDH services in the
TDM circuit emulation scheme. It is also because it is simple to realize
that IETF released it as a formal RFC standard.
The L2TPv3 mode adopts the L2TPv3 packet header to encapsulate the PWE3
packet and uses different session IDs to distinguish different PWs. The
mode can adopt L2TPv3 protocol negotiation to set up the outer tunnel and
distribute different session IDs to the different PWs in the tunnel via
the protocol. It is more flexible in use than the UDP/IP mode.
The MPLS mode adopts the MPLS label to encapsulate the PWE3 packet and
adopts the LSP as the outer tunnel of the PW. The PW label is the
innermost label of the MPLS label stack. In the MPLS mode, the user can
perform dynamic PW label negotiation.
2. CESoPSN protocol
Compared with SAToP, CESoPSN provides a structured TDM service emulation
function; that is, it can identify, process, and transmit the frame
structure and the signaling in the TDM frame. Take E1 as an example. A
structured E1 comprises 32 time slots. Except for time slot 0, the other
31 time slots can each carry one 64Kbps voice service. Time slot 0 is
used to transmit the signaling and the frame alignment symbol. The
CESoPSN protocol can identify the frame structure of the TDM service, so
idle time slot channels do not need to transmit data; only the time slots
in use by the CE device are encapsulated from the E1 service flow into
the PWE3 packet. Meanwhile, the functions of identifying and transmitting
the CAS and CCS signaling in the E1 service flow are provided.
Besides the TDM service data, CESoPSN provides the scheme of identifying
and transmitting the CAS signaling.
3. TDMoIP protocol
The PW encapsulation modes (UDP/IP mode, L2TPv3 mode, MPLS mode, and MEF
mode) on different PSN networks are described above. Both SAToP and
CESoPSN take the TDM bit flow as the payload encapsulated in the PW,
while TDMoIP adds three new TDM payload types: the AAL1 payload, AAL2
payload, and HDLC payload. Currently, the TDM service products developed
by Maipu support the HDLC TDM payload.
Besides, the PWE3 working group of IETF defines the structured circuit
emulation scheme for the high-end and low-end channel of SONET/SDH to
transmit the VC11/VC12 and VC2 TDM service data transparently via the
PWE3 mode.
Figure 21-2 Mapping relation between the function layer and MEF packet
encapsulation
Key Technologies
Data Jitter Buffer
After PW packets cross the packet switching network and reach the egress
PE device, their arrival intervals may vary and the packets may arrive
out of order. To ensure that the TDM service data flow can be
re-constructed on the egress PE device, the jitter buffer technology is
needed to smooth the intervals of the PW packets and re-order the packets
that arrive out of order. The capacity of the jitter buffer is a
performance trade-off: a jitter buffer with large capacity can absorb
large variations in packet transmission intervals in the network, but
introduces a large delay when re-constructing the TDM service data flow.
Providing a jitter buffer whose capacity the user can configure and
adjust is a good policy; the user can configure it flexibly according to
different network delay and jitter. Currently, the TDM circuit emulation
products developed by Maipu support configuring jitter buffers of
different capacities via commands.
emulation mode to transmit the TDM service transparently, the data delay
comprises the following aspects: packet encapsulation delay, service
processing delay, and network transmission delay.
2. Service processing delay is the time for the device to process the
packet, including packet validity check, packet filtering, parity
check and calculation, packet encapsulation, and packet receiving
and sending. The delay depends on the service processing capability
of the device; for a given device, it is fixed.
The TDM service delay depends on the above three kinds of delays.
Comparing the two modes, the unstructured mode is simpler: it does not
need to identify the frame format in the TDM data flow and is more
commonly used. For devices in the traditional data network that take
E1/T1 as a synchronous serial interface (that is, ignore the frame
format) and adopt clear-channel transmission, the unstructured TDM
Pseudo Wire Emulation is more convenient.
Realizing Methods
PWE3 Packet Format
Currently, Maipu only supports PWE3 packets encapsulated in the UDP/IPv4
mode. As shown in Figure 21-3, the TDM service data is encapsulated in
the TDMoIP PAYLOAD field of the packet.
The format of the UDP/IPv4 header is as shown in Figure 21-4. The source
IP address is the local address of the Pseudo Wire; the source addresses
of all PWE3 packets sent from the local device are the same. The
destination IP address is the remote address of the Pseudo Wire; the
destination IP addresses of PWE3 packets sent to different Pseudo Wires
are different. The UDP destination port number is fixed as 2142, the port
number assigned by IANA for TDM over IP; it identifies a PWE3 packet
encapsulated in the UDP/IP mode. The UDP source port number is used to
distinguish the PWE3 packets of different Bundles on one Pseudo Wire, and
its value range is 1-8063.
The control word provides a method of exchanging TDM circuit status and
PSN network status for the PWE3 packet. The format is as shown in Figure
21-5. RES is a reserved field and must be set to 0. The L bit indicates a
local failure; the L bit is set to 1 when a fault is detected or reported
locally. A fault at the TDM physical layer makes the data incomplete, so
the bit can be used to indicate a physical-layer failure and to trigger
generation of the AIS signal at the remote side. After the TDM fault is
fixed, the L bit is cleared. The R bit indicates a remote receiving
fault; the R bit set to 1 means that the remote end does not receive
packets from the Ethernet port. The R bit can be used to advertise
congestion or other network faults, and receiving the remote fault
indication can trigger a fallback mechanism to avoid the congestion. The
R bit is set to 1 after a pre-set number N of successive packets are not
received; after packets are received again, the R bit is cleared.
The FRG field indicates the fragmentation type and is used for the CAS
multi-frame structure in the CESoPSN protocol. When FRG is 00, the whole
multi-frame is in one packet; 01 means that the packet carries the first
fragment of the multi-frame; 10 means that the packet carries the last
fragment; 11 means that the packet carries a middle fragment. The LENGTH
field indicates the total bytes of the control word, payload, and RTP
header (if present); it is used when the packet length is less than 64
bytes. When the length is 64 bytes or more, the field is set to 0. The
SEQUENCE NUMBER field is the serial number of the packet. The initial
value is random and it increases with each sent packet; when reaching the
maximum value, it wraps around to 0. The field is used to check whether
packets are lost.
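The control word fields described above can be decoded with simple bit operations. The sketch below is illustrative only: the exact bit offsets are defined by the figure referenced in the text, so the layout used here (L, R, two reserved bits, FRG, 10-bit LENGTH, 16-bit SEQUENCE NUMBER, most significant bit first) is an assumption:

```python
# Decode a 32-bit PWE3 control word per the field description above.
# Assumed bit layout (MSB first): L(1) R(1) RES(2) FRG(2) LENGTH(10) SEQ(16).

def parse_control_word(word):
    return {
        "L":   (word >> 31) & 0x1,     # local TDM failure indication
        "R":   (word >> 30) & 0x1,     # remote receive failure indication
        "FRG": (word >> 26) & 0x3,     # 00 whole multiframe, 01 first,
                                       # 10 last, 11 middle fragment
        "LEN": (word >> 16) & 0x3FF,   # set only when packet < 64 bytes
        "SEQ": word & 0xFFFF,          # per-packet sequence number
    }

def next_seq(seq):
    # The sequence number increases per packet and wraps around to 0.
    return (seq + 1) & 0xFFFF

cw = parse_control_word(0x84000005)    # L=1, FRG=01, SEQ=5
assert cw["L"] == 1 and cw["FRG"] == 1 and cw["SEQ"] == 5
assert next_seq(0xFFFF) == 0           # wrap-around at the maximum value
```

A receiver can detect packet loss by comparing each arriving SEQ with `next_seq` of the previous one.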
The RTP header is used to carry clock information and assist the
receiving end in recovering the TDM clock across the PSN network. The
format is as shown in Figure 21-6. V is the version and is fixed as 2. P
is the padding bit and is fixed as 0. CC is the CSRC count and is fixed
as 0. M is the marker bit and is fixed as 0. The PT field indicates the
payload type; each Bundle has a unique value. SN is the serial number of
the packet and is the same as the SEQUENCE NUMBER in the control word. TS
is the time stamp; it has two generating modes: the absolute mode (the
time stamp comes from the clock recovered on the TDM line and increases
by 1 every 0.125ms) and the relative mode (the time stamp comes from a
common clock and increases by 1 for each received bit). SSRC indicates
the synchronization source.
SAToP Protocol
The TDM port on the PE works in the unframed mode and ignores the TDM
frame structure information; the received data is regarded as a bit flow
with a fixed rate. As shown in Figure 21-7, SAToP processes the TDM flow
with the byte (8 bits) as the unit. Every N received TDM bytes are
encapsulated into the TDM payload of a PWE3 packet and sent to the PSN
network. After the PE device at the other side of the Pseudo Wire
receives the packet, it decapsulates the TDM payload from the PWE3 packet
and sends it to the TDM port.
number of encapsulated bytes: the more bytes encapsulated per packet, the
larger the packet encapsulation delay, and the fewer packets generated
per unit time.
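This trade-off can be quantified directly from the E1 line rate of 2.048 Mbps. A minimal sketch (the payload sizes chosen are examples, not recommended values):

```python
# SAToP packetization trade-off at the E1 line rate (2.048 Mbps):
# more payload bytes per packet -> larger fill delay, fewer packets/s.
E1_RATE_BPS = 2_048_000

def packetization_delay_ms(payload_bytes):
    # Time to accumulate payload_bytes from the E1 bit flow.
    return payload_bytes * 8 / E1_RATE_BPS * 1000

def packets_per_second(payload_bytes):
    return E1_RATE_BPS / (payload_bytes * 8)

# 256 payload bytes: 1 ms fill delay and 1000 packets per second.
assert packetization_delay_ms(256) == 1.0
assert packets_per_second(256) == 1000.0
# Doubling the payload doubles the delay and halves the packet rate.
assert packetization_delay_ms(512) == 2.0
assert packets_per_second(512) == 500.0
```

Choosing the payload size is therefore a balance between delay and per-packet header overhead.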
CESoPSN Protocol
The TDM port on the PE works in the framed mode, which is divided into
non-CAS and CAS modes according to the TDM service type.
Non-CAS mode
As shown in Figure 21-8, CESoPSN processes the TDM flow with the frame as
the unit. After every N frames are received, the data of the specified
time slots (time slots 4 and 25) is encapsulated into the TDM payload of
a PWE3 packet and sent to the PSN network. After the PE device at the
other side of the Pseudo Wire receives the packet, it decapsulates the
TDM payload from the PWE3 packet, inserts the data into the specified
time slots (time slots 4 and 25) respectively, and sends it to the TDM
port.
CAS mode
As shown in Figure 21-9, the TDM flow has the CAS multi-frame structure;
an E1 CAS multi-frame comprises 16 basic frames, and time slot 16 of each
basic frame is used to carry the signaling and multi-frame
synchronization. CESoPSN processes the TDM flow with the CAS multi-frame
as the unit. The data of the specified time slots (time slots 2, 4, and
25) in each basic frame is encapsulated into the TDM payload of the PWE3
packet in order, beginning with the first basic frame of the multi-frame
and ending with the last basic frame.
In the fragmentation mode, the packet encapsulation delay = the number of
basic frames in the sub multi-frame × the frame time = the number of
basic frames in the sub multi-frame / the frame rate. Take the E1 CAS
multi-frame as an example. The E1 rate is 2.048Mbps; each frame contains
32 time slots; 8000 frames are transmitted every second, so the frame
time is 0.125ms; each sub multi-frame contains 4 basic frames. Therefore,
the delay for encapsulating one PWE3 packet is 4 × 0.125ms = 0.5ms. The
packet encapsulation delay increases with the number of basic frames in
the sub multi-frame: the more basic frames in the sub multi-frame, the
larger the packet encapsulation delay.
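The delay arithmetic above can be checked with a few lines of Python. The constants follow the E1 figures given in the text (8000 frames per second, 0.125 ms frame time); the helper name is illustrative:

```python
# CESoPSN packet fill delay: basic frames per packet x E1 frame time.
E1_FRAME_TIME_MS = 0.125   # 8000 frames per second

def encapsulation_delay_ms(frames_per_packet):
    return frames_per_packet * E1_FRAME_TIME_MS

# The worked example above: a sub multi-frame of 4 basic frames -> 0.5 ms.
assert encapsulation_delay_ms(4) == 0.5
# A whole 16-frame CAS multi-frame in one packet would take 2 ms to fill.
assert encapsulation_delay_ms(16) == 2.0
```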
HDLC Mode
SAToP and CESoPSN circuit emulation modes are called flow mode
(transparent transmission mode), because the encrypted in the packet is
the original bit flow. The purpose is to transmit the TDM bit flow without
any change between two TDM devices.
In the HDLC mode, however, only the HDLC frames present in the TDM bit
flow are transmitted, as shown in Figure 21-11. No matter whether the TDM
flow is framed or not, it is processed with the HDLC frame as the unit;
that is, the device searches for the frame head and frame tail of HDLC
frames in the bit flow. When a complete HDLC frame is received, its data
is encapsulated into the TDM payload and sent to the PSN network. After
the PE device at the other side of the Pseudo Wire decapsulates the PWE3
packet, the payload is reassembled into an HDLC frame and inserted into
the TDM bit flow.
As shown in Figure 21-12, the gateway (IWF) at the clock source side
sends timing information to the peer gateway regularly. The timing
information is provided with the T1/E1 emulation packet. At the other
side, the gateway extracts the time stamp from the packet and recovers
the service clock (f-service) via an algorithm.
The core theory of the algorithm is that the left IWF device sends the
packet to the destination IWF device according to its own source clock.
The destination IWF device uses one queue to buffer the packet, and uses
its own local clock to send it out. If the source clock and the destination
local clock are not consistent, even if only a very small difference, it
results in the depth change of the buffer queue in the destination device.
Therefore, we can judge whether the local clock is consistent with the
source clock according to the depth of the queue. If the queue depth
keeps increasing, the local clock is slower than the source clock and
needs to be adjusted faster; if the queue depth keeps decreasing, the
local clock is faster than the source clock and needs to be adjusted
slower. This is a negative
feedback mechanism. After it becomes stable, we will find that the local
clock at the destination is the same as the source clock in the long run. In
this way, the frequency synchronization is complete between two IWF
devices on the IP network.
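The negative-feedback idea can be sketched as a toy simulation. Everything here is an illustrative assumption, not Maipu's actual algorithm: the tick model, the gain, and the buffer units are chosen only to make the feedback visible:

```python
# Toy negative-feedback clock recovery driven by jitter-buffer depth.
# A growing queue means the local clock drains too slowly, so steer the
# frequency up; a shrinking queue steers it down.

def recover_clock(source_hz, nominal_hz=2_048_000, gain=10.0,
                  target_depth=50.0, steps=20000):
    depth = target_depth
    local_hz = nominal_hz
    for _ in range(steps):
        # Net buffer fill per tick: the source writes, the local clock drains.
        depth += (source_hz - local_hz) / 8000.0
        # Proportional negative feedback on the depth error.
        local_hz = nominal_hz + gain * (depth - target_depth)
    return local_hz

# A source running 500 Hz below nominal is tracked closely after settling.
f = recover_clock(source_hz=2_047_500)
assert abs(f - 2_047_500) < 1.0
```

After the loop settles, the recovered frequency equals the source frequency in the long run, which is the frequency synchronization property described above.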
jitter of the IP network is not cumulative, so you can use statistical
methods such as averaging to perform the filtering.
Figure 21-13 The connection and aggregation of MAN private lines
As shown in Figure 21-13, the TDM circuit emulation technology can be
used to connect and aggregate MAN private lines. For example, a district
LAN is connected to the PBX switches of the branches in the district to
provide the E1 voice access function and realize communication within the
district; this can also be realized by connecting the district to the
PSTN. The TDM circuit emulation service emulates the TDM physical
transmission mode and does not perceive the actual services transmitted
in the E1. DDN, FR, and ATM services over E1 can all be transmitted
transparently via the TDM circuit emulation mode.
TDMoIP Gateway in the figure is the PWE3 device. The PWE3 packet
formats on paths are as shown in Figure 21-14.
Loopback Detection
Technology
Introduction to Loopback
Detection
Ethernet is a broadcast network. When the destination of a packet cannot
be identified, the switch broadcasts the packet within a VLAN. When there
is a loop in the network, packets are forwarded repeatedly; eventually
the network bandwidth is exhausted and communication becomes impossible.
Enable the loopback detection function on a port to send Loopback packets
at an interval and check whether there is a loop in the network. When the
port receives a Loopback packet sent by the local device, the switch
parses the source port from the loopback packet, sets that port to
ERR-DISABLE, and prints log information.
This section describes the theory of the loopback detection protocol and
how to realize it.
There are two cases of loops. In one case, the loop is between different
ports of the switch; for example, because of a cabling error, two ports
of one switch are connected to each other. In the other case, the loop is
on a single port of the switch; for example, the port is connected to a
bridge device and the Ethernet port of the bridge loops. In the first
case, you can use STP to detect the loop, but in the second case, STP is
useless and you should adopt other detection methods.
The theory of port loopback detection is to send a special packet
periodically. In the normal state, the device that receives the packet
drops it. If there is a loop, the packet is returned to the source port.
By comparing the received packet with the sent packet, you can determine
whether there is a loopback.
Ethernet type field (2 bytes): the protocol type number of the loopback
packet, 0x9000;
Port Index field (2 bytes): the number of the port that sends the
loopback packet;
The port configured with loopback detection sends a detection packet at
an interval. The DMAC of the packet is one MAC of the switch (derived
from the base MAC); the SMAC is one MAC of the switch (derived from the
base MAC); the Skip counter is 0; the Message type is 0x0100; the Receipt
number is the port number. If the port does not belong to any VLAN, only
one untagged loopback packet is sent. Otherwise, if the port belongs to
one or multiple VLANs, besides one untagged loopback packet, a tagged
loopback packet is sent for each VLAN in which the port is tagged.
When a port that is not configured with loopback detection receives a
loopback packet, it drops the packet. Otherwise, the switch checks
whether the DMAC and SMAC of the packet are MAC addresses of the device.
If yes, it reports the port loopback to the user; if the port is in the
controlled state, the switch also shuts down the port. Otherwise, it does
not shut down the port and only reports the port loopback to the user.
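The build-and-recognize cycle above can be sketched in Python. The field order and offsets are illustrative assumptions (the text gives the values but not the exact layout), and the base MAC is a made-up example:

```python
# Sketch of a loopback detection frame: DMAC and SMAC are both derived
# from the switch base MAC; field values follow the text (Ethernet type
# 0x9000, Skip counter 0, Message type 0x0100, Port Index = sending port).
# Field ORDER here is an assumption for illustration.
import struct

BASE_MAC = bytes.fromhex("001122334455")   # hypothetical switch base MAC

def build_loopback(port_index):
    return struct.pack("!6s6sHBHH", BASE_MAC, BASE_MAC,
                       0x9000,       # Ethernet type of the loopback packet
                       0,            # Skip counter
                       0x0100,       # Message type
                       port_index)   # Port Index of the sending port

def is_own_loopback(frame):
    # A loop exists if the frame carries our own base MAC as DMAC and SMAC.
    dmac, smac, etype, _, _, port = struct.unpack("!6s6sHBHH", frame[:19])
    return etype == 0x9000 and dmac == BASE_MAC and smac == BASE_MAC, port

looped, port = is_own_loopback(build_loopback(7))
assert looped and port == 7   # loop found: port 7 would go to ERR-DISABLE
```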
Typical Application
When using the loopback detection, ensure that the corresponding port is
configured with the loopback detection function and works in the same
detection mode.
Illustration
Command Description
switch1(config)#loopback-detection enable - Enable the port loopback detection globally
switch1(config)#port 0/1 - Enter the port configuration mode
switch1(config-port-0/1)#port hybrid tagged vlan 10 - Add port 0/1 to VLAN 10 in tagged mode
switch1(config-port-0/1)#loopback-detection enable interval-time 10 - Set the interval of sending the loopback detection packets of port 0/1 to 10s
switch1(config-port-0/1)#loopback-detection enable - Enable the port loopback detection
switch1(config-port-0/1)#exit - Complete the loopback detection configuration
Command Description
switch2(config)#port 0/2-0/4 - Enter the port configuration mode
switch2(config-port-range)#port hybrid tagged vlan 10 - Add port 0/2, port 0/3, and port 0/4 to VLAN 10 in tagged mode
Main contents:
Super-VLAN theory
Super-VLAN realization
Typical application
Super-VLAN Theory
Super-VLAN, also called VLAN aggregation, associates multiple Sub-VLANs,
and the IP address is configured on the Super-VLAN interface. Each
Sub-VLAN is one broadcast domain, and different Sub-VLANs are separated
from each other at L2. To realize intercommunication between different
Sub-VLANs, the ARP proxy function is needed: with the ARP proxy, the
switch forwards and processes ARP request and response packets, so as to
realize L3 intercommunication between L2-separated ports. Hosts in the
Sub-VLANs use the IP address of the Super-VLAN interface as the gateway
for L3 communication. In this way, IP addresses are saved.
Super-VLAN Realization
After the Super-VLAN_1 interface receives an ARP request packet, the
switch checks (for example, whether the ARP proxy is configured) and
judges whether forwarding is needed. If yes, it modifies the sender MAC
address of the ARP packet to the MAC address of the interface and
forwards the packet to the other Sub-VLANs.
PC3 answers with an ARP response packet, and the switch interface
receives the ARP response.
After the switch receives the ARP response, it performs a series of
processing and judges whether to answer the original ARP requester (PC1).
When answering, it modifies the source MAC address of the ARP response
packet to the MAC address of the interface.
After PC1 receives the ARP response packet, the IP packet sent from PC1
to PC3 is sent to the Super-VLAN_1 interface. The switch forwards the IP
packet to PC3 via the Super-VLAN_1 interface.
After the Super-VLAN_1 interface receives the ARP response packet of PC3,
the switch searches the ARL table according to the destination IP address
of the ARP packet. According to the recorded binding of IP address, VLAN
ID, and port, the switch knows which VLAN PC1 is in and from which port
the packet to PC1 should be sent out. In this way, the packet does not
need to be forwarded in all other VLANs.
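The ARP proxy decision can be modeled in a short sketch. All names, MAC addresses, and table contents below are invented for illustration; they are not Maipu data structures:

```python
# Toy model of the Super-VLAN ARP proxy: hosts in different Sub-VLANs are
# L2-separated, so the Super-VLAN interface relays ARP on their behalf,
# substituting its own MAC as the sender MAC.

SUPER_VLAN_MAC = "00:00:5e:00:01:0a"       # hypothetical interface MAC
SUB_VLAN_OF = {"PC1": 5, "PC3": 8}         # host -> Sub-VLAN binding

def proxy_arp(request):
    sender, target = request["sender"], request["target"]
    if SUB_VLAN_OF[sender] == SUB_VLAN_OF[target]:
        return None                         # same Sub-VLAN: no proxy needed
    # Different Sub-VLANs: rewrite the sender MAC to the interface MAC
    # before forwarding the request into the target's Sub-VLAN.
    return {"sender": sender, "target": target,
            "sender_mac": SUPER_VLAN_MAC,
            "out_sub_vlan": SUB_VLAN_OF[target]}

fwd = proxy_arp({"sender": "PC1", "target": "PC3"})
assert fwd["sender_mac"] == SUPER_VLAN_MAC and fwd["out_sub_vlan"] == 8
```

Because the sender MAC is rewritten, PC3's reply returns to the Super-VLAN interface, which then answers PC1, exactly the relay sequence described above.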
Typical Application
Create Super-VLAN 10 and the Sub-VLANs: VLAN 5, VLAN 6, and VLAN 8. Port
0/2 and port 0/3 belong to VLAN 5; port 0/4 and port 0/5 belong to VLAN
6; port 0/6 and port 0/7 belong to VLAN 8. L2 separation is performed
between the different VLANs, so all Sub-VLANs use the L3 interface of the
Super-VLAN as the gateway to communicate with the outside, so as to
realize L3 communication between different Sub-VLANs.
Port 0/2 and port 0/3 are added to VLAN 5; port 0/4 and port 0/5 are
added to VLAN 6; port 0/6 and port 0/7 are added to VLAN 8. Create
Super-VLAN 10 and enable the ARP proxy function; add VLAN 5, VLAN 6, and
VLAN 8 to Super-VLAN 10. Create the VLAN interface of Super-VLAN 10
(interface vlan 10) and configure a suitable IP address on the VLAN
interface. The configuration of the basic functions of Super-VLAN is then
complete.
L3 Multicast Technology
This chapter describes the IP multicast theory and the related multicast
protocols. IGMP (Internet Group Management Protocol) is mainly used to
manage the group membership relation between hosts and the route/switch
device. The dynamic multicast routing protocol is used to maintain a
consistent multicast route table across the whole network. The multicast
common part maintains a multicast forwarding table calculated from the
multicast route table. When multicast service packets are received, the
route/switch device searches the multicast forwarding table to confirm
whether and how to forward the packets.
Note: The term route/switch device used in this chapter means a router or
an L3 switch with the routing function.
Main contents:
Introduction to multicast
Introduction to Multicast
When the destination of information (including data, voice, and video) is
a group of users in the network, several transmission modes can be
adopted. For example, the unicast mode sets up a separate data
transmission path for each user; the broadcast mode transmits the
information to all users in the network, and they all receive the
broadcast information whether they need it or not. Both modes waste a lot
of bandwidth resources. Moreover, the broadcast mode is not conducive to
the security and confidentiality of the information.
If there are the route/switch devices that do not support the multicast, the
multicast route/switch device can adopt the tunnel mode to encapsulate
the multicast packets in the unicast IP packet and then send it to the
neighboring route/switch device. And then the neighboring multicast
route/switch device removes the unicast IP head and continues to perform
the multicast transmission until reaching the destination.
Multicast distribution tree: In the multicast model, the source host can
send information to any host that has joined the multicast group. The
path along which the IP multicast service packets travel in the network
is called the multicast distribution tree, which includes the source tree
and the shared tree.
Source tree: The root of the tree is the multicast information source.
The branches form the distribution tree that reaches the receiving
stations via the network. A source tree that runs through the network
along the shortest path is called a shortest path tree (SPT).
Shared tree: The shared tree does not use the information source as the
tree root, but adopts a selected point in the network as the public root,
which is called the Rendezvous Point (RP).
IP Multicast Address
The IP multicast address is used to identify one IP multicast group. IANA
assigns the Class D address range to multicast, which spans 224.0.0.0 to
239.255.255.255. The first four bits of an IP multicast address are
always 1110.
Only 23 of the remaining 28 bits of the IP multicast address are mapped
to the MAC address, so 32 IP multicast addresses are mapped to one MAC
address.
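This 23-bit mapping is easy to demonstrate: the low 23 bits of the IP address are copied into the well-known 01:00:5e multicast MAC prefix, so addresses differing only in the 5 unmapped bits collide. A minimal sketch:

```python
# Map an IP multicast address to its Ethernet multicast MAC address:
# the low 23 bits of the IP address fill the 01:00:5e:00:00:00 prefix.
import ipaddress

def multicast_mac(ip):
    low23 = int(ipaddress.IPv4Address(ip)) & 0x7FFFFF   # keep 23 bits
    mac = 0x01005E000000 | low23
    return ":".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -1, -8))

assert multicast_mac("224.0.0.1") == "01:00:5e:00:00:01"
# 224.1.1.1 and 225.1.1.1 differ only in the 5 unmapped bits,
# so 32 IP multicast groups share one MAC address.
assert multicast_mac("224.1.1.1") == multicast_mac("225.1.1.1")
```

This collision is why L2 multicast filtering based on MAC addresses alone cannot perfectly separate all IP multicast groups.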
IP Multicast Features
In common TCP/IP routing, the transmission path of a packet is from the
source address to the destination address, transmitted hop by hop in the
IP network. In the IP multicast environment, however, the destination of
the packet is not one host but a group of hosts, identified by a group
address. All information receivers join one group, and once they join,
the data sent to the group address is transmitted to them at once; all
members in the group receive the packets. Therefore, to receive the
packets, a host must first become a member of the multicast group, while
the sender of the packets does not need to be a member. In the multicast
environment, the data is sent to all members in the group, and users that
are not members of the group do not receive the data.
1. There is no limitation on the location or number of group members.
That is to say, an individual host can join or leave a multicast
group at any time, the members can be at any place on the Internet,
and one host can be a member of more than one multicast group at a
time;
2. A host can send packets to a multicast group even if it is not a
member of the group. Transmitting a multicast packet to all hosts in
a multicast group is like unicast: only one packet needs to be sent,
to the group address;
3. The route/switch device does not need to save the membership of all
hosts. It only needs to know whether there are hosts belonging to a
multicast group on the segment of each physical interface; each host
only needs to save the information of the multicast groups it has
joined.
The multicast routing protocol in dense mode is suitable for small
networks. It assumes that each subnet in the network has at least one
receiver interested in the multicast group, so the multicast packets are
spread to all nodes in the network and the related resources (such as
bandwidth and CPU of the route/switch devices) are consumed. To reduce
the consumption of precious network resources, the multicast routing
protocol in dense mode prunes the branches without multicast data
forwarding and only reserves the branches that contain receivers.
To send data to the specified address, the sender first registers at the
Rendezvous Point and then sends the data to the Rendezvous Point. When
the data reaches the Rendezvous Point, the multicast packets are copied
and transmitted to the receivers along the distribution tree paths. The
copying only happens at the branch points of the distribution tree and
repeats automatically until the packets reach the destinations.
The typical multicast routing protocol in sparse mode is PIM-SM.
The RPF process uses the existing unicast routing table to determine the upstream and downstream adjacent nodes. A packet is forwarded downstream only when it arrives on the interface (called the RPF interface) that faces the upstream adjacent node. RPF ensures that packets are forwarded correctly according to the multicast routing configuration and avoids loops that could otherwise arise for various reasons. Loop avoidance is an important problem in multicast routing. The core of RPF is the RPF check: after receiving a multicast packet, the route/switch device first performs the RPF check, forwards the packet only if the check passes, and drops it otherwise. The RPF check proceeds as follows:
1. The route/switch device looks up the RPF interface of the multicast source or of the RP in the unicast routing table. When the source tree is used, it looks up the RPF interface of the multicast source; when the shared tree is used, it looks up the RPF interface of the RP. The RPF interface for an address is the output interface the route/switch device would use to send an IP unicast packet to that address;
2. If the multicast packet arrives on the RPF interface, the RPF check passes and the route/switch device forwards the packet out the downstream interfaces. Otherwise, the packet is dropped.
The following figure shows the RPF check process when the source tree is
used.
RPF check
As the RPF check process shows, the check uses the interface on the shortest path from the route/switch device back to the multicast source or RP, which is why the mechanism is called Reverse Path Forwarding.
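The two-step check above can be sketched as follows. This is a minimal illustration only; the dictionary-based route table and the interface names are hypothetical, not part of any device implementation.

```python
def rpf_check(packet_iif, rpf_address, unicast_route_table):
    """Return True if the packet passes the RPF check.

    rpf_address is the multicast source (source tree) or the RP
    (shared tree); unicast_route_table maps an address to the
    output interface used to reach it (the RPF interface).
    """
    rpf_interface = unicast_route_table.get(rpf_address)
    return packet_iif == rpf_interface

# A packet claiming to come from 10.0.0.1 must arrive on the
# interface the unicast table uses to reach 10.0.0.1.
routes = {"10.0.0.1": "eth0"}
assert rpf_check("eth0", "10.0.0.1", routes) is True   # forwarded
assert rpf_check("eth1", "10.0.0.1", routes) is False  # dropped
```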
IP Multicast Application
Information Distribution
IP multicast allows a company's data to be distributed to many users. For example, a company with several chain stores can use multicast to transmit price information to the cash registers of the stores, or a media provider can deliver live, real-time information over the Internet to users that support multicast, for applications such as remote employee management and remote education.
Data Broadcast
Traditional data broadcast is based on broadcasting and occupies a large amount of Internet bandwidth. With multicast technology, TV and radio stations can deliver programs only to the users that actually want the data, and can also reduce network maintenance costs.
IGMP querier: The IGMP querier periodically sends IGMP query packets to discover whether any host on the LAN attached to the route/switch device is a member of, or is applying to join, a multicast group. In addition, in version 2 the querier sends a group-specific query in response to an IGMP leave message from a group member; in version 3 it can also send a source-specific query for specified multicast sources. Usually a host does not generate query packets; it returns a membership report packet as required only when it receives a query packet.
With the above mechanism, the multicast route/switch device builds a table that records which subnets attached to its interfaces have active members of each multicast group, together with a timer for each group. The table records one member of each multicast group; it does not need to record all members. When the route/switch device receives packets for a group G, it forwards them only to the interfaces that have members of group G. How packets are forwarded between route/switch devices is determined by the multicast routing protocol and is not a function of IGMP.
IGMP V1
Packet Format
IGMP is part of IP. IGMP packets are encapsulated in IP packets with IP protocol number 2. IGMP packets are transmitted with TTL 1, and the checksum is carried in the IP header.
Version number: 1
The host that receives the query fills in the report packet the address of the multicast group it has joined, and multicasts the report to that multicast address;
When the other hosts that have joined the multicast group receive this report, they suppress sending their own report packets;
IGMP V2
Improvement Compared with V1
Query selection process
group address
Type: type;
Three kinds of IGMP messages are involved in the interaction between the host and the route/switch device:
The two query messages are distinguished by the group address. For the general query, the group address is 0; for the group-specific query, it contains the address of the multicast group being queried.
The Maximum Response Time field is valid only in membership queries. It defines the maximum waiting time before answering a membership query (in units of 1/10 s). In all other messages the sender sets it to 0 and the receiver ignores the field.
host multicasts a V2 membership report to the group with TTL 1. If the host receives a report from another host (version 1 or 2) before its own timer expires, it stops the timer and does not send a report, which reduces duplicate reports.
When it receives a membership report, the route/switch device adds the group to its multicast group member list and sets a timer for it with the value Group Membership Interval (GMI). Receiving another report for the group refreshes the timer. If the timer expires, the route/switch device concludes that there is no local group member and that it does not need to forward multicast packets for that group on the attached network.
When a host joins a multicast group, it sends a V2 membership report at once, in case it is the first member of the group on the network. Because the report may be lost, the host re-sends the membership report at least once after the Unsolicited Report Interval (URI).
When the querier receives a leave group message from a group member, it sends a group-specific query to the group being left, to confirm whether other active group members remain on the subnet. Any other active group members answer with membership reports. If no report message arrives within the last member query interval, the route/switch device concludes that the group has no local member.
Inter-operation of V1 and V2
Route/Switch Device Serving as Multicast Host
If a route/switch device that supports V2 receives a V1 IGMP membership query, it records that the current querier is a V1 device and sets a timer. The timer is reset whenever a V1 membership query is received. If the timer expires, the route/switch device returns to V2 behavior.
IGMP V3
Improvement Compared with V1 and V2
Adds the V3-specific membership report packet; one packet can report multiple group records, and each group record can indicate which sources are to be received or refused;
When the number of sources in a query packet is 0, the query packet is 4 bytes longer than the V2 packet;
The Max Resp Code field: when the value is 128 or larger, a floating-point transformation is performed to obtain Max Resp Time;
With the INCLUDE and EXCLUDE filtering modes, the formats of the membership report packet and the leave packet are unified;
Group Address
Type: type;
The value actually used is Max Resp Time (in units of 1/10 s). The relation between Max Resp Time and Max Resp Code is as follows:
If Max Resp Code < 128, Max Resp Time = Max Resp Code;
If Max Resp Code >= 128, Max Resp Code represents a floating-point value in the following format:
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|1| exp | mant |
+-+-+-+-+-+-+-+-+
When a general query is sent, the group address is 0; when a group-specific or group-and-source-specific query is sent, it is the address of the queried group;
S: the S (Suppress Router-Side Processing) flag;
The value actually used is QQI. The relation between QQI and QQIC is similar to that for Max Resp Code: when QQIC is smaller than 128, QQI = QQIC; when it is 128 or larger, it is processed as a floating-point value.
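Under the encoding shown above (which applies to both Max Resp Code and QQIC, per RFC 3376: 1-bit marker, 3-bit exp, 4-bit mant, giving (mant | 0x10) << (exp + 3)), the decoding can be sketched as follows. The helper is illustrative only:

```python
def decode_igmpv3_code(code):
    """Decode an IGMPv3 Max Resp Code (or QQIC) into its actual value.

    Values below 128 are taken literally; values of 128 and above are
    a floating-point encoding: |1| exp(3 bits) | mant(4 bits)|, which
    expands to (mant | 0x10) << (exp + 3).
    """
    if code < 128:
        return code
    exp = (code >> 4) & 0x07
    mant = code & 0x0F
    return (mant | 0x10) << (exp + 3)

assert decode_igmpv3_code(100) == 100        # literal value
assert decode_igmpv3_code(0x80) == 128       # smallest encoded value
```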
Number of Sources (N): the number of sources listed in the query. For a general query and a group-specific query, N is 0; for a group-and-source-specific query, N is non-zero. The value of N is limited by the MTU of the network.
Type: type
Multicast Address
Record Type: the group record type; the value range is 1-6, with the following meanings:
Here, group records of types 1 and 2 are current-state group records, and group records of types 3 and 4 are state-change group records.
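The six record types can be tabulated as below. The type names come from RFC 3376 rather than from the text above; in that specification, types 1-2 are current-state records, 3-4 are filter-mode-change records, and 5-6 are source-list-change records.

```python
# IGMPv3 group record types as defined in RFC 3376.
RECORD_TYPES = {
    1: "MODE_IS_INCLUDE",
    2: "MODE_IS_EXCLUDE",
    3: "CHANGE_TO_INCLUDE_MODE",
    4: "CHANGE_TO_EXCLUDE_MODE",
    5: "ALLOW_NEW_SOURCES",
    6: "BLOCK_OLD_SOURCES",
}

def is_current_state(record_type):
    """Types 1 and 2 report current state; the rest report changes."""
    return record_type in (1, 2)

assert is_current_state(2)
assert not is_current_state(4)
```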
Each source in the source list of each group has a source state, comprising the source address and a source timer.
When a group is interested in all sources, the group state is EXCLUDE and the source list is empty.
When there is no IS_EX or TO_EX report in the network, the filter mode of the group state is INCLUDE; once an IS_EX or TO_EX report is received, the filter mode of the group state changes to EXCLUDE.
When the group is in EXCLUDE mode, there are two source lists. One is the list of sources confirmed as unwanted, whose packets are not forwarded; the other is the list of sources that may be, or are confirmed to be, wanted (these sources are needed when the mode switches back to INCLUDE), and packets from the sources in this list are forwarded.
When the group is in INCLUDE mode, there is only one list: the sources whose packets are to be forwarded. When the timers of all sources in the list have expired, the list is empty and the group is deleted.
The group timer runs only in the EXCLUDE filter mode; in INCLUDE mode only the source timers run, and when a source timer expires the source is deleted (when the last source is deleted, the group is deleted). When the group timer expires, the filter mode of the group switches from EXCLUDE to INCLUDE.
Only sources whose packets are forwarded have source timers; in EXCLUDE mode the sources in the non-forwarded list have no source timer. When a source timer expires: if the group is in INCLUDE mode, the source is deleted; if the group is in EXCLUDE mode, the source is moved from the forwarding source list to the non-forwarding source list.
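The forwarding decision implied by the two filter modes can be sketched as a small function. This is a simplified illustration of the per-source decision only; the timer handling described above is omitted.

```python
def forward_source(filter_mode, source, include_list, exclude_list):
    """Decide whether traffic from `source` is forwarded for a group.

    In INCLUDE mode only sources in the include (forwarding) list are
    forwarded; in EXCLUDE mode every source is forwarded except those
    in the exclude (non-forwarding) list.
    """
    if filter_mode == "INCLUDE":
        return source in include_list
    return source not in exclude_list  # EXCLUDE mode

assert forward_source("INCLUDE", "10.1.1.1", {"10.1.1.1"}, set())
assert not forward_source("EXCLUDE", "10.2.2.2", set(), {"10.2.2.2"})
```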
IGMPv1 query: the query packet is 8 bytes long and Max Resp Code is 0;
IGMPv2 query: the query packet is 8 bytes long and Max Resp Code is not 0;
IGMPv3 query: the query packet is at least 12 bytes long.
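These distinguishing rules can be sketched as a small classifier. The helper is illustrative only; the function name is not part of any protocol implementation.

```python
def classify_igmp_query(length, max_resp_code):
    """Classify an IGMP membership query by packet length and
    Max Resp Code, following the rules above."""
    if length >= 12:
        return "IGMPv3"
    if length == 8:
        return "IGMPv2" if max_resp_code != 0 else "IGMPv1"
    return "unknown"

assert classify_igmp_query(8, 0) == "IGMPv1"
assert classify_igmp_query(8, 100) == "IGMPv2"
assert classify_igmp_query(12, 100) == "IGMPv3"
```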
The membership report packets of IGMPv1 and v2 are sent to the group being joined. The leave packets of IGMPv2 are sent to the all-routers group (224.0.0.2); the membership report packets of IGMPv3 are sent to the all-IGMPv3-routers group (224.0.0.22).
When an IGMPv3 route/switch device receives query packets sent by a lower-version route/switch device, it can be manually configured as IGMPv1 or IGMPv2. If it is not configured to the lower version, alarm information is generated.
While the v2 host timer is running, BLOCK records for the group are not processed and all TO_EX packets are processed as TO_EX{};
While the v1 host timer is running, BLOCK records are not processed, all TO_EX packets are processed as TO_EX{}, and the TO_IN{} records of v2 leave packets are not processed.
When the v1 host timer of the group expires: if the v2 host timer is not running, processing of the group returns to the v3 mode; otherwise the v2 processing mode is adopted. When the v2 host timer expires, processing returns to the v3 mode.
DR: Designated Router, responsible for forwarding multicast packets and sending join/prune and register messages on a multi-access network (such as Ethernet);
PIM-SM Protocol
In a PIM-SM domain, a route/switch device running the PIM-SM protocol periodically sends Hello messages, which are used to discover neighboring PIM route/switch devices and to elect the DR on a multi-access network. The DR is responsible for sending join/prune messages and register messages.
When the source host sends multicast data to the group, the DR encapsulates the source data in a register message and unicasts it to the RP (No. 5 in the following figure). The RP decapsulates the register message into packets and forwards them to the group members along the shared tree. The RP can then send a join/prune message (No. 3 in the following figure) for the specific source toward the source, joining the shortest path tree of the source. In this way, packets are sent to the RP along the shortest path tree without encapsulation. When multicast packets arrive along the shortest path, the RP sends a register-stop message to the DR of the source, so that the DR stops the register encapsulation process. Thereafter the multicast data of the source is no longer registered and encapsulated, but is sent to the RP along the shortest path tree of the source (A→B→RP), and the RP then forwards the packets onto the shared tree. Finally, the packets are sent to the group members along the shared tree (RP→C→E).
PIM-SM uses the election mechanism of BSR and RP. One or more Candidate-BSRs are configured in the PIM-SM domain, and a rule is used to elect the single BSR shared by the whole domain. Candidate-RPs are also configured in the PIM-SM domain. The Candidate-RPs unicast packets containing their addresses and the multicast groups they can serve to the BSR, and the BSR then periodically generates BootStrap messages containing the set of Candidate-RPs and the corresponding group addresses. The BootStrap messages are sent hop by hop throughout the domain; the route/switch devices receive and store them. If a DR receives IGMP join packets from a directly connected host and has no routing entry for the group, it uses the hash algorithm to map the group address to one candidate RP and multicasts a join/prune message hop by hop toward that RP. If the DR receives multicast packets from a directly connected host and has no routing entry for the group, it uses the hash algorithm to map the group address to one candidate RP, encapsulates the multicast data in a register message, and unicasts it to the RP.
DR Selection
The rules of selecting DR are as follows:
2. If any neighbor route/switch device on the interface sends PIM Hello packets that do not carry the priority field, the DR is selected according to IP address; that is, the device with the largest IP address becomes the DR.
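The rules above can be sketched as follows. Rule 1 is not shown in this excerpt, so the sketch assumes the standard PIM behavior (highest priority wins, with the largest IP address as tie-breaker); the addresses are illustrative.

```python
import ipaddress

def elect_dr(neighbors):
    """Elect the DR from (ip, priority) tuples on one interface.

    If every neighbor advertises a priority in its Hello, the highest
    priority wins, with the largest IP address as tie-breaker
    (assumed rule 1); if any neighbor omits the priority (None here),
    fall back to the largest IP address alone (rule 2 above).
    """
    if any(prio is None for _, prio in neighbors):
        return max(neighbors, key=lambda n: ipaddress.ip_address(n[0]))[0]
    return max(neighbors,
               key=lambda n: (n[1], ipaddress.ip_address(n[0])))[0]

assert elect_dr([("10.0.0.1", 1), ("10.0.0.2", 5)]) == "10.0.0.2"
assert elect_dr([("10.0.0.9", 1), ("10.0.0.2", None)]) == "10.0.0.9"
```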
BSR Selection
Initially, the route/switch device configured as a candidate-BSR enters the Pending-BSR state, sets its Bootstrap timer to a random value (5 s to 23 s), and begins to monitor Bootstrap messages.
Bootstrap messages are addressed to the all-PIM-routers multicast group 224.0.0.13 with TTL set to 1. When a PIM route/switch device receives a Bootstrap message, it sends the message out all interfaces except the receiving interface. This process ensures both that the Bootstrap message is spread throughout the multicast domain and that every PIM route/switch device receives it, so that every device knows which route/switch device is the BSR.
RP Selection
A route/switch device can be configured as the candidate-RP (C-RP) of specific multicast groups or of all multicast groups. After receiving the Bootstrap message and learning the BSR location, the C-RP unicasts a Candidate-RP-Advertisement message to the BSR. The message carries the RP address of the initiator, its priority, and the multicast group addresses served by the C-RP.
The BSR collates all C-RPs, listing their priorities and groups, to form the RP set. The BSR announces the RP set to the whole multicast domain via Bootstrap messages, which also include an 8-bit hash mask. When a route/switch device receives an IGMP message or a PIM join message and needs to join a shared tree, it checks the RP set obtained from the BSR and selects the RP for the multicast group with the specified hash algorithm.
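The group-to-RP hash mapping can be sketched using the hash function defined for PIM-SM in RFC 2362 (the document does not spell the function out, so the formula below is taken from that specification; the addresses are illustrative):

```python
import ipaddress

def rp_hash_value(group, mask_len, crp):
    """RFC 2362 hash value for mapping a group to a candidate RP.

    group and crp are dotted-quad strings; mask_len is the hash mask
    length carried in the Bootstrap message.
    """
    g = int(ipaddress.ip_address(group))
    c = int(ipaddress.ip_address(crp))
    mask = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    return (1103515245 * ((1103515245 * (g & mask) + 12345) ^ c)
            + 12345) % (1 << 31)

def select_rp(group, mask_len, candidate_rps):
    """Pick the C-RP with the highest hash value (ties broken by the
    highest C-RP address)."""
    return max(candidate_rps,
               key=lambda rp: (rp_hash_value(group, mask_len, rp),
                               int(ipaddress.ip_address(rp))))

rps = ["192.0.2.1", "192.0.2.2"]
rp = select_rp("239.1.1.1", 30, rps)
assert rp in rps
```

All devices in the domain run the same deterministic mapping over the same RP set, so they all pick the same RP for a given group.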
PIM-DM Protocol
Neighbor Setup
After a PIM route/switch device starts, it periodically (by default, every 30 s) sends hello packets (addressed to the all-PIM-routers group 224.0.0.13) to set up neighbor relationships. A route/switch device that receives a hello packet adds the sender to its neighbor list and starts a timer for it, set to the value of the holdtime field in the hello packet.
When the service packets flow from E to I and I finds that it has no downstream neighbor or local group member and its output interface list is empty, I sends a prune message to the upstream device E (note: the prune is sent out the input interface, and the destination address is the address of the group to be pruned) to request pruning. If E finds that it has only one neighbor (for example, a point-to-point connection between E and I), E prunes I immediately after receiving the prune from I. If, after pruning, E finds that its own output interface list is empty, it continues to send the prune upstream. After receiving the prune from E, C finds that there is a local group member on its network (refer to IGMP), so it ignores the prune from E.
In PIM-SM mode, when a source begins to send a multicast service flow, the first-hop DR connected to the source registers the source information with the RP. In this way, the RP in PIM-SM always knows the source information of all multicast service flows in the domain. In practice, to meet network management requirements, the whole network is divided into multiple PIM domains, each with its own RP that manages the source information of all multicast service flows in that domain. Usually the RP of a domain cannot learn the source information of other PIM domains, so it cannot receive the multicast service flows of other domains. However, users belonging to different domains may want to receive the multicast service flows of other domains. To provide all multicast service flows, a domain would have to depend on the RPs of other domains, which is not what carriers want. MSDP was introduced to solve this problem.
MSDP sets up peer connections between domains. The defined information exchange lets the RPs of the domains share the active source information in the network, while the RP of each domain maintains the receiver information of its own domain. Therefore, for a multicast service flow with receivers, the RP can initiate a join toward the source directly, without depending on the RPs of other domains. After the service flow reaches the RP via the source tree (SPT), the RP transmits it to the receivers in the domain via the shared tree (RPT). In this way, the multicast service flow can be transmitted within the domain without depending on the RPs of other domains.
The MSDP peer relationship is set up between the RPs of the domains over a TCP connection. When the RP of a domain learns of a new active source in the domain, it sends an SA (Source-Active) message to all peers with which it has established the peer relationship. An MSDP peer uses a modified RPF check to decide whether to accept an SA message sent by another peer. After accepting an SA message, it forwards the message to its other peers until all MSDP routers in the network have received it. If the RP that receives the SA has a (*, G) entry, the RP creates an (S, G) entry and joins toward the source via the SPT, importing the service flow into the domain; the rest is handled by the PIM-SM protocol. In addition, each MSDP router periodically advertises the source information of its own domain via SA messages, letting the MSDP peers of all other domains know that the source is still sending the service flow.
MSDP Application
Inter-domain MSDP
PIM-SM can be regarded as a multicast IGP, because it is designed to run within a single domain. How to distribute multicast packets across AS boundaries while maintaining the autonomy of each AS is a problem for PIM-SM. The PMBR (PIM Multicast Border Router) in the PIM-SM protocol was intended to solve this problem. A PMBR is located at the edge of an AS and sets up branches for all RPs in the AS. Each branch is expressed as (*, *, RP); the wildcards indicate all source and group addresses mapped to the RP. When the RP receives traffic from the source, it forwards the traffic to the PMBR, and the PMBR forwards the traffic to the other domain. When the adjacent domain no longer needs the traffic, it sends a prune to the PMBR, and the PMBR sends the prune to the RP, as follows:
PMBR solution
To solve the above problem, the following two issues must be addressed:
1. When the source is in one domain but a group member is in another domain, the RPF process must remain valid;
PIM can use BGP routes to determine the RPF toward another domain, but when unicast and multicast use different links, the RPF check may fail. Static multicast routes can prevent the RPF problem, but using static multicast routes at large scale is not realistic. MBGP, an extension of BGP, solves this problem.
Inter-domain MSDP
Through its message exchange, MSDP shares the source information known to the RPs of different ASs. To PIM-SM, the shared sources appear to be in the same domain. In this way, the receiver depends only on the RP in its local domain, realizing AS autonomy.
1. Traffic bottleneck;
The hash algorithm and auto-RP filtering of the PIMv2 BootStrap protocol can relieve the above problems but cannot solve them completely. Anycast RP is a method that allows a single group to be mapped to multiple RPs. The RPs can be distributed throughout the domain and use the same RP address, so that a virtual RP is formed; MSDP is the basis for forming this virtual RP.
MPLS Technology
Main contents:
Terms
Introduction to MPLS
MPLS architecture
Introduction to CSC
MPLS OAM
Label -Label
Introduction to MPLS
MPLS integrates the latest developments in routing and switching solutions. It combines the simplicity of L2 switching with the flexibility of L3 routing, and provides the following features:
Frame Relay, ATM, PPP, HDLC, SDH, and DWDM are supported, which ensures the interconnection of multiple types of networks.
MPLS Architecture
Separation of Control and Forwarding
The MPLS architecture is divided into two independent units: the control
unit and forwarding unit, as shown in the following figure:
The control unit uses the standard routing protocol (such as OSPF and
BGP4) to exchange routing information and maintain routing tables. At the
same time, it uses the label control protocol (such as LDP, MP-BGP, and
RSVP) to exchange the label forwarding information with the
interconnected label switching devices to create and maintain the label
forwarding table.
The ingress LSR of the MPLS domain determines a FEC for each IP packet entering the MPLS domain, looks up the label value corresponding to that FEC, and encapsulates it into the IP packet to form a label packet, which is then transmitted within the MPLS domain.
An MPLS packet can carry multiple label headers; this structure is called a label stack. The labels are organized in last-in, first-out order. The outermost label is called the stack-top label and the innermost label the stack-bottom label (simple IP unicast routing does not use the label stack, but other MPLS-based applications, including MPLS VPN, rely on it). Each label is composed of the following fields:
Time-to-Live
This field is 8 bits and encodes the TTL. Its function is the same as that of the TTL field in the IP header: it prevents forwarding loops caused by improper configuration, faults, or slow convergence of the routing algorithm, and it restricts the scope of the packet.
Bottom of Stack
The field is 1 bit. When set to 1, it indicates that the corresponding label is the last label (S) in the label stack; it is 0 in all labels other than the stack-bottom label.
EXP (CoS)
The field is 3 bits and carries CoS information (similar in function to the ToS bits in the IP packet).
Label Value
The field is 20 bits and contains the actual value of the label. When an LSR receives a label packet, it first checks the label value at the stack top. Normally, the LSR determines the next-hop node from the label value and replaces the current stack-top label with a new label. Label values 0-15 are reserved, with the following meanings:
Label Value    Description
0    IPv4 explicit null label. When this label is at the stack top, the next step is to pop the label and forward the packet according to the new stack-top label. If it is the only label in the label stack (that is, the stack is empty after the pop), the packet is forwarded based on the IPv4 packet header.
1    Router alert label. When the stack-top label of a received packet is 1, the packet is delivered to the local software; forwarding of the packet is determined by the next entry in the label stack.
2    IPv6 explicit null label. Its usage is similar to that of label value 0.
3    Implicit null label. LDP uses it to ask the upstream neighbor to pop the label (penultimate-hop popping). This label value never appears in the label encapsulation.
4-15    Reserved
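The four fields described above can be packed into and parsed from a 32-bit label stack entry as follows. This is an illustrative sketch; the bit layout (20-bit label, 3-bit EXP/CoS, 1-bit bottom-of-stack, 8-bit TTL, from most to least significant) is the standard MPLS shim header, and the field values used are arbitrary.

```python
def pack_label(label, exp, s, ttl):
    """Pack one MPLS label stack entry into a 32-bit word:
    20-bit label value, 3-bit EXP/CoS, 1-bit bottom-of-stack,
    8-bit TTL."""
    return (label << 12) | (exp << 9) | (s << 8) | ttl

def unpack_label(word):
    """Split a 32-bit label stack entry back into its fields."""
    return {"label": word >> 12,
            "exp": (word >> 9) & 0x7,
            "s": (word >> 8) & 0x1,
            "ttl": word & 0xFF}

# An arbitrary label at the bottom of the stack with TTL 64.
entry = pack_label(label=100, exp=0, s=1, ttl=64)
assert unpack_label(entry) == {"label": 100, "exp": 0, "s": 1, "ttl": 64}
```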
The basic unit of an MPLS network is the Label Switching Router (LSR). Switches or routers that can distribute labels and forward packets according to labels are LSRs. According to the functions they provide, LSRs can be divided into edge LSRs (LERs) and core LSRs.
With the penultimate-hop popping mechanism, the border LSR can ask its upstream neighbor to pop the label (by advertising the implicit null label value 3 to the upstream neighbor through a signaling protocol such as LDP). In Figure 25-3, router R6 pops the label from the packet and then sends the pure IP packet to router R7. Finally, router R7 performs a simple L3 lookup and sends the packet to its destination.
Label Space
The concept of label space concerns the assignment and distribution of labels. It defines the scope within which labels are used and whether labels on different interfaces can be repeated. There are two types of label space:
An interface that uses interface resources as labels generally uses the per-interface label space. If the LDP peer is connected through a specific interface and labels are carried in that interface's data, the label space scoped to each interface can be used; in this case, a label is unique only within each interface.
When interfaces share label resources, the per-platform label space is used; in this case, a label is unique within the platform (the LSR).
LDP Identifier
The LDP identifier is 6 bytes long and identifies a specific label space of a specific LSR. The first four bytes are an IP address assigned to the LSR, and the remaining two bytes identify the specific label space within the LSR. For the per-platform label space, the last two bytes of the LDP identifier are always 0. The format of the LDP identifier is as follows:
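The 6-byte layout can be sketched as below. This is an illustrative encoding only; the address 192.0.2.1 is a hypothetical LSR ID.

```python
import ipaddress
import struct

def ldp_identifier(lsr_id, label_space=0):
    """Build the 6-byte LDP identifier: a 4-byte LSR IP address
    followed by a 2-byte label space number (0 for the per-platform
    label space)."""
    return ipaddress.ip_address(lsr_id).packed + struct.pack("!H", label_space)

ident = ldp_identifier("192.0.2.1")   # per-platform label space
assert len(ident) == 6
assert ident[4:] == b"\x00\x00"       # last two bytes are 0
```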
If there are two physical links between two LSRs and both are ATM links using the per-interface label space, multiple label spaces must be advertised between the LSRs, and multiple LDP identifiers must be used.
LDP Session
The LDP session is used to exchange label information between LSRs. If one LSR advertises multiple label spaces to another LSR, a separate LDP session must be created between the LSRs for each label space.
LDP Transmission
LDP uses TCP to ensure reliable transmission of the LDP session. If multiple LDP sessions are required between two LSRs, each LDP session corresponds to a different TCP connection.
LDP Discovery
LDP discovers peers and creates adjacencies through its discovery mechanism. LDP supports basic discovery and extended discovery.
If an LSR receives an LDP hello message, a potentially reachable LDP peer exists, and the label space used by that peer can be obtained.
Assume the label space of LSR1 is LSR1:a and the label space of LSR2 is LSR2:b. The following describes the process by which LSR1 creates an LDP session.
After discovering each other by exchanging LDP hello messages, the two parties create an adjacency. They then determine the active party according to their transport addresses: if the transport address of LSR1 is larger than that of LSR2, LSR1 serves as the active party and initiates the TCP connection (port 646) to LSR2, while LSR2 serves as the passive party and waits for the connection to be established.
If LSR1 does not use the optional transport address TLV, its transport address is the source address of the hello messages it sends to LSR2.
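The active/passive decision can be sketched as follows; the comparison of dotted-quad transport addresses is the numeric one described above, and the addresses used are illustrative.

```python
import ipaddress

def ldp_session_role(local_transport_addr, peer_transport_addr):
    """Determine which side initiates the TCP connection (port 646).

    The LSR with the larger transport address is the active party;
    the other side waits passively for the connection.
    """
    local = ipaddress.ip_address(local_transport_addr)
    peer = ipaddress.ip_address(peer_transport_addr)
    return "active" if local > peer else "passive"

assert ldp_session_role("192.0.2.10", "192.0.2.1") == "active"
assert ldp_session_role("192.0.2.1", "192.0.2.10") == "passive"
```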
After LSR1 and LSR2 establish the transport-layer connection, they exchange LDP initialization messages and negotiate the LDP session parameters, including the LDP protocol version, the label distribution mode, and the session hold timer value. When the parameters are negotiated successfully, a session between LSR1:a and LSR2:b is created. The following describes the initialization process of a session in terms of its state machine.
After the LDP session is created, the LSR must send an LDP protocol message within each session hold time. If there is no other message to send, it sends a session keepalive message.
If an LSR wants to end the LDP session, it sends a closing notification message to its LDP peer.
For a specific FEC, if the LSR distributes and assigns a label without needing a label request message from the upstream, the mode is called Downstream Unsolicited.
For a specific FEC, if the LSR distributes and assigns a label only after receiving a label request message, the mode is called Downstream on Demand.
There are two types of label control modes in MPLS: Independent Control
and Ordered Control.
When the independent control mode is used, each LSR can advertise label
mapping to the connected LSR at any time.
When the ordered control mode is used, an LSR sends a label mapping message upstream only if it has received the label mapping message for the FEC from the FEC's next hop, or if it is the egress node of the LSP.
There are two label retention modes in MPLS: the liberal retention mode and the conservative retention mode.
For a specific FEC, LSR Ru receives a label mapping from LSR Rd. When Rd is not the next hop of Ru: if Ru saves the binding, Ru uses the liberal retention mode; if Ru discards the binding, Ru uses the conservative retention mode.
LDP Graceful Restart
For LDP to support graceful restart, support for the Fault Tolerant (FT) Session TLV must be added as an optional parameter of the initialization message. An LSR sends initialization messages carrying the FT Session TLV to its peers to indicate that it can retain its MPLS LSP and FEC information across an LDP restart. The neighbor routers must also support the graceful restart capability and retain the MPLS LSPs created with the restarting router.
In the FT Session TLV, LDP advertises two times to its peers: FT Reconnect Timeout and Recovery Time. FT Reconnect Timeout is the time allowed for re-establishing the connection after a restart; Recovery Time is the time allowed for LDP recovery after a restart.
When restarting, LDP starts the restart process for each neighbor. Before
the reconnect timer expires, it reconstructs the connection with the
neighbor, waits for the neighbor to send the label mapping messages
retained across the restart, updates the locally retained forwarding
information accordingly, and finally sends new label mapping messages to
all neighbors. When the restart is over, all forwarding information that
was not updated is deleted. On the neighbor routers, when the new
connection is created, the MPLS LSPs created with the restarting router
are marked as to-be-aged, and label mapping messages are sent to the
restarting router. The LSPs marked as to-be-aged are then updated
according to the label mapping messages received from the restarting
router. When the restart is complete, all to-be-aged MPLS LSPs that were
not updated are deleted.
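The final bookkeeping step of the recovery described above can be sketched as follows; the function name and the FEC/label dictionaries are hypothetical, used only to show the "keep refreshed, delete stale" rule.

```python
# Illustrative sketch of LDP graceful-restart cleanup: after restart,
# only LSPs re-advertised by the neighbor before the recovery timer
# expires survive; the rest of the retained state is deleted as stale.

def restart_recovery(retained_lsps, refreshed_lsps):
    """retained_lsps / refreshed_lsps map FEC -> label."""
    stale = {fec: lbl for fec, lbl in retained_lsps.items()
             if fec not in refreshed_lsps}
    kept = dict(refreshed_lsps)          # updated forwarding information
    return kept, stale

retained = {"10.1.0.0/16": 100, "10.2.0.0/16": 101}
refreshed = {"10.1.0.0/16": 100}         # neighbor re-sent only one mapping
kept, stale = restart_recovery(retained, refreshed)
print(kept)    # {'10.1.0.0/16': 100}
print(stale)   # {'10.2.0.0/16': 101} -- deleted when recovery ends
```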
PDU length: a two-byte integer indicating the PDU length, excluding the
version number and PDU length fields.
LDP identifier: 6 bytes, identifying the label space of the LSR sending
the PDU. The first four bytes carry the router ID (IP address) of the LSR;
the last two bytes identify the label space within the LSR.
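The header layout just described can be packed byte-for-byte; this is a minimal sketch of the encoding (version 1, length excluding the first four bytes, 4-byte LSR ID plus 2-byte label space), not production code.

```python
# Sketch of the LDP PDU header layout: 2-byte version, 2-byte PDU length
# (excluding the version and length fields themselves), then the 6-byte
# LDP identifier (4-byte router ID + 2-byte label space).
import socket
import struct

def ldp_pdu_header(router_id: str, label_space: int, payload: bytes) -> bytes:
    version = 1
    lsr_id = socket.inet_aton(router_id)            # 4-byte router ID
    ldp_identifier = lsr_id + struct.pack("!H", label_space)
    # PDU length counts the LDP identifier plus the message payload,
    # but not the version and length fields.
    pdu_length = len(ldp_identifier) + len(payload)
    return struct.pack("!HH", version, pdu_length) + ldp_identifier + payload

hdr = ldp_pdu_header("192.0.2.1", 0, b"")
# 2 (version) + 2 (length) + 6 (LDP identifier) = 10 bytes on the wire,
# while the length field itself reads 6.
```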
U bit: unknown message bit. If the LSR receives an unknown message with
the U bit set to 0, it returns a notification message to the message
source; if the U bit is 1, it silently discards the unknown message.
State TLV: indicates the event type of the notification message. For
details, see the universal TLV coding mode of LDP-state TLV.
Session hold time: in seconds; indicates the value that the sending LSR
proposes for the session hold timer.
Address list TLV: the sending LSR advertises its interface addresses
through the address list TLV. For the coding format, see the universal TLV
coding mode of LDP-address list TLV.
Address list TLV: the sending LSR withdraws interface addresses through
the address list TLV. For the coding format, see the universal TLV coding
mode of LDP-address list TLV.
FEC TLV: indicates the FEC part of the FEC/label mapping; for the code
format, see the universal TLV coding mode of LDP-FEC TLV.
Label TLV: indicates the label part of the FEC/label mapping; for the code
format, see the universal TLV coding mode of LDP-label TLV.
FEC TLV: indicates the FEC unit corresponding to the label request.
FEC TLV: indicates the FEC unit corresponding to the label request
withdraw message.
Label request message identifier TLV: identifies the label request message
that the label request withdraw refers to.
FEC TLV: indicates the FEC unit corresponding to the label withdraw
message; for the code format, see the universal TLV coding mode of
LDP-FEC TLV.
Label TLV: indicates the withdrawn label; for the code format, see the
universal TLV coding mode of LDP-label TLV.
FEC TLV: indicates the FEC unit corresponding to the label release
message; for the code format, see the universal TLV coding mode of
LDP-FEC TLV.
Label TLV: indicates the released label; for the code format, see the
universal TLV coding mode of LDP-label TLV.
Note
FEC units cover many types, and the coding depends on the unit type.
A 1-byte field indicates the unit type of the FEC.
A variable-length field carries the FEC unit value, whose format depends
on the type.
Address family: two bytes, coded per RFC 1700; for example, the value for
IPv4 is 1.
Prefix length field: one byte, indicates the length of the prefix in bits.
Address family: two bytes, coded per RFC 1700; for example, the value for
IPv4 is 1.
Host address length: one byte, indicates the length of the host address in
bytes.
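The prefix FEC element layout above can be sketched in a few lines; the element-type value 2 and the minimal-byte packing follow the standard LDP encoding, while the function name is an assumption for illustration.

```python
# Sketch of a Prefix FEC element: 1-byte element type (2 = Prefix),
# 2-byte address family (1 = IPv4), 1-byte prefix length in bits, then
# the prefix packed into the minimum number of whole bytes.
import socket
import struct

def prefix_fec_element(prefix: str, prefix_len: int) -> bytes:
    PREFIX_TYPE, AF_IPV4 = 2, 1
    nbytes = (prefix_len + 7) // 8                # bits -> whole bytes
    addr = socket.inet_aton(prefix)[:nbytes]      # keep only needed bytes
    return struct.pack("!BHB", PREFIX_TYPE, AF_IPV4, prefix_len) + addr

elem = prefix_fec_element("10.2.0.0", 16)
# 4 header bytes + 2 prefix bytes = 6 bytes total for a /16
```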
Labels cover three types: universal label, ATM label, and frame relay label.
The following describes the code of universal label.
State code: a 32-bit unsigned integer indicating the event type. The code
format is as follows:
Message type: 0 means the state TLV is not related to a specific message;
otherwise, the field carries the type of the message the state TLV refers
to.
BGP/MPLS VPN
BGP/MPLS VPN is a mechanism that allows a service provider (SP) to use its
IP backbone network to provide L3 VPN service for users. In this
mechanism, BGP is used to publish VPN routing information across the SP
backbone, and MPLS is used to forward VPN traffic from one VPN site to
another.
In the preceding figure, each PE contains two VRFs, and connects two sites.
The two interfaces connecting sites belong to two different VRFs. Site1 and
site2 belong to one VPN; site3 and site4 belong to another VPN.
An IGP such as OSPF or RIP runs between PE and CE; BGP can also be used,
or routing information can be exchanged through static routes. For PE2,
the route 10.2.1.0/24 learned from site2 is saved in the routing table of
VRF1.
PE1 (the ingress PE) receives VPN packets and looks up the relevant
VRF routes. Based on the route, it searches the MPLS forwarding table
to obtain the corresponding output label L2. Because the next hop of
the route is the loopback interface of PE2, it also obtains the
corresponding tunnel label L1 from the MPLS forwarding table. The two
labels form an MPLS label stack, which is pushed onto the front of the
received VPN packet before the packet is forwarded to the P device.
PE2 (the egress PE) receives the MPLS packets (which now carry only
one label, L2). According to the stack-top label L2, it determines the
VRF, pops the label, looks up the relevant VRF route, and forwards the
packet to CE2 accordingly.
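The two-level label-stack handling above can be modeled step by step. This is a toy sketch: the label values, function names, and the packet/stack representation are all illustrative, not a router implementation.

```python
# Sketch of BGP/MPLS VPN forwarding: PE1 pushes two labels, the P node
# handles only the outer tunnel label, and PE2 maps the inner VPN label
# to a VRF.

def pe1_ingress(packet, vpn_label, tunnel_label):
    # PE1 pushes the VPN label (L2), then the tunnel label (L1) on top.
    return [tunnel_label, vpn_label], packet

def p_node(stack, packet, out_label):
    # A P device swaps only the outer tunnel label; with penultimate-hop
    # popping it would pop it instead, which is why PE2 sees one label.
    return [out_label] + stack[1:], packet

def pe2_egress(stack, packet, vrf_by_label):
    # PE2 uses the (now top) VPN label to pick the VRF, pops it, and
    # routes the bare IP packet toward CE2.
    vrf = vrf_by_label[stack[0]]
    return vrf, packet

stack, pkt = pe1_ingress("ip-packet", vpn_label=2, tunnel_label=1)  # [1, 2]
stack, pkt = p_node(stack, pkt, out_label=11)                       # [11, 2]
vrf, pkt = pe2_egress(stack[1:], pkt, {2: "VRF1"})  # tunnel label popped
print(vrf)  # VRF1
```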
The advantage of this mode is that MPLS is not required between the ASBRs.
The disadvantage is that all VPN routes must be maintained on the ASBR,
and one interface/sub-interface must be assigned for each cross-domain
VPN, so a scalability problem exists.
In this mode, the ASBR does not need to assign a VRF for each VPN, import
the VPNv4 routes, or assign an interface/sub-interface for each VPN.
However, the ASBR must maintain all VPNv4 routes, assign labels for the
routes, and install the ILM entries locally, so the load on the ASBR is
heavy.
In figure 25-7, CE1 and CE2 belong to the same VPN; PE1 and ASBR1
belong to AS1; PE2 and ASBR2 belong to AS2.
In the P-network of the same AS, each device runs an IGP (such as
OSPF) to mutually advertise routes, including the loopback interface
addresses.
In the P-network of the same AS, each device enables MPLS and
advertises label mappings through a signaling protocol (such as LDP).
In the routing table of PE1, there is a route to the loopback
interface of ASBR1 with the output MPLS label L1; in the routing table
of ASBR2, there is a route to the loopback interface of PE2 with the
output MPLS label L2.
An IGP such as OSPF or RIP runs between PE and CE; BGP can also be
used, or routing information can be exchanged through static routes.
For PE2, the route 10.2.1.0/24 learned from site2 is saved in the
routing table of VRF1.
Run MP-IBGP between PE1 and ASBR1. ASBR1 receives the VPN route
10.2.1.0/24 advertised by ASBR2 with output label L4. When ASBR1
receives MPLS packets (carrying only one label, L5), it swaps L5 for
L4 and forwards the MPLS packets to ASBR2.
ASBR2 receives the MPLS packets (now carrying only one label, L4),
swaps L4 for L3, and then searches the routing table. Because the next
hop of the route is the loopback interface of PE2, label L2 is
obtained from the MPLS forwarding table. L2 is pushed onto the top of
the packet's label stack before the MPLS packets are forwarded to P2.
PE2 (the egress PE) receives the MPLS packets (carrying only one
label, L3). According to the stack-top label L3, it determines the
VRF, pops the label, looks up the relevant VRF route, and forwards the
packets to CE2 accordingly.
Internet access is a basic service requirement for the enterprise
customers served by the MPLS VPN service provider. Taking address
overlapping, access control, and security into consideration, two common
internet access solutions are provided.
Many customers do not want VPN users to access the internet directly; they
want a firewall to control the VPN users' internet access. This is a
typical requirement for enterprise internet access. In the VPN
environment, specific sites are designated for internet access, and each
VPN site sends its internet traffic to one or more central sites.
In the network shown in figure 25-8, all sites in the VPN access the
internet through a central site, which provides the service between the
VPN members and the internet through a firewall and NAT. VPN members
forward their internet-bound traffic to the central site by importing a
default route whose next hop is the central site CE. The central site
forwards the internet-bound traffic to the enterprise firewall, which
controls access and performs NAT according to the enterprise security
policy; finally, the firewall forwards the traffic to the internet.
In this access mode, the configuration is deployed only within the
enterprise VPN; the carrier's participation is not required, and the
enterprise can control the internet access security policy for its
intranet users. However, this mode requires the enterprise to have strong
security management capability, and the carrier cannot perform unified
management of internet access.
The unified access control mode places the internet access control and the
common access point at the carrier's egress. To solve the problem of
address overlapping between different VPNs, the carrier provides a VPN and
a firewall for each VPN that accesses the internet, so each VPN's users
access the internet through their own firewall. In this mode, a default
route pointing to the internet gateway is configured at the VPN site at
the internet egress and advertised through BGP to the other sites of the
VPN. The other sites forward their internet-bound traffic along the
default route, in the regular L3VPN mode, to the VPN site on the PE side
of the carrier's internet access point. After being processed by the
firewall, the traffic is forwarded to the internet.
Some customers want to access the internet through the VPN but do not need
to receive complete internet routing information. This requirement can be
met through static default route access.
The following describes this access mode, taking VPNA users accessing the
internet as an example. VPNA has two users: CE1 and CE4. First, configure
a cross-VRF default static route for internet access in the VRF routing
tables of PE1 and PE2 respectively. The next hop of the route on PE1 is
the internet gateway; on PE2, the next hop is PE1. To ensure the return
path from the internet, configure in the global routing table of PE1 a
cross-VRF route reaching CE1, whose next hop is CE1 in the VRF, and in the
global routing table of PE2 a cross-VRF route reaching CE4, whose next hop
is CE4 in the VRF. Advertise these routes through the IGP into the MPLS
network.
PE1 receives the internet-bound packets sent by PE2, searches the
internet route in the global routing table, and forwards the packets to
the internet gateway.
When packets returning from the internet reach PE1, PE1 searches the
global routing table and finds the route to CE4 advertised by PE2
through the IGP, then forwards the packets to PE2 using the label of
that route. PE2 finds the static route to CE4 in the global routing
table and forwards the packets to CE4.
Introduction to CSC
C SC Co ncept
With the promotion and spread of BGP/MPLS VPN, more and more end users
implement network interconnection through MPLS VPN. To save the cost of
independently constructing or leasing L2 transmission links, many
small-to-medium carriers begin to lease VPN lines from a large MPLS
carrier to interconnect their POPs. This is called Carrier's Carrier
(CSC).
The basic structure of CSC is not significantly different from that of an
MPLS VPN network. The carrier network usually refers to the large-scale
network that provides label-switched VPN access service for
small-to-medium carriers and end users; see the backbone carrier network
in figure 25-10. The customer carrier network is built on top of the
carrier network and provides internet access or VPN access for end users;
see User Carrier POP1 and User Carrier POP2 in figure 25-10.
Theoretically, the number of layers is not restricted, so the model is
scalable.
After P1 receives the packet, it pops the external label L1 and then
sends the packet (label stack L4, L7, destination 10.2.1.1) to CSC-PE2.
MPLS L2VPN
Terms
VPLS: Virtual Private LAN Service; extends an Ethernet LAN across the
IP/MPLS network, providing users with a transparent, WAN-spanning virtual
LAN service.
SVC: Spoke VC
uPE: User-facing PE
Basic Concepts
MPLS L2VPN provides L2 VPN service over the MPLS network. With MPLS L2VPN
technology, carriers can provide users with L2 VPN services over different
media through the MPLS network, including ATM, FR, VLAN, Ethernet, PPP,
and HDLC. The same MPLS network also provides common IP, L3 VPN, traffic
engineering, and QoS services, so carriers can save network construction
investment.
With MPLS L2VPN, the carrier only needs to provide L2 connectivity for
users and does not participate in the route calculation of VPN users.
However, like traditional L2 VPN (for example, VPN provided by ATM PVC),
MPLS L2VPN has the N-squared problem: within a VPN, a connection between
any two CEs requires a link between CE and PE. For a PE device, if a VPN
has N sites, N-1 physical or logical connections between CE and PE must be
created. Because the PE device does not participate in the route
calculation of users, the scalability of L2VPN is better than that of
L3VPN, but L2VPN is less flexible.
The PPVPN working group of the IETF produced many framework drafts, of
which the two most important are Martini and Kompella. The Martini draft
implements MPLS L2VPN by extending LDP; the Kompella draft implements it
by extending MP-BGP. The Martini draft has become a standard, and Maipu
supports this mode.
MPLS Networks
MPLS L2VPN covers Virtual Private Wire Service (VPWS) and Virtual Private
LAN Service (VPLS). VPWS is a point-to-point virtual leased line
technology that supports most link layer protocols. VPLS provides LAN-like
service over the MPLS network: distributed users can access each other as
if they were attached to the same LAN.
VPWS
The basic principle of MPLS L2VPN is similar to that of BGP/MPLS VPN. It
also uses the label stack to implement the transparent transmission of
packets in the MPLS network. External label (tunnel label) is used to
transfer packets from one PE to another. Internal label (in MPLS L2VPN, it
is called VC label) is used to distinguish different connections in different
VPNs. The receiver PE determines the destination CE according to the VC
label. In the process of forwarding, the label stack changes as follows:
Illustration
V: internal VC label
T, T1: external Tunnel label, in the MPLS forwarding, the tunnel label will
be replaced.
To allow LDP to distribute VC labels, RFC 4447 extends the LDP protocol by
adding the VC FEC type. In addition, because the two PEs exchanging VC
labels are usually not directly connected, LDP must create a session with
a targeted peer and then transfer the VC FEC and VC labels over that
session. The process of distributing VC labels through LDP is the same as
the distribution process for other labels.
The L2VPN implemented by extending LDP can carry ATM, FR, Ethernet/VLAN,
PPP, and HDLC, but it requires the link layer protocols of all sites in
the VPN to be the same: the L2VPN can be created only when, for example,
all sites are Ethernet or all sites are ATM. The disadvantage of L2VPN in
Martini mode is that only point-to-point VPN L2 connections can be
created; an automatic VPN discovery mechanism is not supported.
The Martini-mode L2VPN focuses on how to create a virtual circuit between
two CEs. It uses VC-TYPE + VC-ID to identify a VC: VC-TYPE indicates the
VC type (ATM, Ethernet, VLAN, PPP, and so on), and VC-ID identifies the VC
and must be unique on the PE device. The PEs connecting the two CEs
exchange VC labels through the LDP protocol and bind the corresponding CEs
through the VC-ID.
When the LSP connecting the two PEs has been created and the label
exchange and binding are complete, the VC is established, and the two CEs
transmit L2 data through it.
VPLS uses IP/MPLS domain to classify the network and restrict the L2
service to the entrance/edge network. According to the networking
requirements, the MAN using VPLS technology includes the following two
modes.
In the signaling control plane, VPLS uses the LDP signaling protocol to
create a pair of unidirectional MPLS VC-LSPs across the backbone network,
forming the corresponding PW between PEs. Ethernet data units are
transmitted across the backbone through the PW. A VC-LSP can be configured
statically or set up dynamically by the LDP protocol. A created PSN tunnel
can carry multiple VPLS services, and it shields the transmitted data to
protect security across the backbone network.
For data forwarding in a MAN built with VPLS technology, the PE devices
independently learn MAC addresses and maintain the MPLS FIB table, and
encapsulate/de-encapsulate the received L2 data according to RFC 4447. The
data is exchanged between PEs through the PSN tunnels created by MPLS
LSPs. One VPLS instance corresponds to one enterprise customer, and the PE
maintains a separate MPLS FIB for each VPLS instance. The key of each MPLS
FIB entry is the mapping between MAC address and PW, that is, between MAC
address and LSP. Note that one PW is composed of two LSPs; a MAC address
is associated with the label of the reverse-direction LSP so that the data
can be properly forwarded.
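The per-instance MAC-to-PW table described above behaves like a learning bridge; this toy class (names and port labels are illustrative, not Maipu's data structures) shows the learn/lookup cycle.

```python
# Minimal sketch of a per-VPLS-instance forwarding table: source MAC
# learning binds a MAC to the AC or PW it was seen on; unknown
# destinations must be flooded.

class VplsFib:
    def __init__(self):
        self.mac_to_port = {}            # MAC -> AC or PW name

    def learn(self, src_mac, in_port):
        # Source learning: remember which AC/PW the address arrived on.
        self.mac_to_port[src_mac] = in_port

    def lookup(self, dst_mac):
        # Known unicast goes out one port; unknown traffic is flooded.
        return self.mac_to_port.get(dst_mac, "flood")

fib = VplsFib()
fib.learn("00:aa:bb:cc:dd:01", "pw-to-PE2")
print(fib.lookup("00:aa:bb:cc:dd:01"))   # pw-to-PE2
print(fib.lookup("00:aa:bb:cc:dd:99"))   # flood
```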
When the PE maintains the MPLS FIB entries, a problem similar to MAC
address aging on a switch arises. VPLS handles it through the signaling
protocol by sending an address withdraw message. The function is
implemented through a FEC TLV (identifying the VPLS instance) contained in
an LDP address withdraw message, plus an optional MAC address TLV.
RAW: the packets may or may not contain an 802.1Q VLAN tag, but the
tag is meaningless to the two connected nodes and is transparently
transmitted.
Packets received from the AC, that is, from the VLAN interface, may or may
not contain a tag. If a tag is present, it can be the Service-Tag (S-TAG)
pushed by the SP network to distinguish users, or the customer VLAN tag
(C-TAG). To identify S-TAG versus C-TAG, the customer configuration is
checked: the packet's TPID is first matched against the TPID of the OVID,
then against the TPID of the IVID (that is, the per-chip configured inner
TPID); if the two TPIDs are equal, the tag is treated as the OVID.
Tagged packets are also received from the PW. If the PW is in TAGGED mode
and the TPID in the packet equals the configured TPID, the outer tag is
treated as an S-TAG; otherwise it is a C-TAG. If the PW is in RAW mode,
any tag contained in the packet is a C-TAG. The C-TAG is transparently
transmitted during VPLS processing; it is never deleted or replaced.
1. Packet Encapsulation on AC
Ethernet access: the uplink Ethernet frame header from the CE and the
downlink Ethernet frame header from the PE do not contain an S-TAG. If the
frame header contains a VLAN tag, it is the user packet's internal VLAN
tag and is meaningless to PE devices. This internal VLAN tag is called the
C-TAG.
A. RAW Mode
Packets sent from the AC have their tags left untouched in the VPLS
processing, no matter whether an S-TAG or a C-TAG exists. Whether an S-TAG
is added to the packets is determined by the port configuration and VLAN
configuration.
B. TAGGED Mode
If a packet sent from the AC contains an S-TAG, the VPLS processing checks
whether it equals the S-TAG of the AC: if they are equal, no operation is
performed; if not, the tag is replaced. If the packet does not contain an
S-TAG, the S-TAG of the AC is added.
2. PW Encapsulation
The encapsulation mode in PW also contains two types: RAW mode and
Tagged mode.
A. RAW Mode
If the PW uses RAW mode, the PW acts as a virtual link between two
Ethernet ports and packets are transparently transmitted. The packets may
contain tags, but the tags are meaningless to the ingress and egress PEs,
and the S-TAG is not transmitted over the PW.
Packets received from the AC are output to the PW. If a packet contains an
S-TAG, the S-TAG is deleted first and then two MPLS labels are pushed
before forwarding; if the packet has no S-TAG, the two MPLS labels are
pushed directly.
B. TAGGED Mode
Packets received from the AC are output to the PW. If an S-TAG is already
present, two MPLS labels are pushed before forwarding; if the packet does
not contain an S-TAG, an empty tag (VID = 0) is added and then the two
MPLS labels are pushed before forwarding.
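The RAW versus TAGGED rules for AC-to-PW traffic can be condensed into one function; the frame representation and the label placeholders here are illustrative assumptions.

```python
# Sketch of PW encapsulation on egress from the AC. A frame is modeled
# as (s_tag, payload); s_tag is None when the frame is untagged.

def ac_to_pw(frame, pw_mode):
    s_tag, payload = frame
    if pw_mode == "raw":
        # RAW: the service tag is never carried over the PW.
        s_tag = None
    elif pw_mode == "tagged":
        # TAGGED: a service tag must be present; add an empty tag
        # (VID = 0) if the frame arrived untagged.
        if s_tag is None:
            s_tag = 0
    # Then push the two MPLS labels (VC label + tunnel label).
    return ("tunnel-label", "vc-label", s_tag, payload)

print(ac_to_pw((100, "data"), "raw"))      # service tag stripped
print(ac_to_pw((None, "data"), "tagged"))  # empty tag (VID 0) added
```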
Basic VPLS
The full-mesh interconnection structure is adopted in the basic VPLS.
H-VPLS
Hierarchical VPLS (H-VPLS) is a technology that enhances VPLS scalability.
It extends the access scope of the service provider's VPLS, reduces
network complexity to ease management, and lowers construction and
operation cost. With plain VPLS, adding one PE requires a full mesh of
connections to every other PE; if LDP is used, every PE device in the VPLS
must be reconfigured, and the amount of signaling traffic grows on the
order of N-squared. With H-VPLS, adding a PE only requires modifying the
configuration of the PE it connects to, and the signaling traffic does not
suffer the N-squared problem.
H-VPLS introduces new roles: the uPE, or user-end PE, and the nPE, or
network-end PE, which is the PE in the SP network that the uPE connects
to. A uPE can be an L2 device with only Ethernet switching function, or an
L3 device with both switching and routing functions. One end connects to
the PE of the SP network; the other end (multiple interfaces) connects
multiple user CE devices in the building. The uPE is part of the VPLS and
connects to the PE by creating a PW, also called an SVC.
The core network in H-VPLS is a full-mesh topology, while the edge network
is a hub-and-spoke star topology. In the preceding figure, the uPE is the
hub and the multiple CEs are the spokes. The core layer and the edge layer
are connected through pseudo wires.
If the H-VPLS network were fully meshed like basic VPLS, each uPE would
act as a PE of the basic VPLS, and the number of sessions would be far
greater than among the fully meshed PE devices of H-VPLS. H-VPLS therefore
improves VPLS scalability and avoids the N-squared problem caused by
expansion. For a new uPE, only the uPE and the PE it connects to need to
be configured; you do not need to modify the other PE devices.
For the signaling protocol between PE and uPE, one mode implements the PW
from PE to uPE through the spoke VC function of LDP; the other mode is
H-VPLS based on QinQ, which is applicable only to Ethernet links.
In H-VPLS, a uPE can access multiple CEs, which can belong to one or
several different VPLS instances. Between nPE and uPE, a label or a
VLAN-ID is used to distinguish the VPLS instances. If the VLAN-ID is used,
QinQ technology is required, because the user data frames may already
contain a VLAN tag. CEs in the same VPLS instance connected to the same
uPE can exchange information through L2 switching on the uPE, without
involving the nPE.
When CE2 wants to send data to the remote CE1 of the VPLS instance (a CE
connected across the SP WAN), the Ethernet frame is first sent to uPE1. If
uPE1 has not learned the frame's destination MAC (or the frame is
broadcast or multicast), it sends the frame out all ports of the instance
(AC, SVC, and PW) except the receiving port. When PE2 receives the frame,
if the MAC is not learned, the frame is again broadcast on all ports (PW,
AC, and other SVCs) of the VPLS instance; if the destination MAC is
learned, the frame is sent only on the corresponding PW. When PE1 at the
other end receives the data frame, it forwards it according to the
destination MAC in the same way: if the MAC is not learned, the frame is
broadcast on the other ports of the VPLS instance; if it is learned, the
frame is sent to the corresponding AC and delivered to CE1.
The connection between uPE and PE can adopt VC, which is called Spoke-
VC (SVC). Use the SVC to identify the VPLS instance of the packets
entering PE. For the SVC, there are two conditions:
An SVC can also connect two VPLS instances (for example, two cross-MAN
VPLS instances); this is called multi-domain VPLS, and the two PEs
connected by the SVC are called border PEs. If multiple domains need to be
interconnected, the border PEs of each VPLS are fully meshed through SVCs,
forming a multi-level VPLS network.
2. QinQ Access
A. Enable QinQ at the CE access port, pushing an outer VLAN tag onto
the received packets to serve as the multiplexing/demultiplexing
tag. Between the MTU and PE1, the packets are transparently
transmitted to PE1 through the QinQ tunnel.
B. PE1 first determines the home VSI according to the VLAN tag pushed
by the MTU, then pushes the multiplexing/demultiplexing label (MPLS
label) corresponding to the PW according to the destination MAC of
the packet, and finally forwards the packet.
The address withdraw message carries a MAC TLV. The devices receiving the
message delete or re-learn the MAC addresses according to the parameters
specified by the TLV.
The destination of a MAC address withdraw message depends on the fault
type; the basic principle is to notify all devices that may have learned
the MAC addresses. The fault types include AC interface fault, Mesh-PE
device fault, and Spoke-PE device fault.
When an AC interface is faulty, send the MAC address withdraw message to
all Mesh-PE devices and Spoke-PE devices.
When a Mesh-PE device is faulty, notify all Spoke-PE devices.
When a Spoke-PE device is faulty, notify all Mesh-PE devices and the other
Spoke-PE devices.
In the basic networking environment, PWs are created among all PEs of the
same VPLS instance to form a full-mesh topology, so each PE can reach
every other PE through a PW; at the same time, each PE connects to its CEs
through access circuits (ACs). Under split horizon, broadcast, multicast,
or flooded frames received from a PW are not sent to any PW (including the
receiving one) of the same VPLS instance, but they can be sent to the ACs.
Broadcast, multicast, or flooded frames received from an AC can be sent to
all other PWs and ACs of the same VPLS instance, excluding the receiving
AC itself. In other words, packets received from public-network PWs are
never forwarded to other public-network PWs; they are forwarded only
toward the private network.
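The split-horizon flooding rule above reduces to a simple filter; this sketch uses made-up port names and a naming convention ("pw…" vs "ac…") purely for illustration.

```python
# Sketch of VPLS split horizon: frames flooded from a core PW go only
# to ACs, never back to another PW; frames from an AC go everywhere
# except the receiving AC.

def flood_targets(in_port, ports):
    """ports: list of (name, kind) where kind is 'pw' or 'ac'."""
    out = []
    for name, kind in ports:
        if name == in_port:
            continue                  # never send back out the input port
        if in_port.startswith("pw") and kind == "pw":
            continue                  # split horizon: PW -> no other PW
        out.append(name)
    return out

ports = [("pw1", "pw"), ("pw2", "pw"), ("ac1", "ac"), ("ac2", "ac")]
print(flood_targets("pw1", ports))   # ['ac1', 'ac2']
print(flood_targets("ac1", ports))   # ['pw1', 'pw2', 'ac2']
```

Because the core PEs are fully meshed, dropping PW-to-PW flooding is exactly what keeps the core loop-free without running spanning tree there.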
A core network created in this mode has no loops.
If a uPE is dual-homed to two PEs, L2 split horizon alone cannot prevent
loops; the spanning tree protocol must be enabled between the uPE and the
nPEs.
devices for VPLS customers. When you add, delete, or re-deploy CEs in the
L2 VPN:
  VPWS: you must re-configure each peer PE.
  VPLS: you must re-configure only the connected PEs.
Signaling protocol:
  VPWS: LDP; the pseudo wire between PEs is called VC.
  VPLS: LDP; the pseudo wire between PEs is called PW or SVC.
Encapsulation mode:
  VPWS: Add the VPWS label, and then add the label of the external MPLS
  tunnel. Take FR AC access as an example: when the AC interface
  encapsulation between CE and PE is FR, packets are received on the PE;
  the VPWS label is added before the FR header, and then the external
  MPLS label is added.
  VPLS: Add the VPLS label, and then add the label of the external MPLS
  tunnel. Take FR AC access as an example: when the AC interface
  encapsulation between CE and PE is FR, packets are received on the PE
  (format: FR header + Ethernet header + Data); the FR header is removed,
  the VPLS label is added before the Ethernet header of the FR packet,
  and then the MPLS label is added.
AC access:
  VPWS: Multiple types of ACs are supported, such as PPP, HDLC, Ethernet,
  VLAN, FR, and ATM.
  VPLS: Multiple types of ACs are supported, such as PPP, HDLC, Ethernet,
  VLAN, FR, and ATM.
Packet processing flow (network: CE1--------PE1--------P--------PE2--------CE2;
assume data flows from CE1 to CE2 and the VC label is exchanged through
LDP between PE1 and PE2):
  VPWS: CE1 to PE1: PE1 adds the VPWS label, then adds the global route
  label and sends the packet to PE2. After PE2 receives the packet, it
  removes the labels and sends it out the CE2 interface.
  VPLS: CE1 to PE1: after PE1 receives the packet, it learns the MAC
  address of CE1 and then searches the MAC address table of the VPLS
  instance using the destination MAC as the key. If the destination MAC
  is found, the packet is sent to the PW toward PE2: the VPLS label is
  added, then the global MPLS label, and the packet is sent to PE2. After
  PE2 receives the packet, it learns the source address and searches the
  table; if an AC is found, it removes the labels and sends the packet
  out the CE2 interface; if the destination is not found in the table, it
  floods the packet within the VPLS instance following the split horizon
  principle.
Traffic engineering means planning how traffic is carried across the
network. Despite the efforts of network designers, the actual traffic in
the network never matches the predicted values: traffic sometimes grows
faster than expected, yet designers cannot upgrade the network at once.
Rapid traffic growth, emergencies, or network accidents can raise the
bandwidth demand in certain places while some links in the network remain
underutilized. The core idea of traffic engineering is to shift traffic:
traffic congesting one link is moved onto links that are not fully
utilized. Traffic engineering is not proprietary to MPLS; it is a general
solution. MPLS-based traffic engineering is an attempt to apply
connection-oriented traffic engineering technology and integrate it with
IP routing technology.
At the ingress of the MPLS network (which can be considered the source of
the data), MPLS traffic engineering controls the path toward a specific
destination: it creates the LSP, reserves network bandwidth along the
route, balances the traffic load, and makes full use of the link
bandwidth. MPLS traffic engineering is abbreviated MPLS-TE.
MPLS-TE guarantees bandwidth for each traffic flow by creating tunnels.
After a tunnel is created, the data is mapped to a FEC and forwarded
inside the tunnel along the LSP path. At the head end, the tunnel appears
as a tunnel interface, and any traffic that should traverse the tunnel
must be sent through that interface. In network routing, the tunnel
interface can be reached through static or dynamic routes, and routes
pointing to the tunnel interface can be distributed by dynamic routing.
The link state routing protocol (OSPF or IS-IS), according to the known
network topology and the advertised MPLS-TE network topology
information, calculates the shortest path of the required MPLS-TE tunnel.
neighbor address
TE metric: the cost used when calculating the tunnel path on the link.
Command: mpls traffic-eng admin-weight
Maximum physical link bandwidth: the maximum physical bandwidth on the
link interface. Command: bandwidth
Maximum reserved bandwidth: the maximum bandwidth that can be reserved on
the link. Command: ip rsvp bandwidth
Unreserved bandwidth for each priority: the unassigned reserved bandwidth
of each priority of tunnel on the link. Command: N/A
Attribute flag: link attributes defined by the user; the link is included
in or excluded from the path calculation according to the attribute.
Command: mpls traffic-eng attribute-flags
After the MPLS-TE tunnel path is calculated, the path information is passed to RSVP, which then creates the tunnel according to that path.
After the tunnel is created (UP), RSVP-TE protects the tunnel in "soft-state" mode. Soft state means that the relevant states are maintained by refreshing Path and RESV messages: each node periodically sends Path messages downstream and RESV messages upstream, and each node waits for upstream Path messages and downstream RESV messages. If the wait times out, the node assumes the tunnel no longer needs to be maintained and deletes the corresponding resource reservation. Independence means that a node does not immediately send a Path message downstream upon receiving an upstream Path message, nor immediately send a RESV message upstream upon receiving a downstream RESV message (except when a Path or RESV message is received for the first time). Each node refreshes on its own cycle, and the cycles are not identical: the refresh interval is jittered by up to 50% (up or down) to avoid global cycle synchronization. With a 30-second refresh cycle, the actual refresh times therefore vary between 15 s and 45 s (for example 30 s, 45 s, 15 s, 30 s).
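The jittered refresh timer described above can be sketched as follows. This is a minimal illustration of the idea, not the switch's implementation; the 30-second base cycle and the ±50% jitter are taken from the text.

```python
import random

def next_refresh_interval(base_seconds: float = 30.0) -> float:
    """Pick the next Path/RESV refresh interval, jittered by up to
    +/-50% of the base cycle so that nodes do not synchronize their
    refresh cycles globally."""
    return base_seconds * random.uniform(0.5, 1.5)

# With a 30 s base cycle, every refresh falls somewhere in [15 s, 45 s].
samples = [next_refresh_interval(30.0) for _ in range(5)]
assert all(15.0 <= t <= 45.0 for t in samples)
```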
Static Route
A static route can direct specific traffic through the MPLS-TE tunnel.
For routes that reach the MPLS-TE tunnel endpoint, load can be balanced between the IGP path and the MPLS-TE tunnel. The nodes downstream of the tunnel endpoint are outside the scope of traffic engineering, so you need not worry about whether those nodes support traffic engineering.
A next hop generated by the automatic route cannot load-balance with a next hop generated by the forwarding adjacency: the automatic route has absolute priority. The forwarding adjacency works by influencing the network topology used in the SPF route calculation, whereas the automatic route works by replacing the next hop of the relevant routes after the SPF calculation. Therefore, the next hop generated by the automatic route always wins.
For MPLS-TE tunnels or IGP paths, the condition for load balancing is that the metric values of the paths are the same.
Tunnel Protection
In the network, the links and switch nodes carrying tunnel traffic may fail because of internal faults, and the failure of any link or node can cause network breakdown and data loss. When a failure occurs, the IGP running in a large-scale network needs a long time to re-converge, and RSVP-TE also needs a long time to update the path used by the tunnel. During this period, which can last minutes, the data in the tunnel is lost. To prevent this, RSVP-TE provides sophisticated tunnel protection and restoration functions that reduce the data loss or interruption caused by link or node failures.
There are two types of protection: full path protection and fast re-route (FRR) protection. MP switches support FRR.
As shown in the preceding figure, the backup LSP protects the R2 node and, at the same time, the R1->R2 link.
1. One-to-One
One-to-One mode means that one backup LSP protects one protected tunnel. See the following figure: the red LSP is the backup LSP, called the Detour LSP in this mode, and it protects the primary LSP (TUNNEL). The Detour LSP starts from switch S1, which is called the Point of Local Repair (PLR); it is the ingress device of the detour. The Detour LSP bypasses S2, the downstream node of the PLR (S1), and its destination is the egress of the protected tunnel. It meets the primary LSP at S3 and is merged into it; this action is called "merge", so S3 is called the Merge Point (MP). Strictly speaking, the merge operation is not mandatory, but without it multiple LSP signaling sessions would have to be maintained beyond the MP; therefore the merge is performed.
A Detour LSP in One-to-One mode exists only in dependence on the protected LSP: if the protected LSP is deleted, the Detour LSP associated with the tunnel is deleted as well.
In One-to-One mode, the ingress node of the primary LSP initiates the FRR request, and each node in the primary LSP (including the ingress) tries to create a Detour LSP with itself as the starting point. The scalability of this protection mode is therefore poor. MP switches do not support One-to-One mode.
2. Facility Mode:
In the facility mode of node protection, the tail node of the bypass tunnel is the next-next hop (NNHOP) of the PLR, shown as device S3 in figure 25-21; the bypass tunnel goes around S2, the downstream node of the PLR.
In the facility mode of link protection, the tail node of the bypass tunnel is the next hop (NHOP) of the PLR; the bypass tunnel goes around the S1->S2 link between the PLR and its downstream node (S2).
MP series switches support the facility mode protection, including the link
protection and node protection.
Graceful Restart
Graceful Restart (GR) means that the forwarding service is not interrupted while the protocol restarts.
will send the path message. S3 device will send the recovery path
message for helping S2 device to restore the state.
MPLS OAM
Introduction to MPLS OAM
According to the actual demands of carrier networks, network management work can be classified into three types: Operation, Administration, and Maintenance (OAM). Operation covers prediction, planning, and configuration of the routine network and services; maintenance covers testing and fault management.
The OAM function is very important in public networks because it simplifies network operation, verifies network performance, and reduces operating cost. In networks providing QoS, OAM is particularly important. OAM functions have been defined for traditional SDH/SONET and ATM. MPLS, as the key bearer technology of the scalable next-generation network, provides multi-service capability with QoS.
In an MPLS network, when a label switched path (LSP) fails to forward user data, the control plane needs a method to detect MPLS LSP data plane faults. The detection methods of traditional IP networks are not sufficient: IP ping and traceroute cannot verify the connectivity of an MPLS network, traditional traceroute cannot locate MPLS LSP faults hop by hop or return LSP-related information, and successful IP forwarding does not mean that the LSP is connected. In addition, standard ICMP packets cannot return LSP-related information such as the label stack and downstream mapping.
A method for detecting faults in the MPLS network is therefore required. This document describes a simple but effective mechanism, MPLS LSP ping/traceroute, for detecting faults of an MPLS LSP.
2. Basic Principle
LSP ping uses packets belonging to a specific FEC to verify the integrity of the LSP (from ingress LSR to egress LSR) for that FEC. The information of the FEC concerned is carried in the MPLS echo request message.
In an LSP ping operation, the echo request packets are encapsulated in UDP packets that carry a sequence number and an NTP timestamp; the destination port is the well-known port 3503. When MPLS processes LSP ping request messages, it applies the same forwarding treatment as for other packets of the FEC. When the ping command is used to test connectivity, the packets travel to the LSP egress, where the LSR examines them to verify whether it is the actual egress of the FEC.
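The encapsulation parameters named above can be sketched as follows. The payload layout here is purely illustrative (the real echo request also carries FEC TLVs and other fields), and the function name is hypothetical; only the well-known port number 3503 and the two parameters come from the text.

```python
import struct
import time

LSP_PING_UDP_PORT = 3503  # well-known destination port for MPLS echo requests

NTP_EPOCH_OFFSET = 2208988800  # seconds between the NTP (1900) and Unix (1970) epochs

def build_echo_request(seq: int) -> bytes:
    """Illustrative payload carrying the two parameters the text names:
    a sequence number and an NTP-style timestamp (a sketch, not the
    real MPLS echo request format)."""
    ntp_seconds = (int(time.time()) + NTP_EPOCH_OFFSET) & 0xFFFFFFFF
    return struct.pack("!II", seq & 0xFFFFFFFF, ntp_seconds)

payload = build_echo_request(seq=1)
assert len(payload) == 8  # 4-byte sequence number + 4-byte timestamp
```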
LSP traceroute is used to locate faults. The LSR that initiates the test sends ping packets toward the destination LSR with an initial TTL of 1, incremented by 1 for each successive probe. The LSRs along the path examine the packets and return the relevant control plane and data plane information.
MPLS BFD
1. Introduction to the Protocol
In asynchronous mode, BFD control packets are sent periodically between the two systems. If a system fails to receive BFD control packets from the remote end within a certain time, it declares the session down and notifies the control plane or the forwarding plane.
2. Creating a Session
When BFD is used to detect faults of an MPLS LSP, a BFD session is created between the ingress LSR and the egress LSR, and the BFD control packets travel along the same data path as the LSP.
A. The ingress LSR (the active party of the session) sends an echo
request packet carrying its local session discriminator.
B. The egress LSR (the passive party of the session) replies with an
echo reply packet carrying its local session discriminator.
C. The ingress LSR sends a BFD control packet to the egress LSR,
with the Your Discriminator field set to the session discriminator
of the egress LSR, and enters the Down state.
D. The egress LSR receives the BFD control packet of the ingress
LSR, sends a BFD control packet to the ingress LSR, and enters
the Down state.
E. After the ingress LSR receives the BFD control packet of the
egress LSR, its state changes from Down to Init. It determines the
local sending interval and detection time according to the time
parameters carried in the packet, starts the BFD transmit timer,
and sends BFD control packets at the negotiated interval.
F. The egress LSR receives the BFD control packets of the ingress
LSR, and its state changes from Down to Up.
G. After the ingress LSR receives the BFD packets of the egress LSR,
its state changes from Init to Up.
Session creation is thus a three-way handshake; after it completes, the
session becomes Up and the corresponding parameters are negotiated.
Subsequent state changes are driven by the fault detection results and
handled accordingly. The state machine transitions are as follows:
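The handshake above can be modeled as a small state machine. This is an illustrative sketch (timer negotiation and the AdminDown state are omitted), not the switch's implementation.

```python
class BfdSession:
    """Minimal sketch of the BFD handshake state machine
    (Down -> Init -> Up), driven by the state carried in
    received control packets."""
    def __init__(self):
        self.state = "Down"

    def receive(self, remote_state: str) -> None:
        if self.state == "Down":
            if remote_state == "Down":
                self.state = "Init"   # peer is also starting up
            elif remote_state == "Init":
                self.state = "Up"     # peer already saw our Down packet
        elif self.state == "Init":
            if remote_state in ("Init", "Up"):
                self.state = "Up"
        elif self.state == "Up":
            if remote_state == "Down":
                self.state = "Down"   # peer detected a failure

# Ingress LSR: starts Down, sees the egress's Down packet (step E),
# then the egress's Up packet (step G).
ingress = BfdSession()
ingress.receive("Down")
assert ingress.state == "Init"
ingress.receive("Up")
assert ingress.state == "Up"
```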
Overview
With the rapid development of IP network scale and services, the number of IP network users keeps increasing and more and more problems of the IP network appear, such as insufficient address space and security issues. To solve these Internet problems, especially the shortage of address space, IETF began defining the next-generation Internet protocol, the successor to IPv4, in 1992; it is called IPng or IPv6.
Main contents:
ICMPv6 protocol
IPv6 address
Traffic class (Type): 8 bits, indicating the differentiated service the packet requires. RFC 1883 originally defined this as a 4-bit field named Priority; the field was later renamed, and in the latest IPv6 Internet drafts it is called the traffic class. The definition of the field is independent of IPv6 and is currently not fixed in any RFC; its default value is all zeros.
Flow label: 20 bits, used to identify packets that belong to the same service flow. One node can be the source of multiple service flows; the flow label together with the source node address uniquely identifies a service flow. RFC 1883 originally defined the field as 24 bits, but after the traffic class field grew to 8 bits, the flow label field was shortened in compensation.
Payload length: 16 bits, giving the length in bytes of the packet payload, that is, the bytes of the packet that follow the IPv6 header. Note that the IPv6 extension headers are included in the payload length.
Next header: indicates the protocol type of the header that follows the IPv6 header. Like the IPv4 protocol field, the next header field can indicate an upper-layer protocol such as TCP or UDP, but it can also indicate the presence of an IPv6 extension header.
Hop limit: 8 bits. Each node that forwards the packet decrements this field by 1; if it reaches 0, the packet is dropped. IPv4 has the Time To Live field with a similar function, but unlike IPv4, IPv6 deliberately defines no upper bound on packet lifetime: detecting and discarding outdated packets is left to higher-layer protocols.
Source address: The length is 128 bits, indicating the address of the
sender of the IPv6 packet.
Destination address: The length is 128 bits, indicating the address of the receiver of the IPv6 packet. The address can be a unicast, multicast, or anycast address. If the routing extension header is used (to define particular routes the packet must pass through), the destination address can be that of an intermediate node rather than the final destination.
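The fixed 40-byte header layout described by the fields above can be parsed as follows; this is a minimal sketch for illustration, not production packet-handling code.

```python
import struct

def parse_ipv6_header(hdr: bytes) -> dict:
    """Parse the 40-byte fixed IPv6 header into its fields."""
    ver_tc_flow, payload_len, next_header, hop_limit = struct.unpack("!IHBB", hdr[:8])
    return {
        "version": ver_tc_flow >> 28,                 # 4 bits, always 6
        "traffic_class": (ver_tc_flow >> 20) & 0xFF,  # 8 bits
        "flow_label": ver_tc_flow & 0xFFFFF,          # 20 bits
        "payload_length": payload_len,                # 16 bits, includes extension headers
        "next_header": next_header,                   # 8 bits, e.g. 6 = TCP, 17 = UDP
        "hop_limit": hop_limit,                       # 8 bits
        "src": hdr[8:24],                             # 128-bit source address
        "dst": hdr[24:40],                            # 128-bit destination address
    }

# A hand-built header: version 6, flow label 0x12345, payload 20 bytes,
# next header UDP (17), hop limit 64, all-zero addresses.
hdr = struct.pack("!IHBB", (6 << 28) | 0x12345, 20, 17, 64) + bytes(32)
fields = parse_ipv6_header(hdr)
assert fields["version"] == 6 and fields["hop_limit"] == 64
assert fields["flow_label"] == 0x12345
```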
ICMPv6 Protocol
IP nodes need a special protocol for exchanging packets that carry information about IP itself; ICMP serves exactly this purpose. When IPv4 was upgraded to IPv6, ICMP was revised accordingly; the latest ICMPv6 is defined in RFC 2463. ICMP packets are used to report errors and status information, and also support packet-level Internet testing (ping) and route tracing.
ICMP packets are generated in response to certain errors. For example, if a gateway device cannot process an IP packet for some reason, it may generate an ICMP packet and return it directly to the source node of the packet; the source node then takes measures to correct the reported error. For instance, if the gateway device cannot process an IP packet because the packet is too long to be sent onto the network link, it generates an ICMP error packet indicating that the packet is too long. On receiving it, the source node can use the packet to determine a more suitable packet length and re-send the data in a series of new IP packets.
RFC 2463 defines the following error packet types (excluding the informational packets defined in the document):
1. Destination unreachable;
2. Packet too big;
3. Timeout;
4. Parameter problem.
Destination unreachable
This packet is generated when a gateway device or the source host cannot forward a packet for reasons other than traffic congestion. The error packet defines several codes, including:
2: Address unreachable. This code indicates a problem resolving the IPv6 destination address to a link-layer address, or a problem delivering the packet on the destination link.
3: Port unreachable. This occurs when no higher-layer protocol (such as UDP) is listening on the destination port and the transport-layer protocol has no other way to report the problem to the source node.
Packet too big
When a gateway device cannot forward a received packet because the packet is larger than the MTU of the outgoing link, it generates a "packet too big" error. This ICMPv6 error packet carries a field indicating the MTU of the link that caused the problem, which makes it a useful error packet during path MTU discovery.
Timeout
When a gateway device receives a packet with hop limit 1, it must decrement the value before forwarding the packet. If decrementing makes the hop limit field 0 (or the gateway device receives a packet whose hop limit is already 0), the gateway device must drop the packet and send an ICMP timeout packet to the source node. When the source node receives such a packet, it can conclude either that the original hop limit was set too small (the actual route of the packet is longer than expected) or that a routing loop caused the delivery failure. This packet is the basis of the traceroute function, with which a node can identify every gateway device on the path from the source node to the destination node. It works as follows: first, the hop limit of a packet to the destination is set to 1; the first gateway device the packet reaches decrements the hop limit to 0 and returns a timeout packet, so the source node identifies the first gateway device on the path. If the packet must pass a second gateway device, the source node re-sends a packet with hop limit 2, and that gateway device decrements the hop limit to 0 and generates another timeout packet. The process ends when a packet reaches the destination address, by which time the source node has collected a timeout packet from every intermediate gateway device.
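The probing loop described above can be simulated in a few lines. This is a toy model of the mechanism, not real packet I/O: the path is given as a list of gateway names, and each "timeout packet" is represented by appending the responding gateway.

```python
def traceroute(path, max_hops=16):
    """Simulate hop-limit-based route tracing.  `path` is the list of
    gateway devices between source and destination; the probe's hop
    limit is incremented until a probe reaches the destination."""
    discovered = []
    for hop_limit in range(1, max_hops + 1):
        if hop_limit <= len(path):
            # This gateway decrements the hop limit to 0, drops the probe,
            # and returns an ICMPv6 timeout packet to the source.
            discovered.append(path[hop_limit - 1])
        else:
            break  # the probe reached the destination; tracing is complete
    return discovered

assert traceroute(["R1", "R2", "R3"]) == ["R1", "R2", "R3"]
```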
Parameter problem
When some part of the IPv6 header or an extension header is faulty, the gateway device cannot process the packet and simply drops it. The gateway device should generate an ICMP parameter problem packet indicating the type of fault (such as an erroneous header field, an unrecognized next header type, or an unrecognized IPv6 option) and use a pointer value to indicate the offset of the erroneous byte.
ICMPv6 also includes a function unrelated to error reporting. All IPv6 nodes must support two kinds of packets: echo request and echo reply. An echo request packet can be sent to any valid IPv6 address and contains an echo request identifier, a sequence number, and some data. The identifier and sequence number are optional but can be used to match replies to different requests; the data of the echo request is also optional and can be used for diagnosis. When an IPv6 node receives an echo request packet, it must return an echo reply packet containing the same request identifier, sequence number, and data carried in the original request. The ICMPv6 echo request/reply pair is the basis of the ping function. Ping is an important diagnostic tool because it provides a way to confirm whether a particular host is connected to the same network as other hosts.
discovery part of ARP and ICMP in IPv4, and adds a mechanism for detecting unreachable neighbors. The neighbor discovery protocol implements gateway device and prefix discovery, address resolution, next-hop address determination, redirection, neighbor unreachability detection, and duplicate address detection, as well as functions such as link-layer address change, inbound load balancing, anycast addresses, and proxy advertisement. The neighbor discovery protocol uses five types of ICMPv6 messages to implement these functions. The five message types are as follows:
1. Router Solicitation: when an interface comes up, the host sends a Router Solicitation message asking gateway devices to generate a Router Advertisement message immediately, instead of waiting for the next scheduled time;
5. Redirect: the gateway device informs the host, via a Redirect message, of a better next hop toward a specific destination address when the current route is not the best one.
IPv6 has a design requirement that, even in a constrained network, hosts must work correctly without a routing table stored on the gateway device or fixed configuration. Therefore, the host must configure itself automatically and learn how to send data to each destination. The storage that holds this information is called a cache; its data structure is a queue of records, called entries. The information in each entry is valid only for a limited time, and entries must be cleaned out of the cache to bound its size. The host needs to maintain the following information for each interface:
Prefix list: the list of prefixes of on-link addresses. Entries of the prefix list are created from information received in Router Advertisements. Each entry has an associated invalidation timer value (taken from the advertisement) used to discard the prefix when it becomes invalid; a special unlimited timer value marks a prefix as valid forever unless a new (finite) value is received in a later advertisement. The link-local prefix is always in the prefix list with an unlimited invalidation timer, regardless of whether any gateway device advertises it, and received Router Advertisements must not modify the invalidation timer of the link-local prefix.
Default router list: the list of routers to which packets may be sent. Entries of the router list point to entries in the neighbor cache. The default gateway selection algorithm is: prefer gateway devices known to be reachable over gateway devices whose reachability is unconfirmed. Each entry has an associated invalidation timer value (taken from Router Advertisement information) used to delete entries that are no longer advertised.
4. DELAY: the neighbor's reachability is no longer confirmed, but traffic was recently sent to it. Rather than probing the neighbor at once, send a probe after a short delay, which gives upper-layer protocols a chance to confirm reachability;
5. PROBE: the neighbor's reachability is no longer confirmed; send unicast Neighbor Solicitation probes to verify reachability.
When a node sends a packet to a destination, it uses the destination cache, prefix list, and default router list to determine the appropriate next-hop IP address, and then queries the neighbor cache to determine the link-layer address of that neighbor.
The next hop of an IPv6 unicast address is determined as follows:
The sender performs longest-prefix matching against the prefix list to determine whether the destination is on-link. If it is, the next-hop address is the destination address itself; otherwise the sender selects a next hop from the default router list. If the default router list is empty, the sender assumes the destination is on-link.
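The decision procedure above can be sketched with the standard ipaddress module. The function name and the policy of simply taking the first default router are illustrative assumptions; real implementations also consider router reachability.

```python
import ipaddress

def next_hop(dst, prefix_list, default_routers):
    """Sketch of next-hop determination: prefix-list match decides
    on-link vs. off-link; off-link traffic goes to a default router;
    an empty default router list means assume on-link."""
    addr = ipaddress.IPv6Address(dst)
    # Any covering prefix means the destination is on-link.
    if any(addr in p for p in map(ipaddress.IPv6Network, prefix_list)):
        return addr          # on-link: the next hop is the destination itself
    if default_routers:
        return ipaddress.IPv6Address(default_routers[0])
    return addr              # empty default router list: assume on-link

# On-link destination: delivered directly.
assert next_hop("2001:db8:1::5", ["2001:db8:1::/64"], ["fe80::1"]) \
    == ipaddress.IPv6Address("2001:db8:1::5")
# Off-link destination: forwarded via the default router.
assert next_hop("2001:db8:2::5", ["2001:db8:1::/64"], ["fe80::1"]) \
    == ipaddress.IPv6Address("fe80::1")
```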
After learning the IPv6 address of the next-hop gateway device, the sender checks the neighbor cache for the link-layer address. If no entry exists for that next-hop IPv6 address, the gateway device proceeds as follows:
When address resolution completes, the link-layer address is obtained and saved in the neighbor cache; the entry then enters the reachable state and the queued packets can be transmitted.
For multicast packets, the next hop is always regarded as on-link, and the link-layer address corresponding to a multicast IPv6 address is derived in a way that depends on the link type.
When the neighbor cache is used to transmit a unicast packet, the sender checks the associated reachability information and validates the neighbor's reachability according to the neighbor unreachability detection algorithm. When a neighbor becomes unreachable, next-hop determination is performed again to check whether another path to the destination is reachable.
If the IP address of the next-hop node is known, the sender looks up the neighbor's link-layer information in the neighbor cache. If there is no entry, the sender creates one, sets its state to INCOMPLETE, starts address resolution, and queues the packets whose address resolution is not yet complete. For interfaces with multicast capability, the address resolution process sends a Neighbor Solicitation message and waits for a Neighbor Advertisement. When a Neighbor Advertisement response is received, the link-layer address is saved in the neighbor cache and the queued packets are sent.
When transmitting unicast packets, each time the sender reads an entry of the neighbor discovery cache it checks the associated reachability information according to the neighbor unreachability detection algorithm; unreachability detection may cause the sender to send a unicast Neighbor Solicitation to verify that the neighbor is reachable.
When traffic is sent to a destination for the first time, next-hop determination is performed; as long as the destination can still communicate normally, the destination cache entries continue to be used. If the neighbor unreachability detection algorithm decides at some point that communication has failed, next-hop determination is performed again. For example, traffic through a faulty gateway device should switch to a gateway device that works normally, and traffic to a mobile node may be re-routed to its mobile agent.
When the node re-determines the next hop, it need not drop the entire destination cache entry; the PMTU and round-trip timer information in it remains useful.
The gateway device must unconditionally drop Router Solicitation and Router Advertisement messages that fail the validity check.
The router discovery function is used to identify the gateway devices connected to a given link and to obtain the prefixes and configuration parameters related to address autoconfiguration.
Besides responding to solicitation messages, gateway devices periodically send multicast Router Advertisement messages to announce their reachability on the link. Each host receives Router Advertisement messages from the gateway devices connected to the link and builds its default router list (the gateway devices used when the path to a destination is unknown). If gateway devices generate Router Advertisement messages frequently, hosts can learn of their existence within a few minutes; otherwise, neighbor unreachability detection is used.
The Router Advertisement message should contain the prefix list used to determine on-link reachability. The host uses the prefixes obtained from Router Advertisement messages to decide whether a destination is on-link (directly reachable) or off-link (reachable only through a gateway device). A destination may be on-link and yet not covered by any prefix learned from Router Advertisements; in this case, the host regards the destination as off-link, and a gateway device sends a Redirect message to the sender.
When a host sends a Router Solicitation message to a gateway device, the gateway device should send a Router Advertisement message at once, which speeds up configuration of the node.
2. Address resolution
IPv6 nodes resolve IPv6 addresses to link-layer addresses with Neighbor Solicitation and Neighbor Advertisement messages; address resolution is not performed for multicast addresses.
The node starts the address resolution process with a multicast Neighbor Solicitation message, which asks the target gateway device to return its link-layer address. The source gateway device includes its own link-layer address in the Neighbor Solicitation message and multicasts the message to the solicited-node multicast address associated with the target address. The target gateway device returns its link-layer address in a unicast Neighbor Advertisement message. Through this pair of messages, the source and destination gateway devices learn each other's link-layer addresses.
3. Re-direction function
The gateway device must determine the link-local address of each neighboring gateway device, so that the target address of a Redirect message can identify the neighboring gateway device by its link-local address.
Because a source terminal may not respond correctly to Redirect messages, or may ignore unauthenticated ones, the gateway device must limit the rate at which it sends Redirect messages to save bandwidth and processing overhead.
All paths between a host and its neighbor nodes should perform neighbor reachability detection, including host-to-host, host-to-gateway, and gateway-to-host communication. It can also be used between gateway devices to detect a failed neighbor or a failed forward path to a neighbor.
A neighbor is considered reachable if confirmation has recently been received that the neighbor's IP layer received a packet sent to it. Neighbor unreachability detection uses two methods of confirmation: one is a hint from an upper-layer protocol indicating that the connection is making forward progress; the other is for the gateway device to send a unicast Neighbor Solicitation message and receive a Neighbor Advertisement in response. To reduce unnecessary network traffic, probe messages are sent only to the neighbor in question.
Since IETF published RFC 2461, the standard text of the neighbor discovery protocol, in December 1998, neighbor discovery has become an essential protocol for IPv6 nodes, solving the interoperation problem among all nodes connected to a link.
The current IPv6 standards are stable and the related products and devices developed by international manufacturers have matured, but the demand of the China market for IPv6 technology is not yet clear. Therefore, IPv6 technology in China is still in the practice and operation phase of trial networks. As commercial deployment of IPv6 network applications accelerates, the neighbor discovery protocol will be used ever more widely.
IPv6 Address
The most obvious difference between IPv4 and IPv6 addresses is their length: an IPv4 address is 32 bits long, while an IPv6 address is 128 bits long. RFC 2373 not only explains the notation for IPv6 addresses, but also describes the different address types and their structures. An IPv4 address can be divided into two or three parts (network ID, node ID, and subnet ID); the IPv6 address has a larger address space and supports more fields.
There are three types of IPv6 address: unicast, multicast, and anycast. Unicast and multicast addresses are similar to their IPv4 counterparts; IPv6 no longer supports the IPv4 broadcast address, but adds the anycast address.
An IPv6 address is four times as long as an IPv4 address, so writing one out is correspondingly more involved. The basic notation for an IPv6 address is X:X:X:X:X:X:X:X, where each X is a 4-digit hexadecimal integer (16 bits). Each hexadecimal digit represents 4 bits, each integer contains 4 digits, and each address comprises 8 integers, for 128 bits in total (4 × 4 × 8 = 128). For example, the following are valid IPv6 addresses:
CDCD:910A:2222:5498:8475:1111:3900:2020
1030:0:0:0:C9B4:FF12:48AA:1A2B
2000:0:0:0:0:0:0:1
These integers are hexadecimal; the letters A-F represent 10-15. Every integer in the address must be present, but leading zeros need not be written. This is the standard IPv6 address notation; there are two further common forms. Some IPv6 addresses contain a long run of zeros (as in examples 2 and 3 above). In this case, the standard permits compressing the run of zeros, so the address 2000:0:0:0:0:0:0:1 can be written as 2000::1.
The double colon means that the address can be expanded to a complete 128-bit address. Only 16-bit groups that are all zeros can be replaced in this way, and the double colon may appear only once in an address.
In mixed IPv4 and IPv6 environments, a third notation is available: the lowest 32 bits of the IPv6 address can be used to carry an IPv4 address. Such an address is written in the mixed form X:X:X:X:X:X:d.d.d.d, where each X is a 16-bit hexadecimal integer and each d is an 8-bit decimal integer. For example, 0:0:0:0:0:0:10.0.0.1 is a valid IPv6 address; combining the two notations, it can also be written as ::10.0.0.1.
Since an IPv6 address divides into two parts (subnet prefix and interface
ID), it is convenient to write a node address together with an additional
value, in a notation similar to a CIDR address, indicating how many
leading bits of the address form the prefix. The prefix length is appended
to the IPv6 node address after a slash, as in
1030:0:0:0:C9B4:FF12:48AA:1A2B/60. In this address, the prefix length
used for routing is 60 bits.
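The prefix notation above maps directly onto `ipaddress.IPv6Interface`, which splits a node address from its routing prefix:

```python
import ipaddress

# The example node address with its 60-bit prefix length.
iface = ipaddress.IPv6Interface("1030:0:0:0:C9B4:FF12:48AA:1A2B/60")
print(iface.network)   # the routing prefix: the first 60 bits, rest zeroed
print(iface.ip)        # the full 128-bit node address
```

The `network` attribute keeps only the prefix bits, which is exactly what a router uses for forwarding.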
1. Unicast: the identifier of a single interface. A packet sent to a unicast
address is delivered to the interface identified by that address.
Unicast
The unicast address identifies a single IPv6 interface. A node can have
multiple IPv6 network interfaces, and each interface must have an
associated unicast address. The unicast address can be viewed as one
piece of information carried in a 128-bit field that designates one
particular interface; alternatively, the bits can be interpreted as several
smaller pieces of information. Either way, taken together they form one
128-bit address that identifies one node interface.
How much structure an IPv6 address reveals to a node depends on who
views the address and why. For example, a node may only need to know
that the whole 128-bit address is a unique identifier, without caring
where the node sits in the network. A gateway, on the other hand, can
use parts of the address to decide which particular network a packet
belongs to, or to identify a unique node on the subnet.
For example, an IPv6 unicast address can be viewed as an entity with
two fields: one identifies the network, and the other identifies the node's
interface on that network. The network ID can itself be divided into
several parts identifying different levels of the network. Like an IPv4
CIDR address, an IPv6 unicast address can be split at a chosen boundary
into two parts: the high-order part of the address contains the prefix
used for routing, while the low-order part contains the network interface
ID.
The simplest view treats the IPv6 address as an undifferentiated 128-bit
value, but in terms of format it divides into two segments: the interface
ID and the subnet prefix. The lengths of the two are variable; the length
of the interface ID depends on the length of the subnet prefix. A gateway
device close to the addressed node interface (far from the backbone
network) can identify the interface with relatively few bits, while a
gateway device close to the backbone needs only a few address bits to
specify the subnet prefix. In that case, most of the address is used to
identify the interface ID.
RFC 2373 changes and simplifies IPv6 address allocation. Allocation
based on physical location is dropped, and the provider-based unicast
address becomes the aggregatable global unicast address. As the name
change suggests, addresses can be aggregated both as previously
defined, by provider, and, newly, by exchange point, which gives a more
balanced address classification. The NSAP and IPX address spaces
remain reserved, and 1/8 of the address space is allocated to
aggregatable addresses. Apart from the multicast addresses and one
class of reserved addresses, the rest of the IPv6 address space is
unassigned, reserving ample room for future growth.
1. Interface ID
In the IPv6 addressing structure, every IPv6 unicast address needs an
interface ID. The interface ID plays a role like that of a MAC address. A
MAC address is burned into the NIC by its manufacturer and is globally
unique; no two NICs can have the same MAC address, and the address
identifies the interface at the network link layer. The interface ID of an
IPv6 host address is based on the IEEE EUI-64 format, which builds a
64-bit interface ID from the existing MAC address; the result is unique
both globally and locally. The appendix of RFC 2373 explains how the
interface ID is constructed.
The 64-bit interface ID can uniquely identify each network interface,
which means that in theory there can be 2^64 distinct physical
interfaces, about 1.8 × 10^19 different addresses, while using only half
of the IPv6 address space.
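The EUI-64 construction can be sketched in a few lines: insert FF FE in the middle of the 48-bit MAC address and invert the universal/local bit, in the style of the RFC 2373 appendix. The MAC address below is an arbitrary example:

```python
def eui64_interface_id(mac: str) -> str:
    """Build a 64-bit interface ID from a 48-bit MAC address (EUI-64 style)."""
    b = bytearray(int(x, 16) for x in mac.split(":"))
    b[0] ^= 0x02                                      # invert the universal/local bit
    eui = bytes(b[:3]) + b"\xff\xfe" + bytes(b[3:])   # insert FF FE in the middle
    groups = [(eui[i] << 8) | eui[i + 1] for i in range(0, 8, 2)]
    return ":".join(f"{g:x}" for g in groups)

print(eui64_interface_id("00:25:96:12:34:56"))   # 225:96ff:fe12:3456
```

The resulting 64 bits form the low half of an IPv6 unicast address, for example appended to a link-local or global prefix.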
FP field: the format prefix of the IPv6 address. It is three bits long and
identifies which part of the IPv6 address space the address belongs to.
For aggregatable global unicast addresses the field is 001.
TLA ID field: the top-level aggregation ID, containing the highest level of
address routing information. It is the coarsest routing information in the
internetwork. The field is currently 13 bits, allowing at most 8192
different top-level routes.
RES field: 8 bits, reserved for future use. It may eventually be used to
extend the top-level or next-level aggregation ID field.
Interface ID field: 64 bits, holding the 64-bit IEEE EUI-64 interface ID.
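The field layout just described can be pulled out of an address with plain bit shifts. This is only a sketch based on the fields named above (FP, TLA ID, RES, interface ID); the 40 bits between RES and the interface ID carry the lower-level aggregation IDs, which are not detailed here and are skipped:

```python
import ipaddress

def unicast_fields(addr: str) -> dict:
    """Decompose an aggregatable global unicast address into the fields above."""
    n = int(ipaddress.IPv6Address(addr))
    return {
        "fp": n >> 125,                       # 3-bit format prefix (001 = aggregatable)
        "tla": (n >> 112) & 0x1FFF,           # 13-bit top-level aggregation ID
        "res": (n >> 104) & 0xFF,             # 8-bit reserved field
        "interface_id": n & ((1 << 64) - 1),  # 64-bit EUI-64 interface ID
    }

print(unicast_fields("2001:0:0:0:0:0:0:1"))
```

For 2001::1 the top three bits are 001, so the FP field comes out as 1, marking it as an aggregatable global unicast address.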
The first 1/256 of the IPv6 address space, all addresses whose first 8
bits are 0000 0000, is reserved. Most of this space is empty; what is
used serves as special addresses. The special addresses include:
IPv6 addresses that embed an IPv4 address: there are two kinds of
these. One lets an IPv6 node reach nodes that support only IPv4; the
other lets IPv6 gateway devices carry IPv6 packets across an IPv4
network in tunnel mode.
The IPv4-compatible address is used by nodes that understand both IPv4
and IPv6 to send IPv6 packets through IPv4 gateway devices in tunnel
mode. The IPv4-mapped address is used by an IPv6 node to reach a node
that supports only IPv4.
For organizations unwilling to apply for globally unique IPv4 network
addresses, net-10-style private addresses combined with IPv4 network
address translation provide one option. The gateway devices used by
such organizations should not forward these addresses, but nothing
prevents them from forwarding the addresses or distinguishes them from
other valid IPv4 addresses; a gateway device can be configured to
forward them.
To provide this capability, IPv6 carves two distinct address ranges out of
the globally unique Internet space. The link-local address is used to
number hosts on a single network link; an address is identified as
link-local by the first 10 bits of its prefix.
Gateway devices never forward packets whose source or destination is a
link-local address, so they do not process such packets beyond the link.
The middle 54 bits of the address are set to 0. The 64-bit interface ID
again uses the IEEE structure, and this part of the address space allows
a single network to attach up to (2^64 − 1) hosts.
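A link-local address can be recognized by its 10-bit prefix (1111 1110 10, i.e. FE80::/10). The host address below is an illustrative example:

```python
import ipaddress

lla = ipaddress.IPv6Address("fe80::225:96ff:fe12:3456")
print(lla.is_link_local)      # True: falls inside fe80::/10
print(bin(int(lla) >> 118))   # the 10-bit link-local prefix, 0b1111111010
```

Routers must drop rather than forward any packet carrying such an address, which is what makes the range safe to reuse on every link.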
Multicast
Like the broadcast address, the multicast address is useful on local
networks such as classic Ethernet, where every node can see all data
transmitted on the line. As each transmission begins, every node checks
the destination MAC address of the packet; if it matches the MAC
address of the local node's interface, the node accepts the packet. For
broadcast, a node only needs to listen, with no decision to make, which
is simple. Multicast is a little more complicated: a node must register a
multicast address, and when it finds that the destination address is a
multicast address, it must confirm whether that address is one it has
registered.
The format of the IPv6 multicast address differs from that of the IPv6
unicast address. A multicast address can serve only as a destination
address; no packet takes a multicast address as its source address. The
first byte of the address is all 1s, marking it as multicast. The rest of
the multicast address consists of the following three fields:
Flag field: it comprises four single-bit flags. Currently only the fourth bit
is specified; it indicates whether the address is a well-known multicast
address assigned by the Internet numbering organization or a transient
multicast address used for a particular occasion. If the flag bit is 0, the
address is well-known; if the flag bit is 1, the address is transient. The
other three flag bits are reserved for future use.
Range field: 4 bits, indicating the multicast scope, that is, whether the
multicast group includes only the nodes on one local network, in one site
or one organization, or nodes at any location in the IPv6 global address
space. The possible values of the four bits are:
Group ID field: 112 bits, identifying the multicast group. The same group
ID can denote different groups depending on whether the address is
transient or well-known and on the address range. Permanent multicast
addresses use assigned group IDs with special meanings; membership in
a group is determined by the group ID together with the range.
All IPv6 multicast addresses begin with FF: the first 8 bits of the address
are all 1. The remaining flag bits are currently undefined, so if the third
hexadecimal digit of the address is 0 the address is well-known, and if it
is 1 the address is transient. The fourth hexadecimal digit gives the
range, which may also be an unassigned or reserved value.
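These rules can be sketched as a small parser. For the well-known all-nodes address ff02::1, the flags come out as 0 (well-known), the range as 2 (link scope), and the group ID as 1:

```python
import ipaddress

def multicast_fields(addr: str):
    """Split an IPv6 multicast address into its flag, range, and group ID fields."""
    n = int(ipaddress.IPv6Address(addr))
    assert n >> 120 == 0xFF          # first byte all 1s: a multicast address
    flags = (n >> 116) & 0xF         # 0 = well-known, 1 = transient
    scope = (n >> 112) & 0xF         # the 4-bit range field
    group = n & ((1 << 112) - 1)     # the 112-bit group ID
    return flags, scope, group

print(multicast_fields("ff02::1"))   # (0, 2, 1)
```

Changing the third hexadecimal digit to 1, as in ff12::1, would mark the same group ID as a transient address with the same range.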
2. Multicast group
IPv4 already has multicast applications, since some applications send the
same data to many nodes. Assigned multicast addresses combine with
multicast ranges to express various meanings and serve different
applications. Previously registered multicast addresses include groups of
gateway devices, DHCP service, audio and video services, and network
gaming services. For details, refer to RFC 2375.
Anycast
A multicast address is, in a sense, shared by multiple nodes: every node
that is a member of the multicast address expects to receive all packets
sent to it. A gateway device connected to five different local Ethernet
networks forwards a copy of each multicast packet onto every network
(assuming at least one node on each network has registered the
multicast address). The anycast address is similar: multiple nodes share
one anycast address. The difference is that only one node needs to
receive the packets sent to the anycast address. Anycast is useful for
providing certain types of service, especially services that need no fixed
relationship between client and server, such as domain name servers
and time servers. One name server is as good as another and should
work the same regardless of distance; similarly, a nearby time server is
preferable. Therefore, when a host sends a request to an anycast
address to get information, the server nearest to that anycast address
should respond.
2. Anycast routing
To see how a route is chosen for an anycast packet, consider the longest
common routing prefix of the group of hosts sharing one anycast
address: they must share some common network address prefix, and
that prefix defines the region containing all the anycast nodes. For
example, an ISP can require each of its customers to provide a time
server, with all the time servers sharing one anycast address. The prefix
defining the anycast region is the one allocated to the ISP for
redistribution. Routing within the region is determined by where the
hosts sharing the anycast address sit: inside the region, the anycast
address carries a routing entry with pointers to the network interfaces of
all nodes sharing the address. In the case above the region is bounded,
but anycast hosts may also be scattered across the global Internet; in
that case the anycast address must be added to every routing table in
the world.
Hop-by-Hop Options header: this extension header must immediately
follow the IPv6 header. It contains optional data that every node along
the packet's path must examine. So far only one option is defined, the
jumbo payload option, which indicates that the payload length of the
packet exceeds what the 16-bit payload length field of IPv6 can express.
Whenever the payload of a packet (including the hop-by-hop options
header) exceeds 65535 bytes, the packet must carry this option. If a
node cannot forward the packet, it must return an ICMPv6 error packet.
Routing header: this extension header names particular nodes the
packet must pass through on the way to its destination; it contains the
list of those nodes' addresses. The destination address in the IPv6
header is then not the packet's final destination, but the first address
listed in the routing header. When the node with that address receives
the packet, it processes the IPv6 header and routing header and sends
the packet on to the second address in the routing header list, and so
on, until the packet reaches its final destination.
IPv6 extension headers make options possible without hurting
performance. A developer can use options when necessary, without
worrying that gateway devices will treat packets carrying extension
options differently, unless the routing extension header or hop-by-hop
option is present. Even when those two are set, a gateway device can
still perform the necessary processing more easily than with IPv4
options.
Extension Header ID
All IPv6 headers are the same long and look nearly the same The unique
difference is the next header field. In the IPv6 packet without extension
header, the value of the field means the upper protocol. That is to say, if
there is the TCP field in the IP packet, the 8-bit binary value of the next
header field is 6 (from RFC 1700); if there is UDP packet in the IP packet,
the value is 17. The next header field value indicates that whether there is
the next extension header and what is the next extension header.
Therefore, the IPv6 headers can be linked, beginning from the basic IPv6
header to link the extension headers one by one.
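The chaining can be illustrated with the assigned next header values from the IANA protocol numbers registry (0 hop-by-hop, 43 routing, 44 fragment, 50 ESP, 51 authentication, 60 destination options, plus 6 TCP and 17 UDP):

```python
# Assigned next header values (IANA protocol numbers registry).
NEXT_HEADER = {
    0: "Hop-by-Hop Options", 6: "TCP", 17: "UDP", 43: "Routing",
    44: "Fragment", 50: "ESP", 51: "Authentication", 60: "Destination Options",
}

def walk_chain(values):
    """values: the next header value read from each successive header."""
    return [NEXT_HEADER[v] for v in values]

# Basic IPv6 header -> routing header -> fragment header -> TCP segment:
print(walk_chain([43, 44, 6]))
```

Each header's next header field names the header that follows it, so a receiver simply reads the values one after another until it reaches the upper-layer protocol.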
1. IPv6 header
2. Hop-by-Hop Options header
3. Destination Options header (options processed by the first destination
and by the destinations listed in the routing header)
4. Routing header
5. Fragmentation header
6. Authentication header
7. ESP header
8. Destination Options header (options processed only by the final
destination)
9. Upper-layer header
From the order above, only the destination options extension header may
appear more than once in a single IP packet, and only when the packet
contains the routing extension header. The order is not absolute. For
example, when the rest of the packet must be encrypted, the ESP header
must be the last extension header. Similarly, the hop-by-hop option
takes precedence over all other extension headers, because every node
that receives the IPv6 packet must process it.
Extension headers are identified through the next header field of the
IPv6 header, which is 8 bits, so there can be at most 256 different
values. Even leaving some values aside, all possible upper-layer header
values must still be supported: the field identifies not only the extension
headers but also every other protocol that can be encapsulated in an IP
packet. Many values are therefore already assigned, and the unassigned
values are limited.
Some extension header protocol IDs in IPv6 come from IPv4, such as the
authentication header and the ESP header. Many extension headers
have already been assigned, but new options may also be defined
through the hop-by-hop options extension header and the destination
options extension header. Besides conserving protocol values for the
next header field, the option extension headers make new options easy
to deploy. If an IP packet is sent with a new header type and the
destination node supports the new type, everything goes well;
conversely, if the new header type is unknown to the destination node,
the node has to drop the packet. By contrast, all IPv6 nodes must
support the hop-by-hop options extension header, the destination
options extension header, and some basic options (refer to the next
section). So if the destination node receives a packet with a destination
options extension header, it can still respond even when it does not
support an option inside the extension header. An option can also
request that the destination node return an ICMP error packet indicating
that it does not understand the option.
Next header field and header extension length field: all IPv6 extension
headers contain the next header field. The header extension length field
occupies 8 bits and gives the length of the option header in units of 8
bytes, excluding the first 8 bytes of the extension header; that is, if the
option extension header is only 8 bytes long, the field value is 0. The
field limits an extension header to at most 2048 bytes. The remaining
part of the extension header holds the options it carries.
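The length arithmetic works out as follows (8-byte units, not counting the first 8 bytes):

```python
def ext_header_total_bytes(hdr_ext_len: int) -> int:
    """Total size of an option extension header from its 8-bit length field."""
    return (hdr_ext_len + 1) * 8

print(ext_header_total_bytes(0))    # the minimum extension header: 8 bytes
print(ext_header_total_bytes(255))  # the maximum the field allows: 2048 bytes
```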
Options
The IPv6 option contains the following three fields:
Option type: an 8-bit identifier giving the type of the option. Even when
the destination node cannot identify the option, the first 3 bits tell it how
to handle the option.
Option data length: an 8-bit integer giving the length of the option data
field; its maximum value is 255.
Option data: this field contains the option-specific data, up to 255 bytes.
The first two bits of the option type field tell the destination node what
to do when it cannot identify the option. There are the following four
option types:
00: Skip the option and continue processing the remaining part of the
extension header;
01: Drop the packet;
10: Drop the packet; whether or not the destination address of the
packet is a multicast address, send an ICMP packet to the source
address of the packet;
11: Drop the packet; only if the destination address of the packet is a
unicast or anycast address (that is, a non-multicast address), send an
ICMP packet to the source address of the packet.
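The dispatch on the top two bits can be written out directly; this is a sketch following the RFC 2460 rules for unrecognized options:

```python
def unknown_option_action(option_type: int, dest_is_multicast: bool) -> str:
    """What a node does with an option type it cannot identify."""
    top2 = option_type >> 6
    if top2 == 0b00:
        return "skip option, keep processing"
    if top2 == 0b01:
        return "drop packet"                 # drop silently, no ICMP
    if top2 == 0b10:
        return "drop packet, send ICMP"      # regardless of multicast destination
    # 0b11: report only to non-multicast destinations
    return "drop packet" if dest_is_multicast else "drop packet, send ICMP"

# The jumbo payload option type is 194 (0xC2): its top bits are 11.
print(unknown_option_action(0xC2, False))
```

The 11 encoding on the jumbo payload option means a node that does not understand it will still tell a unicast sender what went wrong, while staying quiet for multicast to avoid ICMP storms.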
The third bit of the option type indicates whether the value of the option
data may change while the packet travels from the source address to the
destination address: if it is 0, the option data cannot change; if it is 1,
the option data is variable. The hop-by-hop options extension header
and the destination options extension header contain the same two
padding options (Pad1 and PadN). Pad1 is special: it is a single byte, all
bits set to 0, with no option data length field and no option data.
PadN is identified by one of the four option types above and uses
multiple bytes to pad the extension header. If the extension header
needs N bytes of padding, the value of the option data length field is
N−2; that is, the option data field occupies N−2 bytes, all set to 0.
Together with the one-byte option type field and the one-byte option
data length field, N bytes are filled in total.
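The two padding options can be built directly (the PadN option type value is 1):

```python
def pad1() -> bytes:
    """Pad1: a single all-zero byte; no length field, no data."""
    return b"\x00"

def padn(n: int) -> bytes:
    """PadN: fill n >= 2 bytes; type byte 1, data length n-2, then n-2 zero bytes."""
    assert n >= 2
    return bytes([1, n - 2]) + bytes(n - 2)

print(padn(5).hex())   # 5 bytes of padding: type, length 3, three zero bytes
```

Padding exists because options in these headers must sit at particular alignments; Pad1 fills a single byte, PadN anything longer.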
As in other option extension headers, the first two fields give the next
header protocol and the extension header length (here, since the whole
header is only 8 bytes, the extension header length field is 0). The
jumbo payload option begins at the third byte of the extension header:
the third byte is the option type, with value 194, and the fourth byte
(the jumbo payload option data length) is 4. The last field of the option
is the jumbo payload length, giving the actual number of bytes in the IP
packet (including the hop-by-hop options extension header, but
excluding the IPv6 header).
A node can use the jumbo payload option to send jumbo IP packets only
when every gateway device along the path can process it. The option
therefore lives in the hop-by-hop extension header, which every gateway
device on the path is required to examine. The jumbo payload option
permits the IPv6 packet payload length to exceed 65535 bytes, up to
nearly 4 billion bytes (2^32 − 1). When the option is used, the 16-bit
payload length field of the IPv6 header must be 0, and the jumbo
payload length field in the extension header must be no less than
65535. If either condition is not satisfied, the node that receives the
packet should send an ICMP error packet to the source node describing
the problem. There is one further restriction: a packet carrying a
fragmentation extension header cannot use the jumbo payload option at
the same time, because a packet using the jumbo payload option cannot
be fragmented.
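A hop-by-hop header carrying only the jumbo payload option fits exactly in 8 bytes, which is why its extension header length field is 0. A sketch of building one:

```python
import struct

def jumbo_hop_by_hop(next_header: int, payload_len: int) -> bytes:
    """Hop-by-hop options header carrying only the jumbo payload option."""
    assert payload_len > 65535          # otherwise the normal length field suffices
    return struct.pack(
        "!BBBBI",
        next_header,  # protocol of the header that follows
        0,            # header extension length: 0 -> the minimum 8 bytes
        194,          # the jumbo payload option type
        4,            # option data length: the 32-bit jumbo length that follows
        payload_len,  # actual payload length, up to 2**32 - 1
    )

hdr = jumbo_hop_by_hop(6, 100000)   # e.g. a jumbo packet carrying TCP
print(len(hdr), hdr[2])
```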
filled. Moreover, every gateway device on the path must process the
whole address list whether or not it appears in the list, so processing a
source-routed packet is slow. IPv6 defines one generic routing extension
header with two one-byte fields: the routing type field and the segments
left field. The routing type field indicates the type of routing header in
use, while the segments left field gives the number of additional
gateway devices, listed in the remaining part of the extension header,
that the packet must still pass through on the way to its destination. The
remainder of the extension header is type-specific data whose format
depends on the routing header type. RFC 1883 defines one type, the
type 0 routing header.
The type 0 routing extension header solves the main problem of IPv4
source routing: only the gateway devices in the list process the routing
header, and the other gateway devices need not. Up to 256 gateway
devices can be specified in the list. The routing header is processed as
follows:
The source node builds the list of gateway devices that the packet must
pass through and constructs a type 0 routing header. The header
contains the list of gateway devices, the final destination node's
address, and the segments left value, an 8-bit integer giving the number
of gateway devices the packet must still traverse before reaching the
destination node.
When the source node sends the packet, it sets the destination address
of the IPv6 header to the address of the first gateway device in the
routing header list.
The packet is forwarded until it reaches the first stop on the path, the
destination address of the IPv6 header (the first gateway device in the
routing header list). Only that gateway device examines the routing
header; intermediate gateway devices along the way ignore it.
At the first stop and at every later stop, the gateway device checks the
routing header to ensure that the segments left value is consistent with
the address list. If the segments left value is 0, the gateway node is the
final destination of the packet, and the node continues to process the
rest of the packet.
If the node is not the packet's final destination, it takes its own address
out of the destination address field of the IPv6 header and replaces it
with the address of the next node in the routing header list. It then
decrements the segments left field by 1 and sends the packet on to the
next stop. The other nodes in the list repeat this process until the packet
reaches its final destination.
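The per-hop address swap above can be simulated in a few lines; the hostnames here are placeholders, not real devices:

```python
def type0_route(hops, final_dest):
    """Return the successive IPv6-header destination addresses of a
    type 0 source-routed packet."""
    addrs = hops + [final_dest]       # the routing header address list
    segments_left = len(addrs)
    dest = addrs[0]                   # the source puts the first hop in the IPv6 header
    segments_left -= 1
    visited = [dest]
    while segments_left > 0:          # each listed node swaps in the next address
        dest = addrs[len(addrs) - segments_left]
        segments_left -= 1
        visited.append(dest)
    return visited

print(type0_route(["gw1", "gw2"], "dst"))
```

The packet's IPv6 destination field is thus rewritten at each listed hop, ending with the true destination, exactly as in the steps above.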
As described above, a source node can send packets of up to 1280 bytes
without considering fragmentation. A 1500-byte packet might also pass
unfragmented, but IPv6 recommends that all nodes run the path MTU
discovery mechanism and permits fragmentation only at the source
node. That is to say, before sending any packets, the source probes the
path to the destination node and computes the largest packet that can
be sent without fragmentation. To send a packet whose length exceeds
that maximum, the source node must fragment it. In IPv6,
fragmentation happens only at the source node and is expressed with
the fragmentation header.
Next header field: The 8-bit field is common for all IPv6 headers.
Fragment offset field: similar to the IPv4 fragment offset field. The field
has 13 bits and counts in units of 8 bytes, giving the position of the first
byte of the data in this packet (fragment) relative to the first byte of the
fragmentable data in the original packet. That is to say, if the value is
175, the data in the fragment starts at byte 1400 of the original packet.
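The offset arithmetic is simply a multiplication by the 8-byte unit:

```python
def fragment_byte_offset(offset_field: int) -> int:
    """Convert the 13-bit fragment offset field (8-byte units) to a byte offset."""
    return offset_field * 8

print(fragment_byte_offset(175))   # the example value above
```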
Reserved field: Currently, the 2-bit field is not used and is set as 0.
ID field: similar to the IPv4 ID field, but 32 bits rather than IPv4's 16.
The source node assigns a 32-bit ID to each fragmented IPv6 packet,
identifying the packets recently sent (within the packet lifetime) from
the source address to the destination address. Only part of an IPv6
packet can be fragmented: the fragmentable part comprises the payload
and those extension headers that are processed only on reaching the
final destination. The IPv6 header and the extension headers that
gateway devices must process on the way to the destination node, such
as the routing header or hop-by-hop options header, must not be
fragmented.
GRE Technology
Main contents:
Terms
Typical Application
Terms
VPN: Virtual Private Network. VPN technology connects two or more
network sites across the Internet; within the VPN, all sites operate as if
they were on a single private network.
GRE technology creates a tunnel between a source end and a destination
end. Packets that will traverse the tunnel are encapsulated with a new
packet header (the GRE header) and then, carrying the tunnel
destination address, placed into the tunnel. When the packets reach the
tunnel destination, the GRE header is stripped off. GRE packets are
transmitted after an IP header is added, so GRE rides above the IP
layer; the protocol ID in the IP header is 47.
GRE header: added after the payload packet enters the tunnel; it carries
the GRE protocol and passenger protocol information.
The simplest GRE header contains four bytes: when the C, K, and S flag
bits are all 0, the GRE header contains only the information in bits 0 to
31.
Bit 0 is the checksum flag bit. The checksum field is valid only when this
flag is set to 1.
Bit 2 is the key flag bit. The key field is valid only when this flag is set to
1.
Bit 3 is the sequence number flag bit. The sequence number field is valid
only when this flag is set to 1.
The protocol type field carries the type value of the payload packet. In
general these values match the Ethernet frame type values; for
example, the protocol type for IP packets is 0x0800.
Checksum field
The checksum field carries the checksum of the GRE header; the
checksum must cover both the GRE header and the payload packet.
Key field
The key field carries the tunnel's key. For the tunnel to come up, the
same key must be configured at both ends (or no key at either end).
Sequence field
The sequence field carries the sequence number of the packets. If the
sequence flag bit is set, packets traversing the tunnel carry sequence
numbers, starting from 0 and incremented by 1 for each packet sent.
The far end records the sequence number of each received packet and
discards packets whose sequence numbers are invalid.
The shadowed part is the new IP header; the part in the pane is the GRE
header; the rest is the real IP packet, serving as the data.
45 00 05 f4 8f e3 00 00 7f 2f fd 85 c0 a8 01 02 c0 a8 01 01 00 00 08 00
45 00 05 dc 72 3f
2f is the protocol type carried in the outer IP header: GRE (47).
0000 0800 is the GRE header: all the flag bits are 0, which means the
GRE packet carries no checksum, key, or sequence number; the
passenger protocol is IP (0x0800).
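The dump above can be decoded programmatically: a 20-byte outer IPv4 header, the 4-byte GRE header, then the start of the inner IPv4 header:

```python
import struct

dump = bytes.fromhex(
    "450005f48fe300007f2ffd85c0a80102c0a80101"  # outer IPv4 header (20 bytes)
    "00000800"                                  # GRE header: no C/K/S flags, passenger = IP
    "450005dc723f"                              # start of the inner IPv4 header
)

outer_proto = dump[9]                             # IPv4 protocol field: expect GRE (47)
gre_flags, passenger = struct.unpack("!HH", dump[20:24])
print(outer_proto, gre_flags, hex(passenger))
```

Byte 9 of an IPv4 header is the protocol field, so 0x2f there is what hands the packet to GRE processing on receipt.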
Packet receiving: if the destination of a packet is the router, the packet
is handed to the upper-layer protocol for processing. If the protocol is
GRE (47), the router looks up the corresponding tunnel interface and
processes the GRE header: it performs a series of checks, strips the
external IP header, changes the recvif field of the mbuf to the local
tunnel interface, and finally places the packet on the IP input queue.
Packet sending: if a packet is routed out a tunnel interface, a GRE
header is added according to the interface configuration, followed by an
IP header carrying the source and destination addresses specified for
the tunnel. The packet is then routed by the tunnel's destination address
out the actual physical interface.
After the packets reach switch2, switch2 routes them. Because of the
static route, switch2 decides to forward the packets through the tunnel,
and the packets are encapsulated.
Encapsulation
In this case, the packets to be forwarded are the payload packets (IP
packets here). The tunnel prepends a GRE header whose protocol type is
set to 0x0800 (the IP protocol type), then prepends an IP header (the
delivery header) whose protocol value is set to 47 (the GRE protocol ID).
The destination address of the IP header is set to 21.1.1.1, the
destination of Tunnel1, and its source address to the source address of
Tunnel1. After the encapsulation is complete, the packets are sent from
interface 12.1.1.1.
Forwarding
When switch3 receives the packets, it hands them to the IP layer for
routing. The IP header that switch3 parses is the delivery header (the
payload packet is encapsulated, so switch3 cannot see the payload
packet's IP header). It therefore forwards by the destination address of
the delivery header, 21.1.1.1.
This continues until the packets reach switch4, the destination of the
tunnel.
De-capsulation
When switch4 receives the packets, it also parses the delivery header.
Since the destination address 21.1.1.1 is its own, it checks the protocol
field of the IP packets. The protocol field is 47, so the packets belong to
the GRE tunnel. The tunnel first removes the delivery header, then
checks the protocol type in the GRE header; the protocol type is 0x0800,
so the tunnel hands the payload packet to the IP layer for processing,
completing de-capsulation.
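The encapsulation and de-capsulation walkthrough can be modeled with plain dictionaries. This is only a conceptual sketch, not real packet handling; the inner addresses are hypothetical, while the tunnel endpoints are the 12.1.1.1/21.1.1.1 from the example:

```python
def gre_encapsulate(payload_pkt, tunnel_src, tunnel_dst):
    """Model GRE encapsulation: payload -> GRE header -> delivery IP header."""
    return {
        "delivery": {"src": tunnel_src, "dst": tunnel_dst, "proto": 47},
        "gre": {"proto": 0x0800},    # passenger protocol: IP
        "payload": payload_pkt,
    }

def gre_decapsulate(pkt, my_addr):
    """Model what switch4 does: check the delivery header, then strip it."""
    assert pkt["delivery"]["dst"] == my_addr   # addressed to this tunnel endpoint
    assert pkt["delivery"]["proto"] == 47      # GRE
    assert pkt["gre"]["proto"] == 0x0800       # passenger is IP
    return pkt["payload"]                      # handed back to the IP layer

inner = {"src": "192.168.1.10", "dst": "192.168.2.10"}   # hypothetical payload packet
pkt = gre_encapsulate(inner, "12.1.1.1", "21.1.1.1")
print(gre_decapsulate(pkt, "21.1.1.1") == inner)
```

De-capsulation returns exactly the payload that entered the tunnel, which is the whole point: the transit network only ever routes on the delivery header.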
The disadvantage of GRE is its management cost when the number of
tunnels grows large. GRE tunnels are configured manually, so the cost of
configuring and maintaining them grows with the number of tunnels,
and whenever a tunnel endpoint changes, the tunnel must be
reconfigured.
Typical Application
The GRE tunnel technology can meet the requirements of Extranet VPN
and Intranet VPN.
Transition Technology
Main contents:
Tunnel technology
Tunnel Technology
Tunnel technology provides a way to carry IPv6 data over the existing
IPv4 routing infrastructure: IPv6 packets are treated as unstructured,
opaque data, encapsulated inside IPv4 packets, and carried across the
IPv4 network. By creation mode, tunnel technology divides into manually
configured tunnels and automatically configured tunnels. Tunnels reuse
the existing IPv4 network and give IPv6 nodes a way to communicate
during the transition, but they cannot solve the problem of
interconnecting IPv6 nodes with IPv4 nodes.
Within tunnel technology, the following are widely used: manually
configured tunnels, automatically configured tunnels, 6to4, 6over4, and ISATAP.
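The encapsulation idea can be sketched as follows. This is an illustration only, assuming a bare IPv4 header with the checksum left at zero; protocol number 41 is the standard value for IPv6-in-IPv4 encapsulation.

```python
# Sketch of the tunnel idea above: an IPv6 packet is treated as opaque
# data and wrapped in an IPv4 header whose protocol field is 41.
import struct
import socket

IPPROTO_IPV6 = 41  # IPv4 protocol number meaning "payload is an IPv6 packet"

def tunnel_encapsulate(ipv6_packet: bytes, v4_src: str, v4_dst: str) -> bytes:
    """Wrap an IPv6 packet in an IPv4 header for transport across the IPv4 core."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(ipv6_packet),   # version/IHL, TOS, total length
        0, 0,                              # identification, flags/fragment
        64, IPPROTO_IPV6, 0,               # TTL, protocol=41, checksum (0 here)
        socket.inet_aton(v4_src), socket.inet_aton(v4_dst),
    )
    return header + ipv6_packet
```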
2. 6to4 tunnel
SLA Technology
This chapter describes the SLA theory and how to realize it.
Main contents:
SLA terms
Introduction to SLA
Introduction to SLA
SLA Terms
SLA: Service Level Agreements; it sends packets of a specified protocol
to detect and monitor network communication.
RTR: Response Time Reporter; because SLA calculates and outputs reports
according to packet transmission, it is also called RTR (Response Time
Reporter).
ICMPECHO: an RTR entity that sends ICMP PING packets to detect network
communication; the detection outputs the packet round-trip delay, packet
loss, and so on.
UDPECHO: an RTR entity that regularly sends UDP packets to detect the
communication of UDP packets in the network; the detection outputs the
round-trip delay and packet loss of the packets (these are data detection
packets, not connection packets).
RTR GROUP: an RTR group is a set of one or more RTR entities. A group
comprises single RTR entities; a group cannot itself become a member of
a group. One RTR entity can belong to multiple RTR groups, but it can
appear in a given group only once.
RTR SCHEDULE: schedules an RTR entity or RTR group to detect network
communication.
CODEC: It is used for the coding and decoding of the VoIP signals.
Introduction to SLA
There are many factors that can disturb the normal running of a network:
the complexity of the network environment, configuration mistakes by the
administrator, network device failures, and even irresistible factors.
Regularly detecting network communication and recording the detection
results, both while building the network and while running it, is
therefore important for solving problems when the network fails. SLA was
developed for this purpose as a network detection and monitoring tool.
Its basic theory is to use different kinds of RTR entities to represent
different kinds of network detection and to initiate schedules for those
entities to carry out the detection. Meanwhile, with its rich schedule
policies, SLA can track and monitor network communication in detail.
RTR Entity
"RTR entity" is a common concept, not tied to a specific entity type.
Currently, the RTR entity types of the system include the MACSLA entity,
used to detect L2 connectivity; the ICMPECHO, ICMP-PATH-ECHO,
ICMP-PATH-JITTER, and UDPECHO entities, used to detect network
communication; the JITTER entity, used to detect the transmission of VoIP
packets in the network; and the FLOW-STATISTICS entity, used to detect
interface traffic.
ICMPECHO Entity
The ICMPECHO entity is used to detect the basic communication of the
network. It sends ICMP PING packets to one destination address in the
network, so as to detect the transmission delay and packet loss of
packets from the source to the destination.
Common network devices all support PING, so the entity is effective for
detecting the basic communication of the network. With the rich schedule
policies and log recording function, the network administrator can learn
the current and historical network communication, while reducing the work
of typing PING commands by hand.
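As a hypothetical sketch of the figures such an entity reports, the helper below aggregates the round-trip times of one schedule's probes (with `None` standing for a timeout) into delay statistics and a packet-loss percentage. The field names in the returned dictionary are illustrative, not the device's actual output format.

```python
# Hypothetical aggregation of ICMPECHO probe results: given round-trip
# times in ms (None = timed out), compute delay figures and loss percentage.
def icmpecho_stats(rtts_ms):
    replies = [r for r in rtts_ms if r is not None]
    sent, received = len(rtts_ms), len(replies)
    loss_pct = 100.0 * (sent - received) / sent
    if not replies:
        return {"sent": sent, "loss_pct": loss_pct}
    return {
        "sent": sent,
        "loss_pct": loss_pct,
        "rtt_min": min(replies),
        "rtt_avg": sum(replies) / received,
        "rtt_max": max(replies),
    }
```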
ICMP-PATH-ECHO Entity
The ICMP-PATH-ECHO entity is used to detect the basic communication of
the network. It regularly sends ICMP PING packets to one destination
address in the network, so as to get the packet transmission delay and
packet loss from the detection end to the destination end, as well as the
delay and packet loss between the detection end and each intermediate
device on the path to the destination.
Common network devices all support PING, so the entity is effective for
detecting the basic communication of the network. With the rich schedule
policies and history recording function, the network administrator can
learn the network communication (for example, which network device on the
path has serious delay) and its history.
ICMP-PATH-JITTER Entity
The ICMP-PATH-JITTER entity is used to detect the basic communication of
the network. It regularly sends ICMP PING packets to one destination
address in the network, so as to get the packet transmission delay,
jitter, and packet loss from the detection end to the destination end, as
well as the delay, jitter, and packet loss between the detection end and
each intermediate device on the path to the destination.
Common network devices all support PING, so the entity is effective for
detecting the basic communication of the network. With the rich schedule
policies and history recording function, the network administrator can
learn the network communication (for example, which network device on the
path has serious delay) and its history.
JITTER Entity
Introduction to VoIP and the related communication detection standards
VoIP is short for Voice over IP. It converts voice or fax to data so that
it can share one IP network (the Internet) with data traffic for
transmission. Because the cost of transmitting voice and fax over the
Internet is low, the technology is widely applied. Compared with the
traditional telephone, transmitting voice on an IP network means
digitizing the analog voice with a voice codec, packetizing it, and then
transmitting it to the receiving end over the IP network using the
best-effort IP transmission mechanism. After collecting the packets, the
receiving end decodes them to recover the analog voice. From this process
we can see that the packet delay and packet loss caused by network
transmission quality, the cost of converting between analog voice and
data in the codec, the compression/decompression cost, echo cost,
processing delay, and so on all affect Internet VoIP transmission
quality. Transmitting voice on an IP network therefore involves many
factors that differ from the traditional telephone network and the
traditional data network, and these factors limit VoIP quality.
of the human. The corresponding relation between VoIP quality and MOS
provides the foundation for network configuration, standards, and
monitoring.
Note
The value range of ICPIF is 5-55. If the ICPIF value is smaller than or
equal to 5, it is called low impairment and the VoIP quality is best; if
the ICPIF value is 55 or more, it is called high impairment and the VoIP
quality is poorest. An ICPIF value lower than 20 is regarded as
acceptable. (Since 2001, ICPIF has no longer been recommended by ITU-T
and has been replaced by the E-MODEL, but communication quality is still
commonly measured according to ICPIF.)
ICPIF value    MOS
14 - 23        3
24 - 33        2
34 - 43        1
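The interpretation in the note above can be applied with a small helper. The numeric boundaries follow the text; the label for the middle range (between acceptable and high impairment) is an assumption of this sketch.

```python
# Classify an ICPIF value per the note above: <=5 is low impairment (best),
# <20 is still acceptable, >=55 is high impairment (poorest). The "degraded"
# label for the remaining range is an assumed name, not from the manual.
def classify_icpif(icpif: int) -> str:
    if icpif <= 5:
        return "low impairment (best)"
    if icpif < 20:
        return "acceptable"
    if icpif >= 55:
        return "high impairment (poorest)"
    return "degraded"
```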
Codec                     Default packet      Default packet interval   Default packet   Default sending
                          length              between packets           quantity         frequency
G.711 mu-Law (g711ulaw)   160 + 12 RTP bytes  20 ms                     1000             Once every 1 minute
G.711 A-Law (g711alaw)    160 + 12 RTP bytes  20 ms                     1000             Once every 1 minute
G.729A (g729a)            20 + 12 RTP bytes   20 ms                     1000             Once every 1 minute
The JITTER entity can simulate these three kinds of codec, or a
customized codec, to send UDP packets with the corresponding rate,
interval, and size, and it measures the round-trip time, uni-directional
packet loss, and uni-directional delay. Based on these statistics, it
calculates the ICPIF value and finally estimates the MOS value from the
ICPIF value.
Use the JITTER entity to test how the network transmits VoIP packets. Two
factors enter the ICPIF calculation: the uni-directional delay of the
packets and the packet loss. Ie is called the device impairment factor
and is related to the packet loss; Ie can be obtained from the percentage
of packet loss. The relation is as follows:
The expected factor is used to indicate the trade-off between user access
convenience and VoIP quality. For example, compare the countryside, where
the signal is difficult to receive, with the plain, where the signal is
good: the expected VoIP quality of a wireless telephone in the former is
sure to be lower than that of a cable phone in the latter. Currently, the
relation between common user access modes and the expected factor is as
follows:

Communication service type            Max. expected factor
General cable communication link      0
The measured MOS value is just one suggestion about the network's ability
to transmit VoIP packets; it may differ somewhat from an actually
measured MOS.
During the JITTER measuring process, UDP packets are used (because VoIP
packets are encapsulated in UDP packets) to simulate the transmission of
VoIP packets, and the ICPIF value and MOS value are calculated according
to the transmission status, so as to detect the quality of the network
for transmitting VoIP packets. The size of the sent UDP packets, the
number of packets, and the interval between them depend on the type of
codec being simulated. The user can also customize a codec to configure
these parameters.
To make the measurement more exact and to be compatible with Cisco,
configure the RTR Responder at the destination end of the measurement.
The Responder sets up the connection with the source end and responds to
the detection packets sent by the source end, making the measurement
result more exact. To use JITTER entity detection, the Responder must be
configured at the destination end.
The source end and the Responder end use an internal protocol, the Cisco
SAA control protocol, for connection setup and communication detection.
The protocol is encapsulated in UDP packets and belongs to the
application layer. The SAA control protocol is a private protocol of
Cisco; its main packet formats are the SAA connection request packet, the
SAA connection response packet, and the SAA packet.
When using JITTER entity detection, the SLA source end first builds an
SAA connection request packet according to the specified parameters and
sends it to destination monitoring port 1967. The SAA connection request
packet is as follows:
Note
Here, the version field indicates the version of the SAA control
protocol; currently it is 1. Id identifies the SAA connection request,
that is, one connection. The frame length indicates the length of the SAA
connection request packet: 52 bytes when the life time field is 2 bytes
and 56 bytes when the life time field is 6 bytes. The 4-byte reserved
area is all zeros. The 2-byte command type indicates the connection
property; 0004 is a JITTER detection connection. The 6-byte reserved area
is currently an unknown area and is usually 000100000000. Next come the
4-byte destination IP address and the 2-byte destination port number,
indicating the destination IP and port of the JITTER connection. The
2-byte or 6-byte life time field indicates the life time of the
connection from setup to disconnection, in ms; it equals the number of
packets sent in one run times the packet-sending interval, plus the
packet timeout. Last come the packet end flag field, usually 0001001c,
and the all-zero filling field.
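The fields described above can be assembled as follows. Note the hedges: the text gives the command type (0x0004), the 52-byte total with a 2-byte life-time field, the 0001001c end flag, and the all-zero padding, but it does NOT give the widths of the version, id, and frame-length fields; the 4-byte widths below are assumptions made only so the sketch produces a 52-byte frame.

```python
# Sketch of the SAA connection request described above (JITTER connection,
# 2-byte life-time variant). Version/id/length field widths are ASSUMED.
import struct
import socket

def saa_connect_request(conn_id: int, dst_ip: str, dst_port: int,
                        lifetime_ms: int) -> bytes:
    frame = struct.pack("!III", 1, conn_id, 52)   # version=1, id, frame length
    frame += b"\x00" * 4                           # 4-byte reserved area
    frame += struct.pack("!H", 0x0004)             # command type: JITTER
    frame += bytes.fromhex("000100000000")         # 6-byte reserved area
    frame += socket.inet_aton(dst_ip)              # destination IP
    frame += struct.pack("!H", dst_port)           # destination port
    frame += struct.pack("!H", lifetime_ms)        # 2-byte life time (ms)
    frame += bytes.fromhex("0001001c")             # packet end flag
    frame += b"\x00" * (52 - len(frame))           # all-zero filling
    return frame
```

For example, for 1000 packets at a 20 ms interval with a 5000 ms timeout, the life time would be 1000 * 20 + 5000 = 25000 ms.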
When the RESPONDER receives the request packet, it processes it and sends
back the SAA connection response packet. If setting up the connection
succeeds, the detection starts; otherwise the connection is cut off. The
format of the SAA connection response packet is as follows:
Note
Here, the version field indicates the version of the SAA control
protocol; currently it is 1. Id identifies the SAA connection request,
that is, one connection. The packet length indicates the length of the
SAA connection response packet, which is 8 bytes. The 2-byte response
code is 0x0000 for success and 0x0002 for failure. Last is the 2-byte
reserved area, which is all zeros.
After receiving the response packet from the RESPONDER end, the source
end processes it. If the response indicates failure, the connection is
cut off; if it indicates success, the source end starts to fill in SAA
packets and send them to the RESPONDER end for detection. After receiving
each packet, the RESPONDER end processes it, fills in the required
contents, and sends it back to the source end, completing the packet
detection. The format of the JITTER packet is as follows:
Note
For the JITTER entity, the results that need to be saved include the
packet round-trip delay, jitter, uni-directional delay (which requires
synchronizing the clocks of the source and destination ends), and packet
loss. The ICPIF and MOS values can be calculated from these parameters.
After setting up the connection, the source end sends UDP detection
packets to the destination port according to the options negotiated by
the SAA control protocol. Before sending a packet, it fills the sending
time (ST1) and the sending serial number (QS1) into the packet; the
destination end fills in the receiving time (RT1) and the receiving
serial number (QR1), and, before sending the packet back, fills in the
delay caused by its processing time (DT1). If the sending end receives
the packet back within the timeout, it records the receiving time (AT1).
ST2, QS2, RT2, QR2, DT2, and AT2 are recorded for the second packet in
the same way. Then:
RTT = (RT1 - ST1) + (AT1 - RT1) - DT1 = AT1 - ST1 - DT1
JITTER_SD = (RT2 - RT1) - (ST2 - ST1) = i2 - i1
JITTER_DS = (AT2 - AT1) - ((RT2 + DT2) - (RT1 + DT1)) = i3 - i2
Here, i1 is the interval between sending the first and second packets; i2
is the interval between the responder receiving the first and second
packets; i3 is the interval between receiving the response packets of the
first and second packets.
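The formulas above can be written out directly. This is a plain transcription of the arithmetic, with the timestamps passed in as numbers (any consistent time unit):

```python
# RTT and jitter from the timestamps of two consecutive probe packets:
# ST = send time, RT = responder receive time, DT = responder processing
# delay, AT = arrival time of the reply at the source.
def rtt(st1, rt1, dt1, at1):
    # RTT = (RT1-ST1) + (AT1-RT1) - DT1 = AT1 - ST1 - DT1
    return at1 - st1 - dt1

def jitter_sd(st1, st2, rt1, rt2):
    # i2 - i1: responder receive interval minus source send interval
    return (rt2 - rt1) - (st2 - st1)

def jitter_ds(rt1, rt2, dt1, dt2, at1, at2):
    # i3 - i2: reply-arrival interval minus responder departure interval
    return (at2 - at1) - ((rt2 + dt2) - (rt1 + dt1))
```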
Meanwhile, if the clocks of the source end and destination end are
synchronized, the uni-directional delays are:
Delay_SD = RT1 - ST1
Delay_DS = AT1 - RT1 - DT1
And then you can calculate ICPIF from the uni-directional delay and the
lost packets: Icpif = Idd + Ie - A. After calculating ICPIF, you can get
the MOS value from the converting relation of ICPIF and MOS, which gives
a standard for measuring how the network transmits VoIP packets.
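Putting the last step together: the ICPIF formula above combines the delay impairment Idd, the device (packet-loss) impairment Ie, and the expected factor A. The ICPIF-to-MOS step below uses only the partial mapping quoted earlier in this section (14-23 gives 3, 24-33 gives 2, 34-43 gives 1); values outside those ranges are deliberately left unmapped in this sketch.

```python
# ICPIF and MOS per the relations quoted in this section.
def icpif(idd: float, ie: float, a: float) -> float:
    # Icpif = Idd + Ie - A
    return idd + ie - a

def mos_from_icpif(value: float):
    # Partial ICPIF -> MOS mapping from the table earlier in this section.
    for low, high, mos in ((14, 23, 3), (24, 33, 2), (34, 43, 1)):
        if low <= value <= high:
            return mos
    return None  # outside the ranges quoted in the text
```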
UDPECHO Entity
The UDPECHO entity detects UDP packets transmitted in the IP network. The
destination address and port of the sent packets must be specified in the
entity. By scheduling the entity, you can monitor the transmission of UDP
packets in the IP network.
Through valid monitoring, the UDPECHO entity can record the round-trip
delay and packet loss of UDP packets in the IP network, and can even
record the monitored history information in logs, so that the network
administrator can learn the network communication and fix faults.
The connection request and response packets of the SAA UDPECHO entity are
the same as those of the SAA JITTER entity, but the detection packets of
the UDPECHO entity differ from those of the JITTER entity. The packet
format is as follows:
Usually the 2-byte packet ID is 00 01; it identifies the data frame
between the sender and the responder rather than a request or response
packet. The 2-byte DT field is 00 00 for the sender and 00 02 or 00 01
for the responder. Part1 and Part2 are optional, and their contents are
related to the rtr attributes data-pattern and packet size. The filling
format is: Part1 takes all even bits of data-pattern; if the value of an
even bit is smaller than or equal to f, a 0 is prepended; if the value of
the even bit is larger than f, ff is filled in. For Part2, first
n = data-pattern length / 2, and then the first to nth values (ASCII
codes) are taken from data-pattern. By default, the SAA packet length is
16 bytes; if the current filled length does not reach 16 bytes, the
remaining vacancy is filled with the ASCII code.
MAC SLA Entity
The MAC SLA entity detects the traffic of an Ethernet link. Currently,
MAC SLA is realized on top of the Delay Measure function of the CFM
protocol, so to configure and run a MAC SLA entity, you need to configure
the CFM domain, service instance, and MEP; the MAC SLA entity then runs
between the specified CFM domain, service instance, and MEP.
Currently, MAC SLA supports detecting four quality parameters of Ethernet
link traffic, including uni-directional delay, bi-directional delay,
jitter, and delay. When a quality parameter exceeds its threshold, the
corresponding log information is output.
The detection function of the MAC SLA entity is widely used in Ethernet
and can reflect the network quality.
RTR Group
An RTR group is a set of one or more RTR entities. One RTR member can
belong to multiple RTR groups; a group cannot itself become a member of a
group, and a group can contain a given member only once. The RTR group is
identified by the group ID; the group name is generated automatically by
the system.
The purpose of the RTR group is to schedule one RTR set: scheduling an
RTR group is equivalent to scheduling all the RTR entities currently in
the group, and the detection results are saved in each RTR entity's
history records.
RTR Schedule
Configuring an RTR entity or RTR group alone does not start any
detection; the detection is performed only after a schedule is initiated.
An RTR schedule is the policy for scheduling and detecting an RTR entity
or group.
An RTR schedule can take a single entity member or one RTR group as its
object, but cannot take a group and an entity at the same time. The RTR
schedule is identified by the schedule ID and is not tied to the RTR
entity type, but the scheduling interval must take into account the
attributes of the RTR entity, or of the members of the RTR group, being
scheduled.
The RTR schedule provides rich schedule policies. You can start
scheduling at once or after some time, or even set an absolute time for
starting the schedule. Besides, the schedule can expire after the set
number of runs or persist forever.
Request-data-size:32
Timeout:5000(ms)
Frequency:60(s)
TargetOnly:FALSE
Verify-data:FALSE
Alarm-type:LOG
Threshold-of-rtt:6 (direction be)
Threshold-of-pktLoss:200000000 (direction be)
Threshold-of-jitter:5 (direction be)
Number-of-history-kept:100
Periods:3
In-scheduling:FALSE
Status:DEFAULT
--------------------------------------------------------------
ID:4 name:Jitter4 Created:TRUE
****************type:JITTER****************
CreatedTime:THU JAN 01 05:15:53 2009
LatestModifiedTime:THU JAN 01 05:52:41 2009
Times-of-schedule:0
Entry-state:Pend
TargetIp:1.1.1.2 targetPort:3434
Codec:G.729A Packet-size:32 Packet-number:1000
Packet-transmit-interval:20(ms)
frequency:60(s)
TimeOut:5000(ms)
Alarm-type:LOG-AND-TRAP
Threshold-of-dsDelay:8(direction be)
Threshold-of-dsJitter:8(direction be)
Threshold-of-dsPktLoss:3(direction be)
Threshold-of-sdDelay:8(direction be)
Threshold-of-sdJitter:8(direction be)
Threshold-of-sdPktLoss:2(direction be)
Threshold-of-rtt:6(direction be)
Threshold-of-mos:10000000 (direction be)
Threshold-of-icpif: 100000000 (direction se)
Number-of-history-kept:120
Periods:1
Status:DEFAULT
--------------------------------------------------------------
Rtr id 3 is an ICMP-PATH-JITTER entity. The entity was created at THU JAN
01 05:15:50 2009 and last modified at THU JAN 01 05:48:03 2009; it has
been scheduled 0 times, that is, scheduling has not started. During each
schedule, only 10 ICMP packets are sent to the destination end and the
intermediate devices; the valid payload is 32 bytes; the timeout is
5000 ms; the schedule frequency is 60 s; only the network of the
destination end and between the source and the intermediate devices is
detected; the data is not checked. For the alarm mode: none means no
alarm, log means the shell prompt, log-and-trap means the shell prompt
plus sending trap information to inform the NMS, and trap means only
sending a trap to inform the NMS. The threshold of the round-trip delay
is 6 ms, and an alarm is raised by alarm-type when the actually detected
round-trip delay is no less than the threshold; the threshold of the
packet loss is 200000000; be means alarming when no less than the
threshold, se means alarming when smaller than or equal to the threshold,
alarming by alarm-type; the jitter threshold is 5 ms. 100 history records
are kept, and new records cover the old ones when 100 is exceeded; the
history record is saved once every three detections. The entity is not in
the debug state; the link status is DEFAULT; if the destination is
reachable, the status is REACHABLE.
--------------------------------------------------------------
3.3.3.2 Rtt:1 Jitter:0 Pkt loss:0
1.1.1.2 Rtt:2 Jitter:0 Pkt loss:0
History of record from source to dest:
CurHistorySize:1 MaxHistorysize:100
THU JAN 01 02:30:03 1970
Rtt:2 Jitter:0 Pkt loss:0
--------------------------------------------------------------
The network environment is: source - router 1 - destination. The
round-trip delay from the source to router 1 (3.3.3.2) is 1 ms, the
jitter is 0, and there is no packet loss; the round-trip delay from the
source to the destination 1.1.1.2 is 2 ms, the jitter is 0, and there is
no packet loss. A record is saved according to the schedule interval of
60 s.
Note
1. If another history record arrives when the number of history
   records has already reached 100, the new record covers the oldest
   record.
2. The NTP protocol must be configured to synchronize the clocks.
After configuring the RTR entity 5, view the history records of rtr entity 5:
After configuring the RTR entity 6, view the history records of rtr entity 6:
Enable the debug during the entity detection and you can see the specific
debug information.
VRRP Technology
This chapter describes the VRRP protocol theory and how to realize it.
Main contents:
Master: one VRRP state; the active device is in this state and is
responsible for forwarding IP packets.
Backup: one VRRP state; the standby device is in this state and ensures a
timely switchover when the active device fails.
Here, the gateway can be any network device with the IP forwarding
function, such as a switch or router. To make it easy for the reader to
understand, the following uses "router" to denote the gateway.
VRRP solves this problem. It is designed for LANs with multicast or
broadcast capability (such as Ethernet). VRRP makes a group of routers on
the LAN (one MASTER and several BACKUPs) form one virtual router, called
a backup group.
The virtual router (that is backup group) has its own IP address. The
router in the backup group has its own IP address. The hosts in the LAN
just need to know the IP address of the virtual router, but do not need to
know the IP address of the master router or the IP address of the backup
router. They set their default route as the IP address of the virtual router.
Therefore, the hosts in the network communicate with other networks via
the virtual router. When the master router in the backup group fails,
another router in the backup group becomes the new master and continues
to provide routing service for the hosts in the network, realizing
uninterrupted communication with the outside network.
VRRP packets are carried directly in IP packets; the protocol number is 112 (0x70).
VRID: the Virtual Router Identifier (VRID) configured on the interface.
Priority: the priority configured on the interface. The priority of the
router that owns the virtual IP address (the router whose interface IP is
the VIP) is 255; the priorities of the other routers are 1-254, and the
default value is 100.
VRRP Workflow
Simply speaking, VRRP is a fault-tolerance protocol. It ensures that when
the next-hop router of a host fails, another router replaces it in time,
so as to keep the continuity and reliability of the communication. To
make VRRP work, configure the virtual router number and virtual IP
address on the routers. In this way, one virtual router is added to the
network, and the hosts on the network can communicate through the virtual
router without knowing any information about the physical routers. One
virtual router comprises one master router and several backup routers.
The master router performs the real forwarding function. When the master
router fails, one backup router becomes the new master router and takes
over its work.
The VRRP protocol defined in RFC 2338 was made on the basis of Cisco's
private HSRP protocol, but VRRP simplifies the mechanism put forward by
HSRP, reducing the additional load that the redundancy function brings to
the network. For example, HSRP defines six states for the virtual router,
while VRRP has only three, which reduces the complexity of the protocol.
In the stable state, HSRP has two states that can send packets, while in
VRRP only the router in the Master state forwards packets, and there is
only one kind of packet, which reduces the occupied bandwidth. HSRP
packets are carried over UDP, while VRRP packets are encapsulated
directly in IP packets. Meanwhile, VRRP supports using an actual
interface IP address as the virtual IP address.
VRRP routers form different virtual routers according to the VRID. The
routers that form one virtual router are divided into the master router
and backup routers, which are determined by the following rules:
1. Select the master router according to priority. The router with the
   highest priority becomes the master router, and its state is Master.
   If the routers have the same priority, compare the IP addresses of
   their interfaces; the one with the larger IP address becomes the
   master router.
2. The other routers serve as backup routers and monitor the status of
   the master router in real time. While the master router works
   normally, it periodically sends a VRRP multicast packet (224.0.0.18)
   informing the backup routers in the group that it is in the normal
   state. If the backup routers in the group do not receive packets from
   the master router for a long time, they turn to Master. When there are
   multiple backup routers in the group, there may briefly be multiple
   master routers. Each of them then compares the priority in the
   received VRRP packets with its local priority: if the local priority
   is smaller than the priority in the VRRP packet, its status turns to
   Backup; otherwise, it keeps its status. In this way, the router with
   the highest priority becomes the new master router, which completes
   the backup function of VRRP.
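The election rules above can be sketched as a simple comparator: highest priority wins, and on a tie the larger interface IP address wins. Routers are given as (priority, ip) pairs in this illustration.

```python
# Sketch of VRRP master election: highest priority wins; on a priority
# tie, the router with the numerically larger interface IP address wins.
import ipaddress

def elect_master(routers):
    """Return the (priority, ip) pair of the router that becomes Master."""
    return max(routers,
               key=lambda r: (r[0], int(ipaddress.IPv4Address(r[1]))))
```

For example, with two routers at priority 100 on 10.0.0.1 and 10.0.0.2, the one at 10.0.0.2 becomes Master.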
The virtual router has three states: Initialize, Master, and Backup.
Master status:
Must answer ARP requests for the virtual IP address; the ARP response
carries the MAC address corresponding to the virtual router IP
address;
Must receive packets whose destination is the related IP address (if
it is the IP address owner);
When turning to Master from another state, send gratuitous ARP packets;
BACKUP status:
Cannot answer ARP requests for the virtual router IP address;
Cannot send protocol packets, but must receive the (multicast)
protocol packets;
INITIALIZE status:
VRRP Features
VRRP has the following features:
Load balance: a function with high VRRP added value. Multiple virtual
routers are used to back up multiple gateways, and the terminals are
configured with different virtual router IP addresses to realize load
balancing.
It shows that the switch sends a VRRP packet from interface VLAN1; the
VRID is 1 and the priority is 100.
It shows that the switch receives a VRRP packet on interface VLAN1; the
contents of the packet are displayed in detail.
2. Event debug
VBRP Technology
This chapter describes the VBRP protocol theory and how to realize it.
Main contents:
As shown in the above figure, two devices, each with its own unique IP
address, are in one network. Normally, the user must select one of the
two devices as the default gateway, and the failure rate of the user
network then depends on the failure rate of that device. However, if the
two devices are configured with the VBRP protocol, they generate one
logical device with a separate virtual IP address, which is used as the
default gateway of the hosts. At any given time, one device is the active
device and the other is the standby device. The active device forwards
and processes the user's data flows. When the active device fails, the
standby device takes over all of its work and becomes the new active
device, reducing the failure rate of the network to the concurrent
failure rate of the two devices.
The VBRP packet is one UDP packet. Both the source and destination ports
are 1985.
Hello message: indicates that the router is running and can become the
active or standby device;
Coup message: sent when a device wishes to become the active device;
Resign message: sent when a device no longer wishes to be the active
device;
Hellotime: the Hello interval of the sender of the Hello packet, in
seconds. The field is valid in Hello packets; the router that sends a
Hello packet must fill its own Hellotime into the packet. By default, the
Hellotime is 3 s.
Holdtime: the validity period of the Hello packet, in seconds. The field
is valid in Hello packets; the receiver of a Hello packet regards the
Holdtime in the packet as the packet's validity period. Holdtime should
be at least 3 times the Hellotime.
Priority: The priority field; it is used when selecting the active and standby
device. The one with larger value is preferential. If the devices have the
same priority, the one with larger address is preferential.
VBRP Workflow
To make VBRP work, first create one virtual IP address; in this way, one
virtual device is added to the network, and the hosts on the network can
communicate with the virtual device without knowing any information about
the physical devices. One VBRP device is specified as the active device,
and another physical device serves as the standby in case the active
device fails. The active device responds not only to its own IP address
but also to the virtual IP address.
When a host sends a packet to a network other than the local network, the
host configuration indicates that the next hop of the packet is the
default gateway. The IP address of the default gateway is configured on
the host, but to send Ethernet frames to that device, the host needs to
know its MAC address, so it sends an ARP request to the network to query
the MAC address of the default gateway. No actual host on the network
owns the MAC address of the virtual device, so the active device responds
to the ARP request. The active device then receives all traffic sent to
the virtual IP address and handles it; from the host's point of view, the
traffic is simply routed through the active device.
Devices configured with VBRP use UDP hello packets to advertise their
existence. The advertisements are used to detect device failure and to
negotiate parameters such as the virtual IP address and the
authentication password. The advertisements are also used to elect the
devices: at any time, there can be only one active device and one standby
device on the network. All other devices configured in the standby group
are in the Listen state until the next election, which happens when the
active or standby device becomes unavailable.
VBRP defines three types of packets. The first is the Hello packet, sent
by the active device, the standby device, and routers in the SPEAK state
to inform group members of their existence. The Hello packet also
contains the configuration parameters, such as the IP address and timer
values; a device on which these parameters are not configured can learn
them from the Hello packets.
The second is the Resign packet. When the active device exits from the
VBRP group, for example because the configuration changes or the device
is disabled, it sends the Resign packet.
The third is the Coup packet. This packet is sent when the preempt
configuration command causes one device to replace the active device. If
the device is the standby device with the highest priority, it becomes
the active device.
The VBRP protocol has six states: INITIAL, LEARN, LISTEN, SPEAK, STANDBY,
and ACTIVE.
1. INITIAL state
All devices start from the INITIAL state, which indicates that VBRP is
not running. A device enters this state when its interface is in the DOWN
state or goes down.
2. LEARN state
In the LEARN state, the device waits for Hello packets from the ACTIVE
device in order to learn the virtual IP address. A device enters this
state when it is configured with a virtual device group but not with a
VIP.
3. LISTEN state
In the LISTEN state, the device knows its VIP, but it is neither the
ACTIVE device nor the STANDBY device. It only accepts protocol packets
from the ACTIVE and STANDBY devices. It changes its status to take part
in the election of the ACTIVE or STANDBY device when protocol packets are
not received from those devices within a certain time (all devices except
the ACTIVE and STANDBY devices are in the LISTEN state).
4. SPEAK state
In the SPEAK state, the device sends periodical Hello packets and takes
part in the election of the ACTIVE/STANDBY device. A device cannot enter
the SPEAK state before obtaining the VIP.
5. STANDBY state
In the STANDBY state, the device becomes the candidate device of the
next ACTIVE device and sends the periodical hello packets. In one virtual
device group, there can be only one standby device.
6. ACTIVE state
In the ACTIVE state, the device is responsible for forwarding the packets
that are sent to the virtual MAC address of the virtual device group and
responding to the ARP request whose destination IP is VIP. The active
device sends periodical hello packets. In one virtual device group, there
can be only one active device.
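The six states and the typical transitions described above can be summarized in a minimal sketch (the event names are assumptions for illustration; the actual trigger events are internal to the device):

```python
# Minimal sketch of the six VBRP states and a few of the transitions
# described above. Event names are illustrative assumptions.
from enum import Enum

class VbrpState(Enum):
    INITIAL = 0   # VBRP not running (interface down)
    LEARN = 1     # waiting to learn the VIP from the Active device's hellos
    LISTEN = 2    # knows the VIP, passively listening
    SPEAK = 3     # sending hellos, taking part in the election
    STANDBY = 4   # candidate for the next Active device
    ACTIVE = 5    # handling traffic sent to the virtual MAC/IP

# (state, event) -> next state, following the section above
TRANSITIONS = {
    (VbrpState.INITIAL, "interface_up_no_vip"): VbrpState.LEARN,
    (VbrpState.LEARN, "vip_learned"): VbrpState.LISTEN,
    (VbrpState.LISTEN, "active_standby_timers_expired"): VbrpState.SPEAK,
    (VbrpState.SPEAK, "standby_timer_expired"): VbrpState.STANDBY,
    (VbrpState.STANDBY, "active_timer_expired"): VbrpState.ACTIVE,
    (VbrpState.ACTIVE, "interface_down"): VbrpState.INITIAL,
}

def step(state, event):
    # Unknown events leave the state unchanged.
    return TRANSITIONS.get((state, event), state)
```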
VBRP Functions
1. Gateway backup: Multiple devices share one IP address, preventing
the failure of a single gateway and minimizing the network black
hole. This is the main function of VBRP.
4. Remote login: When the IP address of the virtual device is the same
as the IP address of one interface, you can log in to the device in
the ACTIVE state remotely.
00:28:18: VBRP: vlan1 Grp 0 Hello out 128.255.17.54 Active Pri 100 vIP
128.255.17.1
The above information shows that the Ethernet port vlan1 sends the VBRP
Hello packet. The VBRP group number is 0; the main address of the
Ethernet port is 128.255.17.54 and the current status is Active; the
priority is 100 and the virtual IP address is 128.255.17.1.
00:38:44: VBRP: vlan1 Grp 0 Hello in 128.255.16.3 Standby pri 100 vIP
128.255.17.1
The above information shows that Ethernet port vlan1 receives the VBRP
Hello packet. The VBRP group number is 0; the source address of the
sender is 128.255.16.3; the current status is Standby; the priority is 100;
the virtual IP address is 128.255.17.1.
Only the VBRP devices in the Speak, Standby, and Active state can send
Hello packets.
00:28:18: VBRP: vlan1 Grp 0 Coup out 128.255.17.54 Active Pri 100 vIP
128.255.17.1
The above information shows that Ethernet port vlan1 sends the VBRP
Coup packets. The VBRP group number is 0; the main address of the
Ethernet port is 128.255.17.54; the current status is Active; the priority is
100; the virtual IP address is 128.255.17.1.
02:43:54: VBRP: vlan1 Grp 0 Coup in 128.255.16.3 Active pri 110 vIP
128.255.17.1
The above information shows that Ethernet port vlan1 receives the VBRP
Coup packets. The VBRP group number is 0; the source address of the
sender is 128.255.16.3; the current status is Active; the priority is 110;
the virtual IP address is 128.255.17.1.
2. Event debug
r2(config-if-vlan1)#shutdown
03:08:32: %LINEPROTO-5-UPDOWN: Line protocol on Interface vlan1,
changed state to down
03:08:32: VBRP: vlan1 API Software interface going down
03:08:32: VBRP: vlan1 Grp 0 Active: b/VBRP disabled
03:08:32: VBRP: vlan1 API MAC address update
03:08:32: VBRP: vlan1 Grp 0 Active router is unknown, was local
03:08:32: VBRP: vlan1 Grp 0 Active -> Init
The vlan1 port becomes down, so VBRP turns from Active to Init.
The following debug information shows the converting process from Active
to Standby.
03:11:53: VBRP: vlan1 Grp 0 Active: g/Hello rcvd from higher pri Active
router (110/128.255.16.3)
03:11:53: VBRP: vlan1 API MAC address update
03:11:53: VBRP: vlan1 Grp 0 Active router is 128.255.16.3, was local
03:11:53: VBRP: vlan1 Grp 0 Active -> Speak
The Active device receives a Hello packet with a higher priority from
another device (128.255.16.3). The router is configured with preempt, so
the device enters the Speak state.
03:11:56: VBRP: vlan1 Grp 0 Speak: g/Hello rcvd from higher pri Active
router (110/128.255.16.3)
03:11:59: VBRP: vlan1 Grp 0 Speak: g/Hello rcvd from higher pri Active
router (110/128.255.16.3)
03:12:02: VBRP: vlan1 Grp 0 Speak: g/Hello rcvd from higher pri Active
router (110/128.255.16.3)
03:12:03: VBRP: vlan1 Grp 0 Speak: d/Standby timer expired (unknown)
03:12:03: VBRP: vlan1 Grp 0 Standby router is local, was unknown
03:12:03: VBRP: vlan1 Grp 0 Speak -> Standby
No Hello packet is received from any other Standby device, so the device
turns from Speak to Standby.
The priority of the Standby device is adjusted to 200 and it turns to Active.
IPFIX Technology
Overview
This chapter describes the working principle of IPFIX.
Main contents:
Terms
Terms
IPFIX-IP Flow Information Export
IPFIX packets-The packets sent from the IPFIX module to the IPFIX
workstation; they carry the IP flow statistics monitored by IPFIX on the
network devices. IPFIX packets are UDP packets and are assembled in the
NetFlow v9 format.
IPFIX flow record-a type of IPFIX packet; it records the statistics of
an IP flow.
IPFIX option record-a type of IPFIX packet; it records the content of
statistical options that are not related to a single IP flow in IPFIX.
IPFIX Workflow
1. Determine the ports to monitor traffic. The ports are called observation
points.
3. Configure the address of the IPFIX server and the UDP destination
port number. These are used as the destination address and UDP
destination port of the IPFIX packets.
When a flow times out, the statistical information of the flow is
delivered to the IPFIX server.
IPFIX Restrictions
The restrictions of IPFIX on a switch are as follows:
2. For the statistics of the INGRESS flow, only unicast flows are
counted. For a unicast flow, the chip forwards the packets through a
single port instead of multiple ports (that is, the flow cannot be
flooded). The statistics of the egress flow are not restricted.
Packet Header
System Uptime: the running time of the device, in milliseconds.
FlowSet
FlowSet includes: Template FlowSet and Data FlowSet. One IPFIX packet
can contain multiple FlowSets.
Template FlowSet
The template can be classified into flow record template and option record
template. The flow record template defines how to explain the flow record;
the option record template defines how to explain the option records.
Template ID: used to match data with a template. It starts from 256.
Field Length: the number of bytes of the field defined by the field type.
Template ID: for the matching of data and template; it is greater than 255.
Scope Field Type: the type of the scope field referenced by the relevant
data of the IPFIX process. 0x1: system; 0x2: interface; 0x3: line card;
0x4: IPFIX cache; 0x5: template.
Option Field Type: the type of the option data; the values used are the
same as the field type values described in the flow template.
The types of the fields used in the IPFIX template are as follows:
Data FlowSet
FlowSet ID: the FlowSet ID corresponds to the template ID; IPFIX
interprets the data information according to this correspondence.
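The template/data matching described above can be sketched as follows (a NetFlow v9-style illustration; the function and field names are assumptions, not the product's implementation):

```python
# Sketch of how a collector matches a Data FlowSet to a previously
# received template by ID. Names are illustrative.
templates = {}  # template_id -> list of (field_type, field_length)

def register_template(template_id, fields):
    # Template IDs for data records start from 256 (greater than 255).
    assert template_id > 255, "data template IDs must be greater than 255"
    templates[template_id] = fields

def decode_record(flowset_id, raw):
    """Split one data record into per-field byte strings using the
    template whose template ID equals the FlowSet ID."""
    fields = templates[flowset_id]
    out, offset = {}, 0
    for field_type, length in fields:
        out[field_type] = raw[offset:offset + length]
        offset += length
    return out

register_template(256, [("srcaddr", 4), ("dstaddr", 4), ("octets", 4)])
rec = decode_record(256, bytes([10, 0, 0, 1, 10, 0, 0, 2, 0, 0, 1, 0]))
```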
By default, packets can be forwarded between any two ports in one VLAN of
the switch. To prevent specified ports in one VLAN from communicating,
you can configure isolated ports in the specified port mode, so that a
port configured with port isolation cannot communicate with its specified
isolated ports.
The port isolation feature is independent of the port VLAN. Currently,
the switch supports configuring isolated ports in common port mode and
aggregation port mode; the configured isolated port can be a common port
or an aggregation port. The port isolation function only performs
unidirectional packet dropping. Suppose the isolated ports configured on
port A are ports B, C, and D. If the destination port of a packet
entering from port A is B, C, or D, the packet is dropped directly. But
if the destination port of a packet entering from port B, C, or D is A,
the packet is forwarded normally.
Illustration
Command                                                  Description
switch(config)#port 0/0/1                                Enter the port configuration mode
switch(config-port-0/0/1)#isolate-port port0/0/2-0/0/3   Configure port0/0/1 to be isolated from port0/0/2 and port0/0/3
switch(config-port-0/0/1)#exit                           Exit the port configuration mode
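The unidirectional dropping behavior can be modeled in a few lines (an illustrative sketch, not device code; the port names follow the example above):

```python
# Sketch of the unidirectional isolation rule: frames entering a port
# destined for one of its isolated ports are dropped, while the reverse
# direction is forwarded normally. Illustrative model only.
isolated = {"0/0/1": {"0/0/2", "0/0/3"}}  # ingress port -> isolated set

def forward_allowed(ingress, egress):
    return egress not in isolated.get(ingress, set())

print(forward_allowed("0/0/1", "0/0/2"))  # False: dropped
print(forward_allowed("0/0/2", "0/0/1"))  # True: forwarded normally
```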
Split horizon: learn a route from one interface, but do not advertise
the route back to that interface. It is one measure the IPv6 RIPng
protocol uses to prevent routing loops.
Poisoned reverse: learn a route from one interface and then advertise
the route back to that interface with an unreachable metric (16). It is
also a measure the IPv6 RIPng protocol uses to prevent routing loops, and
it is more active than split horizon.
The advantage of the IPv6 RIPng protocol is that the protocol and its
configuration are simple, but the route information that IPv6 RIPng needs
to advertise is proportional to the number of routes in the routing
table. When there are many routes, many network resources are consumed.
Meanwhile, the IPv6 RIPng protocol defines that a route path can pass at
most 15 route devices. Therefore, the IPv6 RIPng protocol is only used in
simple small and medium-sized networks.
The IPv6 RIPng protocol can be used in most campus networks and regional
networks with a simple, stable structure. Complicated environments
generally do not use the IPv6 RIPng protocol.
(Figure: protocol stack showing IPv6 RIPng encapsulated in UDPv6.)
As shown in the above figure, the IPv6 RIPng protocol is a routing
protocol based on UDP. The protocol packets sent by IPv6 RIPng are
encapsulated in UDPv6 packets. By default, the IPv6 RIPng protocol uses
UDP port 521 to send and receive the protocol packets of remote route
devices, updates the local routing table according to the route
information in the received protocol packets, and then adds 1 to the
metric and advertises the routes to the other adjacent route devices. In
this way, all route devices in the routing domain can learn all routes.
IPv6 RIPng protocol sends the protocol packets in three modes, as follows:
command (1 Byte) | version (1 Byte) | must be zero (2 Bytes) | route table entry (20 Bytes) | route table entry (20 Bytes) ...
The IPv6 RIPng header has two fields: the Command field identifies
whether the packet is a request packet (value 1) or a response packet
(value 2); the Version field is always 1.
Route table entry can have two types, which are described as follows:
Table 34-2 Route table entry types of the IPv6 RIPng protocol
Normal route table entry:
IPv6 prefix (16 Bytes) | Route Tag (2 Bytes) | Prefix Len (1 Byte) | Metric (1 Byte)
Next-hop route table entry:
IPv6 next hop address (16 Bytes) | Must be zero (2 Bytes) | Must be zero (1 Byte) | 0xFF (1 Byte)
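Assuming the field sizes listed above, one RIPng header plus a route table entry could be packed as follows (an illustrative sketch; `pack_header` and `pack_route_entry` are hypothetical helper names, not part of the product):

```python
# Sketch of packing the RIPng header and one route table entry,
# big-endian, per the field sizes in the table above. Illustrative only.
import ipaddress
import struct

def pack_header(command, version=1):
    # command (1 byte), version (1 byte), must-be-zero (2 bytes)
    return struct.pack("!BBH", command, version, 0)

def pack_route_entry(prefix, route_tag, prefix_len, metric):
    # IPv6 prefix (16), route tag (2), prefix length (1), metric (1)
    addr = ipaddress.IPv6Address(prefix).packed
    return addr + struct.pack("!HBB", route_tag, prefix_len, metric)

# A response packet (command 2) carrying one route entry
packet = pack_header(command=2) + pack_route_entry("11::", 0, 24, 1)
print(len(packet))  # 4-byte header + 20-byte entry = 24
```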
(Figure: basic workflow of the IPv6 RIPng protocol. On protocol start, a
request packet is sent; received packets are classified as request or
response packets; when routes change, a triggered update advertises the
changed routing information; otherwise, all routing information is
advertised to neighbors every 30 seconds.)
The basic work flow of the IPv6 RIPng protocol is as shown in the above
figure, including two parts. One is the flow of starting the protocol and the
other is the flow of processing the received packet.
After receiving the response to its request packet, the device updates
the routes in the route database according to the route information in
the packet and then advertises the changed routes to the IPv6 RIPng
instances of the other adjacent route devices (triggered updates).
Meanwhile, the device starts the Update Timer and uses route response
packets to advertise all route information to the IPv6 RIPng instances of
all adjacent route devices, so as to keep the route databases of the
route devices synchronized and to refresh the advertised routes. In this
way, previously advertised routes do not time out and become invalid on
the other route devices.
Route Database
The route database records all route information of the IPv6 RIPng
protocol. Each route entry comprises the following elements:
5. Source IPv6 address: The source IPv6 address of the response packet
that learns the route;
6. Route tag: defined by the user and used to tag one type of route,
for example, to mark routes obtained by redistributing BGP routes.
5. The route learned from IPv6 RIPng of the adjacent route device;
Therefore, for a redistributed route, when the sending interface is the
next-hop interface of the route, the route carries the next-hop address
of the route.
The following instance describes the use of the next-hop address
information of the route information in IPv6 RIPng.
As shown in the above figure, IPv6 RIPng runs on Switch-A; IPv6 RIPng and
IPv6 OSPFv3 run on Switch-B; IPv6 OSPFv3 runs on Switch-C. IPv6 RIPng on
Switch-B redistributes the IPv6 OSPFv3 route 11::/24 learned by the local
device so that Switch-A can learn the route to the subnet 11::/24. When
the route is learned on Switch-A, the next hop is Switch-B, that is,
fe80::0201:7aff:fe4f:73f8 by default. As a result, the packets forwarded
from Switch-A to the destination subnet 11::/24 all first pass Switch-B
and then reach Switch-C.
Route Update
When the IPv6 RIPng of a route device learns a route from an adjacent
route device, it adds 1 to the metric before processing the route, so as
to accumulate the hop count. When the metric is smaller than 16, the
route is reachable; when the metric is larger than or equal to 16, the
route is unreachable.
If the route complies with the following conditions, use the route to update
the routes in the route database:
1. The route does not exist in the route database and the metric of the
route is smaller than 16 hops;
2. The route exists in the database and the source IPv6 address is
consistent with the source IPv6 address of the learned route;
3. The route exists in the database, but its metric is larger than or
equal to the metric of the learned route.
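The three update conditions can be sketched as a single decision function (illustrative only; the route database layout is an assumption, and the metric is assumed to have already been incremented by 1 as described above):

```python
# Sketch of the three route-update conditions listed above.
# The database is modeled as a dict keyed by prefix. Illustrative only.
INFINITY = 16

def should_update(db, prefix, new_metric, source):
    entry = db.get(prefix)
    if entry is None:                      # condition 1: new route
        return new_metric < INFINITY
    if entry["source"] == source:          # condition 2: same source
        return True
    return entry["metric"] >= new_metric   # condition 3: not worse

db = {"11::/24": {"metric": 5, "source": "fe80::1"}}
print(should_update(db, "11::/24", 3, "fe80::2"))   # True: better metric
print(should_update(db, "22::/24", 16, "fe80::2"))  # False: unreachable new route
```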
(Figure: IPv6 RIPng route timer state machine. When the Invalid Timer of
a route expires, the route is marked invalid and the Holddown and Flush
Timers run on it; when the Flush Timer expires, the route is deleted from
the database.)
The IPv6 RIPng protocol has four timers: the Update Timer, the Invalid
Timer, the Holddown Timer, and the Flush Timer.
Counting to Infinity
The IPv6 RIPng protocol permits a maximum metric of 15. A destination
whose metric is larger than 15 is regarded as unreachable. This limits
the network size and prevents unlimited transmission of route
information. As route information is transmitted from one route device to
another, the metric is incremented by 1 at each hop. When the metric
exceeds 15, the route is deleted from the routing table.
Split Horizon
A route learned from one interface is not advertised back to the same
interface. If it were, a routing loop could result.
The Split Horizon rule of the IPv6 RIPng protocol is as follows: If IPv6
RIPng of the route device learns the route information A from one
interface, the response packet sent to the interface cannot contain the
route information A.
Split Horizon has one special case: when an interface receives a request
packet for part of the route information, the response to that packet
does not apply Split Horizon.
Poisoned Reverse
The purpose of the poisoned Reverse is the same as that of Split Horizon,
but there is a little difference as follows.
The Poisoned Reverse rule of the IPv6 RIPng protocol is as follows: if
the IPv6 RIPng of a route device learns route information A from one
interface, the route response packet sent to that interface contains
route information A, but the metric is set to 16 (that is, unreachable).
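The difference between the two rules shows up when building the response packet for an interface; a minimal sketch (the interface and route names are illustrative):

```python
# Sketch contrasting Split Horizon with Poisoned Reverse when building
# the response sent out of one interface. Illustrative model only.
INFINITY = 16

# route -> interface the route was learned from
learned_from = {"11::/24": "vlan1", "22::/24": "vlan2"}
metrics = {"11::/24": 2, "22::/24": 3}

def build_response(out_iface, policy):
    response = {}
    for route, metric in metrics.items():
        if learned_from[route] == out_iface:
            if policy == "split-horizon":
                continue              # omit the route entirely
            if policy == "poisoned-reverse":
                metric = INFINITY     # advertise it as unreachable
        response[route] = metric
    return response

print(build_response("vlan1", "split-horizon"))    # 11::/24 omitted
print(build_response("vlan1", "poisoned-reverse")) # 11::/24 with metric 16
```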
Holddown Timer
The Holddown Timer prevents a route entry from being updated by route
response packets for a period of time after the route becomes
unreachable. It ensures that an unreachable route is not updated by
response packets before every route device has received the
route-unreachable information, because the route entry in a received
response packet may be one that was advertised previously.
Triggered Updates
Triggered updates use route response packets to advertise route change
information to the adjacent route devices immediately when a route
changes.
Poisoned Reverse and Split Horizon break the route loop formed by any two
route devices, but a route loop formed by three or more route devices can
still persist until the metric of the route is transmitted and
accumulated to unreachable (16). Triggered Updates speed up route
convergence and thus shorten the time needed to break such a route loop.
Area: a collection of route devices that share the same topology
database. OSPFv3 divides one AS into multiple areas; the topology of one
area is invisible to another area, which reduces the amount of routing
information in the AS. The area is used to contain link state updates and
enables the administrator to create a hierarchical network.
LSA (link state advertisement): the data unit describing the local route
device or network state. For a route device, it contains the interface
states and adjacency states of the route device. Each link state
advertisement is flooded throughout the area. The route devices use the
collected link state advertisements to form the link state database.
Stub area: an area that has only one interface connected with the
outside. Type 5 LSAs cannot be flooded into the area.
Backbone area: composed of all area border route devices and the links
among them.
OSPFv3 detects the changes of IPv6 links and networks in the AS and
advertises the link state information. After converging for some time,
new routes are formed; the convergence time is short and little link
state information needs to be exchanged. In the OSPFv3 protocol, each
route device maintains a network topology database describing the AS, and
every route device has the same database. Each record of the database is
the local state of a specific route device. The route devices distribute
their local states in the AS through flooding.
All route devices run the same algorithm in parallel. Each route device
uses the link state database to generate a shortest path tree with itself as
the root. The shortest path tree provides the route to each destination in
the AS. The external routing information serves as leaves in the tree.
OSPFv3 advertises IPv6 information, including the IPv6 prefix and the
prefix length. Each calculated IPv6 route includes a prefix and the
prefix length. IPv6 datagrams are forwarded along the best route.
SW1, SW2, SW3, and SW4 comprise area 1; SW3 is the area border router
(ABR).
SW6, SW7, and SW8 comprise area 2; SW6 and SW8 are the area border
routers (ABRs).
SW8, SW9, and SW10 comprise area 3; SW8 is the area border router (ABR).
Process of OSPFv3
The basic idea of OSPFv3 is as follows: in the AS, each route device
running OSPFv3 collects the IPv6 link states and broadcasts them
throughout the system by flooding, so that the entire system maintains a
synchronized link state database. From the database, each route device
calculates a shortest path tree with itself as the root and the other
network nodes as leaves, thus obtaining the best routes to the
destinations in the system.
The route devices running OSPFv3 form an AS. The AS can be divided into
multiple areas. Each route device in an area needs the same AS topology
(link state database).
If the network type is broadcast or NBMA, route device A selects the DR
and BDR from the known neighbors and creates adjacencies with them. As a
result, the data traffic is reduced, because all route devices create
adjacencies only with the DR and BDR.
After the topology is obtained, route device A runs the SPF algorithm to
generate a shortest path tree, with itself as the root, to the other
route devices in the area. It calculates the shortest path of each route
according to the routing information advertised by each route device and
records it in the IPv6 routing table. Routes to destinations are then
obtained from the routing table.
Each route device in the area continuously exchanges link state
information with the specified route devices; the adjacencies on each
point-to-point link exchange link state information in parallel. After
being exchanged, the link state information is also flooded, so the route
devices in the entire area have the same link state database.
The area border router belongs to multiple areas at the same time.
Therefore, the routes of route device A's home area are advertised to
other areas, and the routes of other areas are advertised into the area.
Through the exchange of topology on the border route devices, the home
area of route device A learns the network topology and routes of the
entire AS. In OSPFv3, the border routers form the backbone area.
According to the NSF capability, the route devices are divided as follows:
NSF-Capable route device: a route device with the Non-Stop Forwarding
capability. The device must have dual-control redundancy and the routing
protocol GR capability.
GR-Capable route device: a route device with the graceful restarting
capability.
GR-Aware route device: a route device that can be aware that GR happens
to a neighbor and can help the neighbor complete GR. A GR-Capable route
device is also a GR-Aware route device.
GR-Unaware route device: a route device that cannot be aware that GR
happens to a neighbor and cannot help the neighbor complete GR.
According to the role of the route device in the GR process, the route
device can be divided as follows:
GR-Restarter route device: the route device that performs the protocol
graceful restarting;
GR-Helper route device: the route device that helps the protocol graceful
restarting.
During the graceful restart period, the neighbor plays the role of
Helper (also called Helper mode), which includes entering and exiting
Helper mode.
Graceful period rules: do not generate any type of LSA; do not perform
update processing for received self-generated LSAs, just receive them;
permit route calculation, but do not install the routes into the system
forwarding table. If the device was the DR before restarting, it is still
the DR after restarting.
The features of entering the graceful restarting period are as follows:
after the interface comes up, first generate a Grace-LSA to inform the
neighbors; delay sending Hello packets so as to receive the neighbors'
Hello packets and enter the 2-Way state. After the adjacency becomes
FULL, perform the SPF calculation, but do not install the routes into the
core routing table.
If route device X wants to complete graceful restarting, its neighbor
route device Y must help it. The device that helps to complete the
graceful restarting is the Helper; during this period, the Helper is said
to enter Helper mode. Helper mode applies per segment, that is, per link
with an adjacency relation. During the restarting period, the Helper
still advertises the link of the restarting route device; for a virtual
link, bit V is still set.
When the route device at the Helper end receives the Grace-LSA of the
neighbor, it sets the neighbor restart flag and prepares to enter Helper
mode. The following conditions need to be met: X (the gracefully
restarting route device, the Restarter) and Y (the Helper route device)
are in FULL adjacency, and after X restarts, the related links do not
change.
If any of the following conditions is met, the Helper exits Helper mode:
the Grace-LSA is deleted; the grace period expires; the link database
contents change.
The actions on exiting Helper mode: re-elect the DR of the segment and
regenerate the Router-LSA of the segment; if the device is the DR,
regenerate the Network-LSA; if it is a virtual link, regenerate the
Router-LSA of the virtual link.
The LSDB is composed of link state advertisements (LSA). The LSA can be
divided into 8 categories:
On an area border router, each area uses the calculated intra-area routes
to form Inter-Area-Prefix-LSAs and floods them to the other areas. The
backbone area uses the calculated intra-area and inter-area routes to
form Inter-Area-Prefix-LSAs and floods them to the other areas. All
border routers and the links among them form the backbone area. The
backbone area routers are mutually reachable; they can be connected
physically or through virtual links. When configuring a virtual link, the
transited area must be a transit area, not a stub or NSSA area.
The ASBR of the AS sends the external routing information to all areas
except the stub area in the AS. The route devices in the stub area are
directed to the ASBR through the default route.
The OSPFv3 packet has a standard OSPFv3 header. The length of the packet
header is 16 bytes. The information recorded in the header determines
whether further processing is required.
Type: the type of the packet following the OSPFv3 header. OSPFv3 has five
types of packets: Hello packet, type=1; database description packet,
type=2; link state request packet, type=3; link state update packet,
type=4; link state acknowledgment packet, type=5.
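Assuming the standard 16-byte OSPFv3 header layout (version, type, packet length, router ID, area ID, checksum, instance ID, reserved), the Type field can be decoded as follows (an illustrative sketch, not the product's parser):

```python
# Sketch of parsing the 16-byte OSPFv3 packet header and mapping the
# Type field to the five packet types listed above. Illustrative only.
import struct

PACKET_TYPES = {
    1: "Hello",
    2: "Database Description",
    3: "Link State Request",
    4: "Link State Update",
    5: "Link State Acknowledgment",
}

def parse_header(data):
    (version, ptype, length, router_id,
     area_id, checksum, instance_id, _rsvd) = struct.unpack("!BBHIIHBB", data[:16])
    return {
        "version": version,                 # 3 for OSPFv3
        "type": PACKET_TYPES.get(ptype, "unknown"),
        "length": length,
        "router_id": router_id,
        "area_id": area_id,                 # 0 on packets over a virtual link
        "instance_id": instance_id,
    }

# Build a sample Hello header and decode it
hdr = struct.pack("!BBHIIHBB", 3, 1, 36, 0x01010101, 0, 0, 0, 0)
print(parse_header(hdr)["type"])  # Hello
```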
Area ID: the area where the packet is generated; when the packet passes
the virtual link, area ID is 0.0.0.0.
The Hello packets are used to create and maintain adjacencies. After the
interface comes up, if OSPFv3 is started, Hello packets are sent
periodically to detect neighbors and create adjacencies. After an
adjacency is created, periodic Hello packets are required to maintain it.
Hello packets contain the parameters that must be consistent for
neighbors to set up an adjacency, such as the hello interval and the
neighbor dead time. If they are inconsistent, the Hello packets are
discarded.
Interface ID: a 32-bit number; it identifies the interface sending the hello
packets in the local route devices, such as the IfIndex.
Router Priority: used when selecting the DR and BDR. When the router
priority is 0, the route device cannot be selected as DR or BDR.
Option: The optional capability supported by the route devices. See the
option domain in OSPFv3 packets.
Router Dead Interval: if no hello packets are received in the router dead
interval, the neighbor is considered to be down. Delete the neighbor.
Backup DR: the router ID of the BDR selected by the interface generating
the packets.
Neighbor: the list of the neighbors that can receive hello packets at the
interface generating the packets in the router dead interval.
Interface MTU: the maximum size of IPv6 packets that the interface
generating the packet can transmit without fragmentation. When the
packets are transmitted over a virtual link, the interface MTU is set to
0.
I-bit: initial bit, when the packet is the initial packet of the DD packet
sequence, the bit is 1.
M-bit: when the packet is the last packet of the DD packet sequence, the
bit is 1.
Link State ID: works with link state type and advertising router to identify
a LSA.
Advertising Router: the router ID of the route device generating the LSA
In the process of creating neighbors, when the link state request packets
are received, the LSA in the local database is sent to neighbors through
the update packets. In addition, if the local link state changes, the
changed LSA is sent out through the update packets. The flooding
mechanism is used in the case of sending update packets.
LSA header
Link State ID: works with link state type and advertising router to identify
a LSA.
Advertising Router: the router ID of the route device generating the LSA
V: Virtual Link Endpoint bit; set the bit when the route device
generating the packet is one end of a virtual link.
E: External bit; set the bit when the route device generating the packet
is an ASBR.
B: Border bit; set the bit when the route device generating the packet is
an ABR.
W: Wild-card multicast bit; set the bit when the route device generating
the packet is a wild-card multicast receiving route device.
Neighbor Router ID: the router ID of the neighbor route device; the point-
to-point interface refers to the router ID of the neighbor route device; the
multipoint access interface type refers to the router ID of the DR router.
Link State ID: for the Network LSA, it is the interface ID of the DR
interface
Attached Router: the list of the route devices adjacent to the DR in the
network
Options: the option capability of the route devices described in the LSA.
Metric: the cost for reaching the destination route device described in the
LSA.
Destination Router ID: the router ID information about the described route
devices.
E: External metric bit, the type of the external cost used by the route.
If the E bit is set to 1, the cost type is E2; if the E bit is 0, the
cost type is E1.
T: the tag bit of the route, if it is set to 1, it indicates that the tag value
exists.
Referenced LS Type: the LS type related with the LSA; if the value is set,
the Referenced Link State ID exists; through the LS Type, Link State ID
and the advertised router ID of the LSA, you can find the related LSA.
Each IPv6 link in the route device generates a corresponding link LSA. The
link LSA is advertised only in the local link. The content of the
advertisement contains the IPv6 link-local address and the IPv6 prefix
address in the link. The link ID of the LSA is the interface ID.
Options: the options will be used in the Network LSA where the link
resides.
Referenced LS Type, Link State ID, Advertising Router: the LSA related
with IPv6 prefix advertised by LSA can be router-LSA and network-LSA.
DC: set the bit when a demand circuit is configured.
EA: set the bit when the source route device is capable of
receiving/sending external-attributes LSAs.
N: used only in Hello packets; set to 1 when NSSA external LSAs are
supported and 0 when they are not; when N is set to 1, the E bit must be
0.
P: used only in NSSA external LSA headers. If the P bit is set, the ABR
of the NSSA must convert the type 7 LSA to a type 5 LSA.
MC: set the bit when the source route device can forward multicast
packets.
E: set the bit when the source route device can receive ASE LSA packets.
NU: non-unicast address, if the bit is set to 1, it indicates that the prefix
address cannot be used in the case of calculating routes.
LA: local address bit; indicates that the prefix address is an address of
the route device advertising it.
MC: set the bit when the source route device can forward multicast
packets.
P: the prefix used in NSSA External LSA. If P bit is set, the ABR of NSSA
must convert type 7 LSA to type 5 LSA.
2. Similarities
The basic packet types are the same, including Hello, LS-DD, LS-Req,
LS-Upd, and LS-Ack. The process and principle of neighbor discovery and
adjacency creation are the same. The supported interface network types
are the same, including P2P, P2MP, Broadcast, NBMA, and Virtual. The
flooding mechanism and the aging mechanism of LSAs are the same. The SPF
calculation principles are also the same. The contained LSAs are
basically the same; two types of LSA are added in OSPFv3 to advertise the
IPv6 link-local address and IPv6 prefix addresses. The Router ID, Area
ID, and Link ID still use the IPv4 address format.
3. Differences
OSPFv3 runs on an IPv6 link, where the concept of subnet does not exist; OSPFv2 runs on a subnet.
On one IPv6 link, multiple OSPFv3 processes are allowed, identified through the instance ID, while one IPv4 interface can run only one OSPFv2 process.
The link state ID of an OSPFv2 LSA expresses IPv4 address information, but the link state ID of an OSPFv3 LSA does not express address information; it only distinguishes different LSAs and has no special meaning (a few LSA link state IDs express interface ID information, such as in the network LSA).
OSPFv3 multicast packets are sent to IPv6 multicast addresses, and unicast packets are sent to the IPv6 link-local address.
The flooding scope of an OSPFv2 LSA is judged from the LSA type, while the header of an OSPFv3 LSA carries the flooding scope (flag bits for other capabilities are also carried, for example, how to handle an unrecognized LSA type).
Two LSA types are added in OSPFv3: the link LSA, which advertises the link-local address and is flooded only on the local link, and the intra-area-prefix LSA, which advertises the IPv6 address information of the interfaces.
OSPFv3 Features
9. Equal-cost multiple paths: if there are multiple paths with the same cost to a destination, OSPFv3 finds the paths and load-balances traffic across them.
10. Stub area support: when an area is set as a stub area, external LSAs are not flooded into the stub area; in the stub area, routes to external destinations follow the default route.
Memory of the routing device: the OSPFv3 link state database may become very large, especially when many external link states are advertised. In this case, the routing device needs a large amount of memory; updating and synchronizing the link state database also consumes a large amount of memory.
CPU usage: in OSPFv3, CPU usage is related to the time spent running the SPF algorithm and to the number of routing devices in the OSPF system. In addition, when the link state database is very large, a large number of packets may have to be exchanged during protocol convergence, which occupies a great deal of CPU.
Designated router: the designated router in a multi-access network receives and sends more packets than the other routing devices, and when the designated router fails, a new designated router must be elected. Because of this, the number of routing devices connected to one network should be restricted.
Precautions of OSPFv3
Limiting the size of the OSPFv3 system saves the memory of the routing device.
In an area, the database size can be reduced as follows:
1. The area can use the default route, reducing the external routes that have to be imported;
2. An EGP (external gateway protocol) can carry its own information across the OSPFv3 AS instead of depending on the IGP (such as OSPFv3) to transmit it;
3. The area can be configured as a stub area;
4. If the external network prefixes are regular addresses, the addresses can be summarized; after summarization, the external information in OSPFv3 decreases dramatically.
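The summarization step above (item 4) can be sketched in a few lines. The following Python fragment is purely illustrative: the helper name and the example prefixes are our own, and it uses the standard ipaddress module rather than any device behavior.

```python
import ipaddress

def summarize(prefixes):
    """Collapse contiguous prefixes into the minimal covering set,
    as in the external-route summarization described above."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Four contiguous /66 prefixes collapse into one /64 summary, so only
# one piece of external information is advertised instead of four.
routes = ["2001:db8:0:1::/66", "2001:db8:0:1:4000::/66",
          "2001:db8:0:1:8000::/66", "2001:db8:0:1:c000::/66"]
print(summarize(routes))  # ['2001:db8:0:1::/64']
```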
IS-IS Multi-Topology
NET (network entity title): identifies the ISO address of one intermediate system. It is similar to an IP address and is divided into an area ID and a system ID;
Area: The route area in the IS-IS protocol, including Level-1 Area and
Level-2 Area;
LSP (Link State PDU): carries the link state information to be advertised, including the adjacency information and reachable subnet information;
LSDB (Link State Database): comprises the LSPs generated by all ISs in the whole area, describing the adjacency topology and related route information of the whole area. The LSDB has an identical copy on each IS, and each IS uses the SPF algorithm to calculate routes from its own LSDB;
PSNP (Partial Sequence Number PDU): It is one kind of the SNP packet,
used to confirm the LSP packet (point-to-point network) and request the
LSP packet (broadcast network);
CSNP (Complete Sequence Number PDU): It is one kind of the SNP packet,
used to advertise the LSDB abstract description information;
DIS (Designated IS): One IS system elected from all ISs on the broadcast
network. It is responsible for simulating one Pseudo-node and maintaining
the synchronization of LSDB of all ISs on the broadcast network.
The IS-IS protocol can support the routes of multiple protocol stacks,
including IPv4, IPv6, and OSI. At first, the IS-IS protocol is applied in the
OSI protocol stack (ISO10589) and then is used in the routes of IPv4
protocol stack (RFC1195) and the IPv6 protocol stack (draft-ietf-isis-ipv6).
Meanwhile, the IS-IS protocol can support the CSPF calculation of MPLS-TE
(RFC3784).
The IS-IS protocol has good compatibility (devices that implement different extension functions can still interoperate) and large network capacity; it supports multiple protocol stacks and can be upgraded smoothly; it is simpler than OSPF and less likely to have problems. Therefore, IS-IS is suitable for large core backbone networks.
To generate routes for the IPv6 address family according to the topology, each routing device advertises the IPv6 reachable subnet information together with its link state information. After the shortest paths (the SPF tree) to all routing devices are calculated, the IPv6 routes are generated from the shortest paths and the IPv6 reachable subnet information advertised by the routing devices.
Meanwhile, the interface address must be checked: for the IPv4 address family, check whether the neighbor's hello packet advertises an IPv4 address on the same subnet as the local interface; for the IPv6 address family, check whether the neighbor's hello packet advertises a link-local address.
First, calculate the shortest path to each routing device, that is, calculate the SPF tree with the SPF algorithm. Then calculate the routes according to the shortest path and the IPv6 reachable subnet information advertised by that routing device.
IS-IS Multi-Topology
Overview
In the earlier IS-IS protocol, the advertised link state database carries only one network topology, which is called single-topology.
Multi-Topology TLV format:
Octets  Field
1       Type (229)
1       Length
2       O |A |R |R | MT ID (one 2-octet entry per topology; the entry may repeat)
Point-to-point Neighbor
When the neighbor shares at least one topology with the interface, the adjacency is set up; otherwise, the adjacency cannot be set up. When calculating single-topology routes, the related flags carried in the LSP header are used.
IBGP: BGP running between peers in the same AS. An IBGP neighbor is a routing device in the same administrative domain.
MP-BGP (Multiprotocol BGP): BGP that carries different kinds of route information is called multiprotocol BGP.
BGP uses TCP as the transport protocol (port 179), which provides reliable data transmission; retransmission and acknowledgement are handled by TCP.
Create a TCP connection between two routing devices running BGP. Then,
the two routing devices are called peers. Once the connection is created,
the two peer routing devices acknowledge the connection parameters by
exchanging open packets. The parameters include the BGP version number, AS number, hold time, BGP identifier, and other optional parameters. After
the two peers negotiate parameters successfully, the BGP exchanges
routes by sending update packets. The update packets contain the list of
reachable destinations passing each AS system (namely NLRI), and the
path attributes of each route. When the route changes, incremental
update packets are used between peers to transmit the information. BGP
does not require refreshing routing information periodically. If the route
does not change, the BGP peers only exchange keepalive packets. The
keepalive packets are sent periodically to ensure the valid connection.
IPv6 BGP4+ is the inter-domain routing protocol that supports IPv6. Based
on BGP4, it reflects the information of the IPv6 network layer protocol to
NLRI and Next_Hop attribute. It brings in two NLRI attributes, that is,
MP_REACH_NLRI (Multiprotocol Reachable NLRI, used to release reachable
IPv6 route and next-hop information) and MP_UNREACH_NLRI
(Multiprotocol Unreachable NLRI, used to cancel the unreachable IPv6
route). The Next_Hop attribute is identified by the IPv6 address, which can
be an IPv6 global unicast address or a link-local next-hop address. IPv6 BGP4+ applies the BGP4 multiprotocol extension attributes to the IPv6 network, while the original message mechanism and routing mechanism of BGP4 remain unchanged, so the application scenarios and working principle of IPv6 BGP4+ are the same as those of BGP4.
BGP Message Header
The BGP message header contains a 16-byte marker field, a 2-byte length field, and a 1-byte type field. The following figure illustrates the format of the BGP message header.
Length: the length field occupies 2 bytes. It indicates the length of the
message. The minimum allowed length is 19 bytes and the maximum is
4096 bytes.
Type: The type field occupies one byte. It indicates the type of the BGP
message. The four types of the BGP message are as follows:
Number Type
1 Open
2 Update
3 Notification
4 Keepalive
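The fixed header described above can be parsed in a few lines. The Python sketch below is illustrative only (the function and constant names are our own, not part of any BGP implementation); it enforces the 19-4096 byte length bounds and the four message types from the table.

```python
import struct

MSG_NAMES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}

def parse_bgp_header(data: bytes):
    """Split the 19-byte fixed header: 16-byte marker, 2-byte length,
    1-byte type, enforcing the 19..4096 length bounds stated above."""
    if len(data) < 19:
        raise ValueError("BGP messages are at least 19 bytes")
    marker = data[:16]
    length, msg_type = struct.unpack("!HB", data[16:19])
    if not 19 <= length <= 4096:
        raise ValueError("length outside the 19..4096 byte range")
    return marker, length, MSG_NAMES.get(msg_type, "UNKNOWN")

# A keepalive is just the header: an all-ones marker, length 19, type 4.
keepalive = b"\xff" * 16 + b"\x00\x13\x04"
print(parse_bgp_header(keepalive)[1:])  # (19, 'KEEPALIVE')
```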
Open Message
After the TCP connection is created, the first packet sent is the open message. The open message contains the BGP version number, AS number, hold time, BGP identifier, and other optional parameters.
If the open message is acceptable, it means that the peer routing devices
agree with the parameters. In this case, the keepalive message is sent to
acknowledge the open message.
Besides the fixed BGP header, the open message contains the following fields:
Version: the version field occupies one byte. It indicates the version
number of the BGP protocol. When the neighbors are negotiating, the peer
routing devices agree on the BGP version numbers. Usually, the latest
version supported by the two routing devices is used.
Hold Time: the field is two bytes. It indicates the maximum time the sender waits between successive keepalive or update messages from the peer. The BGP routing device negotiates with the peer and sets the hold time to the smaller of the two values.
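The negotiation rule above (each side adopts the smaller of the two offered hold times) reduces to a one-line comparison. The sketch below and its sample values are illustrative only.

```python
def negotiate_hold_time(local_offer: int, peer_offer: int) -> int:
    """Each peer sets the session hold time to the smaller of its own
    configured value and the value offered in the peer's open message."""
    return min(local_offer, peer_offer)

# Both sides converge on the same value regardless of who computes it.
print(negotiate_hold_time(180, 90))  # 90
print(negotiate_hold_time(90, 180))  # 90
```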
BGP Identifier: the field is four bytes. It indicates the identifier of the BGP
sending routing devices. The field is the ID of the routing device, namely
the maximum loopback interface address or the maximum IP address of
the physical interfaces. You can set the address of the router-id manually.
Optional parameter Length: the field is one byte. It indicates the total
length of the optional parameter fields (the unit is byte). If there are no
optional parameters, the field is set to 0.
Update Message
The update message is used to exchange routing information between BGP
peers. When you advertise routes to a BGP peer or cancel the routes, the
update message is used. The update message contains the fixed BGP
header and the following optional parts:
Total Path Attribute Length: the field is two bytes; it indicates the total
length of the path attribute field.
Path Attribute: the variable long field contains the BGP attribute list
related with the prefix in the NLRI. The path attribute provides the
attribute information of the advertised prefix, such as the priority or next
hop. The information is for route filtering and route selection. The path
attribute can be classified into the following types:
LOCAL_PREF: the attribute indicates the priority of the route within the local AS; the higher the value, the higher the route priority. LOCAL_PREF is not contained in update messages sent to EBGP neighbors; if the attribute is contained in an update message from an EBGP neighbor, the attribute is ignored.
AGGREGATOR: the attribute marks the BGP peer (IP address) performing
the route aggregation and the AS number.
MULTI_EXIT_DISC (MED): unlike the processing of local priority, the MED lets an external routing device affect the route selection of another AS, whereas the local priority only affects route selection within the AS.
Network Layer Reachability: the variable long field contains the list of
reachable IP address prefix advertised by the sender.
Keepalive Message
The keepalive messages are exchanged between peers periodically to
check whether the peer is reachable.
Notification Message
The notification message is sent when an error is detected; the BGP connection is closed after it is sent.
Error Code: one byte, the field indicates the error type.
ERROR SUBCODE: one byte, the field provides more details about the
error.
DATA: variable length field, the field contains the data related with the
error, for example, invalid message header, illegal AS number. The
following table lists the possible error codes and the error subcodes.
Table 34-4 BGP Notification message error code and error subcode
Event Description
1 BGP starts
2 BGP ends
3 BGP transmission connection opens
4 BGP transmission connection is terminated
5 Fail to open the BGP transmission connection
6 BGP transmission fatal errors
7 Retrying connection timer times out
8 Duration time terminated
9 Keepalive timer terminated
10 Receive Open messages.
11 Receive Keepalive messages.
12 Receive update messages
13 Receive notification messages
Idle: the initial status. BGP stays in the idle status until an operation triggers a startup event. The startup event is usually triggered by the creation or restart of a BGP session.
Connect: in this status, BGP waits for the TCP connection to complete. If the connection fails, BGP moves to the active status. If the connect-retry timer times out, BGP remains in the connect status; the timer is reset and a new transport connection is started. If any other event occurs, BGP returns to the idle status.
Active Status: in the status, BGP attempts to create a TCP connection with
the neighbor. If the connection succeeds, send the Open message, and
move to the status of sending open message. If re-connecting timer times
out, the BGP restarts the connection timer and goes back to the
connection status to monitor the connection from the peers.
OpenSent: in the status, the open message is sent. BGP is waiting for the
open message sent from the peers. Check the received open message. If
any error occurs, the system sends a notification message and goes back
to the idle status. If no error occurs, the BGP sends a keepalive message
to the peer and resets the keepalive timer.
OpenConfirm: in this status, BGP waits for a keepalive or notification message from the peer. When the keepalive message is received, the status moves to established.
Established: the last phase of the neighbor negotiation. In this status, the connection between the BGP peers is established, and update, notification, and keepalive messages can be exchanged between the peers.
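The session states described above can be sketched as a small transition table. The event names below are our own shorthand and the table is a deliberate simplification (for example, every unlisted event simply falls back to idle, mirroring the "returns to the idle status" behavior).

```python
# Simplified BGP session state machine based on the states above.
TRANSITIONS = {
    ("Idle", "start"): "Connect",            # a start event leaves idle
    ("Connect", "tcp_ok"): "OpenSent",       # TCP up: send the open message
    ("Connect", "tcp_fail"): "Active",       # TCP failed: keep trying
    ("Active", "tcp_ok"): "OpenSent",
    ("OpenSent", "open_ok"): "OpenConfirm",  # valid open: send keepalive
    ("OpenConfirm", "keepalive"): "Established",
}

def step(state: str, event: str) -> str:
    """Unlisted (state, event) pairs fall back to Idle."""
    return TRANSITIONS.get((state, event), "Idle")

state = "Idle"
for event in ["start", "tcp_ok", "open_ok", "keepalive"]:
    state = step(state, event)
print(state)  # Established
```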
Well-Known Mandatory;
Well-Known Discretionary;
Optional Transitive;
Optional Non-Transitive.
Optional Transitive: BGP does not need to support the attribute, but it should accept paths carrying the attribute and advertise those paths.
When multiple routes with the same prefix length and to the same destination exist, BGP selects the best route according to the following rules:
9. Preferentially select the route whose next-hop has the minimum IGP
metric;
11. Preferentially select the route with the minimum BGP ROUTER-ID;
13. Preferentially select the route from the lowest neighbor address;
14. If the BGP load balancing is started, rules 10-13 are ignored. All routes
with the same AS_PATH length and MED values will be installed in the
routing table.
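Several of the tie-break rules above reduce to comparing attribute tuples in order. The Python sketch below is an illustrative simplification (the field names and sample routes are our own): higher LOCAL_PREF wins, then shorter AS_PATH, then lower MED, then lower router ID.

```python
def best_route(routes):
    """Pick the best route by an ordered attribute comparison, as a
    simplified stand-in for the BGP selection rules listed above."""
    return min(routes, key=lambda r: (-r["local_pref"],   # higher wins
                                      len(r["as_path"]),  # shorter wins
                                      r["med"],           # lower wins
                                      r["router_id"]))    # lower wins

candidates = [
    {"local_pref": 100, "as_path": [200, 300], "med": 10,
     "router_id": "2.2.2.2"},
    {"local_pref": 200, "as_path": [200, 400, 300], "med": 50,
     "router_id": "1.1.1.1"},
]
# LOCAL_PREF is compared first, so the second route wins despite its
# longer AS_PATH and higher MED.
print(best_route(candidates)["router_id"])  # 1.1.1.1
```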
Figure 34-31 In the same condition, preferentially select the route with
higher LOCAL_PREF value
User AS100 obtains routes from ISP1 and ISP2, but ISP1 is the preferred ISP. When the device connected to ISP1 announces routes to switch-F, it sets a higher LOCAL_PREF value. For the same destination, the routes learned from ISP1 are preferred because their LOCAL_PREF value is higher.
Figure 34-32 In the same condition, preferentially select the route with
lower MED value
A dual-homed structure is used between a user and an ISP. The user prefers LINK2 and uses LINK1 as the backup. When the user publishes routes to the ISP, the update packets carrying the lower MED value are sent on LINK2. If the routes received over the EBGP neighbors created on LINK2 and LINK1 are otherwise identical, the route with the lower MED is preferred. As a result, the traffic from the ISP enters from LINK2.
Route Filtering
Route filtering means that a BGP speaker can control the routes it sends to and receives from any BGP peer. Route filtering is done by defining a route policy. The procedure is as follows:
1. Identify routes
3. Operate on attributes
Route filtering can be completed through an access list, prefix list, or AS path access list. Route maps can also be used to implement filtering and attribute operations.
The route reflector is recommended only in the large scale internal BGP
closed network. The route reflector increases the overhead of the route
reflector server. If the configuration is incorrect, the route may be cyclic or
unstable. Therefore, route reflector is not recommended in every topology.
Alliance
The alliance is another method for processing the sharp increase of IBGP
closed network in the AS. Similar to the route reflector, the alliance is
recommended only in the large scale internal BGP closed network.
The concept of the alliance is put forward because one AS can be divided into multiple sub-AS systems. In each sub-AS, all IBGP rules are applicable; for example, all BGP routing devices in the sub-AS must form a fully meshed network. Each sub-AS has a different AS number, so external BGP must be run between them. Although EBGP is used between sub-AS systems, route selection in the alliance is similar to IBGP route selection in a single AS; namely, when the sub-AS border is crossed, the next-hop, MED, and local priority information is preserved. From the outside, an alliance looks like a single AS.
The defect of the alliance is that, when changing from a non-alliance plan to an alliance, the routing devices must be reconfigured and the logical topology changes. In addition, if the BGP policy is not set manually, the alliance may not select the best route.
Route Damping
Route damping (route attenuation) is a technology that controls route instability. It significantly reduces the instability caused by route oscillation.
Route damping divides routes into well-behaved and ill-behaved ones. A well-behaved route demonstrates long-term stability, while an ill-behaved route demonstrates instability in the short term. An ill-behaved route is penalized in proportion to its expected instability, and unstable routes are suppressed until they become stable.
The recent history of a route is the basis for evaluating its future stability. To know the route history, first count how often the route flaps within a certain period. In route damping, each time the route flaps, it is penalized; when the penalty reaches a predefined limit, the route is suppressed. A suppressed route can continue to accumulate penalties. The more frequently the route flaps, the earlier it is suppressed.
Similar rules are used to un-suppress the route and re-advertise it: an algorithm reduces the penalty according to an exponential decay law, whose parameters are defined by the user.
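The penalty-and-decay scheme described above can be sketched numerically. The parameter values below (penalty per flap, half-life, suppress and reuse thresholds) are illustrative only, not device defaults.

```python
import math

PENALTY_PER_FLAP = 1000
HALF_LIFE = 15 * 60        # seconds; penalty halves every half-life
SUPPRESS = 2000            # suppress the route above this penalty
REUSE = 750                # re-advertise once decayed below this

def decayed(penalty: float, elapsed: float) -> float:
    """Exponential decay: the penalty halves every HALF_LIFE seconds."""
    return penalty * math.pow(0.5, elapsed / HALF_LIFE)

penalty = 0.0
for _ in range(3):          # three flaps in quick succession
    penalty += PENALTY_PER_FLAP
print(penalty >= SUPPRESS)  # True: the route is suppressed
# After two half-lives with no flaps, 3000 decays to 750, which meets
# the reuse threshold, so the route is re-advertised.
print(decayed(penalty, 2 * HALF_LIFE) <= REUSE)  # True
```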
BGP Graceful Restart
Principle of BGP Graceful Restart
After a routing device fails, its neighbors at the BGP routing layer detect that the neighborship goes down and comes back up, which is called BGP neighbor oscillation. The neighborship oscillation finally causes route oscillation. As a result, a route blackhole may occur for a while after the routing device restarts, or the data traffic of the neighbors bypasses the restarted routing device. Consequently, the reliability of the network decreases.
BGP graceful restart prevents route disturbance and accelerates route convergence when a routing device fails, which ensures network reliability.
1. In the BGP OPEN message, the graceful restart capability is added. The
fields are as follows:
2. In the BGP update packets, add the EOR flag to indicate that the
update is complete.
2. When a fault occurs, the forwarding layer of switch A retains the routes and continues forwarding traffic;
5. Delay the route calculation until the EOR flag from the neighbor is
received or the deter-timer times out.
6. Calculate the route, update the core route and advertise the route.
2. After the restarter end becomes faulty, if any TCP error is detected,
run step 3, if no TCP error is detected, run step 4.
4. Re-construct neighbors and delete the restart timer. If the timer exists,
start the stale-path timer.
5. If the restart timer times out before the session is re-established, or the fwd-flag in the corresponding address family of the open message is not 1, or the corresponding address family information is not contained, run step 8.
6. Send routes to the restart routing device. Then, send EOR flag.
7. If the stale-path times out before the EOR is received, run step 8.
8. Delete the reserved route and then enter the normal BGP flow.
GVRP Technology
This chapter describes the GVRP and GARP technology and the application.
Main contents:
Implementation of GVRP
Typical Application
Main contents:
GVRP overview
GARP principle
GVRP Overview
Generic Attribute Registration Protocol (GARP) provides the mechanism of
generic attribute registration, de-registration, and transfer. According to
different attributes of the GARP protocol packets, different upper layer
protocol applications are supported.
GARP Principle
G AR P Message
The information exchange between GARP members is through three types
of messages: join message, leave message, and LeaveAll message.
Join Message
When a GARP application entity wants other entities to register its attribute information, it sends a join message. It also sends a join message when it receives a join message from another entity, or when some attributes are statically configured on the entity and need to be registered by other GARP application entities.
The join message includes JoinEmpty and JoinIn. The differences are as
follows:
Leave Message
When a GARP application entity wants other devices to de-register its attribute information, it sends a leave message. It also sends a leave message when it de-registers attributes after receiving a leave message from another entity, or when attributes are de-registered statically.
The Leave message includes LeaveEmpty and LeaveIn. The differences are
as follows:
LeaveAll Message
When each GARP application entity is started, the LeaveAll timer is started at the same time. When the timer times out, the GARP application entity sends a LeaveAll message. The LeaveAll message is used to de-register all attributes, so that other entities can re-register the attributes they still need.
G AR P Timer
Join Timer
The Join timer controls the sending of the Join message (including JoinIn and JoinEmpty). To ensure reliable transmission of the Join message, the entity waits one Join timer interval after the first Join message is sent. If a JoinIn message is received within that interval, the second Join message is not sent; if not, a Join message is re-sent.
Hold Timer
The hold timer is used to control the sending of Join message (including
JoinIn and JoinEmpty) and Leave message (including LeaveIn and
LeaveEmpty).
The value of Hold timer should be less than or equivalent to half of the
Join timer value.
Leave Timer
The Leave timer will be started after each application entity receives the
Leave or LeaveAll message. If the Join message of the attribute is not
received before the Leave timer times out, the attribute will be de-
registered.
LeaveAll Timer
After each GARP application entity is started, the LeaveAll timer will be
started. If the timer times out, the GARP application entity will send
LeaveAll message. Then, the LeaveAll timer is started to start a new cycle.
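The timer rules above can be condensed into two tiny checks. This is an illustrative sketch of the decision logic only (function names are our own), not a real timer implementation.

```python
def joins_to_send(join_in_received: bool) -> int:
    """Per the Join timer rule above: if a JoinIn arrives within one
    Join timer interval after the first Join, the second Join is
    suppressed; otherwise the Join is re-sent."""
    return 1 if join_in_received else 2

def hold_timer_valid(hold: float, join: float) -> bool:
    """Per the Hold timer rule: Hold must not exceed half of Join."""
    return hold <= join / 2

print(joins_to_send(True))        # 1: a JoinIn arrived in time
print(joins_to_send(False))       # 2: no JoinIn, so re-send
print(hold_timer_valid(10, 20))   # True
print(hold_timer_valid(15, 20))   # False
```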
Field           Description                GVRP value
Protocol ID     Protocol identifier        1
Attribute Type  The type of the attribute  1 (indicates the VLAN ID)
Implementation of GVRP
GVRP is one application of the GARP. It maintains the VLAN dynamic
registration information and transmits the information to other devices
based on the GARP working mechanism. The manually configured VLAN is
called a static VLAN. The VLAN created through the GVRP protocol is called
a dynamic VLAN.
Enable the GVRP function (enable GVRP globally and on the trunk port). The VLAN information allowed by the trunk port is transmitted to the connected network segment through GVRP packets. When a switch on the network segment receives the GVRP packets, it registers or de-registers VLANs according to the parsed packet information, and at the same time transmits the VLAN information out of its other active ports. As a result, the VLAN information is propagated across the entire switching network. When GVRP transmits information, the VLAN information is only transmitted on active ports (ports in the forwarding status). The active status of a port is retrieved from the MSTP module: if, in the instance mapped to the VLAN of a received message, the receiving port is not in the FORWARDING state, the message is dropped directly and not transmitted.
Normal mode: both dynamic and static VLAN registration and de-registration are allowed on the port.
Fixed mode: only static configuration takes effect; dynamic VLAN registration and de-registration are disabled on the port.
Forbidden mode: dynamic VLAN registration and de-registration are disabled, and all VLANs except VLAN 1 are de-registered on the port.
Typical Application
Through the GVRP function, you only need to configure the VLANs of some devices (border devices). The VLAN configuration is then automatically applied across the switching network, which reduces the administrator's work and the possibility of mistakes.
This section describes the Private VLAN protocol technology and the
application. The function is just applicable to MyPower 3400 and
MyPower4100.
Primary VLAN: the primary VLAN represents one sub-domain. All PVLANs in one PVLAN domain share one primary VLAN;
Secondary VLAN: there are two types of secondary VLAN, the Isolated VLAN and the Community VLAN;
Isolated VLAN: the ports in one isolated VLAN cannot perform L2 communication with each other. There is only one isolated VLAN in one PVLAN domain;
Community VLAN: the ports in one community VLAN can perform L2 communication with each other, but cannot perform L2 communication with ports in other community VLANs. There can be multiple community VLANs in one PVLAN domain.
Isolated port: It belongs to the Isolated VLAN and can only communicate
with the promiscuous port.
The primary VLAN and secondary VLANs form one PVLAN domain. One PVLAN domain must contain exactly one primary VLAN (therefore, the primary VLAN represents the PVLAN domain), and it can contain multiple community VLANs and at most one isolated VLAN. The promiscuous port belongs to all PVLANs of the PVLAN domain, while a host port only belongs to its own secondary VLAN and the primary VLAN.
In PVLAN domain, the host port of Isolated VLAN can only communicate
with the promiscuous port of primary VLAN, while the host ports in
Isolated VLAN cannot communicate with each other. The host port of
Community VLAN can communicate with the promiscuous port of primary
VLAN and the other host ports in the Community VLAN.
The above figure is one complete PVLAN domain. VLAN 2 is Primary VLAN;
VLAN 100 is Isolated VLAN; VLAN 101 and VLAN 102 are Community VLAN.
Port 0/0/7 is the promiscuous port; Port 0/0/1 and Port 0/0/2 are isolated ports; Port 0/0/3, Port 0/0/4, Port 0/0/5, and Port 0/0/6 are all community ports.
Port 0/0/7 can communicate with Port 0/0/1-Port 0/0/6; Port 0/0/1 and
Port 0/0/2 can only communicate with Port 0/0/7; Port 0/0/3 and Port
0/0/4 can communicate with each other and with Port 0/0/7. Port 0/0/5
and Port 0/0/6 can communicate with each other and with Port 0/0/7.
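The reachability rules in the example above can be sketched as a small lookup. The port roles and VLAN numbers below follow the example figure; the function is our own illustrative helper, not device behavior.

```python
# Port roles from the example PVLAN domain above: (role, secondary VLAN).
ROLE = {"0/0/7": ("promiscuous", None),
        "0/0/1": ("isolated", 100), "0/0/2": ("isolated", 100),
        "0/0/3": ("community", 101), "0/0/4": ("community", 101),
        "0/0/5": ("community", 102), "0/0/6": ("community", 102)}

def can_talk(a: str, b: str) -> bool:
    """L2 reachability between two ports of the PVLAN domain."""
    (role_a, vlan_a), (role_b, vlan_b) = ROLE[a], ROLE[b]
    if "promiscuous" in (role_a, role_b):
        return True                 # promiscuous reaches every port
    if role_a == role_b == "community":
        return vlan_a == vlan_b     # same community VLAN only
    return False                    # isolated ports reach no host port

print(can_talk("0/0/1", "0/0/7"))  # True:  isolated <-> promiscuous
print(can_talk("0/0/1", "0/0/2"))  # False: isolated <-> isolated
print(can_talk("0/0/3", "0/0/4"))  # True:  same community VLAN
print(can_talk("0/0/3", "0/0/5"))  # False: different community VLANs
```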
This chapter describes the Voice VLAN protocol technology and application.
The function is only applicable to MyPower 3400 and MyPower4100.
OUI address: the address range obtained by performing an AND operation on the MAC address and the address mask, used to identify the packets sent by the VoIP devices of a manufacturer.
The MyPower 3400 and MyPower4100 series switches match the source MAC address of the packet against the OUI address. A packet whose source MAC address matches one of the following OUI addresses is regarded as a voice packet:
Serial No.  OUI address     Manufacturer
1           0003-6b00-0000  Cisco phone
2           000f-e200-0000  H3C Aolynk phone
3           00d0-1e00-0000  Pingtel phone
4           00e0-7500-0000  Polycom phone
5           00e0-bb00-0000  3Com phone
When the source MAC address of the packet matches the OUI address of
the VoIP device, the data is regarded as the VoIP data, the priority of the
packet is automatically modified, and the packet is forwarded to the
corresponding Voice VLAN, ensuring the call quality.
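The OUI match described above is a masked comparison: AND the source MAC with the mask and compare the result against the OUI address. The sketch below is illustrative; it assumes the common mask ff:ff:ff:00:00:00 (the first three octets), which the table's OUI values suggest but the text does not state.

```python
def mac_to_int(mac: str) -> int:
    """Convert a MAC in '0011-2233-4455' or colon form to an integer."""
    return int(mac.replace("-", "").replace(":", ""), 16)

def matches_oui(src_mac: str, oui: str,
                mask: str = "ffff-ff00-0000") -> bool:
    """AND the source MAC with the mask and compare against the OUI."""
    return mac_to_int(src_mac) & mac_to_int(mask) == mac_to_int(oui)

# A handset MAC starting with 00e0-75 matches the Polycom OUI entry.
print(matches_oui("00e0-7512-34ab", "00e0-7500-0000"))  # True
print(matches_oui("0003-6b00-0001", "00e0-7500-0000"))  # False
```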
When configuring the Voice VLAN on the port, the user can choose the
following two application modes:
Auto mode: when a port configured in auto mode receives a VoIP packet, the switch automatically modifies the priority of the packet, forwards it in the corresponding Voice VLAN, and uses the aging mechanism to maintain the ports in the Voice VLAN. If no data from the MAC address is received before the aging time expires, the MAC address automatically exits from the Voice VLAN.
Manual mode: the user needs to manually configure the default VLAN ID (PVID) of the port as the Voice VLAN ID with a command.
A port in auto mode only processes untagged voice flows. The system listens to the untagged packets periodically sent by the VoIP device, learns the source MAC address, and automatically adds the MAC address of the VoIP device to the Voice VLAN; a MAC address that reaches the aging time without matching an OUI address again is automatically deleted from the Voice VLAN.
A port in manual mode processes the voice flow in the configured VLAN. The user needs to use a command to directly add the port connecting the IP telephone to the Voice VLAN, or to remove it.
The system regards a tagged packet as already carrying a priority, so it does not modify the packet priority.
An IP telephone on which the Voice VLAN is configured manually does not need to request an IP address in the default VLAN first; it always sends and receives voice flows with the Voice VLAN tag. That is, an IP telephone configured with an IP address and Voice VLAN directly initiates registration and communication with the voice gateway.
Table 2 The conditions for all types of ports to cooperate with the IP phone
that automatically gets the Voice VLAN information
Note: in the above conditions, if the user configures the Voice VLAN information of the IP phone manually, whether the access port needs to permit the packets of the default VLAN depends on whether a common PC is also connected to the port, because the default VLAN is mainly used to transmit the common service packets of the PC. If no common PC is connected, the port does not need to permit the packets of the default VLAN.
Table 3 The conditions for all types of ports to cooperate with the IP phone
that sends the untagged voice flow in manual mode
If a port enables the Voice VLAN function and is configured in auto mode, the first untagged packet is forwarded using the PVID of the port; later packets are forwarded according to whether the source MAC matches an OUI address. If a tagged packet is received and the tag is the Voice VLAN, the packet is forwarded in the Voice VLAN.
For a port in manual mode whose default VLAN is the Voice VLAN, any untagged packet can be transmitted in the Voice VLAN without an OUI check.
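The forwarding decision for an auto-mode port described above can be sketched as follows. This is our own simplified reading of the rules (a tag value of None stands for an untagged packet; the function and parameter names are illustrative).

```python
def auto_mode_vlan(tag, src_matches_oui, pvid, voice_vlan,
                   first_packet=False):
    """VLAN chosen by an auto-mode port, per the rules above."""
    if tag == voice_vlan:
        return voice_vlan        # tagged with the Voice VLAN: keep it
    if tag is None:
        if first_packet:
            return pvid          # the first untagged packet uses the PVID
        # later untagged packets follow the source-MAC OUI match
        return voice_vlan if src_matches_oui else pvid
    return tag                   # other tagged packets are unchanged

print(auto_mode_vlan(None, True, pvid=1, voice_vlan=200))   # 200
print(auto_mode_vlan(None, False, pvid=1, voice_vlan=200))  # 1
```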
Precautions
Voice VLAN has the following limitations:
Both Voice VLAN and MAC VLAN need to use the hardware resources of
MAC VLAN. When Voice VLAN and MAC VLAN are configured for one MAC
address, only the configuration of Voice VLAN takes effect.
Figure 1 The network diagram when host and IP phone are connected to
switch in series
The manual mode is suitable for the networking (as shown in Figure 2)
where the IP phone is connected to the switch separately (the ports
transmit only voice packets). Statically adding the port dedicates it to
voice traffic, minimizing the influence of service data on voice
transmission.
Neighbor Discovery
Technology
Main contents:
Introduction to NDP
Typical Application
Hello packets: these packets are the basis of maintaining the neighbor
relation. They encapsulate information about the sender for receivers to
learn and update.
Aging time: if the local device fails to receive hello packets from a
neighbor within the aging time, the neighbor is considered nonexistent
and is deleted from the neighbor list.
Introduction to NDSP
The NDSP protocol detects directly connected Maipu devices. NDSP
maintains the neighbor relation via hello messages (NDSP packets)
periodically sent between two directly connected devices. By default,
each Maipu device sends an NDSP packet to the connected peer at an
interval of 60 seconds. If no NDSP packet from the peer is received
within three hello periods (180 seconds, the holdtime or TTL), the local
device deletes the NDSP neighbor from the NDSP neighbor table.
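The hello/holdtime behavior above can be sketched as follows. This is an illustrative model of the timers described in the text, not device firmware; the table structure and function names are assumptions:

```python
# Minimal sketch of NDSP neighbor aging (timer values from the text).
HELLO_INTERVAL = 60              # seconds between hello (NDSP) packets
HOLDTIME = 3 * HELLO_INTERVAL    # 180 s: neighbor aged out after 3 lost hellos

neighbor_table = {}              # neighbor name -> time the last hello was received

def on_hello_received(neighbor: str, now: float) -> None:
    """Refresh the neighbor's timestamp whenever a hello packet arrives."""
    neighbor_table[neighbor] = now

def age_neighbors(now: float) -> None:
    """Delete every neighbor whose last hello is older than the holdtime."""
    for n, last_seen in list(neighbor_table.items()):
        if now - last_seen > HOLDTIME:
            del neighbor_table[n]

on_hello_received("SwitchB", now=0)
age_neighbors(now=120)               # within the holdtime: neighbor kept
print("SwitchB" in neighbor_table)   # True
age_neighbors(now=181)               # 181 s > 180 s holdtime: neighbor deleted
print("SwitchB" in neighbor_table)   # False
```

Note how `ndsp timer` and `ndsp holdtime` in the configuration tables below correspond to `HELLO_INTERVAL` and `HOLDTIME` here.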
Typical Application
Illustration
As shown in the preceding figure, two switches are connected through port
0/0/0.
Configuration of Switch A:
Command Description
SwitchA(config)#ndsp run Enable NDSP globally
SwitchA(config)#ndsp timer 30 Send hello packets of NDSP at an interval
of 30 seconds
SwitchA(config)#ndsp holdtime 150 Set the aging time of NDSP neighbor to
150 seconds
SwitchA(config)# port 0/0/0 Enter the port configuration mode.
SwitchA(config-port-0/0/0)#ndsp enable Enable NDSP on the port
Configuration of Switch B:
Command Description
SwitchB(config)#ndsp run Enable NDSP globally
SwitchB(config)#ndsp timer 35 Send hello packets of NDSP at an interval
of 35 seconds
SwitchB(config)#ndsp holdtime 160 Set the aging time of NDSP neighbor to
160 seconds
SwitchB(config)# port 0/0/0 Enter the port configuration mode.
SwitchB(config-port-0/0/0)#ndsp enable Enable NDSP on the port
MFF Technology
Main contents:
MFF technology
Typical application
MFF Technology
In the traditional Ethernet networking scheme, L2 isolation and L3
intercommunication between different client hosts are realized by
dividing VLANs on the switch. However, when many users need L2 isolation,
this occupies lots of VLAN resources. Meanwhile, to realize L3
intercommunication between clients, you need to assign a different IP
segment to each VLAN and configure an IP address on each VLAN interface.
Therefore, dividing too many VLANs reduces the efficiency of IP address
allocation.
MFF intercepts users' ARP request packets and, via the ARP pick-up
mechanism, replies with ARP responses carrying the gateway MAC address.
In this way, users are forced to send all traffic (including traffic
within one subnet) to the gateway, so the gateway can monitor the data
flow, preventing malicious attacks between users and ensuring the
security of the network deployment.
MFF Terms
Related terms:
AR (access router): the access router of the user terminal, or a switch
with L3 functions; usually the gateway of the subnet where the user is
located;
User port: a port directly connected to a network terminal user;
Network port: a port that connects to other network devices, such as an
access switch, aggregation switch, or gateway.
MFF principle:
1. Get the IP address and MAC address of the AR. In a DHCP environment,
get the AR's IP address via DHCP snooping and its MAC address via ARP;
in a static IP address environment, pre-configure the default IP address
of the AR and then get its MAC address via ARP.
2. Intercept users' ARP request packets and reply with the AR's MAC
address. In this way, the requesting host records the AR's MAC address in
its ARP entries for all other hosts. When an ARP request for a user host
is received from the AR, reply with that user host's MAC address to the
AR.
3. Filter uplink packets and drop all unicast packets except those whose
destination MAC address is the AR's. Because of viruses or other network
faults, unicast packets destined for other hosts' MAC addresses may be
received; these packets must be dropped.
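The steps above can be sketched as follows. This is a hedged illustration of the described behavior, not switch code; the addresses, table layout, and function names are assumptions for the example:

```python
from typing import Optional

# Illustrative sketch of the MFF behavior described above: answer user ARP
# requests with the AR (gateway) MAC, and drop uplink unicast frames not
# destined to the AR.
AR_IP = "10.1.1.254"
AR_MAC = "00:00:5e:00:01:01"     # assumed gateway MAC, learned via ARP

user_arp_table = {}              # user IP -> user MAC, picked up from user ARPs

def handle_user_arp_request(sender_ip: str, sender_mac: str,
                            target_ip: str) -> str:
    """Intercept a user's ARP request and return the MAC to answer with."""
    user_arp_table[sender_ip] = sender_mac   # pick up the user's ARP info
    # Always reply with the AR's MAC, even for hosts in the same subnet,
    # so all user traffic is forced through the gateway.
    return AR_MAC

def handle_ar_arp_request(target_ip: str) -> Optional[str]:
    """When the AR asks for a user host, reply with that host's real MAC."""
    return user_arp_table.get(target_ip)

def filter_uplink_unicast(dst_mac: str) -> bool:
    """Return True if the uplink frame is forwarded, False if dropped."""
    return dst_mac == AR_MAC

print(handle_user_arp_request("10.1.1.1", "aa:aa:aa:aa:aa:01", "10.1.1.2"))
print(filter_uplink_unicast("aa:aa:aa:aa:aa:02"))   # False: dropped
```

The asymmetry is the point: users only ever learn the AR's MAC, while the AR learns each user's real MAC.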
A VLAN in which MFF is enabled has two port roles: user port and network
port. These roles only restrict ingress packets.
1. The user port (the port connected to the user terminal device)
processes different packets as follows:
In a VLAN enabled with the MFF function, all ports are user ports by
default; network ports must be enabled via the command. The packet
restrictions of network ports and user ports apply only in VLANs with
MFF enabled. In a VLAN without the MFF function, user ports and network
ports do not have the above features.
To get the ARP information of the gateway and ensure the availability of
the gateway, the gateway detection function is enabled by default after
the MFF function is enabled in a VLAN. The user can forcibly disable the
gateway detection function via the command. Gateway detection relies on
the user's ARP information: when a user is connected, MFF intercepts the
user's ARP packets and uses that information to detect the gateway. If
the gateway is unavailable, the detection interval is 5 s; if the gateway
is available, the detection interval is 30 s by default. The user can
configure the detection interval (the configured interval takes effect
only when the gateway is available; when the gateway is unavailable, the
detection interval is fixed at 5 s).
After MFF learns the ARP of a connected user from a user port, the user
ARP aging function is enabled by default; you can disable it via the
command. By default, the aging interval is 300 s, and the user can
configure it. If the user's ARP is not received within four successive
aging intervals, the user is regarded as no longer existing and its ARP
information is deleted.
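The user ARP aging rule above can be sketched as follows. A minimal model assuming the default 300 s interval; the data structures and function names are illustrative:

```python
# Sketch of the MFF user ARP aging rule: the user's entry is removed after
# four successive aging intervals with no ARP from that user.
AGING_INTERVAL = 300             # seconds, configurable per the text
MAX_MISSED = 4                   # entry removed after 4 silent intervals

user_last_arp = {}               # user IP -> time its ARP was last seen

def on_user_arp(ip: str, now: float) -> None:
    """Refresh the user's timestamp whenever its ARP is received."""
    user_last_arp[ip] = now

def age_user_arps(now: float) -> None:
    """Delete users silent for MAX_MISSED * AGING_INTERVAL (1200 s)."""
    for ip, last in list(user_last_arp.items()):
        if now - last >= MAX_MISSED * AGING_INTERVAL:
            del user_last_arp[ip]          # user regarded as gone

on_user_arp("10.1.1.1", now=0)
age_user_arps(now=900)                 # 3 intervals of silence: entry kept
print("10.1.1.1" in user_last_arp)     # True
age_user_arps(now=1200)                # 4 intervals of silence: entry deleted
print("10.1.1.1" in user_last_arp)     # False
```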
Typical Application
As shown in the figure, Switch A and Switch B are the access devices of
the user terminals; Switch C is the aggregation device.
Host A, Host B, and Host C are the user hosts, which all belong to VLAN
10; their IP addresses are 10.1.1.1, 10.1.1.2, and 10.1.1.3. The MFF
function is enabled on the user-terminal access devices Switch A and
Switch B. When Host A wants to communicate with Host B, it sends an ARP
request for Host B's MAC address; Switch A intercepts the ARP request and
replies with the gateway's MAC address. As a result, Host A mistakes the
gateway's MAC address for Host B's and sends its data to the gateway.
When the gateway receives the data from Host A and finds that the
destination IP address is Host B's, it queries the route and forwards the
data to Host B. Similarly, data sent from Host B to Host A is forwarded
via the gateway.
Switch A configuration:
Command Description
SwitchA(config)#port 0/1-0/2 Enter port mode
SwitchA(config-port-range)#port access vlan 10 Add port 0/1,0/2 to VLAN 10
SwitchA(config-port-range)#port 0/3 Enter port 0/3
SwitchA(config-port-0/3)#port mode trunk Set port 0/3 as trunk port
SwitchA(config-port-0/3)#port trunk allowed vlan 10 Add port 0/3 to VLAN 10
SwitchA(config-port-0/3)#mac-forced-forwarding network-port Set port 0/3 as the network port
SwitchA(config-port-0/3)#exit Exit the port mode
SwitchA(config)#vlan 10 Enter the VLAN mode
SwitchA(config-vlan10)#mac-forced-forwarding default-gateway 10.1.1.254 Configure the default gateway of VLAN 10 as 10.1.1.254
Switch B configuration:
Command Description
SwitchB(config)#port 0/1 Enter port mode
SwitchB(config-port-0/1)#port access vlan 10 Add port 0/1 to VLAN 10
SwitchB(config-port-0/1)#port 0/2 Enter port 0/2
SwitchB(config-port-0/2)#port mode trunk Set port 0/2 as trunk port
SwitchB(config-port-0/2)#port trunk allowed vlan 10 Add port 0/2 to VLAN 10
PPPoE+ Technology
Main contents:
PPPoE+ principle
PPPoE+ Principle
With the popularity of IP-based network construction and the growing
variety of user services, carriers need to enhance their control over
user service data. Currently, IP DSLAM is the main DSL access device. The
upstream BAS cannot, or can only with difficulty, get user port
information from the Ethernet packet, so it cannot authenticate and
manage user ports in a unified manner or effectively prevent user
accounts from being stolen.
PPPoE+ is short for PPPoE Intermediate Agent. The scheme was first put
forward at the DSL Forum and is defined according to the RFC 3046 user
line ID field. The original idea of PPPoE+ is that, after receiving the
user's PPPoE PADI and PADR packets, the DSLAM adds a PPPoE+ tag
indicating the user's physical port number or PVC to the packet. After
identifying the PPPoE+ Tag, the upstream BRAS extracts the user's
physical location information and carries it to the Radius Server in the
Radius NAS-Port-ID attribute for user identification and user
management.
Figure 41-1
1. The user terminal initiates a PPPoE request and sends the PPPoE PADI
packet;
4. After receiving PADI+VSA, BRAS replies with a PADO packet to the user;
6. DSLAM captures the PADR packet and inserts the PPPoE+ Tag into it;
8. From this point, BRAS can process the PPP flow normally. After the PPP
negotiation completes, BRAS sends the PPPoE+ Tag to the IPTV service
system and the Radius Server via the Radius NAS-Port-ID.
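The tag insertion and extraction described above can be sketched as follows. This is a hedged illustration of the idea only: the packet is modeled as a plain dictionary, and the field names and circuit-id format are assumptions, not the actual PPPoE TLV encoding:

```python
from typing import Optional

# Sketch of the PPPoE+ idea: the DSLAM appends a tag carrying the user's
# physical port (or PVC) to PADI/PADR packets, and the BRAS extracts it to
# build the Radius NAS-Port-ID attribute.
def dslam_add_tag(packet: dict, slot: int, port: int) -> dict:
    """DSLAM side: insert the user-location tag into a PADI/PADR packet."""
    if packet["code"] in ("PADI", "PADR"):
        tagged = dict(packet)
        tagged["circuit_id"] = f"slot {slot} port {port}"  # user location info
        return tagged
    return packet                      # other PPPoE packets pass unchanged

def bras_extract_nas_port_id(packet: dict) -> Optional[str]:
    """BRAS side: read the tag to fill the Radius NAS-Port-ID value."""
    return packet.get("circuit_id")

padi = {"code": "PADI"}
tagged = dslam_add_tag(padi, slot=1, port=24)
print(bras_extract_nas_port_id(tagged))   # slot 1 port 24
```

Because the tag is added per physical port by the DSLAM, the Radius server can tie each session to a line, which is what prevents account theft across ports.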
Figure 41-2