Optical Switching Comprehensive Guide

Optics Switching Technology
P. Raatikainen
Switching Technology / 2005
L1 - 1
General
Lecturer: Pertti Raatikainen, research professor, VTT; email: pertti.raatikainen@vtt.fi
Exercises: Kari Seppänen, senior research scientist, VTT; email: kari.seppnen@vtt.fi
Information: http://www.netlab.hut.fi/opetus/s38165
Goals of the course

- Understand what switching is about
- Understand the basic structure and functions of a switching system
- Understand the role of a switching system in a transport network
- Understand how a switching system works
- Understand technology related to switching
- Understand how conventional circuit switching is related to packet switching
Course outline
Introduction to switching
- switching in general
- switching modes
- transport and switching
Switch fabrics
- basics of fabric architectures
- fabric structures
- path search, self-routing and sorting
Course outline (cont.)
Switch implementations
- PDH switches
- ATM switches
- routers
Optical switching
- basics of WDM technology
- components for optical switching
- optical switching concepts
Course requirements
Preliminary information
- S-38.188 Tietoliikenneverkot or S-72.423 Telecommunication Systems (or a corresponding course)
13 lectures (3 hours each) and 7 exercises (2 hours each)
Calculus exercises
Grading
- Calculus exercises: 0 to 6 bonus points, valid in exams in 2005
- Examination: max 30 points
Course material
- Lecture notes
- Understanding Telecommunications 1, Ericsson & Telia, Studentlitteratur, 2001, ISBN 91-44-00212-2, Chapters 2 - 4.
- J. Hui: Switching and Traffic Theory for Integrated Broadband Networks, Kluwer Academic Publ., 1990, ISBN 0-7923-9061-X, Chapters 1 - 6.
- H. J. Chao, C. H. Lam and E. Oki: Broadband Packet Switching Technologies: A Practical Guide to ATM Switches and IP Routers, John Wiley & Sons, 2001, ISBN 0-471-00454-5.
- T. E. Stern and K. Bala: Multiwavelength Optical Networks: A Layered Approach, Addison-Wesley, 1999, ISBN 0-201-30967-X.
Additional reading
- A. Pattavina: Switching Theory - Architecture and Performance in Broadband ATM Networks, John Wiley & Sons (Chichester), 1998, ISBN 0-471-96338-0, Chapters 2 - 4.
- R. Ramaswami and K. Sivarajan: Optical Networks: A Practical Perspective, Morgan Kaufmann Publ., 2nd Ed., 2002, ISBN 1-55860-655-6.
Schedule
Day    L/E  Topic
18.1.  L    Introduction to switching
25.1.  L    Transmission techniques and multiplexing
27.1.  E    Exercise 1
1.2.   L    Basic concepts of switch fabrics
8.2.   L    Multistage fabric architectures 1
10.2.  E    Exercise 2
15.2.  L    Multistage fabric architectures 2
22.2.  L    Self-routing and sorting networks
24.2.  E    Exercise 3
1.3.   L    Switch fabric implementations
8.3.   L    PDH switches
10.3.  E    Exercise 4
15.3.  L    ATM switches
17.3.  E    Exercise 5
22.3.  L    Routers
5.4.   L    Introduction to optical networks
7.4.   E    Exercise 6
12.4.  L    Optical network architectures
19.4.  L    Optical switches
21.4.  E    Exercise 7
http://www.netlab.hut.fi/opetus/s38165
Switching Technology S38.165
- Switching in general
- Switching modes
- Transport and switching
Switching in general
ITU-T specification for switching:
The establishing, on-demand, of an individual connection from a desired inlet to a desired outlet within a set of inlets and outlets for as long as is required for the transfer of information.
inlet/outlet = a line or a channel

Switching in general (cont.)
- Switching implies directing of information flows in communications networks based on known rules
- Switching takes place in specialized network nodes
- Data is switched on bit, octet, frame or packet level
- Size of a switched data unit is variable or fixed
Why switching?
- Switches reduce overall network cost by reducing the number and/or cost of the transmission links required for a given user population to communicate
- A limited number of physical connections implies a need to share transport resources, which means
  - better utilization of transport capacity
  - use of switching
- Switching systems are central components in communications networks
Full connectivity between hosts
Full mesh
- Number of links to/from a host = n - 1
- Total number of links = n(n - 1)/2
Centralized switching
- Number of links to/from a host = 1
- Total number of links = n
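The link counts for full mesh and centralized switching can be compared with a short calculation. This is a minimal sketch; the function names are illustrative and do not come from the lecture.

```python
def full_mesh_links(n: int) -> int:
    """Total links when every pair of n hosts has its own direct link."""
    return n * (n - 1) // 2


def centralized_links(n: int) -> int:
    """Total links when each of n hosts has one link to a central switch."""
    return n


# Print the two totals side by side for a few network sizes.
for n in (4, 10, 100):
    print(n, full_mesh_links(n), centralized_links(n))
```

The full-mesh count grows quadratically with the number of hosts while the centralized count grows linearly, which is the cost argument made under "Why switching?".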
Switching network to connect hosts

- Number of links to/from a host = 1
- Total number of links depends on the network topology used
Hierarchy of switching networks
Local switching network
To higher level of hierarchy
Long distance switching network
Sharing of link capacity
Space Division Multiplexing (SDM)

[Figure: inputs 1 ... n carried as channels CH 1 ... CH n over a physical link]
Space to be divided:
- physical cable or twisted pair
- frequency
- light wave
Sharing of link capacity (cont.)

Time Division Multiplexing (TDM)
Synchronous transfer mode (STM)
[Figure: inputs 1 ... n time-multiplexed onto one link]
Asynchronous transfer mode (ATM)
[Figure: inputs 1 ... n multiplexed onto one link as data units consisting of overhead and payload; unused capacity is marked idle]
Main building blocks of a switch

Switch control
- processing of signaling/connection control information
- configuration and control of input/output interfaces and switch fabric
Input interface cards (#1 ...)
- input signal reception
- error checking and recovery
- incoming frame disassembly
- buffering
- routing/switching decision
Switch fabric
- switching of data units from input interfaces to destined output interfaces
- limited buffering
Output interface cards (#1 ...)
- buffering, prioritizing and scheduling
- outgoing frame assembly
- output signal generation and transmission
Heterogeneity by switching
Switching systems allow heterogeneity among terminals
- terminals of different processing and transmission speeds supported
- terminals may implement different sets of functionality
and heterogeneity among transmission links by providing a variety of interface types
- data rates can vary
- different link layer framing applied
- optical and electrical interfaces
- variable line coding
Heterogeneity by switching (cont.)

Analog interface
Subscriber mux
ISDN (2B+D) or E1
E1 or E2
Remote subscriber switch
E1, E2 or E3
Basic types of switching networks

Statically switched networks
- connections established for longer periods of time (typically for months or years)
- management system used for connection manipulation
Dynamically switched networks
- connections established for short periods of time (typically from seconds to tens of minutes)
- active signaling needed to manipulate connections
Routing networks
- no connections established - no signaling
- each data unit routed individually through a network
- routing decision made dynamically or statically
Development of switching technologies

[Figure: timeline 1950 - 2020 of switching technologies, from oldest to newest: manual, step-by-step, crossbar switch, SPC analog switching, SPC digital switching, broadband electronic, broadband optical]
SPC - stored program control
Source: Understanding Telecommunications 1, Ericsson & Telia, Studentlitteratur, 2001.
Development of switching tech. (cont.)

Manual systems
- in the infancy of telephony, exchanges were built up with manually operated switching equipment (the first one in 1878 in New Haven, USA)
Electromechanical systems
- manual exchanges were replaced by automated electromechanical switching systems
- a patent for automated telephone exchange in 1889 (Almon B. Strowger)
- step-by-step selector controlled directly by dial of a telephone set
- developed later in the direction of register-controlled system in which number information is first received and analyzed in a register
- the register is used to select alternative switching paths (e.g. 500 line selector in 1923 and crossbar system in 1937)
- more efficient routing of traffic through transmission network
- increased traffic capacity at lower cost
Development of switching tech. (cont.)

Computer-controlled systems
- FDM was developed around 1910, but implemented in the 1950s (ca. 1000 channels transferred in a coaxial cable)
- PCM based digital multiplexing introduced in the 1970s
  - transmission quality improved
  - costs reduced further when digital group switches were combined with digital transmission systems
- computer control became necessary - the first computer controlled exchange put into service in 1960 (in USA)
- strong growth of data traffic resulted in development of separate data networks and switches
- advent of packet switching (sorting, routing and buffering)
- N-ISDN network combined telephone exchange and packet data switches
- ATM based cell switching formed basis for B-ISDN
- next step is to use optical switching with electronic switch control
- all-optical switching can be seen on the horizon
Roadmap of Finnish networking technologies

[Figure: timeline 1955 - 2000 of Finnish networking technologies, covering circuit switching and packet switching: automation of long distance telephony, digital transmission, digitalization of exchanges, ISDN, data networks, Arpanet ---> Internet technology, WWW, NMT-450, NMT-900, GSM, UMTS]
Challenges of modern switching

Support of different traffic profiles
- constant and variable bit rates, bursty traffic, etc.
Simultaneous switching of highly different data rates
- from kbit/s rates to Gbit/s rates
Support of varying delay requirements
- constant and variable delays
Scalability
- number of input/output links, link bit rates, etc.
Reliability
Cost
Throughput

Switching modes
Narrowband network evolution

- Early telephone systems used analog technology - frequency division multiplexing (FDM) and space division switching (SDS)
- When digital technology evolved, time division multiplexing (TDM) and time division switching (TDS) became possible
- Development of electronic components enabled integration of TDM and TDS => Integrated Digital Network (IDN)
- Different and segregated communications networks were developed
  - circuit switching for voice-only services
  - packet switching for (low-speed) data services
  - dedicated networks, e.g. for video and specialized data services
Segregated transport
[Figure: voice, data and video terminals attached through UNIs to three separate networks: circuit switching network, packet switching network and dedicated network]
Narrowband network evolution (cont.)

- The need for service integration became apparent, to better utilize communications resources => IDN developed to ISDN (Integrated Services Digital Network)
- ISDN offered
  - a unique user-network interface to support basic set of narrowband services
  - integrated transport and full digital access
  - inter-node signaling (based on packet switching)
  - packet and circuit switched end-to-end digital connections
  - three types of channels (B = 64 kbit/s, D = 16 kbit/s and H = n x 64 kbit/s)
- Three types of long-distance interconnections
  - circuit switched, packet switched and signaling connections
- Specialized services (such as video) continued to be supported by separate dedicated networks

Integrated transport
[Figure: voice and data terminals attached through UNIs to ISDN switches, which are interconnected by a signaling network, a circuit switching network and a packet switching network; data and video terminals remain attached to a separate dedicated network]
Broadband network evolution

- Progress in optical technologies enabled huge transport capacities => integration of transmission of all the different networks (NB and BB) became possible
- Switching nodes of different networks co-located to configure multifunctional switches
  - each type of traffic handled by its own switching module
- Multifunctional switches interconnected by broadband integrated transmission (BIT) systems terminated onto network-node interfaces (NNI)
- BIT accomplished with partially integrated access and segregated switching
Narrowband-integrated access and broadband-integrated transmission
[Figure: voice and data terminals attached through UNIs to ISDN switches, and data and video terminals attached through UNIs to ad-hoc switches; each multifunctional switch combines a signaling switch, circuit switch, packet switch and ad-hoc switch, and the multifunctional switches are interconnected through NNIs]
Broadband network evolution (cont.)
N-ISDN had some limitations:
- low bit rate channels
- no support for variable bit rates
- no support for large bandwidth services
Connection oriented packet switching scheme, i.e., ATM (Asynchronous Transfer Mode), was developed to overcome limitations of N-ISDN
=> B-ISDN concept
=> integrated broadband transport and switching (no more need for specialized switching modules or dedicated networks)
Broadband integrated transport
[Figure: voice, data and video terminals attached through UNIs to B-ISDN switches, which are interconnected through NNIs]
OSI definitions for routing and switching
Routing on L3
[Figure: end systems with layers L4, L3 and L2; intermediate nodes process L3 and L2]
Switching on L2
[Figure: end systems with layers L4, L3 and L2; forwarding in intermediate nodes takes place on L2]
Switching modes
- Circuit switching
- Cell and frame switching
- Packet switching
  - Routing
  - Layer 3 - 7 switching
  - Label switching
Circuit switching
- End-to-end circuit established for a connection
- Signaling used to set up, maintain and release circuits
- Circuit offers constant bit rate and constant transport delay
- Equal quality offered to all connections
- Transport capacity of a circuit cannot be shared
- Applied in conventional telecommunications networks (e.g. PDH/PCM and N-ISDN)
[Figure: Layer 1 protocol stacks at the network edges and in the switching nodes, with limited error detection between adjacent nodes]
Cell switching
- Virtual circuit (VC) established for a connection
- Data transported in fixed length frames (cells), which carry information needed for routing cells along established VCs
- Forwarding tables in network nodes
[Figure: network edges implement Layer 2 (H), Layer 2 (L) and Layer 1; switching nodes implement Layer 2 (L) and Layer 1. Error recovery & flow control at Layer 2 (H), error & congestion control at Layer 2 (L), limited error detection at Layer 1]
Cell switching (cont.)
- Signaling used to set up, maintain and release VCs as well as update forwarding tables
- VCs offer constant or variable bit rates and transport delay
- Transport capacity of links shared by a number of connections (statistical multiplexing)
- Different quality classes supported
- Applied, e.g. in ATM networks
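Forwarding along an established VC amounts to a table lookup in each node. The following is a minimal sketch, assuming a hypothetical table keyed by input port and VPI/VCI whose entries give the output port and the identifiers to write into the outgoing cell; the table contents and the function name are illustrative only.

```python
# Hypothetical forwarding table of one node, filled in by signaling:
# (input port, VPI, VCI) -> (output port, outgoing VPI, outgoing VCI)
FORWARDING_TABLE = {
    (1, 0, 32): (3, 5, 77),
    (2, 1, 40): (3, 5, 78),
}


def forward_cell(in_port, vpi, vci, payload):
    """Return (out_port, vpi, vci, payload), or None if no VC is set up."""
    entry = FORWARDING_TABLE.get((in_port, vpi, vci))
    if entry is None:
        return None  # no established VC for this cell: discard it
    out_port, out_vpi, out_vci = entry
    return out_port, out_vpi, out_vci, payload


print(forward_cell(1, 0, 32, b"cell payload"))
```

Because the lookup key is fixed in size and the table is populated at connection setup, the per-cell work stays small, which is what makes hardware forwarding practical.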
Frame switching
- Virtual circuits (VC) established usually for virtual LAN connections
- Data transported in variable length frames (e.g. Ethernet frames), which carry information needed for routing frames along established VCs
- Forwarding tables in network nodes
[Figure: network edges implement LLC, MAC and Layer 1; switching nodes implement MAC and Layer 1. Error recovery & flow control at LLC, error & congestion control at MAC, limited error detection at Layer 1]
Frame switching (cont.)

- VCs based, e.g., on 12-bit Ethernet VLAN IDs (Q-tag) or 48-bit MAC addresses
- Signaling used to set up, maintain and release VCs as well as update forwarding tables
- VCs offer constant or variable bit rates and transport delay
- Transport capacity of links shared by a number of connections (statistical multiplexing)
- Different quality classes supported
- Applied, e.g. in offering virtual LAN services for business customers
Packet switching
- No special transport path established for a connection
- Variable length data packets carry information used by network nodes in making forwarding decisions
- No signaling needed for connection setup
[Figure: network edges and switching nodes all implement Layer 3, Layer 2 and Layer 1. Routing & mux at Layer 3, error recovery & flow control at Layer 2]
Packet switching (cont.)
- Forwarding tables in network nodes are updated by routing protocols
- No guarantees for bit rate or transport delay
- Best effort service for all connections in conventional packet switched networks
- Transport capacity of links shared effectively
- Applied in IP (Internet Protocol) based networks
Layer 3 - 7 switching
- L3-switching evolved from the need to speed up (IP based) packet routing
- L3-switching separates routing and forwarding
- A communication path is established based on the first packet associated with a flow of data, and succeeding packets are switched along the path (i.e. software based routing combined with hardware based one)
- Notice: In wire-speed routing, traditional routing is implemented in hardware to eliminate performance bottlenecks associated with software based routing (i.e., conventional routing reaches/surpasses L3-switching speeds)
Layer 3 - 7 switching (cont.)

In L4 - L7 switching, forwarding decisions are based not only on MAC address of L2 and destination/source address of L3, but also on application port number of L4 (TCP/UDP) and on information of layers above L4
[Figure: protocol stacks for L4 - L7 switching, with Layers 1 - 7 at the network edges and Layers 1 - 4 in the switching nodes; functions shown: routing info, routing, flow control, error recovery & flow control]
Label switching
- Evolved from the need to speed up connectionless packet switching and utilize L2-switching in packet forwarding
- A label switched path (LSP) established for a connection
- Forwarding tables in network nodes
[Figure: protocol stacks for label switching, with Layers 1 - 3 at the network edges and Layer 2 and Layer 1 in the switching nodes; functions shown: routing, flow control, error recovery & flow control]
Label switching (cont.)

- Signaling used to set up, maintain and release LSPs
- A label is inserted in front of an L3 packet (behind the L2 frame header)
- Packets forwarded along established LSPs by using labels in L2 frames
- Quality of service supported
- Applied, e.g. in ATM, Ethernet and PPP
- Generalized label switching scheme (GMPLS) extends MPLS to be applied also in optical networks, i.e., enables light waves to be used as LSPs
Latest directions in switching

- The latest switching schemes have been developed to utilize Ethernet based transport
- Scalability of the basic Ethernet concept has been the major problem, i.e., the 12-bit limitation of the VLAN ID
- Modifications to the basic Ethernet frame structure have been proposed to extend Ethernet's addressing capability, e.g., Q-in-Q, Mac-in-Mac, Virtual MAN and Ethernet-over-MPLS
- Standardization bodies favor concepts (such as Q-in-Q and VMAN) that are backward compatible with the legacy Ethernet frame
- Signaling solutions still need further development
Transmission techniques and multiplexing hierarchies

- Transmission of data signals
- Timing and synchronization
- Transmission techniques and multiplexing
  - PDH
  - ATM
  - IP/Ethernet
  - SDH/SONET
  - OTN
  - GFP
Transmission of data signals

- Encapsulation of user data into a layered protocol structure
- Physical and link layers implement functions that are relevant to switching
  - multiplexing of transport signals (channels/connections)
  - medium access and flow control
  - error indication and recovery
  - bit, octet and frame level timing/synchronization
  - line coding (for spectrum manipulation and timing extraction)
Encapsulation of user data
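[Figure: user data is encapsulated layer by layer: transport layer header (TLH) + transport layer payload, network layer header (NLH) + network layer payload, link layer header (LLH) + link layer payload, physical layer header (PLH) + physical layer]
Link layer
- error coding/indication
- octet & frame synchronization
- addressing
- medium access & flow control
Physical layer
- line coding
- bit level timing
- physical signal generation/recovery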
Synchronization of transmitted data
- Successful transmission of data requires bit, octet, frame and packet level synchronism
- Synchronous systems (e.g. PDH and SDH) carry additional information (embedded into the transmitted line signal) for accurate recovery of clock signals
- Asynchronous systems (e.g. Ethernet) carry additional bit patterns to synchronize receiver logic
Timing accuracy
Inaccuracy of frequency is classified in telecom networks into
- jitter (short-term changes in frequency, > 10 Hz)
- wander (< 10 Hz fluctuation)
- long-term frequency shift (drift or skew)
To maintain required timing accuracy, network nodes are connected to a hierarchical synchronization network
- Coordinated Universal Time (UTC): error in the order of 10^-13
- Error of Primary Reference Clock (PRC) of the telecom network in the order of 10^-11
Timing accuracy (cont.)

Inaccuracy of clock frequency causes
- degraded quality of received signal
- bit errors in regeneration
- slips: in PDH networks a frame is duplicated or lost due to timing difference between the sender and receiver
Based on the applied synchronization method, networks are divided into
- fully synchronous networks (e.g. SDH)
- plesiochronous networks (e.g. PDH): sub-networks have nominally the same clock frequency but are not synchronized to each other
- mixed networks
Methods for bit level timing

- To obtain bit level synchronism, receiver clocks must be synchronized to the incoming signal
- The incoming signal must include transitions to keep the receiver's clock recovery circuitry in synchronism
- Methods to introduce line signal transitions
  - Line coding
  - Block coding
  - Scrambling
Line coding
[Figure: waveforms between +V and -V for the bit sequence 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 1, shown uncoded and with ADI, ADI RZ and AMI RZ coding]
ADI - Alternate Digit Inversion
ADI RZ - Alternate Digit Inversion Return to Zero
AMI RZ - Alternate Mark Inversion Return to Zero
Line coding (cont.)

- ADI, ADI RZ and similar codes introduce a DC balance shift => clock recovery becomes difficult
- AMI and AMI RZ provide DC balance, but cannot guarantee signal transitions (e.g. during long runs of zeros)
- HDB3 (High Density Bipolar 3) code, used in PDH systems, guarantees a signal transition at least every fourth bit
  - 0000 coded by 000V when there is an odd number of pulses since the last violation (V) pulse
  - 0000 coded by B00V when there is an even number of pulses since the last violation pulse
[Figure: example HDB3 waveform between +V and -V with a violation pulse V]
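The two HDB3 substitution rules can be sketched in a few lines. This is an illustrative encoder, not taken from the lecture: pulses are represented as +1, -1 and 0, and the starting conditions (first mark positive, pulse count starting at zero) are assumptions.

```python
def hdb3_encode(bits):
    """Encode a list of 0/1 bits into HDB3 pulses (+1, -1, 0)."""
    out = []
    last_pulse = -1      # assumed start, so that the first mark is +1
    pulses_since_v = 0   # marks sent since the last violation pulse
    zeros = 0            # length of the current run of zeros
    for b in bits:
        if b == 1:
            last_pulse = -last_pulse      # AMI rule: marks alternate
            out.append(last_pulse)
            pulses_since_v += 1
            zeros = 0
            continue
        out.append(0)
        zeros += 1
        if zeros == 4:
            if pulses_since_v % 2 == 1:
                # 000V: V repeats the polarity of the previous pulse
                out[-1] = last_pulse
            else:
                # B00V: B follows the AMI rule, V repeats B's polarity
                last_pulse = -last_pulse
                out[-4] = last_pulse
                out[-1] = last_pulse
            pulses_since_v = 0
            zeros = 0
    return out


print(hdb3_encode([1, 0, 0, 0, 0, 0, 0, 0, 0]))
```

Choosing between 000V and B00V by the parity of the pulse count makes successive violation pulses alternate in polarity, so the coded signal stays DC balanced.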
Line coding (cont.)

- When bit rates increase (> 100 Mbit/s), jitter requirements become tighter and signal transitions should occur more frequently than in HDB3 coding
- CMI (Coded Mark Inversion) coding was introduced for electronic differential links and for optical links
- CMI doubles the bit rate on the transmission link => higher bit rate implies larger bandwidth and shortened transmission distance
[Figure: example CMI waveform between +V and -V]
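The doubling of the line rate follows directly from the CMI coding rule, which the slides do not spell out: a binary 0 is sent as a low half-interval followed by a high one, and binary 1s are sent alternately as a full interval low and a full interval high. A minimal sketch, with line levels written as 0 (low) and 1 (high) and an assumed starting level for the first mark:

```python
def cmi_encode(bits):
    """Encode 0/1 bits into CMI half-interval levels (0 = low, 1 = high)."""
    out = []
    mark_level = 0  # assumption: the first binary 1 is sent as low-low
    for b in bits:
        if b == 0:
            out += [0, 1]                    # 0 -> low then high
        else:
            out += [mark_level, mark_level]  # 1 -> constant level
            mark_level ^= 1                  # next 1 uses the other level
    return out


print(cmi_encode([0, 1, 1, 0]))
```

Every input bit yields two line symbols, so the symbol rate on the link is twice the data rate.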
Block coding
- Entire blocks of n bits are replaced by other blocks of m bits (m > n)
- nBmB block codes are usually applied on optical links by using on-off keying
- Block coding adds variety of 1s and 0s to obtain better clock synchronism and reduced jitter
- Redundancy in block codes (in the form of extra combinations) enables error recovery to a certain extent
- When m > n, the coded line signal requires larger bandwidth than the original signal
- Examples: 4B5B (FDDI), 5B6B (E3 optical links) and 8B10B (GbE)
Coding examples
4B5B coding
Input word  Output word
0000        11110
0001        01001
0010        10100
0011        10101
0100        01010
0101        01011
0110        01110
0111        01111
1000        10010
1001        10011
1010        10110
1011        10111
1100        11010
1101        11011
1110        11100
1111        11101
Other output words
00000       Quiet line symbol
11111       Idle symbol
00100       Halt line symbol
11000       Start symbol
10001       Start symbol
01101       End symbol
00111       Reset symbol
11001       Set symbol
00001, 00010, 00011, 00101, 00110, 01000, 01100, 10000   Invalid
5B6B coding
Input word  Output word
00000       101011
00001       101010
00010       101001
00011       111000
...         ...
11100       010011
11101       010111
11110       011011
11111       011100
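The 4B5B data-word mapping above can be applied with a simple lookup. This is a minimal sketch using bit strings; the helper names are illustrative.

```python
# Data-word mapping copied from the 4B5B table.
FOUR_B_FIVE_B = {
    "0000": "11110", "0001": "01001", "0010": "10100", "0011": "10101",
    "0100": "01010", "0101": "01011", "0110": "01110", "0111": "01111",
    "1000": "10010", "1001": "10011", "1010": "10110", "1011": "10111",
    "1100": "11010", "1101": "11011", "1110": "11100", "1111": "11101",
}


def encode_4b5b(bits: str) -> str:
    """Replace each 4-bit input word by its 5-bit output word."""
    if len(bits) % 4 != 0:
        raise ValueError("input length must be a multiple of 4 bits")
    return "".join(FOUR_B_FIVE_B[bits[i:i + 4]]
                   for i in range(0, len(bits), 4))


def longest_zero_run(bits: str) -> int:
    """Length of the longest run of consecutive 0s in a bit string."""
    return max(len(run) for run in bits.split("1"))


coded = encode_4b5b("00000000")
print(coded, longest_zero_run(coded))
```

Even an all-zero input is coded with frequent transitions, at the cost of sending five line bits for every four data bits.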
Scrambling
- Data signal is changed bit by bit according to a separate repetitive sequence (to avoid long sequences of 1s or 0s)
- Steps of the sequence give information on how to handle bits in the signal being coded
- A scrambler consists of a feedback shift register described by a polynomial (x^N + ... + x^m + ... + x^k + ... + x + 1)
- Polynomial specifies from where in the shift register feedback is taken
- Output bit rate is the same as the input bit rate
- Scrambling is not as effective as line coding
Scrambler example
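SDH/STM-1 uses the polynomial x^7 + x^6 + 1
[Figure: scrambler and descrambler, each built from a seven-stage shift register (x^1 ... x^7) with a preset input and modulo-2 adders]
Scrambler: X_i = S_i ⊕ D_i
Descrambler: R_i = S_i ⊕ X_i = S_i ⊕ (S_i ⊕ D_i) = D_i
(D_i = data bit, S_i = scrambling sequence bit, X_i = transmitted bit, R_i = recovered bit, ⊕ = modulo-2 addition)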
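The scrambler and descrambler relations can be tried out with a small shift-register model. This is a sketch under stated assumptions: the register is preset to all ones (the slide shows a preset input but not its value), the sequence bit is taken from the x^7 stage, and the function names are illustrative.

```python
def scrambling_sequence(n_bits, preset=(1, 1, 1, 1, 1, 1, 1)):
    """Generate S_i from a 7-stage register for x^7 + x^6 + 1."""
    regs = list(preset)   # regs[0] is stage x^1, regs[6] is stage x^7
    seq = []
    for _ in range(n_bits):
        seq.append(regs[6])              # output taken from stage x^7
        feedback = regs[6] ^ regs[5]     # taps at x^7 and x^6
        regs = [feedback] + regs[:6]     # shift by one stage
    return seq


def scramble(data_bits):
    """X_i = S_i XOR D_i."""
    seq = scrambling_sequence(len(data_bits))
    return [s ^ d for s, d in zip(seq, data_bits)]


# Descrambling is the same operation: R_i = S_i XOR X_i = D_i.
descramble = scramble

data = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(scramble(data), descramble(scramble(data)) == data)
```

Because both ends preset their registers at the same point in the frame, the receiver regenerates the same S_i without any extra bits on the line, which is why the output bit rate equals the input bit rate.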
Methods for octet and frame level timing
- Frame alignment bit pattern
- Start of frame signal
- Use of frame check sequence
Frame alignment sequence

- Data frames carry special frame alignment bit patterns to obtain octet and frame level synchronism
- Data bits scrambled to avoid misalignment
- Used in networks that utilize synchronous transmission, e.g. in PDH, SDH and OTN
- Examples
  - PDH E1 frames carry bit sequence 0011011 in every other frame (even frames)
  - SDH and OTN frames carry a six octet alignment sequence (hexadecimal form: F6 F6 F6 28 28 28) in every frame
Start of frame signal

- Data frames carry special bit patterns to synchronize receiver logic
- False synchronism avoided for example by inserting additional bits into data streams
- Used in synchronous and asynchronous networks, e.g., Ethernet and HDLC
- Examples
  - Ethernet frames are preceded by a 7-octet preamble field (10101010) followed by a start-of-frame delimiter octet (10101011)
  - HDLC frames are preceded by a flag byte (0111 1110)
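Inserting additional bits to avoid false synchronism can be illustrated with HDLC-style bit stuffing, where a 0 is inserted after every five consecutive 1s so that the flag pattern 0111 1110 never appears inside the data. A minimal sketch with illustrative function names:

```python
FLAG = [0, 1, 1, 1, 1, 1, 1, 0]


def stuff(bits):
    """Insert a 0 after every run of five consecutive 1s."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == 5:
            out.append(0)   # stuffed bit
            ones = 0
    return out


def unstuff(bits):
    """Remove the 0 that follows every run of five consecutive 1s."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:            # this bit is a stuffed 0: drop it
            skip, ones = False, 0
            continue
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == 5:
            skip = True
    return out


payload = FLAG  # worst case: the data itself looks like a flag
print(stuff(payload), unstuff(stuff(payload)) == payload)
```

After stuffing, six consecutive 1s can only occur in a real flag, so the receiver can locate frame boundaries reliably.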
Frame check sequence

- Data frames carry no special bit patterns for synchronization
- Synchronization is based on the use of error indication and correction fields
  - CRC (Cyclic Redundancy Check) calculation
- Used in bit synchronous networks such as ATM and GFP (Generic Framing Procedure)
- Example
  - ATM cell streams can be synchronized to the HEC (Header Error Control) field, which is calculated across the ATM cell header
Transmission techniques
- PDH (Plesiochronous Digital Hierarchy)
- ATM (Asynchronous Transfer Mode)
- IP/Ethernet
- SDH (Synchronous Digital Hierarchy)
- OTN (Optical Transport Network)
- GFP (Generic Framing Procedure)
Plesiochronous Digital Hierarchy (PDH)

- Transmission technology of the digitized telecom network
- Basic channel capacity 64 kbit/s
- Voice information PCM coded
  - 8 bits per sample
  - A-law or µ-law
  - sample rate 8 kHz (125 µs)
- Channel associated signaling or common channel signaling (e.g. SS7)
- Higher order frames obtained by multiplexing four lower order frames bit by bit and adding some synchronization and management info
- The most common switching and transmission format in the telecommunication network is PCM 30 (E1)
Level  Bit rate        Channels  Multiplexing
E4     139.264 Mbit/s  1920      4 x E3
E3     34.368 Mbit/s   480       4 x E2
E2     8.448 Mbit/s    120       4 x E1
E1     2.048 Mbit/s    30        32 x E0
E0     64 kbit/s       1
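The rates in the hierarchy can be checked with simple arithmetic. The sketch below uses only the figures listed for the PDH levels; the variable and function names are illustrative.

```python
SAMPLE_RATE_HZ = 8000    # PCM sample rate
BITS_PER_SAMPLE = 8

E0 = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # 64 kbit/s basic channel
E1 = 32 * E0                            # 32 time-slots of 64 kbit/s

RATES = {"E1": 2_048_000, "E2": 8_448_000,
         "E3": 34_368_000, "E4": 139_264_000}


def mux_overhead(lower: str, higher: str) -> int:
    """Bit rate added on top of four multiplexed lower-order signals."""
    return RATES[higher] - 4 * RATES[lower]


print(E0, E1)
for lower, higher in (("E1", "E2"), ("E2", "E3"), ("E3", "E4")):
    print(higher, mux_overhead(lower, higher))
```

Each higher-order rate is slightly more than four times the lower one; the difference carries the synchronization, justification and management information added at each multiplexing step.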
PDH E1-frame structure (even frames)

[Figure: a multi-frame consists of frames F0 ... F15; each frame consists of time-slots T0 ... T31, with the frame alignment time-slot T0, the signaling time-slot T16 and voice channels in the other time-slots]
Frame alignment time-slot: C 0 0 1 1 0 1 1
- C: error indicator bit (CRC-4)
- 0011011: frame alignment signal (FAS)
Signaling time-slot: 0 0 0 0 1 A 1 1
- 0000: multi-frame alignment bit sequence in F0
- A: multi-frame alarm
Voice channel time-slot (e.g. voice channel 28): B1 B2 B3 B4 B5 B6 B7 B8
- B1: polarity
- B2 ... B8: voice sample amplitude
PDH E1-frame structure (odd frames)

[Figure: a multi-frame consists of frames F0 ... F15; each frame consists of time-slots T0 ... T31]
Frame alignment time-slot: C 1 A D D D D D
- C: error indicator bit (CRC-4)
- A: far end alarm indication
- D: data bits for management
Signaling time-slot: a b c d a b c d
- first a b c d: channel 1 signaling bits
- second a b c d: channel 16 signaling bits
Nowadays, time slot 1 used for signaling and time slot 16 for voice
PDH-multiplexing
- Tributaries have the same nominal bit rate, but with a specified, permitted deviation (about ±100 bit/s for 2.048 Mbit/s)
- Plesiochronous = tributaries have almost the same bit rate
- Justification and control bits are used in multiplexed flows
- First order (E1) is octet-interleaved, but higher orders (E2, ...) are bit-interleaved
PDH network elements

concentrator
- n channels are multiplexed to a higher capacity link that carries m channels (n > m)
multiplexer
- n channels are multiplexed to a higher capacity link that carries n channels
cross-connect
- static multiplexing/switching of user channels
switch
- switches incoming TDM/SDM channels to outgoing ones
Example PDH network elements

[Figure: example PDH network elements: concentrator (n input channels, m output channels, n > m), multiplexer (n input channels, m output channels, n = m), cross-connect (DXC) and switch]
Synchronous digital hierarchy

Level    Bit rate      Multiplexing
STM-256  40 Gbit/s     4 x STM-64
STM-64   10 Gbit/s     4 x STM-16
STM-16   2.488 Gbit/s  4 x STM-4
STM-4    622 Mbit/s    4 x STM-1
STM-1    155 Mbit/s
Major ITU-T SDH standards: G.707, G.783
Notice that each frame is transmitted in 125 µs!
SDH reference model

[Figure: tributaries - MPX - STM-n - DXC - STM-n - R - STM-n - R - STM-n - MPX - tributaries, with regeneration sections, multiplexing sections and the end-to-end path layer connection marked]
DXC - Digital cross-connect
MPX - Multiplexer
R - Repeater
SDH-multiplexing
- Multiplexing hierarchy for plesiochronous and synchronous tributaries (e.g. E1 and E3)
- Octet-interleaving, no justification bits - tributaries visible and available in the multiplexed SDH flow
- SDH hierarchy divided into two groups:
  - multiplexing level (virtual containers, VCs)
  - line signal level (synchronous transport level, STM)
- Tributaries from E1 (2.048 Mbit/s) to E4 (139.264 Mbit/s) are synchronized (using justification bits if needed) and packed in containers of standardized size
- Control and supervisory information (POH, path overhead) added to containers => virtual container (VC)
SDH-multiplexing (cont.)
- Different sized VCs for different tributaries (e.g. VC-12/E1, VC-3/E3, VC-4/E4)
- Smaller VCs can be packed into a larger VC (+ new POH)
- Section overhead (SOH) added to larger VC => transport module
- Transport module corresponds to line signal (bit flow transferred on the medium)
  - bit rate is 155.52 Mbit/s or its multiples
  - transport modules called STM-N (N = 1, 4, 16, 64, ...)
  - bit rate of STM-N is N x 155.52 Mbit/s
  - duration of a module is 125 µs (= duration of a PDH frame)
SDH network elements

regenerator (intermediate repeater, IR)
- regenerates line signal and may send or receive data via communication channels in RSOH header fields
multiplexer
- terminal multiplexer multiplexes/demultiplexes PDH and SDH tributaries to/from a common STM-n
- add-drop multiplexer adds or drops tributaries to/from a common STM-n
digital cross-connect
- used for rearrangement of connections to meet variations of capacity or for protection switching
- connections set up and released by operator
Example SDH network elements

[Figure: example SDH network elements: cross-connect (DXC) interconnecting several STM-n lines; add-drop multiplexers (ADM) on an STM-n line adding and dropping 2 - 140 Mbit/s tributaries; terminal multiplexer terminating an STM-n line]
Generation of STM-1 frame
[Figure: PDH/E1 -> justification -> VC-12 (+ POH) -> MUX -> VC-4 (+ POH) -> STM-1 (+ SOH)]
STM-n frame
Three main fields:
- Regeneration and multiplexer section overhead (RSOH and MSOH)
- Payload and path overhead (POH)
- AU (administrative unit) pointer specifies where payload (VC-4 or VC-3) starts
[Figure: STM-n frame: n x 9 octet columns of overhead (RSOH in 3 rows, AU-4 PTR in 1 row, then MSOH) followed by n x 261 octet columns of payload including POH]
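The STM-N line rate follows from the frame dimensions and the 125 µs frame duration. A minimal sketch, assuming the standard 9-row frame (the row count is not stated explicitly on the slide) and using illustrative names:

```python
ROWS = 9                  # assumption: rows per STM frame
OVERHEAD_COLUMNS = 9      # per STM-1: RSOH, AU pointer, MSOH
PAYLOAD_COLUMNS = 261     # per STM-1: payload including POH
FRAMES_PER_SECOND = 8000  # one frame every 125 microseconds


def stm_bit_rate(n: int) -> int:
    """Line bit rate of STM-n in bit/s."""
    columns = n * (OVERHEAD_COLUMNS + PAYLOAD_COLUMNS)
    return ROWS * columns * 8 * FRAMES_PER_SECOND


def payload_bit_rate(n: int) -> int:
    """Bit rate of the payload area (including POH) of STM-n in bit/s."""
    return ROWS * n * PAYLOAD_COLUMNS * 8 * FRAMES_PER_SECOND


for n in (1, 4, 16, 64, 256):
    print(f"STM-{n}: {stm_bit_rate(n) / 1e6:.2f} Mbit/s")
```

The overhead columns take 9 of every 270 columns, so about 3.3 % of the line rate is used for section overhead and the AU pointer.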
Synchronization of payload
- Position of each octet in an STM frame (or VC frame) has a number
- AU pointer contains position number of the octet in which VC starts
- Lower order VC included as part of a higher order VC (e.g. VC-12 as part of VC-4)
[Figure: consecutive frames STM-1 no. k and STM-1 no. k+1, each with RSOH, AU-4 PTR and MSOH; VC-4 no. 0, no. 1 and no. 2 start at the positions given by the AU-4 pointers]
ATM concept in summary

cell
53 octets
routing/switching
based on VPI and VCI
adaptation
processing of user data into ATM cells
error control
cell header checking and discarding
flow control
no flow control; input rate control is used instead
congestion control
cell discarded (two priorities)
ATM protocol reference model

Layer    Sublayer   Functions
AAL      CS         Convergence sublayer
AAL      SAR        Segmentation and reassembly
ATM      -          Generic flow control
ATM      -          VPI/VCI translation
ATM      -          Multiplexing and demultiplexing of cells
Phys     TC         Cell rate decoupling
Phys     TC         HEC header sequence generation/verification
Phys     TC         Cell delineation
Phys     TC         Transmission frame adaptation
Phys     TC         Transmission frame generation/recovery
Phys     PM         Timing
Phys     PM         Physical medium
Reference interfaces
[Figure: reference interfaces: TE - UNI - EX - NNI - EX - UNI - TE, with the exchanges forming the ATM network]
NNI - Network-to-Network Interface
UNI - User Network Interface
EX - Exchange Equipment
TE - Terminal Equipment
ATM cell structure

[Figure: ATM cell = ATM header (5 octets) + cell payload (48 octets)]
ATM header for UNI: GFC, VPI, VCI, PTI, CLP, HEC
ATM header for NNI: VPI, VCI, PTI, CLP, HEC
UNI - User Network Interface
NNI - Network-to-Network Interface
VPI - Virtual Path Identifier
VCI - Virtual Channel Identifier
GFC - Generic Flow Control
PTI - Payload Type Identifier
CLP - Cell Loss Priority
HEC - Header Error Control
HEC is the remainder of the polynomial division $x^8 \cdot (\text{header octets 1 to 4}) / (x^8 + x^2 + x + 1)$
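The HEC division above can be sketched as a bitwise CRC-8. This is a minimal illustration, not a reference implementation: the function name is hypothetical, and the final XOR with 01010101 is the coset addition specified in ITU-T I.432, which the slide does not mention.

```python
GENERATOR = 0x07   # x^8 + x^2 + x + 1 without the leading x^8 term
COSET = 0x55       # 01010101, added per ITU-T I.432 (not shown on the slide)


def atm_hec(header4: bytes) -> int:
    """Compute the HEC octet over the first four header octets."""
    if len(header4) != 4:
        raise ValueError("expected exactly 4 header octets")
    crc = 0
    for octet in header4:
        crc ^= octet
        for _ in range(8):
            # Shift left; subtract (XOR) the generator when the top bit falls out.
            if crc & 0x80:
                crc = ((crc << 1) ^ GENERATOR) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ COSET


# Header of an idle cell: 00000000 00000000 00000000 00000001
print(hex(atm_hec(bytes([0x00, 0x00, 0x00, 0x01]))))
```

A receiver can run the same function over the first four octets and compare the result with the fifth octet to detect header errors.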
ATM connection types
[Figure: a physical channel carrying virtual paths VPI 1 and VPI 2, each of which contains virtual channels VCI 1 and VCI 2]
VCI k - Virtual Channel Identifier k
VPI k - Virtual Path Identifier k
Physical layers for ATM

SDH (Synchronous Digital Hierarchy)
- STM-1 155 Mbit/s
- STM-4 622 Mbit/s
- STM-16 2.4 Gbit/s
PDH (Plesiochronous Digital Hierarchy)
- E1 2 Mbit/s
- E3 34 Mbit/s
- E4 140 Mbit/s
TAXI 100 Mbit/s and IBM 25 Mbit/s
Cell based interface
- uses standard bit rates and physical level interfaces (e.g. E1, STM-1 or STM-4)
- HEC used for framing
Transport of data in ATM cells
[Figure: transport of data in ATM cells
 Network layer: IP packet (up to 65 535 octets)
 ATM adaptation layer (AAL) 5: AAL 5 payload + pad P (0 - 47 octets) + UU/CPI/LEN ((1+1+2) octets) + CRC (4 octets)
 ATM layer: the AAL 5 frame is cut into 48-octet cell payloads, each preceded by a cell header H
 Physical layer]
P - Padding octets
UU - AAL layer user-to-user indicator
CPI - Common part indicator
LEN - Length indicator
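The AAL 5 padding rule can be sketched as follows: the pad is chosen so that payload, pad and the 8-octet trailer (UU, CPI, LEN and CRC) together fill a whole number of 48-octet cell payloads. The function name is hypothetical.

```python
TRAILER_OCTETS = 1 + 1 + 2 + 4   # UU + CPI + LEN + CRC
CELL_PAYLOAD_OCTETS = 48


def aal5_segmentation(packet_octets: int) -> tuple[int, int]:
    """Return (pad octets, number of ATM cells) for one AAL 5 frame."""
    if not 0 <= packet_octets <= 65535:
        raise ValueError("AAL 5 payload must be 0 - 65 535 octets")
    pad = -(packet_octets + TRAILER_OCTETS) % CELL_PAYLOAD_OCTETS
    cells = (packet_octets + pad + TRAILER_OCTETS) // CELL_PAYLOAD_OCTETS
    return pad, cells


for size in (40, 41, 1500):
    pad, cells = aal5_segmentation(size)
    print(f"{size} octet packet: pad {pad}, {cells} cells")
```

A 40-octet packet fits exactly into one cell, while a 41-octet packet needs 47 pad octets and a second cell, which is the worst case for the padding overhead.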
ATM cell encapsulation / SDH

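[Figure: ATM cell encapsulation in SDH: an STM-1 frame with 9 octet columns of SOH and AU-4 PTR followed by 261 octet columns holding the VC-4 frame; the VC-4 POH column (J1, B3, C2, G1, F2, H4, Z3, Z4, Z5) is followed by a continuous stream of ATM cells]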
ATM cell encapsulation / PDH (E1)

[Figure: consecutive 32-octet E1 frames; ATM cell headers and payload octets are carried in the time-slots between TS0 and TS16]
TS0
- frame alignment
- F3 OAM functions: loss of frame alignment, performance monitoring, transmission of FERF and LOC, performance reporting
TS16 reserved for signaling
Cell based interface

Frame structure for cell base interfaces:
[Figure: cell stream in which ATM layer cells 1 ... 26 (each with a header H) are followed by cell 27, a physical layer cell (IDLE or PL-OAM)]
PL cells processed on physical layer (not on ATM layer)
IDLE cell for cell rate adaptation
PL-OAM cells carry physical level OAM information (regenerator (F1) and transmission path (F3) level messages)
PL cell identified by a pre-defined header
- 00000000 00000000 00000000 00000001 (IDLE cell)
- 00000000 00000000 00000000 00001001 (phys. layer OAM)
- xxxx0000 00000000 00000000 0000xxxx (reserved for phys. layer)
H = ATM cell Header, PL = Physical Layer, OAM = Operation Administration and Maintenance
ATM network elements

Cross-connect
- switching of virtual paths (VPs)
- VP paths are statically connected
Switch
- switching of virtual channels (VCs)
- VC paths are dynamically or statically connected
DSLAM (Digital Subscriber Line Access Multiplexer)
- concentrates a larger number of subscriber lines to a common higher capacity link
- aggregated capacity of subscriber lines surpasses that of the common link
Ethernet
Originally a link layer protocol for LANs (10 and 100 MbE)
Upgrade of link speeds => optical versions 1GbE and 10 GbE => suggested for long haul transmission
No connections - each data terminal (DTE) sends data when ready - MAC is based on CSMA/CD
Synchronization
- line coding, preamble pattern and start-of-frame delimiter
- Manchester code for 10 MbE, 8B6T for 100 MbE, 8B10B for GbE
Ethernet frame
Frame length 64 - 1518 octets (from DA to CRC); field lengths in octets:
Preamble (7) | SFD (1) | DA (6) | SA (6) | T/L (2) | Payload (46 - 1500) | CRC (4)
Preamble - AA AA AA AA AA AA AA (Hex)
SFD - Start of Frame Delimiter AB (Hex)
DA - Destination Address
SA - Source Address
T/L - Type (RFC894, Ethernet) or Length (RFC1042, IEEE 802.3) indicator
CRC - Cyclic Redundancy Check
Inter-frame gap 12 octets (9.6 µs / 10 MbE)
1GbE frame
Frame length 512 - 1518 octets (including extension):
Preamble | SFD | DA | SA | T/L | Payload (46 - 1500) | CRC | Extension
Preamble - AA AA AA AA AA AA AA (Hex)
SFD - Start of Frame Delimiter AB (Hex)
DA - Destination Address
SA - Source Address
T/L - Type (RFC894, Ethernet) or Length (RFC1042, IEEE 802.3) indicator
CRC - Cyclic Redundancy Check
Inter-frame gap 12 octets (96 ns / 1 GbE)
Extension - for padding short frames to be 512 octets long
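The inter-frame gap durations and the 1GbE extension length quoted above can be checked with a few lines. This is a small sketch with hypothetical function names; it assumes the gap is always 12 octets and that short 1GbE frames are extended to 512 octets, as stated on the slides.

```python
def interframe_gap_seconds(bit_rate_bps: float, gap_octets: int = 12) -> float:
    """Duration of the inter-frame gap at a given line rate."""
    return gap_octets * 8 / bit_rate_bps


def gbe_extension_octets(frame_octets: int, minimum_octets: int = 512) -> int:
    """Extension octets appended to a short 1GbE frame."""
    return max(0, minimum_octets - frame_octets)


print(interframe_gap_seconds(10e6))   # 10 MbE: about 9.6 µs
print(interframe_gap_seconds(1e9))    # 1 GbE: about 96 ns
print(gbe_extension_octets(64))       # shortest frame needs 448 extension octets
```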
Ethernet network elements

Repeater
- interconnects LAN segments on physical layer
- regenerates all signals received from one segment and forwards them onto the next
Bridge
- interconnects LAN segments on link layer (MAC)
- all received frames are buffered and error free ones are forwarded to another segment (if they are addressed to it)
Hub and switch
- hub connects DTEs with two twisted pair links in a star topology and repeats received signal from any input to all output links
- switch is an intelligent hub, which learns MAC addresses of DTEs and is capable of directing received frames only to addressed ports
Optical transport network

Optical Transport Network (OTN), being developed by ITU-T (G.709), specifies interfaces for optical networks
Goal is to cater for the transmission needs of today's wide range of digital services and to assist network evolution to higher bandwidths and improved network performance
OTN builds on SDH and introduces some refinements:
- management of optical channels in optical domain
- FEC to improve error performance and allow longer link spans
- provides means to manage optical channels end-to-end in optical domain (i.e. no O/E/O conversions)
- interconnections scale from a single wavelength to multiple ones
OTN reference model

[Figure: OTN reference model: optical channels enter an OMPX, pass through a chain of OAs and leave through an OMPX; each span between adjacent elements is an OTS, the OMPX-to-OMPX section is an OMS and the end-to-end connection is an OCh]
OCh - Optical Channel
OA - Optical Amplifier
OMS - Optical Multiplexing Section
OMPX - Optical Multiplexer
OTS - Optical Transport Section
OTN layers and OCh sub-layers
[Figure: OTN layers: client signals (SONET/SDH, ATM, Ethernet, IP) on top of the optical channel layer, which rests on the optical multiplexing section (OMSn) and the optical transport section (OTSn)]
Optical channel sub-layers:
- OPU - Optical channel payload unit
- ODU - Optical channel data unit
- OTU - Optical channel transport unit
OTN frame structure

Three main fields
- Optical channel overhead
- Payload
- Forward error correction (FEC) field
[Figure: digital wrapper: client signals (GbE, IP, ATM/FR, SONET/SDH) are wrapped into an OCh frame consisting of OCh overhead, payload and FEC, and carried over DWDM]
OTN frame structure (cont.)

[Figure: OTN frame of 4 rows x 4080 bytes: columns 1 - 16 OCh overhead, columns 17 - 3824 payload, columns 3825 - 4080 FEC. Within the overhead, columns 1 - 14 carry frame alignment and OTU overhead (row 1) and ODU overhead (rows 2 - 4); columns 15 - 16 carry OPU overhead]
OTU - Optical transport unit
ODU - Optical data unit
OPU - Optical payload unit
FEC - Forward error correction
Frame size remains the same (4 x 4080 bytes) regardless of line rate => frame rate increases as line rate increases
Three line rates defined:
- OTU1 2.666 Gbit/s
- OTU2 10.709 Gbit/s
- OTU3 43.018 Gbit/s
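Because the frame size is fixed, the frame period shrinks as the line rate grows. The sketch below illustrates this with the rounded nominal G.709 line rates; the dictionary and function names are hypothetical.

```python
FRAME_BITS = 4 * 4080 * 8   # 4 rows x 4080 bytes

# Rounded nominal line rates in bit/s (assumed values for illustration).
LINE_RATES = {"OTU1": 2.666e9, "OTU2": 10.709e9, "OTU3": 43.018e9}


def frame_period_us(line_rate_bps: float) -> float:
    """Time needed to send one OTN frame, in microseconds."""
    return FRAME_BITS / line_rate_bps * 1e6


for name, rate in LINE_RATES.items():
    period = frame_period_us(rate)
    print(f"{name}: {period:.2f} µs per frame, {1e6 / period:.0f} frames/s")
```

Unlike SDH, where every STM-N frame lasts 125 µs, an OTN frame lasts roughly 49 µs at OTU1 and about 3 µs at OTU3.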
Generation of OTN frame and signal

OTN frame generation
[Figure: client signal + OPU-OH => OPU; OPU + ODU-OH => ODU; ODU + OTU-OH + FEC => OTU]
OTN signal generation
[Figure: client signals => OCh => OMUX => OMS => OTS]
OTN network elements

optical amplifier
- amplifies optical line signal
optical multiplexer
- multiplexes optical wavelengths to OMS signal
- add-drop multiplexer adds or drops wavelengths to/from a common OMS
optical cross-connect
- used to direct optical wavelengths (channels) from an OMS to another
- connections set up and released by operator
optical switches ?
- when technology becomes available optical switches will be used for switching of data packets in the optical domain
Generic Framing Procedure (GFP)

Recently standardized traffic adaptation mechanism especially for transporting block-coded and packet-oriented data
Standardized by ITU-T (G.7041) and ANSI (T1.105.02) (the only standard supported by both organizations)
Developed to overcome data transport inefficiencies of existing ATM, POS, etc. technologies
Operates over byte-synchronous communications channels (e.g. SDH/SONET and OTN)
Supports both fixed and variable length data frames
Generalizes error-control-based frame delineation scheme (successfully employed in ATM)
- relies on payload length and error control check for frame boundary delineation
GFP (cont.)
Two frame types: client and control frames
- client frames include client data frames and client management frames
- control frames used for OAM purposes
Multiple transport modes (coexistent in the same channel) possible
- Frame-mapped GFP for packet data (e.g. PPP, IP, MPLS and Ethernet)
- Transparent-mapped GFP for delay sensitive traffic (storage area networks), e.g. Fiber Channel, FICON and ESCON
GFP frame types
GFP frames
- Client frames
  - Client data frames
  - Client management frames
- Control frames
  - Idle frames
  - OA&M frames
GFP client data frame

Composed of a frame header and payload
Core header intended for data link management
- payload length indicator (PLI, 2 octets), HEC (CRC-16, 2 octets)
Payload field divided into payload header, payload and optional FCS (CRC-32) sub-fields
Payload header includes:
- payload type (2 octets) and type HEC (2 octets) sub-fields
- optional 0 - 60 octets of extension header
Payload:
- variable length (0 - 65 535 octets, including payload header and FCS) for frame mapping mode (GFP-F) - frame multiplexing
- fixed size N x [536, 520] for transparent mapping mode (GFP-T) - no frame multiplexing
GFP frame structure

[Figure: GFP frame structure
 Core header: payload length indicator, core HEC
 Payload area: payload header, payload (N x [536, 520] bytes or variable length packet), payload FCS
 Payload header: payload type (PTI, PFI, EXI, UPI), type HEC, optional 0 - 60 bytes extension header (CID, spare, extension HEC MSB, extension HEC LSB)]
CID - Channel identifier
FCS - Frame Check Sequence
EXI - Extension Header Identifier
HEC - Header Error Check
PFI - Payload FCS Indicator
PTI - Payload Type Indicator
UPI - User payload Identifier
Source: IEEE Communications Magazine, May 2002
GFP relationship to client signals and transport paths

[Figure: client signals (Ethernet, IP/PPP, MAPOS, RPR, Fiber Channel, ESCON, FICON and other client signals) enter the GFP client-dependent part in frame mapped or transparent mapped mode; the GFP client-independent part maps the result onto an SDH/SONET path or an OTN ODUk path]
ESCON - Enterprise System CONnection
FICON - Fiber CONnection
IP/PPP - IP over Point-to-Point Protocol
MAPOS - Multiple Access Protocol over SONET/SDH
RPR - Resilient Packet Ring
Source: IEEE Communications Magazine, May 2002
Adapting traffic via GFP-F and GFP-T
GFP-F frame:
PLI (2 bytes) | cHEC (2 bytes) | Payload header (4 bytes) | Client PDU (PPP, IP, Ethernet, RPR, etc.) | FCS (optional, 4 bytes)
GFP-T frame:
PLI (2 bytes) | cHEC (2 bytes) | Payload header (4 bytes) | 8x64B/65B superblocks #1, #2, ..., #N-1, #N | FCS (optional, 4 bytes)
FCS - Frame Check Sequence
cHEC - Core Header Error Control
PDU - Packet Data Unit
PLI - Payload Length Indicator
GFP-T frame mapping

[Figure: GFP-T frame mapping: eight 8B units form one 64B/65B code block; 8 x 64B/65B code blocks plus a CRC-16 form a superblock; the example GFP-T frame consists of core header and payload header, five superblocks and an optional FCS]
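The fixed payload size N x [536, 520] quoted for GFP-T can be reproduced from the superblock structure, assuming the two figures are bit counts with and without the superblock CRC-16. That reading is an interpretation rather than something the slides state, and the names below are hypothetical.

```python
CODE_BLOCK_BITS = 65        # one 64B/65B code block
BLOCKS_PER_SUPERBLOCK = 8
SUPERBLOCK_CRC_BITS = 16


def superblock_bits(with_crc: bool = True) -> int:
    """Size of one GFP-T superblock in bits."""
    bits = BLOCKS_PER_SUPERBLOCK * CODE_BLOCK_BITS
    return bits + SUPERBLOCK_CRC_BITS if with_crc else bits


def gfp_t_payload_bits(superblocks: int) -> int:
    """Payload size of a GFP-T frame carrying the given number of superblocks."""
    return superblocks * superblock_bits()


print(superblock_bits(False), superblock_bits(True))   # 520 536
print(gfp_t_payload_bits(5) // 8, "octets for five superblocks")
```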
Switch Fabrics
Switch fabrics
- Basic concepts
- Time and space switching
- Two stage switches
- Three stage switches
- Cost criteria
- Multi-stage switches and path search
Switch fabrics
- Multi-point switching
- Self-routing networks
- Sorting networks
- Fabric implementation technologies
- Fault tolerance and reliability
Basic concepts
- Accessibility
- Blocking
- Complexity
- Scalability
- Reliability
- Throughput
Accessibility
A network has full accessibility when each inlet can be connected to each outlet (in case there are no other I/O connections in the network)
A network has limited accessibility when the above property does not hold
Interconnection networks applied in today's switch fabrics usually have full accessibility
Blocking
Blocking is defined as failure to satisfy a connection requirement and it depends strongly on the combinatorial properties of the switching networks
Network class    Network type                  Network state
Non-blocking     Strict-sense non-blocking     Without blocking states
Non-blocking     Wide-sense non-blocking       With blocking state
Non-blocking     Rearrangeably non-blocking    With blocking state
Blocking         Others                        With blocking state
Blocking (cont.)
Non-blocking - a path between an arbitrary idle inlet and arbitrary idle outlet can always be established independent of network state at set-up time
Blocking - a path between an arbitrary idle inlet and arbitrary idle outlet cannot be established owing to internal congestion due to the already established connections
Strict-sense non-blocking - a path can always be set up between any idle inlet and any idle outlet without disturbing paths already set up
Wide-sense non-blocking - a path can be set up between any idle inlet and any idle outlet without disturbing existing connections, provided that certain rules are followed. These rules prevent the network from entering a state in which new connections cannot be made
Rearrangeably non-blocking - when establishing a path between an idle inlet and an idle outlet, paths of existing connections may have to be changed (rearranged) to set up that connection
Complexity
Complexity of an interconnection network is expressed by cost index
Traditional definition of cost index gives the number of cross-points in a network
- used to be a reasonable measure of space division switching systems
Nowadays cost index alone does not characterize cost of an interconnection network for broadband applications
- VLSIs and their integration degree have changed the way the cost of a switch fabric is formed (number of ICs, power consumption)
- management and control of a switching system make a significant contribution to cost
Scalability
Due to constant increase of transport links and data rates on links, scalability of a switching system has become a key parameter in choosing a switch fabric architecture
Scalability describes ability of a system to evolve with increasing requirements
Issues that are usually a matter of scalability
- number of switching nodes
- number of interconnection links between nodes
- bandwidth of interconnection links and inlets/outlets
- throughput of switch fabric
- buffering requirements
- number of inlets/outlets supported by switch fabric
Reliability
Reliability and fault tolerance are system measures that have an impact on all functions of a switching system
Reliability defines probability that a system does not fail within a given time interval provided that it functions correctly at the start of the interval
Availability defines probability that a system will function at a given time instant
Fault tolerance is the capability of a system to continue its intended function in spite of having one or more faults
Reliability measures:
- MTTF (Mean Time To Failure)
- MTTR (Mean Time To Repair)
- MTBF (Mean Time Between Failures)
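The reliability measures are tied together by simple relations. The sketch below uses the common textbook definitions, which the slides do not spell out: MTBF = MTTF + MTTR, steady-state availability = MTTF / (MTTF + MTTR), and reliability under the assumption of a constant failure rate. All function names are hypothetical.

```python
import math


def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures = time to failure + time to repair."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state probability that the system is functioning."""
    return mttf_hours / (mttf_hours + mttr_hours)


def reliability(interval_hours: float, mttf_hours: float) -> float:
    """Probability of no failure within the interval (constant failure rate assumed)."""
    return math.exp(-interval_hours / mttf_hours)


# Example figures chosen only for illustration.
print(availability(9999.0, 1.0))      # 0.9999
print(reliability(1000.0, 9999.0))    # about 0.905
```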
Throughput
Throughput gives forwarding/switching speed/efficiency of a switch fabric
It is measured in bits/s, octets/s, cells/s, packets/s, etc.
Quite often throughput is given in the range (0 ... 1.0], i.e. the obtained forwarding speed is normalized to the theoretical maximum throughput
Switching mechanisms
A switched connection requires a mechanism that attaches the right information streams to each other
Switching takes place in the switching fabric, the structure of which depends on the network's mode of operation, available technology and required capacity
Communicating terminals may use different physical links and different time-slots, so there is an obvious need to switch both in time and in space domain
Time and space switching are basic functions of a switch fabric
Space division switching

A space switch directs traffic from input links to output links
An input may set up one connection (1, 3, 6 and 7), multiple connections (4) or no connection (2, 5 and 8)
[Figure: interconnection network between m input links (inputs 1 - 8) and n output links (outputs 1 - 6)]
Crossbar switch matrix

Crossbar matrix introduces the basic structure of a space switch
Information flows are controlled (switched) by opening and closing cross-points
- m inputs and n outputs => mn cross-points (connection points)
Only one input can be connected to an output at a time, but an input can be connected to multiple outputs (multi-cast) at a time
[Figure: crossbar matrix with m input links (1 - 8) and n output links (1 - 6), showing closed cross-points and a multi-cast connection]
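The crossbar rules above (mn cross-points, at most one input per output, multi-cast allowed) can be captured in a small model. This is an illustrative sketch; the class and method names are hypothetical.

```python
class Crossbar:
    """m x n crossbar: one input per output at most, multi-cast allowed."""

    def __init__(self, m: int, n: int) -> None:
        self.m, self.n = m, n
        self.input_of = {}   # output link -> input link of the closed cross-point

    def crosspoints(self) -> int:
        return self.m * self.n

    def connect(self, inp: int, out: int) -> bool:
        """Close cross-point (inp, out); refuse if the output is already in use."""
        if not (1 <= inp <= self.m and 1 <= out <= self.n):
            raise ValueError("link number out of range")
        if out in self.input_of:
            return False
        self.input_of[out] = inp
        return True

    def release(self, out: int) -> None:
        """Open the cross-point feeding the given output, if any."""
        self.input_of.pop(out, None)


xb = Crossbar(8, 6)
print(xb.crosspoints())                    # 48 cross-points
print(xb.connect(4, 1), xb.connect(4, 2))  # multi-cast from input 4 succeeds
print(xb.connect(1, 2))                    # output 2 is busy, so this is refused
```

Storing the state per output makes the "one input per output" rule a simple dictionary lookup, while multi-cast needs no special handling.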
An example space switch

An m x 1 multiplexer is used to implement a space switch
Every input is fed to every output mux, and mux control signals select which input signal is connected through each mux
[Figure: inputs 1 ... m fed to a row of m x 1 multiplexers, one per output, steered by mux/connection control]
Time division multiplexing

Time-slot interchanger is a device, which buffers m incoming time-slots, e.g. 30 time-slots of an E1 frame, arranges new transmit order and transmits n time-slots
Time-slots are stored in buffer memory usually in the order they arrive or in the order they leave the switch - additional control logic is needed to decide respective output order or the memory slot where an input slot is stored
[Figure: time-slot interchanger: input channels 1 - 6 are stored in buffer space for time-slots 1 - 6 and sent out as output channels in a rearranged order]
Time-slot interchange
[Figure: time-slot interchange: slots from m input links (1 - 8) are placed in a buffer for m input/output slots, each tagged with its destination output number, e.g. (3), (2), (4), (1,6), (5), and read out to n output links (1 - 6)]
Time switch implementation example 1

Incoming time-slots are written cyclically into switch memory
Output logic reads cyclically control memory, which contains a pointer for each output time-slot
Pointer indicates which input time-slot to insert into each output time-slot
[Figure: incoming frame buffer (slots 1 ... m) is written cyclically into switch memory (locations 1 ... k); a time-slot counter with R/W control reads control memory cyclically; entry j of control memory holds read address (k), so switch memory location k is read into slot j of the outgoing frame buffer (slots 1 ... n)]
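The cyclic-write, pointer-controlled-read scheme can be sketched in a few lines. This is a simplified model with a hypothetical function name and 0-based slot numbers; real hardware would double-buffer the switch memory.

```python
def interchange_read_controlled(in_frame: list, control_memory: list) -> list:
    """Time switch with cyclic write and control-memory driven read.

    control_memory[j] holds the input slot number to be sent in output slot j.
    """
    # Cyclic write: input slot i is stored at switch memory address i.
    switch_memory = list(in_frame)
    # Control memory is read cyclically; each pointer selects the slot to read.
    return [switch_memory[pointer] for pointer in control_memory]


frame = ["a", "b", "c", "d"]
print(interchange_read_controlled(frame, [2, 0, 3, 1]))   # ['c', 'a', 'd', 'b']
```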
Time switch implementation example 2

Incoming time-slots are written into switch memory by using write-addresses read from control memory
A write address points to an output slot to which the input slot is addressed
Output time-slots are read cyclically from switch memory
[Figure: control memory (entries 1 ... m) is read cyclically and supplies write address (k) for each slot of the incoming frame buffer (slots 1 ... m); switch memory (locations 1 ... k) is read cyclically into the outgoing frame buffer (slots 1 ... n)]
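The opposite arrangement, controlled write and cyclic read, can be sketched in the same way. Again this is a simplified model with a hypothetical function name and 0-based slot numbers.

```python
def interchange_write_controlled(in_frame: list, control_memory: list) -> list:
    """Time switch with control-memory driven write and cyclic read.

    control_memory[i] holds the output slot number for input slot i.
    """
    switch_memory = [None] * len(in_frame)
    for i, slot in enumerate(in_frame):
        # The write address read from control memory is the destination slot.
        switch_memory[control_memory[i]] = slot
    # Cyclic read: output slot j is simply switch memory address j.
    return switch_memory


frame = ["a", "b", "c", "d"]
print(interchange_write_controlled(frame, [1, 3, 0, 2]))   # ['c', 'a', 'd', 'b']
```

The control memory here holds the inverse of the permutation that a read-controlled time switch would store for the same connections.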
Properties of time switches

Input and output frame buffers are read and written at wire-speed, i.e. m R/Ws for input and n R/Ws for output
Interchange buffer (switch memory) serves all inputs and outputs and thus it is read and written at the aggregate speed of all inputs and outputs => speed of an interchange buffer is a critical parameter in time switches and limits performance of a switch
Memory speed requirement can be cut by serial-to-parallel conversion
Speed requirement of control memory is half of that of switch memory (in fact a little more than that to allow new control data to be updated)
Time-Space analogy
A time switch can be logically converted into a space switch by setting time-slot buffers into vertical position => time-slots can be considered to correspond to input/output links of a space switch
But is this logical conversion fair?
[Figure: a space switch with inputs 1 ... m and outputs 1 ... n beside a time switch with incoming time-slots 1 ... m and outgoing time-slots 1 ... n]
Space-Space analogy
A space switch carrying time multiplexed input and output signals can be logically converted into a pure space switch (without cyclic control) by distributing each time-slot into its own space switch
Inputs and outputs are time multiplexed signals (K time-slots)
To switch a time-slot, it is enough to control one of the K boxes
[Figure: K parallel space switches (boxes), one per time-slot, each with inputs 1 ... m and outputs 1 ... n]
An example conversion
[Figure: an example conversion: links 1 ... m, each carrying K multiplexed input signals, are redrawn as a multi-stage space switch built from m x n, K x K and n x m switching blocks]
Properties of space and time switches

Space switches
- number of cross-points (e.g. AND-gates): m inputs x n outputs = mn; when m = n => $n^2$
- output bit rate determines the speed requirement for the switch components
- both input and output lines deploy bus structure => fault location difficult
Time switches
- size of switch memory (SM) and control memory (CM) grows linearly as long as memory speed is sufficient, i.e. SM = 2 x number of time-slots and CM = 2 x number of time-slots
- a simple and cost effective structure when memory speed is sufficient
- speed of available memory determines the maximum switching capacity
A switch fabric as a combination of space and time switches

Two stage switches
- Time-Time (TT) switch
- Time-Space (TS) switch
- Space-Time (ST) switch
- Space-Space (SS) switch
TT-switch gives no advantage compared to a single stage T-switch
SS-switch increases blocking probability
A switch fabric as a combination of space and time switches (cont.)

ST-switch gives high blocking probability (S-switch can develop blocking on an arbitrary bus, e.g. slots from two different buses attempting to flow to a common output)
TS-switch has low blocking probability, because T-switch allows rearrangement of time-slots so that S-switching can be done blocking free
[Figure: ST-switch and TS-switch, each drawn with links 1 ... n carrying time-slots TS 1 ... TS n]
Time multiplexed space (TMS) switch

Space divided inputs, each of which carries a frame of three time-slots
Input frames on each link are synchronized to the crossbar
A switching plane for each time-slot directs incoming slots to the destined output links of the corresponding time-slot
[Figure: time multiplexed space switch: four incoming frames with time-slots TS1 - TS3, one 4x4 switching plane per slot (planes for slots 1, 2 and 3), with output link addresses and closed cross-points marked; axes: space and time]
Connection conflicts in a TMS switch

Space divided inputs, each of which carries a frame of three time-slots
Input frames on each link are synchronized to the crossbar
A switching plane for each time-slot directs incoming slots to the destined output links of the corresponding time-slot
[Figure: the same TMS switch with a connection conflict in the 4x4 plane for slot 3: two incoming slots are destined to the same output in the same time-slot]
Conflict solved by time-slot interchange
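A connection conflict in a TMS switch means that two inputs request the same output link in the same time-slot. The sketch below detects such conflicts; the request format (input link, time-slot, output link) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def find_conflicts(requests: list) -> dict:
    """Return {(time_slot, output): [inputs]} for every over-subscribed cross-bar output.

    Each request is a tuple (input_link, time_slot, output_link).
    """
    users = defaultdict(list)
    for input_link, time_slot, output_link in requests:
        users[(time_slot, output_link)].append(input_link)
    return {key: inputs for key, inputs in users.items() if len(inputs) > 1}


requests = [(1, 3, 4), (2, 3, 4), (3, 1, 2)]
print(find_conflicts(requests))            # {(3, 4): [1, 2]}

# Moving the second request to another time-slot (time-slot interchange) removes the conflict.
rearranged = [(1, 3, 4), (2, 2, 4), (3, 1, 2)]
print(find_conflicts(rearranged))          # {}
```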
TS switch interconnecting TDM links

Time division switching applied prior to space switching
Incoming time-slots can always be rearranged such that output requests become conflict free for each slot of a frame, provided that the number of requests for each output is no more than the number of slots in a frame
[Figure: TS switch interconnecting TDM links: 3x3 time-slot interchangers (TSI) on the inputs feed a 4x4 TMS with one plane per slot (planes for slots 1, 2 and 3); axes: space and time]
SS equivalent of a TS-switch
[Figure: SS equivalent of a TS-switch: 4 planes of 3x3 S-switches (3 inputs each) followed by 3 planes of 4x4 S-switches, one plane per slot, leading to the outputs; axes: space and time]
Connections through SS-switch

Coordinate (X, Y, Z) = (stage, plane, input/output port)
Example connections:
- (1, 3, 1) => (2, 1, 2)
- (1, 4, 2) => (2, 3, 4)
[Figure: the two example connections drawn through the 4 planes of 3x3 S-switches and the planes of 4x4 S-switches]
Three stage switches

Basic TS-switch sufficient for switching time-slots onto addressed outputs, but slots can appear in any order in the output frame
If a specific input slot is to carry data of a specific output slot, then a time-slot interchanger is needed at each output
=> any time-slot on any input can be connected to any time-slot on any output
=> blocking probability minimized
Such a 3-stage configuration is named TST-switching (equivalent to 3-stage SSS-switching)
[Figure: TST-switch: links 1 ... n carrying time-slots TS 1 ... TS n pass through a time stage, a space stage and a second time stage]
SSS presentation of TST-switch
[Figure: SSS presentation of a TST-switch: inputs, 4 planes of 3x3 T- or S-switches, 3 horizontal planes of 4x4 S-switches, 4 planes of 3x3 T- or S-switches, outputs]
Three stage switch combinations

Possible three stage switch combinations:
- Time-Time-Time (TTT) (not significant, no connection from PCM to PCM)
- Time-Time-Space (TTS) (=TS)
- Time-Space-Time (TST)
- Time-Space-Space (TSS)
- Space-Time-Time (STT) (=ST)
- Space-Time-Space (STS)
- Space-Space-Time (SST) (=ST)
- Space-Space-Space (SSS) (not significant, high probability of blocking)
Three interesting combinations: TST, TSS and STS

Time-Space-Space switch
Time-Space-Space switch can be applied to increase switching capacity
[Figure: TSS-switch: links 1 ... n carrying time-slots TS 1 ... TS n pass through a time stage followed by two space stages]
Space-Time-Space switch
Space-Time-Space switch has a high blocking probability (like ST-switch) - not a desired feature in public networks
[Figure: STS-switch: links 1 ... n carrying time-slots TS 1 ... TS n pass through a space stage, a time stage and a second space stage]
Graph presentation of space switch

A space division switch can be presented by a graph $G = (V, E)$
- $V$ is the set of switching nodes
- $E$ is the set of edges in the graph
An edge $e \in E$ is an ordered pair $(u, v)$ with $u, v \in V$
- more than one edge can exist between $u$ and $v$
- edges can be considered to be bi-directional
$V$ includes two special sets ($T$ and $R$) of nodes not considered part of switching network
- $T$ is a set of transmitting nodes having only outgoing edges (input nodes to switch)
- $R$ is a set of receiving nodes having only incoming edges (output nodes from switch)
Graph presentation of space switch (cont.)

A connection requirement is specified for each $t \in T$ by subset $R_t \subseteq R$ to which $t$ must be connected
- subsets $R_t$ are disjoint for different $t$
- in case of multi-cast $R_t$ contains more than one element for each $t$
A path is a sequence of edges $(t,a), (a,b), (b,c), \ldots, (f,g), (g,r) \in E$, where $t \in T$, $r \in R$ and $a, b, c, \ldots, f, g$ are distinct elements of $V - (T + R)$
Paths originating from different $t$ may not use the same edge
Paths originating from the same $t$ may use the same edges
Graph presentation example

[Figure: example graph with input nodes t1 ... t15, switching nodes s1 ... s5, u1 ... u3 and v1 ... v5, and output nodes r1 ... r15]
$V = \{t_1, t_2, \ldots, t_{15}, s_1, s_2, \ldots, s_5, u_1, u_2, u_3, v_1, v_2, \ldots, v_5, r_1, r_2, \ldots, r_{15}\}$
$E = \{(t_1, s_1), \ldots, (t_{15}, s_5), (s_1, u_1), (s_1, u_2), \ldots, (s_5, u_3), (u_1, v_1), (u_1, v_2), \ldots, (u_3, v_5), (v_1, r_1), (v_1, r_2), \ldots, (v_5, r_{15})\}$
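The example graph can be built and searched programmatically. The sketch below assumes that inputs t1 - t3 attach to s1, t4 - t6 to s2 and so on (and likewise for outputs and the v nodes), which matches the edge list but is an assumption about the figure. Function names are hypothetical, and breadth-first search stands in for whatever path search a real fabric controller would use.

```python
from collections import deque


def build_example_edges() -> set:
    """Edges of the three-stage example: 15 inputs, 5 + 3 + 5 switching nodes, 15 outputs."""
    edges = set()
    for i in range(1, 16):
        block = (i - 1) // 3 + 1            # three terminals per outer switching node
        edges.add((f"t{i}", f"s{block}"))
        edges.add((f"v{block}", f"r{i}"))
    for outer in range(1, 6):
        for middle in range(1, 4):
            edges.add((f"s{outer}", f"u{middle}"))
            edges.add((f"u{middle}", f"v{outer}"))
    return edges


def find_path(edges: set, t: str, r: str, used: frozenset = frozenset()):
    """Breadth-first search for a path from t to r that avoids edges in 'used'."""
    successors = {}
    for a, b in edges:
        if (a, b) not in used:
            successors.setdefault(a, []).append(b)
    previous = {t: None}
    queue = deque([t])
    while queue:
        node = queue.popleft()
        if node == r:
            path = []
            while previous[node] is not None:
                path.append((previous[node], node))
                node = previous[node]
            return path[::-1]
        for nxt in sorted(successors.get(node, [])):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


edges = build_example_edges()
first = find_path(edges, "t1", "r15")
second = find_path(edges, "t2", "r14", used=frozenset(first))
print(first)
print(second)
```

Passing the edges of already established paths as `used` enforces the rule that paths originating from different inputs may not share an edge.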
SSS-switch and its graph presentation
[Figure: SSS-switch (inputs t, 5 planes of 3x3 S-switches, 3 horizontal planes of 5x5 S-switches, 5 planes of 3x3 S-switches, outputs r) and its graph presentation]
Graph presentation of connections

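[Figure: graph presentation of connections between inputs t and outputs r, showing a path (one input to one output) and a tree (one input to several outputs)]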
Switch Fabrics
Cost criteria for switch fabrics

- Number of cross-points
- Fan-out
- Logical depth
- Blocking probability
- Complexity of switch control
- Total number of connection states
- Path search
Cross-points
Number of cross-points gives the number of on-off gates (usually AND-gates) in space switching equivalent of a fabric
- minimization of cross-point count is essential when cross-point technology is expensive (e.g. electro-mechanical and optical cross-points)
Very Large Scale Integration (VLSI) technology implements cross-point complexity in Integrated Circuits (ICs) => more relevant to minimize number of ICs than number of cross-points
Due to increasing switching speeds, large fabric constructions and increased integration density of ICs, power consumption has become a crucial design criterion
- higher speed => more power
- large fabrics => long buses, fan-out problem and more driving power
- increased integration degree of ICs => heating problem
Fan-out and logical depth
VLSI chips can hide cross-point complexity, but introduce pin count and fan-out problems
- length of interconnections between ICs can be long, lowering switching speed and increasing power consumption
- parallel processing of switched signals may be limited by the number of available pins of ICs
- fan-out gives the driving capacity of a switching gate, i.e. number of inputs (gates/cross-points) that can be connected to an output
- long buses connecting cross-points may lower the number of gates that can be connected to a bus
Logical depth gives the number of cross-points a signal traverses on its way through a switch
- large logical depth causes excessive delay and signal deterioration
Blocking probability

Blocking probability of a multi-stage switching network is difficult to determine
Lee's approximation gives a coarse measure of blocking
Assume uniformly distributed load
- equal load in each input
- load distributed uniformly among intermediate stages (and their outputs) and among outputs of the switch
[Figure: three-stage switch in which each input switching block has n inputs and k links towards the intermediate stage, and each output switching block has k links from the intermediate stage and n outputs]
Probability that an input is engaged is $a = \lambda S$, where
- $\lambda$ = input rate on an input link
- $S$ = average holding time of a link
Blocking probability (cont.)
Under the assumption of uniformly distributed load, probability that a path between any two switching blocks is engaged is $p = an/k$ ($k \ge n$)
Probability that a certain path from an input block to an output block is engaged is $1 - (1-p)^2$, where the last term is the probability that both (input and output) links are disengaged
Probability that all k paths between an input switching block and an output switching block are engaged is $B = [1 - (1 - an/k)^2]^k$, which is known as Lee's approximation
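Lee's approximation is easy to evaluate numerically. The sketch below implements the formula as stated; the function name and the example figures (a = 0.7, n = 8) are chosen only for illustration.

```python
def lee_blocking(a: float, n: int, k: int) -> float:
    """Lee's approximation B = [1 - (1 - a*n/k)^2]^k for a three-stage switch.

    a: probability that an input is engaged
    n: inputs per input switching block
    k: number of paths (intermediate blocks) between an input and an output block
    """
    p = a * n / k                      # probability that an inter-stage link is engaged
    if not 0.0 <= p <= 1.0:
        raise ValueError("link load a*n/k must lie in [0, 1]")
    return (1.0 - (1.0 - p) ** 2) ** k


for k in (8, 10, 15):
    print(f"k = {k:2d}: B = {lee_blocking(0.7, 8, k):.2e}")
```

Adding intermediate blocks both lowers the load per link and multiplies the number of alternative paths, so the estimate falls quickly as k grows.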
Control complexity
Given a graph G, a control algorithm is needed to find and set up paths in G to fulfill connection requirements
Control complexity is defined by the hardware (computation and memory) requirements and the run time of the algorithm
Amount of computation depends on blocking category and degree of blocking tolerated
In general, computation complexity grows exponentially as a function of the number of terminals
There are interconnection networks that have a regular structure for which control complexity is substantially reduced
There are also structures that can be distributed over a large number of control units
Management complexity

Network management involves adaptation and maintenance of a switching network after the switching system has been put in place
Network management deals with
- failure events and growth in connectivity demand
- changes of traffic patterns from day to day
- overload situations
- diagnosis of hardware failures in switching system, control system as well as in access and trunk network
  - in case of failure, traffic is rerouted through redundant built-in hardware or via other switching facilities
  - diagnosis and failure maintenance constitute a significant part of software of a switching system
In order for switching cost to grow linearly with respect to total traffic, switching functions (such as control, maintenance, call processing and interconnection network) should be as modular as possible
Example 1
A switch with
- a capacity of N simultaneous calls
- average occupancy on lines during busy hour of X Erlangs
- Y % of capacity required for internal use
- two (one-way) connections needed for a call
requires a switch fabric with $M = 2 \times [(100+Y)/100] \times (N/X)$ inputs and outputs.
If N = 20 000, X = 0.72 and Y = 10 % => M = 2 x 1.1 x 20 000/0.72 ≈ 61 112 => corresponds to 2038 E1 links (30 channels each)
[Figure: switch fabric with inputs 1 ... M and outputs 1 ... M]
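The dimensioning rule of Example 1 can be checked with a few lines of code. This is a sketch only; the function name and the rounding up to whole ports and whole 30-channel E1 links are assumptions made for the illustration.

```python
import math


def fabric_ports(n_calls: int, occupancy_erl: float, internal_pct: float) -> int:
    """M = 2 * (100 + Y)/100 * N/X, rounded up to a whole number of ports."""
    return math.ceil(2 * (100 + internal_pct) / 100 * n_calls / occupancy_erl)


if __name__ == "__main__":
    m = fabric_ports(20_000, 0.72, 10)
    print(m, math.ceil(m / 30))   # ports and the corresponding number of E1 links
```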
Amount of traffic in Erlangs

Erlang defines the amount of traffic flowing through a communication system
- it is given as the aggregate holding time of all channels of a system divided by the observation time period
Example 1:
- during a one-hour period three calls are made (5 min, 15 min and 10 min) using a single telephone channel => the amount of traffic carried by this channel is (30 min / 60 min) = 0.5 Erlang
Example 2:
- a telephone exchange supports 1000 channels and during a busy hour (10.00 - 11.00) each channel is occupied 45 minutes on the average => the amount of traffic carried through the switch during the busy hour is (1000 x 45 min / 60 min) = 750 Erlangs
Erlang's first formula

$$E_1(n, A) = \frac{A^n / n!}{1 + A + \frac{A^2}{2!} + \cdots + \frac{A^n}{n!}}$$

Erlang's first formula applies to systems fulfilling the conditions
- a failed call is disconnected (loss system)
- full accessibility
- time between subsequent calls varies randomly
- large number of sources
$E_1(5, 2.7)$ implies that we have a system of 5 inlets and the offered load is 2.7 Erlangs
- blocking calculated using the formula is 8.5 %
Tables and diagrams (based on Erlang's formula) have been produced to simplify blocking calculations
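Instead of tables, Erlang's first formula can be evaluated directly. The sketch below uses the standard recursion $E_1(k, A) = A E_1(k-1, A) / (k + A E_1(k-1, A))$ with $E_1(0, A) = 1$, which gives the same value as the closed form but avoids large factorials; the function name `erlang_b` is an assumption of this example.

```python
def erlang_b(n: int, a: float) -> float:
    """Blocking probability E1(n, A) for n channels and offered load a (Erlangs)."""
    b = 1.0                      # E1(0, A) = 1: with no channels every call is blocked
    for k in range(1, n + 1):
        b = a * b / (k + a * b)  # recursion over the number of channels
    return b


if __name__ == "__main__":
    print(f"E1(5, 2.7) = {erlang_b(5, 2.7):.3f}")
```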
Example 2
An exchange for 2000 subscribers is to be installed and it is required that the blocking probability should be below 10 %. If E1 links (30 channels each) are used to carry the subscriber traffic to the telephone network, how many E1 links are needed?
- average call lasts 6 min
- a subscriber places one call during a 2-hour busy period (on the average)
Amount of offered traffic is (2000 x 6 min / (2 x 60 min)) = 100 Erl.
Erlang's first formula gives for 10 % blocking and a load of 100 Erl. that n = 97 => required number of E1 links is ceil(97/30) = 4
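The dimensioning steps of Example 2 can be sketched in code as follows. The helper names are assumptions of this example, and the channel count is found by a simple linear search over Erlang's first formula rather than by a table lookup.

```python
import math


def erlang_b(n: int, a: float) -> float:
    """Erlang's first formula evaluated with the usual recursion."""
    b = 1.0
    for k in range(1, n + 1):
        b = a * b / (k + a * b)
    return b


def channels_needed(a: float, max_blocking: float) -> int:
    """Smallest number of channels whose blocking does not exceed max_blocking."""
    n = 0
    while erlang_b(n, a) > max_blocking:
        n += 1
    return n


if __name__ == "__main__":
    offered = 2000 * 6 / (2 * 60)            # offered traffic in Erlangs
    n = channels_needed(offered, 0.10)
    print(offered, n, math.ceil(n / 30))     # load, channels, E1 links of 30 channels
```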
Example 3

Suppose the driving current of a switching gate (cross-point) is 100 mA and its maximum input current is 8 mA
How many output gates can be connected to a bus, driven by one input gate, if the capacitive load of the bus is negligibly small?
- Fan-out = floor[100/8] = 12
How many output gates can be connected to a bus driven by one input gate if the load of the bus corresponds to 15 % of the load of a gate input?
- Fan-out = floor[100/(1.15 x 8)] = 10
[Figure: bus driven by one input gate and loaded by output gates 1, 2, ..., M]
Switch fabrics
Multi-stage switching
Large switch fabrics could be constructed by using a single NxN crossbar, interconnecting N inputs to N outputs
- such an array would require $N^2$ cross-points
- logical depth = 1
- considering the limited driving power of electronic or optical switching gates, large N means problems with signal quality (e.g. delay, deterioration)
Multi-stage structures can be used to avoid the above problems
Major design problems with multi-stage structures
- find a non-blocking structure
- find non-conflicting paths through the switching network
Multi-stage switching (cont.)

Let's take a network of K stages
Stage k ($1 \le k \le K$) has $r_k$ switch blocks (SB)
Switch block j ($1 \le j \le r_k$) in stage k is denoted by S(j,k)
Switch block S(j,k) has $m_k$ inputs and $n_k$ outputs
Input i of S(j,k) is represented by e(i,j,k)
Output i of S(j,k) is represented by o(i,j,k)
Relation o(i,j,k) = e(i',j',k+1) gives the interconnection between output i and input i' of switch blocks j and j' in consecutive stages k and k+1
Special class of switches: $n_k = r_{k+1}$ and $m_k = r_{k-1}$, i.e. each SB in each stage is connected to each SB in the next stage
Clos network
$m_k$ = number of inputs in a SB at stage k
$n_k$ = number of outputs in a SB at stage k
$r_k$ = number of SBs at stage k
Parameters $m_1$, $n_3$, $r_1$, $r_2$, $r_3$ can be chosen freely; the other parameters are determined uniquely by $n_1 = r_2$, $m_2 = r_1$, $n_2 = r_3$, $m_3 = r_2$
[Figure: three-stage Clos network of switch blocks (SB) with $m_1 = 3$, $n_3 = 2$, $r_1 = 3$, $r_2 = 5$, $r_3 = 4$, so that $n_1 = r_2 = 5$, $m_2 = r_1 = 3$, $n_2 = r_3 = 4$, $m_3 = r_2 = 5$]
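The dependency between the free and the derived parameters of a Clos network can be made explicit in code. The sketch below derives the remaining parameters and, assuming that every switch block is realized as a full crossbar (an assumption of this example), sums up the cross-points; the function name is illustrative.

```python
def clos_parameters(m1: int, n3: int, r1: int, r2: int, r3: int):
    """Return all Clos parameters and the cross-point count for crossbar SBs."""
    params = {
        "m1": m1, "n1": r2, "r1": r1,   # 1st stage: r1 blocks of size m1 x r2
        "m2": r1, "n2": r3, "r2": r2,   # 2nd stage: r2 blocks of size r1 x r3
        "m3": r2, "n3": n3, "r3": r3,   # 3rd stage: r3 blocks of size r2 x n3
    }
    crosspoints = sum(params[f"r{k}"] * params[f"m{k}"] * params[f"n{k}"]
                      for k in (1, 2, 3))
    return params, crosspoints


if __name__ == "__main__":
    params, xp = clos_parameters(m1=3, n3=2, r1=3, r2=5, r3=4)
    print(params, xp)
```

With the figure's values the network has $r_1 m_1 = 9$ inputs and $r_3 n_3 = 8$ outputs.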
Graph presentation of a Clos network

[Figure: graph presentation of the Clos network with $m_1 = 3$, $n_3 = 2$, $r_1 = 3$, $r_2 = 5$, $r_3 = 4$ (inset: a 4x4 switch with inputs and outputs 1 ... 4)]
Every SB in stage k is connected to all $r_{k+1}$ SBs in the following stage k+1 with a single link.
Path connections in a 3-stage network

An input of SB x may be connected to an output of SB y via a middle stage SB a
Other inputs of SB x may be connected to other outputs of SB y via other middle stage SBs (b, c, ...)
Paull's connection matrix is used to represent paths in three stage switches
[Figure: 1st stage SB x connected to 3rd stage SB y via 2nd stage SBs a, b and c]
Paull's matrix

Middle stage switch blocks (a, b, c) connecting 1st stage SB x to 3rd stage SB y are entered into entry (x,y) in an $r_1 \times r_3$ matrix
Each entry of the matrix may have 0, 1 or several middle stage SBs
A symbol (a, b, ...) appears as many times in the matrix as there are connections through it
[Figure: Paull's matrix with rows 1 ... $r_1$ for stage 1 switch blocks and columns 1 ... $r_3$ for stage 3 switch blocks; entry (x,y) holds the symbols a, b, c]
Paull's matrix (cont.)

Conditions for a legitimate point-to-point connection matrix:
1. Each row has at most $m_1$ symbols, since there can be as many paths through a 1st stage SB as there are inputs to it
2. Each column has at most $n_3$ symbols, since there can be as many paths through a 3rd stage SB as there are outputs from it
[Figure: Paull's matrix with at most min($m_1$, $r_2$) symbols in row x and at most min($n_3$, $r_2$) distinct symbols in column y]
Paull's matrix (cont.)

Conditions for a legitimate point-to-point connection matrix (cont.):
3. Symbols in each row must be distinct, since only one edge connects a 1st stage SB to a 2nd stage SB => there can be at most $r_2$ different symbols
4. Symbols in each column must be distinct, since only one edge connects a 2nd stage SB to a 3rd stage SB and an edge does not carry signals from several inputs => there can be at most $r_2$ different symbols
In case of multi-casting, conditions 1 and 3 may not be valid, because a path from the 1st stage may be directed via several 2nd stage switch blocks. Conditions 2 and 4 remain valid.
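The four conditions for a legitimate point-to-point connection matrix translate directly into a check. In this sketch the matrix representation (a list of rows whose entries are strings of one-letter middle stage symbols) and the function name are assumptions made for illustration.

```python
def is_legitimate(matrix, m1: int, n3: int) -> bool:
    """Check conditions 1-4 for a point-to-point Paull matrix.

    matrix[x][y] is a string of middle stage symbols, e.g. "ab" or "".
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    for x in range(n_rows):
        row = [s for y in range(n_cols) for s in matrix[x][y]]
        if len(row) > m1 or len(set(row)) != len(row):   # conditions 1 and 3
            return False
    for y in range(n_cols):
        col = [s for x in range(n_rows) for s in matrix[x][y]]
        if len(col) > n3 or len(set(col)) != len(col):   # conditions 2 and 4
            return False
    return True


if __name__ == "__main__":
    print(is_legitimate([["a", "b"], ["b", "a"]], m1=2, n3=2))   # legitimate
    print(is_legitimate([["a", "a"], ["", "b"]], m1=2, n3=2))    # a twice in row 1
```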
Strict-sense non-blocking Clos

Definitions:
T' is a subset of the set T of transmitting terminals
R' is a subset of the set R of receiving terminals
Each element of T' is connected by a legitimate multi-cast tree to a non-empty and disjoint subset of R'
Each element of R' is connected to one element of T'
A network is strict-sense non-blocking if any $t \in T - T'$ can establish a legitimate multi-cast tree to any subset of $R - R'$ without changes to the previously established paths.
A rearrangeable network satisfies the same conditions, but allows changes to be made to the previously established paths.
Clos theorem
Clos theorem:
A Clos network is strict-sense non-blocking if and only if the number of 2nd stage switch blocks fulfills the condition
$r_2 \ge m_1 + n_3 - 1$
A symmetric Clos network with $m_1 = n_3 = n$ is strict-sense non-blocking if
$r_2 \ge 2n - 1$
Proof of Clos theorem

Proof 1:
Let's take some SB x in the 1st stage and some SB y in the 3rd stage, which both have the maximum number of connections minus one => x has $m_1 - 1$ and y has $n_3 - 1$ connections
One additional connection should be established between x and y
In the worst case, existing connections of x and y occupy distinct 2nd stage SBs => $m_1 - 1$ SBs for paths of x and $n_3 - 1$ SBs for paths of y
To have a connection between x and y an additional SB is needed in the 2nd stage => required number of SBs is $(m_1 - 1) + (n_3 - 1) + 1 = m_1 + n_3 - 1$
Visualization of proof
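[Figure: the $m_1 - 1$ busy inputs of 1st stage SB x and the $n_3 - 1$ busy outputs of 3rd stage SB y occupy distinct 2nd stage SBs; one more 2nd stage SB is needed for the new connection between x and y]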
Paull's matrix and proof of Clos theorem

Proof 2:
A connection from an idle input of a 1st stage SB x to an idle output of a 3rd stage SB y should be established
$m_1 - 1$ symbols can exist already in row x, because there are $m_1$ inputs to SB x
$n_3 - 1$ symbols can exist already in column y, because there are $n_3$ outputs from SB y
In the worst case, all the $(m_1 - 1) + (n_3 - 1)$ symbols are distinct
To have an additional path between x and y, one more SB is needed in the 2nd stage => $m_1 + n_3 - 1$ SBs are needed
Procedure for making connections

Keep track of symbols used by row x using an occupancy vector $u_x$ (which has $r_2$ entries that represent SBs of the 2nd stage)
Enter 1 for a symbol in $u_x$ if it has been used in row x, otherwise enter 0
Likewise keep track of symbols used by column y using an occupancy vector $u_y$
To set up a connection between SB x and SB y, look for a position j which has 0 in both $u_x$ and $u_y$
Amount of required computation is proportional to $r_2$
[Figure: occupancy vectors $u_x$ and $u_y$ with positions 1, 2, 3, ..., j, ..., $r_2$; position j holds a common 0]
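The search for a common 0 in the two occupancy vectors is a single pass over the $r_2$ positions. A minimal sketch, assuming the vectors are given as Python lists of 0/1 values and using 0-based positions (the function name is illustrative):

```python
def find_free_middle_sb(u_x, u_y):
    """Return the first position j with 0 in both vectors, or None if none exists."""
    for j, (used_by_x, used_by_y) in enumerate(zip(u_x, u_y)):
        if used_by_x == 0 and used_by_y == 0:
            return j          # middle stage SB j is idle towards both x and y
    return None               # no common idle SB: rearrangement would be needed


if __name__ == "__main__":
    print(find_free_middle_sb([1, 0, 1, 0], [1, 1, 0, 0]))
    print(find_free_middle_sb([1, 1, 1, 0, 1, 1], [1, 1, 0, 1, 0, 0]))
```

The loop visits each of the $r_2$ positions at most once, which matches the stated amount of computation.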
Rearrangeable networks
Slepian-Duguid theorem:
A three stage network is rearrangeable if and only if $r_2 \ge \max(m_1, n_3)$
A symmetric Clos network with $m_1 = n_3 = n$ is rearrangeably non-blocking if $r_2 \ge n$
Paull's theorem:
The number of circuits that need to be rearranged is at most $\min(r_1, r_3) - 1$
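The Clos and Slepian-Duguid conditions together classify a three stage network by its number of middle stage switch blocks. The following sketch only encodes the two inequalities; the function name and the returned labels are assumptions of this example.

```python
def clos_blocking_class(m1: int, n3: int, r2: int) -> str:
    """Classify a three stage Clos network by the number r2 of middle stage SBs."""
    if r2 >= m1 + n3 - 1:
        return "strict-sense non-blocking"      # Clos theorem
    if r2 >= max(m1, n3):
        return "rearrangeably non-blocking"     # Slepian-Duguid theorem
    return "blocking"


if __name__ == "__main__":
    print(clos_blocking_class(m1=3, n3=2, r2=5))
    print(clos_blocking_class(m1=6, n3=5, r2=6))
```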
Connection rearrangement by Paull's matrix
If no position j holding 0 in both $u_x$ and $u_y$ is found, we look for symbols in $u_x$ that are not in $u_y$ and symbols in $u_y$ not found in $u_x$ => a new connection can be set up only by rearrangement
Let's suppose there is symbol a in $u_x$ (not in $u_y$) and symbol b in $u_y$ (not in $u_x$) and let's choose either one as a starting point
- let it be a; then b is searched from the column in which a resides (in row x) - let it be column $j_1$ in which b is found in row $i_1$
- in row $i_1$ search for a - let this position be column $j_2$
- this procedure continues until symbol a or b cannot be found in the column or row visited
[Figure: occupancy vectors $u_x$ and $u_y$ with no common 0; symbols a and b marked]
Connection rearrangement by Paull's matrix (cont.)
At this point the connections identified can be rearranged by replacing symbol a (in rows x, $i_1$, $i_2$, ...) by b and symbol b (in columns $j_1$, $j_2$, ...) by a
- a and b still appear at most once in any row or column
- 2nd stage SB a can be used to connect x and y
[Figure: Paull's matrix before and after the rearrangement, with symbols a and b exchanged along rows x, $i_1$, $i_2$ and columns $j_1$, $j_2$, $j_3$]
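The chain search and the exchange of symbols can be sketched as below. The matrix representation (a list of rows whose entries are sets of symbols), the 0-based indices and the function name are assumptions of this example. The sketch always starts the chain from symbol a in row x, so it does not pick the shorter of the two possible chains and the number of moved connections can exceed the bound of Paull's theorem.

```python
def rearrange_and_connect(matrix, x, y, a, b):
    """Exchange a and b along the chain starting from a in row x, then connect (x, y) via a.

    Assumes a legitimate matrix where a is used in row x but not in column y,
    and b is used in column y but not in row x. Returns the number of moved connections.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    a_cells, b_cells = [], []
    row = x
    while True:
        col = next((c for c in range(n_cols) if a in matrix[row][c]), None)
        if col is None:
            break                      # no a in this row: the chain ends
        a_cells.append((row, col))
        row = next((r for r in range(n_rows) if b in matrix[r][col]), None)
        if row is None:
            break                      # no b in this column: the chain ends
        b_cells.append((row, col))
    for r, c in a_cells:               # every a on the chain becomes b
        matrix[r][c].remove(a)
        matrix[r][c].add(b)
    for r, c in b_cells:               # every b on the chain becomes a
        matrix[r][c].remove(b)
        matrix[r][c].add(a)
    matrix[x][y].add(a)                # a is now idle in row x and in column y
    return len(a_cells) + len(b_cells)


if __name__ == "__main__":
    m = [[set(), {"a"}], [{"b"}, set()]]
    print(rearrange_and_connect(m, 0, 0, "a", "b"), m)
```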
Example of connection rearrangement by Paull's matrix

Let's take a three-stage network 24x25 with $r_1 = 4$ and $r_3 = 5$
Rearrangeability condition requires that $r_2 = 6$ - let these SBs be marked by a, b, c, d, e and f
=> $m_1 = 6$, $n_1 = 6$, $m_2 = 4$, $n_2 = 5$, $m_3 = 6$, $n_3 = 5$
[Figure: three-stage network with four 6x6 SBs in the 1st stage, six 4x5 SBs 1(a), 2(b), ..., 6(f) in the 2nd stage and five 6x5 SBs in the 3rd stage]
Example of connection rearrangement by Paull's matrix (cont.)

In the network state shown below, a new connection is to be established between SB1 of stage 1 and SB1 of stage 3
No SBs available in stage 2 to allow a new connection
Slepian-Duguid theorem => a three stage network is rearrangeable if and only if $r_2 \ge \max(m_1, n_3)$
- $m_1 = 6$, $n_3 = 5$, $r_2 = 6$ => condition fulfilled
SBs c and d are selected to operate the rearrangement
[Figure: Paull's matrix of the network state, rows for 1st stage SBs 1 ... 4 and columns for 3rd stage SBs 1 ... 5]
Occupancy vectors of SB1/stage 1 and SB1/stage 3

        a  b  c  d  e  f
u1-1    1  1  1  0  1  1
u3-1    1  1  0  1  0  0

Start rearrangement procedure from symbol c in row 1 and column 5
- 5 connection rearrangements are needed to set up the required connection - Paull's theorem !!!
[Figure: Paull's matrix before and after the rearrangement started from symbol c]
Paull's theorem states that the number of circuits that need to be rearranged is at most $\min(r_1, r_3) - 1 = 3$ => there must be another solution
Start rearrangement procedure from d in row 4 and column 1 => only one connection rearrangement is needed
[Figure: Paull's matrix before and after the rearrangement started from symbol d]
Recursive construction of switching networks
To reduce cross-point complexity of three stage switches, individual stages can be factored further
Suppose we want to construct an NxN switching network and let N = p x q
A rearrangeably non-blocking Clos network is constructed recursively by connecting a p x p, q x q and p x p rearrangeably non-blocking switch together in respective order => under certain conditions result may be a strict-sense non-blocking network
A strict-sense non-blocking network is constructed recursively by connecting a p x (2p - 1), q x q and (2p - 1) x p strict-sense non-blocking switch together in respective order => result may be a rearrangeable non-blocking network
3-dimensional construction of a rearrangeably non-blocking network

[Figure: 3-dimensional construction with q planes of p x p switches, p planes of q x q switches and q planes of p x p switches]
Number of cross-points for the rearrangeable construction is $p^2 q + q^2 p + p^2 q = 2p^2 q + q^2 p$
3-dimensional construction of a strict-sense non-blocking network

[Figure: 3-dimensional construction built from planes of p x (2p-1), q x q and (2p-1) x p switches]
Number of cross-points for the strictly non-blocking construction is $p(2p-1)q + q^2(2p-1) + p(2p-1)q = 2p(2p-1)q + q^2(2p-1)$
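The two cross-point formulas can be compared for different factorizations N = p x q. A small sketch; the function names are assumptions of this example and a single factoring level with crossbar blocks is assumed.

```python
def rearrangeable_crosspoints(p: int, q: int) -> int:
    """2*p^2*q + q^2*p for one level of the rearrangeable construction."""
    return 2 * p * p * q + q * q * p


def strict_sense_crosspoints(p: int, q: int) -> int:
    """2*p*(2p-1)*q + q^2*(2p-1) for one level of the strict-sense construction."""
    return 2 * p * (2 * p - 1) * q + q * q * (2 * p - 1)


if __name__ == "__main__":
    for p, q in ((2, 4), (4, 4), (4, 16)):
        n = p * q
        print(n, n * n, rearrangeable_crosspoints(p, q), strict_sense_crosspoints(p, q))
```

For small N the factored networks are not cheaper than a single N x N crossbar; the saving appears as N grows.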
Recursive factoring of switching networks
N can be factored into p and q in many ways and these can be factored further
Which p to choose and how should the sub-networks be factored further?
Doubling in the 1st and 3rd stages suggests starting with the smallest factor and recursively factoring q = N/p using the next smallest factor
=> this strategy works well for rearrangeable networks
=> for strict-sense non-blocking networks the width of the network is doubled => not the best strategy for minimizing cross-point count
Ideal solution: low complexity, minimum number of cross-points and easy to construct => quite often conflicting goals
Recursive factoring of a rearrangeably non-blocking network

Special case $N = 2^n$, n being a positive integer
=> a rearrangeable network can be constructed by factoring N into p = 2 and q = N/2
=> resulting network is a Benes network
=> each stage consists of N/2 switch blocks of size 2x2
Factor q relates to the multiplexing factor (number of time-slots on inputs)
=> recursion continued until speed of signals low enough for real implementations
[Figure: N inputs and N outputs connected through 2x2 switch blocks to two N/2 x N/2 switches]
Benes network
[Figure: Benes network with N inputs and N outputs, formed by a baseline network followed by an inverse baseline network]
Number of stages in a Benes network $K = 2\log_2 N - 1$
Benes network (cont.)
Benes network is recursively constructed of 2x2 switch blocks and it is rearrangeably non-blocking (see Clos theorem)
First half of Benes network is called baseline network
Second half of Benes network is a mirror image (inverse) of the first half and is called inverse baseline network
Number of switch stages is $K = 2\log_2 N - 1$
Each stage includes N/2 2x2 switching blocks (SBs) and thus the number of SBs of a Benes network is $N\log_2 N - N/2 = N(\log_2 N - \tfrac{1}{2})$
Each 2x2 SB has 4 cross-points and the number of cross-points in a Benes network is $4(N/2)(2\log_2 N - 1) = 4N\log_2 N - 2N \approx 4N\log_2 N$
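The size formulas of a Benes network are simple to tabulate. A minimal sketch, assuming N is a power of two; the function name is illustrative.

```python
def benes_size(n_ports: int):
    """Return (stages, 2x2 switch blocks, cross-points) of an N x N Benes network."""
    log_n = n_ports.bit_length() - 1          # log2 N for N a power of two
    stages = 2 * log_n - 1
    switch_blocks = (n_ports // 2) * stages   # N/2 blocks in every stage
    return stages, switch_blocks, 4 * switch_blocks


if __name__ == "__main__":
    for n in (4, 16, 1024):
        print(n, benes_size(n))
```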
Illustration of recursively factored Benes network
[Figure: recursively factored Benes network with 16 inputs and 16 outputs]
Switch Fabrics
Recursive factoring of a strict-sense non-blocking network
A strict-sense non-blocking network can be constructed recursively, but the size of the network (number of cross-points) grows fast as a function of the number of inputs, namely CNlog2N
Instead of starting with the smaller factor for p, let's use switch blocks of size $\sqrt{N} \times \sqrt{N}$
Let $N = 2^n$ and $n = 2^l$; then we are factoring square switches with the number of inputs and outputs being a power of 2
=> condition for a strict-sense non-blocking network states that there are $r_2 \ge 2 \times 2^{n/2} - 1$ second stage SBs
Let us choose $r_2 = 2 \times 2^{n/2}$; then the sizes of the
- 1st stage switches are $2^{n/2} \times 2^{n/2+1}$
- 3rd stage switches are $2^{n/2+1} \times 2^{n/2}$
Each of these can be made of two SBs, each of size $2^{n/2} \times 2^{n/2}$
Recursive factoring of a strict-sense non-blocking network (cont.)

2nd stage switches are of size $2^{n/2} \times 2^{n/2}$
The three stages consist of $6 \times 2^{n/2}$ SBs, each of size $2^{n/2} \times 2^{n/2}$
Let $F(2^n)$ be the cross-point complexity of an NxN switch; then
$F(2^n) = 6 \times 2^{n/2} F(2^{n/2}) = 6^l \times 2^{n/2 + n/4 + \cdots + 1} F(2^1) < 6^l \times 2^n F(2) = N(\log_2 N)^{2.58} F(2) = 4N(\log_2 N)^{2.58}$
The difference between rearrangeable and strict-sense non-blocking networks lies in the exponent of the $\log_2 N$ term
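The recursion for $F(2^n)$ and its upper bound can be checked numerically. The sketch below assumes n is a power of two and F(2) = 4, as in the derivation above; the function names are illustrative.

```python
import math


def strict_crosspoints(n: int) -> int:
    """F(2^n) = 6 * 2^(n/2) * F(2^(n/2)) with F(2^1) = 4, n a power of two."""
    if n == 1:
        return 4
    half = n // 2
    return 6 * 2 ** half * strict_crosspoints(half)


def strict_bound(n: int) -> float:
    """Upper bound 4 * N * (log2 N)^log2(6) with N = 2^n and log2(6) = 2.58..."""
    return 4 * 2 ** n * n ** math.log2(6)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(2 ** n, strict_crosspoints(n), round(strict_bound(n)))
```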
Strict-sense non-blocking network with smaller number of cross-points
Strict-sense non-blocking networks with a smaller number of cross-points than $F(2^n) = 4N(\log_2 N)^{2.58}$ can be constructed
One alternative is to use a Cantor network, which is constructed using Benes networks, multiplexers and demultiplexers
- i-th input of the Cantor network is connected to the i-th input of the j-th Benes network using the j-th output of a 1xm demultiplexer
- i-th output of the j-th Benes network is connected to the i-th output of the Cantor network using the j-th input of an mx1 multiplexer
When N is known, the number of Benes planes required to have a strict-sense non-blocking Cantor network is $m = \log_2 N$
Since a Benes network has a cross-point count of $4N\log_2 N$, the number of cross-points of a Cantor network is roughly $4N(\log_2 N)^2$ (when ignoring the cross-points of the multiplexers and demultiplexers)
Cantor network
[Figure: Cantor network with N inputs, 1-to-log(N) demultiplexers, parallel Benes networks, log(N)-to-1 multiplexers and N outputs]
Cantor network strict-sense non-blocking

Proof:
Markings
- m: number of parallel Benes networks
- k: number of a stage in a Benes network
- A(k): number of reachable 2x2 SBs without rearrangements in stage k ($1 \le k \le \log_2 N$) starting from an input of a Cantor network
Reachable 2x2 SBs in consecutive stages
$A(1) = m$
$A(2) = 2A(1) - 1$
$A(3) = 2A(2) - 2$
$A(k) = 2A(k-1) - 2^{k-2} = 2^2 A(k-2) - 2 \times 2^{k-2} = \cdots = 2^{k-1}A(1) - (k-1) \times 2^{k-2}$
$A(\log_2 N) = 2^{\log_2 N - 1} m - (\log_2 N - 1) 2^{\log_2 N - 2} = \tfrac{1}{2}Nm - \tfrac{1}{4}(\log_2 N - 1)N$
Cantor network strict-sense non-blocking (cont.)
Cantor network is symmetrical at the middle => the same number of center stage nodes are reachable by an output of a Cantor network
Total number of SBs in center stages is Nm/2 (m Benes networks)
If the number of center stage SBs reached by an input and an output exceeds Nm/2 then there must be a SB reachable from both
Hence strict-sense non-blocking is achieved if
$2\left[\tfrac{1}{2}Nm - \tfrac{1}{4}(\log_2 N - 1)N\right] > \tfrac{Nm}{2}$
=> $m > \log_2 N - 1$
Notice that a strict-sense non-blocking Cantor network is constructed of $\log_2 N$ rearrangeably non-blocking Benes networks
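The counting argument of the proof can be replayed numerically. This sketch iterates the recursion for A(k), compares it with the closed form and searches for the smallest m that satisfies the non-blocking condition; N is assumed to be a power of two and all function names are illustrative.

```python
def reachable(m: int, k: int) -> int:
    """A(k) from the recursion A(1) = m, A(k) = 2*A(k-1) - 2^(k-2)."""
    a = m
    for stage in range(2, k + 1):
        a = 2 * a - 2 ** (stage - 2)
    return a


def reachable_closed_form(m: int, k: int) -> int:
    """A(k) = 2^(k-1)*m - (k-1)*2^(k-2), written with integer arithmetic only."""
    return 2 ** (k - 1) * m - ((k - 1) * 2 ** (k - 1)) // 2


def min_benes_planes(n_ports: int) -> int:
    """Smallest m with 2*A(log2 N) > N*m/2, i.e. 4*A(log2 N) > N*m."""
    log_n = n_ports.bit_length() - 1
    m = 1
    while 4 * reachable(m, log_n) <= n_ports * m:
        m += 1
    return m


if __name__ == "__main__":
    for n in (4, 16, 65536):
        print(n, min_benes_planes(n))
```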
Visualization of proof
[Figure: center stage SBs reachable from an input and from an output of a Cantor network with N inputs and N outputs]
Dimensioning example of Cantor network

Number of inputs and outputs of a switching network should be $N = 32 \times 2048 = 2^{16} \approx 64\,000$
- number of multiplexers = number of demultiplexers = 64 000
- number of Benes networks $m = \log_2 N = 16$
  => number of outputs in demultiplexers = 16
  => number of inputs in multiplexers = 16
- number of stages in Benes networks = $2\log_2 N - 1 = 2 \times 16 - 1 = 31$
- number of 2x2 SBs in each Benes network = $N\log_2 N - N/2 \approx N\log_2 N = 2^{16} \times 16$
Control algorithms
Control algorithms for networks, which are formed recursively by three stage factorization
Control algorithms can be applied recursively
- works well for strict-sense non-blocking networks when setting up connections one at a time
- for rearrangeable networks adding just one connection may cause the connection pattern to change dramatically => adding a connection to a Benes network can be as complicated as reconnecting all input-output pairs
Let's examine a control algorithm for a Benes network, formed by factoring recursively
- $N = 2^m$ inputs and outputs
- start with a totally disconnected network and establish the requested connection patterns
Looping algorithm
During the first factorization of an NxN switch each 2x2 input SB may be connected to a 2x2 output SB either via the upper (U) or the lower (L) N/2 x N/2 switch (see figure)
1) Initialization: Start with 2x2 input SB 1 and mark it by S
2) Loop forward: Connect an unconnected input of S to the desired output by the upper switch U. If no connection is required, go to 4.
3) Loop backward: Connect the adjacent output of the output just visited to the desired input by the lower switch L. If no connection is required, go to 4. Otherwise, the newly visited input SB becomes S. Go to 2.
Looping algorithm (cont.)

4) Start new loop: Choose another SB, which has not been visited yet, as S. Go to 2.
If all connections for the NxN switch are made, the algorithm terminates at level m.
[Figure: N inputs and N outputs connected through 2x2 SBs to the N/2 x N/2 upper switch and the N/2 x N/2 lower switch]
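Steps 1 to 4 can be sketched for the first factoring level as follows. The sketch assumes a full permutation (every input is connected, so the "no connection required" branches never occur), 0-based port numbers and that ports 2s and 2s+1 belong to 2x2 SB s; the function name is illustrative. The same routine would then be applied recursively to the upper and the lower switch.

```python
def first_level_looping(perm):
    """Assign every connection to the upper ('U') or lower ('L') N/2 x N/2 switch.

    perm[i] is the output requested by input i; perm must be a permutation of
    range(N) with N even.
    """
    n = len(perm)
    source = [0] * n
    for i, o in enumerate(perm):
        source[o] = i                       # inverse permutation: output -> input
    sub = [None] * n
    for start in range(n):                  # step 4: start a new loop if needed
        i = start
        while sub[i] is None:
            sub[i] = "U"                    # step 2: loop forward via the upper switch
            adjacent_output = perm[i] ^ 1   # other output of the same output SB
            j = source[adjacent_output]
            sub[j] = "L"                    # step 3: loop backward via the lower switch
            i = j ^ 1                       # other input of the newly visited input SB
    return sub


if __name__ == "__main__":
    print(first_level_looping([5, 2, 7, 0, 1, 6, 3, 4]))
```

Each loop closes when it returns to the input SB it started from, so the two inputs of every input SB and the two outputs of every output SB end up in different sub-switches.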
Looping algorithm (cont.)
Looping algorithm is applied recursively to establish connections for the upper switch U and lower switch L
Computation of paths is complex and time consuming, and it can be shown that the total run time of the algorithm to compute paths for all inputs and outputs is proportional to $(\log_2 N)^2$
Looping algorithm
- suits circuit switching, because connections are computed per call
- is not suitable for packet switching, because connections may have to be recomputed for all N input-output pairs within the duration of a packet
- dedicating a processor for each input and output SB connection => computations become faster, but exchange of path information between processors gets very complicated
=> Alternative switching architectures needed for packet switching
Switch fabrics
Graph presentation of connection patterns

Point-to-point switching
- one-to-one connections $C = \{(i,o) \mid i \in I, o \in O\}$
- if $(i,o) \in C$ and $(i,o') \in C$ => $o = o'$
- if $(i,o) \in C$ and $(i',o) \in C$ => $i = i'$
Multi-cast switching
- one-to-many connections $C = \{(i,n_i) \mid i \in I, n_i \subseteq O\}$
C - a logical mapping from inputs to outputs

Graph presentation of connection patterns (cont.)

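Concentrator: inputs in A are connected to any members in O
- one-to-one connections $C = \{(i,o) \mid i \in A \subseteq I, o \in O\}$
- if $(i,o) \in C$ and $(i,o') \in C$ => $o = o'$
- if $(i,o) \in C$ and $(i',o) \in C$ => $i = i'$
Super-concentrator: inputs in A are connected to any members in a specific B (compact if B compact)
- one-to-one connections $C = \{(i,o) \mid i \in A \subseteq I, o \in O\}$
- if $(i,o) \in C$ and $(i,o') \in C$ => $o = o'$
- if $(i,o) \in C$ and $(i',o) \in C$ => $i = i'$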
Graph presentation of connection patterns (cont.)

Copy: each input i in A is connected to any $n_i$ members in O
- one-to-many connections $C = \{(i,n_i) \mid i \in A \subseteq I, n_i \le N\}$
- order and identity of outputs $n_i$ ignored (output unspecific)
Combinatorial bound
The combinatorial bound of G is $\log_2$ of the number of distinct and legitimate C realized by G
- it measures the combinatorial power of a graph and may bear no direct relationship to the control complexity of finding a switch setting to realize a connection pattern
R is the number of cross-points in a switch fabric
- $2^R$ is the number of states in a switch fabric of R cross-points => rough upper bound for the number of Cs in G is $2^R$
Better upper bound obtained by removing
- all non-legitimate states, e.g., those in which two cross-points are feeding one output
- one of the states for which another state produces the same C
Such improvements are not easily found
Combinatorial bound (cont.)

Let us look at the number of different C realized by different connection functions
For each connection function, which defines a set of legitimate C, we may compute the logarithm of the total number of distinct C
A graph G is rearrangeably non-blocking if all such C can be realized by G => the combinatorial bound of G must be at least this logarithm for a rearrangeably non-blocking G
It follows that by observing the number of distinct C realized by different connection functions, we can find the lower bound of complexity for any rearrangeably non-blocking fabric.
Lower combinatorial bound for point-to-point connections
NxN switch with full connectivity (any element in I can be connected to any distinct element in O)
Obviously a network that can realize all maximal connection patterns can also realize less than maximal patterns
Number of C we want to realize equals N!
Stirling's approximation: $N! \approx \sqrt{2\pi}\, N^{N+1/2} e^{-N} = \sqrt{2\pi}\, 2^{N\log_2 N - N\log_2 e + \frac{1}{2}\log_2 N}$
=> point-to-point combinatorial bound: $\log_2 N! \approx N\log_2 N - 1.44N + \tfrac{1}{2}\log_2 N = O(N\log N)$
If 2x2 SBs are used, at least $\log_2 N! \approx N\log_2 N$ such SBs are needed to realize the N! possible maximal connection patterns
=> A point-to-point interconnection network has a complexity of O(NlogN)
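The quality of the approximation and the gap to a Benes network can be inspected numerically. In this sketch the approximation drops the constant factor $\sqrt{2\pi}$ exactly as the bound above does, N is assumed to be a power of two, and the function names are illustrative.

```python
import math


def log2_factorial(n: int) -> float:
    """Exact point-to-point bound log2(N!)."""
    return math.log2(math.factorial(n))


def stirling_log2_factorial(n: int) -> float:
    """N*log2(N) - N*log2(e) + 0.5*log2(N), constant term omitted."""
    return n * math.log2(n) - n * math.log2(math.e) + 0.5 * math.log2(n)


def benes_switch_blocks(n: int) -> int:
    """N*log2(N) - N/2 switch blocks of size 2x2 (N a power of two)."""
    return n * (n.bit_length() - 1) - n // 2


if __name__ == "__main__":
    for n in (4, 16, 256):
        print(n, round(log2_factorial(n), 1),
              round(stirling_log2_factorial(n), 1), benes_switch_blocks(n))
```

A 2x2 SB has two states, so a fabric of such blocks needs at least $\log_2 N!$ of them; the Benes count stays above this value while being of the same order.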

Visualization of point-to-point mappings

Number of connection patterns in point-to-point switching $L_c = N!$
- N = 2 => $L_c = 2$
- N = 3 => $L_c = 6$
Construction of C: enumerate the inputs and mix them in an arbitrary order.

Lower bound of Benes network

Number of 2x2 SBs in a Benes network:
=> $2\log_2 N - 1$ stages and each stage has N/2 SBs of size 2x2
=> total number of 2x2 SBs is $N\log_2 N - N/2$, which is close to the point-to-point combinatorial bound
=> total number of cross-points $4(N\log_2 N - N/2) \approx 4N\log_2 N$
[Figure: Benes network with 16 inputs and 16 outputs]
Lower combinatorial bound for multi-point connections
Let's suppose that any input can be connected to any output, i.e., each element in O may choose any one of the N inputs
=> total number of connection patterns C is $N^N = 2^{N\log_2 N}$
=> multi-cast combinatorial bound is $N\log_2 N$, which exceeds the point-to-point bound by about 1.44N
A fabric architecture that would implement multi-casting and would be close to the lower bound of complexity is not known yet
It is known that a Benes network implements multi-cast if the number of 2x2 SBs is doubled compared to the pt-to-pt case
Visualization of multi-cast mappings

Number of connection patterns in multi-cast switching $L_c = N^N$
- N = 3 => $L_c = 27$
- etc.
Lower combinatorial bound for concentrator

A concentrator with M inputs and N outputs (M > N)
Connection pattern C defined to be a set of any N of the M inputs
Number of these sets = $\binom{M}{N} = \frac{M!}{N!(M-N)!}$
Stirling's approximation:
$\log_2 \binom{M}{N} = \log_2 \frac{M!}{N!(M-N)!} \approx \log_2 \frac{M^M}{N^N (M-N)^{M-N}} = M H(c)$, where c = N/M
Entropy function: $H(c) = -c\log_2 c - (1-c)\log_2(1-c)$
[Figure: plot of the entropy function H(c) for c from 0 to 1]
Lower combinatorial bound for concentrator (cont.)

For given c => concentrator combinatorial bound = O(M)
This lower bound is smaller by a factor of logM than that of point-to-point or multi-point networks
Although concentrators with linear complexity (linear in the number of inputs) can be shown to exist, there are no known practical solutions
- complicated control algorithms
It can be shown that a strict-sense non-blocking concentrator is as complex as a point-to-point non-blocking network - M logM => MlogM-fabrics are used for concentration
Visualization of concentrator mappings
Number of connection patterns in a concentrator $L_c = \binom{M}{N} = \frac{M!}{N!(M-N)!}$
Concentrator MxN = 4x2 => $L_c = 6$
Lower combinatorial bound for super-concentrator
A super-concentrator with M inputs, N outputs and K elements ($K \le M, N$)
Connection pattern C defined to be a legitimate set of C = (A, B) by all A and B with K elements
Total number of these sets = $\binom{M}{K}\binom{N}{K}$
=> super-concentrator combinatorial bound $\approx M H(K/M) + N H(K/N)$
Super-concentrators are more complex than concentrators
Compact super-concentrator specifies output set B once the starting position of the compact sequence is specified => there are N possible starting positions and hence
compact super-concentrator combinatorial bound $\approx M H(K/M) + \log_2 N$
Visualization of super-concentrator mappings

Number of connection patterns in a super-concentrator $L_c = \binom{M}{K}\binom{N}{K} = \frac{M!\,N!}{K!(M-K)!\,K!(N-K)!}$
Super-concentrator M = 4, N = 3 and K = 2 => $L_c = 18$
Lower combinatorial bound for copy network

A copy network with M inputs and N outputs
Sum of connection requests $n_i$ over all inputs i is equal to N
=> number of connection patterns C equals $\binom{M-1+N}{M-1}$
=> copy network combinatorial bound $\approx (M-1+N)\, H\!\left(\frac{M-1}{M-1+N}\right)$
Complexity lower bound is linear in M and N
Visualization of copy mappings

Number of connection patterns in a copy network $L_c = \binom{M-1+N}{M-1} = \frac{(M-1+N)!}{(M-1)!\,N!}$
Copy MxN = 4x2 => $L_c = 10$
$L_c$ is the number of ways N objects can be put into M bins.
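The pattern counts of the concentrator, super-concentrator and copy network follow from binomial coefficients. A short sketch using `math.comb` from the Python standard library (Python 3.8 or later); the function names are illustrative.

```python
import math


def concentrator_patterns(m: int, n: int) -> int:
    """Choose which N of the M inputs are active."""
    return math.comb(m, n)


def super_concentrator_patterns(m: int, n: int, k: int) -> int:
    """Choose K of the M inputs and K of the N outputs."""
    return math.comb(m, k) * math.comb(n, k)


def copy_patterns(m: int, n: int) -> int:
    """Number of ways to put N objects (copies) into M bins (inputs)."""
    return math.comb(m - 1 + n, m - 1)


if __name__ == "__main__":
    print(concentrator_patterns(4, 2),
          super_concentrator_patterns(4, 3, 2),
          copy_patterns(4, 2))
```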

Compact super-concentrator example

Inverse Banyan network formed by recursive 2-stage factoring using super-concentrators
[Figure: inverse Banyan network with 16 inputs and 16 outputs]
Two stage factoring

[Figure: compact super-concentrator built from p horizontal planes of M/p x q super-concentrators followed by vertical planes of p x N/q super-concentrators]
2-stage factoring can be used to construct compact super-concentrators

Distribution network
Mirror image of a compact super-concentrator is called a distribution network
Provided that an input-output connection pattern $C = \{(i, o_i)\}$ satisfies:
- Compactness condition - active inputs i for the pairs in C are compact in modulo fashion
- Monotone condition - outputs $o_i$ to be connected to each active input are strictly increasing in i in modulo fashion
a 2-stage network can be made a non-blocking one if the connection requests arrive in sorted order
- one way to achieve this is to put a sorting network in front of a 2-stage network
All point-to-point connections satisfying the above two conditions can be connected using the distribution network
Compact and monotone connection pattern

[Figure: compact active inputs (a modulo-wise compact set) among inputs 0 ... N-1 mapped to cyclically monotone outputs]
Construction of a distribution network
[Figure: distribution network with inputs and outputs 0 ... N-1, obtained as the mirror image of a compact super-concentrator]
Example of a distribution network

Distribution network based on inverse Banyan network
[Figure: 16x16 distribution network based on an inverse Banyan network, inputs and outputs labeled 0000 ... 1111]
Construction of copy networks
Distribution network which allows multiple connections between an input and outputs is called a copy distribution network
An input can make extra connections to outputs if the outputs connected remain monotonically increasing with respect to the inputs
- Compactness condition - active inputs i for the pairs in C are compact in modulo fashion
- Monotone condition - each element in $O_i$ is greater than each element in $O_{i'}$ if i > i' in modulo fashion
Inverse of a many-to-one concentrator performs the copy functions
Copy distribution network
[Figure: distribution network connecting a multi-copy input to a monotone set of outputs $O_i$]
Compact and monotone connection pattern

[Figure: many-to-one concentration (cyclically monotone input sets mapped to compact outputs) and its mirror image, one-to-many copying (compact active inputs mapped to cyclically monotone output sets); inputs and outputs numbered 0 ... N-1]
Construction of multi-cast networks
Multi-cast networks can be constructed, e.g. by concatenating a copy network and a point-to-point network
- 3-stage factorization can be applied to get a point-to-point network
- resulting network consists of a concentrator, copy distribution network and Benes network => number of stages increases => total number of 2x2 SBs is $2N\log_2 N$
There are alternative ways to construct multi-cast networks, but they encounter the above mentioned problems
- number of stages increases
- difficult to calculate connections through the fabric
- complicated fabric control algorithms
One way to solve the control problem is to use self-routing
Switch fabrics
- Multi-point switching
- Self-routing networks
- Sorting networks
- Fabric technologies
- Fault tolerance and reliability
Self-routing
Self-routing is a popular principle in fast packet switching
Header of each packet contains all information needed to route a packet through a switch fabric
One or more paths may exist from an input to an output
Interconnection network has the unique path property if the sequence of nodes connecting an input to an output is unique for all input-output pairs
An important class of networks having the unique path property is the generic banyan network
- NlogN complexity
- only one path connecting each input-output pair
Examples of unique route network

- Banyan network
- Baseline network
- Shuffle exchange (Omega) network
- Flip network (inverse shuffle exchange)
Self-routing principle
$S_1, S_2, \ldots, S_K$ is a sequence of switch blocks, which have $n_1, n_2, \ldots, n_K$ outputs respectively
Route of a packet uses the $b_k$-th output of switch block k => route given by the sequence $b_1 b_2 \ldots b_k \ldots b_K$
At switch block k, packet routed to output $b_k$ and address $b_k$ is removed from the self-routing header
[Figure: switch blocks $S_1$, $S_2$, ..., $S_K$ in sequence; the self-routing address is $b_K \ldots b_1$ before $S_1$, $b_K \ldots b_2$ before $S_2$ and $b_K$ before $S_K$]
Self-routing principle (cont.)

In a self-routing shuffle exchange (NxN) network
- $N = 2^K$ inputs and outputs are interconnected by $K$ stages of $2^{K-1}$ nodes
- nodes are numbered in each stage from 0 to $2^{K-1}-1$ (binary K-1 format)
- links (= edges) are numbered in each stage from 0 to $2^K-1$ (binary K format)
- outgoing links are numbered by appending 0 (up going links) or 1 (down going links) to the node's number
[Figure: 8x8 shuffle exchange network with inputs and outputs 000 ... 111 and nodes 00 ... 11 in each stage; UP = 0, DOWN = 1; edges 011 and 110 are marked]
Self-routing in a shuffle exchange network

Self-routing shuffle exchange scheme
- a packet at input $a_1 a_2 \ldots a_K$ is destined to output $b_1 b_2 \ldots b_K$
- $b_1 b_2 \ldots b_K$ is used as the self-routing address (up link chosen if $b_k = 0$ and down link if $b_k = 1$)
- packet visits first node $a_2 \ldots a_K$ => travels along edge $a_2 \ldots a_K b_1$ => visits node $a_3 \ldots a_K b_1$ => ... => finally, after visiting node $b_1 \ldots b_{K-1}$, arrives at output edge $b_1 \ldots b_K$
[Figure: example path through an 8x8 shuffle exchange network from input 001 to destination 101, following Edge 1 ... Edge 4 via edges 011 and 110]
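The edge-labelling rule above can be traced in a few lines of code. The following is a minimal sketch, not part of the original notes; the function name `shuffle_exchange_route` and the use of bit strings for edge labels are illustrative assumptions.

```python
def shuffle_exchange_route(src, dst, k):
    """Return the K-bit labels of the edges a packet uses, one per stage.

    src and dst are the input and output numbers (0 ... 2**k - 1).
    """
    edge = format(src, f"0{k}b")          # input edge a1 a2 ... aK
    address = format(dst, f"0{k}b")       # self-routing address b1 b2 ... bK
    edges = []
    for bit in address:
        node = edge[1:]                   # the packet enters node a2 ... aK
        edge = node + bit                 # append 0 (up link) or 1 (down link)
        edges.append(edge)
    return edges


if __name__ == "__main__":
    # Input 001 to output 101 in an 8x8 network (K = 3)
    print(shuffle_exchange_route(0b001, 0b101, 3))   # ['011', '110', '101']
```

After K stages all source bits have been shifted out, so the last edge label always equals the destination address, whatever the input was.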
Monotone and compact addresses

Top-down numbering of nodes and links can also be used for other self-routing networks having the unique path property
If the self-routing addresses of packets at the inputs satisfy the conditions
- addresses are strictly monotone, i.e., destination addresses are strictly increasing in top-down order at the inputs
- packets are compact, i.e., there is no idle input between any two inputs with packets
=> self-routing paths used by these packets do not share any link within the shuffle exchange network
=> no need for buffering at inputs of the internal nodes
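The link-disjointness claim can be checked mechanically for a small network. The sketch below assumes the shuffle exchange edge labelling used in these notes (edge label = previous label without its first bit, plus the next address bit); the helper names `path_links` and `link_disjoint` are hypothetical.

```python
def path_links(src, dst, k):
    """Links used by one packet, as (stage, edge label) pairs."""
    edge = format(src, f"0{k}b")
    links = []
    for stage, bit in enumerate(format(dst, f"0{k}b")):
        edge = edge[1:] + bit             # shuffle, then take up (0) or down (1) link
        links.append((stage, edge))
    return links


def link_disjoint(requests, k):
    """requests maps input number -> output number; True if no link is shared."""
    used = set()
    for src, dst in requests.items():
        for link in path_links(src, dst, k):
            if link in used:
                return False
            used.add(link)
    return True


if __name__ == "__main__":
    print(link_disjoint({0: 1, 1: 3, 2: 4}, 3))   # compact and monotone
    print(link_disjoint({0: 0, 4: 1}, 3))         # monotone, but not compact
    print(link_disjoint({0: 0, 1: 5, 2: 1}, 3))   # compact, but not monotone
```

The first pattern is conflict-free, while the other two each violate one of the conditions and share an internal link.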
Limitations of banyan networks

Banyan network can realize $2^{(N/2)\log_2 N} = N^{N/2} = (\sqrt{N})^N$ input-output permutations (connection patterns)
Full connectivity requires $N!$ connection patterns => banyan network is a blocking one
Combinatorial power of a banyan network can be increased significantly by implementing
- multiple links between nodes
- duplicated switch
- appended switch with random routing (shuffle)
- buffering at intermediate nodes (=> undesirable random delay)
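To get a feeling for how far a banyan network is from full connectivity, the two counts can be compared numerically. This is a small sketch under the assumption that the network consists of log2 N stages of N/2 two-state (bar/cross) 2x2 elements; the function name `banyan_patterns` is illustrative.

```python
from math import factorial


def banyan_patterns(n):
    """Number of connection patterns of an NxN banyan network (N a power of two)."""
    stages = n.bit_length() - 1           # log2(N)
    elements = (n // 2) * stages          # 2x2 elements, two states each
    return 2 ** elements


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, banyan_patterns(n), factorial(n))
```

For N = 8 this gives 4096 realizable patterns against 8! = 40320 permutations, and the gap widens rapidly with N.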
Switch Fabrics
Sorting networks
Types of blocking
- Internal blocking
- Output blocking
- Head of line blocking
Sorting to remove internal blocking
Resolving output conflicts
Easing of HOL blocking
Internal blocking
Internal blocking occurs at the internal links of a switch fabric
In a switch fabric that implements synchronous slot timing, internal blocking implies that some input (i) to output (j) connection cannot be established (even if both are idle)
Internally non-blocking switch makes all requested connections $(i, j_i)$, provided that there are no multiple requests to the same output ($j_i \neq j_{i'}$ if $i \neq i'$, $1 \le i, j \le N$)

Input i    Dest. j_i
1          -
2          1
3          4
4          3

Connection pattern = {(2, 1), (3, 4), (4, 3)}
Output blocking
Internally non-blocking switch can block at an output of a switch fabric due to conflicting requests, i.e., $j_i = j_{i'}$ for some $i \neq i'$
When an output conflict occurs, the switch should connect one of the conflicting inputs to the requested output => output conflict resolution
Major distinction between a circuit and a packet switching node
- a packet switching node must solve output conflicts per time-slot (time-slots are not assigned beforehand)
- a circuit switching node solves possible output conflicts and assigns a time-slot for the entire duration of a connection beforehand

Input i    Dest. j_i
1          3
2          1
3          4
4          3

Inputs 1 and 4 have a conflicting request for output 3
Head of line (HOL) blocking

Packets not forwarded due to output conflict are buffered => more delay experienced
Buffered packets are normally served in FCFS (First Come First Served) manner => HOL blocking introduced at the input queues
A packet facing HOL blocking may prevent the next packet in the queue from being delivered to a non-contended output => throughput of the switch is reduced
[Figure: input queues with destination requests and outputs 1 ... 4, showing a conflicting output request and a packet blocked by HOL queuing]
Sorting to remove internal blocking

If connection requests at the inputs of a banyan network are compact and in strictly increasing order => input-output paths are link-disjoint => banyan network is internally non-blocking
A method for building an internally non-blocking network is to apply a sorting network in front of a banyan network to generate a strictly increasing order of destination addresses for the banyan network
A sorting network connects an input i, which has a connection request to output j_i, to an output of the sorting network according to the position of j_i in the sorted list of destination requests
Sorting networks can be formed by interconnecting nodes of smaller sorting networks (such as 2x2)
Self-routing should be applied in the sorting network
Internally non-blocking and self-routing switch
[Figure: inputs 1 ... 4 with destination requests 1, 4, 3 enter a sorting network (Batcher), which delivers the compact and monotone output addresses 1, 3, 4 to a routing network (banyan) with outputs 1 ... 4]
Sorting to remove internal blocking
A permuted list $(a_1, a_2, \ldots, a_N)$ can be restored to its original order by sorting
A switching network for a maximal connection pattern can be obtained from a sorting network by treating 2x2 sorting elements as 2x2 switching elements
Asymptotic lower bound for the number of 2x2 sorting elements needed to build an NxN sorting network is N log2 N (as for the respective switching network)
- no sorting network found so far that attains this bound
Sequential merge-sorting process can be used to obtain the N log2 N bound for the number of binary sorts
Merge-sorting algorithm
Input: unsorted list $A_N = (a_1, a_2, \ldots, a_N)$
Sort procedure:
$\mathrm{Sort}(A_N) = \mathrm{Merge}\{\mathrm{Sort}(a_1, \ldots, a_{N/2}),\ \mathrm{Sort}(a_{N/2+1}, \ldots, a_N)\}$
Merge procedure:
$\mathrm{Merge}\{(a_1, \ldots, a_m), (a'_1, \ldots, a'_m)\} = \{a_1,\ \mathrm{Merge}((a_2, \ldots, a_m), (a'_1, \ldots, a'_m))\}$ if $a_1 \le a'_1$
$\mathrm{Merge}\{(a_1, \ldots, a_m), (a'_1, \ldots, a'_m)\} = \{a'_1,\ \mathrm{Merge}((a_1, \ldots, a_m), (a'_2, \ldots, a'_m))\}$ if $a_1 > a'_1$
Procedure Merge, called by procedure Sort, takes two sorted lists and merges them by comparing the smallest elements of the two lists
Merge-sorting algorithm (cont.)

Merging of two sorted lists (N/2 numbers in each) requires N binary sorts
Total complexity of sorting N numbers is given by
$C(N) = N + 2C(N/2) = N + 2(N/2 + 2C(N/4)) = \ldots = N \log_2 N$
Due to the sequential nature of procedure Merge, the sorting takes at least O(N) time
Example: compare the numbers at the top of the lists, then merge
- list 1: 1 4 6 10 11 15 17 20
- list 2: 2 5 7 9 12 14 16 24
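The Sort and Merge procedures can be written out directly, with a counter for the binary sorts (comparisons). This is a minimal sketch written iteratively rather than in the recursive form of the notes; the function names `merge` and `merge_sort` are illustrative.

```python
def merge(left, right):
    """Merge two sorted lists; return (merged list, number of comparisons)."""
    merged, comparisons = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1                  # one binary sort of the two list heads
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])               # one list is exhausted, copy the rest
    merged.extend(right[j:])
    return merged, comparisons


def merge_sort(values):
    """Sort by recursive halving and merging; return (sorted list, comparisons)."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, c_left = merge_sort(values[:mid])
    right, c_right = merge_sort(values[mid:])
    merged, c_merge = merge(left, right)
    return merged, c_left + c_right + c_merge


if __name__ == "__main__":
    print(merge([1, 4, 6, 10, 11, 15, 17, 20], [2, 5, 7, 9, 12, 14, 16, 24]))
```

Merging the two example lists takes 15 comparisons, which stays within the bound of N = 16 binary sorts, and the total for sorting N numbers never exceeds N log2 N.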
Odd-even merging
Recursive construction of an odd-even merger
- number of sorting stages is log2 N
- number of sorting elements is 0.5N[log2 N - 1] + 1
[Figure: odd-even merger; two sorted lists a0 ... aN/2-1 and b0 ... bN/2-1 feed an even merger N/2 (outputs c0 ... cN/2-1) and an odd merger N/2 (outputs d0 ... dN/2-1), followed by a final column of 2x2 sorters producing the merged outputs e0, e1, ...]
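The element counts quoted for odd-even merging can be reproduced by generating the comparators of the network. The sketch below uses the widely published recursive formulation of Batcher's odd-even merge sort (index ranges are inclusive); the function names are illustrative and N is assumed to be a power of two.

```python
def oddeven_merge(lo, hi, r):
    """Comparators merging the two sorted halves of positions lo ... hi."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)        # even-indexed subsequence
        yield from oddeven_merge(lo + r, hi, step)    # odd-indexed subsequence
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)


def oddeven_merge_sort(lo, hi):
    """Comparators of a complete sorting network for positions lo ... hi."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort(lo, mid)
        yield from oddeven_merge_sort(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


def apply_network(comparators, values):
    """Run values through the 2x2 sorting elements in order."""
    v = list(values)
    for i, j in comparators:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


if __name__ == "__main__":
    network = list(oddeven_merge_sort(0, 7))
    print(len(network))                               # 19 elements for N = 8
    print(apply_network(network, [6, 4, 8, 3, 7, 2, 8, 1]))
```

For N = 8 the final merger contributes 0.5 x 8 x (3 - 1) + 1 = 9 elements and the whole sorter 19, in agreement with 0.25N[log2 N (log2 N - 1) + 4] - 1.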
Bitonic list
Bitonic list $A_N = (a_1, a_2, \ldots, a_N)$ is a list for which it holds that $a_1 \le a_2 \le \ldots \le a_{k-1} \le a_k$ and $a_k \ge a_{k+1} \ge \ldots \ge a_{N-1} \ge a_N$ ($1 \le k \le N$)
Unique cross-over property - when comparing a monotonically increasing list with a monotonically decreasing list, there is at most one position where the two lists cross over in their values
Bitonic list: 1 4 6 10 11 15 17 20 24 16 14 12 9 7 5 2
Circular bitonic list: 17 20 24 16 14 12 9 7 5 2 1 4 6 10 11 15
[Figure: element-wise comparison of the two halves of each list, with the cross-over position marked]
Bitonic merging
Recursive construction of a bitonic merger
- number of sorting stages is log2 N
- number of sorting elements is 0.5N log2 N
[Figure: bitonic merger; inputs a0 ... aN-1 pass a column of 2x2 sorters whose outputs c0 ... cN/2-1 and d0 ... dN/2-1 feed two bitonic mergers N/2, producing outputs e0 ... eN-1]
Sorting by merging
Recursive construction of a sorting by merging network
- number of sorting stages is 0.5 log2 N (log2 N + 1)
[Figure: inputs a0 ... aN-1 are sorted by a cascade of four mergers N/4, two mergers N/2 and one merger N, producing outputs e0 ... eN-1]
Odd-even sorting network example

Number of sorting stages is 0.5 log2 N (log2 N + 1)
Number of sorting elements is 0.25N[log2 N (log2 N - 1) + 4] - 1
[Figure: 8x8 odd-even sorting network built from 2x2, 4x4 and 8x8 sorters; legend: 2x2 up sorter, 2x2 down sorter]
Bitonic sorting network example

Number of sorting stages is 0.5 log2 N (log2 N + 1)
Number of sorting elements is 0.25N log2 N (log2 N + 1)
[Figure: 8x8 bitonic sorting network built from 2x2, 4x4 and 8x8 bitonic sorters; legend: 2x2 up sorter, 2x2 down sorter]
Batcher-Banyan self-routing network

[Figure: a sorting network (2x2, 4x4 and 8x8 bitonic sorters) followed by a routing network (8x8 router); legend: 2x2 up sorter, 2x2 down sorter]
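A bitonic sorter can be simulated recursively while counting the 2x2 sorting elements it uses. The following is a sketch, not the notes' own construction: idle inputs are modelled by a hypothetical sentinel `IDLE = 8` that sorts after every real destination, the destination values are those of the Batcher-Banyan example in these notes, and N is assumed to be a power of two.

```python
IDLE = 8   # assumed marker for an idle input; larger than any real address 0 ... 7


def bitonic_merge(values, ascending):
    """Sort a bitonic list; return (sorted list, number of 2x2 elements used)."""
    n = len(values)
    if n <= 1:
        return list(values), 0
    half = n // 2
    v = list(values)
    for i in range(half):                 # one column of N/2 two-input sorters
        if (v[i] > v[i + half]) == ascending:
            v[i], v[i + half] = v[i + half], v[i]
    low, c_low = bitonic_merge(v[:half], ascending)
    high, c_high = bitonic_merge(v[half:], ascending)
    return low + high, half + c_low + c_high


def bitonic_sort(values, ascending=True):
    """Sort halves in opposite directions (up/down sorters), then merge."""
    n = len(values)
    if n <= 1:
        return list(values), 0
    half = n // 2
    up, c_up = bitonic_sort(values[:half], True)
    down, c_down = bitonic_sort(values[half:], False)
    merged, c_merge = bitonic_merge(up + down, ascending)
    return merged, c_up + c_down + c_merge


if __name__ == "__main__":
    print(bitonic_sort([6, 4, IDLE, 3, 7, 2, IDLE, 1]))
```

For N = 8 the sorter uses 24 elements, matching 0.25N log2 N (log2 N + 1), and the active destinations leave the sorter compact and in increasing order with the idle lines last.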
Resolving output blocking

Packet switches do not maintain a scheduler for dedicating time-slots for packets (at the inputs) => output conflicts possible => output conflict resolution needed on slot by slot basis
Output conflicts solved by
polling (e.g. round robin, token circulation)
- does not scale for large numbers of inputs
- outputs just served have an unfair advantage in getting a new time-slot
sorting networks (making a banyan network internally non-blocking)
An example of sorting networks is the sort-purge-concentrate network
- when sorting self-routing addresses, duplicated output requests appear adjacent to each other in the sorted order
- all but one of the duplicated requests have to be purged (deleted)
- successful delivery is acknowledged and purged packets are re-sent
Sort-purge-concentrate network
A sorting network can easily handle packet priority by adding a priority field in the self-routing address
- higher priority packets are placed in a favorable position before purging
- support of priority is an essential feature when integrating circuit and packet switching in a sort-banyan network

Stage                                      Destination addresses
Inputs 1 ... 4 (Dest. j_i)                 3 1 4 3
After sorting network (sorted)             1 3 3 4
After purge and concentration networks     1 3 4 (compact and sorted)

The compact and sorted addresses are then delivered by the routing network (banyan) to outputs j = 1 ... 4
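The sort, purge and concentrate steps can be illustrated in software. This is a behavioural sketch only (a real fabric does this in hardware stages); the function name `sort_purge_concentrate` and the (input, destination) tuple format are assumptions, and the tie-break between duplicates simply follows input order.

```python
def sort_purge_concentrate(requests):
    """requests: list of (input, destination) pairs of the active inputs.

    Returns (winners, purged): winners are compact and strictly increasing
    in destination, purged packets have to be re-sent in a later slot.
    """
    ordered = sorted(requests, key=lambda r: r[1])    # sorting network
    winners, purged = [], []
    for request in ordered:
        # duplicates are adjacent after sorting, so one comparison suffices
        if winners and winners[-1][1] == request[1]:
            purged.append(request)                    # purge network
        else:
            winners.append(request)
    # concentration: winners occupy consecutive lines from the top
    return winners, purged


if __name__ == "__main__":
    winners, purged = sort_purge_concentrate([(1, 3), (2, 1), (3, 4), (4, 3)])
    print([dest for _, dest in winners])              # [1, 3, 4]
    print(purged)                                     # [(4, 3)]
```

Adding a priority field would amount to sorting on (destination, priority) instead of destination alone, so that the preferred packet of each duplicate group is the one that survives purging.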
Resolving HOL blocking
HOL blocking is solved by
allowing packets behind a HOL packet to contend for outputs
allowing multiple delivery of conflicting HOL packets to an output buffer
- multiple rounds of arbitration for sort-banyan network
- multiple planes of sort-banyan networks
A good solution is to implement multiple input buffers (one for each output if possible); if the packet in turn cannot be transmitted due to HOL, transmit another packet from another buffer
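The effect of letting packets behind the head of the queue contend can be seen in a one-slot toy model. This is a simplified sketch with assumed names (`fifo_slot`, `per_output_slot`) and an assumed arbitration rule (the lowest-numbered input wins a contended output); it is not a model of any particular switch.

```python
def fifo_slot(queues):
    """One time-slot with FIFO input queues: only head-of-line packets contend."""
    granted = {}                          # output -> input that won it
    for i, queue in enumerate(queues):
        if queue and queue[0] not in granted:
            granted[queue[0]] = i
    for i in granted.values():
        queues[i].pop(0)                  # winners leave their queues
    return granted


def per_output_slot(queues):
    """One time-slot when an input may offer any buffered packet."""
    granted = {}
    for i, queue in enumerate(queues):
        # first buffered destination whose output is still free in this slot
        dest = next((d for d in queue if d not in granted), None)
        if dest is not None:
            granted[dest] = i
            queue.remove(dest)            # removes the oldest packet for dest
    return granted


if __name__ == "__main__":
    # Each list holds the destinations queued at one input, head first
    print(fifo_slot([[3, 1], [3, 2]]))        # {3: 0}
    print(per_output_slot([[3, 1], [3, 2]]))  # {3: 0, 2: 1}
```

With FIFO queues the second input is blocked behind its head packet for output 3, although output 2 is idle; with output-specific buffering both inputs transmit in the same slot.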
Construction of a multipoint packet switch

In a self-routing multipoint switch
- incoming packets are destined to multiple outputs
- packets carry all destination addresses in their headers
[Figure: copy network (copy distribution network) followed by a routing network (banyan switch)]
Batcher-Banyan example
Source/dest. (destination in decimal; *** = idle input)
000/110 (6)
001/100 (4)
010/*** (-)
011/011 (3)
100/111 (7)
101/010 (2)
110/*** (-)
111/001 (1)
[Figure: the sorting network (2x2, 4x4 and 8x8 bitonic sorters built from 2x2 up and down sorters) orders the destinations as 1 2 3 4 6 7 - -, after which the routing network (8x8 router) delivers each packet to its output]
Switch fabrics
- Multipoint switching
- Self-routing networks
- Sorting networks
- Fabric implementation technologies
- Fault tolerance and reliability
Fabric implementation technologies
Time division fabrics
- Shared media
- Shared memory
Space division fabrics
- Crossbar
- Multi-stage constructions
Buffering techniques
Time division fabrics
Shared media
- Bus architectures
- Ring architectures
Shared memory
Shared bus
Bus architecture
Switching in time domain, but time and space switching implementations enabled
Easy to implement and low cost (cost index = N)
One time-slot carried through the bus at a time
=> limited throughput (multi-casting possible)
=> low number of line interfaces
=> limited scalability
[Figure: line interfaces #1 ... #n and bus control attached to a shared bus]
Shared bus (cont.)

Bus architecture
Internally non-blocking implementations require a high capacity switching bus => throughput ≥ aggregate capacity of line interfaces
Inherently a single stage switch, but TST-switching possible if line-cards support time-slot interchange
Multiple-bus structures can be used to improve reliability and increase throughput
[Figure: line interfaces #1 ... #n and bus control attached to two buses, Bus 1 and Bus 2]
Ring architectures
Ring architecture
Rings are coarsely divided into source and destination release rings
- in source release (SR) rings only one switching operation is in progress at a time => limited throughput (like a shared bus)
- destination release (DR) rings allow spatial reuse, i.e., multiple time-slots can be carried through the ring simultaneously => improved throughput
Switching in time domain, but time and space switching implementations enabled
Usually easy to implement and low cost (cost index = N)
Scales better than a shared bus
Ring architectures (cont.)

Ring architecture
Internally non-blocking implementations require that throughput of a ring bus ≥ aggregate capacity of line interfaces
Throughput can be improved by implementing parallel ring buses
- control usually distributed => MAC implementations may be difficult
Multi-casting relatively easy to implement
Inherently a single stage switch, but TST-switching possible if line-cards support time-slot interchange
Multiple rings can be used to implement switching networks
Ring architectures (cont.)

Dual ring architecture
Multiple rings used to improve throughput, decrease internal blocking, improve scalability and increase reliability
Shared memory
Shared memory architecture
Switching in time domain, but time and space switching implementations enabled
Inherently a single stage switch, but allows TST-switching if line-cards support time-slot interchange
Easy to implement and low cost (cost index = N)
[Figure: line interfaces #1 ... #n, CPU and buffer memory attached to a shared bus]
Shared memory (cont.)

Every time-slot is carried twice through the bus
=> low throughput
=> low number of line interfaces
=> limited scalability
Internally non-blocking if throughput of the switching bus and speed of the shared memory ≥ aggregate capacity of line interfaces
Performance can be improved by a dual bus architecture or by replacing the bus with a space switch (such as a crossbar)
Shared memory (cont.)


Dual-bus architecture improves throughput, decreases internal blocking, improves scalability and increases reliability
Memory speed requirement equal to that of single bus solutions
[Figure: line interfaces #1 ... #n, CPU and buffer memory attached to two buses, Bus 1 and Bus 2]
Dimensioning example
A shared memory architecture, which uses a shared bus to connect line interfaces to the memory, is used to implement a switching equipment. The bus is 32 bits wide and the bus clock is 150 MHz. Three clock cycles are needed to transfer a 32 bit word through the bus and 20 % of the bus capacity is used for other than switching purposes.
How many E1 interfaces can be supported by the switch? What is the required memory speed?
Solution: If the bus transfers one eight bit time-slot (of a 64 kbit/s PDH channel) across the bus at a time, a single bus solution can transfer 0.8 x (150/3) Mbytes/s = 40 Mbytes/s
Dimensioning example (cont.)

Solution (cont.): In a single bus solution, half of the bus capacity (20 Mbytes/s) is used for storing time-slots to memory and the other half for reading time-slots from memory
=> memory speed requirement is 1/(20 Mbytes/s) = 50 ns
=> during a 125 µs period (= duration of an E1 frame) the bus switches 125 x 20 bytes = 2500 time-slots and the number of supported E1 links is 2500/32 ≈ 78
Throughput of the switching system can be increased by adding a 32 bit receive-register to the shared switch memory block, which makes it possible to transfer 4 time-slots (in parallel) through the bus at a time. By doing so, the throughput of the bus becomes fourfold and the number of supported E1 links increases to 312. Time-slots are still written one by one to the switch memory, and thus the memory speed requirement is 12.5 ns.
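The arithmetic of the dimensioning example can be replayed in code, which makes it easy to vary the bus parameters. This is a sketch that follows the notes' own conventions (one byte per time-slot, each time-slot crossing the bus twice, 32 time-slots per E1 frame, 8000 frames per second); the function and parameter names are illustrative.

```python
def dimension(slots_per_transfer=1, clock_hz=150_000_000,
              cycles_per_transfer=3, switching_share_percent=80):
    """Return (supported E1 links, memory cycle time in ns)."""
    # Bus transfers per second available for switching
    transfers_per_s = (clock_hz // cycles_per_transfer
                       * switching_share_percent // 100)
    bus_slots_per_s = transfers_per_s * slots_per_transfer
    # Each time-slot crosses the bus twice: write to memory, read from memory
    switched_slots_per_s = bus_slots_per_s // 2
    slots_per_frame = switched_slots_per_s // 8000    # 125 us frame period
    e1_links = slots_per_frame // 32                  # 32 time-slots per E1 frame
    memory_cycle_ns = 1e9 / switched_slots_per_s
    return e1_links, memory_cycle_ns


if __name__ == "__main__":
    print(dimension(slots_per_transfer=1))   # (78, 50.0)
    print(dimension(slots_per_transfer=4))   # (312, 12.5)
```

The second call corresponds to the 32 bit receive-register variant, in which four time-slots cross the bus in one transfer.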
Crossbar
Crossbar architecture
Inherently a space division switch
Allows building TST-switches if interfaces implement time-slot interchange functionality
Hard to implement large switches due to complicated control schemes => high cost (cost index = $N^2$)
Commercial high-speed NxN crossbar components enable modular and relatively inexpensive fabric constructions, but control of the switch is still a problem
Crossbar (cont.)
Crossbar architecture
Inherently a strict-sense non-blocking fabric architecture
Possible to carry N time-slots through the switch at a time
=> high throughput
=> possible to implement a large number of line interfaces
=> scales well within the limits of available modular components
=> scaling up means an increase of cross-point count from NxN to (N+k)x(N+k)
Multi-casting easy to implement
Crossbar (cont.)
Example implementation of a crossbar
[Figure: crossbar with inputs 1 ... m; each cross-point is an AND gate enabled by a connection control signal c (C - connection control) and driving one of the outputs]
Crossbar (cont.)
An 8x8 switch constructed of four 4x4 crossbar blocks
Notice that doubling the input/output count increases the number of crossbar components from one to four.
Multi-stage building blocks

Multi-stage switches are usually constructed of 2x2 switching blocks
Implemented usually in FPGAs (Field Programmable Gate Arrays) and/or ASICs (Application Specific Integrated Circuits)
- FPGAs for experimental use and low volume production
- ASICs for high volume production
Batcher-banyan network most popular
Used to implement space division switching
Allows building TST-switches if interfaces implement time-slot interchange functionality
[Figure: switch network with inputs In1 ... In8 and outputs Out1 ... Out8, composed of 2x2 blocks (In1, In2, Out1, Out2)]
Multi-stage building blocks (cont.)

Hard to implement large circuit switches due to complicated control schemes (especially rearrangeable fabrics) => high cost (cost index C N log2 N)
Suitable for packet switching when self-routing functionality is included
Fixed duration time-slot implementations favored to obtain strict-sense non-blocking fabrics
Possible to carry N time-slots through the switch at a time
=> relatively high throughput
=> scalable only if larger networks can be factored using smaller NxN components
=> scaling up means an increase of cross-point count from C N log2 N to C(N+k) log2 (N+k)
[Figure: 2x2 switching block with inputs In1, In2 and outputs Out1, Out2]
Problems with multi-stages

Path search required
Fast connection establishment implies a need for a fast control system => part of the switching capacity is lost if the control system is not fast enough
Multi-cast is not self-evident, because multi-cast complicates the path search and control scheme and increases blocking probability
Multi-slot connections (i.e. several slots used for a particular connection) complicate matters
- especially if path delay is not constant, e.g., slots belonging to the same connection may arrive at the outputs in a different order than they were in at the inputs
- blocking increases
Trends in fabric technologies

Memory technology is getting faster and faster
Current SRAM (Static Random Access Memory) technology allows easy implementations of large PDH switches, e.g. a full matrix for 8000 E1 (2M) PDH circuits
- bigger fabrics hardly needed in narrow band networks
=> in narrow band networks the trend over the last 10 years has been to build full matrix fabrics based on shared memory
However, when striving for broadband communications, memory based switch fabrics do not scale to the bandwidth needed => multi-stage and crossbar switches have their chance
Trends in fabric technologies (cont.)

Multistage fabrics were reinvented at the advent of ATM
- ATM suits perfectly for fixed length time-slot switching
- self-routing and sorting apply to ATM cell routing
- blocking and buffering cause headaches
=> in spite of a huge research effort, there have been very few commercial multi-stage fabrics available (mostly proprietary ASICs)
Development of IC technologies, i.e. increased packing density (number of gates/chip) and increased speed, has enabled crossbar fabrics suitable for high-speed switching applications (N = 2 ... 64 and line rate 2.5 ... 40 Gbit/s)
- examples: Cx27399/Mindspeed, ETT1/Sierra, CE200/Internet Machines and PI140xx/Agere
Packet switching and the advent of optical networking favor multistages and crossbars => packet switching introduces a new problem - buffering
Technological tradeoffs in switch fabric design

When trying to simplify path search and to speed up connection establishment
=> bus speed increases (inside fabric)
=> faster memory required
=> power consumption increases
=> integration level of a cross-point product needs to be increased
=> faster memory required, etc.
If fast memory is not available, use
=> crossbar fabrics (for small switches)
=> multistage fabrics (for large switches)
- real switching capacity may be less than theoretical
- minimization of cross-point count often pointless
[Figure: tradeoff between level of crosspoint integration, memory speed, complexity of path search and bus speed]
Electronic design problems

Signal skew - caused by long signal lines with varying capacitive load inside the switch fabric and/or on circuit boards
Mismatching line termination - caused by long signal lines combined with varying (high) bit rates
Varying delay on bus lines - caused by differently routed bus lines (non-uniform capacitive load)
Crosstalk - caused by electro-magnetic coupling of signals from adjacent signal lines
Power feeding and voltage-swing - incorrectly dimensioned power source/lines cause non-uniform voltage and lack of adequate filtering causes fluctuation of voltage
Mismatching timing signals - different line lengths from a centralized timing source cause phase shift and distributed timing may suffer from lack of adequate synchronization
Some design limitations

Speed of available components vs. required wire speed and slot time interval
Component packing density and power consumption vs. heating problem
Maximum practical fan-out vs. required size of fabric
Required bus length inside switch fabric
- long buses decrease internal speed of fabric
- diagnostics get difficult
IPR policy - whether the company wants to use special components or more general all-purpose components
Design optimization example

An NxN switch fabric is to be designed and there are three alternative crossbar components a, b and c available
- a is an $N_a \times N_a$ fabric component
- b is an $N_b \times N_b$ fabric component
- c is an $N_c \times N_c$ fabric component and $N_a < N_b < N_c \le N$
Component a has entered the market at time $t_a$, b at time $t_b$ and c at time $t_c$
Product development starts at $t_{pd}$ and the switch product should come to the market at $t_m$. Components are expected to be available when the product development starts => $t_a < t_b < t_c \le t_{pd} < t_m$
Price of a component develops with time and is generally given by $P(t) = C f(t) + D$, where $C f(t)$ is a time dependent and $D$ a constant part of the component's price
Question: Which one of the three components should be chosen for constructing an NxN switch fabric?
Design optimization example (cont.)

As an example, let's assume that the price of each component is a function of time and is given by $P(t) = C e^{-t/T} + D$, where C, D and T are component specific constants
=> $P_a(t) = C_a e^{-t/T_a} + D_a$, $P_b(t) = C_b e^{-t/T_b} + D_b$ and $P_c(t) = C_c e^{-t/T_c} + D_c$
Number of alternative crossbar components needed to build an NxN switch
=> $K_a = \lceil N/N_a \rceil^2$, $K_b = \lceil N/N_b \rceil^2$, $K_c = \lceil N/N_c \rceil^2$
Alternative component costs as a function of time t, counted from the market entry of each component
=> $P_a(t) = C_a e^{-(t - t_a)/T_a} + D_a$
=> $P_b(t) = C_b e^{-(t - t_b)/T_b} + D_b$
=> $P_c(t) = C_c e^{-(t - t_c)/T_c} + D_c$
These functions can be used to draw price development curves to make comparisons

Numerical example:
Let $N = 64$, $N_a = 16$, $N_b = 32$, $N_c = 64$, $T_a = T_b = T_c = 3$ time units (years), $C_a = 20$, $C_b = 50$, $C_c = 100$ and $D_a = 10$, $D_b = 20$, $D_c = 40$ price units (euros)
Product development period is assumed to be 1 time unit (year) and $t_b = t_a + 1.5$, $t_c = t_a + 3$, $t_m = t_a + 4$ => $t_{pd} = t_a + 3$
Choosing $t_{pd} = t_0 = 0$ => $t - t_a = t + 3$, $t - t_b = t + 1.5$, $t - t_c = t$, $t - t_m = t - 1$ ($t \ge t_{pd} = 0$)
Number of components needed: $K_a = 16$, $K_b = 4$, $K_c = 1$
Switch fabric component cost functions
=> $P_a(t) = 16[20 e^{-(t+3)/3} + 10]$
=> $P_b(t) = 4[50 e^{-(t+1.5)/3} + 20]$
=> $P_c(t) = 100 e^{-t/3} + 40$
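The three fabric cost functions of the numerical example are easy to tabulate. The sketch below takes the constants from the example and measures time from the start of product development (t_pd = 0); the table layout and the names `COMPONENTS` and `fabric_cost` are illustrative assumptions.

```python
from math import ceil, exp

# name: (component size, C, D, T, market entry time relative to t_pd = 0)
COMPONENTS = {
    "a": (16, 20.0, 10.0, 3.0, -3.0),
    "b": (32, 50.0, 20.0, 3.0, -1.5),
    "c": (64, 100.0, 40.0, 3.0, 0.0),
}


def fabric_cost(name, t, n=64):
    """Cost of all components of one type needed for an n x n fabric at time t."""
    size, c, d, tau, t_entry = COMPONENTS[name]
    count = ceil(n / size) ** 2                        # K = ceil(N/N_x)^2
    unit_price = c * exp(-(t - t_entry) / tau) + d     # P(t) = C e^{-(t-t_x)/T} + D
    return count * unit_price


if __name__ == "__main__":
    for t in range(0, 7):
        costs = {name: round(fabric_cost(name, t), 1) for name in COMPONENTS}
        print(t, costs)
```

With these constants component c gives the lowest fabric cost at every tabulated time, including the planned market entry at t = 1.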

Numerical example (cont.) :
[Figure: two plots of cost versus time (t = 0 ... 6); "Component cost" shows Pa(t), Pb(t) and Pc(t), and "Switch fabric cost" shows Ta(t), Tb(t) and Tc(t)]
Although the price of component c is manifold compared to the price of component a or b, c turns out to be the cheapest alternative
Another reason to choose c is that it probably stays longest in the market, giving more time for the switch product
Switch Fabrics
Fabric implementation technologies
- Time division fabrics: shared media, shared memory
- Buffering techniques
Buffering alternatives
- Input buffering
- Output buffering
- Central buffering
- Combinations: input-output buffering, central-output buffering
Input buffering
Buffer memories at the input interfaces
[Figure: input buffering in front of the switch fabric]
Input buffering (cont.)

Pros
- required memory access speed
  - in FIFO and dual-port RAM solutions equal to incoming line rate
  - in one-port RAM solutions twice the incoming line rate
- speed of switch fabric
  - multi-stages and crossbars operate at input wire speed
  - shared media fabrics operate at the aggregate speed of inputs
- low cost solution (due to low memory speed)
Cons
- FIFO type of buffering => HOL problem
- buffer size may be large (due to HOL)
- HOL avoided by having a buffer for each output at each input
Output buffering
Buffer memories at the output interfaces
[Figure: output buffering behind the switch fabric]
Output buffering (cont.)

Pros
- better throughput/delay performance than in input buffered systems
- no HOL problem
Cons
- access speed of buffer memory
  - in FIFO and dual-port RAM solutions N times the incoming line rate
  - in one-port RAM solutions N+1 times the incoming line rate
- high cost due to high memory speed requirement
- switch fabric operates at the aggregate speed of inputs (N x wire speed)
Central buffering
Buffer memory located between two switch fabrics
- shared by all inputs/outputs
- virtual buffer for each input or output
[Figure: switch fabric 1, central buffering, switch fabric 2]
Central buffering (cont.)

Pros
- smaller buffer size requirement and lower average delay than in input or output buffering
- HOL problem can be avoided
Cons
- speed of buffer memory
  - in dual-port RAM solutions larger than N times the incoming line rate
  - in one-port RAM solutions larger than 2xN times the incoming line rate
- speed of switch fabric N x wire speed
- complicated buffer control
- high cost due to high memory speed requirement and control complexity
Input-output buffering
Input-output buffering is common in QoS aware switches/routers
- inputs implement output specific buffers to avoid HOL
- outputs implement dedicated buffers for different traffic classes
- combined buffering distributes buffering complexity between inputs and outputs
[Figure: input buffering, switch fabric, output buffering]
Input-central buffering
Input-central buffering is used in QoS aware switches/routers
- inputs implement output specific buffers to avoid HOL
- central buffer implements dedicated buffers for different traffic classes for each output
[Figure: input buffering, switch fabric 1, central buffering, switch fabric 2]
Summary of buffering techniques
Buffering principle   Memory space        Memory speed             Memory control   Queueing delay   Multi-casting capabilities
Input buffering       high (due to HOL)   slow (~input rate)       simple           longest          extra logic needed
Output buffering      medium              fast (~N x input rate)   simple           medium           supported
Central buffering     low                 fast (~N x input rate)   complicated      shortest         supported but complex
Priorities and buffering

Separate buffer for each traffic class
A scheduler is needed to control the transmission of data
- highest priority served first
- longest queue served first
- minimization of lost packets/cells
Priority given to high quality traffic
- low delay and delay variation traffic
- low loss rate traffic
- best customer traffic
Scheduling principles
- round robin
- weighted round robin
- fair queuing
- weighted fair queuing
- etc.
[Figure: output/central buffering with separate buffers for class 1, class 2, class 3 and class 4]
Basic memory types for buffering
- FIFO (First-In-First-Out)
- RAM (Random Access Memory)
- Dual-port RAM
[Figure: FIFO; RAM with a shared read/write port; dual-port RAM with separate write and read ports]
Fault tolerance and reliability
- Definitions
- Fault tolerance of switching systems
- Modeling of tolerance and reliability
Definitions

Failure, malfunction - a deviation from the intended/specified performance of a system
Fault - a state of a device or a program which can lead to a failure
Error - an incorrect response of a program or module. An error is an indication that the module in question may be faulty, that the module has received wrong input or that it has been misused.
An error can lead to a failure if the system is not tolerant to this sort of error. A fault can exist without any error taking place.
Fault tolerance
Fault tolerance is the ability of a system to continue its intended performance in spite of a fault or faults
A switching system is an example of a fault tolerant system
Fault tolerance always requires redundancy of some sort
Categorization of faults
Duration based
- permanent or stuck-at (stuck at zero or stuck at one)
- intermittent - fault requires repair actions, but its impact is not always observable
- transient - fault can be observed for a short period of time and disappears without repair
Observable or latent (hidden)
Based on the scope of the impact (serious - less serious)
Graceful degradation
Capability of a system to continue its functions under one or more faults, but on a reduced level of performance
For example
- in some RAID (Redundant Array of Inexpensive Disks) configurations, write speed drops in case of a disk fault, but operation continues on a lower level of performance even while the fault has not been repaired
Reliability and availability

Reliability R(t) - probability that a system does not fail within time t under the condition that it was functioning correctly at t = 0
- for all known man-made systems $R(t) \to 0$ when $t \to \infty$
Availability A(t) - probability that a system will function correctly at time t
- for a system that can be repaired, A(t) approaches some value asymptotically during the useful lifetime of the system
Repairable system
Maintainability M(t) - probability that a system is returned to its correct functioning state during time t under the condition that it was faulty at time t = 0
MTTF, MTTR and MTBF

MTTF (Mean-Time-To-Failure) - expected value of the time duration from the present to the next failure
MTTR (Mean-Time-To-Repair) - expected value of the time duration from a fault until the system has been restored into a correct functioning state
MTBF (Mean-Time-Between-Failures) - expected value of the time duration from occurrence of a fault until the next occurrence of a fault
MTBF = MTTF + MTTR
High availability of a switching system

High availability of a switching system is obtained by maintenance software
Supervision - detection of errors and faults
- in a unit under normal working load
- HW implementation => fast
- SW implementation => detection delay
Alarm system - fault analysis and pinpointing
- often a rule based system
Recovery - elimination of faults
- utilizes redundancy
- switch-overs: active <=> standby
- restarts: a single program, a preprocessor, a single main processor, whole system
- fall back to previous SW package
Diagnostics - fault location
- in a unit temporarily without normal load
Maintenance software is one of the most important software sub-systems in a switching system, in parallel with call/connection control and charging
Main types of redundancy

Hardware redundancy
- duplication (1+1) - need for self-checking-recovery blocks that detect their own faults
- n+r principle (n active units and r standby units)
Software redundancy
- required always in telecom systems
Information redundancy
- parity bits, block codes, etc.
Time redundancy
- delayed re-execution of transactions
Modeling of reliability
Combinatorial models
Markov analysis
Other modeling techniques (not covered here)
- Fault tree analysis
- Reliability block diagrams
- Monte Carlo simulation
Combinatorial reliability
A serial system S functions if and only if all its parts $S_i$ ($1 \le i \le n$) function
=> $R_s = \prod_{i=1}^{n} R_i$ and $F_s = 1 - R_s$
[Figure: S1, S2, ..., Sn connected in series]
Failures in sub-systems are supposed to be independent
A parallel (replicated) system fails if all its subsystems fail
=> $F_s = \prod_{i=1}^{n} (1 - R_i)$ and $R_s = 1 - F_s = 1 - \prod_{i=1}^{n} (1 - R_i)$
[Figure: S1, S2, ..., Sn connected in parallel]
Reliability of a duplicated system ($R_i = R$) is $R_s = 1 - (1 - R)^2$
Combinatorial reliability example 1

Calculate reliability Rs and failure probability Fs of system S given that failures in sub-systems Si are independent and for some time interval it holds that R1 = 0.90, R2 = 0.95 and R3 = R4 = 0.80
[Figure: system S - S1 and S2 in series with block S3-4, which consists of S3 and S4 in parallel]
=> $R_s = \prod_i R_i = R_1 \times R_2 \times R_{3-4}$
=> $R_{3-4} = 1 - \prod_i (1 - R_i) = 1 - (1 - R_3)(1 - R_4)$
=> $R_s = R_1 \times R_2 \times [1 - (1 - R_3)(1 - R_4)]$
=> $F_s = 1 - R_s = 1 - R_1 \times R_2 \times [1 - (1 - R_3)(1 - R_4)]$
=> Rs = 0.82 and Fs = 0.18
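The calculation of example 1 can be checked with a few lines of Python. This is only an illustrative sketch; the helper names `series` and `parallel` are not part of the lecture material.

```python
from math import prod

def series(reliabilities):
    """Reliability of a serial system: every part must function."""
    return prod(reliabilities)

def parallel(reliabilities):
    """Reliability of a parallel system: it fails only if every part fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Example 1: S1 and S2 in series with the parallel pair S3 || S4
r34 = parallel([0.80, 0.80])
rs = series([0.90, 0.95, r34])
fs = 1.0 - rs
print(f"R3-4 = {r34:.2f}, Rs = {rs:.4f}, Fs = {fs:.4f}")
```

Rounded to two decimals the result agrees with Rs = 0.82 and Fs = 0.18.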
Combinatorial reliability (cont.)

A load sharing system functions if at least m of the total of n sub-systems function
If failures in sub-systems Si are independent, then the probability that the system fails is
$P(\text{fails}) = P(k < m)$
and the probability that it functions is
$P(\text{functioning}) = P(k \ge m) = 1 - P(k < m)$
where k is the number of functioning sub-systems
$P(k \ge m) = \sum_{i=m}^{n} P(k = i)$ and $P(k < m) = \sum_{i=0}^{m-1} P(k = i)$
[Figure: sub-systems S1, S2, ..., Sn feeding an m/n block]
Combinatorial reliability example 2

As an example, suppose we have a system with m=2 and n=4 in which each of the four sub-systems has a different R, i.e. R1, R2, R3 and R4, and failures in sub-systems Si are independent
[Figure: sub-systems S1, S2, S3 and S4 feeding a 2/4 block]
Probability that the system fails is
$P(\text{fails}) = P(k<2) = \sum_{i=0}^{1} P(k=i) = P(k=0) + P(k=1)$
P(k=0) and P(k=1) can be derived to be
$P(k=0) = (1-R_1)(1-R_2)(1-R_3)(1-R_4)$
$P(k=1) = R_1(1-R_2)(1-R_3)(1-R_4) + (1-R_1)R_2(1-R_3)(1-R_4) + (1-R_1)(1-R_2)R_3(1-R_4) + (1-R_1)(1-R_2)(1-R_3)R_4$
If R1 = 0.9, R2 = 0.95, R3 = 0.85 and R4 = 0.8 then Rs = 0.994 and Fs = 0.0058
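For sub-systems with different reliabilities the m/n probability can also be obtained by brute-force enumeration of all up/down combinations. The sketch below (function name chosen here for illustration) is practical only for small n.

```python
from itertools import product

def m_of_n_reliability(m, reliabilities):
    """P(at least m of the independent sub-systems function)."""
    total = 0.0
    # Enumerate every up/down combination of the sub-systems.
    for states in product([True, False], repeat=len(reliabilities)):
        if sum(states) >= m:
            p = 1.0
            for up, r in zip(states, reliabilities):
                p *= r if up else (1.0 - r)
            total += p
    return total

rs = m_of_n_reliability(2, [0.90, 0.95, 0.85, 0.80])
fs = 1.0 - rs
print(f"Rs = {rs:.4f}, Fs = {fs:.4f}")
```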
Combinatorial reliability (cont.)

If failures in sub-systems Si of an m/n system are independent and $R_i = R$ for all $i \in [1,n]$, then the system is a Bernoulli system and the binomial distribution applies
=> $R_s = \sum_{k=m}^{n} \binom{n}{k} R^k (1-R)^{n-k}$
[Figure: sub-systems S1, S2, ..., Sn feeding an m/n block]
For a system of m/n = 2/3
=> $R_{2/3} = \sum_{k=2}^{3} \frac{3!}{k!\,(3-k)!} R^k (1-R)^{3-k} = 3R^2 - 2R^3$
If for example R = 0.9 => $R_{2/3} = 0.972$
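The binomial formula is easy to evaluate numerically. A minimal sketch (the function name is an assumption made here):

```python
from math import comb

def bernoulli_m_of_n(m, n, r):
    """Reliability of an m/n system with n identical, independent parts."""
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(m, n + 1))

r23 = bernoulli_m_of_n(2, 3, 0.9)
print(f"R2/3 = {r23:.3f}")
```

For m/n = 2/3 the function reduces to the closed form $3R^2 - 2R^3$ given above.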
Computing MTTF
$MTTF = \int_0^\infty R(t)\,dt$ - valid for any reliability distribution
Single component with a constant failure rate (CFR)
- $R(t) = e^{-\lambda t}$
- $MTTF = 1/\lambda$
Serial systems with n CFR components
- $R_s(t) = R_1(t) \times R_2(t) \times \dots \times R_n(t) = e^{-(\lambda_1 + \lambda_2 + \dots + \lambda_n)t} = e^{-\lambda_s t}$
- $\lambda_s = \lambda_1 + \lambda_2 + \dots + \lambda_n$
- $MTTF_s = 1/\lambda_s$
- $1/MTTF_s = 1/MTTF_1 + 1/MTTF_2 + \dots + 1/MTTF_n$
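The serial CFR rule says that failure rates add, so the system MTTF is the reciprocal of the summed reciprocals. A small sketch with hypothetical component values (the MTTF figures below are illustrative only, not taken from the lecture):

```python
def series_mttf(mttfs):
    """MTTF of a serial system of constant-failure-rate components."""
    failure_rate = sum(1.0 / m for m in mttfs)  # lambda_s = sum of lambda_i
    return 1.0 / failure_rate

# Hypothetical component MTTFs in hours
print(series_mttf([100_000, 200_000, 200_000]))
```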
Telecom exchange reliability from subscriber's point of view
[Figure: reliability chain seen by a subscriber - line-card, subscriber module control and subscriber call control, followed by the centralized functions (exchange terminal, CCS7 signaling, processors with (n-1)/n redundancy)]
(n-1)/n operational processors for call setup
Chosen processor functions during a call
Premature release requirement $P \le 2 \times 10^{-5}$ applied

Failure intensity
Unit of failure intensity is defined to be [$\lambda$] = fit = number of faults / $10^9$ h
Failure intensities of replaceable plug-in units vary in the range 0.1 - 10 kfit
Example: if failure intensity of a line-card in an exchange is 2 kfit, what is its MTTF?
$MTTF = 1/\lambda = \frac{10^9\ \mathrm{h}}{2000} = 500\,000\ \mathrm{h} = \frac{1\,000\,000}{2 \times 24 \times 360}$ years $\approx 58$ years (using a 360-day year)
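The fit-to-MTTF conversion can be written out as follows. This is a sketch; the 360-day year is kept as the default only because the slide's arithmetic uses it, and the function names are chosen here.

```python
FIT_HOURS = 1e9  # 1 fit = 1 fault per 10^9 hours

def mttf_hours(fit):
    """MTTF in hours for a failure intensity given in fit."""
    return FIT_HOURS / fit

def hours_to_years(hours, days_per_year=360):
    # 360-day year as in the slide; pass 365 for calendar years
    return hours / (24 * days_per_year)

h = mttf_hours(2000)  # 2 kfit line-card
print(h, round(hours_to_years(h)), round(hours_to_years(h, 365)))
```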
Reliability modeling using Markov chains

Markov chains
A system is modeled as a set of states and transitions
Each state corresponds to fulfillment of a set of conditions and each transition corresponds to an event that moves the system from one state to another
[Figure: two states, State 1 and State 2, connected by transitions]
By using this method it is possible to find the reliability behavior of a complex system having a number of states and non-independent failure modes
Markov chain modeling

A set of states and transitions leads to a group of linear differential equations
For a given modeling goal it is essential to choose a minimal set of states so that the equations are easy to solve
By setting the derivatives of the probabilities to zero an asymptotic state is obtained, if such exists
[Figure: two-state model - state P0 moves to state P1 at rate $\lambda$ and back at rate $\mu$]
$\lambda$ = failure intensity
$\mu$ = repair intensity (repair time is exponentially distributed)
$P_i$ = probability of state i, e.g. $P_0 = R(t)$ and $P_1 = F(t)$
Markov chain modeling (cont.)

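Probabilities ($\pi_i$) of the states and transition rates ($\lambda_{ij}$) between the states are tied together with the following formula
$\pi \Lambda = 0$
where $\pi = [\pi_1\ \pi_2\ \dots\ \pi_n]$ and
$$\Lambda = \begin{bmatrix} -(\lambda_{12}+\lambda_{13}+\dots) & \lambda_{12} & \lambda_{13} & \cdots \\ \lambda_{21} & -(\lambda_{21}+\lambda_{23}+\dots) & \lambda_{23} & \cdots \\ \lambda_{31} & \lambda_{32} & -(\lambda_{31}+\lambda_{32}+\dots) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$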

Example
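[Figure: three states S1, S2 and S3 with transition rates $\lambda_{12}$, $\lambda_{21}$, $\lambda_{13}$, $\lambda_{31}$, $\lambda_{23}$ and $\lambda_{32}$]
$$\Lambda = \begin{bmatrix} -(\lambda_{12}+\lambda_{13}) & \lambda_{12} & \lambda_{13} \\ \lambda_{21} & -(\lambda_{21}+\lambda_{23}) & \lambda_{23} \\ \lambda_{31} & \lambda_{32} & -(\lambda_{31}+\lambda_{32}) \end{bmatrix}$$
and $\pi = [\pi_1\ \pi_2\ \pi_3]$, so that $\pi\Lambda = 0$ gives
$-(\lambda_{12}+\lambda_{13})\pi_1 + \lambda_{21}\pi_2 + \lambda_{31}\pi_3 = 0$
$\lambda_{12}\pi_1 - (\lambda_{21}+\lambda_{23})\pi_2 + \lambda_{32}\pi_3 = 0$
$\lambda_{13}\pi_1 + \lambda_{23}\pi_2 - (\lambda_{31}+\lambda_{32})\pi_3 = 0$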
Birth-death process
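Birth-death process is a special case of a continuous-time Markov chain, which models the size of a population that increases by one (birth) or decreases by one (death).
[Figure: states S0, S1, S2, S3, ... with birth rates $\lambda_0, \lambda_1, \lambda_2, \dots$ from $S_k$ to $S_{k+1}$ and death rates $\mu_1, \mu_2, \mu_3, \dots$ from $S_k$ to $S_{k-1}$]
Balance equations:
- State $S_0$: $\lambda_0\pi_0 = \mu_1\pi_1$ => $\pi_1 = \frac{\lambda_0}{\mu_1}\pi_0$
- State $S_1$: $(\lambda_1+\mu_1)\pi_1 = \lambda_0\pi_0 + \mu_2\pi_2$ => $\pi_2 = \frac{\lambda_1\lambda_0}{\mu_2\mu_1}\pi_0$
- State $S_{k-1}$: $(\lambda_{k-1}+\mu_{k-1})\pi_{k-1} = \lambda_{k-2}\pi_{k-2} + \mu_k\pi_k$ => $\pi_k = \frac{\lambda_{k-1}\cdots\lambda_1\lambda_0}{\mu_k\cdots\mu_2\mu_1}\pi_0$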
Birth-death process (cont.)

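$\pi_k = \frac{\lambda_{k-1}}{\mu_k}\pi_{k-1} = \frac{\lambda_{k-1}\cdots\lambda_1\lambda_0}{\mu_k\cdots\mu_2\mu_1}\pi_0$ (k = 1, 2, 3, ...)
[Figure: neighboring states $S_k$ and $S_{k+1}$ with birth rate $\lambda_k$ and death rate $\mu_{k+1}$]
Substituting these expressions for $\pi_k$ into $\sum_{k=0}^{\infty}\pi_k = 1$ yields
$\pi_0 + \sum_{k=1}^{\infty}\frac{\lambda_{k-1}\cdots\lambda_1\lambda_0}{\mu_k\cdots\mu_2\mu_1}\pi_0 = 1$
=> $\pi_0\left[1 + \sum_{k=1}^{\infty}\frac{\lambda_{k-1}\cdots\lambda_1\lambda_0}{\mu_k\cdots\mu_2\mu_1}\right] = 1$
=> $\frac{1}{\pi_0} = 1 + \sum_{k=1}^{\infty}\frac{\lambda_{k-1}\cdots\lambda_1\lambda_0}{\mu_k\cdots\mu_2\mu_1}$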
Example of birth-death process

A switching system has two control computers, one on-line and one standby. The time interval between computer failures is exponentially distributed with mean $t_f$. In case of a failure, the standby computer replaces the failed one. A single repair facility exists and repair times are exponentially distributed with mean $t_r$. What fraction of time is the system out of use, i.e., both computers having failed? The problem can be solved by using a three state birth-death model.
[Figure: three-state birth-death chain S0, S1, S2 with $\lambda_0 = \lambda_1 = 1/t_f$ and $\mu_1 = \mu_2 = 1/t_r$]
Example of birth-death process (cont.)

S0 - both computers operable
S1 - one computer failed
S2 - both computers failed
$\frac{1}{\pi_0} = 1 + \frac{1/t_f}{1/t_r} + \left(\frac{1/t_f}{1/t_r}\right)^2 = 1 + \frac{t_r}{t_f} + \left(\frac{t_r}{t_f}\right)^2$
=> $\pi_2 = \left(\frac{t_r}{t_f}\right)^2\pi_0 = \frac{t_r^2}{t_r^2 + t_r t_f + t_f^2}$ (probability that both computers have failed)
If $t_r/t_f = 0.1$, i.e. the average repair time is 10 % of the average time between failures, then $\pi_2 = 0.009009$ and both computers will be out of service 0.9 % of the time.
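The product-form solution of a finite birth-death chain is straightforward to compute. The sketch below uses it for the two-computer example; the function name and the concrete values of `t_f` and `t_r` are assumptions for illustration (only their ratio matters).

```python
def birth_death_steady_state(birth_rates, death_rates):
    """Steady-state probabilities of a finite birth-death chain.

    birth_rates[k] is lambda_k (S_k -> S_k+1) and death_rates[k] is
    mu_(k+1) (S_k+1 -> S_k); both lists have the same length.
    """
    weights = [1.0]  # pi_k / pi_0, starting with k = 0
    for lam, mu in zip(birth_rates, death_rates):
        weights.append(weights[-1] * lam / mu)
    total = sum(weights)  # equals 1 / pi_0
    return [w / total for w in weights]

t_f, t_r = 10.0, 1.0  # assumed values with t_r / t_f = 0.1
pi = birth_death_steady_state([1 / t_f, 1 / t_f], [1 / t_r, 1 / t_r])
print(f"P(both computers failed) = {pi[2]:.6f}")
```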
Additional reading of Markov chain modeling

Markov chain modeling

A continuous-time Markov chain is a stochastic process $\{X(t): t \ge 0\}$
X(t) can have values in S = {0, 1, 2, 3, ...}
Each time the process enters a state i, the amount of time it spends in that state before making a transition to another state has an exponential distribution with mean $1/\lambda_i$
When leaving state i, the process moves to a state j with probability $p_{ij}$, where $p_{ii} = 0$
The next state to be visited after i is independent of the length of time spent in state i
[Figure: chain of states S0, S1, S2, S3, ... with transition rates between neighboring states]

Transition probabilities
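$p_{ij}(t) = P\{X(t+s) = j \mid X(s) = i\}$
Continuous at t = 0, with
$\lim_{t \to 0} p_{ij}(t) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$
Transition matrix is a function of time
$P(t) = \begin{bmatrix} p_{11}(t) & p_{12}(t) & \cdots \\ p_{21}(t) & p_{22}(t) & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}$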

Transition intensity:
$\lambda_j = -\frac{d}{dt}p_{jj}(0)$ (rate at which the process leaves state j when it is in state j)
$\lambda_{ij} = \frac{d}{dt}p_{ij}(0) = \lambda_i p_{ij}$ (transition rate into state j when the process is in state i)
The process, starting in state i, spends an amount of time in that state having exponential distribution with rate $\lambda_i$. It then moves to state j with probability
$p_{ij} = \frac{\lambda_{ij}}{\lambda_i}$ for all i, j ($j \ne i$)
$\sum_{j \ne i} p_{ij} = \sum_{j \ne i}\frac{\lambda_{ij}}{\lambda_i} = 1$ => $\lambda_i = \sum_{j \ne i}\lambda_{ij}$

Chapman-Kolmogorov equations:
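$p_{ij}(t+s) = \sum_{k \in S} p_{ik}(t)\,p_{kj}(s)$ for all $i, j \in S$ and all $s, t \ge 0$
Since p(t) is a continuous function
$p_{ij}(\Delta t) = p_{ij}(0) + \frac{d}{dt}p_{ij}(0)\,\Delta t + o(\Delta t^2)$
We have defined $\lambda_{ij} = \frac{d}{dt}p_{ij}(0)$ =>
For $i \ne j$: $p_{ij}(\Delta t) = p_{ij}(0) + \lambda_{ij}\Delta t + o(\Delta t^2) \approx \lambda_{ij}\Delta t$ (for small $\Delta t$)
For $i = j$: $p_{ii}(\Delta t) = p_{ii}(0) + \lambda_{ii}\Delta t + o(\Delta t^2) \approx 1 + \lambda_{ii}\Delta t$ (for small $\Delta t$)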

From Chapman-Kolmogorov equations:
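$p_{ij}(t+\Delta t) = \sum_k p_{ik}(t)\,p_{kj}(\Delta t) = p_{ij}(t)\,p_{jj}(\Delta t) + \sum_{k \ne j} p_{ik}(t)\,p_{kj}(\Delta t)$
$= p_{ij}(t)\left[1 + \lambda_{jj}\Delta t + o(\Delta t^2)\right] + \sum_{k \ne j} p_{ik}(t)\left[\lambda_{kj}\Delta t + o(\Delta t^2)\right]$
$p_{ij}(t+\Delta t) = p_{ij}(t) + \sum_k p_{ik}(t)\lambda_{kj}\Delta t + \sum_k p_{ik}(t)\,o(\Delta t^2)$
$\frac{p_{ij}(t+\Delta t) - p_{ij}(t)}{\Delta t} = \sum_k p_{ik}(t)\lambda_{kj} + \sum_k p_{ik}(t)\frac{o(\Delta t^2)}{\Delta t}$
Taking the limit as $\Delta t \to 0$
$\frac{d}{dt}p_{ij}(t) = \sum_k p_{ik}(t)\lambda_{kj}$ for all i, j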

The process is described by the system of differential equations:
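$\frac{d}{dt}p_{ij}(t) = \sum_k p_{ik}(t)\lambda_{kj}$ for all i, j
which can be given in the form
$\frac{d}{dt}P(t) = P(t)\Lambda$
$\sum_j p_{ij}(t) = 1$ for all i, t
=> $\frac{d}{dt}\sum_j p_{ij}(t) = \frac{d}{dt}(1) = 0$
=> $\sum_j \frac{d}{dt}p_{ij}(t) = 0$
=> $\sum_j \lambda_{ij} = 0$
The sum of each row of $\Lambda$ is zero!

Example
[Figure: three states S1, S2 and S3 with transition rates $\lambda_{12}$, $\lambda_{21}$, $\lambda_{13}$, $\lambda_{31}$, $\lambda_{23}$ and $\lambda_{32}$]
$$\Lambda = \begin{bmatrix} -(\lambda_{12}+\lambda_{13}) & \lambda_{12} & \lambda_{13} \\ \lambda_{21} & -(\lambda_{21}+\lambda_{23}) & \lambda_{23} \\ \lambda_{31} & \lambda_{32} & -(\lambda_{31}+\lambda_{32}) \end{bmatrix}$$
The sum of each row of $\Lambda$ must be zero!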

Steady state probabilities
$\lim_{t \to \infty} p_{ij}(t) = \pi_j$ (independent of initial state i)
Must be non-negative and must satisfy $\sum_i \pi_i = 1$
In case of continuous-time Markov chains, balance equations are used to determine $\pi$. For each state i, the rate at which the system leaves the state must equal the rate at which the system enters the state
=> $\lambda_i\pi_i = \lambda_{ji}\pi_j + \lambda_{ki}\pi_k + \lambda_{li}\pi_l$
[Figure: state i with incoming transitions from states j, k and l]

Balance equation
$\sum_{j \ne i}\lambda_{ij}\pi_i = \sum_{k \ne i}\lambda_{ki}\pi_k$ for all i
Steady state distribution is computed by solving this system of equations together with the normalization condition
$\sum_i \pi_i = 1$

An alternative derivation of the steady-state conditions begins with the differential equation describing the process:
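$\frac{d}{dt}p_{ij}(t) = \sum_k p_{ik}(t)\lambda_{kj}$ for all i, j
Suppose that we take the limit of each side as $t \to \infty$
=> $\lim_{t \to \infty}\frac{d}{dt}p_{ij}(t) = \lim_{t \to \infty}\sum_k p_{ik}(t)\lambda_{kj}$
=> $\frac{d}{dt}\lim_{t \to \infty}p_{ij}(t) = \sum_k \lim_{t \to \infty}p_{ik}(t)\lambda_{kj}$
=> $0 = \sum_k \pi_k\lambda_{kj}$, i.e. $\pi\Lambda = 0$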

Example
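[Figure: three states S1, S2 and S3 with transition rates $\lambda_{12}$, $\lambda_{21}$, $\lambda_{13}$, $\lambda_{31}$, $\lambda_{23}$ and $\lambda_{32}$]
$$\Lambda = \begin{bmatrix} -(\lambda_{12}+\lambda_{13}) & \lambda_{12} & \lambda_{13} \\ \lambda_{21} & -(\lambda_{21}+\lambda_{23}) & \lambda_{23} \\ \lambda_{31} & \lambda_{32} & -(\lambda_{31}+\lambda_{32}) \end{bmatrix}$$
and $\pi = [\pi_1\ \pi_2\ \pi_3]$, so that $\pi\Lambda = 0$ gives
$-(\lambda_{12}+\lambda_{13})\pi_1 + \lambda_{21}\pi_2 + \lambda_{31}\pi_3 = 0$
$\lambda_{12}\pi_1 - (\lambda_{21}+\lambda_{23})\pi_2 + \lambda_{32}\pi_3 = 0$
$\lambda_{13}\pi_1 + \lambda_{23}\pi_2 - (\lambda_{31}+\lambda_{32})\pi_3 = 0$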
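For small chains the steady-state equations $\pi\Lambda = 0$, $\sum_i \pi_i = 1$ can be solved directly. The following is a minimal pure-Python sketch using Gauss-Jordan elimination; the function name and the numeric rates in the usage lines are hypothetical and serve only as an illustration.

```python
def steady_state(rates):
    """Solve pi * Lambda = 0 with sum(pi) = 1 for a small chain.

    rates[i][j] is the transition rate from state i to state j (i != j);
    diagonal entries of the input are ignored.
    """
    n = len(rates)
    # Build Lambda: off-diagonal rates, diagonal = -(row sum).
    lam = [[rates[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        lam[i][i] = -sum(lam[i])
    # Column j of Lambda gives the equation sum_i pi_i * lam[i][j] = 0.
    a = [[lam[i][j] for i in range(n)] + [0.0] for j in range(n)]
    # Replace the last equation with the normalization sum(pi) = 1.
    a[-1] = [1.0] * n + [1.0]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

# Hypothetical rates for the three-state example:
# l12 = 1, l13 = 2, l21 = 3, l23 = 1, l31 = 2, l32 = 2
pi = steady_state([[0, 1, 2], [3, 0, 1], [2, 2, 0]])
print([round(p, 4) for p in pi])
```

Replacing one balance equation with the normalization condition is needed because the balance equations alone are linearly dependent (the rows of $\Lambda$ sum to zero).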
PDH Switches
PDH switches
General structure of telecom exchange
Timing and synchronization
Dimensioning example
PDH exchange
Digital telephone exchanges are called SPC (Stored Program Control) exchanges
- controlled by software, which is stored in a computer or a group of computers (microprocessors)
- programs contain the actual intelligence to perform control functions
- software divided into well-defined blocks - modularity makes the system less complicated to maintain and expand
Main building blocks
- subscriber interfaces and trunk interfaces
- switch fabric
- switch/call control
Basic blocks of a PDH exchange
[Figure: local loop - SUBSCRIBER INTERFACE - SWITCH FABRIC - TRUNK INTERFACE, with SWITCH CONTROL attached to the fabric]
Switch control
Centralized
- all control actions needed to set up/tear down a connection are executed in a central processing unit
- processing work normally shared by a number of processors
- hierarchical or non-hierarchical processor architecture
Distributed
- control functions are shared by a number of processing units that are more or less independent of one another
- switching device divided into a number of switching parts and each of them has a control processor
Switch control (cont.)
[Figure: centralized non-hierarchical processor system - parallel control processors]
[Figure: centralized hierarchical processor system - a central processor above regional processors (RP - Regional Processor)]
Control units usually doubled or tripled
[Figure: distributed control with independent switching parts - each switching part has its own control processor]
Example construction of a PDH exchange

[Figure: SUBSCRIBER INTERFACES (NT) - SWITCHING & CALL CONTROL (SWITCH FABRIC, CONTROL PROCESSOR, ADMINISTRATION COMPUTER, AUX) - TRUNK INTERFACES (ET)]
AUX - Auxiliary equipment
ET - Exchange Terminal
NT - Network Terminal
Example of call control processing
[Figure: call control processing in the Nokia DX200 - replicated SSU, RU, LSU, CCSU, M, CM and STU units]
CCSU - Common Channel Signaling Unit
CM - Central Memory
LSU - Line Signaling Unit
M - Marker
RU - Registering Unit
SSU - Subscriber Stage Unit
STU - Statistics Unit
Hierarchical control software

Software systems in the control part:
- signaling and call control
- charging and statistics
- maintenance software
Control of connections:
- calls should not be directed to faulty destinations
- faulty connections should be cleared
- detected faulty connections must be reported to the far-end if possible
[Figure: software hierarchy - administration programs, call control programs, signaling message processing]
Switching part
Main task of switching part is to connect an incoming time-slot to an outgoing one
- unit responsible for this function is called a group switch
Control system assigns incoming and outgoing time-slots, which are reserved by signaling, on associated physical links
=> need for time and space switching
[Figure: group switch connecting time-slots 1 and 2 of links A and B]
Group switch implementations

Group switch can be based on a space or time switch fabric
Memory based time switch fabrics are the most common ones
- flexible constructions
- due to advances in IC technology suitable also for large switch fabrics
[Figure: memory based time switch - time-slots 1...m of the incoming frame buffer are read cyclically and written into switch memory locations 1...k; control memory entry j holds the read address (k) of the switch memory location whose content is placed in time-slot j of the outgoing frame buffer (1...n), which is written cyclically]
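The operation of a memory based time switch can be illustrated with a short sketch. It assumes sequential (cyclic) write into the switch memory and a read controlled by the control memory; the function name and the sample frame are invented for the illustration.

```python
def time_switch(incoming_frame, control_memory):
    """Pass one frame through a memory based time switch.

    control_memory[j] = k means that outgoing time-slot j carries the
    octet that arrived in incoming time-slot k (None = idle slot).
    """
    switch_memory = list(incoming_frame)  # cyclic write, slot by slot
    return [None if k is None else switch_memory[k] for k in control_memory]

frame_in = ["a", "b", "c", "d"]
print(time_switch(frame_in, [2, 0, 3, 1]))     # permutation of the time-slots
print(time_switch(frame_in, [0, 0, None, 0]))  # slot 0 copied to several outputs
```

The second call shows why a read-controlled time switch supports multi-cast: several control memory entries may point to the same switch memory location.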
Subscriber connections
[Figure: subscriber connections - subscribers reach the group switch of the local exchange via a subscriber mux, a subscriber switch or a remote subscriber switch]
Subscriber and trunk interface

Subscriber interface
- on-hook/off-hook detection, reception of dialed digits
- check of subscriber line, power supply for subscriber line
- physical signal reception/transmission, A/D-conversion
- concentration
Trunk interface
- timing and synchronization (bit and octet level) to line/clock signal coming from an exchange of higher level of hierarchy
- frame alignment/frame generation
- multiplexing/demultiplexing
Example of telephone network hierarchy

International transit level
National transit level
Regional transit level
Tandem level
Local exchange
Network synchronization
Need for synchronization
- Today's digital telecom networks are a combination of PDH and SDH technologies, i.e. TDM and TDMA are utilized
- These techniques require that time and timing in the network can be controlled, e.g., when traffic is added to or dropped from a bit stream in an optical fiber or to/from a radio-transmitted signal
- The purpose of network synchronization is to enable the network nodes to operate with the same frequency stability and/or absolute time
- Network synchronism is normally obtained by applying the master-slave timing principle
Network synchronization
Methods for network synchronization
Distribute the clock over special synchronization links
- offers best integrity, independent of technological development and architecture of the network
Distribute the clock by utilizing traffic links
- most frequently used (master-slave network superimposed on the traffic network)
Use an independent clock in each node
- expensive method, but standard solution in international exchanges
Use an international navigation system in each node
- GPS (Global Positioning System) deployed increasingly
- independent of technological development and architecture of network
Combine some of the above methods

Master-slave synchronization over transport network

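[Figure: master-slave synchronization hierarchy - high-stability reference clocks at the international level feed the transit level, which feeds the local exchange level and finally the remote subscriber switches]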
ITU-T Recommendations G.810, G.811, G.812, G.812, G.823

SDH synchronization network reference chain

As the number of clocks in tandem increases, the synchronization signal is increasingly degraded
To maintain clock quality it is important to limit the number of cascaded clocks and the degradation of the synchronization signal
Reference chain consists of K SSUs each linked with N SECs
Provisionally K and N have been set to be K=10 and N=20
- total number of SECs has been limited to 60
[Figure: reference chain - PRC, N x SEC, 1st SSU, N x SEC, 2nd SSU, ..., N x SEC, K-th SSU, N x SEC]
PRC - Primary Reference Clock (accuracy $10^{-11}$)
SEC - SDH Equipment Clock (accuracy $10^{-9}$)
SSU - Synchronization Supply Unit (accuracy $10^{-6}$)
PDH synchronization reference connection

End-to-end timing requirements are set for the reference connection
Link timing errors are additive on the end-to-end connection
By synchronizing the national network at both ends, timing errors can be reduced compared to totally plesiochronous (separate clock in each switch) operation
International connections are mostly plesiochronous
[Figure: 27 500 km reference connection - local and national network (LE, PC, SC, TC), international network (chain of ISCs), national and local network (TC, SC, PC, LE); X marks a digital exchange and the lines between them are digital links]
LE - Local Exchange
PC - Primary Exchange
SC - Secondary Exchange
TC - Tertiary Exchange
ISC - International Switching Center
Types of timing variation

Frequency offset
- steady-state timing difference
- causes buffer overflows
Periodic timing differences
- jitter (periodic variation > 10 Hz)
- wander (periodic variation < 10 Hz)
Random frequency variation caused by
- electronic noise in phase-locked loops of timing devices and recovery systems
- transients caused by switching from one clock source to another
Timing variation causes
- slips (= loss of a frame or duplication of a frame) in PDH systems
- pointer adjustments in SDH systems => payload jitter => data errors
Visualization of jitter and wander
[Figure: jitter amplitude]
Timing variation measures

Time interval error (TIE)
- difference between the phase of a timing signal and phase of a reference (master clock) timing signal (given in ns)
Maximum time interval error (MTIE)
- maximum value of TIE during a measurement period
Maximum relative time interval error (MRTIE)
- underlying frequency offset subtracted from MTIE
Time deviation (TDEV)
- average standard deviation calculated from TIE for varying window sizes
Maximum time interval error

The maximum of the peak-to-peak difference in timing signal delay during a measurement period as compared to an ideal timing signal
[Figure: timing delay compared to the ideal signal as a function of time t; MTIE is the peak-to-peak variation within the measurement period (S)]
MTIE limits for PRC, SSU and SEC

Clock source   MTIE limit                   Observation interval t
PRC            25 ns                        0.1 < t < 83 s
               0.3t ns                      83 < t < 1000 s
               300 ns                       1000 < t < 30 000 s
               0.01t ns                     t > 30 000 s
SSU            25 ns                        0.1 < t < 2.5 s
               10t ns                       2.5 < t < 200 s
               2000 ns                      200 < t < 2 000 s
               433t^0.2 + 0.01t ns          t > 2 000 s
SEC            250 ns                       0.1 < t < 2.5 s
               100t ns                      2.5 < t < 20 s
               2000 ns                      20 < t < 2 000 s
               433t^0.2 + 0.01t ns          t > 2 000 s
Source: ETS 300 462-3

Occurrence of slips
Slips occur on connections whose timing differs from the timing signal used by the exchange
If both ends of a connection are internally synchronized to a PRC signal, theoretically slips occur no more frequently than once in 72 days
In a reference connection a slip occurs theoretically once in 72/12 = 6 days or, if national segments are synchronized, once in 72/4 = 18 days
Slip requirement on an end-to-end connection is looser:
Average frequency of slips                  Share of time during one year
5 slips / 24 h                              98.90 %
5 slips / 24 h ... 30 slips / 1 h           < 1 %
10 slips / 1 h                              < 0.1 %
Slip calculation example

Show that two networks with single frame buffers and timed from separate PRCs would see a maximum slip rate of one slip every 72 days
Solution:
- Timing accuracy of a PRC clock is $10^{-11}$
- Let the frequencies of the two ends be $f_1$ and $f_2$
- In the worst case, these frequencies deviate from the reference clock $f_o$ by $10^{-11} \times f_o$ and the deviations are in opposite directions
- Let the frequencies be $f_1 = (1 + 10^{-11}) f_o$ and $f_2 = (1 - 10^{-11}) f_o$
- Durations of bits in these networks are $T_1 = 1/f_1$ and $T_2 = 1/f_2$
Slip calculation example (cont.)

Solution (cont.):
- During one bit interval, the timing difference is $|T_1 - T_2|$ and after some N bits the difference exceeds a frame length of 125 µs and a slip occurs
=> $N\,|T_1 - T_2| = 125 \times 10^{-6}$ s => $N = 125 \times 10^{-6} / |1/f_1 - 1/f_2|$
- Inserting $f_1 = (1 + 10^{-11}) f_o$ and $f_2 = (1 - 10^{-11}) f_o$ into the above equation, we get
=> $N = 125 \times 10^{-6}\, f_o\, (1 - 10^{-22}) / (2 \times 10^{-11})$
- Multiplying N by the duration ($T_b$) of one bit, we get the time ($T_{slip}$) between slips
- In case of E1 links, $f_o = 2.048 \times 10^6$/s and $T_b = 488$ ns
- Dividing the obtained $T_{slip}$ by 60 (s), then by 60 (min) and finally by 24 (h), we get the average time interval between successive slips to be 72.3 days
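The slip interval can be verified numerically. The sketch below evaluates the closed-form expression for N rather than subtracting $1/f_1$ and $1/f_2$ directly, because that difference is too small to be computed reliably in double precision. Variable names are chosen here for illustration.

```python
FRAME_S = 125e-6   # frame length, 125 microseconds
ACCURACY = 1e-11   # relative accuracy of a PRC clock
F0 = 2.048e6       # nominal E1 bit rate, bit/s

# N = frame * f0 * (1 - accuracy^2) / (2 * accuracy)
n_bits = FRAME_S * F0 * (1 - ACCURACY**2) / (2 * ACCURACY)
t_bit = 1 / F0                 # about 488 ns
t_slip_s = n_bits * t_bit      # time between slips in seconds
days = t_slip_s / 60 / 60 / 24
print(f"N = {n_bits:.3e} bits, Tslip = {days:.1f} days")
```

Note that the result does not depend on the bit rate: $T_{slip} \approx 125\ \mu s / (2 \times 10^{-11})$.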
Synchronization of a switch
Synchronization sub-system in an exchange
- supports both plesiochronous and slave mode
- clock accuracy is chosen based on the location of the exchange in the synchronization hierarchy - accuracy decreases towards the leaves of the synchronization tree
- synchronizes itself automatically to several PCM signals and chooses the most suitable of them (primary, secondary, etc.)
- implements a timing control algorithm to eliminate instantaneous timing differences caused by the transmission network (e.g. switchovers - automatic replacement of faulty equipment with redundant ones) and jitter
- follows smoothly the incoming synchronization signal
Synchronization of a switch (cont.)

Exchange follows the synchronization signal
Relative error used to set requirements
- maximum relative time interval error MRTIE $\le$ 1000 ns (S $\ge$ 100 s)
Requirement implies how well the exchange must follow the synchronization signal when the input is practically error free
When none of the synchronization inputs is good enough, the exchange clock automatically switches over to plesiochronous operation
In plesiochronous mode MRTIE $\le (aS + 0.5bS^2 + c)$ ns
Timing system monitors all incoming clock signals and when a signal of sufficient quality is detected, the system switches back to slave mode (either manually by an operator command or automatically)
Stability of an exchange clock

Clock stability is measured by aging (= b)
- temperature stabilized aging in the order of n x $10^{-10}$/day
MRTIE $\le (aS + 0.5bS^2 + c)$ ns
- S = measurement period
- a = accuracy of the initial setting of the clock
- b = clock stability (measured by aging)
- c = constant
Transit node clock
- a = 0.5 - corresponds to an initial frequency shift of $5 \times 10^{-10}$
- b = $1.16 \times 10^{-5}$ - corresponds to aging of $10^{-9}$/day
- c = 1000
Local node clock
- a = 10.0 - corresponds to an initial frequency shift of $1 \times 10^{-8}$
- b = $2.3 \times 10^{-4}$ - corresponds to aging of $2 \times 10^{-8}$/day
- c = 1000
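The MRTIE bound is a simple quadratic in the observation period. The sketch below evaluates it for both clock types; the assignment of the parameter sets (a = 0.5 for the transit node clock, a = 10.0 for the local node clock) is an interpretation of the flattened slide table and should be treated as an assumption.

```python
def mrtie_limit_ns(s, a, b, c=1000.0):
    """Bound a*S + 0.5*b*S^2 + c in ns for an observation period s in seconds."""
    return a * s + 0.5 * b * s**2 + c

TRANSIT = dict(a=0.5, b=1.16e-5)  # assumed transit node clock parameters
LOCAL = dict(a=10.0, b=2.3e-4)    # assumed local node clock parameters

for s in (1e2, 1e4, 1e6):
    print(f"S = {s:.0e} s: transit {mrtie_limit_ns(s, **TRANSIT):.3e} ns, "
          f"local {mrtie_limit_ns(s, **LOCAL):.3e} ns")
```

For long observation periods the aging term $0.5bS^2$ dominates, which is why the bound grows faster than linearly.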
MRTIE in an exchange
(plesiochronous mode)
[Figure: MRTIE (ns) as a function of observation time S on logarithmic axes, with separate curves for a local exchange and a transit exchange]
Duration of a time-slot in a PCM-signal is 3.9 µs and duration of a bit is 488 ns
Example of SRAM based PDH switch fabric

Time-interchange based 64 PCM switch
[Figure: PDH multiplexing hierarchy around the switch - 4xE1 (2M), 4xE2 (8M), 4xE3 (34M), E4 (140M); E4 => 64 x E1 demux on the input side and 64 x E1 => E4 mux on the output side]
Example of SRAM based PDH switch fabric (cont.)

Memory size and speed requirement:
Switch memory (SM) and control memory (CM) are both single chip solutions
Size of both SM and CM is 64x32 octets = 2048 octets
Number of SM write and read cycles during a frame interval (125 µs) is 2x64x32 = 4096
- access cycle of SM should be 125 µs/4096 = 30.5 ns
Number of CM write and read cycles during a frame interval (125 µs) is 1x64x32 = 2048
- access cycle of CM should be 125 µs/2048 = 61 ns
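The dimensioning arithmetic above can be reproduced as follows (a sketch; the constant names are chosen here):

```python
FRAME_NS = 125_000     # 125 us frame interval in ns
PCMS, SLOTS = 64, 32   # 64 E1/PCM links with 32 time-slots each

size_octets = PCMS * SLOTS
sm_cycle_ns = FRAME_NS / (2 * size_octets)  # one write and one read per slot
cm_cycle_ns = FRAME_NS / size_octets        # one access per slot
print(size_octets, round(sm_cycle_ns, 1), round(cm_cycle_ns, 1))
```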
PDH bit rates and related bit/octet times

Hierarchy level   Time-slot interval [ns]   Bit interval [ns]
E1/2M             3906                      488
E2/8M             947                       118
E3/34M            233                       29
E4/140M           57.4                      7.2
When time-slots are turned into parallel form (8 bits in parallel), the memory speed requirement decreases by a factor of 8
Present day memory technology enables up to 256 PDH E1 signals to be written to and read from an SRAM memory at wire speed
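The intervals in the table follow directly from the line rates. The sketch below assumes the standard PDH rates of 2.048, 8.448, 34.368 and 139.264 Mbit/s, which the slide abbreviates as 2M, 8M, 34M and 140M.

```python
# Assumed nominal PDH bit rates in kbit/s
RATES_KBIT = {"E1": 2_048, "E2": 8_448, "E3": 34_368, "E4": 139_264}

def bit_and_slot_ns(kbit_per_s):
    """Return (bit interval, 8-bit time-slot interval) in ns."""
    bit_ns = 1e6 / kbit_per_s
    return bit_ns, 8 * bit_ns

for level, rate in RATES_KBIT.items():
    bit_ns, slot_ns = bit_and_slot_ns(rate)
    print(f"{level}: time-slot {slot_ns:.1f} ns, bit {bit_ns:.1f} ns")
```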
Properties of full matrix switches

Pros
- strict-sense non-blocking
- no path search - a connection can always be written into the control memory if requested output is idle
- multi-cast capability
- constant delay
- multi-slot connections possible
Cons
- switch and control memory both increase in square of the number of inputs/outputs
- broadband - required memory speed may not be available
Make full use of available memory speed

At the time of design, select components that
- give adequate performance
- will stay on the market long enough
- are not too expensive (often price limits the use of the fastest components)
To make full use of available memory speed, buses must be fast enough
When increasing required memory speed, practical bus length decreases (proportional to inverse of speed)
[Figure: SRAM price ($/SRAM) as a function of memory speed (5 ns ... 20 ns; DRAM: 40 ... 70 ns), and bus bit-rate as a function of the length of a bus]
Power consumption - avoid heating problem
Power consumption of an output gate is a function of
- inputs connected to it (increased number of inputs => increased power consumption)
- bit rate/clock frequency (higher bit rate => increased power consumption)
- bus length (long buses inside switch fabric => increased power consumption and decreased fan-out)
Increase in power consumption => heating problem
Power consumption and heating problem can be reduced, e.g. by using lower voltage components (higher resolution receivers)
[Figure: fan-out as a function of bus length, power as a function of fan-out, and power as a function of receiver resolution]
Logical structure of a full matrix switch

[Figure: N multiplexed inputs (N = 2n) are replicated to switch memories serving outputs 1 ... N, each built as a feasible SM with available components]
Example of a matrix switch (DX200)

[Figure: DX200 matrix switch - four S/P converters (PCMs 0...63 each) drive vertical buses through bus buffers (fan-out = 32); each control & switching memory card holds four switch memories (SM, 32x64 = 2k) written from the buses, a control memory (CM) supplying the 16-bit read address, and a P/S converter towards outgoing PCMs 0...63]
Example of a matrix switch (cont.)

S/P (Serial/Parallel conversion) - incoming time-slots are turned into parallel form to reduce the speed on internal buses
P/S (Parallel/Serial conversion) - parallel form output signals are converted back to serial form
64 PCM S/P-P/S pairs are implemented on one card, which is practical because PCMs are bi-directional
One switch block can serve max 4 S/P-P/S pairs - the number is chosen based on required capacity (64, 128, 192 or 256 E1/PCMs)
One S/P+P/S pair feeds max 8 parallel switch blocks - chosen based on the required capacity in the installation (n * 256 E1/PCMs)
Max size of the example DX200-system fabric is 2048 E1/PCMs
Currently, a bigger matrix (8K E1/PCMs) is available; slightly different SRAMs are needed, but the principle is similar

A time-slot is forwarded from an S/P to all parallel switch blocks and in each switch block it is written to all SMs along the vertical bus
- a single time-slot is replicated into max 4x8 = 32 locations
Data in CMs is used to store a time-slot in correct positions in SMs
CM also includes data to read a correct time-slot to be forwarded to each output time-slot on each output E1 link
CM includes a 16-bit pointer to a time-slot to be read
- 2 bits of CM content point to an SM chip and 5 + 6 = 11 bits point to a memory location on an SM chip
- remaining 3 bits point to (source) switch block

Number of time-slots to be switched during a frame (125 µs):
- 8x4x64x32 = 65 536 time-slots (= 64 kbytes)
Each time-slot is stored in 4 SMs in each of the 8 switch blocks
=> max size of switch memory is 8x4x65 536 = 2 097 152 octets (= 2 Mbytes)
Every 32nd memory location is read from SM in a max size switch
=> average memory speed requirement < 31 ns (less than the worst case requirement of 64x32 write and 64x32 read operations during a 125 µs period)
Control memory is composed of 4x4 control memory banks in each of the 8 switch blocks and each memory bank includes 2.048 kwords (word = 2 bytes) for write and 2.048 kwords for read control, i.e. max CM size is 8x4x4x8 kbytes = 1 048 576 bytes (= 1 Mbyte)
Growth of matrix
[Figure: growth of the matrix from 256 PCM to 512 PCM]
ATM Switches
ATM switches
General of ATM switching
Structure of an ATM switch
Example switch implementations
- Knockout switch
- Abacus
General of ATM switching

ATM switches correspond to layer 2 in the OSI reference model and this layer can roughly be divided into a higher and lower layer:
- higher layer = ATM Adaptation Layer (AAL)
- lower layer = ATM layer
[Figure: protocol stacks of network edge - switching node - network edge; the network edges handle error recovery & flow control in addition to error & congestion control and limited error detection, while the switching node implements only the lower part of layer 2 (error & congestion control, limited error detection) and layer 1]
ATM Adaptation Layer

AAL maps higher-layer information into ATM cells to be transported over an ATM network. At reception, AAL collects information from ATM cells for delivery to higher layers.
AAL offers different service classes for user data
- delay, bit rate and connection type (connectionless or circuit emulation) are the basic attributes of the service classes
SAR (Segmentation and Reassembly) sub-layer segments variable length user data packets into fixed-size ATM cell payloads and at reception reassembles ATM cell payloads into user packets
CS (Convergence Sub-layer) maps specific user data requirements onto the ATM transport network
ATM service classes

Attribute                                        Class A                     Class B               Class C             Class D
Timing relation between source and destination   Required                    Required              Not required        Not required
Bit rate                                         Constant                    Variable              Variable            Variable
Connection mode                                  Connection-oriented         Connection-oriented   Connection-oriented Connectionless
AAL(s)                                           AAL1                        AAL2                  AAL3/4 or AAL5      AAL3/4 or AAL5
Examples                                         E1, nx64 kbit/s emulation   Packet video, audio   Frame Relay, X.25   IP, SMDS
ATM Adaptation layer 1 (AAL1)

[Figure: AAL1 SAR-PDU formats carried in the ATM cell payload after the ATM header]
P format: SAR-PDU header (1 octet), structure pointer (1 octet), user information (46 octets)
Non-P format: SAR-PDU header, user information
SAR-PDU header: sequence number field (CS indication, sequence count) and sequence number protection (CRC control, parity)
SAR-PDU - Segmentation and Reassembly Protocol Data Unit
CS - Convergence Sub-layer
CRC - Cyclic Redundancy Check

ATM Adaptation layer 2 (AAL2)

[Figure: AAL2 packets multiplexed into ATM cell payloads. Each LLC packet (CID=1, CID=2, ...) has a header (CID 8 bits, LI 6 bits, RES 5 bits, HEC 5 bits) followed by user information; the figure labels these parts 3 octets and < 61 octets. Each ATM cell payload starts with an STF (OSF 6 bits, SN 1 bit, parity 1 bit).]
CID - Connection Identification
HEC - Header Error Check
LI - Length Indicator
LLC - Logical Link Control
RES - Reserved
OSF - Offset
SN - Sequence Number
STF - Start Field
ATM Adaptation layer 3/4 (AAL3/4)

[Figure: AAL3/4 CPCS PDU: 4-octet CS-PDU header (CS-PDU type, Btag, BASize), CS-PDU user information (< 65 535 octets), 0-3 octets of pad and 4-octet CS-PDU trailer (protocol control, Etag, CS user information length).]
[Figure: AAL3/4 SAR PDU (48 octets, carried in the ATM cell payload): 2-octet SAR-PDU header (SAR type 2 bits, SN 4 bits, MID 10 bits), SAR-PDU payload (4 - 44 octets of user information plus pad) and 2-octet SAR-PDU trailer (SAR-PDU user information length 6 bits, SAR-PDU CRC 10 bits).]
CS - Convergence Sub-layer
CPCS - Common Part CS
CRC - Cyclic Redundancy Check
MID - Message Identifier
SAR - Segmentation and Reassembly
PDU - Protocol Data Unit
Btag - Beginning tag
BASize - Buffer Allocation size
Etag - Ending tag

ATM Adaptation layer 5 (AAL5)

[Figure: AAL5 CS-PDU: CS-PDU payload (< 65 535 octets), pad (0 - 47 octets), UU (1 octet), CPI (1 octet), LEN (2 octets) and CRC (4 octets); the CS-PDU is segmented into consecutive ATM cell payloads.]
Pad - Padding octets
UU - AAL layer user-to-user indicator
CPI - Common part indicator
LEN - Length indicator
CRC - Cyclic Redundancy Check
ATM layer
ATM layer (common to all services) offers
  - transport of data in fixed-size cells and also defines the use of virtual connections (VPs and VCs)
  - multiplexing/demultiplexing of cells belonging to different virtual connections
  - translation of inbound VPIs/VCIs to outbound VPIs/VCIs
  - cell header generation for data received from AAL and cell header extraction when a cell is delivered to AAL
  - flow control
General of ATM switching (cont.)

ATM is a connection-oriented transport concept
  - an end-to-end connection (virtual channel) is established prior to transfer of cells
  - signaling is used for connection set up and release
  - data is transferred in fixed 53-octet cells (5 octets for header and 48 octets for payload)
Cells are routed based on two header fields
  - virtual path identifier (VPI) - 8 bits for UNI and 12 bits for NNI
  - virtual channel identifier (VCI) - 16 bits for UNI and NNI
  - combination of VPI and VCI determines a specific virtual connection between two end-points
ATM cell structure

[Figure: an ATM cell consists of a 5-octet ATM header and a 48-octet cell payload. ATM header for UNI: GFC, VPI, VCI, PTI, CLP and HEC fields. ATM header for NNI: VPI, VCI, PTI, CLP and HEC fields (no GFC).]
UNI - User Network Interface
NNI - Network-to-Network Interface
VPI - Virtual Path Identifier
VCI - Virtual Channel Identifier
GFC - Generic Flow Control
PTI - Payload Type Identifier
CLP - Cell Loss Priority
HEC - Header Error Control
HEC = remainder of x^8 · (header octets 1 to 4) divided by (x^8 + x^2 + x + 1)
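The header layout and the HEC division can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the field widths for PTI (3 bits), CLP (1 bit) and GFC (4 bits) come from the standard UNI cell format rather than from the slide, and the final XOR with 0x55 is the coset that ITU-T I.432 adds on top of the plain CRC remainder.

```python
def crc8_remainder(data: bytes) -> int:
    """Remainder of x^8 * data(x) divided by x^8 + x^2 + x + 1 (generator 0x07)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def hec(first_four_octets: bytes) -> int:
    """HEC octet for header octets 1 to 4 (0x55 coset per I.432, an assumption beyond the slide)."""
    return crc8_remainder(first_four_octets) ^ 0x55


def parse_uni_header(header: bytes) -> dict:
    """Split a 5-octet UNI header into its fields and check the HEC."""
    if len(header) != 5:
        raise ValueError("an ATM header is 5 octets long")
    word = int.from_bytes(header[:4], "big")
    return {
        "gfc": (word >> 28) & 0xF,
        "vpi": (word >> 20) & 0xFF,
        "vci": (word >> 4) & 0xFFFF,
        "pti": (word >> 1) & 0x7,
        "clp": word & 0x1,
        "hec_ok": hec(header[:4]) == header[4],
    }


# Hypothetical header with VPI = 10 and VCI = 15
octets = ((10 << 20) | (15 << 4)).to_bytes(4, "big")
fields = parse_uni_header(octets + bytes([hec(octets)]))
```

An input port controller would discard a cell whenever `hec_ok` is false, which corresponds to the HEC processing step of the input port.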

VPI/VCI is determined on a per-link basis
  => VPI/VCI on an incoming link is replaced (at the ATM switch) with another VPI/VCI for an outgoing link
  => number of possible paths in an ATM network increased substantially (compared to having end-to-end VPI/VCIs)
Each ATM switch includes a Routing Information Table (RIT), which is used in mapping incoming VPI/VCIs to outgoing VPI/VCIs
RIT includes:
  - old VPI/VCI
  - new VPI/VCI
  - output port address
  - priority

When an ATM cell arrives at an ATM switch, the VPI/VCI in the 5-octet cell header is used to point to a RIT location, which includes
  - new VPI/VCI to be added to an outgoing cell
  - output port address indicating to which port the cell should be routed
  - priority field allowing the switch to selectively send cells to output ports or discard them (in case of buffer overflow)
Three routing modes:
  - unicast - log2(N) bits needed to address a destination output port
  - multi-cast - N bits needed to address destined output ports
  - broadcast - N bits needed to address destined output ports
In the multi-cast/broadcast case, a cell is replicated into multiple copies and each copy is routed to its intended output port/outbound VC
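The lookup described above can be sketched as a table keyed by the incoming VPI/VCI. The class and function names and all table values below are hypothetical; the sketch only illustrates the RIT fields and the size of the routing tag in the unicast (log2 N bits) and multi-cast/broadcast (N-bit map) cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RitEntry:
    new_vpi: int
    new_vci: int
    output_ports: tuple  # one port for unicast, several for multi-cast
    priority: int


def unicast_tag_bits(n_ports: int) -> int:
    """Bits needed to address one of n_ports outputs, i.e. ceil(log2 N)."""
    return (n_ports - 1).bit_length()


def multicast_tag(n_ports: int, ports) -> int:
    """N-bit map with bit i set when output port i receives a copy."""
    tag = 0
    for port in ports:
        if not 0 <= port < n_ports:
            raise ValueError("port outside switch")
        tag |= 1 << port
    return tag


def translate(rit: dict, vpi: int, vci: int) -> list:
    """Return one (port, new_vpi, new_vci, priority) tuple per outgoing copy."""
    entry = rit[(vpi, vci)]
    return [(port, entry.new_vpi, entry.new_vci, entry.priority)
            for port in entry.output_ports]


# Hypothetical RIT of a 16-port switch
rit = {
    (0, 15): RitEntry(new_vpi=0, new_vci=10, output_ports=(3,), priority=1),
    (0, 20): RitEntry(new_vpi=1, new_vci=8, output_ports=(0, 3, 7), priority=0),
}
copies = translate(rit, 0, 20)
```

In this sketch every copy of a multi-cast cell gets the same new VPI/VCI; a switch that assigns a different outbound VC per copy would need one entry per output, as the MOBAS translation tables later in the lecture do.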

ATM connections are either
  - pre-established - permanent virtual connections (PVCs)
  - dynamically set up - switched virtual connections (SVCs)
Signaling (UNI or PNNI) messages carry call set up requests to ATM switches
Each ATM switch includes a call processor, which
  - processes call requests and decides whether the requested connection can be established
  - updates RIT based on established and released call connections, ensuring that VPIs/VCIs of cells, which are coming from several inputs and directed to a common output, are different
  - finds an appropriate routing path between source and destination ports

VPI/VCI translation along transport path
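[Figure: a cell passes through three ATM switches in sequence. Each switch looks up the incoming VPI/VCI in its RIT and replaces it before forwarding; the VPI/VCI values 15, 10 and 8 appear on consecutive links and X, Y, Z and W label the ports. A RIT entry holds old VPI/VCI, new VPI/VCI, output port and priority field.]
RIT - Routing Information Table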
VPI/VCI translation (cont.)
VPI/VCI replacement usually takes place at the output ports => RIT split into two parts
  - input RIT - includes old VPI/VCI and N-bit output port address
  - output RIT - includes log2(N)-bit input port address, old VPI/VCI and new VPI/VCI
Since cells from different input ports can arrive at the same output port and have the same old VPI/VCI, the input port address is needed to uniquely identify different connections
ATM switches
Functional blocks of an ATM switch

Main blocks
  - Line interface cards (LICs), which implement input and output port controllers (IPCs and OPCs)
  - Switch fabric, which provides interconnections between input and output ports
  - Switch controller, which includes
    - a call processor for RIT manipulations
    - a control processor to perform operations, administration and maintenance (OAM) functions for switch fabric and LICs
Main functional blocks of an ATM switch

[Figure: LICs, each containing an IPC and an OPC, are connected to the switch fabric; the switch controller handles switching and connection control.]
LIC - Line Interface Card
IPC - Input Port Controller
OPC - Output Port Controller
Functions of input port controller

  - Line termination and reception of incoming line signal
  - Conversion of optical signal to electronic form if needed
  - Decoding/descrambling of line/block coded or scrambled line signal
  - Transport frame, e.g. SDH or PDH frame, processing
  - Extraction of cell header for processing
  - Storing of cell payload (or whole cells) to buffer memory
  - HEC processing
    => discard corrupted cells
    => forward headers of uncorrupted cells to routing process
  - Generation of a new cell header (if RIT only at input) and routing tag to be used inside switch fabric
  - Cell stream is slotted and a cell is forwarded through switch fabric in a time-slot
Functions of output port controller

  - Cells received from switch fabric are stored into output buffer
  - Generation of a new cell header (if RIT also at output)
  - One cell at a time is transferred to the outgoing line interface
  - If no buffering is available then contention resolution => one cell transmitted and others discarded
  - If buffering is available and priorities are supported then higher priority cells are forwarded first to transport frame processing
  - Cell encapsulation into transport frames, e.g. SDH or PDH frame
  - Line/block encoding or scrambling of outgoing bit stream
  - Conversion of electronic signal to optical form (if needed)
  - Transmission of outgoing line signal
Input and output controller blocks

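[Figure: input controller blocks. The signal from the network passes O/E conversion and SDH (STM-1 frame) processing; the 5-octet header is used for the RIT lookup (old VPI/VCI, new VPI/VCI, output port) while the 48-octet payload waits in a buffer, and the cell with its new header and routing tag is sent to the switch fabric.]
[Figure: output controller blocks. Cells from the switch fabric are buffered, mapped into STM-1 frames by the SDH block and sent to the network after E/O conversion.]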
Switch control
Switch controller implements functions of ATM management and control layer
Control plane
  - responsible for establishment and release of connections, which are either pre-established (PVCs) using management functions or set up dynamically (SVCs) on demand using signaling, such as UNI and PNNI signaling
  - signaling/management used to update routing tables (RITs) in the switches
  - implements ILMI (Integrated Local Management Interface), UNI signaling and PNNI routing protocols
  - processes OAM cells
ILMI protocol uses SNMP (Simple Network Management Protocol) to provide ATM network devices with status and configuration information related to VPCs, SVCs, registered ATM addresses and capabilities of ATM interfaces
UNI signaling specifies the procedures to dynamically establish, maintain and clear ATM connections at UNI
PNNI protocol provides the functions to establish and clear connections, manage network resources and allow the network to be easily configurable
Management plane provides management functions and capabilities to exchange information between the user plane and control plane
ATM protocol reference model
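[Figure: ATM protocol reference model. The control plane and the user plane each contain higher layer protocols and an ATM adaptation layer on top of the common ATM layer and physical layer; the management plane comprises plane management and layer management. The model applies to terminals and nodes along the path (terminal - node - node - terminal).]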
Switch fabric
Provides interconnections between input and output interfaces
ATM specific requirements
  - switching of fixed length cells
  - no regular switching pattern between an input-output port pair, i.e., the time gap between consecutive cells to be switched from an input to a specific output varies with time
Early implementations used time switching principle (mostly based on shared media fabrics) - easy to use, but limited scalability
Increased input rates forced designers to consider alternative solutions
  => small crossbar fabrics were developed
  => multi-stage constructions with self-routing reinvented
Cell routing through switch fabric

Cells are usually carried through switch fabric in fabric specific frames
Carrier frames include, e.g. header, payload and trailer fields
Header field is sub-divided into
  - source port address
  - destination port address
  - flow control sub-field (single/multi-cast cell, copy indication, etc.)
Payload field carries an ATM cell (with or without cell header)
Trailer is usually optional and implements an error indication/correction sub-field, e.g. parity or CRC
General structure of a cell carrier frame
[Figure: frame header, frame payload and frame trailer.]
ATM switching and buffering

Due to the asynchronous nature of ATM traffic, buffering is an important part of an ATM switch fabric design
A number of different buffering strategies have been developed
  - input buffering
  - output buffering
  - input-output buffering
  - internal buffering
  - shared buffering
  - cross-point buffering
  - recirculation buffering
  - multi-stage shared buffering
  - virtual output queuing buffering
Buffering strategies
[Figure: schematic placement of buffers relative to the switch fabric for each strategy: input buffered, output buffered, input-output buffered, internally buffered, shared buffer (shared memory), cross-point buffered, recirculation buffered, multi-stage shared buffer and virtual output queuing.]
ATM switching and buffering (cont.)

Input buffered switches
  - suffer from head-of-line (HOL) blocking => throughput limited to 58.6 % of the maximum capacity of a switch (under uniform load)
  - windowing technique can be used to increase throughput, i.e. multiple cells in each input buffer are examined and considered for transmission to output ports (however, only one cell is transmitted in a time-slot)
    => window size of two gives throughput of 70 %
    => windowing increases implementation complexity
[Figure: input buffered switch, with buffers in front of the switch fabric inputs.]
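The 58.6 % limit quoted for input buffering is the large-switch value 2 - sqrt(2) of the saturation throughput under uniform traffic. A rough Monte Carlo sketch can reproduce it, assuming every input always has a cell waiting, destinations are uniformly distributed and each output picks one contending head-of-line cell at random. The function name and parameters are illustrative, not taken from the lecture.

```python
import random


def hol_saturation_throughput(n_ports: int, n_slots: int, seed: int = 1) -> float:
    """Fraction of output capacity used when every input queue is saturated."""
    rng = random.Random(seed)
    # Destination of the head-of-line cell at each input
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group inputs by the output their HOL cell is destined for
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output serves exactly one contender per time-slot
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next cell moves to the head
            delivered += 1
    return delivered / (n_ports * n_slots)


limit = 2 - 2 ** 0.5                       # about 0.586
estimate = hol_saturation_throughput(32, 5000)
```

For a finite switch the estimate lies slightly above the limit, since the saturation throughput decreases towards 2 - sqrt(2) as the number of ports grows.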

Output buffered switches
  - no HOL blocking problem
  - theoretically 100 % throughput possible
  - high memory speed requirement, which can be alleviated by a concentrator
    => output port count reduced
    => reduced memory speed requirement
    => increased cell loss rate (CLR)
  - output buffered systems largely used in ATM switching
[Figure: output buffered switch, with buffers behind the switch fabric outputs.]

Input-output buffered switches
  - intended to combine advantages of input and output buffering
    - in input buffering, memory speed is comparable to input line rate
    - in output buffering, each output accepts up to L cells (1 ≤ L ≤ N)
    => if there are more than L cells destined for the same output, excess cells are stored in input buffers
  - desired throughput can be obtained by engineering the speed up factor L, based on the input traffic distribution
  - output buffer memory needs to operate at L times the line rate
    => large-scale switches can be realized by applying input-output buffering
  - complicated arbitration mechanism required to determine which L cells among the N possible HOL cells go to the output port

Internally buffered switch
  - buffers implemented within switch blocks
  - example is a buffered banyan switch
  - buffers used to store internally blocked cells => reduced cell loss rate
  - suffers from low throughput and high transfer delay
  - support of QoS requires scheduling and buffer management schemes => increased implementation cost

Shared-buffer switches
  - all inputs and outputs have access to a common buffer memory
  - all inputs store a cell and all outputs retrieve a cell in a time-slot => high memory access speed
  - works effectively like an output buffered switch => optimal delay and throughput performance
  - for a given CLR, shared-buffer switches need less memory than other buffering schemes => smaller memory size reduces cost when switching speed is high (Gbit/s range)
  - switch size is limited by the memory access speed (read/write time)
  - cells destined for congested outputs can occupy the shared memory, leaving less room for cells destined for other outputs (solved by assigning minimum and maximum buffer capacity for each output)
[Figure: shared buffer switch, with a shared memory between switch fabric stages.]

Cross-point buffered switches
  - a crossbar switch with buffers at cross-points
  - buffers used to avoid output blocking
  - each cross-point implements a buffer and an address filter
  - cells addressed to an output are accepted to the corresponding buffer
  - cells waiting in buffers on the same column are arbitrated to the output port one per time-slot
  - no performance limitation as with input buffering
  - similar to output queuing, but the queue for each output is distributed over a number (N) of buffers
    => total memory space needed for a certain CLR is larger than in an output buffered system
  - including cross-point memory in a crossbar chip limits the number of cross-points

Recirculation buffered switches
  - proposed to overcome output port contention problem
  - cells that have lost output contention are stored in recirculation buffers and they contend again in the next time-slot
  - out-of-sequence errors avoided by assigning priority value to cells
  - priority level increased by one each time a cell loses contention
    => a cell with the highest priority is discarded if it loses contention
  - number of recirculation ports can be engineered to fulfill required cell loss rate (CLR = 10^-6 at 80 % load and Poisson arrivals => recirculation port count divided by input port count = 2.5)
  - example implementations are Starlite switch and Sunshine switch
    - Sunshine allows several cells to arrive to an output in a time-slot => dramatic reduction of recirculation ports

Multi-stage shared buffer switches
  - shared buffer switches largely used in implementing small-scale switches
    - due to sufficiently high throughput, low delay and high memory utilization
  - large-scale switches can be realized by interconnecting multiple shared buffer switch modules
    => system performance degraded due to internal blocking
  - in multi-stage switches, queue lengths may be different in the 1st and 2nd stage buffers and thus maintenance of cell sequence at the output module may be very complex and expensive

Virtual output queuing switches
  - a technique to solve HOL blocking problem in input buffered switches
  - each input implements a logical buffer for each output (in a common buffer memory)
  - HOL blocking reduced and throughput increased
  - fast and intelligent arbitration mechanism required, because all HOL cells need to be arbitrated in each time-slot
    => arbitration may become the system bottleneck
Design criteria for ATM switches
Several criteria need to be considered when designing an ATM switch architecture
  - a switch should provide bounded delay and small cell loss probability while achieving a maximum throughput (close to 100 %)
  - capacity to support high-speed input lines (which possibly deploy different transport technologies, e.g. PDH or SDH)
  - self-routing and distributed control essential to implement large-scale switches
  - maintenance of correct cell sequence at outputs
Performance criteria for ATM switches

Performance defined for different quality of service (QoS) classes
Performance parameters:
  - cell loss ratio (CLR)
  - cell transfer delay (CTD)
  - two-point cell transfer delay variation (CDV)

Bellcore recommended performance requirements

Performance parameter                     CLP   QoS1       QoS3      QoS4
Cell loss ratio                           0     < 10^-10   < 10^-7   < 10^-7
Cell loss ratio                           1     N/S        N/S       N/S
Cell transfer delay (99th percentile)     1/0   150 µs     150 µs    150 µs
Cell delay variation (10^-10 quantile)    1/0   250 µs     N/S       N/S
Cell delay variation (10^-7 quantile)     1/0   N/S        250 µs    250 µs

N/S - not specified
Distribution of cell transfer delay

The figure below shows a typical cell transfer delay (CTD) distribution through a switch node
Fixed delay is attributed to table lookup delay and other cell header processing (e.g. HEC processing)
For example:
  - Prob(CTD > 150 µs) < 1 - 0.99 => a = 0.01 and x = 150 µs (QoS1, 3 and 4)
  - Prob(CTD > 250 µs) < 10^-10 => a = 10^-10 and x = 250 µs (QoS1)
[Figure: probability density of cell transfer delay, showing the fixed delay, the areas 1 - a and a of the density, the peak-to-peak CDV and the maximum CDV.]
Cell processing times at different transmission speeds

[Figure: cell processing time (logarithmic axis, 100 ns to 100 ms) as a function of link speed (9.6 kbit/s to 2.5 Gbit/s). Marked points: 221 µs for E1, 2.83 µs for STM-1, 708 ns for STM-4 and 177 ns for STM-16.]
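The marked processing times correspond to the time it takes one 53-octet cell to arrive on each link. The sketch below recomputes them under two assumptions that are not stated on the slide: an E1 link carries cells in 30 of its 32 time-slots, and an STM-N link carries N x 9 x 260 payload octets per 125 µs frame (the VC-4 payload used in the dimensioning example later in this lecture).

```python
CELL_BITS = 53 * 8  # one ATM cell

# Assumed payload rates in bit/s
PAYLOAD_RATE = {
    "E1": 30 * 64_000,                 # 30 of 32 time-slots carry cells
    "STM-1": 1 * 9 * 260 * 8 * 8000,   # 8000 frames per second
    "STM-4": 4 * 9 * 260 * 8 * 8000,
    "STM-16": 16 * 9 * 260 * 8 * 8000,
}


def cell_time_seconds(link: str) -> float:
    """Time available for processing one cell at full load."""
    return CELL_BITS / PAYLOAD_RATE[link]


for link in PAYLOAD_RATE:
    print(f"{link}: {cell_time_seconds(link) * 1e9:.0f} ns per cell")
```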
Delay and jitter components

[Figure: delay components along the path User A - Node 1 - Node 2 - ... - Node n - User B, numbered as below; the figure also marks which components contribute to jitter and which do not.]
1 - Encapsulation/decapsulation delay
2 - Admission control (smoothing)
3 - Queuing delay
4 - Switching delay
5 - Transmission delay
6 - Propagation delay
7 - Reassembly (playout) delay
ATM switches
ATM switching fabric implementations

Many different switching network architectures have been experimented with in ATM switch fabrics:
  - Batcher-banyan based switches, e.g. Sunshine
  - Clos network based switches, e.g. Atlanta
  - Crossbar/crosspoint switches, e.g. TDXP (Tandem-Crosspoint)
  - Ring and single/dual bus based switches
Most advanced ATM switching concepts are switching network independent, e.g. Knockout and Abacus
Knockout switch
Output buffered switches largely used in ATM networks
Capacity of output buffered switches limited by memory speed
Problem solved by limiting the number of cells allowed to an output during each time-slot and discarding excess cells => knockout principle
How many cells to deliver to an output port during each time-slot?
  => this number can be determined for a given cell loss rate (CLR), e.g. 12 cells per time-slot for CLR = 10^-10, independent of switch size
Memory speed seemed to be no longer a bottleneck; however, no commercial switch implementations appeared
  - inputs are supposed to be uncorrelated (not the case in real networks)
  - idea of discarding cells is not an appealing one
Knockout principle has been the basis for various switch architectures
Knockout principle
N input lines each implement a broadcast input bus, which is connected to every output block
An output block is composed of cell filters that are connected to an N-to-L concentrator, which is further connected to a shared buffer
No congestion between inputs and output blocks
Congestion occurs at the interfaces of outputs (inside concentrator)
k cells passing through cell filters enter the concentrator and
  - if k ≤ L then all cells go to shared buffer
  - if k > L then L cells go to shared buffer and k - L cells are discarded
Shared buffer includes a barrel shifter and L output (FIFO) buffers
  - barrel shifter stores cells coming from concentrator to FIFO memories in round robin fashion => complete sharing of output FIFO buffers
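The behaviour of one output block can be sketched as follows. The class is hypothetical and simplified: the concentrator here keeps the first L cells in the order they are presented, whereas a real concentrator selects the winners in hardware, and the barrel shifter is modelled as a write pointer that continues round robin from where the previous time-slot stopped.

```python
from collections import deque


class KnockoutOutputBlock:
    """N-to-L concentrator followed by a barrel shifter and L FIFO buffers."""

    def __init__(self, l_outputs: int):
        self.fifos = [deque() for _ in range(l_outputs)]
        self.write_ptr = 0   # FIFO that receives the next stored cell
        self.read_ptr = 0    # FIFO that holds the next cell to transmit
        self.dropped = 0

    def accept(self, cells: list) -> None:
        """Store the cells that passed the cell filters in one time-slot."""
        limit = len(self.fifos)
        kept = cells[:limit]                  # k > L: k - L cells are knocked out
        self.dropped += len(cells) - len(kept)
        for cell in kept:
            self.fifos[self.write_ptr].append(cell)
            self.write_ptr = (self.write_ptr + 1) % limit

    def transmit(self):
        """Send one cell to the output line, or None when the buffer is empty."""
        fifo = self.fifos[self.read_ptr]
        if not fifo:
            return None
        self.read_ptr = (self.read_ptr + 1) % len(self.fifos)
        return fifo.popleft()


block = KnockoutOutputBlock(l_outputs=4)
block.accept(list("ABCDEF"))   # six cells arrive, only four are stored
block.accept(list("GH"))
sent = [block.transmit() for _ in range(7)]
```

Reading the FIFOs in the same round robin order as they are written keeps the cells in arrival order and fills all L buffers evenly, which is the complete sharing mentioned above.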
Knockout switch interconnection architecture
[Figure: inputs 1 ... N each drive a broadcast bus; every bus is connected to the bus interface of each of the outputs 1 ... N.]
Knockout switch bus interface

[Figure: bus interface of one output. Inputs 1 ... N pass through N cell filters into the concentrator, whose L outputs feed the shared buffer (barrel shifter followed by cell buffers 1 ... L) that drives the output.]
Operation of barrel shifter

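[Figure: barrel shifter operation. At time T cells A, B and C pass through the barrel shifter into consecutive buffers. At time T+1 cells D ... J arrive and are stored starting from the buffer following the one that received C, so that the buffers hold A ... H in order and I, J wrap around to the first buffers.]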
Example construction of concentrator

[Figure: an 8-to-4 concentrator built from contention elements and delay elements (D) between its inputs and outputs.]
Cell loss probability

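In every time-slot there is a probability ρ that a cell arrives at an input
Every cell is equally likely destined for any output
P_k denotes the probability of k cells arriving in a time-slot to the same output, which is a binomial distribution

$$P_k = \binom{N}{k} \left(\frac{\rho}{N}\right)^{k} \left(1 - \frac{\rho}{N}\right)^{N-k}, \quad k = 0, 1, \ldots, N$$

Probability of a cell being dropped in an N-to-L concentrator is given by

$$P(\text{cell loss}) = \frac{1}{\rho} \sum_{k=L+1}^{N} (k - L) \binom{N}{k} \left(\frac{\rho}{N}\right)^{k} \left(1 - \frac{\rho}{N}\right)^{N-k}$$

Taking the limit as N → ∞ and with some manipulation

$$P(\text{cell loss}) = \left[1 - \frac{L}{\rho}\right] \left[1 - \sum_{k=0}^{L} \frac{\rho^{k} e^{-\rho}}{k!}\right] + \frac{\rho^{L} e^{-\rho}}{L!}$$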
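The loss formulas can be evaluated numerically to see how many concentrator outputs L a target CLR requires. The sketch below uses hypothetical function names and writes the per-input load as rho; the finite-N version is only meant for moderate N, because very large binomial coefficients cannot be converted to floating point. For N → ∞ it sums the Poisson tail directly, which is algebraically equal to the closed form and avoids subtracting nearly equal numbers.

```python
from math import comb, exp, factorial


def loss_finite(n: int, l: int, rho: float) -> float:
    """Cell loss probability of an N-to-L concentrator (binomial arrivals)."""
    p = rho / n
    tail = sum((k - l) * comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(l + 1, n + 1))
    return tail / rho


def poisson(k: int, rho: float) -> float:
    return rho**k * exp(-rho) / factorial(k)


def loss_infinite(l: int, rho: float, terms: int = 60) -> float:
    """Limit N -> infinity, summed directly over the Poisson tail."""
    return sum((k - l) * poisson(k, rho)
               for k in range(l + 1, l + 1 + terms)) / rho


def loss_infinite_closed(l: int, rho: float) -> float:
    """Same limit written in the closed form given above."""
    head = sum(poisson(k, rho) for k in range(l + 1))
    return (1 - l / rho) * (1 - head) + poisson(l, rho)


for l in (8, 10, 12):
    print(l, loss_infinite(l, 0.9))
```

At 90 % load the result drops below 10^-6 for L = 8 and below 10^-10 for L = 12, regardless of switch size.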
Cell loss probability (cont.)

[Figure, left panel: cell loss probability (10^0 to 10^-12) versus number of concentrator outputs L for some switch sizes (N = 16, 32, 64 and ∞) at 90 % load.]
[Figure, right panel: cell loss probability (10^0 to 10^-12) versus number of concentrator outputs L for some load values (60 %, 70 %, 80 %, 90 % and 100 %) with N = ∞.]
Channel grouping
Channel grouping principle used in modular two-stage networks
  - a group of outputs is treated identically in the first stage
  - a cell destined for an output of a group is routed to any output (at the first stage), which is connected to that group at the second stage
  - first stage switch routes cells to proper output groups and second stage switches route cells to destined output ports
[Figure: two-stage example with outputs 0 - 3 and 4 - 7 forming two groups; cells destined for outputs 6, 1, 4 and 3 are routed by the 1st stage to the right group and by the 2nd stage to the destined output.]
Channel grouping (cont.)

Asymmetric switch with line extension ratio of KM/N
Output group of M output ports corresponds to a single output address for the 1st stage switch
At any given time-slot, M cells at most can be cleared from a particular output group (one cell on each output port)
[Figure: an N x KM first-stage switch followed by K second-stage M x M switches, each serving an output group of M ports.]

Maximum throughput per input
  - increases with K/N for a given M (because load per output group decreases)
  - increases with M for a given K/N (because each output group has more ports for clearing cells)

Maximum throughput per input for some values of M and K/N

K/N     M=1     M=2     M=4     M=8     M=16
1/16    0.061   0.121   0.241   0.476   0.878
1/8     0.117   0.233   0.457   0.831   0.999
1/4     0.219   0.426   0.768   0.991   1
1/2     0.382   0.686   0.959   1
1       0.586   0.885   0.996   1
2       0.764   0.966   1
4       0.877   0.991   1
8       0.938   0.998   1
16      0.969   0.999

Maximum throughput per input increases with M for a given KM/N
Channel grouping has a stronger effect on throughput for smaller KM/N than for larger ones

Maximum throughput as a function of line expansion ratio KM/N

M       KM/N=1  2       4       8       16      32
1       0.586   0.764   0.877   0.938   0.969   0.984
2       0.686   0.885   0.966   0.991   0.998   0.999
4       0.768   0.959   0.996   1       1       1
8       0.831   0.991   1
16      0.878   0.999
32      0.912   1
64      0.937
128     0.955
256     0.968
512     0.978
1024    0.984
Multicast output buffered ATM switch (MOBAS)

Channel grouping extends to the general Knockout principle
MOBAS adopts the general Knockout principle
MOBAS consists of
  - input port controllers (IPCs)
  - multi-cast grouping networks (MGN1 and MGN2)
  - multi-cast translation tables (MTTs)
  - output port controllers (OPCs)
[Figure: an N x N switch with group extension ratio L, built from an N x LN first-stage switch followed by K second-stage LM x M switches; each group of L x M links feeds one LM x M switch serving M outputs (outputs 1 ... M up to N-M+1 ... N).]
MOBAS switch performance

IPCs terminate incoming cells, look up necessary information in translation tables and attach information in front of cells so that the cells can properly be routed in MGNs
MGNs replicate multi-cast cells based on their multi-cast patterns and send one copy to each addressed output group
MTTs facilitate the multi-cast cell routing in MGN2
OPCs temporarily store multiple arriving cells (destined for their output ports) in an output buffer, generate multiple copies for multi-cast cells with a cell duplicator (CD), assign a new VCI obtained from a translation table to each copy, convert internal cell format to standard ATM cell format and finally send the cell to the next switching node
CD reduces output buffer size by storing only one copy of a multi-cast cell
  - each copy is updated with a new VCI upon transmission
MOBAS architecture
[Figure: MOBAS architecture. IPCs 1 ... N feed MGN1, which consists of switching modules SM 1 ... SM K and routes cells over L1 x M links to output groups 1 ... K. In each group an MGN2 with switching modules SM 1 ... SM M and an MTT delivers cells over L2 links to the OPCs, each of which contains an output buffer and a CD. Outputs are numbered 1 ... M up to N-M+1 ... N.]
IPC - Input Port Controller
MGN - Multi-cast Grouping Network
MTT - Multi-cast Translation Table
OPC - Output Port Controller
SM - Switching Module
SSM - Small Switch Module
CD - Cell Duplicator
Abacus switch
Knockout switches suffer from cell loss due to concentration/channel grouping (i.e. lack of routing links inside switch fabric)
In order to reduce CLR, excess cells are stored in input buffers => result is an input-output buffered switch
Abacus switch is an example of such a switch
  - basic structure similar to MOBAS, but it does not discard cells in switch fabric
  - switching elements resolve contention for routing links based on priority level of cells
  - input ports temporarily store cells that have lost contention
  - extra feedback lines and logic added to input ports
  - distributed arbitration scheme allows switch to grow to a large size
Abacus switch (cont.)

[Figure: Abacus switch. IPCs 1 ... N feed the MGN, which consists of routing modules RM 1 ... RM K; each RM delivers cells over L x M links to a small switch module (SSM 1 ... SSM K) with MTTs, which serves the OPCs of outputs 1 ... M up to N-M+1 ... N.]
IPC - Input Port Controller
MGN - Multi-cast Grouping Network
MTT - Multi-cast Translation Table
OPC - Output Port Controller
RM - Routing Module
SSM - Small Switch Module
ATM switches
Dimensioning example
An ATM switch is to be designed to support 20 STM-4 interfaces. RIT will be implemented at the input interfaces.
  - How fast should the RIT lookup process be?
  - Cells are encapsulated into frames for delivery through the switch fabric. A frame includes a 53-octet payload field and 3 octets of overhead for routing and control inside the switch fabric. What is the required throughput of the switch fabric?
Solution
ATM cells are encapsulated into VC-4 containers, which include 9 octets of overhead and 9 x 260 octets of payload. One VC-4 container is carried in one STM-1 frame and each STM-1 frame contains 9 x 261 octets of payload and 9 x 9 octets of overhead. (See figure on the next slide)
ATM cell encapsulation / SDH

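[Figure: an STM-1 frame has 9 columns of overhead (SOH and AU-4 PTR) and 261 columns of payload, which carry the VC-4 frame. The first VC-4 column is the VC-4 POH (J1, B3, C2, G1, F2, H4, Z3, Z4, Z5) and the remaining columns are filled with consecutive ATM cells.]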
Dimensioning example (cont.)

Solution (cont.)
An STM-4 frame carries 4 STM-1 frames and thus there will be 4 x 9 x 260 / 53 = 176.6 cells arriving in one STM-4 frame
One STM-4 frame is transported in 125 µs => 176.6 / 125 µs = 1412830.2 cells will arrive at an input in 1 s => one RIT lookup should last no more than 707.8 ns
Total throughput of the switch fabric is 20 x 1412830.2 cells/s
Since each cell is carried through the switch fabric in a container of 56 octets, the total load introduced by the inputs to the switch fabric is 20 x 1412830.2 x 56 octets/s ≈ 1.582 x 10^9 octets/s ≈ 12.7 Gbit/s
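The arithmetic of the solution can be checked with a short script. The variable names are illustrative; the numbers are those given in the example.

```python
INTERFACES = 20            # STM-4 inputs
STM1_PER_STM4 = 4
VC4_PAYLOAD_OCTETS = 9 * 260
CELL_OCTETS = 53
FRAME_TIME_S = 125e-6      # one SDH frame
FABRIC_FRAME_OCTETS = 53 + 3   # cell plus internal routing/control overhead

cells_per_frame = STM1_PER_STM4 * VC4_PAYLOAD_OCTETS / CELL_OCTETS
cells_per_second = cells_per_frame / FRAME_TIME_S
lookup_time_ns = 1e9 / cells_per_second
fabric_octets_per_s = INTERFACES * cells_per_second * FABRIC_FRAME_OCTETS
fabric_gbit_per_s = fabric_octets_per_s * 8 / 1e9

print(f"{cells_per_frame:.1f} cells per STM-4 frame")
print(f"{cells_per_second:.1f} cells/s per input")
print(f"RIT lookup budget {lookup_time_ns:.1f} ns")
print(f"fabric load {fabric_gbit_per_s:.1f} Gbit/s")
```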
Router implementations
Router implementations
General of routers
Functions of an IP router
Router architectures
Introduction to routing table lookup
General of routers
A router is a piece of network equipment, which
  - performs packet switching operations
  - operates at the network layer of the OSI protocol reference model
  - switches/routes variable length packets
  - makes routing decisions based on address information carried in packets
A router is used to connect two or more networks that may or may not be similar
Routers communicate with each other by means of routing messages to
  - exchange routing information
  - resolve next hop addresses
  - maintain network topology to make routing decisions
IPv4 packet structure

[Figure: IPv4 header layout, 32 bits per row, 20 octets without options: Ver. (4 bits), IHL (4), TOS (8), Length (16); Identifier (16), Flag (3), Fragment offset (13); Time to live (8), Protocol (8), Header checksum (16); Source address (32 bits); Destination address (32 bits); Options + padding; followed by the data field.]
Version - IP version number
IHL - Internet header length in 32-bit words (min. size = 5)
TOS - type of service (guidance to end-system IP modules and routers along the transport path)
Length - total length (header + payload) of the IP packet in octets
Identifier - sequence number, which together with the address and protocol fields identifies each IP packet uniquely
Flag - used in connection with fragmentation: the more bit indicates whether this fragment is the last one of a fragmented packet and the don't fragment bit prohibits packet fragmentation
Fragment offset - indicates where in the original user data message this fragment belongs, measured in 64-bit units (every fragment but the last one has a data field that is a multiple of 64 bits)
Time-to-live - defines the maximum time in seconds a packet can be in transit across the Internet (decremented by each visited router by a defined amount)
Protocol - indicates the type of protocol (TCP, UDP, etc.) carried by the IP packet
Header checksum - carries a checksum calculated over the header bits
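The field layout above can be illustrated by unpacking a header with Python's `struct` module. This is a sketch with hypothetical function names; the checksum routine is the standard Internet checksum (16-bit one's complement sum), which yields zero when computed over a header whose checksum field is valid.

```python
import ipaddress
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-octet part of an IPv4 header."""
    (ver_ihl, tos, length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = ver_ihl & 0x0F
    return {
        "version": ver_ihl >> 4,
        "ihl_words": ihl,
        "tos": tos,
        "length": length,
        "identifier": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset_octets": (flags_frag & 0x1FFF) * 8,  # 64-bit units
        "ttl": ttl,
        "protocol": protocol,
        "checksum_ok": internet_checksum(packet[:ihl * 4]) == 0,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }


# Example header: UDP packet from 192.168.0.1 to 192.168.0.199
example = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
header = parse_ipv4_header(example)
```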
IPv6 packet structure

[Figure: IPv6 header layout, 40 octets: Version (4 bits), Traffic class (8), Flow label (20); Payload length (16), Next header (8), Hop limit (8); Source address (128 bits); Destination address (128 bits); followed by the data field.]
Version - version number of IP protocol
Traffic class - Class-of-service (CoS) priority of the packet
Flow label - identifies all packets belonging to a specific flow (requiring a specific CoS), so that routers can identify these packets and handle them in a similar fashion. A flow is uniquely identified by the combination of a source address and a non-zero flow label.
Payload length - indicates the number of octets in the payload field
Next header - indicates the type of additional (extension) header following the main header
Hop limit - value for the maximum number of hops the packet is allowed to travel in a network
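The fixed 40-octet IPv6 header is simpler to decode than the IPv4 header, since it has no options and no checksum. A minimal sketch, with a hypothetical function name and made-up example values:

```python
import ipaddress
import struct


def parse_ipv6_header(packet: bytes) -> dict:
    """Decode the fixed 40-octet IPv6 header."""
    first_word, payload_length, next_header, hop_limit = struct.unpack(
        "!IHBB", packet[:8])
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_length,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "source": str(ipaddress.IPv6Address(packet[8:24])),
        "destination": str(ipaddress.IPv6Address(packet[24:40])),
    }


# Hypothetical header: traffic class 0x2E, flow label 0x12345, TCP payload of 20 octets
raw = (struct.pack("!IHBB", (6 << 28) | (0x2E << 20) | 0x12345, 20, 6, 64)
       + ipaddress.IPv6Address("::1").packed
       + ipaddress.IPv6Address("2001:db8::1").packed)
header6 = parse_ipv6_header(raw)
```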
General of routing Functions of an IP router Router architectures Introduction to routing table lookup
Major tasks of a router

[Figure: a packet entering at In_port is classified (C), forwarded (F) and scheduled (S) for transmission.]
C - Classify (classification, filtering and routing)
F - Forward (transfer of packets from input interfaces to addressed output interfaces)
S - Scheduling (transmission of data packets based, e.g. on priority)
Main functional blocks of a router

Generic router architecture
[Figure: input interface cards (input port #1 ...) and output interface cards (output port #1 ...) connected through a switch fabric, which is controlled by a network processor.]
Input port functionality

Layer 1 termination of incoming physical links (e.g. SDH, Ethernet)
Layer 2 frame decapsulation to inter-operate with data-link protocols of connected networks (e.g. AAL5/ATM/SDH and PPP/SDH)
Forwarding of control packets, e.g. routing information packets (RIP, OSPF, IGMP), to the network processor to update the routing table and network topology
Some implementations distribute a copy of the routing table and the table lookup to each input port, while others forward all incoming packets to a centralized routing processor
[Figure: input port functionality - Layer 1 functions (line termination), Layer 2 functions (protocol decapsulation), lookup/forwarding/queuing]
Output port functionality

Buffering of outbound packets
Scheduling of buffered packets to guarantee required QoS
Layer 2 frame generation and encapsulation of packets into frames (e.g. AAL5/ATM/SDH, PPP/SDH and Ethernet)
Layer 1 physical signal generation
[Figure: output port functionality - buffer management/queuing, Layer 2 functions (protocol encapsulation), Layer 1 functions (line termination)]
Switch fabric functionality

Main function is to route data packets from input ports to addressed output ports
Depending on the switch fabric implementation, packets are transported through the fabric either as whole variable-length packets or fragmented into fixed-size data units
In either case, extra information is added in front of the packets to direct them through the fabric
- switching of whole packets is usually applied in low-speed routers
- switching of fragments is normally used in high-speed routers
The majority of switch fabrics are based on three basic architectures: bus based, memory based and interconnection network based
Network processor functionality

Maintenance of routing table
Execution of routing protocols
Maintenance of routing topology
Performance of network management
Wire-speed operation obtained by implementing key functions in hardware
Processing of packets
- classification
- order management
- acceleration of lookup
- queue management
- QoS engine
[Figure: network processor functionality - blocks shown between incoming and outgoing packets: parallel classifiers, order management, embedded processors with lookup, queue management and QoS]
Router classification
Access routers
- link homes and small businesses to ISPs (Internet Service Providers)
- need to support a variety of access technologies, e.g. high-speed modems, cable modems and xDSL
Enterprise/metropolitan routers
- used as campus and office interconnects
- QoS guarantees for local traffic
- support of several network layer protocols (e.g. IP and IPX)
- support of additional features, such as firewalls, security policies and virtual LANs
Backbone/long haul routers

- interconnect enterprise routers
- huge number of packets per second => very-high-speed requirement
- critical components for interworking => reliability of utmost concern
Basic types of router architecture

Router with forwarding engines
[Figure: line interfaces, separate forwarding engines and a network processor]
Router with added processing power in interfaces
[Figure: line interfaces with integrated forwarding and a network processor]
First generation router architecture

Network layer protocols were constantly changing => adaptable solution was needed => a single general-purpose processor structure, with the operating system in a central role, was a reasonable choice
Low throughput (packets transferred twice through the bus)
Did not scale well with increasing line speeds
Shared bus, a single processor card and line interface cards

[Figure: line interface cards and a CPU card attached to a shared bus]
Second generation router architecture

Each line card implemented a processor => distributed and parallel routing became available
Main processing unit took care of delivery of routing information to line interface cards
Operating system still in central role
Cache memories were introduced to speed up routing decisions (most recently used routing entries kept in cache)
Increased throughput, but shared bus still a bottleneck
Solution did not scale with increasing line speeds
Shared bus and a processor on each line interface card
[Figure: main CPU and line interface cards, each with its own CPU, attached to a shared bus]
Third generation router architecture

Shared bus replaced with more powerful switch fabrics (e.g. multistage and crossbar)
Parallel processing units (based on general purpose processors)
Cache memories to enhance routing decision making
Operating system still played an important role
Communication between line interfaces no longer a problem
QoS increases processing power requirement (IP/TCP/application)
Did not scale well enough with the most advanced line speeds
Switch fabric and more processing power
[Figure: interface cards (each with CPU and cache) and CPU cards interconnected by a switch fabric]
Support of differentiated services

Traditional routers are limited in terms of their quality of service and differentiation features. Advances in research and hardware capabilities have provided mechanisms to overcome these limitations.
The following operations, which can today be carried out at high speed, allow provisioning of differentiated services:
Packet classification - distinguish packets and group them according to their different requirements
Buffer management - determine how much buffer space should be given to certain kinds of network traffic and which packets should be discarded in case of congestion
Packet scheduling - decide the packet servicing order so that it meets the bandwidth and delay requirements of different types of traffic
DiffServ routing
[Figure: protocol stacks (Appl., TCP/UDP, IP, Eth. MAC, physical layer) along the path Host 1 - Router 1 - Router 2 - Host 2; 100 MbE (twisted pair) on the host links and 1 GbE (optical) between the routers]
DiffServ routing (cont.)

[Figure: protocol stacks along the path Host 1 - Router 1 - Router 2 - Host 2, with the stacks in the routers extending up to the Appl. layer]
Sharing of processing resources and pipelining

DiffServ-optimized router architecture
[Figure: interface cards with multiple special purpose CPUs, concentrator, filtering, route lookup, buffering and scheduler]
Sharing of processing resources and pipelining (cont.)
Packet processing divided into a number of consecutive processes
- each process has a dedicated processing unit (buffering, filtering, routing, etc.)
Pipelined processes shared by several interfaces to increase number of line interfaces
- concentrator schedules packets for processes
QoS-based scheduler takes care of packet transmission from buffers to outbound interfaces
Packet processing capacity

Packet processing capacity of a router is given as the number of forwarded packets/second and/or forwarded bits/second
Tasks affecting forwarding speed
- link protocol processing delay (input and output)
- address lookup time
- switching of packets from input ports to output ports
- queuing at output ports and possibly at input ports
Other tasks that may have an impact on forwarding speed
- routing table management/updates
- network and router management
In high capacity routers, routing table lookups are a major problem
Queuing is the main component of routing latency
Routing capacity requirement determined by the shortest packets
Future challenges
Increase of line speeds
- 100 Mbit/s => 1 Gbit/s => 10 Gbit/s => 40/100 Gbit/s
QoS-support => increased processing need
- DiffServ, IntServ, MPLS, ...
From best effort service to controlled use of network resources
=> programmable network nodes
Different needs in the core and edge networks
- huge routing capacity in the core network ( > 10 million packets/s)
- a lot of functionality and intelligence in the edge routers
Speedup mechanisms for routing table lookup

Caching - routing table entries of the most recently arrived packets, or the most frequently accessed entries, are stored in cache memory
Pipelining - different phases of routing table lookup are executed by different pipelined processing units
Distribution of lookup to interfaces or to several routing engines - network processor takes care of routing table updates and distributes updated tables to separate interface/routing engines
In centralized routing solutions only packet headers are sent to a routing processor
Implementation of lookup functionality in hardware (at the expense of flexibility)
Caching to speedup packet processing and forwarding

When a packet with a new destination address arrives at an input port, it passes through the conventional routing process (slow path) and its routing entry is stored in cache memory
Subsequent packets carrying the same destination address are routed using the routing entry in the cache memory (fast path)
A routing entry is removed from cache when the predefined conditions to keep it in cache expire, e.g. packet arrival rate declines or time since the last packet becomes too long
[Figure: slow path and fast path for input ports 1 ... N]
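The slow path / fast path behaviour described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name RouteCache, the idle_timeout expiry rule and the injectable clock are hypothetical choices, and a real router would use the expiry conditions of its own implementation.

```python
import time


class RouteCache:
    """Destination cache in front of a slow routing-table lookup (sketch)."""

    def __init__(self, slow_lookup, idle_timeout=30.0, clock=time.monotonic):
        self.slow_lookup = slow_lookup    # conventional routing process
        self.idle_timeout = idle_timeout  # assumed expiry rule: idle time
        self.clock = clock
        self.entries = {}                 # destination -> (next_hop, last_used)
        self.fast_hits = 0
        self.slow_hits = 0

    def route(self, destination):
        now = self.clock()
        cached = self.entries.get(destination)
        if cached is not None and now - cached[1] <= self.idle_timeout:
            self.fast_hits += 1           # fast path: entry found in cache
            next_hop = cached[0]
        else:
            self.slow_hits += 1           # slow path: full routing process
            next_hop = self.slow_lookup(destination)
        self.entries[destination] = (next_hop, now)
        return next_hop

    def expire(self):
        """Remove entries whose idle time exceeds the timeout."""
        now = self.clock()
        stale = [d for d, (_, used) in self.entries.items()
                 if now - used > self.idle_timeout]
        for destination in stale:
            del self.entries[destination]
        return len(stale)


# Hypothetical usage with a fake clock so that expiry can be demonstrated.
fake_now = [0.0]
routing_table = {"10.0.0.1": "port2"}
cache = RouteCache(routing_table.get, idle_timeout=10.0,
                   clock=lambda: fake_now[0])
first = cache.route("10.0.0.1")   # new destination: slow path
second = cache.route("10.0.0.1")  # same destination: fast path
```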
High speed router examples

GSR /Cisco
- first gigabit router on the market
- switching capacity of 27.5 Gbits/s
- equipped with POS (Packet Over Sonet) and ATM interfaces
12000 Terabit System /Cisco
- initial switching capacity of 150 Gbits/s, but scalable up to 5 Tbits/s
- can be equipped with OC192/STM-64 (10 Gbits/s) interfaces
NX64000 /Lucent (Nexabit)
- one of the highest capacity routers (6.4 Tbits/s)
- supports interface rates up to OC192/STM-64
- distributed programmable hardware based forwarding engine
- 1 million routing entries on each line card
- 40 ms delay guarantee for variable size packets
High speed router examples (cont.)

MGR (Multi-Gigabit Router) /BBN Technologies
- forwarding rate up to 21 million packets per second
- switching backplane capacity of 50 Gbits/s
- multiple line cards and separate forwarding engine cards plugged into a high-speed switch
- only packet headers are directed to forwarding engines
- payloads queued on line cards
TSR (Terabit Switch-Router) /Avici
- designed to be scalable from 600 Mbit/s to several Tbits/s
- hardware based routing, forwarding, multi-casting and QoS service
- each line card implements a 70 Gbit/s router and 20 such line cards fit into a dual-shelf chassis => total switching capacity is 1.4 Tbits/s
Example of routing table lookup speed determination

In a distributed routing table lookup solution, each input port implements a routing table. What is the maximum allowed routing decision delay if the input link is a 100 Mbit/s Ethernet link or 1 Gbit/s link and the router should operate at wire-speed?
Solution:
In both example cases, the routing decision delay requirement corresponds to the maximum packet arrival rate at these interfaces
The maximum packet arrival rate is encountered when there is a constant stream of minimum size Ethernet frames
The minimum size 100 MbE frame is 64 octets, there are 8 octets of preamble and SFD information in front of each frame, and additionally there is always a 0.96 µs time gap between successive frames
Example of routing table lookup speed determination (cont.)

Solution (cont.):
Time required to transmit 72 octets (64+8) at the speed of 100 Mbit/s is 5.76 µs
=> minimum time interval between successive frames is 5.76 µs + 0.96 µs = 6.72 µs, which is also the maximum allowed routing decision delay for a 100 MbE input port
=> forwarding capacity is about 149 000 packets/s
The minimum size 1 GbE frame is 512 octets, there are 8 octets of preamble and SFD information in front of each frame, and there is a 96 ns time gap between successive frames
Time required to transmit 520 octets (512+8) at the speed of 1 Gbit/s is 4.16 µs
=> minimum time interval between successive frames is 4.16 µs + 0.096 µs = 4.256 µs, which is also the maximum allowed routing decision delay for a 1 GbE input port (frame bursting excluded)
=> forwarding capacity is about 235 000 packets/s
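The arithmetic of this example can be reproduced with a short Python sketch. The function name max_lookup_delay is hypothetical, and the inter-frame gap is expressed as 12 octet times, which corresponds to 0.96 µs at 100 Mbit/s and 96 ns at 1 Gbit/s.

```python
def max_lookup_delay(min_frame_octets, link_rate_bps,
                     preamble_octets=8, gap_octets=12):
    """Time between back-to-back minimum size frames, in seconds."""
    bits_on_wire = (min_frame_octets + preamble_octets + gap_octets) * 8
    return bits_on_wire / link_rate_bps


fast_ethernet_delay = max_lookup_delay(64, 100e6)  # 100 MbE, 64-octet frames
gigabit_delay = max_lookup_delay(512, 1e9)         # 1 GbE, 512-octet frames

# Forwarding capacity is the inverse of the minimum frame interval.
fast_ethernet_pps = 1 / fast_ethernet_delay
gigabit_pps = 1 / gigabit_delay

print(f"100 MbE: {fast_ethernet_delay * 1e6:.2f} us, {fast_ethernet_pps:.0f} packets/s")
print(f"1 GbE:   {gigabit_delay * 1e6:.3f} us, {gigabit_pps:.0f} packets/s")
```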
10/100 MbE frame
64 - 1518 octets
Field lengths in octets: Preamble (7), SFD (1), DA (6), SA (6), T/L (2), Payload (46 - 1500), CRC (4)
Preamble - AA AA AA AA AA AA AA (Hex)
SFD - Start of Frame Delimiter AB (Hex)
DA - Destination Address
SA - Source Address
T/L - Type (RFC894, Ethernet) or Length (RFC1042, IEEE 802.3) indicator
CRC - Cyclic Redundancy Check
Inter-frame gap 12 octets (9.6 µs /10 MbE)
1 GbE frame
512 - 1518 octets
Fields: Preamble, SFD, DA, SA, T/L, Payload (46 - 1500 octets), CRC, Extension
Preamble - AA AA AA AA AA AA AA (Hex)
SFD - Start of Frame Delimiter AB (Hex)
DA - Destination Address
SA - Source Address
T/L - Type (RFC894, Ethernet) or Length (RFC1042, IEEE 802.3) indicator
CRC - Cyclic Redundancy Check
Inter-frame gap 12 octets (96 ns /1 GbE)
Extension - for padding short frames to be 512 octets long
Classful addressing scheme

In IPv4, addresses are 32 bits long
- broken up into 4 groups of 8 bits and represented usually as four decimal numbers separated by dots, e.g., 10000010 01010110 00010000 01000010 = 130.86.16.66
IP intended for interconnecting networks => routing based on network is a natural choice (rather than based on host)
IP address scheme initially used a simple two level hierarchy
- networks at the top level and hosts at the bottom level
Network part (i.e. address prefix) corresponds to the first bits
Prefixes written as bit strings up to 32 bits in IPv4 followed by *
- e.g. 1000001001010110* represents all the 2^16 addresses that begin with bit pattern 1000001001010110
- an alternative way is to use dotted-decimal expression, i.e., 130.86/16 (number after the slash indicates length of prefix)
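The two prefix notations can be related with a small Python sketch based on the standard ipaddress module. The helper name bit_prefix_to_network is hypothetical. Note that ipaddress prints the full dotted form 130.86.0.0/16 rather than the abbreviated 130.86/16.

```python
import ipaddress


def bit_prefix_to_network(bits: str) -> ipaddress.IPv4Network:
    """Convert a prefix written as a bit string (e.g. '1000*') to a network."""
    bits = bits.rstrip("*")
    if len(bits) > 32 or set(bits) - {"0", "1"}:
        raise ValueError("prefix must be a bit string of at most 32 bits")
    # Pad with zeros on the right to obtain the 32-bit network address.
    value = int(bits.ljust(32, "0"), 2)
    return ipaddress.IPv4Network((value, len(bits)))


network = bit_prefix_to_network("1000001001010110*")
host = ipaddress.IPv4Address("130.86.16.66")
covered = host in network
```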
Classful addressing scheme

With the two level hierarchy, IP routers forwarded packets based on the network part, until packets reached their destination network
Forwarding table only needed to store a single entry to forward packets to all hosts attached to the same network
- technique is called address aggregation and allows prefixes to represent a group of addresses
Three different network sizes were defined: A, B and C
- Class A: leading bit 0, 7-bit network part, 24-bit host part
- Class B: leading bits 10, 14-bit network part, 16-bit host part
- Class C: leading bits 110, 21-bit network part, 8-bit host part
Classful addresses
Classful addressing scheme worked well in the early days of the Internet
Two basic problems appeared when the number of hosts and networks grew
- address space was not efficiently used (only three possible network sizes available) and was getting exhausted very rapidly
- forwarding tables in the backbone routers grew rapidly, because routers maintain an entry in the forwarding table for every allocated network address => larger memory requirement => long lookup times
Classless InterDomain Routing (CIDR)

CIDR was introduced to allow more efficient use of IP address space and to slow down growth of backbone routing tables
CIDR allows prefixes to be of arbitrary length, not just 8, 16 or 24 bits as in classful address scheme
A network that has identical routing information for all sub-nets, except for a single one, requires only two entries in the routing table
In CIDR, each IP route is represented by a route_prefix / prefix_length pair
- prefix length indicates the number of significant bits in a route prefix
- e.g. a routing table may have prefixes 12.0.54.8/32, 12.0.54.0/24 and 12.0.0.0/16. If a packet is destined for address 12.0.54.2, the second and the third prefix both match and the second one is selected, because it is the longest match
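The CIDR example can be checked with a minimal Python sketch that uses the standard ipaddress module. It scans the table linearly, which is enough to show the longest match rule but is not how a router would organize its forwarding table. The function name longest_prefix_match is hypothetical.

```python
import ipaddress


def longest_prefix_match(table, destination):
    """Return the matching network with the longest prefix, or None."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in table if address in network]
    return max(matches, key=lambda network: network.prefixlen, default=None)


routing_table = [
    ipaddress.ip_network("12.0.54.8/32"),
    ipaddress.ip_network("12.0.54.0/24"),
    ipaddress.ip_network("12.0.0.0/16"),
]
best = longest_prefix_match(routing_table, "12.0.54.2")
print(best)
```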
Difficulties with longest match prefix search

In classful addressing scheme, prefix length is coded in the most significant bits of an IP address => address lookup is a relatively simple operation
- prefixes are organized in three separate tables (A, B and C) => an exact prefix match could be found using standard search algorithms based on hashing or binary search
CIDR allows reduced size of forwarding tables, but address lookup problem becomes more complex
=> prefixes are of arbitrary length and no longer correspond to the network part
=> search in the forwarding table can no longer be performed by exact matching, because the length of the prefix cannot be derived from the address itself
=> searching in two dimensions: bit pattern value and length
Route lookup
Primary goal in designing a data structure to be used in a forwarding table is to minimize lookup time, i.e.
- minimize number of memory accesses required during lookups
- minimize size of data structure (to fit partly or entirely into a cache)
Secondary goals of a data structure
- as few instructions during lookup as possible
- keep the entities naturally aligned as much as possible to avoid expensive instructions and cumbersome bit-extraction operations
A binary tree spanning the entire IPv4 address space has a height of 32 and the number of leaves is 2^32
[Figure: binary tree of depth 32 with 2^32 leaves (IP addresses)]
Route lookup (cont.)

A prefix of a routing table entry defines a path in the tree ending in some point and all IP addresses (leaves) in a sub-tree, rooted at that node, should be routed according to that routing entry, i.e. each routing table entry defines a range of IP addresses with identical routing information
If several routing entries cover an IP address, the longest matching rule is applied, i.e. the longest applicable prefix should be used
In the figure below, e2 represents a longer match than e1 for addresses in range r
[Figure: sub-trees of routing entries e1 and e2 and the address range r]
Route lookup based on binary trie

A trie is a tree-based structure that organizes prefixes on a digital basis by using the bits of prefixes to direct the branching
In a trie, a node on level k represents the set of all addresses that begin with the same k bits that label the path from the root to that node, e.g. node c in the figure on the next slide is at level 3 and represents all addresses beginning with the sequence 011
Nodes that correspond to prefixes are shown in a darker shade
- these nodes contain the forwarding information or a pointer to it
Some addresses may match several prefixes, e.g. addresses beginning with 011 will match prefixes c and a => prefix c is preferred because it is more specific (longest match rule)
A binary trie for a set of prefixes

Prefixes: a - 0*, b - 01000*, c - 011*, d - 1*, e - 100*, f - 1100*, g - 1101*, h - 1110*, i - 1111*
Information stored by a node: next-hop pointer (if prefix), left pointer, right pointer
[Figure: binary trie in which branches are labelled 0 (left) and 1 (right) and the prefix nodes a - i are marked]
Address space of 5-bit long addresses

Prefixes: a - 0*, b - 01000*, c - 011*, d - 1*, e - 100*, f - 1100*, g - 1101*, h - 1110*, i - 1111*
[Figure: the 5-bit address space divided into the address ranges covered by prefixes a - i]
Route lookup based on binary trie

Tries allow finding the longest prefix that matches a given destination address and the search is guided by the bits of the destination address
While traversing the trie and visiting a node marked as a prefix, this prefix is marked as the longest match found so far
The search ends when no more branches to take exist and the longest match is the prefix of the latest visited prefix node
An example address 10110
- from root move to the right (1st bit value = 1) to node d marked as a prefix, i.e. 1st found prefix is 1*
- then move to the left (2nd bit value = 0) to a node not marked as a prefix => prefix d still valid
- 3rd address bit = 1, but at this point there is no branch to the right => search stops => d is the last visited prefix node and prefix of d is the longest match
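The search described above can be sketched in Python with the example prefix set a - i. The class and method names (TrieNode, BinaryTrie, insert, lookup) are hypothetical, and addresses and prefixes are given as strings of 0 and 1 for readability.

```python
class TrieNode:
    def __init__(self):
        self.children = [None, None]  # left (bit 0) and right (bit 1) pointers
        self.next_hop = None          # set only if the node is a prefix node


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix_bits, next_hop):
        node = self.root
        for bit in prefix_bits:
            index = int(bit)
            if node.children[index] is None:
                node.children[index] = TrieNode()
            node = node.children[index]
        node.next_hop = next_hop      # mark the node as a prefix node

    def lookup(self, address_bits):
        node = self.root
        best = node.next_hop          # longest match found so far
        for bit in address_bits:
            node = node.children[int(bit)]
            if node is None:          # no more branches to take
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best


prefixes = {"0": "a", "01000": "b", "011": "c", "1": "d", "100": "e",
            "1100": "f", "1101": "g", "1110": "h", "1111": "i"}
trie = BinaryTrie()
for bits, name in prefixes.items():
    trie.insert(bits, name)

result = trie.lookup("10110")  # the example address from the slide
```

For the example address the walk visits d, then an unmarked node, and then stops because there is no right branch, so the result is d.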
Route lookup based on binary trie (cont.)

Going through a trie is a sequential prefix search by length when trying to find a better match
- begin looking in the set of length-1 prefixes, located at level 1
- then proceed in the set of length-2 prefixes at level 2, then proceed to level 3 and so on
While stepping through a trie, the search space reduces hierarchically
- at each step, the set of potential prefixes reduces and the search ends when this set is reduced to one
Update operations are straightforward
- inserting a new prefix proceeds as a normal search and when arriving at a node with no branch to take, insert the necessary node
- deleting a prefix proceeds also as a search and when finding the required node, unmark it as a prefix node and delete it if necessary
Path-compressed tries
In binary tries, long sequences of one-child nodes may exist and these bits need to be inspected even though the actual branching decision has been made
=> search time can be longer than necessary
=> one-child nodes consume additional memory
Lookup time of a binary trie is O(W) and memory requirement O(NW)
- W is the address length in bits and N the number of entries in a table
Path-compression technique can be used to remove unnecessary one-way branch nodes and reduce search time and memory consumption
Path-compressed tries (cont.)

Path-compression was first introduced in a scheme called Patricia, which is an improvement of the binary trie structure
- it is based on the observation that an internal node, which does not contain a prefix and has only one child, can be removed
- removal of internal nodes requires information of missing nodes to be added in remaining nodes so that search operations can be performed correctly, e.g. a simple mechanism is to store a number, which indicates how many nodes have been skipped (skip value) or the number of the next address bit to be inspected
There are many ways to exploit path-compression technique, an example is shown on the next slide
Lookup time is O(W) and worst case storage requirement O(NW)
A path-compressed trie example

Prefixes: a - 0*, b - 01000*, c - 011*, d - 1*, e - 100*, f - 1100*, g - 1101*, h - 1110*, i - 1111*
Information stored by a node: bit string, next-hop pointer (if prefix), bit position, left pointer, right pointer
[Figure: the uncompressed binary trie and the corresponding compressed binary trie, in which each node is labelled with the bit position to be inspected]
A path-compressed trie example (cont.)
Two nodes preceding b have been removed
Since prefix a was located at a one-child node, it was moved to the nearest descendant, which is not a one-child node
If several one-child nodes, in a path to be compressed, contain prefixes, a list of prefixes must be maintained in some of the nodes
Due to removal of one-child nodes, search jumps directly to an address bit where a significant decision is to be made
=> bit position of the next address bit to be inspected must be stored
=> bit strings of prefixes must be explicitly stored
Search in a path-compressed trie

Search goes as follows:
Start from the root and descend in the trie under the guidance of the address bits, but this time only inspect bit positions indicated by the bit position number in the nodes traversed
When a node marked as a prefix is encountered, comparison with the actual prefix is performed
- this is needed, because during the descent in the trie, we may skip some bits
If a match is found, we proceed traversing the trie and keep the prefix as the best match prefix (BMP) so far
Search ends when a leaf is encountered or a mismatch found
BMP is the last matching prefix encountered
A path-compressed trie example (cont.)

In the previous example case, take an address beginning with 010110
- start from root and since its bit position number is 1, inspect the first bit of the address
=> 1st bit is 0 => go to the left
=> since this node is marked as a prefix, compare prefix a (0) with the corresponding part of the address
=> they match => keep a as the BMP so far
=> bit position number of the new node is 3, so skip the 2nd address bit and inspect the 3rd one, which is 0 => proceed left
=> next node includes a prefix, so compare prefix b with the corresponding part of address
=> no match => stop search => the last recorded BMP is a
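The walk above can be reproduced with a small Python sketch. The trie is built by hand from the example prefix set, under the assumption that the compressed trie has the shape described on the slides (prefix a moved to the node that inspects bit 3, d inspecting bit 2). The names Node, leaf and search are hypothetical, and the sketch compares bit strings only at prefix nodes, as in the search procedure described earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    bit_pos: Optional[int] = None  # 1-based address bit to inspect, None for a leaf
    prefix: Optional[str] = None   # stored bit string if the node holds a prefix
    name: Optional[str] = None     # stands in for the next-hop pointer
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf(name, prefix):
    return Node(prefix=prefix, name=name)


def search(root, address):
    """Return the best match prefix (BMP) for an address given as a bit string."""
    best = None
    node = root
    while node is not None:
        if node.prefix is not None:
            if not address.startswith(node.prefix):
                break                      # mismatch: keep the BMP found so far
            best = node.name
        if node.bit_pos is None or node.bit_pos > len(address):
            break                          # leaf reached
        bit = address[node.bit_pos - 1]
        node = node.left if bit == "0" else node.right
    return best


root = Node(
    bit_pos=1,
    left=Node(bit_pos=3, prefix="0", name="a",
              left=leaf("b", "01000"), right=leaf("c", "011")),
    right=Node(bit_pos=2, prefix="1", name="d",
               left=leaf("e", "100"),
               right=Node(bit_pos=3,
                          left=Node(bit_pos=4, left=leaf("f", "1100"),
                                    right=leaf("g", "1101")),
                          right=Node(bit_pos=4, left=leaf("h", "1110"),
                                     right=leaf("i", "1111")))),
)

bmp = search(root, "010110")  # the example address from the slide
```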
Multibit trie
Drawback of binary (1-bit) trie is that one bit at a time is inspected and the number of memory accesses (in the worst case) can be 32 for IPv4
Number of lookups can be substantially decreased by using the multibit trie structure, i.e. several bits are inspected at a time
- for example, inspecting four bits at a time would lead to only 8 memory accesses in the worst case for an IPv4 address
Number of bits (K) to be inspected is called a stride and the stride can be constant or variable
In a K-bit trie, each node has 2^K pointers (children)
If the length of a route prefix is not a multiple of K, the prefix needs to be expanded to length K or a multiple of K
Lookup time is O(W/K) and storage requirement O(2^(K-1) NW/K)
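Prefix expansion can be sketched in Python as follows. The function names expand_prefix and expand_table are hypothetical, and the allowed lengths 3 and 5 are an assumption corresponding to strides of 3 and 2 bits on 5-bit addresses. When an expanded prefix collides with an entry that came from a longer original prefix, the longer one is kept to respect the longest match rule.

```python
from itertools import product


def expand_prefix(prefix, target_length):
    """Expand a bit-string prefix into all prefixes of the target length."""
    extra = target_length - len(prefix)
    if extra < 0:
        raise ValueError("target length is shorter than the prefix")
    return [prefix + "".join(bits) for bits in product("01", repeat=extra)]


def expand_table(table, allowed_lengths):
    """Expand every prefix to the nearest allowed length, longer prefixes win."""
    expanded = {}  # expanded bit string -> (original length, next hop)
    for prefix, next_hop in table.items():
        target = min(n for n in allowed_lengths if n >= len(prefix))
        for bits in expand_prefix(prefix, target):
            current = expanded.get(bits)
            if current is None or current[0] < len(prefix):
                expanded[bits] = (len(prefix), next_hop)
    return {bits: hop for bits, (_, hop) in expanded.items()}


prefixes = {"0": "a", "01000": "b", "011": "c", "1": "d", "100": "e",
            "1100": "f", "1101": "g", "1110": "h", "1111": "i"}
expanded = expand_table(prefixes, allowed_lengths=(3, 5))
```

With these lengths, a covers 000, 001 and 010, while 011 stays with c because c is the more specific prefix.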
Multibit trie example 1

Prefixes a and d are expanded to length 2 and prefix c has been expanded to length 4 (rest of the prefixes remain unchanged)
Height of the trie has been decreased and so has the number of memory accesses when doing a search
Information stored by a node: next-hop pointer (if prefix), Ptr00, Ptr01, Ptr10, Ptr11
[Figure: the uncompressed binary trie and the corresponding variable stride multibit trie]
Multibit trie example 2

An alternative multibit trie of the previous example case
- prefixes a and d have been expanded to length 3
- rest of the prefixes remain the same as before expansion
When an expanded prefix collides with an existing one, forwarding information of the existing one must be maintained (to respect the longest match)
[Figure: fixed stride multibit trie with a 3-bit stride (entries 000 - 111) at the first level and 2-bit strides (entries 00 - 11) at the second level]
Search in a multibit trie

Search in a multibit trie is essentially the same as search in a binary (1-bit) trie
- successively look for longer prefixes that match and the last one found is the longest match prefix for a given address
Multibit tries do linear search on length as do the binary tries, but the search is faster because the trie is traversed using larger strides
A multibit trie is a fixed stride system, if all nodes at the same level have the same stride size, otherwise it is a variable stride system
Fixed strides are simpler to implement than variable strides, but usually consume more memory
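A fixed stride multibit trie and its search can be sketched in Python as shown below. The names MultibitNode, insert and lookup are hypothetical, strides of 3 and 2 bits on 5-bit addresses are assumed to match the earlier example, and prefixes are expanded inside the node in which they end.

```python
class MultibitNode:
    def __init__(self, stride):
        self.stride = stride
        size = 1 << stride
        self.next_hops = [None] * size  # forwarding information per entry
        self.lengths = [-1] * size      # original prefix length per entry
        self.children = [None] * size   # pointers to next-level nodes


def insert(root, strides, prefix, next_hop):
    node, consumed, level = root, 0, 0
    # Walk down while the prefix extends beyond the current level.
    while len(prefix) - consumed > strides[level]:
        index = int(prefix[consumed:consumed + strides[level]], 2)
        consumed += strides[level]
        level += 1
        if node.children[index] is None:
            node.children[index] = MultibitNode(strides[level])
        node = node.children[index]
    # Expand the remaining bits to the stride of this level.
    rest = prefix[consumed:]
    free = strides[level] - len(rest)
    base = (int(rest, 2) << free) if rest else 0
    for index in range(base, base + (1 << free)):
        if node.lengths[index] < len(prefix):   # longer original prefix wins
            node.next_hops[index] = next_hop
            node.lengths[index] = len(prefix)


def lookup(root, address):
    node, consumed, best = root, 0, None
    while node is not None and consumed + node.stride <= len(address):
        index = int(address[consumed:consumed + node.stride], 2)
        if node.next_hops[index] is not None:
            best = node.next_hops[index]        # longest match so far
        consumed += node.stride
        node = node.children[index]
    return best


strides = [3, 2]
root = MultibitNode(strides[0])
prefixes = {"0": "a", "01000": "b", "011": "c", "1": "d", "100": "e",
            "1100": "f", "1101": "g", "1110": "h", "1111": "i"}
for bits, name in prefixes.items():
    insert(root, strides, bits, name)

result = lookup(root, "10110")
```

A 5-bit address is resolved in at most two node visits here, compared with up to five in the 1-bit trie.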
Choice of stride size and update of tries

Choice of stride size is a trade-off between search speed and memory consumption
- in the extreme case, a trie with a single level could be made (stride size = 32) and search would take only one memory access, but a huge amount of memory would be required (2^32 entries for IPv4)
- a natural way to choose stride size and memory consumption is to let the binary trie structure determine this
Update bounds determined by stride size
- a multibit trie with several levels allows, by varying stride K, an interesting trade-off between search time, memory consumption and update time
- larger strides make faster search => memory consumption increases and updates will require more entries to be modified (due to expansion)
Level compression (LC) trie

Path-compressed trie is an effective way to compress a trie when nodes are sparsely populated
LC-tries were developed to compress densely populated tries
LC-tries combine the path-compression and multibit trie compression to optimize binary trie structures
- first, a binary trie is developed to a compact path-compressed trie
- second, the largest full binary sub-trie with multilevels is transformed into a corresponding one-level multibit sub-trie
- this process starts from the root node and repeats recursively on each child node of the obtained multibit sub-trie
- all bit strings that are proper prefixes of other ones are removed from the LC-trie, meaning that only leaf nodes contain prefixes
LC-trie example
Prefixes: a - 0*, b - 01000*, c - 011*, d - 1*, e - 100*, f - 1100*, g - 1101*, h - 1110*, i - 1111*
[Figure: the uncompressed binary trie, the compressed binary trie and the resulting LC-trie with leaves b, c, e, f, g, h and i]
Level compression trie (cont.)

To save memory, all nodes in LC-trie are stored in a single node array
- first root, then all nodes at the second level, then all nodes at third level, and so on
- all the descendants of an internal node are stored in consecutive memory locations => an internal node only needs to point to its first descendant
Information stored in each node

branch - number of descendants of a node (always power of 2)
skip number - number of bits to be skipped at this node during search operation
pointer - an internal node points to its first descendant (given as the index value of the first child node) and a leaf node points to one entry of another base vector table, where the real prefix and next-hop information are stored

Index     Branch  Skip  Pointer
0 /root   2       0     1
1 /b      0       0     b
2 /c      0       0     c
3 /e      0       0     e
4         2       3     5
5 /f      0       0     f
6 /g      0       0     g
7 /h      0       0     h
8 /i      0       0     i
Level compression trie (cont.)

Each entry of the base vector table includes
- complete string of the prefix
- next-hop information
- special prefix vector, which contains information of strings that are proper prefixes of other strings
  - is needed because internal nodes of an LC-trie do not contain pointers to the base vector table
  - this information implies whether there exists a longer prefix matching the IP address and gives the next-hop information in case of a match

Index  Prefix bit string  Next hop  Special prefix vector
b      01000              ptr_b     a=0/ptr_a
c      011                ptr_c     a=0/ptr_a
e      100                ptr_e     d=1/ptr_d
f      1100               ptr_f     d=1/ptr_d
g      1101               ptr_g     d=1/ptr_d
h      1110               ptr_h     d=1/ptr_d
i      1111               ptr_i     d=1/ptr_d

Lookup time is O(W/K) and memory requirement is O(2^K NW/K)

Multibit tries in hardware

In core network routers, lookup times are very short and lookup algorithms are implemented in hardware to obtain required speed
Basic scheme uses two level multibit trie with fixed strides
- 24 bits at the first level and 8 at the second level
In backbone routers, most of the entries have a prefix length of 24 bits or less => longest match prefix found in one memory access in the majority of cases
Only a small number of sub-entries at the 2nd level
To save memory, internal nodes not allowed to store prefixes
=> prefixes corresponding to internal nodes expanded to 2nd level
=> result is a multibit trie with disjoint expanded prefixes
- 1st level has 2^24 nodes and is implemented as a table with the same number of entries
Multibit tries in hardware (cont.)

An entry at the 1st level contains either the forwarding information or a pointer to the corresponding sub-trie at the 2nd level
- two bytes needed to store a pointer/forwarding information => a memory bank of 32 Mbytes is needed to store 2^24 entries
[Figure: bits 0 - 23 of the destination address index the 1st memory bank (2^24 entries) and bits 24 - 31 index the 2nd memory bank (2^20 entries); the output is the forwarding information]
Multibit tries in hardware (cont.)

Number of sub-tries at the 2nd level depends on the number of prefixes longer than 24 bits
2nd level stride is 8 bits => a sub-trie at the 2nd level has 2^8 = 256 leaves
Size of 2nd memory bank depends on the expected worst case prefix length distribution, e.g. 2^20 one byte entries (a memory bank of 1 Mbyte) supports a maximum of 2^12 = 4096 sub-tries at the 2nd level
Lookup requires a maximum of two memory accesses
- memory accesses can be pipelined or parallelized to speed up performance
Since the first stride is 24 bits and leaf pushing is used, updates may take a long time in some cases
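The two level scheme can be sketched in Python as shown below. Several details are assumptions made only for illustration: the demo uses strides of 4 and 2 bits instead of 24 and 8 so that the tables stay small, the top bit of an entry (POINTER_FLAG) marks a pointer to a 2nd level sub-trie, next hops are small integers with 0 meaning no route, and prefixes are inserted in order of increasing length so that longer prefixes overwrite shorter ones. The last lines repeat the memory arithmetic of the slides.

```python
POINTER_FLAG = 0x8000  # assumed encoding: top bit of a 2-byte entry = pointer
POINTER_MASK = 0x7FFF


def build(prefixes, first_stride, second_stride):
    """Build the two memory banks from {bit string: next hop} (hops 1..0x7FFF)."""
    level1 = [0] * (1 << first_stride)  # 0 = no route
    level2 = []                         # grows by one sub-trie at a time
    total = first_stride + second_stride
    for bits, hop in sorted(prefixes.items(), key=lambda item: len(item[0])):
        if len(bits) <= first_stride:
            free = first_stride - len(bits)
            base = (int(bits, 2) << free) if bits else 0
            for index in range(base, base + (1 << free)):
                level1[index] = hop
        else:
            slot = int(bits[:first_stride], 2)
            if not level1[slot] & POINTER_FLAG:
                inherited = level1[slot]  # leaf pushing of the shorter prefix
                level1[slot] = POINTER_FLAG | (len(level2) >> second_stride)
                level2.extend([inherited] * (1 << second_stride))
            start = (level1[slot] & POINTER_MASK) << second_stride
            free = total - len(bits)
            base = int(bits[first_stride:], 2) << free
            for index in range(base, base + (1 << free)):
                level2[start + index] = hop
    return level1, level2


def lookup(level1, level2, address_bits, first_stride, second_stride):
    """At most two memory accesses: 1st bank, then possibly the 2nd bank."""
    entry = level1[int(address_bits[:first_stride], 2)]
    if entry & POINTER_FLAG:
        start = (entry & POINTER_MASK) << second_stride
        offset = int(address_bits[first_stride:first_stride + second_stride], 2)
        entry = level2[start + offset]
    return entry


# Scaled-down demo: 6-bit addresses, strides 4 and 2 (hypothetical routes).
routes = {"10": 1, "1011": 2, "10110": 4, "101101": 3}
bank1, bank2 = build(routes, first_stride=4, second_stride=2)
hop = lookup(bank1, bank2, "101101", 4, 2)

# Memory arithmetic for the 24/8 scheme of the slides.
first_bank_bytes = (1 << 24) * 2          # 2-byte entries
max_sub_tries = (1 << 20) // (1 << 8)     # 1 Mbyte bank of 1-byte entries
```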
New directions in IP lookup

More efficient lookup schemes have been developed to improve the average lookup performance and storage complexity
Examples of new methods are
- Binary search on trie levels, which decomposes the longest prefix operation into W exact matching operations, each performed on prefixes of equal length
- Multiway or K-way range search, which applies a binary search to best matching prefixes by using two routing entries per prefix and some precomputation
- Ternary CAM (TCAM), which uses a special CAM (Content Addressable Memory) that performs parallel comparisons internally. A TCAM stores each W-bit field as a [val, mask] pair; when a bit string is presented to the input, the TCAM outputs the location (or address) where a match is found.
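The [val, mask] matching rule of a TCAM can be mimicked in a few lines. This is a software sketch, not a description of any real device: the function name `tcam_lookup` is hypothetical, the loop replaces the parallel comparison done in hardware, and it is assumed that entries are stored in priority order (longest prefix first) so that the first match is the best one.

```python
def tcam_lookup(entries, key):
    """Return the location of the first entry whose masked value matches key.

    entries: list of (value, mask) pairs; a 1 bit in mask means "compare",
    a 0 bit means "don't care". Assumed to be sorted by priority.
    """
    for location, (value, mask) in enumerate(entries):
        if (key & mask) == (value & mask):
            return location
    return None


# 8-bit toy example: two prefixes and a default entry.
ENTRIES = [
    (0b10110000, 0b11110000),  # prefix 1011*
    (0b10000000, 0b11000000),  # prefix 10*
    (0b00000000, 0b00000000),  # default, matches everything
]
print(tcam_lookup(ENTRIES, 0b10111010))
```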
New directions in IP lookup (cont.)

Conventional routers offer best-effort service by processing each incoming packet in the same way.
New applications require different QoS levels; to meet these requirements, new mechanisms such as admission control, resource reservation and per-flow queuing need to be implemented in routers.
Routers are required to distinguish and classify incoming traffic into different flows
Flows are specified by rules, and each rule consists of operations for comparing packet fields with certain values
The packet fields to be inspected are collected from different protocols => packet classification
Comparison of some lookup schemes

Scheme | Worst case lookup | Memory
Binary trie | O(W) | O(NW)
Path compressed trie | O(W) | O(N)
K-stride multibit trie | O(W/K) | O(2^K NW/K)
LC-trie | O(W/K) | O(2^K NW/K)
Binary search on trie levels | O(log2 W) | O(N log2 W)
K-way range search | O(log2 W) | O(N)
TCAM | O(1) | O(N)
Update: O(W), O(W), O(W/K + 2^K), O(log2 W), O(N), -
W - length of address in bits, N - number of prefixes in a prefix set, K - stride size
Source: Proceedings of the IEEE, vol. 90, No. 9, 2002
Lookup scalability and IPv6

From a scalability point of view, the important aspects are the number of entries in a lookup table and the prefix length
Multibit tries improve lookup speed with respect to binary tries, but only by a constant factor in the length dimension
=> multibit tries scale badly to longer addresses (128 bits in IPv6)
Binary search on trie levels has logarithmic complexity with respect to the prefix length
=> scalability is very good for IPv6
Range search has lookup complexity that is logarithmic in the number of entries but independent of the prefix length
=> if the number of entries does not grow excessively, range search is scalable for IPv6
Introduction to Multiwavelength Optical Networks

Source: Stern-Bala (1999), Multiwavelength Optical Networks
Contents
The Big Picture Network Resources Network Connections
Optical network
Why optical networks?
The information superhighway is still a dirt road; more accurately, it is a set of isolated multilane highways with cow paths for entrance.
Definition: An optical network is a telecommunications network
- with transmission links that are optical fibers, and
- with an architecture designed to exploit the unique features of fibers
Thus, the term optical network (as used here)
- does not necessarily imply a purely optical network,
- but it does imply something more than a set of fibers terminated by electronic switches
The glue that holds the purely optical network together consists of
- optical network nodes (ONN) connecting the fibers within the network
- network access stations (NAS) interfacing user terminals and other nonoptical end-systems to the network
Network categories
Multiwavelength optical network = WDM network = optical network utilizing wavelength division multiplexing (WDM)
Transparent optical network = purely optical network
- Static network = broadcast-and-select network
- Wavelength Routed Network (WRN)
- Linear Lightwave Network (LLN) = waveband routed network
Hybrid optical network = layered optical network
- Logically Routed Network (LRN)
Physical picture of the network

[Figure: physical picture of the network. Workstations, LANs, ATM switches, a supercomputer and multimedia terminals attach through NASs to a network of ONNs.]
ONN - Optical Network Node, NAS - Network Access Station, LAN - Local Area Network
Wide area optical networks - a wish list

Connectivity
- support of a very large number of stations and end systems
- support of a very large number of concurrent connections, including multiple connections per station
- efficient support of multi-cast connections
Performance
- high aggregate network throughput (hundreds of Tbps)
- high user bit rates (few Gbps)
- small end-to-end delay
- low error rate / high SNR
- low processing load in nodes and stations
- adaptability to changing and unbalanced loads
- efficient and rapid means of fault identification and recovery
Structural features
- scalability
- modularity
- survivability (fault tolerance)
Technology/cost issues
- access stations: small number of optical transceivers per station and limited complexity of optical transceivers
- network: limited complexity of the optical network nodes, limited number and length of cables and fibers, and efficient use (and reuse) of optical spectrum
Optics vs. electronics

Optical domain
- photonic technology is well suited to certain simple (linear) signal-routing and switching functions
  - optical power combining, dividing and filtering
  - wavelength multiplexing, demultiplexing and routing
- channelizing is needed to make efficient use of the enormous bandwidth of the fiber
  - by wavelength division multiplexing (WDM)
  - many signals operating on different wavelengths share each fiber
=> optics is fast but dumb - connectivity bottleneck
Electrical domain
- electronics is needed to perform more complex (nonlinear) functions
  - signal detection, regeneration and buffering
  - logic functions (e.g. reading and writing packet headers)
- however, these complex functions limit the throughput
- electronics also makes it possible to include in-band control information (e.g. in packet headers)
  - enabling a high degree of virtual connectivity
  - easier to control
=> electronics is slow but smart - electronic bottleneck
Optics and electronics

Hybrid approach:
- a multiwavelength purely optical network as a physical foundation
- one or more logical networks (LN) superimposed on the physical layer, each designed to serve some subset of user requirements and implemented as an electronic overlay
- electronic switching equipment in the logical layer acts as a middleman, taking the high-bandwidth transparent channels provided by the physical layer and organizing them into an acceptable and cost-effective form
Why this hybrid approach?
- purely optical wavelength selective switches: huge aggregate throughput of few connections
- electronic packet switches: large number of relatively low bit rate virtual connections
- the hybrid approach exploits the unique capabilities of optical and electronic switching while circumventing their limitations
Example LAN interconnection

Consider a future WAN serving as a backbone that interconnects a large number of high-speed LANs (say 10,000), accessing the WAN through LAN gateways (with aggregate traffic of tens of Tbps)
Purely optical approach
- each NAS connects its LAN to the other LANs through individual optical connections
- 9,999 connections per NAS
- this is far too much for current optical technology
Purely electronic approach
- electronics easily supports the required connectivity via virtual connections
- however, the electronic processing bottleneck in the core network does not allow such traffic
Hybrid approach: both objectives are achieved, since
- an LN composed of ATM switches provides the necessary connectivity
- the optical backbone at the physical layer supports the required throughput
Contents
The Big Picture Network Resources
Network Links: Spectrum Partitioning Layers and Sublayers Optical Network Nodes Network Access Stations Electrical domain resources
Network Connections
Network links
A large number of concurrent connections can be supported on each network link through successive levels of multiplexing
Space division multiplexing in the fiber layer:
- a cable consists of several (sometimes more than 100) fibers, which are used as bi-directional pairs
Wavelength division multiplexing (WDM) in the optical layer:
- a fiber carries connections on many distinct wavelengths (λ-channels)
- assigned wavelengths must be spaced sufficiently apart to keep neighboring signal spectra from overlapping (to avoid interference)
Time division multiplexing (TDM) in the transmission channel sublayer:
- a λ-channel is divided (in time) into frames and time-slots
- each time-slot in a frame corresponds to a transmission channel, which is capable of carrying a logical connection
- the location of a time-slot in a frame identifies a transmission channel
Fiber resources
[Figure: fiber resources. A cable is divided into fibers (space), each fiber into wavebands and λ-channels (wavelength), and each λ-channel into transmission channels (time), shown for both the incoming and the outgoing direction.]
Optical spectrum
Since wavelength λ and frequency f are related by fλ = c, where c is the velocity of light in the medium, we have the relation
$$\Delta f \approx -\frac{c\,\Delta\lambda}{\lambda^{2}}$$
Thus, 10 GHz ≈ 0.08 nm and 100 GHz ≈ 0.8 nm in the range of 1,550 nm, where most modern lightwave networks operate
The 10-GHz channel spacing is sufficient to accommodate λ-channels carrying aggregate digital bit rates on the order of 1 Gbps
- modulation efficiency of 0.1 bps/Hz is typical for optical systems
The 10-GHz channel spacing is suitable for optical receivers, but much too dense to permit independent wavelength routing at the network nodes
- for this, 100-GHz channel spacing is needed
In a waveband routing network, several λ-channels (with 10-GHz channel spacing) comprise an independently routed waveband (with 100-GHz spacing between wavebands).
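The conversion between frequency spacing and wavelength spacing can be checked numerically. The sketch below is a minimal illustration: the function name `spacing_nm` is hypothetical, only the magnitude of the spacing is computed (the sign is dropped), and the vacuum speed of light with a vacuum wavelength of 1,550 nm is assumed.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s (assumed medium)


def spacing_nm(delta_f_hz, wavelength_nm):
    """Magnitude of the wavelength spacing for a given frequency spacing.

    Uses |delta_lambda| = lambda^2 * |delta_f| / c.
    """
    wavelength_m = wavelength_nm * 1e-9
    return wavelength_m ** 2 * delta_f_hz / C_VACUUM * 1e9


print(round(spacing_nm(10e9, 1550.0), 3))    # spacing for 10 GHz, in nm
print(round(spacing_nm(100e9, 1550.0), 3))   # spacing for 100 GHz, in nm
```

The two values come out close to the 0.08 nm and 0.8 nm quoted above.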
Wavelength partitioning of the optical spectrum

[Figure: λ-channels λ1, λ2, ..., λm on a frequency/wavelength axis, separated by unusable spectrum. Upper panel: 10 GHz (0.08 nm) λ-channel spacing for separability at receivers. Lower panel: 100 GHz (0.8 nm) λ-channel spacing for separability at network nodes.]
Wavelength and waveband partitioning of the optical spectrum

[Figure: wavebands w1, w2, ..., wm spaced 100 GHz (0.8 nm) apart on a frequency/wavelength axis; each waveband contains λ-channels (λ1,1, λ2,1, ..., λ10,1 in w1) spaced 10 GHz (0.08 nm) apart.]
Network based on spectrum partitioning

[Figure: three panels. Single waveband: λ1, λ2, ..., λm. Wavelength-routed: λ1, λ2, ... routed individually. Waveband-routed: wavebands w1, w2, ..., wm carrying λ1,1 - λ10,1 through λ1,m - λ10,m.]
Layered view of optical network (1)
Logical layer
- VIRTUAL CONNECTION
- LOGICAL PATH
Physical layer
- OPTICAL LAYER, with sublayers
  - LOGICAL CONNECTION
  - TRANSMISSION CHANNEL
  - OPTICAL CONNECTION
  - λ-CHANNEL
  - OPTICAL/WAVEBAND PATH
- FIBER LAYER, with sublayers
  - FIBER LINK
  - FIBER SECTION
Layered view of optical network (2)

[Figure: end-to-end chain NAS (TP, OT) - access link - ONN - OA - network link - OA - ONN - access link - NAS (OR, RP), with the electronic/optical boundary inside each NAS. Nested spans from innermost to outermost: fiber section, fiber link, optical/waveband path, λ-channel, optical connection, transmission channel, logical connection.]
E - Electronic, O - Optical, OA - Optical Amplifier, ONN - Optical Network Node, OR - Optical Receiver, OT - Optical Transmitter, RP - Reception Processor, TP - Transmission Processor
Layers and sublayers

The main consideration in breaking down the optical layer into sublayers is to account for
- multiplexing
- multiple access (at several layers)
- switching
Using multiplexing
- several logical connections may be combined on a λ-channel originating from a station
Using multiple access
- λ-channels originating from several stations may carry multiple logical connections to the same station
Through switching
- many distinct optical paths may be created on different fibers in the network, using (and reusing) λ-channels on the same wavelength
Typical connection
[Figure: two end systems (ES) joined by a virtual connection along a logical path through LSNs; logical connections run between NASs, and each is carried by an optical connection through ONNs and optical amplifiers (OA).]
ES = End System, LSN = Logical Switching Node, NAS = Network Access Station, ONN = Optical Network Node, OA = Optical Amplifier
Optical network nodes (1)

Optical Network Node (ONN) operates in the optical path sublayer, connecting N input fibers to N output fibers
ONNs are in the optical domain
Basic building blocks:
- wavelength multiplexer (WMUX)
- wavelength demultiplexer (WDMUX)
- directional coupler (2x2 switch), static or dynamic
- wavelength converter (WC)
[Figure: ONN with input fibers 1, 2, ..., N and output fibers 1', 2', ..., N'.]

Static nodes
without wavelength selectivity
- NxN broadcast star (= star coupler)
- Nx1 combiner
- 1xN divider
with wavelength selectivity
- NxN wavelength router (= Latin router)
- Nx1 wavelength multiplexer (WMUX)
- 1xN wavelength demultiplexer (WDMUX)

Dynamic nodes
without wavelength selectivity (optical crossconnect (OXC))
- NxN permutation switch
- RxN generalized switch
- RxN linear divider-combiner (LDC)
with wavelength selectivity
- NxN wavelength selective crossconnect (WSXC) with M wavelengths
- NxN wavelength interchanging crossconnect (WIXC) with M wavelengths
- RxN waveband selective LDC with M wavebands
Wavelength multiplexer and demultiplexer
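[Figure: a WMUX combines separate inputs λ1, λ2, λ3, λ4 onto one fiber carrying λ1, ..., λ4; a WDMUX splits a fiber carrying λ1, ..., λ4 into separate outputs λ1, λ2, λ3, λ4.]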
Directional Coupler (1)

Directional coupler (= 2x2 switch) is an optical four-port
- ports 1 and 2 designated as input ports
- ports 1' and 2' designated as output ports
Optical power
- enters a coupler through fibers attached to input ports,
- is divided and combined linearly, and
- leaves via fibers attached to output ports
Power relations for input signal powers P1 and P2 and output powers P1' and P2' are given by
$$P_1' = a_{11} P_1 + a_{12} P_2, \qquad P_2' = a_{21} P_1 + a_{22} P_2$$
Denote the power transfer matrix by A = [a_ij]
Directional Coupler (2)

Ideally, the power transfer matrix A is of the form
$$A = \begin{bmatrix} 1-\alpha & \alpha \\ \alpha & 1-\alpha \end{bmatrix}, \qquad 0 \le \alpha \le 1$$
If parameter α is fixed, the device is static, e.g. with α = 1/2 and signals present at both inputs, the device acts as a 2x2 star coupler
If α can be varied through some external control, the device is dynamic or controllable, e.g. an add-drop switch
If only input port 1 is used (i.e., P2 = 0), the device acts as a 1x2 divider
If only output port 1' is used (and port 2' is terminated), the device acts as a 2x1 combiner
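The ideal power transfer relation of the directional coupler is easy to evaluate directly. The sketch below is illustrative only; the function name `coupler` is hypothetical and a lossless, ideal device is assumed.

```python
def coupler(p1, p2, alpha):
    """Output powers (P1', P2') of an ideal directional coupler.

    Implements P' = A P with A = [[1 - alpha, alpha], [alpha, 1 - alpha]].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    out1 = (1.0 - alpha) * p1 + alpha * p2
    out2 = alpha * p1 + (1.0 - alpha) * p2
    return out1, out2


print(coupler(1.0, 0.0, 0.5))  # 1x2 divider: power split evenly
print(coupler(1.0, 0.0, 0.0))  # alpha = 0: input 1 goes straight to output 1'
print(coupler(1.0, 0.0, 1.0))  # alpha = 1: input 1 crosses to output 2'
```

Because the rows and columns of A each sum to one, the total output power equals the total input power for any α in the allowed range.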
Add-drop switch
[Figure: add-drop switch with an attached OT and OR, shown in two states: add-drop state and bar state.]
Broadcast star
Static NxN broadcast star with N wavelengths can carry
- N simultaneous multi-cast optical connections (= full multipoint optical connectivity)
Power is divided uniformly
To avoid collisions, each input signal must use a different wavelength
Directional coupler realization
- (N/2) log2 N couplers needed
[Figure: 4x4 broadcast star. Inputs λ1, λ2, λ3, λ4 each contribute 1/N of their power to every output, so each output carries λ1, ..., λ4. Second panel: broadcast star realized by directional couplers with 1/2 - 1/2 splitting.]
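The coupler count and the uniform power division can be checked with a small simulation. The sketch below assumes N is a power of two and wires the 3-dB couplers (α = 1/2) in a butterfly pattern; that wiring, and the function name `broadcast_star`, are assumptions made for illustration rather than the layout in the original figure.

```python
def broadcast_star(input_powers):
    """Propagate powers through log2(N) stages of N/2 ideal 3-dB couplers.

    Returns (output_powers, number_of_couplers). N must be a power of two.
    """
    n = len(input_powers)
    if n < 2 or n & (n - 1):
        raise ValueError("N must be a power of two and at least 2")
    stages = n.bit_length() - 1
    powers = list(input_powers)
    couplers = 0
    for stage in range(stages):
        nxt = powers[:]
        for i in range(n):
            j = i ^ (1 << stage)         # partner port in this stage
            if i < j:                    # visit each coupler once
                mixed = 0.5 * powers[i] + 0.5 * powers[j]
                nxt[i] = nxt[j] = mixed
                couplers += 1
        powers = nxt
    return powers, couplers


outputs, count = broadcast_star([1.0, 0.0, 0.0, 0.0])
print(outputs, count)
```

With one unit of power on a single input of a 4x4 star, every output receives 1/4, and the count agrees with (N/2) log2 N = 4.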
Wavelength router
Static NxN wavelength router with N wavelengths
- can carry N^2 simultaneous unicast optical connections (= full point-to-point optical connectivity)
Requires
- N 1xN WDMUXs
- N Nx1 WMUXs
[Figure: 4x4 wavelength router. Each input fiber carrying λ1, ..., λ4 enters a WDMUX; each wavelength is wired to a different WMUX, so each output fiber again carries λ1, ..., λ4.]
Crossbar switch
Dynamic RxN crossbar switch consists of
- R input lines
- N output lines
- RN crosspoints
Crosspoints are implemented by controllable optical couplers
- RN couplers needed
A crossbar can be used as an NxN permutation switch (then R = N) or as an RxN generalized switch
[Figure: 4x4 crossbar used as a permutation switch.]
Permutation switch
Dynamic NxN permutation switch (e.g. crossbar switch)
- unicast optical connections between input and output ports
- N! connection states (if nonblocking)
- each connection state can carry N simultaneous unicast optical connections
- representation of a connection state by an NxN connection matrix
[Figure: 4x4 permutation switch and its connection matrix (input ports versus output ports) with exactly one 1 in each row and column.]
Generalized switch
Dynamic RxN generalized switch (e.g. crossbar switch)
- any input/output pattern possible
- 2^(NR) connection states
- each connection state can carry (at most) R simultaneous multicast optical connections
- a connection state is represented by an RxN connection matrix
Input/output power relation P' = AP with NxR power transfer matrix A = [a_ij], where
$$a_{ij} = \begin{cases} \dfrac{1}{NR}, & \text{if switch } (i,j) \text{ is on} \\ 0, & \text{otherwise} \end{cases}$$
[Figure: 4x4 generalized switch with 1/N power division at each input and 1/R power combining at each output, and a connection matrix (input ports versus output ports) that may contain several 1s per row and column.]
Linear Divider-Combiner (LDC)

Linear Divider-Combiner (LDC) is a generalized switch that
- controls power-dividing and power-combining ratios
- has less inherent loss than a crossbar
Power-dividing and power-combining ratios
- δ_ij = fraction of power from input port j directed to output port i
- σ_ij = fraction of power from input port j combined onto output port i
In the ideal case of lossless couplers, we have the constraints
$$\sum_i \delta_{ij} = 1 \quad \text{and} \quad \sum_j \sigma_{ij} = 1$$
The resulting power transfer matrix A = [a_ij] is such that
$$a_{ij} = \delta_{ij}\,\sigma_{ij}$$
[Figure: 4x4 LDC. Input 1 is divided with ratios δ11, δ21, δ31, δ41; output 4 combines with ratios σ41, σ42, σ43, σ44.]
LDC and generalized switch realizations

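[Figure: generalized optical switch realized as a linear divider-combiner. Each of the r inputs feeds a 1xn splitter and each of the n outputs is fed by an rx1 combiner; splitters and combiners are built from directional couplers, and the paths are labeled 11, 12, ..., 1r through n1, n2, ..., nr.]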
Wavelength selective cross-connect (WSXC)

Dynamic NxN wavelength selective crossconnect (WSXC) with M wavelengths
- includes N 1xM WDMUXs, M NxN permutation switches, and N Mx1 WMUXs
- (N!)^M connection states if the permutation switches are nonblocking
- each connection state can carry NM simultaneous unicast optical connections
- representation of a connection state by M NxN connection matrices
[Figure: 4x4 WSXC with four wavelengths. WDMUXs split each input fiber (λ1, ..., λ4), one optical switch per wavelength permutes the signals, and WMUXs recombine them onto the output fibers.]
Wavelength interchanging cross-connect (WIXC)

Dynamic NxN wavelength interchanging crossconnect (WIXC) with M wavelengths
- includes N 1xM WDMUXs, 1 NM x NM permutation switch, NM WCs, and N Mx1 WMUXs
- (NM)! connection states if the permutation switch is nonblocking
- each connection state can carry NM simultaneous unicast connections
- representation of a connection state by an NM x NM connection matrix
[Figure: 4x4 WIXC with four wavelengths. WDMUXs feed a single optical switch, followed by wavelength converters (WCs) and WMUXs.]
Waveband selective LDC

Dynamic RxN waveband selective LDC with M wavebands
- includes R 1xM WDMUXs, M RxN LDCs, and N Mx1 WMUXs
- 2^(RNM) connection states (if used as a generalized switch)
- each connection state can carry (at most) RM simultaneous multi-cast connections
- representation of a connection state by M RxN connection matrices
[Figure: 4x4 waveband selective LDC with four wavebands. WDMUXs split each input fiber (w1, ..., w4), one LDC per waveband switches the signals, and WMUXs recombine them.]
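The connection-state counts quoted for the dynamic nodes grow very quickly, which a few one-line functions make concrete. The function names below are hypothetical; the formulas are the ones stated on the slides (N!, 2^(NR), (N!)^M, (NM)!, 2^(RNM)).

```python
from math import factorial


def permutation_states(n):
    """NxN nonblocking permutation switch: N! states."""
    return factorial(n)


def generalized_states(r, n):
    """RxN generalized switch: each of the RN crosspoints is on or off."""
    return 2 ** (n * r)


def wsxc_states(n, m):
    """NxN WSXC with M wavelengths: one permutation switch per wavelength."""
    return factorial(n) ** m


def wixc_states(n, m):
    """NxN WIXC with M wavelengths: one NM x NM permutation switch."""
    return factorial(n * m)


def waveband_ldc_states(r, n, m):
    """RxN waveband selective LDC with M wavebands, used as a generalized switch."""
    return 2 ** (r * n * m)


print(permutation_states(4), generalized_states(4, 4), wsxc_states(4, 4))
```

For a 4x4 node with four wavelengths the WIXC already has 16! states, far more than the (4!)^4 states of the WSXC, reflecting the extra freedom that wavelength conversion provides.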
Network access stations (1)

Network Access Station (NAS) operates in the logical connection, transmission channel and λ-channel sublayers
NASs are the gateways between the electrical and optical domains
Functions:
- interfaces the external LC ports to the optical transceivers
- implements the functions necessary to move signals between the electrical and optical domains
[Figure: NAS as an e/o box with electronic wires 1, 2, ..., L on one side and optical fibers a, a' on the other.]
Network access stations (2)

Transmitting side components:
- Transmission Processor (TP) with a number of LC input ports and transmission channel output ports connected to optical transmitters (converts each logical signal to a transmission signal)
- Optical Transmitters (OT) with a laser modulated by transmission signals and connected to a WMUX (generate optical signals)
- WMUX multiplexes the optical signals onto an outbound access fiber
Receiving side components:

- WDMUX demultiplexes optical signals from an inbound access fiber and passes them to optical receivers
- Optical Receivers (OR) convert optical power to electrical transmission signals, which are corrupted versions of the original transmitted signals
- Reception Processor (RP) converts the corrupted transmission signals to logical signals (e.g. regenerating digital signals)
Elementary network access station

[Figure: elementary NAS. Logical connection ports feed a TP driving OTs whose outputs a WMUX combines; a WDMUX feeds ORs and an RP. The NAS connects to its ONN by a single access fiber pair; the ONN has internodal fiber pairs.]
Nonblocking network access station

[Figure: nonblocking NAS. Logical connection ports feed a TP driving OTs connected to several WMUXs; several WDMUXs feed ORs and an RP. The NAS connects to its ONN by several access fiber pairs; the ONN has internodal fiber pairs.]
Wavelength add-drop multiplexer (WADM)

[Figure: WADM. A WDMUX separates wavelengths λ1, λ2, ..., λm; each wavelength can be dropped to an OR and added from an OT of the attached NAS (TP/RP); a WMUX recombines λ1, λ2, ..., λm.]
End System
End systems are in the electrical domain
In transparent optical networks, they are directly connected to NASs
- the purpose is to create full logical connectivity between end stations
In hybrid networks, they are connected to LSNs
- the purpose is to create full virtual connectivity between end stations
[Figure: end system with access wires a, a'.]
Logical Switching Node (LSN)

Logical switching nodes (LSN) are needed in hybrid networks, i.e. logically routed networks (LRN)
LSNs are in the electrical domain
They may be e.g.
- SONET digital cross-connect systems (DCS), or
- ATM switches, or
- IP routers
[Figure: LSN with input wires 1, 2, ..., N and output wires 1', 2', ..., N'.]
Logically routed network
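[Figure: logically routed network. In the logical layer, logically switching nodes (LSN), each containing a logical switch (LS), are interconnected; in the physical layer, each LSN attaches through a NAS to the ONNs.]
LSN - Logically Switching Node, LS - Logical Switch, NAS - Network Access Station, ONN - Optical Network Node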
Contents

Connectivity
Connections in various layers
Example: realizing full connectivity between five end systems
Connectivity
Transmitting side:
one-to-one
- (single) unicast
one-to-many
- multiple unicasts
- (single) multicast
- multiple multicasts
Receiving side:
one-to-one
- (single) unicast
- (single) multicast
many-to-one
- multiple unicasts
- multiple multicasts
Network wide:
- point-to-point
- multipoint
Connection Graph (CG)

Representing point-to-point connectivity between end systems
[Figure: connection graph over nodes 1-4 and its bipartite representation, with the transmitting side on the left and the receiving side on the right.]
Connection Hypergraph (CH)

Representing multipoint connectivity between end systems
[Figure: connection hypergraph with hyperedges E1 and E2 and its tripartite representation: transmitting side, hyperedges (E1, E2), receiving side.]
Connections in various layers

Logical connection sublayer
- Logical connection (LC) is a unidirectional connection between external ports on a pair of source and destination network access stations (NAS)
Optical connection sublayer
- Optical connection (OC) defines a relation between one transmitter and one or more receivers, all operating on the same wavelength
Optical path sublayer
- Optical path (OP) routes the aggregate power on one waveband on a fiber, which could originate from several transmitters within the waveband
Notation for connections in various layers

Logical connection sublayer
- [a, b] = point-to-point logical connection from an external port on station a to one on station b
- [a, {b, c, ...}] = multi-cast logical connection from a to set {b, c, ...}
  - station a sends the same information to all receiving stations
Optical connection sublayer
- (a, b) = point-to-point optical connection from station a to station b
- (a, b)_k = point-to-point optical connection from a to b using wavelength λ_k
- (a, {b, c, ...}) = multi-cast optical connection from a to set {b, c, ...}
Optical path sublayer
- <a, b> = point-to-point optical path from station a to station b
- <a, b>_k = point-to-point optical path from a to b using waveband w_k
- <a, {b, c, ...}> = multi-cast optical path from a to set {b, c, ...}
Example of a logical connection between two NASs

[Figure: logical connection [A,B] between two NASs. In the electrical domain, the TP of NAS A and the RP of NAS B terminate the transmission channel; the OT and OR terminate optical connection (A,B)_1 on λ-channel λ1 (of λ1, ..., λm) through the WMUX and WDMUX; optical path <A,B>_w1 runs through three ONNs on waveband w1.]
Example: realization of full connectivity between 5 end systems
[Figure: five end systems, numbered 1-5, to be fully connected.]
Solutions
Static network based on star physical topology
- full connectivity in the logical layer (20 logical connections)
- 4 optical transceivers per NAS, 5 NASs, 1 ONN (broadcast star)
- 20 wavelengths for max throughput by WDM/WDMA
Wavelength routed network (WRN) based on bi-directional ring physical topology
- full connectivity in the logical layer (20 logical connections)
- 4 optical transceivers per NAS, 5 NASs, 5 ONNs (WSXCs)
- 4 wavelengths (assuming elementary NASs)
Logically routed network (LRN) based on star physical topology and unidirectional ring logical topology
- full connectivity in the virtual layer but only partial connectivity in the logical layer (5 logical connections)
- 1 optical transceiver per NAS, 5 NASs, 1 ONN (WSXC), 5 LSNs
- 1 wavelength
Static network realization
[Figure: static network realization. Stations 1-5 attached to a 5x5 broadcast star, shown with the logical connection graph (LCG).]
Wavelength routed network realization
[Figure: wavelength routed network realization. Stations 1-5 on a ring of 3x3 WSXCs, shown with the logical connection graph (LCG).]
Logically routed network realization
[Figure: logically routed network realization. Stations 1-5, each with an LSN, attached to a 5x5 WSXC, shown with the logical connection graph (LCG).]
Multiwavelength Optical Network Architectures

Source: Stern-Bala (1999), Multiwavelength Optical Networks
Static networks Wavelength Routed Networks (WRN) Linear Lightwave Networks (LLN) Logically Routed Networks (LRN)
Static networks
Static network, also called broadcast-and-select network, is a purely optical shared medium network
- passive splitting and combining nodes are interconnected by fibers to provide static connectivity among some or all OTs and ORs
- OTs broadcast and ORs select
Broadcast star network is an example of such a static network
- a star coupler combines all signals and broadcasts them to all ORs
  - static optical multi-cast paths from any station to the set of all stations
  - no wavelength selectivity at the network node
- an optical connection is created by tuning the source OT and/or destination OR to the same wavelength
- two OTs must operate at different wavelengths (to avoid interference)
  - this is called the distinct channel assignment (DCA) constraint
- however, two ORs can be tuned to the same wavelength
  - in this way, optical multi-cast connections are created
Realization of logical connectivity

Methods to realize full point-to-point logical connectivity in a broadcast star with N nodes:
WDM/WDMA
- a whole λ-channel allocated for each LC
- N(N-1) wavelengths needed (one for each LC)
- N-1 transceivers needed in each NAS
TDM/TDMA
- 1/[N(N-1)] of a λ-channel allocated for each LC
- 1 wavelength needed
- 1 transceiver needed in each NAS
TDM/T-WDMA
- 1/(N-1) of a λ-channel allocated for each LC
- N wavelengths needed (one for each OT)
- 1 transceiver needed in each NAS, e.g. fixed OT and tunable OR (FT-TR), or tunable OT and fixed OR (TT-FR)
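The resource requirements of the three access methods can be tabulated for any N. The sketch below simply encodes the figures listed above; the function name `broadcast_star_requirements` and the dictionary keys are hypothetical.

```python
from fractions import Fraction


def broadcast_star_requirements(n):
    """Resources for full point-to-point logical connectivity among n stations."""
    lcs = n * (n - 1)  # number of logical connections
    return {
        "WDM/WDMA": {
            "wavelengths": lcs,
            "transceivers_per_nas": n - 1,
            "channel_share_per_lc": Fraction(1),
        },
        "TDM/TDMA": {
            "wavelengths": 1,
            "transceivers_per_nas": 1,
            "channel_share_per_lc": Fraction(1, lcs),
        },
        "TDM/T-WDMA": {
            "wavelengths": n,
            "transceivers_per_nas": 1,
            "channel_share_per_lc": Fraction(1, n - 1),
        },
    }


for method, needs in broadcast_star_requirements(5).items():
    print(method, needs)
```

For N = 5 the WDM/WDMA row gives 20 wavelengths and 4 transceivers per NAS, the same numbers as in the static network solution of the five-station example.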
Broadcast star using WDM/WDMA

[Figure: three NASs around a star coupler. Transmit side: LCs [1,2], [1,3], [2,1], [2,3], [3,1], [3,2] use OTs on λ1, ..., λ6 respectively (NAS 1 sends λ1, λ2; NAS 2 sends λ3, λ4; NAS 3 sends λ5, λ6). Each NAS receives λ1-λ6; its ORs select λ3, λ5 (NAS 1), λ1, λ6 (NAS 2) and λ2, λ4 (NAS 3).]
Broadcast star using TDM/TDMA

[Figure: three NASs around a star coupler, each with one TP, OT, OR and RP. All six LCs share the single wavelength λ1 in different time-slots.]
Effect of propagation delay on TDM/TDMA

[Figure: a TDM/TDMA schedule. Timelines for OT1 (slots [1,2]_1, [1,2]_2, [1,3]_1), OT2 (slots [2,3]_1, [2,3]_2), the coupler, OR2 and OR3; the transmissions from stations 1 and 2 are timed so that they interleave within frames F1 and F2 at the coupler and at the receivers.]
Broadcast star using TDM/T-WDMA in FT-TR mode

[Figure: three NASs around a star coupler. Fixed OTs transmit on λ1 (NAS 1), λ2 (NAS 2) and λ3 (NAS 3); each NAS receives λ1-λ3, and its tunable OR selects between λ2, λ3 (NAS 1), λ1, λ3 (NAS 2) and λ1, λ2 (NAS 3).]
Broadcast star using TDM/T-WDMA in TT-FR mode

[Figure: three NASs around a star coupler. Tunable OTs transmit on λ2, λ3 (NAS 1), λ1, λ3 (NAS 2) and λ1, λ2 (NAS 3); fixed ORs receive on λ1 (NAS 1), λ2 (NAS 2) and λ3 (NAS 3).]
Channel allocation schedules for circuit switching

WDM/WDMA (successive time-slots from left to right)
λ1 | [1,2] [1,2] [1,2] [1,2]
λ2 | [1,3] [1,3] [1,3] [1,3]
λ3 | [2,1] [2,1] [2,1] [2,1]
λ4 | [2,3] [2,3] [2,3] [2,3]
λ5 | [3,1] [3,1] [3,1] [3,1]
λ6 | [3,2] [3,2] [3,2] [3,2]
TDM/T-WDMA with FT-TR
λ1 | [1,2] [1,3] [1,2] [1,3]
λ2 | [2,3] [2,1] [2,3] [2,1]
λ3 | [3,1] [3,2] [3,1] [3,2]
TDM/T-WDMA with TT-FR
λ1 | [2,1] [3,1] [2,1] [3,1]
λ2 | [3,2] [1,2] [3,2] [1,2]
λ3 | [1,3] [2,3] [1,3] [2,3]
TDM/TDMA
λ1 | [1,2] [1,3] [2,1] [2,3] [3,1] [3,2] [1,2] [1,3] [2,1] [2,3] [3,1] [3,2]
Channel allocation schedule (CAS) should be
- realizable = only one LC per each OT and time-slot
- collision-free = only one LC per each λ and time-slot
- conflict-free = only one LC per each OR and time-slot
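The three CAS conditions can be checked mechanically. The sketch below assumes the TDM/T-WDMA setting with exactly one OT and one OR per station, so an OT is identified with its source station and an OR with its destination station; the function name `check_cas` and the frame layout are hypothetical.

```python
def _distinct(items):
    return len(items) == len(set(items))


def check_cas(frame):
    """Check a channel allocation schedule.

    frame: list of time-slots; each slot is a list of
    (wavelength, (source, destination)) tuples active in that slot.
    Assumes one OT and one OR per station.
    """
    return {
        # only one LC per OT (source station) and time-slot
        "realizable": all(_distinct([s for _, (s, _d) in slot]) for slot in frame),
        # only one LC per wavelength and time-slot
        "collision_free": all(_distinct([w for w, _ in slot]) for slot in frame),
        # only one LC per OR (destination station) and time-slot
        "conflict_free": all(_distinct([d for _, (_s, d) in slot]) for slot in frame),
    }


# FT-TR schedule from the slide: station i always transmits on wavelength i.
FT_TR = [
    [(1, (1, 2)), (2, (2, 3)), (3, (3, 1))],
    [(1, (1, 3)), (2, (2, 1)), (3, (3, 2))],
]
print(check_cas(FT_TR))
```

A schedule that sends [1,3] and [2,3] in the same slot would fail only the conflict-free test, since the single OR of station 3 cannot tune to two wavelengths at once.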
Packet switching in the optical layer

Fixed capacity allocation, produced by periodic frames, is well adapted to stream-type traffic. However, in the case of bursty packet traffic this approach may produce very poor performance
By implementing packet switching in the optical layer, it is possible to maintain a very large number of LCs simultaneously using dynamic capacity allocation
- packets are processed in the TPs/RPs of the NASs (but not in the ONNs)
- TPs can schedule packets based on instantaneous demand
- as before, the broadcast star is used as a shared medium
- control of this shared optical medium requires a Medium Access Control (MAC) protocol
[Figure: NAS equipped for packet switching, with a MAC block between the TP/RP and the OT/OR, attached to an ONN.]
Additional comments on static networks

The broadcast-and-select principle cannot be scaled to large networks for three reasons:
- Spectrum use: Since all transmissions share the same fibers, there is no possibility of optical spectrum reuse => the required spectrum typically grows at least proportionally to the number of transmitting stations
- Protocol complexity: Synchronization problems, signaling overhead, time delays, and processing complexity all increase rapidly with the number of stations and with the number of LCs.
- Survivability: There are no alternate routes in case of a failure. Furthermore, a failure at the star coupler can bring the whole network down.
For these reasons, a practical limit on the number of stations in a broadcast star is approximately 100
Wavelength Routed Networks (WRN)

Wavelength routed network (WRN) is a purely optical network
- each λ-channel can be recognized in the ONNs (= wavelength selectivity) and routed individually
- ONNs are typically wavelength selective crossconnects (WSXC)
  - the network is dynamic (allowing switched connections)
- a static WRN (allowing only dedicated connections) can be built up using static wavelength routers
All optical paths and connections are point-to-point
- each point-to-point LC corresponds to a point-to-point OC
- full point-to-point logical/optical connectivity among N stations requires N-1 transceivers in each NAS
- multipoint logical connectivity is only possible by several point-to-point optical connections using WDM/WDMA
Static wavelength routed star

Full point-to-point logical/optical connectivity in a static wavelength routed star with N nodes can be realized by WDM/WDMA
- a whole λ-channel allocated for each LC
- N-1 wavelengths needed
  - spectrum reuse factor is N (= N(N-1) optical connections / N-1 wavelengths)
- N-1 transceivers needed in each NAS
Static wavelength routed star using WDM/WDMA

[Figure: three NASs around a wavelength router. LCs [1,2], [1,3], [2,1], [2,3], [3,1], [3,2] use OTs on λ1, λ2, λ2, λ1, λ1, λ2 respectively; every access fiber carries only λ1 and λ2.]
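One simple rule reproduces the wavelength assignment of the three-station example and generalizes it to N stations: connection [s,d] uses wavelength number (d - s) mod N. This rule is an assumption chosen for illustration (the slides do not state how the router is wired); the function names are hypothetical.

```python
def router_wavelength(src, dst, n):
    """Wavelength index (1..n-1) for the connection src -> dst among n stations."""
    if src == dst:
        raise ValueError("no connection from a station to itself")
    return (dst - src) % n


def full_assignment(n):
    """Wavelength for every ordered pair of distinct stations 1..n."""
    return {
        (s, d): router_wavelength(s, d, n)
        for s in range(1, n + 1)
        for d in range(1, n + 1)
        if s != d
    }


def satisfies_dca(assignment, n):
    """Distinct wavelengths on every outbound and inbound access fiber."""
    for station in range(1, n + 1):
        sent = [w for (s, _d), w in assignment.items() if s == station]
        received = [w for (_s, d), w in assignment.items() if d == station]
        if len(set(sent)) != len(sent) or len(set(received)) != len(received):
            return False
    return True


plan = full_assignment(3)
print(plan, satisfies_dca(plan, 3))
```

With this rule N(N-1) connections share N-1 wavelengths, which gives the spectrum reuse factor N stated above.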
Routing and channel assignment

Consider a WRN equipped with WSXCs (or wavelength routers)
- no wavelength conversion possible
Establishing an optical connection requires
- channel assignment
- routing
Channel assignment (executed in the λ-channel sublayer) involves
- allocating an available wavelength to the connection and
- tuning the transmitting and receiving station to the assigned wavelength
Routing (executed in the optical path sublayer) involves
- determining a suitable optical path for the assigned λ-channel and
- setting the switches in the network nodes to establish that path
Channel assignment constraints

Following two channel assignment constraints apply to WRNs
wavelength continuity: wavelength of each optical connection remains the same on all links it traverses from source to destination this is unique to transparent optical networks, making routing and wavelength assignment a more challenging task than the related problem in conventional networks 1 distinct channel assignment (DCA): all optical 2 connections sharing a common fiber must be assigned distinct -channels (i.e. distinct wavelengths) 3 - this applies to access links as well as internodal links - although DCA is necessary to ensure distinguishability 1 of signals on the same fiber, it is possible (and generally advantageous) to reuse the same wavelength on 2 fiber-disjoint paths
Routing and Channel Assignment (RCA) problem

Routing and channel assignment (RCA) is the fundamental control problem in large optical networks
- generally, the RCA problem for dedicated connections can be treated off-line => computationally intensive optimization techniques are appropriate
- on the other hand, RCA decisions for switched connections must be made rapidly, and hence suboptimal heuristics must normally be used
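As an illustration of a fast, suboptimal heuristic for switched connections, the sketch below routes each request on a shortest path and then assigns the lowest-numbered wavelength that is free on every link of that path (often called first-fit). This particular heuristic, the six-node chain topology and all names are assumptions for illustration; the slides do not prescribe a specific algorithm.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """Breadth-first search; returns the node list from src to dst, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def first_fit(adj, in_use, src, dst, n_wavelengths):
    """Route on a shortest path, then pick the first wavelength free on all its links.

    in_use is a set of (link, wavelength) pairs and is updated on success.
    Returns (path, wavelength), or None if the request is blocked.
    """
    path = shortest_path(adj, src, dst)
    if path is None:
        return None
    links = list(zip(path, path[1:]))
    for wavelength in range(1, n_wavelengths + 1):
        # wavelength continuity: the same wavelength must be free on every link
        if all((link, wavelength) not in in_use for link in links):
            in_use.update((link, wavelength) for link in links)  # enforce DCA
            return path, wavelength
    return None


# Hypothetical bidirectional chain 1-2-3-4-5-6
adj = {i: [j for j in (i - 1, i + 1) if 1 <= j <= 6] for i in range(1, 7)}
in_use = set()
for request in [(1, 3), (3, 5), (4, 6)]:
    print(request, first_fit(adj, in_use, *request, n_wavelengths=2))
```

On this chain, requests (1,3) and (3,5) are fiber-disjoint and both receive wavelength 1, while (4,6) shares a link with (3,5) and is pushed to wavelength 2.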
[Figure: a six-node network shown in three panels (dedicated, switched 1, switched 2), with optical connections (1,3) and (3,5) on wavelength 1 and (4,6) on wavelength 2.]
Example bi-directional ring with elementary NASs

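Consider a bi-directional ring of 5 nodes and stations with single access fiber pairs
- full point-to-point logical/optical connectivity requires 4 wavelengths => spectrum reuse factor is 20/4 = 5
- 4 transceivers in each NAS

[Figure: physical topology of the five-node ring, with L and R marking the two directions and the fiber from ONN1 to ONN2 labeled.]

RCA (rows: source station, columns: destination station; each entry gives the wavelength number and the direction, L or R):

      1    2    3    4    5
 1   --   1R   3R   4L   2L
 2   1L   --   4R   2R   3L
 3   2L   3L   --   1R   4R
 4   3R   4L   2L   --   1R
 5   4R   2R   1L   3L   --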
Example bi-directional ring with nonblocking NASs

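Consider a bi-directional ring of 5 nodes and stations with two access fiber pairs
- full point-to-point logical/optical connectivity requires 3 wavelengths => spectrum reuse factor is 20/3 = 6.67
- 4 transceivers in each NAS

[Figure: physical topology of the five-node ring, with L and R marking the two directions and the fiber from ONN1 to ONN2 labeled.]

RCA (rows: source station, columns: destination station; each entry gives the wavelength number and the direction, L or R):

      1    2    3    4    5
 1   --   1R   2R   2L   1L
 2   1L   --   1R   3R   3L
 3   2L   1L   --   2R   1R
 4   2R   3L   2L   --   3R
 5   1R   3R   1L   3L   --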
Example mesh network with elementary NASs

Consider a mesh network of 5 nodes and stations with single access fiber pairs
- full point-to-point logical/optical connectivity requires 4 wavelengths => spectrum reuse factor is 20/4 = 5
- 4 transceivers in each NAS
- despite the richer physical topology, there is no difference from the corresponding bi-directional ring (thus, the access fibers are the bottleneck)

[Figure: physical topology of the five-node mesh.]

RCA?
Example mesh network with nonblocking NASs

Consider a mesh network of 5 nodes and stations with three/four access fiber pairs
- full point-to-point logical/optical connectivity requires only 2 wavelengths => spectrum reuse factor is 20/2 = 10
- 4 transceivers in each NAS

[Figure: physical topology of the five-node mesh.]

RCA?
Linear Lightwave Networks (LLN)

Linear lightwave network (LLN) is a purely optical network
- nodes perform (only) strictly linear operations on optical signals
This class includes
- both static and wavelength routed networks
- but also something more
The most general type of LLN has waveband selective LDC nodes
- an LDC performs controllable optical signal dividing, routing and combining
- these functions are required to support multipoint optical connectivity
Waveband selectivity in nodes means that
- the optical path layer routes signals as bundles that contain all λ-channels within one waveband
Thus, all layers of connectivity and their interrelations must be examined carefully
Routing and channel assignment constraints

Two constraints of WRNs also need to be satisfied by LLNs
- Wavelength continuity: the wavelength of each optical connection remains the same on all the links it traverses from source to destination
- Distinct channel assignment (DCA): all optical connections sharing a common fiber must be assigned distinct λ-channels
Additionally, the following two routing constraints apply to LLNs
- Inseparability: channels combined on a single fiber and situated within the same waveband cannot be separated within the network
  - this is a consequence of the fact that the LDCs operate on the aggregate power carried within each waveband
- Distinct source combining (DSC): only signals from distinct sources are allowed to be combined on the same fiber
  - the DSC condition forbids a signal from splitting, taking multiple paths, and then recombining with itself
  - otherwise, combined signals would interfere with each other
Inseparability
[Figure: inseparability example in two panels on a network with nodes A, B, C, D, E, G and H, signals S1 and S2, and receivers 1*, 2* and 3*.]
Two violations of DSC
[Figure: two network fragments with nodes A, B and C, each showing a violation of DSC.]
Inadvertent violation of DSC
[Figure: inadvertent DSC violation on a network with nodes A, B, C, D, F, G and H; signals S1, S2 and S3 appear combined as S1 + S2 and S1 + S2 + S3 on the fibers leading to receivers 1*, 2* and 3*.]
Avoidance of DSC violations

[Figure: two panels showing how DSC violations are avoided on a network with nodes A-H, signals S1, S2 and S3, and receivers 1*, 2* and 3*.]
Color clash
[Figure: color clash example in two panels on a network with nodes A-H; connections (1, 1*) and (2, 2*) use wavelength 1 and connection (3, 3*) uses wavelength 2.]
Power distribution
In an LDC it is possible to specify combining and dividing ratios
- the ratios determine how power from sources is distributed to destinations
- combining and dividing ratios can be set differently for each waveband
How should these ratios be chosen? The objective could be
- to split each source's power equally among all destinations it reaches
- to combine equally all sources arriving at the same destination
The resultant end-to-end power transfer coefficients are independent of
- routing paths through the network
- number of nodes they traverse
- order in which signals are combined and split
The coefficients depend only on
- number of destinations for each source
- number of sources reaching each destination
Illustration of power distribution
[Figure: an LDC node with fibers a, b, c and h and combining/dividing ratios 1/3 and 2/3; the signal labels are (1/2)(S1 + S2), S3, (1/6)(S1 + S2) and (2/9)(S1 + S2) + (1/3)S3.]
Multipoint subnets in LLNs

Attempts to set up several point-to-point optical connections within a common waveband lead to unintentional creation of multipoint paths => complications in routing, channel assignment and power distribution
On the other hand, waveband routing leads to more efficient use of the optical spectrum
In addition, the multipoint optical path capability is useful when creating intentional multipoint optical connections
- LLNs can deliver a high degree of logical connectivity with minimal optical hardware in the access stations
- this is one of the fundamental advantages of LLNs over WRNs
Multipoint optical connections can be utilized when creating full logical connectivity among specified clusters of stations within a larger network => such fully connected clusters are called multipoint subnets (MPS)
Example - seven stations on a mesh

Consider a network containing seven stations interconnected on an LLN with a mesh physical topology and bidirectional fiber links
- notation for fiber labeling: a and a' form a fiber pair with opposite directions
The set of stations {2,3,4} should be interconnected to create an MPS with full logical connectivity
This can be achieved, e.g., by creating an optical path on a single waveband in the form of a tree joining the three stations (embedded broadcast star)

[Figure: physical topology of the seven-station mesh with lettered nodes and fibers, together with the LCG and LCH of the MPS formed by stations 2, 3 and 4.]
Realization of MPS by a tree embedded in mesh

[Figure: the MPS realized as a tree embedded in the mesh; the optical path runs over fibers c, f and g and nodes B, C and D to join stations 2, 3 and 4.]
Contents
- Static networks
- Wavelength Routed Networks (WRN)
- Linear Lightwave Networks (LLN)
- Seven-station example
- Logically Routed Networks (LRN)
Seven-station example
Assume:
- nonblocking access stations
- each transmitter runs at a bit rate of R0
Physical topologies (PT):
- bi-directional ring
- mesh
- multistar of seven physical stars
Logical topologies (LT):
- fully connected (point-to-point logical topology with 42 edges), realized using WRN
- fully shared (hypernet logical topology with a single hyperedge), realized using a broadcast-and-select network (LLN of a single MPS)
- partially shared (hypernet of seven hyperedges), realized using LLN of seven MPSs
Physical topologies
[Figure: the three physical topologies for the seven stations: ring, mesh and multistar.]
Fully connected LT - WRN realizations

Ring PT: 6 λs with spectrum reuse factor of 42/6 = 7 => RCA?
- 6 transceivers in each NAS
- network capacity = 7*6 = 42 R0
Mesh PT: 4 λs with spectrum reuse factor of 42/4 = 10.5
Multistar PT: 2 λs with spectrum reuse factor of 42/2 = 21

[Figure: LCG of the fully connected logical topology among stations 1-7.]
Fully shared LT - Broadcast and select network realizations

Any PT
WDM/WDMA: 42 λs with spectrum reuse factor of 1
- 6 transceivers in each NAS
- network capacity = 7*6 = 42 R0
TDM/T-WDMA in FT-TR mode: 7 λs with spectrum reuse factor of 1
- 1 transceiver in each NAS
- network capacity = 7*1 = 7 R0
TDM/TDMA: 1 λ with spectrum reuse factor of 1
- 1 transceiver in each NAS
- network capacity = 7*1/7 = 1 R0
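The three broadcast-and-select options trade wavelengths and transceivers against capacity. The sketch below reproduces the slide's numbers for seven stations; generalizing the counts to N stations (N(N-1), N and 1 wavelengths respectively) is an assumption read off from the seven-station figures, and the function name is hypothetical.

```python
def broadcast_select(n_stations: int, scheme: str) -> dict:
    """Wavelengths, transceivers per NAS and capacity (in units of R0)."""
    n = n_stations
    if scheme == "WDM/WDMA":
        # one dedicated wavelength per ordered station pair
        wavelengths, transceivers, capacity = n * (n - 1), n - 1, n * (n - 1)
    elif scheme == "TDM/T-WDMA":
        # one fixed transmitter wavelength per station, tunable receivers (FT-TR)
        wavelengths, transceivers, capacity = n, 1, n
    elif scheme == "TDM/TDMA":
        # every station time-shares a single wavelength: n * (1/n) = 1
        wavelengths, transceivers, capacity = 1, 1, 1
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return {
        "wavelengths": wavelengths,
        "transceivers_per_nas": transceivers,
        "capacity_R0": capacity,
    }


for scheme in ("WDM/WDMA", "TDM/T-WDMA", "TDM/TDMA"):
    print(scheme, broadcast_select(7, scheme))
```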
[Figure: LCH of the fully shared logical topology, a single hyperedge E1 joining stations 1-7.]
Partially shared LT - LLN realizations

Note: full logical connectivity among all stations
Mesh PT using TDM/T-WDMA in FT-TR mode:
- 2 wavebands with spectrum reuse factor of 7/2 = 3.5 => RCA?
- 3 λs per waveband
- 3 transceivers in each NAS
- network capacity = 7*3 = 21 R0
Multistar PT using TDM/T-WDMA in FT-TR mode:
- 1 waveband with spectrum reuse factor of 7/1 = 7 => RCA?
- 3 λs per waveband
- 3 transceivers in each NAS
- network capacity = 7*3 = 21 R0

[Figure: LCH of the partially shared logical topology with hyperedges E1-E7 and stations 1-7.]
Logically Routed Networks (LRN)

For small networks, high logical connectivity is reasonably achieved by purely optical networks. However, when moving to larger networks, the transparent optical approach soon reaches its limits.
- for example, achieving full logical connectivity among 22 stations on a bi-directional ring using wavelength routed point-to-point optical connections requires 21 transceivers in each NAS and 61 wavelengths in total
- economically and technologically, this is well beyond current capabilities => we must turn to electronics (i.e. logically routed networks)
A logically routed network (LRN) is a hybrid optical network that performs logical switching (by logical switching nodes (LSN)) on top of a transparent optical network
- LSNs create an extra layer of connectivity between the end systems and NASs
Two approaches to create full connectivity
Multihop networks based on point-to-point logical topologies

realized by WRNs
Hypernets based on multipoint logical topologies

realized by LLNs
Point-to-point logical topologies

In a point-to-point logical topology
- a hop corresponds to a logical link between two LSNs
- maximum throughput is inversely proportional to the average hop count
One of the objectives of using logical switching on top of a transparent optical network is
- to reduce the cost of station equipment (by reducing the number of optical transceivers and the complexity of optics) while maintaining high network performance
Thus, we are interested in logical topologies that
- achieve a small average number of logical hops at a low cost (i.e., small node degree and simple optical components)
An example is a ShuffleNet
- for example, an eight-node ShuffleNet has 16 logical links and an average hop count of 2 (if uniform traffic is assumed)
- these networks are scalable to large sizes by adding stages and/or increasing the degree of the nodes
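The link count and average hop count quoted for the eight-node ShuffleNet can be checked by building the graph and running a breadth-first search from every node. The sketch assumes the usual perfect-shuffle construction with k columns of p^k nodes; the (column, row) node labels are an assumption and differ from the station numbering 1-8 used in the figures.

```python
from collections import deque
from itertools import product


def shufflenet(p: int, k: int) -> dict:
    """Directed (p, k) ShuffleNet: k columns of p**k nodes, each of out-degree p."""
    rows = p ** k
    return {
        (col, row): [((col + 1) % k, (row * p + j) % rows) for j in range(p)]
        for col, row in product(range(k), range(rows))
    }


def average_hops(adj: dict) -> float:
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


net = shufflenet(p=2, k=2)
print(len(net), sum(len(out) for out in net.values()), average_hops(net))
```

With p = 2 and k = 2 the construction yields 8 nodes and 16 directed logical links, and the averaged hop count under uniform traffic comes out as 2.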
Eight-node ShuffleNet
[Figure: logical topology (LCG) of the eight-node ShuffleNet with stations 1-8.]
ShuffleNet embedded in a bi-directional ring WRN

Bi-directional ring WRN with elementary NASs
- 2 λs with spectrum reuse factor of 16/2 = 8
- 2 transceivers in each NAS
- average hop count = 2
- network capacity = 8*2/2 = 8 R0
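The capacity figures in these examples all follow the same pattern: number of stations times transceivers per NAS, divided by the average hop count. The helper below is a sketch of that arithmetic as inferred from the slide's numbers; the function name is hypothetical and the result is in units of R0.

```python
def network_capacity(n_stations: int, transceivers_per_nas: int,
                     average_hops: float = 1.0) -> float:
    """Aggregate capacity in units of R0; each extra hop consumes transmitter time."""
    return n_stations * transceivers_per_nas / average_hops


print(network_capacity(8, 2, average_hops=2))   # ShuffleNet on a ring WRN
print(network_capacity(7, 6))                   # fully connected seven-station WRN
```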
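[Figure: eight-node bi-directional ring, with L and R marking the two directions.]

RCA (rows: source station, columns: destination station; each entry gives the wavelength number and the direction, L or R):

      1    2    3    4    5    6    7    8
 1   --   --   --   --   1L   2L   --   --
 2   --   --   --   --   --   --   1R   2R
 3   --   --   --   --   2R   1R   --   --
 4   --   --   --   --   --   --   2L   1L
 5   1R   2R   --   --   --   --   --   --
 6   --   --   1L   2L   --   --   --   --
 7   2L   1L   --   --   --   --   --   --
 8   --   --   2R   1R   --   --   --   --

Note: station labeling!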
Details of a ShuffleNet node

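[Figure: details of a ShuffleNet node: ONN1 and its NAS with two optical transmitters (OT1, OT2) and two optical receivers (OR1, OR2) behind transmit and receive ports (TP, RP), wavelengths 1 and 2 on the L and R fibers, and the fibers between ONN5 and ONN1.]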

Multipoint logical topologies

High connectivity may be maintained in transparent optical networks while economizing on optical resource utilization through the use of multipoint connections
These ideas are even more potent when combined with logical switching
For example, a ShuffleNet may be modified to a Shuffle Hypernet
- an 8-node Shuffle Hypernet has 4 hyperarcs
- each hyperarc represents a directed MPS that contains 2 transmitting and 2 receiving stations
- an embedded directed broadcast star is created to support each MPS
- for a directed star, a (physical) tree is found joining all stations in both the transmitting and receiving sets of the MPS
- any node on the tree can be chosen as a root
- LDCs on the tree are set to create optical paths from all stations in the transmitting set to the root node, and paths from the root to all receiving stations
Eight-node Shuffle Hypernet

[Figure: transformation of the eight-node ShuffleNet into a Shuffle Hypernet (LCH) with hyperarcs E1-E4.]
Shuffle Hypernet embedded in a bidirectional ring LLN

Bi-directional ring LLN with elementary NASs using TDM/T-WDMA in FT-TR mode
- 1 waveband with spectrum reuse factor of 4/1 = 4
- 2 λs per waveband
- 1 transceiver in each NAS
- network capacity = 8*1/2 = 4 R0
[Figure: eight-node bi-directional ring with fibers a-h.]

RCA:

 hyperarc   inbound fibers   root    outbound fibers   waveband
 E1         a, b, c          ONN5    b                 1
 E2         e, f, g          ONN8    f                 1
 E3         g, a, h          ONN2    h                 1
 E4         c, d, e          ONN3    d                 1

Note: station and fiber labeling!
Details of node in Shuffle Hypernet

[Figure: details of a node in the Shuffle Hypernet: ONN5 and its NAS (OT, OR, TP, RP) with waveband w1 on fibers a, b and c, hyperarcs E1 and E2, and the fibers between ONN3 and ONN1.]
Contents
Virtual connections: an ATM example
Virtual connections - an ATM example

Recall the problem of providing full connectivity among five locations
- suppose each location contains a number of end systems that access the network through an ATM switch
- the interconnected switches form a transport network of 5*4 = 20 VPs
The following five designs are now examined and compared:
- Stand-alone ATM star
- Stand-alone ATM bi-directional ring
- ATM over a network of SONET cross-connects
- ATM over a WRN
- ATM over an LLN
Traffic demand: each VP requires 600 Mbit/s (≈ STM-4/STS-12)
Optical resources: λ-channels and transceivers run at the rate of 2.4 Gbit/s (≈ STM-16/STS-48)
[Figure: stand-alone ATM networks (a star with center node 6 and a bi-directional ring of ATM switches/cross-connects with transceivers) and embedded ATM networks (ATM switches, marked A, attached to a DCS network of nodes marked S).]
Case 1 - Stand-alone ATM star

Fiber links are connected directly to ports on ATM switches, creating a point-to-point optical connection for each fiber
- each link carries 4 VPs in each direction
- each optical connection needs 2.4 Gbit/s, which can be accommodated using a single λ-channel
- one optical transceiver is needed to terminate each end of a link, for a total of 10 transceivers in the network
Processing load is unequal:
- end nodes process their own 8 VPs carrying 4.8 Gbit/s
- center node 6 processes all 20 VPs carrying 12.0 Gbit/s => bottleneck
Inefficient utilization of fibers, since even though only one λ-channel is used, the total bandwidth of each fiber is dedicated to this system
Poor survivability, since
- if any link is cut, the network is cut in two
- if node 6 fails, the network is completely destroyed
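The figures for the star follow directly from the VP count and the 600 Mbit/s demand per VP. A small sketch of that arithmetic, with hypothetical names and rates kept as integers in Mbit/s:

```python
VP_RATE_MBIT = 600  # traffic demand per VP, from the problem statement


def star_loads(n_end_nodes: int = 5) -> dict:
    """Link and node loads for full connectivity over a star with one center node."""
    vps_total = n_end_nodes * (n_end_nodes - 1)
    vps_per_link_direction = n_end_nodes - 1    # all VPs sourced by that end node
    own_vps = 2 * (n_end_nodes - 1)             # VPs sourced plus VPs terminated
    return {
        "vps_total": vps_total,
        "link_rate_mbit": vps_per_link_direction * VP_RATE_MBIT,
        "end_node_load_mbit": own_vps * VP_RATE_MBIT,
        "center_load_mbit": vps_total * VP_RATE_MBIT,
        "transceivers": 2 * n_end_nodes,        # one at each end of every link
    }


print(star_loads())
```

The center node carries 2.5 times the load of an end node, which is the bottleneck noted above.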
[Figure legend: optical network; S = SDH/SONET DCS; shared medium; ONN.]
Case 2 - Stand-alone ATM bi-directional ring

Fiber links are connected directly to ports on ATM switches, creating a point-to-point optical connection for each fiber
- assuming shortest path routing, each link carries 3 VPs in each direction
- each optical connection needs 1.8 Gbit/s, which can be accommodated using a single λ-channel (leaving 25% spare capacity)
- 1 optical transceiver is needed to terminate each end of a link, for a total of 10 transceivers in the network
Equal processing load:
- each ATM node processes its own 8 VPs and 2 additional transit VPs carrying an aggregate traffic of 6.0 Gbit/s
- thus, no processing bottleneck
The same problem with optical spectrum allocation as in case 1, but better survivability, since the network can recover from any single link cut or node failure by rerouting the traffic
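The per-link and per-node figures for the ring can be verified by routing all 20 VPs on their shortest paths and counting. The sketch below assumes nodes numbered 0-4 around the ring and an odd node count, so that every shortest path is unique; names are hypothetical.

```python
VP_RATE_MBIT = 600
CHANNEL_RATE_MBIT = 2400


def ring_loads(n_nodes: int = 5):
    """Count VPs per directed link and transit VPs per node on a bi-directional ring."""
    link_vps = {}                    # (node, direction) -> VPs leaving node that way
    transit_vps = [0] * n_nodes      # VPs passing through each node
    for src in range(n_nodes):
        for dst in range(n_nodes):
            if src == dst:
                continue
            clockwise = (dst - src) % n_nodes
            step = 1 if clockwise <= n_nodes - clockwise else -1  # shorter way round
            node = src
            while node != dst:
                link_vps[(node, step)] = link_vps.get((node, step), 0) + 1
                node = (node + step) % n_nodes
                if node != dst:
                    transit_vps[node] += 1
    return link_vps, transit_vps


link_vps, transit_vps = ring_loads(5)
link_rate = max(link_vps.values()) * VP_RATE_MBIT
own_vps = 2 * (5 - 1)
node_load = (own_vps + transit_vps[0]) * VP_RATE_MBIT
print(link_rate, 1 - link_rate / CHANNEL_RATE_MBIT, node_load)
```

Every directed link ends up with 3 VPs and every node with 2 transit VPs, matching the 1.8 Gbit/s per link and 6.0 Gbit/s per node stated above.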
Case 3 - ATM embedded in DCS network

ATM end nodes access DCSs through 4 electronic ports
Fiber links are now connected to ports on DCSs, creating a point-to-point optical connection for each fiber
- each link carries 4 VPs in each direction => each optical connection needs 2.4 Gbit/s, which can be accommodated using a single λ-channel
- again, 1 optical transceiver is needed to terminate each end of a link
Processing load is lighter
- ATM nodes process their own 8 VPs carrying 4.8 Gbit/s
- but it is much simpler to perform VP cross-connect functions at the STM-4/STS-12 level than at the ATM cell level (as was done in case 1)
A trade-off must be found between optical spectrum utilization and costs
- the more λ-channels on each fiber (to carry background traffic), the more (expensive) transceivers are needed
Survivability and reconfigurability are good, since alternate paths and additional bandwidth exist in the DCS network
Case 4 - ATM embedded in a WRN

DCSs are now replaced by optical nodes containing WSXCs
Each ATM end node is connected electronically to a NAS
Each VP in the virtual topology must be supported by a point-to-point optical connection occupying one λ-channel
- thus, 4 transceivers are needed in each NAS (20 transceivers in total)
- however, no transceivers are needed in the network nodes
With an optimal routing and wavelength assignment, the 20 VPs can be carried using 4 wavelengths (= 800 GHz)
Processing load is very light due to optical switching (without optoelectronic conversion at each node)
- note: ATM nodes still process their own 8 VPs carrying 4.8 Gbit/s
As in case 3, survivability and reconfigurability are good, since alternate paths and additional bandwidth exist in the underlying WRN
Case 5 - ATM embedded in an LLN

WSXCs are now replaced by LDCs
A single waveband is assigned to the ATM network, and the LDCs are set to create an embedded tree (MPS) on that waveband
- the 20 VPs are supported by a single hyperedge in the logical topology
- since each λ-channel can carry 4 VPs, 5 λ-channels are needed in total, all in the same waveband (= 200 GHz)
- only 1 transceiver is needed in each NAS (5 transceivers in total) using TDM/T-WDMA in FT-TR mode
Processing load is again very light due to optical switching (without optoelectronic conversion at each node)
- note: ATM nodes still process their own 8 VPs carrying 4.8 Gbit/s
As in cases 3 and 4, survivability and reconfigurability are good, since alternate paths and additional bandwidth exist in the underlying LLN
Comparison of ATM network realizations
 Case   Optical spectrum usage   Number of optical transceivers   Node processing load   Others
 1      Very high                10                               Very high              Poor survivability
 2      Very high                10                               High
 3      Lowest                   10                               Medium                 High DCS
 4      Medium                   20                               Very low
 5      Low                      5                                Very low               Rapid tunability required, optical multi-cast possible
Optical switches
- Components and enabling technologies
- Contention resolution
- Optical switching schemes
Components and enabling technologies

- Optical fiber
- Light sources, optical transmitters
- Photodetectors, optical receivers
- Optical amplifiers
- Wavelength converters
- Optical multiplexers and demultiplexers
- Optical add-drop multiplexers
- Optical cross connects
- WDM systems
Optical fiber
Optical fiber is the most important transport medium for high-speed communications in fixed networks
Pros
- immune to electromagnetic interference
- does not corrode
- huge bandwidth (25 Tbit/s)
Cons
- connecting fibers requires special techniques (connectors, specialized personnel to splice and connect fibers)
- does not allow tight bending
An optical fiber consists of ultrapure silica mixed with dopants to adjust the refractive index
Optical fiber (cont.)

An optical cable consists of several layers
- silica core
- cladding, a layer of silica with a different mix of dopants
- buffer coating, which absorbs mechanical stresses
- the coating is covered by a strong material such as Kevlar
- outermost is a protective layer of plastic material

[Figure: cross section (not to scale) showing the glassy core, cladding, buffer coating, Kevlar and plastic layers.]

A fiber cable consists of a bundle of optical fibers, up to 432 fibers
The refractive index profile of a fiber is carefully controlled during the manufacturing phase
Typical refractive index profiles
- step index profile
- graded index profile

[Figure: refractive index n(x) across the fiber for the step index and graded index profiles, with index n1 in the core and n2 in the cladding.]

Light beams are confined in the fiber
- by total reflection at the core-cladding interface in step-index fibers
- by more gradual refraction in graded index fibers

[Figure: ray paths in a step index fiber (indices n1 and n2) and in a graded index fiber.]

A fiber can be designed to support
- several propagation modes => multimode fiber
- just a single propagation mode => single-mode fiber

[Figure: multimode fiber (D = 125 ± 2 μm, d = 50 μm; many directional rays) and single-mode fiber (D = 125 ± 2 μm, d = 8.6 - 9.5 μm; one directional ray due to the small d/D ratio), each with core index ncore and cladding index nclad.]

Multimode graded index fiber
- small delay spread
- a 1% index difference between core and cladding amounts to 1-5 ns/km delay spread
- easy to splice and to couple light into
- bit rate is limited to 100 Mbit/s for lengths up to 40 km
- fiber span without amplification is limited
Single-mode fiber
- almost eliminates delay spread
- more difficult to splice and to align two fibers exactly
- suitable for transmitting modulated signals at 40 Gbit/s or higher and up to 200 km without amplification
Optical fiber characteristics

Dispersion is an undesirable phenomenon in optical fibers
- causes an initially narrow light pulse to spread out as it propagates along the fiber
There are different causes for dispersion
- modal dispersion
- chromatic dispersion
Modal dispersion
- occurs in multimode fibers
- caused by the different propagation paths (lengths) of different modes
Chromatic dispersion
- material properties of the fiber, such as dielectric constant and propagation constant, depend on the frequency of the light
- each individual wavelength of a pulse travels at a different speed and arrives at the end of the fiber at a different time
Optical fiber characteristics (cont.)

Chromatic dispersion (cont.)
- dispersion is measured in ps/(nm*km), i.e. delay per wavelength variation and fiber length
Dispersion depends on the wavelength
- at some wavelength dispersion may be zero
- in conventional single-mode fiber this typically occurs at 1.3 μm
- below that wavelength dispersion is negative, above it positive
For long-haul transmission, single-mode fibers with specialized index of refraction profiles have been manufactured
- dispersion-shifted fiber (DSF)
- zero-dispersion point is shifted to 1.55 μm
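The unit ps/(nm*km) suggests how pulse spreading is estimated: multiply the dispersion coefficient by the spectral width of the source and by the fiber length. The sketch below does exactly that; the coefficient of 17 ps/(nm*km), the 0.1 nm source width and the 100 km span are assumed illustrative values, not figures from the slides.

```python
def pulse_spread_ps(dispersion_ps_per_nm_km: float,
                    source_width_nm: float,
                    length_km: float) -> float:
    """Chromatic pulse spreading: |D| * (spectral width) * (fiber length)."""
    return abs(dispersion_ps_per_nm_km) * source_width_nm * length_km


# Assumed example: D = 17 ps/(nm*km), 0.1 nm source width, 100 km of fiber
print(pulse_spread_ps(17.0, 0.1, 100.0))
```

The absolute value is taken because negative and positive dispersion both broaden the pulse; at the zero-dispersion wavelength the estimate drops to zero.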

Fiber attenuation is the most important transmission characteristic
- limits the maximum span over which a light signal can be transmitted without amplification
Fiber attenuation is caused by
- light scattering on fluctuations of the refractive index
- imperfections of the fiber
- impurities (metal ions and OH radicals have a particular effect)
A conventional single-mode fiber has two low attenuation ranges
- one at about 1.3 μm
- another at about 1.55 μm
Between these ranges is a high attenuation range (1.35-1.45 μm), with a peak at 1.39 μm, due to the OH radical
- special fibers almost free of OH radicals have been manufactured
- such fibers increase the usable bandwidth by 50%
- the whole range from 1.335 μm to 1.625 μm is usable, allowing about 500 WDM channels at 100 GHz channel spacing

[Figure: transmitted optical loss or attenuation (dB) versus wavelength λ (1.2 - 1.6 μm), showing the absorption peak due to OH- at 1385 nm, the curve without OH-, and the zero-dispersion line.]

The attenuation is measured in dB/km; typical values are
- 0.4 dB/km at 1.31 μm
- 0.2 dB/km at 1.55 μm
- for comparison, attenuation in ordinary clear glass is about 1 dB/cm = 10^5 dB/km
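Because attenuation is expressed in dB/km, link calculations reduce to additions and divisions. The sketch below shows the received power after a span, the longest span for a given loss budget, and the dB/cm to dB/km conversion used in the glass comparison. The 20 dB loss budget and 0 dBm launch power are assumed example values; the function names are hypothetical.

```python
def received_power_dbm(launch_dbm: float, alpha_db_per_km: float, length_km: float) -> float:
    """Power at the far end of the fiber, ignoring connectors and splices."""
    return launch_dbm - alpha_db_per_km * length_km


def max_span_km(loss_budget_db: float, alpha_db_per_km: float) -> float:
    """Longest unamplified span that stays within the loss budget."""
    return loss_budget_db / alpha_db_per_km


def db_per_cm_to_db_per_km(alpha_db_per_cm: float) -> float:
    return alpha_db_per_cm * 100_000   # 1 km = 10^5 cm


# Assumed 20 dB loss budget
print(max_span_km(20, 0.4), max_span_km(20, 0.2))
print(received_power_dbm(0, 0.2, 50))
print(db_per_cm_to_db_per_km(1))
```

Halving the attenuation from 0.4 to 0.2 dB/km doubles the reachable span for the same budget.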
Light sources and optical transmitters

One of the key components in optical communications is the monochromatic (narrow band) light source
Desirable properties
- compact, monochromatic, stable and long lasting
A light source may be one of the following types:
- continuous wave (CW): emits at a constant power; needs an external modulator to carry information
- modulated light: no external modulator is necessary
The two most popular light sources are
- light emitting diode (LED)
- semiconductor laser
Light emitting diode (LED)

An LED is a monolithically integrated p-n semiconductor diode
- emits light when voltage is applied across its two terminals
- in the active junction area, electrons in the conduction band and holes in the valence band are injected
- recombination of the electrons with holes releases energy in the form of light
- can be used either as a CW light source or as a modulated light source (modulated by the injection current)

[Figure: LED structure with two terminals, P-type and N-type regions, the active junction and the emitted light.]
Characteristics of LED
- relatively slow: modulation rate < 1 Gbit/s
- bandwidth depends on the material; relatively wide spectrum
- amplitude and spectrum depend on temperature
- low cost
- transmits light in a wide cone; suitable for multimode fibers

[Figure: relative intensity (0.0 - 1.0) versus wavelength λ (about 690 - 700 nm) at 45 °C and 50 °C. As temperature rises, the spectrum shifts and the intensity decreases.]
Semiconductor laser
LASER (Light Amplification by Stimulated Emission of Radiation)
- a semiconductor laser is also known as a laser diode or an injection laser
- operation of a laser is the same as for any other oscillator: gain (amplification) and feedback
- as a device, a semiconductor laser is similar to an LED (i.e. a p-n semiconductor diode)
- a difference is that the ends of the active junction area are carefully cleaved and act as partially reflecting mirrors; this provides feedback
- the junction area acts as a resonating cavity for certain frequencies (those for which the round-trip distance is a multiple of the wavelength in the material, giving constructive interference)
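The resonance condition can be written as 2L = m * (λ/n), where L is the cavity length, n the refractive index of the material and m an integer, so the resonant free-space wavelengths are λ = 2nL/m. The sketch below lists the modes that fall inside a wavelength window; the cavity length of 300 μm and the index of 3.5 are assumed illustrative values, and the function name is hypothetical.

```python
import math


def cavity_modes(cavity_length_um: float, refractive_index: float,
                 lambda_min_um: float, lambda_max_um: float):
    """Return (m, free-space wavelength in um) for all resonances inside the window."""
    round_trip_um = 2 * cavity_length_um * refractive_index   # optical round-trip length
    m_low = math.ceil(round_trip_um / lambda_max_um)
    m_high = math.floor(round_trip_um / lambda_min_um)
    return [(m, round_trip_um / m) for m in range(m_low, m_high + 1)]


# Assumed cavity: 300 um long, refractive index 3.5, window 1.54 - 1.56 um
modes = cavity_modes(300.0, 3.5, 1.54, 1.56)
print(len(modes), modes[0], modes[-1])
```

Several closely spaced modes fit inside even a narrow window, which is why a plain cleaved cavity tends to oscillate on more than one wavelength.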
Semiconductor laser (cont.)

- light fed back by the mirrors is amplified by stimulated emission
- lasing is achieved above a threshold current, where the optical gain is sufficient to overcome losses (including the transmitted light) from the cavity

[Figure: forward-biased p-n junction with a cleaved surface at each end.]

The cavity of a Fabry-Perot laser can support many modes of oscillation => it is a multimode laser
In single frequency operation, all but a single longitudinal mode must be suppressed; this can be achieved by different approaches:
- cleaved-coupled cavity (C3) lasers
- external cavity lasers
- distributed Bragg reflector (DBR) lasers
- distributed feedback (DFB) lasers
The most common light sources for high-bit rate, long-distance transmission are the DBR and DFB lasers

[Figure: DBR and DFB laser structures showing the p and n regions, active layer, guiding layer and diffraction gratings.]

Laser tunability is important for multiwavelength network applications
Slow tunability (on a ms time scale) is required for setting up connections in wavelength or waveband routed networks
- achieved over a range of 1 nm via temperature control
Rapid tunability (on a ns-μs time scale) is required for TDM-WDM multiple access applications
- achieved in DBR and DFB lasers by changing the refractive index, e.g. by changing the injected current in the grating area
Another approach to rapid tunability is to use multiwavelength laser arrays
- one or more lasers in the array can be activated at a time

Lasers are modulated either directly or externally
- direct modulation by varying the injection current
- external modulation by an external device, e.g. a Mach-Zehnder interferometer

[Figure: Mach-Zehnder interferometer with light input Ii, control voltage V and modulated light output Io.]
Photodetectors and optical receivers

A photodetector converts the optical signal to a photocurrent that is then electronically amplified (front-end amplifier)
In a direct detection receiver, only the intensity of the incoming signal is detected
- in contrast to coherent detection, where the phase of the optical signal is also relevant
- coherent systems are still in the research phase
Photodetectors used in optical transmission systems are semiconductor photodiodes
Operation is essentially the reverse of a semiconductor optical amplifier
- the junction is reverse biased
- in the absence of an optical signal only a small minority carrier current flows (dark current)
Photodetectors and optical receivers (cont.)

Operation is essentially the reverse of a semiconductor optical amplifier (cont.)
- a photon impinging on the surface of the device can be absorbed by an electron in the valence band, transferring the electron to the conduction band
- each excited electron contributes to the photocurrent
PIN photodiodes (p-type, intrinsic, n-type)
- an extra layer of intrinsic semiconductor material is sandwiched between the p and n regions
- improves the responsivity of the device
- captures most of the light in the depletion region
Photodetectors and optical receivers (cont.)

Avalanche photodiodes (APD)
- in a photodiode, only one electron-hole pair is produced by an absorbed photon
- this may not be sufficient when the optical power is very low
- the APD resembles a PIN photodiode
- an extra gain layer is inserted between the i (intrinsic) and n layers
- a large voltage is applied across the gain layer
- photoelectrons are accelerated to speeds sufficient to produce additional electrons by collisions => avalanche effect
- largely improved responsivity
Optical amplifiers
An optical signal propagating in a fiber suffers attenuation
- the optical power level of a signal must be periodically conditioned
- optical amplifiers are key components in long haul optical systems
An optical amplifier is characterized by
- gain: ratio of output power to input power (in dB)
- gain efficiency: gain as a function of input power (dB/mW)
- gain bandwidth: range of frequencies over which the amplifier is effective
- gain saturation: maximum output power, beyond which no amplification is reached
- noise: undesired signal due to physical processes in the amplifier
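Gain in dB and power in dBm are both logarithmic quantities, so it helps to have the conversions at hand. The sketch below uses the standard definitions (gain = 10 log10 of the power ratio, dBm referenced to 1 mW); the function names and the sample powers are hypothetical.

```python
import math


def gain_db(p_in_mw: float, p_out_mw: float) -> float:
    """Amplifier gain as the ratio of output to input power, in dB."""
    return 10 * math.log10(p_out_mw / p_in_mw)


def mw_to_dbm(p_mw: float) -> float:
    return 10 * math.log10(p_mw)


def dbm_to_mw(p_dbm: float) -> float:
    return 10 ** (p_dbm / 10)


print(gain_db(0.01, 1.0))   # a 0.01 mW input amplified to 1 mW
print(dbm_to_mw(10))        # 10 dBm expressed in mW
```

Working in dB turns a chain of gains and losses into a simple sum, e.g. output power in dBm = input power in dBm + gain in dB.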
Optical amplifiers (cont.)
Types of amplifiers
- Electro-optic regenerators
- Semiconductor optical amplifiers (SOA)
- Erbium-doped fiber amplifiers (EDFA)
Electro-optic regenerators
Optical signal is
- received and transformed to an electronic signal
- amplified in the electronic domain
- converted back to an optical signal at the same wavelength

[Figure: fiber -> O/E (optical receiver) -> Amp -> E/O (optical transmitter) -> fiber; the fibers lie in the photonic domain and the O/E, Amp and E/O stages in the electronic domain.]

O/E = Optical to Electronic, E/O = Electronic to Optical, Amp = Amplifier
Semiconductor optical amplifiers (SOA)

The structure of an SOA is similar to that of a semiconductor laser
- it consists of an active medium (p-n junction) in the form of a waveguide, usually made of InGaAs or InGaAsP
- energy is provided by injecting electric current over the junction

[Figure: optical amplifier (OA) with a current pump and ends marked AR; a weak input signal enters from a fiber and the amplified output signal leaves on a fiber.]
Semiconductor optical amplifiers (cont.)

- SOAs are small, compact and can be integrated with other semiconductor and optical components
- they have large bandwidth and relatively high gain (20 dB)
- saturation power is in the range of 5-10 dBm
- SOAs are polarization dependent and thus require a polarization-maintaining fiber
- because of nonlinear phenomena, SOAs have a high noise figure and a high cross-talk level
Erbium-doped fiber amplifiers (EDFA)

EDFA is a very attractive amplifier type in optical communications systems
- an EDFA is a fiber segment, a few meters long, heavily doped with erbium (a rare earth metal)
- energy is provided by a pump laser beam

[Figure: weak signal in -> fiber -> isolator -> EDFA -> isolator -> fiber -> amplified signal out, with a pump (980 or 1480 nm at 3 W) coupled into the EDFA.]
Erbium-doped fiber amplifiers (cont.)

Amplification is achieved by the quantum mechanical phenomenon of stimulated emission
- erbium atoms are excited to a high energy level by the pump laser signal
- they fall to a lower metastable (long-lived, 10 ms) state
- an arriving photon triggers (stimulates) a transition to the ground level and another photon of the same wavelength is emitted

[Figure: energy levels of erbium. Excited atoms at the high energy level decay in about 1 μs to the metastable level (about 10 ms); stimulated emission (1520 - 1620 nm) returns them to the low energy level. Pumping is by a short-wavelength source (980 nm) or a longer-wavelength source (1480 nm).]
Erbium-doped fiber amplifiers (cont.)

- EDFAs have a high pump power utilization (> 50 %)
- Directly and simultaneously amplify a wide wavelength band (> 80 nm in the 1550 nm region) with a relatively flat gain
- Flatness of gain can be improved with gain-flattening optical filters
- Gain in excess of 50 dB
- Saturation power is as high as 37 dBm
- Low noise figure
- Transparent to optical modulation format
- Polarization independent
- Suitable for long-haul applications
- EDFAs are not small and cannot easily be integrated with other semiconductor devices
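The gain and saturation figures quoted for SOAs and EDFAs are in logarithmic units. As a rough aid, the sketch below converts them to linear values and applies a gain to an input level. The function names and the hard clip at the saturation power are illustrative assumptions, not part of the lecture material; a real amplifier compresses gradually as it approaches saturation.

```python
def db_to_ratio(gain_db: float) -> float:
    """Convert a gain in dB to a linear power ratio."""
    return 10 ** (gain_db / 10)


def dbm_to_mw(power_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def output_dbm(input_dbm: float, gain_db: float, saturation_dbm: float) -> float:
    """Input level plus gain, clipped at the saturation power.

    Assumption: saturation is modelled as a hard limit, which is a
    simplification made only for this example.
    """
    return min(input_dbm + gain_db, saturation_dbm)


if __name__ == "__main__":
    # Figures quoted on the slides: SOA gain 20 dB, EDFA gain 50 dB,
    # EDFA saturation power 37 dBm.
    print(db_to_ratio(20))          # linear gain of an SOA
    print(db_to_ratio(50))          # linear gain of an EDFA
    print(dbm_to_mw(37))            # EDFA saturation power in mW
    print(output_dbm(-30, 50, 37))  # a -30 dBm input after 50 dB of gain
```

In linear terms, 37 dBm is about 5 W, whereas the 5-10 dBm quoted for SOAs is about 3-10 mW.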
Wavelength converters
- Enable optical channels to be relocated
- Achieved in optical domain by employing nonlinear phenomena
- Types of wavelength converters
  - Optoelectronic approach
  - Optical gating - cross-gain modulation
  - Four-wave mixing
Wavelength converters - optoelectronic approach

- Simplest approach
- Input signal is
  - received
  - converted to electronic form
  - regenerated
  - transmitted using a laser at a different wavelength

[Figure: Receiver -> Regenerator -> Transmitter]
Optical gating - cross-gain modulation

- Makes use of the dependence of the gain of an SOA (semiconductor optical amplifier) on its input power
- Gain saturation occurs when high optical power is injected
  - carrier concentration is depleted
  - SOA gain is reduced
- Fast: can handle 10 Gbit/s rates

[Figure: signal (λs) and probe (λp) are fed into the SOA; a filter after the SOA passes λp. Time traces show the signal, carrier density, gain and probe output.]
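The gating effect can be illustrated with a toy model. The sketch below assumes the simple saturation law G = G0 / (1 + P_in / P_sat), which is an illustrative approximation and not taken from the slides; the power levels are hypothetical, with G0 = 100 (20 dB) and P_sat = 5 mW chosen to sit within the SOA figures quoted earlier. It shows that a strong signal bit depletes the gain seen by the probe, so the probe carries an inverted copy of the data.

```python
def soa_gain(p_in_mw: float, g0: float = 100.0, p_sat_mw: float = 5.0) -> float:
    """Saturated gain (linear) for a given total input power.

    Assumption: G = G0 / (1 + P_in / P_sat), a textbook-style simplification.
    """
    return g0 / (1.0 + p_in_mw / p_sat_mw)


def probe_output(signal_bits, p_high_mw=10.0, p_low_mw=0.1, probe_mw=0.01):
    """Probe power at the SOA output for each signal bit (hypothetical levels)."""
    levels = []
    for bit in signal_bits:
        p_signal = p_high_mw if bit else p_low_mw
        levels.append(probe_mw * soa_gain(p_signal + probe_mw))
    return levels


def decide(levels):
    """Threshold the probe levels halfway between their extremes."""
    threshold = (max(levels) + min(levels)) / 2
    return [1 if p > threshold else 0 for p in levels]


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0]
    converted = decide(probe_output(data))
    print(data, "->", converted)  # the probe carries the inverted bit pattern
```

Because the converted data is inverted, a receiver on the probe wavelength has to account for the polarity change.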
Four-wave mixing
- Four-wave mixing is usually an undesirable phenomenon in fibers
- Can be exploited to achieve wavelength conversion
- In four-wave mixing, three waves at frequencies f1, f2 and f3 produce a wave at the frequency f1 + f2 - f3
- When f1 = f2 = fp (pump) and f3 = fs (signal), a new wave is produced at 2fp - fs
- Four-wave mixing can be enhanced by using an SOA to increase the power levels
- Other wavelengths are filtered out
Four-wave mixing (cont.)

[Figure: signal fs and pump fp enter the SOA, whose output contains fs, fp, 2fp - fs and 2fs - fp; a filter passes only 2fp - fs.]
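As a worked example of the mixing products 2fp - fs and 2fs - fp, the sketch below converts wavelengths to frequencies, forms both products and converts back. The function name and the sample wavelengths (signal at 1552 nm, pump at 1550 nm) are illustrative assumptions.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def fwm_products_nm(signal_nm: float, pump_nm: float) -> tuple[float, float]:
    """Return the wavelengths (nm) of the products 2fp - fs and 2fs - fp."""
    fs = C / (signal_nm * 1e-9)  # signal frequency, Hz
    fp = C / (pump_nm * 1e-9)    # pump frequency, Hz
    converted = 2 * fp - fs      # product kept by the filter
    other = 2 * fs - fp          # product removed by the filter
    return C / converted * 1e9, C / other * 1e9


if __name__ == "__main__":
    converted_nm, other_nm = fwm_products_nm(signal_nm=1552.0, pump_nm=1550.0)
    print(f"2fp - fs: {converted_nm:.3f} nm")
    print(f"2fs - fp: {other_nm:.3f} nm")
```

The product at 2fp - fs lies on the opposite side of the pump from the signal, at the same frequency distance, which is what makes it usable as the converted channel.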
Optical multiplexers and demultiplexers

- An optical multiplexer receives many wavelengths from many fibers and converges them into one beam that is coupled into a single fiber
- An optical demultiplexer receives a beam (consisting of multiple optical frequencies) from a fiber and separates it into its frequency components, which are directed to separate fibers (a fiber for each frequency)

[Figure: an optical multiplexer combines λ1, λ2, ..., λN from separate fibers into a multiplexed beam λ1 + λ2 + ... + λN; an optical demultiplexer separates the beam into λ1, λ2, ..., λN on separate fibers.]
Prisms and diffraction gratings

- Prisms and diffraction gratings can be used to achieve these functions in either direction (reciprocity)
  - in both of these devices a polychromatic parallel beam impinging on the surface is separated into frequency components leaving the device at different angles
  - based on different refraction (prism) or diffraction (diffraction grating) of different wavelengths

[Figure: diffraction grating. An incident beam λ1 + λ2 + ... + λN from a fiber passes through a lens onto the grating; the diffracted wavelengths λ1, λ2, ..., λN are directed to separate fibers.]
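The angular separation produced by a grating can be estimated from the standard grating equation d sin(θ) = mλ for normal incidence. The slides do not give this equation or any grating parameters, so the groove density of 600 lines/mm and the first diffraction order used below are purely illustrative assumptions.

```python
import math


def diffraction_angle_deg(wavelength_nm: float,
                          grooves_per_mm: float = 600.0,
                          order: int = 1) -> float:
    """Diffraction angle for normal incidence from d * sin(theta) = m * lambda."""
    pitch_nm = 1e6 / grooves_per_mm          # groove spacing d in nm
    s = order * wavelength_nm / pitch_nm     # sin(theta)
    if abs(s) > 1:
        raise ValueError("no propagating diffraction order for these values")
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    for wavelength in (1530.0, 1550.0, 1570.0):
        angle = diffraction_angle_deg(wavelength)
        print(f"{wavelength:.0f} nm -> {angle:.2f} degrees")
```

Longer wavelengths leave the grating at larger angles, so each output fiber can be placed where its wavelength lands.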
Prisms and diffraction gratings (cont.)

[Figure: prism. A multiplexed beam λ1 + λ2 + ... + λN from a fiber passes through a lens into a prism (media labelled n1 and n2); the separated wavelengths λ1, λ2, λ3, ..., λN leave at different angles and pass through a second lens.]
Arrayed waveguide grating (AWG)

- AWGs are integrated devices based on the principle of interferometry
  - a multiplicity of wavelengths are coupled to an array of waveguides with different lengths
  - this produces wavelength-dependent phase shifts
  - in the second cavity the phase differences of the wavelengths interfere in such a manner that each wavelength contributes maximally at one of the output fibers
- Reported systems
  - SiO2 AWG for 128 channels with 250 GHz channel spacing
  - InP AWG for 64 channels with 50 GHz channel spacing

[Figure: input λ1 + λ2 + ... + λN -> cavity S1 -> array of waveguides w1 ... wN -> second cavity -> array of fibers carrying λ1 ... λN.]
Optical add-drop multiplexers (OADM)

- Optical multiplexers and demultiplexers are components designed for wavelength division multiplexing (WDM) systems
  - a multiplexer combines several optical signals at different wavelengths into a single fiber
  - a demultiplexer separates a multiplicity of wavelengths in a fiber and directs them to many fibers
- The optical add-drop multiplexer
  - selectively removes (drops) a wavelength from the multiplex
  - then adds the same wavelength, but with different data

[Figure: λ1, λ2, ..., λN enter the OADM; λ1 is dropped while λ2, ..., λN pass through; a new λ1 is added and λ1, λ2, ..., λN leave.]
Optical add-drop multiplexers (cont.)

- An OADM may be realized by doing full demultiplexing and multiplexing of the wavelengths
  - a demultiplexed wavelength path can be terminated and a new one created

[Figure: λ1, λ2, ..., λN -> optical amplifier (OA) -> OADM, where λ1, λ2, ..., λN-1 pass through -> OA -> λ1, λ2, ..., λN.]
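The add-drop operation can be sketched as a mapping from wavelength to payload. This is only a logical model under stated assumptions: the function name, the dictionary representation and the sample payloads are hypothetical, and no optical effects are modelled.

```python
def oadm(channels: dict, drop: str, add_payload):
    """Drop one wavelength and add new data on the same wavelength.

    channels maps a wavelength label (e.g. "λ1") to the data it carries.
    Returns the outgoing multiplex and the dropped payload.
    """
    if drop not in channels:
        raise KeyError(f"wavelength {drop} is not present in the multiplex")
    outgoing = dict(channels)       # full demultiplexing: one path per wavelength
    dropped = outgoing.pop(drop)    # the dropped wavelength path is terminated
    outgoing[drop] = add_payload    # a new path is created on the same wavelength
    return outgoing, dropped


if __name__ == "__main__":
    multiplex = {"λ1": "data A", "λ2": "data B", "λ3": "data C"}
    outgoing, dropped = oadm(multiplex, drop="λ1", add_payload="data X")
    print(dropped)
    print(outgoing)
```

The other wavelengths pass through unchanged, which is the property that distinguishes an OADM from a full terminal multiplexer.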
Optical cross-connects
- Channel cross-connecting is a key function in communication systems
- Optical cross-connection may be accomplished by
  - hybrid approach: converting the optical signal to the electronic domain, using electronic cross-connects, and converting the signal back to the optical domain
  - all-optical switching: cross-connecting directly in the photonic domain
- Hybrid approach is currently more popular because the all-optical switching technology is not fully developed
  - all-optical NxN cross-connects are feasible for N = 2...32
  - large cross-connects (N on the order of 1000) are in experimental or planning phase
- All-optical cross-connecting can be achieved by
  - optical solid-state devices (couplers)
  - electromechanical mirror-based free space optical switching devices
Solid-state cross-connects
- Based on semiconductor directional couplers
- Directional coupler can change an optical property of the path
  - polarization
  - propagation constant
  - absorption index
  - refraction index
- Optical property may be changed by means of
  - heat, light, mechanical pressure
  - current injection, electric field
- Technology determines the switching speed, for instance
  - LiNbO3 crystals: order of ns
  - SiO2 crystals: order of ms

[Figure: directional coupler. "Signal in" enters a lightguide with a control (voltage) input; the signal leaves on the "signal on" or the "signal off" output.]
Solid-state cross-connects (cont.)

- A multiport switch, also called a star coupler, is constructed by employing several 2x2 directional couplers
- For instance, a 4x4 switch can be constructed from six 2x2 directional couplers
- Due to cumulative losses, the number of couplers in the path is limited and, therefore, also the number of ports is limited, perhaps to 32x32

[Figure: 4x4 switch with inputs 1-4 and outputs 1-4 built from six 2x2 couplers; waveguides and control electrodes on a common substrate.]
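The count of six 2x2 couplers for a 4x4 switch is what a Benes-type arrangement would give (three stages of two couplers). The slides do not name the topology, so treating the switch as a Benes network is an assumption made here, as is the hypothetical 1 dB loss per coupler. Under those assumptions the sketch below shows how the number of couplers in a path, and therefore the loss, grows with the port count.

```python
def benes_size(n_ports: int) -> tuple[int, int]:
    """Return (stages, total 2x2 couplers) for an N x N Benes arrangement.

    Assumption: N is a power of two and the switch uses 2*log2(N) - 1
    stages of N/2 couplers each.
    """
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("n_ports must be a power of two, at least 2")
    log2_n = n_ports.bit_length() - 1
    stages = 2 * log2_n - 1
    return stages, stages * n_ports // 2


def path_loss_db(n_ports: int, loss_per_coupler_db: float = 1.0) -> float:
    """Loss along one path: every path crosses one coupler per stage.

    The default loss per coupler is a hypothetical value for illustration.
    """
    stages, _ = benes_size(n_ports)
    return stages * loss_per_coupler_db


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        stages, couplers = benes_size(n)
        print(n, stages, couplers, path_loss_db(n))
```

The path loss grows only with the logarithm of the port count, but each added stage eats into a fixed optical power budget, which is one way to read the practical limit of about 32x32 mentioned above.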
Microelectromechanical switches (MEMS)

- Tiny mirrors micromachined on a substrate
  - outgrowth of semiconductor processing technologies: deposition, etching, lithography
  - a highly polished flat plate (mirror) is connected with an electrical actuator
  - can be tilted in different directions by applied voltage

Source: R.J. Bates, Optical switching and networking handbook, McGraw-Hill, 2001
Optical cross connects

- MEMS technology is still complex and expensive
- Many MEMS devices may be manufactured on the same wafer
  - reduces cost per system
- Many mirrors can be integrated on the same chip
  - arranged in an array
  - experimental systems with 16x16=256 mirrors have been built
  - each mirror may be independently tilted
- An all-optical space switch can be constructed using mirror arrays

Source: R.J. Bates, Optical switching and networking handbook, McGraw-Hill, 2001
Optical switches
Contention resolution
- Contention occurs when two or more packets are destined to the same output at the same time instant
- In electronic switches, contention is usually solved by store-and-forward techniques
- In optical switches, contention is resolved by
  - optical buffering (optical delay lines)
  - deflection routing
  - exploiting the wavelength domain
    - scattered wavelength path (SCWP)
    - shared wavelength path (SHWP)
Optical delay loop

[Figure: inputs In_1, In_2, ..., In_n connect to outputs Out_1, Out_2, ..., Out_n through delay loops of lengths T, 2T, ..., mT.]
Deflection routing
[Figure: switch with inputs In_1, In_2, In_3, In_4, ..., In_n and outputs Out_1, Out_2, Out_3, Out_4, ..., Out_n.]
Wavelength conversion
[Figure: switch with inputs In_1, In_2, In_3, In_4, ..., In_n and outputs Out_1, Out_2, Out_3, Out_4, ..., Out_n; the links are labelled with wavelengths λ1, λ2 and λ3.]
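The three contention resolution techniques (wavelength conversion, delay loops, deflection routing) can be combined in a per-slot decision. The sketch below is a minimal illustration under stated assumptions: the order in which the techniques are tried, the `Packet` class and the `resolve_slot` function are all hypothetical, and a delayed packet is simply assumed to find its channel free when it leaves the loop.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    name: str
    output: int       # requested output port
    wavelength: int   # wavelength the packet arrives on


def resolve_slot(packets, n_outputs, n_wavelengths, n_delay_loops):
    """Decide what happens to each packet arriving in one time slot.

    Returns {packet name: (action, output, wavelength, delay in slots)}.
    """
    used = set()     # (output, wavelength) channels taken in this slot
    delayed = {}     # output -> number of delay loops already in use
    decisions = {}
    for p in packets:
        # No contention: the requested channel is still free.
        if (p.output, p.wavelength) not in used:
            used.add((p.output, p.wavelength))
            decisions[p.name] = ("forward", p.output, p.wavelength, 0)
            continue
        # Wavelength domain: move to a free wavelength on the same output.
        free = [w for w in range(n_wavelengths) if (p.output, w) not in used]
        if free:
            used.add((p.output, free[0]))
            decisions[p.name] = ("convert", p.output, free[0], 0)
            continue
        # Optical buffering: the k-th buffered packet uses the loop of length k*T.
        if delayed.get(p.output, 0) < n_delay_loops:
            delayed[p.output] = delayed.get(p.output, 0) + 1
            decisions[p.name] = ("delay", p.output, p.wavelength, delayed[p.output])
            continue
        # Deflection routing: take any free channel on another output.
        alt = [(o, w) for o in range(n_outputs) for w in range(n_wavelengths)
               if o != p.output and (o, w) not in used]
        if alt:
            used.add(alt[0])
            decisions[p.name] = ("deflect", alt[0][0], alt[0][1], 0)
        else:
            decisions[p.name] = ("drop", None, None, 0)
    return decisions


if __name__ == "__main__":
    arrivals = [Packet(n, output=0, wavelength=0) for n in "abcd"]
    for name, decision in resolve_slot(arrivals, 2, 2, 1).items():
        print(name, decision)
```

Trying wavelength conversion first keeps packets on their intended output without added delay; a real switch may order the techniques differently depending on its hardware.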
Optical switches
Optical packet switching

- User data transmitted in optical packets
  - packet length fixed or variable
- Packets switched in optical domain packet-by-packet
- No optical-to-electrical (and reverse) conversions for user data
- Switching utilizes TDM and/or WDM
- Electronic switch control
- Different solutions suggested
  - broadcast-and-select
  - wavelength routing
  - optical burst switching
Optical packet switch

[Figure: optical packet switch consisting of input interfaces, a switch fabric and output interfaces, with sync. control, switch control and header rewrite blocks.]

Input interfaces
- packet delineation
- packet alignment
- header and payload separation
- header information processing
- header removal

Switch fabric
- switching of packets from inputs to correct outputs in optical domain
- contention resolution

Output interfaces
- header insertion
- optical signal regeneration
Broadcast-and-select
- Input ports support different wavelengths (e.g. only one wavelength/port)
- Data packets from all input ports are combined and broadcast to all output ports
- Each output port dynamically selects the wavelengths, i.e. packets, addressed to it
- Inherent support for multicasting
- Requires that the control unit has received routing/connection information before packets arrive
Broadcast-and-select
[Figure: broadcast-and-select switch. Inputs In_1, In_2, ..., In_n pass through a wavelength encoding stage (TWC/FWC), a combiner, a buffering stage and a wavelength selection stage to the outputs Out_1, ...]
TWC - Tunable Wavelength Converter
FWC - Fixed Wavelength Converter
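The broadcast-and-select principle can be sketched in a few lines. The model below is a logical illustration only: the function name is hypothetical, input port i is assumed to be encoded on fixed wavelength i, and buffering is left out.

```python
def broadcast_and_select(packets_by_input, selected_wavelengths):
    """Deliver packets according to each output port's wavelength selection.

    packets_by_input: list where entry i is the packet arriving on input i.
    selected_wavelengths: {output port: set of wavelengths it selects}.
    """
    # Wavelength encoding: input port i is assigned fixed wavelength i.
    combined = dict(enumerate(packets_by_input))
    delivered = {}
    for output, wavelengths in selected_wavelengths.items():
        # Every output sees the whole combined signal and picks its wavelengths.
        delivered[output] = [combined[w] for w in sorted(wavelengths) if w in combined]
    return delivered


if __name__ == "__main__":
    packets = ["A", "B", "C"]
    # Outputs 0 and 2 both select wavelength 1, so packet "B" is multicast.
    selection = {0: {1}, 1: {0, 2}, 2: {1}}
    print(broadcast_and_select(packets, selection))
```

Multicasting needs no extra hardware in this model: two outputs selecting the same wavelength both receive the packet, which matches the inherent multicast support noted above.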
Wavelength routing
- Input ports usually support the same set of wavelengths
- Incoming wavelengths arrive at the contention resolution and buffering block, where the wavelengths are
  - converted to other wavelengths (used inside the switch)
  - demultiplexed
  - routed to delay loops of parallel output port logics
- Contention-free wavelengths of the parallel output port logics are combined and directed to the wavelength switching block
- The wavelength switching block converts internally routed λ-channels to wavelengths used in output links and routes these wavelengths to correct output ports
- Correct operation of the switch requires that the control unit has received routing/connection information before packets arrive
Wavelength routing
[Figure: wavelength routing switch. The inputs pass through TWCs into the contention resolution and buffering block, then through TWCs into the wavelength switching block and on to the outputs Out_1, ...]
TWC - Tunable Wavelength Converter
Optical burst switching

- Data transmitted in bursts of packets
- Control packet precedes transmission of a burst and is used to reserve network resources
  - no acknowledgment, e.g. TAG (Tell-and-Go)
  - acknowledgment, e.g. TAW (Tell-and-Wait)
- High bandwidth utilization (lower avg. processing and synchronization overhead than in pure packet switching)
- QoS and multicasting enabled
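The practical difference between the two reservation styles is the waiting time before the burst may be sent. The sketch below uses a deliberately simple timing model that is an assumption of this example, not part of the slides: processing times are ignored, a Tell-and-Go burst follows its control packet after a configurable offset, and a Tell-and-Wait burst is sent only after the acknowledgment has travelled back (one round trip).

```python
def tag_delivery_time(prop_delay_s: float, burst_s: float, offset_s: float = 0.0) -> float:
    """Time until the end of the burst reaches the destination with Tell-and-Go."""
    return offset_s + burst_s + prop_delay_s


def taw_delivery_time(prop_delay_s: float, burst_s: float) -> float:
    """Time until the end of the burst reaches the destination with Tell-and-Wait.

    The source first waits one round trip for the acknowledgment.
    """
    round_trip_s = 2 * prop_delay_s
    return round_trip_s + burst_s + prop_delay_s


if __name__ == "__main__":
    # Hypothetical figures: 5 ms one-way propagation delay, 1 ms burst.
    print(tag_delivery_time(0.005, 0.001))
    print(taw_delivery_time(0.005, 0.001))
```

In this model Tell-and-Wait always costs one extra round trip, in exchange for knowing that the resources were actually reserved before the burst is sent.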
Header and packet formats

- In electronic networks, packet headers are transmitted serially with the payload (at the same bit rate)
- In optical networks, bandwidth is much larger and electronic header inspection cannot be done at wire speed
  - header cannot be transmitted serially with the payload
- Different approaches for optical packet format
  - packets switched with sub-carrier multiplexed headers
  - header and payload transmitted in different λ-channels
  - header transmitted ahead of payload in the same λ-channel
  - tag (λ) switching - a short fixed length label containing routing information
Header and packet formats (cont.)

- Packets with sub-carrier headers
  [Figure: payload on λ1 in the fiber, with the header carried on a sub-carrier.]
- Header and payload in different λ-channels
  [Figure: payload and header carried on λ1 and λ2 in the same fiber.]
- Header transmitted ahead of payload in the same λ-channel
  [Figure: header followed by payload on each of the channels λ1 and λ2 in the fiber.]
Example optical packet format (KEOPS)

[Figure: KEOPS packet format within one time slot: guard time | header synch. pattern | routing tag | guard time | payload synch. pattern | payload | guard time.]
Research issues in optical switching

- Switch fabric interconnection architectures
- Packet coding techniques (bit serial, bit parallel, out-of-band)
- Optical packet structure (fixed vs. variable length)
- Packet header processing and insertion techniques
- Contention resolution techniques
- Optical buffering (delay lines, etc.)
- Reduction of protocol layers between IP and fiber
- Routing and resource allocation (e.g. GMPLS, RSVP-TE)
- Component research (e.g. MEMS)
