Design and ASIC Implementation of Ethernet Switch for Network Application
Narasimha Murthy P1, Cyril Prasanna Raj2, Vasudeva Murthy3
Department of Electronics and Communication Engineering
M.S.Ramaiah School of Advanced Studies, Bangalore
__________________________________________________________________________________________________

Abstract:
Ethernet is a low-level data delivery technology used in Local Area Networks (LANs), in which
data is transmitted over the network in discrete packets. An Ethernet switch allows mutually
independent connections to be established among pairs of computers through the same
interconnecting device. Therefore, if a LAN of N nodes uses a switch, there can be as many as N/2
one-to-one connections at any given instant. LAN switching provides dedicated, collision-free
communication between network devices, with support for multiple simultaneous conversations.
LAN switches are designed to switch data frames at high speeds. A LAN switch can link 10-Mbps
and 100-Mbps Ethernet segments.
This work presents a switch which conforms to the IEEE 802.3 Ethernet specifications. All
nodes are capable of full duplex operation. All the sub-modules of the Ethernet switch and the
top-level block are modeled in Verilog HDL. The designed switch is a modified-cut-through type
with dedicated memory for each input port and a common memory for the output ports. The look-up
table for MAC addresses uses a linear search algorithm for optimum speed. It is observed that
the controller block contributes more delay than the other blocks. The output FIFOs are
connected randomly to reduce complexity and delay in the read and write FIFO controllers. This
results in a better data rate and latency.
The design is simulated and synthesized with 130 nm technology libraries. A data rate of
183 Mbps is achieved. All the sub-modules of the Ethernet switch and the top-level block work
at a frequency of 400 MHz. The chip area is 2246610.744 µm² (about 2.25 mm²). The total power
obtained after physical design implementation is 58.143 mW and the IR drop is 0.063 mV.
__________________________________________________________________________________________________

Abbreviations

MAC     Medium Access Control
LAN     Local Area Network
FIFO    First In First Out
SOF     Start of Frame
DC      Design Compiler
DA      Destination Address
SA      Source Address
HDL     Hardware Description Language
FPGA    Field Programmable Gate Array
ASIC    Application Specific Integrated Circuit

OBJECTIVE
Our objective was to implement an Ethernet switch, initially capable of handling four computer
nodes. The switch works in the Data-Link and Physical layers of the OSI network model and
conforms to the IEEE 802.3 Ethernet specifications. All nodes are capable of full duplex
operation. The switch is a modified-cut-through type with dedicated memory for each input port
and a common memory for the output ports. The look-up table for MAC addresses is designed to
have a linear search algorithm for optimum speed.

1. INTRODUCTION
An Ethernet switch is vastly superior to a conventional network hub. However, due to its high
cost, it is not commonly used. In a typical small to medium scale Local Area Network (LAN), a
hub is the preferred interconnection device due to its relatively low cost and ease of
configuration. A hub, however, has a serious limitation: in a network of N machines, only one
machine can transmit at a time. A switch, on the other hand, allows mutually independent
connections to be established among pairs of computers through the same interconnecting device.
Therefore, if a LAN of N nodes uses a switch, there can be as many as N/2 one-to-one
connections at any given instant. If a switch did only one job, namely increasing the network
throughput dramatically, without incorporating a web server and other high-end features, it
would be an obvious choice for small networks. Such a switch has the added advantage of being
truly plug-and-play, relieving the user of tedious configuration.

2. WORKING OF A SWITCH
The working of a switch deals with sending a packet coming from a source to a destination. The
original Ethernet standards defined the minimum frame size as 64 bytes and the maximum as
1518 bytes. The packet can be divided into three main parts: the preamble, the addresses and
the data.
The preamble is a sequence of 56 bits having alternating 1 and 0 values that are used for
synchronization. The Start Frame Delimiter comprises the last part of the preamble. It is
10101011 and is used to indicate the start of the frame.
Every network interface card has a unique Media Access Control (MAC) address, which is six
bytes long. The Destination MAC Address field identifies the station that receives the frame.
The Source MAC Address identifies the station that originated the frame. The data field
contains the data transferred from the source station to the destination station(s). The
maximum size of this field is 1500 bytes. It is followed by a frame check sequence.
The packet that comes in from the source goes through a transceiver that converts the
Differential Manchester encoded data into NRZ TTL logic. The packet consists of the above
fields when coming out of the transceiver and entering our switch. The packet then goes through
a Preamble Removing Circuit, where the preamble and start of frame are removed and the fields
after that are forwarded. The packet, without the preamble and start of frame, is stored in the
input FIFO, which is 512 bits long. This is the aspect of modified cut-through switching, which
requires the switch to check that a packet is at least 512 bits long and to discard it
otherwise. This prevents runt frames from going through the switch.

Figure 1: Ethernet switch Architecture

The packet is fed in tandem into a look-up FIFO 97 bits long, where the source and destination
addresses are stored. These addresses are compared with the addresses in the MAC table, which
is a directory of MAC addresses and the ports to which those nodes are connected. If a match is
found for the destination address, the port address corresponding to the table entry is written
into an address register associated with an output FIFO. If there is no match for the
destination address, then the packet is broadcast to all other nodes. If the source address is
not found in the MAC table, it is written into it along with the corresponding port number from
which it originated. This self-learning property builds up the MAC table without any trouble to
the user.
The main function of a switch is to look for the port address and to hunt for an available
output FIFO. Once the port address is obtained from the MAC table, the contents of the input
64-byte FIFO are pushed into an output FIFO pre-filled with preamble and start of frame. The
Preamble Generator does the pre-filling.
The destination port polls the address register associated with each output FIFO for its port
address and, if found to be its own, reads out the packet along with the pre-filled preamble
and SOF from the FIFO. Therefore, when a packet leaves the switch, it regains its original
structure. The packet leaves the switch through the transceiver that performs the Differential
Manchester encoding.

Number of Ports        4
Data rate              10/100 Mbps
Packet Size            64 Bytes
Type of switching      Modified cut-through
Data transfer mode     Full duplex
Table 1: Ethernet switch specifications

3. INPUT LOGIC DESIGN
The Input Logic forms the gateway for receiving the bit stream into the switch and subsequently
outputting it to the output FIFO. There are basically two kinds of frames that can come into
the switch:
1. Valid frames: they satisfy the condition that the frame is at least 512 bits long.
2. Runt frames: these frames are less than 512 bits long.
The data that comes into the Transceiver in the form of Differential Manchester encoded data is
converted to TTL logic. Rx_Data, Rx_Clock and Carrier Detect are the output signals from the
Transceiver. The Rx_Clock is recovered from the preamble at the transceiver. The Carrier Detect
is ‘high’ as long as data is available on the line.

Figure 2: Input logic design of Ethernet switch

The TTL data is then passed into the Preamble Removing Circuit. Only the data is passed to the
output; the preamble and start of frame are removed.
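As a rough illustration of the Preamble Removing Circuit and the runt-frame check, the Python sketch below models the received bit stream as a string of '0' and '1' characters. The function names, the string representation and the MSB-first packing of the addresses are assumptions made for this illustration only; the switch itself is modeled in Verilog HDL.

```python
PREAMBLE = "10" * 28        # 56 alternating bits
SFD = "10101011"            # start frame delimiter
MIN_FRAME_BITS = 512        # capacity of the input FIFO (64 bytes)


def strip_preamble(bits):
    """Return the bits that follow the start frame delimiter, or None if no SFD is seen."""
    # The preamble never contains "11", so the first match of SFD ends at the real delimiter.
    idx = bits.find(SFD)
    if idx < 0:
        return None
    return bits[idx + len(SFD):]


def receive(bits):
    """Return (destination, source, frame_bits) for a valid frame, or None for a runt."""
    frame = strip_preamble(bits)
    if frame is None or len(frame) < MIN_FRAME_BITS:
        return None                     # runt frame (or no frame at all): discarded
    da = int(frame[0:48], 2)            # destination address, assumed MSB-first
    sa = int(frame[48:96], 2)           # source address, assumed MSB-first
    return da, sa, frame


def make_stream(da, sa, payload_bits):
    """Hypothetical helper: build a test stream with preamble, SFD and both addresses."""
    return PREAMBLE + SFD + format(da, "048b") + format(sa, "048b") + payload_bits
```

A stream carrying 96 address bits and 416 payload bits fills the 512-bit FIFO exactly and is accepted, while a shorter one is treated as a runt and dropped.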

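The conversion between Differential Manchester line code and NRZ data performed by the transceiver can be sketched in a few lines of Python. Conventions for this code vary, so the sketch assumes one of them: every bit cell has a transition in the middle, a transition at the start of the cell encodes 0, and no transition at the start encodes 1. The function names and the initial line level are likewise assumptions.

```python
def dm_encode(bits, level=1):
    """Encode a list of 0/1 bits into two half-cell line levels per bit."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1          # transition at the start of the cell encodes 0
        out.append(level)
        level ^= 1              # mandatory mid-cell transition (carries the clock)
        out.append(level)
    return out


def dm_decode(halves, level=1):
    """Decode half-cell line levels back into bits; 'level' is the line level before the first cell."""
    bits = []
    for i in range(0, len(halves), 2):
        first, second = halves[i], halves[i + 1]
        if first == second:
            raise ValueError("missing mid-cell transition")
        bits.append(1 if first == level else 0)   # no start transition means 1
        level = second
    return bits
```

Because every cell contains a mid-cell transition, a receiver can recover the clock from the line signal, which is consistent with Rx_Clock being recovered at the transceiver.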
There are three output lines from the Preamble Removing Circuit, which are Rx_Data, Rx_Clock and
Data Detect.
The Rx_Data is given as input to the 512-bit FIFO. The Rx_Clock and Data Detect are given to
the clock select for selecting either the Rx_Clock or the Internal Clock to drive the FIFO. The
Rx_Clock is selected initially when the packet is coming through. The Internal Clock is
selected at the end of the frame, when Rx_Clock is not available, to flush out the last part of
the frame present in the 512-bit FIFO.
The Rx_Data is concurrently given to the look-up FIFO, which is 97 bits long. This FIFO holds
the source and destination addresses. Data entry into the FIFO is stopped as soon as the source
and destination addresses have been written into it.
A D flip-flop is used to check whether the frame is valid or not by checking the Full pin of
the FIFO. If the Full pin is ‘high’ then the frame is valid. A combinational circuit connected
to this flip-flop is used to reset the FIFO if the frame is invalid or when there is no output
FIFO available.
The output of the 512-bit FIFO goes to the output FIFO only when that particular port has
contended for and won an output FIFO for itself. Once that is done, the data can be pushed into
the output FIFO.

4. SIEVE MATRIX
The Sieve matrix as shown below is basically made up of single-bit memory elements, arranged in
rows and columns. There are eight rows in the sieve matrix, each corresponding to an output
2048-byte FIFO. Columns A, B, C and D are each dedicated to one port. Each latch in a row is
set as and when the output FIFO corresponding to that row has been allocated to that particular
port. Column E is used to indicate when the Preamble Generator is filling the corresponding
output FIFOs. Column V is used to indicate when there is a valid frame inside the output FIFO.

5. WORKING OF THE SIEVE MATRIX
The single-bit memory elements of columns A, B, C and D in a specific row are set when the
corresponding port has contended for the output FIFO related to the row and has been successful
in winning it. At any time, there can be only one port connected to an output FIFO. Thus each
row of the Sieve matrix is dedicated to one output FIFO and can have only one latch set among
the port-related latches. The columns of the Sieve matrix are dedicated to each port. There can
be many FIFOs allocated to the same port but there cannot be many ports connected to the same
FIFO.
Column E is used to indicate which FIFOs are empty and must be pre-filled with the preamble and
start of frame by the Preamble Generator. No data can be pushed into the output FIFO while this
function is taking place. Column V is set by the valid bit of the input logic. An enabled latch
indicates that the corresponding output FIFO houses a valid frame. The main function of this
column can be explained as follows. Once a port determines that the frame being received by it
is valid (i.e. at least 64 bytes long), it enables the V latch of all output FIFOs won by it.
This Valid bit remains until the frame has been pushed out of the output FIFO and the FIFO has
been pre-filled with the preamble and start of frame, ready for new data to come in. Now assume
that a port (say, Port A) receives a valid frame destined to another port (say, Port B) and
that it wins ownership of the first output FIFO. On winning the contention, the A latch of the
first sieve row is set, and the V latch also gets enabled when the frame is validated. If the
destination port is very busy reading from other FIFOs, it will take a considerable amount of
time before the contents of the first output FIFO are read out; time enough for the next frame
to reach Port A for routing to Port C. Now, if A were to contend and win the first output FIFO
again (since Column A of the sieve is clear now), this would result in two frames within a
single output FIFO, with a single destination port assigned (wrongly) to both of them. This
mistake is avoided by using the Valid latch, because the high Row Star value (contributed by
the V latch also) ensures that no port wins ownership of the first FIFO until it is emptied.
The column sieve is reset in two conditions:
1. When there is a runt frame, the input FIFO is reset, the EMPTY status pin of the FIFO goes
‘HIGH’, and hence the column is reset.
2. When the data has been flushed into the output FIFO, the EMPTY flag of the input FIFO also
goes ‘HIGH’ and hence resets the column.
The sieve matrix is implemented using basic single-bit memory elements.

6. LOOK UP LOGIC
The function of the Look-Up Logic is to find the destination port of a frame arriving at the
switch. When a frame arrives at a port (say, port x) of the switch, its destination and source
addresses (say, addresses A and B respectively) are compared with the entries of a Look-Up
table. Each entry of the look-up table is called a Sector, and consists of a MAC address, a
port number and a time stamp. If the destination address A is either a Broadcast or a Multicast
address, or if it is not found in the Look-Up table, then the destination ports for the frame
are all the ports except port x. Otherwise, if a match exists, the destination port for the
frame is given by the port address of the matched sector. If a match for the source address B
exists in the Look-Up table, nothing further is done about it; otherwise this address, along
with the originating port address x, is entered into the look-up table as a new sector.

7. HUNTING LOGIC
The hunting logic is essentially made up of four hunting elements and eight contention
elements. There is one hunting element for each port of the switch which, when notified of the
arrival of a packet, starts hunting for an empty output FIFO. The contention element is a
module that prevents the same FIFO from being booked by more than one port. This is based on
the idea of lateral inhibition. The operation of the hunting logic revolves around a sieve
matrix which keeps track of which FIFO is being used by which port. It is modified by the
hunting logic.
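The look-up and self-learning behaviour described for the Look-Up Logic can be sketched in Python as follows. This is a minimal behavioural model under stated assumptions, not the paper's Verilog implementation: the class and method names are hypothetical, ports are numbered 0 to 3, MAC addresses are 48-bit integers with the first octet in the most significant byte, and the time stamp is stored but not used for ageing.

```python
class LookUpTable:
    """Linear-search MAC table; each sector is (mac, port, time_stamp)."""

    def __init__(self, num_ports=4):
        self.num_ports = num_ports
        self.sectors = []

    def find_port(self, mac):
        # Linear search through the sectors, as in the paper's look-up table.
        for sector_mac, port, _ in self.sectors:
            if sector_mac == mac:
                return port
        return None

    def route(self, dst, src, in_port, now=0):
        """Return the list of destination ports and learn the source address."""
        # Group (broadcast/multicast) addresses have the least significant
        # bit of the first octet set.
        is_group = bool((dst >> 40) & 1)
        out_port = None if is_group else self.find_port(dst)
        if self.find_port(src) is None:
            self.sectors.append((src, in_port, now))    # self-learning step
        if out_port is None:
            # Unknown or group destination: send to every port except the source port.
            return [p for p in range(self.num_ports) if p != in_port]
        return [out_port]
```

With an empty table, a frame from port 0 is flooded to ports 1, 2 and 3; once the reply from port 1 has been seen, later frames between the two stations are sent to a single port, which mirrors the simulation scenario reported in the results.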

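The bookkeeping done by the sieve matrix and the hunting logic can be pictured with the Python sketch below. It is a sequential model with hypothetical names: a row counts as busy when any of its latches (A to D, E or V) is set, which stands in for the Row Star signal, and contention is resolved simply by call order rather than by the lateral-inhibition circuit used in the design.

```python
class SieveMatrix:
    """Eight rows (output FIFOs); columns A-D (ports), E (pre-filling), V (valid frame)."""

    def __init__(self, rows=8):
        self.rows = [dict.fromkeys("ABCDEV", False) for _ in range(rows)]

    def row_busy(self, r):
        # Stand-in for the Row Star signal: high if any latch in the row is set.
        return any(self.rows[r].values())

    def hunt(self, port):
        """Book the first free output FIFO for 'port'; return its row index or None."""
        for r in range(len(self.rows)):
            if not self.row_busy(r):
                self.rows[r][port] = True
                return r
        return None

    def validate(self, port):
        # The frame proved valid: set V on every row currently owned by this port.
        for row in self.rows:
            if row[port]:
                row["V"] = True

    def reset_column(self, port):
        # Input FIFO went EMPTY (runt discarded or data flushed): clear the port column.
        for row in self.rows:
            row[port] = False

    def start_prefill(self, r):
        self.rows[r]["E"] = True        # Preamble Generator is refilling this FIFO

    def finish_prefill(self, r):
        self.rows[r]["E"] = False
        self.rows[r]["V"] = False       # FIFO is ready for new data
```

In this model, after Port A wins row 0, validates its frame and has its column reset, a second hunt by Port A returns row 1 rather than row 0, because the V latch keeps row 0 busy until the frame has been read out and the FIFO pre-filled again.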
8. OUTPUT LOGIC

Figure 3: Output logic design of Ethernet switch

The four input ports contend for the empty FIFO via the contention logic. The port that wins
the contention books the slot in the empty FIFO. Only one port can book a FIFO at a time. Once
a FIFO is booked by a particular port, it cannot be booked by any other port until the data is
pushed out of that FIFO and it becomes empty. If a port that wins the contention logic finds
that a particular FIFO is booked, it polls each of the FIFOs using a counter until it comes
across an empty FIFO. When it finds an empty FIFO, it stops searching and begins to push the
data into the FIFO. At the output side, the last two bits of the FIFO signify the destination
port address bits. Each output port polls the last two bits of each FIFO until it finds its own
port address. If it does find its port address, it starts reading the data meant for it. If the
port address of the current FIFO does not match the address of the port, it polls each FIFO
successively using the counter until it finds its own address.

9. RESULTS
The Ethernet Switch results are divided into three sections:
1. Simulation results, which show the timing diagram of the various components when run on a
test bench in ModelSim Suite SE and VCS.
2. Synthesis results from Design Compiler (DC), which give the practical parameters for
implementation on an Application Specific Integrated Circuit (ASIC).
3. The netlist diagram of the Ethernet Switch implementation at the highest modular hierarchy.

Simulation Results

Figure 4: Simulation results of Ethernet switch

The simulation of the Ethernet Switch design was performed using a test bench module to feed
simulation inputs to the system, and to log the results.
The process of simulation broadly involved the following steps:
1. A valid frame and a runt frame were sent from Ports A and C respectively, to Port B.
2. An acknowledgement frame was sent from Port B to Port A.
The results of the simulation were as follows:
1. The valid frame from Port A was broadcast to all other ports since the MAC Table was
initially empty.
2. The runt frame from Port C was discarded.
3. The acknowledgement frame from Port B was routed correctly, and was received by Port A alone.
The operations verified by the simulation process were:
1. Initialization of all modules
2. Functioning of modules
3. Ability to broadcast
4. Ability to route a packet to the correct destination after building the Look-Up Table

Figure 5: Synthesis results of Ethernet switch

The Ethernet Switch is implemented using Very Large Scale Integration, with Design Compiler and
Astro from Synopsys. The design language used is Verilog HDL, and the synthesized GDSII file is
ready for fusion.

10. CONCLUSION
The work on the implementation of the Ethernet switch has given me an insight into some
often-overlooked issues inherent in digital logic design and HDL programming. Here are some of
the lessons I learnt.
1. Due to the similarity of HDLs like Verilog to programming languages like C++, one tends to
commit the error of designing logic circuits like software programs. In software programs, it
is quite all right to use long sequentially-executed code with nested loops and breaks;
unfortunately an HDL program modeled on the same lines would be a poor one indeed! One of the
greatest advantages of hardware logic over software is that the former has concurrent
execution while the latter is limited to a step-by-step execution. However, this advantage
comes at a price: strict timing rules. Thus, it would be difficult for a C-like HDL code to be
implemented practically, especially if it involves I/O with

peripheral devices, even though the simulation would yield acceptable results.
2. The implementation of a combinational circuit with the least number of gates is often
inferior to an implementation which may have more gates overall but fewer cascaded gates. This
is simply because the latter implementation has a shorter critical path, and hence can support
operation at higher clock speeds.
3. Contrary to popular perception, it is, in general, better to design a logic circuit at the
behavioral level than at the structural or dataflow levels. However, it must be stated that,
when using the HDL synthesis tool Design Compiler (DC), one has no choice but to use structural
or dataflow design if one wishes to make an asynchronous circuit.
4. Digital logic design with simulation using VCS and synthesis using Design Compiler is a very
tricky business: the two tools are incompatible with each other, especially when working with
behavioral-level code.
5. The design is simulated and synthesized with 130 nm technology libraries. A data rate of
183 Mbps is achieved. All the sub-modules of the Ethernet switch and the top-level block work
at a frequency of 400 MHz. The chip area is 2246610.744 µm². The total power obtained after
physical design implementation is 58.143 mW and the IR drop is 0.063 mV.
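The trade-off between gate count and logic depth noted in the second lesson can be illustrated with a standard textbook example that is not part of the switch design: carry generation in a 4-bit adder. The Python sketch below compares a ripple carry chain with a flat carry look-ahead form. The gate and depth counts assume that every AND or OR gate, whatever its fan-in, counts as one gate and one level of delay, which is a simplification.

```python
def ripple_carries(g, p, c0):
    """Ripple form: c[i+1] = g[i] | (p[i] & c[i]); two cascaded gates per stage."""
    carries, c = [], c0
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
        carries.append(c)
    return carries


def lookahead_carries(g, p, c0):
    """Look-ahead form: each carry is one level of AND terms followed by one OR."""
    carries = []
    for i in range(len(g)):
        terms = []
        for j in range(i + 1):
            t = g[j]
            for k in range(j + 1, i + 1):
                t &= p[k]               # g[j] propagated through stages j+1..i
            terms.append(t)
        t = c0
        for k in range(i + 1):
            t &= p[k]                   # carry-in propagated through stages 0..i
        terms.append(t)
        carries.append(int(any(terms)))
    return carries


def ripple_cost(n):
    """(gate count, logic levels on the longest path) for the ripple form."""
    return 2 * n, 2 * n


def lookahead_cost(n):
    """(gate count, logic levels): carry i needs i + 1 AND gates and one OR gate."""
    return sum(i + 2 for i in range(n)), 2
```

Under these assumptions the 4-bit ripple form needs 8 gates but has 8 cascaded levels, while the look-ahead form needs 14 gates but only 2 levels, so the larger circuit has the shorter critical path.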
