
Top-Down Design of ALU

Supporting 16-Bit Addition & 8-Bit Multiplication


Alper UÇAR

Contents
1. Overview
2. VHDL Description
3. Functional Simulation with Modelsim
4. Logic Synthesis with LeonardoSpectrum
5. Schematic Generation
6. Simulation in Accusim
7. Automated Layout
8. Adding Padframes

1. Overview
This project covers the top-down design of an ALU that supports 16-bit addition/subtraction and
8-bit multiplication. The arithmetic operations are controlled by two selection inputs, s0 and s1.
Table 1.1 lists these operations.

s0   s1   function
0    0    multiplication
0    1    no operation
1    0    addition
1    1    subtraction

Table 1.1: Operation table

Figure 1.1: Interface of the ALU


Figure 1.2 illustrates the design flow. As a first step, VHDL was used to model the system;
about 300 lines of VHDL code were written. Functional simulation was carried out with
Modelsim.

Figure 1.2: Design flow for the project

Once the overall system architecture and partitioning were stable, LeonardoSpectrum synthesized
the logic. The EDIF netlist file created by LeonardoSpectrum was converted into an EDDM
database that is used by Design Architect (DA) and IC Station (IC).

Schematic Generator (SG) creates a schematic of the design for DA. A symbol of the design was
generated and used in Accusim to verify the generated schematic. After this verification, design
viewpoints were generated and used in IC to generate the layout. The last step was to add the
padframes with the help of the layout and the schematic symbol.

2. VHDL Description
2.1 Structural Description of the ALU
The very first step was to describe the ALU in the structural domain. In top-down design, the
designer starts from the root of the hierarchy. Hence, the ALU was partitioned into
subcomponents as shown in Figure 2.1.

Figure 2.1: Structural decomposition

The interface description of the ALU follows Figure 1.1.

library ieee;
use ieee.std_logic_1164.all;

entity alu is
  port (
    a, b   : in  std_logic_vector(15 downto 0);  --operands
    s0, s1 : in  std_logic;                      --function select
    f      : out std_logic_vector(15 downto 0);  --function output
    c      : out std_logic                       --carry output
  );
end alu;

A carry-out signal (c) is needed since the sum of two 16-bit values can be up to 17 bits long.
Inspecting Figure 2.1, one can see that there are 6 internal signals and 4 components to be
declared in the structural description. Each component declared in the structural description
also has its own interface and behavioral description.

architecture structural of alu is

  signal mult, add_sub : std_logic;  --selection inputs to mux
  signal nop           : std_logic;  --no operation
  signal add, sub      : std_logic;  --decoder outputs; ORed into add_sub for the mux
  signal sum, product  : std_logic_vector(15 downto 0);  --carry the 16-bit results
  --c is already a port of alu, so it is not declared again as an internal signal

  component decoder
    port (
      s0, s1 : in  std_logic;
      mult   : out std_logic;
      nop    : out std_logic;
      add    : out std_logic;
      sub    : out std_logic
    );
  end component;

  --port names match entity adder (add, sub) so that default binding works
  component adder
    port (
      a, b     : in  std_logic_vector(15 downto 0);
      add, sub : in  std_logic;
      sum      : out std_logic_vector(15 downto 0);
      c        : out std_logic
    );
  end component;

  component multiplier
    port (
      a, b    : in  std_logic_vector(15 downto 0);
      mult    : in  std_logic;
      product : out std_logic_vector(15 downto 0)
    );
  end component;

  component mux
    port (
      product, sum  : in  std_logic_vector(15 downto 0);
      add_sub, mult : in  std_logic;
      f             : out std_logic_vector(15 downto 0)
    );
  end component;

After the components are declared, the internal and external signals are mapped to the
appropriate component ports: the selection inputs to the decoder, the operands to the adder and
multiplier, the decoder and adder/multiplier outputs to the mux, and so on.

begin
  add_sub <= add or sub;  --the signal transmitted to the mux

  p0 : decoder port map
  (
    s0   => s0,
    s1   => s1,
    mult => mult,
    nop  => nop,
    add  => add,
    sub  => sub
  );

  p1 : adder port map
  (
    a   => a,
    b   => b,
    c   => c,
    add => add,
    sub => sub,
    sum => sum
  );

  p2 : multiplier port map
  (
    a       => a,
    b       => b,
    mult    => mult,
    product => product
  );

  p3 : mux port map
  (
    product => product,
    sum     => sum,
    add_sub => add_sub,
    mult    => mult,
    f       => f
  );

end structural;

2.2 Decoder Module


The decoder module takes the selection inputs and transmits the mult, add, and sub signals to
the adder, multiplier, and mux modules for operation select.

library ieee;
use ieee.std_logic_1164.all;

entity decoder is
  port (
    s0, s1 : in  std_logic;
    mult   : out std_logic;  -- case 00 multiplication
    nop    : out std_logic;  -- case 01 no operation
    add    : out std_logic;  -- case 10 addition
    sub    : out std_logic   -- case 11 subtraction
  );
end decoder;

architecture behavioral_decoder of decoder is
begin
  decoder: process(s0, s1)
  begin
    if (s0 = '0') and (s1 = '0') then
      mult <= '1'; nop <= '0'; add <= '0'; sub <= '0';
    elsif (s0 = '0') and (s1 = '1') then
      mult <= '0'; nop <= '1'; add <= '0'; sub <= '0';
    elsif (s0 = '1') and (s1 = '0') then
      mult <= '0'; nop <= '0'; add <= '1'; sub <= '0';
    elsif (s0 = '1') and (s1 = '1') then
      mult <= '0'; nop <= '0'; add <= '0'; sub <= '1';
    end if;
  end process decoder;
end behavioral_decoder;
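
The process above has no final else branch, so the outputs keep their previous values whenever s0 or s1 is not a plain '0' or '1', and a synthesis tool may read that as storage. As a sketch of one possible alternative (not the code used in this project), the same one-hot decoding can be written with a selected signal assignment that drives every output for every input value. The entity name decoder_alt and the internal signals sel and one_hot are assumptions introduced for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical alternative decoder; the name decoder_alt is an assumption.
entity decoder_alt is
  port (
    s0, s1 : in  std_logic;
    mult   : out std_logic;  -- case 00 multiplication
    nop    : out std_logic;  -- case 01 no operation
    add    : out std_logic;  -- case 10 addition
    sub    : out std_logic   -- case 11 subtraction
  );
end decoder_alt;

architecture dataflow of decoder_alt is
  signal sel     : std_logic_vector(1 downto 0);
  signal one_hot : std_logic_vector(3 downto 0);  -- (mult, nop, add, sub)
begin
  sel <= s0 & s1;

  -- Every value of sel, including metavalues, maps to a defined output
  with sel select
    one_hot <= "1000" when "00",
               "0100" when "01",
               "0010" when "10",
               "0001" when "11",
               "0000" when others;

  mult <= one_hot(3);
  nop  <= one_hot(2);
  add  <= one_hot(1);
  sub  <= one_hot(0);
end dataflow;
```

For the four legal input combinations this produces the same outputs as the process-based decoder.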

2.3 Adder Module

The adder module takes the operands a and b and the addition/subtraction selections add and sub
as inputs. To perform addition, each bit of a and b is XORed with a carry variable, which is
initially zero. The carry is propagated by AND-OR operations. For subtraction, 2's complement
is used to create a negative representation of the value to be subtracted, which is then added
to the operand from which it is being subtracted. Note that the carry out then acts as an
inverted borrow: if the carry is 0 after subtraction, a < b and the result is negative (held in
2's complement form); if the carry is 1, the result is non-negative.

library ieee;
use ieee.std_logic_1164.all;

entity adder is
  port (
    a, b     : in  std_logic_vector(15 downto 0);
    add, sub : in  std_logic;  --selection inputs
    sum      : out std_logic_vector(15 downto 0);
    c        : out std_logic   --carry
  );
end adder;

architecture behavioral_adder of adder is
begin
  adder: process (a, b, add, sub)
    variable carry   : std_logic;
    variable sum_reg : std_logic_vector(15 downto 0);
    variable tmp_reg : std_logic_vector(15 downto 0);
  begin
    if (add = '1') and (sub = '0') then  --perform addition
      carry := '0';
      for i in 0 to 15 loop
        sum_reg(i) := a(i) xor b(i) xor carry;
        carry := (a(i) and b(i)) or (a(i) and carry) or (b(i) and carry);
      end loop;
      sum <= sum_reg;
      c <= carry;

    elsif (add = '0') and (sub = '1') then  --perform subtraction
      tmp_reg := "1111111111111111";
      tmp_reg := b xor tmp_reg;  --take the 1's complement of b
      carry := '1';              --add 1 to complete the 2's complement
      for i in 0 to 15 loop
        sum_reg(i) := a(i) xor tmp_reg(i) xor carry;
        carry := (a(i) and tmp_reg(i)) or (a(i) and carry) or
                 (tmp_reg(i) and carry);
      end loop;
      sum <= sum_reg;
      c <= carry;  --c = 0 means a borrow occurred (a < b, negative result)
    end if;
  end process adder;

end behavioral_adder;
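
To make the meaning of the carry after subtraction concrete, the following self-checking sketch recomputes a + (not b) + 1 on 17 bits with ieee.numeric_std and checks both orderings of the operand pair used later in the simulations. It is only a reference check, not part of the ALU; the entity name sub_carry_check and the helper function sub17 are assumptions introduced for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical reference check; the name sub_carry_check is an assumption.
entity sub_carry_check is
end sub_carry_check;

architecture sim of sub_carry_check is
begin
  check : process
    -- a + (not b) + 1 on 17 bits, so that bit 16 is the carry out
    function sub17 (a, b : unsigned(15 downto 0)) return unsigned is
    begin
      return ('0' & a) + ('0' & (not b)) + 1;
    end function;

    variable r : unsigned(16 downto 0);
  begin
    -- a >= b: carry out is '1' and the low 16 bits hold a - b
    r := sub17(x"AC71", x"31D9");
    assert r(16) = '1' and r(15 downto 0) = x"7A98"
      report "unexpected result for a >= b" severity failure;

    -- a < b: carry out is '0' and the low 16 bits hold the
    -- 2's complement of b - a
    r := sub17(x"31D9", x"AC71");
    assert r(16) = '0' and r(15 downto 0) = x"8568"
      report "unexpected result for a < b" severity failure;

    report "carry checks passed" severity note;
    wait;
  end process;
end sim;
```

The first case is the a − b example of Section 3; the second swaps the operands to show the borrow case.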

2.4 Multiplier Module


A shift-and-add algorithm is used in the multiplier module. It is a pure multiplier that uses
only the logical operators of ieee.std_logic_1164. At each step, the multiplicand is added to
the partial product if the LSB of the product register is 1, and the register is then shifted
right by one bit. Since the assignment calls for an 8-bit multiplier, the most significant 8 bits
of the operands a and b are ignored in multiplication.

library ieee;
use ieee.std_logic_1164.all;

entity multiplier is
  port (
    a, b    : in  std_logic_vector(15 downto 0);
    mult    : in  std_logic;  --enable
    product : out std_logic_vector(15 downto 0)
  );
end multiplier;

architecture behavioral_multiplier of multiplier is
begin
  multiplier: process (a, b, mult)
    variable a_reg       : std_logic_vector(8 downto 0);   --multiplicand, zero-extended
    variable product_reg : std_logic_vector(17 downto 0);  --partial sum (17..9), multiplier (8..0)
    variable psum        : std_logic_vector(8 downto 0);
    variable carry_reg   : std_logic;
  begin
    if mult = '1' then
      a_reg := '0' & a(7 downto 0);
      product_reg := "0000000000" & b(7 downto 0);
      psum := "000000000";
      carry_reg := '0';

      for j in 1 to 9 loop  --this loop is for shifting
        if product_reg(0) = '1' then
          for i in 0 to 8 loop  --this loop is for addition
            psum(i) := product_reg(i+9) xor a_reg(i) xor carry_reg;
            carry_reg := (product_reg(i+9) and a_reg(i)) or
                         (product_reg(i+9) and carry_reg) or
                         (a_reg(i) and carry_reg);
          end loop;
          product_reg(17 downto 9) := psum(8 downto 0);
          carry_reg := '0';
        end if;

        product_reg(17 downto 0) := '0' & product_reg(17 downto 1);
      end loop;

      product <= product_reg(15 downto 0);
    end if;
  end process multiplier;
end behavioral_multiplier;
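
The register layout used above (an 18-bit product register with the multiplier in the low half and a 9-bit partial sum in the high half) can be cross-checked in simulation. The sketch below re-implements the same nine-step loop on unsigned values and compares it against the "*" operator of ieee.numeric_std for every pair of 8-bit operands. It is a reference check rather than the project code; the entity name shift_add_check and the function shift_add are assumptions introduced for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical cross-check; the name shift_add_check is an assumption.
entity shift_add_check is
end shift_add_check;

architecture sim of shift_add_check is
  -- Same scheme as the multiplier: add the multiplicand to the high half
  -- when the LSB is 1, then shift the whole register right by one.
  function shift_add (a, b : unsigned(7 downto 0)) return unsigned is
    variable p : unsigned(17 downto 0) := (others => '0');
  begin
    p(7 downto 0) := b;
    for j in 1 to 9 loop
      if p(0) = '1' then
        -- high half <= 255 and a <= 255, so the 9-bit sum cannot overflow
        p(17 downto 9) := p(17 downto 9) + ('0' & a);
      end if;
      p := '0' & p(17 downto 1);
    end loop;
    return p(15 downto 0);
  end function;
begin
  check : process
  begin
    -- Example used in the simulations: 0x71 * 0xD9 = 0x5FC9
    assert shift_add(x"71", x"D9") = x"5FC9"
      report "worked example failed" severity failure;

    -- Exhaustive comparison against the numeric_std "*" operator
    for i in 0 to 255 loop
      for k in 0 to 255 loop
        assert shift_add(to_unsigned(i, 8), to_unsigned(k, 8))
             = to_unsigned(i, 8) * to_unsigned(k, 8)
          report "mismatch against numeric_std" severity failure;
      end loop;
    end loop;

    report "shift-and-add matches numeric_std multiplication" severity note;
    wait;
  end process;
end sim;
```

The ninth iteration always sees the zero-extension bit of the multiplier, so it only performs the final shift that aligns the product with bits 15 downto 0.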

2.5 MUX Module


The mux module takes the signals sum and product, which hold the results of the adder and
multiplier modules, as well as add_sub and mult for selecting the data source. The mux output f
carries the result without the carry.

library ieee;
use ieee.std_logic_1164.all;

entity mux is
  port (
    product, sum  : in  std_logic_vector(15 downto 0);
    add_sub, mult : in  std_logic;
    f             : out std_logic_vector(15 downto 0)
  );
end mux;

architecture behavioral_mux of mux is
begin
  mux : process (sum, add_sub, mult, product)
  begin
    if (mult = '1') and (add_sub = '0') then  --transmit multiplication result
      f <= product;
    elsif (mult = '0') and (add_sub = '1') then  --transmit add/sub result
      f <= sum;
    else
      f <= "0000000000000000";
    end if;
  end process mux;
end behavioral_mux;

3. Functional Simulation with Modelsim
Functional simulation is performed by compiling the VHDL files with the Modelsim command
vcom, loading the design with vsim, and finally starting the simulation with the run command.
The waveform window shows how the outputs respond to each applied stimulus. The design was
tested with various input values to verify its operation.

As an example, if operand a = 1010 1100 0111 0001 and operand b = 0011 0001 1101 1001,
then a + b = 1101 1110 0100 1010 and a − b = 0111 1010 1001 1000. For multiplication, only the
least significant 8 bits are considered; hence 0111 0001 × 1101 1001 equals
0101 1111 1100 1001. Figures 3.1, 3.2, 3.3, and 3.4 illustrate these results. The output signals
f and c and the internal signals can be observed in the wave window.
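
The four cases can also be checked automatically instead of by reading the waveforms. The testbench below is a minimal sketch of that idea, assuming the alu entity of Section 2 has been compiled into library work; the testbench name alu_tb and the 10 ns settling time between cases are assumptions, not settings taken from the project.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench; the name alu_tb and the 10 ns delay are assumptions.
entity alu_tb is
end alu_tb;

architecture sim of alu_tb is
  signal a  : std_logic_vector(15 downto 0) := x"AC71";  -- 1010 1100 0111 0001
  signal b  : std_logic_vector(15 downto 0) := x"31D9";  -- 0011 0001 1101 1001
  signal s0 : std_logic := '0';
  signal s1 : std_logic := '0';
  signal f  : std_logic_vector(15 downto 0);
  signal c  : std_logic;
begin
  -- Assumes entity alu (Section 2) is available in library work
  dut : entity work.alu
    port map (a => a, b => b, s0 => s0, s1 => s1, f => f, c => c);

  stimulus : process
  begin
    s0 <= '0'; s1 <= '0'; wait for 10 ns;  -- case 00: multiplication
    assert f = x"5FC9" report "multiplication failed" severity error;

    s0 <= '0'; s1 <= '1'; wait for 10 ns;  -- case 01: no operation
    assert f = x"0000" report "no operation failed" severity error;

    s0 <= '1'; s1 <= '0'; wait for 10 ns;  -- case 10: addition
    assert f = x"DE4A" and c = '0' report "addition failed" severity error;

    s0 <= '1'; s1 <= '1'; wait for 10 ns;  -- case 11: subtraction
    assert f = x"7A98" and c = '1' report "subtraction failed" severity error;

    report "all four cases exercised" severity note;
    wait;
  end process;
end sim;
```

The expected values are the hexadecimal forms of the binary results quoted above. The carry c is checked only for addition and subtraction, because the adder does not drive a new value in the other two cases.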

Figure 3.1: Multiplication (case 00)

Figure 3.2: No operation (case 01)

Figure 3.3: Addition (case 10)

Figure 3.4: Subtraction (case 11)

4. Logic Synthesis with LeonardoSpectrum
VHDL synthesis with LeonardoSpectrum produces registered and combinational logic at the
RTL level. An RTL description specifies a circuit in terms of operations and the transfer of data
between registers. The RTL schematic for the design is shown in Figure 4.1. Notice that the
RTL schematic matches the assigned figure¹.

Figure 4.1: RTL schematic of the ALU

Figure 4.2 shows the RTL schematic with its hierarchy.

Figure 4.2: RTL schematic with hierarchy

Leonardo takes a design in RTL and produces an optimized gate-level netlist targeting minimum
delay and area. The assignment was to synthesize the design with ami05μ (typ) at 200 MHz.
However, the delay report indicates that the critical path is 13.33 ns, far longer than the 5 ns
period that 200 MHz requires, so I decided to synthesize at 75 MHz (a 13.33 ns period). Another
issue is that synthesis with ami05μ (typ) takes too much time and is cumbersome in practice.
After waiting 4886 CPU seconds for the optimization, I switched to ami05μ (fast) and
synthesized the design with area optimization. The critical path and delay reports are as
follows:

¹ http://www.ee.hacettepe.edu.tr/~alkar/ELE711/index_files/image001.gif

Figure 4.3: Critical path

Critical path #1, (path slack = 0.1):

NAME GATE ARRIVAL LOAD


------------------------------------------------------------------------------
a(0)/ 0.00 0.00 dn 0.21
p2/ix832/Y inv02 0.13 0.13 up 0.06
p2/ix79/Y nor04 0.37 0.49 dn 0.01
p2/ix81/Y ao21 0.44 0.93 dn 0.05
p2/ix154/Y aoi32 0.31 1.25 up 0.06
p2/ix137/Y inv02 0.13 1.38 dn 0.01
p2/ix214/Y aoi32 0.28 1.66 up 0.06
p2/ix213/Y inv02 0.13 1.80 dn 0.01
p2/ix272/Y aoi32 0.28 2.08 up 0.06
p2/ix313/Y xnor2 0.35 2.43 dn 0.01
p2/ix317/Y ao21 0.68 3.11 dn 0.07
p2/ix292/Y xnor2 0.25 3.36 up 0.06
p2/ix331/Y inv02 0.12 3.48 dn 0.01
p2/ix386/Y aoi22 0.30 3.78 up 0.05
p2/ix571/Y oai22 0.40 4.18 dn 0.05
p2/ix564/Y aoi22 0.34 4.53 up 0.05
p2/ix624/Y xnor2 0.34 4.86 dn 0.02
p2/ix697/Y inv02 0.12 4.98 up 0.02
p2/ix558/Y mux21 0.37 5.35 dn 0.07
p2/ix715/Y xnor2 0.26 5.62 up 0.06
p2/ix717/Y xnor2 0.23 5.85 dn 0.01
p2/ix721/Y ao21 0.71 6.56 dn 0.08
p2/ix574/Y xnor2 0.25 6.80 up 0.05
p2/ix849/Y oai22 0.32 7.12 dn 0.05
p2/ix700/Y aoi22 0.34 7.47 up 0.05
p2/ix750/Y xnor2 0.34 7.80 dn 0.02
p2/ix949/Y inv02 0.12 7.92 up 0.02
p2/ix690/Y mux21 0.37 8.30 dn 0.07
p2/ix967/Y xnor2 0.26 8.56 up 0.06
p2/ix969/Y xnor2 0.23 8.79 dn 0.01
p2/ix973/Y ao21 0.71 9.50 dn 0.08
p2/ix714/Y xnor2 0.25 9.75 up 0.06
p2/ix989/Y xor2 0.53 10.28 dn 0.02
p2/ix680/Y mux21 0.36 10.64 up 0.07
p2/ix1007/Y xnor2 0.39 11.03 dn 0.06
p2/ix758/Y inv02 0.13 11.15 up 0.02
p2/ix1081/Y oai22 0.28 11.44 dn 0.05
p2/ix766/Y aoi22 0.34 11.78 up 0.05
p2/ix1173/Y oai22 0.40 12.18 dn 0.05
p2/ix822/Y aoi22 0.22 12.41 up 0.01
p2/ix1197/Y nor02 0.24 12.65 dn 0.02
p2/ix1199/Q latch 0.36 13.01 dn 0.01
ix101/Y ao22 0.25 13.26 dn 0.00
f(15)/ 0.00 13.26 dn 0.00
data arrival time 13.26

data required time (default specified) 13.33


------------------------------------------------------------------------------
data required time 13.33
data arrival time 13.26
----------
slack 0.07
------------------------------------------------------------------------------

*******************************************************

Cell: alu View: structural Library: work

*******************************************************

Cell Library References Total Area

adder work 1 x 251 251 gates


and02 ami05_fast 1 x 1 1 gates
ao22 ami05_fast 16 x 2 31 gates
fake_gnd ami05_fast 1 x 1 1 fake_gnd
inv02 ami05_fast 6 x 1 5 gates
multiplier work 1 x 496 496 gates
nor02 ami05_fast 2 x 1 2 gates

Number of ports : 51
Number of nets : 93
Number of instances : 28
Number of references to this view : 0

Total accumulated area :


Number of fake_gnd : 1
Number of gates : 785
Number of accumulated instances : 534

Figure 4.4 gives an overall picture of the technology-dependent schematic.

Figure 4.4: Overall picture for the design synthesized in ami05 fast technology

5. Schematic Generation
The EDIF netlist was converted into an EDDM database with the enread script. Schematic
Generator creates a visual schematic from the EDDM file. This schematic spans 8 sheets and is
impractical to reproduce here. After the conversion, Design Architect was used to generate a
symbol for the schematic.

6. Simulation in Accusim
The adk_dve script generates the design viewpoints that are used in Accusim. Since the I/O
signals a, b, and f are 16 bits wide, the wiring must be realized using busses. Each bus should
be named after its size: for an n-bit bus, the net name should have the form a(n-1:0), e.g.
a(15:0) for a 16-bit bus. After the busses were connected, port in/port out symbols were
attached to them and each I/O bit was named manually. Figure 6.1 illustrates this process.

Figure 6.1: Naming bus bits

Port a was excited with 1010 1100 0111 0001 (5 V for 1) and port b with 0011 0001 1101 1001.
If the schematic is correct, we should observe 1101 1110 0100 1010 for a + b and
0111 1010 1001 1000 for a − b. For multiplication, only the least significant 8 bits are
considered; hence 0111 0001 × 1101 1001 = 0101 1111 1100 1001.

Figure 6.2: Excitation of signals (case 11 - subtraction)

Figures 6.3, 6.4, 6.5, and 6.6 validate that the schematic of the ALU is correct.

Figure 6.3: Multiplication

Figure 6.4: No operation

Figure 6.5: Addition

Figure 6.6: Subtraction

7. Automated Layout
A new cell is created with a link to the generated EDDM schematic viewpoint (layout version).
After the insertion of the floorplan and standard cells, the layout looks like Figure 7.1. Cell
locations are determined by their interconnectivity: cells that share connections are placed
near one another.

Figure 7.1: Layout before routing metal layers.

Yellow lines indicate overflows, i.e. connections that still have to be routed. This structure
has no METAL1 or METAL2 layers yet. The Autoroute command inserts METAL1 and METAL2
layers to route all connections in the cell. Figure 7.2 shows the finalized layout. Zooming in
reveals that the layout consists only of standard cells (XOR, XNOR, AND, OR, latch, etc.) and
metal layers. This is expected for a semi-custom ASIC design process, since we are using the
cell libraries of ami05.

Figure 7.2: Layout of ALU

Figure 7.3: A detail view of the layout in which standard cells can be noticed

8. Adding Padframes
Adding padframes to the cell requires the symbol of the schematic from which the cell was
extracted. This symbol should be inserted into a new schematic. The core logic needs the
phy_comp property attached to its symbol, so that the symbol can be identified with its
corresponding physical layout. The value of phy_comp must be the name of the IC cell layout
created. After this operation, pads were chosen from the ADK Library (ami05) and inserted
into the schematic as in Figure 8.1. The ALU has two 16-bit and two 1-bit inputs, and one
16-bit and one 1-bit output, so 2×16 + 2 + 16 + 1 = 51 pads are needed for I/O, plus 2 pads for
VDD/GND.

Figure 8.1: Inserting pads in DA (symbol ALU is in the middle)

Pads should be connected to the symbol using busses. Each bus should have a net name of the
form a(n-1:0), as mentioned in Section 6. This process is depicted in Figures 8.2 and 8.3.

Figure 8.2: Connecting pads to the symbol with busses

Figure 8.3: Connecting pads to the symbol with busses

At this stage the ports should be allocated to the pins of the padframe. The ‘PINXX’ property
of each I/O pad must be changed so that ‘XX’ is replaced with the pin number of the
padframe. Once this task is over, the schematic looks as shown in the figure below.

Figure 8.3: Overall image of the schematic after wiring is over

The sheet should be saved with a new name, and the adk_dve script then generates new design
viewpoints.
Creation of the layout of the complete chip follows the Schematic Driven Layout (SDL)
methodology. IC Station was invoked to create a new cell. The cell name should be the same
as that of the new schematic. When creating the cell, the connectivity option should point to the
layout viewpoint. From Setup -> SDL, the search path should be modified so that the layout of
the ALU is included. From the ADK Edit palette menu, the logic source window was opened,
and from the DLA Logic palette, Inst was clicked to place the cell. Now the layout looks like
Figure 8.4.

Figure 8.4: SDL of the cell with padframes

ADK -> Generate Padframe -> AMI 0.5 generates the padframe. The 0.5 micron tinychip has 40
pads, while 53 pads (51 for I/O and 2 for VDD/GND) are needed for the design, so the ALU does
not fit into the generated padframe. This inconsistency is illustrated in Figure 8.5.

Figure 8.5: The ALU does not fit into the generated padframe since it needs more pads than are available

To verify that the steps I had followed were correct, I decided to reduce the number of pads by
using both inputs on each pad as input ports (one pad for two bits). Of course, this is not
correct, but I just wanted to confirm that the only problem is the insufficient number of pads.
With the pad count reduced to 37, the ALU fits into the padframe, and Autoroute -> All places
the pads as depicted in Figure 8.6.

Figure 8.6
