Integer Add and Mult

ECE 366 Computer Architecture
Instructor: Shantanu Dutt Department of Electrical and Computer Engineering University of Illinois at Chicago
Lecture Notes # 10 COMPUTER ARITHMETIC: Integer Addition and Multiplication
© Shantanu Dutt, UIC
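ADDERS

Full Adder (FA):

[Figure: full adder FA_i with inputs X_i, Y_i, C_i and outputs S_i, C_{i+1}]

Inputs are $x_i$, $y_i$ and the carry-in $c_i$. Outputs are
$s_i = x_i \oplus y_i \oplus c_i$ and $c_{i+1} = x_i y_i + c_i (x_i \oplus y_i)$,
where $\oplus$ denotes exclusive-OR.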
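The two FA equations can be checked exhaustively in a few lines. This is a minimal Python sketch; the name `full_adder` is illustrative, and the check simply confirms that the outputs encode $x_i + y_i + c_i$ as the two-bit number $c_{i+1} s_i$.

```python
def full_adder(x: int, y: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum bit s_i, carry out c_{i+1})."""
    s = x ^ y ^ c                      # s_i = x_i XOR y_i XOR c_i
    c_out = (x & y) | (c & (x ^ y))    # c_{i+1} = x_i y_i + c_i (x_i XOR y_i)
    return s, c_out


# Exhaustive check of the truth table: x + y + c must equal 2*c_out + s.
for x in (0, 1):
    for y in (0, 1):
        for c in (0, 1):
            s, c_out = full_adder(x, y, c)
            assert x + y + c == 2 * c_out + s
```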

Ripple-Carry Adder (RCA)

[Figure: 8-bit ripple-carry adder; full adders FA0 to FA7 add x_i and y_i with carry-in c_i, producing sum bit S_i and passing carry c_{i+1} to the next stage; the carry out of FA7 is cout]
Problem: The delay of an n-bit RCA is 2n gate delays, since the carry must ripple through all n FAs and each FA has a 2-gate delay. Thus if each gate has a delay of 2ns, the delay of a 32-bit RCA is 64 gate delays = 128ns.
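The ripple behaviour and the 2n gate-delay count can be sketched in Python. The function `ripple_carry_add` is a hypothetical behavioural model, not a circuit simulator, and its delay figure simply assumes 2 gate delays per FA on the carry path, as stated above.

```python
def full_adder(x: int, y: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (s_i, c_{i+1})."""
    return x ^ y ^ c, (x & y) | (c & (x ^ y))


def ripple_carry_add(x: int, y: int, n: int, c0: int = 0) -> tuple[int, int, int]:
    """Add two n-bit unsigned integers with a chain of n full adders.

    Returns (n-bit sum, carry out c_n, delay in gate delays), where the
    delay model assumes 2 gate delays per FA on the carry path.
    """
    s, c = 0, c0
    for i in range(n):
        # Stage i cannot finish before the carry c_i from stage i-1 arrives.
        s_i, c = full_adder((x >> i) & 1, (y >> i) & 1, c)
        s |= s_i << i
    return s, c, 2 * n


print(ripple_carry_add(0b10110101, 0b01101110, 8))  # (35, 1, 16): 181 + 110 = 291
```

With 2ns gates, `ripple_carry_add(x, y, 32)` reports 64 gate delays, i.e., 128ns.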

Overflow in Addition

Overflow occurs when the result of the operation does not fit in the representation being used. For example, if the 4-bit unsigned numbers 6 = 0110 and 12 = 1100 are added, the sum (18) overflows, since its binary equivalent (10010) does not fit in 4 bits. Overflow is detected for unsigned addition when the carry out of the final Full Adder is 1. For the 2's complement representation of signed numbers, overflow occurs when the carry into the MSB (most significant bit), which is also the sign bit, is different from the carry out of that bit. When overflow occurs, the carry out of the MSB is the sign bit of the correct sum, i.e., of the (n+1)-bit sum obtained by sign-extending both operands.
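Both overflow rules can be computed from the adder's carries alone. In this sketch, `add_with_flags` (a hypothetical name) derives the carry into the MSB by adding only the n-1 low-order bits, and the carry out of the MSB from the full n-bit addition.

```python
def add_with_flags(x: int, y: int, n: int) -> tuple[int, bool, bool]:
    """Add two n-bit patterns; report overflow for unsigned and 2's complement.

    Returns (n-bit sum, unsigned_overflow, twos_complement_overflow).
    """
    mask = (1 << n) - 1
    low = mask >> 1                                  # the n-1 low-order bits
    c_into_msb = ((x & low) + (y & low)) >> (n - 1)  # carry into the sign bit
    total = (x & mask) + (y & mask)
    c_out = total >> n                               # carry out of the sign bit
    unsigned_ovf = c_out == 1
    signed_ovf = c_into_msb != c_out
    return total & mask, unsigned_ovf, signed_ovf


print(add_with_flags(6, 12, 4))  # (2, True, False): 6 + 12 = 18 does not fit in 4 bits
```

Read as 2's complement, the same patterns are 6 + (-4) = 2, which is why only the unsigned flag is set.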
ADDERS (Contd.)

Speeding up addition. A faster adder: the traditional carry-lookahead adder (CLA). Define two extra functions, $G_i$ and $P_i$:

[Figure: modified full adder FA_i with inputs X_i, Y_i and C_i (from the carry-generation logic), producing S_i and sending P_i, G_i to the carry-generation logic]

$G_i$ is the generate bit, which is 1 only if a carry out is to be generated irrespective of the input carry. This obviously is the case only when $x_i = y_i = 1$, so $G_i = x_i y_i$.

$P_i$ is the propagate bit, which is 1 only if the output carry is to be the same as the input carry, i.e., $c_{i+1} = c_i$. This will be the case only when either $x_i$ or $y_i$ (but not both) is 1, so $P_i = x_i \oplus y_i$.

Thus the carry out of the ith stage can be expressed as:
$c_{i+1} = G_i + P_i c_i$
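A small sketch of the generate/propagate formulation, assuming the definitions $G_i = x_i y_i$ and $P_i = x_i \oplus y_i$. The sum bit uses $s_i = P_i \oplus c_i$, which is just the FA sum equation rewritten; the function names are illustrative.

```python
def generate_propagate(x: int, y: int, n: int) -> tuple[list[int], list[int]]:
    """Per-bit generate G_i = x_i y_i and propagate P_i = x_i XOR y_i."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(n)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]
    return g, p


def carries(g: list[int], p: list[int], c0: int) -> list[int]:
    """Carries c_0..c_n from the recurrence c_{i+1} = G_i + P_i c_i."""
    c = [c0]
    for gi, pi in zip(g, p):
        c.append(gi | (pi & c[-1]))
    return c


def add_gp(x: int, y: int, n: int, c0: int = 0) -> int:
    """Add using s_i = P_i XOR c_i; returns the (n+1)-bit result including c_n."""
    g, p = generate_propagate(x, y, n)
    c = carries(g, p, c0)
    s = sum((p[i] ^ c[i]) << i for i in range(n))
    return s | (c[n] << n)


print(add_gp(0b1011, 0b0110, 4))  # 17
```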
Traditional CLA Adder (Contd.)

Consider a 4-bit adder. Note that $G_i$ and $P_i$ can be generated in constant time (specifically, 1 gate delay) by each modified full adder (MFA). Expanding $c_{i+1} = G_i + P_i c_i$:

$c_1 = G_0 + P_0 c_0$
$c_2 = G_1 + P_1 G_0 + P_1 P_0 c_0$
$c_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 c_0$
$c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0$

The $c_i$s are generated by a carry-generation unit as shown below.

[Figure: 4-bit CLA; FA0 to FA3 take X_i, Y_i and C_i and produce S_i, sending P_i and G_i to the CARRY GENERATION LOGIC, which returns C_1 to C_3 and produces C_4]
In this CLA the $c_i$s use 2-level logic and thus have a delay of 3 gate delays (1 for each $P_i$, $G_i$, and 2 for the $c_i$s), as opposed to 2 × 4 = 8 gate delays for a 4-bit RCA.
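The flat two-level carry expressions for a 4-bit CLA can be written out directly and compared with true carries. In this sketch, `cla4_carries` (a hypothetical name) computes every carry from the $G_i$, $P_i$ and $c_0$ alone, never from a lower carry, which is what allows all four carries to appear after the same 2 gate delays.

```python
def cla4_carries(g: list[int], p: list[int], c0: int) -> list[int]:
    """Carries c1..c4 of a 4-bit CLA as flat sum-of-products expressions.

    Each carry depends only on the G_i, P_i and c0, never on a lower carry.
    """
    g0, g1, g2, g3 = g
    p0, p1, p2, p3 = p
    c1 = g0 | (p0 & c0)
    c2 = g1 | (p1 & g0) | (p1 & p0 & c0)
    c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & c0)
    c4 = (g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
          | (p3 & p2 & p1 & p0 & c0))
    return [c1, c2, c3, c4]


print(cla4_carries([1, 0, 0, 0], [0, 1, 1, 0], 0))  # [1, 1, 1, 0]
```

The largest term, the last product in $c_4$, is the 5-input AND feeding a 5-input OR that the note on gate fan-in below refers to.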
Disadvantage: For a 16-bit adder, for example, we cannot go on generating the $c_i$s in this manner, since the hardware becomes excessive and messy. A 16-bit adder can instead be partitioned into four 4-bit CLA groups, with the inter-group carries rippling through the four groups:

[Figure: 16-bit group CLA built from four 4-bit CLA blocks (bits 3-0, 7-4, 11-8, 15-12); C_0 enters the lowest block, the carries C_4, C_8 and C_12 ripple from block to block, and C_16 is the carry out]
Such a 16-bit adder has a delay of 12 gate delays (= 24ns for a gate delay of 2ns), as opposed to a delay of 32 gate delays (= 64ns) for a 16-bit RCA. In general, for an n-bit group-CLA adder with 4-bit CLA cells, the delay is 3n/4 gate delays, as opposed to 2n gate delays for the ripple-carry adder.
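A behavioural sketch of the group CLA: 4-bit cells whose group carries ripple from one cell to the next. The names `cla4_block` and `group_cla_add` are hypothetical, and the delay returned is the notes' count of 3 gate delays per cell (3n/4 in total), not a timing simulation.

```python
def cla4_block(x4: int, y4: int, c_in: int) -> tuple[int, int]:
    """One 4-bit CLA cell: returns (4 sum bits, carry out)."""
    g = [(x4 >> i) & (y4 >> i) & 1 for i in range(4)]
    p = [((x4 >> i) ^ (y4 >> i)) & 1 for i in range(4)]
    c = [c_in]
    for i in range(4):
        # Same values as the flat lookahead equations; the hardware simply
        # evaluates them in parallel instead of one after another.
        c.append(g[i] | (p[i] & c[i]))
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return s, c[4]


def group_cla_add(x: int, y: int, n: int, c0: int = 0) -> tuple[int, int, int]:
    """n-bit adder built from n/4 CLA cells whose group carries ripple.

    Returns (n-bit sum, carry out, delay), counting 3 gate delays per cell.
    """
    if n % 4:
        raise ValueError("n must be a multiple of 4")
    s, c = 0, c0
    for k in range(0, n, 4):
        # Cell k/4 needs the carry produced by the cell below it.
        s4, c = cla4_block((x >> k) & 0xF, (y >> k) & 0xF, c)
        s |= s4 << k
    return s, c, 3 * (n // 4)


print(group_cla_add(12345, 54321, 16))  # (1130, 1, 12)
```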

A note of caution: k-input gates have greater delays than 2-input gates for k > 2. We had assumed that for a 4-bit CLA the 5-input OR and AND gates have the same delay as a 2-input gate. The delay of a 5-input gate will be greater because:
[Figure: transistor-level circuits. (a) 2-i/p OR gate: delay = 2t_s + 3RC. (b) 5-i/p OR gate: delay = 2t_s + 6RC. (c) 5-i/p OR using 2-i/p gates: delay = 4t_s + 7RC]
Let $t_s$ be the max. of the switching-on and switching-off times of a MOS transistor, i.e., the max. of the time for charge to collect in the channel and the time for the charge to be removed from the channel. Let $R$ be the resistance of a channel, and $C$ the input gate capacitance of a transistor. Then, a 2-i/p AND/OR gate has a delay of
$2t_s + 3RC$,
a 5-i/p gate has a delay of
$2t_s + 6RC$,
while a 5-i/p AND/OR ckt. formed of cascaded 2-i/p gates will have a delay of
$4t_s + 7RC$.
Thus, while a 5-i/p AND/OR gate will be slower than a 2-i/p AND/OR gate, it will be faster than a cascaded implementation of a 5-i/p AND/OR ckt. The latter is primarily because more transistors switch on in parallel in a 5-i/p AND/OR gate than in the cascaded 5-i/p AND/OR ckt.
In a 4-bit CLA adder, we are essentially replacing a series of 2-i/p cascaded (multi-level) logic by multiple-input 2-level logic. Note that the delay of an n-bit group-CLA adder with 4-bit cells will therefore be somewhat more than 3n/4 2-i/p gate delays.
Faster adders
Can we do better than a delay that is linear in the number of bits n? Yes, by using a carry-select adder ($O(\sqrt{n})$ time), OR by using a parallel prefix circuit to generate all carries in $O(\log n)$ time.

Carry-Select adder:
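[Figure: carry-select adder; the operand bits x_{n-1}, y_{n-1} down to x_0, y_0 are split into groups; each higher group is added twice, once with carry-in 0 and once with carry-in 1, and a Mux driven by the previous group's carry out selects the correct sum bits and carry out, producing s_{n-1} ... s_0]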
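If the group size is k, the time taken is about 2k + 2(n/k) (2-i/p) gate delays: 2k for the group adders, which all work in parallel, plus about 2 per multiplexer stage as the carry selects its way through the n/k groups. For $k = \sqrt{n}$, the time minimizes to about $4\sqrt{n}$ (2-i/p) gate delays.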
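A behavioural sketch of the carry-select scheme and of its delay model. The group adders are modelled with plain integer addition (`group_add`, a hypothetical helper), and `carry_select_delay` assumes 2k gate delays for a k-bit ripple group plus 2 per multiplexer stage; both are modelling assumptions.

```python
def group_add(x: int, y: int, k: int, c: int) -> tuple[int, int]:
    """k-bit group adder (modelled with integer addition): (sum, carry out)."""
    total = x + y + c
    return total & ((1 << k) - 1), total >> k


def carry_select_add(x: int, y: int, n: int, k: int, c0: int = 0) -> tuple[int, int]:
    """n-bit carry-select adder with k-bit groups; returns (sum, carry out)."""
    if n % k:
        raise ValueError("n must be a multiple of k")
    mask = (1 << k) - 1
    s, c = 0, c0
    for j in range(0, n, k):
        xj, yj = (x >> j) & mask, (y >> j) & mask
        # In hardware both versions are computed in parallel, before c is known ...
        s0, c_if0 = group_add(xj, yj, k, 0)
        s1, c_if1 = group_add(xj, yj, k, 1)
        # ... and the incoming group carry only drives the multiplexers.
        sj, c = (s1, c_if1) if c else (s0, c_if0)
        s |= sj << j
    return s, c


def carry_select_delay(n: int, k: int) -> int:
    """Approximate delay in 2-i/p gate delays: 2k for the (ripple) group
    adders plus 2 per multiplexer stage across the n/k groups."""
    return 2 * k + 2 * (n // k)


divisors = [k for k in range(1, 65) if 64 % k == 0]
print(min(divisors, key=lambda k: carry_select_delay(64, k)))  # 8 = sqrt(64)
```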
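The parallel-prefix CLA adder

For the ith full adder, define the symbols $q_i \in \{k, p, g\}$:
k: kill the incoming carry (when $x_i = y_i = 0$)
p: propagate the incoming carry (when $x_i \ne y_i$)
g: generate a carry (when $x_i = y_i = 1$)
We can encode these symbols as pairs of bits, k = 00, p = 10, g = 01, and call this pair of bits $q_i$. Each FA can produce $q_i$ in constant time. As a matter of fact, $q_i = P_i G_i$, where $G_i$ and $P_i$ are the generate and propagate bits of an FA discussed earlier. Define the operator $\circ$ as:
$a \circ b = k$ if $a = k$; $a \circ b = g$ if $a = g$; $a \circ b = b$ if $a = p$,
i.e., in terms of the bit pairs, $(P_a, G_a) \circ (P_b, G_b) = (P_a P_b,\; G_a + P_a G_b)$, where $a$ is the more significant operand.
Note that $\circ$ is an associative operator, i.e., $(a \circ b) \circ c = a \circ (b \circ c)$. Also, $\circ$ is not a commutative operator, i.e., in general $a \circ b \ne b \circ a$ (e.g., $g \circ k = g$ but $k \circ g = k$).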
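The $\circ$ operator on (P, G) bit pairs, with an exhaustive check of associativity and a counterexample to commutativity. This sketch assumes the encoding k = (0, 0), p = (1, 0), g = (0, 1); the names `circ`, `K`, `P`, `G` are illustrative.

```python
from itertools import product

# Carry-status symbols encoded as (P, G) bit pairs:
# k (kill) = (0, 0), p (propagate) = (1, 0), g (generate) = (0, 1).
K, P, G = (0, 0), (1, 0), (0, 1)


def circ(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """a o b, where a is the more significant (left) operand.

    If a kills or generates, the result is a; if a propagates, it is b.
    In bit form: (P_a P_b, G_a + P_a G_b).
    """
    pa, ga = a
    pb, gb = b
    return pa & pb, ga | (pa & gb)


symbols = (K, P, G)
# Associative: (a o b) o c == a o (b o c) for every choice of symbols.
assert all(circ(circ(a, b), c) == circ(a, circ(b, c))
           for a, b, c in product(symbols, repeat=3))
# Not commutative: g o k = g but k o g = k.
assert circ(G, K) == G and circ(K, G) == K
```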
The parallel-prefix CLA adder (contd.)

Define $r_i$ for the ith stage as
$r_i = q_i \circ q_{i-1} \circ \cdots \circ q_0$,
i.e., $r_i$ is the ith prefix of the associative computation $q_{n-1} \circ q_{n-2} \circ \cdots \circ q_0$.

If we can compute each $r_i$ quickly, then we can obtain the carry-in $c_{i+1}$ of the (i+1)th FA as follows: if $r_i = k$ then $c_{i+1} = 0$, else if $r_i = g$ then $c_{i+1} = 1$, else if $r_i = p$ then $c_{i+1} = c_0$. The $c_i$s can thus be computed in constant time after the $r_i$s are available.
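The rule for turning prefixes into carries, sketched with the prefixes computed one after another (n-1 sequential $\circ$ steps); the point of the prefix tree is to obtain the same $r_i$ in $O(\log n)$ levels. The (P, G) encoding and the function names are assumptions of this sketch.

```python
def circ(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """a o b on (P, G) pairs; a is the more significant operand."""
    return a[0] & b[0], a[1] | (a[0] & b[1])


def prefix_carries(x: int, y: int, n: int, c0: int) -> list[int]:
    """Carries c_1..c_n from the prefixes r_i = q_i o q_{i-1} o ... o q_0."""
    q = [(((x >> i) ^ (y >> i)) & 1, (x >> i) & (y >> i) & 1) for i in range(n)]
    carries = []
    r = q[0]                                   # r_0 = q_0
    for i in range(n):
        if i > 0:
            r = circ(q[i], r)                  # r_i = q_i o r_{i-1}
        p_i, g_i = r
        if g_i:                                # r_i = g: carry 1 into stage i+1
            carries.append(1)
        elif p_i:                              # r_i = p: c_{i+1} = c_0
            carries.append(c0)
        else:                                  # r_i = k: carry killed
            carries.append(0)
    return carries


print(prefix_carries(0b0111, 0b0001, 4, 0))  # [1, 1, 1, 0]
```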
The parallel-prefix CLA adder (contd.)

Computing the $r_i$s quickly: define $r_{i,j}$ (for $i \ge j$) as
$r_{i,j} = q_i \circ q_{i-1} \circ \cdots \circ q_j$.
Thus, $r_i = r_{i,0}$ and $r_{i,i} = q_i$. Since $\circ$ is associative, we have
$r_{i,j} = r_{i,k} \circ r_{k-1,j}$ for $i \ge k > j$.
The parallel-prefix CLA adder (contd.)

We can use the above property to form 1-level $r_{i,j}$s by combining 2 adjacent $q$s, then 2-level $r_{i,j}$s by combining two adjacent 1-level $r_{i,j}$s, etc. This yields a tree-structured circuit with a $\circ$ logic at every node; this ckt. gives us only those $r_i$s for which $i = 2^m - 1$ for some $m \ge 0$.

[Figure: prefix tree for n = 8; leaves q_7 ... q_0; first level r_{7,6}, r_{5,4}, r_{3,2}, r_{1,0}; second level r_{7,4}, r_{3,0}; root r_{7,0}]
The parallel-prefix CLA adder (contd.)

To obtain all $r_i$s, the tree has to be augmented as shown below.

[Figure: augmented prefix tree for n = 8; besides the upward tree (r_{1,0}, r_{3,2}, r_{5,4}, r_{7,6}, r_{3,0}, r_{7,4}, r_{7,0}), extra $\circ$ nodes on the way down produce r_{5,0}, r_{6,0}, r_{4,0} and r_{2,0}; legend: each node combines its two inputs with $\circ$]

This circuit is called a parallel prefix circuit, and can be used to obtain the prefixes of any associative operation (like AND, OR, addition, multiplication, etc.). The delay is $O(\log n)$ $\circ$-logic steps: about $\log_2 n$ steps to go up the tree and about $\log_2 n$ steps to come down. Extra hardware used: $O(n)$ $\circ$-logic units. VLSI area reqd.: the height of the tree is $O(\log n)$ and its width is $O(n)$.
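An up-and-down prefix tree in the spirit of the augmented tree, sketched in Python for n a power of 2. The index pattern used here is the classic Brent-Kung arrangement, which is an assumption about how the figure is wired; within a level each node reads only values produced by earlier levels, so the returned level count models the $\circ$ delay (2 log2 n - 1 levels for n ≥ 2).

```python
def circ(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """a o b on (P, G) pairs; a is the more significant operand."""
    return a[0] & b[0], a[1] | (a[0] & b[1])


def parallel_prefix(q: list) -> tuple[list, int]:
    """All prefixes r_{i,0} = q_i o ... o q_0 with an up-and-down tree.

    n = len(q) must be a power of 2. Returns (prefixes, number of o-levels).
    """
    n = len(q)
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    r, levels = list(q), 0
    d = 1
    while d < n:                                   # up the tree
        for i in range(2 * d - 1, n, 2 * d):
            r[i] = circ(r[i], r[i - d])            # now r_{i, i-2d+1}
        levels, d = levels + 1, 2 * d
    d = n // 4
    while d >= 1:                                  # back down the tree
        for i in range(3 * d - 1, n, 2 * d):
            r[i] = circ(r[i], r[i - d])            # completes r_{i,0}
        levels, d = levels + 1, d // 2
    return r, levels


print(parallel_prefix([(0, 0)] * 8)[1])  # 5 levels for n = 8
```

For n = 8 the upward pass produces r_{1,0}, r_{3,0} and r_{7,0}, and the downward pass fills in r_{5,0}, then r_{2,0}, r_{4,0} and r_{6,0}, matching the prefixes named in the figure.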
The parallel-prefix CLA adder (contd.) Another VLSI implementation of a parallel prefix tree:

[Figure: alternative VLSI layout of a parallel prefix tree with inputs q_{n-1} ... q_0 and outputs up to r_{n-1}]
The parallel-prefix CLA adder (contd.) The final hardware:

[Figure: 16-bit parallel-prefix CLA; full adders FA0 to FA15 take X_i, Y_i and pass q_i (2 bits each) to the Parallel Prefix Module, which returns the carries C_1 to C_15 to the full adders and produces C_out; C_in enters at stage 0]
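COMPUTER ARITHMETIC: SUBTRACTION

Subtraction can be done using an adder, since $X - Y = X + (-Y)$. This means that $Y$, which we assume is in 2's complement notation, has to be negated. A 2's complement number is negated by complementing it and adding a 1, i.e., $-Y = \bar{Y} + 1$. The augmentation to an adder to perform subtraction is shown below:

[Figure: n-bit adder whose n-bit Y input is complemented when Subtract/Add = 1 and passed unchanged when Subtract/Add = 0; the Subtract/Add signal (from the Control Unit) also drives C_in, so the adder computes X + \bar{Y} + 1 when subtracting; C_out is the carry out]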
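A sketch of the adder/subtractor: one control bit both complements Y and supplies the carry-in, so the same adder computes $X + \bar{Y} + 1 = X - Y$. The names `add_sub` and `to_signed` are illustrative.

```python
def add_sub(x: int, y: int, n: int, subtract: int) -> tuple[int, int]:
    """n-bit adder/subtractor: X + Y, or X + ~Y + 1 = X - Y.

    The Subtract/Add control bit (1/0) conditionally complements Y and is
    also fed in as C_in. Returns (n-bit result, C_out).
    """
    mask = (1 << n) - 1
    y_in = (y ^ (mask if subtract else 0)) & mask   # complement Y when subtracting
    total = (x & mask) + y_in + subtract            # C_in = Subtract/Add
    return total & mask, total >> n


def to_signed(v: int, n: int) -> int:
    """Interpret an n-bit pattern as a 2's complement integer."""
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


print(to_signed(add_sub(3, 5, 4, 1)[0], 4))  # -2
```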
COMPUTER ARITHMETIC: MULTIPLIERS

Serial Multiplication

Add-and-shift (A&S) multiplication: Manual Example:

If the additions are done one at a time, we obtain a sequence of partial products $P_0 = 0, P_1, \ldots, P_n$.

Each partial product $P_{i+1}$ is obtained as
$P_{i+1} = P_i + x_i 2^i Y$.
Thus
$P_n = \sum_{i=0}^{n-1} x_i 2^i Y = XY$,
where $X = x_{n-1} \cdots x_1 x_0$ is the multiplier and $Y$ the multiplicand.
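The partial-product recurrence $P_{i+1} = P_i + x_i 2^i Y$ can be traced directly; `partial_products_left` is a hypothetical name for this sketch.

```python
def partial_products_left(x: int, y: int, n: int) -> list[int]:
    """P_0 = 0 and P_{i+1} = P_i + x_i 2^i Y for an n-bit multiplier X = x."""
    p = [0]
    for i in range(n):
        x_i = (x >> i) & 1
        p.append(p[-1] + x_i * (y << i))   # multiplicand shifted left by i
    return p


print(partial_products_left(0b1011, 0b1101, 4))  # [0, 13, 39, 39, 143]; 11 * 13 = 143
```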
A&S multiplication (contd.):
The same effect as shifting the multiplicand left ($2^i Y$) can be achieved by keeping the multiplicand fixed at the left-most position and shifting the partial product right. Example:

In this case, the partial products obtained are:
$P'_0 = 0$, $P'_{i+1} = (P'_i + x_i 2^n Y)\, 2^{-1}$, so that $P'_i = 2^{n-i} P_i$.
However, the final product is the same:
$P'_n = \sum_{i=0}^{n-1} x_i 2^i Y = P_n$.
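The right-shifting variant, sketched under the same conventions: the multiplicand is added at weight $2^n$ and the running sum is halved after every step. The halving is exact because each intermediate sum is even; the function name is illustrative.

```python
def partial_products_right(x: int, y: int, n: int) -> list[int]:
    """P'_0 = 0 and P'_{i+1} = (P'_i + x_i 2^n Y) / 2 for an n-bit multiplier X = x."""
    p = [0]
    for i in range(n):
        x_i = (x >> i) & 1
        s = p[-1] + x_i * (y << n)   # multiplicand stays at the left-most position
        p.append(s >> 1)             # shift right; exact, since s is always even here
    return p


print(partial_products_right(0b1011, 0b1101, 4))  # [0, 104, 156, 78, 143]
```

Each entry equals $2^{n-i} P_i$ (e.g., 104 = 8 · 13 and 156 = 4 · 39), and the last one is the product.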
A&S multiplication (contd.): Hardware:

[Figure: add-and-shift multiplication for unsigned numbers; 16-bit multiplicand register M (Y), 16-bit Accumulator (AC) and 16-bit multiplier register Q (X), a 16-bit adder whose second input is M ANDed with the LSB of Q, and a 1-bit C_out register]

Algorithm: Initialize AC = 0; Q = Multiplier; M = Multiplicand. Do the following steps n times:
If LSB of Q is 1 then AC = AC + M else AC = AC;
Shift the C_out-AC-Q register combination right by 1 bit.
The final product is in the AC-Q register.
NOTE: Overflows are tolerated in the additions: C_out is fed to the MSB of AC when right shifting.
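A register-level sketch of the algorithm above, assuming n-bit AC and Q registers and a separate C_out bit; `add_and_shift_unsigned` is a hypothetical name. The point to notice is that C_out is shifted into the MSB of AC, so no addition is ever lost.

```python
def add_and_shift_unsigned(multiplier: int, multiplicand: int, n: int) -> int:
    """Unsigned add-and-shift multiplication with n-bit AC, Q and M registers."""
    mask = (1 << n) - 1
    ac, q, m = 0, multiplier & mask, multiplicand & mask
    for _ in range(n):
        if q & 1:                       # LSB of Q is 1: AC = AC + M
            total = ac + m
        else:                           # LSB of Q is 0: AC = AC
            total = ac
        c_out, ac = total >> n, total & mask
        # Shift C_out-AC-Q right by one bit: AC's LSB moves into Q's MSB,
        # and C_out moves into AC's MSB.
        q = (q >> 1) | ((ac & 1) << (n - 1))
        ac = (ac >> 1) | (c_out << (n - 1))
    return (ac << n) | q                # 2n-bit product in AC-Q


print(add_and_shift_unsigned(0b1011, 0b1101, 4))  # 143
```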
A&S 2's complement multiplication

Assumption: Both X and Y are in their 2's complement representation.
Method 1:
If the multiplier X is -ve, get its 2's complement so that it becomes positive, i.e., X = -X.
If the multiplicand Y is -ve, get its 2's complement so that it becomes positive, i.e., Y = -Y.
Multiply X · Y.
If exactly one of X and Y was negative, get XY's 2's complement so that it becomes negative, i.e., XY = -XY.
Disadvantage: Preprocessing and postprocessing can take up to 4 clock cycles (ccs).
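Method 1 as a short sketch: negate negative operands, multiply as unsigned, and negate the 2n-bit result if exactly one operand was negative. The helper names are illustrative. Negating $-2^{n-1}$ yields the pattern of $2^{n-1}$, which is the correct magnitude when read as unsigned.

```python
def to_signed(v: int, n: int) -> int:
    """Interpret an n-bit pattern as a 2's complement integer."""
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


def negate(v: int, width: int) -> int:
    """2's complement negation: complement every bit, then add 1."""
    mask = (1 << width) - 1
    return ((v ^ mask) + 1) & mask


def multiply_method1(x: int, y: int, n: int) -> int:
    """Method 1 on n-bit 2's complement patterns x (multiplier), y (multiplicand).

    Returns the 2n-bit 2's complement pattern of the product.
    """
    x_neg, y_neg = (x >> (n - 1)) & 1, (y >> (n - 1)) & 1
    if x_neg:
        x = negate(x, n)          # pre-processing: make the multiplier positive
    if y_neg:
        y = negate(y, n)          # pre-processing: make the multiplicand positive
    product = x * y               # unsigned multiply (e.g., add-and-shift)
    if x_neg ^ y_neg:
        product = negate(product, 2 * n)   # post-processing: restore the sign
    return product


print(to_signed(multiply_method1(0b1101, 0b0101, 4), 8))  # -15
```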
A&S 2's complement multiplication (contd.):
Method 2: When the multiplier X is +ve, perform A&S as for unsigned numbers, taking care to do the following when each AC.Q is shifted right:
1. When there is no overflow in the addition (recall the condition for overflow for 2's complement addition), an arithmetic right shift of register AC.Q is performed, without shifting C_out into the MSB of AC.
2. Arithmetic right shift: the MSB is sign-extended, i.e., if the MSB (sign bit) is 1 a 1 is shifted into the MSB of AC, otherwise a 0 is shifted in.
3. If there is an overflow, then, as in the unsigned case, shift C_out into the MSB of AC when shifting AC.Q right. This works because in this case the (n+1)-bit output $C_{out} s_{n-1} \cdots s_0$ of the adder, where $C_{out}$ is the MSB, is the exact 2's complement representation of the sum. Check this by sign-extending the inputs to n+1 bits and computing the sum: the output will be the same as for the n-bit inputs, but the (n+1)-bit addition does not overflow.
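The claim in rule 3 can be checked by brute force. This sketch (hypothetical name `cout_rule_holds`) takes every overflowing pair of n-bit 2's complement operands and compares the (n+1)-bit word $C_{out} s_{n-1} \cdots s_0$ with the exact sum.

```python
def to_signed(v: int, n: int) -> int:
    """Interpret an n-bit pattern as a 2's complement integer."""
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


def cout_rule_holds(n: int) -> bool:
    """For every pair of n-bit 2's complement operands whose sum overflows,
    check that the (n+1)-bit word C_out s_{n-1} ... s_0 equals the exact sum."""
    lo, hi, mask = -(1 << (n - 1)), (1 << (n - 1)) - 1, (1 << n) - 1
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            if lo <= a + b <= hi:
                continue                       # no overflow: rule 1 applies instead
            total = (a & mask) + (b & mask)    # n-bit adder; bit n is C_out
            if to_signed(total, n + 1) != a + b:
                return False
    return True


print(cout_rule_holds(4), cout_rule_holds(8))  # True True
```

Without overflow the C_out rule would be wrong: 0111 + 1111 (7 + (-1)) gives C_out = 1 although the sum 6 is positive, which is why rule 1 uses sign extension instead.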
A&S 2's complement multiplication (contd.): Method 2 (contd.): Example:
A&S 2's complement multiplication (contd.):
Method 2 (contd.): If the multiplier X is negative, perform the first n-1 add-and-shift steps as explained above, and then subtract Y (instead of adding it) as the final step. This works because the value of a 2's complement number X is given by
$X = -x_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i$.
When X is negative, $x_{n-1}$ is 1, and thus
$XY = \sum_{i=0}^{n-2} x_i 2^i Y - 2^{n-1} Y$,
where the first term represents normal A&S multiplication for the first n-1 bits of X.
A&S 2's complement multiplication (contd.): Method 2 Example:
Note that for a negative multiplier, the multiplication is performed on the basis
$XY = -|X|\,Y$.
The magnitude of a negative X is given by
$|X| = -X = 2^{n-1} - \sum_{i=0}^{n-2} x_i 2^i$.
Thus $XY = -\left(2^{n-1} - \sum_{i=0}^{n-2} x_i 2^i\right) Y = \sum_{i=0}^{n-2} x_i 2^i Y - 2^{n-1} Y$, which is what we are doing.
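Method 2 can be sketched at the register level. Under the assumptions of this sketch (n-bit AC and Q, overflow detected as carry into the MSB ≠ carry out of the MSB), the bit entering the MSB of AC is C_out on overflow and the sign bit otherwise; for the sign bit of X the last step adds $\bar{M} + 1$, i.e., subtracts M. The name `a_and_s_twos_complement` is hypothetical.

```python
def to_signed(v: int, n: int) -> int:
    """Interpret an n-bit pattern as a 2's complement integer."""
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


def a_and_s_twos_complement(x: int, y: int, n: int) -> int:
    """Method 2 on n-bit 2's complement patterns x (multiplier), y (multiplicand).

    Returns the 2n-bit product pattern left in AC.Q.
    """
    mask, low = (1 << n) - 1, (1 << (n - 1)) - 1
    ac, q, m = 0, x & mask, y & mask
    for i in range(n):
        if (q & 1) == 0:
            op, cin = 0, 0                  # AC = AC
        elif i == n - 1:
            op, cin = m ^ mask, 1           # sign bit of X: AC = AC + ~M + 1 = AC - M
        else:
            op, cin = m, 0                  # AC = AC + M
        c_into_msb = ((ac & low) + (op & low) + cin) >> (n - 1)
        total = ac + op + cin
        c_out, s = total >> n, total & mask
        overflow = c_into_msb != c_out
        # Bit entering the MSB of AC: C_out on overflow, else sign extension.
        msb_in = c_out if overflow else s >> (n - 1)
        q = (q >> 1) | ((s & 1) << (n - 1))  # LSB of the sum moves into Q
        ac = (s >> 1) | (msb_in << (n - 1))
    return (ac << n) | q


print(to_signed(a_and_s_twos_complement(0b1101, 0b0101, 4), 8))  # -15
```

Computing the overflow from the adder's own inputs (including the carry-in of the final subtraction) also covers the multiplicand $-2^{n-1}$, whose negation does not fit in n bits.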
A&S 2's complement multiplication (contd.): Method 2 (contd.): Hardware:

[Figure: add-and-shift multiplication for 2's complement numbers; as in the unsigned hardware (16-bit Accumulator AC, multiplier register Q (X), multiplicand register M (Y), 16-bit adder with AND gating), plus an overflow detector driven by C_15 and C_out; the bit shifted into AC[0] (the MSB of AC) comes from logic that outputs C_out if there is an overflow, else AC[0]]
Speeding Up Serial Multiplication: Booth's Algorithm

Idea: Consider the following substring of X:
$011110 = 100000 - 000010$, i.e., $2^4 + 2^3 + 2^2 + 2^1 = 2^5 - 2^1$.
Thus instead of adding and shifting 4 times (corresponding to the string of 4 1s in 011110), we can subtract Y (i.e., $2^1 Y$) when we see the 1st 1 coming after a 0 in X, i.e., when we detect the 2-bit substring 10 in the last 2 bits of the current X, just shift 4 times, and then add Y (i.e., $2^5 Y$) when we see that the current string of 1s in X has ended, i.e., when we detect the 2-bit substring 01 in the last 2 bits of the current X. This saves us two adds. Thus when the multiplier contains long (length greater than 2) strings of 1s, Booth's multiplication is faster.
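The idea amounts to recoding the multiplier into digits $x_{i-1} - x_i \in \{-1, 0, +1\}$, with a 0 assumed to the right of the LSB. A minimal sketch (hypothetical name `booth_recode`):

```python
def booth_recode(x: int, n: int) -> list[int]:
    """Booth digits y_i = x_{i-1} - x_i, i = 0..n-1, with x_{-1} = 0."""
    digits, prev = [], 0
    for i in range(n):
        cur = (x >> i) & 1
        digits.append(prev - cur)   # 10 -> -1 (subtract Y), 01 -> +1 (add Y)
        prev = cur
    return digits


d = booth_recode(0b011110, 6)
print(d)                                            # [0, -1, 0, 0, 0, 1]
print(sum(di * 2 ** i for i, di in enumerate(d)))   # 30 = 2^5 - 2^1
```

Two nonzero digits replace the four 1s, which is the two-add saving described above.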
Booth's Algorithm (contd.)

Booth's multiplication also takes care of a negative multiplier X automatically. Consider, for example, the 6-bit multiplier X = 111100 (= -4):
$111100 = 1000000 - 000100$.
Since the multiplication algorithm contains n steps (n = 6 in the above example), the 1000000 is ignored and we end up subtracting 4 times the multiplicand, exactly the right answer!
Booth's algorithm is described by the following table for iteration i:

Bit x_i     Bit x_{i-1}   Explanation                  Action
(current)   (prev.)
1           0             Beginning of a run of 1s     Subtract Y, then shift
1           1             Middle of a run of 1s        Only shift
0           1             End of a run of 1s           Add Y, then shift
0           0             Middle of a run of 0s        Only shift
Note: (1) For unsigned multiplication, we need to pad the multiplier with mythical 0s on both sides (the padding to the right of the LSB is required to start off the process).
(2) For 2's complement multiplication, we need to pad the multiplier with a mythical 0 only to the right of its LSB. This works because, for a negative X, the last run of 1s is 11...1, where the leftmost 1 is the sign bit (bit n-1); suppose the rightmost 1 of this run is bit k. Then the value of this sequence is $-2^{n-1} + \sum_{i=k}^{n-2} 2^i = -2^k$, which is exactly the value we allotted to this sequence when we subtracted $2^k Y$ at the kth bit position at the beginning of this last run of 1s. Further, if V is the value we allotted to the rest of the multiplier before the last run of 1s, then the final value we give to the multiplier is $V - 2^k$, which is its correct value in 2's complement: $X = -x_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i$.
(3) $y_i = x_{i-1} - x_i$ is the ith bit of the Booth Recoding of X: $X = \sum_{i=0}^{n-1} y_i 2^i$ (with $x_{-1} = 0$).
(4) $y_i = 0$ means no arithmetic operation, $y_i = 1$ means add Y, and $y_i = -1$ means subtract Y.

Hardware: Exercise.
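A behavioural sketch of Booth's algorithm over an AC.Q register pair, following the table. In this sketch AC is held as a Python int so that each AC ± M is exact before the shift; a plain n-bit AC would go wrong when the multiplicand is $-2^{n-1}$ (e.g., 0001 × 1000 would come out as +8), because AC - M then needs n+1 bits. The function names are illustrative.

```python
def to_signed(v: int, n: int) -> int:
    """Interpret an n-bit pattern as a 2's complement integer."""
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


def booth_multiply(x: int, y: int, n: int) -> int:
    """Booth's algorithm for n-bit 2's complement multiplier x, multiplicand y.

    Q holds the multiplier and q_prev is the mythical 0 to the right of its
    LSB. Returns the 2n-bit product pattern in AC.Q.
    """
    mask = (1 << n) - 1
    m = to_signed(y & mask, n)
    ac, q, q_prev = 0, x & mask, 0
    for _ in range(n):
        current = q & 1
        if (current, q_prev) == (1, 0):     # beginning of a run of 1s
            ac -= m
        elif (current, q_prev) == (0, 1):   # end of a run of 1s
            ac += m
        # (1, 1) and (0, 0): only shift.
        q_prev = current
        q = (q >> 1) | ((ac & 1) << (n - 1))
        ac >>= 1                            # arithmetic right shift of AC
    return ((ac & mask) << n) | q


print(to_signed(booth_multiply(0b1101, 0b0101, 4), 8))  # -15
```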
Booth's Algorithm (contd.) Examples:
Problem: When the multiplier contains long strings (say, of length k) of alternating 1s and 0s (...0101...), then we perform about k/2 additions and k/2 subtractions using Booth's algorithm, compared to only about k/2 additions using regular add-and-shift.
Solution: Look at 3 consecutive bits of the multiplier instead of 2 to decide what to do. This is called the Modified Booth's Algorithm (MBA). This will enable us to treat isolated 1s and 0s differently from runs of 1s and 0s.
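Counting operations makes the problem concrete. This sketch (hypothetical name `count_ops`) assumes an unsigned multiplier whose top bit is 0, so every run of 1s ends inside the n bits; Booth performs one add or subtract at every 01 or 10 boundary.

```python
def count_ops(x: int, n: int) -> tuple[int, int]:
    """Operations needed for an n-bit unsigned multiplier x (bit n-1 = 0).

    Returns (additions in plain add-and-shift,
             additions plus subtractions in Booth's algorithm).
    """
    plain = bin(x & ((1 << n) - 1)).count("1")
    booth, prev = 0, 0
    for i in range(n):
        cur = (x >> i) & 1
        booth += cur != prev       # a 10 or 01 pair triggers an add/subtract
        prev = cur
    return plain, booth


print(count_ops(0b01010101, 8))  # (4, 8): alternating bits double the work
print(count_ops(0b01111110, 8))  # (6, 2): a long run of 1s is where Booth wins
```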
Modified Booth's Algorithm

Thus when we see a 010 (an isolated 1), we add Y corresponding to the isolated 1. This is correct, since in BA we would have subtracted $2^i Y$ on detecting 10 and added $2^{i+1} Y$ on subsequently detecting 01. Thus, assuming the 1 is the ith bit, we would have added $-2^i Y + 2^{i+1} Y = 2^i Y$; we are doing the RHS in MBA.

When we see a 101 (an isolated 0) and this isolated 0 is not following an isolated 1, then we subtract Y corresponding to the isolated 0. This is correct, since in BA we would have added $2^i Y$ on detecting 01 and then subtracted $2^{i+1} Y$ on detecting 10. This is equivalent to adding $2^i Y - 2^{i+1} Y = -2^i Y$; again, we are doing the RHS in MBA.
Modified Booth's Algorithm (contd.)

After detecting an isolated 1 (010), it should be noted as such so that, after shifting right, we don't misinterpret the bit pattern as ending a run of 1s. For example, consider two bit patterns showing 4 consecutive bits of X, where the 3 bits being observed are the rightmost 3:
0010 and 0011.
The 1st has an isolated 1 and the 2nd a run of 1s. For the 1st case we have added Y corresponding to the 1, and in the second case we do not do anything, as we are in the middle of a run of 1s (as in BA). After a right shift, we have the patterns
001 and 001,
which are identical. In the first case, we need to have noted that the 1 corresponds to an isolated 1, so that we do not do anything. In the second case, this means the end of a run of 1s, and so we need to add Y (as in BA). These cases are distinguished by setting a latch L to 0 when an isolated 1 is spotted, and to 1 if a run of 1s is spotted. Thus the least significant of the 3 bits that we observe should actually be L and not the previous bit of X. Except when distinguishing between an isolated 1 (0) and the end of a run of 1s (0s), L will be the same as the previously observed bit of X. Thus after a right shift, the above 3-bit patterns will be
0 0 L with L = 0, and 0 0 L with L = 1.
Similarly, we need to distinguish between an isolated 0 and the end of a run of 0s. In the former case L is set to 1, and in the latter case to 0. Again, consider two bit patterns showing 4 consecutive bits of X:
1101 and 1100.
The 1st has an isolated 0 in its 2nd bit (from the right), for which we subtracted Y, and the 2nd pattern has a 0 in its 2nd bit that ends a run of 0s. Thus, after a right shift, the above 3-bit patterns will not be identical, but will be
1 1 L with L = 1, and 1 1 L with L = 0.
In the first case, we correctly do nothing corresponding to the middle bit (we already subtracted Y corresponding to the isolated 0), and in the second we subtract Y, since the middle bit begins a run of 1s which has not yet been accounted for.
The rightmost bit in the 3 bits that we are looking at is actually L, which is initialized to 0. The second bit is $x_i$ (the current bit in the ith iteration), and the leftmost bit is $x_{i+1}$. Note that, except in the isolated 1/0 case, $L = x_{i-1}$ (as in BA); otherwise $L = \bar{x}_{i-1}$.
We thus have the following Modified Booth's Algorithm, described for iteration i:

Bit x_{i+1}  Bit x_i     L   Action        New L   Explanation
(next)       (current)
0            0           0   Only shift    0       Middle of a run of 0s
0            1           0   Add Y         0       Isolated 1
1            0           0   Only shift    0       Isolated 0 following an isolated 1 OR Middle of a run of 0s
1            1           0   Subtract Y    1       Begins a run of 1s
0            0           1   Add Y         0       Begins a run of 0s
0            1           1   Only shift    1       Middle of a run of 1s
1            0           1   Subtract Y    1       Isolated 0 following a run of 1s
1            1           1   Only shift    1       Middle of a run of 1s

NOTE: (1) The multiplier needs to be padded with a mythical 0 to the right of its LSB (the initial L). To the left of its MSB it is padded with mythical 0s for unsigned multiplication; for 2's complement multiplication the bit to the left of the MSB must be a copy of the sign bit.
(2) The action in iteration i corresponds to $y'_i \in \{0, +1, -1\}$, the ith bit of the Modified Booth Recoding of X.
(3) This signed-digit encoding has about 2n/3 0s on the average, as opposed to n/2 in the regular binary code. Thus fewer arithmetic operations are required on the average using MBA for multiplication.
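The MBA table can be driven directly from a lookup dictionary. In this sketch (names illustrative), the digits are accumulated with plain integer arithmetic rather than AC.Q registers. For 2's complement operands the look-ahead bit beyond the MSB is taken to be a copy of the sign bit; with a plain 0 there, a multiplier such as 10 (= -2) would be recoded as +2.

```python
# Modified Booth's Algorithm table: (x_{i+1}, x_i, L) -> (digit y'_i, new L).
# Digit +1 = add Y, -1 = subtract Y, 0 = only shift.
MBA_TABLE = {
    (0, 0, 0): (0, 0),    # middle of a run of 0s
    (0, 1, 0): (+1, 0),   # isolated 1
    (1, 0, 0): (0, 0),    # isolated 0 after an isolated 1, or middle of 0s
    (1, 1, 0): (-1, 1),   # begins a run of 1s
    (0, 0, 1): (+1, 0),   # begins a run of 0s
    (0, 1, 1): (0, 1),    # middle of a run of 1s
    (1, 0, 1): (-1, 1),   # isolated 0 following a run of 1s
    (1, 1, 1): (0, 1),    # middle of a run of 1s
}


def mba_recode(x: int, n: int, signed: bool) -> list[int]:
    """Modified Booth recoding of an n-bit multiplier pattern x.

    Unsigned: 0s beyond the MSB, n+1 iterations.
    2's complement: look-ahead bit beyond the MSB copies the sign, n iterations.
    """
    pad = (x >> (n - 1)) & 1 if signed else 0

    def bit(i: int) -> int:
        return (x >> i) & 1 if i < n else pad

    digits, latch = [], 0                    # L starts as the mythical 0
    for i in range(n if signed else n + 1):
        d, latch = MBA_TABLE[(bit(i + 1), bit(i), latch)]
        digits.append(d)
    return digits


def mba_multiply(x: int, y: int, n: int, signed: bool = True) -> int:
    """Product as the sum of y'_i * 2^i * Y (plain integers, no registers)."""
    y_val = y - (1 << n) if signed and (y >> (n - 1)) & 1 else y
    return sum(d * y_val * 2 ** i for i, d in enumerate(mba_recode(x, n, signed)))


print(mba_recode(0b0111, 4, signed=False))   # [-1, 0, 0, 1, 0]: 7 = 8 - 1
print(mba_multiply(0b1101, 0b0101, 4))       # -15
```

A nonzero digit always leaves L equal to the next multiplier bit, so the following window is 000, 100, 011 or 111, all of which give a 0 digit: the recoding never has two adjacent nonzero digits.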
Modified Booth's Algorithm (contd.) Examples: