
International Journal of Current Engineering and Technology          ISSN 2277 - 4106

©2013 INPRESSCO. All Rights Reserved.
Available at http://inpressco.com/category/ijcet

Research Article

Design and Simulation of Floating Point Pipelined ALU Using HDL and IP Core Generator

Itagi Mahi P.a* and S. S. Kerurb

aDepartment of Electronics and Communication Engineering, SDMCET, Dharwad, India

*Corresponding author: Itagi Mahi P.
Abstract

Short for Arithmetic Logic Unit, the ALU is one of the most important components within a computer processor. It performs arithmetic functions such as addition, subtraction, multiplication and division, along with logical functions. Pipelining allows the execution of multiple instructions simultaneously. A pipelined ALU gives better performance, which is evaluated here in terms of the number of clock cycles required to perform each arithmetic operation. Floating point representation is based on the IEEE 754 standard. In this paper a pipelined ALU is proposed that simulates five arithmetic operations, namely addition, subtraction, multiplication, division and square root, in the HDL environment. Simulation results are also obtained with the IP Core Generator supported by Xilinx. Synthesis is carried out on the Xilinx 13.2 platform and ISim is used for the simulation process.

Keywords: ALU, IEEE standard 754, Single Precision, Pipelining, Hardware Description Language (HDL), VHDL.

1. Introduction

The idea of floating-point representation over intrinsically integer fixed-point numbers, which consist purely of a significand, is that expanding it with an exponent component achieves greater range. For instance, to represent large values, e.g. distances between galaxies, there is no need to keep all 39 decimal places down to femtometre resolution. Assuming that the best resolution required is in light years, only the 9 most significant decimal digits matter, whereas the remaining 30 digits carry pure noise and thus can be safely dropped.

In computing, floating point describes a method of representing an approximation of a real number in a way that can support a wide range of values. The term floating point refers to the fact that a number's radix point (decimal point, or, more commonly in computers, binary point) can "float"; that is, it can be placed anywhere relative to the significant digits of the number. Over the years, a variety of floating-point representations have been used in computers. However, since the 1990s, the most commonly encountered representation is that defined by the IEEE 754 standard.

The Xilinx CORE Generator System delivers a library of both parameterizable and point-solution LogiCORE IP cores with detailed data sheet specifications. LogiCORE IP are designed and supported by Xilinx.

1.1 Single Precision Floating Point Format

The IEEE has standardized the computer representation for binary floating-point numbers in IEEE 754. This standard is followed by almost all modern machines. It is an effort by computer manufacturers to conform to a common representation and arithmetic convention for floating point data. The standard defines:

Arithmetic formats: sets of binary and decimal floating-point data, which consist of finite numbers (including signed zeros and subnormal numbers), infinities, and special "not a number" values (NaNs)
Interchange formats: encodings (bit strings) that may be used to exchange floating-point data in an efficient and compact form
Rounding rules: properties to be satisfied when rounding numbers during arithmetic and conversions
Operations: arithmetic and other operations on arithmetic formats
Exception handling: indications of exceptional conditions (such as division by zero, overflow, etc.)

The IEEE single precision floating point standard representation requires a 32-bit word whose bits may be numbered from 0 to 31, left to right. The format is as follows:

Fig 1: Single Precision Floating Point Format

The 1-bit sign signifies whether the number is positive or negative: 1 indicates a negative number and 0 indicates a positive number. The 8-bit exponent provides an exponent range from E(min) = -126 to E(max) = 127. The 23-bit mantissa signifies the fractional part of the number. The mantissa must not be confused with the significand: the leading 1 of the significand is made implicit (the hidden bit) and is not stored in the mantissa field.
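To make the field layout concrete, the 32-bit word can be split into its three fields with simple slice assignments in VHDL. The following fragment is only an illustrative sketch; the entity and port names are ours, not part of the design described later.

library ieee;
use ieee.std_logic_1164.all;

-- Illustrative only: unpack an IEEE 754 single precision word into its fields.
entity fp_unpack is
  port (
    fp_word  : in  std_logic_vector(31 downto 0);
    sign     : out std_logic;                      -- bit 31
    exponent : out std_logic_vector(7 downto 0);   -- bits 30..23, biased by 127
    mantissa : out std_logic_vector(22 downto 0)   -- bits 22..0, fraction without the hidden '1'
  );
end entity fp_unpack;

architecture rtl of fp_unpack is
begin
  sign     <= fp_word(31);
  exponent <= fp_word(30 downto 23);
  mantissa <= fp_word(22 downto 0);
end architecture rtl;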


1.2 Conversion of Decimal to Floating Point Number

Conversion of a decimal number to the 32-bit floating point format is explained with an example. Suppose the decimal number considered is 129.85. Before conversion into the floating point format this number is converted into its binary value, which is 10000001.110111. After conversion the radix point is moved to the left such that there is only one bit to the left of the radix point, and this bit must be 1; it is known as the hidden bit. The above binary value can now be represented as 1.00000011101110000000000. The part after the radix point is called the mantissa, which is of 23 bits, and the whole number is called the significand, which is of 24 bits (including the hidden bit). Now the number of times the radix point is shifted (say x) is counted; in the above case the radix point is shifted 7 times to the left. This value must be added to 127 to get the stored exponent value, i.e. the exponent field is 127 + x. Thus the exponent becomes 127 + 7 = 134, which is 10000110. The sign bit, i.e. the MSB, is 0 because the number is positive. Now the result is assembled into the 32-bit format as sign, exponent, mantissa: 01000011000000011101110000000000.
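The same worked example can be written down directly in VHDL. The package below is only an illustrative sketch (the names are ours); it assembles the 32-bit word by concatenating the three fields exactly as described above.

library ieee;
use ieee.std_logic_1164.all;

package fp_example_pkg is
  -- 129.85 is approximately 1.0000001110111 x 2^7, so the stored exponent is 127 + 7 = 134
  constant SIGN_BIT : std_logic                     := '0';         -- the number is positive
  constant EXPONENT : std_logic_vector(7 downto 0)  := "10000110";  -- 134
  constant MANTISSA : std_logic_vector(22 downto 0) := "00000011101110000000000";
  -- Assembled word (sign & exponent & mantissa) = 01000011000000011101110000000000
  constant FP_129_85 : std_logic_vector(31 downto 0) := SIGN_BIT & EXPONENT & MANTISSA;
end package fp_example_pkg;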

1.3 Xilinx Core Generator

The Xilinx CORE Generator System accelerates design time by providing access to highly parameterized Intellectual Property (IP) cores for Xilinx FPGAs and is included in the ISE Design Suite. CORE Generator provides a catalog of architecture-specific, domain-specific (embedded, connectivity and DSP) and market-specific IP (Automotive, Consumer, Mil/Aero, Communications, Broadcast etc.). These user-customizable IP functions range in complexity from commonly used functions, such as memories and FIFOs, to system-level building blocks, such as filters and transforms. Using these IP blocks can save days to months of design time. The highly optimized IP allows FPGA designers to focus their efforts on building designs quicker while helping bring products to market faster. Through its seamless integration with the ISE development environment, the CORE Generator system streamlines the design process and improves design quality.
2. Literature Survey

In recent years, floating-point numbers have been widely adopted in many applications due to their high dynamic range and good robustness against quantization errors. Floating-point representation is able to retain its resolution and accuracy. The IEEE-specified standard for floating-point representation is known as the IEEE 754 standard. This standard specifies interchange and arithmetic formats and methods for binary and decimal floating-point arithmetic in computer programming environments (IEEE 754-2008).

The main objective of implementing floating point operations on reconfigurable hardware is to utilize less chip area with less combinational delay (Karan Gumber et al., May 2012), which means less latency, i.e. faster speed. A parameterizable floating point adder and multiplier were implemented using the software-like language Handel-C on the Xilinx XCV1000 FPGA; a five-stage pipelined multiplier achieved 28 MFlops (A. Jaenicke et al., 2001). The hardware needed for the parallel 32-bit multiplier is approximately 3 times that of the serial one. A single precision floating point multiplier that does not support rounding modes can be implemented using a digit-serial multiplier (L. Louca et al., 1996). The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers. By using pipelining in the ALU design, the ALU provides high performance: with the pipelining concept the ALU executes multiple instructions simultaneously (Suchita Pare et al., 2012).

3. Methodology

The block diagram of the designed arithmetic unit is shown in Fig 2. It supports five arithmetic operations: addition, subtraction, multiplication, division and square root. It has four inputs: a start bit, two 32-bit operands in IEEE format, a 3-bit operation code and a 2-bit code for the rounding operation. It has the following outputs: a 32-bit output in IEEE format, a ready bit and eight exception flags (Inexact, Overflow, Underflow, Divide-by-zero, Infinity, Zero, QNaN, SNaN). All arithmetic operations are carried out in four separate modules, one for addition and subtraction and one each for multiplication, division and square root, as shown in Fig 2. In this unit one can select the operation to be performed on the 32-bit operands by the 3-bit op-code, and the same op-code selects the output from that particular module and connects it to the final output of the unit. The ready bit goes high when the result is available to be taken from the output. The corresponding exception signal goes high whenever that type of exception occurs. Since the result precision is not infinite, sometimes rounding is necessary. In this paper the rounding mode considered is Round to Nearest Even, wherein the value is rounded up or down to the nearest infinitely precise result; if the value is exactly halfway between two infinitely precise results, it is rounded to the nearest even value.
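Based on this description, the top-level interface of the unit could be sketched in VHDL roughly as follows. The port names and the width of the exception bus are assumptions made for illustration; only the widths stated above (32-bit operands, 3-bit op-code, 2-bit rounding code, eight exceptions) are taken from the text.

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical top-level port list inferred from the description above.
entity fp_alu is
  port (
    clk        : in  std_logic;
    start      : in  std_logic;                      -- start bit
    op_a       : in  std_logic_vector(31 downto 0);  -- operand A, IEEE 754 single precision
    op_b       : in  std_logic_vector(31 downto 0);  -- operand B, IEEE 754 single precision
    op_code    : in  std_logic_vector(2 downto 0);   -- selects add/sub/mul/div/sqrt
    round_mode : in  std_logic_vector(1 downto 0);   -- rounding operation code
    result     : out std_logic_vector(31 downto 0);  -- result, IEEE 754 single precision
    ready      : out std_logic;                      -- high when the result is valid
    exceptions : out std_logic_vector(7 downto 0)    -- Inexact, Overflow, Underflow, Divide-by-zero,
                                                     -- Infinity, Zero, QNaN, SNaN (one flag each)
  );
end entity fp_alu;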


Fig 2: Floating Point ALU Basic Architecture

Fig 3: Flowchart for floating point addition/subtraction

All arithmetic operations have these three stages (a skeletal VHDL outline of this organisation follows the list):

Pre-normalize: the operands are transformed into formats that make them easy and efficient to handle internally.
Arithmetic core: the basic arithmetic operations are done here.
Post-normalize: the result is normalized if possible (the leading bit before the radix point will be 1, if normalized) and then transformed into the format specified by the IEEE standard.
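The sketch below is only a structural illustration of this three-stage organisation (the stage contents are placeholders and the entity is not the authors' code). Its point is that each stage is registered, so a new operation can enter the pre-normalize stage while earlier operations move through the arithmetic core and post-normalize stages.

library ieee;
use ieee.std_logic_1164.all;

entity fp_pipe_skeleton is
  port (
    clk    : in  std_logic;
    a, b   : in  std_logic_vector(31 downto 0);
    result : out std_logic_vector(31 downto 0)
  );
end entity fp_pipe_skeleton;

architecture rtl of fp_pipe_skeleton is
  signal stage1_a, stage1_b : std_logic_vector(31 downto 0);  -- pre-normalized operands
  signal stage2_raw         : std_logic_vector(31 downto 0);  -- un-normalized core result
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: pre-normalize (placeholder: operands passed through unchanged)
      stage1_a <= a;
      stage1_b <= b;
      -- Stage 2: arithmetic core (placeholder for the add/sub/mul/div/sqrt datapath)
      stage2_raw <= stage1_a;
      -- Stage 3: post-normalize and pack into IEEE 754 format (placeholder)
      result <= stage2_raw;
    end if;
  end process;
end architecture rtl;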


Fig 4: Flowchart for floating point multiplication

Fig 5: Flowchart for floating point division (the operands are unpacked as sign & e & frac; the leading zeros zA and zB of both fractions are counted and the fractions are shifted left accordingly; e0 = eA - eB + bias(127) - zA + zB, frac0 = fracA / fracB and sign0 = signA xor signB are computed; frac0 is rounded; the corresponding exception signal is raised if an exception occurred; the result is normalised and assembled as Output = sign0 & e0 & frac0)

3.1 Addition and Subtraction

The conventional floating-point addition algorithm consists of five stages: exponent difference, pre-alignment, addition, normalization and rounding. The algorithm used for adding or subtracting floating point numbers is shown in the flowchart of Fig 3.

3.2 Multiplication Algorithm

In floating point multiplication, the two mantissas are to be multiplied and the two exponents are to be added. In order to perform floating-point multiplication, a simple algorithm is realized (a behavioural sketch follows this list):

Add the exponents and subtract 127 (bias)
Multiply the mantissas and determine the sign of the result
Normalize the resulting value, if necessary
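The behavioural VHDL sketch below follows these steps for normal (finite, non-zero) operands. It is an illustration, not the authors' implementation: rounding and the handling of zeros, infinities, NaNs, overflow and underflow are all omitted, and the result mantissa is simply truncated. For the operands used in Section 4.3 (both equal to 1.0), these steps give 0 01111111 000...0, i.e. 1.0, which matches the result reported there.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioural sketch of the multiplication steps above (illustration only):
-- special values, overflow/underflow and rounding are not handled.
entity fp_mul_sketch is
  port (
    a, b   : in  std_logic_vector(31 downto 0);
    result : out std_logic_vector(31 downto 0)
  );
end entity fp_mul_sketch;

architecture behav of fp_mul_sketch is
begin
  process (a, b)
    variable sig_a, sig_b : unsigned(23 downto 0);   -- significands with the hidden '1' restored
    variable prod         : unsigned(47 downto 0);   -- 24 x 24 bit product
    variable exp_sum      : unsigned(8 downto 0);
    variable sign_o       : std_logic;
  begin
    sig_a := unsigned('1' & a(22 downto 0));
    sig_b := unsigned('1' & b(22 downto 0));
    prod  := sig_a * sig_b;
    -- add the (biased) exponents and subtract the bias 127
    exp_sum := resize(unsigned(a(30 downto 23)), 9) + resize(unsigned(b(30 downto 23)), 9) - 127;
    -- determine the sign of the result
    sign_o := a(31) xor b(31);
    -- normalize: the product lies in [1, 4), so at most one right shift is needed
    if prod(47) = '1' then
      result <= sign_o & std_logic_vector(exp_sum(7 downto 0) + 1) & std_logic_vector(prod(46 downto 24));
    else
      result <= sign_o & std_logic_vector(exp_sum(7 downto 0)) & std_logic_vector(prod(45 downto 23));
    end if;
  end process;
end architecture behav;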

3.3 Division Algorithm

The division algorithm is shown in the flowchart of Fig 5. The sign and exponent of the quotient are computed directly from the operands' fields, while the fraction is obtained by dividing the two pre-normalised fractions and rounding the result.
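As a small illustration of the sign and exponent handling from Fig 5, the following combinational sketch computes sign0 and e0. The names, the leading-zero-count width and the omission of all exception checks are our simplifications.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch of the sign and exponent handling implied by Fig 5 (names and widths
-- are assumptions; fraction division, rounding and exceptions are omitted).
entity fp_div_exp_sketch is
  port (
    sign_a, sign_b : in  std_logic;
    e_a, e_b       : in  unsigned(7 downto 0);   -- biased exponent fields
    z_a, z_b       : in  unsigned(4 downto 0);   -- leading-zero counts of the fractions
    sign_o         : out std_logic;
    e_o            : out unsigned(7 downto 0)
  );
end entity fp_div_exp_sketch;

architecture rtl of fp_div_exp_sketch is
begin
  sign_o <= sign_a xor sign_b;
  -- e0 = eA - eB + bias(127) - zA + zB   (overflow and underflow checks omitted)
  e_o <= resize(resize(e_a, 10) - resize(e_b, 10) + 127 - resize(z_a, 10) + resize(z_b, 10), 8);
end architecture rtl;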
3.4 Square Root

In the square-root algorithm used, multiplications are replaced with left-shifts and all divisions with right-shifts. This makes the algorithm very efficient and fast for hardware implementations. The algorithm used is similar to the division algorithm except that only one operand is considered. An iterative algorithm is used [2]. The equation for calculating the output exponent value is:

eO = (eA + bias(127) - zA) / 2

3.5 Xilinx IP Core Generator


The IP provides all the necessary IEEE-compliant, highly parameterizable floating-point arithmetic operators, allowing us to control the fraction and exponent word lengths as well as the latency and implementation specifics for the required algorithm. The core chosen here is the floating point addition core, as shown in the figure below:

Fig 6: Snapshot of the Floating Point Core Generator window

The following files are generated after the generation of the core:

XCO
EDN/NGC
SYM
VHO or VEO Template Files
VHD or V Simulation Wrapper Files
VHD or V Source Code Files

4. Simulation Results

4.1 Addition

Operand A = 01000000101101000000000000000000
Operand B = 00111111100011100011010100111111
Output    = 01000000 11010111 10001101 01010000

Fig 7: Simulation results for floating point addition
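For illustration, the stimulus above could be applied to the interface sketched in Section 3 with a small testbench along the following lines. The entity name, port names and the op-code encoding for addition are assumptions, and an architecture for fp_alu must of course be available for the simulation to run.

library ieee;
use ieee.std_logic_1164.all;

entity fp_alu_tb is
end entity fp_alu_tb;

architecture sim of fp_alu_tb is
  signal clk        : std_logic := '0';
  signal start      : std_logic := '0';
  signal op_a, op_b : std_logic_vector(31 downto 0) := (others => '0');
  signal op_code    : std_logic_vector(2 downto 0)  := (others => '0');
  signal round_mode : std_logic_vector(1 downto 0)  := "00";   -- round to nearest even (assumed encoding)
  signal result     : std_logic_vector(31 downto 0);
  signal ready      : std_logic;
  signal exceptions : std_logic_vector(7 downto 0);
begin
  clk <= not clk after 5 ns;   -- free-running clock for simulation only

  dut : entity work.fp_alu
    port map (
      clk => clk, start => start, op_a => op_a, op_b => op_b,
      op_code => op_code, round_mode => round_mode,
      result => result, ready => ready, exceptions => exceptions
    );

  stimulus : process
  begin
    op_a    <= "01000000101101000000000000000000";   -- operand A of Section 4.1
    op_b    <= "00111111100011100011010100111111";   -- operand B of Section 4.1
    op_code <= "000";                                -- assumed encoding for addition
    start   <= '1';
    wait until ready = '1';
    -- expected result per Fig 7: 01000000110101111000110101010000
    wait;
  end process stimulus;
end architecture sim;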


4.2 Subtraction

Operand A = 01000000101101000000000000000000
Operand B = 00111111100011100011010100111111
Output    = 01000000 10010000 01110010 10110000

Fig 8: Simulation results for floating point subtraction

4.3 Multiplication

Operand A = 00111111100000000000000000000000
Operand B = 00111111100000000000000000000000
Output    = 00111111 10000000 00000000 00000000

Fig 9: Simulation results for floating point multiplication

4.4 Division

Operand A = 00111111100000000000000000000000
Operand B = 00111111100000000000000000000000
Output    = 00111111 10000000 00000000 00000000

Fig 10: Simulation results for floating point division


4.5 Square Root

Operand A = 00111111100000000000000000000000
Output    = 00111111 10000000 00000000 00000000

Fig 11: Simulation results for floating point square root

4.6 Floating Point Addition Using IP Core Generator

Simulation results for addition obtained by using the Xilinx IP Core Generator are shown in the following figure. The same operands have been used as in the addition done in the VHDL environment.

Fig 12: Simulation results for floating point addition using IP Core Generator

The simulation results of only floating point addition have been shown for convenience; the other arithmetic operations follow in a similar way.

Table 1: Result in terms of number of clock cycles

OPERATION       NO. OF CYCLES
Addition        4
Subtraction     4
Multiplication  8
Division        8
Square Root     10

Device Utilization Summary:

Number of Slice Registers            2%
Number of Slice LUTs                 10%
Number used as Logic                 10%
Number used as Memory                0%
Number of LUT Flip Flop pairs used   8070
Number with an unused Flip Flop      80%
Number with an unused LUT            8%
Number of fully used LUT-FF pairs    10%
Number of IOs                        112
Number of bonded IOBs                17%
5. Conclusion

The pipelined floating point arithmetic unit has been designed to perform five arithmetic operations, namely addition, subtraction, multiplication, division and square root, on floating point numbers. The IEEE 754 standard based floating point representation has been used. The unit has been coded in VHDL. The same arithmetic operations have also been simulated in the Xilinx IP Core Generator.

Acknowledgment

The authors thank the authorities of Sri Dharmasthala Manjunatheshwara College of Engineering and Technology, Dhavalagiri, Dharwad, Karnataka, India for encouraging them in this research work.

References

IEEE 754-2008 (2008), IEEE Standard for Floating-Point Arithmetic.
Behrooz Parhami (2000), Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press.
Karan Gumber, Sharmelee Thangjam (May 2012), Performance Analysis of Floating Point Adder using VHDL on Reconfigurable Hardware, International Journal of Computer Applications, Volume 46, No. 9.
A. Jaenicke and W. Luk (2001), Parameterized Floating-Point Arithmetic on FPGAs, Proc. of IEEE ICASSP, Vol. 2, pp. 897-900.
L. Louca, T. A. Cook, and W. H. Johnson (1996), Implementation of IEEE Single Precision Floating Point Addition and Multiplication on FPGAs, Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM '96), pp. 107-116.
Suchita Pare, Dr. Rita Jain (December 2012), 32 Bit Floating Point Arithmetic Logic Unit ALU Design and Simulation, International Journal of Emerging Trends in Electronics and Computer Science (IJETECS), Volume 1, Issue 8.
