
By: Mehrnaz Monajati

Instructor: Dr. S.M. Fakhrai


This is a class presentation. All data are copyrighted by
their respective authors, as listed in the references, and
have been used here for educational purposes only.

Fixed vs. Floating Point DSPs

Cost
Ease of use
Accuracy
Dynamic range

Fixed vs. Floating Point DSPs: Cost

Today, fixed-point DSPs continue to benefit more from cost reductions of scale in manufacturing, since they are more often used for high-volume applications.
The same reductions will apply to floating-point DSPs when high-volume demand for the devices appears.
Today, cost has increasingly become an issue of SOC integration and volume, rather than a result of the size of the DSP core itself.

Fixed vs. Floating Point DSPs: Ease of use

In the past
TI floating-point DSPs supported the C language
FXP DSPs were programmed at the assembly code level
Coding of real arithmetic into hardware:
  Directly in FLP
  Indirectly in FXP, through software routines that added development time and extra instructions to the algorithm

Today
TI fixed-point DSPs have long been supported by outstandingly efficient C compilers
The advantage of implementing real arithmetic directly in floating-point hardware still remains
Reduction in FXP programming complexity
FXP DSPs still have an edge in cost and FLP DSPs in ease of use, but the edge has narrowed

Fixed vs. Floating Point DSPs: Accuracy and dynamic range

The accuracy of FLP is greater than that of FXP
FLP has greater precision in integer as well as real values
Exponentiation vastly increases the dynamic range
Internal data representations in FLP DSPs are more exact than in FXP, ensuring greater accuracy in the end result
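To make the dynamic-range claim concrete, the following sketch compares a 16-bit signed fixed-point word with IEEE 754 single precision. The definition of dynamic range as 20·log10(largest/smallest) and the helper name `dynamic_range_db` are assumptions made for illustration; they do not come from the slides.

```python
import math

def dynamic_range_db(largest: float, smallest: float) -> float:
    """Ratio of largest to smallest nonzero magnitude, in decibels."""
    return 20.0 * math.log10(largest / smallest)

# 16-bit signed fixed point: magnitudes run from 1 LSB up to 2**15 - 1.
fxp16_db = dynamic_range_db(2**15 - 1, 1)

# IEEE 754 single precision, normalized numbers only:
# smallest is 2**-126, largest is (2 - 2**-23) * 2**127.
flp32_db = dynamic_range_db((2 - 2**-23) * 2.0**127, 2.0**-126)

print(f"16-bit fixed point: {fxp16_db:.1f} dB")
print(f"32-bit float      : {flp32_db:.1f} dB")
```

Under these assumptions the fixed-point word spans roughly 90 dB, while the floating-point format spans more than 1500 dB, which is the effect of the exponent field.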

Fixed vs. Floating Point DSPs: FXP DSPs

TI's TMS320C62x FXP DSPs
  Two data paths operating in parallel
  Each with a 16-bit word width, providing signed integer values within a range from -2^15 to 2^15 - 1
TMS320C64x DSPs
  Double the overall throughput with four 16-bit multipliers
TMS320C5x and TMS320C2x DSPs
  Designed for handheld and control applications, respectively
  Based on single 16-bit data paths
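As an illustration of how real arithmetic is coded indirectly on a 16-bit fixed-point data path, here is a minimal sketch of fractional multiplication. It assumes the common Q15 convention (one sign bit, 15 fraction bits), which the slides do not name; all function names are hypothetical.

```python
Q15_MIN, Q15_MAX = -2**15, 2**15 - 1

def saturate(x: int) -> int:
    """Clamp to the 16-bit signed range instead of wrapping around."""
    return max(Q15_MIN, min(Q15_MAX, x))

def to_q15(value: float) -> int:
    """Convert a real value in [-1, 1) to a Q15 integer."""
    return saturate(round(value * 2**15))

def from_q15(raw: int) -> float:
    """Convert a Q15 integer back to a real value."""
    return raw / 2**15

def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values.

    The 16x16 product is in Q30, so add half an LSB for rounding,
    shift right by 15 to return to Q15, then saturate.
    """
    product = a * b
    return saturate((product + (1 << 14)) >> 15)

half = to_q15(0.5)
print(from_q15(q15_mul(half, half)))  # 0.25
```

The scaling, rounding and saturation steps are exactly the extra instructions that floating-point hardware makes unnecessary.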

Fixed vs. Floating Point DSPs: FLP DSPs

TMS320C67x FLP DSPs
  Divide a 32-bit data path into two parts: a 24-bit mantissa and an 8-bit exponent
  16M range of precision, supporting a vastly greater dynamic range than is available with the FXP format
The C67x DSP can also perform calculations using industry-standard double-width precision
  64 bits, including a 53-bit mantissa and an 11-bit exponent
  Achieves much greater precision and dynamic range at the expense of speed, since it requires multiple cycles for each operation
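The field widths above can be inspected with a short sketch. It assumes the IEEE 754 storage layout (1 sign bit, 8 exponent bits and 23 stored fraction bits for single precision; 1, 11 and 52 for double), where the implicit leading 1 brings the significand to 24 and 53 bits respectively. The helper names are hypothetical.

```python
import struct

def decode_single(value: float):
    """Return (sign, biased exponent, fraction) of a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def decode_double(value: float):
    """Return (sign, biased exponent, fraction) of a 64-bit float."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

def to_single(value: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

print(decode_single(1.0))    # sign 0, biased exponent 127, fraction 0
print(decode_double(1.0))    # sign 0, biased exponent 1023, fraction 0
print(2**24)                 # 16777216, the "16M" of significand precision
print(to_single(2.0**24 + 1) == 2.0**24 + 1)  # False: needs a 25th bit
```

The last line shows where 24 bits of significand run out: the integer 2^24 + 1 has no exact single-precision representation, while double precision holds it exactly.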

Standards for FLP Number Formats

FLP Number Formats

Sample Floating Point DSPs

AMD - Athlon Processor
Xilinx Virtex-5 APU Floating Point Unit
Digital Core Design DFPAU ver. 2.05

AMD - Athlon Processor, 2000

Includes the most powerful floating point engine for x86 platforms
Delivers twice the peak x87 floating point execution rate of the Intel Pentium III processor
Rivals the FP performance of many RISC processors of that time
Superscalar and superpipelined
  Higher clock frequencies
  Higher overall throughput

Ref. [3]

AMD - Athlon Processor, 2000 (figure from Ref. [3])

Xilinx Virtex-5 APU FLP Unit, 2009

Designed for the PowerPC 440 embedded microprocessor of the Virtex-5 FXT FPGA family
Support for the IEEE-754 standard in single or double precision
Optimized for 2:1 and 3:1 APU:CPU clock ratios, allowing the PowerPC processor to operate at maximum frequency

Applications:
  Digital signal processing of high-quality audio or video signals, where a very large dynamic range is needed to retain fidelity
  Matrix inversion in wireless communications and radar
  Digital signal processing tasks and spectral methods such as the FFT
  Statistical processing, where floating point is often the simplest way to avoid integer overflow and rounding errors

Xilinx Virtex-5 APU FLP Unit, 2009

Increased Processing Capacity
  Hardware floating-point operations complete faster than the equivalent software emulation routines
  The floating-point operators within the FPU are pipelined: multiple floating-point calculations can proceed in parallel
  The FPU is autonomous: the PowerPC processor internal pipeline can continue to execute integer instructions while floating-point operations are handled by the FPU in parallel
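The benefit of pipelined operators can be estimated with an idealized cycle count. This sketch assumes independent operations, one issue per cycle and no stalls; the latency value is a made-up parameter, not a figure from the Xilinx data sheet.

```python
def cycles_unpipelined(n_ops: int, latency: int) -> int:
    """Each operation must finish before the next one starts."""
    return n_ops * latency

def cycles_pipelined(n_ops: int, latency: int) -> int:
    """A new operation enters every cycle; the first result appears
    after `latency` cycles and one more result follows each cycle."""
    return latency + n_ops - 1 if n_ops > 0 else 0

# Hypothetical example: 100 independent operations, 4-cycle latency.
print(cycles_unpipelined(100, 4))  # 400
print(cycles_pipelined(100, 4))    # 103
```

Real code rarely reaches this ideal, because dependent operations have to wait for earlier results.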

IEEE 754-1985 / Book-E Standard Compatibility
  The standard represents very small numbers by allowing significands of the form "0.x" in addition to the usual "1.x" used by normalized FLP numbers
  In Book-E, the multiply part of a multiply-add operation should not round its result before supplying it to the addition part
  The FPU treats all not-a-number (NaN) values as quiet NaNs, which do not cause exceptions. When a floating-point operation results in a NaN because one of the inputs was a NaN, the input NaN is not propagated to the output; the default quiet NaN value is provided. This value is 0x7ff8000000000000 in double precision and 0x7fc00000 in single precision
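The bit patterns involved can be checked directly. The sketch below decodes raw patterns with Python's `struct` module; the helper names are hypothetical. Under IEEE 754 a NaN has an all-ones exponent and a nonzero fraction, so the single-precision counterpart of 0x7ff8000000000000 is 0x7fc00000, whereas 0x7f800000 (all-ones exponent, zero fraction) decodes as +infinity. The quiet-NaN test assumes the common convention, used by PowerPC, that the top fraction bit marks a quiet NaN.

```python
import math
import struct

def double_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def single_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def is_quiet_nan_single(bits: int) -> bool:
    """All-ones exponent with the top fraction bit set."""
    exponent_all_ones = (bits >> 23) & 0xFF == 0xFF
    top_fraction_bit = (bits >> 22) & 1 == 1
    return exponent_all_ones and top_fraction_bit

print(math.isnan(double_from_bits(0x7FF8000000000000)))  # True
print(math.isnan(single_from_bits(0x7FC00000)))          # True
print(single_from_bits(0x7F800000))                      # inf
# Smallest denormalized single: exponent field 0, significand "0.x".
print(single_from_bits(0x00000001) == 2.0**-149)         # True
```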

Xilinx Virtex-5 APU FLP Unit (figure from Ref. [4])

Digital Core Design DFPAU ver. 2.05, 2010

It is an FLP arithmetic co-processor
  Directly replaces C software functions with equivalent, very fast hardware operations
  Significantly accelerates system performance
It doesn't require any programming
  Everything is done automatically during software compilation by the DFPAU C driver
Supports addition, subtraction, multiplication, division, square root, comparison, and absolute value
The input number format follows IEEE-754
Each floating point function can be turned on/off at the configuration level, providing flexible scalability of the DFPAU module
Technology-independent design

Digital Core Design DFPAU ver. 2.05, 2010 (figures from Ref. [5])

Architectural Modification to Improve FLP Unit in FPGAs, 2008 [1]

Variable length shifters account for over 30% of an adder and 25% of a multiplier
Coarse-grained approach: embedded shifter
Fine-grained approach: 4:1 multiplexer

                   Consumed chip area   Saved area   Increased clock rate
Embedded shifter   1.5%                 14.6%        3.3%
4:1 multiplexer    0.48%                7.3%         11.6%
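To show why 4:1 multiplexers suit variable length shifters, the following sketch builds a logical right shifter from multiplexer stages, each consuming two bits of the shift amount. It is a generic software model, not the circuit from Ref. [1]; in hardware every output bit would have its own multiplexer, whereas here each `mux4` call selects a whole word. The names and the 24-bit default width are assumptions.

```python
def mux4(select: int, inputs: list) -> int:
    """4:1 multiplexer: pick one of four inputs with a 2-bit select."""
    return inputs[select & 0b11]

def shift_right_mux4(value: int, amount: int, width: int = 24) -> int:
    """Logical right shift using stages of 4:1 multiplexers.

    Stage k reads bits 2k and 2k+1 of `amount` and shifts by
    0, 1, 2 or 3 times 4**k. Bits of `amount` beyond the last
    stage are ignored.
    """
    value &= (1 << width) - 1
    stage = 0
    while 4**stage < width:
        step = 4**stage
        select = (amount >> (2 * stage)) & 0b11
        value = mux4(select, [value,
                              value >> step,
                              value >> (2 * step),
                              value >> (3 * step)])
        stage += 1
    return value

print(hex(shift_right_mux4(0xABCDEF, 8)))  # 0xabcd
```

A 24-bit shifter needs three such stages, compared with five stages when built from 2:1 selections.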

Low power FLP Unit, 2009 [2]

Designed for embedded systems applications with low power consumption and fast processing
Performs basic operations such as addition, subtraction, multiplication and division
Idea: the functional units (adder, shifter, registers) are shared between different operations
Advantage: saving silicon area
Disadvantage: an increase in the number of cycles required to perform an operation
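The area-versus-cycles trade-off can be illustrated with a shift-and-add multiplier that reuses one adder and one shifter across many cycles instead of instantiating a full multiplier array. This is a generic sketch, not the architecture of Ref. [2]; the function name and the 24-bit default width are assumptions.

```python
def shift_add_multiply(a: int, b: int, width: int = 24):
    """Multiply two unsigned `width`-bit integers sequentially.

    One adder and one shifter are reused every cycle.
    Returns (product, cycles).
    """
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    product = 0
    cycles = 0
    for _ in range(width):
        if b & 1:
            product += a  # the single shared adder
        a <<= 1           # the single shared shifter
        b >>= 1
        cycles += 1
    return product, cycles

product, cycles = shift_add_multiply(0xC00000, 0x800000)
print(hex(product), cycles)
```

In this model a 24-bit mantissa product costs 24 cycles, which is the kind of latency increase the slide lists as the disadvantage of sharing.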

Low power FLP Unit, 2009 (figures from Ref. [2])

Reconfigurable FLP Unit, 2009 [7]

Non-numerical applications usually have very few FLP operations
  The FLP unit is almost always in idle mode
  In idle mode, the floating-point unit still consumes power and its die area is wasted
Idea: a reconfigurable floating-point unit that provides both integer and floating-point operations

Reconfigurable FLP Unit (figures from Ref. [7], including the rAMM array)

References

1. M. Beauchamp, et al., "Architectural modifications to enhance the floating-point performance of FPGAs," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, p. 177, 2008.
2. R. Neves, et al., "A Floating Point Unit Architecture for Low Power Embedded Systems Applications," XXIV SIM - South Symposium on Microelectronics, 2009.
3. AMD Athlon Floating Point Engine, "AMD Athlon Processor Floating Point Capability: The Most Powerful, Architecturally Advanced Floating Point Engine Ever Delivered in an x86 Microprocessor," white paper, 2000.
4. Xilinx, Virtex-5 APU Floating-Point Unit v1.01a, Data Sheet, DS693, 2009.
5. DFPAU floating-point pipelined divider, 2010, <http://www.altera.com>.
6. G. Frantz and R. Simar, "Comparing Fixed and Floating Point DSPs," SPRY061, Texas Instruments, 2004.
7. Y. Lee and J. Jou, "Design of A Reconfigurable Floating-Point Unit," 2009.

Embedded shifter block diagram (figure from Ref. [1])

4:1 Multiplexer (figure from Ref. [1])
