DSP Floating Point Formats

By: Mehrnaz Monajati
Instructor: Dr. S.M. Fakhrai

This is a class presentation. All material is the copyright of its
respective authors, as listed in the references, and is used here
for educational purposes only.

Fixed vs. Floating Point DSPs
Cost
Ease of use
Accuracy
Dynamic range

Fixed vs. Floating Point DSPs: Cost
Today, fixed-point DSPs continue to benefit more from economies of scale in manufacturing, since they are more often used for high-volume applications.
The same cost reductions will apply to floating-point DSPs when high-volume demand for the devices appears.
Cost has increasingly become an issue of SOC integration and volume, rather than a result of the size of the DSP core itself.

Fixed vs. Floating Point DSPs: Ease of use
Programming in the past
  TI floating-point DSPs supported the C language
  FXP DSPs were programmed at the assembly code level
  Real arithmetic was coded directly in FLP hardware, but only indirectly in FXP, through software routines that added development time and extra instructions to the algorithm
Programming today
  TI fixed-point DSPs have long been supported by outstandingly efficient C compilers
  Reduction in FXP complexity
  The advantage of implementing real arithmetic directly in floating-point hardware still remains
  FXP DSPs still have an edge in cost and FLP DSPs in ease of use, but the edge has narrowed
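
The "extra instructions" that fixed-point code needs for real arithmetic can be illustrated with a minimal sketch. It assumes the common Q15 convention (a 16-bit signed integer n stands for n / 2^15); the helper names to_q15, from_q15 and q15_mul are hypothetical and are not taken from any TI library.

```python
Q15_ONE = 1 << 15  # assumed scale factor: Q15 stores x as round(x * 2**15)


def saturate16(n):
    """Clamp an integer to the signed 16-bit range."""
    return max(-Q15_ONE, min(Q15_ONE - 1, n))


def to_q15(x):
    """Quantize a real value in [-1, 1) to a Q15 integer."""
    return saturate16(int(round(x * Q15_ONE)))


def from_q15(n):
    """Convert a Q15 integer back to a real value."""
    return n / Q15_ONE


def q15_mul(a, b):
    """Fixed-point multiply: the programmer handles scaling explicitly."""
    product = a * b          # Q30 intermediate (needs a 32-bit register)
    product += 1 << 14       # add half an LSB so the shift rounds
    return saturate16(product >> 15)  # rescale to Q15 and saturate


def float_mul(a, b):
    """Floating-point multiply: the hardware handles scaling."""
    return a * b
```

In this sketch q15_mul(to_q15(0.5), to_q15(0.25)) yields the Q15 integer for 0.125, while the floating-point version is a single multiplication with no rounding, shifting or saturation written by the programmer.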

Fixed vs. Floating Point DSPs: Accuracy and dynamic range
Accuracy of FLP is greater than that of FXP: a 32-bit FLP word carries a 24-bit mantissa, against the 16-bit word of a typical FXP DSP
FLP has greater precision in integer as well as real values
Exponentiation vastly increases the dynamic range
Internal data representations in FLP DSPs are more exact than in FXP, ensuring greater accuracy in the end result
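
A rough way to put numbers on the dynamic range claim is sketched below. It assumes dynamic range is defined as 20*log10(largest magnitude / smallest nonzero magnitude), and it counts only normalized values for the 32-bit floating-point format; the function names are illustrative.

```python
import math


def dynamic_range_db(largest, smallest):
    """Dynamic range in dB under the assumed 20*log10(max/min) definition."""
    return 20.0 * math.log10(largest / smallest)


def fixed_point_range_db(bits):
    """Signed two's complement: magnitudes from 1 LSB up to 2**(bits - 1)."""
    return dynamic_range_db(2 ** (bits - 1), 1)


def float32_range_db():
    """IEEE 754 single precision, normalized values only."""
    largest = (2 - 2 ** -23) * 2.0 ** 127   # largest finite value
    smallest = 2.0 ** -126                  # smallest normalized value
    return dynamic_range_db(largest, smallest)
```

Under these assumptions a 16-bit fixed-point word spans about 90 dB, while a 32-bit floating-point word spans more than 1500 dB, which is the effect of the exponent field.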

Fixed vs. Floating Point DSPs: FXP DSPs
TI's TMS320C62x FXP DSPs
  Two data paths operating in parallel
  Each with a 16-bit word width
  Provides signed integer values within a range from -2^15 to 2^15 - 1
TMS320C64x DSPs
  Double the overall throughput with four 16-bit multipliers
TMS320C5x and TMS320C2x DSPs
  Designed for handheld and control applications, respectively
  Based on single 16-bit data paths

Fixed vs. Floating Point DSPs: FLP DSPs
TMS320C67x FLP DSPs
  Divide a 32-bit data path into two parts: a 24-bit mantissa and an 8-bit exponent
  The 24-bit mantissa gives a 16M range of precision (2^24 = 16,777,216)
  The exponent supports a vastly greater dynamic range than is available with the FXP format
The C67x DSP can also perform calculations using industry-standard double-width precision
  64 bits, including a 53-bit mantissa and an 11-bit exponent
  Achieves much greater precision and dynamic range at the expense of speed, since it requires multiple cycles for each operation
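
The field split described above can be inspected with a short sketch. It assumes the IEEE 754 layout, in which single precision stores 1 sign bit, 8 exponent bits and 23 fraction bits (the 24th mantissa bit is an implicit leading 1), and double precision stores 1 sign bit, 11 exponent bits and 52 fraction bits (53-bit mantissa with the implicit bit). The function names are illustrative.

```python
import struct


def float32_fields(x):
    """Return (sign, biased exponent, fraction) of x as IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def float64_fields(x):
    """Return (sign, biased exponent, fraction) of x as IEEE 754 double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]
```

The "16M range of precision" shows up as the largest run of consecutive integers that single precision holds exactly: to_float32(16777216.0) is unchanged, while to_float32(16777217.0) rounds back to 16777216.0.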

Standards for FLP Number Formats
FLP Number Formats
Sample Floating Point DSPs
AMD Athlon Processor
Xilinx Virtex-5 APU Floating Point Unit
Digital Core Design DFPAU ver. 2.05

AMD Athlon Processor, 2000
Includes the most powerful floating point engine for x86 platforms
Delivers twice the peak x87 floating point execution rate of the Intel Pentium III processor
Rivals the FP performance of many RISC processors of that time
Superscalar and superpipelined
  Higher clock frequencies
  Higher overall throughput
Ref. [3]
[Figure: AMD Athlon Processor, 2000, Ref. [3]]

Xilinx Virtex-5 APU FLP Unit, 2009
Designed for the PowerPC 440 embedded microprocessor of the Virtex-5 FXT FPGA family
Supports the IEEE-754 standard in single or double precision
Optimized for 2:1 and 3:1 APU:CPU clock ratios, allowing the PowerPC processor to operate at maximum frequency
Applications:
  Digital signal processing of high-quality audio or video signals, where a very large dynamic range is needed to retain fidelity
  Matrix inversion in wireless communications and radar
  Digital signal processing tasks, including spectral methods such as FFT
  Statistical processing, where floating-point is often the simplest way to avoid integer overflow and rounding errors

Xilinx Virtex-5 APU FLP Unit, 2009
Increased Processing Capacity
  Hardware floating-point operations complete faster than the equivalent software emulation routines
  The floating-point operators within the FPU are pipelined, so multiple floating-point calculations can proceed in parallel
  The FPU is autonomous: the PowerPC processor internal pipeline can continue to execute integer instructions while floating-point operations are handled by the FPU in parallel
IEEE 754-1985 / Book-E Standard Compatibility
  The standard represents very small (denormalized) numbers by allowing significands of the form "0.x" in addition to the usual "1.x" used by normalized FLP numbers
  In Book-E, the multiply part of a multiply-add operation should not round its result before supplying it to the addition part
  The FPU treats all not-a-number (NaN) values as quiet NaNs, which do not cause exceptions. When a floating-point operation results in a NaN because one of the inputs was a NaN, the input NaN is not propagated to the output; the default quiet NaN value is provided. This value is 0x7ff8000000000000 in double precision and 0x7fc00000 in single precision (0x7f800000 is the single-precision encoding of +infinity, not of a NaN)
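
The special encodings and the multiply-add rounding rule mentioned above can be checked with a small sketch. It assumes the common convention that the most significant fraction bit marks a quiet NaN, and it models the unrounded multiply with exact rational arithmetic; it is an illustration of the IEEE 754 encodings, not a model of the Xilinx FPU, and all function names are hypothetical.

```python
import math
import struct
from fractions import Fraction


def bits_to_float32(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def bits_to_float64(bits):
    """Reinterpret a 64-bit pattern as an IEEE 754 double-precision value."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def classify32(bits):
    """Classify a single-precision bit pattern by its exponent and fraction."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        # Assumed convention: top fraction bit set means quiet NaN.
        return "quiet NaN" if fraction & 0x400000 else "signaling NaN"
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    return "normalized"


def unfused_multiply_add(a, b, c):
    """The product is rounded to double before the addition."""
    return a * b + c


def fused_multiply_add(a, b, c):
    """The exact product feeds the addition; only the final sum is rounded."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))
```

With a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, the exact product is 1 - 2^-60. The unfused version rounds that product to 1.0 and returns 0.0, whereas the fused version returns -2^-60, which is why the multiply part should not round before the addition.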

[Figure: Xilinx Virtex-5 APU FLP Unit, Ref. [4]]

Digital Core Design DFPAU ver. 2.05, 2010
It is an FLP arithmetic co-processor
  Directly replaces C software functions with equivalent, very fast hardware operations
  Significantly accelerates system performance
It does not require any programming
  Everything is done automatically during software compilation by the DFPAU C driver
Supports addition, subtraction, multiplication, division, square root, comparison and absolute value
The input number format follows IEEE-754
Each floating point function can be turned on or off at configuration level, which makes the DFPAU module flexibly scalable
Technology-independent design

[Figures: Digital Core Design DFPAU ver. 2.05, 2010, Ref. [5]]

Architectural Modification to Improve FLP Unit in FPGAs, 2008 [1]
Variable length shifters account for over 30% of an adder and 25% of a multiplier
Coarse-grained approach: embedded shifter
Fine-grained approach: 4:1 multiplexer

                    Consumed chip area   Saved area   Increased clock rate
Embedded shifter    1.5%                 14.6%        3.3%
4:1 multiplexer     0.48%                7.3%         11.6%
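
To see why variable length shifters take up so much of a floating-point adder, consider a minimal sketch of the addition data path. It assumes positive operands represented as (exponent, 24-bit normalized mantissa) pairs with value mantissa * 2^(exponent - 23), and it truncates instead of rounding; it is an illustration only and is not the design evaluated in Ref. [1].

```python
MANT_BITS = 24  # assumed mantissa width, including the leading 1


def to_value(exp, mant):
    """Real value of an (exponent, mantissa) pair in this sketch."""
    return mant * 2.0 ** (exp - (MANT_BITS - 1))


def fp_add(exp_a, mant_a, exp_b, mant_b):
    """Add two positive values; returns a normalized (exponent, mantissa)."""
    # Make operand a the one with the larger exponent.
    if exp_a < exp_b:
        exp_a, mant_a, exp_b, mant_b = exp_b, mant_b, exp_a, mant_a

    # Alignment: the shift distance depends on the data, so the hardware
    # needs a variable length (barrel) shifter here.
    shift = exp_a - exp_b
    mant_b >>= shift  # bits shifted out are truncated in this sketch

    total = mant_a + mant_b
    exp = exp_a

    # Normalization: a carry out of the top bit needs a one-bit right shift.
    if total >> MANT_BITS:
        total >>= 1
        exp += 1
    return exp, total
```

For example, adding 1.5 (exponent 0, mantissa 0xC00000) and 0.25 (exponent -2, mantissa 0x800000) shifts the second mantissa right by two positions before the integer addition. Subtraction additionally needs a variable left shift to renormalize the result, which is not shown here.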

Low power FLP Unit, 2009 [2]
Designed for embedded systems applications with low power consumption and fast processing
Performs basic operations such as addition, subtraction, multiplication and division
Idea: the functional units (adder, shifter, registers) are shared between different operations
Advantage: saves silicon area
Disadvantage: increases the number of cycles required to perform each operation
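
The area versus cycle-count trade-off can be illustrated with a shift-and-add multiplier that reuses a single adder and a single shifter instead of a dedicated multiplier array. This is a generic sketch of resource sharing, assuming unsigned mantissas of a given width; it is not the architecture of Ref. [2], and the function name is hypothetical.

```python
def shift_add_multiply(a, b, width=24):
    """Multiply two unsigned width-bit values with one adder and one shifter.

    Returns (product, cycles), where cycles counts one pass through the
    shared adder/shifter per multiplier bit.
    """
    acc = 0
    cycles = 0
    for _ in range(width):
        if b & 1:
            acc += a   # the shared adder
        a <<= 1        # the shared shifter
        b >>= 1
        cycles += 1
    return acc, cycles
```

A 24-bit mantissa multiplication takes 24 passes in this sketch, where a dedicated parallel multiplier would take far fewer cycles at the cost of more silicon area.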

[Figures: Low power FLP Unit, 2009, Ref. [2]]

Reconfigurable FLP Unit, 2009 [7]
Non-numerical applications usually have very few FLP operations
  The FLP unit is idle most of the time
  In idle mode, the floating-point unit still consumes power and its die area is wasted
Idea: a reconfigurable floating-point unit that provides both integer and floating-point operations

[Figures: Reconfigurable FLP Unit, rAMM Array, Ref. [7]]

References
1. M. Beauchamp, et al., "Architectural modifications to enhance the floating-point performance of FPGAs," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, p. 177, 2008.
2. R. Neves, et al., "A Floating Point Unit Architecture for Low Power Embedded Systems Applications," XXIV SIM - South Symposium on Microelectronics, 2009.
3. AMD Athlon Floating Point Engine, "AMD Athlon Processor Floating Point Capability, The Most Powerful, Architecturally Advanced Floating Point Engine Ever Delivered in an x86 Microprocessor," white paper, 2000.
4. Xilinx DS693 Virtex-5 APU Floating-Point Unit v1.01a, Data Sheet, DS693, 2009.
5. DFPAU floating-point pipelined divider, 2010, <http://www.altera.com>.
6. G. Frantz and R. Simar, "Comparing Fixed and Floating Point DSPs," SPRY061, Texas Instruments, 2004.
7. Y. Lee and J. Jou, "Design of A Reconfigurable Floating-Point Unit," 2009.

[Figure: Embedded shifter block diagram, Ref. [1]]
[Figure: 4:1 Multiplexer, Ref. [1]]
