Floating Point Representation

Floating-Point Representation
http://fourier.eng.hmc.edu/e85/lectures/arithmetic_html/node11.html
Decimal Cases

In programming, a floating-point number is expressed in exponent (scientific) notation. In general, a floating-point number can be written as

$$\pm M \times B^E$$

where M is the fraction mantissa or significand, E is the exponent, and B is the base; in the decimal case $B = 10$.

Binary Cases

As an example, a 32-bit word is used in a MIPS computer to represent a floating-point number:
S (1 bit) | E (8 bits) | M (23 bits)

representing $(-1)^S \times M \times 2^E$, where S is the sign bit, E the exponent field and M the significand field.

The implied base is 2 (not explicitly shown in the representation).
The exponent can be represented in signed 2's complement (but see also the biased notation later).
The implied binary point is between the exponent field E and the significand field M.
More bits in field E mean a larger range of representable values.
More bits in field M mean higher precision.
Zero is represented by all bits equal to 0.

Normalization

To use the bits available for the significand efficiently, the significand is shifted to the left until all leading 0's disappear (as they make no contribution to the precision). The value is kept unchanged by adjusting the exponent accordingly.

Moreover, as the MSB of a normalized significand is always 1, it does not need to be stored explicitly. The significand can be shifted to the left by one more bit to gain one more bit of precision. The first bit 1 before the binary point is then implicit, and the actual value represented is

$$(-1)^S \times 1.M \times 2^E$$
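The normalization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the significand is held as a string of bits to the right of the binary point, and the function names `normalize` and `value` as well as the 9-bit sample pattern are hypothetical, not taken from the original notes.

```python
def normalize(significand_bits: str, exponent: int, implied_one: bool = False):
    """Normalize 0.b1b2... x 2^exponent without changing its value.

    With implied_one=True the leading 1 is shifted out of the field as well,
    so the result is read as 1.b1b2... x 2^exponent.
    """
    width = len(significand_bits)
    if "1" not in significand_bits:
        return significand_bits, exponent      # zero cannot be normalized
    shift = significand_bits.index("1")        # number of leading zeros
    if implied_one:
        shift += 1                             # drop the leading 1 too
    bits = significand_bits[shift:].ljust(width, "0")
    return bits, exponent - shift              # one exponent step per shifted bit


def value(significand_bits: str, exponent: int, implied_one: bool = False) -> float:
    """Value of the (significand, exponent) pair, for checking only."""
    frac = int(significand_bits, 2) / (1 << len(significand_bits))
    return ((1.0 + frac) if implied_one else frac) * 2.0 ** exponent


# Hypothetical 9-bit significand with three leading zeros
print(normalize("000101100", 5))               # without implied 1
print(normalize("000101100", 5, True))         # with implied 1
```

Both normalized forms represent the same value as the unnormalized input; the implied-1 form simply frees one more bit of the field.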
However, to avoid possible confusion, in the following the default normalization does not assume this implicit 1 unless otherwise specified. Zero is represented by all 0's and is not (and cannot be) normalized.

Example: A binary number can be represented in 14-bit floating-point form (1 sign bit, a 4-bit exponent field and a 9-bit significand field) in several ways, including with an implied 1.0. By normalization, the highest precision can be achieved.

Biased Notation for Exponent

To simplify the hardware for comparing two exponents (so that simple integer sorting can be used rather than subtraction), we may want to avoid the 2's complement representation for the exponent. This can be done by adding a 1 at the MSB of the exponent field, i.e., adding a bias of $2^{e-1}$ for an e-bit field; the resulting representation is called biased notation. Consider a 5-bit exponent field (range of exponents: $-16 \le E \le 15$), for which the bias is $2^4 = 16$.
The bias depends on the number of bits in the exponent field. If there are e bits in this field, the bias is $2^{e-1}$, which lifts the representation (not the actual exponent) by half of the range to get rid of the negative part represented in 2's complement. The range of actual exponents represented is still the same. With the biased exponent, the value represented by the notation is:

$$(-1)^S \times M \times 2^{E - 2^{e-1}}$$

where S is the sign bit, M the significand and E the biased exponent field.
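To make the relation between 2's complement and biased notation concrete, the following minimal sketch prints both encodings for a few exponents of a 5-bit field. The helper names `to_biased`, `to_twos_complement` and `from_biased` are assumptions introduced for this illustration, and the bias is taken to be $2^{e-1}$ as defined above.

```python
def to_biased(exponent: int, e: int) -> str:
    """Encode an actual exponent in e-bit biased notation (bias 2^(e-1))."""
    bias = 1 << (e - 1)
    if not -bias <= exponent < bias:
        raise ValueError("exponent out of range for this field width")
    return format(exponent + bias, f"0{e}b")


def to_twos_complement(exponent: int, e: int) -> str:
    """Encode the same exponent in e-bit 2's complement, for comparison."""
    return format(exponent & ((1 << e) - 1), f"0{e}b")


def from_biased(field: str) -> int:
    """Recover the actual exponent from a biased bit pattern."""
    return int(field, 2) - (1 << (len(field) - 1))


for x in (-16, -1, 0, 1, 15):
    print(f"{x:>4}  2's complement {to_twos_complement(x, 5)}  biased {to_biased(x, 5)}")
```

The biased patterns sort in the same order as the exponents they represent when compared as unsigned integers, which is the property the hardware exploits.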
Floating-Point Notation of IEEE 754

The IEEE 754 floating-point standard uses 32 bits to represent a single-precision floating-point number: 1 sign bit, 8 exponent bits and 23 bits for the significand. As the implied base is 2, an implied 1 is used, i.e., the significand effectively has 24 bits, including the 1 implied bit to the left of the binary point that is not explicitly represented in the notation. Note in particular that in IEEE 754 notation the bias for the 8-bit exponent is $2^{e-1} - 1 = 127$ (instead of $2^{e-1} = 128$).

The 8-bit exponent field:
Note:
Zero exponent is represented by $127 = 01111111_2$, the bias of the notation.
The range of exponents representable for normalized numbers is from -126 to 127.
The exponent field $255 = 11111111_2$ is reserved: with an all-zero significand it represents infinity (which may occur when, e.g., a nonzero number is divided by zero), and with a nonzero significand it represents not-a-number (NaN), e.g., the result of 0/0.
The smallest exponent field $0 = 00000000_2$ is reserved to represent zero and denormalized numbers (magnitudes smaller than $2^{-126}$, which cannot be normalized).
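The reserved exponent values listed above can be checked on real bit patterns. The sketch below uses Python's standard `struct` module to split a single-precision number into its three fields and classify it; the function name `decode_single` and the category labels are assumptions made for this illustration.

```python
import struct


def decode_single(x: float):
    """Split x (rounded to IEEE 754 single precision) into its fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                    # 1 sign bit
    exp_field = (bits >> 23) & 0xFF      # 8-bit biased exponent
    fraction = bits & 0x7FFFFF           # 23 stored significand bits
    if exp_field == 255:
        kind = "infinity" if fraction == 0 else "NaN"
    elif exp_field == 0:
        kind = "zero" if fraction == 0 else "denormalized"
    else:
        kind = "normalized"
    return sign, exp_field, fraction, kind


for x in (1.0, 2.0 ** -130, 0.0, float("inf"), float("nan")):
    print(x, decode_single(x))
```

For a normalized number the actual exponent is the exponent field minus 127; for a denormalized number the exponent is fixed at -126 and no implied 1 is assumed.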
Other Implied Bases

Given e bits for the exponent field, the range of exponent values representable is $-2^{e-1} \le E \le 2^{e-1}-1$ and the range of magnitudes representable is about $2^{-2^{e-1}}$ to $2^{2^{e-1}-1}$.
For example, if
, the range of exponent values representable is
and the range of magnitudes representable is
This range can be extended by (a) increasing the number of bits for the exponent, or (b) increasing the implied base from 2 to 4, 8, 16, etc. In general, with implied base $B = 2^q$, the range of magnitudes representable is about $B^{-2^{e-1}} = 2^{-q\,2^{e-1}}$ to $B^{2^{e-1}-1} = 2^{q(2^{e-1}-1)}$.
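The effect of the field width e and of an implied base $B = 2^q$ on the representable range can be tabulated with a short sketch. The function names are hypothetical, the sample values of e and q are chosen only for illustration, and the ranges are returned as powers of 2 (ignoring the contribution of the significand) so that large bases do not overflow.

```python
def exponent_range(e: int):
    """Smallest and largest exponent of an e-bit field with bias 2^(e-1)."""
    half = 1 << (e - 1)
    return -half, half - 1


def magnitude_range_log2(e: int, q: int = 1):
    """Approximate magnitude range as powers of 2 for implied base 2^q."""
    lo, hi = exponent_range(e)
    return q * lo, q * hi


# Illustrative field widths and bases (assumed, not from the original notes)
for e, q in ((8, 1), (7, 1), (7, 2), (7, 4)):
    lo, hi = magnitude_range_log2(e, q)
    print(f"e={e}, base=2^{q}: about 2^{lo} to 2^{hi}")
```

Raising the base widens the range without spending more exponent bits, at the cost of coarser normalization.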
Normalization: If the implied base is $B = 2^q$, the significand must be shifted a multiple of q bits at a time so that the exponent can be adjusted correspondingly to keep the value unchanged. If at least one of the first q bits of the significand is 1, the representation is normalized. Obviously, the implied 1 can no longer be used.

Examples:

Normalize . Note that the base is 4 (instead of 2)
Note that the significand has to be shifted to the left two bits at a time during normalization, because the smallest reduction of the exponent is 1, which corresponds to dividing the value by 4; shifting the significand left by two bits multiplies it by 4 and keeps the value represented unchanged. Similarly, if the implied base is $8 = 2^3$, the significand has to be shifted 3 bits at a time. In general, if $B = 2^q$, normalization means left-shifting the significand q bits at a time until there is at least one 1 in the highest q bits of the significand. Obviously, the implied 1 cannot be used.

Represent in biased notation with and implied base is 2. bits for exponent field. The bias is
The biased exponent is
, and the notation is (without implied 1):
or (with implied 1):
Find the value represented in this biased notation:
The biased exponent is 17, the actual exponent is
, the value is (without implied 1):
or (with implied 1):
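The rule for normalizing with an implied base $B = 2^q$ (shift q bits at a time until one of the highest q bits is 1) can be sketched as follows. The function names and the 12-bit sample significand are assumptions for illustration and are not the examples from the original notes.

```python
def normalize_base(significand_bits: str, exponent: int, q: int):
    """Normalize 0.b1b2... x (2^q)^exponent by shifting q bits at a time."""
    if "1" not in significand_bits:
        return significand_bits, exponent      # zero cannot be normalized
    bits = significand_bits
    while "1" not in bits[:q]:                 # highest q bits are all 0
        bits = bits[q:] + "0" * q              # shift left by q bits
        exponent -= 1                          # divide by 2^q to compensate
    return bits, exponent


def value_base(significand_bits: str, exponent: int, q: int) -> float:
    """Value of the pair under implied base 2^q, for checking only."""
    frac = int(significand_bits, 2) / (1 << len(significand_bits))
    return frac * (2.0 ** q) ** exponent


print(normalize_base("000001101000", 3, 2))    # implied base 4
print(normalize_base("000001101000", 3, 3))    # implied base 8
```

Note that the normalized significand may still start with up to q-1 zeros, which is why the implied 1 cannot be used with these bases.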
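Examples of IEEE 754:

-0.3125
$-0.3125 = -0.0101_2 = -1.01_2 \times 2^{-2}$. The biased exponent is $-2 + 127 = 125 = 01111101_2$:
1 01111101 01000000000000000000000

1.0
$1.0 = 1.0_2 \times 2^0$. The biased exponent is $0 + 127 = 127 = 01111111_2$:
0 01111111 00000000000000000000000

37.5
$37.5 = 100101.1_2 = 1.001011_2 \times 2^5$. The biased exponent is $5 + 127 = 132 = 10000100_2$:
0 10000100 00101100000000000000000

-78.25
$-78.25 = -1001110.01_2 = -1.00111001_2 \times 2^6$. The biased exponent is $6 + 127 = 133 = 10000101_2$:
1 10000101 00111001000000000000000

As the most negative exponent representable is -126, a value whose magnitude is smaller than $2^{-126}$ is a denorm, which cannot be normalized.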
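The encodings of the example values -0.3125, 1.0, 37.5 and -78.25 can be cross-checked against the machine representation. This is a minimal sketch using Python's standard `struct` module; the helper name `ieee754_fields` is an assumption introduced here.

```python
import struct


def ieee754_fields(x: float):
    """Return (sign, exponent, significand) bit strings of x as a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = format(bits, "032b")
    return s[0], s[1:9], s[9:]


for x in (-0.3125, 1.0, 37.5, -78.25):
    sign, exp, frac = ieee754_fields(x)
    print(f"{x:>8}: {sign} {exp} {frac}  (biased exponent {int(exp, 2)})")
```

Subtracting 127 from the printed biased exponent gives the actual exponent of the normalized value.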
Can you answer the following questions regarding the 32-bit IEEE 754 floating-point representation, and explain why?
What is the largest magnitude (absolute value) representable?
What is the smallest magnitude (absolute value) representable?
What is the largest gap between two consecutive numbers?
What is the smallest gap between two consecutive numbers?
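One way to check an answer to these questions numerically is to build the extreme bit patterns directly and look at the values they decode to. The sketch below is only an exploration aid under the assumption that "consecutive numbers" means adjacent finite bit patterns; the helper name `from_bits` is hypothetical.

```python
import struct


def from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


largest = from_bits(0x7F7FFFFF)            # exponent field 254, significand all 1's
smallest_normal = from_bits(0x00800000)    # exponent field 1, significand all 0's
smallest_denormal = from_bits(0x00000001)  # exponent field 0, lowest bit set
largest_gap = largest - from_bits(0x7F7FFFFE)
smallest_gap = from_bits(0x00000002) - smallest_denormal

for name, v in (("largest", largest), ("smallest normal", smallest_normal),
                ("smallest denormal", smallest_denormal),
                ("largest gap", largest_gap), ("smallest gap", smallest_gap)):
    print(f"{name:>18}: {v!r}")
```

The gap between neighbours grows with the exponent because one step in the last significand bit is scaled by the same power of 2 as the number itself.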
Ruye Wang 2003-10-24
