
Floating Point Numbers

CS1251
Computer Organization
Carl Hamacher

4/2/2014

Department of Information
Technology

Scientific Notation
Components (Normalized)

Whole number
Fraction
Exponent

Examples


$2.9979 \times 10^{8}$
$6.6254 \times 10^{-27}$
$6.0247 \times 10^{23}$


IEEE Floating Point Standard


Base 10

$X_1.X_2X_3X_4X_5X_6X_7 \times 10^{Y_1Y_2}$

Binary

$1.M \times 2^{E}$

IEEE Standard (Single Precision = 32 bit)


Sign (1-bit): S
Exponent (8-bit, excess-127): E' = E + 127
Mantissa (23-bit): M
$1.M \times 2^{E'-127}$
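
The field layout above can be checked on a real value. The following is a minimal sketch, assuming Python's standard `struct` module is used to obtain the 32-bit pattern; the helper names `decode_single` and `value_from_fields` are hypothetical and handle normalized numbers only.

```python
import struct

def decode_single(x: float):
    """Split x, stored as IEEE single precision, into (S, E', M)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                  # 1-bit sign S
    biased_exp = (word >> 23) & 0xFF   # 8-bit excess-127 exponent E'
    mantissa = word & 0x7FFFFF         # 23-bit mantissa M
    return sign, biased_exp, mantissa

def value_from_fields(sign: int, biased_exp: int, mantissa: int) -> float:
    """Evaluate 1.M x 2^(E'-127) with the sign applied (normalized numbers only)."""
    significand = 1 + mantissa / 2**23
    return (-1) ** sign * significand * 2.0 ** (biased_exp - 127)

s, e, m = decode_single(-5.25)
print(s, e, f"{m:023b}")            # 1 129 01010000000000000000000
print(value_from_fields(s, e, m))   # -5.25
```

Here E' = 129 corresponds to E = 129 - 127 = 2, so the value is 1.0101 (binary) times 2^2 with a negative sign.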

IEEE Floating Point Standard


Base 10

$X_1.X_2X_3X_4X_5X_6X_7 \ldots X_{16} \times 10^{Y_1Y_2Y_3}$

Binary

$1.M \times 2^{E}$

IEEE Standard (Double Precision = 64 bit)


Sign (1-bit): S
Exponent (11-bit, excess-1023): E' = E + 1023
Mantissa (52-bit): M
$1.M \times 2^{E'-1023}$
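
Going the other way, a double can be assembled from its three fields. This is a small sketch, assuming the standard `struct` module reinterprets the 64-bit word; `encode_double` is a hypothetical helper name, and no range checking is done on the arguments.

```python
import struct

def encode_double(sign: int, biased_exp: int, mantissa: int) -> float:
    """Build an IEEE double from S (1 bit), E' (11 bits) and M (52 bits)."""
    word = (sign << 63) | (biased_exp << 52) | mantissa
    (value,) = struct.unpack(">d", struct.pack(">Q", word))
    return value

# 35.5 = 1.000111 (binary) x 2^5, so E' = 5 + 1023 and M = 000111 followed by zeros.
mantissa = 0b000111 << 46   # 6 leading mantissa bits, then 46 zero bits
print(encode_double(0, 5 + 1023, mantissa))   # 35.5
```

The three field widths, 1 + 11 + 52, account for all 64 bits of the word.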

Examples
Single Precision (32-bits)

000101000001010...0 = $+1.001010\ldots0 \times 2^{-87}$

110000001010100...0 = $-1.010100\ldots0 \times 2^{2}$
= $-101.01_2$
= $-5.25_{10}$

Double Precision (64-bits)

100000010111100000...1 = $-1.100000\ldots1 \times 2^{-1000}$

010000000100000111...0 = $+1.000111\ldots0 \times 2^{5}$
= $+100011.1_2$
= $+35.5_{10}$


Special Considerations
Normalization

Shift Fraction
Adjust Exponent

Guard Bits

Extra bits retained during intermediate steps

Truncation

Removing guard bits to create approximation


Chopping (biased)
Rounding (unbiased)
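
The difference between the two truncation methods can be sketched on an integer mantissa that still carries its guard bits. This sketch assumes "rounding" means round-to-nearest with ties going to the even value, which is the usual unbiased choice; the function names are hypothetical, and at least one guard bit is assumed.

```python
def chop(mantissa: int, guard_bits: int) -> int:
    """Chopping: simply discard the guard bits (always rounds toward zero)."""
    return mantissa >> guard_bits

def round_nearest_even(mantissa: int, guard_bits: int) -> int:
    """Rounding: go to the nearest value, breaking exact ties toward an even result."""
    kept = mantissa >> guard_bits
    dropped = mantissa & ((1 << guard_bits) - 1)
    half = 1 << (guard_bits - 1)
    if dropped > half or (dropped == half and kept & 1):
        kept += 1   # may carry out of the mantissa; the caller must then renormalize
    return kept

# Four mantissa bits followed by three guard bits.
print(f"{chop(0b1011110, 3):04b}")                 # 1011
print(f"{round_nearest_even(0b1011110, 3):04b}")   # 1100
print(f"{round_nearest_even(0b1010100, 3):04b}")   # 1010 (tie, already even)
```

Chopping never increases the magnitude, so its error always has the same sign; rounding errors fall on both sides and tend to cancel.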

Special Considerations
Special Values

E' = 0 and M = 0: Zero
E' = 255 and M = 0: Infinity
E' = 0 and M ≠ 0: Denormal ($0.M \times 2^{-126}$)
E' = 255 and M ≠ 0: Not a Number (e.g. 0/0)
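
The four single-precision special cases can be expressed as a small classifier on E' and M. This is an illustrative sketch; `classify_single` and `fields` are hypothetical names, and `struct` is used only to obtain the fields of real values.

```python
import struct

def classify_single(biased_exp: int, mantissa: int) -> str:
    """Classify a single-precision value from its E' and M fields."""
    if biased_exp == 0:
        return "zero" if mantissa == 0 else "denormal"
    if biased_exp == 255:
        return "infinity" if mantissa == 0 else "NaN"
    return "normalized"

def fields(x: float):
    """Return (E', M) of x stored as IEEE single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return (word >> 23) & 0xFF, word & 0x7FFFFF

for x in (0.0, 1e-40, 1.5, float("inf"), float("nan")):
    print(x, classify_single(*fields(x)))
```

The value 1e-40 is below the smallest normalized single (about 1.18e-38), so it is stored as a denormal.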

Exceptions (Flags)

Overflow and Underflow (out of range)


Invalid and Divide by Zero
Inexact (truncation)
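
Python's built-in binary floats do not expose these flags directly, so the sketch below uses the standard `decimal` module as a stand-in: it works in base 10 rather than base 2, but it records a similar set of flags. The tiny context (4 digits, exponent range -9 to 9) and the helper name `flags_after` are assumptions chosen only to make each condition easy to trigger.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Underflow)

SIGNALS = (Overflow, Underflow, InvalidOperation, DivisionByZero, Inexact)

def flags_after(operation):
    """Run one operation in a fresh, non-trapping context and list the flags it set."""
    ctx = Context(prec=4, Emax=9, Emin=-9, traps=[])
    operation(ctx)
    return {sig.__name__ for sig in SIGNALS if ctx.flags[sig]}

print(flags_after(lambda c: c.divide(Decimal(1), Decimal(3))))             # Inexact
print(flags_after(lambda c: c.divide(Decimal(1), Decimal(0))))             # DivisionByZero
print(flags_after(lambda c: c.divide(Decimal(0), Decimal(0))))             # InvalidOperation
print(flags_after(lambda c: c.multiply(Decimal("1e9"), Decimal("1e9"))))   # includes Overflow
print(flags_after(lambda c: c.multiply(Decimal("1e-9"), Decimal("1e-9")))) # includes Underflow
```

With traps disabled, each operation still returns a result (for example Infinity or NaN) and the flag records that the exceptional condition occurred.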

Addition and Subtraction


1. Shift the number with the smaller exponent
to the right by the difference in exponents
2. Set the exponent of the result to the larger
exponent
3. Perform the addition/subtraction on the
mantissas and determine the sign of the
result
4. Normalize the resulting value (if necessary)
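
The four steps can be sketched in software for a 16-bit format with a sign bit, a 6-bit excess-31 exponent and a 9-bit mantissa. This is an illustrative sketch, not a description of any particular hardware: the names `unpack`, `pack` and `fp_add`, the choice of three guard bits, and the use of chopping are all assumptions, and special values and exponent overflow/underflow are not handled.

```python
MAN_BITS = 9   # stored mantissa bits; the exponent is 6 bits, excess-31

def unpack(word: int):
    """Split a 16-bit word into sign, biased exponent, and 1.M as a 10-bit integer."""
    sign = word >> 15
    exp = (word >> MAN_BITS) & 0x3F
    sig = (1 << MAN_BITS) | (word & 0x1FF)   # restore the hidden leading 1
    return sign, exp, sig

def pack(sign: int, exp: int, sig: int) -> int:
    """Reassemble a word, dropping the hidden leading 1 of the significand."""
    return (sign << 15) | (exp << MAN_BITS) | (sig & 0x1FF)

def fp_add(a: int, b: int, guard: int = 3) -> int:
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    ma <<= guard                      # keep guard bits during intermediate steps
    mb <<= guard
    # Step 1: shift the operand with the smaller exponent to the right.
    if ea < eb:
        ma >>= eb - ea
    else:
        mb >>= ea - eb
    # Step 2: the result starts with the larger exponent.
    exp = max(ea, eb)
    # Step 3: add or subtract the mantissas and determine the sign.
    total = (-ma if sa else ma) + (-mb if sb else mb)
    sign = 1 if total < 0 else 0
    mag = abs(total)
    if mag == 0:
        return 0                      # assumption: zero is the all-zero word
    # Step 4: normalize so the leading 1 sits in the hidden-bit position.
    top = 1 << (MAN_BITS + guard)
    while mag >= top << 1:
        mag >>= 1
        exp += 1
    while mag < top:
        mag <<= 1
        exp -= 1
    return pack(sign, exp, mag >> guard)   # chop the guard bits

# 12.25 + 6.50 = 18.75 and 3.50 + (-4.75) = -1.25 in this format.
print(f"{fp_add(0b0100010100010000, 0b0100001101000000):016b}")   # 0100011001011000
print(f"{fp_add(0b0100000110000000, 0b1100001001100000):016b}")   # 1011111010000000
```

Subtraction uses the same routine with the sign bit of the second operand inverted.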

Hardware


Example (16-bit)

6-bit Exponent (excess-31)


9-bit Mantissa (normalized)

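$A = 12.25_{10} = 1100.01_2 = 1.100010_2 \times 2^{3}$
$B = 6.50_{10} = 110.10_2 = 1.101000_2 \times 2^{2}$
$R = A + B = 18.75_{10} = 10010.11_2 = 1.001011_2 \times 2^{4}$

Exponent difference
$E_A' = 100010$
$E_B' = 100001$
$n = E_A' - E_B' = 1$ (shift $M_B$ right by one place)

Mantissa addition
$M_A = 1.100010000$
$M_B = 0.1101000000$ (after the shift)
$M = M_A + M_B = 10.0101100000$

Normalization
$E' = 100010$ (the larger exponent)
$X = -1$ (shift $M$ right by one place)
$M_R = 1.001011000$
$E_R' = E' - X = 100011$

Encoded words
A = 0100010100010000
B = 0100001101000000
R = 0100011001011000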
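
The encoded words in this 16-bit format can be reproduced from the decimal values. The sketch below assumes a nonzero, in-range input and chops any bits beyond the 9-bit mantissa; `encode16` is a hypothetical helper name, and `math.frexp` from the standard library supplies the normalized fraction and exponent.

```python
import math

def encode16(value: float) -> str:
    """Encode a nonzero value as sign (1), excess-31 exponent (6), mantissa (9)."""
    sign = 1 if value < 0 else 0
    frac, exp = math.frexp(abs(value))   # abs(value) = frac * 2**exp, 0.5 <= frac < 1
    significand = frac * 2               # rescale to the 1.M form
    e = exp - 1                          # true exponent E for the 1.M form
    mantissa = int((significand - 1) * 2**9)   # chop to 9 mantissa bits
    return f"{sign:01b}{e + 31:06b}{mantissa:09b}"

print(encode16(12.25))   # 0100010100010000
print(encode16(6.50))    # 0100001101000000
print(encode16(18.75))   # 0100011001011000
```

The hidden leading 1 is not stored, which is why only the nine bits after the binary point appear in the word.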


Example (16-bit)

6-bit Exponent (excess-31)


9-bit Mantissa (normalized)

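$A = 3.50_{10} = 11.10_2 = 1.110000_2 \times 2^{1}$
$B = -4.75_{10} = -100.11_2 = -1.001100_2 \times 2^{2}$
$R = A + B = -1.25_{10} = -1.01_2 = -1.010000_2 \times 2^{0}$

Exponent difference
$E_A' = 100000$
$E_B' = 100001$
$n = E_A' - E_B' = -1$ (shift $M_A$ right by one place)

Mantissa addition
$M_B = -1.001100000$
$M_A = +0.1110000000$ (after the shift)
$M = M_B + M_A = -0.0101000000$

Normalization
$E' = 100001$ (the larger exponent)
$X = 10_2$ (shift $M$ left by two places)
$M_R = -1.010000000$
$E_R' = E' - X = 011111$

Encoded words
A = 0100000110000000
B = 1100001001100000
R = 1011111010000000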


Multiplication (and Division)


1. Add (Subtract) the biased exponents and subtract
(add) the bias (31 in the excess-31 format used here)
2. Multiply (Divide) the mantissas and
determine the sign of the result
3. Normalize the resulting value (if necessary)
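
The multiplication steps can be sketched for the same 16-bit layout (sign bit, 6-bit excess-31 exponent, 9-bit mantissa). This is an illustrative sketch only: the names `unpack`, `pack` and `fp_mul` are hypothetical, the result is chopped rather than rounded, and special values and exponent overflow/underflow are ignored.

```python
MAN_BITS, BIAS = 9, 31   # 9 stored mantissa bits, excess-31 exponent

def unpack(word: int):
    """Split a 16-bit word into sign, biased exponent, and 1.M as a 10-bit integer."""
    return word >> 15, (word >> MAN_BITS) & 0x3F, (1 << MAN_BITS) | (word & 0x1FF)

def pack(sign: int, exp: int, sig: int) -> int:
    """Reassemble a word, dropping the hidden leading 1 of the significand."""
    return (sign << 15) | (exp << MAN_BITS) | (sig & 0x1FF)

def fp_mul(a: int, b: int) -> int:
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    # Step 1: add the biased exponents, then remove the doubled bias.
    exp = ea + eb - BIAS
    # Step 2: multiply the mantissas; the sign is the XOR of the operand signs.
    sign = sa ^ sb
    product = ma * mb                 # up to 20 bits, 18 of them fraction bits
    # Step 3: a product in [2, 4) needs one right shift to return to the 1.M form.
    if product >= 1 << (2 * MAN_BITS + 1):
        product >>= 1
        exp += 1
    return pack(sign, exp, product >> MAN_BITS)   # chop the extra fraction bits

# -4.75 x 3.50 = -16.625 in this format.
print(f"{fp_mul(0b1100001001100000, 0b0100000110000000):016b}")   # 1100011000010100
```

Division follows the same outline with the exponents subtracted, the bias added back, and the mantissas divided.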


Example (16-bit)

6-bit Exponent (excess-31)


9-bit Mantissa (normalized)

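$A = -4.75_{10} = -100.11_2 = -1.0011000_2 \times 2^{2}$
$B = 3.50_{10} = 11.1_2 = 1.1100000_2 \times 2^{1}$
$R = A \times B = -16.625_{10} = -10000.101_2 = -1.0000101_2 \times 2^{4}$

Exponent addition
$E_A' = 100001$
$E_B' = 100000$
$E_A' + E_B' = 1000001$
$E' = 1000001 - 011111 = 100010$ (subtract the bias, 31)

Mantissa multiplication
$M_A = -1.001100000$
$M_B = 1.110000000$
$M = M_A \times M_B = -10.000101000000000000$

Normalization
$X = -1$ (shift $M$ right by one place)
$M_R = -1.000010100$
$E_R' = E' - X = 100011$

Encoded words
A = 1100001001100000
B = 0100000110000000
R = 1100011000010100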


Questions?
