
Carnegie Mellon

Floating Point

15-213: Introduction to Computer Systems
4th Lecture, Sep 5, 2013

Instructors: Randy Bryant, Dave O'Hallaron, and Greg Kesden


Today: Floating Point

Background: Fractional binary numbers
IEEE floating point standard: Definition
Example and properties
Rounding, addition, multiplication
Floating point in C
Summary


Fractional binary numbers

What is 1011.101_2?


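Fractional Binary Numbers

Bit positions:  b_i   b_{i-1}   ...  b_2  b_1  b_0 . b_{-1}  b_{-2}  b_{-3}  ...  b_{-j}
Bit weights:    2^i   2^{i-1}   ...  4    2    1   . 1/2     1/4     1/8     ...  2^{-j}

Representation
  Bits to right of "binary point" represent fractional powers of 2
  Represents rational number: $\sum_{k=-j}^{i} b_k \times 2^k$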


Fractional Binary Numbers: Examples

  Value     Representation
  5 3/4     101.11_2
  2 7/8     010.111_2
  1 7/16    001.0111_2

Observations
  Divide by 2 by shifting right (unsigned)
  Multiply by 2 by shifting left
  Numbers of form 0.111111..._2 are just below 1.0
    1/2 + 1/4 + 1/8 + ... + 1/2^i + ... → 1.0
    Use notation 1.0 - ε
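The positional weights above can be checked mechanically. The following is a minimal sketch in C that evaluates a fractional binary string; the function name and the -1.0 error convention are illustrative choices, not part of the lecture.

```c
#include <stddef.h>

/* Evaluate an unsigned fractional binary string such as "1011.101".
 * Bits left of the point carry weights 1, 2, 4, ...; bits right of it
 * carry weights 1/2, 1/4, 1/8, ...
 * Returns -1.0 for any character other than '0', '1', or a single '.'
 * (an illustrative error convention). */
double frac_binary_value(const char *s)
{
    double value = 0.0;
    double weight = 0.5;      /* weight of the next fractional bit */
    int seen_point = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        if (c == '.') {
            if (seen_point)
                return -1.0;
            seen_point = 1;
        } else if (c == '0' || c == '1') {
            int bit = c - '0';
            if (!seen_point) {
                value = 2.0 * value + bit;   /* shift left, append bit */
            } else {
                value += bit * weight;
                weight /= 2.0;               /* next bit is worth half as much */
            }
        } else {
            return -1.0;
        }
    }
    return value;
}
```

For example, frac_binary_value("1011.101") evaluates 8 + 2 + 1 + 1/2 + 1/8, which answers the question posed on the earlier slide.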


Representable Numbers

Limitation #1
  Can only exactly represent numbers of the form x/2^k
  Other rational numbers have repeating bit representations

  Value   Representation
  1/3     0.0101010101[01]..._2
  1/5     0.001100110011[0011]..._2
  1/10    0.0001100110011[0011]..._2

Limitation #2
  Just one setting of binary point within the w bits
  Limited range of numbers (very small values? very large?)
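The repeating patterns in the table can be reproduced with binary long division. This is a sketch under the assumption that the fraction is given as two small unsigned integers; the function name and its return convention are made up for illustration.

```c
/* Write the first nbits fractional binary digits of num/den into out
 * as '0'/'1' characters, NUL-terminated (out needs nbits + 1 bytes).
 * Requires 0 <= num < den and den small enough that 2*den fits in unsigned.
 * Returns 1 if the remainder reached zero (the value is exactly x/2^k
 * for some k <= nbits), 0 if the expansion was cut off. */
int binary_fraction_digits(unsigned num, unsigned den, char *out, int nbits)
{
    unsigned rem = num;
    for (int i = 0; i < nbits; i++) {
        rem *= 2;                 /* shift the remainder left by one bit */
        if (rem >= den) {
            out[i] = '1';
            rem -= den;
        } else {
            out[i] = '0';
        }
    }
    out[nbits] = '\0';
    return rem == 0;
}
```

Calling it with 1/10 and 13 bits yields the digits 0001100110011 and reports a nonzero remainder, while 3/4 terminates after two bits.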


IEEE Floating Point

IEEE Standard 754
  Established in 1985 as uniform standard for floating point arithmetic
    Before that, many idiosyncratic formats
  Supported by all major CPUs

Driven by numerical concerns
  Nice standards for rounding, overflow, underflow
  Hard to make fast in hardware
    Numerical analysts predominated over hardware designers in defining standard


Floating Point Representation

Numerical Form: $(-1)^s \, M \, 2^E$
  Sign bit s determines whether number is negative or positive
  Significand M normally a fractional value in range [1.0,2.0).
  Exponent E weights value by power of two

Encoding
  MSB s is sign bit s
  exp field encodes E (but is not equal to E)
  frac field encodes M (but is not equal to M)

  | s | exp | frac |


Precision options

Single precision: 32 bits
  | s: 1 bit | exp: 8 bits | frac: 23 bits |

Double precision: 64 bits
  | s: 1 bit | exp: 11 bits | frac: 52 bits |

Extended precision: 80 bits (Intel only)
  | s: 1 bit | exp: 15 bits | frac: 63 or 64 bits |


Normalized Values

When: exp ≠ 000...0 and exp ≠ 111...1

Exponent coded as a biased value: E = Exp - Bias
  Exp: unsigned value of exp field
  Bias = 2^{k-1} - 1, where k is number of exponent bits
    Single precision: 127 (Exp: 1...254, E: -126...127)
    Double precision: 1023 (Exp: 1...2046, E: -1022...1023)

Significand coded with implied leading 1: M = 1.xxx...x_2
  xxx...x: bits of frac
  Minimum when frac=000...0 (M = 1.0)
  Maximum when frac=111...1 (M = 2.0 - ε)
  Get extra leading bit for "free"


Normalized Encoding Example

Value: float F = 15213.0;
  15213_10 = 11101101101101_2
           = 1.1101101101101_2 x 2^13

Significand
  M    = 1.1101101101101_2
  frac =   11011011011010000000000_2

Exponent
  E    = 13
  Bias = 127
  Exp  = 140 = 10001100_2

Result:
  0   10001100   11011011011010000000000
  s   exp        frac
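The encoding above can be confirmed by pulling the three fields out of a float's bit pattern. This sketch assumes that float is a 32-bit IEEE single-precision value (true on the common platforms, but not guaranteed by C); the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative container for the three fields of a single-precision value. */
typedef struct {
    unsigned sign;   /* 1 bit */
    unsigned exp;    /* 8 bits, biased */
    uint32_t frac;   /* 23 bits */
} float_fields;

/* Split a float into its s, exp, and frac fields.
 * Assumes float is 32-bit IEEE single precision. */
float_fields split_float(float f)
{
    uint32_t bits;
    float_fields out;

    memcpy(&bits, &f, sizeof bits);   /* copy the raw bits; no numeric conversion */
    out.sign = bits >> 31;
    out.exp  = (bits >> 23) & 0xFF;
    out.frac = bits & 0x7FFFFF;
    return out;
}
```

For 15213.0f this gives sign 0, exp 140, and frac 0x6DB400, which is the 23-bit pattern 11011011011010000000000 shown in the result.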


Denormalized Values

Condition: exp = 000...0

Exponent value: E = -Bias + 1 (instead of E = 0 - Bias)
Significand coded with implied leading 0: M = 0.xxx...x_2
  xxx...x: bits of frac

Cases
  exp = 000...0, frac = 000...0
    Represents zero value
    Note distinct values: +0 and -0 (why?)
  exp = 000...0, frac ≠ 000...0
    Numbers closest to 0.0
    Equispaced


Special Values

Condition: exp = 111...1

Case: exp = 111...1, frac = 000...0
  Represents value ∞ (infinity)
  Operation that overflows
  Both positive and negative
  E.g., 1.0/0.0 = -1.0/-0.0 = +∞, 1.0/-0.0 = -∞

Case: exp = 111...1, frac ≠ 000...0
  Not-a-Number (NaN)
  Represents case when no numeric value can be determined
  E.g., sqrt(-1), ∞ - ∞, ∞ x 0
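The case analysis for normalized, denormalized, and special values can be written as a small classifier on the exp and frac fields. This is a sketch that assumes float is 32-bit IEEE single precision; the enum and function names are invented here and deliberately avoid the FP_* macros from math.h.

```c
#include <stdint.h>
#include <string.h>

enum fp_kind { KIND_ZERO, KIND_DENORM, KIND_NORMAL, KIND_INF, KIND_NAN };

/* Build a float from a raw 32-bit pattern (assumes IEEE single precision). */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Classify a float using only its exp and frac fields. */
enum fp_kind classify_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    uint32_t exp  = (bits >> 23) & 0xFF;
    uint32_t frac = bits & 0x7FFFFF;

    if (exp == 0)                       /* exp = 000...0 */
        return frac == 0 ? KIND_ZERO : KIND_DENORM;
    if (exp == 0xFF)                    /* exp = 111...1 */
        return frac == 0 ? KIND_INF : KIND_NAN;
    return KIND_NORMAL;                 /* everything else */
}
```

The sign bit plays no part in the classification, so +0 and -0 both report KIND_ZERO and both infinities report KIND_INF.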

Visualization: Floating Point Encodings

[Figure: number line of encodings, left to right: NaN, -∞, -Normalized, -Denorm, -0, +0, +Denorm, +Normalized, +∞, NaN]


Tiny Floating Point Example

  | s: 1 bit | exp: 4 bits | frac: 3 bits |

8-bit Floating Point Representation
  the sign bit is in the most significant bit
  the next four bits are the exponent, with a bias of 7
  the last three bits are the frac

Same general form as IEEE Format
  normalized, denormalized
  representation of 0, NaN, infinity


Dynamic Range (Positive Only)

                 s  exp   frac   E     Value
                 0  0000  000    -6    0
                 0  0000  001    -6    1/8*1/64 = 1/512     closest to zero
  Denormalized   0  0000  010    -6    2/8*1/64 = 2/512
  numbers        ...
                 0  0000  110    -6    6/8*1/64 = 6/512
                 0  0000  111    -6    7/8*1/64 = 7/512     largest denorm
                 0  0001  000    -6    8/8*1/64 = 8/512     smallest norm
                 0  0001  001    -6    9/8*1/64 = 9/512
                 ...
                 0  0110  110    -1    14/8*1/2 = 14/16
                 0  0110  111    -1    15/8*1/2 = 15/16     closest to 1 below
  Normalized     0  0111  000     0    8/8*1    = 1
  numbers        0  0111  001     0    9/8*1    = 9/8       closest to 1 above
                 0  0111  010     0    10/8*1   = 10/8
                 ...
                 0  1110  110     7    14/8*128 = 224
                 0  1110  111     7    15/8*128 = 240       largest norm
                 0  1111  000    n/a   inf
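The rows of this table can be regenerated by decoding each 8-bit pattern. The following is a sketch of a decoder for the 1-4-3 tiny format with bias 7; the function name is hypothetical, and math.h is included only for the INFINITY and NAN macros.

```c
#include <math.h>     /* INFINITY and NAN macros only */
#include <stdint.h>

/* Decode the 8-bit tiny format: 1 sign bit, 4 exp bits (bias 7), 3 frac bits. */
double tiny_float_value(uint8_t bits)
{
    const int bias = 7;
    int sign = (bits >> 7) & 1;
    int exp  = (bits >> 3) & 0xF;
    int frac = bits & 0x7;
    double m;   /* significand M */
    int e;      /* unbiased exponent E */

    if (exp == 0xF)                       /* special values */
        return frac ? NAN : (sign ? -INFINITY : INFINITY);

    if (exp == 0) {                       /* denormalized: implied leading 0 */
        m = frac / 8.0;
        e = 1 - bias;
    } else {                              /* normalized: implied leading 1 */
        m = 1.0 + frac / 8.0;
        e = exp - bias;
    }

    double value = m;
    for (; e > 0; e--) value *= 2.0;      /* scale by 2^E without libm */
    for (; e < 0; e++) value /= 2.0;
    return sign ? -value : value;
}
```

Decoding 0x07 and 0x08 gives 7/512 and 8/512, showing the smooth transition from the largest denormalized to the smallest normalized value.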


Distribution of Values

6-bit IEEE-like format
  e = 3 exponent bits
  f = 2 fraction bits
  Bias is 2^{3-1} - 1 = 3
  | s: 1 bit | exp: 3 bits | frac: 2 bits |

Notice how the distribution gets denser toward zero.

[Figure: number line from -15 to 15 marking the Denormalized, Normalized, and Infinity values; a callout labels a group of 8 values]


Distribution of Values (close-up view)

6-bit IEEE-like format
  e = 3 exponent bits
  f = 2 fraction bits
  Bias is 3
  | s: 1 bit | exp: 3 bits | frac: 2 bits |

[Figure: number line from -1 to 1 marking the Denormalized, Normalized, and Infinity values]


Special Properties of the IEEE Encoding

FP Zero Same as Integer Zero
  All bits = 0

Can (Almost) Use Unsigned Integer Comparison
  Must first compare sign bits
  Must consider -0 = 0
  NaNs problematic
    Will be greater than any other values
    What should comparison yield?
  Otherwise OK
    Denorm vs. normalized
    Normalized vs. infinity
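The comparison rules above can be sketched as a less-than test that looks only at bit patterns. The code assumes 32-bit IEEE single precision and that neither argument is NaN; the function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a float (assumes 32-bit IEEE single precision). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Returns 1 if a < b, using only integer comparisons on the encodings.
 * Precondition: neither a nor b is NaN. */
int float_less_by_bits(float a, float b)
{
    uint32_t ua = float_bits(a), ub = float_bits(b);
    uint32_t sa = ua >> 31, sb = ub >> 31;                 /* sign bits */
    uint32_t ma = ua & 0x7FFFFFFF, mb = ub & 0x7FFFFFFF;   /* exp and frac */

    if (ma == 0 && mb == 0)
        return 0;             /* -0 and +0 compare equal */
    if (sa != sb)
        return sa > sb;       /* the negative one is smaller */
    if (sa == 0)
        return ma < mb;       /* both non-negative: unsigned order matches */
    return ma > mb;           /* both negative: order is reversed */
}
```

Because exp sits above frac in the encoding, the unsigned comparison of the magnitude bits orders denormalized values below normalized ones and normalized values below infinity with no special cases.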

Floating Point Operations: Basic Idea

$x +_f y = \mathrm{Round}(x + y)$
$x \times_f y = \mathrm{Round}(x \times y)$

Basic idea
  First compute exact result
  Make it fit into desired precision
    Possibly overflow if exponent too large
    Possibly round to fit into frac


Rounding

Rounding Modes (illustrate with $ rounding)

                            $1.40   $1.60   $1.50   $2.50   -$1.50
  Towards zero              $1      $1      $1      $2      -$1
  Round down (-∞)           $1      $1      $1      $2      -$2
  Round up (+∞)             $2      $2      $2      $3      -$1
  Nearest Even (default)    $1      $2      $2      $2      -$2
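The four modes in the table can be imitated with integer arithmetic on amounts in cents. This is a sketch for illustration only: the enum, the function name, and the use of cents are assumptions made here, and real floating point hardware applies these modes to binary significands rather than decimal amounts.

```c
enum round_mode { TOWARD_ZERO, ROUND_DOWN, ROUND_UP, NEAREST_EVEN };

/* Round an amount in cents to whole dollars under the given mode. */
long round_dollars(long cents, enum round_mode mode)
{
    long q = cents / 100;   /* C integer division truncates toward zero */
    long r = cents % 100;   /* remainder has the same sign as cents */

    switch (mode) {
    case TOWARD_ZERO:
        return q;
    case ROUND_DOWN:                      /* toward -infinity */
        return (r < 0) ? q - 1 : q;
    case ROUND_UP:                        /* toward +infinity */
        return (r > 0) ? q + 1 : q;
    case NEAREST_EVEN: {
        long mag  = (r < 0) ? -r : r;               /* distance from q in cents */
        long away = (cents < 0) ? q - 1 : q + 1;    /* neighbor farther from zero */
        if (mag > 50) return away;
        if (mag < 50) return q;
        return (q % 2 == 0) ? q : away;             /* tie: pick the even dollar */
    }
    }
    return q;
}
```

With 150 and 250 cents the NEAREST_EVEN mode returns 2 in both cases, which is the behavior that distinguishes it from always rounding halves up.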


Closer Look at Round-To-Even

Default Rounding Mode
  Hard to get any other kind without dropping into assembly
  All others are statistically biased
    Sum of set of positive numbers will consistently be over- or under-estimated

Applying to Other Decimal Places / Bit Positions
  When exactly halfway between two possible values
    Round so that least significant digit is even
  E.g., round to nearest hundredth
    1.2349999   1.23   (Less than half way)
    1.2350001   1.24   (Greater than half way)
    1.2350000   1.24   (Half way, round up)
    1.2450000   1.24   (Half way, round down)

Rounding Binary Numbers

Binary Fractional Numbers
  "Even" when least significant bit is 0
  "Half way" when bits to right of rounding position = 100..._2

Examples
  Round to nearest 1/4 (2 bits right of binary point)

  Value    Binary       Rounded    Action         Rounded Value
  2 3/32   10.00011_2   10.00_2    (<1/2: down)   2
  2 3/16   10.00110_2   10.01_2    (>1/2: up)     2 1/4
  2 7/8    10.11100_2   11.00_2    ( 1/2: up)     3
  2 5/8    10.10100_2   10.10_2    ( 1/2: down)   2 1/2
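The four examples can be reproduced with a few bit operations. In this sketch a value is passed as a count of 1/32 units (so five bits right of the binary point) and returned as a count of 1/4 units; that fixed-point convention and the function name are assumptions made for the illustration.

```c
/* Round a non-negative value given in units of 1/32 to the nearest 1/4,
 * using round-to-even on ties. The result is in units of 1/4. */
unsigned round_to_quarters(unsigned n32)
{
    unsigned q    = n32 >> 3;     /* keep bits down to the 1/4 position */
    unsigned rest = n32 & 0x7;    /* the three bits being discarded */

    if (rest > 4)                 /* more than half way: round up */
        q++;
    else if (rest == 4 && (q & 1))
        q++;                      /* exactly half way (100_2): round to even */
    return q;
}
```

For example, 2 7/8 is 92/32; the discarded bits are 100_2 and the kept value 10.11_2 is odd, so it rounds up to 12/4 = 3.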


FP Multiplication

$(-1)^{s1} \, M1 \, 2^{E1} \times (-1)^{s2} \, M2 \, 2^{E2}$

Exact Result: $(-1)^s \, M \, 2^E$
  Sign s:          s1 ^ s2
  Significand M:   M1 x M2
  Exponent E:      E1 + E2

Fixing
  If M ≥ 2, shift M right, increment E
  If E out of range, overflow
  Round M to fit frac precision

Implementation
  Biggest chore is multiplying significands


Floating Point Addition

$(-1)^{s1} \, M1 \, 2^{E1} + (-1)^{s2} \, M2 \, 2^{E2}$
  Assume E1 > E2

Exact Result: $(-1)^s \, M \, 2^E$
  Sign s, significand M:
    Result of signed align & add
  Exponent E: E1

[Figure: $(-1)^{s2} M2$ is shifted right by E1-E2 bit positions to align with $(-1)^{s1} M1$; the aligned sum is $(-1)^s M$]

Fixing
  If M ≥ 2, shift M right, increment E
  If M < 1, shift M left k positions, decrement E by k
  Overflow if E out of range
  Round M to fit frac precision

Floating Point in C

C Guarantees Two Levels
  float    single precision
  double   double precision

Conversions/Casting
  Casting between int, float, and double changes bit representation
  double/float → int
    Truncates fractional part
    Like rounding toward zero
    Not defined when out of range or NaN: Generally sets to TMin
  int → double
    Exact conversion, as long as int has ≤ 53 bit word size
  int → float
    Will round according to rounding mode
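The conversion rules can be observed with a few one-line helpers. This is a sketch assuming a 32-bit int and IEEE single and double precision for float and double; the helper names are invented, and out-of-range conversions are deliberately left out because their result is not defined.

```c
/* Illustrative helpers (names are not from the lecture).
 * Assumes a 32-bit int and IEEE single/double precision. */

/* double -> int drops the fractional part, i.e. rounds toward zero.
 * The caller must keep d finite and within int range. */
int truncate_to_int(double d)
{
    return (int)d;
}

/* 1 if x is unchanged by a round trip through float (24-bit significand).
 * Only call with values whose float image still fits in an int. */
int survives_float(int x)
{
    return (int)(float)x == x;
}

/* 1 if x is unchanged by a round trip through double (53-bit significand). */
int survives_double(int x)
{
    return (int)(double)x == x;
}
```

The value 16777217 (2^24 + 1) needs 25 significant bits, so it rounds to 16777216 as a float but converts exactly to double.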


Summary

IEEE Floating Point has clear mathematical properties
Represents numbers of form M x 2^E
One can reason about operations independent of implementation
  As if computed with perfect precision and then rounded
Not the same as real arithmetic
  Violates associativity/distributivity
  Makes life difficult for compilers & serious numerical applications programmers


Floating Point Puzzles

For each of the following C expressions, either:
  Argue that it is true for all argument values
  Explain why not true

int x = ...;
float f = ...;
double d = ...;

Assume neither d nor f is NaN

  x == (int)(float) x
  x == (int)(double) x
  f == (float)(double) f
  d == (double)(float) d
  f == -(-f);
  2/3 == 2/3.0
  d < 0.0  ⇒  ((d*2) < 0.0)
  d > f  ⇒  -f > -d
  d * d >= 0.0
  (d+f)-d == f
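A few of the puzzles can be probed by wrapping each expression in a function and trying specific values. This sketch does not settle the "true for all values" cases, since a passing test is not a proof; the function names are hypothetical, and float and double are assumed to be IEEE single and double precision.

```c
/* Each function evaluates one puzzle expression and returns its truth value.
 * Assumes IEEE single precision for float and double precision for double. */

/* 2/3 == 2/3.0 : integer division on the left, double division on the right. */
int puzzle_int_vs_fp_division(void)
{
    return 2/3 == 2/3.0;
}

/* d == (double)(float) d : narrowing to float may discard significand bits.
 * Only call with d inside the range of float. */
int puzzle_double_via_float(double d)
{
    return d == (double)(float)d;
}

/* f == (float)(double) f : widening to double keeps every bit of f. */
int puzzle_float_via_double(float f)
{
    return f == (float)(double)f;
}

/* f == -(-f) : negation only flips the sign bit. */
int puzzle_double_negation(float f)
{
    return f == -(-f);
}
```

Trying d = 0.1 shows one counterexample: 0.1 has a repeating binary expansion, so the float and double approximations differ, whereas d = 0.5 round-trips exactly.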


Creating Floating Point Number

Steps
  Normalize to have leading 1
  Round to fit within fraction
  Postnormalize to deal with effects of rounding

  | s: 1 bit | exp: 4 bits | frac: 3 bits |

Case Study
  Convert 8-bit unsigned numbers to tiny floating point format

Example Numbers
  128   10000000
   13   00001101
   17   00010001
   19   00010011
  138   10001010
   63   00111111

Normalize

Requirement
  Set binary point so that numbers of form 1.xxxxx
  Adjust all to have leading one
    Decrement exponent as shift left

  Value   Binary     Fraction    Exponent
  128     10000000   1.0000000   7
   13     00001101   1.1010000   3
   17     00010001   1.0001000   4
   19     00010011   1.0011000   4
  138     10001010   1.0001010   7
   63     00111111   1.1111100   5

Rounding

  1.BBGRXXX
  Guard bit: LSB of result
  Round bit: 1st bit removed
  Sticky bit: OR of remaining bits

Round up conditions
  Round = 1, Sticky = 1: > 0.5
  Guard = 1, Round = 1, Sticky = 0: Round to even

  Value   Fraction    GRS   Incr?   Rounded
  128     1.0000000   000   N        1.000
   13     1.1010000   100   N        1.101
   17     1.0001000   010   N        1.000
   19     1.0011000   110   Y        1.010
  138     1.0001010   011   Y        1.001
   63     1.1111100   111   Y       10.000

Postnormalize

Issue
  Rounding may have caused overflow
  Handle by shifting right once & incrementing exponent

  Value   Rounded   Exp   Adjusted   Result
  128      1.000    7                128
   13      1.101    3                 13
   17      1.000    4                 16
   19      1.010    4                 20
  138      1.001    7                144
   63     10.000    5     1.000/6     64
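The three steps (normalize, round, postnormalize) can be sketched as one conversion routine from an 8-bit unsigned integer to the 1-4-3 tiny format with bias 7. The function name is hypothetical and the code is an illustration of the steps, not the lecture's own implementation.

```c
#include <stdint.h>

/* Convert an unsigned 8-bit integer to the tiny format
 * (1 sign bit, 4 exp bits with bias 7, 3 frac bits) and return the encoding.
 * Inputs 248..255 round past the largest normalized value (240) and come
 * out as exp = 1111, frac = 000, which is the encoding of +infinity. */
uint8_t uint8_to_tiny(uint8_t x)
{
    if (x == 0)
        return 0;                       /* zero is all bits 0 */

    /* Normalize: shift left until the leading 1 reaches bit 7,
     * decrementing the exponent once per shift. m then reads 1.xxxxxxx. */
    unsigned m = x;
    int e = 7;
    while (!(m & 0x80)) {
        m <<= 1;
        e--;
    }

    /* Round: keep 1.BBG, inspect the round bit and the sticky bits. */
    unsigned kept   = m >> 4;           /* the four bits 1BBG */
    unsigned guard  = kept & 1;         /* LSB of the result */
    unsigned round  = (m >> 3) & 1;     /* first bit removed */
    unsigned sticky = (m & 0x7) != 0;   /* OR of the remaining bits */
    if (round && (sticky || guard))
        kept++;                         /* above half, or a tie with odd LSB */

    /* Postnormalize: rounding may have produced 10.000. */
    if (kept & 0x10) {
        kept >>= 1;
        e++;
    }

    return (uint8_t)(((unsigned)(e + 7) << 3) | (kept & 0x7));
}
```

For 138 the routine yields exp 1110 and frac 001, that is 1.001_2 x 2^7 = 144, and for 63 the postnormalize step turns 10.000_2 x 2^5 into 1.000_2 x 2^6 = 64.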


More Slides


Interesting Numbers

Values in braces are {single,double}.

  Description                 exp       frac      Numeric Value
  Zero                        00...00   00...00   0.0
  Smallest Pos. Denorm.       00...00   00...01   2^-{23,52} x 2^-{126,1022}
    Single ≈ 1.4 x 10^-45
    Double ≈ 4.9 x 10^-324
  Largest Denormalized        00...00   11...11   (1.0 - ε) x 2^-{126,1022}
    Single ≈ 1.18 x 10^-38
    Double ≈ 2.2 x 10^-308
  Smallest Pos. Normalized    00...01   00...00   1.0 x 2^-{126,1022}
    Just larger than largest denormalized
  One                         01...11   00...00   1.0
  Largest Normalized          11...10   11...11   (2.0 - ε) x 2^{127,1023}
    Single ≈ 3.4 x 10^38
    Double ≈ 1.8 x 10^308
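The single-precision column of this table can be cross-checked against the constants in float.h by constructing each value from its bit pattern. The sketch assumes 32-bit IEEE single precision with denormals enabled; the macro and function names are made up for the example.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Single-precision bit patterns for the rows of the table
 * (names are illustrative). */
#define BITS_SMALLEST_DENORM  0x00000001u   /* exp 00...00, frac 00...01 */
#define BITS_LARGEST_DENORM   0x007FFFFFu   /* exp 00...00, frac 11...11 */
#define BITS_SMALLEST_NORM    0x00800000u   /* exp 00...01, frac 00...00 */
#define BITS_ONE              0x3F800000u   /* exp 01...11, frac 00...00 */
#define BITS_LARGEST_NORM     0x7F7FFFFFu   /* exp 11...10, frac 11...11 */

/* Reinterpret a 32-bit pattern as a float (assumes IEEE single precision). */
float pattern_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under these assumptions pattern_to_float(BITS_SMALLEST_NORM) equals FLT_MIN and pattern_to_float(BITS_LARGEST_NORM) equals FLT_MAX, so FLT_MIN names the smallest normalized value rather than the smallest denormalized one.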


Mathematical Properties of FP Add

Compare to those of Abelian Group
  Closed under addition?                       Yes
    But may generate infinity or NaN
  Commutative?                                 Yes
  Associative?                                 No
    Overflow and inexactness of rounding
  0 is additive identity?                      Yes
  Every element has additive inverse           Almost
    Except for infinities & NaNs

Monotonicity
  a ≥ b ⇒ a+c ≥ b+c?                           Almost
    Except for infinities & NaNs
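The "No" for associativity and the "Almost" for additive inverses can each be shown with one concrete case. This is a sketch using double precision; the function names are hypothetical, and math.h is included only for the INFINITY macro.

```c
#include <math.h>   /* INFINITY macro only */

/* Each function returns 1 when the property holds for the given operands. */

int add_commutes(double a, double b)
{
    return a + b == b + a;
}

int add_associates(double a, double b, double c)
{
    return (a + b) + c == a + (b + c);
}

/* a + (-a) == 0 fails for infinities, where the sum is NaN. */
int has_additive_inverse(double a)
{
    return a + (-a) == 0.0;
}
```

With a = 3.14, b = 1e20, c = -1e20 the left grouping loses 3.14 when it is absorbed into 1e20 and yields 0, while the right grouping cancels the large terms first and yields 3.14.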


Mathematical Properties of FP Mult

Compare to Commutative Ring
  Closed under multiplication?                         Yes
    But may generate infinity or NaN
  Multiplication Commutative?                          Yes
  Multiplication is Associative?                       No
    Possibility of overflow, inexactness of rounding
  1 is multiplicative identity?                        Yes
  Multiplication distributes over addition?            No
    Possibility of overflow, inexactness of rounding

Monotonicity
  a ≥ b & c ≥ 0 ⇒ a * c ≥ b * c?                       Almost
    Except for infinities & NaNs
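Overflow gives simple counterexamples for both associativity and distributivity. This is a sketch that assumes double arithmetic is carried out in IEEE double precision (as with SSE on x86-64); on hardware that keeps intermediates in a wider format the first example may behave differently. The function names are hypothetical.

```c
/* Each function returns 1 when the property holds for the given operands. */

int mul_associates(double a, double b, double c)
{
    return (a * b) * c == a * (b * c);
}

int mul_distributes(double a, double b, double c)
{
    return a * (b + c) == a * b + a * c;
}
```

With a = b = 1e200 and c = 1e-200, the product a * b overflows to infinity before c can scale it back, while b * c stays near 1. With a = b = 1e200 and c = -1e200, the left side of the distributive law is 1e200 * 0 = 0, but the right side is not: each product taken separately overflows, giving ∞ - ∞ = NaN.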
