John Palmer
Intel Corporation
This paper describes a new device, the Intel® 8087 Numeric Data Processor, with unprecedented speed, accuracy and capability. Its modified stack architecture and instruction set are explained and illustrative examples are included. The 8087, which conforms to the proposed IEEE Floating-Point Standard, is a coprocessor in the Intel® 8086 family. It supports seven data types (three REAL, three INTEGER and one packed BCD format) and performs all necessary numeric operations, from addition to logarithmic and trigonometric functions.

1.0 INTRODUCTION

The Intel® 8087 is a high-performance, general-purpose numeric data processor. It is part of the Intel® 8086 family and can be used with either the 8086 or the 8088 to extend their instruction sets by over 120 numeric data manipulation operations. The 8087 is not a peripheral but a coprocessor: it monitors the instruction stream, and when an 8086/8088 ESCAPE instruction is read, the 8087 takes over the bus and interprets and executes the ESCAPE instruction as one of its own instructions. This tightly coupled coprocessing interface permits the 8087 to execute numeric instructions while the 8086 executes any others. The concurrent instruction execution increases the throughput of the system. Furthermore, the 8087 is the only chip that must be added to an 8086 (8088) system to provide numeric capability that exceeds software in speed by more than a factor of 100.

The 8087 is intended to be general purpose and to satisfy a very wide range of needs for mathematical computation. It is fast enough for a great many scientific and statistical calculations; it is accurate enough for business and commercial computation; and it is precise enough for entirely new applications, most notably interval arithmetic [1]. The 8087 provides an unprecedented level of capability, safety and reliability with high performance and low cost, and is a prime example of the almost incredible possibilities in combining software and architectural expertise with VLSI processing capability.

2.0 8087 OVERVIEW

The 8087 consists of a stack of registers for holding operands and results, a set of registers constituting its environment, and a set of instructions.

[Figure: the 8087 register stack. Each stack element ST(0) through ST(7) holds a sign, an exponent and a significand, and has an associated tag.]

The tag field is used to detect uninitialized stack elements and to designate special values (e.g. zero) for microcode optimization.

The value represented in a register has 64 bits of precision and a range of about $10^{\pm 4900}$ (15-bit exponent). A more complete description of the register values will be given in Section 3.
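To make the register format concrete, the following Python sketch decodes an 80-bit Temporary Real bit pattern into an exact rational value. It assumes the layout later standardized for x87 extended precision (1 sign bit, a 15-bit exponent biased by 16383, and a 64-bit significand with an explicit integer bit); the function name `decode_temp_real` is hypothetical, and NaN and infinity encodings are deliberately rejected rather than modelled.

```python
from fractions import Fraction

BIAS = 16383  # assumed bias of the 15-bit Temporary Real exponent


def decode_temp_real(raw: int) -> Fraction:
    """Return the exact value of an 80-bit Temporary Real bit pattern.

    Assumed layout: bit 79 = sign, bits 78..64 = biased exponent,
    bits 63..0 = significand with an explicit integer bit at bit 63.
    """
    if not 0 <= raw < 1 << 80:
        raise ValueError("expected an 80-bit pattern")
    sign = (raw >> 79) & 1
    exponent = (raw >> 64) & 0x7FFF
    significand = raw & ((1 << 64) - 1)
    if exponent == 0x7FFF:
        raise ValueError("NaN and infinity encodings are not handled in this sketch")
    # A zero exponent field uses the same scale as exponent 1 (denormals).
    scale = (exponent if exponent else 1) - BIAS - 63
    value = significand * Fraction(2) ** scale
    return -value if sign else value


one = (0x3FFF << 64) | (1 << 63)                                  # +1.0
minus_two_and_a_half = (1 << 79) | (0x4000 << 64) | (0xA << 60)   # -2.5
print(decode_temp_real(one), decode_temp_real(minus_two_and_a_half))  # prints: 1 -5/2
```

Returning a `Fraction` keeps all 64 significand bits; converting the result to a Python float would round it to 53 bits, which is exactly the loss of precision the Temporary Real format is meant to avoid for intermediates.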
PC: precision control - results are rounded to one of three precisions: Temporary Real (64 bits), Long Real (53 bits), Real (24 bits).

RC: rounding control - results are rounded in one of four directions: unbiased round to nearest, round towards $+\infty$, round towards $-\infty$, round towards zero.

IC: infinity control - there are two types of infinity arithmetic provided: affine and projective.

The standard specifies, and the 8087 supports, three floating-point data types: Real (single precision), Long Real (double precision) and Temporary Real (extended precision). All formats are binary and each has a biased exponent. The values represented by the three formats are shown below.

                    REAL      LONG REAL   TEMPORARY REAL
TOTAL LENGTH        32 bits   64 bits     80 bits
EXPONENT LENGTH     8 bits    11 bits     15 bits

The 8087 provides these rounding modes as controlled by a field (RC) in the CONTROL WORD.

The 8087, which does all calculations in Temporary Real format, has another field in the CONTROL WORD for specifying the precision to which a result is rounded (PC). Thus, the precision of results is independent of the precision of operands; results, though held in Temporary Real format and benefiting from its extended range, may be rounded to Real, Long Real or Temporary Real precision. This control is provided for languages that do not allow extended-precision intermediates, and to allow the same code to be run under different precision settings as an aid to error estimation.

These two modes require the representation of two zeros (±0), which are "equal" in comparison and in all other operations except division, where $1/{+0} = +\infty$ and $1/{-0} = -\infty$. The mode of infinity arithmetic is determined by a field (IC) in the CONTROL WORD.

The standard also specifies that all exceptions must be detected and that an implementation should permit exception handling. The 8087 supports this by detecting six types of exceptions and by generating an interrupt if the exception is not masked. If an interrupt is generated, the interrupt procedure (exception handler) has available the exception flags, a pointer to the instruction causing the interrupt and a pointer to the datum if memory was addressed. The six exceptions, each of which has an associated "sticky" flag (once set, it remains set until reset by software), are listed below.

1. I: invalid operation
   This exception is signaled by stack overflow or underflow, the use of a NAN as an operand, and several other cases as listed in [3].

2. D: denormalized operand
   At least one operand is denormalized.

There are instructions that support the standard by controlling rounding, precision and infinity arithmetic and by permitting complete exception handling. These instructions load and store either the control word or the entire environment, and store the exception flags.

The features and instructions discussed above support the Intel floating-point (REALMATH) standard, but additional capability is also desired.

3.2 Capability Extension

The 8087, by supporting the required and optional aspects of the standard and by supporting several features not mentioned by the standard, significantly extends the capabilities of the 8086 family beyond that expected from a typical floating-point processor. These extensions include additional data types, provision of exact arithmetic, support for interval arithmetic and special functions.
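The RC rounding directions are what make interval support practical: a lower bound is computed rounding towards $-\infty$ and an upper bound rounding towards $+\infty$, so the exact result is always enclosed. The Python sketch below illustrates this with the standard `decimal` module as a stand-in for the 8087's binary rounding (an assumption made for readability: results are rounded to a number of decimal digits, not binary bits). The names `RC_MODES`, `rounded_add` and `interval_add` are hypothetical, and only ordinary closed intervals with $a \le b$ are handled.

```python
from decimal import (ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, Context, Decimal)

# The four RC rounding directions mapped onto decimal rounding constants.
RC_MODES = {
    "nearest": ROUND_HALF_EVEN,  # unbiased round to nearest
    "up": ROUND_CEILING,         # round towards +infinity
    "down": ROUND_FLOOR,         # round towards -infinity
    "chop": ROUND_DOWN,          # round towards zero
}


def rounded_add(x: str, y: str, mode: str, digits: int = 4) -> Decimal:
    """Add exactly, then round the sum to `digits` significant digits."""
    ctx = Context(prec=digits, rounding=RC_MODES[mode])
    return ctx.add(Decimal(x), Decimal(y))


def interval_add(a: tuple, b: tuple, digits: int = 4) -> tuple:
    """Add closed intervals (lo, hi) so that the result encloses every exact sum."""
    lo = rounded_add(a[0], b[0], "down", digits)  # lower bound rounded towards -infinity
    hi = rounded_add(a[1], b[1], "up", digits)    # upper bound rounded towards +infinity
    return lo, hi


for mode in RC_MODES:
    print(mode, rounded_add("1.23456", "0", mode), rounded_add("-1.23456", "0", mode))
print(interval_add(("1.00001", "2"), ("1", "2.00001")))
```

Each endpoint is rounded outward, so the computed interval [2.000, 4.001] contains the exact interval [2.00001, 4.00001] even though neither exact endpoint is representable in four digits.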
The 8087 addresses seven different data types using all of the 8086 addressing modes. These data types are:

1. Real (32 bits)

2. Long Real (64 bits)

3. Temporary Real (80 bits)

4. Integer Word (16 bit 2's complement)

5. Integer (32 bit 2's complement)

This new INTERVAL data type, which the 8087 supports through the rounding modes (RC) and the signed zeros and infinities, can be represented as an ordered pair: INTERVAL, $I = [a, b]$. If $a \le b$ then $I$ includes all numbers between $a$ and $b$; but if $a > b$ then $I$ includes all numbers $x$ where $x \ge a$ or $x \le b$. An illustration may help clarify the concept. Consider the set of numbers as a circle with the two cases described above pictured on it.

Start at $a$ and proceed clockwise until $b$ is reached; all numbers covered belong to $I$. The signs on zero and infinity permit us to have open or closed intervals when zero or infinity is an endpoint, with the sign denoting which case pertains. If an endpoint is neither zero nor infinity then the interval is always closed. A complete definition of interval arithmetic cannot be given here; however, we can list some of its uses. In addition to its obvious ability to bound rounding errors, interval arithmetic can be used to estimate the effect of noise in data, to compute confidence intervals and to do worst-case analysis.

The remainder instruction is for reducing arguments of periodic functions to a primary range. It calculates the exact remainder (no roundoff error) of the top two stack elements:

$$\mathrm{REM} = (\mathrm{TOP}) \bmod (\text{next-of-TOP})$$

The remainder is returned to the stack top and the next-of-TOP ("divisor") is not changed. Since the execution of a floating-point remainder could be very lengthy, the remainder instruction is actually a primitive: the result is either the remainder or the partial remainder after a fixed number of steps. Thus computing a remainder requires a software loop that terminates when $|(\mathrm{TOP})|$ is less than $|(\mathrm{TOP}+1)|$. Even by using remainder we will not have trigonometric functions with period $2\pi$, since $\pi$ cannot be exactly represented in the 8087. However, the functions will be exactly periodic with
period $2\pi^*$ (where $\pi^*$ is the machine approximation to $\pi$) and thus will obey the identities that do not explicitly involve $\pi$.

The other instructions provided for special functions are TANGENT, ARCTANGENT, EXPONENTIAL and LOGARITHM.

The tangent assumes the top of stack, X, is between zero and $\pi/4$ and returns two results as shown:

[Figure: TAN. Before: the argument X at TOP. After: two results, Y and X, at the top of the stack.]

The arctangent works in reverse, using two arguments and returning one:

[Figure: ATAN. Before: X at TOP and Y beneath it. After: TOP = Z, where $Z = \arctan(Y/X)$.]

The exponential instruction, which calculates $2^X - 1$, assumes that $0 \le X \le 1/2$ and overwrites the argument on the top of the stack with the result. The logarithm function, which computes $Y \cdot \log_2(X)$, uses two arguments and returns a single result as shown:

[Figure: LOG. Before: X at TOP (X > 0) and Y beneath it. After: TOP = Z, where $Z = Y \cdot \log_2(X)$.]

The error bound for all these functions is about 2 units in the last place, thus allowing functions of Long Real arguments to be computed to Long Real accuracy. The provision of the described special functions supports the goal of increased capability.

3.3 Ease of Use

As stated above, ease of use, along with support of the standard and extended capability, is a major 8087 goal. We have made the 8087 easy and convenient for programmers and automatic code generators by providing software emulation, a deep (8 levels) internal stack of very wide precision (64 bits) and large range ($10^{\pm 4900}$), optimized symmetric mixed-mode arithmetic, and on-chip default exception handling.

The interface between the 8086 (8088) and 8087 allows for software emulation of the 8087, permitting software for the 8087 to be developed, debugged and executed on a system containing only an 8086 (8088). In order to run the developed software on an 8087 it is not necessary to recompile, but only to relink. To understand how one can delay the resolution of either 8087 or emulator until the link stage, it is necessary to explain the 8086-8087 interface.

The 8086 (8088) has a set of ESCAPE instructions that, in memory addressing mode, cause the 8086 to calculate the address and read the contents of that address. The 8086 ignores the word it reads and then proceeds to execute subsequent instructions. The 8087 is monitoring the same instruction stream, and when it detects an ESCAPE it knows that it is being instructed to do something. It latches the opcode and, if there was an address calculated, the 8087 captures both the address and the datum read by the 8086. By decoding the instruction the 8087 knows how many more words it needs from memory, and it increments the address and fetches data until all required data is read. The 8087 then releases the bus and begins calculating while the 8086 continues executing the instruction stream. Because of the overlapped coprocessing of the 8086-8087, it is necessary to precede 8087 instructions (ESCAPE) with a WAIT instruction in order to synchronize the two processors. In place of the WAIT, when the software emulator is to be invoked, an INTERRUPT instruction is inserted. There are some other differences between the hardware and software interfaces, but they are the same length and use the same addressing mechanism. This permits a compiler to output an external reference instead of the WAIT-ESCAPE and let the LINKER fill in either WAIT-ESCAPE or INTERRUPT, depending on whether the user has an 8087 or desires to use the emulator.

In addition to software emulation to aid software development, the 8087 has an eight-level stack of registers that supports the Temporary Real (80-bit) format and makes the 8087 far easier to use than other floating-point processors. All calculations are done in this extended format, and as long as intermediates are kept in the stack or its equivalent memory format (if eight registers are not enough), the threat of roundoff damage and the risk of overflow or underflow are greatly reduced. Roundoff error is reduced because Temporary Real intermediates are more precise than Long Real data or final results by eleven guard bits. Most overflows and underflows occur on intermediate calculations, and the extended range of Temporary over Long Real ($10^{\pm 4900}$ vs. $10^{\pm 308}$) ensures that on intermediates these exceptions need seldom, if ever, occur.

The symmetric mixed-mode instruction set also contributes to ease of use. The CORE instructions, which include LOAD, STORE & POP, STORE, ADD, SUBTRACT, SUBTRACT REVERSE, MULTIPLY, DIVIDE, DIVIDE REVERSE, COMPARE, and COMPARE & POP, take one operand from the top of stack and a second operand from either memory or a stack element. There are thus two forms of CORE instructions: memory addressed and stack addressed. The memory-addressed form supports four memory formats in all 8086 addressing modes:

Integer Word (16 bit 2's complement)
Integer (32 bit 2's complement)
Real (32 bit)
Long Real (64 bit)
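Each of these four memory formats can be converted to the wider Temporary Real format without rounding, because every 16- and 32-bit integer and every finite Real and Long Real value fits in a 64-bit significand with a 15-bit exponent. The Python sketch below models the conversion a memory-addressed LOAD performs, assuming little-endian operands (the 8086 byte order) and using exact `Fraction` values as a stand-in for Temporary Real registers; `FORMATS` and `load_operand` are hypothetical names.

```python
import struct
from fractions import Fraction

# Memory-addressed CORE formats and their little-endian struct codes.
FORMATS = {
    "integer_word": "<h",  # 16-bit 2's complement
    "integer": "<i",       # 32-bit 2's complement
    "real": "<f",          # 32-bit Real
    "long_real": "<d",     # 64-bit Long Real
}


def load_operand(memory: bytes, fmt: str) -> Fraction:
    """Convert a memory operand to an exact value, as a LOAD onto the stack would.

    NaN or infinity operands raise an exception in this sketch.
    """
    (value,) = struct.unpack(FORMATS[fmt], memory)
    # Fraction(int) and Fraction(float) are both exact: no rounding occurs here.
    return Fraction(value)


print(load_operand(struct.pack("<h", -1234), "integer_word"))
print(load_operand(struct.pack("<f", 0.1), "real"))
```

The second call shows that a Real operand written as 0.1 actually holds 13421773/134217728; the LOAD preserves that stored value exactly instead of adding a second rounding on top of the one made when the operand was written to memory.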
The LOAD Integer instruction converts an integer to Temporary Real format and pushes it on the stack; the ADD Long Real instruction converts a Long Real operand to Temporary Real and adds it to the top of the stack; and the STORE Integer Word instruction converts the top of stack to a 16-bit integer and stores it in memory (without altering the contents of the stack).

The stack-addressed form of the CORE instructions obtains the second operand from one of the stack elements instead of memory. The reference is always relative to the top of stack; thus stack element i, where i = 0, ..., 7, refers to the ith element of the stack under the top of stack. The stack-addressed form has two options for the destination of the result. The result can either overwrite the top of stack or replace the contents of the ith stack element, depending on the setting of the "direction" (D) bit in the instruction. If the destination is the ith stack element then, depending on the setting of another bit (the "pop" (P) bit), the stack is popped or left unaltered.

The EXTENDED instruction set consists of two memory-addressed types of instructions, LOAD and STORE & POP, that support three additional memory formats:

Long Integer (64 bit 2's complement)
Temporary Real (80 bit)
Packed BCD (80 bit)

The Temporary Real format is supported for extending the 8087 stack to memory when necessary; the Packed BCD format, which is a signed 18-digit integer as shown,

[Figure: Packed BCD format (sign and 18 decimal digits).]

is used to aid binary-decimal conversion and COBOL-type calculations; and the Long Integer format is supported for applications requiring very wide precision exact computation. Again, it is important to note that conversion of these formats to Temporary Real is done with no rounding error.

Another instruction, included to make the 8087 easy to use, is in neither the CORE nor the EXTENDED set, but its value is obvious. That instruction is EXCHANGE top of stack with the ith stack element. This instruction has no memory form and ignores the D and P bits.

A further user convenience in the 8087 is its on-chip default exception handling. Though it is possible to handle exceptions with software, it is often an onerous task to write, debug and maintain exception handlers. The default 8087 response to an exception is invoked by masking that exception in the CONTROL WORD. The 8087's response to masked exceptions balances safety with the utility of continued calculation. Listed below are the default responses to masked exceptions:

1. Invalid Operation - if either operand is NAN, the 8087 propagates the lexicographically larger (ignoring the sign); otherwise it generates a special NAN called INDEFINITE as the result.

2. Denormalized Operand - the operand is converted to an equivalent unnormalized representation preserving the same number of leading zeros.

3. Zero Divisor - since the dividend is nonzero, the result is $\pm\infty$ with the sign set in the usual way (XOR of the signs of the operands).

4. Overflow - the result is $\infty$ with the sign of the overflowed result.

5. Underflow - the result is denormalized to fit the destination's format ("gradual underflow" [4]).

6. Inexact Result - the correctly rounded result is returned.

All of the features discussed above (software emulation, deep Temporary Real stack, symmetric and powerful instruction set and default exception handling) make the 8087 easy and convenient to use; but to be useful it must also be efficient.

3.4 Efficiency

Efficiency was a major goal in the design of the 8087. An extensive treatment of the internal hardware and algorithms will be given elsewhere, but a brief description will illustrate our concern for performance. The 8087's main ALU is more than 64 bits wide. This is to handle efficiently 64-bit operands with guard, round and sticky bits [6] and at least one overflow bit. Its shifter can shift right or left from 0 to 63 places in one clock cycle. This is useful for formatting, normalizing and denormalizing and for the transcendental functions. For normalizing there is hardware for detecting the position of the most significant one. Finally, there is special hardware to permit multiply, divide, remainder and square root to be calculated rapidly. Approximate speeds of the basic operations for stack operands are summarized below:

OPERATION               TIME AT 5 MHz (microseconds)
COMPARE                 5
ADD (MAGNITUDE)         10
SUBTRACT (MAGNITUDE)    16
MULTIPLY                16, 24*
DIVIDE                  38
SQUARE ROOT             38

* shorter time if either operand was originally Real (32 bit)

The above timings apply for Real, Long Real or Temporary Real operands and results. The previously described overlapped instruction execution by the 8086 and 8087 also increases throughput. However, more important than absolute execution speeds is the stack with its internal addressing
that minimizes memory referencing. There is an instruction for scaling that is much faster than multiply. For rapid context switching, the 8087 has SAVE and RESTORE instructions. The instruction set and the hardware to execute it rapidly give the 8087 very high performance without sacrificing quality.

5. Add TOP (Xi) to Sx and POP

6. LOAD Yi

7. Add TOP (Yi) to My

8. Multiply TOP (Yi) to Xi
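Steps such as 5 (add TOP to Sx and pop) and 8 (multiply TOP to Xi) use the stack-addressed CORE form, in which the D bit makes a stack element the destination and the P bit then pops the stack. The Python sketch below is a toy model of those semantics only: the class name `Stack8087`, its methods, the use of Python floats instead of 80-bit Temporary Real values, and the operand order `destination := destination op source` for non-commutative operations are all assumptions made for illustration.

```python
import operator


class Stack8087:
    """Toy model of the 8087 register stack (regs[0] is TOP, i.e. ST(0))."""

    DEPTH = 8

    def __init__(self):
        self.regs = []

    def load(self, value):
        """Push a value; a ninth push is a stack overflow (invalid operation)."""
        if len(self.regs) == self.DEPTH:
            raise OverflowError("stack overflow")
        self.regs.insert(0, float(value))

    def pop(self):
        """Remove and return TOP; popping an empty stack is a stack underflow."""
        if not self.regs:
            raise IndexError("stack underflow")
        return self.regs.pop(0)

    def core(self, op, i, d=False, p=False):
        """Stack-addressed CORE operation between TOP and stack element i.

        d=False: TOP := TOP op ST(i)   (result overwrites TOP; p is ignored)
        d=True:  ST(i) := ST(i) op TOP, then pop the stack if p is set.
        """
        if i >= len(self.regs):
            raise IndexError("stack underflow")
        if d:
            self.regs[i] = op(self.regs[i], self.regs[0])
            if p:
                self.pop()
        else:
            self.regs[0] = op(self.regs[0], self.regs[i])


stack = Stack8087()
stack.load(3.0)    # hypothetical element playing the role of Xi
stack.load(10.0)   # hypothetical accumulator playing the role of Sx
stack.load(2.5)    # TOP: the value to accumulate
stack.core(operator.add, 1, d=True, p=True)  # add TOP to ST(1) and pop
print(stack.regs)  # [12.5, 3.0]
stack.load(4.0)
stack.core(operator.mul, 2, d=True)          # multiply TOP to ST(2), no pop
print(stack.regs)  # [4.0, 12.5, 12.0]
```

Keeping accumulators such as Sx resident in stack elements means each step touches memory at most once (for the LOAD), which is the reduction in memory referencing the stack addressing is meant to provide.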
5. Kahan, W. (1972), "A Survey of Error Analysis," Information Processing 71, North-Holland Publishing Company, pp. 1214-1239.