Some Hardware Fundamentals and an Introduction to Software

Tom Butler

In order to comprehend fully the function of system software, it is vital to understand the operation of computer hardware and peripherals. The reason for this is that software and hardware are inextricably connected in a symbiotic relationship. First, however, we need to identify types of software and their relationship to each other and, ultimately, to the hardware.
Figure 1 Software Hierarchy (inverted pyramid; labels shown: Application Software, System Software, Firmware, Hardware, Computer Logic Circuits)

Figure 1 represents the relationship between the various types of software and hardware. The figure appears as an inverted pyramid to reflect the relative size and number of the various types of software, on one hand, and their proximity to computer hardware, on the other. First, application software is remote from, and rarely interacts with, the computer's hardware. This is particularly true of applications that run on modern operating systems such as Windows NT 4.0, 2000, XP and UNIX variants. By 2000, with the advent of the .NET and Java paradigms, applications became even further removed from hardware, as .NET's Common Language Runtime (CLR) and the Java Virtual Machine (JVM) provide operating system and hardware access. Indeed, from the operating system's perspective, the JVM and the CLR are merely applications. Older operating systems, such as MS-DOS, permitted some direct interaction between applications (chiefly computer games) and the hardware; however, this meant that vendors of such applications had to write code that would interact with the computer's BIOS (Basic Input/Output System, or firmware based on the computer's read only memory or ROM integrated circuits). Indeed, when computers first appeared on the market this practice was the norm, rather than the exception.

Application programmers soon tired of reinventing the wheel every time they wrote an application, as they would have to include software routines that helped the application software communicate with and control hardware devices, including the CPU (central processing unit). In order to overcome this, computer scientists focused on developing a new type of software, operating or system software, whose sole purpose was to provide an environment or interface for applications such that the burden of managing and communicating with the computer hardware was removed from application programs. This proved important as technological advances resulted in computer hardware becoming more sophisticated and difficult to manage. Thus, operating systems were developed to manage a computer's hardware resources and provide an application programming interface, as well as a user or administrator interface, to permit access to the hardware for use and configuration by application software programmers and systems administrators. In early computer systems, bootstrap code, loaded into the system manually via switches and/or pre-coded punched cards or teletype tape, was required to load the operating system program and boot the system so that an application program could be loaded and run.
The advent of read only memory (ROM) in the 1970s saw the emergence of firmware; that is, system software embedded in the hardware. The developers of mainframe computers, minicomputers, and early microprocessors saw the advantage of having some operating system code integrated into a computer's hardware to permit efficient operation, particularly during the power-up and boot-up phases and before the operating system was loaded from secondary storage (initially magnetic tape and later floppy and hard disks). However, firmware came into its own in microprocessor systems and, later, personal computers. By the turn of the new millennium, entire operating systems, such as Windows NT and Linux, appeared in the firmware of embedded systems. The most recent advances in this area have been in the PDA or Pocket PC market, where Palm OS and Microsoft CE are competing for dominance. That said, while almost every type of electronic device possesses firmware of one form or other, the most prevalent appears in personal computers (PCs). Likewise, PCs dominate the computer market due to their presence in all areas of human activity. Hence, understanding PC hardware has become a sine qua non for all who call themselves IT professionals. The remainder of this chapter therefore focuses on delineating the basic architecture of today's PC.

A Brief Look Under the Hood of Today's PC


This section provides a brief examination of the major components of the PC.

The Power Supply


The most oft-ignored of the PC's components is the system power supply. Most household electrical appliances operate on alternating current (AC): 110 Volt (60 Hz AC, e.g. USA) or 220 Volt (50 Hz AC, Europe). However, electronic subassemblies or entire devices with embedded logic circuitry, whether microprocessor-based or not, operate exclusively on direct current (DC). The job of a PC's power supply is to transform and rectify the external AC commercial supply to the range of DC voltages required by the computer logic, associated electronic components, the DC motors in the hard disk, floppy, CD-ROM and DVD drives, and the system fans. Typical DC power supplies in a PC are rated at 1.5, 3.3, 5, -5, 12 and -12 volts. Also note that as notebook and laptop computers have a rechargeable DC battery, they require special DC-DC converters to generate the required range of DC voltages. Several colour-designated cables emanate from a computer's power supply unit, the largest of which is connected to the computer's main circuit board, called the motherboard. The various DC voltages are distributed via the power supply rails printed onto the circuit board.

The Basic Input/Output Operating System


The Basic Input/Output System (BIOS) is system software and is a collection of hardware-related software routines embedded as firmware on a read-only memory (ROM) integrated circuit (IC) 'chip', which is typically housed on a computer's motherboard. Usually referred to as ROM BIOS, this software component provides the most fundamental means for the operating system to communicate with the hardware. However, most BIOSs are 16 bit programs and must operate in real mode (see footnote 1) on machines with Intel processors. While this does not cause performance problems during the boot-up phase, it means a degradation in PC performance as the CPU switches from protected to real mode when BIOS routines are referenced by an operating system. 32 bit BIOSs are presently in use, but are not widespread. Modern 32 bit operating systems such as LINUX do not use the BIOS after bootup, as the designers of LINUX integrated 32 bit versions of the BIOS routines into the LINUX kernel. Hence, the limitations of real mode switches in the CPU are avoided. Nevertheless, the BIOS plays a critical role during the boot-up phase, as it performs the power-on self test (POST) for the computer and then loads the boot code from the hard disk's master boot record (MBR), which in turn copies the system software into RAM for execution by the CPU.

When a computer is first turned on, DC voltages are applied to the CPU and associated electrical and logic circuits. This would lead to electronic mayhem if the CPU did not assert control. However, a CPU is merely a collection of hundreds of thousands (and now millions) of logic circuits. CPU designers therefore built in a predetermined sequence of programmed electronic events, which are triggered when a signal appears on the CPU's reset pin. This has the CPU's control unit use the memory address in the instruction counter (IC) register to fetch the first instruction to be executed.
The 32 bit value placed in the IC is the address of the first byte of the final 64 KB segment in the first 1 MB of the computer's address space (this is a hangover from the early days of the PC, when the last 384 KB of the first 1 MB of RAM was reserved for the system and peripheral BIOS routines, each of which was 64 KB in length). This is the address of the first of the many 16 bit BIOS instructions to be executed: remember, these instructions make up the various hardware-specific software routines in the system BIOS. These routines systematically check each basic hardware component, including the CPU, RAM, the system bus (including address, data and control lines) and the expansion buses. Peripheral devices such as the video graphics chipset/adapter, hard disk drives, etc. are then checked. Each hardware device also has its own BIOS routines that enable the CPU to communicate with it. These are viewed as extensions of the system BIOS and on boot up are invoked to ensure that the device is operating properly. The ROM BIOS also runs a check on itself. Of course, all of this happens under the control of the CPU, which is itself controlled by the BIOS routines.

If any of the basic components, such as the CPU, RAM, system bus, etc., malfunction, a special sequence of beeps is emitted from the PC speaker. Examples of errors detected by POST routines are: BIOS ROM checksum, RAM refresh failure, RAM parity check, RAM address line failures, Base 64K RAM failure, timer malfunction, CPU malfunction, keyboard fail, VIDEO memory malfunction, and so on. Once the video has been tested, text error messages are displayed for malfunctioning components. These allow a repair technician to quickly diagnose the cause of the failure.

Finally, the BIOS compares the system configuration with that stored in the CMOS chip. If a new device has been added (e.g. a hard disk drive) or changed, or its configuration altered, the BIOS will alert the user and/or take remedial action as necessary. For example, it will install a new hard disk and register it in the CMOS table. CMOS stands for complementary metal oxide semiconductor (or silicon, in some texts). CMOS integrated circuits have low power consumption characteristics and are therefore suitable as non-volatile RAM. CMOS chips can contain 64 KB of data, but the core system data takes only 128 bytes.

Footnote 1: 16 bit applications operate in real mode on all Intel CPUs. This effectively limits the address space to 1 MB, by using 16 x 64 KB program segments. Each 16 bit application can only address 64 KB (2^16 = 65,536 locations); however, the CPU manages and uses an extra 4 address lines to provide the BIOS, OS and applications with 16 (2^4 = 16) segment addresses.
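The real mode arithmetic in the footnote and in the description of the first 1 MB can be checked with a short calculation. The following is a minimal sketch in Python, not part of the original text: the names `SEGMENT_SIZE`, `SEGMENT_COUNT` and `segment_start` are illustrative assumptions, and the model of sixteen fixed 64 KB segments simply follows the footnote's simplified description.

```python
# Sketch of the 1 MB real mode layout described in the text.
# All names here are illustrative; the figures come from the text itself.
SEGMENT_SIZE = 2 ** 16          # 64 KB addressable by a 16 bit offset
SEGMENT_COUNT = 2 ** 4          # 16 segments selected by the 4 extra address lines
ADDRESS_SPACE = SEGMENT_SIZE * SEGMENT_COUNT   # 1 MB in total (2**20 bytes)


def segment_start(index):
    """Start address of the index-th 64 KB segment (0..15) in the first 1 MB."""
    if not 0 <= index < SEGMENT_COUNT:
        raise ValueError("segment index must be between 0 and 15")
    return index * SEGMENT_SIZE


# First byte of the final 64 KB segment in the first 1 MB.
final_segment = segment_start(SEGMENT_COUNT - 1)
# Start of the last 384 KB, the area reserved for BIOS routines.
reserved_start = ADDRESS_SPACE - 384 * 1024

print(f"address space : {ADDRESS_SPACE:,} bytes")
print(f"final segment : 0x{final_segment:05X}")
print(f"reserved area : 0x{reserved_start:05X} to 0x{ADDRESS_SPACE - 1:05X}")
```

The reserved 384 KB works out as exactly six 64 KB segments, which is consistent with the text's remark that each BIOS routine area was 64 KB in length.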
The BIOS puts structure on the data much like a database management system would. It also provides a software interface called the Setup Utility that can be accessed at boot up. The setup utility provides software control of the following system components: CPU, BIOS routines, the motherboard chipset, integrated peripherals (e.g. the floppy disk drive), power management setup, Plug'n'Play, Peripheral Component Interconnect (PCI), and basic system security. The drawback of CMOS chips is that power has to be maintained even when the computer is powered down. This is achieved using a small long-life battery mounted on the motherboard. However, with the advent of Flash ROM, the CMOS function has been integrated with the BIOS itself. This has also been significant for BIOS upgrades, which can be downloaded over the Internet and loaded or 'flashed' onto the ROM BIOS chip. Similar software is used to save or make changes to system setup features.

The other major function of the BIOS is to identify the boot device (CD-ROM, floppy disk or hard disk) and transfer the operating system code to RAM. The bootstrap loader is simply a short routine that polls each bootable device and then uses the device's master boot record to locate the system and/or boot partitions and thereby load the operating system files. In the Windows NT/2000 world, if there is more than one partition or disk drive, with one or more operating systems, then the first of these will be called the system partition. In Windows NT/2000 machines the system partition will hold the following files: NTLDR, BOOT.INI, NTDETECT.COM, NTBOOTDD.SYS. For example, the boot-up routine in the MBR will start the NTLDR program. In a multiboot system with more than one operating system, NTLDR will examine the BOOT.INI file to identify the default operating system and/or present options to the user to boot up a particular operating system such as Windows 98, XP, etc. If Windows NT/2000 is selected, then the NTDETECT.COM program determines the hardware configuration of the computer. This will have been previously stored in the HKEY_LOCAL_MACHINE hive of the Registry. The registry is stored as a binary file, but is a database of hardware and software components, as well as authorized users, their passwords and personal settings.
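The polling behaviour of the bootstrap loader can be sketched as a simple loop. This is a hedged illustration in Python rather than real firmware: the function names and the `fake_sectors` table are hypothetical, and the validity check uses the two byte signature (55 AA hex) that closes a 512 byte boot sector, a detail the text does not mention. Treating every device the same way is also a simplification.

```python
# Sketch of a bootstrap loader polling devices in boot order.
# Function names and the device table are hypothetical.
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid 512 byte boot sector
SECTOR_SIZE = 512


def is_bootable(first_sector):
    """True if the sector is 512 bytes long and ends with the boot signature."""
    return (first_sector is not None
            and len(first_sector) == SECTOR_SIZE
            and first_sector[-2:] == BOOT_SIGNATURE)


def find_boot_device(boot_order, read_first_sector):
    """Poll devices in order; return the first whose first sector is bootable."""
    for device in boot_order:
        if is_bootable(read_first_sector(device)):
            return device
    return None


# Hypothetical machine: empty CD-ROM and floppy drives, one bootable hard disk.
fake_sectors = {
    "cdrom": None,
    "floppy": None,
    "hard disk": bytes(SECTOR_SIZE - 2) + BOOT_SIGNATURE,
}
chosen = find_boot_device(["cdrom", "floppy", "hard disk"], fake_sectors.get)
print("booting from:", chosen)
```

Passing the sector reader in as a function keeps the polling logic separate from device access, which is how the sketch stands in for hardware it cannot actually touch.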

Figure 2 The Intel 845 Chipset

The Motherboard and the Chipset


The motherboard or system board houses all system components, from the CPU, RAM and expansion slots (e.g. ISA and PCI) to the I/O controllers. However, the key component on a motherboard is the chipset. While motherboards are identified physically by their form factor, the chipset designation indicates the capability of the motherboard to house system components. The most popular form factor is Intel's ATX. This motherboard was designed by Intel to increase air movement for cooling on-board components, and to allow easier access to the CPU and RAM. While the motherboard contains many chips or ICs, such as the CPU, RAM, BIOS, and a variety of smaller chips, two chips now handle most of the I/O functionality of a PC. The first is the Northbridge chip, which handles all communication (address, data and control) to the CPU, RAM, Accelerated Graphics Port (AGP) and PCI devices. The frontside system bus (FSB) terminates on the Northbridge chip and permits the CPU to access the RAM, AGP and PCI devices and those serviced by the Southbridge chip (and vice versa). The Southbridge chip permits communication with slow peripherals such as the floppy disk drive, the hard disk drive/CD-ROMs, ISA devices, the parallel, serial, mouse and keyboard ports, and the Flash ROM BIOS.
Figure 3 The Intel 850 Chipset

Intel and VIA are the leaders in chipset manufacture as of 2002, although there are several other manufacturers, such as ALi and SiS. While Intel services its own CPUs, VIA manufactures for both Intel and its major competitor AMD. In 2002, the basic Intel i850 chipset consisted of the 82850 Northbridge MCH (Memory Controller Hub) and an ICH2 (I/O Controller Hub) Southbridge. The chipset also contains a Firmware Hub (FWH) that provides access to the Flash ROM BIOS. This permits up to 4 GB of RAM with ECC (error correction), 4X AGP mode, 4 Ultra ATA 100 IDE disk drives, and four USB ports. ISA is not supported. Different chipset designs support different RAM types and speeds (e.g. DDR SDRAM or RAMBus DRAM), CPU types and packaging, system bus speeds, and so on.

In 2000, Intel announced that the future of RAM in the PC industry was RAMBus DRAM (RDRAM). This heralded the release of the Intel 820 'Camino' chipset, which supported three RAMBus memory slots. However, errors in the design meant that only two memory slots could be used. A loss of confidence in the marketplace led to the withdrawal of the ill-fated Camino and its replacement with the Intel 840 'Carmel' chipset. This includes a 64 bit PCI controller, a redesigned and improved RDRAM memory repeater, and an SDRAM memory repeater that converts the RDRAM protocol to SDRAM. This was intended as a smart move by Intel, but it backfired terribly, as the SDRAM hub had design errors that limited the number of SDRAMs that could be used. In addition, the RDRAM to SDRAM conversion protocol impaired overall memory throughput when using SDRAM. Consequently, faster memory performance on Intel's Pentium III Coppermine CPUs with a 133 MHz Frontside Bus could only be achieved using VIA's Apollo Pro 133 A. To make matters worse, the Intel 815 Solano chipset, which was introduced to support PC 133 DIMMs (SDRAM memory modules) and to help regain market share from VIA, would not allow SDRAM modules to work at 133 MHz if CPUs (such as certain variants of Intel's Pentium III) rated for a 100 MHz external clock rate were fitted on the motherboard. This particularly applies to the Celeron family, which ran at a 66 MHz external clock rate. It is significant that many of Intel's competitors promoted the PC133 and PC 266 DIMM standards over the more expensive RAMBus DRAM. This further impeded the acceptance of RDRAM; however, by late 2002, RDRAM had its own market niche as the price of SDRAM increased once more.
Intel learned from its experience with the Camino and Carmel chipsets. Bowing to market pressure, it designed two new chipset families for use with its new Pentium IV CPU. The first of these, the i845 (see Figure 2), was targeted at systems based on the Pentium IV and synchronous DRAM memory such as PC133, 233, and 333, with up to 3 GB of memory. The i850 (see Figure 3) was targeted at RDRAM-based systems of up to 4 GB, which supported PC 800, 1033 and 1066 RAMBus memory. In late 2002, the Intel 845GE chipset was released to support PC333 DDR SDRAM and the Pentium 4 processor. The chipset also included Intel's Extreme Graphics technology, which ran at a 266 MHz core speed. The basic member of the Intel 850 chipset family had support for PC800 RDRAM memory and provided a balanced performance platform for the Pentium 4 processor with a 400 MHz system bus and NetBurst™ Architecture. It also supports dual channel access to RDRAM RIMMs, which increases overall throughput to 3.2 GB/s. Subsequent developments in this chipset family provided support for RDRAM running at 1033 MHz and 1066 MHz and for a 533 MHz FSB. Further advances in DDR SDRAM technologies saw DDR SDRAM-based Intel and VIA chipsets which accommodated PC2400 and PC2700 DDR SDRAM running at 150 MHz and 166 MHz respectively, double clocked to 300 and 333 MHz (so-called DDR 300 and 333). However, the evolution of DDR366 and chipset design led to PC3000 DDR SDRAM being released with even higher bandwidth.
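The module names quoted above follow from simple arithmetic on clock rate and bus width. The sketch below, in Python, is illustrative only: the function names are invented, and the bus widths (a 64 bit data path for DDR SDRAM modules and a 16 bit data path per RDRAM channel) are assumptions drawn from general knowledge of these memory types rather than from the text.

```python
# Sketch of peak memory transfer rates. Function names are illustrative;
# the bus widths are assumptions not stated in the text.
def ddr_peak_mb_per_s(clock_mhz, bus_bytes=8, transfers_per_clock=2):
    """Peak transfer rate in MB/s for a DDR module on a 64 bit (8 byte) bus."""
    return clock_mhz * transfers_per_clock * bus_bytes


def rdram_peak_mb_per_s(rate_mhz, channels=1, channel_bytes=2):
    """Peak transfer rate in MB/s for RDRAM on 16 bit (2 byte) channels."""
    return rate_mhz * channel_bytes * channels


print(ddr_peak_mb_per_s(150))                 # PC2400 modules
print(ddr_peak_mb_per_s(166))                 # nominally PC2700 (the clock is nearer 166.67 MHz)
print(rdram_peak_mb_per_s(800, channels=2))   # dual channel PC800 RDRAM
```

Under these assumptions a 150 MHz double clocked module peaks at 2400 MB/s, which is where the PC2400 label comes from, and dual channel PC800 RDRAM peaks at 3200 MB/s, matching the 3.2 figure quoted above when it is read as gigabytes per second.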

Basic CPU Architectures

CISC vs. RISC


There are two fundamental types of CPU architecture: complex instruction set computers (CISC) and reduced instruction set computers (RISC). CISC is the most prevalent and established microprocessor architecture, while RISC is a relative newcomer. Intel's 80x86 and Pentium microprocessor families are CISC-based, although RISC-type functionality has been incorporated into Pentium CPUs. Motorola's 68000 family of microprocessors is another example of this type of architecture. Sun Microsystems' SPARC microprocessors and the MIPS R2000, R3000 and R4000 families dominate the RISC end of the market; however, Motorola's PowerPC, G4, Intel's i860, and Analog Devices Inc.'s digital signal processors (DSP) are in wide use. In the PC/Workstation market, Apple Computers and Sun employ RISC microprocessors as their choice of CPU.

Table 1 CISC and RISC

CISC                                                   RISC
Large instruction set                                  Compact instruction set
Complex, powerful instructions                         Simple hard-wired machine code and control unit
Instruction sub-commands microcoded in on board ROM    Pipelining of instructions
Compact and versatile register set                     Numerous registers
Numerous memory addressing options for operands        Compiler and IC developed simultaneously

The difference between the two architectures is the relative complexity of the instruction sets and underlying electronic and logic circuits in CISC microprocessors. For example, the original RISC I prototype had just 31 instructions, while the RISC II had 39. In the RISC II prototype, these instructions are hard-wired into the microprocessor using 41,000 integrated transistors, so that when a program instruction is presented for execution it can be processed immediately. This typifies the pure RISC approach, which results in up to a fourfold increase in processing power over comparable CISC processors. In contrast, the Intel 386 has 280,000 transistors and uses microcode stored in on-board ROM to process the instructions. Complex instructions have to be first decoded in order to identify which microcode routine needs to be executed to implement the instructions. The Pentium II uses 9.5 million transistors and, while older microcode is retained, the most frequently used and simpler instructions, such as MMX, are hardwired. Thus Pentium CPUs are essentially a hybrid; however, they are still classified as CISC as their basic instructions are complex.

Remember, the internal transistor logic gates in a CPU are opened and closed under the control of clock pulses (i.e. electrical voltage values of 0 or 5 volts representing 0 or 1). These simply process the binary machine code or data by producing predetermined outputs for given inputs. Machine code or instructions (the binary equivalent of high level programming code) control the operation of the CPU so that logical or mathematical operations can be executed. In CISC processors, complex instructions are first decoded and the corresponding microcode routine dispatched to the execution unit. The decode activity can take several clock cycles depending on the complexity of the instruction. In the 1970s, an IBM engineer discovered that 20% of the instructions were doing 80% of the work in a typical CPU. In addition, he found that a collection of simple instructions could perform the same operation as a complex instruction in fewer clock cycles. This led him to propose an architecture based on reduced instruction set size, where small instructions could be executed without decoding and in parallel with others. As indicated, this simplified CPU design and made for faster processing of instructions with reduced overhead in terms of clock cycles.

Figure 4 Typical Microprocessor Architectures (block diagram showing: the Address Bus, Data Bus and Control Bus, the last of which includes read/write, interrupt, clock and reset; the Bus Interface Unit; the Internal Bus; the Instruction Register; the Control Unit; the Decode Unit; the Program Counter; the Stack Pointer; the general purpose registers AX, BX, CX, DX, BP, SI and DI, where AX is the Accumulator; the Flag register; and the Arithmetic and Logic Unit)
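The contrast drawn above between hard-wired instructions and microcoded ones can be pictured as a table lookup. The following Python sketch is a toy model only: the instruction names, the contents of `MICROCODE_ROM` and the `dispatch` function are all invented for illustration and do not correspond to any real processor's instruction set.

```python
# Toy model of CISC-style decoding. Every name below is hypothetical.
# Hypothetical microcode ROM: a complex instruction maps to a routine of
# simpler internal steps.
MICROCODE_ROM = {
    "ADD_MEM_TO_MEM": ["LOAD_A", "LOAD_B", "ADD", "STORE"],
}
# Simple instructions with logic wired directly into the execution unit.
HARDWIRED = {"LOAD_A", "LOAD_B", "ADD", "STORE"}


def dispatch(instruction):
    """Return the steps sent to the execution unit for one instruction."""
    if instruction in HARDWIRED:
        return [instruction]                      # passed straight through
    if instruction in MICROCODE_ROM:
        return list(MICROCODE_ROM[instruction])   # decoded via microcode lookup
    raise ValueError(f"unknown instruction: {instruction}")


print(dispatch("ADD"))
print(dispatch("ADD_MEM_TO_MEM"))
```

In this model a pure RISC design would have an empty microcode table, while a hybrid such as the Pentium described above would keep both paths.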

Inside the CPU


The basic function of a CPU is to fetch, decode and execute instructions held in ROM or RAM. To accomplish this it must fetch data from an external memory source and transfer it into its own internal memory, each addressable component of which is called a register. It must also be able to distinguish between instructions and operands, that is, the read/write memory locations containing the data to be operated on. These may be byte-addressable locations in ROM, RAM or in the CPU's own registers. In addition, the CPU must perform additional tasks such as responding to external events (e.g. resets and interrupts) and providing memory management facilities to the operating system. A consideration of the fundamental components in a basic microprocessor is first undertaken before introducing more complex modern devices. Figure 4 illustrates a typical microprocessor architecture. Microprocessors must perform the following activities:

1. Provide temporary storage for addresses and data
2. Perform arithmetic and logic operations
3. Control and schedule all operations.

Registers

Registers are used for a variety of purposes such as holding the address of instructions and data, storing the result of an operation, signaling the result of a logic operation, or indicating the status of the program or the CPU itself. Some registers may be accessible to programmers, while others are reserved for use by the CPU itself. Registers store binary values such as 1 or 0 as electrical voltages of, say, 5 volts or 0 volts. They consist of several integrated transistors which are configured as flip-flop circuits, each of which can be switched into a 1 or 0 state. They remain in that state until changed under control of the CPU or until the power is removed from the processor. Each register has a specific name and is addressable; some, however, are dedicated to specific tasks while the majority are 'general purpose'. The width of a register depends on the type of CPU, e.g. a 16, 32 or 64 bit microprocessor. In order to provide backward compatibility, registers may be sub-divided. For example, the Pentium processor is a 32 bit CPU, and its registers are 32 bits wide. Some of these are sub-divided and named as 8 and 16 bit registers in order to run 8 and 16 bit applications designed for earlier x86 microprocessors.

Instruction Register

When the Bus Interface Unit receives an instruction it transfers it to the Instruction Register for temporary storage. In Pentium processors the Bus Interface Unit transfers instructions to the L1 I-Cache; there is no instruction register as such.

Stack Pointer

A 'stack' is a small area of reserved memory used to store the data in the CPU's registers when: (1) system calls are made by a process to operating system routines; (2) hardware interrupts are generated by input/output (I/O) transactions on peripheral devices; (3) a process initiates an I/O transfer; (4) a process rescheduling event occurs on foot of a hardware timer interrupt. This transfer of register contents is called a 'context switch'. The stack pointer is the register which holds the address of the most recent 'stack' entry. Hence, when a system call is made by a process (to, say, print a document) and its context is stored on the stack, the called system routine uses the stack pointer to reload the register contents when it is finished printing. Thus the process can continue where it left off.
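The save and reload behaviour described for the stack pointer can be mimicked in a few lines. This Python sketch is a simplified model and makes several assumptions: the `TinyCPU` class and its method names are hypothetical, the stack is a Python list rather than an area of memory, and the stack pointer is a list index rather than an address.

```python
# Toy model of saving and restoring register contents via a stack.
# The class, its methods and the register values are hypothetical.
class TinyCPU:
    """Toy CPU with a few named registers and a stack held in a Python list."""

    def __init__(self):
        self.registers = {"AX": 0, "BX": 0, "CX": 0, "DX": 0}
        self.stack = []            # stands in for the reserved memory area
        self.stack_pointer = -1    # index of most recent entry; -1 means empty

    def save_context(self):
        """Push a copy of the registers and point at the new top entry."""
        self.stack.append(dict(self.registers))
        self.stack_pointer = len(self.stack) - 1

    def restore_context(self):
        """Reload the registers from the entry the stack pointer refers to."""
        self.registers = self.stack.pop(self.stack_pointer)
        self.stack_pointer = len(self.stack) - 1


cpu = TinyCPU()
cpu.registers.update(AX=7, BX=42)     # state of the running process
cpu.save_context()                    # e.g. on a system call
cpu.registers.update(AX=0, BX=0)      # the system routine uses the registers
cpu.restore_context()                 # the process resumes where it left off
print(cpu.registers)
```

Because each save pushes a fresh copy, nested calls restore in the reverse order to that in which they were saved, which is the defining property of a stack.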
Instruction Decoder

The Instruction Decoder is an arrangement of logic elements which act on the bits that constitute the instruction. Simple instructions with corresponding logic hard-wired into the execution unit are simply passed to the Execution Unit (and/or the MMX in the Pentium II, III and IV); complex instructions are decoded so that related microcode modules can be transferred from the CPU's microcode ROM to the execution unit. The Instruction Decoder will also store referenced operands in appropriate registers so data at the memory locations referenced can be fetched.

Program or Instruction Counter

The Program Counter (PC) is the register that stores the address in primary memory (RAM or ROM) of the next instruction to be executed. In 32 bit systems, this is a 32 bit linear or virtual memory address that references a byte (the first of 4 required to store the 32 bit instruction) in the process's virtual memory address space. This value is translated to determine the real memory address in which the instruction is stored. When the referenced instruction is fetched, the address in the PC is incremented to the address of the next instruction to be executed. If the current address is 00B0 hex, then the next address will be 00B4 hex. Remember, each byte in RAM is individually addressable; however, each complete instruction is 32 bits or 4 bytes, so the address of the next instruction in the process will be 4 bytes on.

Accumulator

The accumulator may contain data to be used in a mathematical or logical operation, or it may contain the result of an operation. General purpose registers are used to support the accumulator by holding data to be loaded to/from the accumulator.

Computer Status Word or Flag Register

The result of an ALU operation may have consequences for subsequent operations; for example, changing the path of execution. Individual bits in this register are set or reset in accordance with the result of mathematical or logical operations. Also called a flag, each bit in the register has a preassigned meaning and the contents are monitored by the control unit to help control CPU related actions.

Arithmetic and Logic Unit

The Arithmetic and Logic Unit (ALU) performs all arithmetic and logic operations in a microprocessor, viz. addition, subtraction, logical AND, OR, EX-OR, etc.
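The interplay between ALU operations and the flag register can be illustrated with a small function. This is a minimal sketch in Python under stated assumptions: the function name `alu`, the operation mnemonics and the three flags chosen (zero, carry, sign) are illustrative, and real flag registers define more bits than are modelled here. Operands are assumed to be unsigned values that fit in 32 bits.

```python
# Sketch of an ALU that reports flags alongside its result.
# The function name, mnemonics and flag selection are illustrative.
WIDTH = 32
MASK = (1 << WIDTH) - 1


def alu(op, a, b):
    """Perform one operation on two 32 bit values; return (result, flags)."""
    if op == "ADD":
        full = a + b
    elif op == "SUB":
        full = a - b
    elif op == "AND":
        full = a & b
    elif op == "OR":
        full = a | b
    elif op == "XOR":
        full = a ^ b
    else:
        raise ValueError(f"unsupported operation: {op}")
    result = full & MASK               # keep only the low 32 bits
    flags = {
        "zero": result == 0,
        "carry": full != result,       # carry out of, or borrow into, bit 31
        "sign": bool(result >> (WIDTH - 1)),
    }
    return result, flags


# A comparison implemented as a subtraction: the zero flag picks the path.
_, flags = alu("SUB", 5, 5)
path = "values equal" if flags["zero"] else "values differ"
print(path)
print(alu("ADD", 0xFFFFFFFF, 1))
```

The comparison at the end shows how a flag set by one operation can change the path of execution of the instructions that follow it.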
A typical ALU is connected to the accumulator, the general purpose registers and other CPU components that help transfer the result of its operations to RAM via the Bus Interface Unit and the system bus. The results may also be written into internal or external caches.

Control Unit

The control unit coordinates and manages CPU activities, in particular the execution of instructions by the arithmetic and logic unit (ALU). In Pentium processors its role is complex, as microcode from decoded instructions is pipelined for execution by two ALUs.

The System Clock

The Intel 8088 had a clock speed of 4.77 MHz; that is, its internal logic gates were opened and closed under the control of a square wave pulsed signal that had a frequency of 4.77 million cycles per second. Alternatively put, the logic gates opened and closed 4.77 million times per second. Thus, instructions and data were pumped through the integrated transistor logic circuits at a rate of 4.77 million bits per second. Later designs ran at higher speeds, viz. the i286 at 8-20 MHz, the i386 at 16-33 MHz and the i486 at 25-50 MHz. Where does this clock signal come from? Each motherboard is fitted with a quartz oscillator in a metal package that generates a square wave clock pulse of a certain frequency. In i8088 systems the crystal oscillator ran at 14.318 MHz and this was fed to the i8284 to generate the system clock frequency of 4.77 MHz in earlier systems, rising to 10 MHz in later designs. Later, the i286 PCs had a 12 MHz crystal which provided the i82284 IC multiplier/divider with the primary clock signal. This then divided/multiplied the basic 12 MHz to generate the system clock signal of 8-20 MHz. With the advent of the i486DX, the system clock signal, which ran at 25 or 33 MHz, was effectively multiplied by factors of 2 and 3 to deliver an internal CPU clock speed of 50, 66, 75 or 100 MHz. This approach is used in Pentium IV architectures, where the primary crystal source delivers a relatively slow 50 MHz clock signal that is then multiplied to the system clock speed of 100-133 MHz. The internal multiplier in the Pentium then multiplies this by a factor of 20+ to obtain speeds of 2 GHz and above.
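The clock figures quoted in this subsection can be reproduced with one line of arithmetic each. The Python sketch below is illustrative: the function name `derived_clock_mhz` is invented, and the divide-by-three ratio applied to the 14.318 MHz crystal is an assumption that the text does not state, although it does reproduce the 4.77 MHz figure.

```python
# Sketch of deriving system and CPU clocks from a source clock.
# The function name is illustrative.
def derived_clock_mhz(source_mhz, multiplier=1.0, divider=1.0):
    """Clock frequency obtained by multiplying and/or dividing a source clock."""
    return source_mhz * multiplier / divider


# 14.318 MHz crystal divided by 3 (the divider value is an assumption).
print(round(derived_clock_mhz(14.318, divider=3), 2))

# 25 and 33 MHz system clocks multiplied by factors of 2 and 3.
for system_clock in (25, 33):
    for factor in (2, 3):
        print(system_clock, "x", factor, "=", derived_clock_mhz(system_clock, factor))

# 100 MHz system clock multiplied by a factor of 20.
print(derived_clock_mhz(100, multiplier=20))
```

Note that 33 multiplied by 3 gives 99 rather than 100; the quoted 100 MHz is a nominal figure, since the underlying system clock is closer to 33.3 MHz.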

Instruction Cycle

An instruction cycle consists of the activities required to fetch and execute an instruction. The length of time taken to fetch and execute is measured in clock cycles. In CISC processors this will take many clock cycles, depending on the complexity of the instruction and the number of memory references made to load operands. In RISC computers the number of clock cycles is reduced significantly. When the CPU finishes the execution of an instruction it transfers the content of the program (or instruction) counter into the Bus Interface Unit (1 clock cycle). This is then gated onto the system address bus and the read signal is asserted on the control bus (1 clock cycle). This is a signal to the RAM controller that the value at this address is to be read from memory and loaded onto the data bus (4+ clock cycles). The instruction is read in from the data bus and decoded (2+ clock cycles). The fetch and decode activities constitute the first machine cycle of the instruction cycle. The second machine cycle begins when the instruction's operand is read from RAM and ends when the instruction is executed and the result written back to memory. This will take at least another 8+ clock cycles, depending on the complexity of the instruction. Thus an instruction cycle will take at least 16 clock cycles, a considerable length of time. Together, RISC processors and fast RAM can keep this to a minimum. However, Intel made advances by super pipelining instructions, that is, by interleaving fetch, decode, operand read, execute, and retire (i.e. write the result of the instruction to RAM) activities into two separate pipelines serving two ALUs. Hence, instructions are not executed sequentially, but concurrently and in parallel (more about pipelining later).
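The cycle counts above can be tallied to confirm the figure of 16, and the same loop can show the program counter advancing by 4 bytes per instruction. This Python sketch is a simplified model with invented names (`MIN_CYCLES`, `run`); it uses only the minimum cycle counts from the text and assumes strictly sequential, non-pipelined execution of fixed-length 32 bit instructions.

```python
# Tally of the minimum clock cycles per stage, as quoted in the text.
# Names are illustrative; execution is assumed sequential and non-pipelined.
MIN_CYCLES = {
    "program counter to bus interface unit": 1,
    "address and read signal onto the buses": 1,
    "RAM read onto the data bus": 4,
    "instruction read and decode": 2,
    "operand read, execute and write back": 8,
}
INSTRUCTION_BYTES = 4   # one 32 bit instruction


def run(start_address, instruction_count):
    """Step the program counter through sequential instructions.

    Returns the final program counter and the minimum clock cycles used.
    """
    program_counter = start_address
    cycles = 0
    for _ in range(instruction_count):
        cycles += sum(MIN_CYCLES.values())
        program_counter += INSTRUCTION_BYTES
    return program_counter, cycles


pc, cycles = run(0x00B0, 1)
print(f"next address: {pc:04X} hex, cycles: {cycles}")
```

Pipelining, as described above, attacks the cycle total by overlapping these stages across instructions rather than by shortening any single stage.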

5th and 6th Generation Intel CPU Architecture


The Pentium microprocessor was the last of Intel's 5th generation microprocessors and had several basic units: the Bus Interface Unit (BIU); the I-Cache (8 KB of write-through Static RAM, or SRAM); the Instruction Translation Lookaside Buffer (TLB); the D-Cache (8 KB of write-back SRAM); the Data TLB; the Clock Driver/Multiplier; the Instruction Fetch Unit; the Branch Prediction Unit; the Instruction Decode Unit; the Complex Instruction Support Unit; the Superscalar Integer Execution Unit; and the Pipelined Floating Point Unit. Figure 5 presents a block diagram of the original Pentium.

The Pentium was the first Intel chip to have a 64 bit external data bus, which was split internally into two separate pipelines, each 32 bits wide. This allowed the Pentium to execute two instructions simultaneously; however, more than one instruction could be in each pipeline, thus increasing instruction throughput.

Heat dissipation is the enemy of chip designers: the greater the number of integrated transistors, and the higher the speed of operation and the operating voltage, the more power is consumed and the more heat is generated. The first two Pentium versions ran at 60 and 66 MHz respectively with an operating voltage of 5 V DC. Hence they ran quite hot. However, a change in package design (from Socket 5 to 7, Pin Grid Array, or PGA) and a reduction in operating voltage to 3.3 Volts lowered power consumption and heat dissipation. Intel also introduced a clock multiplier, which multiplied the external clock signal and enabled the Pentium to run at 1.5, 2, 2.5 and finally 3 times this speed. Thus while the system bus ran at 50, 60, and 66 MHz, the CPU ran at 75-200 MHz.

In 1997, Intel changed the Pentium design in several ways, the most significant being the inclusion of an MMX unit (multi media extension) and 16 KB instruction and data caches. The MMX unit contains eight new 64 bit registers and 57 'simple' hardwired MMX instructions that operate on 4 new data types.

The internal architecture and external operation of the Pentium family evolved from the Pentium MMX, with the Pentium Pro, Pentium II and Pentium III. However, major design changes came with the Pentium IV. Modifications and design changes centered on (a) the physical package; (b) the process by which instructions were decoded and executed; (c) support for memory beyond the 4 GB limit; (d) the integration and enhancement of L1 and L2 cache performance and size; (e) the addition of a new cache; (f) the speed of internal and external operation. Each of these issues receives attention in the following subsections.
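The clock multiplier relationship described above is simple arithmetic: core clock = external bus clock x multiplier. A minimal sketch follows, pairing the bus speeds and multipliers quoted in the text; the particular pairings are illustrative assumptions rather than a list of shipped parts.

```python
def core_clock_mhz(bus_mhz, multiplier):
    """Internal CPU clock produced by multiplying the external bus clock."""
    return bus_mhz * multiplier


# (bus MHz, multiplier) pairs chosen for illustration only.
for bus_mhz, multiplier in [(50, 1.5), (60, 2), (66, 2.5), (66, 3)]:
    print(bus_mhz, "MHz x", multiplier, "=", core_clock_mhz(bus_mhz, multiplier), "MHz")
```

Note that 66 x 3 gives 198 rather than 200; the nominal "66 MHz" bus actually runs at about 66.67 MHz, which is why the product is quoted as 200 MHz.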

Figure 5 Pentium CPU Block Diagram
[Diagram not reproduced. Blocks shown: 64 bit Data Bus, 32 bit Address Bus and Control Bus feeding the Bus Interface Unit; I-Cache (8KB) with TLB; D-Cache (8KB) with TLB; Branch Target Buffer; Clock Multiplier; Prefetch Buffer; Fetch and Decode Unit; Microcode ROM; Dual Pipeline Execution Unit (U-Pipeline and V-Pipeline, two ALUs); Control Unit; Floating Point Unit; Registers; Advanced Programmable Interrupt Controller.]

Physical Packaging
Two terms are employed to describe the packaging used for the Pentium family of processors: the first refers to the motherboard connection, and the second to the actual package itself. For example, the original Pentium P5 was fitted to the Socket 5 type connection on the motherboard using a Staggered Pin Grid Array (SPGA) for the die's I/O (die is the technical term for the physical structure that incorporates the chip). Later variants used the Socket 7 connector.

The Pin Grid Array (PGA) family of packages is associated with different Socket types, which are numbered. A pin grid array is simply an array of metal pin connectors used to form an electrical connection between the internal electronics of the CPU (packaged on the die) and other system components like the system chipsets. The pins plug into corresponding receptacle pinholes in the CPU's socket on the motherboard. The different types of PGA reflect the type of packaging (e.g. ceramic or plastic), the number of pins, and how they are arrayed.

The Pentium Pro used an SPGA with a staggering 387 pins for connection to the motherboard socket, called Socket 8. The Pentium Pro was the first Intel processor to have an L2 cache connected to the CPU via a backside bus, but on a separate die. This was a significant technical achievement in packaging.

When Intel designed the Pentium II they decided to change the packaging significantly and introduced a Single Edge Contact Cartridge (SECC) package (with three variants: SECC for the Pentium II, SECC2 for the Pentium II and SEPP for the Celeron), each of which plugged into the Slot 1 connector on the motherboard. However, later variants of the Celeron and Pentium III used PGA packaging for certain applications: the Celeron uses the Plastic PGA, and the Celeron III and Pentium III use the Flip-Chip Pin Grid Array (FC-PGA). Both use the 370-pin Socket. The Pentium IV saw a full return to the PGA for all chips. Here a Flip-Chip Pin Grid Array (FC-PGA) was employed in a 478 PCPGA package.

Overall Architectural Comparison of the Pentium Family of Microprocessors


The Pentium (P5) first shipped in 1993 and had 3.1 million transistors. It used a 5 Volt supply to power its core and I/O logic, was packaged as a PGA on Socket 4, had a 2 x 8 KB L1 cache, and operated at 50, 60 and 66 MHz. The system bus also operated at these speeds. The Pentium (P54C) was released in 1994; it was packaged as a PGA on Socket 5 and 7 and used a 3.3 Volt supply for core and I/O logic. It was also the first to use a multiplier to give processor speeds of 75, 90, 100, 120, 133, 150, 166 and 200 MHz. The last version of the first member of this sub-generation was the Pentium MMX (P55C). This had 4.1 million transistors, fit Socket 7, and had a 2 x 16 KB L1 cache with improved branch prediction logic. It operated at 2.8 V for its core logic and 3.3 V for I/O logic. Its 60 and 66 MHz system clock speed was multiplied on board the CPU to give CPU clock speeds of between 120 and 300 MHz.

• Superscalar architecture: two integer pipelines, U (slow) and V (fast), and one floating point pipeline. The U and V pipelines contain five stages of instruction execution, while the floating point pipeline has 8 stages. The U and V pipelines are served by two 32 byte prefetch buffers. This allows overlapping execution of instructions in the pipelines.
• Dynamic branch prediction using the Branch Target Buffer. The Pentium's branch prediction logic helps speed up program execution by anticipating branches and ensuring that branched-to code is available in cache.
• An Instruction and a Data Cache, each of 8 Kbyte capacity
• A 64 bit system data bus and 32 bit address bus
• Dual processing capability
• On-board Advanced Programmable Interrupt Controller
• The Pentium MMX version contains an additional MMX unit that speeds up multimedia and 3D applications. Processing multimedia data involves instructions operating on large volumes of packetized data. Intel proposed a new approach: single instruction multiple data, which could operate on video pixels or Internet audio streams. The MMX unit contains eight new 64 bit registers and 57 'simple' hardwired MMX instructions that operate on 4 new data types. To leverage the features of the MMX unit, applications must be programmed to include the new instructions.
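To make the "single instruction multiple data" idea concrete, the sketch below emulates one packed operation on a 64 bit value treated as eight unsigned byte lanes. It is an illustrative model in Python, not MMX code: the helper names are hypothetical, and the saturating behaviour is modelled on the kind of packed unsigned byte add that MMX provides (PADDUSB), which is not named in the text above.

```python
def unpack_bytes(qword):
    """Split a 64-bit integer into eight unsigned bytes, lowest byte first."""
    return [(qword >> (8 * lane)) & 0xFF for lane in range(8)]


def pack_bytes(lanes):
    """Combine eight unsigned bytes (lowest first) into one 64-bit integer."""
    result = 0
    for lane, value in enumerate(lanes):
        result |= (value & 0xFF) << (8 * lane)
    return result


def packed_add_saturate_u8(a, b):
    """Add all eight byte lanes in one call; each lane clamps at 255 instead of wrapping."""
    sums = [min(x + y, 255) for x, y in zip(unpack_bytes(a), unpack_bytes(b))]
    return pack_bytes(sums)


# Brighten eight hypothetical 8-bit pixels by 50 in a single packed operation.
pixels = pack_bytes([10, 100, 200, 250, 0, 255, 128, 64])
brighten = pack_bytes([50] * 8)
print(unpack_bytes(packed_add_saturate_u8(pixels, brighten)))
# [60, 150, 250, 255, 50, 255, 178, 114]
```

Saturation matters for pixel data: a bright pixel stays white when brightened instead of wrapping round to a dark value, which is why packed data types with clamping suit video and audio streams.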

Pentium Pro
The Pentium Pro was designed around the 6th generation P6 architecture, which was optimized for 32 bit instructions and 32-bit operating systems such as Windows NT and Linux. It was the first of the P6 family, which included the Pentium II, the Celeron variants, and the Pentium III. As indicated, the physical package was also a significant advance, as was the incorporation of additional RISC features. However, aimed as it was at the server market, the Pentium Pro did not incorporate MMX technology. It was expensive to produce as it included the L2 cache on its substrate (but on a separate die) and had 5.5 million transistors at its core and over 8 million in its L2 cache. Its core logic operated at 3.3 Volts. The microprocessor was still, however, chiefly CISC in design, and optimized for 32 bit operation. The chief features of the Pentium Pro were:

• A partly integrated L2 cache of up to 512 KB (on a specially manufactured, separate SRAM die) that was connected via a dedicated 'backside' bus that ran at full CPU speed.
• Three 12 staged pipelines
• Speculative execution of instructions
• Out-of-order completion of instructions
• 40 renamed registers
• Dynamic branch prediction
• Multiprocessing with up to 4 Pentium Pros
• An increased address bus size of 36 bits (from 32) to enable up to 64 GB of memory to be used. (Please note that the 4 extra address bits can take 16 distinct values; this gives 4 GB x 16 = 64 GB of memory.)

The following description is taken from Intel's introduction to its microprocessor architecture and is relevant to all members of the P6 family, including the Celeron, Pentium II and III.

The Intel Pentium Pro processor has three-way superscalar architecture. The term "three-way superscalar" means that using parallel processing techniques, the processor is able on average to decode, dispatch, and complete execution of (retire) three instructions per clock cycle. To handle this level of instruction throughput, the Pentium Pro processor uses a decoupled, 12-stage superpipeline that supports out-of-order instruction execution. It does this by incorporating even more parallelism than the Pentium processor. The Pentium Pro processor provides Dynamic Execution (micro-data flow analysis, out-of-order execution, superior branch prediction, and speculative execution) in a superscalar implementation.

The centerpiece of the Pentium Pro processor architecture is an innovative out-of-order execution mechanism called "dynamic execution." Dynamic execution incorporates three data-processing concepts:

• Deep branch prediction.
• Dynamic data flow analysis.
• Speculative execution.


Branch prediction is a concept found in most mainframe and high-speed RISC microprocessor architectures. It allows the processor to decode instructions beyond branches to keep the instruction pipeline full. In the Pentium Pro processor, the instruction fetch/decode unit uses a highly optimized branch prediction algorithm to predict the direction of the instruction stream through multiple levels of branches, procedure calls, and returns.
Figure 6 Functional Block Diagram of the Pentium Pro Processor Micro-architecture
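The text does not describe the Pentium Pro's prediction algorithm in detail, so the sketch below illustrates the general idea with a textbook two-bit saturating counter instead. This is an assumed, simplified scheme and is not Intel's actual design; the class name, the starting state and the branch address are all hypothetical.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter per branch address (states 0..3)."""

    def __init__(self):
        self.counters = {}  # branch address -> counter state

    def predict(self, address):
        # States 2 and 3 predict "taken"; unseen branches start at 1 (weakly not taken).
        return self.counters.get(address, 1) >= 2

    def update(self, address, taken):
        state = self.counters.get(address, 1)
        self.counters[address] = min(state + 1, 3) if taken else max(state - 1, 0)


def accuracy(outcomes, address=0x1000):
    """Fraction of branch outcomes predicted correctly for a single branch."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)
    return correct / len(outcomes)


# A loop branch: taken nine times, then falls through, repeated three times.
loop_branch = ([True] * 9 + [False]) * 3
print(accuracy(loop_branch))
```

Even this very simple predictor guesses 26 of the 30 outcomes correctly for the loop pattern, because a single loop exit does not flip its prediction. That is the property that lets a pipeline keep fetching down the likely path.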

Dynamic data flow analysis involves real-time analysis of the flow of data through the processor to determine data and register dependencies and to detect opportunities for out-of-order instruction execution. The Pentium Pro processor dispatch/execute unit can simultaneously monitor many instructions and execute these instructions in the order that optimizes the use of the processor's multiple execution units, while maintaining the integrity of the data being operated on. This out-of-order execution keeps the execution units busy even when cache misses and data dependencies among instructions occur.

Speculative execution refers to the processor's ability to execute instructions ahead of the program counter but ultimately to commit the results in the order of the original instruction stream. To make speculative execution possible, the Pentium Pro processor microarchitecture decouples the dispatching and executing of instructions from the commitment of results. The processor's dispatch/execute unit uses data-flow analysis to execute all available instructions in the instruction pool and store the results in temporary registers. The retirement unit then linearly searches the instruction pool for completed instructions that no longer have data dependencies with other instructions or unresolved branch predictions. When completed instructions are found, the retirement unit commits the results of these instructions to memory and/or the Intel Architecture registers (the processor's eight general-purpose registers and eight floating-point unit data registers) in the order they were originally issued and retires the instructions from the instruction pool.

Through deep branch prediction, dynamic data-flow analysis, and speculative execution, dynamic execution removes the constraint of linear instruction sequencing between the traditional fetch and execute phases of instruction execution. It allows instructions to be decoded deep into multi-level branches to keep the instruction pipeline full. It promotes out-of-order instruction execution to keep the processor's six instruction execution units running at full capacity. And finally it commits the results of executed instructions in original program order to maintain data integrity and program coherency.

Three instruction decode units work in parallel to decode object code into smaller operations called "micro-ops" (microcode). These go into an instruction pool, and (when interdependencies don't prevent it) can be executed out of order by the five parallel execution units (two integer, two FPU and one memory interface unit). The Retirement Unit retires completed micro-ops in their original program order, taking account of any branches.

The power of the Pentium Pro processor is further enhanced by its caches: it has the same two on-chip 8-KByte L1 caches as does the Pentium processor, and also has a 256-512 KByte L2 cache that's in the same package as, and closely coupled to, the CPU, using a dedicated 64-bit ("backside") full clock speed bus. The L1 cache is dual ported, the L2 cache supports up to 4 concurrent accesses, and the 64-bit external data bus is transaction-oriented, meaning that each access is handled as a separate request and response, with numerous requests allowed while awaiting a response. These parallel features for data access work with the parallel execution capabilities to provide a "non-blocking" architecture in which the processor is more fully utilized and performance is enhanced.
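The interplay of out-of-order execution and in-order retirement described above can be illustrated with a toy simulation. This is a heavily simplified sketch under stated assumptions: every micro-op in the pool may start as soon as the results it reads are available (unlimited execution units), registers are assumed to be renamed so only true data dependencies matter, and the instruction names, registers and latencies are invented for the example.

```python
def simulate(program):
    """program: list of (name, dest, sources, latency) in program order.

    Returns (completion_order, retirement_order) as lists of names.
    """
    n = len(program)
    deps, last_writer = [], {}
    for index, (_name, dest, sources, _latency) in enumerate(program):
        # An op depends on the most recent earlier op that wrote each source.
        deps.append([last_writer[s] for s in sources if s in last_writer])
        last_writer[dest] = index

    finish = [None] * n          # cycle in which each op's result becomes ready
    completed, retired = [], []
    head, cycle = 0, 0           # head = oldest op not yet retired
    while head < n:
        cycle += 1
        # Dispatch: start every waiting op whose producers finished in an earlier cycle.
        for i in range(n):
            ready = all(finish[d] is not None and finish[d] < cycle for d in deps[i])
            if finish[i] is None and ready:
                finish[i] = cycle + program[i][3] - 1
        # Execute: record ops whose results become ready this cycle.
        for i in range(n):
            if finish[i] == cycle:
                completed.append(program[i][0])
        # Retire: commit strictly in program order, oldest first.
        while head < n and finish[head] is not None and finish[head] <= cycle:
            retired.append(program[head][0])
            head += 1
    return completed, retired


program = [
    ("load", "r1", (), 4),       # slow memory read
    ("add", "r2", ("r1",), 1),   # must wait for the load
    ("mul", "r3", (), 2),        # independent of the load
    ("sub", "r4", ("r3",), 1),   # depends only on the mul
]
completed, retired = simulate(program)
print(completed)   # ['mul', 'sub', 'load', 'add']
print(retired)     # ['load', 'add', 'mul', 'sub']
```

The independent `mul` and `sub` finish while the `load` is still outstanding, yet results are committed in original program order, which is the behaviour the retirement unit is described as guaranteeing.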

Pentium Pro Modes of Operation


The Intel Architecture supports three operating modes: protected mode, real-address mode, and system management mode. The operating mode determines which instructions and architectural features are accessible:

Protected mode. The native state of the processor. In this mode all instructions and architectural features are available, providing the highest performance and capability. This is the recommended mode for all new applications and operating systems. Among the capabilities of protected mode is the ability to directly execute "real-address mode" 8086 software in a protected, multi-tasking environment. This feature is called virtual-8086 mode, although it is not actually a processor mode. Virtual-8086 mode is actually a protected mode attribute that can be enabled for any task.

Real-address mode. Provides the programming environment of the Intel 8086 processor with a few extensions (such as the ability to switch to protected or system management mode). The processor is placed in real-address mode following power-up or a reset.

System management mode. A standard architectural feature unique to all Intel processors, beginning with the Intel386 SL processor. This mode provides an operating system or executive with a transparent mechanism for implementing platform-specific functions such as power management and system security. The processor enters SMM when the external SMM interrupt pin (SMI#) is activated or an SMI is received from the advanced programmable interrupt controller (APIC). In SMM, the processor switches to a separate address space while saving the entire context of the currently running program or task. SMM-specific code may then be executed transparently. Upon returning from SMM, the processor is placed back into its state prior to the system management interrupt.

The basic execution environment is the same for each of these operating modes.
Basic Pentium Execution Environment
Any program or task running on an Intel Architecture processor is given a set of resources for executing instructions and for storing code, data, and state information. These resources (shown in Figure 7) include an address space of up to 2^32 bytes, a set of general data registers, a set of segment registers, and a set of status and control registers. When a program calls a procedure, a procedure stack is added to the execution environment. (Procedure calls and the procedure stack implementation are described in Chapter 4, Procedure Calls, Interrupts, and Exceptions.)
Figure 7 Basic Execution Environment

Pentium Pro Memory Organization
The memory that the processor addresses on its bus is called physical memory. Physical memory is organized as a sequence of 8-bit bytes. Each byte is assigned a unique address, called a physical address. The physical address space ranges from zero to a maximum of 2^32 - 1 (4 gigabytes). Virtually any operating system or executive designed to work with an Intel Architecture processor will use the processor's memory management facilities to access memory. These facilities provide features such as segmentation and paging, which allow memory to be managed efficiently and reliably. Memory management is described in detail later. The following paragraphs describe the basic methods of addressing memory when memory management is used.

When employing the processor's memory management facilities, programs do not directly address physical memory. Instead, they access memory using any of three memory models: flat, segmented, or real-address mode.

With the flat memory model (see Figure 8), memory appears to a program as a single, continuous address space, called a linear address space. Code (a program's instructions), data, and the procedure stack are all contained in this address space. The linear address space is byte addressable, with addresses running contiguously from 0 to 2^32 - 1. An address for any byte in the linear address space is called a linear address.

With the segmented memory model, memory appears to a program as a group of independent address spaces called segments. When using this model, code, data, and stacks are typically contained in separate segments. To address a byte in a segment, a program must issue a logical address, which consists of a segment selector and an offset.

(A logical address is often referred to as a far pointer.) The segment selector identifies the segment to be accessed and the offset identifies a byte in the address space of the segment. The programs running on an Intel Architecture processor can address up to 16,383 segments of different sizes and types, and each segment can be as large as 2^32 bytes (4 GB). Internally, all the segments that are defined for a system are mapped into the processor's linear address space. So, the processor translates each logical address into a linear address to access a memory location. This translation is transparent to the application program.

The primary reason for using segmented memory is to increase the reliability of programs and systems. For example, placing a program's stack in a separate segment prevents the stack from growing into the code or data space and overwriting instructions or data, respectively. And placing the operating system's or executive's code, data, and stack in separate segments protects them from the application program and vice versa.

With either the flat or segmented model, the Intel Architecture provides facilities for dividing the linear address space into pages and mapping the pages into virtual memory. If an operating system/executive uses the Intel Architecture's paging mechanism, the existence of the pages is transparent to an application program.

The real-address mode model uses the memory model for the Intel 8086 processor, the first Intel Architecture processor. It was provided in all the subsequent Intel Architecture processors for compatibility with existing programs written to run on the Intel 8086 processor. The real-address mode uses a specific implementation of segmented memory in which the linear address space for the program and the operating system/executive consists of an array of segments of up to 64K bytes in size each. The maximum size of the linear address space in real-address mode is 2^20 bytes.
Figure 8 Three Memory Management Models
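The text gives the limits of real-address mode (64K byte segments, a 2^20 byte linear space) without the translation rule. On the 8086 the rule is the familiar one: shift the 16-bit segment value left by four bits and add the 16-bit offset. The sketch below assumes that rule, and also assumes 8086-style wrap-around at 1 MB; the function name is hypothetical.

```python
def real_mode_linear_address(segment, offset):
    """Translate an 8086-style segment:offset pair into a 20-bit linear address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Segment * 16 + offset, wrapped to 20 bits as on the original 8086 (assumption).
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_linear_address(0x1234, 0x0010)))  # 0x12350
print(hex(real_mode_linear_address(0xB800, 0x0000)))  # 0xb8000
print(2 ** 20)                                        # 1048576 bytes of linear space
```

Because segments start every 16 bytes but span 64K bytes, they overlap, so many different segment:offset pairs name the same linear address.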

32-bit vs. 16-bit Address and Operand Sizes
The processor can be configured for 32-bit or 16-bit address and operand sizes. With 32-bit address and operand sizes, the maximum linear address or segment offset is FFFFFFFFH (2^32 - 1), and operand sizes are typically 8 bits or 32 bits. With 16-bit address and operand sizes, the maximum linear address or segment offset is FFFFH (2^16 - 1), and operand sizes are typically 8 bits or 16 bits.

When using 32-bit addressing, a logical address (or far pointer) consists of a 16-bit segment selector and a 32-bit offset; when using 16-bit addressing, it consists of a 16-bit segment selector and a 16-bit offset. Instruction prefixes allow temporary overrides of the default address and/or operand sizes from within a program.

When operating in protected mode, the segment descriptor for the currently executing code segment defines the default address and operand size. A segment descriptor is a system data structure not normally visible to application code. Assembler directives allow the default addressing and operand size to be chosen for a program. The assembler and other tools then set up the segment descriptor for the code segment appropriately.

When operating in real-address mode, the default addressing and operand size is 16 bits. An address-size override can be used in real-address mode to enable 32 bit addressing; however, the maximum allowable 32-bit address is still 0000FFFFH (2^16 - 1).
Figure 9 Application Programming Registers

REGISTERS
The processor provides 16 registers for use in general system and application programming. As shown in Figure 9, these registers can be grouped as follows:

• General-purpose data registers. These eight registers are available for storing operands and pointers.
• Segment registers. These registers hold up to six segment selectors.
• Status and control registers. These registers report and allow modification of the state of the processor and of the program being executed.

General-Purpose Data Registers
The 32-bit general-purpose data registers EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP are provided for holding the following items:

• Operands for logical and arithmetic operations
• Operands for address calculations

Although all of these registers are available for general storage of operands, results, and pointers, caution should be used when referencing the ESP register. The ESP register holds the stack pointer and as a general rule should not be used for any other purpose. Many instructions assign specific registers to hold operands. For example, string instructions use the contents of the ECX, ESI, and EDI registers as operands. When using a segmented memory model, some instructions assume that pointers in certain registers are relative to specific segments. For instance, some instructions assume that a pointer in the EBX register points to a memory location in the DS segment. The following is a summary of these special uses:

• EAX: Accumulator for operands and results data.
• EBX: Pointer to data in the DS segment.
• ECX: Counter for string and loop operations.
• EDX: I/O pointer.
• ESI: Pointer to data in the segment pointed to by the DS register; source pointer for string operations.
• EDI: Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations.
• ESP: Stack pointer (in the SS segment).
• EBP: Pointer to data on the stack (in the SS segment).

As shown in Figure 9, the lower 16 bits of the general-purpose registers map directly to the register set found in the 8086 and Intel 286 processors and can be referenced with the names AX, BX, CX, DX, BP, SP, SI, and DI. Each of the lower two bytes of the EAX, EBX, ECX, and EDX registers can be referenced by the names AH, BH, CH, and DH (high bytes) and AL, BL, CL, and DL (low bytes).
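The register aliasing just described amounts to masking and shifting one 32-bit value. The following sketch models EAX and its AX, AH and AL views in Python; the function names are hypothetical and the values are arbitrary examples.

```python
def sub_registers(eax):
    """Return the EAX, AX, AH and AL views of one 32-bit register value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF                 # lower 16 bits
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,       # high byte of AX
        "AL": ax & 0xFF,              # low byte of AX
    }


def write_al(eax, value):
    """Writing AL replaces only the lowest byte; the upper 24 bits of EAX are kept."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


views = sub_registers(0x12345678)
print({name: hex(value) for name, value in views.items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
print(hex(write_al(0x12345678, 0xFF)))   # 0x123456ff
```

The same pattern applies to EBX, ECX and EDX; the remaining four registers expose only their 16-bit lower halves (BP, SP, SI, DI).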

Segment Registers
The segment registers (CS, DS, SS, ES, FS, and GS) hold 16-bit segment selectors. A segment selector is a special pointer that identifies a segment in memory. To access a particular segment in memory, the segment selector for that segment must be present in the appropriate segment register. When writing application code, you generally create segment selectors with assembler directives and symbols. The assembler and other tools then create the actual segment selector values associated with these directives and symbols. If you are writing system code, you may need to create segment selectors directly.

How segment registers are used depends on the type of memory management model that the operating system or executive is using. When using the flat (unsegmented) memory model, the segment registers are loaded with segment selectors that point to overlapping segments, each of which begins at address 0 of the linear address space (as shown in Figure 10). These overlapping segments then comprise the linear-address space for the program. (Typically, two overlapping segments are defined: one for code and another for data and stacks. The CS segment register points to the code segment and all the other segment registers point to the data and stack segment.)

When using the segmented memory model, each segment register is ordinarily loaded with a different segment selector so that each segment register points to a different segment within the linear-address space (as shown in Figure 11). At any time, a program can thus access up to six segments in the linear-address space. To access a segment not pointed to by one of the segment registers, a program must first load the segment selector for the segment to be accessed into a segment register.
Figure 10 Use of Segment Registers for Flat Memory Model

Figure 11 Use of Segment Registers in Segmented Memory Model

Each of the segment registers is associated with one of three types of storage: code, data, or stack. For example, the CS register contains the segment selector for the code segment, where the instructions being executed are stored. The processor fetches instructions from the code segment, using a logical address that consists of the segment selector in the CS register and the contents of the EIP register. The EIP register contains the offset within the code segment of the next instruction to be executed. The CS register cannot be loaded explicitly by an application program. Instead, it is loaded implicitly by instructions or internal processor operations that change program control (such as procedure calls, interrupt handling, or task switching).

The DS, ES, FS, and GS registers point to four data segments. The availability of four data segments permits efficient and secure access to different types of data structures. For example, four separate data segments might be created: one for the data structures of the current module, another for the data exported from a higher-level module, a third for a dynamically created data structure, and a fourth for data shared with another program. To access additional data segments, the application program must load segment selectors for these segments into the DS, ES, FS, and GS registers, as needed.

The SS register contains the segment selector for a stack segment, where the procedure stack is stored for the program, task, or handler currently being executed. All stack operations use the SS register to find the stack segment. Unlike the CS register, the SS register can be loaded explicitly, which permits application programs to set up multiple stacks and switch among them.

The four segment registers CS, DS, SS, and ES are the same as the segment registers found in the Intel 8086 and Intel 286 processors; the FS and GS registers were introduced into the Intel Architecture with the Intel386 family of processors.

EFLAGS Register
The 32-bit EFLAGS register contains a group of status flags, a control flag, and a group of system flags. Figure 3-7 defines the flags within this register. Following initialization of the processor (either by asserting the RESET pin or the INIT pin), the state of the EFLAGS register is 00000002H. Bits 1, 3, 5, 15, and 22 through 31 of this register are reserved. Software should not use or depend on the states of any of these bits.

Some of the flags in the EFLAGS register can be modified directly, using special-purpose instructions (described in the following sections). There are no instructions that allow the whole register to be examined or modified directly. However, the following instructions can be used to move groups of flags to and from the procedure stack or the EAX register: LAHF, SAHF, PUSHF, PUSHFD, POPF, and POPFD. After the contents of the EFLAGS register have been transferred to the procedure stack or EAX register, the flags can be examined and modified using the processor's bit manipulation instructions (BT, BTS, BTR, and BTC).

When suspending a task (using the processor's multitasking facilities), the processor automatically saves the state of the EFLAGS register in the task state segment (TSS) for the task being suspended. When binding itself to a new task, the processor loads the EFLAGS register with data from the new task's TSS.
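Once a copy of EFLAGS is in a general register, examining and changing flags is ordinary bit manipulation. The sketch below mimics the effect of BT, BTS, BTR and BTC on a plain Python integer. The helper functions are hypothetical stand-ins rather than the instructions themselves (the real instructions report the tested bit in the carry flag), and the bit positions used for CF (bit 0) and ZF (bit 6) come from the Intel Architecture flag layout rather than from the text above.

```python
EFLAGS_RESET = 0x00000002                     # state after RESET or INIT, per the text
RESERVED_BITS = [1, 3, 5, 15] + list(range(22, 32))
CF, ZF = 0, 6                                 # carry and zero flag positions (assumed layout)


def bt(value, bit):
    """Bit test: return the selected bit (0 or 1)."""
    return (value >> bit) & 1


def bts(value, bit):
    """Bit test and set: return the value with the selected bit set."""
    return value | (1 << bit)


def btr(value, bit):
    """Bit test and reset: return the value with the selected bit cleared."""
    return value & ~(1 << bit) & 0xFFFFFFFF


def btc(value, bit):
    """Bit test and complement: return the value with the selected bit flipped."""
    return value ^ (1 << bit)


def reserved_mask():
    """32-bit mask covering the bits software should not depend on."""
    return sum(1 << bit for bit in RESERVED_BITS)


flags = bts(EFLAGS_RESET, ZF)                 # pretend an operation produced zero
print(hex(flags), bt(flags, ZF), bt(flags, CF))   # 0x42 1 0
print(hex(reserved_mask()))                       # 0xffc0802a
```

The only bit set in the reset value 00000002H is bit 1, which is one of the reserved positions listed in the text.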

When a call is made to an interrupt or exception handler procedure, the processor automatically saves the state of the EFLAGS register on the procedure stack. When an interrupt or exception is handled with a task switch, the state of the EFLAGS register is saved in the TSS for the task being suspended.

Instruction Pointer
The instruction pointer (EIP) register contains the offset in the current code segment for the next instruction to be executed. It is advanced from one instruction boundary to the next in straightline code, or it is moved ahead or backwards by a number of instructions when executing JMP, Jcc, CALL, RET, and IRET instructions. The EIP register cannot be accessed directly by software; it is controlled implicitly by control-transfer instructions (such as JMP, Jcc, CALL, and RET), interrupts, and exceptions. The only way to read the EIP register is to execute a CALL instruction and then read the value of the return instruction pointer from the procedure stack. The EIP register can be loaded indirectly by modifying the value of a return instruction pointer on the procedure stack and executing a return instruction (RET or IRET).

All Intel Architecture processors prefetch instructions. Because of instruction prefetching, an instruction address read from the bus during an instruction load does not match the value in the EIP register. Even though different processor generations use different prefetching mechanisms, the function of the EIP register to direct program flow remains fully compatible with all software written to run on Intel Architecture processors.

Operand-size and Address-size Attributes
When the processor is executing in protected mode, every code segment has a default operand-size attribute and address-size attribute. These attributes are selected with the D (default size) flag in the segment descriptor for the code segment. When the D flag is set, the 32-bit operand-size and address-size attributes are selected; when the flag is clear, the 16-bit size attributes are selected. When the processor is executing in real-address mode, virtual-8086 mode, or SMM, the default operand-size and address-size attributes are always 16 bits.

The operand-size attribute selects the sizes of operands that instructions operate on. When the 16-bit operand-size attribute is in force, operands can generally be either 8 bits or 16 bits, and when the 32-bit operand-size attribute is in force, operands can generally be 8 bits or 32 bits.

The address-size attribute selects the sizes of addresses used to address memory: 16 bits or 32 bits. When the 16-bit address-size attribute is in force, segment offsets and displacements are 16 bits. This restriction limits the size of a segment that can be addressed to 64 KBytes. When the 32-bit address-size attribute is in force, segment offsets and displacements are 32 bits, allowing segments of up to 4 GBytes to be addressed.

The default operand-size attribute and/or address-size attribute can be overridden for a particular instruction by adding an operand-size and/or address-size prefix to an instruction. The effect of this prefix applies only to the instruction it is attached to.
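The interaction between the D flag and the override prefixes can be summarised as "each prefix toggles its attribute for one instruction". The sketch below encodes that rule. The function names are hypothetical, and the prefix byte values 66H (operand size) and 67H (address size) are taken from the Intel Architecture instruction encoding rather than from the text above.

```python
OPERAND_SIZE_PREFIX = 0x66   # assumed from the Intel Architecture encoding
ADDRESS_SIZE_PREFIX = 0x67   # assumed from the Intel Architecture encoding


def effective_sizes(d_flag, prefixes=()):
    """Return (operand_bits, address_bits) for one instruction.

    d_flag: True if the code segment's D flag is set (32-bit defaults).
    prefixes: prefix bytes attached to this instruction only.
    """
    default_bits = 32 if d_flag else 16
    toggled_bits = 16 if d_flag else 32
    operand_bits = toggled_bits if OPERAND_SIZE_PREFIX in prefixes else default_bits
    address_bits = toggled_bits if ADDRESS_SIZE_PREFIX in prefixes else default_bits
    return operand_bits, address_bits


def max_segment_bytes(address_bits):
    """Largest segment reachable with offsets of the given width."""
    return 2 ** address_bits


print(effective_sizes(True))                    # (32, 32)
print(effective_sizes(True, (0x66,)))           # (16, 32)
print(effective_sizes(False, (0x66, 0x67)))     # (32, 32)
print(max_segment_bytes(16), max_segment_bytes(32))   # 65536 4294967296
```

Because the prefix applies to a single instruction, the next instruction without a prefix reverts to the defaults selected by the D flag.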

Pentium II
The Pentium II incorporates many of the salient features of the Pentium Pro and Pentium MMX; however, its physical package was based on the SECC/Slot 1 interface and its 512 KB L2 cache ran at only half the processor internal clock rate. First generation Pentium II Klamath CPUs operated at 233, 266, 300 and 333 MHz with an FSB of 66 MHz and a core voltage of 2.8 Volts. In 1998, Intel introduced the Pentium II Deschutes that operated at a speed of 350, 400 and 450 MHz with a 100 MHz, and later 66 MHz, FSB and at 2.0 Volts at the core. Its major improvements were:

16 KB L1 instruction and data caches
L2 cache with non-proprietary commercially available SRAM
Improved 16 bit capability through segment register caches
MMX unit

Standard Pentium II could only be used in dual multiprocessor configurations; however, Pentium XEON CPUs had up to 2 MB of L2 cache and could be used in multiprocessor configurations of up to 4 processors.


Celeron
The Celeron began as a scaled down version of the Pentium II and was designed to compete against similar offerings from Intel's competitors. The Klamath-based Covington core ran at 266 and 300 MHz and was constructed without an L2 cache. However, adverse market reaction saw the introduction of the Deschutes-based Mendocino core, which had a 128 KB L2 cache and ran at 300, 333, 400, 433, 466, 500 and 533 MHz. Celerons have the same L1 cache as their bigger brothers, the Pentium II and III. The important distinction is that the L2 cache operates at full CPU clock rates, unlike the Pentium II and the SECC packaged Pentium III. (Later variants of the Pentium III had an on-die L2 cache which ran at full CPU clock rate.) The Celeron III (Coppermine128 core) has the same internal features as the Pentium III, but has reduced functionality: a 66 MHz FSB clock rate, no error correction codes for the data bus or parity creation for the address bus, and a maximum of 4 GB of address space. Celeron III Coppermine128s with a 1.6 V core and a 100 MHz FSB were produced in 2001 and operated at core speeds of up to 1.1 GHz. Tualatin-core Celerons were put on the market in late 2001 and ran at 1.2 GHz. 2002 saw the final versions produced, running at 1.3 and 1.4 GHz.

Pentium III
The only significant difference between the Pentium III and its predecessor was the inclusion of 72 MMX instructions, known as the Internet Streaming Single Instruction Multiple Data Extensions (ISSE), which include integer and floating point operations. However, like the original MMX instructions, application programmers must include the corresponding extensions if any use is to be made of these instructions. The most controversial and short-lived addition was the CPU ID number, which could be used for software licensing and e-commerce. After protest from various sources, Intel disabled it by default, but did not remove it. Depending on the BIOS and motherboard manufacturer, it may remain disabled, but it can be enabled via the BIOS. In reality, Pentium III performance was based. The three variants of the Pentium III were the Katmai, Coppermine, and Tualatin. Katmai first introduced the ISSE (MMX/2) as described, with an FSB of 100 MHz. The Coppermine introduced Advanced Transfer Cache (ATC) for the L2 cache, which reduced cache capacity to 256 KB but saw the cache run at full processor speed. Also, the 64-bit Katmai cache bus was quadrupled to 256 bits. Coppermine also uses an 8-way set associative cache, rather than the 4-way set associative cache in the Katmai and older Pentiums. Bringing the cache on-die also increased the transistor count to 30 million, from the 10 million on the Katmai. Another advance in the Coppermine was Advanced System Buffering (ASB), which simply increased the number of buffers to account for the increased FSB speed of 133 MHz. The Pentium III Tualatin had a reduced die size that allowed it to run at higher speeds. Tualatins use a 133 MHz FSB and have ATC and ASB.

Pentium IV: The Next Generation


The release of the Pentium IV in 2000 heralded the seventh generation of Intel microprocessors. The release was premature, however, because the Pentium III Coppermine, with its 1 GHz performance threshold, was being outperformed by Intel's major competitor in the microprocessor market, the AMD Athlon. Intel was not ready to answer the competition through the early release of the next member of its Pentium III family, the Pentium III Tualatin, which was designed to break the 1 GHz barrier. Previous attempts to do so with the Pentium III Coppermine 1.13 GHz met with failure due to design flaws. Paradoxically, however, Intel was in a position to release the first of the Pentium IV family, the Willamette, which ran at 1.3, 1.4 and 1.5 GHz, using an FC-PGA package on the short-lived Socket 423, which was a design dead end for motherboard manufacturers and consumers. Worse still, the only Intel chipset available for the Pentium IV could only house the highly expensive Rambus DRAM. In addition, the early versions of the Pentium IV CPU were outperformed by slower AMD Athlons. Nevertheless, the core capability of Intel's seventh generation processors is that they can run at ever-higher speeds. For example, Intel's sixth generation Pentiums began at 120 MHz with the Pentium Pro and ended at over 1.2 GHz, a tenfold increase. The bottom line here is that Intel's seventh generation chips could end up running at speeds of 10 GHz or more. How has Intel achieved this? Through a radical redesign of the Pentium's core architecture. The following sections illustrate the major advances.

The most visible feature of the new Pentium IV is the Front Side Bus (FSB), which initially operated at an equivalent speed of 400 MHz as compared to 100 MHz on the Pentium III. The Pentium III has a 64-bit data bus that delivered a data throughput of 1.066 GB/s (64 bits x 133 MHz / 8 = 1.066 GB/s). The Pentium IV FSB is also 64 bits wide; however, its 100 MHz bus speed is "quad-pumped", giving an effective bus speed of 400 MHz and a data transfer rate of 3.2 GB/s. The newer (as of late 2002) Pentium IV chipsets operate at 133 MHz and deliver an effective bus speed of 533 MHz and a data transfer rate of 4.2 GB/s. Thus, the Pentium IV exchanges data with the i845 and i850 chipsets faster than any other processor, removing the Pentium III's most significant bottleneck. Intel's 850 chipset for the Pentium IV uses two Rambus channels to 2-4 RDRAM RIMMs. Together, these two RDRAM channels are able to deliver the same data bandwidth as the Pentium IV FSB. As the later discussion on DRAM indicates, similar transfer rates are delivered using the i845 chipset and DDR DRAM. This constellation enables Pentium IV systems to have the highest data transfer rates between processor, system and main memory, which is a clear benefit.
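The bus throughput figures quoted above follow from simple arithmetic: bytes per transfer, multiplied by the bus clock, multiplied by the number of transfers per clock. The sketch below reproduces them; the function name is hypothetical, and it uses the nominal clock values, so the Pentium III figure comes out as 1.064 GB/s (the 1.066 GB/s figure corresponds to the exact 133.33 MHz clock).

```python
# Illustrative peak-bandwidth arithmetic for a front side bus (function name is hypothetical).

def bus_bandwidth_gb_per_s(width_bits: int, clock_mhz: float,
                           transfers_per_clock: int = 1) -> float:
    """Peak bandwidth in GB/s (1 GB = 1000 MB) for a bus of the given width."""
    bytes_per_transfer = width_bits / 8
    mb_per_s = bytes_per_transfer * clock_mhz * transfers_per_clock
    return mb_per_s / 1000


pentium3_fsb = bus_bandwidth_gb_per_s(64, 133)                 # about 1.06 GB/s
pentium4_fsb_400 = bus_bandwidth_gb_per_s(64, 100, 4)          # quad-pumped: 3.2 GB/s
pentium4_fsb_533 = bus_bandwidth_gb_per_s(64, 133, 4)          # quad-pumped: about 4.26 GB/s
```

Quad-pumping leaves the bus width and base clock unchanged; only the number of transfers per clock rises from one to four, which is why the bandwidth rises fourfold.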

Advanced Transfer Cache


The first major improvement is the integration of the L2 cache and the evolution of the Advanced Transfer Cache introduced in the Pentium III Coppermine, which had just 256 KB of L2 cache. The first Pentium IV, the Willamette, had a similar sized cache, but could transfer data at 48 GB per second at a CPU clock speed of 1.5 GHz into the CPU's core logic. In comparison, the Coppermine could only transfer 16 GB/s at 1 GHz to its L1 Instruction Cache. Note also that the Front Side Bus speed of the Pentium III was 133 MHz, while the Pentium IV Willamette had an FSB speed of 400 MHz. In addition, the Pentium IV L2 cache has 128-byte cache lines, which are divided into two 64-byte segments. For example, when the Pentium IV fetches data from the RAM, it does so in 64-byte burst transfers. If just four bytes (32 bits) are required, this block transfer becomes inefficient. However, the cache has advanced Data Prefetch Logic that predicts the data required by the cache and loads it into the L2 cache in advance. The Pentium IV's hardware prefetch logic significantly accelerates the execution of processes that operate on large data arrays. The read latency (the time it takes the cache to transfer data into the pipeline) of the Pentium IV's L2 cache is 7 clock pulses. However, its connection to the core logic (the Translation Lookaside Buffer in this case, as there is no I-Cache in the Pentium IV) is 256 bits wide and clocked at the full processor speed. The second member of the Pentium IV family was the Northwood, which had a 512 KB L2 cache running at the processor's clock speed.
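The cache transfer rates quoted above can be checked with the same kind of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the Coppermine figure assumes a 256-bit path that delivers one transfer every second core clock, which is one way to arrive at the 16 GB/s quoted in the text.

```python
# Illustrative arithmetic for the L2 cache path (function names are hypothetical).

def cache_path_gb_per_s(width_bits: int, core_clock_ghz: float,
                        clocks_per_transfer: int = 1) -> float:
    """Peak transfer rate in GB/s of a cache-to-core path."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * core_clock_ghz / clocks_per_transfer


def burst_utilisation(bytes_needed: int, burst_bytes: int = 64) -> float:
    """Fraction of a burst transfer that the program actually asked for."""
    return bytes_needed / burst_bytes


willamette = cache_path_gb_per_s(256, 1.5)                           # 48.0 GB/s
coppermine = cache_path_gb_per_s(256, 1.0, clocks_per_transfer=2)    # 16.0 GB/s (assumed timing)
useful_fraction = burst_utilisation(4)                               # 0.0625
```

The last figure shows why a 64-byte burst is wasteful when only four bytes are needed: just over 6% of the transferred data is used, which is the inefficiency the prefetch logic is meant to offset.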

L1 Data Cache
The second major development in cache technology is that the Pentium IV has only one L1 cache, an 8 KB data cache. In place of the L1 instruction cache (I-Cache) in the 6th generation Pentiums, it has a much more efficient Execution Trace Cache. Intel reduced the size of its L1 data cache to enable a very low latency of only 2 clock cycles. This results in an overall read latency (the time it takes to read data from cache memory) of less than half that of the Pentium III's L1 data cache.

7th Generation NetBurst Micro-Architecture


Intel's NetBurst Micro-Architecture provides a firm foundation for future advances in processor performance, particularly where speed of operation is concerned. The NetBurst micro-architecture has four major components: Hyper Pipelined Technology, Rapid Execution Engine, Execution Trace Cache and a 400 MHz system bus. Also incorporated are four significant improvements over sixth generation architecture: Advanced Dynamic Execution, Advanced Transfer Cache, Enhanced Floating Point & Multimedia Unit, and Streaming SIMD Extensions 2.

Hyper Pipelined Technology


The traditional approach to increasing a CPU's clock speed was to make smaller processors by shrinking the die. An alternative strategy, evident in RISC processors, is to make the CPU do less per clock cycle and have more clock cycles. To do this in a CISC-based processor, Intel simply increased the number of stages in the processor's pipeline. The upshot of this is that less is accomplished per clock cycle. This is akin to a "bucket-brigade" passing smaller buckets rapidly down a chain, rather than larger buckets at a slower rate. For example, the U and V integer pipelines in the original Pentium each had just five stages: instruction fetch, decode 1, decode 2, execute and write-back. The Pentium Pro introduced the P6 architecture with a pipeline consisting of 10 stages. The P7 NetBurst micro-architecture in the Pentium IV increased the number of stages to 20. Intel terms this its Hyper Pipelined Technology.

Enhanced Branch Prediction


The key to pipeline efficiency and operation is effective branch prediction, hence the much improved branch prediction logic in the Pentium IV's Advanced Dynamic Execution Engine (ADE). The Pentium IV's branch prediction logic delivers a 33% improvement in prediction efficiency over that of the Pentium III. The Pentium IV also contains a dedicated 4 KB Branch Target Buffer. When a processor's branch prediction logic predicts the flow of operation correctly, no changes need to be made to the code in the pipeline. However, when an incorrect prediction is made, the contents of the pipeline must be replaced and a new instruction cycle must begin at the start of the pipeline. 6th generation processors, with their 10 stage pipeline, suffer a lower overhead penalty for a mispredicted branch than the Pentium IV with its 20 stage pipeline. The longer the pipeline, the further back in a process's instruction execution path the processor needs to go in order to correct mispredicted branches. One critical element in overcoming problems with mispredicted branches is the Execution Trace Cache.
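The trade-off between pipeline depth and misprediction cost can be made concrete with a toy model. Everything in this sketch is an assumption made for illustration: the function name, the base rate of one cycle per instruction, the share of branches (20%), the misprediction rate (10%), and the simplification that a flush costs as many cycles as the pipeline has stages. None of these figures are measurements of real processors.

```python
# Toy model of branch misprediction overhead (all figures are illustrative assumptions).

def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty_cycles: int) -> float:
    """Average cycles per instruction once pipeline flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


BRANCH_FRACTION = 0.20   # assumed share of instructions that are branches
MISPREDICT_RATE = 0.10   # assumed share of branches that are mispredicted

# Assumption: the flush penalty equals the pipeline depth in stages.
ten_stage = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 10)      # about 1.2
twenty_stage = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 20)   # about 1.4

# The same 20-stage pipeline with one third fewer mispredictions.
better_predictor = effective_cpi(1.0, BRANCH_FRACTION,
                                 MISPREDICT_RATE * (1 - 0.33), 20)        # about 1.27
```

Under these assumptions, doubling the pipeline depth doubles the cycles lost to mispredictions, and a better predictor recovers only part of that loss, which is why the deeper pipeline also needs a higher clock rate and the Execution Trace Cache to come out ahead.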

Execution Trace Cache


The Pentium IV's sophisticated Execution Trace Cache is simply a 12 KB L1 instruction cache that sits between the decoders and the Rapid Execution Engine. The cache stores the microcode (micro-ops) of decoded complex instructions, especially those in a program loop, and minimises the wait time of the execution engine.

Rapid Execution Engine


The major advance in the Pentium IV's execution unit is that its two Arithmetic Logic Units operate at twice the CPU clock rate. This means that the 1.5 GHz Pentium IV had ALUs running at 3 GHz: the ALU is effectively "double pumped". The Floating Point Unit has no such feature. Why the difference? Intel had to double pump the ALUs in order to deliver integer performance that was at least equal to that of a lower clocked Pentium III. Why? Because of the length of the Pentium IV's 20 stage pipeline, and to ensure that any hit caused by poor branch prediction could be made up for by faster execution of microcode. The benefit here is that as the Pentium IV's clock speed increases, the integer performance of the processor will improve by a factor of two.

Enhanced Floating Point Processor


The Pentium IV has 128-bit floating point registers (up from the 80-bit registers in the 6th generation Pentiums) and a dedicated register for data movement. This enhances floating point operations, which are not prone to the same type of branch prediction inefficiencies as integer-based instructions.

Streaming SIMD Extensions 2


SSE2 is the follow-up to Intel's Streaming SIMD (Single Instruction Multiple Data) Extensions (SSE). SIMD is a technology that allows a single instruction to be applied to multiple datasets at the same time. This is especially useful when processing 3D graphics. SIMD-FP (Floating Point) extensions help speed up graphics processing by taking the multiplication, addition and reciprocal functions and applying them to multiple datasets simultaneously. Recall that SIMD first appeared with the Pentium MMX, which incorporated 57 MMX instructions. These are essentially SIMD-Int (integer) instructions. Intel first introduced SIMD-FP extensions in the Pentium III with 72 Streaming SIMD Extensions (SSE). Intel introduced 144 new instructions in the Pentium IV that enable it to handle two 64-bit SIMD-Int operations and two double precision 64-bit SIMD-FP operations. This is in contrast to the two 32-bit operations the Pentium MMX and III (under SSE) handle. The major benefit of SSE2 is greater performance, particularly with SIMD-FP instructions, as it increases the processor's ability to handle greater precision floating point calculations. As with MMX and SSE, these instructions require software support.
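The idea of one instruction operating on several data items held in a single wide register can be sketched as follows. This is a conceptual model only: the helper names are hypothetical, and Python processes the two lanes one after the other, whereas an SSE2 packed double precision add performs both additions in hardware with a single instruction.

```python
# Conceptual model of a 128-bit register holding two 64-bit doubles (helper names are hypothetical).
import struct


def pack_doubles(a: float, b: float) -> bytes:
    """Pack two double precision values into one 16-byte (128-bit) 'register'."""
    return struct.pack("<2d", a, b)


def packed_add(x: bytes, y: bytes) -> bytes:
    """Add the two registers lane by lane, as a packed SIMD-FP add would."""
    x_low, x_high = struct.unpack("<2d", x)
    y_low, y_high = struct.unpack("<2d", y)
    return struct.pack("<2d", x_low + y_low, x_high + y_high)


register_x = pack_doubles(1.5, 2.0)
register_y = pack_doubles(0.25, 4.0)
result = struct.unpack("<2d", packed_add(register_x, register_y))  # (1.75, 6.0)
```

Each register is 16 bytes long, so the pair of doubles occupies exactly the 128 bits described in the text.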

Celeron IV
The Celeron IV first appeared in 2002. It was based on the Pentium IV and could be accommodated on Socket 478 motherboards. The first model was based on the Willamette, with the L2 cache halved to 128 KB, and ran at 1.7 GHz. Later models ran at 1.8, 1.9 and 2 GHz. The next member was based on the Northwood and had a 256 KB L2 cache. Based on the i845 chipset, the new Celerons are now good value entry level processors.

Additional Resources
The following diagrams of the Pentium III, IV and AMD Athlon CPUs are provided to highlight the architectural features of these microprocessors and enhance the foregoing text. The following figures have been obtained from Tom's Hardware Guide (NOT this Tom); further insights into the Intel architectures may be found at: (http://www6.tomshardware.com/cpu/20001120/index.html).