
High Performance Scalable On-Chip Interconnects: Unleashing System Performance

Mohammad Asif Khan, Javed Ahmed, Ahmed Waqas
Department of Computer Science, Sukkur Institute of Business Administration
Airport Road, Sukkur, Sindh, Pakistan
asif.khan@iba-suk.edu.pk, javed@iba-suk.edu.pk, ahmad.waqas@iba-suk.edu.pk

Abstract— There is always demand for high-performance single-chip microprocessors. Microprocessor manufacturers have pursued this goal through techniques such as increasing clock speed, cache size, core count, and hyper-threading. In recent years, however, designers have realized that these techniques matter little unless the microprocessor also has a fast interconnect that provides quick access to memory, to its cores, and to other I/O devices. The FSB (Front Side Bus) was used in microprocessors as a shared pathway for communication between the microprocessor and other devices. Because of its shared nature, the FSB suffers a great amount of contention for its access, which introduced a performance bottleneck in this architecture. To overcome the bottlenecks of the FSB, AMD and Intel introduced their proprietary standards HyperTransport and QuickPath Interconnect, respectively. Both of these interconnects are designed with low latency, high bandwidth, and scalable point-to-point connectivity in mind, so that they can cope with the performance needs of on-chip microprocessors. These two interconnects offer remarkably greater performance than the traditional FSB and are well shaped for future generations of microprocessors.

Keywords— QuickPath, HyperTransport, Front Side Bus (FSB), On-Chip Interconnect, Scalable Interconnect

I. INTRODUCTION

For the past ten years, microprocessor manufacturers have worked on different techniques to develop high-performance single-chip microprocessors. To achieve this goal, designers have addressed several aspects: increasing the processor's clock speed, increasing cache size, increasing the number of cores on a single chip, and improving the processor bus. Improving the bus is one of the most important of these aspects, because the bus is the only communication link through which the microprocessor communicates with memory, the hard disk, and other I/O devices [4]. A processor that is strong in every other respect but has an inefficient bus will execute its instructions quickly while the bus serves them slowly; the processor then waits for responses and its cycles are wasted. Hence the microprocessor bus, or interconnect, has a great impact on its performance [1]. Initially, manufacturers used the FSB (Front Side Bus) as the microprocessor's bus; the Pentium Pro was the first

microprocessor from Intel Corporation to use it [14]. The FSB provides a shared pathway between the processor and other devices. Intel continued to use the FSB in its processors up to the first generation of the Core 2 series. Intel was able to keep the FSB for such a long time because it brought good innovations into the FSB architecture, reducing latency and increasing bandwidth: it raised the FSB transfer rate to 2X and then to 4X per clock cycle (the so-called quad-pumped FSB), and it evolved the FSB architecture to DIB (Dual Independent Bus) and then to DHSI (Dedicated High Speed Interconnect) [1]. AMD also used an FSB with its microprocessors and employed EV6 technology to boost the bus to 2X data transfers per clock cycle [7]. AMD then realized that the FSB design was no longer scalable and that it was worthwhile to design a new interconnect for its next-generation microprocessors, and it introduced HyperTransport technology to replace the FSB in 2001 [13]. After replacing the FSB with HyperTransport, AMD continued to innovate in its architecture through successive versions of the interconnect: 1.x, 2.0, and 3.0. Intel likewise brought good innovations to the FSB architecture, but it also recognized the FSB's bottleneck, since its performance was not scaling well enough to support Intel's next-generation microarchitectures. In 2008 Intel introduced the QuickPath Interconnect in its Nehalem and Tukwila microarchitectures [3]. QuickPath is a low-latency, high-bandwidth, scalable point-to-point interconnect, well equipped to support Intel's next generations of microprocessors [3]. In this paper we describe the importance of the interconnect, or bus, for processor performance and provide a survey of the approaches used by different processor manufacturers. Section II describes the FSB and its variations; Section III gives a brief description of Intel's QuickPath architecture; Section IV discusses AMD's HyperTransport interconnect. At the end, Intel QuickPath and AMD HyperTransport are compared to conclude which one best meets the future demands for a low-latency, high-bandwidth, scalable interconnect for coming generations of microprocessors.


II. FSB (FRONT SIDE BUS)

The FSB (Front Side Bus) is a shared pathway through which the microprocessor communicates with memory, the hard disk, and other devices. It is bi-directional in nature, as data flows in both directions on it, and all data going into and out of the microprocessor passes through it. Hence, to obtain good performance from the microprocessor, the FSB must be fast enough to move data and avoid the delays the microprocessor would otherwise face while waiting for responses from memory and other I/O devices. The performance of the FSB depends on its frequency, its width, and its number of data transfers per clock cycle; its bandwidth is the product of these three factors:

Bandwidth of FSB = (Width of FSB) × (Clock speed of FSB) × (Data transfers per clock cycle) [7]

When selecting a microprocessor, people usually check its clock speed, its cache size, and the speed of its FSB. The FSB speed is usually quoted in MHz, and most people take this figure to be the FSB's clock speed. In fact it is not the clock speed but the transfer rate of the FSB: an FSB advertised as 200 MHz with 2 transfers per clock cycle actually runs at a 100 MHz clock and performs 200 MT/s (mega-transfers per second).
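To make the relation concrete, the following minimal Python sketch evaluates this peak-bandwidth formula; the function name and the sample buses are our own illustrative choices rather than part of any vendor specification.

```python
def peak_bandwidth_gbps(width_bits, clock_mhz, transfers_per_cycle):
    """Peak bus bandwidth in GB/s = width x clock speed x transfers per cycle."""
    bytes_per_transfer = width_bits / 8                      # bus width in bytes
    transfers_per_sec = clock_mhz * 1e6 * transfers_per_cycle
    return bytes_per_transfer * transfers_per_sec / 1e9

# A 64-bit FSB clocked at 100 MHz with 2 transfers per cycle (marketed as a "200 MHz" FSB):
print(peak_bandwidth_gbps(64, 100, 2))   # 1.6 GB/s
# A 64-bit quad-pumped FSB clocked at 400 MHz:
print(peak_bandwidth_gbps(64, 400, 4))   # 12.8 GB/s
```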

TABLE I
FRONT SIDE BUS WIDTH

Processor     FSB Width (bits)    Processor        FSB Width (bits)
8088          8                   AMD K6           64
8086          16                  AMD K6           64
80286         16                  AMD Duron        64
80386         32                  AMD Athlon       64
80486         32                  AMD Athlon XP    64
Pentium 1     64                  AMD Athlon 64    64
Pentium 2     64                  AMD Opteron      64
Pentium 3     64                  Itanium          64
Pentium 4     64                  Core 2 Duo       64

A. FSB (Front Side Bus) Architecture

The location of the FSB with respect to the microprocessor and other devices is shown in Fig. 1; it illustrates how important the FSB is for the microprocessor's performance, since the FSB is the main point of contention for memory and other devices that need to reach the microprocessor. It also shows that the presence of the north and south bridges adds latency and slows communication down. With this FSB architecture there is considerable contention on the bus. To reach the required performance, Intel made several changes to the simple FSB architecture: it increased the FSB frequency, the number of transfers per cycle, and the FSB width. In earlier microprocessor architectures, AMD and Intel both widened the FSB to move more data in parallel. Table I shows the increase in FSB width across processor generations.

Table I shows that the width increased over time but never beyond 64 bits [7]. None of the newer architectures use an FSB wider than 64 bits, because width itself becomes a bottleneck: widening the FSB increases the bus's power dissipation, and the resulting heat degrades performance. Designers therefore concentrated on the remaining two factors, transfers per cycle and clock speed. The next step Intel and AMD took to increase FSB performance was to raise the number of transfers per clock cycle, which depends on the signaling technology that drives the bus. Intel originally used GTL+ (Gunning Transceiver Logic) with one transfer per clock cycle; later, in the Pentium 4, Intel used AGTL+ (Assisted Gunning Transceiver Logic) to reach 4 transfers per cycle [4]. Similarly, AMD used EV6 signaling to reach 2 transfers per clock cycle.

Fig. 1 Front Side Bus architecture

Fig. 2 FSB architecture with a four-core processor

Another important drawback of the FSB was that its performance did not scale well as the number of cores on a single chip increased. Fig. 2 shows how four cores contend for the shared FSB, which leads to performance degradation.


Intel's answer to the shared FSB was DIB (Dual Independent Bus), introduced in 2004, which effectively doubled the FSB bandwidth by providing separate pathways to the processors. However, snoop traffic was still broadcast on both buses simultaneously. To solve this problem, Intel added a snoop filter in 2005 to avoid broadcasting the snoop traffic [1]. The DIB architecture is shown in Fig. 3.

Fig. 3 Intel's DIB (Dual Independent Bus)

Fig. 4 Intel QuickPath architecture

Intel further enhanced the DIB architecture by providing a separate path to each core in a single chip, which again increased the bandwidth. This architecture was called DHSI (Dedicated High Speed Interconnect) and was introduced in Intel's Hi-k and Xeon microprocessors. After DHSI, Intel realized that the FSB architecture could not scale any further for its next generations of microprocessors, and it introduced the QuickPath Interconnect in the microarchitecture code-named Nehalem. AMD, for its part, had already replaced the FSB with the HyperTransport interconnect in 2001 and has since released several versions of HyperTransport.

III. INTEL QUICKPATH INTERCONNECT

Intel introduced the QuickPath Interconnect to overcome the bottlenecks of the FSB; in 2008 the Nehalem and Tukwila microarchitectures, which are based on QuickPath, were introduced. Initially its code name was CSI (Common System Interconnect), but it was launched under the name QuickPath in Nehalem and Tukwila. QuickPath was designed with all the bottlenecks of the FSB architecture, and the future demand for a fast microprocessor interconnect, in view.

A. QuickPath Interconnect Architecture

The Intel QuickPath Interconnect provides a low-latency, high-bandwidth, scalable, distributed-memory, point-to-point interconnect. It is specially optimized for multicore microprocessors, taking advantage of their fast processing by providing a separate memory controller at each processor core and connecting the cores point-to-point. With QuickPath, the processor cores communicate with I/O devices via a separate, dedicated I/O hub. Separate unidirectional serial links are provided for reading and writing, which allows simultaneous reads and writes by the processor, something that was not possible with the FSB [2].

One full link is provided in each direction, consisting of 20 one-bit lanes, of which 16 bits are used for data transfer and 4 bits carry CRC, giving the receiver the ability to check the correctness of the data. Data on these 20 single-bit lanes is transferred serially, and this serial signaling lets QuickPath reach much higher clock speeds than the FSB. QuickPath currently reaches clock speeds of up to 3.2 GHz with two transfers per clock cycle. With a 16-bit-wide payload it therefore delivers 12.8 GB/s in a single direction (3.2 GHz × 2 transfers × 16 bits / 8) and 25.6 GB/s overall across both directions. By comparison, the maximum FSB clock speed is 400 MHz, which with 4 transfers per cycle and an 8-byte width gives 12.8 GB/s, and on that shared bandwidth only a read or only a write can be performed at a time [2][1]. With Intel QuickPath, fewer bits are transferred in parallel than on the FSB (the FSB is 64 bits wide whereas a QuickPath link is 20 bits wide), yet QuickPath still moves data faster, because latency has been reduced remarkably by the point-to-point topology and by the separate memory controller on each microprocessor core. Furthermore, each memory controller has two paths to the memory attached to it, doubling memory bandwidth. The memories attached to the processors are accessed non-uniformly (a NUMA architecture). Another advantage of the separate memory controllers is that memory bandwidth scales with the number of cores, since each core has its own memory controller and memory. With these features, the Intel QuickPath Interconnect provides strong support for leading memory technologies [3][2].
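Written out, the peak figures quoted above follow directly from the clock rate, the transfers per cycle, and the payload width. This is a back-of-the-envelope calculation based on the numbers in this section, with the CRC lanes excluded from the payload:

```latex
\begin{align*}
B_{\mathrm{QPI,\ one\ direction}} &= 3.2\,\mathrm{GHz} \times 2\,\tfrac{\mathrm{transfers}}{\mathrm{cycle}} \times \tfrac{16\,\mathrm{bits}}{8\,\mathrm{bits/byte}} = 12.8\,\mathrm{GB/s},\\
B_{\mathrm{QPI,\ both\ directions}} &= 2 \times 12.8\,\mathrm{GB/s} = 25.6\,\mathrm{GB/s},\\
B_{\mathrm{FSB,\ quad\text{-}pumped}} &= 400\,\mathrm{MHz} \times 4\,\tfrac{\mathrm{transfers}}{\mathrm{cycle}} \times \tfrac{64\,\mathrm{bits}}{8\,\mathrm{bits/byte}} = 12.8\,\mathrm{GB/s}\ \text{(shared, one direction at a time)}.
\end{align*}
```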

B. QuickPath Layered Approach

The Intel QuickPath Interconnect distinguishes itself by its remarkable reduction in latency and increase in bandwidth, but another important feature Intel introduced with QuickPath is the grouping of all


functionalities into layers. The layered design of QuickPath provides greater flexibility for design and troubleshooting; before QuickPath there was no concept of layering in the processor bus. Intel divided the functionality of QuickPath into four layers: the physical layer, link layer, routing layer, and protocol layer.

1) Physical Layer: The physical layer of QuickPath deals with the wires that carry the electrical signals and with the logic circuitry required for transmitting and receiving data. The unit of transfer at the physical layer is the phit, which consists of 20 bits; a single 20-bit phit is transferred on each clock edge. The physical layer also defines the characteristics of the individual signals [6]. The QuickPath Interconnect has a serial link in each direction for data transfer, and a full link in one direction consists of 20 one-bit lanes that use differential signaling [4]. It is not necessary to use the full 20-bit link: it can operate at half or quarter width according to processing needs, for error recovery, or to reduce power consumption at a processing node. One forward clock is provided with the full link in each direction.
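The link layer described below groups four of these 20-bit phits into an 80-bit flit. As an informal illustration of that framing (our own sketch, not Intel's implementation; lane-level details such as CRC placement are omitted, and the phit ordering shown is an arbitrary choice), the split and reassembly can be pictured as follows:

```python
PHIT_BITS = 20   # physical-layer unit: one transfer over the 20 lanes
FLIT_BITS = 80   # link-layer unit: four phits

def flit_to_phits(flit):
    """Split an 80-bit flit (int) into four 20-bit phits, least-significant phit first."""
    assert 0 <= flit < (1 << FLIT_BITS)
    return [(flit >> (i * PHIT_BITS)) & ((1 << PHIT_BITS) - 1) for i in range(FLIT_BITS // PHIT_BITS)]

def phits_to_flit(phits):
    """Reassemble four 20-bit phits back into one 80-bit flit."""
    return sum(p << (i * PHIT_BITS) for i, p in enumerate(phits))

flit = 0x1234_5678_9ABC_DEF0_1234          # an arbitrary 80-bit pattern
assert phits_to_flit(flit_to_phits(flit)) == flit
```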

Fig. 5 Intel QuickPath Layers

Fig. 6 QuickPath 20-bit serial lanes

An Intel QuickPath Interconnect link requires 84 signals in total, far fewer than the 150 signals of the FSB. QuickPath also includes IBIST (Interconnect Built-In Self Test) for testing the interconnect path at full operational speed, so no external tool is needed to test QuickPath links. Waveform equalization is used to keep the transmission characteristics well optimized [1].

2) Link Layer: The link layer is of great importance in the Intel QuickPath architecture, as it contains the features that give the microprocessor reliability, availability, and serviceability. It is an abstraction of the physical layer for the upper layers [4]. Its main responsibilities are maintaining the message classes, defining the virtual networks, providing a credit/debit scheme for flow control, and supporting reliable communication. The transfer unit of the link layer is the flit, which consists of 80 bits, so a single flit contains four phits (the physical-layer unit). The QuickPath architecture currently supports six message classes: Snoop (SNP), Home (HOM), Non-data Response (NDR), Data Response (DR), Non-coherent Standard (NCS), and Non-coherent Bypass (NCB). These message classes support unordered transfer of data and are carried over virtual networks with replicated channels; with the help of virtual channels, reliable routing and complex topologies can be supported. The link layer currently provides three virtual networks, VN0, VN1, and VNA (adaptive virtual network), and each of the six message classes is supported on all three. Multicore processors with two cores mostly use VN0 and VNA, while processors with more than two cores use all three [1]. The link layer provides the credit/debit scheme for controlling the flow of data between two entities: at initialization the receiver sets a credit count (the number of packets or flits it is prepared to receive) and sends it to the sender; the sender then transmits packets or flits, decrementing its credit counter with each send, and stops sending when its credits are exhausted (a short illustrative sketch of this scheme is given below). A further important responsibility of the link layer is reliable communication through CRC (Cyclic Redundancy Check): of the 80 bits in a flit, 8 bits are used for CRC and are transferred along with the data, and the receiver checks them and requests retransmission if an error is found [1][6].

3) Routing Layer: The routing layer was added to the QuickPath layered architecture to choose the shortest and most reliable link toward a packet's destination. At this layer, routing tables are managed to provide the best RAS (reliability, availability, serviceability) and the shortest path for a packet to travel. The routing tables are initialized by firmware at boot time and remain unchanged in systems with few cores, but they are more dynamic in systems with many cores [6]. Another main idea behind the routing layer is to partition the system into smaller subsystems, allowing separate logical operations for those subsystems [4][6].

4) Protocol Layer: The protocol layer is the uppermost layer in the Intel QuickPath interconnect architecture. It deals with the coherence of data, maintaining coherence between the cache controllers and the distributed memory controllers; the system's non-coherent traffic is also handled at this layer.
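Returning to the link layer's credit/debit flow control mentioned above, the following minimal sketch models the idea. The class and method names are our own, and details such as CRC, retransmission, and multiple virtual channels are deliberately omitted; this is an illustration of the principle, not Intel's implementation.

```python
class CreditedLink:
    """Toy model of credit/debit flow control between a sender and a receiver."""

    def __init__(self, receiver_buffer_slots):
        # The receiver advertises one credit per free buffer slot at initialization.
        self.credits = receiver_buffer_slots
        self.receiver_buffer = []

    def send(self, flit):
        """Sender side: transmit only if a credit is available, otherwise stall."""
        if self.credits == 0:
            return False                      # no credits left: sender must wait
        self.credits -= 1                     # debit one credit per flit sent
        self.receiver_buffer.append(flit)     # flit arrives at the receiver
        return True

    def receiver_drain(self):
        """Receiver side: consume a buffered flit and return a credit to the sender."""
        if self.receiver_buffer:
            self.receiver_buffer.pop(0)
            self.credits += 1                 # credit returned, sender may transmit again

link = CreditedLink(receiver_buffer_slots=2)
assert link.send("flit0") and link.send("flit1")
assert not link.send("flit2")    # stalled: credits exhausted
link.receiver_drain()
assert link.send("flit2")        # resumes after a credit is returned
```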


At the protocol layer, QuickPath adds a new F (Forward) state to the MESI protocol, which enables fast transfer of data held in the shared state over the QuickPath Interconnect [5]. For cache-coherence management, QuickPath uses source-based snooping in small systems and home (directory) based snooping in larger systems; each technique gives low latency in its respective setting. With source snooping, a node that needs data sends its request simultaneously to memory and to all attached nodes; if any cache holds the data, it sends its copy directly to the requesting node and updates memory, so the whole operation completes in just two messages rather than the four it would otherwise take. In large systems, however, source snooping can increase latency because some nodes may be situated far away. Home (memory-controller directory) based snooping is the better option there: the requester sends its request to the directory, which keeps a record of the data held by the different caches, so no time is wasted sending requests to distant nodes that hold no copy; the directory forwards the request only to the caches that are valid holders [5][1]. The reason for adding the Forward state is that if a node requests data and two or more caches hold it in the shared state, all of them would send copies, leaving the requester unsure which copy to accept. With the F state, only one node enters the Forward state and sends its copy to the requesting node [6].

C. Compatibility With Desktops And Servers

The Intel QuickPath Interconnect provides RAS (reliability, availability, and serviceability) for server-side computing. It provides reliability through the CRC carried along with the data, availability through redundant memory controllers, and serviceability through self-healing: if a data lane starts malfunctioning, QuickPath stops sending data on it and continues operating in half-width or quarter-width mode. In addition, home-based snooping is a good option for reducing latency on servers, while source-based snooping reduces latency on desktop systems [5][2].

IV. AMD'S HYPERTRANSPORT INTERCONNECT

AMD's HyperTransport interconnect is a low-latency, high-bandwidth, scalable, packetized point-to-point interconnect. HyperTransport was designed to meet high-bandwidth, low-latency demands that the conventional FSB could not fulfill, so AMD replaced the entire FSB architecture with it. HyperTransport 1.0 was introduced in 2001 to replace the FSB; since then AMD has made substantial innovations in the HyperTransport architecture, and it is now available in three variations: chip-to-chip, board-to-board, and system-to-system interconnect [11]. To date, HyperTransport has reached a clock speed of 3.2 GHz and a single-link bandwidth of 51.2 GB/s.
TABLE II
HYPERTRANSPORT VERSIONS [12]

HyperTransport Version   Clock Speed   Year   Interconnect Type
HT 1.0                   800 MHz       2001   Chip to Chip
HT 2.0                   1.4 GHz       2004   Chip to Chip
HT 3.0                   2.1 GHz       2006   Board to Board
HT 3.1                   3.2 GHz       2008   Board to Board

A. HyperTransport Interconnect Architecture

AMD's HyperTransport is a point-to-point, low-latency, high-bandwidth, scalable interconnect. In the HyperTransport architecture a separate memory controller is provided at every processor core, and the processors are connected to one another point-to-point. A HyperTransport link can be 2, 4, 8, 16, or 32 bits wide; even the maximum 32-bit link is narrower than PCI-X and FSB widths, which reduces power consumption and leads to fewer errors and lower latency. The links between a node and an I/O device need not be symmetrical: the two directions can use different widths from the same 2-, 4-, 8-, 16-, or 32-bit set [13].
Fig. 7 Initial HyperTransport architecture

The HyperTransport link is packetized, so address, command, and data can be sent across the same link at the same time and no dedicated pins are needed [13]; this also helps keep the link narrow. HyperTransport has appeared in four main versions: HT 1.x, HT 2.0, HT 3.0, and HT 3.1. These versions differ in some characteristics, but their general architecture is the same. The features common to all of them are two unidirectional links with low-voltage differential signaling, link widths scalable from 2 bits to 32 bits, support for asymmetric link widths (the two links of a pair may differ in width), and virtual-channel support. The versions differ in clock speed and bandwidth but share the same low-latency architecture. The HT 3.x versions add dynamic clocking for adjusting the link width and the ability to split a link into two half-width links [12].
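To illustrate what "packetized" means in practice, the sketch below serializes a command, an address, and data onto a single byte stream, standing in for a single set of link wires. The field layout is entirely hypothetical and is not the actual HyperTransport packet format.

```python
import struct

# Hypothetical layout for illustration only (not the HyperTransport format):
# 1-byte command, 1-byte unit id, 8-byte address, then the payload bytes.
CMD_WRITE = 0x01

def build_packet(command, unit_id, address, payload):
    """Serialize command, address, and data onto the same byte stream (one shared 'link')."""
    header = struct.pack(">BBQ", command, unit_id, address)
    return header + payload

def parse_packet(packet):
    """Recover the fields from the shared stream; no dedicated wires per field are needed."""
    command, unit_id, address = struct.unpack(">BBQ", packet[:10])
    return command, unit_id, address, packet[10:]

pkt = build_packet(CMD_WRITE, unit_id=3, address=0x1000, payload=b"\xde\xad\xbe\xef")
assert parse_packet(pkt) == (CMD_WRITE, 3, 0x1000, b"\xde\xad\xbe\xef")
```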


TABLE III
HYPERTRANSPORT CLOCK AND BANDWIDTH

HyperTransport Version   Clock Speed   Bandwidth
HT 1.x                   800 MHz       12.8 GB/s
HT 2.0                   1.4 GHz       22.4 GB/s
HT 3.0                   2.6 GHz       41.6 GB/s
HT 3.1                   3.2 GHz       51.2 GB/s
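The bandwidth column in Table III is consistent with aggregating both directions of a full-width 32-bit, double-pumped link; this is our reading of the figures rather than a statement from the HyperTransport specification. For HT 3.1, for example:

```latex
\[
B_{\mathrm{HT\,3.1}} = 3.2\,\mathrm{GHz} \times 2\,\tfrac{\mathrm{transfers}}{\mathrm{cycle}}
\times \tfrac{32\,\mathrm{bits}}{8\,\mathrm{bits/byte}} \times 2\ \mathrm{directions}
= 51.2\,\mathrm{GB/s}.
\]
```

The same relation reproduces the 12.8, 22.4, and 41.6 GB/s entries from the 800 MHz, 1.4 GHz, and 2.6 GHz clocks.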

B. HyperTransport Layered Architecture

AMD has built considerable flexibility into HyperTransport by organizing it into a five-layer architecture, which helps its performance and scalability. The five layers are the physical layer, data link layer, protocol layer, transaction layer, and session layer.

1) Physical Layer: This layer deals with the electrical characteristics and with the clock, data, and control lines of the HyperTransport interconnect. The data path is a point-to-point link with a variable width of 2, 4, 8, 16, or 32 bits in each direction. Because the traffic is packetized, commands, data, and addresses travel across the same set of wires, eliminating the need for separate wires and pins. AMD uses low-voltage differential signaling to achieve high data rates at low power consumption [13]. HyperTransport links are double pumped, transferring data on both the rising and the falling edge of the clock signal.

2) Data Link Layer: At the data link layer, HyperTransport performs initialization to adjust the link width for the devices being connected, so devices with asymmetric data links can still communicate over the interconnect. A sequence ID carried in each packet supports virtual channels by separating packets belonging to different channels. HyperTransport implements a periodic CRC to detect errors and retransmit packets accordingly [21].

3) Protocol Layer: The protocol layer deals with the virtual channels and with the commands and ordering rules that govern flow control. HyperTransport supports six virtual channels; the most commonly used are the posted, non-posted, and response channels, with commands such as posted write, broadcast, fence, non-posted write, flush, and read response [9]. HyperTransport can handle multiple streams from different devices, identified by the UnitID field in the packet.

4) Transaction Layer: Read and write operations are performed at the transaction layer using the elements provided by the protocol layer; read, write, and read-response commands are issued over the virtual channels. The transaction layer keeps latency low by keeping overhead small: an 8-byte control packet for a read request, and an 8-byte control packet followed by the data for a write request.

5) Session Layer: The session layer is where HyperTransport negotiates power management, interrupts, frequency optimization, and link-width optimization. For example, if two devices with asymmetric link widths want to communicate over HyperTransport, it is the session layer that negotiates at boot time to optimize the link width for both devices.

C. HyperTransport Support For Industry

The domain of HyperTransport is not limited to the microprocessor, since the interconnect is available in chip-to-chip, board-to-board, and system-to-system variations. Several companies have licensed HyperTransport from AMD for use as the interconnect in their own devices; well-known partners include Sun Microsystems, Cisco Systems, HP, Acer, Texas Instruments, and others.

V. RELATED WORK

A substantial amount of work has been done on on-chip interconnects because of their importance for on-chip as well as off-chip performance, and that importance grows further for multi-core microprocessors. Kumar, Zyuban, and Tullsen [14] analyzed multi-core processor interconnects and tested the performance of different interconnects for on-chip multi-core microprocessors. Das and others [17] worked on the design and evaluation of hierarchical on-chip interconnects. Gómez-Luna and colleagues built a cache-coherence simulator for studying coherence in the MESI protocol [18]. AMD continues to carry out research on on-chip interconnects through the HyperTransport Consortium, and Intel has continued research on QuickPath since its introduction. Abi-Chahla [15] has studied different aspects of Intel QuickPath in terms of its network, cache coherence, memory subsystem, and support for multi-core processors.

VI. CONCLUSIONS

In this paper we have made a comparative study of the two latest next-generation interconnects developed by the leading microprocessor manufacturers. The study first discusses the bottlenecks of the traditional FSB architecture thoroughly, and then discusses the layered approach of both interconnects, which provides low-latency, high-bandwidth, distributed-memory, and self-healing features. Each interconnect has some edge over the other: HyperTransport has the advantage of having scaled from on-chip use to chip-to-chip, board-to-board, and system-to-system use. In this article we have considered only on-chip interconnects; we plan to provide a thorough survey of chip-to-chip, board-to-board, and system-to-system interconnects in the future. As far as our on-chip survey is concerned, we conclude that Intel QuickPath has the edge over HyperTransport. QuickPath provides all possible means to achieve high performance on a single-chip microprocessor.


It offers a NUMA memory architecture, adds the F state to speed up cache coherence, and uses a credit/debit scheme for flow control; in all these respects it is more dynamic than HyperTransport as a single-chip interconnect.
REFERENCES

[1] Intel Corporation, "An Introduction to the Intel QuickPath Interconnect," January 2009. http://www.intel.com/technology/quickpath/introduction.pdf
[2] Intel Corporation, white paper, "Intel QuickPath Architecture: A new system architecture for unleashing the performance of future generations of Intel multi-core microprocessors." http://www.intel.com/technology/quickpath/whitepaper.pdf
[3] Intel Corporation, white paper, "Intel Xeon Processor 3500 and 5500 Series: Intel Microarchitecture." http://www.intel.com/technology/architecture-silicon/next-gen/319724.pdf
[4] Intel Corporation, "The Essentials of the Intel QuickPath Interconnect Electrical Architecture." http://www.intel.com/intelpress/files/Intel%28r%29_QuickPath_Interconnect_Electrical_Architecture.pdf
[5] Intel Corporation, "First Look at the Intel QuickPath Interconnect." http://www.intel.com/intelpress/articles/A_First_Look_at_the_Intel%28r%29_QuickPath_Interconnect.pdf
[6] Intel Corporation, "The Architecture of the Intel QuickPath Interconnect." http://www.intel.com/intelpress/articles/The_Architecture_of_the_Intel%28r%29_QuickPath_Interconnect.pdf
[7] Wikipedia, "Front-side bus." http://en.wikipedia.org/wiki/Front-side_bus
[8] HyperTransport Consortium, "HyperTransport I/O Technology Overview: An Optimized, Low-Latency Board-Level Architecture," June 2004. http://www.hypertransport.org/docs/wp/HT_Overview.pdf
[9] AMD, Sunnyvale, white paper, "HyperTransport Technology I/O Link: A High-Bandwidth I/O Architecture," July 2001. http://www.hypertransport.org/docs/wp/25012A_HT_Whitepaper_v1.1.pdf
[10] HyperTransport Consortium, white paper HTC_WP04, "HyperTransport I/O Technology Comparison with Traditional and Emerging I/O Technologies," June 2004. http://www.hypertransport.org/docs/wp/HT_Comparison.pdf
[11] HyperTransport Consortium, "HyperTransport Link Specifications." http://www.hypertransport.org/default.cfm?page=HyperTransportSpecifications
[12] HyperTransport Consortium, "HyperTransport Overview." http://www.hypertransport.org/default.cfm?page=Technology
[13] S. Cleveland, S. Swanstrom, and C. Neuts, "HyperTransport Technology: Simplifying System Design," AMD, July 2002. http://www.hypertransport.org/docs/wp/HT_System_Design.pdf
[14] R. Kumar, V. Zyuban, and D. M. Tullsen, "Interconnections in multi-core architectures," SIGARCH Comput. Archit. News, vol. 33, no. 2, pp. 408-419, 2005.
[15] F. Abi-Chahla, "QuickPath Interconnect," October 2008. http://www.tomshardware.com/reviews/Intel-i7-nehalem-cpu,2041-8.html
[16] D. O'Flaherty and M. Goddard, "AMD Opteron Processor Benchmarking for Clustered Systems," AMD white paper. http://www.opteronics.com/pdf/39497A_HPC_WhitePaper_2xCli.pdf
[17] R. Das, S. Eachempati, A. K. Mishra, N. Vijaykrishnan, and C. R. Das, "Design and Evaluation of a Hierarchical On-Chip Interconnect for Next-Generation CMPs." http://www.cse.psu.edu/~rdas/papers/topology-hpca09.pdf
[18] J. Gómez-Luna, E. Herruzo, and J. I. Benavides, "MESI Cache Coherence Simulator for Teaching Purposes." http://www.clei.cl/cleiej/papers/v12i1p5.pdf

