
Two Xeon CPUs Are Better Than One Intel P4 Extreme

Platform
By # , MAY 14, 2004 11:00 AM

1. A Dual Vs. Single Processor Price Comparison


Designed primarily for server and workstation applications, dual Xeon systems have largely led a niche
existence. Their high price also made them unattractive for standard users: dual Xeon systems
required expensive memory modules, special power supplies and big, ugly cases. Now, however, the situation
has changed considerably.

When we compare, for example, the price of a Pentium 4 Extreme 3.2 GHz against two Xeons at 2.8 GHz,
we see that the latter option turns out to be considerably less expensive. A Pentium 4 Extreme costs $950, while two
2.8 GHz Xeons can be had for $760. Applications that explicitly support dual-processor environments
usually run much faster with two CPUs than with one.

Also, a lot has happened in the area of memory technology. Thanks to the introduction of the AMD Athlon FX,
Registered DDR memory has become noticeably cheaper; even many no-name manufacturers have switched over
to it. Two 512 MB modules, for example, can already be had for $250. In addition, there are
now motherboards for the Xeon Socket 604 that can operate with unbuffered memory - provided they
are based on the E7505 chipset from Intel. Until now, this market segment was dominated by the space-hogging
WTX boards, but many manufacturers now also offer such boards in the usual ATX format, and a
Dual Socket 604 board fits without any problems into a conventional desktop tower. The prices for such
motherboards start at around $260. Given the price situation, the dual-capable E7505/Placer chipset is an
obvious choice, especially for the Xeon.
Cinema 4D with scene renderings

Even taking HyperThreading into consideration, there are big everyday advantages for
certain users who have a PC equipped with dual processors. Software for graphics rendering,
video and audio encoding, and the simultaneous operation of two or more calculation-intensive applications all profit
from the increase in performance. In the area of graphics rendering, there is dual-capable
software such as 3D Studio MAX, Cinema 4D and Lightwave; in video encoding, there is, for example,
MainConcept Encoder, Pinnacle Studio 9 or Flask Mpeg.

In addition to multiprocessor software usage, the user's work environment is also slowly changing. Because
graphics cards often have two outputs, and monitors are relatively inexpensive, many users already use two
displays. Ambitious home users can tell you a thing or two about that: whoever wants to encode a video and
start a game at the same time will immediately run into the limits of a single-processor system. An
intelligently configured dual platform reacts differently.

Here, we analyze Intel's dual-processor capable E7505/Placer chipset and offer tips for memory usage. In a
subsequent article, using a self-programmed tool, we will show that performance gains can be achieved
with certain applications when particular threads are not scheduled by the operating system but are
manually assigned to a CPU. In connection with that, we have also completed a comparison test of E7505
motherboards, which will be posted soon on the website.

2. E7505/Placer Chipset Technology In Detail


The Intel E7505 chipset, code named "Placer," is manufactured in a 180-nanometer process and is designed for two
processors. The chipset uses the same FC-BGA package as the 875/Canterwood and therefore has the
same number of solder balls: 1,005.

At 143 mm2, the silicon die is larger than that of the 875, which requires only 100 mm2. The
HUB 2.0 interface and the memory controller account for the larger surface area. This also raises the price
for the motherboard manufacturers: the E7505 costs $100 per unit in quantities of 1,000 -
twice as much as the 875.

E7505 chipset as a block diagram


The E7505 Northbridge from Intel

The E7505 Northbridge (a.k.a. Memory Controller Hub, abbreviated as MCH) is typically bundled with the ICH4
and P64H2 Southbridges. The ICH4 is connected via the HUB 1.5 interface, which is clocked at 66 MHz. This
interface can transfer data to the Northbridge at a maximum of 266 MB per second over an 8 bit wide bus.

3. E7505/Placer Chipset Technology In Detail, Continued


The P64H2 bridge, on the other hand, operates at 133 MHz according to the HUB 2.0 protocol. This speed
allows data transfer rates of up to 1 GB per second over a 16 bit wide bus. Furthermore, the E7505
has a dual memory interface as well as an AGP 8x interface.
The ICH4 Southbridge from Intel

The ICH4 (82801DB) Southbridge, based on a 250-nm process, offers connectivity for six USB 2.0 ports, four
ATA100 drives, a 100 MBit LAN chip, an AC97 sound codec and support for a maximum of six PCI master
devices, each with 133 MB per second bandwidth.
Intel's P64H2 Southbridge

Things work differently with the P64H2 (82870P2) bridge. It was designed for the fast PCI 64 and PCI-X
interfaces. The PCI 64 interface corresponds to Version 2.3. Both operate in 64 bit mode. All motherboards
with an E7505 chipset in the WTX format can accommodate a maximum of three PCI 64 cards and one
PCI-X card. PCI 64 operates at either 33 MHz or 66 MHz, resulting in transfer rates of between 266 MB/sec
and 533 MB/sec (maximum). In comparison, PCI-X operates at 66 MHz, 100 MHz and 133 MHz, reaching data
transfer rates of between 533 MB/sec and 1066 MB/sec.
Block diagram of the E7505 chipset with P64H2 Southbridge

4. Data Transfer Rates Depending On PCI Standard


Standard              Bit      Clock     Transfer rate (bi-directional)
PCI 2.3               32 Bit   33 MHz    133 MB/sec
PCI 2.3               32 Bit   66 MHz    266 MB/sec
PCI 64                64 Bit   33 MHz    266 MB/sec
PCI 64                64 Bit   66 MHz    533 MB/sec
PCI-X 1.0             64 Bit   66 MHz    533 MB/sec
PCI-X 1.0             64 Bit   100 MHz   800 MB/sec
PCI-X 1.0             64 Bit   133 MHz   1066 MB/sec
PCI-X 2.0 (DDR)       64 Bit   133 MHz   2132 MB/sec
PCI-X 2.0 (QDR)       64 Bit   133 MHz   4264 MB/sec
PCI-Express 1 Lane    8 Bit    2.5 GHz   512 MB/sec
PCI-Express 2 Lanes   8 Bit    2.5 GHz   1 GB/sec
PCI-Express 4 Lanes   8 Bit    2.5 GHz   2 GB/sec
PCI-Express 8 Lanes   8 Bit    2.5 GHz   4 GB/sec
PCI-Express 16 Lanes  8 Bit    2.5 GHz   8 GB/sec

The HUB 2.0 connection offers a maximum data transfer rate of 1 GB per second between the Southbridge and the
Northbridge. This results in the following combinations for a fully loaded P64H2 chip:

1. 1x PCI-X 133 MHz = 1066 MB/sec
2. 1x PCI-X 100 MHz = 800 MB/sec
3. 2x PCI-X 66 MHz = 1066 MB/sec
4. 2x PCI 64 66 MHz = 1066 MB/sec
5. 3x PCI 64 33 MHz = 798 MB/sec
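
The figures for the parallel PCI variants in the table and in the list above follow from bus width times clock. The short Python sketch below reproduces them under two assumptions that are not stated in the article: the nominal clocks are multiples of 33 1/3 MHz, and results are rounded down to whole MB/sec. The function names and the 1066 MB/sec constant for the HUB 2.0 link are illustrative choices, not part of any Intel tool.

```python
from fractions import Fraction
from math import floor

# Assumption: the nominal PCI clocks are exact multiples of 33 1/3 MHz.
CLOCKS_MHZ = {
    33: Fraction(100, 3),
    66: Fraction(200, 3),
    100: Fraction(100),
    133: Fraction(400, 3),
}

# Ceiling of the HUB 2.0 link between P64H2 and MCH, as quoted in the article.
HUB2_LIMIT_MB_S = 1066


def peak_rate_mb_s(bus_bits, clock_label):
    """Peak rate of a parallel PCI bus: bytes per clock times clocks per second."""
    return floor(Fraction(bus_bits, 8) * CLOCKS_MHZ[clock_label])


def hub_load(cards, bus_bits, clock_label):
    """Return the combined peak rate of identical cards and whether HUB 2.0 can carry it."""
    total = cards * peak_rate_mb_s(bus_bits, clock_label)
    return total, total <= HUB2_LIMIT_MB_S


if __name__ == "__main__":
    for cards, bits, clock in [(1, 64, 133), (1, 64, 100), (2, 64, 66), (3, 64, 33)]:
        total, fits = hub_load(cards, bits, clock)
        print(f"{cards}x {bits} bit @ {clock} MHz: {total} MB/sec, fits: {fits}")
```

A third 66 MHz, 64 bit card would push the combined peak rate to 1599 MB/sec, which is beyond what the HUB 2.0 link can carry.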

Standard cards do not exhaust data transfer rates of 1,066 MB per second. Only high-performance products,
such as SCSI320 cards (320 MB/s) or 10 Gbit LAN chips (max. 1250 MB per second), would be sensible
candidates.

Many PCI cards work not only in conventional PCI 2.3 slots, but also in
a PCI 64 slot. Examples of these are network cards, RAID controllers and even 56K modems. In order to avoid
incorrect installation, they have an additional notch in the contact edge.

A 56K modem for a 64 bit slot

The 56K modem operates here at the conventional 33 MHz in 32 bit mode.

A Promise SATA controller for a 64 bit slot

This RAID controller can even handle 66 MHz in 32 bit mode.

5. Comparison Of Current Workstation Chipsets From Intel


Chipset                   I860                     I875P             E7205             E7505
MCH                       82860                    82875P            E7205             E7505
Codename                  Colusa                   Canterwood        Granite Bay       Placer
Developed for             Xeon DP                  Pentium 4         Pentium 4         Xeon DP
Hyper Threading Support   Yes                      Yes               Yes               Yes
Number of supported CPUs  1-2                      1                 1                 1-2
FSB                       100 MHz                  133/200 MHz       100/133 MHz       100/133 MHz
Memory modules            4 RIMMs (8 with MRH-R)   4 DIMMs           4 DIMMs           4 DIMMs
Channels                  Single-Channel           Dual-Channel      Dual-Channel      Dual-Channel
Memory type               PC800/600 RDRAM          DDR266/333/400    DDR200/266        DDR266
Max. Memory               4 GB (with 2 Repeaters)  4 GB              4 GB              16 GB
Number of Rows            32                       8                 4                 6
Mbit Support              288/256/144/128          128/256/512       128/256/512       128/256/512/1024
ECC                       Yes                      Yes               Yes               Yes
Graphic Interface
AGP                       2x/4x (1.5V)             4x/8x (1.5V)      1x/2x/4x (1.5V),  1x/2x/4x (1.5V),
                                                                     4x/8x (0.8V)      4x/8x (0.8V)
I/O HUB
Southbridges              ICH2 (82801BA)           ICH5 (82801EB),   ICH4 (82801DB)    ICH4 (82801DB)
                                                   ICH5R (82801ER)
PCI-Standard              2.2                      2.3               2.2               2.2
PCI Master Slots (max)    6                        6                 6                 6
IDE                       ATA 33/66/100            ATA 33/66/100     ATA 33/66/100     ATA 33/66/100
SATA Support              No                       2                 No                No
USB Ports                 4x USB 1.1               8x USB 2.0        6x USB 1.1/2.0    6x USB 2.0 (P64H2)
LAN                       Yes                      CSA 266 MHz       Integrated 10/100 Yes
                                                                     Mbit
AC'97                     Audio/Modem              AC'97 2.3         Yes               AC'97 2.3
Manageability
I/O Management            SMBus/GPIO               SMBus 2.0/GPIO    SMBus/GPIO        SMBus 2.0/GPIO
I/O HUB (Expansion)
PCI Controller            P64H                     n/a               n/a               P64H2
PCI Support               PCI 64 (2x 66 MHz) or    n/a               n/a               2x 64Bit PCI/PCI-X,
                          PCI 33 (4x 33 MHz)                                           PCI max 66 MHz,
                                                                                       PCI-X max 133 MHz
PCI Master                6                        n/a               n/a               3

Chipset Price

Chipset   Codename      Price per 1000
E7505     Placer        $100
E7501     Plumas 533    $92
E7500     Plumas        $92
E7205     Granite Bay   $57
I875P     Canterwood    $50
I865PE    Springdale    $28

6. Memory Of Up To 16 GB

Because the Intel E7505 chipset always synchronizes the processor data bus with the main memory (1:1),
only DDR266 memory is suitable for such a platform. As with the 875, the chipset has a dual-channel memory
controller, with which it can attain a theoretical memory bandwidth of up to 4.2 GB per second at 133 MHz.
For comparison: the 875 chipset manages 6.4 GB per second thanks to its higher clock speed of
200 MHz. System reliability, however, plays a bigger role here, and that's why Intel integrates the ECC (Error
Checking and Correction) option.
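
The two bandwidth figures above can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: each memory channel is taken to be 64 bits (8 bytes) wide, DDR is taken to transfer on both clock edges, and the nominal "133 MHz" is treated as 133 1/3 MHz. The function name is made up for illustration.

```python
def ddr_bandwidth_mb_s(channels, bus_clock_mhz, bytes_per_channel=8):
    """Theoretical peak: channels x bytes per transfer x two transfers per clock."""
    return channels * bytes_per_channel * bus_clock_mhz * 2


# E7505 with dual-channel DDR266 (nominal 133 MHz bus clock).
e7505_mb_s = ddr_bandwidth_mb_s(channels=2, bus_clock_mhz=400 / 3)

# 875 with dual-channel DDR400 (200 MHz bus clock).
i875_mb_s = ddr_bandwidth_mb_s(channels=2, bus_clock_mhz=200)

if __name__ == "__main__":
    print(f"E7505: {e7505_mb_s:.0f} MB/sec")
    print(f"875:   {i875_mb_s:.0f} MB/sec")
```

The E7505 result comes out slightly above 4.2 GB per second because the article rounds the figure down.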

Upgrade: ECC Requires An Additional Chip Per Row

Like the 875 chipset, the E7505 manages 8 rows (also called pages). Reminder: one memory module has either
one row (single page) or two rows (double page). The following table provides a sample calculation for the
respective maximum memory upgrade of the platforms (without ECC):

Memory expansion   Modules   Typical structure (non-ECC)
1 GB               2         4 Rows x 8 Chips x 256 MBit = 8,192 MBit
2 GB               4         8 Rows x 8 Chips x 256 MBit = 16,384 MBit
4 GB               4         8 Rows x 8 Chips x 512 MBit = 32,768 MBit
8 GB               4         8 Rows x 16 Chips x 512 MBit = 65,536 MBit
16 GB              4         8 Rows x 16 Chips x 1 GBit = 131,072 MBit

However, if the user wants to play it safe and use modules with ECC, then note that an additional
chip has to be added per row. This chip is merely responsible for the checksums and does not have
any influence on the maximum memory upgrade.

Number of possible chips without ECC    Number of possible chips with ECC
8 9
16 18

Memory from Corsair with CL 2.0-3-2-6 timings

Registered and ECC memory from Mushkin with CL 2.0-3-2 timings

Registered and ECC memory from Legacy Electronics with CL 2.5 timings

DDR333 Registered and ECC memory from Infineon with CL 2.5 timings

To give you a worst case example: a 16 GB ECC memory configuration can consist of 144 chips - an
enormous burden for the Memory Controller Hub! However, only 128 of these chips are used for actual
memory functions, while the rest are used for administrative tasks.
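
The capacity and chip-count figures in this section can be checked with a small Python sketch. It assumes, based on the 8/9 and 16/18 chip counts in the table above, that one ECC chip is added for every eight data chips in a row; the function names are illustrative only.

```python
def capacity_mbit(rows, data_chips_per_row, chip_mbit):
    """Usable capacity in Mbit: rows x data chips per row x density per chip."""
    return rows * data_chips_per_row * chip_mbit


def capacity_gb(rows, data_chips_per_row, chip_mbit):
    """Usable capacity in GB (8 bits per byte, 1024 MB per GB)."""
    return capacity_mbit(rows, data_chips_per_row, chip_mbit) / 8 / 1024


def total_chips(rows, data_chips_per_row, ecc):
    """Total chip count; assumes one ECC chip per eight data chips in a row."""
    ecc_chips = data_chips_per_row // 8 if ecc else 0
    return rows * (data_chips_per_row + ecc_chips)


if __name__ == "__main__":
    # Largest configuration from the table: 8 rows x 16 chips x 1 GBit.
    print(capacity_mbit(8, 16, 1024), "MBit")
    print(capacity_gb(8, 16, 1024), "GB")
    print(total_chips(8, 16, ecc=True), "chips with ECC")
    print(total_chips(8, 16, ecc=False), "chips holding data")
```

Under that assumption the 16 GB configuration works out to 144 chips with ECC, 128 of which hold data.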

Registered Versus Unbuffered Memory

Classical memory is always available in unbuffered versions. What's new is registered memory (previously
referred to as buffered memory). The more chips a memory controller has to manage, the less clean the data
signals will be.

And now the trick: if you put a small register chip in front of the individual memory chips, every row/page
will make the memory controller believe that only one chip is present. This improves the signal
quality and data integrity. But it comes at the cost of speed, because the small register chip causes a short
time delay in the electrical signals.

7. Dual Xeon Double Offer


In Task Manager, two real and two virtual processors are shown.

As a rule, every Socket 604 Xeon CPU supports HyperThreading technology. The E7505 is capable of
operating with two processors simultaneously, as well as with HyperThreading. As a result, four
processors (two physical and two virtual) are available to the operating system. Nevertheless, the chipset
has only one CPU interface, which means that both processors have to share one bus. At a speed of 133 MHz
(533 MHz QDR), this results in a bandwidth of 4.2 GB per second. In a worst-case scenario, each virtual CPU will
receive a data stream of only about 1 GB per second. However, this could have negative effects only with some
OpenGL applications.

On the left, the Xeon and on the right, the P4 Northwood from Intel

The Intel Xeon (code name "Prestonia") is based on the same core as the Pentium 4 "Northwood". The latter
operates with an FSB of 200 MHz (800 MHz QDR) and delivers 6.4 GB per second. In order to balance out
the Xeon's up to 34% lower bandwidth, Intel also offers models with 1 or 2 MB of L3 cache, beginning
with the 2.4 GHz versions.
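
The front-side bus figures in this section can be worked through in a short Python sketch. It assumes a 64 bit (8 byte) wide, quad-pumped bus and treats the nominal "133 MHz" as 133 1/3 MHz; the function and variable names are illustrative.

```python
def fsb_bandwidth_mb_s(base_clock_mhz, bus_bytes=8, transfers_per_clock=4):
    """Peak FSB bandwidth: base clock x transfers per clock (QDR) x bus width in bytes."""
    return base_clock_mhz * transfers_per_clock * bus_bytes


xeon_fsb_mb_s = fsb_bandwidth_mb_s(400 / 3)   # 133 MHz base clock, 533 MHz QDR
p4_fsb_mb_s = fsb_bandwidth_mb_s(200)         # 200 MHz base clock, 800 MHz QDR

# Two physical CPUs with HyperThreading present four logical CPUs on one shared bus.
share_per_logical_cpu_mb_s = xeon_fsb_mb_s / 4

# Relative bandwidth gap, once exactly and once with the article's rounded figures.
gap_exact = (p4_fsb_mb_s - xeon_fsb_mb_s) / p4_fsb_mb_s
gap_rounded = (6.4 - 4.2) / 6.4

if __name__ == "__main__":
    print(f"Xeon FSB: {xeon_fsb_mb_s:.0f} MB/sec, P4 FSB: {p4_fsb_mb_s:.0f} MB/sec")
    print(f"Share per logical CPU: {share_per_logical_cpu_mb_s:.0f} MB/sec")
    print(f"Gap: {gap_exact:.1%} exact, {gap_rounded:.1%} with rounded figures")
```

The exact gap is one third; the 34% quoted in the text follows from the rounded 4.2 and 6.4 GB per second values.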

Prices For Current Xeon Processors

Intel Xeon Processor (Socket 604)


Processor       Codename       FSB       L2 Cache   L3 Cache   Price per 1000
Xeon 2.0 GHz    Prestonia      133 MHz   512 kB     n/a        $198
Xeon 2.4 GHz    Prestonia      133 MHz   512 kB     n/a        $209
Xeon 2.66 GHz   Prestonia      133 MHz   512 kB     n/a        $256
Xeon 2.8 GHz    Prestonia      133 MHz   512 kB     n/a        $316
Xeon 3.06 GHz   Prestonia      133 MHz   512 kB     n/a        $455
Xeon 2.4 GHz    Prestonia      133 MHz   512 kB     1024 kB    $316
Xeon 2.8 GHz    Prestonia      133 MHz   512 kB     1024 kB    $455
Xeon 3.06 GHz   Prestonia      133 MHz   512 kB     1024 kB    $690
Xeon 3.2 GHz    Prestonia      133 MHz   512 kB     1024 kB    $581
Xeon 3.2 GHz    Prestonia 2M   133 MHz   512 kB     2048 kB    $1043

An analysis of availability and prices shows that the 2.66 GHz models provide the best price-performance ratio.
Intel's next step is to increase the FSB to 200 MHz (800 MHz QDR). Once again, this will mean new chipsets.

8. The Right Motherboard Format: ATX Or WTX

On the left, the big WTX format and on the right, the ATX format

Compared to standard ATX boards, Xeon workstation boards carry considerably more components, including, for
example, PCI64/X interfaces, two Southbridges, LAN chips, voltage regulators, an extra CPU socket or an additional
SCSI controller. In order to accommodate the higher number of components, larger boards in the WTX standard
are required. At 33 x 33.5 cm, these have roughly 48% more surface area than ATX boards
(30.5 x 24.5 cm). Boards with a WTX form factor do not fit in a conventional home PC case. The manufacturers
MSI and Tyan also offer motherboards in ATX format that omit additional components such as the P64H2 bridge and LAN.
Installing these in a conventional tower is not a problem.

The Right Power Adapter: ATX Or EPS12V


Fully stocked Xeon system

Because we are talking here about a dual CPU platform, the processors' power dissipation is also doubled. The
fastest Xeon models with a Prestonia 2M core and 3.2 GHz have, as a pair, a maximum power dissipation of
184 watts. Added to that are board components (an average of 50 watts), a high-performance graphics card
with 70 watts, and a large memory configuration - all together, the system quickly draws 350 watts.
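
As a rough cross-check of the power budget above, the quoted figures can be added up in a few lines of Python. The article gives no wattage for the memory, so the sketch only computes what is left of the 350 watt total once the three named loads are subtracted; the dictionary keys and function name are illustrative.

```python
# Figures quoted in the article, in watts.
LOADS_W = {
    "two Xeon 3.2 GHz (Prestonia 2M)": 184,
    "board components (average)": 50,
    "high-performance graphics card": 70,
}

# Total system draw quoted in the article.
SYSTEM_TOTAL_W = 350


def remaining_budget_w(loads, total):
    """Watts left for everything not itemized (memory, drives, fans)."""
    return total - sum(loads.values())


if __name__ == "__main__":
    print("Itemized loads:", sum(LOADS_W.values()), "W")
    print("Left for memory and the rest:", remaining_budget_w(LOADS_W, SYSTEM_TOTAL_W), "W")
```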

A 20-pin plug supplies the motherboard with power

This overloads the power connection to the motherboard. As a result, boards in WTX format use another
power adapter standard, which goes by the name of EPS12V. Its connectors have more power and
ground leads, as well as wider plugs, in order to distribute the load better. As with the ATX form factor, the
power pins are also gold-plated in order to attain a lower resistance and therefore improve the quality of
the signals.

A voltage adapter from a Tagan power adapter (TG480-U01)

9. The Right Power Adapter: ATX Or EPS12V, Continued


With more than 350 watts, today's ATX power adapters deliver sufficient power to supply
dual systems in ATX format as well. There are now also power adapters on the market that support
both the ATX and EPS12V standards with the aid of a special adapter cable. This eliminates the
need for a later power-adapter replacement and saves additional expense when changing
systems. Many motherboards are capable of operating with both power adapter standards.

On the left, a 24-pin WTX plug and on the right, a 20-pin ATX plug
On the left an 8-pin WTX plug and on the right, a 4-pin PWR plug

The "20/24P" marking on the large voltage connection indicates that it can operate with the 24-pin WTX as
well as with the 20-pin ATX plug. The same applies to the "12V-8/4P" marking on the small AUX
connection - it supports the 8-pin as well as the 4-pin connections. Each of the four missing leads is a
redundant voltage pin for load sharing.

The various allocations of ATX and WTX plugs

For the power adapter with the EPS12V standard, additional +12V, +3.3V, +5V and ground leads are connected
to the board.

AGP: Support For All Cards

The E7505 Northbridge offers support for AGP graphics cards, and most motherboards have a "Pro" slot.
With the Pro versions, the card is supplied with power through additional voltage pins.

             Signaling Level
Data Rate    AGP 3.0   1.5 V   3.3 V
PCI-66       Yes       Yes     No
1x AGP       No        Yes     No
2x AGP       No        Yes     No
4x AGP       Yes       Yes     No
8x AGP       Yes       No      No

Support for the 3.0 standard is also offered, and all graphics cards available on the market can be used
without any problem.

10. Test Configuration


Intel Processors (Socket 604)
133 MHz FSB (Dual DDR266)   Intel Xeon 3.06 GHz (3066 MHz, 12-8/512/1024 kB)

Intel Processors (Socket 478)
133 MHz FSB (Dual DDR266)   Pentium 4 3.06 GHz (3066 MHz, 12-8/512 kB)
200 MHz FSB (Dual DDR400)   Pentium 4 3.2E GHz (3200 MHz, 12-8/1024 kB)
200 MHz FSB (Dual DDR400)   Pentium 4EE 3.2 GHz (3400 MHz, 12-8/512/2048 kB)

Memory
DDR400 (200 MHz)            2 x 512 MB / 5ns / 64 Bit (Corsair)
                            CMX512-3200LL (CL 2.0-3-2-6)
DDR400 (200 MHz)            2 x 512 MB / 5ns / 64 Bit (Mushkin) REG ECC
                            MS64D64020U-5 (CL 2.0-3-2-6)

Common Hardware
Sound Card                  Terratec Aureon 7.1 Space
                            96.00 kHz sample rate
Graphics Card               Asus A9800XT/TVD, Rev. 1.01
                            GPU: ATI Radeon 9800XT, 412 MHz Chip Clock
                            Memory: 256 MB DDR-SDRAM, 365 MHz Chip Clock
Hard Drive                  FastTrak S150 TX2plus (Bios: 1.00.0.30)
                            2 x SATA Maxtor 6Y080M0 (Raid 0)
                            80 GB / 8 MB Cache / 7200 rpm
DVD/CD-ROM                  MSI MS-8216 16x DVD

Software
Chipset                     Chipset Installation Utility Ver. 5.1.1.1002
                            IAA RAID Edition 3.5.3
Graphics                    ATI Catalyst XP 4.3 (Driver 6.14.10.6430)
Promise RAID                1.00.0.37
DirectX                     Version: 9b
OS                          Windows XP, Build 2600 SP1 (English)

11. Benchmarks And Settings


OpenGL
Quake III Team Arena           Version 1.32
                               1024x768 - 32 bit
                               Timedemo1 / demo thg3
                               "custom timedemo"
                               Graphics detail = Normal

DirectX 9a
3DMark 2003                    Version 3.4.0
                               Graphics and CPU Default Benchmark
                               1024 x 768 - 32 bit

Video
Mainconcept MPEG Encoder       Version 1.4.1
                               1.2 GB DV to MPEG II
                               (720x576, Audio) converting
Pinnacle Studio 9              Version: 9.0.0
                               Rendering - DVD Compatible
                               no Audio
Windows Media Encoder 9        Version: 9.00.00.2980
                               436 MB AVI File conversion to WMV
                               Windows Media Server (Streaming)
Microsoft Movie Maker          Version 2.0.3312.0
                               416 MB DV to WMV
TMPGEnc Plus                   Version 2.521
                               1.2 GB DV to MPEG I
                               (720x576, Audio) converting

Audio
magix mp3 maker 2004           Version 4.11 Build 19593
Syntrillium Cool Edit Pro      Version 2.1
                               Amplitude Normalizing
                               2.6 GB Wave Audio file

Applications
3D Studio Max 6.0              Rendering Single, 1024x768
Newtek Lightwave               Version 7.5c - Build 572
                               Render First Frame = 1
                               Render Last Frame = 60
                               Render Frame Step = 1
                               Rendering Bench "variation.lws"
                               Show Rendering in Progress = 320x240
                               Ray Trace Shadows, Reflection, Refraction,
                               Transparency = on
                               Multithreading = 8 Threads
Maxon Cinema 4D XL 8           Version 8.503
                               Rendering in 1024x768, "ship_dirt"
Microsoft Visual Studio .NET   Version 2003 (Enterprise Architect)
                               Visual C++: compiling Emule 0.42b
LIUtilities WinBackup          Version 1.84
                               650 MB Wave file
                               Encryption: 256 Bit DES, Password "test"

Synthetic
PCMark 2004 Pro                Build 1.1.0
                               CPU and Memory Tests
SiSoftware Sandra 2004         Version 2004.10.9.89
                               CPU Test: CPU Multimedia / CPU Arithmetic
                               Memory Test: Memory Bandwidth Benchmark

12. Benchmark Results


In the following benchmarks, the differences in performance can be seen between a dual platform and a
"normal" Pentium 4 in single operation.

OpenGL
DirectX 9a

13. Video
14. Video, Continued
Audio
15. Applications
16. Applications, Continued
Synthetic

17. Synthetic, Continued


Conclusion

Applications already optimized for HyperThreading see performance gains from the use of two physical CPUs.
In view of system costs, it is therefore worthwhile for users to go with a Dual Xeon as their next system if
most of their time is spent rendering or encoding.

In the subsequent articles, we will show how the various E7505 motherboards measure up in a head-to-head
comparison. We will also soon publish an article on how to increase performance by using a self-programmed
tool that can assign tasks to certain CPUs. This tool takes automatic task assignment away from the
operating system and forces an application to run on a manually specified CPU.
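
To illustrate the general idea of pinning a task to one CPU, here is a minimal Python sketch. It is not the tool announced above, whose implementation is not described; it uses `os.sched_setaffinity`, which the Python standard library provides only on some Unix platforms such as Linux, whereas the test systems in this article ran Windows XP. The function name is made up for illustration.

```python
import os


def pin_current_process(cpu):
    """Restrict the calling process to one logical CPU and return the resulting CPU set."""
    if not hasattr(os, "sched_setaffinity"):
        # Assumption: running on a platform where Python exposes the affinity calls.
        raise NotImplementedError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(0, {cpu})  # pid 0 refers to the calling process
    return os.sched_getaffinity(0)


# Usage sketch: pin to the lowest-numbered CPU this process is allowed to use.
#   first_cpu = min(os.sched_getaffinity(0))
#   print(pin_current_process(first_cpu))
```

Once pinned, the scheduler no longer migrates the process between CPUs, which is the behavior the announced tool is said to enforce manually.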
