
Architecture of Parallel Computers CSC / ECE 506

Summer 2006 Introduction / Overview


5/22/2006
Dr Steve Hunter

Architecture of Parallel Computers


Taught jointly by Dr Ed Gehringer and Dr Steve Hunter
Course days: Monday 4:00-6:45, Wednesday 4:00-5:15

Goal: Understand the interaction of hardware and software with respect to parallel systems design and implementation.
Textbook: Parallel Computer Architecture, by Culler and Singh
Selected papers possible

Arch of Parallel Computers

CSC / ECE 506

Architecture of Parallel Computers


Steve's Info:
NCSU Adjunct Professor
IBM Corporation
Website: http://www.ee.duke.edu/~shunter/
Email: hunters@us.ibm.com

Academic
Auburn University: BSEE
NC State University: MSEE
Duke University: PhD

IBM Corporation
IBM Networking Division: 14 years
Systems and Technology Group: 8 years

Areas of Interest
Systems and Network Architecture and Technology
Computer and Network Performance and Dependability
Server Clustering and Software Dependability

Course Outline (Tentative)


Weeks: 5/22-5/24, 5/29-5/31, 6/5-6/7, 6/12-6/14, 6/19-6/21, 6/26-6/28, 7/3-7/5, 7/10-7/12, 7/17-7/19, 7/24-7/26, 7/31-8/2, 8/7-8/9

Mon. 4:00
  1. Overview of parallel computation (1)
  Memorial Day
  5. Data-parallel algorithms (8)
  8. Invalidation-based cache-coherence protocols (13)
  10. Scalable multiprocessors (16)
  13. Scalable cache coherence (19)
  Independence Day
  16. Extending cache coherence (15)
  18. Open RDMA, open fabrics
  21. Interconnection network topologies (25)

Mon. 5:30
  2. Message-passing and data-parallel models (3-5)
  6. Cache organization (10)
  9. Update-based coherence protocols and performance (14)
  11. Realizing programming models in scalable systems (17)
  14. Directory-protocol correctness and performance (20)
  17. Open MPI
  19. Memory consistency (22)
  22. Routing in interconnection networks (26)

Wed. 4:00
  3. Steps in parallelization (6)
  4. Parallelizing the Ocean application (7)
  7. The cache-coherence problem/interleaved memory (12)
  Test 1
  12. Design space for communication architectures
  15. The Silicon Graphics S2MP architecture (21)
  Test 2
  20. Relaxed memory-consistency models (23)
  23. Switch design (27)

Final exam

http://courses.ncsu.edu/csc506/lec/052/lectures/syllabus.html


What is Parallel Computer Architecture?


A parallel computer is a collection of processing elements that cooperate to solve large problems fast.
Some broad issues:
  Resource allocation
    How large a collection?
    How powerful are the elements?
    How much memory?
  Data access, communication and synchronization
    How do the elements cooperate and communicate?
    How are data transmitted between processors?
    What are the abstractions and primitives for cooperation?
  Performance and scalability
    How does it all translate into performance?
    How does it scale?
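To make the definition concrete, here is a minimal sketch of processing elements cooperating on one problem. The function names (`partial_sum`, `parallel_sum`) and the choice of four workers are illustrative assumptions, not from the course material. Threads stand in for processing elements to show the structure only (partition the data, compute partial results, combine them); in standard CPython, threads typically do not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one processing element on its share of the data."""
    return sum(chunk)


def parallel_sum(data, num_workers=4):
    """Partition data, sum each chunk in a worker, then combine the partial sums."""
    # Resource allocation: decide how much data each worker receives.
    chunk_size = max(1, (len(data) + num_workers - 1) // num_workers)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Communication: chunks go to the workers, partial sums come back.
    # Synchronization: leaving the "with" block waits for all workers to finish.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)


total = parallel_sum(list(range(1, 101)))
print(total)  # 5050
```

Each comment in the sketch maps to one of the broad issues listed above: how work is divided, how data moves between elements, and where the elements wait for one another.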


Historical Perspective
Parallel computing was represented by competing models and correspondingly unique architectures, with no clear path for growth.
Competing methods:
  Dataflow
  Systolic arrays
  SIMD (bit serial)
  Shared memory
  Message passing

Confusion over which model to use paralyzed parallel software development.
Section 1.2 shows several architectures.
Shared-memory multiprocessors
  Bus-based
  Crossbar-based
  MIN-based
Message-passing machines (Hypercube)
IBM SP2 architecture

Why Study Parallel Computer Architecture?


Role of a computer architect:
To design and engineer the various levels of a computer system to maximize performance and programmability within limits of technology and cost.

Parallelism:
Provides an alternative to a faster clock for performance
Applies at all levels of system design
Is a fascinating perspective from which to view architecture
Traditionally central in information processing, with processing elements in the same locality
However, greater networking bandwidth is expanding parallelism over greater distances


Parallel Computation: Why and Why Not?


Pros
Performance
Cost-effectiveness (commodity parts)
Smooth upgrade path
Fault tolerance

Cons
Difficult to parallelize applications
Requires automatic parallelization or parallel program development
Software!


Is Parallel Computing Inevitable?


Application demands: (the need for computing cycles)
Petroleum (reservoir analysis)
Automotive (crash simulation, drag analysis, combustion efficiency)
Aeronautics (airflow analysis, engine efficiency, structural mechanics, electromagnetism)
Computer-aided design
Pharmaceuticals (molecular modeling)
Visualization
  in all of the above
  entertainment (films like Toy Story, The Hulk)
  architecture (walk-throughs and rendering)
Financial modeling (yield and derivative analysis)
Search engines
etc.


Application Trends
Application demand for performance fuels advances in hardware, which enables new applications, which...
Cycle drives exponential increase in microprocessor performance
Drives parallel architecture harder: most demanding applications

[Figure: cycle linking New Applications and More Performance]

Range of performance demands


Need range of system performance with progressively increasing cost

Speedup

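$$\text{Speedup}(p\ \text{processors}) = \frac{\text{Performance}(p\ \text{processors})}{\text{Performance}(1\ \text{processor})}$$

For a fixed problem size (input data set), performance = 1/time, so

$$\text{Speedup}_{\text{fixed problem}}(p\ \text{processors}) = \frac{\text{Time}(1\ \text{processor})}{\text{Time}(p\ \text{processors})}$$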
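As a worked illustration of the fixed-problem speedup definition, the sketch below divides the one-processor time by the p-processor time. The function name `speedup` and the timings in `timings` are hypothetical values chosen for the example, not measurements from the course.

```python
def speedup(time_one_processor, time_p_processors):
    """Fixed-problem speedup: Time(1 processor) / Time(p processors)."""
    if time_one_processor <= 0 or time_p_processors <= 0:
        raise ValueError("execution times must be positive")
    return time_one_processor / time_p_processors


# Hypothetical run times in seconds for the same input data set,
# keyed by the number of processors used.
timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}

speedups = {p: speedup(timings[1], t) for p, t in timings.items()}
for p, s in speedups.items():
    print(f"{p} processors: speedup {s:.2f}")
```

With these made-up numbers, 8 processors give a speedup of 120/24 = 5, which is less than 8. Measured speedups commonly fall short of the processor count in this way.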


Is Parallel Computing Inevitable?


Technology Trends
Chip technology continues to increase in density
Driving the frequency of single-core designs higher requires too much power
Use of commodity or off-the-shelf technology for low costs
Multi-core processing becoming common among mainstream microprocessors (e.g., AMD, IBM, Intel)
Greater interconnect bandwidth becoming generally available
Standard interconnects: InfiniBand, 10Gb Ethernet

Architecture Trends
Packaging parallel solutions in a common chassis
  e.g., blade servers (IBM, HP, Dell, etc.)
Software being packaged for mainstream solutions
  e.g., Windows Compute Cluster Server 2003
High availability commonly achieved by clustering of processing elements


Is Parallel Computing Inevitable?


Economics
The falling cost of low-end servers (dual and quad socket), together with high-bandwidth interconnects, is driving applications to be parallel
Commodity microprocessors are not only fast but CHEAP
  Development costs tens of millions of dollars
  BUT many more are sold compared to supercomputers
  Crucial to take advantage of the investment and use the commodity building block

Multiprocessors are being pushed by software vendors (e.g., database) as well as hardware vendors
Standardization makes small, bus-based SMPs a commodity

Desktop: a few smaller processors versus one larger one?
Multiprocessor on a chip?


Scale Up vs Scale Out Model


[Figure: two scaling models.
Scale Up / SMP Computing: large SMP (x455, x445).
Scale Out / Distributed Computing: large parallel clusters (high-density rack mount xSeries 335/eServer 325, BladeCenter).]


Blade Server Example - BladeCenter


Nov 2002: BladeCenter
  7U chassis form factor
  Highest density, lowest cost
  Super power efficient, consolidated management

March 2004: BladeCenter T
  8U chassis form factor
  Highly rugged, Telco AC/DC, long life, NEBS, air filtration

Jan 2006: BladeCenter H
  9U chassis form factor
  Ultra high performance
  4x IB/10Gb backplane
  New management module

Compatible Set of Blades and Switches


Web hosting/serving FSS, File/Print Geophysical Analysis Collaboration Graphic Rendering Telco/Core Applications Government Military Rugged Industrial DC Medical HPC Applications Technical Clusters Virtual Enterprise Solutions Future I/O

One family, many applications, many environments, long-term investment protection
BladeCenter: Simply Smarter IT


Blade Server Example - BladeCenter H


Fourteen Blades in a 9U Chassis Form Factor
Blade and switch compatibility across BladeCenter and BladeCenter-T

High performance networking fabrics


New high performance switches and blade I/O
Corresponding bridge bays for protocol translation

Power Enhancements
Four front load 2900W Power Supplies


BladeCenter Overview
Switching Modules
Ethernet, Fibre Channel, InfiniBand

Blade I/O Card (or local drive)


I/O card matches switch technology in corresponding slot


BladeCenter H Architecture
[Figure: BladeCenter H block diagram.
Components: Blade 1 through Blade 14; HS Switch 1 through HS Switch 4 (high-speed switch, Ethernet or InfiniBand); Switch Module 1 and Switch Module 2 (e.g., Ethernet, Fibre Channel, Passthru); I/O Bridge 3 / SM3 and I/O Bridge 4 / SM4; Mgmt Mod 1 and Mgmt Mod 2.
Links: 4x (16 wire) blade links; 4x (16 wire) bridge links; 1x (4 wire) Mgmt links; dual 4x (16 wire) wiring internally to each HSSM.
Uplinks: up to 12x links for IB and at least four 10Gb links for Ethernet.]


InfiniBand on BladeCenter H
Enabling High Performance and Virtualized I/O

Expanding BladeCenter Ecosystem with Cisco Systems


Switch module and daughter card designed for BladeCenter H
Daughter card provides dual port 4x (10G) InfiniBand connectivity to each blade

Help Reduce Data Center Complexity


Reduce the number of adapters, cables, and switch ports required
Manage the addition or removal of I/O or storage bandwidth centrally
Enable users to adjust resources on demand without downtime

High Performance Computing Features


Leverages RDMA to deliver low-latency performance
Delivers higher-bandwidth connectivity (160 Gbps to chassis)
Achieves blade port consolidation through remote I/O

I/O Virtualization via Cisco VFrame

BladeCenter H InfiniBand Solution provides high-speed, low latency solutions while lowering TCO


The End


Grid Example

[Figure: grid types (Intra-Grids, Extra-Grids, Inter-Grids) shown on a timeline from 2003 to 2006+. Labels: Cactus (SF) Express Project; NTG; Grid with NAS/SAN; Grid VPN; Fin. Services; MFG; Commerce with Trusted Partners; "Full Commercialization" with unknown partners.]

Courtesy of Ellen Stokes
