13 Multiprocessors
Computation can proceed in parallel in one of two ways
1) Multiple independent jobs can be made to operate in parallel
2) A single job can be partitioned into multiple parallel tasks
Classified by the memory organization
1) Shared memory or tightly coupled system : local memory + shared memory
2) Distributed memory or loosely coupled system
Interconnection structures
1) Time-shared common bus
2) Multi-port memory
3) Crossbar switch
4) Multistage switching network
5) Hypercube system
Chap. 13 Multiprocessors
Time-shared Common Bus
Time-shared single common bus system : Fig. 13-1
Only one processor can communicate with the memory or another processor at any given time
While one processor is communicating with the memory, all other processors are either busy with internal operations or idle, waiting for the bus
The memory connected to the common system bus is shared by all processors
Each local bus is linked to the common system bus
[Figure: CPU 1-3 and IOP 1-2 share one memory unit over a single common bus. In the system bus diagram, each CPU and IOP has its own local memory on a local bus.]
Multi-port Memory : Fig. 13-3
Multiple paths between processors and memory
Advantage : a high transfer rate can be achieved
Disadvantage : expensive memory control logic, large number of cables and connectors
Crossbar Switch : Fig. 13-4
Memory Module, I/O Port, Crossbar Switch
Block diagram of crossbar switch : Fig. 13-5
[Fig. 13-4: CPU 1-4 connected to memory modules MM 1-4 through a grid of crosspoint switches]
[Fig. 13-5: data, address, and control lines from CPU 1-4 enter the switch for one memory module, which receives data, address, and read/write]
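The behavior of the crossbar can be sketched in a few lines: transfers to different memory modules proceed in the same cycle, and only requests for the same module conflict. The function name `crossbar_cycle` and the fixed priority rule (lower CPU number wins) are assumptions made for this illustration, not details taken from the figures.

```python
def crossbar_cycle(requests):
    """Resolve one cycle of a crossbar switch.

    requests maps a CPU number to the memory module it wants.
    Returns (granted, waiting): granted maps CPU -> module for the
    transfers that proceed this cycle, waiting lists the CPUs that
    must retry because their module is already in use.
    """
    granted = {}
    waiting = []
    busy_modules = set()
    # Assumed rule: the lower CPU number has the higher priority.
    for cpu in sorted(requests):
        module = requests[cpu]
        if module in busy_modules:
            waiting.append(cpu)      # conflict on this memory module
        else:
            busy_modules.add(module)
            granted[cpu] = module    # crosspoint (cpu, module) is closed
    return granted, waiting


granted, waiting = crossbar_cycle({1: 3, 2: 1, 3: 3, 4: 2})
print(granted)  # {1: 3, 2: 1, 4: 2}
print(waiting)  # [3]
```

CPU 1, CPU 2, and CPU 4 address different modules and transfer simultaneously; CPU 3 waits because MM 3 is already connected to CPU 1.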
[Figure: crossbar hierarchies. Clusters are connected through a crossbar switch; a cluster contains nodes, and the node diagram is labeled PU, CU, I/O, network interface, and local memory.]
Multistage Switching Network
Controls the communication between a number of sources and destinations
Tightly coupled system : PU to MM
Loosely coupled system : PU to PU
Basic component of a multistage switching network : two-input, two-output interchange switch : Fig. 13-6
Two processors (P1 and P2) are connected through switches to 8 memory modules (000 - 111) : Fig. 13-7
Omega network : Fig. 13-8
2 x 2 interchange switches, N input x N output network topology
[Fig. 13-6: the four states of the 2 x 2 interchange switch: A connected to 0, A connected to 1, B connected to 0, B connected to 1]
[Fig. 13-7: binary tree of switches from two processors to memory modules 000 - 111]
[Fig. 13-8: 8 x 8 omega network, inputs 0 - 7, outputs 000 - 111]
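A path through an omega network can be traced with destination-tag routing: at each stage the 2 x 2 switch selects output 0 or 1 according to one bit of the destination address, most significant bit first. The sketch below assumes the usual perfect-shuffle wiring in front of every stage; the function name `omega_route` is hypothetical.

```python
def omega_route(source, dest, n_bits=3):
    """Trace a path through an omega network with 2**n_bits inputs.

    Returns a list of (stage, switch_output, line) tuples, where
    switch_output is the 0/1 output selected at that stage and line
    is the line number the message is on after the stage.
    """
    mask = (1 << n_bits) - 1
    line = source
    path = []
    for stage in range(n_bits):
        # Perfect shuffle between stages: rotate the line number left by one bit.
        line = ((line << 1) | (line >> (n_bits - 1))) & mask
        # The switch output is chosen by the next destination bit (MSB first).
        bit = (dest >> (n_bits - 1 - stage)) & 1
        line = (line & ~1) | bit
        path.append((stage, bit, line))
    return path


path = omega_route(0b010, 0b101)
print([bit for _, bit, _ in path])  # [1, 0, 1]
print(path[-1][2] == 0b101)         # True
```

The switch settings are simply the bits of the destination (101), so no central controller is needed to set up a path.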
Hypercube Interconnection : Fig. 13-9
Loosely coupled system
Hypercube architecture : Intel iPSC ( n = 7, 128 nodes )
[Fig. 13-9: two-cube with nodes 00, 01, 10, 11 and three-cube with nodes 000 - 111]
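In a hypercube, neighboring nodes differ in exactly one address bit, so a route can be found by XOR-ing the source and destination addresses and crossing one axis per differing bit. The sketch below is a minimal illustration; the function name `hypercube_route` and the low-bit-first order are assumptions, and any order of the differing bits gives a valid path.

```python
def hypercube_route(source, dest, n):
    """Return the list of nodes visited from source to dest in an n-cube."""
    path = [source]
    node = source
    diff = source ^ dest          # 1 bits mark the axes that must be crossed
    for bit in range(n):          # assumed order: lowest axis first
        if diff & (1 << bit):
            node ^= 1 << bit      # move to the neighbor along this axis
            path.append(node)
    return path


route = hypercube_route(0b010, 0b001, 3)
print([format(node, "03b") for node in route])  # ['010', '011', '001']
print(2 ** 7)                                   # 128
```

The number of hops equals the number of differing address bits, so it never exceeds n; with n = 7 the cube has 2^7 = 128 nodes, matching the iPSC figure quoted above.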
System Bus : IEEE Standard 796 MultiBus
86 signal lines : Tab. 13-1
Bus Arbitration : BREQ, BUSY,
Bus Arbitration Algorithm : Static / Dynamic
Static : priority fixed
Serial arbitration : Fig. 13-10
[Fig. 13-10: serial (daisy-chain) arbitration. Bus arbiters are chained from highest to lowest priority; the PO output of each arbiter feeds the PI input of the next, and the last PO goes to the next arbiter.]
* Bus busy line : if this line is inactive, no other processor is using the bus
[Figure label: 2 x 4 decoder]
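The serial arbitration chain can be sketched as a loop over the arbiters: an arbiter passes priority on (PO = 1) only if it receives priority (PI = 1) and is not requesting the bus itself. The function name `daisy_chain_grant` and the list-of-booleans interface are assumptions made for this sketch.

```python
def daisy_chain_grant(requests, bus_busy=False):
    """Return the index of the arbiter that gets the bus, or None.

    requests[i] is True when arbiter i wants the bus; index 0 is the
    highest-priority arbiter, whose PI input is permanently 1.
    """
    if bus_busy:
        return None               # bus busy line active: nobody may take over
    pi = True                     # PI of the highest-priority arbiter
    for index, wants_bus in enumerate(requests):
        if pi and wants_bus:
            return index          # PI = 1 and PO = 0: this arbiter is granted
        pi = pi and not wants_bus # PO of this arbiter is the next arbiter's PI
    return None


print(daisy_chain_grant([False, True, False, True]))                 # 1
print(daisy_chain_grant([False, True, False, True], bus_busy=True))  # None
```

Arbiter 3 also requests the bus but is never granted while arbiter 1 keeps requesting, which is the fixed-priority behavior of a static algorithm.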
Interprocessor Synchronization
Enforces the correct sequence of processes and ensures mutually exclusive access to shared writable data
Mutual exclusion : protects data from being changed simultaneously by two or more processors
Critical section : once begun, must complete execution before another processor accesses the same shared resource
Semaphore : indicates whether or not a processor is executing a critical section
Hardware lock : processor-generated signal that prevents other processors from using the system bus
Semaphore in shared memory
1) TSL SEM (Test and Set while Locked)
The hardware lock is active while SEM is accessed in 2 memory cycles
R <- M[SEM] : test semaphore (semaphore value is transferred to R)
M[SEM] <- 1 : set semaphore (keeps other processors out of the shared memory)
2) R = 0 : shared memory is available
R = 1 : processor cannot access shared memory (semaphore was originally set)
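The test-and-set sequence can be imitated in software. In this sketch a `threading.Lock` stands in for the hardware lock that makes the two memory cycles indivisible, and threads stand in for processors; the class `SharedMemory`, the function `processor`, and the iteration counts are all assumptions made for illustration.

```python
import threading
import time


class SharedMemory:
    """Shared memory holding one semaphore word, SEM."""

    def __init__(self):
        self.sem = 0
        # Stand-in for the hardware lock: makes read + write indivisible.
        self._hardware_lock = threading.Lock()

    def tsl(self):
        """TSL SEM: return the old semaphore value and set SEM to 1."""
        with self._hardware_lock:
            r = self.sem      # R <- M[SEM]  (test)
            self.sem = 1      # M[SEM] <- 1  (set)
        return r

    def release(self):
        self.sem = 0          # clear the semaphore on leaving the critical section


memory = SharedMemory()
counter = 0


def processor(iterations):
    global counter
    for _ in range(iterations):
        while memory.tsl() == 1:   # R = 1: semaphore was already set, retry
            time.sleep(0)          # yield while busy-waiting
        counter += 1               # critical section (R = 0)
        memory.release()


threads = [threading.Thread(target=processor, args=(200,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(counter)  # 800
```

Only the thread that reads R = 0 enters the critical section, so every increment of the shared counter is preserved.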
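[Figure: cache configurations for X with main memory, bus, caches, and processors. Initially main memory and all caches hold X = 52. After one processor writes X = 120, one diagram shows main memory and the writer's cache holding X = 120 while the other caches still hold X = 52; the other diagram shows only the writer's cache holding X = 120 while main memory and the other caches still hold X = 52.]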
Hardware solution for cache coherence
1) Monitor possible write operations : snoopy cache controller
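The effect of snooping can be shown with a small model that reuses the values X = 52 and X = 120 from the figure. The sketch assumes a write-through policy and invalidation of other copies on a snooped write, which is only one possible scheme; the class `SnoopyBus` and the function `demo` are hypothetical names.

```python
class SnoopyBus:
    """Main memory plus one private cache (a dict) per processor."""

    def __init__(self, n_caches, memory):
        self.memory = memory
        self.caches = [dict() for _ in range(n_caches)]

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                 # miss: fetch from main memory
            cache[addr] = self.memory[addr]
        return cache[addr]

    def write(self, cpu, addr, value, snoop=True):
        self.caches[cpu][addr] = value
        self.memory[addr] = value             # write-through (assumed policy)
        if snoop:
            # Every other cache controller watches the bus and
            # invalidates its own copy of the written address.
            for other, cache in enumerate(self.caches):
                if other != cpu:
                    cache.pop(addr, None)


def demo(snoop):
    bus = SnoopyBus(3, {"X": 52})
    for cpu in range(3):
        bus.read(cpu, "X")                    # all caches load X = 52
    bus.write(0, "X", 120, snoop=snoop)       # one processor writes X = 120
    return [bus.read(cpu, "X") for cpu in range(3)]


print(demo(snoop=False))  # [120, 52, 52]
print(demo(snoop=True))   # [120, 120, 120]
```

Without snooping the other two caches keep returning the stale value 52; with snooping their copies are invalidated, and the next read misses and fetches 120 from main memory.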
References :
IEEE Computer, Feb. 1988, "Synchronization, coherence, and event ordering in multiprocessors"
IEEE Computer, June 1990, "A survey of cache coherence schemes for multiprocessors"