Professional Documents
Culture Documents
Content
Logical organization
Flynn taxonomy
SIMD and MIMD
Communication
Physical organization
Historical context
Shared memory versus distributed memory
Why? How?
Logical organization
2.
Flynns taxonomy
Compilers and CPUs also try to minimize this problem by guessing the
outcome (branch prediction).
The key difference between the above pipeline and a MISD architecture is that the
floating-point pipeline stages are not programmable.
Each PE has its own stream of instructions operating on its own data.
Is the most general of the architectures: each of the other cases can
be mapped onto the MIMD architecture.
The vast majority of modern parallel systems fit into this category.
On a MIMD computer each of the parallel processing units executes
operations independently of each other, subject to synchronization
through appropriate message passing at specified time intervals.
Both parallel data distribution as well as the message passing and
synchronization are under user control.
Ex: Intel Gamma, Delta Touchstone, Cray C-90, IBM SP2 (1990s).
A vector processor than can simultaneously work with 64 vector elements can also
generate 64 results per cycle
In practice:
Vector computers used to be very common in the field of HPC, as
they allowed very high performance even at lower CPU clock
speeds.
They have begun to slowly disappear.
This is a binary mask associated with each data item and operation
that specifies whether it should participate in the operation or not.
Conditional execution can be detrimental to the performance of SIMD
processors and therefore must be used with care.
1.
2.
Multi-core
SIMD inefficiency => a natural evolution of multiprocessor systems towards the more flexible MIMD model
2.
GMSV: shared-memory
multiprocessors
DMMP: distributed-memory
multicomputers
DMSV: is sometimes called
distributed shared memory
Advantage of shared mem.systs: all processors make use of the whole mem
ccNUMA
Communication Model of
Parallel Platforms
1. shared data space
2. exchanging messages
Shared-Address-Space Platform
1.
shared-address-space machines
only support an address translation mechanism
leave the task of ensuring coherence to the programmer.
The native programming model for such platforms consists of primitives such
as get and put.
These primitives allow a processor to get (and put) variables stored at a
remote processor.
If one of the copies of this variable is changed, the other copies are not
automatically updated or invalidated.
Message-Passing Platforms
1.
2.
Physical organization
Supercomputer
The development of
1.
2.
Advantages:
neither require data distributions over the processors nor data
messages between them.
the communications between PUs required for the control of the
application are implicitly performed via the shared memory and can
thus be very fast.
the memory connection is facilitated by fast bus technology or a
variety of switch types (i.e., omega, butterfly etc)
easy to program
with loop parallelization being the simplest and the most efficient means
of achieving parallelism
Disadvantages:
Cache memory resulted in relatively complicated data
management/cache coherency issues.
the fact that it always was and still is almost impossible
2.
3.
1.
2.
3.
number p of processors,
number m of memory modules
Cache
Motivation: reduce the amount of data that must pass through the
processor-to-memory interconnection network
Use a private cache memory of reasonable size within each
processor.
The reason that using cache memories reduces the traffic through the
network is the same here as for conventional processors:
Copies of data in the main memory & in various caches may become
inconsistent.
Approach:
1.
Do not cache shared data at all or allow only a single cache copy.
2.
If the volume of shared data is small and access to it infrequent, these policies
work quite well.
Do not cache writeable shared data or allow only a single cache copy.
each cache monitors all data transfers on the bus to see if the
validity of the data it is holding will be affected
directory-based schemes
Advantages:
Disadvantages:
Interconnection networks
Clusters:
Of tens order
Hybrid: