Lecture-5&6
Example 1
-5 ÷ 2
Represent both divisor and dividend in 2's complement notation:
M = 2 = 0010 (divisor)
Q = -5 = 1011 (dividend)
AQ = 1111 1011 (A holds the sign extension of the dividend)
Example 2
+7 ÷ (-4)
M = -4 = 1100 (divisor)
Q = +7 = 0111 (dividend)
AQ = 0000 0111
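To make the mechanics behind these register setups concrete, here is a small C sketch of a Stallings-style 2's complement division loop. The 4-bit width, the helper names (divide, is_neg), and the exact success test are illustrative assumptions, not taken from the slides.

#include <stdio.h>
#include <stdint.h>

#define NBITS 4
#define MASK  0x0F                          /* keep registers to 4 bits */

static int is_neg(uint8_t r) { return (r >> (NBITS - 1)) & 1; }

/* Sketch of a Stallings-style 2's complement division; illustrative only.
   Divides 'dividend' by 'divisor', both in [-8, 7]. */
static void divide(int dividend, int divisor, int *quot, int *rem)
{
    uint8_t M = (uint8_t)divisor & MASK;    /* divisor register     */
    uint8_t Q = (uint8_t)dividend & MASK;   /* low half of AQ       */
    uint8_t A = is_neg(Q) ? MASK : 0;       /* sign-extend dividend */

    for (int i = 0; i < NBITS; i++) {
        /* shift AQ left one bit */
        A = ((A << 1) | (Q >> (NBITS - 1))) & MASK;
        Q = (Q << 1) & MASK;

        uint8_t saved = A;
        if (is_neg(A) == is_neg(M))
            A = (A - M) & MASK;             /* same signs: subtract */
        else
            A = (A + M) & MASK;             /* different signs: add */

        /* the step succeeds if the sign of A is unchanged */
        if (is_neg(A) == is_neg(saved) || (A == 0 && Q == 0))
            Q |= 1;                         /* set Q0 = 1           */
        else
            A = saved;                      /* restore, Q0 stays 0  */
    }

    /* sign-extend the 4-bit results back to int */
    *rem  = is_neg(A) ? (int)A - 16 : (int)A;
    *quot = is_neg(Q) ? (int)Q - 16 : (int)Q;
    if (is_neg((uint8_t)dividend & MASK) != is_neg(M))
        *quot = -*quot;                     /* signs differ: negate */
}

int main(void)
{
    int q, r;
    divide(-5, 2, &q, &r);                  /* Example 1 */
    printf("-5 / 2: q=%d r=%d\n", q, r);
    divide(7, -4, &q, &r);                  /* Example 2 */
    printf("+7 / -4: q=%d r=%d\n", q, r);
    return 0;
}

With the examples above this prints q=-2 r=-1 and q=-1 r=3: the remainder takes the sign of the dividend, which is the usual convention for this algorithm.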
Parallel Processing
Parallel processing is the use of a collection of integrated and tightly coupled processing elements, or processors, that cooperate and communicate on a single task in order to speed up its solution.
Motivation
Higher Speed, or Solving Problems Faster
This is important when applications have hard or soft deadlines (real-time systems). For example, we have at most a few hours to compute a 24-hour weather forecast or to produce a timely tornado warning.
Higher Throughput, or Solving More Instances of Given Problems
E.g. transaction processing for banks and airlines.
Higher Computational Power, or Solving Larger Problems
Generate more detailed, more accurate, and longer simulations, e.g. a 5-day weather forecast.
Limitations to Uniprocessor Improvement
Speed of Light
For each instruction executed, the processor must fetch the instruction and move it into the IR inside the processor. The time to complete this operation is bounded by the speed of propagation of electromagnetic signals: 3 × 10^8 meters per second. (Actually, that is the speed of light in a vacuum; the speed of signals through silicon is less.) If the distance between the processor and the memory unit is 30 cm, it takes about one billionth of a second for an instruction to travel from memory to the processor.
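To make the bound concrete: t = d / c = 0.3 m / (3 × 10^8 m/s) = 10^-9 s = 1 ns, so at a 1 GHz clock a single memory-to-processor trip already costs a full cycle.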
Limitations to Uniprocessor Improvement
Limits on Miniaturization
Though miniaturization leads to faster processing by increasing the switching speed of the transistors, it cannot continue indefinitely: a transistor still needs some space on the chip and cannot shrink to nothing.
Von Neumann Bottleneck
The speed disparity between processor and memory is growing with the passage of time, causing severe performance bottlenecks.
A parallel system (e.g. a cluster) overcomes this shortcoming by providing ever more aggregate memory and cache capacity, as well as the higher memory bandwidth required by HPC applications.
Some of the fastest growing applications of parallel computing exploit not raw computational speed but the ability to pump data to memory and disk faster.
Instruction Pipelining
In the classic five-stage pipeline, an instruction passes through fetch (IF), decode (ID), execute (EX), memory (M), and write-back (WB) stages.
The EX (execution) stage is marked by ALU usage, be it for adding two register operands, calculating an operand address, testing the condition for a branch instruction, etc.
In the M (memory) stage, an instruction reads or writes a data element from/to memory.
The result produced by an instruction is written back to a register in the WB (write-back) stage.
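To visualize the overlap of stages, here is a small illustrative C program (not from the slides) that prints which stage each instruction occupies in each cycle; the four-instruction count and the chart layout are arbitrary choices.

#include <stdio.h>

/* Illustrative sketch: cycle-by-cycle stage occupancy of a classic
   five-stage pipeline. Instruction i (0-based) occupies stage s in
   cycle i + s + 1, so n instructions finish in n + 5 - 1 cycles. */
int main(void)
{
    const char *stages[] = { "IF", "ID", "EX", "M", "WB" };
    const int nstages = 5, ninstr = 4;

    printf("cycle:");
    for (int c = 1; c <= ninstr + nstages - 1; c++)
        printf(" %3d", c);
    printf("\n");

    for (int i = 0; i < ninstr; i++) {
        printf("   I%d:", i + 1);
        for (int c = 1; c <= ninstr + nstages - 1; c++) {
            int s = c - 1 - i;              /* stage occupied this cycle */
            if (s >= 0 && s < nstages)
                printf(" %3s", stages[s]);
            else
                printf("    ");
        }
        printf("\n");
    }
    return 0;
}

The chart shows n instructions completing in n + k - 1 cycles for k = 5 stages, which is the basis of the speedup figures below.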
Non-Pipelined Execution
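As a sketch of the timing this heading refers to, assume a k-stage pipeline with stage time τ: executing n instructions one at a time, without overlap, takes T1 = n × k × τ.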
Speedup
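With pipelining, the same n instructions take Tk = (k + n - 1) × τ, giving a speedup S = T1 / Tk = (n × k) / (k + n - 1), which approaches k as n grows. For example, with k = 5 and n = 100, S = 500 / 104 ≈ 4.8, close to the ideal factor of 5.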
Instruction Throughput
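Instruction throughput is the number of instructions completed per unit time; once the pipeline is full, a k-stage pipeline completes close to one instruction per cycle, i.e. the throughput approaches 1/τ.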