Lecture-5&6
Example 1
-5 ÷ 2
Represent both divisor and dividend in 2's complement notation:
M = 2 = 0010 (divisor)
Q = -5 = 1011 (dividend)
AQ = 1111 1011 (A holds the sign extension of the dividend)
Example 2
+7 ÷ (-4)
M = -4 = 1100 (divisor)
Q = +7 = 0111 (dividend)
AQ = 0000 0111
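To make the mechanics behind these register setups concrete, here is a small C sketch of a Stallings-style 2's complement division loop. The 4-bit width, the helper names (divide, is_neg), and the exact success test are illustrative assumptions, not taken from the slides.

#include <stdio.h>
#include <stdint.h>

#define NBITS 4
#define MASK  0x0F                          /* keep registers to 4 bits */

static int is_neg(uint8_t r) { return (r >> (NBITS - 1)) & 1; }

/* Sketch of a Stallings-style 2's complement division; illustrative only.
   Divides 'dividend' by 'divisor', both in [-8, 7]. */
static void divide(int dividend, int divisor, int *quot, int *rem)
{
    uint8_t M = (uint8_t)divisor & MASK;    /* divisor register     */
    uint8_t Q = (uint8_t)dividend & MASK;   /* low half of AQ       */
    uint8_t A = is_neg(Q) ? MASK : 0;       /* sign-extend dividend */

    for (int i = 0; i < NBITS; i++) {
        /* shift AQ left one bit */
        A = ((A << 1) | (Q >> (NBITS - 1))) & MASK;
        Q = (Q << 1) & MASK;

        uint8_t saved = A;
        if (is_neg(A) == is_neg(M))
            A = (A - M) & MASK;             /* same signs: subtract */
        else
            A = (A + M) & MASK;             /* different signs: add */

        /* the step succeeds if the sign of A is unchanged */
        if (is_neg(A) == is_neg(saved) || (A == 0 && Q == 0))
            Q |= 1;                         /* set Q0 = 1           */
        else
            A = saved;                      /* restore, Q0 stays 0  */
    }

    /* sign-extend the 4-bit results back to int */
    *rem  = is_neg(A) ? (int)A - 16 : (int)A;
    *quot = is_neg(Q) ? (int)Q - 16 : (int)Q;
    if (is_neg((uint8_t)dividend & MASK) != is_neg(M))
        *quot = -*quot;                     /* signs differ: negate */
}

int main(void)
{
    int q, r;
    divide(-5, 2, &q, &r);                  /* Example 1 */
    printf("-5 / 2: q=%d r=%d\n", q, r);
    divide(7, -4, &q, &r);                  /* Example 2 */
    printf("+7 / -4: q=%d r=%d\n", q, r);
    return 0;
}

With the examples above this prints q=-2 r=-1 and q=-1 r=3: the remainder takes the sign of the dividend, which is the usual convention for this algorithm.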
Parallel Processing
Parallel processing is the use of a collection of integrated and tightly coupled processing elements, or processors, that cooperate and communicate on a single task in order to speed up its solution.
Motivation
Higher Speed, or Solving Problems Faster
This is important when applications have hard or soft deadlines (real-time systems). For example, we have at most a few hours to compute a 24-hour weather forecast or to produce a timely tornado warning.
Higher Throughput, or Solving More Instances of Given Problems
E.g. transaction processing for banks and airlines.
Higher Computational Power, or Solving Larger Problems
Generate more detailed, more accurate, and longer simulations, e.g. a 5-day weather forecast.
Limitations to Uniprocessor Improvement
Speed of Light
For each instruction executed, the processor must fetch the instruction and move it into the IR inside the processor. The time to complete this operation is bounded by the speed of propagation of electromagnetic signals: 3 × 10^8 meters per second. (Actually, that is the speed of light in a vacuum; the speed of signals through silicon is less.) If the distance between the processor and the memory unit is 30 cm, it takes about one billionth of a second for an instruction to travel from memory to the processor.
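To make the bound concrete: t = d / c = 0.3 m / (3 × 10^8 m/s) = 10^-9 s = 1 ns, so at a 1 GHz clock a single memory-to-processor trip already costs a full cycle.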
Limitations to Uniprocessor Improvement
Limits on Miniaturization
Though miniaturization leads to faster processing by increasing the switching speed of the transistors, it cannot continue indefinitely: a transistor still needs some space on the chip and cannot shrink to nothing.
Von Neumann Bottleneck
The speed disparity between processor and memory is growing with the passage of time, causing severe performance bottlenecks.
A parallel system (e.g. a cluster) overcomes this shortcoming by providing ever more aggregate memory and cache capacity, as well as the higher memory bandwidth required by HPC applications.
Some of the fastest growing applications of parallel computing exploit not raw computational speed but the ability to pump data to memory and disk faster.
Instruction Pipelining
In the classic five-stage pipeline, an instruction passes through fetch (IF), decode (ID), execute (EX), memory (M), and write-back (WB) stages.
The EX (execution) stage is marked by ALU usage, be it for adding two register operands, calculating an operand address, testing the condition for a branch instruction, etc.
In the M (memory) stage, an instruction reads or writes a data element from/to memory.
The result produced by an instruction is written back to a register in the WB (write-back) stage.
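To visualize the overlap of stages, here is a small illustrative C program (not from the slides) that prints which stage each instruction occupies in each cycle; the four-instruction count and the chart layout are arbitrary choices.

#include <stdio.h>

/* Illustrative sketch: cycle-by-cycle stage occupancy of a classic
   five-stage pipeline. Instruction i (0-based) occupies stage s in
   cycle i + s + 1, so n instructions finish in n + 5 - 1 cycles. */
int main(void)
{
    const char *stages[] = { "IF", "ID", "EX", "M", "WB" };
    const int nstages = 5, ninstr = 4;

    printf("cycle:");
    for (int c = 1; c <= ninstr + nstages - 1; c++)
        printf(" %3d", c);
    printf("\n");

    for (int i = 0; i < ninstr; i++) {
        printf("   I%d:", i + 1);
        for (int c = 1; c <= ninstr + nstages - 1; c++) {
            int s = c - 1 - i;              /* stage occupied this cycle */
            if (s >= 0 && s < nstages)
                printf(" %3s", stages[s]);
            else
                printf("    ");
        }
        printf("\n");
    }
    return 0;
}

The chart shows n instructions completing in n + k - 1 cycles for k = 5 stages, which is the basis of the speedup figures below.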
Non-Pipelined Execution
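As a sketch of the timing this heading refers to, assume a k-stage pipeline with stage time τ: executing n instructions one at a time, without overlap, takes T1 = n × k × τ.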
Speedup
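With pipelining, the same n instructions take Tk = (k + n - 1) × τ, giving a speedup S = T1 / Tk = (n × k) / (k + n - 1), which approaches k as n grows. For example, with k = 5 and n = 100, S = 500 / 104 ≈ 4.8, close to the ideal factor of 5.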
Instruction Throughput
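Instruction throughput is the number of instructions completed per unit time; once the pipeline is full, a k-stage pipeline completes close to one instruction per cycle, i.e. the throughput approaches 1/τ.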