
SYSTEM DESIGN

Mr. A. B. Shinde
Assistant Professor,
Electronics Engineering,
PVPIT, Budhgaon.

shindesir.pvp@gmail.com

CONCEPT OF SYSTEM

A system is a collection of elements or components that are organized for a common purpose.

A system is a set of interacting or interdependent components forming an integrated design.

A system has structure: it contains parts (or components) that are directly or indirectly related to each other.

A system has behavior: it exhibits processes that fulfill its function or purpose.

A system has interconnectivity: the parts and processes are connected by structural and/or behavioral relationships.

SYSTEM

Elements of a system

Input: The inputs are fed to the system in order to get the output.

Output: The elements that exist in the system due to the processing of the inputs are known as outputs.

Processor: It is the operational component of a system, which processes the inputs.

Control: The control element guides the system. It is the decision-making sub-system that controls activities like governing inputs, processing them, and generating output.

Boundary and interface: The limits that identify a system's components, processes, and interrelationships when it interfaces with another system.

IMPORTANCE OF SYSTEM ARCHITECTURES

A system architecture is the conceptual model that defines the structure, behavior (functioning), and other views of a system.

A system architecture can comprise:
system components,
the externally visible properties of those components,
the relationships between them.

It can provide a plan from which products can be procured, and systems developed, that will work together to implement the overall system.

SYSTEM ON CHIP

System-on-a-chip (SoC or SOC) refers to integrating all components of a computer or other electronic system into a single integrated circuit (chip).

It may contain digital, analog, or mixed-signal functions, all on one semiconductor chip.

SIMD
SINGLE INSTRUCTION MULTIPLE DATA

SIMD

Single Instruction Multiple Data (SIMD) is a class of parallel computers in Flynn's taxonomy.

In computing, SIMD is a technique employed to achieve data-level parallelism.

SIMD

SIMD machines are capable of applying the exact same instruction stream to multiple streams of data simultaneously.

This type of architecture is perfectly suited to achieving very high processing rates, as the data can be split into many different independent pieces, and the multiple instruction units can all operate on them at the same time.

For example: each of the 64,000 processors in a Thinking Machines CM-2 would execute the same instruction at the same time, so that you could do 64,000 multiplies on 64,000 pairs of numbers at a time.
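A minimal C sketch of the same principle on commodity hardware, using x86 SSE intrinsics (an assumption of this example, not the CM-2's instruction set): one multiply instruction operates on four pairs of floats at once rather than 64,000, but the idea is identical.

    #include <immintrin.h>  /* x86 SSE intrinsics */

    /* Multiply n pairs of floats, four at a time with one SIMD instruction.
       Assumes n is a multiple of 4 for brevity. */
    void simd_multiply(const float *a, const float *b, float *out, int n)
    {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);   /* load 4 floats from a */
            __m128 vb = _mm_loadu_ps(&b[i]);   /* load 4 floats from b */
            __m128 vr = _mm_mul_ps(va, vb);    /* one instruction, 4 multiplies */
            _mm_storeu_ps(&out[i], vr);        /* store 4 results */
        }
    }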

SIMD

[Figures: SIMD-processable vs. SIMD-unprocessable patterns; brightness computation by SIMD operations]

SIMD TYPES

Synchronous (lock-step):
These systems are synchronous, meaning that they are built in
such a way as to guarantee that all instruction units will receive
the same instruction at the same time, and thus all will potentially
be able to execute the same operation simultaneously.
Deterministic SIMD architectures:
These are deterministic because, at any one point in time, there is
only one instruction being executed, even though multiple units
may be executing it. So, every time the same program is run on the
same data, using the same number of execution units, exactly the
same result is guaranteed at every step in the process.
Well-suited to instruction/operation level parallelism:
The "single" in single-instruction doesn't mean that there's only one instruction unit, as it does in SISD, but rather that there's only one instruction stream, and this instruction stream is executed by multiple processing units on different pieces of data, all at the same time, thus achieving parallelism.

SIMD (ADVANTAGES)

Consider an application where the same value is being added to (or subtracted from) a large number of data points, a common operation in many multimedia applications.

The data is understood to be in blocks, and a number of values can be loaded all at once.

One example would be changing the brightness of an image. To change the brightness, the R, G, and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory.

Instead of a series of instructions saying "get this pixel, now get the next pixel", a SIMD processor will have a single instruction that effectively says "get lots of pixels". This can take much less time than "getting" each pixel individually, as with a traditional CPU design.

If the SIMD system works by loading up eight data points at once, the add operation being applied to the data will happen to all eight values at the same time.
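A hedged sketch of the brightness example in C with SSE2 intrinsics (assumed available on the target machine): sixteen 8-bit channel values are loaded, adjusted, and stored per instruction, and the saturating add clamps results at 255 instead of wrapping around.

    #include <immintrin.h>
    #include <stdint.h>

    /* Brighten an 8-bit RGB image: 16 channel bytes per SIMD instruction.
       Assumes len (total bytes) is a multiple of 16 for brevity. */
    void brighten(uint8_t *pixels, int len, uint8_t amount)
    {
        __m128i delta = _mm_set1_epi8((char)amount);  /* 16 copies of amount */
        for (int i = 0; i < len; i += 16) {
            __m128i p = _mm_loadu_si128((__m128i *)&pixels[i]); /* "get lots of pixels" */
            p = _mm_adds_epu8(p, delta);   /* saturating add: clamps at 255 */
            _mm_storeu_si128((__m128i *)&pixels[i], p);
        }
    }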

SIMD (DISADVANTAGES)

Not all algorithms can be vectorized.

Implementing an algorithm with SIMD instructions usually requires human labor; most compilers don't generate SIMD instructions from a typical C program, for instance.

Programming with particular SIMD instruction sets can involve numerous low-level challenges.

It has restrictions on data alignment.

Gathering data into SIMD registers and scattering it to the correct destination locations is tricky and can be inefficient.

Specific instructions like rotations or three-operand addition aren't in some SIMD instruction sets.

SISD
SINGLE INSTRUCTION SINGLE DATA


SISD

This is the oldest style of computer architecture, and still one of the most important: all personal computers fit within this category.

Single instruction refers to the fact that there is only one instruction stream being acted on by the CPU during any one clock tick.

Single data means, analogously, that one and only one data stream is being employed as input during any one clock tick.

SISD

In computing, SISD is a term referring to a computer architecture in which a single processor (uniprocessor) executes a single instruction stream, to operate on data stored in a single memory.

This corresponds to the Von Neumann architecture.

Instruction fetching and pipelined execution of instructions are common examples found in most modern SISD computers.

CHARACTERISTICS OF SISD

Serial: Instructions are executed one after the other, in lock-step; this type of sequential execution is commonly called serial, as opposed to parallel, in which multiple instructions may be processed simultaneously.

Deterministic: Because each instruction has a unique place in the execution stream, and thus a unique time during which it and it alone is being processed, the entire execution is said to be deterministic, meaning that you can (potentially) know exactly what is happening at all times and, ideally, can exactly recreate the process, step by step, at any later time.

Examples:
All personal computers,
All single-instruction-unit-CPU workstations,
Mini-computers, and
Mainframes.

MIMD
MULTIPLE INSTRUCTION MULTIPLE DATA


MIMD

In computing, MIMD is a technique employed to achieve parallelism.

Machines using MIMD have a number of processors that function asynchronously and independently.

At any time, different processors may be executing different instructions on different pieces of data.

MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and as communication switches.

MIMD

MIMD machines can be of either shared memory or distributed memory categories.

Shared memory machines may be of the bus-based, extended, or hierarchical type.

Distributed memory machines may have hypercube or mesh interconnection schemes.

MIMD: SHARED MEMORY MODEL

The processors are all connected to a "globally available" memory, via either software or hardware means. The operating system usually maintains its memory coherence.

Bus-based:
MIMD machines with shared memory have processors which share a common, central memory.
Here all processors are attached to a bus which connects them to memory.
This setup works only up to the point where there is too much contention on the bus.

Hierarchical:
MIMD machines with hierarchical shared memory use a hierarchy of buses to give processors access to each other's memory.
Processors on different boards may communicate through inter-nodal buses.
Buses support communication between boards.
With this type of architecture, the machine may support over a thousand processors.
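A minimal shared-memory sketch in C with POSIX threads (an assumed environment, not part of the original slides): several threads read and write one globally visible counter, with a mutex standing in for the hardware's coherence and bus-arbitration role.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    long counter = 0;                    /* the "globally available" memory */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);   /* serialize access, like bus arbitration */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);  /* 400000: all threads saw one memory */
        return 0;
    }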

MIMD: DISTRIBUTED MEMORY MODEL

In distributed memory MIMD machines, each processor has its own individual memory location. Each processor has no direct knowledge about other processors' memory.

For data to be shared, it must be passed from one processor to another as a message. Since there is no shared memory, contention is not as great a problem with these machines.

It is not economically feasible to connect a large number of processors directly to each other. A way to avoid this multitude of direct connections is to connect each processor to just a few others.

The amount of time required for processors to perform simple message routing can be substantial. Systems were designed to reduce this time loss, and hypercube and mesh are two of the popular interconnection schemes.
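Message passing of this kind is commonly written with MPI. Below is a hedged two-process sketch in C (it assumes an MPI implementation is installed and the program is launched with two processes, e.g. via mpirun -np 2): data moves between the separate memories only as an explicit send/receive pair.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which processor am I? */

        double value = 3.14;
        if (rank == 0) {
            /* Processor 0's memory is invisible to processor 1,
               so the data must travel as a message. */
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %f\n", value);
        }

        MPI_Finalize();
        return 0;
    }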

MIMD: DISTRIBUTED MEMORY MODEL

Interconnection schemes:

Hypercube interconnection network:
In an MIMD distributed memory machine with a hypercube system interconnection network containing four processors, a processor and a memory module are placed at each vertex of a square.
The diameter of the system is the minimum number of steps it takes for one processor to send a message to the processor that is the farthest away.
So, for example, in a hypercube system with eight processors, with each processor and memory module placed at a vertex of a cube, the diameter is 3. In general, for a system that contains 2^N processors, with each processor directly connected to N other processors, the diameter of the system is N.
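In a hypercube, node IDs can be chosen so that directly connected processors differ in exactly one bit; the number of hops between two nodes is then the number of differing bits. A small illustrative C sketch (the node numbering is an assumption of this example):

    /* Hops between hypercube nodes = number of bits in which their IDs differ.
       For 2^N nodes the worst case (the diameter) is N. */
    int hypercube_hops(unsigned a, unsigned b)
    {
        unsigned diff = a ^ b;   /* bits where the two IDs disagree */
        int hops = 0;
        while (diff) {
            hops += diff & 1u;   /* each differing bit costs one link */
            diff >>= 1;
        }
        return hops;
    }

    /* Example: in an 8-node cube (N = 3), node 0 (binary 000) to node 7
       (binary 111) takes hypercube_hops(0, 7) = 3 hops, matching the diameter. */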

Mesh interconnection network:
In an MIMD distributed memory machine with a mesh interconnection network, processors are placed in a two-dimensional grid.
Each processor is connected to its four immediate neighbors. Wrap-around connections may be provided at the edges of the mesh.
One advantage of the mesh interconnection network over the hypercube is that the mesh system need not be configured in powers of two.

MIMD: CATEGORIES

The most general of all of the major categories, an MIMD machine is capable of being programmed to operate as if it were in fact any of the four.

Synchronous or asynchronous: MIMD instruction streams can potentially be executed either synchronously or asynchronously, i.e., either in tightly controlled lock-step or in a more loosely bound "do your own thing" mode.

Deterministic or non-deterministic: MIMD systems are potentially capable of deterministic behavior, that is, of reproducing the exact same set of processing steps every time a program is run on the same data.

Well-suited to block, loop, or subroutine level parallelism: The more code each processor in an MIMD assembly is given domain over, the more efficiently the entire system will operate, in general.

Multiple Instruction or Single Program: MIMD-style systems are capable of running in true multiple-instruction mode, with every processor doing something different, or every processor can be given the same code; this latter case is called SPMD, Single Program Multiple Data, and is a generalization of SIMD-style parallelism.

MISD
MULTIPLE INSTRUCTION SINGLE DATA


MISD

In computing, MISD is a type of parallel computing architecture where many functional units perform different operations on the same data.

Pipeline architectures belong to this type.

Fault-tolerant computers executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type.

Not many instances of this architecture exist, as MIMD and SIMD are often more appropriate for common data parallel techniques.

MISD

Another example of an MISD process is one that is carried out routinely at the United Nations.

When a delegate speaks in a language of his/her choice, the speech is simultaneously translated into a number of other languages for the benefit of other delegates present.

Thus the delegate's speech (a single data stream) is being processed by a number of translators (processors), yielding different results.

MISD

This category was included more for the sake of completeness than to identify a working group of actual computer systems.

MISD Examples:
Multiple frequency filters operating on a single signal stream.
Multiple cryptography algorithms attempting to crack a single coded message.

Both of these are examples of this type of processing, where multiple, independent instruction streams are applied simultaneously to a single data stream.
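A hedged software analogy in C (the operations are invented for illustration): several independent "instruction streams", here different functions, are applied to one shared input stream, which is the MISD pattern in spirit.

    #include <stddef.h>

    /* MISD in spirit: one data stream, several independent operations on it. */
    typedef double (*stream_op)(const double *signal, size_t n);

    static double op_sum(const double *s, size_t n) {
        double acc = 0.0;
        for (size_t i = 0; i < n; i++) acc += s[i];
        return acc;
    }
    static double op_max(const double *s, size_t n) {
        double m = s[0];
        for (size_t i = 1; i < n; i++) if (s[i] > m) m = s[i];
        return m;
    }
    static double op_energy(const double *s, size_t n) {  /* sum of squares */
        double acc = 0.0;
        for (size_t i = 0; i < n; i++) acc += s[i] * s[i];
        return acc;
    }

    void run_misd(const double *signal, size_t n, double results[3])
    {
        stream_op units[3] = { op_sum, op_max, op_energy };
        /* On real MISD hardware the three units would operate concurrently;
           this loop stands in for the parallel functional units. */
        for (int u = 0; u < 3; u++)
            results[u] = units[u](signal, n);
    }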

PIPELINING

PIPELINING

In computing, a pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next one.

The elements of a pipeline are often executed in parallel or in time-sliced fashion; in that case, some amount of buffer storage is often inserted between elements.

PIPELINING (CONCEPT AND MOTIVATION)

Consider the washing of a car:
A car on the washing line can have only one of the three steps (washing, drying, polishing) done at once. After the car has its washing, it moves on to drying, leaving the washing facilities available for the next car. The first car then moves on to polishing, the second car to drying, and a third car begins its washing.

If each operation needs 30 minutes, then finishing all three cars when only one car can be operated on at once would take 3 × 90 = 270 minutes.

On the other hand, using the washing line, the total time to complete all three is 90 + 30 + 30 = 150 minutes: the first car is done after 90 minutes, and from that point an additional car comes off the line every 30 minutes.
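The arithmetic generalizes: with k stages of t minutes each and n cars, serial processing takes n × k × t while the pipeline takes (k + n − 1) × t. A small C sketch of this formula (the function names are just for illustration):

    #include <stdio.h>

    /* Serial: each of n jobs passes through all k stages alone. */
    int serial_time(int n, int k, int t)    { return n * k * t; }

    /* Pipelined: the first job takes k*t, then one job completes every t. */
    int pipelined_time(int n, int k, int t) { return (k + n - 1) * t; }

    int main(void)
    {
        /* Three cars, three 30-minute stages, as in the example above. */
        printf("serial:    %d minutes\n", serial_time(3, 3, 30));    /* 270 */
        printf("pipelined: %d minutes\n", pipelined_time(3, 3, 30)); /* 150 */
        return 0;
    }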

PIPELINING (IMPLEMENTATIONS)

Buffered, Synchronous pipelines:
Conventional microprocessors are synchronous circuits that use buffered, synchronous pipelines. In these pipelines, "pipeline registers" are inserted in-between pipeline stages, and are clocked synchronously.

Buffered, Asynchronous pipelines:
Asynchronous pipelines are used in asynchronous circuits, and have their pipeline registers clocked asynchronously. Generally speaking, they use a request/acknowledge system, wherein each stage can detect when it is "finished".
When a stage is finished and the next stage has sent it a "request" signal, the stage sends an "acknowledge" signal to the next stage, and a "request" signal to the previous stage. When a stage receives an "acknowledge" signal, it clocks its input registers, thus reading in the data from the previous stage.

Unbuffered pipelines:
Unbuffered pipelines, called "wave pipelines", do not have registers in-between pipeline stages.
Instead, the delays in the pipeline are "balanced" so that, for each stage, the difference between the first stabilized output data and the last is minimized.

INSTRUCTION PIPELINE

An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput (the number of instructions that can be executed in a unit of time).

The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step. This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once.

INSTRUCTION PIPELINE

For example, the classic RISC pipeline is broken into five stages, with a set of flip-flops between each stage:

Instruction fetch
Instruction decode and register fetch
Execute
Memory access
Register write back
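A hedged sketch in C that prints the classic five-stage overlap as a pipeline diagram (the stage abbreviations IF/ID/EX/MEM/WB are conventional; the program itself is illustrative, not from the slides):

    #include <stdio.h>

    #define STAGES 5
    #define INSTRS 4

    int main(void)
    {
        const char *stage[STAGES] = { "IF ", "ID ", "EX ", "MEM", "WB " };
        /* Instruction i occupies stage s during cycle i + s, so the
           total is INSTRS + STAGES - 1 cycles, not INSTRS * STAGES. */
        for (int i = 0; i < INSTRS; i++) {
            printf("instr %d: ", i);
            for (int c = 0; c < INSTRS + STAGES - 1; c++) {
                int s = c - i;  /* which stage instruction i is in at cycle c */
                printf("%s ", (s >= 0 && s < STAGES) ? stage[s] : "   ");
            }
            printf("\n");
        }
        return 0;
    }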

PIPELINING (ADVANTAGES AND DISADVANTAGES)

Pipelining does not help in all cases. An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.

Advantages of Pipelining:
The cycle time of the processor is reduced, thus increasing the instruction issue rate in most cases.
Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry.

Disadvantages of Pipelining:
A non-pipelined processor executes only a single instruction at a time. This prevents branch delays and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.
The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is due to the fact that extra flip-flops must be added to the data path of a pipelined processor.
A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.

PARALLEL COMPUTING

PARALLEL COMPUTING

Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").

There are several different forms of parallel computing:
bit-level,
instruction level,
data, and
task parallelism.

Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multicore processors.

PARALLEL COMPUTING

Traditionally, computer software has been written for serial computation. To solve a problem, an algorithm is constructed and implemented as a serial stream of instructions. These instructions are executed on a central processing unit on one computer. Only one instruction may execute at a time; after that instruction is finished, the next is executed.

Parallel computing, on the other hand, uses multiple processing elements simultaneously to solve a problem.

This is accomplished by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others.

The processing elements can be diverse and include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above.

TYPES OF PARALLELISM

Bit-level parallelism:
From the advent of VLSI in the 1970s until about 1986, speed-up in computer architecture was driven by doubling the computer word size: the amount of information the processor can manipulate per cycle. Increasing the word size reduces the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word.

Instruction-level parallelism:
A computer program is, in essence, a stream of instructions executed by a processor. These instructions can be re-ordered and combined into groups which are then executed in parallel without changing the result of the program. This is known as instruction-level parallelism.

Data parallelism:
Data parallelism is parallelism inherent in program loops, which focuses on distributing the data across different computing nodes to be processed in parallel.

Task parallelism:
Task parallelism is the characteristic of a parallel program that "entirely different calculations can be performed on either the same or different sets of data". This contrasts with data parallelism, where the same calculation is performed on the same or different sets of data.

TYPES OF PARALLELISM

Bit-level parallelism is a form of parallel computing based on increasing processor word size.

Increasing the word size reduces the number of instructions the processor must execute in order to perform an operation on variables whose sizes are greater than the length of the word.

For example:
Consider a case where an 8-bit processor must add two 16-bit integers. The processor must first add the 8 lower-order bits from each integer, then add the 8 higher-order bits, requiring two instructions to complete a single operation. A 16-bit processor would be able to complete the operation with a single instruction, as sketched below.
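A hedged C model of the two-instruction case (software standing in for the hardware behavior, not actual machine code): the 16-bit add is split into a low-byte add followed by a high-byte add with carry.

    #include <stdint.h>

    /* Model an 8-bit processor adding two 16-bit integers:
       one add for the low bytes, one add-with-carry for the high bytes. */
    uint16_t add16_on_8bit(uint16_t a, uint16_t b)
    {
        uint8_t a_lo = a & 0xFF, a_hi = a >> 8;
        uint8_t b_lo = b & 0xFF, b_hi = b >> 8;

        uint8_t lo = a_lo + b_lo;          /* instruction 1: add low bytes */
        uint8_t carry = lo < a_lo;         /* carry out of the low-byte add */
        uint8_t hi = a_hi + b_hi + carry;  /* instruction 2: add with carry */

        return ((uint16_t)hi << 8) | lo;   /* a 16-bit CPU does this in one add */
    }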

Historically, 4-bit microprocessors were replaced with 8-bit, then 16-bit, then 32-bit microprocessors. This trend generally came to an end with the introduction of 32-bit processors, which were a standard in general-purpose computing for two decades. Only recently, with the advent of x86-64 architectures, have 64-bit processors become commonplace.

TYPES OF PARALLELISM

Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously.

For example, consider the following program:
1. e = a + b
2. f = c + d
3. g = e * f

Here, operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of them are completed. However, operations 1 and 2 do not depend on any other operation, so they can be calculated simultaneously.

If we assume that each operation can be completed in one unit of time, then these three instructions can be completed in a total of two units of time, giving an ILP of 3/2.

TYPES OF PARALLELISM

Instruction-level parallelism (ILP):

A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible.

Ordinary programs are typically written under a sequential execution model where instructions execute one after the other and in the order specified by the programmer.

ILP allows the compiler and the processor to overlap the execution of multiple instructions or even to change the order in which instructions are executed.

How much ILP exists in programs is very application specific. In certain fields, such as graphics and scientific computing, the amount can be very large. However, workloads such as cryptography exhibit much less parallelism.

TYPES OF PARALLELISM

Data parallelism (also known as loop-level parallelism) is a form of parallelization of computing across multiple processors in parallel computing environments.

Data parallelism focuses on distributing the data across different parallel computing nodes.

In a multiprocessor system executing a single set of instructions (SIMD), data parallelism is achieved when each processor performs the same task on different pieces of distributed data. In some situations, a single execution thread controls operations on all pieces of data.

TYPES OF PARALLELISM

Data parallelism
For instance, consider a 2-processor system (CPUs A and B) in a parallel environment, and we wish to do a task on some data d. It is possible to tell CPU A to do that task on one part of d and CPU B on another part simultaneously, thereby reducing the duration of the execution.
The data can be assigned using conditional statements.
As a specific example, consider adding two matrices. In a data parallel implementation, CPU A could add all elements from the top half of the matrices, while CPU B could add all elements from the bottom half of the matrices. Since the two processors work in parallel, the job of performing matrix addition would take one half the time of performing the same operation in serial using one CPU alone.
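A hedged C sketch of the two-CPU matrix addition using POSIX threads (thread-to-CPU placement is left to the OS here, an assumption made for simplicity): each thread runs the same code on its own half of the rows.

    #include <pthread.h>

    #define N 8          /* N x N matrices, N assumed even */

    double A[N][N], B[N][N], C[N][N];

    struct range { int row_start, row_end; };

    /* Same task, different data: add the assigned band of rows. */
    void *add_rows(void *arg)
    {
        struct range *r = arg;
        for (int i = r->row_start; i < r->row_end; i++)
            for (int j = 0; j < N; j++)
                C[i][j] = A[i][j] + B[i][j];
        return NULL;
    }

    int main(void)
    {
        struct range top = { 0, N / 2 }, bottom = { N / 2, N };
        pthread_t a, b;
        pthread_create(&a, NULL, add_rows, &top);     /* "CPU A": top half */
        pthread_create(&b, NULL, add_rows, &bottom);  /* "CPU B": bottom half */
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }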

TYPES OF PARALLELISM

Task parallelism (also known as function parallelism and control parallelism) is a form of parallelization of computer code across multiple processors in parallel computing environments.

Task parallelism focuses on distributing execution processes (threads) across different parallel computing nodes.

In a multiprocessor system, task parallelism is achieved when each processor executes a different thread (or process) on the same or different data.

The threads may execute the same or different code. In the general case, different execution threads communicate with one another as they work. Communication usually takes place to pass data from one thread to the next as part of a workflow.

TYPES OF PARALLELISM

Task parallelism
As a simple example, if we are running code on a 2-processor system (CPUs "a" and "b") in a parallel environment and we wish to do tasks "A" and "B", it is possible to tell CPU "a" to do task "A" and CPU "b" to do task "B" simultaneously, thereby reducing the runtime of the execution.
The tasks can be assigned using conditional statements.
Task parallelism emphasizes the distributed (parallelized) nature of the processing (i.e. threads), as opposed to the data (data parallelism).
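A hedged POSIX threads sketch in C (the task bodies are placeholders invented for illustration): two different functions, not the same one, run concurrently, which is what distinguishes this from the data parallel example above.

    #include <pthread.h>
    #include <stdio.h>

    /* Two different tasks: different code, possibly different data. */
    void *task_A(void *arg) { (void)arg; printf("task A running\n"); return NULL; }
    void *task_B(void *arg) { (void)arg; printf("task B running\n"); return NULL; }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, task_A, NULL);  /* "CPU a" takes task A */
        pthread_create(&b, NULL, task_B, NULL);  /* "CPU b" takes task B */
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }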
