
Computer Architecture

Knowledge Level:
Basic
Module Information
Module Description: The Module aims to provide an overview of Computer Architecture as well as Computer Organization and Design.
Level: Basic
Prerequisites: Basic knowledge of Computer fundamentals and Electronics; basic structure of Computer hardware and software.
Module Objective:
After completing this Module, you will be able to:
1. Describe the evolution of computers
2. Understand computer architecture
3. Explain the input and output operations performed by the CPU
4. Outline memory organization
Module Flow:
1. Computer Evolution
2. Fundamental Concepts
3. Central Processing Unit
4. The Memory
5. I/O Organization
6. Concept of Pipelining
1.0 Evolution of Computer
Introduction:
This Topic gives an overview of computer evolution and its history.

Objective:
After completing this Topic, you will be able to
understand,
1. Types of Computer
2. History of Computer
3. Typical Computer System

Types of Computer
Types of Computers (based on size, speed and cost)
Mainframe Computer
Desktop Computer
Portable (notebook) Computers
Workstations
Supercomputer
History of Computers
The First Generation (1945-55): Vacuum Tubes
The Second Generation (1955-65): Transistors
The Third Generation (1965-75): Integrated Circuits
The Fourth Generation (1975 - ): VLSIs
Beyond the Fourth Generation

MAIN FRAME COMPUTER
Mainframes come second to supercomputers. They have the following features:
They are larger in size compared to other computers.
They have larger capacity and are more powerful in terms of processing speed.
NOTE BOOK COMPUTER
An extremely lightweight personal computer.
Notebook computers typically weigh less than 6 pounds and are small enough to fit easily in a briefcase.
They offer reduced computing power compared to a full-sized laptop.
They are significantly less expensive than other laptops.
WORKSTATION
Workstations are high-end, expensive computers that are made for more complex procedures.
They are intended for one user at a time.
Examples: science, math and engineering calculations; they are also useful for computer design and manufacturing.
Workstations are sometimes improperly named for marketing reasons.
Real workstations are not usually sold in retail.
SUPER COMPUTERS
Supercomputers are the fastest and the most expensive computers.
Their huge processing power means they can be used for complex applications such as predicting protein-folding patterns.
They are normally used to solve very complex science and engineering problems.
Supercomputers get their processing power by taking advantage of parallel processing; they use many CPUs at the same time on one problem.
A typical supercomputer can do up to ten trillion individual calculations every second.
History of Computer
First Generation (1945 - 1955)
Programs and data were located in the same memory.
Assembly language was used to write programs.
Vacuum tube technology performed the basic processing.
Mercury delay line memory was used.
Typewriters served as I/O devices.
Vacuum tubes were used for calculation as well as storage and control.
These computers were very large in size and produced a large amount of heat.
MERCURY DELAY LINE MEMORY
Information is stored by inserting an information pattern into a path that contains delay.
The end of the delay path is connected back to the beginning through amplifying and timing circuits; the resulting closed loop allows the information pattern to recirculate.
[Figure: Organization of a first-generation computer. The Central Processing Unit (CPU), comprising program control and data processing, exchanges instructions and data with the Main Memory, which holds programs and data for execution. Input-output equipment (secondary memory, keyboard, printer, etc.) supplies programs, data and operator commands.]
History of Computer
Second Generation (1955 - 1965)
Transistor replaced vacuum tubes
Magnetic core memories and magnetic storage devices
were more widely used
High level language such as FORTRAN was developed
Compilers were developed
Separate I/O processors were developed along with CPU
IBM became major manufacturer
TRANSISTOR
It is a semiconductor device used to amplify
and switch electronic signals and electrical
power.
It is composed of semiconductor material
with at least three terminals for connection to
an external circuit.
A voltage or current applied to one pair of the
transistor's terminals changes the current
through another pair of terminals
Because the controlled (output) power can be
higher than the controlling (input) power, a
transistor can amplify a signal.
Third Generation ( 1965-1975)
Many transistors on a single chip (IC) enabled lower-cost, faster processors and memory elements
IC memories replaced magnetic core memories
Introduction of micro-programming , parallelism , pipelining
Effective time sharing in operating system
Development of Cache & Virtual memory

Fourth Generation (1975-1996)
Continued advancement in IC technology led to VLSI, that is, Very Large Scale Integration
The microprocessor concept emerged; Motorola and Texas Instruments were among the major companies
Parallelism, pipelining, cache and virtual memories evolved to produce the high-performance computing systems of today
IC
IC is a single component containing a number
of transistors.
Transistors were miniaturized and placed on
silicon chips, called semiconductors, which
drastically increased the speed and efficiency
of computers.
Magnetic-core memory

It uses tiny magnetic toroids (rings), the cores,
through which wires are threaded to write
and read information.
Each core represents one bit of information.
The cores can be magnetized in two different
ways (clockwise or counterclockwise) and the
bit stored in a core is zero or one depending
on that core's magnetization direction
The wires are arranged to allow an individual core to be set to either a "one" or a "zero".
A core can be read when "selected" by a "sense wire".
The core memory contents are retained even
when the memory system is powered down.
However, when the core is read, it is reset to a
"zero" which is known as destructive readout.
Circuits in the computer memory system then
restore the information in an immediate re-
write cycle.

VLSI
With the help of VLSI technology, the microprocessor came into existence.
Computers were designed using microprocessors, as thousands of integrated circuits were built onto a single silicon chip.
FIFTH GENERATION COMPUTERS
They are in the developmental stage and are based on artificial intelligence.
The goal is to develop devices that can respond to natural-language input.
Quantum computation, molecular computing and nanotechnology will be used.
Fifth generation computers will use super-large-scale integrated chips and will be able to use more than one CPU for faster processing. Example: robots.

A typical computer system
[Figure: A typical computer system. The microprocessor (CPU, cache and bus interface unit) connects over the system bus to the main memory and to a peripheral (I/O) interface control unit. An I/O (local) bus links the video, hard disk, keyboard and network control units to their devices (video monitor, secondary memory, keyboard, communication network), with I/O expansion slots for additional devices.]
Evolution of Computer : Summary
The types of Computers are determined based on
size, speed and cost.
The First Generation computer had Vacuum
Tubes
The Second Generation computers had
Transistors
The Third Generation computers had Integrated
Circuits
The Fourth Generation computers had VLSIs


2.0 Fundamental Concepts
Introduction:
This Topic gives an overview on computer architecture, instruction
set and addressing methods.

Objective:
After completing this Topic, you will be able to understand,
1. Architecture and Organization
2. Data Representation
3. Structure of Computer System
4. Instruction Set
5. Addressing Methods: Basic Concepts
6. Instruction Set and Addressing Methods
Architecture and Organization
Architecture:
Refers to those attributes that are visible to the programmer, i.e., those that affect the logical execution of a program.
Example:
Instruction set
The number of bits used
Techniques for memory management
For example, it is an architectural design issue whether the computer will have a multiply instruction.
Organization:
Refers to the operational units and their interconnections that realize the architectural specification.
Example:
Hardware details such as control signals
Hardware interfaces
Memory technology
Using the same example, it is an organizational issue as to how the multiply instruction should be implemented.
Data Representation
Classification of Data Types
Numbers used in arithmetic operation
Letters of alphabet used in data processing
Discrete symbols used for specific purpose
All types of data, except binary numbers, are represented in computer
registers in binary coded form

Data Representation
Number Systems
A number system for base or radix r is a system that uses distinct symbols
for r digits
Binary Number System
r = 2
2 symbols are 0,1 (called Bits)
The string of digits (binary digits, or bits) 101.1 is interpreted to represent the quantity
1 x 2^2 + 0 x 2^1 + 1 x 2^0 + 1 x 2^-1
1 Byte = 8 bits; 1 KB (Kilobyte) = 2^10 Bytes; 1 MB (Megabyte) = 2^20 Bytes


Data Representation
Octal Number System
r = 8
8 symbols are 0,1,2,3,4,5,6,7
The string of digits 125.6 is interpreted to represent the quantity
1 x 8^2 + 2 x 8^1 + 5 x 8^0 + 6 x 8^-1

Decimal Number System
r = 10
10 symbols are 0,1,2,3,4,5,6,7,8,9
The string of digits 125.6 is interpreted to represent the quantity
1 x 10^2 + 2 x 10^1 + 5 x 10^0 + 6 x 10^-1
Data Representation
Hexadecimal Number System
r = 16
16 symbols are 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
The string of digits D25.B is interpreted to represent the quantity
D x 16^2 + 2 x 16^1 + 5 x 16^0 + B x 16^-1
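The positional expansions above can be checked with a short script. This is an illustrative sketch (the helper name `to_decimal` is ours, not part of the module):

```python
# Evaluate a digit string in base r, including a fractional part,
# using the positional expansion d_i x r^i shown above.
DIGITS = "0123456789ABCDEF"

def to_decimal(s, r):
    whole, _, frac = s.partition(".")
    value = 0.0
    for i, d in enumerate(reversed(whole)):
        value += DIGITS.index(d) * r ** i          # positive powers of r
    for i, d in enumerate(frac, start=1):
        value += DIGITS.index(d) * r ** -i         # negative powers of r
    return value

print(to_decimal("101.1", 2))    # 1x2^2 + 0x2^1 + 1x2^0 + 1x2^-1 = 5.5
print(to_decimal("125.6", 8))    # 85.75
print(to_decimal("D25.B", 16))   # 3365.6875
```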
Data Representation

Decimal Number | Binary-coded decimal (BCD) number | Binary Representation
0              | 0000                              | 0
1              | 0001                              | 1
2              | 0010                              | 10
3              | 0011                              | 11
4              | 0100                              | 100
5              | 0101                              | 101
6              | 0110                              | 110
7              | 0111                              | 111
8              | 1000                              | 1000
9              | 1001                              | 1001
10             | 0001 0000                         | 1010
20             | 0010 0000                         | 10100
234            | 0010 0011 0100                    | 11101010
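The difference between the two right-hand columns can be illustrated with a small sketch: BCD encodes each decimal digit separately in 4 bits, while the binary representation encodes the whole value. The helper name `to_bcd` is ours.

```python
# BCD encodes each decimal digit in its own 4-bit group.
def to_bcd(n):
    return " ".join(format(int(d), "04b") for d in str(n))

print(to_bcd(234))        # 0010 0011 0100
print(format(234, "b"))   # 11101010  (plain binary, for comparison)
```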
Data Representation
Signed Number Representations
Signed-magnitude Representation
Signed-1s complement Representation
Signed-2s complement Representation
Data Representation
Signed Number Representations (4-bit examples)

b3 b2 b1 b0 | Signed-magnitude | Signed-1s complement | Signed-2s complement
0 1 1 1     | +7               | +7                   | +7
0 1 1 0     | +6               | +6                   | +6
0 1 0 1     | +5               | +5                   | +5
0 1 0 0     | +4               | +4                   | +4
0 0 1 1     | +3               | +3                   | +3
0 0 1 0     | +2               | +2                   | +2
0 0 0 1     | +1               | +1                   | +1
0 0 0 0     | +0               | +0                   | +0
1 0 0 0     | -0               | -7                   | -8
1 0 0 1     | -1               | -6                   | -7
1 0 1 0     | -2               | -5                   | -6
1 0 1 1     | -3               | -4                   | -5
1 1 0 0     | -4               | -3                   | -4
1 1 0 1     | -5               | -2                   | -3
1 1 1 0     | -6               | -1                   | -2
1 1 1 1     | -7               | -0                   | -1
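The three interpretations in the table can be reproduced by a short sketch (function names are ours, written for 4-bit patterns only):

```python
# The three 4-bit signed interpretations from the table above.
def signed_magnitude(bits):
    sign = -1 if bits[0] == "1" else 1
    return sign * int(bits[1:], 2)

def ones_complement(bits):
    v = int(bits, 2)
    return v if bits[0] == "0" else -(v ^ 0b1111)   # negate by inverting all 4 bits

def twos_complement(bits):
    v = int(bits, 2)
    return v if bits[0] == "0" else v - 16          # subtract 2^4

# The table row 1011 reads -3, -4, -5:
print(signed_magnitude("1011"), ones_complement("1011"), twos_complement("1011"))
```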
Data Representation
Floating-Point Representations
A floating-point representation of a number has 2 parts:
Mantissa: signed, fixed-point representation
Exponent: designates the position of the decimal (or binary) point

Value          | Mantissa | Exponent | Interpretation
+(6132.789)_10 | +6132789 | +04      | +(0.6132789)_10 x 10^+4
+(1001.11)_2   | 01001110 | 000100   | +(0.1001110)_2 x 2^+4
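The mantissa/exponent split in the table can be sketched as a normalization step: repeatedly scale the value by the base until the mantissa is a fraction less than 1. The `normalize` helper is ours, a minimal illustration rather than a real floating-point encoder.

```python
# Split a value into a normalized fractional mantissa and an exponent,
# mirroring the table: 6132.789 = 0.6132789 x 10^4.
def normalize(value, base=10):
    exp = 0
    m = abs(value)
    while m >= 1:
        m /= base          # too big: shift the point left
        exp += 1
    while 0 < m < 1 / base:
        m *= base          # too small: shift the point right
        exp -= 1
    return (m if value >= 0 else -m), exp

m, e = normalize(6132.789)
print(m, e)   # mantissa close to 0.6132789, exponent 4
```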
Data Representation
Alphanumeric Representation
Many applications require the handling of data that consist not only of
numbers, but also of the letters of the alphabet and certain special characters
The alphanumeric character set includes 10 decimal digits, 26 letters of the alphabet and a number of special characters, such as $, + and =. Such a set contains between 32 and 64 elements (if only uppercase letters are included) or between 64 and 128 elements (if both uppercase and lowercase letters are included).
The standard alphanumeric binary code is ASCII (American Standard Code for Information Interchange), which uses 7 bits to code 128 characters.
Another alphanumeric code, used in IBM computers, is EBCDIC (Extended BCD Interchange Code), which uses 8 bits for each character. It has the same character symbols as ASCII (i.e., the character set is the same) but the bit assignments to characters are different.
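Python's `ord` and `chr` built-ins expose these 7-bit ASCII codes directly, which makes the encoding easy to inspect:

```python
# Each ASCII character maps to a 7-bit code; ord gives the code,
# chr maps a code back to its character.
for ch in "A$+=":
    print(ch, ord(ch), format(ord(ch), "07b"))
print(chr(65))   # A
```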
Structure of Computer System
Structural Components
Central Processing Unit (CPU)
ALU
Control Units
Registers
Interconnections
Main Memory (MM)
Input/Output
[Figure: Basic computer structure. The Central Processing Unit (CPU), the Main Memory (M), and the I/O devices D1, D2, ..., DK communicate over the system bus (interconnection network, ICN).]
Structure of Computer System
Bus Structures
A set of parallel conductors, which allow devices attached to it to communicate with the CPU.
The bus consists of three main parts:
Control lines: allow the CPU to control which operations the attached devices should perform (read or write).
Address lines: allow the CPU to reference certain (memory) locations within the device.
Data lines: the meaningful data to be sent to, or retrieved from, a device is placed on these lines.
Structure of Computer System
Bus Structures
(Time) Shared Bus system
The bus can be used to transfer information between different sets of
devices at different times, but only two units can actively use the bus at
a given instant
Its main virtues are low cost and flexibility in attaching I/O devices
Dedicated Bus system
The bus is designed to link only two devices
It has better performance because of effective parallel processing
The main drawback is its high cost.
Adding a device to the system is difficult as the device needs to be
physically attached to other devices.
Instruction Set
Elements of a Machine Instruction
Operation code (opcode)
Specifies the operation to be performed
Address Field
Designates a memory address or a processor register
Mode Field
Specifies the way the operand or the effective address is determined
Types of Operations
Data Transfer
Arithmetic
Logical
Input/Output
Addressing Methods: Basic Concepts
Addressing Modes
A rule for interpreting or modifying the address field of the instruction before the operand is actually referenced.
Register mode (operands are in registers within the CPU)
Immediate mode (operand is specified in the instruction itself)
Direct Address mode (effective address = address field)
Indirect Address mode (address field contains the address where the effective address is stored)
Indexed Addressing mode (effective address = address field + content of an index register)
Relative Address mode (effective address = address field + content of the program counter)
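The effective-address rules above can be sketched over a toy memory and register file. All names and values here (the `XR` index register, the memory contents, the PC value) are illustrative; register and immediate modes are omitted because they do not produce a memory address.

```python
# Effective-address calculation for the address-producing modes listed above.
memory = {100: 25, 200: 100, 300: 7}   # toy main memory: address -> contents
registers = {"XR": 50}                 # XR: an index register (illustrative)
pc = 150                               # program counter (illustrative)

def effective_address(mode, field):
    if mode == "direct":
        return field                   # EA = address field
    if mode == "indirect":
        return memory[field]           # field holds the address of the EA
    if mode == "indexed":
        return field + registers["XR"] # EA = field + index register
    if mode == "relative":
        return field + pc              # EA = field + program counter
    raise ValueError(mode)

print(effective_address("direct", 100))    # 100
print(effective_address("indirect", 200))  # memory[200] = 100
print(effective_address("indexed", 250))   # 250 + 50 = 300
print(effective_address("relative", 50))   # 50 + 150 = 200
```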
Fundamental Concepts : Summary
Computer Architecture refers to those attributes that are visible to the programmer, i.e., those that affect the logical execution of a program.
Computer Organization refers to the operational units and their interconnections that realize the architectural specification.
Data types include numbers used in arithmetic operations and letters of the alphabet used in data processing.
Number systems include binary, octal, decimal and hexadecimal.
The structural components of a computer are the CPU, main memory and input/output.
A bus is a set of parallel conductors, which allow devices attached to it to communicate with the CPU.
Addressing modes are rules for interpreting or modifying the address field of the instruction before the operand is actually referenced.

3.0 Central Processing Unit
Introduction:
This Topic gives an overview on the major components of CPU.



Objective:
After completing this Topic, you will be able to understand,
1. Major Components of CPU
2. List of special-purpose registers for the basic computer
3. Overview of CPU Behavior
4. Instruction Fetching
5. Fetching a Word from Memory
6. Storing a Word in Memory
7. Register Transfers
8. Arithmetic Logic Unit
9. Register Gating
3.0 Central Processing Unit
10. Timing of Data Transfer
11. Control Unit
12. Hard Wired Control
13. Micro Programmed Control
14. Control Sequence
15. Micro Instructions

Central Processing Unit
Control
Arithmetic logic Unit
(ALU)
Register set
Major components of CPU
List of special purpose registers for the basic computer

Register symbol | Register name           | Function
MDR             | Memory Data Register    | Holds memory operand
MAR             | Memory Address Register | Holds address for memory
AC              | Accumulator             | Processor register
IR              | Instruction Register    | Holds instruction code
PC              | Program Counter         | Holds address of instruction
TR              | Temporary Register      | Holds temporary data
INPR            | Input Register          | Holds input character
OUTR            | Output Register         | Holds output character
Central Processing Unit
[Figure: Single-bus organization of the CPU. The internal CPU bus connects the PC, MAR, MDR, IR (which feeds the instruction decoder), general-purpose registers R0 ... R(n-1), TEMP, and the ALU. Register Y buffers one ALU input, register Z holds the ALU output, and the ALU control lines (Add, Sub, XOR, plus carry-in) select the operation. The MAR and MDR connect to the memory address and data lines.]
Overview of CPU Behavior
[Flowchart: Overview of CPU behavior. If instructions are waiting, fetch the next instruction and execute it; then check whether interrupts are waiting. If an interrupt is waiting, transfer to the interrupt-handling program; otherwise return to fetching instructions.]
Instruction Fetching
Execution of a complete instruction
Add (R3), R1
Fetch the instruction
Fetch the operands
Perform the addition
Load the result in R1
Instruction fetching
The instructions in a program to be executed are loaded in
sequential locations in the MAIN MEMORY
CPU fetches one instruction at a time and performs the
functions specified
Branching and jump instructions change the sequence of instructions.
A dedicated CPU register, the Program Counter (PC), keeps track of the address of the next instruction.
The contents of the memory location pointed to by the PC are stored in the IR:
IR <- [[PC]]
The content of the PC is then incremented by 1:
PC <- [PC] + 1
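The fetch-and-increment cycle above can be sketched as a minimal loop. The instruction format here (tuples, an `ADD` that adds to an accumulator, a `HALT`) is purely illustrative, not the basic computer's actual instruction set.

```python
# A minimal fetch-execute loop illustrating IR <- [[PC]], PC <- [PC] + 1.
memory = [("ADD", 5), ("ADD", 3), ("HALT", 0)]   # toy program in sequential locations
pc, acc = 0, 0

while True:
    ir = memory[pc]        # IR <- [[PC]]  (fetch the instruction the PC points at)
    pc = pc + 1            # PC <- [PC] + 1 (point at the next instruction)
    op, operand = ir       # decode
    if op == "HALT":
        break
    if op == "ADD":
        acc += operand     # execute

print(acc)   # 8
```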

Fetching a word from Memory
To fetch a word from memory, the CPU has to specify the address of the memory location and request a READ operation.
The CPU transfers the address of the required word to the MAR.
The MAR is connected to the address lines of memory, so the address is transferred to memory.
The CPU waits until the memory unit sends the MFC (Memory Function Completed) signal indicating that the READ request is complete.
The data is then available on the data lines and is loaded into the MDR, where it is available for processing.
Example 1
R1: address of the memory location
R2: the data fetched from memory
The various steps:
1. MAR <- [R1]
2. Read
3. Wait for MFC signal (WMFC)
4. R2 <- [MDR]
The duration of step 3 depends on the speed of memory: the time for a READ from memory is greater than the time for performing a single operation within the CPU.
Functions which do not need the MAR or MDR can be carried out during the wait period; e.g., the PC can be incremented.
Storing Word In Memory
The address of the data to be written is loaded into the MAR.
The data is loaded into the MDR.
A Write command is issued.
Wait for the MFC signal.
Steps 1 and 2 can be carried out simultaneously in a multiple-bus structure only.
Example 2
R1: memory address of the data
R2: data to be stored in memory
Steps involved:
1. MAR <- [R1]
2. MDR <- [R2]
3. Write
4. Wait for MFC

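The MAR/MDR handshake for both examples can be sketched as follows, with a dictionary standing in for main memory and the MFC wait collapsed into the dictionary access (the function names are ours):

```python
# The MAR/MDR steps for a memory read (Example 1) and write (Example 2).
memory = {0x10: 42}   # toy main memory: address -> word
MAR = MDR = 0

def mem_read(addr_reg):
    global MAR, MDR
    MAR = addr_reg          # MAR <- [R1]
    MDR = memory[MAR]       # Read; after WMFC the data sits in the MDR
    return MDR              # R2 <- [MDR]

def mem_write(addr_reg, data_reg):
    global MAR, MDR
    MAR = addr_reg          # MAR <- [R1]
    MDR = data_reg          # MDR <- [R2]
    memory[MAR] = MDR       # Write; wait for MFC

print(mem_read(0x10))   # 42
mem_write(0x20, 7)
print(memory[0x20])     # 7
```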
Register Transfers
To enable data transfer between the various blocks connected to the bus, input and output gating is required.
Transfer of data from R1 to R4 needs the following actions:
1) R1out = 1: the contents of R1 are placed on the bus.
2) R4in = 1: this loads the data from the bus into R4.
Arithmetic Logic Unit
There is no internal storage in the ALU.
To add 2 numbers, both operands have to be available to the ALU simultaneously.
Register Y contains one number; the other number is gated onto the bus.
The result is stored in temporary register Z.
The function performed by the ALU depends on the signals applied to the ALU control lines; if the Add line is set to 1, the output of the ALU is the sum of its inputs.
Example (R3 <- R1 + R2):
1. R1out, Yin
2. R2out, Add, Zin
3. Zout, R3in
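The three control steps above can be sketched directly: because only one register can drive the single bus at a time, Y buffers the first operand and Z holds the result until it can be gated back.

```python
# The three control steps for R3 <- R1 + R2 on a single-bus CPU.
regs = {"R1": 4, "R2": 6, "R3": 0}   # illustrative register contents
Y = Z = bus = 0

bus = regs["R1"]; Y = bus            # step 1: R1out, Yin
bus = regs["R2"]; Z = Y + bus        # step 2: R2out, Add, Zin (ALU adds Y and bus)
bus = Z; regs["R3"] = bus            # step 3: Zout, R3in

print(regs["R3"])   # 10
```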
Register Gating
Electronic switches are connected to the gates of the register and function like mechanical on/off switches.
With the switch in the ON state, data is transferred from the register to the bus.
With the switch in the OFF state, the register output is electrically disconnected from the bus.
The output of the register-switch circuit can be in one of 3 states: 1, 0, or open circuit.
A separate control input is used to enable the gate output to drive the bus to 0, to 1, or to the electrically disconnected (open circuit) state.
Timing of Data Transfers
A finite delay is encountered for the gate to open and then for the data to travel along the bus to the ALU, plus a propagation delay through the ALU adder circuit.
For the result to be properly stored in register Z, the data must be maintained on the bus for some time (the hold time).
Control Unit
To execute instructions, CPU generates control signals via Control Unit
The 2 categories of various techniques are
Hard Wired Control
Micro Programmed Control

Hard Wired Control
The control logic is implemented with gates, flip-flops, decoders and
other digital circuits
It can be optimized to produce a fast mode of operation
It requires changes in the wiring among the various components if the
design is to be changed or modified
The goal is to minimize the number of components used and maximize
the speed of operation
Control Unit
[Figure: General structure of a hardwired control unit. The instruction register and the status signals feed a sequential logic circuit, which generates the control signals.]

Micro Programmed Control
It is built around a storage unit called the control memory, where all the control signals are stored in a program-like format, resembling microprograms
The microprograms are designed to implement or emulate the behavior of
a given instruction set
Each instruction causes the corresponding micro program to be fetched
and its control information extracted in a manner that resembles fetching
and execution of a program from the main memory
Control signals are organized into formatted words (micro instructions)
Design changes can be easily implemented just by changing the contents
of the control memory
More costly due to the presence of the control memory and its access
circuitry, and it tends to be slower for the extra time required to fetch
micro instructions from the control memory

Micro Programmed Control
[Figure: General structure of a microprogrammed control unit. The instruction register, through the address logic (guided by status signals), selects microinstructions in the control memory; each microinstruction is loaded into the microinstruction register and decoded to produce the control signals.]
Control Sequence
Control sequence for the instruction Add (R3), R1:

Step | Action
1    | PCout, MARin, Read, Clear Y, Set carry-in to ALU, Add, Zin
2    | Zout, PCin, WMFC
3    | MDRout, IRin
4    | R3out, MARin, Read
5    | R1out, Yin, WMFC
6    | MDRout, Add, Zin
7    | Zout, R1in, End
Micro Instructions
Each microinstruction is a control word with one bit per control signal; a 1 means that the signal is asserted during that step.

Micro-instruction | PCin PCout MARin Read MDRout IRin Yin ClearY Carry-in Add Zin Zout R1out R1in R3out WMFC End
1                 | 0    1     1     1    0      0    0   1      1        1   1   0    0     0    0     0    0
2                 | 1    0     0     0    0      0    0   0      0        0   0   1    0     0    0     1    0
3                 | 0    0     0     0    1      1    0   0      0        0   0   0    0     0    0     0    0
4                 | 0    0     1     1    0      0    0   0      0        0   0   0    0     0    1     0    0
5                 | 0    0     0     0    0      0    1   0      0        0   0   0    1     0    0     1    0
6                 | 0    0     0     0    1      0    0   0      0        0   0   0    0     0    0     0    0
6                 | 0    0     0     0    1      0    0   0      0        1   1   0    0     0    0     0    0
7                 | 0    0     0     0    0      0    0   0      0        0   0   1    0     1    0     0    1
Central Processing Unit : Summary
The Central Processing Unit (CPU) performs the actual processing of
data. The data it processes is obtained, via the system bus, from the
main memory. Results from the CPU are then sent back to main
memory via the system bus. In addition to computation the CPU
controls and co-ordinates the operation of the other major
components. The CPU has two main components, namely:
The Control Unit -- controls the fetching of instructions from the main
memory and the subsequent execution of these instructions. Among
other tasks carried out are the control of input and output devices and
the passing of data to the Arithmetic/Logical Unit for computation.
The Arithmetic/Logical Unit (ALU) -- carries out arithmetic operations
on integer (whole number) and real (with a decimal point) operands. It
can also perform simple logical tests for equality and greater than and
less than between operands.

Basic Concepts

Ideally, the Memory would be Fast, Large, &
Inexpensive
The maximum size of memory depends on the addressing scheme:
a 16-bit computer generates 16-bit addresses and is capable of addressing up to 2^16 = 64K memory locations.
Number of locations = Address space
ROM (Read Only Memory)
Contents cannot be changed online, if they can be altered at all
Widely used to store control programs such as
microprograms
Well suited for storing fixed programs and data
Can also be a part of Main Memory
Data are written to ROM when it is manufactured




ROM (Read Only Memory)
Programmable ROM
User can load data into PROM
PROMs are more flexible & convenient
PROMs are less expensive and fast
Erasable Programmable ROM
Here data can be erased and new data can be loaded
EPROMs are capable of storing information for longer time
While using EPROMs, memory changes and updates can be easily
made
Drawback - to reprogram or erase the contents of an EPROM, the chip must be physically removed
Electrically Erasable Programmable ROM
These ROMs can be electrically erased and programmed
These chips need not have to be removed for Erasure
The specific cells of EEPROM can also be erased
Drawback - Different voltage is needed for READ, WRITE, ERASE

RAM (Random Access Memory)
RAM is now used to designate read/write memory (although ROM is also random access).
If any memory location can be accessed for a READ or WRITE operation in some fixed amount of time that is independent of the location's address, then that memory unit is a RAM.
Main memory is mostly of this type.
In the case of secondary storage devices, by contrast, access time depends on the address or position of the data.
Semiconductor Memories are available in a wide range of
speeds
Cycle time : 100 nano-sec to less than 10 nano-seconds


4.0 The Memory
Introduction:
This Topic gives an overview on the memory organization of a computer.

Objective:
After completing this Topic, you will be able to understand,
1. Basic Concepts
2. Read Only Memory (ROM)
3. Random Access Memory (RAM)
4. Internal Organization of RAM
5. Static Memory
6. Dynamic Memories
7. Speed, Size and Cost
8. Characteristics of Memory System
9. Memory Hierarchy
10. Main Memory
11. Connection Between Main Memory and CPU
12. Cache Memory
13. Virtual Memory
14. Paging
Internal Organization of RAM

Memory cells are organized in form of arrays
Each cell is capable of storing 1 bit of information
Each row of cell constitute a Memory Word
Cells of a row are connected to common line called Word line
Word line is driven by Address Decoder
The cells in each column are connected to a Sense/Write circuit by two bit lines
Sense/Write circuit is connected to data lines
During READ operation S/W circuit sense or Read the information
stored in a cell selected by word line
S/W circuit transmits this information to Data line
During WRITE operation, S/W circuit receive input information &
store in the cell of selected word
Static Memories
Memories that are capable of retaining their state as long as power is applied are known as static memories.
Continuous power is needed for retaining the state of a cell, so SRAM is volatile memory.
SRAM consumes very little power, as current flows only when a cell is accessed.
SRAMs can be accessed very quickly; access time is around 10 ns.
SRAM is used where speed is a concern.


Dynamic Memories
Static RAM is very fast but costly, as its cells require several transistors.
DRAM cells are simple and hence less expensive.
The state of the cell cannot be retained permanently, hence the name dynamic RAM.
In DRAM, the information is stored in the form of charge on a capacitor.
A DRAM cell is capable of holding its information for only a few milliseconds, while the contents of the cell must be retained for much longer.
Periodic refreshing, which restores the capacitor charge to its full value, overcomes this problem.

Dynamic Memories
DRAM cell consists of a capacitor and a transistor
In order to store the information in this cell Transistor is turned ON
and appropriate voltage is applied on bit line
This causes the known amount of charge to be stored on capacitor
Once the Transistor is turned OFF, Capacitor begins to discharge
Information stored in cell can be retrieved correctly only if it is
READ before the charge on capacitor drops below THRESHOLD
VALUE
DRAM are widely used in MAIN Memory because of its
Low cost
High Density

Speed, Size & Cost
Fast Memory can be achieved if SRAM chips are
used, Cache Memory can be built with SRAM
technology
SRAM chips are Costly as their basic cell consists of 6
Transistors
The basic cells of DRAM are simple and less
expensive but DRAM are slower
Main Memory can be built with DRAM technology


Characteristics of Memory System
Location: CPU, Internal (main), External (secondary)
Capacity: Word size, Number of words
Unit of Transfer: Word, Block
Access Methods: Sequential, Direct, Random, Associative
Performance: Access time, Cycle time, Transfer rate
Physical Type: Semiconductor, Magnetic surface
Physical Characteristics: Volatile/non-volatile, erasable/non-erasable
Memory Hierarchy
Overall goal : to obtain the highest possible average access speed while minimizing the
total cost of the entire memory system
Registers
Cache
Main Memory
Magnetic Disk (Disk Cache)
Characteristics of the hierarchy
Decreasing cost / bit
Increasing capacity
Increasing access time
Decreasing frequency of access of memory by CPU
[Figure: Memory hierarchy in a computer system. CPU registers sit at the top, followed by cache memory and main memory; magnetic disks and magnetic tapes are reached through an I/O processor using DMA.]
Main Memory
Central storage unit, relatively large and fast memory
used to store programs and data during computer
operation
Mostly made up of RAM, which is used to store the
bulk of programs and data that are subject to change
The ROM portion of main memory is required for
storing an initial program called a bootstrap loader,
whose function is to start the computer software
working when the computer is turned on

Connection of the Main Memory to the CPU
[Figure: Connection of the main memory to the CPU. The MAR drives a k-bit address bus and the MDR an n-bit data bus; control lines (Read, Write, MFC, etc.) coordinate the transfers. The main memory has up to 2^k addressable locations, with a word length of n bits.]
Memory Access and Cycle Time
Memory Access Time
Useful measure of speed of Memory
Elapsed time between Initiation of operation and its completion
Example :
Time between READ and MFC signal
Memory Cycle Time
Minimum time delay required between two successive Memory
Operations
For Example, the time between two successive READ
operations.
Cycle time is usually > Access time

Cache Memory
Concept: to exploit the locality of reference
Mapping function: direct-mapped, associative-mapped, set-associative-mapped
Replacement algorithm: e.g., the least-recently-used (LRU) algorithm
The effectiveness of the cache mechanism is based on a property of computer programs called Locality of Reference.
Analysis of program execution shows that most of the execution time is spent on routines in which instructions are executed repeatedly; these could be DO loops, a few procedures, etc.


Cache Memory

Many instructions in Localized Areas of Programs are executed repeatedly during
some time period
Remainder of program is accessed infrequently
The above concept is Locality of Reference
Temporal : The recently executed Instruction is likely to be executed again very soon
Spatial : The instructions in close proximity to a recently executed instruction are also likely to
be executed soon
If Active segments can be placed in Fast Cache Memory, then total execution time
can be reduced
Operation of a Cache Memory is dependent on Locality of Reference concept
The cache memory control circuitry is designed to take advantage of the property
of locality of reference
The temporal aspect of Locality Of Reference suggests that whenever information
is needed, it should be brought to Cache Memory


Cache Memory
The spatial aspect suggests to bring adjacent instructions as well in
advance to Cache Memory
Cache Memory can store less number of blocks than Main memory
Correspondence between Cache Memory & Main Memory BLOCKS is
specified by a Mapping Function
When the cache is full and a memory word not in the cache is referenced, the cache control hardware must decide which block should be removed to make room.
The collection of rules for making this decision constitutes the replacement algorithm.
CPU does not need to know whether the word is available in Cache or not.
CPU issues Read/Write request, Cache control circuitry checks the
existence of the word in Cache, If so the necessary operation will be
performed
Cache Memory
Read from Cache Memory
Main memory is not involved
The word is directly read from the cache memory
Write Into Cache Memory
In the technique called Write-Through, the Cache & Main Memory are
updated simultaneously
In the Write-Back or Copy-Back technique
The block in the Cache is marked with an associated flag bit (called the Dirty
or Modified bit)
The corresponding locations of Main Memory are updated only when
the block containing the dirty word is removed from the Cache
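A minimal sketch of the two write policies (a hypothetical single-level cache; class and field names are illustrative):

```python
# Hypothetical word-level "cache" demonstrating the two write policies.

class WriteThroughCache:
    def __init__(self, memory):
        self.memory = memory
        self.data = {}

    def write(self, addr, value):
        self.data[addr] = value
        self.memory[addr] = value      # main memory updated simultaneously

class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.data = {}
        self.dirty = set()

    def write(self, addr, value):
        self.data[addr] = value
        self.dirty.add(addr)           # mark as dirty; memory left untouched

    def evict(self, addr):
        if addr in self.dirty:         # write back only on removal
            self.memory[addr] = self.data[addr]
            self.dirty.discard(addr)
        del self.data[addr]

mem_wt = {0: 0}
wt = WriteThroughCache(mem_wt)
wt.write(0, 7)
print(mem_wt[0])   # 7: memory updated immediately

mem = {0: 0}
wb = WriteBackCache(mem)
wb.write(0, 42)
print(mem[0])      # still 0: memory is stale until eviction
wb.evict(0)
print(mem[0])      # 42
```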
Cache Memory
Read Miss
If the addressed word in a Read operation is not in the Cache, a Read Miss occurs
The missed block is copied from Main Memory to the Cache and the requested
word is forwarded to the CPU
Alternatively, this word may be sent to the CPU as soon as it is read from Main
Memory
This approach reduces the CPU's waiting period at the expense of more complex
circuitry
Write Miss
Write miss = addressed word not found in the Cache Memory
In the Write-Through protocol, the information is written directly into Main Memory
In the Write-Back protocol, the block containing the addressed word is first brought into
the Cache
The desired word in the cache block is then overwritten with the new information
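The read path above can be sketched as follows (an illustrative direct-mapped lookup; the block size, line count, and memory contents are assumed values, not from the text):

```python
# Illustrative read path: on a hit the word comes from the cache; on a
# read miss the whole block is copied from main memory, then the word
# is forwarded to the CPU.

BLOCK_SIZE = 4      # words per block (assumed)
NUM_LINES  = 8      # cache lines (assumed)

memory = {addr: addr * 10 for addr in range(64)}   # fake main memory
cache  = {}                                        # line -> (tag, block)
misses = 0

def read(addr):
    global misses
    block_no = addr // BLOCK_SIZE
    line     = block_no % NUM_LINES      # direct-mapped placement
    tag      = block_no // NUM_LINES
    entry = cache.get(line)
    if entry is None or entry[0] != tag:
        misses += 1                      # read miss: fetch the whole block
        base  = block_no * BLOCK_SIZE
        block = [memory[base + i] for i in range(BLOCK_SIZE)]
        cache[line] = (tag, block)
    return cache[line][1][addr % BLOCK_SIZE]

print(read(0), read(1))   # second access hits: same block already cached
print(misses)             # 1
```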



Virtual Memory
In most modern computers, the main memory is not large enough to accommodate
voluminous programs and data
When a program cannot be accommodated completely in Main Memory, the parts of
the program not currently under execution are stored on secondary storage devices
The operating system moves data and programs between Main Memory & secondary
devices
Techniques that automatically move data & programs into Main Memory are called
Virtual Memory techniques
The binary addresses that the processor issues for either instructions or data are called
VIRTUAL Addresses
These addresses are converted into actual (physical) addresses by the combination of
hardware & software

Cache Memory vs. Virtual Memory
The Cache bridges the speed gap between Processor
& Main Memory
The Cache is implemented in Hardware
Virtual Memory bridges the Size gap & Speed gap
between Main Memory & Secondary Storage Devices
Virtual Memory concepts are implemented by
Software Techniques

Virtual Memory
Virtual-memory Address Translation
Virtual Address is the address generated by the CPU
Physical Address is the address of the accessed memory
location
Memory Management Unit translates the Virtual
Addresses into Physical Addresses
If the data is not there in Main Memory, MMU causes the
operating system to bring data into Memory from Disk
This entire process is transparent to the CPU

Paging
[Figure: paging address translation: the logical address, interpreted as page number p and offset d, is translated through the Page Table into frame number f, giving the physical address (f, d) in Physical Memory]
Paging

Paging is a memory allocation technique where the concept of
virtual memory can be implemented.
Assume all Programs And Data are composed of Fixed length Units
called PAGES
PAGE = Block of words that occupy contiguous locations in Main
Memory
A page is typically 2 KB to 16 KB in length
Each Virtual address is interpreted as Virtual Page number + Offset
Offset specifies the location of particular word or Byte within a
page
PAGE TABLE contains the location of each page in Main Memory
This includes the Main Memory Address where Page is stored and
current status of page
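The interpretation of a virtual address as page number plus offset can be sketched as follows (the 4 KB page size and page-table contents are assumptions for illustration):

```python
# Sketch: split a virtual address into (page number, offset) and translate
# it via a page table. A 4 KB page size is assumed for illustration.

PAGE_SIZE = 4096                      # 4 KB pages (assumed)

page_table = {0: 7, 1: 3, 2: 9}      # virtual page -> page frame (made up)

def translate(virtual_addr):
    page   = virtual_addr // PAGE_SIZE
    offset = virtual_addr %  PAGE_SIZE
    frame  = page_table[page]         # a missing entry would be a page fault
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1234)))   # page 1, offset 0x234 -> frame 3 -> 0x3234
```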
Paging
Main Memory area which holds one page is called Page Frame
Starting address of Page Table is kept in PAGE TABLE BASE REGISTER
Each entry into page table also includes control bits which store the status of page
Ideally, the Page Table should be kept in the MMU
Since the Page Table is large, it is kept in the Main Memory unit
A small portion of the Page Table can be kept in the MMU
That small portion is called the TLB
TLB stands for Translation Look-aside Buffer, which acts as a cache for page-table entries
The TLB contains the most recently accessed page-table entries
The contents of the TLB must be kept coherent with the contents of the Page Table in memory
If the contents of the page table change, the corresponding entries in the TLB must be changed
One of the control bits is provided for this purpose


Paging
Address Translation Steps in Paging
Given a Virtual Address, the MMU looks in TLB for the
referenced page
If the Page table entry for this page is found in the TLB, the
physical address is obtained quickly
If there is a miss in TLB, the required entry is obtained from
Page Table in Main Memory
TLB is updated
When the page is not found in Main Memory, a PAGE FAULT is
said to have occurred
The MMU asks the operating system to load the page from Disk into
Memory
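The lookup order above (TLB, then page table, then page fault) can be sketched as follows (all data structures and values here are hypothetical):

```python
# Sketch of the translation steps: TLB first, then the page table kept in
# memory, then a page fault handled by the OS. All data is illustrative.

PAGE_SIZE  = 4096
page_table = {0: 5, 1: 2}        # full table, kept "in main memory"
tlb        = {}                  # small cache of recent translations
events     = []

def translate(vaddr):
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page in tlb:                       # fast path: TLB hit
        events.append("tlb hit")
    elif page in page_table:              # TLB miss: read page table entry
        events.append("tlb miss")
        tlb[page] = page_table[page]      # update the TLB
    else:                                 # page not in memory at all
        events.append("page fault")
        page_table[page] = 9              # OS loads the page (simulated)
        tlb[page] = 9
    return tlb[page] * PAGE_SIZE + offset

translate(100)             # first access: TLB miss
translate(100)             # repeat: TLB hit
translate(3 * PAGE_SIZE)   # unmapped page: page fault
print(events)              # ['tlb miss', 'tlb hit', 'page fault']
```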

Memory Organization
Virtual memory Organization
[Figure: virtual memory organization: the Processor issues a virtual address to the MMU, which translates it to a physical address used to access the Cache and Main Memory; data moves between Main Memory and Disk Storage by DMA transfer]
The Memory:Summary
Contents of ROM cannot be changed online. Data are written to ROM when it is manufactured
ROM can be of the following types: PROM, EPROM, EEPROM
RAM is used as READ/WRITE memory
Memories that are capable of retaining their state as long as power is applied are known as Static
Memories
DRAM cells are simple and hence less expensive. The state of the cell cannot be retained
permanently without periodic refreshing, hence the name Dynamic RAM
Cache memory is semiconductor memory built with a technology faster than that of
main memory. It is placed between main memory and the CPU.
When a program cannot be accommodated completely in Main Memory, the parts of the
program not currently under execution are stored on secondary storage devices
The binary addresses that the processor issues for either instructions or data are called VIRTUAL
Addresses
Paging is a memory allocation technique where the concept of virtual memory can be
implemented.

5.0 Input Output Organization
Introduction:
This Topic gives an overview on hardware I/O.

Objective:
After completing this Topic, you will be able to understand,
1. Connection of I/O bus to input output devices
2. Accessing I/O Devices
3. Hardware to Connect I/O devices to BUS
4. Implementation of I/O operations
5. Programmed I/O
6. Memory-mapped IO
7. IO-mapped IO
8. Interrupts
9. How to achieve better Speed of Execution
10. Direct Memory Access (DMA)
11. BUS ARBITER
12. Input-Output Processor
Connection of I/O bus to
input/output devices
[Figure: single-bus I/O connection: the Processor and the I/O devices (keyboard and display terminal, printer, magnetic disc, magnetic tape) are attached to the I/O bus through Interface circuits; the bus carries Data, Address, and Control lines]
Accessing I/O Devices

Most modern computers use a single-bus arrangement for
connecting I/O devices to the CPU & Memory
The bus consists of three sets of lines: Address, Data, Control
The processor places a particular address on the address lines
The device which recognizes the address responds to the commands
on the control lines
The processor requests either a Read or a Write
The data are placed on the data lines
Any machine instruction that can access memory can be used
to transfer data to or from I/O devices

Hardware to Connect I/O devices to
BUS

Interface Circuit
Address Decoder
Control Circuits
Data registers
Status registers
The Registers in I/O Interface
Status Registers like SIN, SOUT
Data Registers like Data-IN, Data-out
Implementation of I/O operations

Programmed I/O
Interrupts
Direct Memory access (DMA)
I/O Processor (IOP)
Programmed I/O
Here the processor repeatedly checks a status flag to achieve synchronization
between the CPU and the I/O device
Useful in small low-speed computers or in systems that are dedicated to monitoring a
device continuously
Inefficient: the CPU wastes time checking the flag instead of doing other
useful processing tasks
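The busy-wait pattern can be sketched as follows (the device is simulated so the loop terminates; SIN and Data-IN are the interface registers named earlier, everything else is illustrative):

```python
# Sketch of programmed I/O: the CPU repeatedly polls a status flag (SIN)
# until the device is ready, then reads the data register.

class SimulatedDevice:
    def __init__(self, data):
        self.pending = list(data)

    @property
    def SIN(self):                     # status flag: 1 when data is ready
        return 1 if self.pending else 0

    def DATA_IN(self):                 # data register
        return self.pending.pop(0)

dev = SimulatedDevice("hi")
received = []
while True:
    if dev.SIN:                        # busy-wait: CPU does no other work
        received.append(dev.DATA_IN())
    else:
        break

print("".join(received))   # hi
```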
Programmed I/O
The CPU, memory and I/O devices usually communicate via the system bus
An I/O device is connected to the bus via I/O ports, each of which, from the CPU's
perspective, is an addressable data register
The address lines of the system bus used to select memory locations can also
be used to select I/O devices
2 different techniques can be used
Memory-mapped IO
IO-mapped IO
Memory-mapped IO
[Figure: memory-mapped I/O: the CPU's READ and WRITE control lines and shared Data and Address lines connect both Main Memory and the I/O ports (IO ports 1-3, attached to IO devices A and B)]
A part of main memory address space is assigned to the IO ports
Used in Motorola 680X0 series

IO-mapped IO

The main memory and IO address spaces are separate
Used in Intel 80X86 series

[Figure: IO-mapped I/O: separate READ M / WRITE M and READ IO / WRITE IO control lines distinguish Main Memory accesses from I/O port accesses over the shared Data and Address lines]
Interrupts

During the WAIT period of a processor, other tasks can be
performed
When an I/O device is ready, it sends an INTERRUPT
signal to the processor
One of the bus control lines is dedicated to the interrupt
request
Using interrupts, the WAIT period is ideally eliminated

Example 1

Consider a task which requires computations to be performed and results to
be printed on a line printer
This is followed by more computations and output
The program consists of 2 routines, COMPUTE & PRINT
The printer accepts only one line at a time
The PRINT routine should send 1 line of text at a time and wait for it to be
printed
This simple approach is time consuming, as the CPU has to wait for a long time
If it is possible to overlap printing & computation, i.e. to execute the COMPUTE
routine while printing is in progress, a faster overall speed of execution can be
achieved

How to achieve better Speed of
Execution
First, COMPUTE routine is executed to produce n lines of
output
PRINT routine is executed to send first line of text to
printer
PRINT routine is temporarily suspended
Execution of COMPUTE routine can continue
After completion of printing of current line, Printer sends
an Interrupt Signal to CPU
In response, CPU stops execution of COMPUTE routine
CPU transfers control to PRINT routine
PRINT routine sends next line to Printer

Direct Memory Access (DMA)

So far, we have discussed data transfer between the processor & I/O
devices
For I/O transfer, the processor determines the status of the I/O devices
To do this
The processor either polls a status flag in the I/O interface device or
waits for the device to send an interrupt signal
Considerable overhead is incurred in the above I/O transfer processing
To transfer large blocks of data at high speed between EXTERNAL devices
& Main Memory, the DMA approach can be used
The continuous intervention by the processor can be drastically reduced
The DMA controller allows data transfer between an I/O device and Memory

Direct Memory Access (DMA)
DMA controller acts as a Processor
CPU controls the operation of DMA controller
To initiate the transfer of blocks of words, the processor sends
the following data to controller
The starting address of the memory block where data are available
(for read) or where data are to be stored (for write)
The word count, which is the number of words in the memory block
Control to specify the mode of transfer such as read or write
A control to start the DMA transfer
After receiving information, DMA controller proceeds to
perform the requested transfer
After entire transfer of word block, DMA controller sends an
Interrupt signal to Processor
Direct Memory Access (DMA)
Registers in a DMA Interface
The first register stores the starting address.
Second register stores Word count
Third register contains status and control flags
Status & Control Flags
R/W bit determines the direction of data transfer
R/W bit = 1 : READ operation
R/W bit = 0 : WRITE operation
Done flag: set to 1 when the controller finishes the data transfer
IE (Interrupt Enable) flag: when set to 1, this flag causes the controller to raise an
interrupt after the data transfer
IRQ (Interrupt Request) bit: set to 1 when the controller has requested an interrupt
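The three registers and the status/control flags can be modelled with a small sketch (the field names follow the text; the transfer behaviour is a simplification, and the memory contents are made up):

```python
# Simplified model of the DMA interface registers described above.
# The transfer itself is simulated word by word.

class DMAController:
    def __init__(self):
        self.start_addr = 0   # first register: starting address
        self.word_count = 0   # second register: number of words
        self.rw = 1           # R/W bit: 1 = read, 0 = write
        self.done = 0         # Done flag: 1 when the transfer finishes
        self.ie = 1           # Interrupt Enable flag
        self.irq = 0          # Interrupt Request bit

    def transfer(self, memory, buffer):
        addr = self.start_addr
        while self.word_count > 0:        # move one word per step
            if self.rw:                   # read: memory -> device buffer
                buffer.append(memory[addr])
            else:                         # write: device buffer -> memory
                memory[addr] = buffer.pop(0)
            addr += 1
            self.word_count -= 1
        self.done = 1
        if self.ie:
            self.irq = 1                  # raise an interrupt to the processor

mem = {10: 'a', 11: 'b', 12: 'c'}
dma = DMAController()
dma.start_addr, dma.word_count, dma.rw = 10, 3, 1
buf = []
dma.transfer(mem, buf)
print(buf, dma.done, dma.irq)   # ['a', 'b', 'c'] 1 1
```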




Direct Memory Access (DMA)
Role of the Operating System in a DMA transfer
I/O operations are always performed by the operating system of the computer
in response to a request from an application program
The operating system initiates the DMA operation for the current
program
The operating system then starts the execution of another program
After the transfer, the DMA controller sends an interrupt signal to the
processor
The operating system puts the suspended program in the Runnable
state so that it can be selected for execution next

BUS ARBITER
DMA controller & Processor should not use the
same BUS at a time to access Main Memory
Different DMA controllers used for various I/O
devices should not use same BUS to access Main
Memory
To resolve above conflicts, a special circuit called
BUS ARBITER is used
Bus Arbiter co-ordinates the activities of all devices
requesting Memory Transfers

Input-Output Processor
Processor with direct memory access capability that
communicates with I/O devices
Unlike the DMA controller that must be setup entirely by
the CPU, IOP can fetch and execute its own instructions
IOP instructions are specially designed to facilitate I/O
transfer
IOP can also perform other processing tasks, such as
arithmetic, logic, branching and code translation
Block diagram of a computer with
I/O processor
[Figure: the CPU and the Input-Output Processor (IOP) both connect to the Memory Unit; the IOP drives the I/O bus, to which the peripheral devices (PD) are attached]
Input Output Organization:Summary
Most modern computers use a single-bus
arrangement for connecting I/O devices to the
CPU & Memory
The bus consists of three sets of lines: Address, Data,
Control
I/O operations are implemented through
programmed i/o, interrupts, DMA and I/O
Processor


6.0 Concept of Pipelining
Introduction:
This Topic gives a brief introduction on pipelining

Objective:
After completing this Topic, you will be able to
understand,
1. Pipelining
2. Effect of an operation that takes more than 1 clock
cycle
3. Why can't the pipeline operate at its maximum
theoretical speed?
Concepts of Pipelining
The Performance of a computer depends on
The way in which Compiler translates programs into machine language
Choice of machine language instructions
The design of hardware
Concept of Parallelism
Cache Memory concept
Measure of performance - Processor clock cycle
Execution Time , T
T = ( N x S ) / R
where N = number of machine language instructions
S = average number of basic steps per instruction
R = clock rate in cycles / second
Improvement in the performance by PIPELINING techniques
RISC and CISC processors
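A quick numeric check of the formula above (the figures below are made up for illustration, not from the text):

```python
# T = (N x S) / R, evaluated with illustrative numbers.
N = 1_000_000      # machine language instructions
S = 4              # average basic steps per instruction
R = 500_000_000    # clock rate: 500 MHz

T = (N * S) / R
print(T)   # 0.008 seconds, i.e. 8 ms
```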

Concepts of Pipelining
What is Pipelining
The speed of Execution of Programs depends on
Faster circuit Technology to build Processor & Main Memory
Arrange Hardware so that more than 1 operation can be performed at the
same time
Pipelining is an effective way of organizing parallel activity in a
computer system
It is a technique of decomposing a sequential process into
suboperations, with each subprocess being executed in a special
dedicated segment that works concurrently with all other segments
Any operation that can be decomposed into a sequence of
suboperations of about the same complexity can be implemented by a
pipeline processor
This technique is efficient for those applications that need to repeat
the same task many times with different sets of data

Concepts of Pipelining
Example
We want to perform the combined multiply and add operation Ai * Bi + Ci for i = 1, 2, 3, ..., 7 with a stream of numbers.

[Figure: three-segment pipeline: registers R1 and R2 feed a Multiplier whose result is stored in R3; R4 receives Ci; an Adder combines R3 and R4 into R5. R1 through R5 are registers that receive new data with every clock pulse]

The suboperations performed are:
R1 ← Ai, R2 ← Bi
R3 ← R1 * R2, R4 ← Ci
R5 ← R3 + R4
Concepts of Pipelining
Example (contd.): Contents of registers

Clock Pulse | Segment 1: R1, R2 | Segment 2: R3, R4 | Segment 3: R5
     1      | A1, B1            | --, --            | --
     2      | A2, B2            | A1*B1, C1         | --
     3      | A3, B3            | A2*B2, C2         | A1*B1+C1
     4      | A4, B4            | A3*B3, C3         | A2*B2+C2
     5      | A5, B5            | A4*B4, C4         | A3*B3+C3
     6      | A6, B6            | A5*B5, C5         | A4*B4+C4
     7      | A7, B7            | A6*B6, C6         | A5*B5+C5
     8      | --, --            | A7*B7, C7         | A6*B6+C6
     9      | --, --            | --, --            | A7*B7+C7
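The register contents pulse by pulse can be reproduced with a small simulation (a sketch: the pipeline registers are modelled as plain variables and the operands as strings):

```python
# Simulate the three-segment pipeline computing Ai*Bi + Ci for i = 1..7.
# On each clock pulse every segment passes its result to the next one.

A = [f"A{i}" for i in range(1, 8)]
B = [f"B{i}" for i in range(1, 8)]
C = [f"C{i}" for i in range(1, 8)]

r1 = r2 = r3 = r4 = r5 = None
results = []

for pulse in range(9):                 # 7 items plus 2 drain cycles
    # Segment 3: R5 <- R3 + R4
    r5 = f"{r3}+{r4}" if r3 is not None else None
    # Segment 2: R3 <- R1 * R2, R4 <- Ci
    if r1 is not None:
        r3, r4 = f"{r1}*{r2}", C[pulse - 1]
    else:
        r3 = r4 = None
    # Segment 1: R1 <- Ai, R2 <- Bi
    if pulse < 7:
        r1, r2 = A[pulse], B[pulse]
    else:
        r1 = r2 = None
    if r5 is not None:
        results.append(r5)

print(results[0], results[-1], len(results))   # A1*B1+C1 A7*B7+C7 7
```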
Concepts of Pipelining
There are 2 areas of computer design where the pipeline
organization is applicable:
An arithmetic pipeline divides an arithmetic operation into
suboperations for execution in the pipeline segments
An instruction pipeline operates on a stream of instructions by
overlapping the fetch, decode and execute phases of the instruction
cycle
With a four-stage pipeline, the rate at which instructions are
executed is almost four times that of sequential operation
Pipelining does not speed up the time required for the
execution of an instruction
Pipelining results in more throughput (Number of Instructions
per second)
Stalling of pipeline operation
Possible reasons for Stalling of Pipeline
operation
1) Some Arithmetic operations
2) Cache miss
3) Page fault
4) Some logic operations
5) Divide operations

Effect of operation that takes more than 1
clock cycle
Why can't the pipeline operate at its
maximum theoretical speed?
Different segments may take different times to
complete their suboperation. The clock cycle must be
chosen to equal the time delay of the segment with
maximum propagation time. This causes all other
segments to waste time while waiting for the next
clock
The time delay for a pipeline circuit is usually greater
than the nonpipeline equivalent circuit
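The first point can be made concrete with a small calculation (the segment delays below are made-up values for illustration):

```python
# The pipeline clock must equal the slowest segment's delay, so the
# faster segments idle. Delays are illustrative, in nanoseconds.

delays = [30, 30, 100, 30]     # per-segment propagation times (assumed)
clock = max(delays)            # cycle time set by the slowest segment

wasted_per_cycle = sum(clock - d for d in delays)
print(clock, wasted_per_cycle)   # 100 ns cycle; 210 ns of idle time spread
                                 # across the other segments each cycle
```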

Pipelining : Summary
Pipelining is an effective way of organizing
parallel activity in a computer system
There are 2 areas of computer design where the
pipeline organization is applicable:
An arithmetic pipeline divides an arithmetic
operation into sub-operations for execution in
the pipeline segments
An instruction pipeline operates on a stream of
instructions by overlapping the fetch, decode and
execute phases of the instruction cycle

Computer Architecture: Next Step
Resource Type | Description
Book | Computer System Architecture by M. Morris Mano
Book | Computer Architecture and Organization by John P. Hayes