to
Computing System Design
Dr. Pradeep C.
Principal
Mar Baselios Christian College of Engineering and Technology, Kuttikkanam
Digital Systems
DIGITAL
CIRCUITS
The Concept of a Computer
[Figure: a computer as layers between the user and the hardware. Application software: programs the user writes and runs. Systems software: operating system, compiler, assembler. Hardware.]
Software
[Figure: software passes through the compiler and assembler to become machine instructions, which are held in the hardware's memory.]
Binary Machine Code
00000000101000010000000000011000
00000000000110000001100000100001
10001100011000100000000000000000
10001100111100100000000000000100
10101100111100100000000000000000
10101100011000100000000000000100
00000011111000000000000000001000
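FIVE PIECES
[Figure: the five pieces of computer hardware: input, output, memory, control and datapath. Control and datapath together form the central processing unit (CPU), or “processor”. The hardware sits beneath the systems software and the application software.]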
Hardware Processes Machine Code
• The user program is translated into binary machine code by the compiler and assembler and is stored in memory.
• The control unit reads the program from memory, one word at a time (fetch operation).
• The control unit deciphers the instruction bits of the program word and configures the datapath logic, which processes data and saves results in memory (decode and execute operations).
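The decode step can be sketched in C as splitting a fetched word into bit fields. This is a minimal sketch that ASSUMES the MIPS R-type field layout; the slides do not name the instruction set, and the names `rtype_fields`, `word_from_bits` and `decode_rtype` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 32-bit instruction word, ASSUMING the MIPS R-type layout:
   6-bit opcode, three 5-bit register numbers, 5-bit shift amount and
   6-bit function code. The slides do not name the instruction set. */
typedef struct {
    unsigned opcode, rs, rt, rd, shamt, funct;
} rtype_fields;

/* Convert a string of exactly 32 '0'/'1' characters into a word. */
uint32_t word_from_bits(const char *bits)
{
    uint32_t w = 0;
    int n = 0;
    for (; bits[n] != '\0'; n++) {
        assert(bits[n] == '0' || bits[n] == '1');
        w = (w << 1) | (uint32_t)(bits[n] - '0');
    }
    assert(n == 32);
    return w;
}

/* Decode: isolate each field with a shift and a mask. */
rtype_fields decode_rtype(uint32_t w)
{
    rtype_fields f;
    f.opcode = (w >> 26) & 0x3Fu;
    f.rs     = (w >> 21) & 0x1Fu;
    f.rt     = (w >> 16) & 0x1Fu;
    f.rd     = (w >> 11) & 0x1Fu;
    f.shamt  = (w >> 6)  & 0x1Fu;
    f.funct  = w & 0x3Fu;
    return f;
}
```

Under that assumed layout, the first word of the listing, 00000000101000010000000000011000, splits into opcode 0, rs 5, rt 1, rd 0, shamt 0 and funct 24. In hardware the control unit performs the same field extraction with wiring rather than shifts.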
Digital Hardware of Computer
[Figure: a control finite state machine (FSM), memory, and a datapath (arithmetic logic and registers), connected by an input/output bus.]
George Boole, 1815-1864
Born in Lincoln, England
Professor of Mathematics, Queen’s College, Cork, Ireland
Book: The Laws of Thought, 1854
Wife: Mary Everest Boole
Claude E. Shannon (1916-2001)
William Shockley (seated), John Bardeen, Walter Brattain
Jan 23, 1948: first junction transistor
Nobel Prize in Physics, 1956
Integrated Circuit (1958)
Processors: instruction flexibility; 90% area overhead (cache, predictions)
FPGA: device-wide flexibility; 99% area overhead (configuration)
ASIC: no flexibility; 20% area overhead (testing)
Reliability?
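2 Ways to Compute
[Figure: a weighted sum (“grade”, with weights 0.1, 0.1, 0.2, 0.2, 0.4, 0.4) computed two ways. In time: a processor performs one multiply or add after another, holding intermediate results in tmp. In space: an Application Specific Integrated Circuit (ASIC) lays out all the multipliers and adders side by side.]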
Processor vs ASIC
Processor: takes longer to compute (slow). ASIC: takes a shorter time to compute (fast).
Actual computation:
              AMD Opteron 64-bit processor   Full custom ASIC
Design        1 MB L2 cache                  4x4 SVD decomposition
Area          193 mm sq                      3.5 mm sq
Process       0.18 micron CMOS               90 nm CMOS
Power, clock  89 W @ 1.8 GHz                 34 mW @ 100 MHz
Throughput    ~3 Op / cycle (int op)         70 GOPS = 700 Op / cycle
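The throughput figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the per-watt comparison is a derived figure that the slide does not state. It also compares two different workloads and process nodes, so it is at best an order-of-magnitude indication.

```c
#include <assert.h>

/* Operations per clock cycle, from a throughput in ops/s and a clock in Hz. */
double ops_per_cycle(double ops_per_second, double clock_hz)
{
    assert(clock_hz > 0.0);
    return ops_per_second / clock_hz;
}

/* Throughput per watt, in operations per second per watt. */
double ops_per_second_per_watt(double ops_per_second, double watts)
{
    assert(watts > 0.0);
    return ops_per_second / watts;
}
```

For the ASIC, `ops_per_cycle(70e9, 100e6)` gives the 700 Op / cycle quoted on the slide. Taking the processor as about 3 Op / cycle at 1.8 GHz (5.4 GOPS) and 89 W, and the ASIC as 70 GOPS at 34 mW, the ASIC comes out on the order of 10^4 times more operations per watt.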
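Between Temporal & Spatial Computing
[Figure: a spectrum from temporal computing (single processor) to spatial computing (ASIC), with a question mark in between. Example of the middle ground: FPGA.]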
Introduction to FPGA
Field Programmable Gate Array
• Began as ASIC replacements
  • An ASIC that can be configured “in the field”
  • At power up, the configuration is loaded onto the chip
  • The chip acts as an ASIC until power down
• Modern FPGAs are more like computers
  • Exploit dynamic, partial reconfiguration
  • Embedded processors
• Xilinx and Altera are the two major market leaders
FPGA Principles
[Figure: an array of configurable logic blocks (CLBs) and switch boxes (SBs) joined by an interconnection network. Each CLB contains a look-up table (LUT), a D flip-flop with SET and CLR, and an output MUX.]
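A B C D | O
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 0
0 1 0 0 | 0
0 1 0 1 | 0
0 1 1 0 | 0
0 1 1 1 | 0
1 0 0 0 | 0
1 0 0 1 | 0
1 0 1 0 | 0
1 0 1 1 | 0
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1
[Figure: a CLB with inputs A, B, C and D. The O column of the table is stored as the LUT configuration bits; a further configuration bit sets the MUX that selects between the LUT output and the D flip-flop output.]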
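A LUT is a small memory addressed by its inputs, so the truth table above (O is 1 only when A, B, C and D are all 1) can be stored as 16 configuration bits. The following is a minimal sketch; the function name `lut4`, the constant `LUT4_AND` and the bit ordering are assumptions made for illustration, not a vendor format.

```c
#include <assert.h>
#include <stdint.h>

/* A 4-input LUT modelled as 16 configuration bits. Bit k holds the output
   for the truth-table row whose inputs A B C D, read as a binary number
   with A as the most significant bit, equal k. This ordering is an
   ASSUMPTION; real devices document their own. */
int lut4(uint16_t config, int a, int b, int c, int d)
{
    assert((a == 0 || a == 1) && (b == 0 || b == 1) &&
           (c == 0 || c == 1) && (d == 0 || d == 1));
    unsigned row = (unsigned)(a << 3 | b << 2 | c << 1 | d);
    return (int)((config >> row) & 1u);
}

/* Configuration for the table above: only row 15 (1 1 1 1) outputs 1. */
#define LUT4_AND 0x8000u
```

Loading a different 16-bit constant turns the same hardware into any other 4-input function, which is the point of the configuration bits.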
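Example 2: Find the configuration bits for the following circuit
[Figure: inputs A0 and A1 feed a 2-to-1 MUX with select S; the MUX output drives the D input of a flip-flop clocked by Clock. The circuit is to be mapped onto a CLB whose LUT inputs are A0, A1 and S.]
A0 A1 S | Configuration bits
0  0  0 |
0  0  1 |
0  1  0 |
0  1  1 |
1  0  0 |
1  0  1 |
1  1  0 |
1  1  1 |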
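One way to approach an exercise like this is to evaluate the target function on every row of the truth table and record the outputs as LUT bits. The sketch below does that for a 3-input LUT. It ASSUMES that S = 0 selects A0 and S = 1 selects A1, which the slide does not specify, and the names `mux2` and `lut3_config` and the bit ordering are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-to-1 MUX: ASSUMES S = 0 selects A0 and S = 1 selects A1. */
int mux2(int a0, int a1, int s)
{
    return s ? a1 : a0;
}

/* Fill a 3-input LUT by evaluating f on every truth-table row. Bit k of
   the result is the output for the row whose inputs A0 A1 S, read as a
   binary number with A0 as the most significant bit, equal k
   (an assumed ordering). */
uint8_t lut3_config(int (*f)(int, int, int))
{
    assert(f != 0);
    uint8_t config = 0;
    for (unsigned row = 0; row < 8; row++) {
        int a0 = (int)((row >> 2) & 1u);
        int a1 = (int)((row >> 1) & 1u);
        int s  = (int)(row & 1u);
        if (f(a0, a1, s))
            config |= (uint8_t)(1u << row);
    }
    return config;
}
```

Under these assumptions `lut3_config(mux2)` sets the bits for rows 0 1 1, 1 0 0, 1 1 0 and 1 1 1. The sketch covers the LUT only; the CLB's output MUX bit, which selects the registered output, is a separate configuration bit.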
Interconnection Network
[Figure: a switch box (SB) between CLBs, with the configuration bits that set which of its connections are made.]
[Figure: the circuit to be mapped: Input1, Input2 and Input3 feed logic and a D flip-flop that drives Output. The target fabric is CLB0, SB0, CLB1 over CLB2, SB4, CLB3.]
CLBs required
[Figure: the circuit split into two parts, CLB 1 and CLB 2, each shown with the configuration bits of its LUT, MUX and flip-flop.]
[Figure: placement of the two parts on the fabric, with Input2, Input3 and Output at the edges.]
Routing: Select path
[Figure: the selected path through the fabric (CLB0, SB0, CLB1, SB1 over CLB2, SB4, CLB3) for Input1, Input2, Input3 and Output, with the configuration bits of switch boxes SB0 and SB4 that make the connections.]
Configuration Bitstream
• Compile-time Reconfiguration
  • One configuration per application
  • System must be halted and then restarted with new program
  • Most common approach
Dynamic Reconfiguration
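[Figure: a base system in the static area (MicroBlaze, flash controller, JTAG controller, external I/O) next to a reconfigurable area. Bitstreams for Module A and Module B are held in configuration storage; on request, one module is enabled in the reconfigurable area and the other is disabled.]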
1. System controller does not need to be placed in an external device
2. Access to fast Internal Configuration Access Port (ICAP – 32 bits, 100 MHz)
3. Smaller partial bitstreams
4. No need to halt complete system when reconfiguring a module
5. Time multiplexing of FPGA resources, load and unload HW modules on demand
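Points 2 and 3 above together bound the reconfiguration time: a 32-bit port clocked at 100 MHz moves at most 4 bytes every 10 ns. The sketch below computes that lower bound. It ASSUMES one word is written on every clock cycle with no stalls or protocol overhead, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ICAP_WIDTH_BYTES 4u        /* 32-bit port, from the slide */
#define ICAP_CLOCK_HZ 100000000u   /* 100 MHz, from the slide */

/* Peak ICAP throughput in bytes per second: one word per clock cycle. */
uint64_t icap_peak_bytes_per_second(void)
{
    return (uint64_t)ICAP_WIDTH_BYTES * ICAP_CLOCK_HZ;
}

/* Lower bound on the time to load a partial bitstream, in nanoseconds,
   ASSUMING one word is written every cycle with no stalls or overhead. */
uint64_t icap_min_load_time_ns(uint64_t bitstream_bytes)
{
    assert(bitstream_bytes > 0);
    uint64_t words = (bitstream_bytes + ICAP_WIDTH_BYTES - 1) / ICAP_WIDTH_BYTES;
    uint64_t ns_per_cycle = 1000000000u / ICAP_CLOCK_HZ;   /* 10 ns */
    return words * ns_per_cycle;
}
```

A 400,000-byte partial bitstream, for example, needs at least 1 ms under these assumptions, which is why smaller partial bitstreams translate directly into faster module swaps.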
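Relocation and Defragmentation
[Figure: configurations c1 to c5 placed on the device before and after relocation/defragmentation, which moves them together to free a contiguous region for a new configuration.]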
Reliability?
Reliability:
Fault Tolerant Computing
After a fault event:
• Software-based fault detection & compensation: specific. Works only for transient faults!
• HW logic & RT-level detection & compensation: universal. Typically works for transient and permanent faults!
[Figure: the input signal feeds Execution Unit 1, Execution Unit 2 and Execution Unit 3 in parallel; a comparator/voter outputs the majority result and an error-detect signal.]
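The voter in this three-unit arrangement (commonly called triple modular redundancy, a name the slide does not use) can be sketched as a bitwise majority function. The type and function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t result;   /* bitwise majority of the three unit outputs */
    int error;         /* 1 if any unit disagrees with another */
} vote_result;

/* Each result bit is 1 when at least two of the three inputs have it set. */
vote_result majority_vote(uint32_t u1, uint32_t u2, uint32_t u3)
{
    vote_result v;
    v.result = (u1 & u2) | (u1 & u3) | (u2 & u3);
    v.error = (u1 != u2) || (u1 != u3);
    assert(v.error || v.result == u1);   /* no error means all three agree */
    return v;
}
```

If a single unit is faulty, the other two outvote it and `result` is still correct, while `error` flags that a repair is needed. If all three outputs differ, the bitwise majority need not match any of them, so this scheme masks one faulty unit at a time.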
Properties of MORe
Algorithm
Increased number of
permanent fault recovery.
[Figure: SAD component with inputs A and B (256-byte arrays) and go, and integer output sad.]
• Want a fast sum-of-absolute-differences (SAD) component
• When go=1, sums the absolute differences of element pairs in arrays A and B and outputs that sum
RTL Example: Video Compression – Sum of Absolute Differences
Inputs: A, B (256 byte memory); go (bit)
Outputs: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits)
S0: wait for go
S1: initialize sum and index (sum = 0; i = 0)
S2: check if done (i >= 256)
S3: add difference to sum, increment index (sum = sum + abs(A[i]-B[i]); i = i + 1)
S4: done, write to output sad_reg (sad_reg = sum)
[Figure: state diagram. S0 loops while !go and moves to S1 on go; S1 moves to S2; S2 moves to S3 while i<256 and to S4 when (i<256)’; S3 returns to S2.]
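The state machine above can be stepped in software to check its behaviour. The sketch below is a minimal simulation, not the hardware itself: it ASSUMES go is already 1 when the machine starts in S0 and stops once S4 has written sad_reg. The names `sad_state` and `sad_state_machine` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { S0, S1, S2, S3, S4 } sad_state;

/* Steps the state machine once per loop iteration, starting in S0 with go
   already asserted, and stops when S4 has written sad_reg. *states_visited
   receives the number of states stepped through, one per iteration. */
uint32_t sad_state_machine(const uint8_t A[256], const uint8_t B[256],
                           unsigned *states_visited)
{
    assert(A != NULL && B != NULL && states_visited != NULL);
    sad_state state = S0;
    uint32_t sum = 0, sad_reg = 0;
    uint16_t i = 0;              /* needs 9 bits: must be able to hold 256 */
    unsigned steps = 0;

    for (;;) {
        steps++;
        switch (state) {
        case S0: state = S1; break;                     /* go == 1 */
        case S1: sum = 0; i = 0; state = S2; break;
        case S2: state = (i < 256) ? S3 : S4; break;
        case S3:
            sum += (uint32_t)abs(A[i] - B[i]);
            i++;
            state = S2;
            break;
        case S4:
            sad_reg = sum;
            *states_visited = steps;
            return sad_reg;
        }
    }
}
```

For 256 elements, S2 is visited 257 times (the last visit is the exit test) and S3 256 times, so the loop dominates the step count. The index needs 9 bits because it must reach 256 for that exit test to succeed.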
RTL Example: Video Compression – Sum of Absolute Differences
Step 2: Create datapath
[Figure: datapath. The 9-bit register i (controls i_inc, i_clr) drives AB_addr and a <256 comparator that outputs i_lt_256. The 8-bit A_data and B_data feed a subtractor followed by an abs unit. A 32-bit adder accumulates the result into the 32-bit register sum (controls sum_ld, sum_clr). The 32-bit register sad_reg (control sad_reg_ld) drives the output sad.]
RTL Example: Video Compression – Sum of Absolute Differences
Step 3: Connect to controller
State | High-level action               | Controller signals
S0    | wait (loops while go’)          |
S1    | sum=0; i=0                      | sum_clr=1; i_clr=1
S2    | branch on i<256 or !(i<256)     | tests i_lt_256
S3    | sum=sum+abs(A[i]-B[i]); i=i+1   | sum_ld=1; AB_rd=1; i_inc=1
S4    | sad_reg=sum                     | sad_reg_ld=1
[Figure: the controller, with input go and output AB_rd, connected to the datapath through i_lt_256, i_inc, i_clr, sum_ld, sum_clr and sad_reg_ld.]
Step 4: Replace high-level state machine by FSM
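Once the high-level actions are replaced by control signals, the controller is an ordinary FSM: an output function and a next-state function. The sketch below writes both in C. The S4 to S0 arc is an ASSUMPTION (it is not legible in the slide), and the type and function names are hypothetical.

```c
#include <assert.h>

typedef enum { S0, S1, S2, S3, S4 } sad_state;

typedef struct {
    int sum_clr, sum_ld, i_clr, i_inc, AB_rd, sad_reg_ld;
} sad_controls;

/* Moore-style outputs: the control signals depend on the current state only. */
sad_controls sad_control_outputs(sad_state s)
{
    sad_controls c = {0, 0, 0, 0, 0, 0};
    assert(s == S0 || s == S1 || s == S2 || s == S3 || s == S4);
    switch (s) {
    case S1: c.sum_clr = 1; c.i_clr = 1; break;
    case S3: c.sum_ld = 1; c.AB_rd = 1; c.i_inc = 1; break;
    case S4: c.sad_reg_ld = 1; break;
    default: break;   /* S0 and S2 assert no control signal */
    }
    return c;
}

/* Next state from the current state and the two status inputs.
   S4 -> S0 is an ASSUMPTION: that arc is not legible in the slide. */
sad_state sad_next_state(sad_state s, int go, int i_lt_256)
{
    switch (s) {
    case S0: return go ? S1 : S0;
    case S1: return S2;
    case S2: return i_lt_256 ? S3 : S4;
    case S3: return S2;
    default: return S0;
    }
}
```

Each function is pure combinational logic over a handful of bits, which is what lets a synthesis tool turn the controller into gates plus a small state register.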
RTL Example: Video Compression – Sum of Absolute Differences
Comparing software and custom circuit SAD
• Circuit: two states (S2 and S3) for each i, 256 i’s: 256*2 = 512 clock cycles
• Software: loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i; say about 6 cycles per array item: 256*6 = 1536 cycles
• The circuit is about 3 times as fast (1536/512 = 3)
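Behavioral Level Design: C to Gates
C code

    #include <stdlib.h>   /* abs */

    typedef unsigned char byte;

    unsigned int SAD(const byte A[256], const byte B[256])
    {
        unsigned int sum;    /* running sum of absolute differences */
        unsigned short i;    /* array index, counts 0 to 256 */

        sum = 0;
        i = 0;
        while (i < 256) {
            sum = sum + abs(A[i] - B[i]);
            i = i + 1;
        }
        return sum;
    }

[Figure: the equivalent state machine. S0: wait while !go; S1: sum = 0, i = 0; S2: test i<256; S3: sum=sum+abs(A[i]-B[i]), i=i+1; S4: sad_reg = sum.]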
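    if (X > Y) {
        Max = X;
    }
    else {
        Max = Y;
    }

[Figure: state-machine template for if-else. A condition state branches to the (then stmts) state, here Max=X, when X>Y holds, and to the (else stmts) state, here Max=Y, when it does not; both paths join at (end).]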
[Figure: FPGA design flow. Design steps: synthesis, implementation, download. Matching verification steps: functional simulation, timing simulation, in-circuit verification.]
Synthesis
Synthesize the design to create an FPGA netlist.
[Figure: design flow with the synthesis step highlighted. Design: HDL (Verilog HDL), synthesis, implementation, download. Verification: behavioral simulation, functional simulation, timing simulation, in-circuit verification.]
Implementation
Translate, place and route, and generate a bitstream to download in the FPGA.
[Figure: design flow with the implementation step highlighted. Design: HDL (Verilog HDL), synthesis, implementation, download. Verification: behavioral simulation, functional simulation, timing simulation, in-circuit verification.]
On-Chip Verification
ChipScope ILA System Diagram
[Figure: a PC running ChipScope connects over JTAG (MultiLINX Cable or Parallel Cable III) to the target board. In the target FPGA, the user functions are instrumented with ILA cores, which are reached from the JTAG connection through a control block.]
FPGA Development Boards
An FPGA-based development platform with a large FPGA and I/O devices
to support a wide range of digital circuits, including a complete computer
system.
Applications of Reconfigurable
Systems
• Space Missions
• Defence
• Adaptive Embedded Systems
• Cognitive Computing
• Entertainment
LIST OF PUBLICATIONS
Jisha, M., Pradeep,C., Intelligent Selective Modular Redundancy for Online Fault
Detection of Adders in FPGA. International Journal of High Performance Systems
Architecture (IJHPSA). Accepted. Inderscience Publishers.
Saranya, R., Pradeep, C., Design and Implementation of a Reconfigurable Finite Impulse
Response Filter for Adaptive Systems. International Journal of Computational
Systems Engineering (IJCSyE). Under Review. Inderscience Publishers.
Eapen, M.E., Pradeep, C., Varghese, A.A. and Nair, J.M., 2016. Placement Strategies for
Faulty Cells in Module Relocation Based BISR Approach. Innovations in Bio-Inspired
Computing and Applications (pp. 437-446). Springer International Publishing.
Anjana, S., Pradeep, C. and Samuel, P., 2015. Synthesize of High Speed Floating-point
Multipliers Based on Vedic Mathematics. Procedia Computer Science, 46, pp.1294-
1302. Elsevier Publishing.
Baby, N., Pradeep, C., Saranya, R. and Radhakrishnan, R., 2015. Synthesis of Reconfigurable Video
Compression Modules in Virtex FPGAs for Multiple Fault Repair Mechanism. Procedia Computer
Science, 46, pp.1333-1340. Elsevier Publishing.
Saranya, R., Pradeep, C., Baby, N. and Radhakrishnan, R., 2015. FPGA Synthesis of Reconfigurable
Modules for FIR Filter. International Journal of Reconfigurable and Embedded Systems
(IJRES), 4(2).
Baby, N. and Pradeep, C., 2014, July. FPGA partitioning and synthesis of reconfigurable video
compression module. In Control, Instrumentation, Communication and Computational
Technologies (ICCICCT), 2014 International Conference on (pp. 360-364). IEEE Xplore.
Saranya, R. and Pradeep, C., 2014, July. FPGA synthesis of area efficient data path for reconfigurable
FIR filter. In Control, Instrumentation, Communication and Computational Technologies (ICCICCT),
2014 International Conference on (pp. 349-354). IEEE Xplore.
Anjana, S. and Pradeep, C., 2014, July. High speed integer multiplier designs for reconfigurable
systems. In Control, Instrumentation, Communication and Computational Technologies (ICCICCT),
2014 International Conference on (pp. 393-397). IEEE Xplore.
Reshma Mary John, Pradeep C., 2013.”Responsive Back-Up Circuits (RBC) Inspired Fault-Recovery
Algorithm for Reconfigurable Systems”, Proceedings of U.G.C Sponsored III National Conference
on Modern Trends in Electronic Communication & Signal Processing.
Ajith Ravindran, Soya Treesa Jose and Pradeep C "A 1.5V Area Efficient Asynchronous Adder using
MODL and Double Pass Transistor Logic" Proceedings of International Conference on Global
Innovation in Technology and Sciences (ICGITS 2013),4-6th April 2013.
Reshma Mary John, Pradeep C., 2013 ”Self-Repairing Algorithm with Shared Spare Allocation for
Reconfigurable Systems”, International Journal of Emerging Technology and Advanced
Engineering. Volume 3, Issue 8, pp 716-721.
Ajith Ravindran, Soya Treesa Jose and Pradeep C"A 1.5V Area Efficient Asynchronous Adder using
MODL and Double Pass Transistor Logic" International Journal of Scientific & Engineering
Research, Volume 4, Issue 8, August 2013
Jose, S.T. And Pradeep, C,2013 "Design of a multichannel NAND Flash memory controller for efficient
utilization of bandwidth in SSD's "proceedings of International Multi-Conference on Automation,
Computing, Communication, Control and Compressed Sensing (iMac4s), 22-23 March
2013,Kottayam,India.pp 235 - 239. IEEE Xplore
Oommen,D. And Pradeep,C.,2012 "Reconfigurable router using RLBS algorithm " Proceedings of 12th
International Conference on Intelligent Systems Design and Applications (ISDA). 27-29 Nov.2012,
Kochi, India. pp 332 - 336. IEEE Xplore
Pradeep, C, Radhakrishnan, R & Philip Samuel 2014, ‘Reduced Time Testing Method for Permanent
Faults in Interconnects of Reconfigurable Hardware’, Proceedings of International Conference On
Systemic, Cybernetics and Informatics, vol. 1 & 2, pp. 018-022.
Pradeep, C, Radhakrishnan, R & Philip Samuel 2014, ‘Fault Recovery Algorithm Using King Spare
Allocation and Shortest Path Shifting for Reconfigurable Systems’, Journal of Theoretical and
Applied Information Technology, vol. 61, no.2, pp 254-261.
Pradeep, C, Radhakrishnan, R 2014, ‘FPGA Evaluation of Reconfigurable Modules with Self Repair
Mechanism’, International Journal of Reconfigurable and Embedded Systems, vol.3, no.2,pp.1-12.
Pradeep, C, Radhakrishnan, R, Saranya, R & Philip Samuel 2014, ‘Area Efficient Data Path with Online
Fault Detection Mechanism for Reconfigurable Systems’, Australian Journal of Basic and Applied
Sciences, vol.8, no.10, pp. 239-245.
Pradeep, C, Radhakrishnan, R, Neena Baby & Philip Samuel, ‘Multi objective Built in Self Repair
Algorithm with Multiple Fault Detection for Reconfigurable Systems’ ,Journal of Theoretical and
Applied Information Technology, vol. 69, no.2,pp.248-256.
Pradeep, C, Radhakrishnan, R 2014, ‘Fault Detection Methods for Interconnects of Reconfigurable
Hardware’, I-manager’s Journal on Embedded systems, vol.4, no.2, pp.1-11.
Pradeep C,"Design and Implementation of 32 bit RISC Processor in FPGA" Proceedings of National
Conference NCACS 2009, SJCET, Pala, Kottayam. pp 5-10.
Pradeep C, “Verilog HDL implementation of Superscalar Processor with Speculative branch Prediction",
Proceedings of National Conference NC-(ET) 2, SAINTGITS College of Engineering, Kottayam. pp
315-318.
Pradeep C, NIMISHA SUBHASH, RESHMA MARY JOHN, 2013.”Permanent Fault Detection Method for
Interconnects in Reconfigurable Systems”, Proceedings of U.G.C Sponsored III National
Conference on Modern Trends in Electronic Communication & Signal Processing.
Research labs
1. http://rise.cse.iitm.ac.in/rise1/index.html
2. https://ece.gmu.edu/research-interests/reconfigurable-computing
3. https://www.cs.washington.edu/affiliates/abstracts/vlsi/vlsi.abstracts.html
4. http://brass.cs.berkeley.edu/
5. http://www.ece.auckland.ac.nz/en/about/our-research/research-areas/parallelandreconfigurablecomputingresearchgroup.html
The End