
SPECIFICATION AND DESIGN OF EMBEDDED SYSTEMS

by

Daniel D. Gajski, Frank Vahid, Sanjiv Narayan, Jie Gong

University of California at Irvine
Department of Computer Science
Irvine, CA 92715-3425


Copyright (c) 1994 Daniel D. Gajski, Frank Vahid, Sanjiv Narayan, and Jie Gong

UC Irvine

Design representations

Behavioral
Represents functionality but not implementation

Structural
Represents connectivity but not dimensionality

Physical
Represents dimensionality but not functionality

Introduction


Levels of abstraction

Level: Transistor
Behavioral forms: differential eq., current-voltage diagrams
Structural components: transistors, resistors, capacitors
Physical objects: analog and digital cells

Level: Gate
Behavioral forms: Boolean equations, finite-state machines
Structural components: gates, flip-flops
Physical objects: modules, units

Level: Register
Behavioral forms: algorithms, flowcharts, instruction sets, generalized FSM
Structural components: adders, comparators, registers, counters, register files, queues
Physical objects: microchips, ASICs

Level: Processor
Behavioral forms: executable spec., programs
Structural components: processors, controllers, memories, ASICs
Physical objects: PCBs, MCMs


Design methodologies

Capture-and-simulate
Schematic capture
Simulation

Describe-and-synthesize
Hardware description language
Behavioral synthesis
Logic synthesis

Specify-explore-refine
Executable specification
Software and hardware partitioning
Estimation and exploration
Specification refinement


Motivation
[Figure: design flow from an executable specification (e.g. "if (x = 0) then y = a*b/2") to a system implementation (processor, memory, video accelerator, ASIC, I/O). Steps shown: models, languages; partitioning, estimation, refinement; software compilation, behavioral synthesis, logic synthesis; physical design, test generation, manufacturing.]


Outline

Introduction
Design models and architectures
System-design languages
An example
Translation
Partitioning
Estimation
Refinement
Methodology and environments

Models and architectures


Models (Specification)

Specification + Constraints

Design process

Implementation

Architectures (Implementation)

Models are conceptual views of the system's functionality
Architectures are abstract views of the system's implementation


Models and architectures

Model: a set of functional objects and rules for composing these objects
Architecture: a set of implementation components and their connections


Models of an elevator controller


"If the elevator is stationary and the floor requested is equal to the current floor, then the elevator remains idle. If the elevator is stationary and the floor requested is less than the current floor, then lower the elevator to the requested floor. If the elevator is stationary and the floor requested is greater than the current floor, then raise the elevator to the requested floor." (a) English description

loop
    if (req_floor = curr_floor) then
        direction := idle;
    elsif (req_floor < curr_floor) then
        direction := down;
    elsif (req_floor > curr_floor) then
        direction := up;
    end if;
end loop;

(b) Algorithmic model

States: Down, Idle, Up

From each state:
(req_floor < curr_floor) / direction := down, go to Down
(req_floor = curr_floor) / direction := idle, go to Idle
(req_floor > curr_floor) / direction := up, go to Up

(c) State-machine model
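
The three elevator controller models above describe the same behavior. As an illustration only (the slides give no executable code), the comparison of requested and current floor can be sketched in Python. The names `Direction`, `next_direction`, and `run` are assumptions of this sketch, as is the simplification that the elevator reaches each requested floor before the next request arrives.

```python
from enum import Enum


class Direction(Enum):
    IDLE = "idle"
    DOWN = "down"
    UP = "up"


def next_direction(req_floor: int, curr_floor: int) -> Direction:
    """One controller step: compare the floors and pick a direction."""
    if req_floor == curr_floor:
        return Direction.IDLE
    if req_floor < curr_floor:
        return Direction.DOWN
    return Direction.UP


def run(requests, curr_floor=1):
    """Serve requests in order and record the direction chosen for each."""
    trace = []
    for req_floor in requests:
        trace.append(next_direction(req_floor, curr_floor))
        curr_floor = req_floor  # assumption: the move completes before the next request
    return trace
```

For example, `run([3, 3, 1])` starting at floor 1 chooses up, then idle, then down.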


Architectures for implementing the elevator controller

[Figure: (a) Register level: combinational logic and a state register, with inputs req_floor and curr_floor and output direction. (b) System level: a processor and memory on a bus, with in/out ports for req_floor, curr_floor, and direction.]


Models

State-oriented models
Finite-state machine (FSM), Petri net, Hierarchical concurrent FSM

Activity-oriented models
Data flow graph, Flowchart

Structure-oriented models
Block diagram, RT netlist, Gate netlist

Data-oriented models
Entity-relationship diagram, Jackson's diagram

Heterogeneous models
Control/data flow graph, Structure chart, Programming language paradigm, Object-oriented paradigm, Program-state machine, Queueing model


State oriented: Finite-state machine (Mealy model)

[Figure: Mealy state diagram with states S1 (start), S2, S3. Arcs are labeled input/output: r1/n, r2/u1, r3/u2, r1/d1, r2/n, r3/u1, r1/d2, r2/d1, r3/n.]

S = {s1, s2, s3}
I = {r1, r2, r3}
O = {d2, d1, n, u1, u2}
f: S x I -> S
h: S x I -> O
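
A Mealy machine is fully described by its next-state function f and output function h. The sketch below encodes both as Python dictionaries for a three-floor elevator. The transition table is an assumption made for illustration: request ri is taken to move the elevator to state si, and the output names how far it travels (d2, d1, n, u1, u2). The helper names `build_tables` and `run_mealy` are hypothetical.

```python
# Assumed interpretation: states are floors, inputs are requested floors.
STATES = ("s1", "s2", "s3")
INPUTS = ("r1", "r2", "r3")


def build_tables():
    """Build f: S x I -> S and h: S x I -> O for the assumed elevator."""
    f, h = {}, {}
    for s_idx, state in enumerate(STATES, start=1):
        for r_idx, request in enumerate(INPUTS, start=1):
            f[(state, request)] = f"s{r_idx}"   # next state is the requested floor
            delta = r_idx - s_idx               # signed number of floors to travel
            if delta == 0:
                h[(state, request)] = "n"       # no movement
            elif delta > 0:
                h[(state, request)] = f"u{delta}"
            else:
                h[(state, request)] = f"d{-delta}"
    return f, h


def run_mealy(inputs, start="s1"):
    """Return the outputs; each one depends on the current state and the input."""
    f, h = build_tables()
    state, outputs = start, []
    for symbol in inputs:
        outputs.append(h[(state, symbol)])
        state = f[(state, symbol)]
    return outputs
```

Because the output is attached to the transition rather than to the state, three states are enough here; a Moore machine for the same behavior needs a separate state for each (floor, output) combination.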


State oriented: Finite-state machine (Moore model)

[Figure: Moore state diagram with nine states, each labeled state/output: S11/d2, S12/d1, S13/n, S21/d1, S22/n, S23/u1, S31/n, S32/u1, S33/u2. Arcs are labeled with the inputs r1, r2, r3.]


State oriented: Finite-state machine with datapath

[Figure: a single state S1 (start) with two self-loop transitions]

(curr_floor != req_floor) / output := req_floor - curr_floor; curr_floor := req_floor
(curr_floor = req_floor) / output := 0
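
With a datapath, the floor numbers become variables and a single state suffices. A minimal Python sketch of one transition follows; it assumes the output is the signed difference req_floor - curr_floor, and `fsmd_step` is a hypothetical name.

```python
def fsmd_step(curr_floor, req_floor):
    """One transition of the single-state FSMD; returns (output, new curr_floor)."""
    if curr_floor != req_floor:
        output = req_floor - curr_floor  # signed number of floors to move
        curr_floor = req_floor           # datapath variable update
    else:
        output = 0
    return output, curr_floor
```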


Finite-state machines

Merits:
represent the system's temporal behavior explicitly
suitable for control-dominated systems

Demerits:
lack of hierarchy and concurrency resulting in state or arc explosion when representing complex systems


State oriented: Petri nets

[Figure: Petri net with places p1 to p5 and transitions t1 to t4]

Net = (P, T, I, O, u)
P = {p1, p2, p3, p4, p5}
T = {t1, t2, t3, t4}
I: I(t1) = {p1}, I(t2) = {p2, p3, p5}, I(t3) = {p3}, I(t4) = {p4}
O: O(t1) = {p5}, O(t2) = {p3, p5}, O(t3) = {p4}, O(t4) = {p2, p3}
u: u(p1) = 1, u(p2) = 1, u(p3) = 2, u(p4) = 0, u(p5) = 1
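
The net definition above is enough to simulate token movement. The following Python sketch uses the standard firing rule and assumes every arc has weight 1 (the slide does not give arc weights); the names `INPUTS`, `OUTPUTS`, `INITIAL_MARKING`, `enabled`, and `fire` are choices of this sketch.

```python
# Input and output places of each transition, copied from I and O above.
INPUTS = {"t1": ["p1"], "t2": ["p2", "p3", "p5"], "t3": ["p3"], "t4": ["p4"]}
OUTPUTS = {"t1": ["p5"], "t2": ["p3", "p5"], "t3": ["p4"], "t4": ["p2", "p3"]}
INITIAL_MARKING = {"p1": 1, "p2": 1, "p3": 2, "p4": 0, "p5": 1}


def enabled(marking, transition):
    """A transition is enabled when every input place holds at least one token."""
    return all(marking[place] >= 1 for place in INPUTS[transition])


def fire(marking, transition):
    """Remove one token from each input place and add one to each output place."""
    if not enabled(marking, transition):
        raise ValueError(f"{transition} is not enabled")
    new_marking = dict(marking)
    for place in INPUTS[transition]:
        new_marking[place] -= 1
    for place in OUTPUTS[transition]:
        new_marking[place] += 1
    return new_marking
```

In the initial marking t1, t2, and t3 are enabled, while t4 is not because p4 is empty.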


Petri nets

[Figure: Petri net fragments modeling (a) sequence, (b) branch, (c) synchronization, (d) resource contention, (e) concurrency]


Petri nets

Merits:
good at modeling and analyzing concurrent systems

Demerits:
flat model that is incomprehensible when system complexity increases


State oriented: Hierarchical concurrent FSM

[Figure: hierarchical concurrent FSM with states Y, A, D, E, F and arcs labeled u, r, s, a, and a(P)/c]


Hierarchical concurrent FSMs

Merits:
support both hierarchy and concurrency
good for representing complex systems

Demerits:
concentrate only on modeling control aspects and not data and activities


Activity oriented: Data flow graphs (DFG)

[Figure: data flow graphs at (a) activity level and (b) operation level. Labels shown: activities A1, A2, A2.1, A2.2, A2.3; Input, Output, File; data flows V, W, X, Y, Z.]


Data flow graphs

Merits:
support hierarchy
suitable for specifying complex transformational systems
represent problem-inherent data dependencies

Demerits:
do not express temporal behaviors or control sequencing
weak for modeling embedded systems


Activity oriented: Flowchart (CFG)

start
J = 1; MAX = 0
J > N?  Yes: go to end.  No: continue.
MEM(J) > MAX?  Yes: MAX = MEM(J).  No: continue.
J = J + 1; return to the J > N test
end
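
The flowchart above scans a memory for its largest entry. A direct Python transcription is sketched below; `find_max` is a hypothetical name, and the memory is passed as a list whose index 0 is unused so that entries can be addressed as MEM(1) to MEM(N).

```python
def find_max(mem, n):
    """Follow the flowchart: scan MEM(1..N) and keep the largest value in MAX."""
    j = 1
    maximum = 0              # MAX = 0, so the result is 0 if no entry is positive
    while not j > n:         # the "J > N" test; "Yes" leaves the loop
        if mem[j] > maximum:
            maximum = mem[j]
        j = j + 1
    return maximum
```

The explicit counter and loop test mirror the control flow of the chart; the order of the steps is imposed by the flowchart rather than by data dependencies.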


Flowcharts

Merits:
useful to represent tasks governed by control flow
can impose an order to supersede natural data dependencies

Characteristics:
used only when the system's computation is well known


Structure oriented: Component-connectivity diagrams

[Figure: component-connectivity diagrams: (a) block diagram, (b) RT netlist, (c) gate netlist. Components shown: processor, program memory, data memory, I/O coprocessor, application-specific hardware, system bus; register file, ALU, LIR, RIR, left bus, right bus.]


Component-connectivity diagrams

Merits:
good at representing a system's structure

Characteristics:
often used in the later phases of the design process


Data oriented: Entity-relationship diagram

[Figure: entity-relationship diagram with nodes Supplier, Product, Customer, Order, P.O. instance, Availability, and Request]


Entity-relationship diagrams

Merits:
provide a good view of the data in the system
suitable for expressing complex relations among various kinds of data

Demerits:
do not describe any functional or temporal behavior of the system.


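Data oriented: Jackson's diagram

[Figure: Jackson's diagram of a Drawing, built with AND (composition), OR (selection), and * (iteration) nodes over Color, Shape, Name, Users, Circle (Radius), and Rectangle (Width, Height)]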


Jackson's diagrams

Merits:
suitable for representing data having a complex composite structure.

Demerits:
do not describe any functional or temporal behavior of the system.


Heterogeneous: Control/data flow graph

[Figure: (a) Activity level: a control state machine with states S0, S1, S2 enables and disables the data flow activities A1, A2, A3. Its arcs are labeled: start / enable A1, enable A2; W = 10 / disable A1, enable A3; stop / disable A2, disable A3. (b) Operation level: a control flow graph sequences data flow graphs (Read, Const, +, Write) for statements such as X := X + 2, A := X + 5, A := X + 3, A := X + W.]


Control/data flow graphs

Merits:
correct the inability of DFG to represent the control of a system
correct the inability of CFG to represent data dependencies


Heterogeneous: Structure chart

[Figure: structure chart with modules Main, Get, Transform, Compute, Out_C, Get_A, Get_B, Change_A, Change_B, Do_Loop1, and Do_Loop2. Arcs carry data (A, B, C, D) and control; branch and iteration constructs are marked.]


Structure charts

Merits:
represent both data and control

Characteristics:
used in the preliminary stages of program design


Heterogeneous: Programming languages

Imperative vs declarative programming languages:
Imperative: C, Pascal, Ada, C++, etc.
Declarative: LISP, PROLOG, etc.

Sequential vs concurrent programming languages:
Sequential: Pascal, C, etc.
Concurrent: CSP, Ada, VHDL, etc.


Programming languages

Merits:
model data, activity, and control

Demerits:
do not explicitly model the system's states


Heterogeneous: Object-oriented paradigm

[Figure: three objects, each encapsulating data and operations, linked by a transformation function]


Object-oriented paradigms

Merits:
support information hiding, inheritance, natural concurrency

Demerits:
not suitable for systems with complicated transformation functions


Heterogeneous: Program-state machine

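[Figure: program-state machine with states Y, A, B, D and arcs e1, e2, e3. Declarations and a leaf behavior are given as program code:]

variable A: array[1..20] of integer

variable i, max: integer;
max = 0;
for i = 1 to 20 do
    if (A[i] > max) then
        max = A[i];
    end if;
end for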


Program-state machines

Merits:
represent the system's states, data, control, and activities in a single model
overcome the limitations of programming languages and HCFSM models


Heterogeneous: Queueing model

[Figure: (a) One server: arriving requests wait in a queue for a single server. (b) Multiple servers: arriving requests are served by several servers.]


Queueing model

Characteristics:
used for analyzing a system's performance; can find utilization, queueing length, and throughput


Architectures

Application-specific architectures
Controller architecture, Datapath architecture, Finite-state machine with datapath (FSMD)

General-purpose processors
Complex instruction set computer (CISC)
Reduced instruction set computer (RISC)
Vector machine
Very long instruction word computer (VLIW)

Parallel processors


Controller architecture

[Figure: controller architecture with a state register, a next-state function, and an output function; the external signals are Inputs and Outputs]


Datapath architecture
[Figure: datapath that computes y(i) from inputs x(i), x(i-1), x(i-2), x(i-3) and coefficients b(0), b(1), b(2), b(3) using multipliers (*) and adders (+), drawn with pipeline stages as (a) a three-stage pipeline and (b) a four-stage pipeline]
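
The datapath figure pairs each input sample with a coefficient, which suggests a four-tap weighted sum y(i) = b(0)x(i) + b(1)x(i-1) + b(2)x(i-2) + b(3)x(i-3). That formula is an assumption of this sketch, since the slide shows only the operator structure. The Python below computes the same function sequentially, without modeling the pipeline stages; `fir_output` and `fir` are hypothetical names.

```python
def fir_output(x, b, i):
    """y(i) = sum of b(k) * x(i-k); samples before x(0) are taken as 0."""
    total = 0
    for k, coeff in enumerate(b):
        if i - k >= 0:
            total += coeff * x[i - k]
    return total


def fir(x, b):
    """Filter a whole sequence, producing one output per input sample."""
    return [fir_output(x, b, i) for i in range(len(x))]
```

A pipelined datapath produces the same values; it differs in how many clock cycles separate an input from its output and in how many operations run at the same time.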


FSMD

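[Figure: FSMD architecture. A control unit (state register, next-state function, output function) sends Control signals to a datapath and receives Status signals from it. The datapath has its own datapath inputs and datapath outputs.]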


CISC architecture

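[Figure: CISC architecture. Control unit: microprogram memory, MicroPC with +1 incrementer, address selection logic, instruction register. Datapath with PC. Control and Status signals link the control unit and the datapath. Also shown: Memory.]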


RISC architecture

[Figure: RISC architecture. Control unit: hardwired output and next-state logic, state register, instruction register. Datapath: register file, ALU. Control and Status signals link the two. Also shown: data cache, instruction cache, Memory.]


Vector machines

[Figure: vector machine. Components shown: interleaved memory, memory pipes, vector registers, scalar registers, vector functional unit, scalar functional unit.]


VLIW architecture

Memory

Register file


Parallel processors: SIMD/MIMD

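[Figure: (a) Message passing: processing elements PE 0 to PE N-1, each with a processor (Proc. 0 to Proc. N-1) and its own memory (Mem. 0 to Mem. N-1), joined by an interconnection network; a control unit is also shown. (b) Shared memory: processors Proc. 0 to Proc. N-1 reach memories Mem. 0 to Mem. N-1 through an interconnection network.]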


Conclusion

Different models focus on different aspects
A proper model needs to represent the system's features
Models are implemented in architectures
Smooth transformation of models to architectures increases productivity


System specification

For every design, there exists a conceptual view
Conceptual view depends on application
Computation: conceptualized as a program
Controller: conceptualized as a state-machine

Goal of specification language
Capture conceptual view with minimum designer effort

Ideal language
1-to-1 mapping between conceptual model & language constructs


Outline

Characteristics of commonly used conceptual models:
Concurrency, hierarchy, synchronization

Requirements for embedded system specification
Evaluate HDLs with respect to embedded systems
VHDL, Verilog, Esterel, CSP, Statecharts, SDL, SpecCharts


Concurrency

Behavior: a chunk of system functionality


e.g. process, procedure, state-machine

System often conceptualized as set of concurrent behaviors

Concurrency can exist at different abstraction levels:
Job-level
Task-level
Statement-level
Operation-level
Bit-level

Two types of concurrency within a behavior
Data-driven, Control-driven


Data-driven concurrency

Operations execute when input data is available
Execution order determined by data dependencies

1: Q = A + B
2: Y = X + P
3: P = (C - D) * Q

[Figure: data flow graph. An add produces Q; a subtract and a multiply produce P; a second add produces Y.]
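
Data-driven execution can be sketched as a loop that fires every operation whose inputs are already available. The Python below is an illustration, not part of the slides. It assumes statement 3 reads P = (C - D) * Q and introduces a hypothetical intermediate name T for the subtraction; `OPS` and `run_dataflow` are likewise names chosen for this sketch.

```python
import operator

# result name -> (function, input names); T is a hypothetical name for (C - D)
OPS = {
    "Q": (operator.add, ("A", "B")),
    "Y": (operator.add, ("X", "P")),
    "T": (operator.sub, ("C", "D")),
    "P": (operator.mul, ("T", "Q")),
}


def run_dataflow(inputs):
    """Fire all ready operations round by round; return values and firing order."""
    values = dict(inputs)
    pending = dict(OPS)
    rounds = []
    while pending:
        ready = sorted(name for name, (_, args) in pending.items()
                       if all(arg in values for arg in args))
        if not ready:
            raise ValueError("some operation can never receive its inputs")
        for name in ready:
            func, args = pending.pop(name)
            values[name] = func(*(values[arg] for arg in args))
        rounds.append(ready)
    return values, rounds
```

With inputs A, B, C, D, and X, the add for Q and the subtract fire in the first round, the multiply in the second, and the add for Y in the third. Statement 2 therefore executes last even though it is written second, because the order follows the data dependencies and not the textual order.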

Control-driven concurrency

Control thread: set of operations executed sequentially
Concurrency represented by multiple control threads

Fork-join statement

sequential behavior X
begin
    Q();
    fork A(); B(); C(); join;
    R();
end behavior X;

Process statement

concurrent behavior X
begin
    process A();
    process B();
    process C();
end behavior X;
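
The fork-join form of behavior X can be approximated with Python threads: each forked sub-behavior gets its own control thread, and the join waits for all of them. This is a sketch under that assumption; the sub-behaviors only record their names, and `behavior_x` and `step` are hypothetical.

```python
import threading


def behavior_x():
    """Run Q, then A, B, C concurrently, then R once all three have finished."""
    log = []
    lock = threading.Lock()

    def step(name):
        with lock:          # protect the shared log from concurrent appends
            log.append(name)

    step("Q")
    threads = [threading.Thread(target=step, args=(name,)) for name in ("A", "B", "C")]
    for thread in threads:  # fork: start one control thread per sub-behavior
        thread.start()
    for thread in threads:  # join: wait until every control thread completes
        thread.join()
    step("R")
    return log
```

The relative order of A, B, and C in the log is not fixed, but Q always comes first and R always comes last.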


State-transitions

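Systems often are state-based, e.g. controllers

State may represent
mode or stage of being
computation

Difficult to capture using programming constructs

[Figure: state-transition diagram running from start to finish, with states including P and S and arcs labeled u, v, w, x, y]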


Hierarchy

Required for managing system complexity
Allows system modeler to focus on one subsystem at a time
Enhances comprehension of system functionality
Scoping mechanism for objects like types and variables

Two types of hierarchy
Structural hierarchy
Behavioral hierarchy


Structural hierarchy

System represented as set of interconnected components
Interconnections between components represent wires
Several levels: systems, chips, RT-components, gates

[Figure: a System containing a Processor (control logic and datapath) and a Memory, connected by a data bus and control lines]


Behavioral hierarchy

Ability to successively decompose behavior into sub-behaviors

Concurrent decomposition
Fork-join
Process

Sequential decomposition
Procedure
State-machine

behavior P
    variable x, y;
begin
    Q(x);
    R(y);
end behavior P;

[Figure: behavior P decomposed into Q and R. Q contains Q1, Q2, Q3 and R contains R1, R2; arcs are labeled e1 to e8.]


Programming constructs

Some behaviors easily conceptualized as sequential algorithms

Wide variety of constructs available
Assignment, branching, iteration, subprograms, recursion, complex data types (records, lists)

type buffer_type is array (1 to 10) of integer;
variable buf : buffer_type;
variable i, j : integer;

for i = 1 to 10
    for j = i to 10
        if (buf(i) > buf(j)) then
            SWAP(buf(i), buf(j));
        end if;
    end for;
end for;


Behavioral completion

Behavior completes when all computations performed

Advantages
Behavior can be viewed without inter-level transitions
Allows natural decomposition into sequential subbehaviors

[Figure: behavior B with subbehaviors X and Y. X contains X1, X2, X3 and a final state; Y contains Y1 and Y2. Arcs are labeled e1 to e5.]


Communication

Concurrent behaviors exchange data

Shared-memory model
Sender updates common medium
Persistent, Non-persistent

[Figure: process P and process Q communicating through shared memory]

Message-passing model
Data sent over abstract channels
Unidirectional / bidirectional
Point-to-point / multiway
Blocking / non-blocking

process P
begin
    variable x
    ....
    send (x);
    ....
end

process Q
begin
    variable y
    ....
    receive (y);
    ....
end

[Figure: process P sends to process Q over channel C]
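
The message-passing model can be imitated with two Python threads and a queue standing in for the abstract channel C. This is a sketch, not the slides' notation: the queue is unidirectional and point-to-point here, `run_channel` is a hypothetical name, and the `None` end-of-stream marker is an assumption added so that the receiver knows when to stop.

```python
import queue
import threading


def run_channel(messages):
    """Process P sends each message over channel C; process Q receives them."""
    channel = queue.Queue(maxsize=1)   # small buffer: send blocks while it is full
    received = []

    def process_p():
        for x in messages:
            channel.put(x)             # send(x)
        channel.put(None)              # hypothetical end-of-stream marker

    def process_q():
        while True:
            y = channel.get()          # receive(y): blocks until data arrives
            if y is None:
                break
            received.append(y)

    p = threading.Thread(target=process_p)
    q = threading.Thread(target=process_q)
    p.start()
    q.start()
    p.join()
    q.join()
    return received
```

Because `get` blocks until data is available, the receive also acts as a synchronization point between the two processes.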


Synchronization

Concurrent behaviors execute at different speeds

Synchronization required when
Data exchanged between behaviors
Different activities must be performed simultaneously

Two types of synchronization mechanisms
Control-dependent
Data-dependent


Control-dependent synchronization

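Synchronization based on control structure of behavior

Fork-join

behavior X
begin
    Q();
    fork A(); B(); C(); join;
    R();
end behavior X;

[Figure: Q forks into A, B, and C, which meet at a synchronization point before R]

Reset

[Figure: concurrent behavior AB with subbehaviors A (A1, A2) and B (B1, B2)]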


Data-dependent synchronization

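Synchronization based on communication of data between behaviors

[Figure: three variants, each on a concurrent behavior AB with subbehaviors A (A1, A2) and B (B1, B2)]
Synchronization by common event: transitions in A and B are both labeled with event e
Synchronization by status detection: a transition in B is labeled "entered A2"
Synchronization by common variable: A assigns x := 1 and B has a transition labeled (x=1); x := 0 also appears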


Exception handling

Occurrence of event terminates current computation
Control transferred to appropriate next mode
Examples of exceptions: interrupts, resets

[Figure: behavior P with subbehaviors P1 and P2]


Timing

Required to represent real world implementations

Functional timing: affects simulation of system specification
wait for 200 ns;
A <= A + 1 after 100 ns;

Timing constraints: guide synthesis and verification tools

[Figure: behaviors B, P, and Q between IN and OUT on a time axis, annotated with constraints: min 50 ns, max 10 ms, channel C (max 10 Mb/s)]


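Embedded system specification

Embedded system: behavior defined by interaction with environment

Essential characteristics
State-transitions
Behavioral hierarchy
Programming constructs
Exceptions
Concurrency
Behavioral completion

[Figure: small diagrams illustrating the characteristics: states P, Q, R with arcs u, v, w, x from start; P with subbehaviors P1 and P2 and an exception e to Q; a fork and join]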


VHDL

IEEE standard, intended for documentation and exchange of designs [IEE88]

Characteristics supported
Behavioral hierarchy: single level of processes
Structural hierarchy: nested blocks and component instantiations
Concurrency: task-level (process), statement-level (signal assignment)
Programming constructs
Communication: shared-memory using global signals
Synchronization: wait on and wait until statements
Timing: wait for statement, after clause in assignments

Characteristics not supported
Exceptions: partially supported by guarded signal assignments
State transitions


Verilog and Esterel

Verilog [TM91] developed as proprietary language for specification, simulation
Esterel [Hal93] developed for specification of reactive systems

Characteristics supported:
Behavioral hierarchy: fork-join
Structural hierarchy: hierarchy of interconnected modules
Programming constructs
Communication: shared registers (Verilog) and broadcasting (Esterel)
Synchronization: wait for an event on a signal
Timing: modeling of gate, net, assignment delays in Verilog
Exceptions: disable (Verilog), watching, do-upto, trap statements (Esterel)

Characteristics not supported:
State transitions



SDL (Specification and Description Language)

CCITT standard in telecommunication for protocol specification [BHS91]

Characteristics supported
Behavioral hierarchy: nested data flow
Structural hierarchy: nested blocks
State transitions: state machine in processes
Communication: message passing
Timing: timeouts generated by timer object

Characteristics not supported
Exceptions
Programming constructs

[Figure: an SDL system containing blocks linked by channels; each block contains processes linked by signal routes]


CSP (Communicating Sequential Processes)

Intended to specify programs running on multiprocessor machines [Hoa78]

Characteristics supported
Behavioral hierarchy: fork-join using parallel command
Programming constructs
Communication: message passing using input, output commands
Synchronization: blocking message passing

Characteristics not supported
Exceptions
State transitions
Structural hierarchy
Timing


SpecCharts

Developed for embedded system specification [NVG92]
PSM (program-state machine) model + VHDL

Characteristics supported
Behavioral hierarchy: sequential/concurrent behaviors
State transitions: TOC (transition on completion) arcs
Communication: shared memory, message passing
Exceptions: TI (transition immediately) arcs

Characteristics similar to VHDL
Programming constructs
Structural hierarchy
Synchronization and Timing

[Figure: SpecCharts behavior B containing X (with X1 and X2) and Y, with arcs e1, e2, e3. Declarations and leaf-behavior code shown in the figure:]

port P, Q : in integer;

type INTARRAY is array (natural range <>) of integer;
signal A : INTARRAY (15 downto 0);

variable MAX : integer;
MAX := 0;
for J in 0 to 15 loop
    if (A(J) > MAX) then
        MAX := A(J);
    end if;
end loop;


SpecCharts : state transitions

State transitions represented by TOC and TI arcs between behaviors

behavior MAIN type sequential subbehaviors is
begin
    P : (TOC, u, Q);
    Q : (TOC, v, P), (TOC, w, R);
    R : (TOC, x, Q);

    behavior P .....
    behavior Q .....
    behavior R .....
end MAIN;

[Figure: state diagram starting at P, with arcs P to Q on u, Q to P on v, Q to R on w, and R to Q on x]
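
The TOC arcs of MAIN form a transition table that can be walked mechanically. The Python sketch below illustrates this; it is not SpecCharts semantics in full. Each event is taken to be the condition that holds when the current behavior completes, and the sketch simply stops when no arc matches, which is an assumption. `TOC_ARCS` and `run_main` are hypothetical names.

```python
# behavior -> list of (condition, next behavior), mirroring the TOC arcs of MAIN
TOC_ARCS = {
    "P": [("u", "Q")],
    "Q": [("v", "P"), ("w", "R")],
    "R": [("x", "Q")],
}


def run_main(events, start="P"):
    """Return the sequence of behaviors executed for the given events."""
    current, trace = start, [start]
    for event in events:
        for condition, target in TOC_ARCS[current]:
            if condition == event:
                current = target
                trace.append(current)
                break
        else:
            break  # no arc matches: stop here (an assumption of this sketch)
    return trace
```

For the events u, w, x, v the trace is P, Q, R, Q, P.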


SpecCharts : behavioral hierarchy

Hierarchy represented by nested behaviors
Behavior decomposed into sequential or concurrent subbehaviors

behavior MAIN type sequential subbehaviors is
begin
    P : (TOC, true, Q_R);
    Q_R : (TOC, true, S);
    S : ;

    behavior P .....
    behavior Q_R type concurrent subbehaviors is
    begin
        Q : (TOC, true, halt);
        R : (TOC, true, halt);

        behavior Q .....
        behavior R .....
    end Q_R;
    behavior S .....
end MAIN;

[Figure: P, then a fork into concurrent Q and R, then a join into S]


SpecCharts : exceptions

Exceptions represented by TI (transition immediately) arcs

behavior MAIN type sequential subbehaviors is
begin
    P : (TI, e, Q);
    Q : ;

    behavior P
        behavior P1 .......
        behavior P2 .......
    behavior Q ......
end MAIN;

[Figure: behavior P containing P1 and P2, with an arc labeled e from P to Q]


Summary
Embedded system features (columns): State Transitions, Behavioral Hierarchy, Concurrency, Program Constructs, Exceptions, Behavioral Completion

Languages (rows): VHDL, Verilog, Esterel, SDL, CSP, Statecharts, SpecCharts

Legend: feature fully supported, feature partially supported, feature not supported


Specification example

An executable specification language enables:
Early verification
Precision
Automation
Documentation

A good language/model match reduces:
Capture time
Comprehension time
Functional errors


Outline

Capture an example's model in a particular language
PSM model in the SpecCharts language

Point out the benefits of a good language/model match
Highlight experiments that demonstrate those benefits


Answering machine controller's environment

[Figure: the Controller and its environment: line circuitry (on the phone line), announcement unit, and tape unit. Signals shown: ann_play, ann_rec, ann_done; tape_play, tape_fwd, tape_rew, tape_rec, tape_cnt; hangup, offhook, tollsaver, power, beep, tone, ring, messages. Front-panel labels: rec ann, hear ann, on/off, memo, play msgs, mic, stop, rew, play, fwd, light.]


Highest-level view of the controller

[Figure: Controller with two behaviors, SystemOff and SystemOn; the transitions between them are labeled power=0 and power=1. The controller's environment (line circuitry, announcement unit, tape unit, front panel) is drawn around it.]


The SystemOn behavior

System usually responds to the line
Pressing any machine button gets immediate response

[Figure: SystemOn contains RespondToLine and RespondToMachineButton; an arc labeled rising(any_button_pushed) leads to RespondToMachineButton.]


The RespondToMachineButton behavior

[Figure, panels (a) and (b): RespondToMachineButton captured two ways, as a set of subbehaviors selected by conditions and as program code.]

Subbehaviors and their conditions:

  HandlePlay      play=1
  HandleFwd       fwd=1
  HandleRew       rew=1
  HandleMemo      memo=1
  HandleStop      stop=1
  HandleHearAnn   hear_ann=1
  HandleRecAnn    rec_ann=1
  HandlePlayMsgs  play_msgs=1

Code:

    behavior RespondToMachineButton type code is
    begin
      if (play=1) then
        HandlePlay;
      elsif (fwd=1) then
        HandleFwd;
      elsif (rew=1) then
        HandleRew;
      elsif (memo=1) then
        HandleMemo;
      elsif (stop=1) then
        HandleStop;
      elsif (hear_ann=1) then
        HandleHearAnn;
      elsif (rec_ann=1) then
        HandleRecAnn;
      elsif (play_msgs=1) then
        HandlePlayMsgs;
      end if;
    end;


The RespondToLine behavior

Monitors line for rings
Answers line
Responds to exceptions
  Hangup
  Machine turned off

[Figure: RespondToLine contains Monitor and Answer; exception arcs are labeled rising(hangup) and falling(machine_on).]


The Monitor behavior


Counts for required rings
Requirements may change

Monitor declarations:

    signal rings_to_wait : integer range 1 to 20 := 4;

    function DetermineRingsToWait return integer is
    begin
      if ((num_msgs > 0) and (tollsaver=1) and (machine_on=1)) then
        return(2);
      elsif (machine_on=1) then
        return(4);
      else
        return(15);
      end if;
    end;

Subbehavior MaintainRingsToWait:

    loop
      rings_to_wait <= DetermineRingsToWait;
      wait on tollsaver, machine_on;
    end loop;

Subbehavior CountRings:

    variable i : integer range 0 to 20;

    i := 0;
    while (i < rings_to_wait) loop
      wait on rings_to_wait, ring;
      if (rising(ring)) then
        i := i + 1;
      end if;
    end loop;
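The ring-count rule of the Monitor behavior can be checked in isolation. The following is a minimal Python sketch, not part of the SpecCharts specification: the function names mirror the slide, while `count_rings` and its list of sampled ring values are hypothetical stand-ins for waiting on the `ring` signal.

```python
def determine_rings_to_wait(num_msgs, tollsaver, machine_on):
    """Mirror of DetermineRingsToWait: 2, 4, or 15 rings."""
    if num_msgs > 0 and tollsaver == 1 and machine_on == 1:
        return 2
    if machine_on == 1:
        return 4
    return 15


def count_rings(ring_samples, rings_to_wait):
    """Hypothetical helper: index of the sample at which the required
    number of rising edges on ring has been seen, or None if never."""
    previous = 0
    rings = 0
    for position, value in enumerate(ring_samples):
        if previous == 0 and value == 1:   # rising(ring)
            rings += 1
            if rings == rings_to_wait:
                return position
        previous = value
    return None
```

With messages waiting and the tollsaver on, the machine answers after two rings, which lets a remote caller hang up before the third ring when no messages exist.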


The Answer behavior

[Figure (a): Answer contains PlayAnnouncement, RecordMsg, Hangup, and RemoteOperation; arcs are labeled rising(hangup) and button="0001" (twice).]

(b)

    behavior PlayAnnouncement type code is
    begin
      ann_play <= 1;
      wait until ann_done = 1;
      ann_play <= 0;
    end;

(c)

    behavior RecordMsg type code is
    begin
      ProduceBeep(1 s);
      if (hangup = 0) then
        tape_rec <= 1;
        wait until hangup=1 for 100 s;
        ProduceBeep(1 s);
        num_msgs <= num_msgs + 1;
        tape_rec <= 0;
      end if;
    end;


The RemoteOperation behavior

Owner can operate machine remotely by phone
Owner identifies himself by four-button ID

[Figure (a): RemoteOperation contains CheckUserCode and RespondToCmds; arcs are labeled hangup=1, code_ok=1, and code_ok=0.]

(b)

    behavior CheckUserCode type code is
    begin
      code_ok <= true;
      for (i in 1 to 4) loop
        wait until tone /= "1111" and tone'event;
        if (tone /= user_code(i)) then
          code_ok <= false;
        end if;
      end loop;
    end;


The answering machine controller specification

[Figure: the complete specification as one behavior hierarchy, shown next to the environment diagram.]

Controller
  SystemOff
  SystemOn
    InitializeSystem
    RespondToMachineButton
    RespondToLine
      Monitor
      Answer
        PlayAnnouncement
        RecordMsg
        Hangup
        RemoteOperation
          CheckUserCode
          RespondToCmds

Further behaviors shown inside the hierarchy: HearMsgsCmds, MiscCmds, ResetTape

Arc labels: power=1, power=0, rising(any_button_pushed), rising(hangup), falling(machine_on), tone="0001", hangup=1, code_ok, not code_ok, tone="0010", other


Executable specification use

Precision
  Readability/precision compete in a natural language
  Executable specification encourages precision
  Designer asks questions, specification answers them

Language/model match (SpecCharts/PSM):

  Hierarchy
  State-transitions
  Programming constructs
  Concurrency
  Exceptions
  Completion
  Equivalence of states and programs


Specification capture experiment

                                                    VHDL    SpecCharts
  Average specification time in minutes             40      16
  Number of modelers                                3       3
  Number of incorrect specifications, first time    2       0
  Number of incorrect specifications, second time   1       0

VHDL modelers required 2.5 times longer
Two VHDL specifications possessed control errors
SpecCharts were effective for state-transitions and exceptions


Comparison of SpecCharts, VHDL and Statecharts


Answering machine example

  Specification    Conceptual   SpecCharts   VHDL         VHDL     Statecharts
  attributes       model                     (hierarch.)  (flat)
  Program-states   42           42           42           32       80
  Arcs             40           40           40           152      135
  Control signals               0            84           1        0
  Lines/leaf                    7            27           29
  Lines                         446          1592         963
  Words                         1733         6740         8088

Shortcomings (marked with X per language in the original table):

  No sequential program constructs
  No hierarchy
  No exception constructs
  No hierarchical events
  No state-transition constructs


Design quality experiment

  Design attribute        Designed from English   Designed from SpecCharts
  Control transistors     3130                    2630
  Datapath transistors    2277                    2251
  Total transistors       5407                    4881
  Total pins              38                      38

No loss in design quality with an executable language


Summary

Executable languages encourage precision and automation
The language should support an appropriate model
  Makes specification easy

Strongly parallels programming languages

  Structured vs. assembly languages
  Object-oriented model and C++


Translation

Model often unsupported by a standard language

(1) Use a standard language anyway
  Many tools available
  But, captures model unnaturally

(2) Use an application-specific language
  Captures model naturally
  But, not many tools available

(3) Use a front-end language
  Captures model naturally
  Many tools available after translating to a standard


Outline

Front-end language in VHDL environment
State machine translation
Fork-join translation
Exception translation


A front-end language in a VHDL environment

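[Figure: a translator converts SpecCharts to VHDL; the resulting VHDL, like VHDL written directly, feeds the VHDL environment (synthesis tool, simulator, debugger, test generator), which produces the tool output.]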


State machine translation

[Figure (a): state machine with states P, Q, and R, entered at P. P goes to Q on u and to R on not u; Q goes to P; R goes to Q.]

(b)

    type state_type is (P, Q, R);
    variable state : state_type := P;

    loop
      case (state) is
        when P =>
          <actions for P>
          if (u) then
            state := Q;
          elsif (not u) then
            state := R;
          end if;
        when Q =>
          <actions for Q>
          state := P;
        when R =>
          <actions for R>
          state := Q;
      end case;
    end loop;
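The same case-in-a-loop pattern carries over to software languages. Below is a minimal Python sketch of the P/Q/R machine, assuming a finite list of sampled `u` values in place of the endless loop and omitting the per-state actions; `run_state_machine` and its trace output are illustrative names, not part of the translator.

```python
from enum import Enum


class State(Enum):
    P = "P"
    Q = "Q"
    R = "R"


def run_state_machine(u_values):
    """Step the P/Q/R machine once per sampled value of u.

    Returns the names of the states visited, starting at P."""
    state = State.P
    trace = []
    for u in u_values:
        trace.append(state.name)
        if state is State.P:
            # P goes to Q on u, to R on not u
            state = State.Q if u else State.R
        elif state is State.Q:
            state = State.P
        elif state is State.R:
            state = State.Q
    return trace
```

Only state P inspects `u`; Q and R ignore it, exactly as in the case statement above.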


Fork-join translation

(a) Original, with a fork-join construct:

    Main: process
    begin
      statement1;
      parallel {
        P1;
        P2;
      }
      statement2;
      ...

(b) Translation to VHDL processes:

    signal fork, P1_done, P2_done : boolean;

    Main : process
    begin
      statement1;
      fork <= true;
      wait until P1_done and P2_done;
      statement2;
      ...

    P1_process : process
    begin
      wait until fork;
      P1;
      P1_done <= true;
      wait until not fork;
      P1_done <= false;
    end;


Exception translation

(a) Original, with an exception: event e transfers control from T to S

    event e : T -> S;
    T: statement1;
       statement2;
       statement3;
    S: statement4;
       statement5;

(b) Translation using goto:

    T: statement1;
       if (e) goto S_start;
       statement2;
       if (e) goto S_start;
       statement3;
    S_start:
    S: statement4;
       statement5;

(c) Translation using a loop and exit:

    T: T_loop : loop
         statement1;
         if (e) exit T_loop;
         statement2;
         if (e) exit T_loop;
         statement3;
         exit T_loop;
       end loop;
    S: statement4;
       statement5;


Summary

The perfect standard language may never exist
No standard language supports all models
Using a front-end language solves the problem
  Natural capture
  Large base of tools and expertise

Translators are simple

  Maps characteristics to existing constructs
  Generates well-structured and consistent output


System partitioning

System functionality is implemented on system components

  ASICs, processors, memories, buses

Two design tasks:

  Allocate system components or ASIC constraints
  Partition functionality among components

Constraints
  Cost, performance, size, power

Partitioning is a central system design task


Outline

Structural vs. functional partitioning
Natural vs. executable language specifications
Basic partitioning issues and algorithms
Functional partitioning techniques for hardware
Hardware/software partitioning
Functional partitioning techniques for software
Exploring tradeoffs with functional partitioning


Structural vs. functional partitioning

Structural: Implement structure, then partition
Functional: Partition function, then implement
  Enables better size/performance tradeoffs
  Uses fewer objects, better for algorithms/humans
  Permits hardware/software solutions
  But, it's harder than graph partitioning


Natural vs. executable language specifications

Alternative methods for specifying functionality
Natural languages common in practice
Executable languages becoming popular
  Automated estimation/partitioning explores solutions
  Early verification reduces costly late changes
  Precision eases integration


Basic partitioning issues


Specification abstraction level
Granularity
Metrics and estimations
Partitioning algorithms
Objective and closeness functions
System-component allocation
Output


Basic partitioning issues (cont.)

Specification-abstraction level: input definition

  Just indicating the language is insufficient
  Abstraction level indicates amount of design already done
  e.g. task DFG, tasks, CDFG, FSMD

Granularity: specification size in each object

  Fine granularity yields more possible designs
  Coarse granularity better for computation, designer interaction
  e.g. tasks, procedures, statement blocks, statements

Component allocation: types and numbers

  e.g. ASICs, processors, memories, buses

Output: format and uses

  e.g. new specification, hints to synthesis tool


Basic partitioning issues (cont.)

Metrics and estimations: "good" partition attributes

  e.g. cost, speed, power, size, pins, testability, reliability
  Estimates derived from quick, rough implementation
  Speed and accuracy are competing goals of estimation

Objective and closeness functions

  Combines multiple metric values
  Closeness used for grouping before complete partition
  Weighted sum common, e.g. $k_1 \cdot F(area, c) + k_2 \cdot F(delay, c) + k_3 \cdot F(power, c)$
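A weighted-sum objective function can be sketched in a few lines of Python. The slide does not define F; the sketch below assumes F is the normalized amount by which an estimated metric exceeds its constraint c, which is one common choice. The names `violation` and `objective` and all numbers in the usage note are illustrative.

```python
def violation(value, constraint):
    """Assumed form of F: normalized excess of value over its constraint.

    Returns 0.0 when the constraint is met."""
    return max(0.0, (value - constraint) / constraint)


def objective(estimates, constraints, weights):
    """Weighted sum k1*F(area, c) + k2*F(delay, c) + k3*F(power, c).

    All three arguments are dicts keyed by metric name."""
    return sum(
        weights[metric] * violation(estimates[metric], constraints[metric])
        for metric in weights
    )
```

For example, with estimates `{"area": 120, "delay": 80, "power": 3}`, constraints `{"area": 100, "delay": 100, "power": 2}`, and weights `{"area": 1, "delay": 1, "power": 2}`, only area and power contribute, and a larger weight makes a power violation count more heavily than an area violation.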


Basic partitioning issues (cont.)

Algorithms: control strategies seeking best partition

  Constructive creates partition
  Iterative improves partition
  Key is to escape local minimum

[Figure: cost versus number of moves, with points A and B marked on the cost curve.]


Typical partitioning-system configuration

[Figure: blocks of a typical partitioning system: user interface, input, model, algorithms, estimators, objective function, design feedback, output.]


Basic partitioning algorithms

Clustering and multi-stage clustering [Joh67, LT91]
Group migration (a.k.a. min-cut or Kernighan/Lin) [KL70, FM82]
Ratio cut [KC91]
Simulated annealing [KGV83]
Genetic evolution
Integer linear programming


Hierarchical clustering

Constructive algorithm using closeness metrics

Overview

  Groups closest objects
  Recomputes closenesses
  Repeats until termination condition met

Cluster tree maintains history of merges

  Cutline across the tree defines a partition


Hierarchical clustering algorithm


    /* Initialize each object as a group */
    for each o_i loop
        p_i = o_i
        P = P ∪ p_i
    end loop

    /* Compute closenesses between objects */
    for each p_i loop
        for each p_j loop
            c_i,j = ComputeCloseness(p_i, p_j)
        end loop
    end loop

    /* Merge closest objects and recompute closenesses */
    while not Terminate(P) loop
        p_i, p_j = FindClosestObjects(P, C)
        P = P − p_i − p_j ∪ p_ij
        for each p_k loop
            c_ij,k = ComputeCloseness(p_ij, p_k)
        end loop
    end loop
    return P
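The algorithm can be sketched in Python as follows. Two points are assumptions rather than part of the slide: the closeness to a merged group is recomputed as the average of the closenesses to its two parts (the rule the example slide uses), and termination is a closeness threshold. The function name, the `frozenset`-keyed closeness table, and the returned merge history are illustrative.

```python
from itertools import combinations


def hierarchical_clustering(objects, closeness, threshold):
    """Merge the closest pair of groups until no pair reaches threshold.

    closeness maps frozenset({a, b}) to a value; higher means closer.
    Returns (final groups, merge history)."""
    clusters = [(o,) for o in objects]          # each object is a group
    c = {}
    for a, b in combinations(objects, 2):
        c[frozenset([(a,), (b,)])] = closeness.get(frozenset([a, b]), 0.0)

    merges = []
    while len(clusters) > 1:
        pair = max(combinations(clusters, 2), key=lambda p: c[frozenset(p)])
        if c[frozenset(pair)] < threshold:      # assumed Terminate(P)
            break
        p_i, p_j = pair
        p_ij = p_i + p_j
        clusters = [p for p in clusters if p not in (p_i, p_j)]
        for p_k in clusters:                    # recompute closenesses
            c[frozenset([p_ij, p_k])] = (
                c[frozenset([p_i, p_k])] + c[frozenset([p_j, p_k])]
            ) / 2
        clusters.append(p_ij)
        merges.append(p_ij)
    return clusters, merges
```

The merge history plays the role of the cluster tree: replaying it up to any point corresponds to drawing a cutline at that level.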


Hierarchical clustering example

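[Figure: panels (a) to (d) show four objects o1 to o4 with pairwise closeness values being merged one pair at a time. After each merge the closeness to the new group is the average of the old values, e.g. Avg(10,10) = 10 and Avg(15,25) = 20. Below each panel, the cluster tree over o1, o2, o3, o4 grows by one level.]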


Simulated annealing

Iterative algorithm modeled after physical annealing process

Overview

  Starts with initial partition and temperature
  Slowly decreases temperature
  For each temperature, generates random moves
  Accepts any move that improves cost
  Accepts some bad moves, less likely at low temperatures

Results and complexity depend on temperature decrease rate


Simulated annealing algorithm

    temp = initial temperature
    cost = Objfct(P)
    while not Frozen loop
        while not Equilibrium loop
            P_tentative = Move(P)
            cost_tentative = Objfct(P_tentative)
            Δcost = cost_tentative − cost
            if (Accept(Δcost, temp) > Random(0, 1)) then
                P = P_tentative
                cost = cost_tentative
            end if
        end loop
        temp = DecreaseTemp(temp)
    end loop

where:

$Accept(\Delta cost, temp) = \min(1, e^{-\Delta cost / temp})$
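A minimal Python sketch of this loop follows. The slide leaves Frozen, Equilibrium, Move, and DecreaseTemp open; here they are assumed to be a temperature floor, a fixed number of moves per temperature, a single random flip, and geometric cooling. The toy two-way partitioning problem (`EDGES`, `cut_size`, `flip_one`) is hypothetical and only serves to exercise the loop.

```python
import math
import random


def accept(delta_cost, temp):
    """min(1, e^(-delta_cost / temp)); improving moves always pass."""
    if delta_cost <= 0:
        return 1.0
    return math.exp(-delta_cost / temp)


def anneal(partition, objfct, move, temp=10.0, cooling=0.9,
           moves_per_temp=50, frozen_temp=0.01, seed=0):
    rng = random.Random(seed)
    cost = objfct(partition)
    while temp > frozen_temp:                 # assumed Frozen test
        for _ in range(moves_per_temp):       # assumed Equilibrium test
            tentative = move(partition, rng)
            cost_tentative = objfct(tentative)
            if accept(cost_tentative - cost, temp) > rng.random():
                partition, cost = tentative, cost_tentative
        temp = temp * cooling                 # assumed DecreaseTemp
    return partition, cost


# Hypothetical toy problem: split four objects into two groups.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]


def cut_size(partition):
    """Edges crossing the cut plus a penalty for unbalanced groups."""
    cut = sum(1 for a, b in EDGES if partition[a] != partition[b])
    imbalance = abs(sum(partition) - len(partition) / 2)
    return cut + 2 * imbalance


def flip_one(partition, rng):
    """Move one randomly chosen object to the other group."""
    i = rng.randrange(len(partition))
    return partition[:i] + (1 - partition[i],) + partition[i + 1:]
```

Calling `anneal((0, 0, 0, 0), cut_size, flip_one)` returns a partition and its cost. A slower cooling rate or more moves per temperature trades run time for a better chance of leaving local minima.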


Functional partitioning for hardware: BUD

Goal: incorporate area/time into synthesis [MK90]
Clusters CDFG operations into datapath modules
Closeness metrics:
  Interconnecting wires
  Concurrency
  Shared hardware

Each clustering corresponds to an allocation/scheduling
Selects clustering with best area/time


BUD example
[Figure: (a) the behavior x := a + b; if (a = b) c := ((x - y) < z); with all bitwidths equal to 4; (b) its CDFG from start to finish, with operations +, =, -, and <; (c) the closeness values between those operations.]


BUD example (cont.)


[Figure (a): cluster tree over the operations +, =, -, and <, with averaged closeness values at each merge.]

(b)

  Clusters               Chip area A   Expected cycle time T   Objfct = A x T
  {+, =, -, <}           17.5          36                      630
  {+}, {=, -, <}         15.8          26                      411
  {+}, {=}, {-, <}       13.8          26                      359 (best)
  {+}, {-}, {=}, {<}     16.4          26                      426

[Figure (c): the selected chip with 3 clusters and a controller.]


Functional partitioning for hardware: Aparty

Extends BUD clustering to multiple stages [LT91]

  Different closeness metrics for each stage

Closeness metrics:
  Control transfer reduction
  Data transfer reduction
  Hardware sharing


Aparty example

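[Figure: multi-stage clustering of objects o1 to o4. (a) First stage over o1, o2, o3, o4, merging o1 and o2 into o12; (b) and (c) a later stage over o12, o3, and o4. Closeness values shown: 23, 21, 17.]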


Hardware/software partitioning

Combined hardware/software systems are common
Software is cheap, modifiable, and quick to design
Hardware is fast
Special algorithms are needed to favor software
Proposed algorithms
  Greedy [GD92]
  Hill climbing [EHB94]
  Binary-constraint search with hill climbing [VGG93]


Functional partitioning for systems: Vulcan, Cosyma

Vulcan I [GD90]
  Partitions CDFG operations among hardware only
  Group migration and simulated annealing algorithms

Vulcan II [GD93]
  Partitions operations among hardware/software
  Architecture: processor, hardware, memory, bus
  All communication through memory
  Uses greedy algorithm, extracts behaviors from hardware

Cosyma [EHB94]
  Partitions statement blocks among hardware/software
  Architecture: processor, hardware, memory, bus
  Simulated annealing, extracts behaviors from software


Functional partitioning for systems: SpecSyn

Solves three partitioning problems

  Behaviors to processors/ASICs
  Variables to memories
  Communication channels to buses

Uses fast incremental-update estimators
Covers both hardware and hardware/software partitioning [GVN94, VG92]


Exploring tradeoffs with functional partitioning


Each line represents a different vendor's chip set
Each point represents an allocation and partition
Many designs quickly examined

[Figure: performance (microseconds, 200 to 1200) versus cost (dollars, 0 to 140) for chipset1, chipset2, and chipset3, with design points A, B, and C marked.]


Summary

Partitioning heavily influences design quality
Functional partitioning is necessary
Executable specification enables:
  Automation
  Exploration
  Documentation

Variety of algorithms exist
Variety of techniques exist for different applications


Future directions

Metrics from real design to guide partitioning
Comparison of functional partitioning algorithms
Impact of metric selections and orderings
Impact of granularity on partition quality
Exploitation of regularity in partitioning


Estimation

Estimates allow
  Evaluation of design quality
  Design space exploration

Design model
  Represents degree of design detail computed
  Simple vs. complex models

Issues for estimation

  Accuracy
  Speed
  Fidelity


Outline

Accuracy versus speed
Fidelity
Quality metrics

  Performance metrics
  Hardware and software cost metrics


Accuracy vs. Speed

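Accuracy: difference between estimated and actual value, relative to the actual value

$\frac{|E(D) - M(D)|}{M(D)}$

where E(D) is the estimated and M(D) the measured value of a metric for design D

Speed: computation time for obtaining estimate

[Figure: estimation error and computation time plotted against the design model, ranging from a simple model to the actual design.]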

Fidelity

Estimates must predict quality metrics for different design alternatives
Fidelity: % of correct predictions for pairs of design implementations
Higher fidelity => correct decisions based on estimates

[Figure: estimated and measured metric values for design points A, B, and C.]

  Pair     Estimated      Measured       Prediction
  (A, B)   E(A) > E(B)    M(A) < M(B)    incorrect
  (B, C)   E(B) < E(C)    M(B) > M(C)    incorrect
  (A, C)   E(A) < E(C)    M(A) < M(C)    correct

  Fidelity = 1 of 3 pairs = 33%
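Fidelity as defined above is straightforward to compute. The Python sketch below counts the design pairs whose estimated ordering agrees with the measured ordering; the function names and the numeric values in the usage note are illustrative, chosen only to reproduce the orderings of the three-point example.

```python
from itertools import combinations


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def fidelity(estimated, measured):
    """Percentage of design pairs whose ordering is predicted correctly.

    estimated and measured map each design point to a metric value."""
    pairs = list(combinations(estimated, 2))
    correct = sum(
        1
        for a, b in pairs
        if sign(estimated[a] - estimated[b]) == sign(measured[a] - measured[b])
    )
    return 100.0 * correct / len(pairs)
```

With `estimated = {"A": 2, "B": 1, "C": 3}` and `measured = {"A": 1, "B": 3, "C": 2}`, only the pair (A, C) is ordered correctly, so the result is one pair in three. Note that an estimator can be inaccurate in absolute terms and still have high fidelity, as long as it ranks alternatives correctly.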


Quality metrics

Performance metrics
  Clock cycle, control steps, execution time, communication rates

Cost metrics
  Hardware: manufacturing cost (area), packaging cost (pins)
  Software: program size, data memory size

Other metrics
  Power, testability, design time, time to market


Hardware design model

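[Figure: hardware design model consisting of a control unit and a datapath. Components shown: state register, next-state logic, control logic, control register, status register (FU status bits), registers/register files (R1, R2, RF), multiplexers, functional units, and a memory with address register AR and data register DR. Connections carry the labels n1 to n6 and p1 to p3.]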


Clock cycle estimation


Clock cycle determines:
  Resources, execution time

Determining clock cycle

  Designer specified [PK89, MK90]
  Maximum delay of any functional unit [PPM86, JMP88]
  Clock utilization [NG92]

[Figure: one dataflow graph (inputs i1 to i6, outputs o1 and o2, multiplier delay 150 ns, adder delay 80 ns) scheduled with three different clock cycles.]

  Clock cycle   Exec. time   Resources
  380 ns        380 ns       2 x, 4 +
  150 ns        600 ns       1 x, 1 +
  80 ns         400 ns       1 x, 1 +



Clock slack and utilization

Slack: portion of clock cycle for which FU is idle

$slack(clk, t_i) = \left( \left\lceil \frac{delay(t_i)}{clk} \right\rceil \times clk \right) - delay(t_i)$

Average slack: FU slack averaged over all operations

$ave\_slack(clk) = \frac{\sum_{i}^{T} [\, occur(t_i) \times slack(clk, t_i) \,]}{\sum_{i}^{T} occur(t_i)}$

Clock utilization: % of clock cycle utilized for computations

$utilization(clk) = 1 - \frac{ave\_slack(clk)}{clk}$



Clock utilization

Number of operations: occur(x) = 6, occur(-) = 2, occur(+) = 2
Clock = 65 ns

[Figure: functional unit delay and slack for each operation type on a time axis (ns), with marks at 1 x CLK, 2 x CLK, and 3 x CLK.]

$ave\_slack(65\ ns) = \frac{6 \times 32 + 2 \times 9 + 2 \times 17}{6 + 2 + 2} = 24.4\ ns$

$utilization(65\ ns) = 1 - \frac{24.4}{65.0} = 62\%$


Slack minimization algorithm

Clock Slack Minimization [NG92]

    Compute range: clkmax, clkmin
    Compute occurrences: occur(t_i)

    /* Examine each clock cycle in range */
    max_utilization = 0
    for clkmin ≤ clk ≤ clkmax loop
        for all operation types t_i ∈ T loop
            Compute slack: slack(clk, t_i)
        end loop
        Compute average slack: ave_slack(clk)
        Compute utilization: utilization(clk)
        /* If highest utilization */
        if utilization(clk) > max_utilization then
            max_utilization = utilization(clk)
            max_utilization_clk = clk
        end if
    end loop
    clk(SM) = max_utilization_clk
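The slack, average-slack, and utilization formulas and the search over clock cycles can be sketched in Python as below. The functional unit delays (163, 56, and 48 ns) are assumptions, back-computed so that a 65 ns clock yields slacks of 32, 9, and 17 ns; integer nanosecond values and integer clock steps are also assumptions, as are the function names.

```python
def slack(clk, delay):
    """Idle time when an operation of the given delay runs on clock clk."""
    cycles = -(-delay // clk)          # ceil(delay / clk) for positive ints
    return cycles * clk - delay


def ave_slack(clk, delays, occur):
    """Slack averaged over all operation occurrences."""
    total = sum(occur.values())
    return sum(occur[t] * slack(clk, delays[t]) for t in occur) / total


def utilization(clk, delays, occur):
    return 1.0 - ave_slack(clk, delays, occur) / clk


def best_clock(clk_min, clk_max, delays, occur):
    """Clock cycle in [clk_min, clk_max] with the highest utilization."""
    return max(
        range(clk_min, clk_max + 1),
        key=lambda clk: utilization(clk, delays, occur),
    )


# Assumed delays in ns; occurrence counts as in the 65 ns example.
DELAYS = {"mult": 163, "sub": 56, "add": 48}
OCCUR = {"mult": 6, "sub": 2, "add": 2}
```

With these values, `ave_slack(65, DELAYS, OCCUR)` reproduces the 24.4 ns of the worked example. The lower bound of the search matters: very small clocks divide every integer delay almost evenly, so the range should be restricted to clock cycles that are practical for the technology.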


Execution time vs. clock utilization


Second-order differential equation example

Clock with highest utilization results in better execution times

[Figure: two plots. Clock cycle (ns) vs. utilization (%): the highest utilization, 92%, occurs at a 56 ns clock. Execution time (ns) vs. utilization (%): the 92% point corresponds to an execution time of 560 ns.]


Control steps estimation

Operations in the specification assigned to control steps
Number of control steps determines:
  Execution time of design
  Complexity of control unit

Scheduling
  Granularity is operations in a dataflow graph
  Computationally expensive


Operator-use method

Granularity is statements in specification
Faster than scheduling, average error 13%

Specification:

    u1 := u x dx;
    u2 := 5 x w;
    u3 := 3 x y;
    y1 := i x dx;
    w  := w + dx;
    u4 := u1 x u2;
    u5 := dx x u3;
    y  := y + y1;
    u6 := u - u4;
    u  := u6 - u5;

Available functional units:

  t_i     num(t_i)   clocks(t_i)
  add     1          1
  mult    2          4
  sub     1          1

Control steps per macronode (maximum over the operation types it uses):

  n1 (u1, u2, u3, y1, w):   add: (1/1)*1 = 1   mult: (4/2)*4 = 8   max(1, 8) = 8
  n2 (u4, u5, y):           add: (1/1)*1 = 1   mult: (2/2)*4 = 4   max(1, 4) = 4
  n3 (u6):                  sub: (1/1)*1 = 1                       max(1) = 1
  n4 (u):                   sub: (1/1)*1 = 1                       max(1) = 1

Estimated total control steps = 14
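The per-macronode arithmetic can be written as a short Python sketch. It assumes the macronodes have already been formed; the function names and the dictionary representation are illustrative, and rounding the ratio occurrences/units up to a whole number is an assumption that makes no difference for the values in this example.

```python
import math


def macronode_steps(ops, num, clocks):
    """Control steps for one macronode.

    ops maps operation type to its number of occurrences in the
    macronode; num and clocks describe the available functional units."""
    return max(
        math.ceil(count / num[t]) * clocks[t] for t, count in ops.items()
    )


def control_steps(macronodes, num, clocks):
    """Total estimate: sum of the per-macronode maxima."""
    return sum(macronode_steps(m, num, clocks) for m in macronodes)


NUM = {"add": 1, "mult": 2, "sub": 1}        # num(t_i)
CLOCKS = {"add": 1, "mult": 4, "sub": 1}     # clocks(t_i)
MACRONODES = [
    {"mult": 4, "add": 1},   # n1
    {"mult": 2, "add": 1},   # n2
    {"sub": 1},              # n3
    {"sub": 1},              # n4
]
```

`control_steps(MACRONODES, NUM, CLOCKS)` gives 8 + 4 + 1 + 1, matching the estimated total above.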


Branching in behaviors

Control steps may be shared across exclusive branches

  Sharing schedule: fewer states, status register
  Non-sharing schedule: more states, no status registers

[Figure: (a) basic blocks B1 (o1, o2), B2 (o3, o4, o5), B3 (o6, o7), and B4 (o8), with B2 and B3 on exclusive branches; (b) sharing schedule using states s1 to s6; (c) non-sharing schedule using states s1 to s8.]


Execution time estimation

Average start-to-finish time of behavior

Straight-line code behaviors

$exectime(B) = csteps(B) \times clk$

Behavior with branching

  Estimate execution time for each basic block
  Create control flow graph from basic blocks
  Determine branching probabilities
  Formulate equations for node frequencies
  Solve set of equations

$exectime(B) = \sum_{b_i \in B} exectime(b_i) \times freq(b_i)$


Probability-based flow analysis

Behavior:

    A := A + 1;
    for I in 1 to 10 loop
      B := B + 1;
      C := C - A;
      if (D > A) then
        D := D + 2;
      else
        D := D + 3;
      end if;
      E := D * 2;
    end loop;
    B := B * A;
    C := 3;

Control flow graph (basic block B_i is node v_i):

  v1: A := A + 1;
  v2: B := B + 1; C := C - A;   (branches on D > A)
  v3: D := D + 2;               (D > A)
  v4: D := D + 3;               (D <= A)
  v5: E := D * 2;               (loop test on I)
  v6: B := B * A; C := 3;

Branch probabilities: e23 = 0.5, e24 = 0.5, e52 = 0.9 (I <= 10), e56 = 0.1 (I > 10)
Other edges: e12, e35, e45


Probability-based flow analysis

Flow equations:

$$\begin{aligned}
freq(S) &= 1.0 \\
freq(v_1) &= 1.0 \times freq(S) \\
freq(v_2) &= 1.0 \times freq(v_1) + 0.9 \times freq(v_5) \\
freq(v_3) &= 0.5 \times freq(v_2) \\
freq(v_4) &= 0.5 \times freq(v_2) \\
freq(v_5) &= 1.0 \times freq(v_3) + 1.0 \times freq(v_4) \\
freq(v_6) &= 0.1 \times freq(v_5)
\end{aligned}$$

Node execution frequencies:

$$\begin{aligned}
freq(v_1) &= 1.0 & freq(v_2) &= 10.0 \\
freq(v_3) &= 5.0 & freq(v_4) &= 5.0 \\
freq(v_5) &= 10.0 & freq(v_6) &= 1.0
\end{aligned}$$

Can be used to estimate number of accesses to

  variables, channels or procedures
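The flow equations form a small linear system, so the node frequencies can be obtained mechanically. The Python sketch below builds the system from the edge probabilities and solves it with Gauss-Jordan elimination; the edge-dictionary representation, the function name, and the choice of solver are illustrative assumptions, not the method prescribed by the slides.

```python
def node_frequencies(nodes, edges, start="S"):
    """Solve freq(v) = sum over edges (u, v) of p(u, v) * freq(u),
    with freq(start) = 1.0.

    edges maps (source, destination) to a branch probability."""
    n = len(nodes)
    index = {v: i for i, v in enumerate(nodes)}
    # Augmented matrix: row i encodes the flow equation of node i.
    a = [[0.0] * (n + 1) for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    for (src, dst), p in edges.items():
        if src == start:
            a[index[dst]][n] += p            # constant term, freq(S) = 1
        else:
            a[index[dst]][index[src]] -= p
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return {v: a[index[v]][n] / a[index[v]][index[v]] for v in nodes}


EDGES = {
    ("S", "v1"): 1.0, ("v1", "v2"): 1.0, ("v5", "v2"): 0.9,
    ("v2", "v3"): 0.5, ("v2", "v4"): 0.5,
    ("v3", "v5"): 1.0, ("v4", "v5"): 1.0, ("v5", "v6"): 0.1,
}
NODES = ["v1", "v2", "v3", "v4", "v5", "v6"]
```

Multiplying each frequency returned by `node_frequencies(NODES, EDGES)` by the execution time of its basic block and summing gives the execution time estimate for a behavior with branching.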


Communication rates
[Figure: bits sent over channel C plotted against time, from 0 to 1000 ns.]

Average channel rate

  Rate of data transfer over lifetime of behavior

$averate(C) = \frac{56\ bits}{1000\ ns} = 56\ Mb/s$

Peak channel rate

  Rate of data transfer of single message

$peakrate(C) = \frac{8\ bits}{100\ ns} = 80\ Mb/s$


Communication rate estimation

Total behavior execution time consists of

  Computation time, comptime(B), obtained from flow analysis
  Communication time

$commtime(B, C) = access(B, C) \times delay(C)$

Total bits transferred by the channel

$total\_bits(B, C) = access(B, C) \times bits(C)$

Channel average rate

$averate(C) = \frac{total\_bits(B, C)}{comptime(B) + commtime(B, C)}$

Channel peak rate

$peakrate(C) = \frac{bits(C)}{protocol\_delay(C)}$
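The rate formulas translate directly into code. In the Python sketch below, times are in nanoseconds and rates in bits per nanosecond; the function names and the sample values in the usage note (7 accesses of 8 bits, 100 ns per transfer, 300 ns of computation) are hypothetical, chosen so that the totals come to 56 bits over 1000 ns.

```python
def commtime(accesses, delay):
    """Communication time: number of channel accesses times delay per access."""
    return accesses * delay


def average_rate(accesses, bits, comptime, delay):
    """Total bits sent divided by total behavior execution time."""
    total_bits = accesses * bits
    return total_bits / (comptime + commtime(accesses, delay))


def peak_rate(bits, protocol_delay):
    """Rate while a single message is being transferred."""
    return bits / protocol_delay


def to_mbps(bits_per_ns):
    """Convert bits/ns to Mb/s."""
    return bits_per_ns * 1000.0
```

For example, `to_mbps(average_rate(7, 8, 300, 100))` evaluates the average rate for the hypothetical values, and `to_mbps(peak_rate(8, 100))` the peak rate. The gap between the two indicates how bursty the traffic on the channel is.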



Area estimation

Two tasks:
  Determining number and type of components required
  Estimating component size for a specific technology (FSMD, gate arrays etc.)

Behavior implemented as a FSMD (finite state machine with datapath)

  Datapath components: registers, functional units, multiplexers/buses
  Control unit: state register, control logic, next-state logic

We will discuss
  Datapath component estimation
  Control unit estimation
  Layout area for a custom implementation


Clique-partitioning

Commonly used for determining datapath components
Let G = (V, E) be a graph, where V and E are the sets of vertices and edges
Clique is a complete subgraph of G
Clique-partitioning

  Divides the vertices into a minimal number of cliques
  Each vertex in exactly one clique

One heuristic: maximum number of common neighbors [CS86]

  Two nodes with maximum number of common neighbors are merged
  Edges to two nodes replaced by edges to merged node
  Process repeated till no more nodes can be merged


Clique-partitioning
[Figure: clique-partitioning example on a graph with vertices v1..v5 (super-nodes s1..s5), merged step by step]

Step 1: original graph
  Edge               e(1,3)  e(1,4)  e(2,3)  e(2,5)  e(3,4)  e(4,5)
  Common neighbors     1       1       0       0       1       0

Step 2: s1 and s3 merged into s13
  Edge               e(13,4)  e(2,5)  e(4,5)
  Common neighbors     0        0       0

Step 3: s13 and s4 merged into s134
  Edge               e(2,5)
  Common neighbors     0

Step 4: s2 and s5 merged into s25

Cliques:
  s134 = {v1, v3, v4}
  s25  = {v2, v5}


Storage-unit estimation

Variables not used concurrently may be mapped to the same storage unit
To use clique-partitioning, construct a graph where
  Each variable is represented by a vertex
  Variables with non-overlapping lifetimes have an edge between their vertices

[Figure: lifetimes of variables v1..v11 and the resulting graph]

Cliques            Storage unit
  {v2, v3}       = R1
  {v6, v7, v9}   = R2
  {v4, v5, v8}   = R3
  {v10, v11}     = R4
  {v1}           = R5
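The graph construction for storage-unit estimation can be sketched as follows. This is an illustrative Python sketch: the function name `compatibility_edges`, the representation of a lifetime as an inclusive `(first_step, last_step)` pair, and the sample variables `x`, `y`, `z` are all assumptions, not taken from the source.

```python
from itertools import combinations


def compatibility_edges(lifetimes):
    """Return edges between variables whose lifetimes do not overlap.

    lifetimes maps a variable name to an inclusive (first_step, last_step) pair.
    """
    edges = []
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(
            sorted(lifetimes.items()), 2):
        # Two variables can share a storage unit only if one dies
        # strictly before the other is born.
        if a_end < b_start or b_end < a_start:
            edges.append((a, b))
    return edges


# Hypothetical lifetimes: x lives in steps 1-2, y in 3-4, z in 2-3.
print(compatibility_edges({"x": (1, 2), "y": (3, 4), "z": (2, 3)}))
```

The resulting edge list is the input a clique partitioner needs: each clique it finds corresponds to one register.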


Functional-unit and interconnect-unit estimation

Clique-partitioning can be applied
For determining the number of FUs required, construct a graph where
  Each operation in the behavior is represented by a vertex
  An edge connects two vertices if
    the corresponding operations are assigned to different control steps
    there exists an FU that can implement both operations

For determining the number of interconnect units, construct a graph where
  Each connection between two units is represented by a vertex
  An edge connects two vertices if the corresponding connections are not used in the same control step


Computing datapath area

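[Figure: bit-sliced datapath. Bit slices run from LSB to MSB; each slice has length L_bit and height H_bit, made up of the datapath components (height H_cell) and a routing channel (height H_rt); control lines cross the slices]

$L_{bit} = \alpha \times tr(DP)$
$H_{rt} = \frac{nets}{nets\_per\_track} \times \beta$
$area(bit) = L_{bit} \times (H_{cell} + H_{rt})$
$area(DP) = bitwidth(DP) \times area(bit)$

where $\alpha$ and $\beta$ are technology-dependent pitch coefficients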


Pin estimation

Number of wires at a behavior's boundary depends on
  Global data
  Ports accessed
  Communication channels used
  Procedure calls

variable N : integer;
variable X : bit_vector(15 downto 0);

procedure SUM(A, B, OUT) is
begin
  ....
end SUM;

process Main (ch1, ch2)
  out channel ch1;
  in channel ch2;
{
  send(ch1, N);
  portF <= portG + 4;
  ............
  receive(ch2, Result);
}

process Factorial (ch1, ch2)
  in channel ch1;
  out channel ch2;
{
  receive(ch1, M);
  /* compute factorial */
  ................
  send(ch2, result);
}

[Figure: process Main accesses ports portF and portG and communicates with process Factorial over channel ch1 and channel ch2]


Software estimation models

[Figure: two software estimation models]

Processor-specific model
  The specification is compiled separately to 8086, 68000, and MIPS instructions
  Each instruction stream goes to its own estimator (8086 Estimator, 68000 Estimator, MIPS Estimator), which uses that processor's instruction timing and size information
  The estimators produce the software metrics

Generic model
  The specification is compiled once to generic instructions
  A single estimator uses technology files for the target processors (8086, 68000, and MIPS instruction timing and size information)
  The estimator produces the software metrics


Deriving processor technology files

Generic instruction
  dmem3 = dmem1 + dmem2

8086 instructions
  instruction                       clocks      bytes
  mov ax, word ptr[bp+offset1]      (10)        3
  add ax, word ptr[bp+offset2]      (9 + EA1)   4
  mov word ptr[bp+offset3], ax      (10)        3

68020 instructions
  instruction                       clocks      bytes
  mov a6@(offset1), d0              (7)         2
  add a6@(offset2), d0              (2 + EA2)   2
  mov d0, a6@(offset3)              (5)         2

Technology file for 8086
  generic instruction       execution time   size
  ...
  dmem3 = dmem1 + dmem2     35 clocks        10 bytes
  ...

Technology file for 68020
  generic instruction       execution time   size
  ...
  dmem3 = dmem1 + dmem2     22 clocks        6 bytes
  ...
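A technology-file entry is simply the sum of the clocks and bytes of the target instructions that implement one generic instruction. The Python sketch below illustrates this; the function name `technology_entry` is hypothetical, and the effective-address terms EA1 = 6 and EA2 = 8 are not stated in the source but are the values implied by the 35-clock and 22-clock totals.

```python
def technology_entry(instructions):
    """Sum (clocks, bytes) pairs of the instructions for one generic instruction."""
    clocks = sum(c for c, _ in instructions)
    size = sum(b for _, b in instructions)
    return clocks, size


EA1 = 6   # assumed: implied by 10 + (9 + EA1) + 10 = 35
EA2 = 8   # assumed: implied by 7 + (2 + EA2) + 5 = 22

entry_8086 = technology_entry([(10, 3), (9 + EA1, 4), (10, 3)])
entry_68020 = technology_entry([(7, 2), (2 + EA2, 2), (5, 2)])
print(entry_8086, entry_68020)
```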


Software estimation

Program execution time
  Create basic blocks and compile them into generic instructions
  Estimate the execution time of each basic block
  Perform probability-based flow analysis
  Compute the execution time of the entire behavior:
    $exectime(B) = \delta \times \left( \sum_{b_i \in B} exectime(b_i) \times freq(b_i) \right)$
    where $\delta$ accounts for compiler optimizations

Program memory size
  $progsize(B) = \sum_{g \in G} instr\_size(g)$

Data memory size
  $datasize(B) = \sum_{d \in D} datasize(d)$
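The three software metrics are weighted or plain sums, which a short Python sketch makes concrete. The function names, the optimization factor passed as `delta`, and the sample block times and frequencies are assumptions for illustration only.

```python
def exectime(blocks, delta=1.0):
    """Execution time of a behavior.

    blocks is a list of (block_time, frequency) pairs, one per basic block;
    delta is a scaling factor standing in for compiler optimizations.
    """
    return delta * sum(time * freq for time, freq in blocks)


def progsize(instr_sizes):
    """Program memory size: sum of the sizes of all generic instructions."""
    return sum(instr_sizes)


def datasize(decl_sizes):
    """Data memory size: sum of the sizes of all data declarations."""
    return sum(decl_sizes)


# Hypothetical behavior: one block of 35 clocks executed 10 times,
# one block of 22 clocks executed 2.5 times on average.
print(exectime([(35, 10), (22, 2.5)]))             # unoptimized estimate
print(exectime([(35, 10), (22, 2.5)], delta=0.5))  # with an assumed factor
print(progsize([10, 6, 10]), datasize([2, 2, 128]))
```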


Summary and future directions

We described methods for estimating:
  Performance metrics: clock, control steps, execution time, communication rates
  Cost metrics: design area, pins, program and data memory size

Future directions:
  Incorporating synthesis/compilation optimizations
  New metrics for testability, power, integration cost, etc.
  New architectural features for the estimation model


Refinement

Functional objects are grouped and mapped to system components
  Functional objects: variables, behaviors, and channels
  System components: memories, chips or processors, and buses

Refinement is the update of the specification to reflect the mapping
Need for refinement
  Makes the specification consistent
  Enables simulation of the specification
  Generates input for synthesis, compilation, and verification tools


Outline

Refining variable groups
Channel refinement
Resolving access conflicts
Refining incompatible interfaces


Refining variable groups

Group of variables mapped to a memory
Variable folding:
  Implementing each variable in a memory with a fixed word size

Memory address translation
  Assignment of addresses to each variable in the group
  Update of references to each variable into accesses to memory


Variable folding
variable A : bit_vector( 3 downto 0);
variable B : bit_vector(15 downto 0);
variable C : bit_vector(11 downto 0);
variable D : bit_vector(11 downto 0);

[Figure: the variables folded into an 8-bit memory, one slice per word: A(3 downto 0), B(7 downto 0), B(15 downto 8), C(7 downto 0), C(11 downto 8), D(5 downto 0), D(11 downto 6). Accesses to variable C and variable D in memory reassemble the 12-bit values (bits 11..8 and 7..0) from their slices]
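Folding a variable into fixed-size words amounts to cutting its bit range into slices no wider than the word. The Python sketch below uses a simple low-bits-first split; the function name `fold` is hypothetical, and the split is an assumption (the figure folds D as 5..0 and 11..6, a different but equally valid choice).

```python
def fold(width, word_size):
    """Split a width-bit variable into (high, low) bit slices of at most word_size bits."""
    slices = []
    low = 0
    while low < width:
        high = min(low + word_size, width) - 1
        slices.append((high, low))
        low = high + 1
    return slices


# Variables A (4 bits), B (16 bits), and C (12 bits) in an 8-bit memory.
print(fold(4, 8))    # A fits in one word
print(fold(16, 8))   # B needs two full words
print(fold(12, 8))   # C needs one full word and one partial word
```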


Memory address translation


Original specification
  variable J, K : integer := 0;
  variable V : IntArray (63 downto 0);
  ....
  V(K) := 3;
  X := V(36);
  V(J) := X;
  ....
  for J in 0 to 63 loop
    SUM := SUM + V(J);
  end loop;
  ....

Assigning addresses to V
  V (63 downto 0)  ->  MEM (163 downto 100)

Refined specification
  variable J, K : integer := 0;
  variable MEM : IntArray (255 downto 0);
  ....
  MEM(K + 100) := 3;
  X := MEM(136);
  MEM(J + 100) := X;
  ....
  for J in 0 to 63 loop
    SUM := SUM + MEM(J + 100);
  end loop;
  ....

Refined specification without offsets for index J
  variable J : integer := 100;
  variable K : integer := 0;
  variable MEM : IntArray (255 downto 0);
  ....
  MEM(K + 100) := 3;
  X := MEM(136);
  MEM(J) := X;
  ....
  for J in 100 to 163 loop
    SUM := SUM + MEM(J);
  end loop;
  ....


Refining channel groups

Channels are virtual entities over which messages are transferred
A bus is a physical medium that implements groups of channels
A bus consists of:
  wires representing data and control lines
  a protocol defining the sequence of assignments to data and control lines

Two refinement tasks
  Bus generation: determining the buswidth, i.e. the number of data lines
  Protocol generation: specifying the mechanism of transfer over the bus


Characterizing communication channels

For a given behavior P that sends data over channel C,
  Message size, bits(C): number of bits in each message
  Accesses, access(P, C): number of times P transfers data over C
  Average rate, averate(C): rate of data transfer of C over the lifetime of the behavior
  Peak rate, peakrate(C): rate of transfer of a single message

[Figure: channel X carries three 8-bit messages X1, X2, X3 on a time axis from t = 0 to 400 ns]

$bits(C) = 8\ \text{bits}$
$averate(C) = \frac{24\ \text{bits}}{400\ \text{ns}} = 60\ \text{Mbits/s}$
$peakrate(C) = \frac{8\ \text{bits}}{100\ \text{ns}} = 80\ \text{Mbits/s}$



Characterizing buses

For a given bus B,
  Buswidth, buswidth(B): number of data lines in B
  Protocol delay, protdelay(B): delay for a single message transfer over the bus
  Average rate, averate(B): rate of data transfer over B over the lifetime of the system
  Peak rate, peakrate(B): maximum rate of transfer of data on the bus

  $peakrate(B) = \frac{buswidth(B)}{protdelay(B)}$


Determining bus rates

Idle slots of a channel are used for messages of other channels
To ensure that channel average rates are unaffected by the bus,
  $averate(B) \ge \sum_{C \in B} averate(C)$

Goal: to synthesize a bus that constantly transfers data, i.e.
  $peakrate(B) = \sum_{C \in B} averate(C)$

[Figure: channels X and Y merged onto bus B over a 4 s interval]
                 Messages                       Average rate
  channel X      X1, X2 (8 bits each)           (2 x 8 bits) / 4 s = 4 bits/s
  channel Y      Y1, Y2, Y3 (16 bits each)      (3 x 16 bits) / 4 s = 12 bits/s
  bus B          all five messages              (4 + 12) bits/s = 16 bits/s


Constraints for bus generation

Buswidth: affects the number of pins on chip boundaries
Channel average rates: affect the execution time of behaviors
Channel peak rates: affect the time required for a single message transfer

[Figure: channel X sends two 16-bit messages, X1 and X2, within 4 s, so averate(X) = 8 bits/s]
  8-bit bus B:  averate(B) = 8 bits/s, peakrate(B) = 8 bits/s
  16-bit bus B: averate(B) = 8 bits/s, peakrate(B) = 16 bits/s


Bus generation algorithm [NG94]


/* Determine range of buswidths */
minwidth = 1, maxwidth = Max(bits(C))
mincost = ∞, mincostwidth = ∞

for currwidth in minwidth to maxwidth loop
  /* compute bus peak rate */
  peakrate(B) = currwidth / protdelay(B)

  /* compute sum of channel average rates */
  averatesum = 0;
  for all channels C ∈ B loop
    averate(C) = (access(P, C) × bits(C)) / (comptime(P) + commtime(P))
    averatesum = averatesum + averate(C);
  end loop

  if (peakrate(B) > averatesum) then
    /* feasible solution, determine minimal cost */
    currcost = ComputeCost(currwidth)
    if (currcost < mincost) then
      mincost = currcost, mincostwidth = currwidth
    end if
  end if
end loop
return(mincostwidth)


Bus generation algorithm

Compute buswidth range:
  minwidth = 1, maxwidth = Max(bits(C))

For minwidth ≤ currwidth ≤ maxwidth loop
  Compute bus peak rate:
    $peakrate(B) = currwidth / protdelay(B)$
  Compute channel average rates:
    $commtime(P) = access(P, C) \times \left[ \lceil bits(C) / currwidth \rceil \times protdelay(B) \right]$
    $averate(C) = \frac{access(P, C) \times bits(C)}{comptime(P) + commtime(P)}$
  if $peakrate(B) \ge \sum_{C \in B} averate(C)$ then
    if bestcost > ComputeCost(currwidth) then
      bestcost = ComputeCost(currwidth)
      bestwidth = currwidth
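The bus generation loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the [NG94] implementation: the function name `generate_bus`, the dictionary keys `bits`, `access`, and `comptime`, and the caller-supplied `compute_cost` function are all hypothetical, and the sample numbers are invented for illustration.

```python
import math


def generate_bus(channels, protdelay, compute_cost):
    """Return the feasible buswidth of minimal cost, or None if none is feasible.

    channels: list of dicts with keys 'bits' (message size), 'access'
    (number of transfers), and 'comptime' (computation time of the behavior).
    """
    maxwidth = max(ch["bits"] for ch in channels)
    best_cost, best_width = math.inf, None
    for width in range(1, maxwidth + 1):
        peakrate = width / protdelay
        averate_sum = 0.0
        for ch in channels:
            # A message wider than the bus needs several bus transfers.
            commtime = ch["access"] * math.ceil(ch["bits"] / width) * protdelay
            averate_sum += ch["access"] * ch["bits"] / (ch["comptime"] + commtime)
        if peakrate >= averate_sum:          # feasible buswidth
            cost = compute_cost(width)
            if cost < best_cost:
                best_cost, best_width = cost, width
    return best_width


# Two hypothetical behaviors, each sending ten 16-bit messages.
chans = [{"bits": 16, "access": 10, "comptime": 100},
         {"bits": 16, "access": 10, "comptime": 100}]
print(generate_bus(chans, protdelay=1, compute_cost=lambda w: w))
```

With a cost function that simply prefers fewer wires, the sketch picks the narrowest width whose peak rate covers the summed channel average rates. A cost function that also penalizes constraint violations could select a wider bus.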


Bus generation example

Two behaviors accessing 16-bit data over two channels
Constraints specified for channel peak rates

[Figure: cost function value (axis from -1000 to 9000) plotted against buswidth (0 to 24), with infeasible implementations, feasible implementations, and the selected buswidth marked]


Performance vs. buswidth tradeoffs

Allows a buswidth to be selected, given performance constraints
  e.g. if behavior P1 has a performance constraint of 2500 clocks, a buswidth of 4 or greater must be selected

[Figure: behavior execution time (clocks, 0 to 7000) plotted against buswidth (pins, 0 to 24)]


Protocol generation

Bus consists of several sets of wires:
  Data lines, used for transferring message bits
  Control lines, used for synchronization between behaviors
  ID lines, used for identifying the channel active on the bus

All channels mapped to the bus share these lines
The number of data lines is determined by the bus generation algorithm
Protocol generation consists of six steps


Protocol generation
1. Protocol selection: full handshake, half-handshake, etc.
2. ID assignment: N channels require ⌈log2(N)⌉ ID lines

behavior P
  variable AD;
begin
  .....
  X <= 32;
  .....
  MEM(AD) := X + 7;
  .....
end;

behavior Q
  variable COUNT;
begin
  .....
  MEM(60) := COUNT;
  .....
end;

variable X : bit_vector(15 downto 0);
variable MEM : bit_vector(63 downto 0, 15 downto 0);

[Figure: behaviors P and Q access variables X and MEM over bus B through four channels with IDs CH0 = "00", CH1 = "01", CH2 = "10", CH3 = "11"]
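ID assignment can be illustrated with a short Python sketch. The function name `assign_ids` and the choice to number channels in list order are assumptions; the source only states how many ID lines are needed.

```python
def assign_ids(channels):
    """Give each channel a distinct binary ID using the fewest ID lines."""
    # (n - 1).bit_length() is the smallest k with 2**k >= n; keep at least one line.
    id_lines = max(1, (len(channels) - 1).bit_length())
    return {ch: format(i, "0{}b".format(id_lines))
            for i, ch in enumerate(channels)}


print(assign_ids(["CH0", "CH1", "CH2", "CH3"]))
```

Four channels need two ID lines, while a fifth channel would require a third line.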


Protocol generation
3. Bus structure definition

type HandShakeBus is record
  START, DONE : bit;
  ID : bit_vector(1 downto 0);
  DATA : bit_vector(7 downto 0);
end record;
signal B : HandShakeBus;

4. Bus protocol definition

procedure ReceiveCH0(rxdata : out bit_vector) is
begin
  for J in 1 to 2 loop
    wait until (B.START = 1) and (B.ID = "00");
    rxdata(8*J-1 downto 8*(J-1)) <= B.DATA;
    B.DONE <= 1;
    wait until (B.START = 0);
    B.DONE <= 0;
  end loop;
end ReceiveCH0;

procedure SendCH0(txdata : in bit_vector) is
begin
  B.ID <= "00";
  for J in 1 to 2 loop
    B.DATA <= txdata(8*J-1 downto 8*(J-1));
    B.START <= 1;
    wait until (B.DONE = 1);
    B.START <= 0;
    wait until (B.DONE = 0);
  end loop;
end SendCH0;


Protocol generation

5. Update variable references
6. Generate behaviors for variables

process P
  variable AD, Xtemp;
begin
  .....
  SendCH0(32);
  .....
  ReceiveCH1(Xtemp);
  SendCH2(AD, Xtemp + 7);
  .....
end;

process Q
  variable COUNT;
begin
  .....
  SendCH3(60, COUNT);
  .....
end;

process Xproc
  variable X;
begin
  wait on B.ID;
  if (B.ID = "00") then
    receiveCH0(X);
  elsif (B.ID = "01") then
    sendCH1(X);
  end if;
end;

process MEMproc
  variable MEM : array(0 to 63);
begin
  wait on B.ID;
  if (B.ID = "10") then
    receiveCH2(MEM);
  elsif (B.ID = "11") then
    receiveCH3(MEM);
  end if;
end;

[Figure: processes P, Q, Xproc, and MEMproc connected to bus B]


Resolving access conflicts

System partitioning may result in concurrent accesses to a resource
  Channels mapped to a bus may attempt data transfer simultaneously
  Variables mapped to a memory may be accessed by behaviors simultaneously

An arbiter needs to be generated to resolve such access conflicts
Three tasks
  Arbitration model selection
  Arbitration scheme selection
  Arbiter generation


Arbitration models
[Figure: two arbitration models for behaviors P, Q, and R accessing a memory MEM with two ports (port1, port2) over addr/data lines]

Static
  MemArbiter is connected to the behaviors through two req/grant pairs

Dynamic
  MemArbiter is connected to the behaviors through three req/grant pairs


Arbiter generation

Example of bus arbitration

Two behaviors accessing a single resource, bus B
Behavior P is assigned higher priority than Q
Fixed priority is implemented with two handshake signals, Req and Grant

process P
  variable AD, Xtemp;
begin
  .....
  Req_P <= 1;
  wait until (Grant_P = 1);
  SendCH0(32);
  Req_P <= 0;
  .....
end process;

process Q
  variable COUNT;
begin
  .....
  Req_Q <= 1;
  wait until (Grant_Q = 1);
  SendCH3(60, COUNT);
  Req_Q <= 0;
  .....
end process;

process B_arbiter
begin
  wait until (Req_P = 1) or (Req_Q = 1);
  if (Req_P = 1) then
    Grant_P <= 1;
    wait until (Req_P = 0);
    Grant_P <= 0;
  elsif (Req_Q = 1) then
    Grant_Q <= 1;
    wait until (Req_Q = 0);
    Grant_Q <= 0;
  end if;
end process;

process Xproc
  variable X;
begin
  wait on B.ID;
  if (B.ID = "00") then
    receiveCH0(X);
  elsif (B.ID = "01") then
    sendCH1(X);
  end if;
end process;

process MEMproc
  variable MEM : array(0 to 63);
begin
  wait on B.ID;
  if (B.ID = "10") then
    receiveCH2(MEM);
  elsif (B.ID = "11") then
    receiveCH3(MEM);
  end if;
end process;

[Figure: processes P and Q connected to B_arbiter through Req_P/Grant_P and Req_Q/Grant_Q, and to Xproc and MEMproc through bus B]
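The decision made by a fixed-priority arbiter on each cycle can be sketched in Python. This models only the grant decision, not the Req/Grant handshake timing; the function name `arbitrate` and the dictionary-based request representation are assumptions.

```python
def arbitrate(requests, priority):
    """Return the requester to grant, or None if nobody is requesting.

    requests maps a behavior name to True when its Req line is asserted;
    priority lists behavior names from highest to lowest priority.
    """
    for name in priority:
        if requests.get(name):
            return name
    return None


# P has priority over Q, as in the bus arbitration example.
print(arbitrate({"P": True, "Q": True}, ["P", "Q"]))
print(arbitrate({"P": False, "Q": True}, ["P", "Q"]))
```

When both behaviors request the bus, P wins; Q is granted the bus only while P is idle.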


Effect of binding on interfaces


[Figure: behaviors A and B, with protocols Pa and Pb, communicating over channel X under three bindings]
  Custom / Custom
  Custom / Standard
  Standard / Standard: an Interface Process is placed between Pa and Pb


Protocol operations

Protocols usually consist of five atomic operations
  waiting for an event on an input control line
  assigning a value to an output control line
  reading a value from an input data port
  assigning a value to an output data port
  waiting for a fixed time interval

Protocol operations may be specified in one of three ways
  Finite state machines (FSMs)
  Timing diagrams
  Hardware description languages (HDLs)


Protocol specification: FSMs

Protocol operations are ordered by sequencing between states
Constraints between events may be specified using timing arcs
Conditional and repetitive event sequences require extra states and transitions

[Figure: state machines for protocols Pa and Pb]

Protocol Pa
  start
  state a1: ADDRp <= AddrVar(7 downto 0); ARDYp <= 1;
  a1 -> a2 on (ARCVp = 1)
  state a2: ADDRp <= AddrVar(15 downto 8); DREQp <= 1;
  a2 -> a3 on (DRDYp = 1)
  state a3: DataVar <= DATAp

Protocol Pb
  start
  b1 -> b2 on (RDp = 1)
  state b2: MAddrVar := MADDRp
  b2 -> b3 with timing arc (100 ns)
  state b3: MDATAp <= MemVar(MAddrVar)

Protocol specification: Timing diagrams

Advantages:
  Ease of comprehension, representation of timing constraints

Disadvantages:
  Lack of action language, not simulatable
  Difficult to specify conditional and repetitive event sequences

[Figure: timing diagrams. Protocol Pa: signals ARDYp, ADDRp (bits 7..0, then 15..8), ARCVp, DREQp, DRDYp, DATAp (15..0). Protocol Pb: signals MADDRp (15..0), RDp, MDATAp (15..0), with a 100 ns timing constraint]


Protocol specification: HDLs

Advantages:
  Functionality can be verified by simulation
  Easy to specify conditional and repetitive event sequences

Disadvantages:
  Cumbersome to represent timing constraints between events

Protocol Pa
  port ADDRp : out bit_vector(7 downto 0);
  port DATAp : in bit_vector(15 downto 0);
  port ARDYp : out bit;
  port ARCVp : in bit;
  port DREQp : out bit;
  port DRDYp : in bit;

  ADDRp <= AddrVar(7 downto 0);
  ARDYp <= 1;
  wait until (ARCVp = 1);
  ADDRp <= AddrVar(15 downto 8);
  DREQp <= 1;
  wait until (DRDYp = 1);
  DataVar <= DATAp;

Protocol Pb
  port MADDRp : in bit_vector(15 downto 0);
  port MDATAp : out bit_vector(15 downto 0);
  port RDp : in bit;

  wait until (RDp = 1);
  MAddrVar := MADDRp;
  wait for 100 ns;
  MDATAp <= MemVar(MAddrVar);

[Figure: port widths. Protocol Pa: ADDRp (8), DATAp (16), ARDYp, ARCVp, DREQp, DRDYp. Protocol Pb: MADDRp (16), MDATAp (16), RDp]


Interface process generation

Input: HDL description of two fixed, but incompatible protocols
Output: HDL process that translates one protocol to the other
  i.e. responds to their control signals and sequences their data transfers

Four steps required for generating the interface process (IP):
  Creating relations
  Partitioning relations into groups
  Generating interface process statements
  Interconnect optimization


IP generation: creating relations

Protocol represented as an ordered set of relations
Relations are sequences of events/actions

Protocol Pa
  ADDRp <= AddrVar(7 downto 0);
  ARDYp <= 1;
  wait until (ARCVp = 1);
  ADDRp <= AddrVar(15 downto 8);
  DREQp <= 1;
  wait until (DRDYp = 1);
  DataVar <= DATAp;

Relations
  A1 [ (true) : ADDRp <= AddrVar(7 downto 0)
                ARDYp <= 1 ]
  A2 [ (ARCVp = 1) : ADDRp <= AddrVar(15 downto 8)
                     DREQp <= 1 ]
  A3 [ (DRDYp = 1) : DataVar <= DATAp ]
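Relation creation can be sketched as splitting the statement list at each `wait until`. The Python sketch below is an assumption about one straightforward way to do this, with a hypothetical function name `create_relations`; it treats statements as plain strings and does not handle `wait for` delays specially.

```python
def create_relations(statements):
    """Group protocol statements into (condition, actions) relations.

    Each 'wait until (...)' statement starts a new relation whose condition
    is the awaited event; statements before the first wait get condition 'true'.
    """
    relations = []
    condition, actions = "true", []
    for stmt in statements:
        if stmt.startswith("wait until"):
            if actions or condition != "true":
                relations.append((condition, actions))
            condition = stmt[len("wait until"):].strip().strip("()").strip()
            actions = []
        else:
            actions.append(stmt)
    if actions or condition != "true":
        relations.append((condition, actions))
    return relations


protocol_pa = [
    "ADDRp <= AddrVar(7 downto 0)",
    "ARDYp <= 1",
    "wait until (ARCVp = 1)",
    "ADDRp <= AddrVar(15 downto 8)",
    "DREQp <= 1",
    "wait until (DRDYp = 1)",
    "DataVar <= DATAp",
]
for cond, acts in create_relations(protocol_pa):
    print(cond, acts)
```

Applied to protocol Pa, the sketch yields three relations with conditions `true`, `ARCVp = 1`, and `DRDYp = 1`, matching A1, A2, and A3.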


IP generation: partitioning relations

Partition the set of relations from both protocols into groups
Each group represents a unit of data transfer

[Figure: relations of protocol Pa (A1: 8 bits out, A2: 8 bits out, A3: 16 bits in) and protocol Pb (B1: 16 bits in, B2: 16 bits out) partitioned into groups G1 and G2]

G1 = (A1 A2 B1)
G2 = (B2 A3)

IP generation: inverting protocol operations

For each operation in a group, add its dual to the interface process
The dual of an operation represents the complementary operation
A temporary variable may be required to hold data values

Atomic operation        Dual operation
wait until (Cp = 1)     Cp <= 1
Cp <= 1                 wait until (Cp = 1)
var <= Dp               Dp <= TempVar
Dp <= var               TempVar := Dp
wait for 100 ns         wait for 100 ns

Interface Process
  /* (group G1) */
  wait until (ARDYp = 1);
  TempVar1(7 downto 0) := ADDRp;
  ARCVp <= 1;
  wait until (DREQp = 1);
  TempVar1(15 downto 8) := ADDRp;
  RDp <= 1;
  MADDRp <= TempVar1;
  /* (group G2) */
  wait for 100 ns;
  TempVar2 := MDATAp;
  DRDYp <= 1;
  DATAp <= TempVar2;

[Figure: the interface process connected to protocol Pa through ADDRp (8), DATAp (16), ARDYp, ARCVp, DREQp, DRDYp and to protocol Pb through MADDRp (16), MDATAp (16), RDp]
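The table of dual operations translates directly into a lookup. In the Python sketch below, operations are represented as `(kind, name)` tuples; the kind labels (`wait_control`, `set_control`, `read_data`, `write_data`, `delay`) and the function name `dual` are assumptions introduced for illustration.

```python
def dual(op, temp="TempVar"):
    """Return the interface-process statement that complements a protocol operation."""
    kind, arg = op
    if kind == "wait_control":      # protocol waits on Cp, so the IP drives it
        return "{} <= 1".format(arg)
    if kind == "set_control":       # protocol drives Cp, so the IP waits on it
        return "wait until ({} = 1)".format(arg)
    if kind == "read_data":         # protocol reads Dp, so the IP supplies the data
        return "{} <= {}".format(arg, temp)
    if kind == "write_data":        # protocol writes Dp, so the IP latches the data
        return "{} := {}".format(temp, arg)
    if kind == "delay":             # a fixed delay is its own dual
        return "wait for {}".format(arg)
    raise ValueError("unknown operation kind: {}".format(kind))


# First relation of protocol Pa: write the low address byte, then raise ARDYp.
for operation in [("write_data", "ADDRp"), ("set_control", "ARDYp")]:
    print(dual(operation))
```

A real generator would also reorder the duals so that the interface process waits for a control event before latching the associated data, as the interface process listing does.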


IP generation: interconnect optimization

Certain ports of both protocols may be directly connected
Advantages:
  Bypassing the interface process reduces interconnect cost
  Operations related to these ports can be eliminated from the interface process

Interface Process
  wait until (ARDYp = 1);
  TempVar1(7 downto 0) := ADDRp;
  ARCVp <= 1;
  wait until (DREQp = 1);
  TempVar1(15 downto 8) := ADDRp;
  RDp <= 1;
  MADDRp <= TempVar1;
  wait for 100 ns;
  DRDYp <= 1;

[Figure: DATAp (16) is connected directly to MDATAp (16), bypassing the interface process; ADDRp (8), ARDYp, ARCVp, DREQp, and DRDYp on the Pa side and MADDRp (16) and RDp on the Pb side remain connected to it]


Transducer synthesis [BK87]

Input: Timing diagram description of two fixed protocols
Output: Logic circuit description of transducer

Steps for generating a logic circuit from timing diagrams:
  Create event graphs for both protocols
  Connect graphs based on data dependencies or explicitly specified ordering
  Add templates for each output node in the combined graph
  Merge and connect templates
  Satisfy min/max timing constraints
  Optimize skeletal circuit


Generating event graphs from timing diagrams


e.g. FIFO stack control cell

[Figure: a FIFO stack control cell with signals Ri, Ro, Ai, Ao, and L, its timing diagram, and the event graph derived from it]


Deriving skeletal circuit from event graph

[Figure: skeletal circuit derived from the event graph. Three SR latches (inputs S and R, output Q) generate Ro, L, and Ai from combinations of the signals Ao, Ri, Ro, Ai, and L]

Advantages:
  Synthesizes logic for the transducer circuit directly
  Accounts for min/max timing constraints between events

Disadvantages:
  Cannot interface protocols with different data port sizes
  Transducer not simulatable with the timing diagram description of protocols


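Hardware/Software interface refinement

[Figure: (a) Partitioned specification: a software partition with behaviors B1, B2 and objects v1, v2, and a hardware partition with behaviors B3, B4 and objects v3, v4, s1, s2; data accesses cross the partitions, and p1, p2, p3 are external ports. (b) Mapping to architecture: B1 and B2 run on the Processor with v1 and v2 in Memory; B3, B4, v3, v4, s1, and s2 are placed in the ASIC, which connects through a Buffer and Ports to p1, p2, p3]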


Tasks of hardware/software interfacing

Data access (e.g., behavior accessing variable) refinement
Control access (e.g., behavior starting behavior) refinement
Select bus to satisfy data transfer rate and reduce interfacing cost
Interface software/hardware components to standard buses
Schedule software behaviors to satisfy data input/output rate
Distribute variables to reduce ASIC cost and satisfy performance


Summary and future directions

In this section, we described:
  Refinement of variable groups: variable folding, address translation
  Refinement of channel groups: bus and protocol generation
  Resolution of access conflicts: arbiter generation
  Refinement of incompatible interfaces: IP generation, transducer synthesis

Future work should address the following issues:
  Effects of bus arbitration delays on the performance of a behavior
  Developing metrics to guide selection of protocols and arbitration schemes
  Efficient synthesis of arbiter and interface processes


Methodology

Past design effort focused on lower levels
Higher levels lack well-defined methodology and tools
Paradigm shift to higher levels can increase productivity
Need methodology and tools for system level


Outline

Basic concepts in design methodology
Example
A design methodology
A generic synthesis system
Conceptualization environment


Items a design methodology must specify

Syntax and semantics of input and output
Algorithms for transforming input to output
Components to be used in the design implementation
Definition and ranges of constraints
Mechanism for selection of architectural styles
Control strategies (scenarios or scripts)


Example: Interactive TV processor

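[Figure: InteractiveTvProcessor. Inputs audio_in and video_in pass through an analog subsystem into the digital subsystem, whose audio and video outputs pass through a second analog subsystem to audio_out and video_out. The digital subsystem also receives av_cmd, button input from a keypad receiver IC, and video, audio, and commands from the main computer]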


Example's dataflow behavior

[Figure: dataflow diagram of the digital subsystem]
  Behaviors: StoreAudio, GenerateAudio, StoreGenerateVideo, OverlayCharacters, StoreAVCmd, ProcessAVCmd, ProcessMainCmds, ProcessRemoteButtons
  Variables: audio1[100k][8], audio2[100k][8], video[500k][8], av_cmd[8], fonts[128][16][16], screen_chars[30][30][8]
  External ports: audio_in, audio_out, video_in, video_out, av_cmd, main_cmds, button


Example's implementation after system design

[Figure: digital subsystem after system design, with external ports audio_in, video_in, audio_out, video_out, av_cmd, main_cmds, button]
  Memory1:   audio1[100k][8], audio2[100k][8]
  Memory2:   video[500k][8]
  Memory3:   fonts[128][16][16], screen_chars[30][30][8], av_cmd[8]
  ASIC1:     StoreAudio, GenerateAudio
  ASIC2:     StoreGenerateVideo, StoreAVCmd
  Processor: ProcessAVCmd, ProcessMainCmds, ProcessRemoteButtons, OverlayCharacters


An example design methodology


[Figure: current practice versus proposed methodology across three design steps]

                               Current practice    Proposed methodology
  Functionality specification  Natural language    Executable language
  System design                Manual              Allocation, Partitioning, Refinement

  Functionality specification produces the functional specification
  System design produces a processor functional specification, ASIC functional specifications, memory variables, and a bus
  Component implementation produces C code for the processor, RTL structure for each ASIC, a memory-mapped address space, and a detailed bus protocol


System-design tasks

                      Functional objects
System-design tasks   Variables               Behaviors                 Channels
  Allocation          Memories                Processors                Buses
  Partitioning        Variables to memories   Behaviors to processors   Channels to buses
  Refinement          Address assignment      Interfacing               Arbitration/protocols


One possible ordering of tasks

1. Functionality specification
     Specification
2. System design
     Memory allocation
     Variable-to-memory partitioning
     Bus allocation
     Channel-to-bus partitioning
     ASIC/processor allocation
     Behavior-to-ASIC/processor partitioning
     Interface synthesis
     Arbiter synthesis
3. Component implementation
     Implement software
     Implement hardware


Generic synthesis system requirements

Completeness
All levels of design, all implementation styles

Extensibility
Allow addition of new algorithms and tools

Controllability
User control of tools, design-quality feedback

Interactivity
Partial design, design modification

Upgradability
Evolve to describe-and-synthesize method


A generic synthesis system


[Figure: a generic synthesis system. The designer works through a conceptualization environment and a verification/simulation suite. The system specification feeds system synthesis, followed by software synthesis (compilation, producing assembly code) and ASIC synthesis (logic/sequential synthesis and physical design synthesis, producing an ASIC description to manufacturing). Description generators, intermediate forms, and the SDB and CDB databases support the tools]

A generic system-synthesis tool


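[Figure: a generic system-synthesis tool. A compiler translates the system behavioral specification into the internal representation SR, on which the allocator, transformer, estimators, partitioner, and interface and arbitration synthesis operate. The output is a set of system-module behavioral specifications that go to software synthesis and to chip synthesis]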


A generic chip-synthesis tool


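[Figure: a generic chip-synthesis tool. A compiler translates the behavioral description into a CDFG, on which the scheduler, component selector, storage binder, functional unit binder, interconnection binder, and module selector operate. A technology mapper and a microarchitecture optimizer, supported by the CDB, pass the result to logic/sequential synthesis and then to physical design]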


A generic logic-synthesis tool


[Figure: a generic logic-synthesis tool. Inputs: state tables, Boolean expressions, timing diagrams, and memory specifications. Tools: state minimization, state encoding, timing graph compiler, interface synthesis, memory synthesis, logic minimization, and technology mapping. The output goes to physical design]


Conceptualization environment

Tool is only effective if the designer can use it


Understandable display of data
Highlight design parts that need attention

Must support many design avenues


A system-synthesis tool interface

[Figure: screenshot of a system-synthesis tool interface. Controls: Allocation, Partition, Estimates, Constraints, Partition/Allocate, Refine, View options. Cost: 5.43. Entries of the form a/b read estimate/constraint; asterisks are as marked on the slide.]

Mappings                  Module type  $         Execution time  Area         Pins   Instr
System                                 105/100*
ASIC1                     X100         30                        16000/20000  46/60
  CaptureAudio                                   100/110
  GenerateAudio                                  100/110
ASIC2                     X100         30                        18000/20000  48/60
  CaptureGenerateVideo                           100/110
  CaptureAVCmd                                   100/110
Memory1                   V1000        10
  audio_array1
  audio_array2
Memory2                   V1000        10
  video_array
Processor1                Y900         25                                            6000/5000*
  ProcessRemoteButtons
  ProcessMiscCmds
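The mapping shown on this slide can be captured in a small data structure. The sketch below transcribes the module types and dollar figures from the slide; the grouping of behaviors and variables under each component is an assumption about the original screen layout, and the helper names `system_dollars` and `component_of` are hypothetical. Under the further assumption that the system dollar estimate is simply the sum of the allocated modules' costs, the total reproduces the 105 shown against the constraint of 100.

```python
# Values transcribed from the slide; the nesting of objects under
# components is an assumed reading of the screenshot.
allocation = {
    "ASIC1":      {"type": "X100",  "dollars": 30,
                   "maps": ["CaptureAudio", "GenerateAudio"]},
    "ASIC2":      {"type": "X100",  "dollars": 30,
                   "maps": ["CaptureGenerateVideo", "CaptureAVCmd"]},
    "Memory1":    {"type": "V1000", "dollars": 10,
                   "maps": ["audio_array1", "audio_array2"]},
    "Memory2":    {"type": "V1000", "dollars": 10,
                   "maps": ["video_array"]},
    "Processor1": {"type": "Y900",  "dollars": 25,
                   "maps": ["ProcessRemoteButtons", "ProcessMiscCmds"]},
}


def system_dollars(alloc):
    """Sum the dollar cost of every allocated component (assumed cost model)."""
    return sum(component["dollars"] for component in alloc.values())


def component_of(alloc, obj):
    """Return the component that a behavior or variable is mapped to, or None."""
    for name, component in alloc.items():
        if obj in component["maps"]:
            return name
    return None


total = system_dollars(allocation)
print(total)                                    # 105
print(component_of(allocation, "video_array"))  # Memory2
```

How the slide's overall "Cost: 5.43" figure is derived is not stated, so it is deliberately left out of the sketch.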


An optional design view

Quality metric                        Estimate/Constraint
$(System)                             105/100
Executiontime(CaptureAudio)           100/110
Executiontime(GenerateAudio)          100/110
Executiontime(CaptureGenerateVideo)   100/110
Executiontime(CaptureAVCmd)           100/110
Area(ASIC1)                           16000/20000
Area(ASIC2)                           18000/20000
Pins(ASIC1)                           56/60
Pins(ASIC2)                           58/60
Instr(Processor1)                     6000/5000

[Figure: a third column, "Violation?", plots each metric graphically on an axis labeled from 0 to constraint; the plotted marks are not recoverable.]
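The comparison this view presents can be sketched in a few lines. The entries below are transcribed from the slide; the sketch assumes that every constraint is an upper bound, so a metric is in violation when its estimate exceeds its constraint. The names `parse_entry` and `violations` are hypothetical.

```python
# Estimate/constraint pairs as they appear on the slide.
design_view = {
    "$(System)": "105/100",
    "Executiontime(CaptureAudio)": "100/110",
    "Executiontime(GenerateAudio)": "100/110",
    "Executiontime(CaptureGenerateVideo)": "100/110",
    "Executiontime(CaptureAVCmd)": "100/110",
    "Area(ASIC1)": "16000/20000",
    "Area(ASIC2)": "18000/20000",
    "Pins(ASIC1)": "56/60",
    "Pins(ASIC2)": "58/60",
    "Instr(Processor1)": "6000/5000",
}


def parse_entry(text):
    """Split 'estimate/constraint' into two numbers."""
    estimate, constraint = text.split("/")
    return float(estimate), float(constraint)


def violations(view):
    """List metrics whose estimate exceeds the constraint (upper bounds assumed)."""
    violated = []
    for metric, entry in view.items():
        estimate, constraint = parse_entry(entry)
        if estimate > constraint:
            violated.append(metric)
    return violated


violated = violations(design_view)
print(violated)  # ['$(System)', 'Instr(Processor1)']
```

With these figures only the system dollar cost and the processor instruction count exceed their constraints.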


Summary

Three-step design methodology
Functionality specification
System design
Component implementation

Major tasks in system design
Allocation
Partitioning
Refinement

Generic synthesis tool

Conceptualization environment
Crucial to practical use


Future directions

Advanced estimation methods
Formal verification
Testability
Frameworks and databases
Regularity exploiting
System-level transformations
Feedback incorporation


References
[BHS91] F. Belina, D. Hogrefe, and A. Sarma. SDL with Applications from Protocol Specifications. Prentice Hall, 1991.
[BK87] G. Borriello and R.H. Katz. "Synthesis and optimization of interface transducer logic." In Proceedings of the International Conference on Computer-Aided Design, 1987.
[CS86] C. Tseng and D.P. Siewiorek. "Automated synthesis of datapaths in digital systems." IEEE Transactions on Computer-Aided Design, pages 379-395, July 1986.
[EHB94] R. Ernst, J. Henkel, and T. Benner. "Hardware-software cosynthesis for microcontrollers." In IEEE Design & Test of Computers, pages 64-75, December 1994.
[FM82] C.M. Fiduccia and R.M. Mattheyses. "A linear-time heuristic for improving network partitions." In Proceedings of the Design Automation Conference, 1982.
[GD90] R. Gupta and G. DeMicheli. "Partitioning of functional models of synchronous digital systems." In Proceedings of the International Conference on Computer-Aided Design, pages 216-219, 1990.
[GD92] R. Gupta and G. DeMicheli. "System-level synthesis using re-programmable components." In Proceedings of the European Conference on Design Automation (EDAC), pages 2-7, 1992.
[GD93] R. Gupta and G. DeMicheli. "Hardware-software cosynthesis for digital systems." In IEEE Design & Test of Computers, pages 29-41, October 1993.
[GVN94] D.D. Gajski, F. Vahid, and S. Narayan. "A system-design methodology: Executable-specification refinement." In Proceedings of the European Conference on Design Automation (EDAC), 1994.
[Hal93] Nicolas Halbwachs. Synchronous Programming of Reactive Systems. Kluwer Academic Publishers, 1993.
[Hoa78] C.A.R. Hoare. "Communicating sequential processes." Communications of the ACM, 21(8): 666-677, 1978.
[IEE88] IEEE Inc., N.Y. IEEE Standard VHDL Language Reference Manual, 1988.
[JMP88] R. Jain, M. Mlinar, and A. Parker. "Area-time model for synthesis of non-pipelined designs." In Proceedings of the International Conference on Computer-Aided Design, 1988.
[Joh67] S.C. Johnson. "Hierarchical clustering schemes." Psychometrika, pages 241-254, September 1967.
[KC91] Y.C. Kirkpatrick and C.K. Cheng. "Ratio cut partitioning for hierarchical designs." IEEE Transactions on Computer-Aided Design, 10(7): 911-921, 1991.
[KGV83] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. "Optimization by simulated annealing." Science, 220(4598): 671-680, 1983.
[KL70] B.W. Kernighan and S. Lin. "An efficient heuristic procedure for partitioning graphs." Bell System Technical Journal, February 1970.
[LT91] E.D. Lagnese and D.E. Thomas. "Architectural partitioning for system level synthesis of integrated circuits." IEEE Transactions on Computer-Aided Design, July 1991.
[MK90] M.C. McFarland and T.J. Kowalski. "Incorporating bottom-up design into hardware synthesis." IEEE Transactions on Computer-Aided Design, September 1990.
[NG92] S. Narayan and D.D. Gajski. "System clock estimation based on clock slack minimization." In Proceedings of the European Design Automation Conference (EuroDAC), 1992.
[NG94] S. Narayan and D.D. Gajski. "Synthesis of system-level bus interfaces." In Proceedings of the European Conference on Design Automation (EDAC), 1994.
[NVG92] S. Narayan, F. Vahid, and D.D. Gajski. "System specification with the SpecCharts language." In IEEE Design & Test of Computers, Dec. 1992.
[PK89] P.G. Paulin and J.P. Knight. "Algorithms for high-level synthesis." In IEEE Design & Test of Computers, Dec. 1989.
[PPM86] A.C. Parker, T. Pizzaro, and M. Mlinar. "MAHA: A program for datapath synthesis." In Proceedings of the Design Automation Conference, 1986.
[TM91] D.E. Thomas and P. Moorby. The Verilog Hardware Description Language. Kluwer Academic Publishers, 1991.
[VG92] F. Vahid and D.D. Gajski. "Specification partitioning for system design." In Proceedings of the Design Automation Conference, 1992.
[VGG93] F. Vahid, J. Gong, and D.D. Gajski. "A hardware-software partitioning algorithm for minimizing hardware." UC Irvine, Dept. of ICS, Technical Report 93-38, 1993.
