
Clocking Part 2

6.371 Fall 2002

11/6/02

L18 Clocks Part 2

Clocking
For modern processors, cycle time is around 16-20 FO4 delays, of which
registers take 2-4 FO4 delays.
Power consumption is dominated by clock load, both distribution network
and end loads (latches, prechargers):
70% of total power in IBM POWER4 design.

Simple single-edge triggered registers are fine for most ASIC designs.
This lecture we'll examine what is happening in high performance designs.


Edge Triggered Timing Constraints


[Figure: register clocked by CLK1 -> combinational logic (TPmin/TPmax) -> register clocked by CLK2]

Slow path timing constraint

$T_{cyc} \ge T_{CQmax} + T_{Pmax} + T_{setup} + T_{skew}$

worst case is when CLK2 is earlier than CLK1

Fast path timing constraint

$T_{CQmin} + T_{Pmin} \ge T_{hold} + T_{skew}$

worst case is when CLK2 is later than CLK1

Fast path constraint cannot be fixed by slowing clock: a violation is fatal to chip design.
Skew reduces the cycle time available for logic.
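
The two constraints above can be checked numerically. The following is a minimal sketch; the function name and the picosecond values are illustrative assumptions, not figures from the lecture. It returns the slack on each constraint, where a negative value means a violation.

```python
def edge_triggered_slack(t_cyc, t_cq_min, t_cq_max, t_p_min, t_p_max,
                         t_setup, t_hold, t_skew):
    """Return (slow_path_slack, fast_path_slack); negative means violation."""
    # Slow path: Tcyc >= TCQmax + TPmax + Tsetup + Tskew
    slow_path_slack = t_cyc - (t_cq_max + t_p_max + t_setup + t_skew)
    # Fast path: TCQmin + TPmin >= Thold + Tskew (independent of Tcyc)
    fast_path_slack = (t_cq_min + t_p_min) - (t_hold + t_skew)
    return slow_path_slack, fast_path_slack


# Hypothetical values in picoseconds.
params = dict(t_cq_min=40, t_cq_max=80, t_p_min=30, t_p_max=800,
              t_setup=50, t_hold=40, t_skew=50)

print(edge_triggered_slack(t_cyc=1000, **params))
print(edge_triggered_slack(t_cyc=2000, **params))
```

With these assumed numbers the fast path slack is negative at both cycle times, which illustrates why a hold violation cannot be fixed by slowing the clock.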

Two Phase Latch Based Design


[Figure: two-phase latch pipeline with Combinational Logic 1 and Combinational Logic 2 between latches clocked by CLK1 and CLK2; CLK1 and CLK2 waveforms showing the non-overlap times]

Divide cycle into two phases:
phase 2 latches can only sample values generated from phase 1 latch outputs, and vice versa.

Latches driven by two non-overlapping clocks.

Can guarantee no fast path problems with larger non-overlap.

Two Phase Timing


[Figure: two-phase timing diagram for Combinational Logic 1 (after the CLK1 latch) and Combinational Logic 2 (after the CLK2 latch), marking Tx, Ty, Tz, TNO, TDQmax, TP1max and TP2max]

In steady state, $T_z \le T_x$, therefore minimum cycle time

$T_{cyc} \ge T_{P1max} + T_{P2max} + 2T_{DQmax}$

Non-overlap time, $T_{NO}$, can be adjusted such that no hold time violations are possible:

$T_{NO} + T_{CQmin} - T_{skew} \ge T_{hold}$
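
As a rough illustration of the two relations above, the sketch below computes the minimum cycle time and the smallest non-overlap time that rules out hold violations. The function names and the picosecond values are assumptions made for the example.

```python
def two_phase_min_cycle(t_p1_max, t_p2_max, t_dq_max):
    # Tcyc >= TP1max + TP2max + 2*TDQmax
    return t_p1_max + t_p2_max + 2 * t_dq_max


def min_non_overlap(t_hold, t_cq_min, t_skew):
    # TNO + TCQmin - Tskew >= Thold  =>  TNO >= Thold + Tskew - TCQmin
    # A negative result means no non-overlap is needed, so clamp at zero.
    return max(0, t_hold + t_skew - t_cq_min)


# Hypothetical values in picoseconds.
print(two_phase_min_cycle(t_p1_max=400, t_p2_max=350, t_dq_max=60))
print(min_non_overlap(t_hold=40, t_cq_min=30, t_skew=50))
```

Note that the non-overlap requirement does not depend on any logic delay, which is why it can be met purely by adjusting the clocks.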


Time Borrowing
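[Figure: time borrowing timing diagram with signals A, C and D around Combinational Logic 1 and C.L. 2, latches clocked by CLK1 and CLK2, marking Tx, TNO, TCQmax, TP1max and Tsetup]

Can place latches where convenient in logic path.

Maximum time in one combinational logic block is

$T_{P1max} \le T_{cyc} - T_{CQmax} - T_{setup} - T_{NO} - T_{skew}$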
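
A small sketch of the bound above, with a hypothetical function name and assumed picosecond values, shows how much of the cycle a single logic block may occupy when it borrows time from its neighbour.

```python
def max_block_delay(t_cyc, t_cq_max, t_setup, t_no, t_skew):
    # TP1max <= Tcyc - TCQmax - Tsetup - TNO - Tskew
    return t_cyc - t_cq_max - t_setup - t_no - t_skew


# Hypothetical values in picoseconds.
limit = max_block_delay(t_cyc=1000, t_cq_max=80, t_setup=50, t_no=60, t_skew=50)
print(limit)
```

With these assumed values one block may use well over half the cycle, provided the other block is correspondingly shorter so that the two-phase cycle time constraint still holds.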


Single Clock Latch Based Design


[Figure: latch pipeline with Combinational Logic 1 and Combinational Logic 2, latches driven by CLK and locally inverted CLK]

Two phase non-overlapping system requires distribution of two clocks.


Can distribute single clock signal, and invert locally at latch.
Clock skew can cause overlap between transparent phases of CLK and
inverted CLK, so must check for fast path hold time violations.
Very common clocking scheme for full custom chips, works well with
pipelined domino logic.


Pipelining Domino Logic

Domino circuits require monotonic change in input signal during evaluation phase - cannot easily guarantee this with most edge triggered devices.
Transparent latches allow setup of logic inputs before clock edge.
[Figure: two domino gates (NMOS pull-down network, input X, latch output Q, CLK) with precharge/eval waveforms. In the first, the CLK-Q delay discharges the precharge node, leaving a degraded level. In the second, Q is set up before the clock edge.]

Pulse Latches
By using narrow clock pulses, can have only a single latch in any
combinational loop.
Used in Cray-1, and in many high-performance (Pentium-4) and
low-power microprocessors (XScale).
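[Figure: pulse latches around a Combinational Logic block, both clocked by CLK, and timing diagram for nodes A and B, marking Tw, Tsetup, Thold, TCQmin, TPmin and TPmax]

Cycle time, $T_{cyc,min} \ge T_{DQmax} + T_{Pmax} + T_{setup} + T_{skew} - T_w$

$T_w$ is pulse width, and gives maximum time borrowing for previous cycle.

Two-sided timing constraint on pulse width:

$T_{setup} < T_w < T_{CQmin} + T_{Pmin} - T_{hold} - T_{skew}$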
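
The pulse-width window and the cycle time bound can be evaluated together. This is a minimal sketch under assumed picosecond values; the function names are hypothetical.

```python
def pulse_width_window(t_setup, t_cq_min, t_p_min, t_hold, t_skew):
    """Return (lower, upper) bounds on Tw; a valid Tw needs lower < Tw < upper."""
    lower = t_setup
    upper = t_cq_min + t_p_min - t_hold - t_skew
    return lower, upper


def pulse_latch_min_cycle(t_dq_max, t_p_max, t_setup, t_skew, t_w):
    # Tcyc,min >= TDQmax + TPmax + Tsetup + Tskew - Tw
    return t_dq_max + t_p_max + t_setup + t_skew - t_w


# Hypothetical values in picoseconds.
lower, upper = pulse_width_window(t_setup=30, t_cq_min=40, t_p_min=120,
                                  t_hold=20, t_skew=30)
t_w = 80  # chosen inside the window
assert lower < t_w < upper
print(lower, upper)
print(pulse_latch_min_cycle(t_dq_max=70, t_p_max=800, t_setup=30,
                            t_skew=30, t_w=t_w))
```

A wider pulse shortens the cycle time bound but moves toward the upper limit, which is set by the fastest path; if the window is empty, the short paths have to be padded with delay.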

Double-Edge Triggered Registers


Clock load of flip-flops is a significant fraction of total chip
power. Can cut clock frequency in half by using a
double-edge triggered flip-flop.

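[Figure: double-edge triggered flip-flop with latches A and B clocked by CLK and output Q; timing diagram in which A and B alternate between Sample and Latch on each clock phase]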


Pentium-4 Pulse Latches


Pentium-4 distributes 50% duty cycle global clock at
advertised frequency (e.g., 2.8GHz Pentium-4 has 2.8GHz
clock)
Fast ALU section of Pentium-4 runs at twice advertised
clock frequency using pulse latches driven from both edges
of the distributed clock. Clock buffers have duty cycle
correction circuitry to ensure 50% duty cycle.
[Figure: GCLK (2.8 GHz) and PCLK (5.6 GHz) waveforms]

Flip-Flops Timing

[ Stojanovic and Oklobdzija ]



Crossing Time Domains


Common to have to communicate between logic
blocks running at unrelated clock frequencies
[Figure: TCLK clock domain sending data to RCLK clock domain; TCLK and RCLK waveforms showing a possible setup time violation and a possible hold time violation]

If setup or hold times are violated, flip-flops might hang in a metastable state.

Metastability
[Figure: sampling latch with feedback loop clocked by CLK; plot of voltage vs. time showing a metastable output and the observation interval, t]

Probability of failure (i.e., not valid 1 or 0) when observed time t after clock edge:

$F(t) = k e^{-t/\tau_r}$

Parameters $k$ and $\tau_r$ are functions of latch design. $\tau_r$ is called the time constant of resolution and is primarily controlled by the gain-bandwidth product of the feedback loop (don't use dynamic latches as synchronizers!). Error probability decreases exponentially with t, but there is always some chance of failure.

Metastability Failure Calculations


$f_f = t_W f_T f_C e^{-t/\tau_r}$

Frequency of failures for sampling window (setup+hold) of $t_W$, sampling frequency $f_C$ and input transition frequency $f_T$.

For 1 GHz sampling clock, 100 MHz transitions, 50 ps setup+hold, 50 ps time constant, 950 ps observation time:
$f_f = 0.03$ Hz (Mean Time Between Failures: 33 seconds)
Increase observation time to 1950 ps (two cycles):
$f_f = 5.8 \times 10^{-11}$ Hz (MTBF 550 years)
Increase observation time to 2950 ps (three cycles):
$f_f = 1.2 \times 10^{-19}$ Hz (MTBF 266 billion years)
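
The figures above can be reproduced with a few lines of code. This is a minimal sketch of the failure-rate formula using the slide's parameters; the function and variable names are illustrative, and a year is assumed to be 365.25 days.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def failure_rate(t_w, f_t, f_c, t_obs, tau_r):
    """Failure frequency in Hz: f_f = t_W * f_T * f_C * exp(-t / tau_r)."""
    return t_w * f_t * f_c * math.exp(-t_obs / tau_r)


# Parameters from the slide: 50 ps window, 100 MHz transitions,
# 1 GHz sampling clock, 50 ps time constant.
for t_obs in (950e-12, 1950e-12, 2950e-12):
    ff = failure_rate(t_w=50e-12, f_t=100e6, f_c=1e9, t_obs=t_obs, tau_r=50e-12)
    mtbf_s = 1.0 / ff
    print(f"t = {t_obs * 1e12:.0f} ps: ff = {ff:.2e} Hz, "
          f"MTBF = {mtbf_s:.3g} s = {mtbf_s / SECONDS_PER_YEAR:.3g} years")
```

The results agree with the slide to its stated precision. The 33 second figure comes from inverting the rounded 0.03 Hz; the unrounded rate gives an MTBF closer to 36 seconds.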


Synchronizers
Use pipelined registers to give full RCLK cycle to resolve asynchronous input.

[Figure: data from the TCLK domain passing through pipelined registers clocked by RCLK]

Use N interleaved registers, each clocked at 1/N of RCLK rate, to increase resolution interval by factor of N without decreasing signal bandwidth.

[Figure: TCLK-domain input sampled by interleaved registers clocked by CLKA, CLKB and CLKC with a rotating select; waveforms of RCLK, CLKA, CLKB and CLKC marking the observation interval and repeat interval]
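
One way to size a synchronizer is to invert the failure-rate formula and ask how many RCLK cycles of resolution time a target MTBF needs. The sketch below does this under a stated assumption: N cycles give an observation interval of N times the RCLK period minus the sampling window, which matches the 950 ps, 1950 ps and 2950 ps values used in the failure calculations. The function name and the MTBF targets are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def min_resolution_cycles(target_mtbf_s, t_w, f_t, f_c, tau_r, t_rclk):
    """Smallest N whose observation interval meets the target MTBF."""
    # MTBF = 1 / f_f >= target  =>  t_obs >= tau_r * ln(t_W * f_T * f_C * target)
    t_obs_needed = tau_r * math.log(t_w * f_t * f_c * target_mtbf_s)
    # Assumption: N cycles of RCLK give t_obs = N * t_rclk - t_w.
    n = math.ceil((t_obs_needed + t_w) / t_rclk)
    return max(1, n)


# Parameters from the failure calculations: 50 ps window, 100 MHz transitions,
# 1 GHz RCLK, 50 ps time constant.
common = dict(t_w=50e-12, f_t=100e6, f_c=1e9, tau_r=50e-12, t_rclk=1e-9)
for years in (100, 1e9):
    n = min_resolution_cycles(years * SECONDS_PER_YEAR, **common)
    print(f"target MTBF {years:g} years -> N = {n}")
```

Because the failure rate falls exponentially with observation time, each extra cycle buys many orders of magnitude of MTBF, so small values of N are usually enough.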
