Professional Documents
Culture Documents
4, APRIL 2004
517
I. INTRODUCTION
Early works on buffer insertion are mostly focused on improving interconnect timing performance. The most influential
pioneer work is van Ginnekens dynamic programming algorithm [16] that achieves polynomial time-optimal solution on
a given Steiner tree under Elmore delay model [7]. In [13],
Lillis et al. extended van Ginnekens algorithm by using a
buffer library with inverting and noninverting buffers, while
also considering power consumptions.
The major weakness of the van Ginneken approach is that it
requires a fixed Steiner tree topology has to be provided in advance, which makes the final buffer solution quality dependent
Manuscript received May 30, 2003; revised October 16, 2003. This work was
supported in part by the Semiconductor Research Corporation under Contract
2003-TJ-1124. This paper was recommended by Guest Editor P. Groeneveld.
C. J. Alpert and S. T. Quay are with the IBM Corporation, Austin, TX 78758
USA.
G. Gandham is with the IBM Corporation, Hopewell Junction, NY 12533
USA.
M. Hrkic is with the Department of Computer Science, University of Illinois,
Chicago, IL 60607 USA.
J. Hu and C. N. Sze are with the Department of Electrical Engineering,
Texas A&M University, College Station, TX 77843 USA (e-mail: jianghu@
ee.tamu.edu).
Digital Object Identifier 10.1109/TCAD.2004.825864
518
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 4, APRIL 2004
porosity in the layout directly (as opposed to simply specifying a handful of allowable buffer insertion locations). The
approach begins with a timing-driven but porosity-ignorant
Steiner tree, then applies a plate-based adjustment guided
by length-based buffer insertion. After performing localized
blockage avoidance, the resulting tree is then passed to van
Ginnekens algorithm to obtain a porosity-aware buffered
Steiner tree. Physical synthesis experiments on real industrial
circuits confirms the effectiveness of this framework.
Fig. 1. (a) Arbitrary Steiner tree may force buffers being inserted at dense
regions. (b) A porosity-aware buffered Steiner tree will enable buffers solution
at sparse regions.
Fig. 2. (a) Arbitrary Steiner tree may force wires to pass through routing
congested regions. (b) A congestion-aware buffered Steiner tree may let wires
go through sparse regions.
with source
and sinks
Given a net
, load capacitance
and required arrival time
for
, tile graph
, and a buffer type ,
, in which
and
construct a Steiner tree
edges in span every node in , such that a buffer insertion
may be obtained with a minimal
solution that satisfies
porosity cost (wire density cost).
For porosity cost, one can sum the costs of the tiles that the
routes cross. We use the following function, though certainly
others can be used as well, depending on the application. For
is defined
each buffer inserted in a tile , the porosity cost
as
According to this porosity cost definition, any difference between low placement density (below 0.5) does not matter.
However, when density is high (greater than 0.8), the cost
penalty is much heavier. Thus, the more buffers inserted, the
higher the cost, though buffers are close to free in sparser
and
are
regions. The coefficients for terms with
selected in a way to make this cost function continuous. Note
that we do not recommend using an infinite penalty if inserting a buffer exceeds the density target. It may be better
to insert a buffer into an overdense region and move other
cells out of the region than to accept a significantly inferior
delay result. Ultimately, the right tradeoff between porosity
cost and timing depends on the criticality of the net and the
design characteristics.
519
Fig. 3. (a) Candidate solutions are generated from v and v and propagated to every tile, which is shaded, in the plate for v . Solution search is limited to the
bounding boxes indicated by the thickened dashed lines. (b) Solutions from v and every tile in the plate for v are propagated to the plate for v . (c) Solutions
from plate of v are propagated to the source and the thin solid lines indicate an alternative tree that may result from this process.
each tile to avoid blockages, thereby allowing more buffer insertion candidates. However, this stage does not disturb the tree
topology uncovered from Stage 2.
Finally, in Stage 4 the resulting tree topology is fixed for van
Ginneken-style buffer insertion [16]. Van Ginnekens algorithm
is a dynamic programming-based bottom-up approach. A set of
candidate solutions are propagated from leaf nodes toward the
source and the optimal solution is selected at the source.
Since we use known algorithms for Stages 1 and 4, the rest
of the discussion focuses on the other two stages.
B. Stage 2: Plate-Based Adjustments
The basic idea for the plate-based adjustment is to perform
a simplified simultaneous buffer insertion and local tree adjustment so that the Steiner nodes and wiring paths can be moved
to regions with greater porosity without significant disturbance
on the timing performance obtained in Stage 1.
The buffer insertion scheme employed here is similar to the
length based buffer insertion in [2]. However, we allow the
Steiner points and wiring paths to be adjusted. This approach
is also distinctive from a simultaneous buffer insertion and
Steiner tree construction approach [14] in which the Steiner
tree is built from scratch. The plate-based adjustment traverses
the given Steiner topology in a bottom-up fashion similar to
van Ginnekens algorithm. During this process, Steiner nodes
and wiring paths may be adjusted together with buffer insertion
to generate multiple candidate solutions.
The range of the moves are restrained by the plates defined
which is located in a tile
as follows. For a node
, a plate
for
is a set of tiles in the neighborhood
itself. During the plate-based adjustment,
of , including
we confine the location change for each Steiner node within its
corresponding plate. If is a sink or the source node, we let
. For example, when is a branch node, we set
to be the 3 3 array of tiles centered at . Of course, if
lies on the border of the entire tile graph,
may include
fewer tiles. Fig. 3(a) gives an example of the plate corresponding
520
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 4, APRIL 2004
to Steiner node . The plate indicates any of the possible location which the Steiner node may be moved to. This example
shows the Steiner node being moved up one tile and right one
tile.
The search for alternate wiring paths is limited to the
minimum bounding box covering the plates of two end nodes.
In Fig. 3, such bounding boxes are indicated by the thickened
dashed lines. Therefore, the size of plates define the search
range for both Steiner nodes and wiring paths. Certainly, a plate
may be defined to include more tiles or to even be a irregular
shape. For example, if the plate size is 1 1, our algorithm will
basically reduce to a length based/van Ginneken-style buffer
insertion algorithm along a fixed topology. However, one could
also choose the plate to be the entire tile graph, which yields
an algorithm similar to the S-Tree embedding [9]. By choosing
a plate of size larger than one but smaller than the entire grid
graph, we still obtain the ability to modify the topology to
move critical Steiner points into low-porosity regions while
also capping the runtime penalty. Hence, our experiments use
a 3 3 tile size to capture this tradeoff. An example of how a
new Steiner topology might be constructed from an existing
topology is demonstrated in Fig. 3(a) and (c).
Adopting a length-based buffer insertion scheme [2] makes
the adjustment simpler and thereby faster. Since this stage
only produces a Steiner tree before buffering, the purpose of
including buffering at this stage is for estimation purposes,
hence a simplified scheme should be adequate.
Alpert et al. [2] suggest that buffer insertion may be performed in a simple way by just following a rule of thumb: the
maximal interval between two neighboring buffers is no greater
than certain upper bound. The interval between neighboring
buffers is specified in term of number of tiles. In this work, this
constraint is modified to be the maximum load capacitance a
buffer/driver may drive, so that sink/buffer capacitance can be
incorporated. To keep the succinctness of the tile-based interval
metric in [2], we discretize the load capacitance in units equivalent to the capacitance of wire with average tile size. If the average tile boundary length is and the wire capacitance per unit
length is , then a load capacitance is expressed in the number
. This modification allows us to use nonuniform-sized
of
tiles, even though we still assume such nonuniformity is very
limited.
Also, for the sake of simplification, we assume a single
typical buffer type, the Elmore delay model for interconnect
and a switch-level RC gate delay model for this plate-based
adjustment.
Each intermediate buffer solution is characterized by a threein which is the root of the subtree, is the
tuple
discretized load capacitance seen from , and is the accumuis said to be domlated porosity cost. A solution
if
and
inated by another solution
. A set of buffer solutions
at node is a nondomidominated by annating set when there is no solution in
. Usually,
is arranged as an array
other solution in
in the ascending order of load
capacitance. The basic operations of the length-based buffer insertion are:
Fig. 4.
Core algorithm.
Fig. 6. (a) Path overlaps with a buffer blockage can be moved out of the
blockage with a small detour (b) with a small penalty coefficient on cost. Only
a greater penalty coefficient can allow a large detour, like from (c) to (d).
521
Fig. 7. Example of local blockage avoidance. Shaded rectangles represent buffer blockages. The Steiner tree (a) before blockage avoidance and (b) after blockage
avoidance.
.
dominated by another solution
: grow
to node by Ad
, insert buffer for
dWire to get solution
to obtain
. Add the unbuffered solution
and buffered solution into solution set
and prune
.
the solutions in
: merge solution set from left child
TABLE I
BENCHMARK CIRCUITS. THE SLACK UNIT IS PICOSECONDS
522
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 4, APRIL 2004
TABLE II
EXPERIMENTAL RESULTS WHEN ONLY POROSITY COST IS MINIMIZED. POROSITY COST IS DENOTED AS P
TABLE III
STATISTICS FOR THE POROSITY COST ONLY EXPERIMENTAL RESULTS. THE
SLACK IMPROVEMENT IS THE DIFFERENCE FROM 1-4 FLOW. POROSITY
COST IS NORMALIZED WITH RESPECT TO THE POROSITY COST FROM
1-4 FLOW. THE MIN-AVE-MAX IS THE MINIMUM, THE AVERAGE, AND
THE MAXIMUM VALUE AMONG ALL NETS
Fig. 8.
523
TABLE IV
EXPERIMENTAL RESULTS WHEN ONLY WIRE DENSITY COST
IS MINIMIZED. WIRE DENSITY IS DENOTED AS
TABLE V
STATISTICS FOR THE WIRE DENSITY COST ONLY EXPERIMENTAL RESULTS.
THE SLACK IMPROVEMENT IS THE DIFFERENCE FROM 1-4 FLOW. WIRE
DENSITY IS NORMALIZED WITH RESPECT TO THE WIRE DENSITY FROM
1-4 FLOW. THE MIN-AVE-MAX IS THE MINIMUM, THE AVERAGE,
AND THE MAXIMUM VALUE AMONG ALL NETS
The results for Set 1 are shown in Table II with its statistics in
Table III. The data in Table III show that the complete 12-34
flow can achieve 200 ps slack improvement over 14 flow and
120 ps improvement over 13-4 flow on average. The average
is an average value for all buffers inserted in
porosity cost
a net and
is the maximum porosity among all buffers. The
average porosity cost is reduced by 9% from 14 flow. Since
13-4 flow does not consider porosity, its rerouting in Stage 3
increases the average porosity cost by 12%. The effect is more
. The 12-34
significant for the maximum porosity cost
flow achieves 21% maximum porosity cost reduction while the
13-4 flow results in 68% greater cost. Therefore, Stage 2 is very
necessary on improving both timing and porosity cost. In order
to further illustrate the effect of our algorithm, the cost-slack
for different flows are shown in
tradeoff curves of net
Fig. 8. These curves come from the final candidate solutions
524
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 4, APRIL 2004
TABLE VI
EXPERIMENTAL RESULTS OF 1-2-3-4 FLOW WHEN BOTH POROSITY COST AND WIRE DENSITY COST ARE
CONSIDERED. POROSITY COST AND WIRE DENSITY ARE DENOTED AS AND , RESPECTIVELY
TABLE VII
STATISTICS FOR EXPERIMENTS WHEN BOTH POROSITY COST AND WIRE
DENSITY COST ARE CONSIDERED. THE SLACK IMPROVEMENT IS THE
DIFFERENCE FROM 1-4 FLOW. POROSITY COST AND WIRE DENSITY
ARE NORMALIZED WITH RESPECT TO RESULT OF 1-4 FLOW. THE
MIN-AVE-MAX IS THE MINIMUM, THE AVERAGE, AND THE
MAXIMUM VALUE AMONG ALL NETS
TABLE VIII
TOTAL RUNTIME FOR EACH STAGE IN SECONDS
Fig. 10. Steiner tree obtained through porosity-aware method on the same net
as in Fig. 9.
both timing and cost. The CPU time is concluded in the rightmost column of Table II.
B. Experiments Considering Wiring Congestion
at the root node in Van Ginnekens algorithm at Stage 4. Obviously, the solutions from our proposed 12-34 flow is better on
We also tested the effect of our algorithm on wiring congestion for the same set of test cases in Section IV-A. When wiring
congestion is considered, only the cost function in Stage 2 is
, where is the
changed by augmenting wire density cost
) and the maximum (
) wire
wire density. Average (
density over all tile boundaries the edges of the Steiner tree
crossing are reported in the experimental results.
Tables IV and V show results from experiments in which wire
density is considered but porosity cost is ignored. In these experiments, the weighting factors for porosity cost, wire density,
,
, and
, reand scaled wirelength are
spectively. The statistics in Table V show that the average wire
density, which is the average of the wire density among all the
tile boundaries the routing tree across, is improved by 30%. The
peak wire density is reduced by 19% as well.
The combination of porosity and wiring congestion is
tested and the results are listed in Tables VI and VII. In these
525
TABLE IX
PHYSICAL SYNTHESIS ON INDUSTRIAL DESIGNS
526
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 4, APRIL 2004
Milos Hrkic received the B.S. degree in computer science from the University
of Illinois, Chicago, where he is currently pursuing the Ph.D. degree in computer
science.
He was an Intern with IBM Austin Research Laboratory, Austin, TX, in 2001,
2002, and 2003. His research interests include combinatorial optimizations and
VLSI design automation, in particular interconnect synthesis and buffering.
Mr. Hrkic received an award at the International Conference on ComputerAided Design CADAthlon 2002 Programming contest.
Jiang Hu received the B.S. degree in optical engineering from Zhejiang University, Hangzhou, China,
in 1990, the M.S. degree in physics from the University of Minnesota, Duluth, in 1997, and the Ph.D.
degree in electrical engineering from the University
of Minnesota, Minneapolis, in 2001.
He was with IBM Microelectronics Division in
2001 and 2002. He is currently an Assistant Professor in the Department of Electrical Engineering,
Texas A&M University, College Station. His current
research interest is on VLSI design automation,
especially on interconnect optimization, clock network synthesis, design for
manufacturability, and uncertainty tolerant design.
Prof. Hu has served on the Technical Program Committee for the International
Conference on Computer Design, the International Sympoisum on Physical Design, and the International Symposium on Circuits and Systems. He received a
Best Paper Award at the Design Automation Conference in 2001.
Stephen T. Quay received the B.S. degree in electrical engineering and computer science from Washington University, St. Louis, MO, in 1983.
He is currently a Senior Engineer with the IBM Microelectronics Division,
Austin, TX. Since 1983, he has worked in many areas of chip layout and analysis
for IBM. He currently develops design automation applications for interconnect
extraction and performance optimization.