
Timing analysis with compact variation-aware standard cell models

Seyed-Abdollah Aftabjahani, Linda Milor

Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Article info

Article history: Received 29 January 2008; received in revised form 3 November 2008; accepted 11 November 2008.

Keywords: Standard cell characterization; Process variation; Within-die variation; Timing analysis

Abstract
A compact variation-aware timing model for a standard cell in a cell library is developed. The cell model
incorporates variations in the input waveform and loading, process parameters, and the environment
into the cell timing model. The cell model operates on full waveforms, which are modeled using
principal component analysis (PCA). PCA enables the construction of a compact model of a set of
waveforms impacted by variations in loading, process parameters, and the environment. Cell
characterization involves describing with equations how waveforms are transformed by a cell as a
function of the input waveforms, process parameters, and the environment. The models have been
evaluated by calculating the delay of paths. The results demonstrate improved accuracy in comparison
with table-based static timing analysis at comparable computational cost. The complexity of the models as
a function of the number of parameters modeling variation is also discussed, showing reduced memory
requirements, relative to the tabular approach, as the number of parameters describing variations increases.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
Circuit timing analysis is needed to ascertain if a design meets
timing requirements before manufacturing. The standard ap-
proach to estimate circuit timing is through static timing analysis
(STA). STA involves converting a circuit into a timing graph, where
each edge represents the delay of a gate between its inputs and
outputs. STA then performs a graph traversal to find the longest
path, based on a project planning technique, called the Critical
Path Method [1].
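For illustration, a minimal sketch of this longest-path computation over a topologically ordered timing graph is shown below; the node names, graph, and delays are hypothetical, and edge delays are taken as fixed numbers rather than the slew-dependent table lookups discussed next.

    # Minimal sketch: longest-path (Critical Path Method) arrival times over a timing DAG.
    # Edge weights are fixed gate delays; node names and values are illustrative only.
    from collections import defaultdict

    def longest_arrival_times(edges, sources):
        # edges: list of (u, v, delay); sources: primary inputs with arrival time 0
        graph = defaultdict(list)
        indeg = defaultdict(int)
        nodes = set()
        for u, v, d in edges:
            graph[u].append((v, d))
            indeg[v] += 1
            nodes.update((u, v))
        arrival = {n: float("-inf") for n in nodes}
        for s in sources:
            arrival[s] = 0.0
        ready = [n for n in nodes if indeg[n] == 0]       # topological traversal
        while ready:
            u = ready.pop()
            for v, d in graph[u]:
                arrival[v] = max(arrival[v], arrival[u] + d)   # CPM "max" at each fan-in
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
        return arrival

    # Example: two converging paths; the longer one sets the arrival time at "out".
    print(longest_arrival_times([("in", "g1", 0.12), ("g1", "out", 0.20),
                                 ("in", "g2", 0.05), ("g2", "out", 0.35)], ["in"]))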
The delay through gates is a function of the slope of the input
signals. Hence, the traditional approach to accounting for the
input slope is to characterize cell delay through tables, which pre-
compute delay and output slew as a function of input slew for
each gate in a standard cell library. In order to account for slew,
STA requires an additional step, a preliminary backwards traversal
through the timing graph to determine the relationship between
slew and delay to the output for each node in the network [2].
Circuit timing is increasingly impacted by variation due to the
manufacturing process and the operating environment. The
standard approach to account for variation is through worst-case
analysis [3]. Worst-case analysis assumes that parameters are
constant within a chip, but vary between chips. Designers ensure
that their design satisfies specifications for all process corners by
simulating the circuit with a small set of corner models that
represent process extremes. The corner models consist of tables
relating delay and output slew to input slew and loading for these
process extremes.
Circuit timing has, however, become increasingly susceptible
to within-die variation due to both the manufacturing process and
the operating environment. Hence, it has become imperative to
take into account these variations in device and interconnect
characteristics during design. Worst-case design does not take
into account within-die variation.
To account for within-die variation, we need to perform
statistical static timing analysis (SSTA) at corners that define
die-to-die variation [4-12]. SSTA can determine the variation in
critical path delays as a function of random and systematic
variation within and between paths. SSTA resembles STA, except
gates are characterized by delay distributions. The gate delay and
arrival time distributions result in distributions of output delays,
and correlations among these delays. Graph traversal involves
applying statistical sum to arrival time distributions and the delay
distribution for each gate, and statistical maximum operations to
the resulting gate delay distributions.
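As a rough illustration of these two operations, the sketch below draws correlated gate-delay samples and applies the statistical sum along each of two paths and the statistical maximum where the paths converge; the distributions, correlations, and numbers are purely illustrative and are not taken from the paper.

    # Minimal Monte Carlo illustration of the "statistical sum" (along a path) and
    # "statistical max" (across converging paths) operations used in SSTA.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    # Four correlated gate-delay samples (ns), two gates per path; values illustrative.
    g = rng.multivariate_normal(mean=[0.10, 0.12, 0.11, 0.09],
                                cov=0.0001 * (np.eye(4) + 0.5), size=n)
    path_a = g[:, 0] + g[:, 1]            # statistical sum along path A
    path_b = g[:, 2] + g[:, 3]            # statistical sum along path B
    arrival = np.maximum(path_a, path_b)  # statistical max at the merge point
    print(arrival.mean(), np.percentile(arrival, 99.7))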
Clearly, for SSTA we need compact models of standard cells
that are accurate over parameter and environmental variations,
not just at process extremes, as in worst-case design. Our
proposed models can be used to generate the delay distribution
functions, which can account for spatial correlations, as needed,
using methods as in [6-12]. Our models can also be used directly
in Monte-Carlo-based SSTA, which involves path enumeration,
Monte Carlo analysis of critical paths, and the statistical
maximum operation on the resulting path delays, as described
in [8,13-17].
The goal of this work is to develop a methodology to construct
compact variation-aware timing models for standard cells in a cell
library, which are accurate over process and environmental
variations. The model also utilizes compact models of waveforms.
This paper will show that these compact waveform models, when
used for static timing analysis, are more accurate than the well-
known tabular method [18] and comparable in terms of
computational cost.
The compact waveform models are constructed via PCA [19] of
waveforms, where the waveforms are described by principal
component scores (PCSs), which can reconstruct the waveforms.
Moreover, since the principal component basis functions are
shared among all waveforms, cell library characterization requires
that we only store the equations that describe the transformations
of the principal component scores as the waveform passes
through the cell. The equations also describe changes in cell
performance as a function of variations in the process and
operating environment.
This method differs from traditional static timing analysis
(a) by working with waveforms with realistic shapes,
(b) by storing the waveform transformation through a cell as an
equation rather than a table, and
(c) by including equations that describe any changes in cell
performance as a function of variations in the process and
operating environment.
This is not the first attempt to accurately model waveforms for
timing analysis. Recent work has considered accurate modeling of
waveform propagation through standard cells. In [20], it is shown
that realistic waveforms do not resemble the idealized ramp, and
in [21] it is shown that realistic waveform modeling results in
more accurate timing analysis. Examples of waveform modeling
include [22], where a Weibull shape parameter is added to
waveform characterization to account for the differences between
real waveforms and their approximation by a ramp. Other work
has aimed to model realistic waveforms with a set of basis
functions [23-26]. The basis functions have been selected in a
variety of ways, including an error minimization heuristic,
involving shifting and scaling of waveforms [23,24], PCA [25],
and singular value decomposition (SVD) [26]. All prior work has
shown that a few basis functions can be used to approximate
realistic waveforms.
Like [24,27], the proposed work considers the impact of
process and environmental variations on waveforms. In the
proposed work, the basis functions are derived by PCA. Hence,
the proposed approach extends prior work in [25,26] by including
in PCA waveform model construction for large variations from
parameters related to the process and the environment. This work
formalizes, generalizes, and specifies restrictions for the approach,
and proposes methods to make the waveform models practical.
The cell models differ from prior work on modeling cells as
equations [10,11,28-30] since the cell models operate on para-
meters that describe waveforms, not just process parameters,
waveform slew, and environmental parameters. The parameters
are not required to be independent, and the compact model
consists of multivariate polynomials with a minimum number of
terms, which are selected based on analysis of variance and
accuracy.
Since cells operate on waveforms in the PCA domain, several
new problems arise. First, we need to determine the set of PCSs
that correspond to realistic waveforms, i.e. PCSs that can be
transformed back to the time domain. Second, we need common
principal component basis functions for both the inputs and
outputs of cells. This is because PCA is a data-driven methodology:
each set of input waveforms and each standard cell can generate a
unique set of principal component basis functions describing the
output waveforms. Hence, some additional steps are needed to come
up with a common set of basis functions for all inputs and cells.
Additionally, for our model involving PCA waveform modeling
and cell characterization with equations, we show that unlike the
tabular static timing analysis method, where memory usage
increases exponentially as a function of accuracy in the dis-
cretization of parameters that characterize the input and output
waveforms (slope and fanout), our proposed method is typically
quadratic in memory usage as a function of the parameters
describing the waveforms, process, and environmental variations.
Finally, we apply the PCA model to static timing analysis and
examine the accuracy of delay calculations for long chains of
gates.
This paper is organized as follows. Section 2 describes the
experimental platform and the parameters modeling variability
for cells and waveforms. Sections 3 and 4 discuss waveform model
construction and accuracy analysis, respectively. Section 5
describes cell model construction and evaluates accuracy of
delays of paths in comparison with Hspice [31] and tabular
static timing analysis. Memory usage and computational com-
plexity are summarized in Section 6, followed by a conclusion in
Section 7.
2. The experimental platform and model of variation
Traditionally, input waveforms are represented by delay-slope
pairs. In this work, the slope is replaced by a set of PCSs. The
number of PCSs determines the accuracy of the model. In one
extreme, if all the scores are used, the model can reconstruct the
exact waveform.
An inverter, designed and laid out with TSMC 180 nm
technology, was used to develop the methodology. This technology
was the most advanced one available for our CAD tools. After DRC
(design rule check) and LVS (layout versus schematic), parasitics
were included in the model through parasitic extraction [32].
Advanced features of Hspice automated the large number of
simulation runs, which included generating input waveforms based
on a model and capturing the data points of the output waveforms
at predetermined relative voltage intervals. The dataset was
imported and manipulated using Matlab [33] to construct the
two-level full factorial model [34] for each output parameter.
The significant effects were determined to form the compact models.
Timing characteristics of standard cells are primarily a function
of loading capacitance (fanout), the input waveform, variations of
device parameters, i.e., the channel lengths and the threshold
voltages of transistors, and the environment, i.e., the power supply
voltage and temperature. The ranges of parameters in the model
are listed in Table 1. These parameters include the fanout,
parameters that describe the input waveform (either slope or
principal components, [PC1,PC2] or [L,Θ], described in Section 3),
the gate length and threshold voltage of the NMOS and PMOS
transistors, temperature, and supply voltage.
The ranges for process parameters were chosen to be small
relative to realistic die-to-die process parameter variations, which
are on the order of ±30%. This is because die-to-die variation is
effectively handled with corner models, and the focus of this work
is to supplement these models with variation-aware compact
models at each corner that can account for within-die variation,
whose range is smaller than die-to-die variation.
A set of models describes the stage delay and output waveform
shape, characterized by its principal components ([PC1,PC2] or
[L,Θ]), as a function of all parameters in Table 1. The models were
designed to be valid over a wide range of variation by using a full
factorial experimental design covering all extreme corners of the
experimental space.
3. Construction of the waveform model
In order to develop the waveform models, a dataset of 256
falling and 256 rising waveforms was generated by running a two-
level full factorial experiment varying the parameters in Table 1,
i.e. the fanout, the parameters that describe the input waveform
(either slope, during the first iteration, or principal components,
[PC1,PC2] or [L,Θ], for the other iterations), the gate length and
threshold voltage of the NMOS and PMOS transistors, temperature,
and supply voltage. The datasets for rising and falling waveforms
were merged by converting falling transitions to rising ones,
subtracting their voltages from the maximum voltage. A set of
waveforms is shown in Fig. 1.
The resulting 512 timing waveforms were discretized, by
partitioning the voltage scale into equal intervals to form 19
voltage and time point pairs. An analysis of the impact of
discretization on accuracy is summarized in the Appendix.
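A minimal sketch of this discretization step is given below; it interpolates the times at which a monotonic rising transition crosses equally spaced relative voltage levels. The exact level placement (5-95% of Vdd here) and the RC-shaped test waveform are assumptions for illustration; in the paper these points are captured directly from the Hspice output.

    # Sketch: discretize a rising transition into 19 (voltage, time) pairs at
    # equally spaced relative voltage levels (level placement is an assumption).
    import numpy as np

    def discretize_rising(t, v, vdd, n_levels=19):
        levels = np.linspace(0.05, 0.95, n_levels) * vdd   # assumed 5%-95% window
        # Interpolate time as a function of voltage; v must be monotonically increasing.
        return levels, np.interp(levels, v, t)

    # Illustrative RC-like rising edge.
    t = np.linspace(0.0, 5e-9, 500)
    v = 1.8 * (1.0 - np.exp(-t / 0.8e-9))
    levels, times = discretize_rising(t, v, vdd=1.8)
    print(times[:3])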
Analysis of the inverter waveforms revealed that two PCSs
cover 99.8% of the variation for both rising and falling transitions.
Hence, only two PCSs (PC1 and PC2) are used; they serve as weights
for the two principal component basis functions whose linear
combination reconstructs the time domain transition waveform. Moreover, each transition
maps to a single point in the two-dimensional PCA domain.
The points in the PCA domain that correspond to the waveforms in
Fig. 1 are shown in Fig. 2. It can be seen that the groups of
waveforms in the time domain map to clusters of points in the
PCA domain.
Mapping between the time domain and the PCA domain can be
represented by a pair of transformations. If the data are not
standardized, the transformation equations are the following:

PCS = PCM (T - U)   (1)

T = U + PCMI · PCS   (2)

where PCS is a 19-element vector of scores, T is a 19-element
vector of time points describing the waveform, U is a 19-element
vector that is the average of all Ts in the dataset, PCM is the PCA
model transformation matrix from the time domain to the PCA
domain, and PCMI is the inverse of PCM. For a 19-element vector,
PCM and PCMI are 19×19 matrices. PCM is found by computing the
eigenvectors of the 19×19 covariance matrix of the dataset. The
rows of PCM are the normalized eigenvectors of this covariance matrix.
Based on (1), for a 19-element vector, there are 19 mapping
functions (3); each maps the 19 time points describing a
waveform to a point in the 19-dimensional PCA space:

pc1 = pcm(1,1)(t1 - u1) + pcm(1,2)(t2 - u2) + ... + pcm(1,19)(t19 - u19)
pc2 = pcm(2,1)(t1 - u1) + pcm(2,2)(t2 - u2) + ... + pcm(2,19)(t19 - u19)
...
pc19 = pcm(19,1)(t1 - u1) + pcm(19,2)(t2 - u2) + ... + pcm(19,19)(t19 - u19)   (3)

The elements of the PCM matrix are the coefficients of these linear
transformation equations.
If the data are standardized, Eqs. (1) and (2) are replaced by
Eqs. (4) and (5):

PCS = PCM (T - U) D^(-1)   (4)

T = U + PCMI · PCS · D   (5)

where D is a diagonal matrix of the standard deviations associated
with each of the 19 elements of the dataset.
The significant PCSs are found by examining the eigenvalues of
the covariance matrix. Small eigenvalues correspond to insignificant
PCSs. Dimensional reduction is achieved by setting the coefficients
of PCM that correspond to the eigenvectors associated with
insignificant PCSs to zero. It is worth mentioning that the sum of
the eigenvalues corresponding to the eigenvectors selected for the
model determines the variance coverage.
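The construction can be sketched as follows, under the assumption that numpy's eigendecomposition stands in for whatever PCA routine was actually used; the synthetic monotone "waveforms" are illustrative only, so the reported variance coverage will not match the paper's 99.8% figure.

    # Sketch of Eqs. (1)-(6): build the PCA model from a dataset of discretized
    # waveforms (rows = waveforms, columns = 19 time points), keep the two
    # dominant components, and map waveforms to/from the PCA domain.
    import numpy as np

    rng = np.random.default_rng(1)
    T = np.cumsum(rng.uniform(0.05, 0.5, size=(512, 19)), axis=1)  # monotone "waveforms"

    U = T.mean(axis=0)                          # U: average waveform
    C = np.cov(T - U, rowvar=False)             # 19 x 19 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    PCM = eigvecs[:, order].T                   # rows = normalized eigenvectors
    PCMI = PCM.T                                # PCM is orthogonal, inverse = transpose

    coverage = eigvals[order][:2].sum() / eigvals.sum()
    print("variance covered by two PCs:", coverage)

    # Forward map (Eq. (1)) and rank-2 reconstruction (Eq. (6)).
    PCS = (T - U) @ PCM.T                       # scores for all waveforms
    T_hat = U + PCS[:, :2] @ PCM[:2, :]         # keep only PC1 and PC2
    print("max reconstruction error:", np.abs(T - T_hat).max())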
Table 1
Model parameters.

Variable                       Variation
Lp                             0-5%
Ln                             0-5%
ΔVtp                           -5% to 5%
ΔVtn                           -5% to 5%
ΔT                             0-70 °C
Slope / [PC1, PC2] / [L, Θ]    0.4-4 ns (for slope); dataset range (otherwise)
Fanout                         1-64
Vdd                            -10% to 10%
Fig. 1. The dataset of time domain rising and falling waveforms generated using a
full factorial experimental design.
Fig. 2. The waveforms corresponding to rising and falling transitions transformed
to the PCA domain.
The inverse of the PCM matrix, PCMI, is used to reconstruct
waveforms. PCMI is the transpose of PCM. The significant PCSs
weight the waveforms stored in PCMI to generate time domain
transition waveforms, as follows, if the data are not standardized:

t1 = u1 + pcmi(1,1) pc1 + pcmi(1,2) pc2
t2 = u2 + pcmi(2,1) pc1 + pcmi(2,2) pc2
...
t19 = u19 + pcmi(19,1) pc1 + pcmi(19,2) pc2   (6)
Not all points in the PCA domain map to valid transition
waveforms. A valid transition requires that the waveform does not
move backwards in time. Accordingly, it is required that

t19 > t18 > ... > t1   (7)

This creates an acceptability region restriction on the PCA space,
which is obtained by substituting Eq. (2) or (5) into (7) to create
18 linear relationships, as follows for the case with non-
standardized data:

u1 + pcmi(1,1) pc1 + pcmi(1,2) pc2 < u2 + pcmi(2,1) pc1 + pcmi(2,2) pc2
u2 + pcmi(2,1) pc1 + pcmi(2,2) pc2 < u3 + pcmi(3,1) pc1 + pcmi(3,2) pc2
...   (8)

The acceptability region is also restricted by the maximum and
minimum of the PCSs from the dataset. Linear programming is
used to find the acceptability region. The resulting acceptability
region is shown in Fig. 3(a).
Fig. 3(a) also contains some points, marked A-D, in the PCA
domain. They correspond to the waveforms in the time domain in
Fig. 3(b). Waveforms A and B in Fig. 3(b) are not valid waveforms
because they contain segments where time moves backwards.
They correspond to points A and B in Fig. 3(a), which are outside
of the acceptability region. Waveforms C and D in Fig. 3(b) are
monotonic and valid. They are inside the acceptability region
illustrated in Fig. 3(a).

Fig. 3. (a) The acceptability region in the PCA domain, together with some points,
labeled A-D, corresponding to corners of the PCA domain. (b) Time domain
waveforms corresponding to the corner points A-D in (a).
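A small sketch of this validity test follows, reusing U and PCM from the earlier PCA sketch; the linear-programming step that bounds the region is only indicated in a comment.

    # Sketch of the Eq. (7) validity test: a PCA-domain point (pc1, pc2) is
    # acceptable only if its reconstructed waveform is strictly increasing in time.
    import numpy as np

    def is_valid(pc1, pc2, U, PCM):
        t = U + np.array([pc1, pc2]) @ PCM[:2, :]    # Eq. (6) reconstruction
        return bool(np.all(np.diff(t) > 0))          # t19 > t18 > ... > t1

    # The 18 inequalities of Eq. (8) are linear in (pc1, pc2), so the acceptability
    # region is a convex polygon; a linear-programming routine such as
    # scipy.optimize.linprog can be used to locate its extreme points, as in the paper.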
Some of the original data points in Fig. 2 lie outside of the
acceptability region, as can be seen in Fig. 4. This is because of
dimensional reduction. These waveforms can be reconstructed by
augmenting the original dataset with waveforms containing
negative time points, obtained by reflecting the transitions across
the voltage axis. The addition of these waveforms with negative
time points for model construction widens the acceptability region.
It does not invalidate the model because the PCA model uses only
the positive time points. (This is similar to what is done in Fourier
analysis, where negative frequencies are used to help construct a
model.) Additionally, the PCA model generated from the resulting
dataset has the property that U = 0. As a result, the line segments
bounding the acceptability region determined by Eq. (7) always
pass through the origin.

Fig. 4. Data points corresponding to the waveforms in Fig. 1 and the acceptability
region.
Initial analysis modeled the input waveforms with a slope.
However, it is desirable to determine a set of universal PCA basis
functions for both input and output transitions to avoid extra
mapping steps. To do this, the corners of the PCA space that define
the extreme waveforms must be determined for two-level
factorial analysis. But, as can be seen from Fig. 3, two of the PCSs
that correspond to corners of the PCA space lie outside of the
acceptability region and correspond to invalid waveforms.

This problem was tackled by using a polar coordinate system,
instead of a Cartesian coordinate system, for defining the corners
of the PCA space for full factorial experimental design. (This
coordinate conversion assumes only two significant PCSs. If more
than two PCSs are significant, pairs of PCSs can be converted to
polar coordinates with the same transformation.) In order to
map PC1 and PC2 to polar coordinates, one finds the magnitude
(L) and angle (Θ) of a vector from the origin, as follows:

L = (PC1·PC1 + PC2·PC2)^0.5
Θ = arctan(PC2/PC1)   (9)

The acceptability region in the polar domain is then determined
to guarantee valid waveforms, and a rectangle of maximum size is
fit into the acceptability region to define the corners for full
factorial experimental design, denoted as L(min), L(max), Θ(min),
and Θ(max).
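A minimal sketch of the conversion of Eq. (9) and its inverse is given below; it assumes only two significant PCSs and uses the quadrant-aware arctan2 in place of the plain arctangent.

    # Sketch of Eq. (9): map (PC1, PC2) to polar coordinates (L, theta) and back.
    import numpy as np

    def to_polar(pc1, pc2):
        L = np.hypot(pc1, pc2)           # (PC1^2 + PC2^2)^0.5
        theta = np.arctan2(pc2, pc1)     # quadrant-aware form of arctan(PC2/PC1)
        return L, theta

    def to_cartesian(L, theta):
        return L * np.cos(theta), L * np.sin(theta)

    L, theta = to_polar(0.3, 0.4)
    print(L, theta, to_cartesian(L, theta))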
A common set of principal components for the input and
output waveforms of a cell is generated by running the following
iterations:
(a) find the principal components of the output waveforms;
(b) determine the acceptability region in the PCA space in terms
of polar coordinates;
(c) fit a rectangle into the acceptability region to find the corners
for full factorial experimental design;
(d) generate the waveforms corresponding to these corners and
apply these waveforms as inputs to the cell;
(e) simulate the cell to determine the corresponding output
waveforms;
(f) find the principal components of the output waveforms; and
(g) go to (a).
If the input waveform principal components match the output
waveform principal components, then the principal components
have converged to a set of waveforms appropriate for both the
input and output of the cell.
Convergence is only possible by restricting the time window
for valid PCs, because a slow-rising input will create an output
with a slower transition. In our example, we have restricted the
time window to be from 0.4 to 8 ns. (This window size impacts the
acceptability region: a larger window creates a larger acceptability
region, but reduces model accuracy and slows convergence.) This
time restriction imposes an additional limit on the acceptability
region, illustrated by the diagonal line in Fig. 5. With this limit,
convergence was achieved in two iterations.
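The fixed-point nature of steps (a)-(g) can be sketched as below; simulate_cell, corner_waveforms, and build_pca are hypothetical stand-ins for the Hspice runs, the polar-domain rectangle fit, and the PCA construction described above, and the convergence test on the leading basis functions is a simplification of the comparison shown in Fig. 6.

    # Hedged sketch of the iteration (a)-(g). The three callables are hypothetical
    # placeholders: simulate_cell runs the cell on one input waveform,
    # corner_waveforms builds the extreme corner inputs from the current PCA model,
    # and build_pca returns (PCM, U) for a set of output waveforms.
    import numpy as np

    def common_basis_iteration(simulate_cell, corner_waveforms, build_pca,
                               inputs, max_iter=10, tol=1e-3):
        PCM_prev = None
        for _ in range(max_iter):
            outputs = [simulate_cell(w) for w in inputs]      # steps (d)-(e)
            PCM, U = build_pca(outputs)                       # steps (a), (f)
            if PCM_prev is not None and np.max(np.abs(PCM[:2] - PCM_prev[:2])) < tol:
                return PCM, U                                 # input/output bases agree
            PCM_prev = PCM
            inputs = corner_waveforms(PCM, U)                 # steps (b)-(c)
        return PCM, U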
The resulting principal components are shown in Fig. 6.
Principal component basis functions related to the original 512
waveforms are labeled as GN (0th iteration), where the input was
a ramp. The following iterations are designated as IT(i) where i
is the iteration number. For these iterations, the input had a
realistic shape and was defined by the extremes of the PCA space
in Fig. 5. Principal component basis functions from the two
iterations using realistic input waveform shapes were almost
indistinguishable, and hence the model has converged.
4. Comparison of PCA methods for waveform modeling
PCA waveform models can be constructed in a variety of ways,
including (a) the symmetric non-standardized model (SNM),
obtained from a dataset formed by augmenting the original
dataset with waveforms with negative time points, (b) the
symmetric standardized model (SSM), obtained like the SNM
method, but with a standardized dataset (Eqs. (4) and (5)), and (c)
the asymmetric standardized model (ASM), obtained with the
standardized dataset, but without augmenting the dataset with
waveforms containing negative time points. Note that the
asymmetric non-standardized model was not considered because
a large number of the original data points are outside of the
acceptability region.
Several criteria have been suggested to select the appropriate
number of PCs for a model [19], including the Broken Stick, the
Average Root, Variability Explained by PCs, the Scree Plot, the
Residual Trace, the Velicer Partial Correlation Function, the Index
of Correlation Matrix, Imbedded Error, and the Indicator Function.
These criteria recommend very different numbers of principal
component basis functions, ranging from one to 17. In order to
keep our models compact, we have selected two principal
component basis functions. Two principal component basis
functions cover 99.8% of the variation for both rising and falling
transitions for all models.
The accuracy of the standard cell model is dependent on the
accuracy of
(a) the mapping of a waveform from the time domain to the PCA
domain,
(b) the mapping of input PCSs to output PCSs through a cell, and
(c) the mapping of output PCSs back to the time domain.
We analyzed the PCA modeling accuracy by determining the
residuals at each voltage level for all 512 transition waveforms
used to construct the model. Residuals are expressed as time-domain
errors for a fixed voltage level.
Fig. 5. The final acceptability region, including the limit imposed by the
convergence requirement.
Fig. 6. The coefficients of the principal component basis functions, computed after
each of the iterations: (a) PC1 and (b) PC2. (Curves labeled GN, IT(1), and IT(2).)
Table 2
Residuals of PCA models.
SNM SSM ASM
Max. (19-pt) 1.10 1.75 1.67
Average (19-pt) 0.08 0.07 0.07
Max. Ave. (19-pt) 0.25 0.38 0.36
Max. Ave. (15-pt) 0.00033 0.00023 0.00181
Table 2 summarizes the results for all of the models. The table
shows the maximum, average, and maximum of the averages of the
residuals for each voltage point. It also includes the maximum of
the average of the residuals for the 15 middle voltage points,
which correspond to the 10-90% range of the transition and are
more critical for accurate timing analysis [35]. It was found that
larger errors are associated with longer transitions and the tails of
the waveforms. Specifically, it can be seen that residuals associated
with the center of the waveform are close to zero. The symmetric
models appear to be the more accurate.
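The residual computation behind Table 2 can be sketched as follows, with T, U, and PCM as in the earlier PCA sketch; treating the middle 15 of the 19 levels as a simple slice is an assumption about how the 10-90% window was taken.

    # Sketch of the Table 2 residual summary: for each of the 19 voltage levels,
    # the residual is the time-domain error between the original waveform and its
    # two-component PCA reconstruction.
    import numpy as np

    def residual_summary(T, U, PCM, n_pc=2):
        T_hat = U + ((T - U) @ PCM[:n_pc, :].T) @ PCM[:n_pc, :]
        R = np.abs(T - T_hat)                  # |residual| per waveform, per voltage level
        per_level_avg = R.mean(axis=0)         # average residual at each voltage point
        return {"max": R.max(),
                "average": R.mean(),
                "max_of_averages_19pt": per_level_avg.max(),
                "max_of_averages_15pt": per_level_avg[2:-2].max()}  # middle 15 levels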
The Q-statistic [19] and T²-statistic [19] were used to analyze
the adequacy of the models by determining the number of outliers
in the original dataset. Outliers correspond to waveforms in the
original dataset that are not accurately modeled by PCA. Table 3
shows the fraction of outliers for each of the screening statistics.
The table indicates that the number of outliers for all three models
is very similar.
5. The cell model and timing analysis
We have applied the SNM waveform model with two
significant principal component basis functions to our dataset to
find a relationship between the input parameters (listed in
Table 1) and the output parameters (L, Θ, and Stage_Delay) for the
inverter cell. L and Θ characterize the shape of the output
waveform. Stage_Delay is the delay from input to output
measured at 50% of the supply voltage. The relationship between
the input and output parameters is computed using Yates'
algorithm [34] to determine all 511 effects (linear coefficients and
interactions) and the average. Because of the lack of experimental
error, significant effects were found using normal probability
plots [34]. The resulting model indicates how the shape of the
output waveform and the delay vary as a function of the shape of
the input waveform, process parameters, and variations in the
operating environment (temperature and supply voltage).
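As a hedged illustration, the sketch below estimates two-level full factorial effects through the standard contrast-matrix formulation, which is numerically equivalent to Yates' algorithm; a three-factor example with made-up responses stands in for the paper's nine-parameter, 511-effect design, and the normal-probability-plot screening is only approximated here by ranking effect magnitudes.

    # Hedged sketch: effects of a two-level full factorial via contrast columns.
    # k = 3 factors and the response values y are illustrative only.
    import itertools
    import numpy as np

    def full_factorial_effects(y, k):
        runs = np.array(list(itertools.product([-1, 1], repeat=k)))   # 2^k coded runs
        labels, columns = [], []
        for r in range(1, k + 1):                                     # main effects and interactions
            for combo in itertools.combinations(range(k), r):
                labels.append("x" + "x".join(str(i + 1) for i in combo))
                columns.append(runs[:, list(combo)].prod(axis=1))
        X = np.array(columns)                                         # (2^k - 1, 2^k) contrast matrix
        effects = 2.0 * (X @ y) / len(y)                              # effect = contrast / 2^(k-1)
        return dict(zip(labels, effects)), y.mean()

    y = np.array([1.1, 1.4, 1.2, 1.6, 2.0, 2.3, 2.1, 2.7])            # illustrative responses
    effects, average = full_factorial_effects(y, k=3)
    print(average, sorted(effects.items(), key=lambda kv: -abs(kv[1]))[:3])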
The output waveform is characterized by L, Θ, and Stage_Delay.
L was found to be a function of the input waveform L and Θ,
fanout, supply voltage, temperature, n-channel threshold voltage,
and p-channel length. Θ was found to be a function of the input
waveform L and Θ and fanout. Stage_Delay was found to be a
function of the input waveform L and Θ, fanout, supply voltage,
temperature, n- and p-channel threshold voltages, and p-channel
length.
In evaluating the accuracy of the model, we do not consider
variation in process parameters, supply voltage, and temperature;
accuracy with respect to these parameters is a function of the
number of terms in the model equations and is considered in
detail in [36]. Instead, we consider only accuracy in the presence
of variations in the shape of the input waveform and fanout. This
enables a direct comparison with tabular static timing analysis.
The accuracy of the PCA model is evaluated by estimating the
delay of a narrow tree of inverters, with a depth of 20 and fanout
ranging from two to five, as shown in Fig. 7. This provides a way to
determine the accuracy of the model for timing analysis of paths
in large circuits, with the only simplification being that the same
cell is used for all of the stages. The number of fanouts at each
stage and the slope of the input to the first gate were varied. The
total delay from the input to the output of each stage was
determined using the following three methods.
Method 1: Tabular (Slope, Fanout) propagation. The inverter
timing is characterized for combinations of (Slope, Fanout) in
tables. Delay is estimated through linear and bilinear interpola-
tion from the tables. This method requires the following functions,
where i is the index for the stage:

Slope(i+1) = Slope_Function(Slope(i), Fanout(i))
Stage_Delay(i+1) = Delay_Function(Slope(i), Fanout(i))
Total_Delay(i+1) = Total_Delay(i) + Stage_Delay(i+1)   (10)
Table 3
Fraction of outliers (%).

                      SNM           SSM           ASM
Significance level    0.01   0.05   0.01   0.05   0.01   0.05
T²-Statistic          0      0      0      0      0      0
Q-Statistic           4      8      4      10     5      8
Both                  4      8      4      10     5      8
Fig. 7. Narrow tree of inverters used to evaluate the accuracy of the PCA method.
Fig. 8. Comparison of delay for the three methods for (a) a fast rising input
transition (fanout = 2) and (b) a slow rising input transition (fanout = 3).
Our implementation included 272 elements in the table: 16
slopes and 17 fanouts.
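A sketch of Method 1 is given below, with bilinear interpolation in a (slope, fanout) table chained along the stages; the table contents and axis values are illustrative placeholders, not the characterized inverter data.

    # Hedged sketch of Method 1 (Eq. (10)): bilinear interpolation in a
    # (slope, fanout) table for slew and delay, chained along the stages.
    import numpy as np

    slopes = np.linspace(0.32, 3.2, 16)            # 16 slope entries (ns), illustrative
    fanouts = np.arange(1, 18)                     # 17 fanout entries, illustrative
    # Placeholder characterization tables (would come from Hspice in practice).
    slew_tab = 0.2 + 0.3 * slopes[:, None] + 0.05 * fanouts[None, :]
    delay_tab = 0.1 + 0.1 * slopes[:, None] + 0.04 * fanouts[None, :]

    def bilinear(tab, s, f):
        i = np.clip(np.searchsorted(slopes, s) - 1, 0, len(slopes) - 2)
        j = np.clip(np.searchsorted(fanouts, f) - 1, 0, len(fanouts) - 2)
        ds = (s - slopes[i]) / (slopes[i + 1] - slopes[i])
        df = (f - fanouts[j]) / (fanouts[j + 1] - fanouts[j])
        return (tab[i, j] * (1 - ds) * (1 - df) + tab[i + 1, j] * ds * (1 - df)
                + tab[i, j + 1] * (1 - ds) * df + tab[i + 1, j + 1] * ds * df)

    def total_delay_tabular(slope_in, stage_fanouts):
        slope, total = slope_in, 0.0
        for f in stage_fanouts:                    # Eq. (10), stage by stage
            total += bilinear(delay_tab, slope, f)
            slope = bilinear(slew_tab, slope, f)
        return total

    print(total_delay_tabular(0.32, [2, 3, 2, 5, 4]))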
Method 2: Simulation using Hspice. This method solves the
circuit's differential equations numerically to find the delay.
Method 3: PCA for delay propagation. Delay is calculated as
follows, where i is the index for the stage:

L(i+1) = Length_Function(L(i), Θ(i), Fanout(i))
Θ(i+1) = Angle_Function(L(i), Θ(i), Fanout(i))
Stage_Delay(i+1) = Delay_Function(L(i), Θ(i), Fanout(i))
Total_Delay(i+1) = Total_Delay(i) + Stage_Delay(i+1)   (11)

Our implementation requires the storage of 16 coefficients. The
input to the first stage for all methods was a ramp. Therefore, for
the PCA method, the input to the first gate must be mapped to the
PCA domain.
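Method 3 can be sketched in the same style; the three stage functions below are hypothetical linear placeholders for the characterized cell equations, which in the paper come from the full factorial model of Section 5.

    # Hedged sketch of Method 3 (Eq. (11)): propagate the output-waveform shape
    # (L, theta) and accumulate delay stage by stage.
    def total_delay_pca(L0, theta0, stage_fanouts, length_fn, angle_fn, delay_fn):
        L, theta, total = L0, theta0, 0.0
        for f in stage_fanouts:
            total += delay_fn(L, theta, f)                       # Stage_Delay(i+1)
            L, theta = length_fn(L, theta, f), angle_fn(L, theta, f)
        return total

    # Placeholder cell equations (illustrative coefficients only).
    length_fn = lambda L, th, f: 0.9 * L + 0.02 * th + 0.01 * f
    angle_fn  = lambda L, th, f: 0.95 * th + 0.01 * L
    delay_fn  = lambda L, th, f: 0.08 + 0.05 * L + 0.03 * f
    print(total_delay_pca(0.4, 0.6, [2, 3, 2, 5, 4], length_fn, angle_fn, delay_fn))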
The delays obtained using the three methods are compared in
Fig. 8. It shows that Method 1 diverges from Hspice for long
chains. Method 3 is not smooth, and oscillates around the delay
from Hspice, but does not diverge as fast as Method 1.
Delays from Hspice (Method 2) are used as the basis of
comparison to obtain errors for each method. The average relative
errors and average variances are compared in Fig. 9, using data
from the outputs of each of the stages, i.e. from stage 2 to 21 (20
points), which shows that Method 3 is more accurate.
The simulations of the circuit were performed on a 4-CPU Ultra
Sparc II 400 MHz server with a Sun Solaris operating system to
compare the three methods. The simulation time for Methods 1 and
3 was 0.2 s, while the simulation time for Method 2 was 21.8 s.
6. Complexity analysis
Table 4 compares the estimated time and space complexity per
transition entry per input for each cell for Methods 1 and 3. Let p
be the number of parameters characterizing a cell. For Method 1,
p = 2, i.e. slope and fanout. For Method 3, p = 3, since this method
requires a pair of PCSs to characterize the waveform shape, plus a
delay value. Let us also suppose that we take into account q
sources of variation, deriving from the process and environment
(temperature and supply voltage). Method 1 requires a p-
dimensional table of numbers with k levels in each dimension. If
we take into account q sources of variation by computing
sensitivities to each of these parameters for each of the table
entries (i.e. we are postulating a linear model), we then require
q+1 tables with k^p entries. Otherwise, a table with k^(p+q) entries
is needed for a model with all interactions. Hence the space
complexity of Method 1 is O(k^(p+q)), which is reduced to O(q·k^p)
if we assume a linear model.

Method 3 discretizes waveforms into w voltage steps. A model
with p parameters has p-1 significant eigenvalues. Consequently,
(p-1)·w numbers must be stored. In addition, the model produces
a maximum of 2^(p+q) coefficients for each of the p expressions. The
resulting model space complexity for Method 3 is O(p·w + p·2^(p+q)).
However, typically only linear terms are significant, in which case
each of the p expressions has O(p+q) coefficients, resulting in a
space complexity of O(p(w+p+q)).
The characterization time complexity of Method 1 is propor-
tional to the number of simulations needed to obtain each number
in its lookup table, and hence is the same as the model's space
complexity.
Fig. 9. Average relative error (a) and error variance (b) of delay for Methods 1 and 3
in comparison with Hspice, using data from the outputs of each of the 21 stages.
(The horizontal axis in both panels is the (slope, fanout) combination applied at the
input.)
Table 4
Time and space complexity comparison per delay/transition entry per input.

                                    Method 1                                Method 3
Model space complexity              O(k^(p+q)); linear case: O(q·k^p)       O(p·w + p·2^(p+q)); linear case: O(p(w+p+q))
Characterization time complexity    O(k^(p+q)); linear case: O(q·k^p)       Maximum: O(w^3 + (w^2 + p(p+q))·2^(p+q))
Simulation time complexity          O(s·k·(p+q)); linear case: O(s(pk+q))   O(s·p·2^(p+q)); linear case: O(s·p·(p+q))
Method 3 requires several steps. First, 2^(p+q) simulations are
performed, which results in w·2^(p+q) points to be analyzed by PCA.
The generation of the appropriate w-dimensional covariance
matrix and its eigendecomposition using SVD have computa-
tional costs of O(w^2·2^(p+q)) and O(w^3), respectively. Iterative
methods exist which avoid finding the covariance matrix. They
reduce the computational cost to O(r·w) per iteration, where r < w
and where an iteration involves sequentially inputting each of the
2^(p+q) w-dimensional vectors. One such method is Sanger's
generalized Hebbian algorithm [37]. Next, all of the waveforms
are converted to the PCA domain at a cost of O(w·2^(p+q)) to develop
p expressions by analyzing the resulting PCA domain dataset at a
cost of O(p·2^(p+q)). The coefficients of the resulting full factorial
model are sorted to find the significant factors, at a cost of
O(p(p+q)·2^(p+q)), and the significant coefficients are selected at a
cost of O(p·2^(p+q)). Table 4 shows the dominant terms.
The simulation time complexity of Method 1 is proportional to
the table lookup time and the number of stages (s). A table lookup
requires a search in each of the p dimensions among the k entries,
which has complexity O(pk). Once the appropriate entry is
selected, the delay is computed, which takes into account the q
sensitivities and has complexity O(q) if the model is linear. This
process is repeated for each of the s stages, resulting in a
simulation time complexity of O(s(pk+q)). For a nonlinear model
the time complexity is O(s·k·(p+q)).

Method 3 requires the evaluation of p expressions, each with at
most 2^(p+q) terms, for each of the s stages. This results in a
computational complexity of O(s·p·2^(p+q)). However, typical expres-
sions contain at most (p+q) linear terms, corresponding to a
simulation time complexity of O(s·p·(p+q)).
It can be seen from Table 4 that Method 1 is linear in
characterization time complexity, while Method 3 is exponential.
However, characterization is done only once for a cell library.
Model users are only impacted by space and simulation time
complexity. In addition, if a fractional factorial experimental
design [34], rather than a full factorial experimental design were
performed to generate the dataset, characterization would be
polynomial in q.
Method 1 is exponential in model space and characterization
time complexity as a function of p, which limits the discretization
of the space that describes the input waveforms (slope and
fanout), while Method 3 is not. As a result, memory usage does
not increase rapidly with increasingly accurate waveforms.
Moreover, as we add more parameters, q, Method 1 requires k^p
more entries for each additional parameter, while Method 3
requires only p additional entries. Therefore, memory usage does
not increase as rapidly for Method 3 as the number of parameters
increases.
7. Conclusions
This paper provides a method to develop compact models of
standard cells for static timing analysis enabling accurate
characterization over variations in input waveform characteristics,
output loading, process parameters, and the environment (tem-
perature and power supply voltage). Compact characterization
utilizes principal component analysis of waveforms. The resulting
models are stored as coefficients of equations. The compact
models enable the performance of a variety of statistical
experiments, including efficient Monte Carlo analysis of the
impact of within-die variation on delay and of the impact of
various temperature profiles and variations in the power supply
voltage on delay.
Three approaches to PCA model construction have been compared;
the comparison indicates that the SSM and SNM methods result in
smaller modeling errors.
In addition, the accuracy and efficiency of the method have been
evaluated in comparison with Hspice and the slope-fanout tabular
method. Runtimes are comparable with the tabular method, while
accuracy and memory usage are improved.
Acknowledgement
The authors would like to thank the Semiconductor Research
Corporation for support of this research project under Task
1419.001.
Appendix. Accuracy analysis of the PCA waveform model
The discretization level for waveform modeling was chosen in
order to have straightforward voltage levels for transistors in the
technology that was used. However, accuracy of the PCA wave-
form model is a function of
(a) the number of discretization levels along the voltage axis, and
(b) the choice of voltage levels to discretize the waveform on the
voltage axis.
To analyze waveform model accuracy with respect to the
number of discretization levels along the voltage axis, seven
discretization patterns were compared. These are summarized in
Fig. 10, with between three and 19 levels.
Fig. 10. PCA waveform discretization patterns for the voltage scale (19, 15, 10,
5 (cases 1-3), and 3 levels).
Fig. 11. Increase in accuracy of the PCA waveform as a function of the number of
discretization levels (sum of squares of error; uniform discretization compared with
the five-level cases 1 and 3).
Fig. 11 compares the accuracy of the uniform discretization
plans using the sum of squares of error (SOS). It can be seen that at
least 10 points are needed to achieve high accuracy, and that
increasing beyond 10 points does not increase accuracy much.
However, it should be noted that as few as five points can achieve
high accuracy if they are appropriately placed. Additionally, fewer
discretization levels result in fewer principal component basis
functions. However, even in the worst case that we studied,
where we considered 19 levels for discretization, two principal
component basis functions covered 99.8% of the variation.
References
[1] T. Kirkpatrick, N. Clark, PERT as an aid to logic design, IBM J. Res. Dev. 10 (2) (1966) 135-141.
[2] D. Lee, V. Zolotov, D. Blaauw, Static timing analysis using backward signal propagation, Proc. Des. Automat. Conf. (2004) 664-669.
[3] S.R. Nassif, A.J. Strojwas, S.W. Director, A methodology for worst-case analysis of integrated circuits, IEEE Trans. Comput. Aided Des. 5 (1) (1986) 104-113.
[4] A. Agarwal, V. Zolotov, D.T. Blaauw, Statistical timing analysis using bounds and selective enumeration, IEEE Trans. Comput. Aided Des. 22 (9) (2003) 1243-1260.
[5] X. Li, et al., Defining statistical timing sensitivity for logic circuits with large-scale process and environmental variations, IEEE Trans. Comput. Aided Des. 27 (6) (2008) 1041-1053.
[6] H. Chang, S.S. Sapatnekar, Statistical timing analysis under spatial correlations, IEEE Trans. Comput. Aided Des. 24 (9) (2005) 1467-1482.
[7] B. Cline, et al., Analysis and modeling of CD variation for statistical static timing, Proc. Int. Conf. Comput. Aided Des. (2006) 60-66.
[8] D. Blaauw, et al., Statistical timing analysis: from basic principles to state of the art, IEEE Trans. Comput. Aided Des. 27 (4) (2008) 589-607.
[9] S. Bhardwaj, S. Vrudhula, A. Goel, A unified approach for full chip statistical timing and leakage analysis of nanoscale circuits considering intradie process variations, IEEE Trans. Comput. Aided Des. 27 (10) (2008) 1812-1825.
[10] L. Zhang, et al., Correlation-preserved non-Gaussian statistical timing analysis with quadratic timing model, Proc. Des. Automat. Conf. (2005) 83-88.
[11] V. Khandelwal, A. Srivastava, A general framework for accurate statistical timing analysis considering correlations, Proc. Des. Automat. Conf. (2005) 89-94.
[12] J. Singh, S.S. Sapatnekar, A scalable statistical static timing analyzer incorporating correlated non-Gaussian and Gaussian parameter variations, IEEE Trans. Comput. Aided Des. 27 (1) (2008) 160-173.
[13] B. Choi, D.M.H. Walker, Timing analysis of combinational circuits including capacitive coupling and statistical process variation, Proc. VLSI Test Symp. (2000) 49-54.
[14] A. Gattiker, et al., Timing yield estimation from static timing analysis, Proc. Int. Symp. Qual. Electron. Des. (2001) 437-442.
[15] M. Orshansky, K. Keutzer, A general probabilistic framework for worst case timing analysis, Proc. Des. Automat. Conf. (2002) 556-561.
[16] A. Agarwal, et al., Statistical delay computation considering spatial correlations, Proc. Asia South Pacific Des. Automat. Conf. (2003) 271-276.
[17] C.S. Amin, et al., Statistical static timing analysis: how simple can we get?, Proc. Des. Automat. Conf. (2005) 567-652.
[18] CMOS nonlinear delay model calculation, in: Library Compiler User Guide, vol. 2, Synopsys, 1999.
[19] J.E. Jackson, A User's Guide to Principal Components, Wiley, New York, 2003.
[20] P. Feldmann, S. Abbaspour, Towards a more physical approach to gate modeling for timing, noise, and power, Proc. Des. Automat. Conf. (2008) 453-455.
[21] R. Gandhi, et al., Delay modeling using ramp and realistic signal waveforms, Proc. Int. Conf. Electro-Inform. Technol. (2005).
[22] C.S. Amin, F. Dartu, Y.I. Ismail, Weibull-based analytical waveform model, IEEE Trans. Comput. Aided Des. 24 (8) (2005) 1156-1168.
[23] A. Jain, D. Blaauw, V. Zolotov, Accurate delay computation for noisy waveform shapes, Proc. Int. Conf. Comput. Aided Des. (2005) 946-952.
[24] V. Zolotov, et al., Compact modeling of variational waveforms, Proc. Int. Conf. Comput. Aided Des. (2007) 705-712.
[25] S.R. Nassif, E. Acar, Advanced waveform models for the nano-meter regime, International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, 2004.
[26] A. Ramalingam, et al., Accurate waveform modeling using singular value decomposition with applications to timing analysis, Proc. Des. Automat. Conf. (2007) 148-153.
[27] H. Fatemi, S. Nazarian, M. Pedram, Statistical logic cell delay analysis using a current-based model, Proc. Des. Automat. Conf. (2007) 253-256.
[28] S. Basu, P. Thakore, R. Vemuri, Process variation tolerant standard cell library development using reduced dimension statistical modeling and optimization techniques, Proc. Int. Symp. Qual. Electron. Des. (2007) 814-820.
[29] A. Goel, et al., A methodology for characterization of large macro cells and IP blocks considering process variations, Proc. Int. Symp. Qual. Electron. Des. (2008) 200-206.
[30] S. Sundareswaran, et al., Characterization of standard cells for intra-cell mismatch variations, Proc. Int. Symp. Qual. Electron. Des. 1 (2008) 213-219.
[31] Star-Hspice Manual, Avant! Corporation and Avant! Subsidiary, 2001.
[32] Assura Physical Verification User Guide, Cadence Design Systems Inc., 2005.
[33] Using MATLAB, The MathWorks Inc., 1999.
[34] G.E.P. Box, W.G. Hunter, J.S. Hunter, Statistics for Experimenters, Wiley, New York, 1978.
[35] M. Ketkar, K. Dasamsetty, S. Sapatnekar, Convex delay models for transistor sizing, Proc. Des. Automat. Conf. (2000) 655-660.
[36] S.-A. Aftabjahani, L. Milor, Compact variation-aware standard cell models for static timing analysis, Proc. Des. Circuits Integrated Syst. (2008).
[37] T.D. Sanger, Optimal unsupervised learning in a single-layer linear feedforward neural network, Neural Networks 2 (6) (1989) 459-473.
Seyed-Abdollah Aftabjahani received his B.S. degree
in Computer Engineering from the National University
of Iran, Tehran, Iran in 1994, and his M.S. in Electrical
and Computer Engineering from the University of
Tehran, Tehran, Iran in 1997. He is currently conducting
research for his Ph.D. dissertation in the Semiconduc-
tor Testing and Yield Enhancement Group Laboratory
at the School of Electrical and Computer Engineering,
Georgia Institute of Technology.
Prior to going back to school in 2001, he worked as a
software engineer for TRW and the Computer Science
Corporation in Atlanta. He also worked as a research
engineer for 7 years at the Iran Telecommunications
Research Center on designing and developing embedded systems for telecommu-
nications systems, fax machines, automatic mark readers. He was the technical
lead of the software development team. He also served as a consultant for several
companies and has experience in automation and control using computers and
embedded systems for industrial and commercial products. His research interests
include: statistical variation-aware timing analysis and modeling, simulation
acceleration using compiler techniques, digital testing and testable design, and
modeling for digital testing using Hardware Description Languages.
Linda Milor received her Ph.D. degree in Electrical
Engineering from the University of California, Berkeley,
in 1992.
She is currently an Associate Professor of Electrical
and Computer Engineering at the Georgia Institute of
Technology, in Atlanta, Georgia. Prior to that, she
served as Vice President of Process Technology and
Product Engineering at eSilicon Corporation, as Product
Engineering Manager at Advanced Micro Devices,
Sunnyvale, California, and as a Faculty Member at the
University of Maryland, College Park. She has authored
over 90 publications and holds four patents on yield
and test of semiconductor-integrated circuits.