IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, VOL. 55, NO. 2, FEBRUARY 2007
Abstract: A high-capacity electromagnetic solution, layered finite element method, is proposed for high-frequency modeling of large-scale three-dimensional on-chip circuits. In this method, first, the matrix system of the original 3-D problem is reduced to that of 2-D layers. Second, the matrix system of 2-D layers is further reduced to that of a single layer. Third, an algorithm of logarithmic complexity is proposed to further speed up the analysis. In addition, an excitation and extraction technique is developed to limit the field unknowns needed for the final circuit extraction to a single layer only, as well as keep the right-hand side intact during the matrix reduction process. The entire procedure is numerically rigorous without making any theoretical approximation. The computational complexity only involves solving a single layer irrespective of the original problem size. Hence, the proposed method is equipped with a high capacity to solve large-scale IC problems. The proposed method was used to simulate a set of large-scale interconnect structures that were fabricated on a test chip using conventional Si processing techniques. Excellent agreement with the measured data has been observed from dc to 50 GHz.

Index Terms: Electromagnetics, finite element method, high capacity, high frequency, on-chip circuits, three dimension.

I. INTRODUCTION

As on-chip designers move to faster clock frequencies enabled by process technology scaling with reduced feature sizes, electromagnetic analysis has drawn the attention of the on-chip design community. In 2001, a research team at Intel started to validate RLC-based parasitic extraction at tens of gigahertz. Significant mismatch between measurements and RLC models was observed at multigigahertz frequencies on 3-D interconnect bus structures [1]. In contrast, full-wave electromagnetic-based modeling accurately captured the measured behavior over the entire frequency band [1], [2]. The mismatch between RLC models and measurements was attributed to the decoupled E and H model employed in static modeling by extracting the capacitance and inductance independent of each other [1]. This finding demonstrated the importance of full-wave electromagnetic-based solutions in high-frequency IC design [3]. The importance has been further realized in today's low power design. In power efficient mobile chips, low power states and clock gating are gaining momentum as the main power saving mechanisms. In these architectures, entire blocks of circuits are switched on and off to achieve an optimal power-performance operating point. These power reduction techniques result in large processor current variations and fast transient droops and noises in the power supply network, which cannot be accurately captured by a static-based IR drop or transient droop analysis.

In addition to high-frequency digital IC design, electromagnetic analysis is also of paramount importance to analog, RF, and mixed-signal IC design. Integrated computing and communication calls for increasing levels of integration of RF, analog, and digital systems. Integrating as many circuits as possible on the same die leads often to undesired coupling and sometimes to system failure. For instance, switching currents induced by logic circuits cause ringing in the power-supply rails and in the output driver circuitry. This, in turn, couples through the common substrate to corrupt sensitive analog signals on the same chip. Prevailing circuit-based signal integrity paradigms are reaching their limits of predictive accuracy when applied to high-frequency mixed-signal settings. An electromagnetic solution is indispensable to sustain the continual scaling and integration of digital, analog, mixed-signal, and RF circuitry.

However, high-frequency IC design imposes many modeling challenges to electromagnetic analysis. These challenges include conductor loss, large numbers of dielectric stacks, strong non-uniformity, the presence of substrate, large numbers of conductors, large aspect ratio, broadband, and 3-D complexity [3]. Almost every challenge increases the number of unknowns, and hence the problem size one needs to solve when tackling an IC problem. For instance, due to conductor loss, one has to discretize into conductors with very fine elements to capture rapid field variation within skin depth. This generates a large number of unknowns even for small on-chip structures. In addition to on-chip intricacy that increases the problem size, the need for full-chip analysis also stresses problem size. A full-chip analysis is often needed to capture the global electrical interactions between integrated circuits on the die, and between the die and the package. However, to date, the fastest integral equation solver needs operations and storage in dealing with -unknown electrodynamic problems; the fastest partial-differential-equation based solvers scale as in both memory requirement and CPU cost. This performance is generally regarded as the limit that one can

Manuscript received May 21, 2006; revised August 31, 2006. This work was supported in part by a grant from the Office of Naval Research under Award N00014-06-1-0716 and in part by a grant from Intel Corporation.
D. Jiao is with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47906 USA (e-mail: djiao@purdue.edu).
S. Chakravarty and C. Dai are with the Design and Technology Solutions, Intel Corporation, Santa Clara, CA 95052-8119 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TAP.2006.889847
0018-926X/$25.00 © 2007 IEEE
JIAO et al.: A LAYERED FINITE ELEMENT METHOD FOR ELECTROMAGNETIC ANALYSIS 423
(4)

The superscript denotes element. In each prism element, there are nine vector bases as shown in Fig. 2. These functions can be written as

(8)

By ordering the unknowns layer by layer as shown in Fig. 3, we generate a banded matrix . Though a banded matrix, its solution can be highly computationally expensive when the number of unknowns is large. A direct solution generally requires a large amount of memory; an iterative solution can converge slowly, and is inefficient in the presence of multiple right-hand sides. To solve this problem, we first reduce the system matrix to one that only involves 2-D surface unknowns in each layer, the detail of which is illustrated in the next section.
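As a generic illustration of reducing a system to surface unknowns only, the sketch below applies a standard Schur-complement (static condensation) elimination to a small dense system. It is not a transcription of the paper's formulation: the function name `eliminate_volume_unknowns`, the index sets `surf` and `vol`, and the random real symmetric test matrix are assumptions made for the example, and it assumes the right-hand side is zero on the eliminated unknowns.

```python
import numpy as np

def eliminate_volume_unknowns(A, b, surf, vol):
    """Reduce A x = b to the surface unknowns via a Schur complement.

    Assumes b is zero on the volume unknowns, so the reduced
    right-hand side is simply b restricted to the surface unknowns.
    """
    A_ss = A[np.ix_(surf, surf)]
    A_sv = A[np.ix_(surf, vol)]
    A_vs = A[np.ix_(vol, surf)]
    A_vv = A[np.ix_(vol, vol)]
    # T maps surface values to volume values: x_v = -A_vv^{-1} A_vs x_s
    T = -np.linalg.solve(A_vv, A_vs)
    # Reduced (surface-only) matrix: A_ss - A_sv A_vv^{-1} A_vs
    A_red = A_ss + A_sv @ T
    return A_red, b[surf], T

# Hypothetical test problem: a small symmetric positive-definite matrix.
rng = np.random.default_rng(0)
n = 8
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
surf = np.array([0, 1, 6, 7])   # assumed surface unknowns
vol = np.array([2, 3, 4, 5])    # assumed volume unknowns
b = np.zeros(n)
b[surf] = rng.standard_normal(surf.size)  # excitation on surface unknowns only

A_red, b_red, T = eliminate_volume_unknowns(A, b, surf, vol)
x_s = np.linalg.solve(A_red, b_red)  # solve the reduced system
x_v = T @ x_s                        # recover the volume unknowns
x_full = np.linalg.solve(A, b)       # reference: solve the full system
```

The reduced solve reproduces the surface part of the full solution, and the same mapping `T` recovers the eliminated unknowns afterwards. A practical solver would use sparse, complex-valued matrices rather than the dense real ones used here.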
[16]. For instance, the volumetric unknowns in Layer 1, which is , can be eliminated by using the procedure illustrated in Fig. 4. and are the surface unknowns on the top and bottom surfaces of Layer 1.

Fig. 4. Procedure of eliminating volume unknowns.

In Fig. 4, matrix is formed between unknowns and is formed between unknowns and is formed between unknowns and is formed between unknowns and is formed between unknowns and and is formed between unknowns and . The relationship between the transformed matrices and the original matrices can be written as

(9)

Essentially, the volume unknowns are eliminated by using the relationship between the surface and volume unknowns. This relationship can be also used to recover volume unknowns from the surface unknowns.

From (9), apparently, in order to eliminate volume unknowns, one needs to fill in matrices , and for each layer. In addition, one has to evaluate , and for each layer. The resultant computational cost can be very high when the number of layers is large as well as the number of unknowns in each layer. In this paper, a fast technique is proposed to eliminate the volume unknowns efficiently. This fast technique is achieved by deriving the following matrix properties.

a) Matrix is the same for all the layers.
This is because matrix is assembled from the following elemental matrix:

(10)

In (10), denotes the region forming a triangular element, is the edge basis function ([14, pp. 234–237]), and is the node basis function ([14, p. 80]). Since the permeability does not change in the realistic on-chip structures, remains the same for all the layers. Therefore, matrix only needs to be filled for one layer.

b) Matrices and are correlated

(11)

(12)

in which is the layer thickness. Therefore, based on b) and c), the following equality holds true:

(13)

d) Matrix is linearly proportional to the layer thickness.
In fact, matrix is assembled from the following elemental matrix:

(14)

Therefore, matrix only needs to be formed and inverted for a layer of unit thickness. Others can be obtained by scaling accordingly.

e) For interconnect structures, matrix only needs to be formed for each unique structure seed.
The concept of structure seeds in 3-D interconnect structures was first introduced in [2]. A structure seed has a unique cross section. For a 3-D bus structure of orthogonal layers, the number of structure seeds is . This number is small. For instance, for an interconnect involving seven metal layers, this number is only 8 irrespective of the number of wires. If we choose the layer-growth direction to be either or [please refer to Fig. 1(b)], then each layer features the same dielectric configuration. Then, from (14), one can see clearly that matrix of unit thickness is different only when the conductor configuration is different. Hence, matrix only needs to be formed and inverted for each structure seed of unit thickness. The inverse of matrix in each layer can then be readily obtained by linearly scaling the structure-seed-based inverse matrix with the layer thickness .

As an immediate result of the aforementioned factors, the computational cost of reducing the 3-D system matrix to a 2-D layered one only involves solving for each structure seed. Since and are extremely sparse matrices, and generally there are only a few structure seeds for on-chip interconnect structures, the reduction can be performed very efficiently.

V. REDUCTION OF THE 2-D LAYERED SYSTEM MATRIX TO A SINGLE-LAYER ONE

With all the volume unknowns eliminated, we obtain a system matrix that only involves 2-D surface unknowns in each layer. If the number of layers is only a few, we can stop there and solve the reduced system matrix as a whole without further reduction. However, in reality, we often encounter a large number of layers. For example, in a realistic on-chip interconnect structure, one can encounter a large number of layers by either segmenting along the or direction. Therefore, further reduction is often
426 IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, VOL. 55, NO. 2, FEBRUARY 2007
Fig. 7. Excitation and Extraction.

algorithm is also used to handle multiple layers resulting from the discretization of thick silicon and conductors. For instance, the thick silicon substrate often constitutes a numerical challenge to a partial-differential-equation based solver because of the large number of volume unknowns resulting from its discretization. With this technique, it does not constitute a challenge any more because one can account for its contribution to the rest of the system in operations and single-layer storage. This algorithm of logarithmic complexity, to a certain extent, resembles the one used for treating deep cavities with a constant cross section [18].

VII. EXCITATION AND EXTRACTION

Here, we give a simple example to illustrate the excitation and extraction scheme. Consider a wire sitting above a ground plane as shown in Fig. 7. If the layer-growth direction is chosen to be , we use a current source oriented in either or direction; if the layer-growth direction is chosen to be , we use a current source oriented in either or direction; if the layer-growth direction is chosen to be , we use a current source located in - plane. The purpose is to associate field unknowns involved in the excitation and extraction to those remaining in the final matrix system. For all the other unknowns, their corresponding right-hand sides are zero. Therefore, the matrix reduction process illustrated in Sections IV and V does not involve the modification of the right-hand side at all, which is efficient.

Multiple columns of current filaments can be used from the wire to the ground. Multiple rows can also be used. But they are all placed in the layers or layer of interest. The right-hand sides corresponding to the field unknowns associated with the current filaments become

(16)

in which is the current and is the length of the current filament. When we inject current into one port, we leave other ports open. We then sample the voltage generated at each port. The voltage can be evaluated by performing a line integral of the electric field from the port to the ground. Thus, we obtain one column of the impedance matrix Z. We then inject current into another port and obtain another column of the Z matrix. We continue this procedure by injecting current into each port in turn. Finally, we obtain the entire Z matrix. From the Z matrix, one can easily obtain both Y- and S-parameter matrices.

It should be noted that, different from the general RLC-based interconnect modeling process in which the extraction stage is separated from the simulation stage, here one can obtain both extraction and simulation results within one run. One can either obtain the S-parameter model of the interconnects as aforementioned, or load the interconnect with current sources, and obtain voltages directly from the proposed method. When the ground plane is placed far away from the structure of interest, instead of using a probe that goes all the way from the ground to the structure, a short probe that does not start from the ground is used. In doing so, the port current becomes unknown. It is extracted from the port voltages sampled at multiple points. With port currents and voltages known, the Z-, Y-, and S-parameters can be extracted.

VIII. PERFORMANCE ANALYSIS

The memory usage of the proposed method is modest compared to the conventional finite element method. Maximally, it only requires the storage of a single-layer matrix formed by surface unknowns irrespective of the original problem size. Therefore, the proposed method possesses a high capacity to deal with very large scale electromagnetic problems.

The CPU run time can be analyzed for step I and step II in the context of both serial implementation and parallel implementation.

Step I (Reduction of the 3-D Layered System Matrix to a 2-D Layered One): Assuming the number of layers is and the number of volume unknowns per layer is , the lower bound of the CPU cost for eliminating all the volume unknowns is apparently . However, the proposed method can achieve it in operations, in which is the number of structure seeds, which is generally much less than . If implemented in parallel, since the elimination of the volume unknowns in different layers is completely decoupled, each of them can be assigned to a single processor, and no communication is needed between the processors. Therefore, the CPU cost is just , the cost of a single-layer sparse matrix inversion.

Step II (Reduction of the 2-D Layered System Matrix to a Single-Layer One): Assuming the number of layers is , and the total number of surface unknowns is , the CPU cost of the serial implementation can be estimated as

(17)

in which is the cost of a single-layer surface-unknown elimination. The function depends on the computational complexity of the matrix solver used to solve the single layer matrix. For instance, if an advanced sparse matrix solver is used, can be a linear function

If an iterative solver such as the conjugate gradient method is used
(18)
(19)
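The Step I cost argument (one inversion per structure seed rather than per layer, with other thicknesses obtained by scaling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `solve_layers`, `unit_matrices`, `seed_a`, and `seed_b` are hypothetical, the matrices are small random dense ones, and a dense inverse stands in for the sparse factorization a practical solver would use. It relies only on the stated property that the volume matrix is linearly proportional to the layer thickness, so its inverse scales with the reciprocal of the thickness.

```python
import numpy as np

def solve_layers(layers, unit_matrices, rhs):
    """Solve (thickness * A_unit[seed]) x = rhs for every layer.

    layers        : list of (seed_id, thickness) pairs (hypothetical layout)
    unit_matrices : dict mapping seed_id to the unit-thickness matrix
    Returns the per-layer solutions and the number of inversions performed.
    """
    inverse_cache = {}
    solutions = []
    for seed_id, thickness in layers:
        if seed_id not in inverse_cache:
            # Invert once per structure seed, at unit thickness.
            inverse_cache[seed_id] = np.linalg.inv(unit_matrices[seed_id])
        # A_layer = thickness * A_unit  =>  A_layer^{-1} = A_unit^{-1} / thickness
        solutions.append(inverse_cache[seed_id] @ rhs / thickness)
    return solutions, len(inverse_cache)

rng = np.random.default_rng(1)
n = 6

def random_spd(size):
    """Hypothetical stand-in for a unit-thickness volume matrix."""
    M = rng.standard_normal((size, size))
    return M @ M.T + size * np.eye(size)

unit_matrices = {"seed_a": random_spd(n), "seed_b": random_spd(n)}
# 40 layers of varying thickness that alternate between two structure seeds.
layers = [("seed_a" if k % 2 == 0 else "seed_b", 0.5 + 0.1 * k) for k in range(40)]
rhs = rng.standard_normal(n)

solutions, num_inversions = solve_layers(layers, unit_matrices, rhs)
```

Here 40 layers are handled with only two inversions, one per seed. Because each layer's solve is independent of the others, the loop body could also be distributed across processors without communication, in line with the parallel estimate above.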
IX. EXAMPLES
To validate the proposed method, we simulated a set of inter-
connect structures that were fabricated on a test chip using con-
ventional Si processing techniques [1]. High resolution cross-
sectional scanning electron microscopy and optical microscopy
were used to measure the relevant dimensions of the fabricated
structures. Parasitic signals were removed from the measured
S-parameters using a de-embedding approach [1].

The first test structure is of 300 µm width. It involves a 10-µm wide strip in metal 2 (M2) layer, one ground plane in metal 1 (M1) layer, and one ground plane in metal 3 (M3) layer. The distances from this strip to the M2 returns on the left- and right-hand sides are each 50 µm. The strip is of a length of 2000 µm. The discretization is done in - plane, and extruded along the direction. The 2000 µm length is subdivided into 40 layers. The 2-D surface matrix is only formed for one layer. The algorithm

Fig. 8. S-parameters of an on-chip interconnect structure of length 2000 µm. (a) S11 magnitude. (b) S11 phase. (c) S12 magnitude. (d) S12 phase.
Fig. 11. Simulation of an RF spiral inductor. (a) Geometry. (b) Y-parameters and Q value.

power grid for the purpose of validation. Eight C4 bumps are landing at M8 wide metals, and 6-pair current sources are attached at the bottom metal layer M4. Fig. 12(b) shows

Fig. 12. Simulation of an on-die power grid. (a) Geometry. (b) VSS voltage droop at dc. (c) S-parameters.

X. CONCLUSION

In this paper, we proposed a layered finite element method for high-frequency modeling of large-scale three-dimensional on-chip circuit structures. This method is capable of solving an orders-of-magnitude smaller system to rigorously obtain the solution of the original big problem. The system matrix of the original 3-D problem is first reduced to that of 2-D layers. For on-chip interconnect structures, the computational cost of this reduction is modest, only involving the solution of a few 2-D structure seeds. The matrix system of 2-D layers is then further reduced to that of a single layer. This reduction only involves single-layer unknowns irrespective of the original problem size. As a result, the proposed method possesses a high capacity to solve large-scale interconnect problems. Equally important, the entire procedure is numerically rigorous without making any theoretical approximation. In addition, it solves Maxwell's coupled E-H equations, and hence features uncompromised electromagnetic accuracy. Its accuracy and capacity are demonstrated by numerical and experimental results.

ACKNOWLEDGMENT

[14] J. M. Jin, The Finite Element Method in Electromagnetics, 1st ed. New York: Wiley, 1993.
[15] R. D. Graglia, D. R. Wilton, A. F. Peterson, and I. Gheorma, "Higher order interpolatory vector bases on prism elements," IEEE Trans. Antennas Propag., vol. 46, no. 3, pp. 442–450, Mar. 1998.
[16] D. Jiao, S. Chakravarty, C. Dai, and S. W. Lee, "Surface-based finite element method for large-scale 3-D circuit modeling," in Proc. 14th Topical Meeting on Electrical Performance of Electronic Packaging, Oct. 24–26, 2005, pp. 347–350.
[17] C. S. Desai and J. F. Abel, Introduction to the Finite Element Method. New York: Van Nostrand Reinhold, 1972.
[18] J. Liu and J. M. Jin, "A special higher-order finite element method for scattering by deep cavities," IEEE Trans. Antennas Propag., vol. 48, pp. 694–703, May 2000.
Changhong Dai received the B.S. degree in physics from Beijing University, Beijing, China, in 1985, and the M.S. and Ph.D. degrees in materials sciences and engineering from Stanford University, Stanford, CA, in 1992 and 1995, respectively.
He joined the Technology CAD Division of Intel Corporation, Santa Clara, CA, in 1995, as a Senior CAD Engineer. Since 1995, he has been an R&D Engineer or Manager for model and CAD tool development for circuit analysis and physical design, in the areas of circuit reliability, transistor modeling for circuit simulation, interconnect modeling for parasitic extraction, static and full wave simulation of interconnects and RF devices, and power and power delivery modeling for IC product design. He is currently a Director with the Technology and Manufacturing Group of Intel Corporation, with the responsibility of directing the development of Core CAD Technologies that enables Intel processing technology and product design. His current responsibility covers the full spectrum of CAD tool and infrastructure development with a focus of bridging the processing technology and chip design.