LFEM

422 IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, VOL. 55, NO. 2, FEBRUARY 2007

A Layered Finite Element Method for Electromagnetic Analysis of Large-Scale High-Frequency Integrated Circuits

Dan Jiao, Senior Member, IEEE, Sourav Chakravarty, and Changhong Dai

Abstract: A high-capacity electromagnetic solution, the layered finite element method, is proposed for high-frequency modeling of large-scale three-dimensional on-chip circuits. In this method, first, the matrix system of the original 3-D problem is reduced to that of 2-D layers. Second, the matrix system of 2-D layers is further reduced to that of a single layer. Third, an algorithm of logarithmic complexity is proposed to further speed up the analysis. In addition, an excitation and extraction technique is developed to limit the field unknowns needed for the final circuit extraction to a single layer only, as well as keep the right-hand side intact during the matrix reduction process. The entire procedure is numerically rigorous without making any theoretical approximation. The computational complexity only involves solving a single layer irrespective of the original problem size. Hence, the proposed method is equipped with a high capacity to solve large-scale IC problems. The proposed method was used to simulate a set of large-scale interconnect structures that were fabricated on a test chip using conventional Si processing techniques. Excellent agreement with the measured data has been observed from dc to 50 GHz.

Index Terms: Electromagnetics, finite element method, high capacity, high frequency, on-chip circuits, three dimension.

Manuscript received May 21, 2006; revised August 31, 2006. This work was supported in part by a grant from the Office of Naval Research under Award N00014-06-1-0716 and in part by a grant from Intel Corporation.
D. Jiao is with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47906 USA (e-mail: djiao@purdue.edu).
S. Chakravarty and C. Dai are with the Design and Technology Solutions, Intel Corporation, Santa Clara, CA 95052-8119 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TAP.2006.889847
0018-926X/$25.00 © 2007 IEEE

I. INTRODUCTION

As on-chip designers move to faster clock frequencies enabled by process technology scaling with reduced feature sizes, electromagnetic analysis has drawn the attention of the on-chip design community. In 2001, a research team at Intel started to validate RLC-based parasitic extraction at tens of gigahertz. Significant mismatch between measurements and RLC models was observed at multigigahertz frequencies on 3-D interconnect bus structures [1]. In contrast, full-wave electromagnetic-based modeling accurately captured the measured behavior over the entire frequency band [1], [2]. The mismatch between RLC models and measurements was attributed to the decoupled E and H model employed in static modeling by extracting the capacitance and inductance independent of each other [1]. This finding demonstrated the importance of full-wave electromagnetic-based solutions in high-frequency IC design [3]. The importance has been further realized in today's low power design. In power efficient mobile chips, low power states and clock gating are gaining momentum as the main power saving mechanisms. In these architectures, entire blocks of circuits are switched on and off to achieve an optimal power-performance operating point. These power reduction techniques result in large processor current variations and fast transient droops and noises in the power supply network, which cannot be accurately captured by a static-based IR drop or transient droop analysis.

In addition to high-frequency digital IC design, electromagnetic analysis is also of paramount importance to analog, RF, and mixed-signal IC design. Integrated computing and communication calls for increasing levels of integration of RF, analog, and digital systems. Integrating as many circuits as possible on the same die often leads to undesired coupling and sometimes to system failure. For instance, switching currents induced by logic circuits cause ringing in the power-supply rails and in the output driver circuitry. This, in turn, couples through the common substrate to corrupt sensitive analog signals on the same chip. Prevailing circuit-based signal integrity paradigms are reaching their limits of predictive accuracy when applied to high-frequency mixed-signal settings. An electromagnetic solution is indispensable to sustain the continual scaling and integration of digital, analog, mixed-signal, and RF circuitry.

However, high-frequency IC design poses many modeling challenges for electromagnetic analysis. These challenges include conductor loss, large numbers of dielectric stacks, strong non-uniformity, the presence of substrate, large numbers of conductors, large aspect ratio, broadband, and 3-D complexity [3]. Almost every challenge increases the number of unknowns, and hence the problem size one needs to solve when tackling an IC problem. For instance, due to conductor loss, one has to discretize into conductors with very fine elements to capture rapid field variation within skin depth. This generates a large number of unknowns even for small on-chip structures. In addition to on-chip intricacy that increases the problem size, the need for full-chip analysis also stresses problem size. A full-chip analysis is often needed to capture the global electrical interactions between integrated circuits on the die, and between the die and the package. However, to date, the fastest integral equation solver needs $O(N \log N)$ operations and $O(N \log N)$ storage in dealing with $N$-unknown electrodynamic problems; the fastest partial-differential-equation based solvers scale as $O(N)$ in both memory requirement and CPU cost. This performance is generally regarded as the limit that one can achieve in computational electromagnetics. Since $N$ is a big number in IC analysis even for a circuitry of a modest size, the current performance of computational electromagnetic techniques is still insufficient when tackling a realistic IC design problem that can involve billions of unknowns. Therefore, it is of paramount importance to study and develop a high-capacity electromagnetic solution.
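As a rough illustration of why discretizing into conductors inflates the unknown count, the classical skin depth $\delta = \sqrt{2/(\omega \mu \sigma)}$ can be evaluated over the band of interest. This is a minimal sketch, not part of the proposed method; the copper conductivity `SIGMA_CU` is an assumed textbook value and does not come from the paper.

```python
import math

MU_0 = 4.0e-7 * math.pi  # permeability of free space, H/m


def skin_depth(freq_hz: float, sigma_s_per_m: float, mu_r: float = 1.0) -> float:
    """Classical skin depth of a good conductor, in meters."""
    omega = 2.0 * math.pi * freq_hz
    return math.sqrt(2.0 / (omega * mu_r * MU_0 * sigma_s_per_m))


SIGMA_CU = 5.8e7  # assumed bulk copper conductivity, S/m (not from the paper)

for f_ghz in (1, 10, 50):
    delta_um = skin_depth(f_ghz * 1e9, SIGMA_CU) * 1e6
    print(f"{f_ghz:>2} GHz: skin depth ~ {delta_um:.2f} um")
```

At tens of gigahertz the skin depth falls well below a micrometer, so elements inside a conductor have to be comparably small, which is the effect the paragraph above describes.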
Having realized the importance of full-wave electromagnetic analysis in high-frequency chip design, and the unique modeling challenges of IC problems, researchers in both the circuit and field communities have been working on developing innovative electromagnetic solutions [2]-[13]. However, most of the research efforts have been placed on enriching existing electromagnetic modeling techniques with new capabilities to address on-chip intricacy. Little work has been reported in the open literature on high-capacity electromagnetic solutions, which can potentially tackle full-chip problems. In [2], we presented a novel, rigorous, and fast method for the full-wave modeling of large-scale high-speed interconnect structures. In this method, a general interconnect structure is decomposed into a few structure seeds. In each structure seed, the original wave propagation problem is represented as a generalized eigenvalue problem. A novel mode-matching technique is developed to solve large-scale 3-D problems by using 2-D-like CPU time and memory. This method has shown great capability in modeling Manhattan-type large-scale interconnect structures. In this paper, we propose a layered finite element method to tackle both Manhattan- and non-Manhattan-type large-scale multilayered structures. Layout periodicity can be employed to further speed up the proposed method, but it is not required to achieve the high capacity of the proposed method. We will elaborate this method in the following seven sections: Section II problem statement, Section III layered finite-element scheme, Section IV reduction of the 3-D layered system matrix to a 2-D layered one, Section V reduction of the 2-D layered system matrix to a single-layer one, Section VI an algorithm of logarithmic complexity for further speed-up, Section VII excitation and extraction, and Section VIII performance analysis. Finally, we will demonstrate the accuracy and high capacity of the proposed method by a number of numerical and experimental results.

II. PROBLEM STATEMENT

Consider the 3-D circuit problems shown in Fig. 1. The circuit can be an RF CMOS device metallic system, a global on-chip interconnect structure, an RF IC circuit, or others. Inside these circuits, the electric field $\mathbf{E}$ satisfies the second-order vector wave equation

$$\nabla \times \left( \mu_r^{-1} \nabla \times \mathbf{E} \right) - k_0^2 \bar{\epsilon}_r \mathbf{E} = -j k_0 Z_0 \mathbf{J} \quad \text{in } V \qquad (1)$$

subject to certain boundary conditions. In (1), the bar over $\epsilon_r$ denotes a complex permittivity that comprises both permittivity and conductivity; $\mu_r$ is the relative permeability; $k_0$ and $Z_0$ are the free-space wave number and impedance, respectively; $\mathbf{J}$ is the current source; $V$ is the computational domain that encloses the circuit. The stack-growth direction is defined as $z$ and used throughout this paper.

Fig. 1. 3-D circuit problems. (a) RF CMOS. (b) Global on-chip interconnects. (c) RF IC.

To solve (1), we formulate a numerical algorithm to obtain the $\mathbf{E}$ fields or $\mathbf{H}$ fields inside the computational domain at each discretized point, from which the design parameters of interest are obtained. Due to the computational complexity of on-chip circuit problems as stated in Section I, the resultant numerical system is generally extremely big even for a small circuitry. This prevents the direct use of existing computational electromagnetic techniques in guiding chip-level high-frequency IC design. We propose to tackle this problem by developing a layered finite-element method.

III. LAYERED FINITE ELEMENT SCHEME

In accordance with the variational principle [14], the solution to the boundary value problem defined by (1) and its boundary conditions can be obtained by seeking the stationary point of the functional

(2)

The functional (2) involves the truncation boundary, which is the outermost region in the computational domain, and an operator associated with the absorbing boundary condition placed on the truncation boundary. If the first-order absorbing boundary condition is used, (2) can be written as

(3)
For the boundaries that are shorted to ground, tangential $\mathbf{E}$ is explicitly enforced to be zero therein. Next, we perform discretization. The discretization is conducted for both dielectric regions and conducting regions. Discretizing into conductors allows for an accurate modeling of conductor loss. Triangular prism elements are used to discretize the computational domain. These elements are very suitable for the discretization of on-chip structures, which are multilayered structures. They extrude along the layer-growth direction while capturing the irregular geometry in the transverse cross section. This allows for the modeling of irregular on-chip circuits such as trapezoidal- and spiral-type interconnects. In this method, the layer-growth direction can be chosen the same as the natural layer-growth direction, which is the stack-growth direction. It can also be chosen from other directions to minimize the number of unknowns in the transverse cross section that is perpendicular to the layer-growth direction. For example, a Manhattan-type integrated circuit structure is layered by looking from any of the $x$, $y$, and $z$ directions. Therefore, the layer-growth direction can be chosen from any of the $x$, $y$, and $z$ directions to render a transverse cross section that has the minimal problem size.

In each prism element, the electric field is expanded into prism vector basis functions [15]

(4)

The superscript $e$ denotes element. In each prism element, there are nine vector bases as shown in Fig. 2. These functions can be written as

(5)

Fig. 2. Illustration of prism vector basis functions.

In (5), the area coordinates (also known as node basis functions [14, p. 80]) are combined with a normalized coordinate along the layer-growth direction that is 0 at the bottom plane and 1 at the upper plane. Clearly, as shown in Fig. 2, six of the basis functions reside on the surfaces of each layer, and hence their associated unknowns are called surface unknowns throughout this paper, while the unknowns associated with the remaining three basis functions are named volume unknowns. From (5), one can also see clearly that the surface basis functions are formed by multiplying edge basis functions [14, pp. 234-237] by scalar functions of this normalized coordinate.

Substituting (4) into (3), and taking the partial derivative of (3) with respect to the unknown coefficients yields the following matrix equation:

(6)

in which the quantities are assembled from their elemental counterparts

(7)

In (7), the inner product is defined as

(8)

Fig. 3. Unknown ordering scheme.

By ordering the unknowns layer by layer as shown in Fig. 3, we generate a banded system matrix. Although the matrix is banded, its solution can be highly computationally expensive when the number of unknowns is large. A direct solution generally requires a large amount of memory; an iterative solution can converge slowly, and is inefficient in the presence of multiple right-hand sides. To solve this problem, we first reduce the system matrix to one that only involves 2-D surface unknowns in each layer, the detail of which is illustrated in the next section.

IV. REDUCTION OF THE 3-D LAYERED SYSTEM MATRIX TO A 2-D LAYERED ONE

To form a matrix system that only involves 2-D surface unknowns in each layer, we need to eliminate volume unknowns. This can be achieved by using the procedure we proposed in
[16]. For instance, the volumetric unknowns in Layer 1 can be eliminated by using the procedure illustrated in Fig. 4, which also involves the surface unknowns on the top and bottom surfaces of Layer 1.

Fig. 4. Procedure of eliminating volume unknowns.

In Fig. 4, six submatrices are formed, one between each pair of these three sets of unknowns. The relationship between the transformed matrices and the original matrices can be written as

(9)

Essentially, the volume unknowns are eliminated by using the relationship between the surface and volume unknowns. This relationship can also be used to recover volume unknowns from the surface unknowns.

From (9), apparently, in order to eliminate volume unknowns, one needs to fill in three matrices for each layer. In addition, one has to evaluate three further matrix expressions for each layer. The resultant computational cost can be very high when the number of layers is large as well as the number of unknowns in each layer. In this paper, a fast technique is proposed to eliminate the volume unknowns efficiently. This fast technique is achieved by deriving the following matrix properties.

a) One of these matrices is the same for all the layers. This is because it is assembled from the following elemental matrix:

(10)

In (10), the integration is over the region forming a triangular element, and the integrand involves the edge basis function [14, pp. 234-237] and the node basis function [14, p. 80]. Since the permeability does not change in realistic on-chip structures, this matrix remains the same for all the layers. Therefore, it only needs to be filled for one layer.

b) Two of the matrices are correlated

(11)

As a result, two of the three evaluations are eliminated and only one needs to be performed.

c) One matrix is equal to another in each layer

(12)

in which the layer thickness appears. Therefore, based on b) and c), the following equality holds true:

(13)

d) One matrix is linearly proportional to the layer thickness. In fact, it is assembled from the following elemental matrix:

(14)

Therefore, this matrix only needs to be formed and inverted for a layer of unit thickness. Others can be obtained by scaling accordingly.

e) For interconnect structures, this matrix only needs to be formed for each unique structure seed.

The concept of structure seeds in 3-D interconnect structures was first introduced in [2]. A structure seed has a unique cross section. For a 3-D bus structure of orthogonal layers, the number of structure seeds is small. For instance, for an interconnect involving seven metal layers, this number is only 8 irrespective of the number of wires. If we choose the layer-growth direction to be either $x$ or $y$ [please refer to Fig. 1(b)], then each layer features the same dielectric configuration. Then, from (14), one can see clearly that the matrix of unit thickness is different only when the conductor configuration is different. Hence, it only needs to be formed and inverted for each structure seed of unit thickness. The inverse of this matrix in each layer can then be readily obtained by linearly scaling the structure-seed-based inverse matrix with the layer thickness.

As an immediate result of the aforementioned factors, the computational cost of reducing the 3-D system matrix to a 2-D layered one only involves one sparse solve for each structure seed. Since the matrices involved are extremely sparse, and generally there are only a few structure seeds for on-chip interconnect structures, the reduction can be performed very efficiently.

V. REDUCTION OF THE 2-D LAYERED SYSTEM MATRIX TO A SINGLE-LAYER ONE

With all the volume unknowns eliminated, we obtain a system matrix that only involves 2-D surface unknowns in each layer. If the number of layers is only a few, we can stop there and solve the reduced system matrix as a whole without further reduction. However, in reality, we often encounter a large number of layers. For example, in a realistic on-chip interconnect structure, one can encounter a large number of layers by segmenting along either the $x$ or $y$ direction. Therefore, further reduction is often
needed. Hence, we continue to reduce the dimension of the system matrix to the size that one can handle with available computational resources. For instance, if one is only able to handle ten layers, we reduce the system to involve only ten layers; if one can only handle one layer, we reduce the system matrix all the way down to the one that only involves single-layer unknowns.

The reduction process is conducted by matrix cascading. Fig. 5 illustrates a matrix cascading process involving the top surface unknowns of layer 1, layer 2, and layer 3. Cascading layer 1 with layer 2 is equivalent to eliminating the unknowns on their shared interface. The relationship between the submatrices in the reduced matrix and the original ones can be written as

(15)

Fig. 5. Matrix cascading.

The reduction in (15) can be achieved efficiently by using symmetric backward Gaussian elimination [17]. To be specific, the unknowns to be eliminated are first reordered to the bottom. They are then eliminated one by one by symmetric backward Gaussian elimination. Since the cascading is performed on the field-based matrix in each layer, the tangential field continuity is guaranteed at each interface. Therefore, different from circuit-port based cascading, the cascading procedure proposed here is rigorous.

Now, assuming one is interested in layer $i$, the matrices of the layers before the $i$th layer can be cascaded together to form one matrix. Similarly, the matrices of the layers after the $i$th layer can be cascaded together to form another. If one is equipped with sufficient resources to solve three layers, one can stop here and solve the reduced matrix system shown in Fig. 6 to obtain the solution. If not, one can continue to cascade the three layers together to form one that only involves single-layer unknowns. The multiple-layer matrix cascading can be implemented either serially or in parallel.

Fig. 6. Reduction of the 2-D layered system to a single-layer one.

Serial Implementation: First, we cascade layer 1 with layer 2 using the procedure shown in Fig. 5. This yields a square matrix relating the unknowns on the two outer surfaces of the pair; the interface unknowns between the two layers are eliminated in this process. We then cascade this matrix with the submatrix in layer 3 to form a new square matrix, eliminating the next set of interface unknowns. We continue this procedure until we reach the $i$th layer. The resultant matrix relates the unknowns on the outer boundary to the top surface unknowns of the last layer cascaded. Similarly, we cascade the layers on the other side of layer $i$ to obtain the second matrix. Clearly the computational cost is the number of cascading operations multiplied by the time cost of each cascading. The former is equal to the number of layers minus one, while the latter is the cost of a single-layer unknown elimination. Therefore, if one is capable of solving a one-layer matrix, one is capable of solving all the layers. However, due to the large number of layers, this implementation could be slow despite the high capacity it can achieve. A fast version can be obtained by parallel implementation.

Parallel Implementation: Assuming there are $L$ layers, we divide them into $L/2$ groups. Each group consists of two adjacent layers, and no two groups overlap. We then assign these $L/2$ groups to $L/2$ machines, each of which cascades two matrices. Thus we obtain $L/2$ matrices simultaneously. We then subdivide these matrices into $L/4$ groups, each of which again contains a pair of matrices without overlapping. We then assign them to $L/4$ machines, which return $L/4$ cascaded matrices simultaneously. We repeat this procedure until we have only two matrices left. Clearly, with the above parallel implementation, the computational cost is only $\log_2 L$ multiplied by a single-layer unknown elimination cost. This performance can be further improved by exploring more advanced parallel algorithms.

VI. AN ALGORITHM OF LOGARITHMIC COMPLEXITY FOR FURTHER SPEED-UP

If the on-chip structure is periodic with $L$ layers, logarithmic complexity can be achieved even with a single machine. First, we use the approaches in Sections IV and V to form the matrix for one period. We then cascade two copies of this matrix to form the matrix for two periods. We then cascade two copies of the two-period matrix to form the matrix for four periods. We continue this procedure until we reach the length of the structure. Clearly, for a structure which has $L$ layers, by using the aforementioned approach, one only needs to cascade on the order of $\log_2 L$ times to reach the required length. This algorithm of logarithmic complexity drastically speeds up the analysis of periodic on-chip structures. This
algorithm is also used to handle multiple layers resulting from the discretization of thick silicon and conductors. For instance, the thick silicon substrate often constitutes a numerical challenge to a partial-differential-equation based solver because of the large number of volume unknowns resulting from its discretization. With this technique, it does not constitute a challenge any more because one can account for its contribution to the rest of the system in a logarithmic number of operations and with single-layer storage. This algorithm of logarithmic complexity, to a certain extent, resembles the one used for treating deep cavities with a constant cross section [18].

VII. EXCITATION AND EXTRACTION

Here, we give a simple example to illustrate the excitation and extraction scheme. Consider a wire sitting above a ground plane as shown in Fig. 7. If the layer-growth direction is chosen to be $x$, we use a current source oriented in either the $y$ or $z$ direction; if the layer-growth direction is chosen to be $y$, we use a current source oriented in either the $x$ or $z$ direction; if the layer-growth direction is chosen to be $z$, we use a current source located in the $x$-$y$ plane. The purpose is to associate field unknowns involved in the excitation and extraction to those remaining in the final matrix system. For all the other unknowns, their corresponding right-hand sides are zero. Therefore, the matrix reduction process illustrated in Sections IV and V does not involve the modification of the right-hand side at all, which is efficient. Multiple columns of current filaments can be used from the wire to the ground. Multiple rows can also be used. But they are all placed in the layer or layers of interest. The right-hand sides corresponding to the field unknowns associated with the current filaments become

(16)

in which $I$ is the current and $l$ is the length of the current filament. When we inject current into one port, we leave other ports open. We then sample the voltage generated at each port. The voltage can be evaluated by performing a line integral of the electric field from the port to the ground. Thus, we obtain one column of the impedance matrix $\mathbf{Z}$. We then inject current into another port. We can obtain another column of the $\mathbf{Z}$ matrix. We continue this procedure by injecting current into each port in turn. Finally, we obtain the entire $\mathbf{Z}$ matrix. From the $\mathbf{Z}$ matrix, one can easily obtain both $Y$- and $S$-parameter matrices.

Fig. 7. Excitation and Extraction.

It should be noted that, different from the general RLC-based interconnect modeling process in which the extraction stage is separated from the simulation stage, here one can obtain both extraction and simulation results within one run. One can either obtain the S-parameter model of the interconnects as aforementioned, or load the interconnect with current sources, and obtain voltages directly from the proposed method. When the ground plane is placed far away from the structure of interest, instead of using a probe that goes all the way from the ground to the structure, a short probe that does not start from the ground is used. In doing so, the port current becomes unknown. It is extracted from the port voltages sampled at multiple points. With port currents and voltages known, the Z-, Y-, and S-parameters can be extracted.

VIII. PERFORMANCE ANALYSIS

The memory usage of the proposed method is modest compared to the conventional finite element method. Maximally, it only requires the storage of a single-layer matrix formed by surface unknowns irrespective of the original problem size. Therefore the proposed method possesses a high capacity to deal with very large scale electromagnetic problems.

The CPU run time can be analyzed for step I and step II in the context of both serial implementation and parallel implementation.

Step I (Reduction of the 3-D Layered System Matrix to a 2-D Layered One): The CPU cost for eliminating all the volume unknowns has a lower bound set by the number of layers and the number of volume unknowns per layer. The proposed method, however, achieves a cost that depends on the number of structure seeds instead of the number of layers, and the number of structure seeds is generally much less than the number of layers. If implemented in parallel, since the elimination of the volume unknowns in different layers is completely decoupled, each of them can be assigned to a single processor, and no communication is needed between the processors. Therefore, the CPU cost is just the cost of a single-layer sparse matrix inversion.

Step II (Reduction of the 2-D Layered System Matrix to a Single-Layer One): Given the number of layers and the total number of surface unknowns, the CPU cost of the serial implementation can be estimated as

(17)

which is expressed in terms of the cost of a single-layer surface-unknown elimination. This cost function depends on the computational complexity of the matrix solver used to solve the single-layer matrix. For instance, if an advanced sparse matrix solver is used, it can be a linear function of the number of single-layer unknowns. Different expressions apply if an iterative solver such as the conjugate gradient method is used, or if a direct solver such as the LU decomposition is used.
In contrast, the CPU cost of a conventional method is
(18)
assuming it uses the same matrix solver as used in the proposed
method for a fair comparison. Therefore, it can be
seen clearly from (17) and (18) that the speed up of step
II is determined by the speed of the conventional method.
The slower the conventional method is, the faster the pro-
posed method is. For instance, if the conventional method
uses LU decomposition to solve a matrix, the speed-up of
step II is . If the conventional method uses an advanced
sparse matrix solver that scales linearly with the number
of unknowns, then the serial implementation of step II
wouldn't save any CPU time. However, sparse solvers generally
cannot scale linearly, especially when the number
of unknowns is large. Therefore, one can still gain better
efficiency by implementing step II serially. Furthermore,
when multiple right-hand sides exist, due to the reduced
size of the final system matrix, the forward and backward
substitution time is much less compared to a conventional
direct solver. In addition, one benefits from the modest
memory usage of the proposed method. The CPU cost of
the parallel implementation of step II is $\log_2 L$ multiplied
by a single-layer elimination cost
(19)
This performance can be further improved by exploring more
advanced parallel algorithms. Therefore, compared to serial
implementation, the parallel implementation drastically speeds up
the analysis. It should be noted that the numerical procedure
of the proposed method facilitates the parallel implementation
because the subproblems are decoupled, so no communication
is needed between processors.
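The cost estimates that this section states in words can be collected in one place. The notation below is assumed for illustration, since the original symbols are not reproduced here: $L$ is the number of layers, $N$ the total number of surface unknowns (taken as equally divided among layers), and $f(n)$ the cost of eliminating $n$ unknowns with the chosen solver. The three lines paraphrase the descriptions attached to (17), (19), and (18), respectively, and are a sketch rather than the paper's exact expressions.

```latex
\begin{align*}
T_{\text{serial}}       &\approx (L-1)\, f(N/L), \\
T_{\text{parallel}}     &\approx \lceil \log_2 L \rceil \, f(N/L), \\
T_{\text{conventional}} &\approx f(N).
\end{align*}
% Worked example, assuming a dense direct solver with f(n) proportional to n^3:
\[
\frac{T_{\text{conventional}}}{T_{\text{serial}}}
\approx \frac{N^{3}}{(L-1)\,(N/L)^{3}}
= \frac{L^{3}}{L-1} \approx L^{2}.
\]
```

Under the same assumptions, a solver with linear cost $f(n) \propto n$ gives $T_{\text{serial}} \approx N (L-1)/L$, which is essentially the conventional cost; this agrees with the remark above that the serial implementation saves no CPU time in that case.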
IX. EXAMPLES
To validate the proposed method, we simulated a set of inter-
connect structures that were fabricated on a test chip using con-
ventional Si processing techniques [1]. High resolution cross-
sectional scanning electron microscopy and optical microscopy
were used to measure the relevant dimensions of the fabricated
structures. Parasitic signals were removed from the measured
S-parameters using a de-embedding approach [1].
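Section VII notes that S-parameters follow from the extracted impedance matrix. A minimal sketch of that conversion is given below, assuming equal, real reference impedances at all ports; the function name `z_to_s` and the 50-ohm default are illustrative choices, not taken from the paper.

```python
import numpy as np


def z_to_s(z: np.ndarray, z0: float = 50.0) -> np.ndarray:
    """Convert an N-port impedance matrix to S-parameters (equal real reference z0)."""
    z = np.asarray(z, dtype=complex)
    identity = np.eye(z.shape[0])
    # S = (Z - z0*I) (Z + z0*I)^{-1}; solve the transposed system
    # instead of forming the inverse explicitly.
    return np.linalg.solve((z + z0 * identity).T, (z - z0 * identity).T).T


# Example: a 50-ohm shunt resistor between two ports, seen as a two-port.
z_shunt = np.array([[50.0, 50.0], [50.0, 50.0]])
s_shunt = z_to_s(z_shunt)
print(np.round(s_shunt.real, 4))
```

For this shunt element the reflection and transmission coefficients are -1/3 and 2/3, which offers a quick sanity check on the sign conventions.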
The first test structure is 300 µm wide. It involves a 10-µm-wide strip in the metal 2 (M2) layer, one ground plane in the metal 1 (M1) layer, and one ground plane in the metal 3 (M3) layer. The distances from this strip to the M2 returns on the left- and right-hand sides are both 50 µm. The strip is 2000 µm long. The discretization is done in the transverse cross-sectional plane and extruded along the length of the strip. The 2000 µm length is subdivided into 40 layers. The 2-D surface matrix is only formed for one layer. The algorithm of logarithmic complexity stated in Section VI is then used to cascade all the layers to form one matrix that only involves the surface unknowns on the near-end and far-end planes. A vertical current filament is placed between the ground and the strip at the near end and the far end to excite the structure and extract the circuit parameters. The back plane in the M1 layer and the top plane in the M3 layer are both discretized to model the conductor loss accurately. The simulated S-parameters are shown in Fig. 8 in comparison with the measured data. Excellent agreement is observed from dc to 50 GHz.

Fig. 8. S-parameters of an on-chip interconnect structure of length 2000 µm. (a) S11 magnitude. (b) S11 phase. (c) S12 magnitude. (d) S12 phase.
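The cascading of Section V and the doubling idea of Section VI can be sketched on small dense matrices. This is an illustration under stated assumptions, not the authors' implementation: each layer is represented by a block matrix over its (bottom, top) surface unknowns, a random symmetric positive definite matrix stands in for the complex symmetric finite element matrix, and the interface elimination is written as a Schur complement rather than the symmetric backward Gaussian elimination of [17]. Because 40 is not a power of two, the sketch combines doubled blocks according to the binary digits of the layer count, which is also an assumption.

```python
import numpy as np


def cascade(lower: np.ndarray, upper: np.ndarray) -> np.ndarray:
    """Merge two stacked layers and eliminate their shared interface unknowns."""
    n = lower.shape[0] // 2
    a11, a12, a21, a22 = lower[:n, :n], lower[:n, n:], lower[n:, :n], lower[n:, n:]
    b11, b12, b21, b22 = upper[:n, :n], upper[:n, n:], upper[n:, :n], upper[n:, n:]
    # Interface rows (zero right-hand side): a21*u_bot + (a22 + b11)*u_mid + b12*u_top = 0
    x = np.linalg.solve(a22 + b11, np.hstack([a21, b12]))
    x_bot, x_top = x[:, :n], x[:, n:]
    return np.block([
        [a11 - a12 @ x_bot, -a12 @ x_top],
        [-b21 @ x_bot, b22 - b21 @ x_top],
    ])


def cascade_serial(layer: np.ndarray, count: int) -> np.ndarray:
    """Cascade `count` identical layers one at a time (count - 1 cascades)."""
    result = layer
    for _ in range(count - 1):
        result = cascade(result, layer)
    return result


def cascade_periodic(layer: np.ndarray, count: int):
    """Cascade `count` identical layers by repeated doubling; also count the cascades."""
    result, block, steps = None, layer, 0
    while count > 0:
        if count & 1:
            if result is None:
                result = block
            else:
                result = cascade(result, block)
                steps += 1
        count >>= 1
        if count > 0:
            block = cascade(block, block)  # 1 -> 2 -> 4 -> ... layers
            steps += 1
    return result, steps


rng = np.random.default_rng(0)
g = rng.standard_normal((6, 6))
layer = g @ g.T + 6.0 * np.eye(6)  # stand-in single-layer matrix, 3 unknowns per surface

fast, steps = cascade_periodic(layer, 40)
slow = cascade_serial(layer, 40)
print(steps, np.allclose(fast, slow))
```

For 40 identical layers the doubling route needs 6 cascades where the serial route needs 39, and both produce the same near-end/far-end matrix up to round-off. The same Schur-complement step is the generic operation behind the volume-unknown elimination of Section IV.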
The second structure is a crosstalk structure. Two wires are placed in the center of the M2 layer. One is 1.1 µm wide, and the other is 2.07 µm wide. The spacing between these two wires is 2.0 µm. The distance to the M2 returns at the left- and right-hand sides is 10.1 µm. The solid planes in the M1 and M3 layers in the first test structure are each replaced by 146 parallel returns. These returns are 1.05 µm wide and 1 µm apart. They are shorted to the ground at the near and the far end. The structure is 2000 µm long. Like the first structure, only one structure seed is involved. Hence, the 2-D surface matrix is formed for one layer and cascaded by using the algorithm of logarithmic complexity to form the final system matrix that only involves surface unknowns at the near end and far end. In Fig. 9(b), we compare measured and simulated crosstalk. Clearly the agreement is good in both magnitude and phase. The current distribution at 2 GHz at the near end of the structure is depicted in Fig. 9(a).
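The pairwise grouping described under Parallel Implementation in Section V can be sketched as a round-by-round reduction. In this illustration the rounds run sequentially and a tuple of layer indices stands in for each matrix; `join_ranges` is a hypothetical placeholder for the actual matrix cascading. Two choices are assumptions rather than statements from the paper: an unpaired group simply waits for the next round, and the reduction continues down to a single group instead of stopping at two.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def pairwise_reduce(items: List[T], combine: Callable[[T, T], T]) -> Tuple[T, int]:
    """Combine adjacent pairs round by round; return the result and the round count."""
    if not items:
        raise ValueError("need at least one item")
    rounds = 0
    while len(items) > 1:
        # Every pair in a round is independent, so each could run on its own machine.
        merged = [combine(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]
        if len(items) % 2 == 1:  # assumption: an unpaired item waits for the next round
            merged.append(items[-1])
        items = merged
        rounds += 1
    return items[0], rounds


def join_ranges(lower: Tuple[int, int], upper: Tuple[int, int]) -> Tuple[int, int]:
    """Stand-in for matrix cascading: merge two adjacent layer ranges."""
    assert lower[1] + 1 == upper[0], "only adjacent groups may be cascaded"
    return (lower[0], upper[1])


layers = [(k, k) for k in range(1, 41)]  # 40 layers, as in the first test structure
span, rounds = pairwise_reduce(layers, join_ranges)
print(span, rounds)
```

With 40 layers the group count shrinks as 40, 20, 10, 5, 3, 2, 1, so six rounds suffice, in line with the logarithmic estimate of Section VIII.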
The third structure is again a crosstalk structure. However, the M1
and M3 metal layers are populated by orthogonal returns. These returns
are each 1 µm wide and 1 µm apart. The length of the structure is
2000 µm. The crosstalk is measured between two wires embedded in the M2
layer. One is 2.1 µm wide; the other is 1.1 µm wide. The spacing
between these two wires is 1.95 µm. The distance to the M2 returns of
both wires is 10.3 µm. The 2-D surface matrices are formed for two
layers. One has the orthogonal returns present, while the other does
not. These two matrices are then cascaded to form the matrix of a
period. The one-period matrix is then cascaded to form the matrix that
covers the 2000-µm length and only involves near-end and far-end
surface unknowns. The current filaments are placed between the ground
and the M2 wires at the near and far end to extract circuit parameters.
Fig. 10 compares the simulated crosstalk with the measured data, which
reveals an excellent agreement. If one uses a standard finite element
method to solve the problem, due to the densely populated orthogonal
returns in the M1 and M3 layers, one has to solve over 3.043 million
unknowns. In contrast, the proposed method solves the same problem
rigorously using only 2270 unknowns.

Fig. 9. Simulation of an on-chip interconnect structure with parallel
returns. (a) Current distribution at 2 GHz. (b) Crosstalk.

Fig. 10. Crosstalk of a 3-D interconnect structure with orthogonal
returns.

In the aforementioned examples, the layer-growth direction is chosen
along the length of the structure. In other words, the structure is
segmented along its length. The layer-growth direction can also be
chosen the same as the stack-growth direction. This is useful to
accommodate irregular geometries in the horizontal plane. We simulated
a spiral inductor to demonstrate this capability. The geometry of the
spiral inductor is shown in Fig. 11(a). Its outer diameter (OD) is
1000 µm. The wire is 50 µm wide and 15 µm thick. The port separation,
PS, is 50 µm. The inductor is backed by two package planes. The
backplane is 15 µm thick. Fig. 11(b) shows the Y-parameter and Q value
simulated by the proposed method in comparison with those simulated by
the commercial electromagnetic simulator HFSS. Excellent agreement can
be observed. The Q value becomes negative because the inductor in fact
becomes a capacitor at high frequencies.

Fig. 11. Simulation of an RF spiral inductor. (a) Geometry.
(b) Y-parameters and Q value.

Finally, we simulated a large-scale on-chip Pentium 4 M2-M8 power grid
structure as shown in Fig. 12(a). The grid is 10 000 µm long and 200 µm
wide. Before simulating this example, we tested the accuracy of this
method in power grid analysis by comparing its IR drop results at dc
against a resistance (R)-based IR drop analysis. Since R-based analysis
generates a large number of resistances that are beyond the capability
of a conventional SPICE-type circuit simulator, a 200 µm × 400 µm block
was sampled from the large-scale power grid for the purpose of
validation. Eight C4 bumps land on the M8 wide metals, and six pairs of
current sources are attached at the bottom metal layer M4. Fig. 12(b)
shows the VSS voltage droop simulated by the proposed method in
comparison with that obtained by R-based analysis. Excellent agreement
can be observed. We then performed the dynamic analysis of the entire
structure. Eight ports were sampled on the grid as shown in Fig. 12(a).
Fig. 12(c) shows the calculated S-parameters. Despite the large number
of unknowns, the peak memory usage is only 738 Mbytes for this example.

Fig. 12. Simulation of an on-die power grid. (a) Geometry. (b) VSS
voltage droop at dc. (c) S-parameters.

X. CONCLUSION

In this paper, we proposed a layered finite element method for
high-frequency modeling of large-scale three-dimensional on-chip
circuit structures. This method is capable of solving an
orders-of-magnitude smaller system to rigorously obtain the solution of
the original big problem. The system matrix of the original 3-D problem
is first reduced to that of 2-D layers. For on-chip interconnect
structures, the computational cost of this reduction is modest, only
involving the solution of a few 2-D structure seeds. The matrix system
of 2-D layers is then further
reduced to that of a single layer. This reduction only involves
single-layer unknowns irrespective of the original problem size. As a
result, the proposed method possesses a high capacity to solve
large-scale interconnect problems. Equally important, the entire
procedure is numerically rigorous without making any theoretical
approximation. In addition, it solves Maxwell's coupled E-H equations,
and hence features uncompromised electromagnetic accuracy. Its accuracy
and capacity are demonstrated by numerical and experimental results.

ACKNOWLEDGMENT

The authors would like to thank M. J. Kobrinsky at Intel Corporation
for providing measured data, and J. He (Intel) and R. Chao (then Intel,
now Taiwan National Chiao Tung University) for providing HFSS data.

REFERENCES

[1] M. J. Kobrinsky, S. Chakravarty, D. Jiao, M. C. Harmes, S. List, and M. Mazumder, "Experimental validation of crosstalk simulations for on-chip interconnects using S-parameters," IEEE Trans. Advanced Packaging, vol. 28, no. 1, pp. 57-62, Feb. 2005, [see also IEEE Trans. Compon., Packaging and Manufacturing Technol., Part B: Advanced Packaging].
[2] D. Jiao, M. Mazumder, S. Chakravarty, C. Dai, M. J. Kobrinsky, M. C. Harmes, and S. List, "A novel technique for full-wave modeling of large-scale three-dimensional high-speed on/off-chip interconnect structures," in Proc. Int. Conf. Simulation of Semiconductor Processes and Devices, Sep. 3-5, 2003, pp. 39-42.
[3] D. Jiao, C. Dai, S.-W. Lee, T. R. Arabi, and G. Taylor, "Computational electromagnetics for high-frequency IC design," in Proc. IEEE Int. Symp. Antennas and Propagation (invited paper), 2004, pp. 3317-3320.
[4] D. Gope and V. Jandhyala, "PILOT: A fast algorithm for enhanced 3-D parasitic extraction efficiency," in Proc. IEEE 12th Topical Meeting on Electrical Performance of Electronic Packaging (EPEP), 2003, pp. 337-340.
[5] Z. H. Zhu, B. Song, and J. K. White, "Algorithms in FastImp: A fast and wideband impedance extraction program for complicated 3-D geometries," in Proc. 40th ACM/IEEE Design Automation Conf., 2003, pp. 712-717.
[6] A. C. Cangellaris, "Towards full-chip analysis with electromagnetic accuracy," presented at the IEEE 12th Topical Meeting on Electrical Performance of Electronic Packaging (EPEP), 2003.
[7] L. Daniel, A. Sangiovanni-Vincentelli, and J. White, "Using conduction modes basis functions for efficient electromagnetic analysis of on-chip and off-chip interconnects," in Proc. DAC, 2001, pp. 563-566.
[8] A. Rong, A. C. Cangellaris, and L. Dong, "Comprehensive broadband electromagnetic modeling of on-chip interconnects with a surface discretization-based generalized PEEC model," in Proc. IEEE 12th Topical Meeting on Electrical Performance of Electronic Packaging (EPEP), 2003, pp. 367-370.
[9] A. E. Yilmaz, J. M. Jin, and E. Michielssen, "A parallel FFT-accelerated transient field-circuit simulator," IEEE Trans. Microw. Theory Tech., vol. 53, pp. 2851-2865, Sep. 2005.
[10] S. Kapur and D. E. Long, "Large-scale full-wave simulation," in Proc. DAC, 2004, pp. 806-809.
[11] W. C. Chew, "Toward a more robust and accurate fast integral solver for microchip applications," in Proc. IEEE 12th Topical Meeting on Electrical Performance of Electronic Packaging (EPEP), 2003, p. 333.
[12] D. Lukashevich, A. C. Cangellaris, and P. Russer, "Broadband electromagnetic analysis of interconnects by means of TLM and Krylov model order reduction," in Proc. IEEE 14th Topical Meeting on Electrical Performance of Electronic Packaging, 2005, pp. 355-358.
[13] Z. G. Qian, J. Xiong, L. Sun, I. T. Chiang, W. C. Chew, L. J. Jiang, and Y. H. Chu, "Crosstalk analysis by fast computational algorithms," in Proc. IEEE 14th Topical Meeting on Electrical Performance of Electronic Packaging, 2005, pp. 367-370.
[14] J. M. Jin, The Finite Element Method in Electromagnetics, 1st ed. New York: Wiley, 1993.
[15] R. D. Graglia, D. R. Wilton, A. F. Peterson, and I. Gheorma, "Higher order interpolatory vector bases on prism elements," IEEE Trans. Antennas Propag., vol. 46, no. 3, pp. 442-450, Mar. 1998.
[16] D. Jiao, S. Chakravarty, C. Dai, and S. W. Lee, "Surface-based finite element method for large-scale 3-D circuit modeling," in Proc. 14th Topical Meeting on Electrical Performance of Electronic Packaging, Oct. 24-26, 2005, pp. 347-350.
[17] C. S. Desai and J. F. Abel, Introduction to the Finite Element Method. New York: Van Nostrand Reinhold, 1972.
[18] J. Liu and J. M. Jin, "A special higher-order finite element method for scattering by deep cavities," IEEE Trans. Antennas Propag., vol. 48, pp. 694-703, May 2000.

Dan Jiao (S'00-M'02-SM'06) received the Ph.D. degree in electrical
engineering from the University of Illinois at Urbana-Champaign, in
October 2001.
From 2001 to September 2005, she was a Senior CAD Engineer, Staff
Engineer, and Senior Staff Engineer in the Technology CAD Division at
Intel Corporation, Santa Clara, CA. In September 2005, she joined
Purdue University, West Lafayette, IN, as an Assistant Professor in the
School of Electrical and Computer Engineering. She has authored two
book chapters and over 60 papers in refereed journals and international
conferences. Her current research interests include high frequency
digital, analogue, mixed-signal, and RF IC design and analysis,
high-performance VLSI CAD, modeling of micro- and nano-scale circuits,
computational electromagnetics, applied electromagnetics, fast and
high-capacity numerical methods, fast time domain analysis, scattering
and antenna analysis, RF, microwave, and millimeter wave circuits,
wireless communication, and bio-electromagnetics.
Dr. Jiao received the 2006 Jack and Cathie Kozik Faculty Start-up
Award, which recognizes an outstanding new faculty member in Purdue
ECE. In 2004, she received the Best Paper Award from Intel's annual
corporate-wide technology conference (Design and Test Technology
Conference) for her work on generic broadband model of high-speed
circuits. In 2003, she won the Intel Logic Technology Development (LTD)
Divisional Achievement Award in recognition of her work on the
industry-leading BroadSpice modeling/simulation capability for
designing high-speed microprocessors, packages, and circuit boards. She
was also awarded the Intel Technology CAD Divisional Achievement Award
for the development of innovative full-wave solvers for high frequency
IC design. In 2002, she was awarded by Intel Components Research the
Intel Hero Award (Intel-wide she was the tenth recipient) for the
timely and accurate two- and three-dimensional full-wave simulations.
She also won the Intel LTD Team Quality Award for her outstanding
contribution to the development of the measurement capability and
simulation tools for high frequency on-chip cross-talk. She was the
winner of the 2000 Raj Mittra Outstanding Research Award given her by
the University of Illinois at Urbana-Champaign. She has served as a
reviewer for many IEEE journals and conferences.

Sourav Chakravarty received the B.Tech. degree in electronics and
telecommunication from the Regional Engineering College, Kurukshetra,
India, in 1992, the M.E. degree in telecommunications from Jadavpur
University, Calcutta, India, in 1997, and the Ph.D. degree in
electrical engineering from the Pennsylvania State University,
University Park, in 2001.
From 1992 to 1995, he was a Senior Antenna Design Engineer at Superline
Microwave Pvt. Ltd., in Bangalore, India. He worked as a Research
Assistant in the Electromagnetic Communication Laboratory, Pennsylvania
State University, from 1997 to 2001. He is currently a Staff CAD
Engineer at Intel Corporation, Hillsboro, OR. His research interests
include computational electromagnetics with emphasis on probabilistic
optimization techniques and the applications of MoM and FDTD techniques
to predict delay and crosstalk in interconnects.

Changhong Dai received the B.S. degree in physics from Beijing
University, Beijing, China, in 1985, and the M.S. and Ph.D. degrees in
materials sciences and engineering from Stanford University, Stanford,
CA, in 1992 and 1995, respectively.
He joined the Technology CAD Division of Intel Corporation, Santa
Clara, CA, in 1995, as a Senior CAD Engineer. Since 1995, he has been
an R&D Engineer or Manager for model and CAD tool development for
circuit analysis and physical design, in the areas of circuit
reliability, transistor modeling for circuit simulation, interconnect
modeling for parasitic extraction, static and full wave simulation of
interconnects and RF devices, and power and power delivery modeling for
IC product design. He is currently a Director with the Technology and
Manufacturing Group of Intel Corporation, with the responsibility of
directing the development of Core CAD Technologies that enables Intel
processing technology and product design. His current responsibility
covers the full spectrum of CAD tool and infrastructure development
with a focus of bridging the processing technology and chip design.
