This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

A Low Overhead High Test Compression
Technique Using Pattern Clustering
With n-Detection Test Support
Seongmoon Wang, Wenlong Wei, and Zhanglei Wang

Abstract—This paper presents a test data compression scheme that can be used to further improve compressions achieved by linear-feedback shift register (LFSR) reseeding. The proposed compression technique can be implemented with very low hardware overhead. The test data to be stored in the automatic test equipment (ATE) memory are much smaller than those for previously published schemes, and the number of test patterns that need to be generated is smaller than in other weighted random pattern testing schemes. The proposed technique can be extended to generate test patterns that achieve high n-detection fault coverage. This technique compresses a regular 1-detection test cube set instead of an n-detection test cube set, which is typically n times larger. Hence, the volume of compressed test data for n-detection test is comparable to that for 1-detection test. Experimental results on a large industry design show that over 1600X compression is achievable by the proposed scheme with a test sequence length comparable to that of highly compacted deterministic patterns. Experimental results on n-detection test show that test patterns generated by the proposed decompressor can achieve very high 5-detection stuck-at fault coverage and high compression for large benchmark circuits.

Index Terms—Linear-feedback shift register (LFSR) reseeding, linear decompression, n-detection testing, test data compression.

I. INTRODUCTION

ASCERTAINING high quality of test for complex chips requires huge test data. Hence, test data volumes for complex chips often exceed the memory capacity of automatic test equipment (ATE). A number of different techniques to compress test data have been developed. Several test data compression techniques based on linear-feedback shift register (LFSR) reseeding (also called linear decompression) [1], [2] have been published since Könemann showed that it can efficiently compress test patterns [3]. Several commercial tools based on LFSR reseeding are also available [2], [4], [5]. LFSR reseeding techniques take advantage of the fact that typical scan test patterns have very few specified bits. Specified bits are those bits that are assigned binary values during test pattern generation. All other bits are not specified, i.e., don't cares.

The International Technology Roadmap for Semiconductors (ITRS) 2005 [6] predicts that about 1000X compression will be required around 2013. Achieving 1000X compression only by LFSR reseeding is very difficult. Weighted random pattern testing has been developed as a technique to improve fault coverage in random pattern-based built-in self-test (BIST) [7], [8]. Recently, the application of weighted random pattern testing techniques to test data compression was presented in [9]–[12]. Unlike other weighted random pattern testing schemes [9]–[11], the technique proposed in [12], which is based on 3-weight weighted random BIST (or hybrid BIST [13]–[15]), requires no on-chip memory to store weight sets. In contrast to conventional weighted random pattern BIST, where various weights, e.g., 0, 0.25, 0.5, 0.75, 1.0, can be assigned to outputs of test pattern generators (TPGs), in 3-weight weighted random BIST only three weights, 0, 0.5, and 1, are assigned. Due to its simplicity, it can be implemented with low hardware overhead.

The technique proposed in [12] enhances compressions achieved by simple LFSR reseeding with a 3-weight weighted random BIST technique. However, since this technique requires two LFSRs, each of which should be loaded with a separate seed for each weight set, the additional compression achieved by this technique is limited. The decompressor proposed in [16], which is a preliminary version of this paper, needs only one seed for each weight set to achieve even higher compression. The proposed method requires no special automatic test pattern generator (ATPG) that is customized for the proposed compression scheme, and hence can be used to compress test patterns generated by any ATPG tool. The proposed technique includes efficient algorithms that compute the minimum number of weight sets (generators). In addition, two variations of the decompressor architecture are proposed to satisfy different objectives. This paper extends our previous paper [16] with a technique to uncompact densely specified test cubes to balance numbers of specified bits in weight sets (see Section IV-A), and a procedure to compute weight sets for designs with multiple scan chains (see Section VI-B).

Achieving high single stuck-at fault coverage does not guarantee high quality of testing, especially for chips fabricated with nanometer processes. n-detection testing has been studied by several researchers [17]–[20] as an effective test technique to improve unmodeled defect coverage and reduce defective parts since it was proposed by Ma et al. [21]. An n-detection test set is developed to detect each fault by n different test patterns in the test set. Generating n-detection test sets by ATPG techniques has several serious difficulties that hinder wide adoption of this test method by the industry. First, the size of an n-detection test set grows approximately linearly with n [19]. Volumes of even traditional (1-detection) test data for today's complex chips often exceed the memory capacity of ATEs.

Manuscript received December 09, 2008; revised June 14, 2009.
S. Wang and W. Wei are with NEC Laboratories America, Princeton, NJ 08540 USA (e-mail: swang@nec-labs.com; wwei@nec-labs.com).
Z. Wang is with Cisco Systems, Inc., San Jose, CA 95134 USA (e-mail: zhawang@cisco.com).
Digital Object Identifier 10.1109/TVLSI.2009.2026420

1063-8210/$26.00 © 2009 IEEE

Since n times more test patterns are applied, the test application time will also increase about n times. Test data volume and test application time are the two major factors that determine the overall test cost. Most ATPG-based techniques [17], [20] generate an n-detection test set by generating n different test sets, and then eliminating unnecessary test patterns from the combined set. Hence, the total test generation time can increase significantly.

Although one of the main objectives of test compression is improvement of test quality, to the best of our knowledge, there are very few published papers that directly address both high test data compression and high n-detection coverage. A straightforward approach to reduce the large volume of an n-detection test set, which was possibly generated by an ATPG-based n-detection test generation technique, is to apply an existing test compression technique on the n-detection test set. This approach will reduce the volume of the n-detection test set. However, since most test compression techniques based on LFSR reseeding [2], [3] and broadcast scan [22], [23] compress each test pattern separately, i.e., n test patterns are converted into n different compressed data, the volume of compressed data for an n-detection test set will also be n times larger than that of compressed data for the 1-detection test set if the same compression method is used. Pomeranz and Reddy [19] proposed a test compression technique for n-detection test. Their decompressor is basically a large decoder. Area overhead for decompressors of large designs, which require a large number of test patterns for high fault coverage, will be significant. Further, if test patterns are regenerated due to last-minute design changes, then the decompressor should also be redesigned and resynthesized. A technique to enhance the probability of detecting unmodeled defects by utilizing don't cares existing in test patterns is proposed by Tang et al. [24]. This technique can be used in conjunction with a test data compression scheme.

With little modification, the proposed test data compression technique can generate test patterns that achieve high n-detection coverage. This part of the study is also presented in our prior paper [25]. The proposed technique compresses a 1-detection test set rather than an n-detection test set to generate n-detection test patterns. Hence, even though the proposed technique can achieve very high n-detection coverage, the volume of compressed test data for n-detection test is comparable to that of compressed test data for 1-detection test. Unlike [19], the decompressor need not be redesigned for design changes unless there are drastic design changes.

The rest of this paper is organized as follows. Section II illustrates the generator and the conceptual decompressor for the proposed compression method. In Section III, the architecture of the proposed decompressor is described. In Section IV, the algorithm for computing generators is described. Section V describes two variations of the proposed decompressor. Section VI extends the proposed method to multiple scan chain designs. The application of the proposed method to n-detection testing is presented in Section VII. Experimental results are shown in Section VIII, and Section IX has conclusions.

II. PRELIMINARIES

In this paper, compressions achieved by traditional LFSR reseeding are enhanced by compressing multiple test cubes into one seed. This is achieved by merging multiple test cubes into a weight set as in [13], and then compressing the weight set by LFSR reseeding. Merged test cubes are recovered by a 3-weight weighted random BIST during test application.

Fig. 1. Test cube set and generator.

A test cube is a test pattern that has unspecified bits. A generator for a circuit with n inputs, which is derived from a set of test cubes, is represented by an n-bit tuple. If an input is always assigned X or 1 (0) in every test cube in the test cube set and assigned 1 (0) in at least one test cube, then the input is assigned 1 (0) in the corresponding generator. If the input is never assigned a binary value 1 or 0 in any test cube in the test cube set, then it is assigned X in the corresponding generator. Finally, if the input is assigned a 1 (0) in one test cube and a 0 (1) in another test cube in the test cube set, then the one test cube is said to conflict with the other test cube at that input, and the input is assigned the symbol C in the generator. Inputs that are assigned Cs in a generator are called conflicting inputs of the generator.

Example 1: In Fig. 1, a deterministic test cube set of three test cubes is merged into a generator G1. In G1, two of the inputs are assigned only X or 0 in every test cube. Hence, weight 0 is given to these inputs in G1. Note that even if we fix these inputs to 0s, we can still detect all faults that are detected by the three test cubes. Since another input is always assigned X or 1 in every test cube, weight 1 is assigned to it, i.e., it is fixed to 1. On the other hand, two inputs are assigned 0 in some test cubes and 1 in some other test cubes. Hence, unlike the fixed inputs, we cannot fix these inputs to binary values, and weight 0.5 is assigned to them (the symbol C that denotes weight 0.5 is given to these inputs in G1). Finally, since the value at the remaining input is a don't care in every test cube, X is assigned to it in G1.

The three test cubes that are merged into G1 can be recovered by the conceptual decompressor shown in Fig. 1. The S-TPG and the F-TPG are controlled by the ATE during test application to generate desired patterns, while the R-TPG is a free-running random pattern generator. The S-TPG controls the select input of the multiplexer; if an input is assigned a C (a binary value 0 or 1) in G1, then the output of the S-TPG is set to a 1 (0) at the corresponding scan cycle to select the R-TPG (F-TPG) as the pattern source for that input. The F-TPG generates the values for the inputs that are assigned binary values in G1. The F-TPG can be implemented with any linear test pattern generator such as an LFSR, a cellular automaton, or even a ring generator of embedded deterministic test (EDT) [2]. If an input is assigned a 1 (0) in G1, then the output of the F-TPG should be set to a 1 (0) at the cycles when a value for that input is scanned into the scan chain. If four test patterns are generated from G1 and the R-TPG generates 00, 01, 10, and 11 for the two conflicting inputs, respectively, in each of the four test patterns, then all faults detected by the three test cubes are also detected by the four test patterns. In this paper, the values required at the output of the S-TPG are represented by the S-pattern, while the values required at the output of the F-TPG are represented by the F-pattern.

WANG et al.: LOW OVERHEAD HIGH TEST COMPRESSION TECHNIQUE 3

Assume that 100X compression is achieved by reseeding the F-TPG, i.e., compressing each individual test cube into a seed achieves 100X compression. If the proposed technique merges on average three test cubes into a generator and requires about 10% additional data, which include data for the S-TPG and additionally specified bits (since multiple test cubes are merged into a generator, the number of specified bits in a generator can be larger than the number of specified bits in any individual test cube that was merged into the generator), then the proposed method achieves roughly three times the compression of LFSR reseeding alone. In other words, the proposed method improves the compression achieved by LFSR reseeding by a factor of about 3.

Fig. 2. Decompressor architecture.

III. ARCHITECTURE OF THE PROPOSED DECOMPRESSOR

Fig. 2 describes the architecture of the decompressor for the proposed method to generate test patterns for the generator G1 shown in Fig. 1. During test application, if every test cube that is merged into G1 covers at least one test pattern generated by the decompressor, then it is guaranteed that the test patterns generated by the decompressor detect all faults detected by the merged test cubes. An n-bit test cube is said to cover another n-bit test cube if: 1) at every position where the covering cube is assigned a binary value b, with b = 0 or 1, the covered cube is assigned the same value b; and 2) at the positions where the covering cube is assigned X, the covered cube may be assigned any value. In the example shown in Fig. 2, each of the three test cubes merged into G1 covers at least one of the four test patterns generated by the decompressor.

The maximum number of conflicting inputs, or Cs, allowed in a generator is denoted here by C_max. If a large number of Cs is allowed, i.e., a large C_max is used, then a large number of test cubes can be merged into each generator, thus leading to higher compression. However, since random patterns generated by the R-TPG are applied to the conflicting inputs, if a large C_max is used, in general, more than 2^C_max patterns should be generated by each generator to make every test cube merged into the generator cover at least one test pattern generated by the decompressor for it. Hence, we use a small C_max for short test sequences.

Before each scan shift operation for G1, the F-TPG is loaded with the same seed, which was computed from the F-pattern by solving a set of linear equations [1], and repeatedly generates the same pattern, as shown in Fig. 2. The S-TPG comprises a modulo-7 counter, a 2 x 3 FIFO, a multiplexer, and a comparator. The modulo-7 counter is reset to 0 in every capture cycle, and then increments by 1 at every shift cycle thereafter. The FIFO is loaded with the locations of the scan inputs that are assigned Cs in G1 (since the two conflicting inputs of G1, shown in Fig. 1, are at scan input locations 1 and 3, the FIFO is loaded with 1 and 3). The output of the comparator is set to 1 when the content of the modulo-7 counter equals the first (topmost) entry of the FIFO, and set to 0 in all other cycles. Hence, the select signal will be set to 1 (and all entries in the FIFO rotate by one entry) in the cycles when the content of the counter is 1 and 3, and 0 in all other cycles. This is repeated for all four test patterns.

Since only a very small number of Cs is allowed, the number of storage elements required for the S-FIFO inside the S-TPG is also very small. The total number of storage bits for the S-FIFO, i.e., the number of bits required to store the locations of the conflicting scan inputs of a generator, is given by C_max x ceil(log2 N), where N is the number of scan flip-flops in the scan chain, or scan depth. Let us consider a scan design with a scan chain that comprises 130 000 scan flip-flops. Assume that to compress the test set by regular LFSR reseeding, a 650-stage (0.5% of the total number of scan flip-flops) LFSR is required (this will achieve 200X compression). If C_max = 3 for the proposed decompressor, then the total data overhead for the S-TPG is 3 x 17 = 51 bits for each generator, less than 10% of the test data volume for LFSR reseeding.

The S-FIFO needs 51 (depth 3 and width 17) storage elements. The depth of the S-FIFO is independent of the size of the design for which the decompressor is designed. The width of the S-FIFO is logarithmically proportional to the number of scan flip-flops in the design. Hence, the number of storage elements (and hence the hardware overhead) for the S-TPG will not increase significantly even for large designs. In fact, the ratio of the number of storage elements for the S-FIFO to the number of storage elements for the F-TPG will be even lower for larger designs. Other hardware components required to implement the proposed decompressor besides the S-FIFO and the F-TPG include a ceil(log2 N)-stage modulo counter and a ceil(log2 N)-bit comparator (a 17-stage counter and a 17-bit comparator when N = 130 000). The combined area overhead for these components is negligible, considering the size of a design that has 130 000 flip-flops (since the R-TPG can be shared with the F-TPG, it is not considered as additional hardware).
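The cover relation and the expansion of a generator into its 2^C_max test patterns can be sketched as follows. This is a behavioral sketch, not the hardware of Fig. 2: the free-running R-TPG is replaced by exhaustive enumeration of the conflicting-input values, the generator and cube strings are hypothetical, and the cover test follows the reading that a cube covers a fully specified pattern when the pattern agrees with the cube at every specified bit.

```python
# Expand a generator over {'0','1','X','C'} into test patterns and check
# that every merged cube covers at least one of them.
from itertools import product

def covers(cube, pattern):
    """cube covers pattern iff pattern matches cube at all specified bits."""
    return all(c == 'X' or c == p for c, p in zip(cube, pattern))

def expand(gen):
    """Yield the patterns applied for a generator: fixed inputs keep their
    weight-0/1 value, X inputs are filled with a constant here (any value
    works, since an X in the generator means no merged cube specifies it),
    and the conflicting inputs take all 2^C combinations."""
    c_pos = [i for i, g in enumerate(gen) if g == 'C']
    base = [g if g in '01' else '0' for g in gen]
    for bits in product('01', repeat=len(c_pos)):
        pat = base[:]
        for i, b in zip(c_pos, bits):
            pat[i] = b
        yield ''.join(pat)

gen = "01XCX0C"                              # hypothetical generator, two Cs
pats = list(expand(gen))                     # 2^2 = 4 patterns
cubes = ["01X0XX1", "0XX1X01", "X1X1X00"]    # hypothetical merged cubes
print(pats)
print(all(any(covers(d, p) for p in pats) for d in cubes))
```

With the exhaustive enumeration used here the cover condition always holds after 2^C patterns; the point of the paper's R-TPG discussion is that a free-running random source may need more than 2^C patterns to hit the same combinations.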

IV. COMPUTING GENERATORS

If an LFSR is used to implement the F-TPG, the number of bits to be stored in the ATE memory for F-TPG data is roughly given by the number of generators times the number of specified bits in the most densely specified generator (since data for the S-TPG are small, as described before, F-TPG seeds will dominate the overall test data volume for the proposed method). Hence, to minimize the overall test data volume, we minimize the number of specified bits in the most densely specified generator and the total number of generators.

A. Uncompacting Densely Specified Test Cubes

The number of care bits of a generator is tightly related to the numbers of care bits of the test cubes that were merged into the generator. The proposed technique compresses test cubes generated by a regular (commercial) ATPG tool, instead of generating test cubes suitable for compression by a special ATPG. Test data compaction [26], [27], which is employed by most ATPG tools to reduce the number of test patterns, increases the number of specified bits in test cubes. Static compaction reduces the number of test patterns by merging several compatible test cubes into one test cube. Hence, it is likely that very densely specified test cubes were created by merging several test cubes during test data compaction. We uncompact (reverse compact) a few very densely specified test cubes into less densely specified test cubes. This efficiently reduces the size of the LFSR and the overall test data volume without a significant increase in the test sequence length, since only very few (say, 5%) densely specified test cubes are uncompacted.

Fig. 3. Uncompacting test cube d.

To efficiently uncompact a densely specified test cube d into two test cubes d' and d'', the following should be satisfied. First, the specified bits in d should be evenly divided between d' and d''. Second, the overlap of specified bits between d' and d'' should be minimized. In other words, if an input is specified in d', then it should not be specified in d'', and vice versa. Further, the divided test cubes d' and d'' should together detect all faults that the original test cube d detects. To satisfy these requirements, in this paper, each very densely specified test cube is divided into two less densely specified test cubes d' and d'' by the following procedure.

1) Identify the set O of all outputs at which at least one fault in the set F(d) is observed, where F(d) is the set of faults that are detected by d. For example, since every output detects at least one fault in Fig. 3, O includes all outputs of the circuit.
2) For every output o in O, find the set of inputs I(o) that are in the fanin cone of output o and specified in d. The input sets for the circuit of Fig. 3 are listed in the figure.
3) Initialize all bits of the two test cubes d' and d'' with Xs, both of which have the same number of bits as d.
4) Select the output o at which the largest number of faults in F(d) is observed and mark those faults in F(d). For every input i in I(o), which drives o (note that all inputs in I(o) are specified), if i is assigned b, where b = 0 or 1, in d, set the corresponding input to b in d'. In Fig. 3, the output at which the largest number of faults is detected is removed from O, the four faults observed at it are marked, and the values that d assigns to the inputs in its input set are copied into d'.
5) If the number of specified bits in d' is greater than the number of specified bits in d'', then the next selected input set is copied into d'' (this balances the numbers of specified bits in d' and d''); otherwise, it is copied into d'. Select an output from O whose input set includes the fewest inputs that are already specified in the other test cube among all input sets (this step minimizes the overlap of specified inputs between d' and d''). Remove the selected output from O and mark all faults in F(d) that are observed at it. For every input in its input set, if the input is assigned a binary value in d, then set the corresponding input to that value in the copy target. If there are no unmarked faults in F(d), go to step 6; otherwise, repeat step 5.
For the test cube d shown in Fig. 3, step 5 is iterated three times. In the first iteration, since some bits of d' are specified while none of the bits of d'' are, the number of specified bits in d' is greater than that in d'', so the copy target is d''. An output whose input set includes no inputs that are specified in d' is removed from O, the three faults observed at it are marked, and the values that d assigns to the inputs in its input set are copied into d''. In the second iteration, since the number of specified bits in d' is the same as that in d'', the copy target is d'. An output whose input set shares no inputs with those specified in d'' is selected next and removed from O; its remaining fault is marked (the other fault has already been marked), and the values that d assigns to the inputs in its input set are copied into d'. In the third iteration, only one output is left in O; its fault is marked, and the corresponding input values are copied. Since there are now no unmarked faults in F(d), we move on to step 6.
6) Run fault simulation with the divided test cubes d' and d'', and identify the lists of faults F(d') and F(d'') that are detected by d' and d'', respectively (see Fig. 3).

If the number of specified bits in either of the partitioned test cubes d' and d'' is still large, then that test cube is further divided into another pair of test cubes.
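The balancing idea behind step 5 can be sketched as follows. This is a simplified illustration under stated assumptions: the cube and the fanin cones are hypothetical stand-ins for Fig. 3, the cones are disjoint (so the overlap-minimization criterion is trivially met), and fault marking and the final fault simulation of step 6 are omitted.

```python
# Sketch of uncompaction (Section IV-A): specified bits of a dense cube are
# partitioned between two cubes by copying whole output fanin cones into
# whichever cube currently has fewer specified bits.

def uncompact(cube, cones):
    """cube: string over '0'/'1'/'X'; cones: one set of specified input
    indices per output. Returns the two partitioned cubes."""
    d1 = ['X'] * len(cube)
    d2 = ['X'] * len(cube)
    # Largest cone first (step 4), then greedy balancing (step 5).
    for cone in sorted(cones, key=len, reverse=True):
        target = d1 if sum(b != 'X' for b in d1) <= sum(b != 'X' for b in d2) else d2
        for i in cone:
            target[i] = cube[i]      # copy the specified value for this input
    return ''.join(d1), ''.join(d2)

cube = "10010110"                          # hypothetical densely specified cube
cones = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]   # hypothetical disjoint fanin cones
print(uncompact(cube, cones))              # -> ('10XX01XX', 'XX01XX10')
```

Because each cone is copied whole, every output still observes its faults under one of the two cubes, which is the property the procedure relies on.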

B. Algorithms to Compute Generators

Let the set of test cubes to be compressed by the proposed method be D. Test cubes in D are grouped into test cube subsets, and a generator is derived from each test cube subset according to the procedure described in Section II. Each test cube subset is formed by moving test cubes from D into the subset until adding any more test cube would make the number of care bits (0, 1, or C) in the corresponding generator greater than a predefined number (denoted S_max here), or the number of conflicting inputs (Cs) in the generator greater than another predefined number C_max.

Fig. 4. Constructing test cube subsets. (a) Original test cube set D. (b) Partitioned test cube subsets.

Example 2: Fig. 4 illustrates computing generators from a set of test cubes D, which has 12 test cubes. Assume that S_max is set to 6 and C_max to 2, and that the F-TPG is implemented with an LFSR whose number of stages exceeds S_max by a small natural number added as a margin to ensure that the linear equations are solvable [3]. First, we run fault simulation with the entire set of test cubes in D and identify the set of faults F(d) that is detected by each test cube d in D. The set F(d) is called the target fault list of d, and the faults in F(d) are called the target faults for d. Then, we start constructing test cube subsets, beginning with the first subset, by moving test cubes from D one test cube at a time. The column |F| in Fig. 4 shows the number of faults in each target fault list. First, an empty subset is created and its generator is initialized to all Xs. The test cube that has the largest number of target faults is selected as the first test cube to be moved into the subset, and the generator is updated accordingly (see Section II). Next, the test cube that will cause the minimum number of conflicting inputs in the generator when added into the subset is selected from D. In Fig. 4, the selected second test cube causes only one conflicting input and six care bits in the generator.

Typically, even if some specified bits in a test cube that was generated by an ATPG tool are relaxed to Xs (don't cares), all faults that are detected by the original test cube can still be detected. When these overspecified bits are relaxed, more test cubes can be merged into each generator, reducing the total number of generators. In this paper, we try to relax a specified input only if the input is assigned a C in the current generator, or it is assigned one binary value in the test cube but the opposite binary value in the current generator. Relaxed inputs are denoted by underlines in Fig. 4(b). Assume that no specified bits can be relaxed to Xs in the second selected test cube without making any of its target faults undetected; after it is added into the subset, the generator is updated. When added into the subset, each of the next two candidate test cubes in Fig. 4 causes only one additional conflicting input in the generator, and neither of them makes the number of care bits in the generator greater than S_max. Assume that one of them is selected as the next test cube. Also, assume that there are no overspecified bits in it. Hence, it is added into the subset as it is, and the generator is updated.

Since the number of care bits and the number of conflicting inputs to be incurred by adding a test cube are computed before overspecified bits in the test cube are relaxed, some test cubes that make the number of care bits in the generator greater than S_max, or the number of conflicting inputs greater than C_max, before the relaxation can actually be added without exceeding S_max or C_max if some overspecified bits in the test cubes are relaxed to Xs. Hence, we introduce margins (call them dS and dC) to compensate for this inaccuracy. If no test cube in D can be added into the current subset without exceeding S_max or C_max (before relaxations), then we select a test cube in D that does not make the number of care bits in the generator greater than S_max + dS or the number of conflicting inputs greater than C_max + dC, and relax overspecified bits in that test cube. Assume that the margins dS and dC are both set to 1. In Fig. 4, no test cube remaining in D can be added into the first subset without exceeding S_max or C_max before relaxations. However, adding one candidate makes the number of specified bits 7 (not greater than S_max + dS = 7) and the number of conflicting inputs 2. Hence, it is selected as the next candidate. Assume that the 1 it assigns at one conflicting position is relaxed to X; it is then added into the subset without changing the generator. Adding the next candidate into the subset would make the number of conflicting inputs in the generator 3 (greater than C_max + dC) and the number of care bits 7, and, by assumption, no specified bits in it can be relaxed. Hence, it cannot be added into the subset and is returned to D. No more test cubes from D can be added into the subset without making the number of specified bits in the generator greater than S_max + dS or the number of conflicting inputs greater than C_max + dC. Hence, forming the first test cube subset is completed.

We obtain the F-pattern from the generator. Next, a seed for the F-pattern is computed by using a linear solver. We load the F-TPG with the computed seed and load the S-FIFO with the locations of the conflicting inputs of the generator, i.e., 2 and 6. 2^C_max patterns are generated by the decompressor. If any test cube in the subset covers no test pattern in the set of test patterns generated by the decompressor for the generator, then more test patterns are generated by the decompressor until all four test cubes in the subset cover at least one generated test pattern. We run fault simulation with the generated test patterns, which are fully specified, and drop all detected faults from the target fault lists of the test cubes remaining in D. Note that the target fault lists of some test cubes shrink due to the dropped faults. This process is repeated until all test cubes are removed from D, merging the 12 test cubes in D into four generators.
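The subset construction of Example 2 can be condensed into the following sketch. It is an assumption-laden simplification: the cubes are hypothetical, the first-fit scan stands in for the target-fault-driven ordering, and relaxation, margins, and fault simulation are omitted; `merge` and `build_subsets` are names introduced here.

```python
# Greedy grouping of test cubes into generators (Section IV-B, simplified):
# a cube joins the current subset only if the merged generator stays within
# S_max care bits and C_max conflicting inputs.

def merge(gen, cube):
    """Merge one cube into a generator over {'0','1','X','C'}."""
    out = []
    for g, c in zip(gen, cube):
        if g == c or c == 'X':
            out.append(g)            # no new information at this input
        elif g == 'X':
            out.append(c)            # newly specified input
        else:
            out.append('C')          # 0/1 conflict (or g was already C)
    return ''.join(out)

def build_subsets(cubes, s_max, c_max):
    pool, subsets = list(cubes), []
    while pool:
        gen, subset = 'X' * len(pool[0]), []
        progress = True
        while progress:              # keep scanning until nothing else fits
            progress = False
            for d in list(pool):
                g2 = merge(gen, d)
                if g2.count('C') <= c_max and sum(x != 'X' for x in g2) <= s_max:
                    gen, progress = g2, True
                    subset.append(d)
                    pool.remove(d)
        subsets.append((gen, subset))
    return subsets

cubes = ["01X0XX1", "0XX1X01", "X1X1X00", "110XXX0"]   # hypothetical cubes
for gen, members in build_subsets(cubes, s_max=6, c_max=2):
    print(gen, members)
```

With these cubes the first three merge into one generator with two conflicting inputs, while the fourth would push the conflict count past C_max and therefore starts a new subset, mirroring the return-to-D step of the example.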

bits in the test cube, and move it to . Update


accordingly.
3) If is empty, go to step 5. If there is at least one test cube
in that can be added into the current test cube subset
without making the number of s in greater than
or the number of care bits in greater than ,
then select test cube from those test cubes that will
cause the minimum number of new s in , add the
test cube into after relaxing overspecified bits in
, and update accordingly. Otherwise, go to step 4.
Repeat step 3.
4) If there is at least one unmarked test cube in that
does not make the number of s in greater than
or the number of care bits in greater
than when it is added into , then select a
test cube randomly among these test cubes and relax
overspecified bits in the test cube . Otherwise, go to Fig. 5. Variations of proposed decompressor. (a) Scheme to generate more vari-
step 5. If the relaxed test cube can be now added to able patterns. (b) Scheme to reduce test sequence length.
without making the number of conflicting inputs (Us) in the generator exceed its limit or the number of care bits in the generator exceed its limit, then add the test cube into the current test cube subset and update the generator accordingly. Otherwise, mark the test cube, put it back into the test cube set, and repeat step 4.
5) Compute a seed for the generator, generate test patterns by simulating the decompressor, fault simulate with the test patterns generated by the decompressor, drop the detected faults from the target fault list of every test cube in the test cube set, and eliminate test cubes whose target fault lists become empty. If the test cube set is empty, then exit. Otherwise, go to step 2.

V. VARIATIONS OF THE S-TPG

A. Scheme to Generate More Variable Patterns

In Fig. 2, since only two inputs are assigned Us in the generator, all four test patterns will differ only at those two inputs. Hence, only a few new faults will be detected by each test pattern after the first one. This leads to an increase in the number of generators. The decompressor shown in Fig. 5(a) can generate test patterns with more variation. Because of the toggle flip-flop inserted between the select signal of the multiplexer and the output of the comparator, for the generator shown in Fig. 5(a), the select signal is set to 1 at the fifth (tenth) scan cycle and stays at 1 until the eighth (14th) scan cycle. As a consequence, one set of consecutive inputs, and another set of consecutive inputs, can be assigned different values in each test pattern.

B. Scheme to Reduce Test Sequence Length

In the test cube subset shown in Fig. 4(b), the two inputs that are assigned Us in the generator are assigned 0X in the first test cube, 11 in the second, and 10 in the third. In order to guarantee detecting all faults that are detected by these three test cubes, the decompressor should continue generating test patterns with the same generator until it has generated three test patterns that, respectively, assign 01 or 00, 11, and 10 to the two conflicting inputs. Since the R-TPG is free running, it may not generate test patterns that assign the desired values to the conflicting inputs of the generator for a long period of time. Even though this does not increase the test data volume, it can increase test application time, which is also one of the important terms that determine test cost.

Using the decompressor shown in Fig. 5(b), the number of test patterns generated for each generator can be reduced. Note that the R-FIFO inside the R-TPG is loaded with 00, 11, and 10, which are covered, respectively, by 0X, 11, and 10 (the values the conflicting inputs are assigned in the three test cubes). In each capture cycle, the first entry of the R-FIFO is loaded into the shift register, and the other entries in the R-FIFO are shifted up by one entry. Then, in every scan shift cycle in which the counter value equals the output value of the S-FIFO, i.e., the select signal is set to 1, the last bit in the shift register is shifted into the scan chain. With this version of the decompressor, only as many test patterns need to be generated for a generator as there are test cubes merged into it.

VI. EXTENSION TO MULTIPLE SCAN CHAINS

A. Decompressor Architecture

Fig. 6 depicts an implementation of the proposed decompressor for a circuit with multiple (512) scan chains. For convenience of illustration, assume without any loss of generality that each of the 512 scan chains comprises 256 scan flip-flops (hence, the design has a total of 131 072 scan flip-flops). The scan chains are organized into 64 groups. Although in this particular example every group contains the same number (8) of scan chains, it is not necessary for every group to have the same number of scan chains. A multiplexer is inserted before the input of each scan chain to select a scan pattern source between the output of the F-TPG and the output of the R-TPG. The select inputs of all eight multiplexers in each group are driven by the output of a common two-input AND gate. Each entry of the S-FIFO for the multiple scan chain version is divided into two sections: one for the identification number of the group a conflicting input belongs to (group ID) and the other for the location of the conflicting input in the scan chain. For example, the first (topmost) entry in the FIFO shown in Fig. 6 has 1 for the group ID and 13 for the scan input location.
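The way an S-FIFO entry steers exactly one group to the R-TPG for one shift cycle can be modeled behaviorally. This is our own illustrative sketch, not the authors' hardware: the function name, the entry format `(group_id, location)`, and the rotate-on-match behavior are assumptions based on the S-FIFO description above and in Section VI.

```python
# Behavioral sketch (not the authors' RTL) of the per-cycle source
# selection in the multiple-scan-chain decompressor of Fig. 6.
# GROUPS and SCAN_DEPTH follow the 512-chain example in the text.

GROUPS = 64            # scan chains organized into 64 groups
SCAN_DEPTH = 256       # scan flip-flops per chain

def select_sources(s_fifo, scan_depth=SCAN_DEPTH):
    """For each shift cycle, decide which group (if any) takes its scan
    value from the R-TPG instead of the F-TPG.

    s_fifo: list of (group_id, location) entries, ordered as loaded.
    Returns a dict mapping shift cycle -> group_id fed by the R-TPG.
    Mimics the rotating S-FIFO: the head entry fires when the counter
    equals its location field, then the FIFO rotates up by one entry.
    """
    fifo = list(s_fifo)
    r_tpg_cycles = {}
    for counter in range(scan_depth):       # counter ticks once per shift
        if fifo and counter == fifo[0][1]:  # comparator: counter == location
            r_tpg_cycles[counter] = fifo[0][0]  # decoder enables this group
            fifo.append(fifo.pop(0))        # rotate the S-FIFO up
    return r_tpg_cycles

# Entries matching the text: group 1 at location 13, another group at 224.
sel = select_sources([(1, 13), (2, 224)])
assert sel == {13: 1, 224: 2}
```

Because the counter only counts up within one pattern, each entry fires at most once per scan load, and the full rotation returns the FIFO to its initial state for the next pattern, as the text describes.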
Fig. 6. Decompressor for 512 scan chains.

The group ID of the first entry is input to the decoder to generate the group-enable signals, each of which controls the AND gate of the corresponding group, and the scan input location of the first entry is input to the comparator. The output of the comparator is set to 1 if the location field of the first entry equals the counter value. Otherwise, it is set to 0.

The main purpose of organizing scan chains into groups is to reduce hardware overhead for the decoder (if scan chains are grouped, then the number of outputs of the decoder can be reduced by a factor of the group size). Grouping can also reduce the test data volume to be stored in the ATE memory and the number of storage elements required for the S-FIFO. If we reduce the number of chains in each group from eight to four for the decompressor shown in Fig. 6, then the total number of groups increases from 64 to 128. The 6-to-64 decoder should be replaced by a 7-to-128 decoder. We also need 64 more two-input AND gates and extra routing to connect the 64 additional AND gates to the outputs of the decoder. In addition, each group ID section of the S-FIFO needs one more bit. As an extreme case, if scan chains are not grouped, i.e., each group has only one scan chain, then a decoder with 512 outputs is required and the S-FIFO needs 9 bits for its group ID section.

In Fig. 6, the S-FIFO is loaded to generate test patterns for a generator that has three conflicting inputs (Us): the 13th scan inputs of two scan chains in the group with ID 1 and the 224th scan input of a scan chain in another group. The 13th scan inputs of those scan chains are assigned Us in the generator. Likewise, the 224th scan inputs of all scan chains in the other group, except the one conflicting 224th scan input, are assigned don't cares. The S-FIFO is loaded with two valid entries (the last entry is not valid). Since the group ID field of the first entry is 1, the corresponding decoder output is initially set to 1 and all the other outputs of the decoder are set to 0. In the 13th scan shift cycle, the output of the comparator is set to 1 and the output of the AND gate transitions to 1. Therefore, all scan chains in that group are loaded with the values generated by the R-TPG in the 13th shift cycle (all the other scan chains in the design are loaded with the values generated by the F-TPG). Then, the entries of the FIFO are rotated up by one entry and the second entry becomes the first entry. In the 224th scan shift cycle, the scan chains in the second entry's group are loaded with the values generated by the R-TPG, and the entries of the FIFO are again rotated up by one entry. When the scan test pattern is fully loaded into the scan chains (at the 256th scan shift cycle), the counter is reset to 0. This makes the entries of the S-FIFO rotate up once more, and the S-FIFO returns to its initial state. This is repeated for all the test patterns generated from the generator.

B. Computing Generators for Multiple Scan Chain Design

Since all eight multiplexers in a group are controlled by the same signal, the jth scan inputs of all eight scan chains in a group receive their scan values from the same TPG, either the R-TPG or the F-TPG, in any scan shift cycle. Hence, during the process of computing generators (see Section IV), if adding a test cube into the current test cube subset causes a conflict at the jth input of a scan chain that belongs to a group, i.e., changes the jth input of that chain in the current generator from a binary value to U, then the generator values at the jth inputs of the other scan chains that belong to the same group and are currently assigned binary values (0 or 1) should also be changed to Us. (If the jth scan input of a scan chain in the same group is currently assigned a don't care in the current generator, its value need not be changed.)

Fig. 7. Updating generator values.

Fig. 7 illustrates the addition of a new test cube into the current test cube subset. Before the new test cube is added, the 13th scan input of one scan chain is assigned a 1 in the generator, while the 13th scan input of another scan chain that belongs to the same group is assigned a 0. Now, a test cube in which the first of these inputs is assigned a 0 is added into the current test cube subset, causing a conflict. Therefore, the 1 at that input is changed to a U in the updated generator. Even though the second input is assigned a don't care in the new test cube (a don't care does not conflict with any value), the 0 at that input is also changed to a U due to the U at the same position of the first chain. On the other hand, an input that is assigned a don't care both before and after the test cube is added holds its previous generator value.

For the reason described in the previous paragraph, if each group contains a large number of scan chains, then the number of conflicting inputs will quickly reach its limit and the number of test cubes that can be added into each test cube subset will decrease. This will, in turn, increase the total number of generators and decrease compression. On the contrary, if the number of scan chains in each group is too small, then hardware overhead and test data volume will increase. The optimal number of scan chains in a group should be determined by considering the number of specified bits in the test cubes; if test cubes are sparsely specified, large group sizes will be preferable.
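The group-wide conflict rule just described can be sketched in a few lines. This is our own illustration of the merging step, not the authors' implementation; the generator encoding (a dict from `(chain, position)` to `'0'`, `'1'`, `'X'`, or `'U'`) and the name `u_max` for the conflicting-input budget are assumptions.

```python
# Sketch of the Section VI-B rule: a conflict at the j-th input of one
# chain forces the j-th inputs of every binary-valued chain in the same
# group to U. Encoding and names are illustrative assumptions.

def merge_cube(generator, cube, group, u_max):
    """Try to merge `cube` into `generator`. `group` lists the chains
    sharing one multiplexer select. Returns the updated generator, or
    None if the number of Us would exceed u_max (cube is rejected)."""
    g = dict(generator)
    for (chain, pos), v in cube.items():
        if v == 'X':
            continue                      # a don't care never conflicts
        cur = g.get((chain, pos), 'X')
        if cur == 'X':
            g[(chain, pos)] = v           # adopt the cube's care bit
        elif cur == v or cur == 'U':
            pass                          # compatible, nothing to do
        else:
            # Opposite binary values: conflict. Propagate U to every
            # chain of the group whose value at this position is binary.
            for c in group:
                if g.get((c, pos), 'X') in ('0', '1'):
                    g[(c, pos)] = 'U'
            g[(chain, pos)] = 'U'
    if sum(1 for v in g.values() if v == 'U') > u_max:
        return None                       # U budget exceeded: reject
    return g

# Fig. 7 scenario: chains A and B in one group; A@13 = 1, B@13 = 0.
# A cube assigning 0 to A@13 turns both positions into U.
g = merge_cube({('A', 13): '1', ('B', 13): '0'},
               {('A', 13): '0'}, group=['A', 'B'], u_max=4)
assert g == {('A', 13): 'U', ('B', 13): 'U'}
```

Rejecting the cube when the U count would exceed the budget corresponds to putting the cube back into the test cube set in step 4 of the algorithm.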
C. Hardware Overhead

The extra hardware required to implement the proposed decompressor shown in Fig. 6, excluding the F-TPG, which is also required for a regular LFSR reseeding technique, is the S-TPG, 64 two-input AND gates, and 64 2-to-1 multiplexers. The S-TPG is, in turn, composed of a 3 × 14 (6-bit group identification and 8-bit scan flip-flop location) FIFO, a 6-to-64 decoder, an eight-stage counter, and an 8-bit comparator. Since the R-TPG can be shared with the F-TPG, hardware overhead for the R-TPG is not considered. The gate equivalent (the number of two-input NAND gates) for the 6-to-64 decoder is 385 and the gate equivalent for the 8-bit comparator is 116 (we synthesized the decoder and the comparator with Synopsys Design Compiler). If we assume that the gate equivalent for a storage element is 6, then the total gate equivalent for the 3 × 14 FIFO is 252. The eight-stage counter can be implemented with 48 NAND gates. Since a 2-to-1 multiplexer can be implemented with four NAND gates, the gate equivalent for the 64 2-to-1 multiplexers is 256. The total gate equivalent for all of the aforementioned components is about 1100. Considering the size of the design, which has more than 130 000 flip-flops, the overhead of 1100 two-input NAND gates is almost negligible (if we assume that the gate equivalent of a scan flip-flop is 10, the overall overhead of the proposed decompressor is roughly 0.08%). Note that this does not consider the combinational part and memory of the design; if we consider them, the overhead will be much lower. The width of the S-TPG (and also the counter and the comparator) is logarithmically proportional to the scan depth (the number of scan flip-flops in the longest scan chain). Hence, hardware overhead will not increase significantly even for larger designs.

VII. APPLICATION FOR HIGH n-DETECTION COVERAGE

A. n-Detection Property of the Proposed Decompressor

Fig. 8. Generating 2-detection test patterns.

In Fig. 8(a), assume that the three deterministic test cubes merged into the generator detect, respectively, three target faults. Typically, a fault can be detected by many different test cubes; assume that some of these test cubes detect more than one of the faults. The decompressor generates four patterns from the generator as shown in Fig. 8(b) (only the inputs that are assigned binary values in the generator are specified). Although several inputs are assigned the same values in all four test patterns, all the target faults are detected. This implies that the faults are highly correlated with each other, and it will be easy to make a fault that is detected by one of the four test patterns also be detected by another of them by carefully assigning binary values to the inputs that are assigned don't cares or Us in the generator. Since the don't care inputs do not affect detection of the targeted faults, all faults that are detected by the test cubes can be detected independent of the values assigned to those inputs. On the other hand, the inputs that are assigned Us in the generator must be assigned the proper values for a pattern to detect the faults detected by the test cubes, i.e., each of the required value combinations for the two conflicting inputs should appear in at least one pattern. The conceptual decompressor shown in Fig. 8(d) generates such patterns. Assume that the U-TPG generates 10, 01, 11, and 00 for the two conflicting inputs, respectively, in the four patterns. If the conflicting inputs are assigned 10, 01, 00, and 11, respectively, in the four patterns, then all of the target faults are detected by two different patterns, as shown in Fig. 8(c).

The proposed decompressor [except the variation-R shown in Fig. 5(b)] generates a fixed number of test patterns from every generator. Typically, the number of deterministic test cubes merged into a generator is smaller than that number of patterns. Note that generating only as many patterns as there are merged test cubes is enough to detect every fault targeted by those cubes once. Hence, the remaining patterns can be used to detect hard faults, i.e., faults that have been detected by fewer than n test patterns. For example, in Fig. 8(c), generating the first three patterns detects all of the target faults. Hence, the fourth pattern is generated to detect two of the hard faults by one more test pattern each. If a different fault is the hard fault, then we instead generate a fourth pattern that detects it.

B. Decompressor Architecture for High n-Detection Coverage

Fig. 9(b) shows an implementation of the proposed n-detection decompressor for the generator shown in Fig. 9(a). Like the variation-I shown in Fig. 5, the S-FIFO inside the S-TPG is loaded with the locations of pairs of inputs that bound each run of Us: the first input of a pair is assigned a U, the second is assigned a non-U value, i.e., 0, 1, or X, and all the inputs between the two are assigned Us in the generator. For example, in the generator shown in Fig. 9(a), the input at location 8 is the first input assigned a non-U value after the input at location 5 (the inputs between them are assigned Us). Hence, the locations 5 and 8 of this input pair are loaded. Likewise, the locations of another input pair are loaded after the pair 5 and 8.

In each capture cycle, both the T and the D flip-flops are reset to 0. The modulo-16 counter is reset to 0 in the same cycle and then increments by 1 at every shift cycle thereafter. While the T flip-flop is 0, patterns generated by the F-TPG are scanned into the scan chain. The F-TPG is loaded with a seed for the generator before test patterns are generated for it. The T flip-flop flips its state when the counter value equals the output of the S-FIFO. Since the first entry of the S-FIFO is 5, the T flip-flop flips to 1 in the fifth scan shift cycle and the T-D state becomes 10. Then, all entries in the S-FIFO rotate up by one entry (the first entry becomes 8). While the T-D state is 10, the U-TPG is selected as the pattern source for the scan chain: the value stored in the last stage of the shift register of the U-TPG is scanned into the scan chain and the shift register shifts right by 1 bit. The D flip-flop is set to 1 one cycle after the T flip-flop is set to 1.
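The source-selection behavior just described, F-TPG while T = 0, one U-TPG bit in the cycle when T flips to 1, then the R-TPG while T and D are both 1, can be sketched behaviorally. The function name, the scalar S-FIFO entries, and the 16-cycle depth are our illustrative assumptions matching the example locations 5, 8, 10, and 14 in the text, not the authors' RTL.

```python
# Behavioral sketch of the T/D flip-flop pattern-source selection in the
# n-detection decompressor of Fig. 9 (our illustration, not the paper's RTL).

def pattern_source_trace(s_fifo, depth=16):
    """Return one character per shift cycle: 'F' (F-TPG), 'U' (U-TPG),
    or 'R' (R-TPG)."""
    fifo = list(s_fifo)
    t = d = 0                        # both flip-flops reset in the capture cycle
    trace = []
    for counter in range(depth):     # modulo-16 counter, one tick per shift
        if fifo and counter == fifo[0]:
            t ^= 1                   # T toggles on a comparator match
            fifo.append(fifo.pop(0))     # S-FIFO rotates up by one entry
        if (t, d) == (1, 0):
            trace.append('U')        # first cycle of a U run: one U-TPG bit
        elif (t, d) == (1, 1):
            trace.append('R')        # rest of the run: free-running R-TPG
        else:
            trace.append('F')        # T = 0: deterministic F-TPG bits
        d = t                        # D is a one-cycle-delayed copy of T
    return ''.join(trace)

# Locations from the text: U runs start at 5 and 10, ending at 8 and 14.
assert pattern_source_trace([5, 8, 10, 14]) == 'FFFFFURRFFURRRFF'
```

The trace reproduces the narrative: F-TPG up to cycle 4, one U-TPG bit at cycle 5, R-TPG until the counter reaches 8, F-TPG again until 10, one U-TPG bit, then R-TPG until 14.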
The T and D flip-flops hold their state 11, and the R-TPG is selected as the pattern source for the scan chain, until the counter value becomes 8. When the content of the counter becomes 8, the T flip-flop flips back to 0 and the entries of the S-FIFO rotate up by one entry again (10 becomes the first entry). Since the T flip-flop is 0, test patterns generated by the F-TPG are scanned into the scan chain until the content of the counter becomes 10. In the tenth scan shift cycle, the T flip-flop flips to 1 and the U-TPG is selected as the pattern source to scan the value at the last stage of its shift register into the scan chain. In the next cycle, since the D flip-flop is set to 1, the R-TPG is selected as the pattern source. The T and D flip-flops hold their state 11 until the content of the counter becomes 14. After a scan pattern is fully loaded into the scan chain, the scan chain captures the response to the loaded scan pattern. In the same cycle, the shift register inside the U-TPG is loaded with the first entry of the U-FIFO and the entries in the U-FIFO shift up by one entry. This is repeated for the other patterns. Fig. 9(c) presents timing diagrams for the related signals.

The compression scheme for n-detection testing requires more storage bits than that for 1-detection testing. However, the increase in test data volume due to U-FIFO data is very small; if a maximum of eight test cubes can be merged into each generator, each contributing a value for at most three conflicting inputs, then the maximum data for the U-FIFO are only 8 × 3 = 24 bits per generator. For the same reason, the increase in hardware overhead due to the U-FIFO is very small.

Fig. 9. n-detection decompressor architecture. (a) Optimizing generator. (b) Decompressor. (c) Timing diagram.

C. Computing Generators for High n-Detection Coverage

Like generators for 1-detection testing, generators for n-detection testing are computed from 1-detection test cubes generated by a regular ATPG tool by the same algorithm described in Section IV. The computed generators are divided into two different groups according to the number of test cubes from which each generator is derived: single cube generators (SCGs) and multiple cube generators (MCGs). As the name implies, the test cube subset for each SCG contains only one test cube (because adding any more test cubes into the subset would make the number of conflicting inputs or the number of specified bits exceed its limit). Note that SCGs have no Us. In contrast, MCGs are derived from test cube subsets that contain more than one test cube.

After the computed generators are divided into MCGs and SCGs, each MCG is expanded into the patterns that the decompressor will generate from it during test application [denoted in Fig. 9(a)]. These expanded patterns are only partially specified; only two different types of inputs are specified. The first type are the inputs that are assigned binary values in the generator. If an input is assigned a binary value in the generator, then it is assigned that value in all patterns generated by the decompressor from the generator. For example, in the expanded patterns of Fig. 9(a), several inputs are assigned the values that these inputs have in the generator. As the second type of specified inputs, the inputs inside a run of Us, between a U-assigned input and the next non-U input of an S-FIFO pair, are specified in all of the expanded patterns for the generator. The binary values assigned to the latter type of inputs in the expanded patterns are determined by simulating the R-TPG (the values for these inputs are provided by the R-TPG during test application). Unlike the first type of specified inputs, the second type are assigned different values in different expanded patterns of the generator. The values for the remaining unspecified inputs are assigned later to maximize n-detection coverage.

After every generator is expanded into partially specified patterns, for each fault in the fault list, we compute the detection count, i.e., the number of patterns that detect the fault, by fault simulating the expanded patterns of all MCGs and the test cubes of all SCGs (at this stage, SCGs are not expanded). During the fault simulation, faults that are detected by n patterns are dropped from the fault list. Note that if a fault is detected by n different expanded patterns, which are partially specified, then the test patterns generated by the decompressor during real test application definitely detect the fault n or more times.

Next, some inputs that are currently don't cares (X inputs, for short) in the expanded patterns are specified to detect hard faults. This procedure is similar to the dynamic compaction procedure [26] of the ATPG process. Test patterns after this procedure are represented in Fig. 9(a). We first select an expanded pattern and imply the values assigned in it on the internal circuit lines (since the pattern is partially specified, only some internal circuit lines will be set to binary values). Then we select a hard fault, i.e., a fault that is detected by fewer than n expanded patterns, from the fault list and try to detect it by specifying X inputs of the expanded pattern. If it is found that no combination of input assignments can detect the fault, then we select another hard fault from the fault list and repeat the process for the newly selected fault. Otherwise (specifying some X inputs in the pattern detects the fault), we update the pattern and the other expanded patterns for the generator to reflect the additionally specified inputs.
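The detection-count bookkeeping described above can be sketched as follows. This is our own illustration: `detects()` is a stand-in for a fault simulator, and all names are assumptions, not the authors' code.

```python
# Sketch of the detection-count computation for n-detection testing.
# detects(pattern, fault) -> bool models one fault-simulation query.

def detection_counts(patterns, faults, detects, n):
    """Count, for every fault, how many patterns detect it, and drop
    faults already detected n or more times. Returns (counts, hard),
    where `hard` is the list of remaining hard faults (count < n)."""
    counts = {f: sum(1 for p in patterns if detects(p, f)) for f in faults}
    hard = [f for f in faults if counts[f] < n]
    return counts, hard

# Toy example: fault 'a' is detected twice, fault 'b' only once, so for
# n = 2 only 'b' remains a hard fault.
hits = {('p1', 'a'), ('p2', 'a'), ('p1', 'b')}
counts, hard = detection_counts(['p1', 'p2', 'p3'], ['a', 'b'],
                                lambda p, f: (p, f) in hits, n=2)
assert counts == {'a': 2, 'b': 1} and hard == ['b']
```

The subsequent X-input specification step then targets only the faults returned in `hard`, mirroring the text's rule of dropping any fault once its count reaches n.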
For example, if an X input is specified to 1 (0) in one expanded pattern [see Fig. 9(a)], the corresponding input of the generator is set to 1 (0) accordingly. Note that the input is then specified to 1 (0) in all of the other expanded patterns of the generator as well.

The earlier process is repeated for the other expanded patterns of the generator in sequence until the number of specified bits in the generator reaches its limit. After all inputs in the expanded patterns of a generator are tried, the same procedure is repeated for the next generator, and so on, until the expanded patterns for all MCGs are tried. Typically, specifying additional inputs can detect only a few or no hard faults in most expanded patterns. Hence, most run time is spent on trying to prove that a selected fault cannot be detected by any combination of input assignments in each expanded pattern. If there are many generators and many hard faults in the fault list, then this procedure will require a very long run time.

To reduce run time, we quickly filter out the faults that cannot be detected by each expanded pattern, without wasting time on trying to detect those undetectable faults. If a binary value is implied at a line by an expanded pattern, then no combination of input assignments can detect the corresponding stuck-at fault at that line. These faults are filtered out first. Then we filter out the faults whose unique sensitization path [28] is blocked. To detect a fault, its fault effect must pass through the unique sensitization path of the fault. We identify unique sensitization paths for all hard faults and store them in a preprocessing step. Finally, we apply the X-path check [28] for the remaining faults; if there is no path from a line to any output on which all lines are currently at X, the faults at that line cannot be detected by any combination of input assignments.

Next, we generate additional test cubes that detect hard faults for every SCG by a modified ATPG until the number of Us in the generator or the number of care bits exceeds its limit. After additional test cubes are added into the test cube subsets for SCGs, we apply the same procedure used to specify X inputs in expanded patterns of MCGs to the new MCGs (note that once an additional test cube is added for an SCG, the SCG becomes an MCG). Next, we compute a seed for every generator and specify more X inputs in the expanded patterns by simulating the F-TPG with the seed. Then we update the detection counts for all faults remaining in the fault list and drop the faults that are detected by n different patterns.

Now the expanded patterns of every generator have only a few unspecified inputs left, i.e., only the conflicting inputs of each generator have not been specified. We assign the best binary values to these inputs in each expanded pattern to detect more hard faults. This procedure is described in Fig. 10. For each expanded pattern, we generate candidate test patterns, each of which assigns a different combination of binary values to the inputs that are assigned Us, as shown in Fig. 10. Then, we fault simulate the circuit with these candidates and select the test pattern that detects the most hard faults among the candidates for each expanded pattern. When this procedure is complete, we can determine the fully specified test patterns that will be generated by the decompressor. This procedure is repeated for the other generators.

Fig. 10. Selecting best binary values for Us.

The overall algorithm for computing generators for high n-detection coverage is summarized as follows.
1) Generators are computed from a set of test cubes, and the computed generators are divided into MCGs and SCGs according to the number of test cubes in the test cube subset from which each generator is derived.
2) The algorithm expands each MCG into partially specified patterns and assigns binary values to the inputs that are assigned binary values in the generator and to the inputs whose scan values are provided by the R-TPG.
3) The algorithm computes the detection count for every fault in the fault list by applying the expanded patterns of MCGs and the test cubes of SCGs to the design.
4) The ATPG specifies X inputs in each of the expanded patterns of an MCG to detect hard faults.
5) If there are expanded patterns whose X inputs are specified in step 4, update the generator and the other expanded patterns according to the values specified in these patterns.
6) Update the detection counts of the faults that are detected in step 4. If there exists any MCG that has not been processed, then select the next MCG and go to step 4.
7) For each SCG, keep adding test cubes that detect hard faults to the corresponding test cube subset until the number of conflicting inputs or the number of specified bits exceeds its limit. Update all SCGs according to the added test cubes.
8) Do steps 4–6 on the new MCGs that were made by adding more test cubes to the SCGs in step 7.
9) For every expanded pattern generated from each generator (now all generators are MCGs), the algorithm finds the best binary values for the inputs that are assigned Us in the generator so as to detect the most hard faults. Exit.

D. Multiple Scan Chain Design

Fig. 11 depicts the n-detection version of the decompressor for a circuit with 512 scan chains. Like the 1-detection version shown in Fig. 6, the 512 scan chains are organized into 64 groups to reduce hardware overhead. Initially, the F-TPG is selected as the test pattern source for all scan chains. Since the group ID of the first entry of the S-FIFO is 1, the corresponding decoder signal is set to 1 and all the other signals are set to 0. Since the scan input location of the first entry of the S-FIFO is 13, the T flip-flop inside the S-TPG flips to 1 in the 13th scan shift cycle.
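The best-value selection of step 9 (the Fig. 10 procedure) enumerates every binary assignment to the handful of remaining U inputs and keeps the one that detects the most hard faults. The sketch below is our own illustration: `detects()` again stands in for a fault simulator, and the names are assumptions.

```python
from itertools import product

# Sketch of the Fig. 10 procedure: exhaustively try all binary value
# combinations for the conflicting (U) inputs of one expanded pattern.

def best_u_assignment(pattern, u_positions, hard_faults, detects):
    """pattern: dict position -> '0'/'1' for already-specified inputs.
    Returns the fully specified candidate detecting the most hard faults.
    Feasible because each generator has only a few U inputs (2^|U| trials).
    """
    best, best_hits = None, -1
    for bits in product('01', repeat=len(u_positions)):
        cand = dict(pattern)
        cand.update(zip(u_positions, bits))   # fully specify the Us
        hits = sum(1 for f in hard_faults if detects(cand, f))
        if hits > best_hits:
            best, best_hits = cand, hits      # keep the best candidate
    return best, best_hits

# Toy example: fault 0 needs u1 = 1, fault 1 needs u2 = 1, so the best
# assignment sets both Us to 1 and detects both hard faults.
chk = lambda cand, f: cand['u1'] == '1' if f == 0 else cand['u2'] == '1'
best, hits = best_u_assignment({}, ['u1', 'u2'], [0, 1], chk)
assert hits == 2 and best == {'u1': '1', 'u2': '1'}
```

Bounding the number of Us per generator is what keeps this exhaustive enumeration cheap, which is consistent with the limit on conflicting inputs imposed during generator computation.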
Fig. 11. n-detection decompressor for 512 scan chains.

The scan value generated by the U-TPG, i.e., the 0 stored in the last stage of the shift register in the U-TPG, is then scanned into all the scan chains of the selected group (the scan chains in the other groups are loaded with scan values generated by the F-TPG). Then, the S-FIFO is shifted up by one entry and the second entry becomes the first entry. Note that some of the 14th scan inputs in one group are assigned binary values; hence, the scan values for these scan inputs should be provided by the F-TPG. On the other hand, the 14th to 83rd scan inputs of the scan chains in another group are assigned only don't cares in the generator; hence, these scan inputs can be assigned values generated by the R-TPG, and that group's entry is selected as the second entry of the S-FIFO.

In the next cycle, the 1 at the T flip-flop shifts into the D flip-flop, and both T and D are set to 1. Hence, the scan chains in the selected group receive their scan data from the R-TPG until the content of the counter becomes 84. In the 84th shift cycle, the T flip-flop flips to 0 and all scan chains in the design receive their scan data from the F-TPG. The operation described before for the first pair of entries is repeated for the other pairs of entries located below them in the S-FIFO. Hence, in the 133rd cycle, all the scan chains in the group named by the then-current first entry receive the value 1 from the U-TPG. The F-TPG continues providing test patterns to all scan chains until the scan pattern is fully loaded in the 256th shift cycle. Then, the response to the test pattern is captured into the scan flip-flops and the entries in the U-FIFO shift up by one entry.

Note that the jth scan inputs of all eight scan chains in a group that are assigned Us in a generator, e.g., the 13th scan inputs of the scan chains in one group and the 113th inputs of the scan chains in another, are assigned the same value from the U-TPG while test patterns are generated for the generator. (In contrast, in the 1-detection version shown in Fig. 6, these conflicting inputs are assigned different values from the R-TPG.) Hence, if the jth inputs of the scan chains in a group are assigned Us in the generator, then the jth inputs of all scan chains in the group should be assigned identical values in every test cube of the corresponding subset (a don't care is identical with any value, 0 or 1 or X).

Fig. 12. Generator for multiple scan chain design.

In Fig. 12, assume that two scan chains belong to the same group. First, a test cube is added into the current test cube subset. Note that the 0 assigned at the 13th input of the first chain in this cube is not identical with the 1 assigned at the 13th input of the second chain in the subset. Adding the cube therefore causes a conflict at the first chain's 13th input and updates its generator value to U. As a result, the two inputs would be assigned the same value during test application although they are assigned opposite values in the test cubes. This problem is solved if we can relax the 1 (or the 0) to a don't care in its test cube (a value can be relaxed if all faults that are detected by the original test cube can still be detected after the relaxation). Otherwise, i.e., if neither the 0 nor the 1 can be relaxed, the newly added test cube is removed from the subset, placed back into the test cube set, and the generator is updated accordingly. If there are more test cubes in which the 13th input of any scan chain is assigned a value that is not identical to the 13th inputs of the other scan chains in the group, then we repeat the earlier procedure for these test cubes until the 13th inputs of the scan chains in the group are assigned identical values in every test cube of the subset. Assume that another test cube is added into the current test cube subset and causes a conflict at the 121st input of a chain. However, since the 121st inputs of all scan chains in the group are assigned identical values in every test cube of the subset (assume that the values at the 121st inputs of the other scan chains in the group are also identical), we do not have to relax the 121st input of any scan chain or remove any test cube from the subset.

VIII. EXPERIMENTAL RESULTS

A. 1-Detection Testing

Table I compares the compressions achieved by using only LFSR reseeding (columns under the heading Only LFSR Reseeding), the proposed method along with LFSR reseeding, and other recent compression techniques [1], [2], [11]. For LFSR reseeding, we used our proprietary high-compression LFSR reseeding technique. Scan cells were routed into ten scan chains in every benchmark circuit. The columns # pat give the number of test patterns applied, and the columns # store (bits) give the total number of (compressed) data bits that need to be stored in the ATE memory. For the proposed method, we show results obtained by using all three different decompressor schemes: the basic scheme (see Fig. 2), variation-I [see Fig. 5(a)], and variation-R [see Fig. 5(b)].
TABLE I
EXPERIMENTAL RESULTS

TABLE II
RESULTS ON INDUSTRIAL DESIGNS (VARIATION-R)

For the LFSR reseeding (the heading Only LFSR Reseeding) and the proposed method, we first applied a sequence of pseudorandom patterns to drop easy-to-detect faults. The number of initial pseudorandom patterns is shown in parentheses in the column # pat under the heading Only LFSR Reseeding. Then, for the remaining undetected faults, we generated deterministic test patterns with an in-house ATPG and compressed them either by LFSR reseeding only or by the proposed compression method. The number of pseudorandom patterns applied to drop easy-to-detect faults is included in the total number of test patterns reported in the columns # pat. The columns # Gen show the number of generators.

The results clearly demonstrate that the proposed method can efficiently improve the compression ratios achieved when only LFSR reseeding is used. Large reductions were achieved especially for the ITC benchmark circuits; the numbers of storage bits for the proposed method are only about 1/2–1/3 of those for LFSR reseeding for all ITC benchmark circuits. Note that the variation-R scheme reduced the number of storage bits by a factor of about 3.4 for b17s without any increase in the number of patterns. Among the three different decompressors, the basic scheme achieved the highest compression and the variation-R scheme generated the smallest number of patterns (the number of decompressed patterns generated by the variation-R decompressor is always the same as the number of deterministic patterns compressed by the proposed method).

We first compare ours with another hybrid BIST [11]. Since [11] applied very long (32 000 patterns) sequences of pseudorandom patterns, we also conducted experiments with long sequences of pseudorandom patterns for fair comparisons. The numbers of storage bits reported in [2] are also larger than those of the proposed method for most circuits. However, since the numbers of test patterns are not reported in [2] and the compression depends on the number of test patterns generated, the fairness of the comparisons with [2] is limited. We also compare results of the proposed method with another multilevel compression method [1], in which seeds obtained by LFSR reseeding are further compressed by a seed compression process. Finally, in the last column, labeled [29] FDR, we compare compressions obtained by using frequency-directed run-length (FDR) codes [30] for circuits whose scan cells are specially routed to further reduce test data volume. Except for s38417, the number of storage bits for the proposed method is much smaller than that of the seed compression technique [1].

Table II shows results of gate delay patterns for industrial designs. The broadside (launch-off-capture) scheme was used to apply delay test patterns in every case. The column # FF gives the number of flip-flops in the circuit. The columns under the heading Determin show results on highly compacted deterministic delay test patterns generated by an in-house ATPG. Results obtained by using the proposed method (the variation-R was used) are given under the heading Proposed. The columns FE% give the achieved fault efficiency. The compression obtained by using the proposed scheme is shown in the last column, labeled CR; it is calculated as the ratio of the storage required for the highly compacted deterministic patterns to that required by the proposed scheme. Over 1600X compression was achieved for D3 by the proposed method. Note that the number of test patterns increased only about 33% against the deterministic test set. About 500X compression was achieved for D2, and the increase in the number of test patterns is only 45%. Note that higher compressions are achieved for larger designs. The column TR gives the factor of reduction in total test cycles (the number in the parenthesis of the same column is the number of scan chains in the design). Since test patterns are internally generated in the proposed method, the number of scan chains need not be limited by the number of scan channels
results are shown in the columns and . Even that can be provided by the ATE. For deterministic test pattern
if a shorter test sequence was applied, the number of storage bits results, we used 16 scan chains for every design. Since the
for the proposed method is a lot smaller than that of storage bits increase in pattern count is small, the proposed compression
in [11] for every circuit except s15850. Numbers of storage bits method can also reduce the test application time significantly.
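The "detected fewer than n times" counts reported for the n-detection experiments in Table III reduce to simple bookkeeping over fault-simulation results. The sketch below illustrates that counting with a hypothetical detection table (the fault names, pattern indices, and helper function are invented for illustration; in practice the data come from fault simulation):

```python
# Minimal sketch of the n-detection bookkeeping behind the "fewer than
# three (five) detections" columns: record which patterns detect each
# fault, then count the faults detected fewer than n times.
# The detection table below is hypothetical.

# detections[f] = list of pattern indices that detect fault f
detections = {
    "f0": [0, 3, 7, 9],     # detected 4 times
    "f1": [2],              # detected once
    "f2": [1, 4, 5, 6, 8],  # detected 5 times
    "f3": [],               # never detected (e.g., ATPG aborted)
}

def faults_detected_fewer_than(detections, n):
    """Number of faults detected by fewer than n test patterns."""
    return sum(1 for pats in detections.values() if len(pats) < n)

print(faults_detected_fewer_than(detections, 3))  # f1 and f3 -> 2
print(faults_detected_fewer_than(detections, 5))  # f0, f1, and f3 -> 3
```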
TABLE III
n-DETECTION RESULTS

B. n-Detection Testing

Experimental results for the n-detection version are reported in Table III. The column # sa flts gives the number of collapsed single stuck-at faults that were used to generate test patterns, while the column # br flts gives the number of bridging faults used for bridging fault simulation. These faults were randomly generated using the nonfeedback AND/OR bridging fault model. Results obtained by the proposed compression technique (columns under the heading Proposed) are compared with results obtained by 5-detection test sets generated by an ATPG tool, which was implemented based on the algorithm proposed by Huang [31] (columns under the heading 5-det ATPG). We also report results of traditional 1-detection test patterns (the columns under the heading 1-det ATPG).

The column # stgs gives the number of stages for the F-TPG of the proposed decompressor. The number of patterns generated by the proposed decompressor is a little larger than that of the 5-detection ATPG patterns for most circuits except s13207 and s38417 (see the columns # pat). While for s13207 the proposed decompressor generated an even smaller number of test patterns than the 5-detection ATPG, for s38417 the proposed decompressor generated about 5.8 times more patterns than the 5-detection ATPG. The total number of bits that need to be stored in the ATE memory (the column stor bits) includes the F-TPG seeds, the U-FIFO data, and the S-FIFO data. The number of storage bits for the proposed technique is about 1/17–1/86 of that of the storage bits for the 5-detection ATPG, i.e., 17X–86X compressions are achieved by the proposed compression technique. Results show that the proposed method can also achieve up to 19.4X compression against 1-detection ATPG test sets.

The columns … (…) show the number of faults that are detected by fewer than three (five) test patterns. The test patterns generated by the proposed decompressor achieved over 99% 3-detection fault coverage for all circuits except s13207; 171 faults were detected fewer than three times for s13207. This is mainly because the ATPG aborted generating test cubes for a large number (76) of faults when it was generating the 1-detection test cubes to be compressed by the proposed method. The 5-detection ATPG, which was modified from the same ATPG, also gave up generating 5-detection test patterns for many (79) faults for s13207. Test patterns generated by the proposed decompressor achieved very high 5-detection fault coverage (very close to 100%) for the two largest benchmark circuits, s38417 and s38584. Note that the 1-detection test cube sets achieved very low 5-detection fault coverage (see the column … under the heading 1-det ATPG). The columns bridge cov% compare the bridging fault coverage achieved by the test patterns generated by the proposed decompressor and by the 1- and 5-detection ATPG test patterns. The test patterns generated by the proposed decompressor achieved higher bridging fault coverage than the 1-detection ATPG test sets but lower coverage than the 5-detection ATPG test sets.

IX. CONCLUSION

In this paper, a test data compression scheme that can be used to further improve the compressions achieved by LFSR reseeding is presented. The proposed method consists of a novel decompressor architecture and an efficient algorithm to compute generators (weight sets) that lead to minimum test data volume. The proposed decompressor can be implemented with very low area overhead. Two variations of the decompressor, which can be adopted for different testing requirements such as short test application time, are also proposed. Unlike most commercial test data compression tools, the proposed method requires no special ATPG that is customized for the proposed compression scheme and can be used to compress test patterns generated by any ATPG tool.

Experimental results show that the proposed method can effectively improve the compressions achieved by LFSR reseeding without increasing the test sequence length significantly. Over 1600X compression was achieved for a large industrial design with only about a 30% increase in the number of test patterns against a highly compacted deterministic test set. The numbers of test patterns generated by the proposed method are comparable to those of highly compacted deterministic test patterns for most circuits. The test data to be stored in the ATE memory are much smaller than those for previously published schemes, and the number of test patterns that need to be generated is smaller than for other weighted random pattern testing schemes.

The proposed test data compression scheme is extended to achieve high n-detection coverage with little modification. The n-detection version of the proposed compression technique first merges a 1-detection test set generated by a regular ATPG into several generators. Then, the generators are modified to achieve high n-detection coverage. Since the test data are compressed from a 1-detection test set rather than an n-detection test set, the proposed technique can achieve high compression. Experimental results demonstrate that the proposed technique can achieve high n-detection fault coverage.

REFERENCES

[1] C. V. Krishna and N. A. Touba, "Reducing test data volume using LFSR reseeding with seed compression," in Proc. Int. Test Conf., 2002, pp. 321–330.
[2] J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, "Embedded deterministic test," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 23, no. 5, pp. 776–792, May 2004.
[3] B. Könemann, "LFSR-coded test patterns for scan designs," in Proc. Eur. Des. Test Conf., 1991, pp. 237–242.
[4] L.-T. Wang, C.-W. Wu, and X. Wen, VLSI Test Principles and Architectures. San Francisco, CA: Morgan Kaufmann, 2006.
[5] P. Wohl, J. A. Waicukauski, S. Patel, and M. B. Amin, "Efficient compression and application of deterministic patterns in a logic BIST architecture," in Proc. IEEE-ACM Des. Autom. Conf., 2003, pp. 566–569.
[6] International Technology Roadmap for Semiconductors, "Test & Test Equipment," 2005.
[7] M. Bershteyn, "Calculation of multiple sets of weights for weighted random testing," in Proc. Int. Test Conf., 1993, pp. 1031–1040.
[8] A. P. Ströle and H.-J. Wunderlich, "TESTCHIP: A chip for weighted random pattern generation, evaluation, and test control," IEEE J. Solid-State Circuits, vol. 26, no. 7, pp. 1056–1063, Jul. 1991.
[9] B. Könemann, "Care bit density and test cube clusters: Multi-level compression opportunities," in Proc. IEEE Int. Conf. Comput. Des., 2003, pp. 320–325.
[10] B. Könemann, "STAGE: A decoding engine suitable for multi-compressed test data," in Proc. Asian Test Symp., 2003, pp. 142–145.
[11] A. Jas, C. V. Krishna, and N. A. Touba, "Weighted pseudorandom hybrid BIST," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 12, pp. 1277–1283, Dec. 2004.
[12] S. Wang, K. J. Balakrishnan, and S. T. Chakradhar, "XWRC: Externally-loaded weighted random pattern testing for input test data compression," presented at the Int. Test Conf., Austin, TX, 2005.
[13] S. Wang, "Low hardware overhead scan based 3-weight weighted random BIST," in Proc. Int. Test Conf., 2001, pp. 868–877.
[14] I. Pomeranz and S. M. Reddy, "3-weight pseudo-random test generation based on a deterministic test set for combinational and sequential circuits," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 12, no. 7, pp. 1050–1058, Jul. 1993.
[15] B. Könemann, K. D. Wagner, and J. A. Waicukauski, "Hybrid pattern self-testing of integrated circuits," U.S. Patent 5 612 963 A, May 18, 1997.
[16] S. Wang, W. Wei, and S. T. Chakradhar, "A high compression and short test sequence test compression technique to enhance compressions of LFSR reseeding," in Proc. Asian Test Symp., 2007, pp. 79–86.
[17] I. Polian, I. Pomeranz, S. M. Reddy, and B. Becker, "Exact computation of maximally dominating faults and its application to n-detection tests for full-scan circuits," Proc. Inst. Electr. Eng., vol. 151, no. 3, pp. 235–244, May 2004.
[18] C.-W. Tseng, S. Mitra, S. Davidson, and E. J. McCluskey, "An evaluation of pseudo random testing for detecting real defects," in Proc. VLSI Test Symp., 2001, pp. 404–409.
[19] I. Pomeranz and S. M. Reddy, "On test data compression and n-detection test sets," in Proc. IEEE-ACM Des. Autom. Conf., 2003, pp. 748–751.
[20] S. Lee, B. Cobb, J. Dworak, M. R. Grimaila, and M. R. Mercer, "A new ATPG algorithm to limit test set size and achieve multiple detections of all faults," in Proc. Des. Autom. Test Eur., 2002, pp. 94–99.
[21] S. C. Ma, P. Franco, and E. J. McCluskey, "An experimental chip to evaluate test techniques: Experiment results," in Proc. Int. Test Conf., 1995, pp. 663–670.
[22] K.-J. Lee, J.-J. Chen, and C.-H. Huang, "Using a single input to support multiple scan chains," in Proc. IEEE Int. Conf. Comput.-Aided Des., 1998, pp. 74–78.
[23] I. Hamzaoglu and J. H. Patel, "Reducing test application time for full scan embedded cores," in Dig. Papers, 29th Int. Symp. Fault-Tolerant Comput., 1999, pp. 260–267.
[24] H. Tang, G. Chen, C. Wang, J. Rajski, I. Pomeranz, and S. M. Reddy, "Defect aware test patterns," in Proc. Des. Autom. Test Eur., 2005, pp. 450–455.
[25] S. Wang, Z. Wang, W. Wei, and S. T. Chakradhar, "A low cost test data compression technique for high n-detection fault coverage," in Proc. Int. Test Conf., 2007, pp. 1–10.
[26] P. Goel and B. C. Rosales, "Test generation & dynamic compaction of tests," in Dig. Papers Test Conf., 1979, pp. 182–192.
[27] J.-S. Chang and C.-S. Lin, "Test set compaction for combinational circuits," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 14, no. 11, pp. 1370–1378, Nov. 1995.
[28] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design. New York: Computer Science Press, 1990.
[29] S.-J. Wang, K. S.-M. Li, S.-C. Chen, H.-Y. Shiu, and Y.-L. Chu, "Scan-chain partition for high test-data compressibility and low shift power under routing constraint," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 28, no. 5, pp. 716–727, May 2009.
[30] A. Chandra and K. Chakrabarty, "Test data compression and test resource partitioning for system-on-a-chip using frequency-directed run-length (FDR) codes," IEEE Trans. Comput., vol. 52, no. 8, pp. 1076–1088, Aug. 2003.
[31] Y. Huang, "On N-detect pattern set optimization," presented at the Int. Symp. Quality Electron. Design, San Jose, CA, 2006.

Seongmoon Wang received the B.S. degree from ChungBuk National University, Cheongju, Korea, in 1988, the M.S. degree from Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 1991, and the Ph.D. degree from the University of Southern California, Los Angeles, in 1998, all in electrical engineering.
He was a Design Engineer at GoldStar Electron, Korea, and a DFT Engineer at Syntest Technologies and 3Dfx Interactive. He is currently a Senior Research Staff Member in the NEC Laboratories America, Princeton, NJ. His current research interests include design for testability, computer-aided design, and self-repair/diagnosis techniques of very-large-scale integration.

Wenlong Wei received the B.S. degree in biological science from Nanjing University, Nanjing, China, in 1996, and the M.S. degree in electrical engineering from the University of Texas, Arlington, in 2004.
In December 2004, he joined the NEC Laboratories America, Princeton, NJ, where he is currently an Associate Research Staff Member. His current research interests include test compression, low-power test, and defect diagnosis.

Zhanglei Wang received the B.Eng. degree from Tsinghua University, Beijing, China, in 2001, and the M.S.E. and Ph.D. degrees in computer and electrical engineering from Duke University, Durham, NC, in 2004 and 2007, respectively.
He is currently a Hardware Engineer at Cisco Systems, Inc., San Jose, CA. His current research interests include test compression, test pattern grading, test generation, high-speed test, and system-level test and diagnosis.