
Diagnosis of Scan Chain Failures

Yuejian Wu
Northern Telecom, P.O. Box 3511, Station C, Ottawa, Ontario, Canada
Abstract
This paper first analyzes faulty scan chain behaviors. In addition to stuck-at faults, we also consider timing faults due to hold time violations. Test sequences to determine the fault types in a failing scan chain are presented. This is followed by a presentation of two scan design techniques that
simplify scan chain fault diagnosis for both stuck-at and timing faults.

1. Introduction
Scan design is the most popular DFT methodology in the VLSI industry. Many scan-based
test/diagnostic tools assume operational scan chains even if the circuit under test is faulty. Otherwise, it is impossible to apply any scan test vector. Therefore, when zero or very low yield occurs
due to scan chain failures, it is important to locate the fault and correct the design and/or fabrication errors so that scan test can be performed in production test. Unfortunately, with standard scan
design methods, scan chain failure diagnosis is very difficult or even impossible. Recently, a few
scan chain diagnostic techniques have been reported. In [1], a technique uses sequential ATPG to
generate diagnostic vectors without modifying the scan chain. Unfortunately, its CPU time can be prohibitive for large designs and its resolution is often inadequate. In [2], a new scan design was proposed, where the output of each flip-flop on a chain is sampled by a flip-flop on another chain. So,
when a chain fails, one can always shift a diagnostic vector into another chain and have it loaded
into the failing chain. It costs a pair of wires between each pair of flip-flops from different chains
plus a global diagnostic control. In [3], another scan design method was proposed. With the addition of an extra XOR gate to each flip-flop and a global diagnostic control, it makes scan diagnosis
straightforward. Besides the cost for routing the global diagnostic control, the cost of the XOR
gates can become significant for large designs. In order to save silicon for routing the global diagnostic control, another technique was proposed in [4]. It takes advantage of a special class of scan
flip-flops, where each flip-flop includes a dedicated scan-out latch in addition to a normal master-slave scan flip-flop. Extra circuitry is added to each flip-flop to detect a 1-to-0 transition on SM
(scan mode) when CLK (or sometimes SCLK, a dedicated scan clock) is 0. Upon the detection of
such an event, it forces a constant value into the scan-out latch, which is then shifted out for analysis. Stuck-at faults are assumed in all the previous work. This paper presents two alternative scan
chain diagnostic techniques that consider both stuck-at and timing faults in scan chains.

2. Faults in Scan Chains


Many factors can contribute to scan chain failures. First, defects in a flip-flop or at its I/O
ports can cause it to fail. E.g., shorts or broken wires at the SI (scan in) input or Q output of a flip-flop may cause stuck-at fault behaviors. Furthermore, some internal defects may also show up as
stuck-at faults at the SI or Q port of the faulty flip-flop. In addition to stuck-at faults, another
important cause of scan chain failures in practice is hold time violations. In a scan chain, the Q
output of a flip-flop is connected directly to the SI input of another flip-flop, as shown in Figure 2
(a). This makes scan flip-flops susceptible to hold time problems at SI inputs.
The causes of hold time violations are various. First, process variations and clock skews
can cause hold time violations. More importantly, defects in a flip-flop can also cause hold time
problems at its SI input. E.g., a delay fault in the clocking circuitry in a flip-flop can behave like a
hold time fault at its SI input, as the path to the SI input is usually the fastest path in a whole design.
Depending on the amount of clock skew (due to whatever reason), there are three types of hold
time problems for scan chain shift operations. In a type-I problem, a faulty flip-flop captures
incorrect data if and only if its SI transitions from 0 to 1. The cause of such a fault is that the Q of the
flip-flop that feeds the faulty one has a faster rise time than fall time and the clock skew is just large
enough to fail the rising transition at the SI but small enough to pass the falling transition. Such
failures have been observed in practice. Similarly, in a type-II problem, the preceding flip-flop has a faster fall time on
its Q, and the faulty flip-flop fails if and only if its SI transitions from 1 to 0. In a type-III problem, a flip-flop fails whenever its SI transitions. This happens for large clock skews.
The behaviors of stuck-at faults and hold time faults in a scan chain are different. With a
stuck-at fault, the sequence shifted out from the chain is always a constant equal to the stuck-at value,
no matter what sequence is shifted into the chain. A hold time fault in a scan chain does not
prevent the chain from shifting through sequences of both 0s and 1s. However, if transitions exist
in the sequence, the chain appears shorter, meaning that the bit following each problematic transition always gets shifted out of the chain a cycle earlier than expected. Figure 1 shows some example test responses due to a single hold time fault. It should be pointed out that the sequences shown
in Figure 1 are sequences observed at the scan output (SO) of a scan chain and are independent of
fault locations. As shown in Figure 1, a type-I fault generates an extra 1 for each 0-to-1 transition
shifted through the faulty flip-flop; a type-II fault generates an extra 0 for each 1-to-0 transition;
and a type-III fault makes an extra shift of all transitions towards the scan out of the chain.
FIGURE 1. Example faulty behaviors of single hold time faults in a scan chain

good response:      0001011100
Type-I response:    0001111100
Type-II response:   0000001100
Type-III response:  0000101110
(shift direction: left to right)
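
The shortening effect described above can be sketched in a few lines of Python. This is only an illustrative model, not part of the original work: the function names `through_fault` and `as_drawn`, the string encoding of the fault types, and the convention that a list holds the bits in the order they leave the chain are all assumptions made here. Each problematic transition makes the faulty flip-flop capture the following bit one cycle early.

```python
def through_fault(stream, kind):
    """Pass a bit stream through one flip-flop with a hold time fault.

    stream: bits in time order (stream[0] is shifted out first).
    kind:   "I" (fails on 0-to-1), "II" (fails on 1-to-0), "III" (fails on both).
    The names and the list encoding are assumptions of this sketch.
    """
    out = list(stream)
    for t in range(len(stream) - 1):
        cur, nxt = stream[t], stream[t + 1]
        rising = (cur, nxt) == (0, 1)
        falling = (cur, nxt) == (1, 0)
        if (kind == "I" and rising) or (kind == "II" and falling) \
                or (kind == "III" and (rising or falling)):
            out[t] = nxt  # the following bit arrives one cycle early
    return out


def as_drawn(bits):
    """Render a time-ordered stream as printed (first bit out on the right)."""
    return "".join(str(b) for b in reversed(bits))


# The good response of Figure 1, converted to time order.
good = [int(c) for c in reversed("0001011100")]
print(as_drawn(through_fault(good, "II")))   # 0000001100
print(as_drawn(through_fault(good, "III")))  # 0000101110
```

Under these assumptions the sketch yields the type-II and type-III responses listed in Figure 1 for the same good response.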

Figure 2 (a) shows a scan chain with a hold time problem at the SI input of flip-flop i due
to a clock skew. Figure 2 (b) shows a model for the hold time problem. As shown in the model, if
a problematic transition occurs at the faulty flip-flop's input, it incorrectly captures the data from
flip-flop i+2 as opposed to from flip-flop i+1.
FIGURE 2. A scan chain hold time fault model
(a) a scan chain with a hold time problem at flop i: flops i+2, i+1, i and i-1 connected q to si, with a clock skew at flop i
(b) a model for a hold time problem at flop i: a problematic transition detector controls a mux (inputs 0 and 1) at the si input of flop i, selecting between the outputs of flop i+1 and flop i+2

3. Determine Fault Types


This section presents simple techniques to determine fault types in a failing scan chain.

3.1 Stuck-At Faults


To test a scannable ASIC on a tester, the first test is usually a flush test. During flush test, a
sequence of 0s and 1s is shifted into the chain and after a certain number of clock cycles the same
sequence is expected to be shifted out. If a chain fails due to a stuck-at fault, the sequence shifted
out will be a sequence of either all-0s or all-1s. However, the observation of an all-0 or all-1
sequence during flush test cannot guarantee the existence of stuck-at faults. E.g., for a flush test
sequence of ...00110011..., if there exist two type-I faults in the chain, the test response will also be
a sequence of all-1s.
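
The masking effect in this example can be checked with a short Python sketch. It is not taken from the original work: the helper name `through_type_one` and the time-ordered list encoding (first element shifted out first) are assumptions, and the window is chosen to start on a 0 so that boundary effects do not obscure the steady-state behavior.

```python
def through_type_one(stream):
    """Pass a time-ordered bit stream through one type-I faulty flip-flop.

    On every 0-to-1 transition the faulty flip-flop captures the 1 one
    cycle early, so the last 0 before each 1 is replaced by a 1.
    """
    out = list(stream)
    for t in range(len(stream) - 1):
        if (stream[t], stream[t + 1]) == (0, 1):
            out[t] = 1
    return out


# A window of the flush sequence ...00110011... (assumed phase).
flush = [0, 0, 1, 1] * 4
after_one = through_type_one(flush)       # one type-I fault
after_two = through_type_one(after_one)   # a second type-I fault downstream
print(after_one)
print(after_two)
```

With one fault the pattern still contains 0s, but after the second fault every bit is 1, which is indistinguishable from a stuck-at-1 response.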

To determine if the fault in a failing scan chain is a stuck-at fault, we suggest using two
flush test sequences, one of all-0s and the other of all-1s. If the response to the all-0 sequence is
an all-1 sequence, the fault must be stuck-at-1. Similarly, if the response to the all-1 sequence
becomes all-0, the fault must be stuck-at-0. This is because the suggested test sequences contain
no transition and thus will not trigger any hold time violation.
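
The decision rule above can be written as a small Python helper. This is a minimal sketch; the function name `classify_stuck_at` and the representation of tester responses as lists of bits are assumptions made for illustration.

```python
def classify_stuck_at(response_to_all0, response_to_all1):
    """Apply the two transition-free flush tests described in the text.

    Each argument is the list of bits observed at the scan output when
    the corresponding constant sequence is shifted through the chain.
    Returns "stuck-at-1", "stuck-at-0", or None when neither test fails.
    """
    if response_to_all0 and all(b == 1 for b in response_to_all0):
        return "stuck-at-1"
    if response_to_all1 and all(b == 0 for b in response_to_all1):
        return "stuck-at-0"
    return None


print(classify_stuck_at([1, 1, 1, 1], [1, 1, 1, 1]))
print(classify_stuck_at([0, 0, 0, 0], [0, 0, 0, 0]))
print(classify_stuck_at([0, 0, 0, 0], [1, 1, 1, 1]))
```

A result of None only means that no stuck-at fault was exposed by these two sequences; the chain may still contain hold time faults.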

3.2 Hold Time Faults


To determine whether the fault in a failing scan chain is due to hold time violations, we
propose to use the sequences shown in Figure 3. Based on the test responses to the tests shown in
Figure 3, the fault type can be determined as follows:
1. Observation of extra 1s for sequence 1 but no extra 0s for sequence 2 indicates the existence of only type-I faults. The number of extra 1s corresponds to the number of faults.
2. Observation of extra 0s for sequence 2 but no extra 1s for sequence 1 suggests the presence of only type-II faults. The number of extra 0s is equal to the number of faults.
3. If the pulses in both sequences 3 and 4 are shifted out of the scan chain earlier than
expected, there must be type-III faults. The number of extra shifts of the pulses corresponds
to the number of faults.
FIGURE 3. Flush test to determine hold time faults in scan chains

sequence 1: ...111...111000...000...
sequence 2: ...000...000111...111...
sequence 3: ...000...000100000000...
sequence 4: ...111...111011...111...
(each half of a sequence is N bits long; shift direction: left to right)
Note: N represents the scan chain length.

Similarly, we may also determine the combination of different types of hold time faults.
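
The three rules can be exercised on a simple behavioral model. The following Python sketch is an illustration under stated assumptions, not the author's tooling: the names `through_faults` and `determine_type`, the time-ordered list encoding (first element shifted out first), and the model in which a problematic transition makes the following bit arrive one cycle early are all introduced here.

```python
def through_faults(stream, kinds):
    """Pass a time-ordered bit stream through faulty flip-flops of the given kinds."""
    for kind in kinds:
        out = list(stream)
        for t in range(len(stream) - 1):
            cur, nxt = stream[t], stream[t + 1]
            rising = (cur, nxt) == (0, 1)
            falling = (cur, nxt) == (1, 0)
            if (kind == "I" and rising) or (kind == "II" and falling) \
                    or (kind == "III" and (rising or falling)):
                out[t] = nxt  # the next bit is captured one cycle early
        stream = out
    return stream


def determine_type(kinds, n=8):
    """Simulate the four flush sequences and apply the three rules.

    kinds is the (normally unknown) list of faults in the chain; it is
    only used to produce the responses a tester would observe.
    """
    seq1 = [0] * n + [1] * n            # drawn as ...111000...: 0s leave first
    seq2 = [1] * n + [0] * n
    seq3 = [0] * n + [1] + [0] * n      # single 1 pulse
    seq4 = [1] * n + [0] + [1] * n      # single 0 pulse
    r1, r2, r3, r4 = (through_faults(s, kinds) for s in (seq1, seq2, seq3, seq4))
    extra_ones = sum(r1) - sum(seq1)
    extra_zeros = sum(seq2) - sum(r2)
    early3 = r3.count(1) == 1 and r3.index(1) < n
    early4 = r4.count(0) == 1 and r4.index(0) < n
    if early3 and early4:
        return "III", n - r3.index(1)   # rule 3: both pulses arrive early
    if extra_ones and not extra_zeros:
        return "I", extra_ones          # rule 1
    if extra_zeros and not extra_ones:
        return "II", extra_zeros        # rule 2
    return None


print(determine_type(["I", "I"]))
print(determine_type(["II"]))
print(determine_type(["III", "III"]))
```

The second element of each result is the fault count inferred from the number of extra bits or extra shifts, as described in rules 1 to 3.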

4. Fault Diagnosis by Flipping Scan Flip-Flops


This section presents a new scan chain diagnostic technique. Figure 4 (a) shows a scan
chain of five flip-flops (ignore the signal diag and port dm for now). Figure 4 (b) shows an example of diagnosing a stuck-at-1 fault at the output of flip-flop 3. The 0s and 1s in Figure 4 (b) represent the state of each flip-flop during different stages of the diagnosis with x being unknowns.
As the first line of Figure 4 (b) shows, after 5 clock cycles when shifting in all-0s, a 1
appears at the output of the chain due to the fault. Now, let us assume that by asserting signal diag
= 1 the state of each flip-flop inverts as the second line of Figure 4 (b) shows. Then, we set diag =
0 again and assume the inverted state stays. Now, if we shift the scan chain, the number of clock
cycles it takes to observe the first 1 at the scan out of the chain indicates the position of the fault. In
this case, the first 1 is observed at the 3rd clock, suggesting that flip-flop 2 has been affected by a
fault but flip-flop 3 has not. In other words, the fault is between flip-flop 3's output and flip-flop 2's
input, which corresponds to the assumed fault location.
FIGURE 4. Fault diagnosis by flipping scan flops.
(a) a scan chain of five flops: each flop has si, q and dm ports, diag drives every dm port, clk drives every flop, and a sa1 fault sits at the output of flop 3

(b) the states of the flops on a tester (shift direction: left to right)
                                          states of the flops   data sampled
after shifting five 0s into the chain:    0 0 0 1 1             x
after a positive pulse on diag:           1 1 1 0 0             x
shift out the chain:                      1 1 1 1 0             0
                                          1 1 1 1 1             0
                                          1 1 1 1 1             1   (error observed)


Figure 5 shows a logical representation of a modified scan flip-flop that is able to invert its
state when dm (diagnostic mode) is set to 1. When dm = 0, the modified flip-flop behaves like a
normal scan flip-flop with an extra mux delay added to its SI input. When dm = 1 and SM = 1, the
flip-flop complements its state at a clock edge. The modification has no performance impact on the
data path. Figure 5 is only a logical representation of the modification. In reality, a flip-flop cell could
be modified to include the additional mux. In this case, the extra mux would cost only 6-8 transistors with reduced routing complexity.
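
The behavior of the modified flip-flop can be summarized as a next-state function. The Python sketch below is a behavioral illustration only; the function name `next_state` is hypothetical, and the choice that the flip-flop captures D whenever SM = 0, regardless of dm, is an assumption, since the text only describes the dm = 0 case and the dm = 1, SM = 1 case.

```python
def next_state(q, d, si, sm, dm):
    """Value captured by the modified scan flip-flop at the next clock edge.

    q:  current state      d:  functional data input
    si: scan input         sm: scan mode select
    dm: diagnostic mode select
    """
    if sm == 0:
        return d        # mission mode: data path unaffected (assumed for any dm)
    if dm == 1:
        return 1 - q    # diagnostic mode: complement the stored state
    return si           # normal scan shift


print(next_state(q=0, d=1, si=0, sm=1, dm=1))  # 1: state is inverted
print(next_state(q=0, d=1, si=0, sm=1, dm=0))  # 0: normal scan shift
print(next_state(q=0, d=1, si=0, sm=0, dm=0))  # 1: functional capture
```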
FIGURE 5. An example implementation of scan flip-flops with dm ports (inputs: SM, D, SI, dm, CLK; outputs: Q, QB)


4.1 Fault Diagnostic Procedure


This section presents a diagnostic procedure and discusses its diagnostic resolution and
limitations. To diagnose a scan chain of length N, the diagnostic Procedure I is as follows:
1. shift a sequence of ...010101... of length N into the scan chain, with diag = 0 and SM = 1;
2. set diag = 1 and SM = 1 and apply one clock cycle to invert the state of each flip-flop;
3. set diag = 0 and SM = 1, shift out the scan chain, and record the number of clock cycles it
takes to observe the first transition. If it takes m clock cycles, then the fault exists between
the mth flip-flop's output and the (m-1)th flip-flop's input.
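
The three steps of Procedure I can be sketched as a small simulation for the stuck-at case. This is an illustrative model rather than the author's implementation: the names `shift` and `procedure_one`, the all-0 initial state, the numbering in which flip-flop 1 is next to the scan output, and the idealized inversion in step 2 are assumptions of the sketch.

```python
def shift(state, si, sa):
    """Apply one scan clock.

    state[k-1] holds flip-flop k; flip-flop 1 drives the scan output.
    sa = (f, v) models a stuck-at-v on the net from flip-flop f's output
    to flip-flop (f-1)'s input, with 2 <= f <= len(state).
    Returns (new_state, bit sampled at the scan output).
    """
    f, v = sa
    sampled = state[0]
    new = state[1:] + [si]
    new[f - 2] = v  # flip-flop f-1 always captures the stuck value
    return new, sampled


def procedure_one(n, sa):
    """Return m, the clock cycle at which the first transition is observed."""
    state = [0] * n
    for k in range(1, n + 1):            # step 1: shift in ...0101
        state, _ = shift(state, k % 2, sa)
    state = [1 - b for b in state]       # step 2: invert every flip-flop
    samples = []
    for k in range(1, n + 1):            # step 3: shift out and observe
        state, sampled = shift(state, k % 2, sa)
        samples.append(sampled)
    for m in range(1, n):
        if samples[m] != samples[m - 1]:
            return m + 1
    return None


print(procedure_one(5, (3, 0)))  # sa0 at the output of flip-flop 3
print(procedure_one(5, (3, 1)))  # sa1 at the output of flip-flop 3
```

For a five-flop chain, both calls report m = 3, which points to the net between flip-flop 3's output and flip-flop 2's input.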

4.2 Diagnosing Stuck-At Faults


Figures 6 (a) and (b) respectively illustrate the diagnosis of a stuck-at-0 (sa0) and a stuck-at-1 (sa1) fault. The faults are assumed at the output of the 3rd flip-flop. In both cases shown in Figure 6,
it takes 3 clock cycles for a tester to observe a first transition. This indicates a fault between flip-flop 3's output and flip-flop 2's input, which matches the fault location assumed.
FIGURE 6. Diagnosis of stuck-at faults
(shift direction: left to right; the last column is the data sampled at the output of the chain)

(a) diagnosis of sa0 (sa0@flop 3)
after step 1:   1 0 1 0 0   x
after step 2:   0 1 0 1 1   x
start step 3:   1 0 1 0 1   1
                0 1 0 0 0   1
                1 0 1 0 0   0   (transition observed at the 3rd clock cycle)

(b) diagnosis of sa1 (sa1@flop 3)
after step 1:   1 0 1 1 1   x
after step 2:   0 1 0 0 0   x
start step 3:   1 0 1 1 0   0
                0 1 0 1 1   0
                1 0 1 1 1   1   (transition observed at the 3rd clock cycle)

4.3 Diagnosing Hold Time Faults


Assuming a hold time fault at the input of flip-flop 3 shown in Figure 4 (a), Figures 7 (a)
and (b) illustrate the diagnosis of a type-I and a type-II fault respectively. As shown in Figure 7, a
transition is observed after 4 clock cycles in both cases. This indicates a fault between flip-flops 4
and 3 in both cases, which agrees with the assumed fault location.
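
Procedure I can also be simulated for a single hold time fault. The Python sketch below is an illustration under assumptions that do not come from the original text: the names `shift` and `procedure_one`, the all-0 initial state, the numbering in which flip-flop 1 is next to the scan output, and the model in which a problematic transition at the input of flip-flop f makes it capture the value that flip-flop f+1 has just captured.

```python
def shift(state, si, hold):
    """Apply one scan clock to a chain with one hold time fault.

    state[k-1] holds flip-flop k; flip-flop 1 drives the scan output.
    hold = (f, kind): fault at the SI of flip-flop f (f < len(state)),
    kind "I" (fails on 0-to-1) or "II" (fails on 1-to-0).
    Returns (new_state, bit sampled at the scan output).
    """
    f, kind = hold
    sampled = state[0]
    new = state[1:] + [si]
    before, after = state[f], new[f]   # Q of flip-flop f+1 around the clock edge
    rising = (before, after) == (0, 1)
    falling = (before, after) == (1, 0)
    if (kind == "I" and rising) or (kind == "II" and falling):
        new[f - 1] = after             # flip-flop f captures the new value
    return new, sampled


def procedure_one(n, hold):
    """Return m, the clock cycle at which the first transition is observed."""
    state = [0] * n
    for k in range(1, n + 1):          # step 1: shift in ...0101
        state, _ = shift(state, k % 2, hold)
    state = [1 - b for b in state]     # step 2: invert every flip-flop
    samples = []
    for k in range(1, n + 1):          # step 3: shift out and observe
        state, sampled = shift(state, k % 2, hold)
        samples.append(sampled)
    for m in range(1, n):
        if samples[m] != samples[m - 1]:
            return m + 1
    return None


print(procedure_one(5, (3, "I")))   # type-I fault at the input of flip-flop 3
print(procedure_one(5, (3, "II")))  # type-II fault at the input of flip-flop 3
```

Both calls report m = 4 for the five-flop chain, that is, a fault between flip-flops 4 and 3.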
In addition to a single fault, Procedure I is also applicable in the presence of multiple type-I and type-II faults. E.g., if two type-II faults exist, we can use the diagnostic sequence ...010101...
to locate the first fault closest to the chain's input. Once this sequence passes the first fault, it
becomes all-0, which will not trigger the second fault. Thus, the diagnosis for the first fault is
exactly the same as described in Procedure I. To locate the second fault, we use a different
sequence ...011011... instead of ...010101.... After this sequence is shifted through the first
fault, it becomes ...001001001.... When the sequence is further shifted through the second fault, it
becomes an all-0 sequence. Following steps 2 and 3, we can diagnose the second fault as well.
Although this scheme works well for both stuck-at and type-I/type-II faults, it is unable to
diagnose type-III faults. A second alternative scheme presented next will address this issue.
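
The effect of two type-II faults on the two diagnostic sequences can be checked with a short Python sketch. It is illustrative only: the helper name `through_type_two` and the time-ordered list encoding (first element shifted first) are assumptions, and the windows are chosen to end on a 0 so that boundary effects do not obscure the steady-state pattern.

```python
def through_type_two(stream):
    """Pass a time-ordered bit stream through one type-II faulty flip-flop.

    On every 1-to-0 transition the faulty flip-flop captures the 0 one
    cycle early, so the last 1 before each 0 is replaced by a 0.
    """
    out = list(stream)
    for t in range(len(stream) - 1):
        if (stream[t], stream[t + 1]) == (1, 0):
            out[t] = 0
    return out


# ...010101... turns into all-0 after the first fault.
alternating = [1, 0] * 6
print(through_type_two(alternating))

# ...011011... survives the first fault as ...001001... and is only
# flattened to all-0 by the second fault.
second_sequence = [1, 1, 0] * 4
after_first = through_type_two(second_sequence)
after_second = through_type_two(after_first)
print(after_first)
print(after_second)
```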

FIGURE 7. Diagnosis of type-I and type-II hold time faults
(shift direction: left to right; the last column is the data sampled at the output of the chain)

(a) diagnosis of type I fault (type-i@flop 3)
starts step 1:  1 x x x x   x
                0 1 x x x   x
                1 0 1 x x   x
                0 1 1 1 x   x
                1 0 1 1 1   x
after step 2:   0 1 0 0 0   x
start step 3:   1 0 1 0 0   0
                0 1 1 1 0   0
                1 0 1 1 1   0
                0 1 1 1 1   1   (transition observed at the 4th clock cycle)

(b) diagnosis of type II fault (type-ii@flop 3)
starts step 1:  1 x x x x   x
                0 1 x x x   x
                1 0 0 x x   x
                0 1 0 0 x   x
                1 0 0 0 0   x
after step 2:   0 1 1 1 1   x
start step 3:   1 0 0 1 1   1
                0 1 0 0 1   1
                1 0 0 0 0   1
                0 1 0 0 0   0   (transition observed at the 4th clock cycle)

5. Fault Diagnosis by Setting/Resetting Scan Flip-Flops


This section presents a diagnostic technique that is able to cover all the above fault types
at the cost of slightly diminished diagnostic resolution.
The basic idea of this scheme is to set or reset flip-flops during diagnostic mode to load a
pre-defined pattern into the scan chain and then shift it out for analysis. A possible implementation
is shown in Figure 8. As shown, when signal diag = 1, every second flip-flop is reset to 0 and the
flip-flops in between are set to 1. Thus, with diag = 1, the pattern ...010101... is loaded in the scan
chain (diag = 0 in mission and scan modes). Once the pattern is loaded, we then set diag = 0 and
SM = 1 to shift out the diagnostic pattern for analysis. The diagnostic Procedure II is as follows:
1. set diag = 1 to load the diagnostic pattern;
2. set diag = 0;
3. set SM = 1 and shift out the content of the scan chain for analysis. The observation of an
error after m clock cycles indicates a fault between flip-flop (m+1)'s output and flip-flop (m-1)'s input. The reference response is identical to the diagnostic pattern loaded.
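
Procedure II can be sketched as a simulation for the stuck-at case. The Python code below is an illustration, not the author's implementation: the names `shift` and `first_error_clock`, the numbering in which flip-flop 1 is next to the scan output, the choice that odd-numbered flip-flops are set to 1, and the scan-in values that continue the alternating pattern are assumptions of the sketch.

```python
def shift(state, si, sa):
    """Apply one scan clock.

    state[k-1] holds flip-flop k; flip-flop 1 drives the scan output.
    sa = (f, v) models a stuck-at-v on the net from flip-flop f's output
    to flip-flop (f-1)'s input, with 2 <= f <= len(state).
    Returns (new_state, bit sampled at the scan output).
    """
    f, v = sa
    sampled = state[0]
    new = state[1:] + [si]
    new[f - 2] = v  # flip-flop f-1 always captures the stuck value
    return new, sampled


def first_error_clock(n, sa):
    """Return the clock cycle m at which the first error is observed."""
    # Steps 1 and 2: diag loads ...0101 (flip-flop 1 = 1), then diag = 0.
    state = [k % 2 for k in range(1, n + 1)]
    # Step 3: shift out and compare with the loaded pattern.
    for clock in range(1, 2 * n + 1):
        state, sampled = shift(state, (n + clock) % 2, sa)
        if sampled != clock % 2:   # a good chain returns 1, 0, 1, 0, ...
            return clock
    return None


print(first_error_clock(5, (3, 0)))  # sa0 at the output of flip-flop 3
print(first_error_clock(5, (3, 1)))  # sa1 at the output of flip-flop 3
```

For the same fault site the two calls report different clock cycles (3 for sa0 and 4 for sa1), because the loaded bit at the fault site can coincide with the stuck value and hide the fault for one cycle.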
FIGURE 8. Fault diagnosis by setting/resetting scan flops
(a scan chain of flops i+2, i+1, i, i-1 and i-2 driven by clk; diag drives the rst port of every second flop and the set port of the flops in between)


5.1 Diagnosing Stuck-at Faults


Figure 9 (a) and (b) illustrate the diagnosis of sa0 and sa1 respectively.
FIGURE 9. Diagnosis of stuck-at faults
(shift direction: left to right; the last column is the data sampled at the output of the chain)

(a) diagnosis of sa0 (sa0@flop 3)
after step 1:        1 0 1 0 1   x
in steps 2 and 3:    0 1 0 0 0   1
                     1 0 1 0 0   0
                     0 1 0 0 0   0   (an error observed at the 3rd clock cycle)

(b) diagnosis of sa1 (sa1@flop 3)
after step 1:        1 0 1 0 1   x
in steps 2 and 3:    0 1 0 1 0   1
                     1 0 1 1 1   0
                     0 1 0 1 1   1
                     0 1 0 1 1   1   (an error observed at the 4th clock cycle)


As shown in Figure 9, an error is observed at the 3rd clock cycle for the sa0 fault. This
indicates a fault between flip-flops 3 and 2, which matches the assumed fault location. However,
for the sa1 fault, an error is observed at the 4th clock cycle, which appears to suggest a fault
between flip-flops 4 and 3 even though the assumed fault is between flip-flops 3 and 2. This is
because the diagnostic vector at the fault site coincides with the faulty value, so the fault effect is
masked. Therefore, we have to declare a fault between flip-flops 4 and 2, which spans 3 flip-flops.
In general, the diagnostic resolution for this scheme is 3 flip-flops.

5.2 Diagnosing Hold Time Faults


To diagnose hold time faults, we assume the presence of such a fault at flip-flop 3's
input shown in Figure 4 (a). Figure 10 shows the diagnosis of type-I and type-II faults.
FIGURE 10. Diagnosis of hold time faults
(shift direction: left to right; the last column is the data sampled at the output of the chain)

(a) diagnosis of type-I fault (type-i@flop 3)
starts step 1:    1 0 1 0 1   x
in steps 2 & 3:   0 1 1 1 0   1
                  1 0 1 1 1   0
                  0 1 1 1 1   1
                  1 0 1 1 1   1   (an error observed at the 4th clock cycle)

(b) diagnosis of type-II fault (type-ii@flop 3)
starts step 1:    1 0 1 0 1   x
in steps 2 & 3:   0 1 0 1 0   1
                  1 0 0 1 1   0
                  0 1 0 0 1   1
                  1 0 0 0 0   0
                  0 1 0 0 0   0   (an error observed at the 5th clock cycle)


As shown in Figure 10, for a type-I fault, the first error is observed at the 4th clock cycle,
which indicates a fault between flip-flops 4 and 3. This matches the assumed fault location. However, for the type-II fault at the same location, the first error is observed at the 5th clock cycle,
which appears to indicate a fault between flip-flops 5 and 4. This is because in the diagnostic pattern the first transition seen by the faulty flip-flop does not trigger the fault. Therefore, Procedure II
has to declare a fault between flip-flops 5 and 3, which spans 3 flip-flops.
Figure 11 illustrates the diagnosis of type-III faults. Figure 11 (a) assumes a fault at the
input of flip-flop 3 and Figure 11 (b) assumes a fault at the input of flip-flop 2. For such a fault at
the input of flip-flop 3, an error is observed at the 4th clock cycle. This indicates a fault between
the output of flip-flop 4 and the input of flip-flop 2. For a fault at the input of flip-flop 2 shown in
Figure 11 (b), an error is observed at the 3rd clock cycle. This suggests the existence of a fault
between flip-flops 3 and 2, which is exactly the fault location assumed.
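
The hold time cases of Procedure II, including type-III, can be reproduced with a small simulation. The Python sketch below is illustrative only: the names `shift` and `first_error_clock`, the numbering in which flip-flop 1 is next to the scan output, the choice that odd-numbered flip-flops are set to 1, the scan-in values that continue the alternating pattern, and the model in which a problematic transition makes flip-flop f capture the value just captured by flip-flop f+1 are all assumptions.

```python
def shift(state, si, hold):
    """Apply one scan clock to a chain with one hold time fault.

    state[k-1] holds flip-flop k; flip-flop 1 drives the scan output.
    hold = (f, kind): fault at the SI of flip-flop f (f < len(state)),
    kind "I" (fails on 0-to-1), "II" (fails on 1-to-0) or "III" (both).
    Returns (new_state, bit sampled at the scan output).
    """
    f, kind = hold
    sampled = state[0]
    new = state[1:] + [si]
    before, after = state[f], new[f]   # Q of flip-flop f+1 around the clock edge
    rising = (before, after) == (0, 1)
    falling = (before, after) == (1, 0)
    if (kind == "I" and rising) or (kind == "II" and falling) \
            or (kind == "III" and (rising or falling)):
        new[f - 1] = after             # flip-flop f captures the new value
    return new, sampled


def first_error_clock(n, hold):
    """Return the clock cycle m at which the first error is observed."""
    state = [k % 2 for k in range(1, n + 1)]   # pattern loaded by diag
    for clock in range(1, 2 * n + 1):
        state, sampled = shift(state, (n + clock) % 2, hold)
        if sampled != clock % 2:               # good response: 1, 0, 1, 0, ...
            return clock
    return None


for fault in [(3, "I"), (3, "II"), (3, "III"), (2, "III")]:
    print(fault, first_error_clock(5, fault))
```

For a five-flop chain the sketch reports the first error at clock cycles 4, 5, 4 and 3 respectively, in line with the clock cycles quoted in the text for Figures 10 and 11.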
FIGURE 11. Diagnosis of type-III faults
(shift direction: left to right; the last column is the data sampled at the output of the chain)

(a) diagnosis of type-iii fault@flop3
starts step 1:    1 0 1 0 1   x
in steps 2 & 3:   0 1 1 1 0   1
                  1 0 0 1 1   0
                  0 1 1 0 1   1
                  1 0 0 1 0   1   (an error observed at the 4th clock cycle)

(b) diagnosis of type-iii fault@flop2
starts step 1:    1 0 1 0 1   x
in steps 2 & 3:   0 1 0 0 0   1
                  1 0 1 1 0   0
                  0 1 0 0 1   0   (an error observed at the 3rd clock cycle)


6. Conclusions
In the case of zero or low yields due to scan chain failures during prototype/early production runs, it is important to locate the fault. In practice, in addition to stuck-at faults, timing faults
due to hold time violations are also an important cause of scan chain failures. This paper analyzed
the faulty behaviors of these faults and presented simple flush test sequences to distinguish these
faults. Two diagnostic techniques were also presented to simplify scan chain diagnosis.

Acknowledgment: The author would like to acknowledge Dr. S. Adham and Mr. K. Brough
for helpful discussions and careful reading of the draft version of this paper.
References
[1] Kundu, S., On Diagnosis of Faults in a Scan Chain, Proc. VTS93, pp. 303-308.
[2] Schafer, J.L., et al., Partner SRLs for Improved Shift Register Diagnostics, Proc. VTS92, pp. 198-201.
[3] Edirisooriya, S., et al., Diagnosis of Scan Path Failures, Proc. VTS95, pp. 250-255.
[4] Narayanan, S., et al., An Efficient Scheme to Diagnose Scan Chains, Proc. ITC97.
