
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 4, APRIL 2007

An Overlapping Scan Architecture for Reducing Both Test Time and Test Power by Pipelining Fault Detection
Xiaoding Chen, Member, IEEE, and Michael S. Hsiao, Senior Member, IEEE
Abstract: We present a novel scan architecture for simultaneously reducing test application time and test power (both average and peak power). Unlike previous works, where the scan chain is partitioned based only on the excitation properties of the flip-flops (FFs), our work considers both the excitation and propagation properties of the scan FFs. In the proposed scan architecture, the scan chain is partitioned to maximize the overlap between the excitation and propagation of different fault sets. The scan architecture also allows the entire set of detectable faults in the circuit under test (CUT) to be detected with only a portion of the scan elements active at a time, and thereby completely eliminates the need for the serial full-scan mode, which is inefficient in both test time and test power. Experimental results show that, with minimal hardware overhead and without sacrificing fault coverage, an average peak power reduction of 22.8%, an average power reduction of 41.6%, and an average test application time reduction of 18.5% can be achieved compared with the ordinary full-scan architecture.

Index Terms: Low-power testing, pipelining of fault detection, scan architecture, test application time reduction.

Manuscript received January 10, 2006; revised June 19, 2006 and December 11, 2006. This work was supported in part by the National Science Foundation under Grant 0196470, Grant 0305881, and Grant 0417340, and by the Semiconductor Research Corporation (SRC) under Grant 2005-TJ-1359.001. X. Chen was with the Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061 USA. She is now with Synopsys Inc., Mountain View, CA 94043 USA (e-mail: xchen@synopsys.com). M. S. Hsiao is with the Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061 USA (e-mail: mhsiao@vt.edu). Digital Object Identifier 10.1109/TVLSI.2007.893657

I. INTRODUCTION

POWER consumption and test application time have become critical issues in testing VLSI circuits. Several factors lead to excessive power consumption during testing [7]. First, the rapidly increasing operating speed of the circuit under test (CUT) implies both higher average and higher peak power. Second, the need for at-speed testing worsens the power problem. Third, scan-based testing may introduce a rippling effect during the shifting phase and can drive the state machine into a functionally unreachable state, potentially leading to higher power than in the functional mode. Excessive power consumption during testing can permanently damage the CUT or make it less reliable, for example through an elevated electromigration rate and reduced noise margins. Likewise, the enormous number of test vectors poses a serious economic concern, since long test application times can significantly increase the cost of testing and the data storage needs.

For most deterministic automatic test pattern generation (ATPG) test sets, a large percentage (more than 90%) of the data bits are don't-care (X) bits; in other words, many bits can take on either 0 or 1. Research has shown that significant performance improvement can be achieved in testing and verification by utilizing these don't cares intelligently. For example, in [2] and [3], the authors make use of don't cares to simplify the verification of embedded memory systems. Various schemes have also been proposed for filling Xs intelligently to reduce the test power or test time. The large portion of don't-care bits embedded in the full-scan pattern set leaves room for test time and power reduction techniques. If the longest scan chain in the CUT has length $l$, it takes $l$ cycles to apply every scan vector, which prolongs the test application time compared with the nonscan architecture. Many scan architectures, such as the Illinois scan architecture (ILS) [5], have been proposed to reduce test application time and data volume. It has been observed that these methods greatly reduce test application time at the cost of elevated peak/average power. Scan-based architectures are popular because they improve the testability of the circuit. However, the power consumption in the test mode can be a potential threat, since scanning in nonfunctional (unreachable) states may result in additional switching activity. Some built-in self-test (BIST) architectures, such as [1], have been proposed to reduce the average power dissipation during testing. However, these techniques generally do not help in shortening the test application time. Although it is challenging, some work has been proposed to reduce the test application time and the test power simultaneously.
In [6], a multiple clock disabling (MCD) technique is proposed to conditionally disable scan cells. Together with scan chain reordering, significant test application time reduction can be achieved on the ISCAS89 [17] benchmark circuits. Though very powerful, this technique requires a multiple-input signature register (MISR) whose number of stages equals or exceeds the total number of FFs in the CUT for propagating the fault effects. This may introduce high hardware overhead, and the MISR itself may also become a big source of power consumption. In [16], a scan-based test scheme partitions the FFs into different sub-scan chains according to their excitation properties. Only the portion of the sub-scan chains that are necessary

1063-8210/$25.00 © 2007 IEEE

CHEN AND HSIAO: OVERLAPPING SCAN ARCHITECTURE FOR REDUCING BOTH TEST TIME AND TEST POWER


for fault excitation is active during the shifting phase, while the remaining sub-scan chains are frozen. During the capture phase, every sub-scan chain loads the test response, but only some sub-scan chains are observed; the test responses in the unobserved sub-scan chains act as part of the test stimuli for the next vector. Controlling and observing only a small subset of scan cells not only limits the test power but also reduces the test time. A random access scan (RAS) architecture, consisting of a set of address registers and an address decoder, has been proposed to replace the ordinary serial scan chain architecture. With the RAS architecture, every flip-flop can be accessed individually, which leads to a 3× speedup and an over 99% reduction in average power. As stated in [8], the cost is a high hardware overhead for building the RAS and a large MISR for output response compaction. The three previously mentioned techniques consider only fault excitation when partitioning FFs, and a MISR or spare register is added to guarantee the observation of fault effects. Some other techniques target both fault excitation and propagation in order to eliminate the MISR. In [4], [10], and [12], FFs are clustered by analyzing both the input test set and its test response. Different phases during testing select whether the whole scan chain or only a portion of it is active. Test time and power can be saved if, for most vectors, only a small partition of the scan chain needs to be active for fault detection. Note that in [4], [10], and [12], the normal serial full-scan mode, where every FF in the scan chain is active in both the shifting and capture phases, is unavoidable if no test coverage is to be lost. For the test controller to determine the scan mode of each vector (either the serial full-scan mode or the partial-scan mode), additional data bits are needed in [4], [10], and [12].
There are also other techniques available, such as the scan array proposed in [9] and the run-length codes introduced in [11]. Unlike the previously mentioned techniques, where FFs are clustered by considering only their excitation properties on target faults, this paper presents a new scan architecture in which FFs are partitioned based on both excitation and propagation properties. The excitation property of an FF reveals how important it is to control this FF for exciting some faults; the propagation property of an FF reveals how important it is to observe this FF for capturing some fault effects. The new scan architecture aims at maximizing the overlap between the excitation and propagation of different fault sets and at completely eliminating the need for the serial full-scan mode, which can be a potential threat to peak power and test time. The FFs in the CUT are divided into two categories according to their roles in detecting a target fault: 1) FFs that play both the excitation and the propagation role and 2) FFs that play only the excitation or only the propagation role. As shown in Fig. 1, FFs in category 1) are put into a sub-scan chain $S_{dual}$, which is active all the time. FFs in category 2) are partitioned into two sub-scan chains, $S_A$ and $S_B$, which are alternately active during test application. The partitioning criteria are that $S_A$ covers the excitation space of fault set $F_A$ and the propagation space of fault set $F_B$; $S_B$ covers the excitation space of fault set $F_B$ and the propagation space of fault set $F_A$; and $F_A \cup F_B$ covers the entire detectable fault set of the CUT. For the ordinary full-scan architecture, every pattern has width 7, that is, it is serially shifted into all the 7

Fig. 1. Our new overlapping scan architecture.

FFs shown in the top of Fig. 1. In the proposed overlapping scan architecture, patterns are of width 5. A pattern shifted into $S_A$ can excite some faults in $F_A$, and the fault effects of some faults in $F_B$ that have been captured before are simultaneously shifted out. Similarly, a pattern shifted into $S_B$ can excite some faults in $F_B$, and the fault effects of some faults in $F_A$ that have been captured before are simultaneously shifted out. The excitation of $F_A$ overlaps with the propagation of $F_B$; likewise, the propagation of $F_A$ overlaps with the excitation of $F_B$. Thus, the detection of different fault sets proceeds in a pipelined fashion, and the two alternately active sub-scan chains $S_A$ and $S_B$ do not need to be active simultaneously. The serial full-scan mode (where every FF is active in both the shifting phase and the capture phase) is completely avoided, because at any given time during test application, a portion of the FFs, either those in $S_A$ or those in $S_B$, will not be active. In our scan architecture, the excitation and propagation of faults that are partitioned into different fault sets are thus maximally pipelined. This brings two benefits: 1) the effective shifting length of the scan chain is reduced and 2) the percentage of FFs that can switch in each test cycle is reduced. Together, these lead to savings in test application time, average power, and peak power during testing. The main contributions of our work are as follows. 1) We cluster the FFs based on both excitation and observation properties to avoid the serial full-scan phase, which has been unavoidable in previous approaches. In the conventional serial full-scan mode, every FF in the CUT may switch in both the shifting and capture phases, and peak power spikes appear more frequently. 2) The selection of active sub-scan chains in our work is very straightforward, which makes our test controller much simpler than those in previous approaches [6], [10], [12], [16].
Our test controller can be implemented with a simple two-state finite-state machine (FSM), and it does not require additional precomputed control bits for selecting the active sub-scan chains. 3) We introduce the least amount of hardware overhead: no MISR is needed, which also benefits diagnosis. In addition, after scan chain partitioning, the scan order of the FFs within each sub-scan chain does not matter.
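The two-state controller can be sketched as a generator of clock-enable sets. This is a minimal sketch under assumptions taken from the walkthrough of Fig. 2 in Section II (the sub-scan chain shifted together with $S_{dual}$ stays frozen through the following capture and the next shift); the chain labels and the `Step`/`controller` names are hypothetical, and no real controller interface is implied.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterator

# Hypothetical chain labels. Assumed schedule: the chain shifted with S_dual is
# frozen on the following capture and shift, so the other chain captures.
ALL_CHAINS: FrozenSet[str] = frozenset({"S_dual", "S_A", "S_B"})


@dataclass(frozen=True)
class Step:
    state: str               # FSM state for this overlapping scan vector
    shift: FrozenSet[str]    # chains whose clocks are enabled while shifting
    capture: FrozenSet[str]  # chains whose clocks are enabled on the capture cycle


def controller(num_vectors: int) -> Iterator[Step]:
    """Two-state FSM: state A loads S_A, state B loads S_B; S_dual is always on."""
    state = "A"
    for _ in range(num_vectors):
        if state == "A":
            yield Step("A", frozenset({"S_dual", "S_A"}), frozenset({"S_dual", "S_B"}))
            state = "B"
        else:
            yield Step("B", frozenset({"S_dual", "S_B"}), frozenset({"S_dual", "S_A"}))
            state = "A"


if __name__ == "__main__":
    for step in controller(4):
        # No phase enables all three chains, i.e., no serial full-scan mode occurs.
        assert ALL_CHAINS not in (step.shift, step.capture)
        print(step.state, sorted(step.shift), sorted(step.capture))
```

Because the states simply alternate A, B, A, and so on, the sketch needs no per-vector control bits, which is the property claimed for the controller above.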


Fig. 2. Example on pattern application.

The rest of this paper is organized as follows. Section II gives the overview and motivation. Section III explains the underlying methodology for FF partitioning. Results are reported in Section IV, followed by conclusions in Section V.

II. OVERVIEW AND MOTIVATION

Fig. 2 shows how an ordinary full-scan test set can be modified and applied to our overlapping scan architecture, resulting in fewer test cycles, reduced FF switching activity, and no loss in fault coverage. In Fig. 2, each block represents an FF. The numbers above the blocks are the indices of the FFs, and the logic values are shown inside the FFs. For each scan vector, two phases are shown: the phase when scan shifting is completed and the phase when the capture is performed. A pattern-shaded FF in the capture phase indicates that some

fault effect has been propagated to this FF. A gray-shaded FF represents a nonactive FF whose clock is disabled, so it retains its previous value. The top half of Fig. 2 shows the test application procedure in a full-scan architecture with a pregenerated deterministic test set of three scan vectors. The first vector is 10X0XXX, and its capture response is 1101101. In the capture phase, fault effects are detected on FFs #1, #4, and #7. Because there are seven FFs in the design, seven cycles are needed to shift in each scan vector, and an additional cycle is needed to capture the response. The other two vectors can be explained in the same way. The bottom half of Fig. 2 shows how an intelligent partition of the FFs transforms the design into an overlapping scan architecture. Sub-scan chain $S_{dual}$ consists of three FFs: FFs #1, #4,


TABLE I COMPARISON OF ORDINARY FULL SCAN ARCHITECTURE WITH OUR NEW SCAN ARCHITECTURE ON TEST CYCLES AND POWER

and #5, which will be active all the time. The other four FFs, outside of $S_{dual}$, have been divided into two sub-scan chains, $S_A$ and $S_B$: $S_A$ consists of FFs #2 and #6, and $S_B$ consists of FFs #3 and #7. $S_A$ and $S_B$ will not be simultaneously active. The partition shown in this figure is a good partition for the three scan vectors because it meets the following criteria.
1) If a target fault cannot be detected without assigning an FF to play both roles of excitation and observation, this FF needs to be placed into $S_{dual}$ in order to retain the original fault coverage.
2) After $S_{dual}$ is formed, the FFs outside $S_{dual}$ are partitioned in such a way that the FFs in $S_A$ play the role of excitation on some fault set $F_A$, whose fault effects are propagated to $S_B$. Likewise, the FFs in $S_B$ play the role of excitation on another fault set $F_B$, whose fault effects are propagated to $S_A$. This arrangement reduces the number of test cycles that only shift in don't cares and maximizes the pipelining between the excitation and propagation of different fault sets.
3) $F_A \cup F_B$ covers the entire detectable fault set of the CUT.
4) If criterion #3 is not satisfied, most remaining faults can still be detected in a pipelined fashion together with $F_A \cup F_B$.

When criterion #3 is not satisfied, the two sub-scan chains $S_A$ and $S_B$ cannot be fully decoupled for some target faults. The full-scan test set then needs to be modified by shifting in some specified values one vector ahead of time, so that the detection of the remaining faults is pipelined together with that of $F_A \cup F_B$. For example, in our scan architecture, for overlapping scan vector #0, only sub-scan chains $S_{dual}$ and $S_A$ are active in the shifting phase, and $S_A$ will be nonactive for the capture phase of overlapping scan vector #0 and the shifting phase of overlapping scan vector #1. To prepare for overlapping scan vector #1, instead of shifting in an X for FF #6 as specified in full-scan vector #0, the logic value 0 of FF #6 specified in full-scan vector #1 is shifted in. The same rule applies to bit #7 of overlapping scan vector #1.
If the compatibility among segments of the full-scan vectors allows the detection of the remaining faults to be pipelined with that of $F_A \cup F_B$, detecting them adds no test time. Table I compares the performance of the full-scan architecture and our new scan architecture on the example test set shown in Fig. 2. The second column, "Test Cycles," reports the total number of test cycles. The third column, "Total Useless X Cycles," reports the total number of useless X cycles. If a don't-care bit that is shifted in does not shift out any fault effect from the capture response of the previous vector, that bit is defined as a useless X bit, which contributes to neither fault excitation nor fault detection. For example, bits #3, #5, #6, and #7 in full-scan vector #0 and bit #2 in full-scan vector #1 are useless X bits. A test cycle that shifts in a useless X bit is defined as a useless X cycle. The fourth column, "Total FF Dual Role Cycles," reports the total

number of cycles in which shifting a specified bit into an FF for excitation also shifts out a fault effect, i.e., cycles where excitation and propagation are truly pipelined. For example, the cycles for shifting in bits #1 and #4 in full-scan vector #1 are FF dual role cycles. For the example vector set in Fig. 2, the total number of useless X cycles is 7 for the ordinary full-scan architecture and is reduced to 3 in our scan architecture; the total number of FF dual role cycles is 5 for the ordinary full-scan architecture and is increased to 6 in our architecture. The next two columns, under "Switches in Clock Tree," report the number of switches in the clock tree per shifting/capturing cycle. The next six columns, under "Switches in Scan Chain on Capture Cycle," report the number of FFs changing value after the capture cycle. These two measures give a rough estimate of peak power, which includes the power consumed in the scan chain, the combinational part of the CUT, and the clock tree. This number varies with the filling of the Xs, so we give both an upper bound and a lower bound instead of a single value. Take full-scan vector #0 as an example. If the scan load vector is filled as 1000101, then, compared with the capture value 1101101, only two FFs change value after the capture cycle. On the other hand, if the scan load vector is filled as 1010010, six FFs change value. As shown in Table I, our scan architecture can reduce the number of FFs changing value after the capture cycle. Our new overlapping scan architecture is aimed at reducing both the test power and the test application time.
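The cycle classes and the capture-transition bounds used in Table I can be computed mechanically. The following is a minimal sketch with hypothetical function names; it assumes a single serial chain (where the bit destined for FF $i$ enters on the same cycle that FF $i$'s captured value leaves) and, as in the example above, a capture response that does not depend on how the Xs are filled.

```python
from typing import List, Optional, Set, Tuple


def classify_cycles(vector: str,
                    prev_effects: Optional[Set[int]] = None) -> Tuple[List[int], List[int]]:
    """Return (useless_x, dual_role) FF indices (1-based) for one full-scan vector.

    vector uses '0', '1', 'X' for FFs #1..#n; prev_effects holds the FFs that
    captured a fault effect in the previous vector (None for the first vector).
    In a single serial chain, the bit for FF i is shifted in on the same cycle
    that FF i's captured value is shifted out, so bits pair with FFs by index.
    """
    effects = prev_effects or set()
    useless_x = [i for i, b in enumerate(vector, 1) if b == "X" and i not in effects]
    dual_role = [i for i, b in enumerate(vector, 1) if b != "X" and i in effects]
    return useless_x, dual_role


def capture_change_bounds(vector: str, response: str) -> Tuple[int, int]:
    """Lower/upper bound on FFs that change value on the capture cycle.

    Assumes the capture response is fixed regardless of the X fill: each X can
    be filled to match (lower bound) or to differ from (upper bound) its
    captured value, while specified bits contribute the same in both cases.
    """
    if len(vector) != len(response):
        raise ValueError("vector and response must have the same length")
    forced = sum(1 for b, r in zip(vector, response) if b != "X" and b != r)
    return forced, forced + vector.count("X")


if __name__ == "__main__":
    print(classify_cycles("10X0XXX"))                   # -> ([3, 5, 6, 7], [])
    print(capture_change_bounds("10X0XXX", "1101101"))  # -> (2, 6)
```

On full-scan vector #0 of Fig. 2, the sketch reproduces the four useless X bits and the lower and upper bounds of 2 and 6 quoted above.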
Both reductions are achieved via a good partitioning of the FF set (based on both excitation and propagation properties), which can: 1) omit or reduce the useless X cycles that do not contribute to fault detection; 2) maximally pipeline the excitation and propagation of different fault sets; and 3) avoid the serial full-scan phase and allow portions of the scan chain to be disabled in both the shifting and capture phases.

III. UNDERLYING METHODOLOGY

A. New ATPG Flow

Fig. 3 explains the new ATPG flow for our overlapping scan architecture. A full-scan vector set is first generated by an ATPG tool and then analyzed via simulation to form $S_{dual}$; the FFs outside $S_{dual}$ are partitioned into $S_A$ and $S_B$. Then, the ATPG is invoked again to generate another full-scan vector set under two dynamic constraint sets: a control constraint set and an observation constraint set. The control constraint set is a subset of the FFs in a currently nonactive sub-scan chain (either $S_A$ or $S_B$) that have specified logic values from the previous active shifting phase; it tells the ATPG to generate a new vector only under the condition that the values of these FFs are satisfied. The observation constraint set tells the ATPG


Fig. 3. Pattern generation flow of our new scan architecture.

Fig. 4. Example of flip-flop assignments.

the sub-scan chain, either $S_A$ or $S_B$, to which fault effects should not be propagated; the FFs inside this set are blocked from observation. With these two constraint sets, our overlapping scan architecture is emulated within the ATPG engine. After the desired fault coverage is achieved, the full-scan vector set is post-processed to obtain the overlapping vector set.

We explain the pattern generation flow in Fig. 3 via an example. Assume that there are nine FFs in the CUT, partitioned into $S_{dual}$, $S_A$, and $S_B$ as shown in Fig. 4. Under the columns $S_{dual}$, $S_A$, and $S_B$ are the values of the FFs in each sub-scan chain for the current time frame. A value pair denotes the value loaded into the scan chain and the corresponding captured test response. If the test response of a value pair is omitted, no capture operation is performed on that scan chain in this time frame. The constraint columns state the constraints obtained from the current pattern, which will be applied when generating the pattern in the next time frame. All nine FFs start with don't cares, as shown in Fig. 4. In time frame #1, $S_B$ is not active for shifting, so pattern #1 is generated under a control constraint on $S_B$: any test generated here must not place any logic value on the FFs of $S_B$. Next, in time frame #2, $S_A$ is not active for shifting, so the $S_A$ portion of pattern #2 needs to be compatible with that of pattern #1. This is achieved by generating pattern #2 under a control constraint on $S_A$, since the test pattern generated in time frame #1 assigned a logic 0 to an FF in $S_A$. In time frame #3, $S_B$ is not active, so the $S_B$ portion of pattern #3 needs to be compatible with that of pattern #2, and the corresponding control constraint is applied when generating pattern #3. Patterns #4 and #5 can be explained in a similar manner. We use L(chain) and C(chain) to represent the serial shifting operation of loading the pattern into the scan chain and the capture operation of getting the test response. Test application of the generated pattern set on the overlapping scan architecture is shown in the following. The operations performed on the sub-scan chains in each time frame are listed in one column, and the values to be loaded into the active sub-scan chains are listed under the columns $S_{dual}$, $S_A$, and $S_B$.
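The dynamic constraints in the time-frame example can be expressed as a small function of the previous pattern. This is a minimal sketch under stated assumptions: FF names such as `a1` are hypothetical, the odd/even alternation of the shifted chain follows the example above, and an 'X' in a control constraint means the FF was never loaded with a known value. A real flow would hand these sets to the ATPG engine; no ATPG tool interface is shown or implied.

```python
from typing import Dict, List, Tuple

Pattern = Dict[str, str]  # hypothetical FF name -> '0', '1', or 'X'


def shifted_chain(frame: int) -> str:
    # Assumed alternation (1-based frames): S_A is shifted with S_dual in odd
    # frames and S_B in even frames, as in the time-frame example.
    return "S_A" if frame % 2 == 1 else "S_B"


def frame_constraints(frame: int, prev: Pattern,
                      chains: Dict[str, List[str]]) -> Tuple[Dict[str, str], List[str]]:
    """Constraints for generating the pattern of `frame`.

    control: every FF of the chain that is NOT shifted keeps the value loaded in
             the previous frame; 'X' means the new pattern must not rely on it.
    blocked: FFs of the shifted chain, which are frozen on this frame's capture,
             so fault effects must not be propagated to them.
    """
    shifted = shifted_chain(frame)
    frozen = "S_B" if shifted == "S_A" else "S_A"
    control = {ff: prev.get(ff, "X") for ff in chains[frozen]}
    return control, list(chains[shifted])


def compatible(pattern: Pattern, control: Dict[str, str]) -> bool:
    """True if `pattern` respects the control constraint of the frozen chain."""
    for ff, required in control.items():
        value = pattern.get(ff, "X")
        if required == "X" and value != "X":
            return False  # the FF was never loaded with a known value
        if required != "X" and value not in ("X", required):
            return False  # conflicts with the value retained from the last frame
    return True
```

For time frame #1 the previous pattern is empty, so every FF of $S_B$ is constrained to X, matching the rule that pattern #1 must not place any value on $S_B$.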

TABLE II RATIO AND MAXIMAL SPEEDUP OF DIFFERENT BENCHMARK CIRCUITS

B. Algorithm for Forming $S_{dual}$

Some target faults require certain FFs to play both roles of excitation and observation. These FFs are selected and placed into $S_{dual}$ to ensure the fault coverage. Any remaining FF needs to play at most one role (excitation, propagation, or neither) for the detection of the other target faults.

Procedure PickSdual()
    Set target fault set to $F$;
    Generate a full-scan vector set $T$;
    While (not every vector in $T$ has been visited)
        Simulate current vector by dynamically setting specified bits to be nonobservable;
    Output undetected faults to $F_r$;
    Form an observation matrix $M$, using faults in $F_r$ as the $i$ axis and FFs in the CUT as the $j$ axis;
    For each entry $M_{ij}$ in $M$: $M_{ij} = 0$;
    Set target fault set to $F_r$;
    While (not every vector in $T$ has been visited)
        Simulate current vector by dynamically setting specified bits to be nonobservable;
        For each fault with fault index $i$: if the fault effect of fault $i$ has been propagated to the FF of index $j$, set $M_{ij} = 1$;
    Set $S_{dual}$ = solve the minimal cover set problem on matrix $M$;
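The last step of PickSdual() leaves the cover heuristic open, and the discussion of the algorithm notes that a greedy heuristic can trade optimality for speed. Below is a minimal sketch of the classical greedy set-cover heuristic; it is a swapped-in choice, not necessarily the authors' solver. It assumes $M$ is stored sparsely as a mapping from each fault in $F_r$ (the faults left undetected when every FF plays at most one role) to the FFs that observe it; the function name is hypothetical.

```python
from typing import Dict, Set


def pick_sdual_greedy(observation: Dict[str, Set[int]]) -> Set[int]:
    """Greedy approximation of the minimal set cover on the observation matrix M.

    observation maps each fault in F_r to the set of FF indices j with M[i][j] = 1.
    Faults whose row is empty cannot be covered by any FF and are skipped.
    """
    uncovered = {f for f, ffs in observation.items() if ffs}
    chosen: Set[int] = set()
    while uncovered:
        # For every FF, count how many still-uncovered faults it can observe.
        gain: Dict[int, int] = {}
        for f in uncovered:
            for ff in observation[f]:
                gain[ff] = gain.get(ff, 0) + 1
        # Largest gain wins; ties go to the smaller FF index for determinism.
        best = max(gain, key=lambda ff: (gain[ff], -ff))
        chosen.add(best)
        uncovered = {f for f in uncovered if best not in observation[f]}
    return chosen
```

The greedy rule picks, at each step, the FF that observes the most still-uncovered faults; it is fast but only approximates the minimal cover.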

Fig. 5. Example on sparse matrix clustering.

The algorithm first filters out the faults that can be detected under the restriction that each FF plays at most a single role (either excitation or propagation). The target fault set is then reduced to the remaining undetected faults, $F_r$. An observation matrix $M$ is formed, which records the possible propagation of fault effects; each target fault may be observed multiple times on different FFs. Picking $S_{dual}$ thus becomes a minimal set covering problem: picking the minimal number of FFs that covers the entire fault set $F_r$. The algorithm requires at least one iteration of fault simulation on the entire fault set and another iteration on a partial fault set. The complexity of solving the minimal set covering problem can be adjusted by the user, for example, by using a more greedy heuristic to compute the cover more quickly. Let $N_{full}$ and $N_{new}$ denote the number of test cycles for the full-scan architecture and our new architecture, respectively. Let $r$ denote the ratio of the number of FFs in $S_{dual}$ to the total number of FFs in the CUT, and assume that the FFs outside $S_{dual}$ are split evenly between $S_A$ and $S_B$, so that each vector is shifted through a fraction $r + (1-r)/2$ of the full chain. Then, (1) gives the potential speedup that the new scan architecture can bring over the ordinary full-scan architecture:

$$\text{Speedup} = \frac{N_{full} - N_{new}}{N_{full}} = 1 - \left(r + \frac{1-r}{2}\right) = \frac{1-r}{2}. \tag{1}$$
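The speedup formula can be checked numerically. With $r$ the fraction of FFs in $S_{dual}$ and $S_A$, $S_B$ of equal size, the speedup reduces to $(1-r)/2$, ignoring capture cycles. This short sketch (hypothetical function name) evaluates it for the s5378 ratio quoted with Table II and for the 7-FF example of Figs. 1 and 2, where $S_{dual}$ holds FFs #1, #4, and #5 and the pattern width drops from 7 to 5.

```python
def max_speedup(r: float) -> float:
    """Speedup = (N_full - N_new) / N_full = (1 - r) / 2.

    r is the fraction of FFs in S_dual. The FFs outside S_dual are assumed to be
    split evenly between S_A and S_B, and capture cycles are ignored.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return (1.0 - r) / 2.0


if __name__ == "__main__":
    print(f"{max_speedup(0.37):.3f}")   # s5378 in Table II: 0.315
    print(f"{max_speedup(3 / 7):.3f}")  # Figs. 1-2: width 7 -> 5, i.e., 2/7 ≈ 0.286
```

The second case matches the direct count: shifting 5 instead of 7 bits per vector saves 2/7 of the shift cycles.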

The ratio $r$ is related to the structure of the CUT. Table II shows the ratios $r$ and the corresponding maximal speedups that can be obtained on selected benchmark circuits with more than 100 FFs. The ratios are listed both for when no loss of fault coverage is allowed and for when a minor loss is allowed. For example, the ratio for s5378 is 37%, giving a speedup of 31.5%. A cutoff value $\alpha$ is set to adjust the size of $S_{dual}$: instead of covering every fault in $F_r$ (the faults that remain undetected when each FF plays at most one role), the cover is only required to satisfy

$$\left|\{f \in F_r : f \text{ is observed on some FF in } S_{dual}\}\right| \ge \alpha\,|F_r|. \tag{2}$$

Let $\beta = |F_r|/|F|$, where $F$ is the set of faults detected by the full-scan test set. Faults in $F_r$ that are left uncovered are lost, so the coverage loss is at most $(1-\alpha)\beta$; in particular, (2) indicates that when $\alpha$ is 1.0, there will be no coverage loss. If a reduction in fault coverage is acceptable, then the number of cells in $S_{dual}$ can be reduced, which leads to a reduction in test application time and power consumption. The normal value for $\beta$ is around 25%. As shown in the second row of Table II, when $\alpha$ is set to 0.9 and a maximal coverage loss of 2.5% is acceptable, the size of $S_{dual}$ can be greatly reduced.

C. Partitioning FFs outside $S_{dual}$

The partitioning of FFs outside $S_{dual}$ is based on excitation relations: we prefer to place FFs in the same partition if they need to be simultaneously specified for exciting certain faults. An excitation matrix $E$ is formed by replacing every specified bit in the scan test set with the integer value 1, while all don't-care bits are ignored (left as zero entries). Since a large portion of the bits (sometimes more than 90%) in a full-scan test set are don't cares, only a small fraction of the entries of $E$ are 1; in other words, $E$ is a sparse matrix. A sparse matrix reordering algorithm for cluster identification [18] is then applied to $E$, such that FFs that need to be simultaneously controlled for fault excitation are grouped together. Fig. 5 gives a clustering example on a scan test set consisting of seven vectors and eight FFs. The numbers on the left side of the matrix indicate the indices of the vectors, and those above the matrix indicate the indices of the FFs. After $S_{dual}$ is formed, partitioning the remaining FFs in different ways no longer affects the number of undetectable faults, but it can influence the number of test cycles needed to achieve a desired test coverage.

D. Avoidance of the Serial Full-Scan Mode

In this subsection, we provide the underlying theory for avoiding the serial full-scan mode with the new overlapping scan architecture. Let the full-scan vector set be $T$, and let the fault set detected by $T$ be $F$. Assume the scan FFs are partitioned into $S_{dual}$, $S_A$, and $S_B$ using the previously presented algorithms.
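The partitioning assumed here (Section III-C) starts from the excitation matrix $E$. The sketch below builds $E$ for the FFs outside $S_{dual}$ and then splits them with a simple greedy, balanced two-way assignment driven by co-occurrence counts ($E^T E$). This greedy split is a hypothetical stand-in, not the sparse matrix reordering algorithm of [18]; the balance constraint is an added assumption so that $S_A$ and $S_B$ have similar shift lengths, and the function names are illustrative.

```python
from math import ceil
from typing import List, Set, Tuple


def excitation_matrix(tests: List[str], ffs: List[int]) -> List[List[int]]:
    # One row per vector, one column per FF outside S_dual (1-based indices):
    # 1 for a specified bit, 0 for a don't care.
    return [[0 if t[ff - 1] == "X" else 1 for ff in ffs] for t in tests]


def partition_outside_sdual(tests: List[str], ffs: List[int]) -> Tuple[Set[int], Set[int]]:
    """Greedy balanced split: FFs specified together tend to land on one side."""
    E = excitation_matrix(tests, ffs)
    n = len(ffs)
    # co[a][b] = number of vectors specifying both ffs[a] and ffs[b] (E^T E).
    co = [[sum(row[a] * row[b] for row in E) for b in range(n)] for a in range(n)]
    cap = ceil(n / 2)
    sides: Tuple[List[int], List[int]] = ([], [])
    # Visit the most frequently specified FFs first (diagonal of E^T E).
    for k in sorted(range(n), key=lambda k: (-co[k][k], k)):
        scores = [sum(co[k][j] for j in side) for side in sides]
        open_sides = [s for s in (0, 1) if len(sides[s]) < cap]
        # Prefer the side with stronger co-occurrence, then the smaller side.
        best = max(open_sides, key=lambda s: (scores[s], -len(sides[s])))
        sides[best].append(k)
    return {ffs[k] for k in sides[0]}, {ffs[k] for k in sides[1]}
```

In this sketch, two FFs that are always specified in the same vectors end up in the same sub-scan chain, which is the grouping goal stated for the excitation matrix.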


TABLE III COMPARISON OF POWER AND TEST APPLICATION TIME AMONG DIFFERENT SCAN ARCHITECTURES

Test power is computed from the transitions in the CUT, including transitions in the clock tree.

Proposition 1: For every fault $f$ in $F$, the corresponding fault effect can be observed by at least one pattern in $T$ on at least one FF in $S_{dual} \cup S_A \cup S_B$.

Proposition 1 is straightforward: if $f$ is detected by $T$, at least one vector in $T$ must excite $f$ and propagate its fault effect to an FF.

Proposition 2: For a fault $f$ in $F$, under the condition that the cutoff value $\alpha$ equals 1.0, if an FF $x$ to which the fault effect of $f$ has propagated is not selected into $S_{dual}$, then there exists at least one vector that can detect $f$ without needing to specify a value on FF $x$. In other words, FF $x$ contributes only to fault observation, not to fault excitation, for fault $f$. Otherwise, $x$ would have been placed into $S_{dual}$ in the overlapping scan architecture to avoid any coverage loss.

Theorem 1: Given a sub-scan chain partition in which $S_A$ and $S_B$ are nonempty, for every fault $f$ in $F$, its detection does not require the complete full scan, where the full scan is defined as serially shifting the test pattern into $S_{dual} \cup S_A \cup S_B$ and capturing on all three sub-scan chains.

Proof: Case 1: If an FF $x$ to which the fault effect of $f$ has propagated is in $S_{dual}$, any full-scan pattern that can detect $f$ can be fed into the overlapping scan architecture by using either: 1) $L(S_{dual} \cup S_A)\,C(S_{dual} \cup S_B)$ (if all FFs in $S_B$ are Xs); 2) $L(S_{dual} \cup S_B)\,C(S_{dual} \cup S_A)$ (if all FFs in $S_A$ are Xs); or 3) $L(S_{dual} \cup S_A)\,C(S_{dual} \cup S_B)\,L(S_{dual} \cup S_B)\,C(S_{dual} \cup S_A)$ (if some FFs in both $S_A$ and $S_B$ need to be specified). Since $S_{dual}$ is always active on capture cycles, the fault effect is guaranteed to be detected, and neither a shift nor a capture needs to engage all three sub-scan chains at any time.

Case 2: If an FF $x$ to which the fault effect of $f$ has propagated is in $S_A$, then by Proposition 2 there exists at least one full-scan pattern that can detect $f$ without specifying the value of $x$. If the excitation of $f$ requires no specified values in $S_A$, which is the ideal case, the pattern can be fed into the overlapping scan architecture by using $L(S_{dual} \cup S_B)\,C(S_{dual} \cup S_A)$. Otherwise, if the excitation of $f$ requires some specified values in $S_A$, the pattern can be fed into the overlapping scan architecture by using $L(S_{dual} \cup S_A)\,C(S_{dual} \cup S_B)\,L(S_{dual} \cup S_B)\,C(S_{dual} \cup S_A)$. This guarantees that fault $f$ can be excited and the fault

effect can be observed on $S_A$ after the second capture cycle.

Case 3: If an FF $x$ to which the fault effect of $f$ has propagated is in $S_B$, then, again by Proposition 2, there exists at least one full-scan pattern that can detect $f$ without specifying the value of $x$. If the excitation of $f$ requires no specified values in $S_B$, which is the ideal case, the pattern can be fed into the overlapping scan architecture by using $L(S_{dual} \cup S_A)\,C(S_{dual} \cup S_B)$. If the excitation of $f$ requires some specified values in $S_B$, the pattern can be fed into the overlapping scan architecture by using $L(S_{dual} \cup S_B)\,C(S_{dual} \cup S_A)\,L(S_{dual} \cup S_A)\,C(S_{dual} \cup S_B)$. This guarantees that fault $f$ can be excited and the fault effect can be observed on $S_B$ after the second capture cycle. In all three cases, at any time during test application, a subset of the FFs stays inactive. Thus, the serial full-scan mode is avoided during the entire test application.

IV. EXPERIMENTAL RESULTS

We conducted experiments on the ISCAS89 [17] benchmark circuits with more than 100 FFs. In Table III, three different scan architectures are compared on test cycles, test peak power, and test average power. The columns "Circuit" and "FF" report the name of the circuit and the number of FFs, respectively. The columns "Desired Cov," "Ptn# No Merge," and "Ptn# Max Merge" report the test coverage and the sizes of the noncompacted and compacted test sets. The columns under "Full Scan Architecture" report the results for the ordinary full-scan architecture, where Cyc, PP, and AP abbreviate test cycles, peak power, and average power, respectively. The columns under "ILS" report the results for the Illinois scan with the number of parallel scan chains in the broadcast mode set to 2, and the columns under "Our Scan Architecture" report those of our overlapping scan architecture. A commercial ATPG tool, TetraMAX [15] (any other combinational ATPG tool can play the same role), was used to generate vectors for all three scan architectures and to obtain the most compact test sets (with the compact option set).
We compare the test cycle reduction that our overlapping scan architecture brings on the most compact test set, because this additional reduction is what static/dynamic compaction cannot achieve. The peak and average powers are reported as the sum of weighted switches on the internal gates of the CUT. Each clock


TABLE IV COMPARISON OF TEST TIME AND AVERAGE POWER WITH [16]

TABLE V COMPARISON ON PEAK POWER SPIKE POSITION AND NUMBER OF SERIAL FULL-SCAN MODES AMONG DIFFERENT SCAN ARCHITECTURES

Test power is based on the number of scan chain transitions

signal on an active FF is counted as two switches per clock cycle. Don't-care bits in the test set are filled with the same logic value as the neighboring specified bit to minimize the rippling effect during shifting.

From Table III, we can see that ILS is very efficient in reducing test application time. With two parallel scan chains in the broadcast mode, it already outperforms both the ordinary full-scan architecture and ours in terms of test cycles. Note that the number of parallel scan chains in ILS can be increased to further reduce test time. However, ILS increases average/peak power relative to the ordinary full-scan architecture on benchmark circuits such as s9234 and s38417; the average increase in peak power is 25.2%. Compared with the ordinary full-scan architecture, our overlapping scan architecture performs better on every circuit in average power, peak power, and test time. Take s13207 as an example. Our scan architecture achieves the same coverage as the full-scan architecture with 125782 test cycles, a peak power of 3989, and an average power of 1158, which are 29%, 32%, and 52% less, respectively, than those of the ordinary full-scan architecture. For all the circuits listed in Table III, average reductions of 22.8%, 41.6%, and 25% were achieved in peak power, average power, and test application time, respectively.

We also compared our approach with [16]. In [16], FFs are partitioned into sub-scan chains according to their excitation properties, and the test responses are selectively observed on different sub-scan chains. The test responses in the nonobserved sub-scan chains act as part of the test stimuli for the next vector. In Table IV, the columns under "[16] Compared with Ordinary Full Scan" report the results from [16], where Pat #, Cyc #, and AP Red% denote the number of vectors, the number of cycles, and the percentage of average power reduction compared with the ordinary full-scan architecture.
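The power bookkeeping used for Table III can be sketched as follows. This is a minimal sketch under assumptions: "neighboring specified bit" is read as the nearest preceding specified bit in scan order, the per-cycle weighted gate-switch totals are taken as given (the gate weights are not specified in the text), and all function names are hypothetical. Applied to full-scan vector #0 of Fig. 2, the fill yields 1000000, which has a single transition along the chain.

```python
from typing import Sequence, Tuple


def neighbor_fill(vector: str) -> str:
    """Fill each X with the nearest preceding specified bit.

    Leading Xs take the first specified bit; an all-X vector becomes all 0s.
    """
    last = next((b for b in vector if b != "X"), "0")
    filled = []
    for b in vector:
        if b != "X":
            last = b
        filled.append(last)
    return "".join(filled)


def shift_transitions(filled: str) -> int:
    # Adjacent-bit changes in a filled vector: a rough proxy for shift rippling.
    return sum(1 for a, b in zip(filled, filled[1:]) if a != b)


def peak_and_average(gate_switches: Sequence[float],
                     active_ffs: Sequence[int]) -> Tuple[float, float]:
    """Per-cycle power = weighted gate switches + 2 clock switches per active FF."""
    if not gate_switches or len(gate_switches) != len(active_ffs):
        raise ValueError("need one nonempty, equal-length entry per cycle")
    per_cycle = [g + 2 * a for g, a in zip(gate_switches, active_ffs)]
    return max(per_cycle), sum(per_cycle) / len(per_cycle)
```

Because frozen FFs receive no clock, fewer active FFs per cycle lower both the peak and the average under this model, which is the mechanism behind the reductions reported above.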
The columns under "Our Work Compared with [16]" report the results from our work. There, Pat #, Cyc Red%, and AP Red% are the number of vectors, the percentage of cycles reduced, and the percentage of average power reduced compared with [16]. [16] and our work may have used different ATPG tools and hence different pattern sets. However, both introduce a new scan architecture to reduce test time and test power after compaction, so it is fair to compare them on the same set of benchmark circuits. As seen from this table, our approach achieves an average test time reduction of 45.52% over all the benchmark circuits tested compared with [16]. Our average power, however, is 40.62% higher, although it is still significantly lower than that of the full-scan architecture. This indicates that there is a tradeoff between test time and test power when the total energy is to be minimized, and the architecture should be chosen based on the user's application. The reported area overhead in [16] is less than 0.5%, and the area overhead in our work is less than 0.2% for all benchmark circuits tested. Due to page limitations, the area overhead data are not listed explicitly in Table IV.

In Table V, we report two quantities:
1) the phase in which peak power occurs, under the column PS;
2) the number of full-scan test vectors necessary to achieve full fault coverage, under the column SF#.

Under the PS columns, SC denotes the capture phase of the serial full scan and SS the shifting phase of the serial full scan. BC and BS denote the capture and shifting phases of the broadcast mode, and PC denotes the capture phase of the partial scan mode in our work. We observe from this table that our overlapping scan architecture is the only scan architecture that completely avoids the serial full-scan mode. Taking Tables III and V together, the broadcast mode in ILS does not help reduce peak power. In contrast, freezing a portion of the FFs in both the shifting and capture phases, as in our work, reduces both average and peak power.

A. Contributions
We would like to address the following points.
1) The test controller can be configured to support both the overlapping scan mode and the full-scan mode.
2) In Section III-B, a pregenerated full-scan test set is used to extract the FF set. This set does not need to be re-extracted every time a new set of overlapping scan vectors is wanted. Given a full-scan test set, the extracted set reveals the excitation and propagation properties of the FFs, which are determined by the structure of the CUT. Different full-scan test sets may yield different extracted sets, but these sets behave the same in our overlapping scan architecture. Therefore, to generate an additional set of overlapping scan vectors, start from step 4 in Fig. 3 rather than from step 1.
3) The size of the extracted FF set is closely related to the potential speedup of test time and the potential test power reduction. A smaller set indicates more test time and test power reduction. To obtain a smaller set, a nonmerged full-scan pattern set is preferred.
4) No special ATPG tool is needed. Any combinational ATPG tool can be slightly modified to support adding constraints on PIs and masking propagation on POs.


5) Observing only a subset of the scan FFs in the overlapping scan mode may reduce the coverage of nontarget defects [14]. However, many other tests, such as delay tests, still use the full-scan mode. Nontarget defects that escape the single stuck-at fault tests in the overlapping scan mode may therefore be caught by these full-scan tests. The method proposed in [14] can be used to predict the defective part level for the pattern set.
6) Vector application on our overlapping scan architecture is straightforward. Other scan chain disabling techniques may need a different number of shifting cycles for different vectors, plus additional data bits to record this information. In our overlapping scan architecture, the number of shifting cycles is constant once the scan chain is configured. The test controller can therefore simply be programmed, rather than requiring an additional control bit for each pattern.

V. CONCLUSION
A new overlapping scan architecture for reducing both test application time and test power has been presented. In the proposed scan architecture, the FFs are intelligently partitioned into three scan chains based on the roles they play for the fault set. Maximizing the overlap between the excitation and propagation roles allows the entire set of detectable faults in the CUT to be detected with only a portion of the FFs active at a time. It thus completely eliminates the serial full-scan mode, which can be a potential threat for peak power. Three quantities are reduced simultaneously: the shift-in cycles for don't-care bits, the switching inside the clock tree, and the percentage of FFs that can potentially switch during test application.
Experimental results show that an average of 22.8% peak power reduction, 41.6% average power reduction, and an average of 25.0% test application time reduction can be achieved compared with the ordinary full-scan architecture, at the same fault coverage and with minimal hardware overhead.

REFERENCES
[1] L. Whetsel, "Adapting scan architecture for low power operation," in Proc. Int. Test Conf., 2000, pp. 863–872.
[2] T. Feng, Li-C. Wang, K.-T. Cheng, M. Pandey, and M. Abadir, "Enhanced symbolic simulation for efficient verification of embedded array systems," in Proc. Asia South Pacific Des. Autom. Conf., 2003, pp. 302–307.
[3] G. Parthasarathy, M. Iyer, T. Feng, Li-C. Wang, K.-T. Cheng, and M. Abadir, "Combining ATPG and symbolic simulation for efficient validation of embedded array systems," in Proc. Int. Test Conf., 2002, pp. 203–212.
[4] R. Sankaralingam, B. Pouya, and N. Touba, "Reducing power dissipation during test using scan chain disable," in Proc. VLSI Test Symp., 2001, pp. 319–324.
[5] I. Hamzaoglu and J. Patel, "Reducing test application time for full scan embedded cores," in Proc. Fault-Tolerant Comput., 1999, pp. 260–267.
[6] J. Chen, C. Yang, and K. Lee, "Test pattern generation and clock disabling for simultaneous test time and power reduction," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 22, no. 3, pp. 363–370, Mar. 2003.

[7] Y. Zorian, "A distributed BIST control scheme for complex VLSI devices," in Proc. VLSI Test Symp., 1993, pp. 4–9.
[8] D. Baik, K. Saluja, and S. Kajihara, "Random access scan: A solution to test power, test data volume and test time," in Proc. Int. Conf. VLSI Des., 2004, pp. 883–888.
[9] L. Xu, Y. Sun, and H. Chen, "Scan array solution for testing power and testing time," in Proc. Int. Test Conf., 2001, pp. 652–659.
[10] I. Lee, Y. Lin, and A. Ambler, "Reduction of power and test time by removing cluster of don't-care from test data set," in Proc. Int. Conf. VLSI Des., 2005, pp. 255–256.
[11] A. Chandra and K. Chakrabarty, "A unified approach to reduce SOC testing data volume, scan power and test time," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 22, no. 3, pp. 352–362, Mar. 2003.
[12] I. Lee, Y. Hur, and A. Ambler, "The efficient multiple scan architecture reducing power dissipation and test time," in Proc. Asian Test Symp., 2004, pp. 94–97.
[13] R. Sankaralingam and N. Touba, "Controlling peak power during scan testing," in Proc. VLSI Test Symp., 2002, pp. 153–159.
[14] J. Dworak, J. Wicker, S. Lee, M. Grimaila, R. Mercer, K. Butler, B. Stewart, and L. Wang, "Defect-oriented testing and defective-part-level prediction," IEEE Des. Test Comput., vol. 18, no. 1, pp. 31–41, Jan. 2001.
[15] Synopsys, TetraMAX ATPG (1999). [Online]. Available: http://www.synopsys.com/products/test/tetramax_wp.html
[16] O. Sinanoglu and A. Orailoglu, "A novel scan architecture for power-efficient, rapid test," in Proc. Int. Conf. Comput.-Aided Des., 2002, pp. 299–303.
[17] F. Brglez, D. Bryan, and K. Kozminski, "Combinational profiles of sequential benchmark circuits," in Proc. Int. Symp. Circuits Syst., 1989, pp. 1929–1934.
[18] C. Chemuell, Sparse Matrix Reordering Algorithms for Cluster Identification, (2005). [Online]. Available: http://www.osl.iu.edu/

Xiaoding Chen (S'01–M'06) received the B.S. and the M.S. degrees in radio engineering from Southeast University, Nanjing, China, in 1998 and 2000, respectively, and the Ph.D. degree in computer engineering from Virginia Polytechnic Institute and State University, Blacksburg, in 2006. Currently, she is with Synopsys Inc., Mountain View, CA. Her research interests include verification and testing of digital circuits.

Michael S. Hsiao (S'95–M'97–SM'04) received the B.S. degree in computer engineering (highest honors), and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois at Urbana-Champaign, Urbana, in 1992, 1993, and 1997, respectively. He was a Visiting Scientist at NEC USA, Princeton, NJ, during the summer of 1997. Between 1997 and 2001, he was an Assistant Professor in the Department of Electrical and Computer Engineering, Rutgers, the State University of New Jersey, Piscataway, NJ. Between 2001 and 2006, he was an Associate Professor in the Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg. During the summer of 2002, he was a Visiting Professor at Intel, Santa Clara, CA. His current research interests include VLSI testing, design verification, diagnosis, and power management. He and his research group have published more than 140 refereed journal and conference papers. He has served on the program committees of more than 20 IEEE International Conferences and Workshops, in addition to serving on editorial boards of several journals. Dr. Hsiao was a recipient of the Digital Equipment Corporation Fellowship, the McDonnell Douglas Scholarship, and the National Science Foundation CAREER Award.
