
Fault-tolerant design

Verification Testing Design for testability Built-in-self-test Concurrent checking

Types of testing
When is testing performed? On-line (concurrent) testing, off-line testing.
Where is the source of stimuli? Self-testing, external testing (tester).
What do we test for? Design verification, acceptance testing for fabrication errors, etc.
How are the stimuli applied? Fixed order, adaptive testing.
What are the observed results? Entire output patterns, or some function of the output (compact testing/signature).
What lines are accessible for testing? Only I/O, or I/O and internal lines.
Who checks the results? Self-checking, external testing.

Nature of faults
Permanent: always present.
Intermittent: recurs at intervals.
Transient: occurs once and is gone.

[Figure: fault and error]

Self-checking circuits
Fault model (a model describing the nature of the faults)
Error detecting codes
Totally self-checking property
Self-testing property
Fault-secure property
Self-checking checker
SOM-based checker
[Figure: circuit and checker]

Error Detecting Codes


During information processing or storage, data can be corrupted by physical defects in the system, so the system needs some provision for detecting erroneous bits. This typically requires additional (redundant) bits to be appended to the data. The length of the encoded data, known as a code word, is therefore greater than that of the original data. Appending check bits to the information bits is called encoding; the opposite process, extracting the original information bits from a code word, is called decoding. The ratio of the number of information bits to the number of code word bits is the code rate. An n-bit code obtained by encoding k information bits has $2^k$ valid code words, $2^n - 2^k$ invalid (non-code) words, and code rate k/n.

Error Detecting Codes


The primary requirements of a code are as follows:
It detects all likely errors.
It achieves the desired degree of error detection with minimum redundancy.
The encoding and decoding processes are fast and simple.
Codes can be classified as either separable or nonseparable. In a separable code the information bits can be identified separately from the check bits. In a nonseparable code the information bits are embedded in the code word and can be extracted only by a specific decoding procedure. A separable code with k information bits is said to be systematic if all $2^k$ patterns of information bits occur in code words.

Parity Code
The parity code is obtained by counting the number of 1s in the information bits and appending a 0 or a 1 to make the total count odd or even. Odd parity is generally preferred because it ensures at least a single 1 in any code word. A parity check can detect only an odd number of errors. The error detecting capability of the parity checker can be extended by including a parity bit for each byte of information bits. More sophisticated parity-oriented solutions partition the information bits into several blocks, with each bit appearing in more than one block, and compute the parity of each block. Such overlapping not only detects errors of more than 1 bit but, in the case of a single erroneous bit, also identifies the location of that bit.
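The single-parity scheme can be sketched in a few lines of Python. The function names (`parity_bit`, `encode`, `check`) are illustrative choices, not taken from the slides. The sketch also shows why an even number of bit errors goes undetected.

```python
def parity_bit(info_bits, odd=True):
    """Return the check bit that makes the total number of 1s odd (or even)."""
    even_bit = sum(info_bits) % 2          # bit that makes the total count even
    return 1 - even_bit if odd else even_bit


def encode(info_bits, odd=True):
    """Append the parity bit to the information bits (separable code)."""
    return list(info_bits) + [parity_bit(info_bits, odd)]


def check(word, odd=True):
    """True if the word has the expected parity, i.e. no error is detected."""
    return sum(word) % 2 == (1 if odd else 0)


word = encode([0, 0, 0, 0])                # odd parity guarantees at least one 1

single_error = word.copy()
single_error[0] ^= 1                       # one flipped bit: detected

double_error = single_error.copy()
double_error[1] ^= 1                       # two flipped bits: not detected
```

Here `check(single_error)` is false while `check(double_error)` is true, which is the limitation that motivates per-byte and overlapping parity.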

Multiple Error Detecting Codes


A single-bit error detecting code can handle only single random errors. However, in many cases the errors that occur in logic circuits and memory systems are multiple. Multiple errors belong to one of the following classes: symmetric, asymmetric, and unidirectional.
Symmetric errors: both 0->1 and 1->0 errors can occur with equal probability in a code word.
Asymmetric errors: only one type of error, 0->1 or 1->0 but not both, can occur in any code word.
Unidirectional errors: both 0->1 and 1->0 errors can occur, but not simultaneously in the same code word.

Unidirectional Error Detecting Codes - Definitions


Let X and Y be two binary k-tuples. Denote by N(X,Y) the number of 1->0 crossovers from X to Y, that is, the number of positions in which X has a 1 and Y has a 0.
X = 101010, Y = 110101: N(X,Y) = 2, N(Y,X) = 3

The number of bits in which two distinct binary vectors differ is known as the Hamming distance: d(X,Y) = N(X,Y) + N(Y,X). A word X = (x1, ..., xk) covers another word Y = (y1, ..., yk), written $X \ge Y$, if yi = 1 implies xi = 1 for i = 1, 2, ..., k. In other words, the positions of 1s in Y are a subset of the positions of 1s in X. For example, X = 101010 covers Y = 101000, so $X \ge Y$.

If X does not cover Y, and Y does not cover X, then X and Y are unordered. A code in which no code word is covered by any other code word is called an unordered code.
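These definitions translate directly into code. The following is a minimal sketch in which words are written as bit strings; the helper names (`crossovers`, `hamming`, `covers`, `is_unordered_code`) are assumptions made for illustration.

```python
from itertools import combinations


def crossovers(x, y):
    """N(X, Y): number of positions where X has a 1 and Y has a 0."""
    return sum(1 for a, b in zip(x, y) if a == "1" and b == "0")


def hamming(x, y):
    """d(X, Y) = N(X, Y) + N(Y, X)."""
    return crossovers(x, y) + crossovers(y, x)


def covers(x, y):
    """X covers Y if every 1 of Y is also a 1 of X, i.e. N(Y, X) = 0."""
    return crossovers(y, x) == 0


def is_unordered_code(words):
    """True if no code word covers any other code word."""
    return all(not covers(a, b) and not covers(b, a)
               for a, b in combinations(words, 2))


X, Y = "101010", "110101"
n_xy, n_yx = crossovers(X, Y), crossovers(Y, X)   # the example from the text
```

A unidirectional error changes bits in one direction only, so it always produces a word that covers, or is covered by, the original. That is why `is_unordered_code` is the property that matters for unidirectional error detection.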

Unordered Codes for Unidirectional Error Detecting


Many faults in VLSI circuits have been found to cause unidirectional errors. This has led to the development of several unidirectional error detecting codes. An unordered code is capable of detecting all unidirectional errors, because in such a code a unidirectional error cannot transform one code word into another code word. Unordered codes can be separable or nonseparable. For example, both the m-out-of-n code and the Berger code are unordered, but the former is nonseparable and the latter is separable. Both codes detect single errors and unidirectional multiple errors.

m-out-of-n Codes
In an m-out-of-n code, all valid code words have exactly m 1s and (n-m) 0s. The total number of code words is $n!/((n-m)!\,m!)$. If m = k and n = 2k, we have the popular k-out-of-2k code. A special case of the k-out-of-2k code, consisting of only $2^k$ of the possible $(2k)!/(k!\,k!)$ code words, is known as the k-pair two-rail code. Each code word of this code has k information bits and k check bits, which are the bit-by-bit complements of the information bits. The 2-pair two-rail code consists of the following code words: 0011, 1001, 0110, 1100. If $m = \lfloor n/2 \rfloor$, the m-out-of-n code is optimal: no other unordered code of length n has more code words than the $\lfloor n/2 \rfloor$-out-of-n code. An important subset of m-out-of-n codes is the 1-out-of-n code, in which exactly 1 bit of an n-bit code word is 1 and the remaining bits are all 0s.
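As a small illustration, the code words of an m-out-of-n code and of the k-pair two-rail code can be enumerated as follows. The function names `m_out_of_n` and `two_rail` are hypothetical, chosen for this sketch only.

```python
from itertools import combinations, product
from math import comb


def m_out_of_n(m, n):
    """All n-bit words with exactly m ones."""
    words = []
    for ones in combinations(range(n), m):
        words.append("".join("1" if i in ones else "0" for i in range(n)))
    return words


def two_rail(k):
    """k-pair two-rail code: k information bits followed by their complements."""
    words = []
    for info in product("01", repeat=k):
        check = ["1" if b == "0" else "0" for b in info]
        words.append("".join(info) + "".join(check))
    return words


code_2_of_4 = m_out_of_n(2, 4)      # 4!/(2! 2!) = 6 code words
rail_2 = two_rail(2)                # 2^2 = 4 of those 6 code words
```

Every two-rail word has exactly k ones, so `rail_2` is a subset of `code_2_of_4`, as the text states.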

Berger Code
A Berger code of length n has k information bits and c check bits, where $c = \lceil \log_2 (k + 1) \rceil$ and n = k + c. It is the least redundant unordered code for detecting single and unidirectional multiple errors. A code word is constructed by forming the binary number corresponding to the number of 1s in the information bits, and appending the bit-by-bit complement of that binary number as check bits to the information bits. For example, if the information bits are 0101000 (k = 7), then $c = \lceil \log_2 (7 + 1) \rceil = 3$ and the Berger code word has length 10 (= 7 + 3). The c check bits are derived as follows: the number of 1s in the information bits is 2 (binary 010); the bit-by-bit complement of 010 is 101, which gives the check bits. Thus the code word is 0101000 101 (information bits followed by check bits).

Berger Code
The c check bits may also be the binary number representing the number of 0s in the k information bits. Thus, the check bits of the Berger code can be generated by two different schemes. The scheme that uses the bit-by-bit complement of the binary representation of the number of 1s in the information bits is known as the B1 encoding scheme. The other scheme, which uses the binary representation of the number of 0s in the information bits as check bits, is known as the B0 scheme.
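Both schemes are easy to sketch. In the Python below, the function names are assumptions, and `k.bit_length()` is used as an integer-only way of computing $\lceil \log_2 (k + 1) \rceil$ for k >= 1.

```python
def check_length(k):
    """c = ceil(log2(k + 1)), computed without floating point."""
    return k.bit_length()


def berger_b1(info):
    """B1 scheme: complement of the binary count of 1s in the information bits."""
    c = check_length(len(info))
    count = format(info.count("1"), f"0{c}b")
    return "".join("1" if b == "0" else "0" for b in count)


def berger_b0(info):
    """B0 scheme: binary count of 0s in the information bits."""
    c = check_length(len(info))
    return format(info.count("0"), f"0{c}b")


info = "0101000"                     # the example from the text, k = 7
code_word = info + berger_b1(info)
```

For this example both schemes give the same check bits, because k = 7 = 2^3 - 1 makes the complement of the 1s count equal to the 0s count. For other lengths they differ: with information bits 1100, B1 gives 101 and B0 gives 010.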

Smith code
When some subset of the code words is unordered and another subset is ordered, the Smith code can be applied. Let the set of code words be {11111, 11100, 00101, 00110, 10001, 11000, 00100, 00000}. The idea of Smith encoding is to make unordered not every pair of vectors but only those pairs that are ordered. In our example the Hasse diagram has 4 ordered chains:
1 = {11111, 11100, 11000, 00000}
2 = {11111, 00101, 00100, 00000}
3 = {11111, 00110, 00100, 00000}
4 = {11111, 10001, 00000}
Each chain can be encoded independently. Consequently, it is possible to encode each level of the Hasse diagram as follows:

Hasse diagram for the Smith code


Berger vs. Smith encoding


INFORMATION BITS     Berger       Smith
y1 y2 y3 y4 y5       H1 H2 H3     s1
 1  1  1  1  1        0  0  0      0
 1  1  1  0  0        0  0  1      0
 0  0  1  0  1        0  1  1      0
 0  0  1  1  0        0  1  0      0
 1  0  0  0  1        0  1  1      0
 1  1  0  0  0        0  1  0      1
 0  0  1  0  0        0  1  1      1
 0  0  0  0  0        1  0  0      1

Self-Checking Combinational Circuits Design


Self-checking can be defined as the ability to verify automatically whether there is any fault in the logic, without the need for externally applied test stimuli. Self-checking circuits allow on-line error detection; that is, faults can be detected during the normal operation of the circuit. One way to achieve a self-checking design is through the use of error detecting codes.

Principles of Self-checking
Let a circuit have m primary input lines and n primary output lines. Then the $2^m$ binary vectors of length m form the input space X of the circuit. The output space Z is similarly defined as the set of $2^n$ binary vectors of length n. During normal (fault-free) operation the circuit receives only a subset of X, called the input code space, and produces a subset of Z, called the output code space. Members of a code space are called code words. A non-code word at the output indicates the presence of a fault in the circuit. However, a fault may also result in an incorrect code word at the output, rather than a non-code word, in which case the fault is undetectable.

Principles of Self-checking
A circuit may be designed to be self-checking only for an assumed set of faults. Such a set usually includes single stuck-at faults and unidirectional multiple faults. A single stuck-at fault assumes that a physical defect in a logic circuit results in one of the signal lines in the circuit being fixed at either logic 0 (stuck-at-0) or logic 1 (stuck-at-1). If more than one signal line in the circuit is stuck-at-1 or stuck-at-0 at the same time, the circuit is said to have a multiple stuck-at fault. A variation of the multiple fault is the unidirectional fault: a multiple fault is unidirectional if all its constituent faults are stuck-at-0, or all are stuck-at-1, but not both simultaneously. Self-checking circuits must satisfy the following properties:
Self-testing
Fault-secure

[Figure: Self-checking circuit. The inputs feed the circuit, whose coded outputs are monitored by a checker that produces an error signal.]

Self-checking property
I is the input code space; S is the output code space; $Y(X, \emptyset)$ is the output for an input vector X in the fault-free case; $Y(X, f)$ is the output for an input vector X in the presence of a fault f in the circuit.

Definition 1. A circuit is fault-secure for an input set I and a fault set F if for any input X in I and for any fault f in F, $Y(X, \emptyset) \in S$, and $Y(X, f) \in S$ implies $Y(X, f) = Y(X, \emptyset)$. In other words, a circuit is fault-secure if, for every fault from a prescribed set, it never produces an incorrect code space output for code space inputs.


Self-checking property
Definition 2. A circuit is self-testing for an input set N and a fault set F if for every fault f in F there is an input X in N such that $Y(X, f) \notin S$. In other words, a circuit is self-testing if, for every fault from a prescribed set, it produces a non-code space output for at least one code input.

Definition 3. A totally self-checking circuit is a circuit that is self-testing for a normal input set N and a fault set F, and fault-secure for N and F.
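The three definitions can be checked exhaustively for small circuits. The sketch below is an illustration under stated assumptions: a circuit is modelled as a Python function `circuit(x, fault)` with `fault=None` for the fault-free case, and the two example circuits (`two_rail_buffer`, `plain_buffer`) and their stuck-at fault lists are hypothetical, not taken from the slides.

```python
def is_fault_secure(circuit, inputs, faults, code_space):
    """Definition 1: a faulty code-space output must equal the fault-free output."""
    for x in inputs:
        good = circuit(x, None)
        if good not in code_space:
            return False
        for f in faults:
            out = circuit(x, f)
            if out in code_space and out != good:
                return False
    return True


def is_self_testing(circuit, inputs, faults, code_space):
    """Definition 2: every fault yields a non-code output for some input."""
    return all(any(circuit(x, f) not in code_space for x in inputs)
               for f in faults)


def is_tsc(circuit, inputs, faults, code_space):
    """Definition 3: self-testing and fault-secure for the same sets."""
    return (is_self_testing(circuit, inputs, faults, code_space)
            and is_fault_secure(circuit, inputs, faults, code_space))


def two_rail_buffer(x, fault=None):
    """Hypothetical circuit: outputs (x, not x); fault = (output line, stuck value)."""
    out = [x, 1 - x]
    if fault is not None:
        line, value = fault
        out[line] = value
    return tuple(out)


def plain_buffer(x, fault=None):
    """Hypothetical uncoded circuit: a single output line that copies x."""
    return (x,) if fault is None else (fault[1],)


INPUTS = [0, 1]
RAIL_FAULTS = [(0, 0), (0, 1), (1, 0), (1, 1)]   # single stuck-at on either output
RAIL_CODE = {(0, 1), (1, 0)}
PLAIN_FAULTS = [(0, 0), (0, 1)]
PLAIN_CODE = {(0,), (1,)}
```

With these assumptions the two-rail buffer is totally self-checking, while the uncoded buffer is neither fault-secure nor self-testing: every output it can produce is a code word, so a stuck-at fault silently yields a wrong code word.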


Example: PLA checking


Problem: to develop a method for the synthesis of a self-checking PLA with minimal overhead (redundant area).
Input space vs. output space (code words).
The totally self-checking (TSC) property of the circuit has to be proven.
The checker has to be totally self-checking (TSC).
Fault coverage has to be high.

Example: PLA checking


Three kinds of faults can normally occur in PLAs: stuck-at faults, bridging faults, and cross-point faults. An important assumption is the non-concurrent property: any normal input pattern selects exactly one product term in the PLA during fault-free operation. Under this assumption, all single faults in a PLA can cause only unidirectional errors at the outputs of the PLA.

Example: PLA checking (concurrency property)


[Figure: PLA with inputs x1 to x8 and outputs y1 to y14 whose product terms are non-disjoint; stuck-at-1 and stuck-at-0 fault sites are marked.]

Example: PLA checking (non-concurrency property)


[Figure: PLA with inputs x1 to x8, disjoint product terms, memory elements d1 to d3, and outputs y1 to y14.]

PLA checking
Faults:
Cross-point faults (AND array, OR array): 0=>1; 1=>0; 0=>*; 1=>*.
Stuck-at faults: stuck-at terms, stuck-at outputs.
Using XOR is forbidden for checking.
The checker has to be totally self-checking (TSC).
The whole system has to be TSC.

SOM checker solution


The Sum-of-Minterms (SOM) checker implements the following logical function:

$$f_{err} = \bigvee_{t=1}^{Q} Y_t$$

where $Y_t$ is a certain code word and Q is the number of possible code words.

PLA checking
Why do we need the non-concurrent property for a SOM checker? Example:

expression   x1 x2 x3   y1 y2
x1 x2 x3      1  0  1    1  0
x1 x2 x3      1  0  1    0  1
                         1  1

An erroneous output due to a missing device in the x2 column cannot be detected.

Example: PLA checking (SOM checker)


x1 x2 x3   y1 y2 y3 y4   Error
 1  0  *    1  1  0  0     1
 0  0  *    0  0  1  1     1
 *  1  1    0  1  0  1     1
 0  1  0    0  1  1  1     1
 1  1  0    1  0  0  0     1

Non-fault-secure: a missing or an additional device at the marked points cannot be detected. The SOM checker (by itself) cannot help in this case.
Non-TSC: a fault in the checker (ferr) cannot be detected.

SOM-based checker on PLA Berger encoding


           INFORMATION BITS   CHECK BITS   Error (two-rail)
x1 x2 x3   y1 y2 y3 y4        b1 b2
 1  0  *    1  1  0  0         1  0         1  0
 0  0  *    0  0  1  1         1  0         1  0
 *  1  1    0  1  0  1         1  0         1  0
 0  1  0    0  1  1  1         0  1         0  1
 1  1  0    1  0  0  0         1  1         0  1

The check bits b1 b2 form the Berger code of the information bits; the error indication is a two-rail signal.

Self-checking Checkers
Self-dual parity checking
Two-rail checker
TSC checkers for m-out-of-n codes
TSC Berger checkers
TSC Smith checkers

Parity Checking
In conventional parity checking the parity bit p corresponding to the output bits of the combinational circuit is compared with the parity bit p generated independently by the parity prediction circuit.
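[Figure: the inputs (x1, ..., xn) feed both the combinational circuit, with outputs (y1, ..., yn), and the parity prediction circuit; the checker compares the two parity bits in a comparator.]

Parity prediction function:

$$y_p = y_1 \oplus y_2 \oplus \cdots \oplus y_n$$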

Self-dual Parity Checking


In general, a separate implementation of the parity prediction checker results in an average area overhead of 33%. To reduce the overhead, self-dual parity checking was developed. In this approach the parity prediction function is replaced by a circuit that generates a self-dual complement of the combinational circuit function.
[Figure: the inputs (x1, ..., xn) feed both the combinational circuit f(x), with outputs (y1, ..., yn), and the self-dual complement circuit; a comparator combines their outputs.]

Self-dual Parity Checking


The self-dual complement function $\delta(x)$ of the function f(x) must, with respect to a self-dual function h(x), satisfy

$$h(x) = f(x) \oplus \delta(x)$$

or

$$\delta(x) = f(x) \oplus h(x)$$

Therefore,

$$1 = (f(x) \oplus \delta(x)) \oplus (f(\bar{x}) \oplus \delta(\bar{x}))$$
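The relations above can be verified exhaustively for small functions. In the following sketch the functions `f` and `h` are hypothetical examples chosen only for illustration (a 2-input AND embedded in three variables, and the 3-input majority function, which is self-dual), and the name `delta` for the self-dual complement is an assumption.

```python
from itertools import product


def is_self_dual(func, n):
    """A function is self-dual if complementing all inputs complements the output."""
    return all(func(*x) != func(*[1 - b for b in x])
               for x in product((0, 1), repeat=n))


def f(x0, x1, x2):
    """Hypothetical function to be checked (not self-dual)."""
    return x0 & x1


def h(x0, x1, x2):
    """Hypothetical self-dual function: majority of three inputs."""
    return (x0 & x1) | (x1 & x2) | (x0 & x2)


def delta(x0, x1, x2):
    """Self-dual complement of f with respect to h: delta = f XOR h."""
    return f(x0, x1, x2) ^ h(x0, x1, x2)


def identity_holds():
    """Check 1 = (f(x) ^ delta(x)) ^ (f(~x) ^ delta(~x)) for every input x."""
    for x in product((0, 1), repeat=3):
        nx = [1 - b for b in x]
        if (f(*x) ^ delta(*x)) ^ (f(*nx) ^ delta(*nx)) != 1:
            return False
    return True
```

Because `f` XOR `delta` reproduces `h`, applying an input and then its complement gives complementary responses in the fault-free case, which is what the checker observes.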

Self-dual Parity Checking - example


f ( x0 , x1 , x2 , x3 ) = x0 x2 + x2 x3 + x0 x3
Take an arbitrary self-dual function:

h( x0 , x1, x2 , x3 ) = x0 x1 x2 + x0 x1 x2 x3 + x0 x1x2 x3 + x0 x1x3 + x1x2 x3


It can be rewritten: h( x0 , x1 , x2 , x3 ) = f ( x0 , x1 , x2 , x3 ) x1 x3 + x1 x3

Therefore self-dual complement of f(x) is:

( x0 , x1 , x2 , x3 ) = x1 x3 + x1 x3

Implementation of the function and its self-dual complement

[Figure: gate-level implementation of f and of its self-dual complement from the inputs x0, x1, x2, x3.]

Self-dual parity checking


For a circuit with multiple outputs y1, y2, y3, ..., yn, the parity of the output bits is compared with the self-dual complement of the parity function $f_p = y_1 \oplus y_2 \oplus \cdots \oplus y_n$. The self-dual complement $\delta_p$ of $f_p$ is chosen such that the function $h(x_1, \ldots, x_n) = f_p(x_1, \ldots, x_n) \oplus \delta_p(x_1, \ldots, x_n)$ is self-dual.
[Figure: the inputs (x1, ..., xn) feed the combinational circuit, whose outputs are XORed to form fp; fp is XORed with the self-dual complement of fp to produce h(x).]

Self-dual parity checking


During normal operation, complementary input patterns are applied to the composite circuit implementing h(x) at time units t and t+1. If there is no fault, the responses of the composite circuit at t and t+1 are complementary. Self-dual parity checking in conjunction with a time redundancy scheme thus allows on-line error detection based on the function h(x). The drawback of the self-dual parity approach is 100% time redundancy in addition to the hardware overhead.

Self-checking Checker
A totally self-checking checker must have two outputs and, hence, four output combinations. Two of these combinations (for example 01 and 10) are considered valid. A non-valid combination indicates either a non-code word at the input of the checker or a fault in the checker itself. A checker does not need to be fault-secure, because one is interested only in whether the checker output is a code word or not. It is not important whether 01 has changed to 10 or vice versa, because the output of the checker will be 00 or 11 in the presence of a fault (self-testing).

Two-rail checker
The two-rail checker has two groups of inputs, (x1, ..., xn) and (y1, ..., yn), and two outputs, f and g. The outputs f and g are complementary (1-out-of-2) if and only if the pair xj, yj is complementary for all j.
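The input/output behaviour can be sketched as follows. The cell equations used here are one common AND-OR realization of a two-pair two-rail checker and are an assumption: the circuit drawn on the slide may assign f and g differently. The function names are also illustrative. Larger checkers are built by feeding the output pair of one cell into the next.

```python
from itertools import product


def two_rail_cell(x0, y0, x1, y1):
    """Two-pair cell (assumed realization): AND-OR logic only, no XOR gates."""
    f = (x0 & y1) | (y0 & x1)
    g = (x0 & x1) | (y0 & y1)
    return f, g


def two_rail_checker(pairs):
    """Reduce n input pairs to one output pair by cascading two-pair cells."""
    f, g = pairs[0]
    for x, y in pairs[1:]:
        f, g = two_rail_cell(f, g, x, y)
    return f, g


def satisfies_spec(n):
    """(f, g) is complementary if and only if every input pair is complementary."""
    for bits in product((0, 1), repeat=2 * n):
        pairs = [(bits[2 * j], bits[2 * j + 1]) for j in range(n)]
        f, g = two_rail_checker(pairs)
        if (f != g) != all(x != y for x, y in pairs):
            return False
    return True
```

The exhaustive check `satisfies_spec(n)` only confirms the functional specification. It says nothing about the self-testing property, which depends on the gate-level structure and the assumed fault set.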
[Figure: totally self-checking two-rail checker with input pairs (x0, y0), (x1, y1), ..., (xn-1, yn-1) and a 1-out-of-2 output.]

Truth Table of the Two-rail checker


The circuit has the normal input set N = {<0101>, <0110>, <1001>, <1010>}. The circuit is totally self-checking for all unidirectional multiple faults.

x1 y1 x0 y0    f  g
 0  0  0  0    0  0
 0  0  0  1    0  0
 0  0  1  0    0  0
 0  0  1  1    0  0
 0  1  0  0    0  0
 0  1  0  1    0  1
 0  1  1  0    1  0
 0  1  1  1    1  1
 1  0  0  0    0  0
 1  0  0  1    0  1
 1  0  1  0    1  0
 1  0  1  1    1  1
 1  1  0  0    0  0
 1  1  0  1    1  1
 1  1  1  0    1  1
 1  1  1  1    1  1

[Figure: gate-level two-rail checker with inputs x0, y0, x1, y1 and outputs f, g; one internal line is marked.]
Can we detect a stuck-at-1 at this point?

TSC two-rail checker with six input pairs


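[Figure: tree of two-rail checker cells (*) that reduces the input pairs (x1, y1) to (x6, y6), through the intermediate pairs (f1, g1) to (f4, g4), to a single output pair.]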

Totally Self-checking checkers for m-out-of-n codes


The m-out-of-n checker consists of two independent subcircuits, each subcircuit having a single output.

$$S_k^n = M_k^n \,\&\, \overline{M_{k+1}^n}$$

The k-out-of-2k checker is fault-secure for a single fault because it has two subcircuits; a single fault can affect the output of only one of them. If the checker is implemented with AND-OR logic, it is also TSC for unidirectional multiple faults.

Design of the k-out-of-2k checker


We know: each monotonic symmetric function of n variables can be represented as a composition of elementary monotonic symmetric functions of m and n-m variables:

$$M_k^n = \bigvee_{j=0}^{k} M_j^m \,\&\, M_{k-j}^{n-m}$$

Example:

$$M_2^4 = M_2^2 \,\&\, M_0^2 \;\lor\; M_1^2 \,\&\, M_1^2 \;\lor\; M_0^2 \,\&\, M_2^2 = xy + (x + y)(z + t) + zt$$
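The decomposition can be checked by brute force. The sketch below assumes that the elementary monotonic symmetric function with threshold j equals 1 exactly when at least j of its inputs are 1, which is consistent with the example (threshold 0 gives the constant 1, threshold 1 on two variables gives x + y, threshold 2 gives xy). The function names are illustrative.

```python
from itertools import product


def M(j, bits):
    """Threshold function: 1 iff at least j of the given bits are 1."""
    return int(sum(bits) >= j)


def M_decomposed(k, bits, m):
    """OR over j = 0..k of M_j on the first m bits AND M_(k-j) on the rest."""
    left, right = bits[:m], bits[m:]
    return int(any(M(j, left) and M(k - j, right) for j in range(k + 1)))


def decomposition_holds(n, k, m):
    """Compare both sides of the identity on every n-bit input."""
    return all(M(k, bits) == M_decomposed(k, bits, m)
               for bits in product((0, 1), repeat=n))


def example_expression(x, y, z, t):
    """The worked example: xy + (x + y)(z + t) + zt."""
    return (x & y) | ((x | y) & (z | t)) | (z & t)
```

Terms whose threshold exceeds the number of available variables evaluate to 0 and drop out of the OR, so the same code covers cases where k is larger than m or n-m.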

Design of the k-out-of-2k checker


The 2k bits are partitioned into two disjoint subsets: A = (x1, ..., xk) and B = (xk+1, ..., x2k). The outputs of the checker can be expressed as:

$$Z_1 = \bigvee_{i \le k,\; i\ \mathrm{odd}} M_i^{k_A} \,\&\, M_{k-i}^{k_B} \qquad (i = 1, 3, 5, \ldots)$$

$$Z_2 = \bigvee_{i \le k,\; i\ \mathrm{even}} M_i^{k_A} \,\&\, M_{k-i}^{k_B} \qquad (i = 0, 2, 4, 6, \ldots)$$

where kA and kB are the numbers of 1s occurring in subsets A and B, respectively.



Example 1: design of TSC 2-out-of-4 checker


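k = 2; A = (x1, x2); B = (x3, x4).

$$Z_1 = M_1^{k_A} \,\&\, M_1^{k_B} = (x_1 + x_2)(x_3 + x_4)$$

$$Z_2 = M_0^{k_A} \,\&\, M_2^{k_B} + M_2^{k_A} \,\&\, M_0^{k_B} = 1 \cdot M_2^{k_B} + M_2^{k_A} \cdot 1 = x_3 x_4 + x_1 x_2$$

[Figure: gate-level implementation of Z1 and Z2 from the inputs x1, x2, x3, x4.]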
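The construction can be written down for general k and checked exhaustively. The sketch below assumes that the elementary function with threshold i is 1 exactly when at least i of its inputs are 1; the function names are illustrative. It models only the logic functions Z1 and Z2, not the gate-level structure on which the TSC property depends.

```python
from itertools import product


def threshold(i, bits):
    """Elementary monotonic symmetric function: 1 iff at least i bits are 1."""
    return int(sum(bits) >= i)


def k_out_of_2k_checker(bits):
    """Return (Z1, Z2) for a 2k-bit word split into halves A and B."""
    k = len(bits) // 2
    a, b = bits[:k], bits[k:]
    z1 = int(any(threshold(i, a) and threshold(k - i, b)
                 for i in range(1, k + 1, 2)))       # odd i
    z2 = int(any(threshold(i, a) and threshold(k - i, b)
                 for i in range(0, k + 1, 2)))       # even i
    return z1, z2


def outputs_flag_code_words(k):
    """(Z1, Z2) is complementary exactly for words with k ones."""
    return all((z1 != z2) == (sum(bits) == k)
               for bits in product((0, 1), repeat=2 * k)
               for z1, z2 in [k_out_of_2k_checker(bits)])


def matches_example_1():
    """For k = 2 the general form reduces to the expressions of Example 1."""
    for x1, x2, x3, x4 in product((0, 1), repeat=4):
        expected = ((x1 | x2) & (x3 | x4), (x1 & x2) | (x3 & x4))
        if k_out_of_2k_checker((x1, x2, x3, x4)) != expected:
            return False
    return True
```

For a word with fewer than k ones both outputs are 0, and for a word with more than k ones both are 1, so every non-code word produces a non-valid output pair.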
