Advanced Logic Design - Lecture 9 - Fault Tolerant Design

Fault-tolerant design
Verification
Testing
Design for testability
Built-in self-test
Concurrent checking
Types of testing
When is testing performed? On-line (concurrent) testing, off-line testing.
Where is the source of stimuli? Self-testing, external testing (tester).
What do we test for? Design verification, acceptance testing for fabrication errors, etc.
How are the stimuli applied? Fixed order, adaptive testing.
What are the observed results? Entire output patterns, or some function of the output (compact testing / signature).
Which lines are accessible for testing? Only I/O, or I/O and internal lines.
Who checks the results? Self-checking, external testing.
Nature of faults
Permanent: always present.
Intermittent: occurs repeatedly, at intervals.
Transient: occurs once and is gone.
[Figure: relationship between a fault and an error]
Self-checking circuits
Fault model (the model describes the nature of faults)
Error detecting codes
Totally self-checking property
Self-testing property
Fault-secure property
Self-checking checker
SOM-based checker
[Figure: circuit and checker]
Error Detecting Codes

Data can be corrupted during processing or storage because of physical defects in the system, so the system should include some provision for detecting erroneous bits. This typically requires additional (redundant) bits to be appended to the data for error detection. The length (number of bits) of the encoded data, known as a code word, is therefore greater than that of the original data. The process of appending check bits to the information bits is called encoding; the opposite process, extracting the original information bits from a code word, is known as decoding. The ratio of the number of information bits to the number of code-word bits is known as the code rate. An n-bit code obtained by encoding k information bits has $2^k$ valid code words, $2^n - 2^k$ invalid (non-code) words, and code rate $k/n$.
Error Detecting Codes

The primary requirements of a code are as follows: it detects all likely errors; it achieves the desired degree of error detection with minimum redundancy; and its encoding and decoding processes are fast and simple. Codes can be classified as either separable or nonseparable. In a separable code, the information bits can be identified separately from the check bits. In a nonseparable code, the information bits are embedded in the code word and can only be extracted by a specific decoding procedure. A separable code with k information bits is said to be systematic if all $2^k$ patterns of information bits occur in code words.
Parity Code
The parity code is obtained by counting the number of 1s in the information bits and appending a 0 or a 1 to make the count odd or even. Odd parity is generally preferred because it ensures at least one 1 in any code word. A parity check can detect only an odd number of erroneous bits. The error-detecting capability of parity checking can be expanded by including a parity bit for each byte of information bits. More sophisticated parity-based solutions partition the information bits into several blocks, with each bit appearing in more than one block, and compute the parity of each block. Such overlapping detects not only some errors of more than one bit but, in the case of a single erroneous bit, also identifies the location of that bit.
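The sketch below illustrates both ideas in Python. It assumes bits are held as lists of 0/1 integers. The first part is an odd-parity encoder and checker. The second part uses row and column parity blocks over a 3x3 grid to pinpoint a single flipped bit; this is one possible overlapping partition, chosen for illustration. All function names are illustrative.

```python
def parity_bit(bits, odd=True):
    """Check bit that makes the total number of 1s odd (odd=True) or even."""
    ones = sum(bits)
    return (ones + 1) % 2 if odd else ones % 2


def encode(bits, odd=True):
    """Append the parity bit to the information bits."""
    return bits + [parity_bit(bits, odd)]


def is_code_word(word, odd=True):
    """A word is valid when its number of 1s has the required parity."""
    return sum(word) % 2 == (1 if odd else 0)


def block_parities(grid):
    """Even parity of every row block and every column block of a bit grid."""
    rows = [sum(r) % 2 for r in grid]
    cols = [sum(c) % 2 for c in zip(*grid)]
    return rows, cols


def locate_single_error(grid, rows_ref, cols_ref):
    """Return (row, col) of a single flipped bit, or None if no single bit is implicated."""
    rows, cols = block_parities(grid)
    bad_r = [i for i, (a, b) in enumerate(zip(rows, rows_ref)) if a != b]
    bad_c = [j for j, (a, b) in enumerate(zip(cols, cols_ref)) if a != b]
    if len(bad_r) == 1 and len(bad_c) == 1:
        return bad_r[0], bad_c[0]
    return None


word = encode([1, 0, 1, 1])                      # [1, 0, 1, 1, 0]
single = [1 - word[0]] + word[1:]                # one bit flipped
double = [1 - b for b in word[:2]] + word[2:]    # two bits flipped
print(is_code_word(single), is_code_word(double))  # False True

data = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
rows_ref, cols_ref = block_parities(data)
received = [row[:] for row in data]
received[1][2] ^= 1                              # corrupt one bit
print(locate_single_error(received, rows_ref, cols_ref))  # (1, 2)
```

The double flip passes the parity check, which is exactly the limitation stated above.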
Multiple Error Detecting Codes

Single-bit error-detecting codes can handle only random errors. However, in many cases the errors that occur in logic circuits and memory systems are multiple. Multiple errors belong to one of the following classes: symmetric, asymmetric (unsymmetrical), and unidirectional.
Symmetric errors: both 0->1 and 1->0 errors can occur with equal probability in a code word.
Asymmetric errors: only one type of error, 0->1 or 1->0 but not both, can occur in a code word.
Unidirectional errors: both 0->1 and 1->0 errors can occur, but they do not occur simultaneously in any code word.
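The hypothetical helper below makes the three classes concrete. It classifies one observed error pattern by comparing a sent word with a received word, both strings of '0'/'1'. A multiple-bit pattern with flips in only one direction fits the unidirectional model. If that direction is the only one the channel allows, it also fits the asymmetric model. A pattern with flips in both directions within one word is covered only by the symmetric model.

```python
def classify_error(sent: str, received: str) -> str:
    """Classify the error pattern between two equal-length bit strings."""
    up = sum(1 for s, r in zip(sent, received) if s == "0" and r == "1")    # 0->1 flips
    down = sum(1 for s, r in zip(sent, received) if s == "1" and r == "0")  # 1->0 flips
    if up + down == 0:
        return "none"
    if up + down == 1:
        return "single"
    if up == 0 or down == 0:
        return "unidirectional multiple"
    return "mixed"  # both directions in one word: only the symmetric model allows this


for rx in ("1010", "1000", "0000", "0101"):
    print(rx, classify_error("1010", rx))
```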
Unidirectional Error Detecting Codes - Definitions

Let X and Y be two binary k-tuples. Denote by N(X,Y) the number of 1->0 crossovers from X to Y, i.e., the number of positions in which X has a 1 and Y has a 0.
Example: X = 101010, Y = 110101: N(X,Y) = 2, N(Y,X) = 3.
The number of bits in which two distinct binary vectors differ is known as their Hamming distance, d(X,Y) = N(X,Y) + N(Y,X). A word X = (x1, ..., xk) covers another word Y = (y1, ..., yk), written X ≥ Y, if yi = 1 implies xi = 1 for i = 1, 2, ..., k. In other words, the positions of the 1s in Y are a subset of the positions of the 1s in X. For example, X = 101010 covers Y = 101000 (X ≥ Y).
If X does not cover Y and Y does not cover X, then X and Y are unordered. A code in which no code word is covered by any other code word is called an unordered code.
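The definitions translate directly into a few string functions. This is a minimal sketch that assumes words are equal-length strings of '0' and '1'. The helper names are hypothetical.

```python
def crossovers(x: str, y: str) -> int:
    """N(X, Y): number of positions where X has a 1 and Y has a 0 (1->0 crossovers)."""
    return sum(1 for a, b in zip(x, y) if a == "1" and b == "0")


def hamming(x: str, y: str) -> int:
    """d(X, Y) = N(X, Y) + N(Y, X)."""
    return crossovers(x, y) + crossovers(y, x)


def covers(x: str, y: str) -> bool:
    """X covers Y when every 1 of Y is also a 1 of X, i.e. N(Y, X) == 0."""
    return crossovers(y, x) == 0


def unordered(x: str, y: str) -> bool:
    return not covers(x, y) and not covers(y, x)


def is_unordered_code(words) -> bool:
    """True if no code word is covered by another code word."""
    words = list(words)
    return all(unordered(a, b) for i, a in enumerate(words) for b in words[i + 1:])


print(crossovers("101010", "110101"), crossovers("110101", "101010"))  # 2 3
print(hamming("101010", "110101"))                                   # 5
print(covers("101010", "101000"))                                    # True
```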
Unordered Codes for Unidirectional Error Detecting

Many faults in VLSI circuits have been found to cause unidirectional errors, which has led to the development of several unidirectional error-detecting codes. An unordered code detects all unidirectional errors, because in such a code a unidirectional error cannot transform one code word into another code word: a set of 1->0 errors produces a word covered by the original one, and a set of 0->1 errors produces a word that covers it. Unordered codes can be separable or nonseparable. For example, both m-out-of-n and Berger codes are unordered, but the former is nonseparable and the latter is separable. Both codes detect single errors and multiple unidirectional errors.
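The claim can be checked exhaustively for small codes. The hypothetical helper below enumerates every unidirectional error pattern of each code word, meaning all flips 1->0 or all flips 0->1. It reports whether any of these patterns lands on another code word. The 2-out-of-4 code passes. A 3-bit even-parity code, which is ordered, does not.

```python
from itertools import combinations, product


def unidirectional_errors(word: str):
    """Yield every word reachable from `word` by a nonempty set of 1->0 flips,
    or by a nonempty set of 0->1 flips (never a mixture of the two)."""
    for old, new in (("1", "0"), ("0", "1")):
        positions = [i for i, b in enumerate(word) if b == old]
        for r in range(1, len(positions) + 1):
            for subset in combinations(positions, r):
                bits = list(word)
                for i in subset:
                    bits[i] = new
                yield "".join(bits)


def detects_all_unidirectional(code) -> bool:
    """True if no unidirectional error maps a code word onto another code word."""
    code = set(code)
    return all(e not in code for w in code for e in unidirectional_errors(w))


two_of_four = ["".join(w) for w in product("01", repeat=4) if w.count("1") == 2]
even_parity3 = ["000", "011", "101", "110"]      # ordered: 011 covers 000
print(detects_all_unidirectional(two_of_four))   # True
print(detects_all_unidirectional(even_parity3))  # False
```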
m-out-of-n Codes
In an m-out-of-n code, every valid code word has exactly m 1s and (n-m) 0s. The total number of code words is $n!/((n-m)!\,m!)$. If m = k and n = 2k, we obtain the popular k-out-of-2k code. A special case of the k-out-of-2k code is known as the k-pair two-rail code. It consists of only $2^k$ of the possible $(2k)!/(k!\,k!)$ code words. Each code word of this code has k information bits and k check bits, which are the bit-by-bit complements of the information bits. The 2-pair two-rail code consists of the code words 0011, 1001, 0110, 1100. If m = n/2 ($\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$ for odd n), the m-out-of-n code is optimal. In other words, no other unordered code of length n has more code words. An important subset of m-out-of-n codes is the 1-out-of-n code, in which exactly one bit of an n-bit code word is 1 and the remaining bits are all 0s.
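A short enumeration confirms the counting formula, the two-rail construction, and the optimality of m = n/2 for n = 6. This is an illustrative sketch with words as '0'/'1' strings and hypothetical helper names.

```python
from itertools import product
from math import comb


def m_out_of_n(m: int, n: int):
    """All n-bit words with exactly m 1s."""
    return ["".join(w) for w in product("01", repeat=n) if w.count("1") == m]


def two_rail(k: int):
    """k-pair two-rail code: k information bits followed by their bit-by-bit complements."""
    words = []
    for info in product("01", repeat=k):
        check = ("1" if b == "0" else "0" for b in info)
        words.append("".join(info) + "".join(check))
    return words


print(len(m_out_of_n(2, 4)), comb(4, 2))           # 6 6
print(two_rail(2))                                 # ['0011', '0110', '1001', '1100']
print(set(two_rail(3)) <= set(m_out_of_n(3, 6)))   # True
print(max(range(7), key=lambda m: comb(6, m)))     # 3 (m = n/2 gives the most code words)
```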
Berger Code
A Berger code of length n has k information bits and c check bits, where $c = \lceil \log_2(k+1) \rceil$ and $n = k + c$. It is the least redundant separable unordered code for detecting single and multiple unidirectional errors. A code word is constructed in two steps. First, form a binary number equal to the number of 1s in the information bits. Then append the bit-by-bit complement of this binary number to the information bits as check bits. For example, take the information bits 0101000 (k = 7). Then $c = \lceil \log_2(7+1) \rceil = 3$, so the Berger code word has length 10 (= 7 + 3). The check bits are derived as follows: the number of 1s in the information bits is 2 (010), and the bit-by-bit complement of 010 is 101. The code word is therefore 0101000 101 (information bits k, check bits c).
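Below is a minimal B1 encoder that follows the steps above. It assumes the information bits are given as a '0'/'1' string. It uses `int.bit_length`, which equals ceil(log2(k + 1)) for k ≥ 1, to avoid floating-point logarithms. The function names are illustrative.

```python
def berger_b1_check_bits(info: str) -> str:
    """B1 scheme: bit-by-bit complement of the binary count of 1s."""
    k = len(info)
    c = k.bit_length()                 # equals ceil(log2(k + 1)) without floating point
    count = format(info.count("1"), f"0{c}b")
    return "".join("1" if b == "0" else "0" for b in count)


def berger_b1_encode(info: str) -> str:
    """Information bits followed by the B1 check bits."""
    return info + berger_b1_check_bits(info)


print(berger_b1_encode("0101000"))     # 0101000101
```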
Berger Code
Alternatively, the c check bits may be the binary number representing the number of 0s in the k information bits. Thus, the check bits of a Berger code can be generated by two different schemes. The scheme that uses the bit-by-bit complement of the binary representation of the number of 1s in the information bits is known as the B1 encoding scheme. The other scheme, which uses the binary representation of the number of 0s in the information bits as check bits, is known as the B0 scheme.
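The two schemes can be compared side by side. Suppose k = 2^c - 1. Then the c-bit complement of the 1-count is (2^c - 1) - (number of 1s) = k - (number of 1s). That is exactly the number of 0s, so B1 and B0 produce identical check bits. For other values of k the schemes differ. The sketch below, with illustrative names, shows both cases.

```python
def b1_check(info: str) -> str:
    """B1: complement of the binary number of 1s."""
    c = len(info).bit_length()                     # ceil(log2(k + 1)) check bits
    ones = format(info.count("1"), f"0{c}b")
    return "".join("1" if b == "0" else "0" for b in ones)


def b0_check(info: str) -> str:
    """B0: binary number of 0s."""
    c = len(info).bit_length()
    return format(info.count("0"), f"0{c}b")


for info in ("0101000", "11100"):
    print(info, b1_check(info), b0_check(info))
# k = 7 = 2**3 - 1: the two schemes coincide (101 and 101).
# k = 5: they differ (100 versus 010).
```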
Smith code
The Smith code can be applied when one subset of the code words is unordered and another subset is ordered. Let the set of code words be {11111, 11100, 00101, 00110, 10001, 11000, 00100, 00000}. The idea of Smith encoding is to make unordered not every pair of vectors, but only those pairs that are ordered. In this example there are four ordered chains in the Hasse diagram:
Chain 1 = {11111, 11100, 11000, 00000}
Chain 2 = {11111, 00101, 00100, 00000}
Chain 3 = {11111, 00110, 00100, 00000}
Chain 4 = {11111, 10001, 00000}
Each chain can be encoded independently. Consequently, it is possible to encode each level of the Hasse diagram as follows:
[Figure: Hasse diagram for the Smith code]
Berger vs. Smith encoding

y1 y2 y3 y4 y5 (information bits) | H1 H2 H3 (Berger) | s1 (Smith)
1 1 1 1 1 | 0 0 0 | 0
1 1 1 0 0 | 0 0 1 | 0
0 0 1 0 1 | 0 1 1 | 0
0 0 1 1 0 | 0 1 0 | 0
1 0 0 0 1 | 0 1 1 | 0
1 1 0 0 0 | 0 1 0 | 1
0 0 1 0 0 | 0 1 1 | 1
0 0 0 0 0 | 1 0 0 | 1
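The level idea can be sketched in code under two assumptions. First, the level of a word is the length of the longest covering chain above it. Second, the level is appended as a plain binary number. The slide's own level encoding is given in the Hasse-diagram figure, so `smith_like_encode` is a hypothetical illustration rather than the published construction. It needs only two check bits here, against three for a Berger code with k = 5. The most significant check bit it produces agrees with the s1 column of the table.

```python
CODE = ["11111", "11100", "00101", "00110", "10001", "11000", "00100", "00000"]


def covers(x: str, y: str) -> bool:
    """X covers Y when every 1 of Y is also a 1 of X."""
    return all(a == "1" for a, b in zip(x, y) if b == "1")


def levels(words):
    """Level of w = length of the longest chain of covering words above w."""
    level = {}
    # Heavier words first: a word that strictly covers w has more 1s, so its level is known.
    for w in sorted(words, key=lambda s: -s.count("1")):
        above = [v for v in words if v != w and covers(v, w)]
        level[w] = 1 + max((level[v] for v in above), default=-1)
    return level


def smith_like_encode(words):
    """Append the level, as a plain binary number, to every word (assumed encoding)."""
    lv = levels(words)
    c = max(lv.values()).bit_length()
    return {w: w + format(lv[w], f"0{c}b") for w in words}


def is_unordered(words):
    return all(not covers(a, b) for a in words for b in words if a != b)


encoded = smith_like_encode(CODE)
print(is_unordered(CODE), is_unordered(list(encoded.values())))  # False True
print([levels(CODE)[w] for w in CODE])                          # [0, 1, 1, 1, 1, 2, 2, 3]
```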
Self-Checking Combinational Circuits Design

Self-checking can be defined as the ability to verify automatically whether there is any fault in the logic, without the need for externally applied tests. Self-checking circuits allow on-line error detection; that is, faults can be detected during the normal operation of the circuit. One way to achieve self-checking design is through the use of error-detecting codes.
Principles of Self-checking
Let a circuit have m primary input lines and n primary output lines. Then the $2^m$ binary vectors of length m form the input space X of the circuit. The output space Z is similarly defined as the set of $2^n$ binary vectors of length n. During normal (fault-free) operation, the circuit receives only a subset of X, called the input code space, and produces a subset of Z, called the output code space. Members of a code space are called code words. A non-code word at the output indicates the presence of a fault in the circuit. However, a fault may also result in an incorrect code word at the output, rather than a non-code word, in which case the fault is undetectable.
Principles of Self-checking
A circuit may be designed to be self-checking only for an assumed set of faults. Such a set usually includes single stuck-at faults and unidirectional multiple faults. The single stuck-at fault model assumes that a physical defect in a logic circuit fixes one of the signal lines at a constant value. The line is fixed either at logic 0 (stuck-at-0) or at logic 1 (stuck-at-1). If more than one signal line in the circuit is stuck-at-1 or stuck-at-0 at the same time, the circuit is said to have a multiple stuck-at fault. A variation of the multiple fault is the unidirectional fault. A multiple fault is unidirectional if its constituent faults are all stuck-at-0 or all stuck-at-1, but not a mixture of both. Self-checking circuits must satisfy two properties: they must be self-testing and fault-secure.
[Figure: Self-checking circuit. The inputs drive the circuit; its coded outputs feed a checker, which produces an error signal.]
Self-checking property
I is the input code space and S is the output code space. $Y(X, \lambda)$ is the output for input vector X in the fault-free case. $Y(X, f)$ is the output for input vector X in the presence of fault f in the circuit.
Definition 1. A circuit is fault-secure for an input set I and a fault set F if, for any input X in I and any fault f in F, $Y(X, \lambda) \in S$, and $Y(X, f) \in S$ implies $Y(X, f) = Y(X, \lambda)$. In words: a circuit is fault-secure if, for every fault from a prescribed set, it never produces an incorrect code-space output for code-space inputs.
Self-checking property
Definition 2. A circuit is self-testing for an input set N and a fault set F if for every fault f in F there is an input X in N such that $Y(X, f) \notin S$. In words: a circuit is self-testing if, for every fault from a prescribed set, it produces a non-code output for at least one code input.
Definition 3. A circuit is totally self-checking (TSC) for a normal input set N and a fault set F if it is both self-testing and fault-secure for N and F.
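The definitions can be exercised by exhaustive fault simulation. The sketch below uses a hypothetical dual-rail AND gate. Rail y computes x1 x2, and rail y_n independently computes NOT x1 + NOT x2, so the output code space is {01, 10}. Faults are single stuck-at faults on named lines. Consider first the faults on the fan-out branches and on the outputs. For that fault set the circuit is both fault-secure and self-testing, hence TSC. A stuck input stem is different: it corrupts both rails consistently and yields a wrong code word. The same circuit is therefore neither fault-secure nor self-testing with respect to stem faults, which is why the properties are always stated for a prescribed fault set.

```python
from itertools import product

CODE_SPACE = {(0, 1), (1, 0)}                    # valid dual-rail outputs (y, y_n)
NORMAL_INPUTS = list(product((0, 1), repeat=2))  # input set N: all (x1, x2)


def circuit(x1, x2, fault=None):
    """Hypothetical dual-rail AND gate; `fault` is (line_name, stuck_value) or None."""
    def line(name, value):
        return fault[1] if fault is not None and fault[0] == name else value
    x1, x2 = line("x1", x1), line("x2", x2)       # input stems
    x1a, x2a = line("x1a", x1), line("x2a", x2)   # fan-out branches to rail y
    x1b, x2b = line("x1b", x1), line("x2b", x2)   # fan-out branches to rail y_n
    y = line("y", x1a & x2a)
    y_n = line("y_n", (1 - x1b) | (1 - x2b))
    return (y, y_n)


def is_fault_secure(faults):
    """Definition 1: a code-space output under a fault must equal the fault-free output."""
    for f in faults:
        for x in NORMAL_INPUTS:
            good, bad = circuit(*x), circuit(*x, fault=f)
            if bad in CODE_SPACE and bad != good:
                return False
    return True


def is_self_testing(faults):
    """Definition 2: every fault gives a non-code output for some normal input."""
    return all(any(circuit(*x, fault=f) not in CODE_SPACE for x in NORMAL_INPUTS)
               for f in faults)


BRANCH_FAULTS = [(n, v) for n in ("x1a", "x2a", "x1b", "x2b", "y", "y_n") for v in (0, 1)]
STEM_FAULTS = [(n, v) for n in ("x1", "x2") for v in (0, 1)]

print(is_fault_secure(BRANCH_FAULTS), is_self_testing(BRANCH_FAULTS))  # True True
print(is_fault_secure(STEM_FAULTS), is_self_testing(STEM_FAULTS))      # False False
```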
Example: PLA checking

Problem: develop a method for synthesizing a self-checking PLA with minimal overhead (redundant area).
Input space vs. output space (code words).
The totally self-checking (TSC) property of the circuit has to be proven.
The checker has to be totally self-checking (TSC).
Fault coverage has to be high.
Example: PLA checking

Three kinds of faults normally occur in PLAs: stuck-at faults, bridging faults, and cross-point faults. An important assumption is the non-concurrency property: any normal input pattern selects exactly one product term of the PLA during fault-free operation. Under this assumption, all single faults in a PLA can cause only unidirectional errors at the PLA outputs.
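The non-concurrency property can be checked from the AND plane alone. The sketch below writes each product term as a cube over x1 x2 x3, with '*' meaning the input is not connected. It uses the five terms of the PLA in the SOM-checker example of this lecture. The helper names are illustrative.

```python
from itertools import product

# AND-plane cubes (x1 x2 x3) of the PLA used in the SOM-checker example of this lecture.
CUBES = ["10*", "00*", "*11", "010", "110"]


def selects(cube: str, x: str) -> bool:
    """A product term is selected when every specified literal matches the input."""
    return all(c == "*" or c == b for c, b in zip(cube, x))


def selected_terms(cubes, x):
    return [i for i, cube in enumerate(cubes) if selects(cube, x)]


def is_non_concurrent(cubes, normal_inputs) -> bool:
    """Every normal input must select exactly one product term."""
    return all(len(selected_terms(cubes, x)) == 1 for x in normal_inputs)


ALL_INPUTS = ["".join(b) for b in product("01", repeat=3)]
print(is_non_concurrent(CUBES, ALL_INPUTS))            # True
print(is_non_concurrent(CUBES + ["1**"], ALL_INPUTS))  # False
```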
Example: PLA checking (concurrency property)

[Figure: PLA with inputs x1 to x8 and outputs y1 to y14, with non-disjoint product terms; stuck-at-1 and stuck-at-0 faults are marked.]
Example: PLA checking (non-concurrency property)

[Figure: PLA with inputs x1 to x8, disjoint product terms, memory elements d3, d2, d1, and outputs y1 to y14.]
PLA checking
Faults:
Cross-points (AND array, OR array): 0=>1; 1=>0; 0=>*; 1=>*.
Stuck-at faults: stuck-at product terms, stuck-at outputs.
Using XOR for checking is forbidden.
The checker has to be totally self-checking (TSC).
The whole system has to be TSC.
SOM checker solution

The Sum-of-Minterms (SOM) checker implements the following logical function:
$$f_{err} = \bigvee_{t=1}^{Q} Y_t$$
where $Y_t$ is (the minterm of) a particular code word and Q is the number of possible code words.
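Here is a behavioural sketch of the SOM checker. Each code word contributes one minterm over the output variables, and f_err is their OR, so f_err = 1 marks a valid code word. The code words below are the outputs y1 y2 y3 y4 of the example PLA in this lecture. The last line shows the weakness of this checker. Suppose a missing OR-array device turned 0111 into 0101 (a hypothetical fault chosen for illustration). The result is another code word, and the checker still reports 1.

```python
# Output code words y1 y2 y3 y4 of the example PLA in this lecture, one per product term.
CODE_WORDS = ["1100", "0011", "0101", "0111", "1000"]


def minterm(word: str):
    """AND of literals: 1 only when the output vector equals `word`."""
    return lambda y: int(all((yi == "1") == (wi == "1") for wi, yi in zip(word, y)))


def som_checker(code_words):
    """f_err = OR of the code-word minterms: 1 for a code word, 0 otherwise."""
    terms = [minterm(w) for w in code_words]
    return lambda y: int(any(t(y) for t in terms))


f_err = som_checker(CODE_WORDS)
print([f_err(w) for w in CODE_WORDS])   # [1, 1, 1, 1, 1]
print(f_err("1111"), f_err("0000"))     # 0 0
print(f_err("0101"))                    # 1: 0111 with y3 lost still looks valid
```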
PLA checking
Why do we need the non-concurrency property for a SOM checker? Example:
Product terms over x1, x2, x3: x1 = 1 1, x2 = 0 0, x3 = 1 1; outputs y1 = 1 0 1, y2 = 0 1 1.
An erroneous output due to a missing device in the x2 column cannot be detected.
Example: PLA checking (SOM checker)

x1 x2 x3 | y1 y2 y3 y4 | Error
1 0 * | 1 1 0 0 | 1
0 0 * | 0 0 1 1 | 1
* 1 1 | 0 1 0 1 | 1
0 1 0 | 0 1 1 1 | 1
1 1 0 | 1 0 0 0 | 1
Not fault-secure: a missing or an additional device at certain cross-points cannot be detected, and the SOM checker by itself cannot help in this case.
Not TSC: a fault in the checker cannot be detected. Since f_err = 1 for every normal input, a stuck-at-1 at the checker output is never observed.
SOM-based checker on a Berger-encoded PLA

x1 x2 x3 | y1 y2 y3 y4 (information bits) | b1 b2 (check bits, Berger code) | Error (two-rail)
1 0 * | 1 1 0 0 | 1 0 | 1 0
0 0 * | 0 0 1 1 | 1 0 | 1 0
* 1 1 | 0 1 0 1 | 1 0 | 1 0
0 1 0 | 0 1 1 1 | 0 1 | 0 1
1 1 0 | 1 0 0 0 | 1 1 | 0 1
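The check bits b1 b2 in this table agree with the B0 scheme, the binary count of 0s. Only two bits are used because every output word here has between one and three 0s. The sketch below recomputes them and confirms that the encoded words are pairwise unordered. It then shows that a hypothetical missing OR-array device turning 0111 into 0101 now produces a non-code word. The helper names are illustrative.

```python
CODE_WORDS = ["1100", "0011", "0101", "0111", "1000"]   # y1..y4 per product term


def b0_check(word: str, c: int = 2) -> str:
    """B0 check bits: binary count of 0s (2 bits suffice: every word has 1 to 3 zeros)."""
    return format(word.count("0"), f"0{c}b")


def covers(x: str, y: str) -> bool:
    """X covers Y when every 1 of Y is also a 1 of X."""
    return all(a == "1" for a, b in zip(x, y) if b == "1")


ENCODED = [w + b0_check(w) for w in CODE_WORDS]
print(ENCODED)  # ['110010', '001110', '010110', '011101', '100011']
print(all(not covers(a, b) for a in ENCODED for b in ENCODED if a != b))  # True

# Hypothetical fault: y3 of the fourth word drops (0111 -> 0101), check bits unchanged.
faulty = "0101" + b0_check("0111")
print(faulty, faulty in ENCODED)  # 010101 False
```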
Self-checking Checkers
Self-dual parity checking
Two-rail checker
TSC checkers for m-out-of-n codes
TSC Berger checkers
TSC Smith checkers
Parity Checking
In conventional parity checking, the parity bit p of the output bits of the combinational circuit is compared with a parity bit p' generated independently by the parity prediction circuit.
[Figure: the inputs (x1, ..., xn) feed both the combinational circuit, with outputs (y1, ..., yn), and the parity prediction circuit; a comparator forms the checker.]
Parity prediction function:
$$p = y_1 \oplus y_2 \oplus \cdots \oplus y_n$$
Self-dual Parity Checking

In general, implementing the parity prediction checker separately results in an average area overhead of 33%. To reduce this overhead, self-dual parity checking was developed. In this approach, the parity prediction function is replaced by a circuit that generates a self-dual complement of the combinational circuit function.
[Figure: the inputs (x1, ..., xn) feed the circuit f(x), with outputs (y1, ..., yn), and a self-dual complement circuit $\varphi(x)$; a comparator checks the results.]
Self-dual Parity Checking

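The self-dual complement function $\varphi(x)$ of the function f(x) must satisfy, with respect to a self-dual function h(x):
$$h(x) = f(x) \oplus \varphi(x)$$
or
$$\varphi(x) = f(x) \oplus h(x)$$
Therefore, since a self-dual function satisfies $h(\bar{x}) = \overline{h(x)}$,
$$1 = \big(f(x) \oplus \varphi(x)\big) \oplus \big(f(\bar{x}) \oplus \varphi(\bar{x})\big)$$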
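The identity can be verified exhaustively. The sketch below uses hypothetical four-variable functions rather than the example on the following slides. Here h is the majority of x0, x1, x2, which is self-dual; f is an arbitrary function that is not self-dual; and φ is formed as f ⊕ h. The helper names are illustrative.

```python
from itertools import product

VECTORS = list(product((0, 1), repeat=4))


def complement(x):
    return tuple(1 - b for b in x)


def is_self_dual(func):
    """func is self-dual iff func(complement(x)) == NOT func(x) for every x."""
    return all(func(complement(x)) == 1 - func(x) for x in VECTORS)


def f(x):                      # hypothetical circuit function (not self-dual)
    x0, x1, x2, x3 = x
    return (x0 & x2) | x3


def h(x):                      # majority(x0, x1, x2): a self-dual function
    x0, x1, x2, x3 = x
    return (x0 & x1) | (x1 & x2) | (x0 & x2)


def phi(x):                    # self-dual complement: phi = f XOR h
    return f(x) ^ h(x)


print(is_self_dual(f), is_self_dual(h))   # False True
print(all((f(x) ^ phi(x)) ^ (f(complement(x)) ^ phi(complement(x))) == 1
          for x in VECTORS))              # True
```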
Self-dual Parity Checking - example

f ( x0 , x1 , x2 , x3 ) = x0 x2 + x2 x3 + x0 x3
Take an arbitrary self-dual function:
h( x0 , x1, x2 , x3 ) = x0 x1 x2 + x0 x1 x2 x3 + x0 x1x2 x3 + x0 x1x3 + x1x2 x3

It can be rewritten: h( x0 , x1 , x2 , x3 ) = f ( x0 , x1 , x2 , x3 ) x1 x3 + x1 x3
Therefore self-dual complement of f(x) is:
( x0 , x1 , x2 , x3 ) = x1 x3 + x1 x3
Implementation of the function and its self-dual complement

[Figure: gate-level implementation of f (inputs x0, x2, x3) and of its self-dual complement (inputs x1, x3).]
Self-dual parity checking

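For a circuit with multiple outputs y1, y2, y3, ..., yn, the parity of the output bits is compared with the self-dual complement of the parity function:
$$f_p = y_1 \oplus y_2 \oplus \cdots \oplus y_n$$
The self-dual complement $\varphi_p$ of $f_p$ is chosen such that the function $h(x_1, \ldots, x_n) = f_p(x_1, \ldots, x_n) \oplus \varphi_p(x_1, \ldots, x_n)$ is self-dual.
[Figure: the outputs of the circuit driven by inputs (x1, ..., xn) are XORed together to form f_p, which is XORed with the self-dual complement of f_p to produce h(x).]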
Self-dual parity checking

During normal operation, complementary input patterns are applied to the composite circuit implementing h(x) at time units t and t+1. If there is no fault, the two output responses are complementary. Self-dual parity checking in conjunction with this time-redundancy scheme allows on-line error detection for the function h(x). The drawback of the self-dual parity approach is 100% time redundancy in addition to the hardware overhead.
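Below is a behavioural sketch of the time-redundant check. As an illustration rather than the slide's functions, it uses a hypothetical f together with h = majority(x0, x1, x2). The composite output f ⊕ φ is evaluated for x at time t and for its complement at time t+1, and the two responses must differ. A stuck-at fault on the output of f is injected as a constant and is detected for at least one input pair. φ is modelled as a separate, fault-free circuit.

```python
from itertools import product

VECTORS = list(product((0, 1), repeat=4))


def complement(x):
    return tuple(1 - b for b in x)


def f(x):                      # hypothetical circuit function
    x0, x1, x2, x3 = x
    return (x0 & x2) | x3


def h(x):                      # majority(x0, x1, x2), self-dual
    x0, x1, x2, x3 = x
    return (x0 & x1) | (x1 & x2) | (x0 & x2)


def phi(x):                    # separate self-dual-complement circuit (fault-free)
    return f(x) ^ h(x)


def composite(x, f_stuck=None):
    """Output f XOR phi; optionally the output of f is stuck at 0 or 1."""
    fx = f(x) if f_stuck is None else f_stuck
    return fx ^ phi(x)


def check_pair(x, f_stuck=None):
    """Apply x at time t and its complement at t + 1; fault-free outputs must differ."""
    return composite(x, f_stuck) != composite(complement(x), f_stuck)


print(all(check_pair(x) for x in VECTORS))                          # True: no false alarms
print([any(not check_pair(x, s) for x in VECTORS) for s in (0, 1)])  # [True, True]
```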
Self-checking Checker
A totally self-checking checker must have two outputs and, hence, four output combinations. Two of these combinations (for example, 01 and 10) are considered valid. An invalid combination (00 or 11) indicates either a non-code word at the input of the checker or a fault in the checker itself. A checker does not need to be fault-secure, because one is interested only in whether the checker input is a code word or not. It does not matter whether 01 has changed to 10 or vice versa. By the self-testing property, a fault in the checker drives its output to 00 or 11 for some code input.
Two-rail checker
The two-rail checker has two groups of inputs, (x1, ..., xn) and (y1, ..., yn), and two outputs f and g. The outputs f and g are complementary (1-out-of-2) if and only if every pair xj, yj is complementary.
[Figure: totally self-checking two-rail checker with input pairs (x0, y0), (x1, y1), ..., (xn-1, yn-1) and a 1-out-of-2 output pair (f, g).]
Truth Table of the Two-rail checker

The circuit has the normal input set N = {0101, 0110, 1001, 1010} (bits ordered x1 y1 x0 y0). It is totally self-checking for all unidirectional multiple faults.
[Truth table: all 16 combinations of x1, y1, x0, y0 with outputs f and g, next to the checker circuit.]
Can we detect a stuck-at-1 at the marked point in the circuit?
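Because the gate-level circuit appears only in the figure, the sketch below assumes the common AND-OR realization f = x1 x0 + y1 y0, g = x1 y0 + y1 x0. The input order is x1 y1 x0 y0, as in N. Exhaustive simulation checks three properties. Code inputs give 1-out-of-2 outputs, and non-code inputs give 00 or 11. Every single stuck-at fault is exposed by some input in N; the fault sites are the inputs, the AND-gate outputs a1 to a4, f, and g.

```python
from itertools import product

# Input order (x1, y1, x0, y0), matching the normal input set N.
NORMAL = [(0, 1, 0, 1), (0, 1, 1, 0), (1, 0, 0, 1), (1, 0, 1, 0)]
NODES = ["x1", "y1", "x0", "y0", "a1", "a2", "a3", "a4", "f", "g"]
CODE = {(0, 1), (1, 0)}


def checker(x1, y1, x0, y0, fault=None):
    """Assumed realization f = x1 x0 + y1 y0, g = x1 y0 + y1 x0; fault = (node, value)."""
    def node(name, value):
        return fault[1] if fault is not None and fault[0] == name else value
    x1, y1, x0, y0 = node("x1", x1), node("y1", y1), node("x0", x0), node("y0", y0)
    a1, a2 = node("a1", x1 & x0), node("a2", y1 & y0)
    a3, a4 = node("a3", x1 & y0), node("a4", y1 & x0)
    return node("f", a1 | a2), node("g", a3 | a4)


# Code inputs give 1-out-of-2 outputs; non-code inputs give 00 or 11.
code_ok = all(checker(*x) in CODE for x in NORMAL)
disjoint = all(checker(*x) not in CODE
               for x in product((0, 1), repeat=4) if x not in NORMAL)
# Self-testing: every single stuck-at fault shows up for some normal input.
self_testing = all(any(checker(*x, fault=(n, v)) not in CODE for x in NORMAL)
                   for n in NODES for v in (0, 1))
print(code_ok, disjoint, self_testing)  # True True True
```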
TSC two-rail checker with six input pairs

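[Figure: TSC two-rail checker for six input pairs (x1, y1) to (x6, y6), built as a tree of five two-pair checker cells (marked *) with intermediate output pairs (f1, g1) to (f4, g4).]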
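Here is a behavioural sketch of the tree composition. It assumes each * cell is the AND-OR two-pair checker f = x1 x0 + y1 y0, g = x1 y0 + y1 x0. Pairs are combined level by level, and an unpaired output moves up unchanged, which uses five cells for six pairs. The exhaustive check confirms two things. The final pair is complementary for every code input. It becomes 00 or 11 as soon as any single input pair is 00 or 11.

```python
from itertools import product


def cell(a, b):
    """Assumed 2-pair two-rail cell: pairs (x1, y1), (x0, y0) -> (f, g)."""
    (x1, y1), (x0, y0) = a, b
    return ((x1 & x0) | (y1 & y0), (x1 & y0) | (y1 & x0))


def two_rail_tree(pairs):
    """Combine pairs level by level; return (output pair, number of cells used)."""
    level = list(pairs)
    cells = 0
    while len(level) > 1:
        nxt = [cell(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        cells += len(nxt)
        if len(level) % 2:
            nxt.append(level[-1])      # an odd pair passes up unchanged
        level = nxt
    return level[0], cells


def complementary_inputs(n):
    """All 2**n assignments of n complementary pairs (x, not x)."""
    for xs in product((0, 1), repeat=n):
        yield [(x, 1 - x) for x in xs]


CODE = {(0, 1), (1, 0)}
ok = all(two_rail_tree(p)[0] in CODE for p in complementary_inputs(6))
bad = all(
    two_rail_tree(p[:i] + [v] + p[i + 1:])[0] not in CODE
    for p in complementary_inputs(6) for i in range(6) for v in ((0, 0), (1, 1))
)
print(ok, bad, two_rail_tree(next(complementary_inputs(6)))[1])  # True True 5
```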
Totally Self-checking checkers for m-out-of-n codes

The m-out-of-n checker consists of two independent subcircuits, each with a single output.
$$S^n_k = M^n_k \,\&\, \overline{M^n_{k+1}}$$
where $M^n_k$ is the elementary monotonic symmetric function of n variables that equals 1 when at least k of its inputs are 1, so $S^n_k = 1$ exactly when k of the n inputs are 1.
The k-out-of-2k checker is fault-secure for single faults because it has two subcircuits, and a single fault can affect the output of only one of them. If the checker is implemented with AND-OR logic, it is also TSC for unidirectional multiple faults.
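Behaviourally, the two subcircuits compute $M^n_k$ and $M^n_{k+1}$, and $S^n_k$ combines them. The sketch below is a truth-table model only, not a gate-level TSC design. M and S are hypothetical helpers, and the sketch checks $S^6_3$ against the definition of the 3-out-of-6 code.

```python
from itertools import product


def M(bits, k):
    """Elementary monotonic symmetric function: 1 iff at least k bits are 1."""
    return int(sum(bits) >= k)


def S(bits, k):
    """m-out-of-n membership: M_k AND NOT M_(k+1)."""
    return M(bits, k) & (1 - M(bits, k + 1))


n, k = 6, 3
print(all(S(b, k) == int(sum(b) == k) for b in product((0, 1), repeat=n)))  # True
```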
Design of the k-out-of-2k checker

We know that each monotonic symmetric function of n variables can be represented as a composition of elementary monotonic symmetric functions of m and n-m variables:
$$M^n_k = \bigvee_{j=0}^{k} M^m_j \,\&\, M^{n-m}_{k-j}$$
Example (x, y in the first group, z, t in the second):
$$M^4_2 = M^2_0 \,\&\, M^2_2 \vee M^2_1 \,\&\, M^2_1 \vee M^2_2 \,\&\, M^2_0 = xy + (x+y)(z+t) + zt$$
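Both the general decomposition and the $M^4_2$ example can be checked exhaustively. M and M_decomposed below are illustrative helpers. The input `bits` is split into the first m variables and the remaining n - m.

```python
from itertools import product


def M(bits, k):
    """1 iff at least k of the bits are 1 (M^n_k on len(bits) variables)."""
    return int(sum(bits) >= k)


def M_decomposed(bits, k, m):
    """OR over j = 0..k of M^m_j(first m bits) AND M^(n-m)_(k-j)(remaining bits)."""
    a, b = bits[:m], bits[m:]
    return int(any(M(a, j) and M(b, k - j) for j in range(k + 1)))


def M4_2(x, y, z, t):
    """The example: M^4_2 = xy + (x + y)(z + t) + zt."""
    return (x & y) | ((x | y) & (z | t)) | (z & t)


vectors = list(product((0, 1), repeat=4))
print(all(M_decomposed(v, k, m) == M(v, k)
          for v in vectors for k in range(6) for m in range(5)))  # True
print(all(M4_2(*v) == M(v, 2) for v in vectors))                  # True
```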
Design of the k-out-of-2k checker

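The 2k bits are partitioned into two disjoint subsets, A = (x1, ..., xk) and B = (xk+1, ..., x2k). The outputs of the checker can be expressed as:
$$Z_1 = \bigvee_{\substack{i=1 \\ i \text{ odd}}}^{k} M^{k_A}_i \,\&\, M^{k_B}_{k-i} \qquad Z_2 = \bigvee_{\substack{i=0 \\ i \text{ even}}}^{k} M^{k_A}_i \,\&\, M^{k_B}_{k-i}$$
(i = 1, 3, 5, ..., an odd number for Z1; i = 0, 2, 4, 6, ..., an even number for Z2)
where kA and kB are the numbers of 1s occurring in subsets A and B, respectively. $M^{k_A}_i$ equals 1 when $k_A \ge i$, and $M^{k_B}_{k-i}$ equals 1 when $k_B \ge k-i$. For a code word ($k_A + k_B = k$), exactly one term equals 1, the one with $i = k_A$, so Z1 and Z2 are complementary. A word with more than k 1s sets both outputs to 1, and a word with fewer sets both to 0.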
Example 1: design of TSC 2-out-of-4 checker

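k = 2; A = (x1, x2); B = (x3, x4).
$$Z_1 = M^{k_A}_1 \,\&\, M^{k_B}_1 = (x_1 + x_2)(x_3 + x_4)$$
$$Z_2 = M^{k_A}_0 \,\&\, M^{k_B}_2 + M^{k_A}_2 \,\&\, M^{k_B}_0 = 1 \cdot M^{k_B}_2 + M^{k_A}_2 \cdot 1 = x_3 x_4 + x_1 x_2$$
[Figure: AND-OR implementation of Z1 from x1, x2, x3, x4 and of Z2 from x1 x2 and x3 x4.]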
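A behavioural model of the Z1/Z2 construction, with illustrative names, checks both Example 1 and the general claim. For k = 1 to 4, the outputs are complementary exactly when the input has k 1s. For k = 2, they reduce to the closed forms above.

```python
from itertools import product


def checker(bits):
    """k-out-of-2k checker built from the Z1/Z2 formulas (behavioural model)."""
    k = len(bits) // 2
    k_a, k_b = sum(bits[:k]), sum(bits[k:])       # number of 1s in A and in B

    def term(i):
        # M^{kA}_i & M^{kB}_{k-i}: at least i ones in A and at least k - i ones in B
        return k_a >= i and k_b >= k - i

    z1 = any(term(i) for i in range(1, k + 1, 2))  # odd i
    z2 = any(term(i) for i in range(0, k + 1, 2))  # even i
    return int(z1), int(z2)


def behaves_as_checker(k):
    """Complementary outputs if and only if the input word has exactly k 1s."""
    for bits in product((0, 1), repeat=2 * k):
        z1, z2 = checker(bits)
        if (sum(bits) == k) != (z1 != z2):
            return False
    return True


print([behaves_as_checker(k) for k in range(1, 5)])  # [True, True, True, True]

# Closed forms of Example 1 (k = 2).
print(all(checker(v) == (((v[0] | v[1]) & (v[2] | v[3])), ((v[2] & v[3]) | (v[0] & v[1])))
          for v in product((0, 1), repeat=4)))  # True
```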
