
This article has been accepted for inclusion in a future issue of this journal.
Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS
Multiple Cell Upset Correction in Memories Using Difference Set Codes

Pedro Reviriego, Member, IEEE, Mark F. Flanagan, Senior Member, IEEE, Shih-Fu Liu, and Juan Antonio Maestro, Member, IEEE
Abstract: Error Correction Codes (ECCs) are commonly used to protect memories from soft errors. As technology scales, Multiple Cell Upsets (MCUs) become more common and affect a larger number of cells. An option to protect memories against MCUs is to use advanced ECCs that can correct more than one error per word. In this area, the use of one-step majority logic decodable codes has recently been proposed for memory applications. Difference Set (DS) codes are one example of these codes. In this paper, a scheme is presented to protect a memory from MCUs using Difference Set codes. The proposed scheme exploits the localization of the errors in an MCU, as well as the properties of DS codes, to provide enhanced error correction capabilities. The properties of the DS codes are also used to reduce the decoding time. The scheme has been implemented in HDL, and circuit area and speed estimates are provided.

Index Terms: Difference set codes, error correction codes, majority logic decoding, memory, multiple cell upsets (MCUs).
Manuscript received August 31, 2011; revised December 16, 2011; accepted February 08, 2012. This work was supported by the Spanish Ministry of Science and Innovation under Grant AYA2009-13300-C03-01. This paper was recommended by Associate Editor A. Sheikholeslami. P. Reviriego, S.-F. Liu and J. A. Maestro are with Universidad Antonio de Nebrija, E-28040 Madrid, Spain (e-mail: previrie@nebrija.es; sliu@nebrija.es; jmaestro@nebrija.es). M. F. Flanagan is with the School of Electrical, Electronic and Mechanical Engineering, University College Dublin, Belfield, Dublin 4, Ireland (e-mail: mark.flanagan@ieee.org). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSI.2012.2190632

I. INTRODUCTION

Soft errors caused by radiation are a major issue for circuit reliability [1]. A soft error occurs when a radiation particle hits the device and changes the logical value of a circuit node. A number of techniques can be used to protect circuits against soft errors. For example, in [2] the use of a new type of flip-flop is proposed to correct errors, and in [3] such protection is addressed at the transistor level. In memories, Single Error Correction (SEC) codes are typically used to prevent soft errors from causing data corruption [4]. However, as technology scales, it is more likely that a single particle hit changes the value of more than one memory cell [5]. This phenomenon is known as a Multiple Cell Upset (MCU) [6]. The cells affected by an MCU are physically close, as the errors are caused by the same particle hit [7]. To prevent an MCU from causing more than one error in a coded word, interleaving is normally used [8]. With interleaving, the cells that store bits of the same coded word are physically separated. This ensures that one MCU affects only one bit per word. However, the use of interleaving adds complexity to the design and can impact area and power consumption [8], [9]. Interleaving is also not practical for Content Addressable Memories (CAM) or for small register sets [10].

Another alternative for memory protection in the presence of MCUs is the use of more advanced ECCs. The main issue with these codes is their implementation complexity. They require more parity bits, longer decoding times and more complex circuits to encode and decode the words. Recent efforts have focused on reducing the decoding time and the decoder complexity. For example, in [11] the use of a parallel decoder for Double Error Correction (DEC) codes was proposed in order to reduce the decoding time. In [12], an approach which combines a Hamming code and a Bose-Chaudhuri-Hocquenghem (BCH) code was proposed in order to minimize the average decoding latency (this goal was achieved, as in most cases only the Hamming decoding had to be performed).

To simplify the decoding circuitry, the use of One Step Majority Logic Decodable (OS-MLD) codes [13] for memory applications was first proposed in [14]. In that work, a class of Euclidean Geometry codes that are OS-MLD was studied. Further work on the use of these codes for memory applications was presented in [15]. The results show that the decoder can be implemented with very simple circuitry, but requires a large decoding time. More recently, the use of Difference Set (DS) codes for memory applications was proposed in [16]. DS codes are also OS-MLD, and a method that substantially accelerates the decoding was presented in [16].

The use of OS-MLD codes for memory protection has focused so far on random errors affecting any of the bits in the coded word. However, errors caused by MCUs affect cells which are close together. This paper considers the use of DS codes to correct errors caused by MCUs.
A scheme is proposed which can correct large MCUs, based on the properties of DS codes and the localization property of the errors in an MCU.

The rest of the paper is organized as follows. Section II introduces Difference Set codes, illustrates the OS-MLD process with an example, and provides a brief overview of the method to reduce the decoding time presented in [16]. In Section III the proposed scheme is presented, and its error correction capabilities are analyzed theoretically and validated by simulation. This section also shows that the proposed scheme may be used in conjunction with the method proposed in [16] to reduce the decoding time when MCU errors are present, and finally presents an evaluation of the proposed scheme in terms of circuit area and
1549-8328/$31.00 © 2012 IEEE
Fig. 1. Decoder for the (21,11) DS code.
TABLE I PARAMETERS OF DIFFERENCE SET CODES
speed. Section IV discusses a case study of a memory system design in which the advantages of the proposed scheme are illustrated. Finally, the conclusions from this work are presented in Section V.

II. DIFFERENCE SET CODES

Difference Set (DS) codes were proposed by Rudolph [17] and Weldon [18] and are based on the concept of a perfect difference set [13]. This section presents the main parameters of the DS codes that are useful for memory protection, discusses the decoding algorithm, and explains how the decoding time can be reduced using results presented recently in [16]. For completeness, the mathematical construction of the codes is summarized in an Appendix at the end of the paper. A detailed discussion of Difference Set codes can be found in [13].

A. Code Parameters

The parameters of binary DS codes with word sizes up to approximately one thousand bits are shown in Table I. $N$ is the size of the coded word, $k$ is the number of data bits, and $J$ is the number of majority logic check equations. Note that $J$ is odd for all DS codes, and thus $\lfloor J/2 \rfloor = (J-1)/2$ for all of the codes in Table I.

B. Decoding

As mentioned in the Introduction, one interesting property of DS codes is that they are One Step Majority Logic Decodable
(OS-MLD) [13]. This enables the use of a simple decoder. As an illustration, the decoder for the (21,11) DS code is shown in Fig. 1. The decoder is composed of check equations which feed a majority gate. In what follows, we will refer to the inputs of the majority gate as the check bits. When the majority of the check bits are equal to one, the bit being decoded is corrected. Decoding is performed one bit at a time, shifting the contents of the register by one bit each time. Then, after 21 cycles, the complete word is decoded.

For DS codes, the check equations used in the decoder are derived from the perfect difference set, and all are cyclic shifts of a single basic equation. The values of the shifts used to derive the remaining equations are given by the elements in the difference set. For example, the (21,11) code is derived from the perfect difference set $\{0, 2, 7, 8, 11\}$, from which the first check equation is obtained directly. The rest of the equations are obtained by cyclically shifting this equation by 2, 7, 8 and 11 bits respectively, as shown in Fig. 1. Each check equation involves exactly $J$ bits, where $J$ is the number of check equations, and each bit is checked by one and only one equation (except for the last bit, which is checked by every equation). This is different from other OS-MLD codes such as the Euclidean Geometry (EG) codes considered in [14], [15], where at a given iteration some bits are not checked by any equation.

The decoder in Fig. 1 can correct all possible combinations of two errors. If there are two errors, a majority of ones in the check equations will occur only in the cycles for which the bit being decoded is in error, and that bit will be corrected. For the case of three errors, some of the combinations will not be corrected. For example, if the three errors fall in three different check equations while the bit being decoded is correct, that bit would be inverted, causing an error. However, some three-error combinations will be corrected; for example, when one of the three errors is in the bit that is decoded on the first cycle.
In this case, the bit in the decoding position will be corrected in the first cycle and then the other two will be corrected in the remaining cycles.
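The decoding procedure described above can be modelled in a few lines of Python. The sketch below is a behavioural model, not the hardware of Fig. 1. It assumes the perfect difference set {0, 2, 7, 8, 11} modulo 21, a bit indexing in which position 20 is the bit being decoded, and the all-zero codeword, so that the register holds only the error pattern. The names `CHECKS`, `check_bits`, `os_mld_decode` and `error_pattern` are illustrative and do not come from the paper.

```python
from itertools import combinations

N = 21
DIFFERENCE_SET = (0, 2, 7, 8, 11)   # assumed perfect difference set mod 21
J = len(DIFFERENCE_SET)             # number of check equations

# One check equation per element l of the set: the set is cyclically
# shifted so that it contains position N-1, the bit being decoded.
CHECKS = [sorted((d + N - 1 - l) % N for d in DIFFERENCE_SET)
          for l in DIFFERENCE_SET]


def check_bits(reg):
    """Inputs of the majority gate: parity of each check equation."""
    return [sum(reg[i] for i in eq) % 2 for eq in CHECKS]


def os_mld_decode(word, threshold=J // 2 + 1):
    """One pass of N cycles: decode bit N-1, then shift the register."""
    reg = list(word)
    for _ in range(N):
        if sum(check_bits(reg)) >= threshold:
            reg[N - 1] ^= 1          # majority of ones: flip the bit
        reg = [reg[-1]] + reg[:-1]   # cyclic shift, bit i moves to i+1
    return reg                       # after N shifts the bits are back in place


def error_pattern(positions):
    """Word of N bits with ones only at the given positions."""
    return [1 if i in positions else 0 for i in range(N)]


ZERO = [0] * N
double_ok = all(os_mld_decode(error_pattern(p)) == ZERO
                for p in combinations(range(N), 2))
triples = list(combinations(range(N), 3))
triple_ok = sum(os_mld_decode(error_pattern(p)) == ZERO for p in triples)
print("all double errors corrected:", double_ok)
print("triple errors corrected:", triple_ok, "of", len(triples))
```

Under these assumptions the model corrects every two-error pattern, while only part of the three-error patterns are corrected, which is the behaviour discussed in the text.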
In general, a DS code can correct all combinations of errors affecting up to $\lfloor J/2 \rfloor$ bits, where $J$ is the number of check equations. Some combinations of errors that affect more bits can also be corrected. The number of such combinations can be maximized by applying a modified OS-MLD algorithm [18]. This algorithm performs the decoding illustrated in Fig. 1 multiple times, requiring a different majority level among the check bits to trigger a correction each time. For example, the first time five ones are required to trigger a correction, the second time four ones, and the third time a majority of three ones is sufficient to trigger a correction.

C. Reducing the Decoding Time

One drawback of OS-MLD is the long time required for decoding. For the decoder in Fig. 1, a total of 21 cycles are needed to decode a word. The number of cycles is in general equal to the block length of the code. Therefore, for the longer codes the decoding time becomes quite large. In [16], a method that uses the first three cycles of OS-MLD decoding to detect errors was proposed for DS codes. If errors are detected then the decoding proceeds, but if no errors are detected after the third cycle the decoding terminates. Since in most cases the word will not have errors, this scheme greatly reduces the average decoding time. Simulation results showed that for DS codes, all error combinations affecting five or fewer bits were detected using this scheme. For errors affecting three or fewer bits, it was proved that all such error patterns will always be detected. Note that the work of [16] considers random bit errors; in the following section we extend the use of this method to the case where MCUs are present.

III. PROPOSED MCU CORRECTION SCHEME

In this section the proposed scheme for error detection and correction is presented. First, the algorithm that enables effective error correction of MCUs is presented. Then, it is shown that the method to reduce the decoding time introduced in [16] may be used in conjunction with this scheme.
Finally, an implementation of the complete scheme is evaluated, including a discussion of the area and latency requirements for the proposed decoding scheme.

A. Error Correction Scheme

The proposed scheme is based on a property of Difference Set (DS) codes and on the localization of the errors in an MCU. For ease of exposition, the scheme will first be explained for the (21,11) DS code, and afterwards extended to the general case. For the decoder in Fig. 1, let us consider that there are errors on three bits that are all checked by the same equation. Let us also consider that the decoding algorithm is modified such that at least four ones among the check bits are needed to trigger a correction. First, no bits that are correct will be inverted, since there are only three errors. Next, when the first of the three erroneous bits reaches the decoding position, the other two will be in positions that form part of a single check equation together with it, so that all check bits will be equal to one, and the error will be corrected. Similarly, when the second erroneous bit reaches the decoding position, the third one will again share a check equation with it. Both will be checked by the same equation and therefore the error in the second bit will be
corrected. Then, there will be only one error left, which will be corrected when it reaches the decoding position.

In general, for DS codes, any bits involved in the same check equation on the first decoding cycle will again be involved in a single equation when any of these bits is being corrected by the OS-MLD decoder. This follows from the fact that the check equations for DS codes are cyclic shifts of each other. This means that if there are errors only on bits which belong to one check equation, these can be corrected using the decoder in Fig. 1, but requiring $J-1$ of the $J$ check equations to take a value of one to perform a correction. In the following, the number of ones among the check bits required to perform a correction will be referred to as the correction threshold. To avoid miscorrections, the number of errors must be smaller than the correction threshold. Therefore, up to $J-2$ bits in error can be corrected with this scheme when the bits are all involved in a single check equation. For the decoder of Fig. 1, this gives a value of three, in line with the example discussed previously. This is in contrast to the value of $\lfloor J/2 \rfloor$ bits for random errors.

This ability of DS codes to correct more errors when they occur on the same check equation can be combined with the error localization in MCUs to provide a more efficient error correction scheme. To this end, the memory organization shown in Fig. 2 is proposed. The bits that belong to the same check equation are stored contiguously, except for the last bit of the word, which is only stored in the last block. With this scheme, MCUs will tend to affect bits which belong to the same check equation. Two examples of MCU patterns are illustrated in Fig. 2. In the first example (marked as MCU 1), the errors in the MCU affect several bits of words 1 and 2, but all of them belong to the same check equation. Therefore, the errors can be corrected, even if there are three bits in error in word 1.
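As a rough check of this property, the sketch below models the (21,11) decoder with an adjustable correction threshold, together with a storage order in which the bits of each check equation are contiguous. It assumes the perfect difference set {0, 2, 7, 8, 11} modulo 21, a bit indexing in which position 20 is the decoding position, and an all-zero codeword. The names `layout`, `decode` and `pattern` are illustrative and are not taken from the paper's implementation.

```python
from itertools import combinations

N = 21
DIFFERENCE_SET = (0, 2, 7, 8, 11)   # assumed perfect difference set mod 21
J = len(DIFFERENCE_SET)
CHECKS = [sorted((d + N - 1 - l) % N for d in DIFFERENCE_SET)
          for l in DIFFERENCE_SET]

# Storage order: the bits of each check equation are kept together and
# the last bit (present in every equation) is stored once, at the end.
layout = [b for eq in CHECKS for b in eq if b != N - 1] + [N - 1]


def decode(word, threshold):
    """Single OS-MLD pass with the given correction threshold."""
    reg = list(word)
    for _ in range(N):
        ones = sum(sum(reg[i] for i in eq) % 2 for eq in CHECKS)
        if ones >= threshold:
            reg[N - 1] ^= 1
        reg = [reg[-1]] + reg[:-1]   # cyclic shift by one position
    return reg


def pattern(bits):
    return [1 if i in bits else 0 for i in range(N)]


ZERO = [0] * N

# Every pattern of up to J-2 errors inside one check equation,
# decoded with a correction threshold of J-1.
single_eq_ok = all(
    decode(pattern(bits), threshold=J - 1) == ZERO
    for eq in CHECKS
    for size in range(1, J - 1)
    for bits in combinations(eq, size))

# A burst hitting three adjacent cells inside one block of the layout.
burst = layout[0:3]
print("storage order:", layout)
print("single-equation patterns corrected:", single_eq_ok)
print("burst", burst, "threshold J-1:", decode(pattern(burst), J - 1) == ZERO)
print("burst", burst, "plain majority:", decode(pattern(burst), J // 2 + 1) == ZERO)
```

In this model the three-bit burst is corrected with the raised threshold but not with the plain majority threshold, where a correct bit is inverted on the second cycle.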
In the second example (marked as MCU 2), the number of errors is the same, but these errors affect bits in two different check equations. In this case, the errors in word 3 cannot be corrected using the proposed scheme. These examples illustrate how the proposed scheme is able to correct MCUs that affect up to $J-2$ bits in a word. However, when the errors cross a boundary between equations, the proposed scheme is no longer able to correct $J-2$ bits in error. In the following, we derive a general characterization of the error correction capability of this scheme.

For a DS code with $J$ check equations, let us assume that there are $m$ bits in error in a given check equation (the last bit is included as part of the equation that contains the $m$ bits in error) and $r$ bits in error which are randomly distributed among the remaining equations. Let us also assume that OS-MLD decoding is performed using first a correction threshold of $J$ ones among the check bits, and then successively reducing the correction threshold by one until a value of $\lceil J/2 \rceil$ is used (as discussed in the previous section, this modification was suggested in [18] to correct some additional error combinations). The block diagram of this decoding algorithm is illustrated in Fig. 3. Then these errors can be corrected provided that

$$m + 2r \le J - 2.$$

Fig. 2. Proposed memory organization, together with examples of MCUs.

TABLE II ERROR CORRECTION CAPABILITIES OF DS CODES USING THE PROPOSED APPROACH, WHERE $m$ IS THE NUMBER OF ERRORS IN ONE CHECK EQUATION AND $r$ THE NUMBER OF ERRORS IN THE REMAINING BITS

Fig. 3. Block diagram of the proposed error correction algorithm.

To prove this, let us consider first the case in which $m$ is even. Since there can be at most $m + r$ errors, when the correction threshold is $J - 1 - r$ no miscorrections will take place when $m + r < J - 1 - r$. Also, for this correction threshold, when any bit corresponding to one of the $m$ errors reaches the decoding position, there will be at least $J - 1 - r$ ones among the check bits, and therefore the $m$ errors will be corrected. The remaining $r$ bits in error will be corrected when lower values of the correction threshold are used, since from the condition $m + 2r < J - 1$ it follows that $r \le \lfloor J/2 \rfloor$. For $m$ odd, the reasoning is similar. In this case, if the correction threshold is $J - r$, when the first bit that corresponds to the $m$ errors reaches the decoding position, there will be at least $J - r$ ones among the check bits and therefore the error will be corrected. Then the remaining $m - 1$ errors will correspond to the case of an even number of errors, which will be corrected when the correction threshold is $J - 1 - r$ as discussed previously. Subsequently, the $r$ random errors will also be corrected when lower values of the correction threshold are used. Finally, note that since $J$ is always odd, the condition for error correction may in both cases be written as the equivalent single condition $m + 2r \le J - 2$.

These results show that DS codes can also deal with MCUs which affect bits involved in more than two equations. The error correction capabilities of some DS codes are illustrated in Table II. It can be observed that larger MCUs can be corrected as $N$ grows. For example, for $N = 73$, which is a value similar to the memory word size used in many applications, MCUs affecting 5 bits can be corrected, even when they affect bits in more than one equation. This is interesting as it implies that MCUs with a larger number of errors can be corrected regardless of where they occur.

The proposed scheme has been implemented for the DS codes with $N = 21$, 73 and 273. As discussed previously, the OS-MLD procedure is repeated with successively decreasing values of the correction threshold as suggested in [18]. Simulations have then been run, inserting $m$ errors in the bits of one of the equations and $r$ errors in the remaining bits, for the values of $m$ and $r$ in Table II. The equation in which the $m$ errors are inserted is selected randomly. The bit positions on which the errors are inserted are then randomly selected. Ten million combinations were tested for each pair of values of $m$ and $r$. In all cases, all errors were corrected. These simulation results are consistent with the theoretical analysis presented previously.

B. Reducing the Decoding Time

As mentioned in Section II.C, one drawback of OS-MLD is the long time required for decoding. For the decoder in Fig.
1, a total of 21 cycles are needed to decode a word. The number of cycles is in general equal to the block length of the code. Therefore, for the longer codes the decoding time becomes quite large. As discussed before, in [16] a method that uses the first
three cycles of OS-MLD decoding to detect errors was proposed for DS codes. In this section, the use of the first three decoding cycles to detect errors due to MCUs is analyzed. We will consider errors due to MCUs which affect bits involved in either one or two equations. In [16] the following Lemmas were proved.

Lemma 1: For any DS code, there are no three consecutive values in the difference set.

Lemma 2: Given a check equation of a DS code, if a pair of 1s in it is separated by a distance $d$, then there cannot be any other pairs of 1s in the same check equation at a distance $d$ or $N - d$.

These results will be useful in the following analysis, which shows how the first three iterations can be used to detect errors caused by MCUs. Let us first consider the case in which there are errors in bits that are checked by one MLD equation. If the number of errors is odd, the error will be detected in the first decoding cycle. If the number is even, the error will be detected in the second cycle, unless there is an MLD equation corresponding to the one in which the bits were on the first cycle shifted by one position. This is a consequence of the following lemma:

Lemma 3: The bits in one MLD equation on a given cycle will belong to different MLD equations on the next cycle, unless there is an MLD equation which is equal to the first one shifted by one position.

Proof: Let us consider that two bits which were checked by one equation in a particular cycle are checked by another equation in the next cycle. Then, if the second equation is not the first one shifted by one, there must be two pairs of ones at the same distance in the MLD equation. This would contradict Lemma 2.

Therefore, if there are errors in the bits of one check equation, those would then be in different equations on the second cycle and the error will be detected.
The exception is when the bits belong to a second MLD equation on the second cycle which is the one in which the bits were on the first cycle shifted by one position; in this case, the error will be detected on the third cycle. This is because, from Lemma 1, there cannot be a third equation that is the one in the second cycle shifted by one. Therefore, the following theorem can be stated:

Theorem: Given a block protected with a DS code, and affected by any number of bit-flips on a single MLD equation, these can be detected in only three decoding cycles.

Let us now consider the case in which there are errors in bits that are checked by two MLD equations. Then the error will be detected on the first decoding cycle unless the number of errors on both equations is even. Based on Lemma 3, the errors would be detected in the second or third decoding cycle unless the number of errors in both equations is the same. Therefore, the following theorem can be stated:

Theorem: Given a block protected with a DS code with $N = 21$, 73 or 273, and affected by up to seven bit-flips distributed over two MLD equations in the first cycle, these can be detected in only three decoding cycles.

Proof: Errors will not be detected on the first three decoding cycles only when the number of errors is even and is the same for both equations. For two errors on each equation, the
TABLE III AREA AND SPEED RESULTS
resulting set of four errors will be detected based on the simulation results presented in [16]. Therefore, the minimum number of errors that may not be detected is eight (corresponding to four errors on each equation).

The previous analysis shows that using the first three decoding cycles to detect errors will accelerate the decoding significantly, as most words will be error-free. For errors caused by MCUs, this method will detect all errors which affect seven or fewer bits. This extends previous results in [16] for random errors (in that case, only errors which affect five or fewer bits were always detected).

C. Evaluation

The proposed scheme has been implemented in HDL for the different code lengths in Table I. The decoder first performs three decoding cycles to detect errors, and when there is no error the decoding ends. When there are errors, the procedure described in Section III.A and illustrated in Fig. 3 is used. The designs have been synthesized for a TSMC35 library. The circuit area and speed for the different codes are shown in Table III. Area is reported as the number of equivalent gates, and speed as the maximum clock frequency in MHz. The results for the accelerated OS-MLD procedure proposed in [16] are also included for reference. It can be observed that the proposed method requires more circuit area than the previous method. This is due to the additional control logic needed to implement the successive OS-MLD iterations with different thresholds. This area overhead is almost independent of $N$. Therefore, for large values of $N$ the overhead is relatively small. For example, for one of the longer codes the area increment is only 1.47%. The speed is also worse than that reported in [16], as the majority logic has to work with different thresholds. The speed penalty ranges from 24% to 10% and is also smaller for large values of $N$. The decoding of a word will take three clock cycles when there are no errors.
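The control flow of the complete decoder (three detection cycles, followed by successive passes with a decreasing correction threshold when an error is found) can be sketched as a behavioural model. This is a simplified illustration rather than the synthesized design. It assumes an all-zero codeword, the perfect difference sets {0, 2, 7, 8, 11} modulo 21 and {0, 1, 3, 7, 15, 31, 36, 54, 63} modulo 73 (the second set is an assumption, since the text does not list it), and thresholds running from J down to (J+1)/2. All function names are hypothetical.

```python
from itertools import combinations


def make_checks(n, dset):
    """Check equations: cyclic shifts of the set that contain position n-1."""
    return [sorted((d + n - 1 - l) % n for d in dset) for l in dset]


def count_ones(reg, checks):
    """Number of check bits equal to one."""
    return sum(sum(reg[i] for i in eq) % 2 for eq in checks)


def shift(reg):
    return [reg[-1]] + reg[:-1]


def cycles_to_detect(word, checks, limit=3):
    """Cycle (1-based) on which some check bit is one, or None."""
    reg = list(word)
    for cycle in range(1, limit + 1):
        if count_ones(reg, checks):
            return cycle
        reg = shift(reg)
    return None


def decode(word, checks):
    """Return (decoded word, number of cycles used)."""
    n, j = len(word), len(checks)
    if cycles_to_detect(word, checks) is None:
        return list(word), 3                 # no error seen: stop after 3 cycles
    reg, cycles = list(word), 3
    for threshold in range(j, j // 2, -1):   # J, J-1, ..., (J+1)/2
        for _ in range(n):
            if count_ones(reg, checks) >= threshold:
                reg[n - 1] ^= 1
            reg = shift(reg)
        cycles += n
    return reg, cycles


def pattern(n, bits):
    return [1 if i in bits else 0 for i in range(n)]


C21 = make_checks(21, (0, 2, 7, 8, 11))
C73 = make_checks(73, (0, 1, 3, 7, 15, 31, 36, 54, 63))

# Errors confined to one check equation: latest detection cycle.
worst = max(cycles_to_detect(pattern(21, bits), C21) or 99
            for eq in C21 for size in range(1, 6)
            for bits in combinations(eq, size))
print("worst-case detection cycle, (21,11) code:", worst)

# N = 73, J = 9: seven errors in one equation, and a 3 + 2 split.
seven = C73[0][:7]
split = C73[0][:3] + C73[1][:2]
for bits in (seven, split):
    out, used = decode(pattern(73, bits), C73)
    print(len(bits), "errors corrected:", out == [0] * 73, "cycles:", used)
```

The model treats the three detection cycles as a separate check and then restarts decoding from the original word, which keeps the sketch short. A hardware decoder would more likely reuse those first cycles.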
When there are errors a larger number of cycles is required. However, as the percentage of words with errors will be very low, the average access time will be close to that of the error-free case.

Finally, to put these results in perspective, a decoder for a SEC-DED code has also been implemented in HDL and synthesized for the same library. The code is the one described in [19]. The number of equivalent gates is 754 and the speed is 297.6 MHz. Comparing the results with those of the DS code with $N = 73$, the DS code requires 67% more gates and has a speed penalty of 50%. Although this is a large difference, it should be noted that the DS code is a much more powerful code that can correct four random errors and has a larger block size. If the comparison is made with the
Fig. 4. Block diagram of the memory module considered in the case study.
DS code with a shorter block length, then the area and speed are similar to those of the SEC-DED code. These results indicate that the implementation of the proposed scheme is feasible at an acceptable cost and performance. As the DS code requires three clock cycles to decode an error-free word, one option to further reduce the decoding latency would be to implement the first three decoding iterations in parallel to perform error detection in one cycle. This would improve the speed at the expense of an area increase. The study of this alternative is left for future work.

IV. CASE STUDY

In this section, a case study of a memory system design is presented to illustrate the benefits of the proposed scheme. The analysis focuses on the MCU error correction capability and on the number of parity bits required. The comparison of decoder complexity and speed was considered in Section III.C. Let us consider a memory system that has a word size of 72 bits, as in some computer memory modules [20], [21]. The module is built using nine devices, each with a bit width of 8 bits. The structure of the module is shown in Fig. 4. The following options are considered for implementation of the ECC scheme:

1) SEC-DED with interleaving: The interleaving is implemented by dividing the 72 bits into sub-words and using a SEC-DED code on each of them. For example, if we divide the word into two halves of 36 bits, the odd bits form one sub-word and the even bits form another. This ensures that an error which affects two contiguous bits can be corrected.

2) Difference Set OS-MLD: bits are stored in sequential order and OS-MLD is performed with the correction threshold $\lceil J/2 \rceil$, where $J$ is the number of check equations.

3) Difference Set Based Proposed Scheme: bits are stored grouping the ones which correspond to the same MLD check equation. The OS-MLD is performed successively with correction threshold values decreasing from $J$ to $\lceil J/2 \rceil$.

In the first option, the SEC-DED codes will be of varying block sizes, since the size of the sub-words depends on the interleaving used.
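The sub-word sizes and parity-bit counts of the first option can be tabulated with a short calculation. The sketch assumes that each sub-word is protected by a (possibly shortened) extended Hamming SEC-DED code, for which $r$ check bits cover a sub-word of at most $2^{r-1}$ bits, and that the DS code with $N = 73$ is shortened to a (72,44) code. The function names are illustrative, and the printed values follow from these assumptions rather than from Table IV.

```python
WORD_BITS = 72  # total bits per memory word, data plus parity


def secded_parity_bits(n):
    """Check bits of a shortened extended Hamming SEC-DED code of length n."""
    r = 2
    while 2 ** (r - 1) < n:
        r += 1
    return r


def interleaved_secded(distance):
    """Return (sub-word length, data bits per sub-word, total parity bits)."""
    sub_word = WORD_BITS // distance
    parity = secded_parity_bits(sub_word)
    return sub_word, sub_word - parity, distance * parity


for distance in (1, 2, 4, 8):
    n, k, total_parity = interleaved_secded(distance)
    print(f"distance {distance}: {distance} x SEC-DED({n},{k}), "
          f"{total_parity} parity bits, contiguous MCUs up to {distance} bits")

# DS code with N = 73 (3**3 + 1 parity bits), shortened by one data bit.
DS_PARITY = 3 ** 3 + 1
print("DS (72,44):", DS_PARITY, "parity bits")
```

With an interleaving distance of four this gives the shortened SEC-DED (18,12) code mentioned in the text.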
For example, when the interleaving distance is four, the sub-words have 18 bits and a shortened SEC-DED (18, 12) code is used. In the second and third options, the DS code with $N = 73$ is used by shortening it by one bit. The error correction capabilities of the ECC schemes considered are summarized in Table IV. The parameter considered is the maximum size (number of errors) of MCUs that are correctable in all cases. It is assumed that errors in an MCU affect contiguous bits. It can be observed that for SEC-DED the
TABLE IV PARAMETERS OF THE DIFFERENT OPTIONS CONSIDERED IN THE CASE STUDY
number of parity bits grows with the interleaving distance. However, even with this growth the number of parity bits required to correct MCUs of size four is smaller than the number of bits required for the DS-OS-MLD that can also correct MCUs of size four. The use of the proposed technique increases the MCU error correction capabilities of the DS code to seven, since all of the bits stored in a given device belong to the same MLD check equation. To achieve a similar protection using SEC-DED, a larger number of bits is required. This shows how the proposed scheme can be useful in some cases where large MCUs are a concern.

Finally, let us consider that when multiple 8-bit devices are used, one of the devices may suffer a permanent failure. This situation has been studied for commercial computer memory systems [20], [22] and addressed through the use of SEC plus interleaving. In our proposed design, if the system is able to detect the device which has suffered the failure, then the proposed scheme would be able to recover the data. The data recovery process would perform the OS-MLD decoding, but starting with the cycles which correspond to the bits stored in the device that has suffered the failure. Then, since all of those bits form part of a single MLD equation in each of these cycles, at most one MLD equation will be affected and the errors will be corrected. This ability to recover from the failure of a memory device can be useful in some applications.

V. CONCLUSIONS

In this paper, a scheme has been presented which uses Difference Set (DS) codes to correct errors caused by Multiple Cell Upsets (MCUs) in memories. The scheme places the bits in the memory in such a way as to exploit the localization of the errors in an MCU, thus providing additional error correction capabilities. The new bit placement scheme is used in conjunction with an appropriately modified decoding algorithm.
Additionally, a method to accelerate the decoding, previously proposed for DS codes, has been applied to the scheme presented. The results show that the method is also effective in reducing the decoding time when MCUs are present.
The proposed scheme has been validated by simulation using a large number of error combinations and implemented to evaluate its cost in terms of circuit area and speed. A case study was also presented which illustrates the benefits of the proposed scheme in a practical memory configuration.

Future work will study the applicability of the proposed technique to other One-Step Majority Logic Decodable (OS-MLD) codes. In particular, the class of OS-MLD Euclidean Geometry (EG) codes, which have been previously proposed for memory applications, will be considered. In this case, since at a given iteration some bits are not checked by any equation, the proposed technique needs to be modified to account for those bits. Errors on those bits would have the same effect as random errors, and therefore they should be placed apart for the scheme to be effective against MCUs.

APPENDIX

Difference Set codes are cyclic linear block codes which are one-step majority logic decodable and possess a high error-correction capability. They were proposed by Rudolph [17] and Weldon [18] and are based on the concept of a perfect difference set [13]. For a set $P = \{l_0, l_1, \ldots, l_q\}$ of nonnegative integers such that $0 \le l_0 < l_1 < \cdots < l_q \le q(q+1)$, the set of differences can be defined as $D = \{l_j - l_i : j \ne i\}$. The set $P$ is then a perfect difference set if and only if it has the following properties:

1. All positive differences in $D$ are distinct.
2. All negative differences in $D$ are distinct.
3. If $l_j - l_i$ is a negative difference in $D$, then $q(q+1) + 1 + (l_j - l_i)$ is not a positive difference in $D$.

From a perfect difference set with $q = 2^s$, a binary Difference Set (DS) code is a cyclic code of length $N = 2^{2s} + 2^s + 1$ which is derived by first defining the polynomial $z(X)$ according to

$$z(X) = X^{l_0} + X^{l_1} + \cdots + X^{l_{2^s}} \quad (1)$$

Then the parity-check polynomial $h(X)$ is defined as the greatest common divisor of $z(X)$ and $X^N + 1$, i.e., $h(X) = \gcd\{z(X), X^N + 1\}$. [...] any common term except $X^{N-1}$ [...]. Therefore, these polynomials form a set of $J = 2^s + 1$ polynomials orthogonal on the bit at position $N-1$. Those are precisely the parity check-sums used in OS-MLD that are able to correct up to $\lfloor J/2 \rfloor$ errors. Finally, the main parameters of the DS codes are summarized below as a function of $s$:

Code length: $N = 2^{2s} + 2^s + 1$
Message bits: $k = 2^{2s} + 2^s - 3^s$
Parity-check bits: $N - k = 3^s + 1$
Minimum distance: $d_{\min} = 2^s + 2$
Error correction capability: $t = 2^{s-1}$.
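The parameter expressions can be evaluated directly, and the defining property of a perfect difference set can be checked numerically. The sketch below assumes the standard expressions for binary DS codes as a function of $s$ and uses the equivalent formulation that every nonzero residue modulo $N$ occurs exactly once as a difference. The function names are illustrative.

```python
def ds_parameters(s):
    """Parameters of the binary DS code built from a set of 2**s + 1 elements."""
    n = 2 ** (2 * s) + 2 ** s + 1        # code length N
    parity = 3 ** s + 1                  # parity-check bits
    j = 2 ** s + 1                       # number of check equations J
    return {"N": n, "k": n - parity, "parity": parity, "J": j,
            "d_min": j + 1, "t": j // 2}


def is_perfect_difference_set(dset, n):
    """True if every nonzero residue mod n occurs exactly once as a difference."""
    diffs = [(a - b) % n for a in dset for b in dset if a != b]
    return sorted(diffs) == list(range(1, n))


for s in range(1, 6):
    print(s, ds_parameters(s))

print(is_perfect_difference_set((0, 2, 7, 8, 11), 21))
```

For $s = 2$ and $s = 3$ this gives the (21,11) and (73,45) codes used in the text, and $s = 5$ gives a block length of 1057 bits.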
REFERENCES
[1] R. C. Baumann, Radiation-induced soft errors in advanced semiconductor technologies, IEEE Trans. Device Mater. Reliab., vol. 5, no. 3, pp. 301316, 2005. [2] M. Zhang and N. R. Shanbhag, Dual-sampling skewed CMOS design for soft-error tolerance, IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 12, pp. 14611465, Dec. 2006. [3] G. Torrens, B. Alorda, S. Barcel, J. L. Rossell, S. A. Bota, and J. Segura, Design hardening of nanometer SRAMs through transistor combination, IEEE Trans. Circuits width modulation and multiSyst. II, Exp. Briefs, vol. 57, no. 4, pp. 280284, Apr. 2010. [4] C. L. Chen and M. Y. Hsiao, Error-correcting codes for semiconductor memory applications: A state-of-the-art review, IBM J. Res. Develop., vol. 28, no. 2, pp. 124134, 1984. [5] E. Ibe, H. Taniguchi, Y. Yahagi, K. Shimbo, and T. Toba, Impact of scaling on neutron-induced soft error rate in SRAMs from a 250 nm to a 22 nm design rule, IEEE Trans. Electron Devices, vol. 57, no. 7, pp. 15271538, Jul. 2010. [6] R. K. Lawrence and A. T. Kelly, Single event effect induced multiplecell upsets in a commercial 90 nm CMOS digital technology, IEEE Trans. Nucl. Sci., vol. 55, no. 6, pt. 1, pp. 33673374, Dec. 2008. [7] S. Satoh, Y. Tosaka, and S. A. Wender, Geometric effect of multiple-bit soft errors induced by cosmic ray neutrons on DRAMs, IEEE Electron Device Lett., vol. 21, no. 6, pp. 310312, Jun. 2000. [8] S. Baeg, S. Wen, and R. Wong, SRAM interleaving distance selection with a soft error failure model, IEEE Trans. Nucl. Sci., vol. 56, no. 4, pt. 2, pp. 21112118, Aug. 2009. [9] A. Dutta and N. A. Touba, Multiple bit upset tolerant memory using a selective cycle avoidance based SEC-DED-DAEC code, in Proc. 25th IEEE VLSI Test Symp., 2007, pp. 349354. [10] S. Baeg, S. Wen, and R. Wong, Minimizing soft errors in TCAM devices: A probabilistic approach to determining scrubbing intervals, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 4, pp. 814822, Apr. 2010. [11] R. Naseer and J. 
Draper, DEC ECC design to improve memory reliability in Sub-100 nm technologies, in Proc. IEEE ICECS, 2008, pp. 586589. [12] P. Ankolekar, S. Rosner, R. Isaac, and J. Bredow, Multi-bit error correction methods for latency-constrained ash memory systems, IEEE Trans. Device Mater. Reliab., vol. 10, no. 1, pp. 3339, 2010. [13] S. Lin and D. J. Costello, Error Control Coding, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2004. [14] S. Ghosh and P. D. Lincoln, Dynamic low-density parity check codes for fault-tolerant nano-scale memory, in Proc. Foundations of Nanoscience (FNANO07), Snowbird, UT, 2007 [Online]. Available: http://www.csl.sri.com/users/shalini/fnano.pdf [15] H. Naeimi and A. DeHon, Fault secure encoder and decoder for nanoMemory applications, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 4, pp. 473486, 2009. [16] S. Liu, P. Reviriego, and J. A. Maestro, Efcient majority logic fault detection with difference-set codes for memory applications, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 1, pp. 148156, 2012. [17] L. D. Rudolph, Geometric Conguration and Majority Logic Decodable Codes, M.E.E., University of Oklahoma, , 1964. [18] E. J. Weldon Jr., Difference-set cyclic codes, Bell System Tech. J., vol. 45, pp. 10451055, 1966.
(2) where . Finally, the generator polynomial of the code is obtained via (3) Given the perfect difference set polynomials according to , dene the
(4) for two polynomials (i.e., for and ). Note that no given by (4) can have
[19] M. Y. Hsiao, A class of optimal minimum odd-weight column SEC-DED codes, IBM J. Res. Develop., vol. 14, no. 4, pp. 395401, 1970. [20] T. J. Dell, A white paper on the benets of chipkill-correct ECC for PC server main memory, IBM Microelectronics Division, Jul. 1997. [21] Memory Interface Solutions User Guide, Xilinx, Inc., 2010, UG086, 3.6. [22] Q. Li and U. Patel, Enabling memory reliability, availability, and serviceability features on Dell PowerEdge servers, Dell Power Solutions, Aug. 2005. Pedro Reviriego (A03M04) received the M.Sc. and Ph.D. degrees (hons) in telecommunications engineering from the Technical University of Madrid, Madrid, Spain, in 1994 and 1997, respectively. From 1997 to 2000, he was an R&D Engineer with Teldat, Madrid, working on router implementation. In 2000, he joined Massana to work on the development of 1000BaseT transceivers. During 2003, he was a Visiting Professor with the University Carlos III, Legans, Madrid. From 2004 to 2007, he was a Distinguished Member of Technical Staff with the LSI Corporation, working on the development of Ethernet transceivers. He is currently with the Universidad Antonio de Nebrija, Madrid. He is the author of numerous papers in international conference proceedings and journals. He has also participated in the IEEE 802.3 standardization for 10 GBaseT. His research interests include fault-tolerant systems, performance evaluation of communication networks, and the design of physical layer communication devices.
Mark F. Flanagan (M'03-SM'10) received the B.E. and Ph.D. degrees in electronic engineering from University College Dublin, Ireland, in 1998 and 2005, respectively. He was a Project Engineer with Parthus Technologies Ltd. during 1998-1999. He served as Principal Investigator on the project "Coding for Broadband wireless applications" with the Digital Signal Processing Research Group, University College Dublin, during 2005-2006. Between 2006 and 2008, he held postdoctoral research fellowships at the University of Zürich, Switzerland; the University of Bologna, Italy; and the University of Edinburgh, U.K. He is currently with University College Dublin, where he was appointed as SFI Stokes Lecturer in electronic engineering in 2008. He is an Academic Member of the Claude Shannon Institute for Discrete Mathematics, Coding and Cryptography, Dublin. He is a reviewer for many leading international journals. His research interests are in the fields of information theory, wireless communications, and signal processing. Dr. Flanagan is currently serving as an Editor for IEEE COMMUNICATIONS LETTERS. He was a recipient of the ESRF Postdoctoral Fellowship from the Institute of Advanced Studies, University of Bologna, Italy, in 2007.

Shih-Fu Liu received the B.Sc. degree in electronic engineering from Carinthian Tech Institute, Villach, Austria, in 2003, the M.Sc. degree in computer system engineering from the Technical University of Denmark, Lyngby, Denmark, in 2005, and the Ph.D. degree in industrial engineering from the University Antonio de Nebrija, Madrid, Spain, in 2011. He has participated in various regional, national, and European-funded projects, developing radiation fault tolerant circuit designs for ASICs and FPGAs and FPGA implementations of a runtime-reprogrammable high-performance multirate filter processor. He is currently a full-time Professor and Researcher with the Universidad Antonio de Nebrija.

Juan Antonio Maestro (M'07) received the M.Sc. degree in physics and the Ph.D. degree in computer science from Universidad Complutense de Madrid, Madrid, Spain, in 1994 and 1999, respectively. He has served both as a Lecturer and a Researcher at several universities, such as the Universidad Complutense de Madrid; the Universidad Nacional de Educación a Distancia (Open University), Madrid; Saint Louis University, Madrid; and the Universidad Antonio de Nebrija, Madrid, where he currently manages the Computer Architecture and Technology Group. His current activities are oriented to the space field, with several projects on reliability and radiation protection, as well as collaborations with the European Space Agency. Aside from this, he has worked for several multinational companies, managing projects as a Project Management Professional and organizing support departments. He is the author of numerous technical publications, both in journals and international conferences. His areas of interest include high-level synthesis and cosynthesis, signal processing, real-time systems, fault tolerance, and reliability.


1549-8328/$31.00 © 2012 IEEE

Fig. 1. Decoder for the (21,11) DS code.

TABLE I PARAMETERS OF DIFFERENCE SET CODES

Fig. 2. Proposed memory organization, together with examples of MCUs.

Fig. 3. Block diagram of the proposed error correction algorithm.

TABLE III AREA AND SPEED RESULTS

TABLE IV PARAMETERS OF THE DIFFERENT OPTIONS CONSIDERED IN THE CASE STUDY
