
An Error Correction Code to Address Neutron Single Event Upsets in Semiconductor Memory

David W. Jensen, Ph.D.
Advanced Computing Systems
Rockwell Collins
400 Collins Road NE, MS 108-206
Cedar Rapids, Iowa 52498
dwjensen@rockwellcollins.com

April 23, 2002

Introduction

Why be concerned about Neutron Single Event Upsets (NSEUs)?
Mitigation techniques for microprocessor technology
Error correction codes
Block code to address Single Event Upsets (SEUs) and Multiple Bit Upsets (MBUs)
Summary


Avionics Platforms

Avionics electronics and communication
Upgrades to existing systems
High altitude and latitude: 55,000 feet, polar route

Life critical and national security critical operation


No single fault, however improbable, shall jeopardize continued safe flight and landing


Only Avionics?

The trend of current design practice suggests that device density will continue to halve every two years and that memory size will continue to quadruple every two years as well. These factors, along with the ever-decreasing power levels, will cause further reduced energy thresholds in microelectronics semiconductor circuits in the years to come. This suggests that SEU effects are likely to increase 10-fold every five years. For these reasons, it is conceivable that all computer devices - not just those at altitude - will need to be protected from SEU effects within the next 10-15 years.
John H. Sohn, Rockwell Collins, Air Transport Systems


Mitigation Goals

20-year history of microprocessor development
World's first 16-bit CMOS microprocessor
World's first direct-execution Java Virtual Machine (JVM) microprocessor
Avionics quality

Identifying design mitigation techniques for commercial fabrication of NSEU tolerant microprocessors

Goal:
Address SEUs and MBUs in microprocessor elements through design techniques instead of special fabrication techniques
Initial focus: device-level approaches for soft error upsets
Future focus: system-level approaches for hard errors, latchup, burnout, and ruptures
Total dose issues will continue to require special fabrication techniques


Proc Technology Mitigation Techniques


Error Detection and Correction Code


Error Detection and Correction (EDAC)
Hamming created the error correction concept in the 1950s
Provides correction of errors rather than detection alone

Still used today
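
As a concrete reminder of how Hamming-style correction works, here is a minimal sketch of the textbook Hamming(7,4) code. It is an illustration only, not the code from this presentation; the function names (`encode`, `syndrome`, `correct`, `decode`) are hypothetical. Check bits sit at positions 1, 2, and 4, and the syndrome of a single-bit error equals the position of the flipped bit.

```python
# Textbook Hamming(7,4) sketch (illustrative; not the presenter's code).
# Codeword positions are numbered 1..7; check bits sit at positions 1, 2, 4.

def encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def syndrome(word):
    """XOR of the positions of all set bits; 0 for a valid codeword."""
    s = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            s ^= position
    return s

def correct(word):
    """Flip the bit named by the syndrome; return (fixed word, syndrome)."""
    s = syndrome(word)
    fixed = list(word)
    if s:
        fixed[s - 1] ^= 1
    return fixed, s

def decode(word):
    """Extract the 4 data bits from a (corrected) codeword."""
    return [word[2], word[4], word[5], word[6]]

codeword = encode([1, 0, 1, 1])
upset = list(codeword)
upset[4] ^= 1                      # single event upset at position 5
repaired, s = correct(upset)
print(s, decode(repaired))         # syndrome names the flipped position
```

The sketch corrects any single flipped bit in the 7-bit word. It makes no attempt to handle two or more upsets in the same word.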


Multiple Bit Upsets must be Addressed

Scaling of semiconductor device geometries causing MBUs

Single Error Correction / Double Error Detection (SEC/DED) codes are ineffective for these MBUs
Created a block code to efficiently address these physically adjacent errors
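
A small sketch of why a single-error-correcting code cannot repair adjacent upsets, using the textbook Hamming(7,4) parity-check columns as an assumed example (the names `HAMMING_COLUMNS` and `adjacent_double_aliases` are hypothetical). The syndrome of a double error is the XOR of the two columns hit; in a plain SEC code that XOR equals some other single column, so the decoder would flip a third, innocent bit. An SEC/DED code adds an overall parity bit that flags this case as uncorrectable, but it still cannot repair it.

```python
# Columns of the textbook Hamming(7,4) parity-check matrix as integers 1..7
# (column i is the binary representation of position i). Illustrative only.
HAMMING_COLUMNS = list(range(1, 8))

def adjacent_double_aliases(columns):
    """For each adjacent pair of positions, report which single position
    (if any) has the same syndrome as the double error."""
    aliases = []
    for i in range(len(columns) - 1):
        s = columns[i] ^ columns[i + 1]          # syndrome of the double error
        blamed = columns.index(s) + 1 if s in columns else None
        aliases.append((i + 1, i + 2, blamed))
    return aliases

for left, right, blamed in adjacent_double_aliases(HAMMING_COLUMNS):
    print(f"upset in bits {left} and {right} looks like a single error in bit {blamed}")
```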


Block Code Generation Technique

Adjacent errors always produce a syndrome that is the exclusive-or (xor) of the block code columns in error.
Simple set of guidelines to develop block code matrices that can correct double and triple adjacent errors:
Identify a unique set of syndromes to identify the column bits in error, the double adjacent columns in error, and the triple adjacent columns in error
Compute the double and triple error syndrome values by exclusive-oring the syndrome values of the corresponding single-bit columns
Ensure that no duplicate syndromes exist for the single, double adjacent, and triple adjacent errors

Syndrome generation and correction logic comparable to conventional EDAC designs
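
The guidelines above can be sketched as a rule checker. This is a minimal illustration under stated assumptions, not the presenter's software: each parity-check column is represented as an integer, the function name `adjacent_syndromes` is hypothetical, and the 5-column matrix is a toy example chosen so it can be verified by hand.

```python
def adjacent_syndromes(columns, max_burst=3):
    """Build a syndrome -> error-positions table for single, double adjacent,
    and triple adjacent errors. Returns None if any syndrome is zero or
    duplicated, i.e. the columns break the uniqueness rule."""
    table = {}
    for length in range(1, max_burst + 1):
        for start in range(len(columns) - length + 1):
            s = 0
            for column in columns[start:start + length]:
                s ^= column                      # XOR of the columns in error
            if s == 0 or s in table:
                return None
            table[s] = tuple(range(start, start + length))
    return table

# Toy matrix (assumption for illustration): 4 check bits, 5 columns.
toy = [1, 2, 4, 8, 5]
table = adjacent_syndromes(toy)
print(len(table))            # 5 single + 4 double + 3 triple patterns
print(table[2 ^ 4])          # double adjacent error in columns 1 and 2 (0-based)

# Hamming-ordered columns fail: columns 1 and 2 XOR to column 3.
print(adjacent_syndromes([1, 2, 3, 4, 5, 6, 7]))
```

Correction then amounts to a table lookup: a nonzero syndrome found in the table names the one, two, or three adjacent bits to flip.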



Block Code Generation

Implement software to generate the code and check the rules

Genetic algorithm approach used to generate block codes


Search technique used to generate block codes
Search technique illustrated
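
The slides do not give the details of the genetic algorithm or the search technique, so the following is only a stand-in: a plain depth-first backtracking search that adds one column at a time and rejects any candidate whose single, double adjacent, or triple adjacent syndrome is zero or already taken. The function name `search_block_code` and its parameters are hypothetical.

```python
def search_block_code(check_bits, length, max_burst=3):
    """Depth-first search for `length` parity-check columns (integers of
    `check_bits` bits) whose adjacent-error syndromes are all unique.
    Returns a list of columns, or None if the search space is exhausted."""
    columns = []
    used = set()

    def new_syndromes(candidate):
        # Syndromes of the error bursts that end at the candidate column.
        found = [candidate]
        s = candidate
        for back in range(1, max_burst):
            if back > len(columns):
                break
            s ^= columns[-back]
            found.append(s)
        return found

    def extend():
        if len(columns) == length:
            return True
        for candidate in range(1, 1 << check_bits):
            fresh = new_syndromes(candidate)
            if 0 in fresh or len(set(fresh)) != len(fresh):
                continue
            if used.intersection(fresh):
                continue
            columns.append(candidate)
            used.update(fresh)
            if extend():
                return True
            columns.pop()                        # backtrack
            used.difference_update(fresh)
        return False

    return list(columns) if extend() else None

print(search_block_code(check_bits=4, length=5))
```

Exhaustive backtracking is fine for toy sizes like this one. For 32-bit or 64-bit data words the search space grows quickly, which is presumably one reason a heuristic such as a genetic algorithm is attractive.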


Adjacent Error Correction Efficiency

Acronyms:
SEC: Single Error Correction
DEC: Double Error Correction
DAEC: Double Adjacent Error Correction
TAEC: Triple Adjacent Error Correction

Adjacent error correction is nearly as efficient as SEC for 32-bit and 64-bit data
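
A rough counting argument makes the efficiency claim plausible. Every correctable error pattern, plus the no-error case, needs its own syndrome, so r check bits over an n-bit codeword (n = data bits + r) can work only if 2^r is at least 1 plus the number of patterns. The sketch below computes that lower bound; it is a necessary condition, not a guarantee that a code exists, and it is not the data behind the slide. All function names are hypothetical.

```python
from math import comb

def min_check_bits(data_bits, pattern_count):
    """Smallest r with 2**r >= 1 + pattern_count(n), where n = data_bits + r."""
    r = 1
    while 2 ** r < 1 + pattern_count(data_bits + r):
        r += 1
    return r

def sec(n):            # single errors only
    return n

def sec_daec(n):       # single + double adjacent
    return n + (n - 1)

def sec_daec_taec(n):  # single + double adjacent + triple adjacent
    return n + (n - 1) + (n - 2)

def dec(n):            # single + any double error
    return n + comb(n, 2)

for data_bits in (32, 64):
    bounds = {f.__name__: min_check_bits(data_bits, f)
              for f in (sec, sec_daec, sec_daec_taec, dec)}
    print(data_bits, bounds)
# 32 {'sec': 6, 'sec_daec': 7, 'sec_daec_taec': 7, 'dec': 10}
# 64 {'sec': 7, 'sec_daec': 8, 'sec_daec_taec': 8, 'dec': 12}
```

By this bound, adding double and triple adjacent correction costs about one check bit more than SEC, while correcting arbitrary double errors costs several more.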


Summary

Rockwell Collins has an ongoing interest in Single Event Effects (SEE)


Immediate concern for future avionics products
Long-term concern for land-based products
Several possible research threads in this area
Concerned over the issue of SEE in current designs and expect the problem to grow worse in the future
Combining multiple mitigation techniques could enable an NSEU-tolerant, commercially fabricated microprocessor
Presented an efficient error correction block code to address SEUs and MBUs in semiconductor memory
