
Microelectronics Reliability 54 (2014) 2410–2416


Electrical Overstress of Integrated Circuits

K.T. Kaschani a,*, R. Gärtner b,1
a Texas Instruments, Haggertystr. 1, 85356 Freising, Germany
b Infineon Technologies AG, Am Campeon 1-12, 85579 Neubiberg, Germany

Article info
Article history: Received 11 March 2013; received in revised form 7 April 2014; accepted 9 April 2014; available online 10 May 2014
Keywords: Electrical overstress; Integrated circuit

Abstract
Common misconceptions regarding electrical overstress (EOS) and the failure characteristics of integrated circuits (ICs) are summarized, analyzed and clarified. In order to avoid EOS fails right from the beginning of the IC design process, a methodology is proposed that accounts for the special characteristics of ICs and their applications in order to deal with EOS in the design, handling and application of ICs.
© 2014 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +49 8161 80 4348; fax: +49 8161 80 3322.
E-mail addresses: (K.T. Kaschani), reinhold.gaertner (R. Gärtner).
1 Tel.: +49 89 234 23754; fax: +49 89 234 9552117.
0026-2714/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

There is a trend, driven by system manufacturers, to integrate more and more “EOS protection” on-chip for cost, overall form factor and performance reasons [1]. Unfortunately, many users of ICs do not distinguish between the different causes of EOS [2,3]. For example, users expect the integrated component-level ESD (CL-ESD) protection of ICs not only to withstand any CL-ESD event but also to protect ICs from system-level ESD (SL-ESD) and other kinds of electrical stress. This lack of distinction has unintentionally been encouraged over many years by IC manufacturers, who classified field returns caused by specific types of EOS simply as “EOS/ESD” [3]. Moreover, despite the remarkable advances in ESD control, many users of ICs still cling to requirements that are by now outdated, e.g. 2 kV human-body-model (HBM) [4], or to misinterpreted models, e.g. the so-called “machine-model” (MM) [5]. At the same time, IC manufacturers are required to supply “EOS robust” ICs to their customers without knowing the final application and its electrical stress requirements [1].

These inconsistencies cause several replacement processes and typically result in a trial-and-error development process to design “EOS robust” ICs (cf. Fig. 1). Unfortunately, only a few electrical stress requirements (e.g. CL-ESD and latch-up (LU) requirements) are generally specified by system manufacturers (1). Other electrical stress requirements, though needed, are often neglected and are therefore shaded in grey in Fig. 1. Hence, IC manufacturers have to get by with the knowledge they have gained from previous designs and from analyses of competitor ICs (2) in order to design “robust” ICs (3). Since, on the other hand, system manufacturers are not aware of the failure characteristics of ICs and of the electrical stress that ICs are exposed to in their manufacturing and application environments, system manufacturers often compare compatible ICs of competitors to select the most “robust” one (4). Unfortunately, this trial-and-error process often causes unpleasant surprises, e.g. when an IC with an excellent CL-ESD immunity fails under another kind of electrical stress. In this case, system manufacturers often request failure analysis reports and detailed information on the failing IC from the IC manufacturer (5). Unfortunately, they generally do not provide much information on the conditions that caused the failure. As a result, IC manufacturers typically run failure analyses in order to find the root cause of the failure and try to fix it by a re-design of the IC (6). If the root cause of the failure cannot be found and ICs still fail, some system manufacturers respond to the assumed lack of “EOS robustness” by increasing their CL-ESD and LU requirements or by asking for robustness validation of ICs (7). Finally, they rate IC manufacturers based on the “EOS robustness” of their ICs.

The aforementioned inconsistencies and the problems they cause are often due to common misconceptions regarding ICs and EOS. These misconceptions and the desire for simple EOS solutions are often encouraged by today’s tight time-to-market and cost requirements [6]. Therefore, it is important to understand and accept that physics does not respond to time-to-market and
cost requirements. Only by taking into account the characteristics of ICs, their application systems and the different kinds of electrical stress that they may experience is it possible to protect them from EOS.

Fig. 1. Typical trial-and-error development process to design “EOS robust” ICs. (The chronological sequence is indicated by the digits 1–7.)

Given these facts, the purpose of this paper is to clarify the failure characteristics of ICs, to present typical causes of EOS and to propose a methodology that accounts for the special characteristics of ICs and their applications in order to deal with EOS in the design, handling and application of ICs.

2. Failure characteristics of ICs

The expectation that on-chip ESD protection protects ICs not only from CL-ESD events but also from other kinds of electrical stress is often based on one or more of the following assumptions:

(a) The current path within an IC is assumed to be the same for different kinds of electrical stress.
(b) The same part of an IC is assumed to be damaged regardless of the kind of electrical stress.
(c) Specific parts of an IC are assumed to respond equally to different kinds of electrical stress.
(d) The response of an IC to electrical stress is assumed to be independent of its mode of operation.
(e) The response of an IC to electrical stress is assumed to be proportional to the level of the given electrical stress.
(f) It is assumed that the responses of an IC to single kinds of electrical stress can be superimposed to give the response of the IC to superimposed electrical stresses.

As explained in [2], all these assumptions refer to characteristics of linear systems. Hence, the expectation that on-chip ESD protection protects ICs not only from CL-ESD events but also from other kinds of electrical stress is typically based on the assumption that ICs are linear systems. Unfortunately, this assumption is generally wrong, as will be explained in the following sections.

2.1. Semiconductor devices and ICs

As shown in [2], active devices (diodes, transistors and SCRs) are non-linear devices. This is why their electro-thermal characteristics are not solved analytically but simulated numerically [7]. The non-linear characteristics of active devices become especially clear when saturation, breakdown, snapback, hysteresis and memory effects are taken into account. ESD protection elements in particular are typically driven deeply into breakdown in order to clamp the voltage while conducting large currents. As explained in [8], their breakdown does not necessarily occur instantaneously as soon as a certain threshold voltage is exceeded but may be significantly delayed. The non-linearity of ESD protection devices is also confirmed by the lack of a common correlation between their HBM robustness and their SL-ESD robustness, as reported in [1]. In fact, different ESD protection elements were found to have significantly different correlation factors, which makes it very difficult (if not impossible) to predict their response to other kinds of electrical stress.

Even passive devices like resistors, capacitors and inductors are known to show non-linear characteristics if they are operated beyond their safe operating area, which may occur for any EOS [9,10]. Taking into account that almost all ICs employ both active and passive devices, it has to be concluded that ICs are generally non-linear systems.

2.2. Failure mechanisms

The failures caused by EOS can generally be divided into reversible (soft) and irreversible (hard) failures. A reversible failure can be removed by a functional reset (e.g. a power restart or a logic reset), or it can be healed by self-healing or physical treatment (e.g. annealing). Reversible failures are caused by electrical stress exceeding the trigger threshold of a functional failure mechanism (a switching operation, i.e. a non-linear mechanism) or by approaching the failure threshold (e.g. breakdown, another non-linear mechanism) of an IC component. Hence, the failure mechanisms causing reversible failures are non-linear.

Irreversible failures are permanent. They are caused either by electrical stress exceeding the trigger threshold of a physical failure mechanism (immediate failure) or by accelerated aging (delayed failure). The major failure mechanisms leading to irreversible failures caused by EOS are thermal overload due to dissipated energy and dielectric breakdown due to an electric field or voltage stress imposed for a certain amount of time [11]. As explained in [2,12], the relation between the thermal overload threshold of semiconductor devices and the stress time on the one hand and the relation between the time-to-failure and the
electrical field (or voltage) across gate oxides on the other hand are always non-linear. Hence, the major failure mechanisms causing irreversible failures of ICs are non-linear as well.

2.3. Failure signatures

According to [2,13], different kinds of EOS typically cause different types of failure signatures. For example, while charged-device-model (CDM) type overstress events generally cause dielectric breakdown due to their large peak currents, which in turn lead to high overvoltages within ICs, HBM-type overstress events cause melt filaments due to the dissipated energy. In contrast, high-energy EOS events often result in large molten areas or even blown bond wires [16] owing to the tremendous power dissipated over a long period of time. This shows that there is generally no correlation between the failure signatures of different EOS events. Hence, this lack of correlation confirms again that ICs are non-linear systems.

Note that, since the major failure mechanisms in ICs depend on the time duration for which voltages and/or currents are applied [2], failure signatures can generally not be clearly attributed to specific voltage and/or current waveforms. In fact, many different voltage and/or current waveforms can lead to the same failure signature. From the type and the size of the different failure signatures, only some fundamental characteristics of the voltage and current waveforms that caused the failure can be inferred. Hence, the metaphor of a “fingerprint” that is often used to describe failure signatures is misleading. Failure signatures are not that unique.

2.4. Failure location

As already indicated in the previous section, different EOS waveforms may lead to different failure locations within ICs. As explained in [2], the failure location can vary for different ESD events that are imposed on the same pin of an IC because different stress waveforms can trigger different current paths in ICs. The more complex the on-chip circuitry connected to a pin, the more different current paths may be triggered by different stress waveforms imposed on that pin. This proves that the same part of an IC is not necessarily damaged regardless of the kind of EOS. Furthermore, SL-ESD is often regarded as a superposition of HBM and CDM, and a successful superposition has been reported in a couple of individual cases; nevertheless, the example given in [2] also proves that the responses of an IC to single kinds of EOS can generally not be superimposed to give the response of the IC to superimposed EOS. Again, such behavior is characteristic of a non-linear system.

2.5. Window effects

A PASS–FAIL–PASS–FAIL sequence that is obtained as the stress imposed on an IC is increased is called a window effect [2,9]. It is characterized by failing stress levels that are framed by lower and higher stress levels which are passed. Window effects are well known and are often caused by competition between protection elements and the devices to be protected. They prove that the response of an IC to an electrical stress is not necessarily proportional to the level of the given electrical stress. Therefore, window effects are an indication of the non-linear nature of ICs.

2.6. Mode of operation

According to [17], the SL-ESD immunity of an audio amplifier IC was found to vary depending on the applied supply voltage of the IC. When unbiased, the SL-ESD immunity of the IC was found to be 7 kV. However, when the IC was biased, the SL-ESD immunity turned out to be only 1 kV. This proves that the response of an IC to an electrical stress is not necessarily independent of its mode of operation, which in turn confirms the non-linearity of ICs.

2.7. Application circuit

Application circuits can also have a strong effect on the immunity of ICs to electrical stress. For example, the effects of external capacitors connected to the pins of ICs are explained in [2,18]. Depending on the kind of ESD protection devices integrated with an IC, external capacitors can be either beneficial or detrimental to the electrical stress immunity of that IC. If non-snapback devices are employed, external capacitors can mitigate the electrical stress imposed on ESD protection devices by buffering detrimental peak currents. However, if snapback devices are involved, external capacitors may lead to immediate destruction of an ESD protection device as soon as its trigger voltage is exceeded. This shows that the electrical stress immunity of an IC is not necessarily independent of its application system and that ICs have to be regarded as non-linear systems.

All of the aforementioned aspects prove the non-linear failure characteristics of ICs. These non-linear characteristics have a strong impact on the design of any electrical stress protection, both at component level and at system level. For example, the quasi-static IV characteristics of protection elements and the breakdown voltages of devices to be protected are not constant but have to be adjusted to each type of electrical stress. As a result, on-chip CL-ESD protection cannot be expected to protect ICs from any other kind of electrical stress.

Fig. 2. EOS in relation to AMR for a given electrical stress type and failure probability distribution of an IC.

3. EOS root causes

One of the reasons why EOS of ICs is so difficult to address is the lack of a clear and consistent definition of EOS. Traditionally, many IC and system manufacturers are used to classifying failures as “EOS failures” based on the size of the physical failure signature and hence based on the dissipated energy. While small failure signatures (small dissipated energies) are attributed to “ESD”, large failure signatures (large dissipated energies) are attributed to “EOS”. Unfortunately, this “definition” is not clear where medium dissipated energies and medium-sized failure signatures are concerned. Furthermore, it does not account for failures caused by electromagnetic interference (EMI) and by misapplication. Given the traditional failure classification, any low-energy electrical stress that exceeds the absolute maximum ratings (AMR) of an IC and causes it to fail is likely to be attributed to the cause “ESD” rather than to the cause “EOS”. It is no surprise that customers of ICs request ever-increasing CL-ESD immunities if the root causes of IC failures are only determined based on the effects of electrical overstress. Therefore, EOS is defined in this paper as follows:
Definition. Electrical overstress (EOS) is any electrical stress that exceeds any of the absolute maximum ratings (AMR) of an IC and causes it to fail, reversibly or irreversibly, immediately or delayed.

This definition and its implications are illustrated in Fig. 2. In this figure, the rate of immediate failures, indicated by the solid bold line, is plotted over the stress level of a given electrical stress type. The stress level range that covers the increase of the failure rate from 0% to 100% is denoted as the typical failure thresholds (tFT) of all healthy ICs of the same type that experience the same electrical stress. Since failure thresholds are subject to process spread, they are given by a band rather than a sharp line. Above the tFT, all healthy ICs will fail immediately. Below the tFT, all healthy ICs are robust against immediate failures; they may, however, become subject to accelerated aging and delayed failures. Within the tFT band, immediate failures or accelerated aging of ICs may occur, depending on the individual failure thresholds of the given ICs and the electrical stress they experience. AMR separate the area of safe handling and application from EOS. The uncertainty introduced by unknown electrical stress and unknown application conditions, as well as the impact of accelerated aging on the useful life of an IC, have to be taken into account when AMR are specified. The guard band between the AMR and the tFT accounts for these factors. Note that in some cases noticeable accelerated aging may already occur when the recommended operating conditions (ROC) are exceeded for more than just an instant. Such cases are usually indicated in the datasheets of the corresponding ICs. For such ICs, the total time during which the recommended operating conditions are exceeded must be kept short.

tFT are not constant but depend on the given electrical stress type. For example, the power-to-failure of an integrated diode for constant electrical stress is much lower than the power-to-failure of the same diode for a 1 ns electrical stress pulse [2,9,10]. Therefore, AMR cannot be constant either and have to be adjusted to the electrical stress that an IC is supposed to withstand. Hence, in the EOS definition given above, AMR are not restricted to constant voltages but comprise all constant and time-dependent physical limits that are specified by IC manufacturers in order to ensure serviceability and avoid reversible and irreversible, immediate and delayed failures of ICs. This understanding of AMR matches the AMR definition given in IEC 60134 [14] quite well. Following the EOS definition given above, EOS is a specification violation that results in IC failures. In this context, the term “EOS robust” is, of course, a contradiction in terms, for no product can be expected to withstand a violation of its own specification.

Fig. 3. Cause–effect diagram with typical causes of EOS to ICs (cf. [3]).

In order to develop effective measures to deal with EOS, it is necessary to understand its typical root causes rather than to concentrate on its effects. Some of these root causes are shown in the cause–effect diagram of Fig. 3. According to this diagram, EOS causes can generally be divided into ESD, EMI and misapplication. The category ESD includes causes of both charging and uncontrolled discharges. The category EMI is divided into slow voltage transients (e.g. uncontrolled power supplies), surge currents, fast voltage transients, noise, and EMI due to RF and electromagnetic pulses (EMP). Finally, the category misapplication is divided into causes due to assembly, testing, (application) system design, misinsertion and specification violations. Note that, although [15] suggests that latch-up (LU) can cause EOS failures, LU is not included in
the EOS cause–effect diagram, because LU is not a cause but a failure mechanism. Quite the contrary: given the EOS definition of this paper, EOS, i.e. electrical stress that exceeds the corresponding AMR, may trigger LU in an IC. Given the number of typical causes per EOS type, Fig. 3 agrees well with a recent report that attributes only 6% of customer complaints to ESD but 94% to other EOS [16]. This clearly shows the need to concentrate on EOS rather than on ESD.

4. 5-Steps EOS methodology

In order to minimize IC failures due to EOS, the 5-steps EOS methodology shown in Fig. 4 is proposed [3]. It comprises the following steps:

(S1) EOS Minimization covers the minimization of EOS sources in the design, handling and application wherever possible.
(S2) EOS Risk Assessment denotes the risk assessment of possible and potentially harmful electrical stress in the design, handling and application of ICs.
(S3) Electrical Stress Specification addresses the specification of the electrical characteristics of inevitable and potentially harmful electrical stress.
(S4) EOS Mitigation describes the mitigation of the impact of inevitable electrical stress.
(S5) EOS Root Cause Analysis covers the failure analysis and root cause identification of fails due to unforeseen EOS.

4.1. EOS minimization (S1)

Every EOS that does not occur does not need to be risk assessed, specified, mitigated and root cause analyzed. Hence, minimization of EOS sources is the most powerful measure to avoid fails due to EOS [3]. Minimization can be achieved by paying attention to the causes of EOS shown in Fig. 3.

EOS related to handling can be minimized, e.g., by ESD control, hot plug control and optimized working procedures. EOS related to applications can be minimized, e.g., by hot plug control, connector design and AMR compliance. EOS related to cross-coupling can be minimized, e.g., by noise and EMI control. Standards can help to minimize EOS sources that are related to routine tasks and procedures, such as the handling of ICs, the design of connectors and the control of noise and EMI.

4.2. EOS risk assessment (S2)

The purpose of risk assessments is to discover and anticipate risks, to identify their root causes, to quantify their impact and to define precautions in order to minimize unacceptable risks [3]. In this way, risk assessments also help to avoid over- or under-engineering of systems. To consider all relevant risks, both intuitive and systematic risk assessments should be carried out.

Intuitive methods (e.g. brainstorming) should be applied first to avoid biasing other people’s minds. For the same reason, individual intuitive methods should be applied prior to joint intuitive methods. Afterwards, systematic methods should be applied.

Systematic methods can be divided into failure-driven risk assessments and risk assessments of peculiarities. Failure-driven risk assessments focus on possible failures. “Bottom-up” methods (e.g. failure mode and effects analysis) start with possible failures at component level and assess their impact at system level and application level. In contrast, “top-down” methods (e.g. fault tree analysis) start with failures at application level and trace back all failures at system level and component level that may cause the given failures at application level. In order to cover as many risks as possible, it is recommended to use at least one “bottom-up” method and one “top-down” method [19]. Special attention should also be paid to peculiarities because they often introduce special risks [3]. Special requirements or features should be taken into account (e.g. those related to functionality, reliability, time-to-market or cost). Risks are also introduced by changes (e.g. of human resources, materials, supplies, tools, machines, techniques and procedures) or by special environmental conditions (e.g. especially low/high temperature, humidity, pressure, radiation, noise). Finally, possible and (assumed) “impossible” user actions (misuse) should also be considered [19].

For any unacceptable risk that is identified in the course of the EOS risk assessment, it has to be evaluated whether the given risk can be addressed by EOS minimization (S1) or whether it needs to be addressed by EOS mitigation (S4). Given the power of EOS minimization explained in the previous section, it is recommended to address as many EOS risks as possible by means of EOS minimization (S1). Only those EOS risks that cannot be minimized should be addressed by EOS mitigation (S4). Note that EOS risks can, of course, generally be minimized by best-practice design techniques (e.g. EMC design techniques) as well as fail-safe design techniques (e.g. provision of redundant systems) [19]. Specific EOS risks, however, need to be carefully specified and addressed in order to be effectively mitigated [3].

4.3. Electrical stress specification (S3)

Due to the non-linear failure characteristics of ICs, different parts of an IC may be damaged depending on the time-dependent waveform of an electrical stress [2]. Hence, a necessary prerequisite for the mitigation of the impact of inevitable electrical stress is a complete and detailed specification of its electrical properties in terms of time-dependent current and voltage waveforms and repetition rates [3]. Only with such a specification are system and component designers able to develop effective mitigation strategies. Measurements and computer simulations can help to characterize and specify inevitable electrical stress [20].
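The requirement above, a specification in terms of time-dependent waveforms and repetition rates rather than a single stress level, can be sketched as a small data structure. The following Python sketch is illustrative only: the double-exponential pulse shape (a common idealization for transient waveforms), the hypothetical class names `DoubleExpPulse` and `StressSpec`, and all numeric values are assumptions introduced here, not taken from the paper or from any standard.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DoubleExpPulse:
    """Assumed pulse shape: i(t) = i0 * (exp(-t/tau_fall) - exp(-t/tau_rise)) for t >= 0."""
    i0_a: float        # amplitude scale factor in A (not the peak current)
    tau_rise_s: float  # rise time constant in s
    tau_fall_s: float  # fall time constant in s

    def __post_init__(self) -> None:
        if not 0.0 < self.tau_rise_s < self.tau_fall_s:
            raise ValueError("expected 0 < tau_rise_s < tau_fall_s")

    def current(self, t_s: float) -> float:
        """Instantaneous current in A at time t_s (zero before the event starts)."""
        if t_s < 0.0:
            return 0.0
        return self.i0_a * (math.exp(-t_s / self.tau_fall_s) - math.exp(-t_s / self.tau_rise_s))

    def peak_time(self) -> float:
        """Time of maximum current, obtained by setting di/dt = 0."""
        tr, tf = self.tau_rise_s, self.tau_fall_s
        return math.log(tf / tr) * tr * tf / (tf - tr)

    def peak_current(self) -> float:
        return self.current(self.peak_time())

    def charge(self) -> float:
        """Charge of one pulse in C: integral of i(t) from 0 to infinity."""
        return self.i0_a * (self.tau_fall_s - self.tau_rise_s)


@dataclass(frozen=True)
class StressSpec:
    """Hypothetical record for one inevitable electrical stress (cf. step S3)."""
    name: str
    pulse: DoubleExpPulse
    repetition_rate_hz: float  # 0.0 means a single event
    exposure_s: float          # how long a repetitive stress persists

    def pulse_count(self) -> int:
        if self.repetition_rate_hz == 0.0:
            return 1
        return int(self.repetition_rate_hz * self.exposure_s)

    def total_charge(self) -> float:
        """Charge delivered by all pulses of this stress, in C."""
        return self.pulse_count() * self.pulse.charge()


if __name__ == "__main__":
    # Illustrative numbers only, not measured data.
    pulse = DoubleExpPulse(i0_a=1.0, tau_rise_s=1e-9, tau_fall_s=150e-9)
    single = StressSpec("single transient", pulse, 0.0, 0.0)
    repetitive = StressSpec("repetitive transient", pulse, 10e3, 1.0)
    for spec in (single, repetitive):
        print(f"{spec.name}: peak {spec.pulse.peak_current():.3f} A, "
              f"{spec.pulse_count()} pulse(s), total charge {spec.total_charge():.3e} C")
```

Both stresses share the same pulse and therefore the same peak current. The repetitive one, at 10 kHz for 1 s, delivers 10,000 times the charge of the single event. A single stress level cannot convey this difference.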
Note that the specification of electrical stress requires a fundamental understanding of the given system: its manufacturing conditions, its components, its application and its environmental conditions [3]. Often, such a specification cannot be governed by a standard. Also, a comparative specification, which only refers to a related system or component and is often made for lack of time, is not sufficient, because it bears the risk of ignoring serious threats on the one hand and of driving unreasonable and ineffective over-engineering on the other hand.

Fig. 4. 5-steps EOS methodology.
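Sections 4.2 and 4.3 imply a simple rule for each unacceptable risk found in the risk assessment (S2). Minimize it where possible (S1). Otherwise, specify the inevitable stress (S3) before mitigating it (S4). The Python sketch below is a hypothetical illustration of that routing. The `Step` enum, the `Risk` fields, the function names and the example risks are assumptions made here, not part of the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Step(Enum):
    """The five steps of the EOS methodology (S1 to S5)."""
    S1 = "EOS minimization"
    S2 = "EOS risk assessment"
    S3 = "Electrical stress specification"
    S4 = "EOS mitigation"
    S5 = "EOS root cause analysis"


@dataclass(frozen=True)
class Risk:
    """Hypothetical outcome of the risk assessment (S2) for one electrical stress."""
    description: str
    unacceptable: bool
    source_can_be_minimized: bool


def follow_up_steps(risk: Risk) -> List[Step]:
    """Route one assessed risk to the steps that address it."""
    if not risk.unacceptable:
        return []                 # no precaution required
    if risk.source_can_be_minimized:
        return [Step.S1]          # preferred: remove the EOS source itself
    # Inevitable stress: a detailed waveform specification must precede mitigation.
    return [Step.S3, Step.S4]


def plan(risks: List[Risk]) -> Dict[str, List[Step]]:
    """Map each risk description to its follow-up steps."""
    return {r.description: follow_up_steps(r) for r in risks}


if __name__ == "__main__":
    # Hypothetical example risks.
    risks = [
        Risk("hot plugging during assembly", unacceptable=True, source_can_be_minimized=True),
        Risk("inductive transients on a motor driver pin", unacceptable=True,
             source_can_be_minimized=False),
        Risk("low-level noise on an unused pin", unacceptable=False, source_can_be_minimized=True),
    ]
    for description, steps in plan(risks).items():
        print(description, "->", [s.value for s in steps] or "no action")
```

S5 is deliberately absent from the routing. Root cause analysis only applies once a fail due to unforeseen EOS has actually occurred.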
K.T. Kaschani, R. Gärtner / Microelectronics Reliability 54 (2014) 2410–2416 2415

4.4. EOS mitigation (S4) than a careful root cause analysis and correction process. Need-
less to say, problems that are delayed by such ‘‘short cuts’’ do
Typically, EOS involves both the application system and its not vanish but get bigger.
components. Taking into account the non-linearity of ICs and of Example: During the ramp-up of a new IC an increased number
other components, the resulting system affected by EOS can of field returns was observed by OEM 1. The failure analysis of the
become very complex [3]. IC manufacturer revealed a large carbonized area (burn mark)
As already mentioned, EOS can generally be mitigated by best above one of the power transistors that are used to drive a motor
practice design techniques as well as fail-safe design techniques and a smaller carbonized area above the ESD protection element
[19]. However, this is often not sufficient, if specific types of elec- of the corresponding pin. The system manufacturer and the IC
trical stress represent a special risk to the given system. In such manufacturer assumed a weakness of the wafer process to be the
a case, it is beneficial to decouple as many components as possible root cause of the field returns. To verify this assumption, it was
in terms of the given electrical stress from the remaining system. decided to analyze the final test data-logs and the process-control
This can be achieved e.g. in case of ESD by shielding and in case monitoring data of the affected lots and to run an ESD re-qualifica-
of EMI by low pass filters. Such decoupling measures should be tion with increased sample size on all affected lots. The analysis of
introduced as close as possible to the given electrical stress source the final test data-logs, the process-control monitoring data and
in order to be most effective. If a component is effectively decou- the ESD re-qualification did not give any results that could support
pled from the remaining system, an electrical stress created or the assumption of a weakness of the wafer process. A possible
fed into the system will not reach this component. Hence, compo- weakness of the PCB was a priori ruled out by the system manufac-
nents that are decoupled from the system can be designed inde- turer. For lack of time, it was finally decided to introduce a leakage
pendent of system-level electrical stress. As a result, electrical current screening test into the final test of the IC in order to screen
stress decoupling maximizes the freedom of system designers to for weak devices as a containment action. At the same time, OEM 2
design components into their systems. The electrical stress protec- and OEM 3 completed the qualification of products based on sim-
tion of the remaining system and its components has to be co- ilar PCBs of the same system manufacturer. Half a year later, the
designed, which generally will lead to a more special and hence number of field returns increased strongly. After three time-con-
less flexible design of both the component and the system. As a suming troubleshooting processes with three different OEMs, it
result, co-designing will limit the freedom of system designers to finally turned out that the field returns were caused by an insuffi-
apply co-designed components to different systems. However, cient free-wheeling diode that was used on the PCBs of the system
electrical stress decoupling and co-designing do not rule out each manufacturer as a replacement of a costly Schottky diode. This
other. While parts of a system can be decoupled, other parts can free-wheeling diode caused repetitive inductive voltage transients
be co-designed. that were driving the ESD protection element connected to the
Co-designing of IC pins implies that the interfacing IC cir- same IC pin repeatedly into breakdown until it finally failed. This
cuitry has to be designed to partly or completely withstand a failure caused a short-circuit condition for one of the power tran-
given electrical stress. This imposes additional requirements on the design of such circuitry, which in turn may impair the functionality of the given IC pin. E.g. co-designing a high-speed input pin for SL-ESD would add a significant capacitive load and/or series resistance to this pin, which in turn could distort the input signal. Co-designing a supply pin in the same way could increase its leakage current so much that it may become critical for battery applications. As a result, decoupling is the preferred mitigation strategy for IC pins that may be functionally impaired by co-designing. These are typically supply pins, high-speed or RF input pins, and low-impedance, high-speed, RF or high-voltage output pins. In addition, pins that are connected to local PCB networks in order to support basic IC functions are naturally decoupled and usually do not need to be co-designed. Such basic IC functions are supported, e.g., by decoupling capacitors and charge-pump capacitors for internal voltage supplies, grounded resistors, passive feedback networks and crystals for internal oscillators. As a general rule, pins connected to complex IC circuitry and pins whose function is likely to be impaired by co-designing should be decoupled.

4.5. EOS root cause analysis (S5)

Even if all of the aforementioned steps (S1) to (S4) are meticulously followed, there may be fails due to unforeseen EOS. Due to the complexity of the entire system and the manufacturing process, it is of utmost importance to first identify the root cause of each fail before the most effective corrective action on manufacturing level, system level or component level can be determined in a second step [3]. Unfortunately, this simple truth is often ignored for “lack of time”. Instead, “shortcuts” are often taken to fix the unexpected fail without loss of time, which eventually take much more time to fix the problem […] sistors, which was damaged because it was turned off too late by the overtemperature detection circuit of the IC.

An effective root cause analysis requires [3]:

(1) Detailed specification of the failure conditions (When was the damaged part last known to be fully functional? When was it first known to fail? Which processes has it experienced in between?)
(2) Localization of the damaged structure (Which system, component and structure is damaged?)
(3) Identification of the failure signature (How is the failure mechanically, optically and electrically characterized?)
(4) Narrowing down of possible failure mechanisms (Which failure mechanisms can explain the given failure conditions, location and signature? This should be done by engineers who are familiar with the damaged structure and its failure mechanisms.)
(5) Identification of likely root causes that meet all of the aforementioned criteria (How likely is each of the experienced processes to have caused the given failure? Which processes are most likely to have caused the given damage? Computer simulations may help to narrow down the likely root cause because, in contrast to measurements, they do not affect the signal integrity [21].)

Given today’s tight time-to-market requirements, troubleshooting has to be both efficient and effective [3]. To meet these requirements, it needs to be supported by templates that guide engineers quickly through the troubleshooting process. Co-troubleshooting by system and component designers may also be needed to reach this goal. Finally, given the likely root cause, steps (S1) to (S4) of the 5-step EOS methodology should be repeated in order to determine effective corrective actions.
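The text above calls for templates that guide engineers through the five steps of a root cause analysis. The following is a minimal Python sketch of such a template. It assumes that each step is documented as a free-text answer and that engineers assign each candidate process a probability estimate. The names `RCA_STEPS`, `RcaRecord`, `Candidate` and `likely_root_causes` are hypothetical and are not taken from [3]. The sketch only mirrors the structure of steps (1) to (5).

```python
from dataclasses import dataclass, field

# Hypothetical troubleshooting template: one entry per step (1)-(5) above.
RCA_STEPS = (
    "failure conditions",   # (1) last known good, first known bad, processes in between
    "damaged structure",    # (2) damaged system, component and structure
    "failure signature",    # (3) mechanical, optical and electrical characterization
    "failure mechanisms",   # (4) mechanisms that explain (1)-(3)
    "likely root causes",   # (5) processes ranked by probability
)


@dataclass
class RcaRecord:
    """Documented answers of one troubleshooting case, keyed by step name."""
    answers: dict = field(default_factory=dict)

    def missing_steps(self):
        """Return the template steps that are still unanswered, in template order."""
        return [step for step in RCA_STEPS if not self.answers.get(step)]


@dataclass
class Candidate:
    """A process the damaged part experienced between last-good and first-fail."""
    process: str
    explains_conditions: bool
    explains_location: bool
    explains_signature: bool
    explains_mechanism: bool
    probability: float  # engineering estimate in [0, 1] (assumption, not a measured value)


def likely_root_causes(candidates):
    """Step (5): keep candidates that meet all criteria, most probable first."""
    consistent = [
        c for c in candidates
        if c.explains_conditions and c.explains_location
        and c.explains_signature and c.explains_mechanism
    ]
    return sorted(consistent, key=lambda c: c.probability, reverse=True)


# Usage with placeholder data; processes "A", "B" and "C" are hypothetical.
case = RcaRecord({"failure conditions": "documented", "damaged structure": "documented"})
print(case.missing_steps())
# ['failure signature', 'failure mechanisms', 'likely root causes']

ranked = likely_root_causes([
    Candidate("A", True, True, True, True, 0.2),
    Candidate("B", True, True, True, True, 0.7),
    Candidate("C", True, False, True, True, 0.9),  # does not explain the location
])
print([c.process for c in ranked])
# ['B', 'A']
```

Keeping each consistency check as an explicit field shows which criterion eliminated a candidate. The probabilities themselves still have to come from engineering judgment or, as step (5) suggests, from computer simulations.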
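When steps (S1) to (S4) are repeated after a root cause analysis, the general rule given above for choosing between decoupling and co-designing an IC pin can be written down explicitly. The Python sketch below encodes the pin categories named in the text. Several details are assumptions made for illustration, not a procedure stated in the paper: the pin attributes, the `Strategy` values, and the choice of co-designing as the default for all remaining pins.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    DECOUPLE = "decouple"                        # keep the electrical stress away from the pin
    CO_DESIGN = "co-design"                      # make the pin circuitry withstand the stress
    NATURALLY_DECOUPLED = "naturally decoupled"  # pin only sees a local PCB support network


@dataclass(frozen=True)
class Pin:
    """Hypothetical pin description; attribute names are illustrative."""
    name: str
    kind: str                        # "supply", "input", "output" or "support"
    high_speed: bool = False
    rf: bool = False
    low_impedance: bool = False      # evaluated for outputs only
    high_voltage: bool = False       # evaluated for outputs only
    complex_circuitry: bool = False  # pin connected to complex IC circuitry


def impaired_by_codesign(pin: Pin) -> bool:
    """Pin types the text lists as likely to be functionally impaired by co-designing."""
    if pin.kind == "supply":
        return True  # added leakage may be critical for battery applications
    if pin.kind == "input":
        return pin.high_speed or pin.rf  # added capacitance/series resistance distorts signals
    if pin.kind == "output":
        return pin.low_impedance or pin.high_speed or pin.rf or pin.high_voltage
    return False


def mitigation(pin: Pin) -> Strategy:
    """Decouple complex or sensitive pins; co-design the rest (assumed default)."""
    if pin.kind == "support":
        # e.g. decoupling or charge-pump capacitors, feedback networks, crystals
        return Strategy.NATURALLY_DECOUPLED
    if pin.complex_circuitry or impaired_by_codesign(pin):
        return Strategy.DECOUPLE
    return Strategy.CO_DESIGN


for p in (Pin("VBAT", "supply"), Pin("RX", "input", high_speed=True),
          Pin("GPIO", "input"), Pin("XTAL", "support")):
    print(p.name, mitigation(p).value)
# VBAT decouple
# RX decouple
# GPIO co-design
# XTAL naturally decoupled
```

The default branch is where a real design flow would substitute its own criteria. Making the pin attributes explicit also lets system and component designers review the decision together.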

5. Summary and conclusions

Common misconceptions concerning ICs and EOS that facilitate IC failures have been summarized, analyzed and clarified. It was pointed out that ICs generally have non-linear failure characteristics, which strongly affect the design of any electrical stress protection, both on component level and on system level.

In order to develop effective measures to deal with EOS, typical causes of EOS have been presented. Based on a recent investigation of ESD and EOS failures, the need to concentrate on EOS rather than on ESD was explained.

In order to avoid EOS fails right from the beginning of the design process, a 5-step methodology was introduced to deal with EOS in the design, handling and application of ICs. Based on the typical causes of EOS, this methodology addresses (S1) the minimization of EOS sources, (S2) the risk assessment of possible EOS, (S3) the specification of inevitable electrical stress, (S4) the mitigation of EOS and (S5) the root cause analysis of EOS failures. Special attention was paid to EOS minimization strategies, EOS risk assessments, the concepts of electrical stress decoupling and co-designing, typical mitigation strategies, and measures for efficient and effective troubleshooting.

Standards, which are often called for, can only help to solve frequent problems related to regular electrical stress. However, they are not suitable, and may even be detrimental, for problems that are rare or related to irregular electrical stress. Needless to say, standards cannot improve the robustness of ICs against unknown electrical stresses, nor can they protect ICs from specification violations. Therefore, both system designers and component designers are asked to pay attention to the special characteristics of ICs and EOS and to collaborate in order to solve EOS problems in the design, handling and application of ICs.

Acknowledgments

The authors would like to thank the members of the ESD Forum e.V., Charvaka Duvvury and Hans Kunz from Texas Instruments, Michael Mayerhofer from Infineon Technologies, Christoph Thienel from Bosch and Jean-Luc Lefebvre from Presto Engineering for many inspiring and fruitful discussions on EOS.

References

[1] Thijs S et al. SCCF – System to Component Level Correlation Factor. In: EOS/ESD Symp., 2010.
[2] Kaschani KT, Gärtner R. The system theoretical significance of ESD protection integrated with ICs and its practical implications. In: 11. ESD-Forum, 2009.
[3] Kaschani KT, Gärtner R. The impact of electrical overstress on the design, handling and application of integrated circuits. In: EOS/ESD Symp., 2011.
[4] JEDEC. Recommended ESD target levels for HBM/MM qualification, JEP155, August 2008.
[5] Kaschani KT et al. On the significance of the machine model. In: 10. ESD-Forum.
[6] Kaschani KT, Gärtner R. A methodology to deal with electrical overstress in the design, handling and application of integrated circuits. In: 12. ESD-Forum, 2011.
[7] Snowden CM. Semiconductor device modelling. Berlin: Springer; 1989.
[8] Johnsson D et al. Avalanche breakdown delay in ESD protection diodes. IEEE Trans Electron Dev 2010;57:2470–6.
[9] Amerasekera A, Duvvury C. ESD in silicon integrated circuits. Chichester: John Wiley & Sons; 2002.
[10] Voldman SH. ESD physics and devices. Chichester: John Wiley & Sons; 2004.
[11] Mergens M. On-chip ESD protection in integrated circuits: device physics, modeling, circuit simulation. Konstanz: Hartung-Gorre; 2001. PhD Thesis.
[12] JEDEC Solid State Technology Association. Failure mechanisms and models for semiconductor devices, JEP122E, March 2009.
[13] Stadler W et al. From the ESD robustness of products to the system ESD robustness. In: EOS/ESD Symp., 2004.
[14] IEC International Electrotechnical Commission. Rating systems for electronic tubes and valves and analogous semiconductor devices, IEC 60134 ed1.0.
[15] JEDEC Solid State Technology Association. IC latch-up test, JESD78D, 2011.
[16] Thienel C. Electrical overstress of automotive semiconductors. In: Euroforum, 2010.
[17] Giraldo S et al. Impact of the power supply on the ESD system level robustness. In: EOS/ESD Symp., 2010.
[18] Verhaege K et al. Analysis of HBM ESD testers and specifications using a 4th order lumped element model. In: EOS/ESD Symp., 1993.
[19] Armstrong K. Including EMC in risk assessments. In: IEEE Intl. Symp. EMC.
[20] Reinvuo T, Tamminen P. Measurements and simulation in product specific risk analysis. In: EOS/ESD Symp., 2011.
[21] Kim KH et al. Systematic design technique for improvements of mobile phone’s immunity to electrostatic discharge soft failures. In: IEEE Intl. Symp. EMC, 2010.