Common Cause
Failure Analysis
in Probabilistic Safety
Assessment
Proceedings of the ISPRA Course held at the
Joint Research Centre, Ispra, Italy,
16-19 November 1987
Edited by
Aniello Amendola
A series devoted to the publication of courses and educational seminars given at the
Joint Research Centre, Ispra Establishment, as part of its education and training program.
Published for the Commission of the European Communities,
Directorate-General Telecommunications, Information Industries and Innovation,
Scientific and Technical Communications Service.
The publisher will accept continuation orders for this series, which may be cancelled at any time and
which provide for automatic billing and shipping of each title in the series upon publication.
Please write for details.
ADVANCED SEMINAR ON
COMMON CAUSE
FAILURE ANALYSIS
IN PROBABILISTIC
SAFETY ASSESSMENT
Proceedings of the ISPRA Course held at the Joint Research Centre,
Ispra, Italy, 16-19 November 1987
Edited by
ANIELLO AMENDOLA
Commission of the European Communities,
Joint Research Centre, Ispra Establishment, Ispra, Italy
ISBN 0792302680
Commission of the European Communities, Joint Research Centre, Ispra (Varese), Italy
Publication arrangements by
Commission of the European Communities,
Directorate-General Telecommunications, Information Industries and Innovation, Scientific and
Technical Communications Service, Luxembourg
EUR 11760
© 1989 ECSC, EEC, EAEC, Brussels and Luxembourg
LEGAL NOTICE
Neither the Commission of the European Communities nor any person acting on behalf of the
Commission is responsible for the use which might be made of the following information.
Foreword vii
(S. Hirschberg) 9
The Use of Abnormal Event Data from Nuclear Power Reactors for
Dependent Failure Analysis (H. W. Kalfsbeek) 289
Index 343
FOREWORD
A. Amendola
CEC-JRC
Institute for Systems Engineering
Systems Engineering and Reliability Division
21020 Ispra (Va) - Italy
A. Amendola (ed.),
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 1-7.
© 1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
(as such dependency structures are usually labelled) have
encouraged several proposals for analysis procedures and mathematical
models, as well as significant international collaborative projects.
As early as November 1975, a Task Force on Problems of Rare Events
in the Reliability Analysis of Nuclear Power Plants was set up on
the basis of a recommendation of CSNI (the Committee on the Safety of
Nuclear Installations of the Nuclear Energy Agency of the OECD). A
subsequent CSNI Research Group placed emphasis on protective systems
for nuclear reactors and elaborated a classification system (1) for
common mode failures (terminology problems are discussed elsewhere in
the book (2)) especially directed towards defences against CCFs.
Further insights from the CSNI project and from UKAEA-SRS
researchers in the field came from the Watson Review in 1981 (3).
In the meantime a number of probabilistic models were proposed:
Fleming's model based on the ratio of CCF to total failure rate (4),
the shock model by Apostolakis (5), the common load by Mankamo (6),
the multivariate model by Vesely (7) and the binomial failure rate
model by Atwood (8), which were followed by other models described
elsewhere in the book.
Together with these statistical parameter models for predicting CCF
rates, several procedures have been proposed to identify CCF events
and to include them in the system logical diagrams (see a
non-exhaustive but relevant literature list in Refs. (9-19)). Some
data on diesel generators (20) and pumps (21) have also been estimated
and published since then.
Despite this significant theoretical effort, after WASH-1400,
which used a boundary model for including CCFs, the CCF problem was for
many years either ignored in practice (the first NPP PSAs did not
include CCFs) or poorly approached. This was indeed evidenced
Europe by the first Systems Reliability Benchmark Exercise (S-RBE)
(22) and in the USA by the identified need to agree on a consistent
classification system (23) as a basis for establishing an adequate
data base of CCF occurrences. Furthermore, problems connected with
sound statistical estimation procedures were identified and originated
discussions which continued even in recent papers (24-29).
The S-RBE project was aimed at assessing the complete procedure
of a reliability evaluation of a complex system by starting from the
basic documentation and familiarization with the reference system.
This was the EDF Auxiliary Feedwater System of the Paluel Unit. It
consisted of two redundant trains, each with an internal, diverse
redundancy (motor-driven and turbine-driven pumps). Therefore,
it presented interesting challenges to the expert teams involved in a
CCF analysis.
Participation in the exercise included representatives of major
partners involved in NPP safety assessment in Europe, i.e.
authorities, vendors, utilities and research institutes from EEC
member countries and Sweden.
S-RBE included both a structured qualitative analysis and
reliability modelling and evaluation; furthermore, it was subdivided
into several phases in order to separate the effects of the different
contributors upon the overall spread of the results. During all
working phases and subsequent discussions, the discrepancy among the
approaches to the CCF problem could not be overcome within the
programmed S-RBE. Indeed, in this respect, it was not a question of
different data or different construction of the fault tree; rather, the
way in which the problem was dealt with differed completely, and
appeared to be a natural consequence of different philosophies.
Only a few teams attempted to quantify CCF occurrences in the
fault tree; other teams either analysed CCFs in a qualitative manner,
indicating the reasons why such events could be excluded or
neglected because adequate defences had been introduced into the
design, or performed sensitivity analyses of the overall results with
respect to selected CCF occurrences. Furthermore, discrepancies
appeared even in the terminology and basic differentiation among the different
dependency structures implied by the CCF designation (see the analysis
performed by A. Games on the CCF aspects of the S-RBE in Part II of
Ref.22).
Subsequently an ad-hoc project was launched (1984-1986): the
"Common Cause Failure Reliability Benchmark Exercise" (CCF-RBE), on which
several papers in the book are based, which allowed the participants
to clarify most of the discrepancies in the approaches and to
elaborate a basic consensus on the most appropriate procedures to be
followed to include dependency analysis in PSA. Both S-RBE and CCF-
RBE were part of the JRC programme on NPP risk and reliability
evaluation, coordinated by G. Mancini.
The success of the CCF-RBE was also enhanced by the fact that
this time even USA teams, previously involved in the already mentioned
classification project (23) and in elaborating formal analysis
procedures (30,31), participated in the project. They brought,
therefore, not only their particular experience but also the lesson
learnt from a project on data benchmarking in which many laboratories
in the USA and Europe were involved (23). This project was sponsored
by EPRI and managed by D.H. Worledge; it started in 1982 and had as
objectives a better understanding of CCF and the development of
available experimental data into a data base to support PSA analysis
as well as the planning of CCF defensive strategies. In phase-I of the
project a preliminary formulation and test of a classification system
was performed. A second test was performed in phase-II. And as a
result a classification scheme was elaborated (32,2) which was the
basis for the analysis of NPP experience involving dependent events
performed by PL&G for EPRI (33). In this way the CCF-RBE also became a
significant test for the guidelines in preparation in the USA and gave
very significant insights and feedback to them (34).
A further very significant project on a "Common Cause Failure
Data Benchmark Exercise" was performed in the framework of the "Nordic
Cooperation in Nuclear Safety" which involved industries, research
organizations and authorities of the Scandinavian countries (35).
The seminar, of which this book presents the proceedings, was
promoted by the wish to compare the findings and to confront the
approaches followed by the participants in the above-mentioned
benchmark projects, as well as to update them in the light of new
research originating from issues identified during the CCF-RBE.
Therefore, the book represents a state-of-the-art review on the
different aspects linked with the problem of identifying, modelling
and predicting probabilities of dependent failures as well as of
adequately defending a system design against their occurrence. It
collects the experiences of most of the teams having participated in
these international projects and is enriched by the analysis of the
operating experience from European NPPs.
In addition to data drawn from incident reports, the use of
component event data is another way to enrich available data bases on
dependent failures. Investigations on this topic have started at the
JRC Ispra (36,37) and have been finalized at NCSR, as described in the
book (38).
In addition to the results presented in this book, the
significant contributions given to the methodological results of the
multiple projects on CCF and, therefore, to the findings described in
the book by all other participants in these projects, must be
acknowledged.
Of course, no problem can be considered solved once and for
all, and further research will certainly increase the consistency of
PSA analyses. However, some issues, on which a large consensus has
been found, should be considered to be definitively clarified at least
as far as their theoretical and procedural foundations are concerned.
These can be summarized as follows (39):
- Dependency structures have a dominant impact on the reliability of
redundant systems and as such should be considered right from the
early design stage throughout the system life cycle. They must be
properly accounted for in any PSA.
- A systematic and structured qualitative analysis is necessary in
order to identify potential multiple failure mechanisms (in addition
to the relevant chapter in the book, see also Ref.(40)), to screen
and rank potential dependent events and to consistently link the
qualitative insights about system design and operation with the
assignment of probabilities.
- Both explicit modelling and implicit (parametric) models are needed
since each technique has its rather well defined application domain.
Dependent events related to a clear cause-consequence structure
should, in principle, be modelled explicitly in the fault trees or
event trees, whereas the residual set of potential dependency
structures, which cannot be represented in the logical model (either
because of too onerous a modelling or because of a less definite
cause-consequence structure), can be captured by parametric models.
- Use of generic CCF parameters should be discouraged. Generic Beta
factors or similar data might be accepted only for screening
purposes when deciding which events should be included in the
probabilistic assessment.
- The parameters for implicit models should be estimated case by case,
starting from the widest possible event data base and screening the
events which potentially affect the design under investigation. This
analysis phase is the key contributor to analyst-to-analyst
variability and, therefore, to the assessment uncertainty, because
of the subjective judgement applied in the process of transferring
experience from other NPPs to the one under analysis.
- Event statistics should be preferred to component statistics, even
if the quantitative effects are less important than the spread
introduced by the judgement in the screening of events and in the
assessment of the impact of the events on the plant under analysis.
- The problem of mapping up and mapping down, i.e. extrapolation from
  lower-redundancy or higher-redundancy systems, respectively, to the
  system under analysis, identified by P. Dörre during the CCF-RBE,
  should be consistently approached.
- Updating of prediction in the design phase with operational evidence
should be recommended in a Bayesian perspective; and, finally
- Effort in enlarging the dependent event data base should be made by
analyzing both incident and component event files.
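The Bayesian updating point above can be illustrated with a minimal conjugate-prior sketch. All priors and event counts below are invented for illustration; they are not data from any of the cited studies.

```python
# Hedged sketch: Bayesian updating of a CCF beta-factor (the fraction
# of a component's failures that are common-cause) with operational
# evidence, using the conjugate Beta-binomial pair.

def update_beta_factor(a, b, ccf_events, independent_events):
    """Beta(a, b) prior on the beta-factor; binomial evidence of
    ccf_events CCFs among (ccf_events + independent_events) failures
    yields a Beta(a', b') posterior."""
    a_post = a + ccf_events
    b_post = b + independent_events
    mean = a_post / (a_post + b_post)
    return a_post, b_post, mean

# Design-phase prior: mean 0.1 (a generic screening value), weak weight.
a0, b0 = 1.0, 9.0
# Hypothetical operating evidence: 1 CCF event among 40 component failures.
a1, b1, posterior_mean = update_beta_factor(a0, b0, 1, 39)
print(round(posterior_mean, 3))  # posterior mean of the beta-factor
```

The operating evidence pulls the generic screening value of 0.1 down towards the observed CCF fraction, which is the direction of refinement the foreword recommends for the design phase.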
All these issues have been dealt with in the different chapters of the
book, which should therefore prove a comprehensive guide for the
reliability analyst, not only for including dependency structures in a
PSA procedure but also for making him aware of the connected problems
and uncertainties.
REFERENCES
S. HIRSCHBERG
ABB ATOM AB
Office of Interdisciplinary Engineering
S-721 63 VÄSTERÅS
Sweden
1. Background
A. Amendola (ed.),
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 9-29.
© 1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
During the last five years a very rapid development has taken place
within this field in Scandinavia. In 1983 an international workshop on dependent
failure analysis (ref. 1), sponsored by the Swedish Nuclear Power Inspectorate,
was held in Västerås. The meeting was considered very successful. The
general feeling of the participants was that some order had been brought to the
somewhat chaotic state of dependency analysis. The recommendations of the
workshop (ref. 2) were followed in several of the Swedish PSAs.
The development work has been and still is carried out by the Swedish
utilities, by ABB ATOM and by the Nordic research institutes partly in the
current PSAs and partly within the NKA-projects SK-1 (ref. 3) and RAS-470
(ref. 4). Substantial effort has been directed towards improving the
CCF-quantification models (refs. 5, 6). This has led to the development of an approach
which consistently takes into account the high level of redundancy and diversity
typical of the safety systems in the latest generation of ABB ATOM's BWRs.
Progress has been made within the recently completed CCF-data Benchmark
Exercise (ref. 4) in a number of problem areas.
These findings will be described here based on the summary report of the
Benchmark Exercise (ref. 4) and on individual analyses performed by the
participating organisations. More detailed coverage of some of the specific
topics mentioned above may be found in the Nordic contributions to these
Proceedings, addressing the subject of CCF-identification, CCF-quantification
and CCF-evidence based on operating experience.
The importance of dependencies has been recognized in the Swedish PSAs,
where as a rule great emphasis has been put on their treatment. The analyses
performed contain a variety of models, data and assumptions. Thus, it is
possible that modelling aspects could explain some of the differences in the
results of different studies. In view of the substantial uncertainties associated
with the treatment of certain types of dependencies, a major effort has been
undertaken to systematically compare the analyses performed in the Swedish
PSAs. Here, the main results from the qualitative phase of this project will be
described, with main emphasis on the treatment of CCFs. Detailed account of
this work may be found in ref. 7. A corresponding analysis concerning human
interactions has also been performed (ref. 8).
The CCF-data Benchmark Exercise and the systematic comparison of
analyses of dependencies and human interactions were carried out within the
Nordic project "Risk Analysis" (NKA/RAS-470), initiated as a part of the
research program of the Nordic Liaison Committee for Atomic Energy (NKA).
The participants in this project include authorities, utilities, research
institutes and the Swedish vendor of nuclear power plants. The project
activities are planned to be completed in 1989.
2. Nordic Common Cause Failure Data Benchmark Exercise
The Benchmark Exercise has been carried out by four working groups:
Table 1 shows the project schedule of the Benchmark Exercise and the total
resources expended by the organisations involved.
Barsebäck 1 771001-821231
Barsebäck 2 790101-821231
Forsmark 1 810101-821231
Forsmark 2 810701-821231
Oskarshamn 1 740101-821231
Oskarshamn 2 760101-821231
Ringhals 1 761001-821231
TABLE 1
Project Schedule of the Benchmark Exercise.
CCF-identification 6.0
CCF-classification 1.5
CCF-quantification incl. uncertainty and sensitivity analysis 8.0
Final report 3.0
Project coordination 1.5
Meetings, administration 3.0
2. RESULTS
[Figure 1 legend (symbols garbled in the original scan) distinguishes estimates from the MGL, DA, ADDEP, BFR and MFR models as applied by the participating organisations.]
Figure 1. Estimated probabilities of observing exactly i (i = 1, 2, 3, 4) failures per demand and corresponding 90%
confidence intervals (the following index notation has been used: AA = ABB ATOM, R = RISØ, S = STUDSVIK, V = VTT).
a final opinion.
Due to the nature of the Swedish component failure reporting system
(which concerns single components), the use of classification systems is of limited
value; possibly, decisions could be facilitated in some doubtful cases. The
classification systems are cause-oriented, while the available failure reports
primarily provide information about failure modes. Among the
advantages of classification systems is that they provide a
systematic and standardised way of presenting and saving the results, as well
as a good framework for exchanging information and experiences between
analysts. Consequently, understanding of foreign CCF-events may be facilitated
by the use of a good classification scheme.
The quantitative analysis has shown that direct assessment of CCF-
contributions is possible, given comprehensive information including system
flow schemes for identification of redundancies, and the numbers of actuations
and failures of relevant components. Simple parametric methods (e.g. the Additive
Dependence model and the Multiple Greek Letter method) are still of major
interest; they may be directly combined with data on single failure probabilities
given in the Swedish Reliability Data Book (ref. 13), are easy to apply, are
suitable for checking the impact of modified assumptions, and represent in
practice the only option which may be applied to components which are not as
common at the plants as valves. On the other hand, the Binomial Failure Rate
model is complex and its use may result in arbitrariness. The Multinomial
Failure Rate method needs as input the same type of information as direct
assessment.
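As an illustration of how a simple parametric method such as the Multiple Greek Letter model distributes a component's failure probability over failure multiplicities, the following sketch applies the standard MGL formula to a four-train group. All numerical values are invented for illustration and are not taken from any of the cited PSAs.

```python
from math import comb

def mgl_qk(qt, rhos, m, k):
    """Multiple Greek Letter model: probability that one specific group
    of k out of m redundant components fails from a shared cause.
    qt   - total failure probability of a single component
    rhos - [1.0, beta, gamma, delta, ...]; rho_{m+1} is treated as 0,
           so the highest multiplicity absorbs the remaining fraction."""
    prod = 1.0
    for i in range(k):
        prod *= rhos[i]
    next_rho = rhos[k] if k < m else 0.0
    return prod * (1.0 - next_rho) * qt / comb(m - 1, k - 1)

# Illustrative numbers only: qt = 1e-3 per demand; beta, gamma, delta
# chosen in the same range as the values quoted from the Swedish PSAs.
qt, rhos = 1.0e-3, [1.0, 0.1, 0.4, 0.6]
q_single = mgl_qk(qt, rhos, m=4, k=1)  # independent failure of one train
q_all4 = mgl_qk(qt, rhos, m=4, k=4)    # all four trains fail together
```

The ease of swapping in alternative beta, gamma and delta values is what makes this class of model convenient for checking the impact of modified assumptions, as noted above.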
In most cases the identified differences in quantitative analyses are only to
a certain extent model-dependent. The handling of data (the CCF-definition in
general, the definition of potential CCFs, screening, the use of impact vectors and
weighting factors, etc.) is of decisive importance in this context. Consequently,
several of the factors which are essential in the process of CCF-identification
have also a major impact on quantification.
Hopefully, the results of the Nordic Benchmark Exercise will form a
valuable input to a future project aiming at generation of a Nordic CCF-data
book.
and Oskarshamn 3) and for one Westinghouse PWR plant (Ringhals 2).
Details of the comparative overview may be found in the main report
(ref. 7).
1) The issue of equipment-related Common Cause Initiators (CCIs) has been
addressed in all PSAs. However, only the Ringhals 1 and Forsmark 3 studies
contain a systematic and dedicated analysis of CCIs. Significant CCIs have
been identified in the Ringhals 1 and Oskarshamn 1 PSAs. The coverage of
the Ringhals 1 analysis is not totally clear.
2) Functional dependencies are handled in all studies by the small event
tree/large fault tree approach. Discrepancies which cannot be explained by
design differences exist, and are caused by different perceptions of the
design, different assumptions, or errors in the analysis. A detailed survey
of these differences has been generated within the SUPERASAR project
(ref. 15). Some functional deficiencies have been identified by the Ringhals
1 and Barsebäck 1 PSAs.
3) Shared-equipment dependencies are covered in all studies by fault trees
which in most cases are characterized by a high degree of detail (see
Table 2). Possible problems may originate from computerized Boolean
reductions and/or from manual reductions of these large logical models.
4) None of the Swedish PSAs contains a documented, systematic and comprehensive
search for physical interaction dependencies. However, a relatively
thorough survey has been made within the Barsebäck 1 PSA. Documentation
of a similar study within the Oskarshamn 1 PSA is not available.
The influence of the normal environment is covered by fault trees and residual
CCFs. The backflush operation in the context of LOCA is considered to be
TABLE 2
Rough Survey of Degree of Detail in Systems Modelling of Different PSAs.1)
Ringhals 1 H H H L H M
Barsebäck 1 H H M H H L
Forsmark 3 H H H H H M
Oskarshamn 3 H H H H H L
Oskarshamn 1 H H L H -2) L
1) The degree of detail is denoted as high (H), medium (M) or low (L).
2) A note given in the study states that fault trees for RPS exist, but have not been included.
important in the Ringhals 1 and Barsebäck 1 PSAs, but considered
unnecessary in the Oskarshamn 1 study. Only the Forsmark 3 PSA contains an
analysis of the dynamic effects which may follow upon a pipe break. Such
effects may be much more significant for some of the other plants,
particularly those which do not belong to the BWR 75 generation.
5) Human interaction dependencies are represented in the fault trees and
event trees, and also covered by residual CCFs. Generally, the Ringhals 1
PSA contains the most ambitious analysis of human interactions. None of
the studies addresses the problem of errors of commission. Systematic
misconfiguration of redundant components has been addressed in the
Ringhals 2 PSA for some motor-operated valves, in the Forsmark 3 PSA
sensitivity study, and qualitatively in the Ringhals 1 PSA.
6) Non-standard methods for identification and evaluation of dependencies
have been used in some of the studies. Examples include extended
signal analysis within the Forsmark 3 PSA and systematic walk-through
analyses within the Barsebäck 1 and Oskarshamn 1 PSAs. The limitations of
this type of approach are substantial.
7) Residual CCFs, which account for the dependencies not covered explicitly
by the event tree/fault tree model, have been used in all PSAs.
TABLE 3
Alternatives for Incorporation of CCF-contributions into the Fault Trees
[Table entries garbled in the original scan; one alternative reads "completely automatic calculations".]
1) Simplicity
2) Clear definition of parameters
3) Correctness (within specified limitations)
4) Generality
5) Compatibility with existing data sources
6) Assurance of realism
7) Possibility to consider design- and system-specific factors
8) Possibility to distinguish between different failure multiplicities.
The following quantitative methods have been used in the Swedish PSAs:
Table 4 summarizes advantages and disadvantages of different approaches to
assignment of CCF-data, as applied in the Swedish PSAs.
Beta-factors from the Forsmark 3 and Oskarshamn 3 PSAs and C-factors from
the Ringhals 1 and 2 PSAs are given in Table 5. Beta-factors in the Barsebäck 1
and Oskarshamn 1 PSAs are not component-type specific. In these studies three
and two different values of the beta-factor, respectively (0.01, 0.05, 0.10 and
0.05, 0.10), have been used. The choice of the beta-factor was in each particular
case dependent on the significance factors assessed in the qualitative analysis.
CCFs for failure to run have been neglected in the case of diesel generators at
Barsebäck 1, Oskarshamn 3 and Oskarshamn 1. These contributions can be
significant; in Ringhals 1 and in Forsmark 3 they constitute 54% and 42%,
respectively, of the corresponding CCFs for failure to start.
The higher-order parameters, gamma and delta, have been assigned values of
0.4 and 0.6, respectively, in the Forsmark 3 PSA, and 0.3 and 0.9, respectively,
in the Oskarshamn 1 and 3 PSAs. A factor of 0.5 has been used in the Ringhals 1
PSA when extending the C-factor model to 4 × 50%, 3 × 100% or 4 × 100% systems.
No such extensions have been used in the Ringhals 2 and Barsebäck 1 PSAs.
The numerical differences are very significant. Comparison of CCF estimates
for diesel generators shows that the quadruple CCF-contribution for diesel
generators in the Oskarshamn 3 PSA is 30 times lower than in Ringhals 1 and 14
times lower than in Forsmark 3 (a twin plant). Such large discrepancies will
naturally have a dramatic impact on accident sequences resulting from loss of
offsite power. These discrepancies are not motivated by actual differences in
the design and operation of the diesel generator systems.
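A quick check under the MGL parametrization (assuming, purely for comparison, the same placeholder beta-factor in both studies) shows that the quoted gamma and delta assignments by themselves shift the quadruple-CCF fraction only marginally, so order-of-magnitude discrepancies must stem mainly from the beta-factors and the underlying data handling:

```python
# Under MGL for a four-train group (with rho_5 = 0), the fraction of a
# component's failure probability that appears as a quadruple CCF is
# beta * gamma * delta. Beta below is a placeholder value, used only
# to compare the two quoted (gamma, delta) assignments.
beta = 0.1                       # illustrative, not from any cited PSA
forsmark_3 = beta * 0.4 * 0.6    # gamma = 0.4, delta = 0.6
oskarshamn = beta * 0.3 * 0.9    # gamma = 0.3, delta = 0.9
ratio = oskarshamn / forsmark_3
print(forsmark_3, oskarshamn, round(ratio, 3))
```

The ratio is about 1.13, i.e. far smaller than the factors of 14 to 30 observed between the studies, which is consistent with the earlier observation that the handling of data is of decisive importance.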
TABLE 4
Different Sources of CCF-Data in Swedish PSAs.
Extremely difficult to adjust data to Swedish conditions
Applicability limited to operating plants with moderate or poor separation
1) All three studies use US experience, but while data for Ringhals 2 have been
obtained after proper design- and application-oriented screening, data for
Ringhals 1 are plant-specific Ringhals 2 data and data for Oskarshamn 3 are
reduced Seabrook-specific data.
TABLE 5
Beta-factors of Forsmark 3 and Oskarshamn 3 Studies and C-factors of Ringhals 1 and 2 Studies
It is obvious from the comparison which has been carried out that differences
exist between the studies with respect to the degree of coverage. Since
the PSAs reflect the state of knowledge (some of them better than others), it is
not possible to decide if any one of them is complete in the absolute sense. The
answer would probably be negative, due to the fact that new findings are still
being made and the methodology for the treatment of some problem areas is not
yet well established. Apart from that, all the studies are limited in scope and
some of the potentially significant dependencies have been excluded from the
analyses. In order to assure reasonable completeness the review process
(internal and external) is extremely important. It should be remembered that
some of the studies compared in this report have not yet been subject to
external review.
The comparison of the degree of coverage of the studies nevertheless
provides, in a relative sense, a picture of the completeness of the PSAs. Specific
problems have been identified which are accounted for in some of the studies but
not in the others. These differences may be pointed out and should be the subject
of future studies. At the same time the PSAs exhibit many similarities in the
approaches to dependency analysis. This is not surprising, since by and large the
same main framework for the analysis (the small event tree/large fault tree
approach) has been used. In spite of the similarities, the parallel studies of
accident sequence and systems analysis modelling have disclosed specific
discrepancies with regard to functional and shared-equipment dependencies,
which do not necessarily originate from actual design differences, but are due
to different perceptions of the design, different assumptions or mistakes.
The main problem from the point of view of the comparative analyses, is the
varying standard of the documentation of the studies. This means that some of
the PSAs may be reasonably complete, but their credibility would be higher and
the review process would be facilitated given an improved documentation.
The overall picture is nevertheless positive. The identified dependencies
constitute major findings of several studies and would hardly have been
detected without performing the PSAs. This emphasises the fact that qualitative
analysis of dependencies is one of the strongest advantages of PSA methodology
and not a
weakness. It is important to stress that point since, due to a rather common
misconception, dependency analysis is sometimes viewed as a weakness in the
current state of PSA. On the other hand, quantification of common cause
failure contributions is a definite limitation.
The proposals which may be made on the basis of the qualitative analysis are
preliminary. A more definite picture will probably emerge after performing the
sensitivity studies. Such studies have been initiated.
ACKNOWLEDGEMENT
This work has been carried out within the Nordic project NKA/RAS-470. The
support by the Swedish Nuclear Power Inspectorate is gratefully acknowledged.
References
1. Hirschberg, S. (ed.), 'Workshop on Dependent Failure Analysis', Västerås,
   Sweden, April 27-28, 1983. Swedish Nuclear Power Inspectorate and AB
   ASEA-ATOM.
2. Hirschberg, S., 'Summary Report: Conclusions and Recommendations for
   Future Work'. Workshop on Dependent Failure Analysis, Västerås, Sweden,
   April 27-28, 1983. AB ASEA-ATOM Report RPA 83-212, July 1983.
3. Dinsmore, S. (ed.), PRA Uses and Techniques: A Nordic Perspective. Nordic
   Liaison Committee for Atomic Energy, June 1985.
4. Hirschberg, S. (ed.), 'NKA-project Risk Analysis (RAS-470). Summary
   Report on Common Cause Failure Data Benchmark Exercise'. Final
   Report, RAS-470 (86) 14, June 1987.
5. Pulkkinen, U. (ed.), 'Proceedings of the CCF Workshop', Lepolampi, Espoo,
   Finland, May 10-11, 1984. Report RAS-470 (87) 14 (VTT Work Report SAH
   38/37), December 1987.
6. Hirschberg, S., 'Comparison of Methods for Quantitative Analysis of
   Common Cause Failures: A Case Study'. International ANS/ENS Topical
   Meeting on Probabilistic Safety Methods and Applications, San Francisco,
   California, U.S.A., February 24 - March 1, 1985.
7. Hirschberg, S., 'Retrospective Analysis of Dependencies in the Swedish
   Probabilistic Safety Studies. Phase I: Qualitative Overview'. Report
   RAS-470 (87) 4 (AB ASEA-ATOM Report RPC 87-36), July 1987.
8. Bengtz, M., Hirschberg, S., 'Retrospective Analysis of Human Interactions
   in the Swedish Probabilistic Safety Studies. Phase I: Qualitative Overview'.
   Report RAS-470 (87) 5 (AB ASEA-ATOM Report RPC 87-54), July 1987.
9. Fleming, K.N., Kalinowski, A.M., 'An Extension of the Beta Factor Method
   to Systems with High Levels of Redundancy'. Report PLG-0289, August
   1983.
10. Atwood, C.L., 'Data Analysis Using the Binomial Failure Rate Common
    Cause Model'. NUREG/CR-3437, September 1983.
11. Apostolakis, G., Moieni, P., 'On the Correlation of Failure Rates'. Fifth
    European Reliability Data Bank Association (EuReData) Conference on
    Reliability Data Collection and Use in Risk and Availability Assessment,
    Heidelberg, Federal Republic of Germany, 9-11 April 1986.
12. Mankamo, T., Pulkkinen, U., 'ADDEP - Additive Dependence Model'.
    Technical Research Centre of Finland, Research Report, February 1985.
13. Bento, J-P., et al., Reliability Data Book for Components in Swedish
    Nuclear Power Plants. Nuclear Safety Board of the Swedish Utilities and
    Swedish Nuclear Power Inspectorate, May 1985.
14. PRA Procedures Guide. NUREG/CR-2300, U.S. Nuclear Regulatory
    Commission, January 1983.
15. Carlsson, L., Hirschberg, S., Johansson, G., 'Qualitative Review of
    Probabilistic Safety Assessment Characteristics'. International SNS/ENS/ANS
    Topical Meeting on Probabilistic Safety Assessment and Risk Management,
    Zürich, Switzerland, August 30 - September 4, 1987.
16. Poucet, A., Amendola, A., Cacciabue, P.C., 'Common Cause Failure
    Reliability Benchmark Exercise'. Final Report, CEC JRC Ispra, EUR 11054
    EN.
17. Laakso, K., 'A Systematic Feedback of Plant Disturbance Experience in
    Nuclear Power Plants'. Ph.D. Thesis (in Swedish), Helsinki University of
    Technology, December 1984.
CLASSIFICATION OF MULTIPLE RELATED FAILURES
A. Amendola
Commission of the European Communities
Joint Research Centre - Institute for Systems Engineering
21020 Ispra (Va) - Italy
1. INTRODUCTION
A. Amendola (ed.),
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 31-46.
© 1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
Collective learning processes have been implemented through
"Benchmark Exercises", which have resulted in a wide consensus of the
technical community on the way of treating dependency structures in
reliability assessments.
By reviewing the historical development of the proposed classification
schemes and discussing the recent taxonomies, the paper shows that, even
in the absence of a unified terminology, ambiguities should no longer
exist in dealing with multiple related failures in probabilistic safety
assessment (PSA).
TABLE II - The CSNI classification scheme (3)

CMF CAUSES
  ENGINEERING (E)
    DESIGN (ED)
      Functional deficiencies (EDF): hazard undetectable, inadequate
        instrumentation, inadequate control
      Realisation faults (EDR): channel dependency, common operation and
        protection components, operational deficiencies, design errors,
        design limitations
    CONSTRUCTION (EC)
      Manufacture (ECM): inadequate quality control, inadequate standards,
        inadequate inspection, inadequate testing
      Installation and commissioning (ECI): inadequate quality control,
        inadequate standards, inadequate inspection, inadequate testing
        and commissioning
  OPERATIONS (O)
    PROCEDURAL (OP)
      Maintenance and test (OPM): imperfect repair, imperfect testing,
        imperfect calibration, imperfect procedures
      Operation (OPO): operator errors, inadequate procedures, inadequate
        supervision, communication error
    ENVIRONMENTAL (OE)
      Normal extremes (OEN): temperature, pressure, humidity, vibration,
        acceleration, stress, corrosion, contamination, interference,
        radiation, static charge
      Energetic events (OEE): fire, flood, weather, earthquake, explosion,
        missiles, electrical power, chemical sources
TABLE I - List of CMF categories (2)
2. Design Deficiency
- unrecognized interdependence between "independent"
subsystems, components
- unrecognized electrical or mechanical dependence on
common elements
- dependence on equipment or parameters whose failure or
abnormality causes need for protection
4. External Phenomena
- tornado
- fire
- flood
- earthquake
5. Functional Deficiency
- misunderstanding of process variable behaviour
- inadequacy of designed protective action
- inappropriate instrumentation
term MRFs was proposed, where multiplicity applies also to multiple
failures in time of a same component whenever its failure rate departs
from the expected behaviour for any possible reason, such as incorrect
maintenance, environment, etc.
Instead of attempting to give an exhaustive definition of MRFs, we
consider it more appropriate to distinguish among the several kinds of
possible dependency structures, as these may call for significantly
different modelling approaches.
This is also the conclusion of the very comprehensive project on
CCF classification sponsored by EPRI and tested through benchmark
exercises involving both USA and European expert teams: "the label
CCF is an inadequate descriptor of events that can be more precisely
designated by an adequate classification system. It is, therefore,
recommended that the technical community discontinue the use of this
simplistic term" (5).
3. CLASSIFICATION SCHEMES
3.2 Classification vs Effects

E  Electromagnetic interference (EMI) - welding equipment, rotating
   electrical machinery, lightning, power supplies, transmission lines
R  Radiation damage - neutron sources, sources of ionizing radiation
M  Conducting medium - moisture, conductive gases
V  Out-of-tolerance voltage - power surge
I  Out-of-tolerance current - short circuit
which may provoke multiple failures are the driving elements of the
classification. Indeed "an extraordinary number of secondary failure
causes (generic causes) can be found that would result in component
failures. Also, several different events (sources) may result in the
same cause of secondary failure. Therefore, the analysis is initially
directed towards the generic cause of component failure rather than
the specific event that results in the component failure" (7). For
example, the effects of the generic cause "vibration" might be the
same independently of whether the source for vibration is internal
(like machinery in motion) or external (earthquake). The list of the
generic causes distinguished according to their nature is complemented
by a list of further possible common links including factors like
proximity, procedures, etc. (Table IV).
Field data on MRFs can be derived from both incident files and
component event files. In order to sort and analyze the relevant data,
appropriate classifications are needed which are not necessarily
coincident.
In fact, an incident description needs "a method for logically
dissecting multiple component unavailability scenarios into individual
component unavailabilities" (8), in order to identify related
failures. On the contrary, by starting from component failure event
data, "the principal objective in the analysis of component failure
data is not to dissect multiple unavailabilities in a same event
description but to detect relationships between unavailabilities
recorded in separate component event descriptions" (9).
Samples from incident data (LERs) were used by EPRI to test the
proposed classification (5) and to finalize it by incorporating the
comments that arose and removing the ambiguities discovered during the
Data Benchmark Test (8), whereas component data from CEDB (10) began
to be analyzed at JRC by sorting data according to an MRF
classification (1,9) established for the specific purpose. The two
classifications are now complementary and can be usefully linked in
data analysis; this is extensively described in another paper of the
present book (11).
The principles of the EPRI/LATA classification are briefly
described in the following, whereas the reader may refer to Ref. (11)
for details on the MRF classification used for sorting relevant
component event data.
The EPRI/LATA classification (5), as well as the Data Benchmark
Test promoted by EPRI, has been very useful in diffusing some
important concepts within the technical community, which have
contributed to removing ambiguities and inconsistencies from PSA
practice. Furthermore, it allowed data relevant for MRF analysis to be
consistently sorted out from LER data, which are now routinely screened
for parameter estimation, as several papers in this book will show.
The objectives of the classification are:
- to describe a method for understanding multiple component
malfunction event scenarios;
- to suggest data management methods for information storage and
retrieval, to support model development and defensive strategy
analysis.
The classification, as well as the identification of failure sequence
events, is helped by graphical symbols and cause-effect logic
diagrams.
The set of symbols is given in Table V, whereas the cause-effect
logic diagrams are illustrated by the example below:

[Figure: example cause-effect logic diagram]
RF Root-caused failure
RU Root-caused (functional) unavailability
CF Component-caused failure
CU Component-caused (functional) unavailability
SRF Shared root-caused failures
SRU Shared root-caused (functional) unavailabilities
SCF Shared component-caused failures
SCU Shared component-caused (functional) unavailabilities
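As an illustration only (not part of the original text), the Table V symbol set lends itself to a simple machine-readable encoding of multiple-failure event records; the component names and the cause string in the sketch below are invented for the example:

```python
# Minimal encoding of the Table V event symbols (illustrative sketch).
from dataclasses import dataclass

# Two-letter codes and descriptions taken from Table V above.
SYMBOLS = {
    "RF": "Root-caused failure",
    "RU": "Root-caused (functional) unavailability",
    "CF": "Component-caused failure",
    "CU": "Component-caused (functional) unavailability",
    "SRF": "Shared root-caused failures",
    "SRU": "Shared root-caused (functional) unavailabilities",
    "SCF": "Shared component-caused failures",
    "SCU": "Shared component-caused (functional) unavailabilities",
}

@dataclass
class EventRecord:
    component: str  # affected component (hypothetical identifier)
    symbol: str     # one of the Table V codes
    cause: str      # free-text root cause

    def describe(self) -> str:
        return f"{self.component}: {SYMBOLS[self.symbol]} ({self.cause})"

# Hypothetical two-pump event sharing a single root cause:
records = [
    EventRecord("pump A", "SRF", "defective calibration procedure"),
    EventRecord("pump B", "SRF", "defective calibration procedure"),
]
for r in records:
    print(r.describe())
```

Sorting field data by such codes is, in essence, what the LER and CEDB screening described above amounts to.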
TABLE VI - Cause codes.

Functionally Unavailable Component
Failed Component
Design/Manufacturing/Construction Inadequacy
Design Error or Inadequacy
Manufacturing Error or Inadequacy
Defective Operational Procedure
Defective Calibration/Test Procedure
Maintenance
Environmental Stress
Fire
Temperature (high or low)
Chemical Reactions
Vibration Loads
Unknown
4. CLASSES OF DEPENDENCY STRUCTURES VS PSA MODELLING
4.5 Dependency Structures Linked with Human Factors
reliability assessment.
Possible multiple failures caused by human interventions in test
and maintenance operations (events like A) can be identified through
an in-depth task analysis of the corresponding procedures and of the
man-machine interface. It might also be possible to estimate some
probability figure for multiple failures by using the methods proposed
for human reliability assessment; these, however, still give rather
uncertain results. The use of field data should therefore be recommended.
In practice, the inclusion of these events in the generic dependency
classes to be dealt with via parametric models might also be useful.
The existence of dependency structures between failure of
diagnosis (F) and the hardware and software of the man-machine
interface and, therefore, the correction of possible design (D) or
procedure (E) faults can be investigated by the tools described in
Ref. (15), which are characteristic of human reliability assessment.
The major difficulties in such an assessment are the dynamic aspects
both of the process and of the operator interventions; these can be
better investigated experimentally via replica simulators or can be
modelled via proper dynamic tools (16) which, however, are still in a
development phase. On the other hand, a recent investigation project
on the status of human reliability assessment has shown the existence
of large uncertainties in the quantitative estimation of human
reliability, whereas a good consistency does exist in the
identification of possible faults in procedures and man-machine
interfaces (17,18).
commissioning tests or normal operation which may become evident
under particular system demands.
Some of these faults present a "debugging" pattern similar to that
encountered when dealing with software reliability. It would be
theoretically possible to analyze and model each specific factor.
However, because of the paucity of the data for each single
phenomenon, and of the very significant increase in size and
complexity of the resulting system model, it is much more cost-
effective to include all these residual factors within the so-called
implicit or parametric models which are the subject of several further
chapters of this book.
REFERENCES
rates', Proc. of the 5th EuReDatA Conf. on Reliability Data
Collection and Use in Risk and Availability Assessment,
Heidelberg, April 9-11, 1986, Springer Verlag.
(14) USNRC: 'PRA procedure guide: a guide to the performance of
probabilistic risk assessments for nuclear power plants',
NUREG/CR-2300, April 1982.
(15) I.A. Watson: 'Human factors in reliability and risk assessment in
reliability engineering', Proc. of the Ispra Course held at
Madrid, September 22-26, 1986 (A. Amendola and A. Saiz de
Bustamante, eds.) Kluwer Academic Publishers (1988).
(16) A. Amendola, U. Bersini, P.C. Cacciabue and G. Mancini: 'Modelling
operators in accident conditions: advances and perspectives on a
cognitive model', in Cognitive Engineering in Complex Dynamic
Worlds (E. Hollnagel, G. Mancini and D.D. Woods, eds.) Academic
Press, 1988.
(17) A. Poucet: 'Survey of methods used to assess human reliability in
the human factors reliability Benchmark exercise', in Accident
Sequence Modelling (G.E. Apostolakis, P. Kafka and E. Mancini,
eds.) Elsevier Applied Science, 1988.
(18) A. Poucet and A. Amendola: 'State of the art in PSA reliability
modelling as resulting from the international Benchmark exercises
project', NUCSAFE 88 Conf., Avignon, October 2-7, 1988.
DESIGN DEFENCES AGAINST MULTIPLE RELATED FAILURES
Humphreys
National Centre of Systems Reliability
UKAEA
Wigshaw Lane
Culcheth
Warrington
WA3 4NE
A. Amendola (ed.),
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 47-70.
1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
2. DEFINITION OF MULTIPLE RELATED FAILURES
P_AB = P_A x P_B                                          (1)

However, since we know that multiple related failures occur, we
must express the possibility that a set of occurrences are not
necessarily independent of one another.
The equation

P_AB = P_A x P_(B/A)                                      (2)

implies that the system failure probability P_AB is the probability of A
failing, times the probability of B failing given that A has already
failed, thus taking some account of the dependency issue. If the
failures of A and B are independent, then equation 2 reverts to the form
of equation 1.
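A minimal numerical sketch (not from the original text; the channel failure probability and the common-cause fraction are arbitrary assumed values) shows how strongly the conditional term in equation 2 can dominate when P_(B/A) is approximated with a beta-factor-style common-cause fraction, as in the parametric models discussed elsewhere in this book:

```python
# Two-channel system failure probability: equation 1 (independence)
# versus a beta-factor approximation of equation 2.

def p_both_independent(p: float) -> float:
    # Equation 1: P_AB = P_A x P_B, independence assumed.
    return p * p

def p_both_beta_factor(p: float, beta: float) -> float:
    # A fraction beta of each channel's failure probability is
    # attributed to a shared cause that fails both channels at once;
    # the remaining independent parts must coincide by chance.
    independent_part = ((1.0 - beta) * p) ** 2
    common_part = beta * p
    return independent_part + common_part

p = 1.0e-3   # assumed per-demand channel failure probability
beta = 0.1   # assumed common-cause fraction

print(p_both_independent(p))        # on the order of 1e-6
print(p_both_beta_factor(p, beta))  # on the order of 1e-4: dependency dominates
```

Even a modest common-cause fraction thus raises the two-channel failure probability by roughly two orders of magnitude, which is the quantitative point behind treating dependency explicitly.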
3. THE DEVELOPMENT OF DEFENSIVE STRATEGIES AGAINST MRFs
[Figure 1: CMF cause classification - design and manufacture causes
(EDR, ECM): channel dependency, inadequate quality control, common
operation and protection components, operational deficiencies,
inadequate inspection, inadequate testing, inadequate components,
design errors and limitations]

[Figure 4: guide to system failure probability ranges (10^-1 to 10^-5)
for different system types]
TABLE I
Operational procedures:
Maintenance
Proof testing
Operation
Reliability assessment
The report also related the list of defences to the CMF cause
classification of Fig 1, indicating which of the defences could
counteract each of the specific CMF cause categories (Fig 3).
The impact of system design as a major defensive measure was
assessed and a guide to the possible system failure probability was
provided. This indicated the benefits to be gained in moving from a
single channel system, to implementation of system redundancy and then
system diversity (Fig 4).
The advice given in SRD-R-196 (2) has been used in the UK as a means of
aiding the system designer in the achievement of reliable systems
design. The CCF-RBE provided an opportunity for the safety analyst to
use the advice of reference 2 to develop a framework for plant
assessment and at the same time to use the CMF defensive strategy as
part of the modelling and quantification of unavailability.
[Figure: CMF defences vs CMF causes matrix. Defences listed: design
review, functional diversity, equipment diversity, fail-safe design,
operational interfaces, protection and segregation, redundancy and
voting, derating and simplicity, construction control, testing and
commissioning, inspection, construction standards, operational control,
reliability monitoring, maintenance, proof test, operations]
The defensive strategy (1) was reviewed and a list of nineteen CMF
defences was defined. The target plant for the CCF-RBE was then
assessed against each of the major CMF causes shown in Figure 1. The
effectiveness of each of the defined defences in combatting the common
mode failures was then assessed. The partial factor dependent
failures model developed by G Edwards (3) was then used to quantify
the effectiveness of each defence, and thus arrive at a value for
system unavailability. Figure 5 provides an example of the format of
the defence/cause link obtained through the application of the Partial
Factor model.
5. RECENT DEVELOPMENTS IN DEFENSIVE STRATEGIES AND DEPENDENT FAILURES MODELLING
6. CONCLUSION

REFERENCES
Pre-occupational Failure Causes

D00 Design
D10 Design Requirements/Specifications Inadequacy
D20 Design Error or Inadequacy

F00 Manufacturing
F10 Manufacturing Error or Inadequacy

C00 Construction
C10 Construction, Installation and Commissioning Error or Inadequacy

Operational Failure Causes

M00 Maintenance
M10 Failure to Follow Maintenance Procedures

O00 Operation
O10 Failure to Follow Operating Procedures
Figure 6. continued
SAME:
200 Procedures
Procedures relating to 210 Specification
220 Design
230 Manufacture
240 Installation
250 Operation
260 Maintenance
261 Test
262 Calibration
561 Test
562 Calibration
600 Location
800 Hardware
Hardware 810 Manufacture
820 Type
830 Design Principle
900 Timing
Figure 7. continued
DIFFERENT:
200 Procedures
Procedures relating to 210 Specification
220 Design
230 Manufacture
240 Installation
250 Operation
260 Maintenance
261 Test
262 Calibration
561 Test
562 Calibration
600 Location
800 Hardware
Hardware 810 Manufacture
820 Type
830 Design Principle
900 Timing
Figure 8. continued
100 Design
210 Manufacture/construction
211 Materials and component quality control
212 Quality control of manufacturing methods and standards
used etc
220 Installation
221 Inspection
222 Commissioning - full functional and interface testing
300 Operation
410 Test
411 Proof testing
412 Extended testing
413 Routine inspection - plus post-breakdown and maintenance
work
414 Condition monitoring
420 Preventative maintenance
430 Maintenance procedures review
440 Maintenance personnel training
450 Adequate supervision of maintenance personnel
Appendix 1 - Examples

TABLE A.1.1 (Sheets 1-3) - Examples of CMF events at US nuclear power
plants, 1972-1976 (events at Calvert Cliffs, Dresden, Haddam Neck,
San Onofre, Brunswick, Vermont Yankee, Salem, Monticello, Oconee,
Palisades, Cooper, Three Mile Island, Prairie Island, La Crosse,
Pilgrim, Kewaunee and other plants), giving for each event the date,
plant, event description, cause, CSNI cause code (e.g. EDR, ECM),
dependency type and effect (partial/complete, danger/safe).
TABLE A.4.1 - World airline accidents, CMF summary (Sheets 1-3 of 12)

NB: The information contained in this table, except the last column
(CMF class), was obtained from Reference 29 of the main report.

10. 7.59  C-46 ALN  Port propeller broke, causing excessive vibration
          and fire in the fuel tanks. (Engine fragments also seriously
          injured the captain.)                               O.EE (E.DR)

31. 1.61  C-46N AAT  Landing gear warning horn and lights
          inoperative.                                        O.PM
DESIGN-RELATED DEFENSIVE MEASURES AGAINST DEPENDENT FAILURES:
ABB ATOM'S APPROACH
1. Introduction
ABB ATOM has delivered 11 Boiling Water Reactor (BWR) nuclear power plants
to utilities in Sweden and Finland. Two of the plants were turnkey deliveries.
In 1970, ABB ATOM decided to adopt a design (in the following referred to
as the BWR 75 design) with four half-capacity systems (4 x 50%), in which each
division or subsystem is completely separated from its redundant counterparts.
Other important characteristic features of this concept include: integral plant
design, fine motion control rods, internal recirculation pumps, prestressed
concrete reactor containment, operational flexibility, excellent availability
performance and low occupational radiation doses.
Most of the important features of the BWR 75 design were implemented in
the first two 930 MWe units at Forsmark and in the two 660 MWe units at
Olkiluoto in Finland. Recently, the BWR 75 design concept was fully realised in
the Forsmark 3 and Oskarshamn 3 plants (1060 MWe), which went into
commercial operation in 1985. Thus, the advanced design which now serves as a
model to LWR designs being developed in other countries ("advanced BWR"), is
already proven in the operating Swedish and Finnish plants.
It is worth noting that a thorough review has now been made of the BWR 75
concept for the purpose of providing the best possible nuclear power plant for
the 1990's (ref. 1). The product resulting from this effort is denoted BWR 90.
Some moderate design modifications of the BWR 75 concept have been
proposed in order to reduce costs, incorporate technological modernisation and
adapt to new safety requirements. The major improvements concern:
A. Amendola (ed.),
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 71-100.
1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
- strengthening of the capability of the reactor primary containment to
  withstand the effects of a core melt accident
- substantial reduction of building volume, resulting in significant cost saving
- extensive use of microcomputers and fibre optics for process control.
The four-divisional configuration of the safety systems was reconfirmed as
constituting an optimal arrangement with respect to safety, layout, and
maintainability.
General defensive measures against dependent failures include:
- separation
- diversity
- fail-safe design
- design review and verification
- standardisation
- periodic testing
- administrative routines.
Note that standardisation may, however, increase the probability and impact
of certain types of Common Cause Failures (CCFs), caused by e.g. failures in
the manufacturing process.
In the following some of the defensive measures against dependent failures,
as implemented in ABB ATOM'S design and practices, will be described, with
main emphasis on separation and diversity. The importance of separation will be
illustrated in the context of protection against abnormal events. This will be
followed by an account of some findings from a recently performed Probabilis-
tic Safety Assessment (PSA) for a four-divisional plant.
2. Merits of Separation
The merits of separation as applied in ABB ATOM design, have been described
earlier in detail, from a qualitative point of view (ref. 2).
It was concluded that in cases of fire, pipe whip, missiles, flooding,
hurricanes, lightning, airplane crash and sabotage the physical separation of
structures, systems and components will effectively, although to a varying
degree, improve the safety of the nuclear power plant. Furthermore, a
consistent and logical application of the principles of physical separation also
means the absence of complicated links, interactions and interconnections
between safety related functions, which simplifies safety analysis and should
speed up licensing procedures. Some of the highlights of ref. 2 will be reflected
in the following.
The following safety related functions are subject to the "two-out-of-four"
redundancy principles:
All systems supporting these functions are completely separated into four
subsystems. This means that for a process system each subdivision contains not
only mechanical components (pumps, valves, heat exchangers, piping, etc) of its
own, but also separate electrical power supply including diesel generator and its
control equipment. In addition, the fourfold division of safety systems is
consistently carried through with respect to the reactor protection system
(RPS). Figure 1 illustrates the subdivision of the emergency core cooling
system, while physical separation of safety related systems is demonstrated in
figure 2. Thus, the emergency cooling systems are situated in four separate
bays in the reactor building, adjacent to the primary containment.
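The benefit of the four-fold subdivision can be sketched numerically (an illustration, not a calculation from the text; the subdivision unavailability is an assumed value). With a two-out-of-four success criterion and fully independent subdivisions, the system is lost only when at least three subdivisions fail; dependencies between subdivisions would erode this advantage, which is precisely why the separation described here matters:

```python
from math import comb

def p_system_fails_2oo4(q: float) -> float:
    # Two-out-of-four success criterion: the system fails when at
    # least 3 of the 4 subdivisions are unavailable simultaneously.
    # q is the (assumed) unavailability of a single subdivision;
    # subdivisions are treated as fully independent here.
    return sum(comb(4, k) * q**k * (1.0 - q) ** (4 - k) for k in (3, 4))

q = 1.0e-2  # assumed per-subdivision unavailability
print(p_system_fails_2oo4(q))  # roughly 4e-6, versus 1e-2 for one subdivision
```

The three-orders-of-magnitude gain holds only as long as the independence assumption does, which is what common-cause failures undermine.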
Figures 3 and 4 demonstrate that the fourfold division of safety systems is
consistently carried out also with regard to such support functions as electric
power supply and the reactor protection system. The onsite standby power units
(four diesel generator sets) are situated as shown in Figure 5 which provides the
overall building arrangement. The nuclear and safety related portions of the
plant, i.e. the reactor and the control and emergency generator buildings, are
assembled around the reactor and separated from the conventional, turbine and
auxiliary portions by a wide communication area.
As far as practically possible even the operational and safety equipment
have been separated both from the physical and functional point of view. This
limits the possibility of failure propagation. As an example, the following
principles are valid for overcurrent protection switches:
3. Diversity
Figure 1. BWR 75 - Emergency cooling systems.

Figure 2. BWR 75 - Physical separation: bottom part of the reactor
building (below the reactor containment), with subdivisions (circuits)
A, B, C and D.

Figure 3. BWR 75 - Single line diagram for the electric power systems:
operational systems and safety-related systems (diesel-backed and
battery-backed AC) at 10 kV, 660 V and 380/220 V; subdivisions A-D.

Figure 4. BWR 75 - Reactor protection system (RPS) logic: analog signals
(0-10 V), limit switches, digital signals (0/24 V), logic channels with
signal exchange via fibre optics, digital actuation signals;
subdivisions A-D.

Figure 5. BWR 75 - Building arrangement: controlled and uncontrolled
areas (scale 0-100 m).
One of the critical safety functions is the shutdown of the reactor. An
example of a common cause failure in the shutdown system is an incident which
occurred at the General Electric BWR plant Browns Ferry 3 on June 28, 1980. Due
to equipment failure in connection with normal shutdown procedures, one of the
scram discharge tanks was filled up with water, which prevented the full scram
function of the 76 rods with drives connected to it. Repeated manual activation
of the scram system finally resulted in insertion of all the failing rods after
about 15 minutes.
To avoid this type of CCF in the shutdown system, the control rod drives in
ABB ATOM'S BWRs have diverse systems for normal operation and for the scram
function. An electro-mechanical system with a motor gives continuous fine
motion for normal use, and a hydraulic system is used for scrams. Figure 6
shows the function chain diagram for the reactor shutdown function of the
Forsmark 3 plant.
Figure 6. Function chain diagram for reactor shutdown at the Forsmark 3
plant (reactor containment vessel, reactor pressure vessel; systems 351,
354, 221, 532).
Figure 7. BWR 3000 - Hydraulic scram system (system 354): scram modules,
with connections to system 754 and system 352.
Figure 8. Control rod drive system (ASEA-ATOM BWR core module): control
rod, fuel assembly, baseplate, control rod guide tube and control rod
drive; control rod in bottom position, normal manoeuvring with electric
motor, fast insertion with pressurized water.
which acts as torsional and radial guide for the nut and piston tube.
Reactor scram involves automatic insertion of the piston tube and thus its
control rod into the core by means of high-pressure water from the hydraulic
scram system, regardless of the position of the drive nut. To maintain the
piston tube in the intended position after reactor scram, the bottom end of the
piston tube is provided with three latches, which are actuated as soon as the
piston tube and nut separate. The latches engage into special latching holes in
the guide tube. One latch is sufficient to hold the piston tube and control rod in
position. Simultaneously with the signal to the scram system, a starting signal is
applied to the control rod drive motors. The electro-mechanical transmission
will thus immediately start to move the drive nut upwards, and, after a
maximum of 4 minutes, the nut will come into contact with the piston tube in
its fully inserted position.
4. Other Defences
An example of fail-safe design has already been given. A short summary of
additional means for reducing the impact of dependent events follows below.
Note that not all the defensive measures described in this chapter are
design-related.
thorough knowledge of the actual system, including mechanical, electrical and
safety specialists, persons with experience from reactor operations and
knowledge of layout and installation of the system itself as well as of other
equipment which may influence the system considered. In this way the system
will be reviewed and analyzed from different angles, and the communication
between the different specialists may reveal design weaknesses which otherwise
would not have been noticed.
To ensure correct and conservative design specifications for equipment
which is relied upon during accident conditions, it is important to have reliable
analytical tools to calculate the various loads which may occur during these
conditions.
In many cases analytical calculations are not sufficient, but they have to be
supplemented by type tests and verification experiments. Carefully planned and
conducted experiments and verification tests are effective weapons in the
defence against common cause failures.
5. Abnormal Events
The PSA studies applied to nuclear power plants show clearly that the designs
are reasonably "proof" against random faults and malfunctions. Thus, attention
is focused on possible sources of dependencies, which may eventually lead to
Figure 9. BWR 75 reactor protection system (RPS): dynamic testing of the 2-out-of-4 logic, with a test channel from each subdivision to the main computer and an alarm when the outputs are not equal.
multiple failures. The single occurrence that yields a multiple failure may be,
for example, one of the following abnormal events:
fire
earthquake
aircraft crash
missiles
flooding
chemical explosion
hurricane
lightning.
The general principle adopted for BWR 75 is that subs A and C are separated
from subs B and D by being placed in different fire zones. In no case may a fire
zone share ventilation equipment or air ducts with another fire zone (with the
exception of the main exhaust air stack), i.e. separate systems are provided for
normal ventilation, emergency filters and smoke extraction. The fire zones are
separated by fire-resistant structures. ABB ATOM has divided the main
buildings for BWR 75 into nine fire zones. Larger buildings and the functionally
associated smaller buildings form their own fire zones. As an example, the
reactor building has been divided into two fire zones.
The nine fire zones (figure 10) are divided into fire cells of two types: fire
cells with separate normal ventilation and fire cells with common normal
ventilation. The degree of separation is determined by the fire-load of the area
considered and also by other hazards.
Figure 11 illustrates the physical separation principles as applied to elect-
rical equipment. The degree of separation reflects consideration of the
potential hazard, with regard to fire load as well as the consequence of fire.
Reactor protection system equipment is situated in four separate areas that
are provided with separate ventilation systems in the control building (figure
12). In the control room itself, the RPS-related functions are placed in different
cabinets provided with fire-resistant shielding. In the cable spreading area
below the control room, safety-related cables are also separated into four
channels that are individually shielded.
121 Reactor building
122 Turbine building
123 Condensate cleanup system building
124 Auxiliary systems buildings A,
125 Entrance building
126 Control building
127 Diesel buildings A, B, C, D
128 Waste building
129 Active workshop building
131 Auxiliary cooling water buildings A,
132 High voltage switchgear building
134 Transformer building
137 Turbine cooling water systems building
145 Offgas building
148 Storage building
Figure 11. Physical separation of electrical equipment: subdivisions A, B, C and D, separated by distance and by barriers, in the central control room and in the process areas.
Figure 12. BWR 75 control building, level 0.0, showing the separate areas for subdivisions A, B, C and D.
Extensive active firefighting systems include the alarm system consisting
of two independent parts for the A/C and B/D portions of the plant, about 60
separate ventilation systems for normal ventilation and 15 systems for smoke
extraction, and numerous extinguishing systems (for water supply and hydrants,
fire sprinklers and halon extinguishers).
In view of the general separation principles and the features of the ventilation
and fire extinguishing systems, it is highly improbable that a fire could affect
more than one sub.
1) Engineering judgement
2) Static analysis
3) Dynamic analysis
4) Testing.
The seismic input for all equipment is the floor response spectra obtained
from the dynamic analysis of the different building structures. Using the floor
response spectra it has been possible to qualify some equipment by, e.g.,
referring to results from other types of vibration testing or by referring to the
low acceleration level.
Engineering judgement has primarily been used to evaluate the risk that
nonseismic equipment might jeopardize seismically classified equipment.
Static analysis has been used where it is obvious that a given component has
its natural frequency above 33 Hz, which is outside the frequency range of the
earthquake. In other cases it has been necessary to verify by dynamic analysis
that the natural frequency is >33 Hz and then apply a static analysis.
Dynamic analysis has been performed on all building structures to obtain the
floor response spectra. The reactor with internals has been analyzed dynamically
together with the reactor building and containment. A dynamic analysis
has also been performed for most process piping with individual dynamic models
for different piping systems. Small piping has been treated in a more generalized
manner to obtain appropriate support distances for the different piping sizes.
Where it has not been possible to verify the seismic design by analytical
methods, testing has been performed.
A separate analysis concerns the seismic response of the turbine plant. The
potential missile sources include not only the turbine building but also the
turbine itself and the feedwater tank. It is concluded that a Safe Shutdown
Earthquake (SSE) may cause damage
Figure 13. BWR 75 buildings, seismic design.
to the turbine plant, but will not significantly affect the safety related
equipment.
The need for protection of a nuclear power plant against an aircraft crash is
related to the location of the plant. While only small aircraft are considered a
hazard in the vicinity of most Scandinavian sites, large aeroplanes are postulated
to be potential hazards at many continental sites, e.g. in the FRG and
Switzerland. Thus, both military and large civilian aircraft are analysed with
respect to their possible impact on the buildings of the plant.
In the ABB-ATOM plant, basic protection is achieved by protecting the
reactor containment with surrounding structures, and by locating redundant
portions of safety related systems in building compounds which are separated in
such a way that the aircraft cannot damage redundant parts (figure 14).
Special attention has to be given to the risks associated with the burning of
huge amounts of aircraft fuel. The fire can affect emergency ventilation
systems and may lead to the choking of redundant diesel generator sets, needed
for emergency power. Thus, air intakes may need special protection.
Further protection can be obtained by reinforcing certain walls and roofs to
withstand specific impacts, defined by aircraft speed, size, and angle of
approach.
Two loading cases have been studied:
containment, separation of safety related equipment is made by location in
different rooms, which makes it impossible for a missile to affect more than
one sub. However, jet impingement may affect safety related equipment.
Discharging fluid could also cause secondary missiles, for example gratings
accelerated by jet forces. The potential for safety related impact from
secondary missiles is limited by the design and by the fact that the identified
secondary missiles do not affect safety related equipment in more than one
subcircuit.
In conclusion, the missile analyses, carried out on a case-by-case basis,
ensure that the basic safety goals are not endangered by missiles. This implies
that
5.5. FLOODING
Possibility of external flooding, caused by a high water level, is site dependent.
In the Forsmark 3 case the protection of the safety equipment against such
eventuality is generally assured by placing it on a level which will not be
reached by the water from external sources, even under extreme conditions.
The criterion for protection against internal flooding is that safety functions
should not be impaired. To meet this requirement, certain rooms containing
safety related equipment are designed to be leak-tight, to prevent the water
from propagating to adjacent rooms. Furthermore, to limit the outflow from a
pipe break, the plant is equipped with room monitoring equipment which
automatically actuates isolation of the break, thus limiting the outflow to the
room.
The analyses of internal flooding have been made for all buildings of safety
interest. The different buildings have been designed with discharge openings and
runoff paths to meet the water loads from potential internal flooding and
satisfy the safety requirements.
During the last few years much effort has been directed towards performing
Probabilistic Safety Assessments (PSAs) for nuclear power plants. The Swedish
Nuclear Power Inspectorate (SKI) now requires PSAs to be made recurrently
(every 10th year) as a check of the safety level of operating nuclear power
plants. ABB ATOM has made essential contributions to this effort by
participating, as the main contractor or in cooperation with the utility in
question, in PSAs for the following plants (the year of publication of the final
report is indicated
within parentheses):
Forsmark 3 (1977)
Ringhals 1 (1983)
Barsebäck 1 (1985)
Forsmark 3 (1985)
Oskarshamn 3 (1986).
Recently, ABB ATOM performed a PSA for a four-divisional plant (ref. 7). A
characteristic feature of this study is a thorough treatment of dependencies,
including a detailed modelling of intercomponent CCF-contributions. Three
types of dependencies have been considered:
The last two groups mentioned were in turn divided into functional,
shared-equipment, physical interaction and human interaction dependencies.
The dependencies not explicitly included in the plant model (small event
tree/large fault tree approach) were incorporated as residual CCF-contribu-
tions. Of particular interest are the analyses of CCIs and CCFs.
The analysis of plant-specific CCIs was limited to events of transient
character. External events and internal events causing severe environmental
stresses were outside the scope of the PSA. Naturally, the choice of such
boundary conditions for the study was motivated by the specific design features
of the plant, which minimize the impact of this type of event. The analysis
focused on functions which may influence both normally operating systems and
stand-by safety systems, i.e.:
In all cases analysed, the potential CCIs either are already covered by
generic transient categories or do not contribute significantly to the core melt
frequency. This is a natural result of the consistent separation of redundant
safety equipment, from both the physical and the functional point of view.
An issue related to CCIs is that of physical interaction dependencies created
by dynamic effects such as pipe whips, jet impingement, secondary missiles and
pool-dynamic loads, which may follow upon a pipe break within the reactor
containment. As a part of the PSA, the unavailability contributions from
dynamic effects, for systems mitigating the consequences of internal pipe
breaks, have been assessed. In some cases significant contributions have been
obtained for the pressure relief system, the auxiliary feedwater system and the
emergency core cooling system. However, it was demonstrated that dynamic
effects have a small impact on the estimated frequencies of accident sequences.
The residual CCF-contributions have been quantified for active components
within safety systems. Second-order contributions from passive components,
diversified equipment and intersystem-intercomponent residual CCFs have
been neglected. A characteristic feature of the analysis is consideration of all
possible failure multiplicities (i.e. double, triple and quadruple failures) and
their combinations.
It has been shown in supplementary deterministic analyses that for the
majority of initiating events, e.g. all identified transients, only "one-out-of-
four" trains in the safety systems will be needed. Thus, the actual capacity of
safety systems corresponds in such situations to 4x100%. This leads to very
favourable conditions from a probabilistic point of view - a quadruple failure
must occur to disable a safety system. Only for the most limiting accident
sequences (some of the loss of coolant accidents) does the "two-out-of-four"
success criterion apply. As shown in the PSA, the LOCA sequences, which are
sensitive to triple CCFs, give very small contributions to the estimated core
melt frequency.
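The favourable effect of the "one-out-of-four" criterion can be illustrated with a simple beta-factor sketch. The numerical values of the train unavailability q and the residual common cause fraction beta below are purely illustrative assumptions, not results from the Forsmark 3 study, and the function name is invented for the example.

```python
from math import comb

def system_unavailability(q, beta, trains_needed, n=4):
    """Simple beta-factor model: with probability beta*q a common cause
    event fails all n trains; otherwise trains fail independently with
    probability (1 - beta)*q each. The system is unavailable when fewer
    than `trains_needed` trains function."""
    q_ind = (1 - beta) * q
    max_tolerable_failures = n - trains_needed
    p_independent = sum(comb(n, k) * q_ind**k * (1 - q_ind)**(n - k)
                        for k in range(max_tolerable_failures + 1, n + 1))
    return beta * q + p_independent

q, beta = 1e-2, 0.05  # illustrative assumptions only
print(system_unavailability(q, beta, trains_needed=1))  # "one-out-of-four" success
print(system_unavailability(q, beta, trains_needed=2))  # "two-out-of-four" success
```

With beta set to zero, the "one-out-of-four" case reduces to q to the fourth power, i.e. a quadruple independent failure; any residual common cause term beta*q then dominates the result, which is why the quantification of residual CCF contributions matters even for highly redundant systems.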
Another important finding is that postulated systematic misconfiguration of
redundant components gives insignificant contributions. This is due to the high
level of redundancy, separation, staggered testing of redundant trains,
automatic restoration of components to their original positions after testing,
and favourable conditions for recovery.
- Three CCFs could be eliminated if the four-divisional design (separation)
had been applied; this includes a triple failure at Salem 1.
- One CCF would not occur in BWR 75 due to the different design of a special
system.
- Three CCFs would have an extremely low probability of occurrence due to
improved separation (in two cases) and different design (in one case).
- Six CCFs (with two diesels involved) cannot be excluded with certainty;
however, two of them have a negligible probability of striking four diesels
simultaneously.
- One CCF has an unknown cause (a triple failure at Brunswick 2).
1) Although the DG-studies basically use the same source material (failure
reports), significant discrepancies exist between the results of different
studies.
2) Frequently the information available from simple and interconnected
systems is extrapolated to systems with good separation and with an
increased level of redundancy. This extrapolation is not always carried out
with due concern to design and application differences, thus resulting in
pessimistic predictions that do not reflect actual design improvements.
3) The estimated multiple failure probabilities are subject to large
uncertainties due to plant-to-plant variation, different interpretation of
the same source information and use of different quantitative methods.
that application of primitive models to systems with a high level of redundancy
may result in excessive conservatism (ref. 11).
7. Conclusions
The PSAs performed for four-divisional plants have demonstrated that the
design philosophy characterised by physical and functional separation of
redundant equipment constitutes an efficient defence measure against critical
dependent failures. This statement is valid even in view of uncertainties
associated with estimation of residual common cause failure contributions. In
this context it must be emphasized that improper extrapolation of information
available from simple and interconnected systems to systems with complete
separation and a high level of redundancy results in excessively pessimistic
predictions and does not reflect actual design improvements.
Apart from the inherent defensive measures against dependent events there
are other strong reasons in support of the four-divisional design:
8. References
1. Hellström, B., Lönnerberg, B., Tirén, I., 'The ASEA-ATOM Advanced BWR
is a Proven Design. Key Features of BWR 90'. Atomwirtschaft 8/9,
August/September 1987.
2. Helander, L-I., Tirén, L.I., 'Nuclear Power Plant Safety. The Merits of
Separation'. International Conference on Nuclear Power and its Fuel Cycle,
Salzburg, Austria, May 2-3, 1977.
3. Rolandson, S., 'Practical Defences Against Dependent Failures'. Workshop
on Dependent Failure Analysis, Västerås, Sweden, April 27-28, 1983,
Swedish Nuclear Power Inspectorate and AB ASEA-ATOM.
4. Ericsson, G., Lilja, T., 'ATWS in a BWR with Alternate Rod Insertion
Function - a Probabilistic Analysis'. International ANS/ENS Topical
Meeting on Probabilistic Risk Assessment, Port Chester, New York, USA,
September 20-24, 1981.
5. Hirschberg, S., Tirén, L.I., 'Review of ASEA-ATOM Activities in the Area
of Abnormal Events, Risk Analysis and Reliability Engineering'.
Risø/UKAEA/ASEA-ATOM Seminar, Winfrith, England, November 19-20,
1985.
6. Hirschberg, S., Knochenhauer, M., 'Advantages of the Four-Divisional
Design'. IAEA International Conference on Nuclear Power Performance
and Safety, Vienna, September 28 - October 2, 1987.
7. 'Forsmark 3 Probabilistic Safety Study' (in Swedish). AB ASEA-ATOM,
February 1985.
8. McClymont, ., McLagan, G., 'Diesel Generator Reliability: Data and
Preliminary Analysis'. Interim Report EPRI NP-2433, Electric Power
Research Institute, 1982.
9. Pulkkinen, U., et al., 'Reliability of Diesel Generators in the Finnish and
Swedish Nuclear Power Plants'. Electrical Engineering Laboratory,
Research Report 7/82, Technical Research Centre of Finland, June 1982.
10. Hirschberg, S., Pulkkinen, U., 'Common Cause Failure Data: Experience
from Diesel Generator Studies'. Nuclear Safety 26, 3 (1985), 305.
11. Hirschberg, S., 'Comparison of Methods for Quantitative Analysis of
Common Cause Failures - a Case Study'. International ANS/ENS Topical
Meeting on Probabilistic Safety Methods and Applications, San Francisco,
California, USA, February 24 - March 1, 1985.
12. Knochenhauer, M., Enqvist, ., 'Using PSA Models for Planning and
Evaluation of Preventive Maintenance during Power Operation'.
CSNI/UNIPEDE Specialist Meeting on Improving Technical Specifications
for Nuclear Power Plants, Madrid, Spain, September 7-11, 1987.
13. Leine, L., 'Design for Maintainability'. IAEA International Symposium on
Nuclear Power Plant Outage Experience, Karlsruhe, West Germany, June
18-22, 1984.
DESIGN DEFENCES AGAINST COMMON CAUSE/MULTIPLE RELATED FAILURE
P. Dörre R. Schilling
SIEMENS AG (KWU) SIEMENS AG (KWU)
P.O. Box 101063 P.O. Box 3220
D-6050 Offenbach D-8520 Erlangen
FR Germany FR Germany
1.1. Redundancy
Example. The residual heat removal (RHR) system, which is part of the
emergency core cooling (ECC) system, is of 2-out-of-4 design (with
respect to conservative success criteria). This means that if at least 2
of the 4 subsystems provided function when required, the ECCS is still
capable of fulfilling its safety function.
A. Amendola (ed.),
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 101-106.
1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
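The 2-out-of-4 success criterion of the example above can be written as a small predicate. This is an illustrative sketch with invented names, not KWU design documentation.

```python
def eccs_fulfils_safety_function(subsystems_ok):
    """2-out-of-4 success criterion: the ECCS fulfils its safety function
    if at least 2 of the 4 RHR subsystems function when required."""
    if len(subsystems_ok) != 4:
        raise ValueError("expected the states of exactly 4 subsystems")
    return sum(bool(ok) for ok in subsystems_ok) >= 2

# Two functioning subsystems suffice; a single one does not.
print(eccs_fulfils_safety_function([True, True, False, False]))   # True
print(eccs_fulfils_safety_function([False, True, False, False]))  # False
```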
1.2. Diversity
1.5. Automation
Some other features listed below do not refer to the hardware component
or system design, but to the (design of the) context or environment in
which these components operate. It is well known that the failure
behaviour of a component is not only a matter of its hardware design,
but also depends on other conditions such as the quality of preventive
maintenance, efficient and easy fault detection, efficient test and
operation procedures, etc.
2. EXAMPLES
state must be maintained with the help of both the steam relief function
and the steam generator feed function. The latter function is
established by 2 diverse systems (see Fig. 1):
- the startup and shutdown system with 2x100% pumps, connected to 2 of
the 4 emergency power bus bars (grid D1),
- the emergency feed (EF) system with 4x100% pumps.
Figure 1.
- the same routine functional test procedure for each subsystem,
- staggered testing (one subsystem per week),
- time trends in the operational behaviour of some important components
during tests are recorded and evaluated.
2.2. Reactor Protection (RP) System Design Requirements
According to the set of rules which govern the design of the reactor
protection (RP) system (KTA 3501), it is required that the RP system has
to be designed, installed, and operated such that failure-causing events
within and outside the reactor as well as within the RP system itself
cannot prevent the initiation of necessary protective actions.
Generally 4 types of component unavailability are accounted for in
the design stage:
- systematic failure,
- random failure,
- command/cascade failure,
- unavailability due to repair.
The following potential failure-causing events have to be
considered within the RP system itself:
- random failures of components (modules) of the RP system, caused e. g.
by short circuit, interruption, short to ground, change in voltage or
frequency, mechanical failure, fire;
- systematic failures, i.e. several simultaneous (or sequential within a
short time) failures in subsystems of the RP system, originating from a
common cause in the RP system itself, e. g. a manufacturing or design
error, or drift.
With respect to systematic failures external to the RP system (but
internal to the plant), e. g. the following events have to be taken into
consideration:
3. CONCLUSIONS
References
1. KWU: Pressurized Water Reactor, Order No. K/10567-101, January 1982
2. Poucet, A., Amendola, A., and Cacciabue, P. C., "CCF-RBE: Common
Cause Failure Reliability Benchmark Exercise", EUR 11054 EN, 1987
MEASURES TAKEN AT DESIGN LEVEL TO COUNTER COMMON CAUSE FAILURES.
A FEW COMMENTS CONCERNING THE APPROACH OF EDF.
T. MESLIN
EDF/SPT
3, rue de Messine
75384 PARIS Cedex 08
FRANCE
CONTENTS
1. GENERAL COMMENTS
4. CONCLUSION
A. Amendola (ed.).
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 107-111.
1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
1. GENERAL COMMENTS
. redundancy,
. diversity,
. the fail-safe principle,
. channel separation,
. maintenance policy,
. automation.
The 1300 MWe PWR unit control and instrumentation consists entirely of
electronic components and microprocessors.
SPIN, the reactor protection system, provides the scram function
and manages safeguard-related actions. It has four independent chan-
nels. It embodies the fail-safe and separation principles (physical,
geographical, etc.).
The two systems are the subject of thorough experience feedback
processing. The details of each electronic card fault are entered in
a record at the site which is processed by computer at headquarters
with the assistance of the vendors. With this procedure, it is already
possible to calculate numerous parameters relating to the electronic
components, but for the time being, no design related common cause
failure has been found in this new equipment.
The design of the safety injection system of the 1300 MWe units has
been the subject of a probabilistic analysis, on completion of which
preference was given to a conventional two-stream arrangement rather
than a system with threefold redundancy. This was because multiplying
the number of identical streams does not appear to be an adequate mea-
sure against the risk of common cause failures. On the other hand, the
chosen safety injection arrangement provides diversity resulting from
the following three functions:
This engineer is called to the control room as soon as an event occurs
which could compromise the safety of the installation. The operators
are in charge of actual running of the unit and apply the appropriate
procedures.
The safety and radioprotection engineer takes responsibility for
the operation of the installations in accident situations or in any
events which are not covered by the procedures. In such cases he ap-
plies the "state based" procedure SPI, and his intervention provides a
redundant human analysis capability based on a method and resources
which are independent of the shift team.
This setup is supplemented, if the situation requires, by the
creation of independent policy making and analysis teams at local and
national levels.
The French safety authorities have required EDF to study the situations
created by the total loss of the redundant systems, in particular the
electrical power supplies, the steam generator feedwater and the heat
sink.
As a result, EDF has developed supplementary means of countering
such accidents, based on suitable procedures and diversified comple-
mentary devices.
For example, as concerns loss of the electrical power supplies,
each unit is equipped with a turbine generator which can re-supply in-
jection at the primary pump seals and the vital control and instrumen-
tation components; in addition, each unit can be re-supplied within a
very short lapse of time by a site gas turbine with the same capacity
as a diesel generator.
In the presence of the supplementary means, and with the corres-
ponding implementation instructions, it has been demonstrated that
none of the design limit situations allowed for represents a core
dry-out risk which exceeds 10⁻⁷ per reactor-year.
3. COMMON CAUSE FAILURES-SUMMARY
4. CONCLUSION
ANALYSIS PROCEDURES FOR IDENTIFICATION OF MRF's
Humphreys
National Centre of Systems Reliability
UKAEA
Wigshaw Lane
Culcheth
Warrington
WA3 4NE
1. SCOPE OF THE LECTURE
A. Amendola (ed.),
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 113-129.
1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
Cascade Failures (CF): these are propagating failures.
2. BENEFITS OF ASSESSMENT
presence or absence of defences against each category of the scheme,
and by that means performs a qualitative analysis of the plant.
An event classification scheme provides a means of organising data
on DF events into a structural form. One such scheme is that
originally developed by LATA (3).
The LATA classification scheme is much more than a classification
by cause. The scheme proposed in (1)(2) is only a causal
classification. However, there would appear to be no great difficulty
in cross-referencing these classifications of causal mechanisms.
The other information, besides that on causes, stored in the LATA
scheme makes it applicable to modelling requirements.
The heart of the scheme is the universe of cause-effect logic
units shown in Figure 1. Classification of events at this fundamental
level is a significant step forward in clearing the mists of ambiguity
that shroud the topic of dependent failures. It is now possible on
this basis to ensure a comparison of like with like. Furthermore, the
analyst can state, in terms of pinpointing certain cause-effect logic
units, exactly the type of event(s) included in the computation of
numerators for beta factors etc. This should help to clarify the
situation in which different analysts can produce different beta
factors from the same raw event data simply because different types of
event were considered candidates for the numerator in each case. For
example, the cleavage between root-caused events and component-caused
events is an essential distinction from the point of view of modelling.
Component-caused events are functional dependencies which ought to be
explicitly modelled in the structure itself of a properly constructed
fault tree. The parametric analysis (eg, beta factor) is designed to
mop up the root-caused events. Thus, while it is important to recognise
all the types in order to distinguish carefully between them, selection
must be employed in the use of this information for parametric
modelling. Therefore an analyst must state clearly that events of
types corresponding to, eg, units 5, 6 and 7 of the cause-effect
universe have been used in the computation of beta.
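The numerator-selection issue can be made concrete with a small counting sketch. The event record and function name below are invented for illustration; the estimator shown is the simple "fraction of component failures occurring in selected common cause events" form, not the LATA scheme's own procedure.

```python
def beta_estimate(events):
    """Estimate a simple beta factor from classified failure events.

    `events` is a list of (n_failed, selected) pairs: n_failed is the
    number of components failed in the event, and `selected` marks whether
    the event's cause-effect unit type was chosen as a numerator candidate.
    beta = failures in selected multiple-failure events / all failures.
    """
    total_failures = sum(n for n, _ in events)
    ccf_failures = sum(n for n, selected in events if selected and n > 1)
    return ccf_failures / total_failures

# Hypothetical record: 16 single failures, two selected double failures,
# one excluded double (e.g. a component-caused event modelled explicitly
# in the fault tree rather than swept into the parametric analysis).
events = [(1, True)] * 16 + [(2, True), (2, True), (2, False)]
print(beta_estimate(events))  # 4/22: a different selection gives a different beta
```

Including or excluding the third double failure changes the numerator from 4 to 6, which is exactly how two analysts can derive different beta factors from the same raw event data.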
The classification also makes important distinctions between
events involving failures and events involving functional
unavailabilities, and also between actual and potential failure events.
The originators of this scheme caution against the abuse of the
potential failure category; it must not be used for all manner of
hypothetical or speculative events, but reserved for a more limited
category of events, eg, incipient failures. Within a subjective
assessment some weighting can be given to potential failures if a
mechanism capable of causing multiple failures is judged to be present.
With suitable discretion on the part of the analyst,
this can help alleviate the problem of data scarcity. The
classification makes no distinction between multiple failures in a
single channel and failures in different (redundant) channels. Again
this is useful in terms of increasing the database. The analyst needs
however to exercise care from the point of view of defences. A
dependency between, say, valves in a single channel could in principle
apply to valves in different channels unless additional defences have
Figure 1. The universe of cause-effect logic units: root-caused and component-caused events, classified by state (failure F, functional unavailability U, mixed F/U), including potential-event (.PX) variants.
been utilised between channels. If the latter were the case, the event
data would not strictly be applicable.
An example of the worksheet associated with the scheme is shown in
Figure 2.
A root cause classification scheme is effectively a definition of
the structure of the cause code scheme of column 3 of Figure 2. One of
the most comprehensive schemes in use today is that developed with
support from the United States Nuclear Regulatory Commission, and
reported in detail in ref 4. The scope of that scheme is too extensive
to report adequately here, and the reader is advised to consult the
source document, ref 4.
We would conclude that although the classification schemes contain
imperfections and limitations, it is preferable to attempt to classify
DF data in accordance with a structured scheme, from which some attempt
can be made to derive parameter estimates for dependent failure
modelling.
Figure 2. Data worksheet (Part 1: component, failure-state and cause-code columns; Part 2: event identification data). The worked example records an event in which accidental flooding leads to steam generator low level.
(Table: CMF causes classification and partial beta factors for CMF defences, covering design control, design review, functional diversity, equipment diversity, fail-safe design, operational interfaces, protection and segregation, redundancy and voting, proven design and standardisation, simplicity of operation, construction control, testing and commissioning, inspection, construction standards, operational control, reliability monitoring, maintenance, proof testing and operations.)
There are a number of codes available to the analyst. Within SRD,
the computer codes employed to support reliability analysis are most
notably PREP-KITT and ALMONA, employed for fault tree analysis, while
SAMPLE is utilised as part of sensitivity analysis. Although a number
of programs are available to perform the same tasks, SRD has
standardised on the use of the PREP-KITT and ALMONA codes and has
produced a user-friendly environment in which to operate them. This
environment provides for graphical input and editing of a fault tree by
computer display, in-built routines for data entry and checking,
calculation of cut set information, and a formatted print-out of the
trees. This environment eases the workload of the assessor, making it
possible to perform rapid evaluations of changes to the tree structure
and basic event data.
[Figure: screening flowchart. Develop system logic; determine
dependency categories; identify components affected; then successive
decisions: no dependency / potential dependency; potential dependency
not significant / potentially significant dependency; dependency not
significant / significant dependency, quantify.]
Such a framework is recommended since it initially considers all
identified potential dependencies. Those that have no significant
influence on system reliability can then be discarded by reference to
the fault tree cutsets.
Thus quantification, after a detailed examination of specific
defences, is performed only for the potentially significant
dependencies which remain.
The framework is now considered in greater detail.
checklist. For example, 'manufacturer' is one attribute which should
always be present in the checklist. Different manufacturers can be
designated A, B, C and so on.
Thus each basic event would be labelled either A, B or C ... .
Then identical labels on two components indicate that the attribute
or environment in question (in this case manufacturer) is the same for
those two components. Hazard assessment (internal plant hazards
mainly, since separate explicit methods are currently in vogue for
external hazard assessment) can also be incorporated as part of the
checklist analysis. Where hazards are not linked to particular items
of plant, they are postulated to occur in a location without attempting
to identify a specific source. Within the zonal schemes for each such
hazard (e.g. fire, humidity, ...), the affected zone(s) are determined
by the identification of barriers to the propagation of the hazard from
one zone to adjacent zones. Barriers might typically be fire doors and
fire resistant walls. Components can be assigned labels as before,
identifying which zone they are in with respect to a particular hazard.
Two components sharing the same label indicates that they share a
common environment, i.e. they are in the same zone for a particular
hazard and thus are susceptible to a potential dependency.
Defences against the occurrence of a general hazard in any zone need
only be considered when it has been found that a potentially
significant dependency exists. Such a general treatment is not
appropriate to many hazards which are caused by specific sources.
These may be the failure of a plant item, such as water escaping from a
breach, or a dropped load. In this case, possible sources may be
identified and the immediate area defined as the primary zone.
Secondary zones can be found by considering the progression of the
hazard to other areas.
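The zonal labelling described above can be sketched in a few lines; the component names, hazards and zone labels below are hypothetical examples, not taken from any study:

```python
# Hedged sketch of the checklist/zonal labelling: each component is
# given a zone label per hazard, and two components carrying the same
# label share a common environment for that hazard.
zone_of = {
    "fire": {"pump1": "Z1", "pump2": "Z1", "valve1": "Z2"},
    "humidity": {"pump1": "Z3", "pump2": "Z4", "valve1": "Z4"},
}

def shared_environment_pairs(labels):
    """Pairs of components with the same zone label, i.e. candidates
    for a potential dependency with respect to this hazard."""
    comps = sorted(labels)
    return [(a, b) for i, a in enumerate(comps) for b in comps[i + 1:]
            if labels[a] == labels[b]]

for hazard, labels in zone_of.items():
    print(hazard, shared_environment_pairs(labels))
```

The same routine serves for any checklist attribute (manufacturer, test team, ...) by substituting the appropriate label dictionary.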
3.5.4 Check whether any cutset of the fault tree contains two or more
affected components. In principle the fault tree cutsets are examined
one by one. In practice, this either requires the use of a computer
code to search through the minimal cutset listing for cases where basic
events within the same cutset share a common attribute or environment,
or it may be possible to capitalise on the modular nature of the fault
tree. A matrix method has been used at SRD to greatly facilitate the
visual inspection of potential dependency routes. This method takes
advantage of the structure of fault trees which are composed of
identical modules.
For each attribute or environment a matrix is drawn up. The rows
of each matrix are the different modules comprising the overall
multi-channel system, while the columns are the analogous components in
each module. The matrix is then filled in with the appropriate labels.
Table 1 below is an example taken from a 4-train system featuring
identical redundancy. The components are denoted by numbers; the
numbers relevant to module 1 being representative of their identical
counterparts in other modules.
TABLE 1
COMMON MANUFACTURER
7 8 9 10 11 12 13 17 21 23 14 16 18 19 20
module 1 A A C E F G H M M A E J
module 2 A A C E F G H M M A E J
module 3 A A C E F G H M M A E J
module 4 A A C E F G H M M A E J
In the study from which this table is cited, the components shown
were in fact first order cutsets in each module. Thus all possible
combinations of four components, one from each row, yield fourth order
system cutsets. In conjunction with other matrices, this matrix showed
that fourth order cutsets containing identical equipment
(principally the single columns) could reasonably be expected to
dominate the dependent failure analysis on the basis of sharing common
attributes. Thus in this instance it was not expedient to screen the
entire minimal cutset listing on a cutset-by-cutset basis. In the
example above entire cutsets shared a common attribute; this has been
designated a first rank potential dependency. A second rank potential
dependency would be, for example, where three components shared a common
attribute within a 4th order cutset. Third rank potential dependencies
can be similarly defined, and so on. In general these should not be
discarded without reasons being given for so doing.
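The ranking just described can be sketched as a small routine; the component names and manufacturer labels are hypothetical, and the rank convention (1 when all k components of a cutset share an attribute value, 2 when k-1 share it, and so on) follows the text:

```python
from collections import Counter

# Hypothetical manufacturer labels for four components of one module.
manufacturer = {"P7": "A", "P8": "A", "V9": "C", "V10": "E"}

def dependency_rank(cutset, labels):
    """Rank of the potential dependency in a cutset: 1 when all k
    components share an attribute value, 2 when k-1 share it, etc.
    Returns None when no two components share the attribute."""
    counts = Counter(labels[c] for c in cutset)
    j = max(counts.values())      # largest group with a common value
    if j < 2:
        return None
    return len(cutset) - j + 1

print(dependency_rank(["P7", "P8"], manufacturer))        # first rank
print(dependency_rank(["P7", "P8", "V9"], manufacturer))  # second rank
```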
following list of minimal cutsets: {(A), (B)}, {(C)}, {(AB)}, {(AC)},
{(BC)}, {(ABC)}. In a cutset-based approach it is not immediately
clear that the fourth, fifth and sixth items of the latter list would
be correctly quantified. Their effective inclusion would depend on the
data used to quantify (C) and (AB). Ideally (ABC) would be covered in
accounting parametrically for all possible dependencies involving A and
B in the database. It is argued that with the configuration under
consideration, where components A and B are in parallel and that
combination is in series with component C, the contribution from
cutsets AC and BC is irrelevant since C and AB form a series system.
The contribution from AC and BC should however be considered if they
enhance the failure rate of C. Dependent failures concerning common
supports such as C ought to be modelled, but such dependent effects
will normally be dominated by independent failures of first order
minimal cutsets in, say, a beta factor treatment. Thus the effect of
dependent failure logic on the logic structure itself is conceded, but
it is claimed that a cutset-based approach, when care has been taken in
fault tree construction not to incorporate any pre-suppositions of
independence, will serve as a good approximation. The approximation of
this approach is to limit the search for potential dependencies to
those which are significant in terms of the top event probability.
4. CONCLUSIONS
5. ACKNOWLEDGEMENTS
REFERENCES
8 NUREG/CR-1401. Atwood, C. L. 'Estimation for the BFR Common Cause
Model'.
DEPENDENT FAILURE MODELLING BY FAULT TREE TECHNIQUE
S. Contini
Commission of the European Communities
Institute for Systems Engineering
21020 Ispra (VA), Italy
1. INTRODUCTION
In PRA studies the fault tree model is widely applied for
systems analysis. The correct modelling of a system requires
consideration of all possible types of dependencies, in
order not to underestimate the accident occurrence
probabilities.
Some dependencies can straightforwardly be modelled during
the fault tree construction phase (i.e. functional
dependencies, cascade failures, command failures, some
technical specifications); for others (i.e. generic common
causes of failure) the application of complementary
procedures is generally needed. The CCF analysis can be
performed in different ways, depending on the aim of the
study, the availability of data, tools, etc.
Often, the fault tree construction is preceded by a
Failure Mode and Effects Analysis (FMEA) study. Proper
modules are also used to document all potential causes of
common failure arising both internally and externally to
the plant /1/.
The result of the fault tree analysis is expressed by a
list of failure combinations (Minimal Cut Sets, MCSs) to
which the quantification procedure is applied. These
results allow the analyst to identify the critical points
of the system, and give him guidance for the definition of
suitable design changes to improve the characteristics of
the system.
In this paper the main approaches for system modelling and
analysis by fault tree are briefly presented. More
attention is focused on the techniques for qualitative CCF
analysis, since those for quantitative analysis are fully
described in other papers of this volume.
2. SYSTEM ANALYSIS
Generally, during the fault tree construction phase, at
component level, the analyst has to model the following
types of failures:
a) Primary failures: describe the intrinsic failures
caused by the random performance of the component; its
inputs are supposed to be within the design envelope;
repair is required to return the component to the working
state;
b) Command faults: describe the situations in which the
component is unable to perform its intended function
because of the lack of proper input, or because of spurious
input signals that change the component state without
damaging it; repair is not required to return the
component to the working state;
c) Secondary failures: describe the component failures
due to excessive stresses caused by external circumstances,
e.g. the failure of another component, abnormal
environmental conditions, human errors in operation, or
caused by human errors in design, construction, etc.
In order to return the component to the working state,
repair is necessary.
The types of dependencies among components due to
technical specifications can be described in the fault
tree model, even if they may lead to non-coherent logical
structures.
Consider, as an example, a two train stand-by redundant
system subject to periodic testing; suppose that only one
train at a time can be under test. The "natural" modelling
requires the use of the NOT operator, as shown in /2/.
The analysis of a non-coherent structure function requires
the use of a fault tree analysis code able to deal with
negated events: this is a feature that not all computer
codes have.
Technical specifications can also be modelled as coherent
structures; in these cases, however, as many fault trees
as the number of different system configurations are to be
developed (i.e. both trains on-line, train T1 on line and
T2 under test, and vice versa).
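The "natural" non-coherent modelling can be illustrated by enumerating states; the encoding below (a train is unavailable if failed or under test, and the NOT operator excludes the state with both trains under test) is an illustrative assumption, not the formulation of ref. /2/:

```python
from itertools import product

# Hedged sketch of the non-coherent structure function for a two-train
# stand-by system: the technical specification forbidding simultaneous
# testing of both trains is modelled with a NOT operator.
def top(f1, t1, f2, t2):
    unavailable1 = f1 or t1            # train 1 failed or under test
    unavailable2 = f2 or t2            # train 2 failed or under test
    tech_spec_ok = not (t1 and t2)     # NOT operator in the tree
    return unavailable1 and unavailable2 and tech_spec_ok

# Enumerate the 16 states; no retained failure state may have both
# trains under test at the same time.
states = [s for s in product([False, True], repeat=4) if top(*s)]
assert all(not (t1 and t2) for _, t1, _, t2 in states)
```

Real fault tree codes manipulate the negated events symbolically rather than by state enumeration; the enumeration here only serves to show the non-monotone (non-coherent) behaviour introduced by the NOT.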
The modelling and quantification of multiple failures due
to a single cause (not arising from a component failure),
referred to as "common cause failures", have received wide
attention since the early seventies due to their practical
relevance.
Common causes of failure are "introduced" throughout all
the development phases of the plant: from design to
operation. The identification of CCFs represents the most
important step of the analysis. In order to make the
identification phase as systematic as possible, it is
necessary to make use of a check list of sources of common
causes of failure, in order to systematically select those
that might produce critical effects on the system being
analysed. The existing classification systems give
guidance to the analyst in setting up the system oriented
check list /3/.
A common cause failure analysis can be performed
quantitatively, provided that event histories are
available; in any case, even a simple qualitative analysis
allows the analyst to gather useful information on the
existence of potential dependencies and, consequently, to
define the actions to be taken either to eliminate or to
reduce the degree of dependency.
The main steps of the analysis procedure can be summarised
as follows.
a) Model, in the fault tree at component level, the
information on CCFs.
b) Determine the MCSs. Some of them contain events
representing failures due to common causes. If data are
available, perform the quantitative analysis.
c) Examine an MCS to check whether it can be verified (or
strongly affected) by the CCF events.
d) If the effect of at least one CCF is considered to be
important, step e) is performed; otherwise step c) is
repeated for another MCS.
e) From a list of preventive measures, those which can
eliminate or significantly reduce the effects of the
CCFs are considered. If there are other MCSs, step c) is
repeated for a new failure combination.
Different methods have been developed for implementing
steps a) and b). A brief description of them is given in
the next sections.
2.1 Common cause failure modelling in a fault tree
Three approaches to system modelling are listed: all are
based on the representation of possible common cause
dependencies in the fault tree at component level.
In ref. /4/ the CCFs are explicitly represented in the
tree as primary events. As an example, let us consider the
fault tree of a system formed from three components A, B
and C (see Fig. 1). Suppose that these components are
respectively sensitive to the following secondary causes:
A <==> [C1, C2, C3]
B <==> [C1, C2]
C <==> [C1, C3]
For instance, C1 could represent a given manufacturer, C2
the abnormal environmental temperature and C3 the physical
location.
These new causes of failure are explicitly represented in
the fault tree as shown in Fig. 2. Each component is
therefore represented by a subtree describing the
disjunction of the random failure and the other causes of
common failure. The complexity of the fault tree depends
on the number of common causes that are considered (i.e.
the extent of the analysis): a screening procedure can be
applied to reduce it.
Another way of representing the set of dependencies is to
associate each primary event with a vector of attributes
[C1, C2, ...] representing the common causes of failure
to which it is sensitive /5/. Fig. 3 shows the fault
tree for the example of the three component system. The
basic fault tree does not change, since the information on
CCFs is separately associated with the primary events.
A third way of modelling, mainly developed for
quantification /6/, is represented in Fig. 4.
This modelling is similar to the first one; the difference
is due to the different meaning associated with the CCF
events. Indeed, in this approach the common cause failure
events represent the generic dependence of a group of
components. For instance, in Fig. 4, in the subtree for
component A, C-ABC represents the set of generic
causes that lead to the failure of all components, and
C-AB, C-AC the sets of generic causes that lead
respectively to the failure of A and B, and of A and C.
A screening procedure is applied to identify the groups of
components that may be dependent (i.e. pumps of the same
type, same manufacturer, installed in the same room, etc.).
As with the first approach, this type of modelling
increases the fault-tree complexity, but it has the great
advantage of completeness, since all possible combinations
of components (i.e. all dependent, all independent, some
dependent and some independent) are included.
An idea of the complexity of the fault tree can be
obtained by looking at the number of primary events in a
parallel system of n dependent components. In case of
independent failures only, the fault tree would have n
primary events; when dependencies are incorporated the
number of primary events is given by:

    Ne = Σ_{i=1,...,n} (n choose i) = 2^n - 1
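The count can be checked numerically; this sketch assumes one basic event per non-empty subset of the n components, as in the group-dependence modelling above:

```python
from math import comb

# Numerical check of the event count: a parallel system of n dependent
# components, with one basic event per non-empty subset of components,
# has sum over i = 1..n of C(n, i) = 2**n - 1 distinct primary events.
def n_events(n):
    return sum(comb(n, i) for i in range(1, n + 1))

for n in range(1, 8):
    assert n_events(n) == 2 ** n - 1   # e.g. n = 3 gives 7 events
```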
The first two ways of modelling CCFs can give useful
information in case of qualitative analysis only, since
the meaning of the CCF events is chosen by the analyst.
Generally, the identification of CCFs requires additional
information (e.g. the layout of the plant) and, for each
component, the list of the generic causes (e.g.
temperature, humidity, vibrations, ...) to which it is
sensitive.
These data are necessary to determine, for each common
cause event Ci, the domains in which it can have some
influence and the list of affected components. The domain
of a given common cause of failure Ci is the area,
delimited by walls, in which all components of the system
are subject to Ci. From the characteristics of these
components, those that are sensitive to Ci can easily be
selected.
For a given event Ci, the determination of its domains of
influence and the list of affected components can easily
be automated /7/.
Other data are needed, concerning information on
manufacturer, test/maintenance procedures, etc.
The determination of the MCSs can be performed by means
of any existing fault tree analysis code. The logical and
probabilistic cut-offs can also be applied to reduce the
computing time. The generic MCS contains any combination
of independent and common cause events. For the fault tree
of Fig. 1 the expression for the Top event takes the
following form:

    Top = A*B + A*C + C1 + C2 + C3    (1)

Obviously, MCSs in which a common cause affects only one
component are not considered.
If data for all primary events are available, the
quantitative analysis can be performed. In /4/ this is
performed by means of the Monte Carlo simulation
technique.
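The expansion leading to expression (1) can be reproduced with a small sketch; the encoding of each component subtree as a list of one-event cutsets is an illustrative choice, with the random failure named after the component itself:

```python
from itertools import product

# Sketch of the explicit CCF modelling (first approach): each component
# is the OR of its random failure (named after the component) and the
# common cause events it is sensitive to.
subtree = {
    "A": [{"A"}, {"C1"}, {"C2"}, {"C3"}],
    "B": [{"B"}, {"C1"}, {"C2"}],
    "C": [{"C"}, {"C1"}, {"C3"}],
}

def and_gate(x, y):
    """Cutsets of the AND of two expanded components."""
    return {frozenset(a | b) for a, b in product(subtree[x], subtree[y])}

def minimise(cutsets):
    """Remove non-minimal cutsets by absorption."""
    return {c for c in cutsets if not any(o < c for o in cutsets)}

# Fig. 1 fault tree: Top = A*B + A*C; minimisation reproduces
# expression (1): Top = A*B + A*C + C1 + C2 + C3.
mcs = minimise(and_gate("A", "B") | and_gate("A", "C"))
```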
The second method of modelling (as well as the previous
one) presents some interesting features, especially when
the quantification cannot be performed because of
unavailability of data. In these cases the analysis
addresses the identification of those aspects of the
system design that need to be reconsidered. In other words
the aim of the analysis is to verify whether the system
may fail, or whether it may suffer a relevant
degradation, as a consequence of the occurrence of a
single event. Engineering judgement substitutes for the
quantification.
The dimension of the fault tree does not increase, but
this does not mean that the effort needed for the analysis
is lower than in the other cases (explicit modelling).
The MCSs resulting from the analysis of the tree carry
some attributes representing the common causes to which
their elements are sensitive. For the example under
consideration the result would be:

    Top = A * B + A * C      <-- expression
          C1  C1  C1  C1
          C2  C2  C3  C3     <-- attributes

This result can obviously also be expressed in terms of
MCSs as shown in (1).
Attributes that are shared by all elements of the MCS are
referred to as "critical CCFs", since their occurrence may
directly lead the system to a failed state.
Those that are shared by j components out of k (k being the
order of the MCS) may cause a system degradation to an
extent which is a function of the number of the remaining
k-j events. These CCFs are referred to as "relevant CCFs"
of order w = k-j. Looking at the example, C1, C2 and C3
are all critical CCFs.
The result is a qualitative ranking of CCFs according to
their possible effects on the system. Therefore, for
critical CCFs and relevant CCFs of lower order (e.g. w =
1, w = 2), an investigation has to be done to verify
whether or not the system contains suitable defences.
It can be realised that the analysis of a fault tree with
attributes may often lead to the determination of all
MCSs. Therefore, when the tree becomes a little
complex, many difficulties may arise even in determining
the set of MCSs whose components share, either completely
or partially, at least one attribute.
This stems from the fact that the cut-off techniques
lose their efficacy, since their application is limited
to those subcombinations not containing any attribute
(i.e. independent failures). Generally, the larger the
number of attributes, the larger the number of components
with attributes, and therefore the higher the
computation time.
Batch programs have been developed for qualitative and
quantitative CCF analysis /7,8/.
Instead of analysing all CCFs in a single run it seems
more convenient to perform as many runs as the number of
attributes. This approach can easily be implemented with
an interactive program.
For a given attribute, the result of a single run is the
disjunction of only those MCSs in which at least two
elements share the attribute.
If a "partial logical cut-off w" is defined as the
number of events in an MCS without any attribute, it is
possible to further specify the type of MCSs to be
determined. For instance, w = 0 would imply the
determination of the list of MCSs in which all elements
share the attribute (i.e. critical CCFs); w = x (x > 0)
would lead to the list of MCSs having not more than x
events without the attribute.
Alternatively, it is possible to express the partial
logical cut-off in probabilistic terms simply by assigning
a probability equal to 1 to the events with the attribute
and a probability of 10^-1 to all the others. The analysis
of the tree with probabilistic cut-off 10^-w then gives all
MCSs with no more than w events without any attribute.
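The partial logical cut-off and its probabilistic variant can be sketched on the MCS list of expression (1); for brevity the sketch treats all common cause events as carrying the attribute under analysis, which is an illustrative simplification:

```python
# Sketch of the partial logical cut-off: keep only the MCSs with at
# most w events that do not carry the attribute under analysis.
mcss = [{"A", "B"}, {"A", "C"}, {"C1"}, {"C2"}, {"C3"}]
attributed = {"C1", "C2", "C3"}   # events carrying the attribute

def partial_cutoff(mcss, attributed, w):
    return [m for m in mcss if len(m - attributed) <= w]

critical = partial_cutoff(mcss, attributed, 0)   # all elements share it

# Probabilistic variant: p = 1 for attributed events, p = 0.1 for the
# others; an MCS probability of at least 10**-w selects the same sets.
def prob_cutoff(mcss, attributed, w):
    def p(m):
        prod = 1.0
        for e in m:
            prod *= 1.0 if e in attributed else 0.1
        return prod
    return [m for m in mcss if p(m) >= 10.0 ** (-w) - 1e-12]

assert prob_cutoff(mcss, attributed, 0) == critical
```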
A fast a priori check on the existence of critical CCFs
can also be performed through a simple analysis of the
tree structure, i.e. without determining the MCSs.
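One way to realise such a structural check is sketched below; the tree encoding and the criterion (the top event evaluated with only the attributed events set true) are assumptions for illustration, not the algorithm of ref. /5/:

```python
# Sketch of an a priori check: for a coherent tree, a critical CCF for
# a given attribute exists iff the top event is verified when only the
# primary events carrying the attribute are set to True.
tree = ("OR",
        ("AND", "A", "B"),
        ("AND", "A", "C"))
carries = {"A": {"C1", "C2", "C3"}, "B": {"C1", "C2"}, "C": {"C1", "C3"}}

def evaluate(node, true_events):
    if isinstance(node, str):                 # primary event
        return node in true_events
    op, *children = node
    results = [evaluate(c, true_events) for c in children]
    return all(results) if op == "AND" else any(results)

def has_critical_ccf(tree, attribute):
    on = {e for e, attrs in carries.items() if attribute in attrs}
    return evaluate(tree, on)

assert has_critical_ccf(tree, "C2") is True   # A and B both carry C2
```

Consistently with the example, C1, C2 and C3 are all flagged as critical, and the check needs only one traversal of the tree per attribute.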
Algorithms have been developed to analyse fault trees with
attributes /5/; they are based on a "pruning procedure"
that produces, for a given attribute, a reduced tree
whose analysis gives only the requested MCSs. In other
words, from the original tree all branches that cannot
give the requested MCSs are removed, in order to simplify
the tree to be analysed. The main steps of these
algorithms, taken from /5/, are described in the appendix.
Finally, the determination of the MCSs for the fault tree
with CCFs explicitly modelled according to the third
method can be performed by means of standard fault tree
analysis codes.
However, due to the complexity of the resulting fault
tree, some procedures have been proposed to reduce the
analysis effort needed. A complete description of these
methods (approximated techniques) can be found in /6/.
One of these techniques consists in the analysis of the
tree in two stages. At first the MCSs are determined at
the component failure mode level (e.g. for the example of
Fig. 4 the result would be A*B + A*C), and then the
different causes of failure of each component are
expanded. This expansion can, in turn, be performed in
different steps by progressively considering the various
types of common cause events, according to the following
priority, as suggested by experience:
3. CONCLUSIONS
The analysis of a complex system must necessarily
consider the possible types of dependencies, in order not
to underestimate the top event probability. The CCF
modelling can be performed in different ways depending on
the aim of the analysis. When data are available, the
third method described can be applied for quantification.
However, in many practical situations only qualitative
results can be obtained, and in these cases the other two
approaches allow the analyst to identify those parts of
the system that may require further consideration by the
designer. To this aim an interactive analysis procedure is
proposed and two fundamental algorithms are described.
[Figures 1-4: fault trees for the three-component example - the basic
tree (Fig. 1), the tree with the common cause events explicitly
represented (Fig. 2), the tree with attribute vectors associated with
the primary events (Fig. 3), and the tree with common cause basic
events for groups of components (Fig. 4). The original artwork is not
reproducible.]
4. REFERENCES
/1/ A. Poucet, "Experience and Results of the CCF-RBE", this volume.
/2/ A. Amendola, C. Clarotti, S. Contini, F. Spizzichino, "Analysis of
    Complete Logical Structure in System Reliability Assessment",
    EUR 6886, 1981.
/3/ P. Humphreys, "Analysis Procedures for Identification of MRF",
    this volume.
/4/ A. Blin et al., "Patrec, a Computer Code for Fault-Tree
    Calculation", in Synthesis and Analysis Methods for Safety and
    Reliability Studies, Proceedings of NATO ASI, edited by
    G. Apostolakis, S. Garribba and G. Volta, 1980.
/5/ S. Contini, "Algorithms for Common Cause Analysis. A Preliminary
    Report", PER 106.01.81.16, JRC Ispra, 1981.
/6/ K.N. Fleming, N.O. Siu, D.C. Bley, M. Kazarians, "PRA Procedures
    for Dependent Events Analysis", Vols. 1-2, EPRI PLG-0453, 1985.
/7/ D.M. Rasmuson, N.H. Marshall, J.R. Wilson, G.R. Burdick,
    "COMCAN II-A. A Computer Program for Automated Common Cause
    Failure Analysis", TREE-1361, 1979.
/8/ C.D. Heising, D.M. Luciani, "Application of a Computerized
    Methodology for Performing Common Cause Failure Analysis: The
    Mocus Bacfire Beta Factor (MOBB) Code", Reliability Engineering,
    Vol. 17, No. 3, 1987.
/9/ K.N. Fleming, "Parametric Models for Common Cause Failure
    Analysis", this volume.
APPENDIX
The analysis of the sensitivity of a system to common
causes of failure can advantageously be performed
interactively by applying the algorithms described below.
The first algorithm can be applied to determine the
critical CCF events and the MCSs involved. The second
algorithm also allows the determination of the set of
relevant CCFs. For the sake of simplicity these procedures
are described for a single common cause, the application
to a set of common causes being straightforward.
Both algorithms require, as input data, the fault tree and
the attributes associated with the primary events.
A2. Determination of the MCSs affected by a relevant CCF
TREATMENT OF MULTIPLE RELATED FAILURES BY MARKOV METHOD
J. CANTARELLA
Université Libre de Bruxelles, Service de Métrologie Nucléaire
50, avenue F.D. Roosevelt
1050 Brussels
Belgium
1. Introduction
2.1. Markovian assumption

    P(n) = {p_ij(n)}        (transition matrix)
    π(n) = {π_i(n)}         (state probability vector)

where π_i(n) is the probability of being in state s_i at
time t_n. The evolution equations are, in discrete and
continuous time respectively:

    π(n) = π(n-1)·P(n-1)
    dπ(t)/dt = π(t)·P
It can be shown (and it also appears quite intuitive) that the
Markovian assumption corresponds to exponentially distributed failure
and repair laws.
Failure and repair rates can depend on calendar time (non-
homogeneous process) or be constant (homogeneous process).
To illustrate the evolution equation, we finally present here
the most trivial example, the one of a system including one single
repairable unit.
Example
With failure rate λ and repair rate μ, the "transition diagram"
between the working state (1) and the failed state (0) gives the
set of equations:
    dπ1/dt = -λ·π1 + μ·π0
    dπ0/dt =  λ·π1 - μ·π0
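As a numerical sketch of the single repairable unit example (the rates λ and μ below are arbitrary illustrative values), the evolution equations can be integrated and compared with the well-known analytic unavailability λ/(λ+μ)·(1 - exp(-(λ+μ)t)):

```python
import math

# Euler integration of the two-state Markov model of a single
# repairable unit, checked against the analytic unavailability.
lam, mu = 1e-3, 1e-1          # failure and repair rates (per hour)
p1, p0 = 1.0, 0.0             # start in the working state
dt, t_end = 0.01, 100.0
for _ in range(int(t_end / dt)):
    d_p1 = -lam * p1 + mu * p0
    d_p0 = lam * p1 - mu * p0
    p1 += d_p1 * dt
    p0 += d_p0 * dt

analytic = lam / (lam + mu) * (1.0 - math.exp(-(lam + mu) * t_end))
assert abs(p0 - analytic) < 1e-5
```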
The starting point of our work has been the Sstagen-Mmarela code
(Superstate Generator - Markovian Reliability Analysis) developed by
I. PAPAZOGLOU and E. GYFTOPOULOS at Brookhaven [1].
Its main features are:
an automated construction procedure of the transition matrix
a block-storage of this matrix (as we will see later on, we
have to deal with very sparse matrices)
an automated merging procedure of Markov processes of systems
exhibiting symmetries.
In spite of this last feature, there is still a strong need to
reduce memory requirements and calculation times, as we will see in
the next section.
3. Advantages/Impediments To Markov Modelling
3.1. Impediments
3.2. Advantages
-, if component 2 is operating
but
[Figure: unavailability on a logarithmic scale (10^-2 down to 10^-10)
versus time in hours, up to about 8.0 x 10^3 hours.]
Without entering into details about the results and conclusions of
this quite extended study we performed, let us just note that it
provided insights on:
the coherency between the limiting conditions of operation
respectively applied to level 1 and level 2 (bunker) protection
systems
the influence of testing policies
the influence of considering common cause failures upon choices
of testing and maintenance policies.
upon the state of subsystem 1, i.e. Pr(S2/S1) = Pr(S2/S1'). Using the
following Boolean decomposition, we are able to quantify conditional
probabilities, as presented here:
    S2 = S2·(X1 + X1')
and
    Pr(S2/S1) = Pr(S2/X1)·Pr(X1/S1) + Pr(S2/X1')·Pr(X1'/S1)
    Pr(X1·S1) = π11(t)
[Figure: structure of the AFWS feeding the steam generators.
Components: electric network 2; motor pumps 3, 5; turbopump 4; motor
valve 7; diesels 1, 6, 8. The original system is compared with its
reduction by the supercomponent technique.]
    dπ/dt = π·P ,   with   P = P1 ⊕ P2   (Kronecker sum)
    dπ1/dt = π1·P1   and   dπ2/dt = π2·P2
defining
    π = π1 ⊗ π2
Most of our calculation tests were performed using this
interpolation approach.
iterations do or do not significantly modify results (values of π);
if not, a fine grid iteration is injected.
Tests can include several criteria, logically combined, the
choice being a function of the user's wishes (putting stress on
unavailability calculation, carrying out a cheap (fast) calculation,
or a costly but precise one, etc.).
7. Some Conclusions
Concerning the multigrid approach, at this stage, we are able to
treat more than a thousand microstates.
References
PARAMETRIC MODELS FOR COMMON CAUSE
FAILURE ANALYSIS
K. N. FLEMING
Pickard, Lowe and Garrick, Inc.
2260 University Drive
Newport Beach, CA 92660 USA
ABSTRACT. This paper provides a brief presentation of several of the most frequently used
parametric models for common cause failure analysis. The concept of common cause basic
event is introduced and used to establish relations between various parametric models.
Generalized formulas to estimate the frequencies of common cause basic events are
provided for each parametric model.
1. Introduction
Parametric models have been the primary means for modeling and quantifying an
important class of dependent events called common cause failures (CCF). The use
of parametric models, however, has been clouded by some commonly held myths.
For example, it is often heard that parametric CCF models:
Feedback loop between CCF experience data and system reliability models.
Means of identifying and evaluating defensive measures to enhance reliability
performance.
In fact, the parametric models have been quite successful in all of these areas, as
witnessed by the past and current work in probabilistic risk assessment of nuclear
power plants.
The objective of this paper is to provide a brief overview of the basic
characteristics of several of the most commonly used parametric models. A more
detailed discussion of these models can be found in Reference 1. The reader is
also referred to another paper in this volume (Reference 2) for issues regarding
data analysis and estimation of the parameters of these models. In order to present
the models and describe how they are related, it is useful to introduce the concept
of common cause basic events. This is the subject of Section 2. The parametric
models are then described in Section 3.
with the following minimal cutsets:
The reduced Boolean representation of the system failure in terms of the above
minimal cutsets of the component-level fault tree is

    S = A·B + A·C + B·C    (1)

The expansion of this component-level Boolean expression down to the common
cause impact level can be illustrated by representing each component-level basic
event as a subtree, such as that shown below, in which it is assumed that common
cause failures can lead to either two or three components failing simultaneously.

    A_T = A_I + C_AB + C_AC + C_ABC    (2)

where
    C_AB = failure of components A and B (and not component C) from
           common causes.
When all the components of our two-out-of-three example system are expanded similarly, the following minimal cutsets are obtained:

{A_I B_I} ; {A_I C_I} ; {B_I C_I}
{C_AB} ; {C_AC} ; {C_BC}
{C_ABC}

S = A_I B_I + A_I C_I + B_I C_I + C_AB + C_AC + C_BC + C_ABC     (3)
Had the success criterion for this example been one out of three instead of two out of three, it is clear that a substitution of subtrees, like those shown above, into the system fault tree would produce cutsets of the type C_AB C_AC. These cutsets have questionable validity unless the events C_AB and C_AC are defined more precisely. One option is to define the events C_AB and C_AC to be mutually exclusive. Then, the Boolean expression in Equation (2) would represent a partition of the failure space of A into mutually exclusive parts based on the impact on other components in the common cause component group of the underlying set of causes. This would imply that the probabilities of cutsets like C_AB C_AC are identically zero. An alternative option is to construct the events C_AB, C_AC, and C_ABC as sums of contributions from specific root causes so that, for example,

C_AB = Σ_i C_AB(i)

where C_AB(i) denotes the failure of components A and B (and not C) from root cause i. The events in this
picture are neither mutually exclusive nor exactly independent, and the probability of C_AB C_AC cannot be calculated directly without using the decomposition into cause contributions.

It will be seen later that the causes are considered in classifying events in terms of their impact on components. If, in this process, events that could have been identified as C_AB C_AC are classified (as is most likely) as A_I C_BC, C_I C_AB, B_I C_AC, or C_ABC, then cutsets like C_AB C_AC should be eliminated to avoid double counting. Such a counting process then makes this option equivalent to the previous, mutually exclusive definition of the events. It is clear that the definition of the events, the counting process by which event reports are classified, and the way the results are used to estimate the parameters of common cause models are closely intertwined.
Although complete agreement has not been reached on the most appropriate definition of these events, it fortunately does not make a significant numerical difference to the results because, in general, the contribution of cutsets like C_AB C_AC is considerably smaller than that of cutsets like C_ABC.
The primary objective of this step is to select the common cause model that will be
used in the quantification of the common cause basic events. The cutset Boolean
equation is transformed so that the probabilities of the basic events can be
substituted directly into the resulting algebraic expression.
For example, in the three-component example system of the previous section, the algebraic equivalent of Equation (3) in terms of the probabilities of the basic events, using the rare events approximation, is

P(S) = P(A_I)P(B_I) + P(A_I)P(C_I) + P(B_I)P(C_I) + P(C_AB) + P(C_AC) + P(C_BC) + P(C_ABC)     (4)
It is a common practice in risk and reliability analysis to assume that the
probabilities of similar events involving similar types of components are the same.
This approach takes advantage of the physical symmetries associated with
identically redundant components in reducing the number of parameters that need
to be quantified. For example, in Equation (4) it is assumed that

P(A_I) = P(B_I) = P(C_I) = Q_1
P(C_AB) = P(C_AC) = P(C_BC) = Q_2     (5)
P(C_ABC) = Q_3
Note that the probability of failure of any given basic event within a common
cause component group depends only on the number and not on the specific
components in that basic event. This is called the symmetry assumption.
Continuing with our example, the system failure probability [Equation (4)] can be written as

Q_S = 3Q_1^2 + 3Q_2 + Q_3     (6)
Note that the total probability of failure of a specific component can be obtained from the Q_k's. This can be seen, for example, from Equation (2), where the failure of component A due to all causes is expanded in terms of the basic events. Transforming Equation (2) into its equivalent probability model and using Q_1, Q_2, and Q_3, as defined in Equation (5), we get

Q_t = Q_1 + 2Q_2 + Q_3     (8)
The generalization of Equation (8) to a group of m similar components is

Q_t = Σ_{k=1}^{m} C(m-1, k-1) Q_k     (9)

where

C(m-1, k-1) = (m-1)! / [(m-k)!(k-1)!]     (10)

represents the number of different ways that a specific component can fail with (k-1) other components in a group of m similar components.
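The relation in Equations (9) and (10) can be sketched numerically in a few lines of Python; the function name and the Q_k values below are hypothetical, chosen only to illustrate the arithmetic:

```python
from math import comb

def total_failure_prob(q, m):
    # Equation (9): Q_t = sum over k of C(m-1, k-1) * Q_k, where q[k] = Q_k
    # is the probability of a basic event involving k specific components.
    return sum(comb(m - 1, k - 1) * q[k] for k in range(1, m + 1))

# Hypothetical basic-event probabilities for a three-component group.
q = {1: 1.0e-3, 2: 5.0e-5, 3: 1.0e-5}
qt = total_failure_prob(q, m=3)

# For m = 3, Equation (9) reduces to Equation (8): Q_t = Q_1 + 2*Q_2 + Q_3.
assert abs(qt - (q[1] + 2 * q[2] + q[3])) < 1e-15
```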
The model that uses the Q_k's defined in Equation (7) to calculate system failure probability is called the basic parameter model (Reference 3). Ideally, the Q_k's can be calculated from data, in which case there is no need for further probabilistic modeling. Unfortunately, the data required to estimate the Q_k's directly are not normally available. Other models have been developed that put less stringent requirements on the data. This, however, is done only at the expense of making additional assumptions that address the incompleteness of the data. Several of these models are summarized in Table I and explained in the following. These models can be categorized in several different ways, based on the number of parameters and their assumptions regarding the cause, coupling mechanism, and impact of common cause failures.
The categories for the number of parameters required for modeling common cause events are:

- Single parameter models
- Multiple parameter models

With respect to how multiple failures occur, there are two categories:

- Shock Models
- Nonshock Models
The "shock models" estimate the frequency of multiple component failures by
assuming that the system is subject to common cause "shocks" at a certain rate and
estimating the conditional probability of failure of components within the system,
given the occurrence of shocks. The common cause failure frequency is the
product of the shock rate and the conditional probability of failure, given a shock.
Finally, as mentioned before, except for the basic parameter model, all common
cause models discussed in this report estimate the probability of basic events
indirectly; i.e., through the use of other parameters. In general, the types of
parameters, estimation method, and data requirements vary from one model to
Table I. Key Characteristics of the Parametric Models

General relation for the total component failure probability:

Q_t = Σ_{k=1}^{m} C(m-1, k-1) Q_k

Binomial failure rate model:

Q_1 = Q_I + μp(1-p)^(m-1)     (k = 1)
Q_k = μp^k (1-p)^(m-k)        (2 ≤ k < m)
Q_m = μp^m + ω                (k = m)
another. However, with the current state of the data, which involve large uncertainties, the numerical impact of selecting one model over another is not significant, given a consistent treatment of data in all cases. These points become clearer in the following sections. The remainder of this section gives a brief description of the various parametric models summarized in Table I.
The single parameter models refer to those parametric models that use one
parameter in addition to the total component failure probability to calculate the
common cause failure probabilities. The most widely used single parameter model,
and the first such model to be applied to common cause events in applied risk and
reliability analysis, is known as the beta factor model (Reference 4). A variant of
this model, called the C-factor method (References 5 and 6), employed the same
model, but, in order to address the incompleteness of the data sources, used a
different method of estimating the parameter. According to the beta factor model, a fraction (β) of the component failure rate can be associated with common cause events shared by the other components in that group. According to this model, whenever a common cause event occurs, all components within the common cause component group are assumed to fail. Therefore, based on this model, for a group of m components, all Q_k's defined in Equation (7) are zero except Q_1 and Q_m. These two quantities are written as

Q_1 = (1 - β) Q_t
Q_m = β Q_t     (11)

from which

β = Q_m / (Q_1 + Q_m)     (12)

Note that Q_t, the total failure probability of one component, is given as

Q_t = Q_1 + Q_m     (13)
For the three-component example system, the beta factor model gives

Q_1 = (1 - β) Q_t
Q_2 = 0     (14)
Q_3 = β Q_t

which gives

Q_S = 3[(1 - β) Q_t]^2 + β Q_t     (15)
As can be seen, the beta factor model requires that an estimate of the total failure rate of the component be provided from generic sources of data and that a corresponding estimate for the beta factor also be provided. A practical and useful feature of this model is that the estimators of β do not explicitly depend on system or component success data, which are not generally available. This feature, the fact that estimates of the β parameter for widely different types of components vary much less than estimates of Q_k, and the simplicity of the model are the main reasons for the wide use of this method in risk and reliability studies. It should be noted, however, that estimating β factors, just as with any reliability analysis parameter, requires specific assumptions concerning the interpretation of data (Reference 7).
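The bookkeeping of Equations (11) through (13) can be sketched as follows; the function name and the numerical values of Q_t and β are illustrative assumptions, not data:

```python
def beta_factor_probs(qt, beta, m):
    # Equation (11): all Q_k are zero except Q_1 = (1 - beta) * Q_t
    # and Q_m = beta * Q_t (a common cause event fails all m components).
    q = {k: 0.0 for k in range(1, m + 1)}
    q[1] = (1.0 - beta) * qt
    q[m] = beta * qt
    return q

# Hypothetical values: Q_t = 1e-3, beta = 0.1, group of three components.
q = beta_factor_probs(qt=1.0e-3, beta=0.1, m=3)

# Equation (13): Q_1 + Q_m recovers the total failure probability Q_t.
assert abs(q[1] + q[3] - 1.0e-3) < 1e-15
```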
Although historical data collected from the operation of nuclear power plants indicate that common cause events do not always fail all redundant components, experience from using this simple model shows that, in many cases, it gives reasonably accurate (only slightly conservative) results for redundancy levels up to about three or four items. Beyond such redundancy levels, however, this model generally yields results that are overly conservative. When interest centers on specific contributions from third or higher order trains, more general parametric models are recommended.
For a more accurate analysis of systems with higher levels of redundancy, models
that represent the range of impact levels that common cause events can have are
more appropriate. These models involve several parameters with which to quantify
the specific contribution of various basic events.
Three such models are selected here to provide adequate representation of the methods that have been proposed. In the nonshock model category, the multiple Greek letter (MGL) model (Reference 8) and the alpha factor model (Reference 9) are discussed. The shock model category is represented by the binomial failure rate model (References 10 and 11). These models are briefly described in the following paragraphs.
3.2.1 Multiple Greek Letter Model. The MGL model (Reference 8) is the most general of a number of recent extensions of the beta factor model. The MGL model was the one used most frequently in the International Common Cause Failure Reliability Benchmark Exercise (Reference 12). In this method, other parameters in addition to the β-factor are introduced to distinguish among common cause events affecting different numbers of components in a higher order redundant system.

The MGL parameters consist of the total component failure frequency, which includes the effects of all independent and common cause contributions to that component failure, and a set of failure fractions, which are used to quantify the conditional probabilities of all the possible ways a common cause failure of a component can be shared with other components in the same group, given component failure has occurred. For a system of m redundant components and for each given failure mode, m different parameters are defined. For example, the first four parameters of the MGL model are, as before,
Q_t = the total failure frequency of the component due to all independent and common cause events, and β, plus

γ = the conditional probability that the cause of a component failure that is shared by one or more components will be shared by two or more components, and
δ = the conditional probability that the cause of a component failure that is shared by two or more components will be shared by three or more components.
The general equation that expresses the frequency of multiple component failures due to common cause, Q_k, in terms of the MGL parameters is given in Table I.

To see how these parameters can be used in developing the probabilities of the basic events, consider the three-component system represented by Equation (6). The maximum number of components that can share a common cause is three (m = 3). Therefore, γ is the conditional probability that the common cause of failure of a component will be shared by exactly two additional components, and δ = 0.
Then, from Table I,

Q_1 = (1 - β) Q_t
Q_2 = (1/2) β (1 - γ) Q_t
Q_3 = β γ Q_t
The above expressions for Q_1, Q_2, and Q_3 can be used, for example, in Equation (16) to obtain the unavailability of a two-out-of-three system in terms of the MGL parameters:

Q_S = 3[(1 - β) Q_t]^2 + (3/2) β (1 - γ) Q_t + β γ Q_t     (17)

Note that the beta factor model is a special case of the MGL model. For this example, the MGL model reduces to the beta factor model if γ = 1. In particular, Equation (17) reduces to Equation (15) if γ = 1.
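A minimal numerical sketch of the m = 3 MGL expressions (the function name and parameter values are hypothetical; the collapse to the beta factor model at γ = 1 is checked explicitly):

```python
def mgl_probs_m3(qt, beta, gamma):
    # Table I entries for a three-component group (delta = 0):
    q1 = (1.0 - beta) * qt                  # independent failure
    q2 = 0.5 * beta * (1.0 - gamma) * qt    # CCF of exactly two components
    q3 = beta * gamma * qt                  # CCF of all three components
    return q1, q2, q3

# Hypothetical parameter values, for illustration only.
q1, q2, q3 = mgl_probs_m3(qt=1.0e-3, beta=0.1, gamma=0.3)

# Consistency with Equation (8): Q_1 + 2*Q_2 + Q_3 = Q_t.
assert abs(q1 + 2 * q2 + q3 - 1.0e-3) < 1e-15

# With gamma = 1 the MGL model collapses to the beta factor model.
b1, b2, b3 = mgl_probs_m3(qt=1.0e-3, beta=0.1, gamma=1.0)
assert b2 == 0.0 and abs(b3 - 1.0e-4) < 1e-15
```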
3.2.2 Alpha Factor Model. The alpha factor model (Reference 9) develops common cause failure frequencies from a set of failure ratios and the total component failure frequency, Q_t. Its parameters are the α_k, the fraction of the total frequency of failure events that involve the failure of k components due to a common cause, where

α_1 + α_2 + ... + α_m = 1
The general equation relating the basic event probabilities, Q_k's, to the α-factor model parameters is given in Table I. As we can see, the key difference between α_k in this model and the parameters of the MGL and β-factor models is that the former is a fraction of the events that occur within a system, whereas the latter are fractions of component failure rates.

Again, as an example, the probabilities of the basic events of the three-component system of Equation (6), in terms of the α-factor model parameters, are written as (from the general equation in Table I, with m = 3)
Q_1 = (α_1 / α_t) Q_t
Q_2 = (α_2 / α_t) Q_t     (18)
Q_3 = 3 (α_3 / α_t) Q_t

where α_t = α_1 + 2α_2 + 3α_3. Therefore,

Q_S = 3 (α_1 / α_t)^2 Q_t^2 + 3 (α_2 / α_t) Q_t + 3 (α_3 / α_t) Q_t     (19)
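The α-factor arithmetic for m = 3 can be sketched as below; the function name and the α values are hypothetical, and the normalization α_t = α_1 + 2α_2 + 3α_3 is an assumption consistent with the equations above:

```python
def alpha_factor_probs_m3(qt, a1, a2, a3):
    # Assumed general form for m = 3: Q_k = (k / C(2, k-1)) * (alpha_k / alpha_t) * Q_t,
    # with alpha_t = alpha_1 + 2*alpha_2 + 3*alpha_3.
    at = a1 + 2.0 * a2 + 3.0 * a3
    q1 = (a1 / at) * qt        # k = 1: 1 / C(2,0) = 1
    q2 = (a2 / at) * qt        # k = 2: 2 / C(2,1) = 1
    q3 = 3.0 * (a3 / at) * qt  # k = 3: 3 / C(2,2) = 3
    return q1, q2, q3

# Hypothetical alpha fractions (summing to 1) and total failure probability.
q1, q2, q3 = alpha_factor_probs_m3(qt=1.0e-3, a1=0.95, a2=0.03, a3=0.02)

# Consistency with Equation (8): Q_1 + 2*Q_2 + Q_3 = Q_t.
assert abs(q1 + 2 * q2 + q3 - 1.0e-3) < 1e-12
```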
3.2.3 Binomial Failure Rate (BFR) Model. The BFR model (References 10 and 11) considers two types of failures. The first represents independent component failures; the second type is caused by shocks that can result in failure of any number of components in the system. According to this model, there are two types of shocks: lethal and nonlethal. When a nonlethal shock occurs, each component within the common cause component group is assumed to have a constant and independent probability of failure. The name of this model arises from the fact that, for a group of components, the distribution of the number of failed components resulting from each nonlethal shock occurrence follows a binomial distribution. The BFR model is therefore more restrictive, because of these assumptions, than the other multiparameter models presented in Table I. When originally presented and applied, the model included only this nonlethal shock. Because of its structure, the model tended to underestimate the probabilities of failure of higher order groups of
components in a highly redundant system; therefore, the concept of lethal shock was included. This version of the model is the one recommended.

When a lethal shock occurs, all components are assumed to fail with a conditional probability of unity. Application of the BFR model with lethal shocks requires the use of the following set of parameters:

Q_I = independent failure frequency of each component.
μ = occurrence rate of nonlethal shocks.
p = conditional probability of failure of each component, given a nonlethal shock.
ω = occurrence rate of lethal shocks.

The general form of the probability of basic events according to the BFR model is given in Table I.
As an example, using this model, the probabilities of the basic events in Equation (6) are written as

Q_1 = Q_I + μ p (1 - p)^2
Q_2 = μ p^2 (1 - p)     (20)
Q_3 = μ p^3 + ω

Therefore,

Q_S = 3[Q_I + μ p (1 - p)^2]^2 + 3 μ p^2 (1 - p) + μ p^3 + ω     (21)
It should be noted that the basic formulation of the BFR model was introduced in terms of the rate of occurrence of failures in time, such as failure of components to continue running while in operation. Here, consistent with our presentation of the other models, the BFR parameters are presented in terms of general frequencies that can apply both to failures in time and to failures on demand for standby components.
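The BFR structure of Equation (20) can be sketched numerically; the function name and the parameter values are hypothetical, chosen only to show how the shock rate and conditional failure probability combine:

```python
def bfr_probs(qi, mu, p, omega, m):
    # Q_1 = Q_I + mu*p*(1-p)^(m-1); Q_k = mu*p^k*(1-p)^(m-k) for 1 < k < m;
    # Q_m = mu*p^m + omega  (mu = nonlethal shock rate, omega = lethal shock rate).
    q = {1: qi + mu * p * (1.0 - p) ** (m - 1)}
    for k in range(2, m):
        q[k] = mu * p ** k * (1.0 - p) ** (m - k)
    q[m] = mu * p ** m + omega
    return q

# Hypothetical parameters for a three-component group.
q = bfr_probs(qi=1.0e-3, mu=1.0e-4, p=0.2, omega=1.0e-5, m=3)

# These reproduce the structure of Equation (20).
assert abs(q[1] - (1.0e-3 + 1.0e-4 * 0.2 * 0.8 ** 2)) < 1e-15
assert abs(q[3] - (1.0e-4 * 0.2 ** 3 + 1.0e-5)) < 1e-15
```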
4. References
4. Fleming, K. N., "A Reliability Model for Common Mode Failure in Redundant Safety Systems," Proceedings of the Sixth Annual Pittsburgh Conference on Modeling and Simulation, General Atomic Report GA-A13284, April 23-25, 1975.
5. Evans, M., G. Parry, and J. Wreathall, "On the Treatment of Common Cause Failures in System Analysis," Reliability Engineering, Vol. 9, pp. 107-115, 1984.
8. Fleming, K. N., and A. M. Kalinowski, "An Extension of the Beta Factor Method to Systems with High Levels of Redundancy," Pickard, Lowe and Garrick, Inc., PLG-0289, June 1983.
11. Atwood, C. L., "Common Cause Fault Rates for Pumps," NUREG/CR-2098, prepared for the U.S. Nuclear Regulatory Commission by EG&G Idaho, Inc., February 1983.
12. Poucet, A., A. Amendola, and P. C. Cacciabue, "Summary of the Common Cause Failure Reliability Benchmark Exercise," Joint Research Centre Report, PER 1133/86, Ispra, Italy, April 1986.
13. Apostolakis, G., and P. Moieni, "On the Correlation of Failure Rates," Reliability Data Collection and Use in Risk and Availability Assessment, Proceedings of the Fifth EUREDATA Conference, Heidelberg, Germany, April 9-11, 1986, Springer-Verlag, Berlin and Heidelberg, 1986.
14. Paula, H. M., "Comments on 'On the Analysis of Dependent Failures in Risk Assessment and Reliability Evaluation'," Nuclear Safety, Vol. 27, No. 2, April-June 1986.
ESTIMATION OF PARAMETERS OF COMMON CAUSE
FAILURE MODELS
A. MOSLEH*
Pickard, Lowe and Garrick, Inc.
2260 University Drive
Newport Beach, CA 92660 USA
1. Introduction
In recent years, a number of models have been introduced and used for quantification of the contribution of common cause failures to the unavailability of systems. These include single and multiple parameter models, most of which are described in some detail in another article presented in this volume (Reference 2). Experience seems to indicate that, given the same treatment of data, all models with the same level of detail give comparable numerical values for the frequency of common cause basic events (References 3 and 4). Consequently, what seems to be driving the answers is the basic information used in estimating various model parameters. In particular, the quality of failure event reports and the judgment of the analyst in translating the qualitative and quantitative information content of those reports into basic statistics for parameter estimation play an important role.

Incompleteness of the data bases and lack of detail in most event reports are facts of life, and the use of judgment by the data analyst is unavoidable
*Current address:
Department of Chemical and Nuclear Engineering
University of Maryland
College Park, MD 20742
A. Amendola (ed.), Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 175-203.
© 1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
(Reference 5). It is important, therefore, to follow a systematic procedure which
considers the various issues involving the use of data and ensures the most
consistent and effective use of operating experience in estimating common cause
model parameters. This paper focuses on such a procedure, and is primarily based
on the material presented in Reference 1. It is assumed that the reader is familiar
with the content of Reference 2, and in particular with the definitions of the different models and their parameters. The models for which estimators are provided here are the basic parameter, multiple Greek letter (MGL), beta factor, alpha factor, and binomial failure rate (BFR) models.
The rest of the paper is organized into two sections addressing the two main
steps of data analysis. Section 2 presents a procedure for failure event data
classification and screening. Among the topics discussed are available data
sources and techniques for developing a plant-specific common cause failure data
base. Section 3 discusses the issues involving parameter estimation and the
various sources of uncertainty.
Ideally, the numerical value of the parameters of the common cause failure models
should be estimated in a manner that makes the maximum possible use of event
data; i.e., reports of operating experience. This requires review, evaluation, and
classification of the available information to obtain specialized failure data. Because
common cause failures can dominate the results of reliability and safety analysis, it
is extremely important that this analysis of data is performed within a context that
represents the engineering and operational aspects of the system being modeled.
Due to the rarity of common cause events and the limited experience of individual
plants, the amount of plant-specific data for common cause analysis is very limited.
Therefore, in almost all cases, we need to use data from the industry experience
and a variety of sources to make statistical inferences about the frequencies of the
common cause events. However, due to the fact that there is a significant variability
in plants, especially with regard to the coupling mechanisms and defenses against
common cause events (Reference 1), the industry experience is not, in most cases,
directly applicable to the specific plant being analyzed although much of it may be
indirectly applicable. Also, and perhaps equally important, the analysis boundary conditions that dictate what categories of components and causes should be analyzed require careful review and screening of events to ensure consistency of the data base with the assumptions of the system model, its boundary conditions, and other qualitative aspects delineated in the analysis.
The significance of this step has also been emphasized by Reference 4, since an important conclusion of the Common Cause Failure Reliability Benchmark Exercise (CCF-RBE) was that the most important source of uncertainty and variation in the numerical results was data interpretation. Thus, careful attention and documentation must be given to this step.
The first step in data analysis is the data gathering task. This involves reviewing the existing data sources, which generally fall into one of the following categories:

The quality and level of detail of the information provided by these sources varies significantly. The best sources of information are the plant records or "raw data." However, since common cause events are rare, it is necessary to collect data from a large number of plants even when the analysis is concerned with a particular plant or system. This, however, will be costly and impractical in the scope of plant-specific analyses.
The next best source is a generic raw data compilation, such as LERs, in which failure events that have been selected and interpreted by the compiler are described. The obvious drawback of this type of data source is possible incompleteness with respect to the number of events presented and the fact that the event descriptions are often summarized and are based on an analyst's interpretation.

Finally, sources that provide classified event reports and/or generic estimates for model parameters can be and are widely used when resources are limited. However, as we will discuss later, this is the least desirable option since common cause events exhibit significant plant-to-plant variability, which must be considered in estimating model parameters.
Once the raw data (event reports) are collected, the next step is a review and classification of the events to identify where each event fits in a set of predefined categories describing the type of the event, its cause(s), and its impact; e.g., the number of components failed. For this purpose, a data classification approach, such as the one developed for the Electric Power Research Institute (EPRI) (References 16 and 17), is needed.

The EPRI classification system makes use of a cause-effect logic diagram to portray the interactions between root causes and component states in an event.
Once the event scenario deduced from an event report is modeled in this way,
dependent events are easily identified and their impact on the original system can
be readily seen.
This classification of event reports is a rather subjective exercise, particularly in
light of the quality of many of the event reports. In an attempt to reduce subjectivity
in the screening of event data to identify common cause failures, the CCF-RBE
(Reference 4) identified the following rules, which have been somewhat modified.
3. If the cause of the reported event is a train interconnection that, in the plant
under consideration, does not exist, the event is considered as an
independent failure of one train.
6. If a second failure in an event happened after the restoration of the first, both
failures are considered as independent failures.
7. Events regarding incipient failure modes (e.g., packing leak, etc.) that clearly
do not violate component success criteria can be screened out.
8. Only the events regarding the failure modes of interest were taken into
consideration; events regarding failure modes that are irrelevant to the
system logic model can be screened out.
Rules 2 and 3 are more directed to the screening of events for applicability to
other plants. More detailed discussion about event screening can be found in
References 1 and 16.
2.3 EVENT IMPACT ASSESSMENT
[Figure 1(a): Example event report, showing plant, date, plant status, event description, and cause-effect diagram.]
To complete the description of the event impact at the original plant, the analyst needs to identify the following:

Figure 1(b) summarizes the information about the event for the example event described in Figure 1(a) and introduces the representation called the impact vector (References 3 and 5).
[Figure 1(b): Impact vector for the example event. Component group size 2; shock type: nonlethal; fault mode: fail to open on demand; impact vector {F_0, F_1, F_2} = {0, 0, 1}.]
The binary impact vector of an event that has occurred in a common cause component group of size m has m + 1 elements.* Each element represents the number of components that can fail in an event. If, in an event, k components are failed, then a 1 is placed in the F_k position of the binary impact vector, with 0 in all other positions. In the example of Figure 1, the component group size is 2; therefore, the binary impact vector has three elements: {F_0, F_1, F_2}. Since two components were failed, we have F_0 = F_1 = 0 and F_2 = 1. A condensed representation is

I = {0, 0, 1}     (1)
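The construction just described can be sketched in a few lines of Python (the helper name is hypothetical):

```python
def binary_impact_vector(m, k):
    # Build the m + 1 element vector F_0 .. F_m for a group of size m in
    # which exactly k components failed: a 1 in position F_k, 0 elsewhere.
    return [1 if i == k else 0 for i in range(m + 1)]

# The example of Figure 1: group size 2, both components failed,
# giving I = {0, 0, 1} as in Equation (1).
assert binary_impact_vector(2, 2) == [0, 0, 1]
```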
Most of the time, however, the event descriptions are not clear, the exact states
of components are not always known, and root causes are seldom identified.
Therefore, the interpretation of the event [i.e., the translation of the event
descriptions into a form similar to the example in Figures 1(a) and 1(b)] may require
establishing several hypotheses, each representing a different interpretation of the
event.
As an example, consider the event classified in Figure 2(a). Since it is not clear
whether the third diesel was also actually failed, the binary impact vector is
assessed under two different hypotheses [Figure 2(b)]. Under the first hypothesis,
only two diesels are considered failed, while, according to the second hypothesis,
all three diesels were failed. The analyst at this point needs to assess his or her
degree of confidence in each of the two hypotheses. In the example of Figure 2(b),
a weight of 0.9 is given to the first hypothesis, reflecting a very high degree of
confidence that only two diesels were actually failed. The weight for the second
hypothesis is obviously 0.1 since the weights should add up to 1. This property of the weighting factors assumes that all reasonable hypotheses are accounted for. Note that the data analyst must be in a position to defend and document this assessment.
[Figure 2(a): Example event report for a diesel generator cooling system, showing plant, date, plant status, event description (failure during operation), and cause-effect diagram; shock type: nonlethal (N).]

[Figure 2(b): Impact vectors for the two hypotheses (component group size 3). Hypothesis I_1, weight 0.9: {F_0, F_1, F_2, F_3} = {0, 0, 1, 0}. Hypothesis I_2, weight 0.1: {0, 0, 0, 1}. Average impact vector: {P_0, P_1, P_2, P_3} = {0, 0, 0.9, 0.1}.]
The expectation values for the impact vectors, taken over the two hypotheses, are

P_k = Σ_j w_j F_k(j)     (2)

where w_j is the weight of hypothesis j; the result is the average impact vector {0, 0, 0.9, 0.1}, which is also shown in Figure 2(b). Note that F refers to a single binary impact vector and P refers to an average impact vector. This may be used for point estimation.
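The weighted averaging over hypotheses can be sketched as follows; the function name is hypothetical, and the numbers reproduce the diesel generator example of Figure 2(b):

```python
def average_impact_vector(hypotheses):
    # 'hypotheses' is a list of (weight, binary_impact_vector) pairs whose
    # weights sum to 1; the result is the element-wise weighted average.
    assert abs(sum(w for w, _ in hypotheses) - 1.0) < 1e-12
    n = len(hypotheses[0][1])
    return [sum(w * f[i] for w, f in hypotheses) for i in range(n)]

# Two hypotheses for a group of size 3, weighted 0.9 and 0.1.
avg = average_impact_vector([(0.9, [0, 0, 1, 0]),
                             (0.1, [0, 0, 0, 1])])
assert avg == [0.0, 0.0, 0.9, 0.1]
```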
Up to this point, the event has been analyzed for the original plant. The next step is
to determine what that event implies for the plant and system that are being
analyzed. As was mentioned earlier, the same qualitative and quantitative
information obtained, based on the event at the original plant, may not be directly
applicable to the plant and system of interest due to several reasons, such as
differences in design, operation, common cause defenses, etc. It is therefore
essential to reinterpret the event in light of the specific characteristics of the system
under consideration.
In general, the differences between the system in which the data originated and
the system being analyzed arise in two ways: First, even for systems of the same
size, there are physical differences in system design, component type, operating
conditions, environment, etc.; second, there can be a difference in system size
(degree of redundancy).
In the following, a framework is described with which these two types of
differences can be taken into account explicitly in reinterpretation of the event and
the assessment of the impact vector for the system of interest.
2.4.1 Systems of the Same Size. First, we consider the differences, given the
assumption that the system size is the same. The question to be answered is the
following: given all the qualitative differences between the two systems, could the
same root cause(s) and coupling mechanism(s) occur in the system being analyzed?
In reality, this step involves a considerable amount of judgment. There are a
number of sources of uncertainty. These include the lack of detailed information
about the event, its circumstances, the nature of its causes, the nature of defenses
in the original system, and the effectiveness of defenses in the system being
analyzed. Yet, because of the sparsity of data, there is strong motivation to avoid
tossing the data out and to extract from it that evidence that is applicable. Due to
uncertainties involved and the important implications of screening events out of the
data base by declaring them inapplicable, the analyst must have a concrete reason
for his judgment. In the cases in which the analyst is uncertain about whether an
event is applicable or not, the impact vector of the original system may be modified by a weight reflecting the degree of applicability of the event, as viewed by the analyst. This is similar to the multiple hypothesis situation discussed earlier. Hence, the alternative hypotheses are: (1) applicable with probability p and (2) not applicable with probability (1 - p).
2.4.2 Adjustments for Size Difference. The next step is to consider the system size
differences. The objective is to estimate or infer what the data base of applicable
events would look like if it all was generated by systems of the same size (i.e., the
number of components in each common cause group) as the system being
analyzed. This is done by simulating, in a thought experiment, the occurrence of
causes of failures (both independent and dependent) in the system of interest and
observing how the impact of these causes changes due to difference in system size.
Reference 1 provides a detailed discussion of the background and justification of the
need for adjustment in an impact assessment based on system size differences.*
Reference 1 also develops a set of rules and equations for changing the event
impact vectors of the original system to a corresponding set for the system being
analyzed.
The rules are presented for the following cases:
1. Mapping Down. The case in which the component group size in the original
system is larger than in the system being analyzed.
2. Mapping Up. The case in which the component group size in the original
system is smaller than in the system being analyzed.
2.4.3 Mapping Down Impact Vectors. A complete set of formulas for mapping down data from systems having four, three, or two components to any identical system having fewer components is presented in Table I. In this table, P_k^(m) represents the k-th element of the average impact vector in a system (or component group) of size m. The formulas show how to obtain the elements of the impact vector for smaller size systems when the elements of the impact vector of a larger system are known.
*The numerical importance of this adjustment was first explained by Peter Doerre of Kraftwerk Union, Federal Republic of Germany, as part of a contribution to the CCF Reliability Benchmark Exercise (Reference 4). The particular mapping method presented here is one of several different ways that the impact vectors can be mapped (see Reference 18 for an example).
Table I. Formulas For Mapping Down Event Impact Vectors

All entries follow the general one-step rule

P_k^(m-1) = ((m-k)/m) P_k^(m) + ((k+1)/m) P_{k+1}^(m)

applied repeatedly:

From m = 4 to m = 3:
P_0^(3) = P_0^(4) + (1/4)P_1^(4)
P_1^(3) = (3/4)P_1^(4) + (1/2)P_2^(4)
P_2^(3) = (1/2)P_2^(4) + (3/4)P_3^(4)
P_3^(3) = (1/4)P_3^(4) + P_4^(4)

From m = 4 to m = 2:
P_0^(2) = P_0^(4) + (1/2)P_1^(4) + (1/6)P_2^(4)
P_1^(2) = (1/2)P_1^(4) + (2/3)P_2^(4) + (1/2)P_3^(4)
P_2^(2) = (1/6)P_2^(4) + (1/2)P_3^(4) + P_4^(4)

From m = 4 to m = 1:
P_0^(1) = P_0^(4) + (3/4)P_1^(4) + (1/2)P_2^(4) + (1/4)P_3^(4)
P_1^(1) = (1/4)P_1^(4) + (1/2)P_2^(4) + (3/4)P_3^(4) + P_4^(4)

From m = 3 to m = 2:
P_0^(2) = P_0^(3) + (1/3)P_1^(3)
P_1^(2) = (2/3)P_1^(3) + (2/3)P_2^(3)
P_2^(2) = (1/3)P_2^(3) + P_3^(3)

From m = 3 to m = 1:
P_0^(1) = P_0^(3) + (2/3)P_1^(3) + (1/3)P_2^(3)
P_1^(1) = (1/3)P_1^(3) + (2/3)P_2^(3) + P_3^(3)

From m = 2 to m = 1:
P_0^(1) = P_0^(2) + (1/2)P_1^(2)
P_1^(1) = (1/2)P_1^(2) + P_2^(2)

Note: The term P_0^(4) is included for completeness, but in practice, any evidence that might exist about
causes that impact no components in a four-train system would be "unobservable."
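The repeated application of the Table I mapping-down rules can be sketched in code. This is a minimal illustration (not part of the original text): it assumes the one-step rule P_k^(m-1) = ((m-k)/m)P_k^(m) + ((k+1)/m)P_{k+1}^(m), from which the Table I entries follow; the function names are invented for the example.

```python
def map_down_one(p):
    """One-step mapping down of an impact vector [P_0, ..., P_m]
    from a group of size m to size m-1, using the rule
    P_k(m-1) = ((m-k)/m) P_k(m) + ((k+1)/m) P_{k+1}(m)."""
    m = len(p) - 1
    return [(m - k) / m * p[k] + (k + 1) / m * p[k + 1] for k in range(m)]

def map_down(p, target_size):
    """Iterate the one-step rule, e.g. from a 4-train to a 2-train group."""
    while len(p) - 1 > target_size:
        p = map_down_one(p)
    return p

# Example: an event judged to have failed exactly 3 of 4 components
p4 = [0.0, 0.0, 0.0, 1.0, 0.0]       # [P_0, ..., P_4]
print(map_down(p4, 2))               # 2-train impact vector
```

Applying the one-step rule twice reproduces the composite 4-to-2 formulas, e.g. P_2^(2) = (1/6)P_2^(4) + (1/2)P_3^(4) + P_4^(4).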
2.4.4 Mapping Up Impact Vectors. It can be seen from the results presented above
that downward mapping is deterministic; i.e., given an impact vector for an identical
system having more components than the system being analyzed, the impact vector
for the same size system can be calculated without introducing any new
uncertainties. Mapping up, however, as shown in Reference 1, is not deterministic.
To reduce the uncertainty inherent in upward mapping of impact vectors,
use is made of a powerful concept that is the basis of the binomial failure rate (BFR)
common cause model. This concept is that all events can be classified into one of
three categories:
1. Independent Events. Causal events that act on single components,
independently of the state of the other components in the system.
2. Nonlethal Shocks. Causal events that act on the system as a whole with
some chance that any number of components within the system can fail.
Alternatively, nonlethal shocks can occur when a causal event acts on a
subset of the components in the system.
3. Lethal Shocks. Causal events that always fail all the components in the
system.
When enough is known about the cause (i.e., root cause and coupling
mechanism) of a given event, it can usually be classified in one of the above
categories without difficulty. If an event is identified as being either an independent
event or lethal shock, the impact vectors can be mapped upward deterministically,
as shown below. It is only in the case of nonlethal shocks that an added element of
uncertainty is introduced on mapping upward. How each event is handled is
separately summarized below.
2.4.5 Mapping Up Independent Events. For an independent event, the single-failure
element of the impact vector scales with the number of components exposed, so that
mapping from a system of size k to a larger system of size l gives

P_1^(l) = (l/k) P_1^(k) (3)
2.4.6 Mapping Up Lethal Shocks. By definition, a lethal shock wipes out all the
redundant components present within a common cause group. From this, the
following simple relationship follows for mapping from a group of size k to one of size l:

P_l^(l) = P_k^(k) (4)
2.4.7 Mapping Up Nonlethal Shocks. Nonlethal shock failures are viewed as the
result of a nonlethal shock that acts on the system at a rate that is independent of
the system size. For each shock, there is a constant probability, p, that each
component fails. The quantity p is the conditional probability of each component
failure, given a shock.
Table II includes formulas to cover all the upward mapping possibilities with
system sizes up to four. In the limiting cases of p = 0 and p = 1, the formulas in
Table II become identical to Equation (3) (mapping up independent events) and
Equation (4) (mapping up lethal shocks), respectively.
While it is the analyst's responsibility to assess, document, and defend his
assessment of the parameter p, some simple guidelines should help in its
quantification.
Table II. Formulas For Upward Mapping of Events Classified as Nonlethal Shocks

From a system of size 1:
  to size 2: P_1^(2) = 2(1-p)P_1^(1); P_2^(2) = pP_1^(1)
  to size 3: P_1^(3) = 3(1-p)^2 P_1^(1); P_2^(3) = 3p(1-p)P_1^(1); P_3^(3) = p^2 P_1^(1)
  to size 4: P_1^(4) = 4(1-p)^3 P_1^(1); P_2^(4) = 6p(1-p)^2 P_1^(1); P_3^(4) = 4p^2(1-p)P_1^(1); P_4^(4) = p^3 P_1^(1)

From a system of size 2:
  to size 3: P_1^(3) = (3/2)(1-p)P_1^(2); P_2^(3) = pP_1^(2) + (1-p)P_2^(2); P_3^(3) = pP_2^(2)
  to size 4: P_1^(4) = 2(1-p)^2 P_1^(2); P_2^(4) = (5/2)p(1-p)P_1^(2) + (1-p)^2 P_2^(2); P_3^(4) = p^2 P_1^(2) + 2p(1-p)P_2^(2); P_4^(4) = p^2 P_2^(2)

From a system of size 3:
  to size 4: P_1^(4) = (4/3)(1-p)P_1^(3); P_2^(4) = pP_1^(3) + (1-p)P_2^(3); P_3^(4) = pP_2^(3) + (1-p)P_3^(3); P_4^(4) = pP_3^(3)
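A corresponding sketch for upward mapping of a nonlethal shock, under the stated assumption that each added component fails with the constant conditional probability p; the handling of the (in practice unobservable) P_0 element and the function name are assumptions of this illustration.

```python
def map_up_nonlethal(p, rho):
    """One-step mapping up of an impact vector [P_0, ..., P_m] to size m+1
    for a nonlethal shock with conditional failure probability rho."""
    m = len(p) - 1
    q = [0.0] * (m + 2)
    q[0] = p[0]                              # zero-failure evidence carried over
    q[1] = (m + 1) / m * (1.0 - rho) * p[1]  # single failures rescaled by size ratio
    for k in range(2, m + 1):                # added component fails (rho) or survives
        q[k] = rho * p[k - 1] + (1.0 - rho) * p[k]
    q[m + 1] = rho * p[m]                    # all-fail element gains the shifted mass
    return q

# Limiting cases: rho = 0 reduces to mapping up an independent event,
# rho = 1 to mapping up a lethal shock.
print(map_up_nonlethal([0.0, 1.0, 0.0, 0.0], 0.0))
print(map_up_nonlethal([0.0, 0.0, 0.0, 1.0], 1.0))
```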
2.4.8 Development of Event Statistics From Impact Vectors. Once the impact vectors
of all the events in the data base are assessed for the system being analyzed, the
number of events in each impact category can be calculated by adding the impact
vectors. That is,

n_k = Σ_i P_{k,i} (5)

where n_k is the number of events involving the failure of k components and P_{k,i}
is the k-th element of the impact vector of event i.
Later we will see how the n_k's are used to develop estimates of model parameters.
The result of this process is a set of impact vectors that summarizes the translation
of industry experience to the plant of interest. It is stressed that, for this to be
complete, the exercise has to be performed not only for the potential dependent
events but also for the independent events. In this process, some events have been
screened out as being inapplicable. The validity of this screening has been
questioned because it implies that the plant in question is somehow better than the
"generic" plant that possesses all characteristics of all plants and that it has no
hidden causes of failure that other plants do not. Although it is clearly not
reasonable to assume each plant has all the characteristics of all plants in the data
base, this screening must be done with care and with specific, justifiable reasons for
excluding any event.
The natural way to deal with the question is to compensate for the deletion of
events by also reducing the exposure (success data) and the number of
independent events. As an example, suppose that it is felt that, because of
significant design differences, the events occurring in one or more plants in the
generic data base are not applicable. One approach then is to exclude events for
the affected components at these plants from the data base. This implies a smaller
exposure, which affects the direct estimation of the basic event probabilities.
Additionally, if the parametric models are used, this implies that the associated
independent failure events must also be excluded. This smaller data base leads to
larger uncertainties in the parameter estimates, whose values may increase or
decrease as a result.
An intermediate but less practical solution addresses the case in which some of the
failure causes of a component apply while others do not. In this case, the events
could be modeled or excluded depending on the cause. For example, events
relating to failures of diesel generators due to electric start motor problems do not
apply to diesel generators with air start systems. On the other hand, generic
causes like human error would still be held to apply to systems with such specific
design differences. Each source of events would then be related to the relevant
exposure. This process is probably beyond the capability of current data systems to
support, and the former procedure of deleting plants from the data base for rejected
events is recommended.
3. Parameter Estimation
The next major step is to use the "pseudo-data" generated in the previous step to
provide estimates of either the basic event probabilities themselves (using the basic
parameter model) or the parameters of the common cause failure models (beta,
BFR, etc.). These estimates are subject to many sources of uncertainty and the
ways in which these are addressed are also discussed here.
The information provided by the set of impact vectors is the numbers of events in
which 1, 2, 3, and up to m, where m is the degree of redundancy, components
failed. To proceed further, it is necessary in the case of the direct estimation of the
basic event probabilities to have estimates of the exposure of the events to the
failures. The exposure may be measured in terms of the number of demands or the
total time, depending on which reliability model is appropriate for the failure mode
of interest. In the case of the parameters of common cause failure models, it is also
necessary to have at least an estimate of the relative exposures in order to derive
estimators. This is illustrated in the following example, which is included for two
reasons: first and most importantly to illustrate how assumptions made about the
way the events in the data base arose affects the estimation of common cause
event probabilities, and second to illustrate the way by which this pseudo-data base
can be anchored to preexisting estimates of single-component failure probabilities.
The example is the derivation of the estimator for the beta factor for the case of a
two-train redundant system in the failure to start mode. The illustration given is for
the case in which the reliability model chosen is that of a constant probability of
failure on demand. An alternative model, which assumes a constant failure rate
while on standby, leads to a somewhat different estimator.
Suppose that the evidence from the pseudo-data base is that there are n_1 failures
of single components and n_2 failure events in which both components failed.
Suppose further that an estimate of the total single-component failure probability,
Q_t, already exists. Then, the unknown number of single-component demands, N, in
the pseudo-data base can be estimated by making the identification
Q_t = (n_1 + 2n_2)/N. Now, all that is unknown is the number of times, N_2, that there
was an effective test in the pseudo-data base for the common cause failure. For
most redundant systems in nuclear power plants, the greatest number of demands
comes from surveillance testing so that the answer to this question can come from
knowing the testing strategy, as illustrated below. Consider the following two
strategies, both of which comply with a technical specification requirement that says
that each train must be tested once a month.
Strategy 1. Both components are tested at the same time (or at least the
same shift). In this case, the number of tests against the common cause can
be said to be N/2. The common cause failure probability therefore is 2n_2/N,
and an appropriate beta factor estimator is

β = 2n_2/(n_1 + 2n_2) (6)
This is the familiar estimator found in the PRA Procedures Guide for example.
Strategy 2. The components are tested in a staggered fashion, and upon
failure of one component the other is tested immediately. In this case, the
number of single-component demands satisfies

N = N_2 + n_1 + n_2 (7)
The terms n_1 and n_2 arise because of the failure of the first component, which
occurs n_1 times on its own and n_2 times in conjunction with the failure of the
other. In this case, therefore, the common cause failure probability is given
by n_2/N_2, which is approximately n_2/N when n_1 and n_2 are small compared
to N. This is approximately half of the failure probability that results from
assuming the first strategy is correct. The appropriate beta factor in this case
is
β = n_2/(n_1 + 2n_2) (8)
This example therefore illustrates the importance of recognizing that specific
estimators are based on particular assumptions about such things as testing
strategy. In general, the testing strategies at the plants in the pseudo-data base
may not be known and will probably be mixed. The two extreme cases here should
bound the real situation.
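The effect of the two testing-strategy assumptions can be made concrete with a small numerical sketch; the counts n1 and n2 and the anchor value Qt below are purely illustrative assumptions.

```python
n1, n2 = 40, 5        # single failures and double (common cause) failure events
Qt = 1.0e-3           # assumed preexisting total single-component failure probability

# Anchor the pseudo-data base to the existing estimate: Qt = (n1 + 2*n2)/N
N = (n1 + 2 * n2) / Qt          # implied number of single-component demands

# Strategy 1 (both trains tested together): N/2 tests against the common cause
beta_1 = 2 * n2 / (n1 + 2 * n2)

# Strategy 2 (staggered testing): roughly N effective tests, about half the beta
beta_2 = n2 / (n1 + 2 * n2)

print(N, beta_1, beta_2)
```

With these illustrative counts, the two assumptions yield beta factors that differ by a factor of two, which is the bounding spread discussed above.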
Table III presents simple point estimates for the various parametric models
described in References 1 and 2, based on the assumption that the data are from
plants in which nonstaggered testing is adopted. Formulas for changing these
estimates to correspond with the staggered testing model are provided in
Reference 1. The estimators are provided in terms of the number of basic events
observed in each common cause impact category (i.e., n_1, n_2, ..., n_m) and, if
necessary, the number of system demands, N_D, which is related to the number of
component demands, N, in the following way: N = mN_D. To obtain the time-based
parameters (e.g., failure during operation), the quantity N D should be replaced by T,
the cumulative system exposure time; e.g., total number of system operating hours.
Point estimates developed above only provide single values for the parameters of
the models. However, there are numerous sources of uncertainties that must be
taken into account to present a realistic picture of what the analyst knows about the
value of these parameters. In performing uncertainty analysis, it is often sufficient
to develop distributions only for the most important contributors to the system
unavailability, identified through ranking the contributors on the basis of point
estimates.
3.2.1 Sources of Uncertainty. The following provides a brief discussion of the most
important elements of uncertainty and some available techniques for incorporating
these elements in assessing parameter distributions. The uncertainties stem from
one or more of the following reasons:
Table III. Simple Point Estimators For Various Parametric Models

The table gives the estimators for the basic parameter, beta factor, MGL, and BFR
models in terms of the event counts n_1, ..., n_m, binomial coefficients, and the
number of system demands N_D; for the BFR model, expressions of the form
1 - (1-p)^n appear. For example, the beta factor estimator is

β = (2n_2 + 3n_3 + ... + mn_m) / (n_1 + 2n_2 + 3n_3 + ... + mn_m)

Generic values of Q_t, the total failure frequency, are usually available from
generic risk and reliability data sources.

(a) All estimators assume that, in every system demand, all components and possible combinations of components
are challenged. Consequently, system tests are assumed to be nonstaggered.
(b) For the definition of various parameters, see lecture notes on parametric models.
(c) Estimates are developed for a system of m redundant components.
In general, the statistical uncertainty distributions assume that the required data
(e.g., n_k's for the MGL model) are known. However, as discussed in the previous
sections, such is not the case and the full representation of all uncertainties
requires some refinements in the uncertainty models. In fact, the uncertainties are
mostly driven, not by the usual statistical uncertainties, but rather by such factors as
judgment used in data classification, assumptions made about the population from
which failure and success data are obtained, and completeness of the data bases.
E = {<I_ij, p_ij>; i = 1, ..., N; j = 1, ..., M_i} (9)

where p_ij is the analyst's probability for hypothesis j about event i, and I_ij is the
corresponding binary impact vector. N represents the number of events in the data
base, and M_i is the number of hypotheses about the i-th event. Note that

Σ_{j=1}^{M_i} p_ij = 1 (10)
As an example, consider a data base composed of two events, with the following
hypotheses and impact vectors:
                                        Impact Vector
Event     Hypothesis   Probability   F0  F1  F2  F3  NA
Event 1   I_11         p_11          0   0   1   0   0
          I_12         p_12          0   0   0   1   0
Event 2   I_21         p_21          1   0   0   0   0
          I_22         p_22          0   1   0   0   0
          I_23         p_23          0   0   1   0   0
There are six possible data sets that can be obtained from the above set of
hypotheses by taking all possible combinations of hypotheses. These data sets and
the associated probabilities are listed in the following.

Data Set              Probability    n_0  n_1  n_2  n_3  NA
D_1 = {I_11, I_21}    p_11 p_21      1    0    1    0    0
D_2 = {I_11, I_22}    p_11 p_22      0    1    1    0    0
D_3 = {I_11, I_23}    p_11 p_23      0    0    2    0    0
D_4 = {I_12, I_21}    p_12 p_21      1    0    0    1    0
D_5 = {I_12, I_22}    p_12 p_22      0    1    0    1    0
D_6 = {I_12, I_23}    p_12 p_23      0    0    1    1    0
π(θ) = Σ_{i=1}^{6} w_i π(θ|D_i) (11)

where w_i is the probability associated with data set D_i.
In reality, the number of data sets that can be generated by considering all
possible combinations of various hypotheses about events is very large. As a
result, the implementation of the rigorous procedure described here is extremely
difficult. An approximate way of including these effects, at least in the mean values,
is to obtain an "average" impact vector for each event before combining them to
obtain the total number of events in each impact category. Formally,
E = {P̄_i; i = 1, ..., N} (12)

where

P̄_i = Σ_{j=1}^{M_i} p_ij I_ij (13)
For instance, in our two-event example, this averaging process results in:
Event      P_0    P_1    P_2    P_3    NA
Event 1    0      0      p_11   p_12   0
Event 2    p_21   p_22   p_23   0      0
Then, the resulting data set (obtained by adding the P̄_i's from each event) is

Data Set    n_0     n_1     n_2           n_3     NA
D           p_21    p_22    p_11 + p_23   p_12    0
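The averaging of hypothesis impact vectors in the two-event example can be sketched as follows; the numerical hypothesis probabilities are illustrative assumptions.

```python
# Each hypothesis: (probability, binary impact vector over [F0, F1, F2, F3])
event1 = [(0.7, [0, 0, 1, 0]),    # p11: two components failed
          (0.3, [0, 0, 0, 1])]    # p12: three components failed
event2 = [(0.5, [1, 0, 0, 0]),    # p21: no component failed
          (0.3, [0, 1, 0, 0]),    # p22: one component failed
          (0.2, [0, 0, 1, 0])]    # p23: two components failed

def average_impact(hypotheses):
    """Average impact vector, Equation (13): P_bar = sum_j p_ij * I_ij."""
    avg = [0.0] * len(hypotheses[0][1])
    for prob, impact in hypotheses:
        for k, bit in enumerate(impact):
            avg[k] += prob * bit
    return avg

# Expected number of events in each impact category: n_k = sum of averages
n = [a + b for a, b in zip(average_impact(event1), average_impact(event2))]
print(n)    # n_2 = p11 + p23, n_3 = p12, etc.
```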
impact vectors. It is recognized that more work is required for a practical and more
complete treatment of uncertainty.
3.2.3 Uncertainty due to Success and Failure Data Completeness. The problems
associated with estimating success (exposure) data (e.g., the number of system
demands or operating hours) needed by some of the parametric models directly and
all others indirectly are not specific to common cause failure analysis. It is, in
general, very difficult to obtain an accurate estimate of the success data because no
success data recording and reporting system exists for the nuclear industry. Even
reconstruction of the success data from plant-specific records, as is often done in
plant-specific probabilistic risk assessments (PRA), is not only a major task, but also
heavily involves the judgment of the data analyst. However, the problem is
exacerbated in the case of common cause failures because of the problem of
estimating the success data for groups of components taken together. Since the
data on which the estimates are based are from groups of plants that probably have
different surveillance test strategies, it is unlikely that "exact" estimators can even
be found, thus adding another dimension to the uncertainty.
Similar uncertainties exist about the completeness of the failure event sources. It
is believed, for instance, that a substantial proportion of all independent failure
events are not reported to the Licensee Event Report (LER) system. Both of these
uncertainties can be represented explicitly in the parametric distributions through
Bayes' theorem by assuming uncertainty distributions for both the success and
failure data.
from each plant, without screening events based on their applicability to the
situation under consideration. If it were practical, this would result in a wider range
of possible values for the parameters than if this variation were ignored. In the
second approach, which is adopted in this report for estimation of the common
cause parameters, failure events from various plants are reclassified and mapped
for the plant or system of interest. The result is the formation of a data base much
larger than one based only on the records of the specific plant under consideration.
The resulting statistical uncertainty range for the estimated parameters will
obviously be smaller in this case, compared with a distribution representing
differences in plants. This reduction in uncertainty is the result of applying the
additional information about the specific characteristics of the system being
analyzed and obviates the need for separate consideration of the plant-to-plant
variation for the common cause parameters. This decrease in statistical uncertainty
is bought at the expense of another uncertainty, that in the impact vector
assessment. It is however still essential to consider plant-to-plant variation for total
failure rates.
The systematic procedures for dependent events analysis require the analyst to
screen and classify event data, use estimators provided, and develop uncertainty
distributions and/or point estimates of model parameters for each specific analysis.
This procedure is recommended instead of using published numerical data for these
parameters for several important reasons. One reason is to prevent the use of data
that are inapplicable to the system being analyzed. Another is to provide a
consistent framework for combining data from systems having different numbers of
components and for accounting for differences between the number of components
being analyzed and those associated with systems providing the data. In addition,
event screening can eliminate all inconsistencies between the data and the
assumptions built into the common cause event models. Finally, the event
screening and classification process provides qualitative insights about possible
approaches to defending against future occurrences of these events in the system.
A formidable obstacle to the adoption of an approach based on event screening in
prior analyses was the amount of time needed to sift and sort through such event
reports as the LERs and the numerous problems associated with extracting
quantitative information from the review of these reports. A useful contribution to
lessening the work has been the development and application of the
EPRI-dependent events classification system. The final form of this classification
system (Reference 17) has been and is currently being applied to a large fraction of
the accumulated LERs covering U.S. power reactor experience. As mentioned
earlier, an initial data base of classified event reports, including several hundred
dependent events, is provided in Reference 16. Numerous examples of this
classified data base are presented in this section. This EPRI data base was
expanded in a companion project (Reference 20). The availability of these classified
data bases greatly reduces the time required to incorporate event screening as an
integral part of systems analysis if one is willing to accept the classification of the
authors of the report. It should be remembered that this classification is subjective.
However, at the very least, the report provides a prescreening of the data to identify
event reports worth looking at in detail.
Despite the availability of classified event reports, it is recognized that there is a
continuing need to support analysts who may need to bypass the event screening
step and use published numerical values of common cause event parameters. For
these analysts, a list of what the authors call "generic beta factors" is provided in
Reference 1 (Table IV). Although the use of these generic factors is strongly
discouraged as a substitute for the event screening approach, they may serve as a
coarse and conservative screen for common cause analysis, provided suitable
qualification of the results is indicated. Implicit
assumptions in the use of these parameters include the following:
The analyzed system has the same, yet unspecified, success criteria as those
assumed by the analyst who classified the data.
The Table IV values of the beta factor include both failures to start on demand
and failure to run for all components except breakers and valves. Hence,
they represent an average of these modes weighted by their relative
frequency of occurrence.
Table IV. Event Classification and Analysis Summary

Component | Reactor Years | Number of Events Classified(a) | Independent | Dependent | Generic Common Cause Events (Potential / Actual) | Generic Beta Factor(c)

Notes:
a. Events classified include those having one or more actual or potential component failures or functionally
unavailable states.
b. Independent events are those in category LS (linear, single unit); dependent events are those in the
following categories: LM (linear, multiple unit), BSR (branched, single unit, root-caused), and BSC (branched, single unit,
component-caused); generic common cause events are a subset of event category BSR that meet screening criteria to
be modeled in a systems analysis as a common cause event. Actual common cause events have at least two actual
component states.
c. Average of all component beta factors.
4. References.
10. Sams, D. W., and M. Trojosky, "Data Summaries of Licensee Event Reports
of Primary Containment Penetrations at U.S. Commercial Nuclear Power
Plants, January 1, 1972, to December 31, 1978," prepared for the U.S. Nuclear
Regulatory Commission, EG&G Idaho, Inc., NUREG/CR-1730, EGG-EA-5/88.
11. Hubble, W. H., and C. F. Miller, "Data Summaries of Licensee Event Reports
of Control Rods and Drive Mechanisms at U.S. Commercial Nuclear Power
Plants, January 1, 1972, to April 30, 1978," prepared for the U.S. Nuclear
Regulatory Commission, EG&G Idaho, Inc., NUREG/CR-1331, EGG-EA-5079.
13. Atwood, C. L., "Common Cause Fault Rates for Pumps," prepared for the U.S.
Nuclear Regulatory Commission, EG&G Idaho, Inc., NUREG/CR-2098,
EPRI-685-DOC-01, EGG-EA-5289.
14. Atwood, C. L., and J. A. Steverson, "Common Cause Fault Rates for Valves:
Estimates Based on Licensee Event Reports at U.S. Commercial Nuclear
Power Plants," prepared for the U.S. Nuclear Regulatory Commission, EG&G
Idaho, Inc., NUREG/CR-2770, EGG-EA-5485, February 1983.
15. Meachum, T. R., and C. L. Atwood, "Common Cause Fault Rates for
Instrumentation and Control Assemblies," prepared for the U.S. Nuclear
Regulatory Commission under Department of Energy Contract No. DE-AC07-76ID01570,
Idaho National Engineering Laboratory, EG&G Idaho, Inc., NUREG/CR-3289,
EPRI-685-DOC-06, EGG-2258, May 1983.
17. Los Alamos Technical Associates, Inc., "A Study of Common Cause
Failures, Phase II: A Comprehensive Classification System for Component
Fault Analysis," EPRI NP-3837, May 1985.
18. Doerre, P., "Possible Pitfalls in the Process of CCF Event Data Evaluation,"
Kraftwerk Union AG, Proceedings, PSA '87: International Topical Conference
on Probabilistic Safety Assessment and Risk Management, August 30-
September 4, 1987.
19. Mosleh, A., and N. O. Siu, "On the Use of Uncertain Data in Common Cause
Failure Analysis," Proceedings, PSA '87: International Topical Conference on
Probabilistic Safety Assessment and Risk Management, August 30-September 4,
1987.
20. Smith, M., et al., "Data Based Defensive Strategies for Reducing
Susceptibility to Common Cause Failures," draft report, prepared for Electric
Power Research Institute, Saratoga Engineering Consultants, 1987.
PITFALLS IN COMMON CAUSE FAILURE DATA EVALUATION
P. Dörre
SIEMENS AG (KWU)
P.O. Box 101063
D-6050 Offenbach
FR Germany
ABSTRACT. Two major aspects of the CCF problem are addressed: the nature
of CCF in the contexts of data evaluation and reliability analysis, and
uncertainties arising from limitations to expert judgment, when failure
data are evaluated. Some inconsistencies and pitfalls in contemporary
dependent failure modeling methodology are identified, analyzed, and
discussed quantitatively. Methods and solutions are proposed to avoid
inadequately pessimistic evaluations in case of redundancy mismatch.
1. INTRODUCTION
Identification and evaluation of common cause failure (CCF) events
involve a considerable amount of engineering judgment subject to error
and uncertainty. The quality of judgment influences validity and hence
credibility of quantitative results, e. g. of a reliability analysis.
This especially holds for highly redundant systems, in which CCF can be
a major contributor to the system failure probability.
This seminar contribution refers to insights and experiences
mostly gained from the participation in the recent Reliability Benchmark
Exercise on Common Cause Failure (CCF-RBE) of the European Community
(Poucet et al (1)). It consists of two parts.
The first qualitative part deals with the identification of CCF,
especially with unresolved issues, ambiguities, and definition problems.
Its intention is to give an impression of why CCF still lacks a relevant
definition and why so many different approaches to solve the problem
exist.
In the second part, it is assumed that any CCF can be identified
with sufficient accuracy (although a "definition" may still be lacking).
However, even perfect identification of all CCF events present in a
given data base alone will not guarantee a realistic reliability
estimate for a system to be analyzed, if this system is different with
respect to technical features, degree of redundancy etc. from the
systems in which the original events occurred: further aspects have to
be adequately accounted for.
A. Amendola (ed.),
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 205-219.
1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
Within the general problem of using data from one or more
different data sources (which involves data uncertainties as well as
uncertainties due to variations in technical judgment with respect to
applicability), two special problems arising from different degree of
redundancy can be isolated and treated separately. The reason is that
here mathematical restrictions to the process of engineering judgment
play a decisive role. These two problems are connected with the
following two cases of different degree of redundancy:
- the system to be analyzed is of lower redundancy than (some of) the
systems in the event data base;
- the system to be analyzed shows a higher degree of redundancy than
(some of) the systems for which operational experience exists.
Two methods are introduced to treat these "mismatches" in
redundancy adequately. Whereas the first case is completely determined
by the laws of combinatorial analysis together with symmetry
considerations, the extrapolation to a higher degree of redundancy is
non-deterministic and therefore implies further assumptions.
2. CCF IDENTIFICATION
2.1. Prologue
In this chapter some global aspects of CCF definition and identification
are briefly described. No distinction is made between the notations
"common cause failure", "common mode failure", "dependent failure", as
is sometimes done, e. g. in Bourne et al (2). The essential effect of
CCF on a system is the breakdown of multiplicity (redundancy), i. e. the
loss of independence.
Some kinds of dependencies are often excluded from special CCF modeling,
as they can be modeled explicitly, e. g. as extra fault tree entries.
This holds for the following simple cases:
- common or shared components (e. g. auxiliary systems),
- external events or environmental effects,
- failure propagation to adjacent subsystems (if they are not
separated by appropriate barriers).
Although even for these "simple cases", CCF can in principle be
modeled implicitly by a parametric model, usually only the CCF events
which are not simple cases are retained and modeled that way. As a tool
for disentangling the universe of phenomena, a classification system has
been proposed by Crellin et al (3).
Variations in basic understanding of what a CCF is will lead to
different lists from different analysts with different subjective
judgment. This can be treated as a modeling uncertainty and quantified
within a parametric CCF model.
is rarely informative, as
- common causes always exist, as causes are generally not specific for
an individual component, but "generic", i. e. a set property of a set
of components (a statistical population) rather than an elemental
property of a single item;
- any multiple failure of two or more components of the same type within
a given time interval can be considered to be a CCF, if only one cause
for a special failure mode exists (less strictly: if one "cause" or
"failure mode and effect" dominates the failure behaviour of this
special component type);
- the effects (manifestations and circumstances of the failure), not the
causes are observed directly and have to be investigated with respect
to common properties.
In the author's opinion, any tentative "definition" or description
of CCF has to address at least three major points, namely
- what failure is;
- the time aspect of failure;
- what a component is.
2.5. Failure
As to the notation "failure", Crellin et al (3) clearly distinguish
between two different types of malfunctions, namely failure and
functional unavailability, and propose the use of logic diagrams to
analyze multiple events. We note that the distinction mainly refers to
the (boundaries of the) object of failure and hence to the component
picture. (As this distinction is of minor relevance in this
contribution, the notation "failure" here is used in the most general
meaning of "malfunction".)
2.6. The Time Aspect: Criterion of Correlated Failure Times
What are the reasons for a correlation between the failure behaviour of
two or more components? It was pointed out by Virolainen (4) that two
mutually exclusive types of correlations can be identified:
"failure rate coupling" in the context of state-of-knowledge
correlations, in which it is an incorrect treatment of CCF.
3.2. Representation of Evaluation Results: Impact Vectors
So far only the technical properties of the source systems and the
corresponding boundary conditions of multiple failure events played a
role. When one intends to apply this data base to another system, one
has to ask first of all whether data will still be appropriate under
changed conditions, and how data have to be modified if necessary.
The fundamental (yet often unrecognized) assumption for any data
transfer is the strong principle of causality, which in this context
states that "similar components show similar failure behaviour". Here
similarity of systems or components refers to technical, operational and
environmental features. The following cases of matching relations
between the data base source systems and the target system can be
distinguished:
- equivalent: the impact vector remains unchanged;
- similar (comparable): according to the degree of similarity,
modified weight factors can be chosen for the impact probability
distribution of the event. In this process of engineering judgment,
both technical reasons and mathematical implications have to be
considered;
- not comparable: the event does not apply to the target system.
Two special problems concerning modifications with respect to
different degree of redundancy will be discussed separately in more
detail in what follows. As they are independent from the choice of a
special parametric model, they can be described by using the impact
vector method alone.
Here complete symmetry is assumed, i. e. each component has the same
probability to fail in any malfunction event.
4.2. Examples

(m over n) = m!/(n!(m-n)!). (2)
Event description       P_0  P_1  P_2  P_3  P_4  P_5  P_6  N/A

Event description
- case 2 (1 event)      0    0    0    0    1    0    0    0
different cases can be distinguished for the application of "mapping up"
techniques:
- single (independent) failure,
- total system failure (k of k redundancies),
- partial system failure (at least 1 redundancy did not fail).
5.3. Development of a Consistent Mapping Up Procedure
The procedure is developed in two steps. The first preliminary step is
very similar to the method proposed in the current draft version of the
US "CCF guide" (Mosleh et al (12)), but needs further modification
(which will be done as a second step) in order to avoid statistical
inconsistencies.
For single (independent) failure events, the expected event count simply
scales with the population size, i.e.
n_1^(4) = (4/3) n_1^(3). (14)
We pessimistically assume that there is also no doubt about the
corresponding outcome of any total failure event, i.e.
n_4^(4) = n_3^(3). (15)
But what about the remaining case of incomplete failure events (here
only: 2 of 3)? In the worst case, one can assume
n_2^(4) = 0, n_3^(4) = n_2^(3), (16)
according to the evidence "one component has always been observed in the
intact state". Likewise, the best case would result in
n_4^(4) = n_3^(4) = 0, n_2^(4) = n_2^(3). (17)
Using the "most likely" case together with the approach for single
and total failure events, we will now show that this procedure fully
conserves the most important numerical impact of CCF, i.e. the two "β
factors" of the 3- and 4-redundant system (which contain the total
system failure in the MGL model) are numerically identical:
[Equations (19) and (20), giving these factors for the 3- and 4-redundant systems, are not legible.]
With the above contributions of the 3-redundant event frequencies from
eqs 14, 15, and 18, we get the result, where A refers to the general form
of the numerators, and to the multiple failure contribution in the
denominators of e.g. eqs 8 and 20.
no ambiguity will remain about the behaviour of other redundancies in an
n-redundant system with n > k. The expected failure event frequency
is independent of the redundancy structure of the population and
proportional to the population size, which can therefore be kept
constant, i.e.
P_1(n) = P_1(k). (23)
Total system failure events. When the target system shows a higher
degree of redundancy (n > k), then any postulated n of n malfunction
event corresponds to an original k of k event, and therefore has to be
renormalized by a reduction factor k/n, as any actual k of k malfunction
event is only evidence for the fraction k/n of a fictitious n of n
event.
The failure of all system redundancies is often discussed in the
framework of a shock model as a "lethal shock", with the shock frequency
being independent of the degree of redundancy. It should be emphasized
that this assumption is not violated by the proposed procedure, as the
number of total failure events remains constant, i.e. independent of
the system size. However, their statistical weights have been
renormalized, accounting for incomplete information.
- The "worst case" method assumes that all n-k remaining components failed. For a
2 of 3 component failure event and a 4-redundant target system, the
fourth component is then assumed failed, which yields a target impact
vector with the only non-vanishing probability
P_3^(4) = 0.75. (24)
- The original event is used to derive the probability that any number
of the n-k components failed: the "most likely" conditional
probabilities of 2/3 for "failed" and 1/3 for "intact" are employed.
The target system impact vector then reads
P_2^(4) = 0.25, P_3^(4) = 0.50.
(1) determine the conditional failure probability (k-1)/k
((k-1) = 2 of k = 3 components failed in the above example,
1 remained intact);
(2) distribute the renormalized event weight over the possible outcomes
of the n-k additional components:
P_(k-1+i)^(n) = (k/n) C(n-k, i) ((k-1)/k)^i (1/k)^(n-k-i), (25)
for i = 0, ..., n-k.
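The mapping-up rule for incomplete failure events can be sketched numerically. The formula coded below is a reconstruction of the partially legible equation above, so treat its exact form as an assumption to be verified against the original source.

```python
from math import comb

def map_up_partial(k, n):
    """Map a (k-1)-of-k partial failure event onto an n-redundant target system.

    Each of the n - k additional components is taken as failed with the
    "most likely" conditional probability (k-1)/k and intact with 1/k; the
    factor k/n renormalizes the statistical weight of the event. Returns a
    dict {failure multiplicity: probability}.
    """
    return {(k - 1) + i: (k / n) * comb(n - k, i)
            * ((k - 1) / k) ** i * (1 / k) ** (n - k - i)
            for i in range(n - k + 1)}

# The 2-of-3 example mapped onto a 4-redundant target system:
v = map_up_partial(3, 4)   # {2: 0.25, 3: 0.5}
```

Note that the total weight is k/n = 0.75, consistent with the "worst case" method, which assigns the whole 0.75 to the triple-failure entry.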
6. SUMMARY
REFERENCES
EXPERIENCE AND RESULTS OF THE CCF-RBE.
A. POUCET
Commission of the European Communities
Joint Research Centre
Systems Engineering Institute
I-21020 Ispra (VA)
Italy
ABSTRACT. The Joint Research Centre of the CEC has programmed a series of benchmark exercises
in order to assess the state of the art in reliability assessment methodology, to assess the uncertainties
involved and to eventually arrive at consensus procedures for carrying out analysis. The Common Cause
Failure Reliability Benchmark Exercise (CCF-RBE) is the second in the series and deals with the problem
of identifying, modelling and quantifying dependent failures. On the basis of a real reference plant and
system, a common set of problems was defined and analysed by ten different teams. The results were then
compared in order to assess the state of the art in CCF analysis, to identify the methods and procedures
applied, to get insights in the magnitude and causes of the uncertainties involved, and to achieve consensus
on the most appropriate procedures, methods and data. The paper discusses these results and presents the
main conclusions and lessons learnt from the CCF-RBE.
1. a start-up and shut-down system with two 100%-redundant electrical motor driven pump
trains (13 in fig. 1);
2. an emergency feedwater system with four 100%-redundant diesel driven pump trains (17
in fig. 1).
. Amendola (ed.).
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 221-234.
Q1989 ECSC, EEC. EAEC, Brussels and Luxembourg.
Figure 1: Reference plant schematic. Legend: 1 Steam generators; 2 Valve compartments; 3 Stop and control valves; 4 Bypass station; 5 Reheater; 6 HP turbine; 7 LP turbine; 8 Condenser; 9 Main condensate pumps; 10 LP feed heater system; 11 Feedwater tank; 12 Feedwater pumps; 13 Start-up and shut-down pumps; 14 HP feed heater system; 15 Demineralised water storage tank; 16 Demineralised water pumps; 17 Emergency feed pumps; 18 Circulating water pumps; 19 Closed loop cooling system.
The TOP event analysed was the failure of sufficient supply of feedwater to the steam generators
after a loss of preferred power. The mission time was assumed to be two hours. In order to
limit the scope of the exercise, external events, interdependencies with other systems and human
interventions during the mission were not considered.
2. Working phase II was organised to get a deeper understanding of the sources and magnitude
of uncertainties and spread in the quantitative results. The scope was limited to the four-
train emergency system only and, based on the qualitative analysis in phase I, a common
set of components to be included in the CCF analysis was agreed. Then the following tasks
were performed:
a. repetition of the calculation according to the new scope but using the data assumed in
phase 1 (allowing to investigate the effect of the boundaries);
b. calculation of the unavailability using a common set of parameters estimated in a
consistent way for different models (allowing to analyse the spread due to different
models);
c. calculation of the unavailability using a set of own parameter estimates obtained from
analysis of 'raw' event data (assessment of spread due to parameter estimation).
The CCF-RBE started in September 1984, working phase I was concluded in September 1985 and
working phase II ended in April 1986.
gained by the qualitative analysis, must be reflected not only in the choice of the model used for
CCF analysis and in the CCF component groups, but also in the quantification of the model.
The methods used for qualitative analysis were basically of two different types:
1. component driven methods, similar to Failure Modes and Effects Analysis (FMEA), in which
for every component and failure mode an investigation is made on the causes that can be
shared with other components. An example of a dependent failure oriented FMEA table
is given in fig. 2;
2. cause driven methods which start from a classification of potential common causes (checklist)
and analyse which components can be affected by these causes. An example of a CCF
checklist as used by some teams is given in fig. 3.
Figure 2: Example of a dependent-failure oriented FMEA table for an emergency feedwater pump train, recording component identification, type and construction, use and function, manufacturer, internal and external conditions, initial system condition, and testing and operating policy. [Table entries not legible.]
Figure 3: Extract of a CCF cause checklist, e.g. normal environment: dust, humidity, temperature, vibrations.
The FMEA approach can be valuable for determining functional dependencies and cascade
failures to be incorporated later explicitly in the logic model (fault trees). The approach was to
be complemented by a mapping of components according to their CCF related attributes in order
to be able to identify the CCF component groups. This mapping was performed on the basis of
a limited checklist of attributes (manufacturer, type, hazard and location).
The cause driven (checklist based) approach can be used for identification of dependent failures
in the narrow sense: i.e. for identifying groups of components susceptible to common
cause failures due to the same root cause, but excluding cascade failures and multiple functional
unavailability.
The CCF-RBE led to the conclusion that both types of methods are complementary and that
both approaches should be present in a wide-scoping analysis. A way to achieve this is to use
a modified FMEA table (table 2). In this table the major component attributes are represented
and columns are inserted for CCF cause categories and identification of other components which
might be struck by the same causes.
Table 2: Modified FMEA table. Columns: component identification, component type, component manufacturer, component location, test and maintenance, failure mode, detection possibility, effects on other components, failure cause categories, other components sensitive to the same causes. [Entries not reproduced.]
As the number of identified dependencies can be large, it is necessary to perform some screening
and to extract a list of the most important ones. In this way the C C F component groups to
be analysed in a quantitative way can be determined. Qualitative screening can be performed
by using a set of rules derived from experience. The following basic rules were used by most
participants:
Screening can be performed also on the basis of quantitative criteria: e.g. using a simple model
(such as the β-factor model) and using generic parameters. A significance level with respect
to independent failure contributions is assumed to perform the screening.
2.2. QUANTITATIVE METHODS.
The quantitative methods used in the CCF-RBE included:
1. parametric models such as the beta-factor model, the multiple Greek letter (MGL) model,
the Marshall-Olkin model, the binomial failure rate (BFR) model;
2. methods based on more judgemental assessment such as the cut-off technique and the partial
beta-factor model;
3. the use of failure rate coupling as implicit treatment of dependency.
The β-factor model was used by some participants for screening purposes only or in comparison
to other models. The attractiveness of the β-factor model lies in its simplicity and in the
availability of generic β factors for different types of components. However, the model may
give conservative results in the case of multiple redundant systems.
The Multiple Greek Letter (MGL) model was the most frequently used model in the
CCF-RBE. Many participants found it a natural extension of the β-factor model. Some problems
concerning the estimation of MGL parameters were identified during the CCF-RBE. They will
be discussed later.
The Binomial Failure Rate (BFR) model was used by fewer participants. The BFR model is
not general in the sense that it assumes that common cause events either fail all components
together because of a lethal shock or have a binomial impact. It appeared that there was no
significant difference between the results obtained using MGL and BFR models, at least not in
the cases studied.
Both MGL and BFR models suffer from the fact that the parameters share statistical evidence
and, hence, are not independent of each other. This creates some difficulty in calculating
uncertainties.
Moreover, the parameter estimation for those models is based on the use of estimators in terms
of individual component failures (component statistics). Models based on event statistics (taking
into account the number of events involving single, double, triple... failures) are preferable. The
use of component statistics can be shown to lead to an artificial increase in the strength of the
evidence and, consequently, to narrower distributions on the parameters and a downward shift
of the mean value of the parameters (Apostolakis and Moieni, 1986). However, the numerical impact was
believed to be small and not significant compared with the much larger impact of analyst judgement
used in parameter estimation.
The Basic Parameter (BP) model has independent parameters and is event based, and, hence,
would be preferable from a theoretical point of view. However, it needs data about sample size
and observation time, and such data are not easily obtainable. Recently other event based models
have been proposed (e.g. the α-factor model (Mosleh and Siu, 1987)).
The number of parameters for MGL and BP models and their values depend on the number of
redundancies in the system studied. For this and other reasons, the use of generic parameters for
such models was judged to be inappropriate except for screening purposes.
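To make the parameter structure of the MGL model concrete, the sketch below evaluates the per-group basic event probabilities for a 4-train CCF group. The formulas are the standard MGL definition as found in the CCF literature (not necessarily the exact variant used by each team), and all numerical values are purely illustrative.

```python
from math import comb, prod

def mgl_basic_event_probs(qt, factors):
    """Probability Q_k that one specific group of k components fails together,
    for an m-train CCF group with m = len(factors) + 1, given the total
    per-component failure probability qt and the MGL factors
    (beta, gamma, delta, ...).

    Uses the standard relation
        Q_k = (rho_1 ... rho_k) * (1 - rho_{k+1}) * qt / C(m-1, k-1),
    with rho_1 = 1 and rho_{m+1} = 0.
    """
    m = len(factors) + 1
    rho = [1.0] + list(factors) + [0.0]
    return [prod(rho[:k]) * (1.0 - rho[k]) * qt / comb(m - 1, k - 1)
            for k in range(1, m + 1)]

# Illustrative values only (not estimates from the exercise):
q1, q2, q3, q4 = mgl_basic_event_probs(1.0e-3, (0.10, 0.30, 0.50))
# Consistency check: the per-component total is recovered,
# qt = Q1 + 3*Q2 + 3*Q3 + Q4 for a 4-train group.
```

The number of factors grows with the redundancy level, which illustrates why generic MGL parameters were judged inappropriate beyond screening.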
results of the CCF-RBE are discussed. The major source of this variability is the extensive use
of expert judgement in the process of event screening.
Parameter estimation involves:
1. analysis of operating experience in order to identify the event data base to be used;
2. screening of the events in the event data base in order to assess the degree of relevance of
the events for the plant under consideration (impact vector assessment);
3. calculation of the parameters for the chosen model.
The parameter estimation task in the CCF-RBE was based on the use of the Nuclear Power
Experience (NPE) data base composed of LER events and, hence, based on U.S. experience.
LER's only partially report independent failures. Therefore, the combination of MGL or other
parameters assessed from this data base with independent failure rates assessed from another type
of reporting (more complete with respect to independent failures) may lead to an overestimation
of CCF probabilities.
The impact vector method of screening the data was judged to be very convenient, but the use
of this method in the CCF-RBE identified some previously unrecognised problems related to
the extrapolation of events from systems with lower redundancy to systems with
higher redundancy and vice-versa (mapping up and mapping down). Whereas mapping down
impact vectors (i.e. extrapolation from a higher redundancy system to a lower redundancy one)
is a deterministic operation and can be performed by using formulas that take into account
the difference in system size, mapping up implies some assumptions on the nature of the CCF
event (e.g. lethal shock or not). Since the CCF-RBE, some solutions to the problem have been
proposed in the literature (Doerre, 1987).
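Mapping down, being deterministic, is easy to illustrate. The hypergeometric formula below is the one commonly quoted in the CCF literature; whether every participant used exactly this variant is not stated in the text, so it is an assumption here.

```python
def map_down(p):
    """Map an impact vector from n trains to n - 1 trains.

    p[k], k = 0..n, is the probability that k of the n trains failed; the
    returned vector assumes each subset of n - 1 trains is equally likely
    to be the one observed (hypergeometric weighting).
    """
    n = len(p) - 1
    return [(n - k) / n * p[k] + (k + 1) / n * p[k + 1] for k in range(n)]

# A certain 2-of-4 event mapped down to a 3-train system:
q = map_down([0.0, 0.0, 1.0, 0.0, 0.0])   # [0.0, 0.5, 0.5, 0.0]
```

Unlike mapping up, no assumption about the nature of the CCF event (lethal shock or not) is needed here.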
Besides the differences in system size, the analyst performing impact vector assessment is
confronted with differences in system design, technology and operation. In facing these differences,
the analyst may screen out events which are judged not to be relevant for the plant under
analysis, but there is no way for 'pulling in' events which are relevant but do not appear in the
event data base. Therefore, it was agreed by all participants that the parameter estimation task
should be based on an as extensive as possible data base, including events related to NPPs of
whatever design.
In the second working phase, a number of common assumptions have been adopted in order to
achieve a better comparability of the results.
Figure 5: CCF-RBE Phase I: total unavailability of feed function (start-up/shut-down and emergency feed systems). Results per team are shown as best estimate point values with ranges or 90% confidence bounds, means and medians. [Plot not reproduced.]
Figure 6: CCF-RBE Phase II: emergency feed system unavailability calculated using data from Phase I. Results are shown per team (DK, F, FRG1, FRG2, FRG3, I, S, UK, USA) as point estimates, medians, and the best value from the first phase. [Plot not reproduced.]
A calculation was then performed using a common set of CCF parameters, assessed in a consistent
way for both the MGL and the BFR models. The parameters were derived from event data out of
the Nuclear Power Experience data base (a LER based system). The results obtained by the participants
using the common parameter set are represented in fig. 7.
The comparison of the results of these two calculations indicates that, given a well defined scope
and boundary for the system to analyse, different analysts can achieve consistent results on the
condition that the data base is the same. Indeed the results of the second calculation (fig. 7)
show a very good agreement even if different parametric models were applied (the MGL method
seems to yield somewhat wider uncertainty bounds).
Given this, it follows that the spread observed in fig. 6 for the first calculation is due to the
spread in the original data (parameter values) used.
In a third calculation, the participants were asked to estimate themselves the CCF parameters for
the MGL or BFR model.
For this calculation, a set of event reports was provided to the participants. This set was the
same as the one used to assess the common parameters applied in the previous task.
The results of the parameter estimation task are summarised in fig. 8. The emergency feed
system unavailability calculated using the parameter estimates is represented in fig. 9.
Figure 7: CCF-RBE Phase II: emergency feed system unavailability calculated using the reference set of CCF parameters. Results are shown per team (DK, F, FRG1, FRG2, FRG3, I, S, UK, USA) as point estimates, medians, means and ranges or 90% confidence intervals. [Plot not reproduced.]
Again it can be observed that the parameter estimation is a key contributor to the variability
observed in CCF quantification. The major source of the variability is the subjective judgement
applied during the screening and assessment of the events: i.e. the decision whether some event
is applicable to the plant under consideration or not, and to what extent redundancies could be
affected.
In order to reduce the spread introduced in parameter estimation, it was judged helpful to have
some pre-established guidelines for performing the event screening process.
The following general guidelines were used in the CCF-RBE:
1. Component caused functional unavailabilities were screened out, since it was assumed that
those were modelled explicitly in the fault trees;
2. if a specific defense existed against a class of dependent events, events of this class
were screened out;
3. if a reported event was caused by a train interconnection which does not exist in the plant
under consideration, the event was considered as an independent event;
4. events related to inapplicable plant states (e.g. start-up or shutdown) were screened out
inasmuch as they did not reveal general CCF mechanisms capable of occurring in normal
power operation;
Figure 8: Estimated MGL parameters (β and γ) per team (DK(1), DK(2), F, FRG1, FRG2, FRG3, UK, USA) for the CCF component groups: emergency feed pump diesel aggregate, motor valves, check valves. [Numerical values not legible.]
Figure 9: CCF-RBE Phase II: emergency feed system unavailability calculated using own parameter estimates. Results are shown per team (DK, F, FRG1, FRG2, FRG3, I, UK, USA) as point estimates, medians, means and ranges or 90% confidence intervals. [Plot not reproduced.]
5. Conclusions
The CCF-RBE has been very successful in achieving an assessment of the state of the art of CCF
analysis and identifying some key contributors to the uncertainty involved in this activity. First,
it has helped to clarify the confusing terminology related to dependent events and has contributed
to a better understanding of the different categories of dependent events and their related terms. Once
these different categories were clear, it was also possible to achieve an agreement on the domain
of application of the various explicit and implicit (parametric) modelling approaches:
1. dependent events due to a clear deterministic cause, such as unavailability of support
functions, cascade failures, or human errors, should in principle be modelled explicitly in the fault
trees and event trees;
2. the residue of potential multiple failure events for which no clear deterministic cause can
be identified in the system logic, for which a multiplicity of causes such as 'environment',
'design', 'maintenance', etc. can be assumed, or which for cost reasons or impossibility
to quantify are not further decomposed, can be captured by parametric modelling.
There was a consensus among the participants about the importance of a well structured qualitative
analysis and about the necessity to link this qualitative analysis closely with the subsequent
quantification.
The CCF-RBE has shown clearly that, with the current state of the art, different analysts can
come up with rather different quantitative results of a CCF analysis, even if very strict common
assumptions on what to quantify and how are made. The CCF-RBE has demonstrated that one
of the main contributors to the spread is the subjective judgement applied in processing event
data for parameter estimation.
Last but not least, the CCF-RBE participants agreed that one should use the widest possible
event data base for estimating CCF model parameters. Indeed, the class of dependent events
that we want to cover using parametric models is linked with a very wide spectrum of causal
mechanisms. Hence, every event in whatever design might contain some new information.
6. REFERENCES
Amendola, A. (1985a). Results of the reliability benchmark exercise and the future CEC-JRC
Programme, Proc. ANS/ENS Int. Topical Meeting on Probabilistic Safety Methods and
Applications, San Francisco, Feb. 24-March 1, 1985.
Amendola, A. (1985b). Systems reliability benchmark exercise, Final Report, CEC-JRC Ispra,
EUR 10696.
Apostolakis, G. and P. Moieni (1986). On the correlation of failure rates, in: Reliability Data
Collection and Use in Risk and Availability Assessment, H.J. Wingender (ed.), Springer Verlag,
Heidelberg.
Doerre, P. (1987). Possible pitfalls in the process of CCF event data evaluation, International
Topical Conference on Probabilistic Safety Assessment and Risk Management, Zurich, August 30-
Sept. 4, 1987.
Mosleh, A. and N.O. Siu (1987). A Multiparameter, Event-based Common-cause Failure Model,
Proc. of the 9th SMiRT, Lausanne, August 1987.
Poucet, A., A. Amendola and P.C. Cacciabue (1987). Common Cause Failure Reliability
Benchmark Exercise, Final Report, CEC-JRC Ispra, EUR 11054 EN.
ANALYSIS OF CCF-DATA - IDENTIFICATION
THE EXPERIENCE FROM THE NORDIC BENCHMARK
K.E. Petersen
Systems Analysis Department
Risø National Laboratory
P.O. Box 49
DK-4000 Roskilde
Denmark
1. BACKGROUND
The Benchmark Exercise has been carried out by four working groups:
A. Amendola (ed.),
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 235-241.
© 1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
The goals of the exercise were:
- Identification of CCF's which have occurred during the operation
of the plants
Determination of what constitutes a critical failure is to be performed
independently by each institute using its own judgment, ATV's original
classification, and ASEA-ATOM's classification which constitutes
the basis of the T-book.
3. DATA SOURCES
Barsebäck 1: 771001-821231
Barsebäck 2: 790101-821231
Forsmark 1: 810101-821231
Forsmark 2: 810701-821231
Oskarshamn 1: 740101-821231
Oskarshamn 2: 760101-821231
Ringhals 1: 761001-821231
4. IDENTIFICATION PROCESS
butions. Due to incompleteness of the failure reports, subjective judgement
was used by all working groups, causing differences in interpretation
of the same sources of information.
In table 1 the main features of the approach to identification
chosen by the four working groups are summarized.
TABLE 1
FEATURE                                     ABB ATOM  RISØ   STUDSVIK  VTT
SCOPE
  intersystem dependencies                  yes       yes    no        no
  limited to redundancies                   no        no     yes       no
  LERs considered                           yes       no 1)  no 1)     no 1)
BOUNDING CONDITIONS
  separate treatment of overhaul periods    yes
  acceptance of failure criticality
    according to the Reliability Data Book  yes       yes    no        yes 4)
  distinction between potential
    and actual CCFs                         yes 5)    yes    yes       no
APPROACH
  computerized analysis                     yes       no     no
[Footnotes 1), 4), 5) not reproduced.]
Table 2: Identified multiple failure events. Columns: Event No, Plant, Components, Date, Comments, Event description, and classification by ASEA-ATOM, Risø, Studsvik and VTT. [Entries not reproduced.]
4.3. Screening for Applicability
The list of CCF's presented in table 2 was agreed as a common data base.
The list was screened by each working group independently. The screening
was performed with the following purposes:
5. CONCLUSIONS
- interviews with plant personnel
- maintenance logs.
REFERENCES
SOME COMMENTS ON CCF-QUANTIFICATION
THE EXPERIENCE FROM THE NORDIC BENCHMARK
Kurt Pörn
Safety and System Analysis
Studsvik AB
S-611 82 Nyköping
Sweden
The Benchmark Exercise has been carried out by the following groups:
Event No  Plant        Components  Date    Comments   Event description  ASEA-ATOM   Risø            Studsvik        VTT
3         Barsebäck 2  322V1       810907  r,-,R,FH   Wrong connections  actual CCF  not identified  not identified  not identified
                       322V2       810907  r,-,R,FH
                       322V3       810907  r,-,R,FH
The following sections will show a great variety in data
screening and selection, in the selection of CCF models, and in the
parameter estimation methods among the four teams participating in
the exercise. Thus the basic purpose of learning by the Benchmark
Exercise was fulfilled.
2. PARAMETRIC MODELS USED
For a 4-redundant system, the probabilities P_{k/4} that exactly k
components fail can be expressed in the basic parameters λ_k (the
failure probability of one specific group of k components):
P_{1/4} = 4 λ_1,
P_{2/4} = 6 λ_2 + 6 λ_1^2,
P_{3/4} = 4 λ_3 + 12 λ_1 λ_2 + 4 λ_1^3,
P_{4/4} = λ_4 + 4 λ_1 λ_3 + 3 λ_2^2 + 6 λ_1^2 λ_2 + λ_1^4.
These relations, combined with Table 2, give us the possibility to
express the probabilities P_{k/m} also in terms of the parameters of the
other models treated in this paper.
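The expressions for P_{k/4} can be checked numerically. The coefficients coded below follow from combinatorial counting (number of ways to choose the failed set and partition it into shock groups); since the printed equations are only partially legible, treat the exact forms as reconstructions.

```python
def p_exactly_k_of_4(l1, l2, l3, l4):
    """Probabilities that exactly k of 4 components fail, in terms of the
    basic-parameter rates l_k (probability that one specific group of k
    components fails together). The coefficients count the choices of the
    failed set and its partitions into shock groups.
    """
    return {
        1: 4*l1,
        2: 6*l2 + 6*l1**2,
        3: 4*l3 + 12*l1*l2 + 4*l1**3,
        4: l4 + 4*l1*l3 + 3*l2**2 + 6*l1**2*l2 + l1**4,
    }

# With no common cause terms the independent case is recovered:
p = p_exactly_k_of_4(1.0e-2, 0.0, 0.0, 0.0)
# p[2] == 6 * (1e-2)**2, p[4] == (1e-2)**4, etc.
```

A quick sanity check of this kind (setting the multiple-failure rates to zero) is a useful guard against transcription errors in the coefficients.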
The total failure rate λ_c of a single component is
λ_c = Σ_{j=1}^{m} C(m-1, j-1) λ_j, (1)
which for m = 4 yields
λ_c = λ_1 + 3 λ_2 + 3 λ_3 + λ_4. (2)
According to the definitions of β, γ and δ we can write
β = (3 λ_2 + 3 λ_3 + λ_4) / (λ_1 + 3 λ_2 + 3 λ_3 + λ_4), (3)
γ = (3 λ_3 + λ_4) / (3 λ_2 + 3 λ_3 + λ_4), (4)
δ = λ_4 / (3 λ_3 + λ_4). (5)
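The relations between the BP rates and the MGL parameters of a 4-redundant system can be sketched directly in code. The expressions are reconstructions of the partially legible equations, so verify them against the original source before reuse; the numerical rates are illustrative only.

```python
def mgl_from_bp_4(l1, l2, l3, l4):
    """MGL parameters of a 4-redundant system from the BP rates l_k."""
    lc = l1 + 3*l2 + 3*l3 + l4                 # total per-component rate
    beta = (3*l2 + 3*l3 + l4) / lc             # fraction involving others
    gamma = (3*l3 + l4) / (3*l2 + 3*l3 + l4)   # given >=2, fraction >=3
    delta = l4 / (3*l3 + l4)                   # given >=3, fraction of 4
    return lc, beta, gamma, delta

# Illustrative rates only:
lc, beta, gamma, delta = mgl_from_bp_4(1.0e-3, 1.0e-4, 2.0e-5, 1.0e-5)
# beta * lc recovers the common cause part 3*l2 + 3*l3 + l4.
```

The nesting of the factors (each conditioning on the previous multiplicity) is what makes the MGL parameters share statistical evidence, as noted elsewhere in these proceedings.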
by one or more additional components is not equal to the conditional
rate of multiple failure given a failure, which in the present case is
(6 λ_2 + 4 λ_3 + λ_4) / (4 λ_1 + 6 λ_2 + 4 λ_3 + λ_4). (6)
The numerator of eq. (6) denotes the rate of multiple failures and the
denominator stands for the rate of failures in general, without
specifying which components are involved. Ref. 7 also explains very clearly
how the likelihood function normally used for the estimation of MGL
parameters is incorrectly stated with regard to the evidence of component
failures. The use of that function and the number of component failures
leads to overconfidence in the estimate, and in certain situations also
to an underestimation of the mean value. In light of these difficulties
one may consider the development of the Alpha Factor model described
elsewhere in these proceedings.
The MGL method presupposes the knowledge of
The simple relations between these parameters and the parameters of the
BP model are shown in Table 2. The parameters of the BFR model are the same
independent of the number of components in the system. This feature is
a clear advantage compared to the other models presented here, where
the number of parameters increases with the redundancy level of the
system. Thanks to this feature one could make the reasonable assumption
that the BFR parameters really are independent of the redundancy level.
One of the Benchmark teams made this assumption and thereby avoided the
"mapping up" problem of transferring the lower redundancy data to
systems of higher redundancy. The maximum likelihood estimators developed
for utilizing data from systems of different redundancy led to the
results labelled BFR I in Fig. 1.
A direct estimation of the BFR parameters above requires observation
of independent failures, nonlethal and lethal shocks. In practice
this requirement is hard to fulfil; in particular it is hard to distinguish
between independent failures and single failures caused by shocks. Another
difficulty in our Exercise was the definition of redundant components: all
valves in the same system or strictly redundant groups of valves. Therefore
different interpretations were applied in the Exercise. The estimates
of ν and p have shown to be sensitive to the degree of redundancy
(m), approximately following the relation m·p = constant. There are
also cases where an increased number of visible nonlethal shocks leads
to a lower value of p.
The assumption of the binomial distribution in the BFR model may
be too strong. Thanks to this assumption of a certain distribution,
given that a nonlethal shock has occurred, the number of parameters keeps
constant for all levels of redundancy. Giving up this assumption leads
to the model described in the next section.
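As a sketch of the BFR structure discussed above (this is the standard textbook formulation, assumed here; the exact variant applied in the Exercise is not reproduced):

```python
from math import comb

def bfr_event_rates(lam, nu, p, omega, m):
    """Rate of events in which exactly k of m components fail under the BFR
    model: independent failures at rate lam per component, nonlethal shocks
    at rate nu that fail each component with binomial probability p, and
    lethal shocks at rate omega that fail all m components.
    """
    rates = {k: nu * comb(m, k) * p**k * (1 - p)**(m - k)
             for k in range(1, m + 1)}
    rates[1] += m * lam      # independent single failures
    rates[m] += omega        # lethal shocks
    return rates

# Illustrative parameter values only:
r = bfr_event_rates(lam=1.0e-3, nu=1.0e-4, p=0.3, omega=1.0e-5, m=4)
```

Note that the parameter set (lam, nu, p, omega) does not grow with m, which is exactly the advantage over the MGL and BP models noted in the text.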
μ = occurrence rate of shocks that may fail more than one component,
with the fractions f_j (the probability that such a shock fails exactly j
components) satisfying
f_1 + f_2 + f_3 + f_4 = 1.
Thus the three free fractions and the rates λ and μ constitute the
parameters of the MFR model for a 4-redundant system. The relations of
these parameters to the parameters of other models are shown in Table 2.
As described in Ref. 7, the parameters above can be estimated easily
by Bayesian statistics if we know the number of system demands (N),
the number of independent failures (n_0) and the number of potential or
actual CCFs in which j components fail (n_j). The name of this model
refers to the multinomial likelihood function for the fractions f_j. In
our Benchmark Exercise there was only one MFR application, the result
of which, unfortunately, is not correct because of a misinterpretation
of the observations.
In section 3 we mentioned the overconfident estimates normally
used for the MGL parameters. As described in Ref. 7 it is much easier
to define consistent Bayesian estimators of the MFR parameters, based
on Beta and Dirichlet distributions as prior distributions. The
multivariate posterior distribution of the MFR parameters could then be
transformed to the corresponding distribution of the MGL parameters according
to the relations between these parameters, shown in Table 2.
The recently developed Alpha Factor Model, described elsewhere in
this publication, can be considered as a simplified version of the MFR
model, where the simplification consists of the fact that no distinction
is made between independent failures and single failures caused by CCF
shocks. The alpha factors have a multinomial likelihood and are
therefore easy to estimate by a posterior distribution. A posterior
distribution of the MGL parameters, if that distribution would be required for
reasons of comparison, could then be obtained from the relations in
Table 2, or explicitly
β = (2 α_2 + 3 α_3 + 4 α_4) / (α_1 + 2 α_2 + 3 α_3 + 4 α_4),
γ = (3 α_3 + 4 α_4) / (2 α_2 + 3 α_3 + 4 α_4), (7)
δ = 4 α_4 / (3 α_3 + 4 α_4).
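The transformation from alpha factors to MGL parameters for a 4-redundant system is easy to apply in practice; the alpha values below are purely illustrative, not estimates from the exercise.

```python
def mgl_from_alpha_4(a1, a2, a3, a4):
    """MGL parameters of a 4-redundant system from alpha factors a_k
    (a_k = fraction of failure events involving exactly k components).
    """
    beta = (2*a2 + 3*a3 + 4*a4) / (a1 + 2*a2 + 3*a3 + 4*a4)
    gamma = (3*a3 + 4*a4) / (2*a2 + 3*a3 + 4*a4)
    delta = 4*a4 / (3*a3 + 4*a4)
    return beta, gamma, delta

# Illustrative alpha values:
beta, gamma, delta = mgl_from_alpha_4(0.95, 0.03, 0.015, 0.005)
# The common denominator a1 + 2*a2 + 3*a3 + 4*a4 is the alpha_t quantity.
```

Because the alpha factors have a simple multinomial likelihood, a posterior sample of (a1, ..., a4) can be pushed through this function to obtain a posterior distribution of the MGL parameters, as suggested in the text.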
This assumption is quite fundamental and means that the failure
probability of k specific components is not dependent on the total number of
redundant components of the system. Between the probabilities P_k and
the probabilities of interest in this study, P_{k/m}, we have the relation
P_{k/m} = C(m, k) P_k.
TABLE 2: Relations between the parameters of the various CCF models (BP, MGL, BFR, MFR, ADDEP, ALPHA). [The table is not legible in this reproduction.]
with

α_t = α₁ + 2α₂ + 3α₃ + 4α₄.        (8)
In the ADDEP model, which is of a purely mathematical nature, the
probabilities P_k, k = 1, 2, 3, 4, are expressed in terms of additive
dependency factors D_k as follows:

P₁ = P₁

P₂ = P₁² + D₂        (9)

P₃ = P₁³ + 3D₂P₁ + D₃

P₄ = P₁⁴ + 4D₃P₁ + 3D₂(D₂ + 2P₁²) + D₄
Equations (9) are easily extendable to higher multiplicities.
Approximate relations between the probabilities and the BP parameters
are shown in Table 2.
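Equations (9) can be coded directly; a minimal sketch (the numerical value of P₁ and the D_k below are invented for illustration; with all D_k = 0 the model reduces to independence, P_k = P₁^k):

```python
def addep_probs(p1, d2, d3, d4):
    """Multiple failure probabilities from the additive dependency
    factors D_k of eq. (9)."""
    p2 = p1**2 + d2
    p3 = p1**3 + 3*d2*p1 + d3
    p4 = p1**4 + 4*d3*p1 + 3*d2*(d2 + 2*p1**2) + d4
    return p2, p3, p4

# Independence check: all D_k = 0 must give p_k = p1**k
assert addep_probs(0.01, 0, 0, 0) == (0.01**2, 0.01**3, 0.01**4)

# A dependent case with assumed dependency factors
p2, p3, p4 = addep_probs(0.01, 1e-5, 1e-6, 1e-7)
```

The independence check makes the role of the D_k explicit: they are exactly the excess over the purely random multiple failure probabilities.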
In our Benchmark Exercise an application of the ADDEP model was
performed, where the probabilities P_k were estimated using MLE (see
Fig. 1). The same team had also applied the Bayesian method to estimate
the multiple failure probabilities P_{k/m}, starting with both a
noninformative and an informative prior distribution. This technique is
described in more detail in the next section. In principle, one could
have gone further and calculated the multivariate posterior distribution
of the ADDEP probabilities by using eq. (8).
According to their definition, the probabilities P_{k/m} are directly
applicable for the calculation of the minimal cut set probabilities. The
fundamental assumption of P_k being independent of the redundancy level
m justifies, in principle at least, a simple pooling of data for
different redundancies.
7. DIRECT ASSESSMENT OF A NONPARAMETRIC MODEL
Under the Dirichlet prior the expected values are

E[P_{i/m}] = α_i / α        (12)

and the covariances, which are always negative, are

Cov(P_{i/m}, P_{j/m}) = − α_i α_j / (α²(α + 1)),        (13)

where α = Σ_i α_i.
Given the observed numbers n_i of events of multiplicity i, the
posterior estimate becomes

P̂_{i/m} = (α_i + n_i) / (α + Σ_j n_j).        (15)
However, one can raise the question whether we really know so
little a priori. A reasonable "engineering judgement", or, according
to (12), an α parameter chosen accordingly, could possibly help us to
choose the other α_i, i = 2, ..., m. From the prior distribution of the
application DA II (Fig. 1; α₀ = 4606, α₁ = 109, α₂ = 5, α₃ = 1 and
α₄ = 0, giving α = 4721) we get the following
set of correlations between P₀/₄ and the other probabilities:
ρ(P₁/₄, P₀/₄) = −0.97

ρ(P₂/₄, P₀/₄) = −0.21        (20)

ρ(P₃/₄, P₀/₄) = −0.09

ρ(P₄/₄, P₀/₄) = −0.06
If these correlation coefficients could, as a first guess, be
considered typical for CCFs of motor-operated valves, they could guide
the analyst in choosing the parameters of the prior Dirichlet
distribution.
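The correlation coefficients in (20) follow directly from the Dirichlet moments (12)-(13); a sketch using the DA II prior parameters as read from the (partly damaged) scanned text:

```python
import math

def dirichlet_corr(alpha, i, j):
    """Correlation of P_i and P_j under a Dirichlet prior with
    parameters alpha. From eqs. (12)-(13):
    rho = -sqrt(a_i*a_j / ((a - a_i)*(a - a_j))), with a = sum(alpha)."""
    a = sum(alpha)
    return -math.sqrt(alpha[i] * alpha[j] / ((a - alpha[i]) * (a - alpha[j])))

# DA II prior parameters for the categories 0/4 .. 4/4 (as reconstructed)
alpha = [4606, 109, 5, 1, 0]
rho_10 = dirichlet_corr(alpha, 1, 0)   # close to -0.97
rho_20 = dirichlet_corr(alpha, 2, 0)   # close to -0.21
```

Note that the correlations depend only on the relative sizes of the α_i, which is why they can serve as a reusable "engineering judgement" across similar component groups.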
8. SUMMARY
[Fig. 1: comparison of the results of the various applications (MGL,
DA, ADDEP, BFR, MFR) on a logarithmic scale from 10⁻¹ down to 10⁻⁵.]
REFERENCES

6. Atwood C. L., Estimators for the Binomial Failure Rate Common Cause
Model, NUREG/CR-1401, prepared for USNRC by EG&G Idaho, Inc., April
1980.

9. Dinsmore S., Pörn K., CCF Quantification from Plant Data, Report
Studsvik/NP-86, Studsvik Energiteknik, Sweden, April 1986.
ANALYSIS OF COMMON CAUSE FAILURES BASED ON OPERATING EXPERIENCE
POSSIBLE APPROACHES AND RESULTS
T. MESLIN
EDF/SPT
3, rue de Messine
75384 PARIS Cedex 08
FRANCE
1. GENERAL REMARKS
A. Amendola (ed.),
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 257-276.
© 1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
A summary table gives the characteristics of the applied methods.
- Quantification difficulties
2. ANALYSIS OF COMMON CAUSE FAILURES BASED ON OPERATING EXPERIENCE
RECORDS
2.1. Objectives
Two French data sources are available for this assessment, but neither
is adapted to the quantitative assessment of common cause failures
(nor, as a matter of fact, is the N.P.R.D.S.).
The analyses are thus first aimed at determining whether a quali
tative and quantitative assessment of common cause failures is feasible
considering the different data sources now available.
Top priority is therefore given to two aspects :
The amount of data, i.e. the extent of the operating experience
required for the purpose of quantification.
The nature of the information content of the event reports (event
sheets or S.R.D.F. and N.P.R.D.S. sheets) required to identify and
characterize a common cause failure.
This attractive and ambitious characterization is tested against
available data. One major problem here is to have sufficiently large
samples to make the characterization meaningful.
2.1.2.3. Quantification
2.2. Method
Instrumentation
. measurement transmitters
. all-or-nothing sensors
. analog measuring channels and, in particular, protection channels
. process control channels
AFS pumps
. the pump itself
. the electric motor
. the turbine
. the lubrication and cooling devices
. the directly connected controls
LPSI pumps
. the pump itself
. the electric motor
. check valves
. manually-operated valves
Generally, all the events recorded in French 900 MWe PWR units since
their commissioning are taken into account.
The operating experience of 1300 MWe power plants was too limited
when the study was undertaken and was, therefore, not used in these
analyses.
When American data are used, they concern PWR plants in general
and, most often, only those built by Westinghouse.
2.2.2.1. Nature
- The N.P.R.D.S.
Wherever possible, failure rates have been computed for some equipment
from the N.P.R.D.S. in order to compare the results obtained.
For one type of equipment, the AFS pumps, all the available
N.P.R.D.S. sheets have been analyzed and transcribed into the S.R.D.F.
format using the relevant codes.
. Event File
All the French 900 MWe PWR power plants are taken into account.
The time span studied goes from the commissioning of each unit to
July 1st, 1985 in general, and to December 31st, 1985 for the most
recent studies.
The study thus covers :
9 power plant sites (Fessenheim, Bugey, Tricastin, Gravelines,
Dampierre, Blayais, Saint-Laurent B, Chinon, Cruas)
32 units
1 087 790 hours in total (124 reactor-years), that is
830 576 hours of operation with the unit connected to the grid.
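The totals quoted above are mutually consistent; a quick arithmetic check (8766 hours per average calendar year is an assumption of the check, not a figure from the study):

```python
hours_total = 1_087_790        # cumulative hours over the 32 units
hours_per_year = 8766          # average calendar year (365.25 days), assumed
reactor_years = hours_total / hours_per_year
print(round(reactor_years))    # about 124, as quoted in the text
```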
. S.R.D.F.
The sample consists of the Fessenheim, Bugey and CP1 CP2 power
plants.
The time span considered goes from each plant's commissioning to
December 31st, 1985 for the most recent investigations.
As it takes a long time to incorporate the failures into the
S.R.D.F. after they have occurred, a relatively low number of 1985
events have been taken into account.
. N.P.R.D.S.
Computation Method
. Several components are involved in the event, but the failure is
either not catastrophic or not simultaneous.
group of k components out of m (k < m)
The table below gives the number of events considered in the various
equipment studies on common cause failure, according to the data base
they belong to. Marked entries indicate that the values of the
reliability parameters given in the data bases were used for
comparison's sake.
[Table: number of events considered, by data base (Event File,
S.R.D.F., N.P.R.D.S., Total); the individual entries are not legible
in the source.]

As can be seen, more than 1 200 events have been analyzed one after the
other in the framework of this project.
A total of 100 failures or events among those examined have been
regarded as common cause failures. These events may be broken down as
shown below.

                     number of events
                    EF    SRDF   TOTAL
INSTRUMENTATION     16     -      16
VALVES              47     -      47
AF PUMPS             6     5      11
CS PUMPS             -     2       2
SI PUMPS             -     9       9
CIRCUIT BREAKERS     -    15      15
TOTAL               69    31     100
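As a cross-check, the row and column totals of the breakdown above can be verified programmatically (the counts are transcribed from the table; the placement of single-source rows in the EF or SRDF column is inferred from the column totals):

```python
# CCF events per equipment class, as (Event File, S.R.D.F.) counts
events = {
    "instrumentation":  (16, 0),
    "valves":           (47, 0),
    "AF pumps":         (6, 5),
    "CS pumps":         (0, 2),
    "SI pumps":         (0, 9),
    "circuit breakers": (0, 15),
}
ef_total = sum(ef for ef, _ in events.values())
srdf_total = sum(s for _, s in events.values())
print(ef_total, srdf_total, ef_total + srdf_total)  # 69 31 100
```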
The observed failures can be distributed among the large common cause
failure categories as follows :

                                Instrumen-  AFS  SIS-CSS  Valves  Circuit-  Total
                                tation                            breakers
Environmental effects
Design errors
 - unfit component                  0        3      0        1       0        4
 - system layout                    1        0      7        0       1       10
 - inadequate periodic testing      0        0      0        0       0        0
 - loss of auxiliary supporting
   system                           0        0      0        3       0        3
Manufacturing errors
Unknown                             0        0      0       20       0       20
TOTAL                              16       11     11       47      15      100

[The figures of the remaining rows are not legible in the source.]

As can be seen in the total sample, the various common cause failures
are rather evenly distributed among the main failure categories :
- 20 % environmental effects
- 20 % design errors
- 15 % manufacturing defects
- 30 % human errors.
Environmental effects
- The cold wave recorded during the 1985 winter affected the
  instrumentation (frozen sensors) and various components of the AFS
  (piping and lube oil pumps).
- The operating conditions of the moisture-separator-reheaters
  systematically affected the level measurements in these tanks.
- The presence of steam and humidity in the main steam system valve
  compartments impaired some sensors.
Design errors
- The Fessenheim plant LPSI pumps have long suffered from the moisture
  build-up caused by the SG blowdown system.
- The defective design of the AFS pump discharge lines in some
  Westinghouse power plants in the USA has, on several occasions,
  resulted in the loss of the AFS pumps (so-called "backleakage"
  phenomena).
Operating conditions
- Vibration of all the CS pumps in the CP1 and CP2 standard units.
- Isolation valve jamming in the RCS-RHRS systems due to an internal
  thermal effect.
Human errors
[Table (partly illegible): common cause failure probability estimates
compared across sources (Event Reports, SRDF, NPRDS, EDF/CEA, EPRI,
NUREG). Recoverable entries include: AFS pumps 1.5·10⁻² to 6.8·10⁻²;
CSS pumps 5·10⁻² to 6·10⁻²; SIS pumps 4·10⁻² to 1.7·10⁻¹;
instrumentation 5·10⁻² and 2·10⁻²; electrically-operated valves 6·10⁻²
and 2·10⁻²; air-operated valves 7·10⁻² and 2·10⁻²; check valves
(leaking) 10⁻¹ and 5·10⁻²; check valves (stuck) 3·10⁻²; AFS pumps
5·10⁻², 2·10⁻² and 5·10⁻³; CSS pumps 5·10⁻² and 2·10⁻²; SIS pumps
5·10⁻² and 2·10⁻².]
3. ANALYSIS OF COMMON CAUSE FAILURES IDENTIFIED FROM ON-SITE
INVESTIGATIONS
3.1. Objectives
The objectives set for this study were twofold. They are related, on
the one hand, to the PSA of 1300 MWe power plants and, on the other, to
the analysis of plant safety during operation.
- As regards the PSA of 1300 MWe power plants, the purpose is to make
sure that actual operation is taken into account by spotting problems
that may have been overlooked during the design.
- Regarding safety, the objective is to put to use the lessons of
  these investigations to improve the safety level of plant operation
  activities by correcting possible design or organization deficiencies
  and by making sure these problems have actually been taken into
  account in the operating experience.
3.2. Methodology
                             Nb.    %
Environmental effects
Design errors
 - unfit component            2      9
 - system layout              4     18
Manufacturing errors
 - manufacturing defects      3     14
 - insufficient controls      0      0
Assembly errors               0      0
Maintenance errors            6     28
Operation errors              3     14
Unknown                       0      0
TOTAL                        19    100

[The figure for the Environmental effects row is not legible in the
source.]
Such a study is undertaken at a centralized level by the Direction
de l'Equipement and by the Service de la Production Thermique. The
resulting alterations are introduced in the operating units and taken
into account in the design of the future units.
Concerning manufacturing defects, the replacement of defective
components can be rapidly decided upon and extended to all the
concerned units.
[Event-tree figures (largely illegible): (1) loss of both circulating
pumps, leading to loss of condenser, reactor scram, turbine trip,
opening of at least one SG relief valve and its reclosing, with
stabilization by steam dump to atmosphere; recorded and potential
consequences, including main steam line break. (2) Loss of all lube
oil pumps, with scram if the turbine is operating, possible SG relief
valve decalibration, and SG pressure reduction by the manually
operated steam relief system.]
- A relatively simple modification consists in having each lube oil
pump powered by a busbar in trains A and B. This modification was
studied and then implemented in the Paluel, Flamanville and St-Alban
operating units. It is incorporated in the design of 1300 MWe units
of the next series (Belleville, Nogent...).
- The following remarks sum up the lessons derived from the work
  performed by EDF using these two approaches.
identical lines is not efficient, as it results in only a rather
small gain in terms of reliability. On the other hand, compliance
with the basic quality assurance rules may be an appropriate
measure against human-induced common cause failures.
Quantification is possible.
As regards common cause failure quantification, the French operating
experience is now sufficient for a general study of common cause
failures. Moreover, this approach is made possible by the quality of
the data bases used (Event Files and S.R.D.F.) : exhaustiveness,
homogeneity, easy handling...
REFERENCES
/2/ C. L. ATWOOD,
"Estimators for the Binomial Failure Rate Common Cause Model"
NUREG/CR-1401, EGG-EA-5112, April 1980
/5/ T. MESLIN
"Analyse et quantification des défauts de cause commune - Synthèse
des études"
EPS DCC 008 - EDF/SPT D544 - SN 87/095
/9/ T. MESLIN
"Common cause failure analysis and quantification on the basis of
the operating experience"
Probabilistic Safety Assessment and Risk Management
PSA '87, Zurich, 30 August - 4 September 1987.
APPENDIX 1
1. ENVIRONMENTAL EFFECTS
. air crash ;
. dam failure-induced flood ;
. explosion ;
. fire.
- External natural environment
2. DESIGN ERRORS
3. MANUFACTURING ERRORS
4. ASSEMBLY ERRORS
5. HUMAN ERRORS
MULTIPLE RELATED FAILURES FROM THE NORDIC OPERATING EXPERIENCE
K. U. Pulkkinen
Technical Research Centre of Finland
Electrical Engineering Laboratory
Otakaari 7
SF-02150 Espoo
Finland
1. INTRODUCTION
A. Amendola (ed.),
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 277-288.
© 1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
TABLE I. Identified multiple failure events
[Table columns: Event No., Plant, Components, Date, Comments, Event
description, and Status as assessed by ASEA-ATOM, Risø, Studsvik and
VTT; the entries are not legible in the source.]
2. THE MULTIPLE RELATED FAILURES OF MOTOR OPERATED VALVES
to a higher number of independent multiple failures, if the standby
failure rate of internal leakages is high.
The number of double internal leakages was 7. In addition to this
number one combination of internal and external leakage was found. As
stated earlier, the high number of multiple internal leakages may be
caused by high standby failure rate of these failures and by long test
intervals (the leakage tests are performed once a year). The cause of
internal leakages could not be exactly determined from the information
in the failure reports, and further, the size of the leakage was not
known. Thus we may conclude that some of the internal leakages may be
noncritical and that they may not be dependent failures. However, the
rather high number of these failures can be judged as evidence of
dependence.
Another type of multiple event involved difficulties in opening
or closing the valves. This failure category included 7 events.
Four of these events were connected to erroneous torque switch
settings which caused too early torque switching and prevented the
opening or closing of the valve. These failures were often caused by
erroneous maintenance actions and are thus real common cause failures.
Their frequent occurrence gives rather strong evidence for this
conclusion.
One of the opening/closing failures was a triple failure of
valves, which prevented the closing of three valves in a real demand
situation. The cause of this event was a human error in the
maintenance of the control system of the valves: erroneous connection
of a jumper prevented the flow of the valve control signals. This
failure is not a common cause failure, because it was caused by a
failure in the support system of the valves. However, not all of the
minor support systems are explicitly modelled in reliability studies.
This gives a reason to include the above failure in the common cause
failure database as a real common cause failure. On the other hand, a
different design would prevent failures of this kind, and their
inclusion in the CCF database may lead to overconservatism.
One of the multiple opening/closing failures was a quadruple
noncritical failure causing sticking of valves. The operation of the
valves was not prevented. Owing to its high multiplicity this kind of
failure is always very important. With failure rates as low as those
in Swedish nuclear power plants, the purely random occurrence of a
quadruple noncritical failure is extremely unlikely, and thus it is
probable that the cause of this event was connected to some dependent
phenomenon. The cause of this event could not be analysed on the basis
of the failure report. It is possible that the event was only a
preventive maintenance action.
One of the events in the above category was a double failure to
open, the cause of which was not stated clearly in the failure report.
It is, however, probable that the cause was connected to the setting
of the torque switches.
The remaining two of the multiple events were connected to
simultaneous maintenance actions during the overhaul period, and they
cannot be classified as common cause failures of the kind possible
during normal operation.
TABLE II. Plants and their diesel generators
failures (i.e. failures preventing the safety function of diesel
generator) and 371 noncritical or minor failures. The failures were
observed in start tests, load test, real demands or in routine
inspections. The 65 critical failures included two multiple failures.
The failures were classified according to their cause as
dependent and random failures. The failures were classified dependent
when they were caused by some mechanism which could have been common
for all redundant units. The dependent failures were caused by errors
in testing and maintenance, design errors, manufacturing and
installation errors, or by external events. The failures were
classified random when they occurred without any unexpected cause or
when they were caused by normal ageing. The above classification is
somewhat vague due to imprecise failure descriptions and it is always
based on engineering judgements.
The cause of a dependent failure is defined as "design error" if the
failure is caused by improper design and if the design is changed
after the occurrence of the failure. Thus design errors cannot recur
many times.
The failures caused by errors in manufacturing or installation
are also non-recurring because the cause is removed from the system
after failure detection. The failures caused by faulty materials,
faulty manufacturing and unsatisfactory installation are classified
into this category.
Fragment of TABLE III (number of failures / share of the 65 critical
and 371 noncritical failures, respectively):

                                 Critical      Noncritical
Errors in testing
and maintenance                 17 / 26.2%     37 / 10.0%
Errors in manufacturing
and installation                 6 /  9.2%     34 /  9.1%
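The percentages in the fragment above are shares of the 65 critical and 371 noncritical failures quoted earlier; a quick check:

```python
critical_total, noncritical_total = 65, 371

def pct(n, total):
    """Percentage share, rounded to one decimal as in the table."""
    return round(100 * n / total, 1)

share_test_crit = pct(17, critical_total)      # 26.2, as in the table
share_test_nonc = pct(37, noncritical_total)   # 10.0
share_manu_crit = pct(6, critical_total)       # 9.2
```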
The failures caused by human errors in testing, maintenance and
repair are recurring dependent failures. The failures caused by
external events are also recurring. Sometimes it is very difficult to
distinguish between the above failure categories, and not all the
conclusions based on this classification can be generalised.
The results of the failure classifications are given in Table III. The
most dominant failure category is random failures. However, it is
worth noticing that about one half of the critical failures are caused
by
dependent causes. The human errors in maintenance and testing are the
dominant dependent failure cause. The random failures occurred in all
diesel generator subsystems and they were mainly minor leakages or
failures of electrical devices.
3.2.2 The failures caused by human errors. The human errors in testing
and maintenance were rather frequent and they were thus studied in
more detail. They were further classified into four main categories:
faulty handling, faulty auxiliary activities, omissions and general
carelessness.
The category "faulty handling" refers to errors that originate in
active manipulation of the component; it includes erroneous
actions, imprecisions, careless handling of components, etc. This
category is divided further into three subcategories according to the
phase during which the error has been made: testing, maintenance and
calibration.
The category "faulty auxiliary activities" refers to failures
caused by human errors made by personnel that does not normally handle
the component. These errors could possibly be prevented by adequate
communication or tests after auxiliary activities.
The failures caused by omissions form a category rather similar
to faulty handling. Instead of faulty handling of the component there
are omissions of some important phases of the work. These failures
could be prevented by careful restoration of the component after each
operation.
The category "general carelessness" refers to failures in
connection with which there are symptoms of improper care of the
component, such as dirty parts, loose connections, etc. These failures
might, to some extent, be attributed to inadequate inspection of the
status of the component during the test.
The results of the human error classification are given in Table IV.
The dominant failure category is "faulty handling". It is worth
noticing that one of the double common cause failures was caused by a
human error in the category "faulty handling in testing". The other
human error categories caused about 30% of the human-related failures.
The distribution of the human error events was rather similar for both
critical and noncritical failures.
TABLE IV. Classification of human errors in testing and maintenance

                          Critical   Noncritical   Total
Faulty handling
 - in testing                 4           2           6
 - in maintenance             4          10          14
 - in calibration             3          15          18
Faulty auxiliary
activities                    1           -           1
Omissions                     1           2           3
General carelessness          4           8          12
Total                        17          37          54
The first of the above common cause failures is a genuine common cause
event caused by erroneous testing. After detection of this failure the
procedures were changed in order to prevent the occurrence of this
failure.
The second event is a failure of a supporting system which
leads to unavailability of two diesel generators. It is not an actual
common cause failure, and it can be included in the fault tree of the
diesel generator system as an independent single failure. It is,
however, doubtful whether this kind of single failure would be taken
into account in fault trees having the typical level of detail.
In addition to the above critical common cause failures, some
events involving multiple noncritical failures have been reported.
These events have not caused multiple unavailabilities of diesel
generators, but there are indications of the risk that redundant
diesel generators may be simultaneously disconnected for repair of
noncritical failures due to lack of communication between testing and
maintenance groups.
PSAs this kind of work has been started. The work done is, however,
insufficient to be used as a basis for a CCF database.
5. CONCLUSIONS
REFERENCES
4. Dinsmore, S., Common Cause Events Identification from Plant Data.
Final Report, RAS470(85)11 (Studsvik Report NR85120), February
1986.
THE USE OF ABNORMAL EVENT DATA FROM NUCLEAR POWER
REACTORS FOR DEPENDENT FAILURE ANALYSIS
H.W. Kalfsbeek
Commission of the European Communities
Joint Research Centre
21020 Ispra (Va)
Italy
1. INTRODUCTION
The Joint Research Centre (JRC) of the Commission of the European Communities
initiated in 1978 a project for creating an information system on the operational
experience of nuclear power plants (NPPs), the European Reliability Data System (ERDS).
After a feasibility study, the design of this system started in 1980 [] and since 1984
most of it has been operational.
The ERDS has been set up as an integral tool for the feedback and use of past
operating experience in the assessment of the safety and operation of NPPs by various
interested organizations such as utilities, constructors and safety authorities.
A. Amendola (ed.),
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 289-302.
© 1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
The system is structured into four separate databanks, three of which contain
event data, the fourth one being dedicated to the organization of reliability parameter
data. The three event databanks are:
- Operating Unit Status Reports (OUSR), collecting data on productivity and outages
of the NPPs;
- Abnormal Occurrences Reporting System (AORS), collecting information on safety
related events in NPPs;
- Component Event Data Bank (CEDB), collecting information on the failure and
operation of safety significant components in some NPPs.
The fourth bank, of a different nature and not yet fully implemented, is the:
- Reliability Parameter Data Bank (RPDB), collecting reliability parameter data for
homogeneous classes of components, deriving from operational experience, labora-
tory tests, literature etc.
These databanks have logical couplings; in addition, two other databanks con-
taining reactor unit design data are connected to this system, namely the Power
Reactor file, one of the databanks in the Power Reactor Information System (PRIS) of
the International Atomic Energy Agency in Vienna, and the Engineered Safety
Features and Auxiliary Systems (ESFAS) file. The latter databank contains information
on the philosophy and layout of the emergency systems and their support systems in
the NPPs. These data are important for assessing the severity of the abnormal events
stored in the AORS.
In the present paper only the AORS is briefly presented and discussed. Descrip-
tions and presentations concerning the other parts of the ERDS can be found else-
where; see for instance [2], [3] and [4]. Also, no detailed description will be given here,
since this information can be found in the AORS Handbook [5]. Section 2 has been
added because it is useful for understanding the content of the paper.
The AORS collects information on safety-related abnormal events from NPPs in
Europe and the USA. The data input for the system mainly originates from the various
national reporting organizations, in different languages and with different reporting
schemes and criteria; it consists in general of all the safety-related operational events
which occurred in the participating plants and which by law have to be reported to the
competent safety authorities in the respective countries. In the AORS all this informa-
tion is merged together [6], homogenized both in language (English) and content,
codified and stored in a databank which is accessible through the international data
communication networks.
The main theme of the present paper is a search procedure for identifying depen-
dent failure cases, given as an illustrative example of the AORS potential for use and
analysis. In section 3 an outline of this procedure is given. This type of data retrieval
aims to support the incident or PSA analyst in upgrading the completeness of his
insights and system models by exploiting past operating experience to the maximum
extent.
In section 4 an example application of the procedure is described, where the
subject of investigation is the mechanism(s) leading to multiple failures of breakers in
power distribution systems and electric power systems in any type of reactor design.
The impact of such failures on safety-grade systems in the plant is also treated. This
area has been chosen in view of the Sequence Analysis Benchmark Exercise presently
in progress []. Finally, the summary can be found in section 5, together with some
concluding remarks.
The AORS collects in a unique format all safety-relevant events from NPPs as
recorded in the participating countries. The system has been set up with the specific
objective of providing an advanced tool for a synoptic analysis of a large number of
events, identifying patterns of sequences, trends, precursors to severe incidents, etc.
Formally stated, the principal aim of the system is the collection, assessment and
dissemination of information concerning events that have or could have consequences
for the safe operation of NPPs; the system intends to:
homogenize national data in one unique databank in order to facilitate the data
search and analysis;
be a support for safety assessment both by utilities and safety authorities;
feed back relevant operational experience and the lessons learned from it to the
interested parties;
facilitate the data exchange between the various national reporting systems and
utilities.
The AORS databank has been designed to reproduce the abnormal event informa-
tion in a manner which is particularly tuned to the needs of the incident and safety/
reliability analyst. More specifically, for each abnormal event not only a narrative
description is available, but also a coded sequence of so-called occurrences. These are
the basic elements (steps) into which the event can be decomposed and which are
usually the individual component failures, degraded (sub)system states or faulty human
interventions (or lack of interventions) determining together the course of the event.
The causal and sequential relationships between these occurrences are also codified
and available for use.
An event databank thus structured has an increased potential for use and
analysis over the classical, purely narrative event databanks currently in operation on
the international nuclear safety scene. The large, intrinsically homogeneous and complete
data sample in the AORS makes it possible to perform trend and pattern analyses,
comparative studies of (time-dependent) plant behaviour and statistical frequency esti-
mations. Moreover, the coding scheme, i.e. the various free-text parts of the AORS
event reports combined with the occurrence sequence coding, allows, in conjunction
with the previously mentioned characteristics, more sophisticated types of analysis,
such as the study of precursors to severe incidents, dependent failure analyses, human
behaviour assessment and sequence combining as a support for predictive (probabil-
istic) modelling. In Ref. [] an overview is given of the various analysis possibilities
and applications.
The data sources of the AORS form an inhomogeneous set of safety-related operating
experience collections. In order to homogenize this information, a manual data proces-
sing scheme is applied, where the original reports (and where possible additional
sources) are screened thoroughly by hand to extract the items featuring on the
AORS Reference Format, which is the key to the unification of all the input information.
This format is treated exhaustively in Volume II of the AORS Handbook [5]. It consists
of three types of sheets. The first one serves to collect general event and plant infor-
mation, including the event date, reactor identification, reactor type and age specifica-
tion, event category, classification of the type of transient (if applicable), operating
conditions at the onset of the event, consequences of the event (on plant operation,
radioactivity releases etc.) and a few lines of free text describing the safety signifi-
cance of the event. The second sheet contains the narrative event and sequence de-
scription. Of each of these two sheet types only one is compiled per AORS report. The
third type of sheet, however, collects data on the individual occurrences identified in
the event sequence; hence more than one sheet of this type may be present per AORS
report. We refer to the AORS Handbook [5] for a detailed description of all the items
on the Reference Format.
In the context of the present paper it is useful to review in some detail the
items of the occurrence sheet.
Upon screening of all the available information, a chain of occurrences is de-
fined, describing the sequential and causal course of the abnormal event. There exists
no strict definition of what an occurrence is. Occurrences must represent the separate
equipment faults, operator errors, and faulty or degraded system or subsystem states,
insofar as they are discernible and relevant for the evolution of the event. Also, each
occurrence can be linked to one or more other occurrences within the sequence,
employing two types of links, namely purely sequential or causal. For each occurrence
the following information is compiled:
- title of the occurrence: a two-line narrative indication stating what the occurrence is
to represent, including what failed, in what mode and with what consequences;
- emergency system(s) responding (automatically or manually activated) as a conse-
quence of the occurrence (5 entries possible);
- location of the failure ('failed system'): the system in which the failing equipment is
located or the system affected by the failure (1 entry per occurrence);
- the failing item ('failed component'): the equipment failing or affected by the faulty
human action, or the degraded system or subsystem if none of these is identifiable
(1 entry);
- failure mechanism and causes, including dependency (if any) identification ('cause
of failure', 2 entries possible);
- consequences and effects of the occurrence ('effect on system/component operation',
2 entries);
- detection mode ('way of discovery', 2 entries);
- corrective (remedial) actions ('action taken', 3 entries);
- linking with other occurrence(s) in the sequence (10 entries).
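The occurrence sheet can be pictured as a small record structure. The following sketch is purely illustrative: the field names are invented here, and only the entry limits are taken from the list above, not from the AORS software itself (which is implemented in NATURAL):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Occurrence:
    """Illustrative model of one AORS occurrence sheet (hypothetical names)."""
    title: str                                            # two-line narrative
    responding_systems: List[str] = field(default_factory=list)  # max 5 entries
    failed_system: str = ""                               # location of failure, 1 entry
    failed_component: str = ""                            # the failing item, 1 entry
    causes: List[str] = field(default_factory=list)       # cause of failure, max 2
    effects: List[str] = field(default_factory=list)      # effect on operation, max 2
    discovery: List[str] = field(default_factory=list)    # way of discovery, max 2
    actions: List[str] = field(default_factory=list)      # action taken, max 3
    links: List[int] = field(default_factory=list)        # links to other occurrences, max 10

# A hypothetical occurrence: a breaker failing with the dependency
# indicator 'CSX' recorded as cause of failure.
occ = Occurrence(title="Breaker fails to close on demand",
                 failed_system="H05",
                 failed_component="BREAKER",
                 causes=["CSX"])
```

All items except the title would hold coded values drawn from the AORS code dictionaries.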
Except for the title, all these items are reported by means of codes. The code
dictionaries are given in the AORS Handbook [5], where some guidance on the use of
these codes is also included. Here it is useful to describe in somewhat more detail
how the cause of failure item has to be interpreted.
There exists no strict separation of failure mechanisms into 'direct' or 'observed'
causes and 'underlying' or 'root' causes. Often it is not possible to extract the necessary
information from the original event reports to make such refined distinctions.
Usually only the direct cause is reported, hence in most cases the AORS reports
also bear only this information. The coding is then straightforward, through one single oc-
currence in the AORS report. In the case that more detailed information is available
for establishing an AORS report, root causes may be reported as well. For the simpler
cases, necessitating only one occurrence in the AORS report, this can be realized
by means of the second cause entry. Otherwise, root cause mechanisms are re-
ported by means of a chain (2 or more) of causally related occurrences. The final
occurrence of such a chain describes the end effect of the failure, with the observed
cause as its cause of failure. The intermediate occurrences (steps in the failure
process) also have as cause of failure only the observed one, because these occurrences
themselves embody as a whole the root cause of the next occurrence in the chain, and
their root causes are represented by the preceding occurrence(s). The initiating occur-
rence of such a causal chain features its root cause in one of the cause of failure
entries, should this information be available in the original event report.
In summary, the definition of a series of occurrences within an abnormal event
creates searchable aggregate entities, beyond the classical level of simple codings. This
feature of the AORS is unique amongst the international abnormal event/incident
databases. It creates the possibility for optimal exploitation of past operating ex-
perience.
The type of data retrieval practice described in this section is justified by the
recognition that (partial) sequences taking place in different plants, of different design
and under different conditions, may still yield valuable insights for the plant under
study, provided that the generic issues from each sequence allow for applicability or
transferability (e.g. human error impact, dependent failure mechanisms).
In order to better understand the search procedure outlined below, we first explain
how dependent/multiple failures are handled within the AORS coding scheme.
Dependent/multiple failures (sometimes named common cause or common mode
failures) are, in the context of AORS event reporting, all those instances where
(similar) pieces of equipment failed (simultaneously) due to the same underlying cause
mechanism. This cause could be either internal to the equipment ('pure' common cause
cases) or external, i.e. the failure of some other equipment, an environmental factor
or a human action involving multiple pieces of equipment. Any of these cases would be
marked with the special dependency indicator, the occurrence cause code value 'CSX'.
In order to classify the cases as internal or external cause it is necessary to read
the complete event report; there exist no separate codified representations of the
various possibilities. However, some breakdown of the retrieved material can be estab-
lished automatically by defining three classes of 'CSX' cases from the AORS databank:
- event sequences comprising at least one occurrence that has the value 'CSX' for one
of its cause entries and that has not been caused by some other occurrence(s) within
the sequence. These cases have the highest likelihood of being 'pure' common cause.
If that is the case, the 'root' cause of the dependent failure mechanism is reported
by means of the second cause of failure entry in the coding of the occurrence
responsible for the selection of the sequence.
- sequences with at least one occurrence marked with 'CSX' that has been caused by
another occurrence(s) within that sequence which is itself not a common cause failure
(otherwise the event would belong to the previous class). Here the pure cases are less
probable; however, these could still be present if their root causes are reported by
means of a separate occurrence or a sequence of occurrences (see section 2).
- sequences where there is at least one occurrence that causes more than one other oc-
currence, irrespective of any CSX indication. These would be the typical external
cause cases.
The output from a databank search looking for the cause code 'CSX' in any of the oc-
currences, split according to the above three classes (and a fourth class containing
mixed cases), enables the user to digest the retrieved material more effectively.
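The three classes (plus a mixed fourth) can be sketched as a filter over coded sequences. The dictionary encoding of occurrences and causal links below is an assumption made purely for illustration; the real AORS search programs are written in NATURAL:

```python
def classify_sequence(occurrences):
    """Assign a sequence to search class A, B, C or D (mixed).

    Each occurrence is a dict with 'causes' (list of cause codes),
    'caused_by' (indices of occurrences that caused it) and
    'causes_others' (indices of occurrences it causes) -- an
    illustrative encoding, not the actual AORS record layout.
    """
    # Class A: a 'CSX' occurrence not caused by any other occurrence
    a = any('CSX' in o['causes'] and not o['caused_by'] for o in occurrences)
    # Class B: a 'CSX' occurrence caused by a non-'CSX' occurrence
    b = any('CSX' in o['causes'] and
            any('CSX' not in occurrences[i]['causes'] for i in o['caused_by'])
            for o in occurrences)
    # Class C: one occurrence causing more than one other, regardless of CSX
    c = any(len(o['causes_others']) > 1 for o in occurrences)
    flags = [name for name, hit in (('A', a), ('B', b), ('C', c)) if hit]
    if len(flags) == 1:
        return flags[0]
    return 'D' if flags else None   # 'D' collects the mixed cases

# A minimal sequence: one uncaused 'CSX' occurrence -> class A.
seq = [{'causes': ['CSX'], 'caused_by': [], 'causes_others': []}]
assert classify_sequence(seq) == 'A'
```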
We see that the sequence coding in the AORS is the key to a search procedure
that identifies all the event reports bearing relevant information on dependent failures.
Furthermore, the retrieved material is processed and presented in such a way that the
analyst can use it easily in checking his plant system and component model for any
qualitative aspect not yet incorporated, or for any other purpose.
As described in more detail in [8], there exists a library named AORSLIB containing
computer programs in NATURAL (a fourth-generation programming language) that
serve AORS users as a basis for performing more or less standardized searches and
analyses and for deriving user-specific applications.
Two programs, CCF and CCF-SS, may be used for developing applications where
dependent failures of specific characteristics are retrieved and processed. The program
CCF retrieves from the AORS databank four disjoint sets of reports according to the
above-mentioned grouping and prints the full report contents for all retrieved cases.
The program CCF-SS also first retrieves the four disjoint report sets, but then
isolates from these the cases where the dependent failures have had an impact on safety-
grade systems in the plant. The full contents of these reports are then printed out,
sorted by affected safety system and search class. It is anticipated that safety analysts
are primarily interested in this type of dependent failure.
The purpose of this section is to illustrate how a dependent failure search procedure is
applied. The application presented below was not part of an incident analysis activity,
nor of a PSA study; therefore little attention is paid to the technical conclusions
one could draw from the output. As a matter of fact, the screening results would
depend strongly on the scope and the boundary conditions of the study setting the
framework for a search of this type. However, the cases retrieved from the databank
could be relevant in the frame of the present accident sequence analysis Benchmark
Exercise conducted by the JRC-Ispra [7]. In this study a loss of power accident is in-
vestigated, so as an example of a dependent failure search we have chosen common
cause failures of breakers of any type in power transmission systems and in elec-
trical power systems. All systems in class G (power transmission) and class H (electric
power) of the ERDS Reference System Classification (see Vol. III of the AORS Hand-
book [6]) are considered:
Power transmission systems: power transmission system general, generator system, main
bus duct system, main transformer system, auxiliary transformers system, back-up
auxiliary transformers system, switchyard to station high voltage connection.
Electric power systems: electric power system general, medium voltage system, low
voltage system, vital instrument and computer AC system, on-site DC system, emer-
gency power supply system, electrical heat tracing system, lighting and taxed motive
power system, security system, communication system, cathodic protection system and
grounding system.
From the NATURAL program CCF mentioned earlier, a program was derived that
retrieves the four disjoint groups of (potential) dependent failure cases involving
breakers and related equipment in the above systems. The numbers of reports retrieved
in each class are given in Table I below.
The complete listings of all 203 event reports have been screened by hand in
order to retain only the possibly relevant cases and to discriminate between the pure
(internal) common cause cases and the external cases. To that end, during this
screening the reports have been assigned to one (or more) of the following classes:
"internal technical", i.e. the dependent failure mechanism described in the report
is of mechanical, hydraulical, electrical etc. nature and plays within the boundary
of the affected components. Obviously, the root cause of such "technical" pro
cesses may be environmental or "human" (bad design, construction, maintenance,
repair etc.) but in the report no explicit indication of this is given.
IE "internal environmental", indicates mechanisms within the component boundary
such as fouling, ingress of foreign materials, ambediental effects (temperature,
humidity, chemical reactivity etc.).
IH "internal human", where there is a clear reference to some human deficiency as
root cause of the failure process inside the component boundary, such as design
fault, manufacturing problem, wrong choice of materials or subcomponents etc.
ET "external technical", where the multiple failure mechanism originates outside the
component boundary and is of "technical" nature, see class IT.
EE "external environmental", i.e. a mechanism operating from outside the component
boundary and of ambediental nature, such as lightning, fire, flood, tornado,
smoke, vibration, shocks etc.
EH "external human" where multiple component failure results from erroneous human
behaviour outside the component boundary.
In Annex I for each of the above categories some illustrative examples are given,
as found amongst the 203 retrieved reports.
The effectiveness of the search procedure in automatically discriminating be-
tween internal and external dependent failure mechanisms may be judged from Table
II, where for the power transmission systems and the electric systems together a cross-
tabulation is shown of the search classes A through D (see above) and the target
classes IT through EH.
TABLE II - Cross-tabulation of search classes (A-D) and target classes (IT-EH).

          A    B    C    D   Tot.
IT       16    0   10    0    26
IE        6    1    2    0     9
IH       22    1    2    2    27
ET        2   14   14    3    33
EE        5    4    5    0    14
EH       51    4   15    6    76
Tot.    102   24   48   11   185
It can be seen that the search did yield some non-relevant cases; these stem
from the broad definition of "circuit breakers" used in the search algorithm and, for
search class C, from non-multiple failures. Furthermore, the majority of "internal"
cases, 44 out of 62 (71%), is indeed found in search class A, but the majority of class
A reports is concerned with "external" cases (58 out of 102, 57%). This is due to the
coding of human errors in the AORS, which is according to the activity in progress
when the error occurred. Hence these are usually not dealt with by introducing a separate
occurrence describing the erroneous behaviour. Looking at search classes B and C
together, we see that indeed the majority (56 out of 72, 78%) concerns external cases.
Finally we remark that the "external human" cases are the most numerous, 76 out of
the overall total of 185 (41%).
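The percentages quoted above can be re-derived mechanically from Table II; the snippet below simply re-embeds the cross-tabulation as data and checks the arithmetic:

```python
# Cross-tabulation of search classes (A-D) vs. target classes, from Table II.
table = {
    'IT': [16, 0, 10, 0], 'IE': [6, 1, 2, 0], 'IH': [22, 1, 2, 2],
    'ET': [2, 14, 14, 3], 'EE': [5, 4, 5, 0], 'EH': [51, 4, 15, 6],
}
internal = ['IT', 'IE', 'IH']
external = ['ET', 'EE', 'EH']

internal_total = sum(sum(table[t]) for t in internal)              # 62
internal_in_a = sum(table[t][0] for t in internal)                 # 44
class_a_total = sum(row[0] for row in table.values())              # 102
external_in_a = sum(table[t][0] for t in external)                 # 58
bc_total = sum(row[1] + row[2] for row in table.values())          # 72 (classes B + C)
external_in_bc = sum(table[t][1] + table[t][2] for t in external)  # 56

assert (internal_in_a, internal_total) == (44, 62)   # 71% of internal cases in A
assert (external_in_a, class_a_total) == (58, 102)   # 57% of class A is external
assert (external_in_bc, bc_total) == (56, 72)        # 78% of B and C is external
assert sum(sum(row) for row in table.values()) == 185
assert sum(table['EH']) == 76                        # 41% of 185 is "external human"
```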
In conclusion, the reports thus grouped constitute an interesting set of real-life
examples of a variety of dependent failure mechanisms involving circuit breakers, and
could form a source of inspiration for the safety/incident analyst who has to model or
analyze this type of event.
From the NATURAL program CCF-SS (see section 3) in the AORSLIB analysis pro-
gram library, a module was developed that analyses the above type of dependent
failures according to their impact on safety-grade systems. The list of such systems as
defined in the AORS is appended as Annex II; their functional descriptions, interfaces
and boundaries are given in Vol. III of the AORS Handbook [6]. The setup of the
module is simple. First the four disjoint report sets (according to search classes A through
D) are established. Then a loop is started wherein, for each AORS-defined safety system,
each subset is checked for the presence of event reports featuring in the sequence of
occurrences at least one occurrence that was caused by another occurrence and that has
the current safety system as failed system. These reports are then printed out,
preceded by the current safety system and the search class.
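This setup lends itself to a compact sketch. The report and occurrence structures below are invented for illustration only; the actual module is written in NATURAL against the AORS databank:

```python
def ccf_ss(report_sets, safety_systems):
    """For each safety system and each search class, pick out the reports
    in which some occurrence was caused by another occurrence and has
    that safety system as its failed system (illustrative data layout)."""
    output = []
    for system in safety_systems:                         # loop over safety systems
        for search_class, reports in report_sets.items():  # classes A..D
            for report in reports:
                hit = any(occ['caused_by'] and occ['failed_system'] == system
                          for occ in report['occurrences'])
                if hit:
                    # the same report may appear under several safety systems
                    output.append((system, search_class, report['id']))
    return output

# Hypothetical data: one class A report touching safety system H05.
sets = {'A': [{'id': 1, 'occurrences': [
           {'caused_by': [0], 'failed_system': 'H05'}]}],
        'B': [], 'C': [], 'D': []}
assert ccf_ss(sets, ['H05', 'L01']) == [('H05', 'A', 1)]
```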
Two remarks concerning this procedure should be made here. First, the printed
reports are not automatically all relevant, because the failure in the safety system can
also be caused by another fault reported in the sequence which has no relation to
the dependent breaker failure. Secondly, the same report may be printed more than
once if failures within different safety systems are observed.
The numbers of reports printed for each class and for each safety system are
listed in Table III. Here the results for power transmission systems and electric power
systems have been pooled together.
An end-user of the procedure interested in a certain safety-grade system has easy
access to the (possibly) relevant material, given that the original search results (the 203
reports distributed over search classes A through D) have been post-processed in this
manner.
TABLE III - Safety systems in search classes.

tot.  syst.     A    B    C    D   (search class)
  2   A08       1    1    -    -
 11   B00       3    -    5    3
  8   B03       2    -    3    3
  2   B04       2    -    -    -
  4   B10       1    -    3    -
  1   B11       1    -    -    -
  1   B12       1    -    -    -
  4   B13       1    -    3    -
  3   B14       1    -    2    -
  8   B15       1    -    6    1
  4   B16       1    -    1    2
  2   B18       1    1    -    -
  1   B20       1    -    -    -
  3   B21       3    -    -    -
  4   B22       2    -    2    -
 11   B23       1    -    8    2
  2   B36       2    -    -    -
  7   C05       1    -    5    1
  8   C07       2    -    5    1
  1   C12       1    -    -    -
  1   H04       1    -    -    -
 37   H05      24    4    8    1
  5   L01       3    -    2    -
  6   L03       3    1    2    -
exploiting safety related operating experience in NPPs, both for operational purposes
and for safety assessments.
References
ANNEX I - Examples of retrieved cases per screening class.
A N N E X II - AORS safety grade systems.
F06 TURBINE BYPASS SYSTEM
H04 ON-SITE D.C. SYSTEM
H05 EMERGENCY POWER SUPPLY SYSTEM
L00 PROTECTION AND CONTROL SYSTEM
L01 REACTOR PROTECTION SYSTEM
L02 BOP PROTECTION SYSTEM
L03 ENGINEERED SAFETY FEATURES ACTUATION SYSTEM
L10 REMOTE SHUTDOWN SYSTEM
M14 EMERGENCY POWER SUPPLY BUILDING HVAC SYSTEM
M22 AUXILIARY FEEDWATER PUMPS CHASE HVAC SYST. (PWR/GCR)
N15 BOP FIRE FIGHTING SYSTEM
MRF's FROM THE ANALYSIS OF COMPONENT DATA
A. Amendola (ed.).
Advanced Seminar on Common Cause Failure Analysis in Probabilistic Safety Assessment, 303-341.
1989 ECSC, EEC, EAEC, Brussels and Luxembourg.
methods of data analysis so that the dependent failures can be
identified.
Operational reliability data is usually one of two types:
1 Complete scenario descriptions, such as US Licensee Event
Reports (LERs) or Nuclear Power Experience Reports (NPERs) or, more
generally, Abnormal Occurrence Reports; or,
2 Component failure event data taken from plant maintenance
records.
Abnormal occurrence reports, or Licensee Event Reports (LERs) from
the nuclear industry, give a description of an event, including
operator actions, component failures and other event sequence
information. They refer to serious or potentially serious incidents
according to criteria laid down in nuclear safety regulations.
They do not, therefore, cover the entire population of component
failures.
Component failure event information, which is gleaned from
maintenance reports, can give details of all component failures on a
system or plant. This means that a very complete picture of the total
failure population could (in theory) be derived. Some dependent
failure models require the total population failure rate; for example,
the denominator of the beta factor requires this information.
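The role of the total failure population in the beta-factor denominator can be made concrete with a small worked example (all numbers invented):

```python
# Beta factor: the fraction of a component's total failure rate that is
# due to common cause failures, beta = lambda_ccf / lambda_total.
# Maintenance records supply the denominator (all failures), whereas
# event reports alone would cover only the serious subset.

n_total_failures = 120    # all recorded failures of the component type (invented)
n_ccf_failures = 6        # failures attributed to a shared cause (invented)
hours = 500_000.0         # cumulative observed operating hours (invented)

lambda_total = n_total_failures / hours   # total failure rate per hour
lambda_ccf = n_ccf_failures / hours       # common cause failure rate per hour
beta = lambda_ccf / lambda_total          # the observation hours cancel: 6/120
print(round(beta, 6))                     # 0.05
```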
Unfortunately, component failure data also has some shortcomings.
Firstly, it does not describe accident sequences, which means that
human/operator activities are not well represented in the data.
Secondly, it incorporates a species of event which, even in relation to
straightforward failure rate calculations, presents problems: the data
is contaminated with maintenance interventions, and it becomes
necessary to distinguish between a complete failure during operation
and a failure which was prevented from going to completion by
maintenance intervention or the introduction of modifications.
Both types of operational data appear to suffer from the same
deficiency, namely that full system descriptions are not immediately
available; therefore, it is almost impossible to determine
components' operating duties and their arrangement in the structure of
the system. Component failure event information will, however, contain
design details of the components, if it is well collected.
Common Cause Failure (CCF): This is a specific type of dependent
failure where simultaneous (or near-simultaneous) multiple
failures result from a single shared cause.
[Figure: cause-effect logic diagrams. Independent failures appear as separate
cause-effect pairs (with the component state noted); a common cause failure
appears as a single cause producing multiple effects.]
Typically, failure root causes might be related to poor design, poor
manufacture, maintenance error, inadequate operating procedures and so
on. Possible component states are
Failed
The value of this structured approach is that each event scenario
can be dissected and expressed in a consistent form. There still
remain problems of interpretation regarding what the analyst perceives
as the ultimate root cause or the ultimate effects and so on, as well as
the residual difficulties imposed by the lack of certain types of
information in, for example, the Licensee Event Reports. In a
comparison of the results of the analysis of a set of events by four
different analysts it was shown that the differences were due almost
entirely to interpretation, which can be overcome by adding some
specific rules and guidelines to the classification scheme (2). It has
also been suggested that the inclusion of post-event actions, system
structure and more detailed cause descriptions in event reporting would
facilitate data analysis.
It has been found, however, that this method is a very valuable
analysis tool. The scheme (of which there have been two or three
variants) has been applied to LER and NPER data. Pickard, Lowe and
Garrick (PLG) applied their version of the LATA scheme to NPERs with a
view to supporting parametric dependent failure models (3).
In order to find dependent events, or branched cause-effect logic
units, the links have to be detected by examination of the event
descriptions.
The common cause or branched events can be identified if two or
more component failure reports have the same recorded or coded failure
cause plus other identical attributes such as mode of failure or
failure descriptor. In addition, a link between events may be inferred
from a combined knowledge of time of occurrence, system configuration,
component location, and so on. The major difficulty is in providing a
systematic framework in which to do this, because when the number of
events is large it is not possible to rely upon an analyst's powers of
recall to detect the links.
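Such a systematic framework can be sketched as a grouping of events on shared attribute values; the attribute names below are illustrative, not the actual ERDS-CEDB codes:

```python
from collections import defaultdict

def candidate_links(events, keys=('cause', 'mode', 'date')):
    """Group component failure events sharing the same values of the
    chosen attributes; groups of two or more events are candidate
    dependent (common cause) failures for manual review."""
    groups = defaultdict(list)
    for ev in events:
        groups[tuple(ev[k] for k in keys)].append(ev['id'])
    return {k: ids for k, ids in groups.items() if len(ids) > 1}

# Hypothetical failure records: two pump failures share the same
# coded cause, mode and failure date, and so form a candidate link.
events = [
    {'id': 1, 'cause': 'design', 'mode': 'sudden', 'date': '1985-03-02'},
    {'id': 2, 'cause': 'design', 'mode': 'sudden', 'date': '1985-03-02'},
    {'id': 3, 'cause': 'wear',   'mode': 'sudden', 'date': '1985-04-11'},
]
assert candidate_links(events) == {('design', 'sudden', '1985-03-02'): [1, 2]}
```

Varying the `keys` argument corresponds to trying different attribute combinations, as discussed later in the text.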
Another important feature of component failure data is that
it is derived from maintenance records. Maintenance may be
carried out for several reasons:
1 The component may not be functioning and repair may be
required (the repair may be urgent but because of built-in
redundancy the repair can be carried out when convenient).
2 A scheduled maintenance check may be carried out during
which incipient and degraded conditions may be found.
3 Repair or maintenance of one component may prompt
examination of other similar components.
4 The operation of a component may be degraded. Repair
may be urgent, or, if not, may be carried out when system
operation is not impaired.
5 A component may have a small defect (incipient fault)
but is operating within specifications and, therefore,
maintenance can be carried out when convenient.
Cases 2, 3, 4 and 5 all involve potential failures and these
form the bulk of component failure event data. For this reason,
quite a large proportion of multiple failure events, in
particular branched events, will include potential rather than
actual failures.
It is possible to glean very useful information relating to
multiple failures if the potentially failed components are taken into
consideration.
The discussion can now be extended to deal with multiple
failure events, and the concept of potential common cause failure events
emerges. These events are vitally important as a basis for defence
measures against dependent failures. It was found during a study of
component failure data from the European Reliability Data System that
events of type
If, when this repair is carried out, it is discovered that there
was a design defect, it is important to check other, similar
components. These components, even though they have not yet failed
(because, perhaps, the revelation is time related), indicate a potential
component cause failure
which is a branched single unit root-caused structure. This is a long
way from the idea of catastrophic failure of multiple components.
Consider another case where a previous maintenance error results
in component AX's failure:
[Diagram: a single maintenance-error root cause branching to the failure of
component AX.]
- in the same plant.
The LATA/EPRI event classification in Figure 1 does include the
cases where components are not identical, and permits all component states.
It is essential, therefore, that a search of data is not
restricted in terms of mode of failure (that is, whether failure is
catastrophic or incipient) or in terms of time of occurrence.
A method has been developed for sorting component failure data based on
a limited number of relevant event attributes. The classification
scheme recognises the possibility of related component failures between
systems and between plants. It has been suggested that data should be sorted and
analysed for different combinations of components as defined in Figure
2.
Ideally, each component/system/plant combination should be
considered but this is a formidable task. Table 1 indicates the
likelihood of finding related component failures on in-system,
inter-system and inter-plant levels according to different failure
causes. This is rather subjective and is open to debate, but it does
give an indication of where, perhaps, the emphasis of analysis should
be.
[Figure 1 lists the event classification system categories with, for each type,
its name, code and typical cause-effect logic: multiple units (LM); branched
single unit, root caused (BSR); branched single unit, component caused (BSC);
branched multiple unit, component caused (BMC); and branched multiple unit,
mixed causes (BMM). Dependent events involve two or more interdependent units
and actual or potential component states.]

FIGURE 1. Relationship between dependent events and logic diagram event categories
[Figure 2: combinations of plant selection, system selection and component
selection.]
TABLE 1  LIKELIHOOD OF DISCOVERY OF RELATED COMPONENT FAILURES
AT IN-SYSTEM, INTER-SYSTEM AND INTER-PLANT LEVELS

DESIGN
  Requirements    /   Y   Y
  Error           /   Y
  Manufacturing   /   /   Y )  LIKELY WHEN SAME
  Construction    /   /   Y )  DESIGNERS & CONSTRUCTORS

HUMAN
  Procedures      /   Y )  LIKELY WHEN
  Misdiagnosis    Y   Y )  COMMON
  Accidental          X )  TRAINING

INTERNAL
where causal mechanisms relating to design, manufacturing and
environment may be seen as the main contributors.
Consider next the determination of the links between failure
events.
It was demonstrated (4) in the ERDS component event data study that
single failure event attributes, such as mode of failure, cause of
failure etc., were not sufficient to ensure that all related failures
were detected.
Firstly, the use of failure date as the basis for sorting out
component failures was attempted.
This initial choice was made because simultaneous occurrence of
failures, particularly on redundant systems, is potentially more
far-reaching in effect. The problem with this is that, once again, the
detection of potential 'common cause failure' events is limited. A
common design error, for example, may only be revealed with the passage of
time. If it is, therefore, necessary to consider time periods, and to
examine component failures within those periods, the problem becomes
infinitely large.1
Combining failure date with one or more other attributes
would have its advantages in providing a tighter link.
Of course, there are two ways of going about the sorting on the
basis of failure date or any other attribute:
1 Select a particular group of components and sort out their
failures on the basis of failure date;
or 2 Sort out all failures on the basis of failure date, then
select groupings of interest and determine the component details
thereof.
The obvious selection of cause of failure as the main linking
attribute also raises some problems. Suppose that all failures with
cause of failure 'design' are chosen. Not all of the events will
relate to the same design defect (indeed, they may all be different).
It is, therefore, necessary to refine further the selection that has
been made. This can be done by
1 Selecting particular component types or groups of identical
components from the initial group of events
and/or
2 Reducing, from
- failure date
- failure mode, failure descriptor
- parts repaired/parts failed
- method of detection
the closer links between failures.

1
There is another problem arising from this method, which came
about from the inclusion of scheduled maintenance activity. A
large number of events were reported in a short time period (the
maintenance period) and obscured any other patterns emerging from
the data. Most of these events were degraded or incipient
component conditions which could have been repaired at any time.
It can be seen immediately that more than one attribute is
required to establish a definite link between failure events.
The traditional 'common cause failure' will drop out quite nicely
because we deal with simultaneous failure of identical components. But
to define this event we already need three attributes:
1 cause
2 time of occurrence
3 a measure of similarity of components.
The potential common cause failures which have been discussed
above require a more inferential approach based on more attributes.
It is relatively easy for a human being to scan some data and
draw out or infer relationships. Difficulties arise, however,
1 If vast quantities of data have to be analysed.
2 If more than one type of relationship is being sought.
3 If consistency of approach is to be assured.
4 If continuous up-dating of lengthy analysis is required.
Ideally, some form of semi- or fully automatic procedure would be
desirable using computing facilities in order to detect links between
failures.
Thus it is necessary to decide which combination of attributes
should be used. Four attributes (cause, mode, failure date and
identicality) were used in the ERDS-CEDB study, but this is not
necessarily the only combination. That method was chosen initially and
then, because of the time required to investigate this, no other
combinations (except using event descriptions instead of mode of
failure) were attempted. Because of the ERDS interrogation procedures,
only a semi-automatic procedure for multiple failure event
identification could be devised. The searches were inordinately
time-consuming (4). The second level of the classification scheme is
shown in Figure 3.
The way of collecting together related failures may be based on a
histogram - select - histogram ... basis. Suppose that failures with
the same cause and same failure date are required. Firstly, a
histogram of failure causes for the entire population has to be
produced. Each existing failure cause is then selected in turn, a histogram
is obtained for the failure dates, and failure dates attributed to more
than one failure can be identified.
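The histogram - select - histogram procedure just described amounts to two nested frequency counts, which can be sketched as follows (a toy illustration; the actual ERDS interrogation worked through its own query facilities):

```python
from collections import Counter

def histogram_select_histogram(events):
    """First histogram the failure causes over the whole population;
    then, for each cause, histogram the failure dates and keep the
    dates shared by more than one failure."""
    by_cause = Counter(ev['cause'] for ev in events)          # first histogram
    related = {}
    for cause in by_cause:                                    # select each cause in turn
        dates = Counter(ev['date'] for ev in events if ev['cause'] == cause)
        multi = [d for d, n in dates.items() if n > 1]        # second histogram
        if multi:
            related[cause] = multi
    return related

# Hypothetical records: two 'design' failures on the same date.
events = [{'cause': 'design', 'date': 'd1'},
          {'cause': 'design', 'date': 'd1'},
          {'cause': 'wear', 'date': 'd2'}]
assert histogram_select_histogram(events) == {'design': ['d1']}
```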
An alternative approach to analysis is to interrogate the data
base for specific combinations which have been chosen. For example it
may be decided that all pumps failing suddenly owing to a design error
should be selected. This is straightforward but has the disadvantage
that the analyst must think up or invent all combinations. This is a
recipe for disorganisation and inconsistency. Multivariate analysis
can be handled for a few variables (4 were used in the ERDS-CEDB
study), but after that more complex codes and techniques become
necessary.
Whatever techniques are ultimately used to identify dependencies
from the component data, the net result should be a set of dependent
failures. The data sets so derived can then be recoded and expressed
in the same cause-effect logic as discussed earlier.

[Figure 3: the second level of the classification scheme. The groups share
common cause, common mode and simultaneous occurrence, and are distinguished
by the elements involved: identical elements (Group A), non-identical
elements, and the same element.]
PUMP 06
PUMP 07
An identical sort of the HFCIS pump failures on plant Y may give
PUMP 07
(Note that these examples imply a common design problem on plants 1 and
2. These examples are similar to a real situation found in component
data (4)).
There is a predominance of potential failures in data for the
reasons stated above but this information is of great engineering value
and the above examples demonstrate how performance might be monitored.
To demonstrate further the kind of post-search analysis which
might be carried out, some results from the ERDS study are attached
(Appendix 1). The results relate to the Group A multiple failure
events which were found. These groups involve failures with
common cause, mode and failure date, and relate to "identical"
components (in this study, identical implied components of the same
generic type, e.g. pumps or motors).
It is after this preliminary searching that a more precise
definition of "dependent failure" can be imposed by the analyst. This
definition might be one related to a particular dependent failures
model which is being applied.
There are some residual problems, such as what to do with POTENTIAL
dependencies as described above, which are not for the data analyst but
rather for the dependent failures modeller to resolve.
To calculate beta factors, it might be that only Group A related
events would be considered and, in particular, only those involving
catastrophic or sudden failure, for example.
4. CONCLUSIONS
REFERENCES
ACKNOWLEDGEMENTS
PLANT PWR1

             COMPONENTS               FAILURES
SYSTEM   PUMP ELMO VALV  TOT     PUMP ELMO VALV  TOT
B03         2    2    4    8        1    0    0    1
B10         3    2    0    5        5    0    0    5
B14         2    2    2    6        3    0    1    4
B16         0    0    9    9        0    0    0    0
B18         2    2    4    8        7    2    0    9
F08         7    5   35   47       28    3    7   38
F16         2    6    2   10        1    2    0    3
TOTAL:     18   19   56   93       45    7    8   60

PLANT PWR2

             COMPONENTS               FAILURES
SYSTEM   PUMP ELMO VALV  TOT     PUMP ELMO VALV  TOT
B03         2    2    4    8        1    0    2    3
B10         3    2    0    5        6    0    0    6
B14         2    2    2    6        4    0    1    5
B16         0    0    9    9        0    0    4    4
B18         2    2    4    8        6    1    0    7
F08         9    5   35   49       45    4   12   61
F16         2    6    2   10        2    0    0    2
TOTAL:     20   19   56   95       64    5   19   88

PLANT PWR3

             COMPONENTS               FAILURES
SYSTEM   PUMP ELMO VALV  TOT     PUMP ELMO VALV  TOT
B03         4    4    4   12        2    0    2    4
B10         3    2    1    6        5    0    0    5
B14         2    0   16   18        4    0   11   15
B16         2    0    9   11       12    0    5   17
B18         2    2    4    8        0    0    1    1
F08         7    5   35   47       18    1   15   34
F16         2    6    2   10        1    3    1    5
TOTAL:     22   19   71  112       42    4   35   81

PLANT PWR4

             COMPONENTS               FAILURES
SYSTEM   PUMP ELMO VALV  TOT     PUMP ELMO VALV  TOT
B03         4    4    4   12        2    0    0    2
B10         5    2    0    7        8    0    0    8
B14         2    0    2    4        0    0    0    0
B16         2    0    9   11        4    0    8   12
B18         2    2    4    8        0    0    0    0
F08         7    5   35   47       22    0   17   39
F16         2    6    2   10        2    1    0    3
TOTAL:     24   19   56   99       38    1   25   64

PLANT PWR5

             COMPONENTS               FAILURES
SYSTEM   PUMP ELMO VALV  TOT     PUMP ELMO VALV  TOT
B03         4    4    4   12        0 0
B10         3    2    0    5        0 0
B14         2    0    2    4        0 1
B16         2    0    9   11        0 2
B18         2    2    4    8        0 0
F08         7    5   35   47       13    3    6   22
F16         2    4    6   12        2 0
TOTAL:     22   17   60   99       31    5    9   45

PLANT PWR6

             COMPONENTS               FAILURES
SYSTEM   PUMP ELMO VALV  TOT     PUMP ELMO VALV  TOT
B04         4    2    4   10        2    0    0    2
B10         3    2    0    5        3    0    0    3
B14         2    0    2    4        0    0    0    0
B16         2    0    9   11        2    0    0    2
B18         2    2    4    8        0    0    0    0
F08         7    5   35   47        4    1    3    8
F16         2    5    6   13        1    1    0    2
TOTAL:     22   16   60   98       12    2    3   17
APPENDIX 1
CODE SYSTEM
FIGURE 6  (Histogram over event categories 1A 1C 2D 2F 3G 3I 4J 4L 5M 5O
6P 6R 7S 7U; plot and caption not legibly reproduced)
FIGURE 7 HISTOGRAM SHOWING THE NUMBER OF MULTIPLE FAILURE EVENTS OF EACH TYPE
FOR EACH COMPONENT TYPE IN THE ANALYSIS OF DATA FOR SIX PWR'S
LEGEND: SYSTEM   B03   B10   B14   B16   B18   F08   F16
FIGURE 8 HISTOGRAM SHOWING THE NUMBER OF MULTIPLE FAILURE EVENTS FOR EACH
TYPE FOR EACH SYSTEM IN THE ANALYSIS OF DATA FOR SIX PWR'S
LEGEND: PLANT   PWR1   PWR2   PWR3   PWR4   PWR5   PWR6
FIGURE 9 HISTOGRAM SHOWING THE NUMBER OF MULTIPLE FAILURE EVENTS OF EACH TYPE
FOR EACH PLANT IN THE ANALYSIS OF DATA FOR SIX PWR'S
Event Category Plant System Component Cause Mode
(Histogram showing the number of failures for each cause code for each
component type: ELMO, PUMP, VALV)

LEGEND: MODE
A*  Won't open
B*  Won't close
C*  Neither opens nor closes
D*  Fails to start
E*  Fails to stop
F*  Fails to reach design specifications
A   Sudden
B   Incipient
C   Not defined
(Note that in the analysis carried out, codes for modes of failure on
demand are followed by an asterisk to distinguish them from modes of
operation.)
Fig 4, the total numbers of each component and total number of
failures for each type are very different:

                       MOTORS   PUMPS   VALVES
RATIO OF CAT A
MFE's TO TOTAL
FAILURES                0.083   0.052    0.045
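The ratio above is simply (failures involved in Category A multiple failure events) / (total recorded failures), taken per component type. A sketch with hypothetical counts, chosen only to reproduce ratios of the same order as those quoted, not the actual study data:

```python
# Hypothetical per-component failure counts (illustrative only)
mfe_failures   = {"MOTORS": 2,  "PUMPS": 6,   "VALVES": 9}
total_failures = {"MOTORS": 24, "PUMPS": 115, "VALVES": 200}

# Ratio of Category A MFE-involved failures to total failures, per component
ratios = {c: mfe_failures[c] / total_failures[c] for c in total_failures}
for comp, r in ratios.items():
    print(f"{comp}: {r:.3f}")   # MOTORS: 0.083, PUMPS: 0.052, VALVES: 0.045
```

A small numerator makes these ratios volatile, which is why the later discussion stresses which events should count as dependent before such ratios are used.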
2 2 pump failures (incipient).
During planned maintenance, pumps' mechanical seals found eroded
by humidity and deformed.
No effect on system, other components or reactor.
3 4 valve failures (incipient).
Found during routine surveillance:
i Continuous leakage from sealing was noted; repack and
replace stem, disc and segment.
ii Continuous leakage from packing gland. Replaced at next
outage.
iii Internal leakage. Packing gland replaced; and
iv No effect on system, other components or reactor.
4 2 pump failures (incipient).
Failures detected by operating staff.
Water present in lubricating oil owing to abnormal humidity. Each
failure caused loss of one redundancy (ie one was repaired while
the other was used and vice versa).
There was no effect on the reactor or other systems.
5 2 pump failures (incipient).
There was a complete loss of system function and a forced manual
reactor shut-down because of bearing failure. Abnormal humidity
resulted in water presence in lubrication oil.
6 2 pump failures (sudden).
Method of detection unknown and loss of one redundancy in each
case.
i Fluid leakage from welding of coupling flange with
cooling pipe.
ii 10cm crack on cooling pipe.
7 2 pump failures (incipient).
During planned or preventative maintenance, the pumps were
replaced owing to erosion of the diffuser, casing and shaft and
degradation of bearings and seals.
i Degraded system operation and caused a turbine trip.
ii No significant effect on other systems or components or
on reactor.
Some modifications introduced.
8 2 pump failures (incipient).
Water leakage from pump cooling pipe occurred because of flow
rates being too high, thereby causing erosion.
i Loss of one redundancy.
ii Loss of system function and forced reactor shut-down.
9 2 valve failures (incipient).
One detected during monitoring of operating abnormalities and the
other detected during repair or corrective maintenance. Internal
leakage was greater than allowable, even after adjustment, because
of stem and disk scoring.
10 2 valve failures (failed to open on demand).
The failures were detected by calling the component into operation
during refuelling or revision.
Friction and jamming of the disk occurred owing to material
incompatibility and design error.
11 2 valve failures (incipient).
i Detected during planned maintenance. Stem and bonnet
erosion because of leaking packing gland,
ii Detected during routine surveillance. Leakage from
packing glands.
12 2 valve failures (incipient).
Both detected during planned maintenance and having no significant
effect on the system or reactor.
In one stem scoring had occurred and in the other a gasket needed
replacing.
13 2 pump failures (incipient).
i Detected during routine surveillance. Leakage from both
mechanical seals. Tungsten carbide seals replaced with
silicon carbide (Burgman).
ii Detected during repair or corrective maintenance. Free-
side seal replaced with Burgman type seal.
14 3 valve failures (mode not defined).
Detected during repair or corrective maintenance.
A modification was made to the packing gland by the manufacturer
in order to avoid leakage.
15 2 valve failures (incipient).
Detected by operating staff during reactor refuelling.
Both leakages from the body-bonnet joint. Pins were cleaned,
joint replaced and repacking carried out.
16 4 valve failures (incipient).
Detected by operating staff.
No effects on other systems or reactor.
Modifications were carried out to the packing gland in each case.
17 2 valve failures (incipient).
Detected during planned or preventive maintenance.
No effects on other systems or reactor.
Both valves were replaced owing to bad maintenance.
18 2 valve failures (incipient).
Detected during routine surveillance.
No effects on other systems or reactor.
Both valves had modifications made (braids added) and spring and
thrust-washer grinding.
19 2 motors failed (sudden).
Detected by monitoring of operating abnormalities.
No effects on other systems or reactor.
Insulation failure and earth circuit in stator windings owing to
water ingress in the stator terminals box.
20 2 pump failures (mode not defined).
Failure detected during maintenance.
No effects on other systems or reactor.
A modification was made to the installation of the thermostatic
valves on the bodies of the pumps in order to permit easy
disassembly.
21 2 pump failures (incipient).
Detected by technical deduction (degraded performance).
No effects on other systems or reactor.
Magnetic filters were added on the cooling devices of the pumps.
22 3 valve failures (failure to open on demand).
Detected by calling the system/component into operation. Failure
caused loss of system function but had no effect on other systems.
In one failure report it was stated that the reactor had to be
shut down.
These motorised valves did not open because incorrect adjustment
of the servomotor resulted in excessive loading during closing
which deformed the disk.
23 2 valve failures (sudden failure).
Method of detection unknown.
Each failure caused loss of one redundancy.
Body-bonnet welding was carried out. The cause was unspecified.
24 2 valve failures (sudden).
Method of detection not stated.
Caused loss of one redundancy.
Leaking from packing glands owing to expected wear.
25 2 valve failures (mode not defined).
Detected during planned maintenance.
No significant effects.
Checking and regulation of valves.
26 2 valve failures (incipient).
Detected by monitoring of operating abnormalities.
No effects on other systems or reactor.
Modification in design of thrust bearings (new material) and
packing glands replaced. Modifications also to lube-oil loop
design.
27 2 pump failures (sudden).
Automatic shut-down by protection systems.
Loss of system function.
Loss of pressure in the lube-oil exchanger circuit (water side)
causing automatic shut-down of both circulating pumps.
The cause was a design error but no further details were given.
28 2 motor failures (incipient).
Method of detection not given.
No effects on other systems or reactor.
Bearings were replaced on both motors owing to bad quality of
lubricating grease. Excessive vibration was noted on shut-down so
improvements were made in the insulation of motor supports.
29 2 valve failures (mode not defined).
Method of detection not given.
Failures caused loss of redundancy.
The packing glands on the valves were modified.
30 2 valve failures (incipient).
Detected during planned maintenance.
No effects on other systems or reactor.
Eroded valve seals replaced.
There are several crucial observations which can be made from
these events:
a In twenty cases out of thirty, no effects on the system, other
systems, or the reactor were observed. In one case (event 10)
both failures were failures to open on demand; other failure modes
were incipient or unknown.
b Of those events which had some significant effects there
were
7 cases of loss of one redundancy
4 cases of loss of system function
1 case of degraded system operation ) same failure
1 turbine trip )
2 reactor shut-downs.
c There were some apparent inconsistencies in this information.
For example in event 7, one failure was said to have no effects
whereas the other indicated system degradation and turbine trip.
Both pumps were replaced. It can only be assumed that the
occurrence of one failure which had significant effects caused
further investigation of the other pump and both were repaired at
a convenient point in planned maintenance. The causes of failure
were "expected wear" but this is slightly suspicious because of
the extreme damage to both pumps. Similarly, event 8 includes one
failure causing loss of one redundancy whilst the other caused
loss of system function and forced reactor shut-down. Again,
causes of failure were identical. If the pumps were operating in
parallel, as is suspected but cannot be confirmed because of lack
of supporting system design information, it would require both to
fail before system shut-down was necessary. To add to the
confusion, both failures were 'incipient' and were, therefore, in
a certain sense 'under control'.
d Apart from loss of system function (LOSF) in event 8, this
also occurred in events 5, 22 and 27. In event 5 the failures
were 'incipient'. It can only be surmised that in order to
prevent a catastrophic failure it was decided to shut down and
repair.
In event 22 the two valves failed to open on demand. This is much
more like the common-cause failure that is immediately
recognisable. Similarly in event 27 the two pumps failed suddenly
because of a design error.
e The only two events which were noted as 'common cause'
failures in the event description were events 27 and 28.
f Event 28 involved two incipient motor failures which had no
effect on the system but the failures had a common cause.
g In each case the failures were not simply of generically
similar components but of truly identical components.
h In event 24, for example, the common cause of failure was "expected
wear". This suggests that the failures were random, and this is
confirmed by the fact that the failures were dealt with in both
cases during planned maintenance. The need for packing gland
replacement is normal at some point in a valve's life.
i In events 19 and 15 the coded causes of failure were unknown
or unspecified, whilst the failures in these events are very
closely coupled. This is a caveat for the analyst: if only
interesting causes like 'engineering design' are searched for, then
some interesting events might be lost.
j In event 6 it would appear that examination and repair of one
failure caused maintenance engineers to check similar systems, thus
revealing the second failure. The failure did not, however, have a
truly common cause.
These results to a large extent vindicate the method being adopted
because:
1 Neither cause nor mode of failure is sufficient to identify
related failures. Analysed together these attributes can draw out
closer coupling.
2 An analysis which simply considered events which had serious
effects would not adequately identify many related events.
3 Maintenance intervention appears to successfully prevent
sudden multiple failures. If only events with sudden or
catastrophic failure modes were considered then much useful
information would be lost.
4 If ratios of multiple failure events to total numbers of
failure events are calculated only for those MFE's which caused
loss of system function, the writer believes that this would give
a ratio which is far too conservative.
Events which may have become catastrophic common cause failures
are events 1, 2, 3, 4, 9, 13, 14, 15, 16, 17, 18, 21, 25, 28.
Event 5 Loss of system function (decision to shut down to
prevent catastrophic failure) (class as actual CCF).
Event 6 Independent failures coupled by maintenance intervention
only.
Events 7 Positive intervention as in 5 to prevent catastrophic
and 8 failure (class as actual CCF's).
Event 10 Actual CCF.
Events 11 Independent events coupled by maintenance intervention
and 12 only.
Event 19 Actual CCF.
Event 20 Modification to permit easy disassembly (not an actual
failure event, only functional unavailability).
Event 22 Actual CCF.
Event 23 Actual CCF.
Event 24 Independent events coupled by maintenance only.
Event 26 Modifications during maintenance.
Event 27 Actual CCF.
Event 29 Modifications to valves.
Event 30 Replacement of parts during planned maintenance.
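The event-by-event judgements above amount to assigning each event a class and retaining only the actual CCFs for quantification. A minimal sketch in Python, with the classes transcribed from the list above (events not listed there are omitted):

```python
# Classification of events as judged in the list above
classification = {
    5:  "actual CCF",
    6:  "maintenance-coupled",
    7:  "actual CCF",
    8:  "actual CCF",
    10: "actual CCF",
    11: "maintenance-coupled",
    12: "maintenance-coupled",
    19: "actual CCF",
    20: "functional unavailability",
    22: "actual CCF",
    23: "actual CCF",
    24: "maintenance-coupled",
    26: "modification",
    27: "actual CCF",
    29: "modification",
    30: "planned replacement",
}

# Retain only events classed as actual common cause failures
actual_ccfs = sorted(e for e, c in classification.items() if c == "actual CCF")
print(actual_ccfs)  # -> [5, 7, 8, 10, 19, 22, 23, 27]
```

Only these retained events would enter a numerator such as the beta-factor estimate discussed earlier; the maintenance-coupled and modification events drop out.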
Where modifications are carried out it is difficult to know how
essential they were to preventing dependent failures. These are not
failures but are considered potential dependent failures. In essence,
these component outages were functional unavailabilities, not failures.
Recalculating the ratios of MFE's to total failures for each
component type:
MOTORS PUMPS VALVES
APPENDIX 2
RESULTS OF SEARCH NO 1
PLANT PWR2
V19  I1  E11  A*  10 25 19  130879
F08  V10  E11  W1  18 26 33  150980
     V11  E11  W1  18 26 33  150980
PLANT PWR3
PLANT PWR4
V55  S1  D1  F11  A*  21 43 25  190379
B16  V56  N11  A  25 33 55  220980
     V57  N11  A  25 33 55  220980
PLANT PWR5
PLANT PWR6
Subject Index
- data sources 23,49,120,175,177,237
- event data 65-70,264-266,295-297,317-341
- impact in PSA 9-29
- initiators 17,27,96,277,286
C-factor model/values 21,22,24,167
Command faults 132
Common load 2,117,149
Component event data
- general, analysis 4,5,16,37,259,303-341
- collections: CEDB 290,319-341
NPRDS 259,261,262
SRDF 259,261,262
Swedish 11,279
Computer codes: ALMONA 122
PREP-KITT 122
SAMPLE 122
Stagen-Marela 145-147
Coupling mechanism 56,60,61
Cut-off method 52,117,122,127,227
Identification procedures for CCFs 16,49,113-129,210,237-241
Impact vector 5,179-185,193-196,210-213,
218,228
Implicit modelling see parametric models
Incident report data 4,5,37,259,261
- collections: AORS 289-302
ERDS 290,319,321
LERs 11,38,197,209,228,230,309
NPERs 309
SPT 261
Inspection
- case study 149-151
- maintenance, overhaul procedures 1,13,14,42,108-111,149-151
Instrumentation systems 108,260-267
Intercomponent/
intersystem dependencies 17,96
Judgement (subjective,
engineering, etc.) 4,5,14,21,175,205,206,227,241
Quantitative analysis 9,21-22,227-228,243-256 (see
also CCF analysis procedures
and parametric models)
Uncertainty 4-5,9-10,15,19,22,26,49,
191-197,229-233,241,243,253-255
Advanced Seminar on
Common Cause Failure Analysis in
Probabilistic Safety Assessment
Proceedings of the ISPRA-Course held at
the Joint Research Centre, Ispra, 16-19 November 1987
Edited by
ANIELLO AMENDOLA
Commission of the European Communities, Joint Research Centre,
Ispra Establishment, Ispra, Italy
The analysis of dependent events is one of the most critical steps in safety and
reliability assessment of complex systems. However, a wide variety of dependency
structures may exist among the failure modes of equipment and components, thus
requiring a range of different analytical approaches. Several parametric models
have been proposed for the implicit treatment of so-called common cause failures:
the availability of adequate data is still a matter of debate.
To clarify such problems, and to assess the state-of-the-art models, data, and
procedures, at least in regard to nuclear power plants, a series of benchmark studies
have been undertaken, involving the foremost experts available in the USA and
Europe. The participants in these benchmark studies here exchange the fruits of
their experiences, thereby providing a comprehensive review of the cutting edge of
work on: the classification of dependent failures; the identification of causes for
dependencies; implicit and explicit modelling procedures; defence criteria against
multiple failures; and data analysis and operating experience.
Although the studies have been undertaken against a background of nuclear
systems, the lessons learned may readily be generalized to other fields, thus
ensuring that the book will be beneficial to the whole community of reliability
analysts.