
Advanced Seminar on

Common Cause
Failure Analysis
in Probabilistic Safety
Assessment
Proceedings of the ISPRA Course held at the
Joint Research Centre, Ispra, Italy,
16-19 November 1987

Edited by

Aniello Amendola

ISPRA Courses on Reliability and Risk Analysis

Kluwer Academic Publishers


ADVANCED SEMINAR ON
COMMON CAUSE FAILURE ANALYSIS
IN PROBABILISTIC SAFETY ASSESSMENT
ISPRA COURSES ON RELIABILITY AND RISK ANALYSIS

A series devoted to the publication of courses and educational seminars given at the
Joint Research Centre, Ispra Establishment, as part of its education and training program.
Published for the Commission of the European Communities,
Directorate-General Telecommunications, Information Industries and Innovation,
Scientific and Technical Communications Service.

The publisher will accept continuation orders for this series which may be cancelled at any time and
which provide for automatic billing and shipping of each title in the series upon publication.
Please write for details.
ADVANCED SEMINAR ON
COMMON CAUSE
FAILURE ANALYSIS
IN PROBABILISTIC
SAFETY ASSESSMENT
Proceedings of the ISPRA Course held at the Joint Research Centre,
Ispra, Italy, 16-19 November 1987

Edited by

ANIELLO AMENDOLA
Commission of the European Communities,
Joint Research Centre, Ispra Establishment, Ispra, Italy

KLUWER ACADEMIC PUBLISHERS,


DORDRECHT / BOSTON / LONDON
Library of Congress Cataloging in Publication Data

Advanced Seminar on Common Cause Failure Analysis in Probabilistic
Safety Assessment (1987 : Ispra, Italy)
Advanced Seminar on Common Cause Failure Analysis in Probabilistic
Safety Assessment : proceedings of the ISPRA course held at the
Joint Research Centre, Ispra, Italy, 16-19 November 1987 / edited by
Aniello Amendola.
p. cm. -- (ISPRA courses on reliability and risk analysis)
Bibliography: p.
Includes index.
ISBN 0-7923-0268-0
1. System failures (Engineering)--Congresses. 2. Reliability
(Engineering)--Congresses. 3. Probabilities--Congresses.
I. Amendola, Aniello, 193- . II. Commission of the European
Communities. Joint Research Centre. Ispra Establishment.
III. Title. IV. Series.
TA169.5.A34 1989
620'.00452--dc20

ISBN 0-7923-0268-0

Commission of the European Communities, Joint Research Centre, Ispra (Varese), Italy

Publication arrangements by
Commission of the European Communities,
Directorate-General Telecommunications, Information Industries and Innovation, Scientific and
Technical Communications Service, Luxembourg

EUR 11760
© 1989 ECSC, EEC, EAEC, Brussels and Luxembourg
LEGAL NOTICE
Neither the Commission of the European Communities nor any person acting on behalf of the
Commission is responsible for the use which might be made of the following information.

Published by Kluwer Academic Publishers,


P.O. Box 17, 3300 AA Dordrecht, The Netherlands.
Kluwer Academic Publishers incorporates the publishing programmes of
D. Reidel, Martinus Nijhoff, Dr W. Junk and MTP Press.

Sold and distributed in the U.S.A. and Canada


by Kluwer Academic Publishers,
101 Philip Drive, Norwell, MA 02061, U.S.A.

In all other countries, sold and distributed


by Kluwer Academic Publishers Group,
P.O. Box 322, 3300 AH Dordrecht, The Netherlands.

All Rights Reserved


No part of the material protected by this copyright notice may be reproduced or
utilized in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the copyright owner.

Printed in the Netherlands on acid-free paper


TABLE OF CONTENTS

Foreword vii

Introduction (A. Amendola) 1

Treatment of Common Cause Failures. The Nordic Perspective (S. Hirschberg) 9

Classification of Multiple Related Failures (A. Amendola) 31

Design Defences against Multiple Related Failures (P. Humphreys) 47

Design-Related Defensive Measures against Dependent Failures - ABB ATOM's Approach (S. Hirschberg and L. I. Tirén) 71

Design Defences against Common Cause/Multiple Related Failure (P. Doerre and R. Schilling) 101

Measures Taken at Design Level to Counter Common Cause Failures. A Few Comments Concerning the Approach of EDF (T. Meslin) 107

Analysis Procedures for Identification of MRF's (P. Humphreys) 113

Dependent Failure Modelling by Fault Tree Technique (S. Contini) 131

Treatment of Multiple Related Failures by MARKOV Method (J. Cantarella) 145

Parametric Models for Common Cause Failures Analysis (K. N. Fleming) 159

Estimation of Parameters of Common Cause Failure Models (A. Mosleh) 175

Pitfalls in Common Cause Failure Data Evaluation (P. Doerre) 205

Experience and Results of the CCF-RBE (A. Poucet) 221

Analysis of CCF-Data Identification. The Experience from the Nordic Benchmark (K. E. Petersen) 235

Some Comments on CCF-Quantification. The Experience from the Nordic Benchmark (K. Pörn) 243

Analysis of Common Cause Failures Based on Operating Experience: Possible Approaches and Results (T. Meslin) 257

Multiple Related Failures from the Nordic Operating Experience (K. U. Pulkkinen) 277

The Use of Abnormal Event Data from Nuclear Power Reactors for Dependent Failure Analysis (H. W. Kalfsbeek) 289

MRF's from the Analysis of Component Data (P. Humphreys, A. M. Games, N. J. Holloway) 303

Index 343
FOREWORD

There is today a wide range of publications available on the theory of
reliability and the technique of Probabilistic Safety Analysis (PSA).
To place this work properly in this context, we must recall a basic
concept underlying both theory and technique, that of redundancy.

Reliability is something which can be designed into a system, by the
introduction of redundancy at appropriate points. John Von Neumann's
historic paper of 1952 "Probabilistic Logics and the Synthesis of
Reliable Organisms from Unreliable Components" has served as
inspiration for all subsequent work on systems reliability. This paper
sings the praises of redundancy as a means of designing reliability
into systems, or, to use Von Neumann's words, of minimising error.

Redundancy, then, is a fundamental characteristic which a designer
seeks to build in by using appropriate structural characteristics of
the "model" or representation which he uses for his work. But any
model is established through a process of delimitation and
decomposition. Firstly, a "Universe of Discourse" is delineated; its
component elements are then separated out; and moreover, in a
probabilistic framework, for each element each possible state is
defined and assigned an appropriate possibility measure called
probability.

This process of delimitation and decomposition is at the root of many
problems of modern technology - divergence between the model and
reality, disagreements among experts, and the consequent reluctance on
the part of the public to trust the experts. But the fact is that some
such process is necessary; it is the essence of modern scientific
method, and as such is the basis of our modern industrial
civilization.

In my opinion, the reluctance among politico-technical circles to
accept and understand the value of Probabilistic Safety Analysis stems
from the fact that a PSA has to carry this method of delimitation and
decomposition to the extreme. But we must remember that nonetheless
the PSA methodology succeeds in capturing and modelling a larger and
more consistent segment of reality than any previous method of safety
analysis.
In the cases we are considering, this analysis can throw up fictitious
redundancies, redundancies which appear in the model but are not
reflected in reality. The solution to the problem of Common Cause
Failures, Common Mode Failures, of dependencies, is in essence to avoid
creating fictitious redundancies in a model, or at least to eliminate
them in subsequent elaboration of the model.

I must emphasise that this problem lies upstream of any mathematical
manipulation of the model; it is a problem of representation of
reality. Any solutions will therefore be closely linked to the reality
being modelled and the type of model chosen.

The importance of this book lies in its overall approach to this
problem. It starts from the well-known decomposition methodology,
based on familiar Boolean two-valued logic, of fault tree analysis. It
then goes on to demonstrate, by the use of many concrete examples, how
this process can be guided, refined, and corrected by successive
approximations, in order to arrive at a model which is both
technically correct and practically useful.

This work is full of practical guidance and useful heuristics -
indeed, it turns out that many so-called rules should be seen as
useful heuristics! These offer considerable help in the specific tasks
faced by the analyst trying to model a complex system. The book also
represents an important, if not unique, synthesis of experience and
collection of field data.

Altogether, we have here an extremely useful review of the "state of
the art" for anyone involved in the technical work of establishing a
PSA, as well as a cultural achievement which will appeal to those
interested in evaluating the methodology of probabilistic
representation of complex systems.

Dr. Giuseppe Volta


Commission of the European Communities
Joint Research Centre - Ispra Site
Director of the Institute for Systems Engineering
INTRODUCTION

A. Amendola
CEC-JRC
Institute for Systems Engineering
Systems Engineering and Reliability Division
21020 Ispra (Va) - Italy

For reliability assessments, complex systems are normally decomposed
into their constituent items. These do not coincide
with the "hardware" components only. Indeed, a physical system
necessarily interacts with human operators according to control,
emergency, repair and maintenance procedures, technical
specifications, etc., so that such "software" elements and human
operators are further constituents of the system. At a higher system
level, even other factors such as the conceptual design or the overall
organizational management of the macrosystem, of which the physical
system is a part, can be considered as further generalized components
interacting with the physical system.
After the necessary decomposition a major problem for correct
modelling and assessment is the reconstruction of the actual
dependency structures among the items after having identified them and
made them explicit: this is the only way of achieving a model as close
as possible to the real system.
The assumption of "independence" of the items and that of purely
random failure processes are very helpful because of the easy
probabilistic calculations they make possible, but, of course, they
are very far from reality. Even when, by a well structured analysis
procedure, functional links among the different items have been
identified, dependency structures provoked by less perceivable
"software" elements, or by events occurring outside the physical
boundary of the operating systems (for instance events which occurred
during the conceptual design or the manufacturing) can become
dramatically evident at a certain time of system operation and only
under particular demands.
The unavoidable existence of more or less hidden dependency
structures is the limiting factor which prevents a system from achieving
unlimited reliability. The assignment of probabilities for failures
provoked by dependencies, which can only be hypothesized from
statistical evidence on systems different from those under
investigation, is the crucial problem of any risk assessment project.
The awareness of this problem and the need to implement adequate
defences in a system against the occurrence of common cause failures
(as such dependency structures are usually labelled) have
encouraged several proposals for analysis procedures and mathematical
models as well as significant international collaborative projects.
Already in November 1975 a Task Force on Problems of Rare Events
in the Reliability Analysis of Nuclear Power Plants was set up on
the basis of a recommendation of CSNI (the Committee on the Safety of
Nuclear Installations of the Nuclear Energy Agency of the OECD). A
subsequent CSNI Research Group placed emphasis on protective systems
for nuclear reactors and elaborated a classification system (1) for
common mode failures (terminology problems are discussed elsewhere in
the book (2)) especially directed towards defences against CCFs.
Further insights from the CSNI project and from UKAEA-SRS
researchers in the field came from the Watson Review in 1981 (3).
In the meantime a number of probabilistic models were proposed:
Fleming's model based on the ratio of CCF to total failure rate (4),
the shock model by Apostolakis (5), the common load model by Mankamo (6),
the multivariate model by Vesely (7) and the binomial failure rate
model by Atwood (8), which were followed by other models described
elsewhere in the book.
Together with these statistical parameter models to predict CCF
rates, several procedures have been proposed to identify CCF events
and to include them into the system logical diagrams (see a non
exhaustive but relevant literature list at Refs.(9-19)). Also some
data on Diesel generators (20) and pumps (21) have been estimated and
published since then.
Despite this significant theoretical effort, after WASH-1400, which
used a boundary model for including CCFs, the CCF problem was for
many years either ignored in practice (the first NPP PSAs did not
include CCFs) or poorly approached. This was indeed evidenced in
Europe by the first Systems Reliability Benchmark Exercise (S-RBE)
(22) and in the USA by the identified need to agree on a consistent
classification system (23) as a basis for establishing an adequate
data base of CCF occurrences. Furthermore, problems connected with
sound statistical estimation procedures were identified and originated
discussions which continued even in recent papers (24-29).
The S-RBE project was aimed at assessing the complete procedure
of a reliability evaluation of a complex system by starting from the
basic documentation and familiarization with the reference system.
This was the EDF Auxiliary Feedwater System of the Paluel Unit. It
consisted of two redundant trains, each one again with a double and
diverse redundancy (motor-driven and turbo-driven pumps). Therefore,
it presented interesting challenges to the expert teams involved in a
CCF analysis.
Participation in the exercise included representatives of major
partners involved in NPP safety assessment in Europe, i.e.
authorities, vendors, utilities and research institutes from EEC
member countries and Sweden.
S-RBE included both a structured qualitative analysis and
reliability modelling and evaluation; furthermore, it was subdivided
into several phases in order to separate the effects of the different
contributors upon the overall spread of the results. During all

working phases and subsequent discussions, the discrepancy among the
approaches to the CCF problem could not be overcome as a result of the
programmed S-RBE. Indeed, in this respect, it was not a question of
different data or different construction of the fault tree; rather the
way in which the problem was dealt with differed completely, and
appeared to be a natural consequence of different philosophies.
Namely, only a few teams attempted to quantify CCF occurrences in the
fault tree, other teams either analysed CCFs in a qualitative manner,
by indicating the reasons why such events could be excluded or
neglected since adequate defences were introduced into the design, or
performed sensitivity analysis of the overall results to selected CCF
occurrences. Furthermore, discrepancies appeared even in the
terminology and in the basic differentiation among the different
dependency structures implied by the CCF designation (see the analysis
performed by A. Games on the CCF aspects of the S-RBE in Part II of
Ref.22).
Subsequently an ad-hoc project was launched (1984-1986): the
"Common Cause Failure Reliability Benchmark Exercise" (CCF-RBE), on which
several papers in the book are based, which allowed the participants
to clarify most of the discrepancies in the approaches and to
elaborate a basic consensus on the most appropriate procedures to be
followed to include dependency analysis in PSA. Both S-RBE and CCF-
RBE were part of the JRC programme on NPP risk and reliability
evaluation, coordinated by G. Mancini.
The success of the CCF-RBE was also enhanced by the fact that
this time even USA teams, previously involved in the already mentioned
classification project (23) and in elaborating formal analysis
procedures (30,31), participated in the project. They brought,
therefore, not only their particular experience but also the lesson
learnt from a project on data benchmarking in which many laboratories
in the USA and Europe were involved (23). This project was sponsored
by EPRI and managed by D.H. Worledge; it started in 1982 and had as
objectives a better understanding of CCF and the development of
available experimental data into a data base to support PSA analysis
as well as the planning of CCF defensive strategies. In phase-I of the
project a preliminary formulation and test of a classification system
was performed. A second test was performed in phase-II. As a result
a classification scheme was elaborated (32,2), which was the
basis for the analysis of NPP experience involving dependent events
performed by PL&G for EPRI (33). In this way the CCF-RBE became also a
significant test for the guidelines in preparation in the USA and gave
very significant insights and feedback to them (34).
A further very significant project on a "Common Cause Failure
Data Benchmark Exercise" was performed in the framework of the "Nordic
Cooperation in Nuclear Safety" which involved industries, research
organizations and authorities of the Scandinavian countries (35).
The seminar, of which this book presents the proceedings, was
promoted by the wish to compare the findings and to confront the
approaches followed by the participants in the above mentioned
Benchmark projects, as well as to update them according to new
research originated by issues identified during the CCF-RBE.

Therefore, the book represents a state-of-the-art review on the
different aspects linked with the problem of identifying, modelling
and predicting probabilities of dependent failures as well as of
adequately defending a system design against their occurrence. It
collects the experiences of most of the teams having participated in
these international projects and is enriched by the analysis of the
operating experience from European NPPs.
In addition to data drawn from incident reports, the use of
component event data is another way to enrich available data bases on
dependent failures. Investigations on this topic have started at the
JRC Ispra (36,37) and have been finalized at NCSR, as described in the
book (38).
In addition to the results presented in this book, the significant
contributions made by all other participants in these projects to the
methodological results of the multiple projects on CCF, and therefore
to the findings described in the book, must be acknowledged.
Of course, no problem can be considered to be solved once and for
ever, and further research will certainly increase the consistency of
PSA analyses. However, some issues, on which a large consensus has
been found, should be considered to be definitively clarified at least
as far as their theoretical and procedural foundations are concerned.
These can be summarized as follows (39):
- Dependency structures have a dominant impact on the reliability of
redundant systems and as such should be considered right from the
early design stage throughout the system life cycle. They must be
properly accounted for in any PSA.
- A systematic and structured qualitative analysis is necessary in
order to identify potential multiple failure mechanisms (in addition
to the relevant chapter in the book, see also Ref.(40)), to screen
and rank potential dependent events and to consistently link the
qualitative insights about system design and operation with the
assignment of probabilities.
- Both explicit modelling and implicit (parametric) models are needed
since each technique has its rather well defined application domain.
Dependent events related to a clear cause-consequence structure
should, in principle, be modelled explicitly in the fault trees or
event trees, whereas the residual set of potential dependency
structures, which cannot be represented in the logical model (either
because of too onerous a modelling or because of a less definite
cause-consequence structure), can be captured by parametric models.
- Use of generic CCF parameters should be discouraged. Generic Beta
factors or similar data might be accepted only for screening
purposes when deciding which events should be included in the
probabilistic assessment (a schematic beta-factor sketch is given
after this list).
- The parameters for implicit models should be estimated case by case,
starting from the widest possible event data base and screening the
events which potentially affect the design under investigation. This
analysis phase is the key contributor to analyst-to-analyst
variability and, therefore, to the assessment uncertainty, because
of the subjective judgement applied in the process of transferring
experience from other NPPs to the one under analysis.
- Event statistics should be preferred to component statistics, even
if the quantitative effects are less important than the spread
introduced by the judgement in the screening of events and in the
assessment of the impact of the events on the plant under analysis.
- The problem of mapping up and mapping down, i.e. extrapolation or
interpolation from lower, respectively higher, redundancy systems to
the system under analysis, identified by P. Dörre during the CCF-
RBE, should be consistently approached.
- Updating of prediction in the design phase with operational evidence
should be recommended in a Bayesian perspective; and, finally
- Effort in enlarging the dependent event data base should be made by
analyzing both incident and component event files.
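
As a schematic point of reference for the beta-factor terminology used in the list above (an illustrative textbook-style sketch, not a formulation taken from any chapter of this book; the symbols Q_t, Q_I and Q_C are introduced here only for the illustration), the simplest implicit model splits the total failure probability Q_t of a component into an independent part Q_I and a common cause part Q_C:

\[
Q_t = Q_I + Q_C , \qquad \beta = \frac{Q_C}{Q_t} .
\]

For a group of m identical redundant components the common cause contribution to failure of the whole group is then approximated by \( \beta\,Q_t \), while the purely independent contribution is \( \bigl((1-\beta)\,Q_t\bigr)^{m} \). With a generic screening value such as \( \beta = 0.1 \) and \( Q_t = 10^{-3} \), a two-train system would show \( \beta Q_t = 10^{-4} \) against an independent contribution of only about \( 8\times 10^{-7} \); this illustrates both why dependency structures dominate the reliability of redundant systems and why generic beta factors should be restricted to screening.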

All these issues have been dealt with in the different chapters of the
book, which therefore should prove to be a comprehensive guide for a
reliability analyst, not only to include dependency structures in a PSA
procedure but also to make him aware of the connected problems and
uncertainties.

REFERENCES

1. I.A. Watson and G.T. Edwards, A study of Common Mode Failures,
UKAEA Report, SRD R146, July 1979.
2. A. Amendola, Classification of multiple related failures (in this
same book).
3. I.A. Watson, Review of Common Cause Failures, UKAEA Report, NCSR
R27, July 1981.
4. K.N. Fleming, A reliability model for Common Mode Failures in
redundant safety systems, General Atomic Report GA-13284,
December 1974.
5. G.E. Apostolakis, Effect of a certain class of potential common-
mode failures on the reliability of redundant systems, Nucl. Eng.
& Design, Vol.36, No.1, January 1976.
6. T. Mankamo, Common load model - a tool for common cause failure
analysis, VTT (Technical Research Centre of Finland), 1977.
7. W. Vesely, Estimating common cause failure probabilities in
reliability and risk analyses, Int. Conf. on Nuclear Systems
Reliability Engineering and Risk Assessment, Gatlinburg,
Tennessee, 20-24 June 1977.
8. C.L. Atwood, Estimators for the binomial failure rate common
cause model, NUREG/CR-1401, EGG-EA-5112 (1980).
9. W.C. Gangloff and T. Franke, An engineering approach to common
mode failure analysis, Proc. of Development and Application of
Reliability Techniques to Nuclear Power Plants, Liverpool (1974).
10. W.C. Gangloff, Common Mode Failure Analysis, IEEE - Power
Apparatus and Systems, Vol.PAS-94, January/February 1975.
11. G. Volta, The Common Mode Failure Analysis, CEC-JRC, SR/76-8,
Ispra (1976).
12. G.R. Burdick, COMCAN - A computer code for common cause analysis,
IEEE Transactions on Reliability, Vol.R-26-2 (June 1977).
13. D.P. Wagner, C.L. Cate and J.B. Fussell, Common cause failure
analysis methodology for complex systems, in: Nuclear Systems
Reliability Engineering and Risk Assessment (J.B. Fussell and
G.R. Burdick, eds.) SIAM (1977).
14. J.J. Rooney, J.B. Fussell, Backfire II - A computer program for
common cause failure analysis of complex systems, The University
of Tennessee, August 1978.
15. D.M. Rasmuson, N.W. Marshal, J.R. Wilson and G.R. Burdick, COMCAN
II A - A computer program for automated CCF analysis, EG&G Idaho,
TREE 1361, May 1979.
16. R.B. Worrell et al., A Boolean approach to common cause analysis,
Annual Reliability and Maintainability Symposium, IEEE, 79RM067,
pp.263-266.
17. S. Contini, Algorithms for common cause analysis - a preliminary
report, TN 1.06.01.81.16, CEC-JRC Ispra (1981).
18. EPRI, WAMCOM common cause methodologies using large fault trees,
EPRI-NP-1851, Project 1233-1, Final Report (1981).
19. J.K. Vaurio, Structures for common cause failure analysis, Int.
ANS/ENS Topical Meeting on Probabilistic Risk Assessment, Port
Chester, September 20-24, 1981.
20. J.A. Stevenson and C.L. Atwood, Common cause failure rate
estimates for Diesel generators in nuclear power plants, Int.
ANS/ENS Topical Meeting on Probabilistic Risk Assessment, Port
Chester, September 20-24, 1981.
21. C.L. Atwood, Common cause and individual failure and fault rates
for license event reports of pumps at U.S. commercial nuclear
power plants, EGG-EA-5289 (1980).
22. A. Amendola, Systems Reliability Benchmark Exercise, final
report, EUR 10696 EN (1985).
23. LATA Inc., Data benchmark test of a classification procedure for
common cause failures, prepared for EPRI, LATA-EPR-02-02, June
1983.
24. G. Apostolakis and S. Kaplan, Pitfalls in risk calculations,
Reliability Engng., 2 (1981) pp.135-145.
25. R. Virolainen, On common cause failures, statistical dependence
and calculation of uncertainty, disagreement in interpretation of
data, Nucl. Eng. and Des., 77, 1984, 103-108.
26. G.W. Parry, Incompleteness in data bases: impact on parameter
estimation uncertainty, The Annual Meeting of the Society for
Risk Analysis, Knoxville, Tennessee, September 30 - October 3,
1984.
27. H.M. Paula, Comments on the analysis of dependent failures in
risk assessment and reliability evaluation, Nuclear Safety, 27
(1986) pp.210-212.
28. G. Apostolakis and Parviz Moieni, The foundations of models of
dependence in probabilistic safety assessment, Reliability
Engng., 18 (1987) pp.177-195.
29. A. Mosleh and P. Dörre, Letter to the Editors on 'Event-based
Multiple Malfunction model', Reliability Engng. and System
Safety, 21 (1988) pp.239-243.
30. P.L.&G., PRA procedures for system level dependent events
analysis, prepared for EPRI and USNRC, January 1987.
31. K.N. Fleming, A. Mosleh and R.K. Deremer, A systematic procedure
for the incorporation of common cause events into risk and
reliability models, Nuclear Engng. Des., 93 (1986) pp.245-273.
32. G.L. Crellin et al. (LATA), A study of common cause failures -
Phase II: A comprehensive classification system for component
fault analysis, EPRI NP-3837, May 1985.
33. K.N. Fleming, A. Mosleh et al. (PL&G), Classification and
analysis of reactor operating experience involving dependent
events, EPRI-NP-3967, June 1985.
34. D. Worledge and I.B. Wall, What has been learned about common
cause failures in the last 5 years?, Int. Topical Conference on
Probabilistic Safety Assessment and Risk Management, Zürich,
August 30 - September 4, 1987.
35. S. Hirschberg (Ed.), NKA-Project 'Risk Analysis'. Summary report
on Common Cause Failure Data Benchmark Exercise, RAS-470(86)14,
June 1987.
36. A.M. Games, A. Amendola and P. Martin, Exploitation of a
component event data bank for common cause failure analysis,
Proc. of ANS/ENS Topical Meeting on Probabilistic Safety Methods
and Application, San Francisco, California (USA), February 24 -
March 1, 1985.
37. A.M. Games, P. Martin and A. Amendola, Multiple related component
failure events, Proc. of Reliability '85, NCSR, Birmingham (UK),
July 10-12, 1985.
38. P. Humphreys, A.M. Games and N.J. Holloway, MRF's from the
analysis of component data (in this same book).
39. A. Poucet and A. Amendola, State of the art in PSA reliability
modelling as resulting from the international benchmark exercises
project, NUCSAFE 88 Conference, Avignon, October 2-7, 1988.
40. B.D. Johnston, A structured procedure for dependent failure
analysis (DFA), Reliability Engng., 19 (1987) pp.125-136.
TREATMENT OF COMMON CAUSE FAILURES. THE NORDIC PERSPECTIVE.

S. HIRSCHBERG
ABB ATOM AB
Office of Interdisciplinary Engineering
S-721 63 VÄSTERÅS
Sweden

ABSTRACT. This paper presents current developments in Nordic countries within the area of treatment of Common Cause Failures (CCFs). The organisation and main findings of the recently performed Nordic Benchmark Exercise on CCF-data are described. The findings concern recommendations on suitable procedures for search of CCFs, suggestions for improvements of the current failure reporting system, merits and drawbacks of classification systems, most sensitive elements in the process of CCF-quantification, and evaluation of parametric models and use of direct assessment in the context of CCF-quantification. In the second part of this report the results from the first, qualitative phase of a project aiming at a systematic comparison of the treatment of dependencies (and CCFs in particular) in the Swedish Probabilistic Safety Assessments (PSAs) are summarized. A wide spectrum of differences has been observed between the analyses performed. These differences may concern level of ambition, degree of detail, scope, approach to modelling and quantification, and documentation of the studies. Generally, the Swedish PSAs have been successful in identification of various types of dependencies. The recommendations based on the qualitative analysis comprise a list of specific cases to be analysed in the planned sensitivity studies, proposals for future Nordic research projects and suggestions for improvements of/supplements to existing analyses.

1. Background

Treatment of dependencies is obviously one of the critical issues in the context of Probabilistic Safety Assessment (PSA). Dependencies may have a major impact on the plant safety level and must be addressed at all stages of the analysis. One particular topic, namely Common Cause Failures (CCFs), which represent a particular type of dependency, has been clearly identified as a major limitation of current PSAs. The treatment of this issue influences the credibility of the studies, the question of completeness and the interpretation of results.

During the last five years a very rapid development has been taking place
within this field in Scandinavia. In 1983 an international workshop on dependent
failure analysis (ref. 1), sponsored by the Swedish Nuclear Power Inspectorate,
was held in Västerås. The meeting was considered very successful. The
general feeling of the participants was that some order had been brought to the
somewhat chaotic state of dependency analysis. The recommendations of the
workshop (ref. 2) were followed in several of the Swedish PSAs.
The development work has been and still is carried out by the Swedish
utilities, by ABB ATOM and by the Nordic research institutes partly in the
current PSAs and partly within NKA-projects, SK-1 (ref. 3) and RAS-470
(ref. 4). Substantial effort has been directed towards improving the CCF-
quantification models (ref. 5, 6). This has led to development of an approach
which consistently takes into account the high level of redundancy and diversity
typical for the safety systems in the latest generation of ABB ATOM's BWRs.
Progress has been made within the recently completed CCF-data Benchmark
Exercise (ref. 4) in problem areas such as:

procedures for search of CCFs
use of classification systems
treatment of data
use of parametric quantification methods and of direct assessment
sensitivity and uncertainty analyses.

These findings will be described here based on the summary report of the
Benchmark Exercise (ref. 4) and on individual analyses performed by the
participating organisations. More detailed coverage of some of the specific
topics mentioned above may be found in the Nordic contributions to this
Proceedings, addressing the subject of CCF-identification, CCF-quantification
and CCF-evidence based on operating experience.
The importance of dependencies has been recognized in the Swedish PSAs,
where as a rule great emphasis has been put on their treatment. The analyses
performed contain a variety of models, data and assumptions. Thus, it is
possible that modelling aspects could explain some of the differences in the
results of different studies. In view of the substantial uncertainties associated
with the treatment of certain types of dependencies, a major effort has been
undertaken to systematically compare the analyses performed in the Swedish
PSAs. Here, the main results from the qualitative phase of this project will be
described, with main emphasis on the treatment of CCFs. Detailed account of
this work may be found in ref. 7. A corresponding analysis concerning human
interactions has also been performed (ref. 8).
The CCF-data Benchmark Exercise and the systematic comparison of
analyses of dependencies and human interactions were carried out within the
Nordic project "Risk Analysis" (NKA/RAS-470), initiated as a part of the
research program of the Nordic Liaison Committee for Atomic Energy (NKA).
The participants in this project include authorities, utilities, research
institutes and the Swedish vendor of nuclear power plants. The project
activities are planned to be completed in 1989.

2. Nordic Common Cause Failure Data Benchmark Exercise

2.1. ORGANISATION AND TIME SCHEDULE

The Benchmark Exercise has been carried out by four working groups:

ABB ATOM, SWEDEN
RISØ NATIONAL LABORATORY (RISØ), DENMARK
STUDSVIK ENERGITEKNIK AB (STUDSVIK), SWEDEN
TECHNICAL RESEARCH CENTRE OF FINLAND (VTT).
The work has been coordinated by ABB ATOM and directed by a Project
Group composed of one or two project members from each participating
organisation and of experts from the Swedish Nuclear Power Inspectorate,
Finnish Centre for Radiation Protection and Nuclear Safety, Imatran Voima Oy,
and the Swedish State Power Board.

The main steps in the exercise have been:

1) Collection of relevant background material, i.e. a set of failure events for the particular component type analysed.
2) Stipulation of common rules concerning identification of CCFs and formu-
lation of general directives for the estimation of CCF-contributions.
3) Practical carrying out of the exercise including study of related problems;
this step consists of two phases, identification and quantification.

Table 1 shows the project schedule of the Benchmark Exercise and the total
resources expended by the organisations involved.

2.2. PROBLEM DESCRIPTION AND BACKGROUND MATERIAL

Motor-operated valves (MOVs) in Swedish Boiling Water Reactor (BWR) plants
have been chosen as the object of the CCF-data Benchmark Exercise (BE).
ABB ATOM supplied the four groups participating in the BE with a common
set of failure events involving MOVs from seven Swedish BWRs. The database
comprised 340 failure reports from the Scandinavian Nuclear Power Reliability
Data system and two Swedish Licensee Event Reports (LERs). The following
time periods have been covered:

Barsebäck 1 771001-821231
Barsebäck 2 790101-821231
Forsmark 1 810101-821231
Forsmark 2 810701-821231
Oskarshamn 1 740101-821231
Oskarshamn 2 760101-821231
Ringhals 1 761001-821231

Additional information collected by ABB ATOM and distributed to all
working teams included:

TABLE 1
Project Schedule of the Benchmark Exercise (person-months per task; in the original table the schedule is also laid out month by month from July 1985 to December 1986).

Task                                                            Person-months
Collection and preparation of background material                1.5
Stipulation of common rules for BE                                0.5
CCF-identification                                                6.0
CCF-classification                                                1.5
CCF-quantification incl. uncertainty and sensitivity analysis     8.0
Final report                                                      3.0
Project coordination                                              1.5
Meetings, administration                                          3.0

Person-months, total                                             25.0


a list of all MOVs covered by failure reports, including operation times,
test intervals, number of activations and number of critical failures during
commercial operations
a list of all overhaul periods
failure codes used in the ATV-system
flowsheets of relevant systems.
Using this information each group identified a number of CCFs which have
occurred at different plants. Later on, based on the results from the identifica-
tion phase, the list of CCF-candidates was used as the starting point for the
quantification phase. However, according to the rules, reviewed and approved by
all participants, it was up to each analyst to decide which of these events are
relevant for the particular case analysed. The actual application concerned four
redundant MOVs in a safety system of Forsmark 1 plant. All failure multipli-
cities were to be considered, which should lead to estimation of Pi (i = 1,2,3,4),
i.e. probability of observing exactly "i" failures at a test or at a demand. Each
group was free to choose any one of the available methods for CCF-quantifica-
tion.
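
(As an illustrative aside, not part of the original exercise specification: the most direct empirical estimate of these quantities would be

\[
P_i \approx \frac{n_i}{N}, \qquad i = 1,\dots,4,
\]

where N denotes the total number of test or demand occasions of the four-valve group and n_i the number of occasions on which exactly i of the four valves failed; both symbols are introduced here only for this sketch. The parametric models listed in section 2.3 provide alternative ways of obtaining the same quantities when the higher multiplicity counts n_2, n_3, n_4 are too sparse to be estimated directly.)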

2.3. OBSERVED DIFFERENCES IN APPROACHES TO IDENTIFICATION AND QUANTIFICATION

Observed deviations in approaches of different groups to CCF-identification are
mainly related to the actual interpretations of the CCF-definition and to
differences in applications of boundary conditions. This may concern:

treatment of intersystem dependencies
treatment of identical non-redundant components
length of critical time period
failure criticality
definition of potential CCFs
treatment of failures reported during the overhaul period
treatment of different failure modes.

In the context of CCF-quantification the following methods were used:

Multiple Greek Letter (MGL) method (ref. 9)
Binomial Failure Rate (BFR) model (ref. 10)
Multinomial Failure Rate (MFR) model (ref. 11)
Direct Assessment (DA)
Additive Dependence (ADDEP) model (ref. 12).
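
For orientation only (a generic textbook formulation of the second model in this list, given under the assumption that it corresponds to the variant of ref. 10 actually applied; the rate symbols lambda and mu and the shock failure probability p are introduced purely for this sketch), the Binomial Failure Rate model assumes that each of the m components of a group fails independently with rate \( \lambda \), and that common cause shocks arrive with rate \( \mu \); at each shock every component fails with probability p, independently of the others, so that the rate of shock-induced events failing exactly k of the m components is

\[
\lambda_k = \mu \binom{m}{k} p^{k} (1-p)^{m-k}, \qquad k = 2,\dots,m,
\]

while single failures occur with rate \( m\lambda + \mu\, m\, p\,(1-p)^{m-1} \). The sensitivity to assumptions noted for the BFR results in section 2.4 then reflects mainly how \( \mu \) and p (and the treatment of possible lethal shocks) are inferred from sparse multiple-failure data.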

The observed differences in quantification (in addition to choice of different
models) concern:

approach to screening ("non-CCF" events and design-oriented screening)
failure extension (mapping up)
weighting of potential CCFs.

2.4. RESULTS

As a result of the identification, 17 CCF-contributions have been identified.
Eleven events involved redundant valves; nine of these eleven events were
identified by at least three of the groups. Internal leakages, torque-switch
problems and sticking valves constitute dominating failure modes.
Figure 1 shows the estimated probabilities of observing exactly i (i = 1,2,3,4)
failures per demand and corresponding 90% confidence intervals. The differences between the methods increase with growing failure multiplicity but, with the exception of the BFR and ADDEP estimates of P3 and P4, are not dramatic.
Sensitivity analyses have been performed by all participating groups. The BFR model has been shown to be very sensitive to assumptions. With few exceptions most of the discrepancies in the estimated CCF-contributions can be explained by differences in the treatment of data (CCF-definition and weighting factors etc).

2.5. MAIN FINDINGS

The methods used for CCF-identification are straightforward and require as a
minimum information failure descriptions containing failure mode, cause,
criticality and time of detection. However, availability of much more detailed
background material including data on type of MOVs, physical locations,
manufacturers, maintenance policies etc, would decrease the impact of
subjective judgement on the results. These aspects should be considered in
future applications.
The use of computers to aid in searching, sorting and generally reorganising
failure reports is highly recommended. At the same time, dependence on the
computer alone to directly identify CCF-candidates is discouraged. In the latter
case, it is necessary to depend heavily on the coding of failure reports, which is
a major source of uncertainty.
Since a majority of CCF-candidates has been detected during the overhaul
period, any improvement of the quality of these reports would be most
welcome. In addition, when using screening procedures attention should be given
to the types of tests carried out during normal operation and during overhaul,
and their capability of revealing critical failures. Treatment of multiple failure
events detected during the overhaul period proved to be one of the most
controversial and unresolved issues in the Benchmark Exercise.
The search for CCFs has shown that identification of such events can be reasonably performed using the available failure reports. Excluding the differences in scope and neglecting the differences in definitions, the agreement between the groups is satisfactory. On the other hand, including these differences produces much worse agreement. This fact supports the requirement of providing a clear and concise specification of boundary conditions when performing studies of operational CCF-experience. The differences in scope should be seen in the first place as an organisation problem in the context of a Benchmark Exercise. The impact of subjective judgement could be significantly decreased if more information, particularly on failure causes, could be included in the descriptive part of failure reports. Major improvements would be expected as a result of involving plant personnel in scrutiny of the reports describing CCF-candidates. This could supply additional information and thus facilitate forming
a final opinion.

Figure 1. Estimated probabilities of observing exactly i (i = 1, 2, 3, 4) failures per demand and corresponding 90% confidence intervals (the following index notation has been used: AA = ABB ATOM, R = RISØ, S = STUDSVIK, V = VTT).

Due to the nature of the Swedish component failure reporting system
(which concerns single components), the use of classification systems is of limited
value. Possibly, decisions could be facilitated in some doubtful cases. The
classification systems are cause-oriented, while the available failure reports
provide in the first place information about failure modes. Among the
advantages of classification systems we could mention that they provide a
systematic and standardised way of presenting and saving the results as well
as a good framework for exchanging information and experiences between
analysts. Consequently, understanding of foreign CCF-events may be facilitated
by use of a good classification scheme.
The quantitative analysis has shown that direct assessment of CCF-
contributions is possible, given comprehensive information including system
flowschemes for identification of redundancies, and number of actuations and
failures of relevant components. Simple parametric methods (e.g. Additive
Dependence model and Multiple Greek Letter method) are still of major
interest; they may be directly combined with data on single failure probabilities
given in the Swedish Reliability Data Book (ref. 13), are easy to apply, are
suitable for checking the impact of modified assumptions, and represent in
practice the only option which may be applied to components which are not as
common at the plants as valves. On the other hand, the Binomial Failure Rate
model is complex and its use may result in arbitrariness. The Multinomial Fai-
lure Rate method needs as input the same type of information as direct
assessment.
In most cases the identified differences in quantitative analyses are only to
a certain extent model-dependent. The handling of data (CCF-definition in
general, definition of potential CCFs, screening, use of impact vectors and
weighting factors etc), is of decisive importance in this context. Consequently,
several of the factors which are essential in the process of CCF-identification
have also a major impact on quantification.
Hopefully, the results of the Nordic Benchmark Exercise will form a
valuable input to a future project aiming at generation of a Nordic CCF-data
book.

3. Retrospective Qualitative Analysis of Dependencies in Swedish PSAs

3.1. SCOPE OF WORK


This study has been carried out by ABB ATOM.
The main emphasis in this part of the present paper will be on the treatment
of CCFs. Regarding dependencies in general, the main findings will be briefly
summarized.
The focus on qualitative characteristics of dependency analysis should be
interpreted in a broad sense, i.e. also the quantitative methods and data applied
have been considered. The results from the present phase will be used later on
as an input to the quantitative phase of the project comprising sensitivity
studies.
Only the completed Swedish PSAs have been considered, i.e. PSAs for five
ABB ATOM BWR plants (Ringhals 1, Barsebäck 1, Forsmark 3, Oskarshamn 1
and Oskarshamn 3) and for one Westinghouse PWR plant (Ringhals 2).
Details of the comparative overview may be found in the main report
(ref. 7).

3.2. APPROACHES TO MODELLING OF DEPENDENCIES


A systematic classification of dependent failures is necessary when carrying out the comparative studies. For this purpose the classification of the PRA Procedures Guide (ref. 14) has been adapted. Consequently dependencies have been divided into the following categories:

1) Type 1. Common-cause Initiating Events (CCIs)
2) Type 2. Intersystem dependencies
   Type 2A. Functional dependencies
   Type 2B. Shared-equipment dependencies
   Type 2C. Physical interactions
   Type 2D. Human-interaction dependencies.
3) Type 3. Intercomponent dependencies. Subtypes 3A, 3B, 3C and 3D are defined to correspond with subtypes 2A, 2B, 2C and 2D, respectively, except that the multiple failures occur at the component level instead of at the system level.

Below the treatment of different types of dependency will be briefly outlined. The CCF-contributions, which usually are a subset of intercomponent dependencies, are covered separately in section 3.3.

The following specific conclusions have been drawn:

1) The issue of equipment-related Common Cause Initiators (CCIs) has been
addressed in all PSAs. However, only the Ringhals 1 and Forsmark 3 studies
contain a systematic and dedicated analysis of CCIs. Significant CCIs have
been identified in the Ringhals 1 and Oskarshamn 1 PSAs. The coverage of
the Ringhals 1 analysis is not totally clear.
2) Functional dependencies are handled in all studies by the small event
tree/large fault tree approach. Discrepancies which cannot be explained by
design differences exist and are caused by different perception of the
design, different assumptions, or errors in the analysis. A detailed survey
of these differences has been generated within the SUPERASAR project
(ref. 15). Some functional deficiencies have been identified by the Ringhals
1 and Barsebäck 1 PSAs.
3) Shared-equipment dependencies are covered in all studies by fault trees which in most cases are characterized by a high degree of detail (see table 2). Possible problems may originate from computerized Boolean reductions or/and from manual reductions of these large logical models.
4) None of the Swedish PSAs contains a documented, systematic and comprehensive search for physical interaction dependencies. However, a relatively thorough survey has been made within the Barsebäck 1 PSA. Documentation of a similar study within the Oskarshamn 1 PSA is not available. Influence of normal environment is covered by fault trees and residual CCFs. The backflush operation in the context of LOCA is considered to be

TABLE 2
Rough Survey of Degree of Detail in Systems Modelling of Different PSAs.1)

PSA           Makeup   Residual   Reactor    Electrical   Reactor        Other
              Water    Heat       Shutdown   Power        Protection     Auxiliary
                       Removal               Supply       (RPS)          Functions

Ringhals 1    H        H          H          L            H              M
Barsebäck 1   H        H          M          H            H              L
Forsmark 3    H        H          H          H            H              M
Oskarshamn 3  H        H          H          H            H              L
Oskarshamn 1  H        H          L          H            -2)            L

1) The degree of detail is denoted as high (H), medium (M) or low (L).
2) A note given in the study states that fault trees for RPS exist, but have not been included.
important in the Ringhals 1 and Barsebäck 1 PSAs on the one hand, and not
necessary in the Oskarshamn 1 study. Only the Forsmark 3 PSA contains an
analysis of dynamic effects which may follow upon a pipe break. Such
effects may be much more significant for some of the other plants,
particularly those which do not belong to the BWR 75 generation.
5) Human interaction dependencies are represented in the fault trees and
event trees, and also covered by residual CCFs. Generally, the Ringhals 1
PSA contains the most ambitious analysis of human interactions. None of
the studies addresses the problem of errors of commission. Systematic
misconfiguration of redundant components has been addressed in the
Ringhals 2 PSA for some motor-operated valves, in the Forsmark 3 PSA
sensitivity study, and qualitatively in the Ringhals 1 PSA.
6) Non-standard methods for identification and evaluation of dependencies
have been used in some of the studies. Examples comprise e.g. extended
signal analysis within the Forsmark 3 PSA and systematic walk-through
analyses within the Barsebäck 1 and Oskarshamn 1 PSAs. The limitations of
this type of approach are substantial.
7) Residual CCFs, which account for the dependencies not covered explicitly
by the event tree/fault tree model, have been used in all PSAs.

3.3. TREATMENT OF COMMON CAUSE FAILURES


3.3.1. Definition. There is a variety of definitions of Common Cause Failures
(CCFs) in the Swedish PSAs. As a rule none of the studies supplies a strict
definition, but rather a number of characteristic features attributed to CCFs
have been specified. Among them we may mention: multiplicity, common cause,
simultaneity and criticality. From the practical point of view the exact wording
used in definitions is of secondary interest. Several examples may actually be
found where terminology is mixed-up and the more general term "dependency"
is used instead of "common cause failure" or vice versa. In practice CCFs are in
all studies considered as a subset of dependencies. This means that CCFs, which
are usually referred to as residual common cause failures, account for the
dependencies which are not explicitly included in the analytical models (event
trees and fault trees).

3.3.2. CCF-representation. CCFs are represented in Swedish PSAs on three levels in the fault trees:

component tree level (B1, O3 and O1 PSAs)
system train tree level (F3 PSA)
function tree level (R1 and R2 PSAs).

Table 3 summarizes the advantages and disadvantages of different ways of incorporation of CCF-contributions into the fault trees.

3.3.3. Degree of Coverage. The intended degree of coverage is almost identical
in all PSAs. CCF-contributions have been quantified for active, redundant
components (e.g. motor-operated valves, pumps and diesel generators) in safety
systems. As a rule only intrasystem contributions have been considered.
Consequently, contributions from passive components, diversified equipment,

TABLE 3
Alternatives for Incorporation of CCF-contributions into the Fault Trees

CCF-representation in fault trees     Advantage                                Disadvantage

Component tree level                  Systematic procedure;                    Large complex fault trees for
(B1, O1 and O3 PSAs)                  completely automatic calculations        four-divisional systems

System train tree level               Relatively compact model even in the     Some hand-made calculations may be
(F3 PSA)                              case of four-divisional systems          necessary; complete symmetry necessary

Function tree level                   Compact model                            Correct distinction between different
(R1 and R2 PSAs)                                                               failure multiplicities not possible
intersystem CCFs, have been neglected. Some exceptions from this rule have
been found, but they are not characteristic for any of the studies.

3.3.4. Quantification of CCF-contributions. The choice of a suitable method for quantification of CCF-contributions has always been controversial. A number of parametric models have been developed and used in the context of PSA-studies. In an ideal world some requirements could be placed on these methods:

1) Simplicity
2) Clear definition of parameters
3) Correctness (within specified limitations)
4) Generality
5) Compatibility with existing data sources
6) Assurance of realism
7) Possibility to consider design- and system-specific factors
8) Possibility to distinguish between different failure multiplicities.

In practice some of the above requirements may be in conflict with each other and the final choice is a matter of compromise. Evaluation of the different approaches used has been made on the basis of earlier comparisons of quantitative methods (ref. 5, 6) and experiences from the Nordic CCF-data Benchmark Exercise (ref. 4).

The following quantitative methods have been used in the Swedish PSAs:

extended C-factor model (R1 PSA)
beta-factor model (B1 PSA)
C-factor model (R2 PSA)
MGL model (F3, O3 and O1 PSAs).

CCF-data in Swedish PSAs are based on:

U.S. experience (R1, R2 and O3 PSAs)
engineering judgement (B1 and O1 PSAs)
limited study of the Swedish operating experience (F3 PSA).

In principle all these approaches may be acceptable, given a proper treatment of data. The following deficiencies have been identified in this context:

1) Ringhals 1 and Oskarshamn 3 C- and beta-factors, respectively, represent plant-specific data for Ringhals 2 and Seabrook. Direct application of such data to other plants is not in line with the intentions behind the parametric models.
2) Oskarshamn 1 gamma-factors represent reduced Seabrook-data (see point 1 above) and are judged as optimistic.
3) Assignment of CCF-parameters in Barsebäck 1 and Oskarshamn 1 PSAs is based on engineering judgement. Consequently, completeness and precision can be questioned.
4) CCF-parameters of Forsmark 3 PSA are in some cases based on a rather limited material (from a statistical point of view).

Table 4 summarizes advantages and disadvantages of different approaches to
assignment of CCF-data, as applied in the Swedish PSAs.
Beta-factors from Forsmark 3 and Oskarshamn 3 PSAs and C-factors from Ringhals 1 and 2 PSAs are given in table 5. Beta-factors in Barsebäck 1 and in Oskarshamn 1 PSAs are not component-type specific. In these studies three, respectively two, different values of the beta-factor (0.01, 0.05, 0.10 and 0.05, 0.10) have been used. The choice of the beta-factor was in each particular case dependent on the significance factors assessed in the qualitative analysis.
CCFs for failure to run have been neglected in the case of diesel generators at Barsebäck 1, Oskarshamn 3 and Oskarshamn 1. These contributions can be significant; in Ringhals 1 and in Forsmark 3 they constitute 54% and 42%, respectively, of the corresponding CCFs for failure to start.
The higher order parameters, gamma and delta, have been assigned values of 0.4 and 0.6, respectively, in the Forsmark 3 PSA, and 0.3 and 0.9, respectively, in the Oskarshamn 1 and 3 PSAs. A factor of 0.5 has been used in the Ringhals 1 PSA when extending the C-factor model to 4 x 50%, 3 x 100% or 4 x 100% systems. No such extensions have been used in the Ringhals 2 and Barsebäck 1 PSAs.
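To make the role of these higher order parameters visible, the standard MGL relations for a group of four redundant components can be sketched as follows (a generic formulation given for orientation only; normalisation conventions differ somewhat between authors and testing schemes, and the symbols are introduced here purely for this sketch: Q_t denotes the total failure probability of one component and Q_k the probability of a specific common cause failure of exactly k components):

\[
Q_1 = (1-\beta)\,Q_t, \quad
Q_2 = \tfrac{1}{3}\,\beta\,(1-\gamma)\,Q_t, \quad
Q_3 = \tfrac{1}{3}\,\beta\,\gamma\,(1-\delta)\,Q_t, \quad
Q_4 = \beta\,\gamma\,\delta\,Q_t .
\]

With the Forsmark 3 values quoted in this section and in table 5 (beta = 0.07 for motor-operated valves, gamma = 0.4, delta = 0.6), the quadruple contribution becomes Q_4 = 0.07 x 0.4 x 0.6 x Q_t, i.e. roughly 0.017 Q_t, which shows how strongly the poorly supported higher order parameters drive the multiple failure estimates discussed below.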
The numerical differences are very significant. Comparison of CCF estimates for diesel generators shows that the quadruple CCF-contribution for diesel generators in the Oskarshamn 3 PSA is 30 times lower than in Ringhals 1 and 14 times lower than in Forsmark 3 (a twin plant). Such large discrepancies will naturally have a dramatic impact on accident sequences resulting from loss of offsite power. These discrepancies are not motivated by actual differences in design and operation of diesel generator systems.

3.3.5. Treatment of Systems with Non-standard Success Criteria. The previously mentioned methods for quantification of CCF-contributions may in their basic formulations only be applied to systems with at most four redundant components. In certain systems (e.g. groups of control rods, pressure relief valves, frequency converters, reactor scram modules), a larger number of identical components with the same functions may occur. This leads to non-standard success criteria for these systems and to requirements on a non-standard approach to quantification of the corresponding CCF-contributions.
Different methods have been used for estimation of such CCF-contributions. In the Ringhals 1 PSA the extended C-factor method has been applied. However, only one failure combination has been considered (this applies also to the Oskarshamn 1 PSA), which probably leads to underestimation of CCF-contributions. Both Barsebäck 1 and Oskarshamn 3 PSAs use coupling factors for quantification of CCF-contributions for reactor shutdown functions. Such an approach is in principle correct, but the factors are arbitrary and not supported by the operating experience. The Forsmark 3 PSA uses an extended MGL method. Having in mind the uncertainties associated with the parameters involved, this method should be seen as an extended sensitivity study. It should be noted that the frequencies of ATWS-sequences are in most cases proportional to these very uncertain CCF-contributions.

3.3.6. Impact of CCFs on Dominant Accident Sequences. Table 6 shows an
example of the impact of CCFs on the dominant accident sequences.
It is apparent from the tables that the intercomponent CCFs have a decisive
impact on the results of the studies. The principal CCF-contributors are in the
first place motor-operated valves and pumps (mainly centrifugal).

TABLE 4
Different Sources of CCF-Data in Swedish PSAs.

Source                        Advantages                          Disadvantages

U.S. experience               Relatively comprehensive            In principle not applicable to
(R1, R2, O3 PSAs) 1)          material                            Swedish conditions (differences
                                                                  in design, operation, maintenance)
                                                                  Extremely difficult to adjust
                                                                  data to Swedish conditions

Engineering judgement         Plant-specific data                 Completeness questionable
(B1 and O1 PSAs)              directly obtained                   Highly dependent on quality
                              Qualitative results useful          of analysis
                              for the utility                     Applicability limited to
                                                                  operating plants with moderate
                                                                  or poor separation
                                                                  Experience from other plants
                                                                  not used

Swedish/Finnish               Relevant material                   Experience not very
experience (F3 PSA)           Good knowledge of the               comprehensive
                              background
                              Valuable for specification
                              of defensive measures

1) All three studies use US experience, but while data for Ringhals 2 have been
obtained after proper design- and application-oriented screening, data for
Ringhals 1 are plant-specific Ringhals 2 data and data for Oskarshamn 3 are
reduced Seabrook-specific data.

TABLE 5
Beta-factors of Forsmark 3 and Oskarshamn 3 Studies and C-factors of Ringhals 1 and 2 Studies

Component              Failure mode            Beta-factor           C-factor
                                               F3        O3          Ringhals 1 & 2

Air-operated valve     Fails to operate        0.16      -           0.14
Scram valve            Fails to open           0.05      coupling    0.14 1)
                                                         factors
                                                         used
314-main valve         Fails to open           0.04      -           0.1 2)
Motor-operated valve   Fails to operate        0.07      0.022       0.042 2), 0.0096 3)
Motor-operated valve   Wrong configuration 4)  -         -           0.25
Check valve            Fails to open           0.01      -           -
Safety valve           Opens inadvertently     0.02      0.02        -
Fan                    Fails to start          -         -           0.039 2)
Centrifugal pump       Fails to start          0.02      0.085       0.039 2), 0.0097 3)
  -"-                  Fails to run            -         -           0.029
Piston pump            Fails to start          0.02      0.026       not applicable
Turbine-driven pump    Fails to start          not       not         0.039 1)
                                               applic.   applic.
  -"-                  Fails to run            -"-       -"-         0.029 1)
Diesel generator       Fails to start 5)       0.03      0.0065      0.054
  -"-                  Fails to run            0.03      -           0.008

1) Applicable only to Ringhals 1.
2) No repair.
3) With repair; applied only in the Ringhals 2 PSA in situations where repair can be made in a 6 to 9 hours time interval.
4) Applies only to MOVs which do not receive a confirmatory open or close signal upon system initiation.
5) The corresponding beta-factors for diesel generators in Oskarshamn 1 and Barsebäck 1 are 0.11 and 0.03, respectively.
TABLE 6
Impact of Intercomponent CCFs on Top Ten Dominant Accident Sequences of the Ringhals 1 PSA.

Accident    Sequence     CCFs repre-    CCFs repre-       Principal      Total CCF
sequence    frequency    sented in X    sented in cut     CCF-           importance
            (per year)   out of Y cut   set No. (among    contributor    (%)
                         sets (X/Y)     top ten)

S1W1        6.5E-7       8/21           3,4,5,6,7,8,9     322-MOVs       49
                                                          711-pumps
                                                          322-pumps
S1V1        4.3E-7       5/41           1,2,3,5           323-pumps      90
                                                          323-MOVs
R           2.7E-7       -              -                 -              -
AW1         2.1E-7       8/21           3,4,5,7,8,9       322-MOVs       37
                                                          711-pumps
Sy          1.9E-7       -              -                 -              -
S1Y         1.8E-7       -              -                 -              -
AV1         1.0E-7       5/41           1,2,3,5           323-pumps      90
                                                          323-MOVs
AY          9.0E-8       -              -                 -              -
T UV1Q'     7.1E-8       36/49          1,2,3,4,5,7,8     323-pumps      55
                                                          323-MOVs
T'pC2H3     6.0E-8       2/14           1,2               211-valves     100
                                                          516-relays
Thus, future efforts should be concentrated on supplying better estimates of CCF-
contributions for these components. In the case of MOVs a comprehensive
background material has been collected and evaluated; this material could be
used for estimation of plant-specific CCFs for Swedish BWRs. As indicated in
the Forsmark 3 PSA, the existing statistical evidence for pumps is significantly
weaker than that for MOVs. Other important components in the context of
CCF-contributions are diesel generators, gas turbines, scram valves, RPS logic
channels and pressure relief valves.

3.4. CONCLUSIONS AND RECOMMENDATIONS


3.4.1. Completeness Issue. The central questions to be addressed with respect to
completeness of the Swedish PSAs are:

- What has been missed?
- Are there any scenarios not explicitly included?
- Have the studies been successful in identification of dependencies?

It is obvious from the comparison which has been carried out that differences
exist between the studies with respect to the degree of coverage. Since
the PSAs reflect the state of knowledge (some of them better than others), it is
not possible to decide whether any one of them is complete in the absolute sense. The
answer would probably be negative, since new findings are still
being made and the methodology for treatment of some problem areas is not
yet well established. Apart from that, all the studies are limited in scope and
some potentially significant dependencies have been excluded from the
analyses. In order to assure reasonable completeness the review process
(internal and external) is extremely important. It should be remembered that
some of the studies compared in this report have not yet been subject to
external review.
The comparison of the degree of coverage of the studies nevertheless provides,
in a relative sense, a picture of the completeness of the PSAs. Specific problems
have been identified which are accounted for in some of the studies but not in
the others. These differences may be pointed out and should be the subject of
future studies. At the same time the PSAs exhibit many similarities in the
approaches to dependency analysis. This is not surprising since by and large the
same main frame for the analysis (small event tree/large fault tree approach)
has been used. In spite of the similarities, the parallel studies of accident
sequence and systems analysis modelling have disclosed specific discrepancies
with regard to functional and shared-equipment dependencies, which do not
necessarily originate from actual design differences, but are due to different
perceptions of the design, different assumptions or mistakes.
The main problem from the point of view of the comparative analyses is the
varying standard of the documentation of the studies. This means that some of
the PSAs may be reasonably complete, but their credibility would be higher and
the review process would be facilitated given improved documentation.
The overall picture is nevertheless positive. The identified dependencies consti-
tute major findings of several studies and would hardly have been detected
without performing the PSAs. This emphasises the fact that qualitative analysis
of dependencies is one of the strongest advantages of PSA methodology and not a
weakness. It is important to stress that point since, due to a rather common
misconception, dependency analysis is sometimes viewed as a weakness in the
current state of PSA. On the other hand, quantification of common cause
failure contributions is a definite limitation.

3.4.2. Recommendations for Future Work. Based on the conclusions of the
comparative study some recommendations have been made, concerning:

1) the next phase of the project
2) future research projects within the field of dependency analysis
3) possible improvements of existing analyses.

The sensitivity studies are proposed to address the following problems:

1) Case studies of CCF models and data.
Diesel generators, motor-operated valves, and pumps are components of
primary interest. For motor-operated valves a good basis established within
the Nordic Benchmark Exercise exists. This material could be used for
generation of plant-specific CCF-parameters for all plants and subsequent
requantification of dominating accident sequences. Another subject of
interest is the influence of not distinguishing between all failure multipli-
cities.
2) Systematic misconfiguration of redundant components.
3) CCFs in systems with non-standard success criteria.

The proposals which may be made on the basis of the qualitative analysis are
preliminary. A more definite picture will probably emerge after performing the
sensitivity studies. Such studies have been initiated.

The research projects which could improve the current state of the art in the
Nordic countries are (the order reflects the priority given to the different projects):

1) Collection of CCF-data based on the Swedish/Finnish operating experience
and using experience gained in the Nordic (ref. ) and Euratom (ref. 16)
Benchmark Exercises. In the longer term, combination with foreign
experience should also be considered. The problem of combining different data
bases is presently being studied within the RAS-470 project.
2) Search for Common Cause Initiators using the Swedish and Finnish
operating experience (transients) and the basic methodology given in ref. 17.
The root causes behind the incidents are very important since they may
provide valuable information of a generic character. Possibly, potential
interactions involving diversified systems could be identified.

In view of the findings of the present work, some supplementary PSA analyses
should be performed. They are specified below in order of priority,
which in this case is quite subjective since the phenomena are plant-specific.

1) CCI-analyses (Ringhals 2, Barsebäck 1, Oskarshamn 1). Power supply
systems are suggested as the primary (but not only) subject of the studies.
2) Physical interaction dependencies, in particular dynamic effects after
LOCA (Ringhals 1, Ringhals 2, Barsebäck 1, Oskarshamn 1).
3) Human interaction dependencies (all plants with the possible exception of
Ringhals 1). The analyses should take more advantage of the knowledge and
experience of operating personnel by performing systematic talk-through/
walk-through studies.

ACKNOWLEDGEMENT
This work has been carried out within the Nordic project NKA/RAS-470. The
support by the Swedish Nuclear Power Inspectorate is gratefully acknowledged.

References
1. Hirschberg, S. (ed.), 'Workshop on Dependent Failure Analysis', Västerås,
   Sweden, April 27-28, 1983. Swedish Nuclear Power Inspectorate and AB
   ASEA-ATOM.
2. Hirschberg, S., 'Summary Report: Conclusions and Recommendations for
   Future Work'. Workshop on Dependent Failure Analysis, Västerås, Sweden,
   April 27-28, 1983. AB ASEA-ATOM Report RPA 83-212, July 1983.
3. Dinsmore, S. (ed.), PRA Uses and Techniques: A Nordic Perspective. Nordic
   Liaison Committee for Atomic Energy, June 1985.
4. Hirschberg, S. (ed.), 'NKA-project Risk Analysis (RAS-470). Summary
   Report on Common Cause Failure Data Benchmark Exercise'. Final
   Report, RAS-470 (86) 14, June 1987.
5. Pulkkinen, U. (ed.), 'Proceedings of the CCF Workshop', Lepolampi, Espoo,
   Finland, May 10-11, 1984. Report RAS-470 (87) 14 (VTT Work Report SAH
   38/37), December 1987.
6. Hirschberg, S., 'Comparison of Methods for Quantitative Analysis of
   Common Cause Failures - A Case Study'. International ANS/ENS Topical
   Meeting on Probabilistic Safety Methods and Applications, San Francisco,
   California, U.S.A., February 24 - March 1, 1985.
7. Hirschberg, S., 'Retrospective Analysis of Dependencies in the Swedish
   Probabilistic Safety Studies. Phase I: Qualitative Overview'. Report RAS-
   470 (87) 4 (AB ASEA-ATOM Report RPC 87-36), July 1987.
8. Bengtz, M., Hirschberg, S., 'Retrospective Analysis of Human Interactions
   in the Swedish Probabilistic Safety Studies. Phase I: Qualitative Overview'.
   Report RAS-470 (87) 5 (AB ASEA-ATOM Report RPC 87-54), July 1987.
9. Fleming, K.N., Kalinowski, A.M., 'An Extension of the Beta Factor Method
   to Systems with High Levels of Redundancy'. Report PLG-0289, August
   1983.
10. Atwood, C.L., 'Data Analysis Using the Binomial Failure Rate Common
    Cause Model'. NUREG/CR-3437, September 1983.
11. Apostolakis, G., Moieni, P., 'On the Correlation of Failure Rates'. Fifth
    European Reliability Data Bank Association (EuReDatA) Conference on
    Reliability Data Collection and Use in Risk and Availability Assessment,
    Heidelberg, Federal Republic of Germany, 9-11 April 1986.
12. Mankamo, T., Pulkkinen, U., 'ADDEP - Additive Dependence Model'.
    Technical Research Centre of Finland, Research Report, February 1985.
13. Bento, J-P., et al., Reliability Data Book for Components in Swedish
    Nuclear Power Plants. Nuclear Safety Board of the Swedish Utilities and
    Swedish Nuclear Power Inspectorate, May 1985.
14. PRA Procedures Guide. NUREG/CR-2300, U.S. Nuclear Regulatory
    Commission, January 1983.
15. Carlsson, L., Hirschberg, S., Johansson, G., 'Qualitative Review of Probabi-
    listic Safety Assessment Characteristics'. International SNS/ENS/ANS
    Topical Meeting on Probabilistic Safety Assessment and Risk Management,
    Zürich, Switzerland, August 30 - September 4, 1987.
16. Poucet, A., Amendola, A., Cacciabue, P.C., 'Common Cause Failure
    Reliability Benchmark Exercise'. Final Report, CEC JRC Ispra, EUR 11054
    EN.
17. Laakso, K., 'A Systematic Feedback of Plant Disturbance Experience in
    Nuclear Power Plants'. Ph.D. Thesis (in Swedish), Helsinki University of
    Technology, December 1984.

CLASSIFICATION OF MULTIPLE RELATED FAILURES

A. Amendola
Commission of the European Communities
Joint Research Centre - Institute for Systems Engineering
21020 Ispra (Va) - Italy

ABSTRACT. Classification schemes for multiple related failures are


reviewed and terminology problems are discussed. It is shown how
different classifications have been established to match different
objectives, such as to build in defences against multiple failure
events, to analyze data, to model dependency structures in reliability
assessments, etc. Therefore, even in the absence of a unified
terminology, no ambiguity should affect a reliability assessment as
far as the treatment of dependency structures is concerned.

1. INTRODUCTION

To protect plants against uncontrolled accident developments, safety-
significant systems are required to achieve high availability and
reliability standards. Therefore, special redundancies and
multiplicities of critical items are usually adopted so that the
malfunction of a few components or the unavailability of single
systems is not sufficient to degrade the overall safety function.
However, possible unidentified dependency structures, random
occurrences of abnormal environmental conditions, inadequacies in
design or manufacturing, inappropriate system operation, etc. may
cause multiple items to be lost in a critical time period, thus
nullifying the effectiveness of the design redundancies.
All the events provoking the loss of multiple items or defences,
which cannot be explained by a simple coincidence of random
independent failure events, have been commonly designated as common
mode or common cause failure (CMF/CCF) events.
The quite different nature of the possible dependency structures
implied under the same term has doubtlessly resulted for a long time in
ambiguities both in collecting the relevant data and in the
reliability modelling: indeed the proposed classification schemes did
not respond directly to the needs of the reliability analysts.
Therefore, more recently a large effort has been devoted to
taxonomies allowing data to be more consistently collected and used.

Collective learning processes have been implemented through
"Benchmark Exercises", which have resulted in a wide consensus within the
technical community on the way of treating dependency structures in
reliability assessments.
By historically reviewing the proposed classification schemes and
discussing the recent taxonomies, the paper shows that, even in the
absence of a unified terminology, ambiguities should no longer exist in
dealing with multiple related failures in probabilistic safety
assessments (PSA).

2. SOME TERMINOLOGY PROBLEMS

In our opinion the more general expression "Multiple Related Failures"
(MRFs) (1) is better suited to label multiple failures provoked by
some existing dependency structure than the usually adopted CCF
term. "CCF" is only one of many terms introduced to describe MRFs and
has predominated in practical use over other designations like
common mode, cross-linked, systematic, coupled, simultaneous,
coincident, dependent failures, or even "common disaster" - terms
which are not in every case synonymous with the term "common cause
failure". For example, the CMF term has been largely used even if it
is strictly relevant only to multiple failures presenting the same
"failure mode" in identical redundant components/systems. Whilst the
limiting effect of CMFs in systems with multiple redundant elements is
most obvious, MRFs may affect systems performing different functions,
or supplying further redundancies but based on diverse functional
principles and, therefore, not presenting the same failure mode
(unless the overall unavailability of a system, intended as "non-
performance according to the design intentions", is assumed as the
common failure mode of diverse systems).
However, early classifications used the term "CMF", even if they
were in fact based on "causes"*. As an example Table I shows a list of
CMF categories proposed in 1974 (2) which is clearly cause-oriented.
This and other kinds of classifications were discussed in the
framework of the OECD-CSNI working group on rare events, the results
of which have been described in the well known UKAEA-SRD report of
1979 (3). Table II is drawn from this CSNI classification: the term
CMF is still adopted. However, in a later publication (4) Watson
refers to CCFs: indeed he notes that whilst most authors were trying to
define CMFs they had, in fact, referred to causes of failure.
Essentially CMFs form a subset of CCFs. Watson also defines a CCF as
"the inability of multiple, first in-line items to perform as required
in a defined critical time period due to a single underlying defect or
physical phenomena such that the end effect is judged to be the loss
of one or more systems".

* Failure cause may be defined as: the circumstances during design,
manufacture and use which have led to failure. Failure mode may be
defined as: the effect by which a failure is observed.

TABLE II - The CSNI classification scheme (3)

CMF causes
  Engineering (E)
    Design (ED)
      Functional deficiencies (EDF): hazard undetectable, channel dependency,
        common operation and protection, inadequate instrumentation,
        inadequate control, operational deficiencies
      Realisation faults (EDR): inadequate quality control, inadequate standards,
        inadequate inspection, inadequate testing and commissioning,
        inadequate components, design errors, design limitations
    Construction (EC)
      Manufacture (ECM): inadequate quality control, inadequate standards,
        inadequate inspection, inadequate testing
      Installation and commissioning (ECI): inadequate quality control,
        inadequate standards, inadequate inspection, inadequate testing
        and commissioning
  Operations (O)
    Procedural (OP)
      Maintenance and test (OPM): imperfect repair, imperfect testing,
        imperfect calibration, imperfect test procedures
      Operation: operator errors, inadequate procedures, inadequate
        supervision, communication error
    Environmental (OE)
      Normal extremes (OEN): temperature, pressure, humidity, vibration,
        acceleration, stress, corrosion, contamination, interference,
        radiation, chemical sources, static charge
      Energetic events (OEE): fire, flood, weather, earthquake, explosion,
        missiles, electrical power
TABLE I - List of CMF categories (2)

1. External Normal Environment


- dust/dirt
- temperature
- humidity/moisture
- vibration

2. Design Deficiency
- unrecognized interdependence between "independent"
subsystems, components
- unrecognized electrical or mechanical dependence on
common elements
- dependence on equipment or parameters whose failure or
abnormality causes need for protection

3. Operation and Maintenance Errors


- miscalibration
- inadequate or improper testing
- outdated instructions or prints
- carelessness in maintenance
- other human factors

4. External Phenomena
- tornado
- fire
- flood
- earthquake

5. Functional Deficiency
- misunderstanding of process variable behaviour
- inadequacy of designed protective action
- inappropriate instrumentation

In reality, major accident occurrences like TMI have shown that
the multiple defences built in to protect a NPP from core meltdown
events can be lost as a result of very complex dependency structures
involving a combination of causative factors related to design,
procedural, operational and management aspects as well as to hardware
failures. Consequently, the final undesired event may be provoked by a
sequence of related events (affected either by physical dependencies
or by stochastic ones), among which it is difficult to find a single
underlying defect.
Such correlations or dependencies among failures destroy the
assumption of "independence" made in the system analysis before
their dramatic appearance.
A more general definition of dependent events should also account
for their significance for plant availability and not only for safety
and reliability. This is the reason why in Ref. (1) the more general

term MRFs was proposed, where multiplicity should apply also to
multiple in-time failures of the same component when its failure rate
does not correspond to the expected behaviour because of any
possible factor such as wrong maintenance, environment, etc.
Instead of attempting to give an exhaustive definition of MRFs°,
we consider it more appropriate to distinguish among the several
kinds of possible dependency structures, as these may require
significantly different modelling approaches.
This is also the conclusion of the very comprehensive project on
CCF classification sponsored by EPRI and tested through benchmark
exercises involving both USA and European expert teams: "the label
CCF is an inadequate descriptor of events that can be more precisely
designated by an adequate classification system. It is, therefore,
recommended that the technical community discontinue the use of this
simplistic term" (5).

3. CLASSIFICATION SCHEMES

The principal classification schemes proposed can be distinguished
according to their purposes: in fact, their basic differences can be
well explained by the different objectives they match. By keeping
this principle in mind, different schemes may continue to be used
without resulting in analysis ambiguities.

3.1 Classification vs Defences

The CSNI classification in Table II was clearly directed to the


identification of the causes of MRFs for implementing defences against
their occurrences. Indeed "the significant feature of the system is
that common-mode failures are classified by cause of failure, because
it is considered that if recommendations are to be made for a policy
of prevention of common-mode failures, it is essential that all causes
can be prominently identified" (3). Even in the further elaboration of
this classification by NCSR (6), the link between causes and defences
is clearly identified.
Similar schemes are generally used by NPP designers and
operators as a kind of checklist to control built-in defences against
MRF occurrences.
Whereas similar classifications are very useful for
identification purposes, they are much less useful for PSA modelling,
since the different causes may require different and, in some cases,
separate treatment.

° An even more general expression might be Multiple Related
Unavailabilities, since a component may become unavailable even if it
has not failed (for instance because of wrong inputs or loss of power
supply, etc.).

3.2 Classification vs Effects

For modelling (or better for identifying) the effects of undesired


dependency factors on system behaviour, other schemes have been
proposed. Table III gives some examples (7): here the generic causes

TABLE III - Generic CCF causes (7)

Symbol   Generic cause                  Example sources

Generic causes of a mechanical or thermal nature

I        Impact                         Pipe whip, water hammer, missiles,
                                        earthquake, structural failure
V        Vibration                      Machinery in motion, earthquake
P        Pressure                       Explosion, out-of-tolerance system
                                        changes (pump overspeed, flow blockage)
G        Grit                           Airborne dust, metal fragments generated
                                        by moving parts with inadequate
                                        tolerances
S        Stress                         Thermal stress at welds of dissimilar
                                        metals, thermal stresses and bending
                                        moments caused by high conductivity and
                                        density of liquid sodium
T        Temperature                    Fire, lightning, welding equipment,
                                        cooling system faults, electrical short
                                        circuits

Generic causes of an electrical or radiation nature

E        Electromagnetic interference   Welding equipment, rotating electrical
         (EMI)                          machinery, lightning, power supplies,
                                        transmission lines
R        Radiation damage               Neutron sources, sources of ionizing
                                        radiation
M        Conducting medium              Moisture, conductive gases
V        Out-of-tolerance voltage       Power surge
I        Out-of-tolerance current       Short circuit

which may provoke multiple failures are the driving elements of the
classification. Indeed "an extraordinary number of secondary failure
causes (generic causes) can be found that would result in component
failures. Also, several different events (sources) may result in the
same cause of secondary failure. Therefore, the analysis is initially
directed towards the generic cause of component failure rather than
the specific event that results in the component failure" (7). For
example, the effects of the generic cause "vibration" might be the

same independently of whether the source for vibration is internal
(like machinery in motion) or external (earthquake). The list of the
generic causes distinguished according to their nature is complemented
by a list of further possible common links including factors like
proximity, procedures, etc. (Table IV).

TABLE IV - Other links provoking dependencies

Symbol   Common link               Example situations that can result in
                                   system failure when all basic events in
                                   a minimal cut set share the special
                                   condition

E        Energy source             Common drive shaft, same power supply
C        Calibration               Misprinted calibration instructions
F        Manufacturer              Repeated fabrication error, such as
                                   neglect to properly coat relay contacts
I        Installation contractor   Same subcontractor or crew
M        Maintenance               Incorrect procedure, inadequately
                                   trained person
O        Operator or operation     Operator disabled or overstressed,
                                   faulty operating procedures
         Proximity                 Location of all components of a cut set
                                   in one cabinet (common location exposes
                                   all of the components to many
                                   unspecified common causes)
         Test procedure            Faulty test procedures which may affect
                                   all components normally tested together
         Energy flow paths         Location in same hydraulic loop,
                                   location in same electrical circuit
S        Similar parts             Important in the case of minimal cut
                                   sets which contain only pumps, only
                                   valves, etc.

Even if oriented towards modelling, such a classification is more
useful for a qualitative system analysis to identify components
possibly failing because of a generic cause than for a
complete PSA study. Indeed, internal and external sources can
hardly be dealt with by a common methodological approach.

3.3 C lassifications vs Field Data Analysis

Field data on MRFs can be derived from both incident files and
component event files. In order to sort and analyze the relevant data,
appropriate classifications are needed which are not necessarily
coincident.
In fact, an incident description needs "a method for logically
dissecting multiple component unavailability scenarios into individual

component unavailabilities" (8), in order to identify related
failures. On the contrary, by starting from component failure event
data, "the principal objective in the analysis of component failure
data is not to dissect multiple unavailabilities in a same event
description but to detect relationships between unavailabilities
recorded in separate component event descriptions" (9).
Samples from incident data (LERs) were used by EPRI in order to
test the proposed classification (5) and to finalize it by
incorporating the comments raised and removing the ambiguities
discovered during the Data Benchmark Test (8), whereas component data
from the CEDB (10) started to be analyzed at JRC by sorting data
according to an MRF classification (1,9) established for the specific
purpose. Both classifications are complementary and can be
usefully linked in data analysis; this is extensively described in
another paper of the present book (11).
The principles of the EPRI/LATA classification are briefly
described in the following, whereas the reader may refer to Ref. (11)
for details on the MRF classification used for sorting relevant
component event data.
The EPRI/LATA classification (5) as well as the Data Benchmark
Test promoted by EPRI have been very useful in diffusing some
important concepts within the technical community, which have
contributed in removing ambiguities and inconsistencies from PSA
praxis. Furthermore, it allowed data relevant for MRF analysis to be
consistently sorted out from LER data which are now currently screened
for parameter estimation as several papers in this book will show.
The objectives of the classification are:
- to describe a method for understanding multiple component
malfunction event scenarios;
- to suggest data management methods for information storage and
retrieval, to support model development and defensive strategy
analysis.
The classification, as well as the identification of failure sequence
events, is helped by graphical symbols and cause-effect logic
diagrams.
The set of symbols is given in Table V, whereas a cause-effect
logic diagram is illustrated by the following example:

TABLE V - Standard logic symbols for cause-effect logic diagrams

(XX)   Cause Code (root cause)*
       Cause Node
       Failed Component
       Functionally Unavailable Component
       Potentially Failed Component

* The XX is used to represent the two-letter cause code.

[Cause-effect logic diagram: root cause (XX) -> component 1 (functionally unavailable) -> component 2 (failed)]

In the above scenario the XX might represent a human error in
switching off a lubrication oil pump, component 1. Although the pump
was not physically damaged, it could no longer provide its intended
function and the resultant effect was that the turbine, component 2,
physically failed due to lack of lubrication.
A list of cause codes is given in Table VI. This does not
significantly differ from other similar schemes.
The proposed representation allows the analyst to distinguish in
an incident sequence between:
- failure events (because of root causes);
- failure events provoked by failures or unavailability of other
components (cascade failures);
- unavailability events (functional unavailabilities), both
component-caused and root-cause induced;
- multiple failure/unavailability events;
- potentially failed components (incipient failures or exposure to
causes that are known to have caused actual failures of similar
components under similar circumstances).

The universe of cause-effect units in a diagram is given in Figure 1,
whereas the features distinguishing the units are summarized in Table
VII. The graphical representation is clearly only a way of visualizing
the chain of cause-effects in an incident occurrence: the
consistency of the logical dissection, characterized by the principal
pairs (12)
- cause: root-caused vs component-caused
- state: failure vs functional unavailability
- number: single vs multiple,
has been the basis for the successful establishment of a data base of
incident occurrences useful for MRF investigations.

TABLE VII - Labels for the units in Figure 1

RF Root-caused failure
RU Root-caused (functional) unavailability
CF Component-caused failure
CU Component-caused (functional) unavailability
SRF Shared root-caused failures
SRU Shared root-caused (functional) unavailabilities
SCF Shared component-caused failures
SCU Shared component-caused (functional) unavailabilities
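
To illustrate how the dissection principles above could be carried into a component-event data base, the short sketch below defines a minimal record structure for cause-effect units and a helper that flags candidate shared root-caused groups (the SRF/SRU units of Table VII). The field names and the example are hypothetical and do not reproduce the EPRI/LATA or CEDB implementations.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class UnavailabilityEvent:
        """One component unavailability within a dissected incident."""
        component_id: str
        state: str                          # "failure" or "functional_unavailability"
        cause_code: Optional[str] = None    # two-letter root-cause code, e.g. "HP"
        caused_by: Optional[str] = None     # id of the causing component, if cascade
        potential: bool = False             # potentially failed component

    @dataclass
    class Incident:
        """A multiple-unavailability scenario dissected into individual events."""
        incident_id: str
        events: List[UnavailabilityEvent] = field(default_factory=list)

        def shared_root_cause_groups(self):
            """Group root-caused events by cause code: candidates for the shared
            root-caused failures/unavailabilities (SRF/SRU) of Table VII."""
            groups = {}
            for ev in self.events:
                if ev.cause_code and not ev.caused_by:
                    groups.setdefault(ev.cause_code, []).append(ev.component_id)
            return {code: comps for code, comps in groups.items() if len(comps) > 1}

    # Hypothetical example: a procedural error (HP) leaves two pump trains
    # unavailable, and the second pump's failure cascades to a turbine.
    incident = Incident("example-1", [
        UnavailabilityEvent("pump-A", "functional_unavailability", cause_code="HP"),
        UnavailabilityEvent("pump-B", "failure", cause_code="HP"),
        UnavailabilityEvent("turbine-1", "failure", caused_by="pump-B"),
    ])
    print(incident.shared_root_cause_groups())   # {'HP': ['pump-A', 'pump-B']}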

Figure 1 - The universe of cause-effect component unavailability units.
TABLE VI - Cause codes.

D)   Design/Manufacturing/Construction
     (DR) Plant Definition/Requirement Inadequacy
          Design Error or Inadequacy
          Manufacturing Error or Inadequacy
     (DC) Construction Error or Inadequacy
          Other (explain)

P)   Procedural Inadequacy (Ambiguous, Incomplete, Erroneous)
     (PO) Defective Operational Procedure
          Defective Maintenance Procedure
          Defective Calibration/Test Procedure
     (PX) Other (explain)

     Human Actions, Plant Staff
     (HP) Failure to Follow Procedures
     (HM) Misdiagnosis (Followed Wrong Procedure)
     (HA) Accidental Action
     (HX) Other (explain)

     Maintenance
     (MS) Scheduled Preventive Maintenance
          (including Surveillance Tests and Calibration)
     (MF) Forced Maintenance (Repair of a Known Failure)

     Environmental Stress
     (EE) Electromagnetic Interference
     (EM) Moisture (spray, flood, etc.)
          Fire
          Temperature (high or low)
     (ER) Radioactive Radiation (irradiation)
          Chemical Reactions
          Vibration Loads
     (EI) Impact Loads
          Human-Caused External Event
     (EN) Acts of Nature

     Internal (Internal to Component, Piece-Part)

     Unknown
4. CLASSES OF DEPENDENCY STRUCTURES VS PSA MODELLING

As far as modelling is concerned, dependency structures may be
classified according to the most appropriate treatment they require.

4.1 Explicit Identification of Functional Dependencies in the Logic


Diagrams

One of the principal objectives of a system analysis should be the
identification of functional dependencies, both when they result in
unavailabilities of multiple components within the same system or among
systems (functional unavailabilities) and when they result in
further component failures, within or among systems (cascade
failures). Such dependencies must be explicitly modelled in the
relevant fault trees/event trees. Therefore, they do not need to be the
subject of any other special statistical modelling.

4.2 Constraints Due to Procedures

Particular kinds of dependencies can be introduced by administrative
procedures (e.g. technical specifications) concerning the operability of
plants and/or systems. Operations might be conditional on the state
of certain systems or on the occurrence of given events. Depending on
the particular case, fault trees with boundary conditions, Markovian or
Monte Carlo methods can analyze such problems in an appropriate
manner.
Other constraints or dependencies can be introduced by the
repair/maintenance policy assumed. For instance, in repairable systems
the performance processes of the different components can be
considered as independent only if an adequate number of repair
operators is available. These kinds of problems should be considered
by adopting the most suitable reliability tool for the case under
examination (i.e. fault trees, Markov, Monte Carlo).
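
As a small illustration of the repair-resource dependency mentioned above, the sketch below solves a three-state Markov model for two identical repairable components, once with two repair crews and once with a single shared crew. The failure and repair rates are purely illustrative assumptions and the state coding is our own.

    import numpy as np

    def steady_state(Q):
        """Steady-state distribution pi of a CTMC with generator Q (pi @ Q = 0)."""
        n = Q.shape[0]
        A = np.vstack([Q.T, np.ones(n)])     # append the normalization condition
        b = np.zeros(n + 1)
        b[-1] = 1.0
        pi, *_ = np.linalg.lstsq(A, b, rcond=None)
        return pi

    def generator(lam, mu, crews):
        """States: 0 = both components up, 1 = one down, 2 = both down."""
        repair_from_2 = min(2, crews) * mu   # parallel repairs limited by crew number
        return np.array([[-2.0 * lam,     2.0 * lam,            0.0],
                         [        mu,   -(mu + lam),            lam],
                         [       0.0, repair_from_2, -repair_from_2]])

    lam, mu = 1.0e-3, 1.0e-1                 # assumed failure and repair rates (per hour)
    for crews in (2, 1):
        pi = steady_state(generator(lam, mu, crews))
        print(f"{crews} repair crew(s): P(both components down) = {pi[2]:.2e}")

With these illustrative numbers the single shared crew roughly doubles the probability of finding both components down at the same time, which is precisely the kind of dependency that a simple product of unavailabilities would miss.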

4.3 Statistical Correlations Among Failure Rates

A particular class of correlations is that which should be assumed to


exist between failure rates of nominally identical components. This
correlation must be correctly taken into account when evaluating the
distribution of the reliability figures of interest (13).
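
A short numerical sketch, under an assumed lognormal state-of-knowledge distribution, shows why this correlation matters: for two nominally identical components the mean of the product of the failure rates, and hence the mean frequency of a cut set containing both, is larger when the rates are treated as fully correlated than when they are sampled independently.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    mu, sigma = np.log(1.0e-3), 0.8          # assumed lognormal failure-rate uncertainty

    # Independent sampling of the two rates
    lam1 = rng.lognormal(mu, sigma, n)
    lam2 = rng.lognormal(mu, sigma, n)
    mean_independent = np.mean(lam1 * lam2)

    # Fully correlated sampling: the same state-of-knowledge value applies to both
    lam = rng.lognormal(mu, sigma, n)
    mean_correlated = np.mean(lam * lam)

    print(f"E[lam1*lam2], independent sampling: {mean_independent:.2e}")
    print(f"E[lam^2],     correlated sampling : {mean_correlated:.2e}")
    print(f"ratio (correlated/independent)    : {mean_correlated / mean_independent:.2f}")

For the lognormal case the ratio equals exp(sigma^2), about 1.9 with the assumed spread, so neglecting the correlation systematically underestimates the mean contribution of cut sets containing both components.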

4.4 External Events

Abnormal environmental conditions may provoke the simultaneous loss of
functional and protective systems. Major fires, floods, hurricanes,
earthquakes, aircraft crashes, etc. might indeed be common causes of
multiple failures or unavailabilities. These events should be
subjected to a separate probabilistic analysis as described in
Ref. (14).

4.5 Dependency Structures Linked with Human Factors

As Table II shows, many causes of dependencies can be attributed to
human errors. These may occur at any stage of system design and
operation and may present very different patterns.
The TMI accident is very instructive about some typical effects
of human malfunctions and is therefore worth analyzing briefly
from this perspective. The incident was initiated by an anticipated
harmless transient (event B), which provoked the shut-down of the
plant and demanded the start-up of the auxiliary feedwater system. The
AFWS had been designed with a degree of redundancy large enough to ensure
a high availability figure. However, because of a human failure
(event A) that had occurred at an earlier time, all the isolation valves
of the redundant trains had been left closed after completion of the
required maintenance interventions: a typical CCF event. Consequently,
the loss of heat removal provoked a pressure increase in the primary
cooling system; a relief valve on the pressurizer opened to mitigate
the pressure transient but stuck open (component failure - event C)
when the pressure decreased below its critical level. Everything that
happened between events B and C was a physical functional transient
conditioned by the boundary condition A, with a pattern familiar to
the operator. However, because of event C a loss of coolant incident
(small LOCA) through the pressurizer relief valve was initiated: the
operator failed (event F) to recognize this event for a time long
enough to provoke a partial core melt-down despite later
correct interventions. All actions before the correct diagnosis had in
common a wrong representation of the event being faced.
Now the event F was not simply a wrong response of the operator
to the actions demanded of him, which could be explained by an
"independent" human error. F was in some way "functionally linked"
with two other events which had previously occurred in the design of the
hardware and software of the man-machine interface. Namely, the
operator did not recognize that the pressurizer relief valve was stuck
open because the corresponding signal in the control room indicated
that the valve was closed. However, by a design error (event D), the
sensor for the control room indicator monitored only the presence of
the command to close and not the actual valve state. Furthermore, the
dynamics of the process had been too poorly analyzed (event E) with
respect to small LOCA occurrences and, therefore, the training and
the operational procedures were not adequate under these
circumstances. The event F was clearly dependent on D and E. Even if
some other earlier events might suggest that the overall plant
management had introduced further, more general dependency structures
among the events, when the analysis is limited to the description
presented above, the events B and C have to be considered as
independent random failures, and A has to be considered independent
of F, D and E. However, the event A in itself and the events F, D and
E are representative of some typical CCF categories included in
Table II and are worth discussing with respect to the most suitable
approaches to be adopted for them in a reliability assessment.
Possible multiple failures caused by human interventions in test
and maintenance operations (events like A) can be identified through
an in-depth task analysis of the corresponding procedures and of the
man-machine interface. It might also be possible to estimate some
probability figure for multiple failures by using the methods proposed
for human reliability assessment; these, however, still give rather
uncertain results. Use of field data should therefore be recommended.
In practice, the inclusion of these events in the generic dependency
classes to be dealt with via parametric models might also be useful.
The existence of dependency structures between failure of
diagnosis (F) and the hardware and software of the man-machine interface
and, therefore, the correction of possible design (D) or procedure (E) faults,
can be investigated by the tools described in Ref. (15),
characteristic of the assessment of human reliability. The major
difficulties in such an assessment are the dynamic aspects both of the
process and of the operator interventions; these can be better
investigated experimentally via replica simulators or can be modelled
via proper dynamic tools (16) which, however, are still in a
development phase. On the other hand, a recent investigation of the
status of human reliability assessment has shown large uncertainties
in quantitative human reliability estimation, whereas good consistency
does exist in the identification of possible faults in procedures and
man-machine interfaces (17,18).

4.6 Residual Class of Potential Dependencies

After having analyzed the system in order to identify possible
dependencies for which explicit treatment has to be made, further
attention must be devoted to other factors which might additionally
provoke multiple related failure events. Indeed the operational
experience of redundant systems indicates that, despite the defences
normally built in against their occurrence, a series of events have
the potential to link the performance processes of items assumed to be
"independent". Whereas from a mathematical point of view it would be
possible to achieve an unlimitedly high reliability by increasing the
number of redundancies, in practice these common factors will always
put limits on what is really achievable. This class is constituted of
events like:
- human errors during tests and maintenance operations enhancing
stress on components or leaving the systems unavailable;
- fabrication or material defects which might appear on components
having in common piece-parts from the same manufacturer;
- environmental conditions which might enhance corrosion and wear
of a certain group of components;
- incompleteness and errors in the system analysis and modelling
(e.g. lack of identification of the existence of functional
dependencies among components and systems such as those described
previously, or of inadequacies in design, procedures, etc.);
- construction and installation faults not detected during the
commissioning tests or normal operation which may become evident
under particular system demands.
Some of these faults present a "debugging" pattern similar to that
encountered when dealing with software reliability. It would be
theoretically possible to analyze and model each specific factor.
However, because of the paucity of the data for each single
phenomenon, and of the very significant increase in size and
complexity of the resulting system model, it is much more cost-
effective to include all these residual factors within the so-called
implicit or parametric models which are the subject of several further
chapters of this book.

REFERENCES

(1) A.M. Games, P. Martin and A. Amendola: 'Multiple related
component failure events', Reliability '85, Birmingham, 10-12 July
1985.
(2) W.C. Gangloff and T. Franke: 'An engineering approach to common
mode failure analysis', Proc. Seminar on Development and
Application of Reliability Techniques to Nuclear Power Plants,
Liverpool (1974).
(3) G.T. Edwards and I.A. Watson: A Study of Common Mode Failures,
SRD R146, UKAEA, July 1979.
(4) I.A. Watson: 'Review of common cause failures', NCSR R27, UKAEA,
July 1981.
(5) G.L. Crellin et al.: 'A study of common cause failures, phase II:
a comprehensive classification system for component fault
analysis', LATA-EPRI-NP-3837, January 1985.
(6) P. Humphreys: 'Design defences against multiple related failures'
(in this same book).
(7) D.P. Wagner, C.L. Cate and J.B. Fussell: 'Common cause failure
analysis methodology for complex systems', in Nuclear Systems
Reliability Engineering and Risk Assessment (J.B. Fussell and
G.R. Burdik, eds.), SIAM (1977).
(8) Los Alamos Technical Associates Inc.: 'Data Benchmark test of a
classification procedure for common cause failures', prepared for
EPRI, LATA-EPRI-02-02 (Rev. 1), June 1983.
(9) A.M. Games, A. Amendola and P. Martin: 'Exploitation of a
component event data bank for common cause failure analysis',
ANS/ENS Top. Meeting on Probabilistic Safety Methods and
Applications, San Francisco, Ca (USA), 24-28 February 1985.
(10) CEC-JRC: 'The component event data handbook', Technical Note
No. 1.05.01.86.66, Ispra, 1984.
(11) P. Humphreys, A.M. Games and N.J. Holloway: 'MRFs from the
analysis of component data' (in this same book).
(12) A.M. Games: 'A review of the latest developments in the LATA/EPRI
common cause failure study', Technical Note No. 1.05.61.84.188,
JRC-Ispra, 1984.
(13) G. Apostolakis and D. Moieni: 'On the correlation of failure
rates', Proc. of the 5th EuReDatA Conf. on Reliability Data
Collection and Use in Risk and Availability Assessment,
Heidelberg, April 9-11, 1986, Springer Verlag.
(14) USNRC: 'PRA procedures guide: a guide to the performance of
probabilistic risk assessments for nuclear power plants',
NUREG/CR-2300, April 1982.
(15) I.A. Watson: 'Human factors in reliability and risk assessment in
reliability engineering', Proc. of the Ispra Course held at
Madrid, September 22-26, 1986 (A. Amendola and A. Saiz de
Bustamante, eds.), Kluwer Academic Publishers (1988).
(16) A. Amendola, U. Bersini, P.C. Cacciabue and G. Mancini: 'Modelling
operators in accident conditions: advances and perspectives on a
cognitive model', in Cognitive Engineering in Complex Dynamic
Worlds (E. Hollnagel, G. Mancini and D.W. Woods, eds.), Academic
Press, 1988.
(17) A. Poucet: 'Survey of methods used to assess human reliability in
the human factors reliability Benchmark exercise', in Accident
Sequence Modelling (G.E. Apostolakis, P. Kafka and E. Mancini,
eds.), Elsevier Applied Science, 1988.
(18) A. Poucet and A. Amendola: 'State of the art in PSA reliability
modelling as resulting from the international Benchmark exercises
project', NUCSAFE 88 Conf., Avignon, October 2-7, 1988.

DESIGN DEFENCES AGAINST MULTIPLE RELATED FAILURES

P. Humphreys
National Centre of Systems Reliability
UKAEA
Wigshaw Lane
Culcheth
Warrington
WA3 4NE

ABSTRACT. The causes of multiple related failures (MRF's) are
considered, and the development of defensive strategies to counteract
MRF's is described.
The use of a defensive strategy, combined with MRF causes, to
assess plant unavailability during the Common Cause Benchmark Exercise
is discussed.
The results of recent developments in cause/defence modelling are
described.

1. SCOPE OF THE LECTURE

This paper discusses the concept of Multiple Related Failures, otherwise
known as Common Mode Failures or, generally, as Dependent Failures.
In order to differentiate between independent and dependent
failures, an all-encompassing definition of dependent failures is
presented.
The history of the development of design defences in the UK is
described, from the initial assessment of common mode failures by
Edwards and Watson (1) up to the most recent work on defensive
strategies and modelling which is being undertaken at SRD.
Some examples of dependencies are provided, taken from the nuclear
and aircraft industries.
The development of the defensive strategies is seen primarily from
the viewpoint of the nuclear industry, although the strategies
discussed are equally valid for application to high-reliability system
designs in non-nuclear applications, such as the civil engineering or
chemical industries.

2. DEFINITION OF MULTIPLE RELATED FAILURES

Multiple related failures concern the possibility that a system or
mission failure involving multiple item failures may occur due to a
"common cause", i.e. the loss during some critical period of multiple
redundant paths, component parts or functions due to an underlying
common mechanism, fault or phenomenon.
In order to differentiate between independent, random failures and
Multiple Related Failures, the term Dependent Failure (DF) is
introduced, where:

Dependent Failure (DF): The failure of a set of events, the
probability of which cannot be expressed as the simple product of
the unconditional failure probabilities of the individual events.

Included in this category are:

Common Cause Failure (CCF): This is a specific type of dependent
failure where simultaneous (or near-simultaneous) multiple
failures result from a single shared cause.

Common Mode Failures (CMF): This term is reserved for
common-cause failures in which multiple equipment items fail in
the same mode.

Cascade Failures (CF): These are propagating failures.

Using the above definition of Dependent Failure, all other
failures can be classed as independent failures, where the probability
of failure of a set of events is expressible as the simple product of
the individual event unconditional failure probabilities:

    P(AB) = P(A) x P(B)                    (1)

However, since we know that multiple related failures occur, we
must express the possibility that a set of occurrences is not
necessarily independent. The equation

    P(AB) = P(A) x P(B|A)                  (2)

implies that the system failure probability P(AB) is the probability of A
failing times the probability of B failing given that A has already
failed, thus taking some account of the dependency issue. If the
failures of A and B are independent then equation (2) reverts to the form
of equation (1).
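
A small numerical sketch, with assumed per-demand failure probabilities, shows how strongly the conditional term of equation (2) can dominate the naive product of equation (1):

    # Assumed per-demand failure probabilities for two redundant trains A and B
    p_a = 1.0e-3
    p_b = 1.0e-3

    # Equation (1): independence assumption
    p_ab_independent = p_a * p_b                 # 1.0e-6

    # Equation (2): with an assumed conditional probability P(B|A) = 0.1
    # (a beta-factor-like fraction expressing the dependence)
    p_b_given_a = 0.1
    p_ab_dependent = p_a * p_b_given_a           # 1.0e-4

    print(f"independent model: {p_ab_independent:.1e}")
    print(f"dependent model  : {p_ab_dependent:.1e}"
          f"  ({p_ab_dependent / p_ab_independent:.0f} times higher)")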

3. THE DEVELOPMENT OF DEFENSIVE STRATEGIES AGAINST MRF's

In order to understand how to defend against MRF's, the cause, the means of
propagation through a system and the impact on the system being analysed
must be well understood. These issues were addressed by Edwards and
Watson in a study of common mode failures (1).

3.1 The Study of Common Mode Failures

The study (1) addressed the problems of identification of the common


failure mode, the uncertainty of the reliability models used, the
availability of data required for the solution of the models, and the
possible rarity of CMF events.
In support of the study, data was obtained on events reported at
nuclear plants and on airline accidents. This data was analysed to
determine the types of common mode failure to which systems employing
redundancy in design were susceptible. From this analysis arose a
general definition of the term 'common mode failure'.
The types of events which were identified as CMF's leading to a
system failing to perform its intended function were:

i the coincidence of failures of two or more identical


components in separate channels of a redundancy system, due to a
common cause,
ii the coincidence of failures of two or more different
components in separate channels of a redundancy system due to a
common cause,
iii the failure of one or more components which result in the
coincidence of failures of one or more other components not
necessarily of the same type, as the consequence of some single
initial cause,
iv the failure of some single component or service which is
common to all channels in an otherwise redundant system.

3.2 Development of a Defensive Strategy

Using the data derived in the CMF study, a comprehensive classification
scheme was developed from the initial typing of events. In essence, CMF
causes were attributed to either Engineering (E) or Operational (O)
roots, with sub-classification into Design (ED), Construction (EC),
Procedural (OP) and Environmental (OE), as shown in Fig. 1.
Mitigating factors, or defences against common-mode failures, were
then evaluated. It was considered essential that in the design and
operation of systems the general policy must be the prevention of
common-mode failures, or at least the minimisation of their frequency
of occurrence and their effects upon the system. To support this
objective, the report identified the various defences which were
available to the designer and operator, providing advice on the points
which needed to be considered both in design and operation of a system.
The advice is shown graphically in Fig. 2.

Fig. 1 - Classification of Common-Mode Failures


FIG. 2 OVERALL COMMON-MODE FAILURE DEFENSIVE STRATEGY
3.3 Refinements

Subsequent work by Bourne et al (2) developed the concept of Defences
against Common Mode Failures in Redundancy Systems and formulated a
guide for management, designers and operators. The concept of
common-mode failures was restated and the term "common-cause failures
(CCF)" was introduced. CMF's were considered by some to be a sub-set of CCF's,
while an alternative opinion is that the terms CCF and CMF are
synonymous. This dilemma over definition was later resolved by the
introduction of the term 'Dependent Failures'.
The classification of common-mode failures used in (1) was
retained with the following changes. Under the heading Engineering
Design (ED), functional deficiencies (EDF), the original titles of
"Hazard undetectable" and "Inadequate Instrumentation" were replaced
with "Logical Error", "Inadequate Measurement" and "Inadequate
Response" in order to clarify the type of deficiency which could
arise.
The advice given in (1) concerning defences against CMF was
refined and presented under the following headings:-

Fig. 4 - A guide to the unreliabilities of various system arrangements
(Types of system: A - simple single system; B - simple redundant system;
C - partly diverse system; D - fully diverse system; E - two separate systems)

TABLE I

Recommendations for CMF Defences

Management: Engineering control


Design review
Construction control
Operational control
Reliability performance monitoring

Technical: Engineering principles:


Functional diversity
Equipment diversity
Fail-safe design
Operational interfaces
Protection and segregation
Redundancy and voting
Proven design and standardisation
Derating & Simplicity

Construction quality control:


Construction standards and procedures
Inspection
Testing & commissioning

Operational procedures:
Maintenance
Proof testing
Operation

Reliability assessment

The report also related the list of defences to the CMF cause
classification of Fig. 1, indicating which of the defences could
counteract each of the specific CMF cause categories (Fig. 3).
The impact of system design as a major defensive measure was
assessed and a guide to the possible system failure probability was
provided. This indicated the benefits to be gained in moving from a
single channel system to implementation of system redundancy and then
system diversity (Fig. 4).

4. THE CCF RELIABILITY BENCHMARK EXERCISE

The advice given in SRD-R-196 (2) has been used in the UK as a means of
aiding the system designer in the achievement of reliable systems
design. The CCF-RBE provided an opportunity for the safety analyst to
use the advice of reference 2 to develop a framework for plant
assessment and at the same time to use the CMF defensive strategy as
part of the modelling and quantification of unavailability.

FIG. 3 - COMMON-MODE FAILURE DEFENCES RELATED TO THEIR CAUSES
(matrix relating each of the defences of Table I to the CMF cause categories
EDF, EDR, ECM, ECI, OPM, OEN and OEE)


CMF CA USES CL
A SSIFIC
A TION PARTIAL FA CTOR
CHF DEFENCES
MINIMUM JlfStIO
EOF EDR ECM ECI OPM OEN OEE
Pu*
DESION CONTROL

OESIGN REVIEW

FUNCTIONAL DIVERSITY

EQUIPMENT OIVERSITY

FAILSAFE OESICiN 1
OPERATIONAL INTERF
A CES

PROTECTION I SEGREG
A TION

REOUNOANCY t VOTING

PROVEN OESIGN t STA NDA RDISA TION

DERATING t SIMPLICITY

CONSTRUCTION CONTROL

TESTING I COMMISSIONING

INSPECTION

CONSTRUCTION ST
A NO
A ROS

OPERATIONAL CONTROL

RELIABILITY MONITORING

MAINTENANCE

PROOF TEST

OPERATIONS

0.001

FIG. SUBSYSTEM FA CTOR * " , i


4.1 The Application of the Partial Factor Model

The defensive strategy (1) was reviewed and a list of nineteen CMF
defences was defined. The target plant for the CCF-RBE was then
assessed against each of the major CMF causes shown in Figure 1. The
effectiveness of each of the defined defences in combatting the common
mode failures was then assessed. The partial factor dependent
failures model developed by G Edwards (3) was then used to quantify
the effectiveness of each defence, and thus arrive at a value for
system unavailability. Figure 5 provides an example of the format of
the defence/cause link obtained through the application of the Partial
Factor model.
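The arithmetic of such a quantification can be indicated by a small numerical sketch.
Since the Partial Factor model itself is documented only in the unpublished notes of
reference 3, the combination rule used below (partial factors assessed per credited
defence and multiplied together, subject to a floor value of the kind indicated in
Figure 5) is an assumption made purely for illustration, not the published form of
the model:

# Hedged sketch of a partial-factor style quantification (Python).
# The multiplicative combination rule and the numerical values are
# illustrative assumptions, not taken from reference 3 or the CCF-RBE.
MINIMUM_FACTOR = 0.001            # floor value of the kind shown in Figure 5

def common_cause_factor(partial_factors):
    """Combine the partial factors assessed for the credited defences into
    an overall common cause factor, never going below the assessed minimum."""
    product = 1.0
    for f in partial_factors:
        product *= f
    return max(product, MINIMUM_FACTOR)

beta = common_cause_factor([0.5, 0.3, 0.2])   # three credited defences -> 0.03
q_channel = 1.0e-4                            # assumed single channel unavailability
q_ccf = beta * q_channel                      # dominant dependent failure term
print(beta, q_ccf)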

5. RECENT DEVELOPMENTS IN DEFENSIVE STRATEGIES AND DEPENDENT FAILURES
MODELLING

The National Centre of Systems Reliability is engaged in ongoing
research into the development of assessment and modelling techniques
relating to dependent failures (4,5). As part of that work, the
defensive strategy discussed in Section 4 has been revised. The
philosophy of the new defensive strategy is based on the premise that
dependent failures occur when some trigger event (root cause) occurs
and one or more coupling mechanisms are present on the plant. A
trigger event is some category of component failure, which could be the
total failure of the component or some internal failure; human error in
operations, test or maintenance; or environment, including all external
influences. The coupling mechanisms are those features of sameness
which may be expected to couple or link component failures caused by
the trigger event(s).
Some examples of triggers (root causes) and coupling mechanisms
are given in Figures 6 and 7.
The defences against the coupling mechanisms (sameness) are to
make things different or diverse. For example where a coupling
mechanism leading to a dependent failure was "same location" of
components, the defence could be "different location". Some further
examples of defences against coupling mechanisms are shown in Figure
8.
The defences against triggers are not identical to those for
coupling mechanisms since the root cause of a failure and the
conditions for propagation of that failure as a dependent event may be
expected to have different attributes. Some examples of potential
defences against triggers are presented in Figure 9.
A Trigger-Coupling Mechanism (TCM) Model is being developed to
relate the interaction between triggers, coupling mechanisms, and the
defences which can be applied to protect plant against MRF's. This TCM
model, together with a new dependent failures database, will be used in
the assessment of plant designs in order to obtain a clearer picture of
the effectiveness of design defences against MRF's.
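Although the TCM model is still under development, the screening logic it implies
can be pictured with a short sketch. The structure below (Python; the names and
entries are illustrative and not part of the model) simply records which coupling
mechanisms of Figure 7 are present for a redundant group and which of the
corresponding defences of Figure 8 have been applied, flagging the couplings left
unbroken:

# Illustrative screening of coupling mechanisms for one redundant group.
# Categories follow Figures 7 and 8 (sameness vs difference); the entries
# are examples only.
couplings_present = {"procedures: maintenance", "staff: maintenance",
                     "location", "hardware: type"}
defences_applied  = {"location",            # redundant components in separate rooms
                     "staff: maintenance"}  # different crews for redundant subs

unbroken = couplings_present - defences_applied
print("Couplings still linking the redundant components:", sorted(unbroken))
# A trigger event (root cause, Figure 6) acting through any unbroken coupling
# is a candidate dependent failure mechanism for the group.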

6. CONCLUSION

The accumulated experience of the design of high integrity systems,
combined with improvements in component reliability, has enabled the
impact of independent failures to be reduced to an acceptable level.
The dominant failure mechanisms are now seen to be those relating to
dependent failures. In order to protect against dependencies, it is
first necessary to understand the causes of the dependencies. When this
information is known, defences can be developed to counteract the
dependency by preventing the trigger mechanism from occurring or by breaking
the coupling mechanism which would allow the event to propagate and
cause system failure. Such defences as are required should then be
incorporated into high integrity system designs as an integral part of the
system design, rather than as an afterthought added at some later stage
of plant operation.

REFERENCES

1 Edwards, G & Watson, I A. "A Study of Common Mode Failures".
  UKAEA, Report SRD-R-146, July 1979.

2 Bourne, A J, Edwards, G, Hunns, D M, Poulter, D R & Watson, I A.
  "Defences against common-mode failures in redundancy systems".
  UKAEA, Report SRD-R-196, January 1981.

3 Edwards, G T. 'The Partial Factor Model'. UKAEA, Unpublished notes.

4 Ballard, G M. "Dependent Failures Analysis in PSA". IAEA
  Conference on Nuclear Power Performance and Safety, Vienna,
  Austria, 28 September - 2 October 1987. IAEA-CN-48/202.

5 Games, A M. "Dependent Failures: Data and Database". 1987 SRS
  Associate Members Meeting, Warrington, England, 29-30 October.
  RTS 87/41.

Pre-occupational Failure Causes

D00 Design
D10 Design Requirements/Specifications Inadequacy
D20 Design Error or Inadequacy
    D21 Channel dependency
        Common component or service
        Common defect in design
        Common vulnerability
    D22 Inadequate facilities for operation provided
    D23 Inadequate facilities for maintenance provided
    D24 Inadequate facilities for test provided
    D25 Inadequate facilities for calibration provided
    D26 Inadequate components
    D27 Inadequate design quality assurance activity
D30 Design Limitations
    D31 Financial design limitations
    D32 Spatial design limitations

F00 Manufacturing
F10 Manufacturing Error or Inadequacy
    F11 Failure to follow (manufacturing) instructions
    F12 Inadequate manufacturing control
    F13 Inadequate (manufacturing) inspection
    F14 Inadequate (manufacturing) testing

C00 Construction
C10 Construction, Installation and Commissioning Error or Inadequacy
    C11 Failure to follow (construction) instructions
    C12 Inadequate construction control
    C13 Inadequate (construction) inspection
    C14 Inadequate (construction) testing

Figure 6. Root causes of component unavailability

Operational Failure Causes

M00 Maintenance
M10 Failure to Follow Maintenance Procedures
    M11 Failure to follow repair procedures
    M12 Failure to follow test procedures
    M13 Failure to follow calibration procedures
M20 Defective Maintenance Procedures
    M21 Defective repair procedures
    M22 Defective test procedures
    M23 Defective calibration procedures
M30 Inadequate Maintenance Supervision
    M31 Inadequate maintenance supervisory procedures
    M32 Inadequate maintenance supervisory action
    M33 Inadequate maintenance supervisory communication
M40 Peer Maintenance Communication Problem
M50 Inadequate Maintenance Training

O00 Operation
O10 Failure to Follow Operating Procedures
    O11 Misdiagnosis (following wrong procedures)
    O12 Accidental (omission of action)
    O13 Accidental (commission of action)
O20 Defective Operating Procedures
O30 Inadequate Operating Supervision
    O31 Inadequate operating supervisory procedures
    O32 Inadequate operating supervisory action
    O33 Inadequate operating supervisory communication
O40 Peer Operator Communication Problem
O50 Inadequate Operator Training
O60 Contractor/Other Personnel Activity
O70 Inadequate Human Environment

Figure 6. continued

SAME:

100 Management and Supervision


Management and Supervision of 110 Specification
120 Design
130 Manufacture
140 Installation
150 Operation
160 Maintenance
161 Test
162 Calibration

200 Procedures
Procedures relating to 210 Specification
220 Design
230 Manufacture
240 Installation
250 Operation
260 Maintenance
261 Test
262 Calibration

300 Staff (Human action)


Staff (Human action in) 310 Specification
320 Design
330 Manufacture
340 Installation
350 Operation
360 Maintenance
361 Test
362 Calibration

400 Equipment used


Equipment used in 410 Specification
420 Design
430 Manufacture
440 Installation
450 Operation
460 Maintenance
461 Test
462 Calibration

500 Quality assurance


Quality assurance in 510 Specification
520 Design
530 Manufacture
540 Installation
550 Operation
560 Maintenance

Figure 7. Coupling mechanisms

561 Test
562 Calibration

600 Location

700 Environment (Ambient Conditions)

800 Hardware
Hardware 810 Manufacture
820 Type
830 Design Principle

900 Timing

Figure 7. continued

DIFFERENT:

100 Management and Supervision


Management and Supervision of 110 Specification
120 Design
130 Manufacture
140 Installation
150 Operation
160 Maintenance
161 Test
162 Calibration

200 Procedures
Procedures relating to 210 Specification
220 Design
230 Manufacture
240 Installation
250 Operation
260 Maintenance
261 Test
262 Calibration

300 Staff (Human action)


Staff (Human action in) 310 Specification
320 Design
330 Manufacture
340 Installation
350 Operation
360 Maintenance
361 Test
362 Calibration

400 Equipment used


Equipment used in 410 Specification
420 Design
430 Manufacture
440 Installation
450 Operation
460 Maintenance
461 Test
462 Calibration

500 Quality assurance


Quality assurance in 510 Specification
520 Design
530 Manufacture
540 Installation
550 Operation
560 Maintenance

Figure 8. Defences against coupling mechanisms

561 Test
562 Calibration

600 Location

700 Environment (Ambient Conditions)

800 Hardware
Hardware 810 Manufacture
820 Type
830 Design Principle

900 Timing

Figure 8. continued

100 Design

110 Methods and Control


111 Administration (Procedures)
112 Standards - derating, simplicity, use of proven
designs, fail safe designs etc.
113 Maintenance and manufacture considerations - foolproof
assembly logic
114 Operational considerations - foolproof operation,
interlocks etc
115 Environmental control
a Industrial - absence of interference by other
machines, personnel
b Natural - protection from hostile conditions eg
heat, moisture
116 Design review - independent assessment, reliability
analysis and CMF check
117 CMF education, awareness by design staff
120 Redundancy and voting

200 Manufacture and Installation

210 Manufacture/construction
211 Materials and component quality control
212 Quality control of manufacturing methods and standards
used etc
220 Installation
221 Inspection
222 Commissioning - full functional and interface testing

300 Operation

310 Operating procedures review


320 Operator training
330 Adequate operator supervision
340 Reliability monitoring

400 Maintenance and Test Inspection

410 Test
411 Proof testing
412 Extended testing
413 Routine inspection - plus post breakdown and maintenance
work
414 Condition monitoring
420 Preventative maintenance
430 Maintenance procedures review
440 Maintenance personnel training
450 Adequate supervision of maintenance personnel

Figure 9. Potential defences against triggers

TABLE A.1.1 (Sheets 1 to 3)
RPS common mode failures: Appendix 1 examples
(columns: occurrence date; facility; reactor type; occurrence; occurrence cause; cause
classification, main and secondary; failure degree; category; remarks. The sheets list
example common-mode failure events at operating US reactors, such as mis-set or
drifting trip setpoints, calibration procedure errors, sensing-line faults and design,
construction or commissioning errors)
TABLE A.4.1
World airline accidents - CMF summary
(Sheet 1 of 12)

NB: The information contained in this Table (Sheets 1 to 12), except the last
column (CMF class), was obtained from Reference 29 of the main report

Date Aircraft Airline Failure event description Damage CMF


class

30. 3.59 C46R Riddle Unguarded light bulb ignited D O.EE


combustible cargo in hold. The (E.DF)
resulting fire burned through
pipes carrying inflammable fluid
and ignited fluids. The flight
control system was destroyed.

12. 4.59 DC7 American No 2 propeller oversped and S E.DR


couldn't be feathered. It dis- (O.EE)
integrated and pieces broke open
No 1 engine sump causing loss of
lubricating oil.

28. 5.59 C46 Capital Retaining bolt on main landing S E.DR


gear (MLG) failed, damaging micro-
switches which gave a false "safe
gear" indication. The MLG collapsed.

12. 5.59 DC3 Real Both propellers mal-functioned. S ?


Aerovas

14. 5.59 DC8 Douglas Test flight. Error in specified S O.PO


landing technique. No pilot error.
FAA test pilot.

26. 6.59 L-1649A TWA Thunderstorm about 101* feet D O.PO


altitude. Scattered storms (0.E0)
reported before take-off. Nos 6
and 7 fuel tanks exploded.

6. 7.59 L-1049G TWA No 2 turbine wheel retaining nut S O.EE


failed, eventually damaging No 1 (E.DR)
engine oil radiator and causing
a fire.

10. 7.59 C-46 ALN Port propeller broke, causing ex- S O.EE
SA cessive vibration and fire in (E.DR)
the fuel tanks. (Engine fragments
also seriously injured the captain).

21. 8.59 C-46 Sourdough Failures of all hydraulics

1959 Total number of accidents 206. No of CMF - 9 - 4.37% of total.

TABLE A.4.1
World airline accidents CMF summary
(Sheet 2 of 12)

Date Aircraft Airline Failure event description Damage CMF


class

15. 5.60 C46 TAS Severe vibration in port engine, D ?


then In starboard engine.

4.10.60 L188 Eastern Ingested birds into engines 1, D E


EO.
2 and 4 during climbout.

1960 Total number of accidents 193. No of CMF - 2 - 1.04% of total.

31. 1.61 C46N AAT Landing gear warning horn and S O.PM
lights inoperative.

15. 2.61 707329 Sabena Material failure of the flying D ?


controls.

12. 4.61 DC3 DOT Fuel starvation to all engines D O.PO


due to improper usage of the
fuel selectors.

24. 5.61 DC4 TAA Captain with cardiac disorder, D O.PO


tried to leave his seat and
collapsed across all 4 throttle
levers, closing them.

11. 7.61 DC8 United Failure of thrust reversers of D ?


engines 1 and 2 probably due to
hydraulic failure.

14. 8.61 Convair PBY-SA Fuel contamination, failed both S O.PO
engines. Water was present (O.EN)
after 2 years' non-use. Inadequate
flight preparation.

16. 9.61 DC8 PAWA Failure of Nos 3 and 4 thrust S E .DR


reversers '0' ring seals due to
material deterioration.

17. 9.61 L188C Northwest Aileron primary control system D O.PM


failure due to improper re
placement of boost assembly.

19. 9.61 C54A Starways Mismanagement of fuel system by S O.PO


the crew.

8.11.61 L049 Imperial Mismanagement of fuel system by D O.PO


the crew.

1961 Total number of accidents 178. No of CMF - 10 - 5.62% of total.

TABLE A.4.1
World airline accidents CMF summary
(Sheet 3 of 12)

Date Aircraft Airline Failure event description Damage CMF


class

3. 3.62 DC3 Aeronaute Both port engine magnetos O.PM


defective due to inadequate
maintenance.

13. 3.62 Dakota 4 Hunting Port engine vibration port


surveys propeller then feathered,
followed by starboard engine
vibration.

20. 3.62 L1049 Avianca Unable to lower the under


carriage.

9. 5.62 DC4 Slick Two engine failure due to O.EE


extreme conditions and in (O.EN)
adequate performance from
remaining 2 engines at
prevailing temperature.

25. 3.62 707 PAWA Unable to extend nosegear


N720P

14. 8.62 720B Lufthansa Contamination of hydraulic O.EN


oil in nosegear.

11. 9.62 Vanguard 953 BEA Bird ingestion in all engines, O.EE
causing failure of Nos 2 and 4
and low performance from No 3.

1962 Total number of accidents 194. No of CMF - 7 - 3.6% of total.

1963 Total number of accidents 145. No of CMF 0.

DESIGN-RELATED DEFENSIVE MEASURES AGAINST DEPENDENT
FAILURES. ABB ATOM'S APPROACH.

S. HIRSCHBERG and L. I. TIREN

ABB ATOM AB
Office of Interdisciplinary Engineering
S-721 63 Västerås
Sweden
ABSTRACT. In this contribution defensive measures against dependent failures,
applied in the ABB ATOM four-divisional design, are described. The main focus
is on separation and diversity. Basic design principles are explained and a
review is given of the inherent protection against abnormal events, i.e. fire,
seismic hazards, aircraft crash, missiles and flooding. The inherent resistance
against the impact of dependent failures is illustrated by findings from a
recently finalized Probabilistic Safety Assessment (PSA) of a four-divisional
plant and by examination of some Common Cause Failure (CCF) events
involving diesel generators at U.S. nuclear power plants.

1. Introduction
ABB ATOM has delivered 11 Boiling Water Reactor (BWR) nuclear power plants
to utilities in Sweden and Finland. Two of the plants were turnkey deliveries.
In 1970, ABB ATOM decided to adopt a design (in the following referred to
as BWR 75 design) with four half-capacity systems (4 x 50%) in which each
division or subsystem is completely separated from its redundant counterparts.
Other important characteristic features of this concept include: integral plant
design, fine motion control rods, internal recirculation pumps, prestressed
concrete reactor containment, operational flexibility, excellent availability
performance and low occupational radiation doses.
Most of the important features of the BWR 75 design were implemented in
the first two 930 MWe units at Forsmark and in the two 660 MWe units at
Olkiluoto in Finland. Recently, the BWR 75 design concept was fully realised in
the Forsmark 3 and Oskarshamn 3 plants (1060 MWe), which went into
commercial operation in 1985. Thus, the advanced design which now serves as a
model to LWR designs being developed in other countries ("advanced BWR"), is
already proven in the operating Swedish and Finnish plants.
It is worth noting that a thorough review has now been made of the BWR 75
concept for the purpose of providing the best possible nuclear power plant for
the 1990's (ref. 1). The product resulting from this effort is denoted BWR 90.
Some moderate design modifications of the BWR 75 concept have been
proposed in order to reduce costs, incorporate technological modernisation and
adapt to new safety requirements. The major improvements concern:

strengthening of the capability of the reactor primary containment to
withstand the effects of a core melt accident
substantial reduction of building volume resulting in significant cost saving
extensive use of microcomputers and fibre optics for process control.
The four-divisional configuration of the safety systems was reconfirmed as
constituting an optimal arrangement with respect to safety, layout, and
maintainability.
General defensive measures against dependent failures include:
separation
diversity
fail-safe design
design review and verification
standardisation
periodic testing
administrative routines.

Note that standardisation may, however, increase the probability and impact
of certain types of Common Cause Failures (CCFs), caused by e.g. failures in
the manufacturing process.
In the following some of the defensive measures against dependent failures,
as implemented in ABB ATOM'S design and practices, will be described, with
main emphasis on separation and diversity. The importance of separation will be
illustrated in the context of protection against abnormal events. This will be
followed by an account of some findings from a recently performed Probabilis-
tic Safety Assessment (PSA) for a four-divisional plant.

2. Merits of Separation

The merits of separation as applied in ABB ATOM design have been described
earlier in detail, from a qualitative point of view (ref. 2).
It was concluded that in cases of fire, pipe whip, missiles, flooding,
hurricanes, lightning, airplane crash and sabotage the physical separation of
structures, systems and components will effectively, although to a varying
degree, improve the safety of the nuclear power plant. Furthermore, a
consistent and logical application of the principles of physical separation also
means the absence of complicated links, interactions and interconnections
between safety related functions, which simplifies safety analysis and should
speed up licensing procedures. Some of the highlights of ref. 2 will be reflected
in the following.
The following safety related functions are subject to the "two-out-of-four"
redundancy principles:

reactor shutdown (reactor protection system function)


emergency core cooling
reactor coolant makeup
emergency residual heat removal (reactor and primary containment).

All systems supporting these functions are completely separated into four
subsystems. This means that for a process system each subdivision contains not
only mechanical components (pumps, valves, heat exchangers, piping, etc) of its
own, but also separate electrical power supply including diesel generator and its
control equipment. In addition, the fourfold division of safety systems is
consistently carried through with respect to the reactor protection system
(RPS). Figure 1 illustrates the subdivision of the emergency core cooling
system, while physical separation of safety related systems is demonstrated in
figure 2. Thus, the emergency cooling systems are situated in four separate
bays in the reactor building, adjacent to the primary containment.
Figures 3 and 4 demonstrate that the fourfold division of safety systems is
consistently carried out also with regard to such support functions as electric
power supply and reactor protection system. The onsite standby power units
(four diesel generator sets) are situated as shown in Figure 5 which provides the
overall building arrangement. The nuclear and safety related portions of the
plant, i.e. the reactor and the control and emergency generator buildings, are
assembled around the reactor and separated from the conventional, turbine and
auxiliary portions by a wide communication area.
As far as practically possible even the operational and safety equipment
have been separated both from the physical and functional point of view. This
limits the possibility of failure propagation. As an example, the following
principles are valid for overcurrent protection switches:

each switch is connected to equipment in only one system


safety equipment and operational equipment in the same system are
connected to different switches
separate switches are used for equipment within and outside of containment.

In exceptional cases when interconnections between redundant subs cannot


be avoided, e.g. in the control logic, this is done in such a way that the
possibility of fault propagation from one sub to another is minimized (typically
by use of optoelectrical devices).
As a consequence of extensive separation between redundant subsystems the
protection level of the plant is not sensitive to protection level of separate
subsystems as long as it can be shown that a sufficient number of subsystems is
unaffected by the event considered.
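The benefit of the four-divisional arrangement, and the reason why the residual
dependent failure contribution rather than the independent one tends to dominate,
can be indicated with a small numerical sketch (Python; the per-channel
unavailability and the beta factor are round illustrative values, not results of the
ABB ATOM PSA):

from math import comb

def system_unavailability(q, beta=0.0, n=4, k=2):
    # Unavailability of an n-channel system that succeeds when at least k
    # channels are available, with a simple beta-factor split of the
    # per-channel unavailability q. Illustrative model only.
    q_i = (1.0 - beta) * q                        # independent part per channel
    independent = sum(comb(n, j) * q_i**j * (1.0 - q_i)**(n - j)
                      for j in range(n - k + 1, n + 1))
    return independent + beta * q                 # add the common cause part

q = 1.0e-2                                        # assumed per-channel unavailability
print(system_unavailability(q, beta=0.0))         # about 4e-6: pure 2-out-of-4 benefit
print(system_unavailability(q, beta=0.02))        # about 2e-4: the CCF term dominates

Even a small assumed common cause fraction removes most of the redundancy benefit,
which is why the defensive measures described in the following sections concentrate
on dependent failures.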

3. Diversity

Application of diversity by ABB ATOM as a defensive measure against
dependent events has been discussed in ref. 3; ref. 4 deals with one crucial
example, namely the arrangement of reactor shutdown in ABB ATOM'S BWRs.
A diversified design means that a specific function can be accomplished by
separate systems whose function is based on different design principles. This is
often an effective defence against the type of Common Cause Failures (CCFs)
where separation of redundant equipment does not always help. This includes
systematic design, manufacture, installation and maintenance errors. A
diversified design is usually rather expensive and is therefore used only for vital
safety functions. Use of pneumatic and motor-operated isolation valves on
different sides of the containment is a typical example of diversity.

BWR 75 - EMERGENCY COOLING SYSTEMS

322 Containment vessel spray system        652 Diesel engine auxiliary systems
323 Low pressure coolant injection system  712 Shutdown cooling water system
327 Auxiliary feedwater system             721 Shutdown secondary cooling system
Ultimate heat sink

Figure 1.

BWR 75 - PHYSICAL SEPARATION
Bottom part of the reactor building (below the reactor containment)
Subdivisions (Circuits) A, B, C and D

Figure 2.
BWR 75 - SINGLE LINE DIAGRAM FOR THE ELECTRIC POWER SYSTEMS
(400/800 kV and 110 kV grid connections; 10 kV, 660 V and 380/220 V operational
systems; diesel-backed safety-related systems; DC systems and battery-backed AC
system; subdivisions A, B, C and D)

Figure 3.
BWR 75 - REACTOR PROTECTION SYSTEM (RPS)
RPS logic: analog signals (0-10 V) via limit switches give digital signals (0/24 V)
to the logic channels; signal exchange between subdivisions A, B, C and D over
fibre optics; the 2-out-of-4 logic produces the digital actuation signals (0/24 V)

Figure 4.
BWR 75 - BUILDING ARRANGEMENT

121 Reactor building
122 Turbine building
123 Condensate cleanup system building
124 Auxiliary systems buildings A, B
125 Entrance building
126 Control building
127 Diesel buildings A, B, C, D
128 Waste building
129 Active workshop building
131 Auxiliary cooling water buildings A, B
132 High voltage switchgear building
134 Transformer building
137 Turbine cooling water systems building
145 Offgas building
148 Storage building

Controlled area / Uncontrolled area
Scale: 0 - 50 - 100 m

Figure 5.
One of the critical safety functions is the shutdown of the reactor. An
example of a common cause failure in the shutdown system is an incident which
occurred at the General Electric BWR plant Brown's Ferry 3 on June 28, 1980. Due
to equipment failure in connection with normal shutdown procedures one of the
scram discharge tanks was filled up with water, which prevented the full scram
function of the 76 rods with drives connected to it. Repeated manual activation
of the scram system finally resulted in insertion of all the failing rods after
about 15 minutes.
To avoid this type of CCFs in the shutdown system, the control rod drive in
ABB ATOM'S BWRs has diverse systems for normal operation and for scram
function. An electro-mechanical system with a motor gives a continuous fine
motion for normal use and a hydraulic system is used for scrams. Figure 6 shows
the function chain diagram for reactor shutdown function of Forsmark 3 plant.

3.1. HYDRAULIC SCRAM SYSTEM


The hydraulic scram system is shown in figure 7. Fast insertion of the control
rods, upon scram signal, takes place by means of high-pressure water. The
water used for this purpose is stored in tanks, which are pressurized by high-
pressure nitrogen. Reactor scram is initiated by a signal from the Reactor
Protection System (RPS), which is applied to the scram valves located in the
lines leading to the control rod drives.
The control rods are divided up into scram groups, each equipped with its
own scram module consisting of a scram tank, piping and valve. A total of 18
such groups are provided, each comprising eight to ten rods. The rods belonging
to any one group are distributed over the core so that reactivity interference
between them is virtually negligible. The consequence of a failure in one scram
group is therefore no more serious than the sticking of a single rod.

3.2 ELECTRO-MECHANICAL ROD INSERTION SYSTEM


A diverse way of inserting the control rods is achieved by an electro-
mechanical screw transmission. The electro-mechanical drives are primarily
used for normal positioning of the control rods, but they also contribute
powerful means for inserting the control rods during transient conditions, should
the hydraulic scram fail.
In figure 8 the main components of the drive and the control rod are shown.
They are
geared electric motor
mechanical screw transmission
piston tube and
guide tube.

The transmission consists of a motor-driven screw (drive screw) and a


torsionally restrained nut (drive nut). The lower end of the piston tube rests on
the nut, and its top is connected to the control rod by means of a bayonet
coupling.
The drive screw, drive nut and piston tube are enclosed by the guide tube,

REACTOR CONTAINMENT VESSEL / REACTOR PRESSURE VESSEL

221 CONTROL ROD DRIVES
222 CONTROL RODS
313 RECIRCULATION SYSTEM
351 BORON SYSTEM
354 HYDRAULIC SCRAM SYSTEM
532 CONTROL ROD OPERATION
649 FREQUENCY CONVERTERS

Figure 6. Function chain diagram for reactor shutdown at the Forsmark 3 plant.

BWR 3000 - HYDRAULIC SCRAM SYSTEM (SYSTEM 354)
(scram module with scram tank, piping and valves; connections to the other scram
modules, to system 754 compressed nitrogen system, to system 352 leakage drain
system and to system 331 reactor water cleanup system)

331 Reactor water cleanup system
352 Leakage drain system
754 Compressed nitrogen system

Figure 7.
CONTROL ROD DRIVE SYSTEM
(main components: control rod, fuel assembly, baseplate, control rod guide tube and
control rod drive; ASEA-ATOM BWR core module with pressurized water supply. The
panels show the control rod in its bottom position, normal manoeuvring with the
electric motor, and fast insertion with pressurized water)

Figure 8.

which acts as torsional and radial guide for the nut and piston tube.
Reactor scram involves automatic insertion of the piston tube and thus its
control rod into the core by means of high-pressure water from the hydraulic
scram system, regardless of the position of the drive nut. To maintain the
piston tube in the intended position after reactor scram, the bottom end of the
piston tube is provided with three latches, which are actuated as soon as the
piston tube and nut separate. The latches engage into special latching holes in
the guide tube. One latch is sufficient to hold the piston tube and control rod in
position. Simultaneously with the signal to the scram system, a starting signal is
applied to the control rod drive motors. The electro-mechanical transmission
will thus immediately start to move the drive nut upwards, and, after a
maximum of 4 minutes, the nut will come into contact with the piston tube in
its fully inserted position.

3.3. REACTOR PROTECTION SYSTEM


The Reactor Protection System, which actuates the hydraulic scram system and
also the electromechanical rod insertion system, is operating in the energized
mode, i.e. electricity is normally flowing through the circuits, and a channel is
tripped when the current vanishes (fail-safe). In Forsmark 3 and Oskarshamn 3
plants there is an additional logic system with the purpose of actuating the
electro-mechanical shutdown system. The signals into this logic system come
from sensors and transmitters separated from those used by the ordinary RPS
logic, and the system works in the deenergized mode. In this way there are two
diversified actuation systems as well as two diversified insertion systems for
the control rods.
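The fail-safe character of the energized RPS logic and the two-out-of-four voting it
feeds can be illustrated with a short sketch (Python; a simplified illustration, not
ABB ATOM's actual logic implementation):

def channel_trip(demands_trip, current_present):
    # Energized (fail-safe) channel: a trip is declared either when the channel
    # actively demands it or when its supervision current vanishes, e.g. on
    # loss of power or a broken circuit.
    return demands_trip or not current_present

def scram_2oo4(channel_states):
    # Actuate reactor scram when at least 2 of the 4 channels are tripped.
    return sum(channel_states) >= 2

# One channel demands a trip and another has lost its supervision current:
channels = [channel_trip(True, True), channel_trip(False, False),
            channel_trip(False, True), channel_trip(False, True)]
print(scram_2oo4(channels))    # True: two tripped channels actuate the scram

For the additional de-energized logic described above, the corresponding channel
function would declare a trip only when an actuation current is actively applied, so
that the two actuation paths fail in different ways.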

The diversity characteristic for the reactor shutdown function, together
with generous reactor pressure relief capacity, has led to regulatory acceptance
of the shutdown system as being a sufficient ATWS measure in all ABB ATOM
reactors. The control rod drive system is effectively "ATWS proof".

4. Other Defences

An example of fail-safe design has already been given. Below follows a short
summary of additional means for reducing the impact of dependent events.
Note that not all defensive measures described in this chapter are design-
related.

4.1. DESIGN REVIEW AND VERIFICATION

If equipment were subjected to more severe conditions during normal
operation or accident conditions than it has been designed to withstand, there is
of course an apparent risk of common cause failures. One way to reduce the
probability of this type of common cause failure is to perform a thorough
design review of all important safety systems. The design review is an
important part in the process of designing a new nuclear power plant at ABB
ATOM. It is performed by assembling a number of concerned people with a

thorough knowledge of the actual system, including mechanical, electrical and
safety specialists, persons with experience from reactor operations and know
ledge of layout and installation of the system itself as well as of other
equipment which may influence the system considered. In this way the system
will be reviewed and analyzed from different angles, and the communication
between the different specialists may reveal design weaknesses which otherwise
would not have been noticed.
To ensure correct and conservative design specifications for equipment
which is relied upon during accident conditions, it is important to have reliable
analytical tools to calculate the various loads which may occur during these
conditions.
In many cases analytical calculations are not sufficient, but they have to be
supplemented by type tests and verification experiments. Carefully planned and
conducted experiments and verification tests are effective weapons in the
defence against common cause failures.

4.2. PERIODIC TESTING

Latent dependent failures can be detected by periodic testing whose frequency
is prescribed by Technical Specifications. Tests are always performed after
repair. Components manoeuvred during test are automatically restored to the
original position after completion of the test. In order to minimize the
possibility of introduction of CCFs during testing, the redundant subs are tested
in a staggered manner.
In the case of the Reactor Protection System dynamic self-testing is applied,
as illustrated in figure 9.
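Staggered testing both limits the scope for introducing a CCF into every sub during
the same test campaign and shortens the time a latent CCF can survive undetected.
The second effect can be indicated with a small numerical sketch (Python; a
simplified model which assumes the CCF is revealed by the first test of any affected
sub, with an illustrative rate and test interval):

# Illustrative effect of staggered proof testing on a latent CCF affecting
# all redundant subs (constant occurrence rate; numbers are examples only).
lam_ccf = 1.0e-6     # assumed occurrence rate of a testable CCF [1/h]
T = 4380.0           # assumed proof-test interval of each sub [h], about 6 months
n_subs = 4

def mean_latent_time(T, n_staggered_tests):
    # Mean time a CCF stays latent when the tests are spread evenly over the
    # interval; n_staggered_tests = 1 corresponds to simultaneous testing.
    return T / (2.0 * n_staggered_tests)

print(lam_ccf * mean_latent_time(T, 1))        # all subs tested together
print(lam_ccf * mean_latent_time(T, n_subs))   # tests staggered over the four subs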

4.3. ADMINISTRATIVE PROCEDURES

A number of activities of a more or less administrative character can contribute
to reduction of the potential for CCFs. Of central importance is an adequate
Quality Assurance Programme followed by the vendor and a thorough follow-up
of the operating experience by utilities and vendor. Performance of PSA studies
is one of the most efficient tools for identification of dependencies. Elimination
of unintended dependencies is the best way of dealing with them and does not
necessarily require large resources, as demonstrated by some modifications
introduced at older ABB ATOM plants as a result of PSA findings.
Finally, good communications between individuals and groups of people
involved in operation of the plants are important for minimizing the potential
for human interaction dependencies. This includes transfer of information
through manuals, procedures, documentation from tests and repair activities,
etc.

5. Abnormal Events

The PSA studies applied to nuclear power plants show clearly that the designs
are reasonably "proof" against random faults and malfunctions. Thus, attention
is focused on possible sources of dependencies, which may eventually lead to

BWR 75 - REACTOR PROTECTION SYSTEM (RPS)
Dynamic testing of 2-out-of-4 logic
(logic channels in subdivisions A, B, C and D with signal exchange over optocouplers;
the 2/4 logic produces the actuation signals, whose outputs are compared for equality
by a test channel connected to the main computer, with an alarm on discrepancy)

Figure 9.
multiple failures. The single occurrence that yields a multiple failure may be,
for example, one of the following abnormal events:
fire
earthquake
aircraft crash
missiles
flooding
chemical explosion
hurricane
lightning.

In view of the design principles outlined in the preceding chapter, brief
summaries will be given of specific measures applied by ABB ATOM in the
context of the treatment of abnormal events to be considered in the design of
nuclear power plants. The summary is based on ref. 5.

5.1. FIRE HAZARD

The fire protection as applied in ABB-ATOM's BWR 75 rests mainly on passive


principles. The following passive means have been used in a comprehensive
effort to protect the plant from fires:

separation between redundant safety related systems


use of fire resistant construction
application of fire resistant materials
reduction of quantities of combustible material to a minimum.

The general principle adopted for BWR 75 is that subs A and C are separated
from subs B and D by being placed in different fire zones. In no case may a fire
zone share the ventilation equipment or air ducts with another fire zone (with
the exception of the main exhaust air stack), i.e. separate systems are provided for
normal ventilation, emergency filters and smoke extraction. The fire zones are
separated by fire-resistant structures. ABB ATOM has divided the main
buildings for BWR 75 into nine fire zones. Larger buildings and the functionally
associated smaller buildings form their own fire zones. As an example, the
reactor building has been divided into two fire zones.
The nine fire zones (figure 10) are divided into fire cells of two types: fire
cells with separate normal ventilation and fire cells with common normal
ventilation. The degree of separation is determined by the fire-load of the area
considered and also by other hazards.
Figure 11 illustrates the physical separation principles as applied to elect-
rical equipment. The degree of separation reflects consideration of the
potential hazard, with regard to fire load as well as the consequence of fire.
Reactor protection system equipment is situated in four separate areas that
are provided with separate ventilation systems in the control building (figure
12). In the control room itself, the RPS-related functions are placed in different
cabinets provided with fire-resistant shielding. In the cable spreading area
below the control room, safety-related cables are also separated into four
channels that are individually shielded.

121 Reactor building
122 Turbine building
123 Condensate cleanup system building
124 Auxiliary systems buildings A, B
125 Entrance building
126 Control building
127 Diesel buildings A, B, C, D
128 Waste building
129 Active workshop building
131 Auxiliary cooling water buildings A, B
132 High voltage switchgear building
134 Transformer building
137 Turbine cooling water systems building
145 Offgas building
148 Storage building

Boundary of fire zone
A/C side, designation Ax
B/D side, designation Bx

Figure 10. Fire zones.


BWR 75 - PHYSICAL SEPARATION CRITERIA
(separation by distance or barrier between subdivisions A, B, C and D in the central
control room, the apparatus rooms and diesel buildings, the cable areas (culverts)
and the process areas)

Figure 11.

BWR 75 - CONTROL BUILDING, LEVEL 0.0

1 Central control room


2 Computer room
3 Office area
4 Control room ventilation
5 Other ventilation

6 Safety-related control equipment


(IKM, etc.)
7 Operational control equipment, reactor
plant (IKM, etc.)
8 Operational control equipment, turbine
plant

9 Safety-related control voltage supply


10 Operational control voltage supply
11 Lighting and general power distribution
12 Batteries, 24 V

Subdivision A    Subdivision B    Subdivision C    Subdivision D

Figure 12.
Extensive active firefighting systems include the alarm system consisting
of two independent parts for the A/C and B/D portions of the plant, about 60
separate ventilation systems for normal ventilation and 15 systems for smoke
extraction, and numerous extinguishing systems (for water supply and hydrants,
fire sprinklers and halon extinguishers).
In view of the general separation principles and the features of ventilation
and fire extinguishing systems, it is highly improbable that a fire could affect
more than one sub.

5.2. SEISMIC HAZARD


Geological conditions in Sweden and Finland are quite stable. For this reason,
no specific seismic requirements were formulated in connection with the
erection of the first nuclear power plants. Such requirements were only put on
the last two plants, Forsmark 3 and Oskarshamn 3. The ground motion
horizontal acceleration for Forsmark 3 was estimated as 0.15 g for the Safe
Shutdown Earthquake (SSE). A seismic classification of buildings and structures
for these two plants was made. As in the case of other safety related
considerations, our work was based mainly upon American rules, regulations and
guidelines. Figure 13 shows which buildings are subject to seismic design.
Verification and seismic qualification of equipment has been performed with
the following methods:

1) Engineering judgement
2) Static analysis
3) Dynamic analysis
4) Testing.
The seismic inputs for all equipment are the floor response spectra obtained
from the dynamic analysis of different building structures. Using the floor
response spectra it has been possible to qualify some equipment by e.g.
referring to results from other types of vibration testing or by referring to the
low acceleration level.
Engineering judgement has mainly been used to evaluate the risk that
non-seismic equipment might jeopardize seismically classified equipment.
Static analysis has been used where it is obvious that a given component has
its natural frequency above 33 Hz, which is outside the frequency range of the
earthquake. In other cases it has been necessary to verify by dynamic analysis
that the natural frequency is >33 Hz and then apply a static analysis.
Dynamic analysis has been performed on all building structures to obtain the
floor response spectra. The reactor with internals has been analyzed dynami
cally together with the reactor building and containment. A dynamic analysis
has also been performed for most process piping with individual dynamic models
for different piping systems. Small piping has been treated in a more generalized
manner to get appropriate support distances for different piping sizes.
Where it has not been possible to verify the seismic design by analytical
methods, testing has been performed.
A separate analysis concerns the seismic response of the turbine plant. The
potential missile sources in the turbine building are the turbine itself and the feedwater
tank. It is concluded that a Safe Shutdown Earthquake (SSE) may cause damage
BWR 75 - BUILDINGS, SEISMIC DESIGN

The following buildings are designed to
assure safe shut-down of the reactor
following an earthquake of specified
magnitude:
121 Reactor building
124 Auxiliary systems buildings A, B
126 Control building
127 Diesel buildings A, B, C, D
131 Auxiliary cooling water building
136 Cooling water screening plant building
(128 Waste building*)
*) Only to the extent necessary to
avoid release of stored radioactive
material to the groundwater

Figure 13.
to the turbine plant, but will not significantly affect the safety related
equipment.

5.3 AIRCRAFT CRASH

The need for protection of a nuclear power plant against an aircraft crash is
related to the location of the plant. While only small aircraft are considered a
hazard in the vicinity of most Scandinavian sites, large aeroplanes are postu-
lated to be potential hazards at many continental sites, e.g. in the FRG and
Switzerland. Thus, both military and large civilian aircraft are analysed with
respect to their possible impact on the buildings of the plant.
In the ABB-ATOM plant, basic protection is achieved by protecting the
reactor containment by structures, and locating redundant portions of safety
related systems in building compounds which are separated in such a way that
the aircraft cannot damage redundant parts (figure 14).
Special attention has to be given to the risks associated with the burning of
huge amounts of aircraft fuel. The fire can affect emergency ventilation
systems and may lead to the choking of redundant diesel generator sets, needed
for emergency power. Thus, air intakes may need special protection.
Further protection can be obtained by reinforcing certain walls and roofs to
withstand specific impacts, defined by aircraft speed, size, and angle of
approach.
Two loading cases have been studied:

1) Impact of a fast military airplane, weighing 20 tons and with a velocity of


215 m/s
2) Impact of a large commercial aircraft with a velocity of 100 m/s.
In both cases the airplanes are assumed to strike perpendicularly at any
exposed part of the buildings. The airplane strike resistant buildings are
designed to withstand the above impacts.
The minimum thickness of concrete members used for aircraft impact
protection is determined on the basis of semi-empirical relations. Both perfora-
tion and scabbing are considered.
The response of an impacted structure is calculated on
the basis of energy considerations. The final verification is obtained by FEM
calculations, using detailed models of the reinforced concrete structures.
The global response of the reactor building and the response of housed
equipment to an aircraft impact and to the subsequent vibrations of the building
have been studied using the same models as for the seismic analyses of the
structures.
The layout of the buildings offers the option of obtaining protection against
other types of airplanes or other airborne missiles by simply varying the
thickness of the roof slab and upper parts of the external walls together with
providing an adequate reinforcement. For protection against a crashing large
commercial aircraft for instance, the required thickness of the roof slab would
be of the order of 2.0 m.

121 Reactor building                   127 Diesel buildings A, B, C, D
124 Auxiliary systems buildings A, B   131 Auxiliary cooling water building
126 Control building                   136 Cooling water screening plant building

Figure 14. Aircraft protection.


5.4. MISSILE PROTECTION
Analyses of two types of missiles have been carried out for Forsmark 3 and
Oskarshamn 3 plants:

1) Internally generated missiles associated with component overspeed
failures
2) Missiles that could originate from high-energy system ruptures.

Examples of potential missiles are:

break in a control rod drive


bolts in main circulation pumps
bolts in the reactor pressure vessel flange
overspeed missiles from the cooling system for the containment
atmosphere.
In all cases analysed it has been shown that it is either impossible to
generate a missile or that a generated missile cannot jeopardize the functions
of safety systems.
A special consideration has been given to turbine missiles, which constitute
the main missile hazard outside the containment. The most critical components
in this context are the turbine and feedwater tank.
Due to extensive overspeed protection, missiles generated from overspeed of
the turbine may be excluded. The consequences of a highly improbable missile
during normal operation are not critical since the longitudinal axis of the
turbine is directed towards the reactor building. Furthermore, a missile from
the blades will not be able to reach safety-related equipment because it will be
stopped by intermediate walls.
The feedwater tank contains a large amount of energy. A full break in the
tank could generate a missile (the tank gable) which would accelerate towards
the reactor building, but will not penetrate its walls. The probability of a
spontaneously occuring full break (e.g. due to material imperfections) has been
shown to be extremely small, which also applies to the risk of exceeding the
design pressure of the tank.
Postulated pipe ruptures, which constitute the primary event of the LOCA,
may damage components in the vicinity of the whipping pipe. As in the case of
other missiles the best general protection method against pipe whip is by
separation of systems and components which must not suffer damage in this
situation. In locations such as the reactor containment vessel where separation
of essential piping is impracticable due to the limited space available, protec-
tion against pipe whip is arranged by means of restraints or local shields for
vital equipment. Consequently, secondary missiles generated from the whipping
pipe are very unlikely.
A broken high energy pipe will discharge fluid, which is why safety related
equipment must be appropriately protected against possible jet loads.
Whenever necessary, protection devices against jet impingements have been
utilized both outside and inside the containment. However, usually the physical
separation of equipment is sufficient to eliminate this hazard. Inside the
containment separation is carried out by distance or by barriers. This makes it
highly improbable for a missile to affect more than one sub. Outside the

containment, separation of safety related equipment is made by location in
different rooms, which makes it impossible for a missile to affect more than
one sub. However, jet impingements may affect safety related equipment.
Discharging fluid could also cause secondary missiles, for example gratings
accelerated by jet forces. The potential for safety related impact from
secondary missiles is limited due to design and the fact that the identified
secondary missiles do not affect safety related equipment in more than one
subcircuit.
In conclusion, the missile analyses, carried out on a case-by-case basis,
ensure that the basic safety goals are not endangered by missiles. This implies
that:

- Overall integrity of buildings and structures housing safety-related equipment
  is maintained; however, local damage may result from missiles.
- No loss of containment function will occur as a result of missiles generated
  inside the primary containment.
- No LOCA will be incurred as a result of missiles generated outside the primary
  containment.
- The plant is capable of being shut down and maintained in a safe shutdown
  condition.

5.5. FLOODING
The possibility of external flooding, caused by a high water level, is site dependent.
In the Forsmark 3 case the protection of the safety equipment against such
eventuality is generally assured by placing it on a level which will not be
reached by the water from external sources, even under extreme conditions.
The criterion for protection against internal flooding is that safety functions
should not be impaired. To meet these requirements, certain rooms containing
safety-related equipment are designed to be leak-tight to prevent the water
from propagating to adjacent rooms. Furthermore, to limit the outflow from a
pipe break, the plant is equipped with room monitoring equipment which
automatically actuates isolation of the break, thus limiting the outflow to the
room.
The analyses of internal flooding have been made for all buildings of safety
interest. The different buildings have been designed with discharge openings and
runoff ways to meet the water loads from potential internal flooding and
satisfy safety requirements.

6. Insights from Probabilistic Studies

During the last few years a lot of effort has been directed towards performing
Probabilistic Safety Assessment (PSA) for nuclear power plants. The Swedish
Nuclear Power Inspectorate (SKI) now requires PSAs to be made recurrently
(every 10th year) as a check of the safety level of operating nuclear power
plants. ABB ATOM has made essential contributions to this effort by partici-
pating as the main contractor or in cooperation with the utility in question in PSAs
for the following plants (the year of publication of the final report is indicated

within parentheses):

Forsmark 3 (1977)
Ringhals 1 (1983)
Barsebäck 1 (1985)
Forsmark 3 (1985)
Oskarshamn 3 (1986).

In this chapter some insights from probabilistic studies concerning the
efficiency of design-related defensive measures against dependent failures are
summarized. A more detailed review has been given in ref. 6.

6.1. PROBABILISTIC SAFETY ASSESSMENT FOR A FOUR-DIVISIONAL PLANT

Recently, ABB ATOM performed a PSA for a four-divisional plant (ref. 7). A
characteristic feature of this study is a thorough treatment of dependencies,
including a detailed modelling of intercomponent CCF-contributions. Three
types of dependencies have been considered:

- Common Cause Initiators (CCIs) of transient character
- Intersystem dependencies
- Intercomponent dependencies.

The last two groups were in turn divided into functional,
shared-equipment, physical interaction and human interaction dependencies.
The dependencies not explicitly included in the plant model (small event
tree/large fault tree approach) were incorporated as residual CCF-contribu-
tions. Of particular interest are the analyses of CCIs and CCFs.
The analysis of plant-specific CCIs was limited to events of transient
character. External events and internal events causing severe environmental
stresses were outside the scope of the PSA. Naturally, the choice of such
boundary conditions for the study was motivated by the specific design features
of the plant, minimizing the impact of this type of events. The analysis focused
on functions which may influence both normally operating systems and
stand-by safety systems, i.e.:

- reactor water level measurement
- electric power supply
- other support systems (pressurized gas systems and secondary cooling systems).

In all cases analysed, the potential CCIs are already either covered by
generic transient categories or do not contribute significantly to the core melt
frequency. This is a natural result of the consistent separation of redundant
safety equipment, both from the physical and functional point of view.
An issue related to CCIs is that of physical interaction dependencies created by
dynamic effects such as pipe whips, jet impingement, secondary missiles and
pool-dynamic loads, which may follow upon a pipe break within the reactor
containment. As a part of the PSA the unavailability contributions from

dynamic effects, for systems mitigating the consequences of internal pipe
breaks, have been assessed. In some cases significant contributions have been
obtained for the pressure relief system, the auxiliary feedwater system and the
emergency core cooling system. However, it was demonstrated that dynamic
effects have a small impact on the estimated frequencies of accident sequences.
The residual CCF-contributions have been quantified for active components
within safety systems. Second-order contributions from passive components,
diversified equipment and intersystem-intercomponent residual CCFs, have
been neglected. A characteristic feature of the analysis is consideration of all
possible failure multiplicities (i.e. double, triple and quadruple failures) and
their combinations.
It has been shown in supplementary deterministic analyses that for the
majority of initiating events, e.g. all identified transients, only "one-out-of-
four" trains in the safety systems will be needed. Thus, the actual capacity of
safety systems corresponds in such situations to 4x100%. This leads to very
favourable conditions from a probabilistic point of view - a quadruple failure
must occur to disable a safety system. Only for the most limiting accident
sequences (some of the loss of coolant accidents) does the "two-out-of-four"
success criterion apply. As shown in the PSA, the LOCA sequences, sensitive to triple
CCFs, give very small contributions to the estimated core melt frequency.
Another important finding is that postulated systematic misconfiguration of
redundant components gives insignificant contributions. This is due to the high level
of redundancy, separation, staggered testing of redundant trains, automatic
restoration of components to their original position after testing and favourable
conditions for recovery.
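
The effect described above can be illustrated with a small numerical sketch. The per-train unavailability q and the residual common cause fraction below are assumed illustrative values, not figures from the Forsmark 3 PSA; the calculation merely shows why, for a 4x100% system, the residual CCF term rather than the coincidence of independent failures governs the result.

q = 1.0e-2        # assumed per-train unavailability on demand (illustrative)
beta = 0.05       # assumed fraction of q ascribed to common cause (illustrative)

q_ind = (1.0 - beta) * q     # independent part of the per-train unavailability
q_ccf = beta * q             # residual common cause part (all four trains together)

# "one-out-of-four" success criterion: the system is lost only if all four
# trains fail, either independently or through the common cause term
u_independent_only = q_ind ** 4
u_with_ccf = u_independent_only + q_ccf

print(f"independent coincidence only: {u_independent_only:.1e}")
print(f"including residual CCF term:  {u_with_ccf:.1e}")

With these numbers the quadruple independent coincidence is of the order of 10⁻⁸, while the residual CCF term is of the order of 10⁻⁴, which is why the quadruple common cause failure dominates the estimate.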

6.2. DESIGN-ORIENTED COMMON CAUSE FAILURE DATA ANALYSIS


The lack of reliable data and other limitations of available data sources stand
out as the weakest link in the current state of CCF-analysis. In view of these
uncertainties and sometimes misleading interpretations of the data, ABB ATOM
has carried out several investigations of this issue. The work has been supported
by the Swedish Nuclear Power Inspectorate.
By far the most information is available on diesel generators (DGs), since the
systems are usually redundant, have relatively high failure rates and are
frequently tested.
Multiple DG-failures, representing the U.S.-experience and reported in the
EPRI-study (ref. 8), have been analysed with the aim of answering the following
question:
Is it possible to eliminate any of the failures by improving physical
separation and by increasing the level of redundancy (as applied in ABB ATOM's
four-divisional plants)?
The failures reported in the EPRI-study (ref. 8) break down into:

- 237 single failures
- 15 double failures (of which 12 are dependent), and
- 2 triple failures.

The analysis performed shows that, of the 14 CCFs in total:

- Three CCFs could be eliminated if the four-divisional design (separation)
  had been applied; this includes a triple failure at Salem 1
- One CCF would not occur in BWR 75 due to a different design of a special
  system
- Three CCFs would have an extremely low probability of occurrence due to
  improved separation (in two cases) and different design (in one case)
- Six CCFs (with two diesels involved) cannot be excluded with certainty;
  however, two of them have a negligible probability of striking four diesels
  simultaneously
- One CCF has an unknown cause (triple failure at Brunswick 2).

A conclusion may be drawn that a significant number of the reported CCFs can
be eliminated by improved separation. Increased redundancy will in some cases
mitigate the effects of CCFs on system operation, especially where the influence of
the environment on the components is concerned.
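
Purely to illustrate the arithmetic behind such event counts, a crude event-based beta factor can be formed from the figures quoted above. The counting convention used here (component failures occurring in dependent events divided by all component failures) is only one of several possible and, as stressed in the data survey discussed below, different conventions lead to different values.

singles = 237
dependent_doubles = 12      # of the 15 double failures, 12 were judged dependent
independent_doubles = 3
triples = 2

# component failures occurring in dependent (CCF) events
ccf_component_failures = 2 * dependent_doubles + 3 * triples
# all diesel generator component failures in the data set
all_component_failures = (singles
                          + 2 * (dependent_doubles + independent_doubles)
                          + 3 * triples)

beta = ccf_component_failures / all_component_failures
print(f"event-based beta estimate: {beta:.2f}")   # about 0.11 with this convention
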
The analysis of DG-systems at 12 Finnish and Swedish nuclear power plants
(including eight plants with 4 DGs/plant) shows (ref. 9) that until the end of
1981 only two CCFs have occurred. Both of them involved two DGs at a plant
with four DGs. Also this result indicates that the probability of quadruple
failures, which are of main concern in the context of four-divisional design, is
very low.
As mentioned before the four-divisional separation is especially effective as
a defence measure against low probability events with great damage potential
(tornado, hurricane, lightning, flood, plane crash, meteorite, fire, explosion,
sabotage, etc.). This aspect is seldom reflected by available data. The benefits
of separation seem more obvious today than at the time when the original
design was made. In the PSAs made for plants with less emphasis on separation
between redundancies, considerable contributions to the frequency of severe
core damages originate from external events. In many cases it is very difficult
to give accurate predictions of low probability events like core damages caused
by external phenomena. The probabilistic tools are no longer reliable. It is
definitely preferable to have a design which is relatively insensitive to this type
of accident initiators.
A survey of available data sources for diesel generators (ref. 10) clearly
demonstrates the difficulties associated with absolute predictions of CCF-
probabilities and the danger of misusing such information. Some observations,
important to bear in mind when using these numbers, have been made:

1) Although the DG-studies basically use the same source material (failure
reports), significant discrepancies exist between the results of different
studies.
2) Frequently the information available from simple and interconnected
systems is extrapolated to systems with good separation and with an
increased level of redundancy. This extrapolation is not always carried out
with due concern to design and application differences, thus resulting in
pessimistic predictions that do not reflect actual design improvements.
3) The estimated multiple failure probabilities are subject to large
uncertainties due to plant-to-plant variation, different interpretation of
the same source information and use of different quantitative methods.

A comparison of different methods for CCF quantification has also shown

that application of primitive models to systems with high level of redundancy
may result in excessive conservatism (ref. 11).

7. Conclusions
The PSAs performed for four-divisional plants have demonstrated that the
design philosophy characterised by physical and functional separation of
redundant equipment constitutes an efficient defence measure against critical
dependent failures. This statement is valid even in view of uncertainties
associated with estimation of residual common cause failure contributions. In
this context it must be emphasized that improper extrapolation of information
available from simple and interconnected systems to systems with complete
separation and a high level of redundancy results in excessively pessimistic
predictions and does not reflect actual design improvements.
Apart from the inherent defensive measures against dependent events there
are other strong reasons in support of the four-divisional design:

1) The possibility of applying more flexible maintenance and repair strategies
during power operation. Specifically, the N-2 criterion is fulfilled and a
high system function availability is maintained, even if relatively long
subsystem outage times are permitted (ref. 12).
2) The prospects of speeding up licensing procedures due to the absence of
complicated links, interactions and interconnections between safety
related functions.
3) The simplicity, symmetry and avoidance of complex structures are also
favourable from the availability and maintainability point of view (ref. 13),
thus giving the owner a basis for good operating economy with low
radiation exposures to the crew and maintenance personnel, as well as low
releases to the environment.

8. References
1. Hellström, B., Lönnerberg, B., Tirén, I., 'The ASEA-ATOM Advanced BWR
is a Proven Design. Key Features of BWR 90'. Atomwirtschaft 8/9,
August/September 1987.
2. Helander, L-I., Tirén, L.I., 'Nuclear Power Plant Safety. The Merits of
Separation'. International Conference on Nuclear Power and its Fuel Cycle,
Salzburg, Austria, May 2-3, 1977.
3. Rolandson, S., 'Practical Defences Against Dependent Failures'. Workshop
on Dependent Failure Analysis, Västerås, Sweden, April 27-28, 1983,
Swedish Nuclear Power Inspectorate and AB ASEA-ATOM.
4. Ericsson, G., Lilja, T., 'ATWS in a BWR with Alternate Rod Insertion
Function - a Probabilistic Analysis'. International ANS/ENS Topical
Meeting on Probabilistic Risk Assessment, Port Chester, New York, USA,
September 20-24, 1981.

5. Hirschberg, S., Tirén, L.I., 'Review of ASEA-ATOM Activities in the Area
of Abnormal Events, Risk Analysis and Reliability Engineering'.
Risø/UKAEA/ASEA-ATOM Seminar, Winfrith, England, November 19-20,
1985.
6. Hirschberg, S., Knochenhauer, M., 'Advantages of the Four-Divisional
Design'. IAEA International Conference on Nuclear Power Performance
and Safety, Vienna, September 28 - October 2, 1987.
7. 'Forsmark 3 Probabilistic Safety Study' (in Swedish). AB ASEA-ATOM,
February 1985.
8. McClymont, A., McLagan, G., 'Diesel Generator Reliability: Data and
Preliminary Analysis'. Interim Report EPRI NP-2433, Electric Power
Research Institute, 1982.
9. Pulkkinen, U., et al., 'Reliability of Diesel Generators in the Finnish and
Swedish Nuclear Power Plants'. Electrical Engineering Laboratory,
Research Report 7/82, Technical Research Centre of Finland, June 1982.
10. Hirschberg, S., Pulkkinen, U., 'Common Cause Failure Data: Experience
from Diesel Generator Studies'. Nuclear Safety 26, 3 (1985), 305.
11. Hirschberg, S., 'Comparison of Methods for Quantitative Analysis of
Common Cause Failures - a Case Study'. International ANS/ENS Topical
Meeting on Probabilistic Safety Methods and Applications, San Francisco,
California, USA, February 24 - March 1, 1985.
12. Knochenhauer, M., Enqvist, ., 'Using PSA Models for Planning and
Evaluation of Preventive Maintenance during Power Operation'.
CSNI/UNIPEDE Specialist Meeting on Improving Technical Specifications
for Nuclear Power Plants, Madrid, Spain, September 711, 1987.
13. Leine, L., 'Design for Maintainability'. IAEA International Symposium on
Nuclear Power Plant Outage Experience, Karlsruhe, West Germany, June
18-22, 1984.

DESIGN DEFENCES AGAINST COMMON CAUSE/MULTIPLE RELATED FAILURE

P. Dörre                          R. Schilling
SIEMENS AG (KWU)                  SIEMENS AG (KWU)
P.O. Box 101063                   P.O. Box 3220
D-6050 Offenbach                  D-8520 Erlangen
FR Germany                        FR Germany

ABSTRACT. The system design principles to ensure high safeguard system
reliability in German 1300 MW KWU nuclear power plants are briefly
described and illustrated. The special importance of these principles
for preventing Common Cause Failure (CCF)/Multiple Related Failure
(MRF) is underlined.

1. DESIGN PRINCIPLES


To ensure high reliability of the engineered safeguards, the following
design principles are adhered to as consistently as possible (KWU (1)).

1.1. Redundancy

Random single failures are contained by applying the redundancy
principle. Redundancy implies multiplicity, i. e. important components
and systems are installed in greater numbers than necessary.
In the considerations behind redundancy it is postulated that
- one subsystem fails because of one failure (single failure),
- one further subsystem is simultaneously down for maintenance or repair
  of a component,
- the remaining subsystem(s) must be 100% capable of dealing with off-
  normal conditions.
It follows that the overall system must be of at least 3 x 100%
design. With the loop design concept used for the 1300 MW reactor,
subdivision of the safeguard systems into 4 x 50% trains is more
suitable.

Example. The residual heat removal (RHR) system, which is part of the
emergency core cooling (ECC) system, is of 2-out-of-4 design (with
respect to conservative success criteria). This means that if at least 2
of the 4 subsystems provided function when required, the ECCS is still
capable of fulfilling its safety function.
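
A minimal sketch of the arithmetic behind this success criterion is given below. The per-subsystem unavailability q is an assumed illustrative figure, not a KWU value, and the calculation deliberately considers independent failures only; a realistic assessment would add a common cause contribution, which is the reason for the further design principles described in the following sections.

from math import comb

q = 0.02       # assumed unavailability of one RHR subsystem (illustrative only)
n, k = 4, 2    # 2-out-of-4 success criterion

# the ECCS fails its function if fewer than k of the n subsystems are
# available, i.e. if at least n - k + 1 = 3 subsystems fail on demand
p_fail = sum(comb(n, j) * q**j * (1.0 - q)**(n - j)
             for j in range(n - k + 1, n + 1))
print(f"failure probability, independent failures only: {p_fail:.1e}")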

1.2. Diversity

Common cause failures such as design or manufacturing errors are preven-
ted, e. g. in particular areas of the reactor protection system (RPS), by
application of the diversity principle. In the most general case diver-
sity implies the use of different modes. Thus diverse actuation criteria
(limit values of different physical parameters) are evaluated for the
initiation of reactor trip in the event of off-normal conditions.

Example. An increase in reactor power, which is initially indicated by an
increase in neutron flux, gives rise to an increase in coolant tempera-
ture and coolant pressure and, because of the thermal expansion of the
coolant, causes the water level in the reactor coolant system to rise;
these changing properties represent diverse potential trip criteria.

1.3. Physical Separation and Structural Protection

To protect against faults that tend to spread to adjacent systems,
redundant subsystems are physically separated from each other. Physical
separation extends not only to first-in-line systems, but also to
the actuation signals (RPS), the emergency power bus bars, and the
trains of safety-related auxiliary systems (e. g. component cooling,
ventilation).
Where physically separate installation of redundant systems is not
possible or inexpedient, appropriate structural protection is provided.

1.4. Fail-Safe Principle


In certain cases, application of the fail-safe principle affords added
protection ("fail-safe": a failure results in a safe action). Where
possible, safeguard systems are designed such that faults in the systems
or failure of their power supply trigger actions that restore the plant
to a safe condition.

Example. The control rod assemblies are gripped by means of


electromagnets. Should their power supply fail, the gripper coils cease
to function and the control rod assemblies drop into the reactor by
force of gravity and cause a trip.

1.5. Automation

When actions designed to counteract off-normal events have to be
triggered quickly, no reliance is placed on the attentiveness and
correct decision-making of the operating crew. In order to prevent
incorrect decisions in the first few minutes following the onset of an
incident, essential safety functions operate fully automatically for at
least 30 minutes after onset; manual actions are overridden during this
time interval.

Some other features listed below do not refer to the hardware component
or system design, but to the (design of the) context or environment in
which these components operate. It is well known that the failure
behaviour of a component is not only a matter of its hardware design,
but also depends on other conditions like the quality of preventive
maintenance, efficient and easy fault detection, efficient test and
operation procedures etc.

1.6. Pre-Operational Testing

Commissioning with pre-operational testing is understood as an intensive
"debugging" phase, in which unexpected failure behaviour can at least be
detected for components/items which are installed in large quantities.
On the system level, latent initial design or construction errors can be
identified and eliminated by system demands under increasingly realistic
conditions.

1.7. Staggered Testing during Plant Operation

When staggered testing is applied, the intervals between the tests of
two like components are shortened. Severe failure of a whole higher-
redundant system can already be detected by only two subsequent
subsystem tests, the results of which will reveal a common cause failure
condition if present, thus reducing the potential system downtime to
only a small fraction of the system's test interval.
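
The benefit can be made concrete with a small, hedged example. The test interval T and the number of redundancies n below are assumed values, chosen to match the practice mentioned for the emergency feed system later in this paper (one subsystem tested per week in a four-train system); the calculation is not taken from a KWU analysis.

T = 4.0   # weeks between successive tests of the same subsystem (assumed)
n = 4     # number of redundant subsystems (assumed)

# If all subsystems were tested together, a latent common cause failure
# arising at a random time would stay unrevealed for T/2 on average.
mean_exposure_simultaneous = T / 2.0

# With staggered testing one subsystem is tested every T/n, so the first
# failed subsystem test follows the failure after only T/(2n) on average;
# the next subsystem test, T/n later, already indicates whether the
# failure condition is common to the redundancies.
mean_exposure_staggered = T / (2.0 * n)

print(mean_exposure_simultaneous, mean_exposure_staggered)   # 2.0 vs 0.5 weeks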

1.8. Diversity in Personnel

Different redundancies of the same system will usually be tested and
maintained by different shift personnel, according to the shift schedule
(this is the responsibility of the NPP utilities).

2. EXAMPLES

2.1. Emergency Feed (EF) System

2.1.1. General features and boundary conditions. The features of the EF
system are discussed for the Grohnde design, which was studied in detail
during the Common Cause Failure Reliability Benchmark Exercise (CCF-RBE)
of the European Community (Poucet et al (2)). NPP Grohnde is one of the
latest German 1300 MW 4-loop pressurized water reactors constructed by
KWU. It started commercial operation in February 1985.
The postulated initiating event for the CCF-RBE study case was the
emergency power mode, which is
- initiated by coincident loss of both offsite (main grid and backup
grid) and onsite power,
- followed by reactor scram, turbine trip-off, and
- the starting of all 4 emergency power diesels (grid D1).
For definiteness it was assumed that the event occurs at 100%
power operation and lasts for 2 hours. During this time a hot standby

state must be maintained with the help of both the steam relief function
and the steam generator feed function. The latter function is
established by 2 diverse systems (see Fig. 1):
- the startup and shutdown system with 2x100% pumps, connected to 2 of
the 4 emergency power bus bars (grid D1),
- the emergency feed (EF) system with 4x100% pumps.

Fig. 1 PWR 1300 MW: Possible Modes of Steam Generator Feed (Survey)

In the CCF-RBE analysis the steam generator feed provided by the
EF system was investigated in detail. Each EF train is essentially made
up of (see Fig. 2 for RS10, i. e. train 1)
- a demineralized water pool,
- EF pump (RS12 D001) with diesel drive and emergency power generator
(GY50, train 1 of grid D2),
- cooling (RS17) and ventilation (UV31) equipment,
- connecting piping and valves.

2.1.2. Properties of the EF system with respect to CCF. With respect to
subjection to or protection against CCF, the following properties have
to be taken into consideration:
- no shared hardware components (avoidance of incomplete redundancy),
- no hardware diversity (standardized equipment, i. e. the same
component type is installed for "homologous" components in all 4
subsystems),
- strict spatial separation and physical barriers which restrict events
like fire, internal explosion, flood, abnormal temperature to one
redundancy (see Fig. 3),
- extra building, protected against all external events,

- same routine functional test procedure for each subsystem,
- staggered testing (one subsystem per week),
- time trends in operational behaviour of some important components
  during tests are recorded and evaluated.

Fig. 2 NPP Grohnde: Emergency Feed System - 1 Train (Overview)

Fig. 3 PWR: Emergency Feed Building (section and plan view; legend: 1 emergency
feed pump, 2 diesel set, 3 oil tank, 4 batteries, 5 ventilation, 6 water storage
tank, 7 cable and pipe ducts, 8 switchgear)

2.2. Reactor Protection (RP) System Design Requirements

According to the set of rules which govern the design of the reactor
protection (RP) system (KTA 3501), it is required that the RP system has
to be designed, installed, and operated such that failure-causing events
within and outside the reactor as well as within the RP system itself
cannot prevent the initiation of necessary protective actions.
Generally 4 types of component unavailability are accounted for in
the design stage:
- systematic failure
- random failure,
- command/cascade failure,
- unavailability due to repair.
The following potential failure-causing events have to be
considered within the RP system itself:
- random failures of components (modules) of the RP system, caused e. g.
by short circuit, interrupt, short to ground, change in voltage or
frequency, mechanical failure, fire;
- systematic failures like several simultaneous (or sequential in short
time) failures in subsystems of the RP system, originating from a
common cause in the RP system itself, e. g. manufacturing or design
error, drift.
With respect to systematic failures external to the RP system (but
internal to the plant), e. g. the following events have to be taken into
consideration:

- wrong operation or maintenance, water, fire;
- mechanical impacts from pipes, valves and vessels.

With respect to systematic failures external to the plant, e. g.
the following events are considered:

- flood, lightning and storm.

3. CONCLUSIONS

The goal for the design of a reliable system is to achieve
independence of its redundant subsystems. Multiplicity with like
components alone is no protection against common cause failure (CCF):
other appropriate measures have to be applied in addition. Diversity is
one of these measures; other measures can be applied instead or in
addition, according to the necessary reliability level.

References
1. KWU: Pressurized Water Reactor, Order No. K/10567-101, January 1982
2. Poucet, A., Amendola, A., and Cacciabue, P. C., "CCF-RBE: Common
Cause Failure Reliability Benchmark Exercise", EUR 11054 EN, 1987

MEASURES TAKEN AT DESIGN LEVEL TO COUNTER COMMON CAUSE FAILURES.
A FEW COMMENTS CONCERNING THE APPROACH OF EDF.

T. MESLIN
EDF/SPT
3, rue de Messine
75384 PARIS Cedex 08
FRANCE

CONTENTS

1. GENERAL COMMENTS

2. ILLUSTRATIONS OF THE EDF APPROACH

2.1. 1300 MWe PWR unit control and instrumentation systems

2.2. Engineered safety system architecture

2.2.1. Auxiliary feedwater

2.2.2. Safety injection

2.3. Post-accident operation

2.3.1. Organization of teams


2.3.2. Computerized aid in post-accident operation

2.4. Arrangements concerning accidents at design limits

3. COMMON CAUSE FAILURES - SUMMARY

4. CONCLUSION

1. GENERAL COMMENTS

To achieve a high level of safety, the design of French nuclear power
plants embodies conventional principles intended to avoid or minimize
the effects of common cause failures:

- redundancy,
- diversity,
- the fail-safe principle,
- channel separation,
- maintenance policy,
- automation.

The approach adopted by EDF has a number of particularities resulting
from the presence of a large number of identical units:

- the search for an optimum between redundancy and diversity,
- the application of redundancy and diversity principles in post-
incident operation arrangements,
- the development of a coherent approach to counter a loss of redun-
dant systems.

These arrangements, which are designed to provide higher safety
for optimal cost, are in certain cases accompanied by, or analysed in
the light of, probabilistic studies.
Subsequently, the different aspects of this procedure are illus-
trated by means of examples relating to both the design of the systems
and the operation of the installations in an accident situation.

2. ILLUSTRATIONS OF THE EDF APPROACH

2.1. 1300 MWe PWR unit control and instrumentation systems

The 1300 MWe PWR unit control and instrumentation consists entirely of
electronic components and microprocessors.

It essentially consists of two systems:

- the conventional control and instrumentation system (Controbloc),
- the reactor protection system (SPIN).

These two systems are completely independent. They have different
architectures and are from different vendors.
Controbloc performs all the conventional relay functions. Although
not necessarily or directly safety-related, the system has redundant
architecture. Each module of the Controbloc control and instrumentation
system (rack) consists of two buses which can both perform all func-
tions of the rack. A self-diagnostic system monitors the behaviour of
the two buses and facilitates maintenance.

SPIN, the reactor protection system, provides the scram function
and manages safeguard-related actions. It has four independent chan-
nels. It embodies the fail-safe and separation principles (physical,
geographical, etc.).
The two systems are the subject of thorough experience feedback
processing. The details of each electronic card fault are entered in
a record at the site which is processed by computer at headquarters
with the assistance of the vendors. With this procedure, it is already
possible to calculate numerous parameters relating to the electronic
components, but for the time being, no design related common cause
failure has been found in this new equipment.

2.2. Engineered safety system architecture

2.2.1. Auxiliary feedwater

The auxiliary feedwater system of the 1300 MWe units is an illustra-
tion of diversity in addition to redundancy. It consists of two iden-
tical streams, each of which itself comprises two trains: one equip-
ped with a motor-driven pump on a protected power supply and the other
with a turbine-driven pump.
This system has greatly benefited from feedback of experience
from 900 MWe units and is extremely reliable. In the event of an inci-
dent, it independently receives commands from the SPIN and Controbloc
systems (see preceding section), which provides the diversity of the
commands by the control and instrumentation.
Finally, it can be directly activated from the control room by
manual commands or even locally in the event of total loss of the elec-
trical power supplies, for example (accident beyond design limit).

2.2.2. Safety injection

The design of the safety injection system of the 1300 MWe units has
been the subject of a probabilistic analysis, on completion of which
preference was given to a conventional two-stream arrangement rather
than a system with threefold redundancy. This was because multiplying
the number of identical streams does not appear to be an adequate mea-
sure against the risk of common cause failures. On the other hand, the
chosen safety injection arrangement provides diversity resulting from
the following three functions :

- low-pressure safety injection function,
- medium-pressure safety injection function,
- high-pressure automatic boration function.

2.3. Post-accident operation

2.3.1. Organization of teams

In France, the shift team is assisted by a safety and radioprotection
engineer who is permanently present at the site.

This engineer is called to the control room as soon as an event occurs
which could compromise the safety of the installation. The operators
are in charge of actual running of the unit and apply the appropriate
procedures.
The safety and radioprotection engineer takes responsibility for
the operation of the installations in accident situations or in any
events which are not covered by the procedures. In such cases he ap-
plies the "state-based" procedure SPI, and his intervention provides a
redundant human analysis capability based on a method and resources
which are independent of the shift team.
This setup is supplemented, if the situation requires, by the
creation of independent policy making and analysis teams at local and
national levels.

2.3.2. Computerized aid in post-accident operation

To complete the post-accident operating arrangements, it is necessary
to aid the operators by placing validated, hierarchical and synthesized
data at their disposal. This is the role of the safety panel, which in-
cludes functions of aid in diagnostics, surveillance of engineered sa-
fety systems, aid in safety injection management and an overall faci-
lity for monitoring safety parameters.
Data acquisition is carried out by two independent computers ope-
rating in a duty/standby configuration. The processing of core surveil-
lance parameters is thus performed by two completely independent compu-
ters; finally, solving of the logic equations contained in the proces-
sing software is performed by two redundant algorithms. The availabi-
lity factor of the system exceeds 99.95 %.
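
A back-of-the-envelope check of the order of magnitude of such a figure can be made under simple assumptions; the failure, repair and switchover parameters below are illustrative values, not EDF data.

mtbf_hours = 2000.0     # assumed mean time between failures of one computer
mttr_hours = 8.0        # assumed mean time to repair
p_switchover = 0.99     # assumed probability that the standby takes over cleanly

a_single = mtbf_hours / (mtbf_hours + mttr_hours)
u_single = 1.0 - a_single

# the pair is unavailable if both computers are down simultaneously,
# or if the duty computer fails and the switchover itself fails
u_pair = u_single**2 + (1.0 - p_switchover) * u_single

print(f"single computer availability: {a_single:.4f}")
print(f"duty/standby pair (approx.):  {1.0 - u_pair:.5f}")

With these assumptions the pair availability comes out at roughly 99.99 %, i.e. of the same order as the figure quoted above.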

2.4. Arrangements concerning accidents at design limits

The French safety authorities have required EDF to study the situations
created by the total loss of the redundant systems, in particular the
electrical power supplies, the steam generator feedwater and the heat
sink.
As a result, EDF has developed supplementary means of countering
such accidents, based on suitable procedures and diversified comple-
mentary devices.
For example, as concerns loss of the electrical power supplies,
each unit is equipped with a turbine generator which can re-supply in-
jection at the primary pump seals and the vital control and instrumen-
tation components; in addition, each unit can be re-supplied within a very
short time by a site gas turbine with the same capacity as a
diesel generator.
In the presence of the supplementary means, and with the corres-
ponding implementation instructions, it has been demonstrated that
none of the design limit situations allowed for represents a core dry-
out risk which exceeds 10⁻⁷ per reactor-year.

3. COMMON CAUSE FAILURES - SUMMARY

The study of operating experience feedback has demonstrated the
pertinence of the measures taken at design level and has on occasion
served to correct faults passed over in analysis.
It would thus appear that 20 to 25 % at the most of the common
cause failures observed in the EDF 900 and 1300 MWe reactors are ascri-
bable to design errors. In addition, these faults are rarely observed
in the systems directly affecting safety, which confirms the options
taken up during the design work.
But in addition to the small proportion of design faults in the
different common cause failures observed, it must be remembered that
design faults are generally rapidly identified due to the large number
of identical reactors and the procedures for integrating experience
feedback. Similarly, the solutions which are developed are implemented
in all the units involved. This is a major benefit of the high degree
of standardization of the EDF reactors.
Unlike design faults, common cause failures linked to operating
and maintenance operations are more worrying insofar as they are more
varied, appear at random and rarely allow solutions applicable to
all the sites to be developed rapidly. However, they form the basis
of special actions to ensure that all operators are aware of the si-
tuation.
In this context it is necessary to maintain the highest possible
level of safety in operation by promoting a veritable safety culture
in the maintenance and operating teams.

4. CONCLUSION

The EDF approach to safety regarding common cause failures reflects a
high regard for effectiveness and homogeneity. In the field of design,
it draws upon the conventional principles of defense against common
cause failures by combining redundancy and diversity. This approach is
supplemented by allowance for design limit situations (loss of redun-
dant systems) for which special procedures and equipment have been de-
veloped. The effects of these arrangements have been checked with pro-
babilistic criteria.
Experience feedback results show that the basic design options
taken up are justified and that all efforts to counter common cause
failures must be concentrated on maintaining a high level of safety in
operation. It is for this reason that the inculcation of operational
quality and the development of a spirit of safety are fundamental ob-
jectives of EDF.

ANALYSIS PROCEDURES FOR IDENTIFICATION OF MRF's

Humphreys
National Centre of Systems Reliability
UKAEA
Wigshaw Lane
Culcheth
Warrington
WA3 4NE

ABSTRACT. The concept of multiple related failures (MRF's), otherwise
known as Dependent Failures, is restated.
The need to protect against MRF's is considered and the problems
of identification of MRF's are addressed.
Advice is given on the analysis procedures adopted in the UK for
the identification of MRF's.

1. SCOPE OF THE LECTURE

This paper discusses the UK approach to the detection of Multiple
Related Failures within systems, by the analysis of the system in terms
of its specification, design, construction, operation and maintenance.
The objective of the assessment processes is to detect potential
dependencies before they arise during plant operation and to provide an
indication of which dependencies need to be eradicated in order for the
plant to function safely. Throughout the lecture notes, multiple related
failures will be referred to as dependent failures and the following
set of definitions will apply:

Dependent Failure (DF): The failure of a set of events, the
probability of which cannot be expressed as the simple product of
the unconditional failure probabilities of the individual events.

Included in this category are:

Common Cause Failure (CCF): This is a specific type of dependent
failure where simultaneous (or near simultaneous) multiple
failures result from a single shared cause.

Common Mode Failures (CMF): This term is reserved for
common cause failures in which multiple equipment items fail in
the same mode.

Cascade Failures (CF): These are propagating failures.
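
The practical import of the Dependent Failure definition can be shown with a purely illustrative pair of numbers: when a shared cause couples two items, their joint failure probability can exceed the product of their unconditional probabilities by orders of magnitude.

p_a = 1.0e-3          # unconditional failure probability of item A (illustrative)
p_b = 1.0e-3          # unconditional failure probability of item B (illustrative)
p_b_given_a = 0.1     # assumed conditional probability of B failing, given a shared cause with A

p_joint_if_independent = p_a * p_b          # 1.0e-6
p_joint_dependent = p_a * p_b_given_a       # 1.0e-4

print(p_joint_if_independent, p_joint_dependent)

It is this gap, rather than the coincidence of independent failures, that dependent failure assessment has to capture.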

2. BENEFITS OF ASSESSMENT

The benefits of dependent failures assessment are two-fold:


First: the assessment process provides a means of identifying
potential hazards to plant, operating staff and public.
Secondly: the process provides a mechanism for quantifying the
risk arising due to the presence of dependencies in system
design.
The first stage is one of qualitative analysis, giving an
indication of the effectiveness of applied defences and highlighting
the requirements for the application of further defences. The second
stage, quantification of risk, provides an indicator as to the relative
importance of each potential dependency detected by the qualitative
analysis. Using both the qualitative and quantitative information, the
analyst/utility operator can decide which, if any, of the potential
dependencies need to be eliminated, eg by design changes or
administrative controls. The quantification of risk arising from
dependencies, can also form part of the overall Probabilistic Safety
Assessment of the plant under review.

3. THE TOOLS AND TECHNIQUES REQUIRED FOR THE ANALYSIS OF DEPENDENT
FAILURES

In order that a DF analysis may be performed, some structure needs to
be developed to provide a strategy for analysis. Within the structure,
certain support tools need to be provided:
- a classification scheme is an essential element if a
  structure is to be developed for DF analysis. In practice several
  forms of classification are required:
  - root cause of a dependent event
  - event classification
  - defensive structure classification
- modelling techniques must be available to quantify the system
  failure probabilities
- data must be available to support the modelling processes
- computer support
- a process for qualitative engineering analysis needs to be
  defined.

3.1 Classification Schemes

There are a number of interrelated schemes. Perhaps the most important
scheme is the defensive structure classification, since this provides
the assessor with a means of analysing a system against a set of
defined requirements.
The defensive classification scheme of references 1 and 2 has been
previously described. It can be used by the analyst to determine the

presence or absence of defences against each category of the scheme,
and by that means performs a qualitative analysis of the plant.
An event classification scheme provides a means of organising data
on DF events into a structured form. One such scheme is that
originally developed by LATA (3).
The LATA classification scheme is much more than a classification
by cause. The scheme proposed in (1)(2) is only a causal
classification. However, there would appear to be no great difficulty
in cross-referencing these classifications of causal mechanisms.
The other information, besides that on causes, stored in the LATA
scheme makes it applicable to modelling requirements.
The heart of the scheme is the universe of cause-effect logic
units shown in Figure 1. Classification of events at this fundamental
level is a significant step forward in clearing the mists of ambiguity
that shroud the topic of dependent failures. It is now possible on
this basis to ensure a comparison of like with like. Furthermore, the
analyst can state, in terms of pinpointing certain cause-effect logic
units, exactly the type of event(s) included in the computation of
numerators for beta factors etc. This should help to clarify the
situation in which different analysts can product different beta
factors from the same raw event data simply because different types of
event were considered candidates for the numerator in each case. For
example, the cleavage between root-caused events and component-caused
events is an essential distinction from the point of view of modelling.
Component-caused events are functional dependencies which ought to be
explicitly modelled in the structure itself of a properly constructed
fault tree. The parametric analysis (eg, beta factor) is designed to
mop up the root-caused events. thus while it is important to recognise
all the types in order to distinguish carefully between them, selection
must be employed in the use of this information for parametric
modelling. Therefore an analyst must state clearly that events of
types corresponding to eg, units 5, 6 and 7 of the cause effect
universe have been used in the computation of beta.
The classification also makes important distinctions between
events involving failures and events involving functional
unavailabilities, and also between actual and potential failure events.
The originators of this scheme caution against the abuse of the
potential failure category; it must not be used for all manner of
hypothetical or speculative events, but reserved for a more limited
category of events eg, incipient failures. Within a subjective
assessment some weighting can be given to potential failures if the
presence of a mechanism capable of causing multiple failures is judged
to be present. With suitable discretion on the part of the analyst,
this can help alleviate the problem of data scarcity. The
classification makes no distinction between multiple failures in a
single channel and failures in different (redundant) channels. Again
this is useful in terms of increasing the database. The analyst needs
however to exercise care from the point of view of defences. A
dependency between, say, valves in a single channel could in principle
apply to valves in different channels unless additional defences have

been utilised between channels. If the latter were the case, the event
data would not strictly be applicable.

Figure 1. The Universe of Cause-Effect Units (classification of events as root-caused or component-caused, as failures or functional unavailabilities, and as actual or potential)
An example of the worksheet associated with the scheme is shown in
Figure 2.
A root cause classification scheme is effectively a definition of
the structure of the cause code scheme of column 3 of Figure 2. One of the
most comprehensive schemes used today is that developed with support
from the United States Nuclear Regulatory Commission, and reported in
detail in ref 4. The scope of that scheme is too extensive to
report adequately here, and the reader is advised to read the source
document, ref 4.
We would conclude that although the classification schemes contain
imperfections and limitations, it is preferable to attempt to classify
DF data in accordance with a structured scheme, from which some attempt
can be made to derive parameter estimates for dependent failure
modelling.

3.2 Modelling Techniques

The modelling of dependencies is desirable if quantification of their
effect on system reliability is to be achieved. There are many models
reported in the literature and some attempts have been made to evaluate
the merits of certain models. In (1), the following sub-system CMF models
were discussed:
- β factor model
- Boundary or median type
- Causal or shock model
- Common load model
and it was concluded that all the models can only be regarded as very
approximate in deriving a value for a system DF probability. It was
further advised that the models must be sensitive to factors relating
to defence measures implemented within a plant and to the engineering
tasks involved. In consequence, model parameters must be related to
such factors.
The merits of the β factor model have been further evaluated by
SRD in an internal document (5), leading to the derivation of a partial
β factor model. In essence the partial β factor model provides a
format for making comments on the extent of plant dependency problems
by using a structured subjective engineering appraisal of the quality
of plant-specific defences against dependent failures, from which an
assessment of each partial β factor can be made. The process requires
that a series of DF defence categories be each assigned their own
partial β factor. Each factor is derived by an analysis of the system
under review, according to the applied defences (2), and the
classification of DF's into various cause categories. An example
format is shown in Figure 3, where the cause classification of Figure 4
has been used. The sub-system β factor is derived by obtaining the
product of all the partial β factors. On the basis of earlier work
(1)(2) the best estimate or minimum value for each partial β factor is
also shown in Figure 3.
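
A minimal sketch of the partial β factor arithmetic is given below. The defence categories paraphrase the entries of Figure 3, but the numerical values are placeholders inserted for illustration; they are not the calibrated SRD figures, which would come from the structured appraisal described above.

# assumed partial beta factors, one per DF defence category (placeholders)
partial_betas = {
    "design control and review":            0.9,
    "functional/equipment diversity":       0.5,
    "protection and segregation":           0.6,
    "redundancy and voting":                0.8,
    "testing, commissioning, inspection":   0.7,
    "operational control and maintenance":  0.7,
    "proof test and operations":            0.8,
}

beta_subsystem = 1.0
for category, factor in partial_betas.items():
    beta_subsystem *= factor     # the sub-system beta is the product of the partials

print(f"sub-system beta factor = {beta_subsystem:.3f}")

A strong set of defences drives the product towards the model's lower bound, while weak or absent defences leave it close to unity.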

Figure 2. Data worksheet (example: accidental grounding leading to steam generator low level)
Fig. 3 Sub-system β factor: example format listing CMF defences (design control, design review, functional diversity, equipment diversity, fail-safe design, operational interfaces, protection and segregation, redundancy and voting, proven design and standardisation, derating and simplicity, construction control, testing and commissioning, inspection, construction standards, operational control, reliability monitoring, maintenance, proof test, operations) against the CMF cause classes of Figure 4, with a minimum partial β factor assigned to each defence category; the product of the partial factors gives the sub-system β factor.


3.3 Data

The modelling of dependent failures requires data on plant events in
order to both refine and calibrate the models used. Any collection
activity will need to collect information on both dependent and
independent failures, if the major dependent failures models are to be
supported. Recognising the need for data collection, NCSR has for some
time operated a component data bank which provides both an event data
store and a reliability data store. These stores provide two main
services:
information to contributors to the bank on the performance,
availability and reliability of their own plant, and
generic reliability data required by designers, analysts and
reliability engineers.
The information contained within the data bank is useful in
providing data on independent events, and to a lesser extent on
dependent events. However, it was recognised that, because of the
importance of dependent events in high risk applications, more
comprehensive evaluation of dependent events was required if modelling
processes were to be improved. To facilitate the collection and
classification of dependent events, studies have been implemented on
three power station boiler feed systems and one chemical process plant
to look for evidence of dependencies, and a dependent failure data base
has been implemented within NCSR.
The process of data collection has been tackled in three parts.
In the first instance it is necessary to define the information which
must be collected on each event in order that subsequent analysis of
dependencies and refinement of models can be adequately undertaken.
With the data requirements established, the second stage is to define
the boundary of the plant, or section of plant for which data is
required and to generate an inventory of all components of interest
within the boundary. Finally analysis of event records, over the time
period of interest, is undertaken.
The data collection must include:-
the causal mechanism (eg, design error, operator error),
applied defences (ie, those in place which were breached),
prospective defences (ie, those which if they had been in
place would, in the opinion of the analyst, have succeeded in
preventing the multiple failure),
failure mode and number of failures/survivors.
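
A sketch of an event record carrying these fields is shown below; the field names and the example entry are invented for illustration and do not reproduce the actual NCSR database schema.

from dataclasses import dataclass
from typing import List

@dataclass
class DependentFailureEvent:
    plant: str
    date: str
    causal_mechanism: str              # eg "design error", "operator error"
    applied_defences: List[str]        # defences in place that were breached
    prospective_defences: List[str]    # defences judged able to have prevented it
    failure_mode: str
    failures: int                      # number of components that failed
    survivors: int                     # number of exposed components that survived

example = DependentFailureEvent(
    plant="station A", date="1984-06-01",
    causal_mechanism="maintenance error",
    applied_defences=["staggered testing"],
    prospective_defences=["diversity in personnel"],
    failure_mode="fails to start", failures=2, survivors=2,
)
print(example)
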
In addition to the inventory/event data approach pursued in the
UK, there is further event data available from several sources, notably
in the USA, where a system of nuclear Licensee Event Reports (LER's) is operated.
This is a general database containing both independent and dependent
events. The LER database has been the subject of several assessments
in order to extract the dependent events, and the results of one
assessment are presented in ref 1, where some 8000 LER's concerning
approximately 220 reactor years yielded 166 dependent events. It
should be recognised that LER's do not contain all reactor events and
therefore such data must be used with caution in any system
assessment.

Fig. 4 Classification of Common-Mode Failures


3.4 Computer Support

There are a number of codes available to the analyst; within SRD,
the computer codes employed to support reliability analysis are, most
notably, PREP-KITT and ALMONA for fault tree
analysis, while SAMPLE is utilised as part of sensitivity analysis.
Although there are a number of programs which are available to perform
the same tasks, SRD has standardised on the use of the PREP-KITT and
ALMONA codes and has produced a user-friendly environment in which to
operate the codes. This environment provides for graphical input and
editing of a fault tree by computer display, inbuilt routines for data
entry and checking, calculation of cutset information, and a
formatted print-out of the trees. This environment eases the workload
of the assessor, making it possible to perform rapid evaluations of
changes to the tree structure and basic event data.

3.5 Structured Process for Qualitative Analysis

The process whereby a qualitative analysis can be performed in a
rigorous manner is best demonstrated by reference to the current
procedures adopted by SRD.

3.5.1 The current procedures at SRD for evaluation of dependent
failures require that a systematic engineering analysis of the system
under review be implemented. The analysis will include application of
event tree and fault tree techniques, supported by the use of computer
codes such as PREP-KITT, ALMONA and SAMPLE. Support for Probabilistic
Risk Assessment (PRA) is provided by means of a modified version of
PREP-KITT. Any analysis will include qualitative and quantitative
evaluation of system failure modes, utilising generic data from the
various NC SR data banks, plus data collection activities to support
specific plant analysis as required.
Modelling techniques employed are primarily the β factor, partial
β factor and boundary (cut-off) models. However, applications of
further models, such as the Multiple Greek Letter Method (MGL) and the
Binomial Failure Rate Model (BFR), are also considered, and a new model,
the 'Trigger/Coupling Mechanism' model, is now being developed at SRD
to relate more clearly cause and defence information.
Current analysis within NCSR utilises the procedures discussed in
references 1 and 2 in the following manner.
1 Development of systems logic.
2 Identification of components affected by the attribute or
environment.
3 Check whether any cutset of the fault tree contains two or
more affected components.
4 Assessment of component defences against the cause of the
dependency.
5 Evaluation of the effect on the reliability of the system
reflecting, as much as is possible, the detailed qualitative
analysis.

Figure 5. Dependent failures analysis strategy (flowchart: develop system logic; determine dependency categories; identify affected components; screen out cases with no dependency or no significant dependency; quantify the significant dependencies)

Such a framework is recommended since it considers all identified
potential dependencies initially. Those that have no significant
influence on system reliability can then be discarded by reference to
the fault tree cutsets.
Thus quantification, after a detailed examination of specific
defences, is performed only for the potentially significant
dependencies which remain.
The framework is now considered in greater detail.

3.5.2 Development of systems logic. Although a checklist of common
attributes which past experience has revealed as being important (eg,
common design, common maintenance group) is essential, this ought to be
tailored to the specific system being studied via a detailed
qualitative analysis.
It is recommended that a dependent failure oriented Failure Modes
and Effects Analysis (FMEA) be performed as a precursor to Fault Tree
Analysis (FTA).
The principal difference between this and standard FMEA is in the
emphasis given to identifying possibly related components. Columns
specifically headed "cause" and "related components" serve as a useful
discipline in systematically identifying potential dependencies.
Components which share the same test and maintenance groups can be
shown to be thus related under the latter column. Test or maintenance
errors, eg in calibrating equipment, can potentially affect all
components in the same group. If it is clear that a specific system is
vulnerable to multiple calibration errors then it may be necessary to
specifically include this in the checklist of dependency categories.
Ultimately, a parametric analysis will seek to quantify this.
(Implicit modelling). Of course, functional failure dependencies (eg,
reliance on common batch of fuel) should be revealed by FMEA and
incorporated directly in the fault tree. (Explicit modelling). It is
also possible to incorporate within the FMEA a recognition of past
incidents. FMEA is also valuable in analysing control and
instrumentation malfunctions which past experience shows can be very
significant contributors to system unreliability.
On the basis of the experience gained in performing the FMEA, a
fault tree can be constructed. Since this will serve as the basis for
the dependent failure analysis, it is especially important that this
tree does not contain any pre-suppositions of independence. If it did,
the cutset-based approach advocated here would be invalidated. If at
all possible, the fault tree should be constructed in modular
form, with the modules corresponding to functional groupings of
equipment. Care should be taken to ensure that all relevant operating
conditions are modelled. Usually the normal alignment case will
suffice, but not always.
Such detailed qualitative analysis should be adequate for the
evolution of a checklist pertinent to the system in question.

3.5.3 Identification of components affected by the attribute or
environment. Each basic event addressed in the fault tree must then be
assigned a label for each item in the attribute and environment

checklist. For example, 'manufacturer' is one attribute which should
always be present in the checklist. Different manufacturers can be
designated A, B, C and so on.
Thus each basic event would be labelled either A, B or C ... .
Then two components with identical labels indicate that the attribute
or environment in question (in this case manufacturer) is the same for
those two components. Hazard assessment (internal plant hazards
mainly, since separate explicit methods are currently in vogue for
external hazard assessment) can also be incorporated as part of the
checklist analysis. Where hazards are not linked to particular items
of plant they are postulated to occur in a location without attempting
to identify a specific source. Within the zonal schemes for each such
hazard (eg, fire, humidity, . . . ) , the affected zone(s) are determined
by the identification of barriers to the propagation of the hazard from
one zone to adjacent zones. Barriers might typically be fire doors and
fire resistant walls. C omponents can be assigned labels as before
identifying which zone they are in with respect to a particular hazard.
By two components sharing the same label it is indicated that they
share a common environment ie, they are in the same zone for a
particular hazard and thus are susceptible to a potential dependency.
Defences against the occurrence of a general hazard in any zone need
only be considered when it has been found that a potentially
significant dependency exists. Such a general treatment is not
appropriate to many hazards which are caused by specific sources.
These may be the failure of a plant item, such as water escaping from a
breach or a dropped load. In this case, possible sources may be
identified and the immediate area defined as the primary zone.
Secondary zones can be found by considering the progression of the
hazard to other areas.
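
The zonal treatment lends itself to a simple propagation calculation:
starting from a postulated primary zone, the secondary zones are those
reachable through barriers that do not stop the hazard in question. A
minimal sketch in Python follows; the zones, barriers and hazards are
invented for the illustration:

   # Zones connected to their neighbours; a barrier stops certain hazards.
   # barriers[(z1, z2)] is the set of hazards that the barrier stops.
   adjacent = {"Z1": ["Z2", "Z3"], "Z2": ["Z1"], "Z3": ["Z1", "Z4"], "Z4": ["Z3"]}
   barriers = {("Z1", "Z2"): {"fire"},     # fire door
               ("Z1", "Z3"): set(),        # open corridor, stops nothing
               ("Z3", "Z4"): {"flood"}}

   def stops(z1, z2, hazard):
       wall = barriers.get((z1, z2)) or barriers.get((z2, z1)) or set()
       return hazard in wall

   def affected_zones(primary, hazard):
       """Primary zone plus all secondary zones the hazard can reach."""
       reached, frontier = {primary}, [primary]
       while frontier:
           z = frontier.pop()
           for nxt in adjacent[z]:
               if nxt not in reached and not stops(z, nxt, hazard):
                   reached.add(nxt)
                   frontier.append(nxt)
       return reached

   print(affected_zones("Z1", "fire"))   # {'Z1', 'Z3', 'Z4'}: the fire door protects Z2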

3.5.4 Check whether any cutset of the fault tree contains two or more
affected components. In principle the fault tree cutsets are examined
one by one. In practice, this either requires the use of a computer
code to search through the minimal cutset listing for cases where basic
events within the same cutset share a common attribute or environment, or
it may be possible to capitalise on the modular nature of the fault
tree. A matrix method has been used at SRD to greatly facilitate the
visual inspection of potential dependency routes. This method takes
advantage of the structure of fault trees which are composed of
identical modules.
For each attribute or environment a matrix is drawn up. The rows
of each matrix are the different modules comprising the overall
multi-channel system, while the columns are the analogous components in
each module. The matrix is then filled up with the appropriate labels.
Table 1 below is an example taken from a 4-train system featuring
identical redundancy. The components are denoted by numbers; the
numbers relevant to module 1 being representative of their identical
counterparts in other modules.

TABLE 1

COMMON MANUFACTURER
7 8 9 10 11 12 13 17 21 23 14 16 18 19 20
module 1 A A C E F G H M M A E J
module 2 A A C E F G H M M A E J
module 3 A A C E F G H M M A E J
module 4 A A C E F G H M M A E J

In the study from which this table is cited, the components shown
were in fact first order cutsets in each module. Thus all possible
combinations of four components, one from each row, yield fourth order
system cutsets. In conjunction with other matrices, this matrix showed
that fourth order cutsets containing identical equipment
(principally the single columns) could reasonably be expected to
dominate the dependent failure analysis on the basis of sharing common
attributes. Thus in this instance it was not expedient to screen the
entire minimal cutset listing on a cutset-by-cutset basis. In the
example above entire cutsets shared a common attribute; this has been
designated a first rank potential dependency.
dependency would be for example where three components shared a common
attribute within a 4th order cutset. Third rank potential dependencies
can be similarly defined, and so on. In general these should not be
discarded without reasons being given for so doing.
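
The cutset-by-cutset search and the notion of rank lend themselves to
simple mechanisation. The following sketch (with invented basic events
and labels, and not the SRD matrix method itself) flags every minimal
cutset in which two or more basic events share an attribute label, and
reports its rank as the cutset order minus the size of the largest
sharing group plus one:

   from collections import Counter

   # Manufacturer label of each basic event (the attribute under study)
   label = {"P1": "A", "P2": "A", "P3": "A", "P4": "A", "V1": "E", "V2": "J"}

   cutsets = [{"P1", "P2", "P3", "P4"},   # 4-train pump cutset, all manufacturer A
              {"P1", "P2", "P3", "V2"},
              {"V1", "V2"}]

   def rank(cutset, label):
       """1 = whole cutset shares a label, 2 = all but one, ...; None = no sharing."""
       counts = Counter(label[e] for e in cutset if e in label)
       largest = max(counts.values(), default=0)
       if largest < 2:
           return None                    # no potential dependency for this attribute
       return len(cutset) - largest + 1

   for cs in cutsets:
       print(sorted(cs), "rank", rank(cs, label))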

3.5.5 Assessment of component defences. It is at this stage, after
filtering out potential dependencies which are insignificant on the
basis of the cutsets, that a consideration of specific defences against
significant potential dependencies due to hazards is best performed.
Component defences, such as switch-gear mounted on plinths,
water-proofed motors, drains etc, need to be considered. It may be
possible on these grounds to justify the removal of further potential
dependencies.
Some comment ought to be made on the fact that this is a
cutset-based approach. Some analysts claim that it is necessary to
incorporate dependent failure effects into the fault trees because of
the way in which the dependent failure logic impacts the logic model -
what is minimal before is not minimal after, it is claimed. This
argument is usually advanced in the following manner:

Let A and B be components both depending on the successful operation of
a third component C. Fault tree analysis yields the minimal cutsets
{(A,B)} and {(C)}. Fault tree evaluation after incorporation of
dependent failure effects in the fault tree structure yields the
following list of minimal cutsets: {(A,B)}, {(C)}, {(AB)}, {(AC)},
{(BC)}, {(ABC)}. In a cutset-based approach it is not immediately
clear that the fourth, fifth and sixth items of the latter list would
be correctly quantified. Their effective inclusion would depend on the
data used to quantify (C) and (AB). Ideally (ABC) would be covered in
accounting parametrically for all possible dependencies involving A and
B in the database. It is argued that with the configuration under
consideration, where components A and B are in parallel, and that
combination is in series with component C, the contribution from
cutsets AC and BC is irrelevant since C and AB form a series system.
The contribution from AC and BC should however be considered if they
enhance the failure rate of C. Dependent failures concerning common
supports such as C ought to be modelled, but such dependent effects
will normally be dominated by independent failures of first order
minimal cutsets in, say, a beta factor treatment. Thus the effect of
dependent failure logic on the logic structure itself is conceded, but
it is claimed that a cutset-based approach, when care has been taken in
fault tree construction not to incorporate any pre-suppositions of
independence, will serve as a good approximation. The approximation of
this approach is to limit the search for potential dependencies to
those which are significant in terms of the top event probability.

3.5.6 Evaluation of the effect on the reliability of the system. The
qualitative analysis provides a structured systematic assessment
covering the identification and consideration of potential dependencies
from all sources within the study boundary conditions. While such an
analysis is the major feature of dependent failure assessment, the
nature of dependent failures is such that it is difficult to explicitly
recognise some categories of potential failures, particularly design
inadequacies and human error. The quantitative analysis, while seeking
to quantify the explicit dependencies, also accounts for the influence
of the broad characteristics of system design and operation. Thus the
quantitative analysis is important in providing the correct perspective
of the potential significance of dependent events for system
reliability.
The qualitative analysis is an essential first step before any
quantification of the dependent factors can be made. The analysis does
stand alone, however, as an insight into the quality and possible
weaknesses of the design. Methods are not available to quantify
directly all the dependencies identified in the qualitative analysis.
The aim of the quantitative analysis is to reflect, in some degree, in
numerical terms the preceding qualitative analysis, and also the
potential for unidentified dependencies. Current methods are, however,
limited by data scarcity on dependent failure events, to simple,
general models.
Such models as may be used at SRD include:
System reliability cut-off,
Partial Beta Factor,
Conventional Beta Factor,
Multiple Greek Letter (MG L),
Binomial Failure Rate (BFR).
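
As a purely numerical illustration of the simplest of these models
(the figures below are assumed, not taken from any study), a
conventional beta factor splits the total failure rate of a component
into an independent and a common cause part; for a one-out-of-two
periodically tested pair the common cause term then typically
dominates. A minimal sketch, using standard rare-event approximations:

   # Conventional beta factor: lambda_total = lambda_ind + lambda_ccf,
   # with lambda_ccf = beta * lambda_total.
   lam_total = 1.0e-5   # total failure rate per hour (assumed value)
   beta      = 0.1      # assumed beta factor
   T         = 720.0    # test interval, hours

   lam_ccf = beta * lam_total
   lam_ind = (1.0 - beta) * lam_total

   # Mean fractional dead times for a 1-out-of-2 standby pair
   q_single      = lam_ind * T / 2.0          # one train unavailable
   q_independent = (lam_ind * T) ** 2 / 3.0   # both trains failed independently
   q_ccf         = lam_ccf * T / 2.0          # both trains failed by a common cause

   print(f"independent double failure: {q_independent:.2e}")
   print(f"common cause contribution : {q_ccf:.2e}")

With these assumed figures the common cause contribution exceeds the
independent double failure by more than an order of magnitude, which
is the usual motivation for the detailed qualitative analysis above.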

4. CONCLUSIONS

In order to successfully analyse the potential dependencies which may
be present in a system design, and to quantify the effect on system
unavailability, a structured assessment process must be implemented.
Utilising established cause, event and defence coding schemes,
together with FMEA and FTA techniques, a qualitative assessment can be
performed to identify dependencies.
The qualitative analysis can provide an insight into the robustness of
the plant against dependent failures and can also highlight
deficiencies in plant design for which additional defences may be
required.
The impact of the detected dependencies on system unavailability
can be quantified by the application of one of several models available
to the analyst.

5. ACKNOWLEDGEMENTS

The author would like to take this opportunity to give recognition to
Mr Brian Johnston (SRD) who developed and applied the structured
assessment technique within the CCF-RBE.

REFERENCES

1  SRD R146. Edwards, G and Watson, I A. A Study of Common-Mode
   Failures. July 1979.

2  SRD R196. Bourne, A J, Edwards, G, Hunns, D M, Poulter, D R and
   Watson, I A. Defences against common-mode failures in redundancy
   systems. January 1981.

3  EPRI NP-3837. A Study of Common-Cause Failures. Phase 2: A
   Comprehensive Classification System for Component Fault Analysis.
   Los Alamos Technical Associates, Inc. June 1985.

4  INEL root cause scheme. Informal report.

5  Unpublished work. Edwards, G T. A Method of Analysis of
   Common-Mode Failures. April 1980.

6  PLG-0427. Fleming, K N, Mosleh, A and Deremer, R K. A Systematic
   Procedure for the Incorporation of Common Cause Events into Risk
   and Reliability Models.

7  Vesely, W E. 'Estimating CCF probabilities in reliability and risk
   analyses: Marshall-Olkin specialisation'. NSRE & RA, SIAM,
   Philadelphia, 1977.

8  NUREG/CR-1401. Atwood, C L. Estimation for the BFR Common Cause
   Model.

9  IN-1349. Vesely, W E and Narum, R E. PREP and KITT - Computer
   Codes for the Automatic Evaluation of Fault Trees.

DEPENDENT FAILURE MODELLING BY FAULT TREE TECHNIQUE

S. Contini
Commission of the European Communities
Institute for Systems Engineering
21020 Ispra (VA), Italy

ABSTRACT. The problem of dependent failure analysis of


complex systems by the fault tree technique is presented.
The approaches of modelling and analysis are briefly
outlined. Problems in qualitative fault tree analysis and
methods of solution are described.

1. INTRODUCTION
In PRA studies the fault tree model is widely applied for
systems analysis. The correct modelling of a system needs
considerations on all possible types of dependencies, in
order not to underestimate the accident occurrence
probabilities.
Some dependencies can straightforwardly be modelled during
the fault tree construction phase (i.e. functional
dependencies, cascade failures, command failures, some
technical specifications), for others (i.e. generic common
causes of failure) the application of complementary
procedures is generally needed. The CCF analysis can be
performed in different ways, depending on the aim of the
study, the availability of data, tools, etc.
Often, the fault tree construction is preceded by a
Failure Mode and Effects Analysis (FMEA) study. Proper
modules are also used to document all potential causes of
common failure arising both internally and externally to
the plant /1/.
The result of the fault tree analysis is expressed by a
list of failure combinations (Minimal Cut Sets, MCSs) to
which the quantification procedure is applied. These
results allow the analyst to identify the critical points
of the system, and give him guidance for the definition of
suitable design changes to improve the characteristics of
the system.
In this paper the main approaches for system modelling and

analysis by fault tree are briefly presented. More
attention is focussed on the techniques for
qualitative CCF analysis, those for quantitative
analysis being fully described in other papers of this volume.

2. SYSTEM ANALYSIS
Generally, during the fault tree construction phase, at a
component level, the analyst has to model the following
types of failures:
a) Primary failures: describe the intrinsic failures
caused by the random performance of the component; its
inputs are supposed to be within the design envelope; a
repair is required to return the component to the working
state;
b) Command faults: describe the situations in which the
component is unable to perform its intended function
because of the lack of proper input or because of spurious
input signals that change the component state, but without
damaging it; a repair is not required to return the
component to the working state;
c) Secondary failures: describe the component failures
due to excessive stresses caused by external circumstances,
e.g. the failure of another component, abnormal
environmental conditions, human errors in operation, or
caused by human errors in design, construction, etc.
In order to return the component to the working state a
repair action is necessary.
The types of dependencies among components due to
technical specifications can be described in the fault
tree model, even if they may lead to non-coherent logical
structures.
Consider, as an example, a two train stand-by redundant
system subject to periodic testing; suppose that only one
train at a time can be under test. The "natural" modelling
requires the use of the NOT operator as shown in /2/.
The analysis of a non-coherent structure function requires
the use of a fault tree analysis code able to deal with
negated events: this is a feature that not all computer
codes have.
Technical specifications can also be modelled as coherent
structures; in these cases, however, as many fault trees
as the number of different system configurations are to be
developed (i.e. both trains on-line, train T1 on line and
T2 under test, and vice versa).

The modelling and quantification of multiple failures due
to a single cause (not arising from a component failure),
referred to as "common cause failures", has received wide
attention since the early seventies due to their practical
relevance.
Common causes of failure are "introduced" throughout all
the development phases of the plant: from design to
operation. The identification of CCFs represents the most
important step of the analysis. In order to make the
identification phase as systematic as possible, it is
necessary to make use of a check list of sources of common
causes of failures in order to systematically select those
that might produce critical effects on the system being
analysed. The existing classification systems give a
guidance to the analyst in setting up the system oriented
check list /3/.
A common cause failure analysis can be performed
quantitatively, provided that event histories are
available; in any case, even a simple qualitative analysis
allows the analyst to gather useful information on the existence
of potential dependencies and, consequently, to define the
actions to be taken either to eliminate or to reduce the
degree of dependency.
The main steps of the analysis procedure can be summarised
as follows.
a) Model, in the fault tree at component level, the
information on CCFs.
b) Determine the MCSs. Some of them contain events
representing failures due to common causes. If data is
available, perform the quantitative analysis.
c) Examine a MCS to check whether it can be caused (or
strongly affected) by the CCF events.
d) If the effects of at least one CCF are considered to be
important, step e) is performed, otherwise step c) is
repeated for another MCS.
e) From a list of preventive measures those which can
eliminate or significantly reduce the effects of the
CCFs are considered. If there are other MCSs step c) is
repeated for a new failure combination.
Different methods have been developed for implementing
steps a) and b). A brief description of them is given in
the next sections.

2.1 Common cause failure modelling in a fault tree
Three approaches to system modelling are listed: all are
based on the representation of possible common cause
dependencies into the fault tree at component level.
In ref. /4/ the CCFs are explicitly represented in the
tree as primary events. As an example, let us consider the
fault tree of a system formed from three components A, B
and C (see Fig. 1). Suppose that these components are
respectively sensitive to the following secondary causes:
A <==> [C1, C2, C3]
B <==> [C1, C2]
C <==> [C1, C3]
For instance, C1 could represent a given manufacturer, C2
the abnormal environmental temperature and C3 the physical
location.
These new causes of failure are explicitly represented in
the fault tree as shown in Fig. 2. Each component is
therefore represented by a subtree describing the
disjunction of the random failure and other causes of
common failure. The complexity of the fault tree depends
on the number of common causes that are considered (i.e.
the extent of the analysis): a screening procedure can be
applied to reduce it.
Another way of representing the set of dependencies is to
associate each primary event with a vector of attributes
[C1, C2, ...] representing the common causes of failure
to which it is sensitive /5/.
tree for the example of the three components system. The
basic fault tree does not change, since the information on
CCFs is separately associated with the primary events.
A third way of modelling, mainly developed for
quantification /6/, is represented in Fig. 4.
This modelling is similar to the first one; the difference
is due to the different meaning associated with the CCF
events. Indeed, in this approach the common cause failure
events represent the generic dependence of a group of
components. For instance, in Fig. 4, in the subtree for
the component A, C-ABC represents the set of generic
causes that lead to the failure of all components, and
C-AB, C-AC the sets of generic causes that lead
respectively to the failure of A and B, and of A and C.
A screening procedure is applied to identify the groups of
components that may be dependent (i.e. pumps of the same
type, same manufacturer, installed in the same room, etc.).

As with the first approach, this type of modelling
increases the fault-tree complexity, but it has the great
advantage of completeness, since all possible combinations
of components (i.e. all dependent, all independent, some
dependent and some independent) are included.
An idea about the complexity of the fault tree can be
obtained by looking at the number of primary events in a
parallel system of N dependent components. In case of
independent failures only, the fault tree would have N
primary events; when dependencies are incorporated the
number of distinct primary events is given by:

   Ne = SUM over i = 1, ..., N of C(N, i) = 2^N - 1

(for the three component system of Fig. 4, Ne = 7).
The first two ways of modelling CCFs can give useful
information in case of qualitative analysis only, since
the meaning of the CCF events is chosen by the analyst.
Generally, the identification of CCFs requires additional
information (e.g. the layout of the plant) and, for each
component, the list of the generic causes (e.g.
temperature, humidity, vibrations, ...) to which it is
sensitive.
These data are necessary to determine, for each common
cause event Ci, the domains in which it can have some
influence and the list of affected components. The domain
of a given common cause of failure Ci is the area,
delimited by walls, within which all components of the
system are subject to Ci. From the characteristics of these
components, those that are sensitive to Ci can easily be
selected.
For a given event Ci, the determination of its domains of
influence and the list of affected components can easily
be automatised /1/.
Other data are needed, concerning information on
manufacturer, test/maintenance procedures, etc.

2.2 Analysis of fault trees with dependencies


Regardless of the method adopted for modelling the
dependencies, the complexity of the fault tree may grow
considerably. Consequently, the computing time for
determining the MCSs may become very high, unless
particular methods of analysis are adopted.
For example, with the first modelling method, the fault
tree may be manipulated, before determining the MCSs, in
such a way as to reduce its complexity /4/.

The determination of the MCSs can be performed by means
of any existing fault tree analysis code. The logical and
probabilistic cut offs can also be applied to reduce the
computing time. The generic MCS contains any combination
of independent and common cause events. For the fault tree
of Fig. 1 the expression for the Top event takes the
following form.
Top = A*B + A*C + C1 + C2 + C3     (1)
Obviously, MCSs in which a common cause affects only one
component are not considered.
If data for all primary events are available the
quantitative analysis can be performed. In /4/ this is
performed by means of the Monte Carlo simulation
technique.
The second method of modelling (as well as the previous
one) presents some interesting features, especially when
the quantification can not be performed because of
unavailability of data. In these cases the analysis
addresses the identification of those aspects of the
system design that need to be reconsidered. In other words
the aim of the analysis is to verify whether the system
may fail, or whether it may suffer a relevant
degradation, as a consequence of the occurrence of a
single event. Engineering judgement substitutes for the
quantification.
The dimension of the fault tree does not increase, but
this does not mean that the effort needed for the analysis
could be lower than that of the other cases (explicit
modelling).
The MCSs resulting from the analysis of the tree contain
some attributes representing the common causes to which
their elements are sensitive. For the example under
consideration the result would be:

   Top =  A  * B   +   A  * C      <-- expression
          C1   C1      C1   C1
          C2   C2      C2   C3     <-- attributes
          C3           C3

This result can obviously be expressed also in terms of
MCSs as shown in (1).
Attributes that are shared by all elements of the MCS are
referred to as "critical CCFs", since their occurrence may
directly lead the system to a failed state.
Those that are shared by j components out of k (k order of
the MCS) may cause a system degradation to an extent which
is a function of the number of the remaining k-j events.
These CCFs are referred to as "relevant CCFs" of order
w = k-j. Looking at the example, C1, C2 and C3 are all
critical CCFs.
The result is a qualitative ranking of CCFs according to
their possible effects on the system. Therefore, for
critical CCFs and relevant CCFs of lower order (e.g. w =
1, w = 2), an investigation has to be done to verify
whether or not the system contains suitable defences.
It can be realised that the analysis of a fault tree with
attributes may often lead to the determination of all
MCSs. Therefore, when the tree becomes a little bit
complex, many difficulties may arise even for determining
the set of MCSs whose components share, either completely
or partially, at least one attribute.
This stems from the fact that the cut-off techniques
lose their efficacy, since their application is limited
to those subcombinations not containing any attribute
(i.e. independent failures). Generally, the greater the
number of attributes, the greater the number of components
with attributes and therefore the higher the
computation time.
Batch programs have been developed for qualitative and
quantitative CCF analysis /7, 8/.
Instead of analysing all CCFs in a single run it seems
more convenient to perform as many runs as the number of
attributes. This approach can easily be implemented with
an interactive program.
For a given attribute, the result of a single run is the
disjunction of only those MCSs in which at least two
elements share the attribute.
If a "partial logical cut-off w " is defined as the
number of events in a MCS without any attribute, it is
possible to further specify the type of MC Ss to be
determined. For instance, w = 0 would imply the
determination of the list of MCSs in which all elements
share the attribute (i.e. critical CCF); w = x, (x > 0 ) ,
would lead to the list of MCSs having not more than "
events without the attribute.
Alternatively, it is possible to express the partial
logical cut-off in probabilistic terms simply by assigning
a probability equal to 1 to the events with the attribute
and a probability of 10^-1 to all the others. Therefore, the
analysis of the tree with probabilistic cut-off 10^-w gives
all MCSs with no more than "w" events without any attribute.
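
The same device is easily mimicked directly on a list of MCSs:
count, for each MCS, the events that do not carry the attribute and
retain only those with at most "w" such events. A small sketch with
invented events and attributes:

   # Each MCS is a set of events; 'affected' is the set of events
   # carrying the attribute (common cause) under study.
   mcss = [{"A", "B"}, {"A", "C"}, {"A", "D", "E"}]
   affected = {"A", "B", "C"}       # events sensitive to the attribute

   def keep(mcs, w):
       """Retain the MCS if it has no more than w events without the attribute."""
       return sum(1 for e in mcs if e not in affected) <= w

   critical  = [m for m in mcss if keep(m, 0)]   # all elements share the attribute
   relevant1 = [m for m in mcss if keep(m, 1)]   # at most one event without it

   print("critical :", critical)    # [{'A', 'B'}, {'A', 'C'}]
   print("w = 1    :", relevant1)   # {'A','D','E'} is still excluded (two unaffected events)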
A fast a-priori check on the existence of critical CCFs
can also be performed through a simple analysis of the
tree structure, i.e. without determining the MCSs.

Algorithms have been developed to analyse fault trees with
attributes /5/; they are based on a "pruning procedure"
that produces, for a given attribute, a reduced tree
whose analysis gives only the requested MCSs. In other
words, from the original tree all branches that cannot
give the requested MCSs are removed in order to simplify
the tree to be analysed. The main steps of these
algorithms, taken from /5/, are described in the appendix.
Finally, the determination of the MCSs for the fault tree
with CCFs explicitly modelled according to the third
method, can be performed by means of standard fault tree
analysis codes.
However, due to the complexity of the resulting fault
tree, some procedures have been proposed to reduce the
analysis effort needed. A complete description of these
methods (approximate techniques) can be found in /6/.
One of these techniques consists in the analysis of the
tree in two stages. At first the MCSs are determined at
the component failure mode level (e.g. for the example of
fig. 4 the result would be A*B + A*C ), and then the
different causes of failure of each component are
expanded. This expansion can, in its turn, be performed in
different steps by progressively considering the various
types of common cause events, according to the following
priority, as suggested by experience:

- consider only the event that fails all the components
  of the cut set;
- consider the events that fail k components (2 <= k <= N-1).
A complete description of the quantification procedures can
be found in /6, 9/.

3. CONCLUSIONS
The analysis of a complex system necessarily has to
consider the possible types of dependencies, in order not
to underestimate the top event probability. The CCF
modelling can be performed in different ways depending on
the aim of the analysis. When data are available, the
third method described can be applied for quantification.
However, in many practical situations only qualitative
results can be obtained and in these cases the other two
approaches allow the analyst to identify those parts of the
system that may require further consideration from the designer.
To this end an interactive analysis procedure is proposed
and two fundamental algorithms are described.


Fig. 1  Fault tree of the three components system

Fig. 2  Explicit modelling of the generic causes of common failure
        according to the first approach

Fig. 3  Modelling of generic causes as attributes

Fig. 4  Explicit modelling of generic causes of common failure
        according to the third approach

4. REFERENCES

/1/ A. Poucet
    "Experience and Results of the CCF-RBE"
    This volume
/2/ A. Amendola, C. Clarotti, S. Contini, F. Spizzichino
    "Analysis of Complete Logical Structure in System
    Reliability Assessment", EUR 6886, 1981
/3/ P. Humphreys
    "Analysis Procedures for Identification of MRF"
    This volume
/4/ A. Blin et al.
    "Patrec, a Computer Code for Fault-Tree Calculation",
    Synthesis and Analysis Methods for Safety and
    Reliability Studies, Proceedings of NATO ASI, Edited
    by G. Apostolakis, S. Garribba and G. Volta, 1980
/5/ S. Contini
    "Algorithms for Common Cause Analysis. A Preliminary
    Report", PER 106.01.81.16, JRC Ispra, 1981
/6/ K.N. Fleming, N.O. Siu, D.C. Bley, M. Kazarians
    "PRA Procedures for Dependent Events Analysis", Vols. 1
    and 2, EPRI PLG-0453, 1985
/7/ D.M. Rasmuson, N.H. Marshall, J.R. Wilson, G.R. Burdick
    "COMCAN II-A. A Computer Program for Automated Common
    Cause Failure Analysis", TREE-1361, 1979
/8/ C.D. Heising, D.M. Luciani
    "Application of a Computerized Methodology for
    Performing Common Cause Failure Analysis: The Mocus
    Bacfire Beta Factor (MOBB) Code",
    Reliability Engineering, Vol. 17, No. 3, 1987
/9/ K.N. Fleming
    "Parametric Models for Common Cause Failure Analysis"
    This volume

APPENDIX
The analysis of the sensitivity of a system to common
causes of failure can be advantageously performed
interactively by applying the algorithms described below.
The first algorithm can be applied to determine the
critical CCF events and the MCSs involved. The second
algorithm also allows the set of relevant CCFs to be
determined. For the sake of simplicity these procedures are
described for a single common cause, the application
to a set of common causes being straightforward.
Both algorithms require, as input data, the fault tree and
the attributes associated with the primary events.

A.1 Determination of the MCSs affected by a critical CCF

Step 1. Select the attribute Q for which the affected MCSs
have to be determined. Associate the binary variable ai with
each primary event, ai = 1 if the primary event is affected
by Q, otherwise ai = 0.
Step 2. Visit the tree upwards and associate, with each
gate of the tree, the binary variable bj. The value to be
assigned to bj is given, for the generic jth gate, by:

   bj = max over i of (di*ai + (1-di)*bi)       if j is an OR gate

   bj = product over i of (di*ai + (1-di)*bi)   if j is an AND gate

with "i" ranging over all descendants. di takes the
value 1 if the ith input is a primary event; it takes the
value 0 if the ith input is another gate.
Step 3. If the binary variable associated with the Top
event has value 1, the considered attribute represents a
critical CCF. In this case the MCSs affected by this
common cause can easily be determined after pruning the
tree as described in the next step.
Step 4. Delete all gates having bj = 0 and all primary
events with ai = 0. The resulting tree may contain some
branches with single descendants, and therefore it needs
to be restructured. The analysis of the resulting tree
gives the set of all MCSs affected by Q.

A.2 Determination of the MCSs affected by a relevant CCF

Step 1. Select the common cause Q. Define the value of the
partial logical cut-off w. Associate the variable ai
with each primary event, ai = 0 if the primary event is
affected by Q, otherwise ai = 1 (note that ai takes values
which are opposite to those described in the previous
algorithm).
Let S be the set of events with ai = 1.
Step 2. If S is not empty, it is partitioned into the two
subsets, S1 and S2, containing respectively the repeated
and non-repeated events. Determine ai = 1/Ri (for all
elements of S1), where Ri is equal to the number of times
the repeated variable can be combined with itself during
the determination of the MCSs. A conservative value of Ri
is given by the number of occurrences of the event.
Step 3. Visit the tree upwards and associate, with each
gate of the tree, the variable bj. The value to be
assigned to bj is given, for the generic jth gate, by:

   bj = min over i of (di*ai + (1-di)*bi)   if j is an OR gate

   bj = sum over i of (di*ai + (1-di)*bi)   if j is an AND gate

with "i" ranging over all descendants. di takes the
value 1 if the ith input is a primary event; it takes the
value 0 if the ith input is another gate. Note that the
values taken by ai and bj are not necessarily integer.
The value bj represents the minimum number of components
without any attribute that can be found from the subtree
during the determination of the MCSs.
Step 4. Delete all branches having bj > w. The resulting
tree may contain some branches with a single descendant,
and therefore it needs to be restructured. The resulting
tree can be analysed to determine the set of MCSs
having not more than w events without any attribute.
During the analysis, however, the cut-off w has to be
applied again in order to eliminate the remaining non-
significant cuts.
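
Both bottom-up passes are easy to implement once the fault tree is
stored as a dictionary of gates. The sketch below is a free
interpretation of algorithm A.1 (gate and event names are invented);
replacing the max/product combination by the min/sum combination of
algorithm A.2, and inverting the values assigned to the primary
events, gives the quantity bj used for the pruning on w:

   # Fault tree: each gate maps to ('OR' | 'AND', [children]); leaves are primary events.
   gates = {"TOP": ("AND", ["G1", "G2"]),
            "G1":  ("OR",  ["A", "B"]),
            "G2":  ("OR",  ["C", "D"])}

   affected = {"A", "C"}        # primary events carrying the selected attribute Q

   def alpha(node):
       """Algorithm A.1: 1 if the subtree can be failed using affected events only."""
       if node not in gates:                      # primary event
           return 1 if node in affected else 0
       op, children = gates[node]
       values = [alpha(c) for c in children]
       if op == "OR":
           return max(values)
       prod = 1
       for v in values:                           # AND gate: product of the inputs
           prod *= v
       return prod

   print(alpha("TOP"))   # 1: the attribute is a critical CCF ({A, C} is an affected cutset)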

TREATMENT OF MULTIPLE RELATED FAILURES BY MARKOV METHOD

J. CANTARELLA
Université Libre de Bruxelles, Service de Métrologie Nucléaire
50, avenue F.D. Roosevelt
1050 Brussels
Belgium

1. Introduction

We will present the main features of the work and research performed
since 1984 at Brussels' University under the direction of Professor J.
DEVOOGHT (director of the "Service de Métrologie Nucléaire", Applied
Sciences).
We will first briefly introduce Markovian modelling as applied to
unavailability analysis; this chapter includes a description of the
main features of the Sstagen-Mmarela code [1], the starting point of
our work.
We then put some stress on the main impediments to an extended
use of Markovian tools, and on the other hand, on their main advantages
(i.e. integrating functional dependencies, repair policies, inspect-
ions and maintenance). As an illustration, a technical application
of one of our code versions, integrating maintenance modelling, is
presented.
One of our targets has been the development of Markovian tools
to be applied to nuclear power plant systems' availability and
reliability analysis; being therefore faced with large systems, we
developed and implemented two tools enabling us to deal with "large-
scale" Markov processes - the "correlated supercomponent technique"
and a particular case of aggregation/disaggregation procedure, both
described here.
As a conclusion, ability to deal with "large-scale" processes is
specified.

2. Markovian Modelling Applied To Unavailability Analysis

We try here to present markovian modelling in a quite practical and


intuitive way. For more detailed and rigorous explanations, we
suggest references [8,9].

2.1. Markovian assumption

Let us consider a discrete stochastic process and note

   si(n)    a state of the process at time tn

   pij(n)   the probability of a process transition from state si
            to state sj at time tn

Whenever the transition probabilities at a given time only
depend on the current state occupied at this given time, and not on
the previously occupied states, the process can be described by a
Markov chain.
This is what we can call a "zero-memory" process [12]: one does not
have to know the whole process history to evaluate its further
evolution.
In this case we can define a "transition matrix"

   P(n) = {pij(n)}

and, considering a "state vector"

   π(n) = {πi(n)}
where
   πi(n) is the probability of being in state si at time tn,

the evolution on the time interval [tn-1, tn] is ruled by

   π(n) = π(n-1)·P(n-1)

2.2. Application to system unavailability analysis

Let us consider the evolution of a system consisting of N binary
components. Each of its components fails or is repaired according to
failure and repair distribution laws. Under some assumptions on
these distributions, the stochastic evolution of the system can be
modelled by a Markov process.
We note π the state vector of the system, πi being the
probability that the system is in state i, a state of the system
being determined by the states of its components. We note P the
transition matrix of the system, integrating component repair and
failure rates (μ, λ).
In this continuous case, system evolution is ruled by a set of
first-degree linear differential equations

   dπ(t)/dt = π(t)·P

with initial condition π(0) (state vector of the system at t = 0).

It can be shown (and it also appears quite intuitively) that the
markovian assumption corresponds to exponentially distributed failure
and repair laws.
Failure and repair rates can depend on calendar time (non-
homogeneous process) or be constant (homogeneous process).
To illustrate the evolution equation, we finally present here
the most trivial example, that of a system including one single
repairable unit.

Example

Notations    "0"  failed state of the component
             "1"  operating state of the component
             λ    failure rate in inverse-hours
             μ    repair rate in inverse-hours

"transition-diagram" (two states linked by the rates λ and μ;
not reproduced here)

set of equations

   dπ1/dt = - λ·π1 + μ·π0

   dπ0/dt =   λ·π1 - μ·π0
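
As an illustrative sketch only (the rates below are assumed values),
this two-state model can be integrated numerically, e.g. with a plain
Euler scheme, and the result converges to the familiar asymptotic
unavailability λ/(λ+μ):

   import numpy as np

   lam, mu = 1.0e-3, 1.0e-1     # failure and repair rates, 1/h (assumed values)
   # States ordered (operating, failed); generator such that dpi/dt = pi @ P
   P = np.array([[-lam,  lam],
                 [  mu,  -mu]])

   pi = np.array([1.0, 0.0])    # component initially operating
   dt, t_end = 0.1, 1000.0
   for _ in range(int(t_end / dt)):   # explicit Euler integration of dpi/dt = pi P
       pi = pi + dt * (pi @ P)

   print("unavailability at t = 1000 h :", pi[1])
   print("asymptotic value lam/(lam+mu):", lam / (lam + mu))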

2.3. The Sstagen-Mmarela code

The starting point of our work has been the Sstagen-Mmarela code
(Superstate Generator-Markovian Reliability Analysis) developed by
I. PAPAZOGLOU and E. GYFTOPOULOS at Brookhaven [1].
Its main features are :
an automated construction procedure for the transition matrix
a block storage of this matrix (as we will see later on, we
have to deal with very sparse matrices)
an automated merging procedure for the Markov processes of systems
exhibiting symmetries.
In spite of this last feature, there is still a strong need to
reduce memory requirements and calculation times, as we will see in
the next section.

3. Advantages/Impediments To Markov Modelling

3.1. Impediments

It is well known that the main impediment to an extended use of
Markov modelling remains the rapidly increasing memory requirements.
An N-binary-component system is described by a state vector with 2^N
components, and, as we can see by examining transition diagrams, the
transition matrix integrates 2^N·(N+1) non-zero elements (whenever
each component is repairable).
Time requirements constitute another difficulty.
We have to integrate large differential systems in time.
Transition matrices are very sparse (2^N·(N+1) compared to 2^2N);
so we have to store them properly, in order to avoid zero-
element storage. But of course, reduced storage leads to
increased time consumption whenever we use these reduced matrices
(e.g. reordering the elements, or merging).

3.2. Advantages

The Markovian model applied to reliability analysis is well known as
an effective tool whenever some dependencies affect the
probabilistic behaviour of the system's components. This is due, as we
will see, to the model's ability to integrate conditional
failure and repair rates, according to the different system states.
Moreover, studying the dynamical evolution of systems, we are able to
include human actions in the temporal evolution.
We here present three main points.
The model enables us to study repair policies, modelling the
effects of the number of repair crews, the priorities in
reparations, etc. In the same way, dealing with standby
components is quite obvious.
More generally, we are able to integrate functional dependencies
between components or subsystems, whenever in fact they show
strong temporal interactions. In some way, this can consist in
quantitatively dealing with related failures (case of "common
loads").
Let us consider the following three-component system example

and let us consider that the failure rate of component 1 depends
on the current state of component 2, i.e.

   λ1          if component 2 is operating

   λ1' > λ1    if component 2 is failed

We will then integrate at the transition-matrix level, for
instance

   Pij = λ1    with states i = {1,1,1} and j = {0,1,1}

but

   Pkl = λ1'   with states k = {1,0,1} and l = {0,0,1}

The integration can be performed automatically by the code when
evaluating P, the user having of course input the fact that
component 1 behaviour depends on the current state of component 2
(a small illustrative sketch is given at the end of this section).
Without entering into any computational features, we see here that
the Markovian model naturally integrates conditional failures
(common load effects) in the most rigorous way.
Another important point is that, studying dynamical evolution of
systems, we are able to include human actions into systems'
temporal evolution. Different versions of the code have been
implemented by an inspection modelization and a maintenance
modelization, both able to integrate conditional human failures
probabilities.
What we call maintenances are finite duration operations, like
reparations or replacement of equipments. It allows to deal
with putting some redundancies in unavailable states (wheter
this occurs voluntarily or not) over finite durations.
Next section shortly presents an application of these
possibilities.
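
The sketch announced above illustrates, on a toy two-component system
(not the Sstagen-Mmarela implementation, and with assumed rates), how
a state-dependent failure rate is inserted while the transition
matrix is generated:

   import itertools
   import numpy as np

   lam1, lam1_deg = 1.0e-3, 5.0e-3   # failure rate of component 1: normal / component 2 failed
   lam2, mu = 2.0e-3, 1.0e-1         # failure rate of component 2, repair rate (assumed values)

   states = list(itertools.product([1, 0], repeat=2))   # (x1, x2), 1 = operating, 0 = failed
   index = {s: i for i, s in enumerate(states)}
   Q = np.zeros((4, 4))

   for s in states:
       i = index[s]
       x1, x2 = s
       if x1 == 1:                                # failure of component 1, rate conditional on x2
           rate = lam1 if x2 == 1 else lam1_deg
           Q[i, index[(0, x2)]] += rate
       if x2 == 1:
           Q[i, index[(x1, 0)]] += lam2           # failure of component 2
       if x1 == 0:
           Q[i, index[(1, x2)]] += mu             # repair of component 1
       if x2 == 0:
           Q[i, index[(x1, 1)]] += mu             # repair of component 2
       Q[i, i] = -Q[i].sum()                      # diagonal closes each row

   print(Q)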

4. Inspections, Maintenances : A Case Study [10]

In collaboration with the Belgian company Tractebel, we analyzed the
limiting condition of operation of the Tihange II and III PWR reactors
with respect to one of their bunkerized systems, the Emergency
Feedwater System.
This system is constituted by three 50 % trains, with staggered
quarterly testing being performed. The Technical Specifications
request that one train shouldn't be unavailable for more than two
months.
We performed sensitivity calculations about the maximum
periods of unavailability, their distribution, the conjugate effect
of periodical testing, and the influence of common mode failures. As an
illustration, we here provide the plot of one of these calculations
(figure 1), where one can clearly see the effects of unavailability
periods and of testing.


(Figure 1 : system unavailability as a function of time in hours -
plot not reproduced; it shows the effects of the unavailability
periods and of the periodic tests)

Without entering into details about the results and conclusions of
this quite extended study, let us just note that it
provided insights on :
the coherency between the limiting conditions of operation
respectively applied to level 1 and level 2 (bunker) protection
systems
the influence of testing policies
the influence of considering common cause failures upon choices
of testing and maintenance policies.

5. Correlated Supercomponents Technique [3,11]


5.1. Principles

The reduction of a complex system to a collection of smaller


subsystems is a well known strategy [2].
We here proceed in three steps :
We divide the system to be analyzed into subsystems, which
should be interacting "as weakly as possible".
Subsystems are analyzed and lumped into "supercomponents",
characterized by the usual reliability parameters. These
parameters can be defined by matching in the least-square sense
the time-dependent reliability of a subsystem to the reliability
of a model system (e.g. one single supercomponent).
The final step consists in studying the original complete
system, now described in a simpler way by a combination of
supercomponents.
Problems arise whenever statistical dependency between
components, included in distinct subsystems, has to be considered.
Indeed, supercomponents conditional transition rates should be
evaluated in such a way that they are only dependent upon the states
(working, failed) of subsystems as wholes, and not upon the states of
individual components, because these last ones do not appear
explicitly at the final step of the procedure.
The following formal example shows how we deal with this main difficulty.

5.2. Formal example

Let us consider two subsystems, 1 and 2, including respectively the
components x1 and x2, and the following notations for the failure
events,

   Si    the event "failure of subsystem i"

   Xi    the event "failure of component xi"

   ~A    the complementary event of A

Assuming that the failure probability of x2 depends upon the
state of x1, obviously the failure probability of subsystem 2 depends
upon the state of subsystem 1, i.e. Pr(S2/S1) ≠ Pr(S2/~S1). Using the
following boolean decomposition, we are able to quantify the
conditional probabilities, as presented here,

   S2 = S2·(X1 + ~X1)
and
   Pr(S2/S1)  = Pr(S2/X1)·Pr(X1/S1)  + Pr(S2/~X1)·Pr(~X1/S1)

   Pr(S2/~S1) = Pr(S2/X1)·Pr(X1/~S1) + Pr(S2/~X1)·Pr(~X1/~S1)

Some terms can be evaluated by subsystem analysis, according
to different contexts (here Pr(S2/X1), Pr(S2/~X1)). Other terms are
what we call "weights" (e.g. Pr(X1/S1)). Weights are time-dependent,
for instance

   Pr(X1/S1) = Pr(X1·S1) / Pr(S1)

where the numerator corresponds to a set G of subsystem 1 microstates,

   Pr(X1·S1) = Σ (i in G) πi(t)

In a general way, we can say that a conditional rate at


supercomponent level is evaluated as a weighted sum of conditional
rates corresponding to different external contexts.
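
Numerically, the decomposition amounts to a short weighted sum. The
sketch below uses invented values for the subsystem-level terms and
for the weights, purely to show the mechanics:

   # Terms obtained from subsystem analysis in the two external contexts
   p_s2_given_x1    = 0.20    # Pr(S2 | X1)   (assumed value)
   p_s2_given_notx1 = 0.02    # Pr(S2 | not X1)

   # Time-dependent "weights" obtained from the microstate probabilities of subsystem 1
   p_x1_given_s1    = 0.70    # Pr(X1 | S1)
   p_x1_given_nots1 = 0.05    # Pr(X1 | not S1)

   p_s2_given_s1 = (p_s2_given_x1 * p_x1_given_s1
                    + p_s2_given_notx1 * (1.0 - p_x1_given_s1))
   p_s2_given_nots1 = (p_s2_given_x1 * p_x1_given_nots1
                       + p_s2_given_notx1 * (1.0 - p_x1_given_nots1))

   print(p_s2_given_s1, p_s2_given_nots1)   # 0.146 vs 0.029: the dependency is preserved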

5.3. Integration of the method

Weights and transition rates expressions can become very heavy; we


provided Mmarela with an automated procedure evaluating the lumped
transition rates. Moreover, the lumped process becomes non-
homogeneous (time-dependence of the weights). So, we modified
Mmarela in order to perform a multiphase calculation, the transition
matrix being reevaluated at chosen times.

5.4. Example [3]

As an example, let us consider an eight-component model of the
Auxiliary Feedwater System (AFWS) of the Doel III power plant (figure
2). It consists of three 50 % trains, diesel powered. A diesel
failure rate is assumed to depend upon the state of the two other
diesels, and each train includes the electric offsite power system
(the subsystems are not disjoint). Each train will be modelled as one
single supercomponent, and as we see, there are several correlation
sources.
The following table presents two calculations, one on the original
system, the other one by the above presented technique, considering
the evaluation of the system's reliability on a 24 hour mission.


FIGURE 2 : STRUCTURE OF AFWS
(schematic not reproduced; three trains feeding the vapor
generators, with: electric network 2; motopumps 3, 5;
turbopump 4; motovalve 7; diesels 1, 6, 8)

                                   Original system    Supercomponent technique

Number of states                         169                     7
Number of stored matrix elements        5932                    19
Calculation time                        61 cps                 6 cps
Memory requirements                    170 Kbytes             63 Kbytes
Unreliability                          1.25 10^-5             1.32 10^-5

(performed on a Cyber 170)

5.5. Some conclusions

Some features should be pointed out :


Obviously the gain in calculation time and memory will increase
with the original system complexity.
We shouldn't focus on the gain; the main problem of Markovian
analysis remains its feasibility when faced with large-scale problems.
In this perspective, our participation in the benchmark on CCF,
which faced us with a hundred binary events, "qualified" in some way
our technique.

6. Aggregation/Disaggregation Methods - Multigrid Approach [11]

Considering large-scale systems, we present a second approach, an
aggregation/disaggregation (A/D) iterative method. Stating the
principle in a most simplified way, it consists in solving
alternately an aggregate problem and the original one, in order to
speed up the integration of the evolution equation.
These A/D methods applied to Markovian problems are well known
[4]; nevertheless, the main questions remain unsolved, like how to
aggregate and when.
We will introduce a multigrid method [5], the grids being defined on
a tensorial classification of microstates, and see that this
classification corresponds to a physically relevant partition choice,
dividing the system into "as weakly as possible" interacting
subsystems.

6.1. Theoretical aspects

Starting with the case of two independent subsystems, the system
state vector can be described as a Kronecker product [7]

   π = π1 ⊗ π2

this being a possible classification of microstates. The complete
evolution problem

   dπ/dt = π·P = π·(P1 ⊕ P2)     (Kronecker sum)

can be reduced to two smaller independent problems, π1 and π2 being
evaluated separately, i.e.

   dπ1/dt = π1·P1   and   dπ2/dt = π2·P2
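
The Kronecker structure is easy to check numerically: the derivative
of the joint state vector equals the Kronecker combination of the two
marginal derivatives, so the problem indeed splits. A small numpy
sketch (arbitrary two-state generators, purely illustrative):

   import numpy as np

   # Two independent two-state subsystems (generators for dpi/dt = pi @ P)
   P1 = np.array([[-1.0e-3, 1.0e-3], [1.0e-1, -1.0e-1]])
   P2 = np.array([[-2.0e-3, 2.0e-3], [5.0e-2, -5.0e-2]])

   I2 = np.eye(2)
   P = np.kron(P1, I2) + np.kron(I2, P2)    # Kronecker sum P1 (+) P2

   pi1 = np.array([1.0, 0.0])
   pi2 = np.array([1.0, 0.0])
   pi = np.kron(pi1, pi2)                   # joint state vector

   deriv_joint = pi @ P
   deriv_marg  = np.kron(pi1 @ P1, pi2) + np.kron(pi1, pi2 @ P2)
   print(np.allclose(deriv_joint, deriv_marg))   # True: the problem splits in two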

Faced with interactions, we must have recourse to an iterative
procedure : indeed, the evolution of vector π1 depends on the current
state of subsystem 2, and vice versa. So we do have to define
interpolation operators integrating the current external context.
We'll proceed here with three "grids" : the fine grid (matrix P)
describes the whole system evolution, two coarse grids (matrices
ri·P·pi, i = 1,2) describe the subsystems' evolution, where the ri are
the restriction operators and the pi the interpolation operators.
About restrictions, we consider the following obvious relations,

   ri·π = πi ,   i = 1,2

this being a microstate lumping with respect to the tensorial
classification.
There are of course several ways to define the interpolation
operators. A first one is most obvious, e.g.

   p1·π1 = π1 ⊗ π2'

the interpolation operator reconstructing the tensorial product, using
here a current approximate value π2' of the subsystem 2 vector.
A second definition is based on conditions upon the restricted
evolution equations, and introduces the generalized inverses [6] of
π1 and π2,

   dπi/dt = ri·P·π   and   π = pi·πi

defining

   pi = π·(πi)+

Most of our calculation tests were performed with this
interpolation.

6.2. Some features of the calculation code

First, it has a three-module structure : a state-generator module, a
matrix-generator module, and the A/D module, this modular structure
providing great flexibility.
The second feature is most relevant. The main purpose is to avoid fine
grid iterations, because they are time consuming. To limit their
use, we integrated tests on the differences in results between
successive coarse grid iterations. Given criteria define whether
coarse grid iterations do or do not significantly modify the results
(values of π); if not, a fine grid iteration is injected.
Tests can include several criteria, logically combined, the
choice being a function of the user's wishes (putting stress on the
unavailability calculation, carrying out a cheap (fast) calculation,
or a costly but precise one, etc.).

6.3. Some numerical trials

The following table refers to a trial on a six-component system. At a
given time h, evaluating π(h) knowing π(0), this multigrid
calculation included three steps, one fine grid iteration (FG)
followed by two consecutive coarse grid corrections (CG1, CG2).
Three error characterizations are given; we notice the decreases,
especially in the unavailability error level.

            Errors on π                     Error on unavailability
       Ln norm          L∞ norm                      (%)

FG     2.2 10^-3        8.4 10^-7                   204 %

CG1    1.8 10^-4        1.3 10^-6                   204 %

CG2    1.7 10^-4        1.1 10^-6                   4.5 %

The latest tests were performed on a ten-binary-component
system (1024 microstates).
The usefulness of the coarse grid iterations was confirmed.
We tested a logical cut-off procedure (neglecting higher order
failed states), integrated in the first module of the code. It
demonstrated that dealing with a thousand microstates can
correspond to an efficient analysis of systems including many
more than ten binary components.

7. Some Conclusions

We presented an application of the integrated inspection and


maintenance models.
We have been fighting against the impediments to Markovian analysis by
several means :
a correlated supercomponent technique
a flexible multigrid approach
a cut-off technique
means which can all be combined, and we should say that at this day
we are able to deal with large-scale Markov processes.
Concerning the supercomponent technique, our treatment capacity
appears "unlimited", in the sense that it is quite difficult to
fix a limit. One can drive modularization very far, especially when
modules are allowed to be coupled.

Concerning the multigrid approach, at this stage, we are able to
treat more than a thousand microstates.

References

[1]  I. PAPAZOGLOU and E. GYFTOPOULOS, NUREG/CR-0405, 1978.

[2]  M. MODARRES, Topical Meeting on Probabilistic Risk Assessment,
     Port Chester, 1981.

[3]  J. CANTARELLA, Travail de fin d'études, Métrologie Nucléaire,
     ULB, 1983.

[4]  F. CHATELIN, International Workshop on Applied Mathematics,
     Pisa, 1983.

[5]  Multigrid Methods (Proceedings, Köln-Porz, 1981), edited by W.
     HACKBUSCH and U. TROTTENBERG, LNM 960, Springer-Verlag.

[6]  A. BEN ISRAEL, T. GREVILLE, Generalized Inverses. Theory and
     Applications. Wiley, 1974.

[7]  J. BREWER, IEEE Transactions on Circuits and Systems, Vol.
     CAS-25, No. 9, 1978.

[8]  J. KEMENY, L. SNELL, Finite Markov Chains. Van Nostrand, 1960.

[9]  N.J. McCORMICK, Reliability and Risk Analysis. Academic Press,
     1981.

[10] "Justification du temps d'indisponibilité de 2 mois fourni dans
     les spécifications techniques pour les systèmes du bunker",
     Tractebel S.A., 389/DN/T2-T3, 1986.

[11] J. CANTARELLA, J. DEVOOGHT, C. SMIDTS, Probabilistic Safety
     Assessment and Risk Management, Zürich, TÜV Rheinland, 1987.

[12] A. FOGLIO PARA, S. GARRIBBA, in Synthesis and Analysis Methods
     for Safety and Reliability Studies. Plenum Press, 1980.

PARAMETRIC MODELS FOR COMMON CAUSE
FAILURE ANALYSIS

K. N. FLEMING
Pickard, Lowe and Garrick, Inc.
2260 University Drive
Newport Beach, CA 92660 USA

ABSTRACT. This paper provides a brief presentation of several of the most frequently used
parametric models for common cause failure analysis. The concept of common cause basic
event is introduced and used to establish relations between various parametric models.
Generalized formulas to estimate the frequencies of common cause basic events are
provided for each parametric model.

1. Introduction

Parametric models have been the primary means for modeling and quantifying an
important class of dependent events called common cause failures (CCF). The use
of parametric models, however, has been clouded by some commonly held myths.
For example, it is often heard that parametric CCF models:

Are used to account for unforeseen causes.
Are used to quantify the unquantifiable.
Do not permit development of engineering insights.
Are not needed when detailed fault trees are developed.
Are useful to analyze all types of dependent failures.

However, the true purpose of parametric models is to provide a:

Model of the impacts of common cause basic events on system performance.

Numerical measure of the degree of independence (or lack thereof) achieved
in the performance of redundant systems.

Feedback loop between CCF experience data and system reliability models.

Basis for realistic predictions of redundant system reliability characteristics.

Means of identifying and evaluating defensive measures to enhance reliability
performance.

In fact, the parametric models have been quite successful in all of these areas as
witnessed by the past and current work in probabilistic risk assessment of nuclear
power plants.
The objective of this paper is to provide a brief overview of the basic
characteristics of several of the most commonly used parametric models. A more
detailed discussion of these models can be found in Reference 1. The reader is
also referred to another paper in this volume (Reference 2) for issues regarding
data analysis and estimation of the parameters of these models. In order to present
the models and describe how they are related, it is useful to introduce the concept
of common cause basic events. This is the subject of Section 2. The parametric
models are then described in Section 3.

2. Common Cause Basic Events

To facilitate subsequent application of data on historical independent and dependent


failure events to the estimation of model parameters, it is convenient to define
common cause basic events; that is, basic events that represent failures of specific
components in a common cause component group. This step is equivalent to a
redefinition of the logic model basic events from a component-level basis to a lower
level of detail that identifies the particular impacts that common cause events of
specified multiplicity may have on the system. Thus, the common cause basic
events are written in terms of the particular combination of components affected.
The common cause basic events also provide an unambiguous and useful technical
vocabulary for discussing each of the parametric models. At this lower level of
detail, the specific causes of multiple failures are not explicitly included, but the
impacts of these causes on the particular number of components failed are.
As an example of this breakdown, consider a system of three identical
components, A, B, and C, with a two-out-of-three success logic. These components
form a single common cause component group. The component-level fault tree for this
system yields the following minimal cutsets:

{A, B} ; {A, C} ; {B, C}

The reduced Boolean representation of the system failure in terms of the above
minimal cutsets of the component-level fault tree is

S = A B + A C + B C    (1)
The expansion of this component-level Boolean expression down to the common
cause impact level can be illustrated by representing each component-level basic
event as a subtree, in which it is assumed that common
cause failures can lead to either two or three components failing simultaneously.

The equivalent Boolean representation of total failure of component A is

A_T = A_I + C_AB + C_AC + C_ABC    (2)

where

A_T = total failure of component A.

A_I = failure of component A from independent causes.

C_AB = failure of components A and B (and not component C) from
       common causes.

C_AC = failure of components A and C (and not component B) from
       common causes.

C_ABC = failure of components A, B, and C from common causes.

When all the components of our two-out-of-three example system are expanded
similarly, the following minimal cutsets are obtained:

{A_I B_I} ; {A_I C_I} ; {B_I C_I}

{C_AB} ; {C_AC} ; {C_BC}

{C_ABC}

The reduced Boolean representation of the system failure in terms of these
cutsets is

S = A_I B_I + A_I C_I + B_I C_I
    + C_AB + C_AC + C_BC + C_ABC    (3)

Had the success criterion for this example been only one out of three instead of
two out of three, it is clear that a substitution of subtrees, like those described above,
into the system fault tree would produce cutsets of the type C_AB C_AC. These
cutsets have questionable validity unless the events C_AB and C_AC are defined more
precisely. One option is to define the events C_AB and C_AC to be mutually exclusive.
Then, the Boolean expression in Equation (2) would represent a partition of the
failure space of A into mutually exclusive parts based on the impact on other
components in the common cause component group of the underlying set of
causes. This would imply that the probabilities of cutsets like C_AB C_AC are
identically zero. An alternative option is to construct the events C_AB, C_AC, and C_ABC
as sums of contributions from specific root causes so that, for example,

C_AB = Σ_i C_AB^(i)

where C_AB^(i) represents the common cause failure of components A and B from
root cause i.
In this case, it is clear that cutsets of the form C_AB C_AC could occur from
combinations of such root causes as C_AB^(i) C_AC^(j) with i ≠ j, but all combinations
C_AB^(i) C_AC^(i) would be eliminated, since component A would be supposed, in this
cutset, to have been failed twice by the same root cause.

picture are neither mutually exclusive nor exactly independent, and the probability
of C_AB C_AC cannot be calculated directly without using the decomposition into
cause contributions.
It will be seen later that the causes are considered in classifying events in terms
of their impact on components. If, in this process, events that could have been
identified as C_AB^(i) C_AC^(j) are classified (as is most likely) as
A_I C_BC, C_I C_AB, B_I C_AC, or C_ABC, then cutsets like C_AB C_AC should be
eliminated to avoid double counting. Such a counting process then makes this
option equivalent to the previous, mutually exclusive definition of the events. It is
clear that the definition of the events, the counting process by which event reports
are classified, and the way the results are used to estimate the parameters of
common cause models are closely intertwined.
Although complete agreement has not been reached on the most appropriate
definition of these events, it fortunately does not make a significant numerical
difference to the results because, in general, the contribution of cutsets like
C_AB C_CD is considerably smaller than that of cutsets like C_ABC.

3. Probability Models for Common Cause Basic Events

The primary objective of this step is to select the common cause model that will be
used in the quantification of the common cause basic events. The cutset Boolean
equation is transformed so that the probabilities of the basic events can be
substituted directly into the resulting algebraic expression.
For example, in the three-component example system of the previous section, the
algebraic equivalent of Equation (3) in terms of the probabilities of the basic events,
using the rare events approximation, is

P(S) ≈ P(A_I) P(B_I) + P(A_I) P(C_I) + P(B_I) P(C_I)
       + P(C_AB) + P(C_AC) + P(C_BC) + P(C_ABC)    (4)

where P(x) = probability of event x.

According to the rare event approximation, for two events a and b we have
P(a b) ≈ 0. Consequently,

P(a + b) = P(a) + P(b) - P(a b) ≈ P(a) + P(b)

It is a common practice in risk and reliability analysis to assume that the
probabilities of similar events involving similar types of components are the same.
This approach takes advantage of the physical symmetries associated with
identically redundant components in reducing the number of parameters that need
to be quantified. For example, in Equation (4) it is assumed that

P(A_I) = P(B_I) = P(C_I) = Q_1

P(C_AB) = P(C_AC) = P(C_BC) = Q_2    (5)

P(C_ABC) = Q_3

Note that the probability of failure of any given basic event within a common
cause component group depends only on the number and not on the specific
components in that basic event. This is called the symmetry assumption.
Continuing with our example, the system failure probability [Equation (4)] can be
written as

Q_S ≈ 3 Q_1^2 + 3 Q_2 + Q_3    (6)

Here, the cutset information is lost, but quantification is easier.


Generalization of this concept is straightforward; for the basic events
corresponding to a common cause group of m components, the following
probabilities can be defined:

Q_k = probability of a basic event involving k specific components,
      1 ≤ k ≤ m    (7)

Note that the total probability of failure of a specific component can be obtained
from the Q_k's. This can be seen, for example, from Equation (2), where the failure of
component A due to all causes is expanded in terms of the basic events.
Transforming Equation (2) into its equivalent probability model and using
Q_1, Q_2, and Q_3, as defined in Equation (5), we get

Q_t ≈ Q_1 + 2 Q_2 + Q_3    (8)

where, in this case, Q_t is the total failure probability of component A. In general,


the total failure probability of a component in a common cause group of m
components is

Q_t = Σ_{k=1}^{m} C(m-1, k-1) Q_k    (9)

where the binomial term

C(m-1, k-1) = (m-1)! / [(m-k)! (k-1)!]    (10)

represents the number of different ways that a specific component can fail with
(k-1) other components in a group of m similar components.
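As an illustration of Equations (8) and (9), the following short Python sketch (not part of
the original text; the numerical values of Q_1, Q_2, and Q_3 are assumed purely for
illustration) computes the total component failure probability for a common cause group
of m components:

from math import comb

def total_component_probability(q):
    # q[k-1] = Q_k, the probability of a basic event involving k specific components
    m = len(q)
    return sum(comb(m - 1, k - 1) * q[k - 1] for k in range(1, m + 1))  # Equation (9)

# Hypothetical values for a group of m = 3 components (illustration only):
Q1, Q2, Q3 = 1.0e-3, 5.0e-5, 1.0e-5
Qt = total_component_probability([Q1, Q2, Q3])  # = Q1 + 2*Q2 + Q3, as in Equation (8)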
The model that uses the Q_k's defined in Equation (7) to calculate system failure
probability is called the basic parameter model (Reference 3). Ideally, the Q_k's can be
calculated from data, in which case there is no need for further probabilistic
modeling. Unfortunately, the data required to estimate the Q_k's directly are not
normally available. Other models have been developed that put less stringent
requirements on the data. This, however, is only done at the expense of making
additional assumptions that address the incompleteness of the data. Several of
these models are summarized in Table I and explained in the following. These
models can be categorized in several different ways, based on the number of
parameters, their assumptions regarding the cause, coupling mechanism, and
impact of common cause failures.
The categories for the number of parameters required for modeling common
cause events are:

- Single Parameter Models
- Multiple Parameter Models

With respect to how multiple failures occur, there are two categories:

- Shock Models
- Nonshock Models
The "shock models" estimate the frequency of multiple component failures by
assuming that the system is subject to common cause "shocks" at a certain rate and
estimating the conditional probability of failure of components within the system,
given the occurrence of shocks. The common cause failure frequency is the
product of the shock rate and the conditional probability of failure, given a shock.
Finally, as mentioned before, except for the basic parameter model, all common
cause models discussed in this report estimate the probability of basic events
indirectly; i.e., through the use of other parameters. In general, the types of
parameters, estimation method, and data requirements vary from one model to

Table I. Key Characteristics of the Parametric Models

Single parameter models (nonshock)

  Basic Parameter
    Model parameters*: Q_1, Q_2, ..., Q_m
    General form for multiple component failure frequency:
      Q_k = Q_k,                        k = 1, 2, ..., m

  Beta Factor
    Model parameters*: Q_t, β
    General form:
      Q_k = (1 - β) Q_t                 k = 1
      Q_k = 0                           m > k > 1
      Q_k = β Q_t                       k = m

Multiple parameter models (nonshock)

  Multiple Greek Letters
    Model parameters*: Q_t, β, γ, δ, ... (m - 1 failure fractions)
    General form:
      Q_k = [1 / C(m-1, k-1)] (Π_{i=1}^{k} ρ_i) (1 - ρ_{k+1}) Q_t
      where ρ_1 = 1, ρ_2 = β, ρ_3 = γ, ρ_4 = δ, ..., ρ_{m+1} = 0

  Alpha Factor
    Model parameters*: Q_t, α_1, α_2, ..., α_m
    General form:
      Q_k = [k / C(m-1, k-1)] (α_k / α_t) Q_t
      where α_t = Σ_{k=1}^{m} k α_k

Multiple parameter models (shock)

  Binomial Failure Rate
    Model parameters*: Q_I, μ, p, ω
    General form:
      Q_1 = Q_I + μ p (1 - p)^(m-1)     k = 1
      Q_k = μ p^k (1 - p)^(m-k)         m > k > 1
      Q_m = μ p^m + ω                   k = m

* Refer to the text for the definition of the various parameters.

another. However, with the current state of data that involve large uncertainties, the
numerical impact of selecting one model over another is not significant, given a
consistent treatment of data in all cases. These points become clearer in the
following sections. The remainder of this section deals with a brief description of
the various parametric models summarized in Table I.

3.1 SINGLE PARAMETER MODELS

The single parameter models refer to those parametric models that use one
parameter in addition to the total component failure probability to calculate the
common cause failure probabilities. The most widely used single parameter model,
and the first such model to be applied to common cause events in applied risk and
reliability analysis, is known as the beta factor model (Reference 4). A variant of
this model, called the C-factor method (References 5 and 6), employed the same
model, but, in order to address the incompleteness of the data sources, used a
different method of estimating the parameter. According to the beta factor model, a
fraction (β) of the component failure rate can be associated with common cause
events shared by the other components in that group. According to this model,
whenever a common cause event occurs, all components within the common cause
component group are assumed to fail. Therefore, based on this model, for a group
of m components, all Q_k's defined in Equation (7) are zero except Q_1 and Q_m. The
last two quantities are written as

Q_1 = (1 - β) Q_t
                         (11)
Q_m = β Q_t

This implies that

β = Q_m / (Q_1 + Q_m)    (12)

Note that Q_t, the total failure probability of one component, is given as

Q_t = Q_1 + Q_m    (13)

which is the special case of Equation (9) when Q_2 = Q_3 = ... = Q_{m-1} = 0.

As an example, using the beta factor model, the terms representing the basic
events in Equation (6) are written as

Q_1 = (1 - β) Q_t

Q_2 = 0    (14)

Q_3 = β Q_t

which gives

Q_S = 3 (1 - β)^2 Q_t^2 + β Q_t    (15)

As can be seen, the beta factor model requires that an estimate of the total failure
rate of the component be provided from generic sources of data and that a
corresponding estimate for the beta factor also be provided. A practical and useful
feature of this model is that the estimators of β do not explicitly depend on system
or component success data, which are not generally available. This feature, the fact
that estimates of the β parameter for widely different types of components vary
much less than estimates of Q_k, and the simplicity of the model are the main
reasons for the wide use of this method in risk and reliability studies. It should be
noted, however, that estimating β factors, just as with any reliability analysis
parameter, requires specific assumptions concerning the interpretation of data
(Reference 7).
Although historical data collected from the operation of nuclear power plants
indicate that common cause events do not always fail all redundant components,
experience from using this simple model shows that, in many cases, it gives
reasonably accurate (only slightly conservative) results for redundancy levels up to
about three or four items. However, beyond such redundancy levels, this model
generally yields results that are conservative. When interest centers around
specific contributions from third or higher order trains, more general parametric
models are recommended.
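The following Python sketch (an illustration added here, not part of the original text)
quantifies the two-out-of-three example with the beta factor model, Equations (14)
and (15); the values of Q_t and β are assumed and used only for illustration:

def beta_factor_q(qt, beta, m):
    # Returns [Q_1, ..., Q_m] under the beta factor model: only Q_1 and Q_m are nonzero.
    q = [0.0] * m
    q[0] = (1.0 - beta) * qt      # Q_1: independent failure of a single component
    q[m - 1] = beta * qt          # Q_m: common cause failure of all m components
    return q

qt, beta = 1.0e-3, 0.1            # assumed values, for illustration only
q1, q2, q3 = beta_factor_q(qt, beta, 3)
q_system = 3.0 * q1**2 + 3.0 * q2 + q3    # Equation (6); equals Equation (15)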

3.2 MULTIPLE PARAMETER MODELS

For a more accurate analysis of systems with higher levels of redundancy, models
that represent the range of impact levels that common cause events can have are
more appropriate. These models involve several parameters with which to quantify
the specific contribution of various basic events.
Three such models are selected here to provide adequate representation of the
methods that have been proposed. In the nonshock model category, the multiple
Greek letter (MGL) model (Reference 8) and the alpha factor model (Reference 9)
are discussed. The shock model category is represented by the binomial failure
rate model (References 10 and 11). These models are briefly described in the
following paragraphs.

3.2.1 Multiple Greek Letter Model. The MGL model (Reference 8) is the most
general of a number of recent extensions of the beta factor model. The MGL model
was the one used most frequently in the International Common Cause Failure
Reliability Benchmark Exercise (Reference 12). In this method, other parameters in
addition to the β factor are introduced to distinguish among common cause events
affecting different numbers of components in a higher order redundant system.
The MGL parameters consist of the total component failure frequency, which
includes the effects of all independent and common cause contributions to that
component failure, and a set of failure fractions, which are used to quantify the
conditional probabilities of all the possible ways a common cause failure of a
component can be shared with other components in the same group, given
component failure has occurred. For a system of m redundant components and for
each given failure mode, m different parameters are defined. For example, the first
four parameters of the MGL model are, as before,

Q_t = total failure frequency of the component due to all
      independent and common cause events.

plus

β = conditional probability that the common cause of a component
    failure will be shared by one or more additional components.

γ = conditional probability that the common cause of a component
    failure that is shared by one or more components will be shared
    by two or more components in addition to the first.

δ = conditional probability that the common cause of a component
    failure that is shared by two or more components will be shared
    by three or more components in addition to the first.

The general equation that expresses the frequency of multiple component failures
due to common cause, Qk, in terms of the MGL parameters, is given in Table I.
To see how these parameters can be used in developing the probabilities of the
basic events, consider the three-component system represented by Equation (6).
The maximum number of components that can share a common cause is three
(m = 3). Therefore, γ is the conditional probability that the common cause of
failure of a component will be shared by exactly two additional components, and
δ = 0.
Then, from Table I,

Q_1 = (1 - β) Q_t

Q_2 = (1/2) β (1 - γ) Q_t    (16)

Q_3 = β γ Q_t

The above expressions for Q_1, Q_2, and Q_3 can be used, for example, in
Equation (6) to obtain the unavailability of a two-out-of-three system in terms of the
MGL parameters:

Q_S = 3 (1 - β)^2 Q_t^2 + (3/2) β (1 - γ) Q_t + β γ Q_t    (17)

Note that the beta factor model is a special case of the MGL model. For this
example, the MGL model reduces to the beta factor model if γ = 1. In particular,
Equation (17) reduces to Equation (15) if γ = 1.
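A small Python sketch (added for illustration; not part of the original text) of the general
MGL expression for Q_k given in Table I, checked against Equation (16) for m = 3; the
parameter values are assumed:

from math import comb, prod

def mgl_q(qt, fractions, m, k):
    # fractions = [beta, gamma, delta, ...]; by convention rho_1 = 1 and rho_{m+1} = 0.
    rho = [1.0] + list(fractions) + [0.0] * (m + 1)
    return prod(rho[:k]) * (1.0 - rho[k]) * qt / comb(m - 1, k - 1)

qt, beta, gamma = 1.0e-3, 0.1, 0.3       # assumed values, for illustration only
q1 = mgl_q(qt, [beta, gamma], 3, 1)      # (1 - beta) * Qt
q2 = mgl_q(qt, [beta, gamma], 3, 2)      # (1/2) * beta * (1 - gamma) * Qt
q3 = mgl_q(qt, [beta, gamma], 3, 3)      # beta * gamma * Qt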

3.2.2 Alpha Factor Model. As explained in References 13 through 15, rigorous
estimators for the β-factor model and its generalization, the MGL model parameters,
are fairly difficult to obtain, although approximate methods have been developed and
used in practice (Reference 16). A rigorous approach to estimating β factors is
presented in Reference 14 through the introduction of an intermediate event-based
parameter, which is much easier to estimate from observed data. Reference 9 uses
the multiparameter generalization of event-based parameters directly to estimate
the common cause basic event probabilities. This multiparameter common cause
model is called the α-factor model.
The difference between the α-factor parameters and the MGL parameters is that
the former are system-failure based, while the latter are component-failure based.
The α-factor parameters are thus more directly related to the observable number of
events than are the MGL parameters.
Like the MGL model, the alpha-factor model develops common cause failure
frequencies from a set of failure ratios and the total component failure rate. The
parameters of the alpha-factor model are defined as

Q_t = total failure frequency of each component due to all
      independent and common cause events.

α_k = fraction of the total frequency of failure events that occur
      in the system involving the failure of k components due to
      a common cause,

and

α_1 + α_2 + ... + α_m = 1

The general equation relating the basic event probabilities, the Q_k's, to the α-factor
model parameters is given in Table I. As we can see, the key difference between
the α_k's in this model and the parameters of the MGL and β-factor models is that the
former are fractions of the events that occur within a system, whereas the latter are
fractions of component failure rates.
Again, as an example, the probabilities of the basic events of the
three-component system of Equation (6), in terms of the α-factor model parameters,
are written as (from the general equation in Table I, with m = 3)

Q_1 = (α_1 / α_t) Q_t

Q_2 = (α_2 / α_t) Q_t    (18)

Q_3 = 3 (α_3 / α_t) Q_t

where α_t = α_1 + 2 α_2 + 3 α_3, a normalizing factor.

Therefore, the system unavailability for our example [Equation (6)] is given by

Q_S = 3 (α_1 / α_t)^2 Q_t^2 + 3 (α_2 / α_t) Q_t + 3 (α_3 / α_t) Q_t    (19)
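The same calculation in Python for the α-factor model (an added illustration, not part of
the original text; the α values are assumed and merely need to sum to one):

from math import comb

def alpha_factor_q(qt, alphas, k):
    # alphas = [alpha_1, ..., alpha_m]; returns Q_k per the Table I expression.
    m = len(alphas)
    alpha_t = sum(j * a for j, a in enumerate(alphas, start=1))   # normalizing factor
    return k * alphas[k - 1] * qt / (comb(m - 1, k - 1) * alpha_t)

qt = 1.0e-3                          # assumed total failure frequency
alphas = [0.95, 0.04, 0.01]          # assumed alpha_1, alpha_2, alpha_3
q1, q2, q3 = (alpha_factor_q(qt, alphas, k) for k in (1, 2, 3))
q_system = 3.0 * q1**2 + 3.0 * q2 + q3    # Equation (19)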

3.2.3 Binomial Failure Rate (BFR) Model. The BFR model (References 10 and 11)
considers two types of failures. The first represents independent component
failures; the second type is caused by shocks that can result in failure of any
number of components in the system. According to this model, there are two types
of shocks: lethal and nonlethal. When a nonlethal shock occurs, each component
within the common cause component group is assumed to have a constant and
independent probability of failure. The name of this model arises from the fact that,
for a group of components, the distribution of the number of failed components
resulting from each nonlethal shock occurrence follows a binomial distribution. The
BFR model is therefore more restrictive, because of these assumptions, than all
other multiparameter models presented in Table I. When originally presented and
applied, the model only included the nonlethal shock. Because of its structure, the
model tended to underestimate the probabilities of failure of higher order groups of
components in a highly redundant system; therefore, the concept of lethal shock
was included. This version of the model is the one recommended.
When a lethal shock occurs, all components are assumed to fail with a conditional
probability of unity. Application of the BFR model with lethal shocks requires the
use of the following set of parameters.

Q_I = independent failure frequency for each component.

μ = frequency of occurrence of nonlethal shocks.

p = conditional probability of failure of each component,
    given a nonlethal shock.

ω = frequency of occurrence of lethal shocks.

The general form of the probability of basic events according to the BFR model is
given in Table I.
As an example, using this model, the probabilities of the basic events in
Equation (6) are written as

Q_1 = Q_I + μ p (1 - p)^2

Q_2 = μ p^2 (1 - p)    (20)

Q_3 = μ p^3 + ω

Therefore,

Q_S = 3 [Q_I + μ p (1 - p)^2]^2 + 3 μ p^2 (1 - p) + μ p^3 + ω    (21)

It should be noted that the basic formulation of the BFR model was introduced in
terms of the rate of occurrence of failures in time, such as failure of components to
continue running while in operation. Here, consistent with our presentation of other
models, the BFR parameters are presented in terms of general frequencies that can
apply both to failures in time and to failures on demand for standby components.
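A corresponding Python sketch for the BFR model (an added illustration, not part of the
original text; all parameter values are assumed):

def bfr_q(q_i, mu, p, omega, m):
    # Returns [Q_1, ..., Q_m] for the binomial failure rate model with lethal shocks.
    q = [mu * p**k * (1.0 - p)**(m - k) for k in range(1, m + 1)]
    q[0] += q_i           # independent failures add to the single-failure term
    q[-1] += omega        # lethal shocks fail all m components
    return q

q_i, mu, p, omega = 1.0e-3, 1.0e-4, 0.2, 1.0e-5   # assumed values, illustration only
q1, q2, q3 = bfr_q(q_i, mu, p, omega, 3)          # Equation (20)
q_system = 3.0 * q1**2 + 3.0 * q2 + q3            # Equation (21)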

4. References

1. Mosleh, A., K. N. Fleming, G. W. Parry, H. M. Paula, D. H. Worledge, and
   D. M. Rasmuson, "Procedures for Treating Common Cause Failure Events in
   Safety and Reliability Studies," Vols. I and II, EPRI NP-5613 and
   NUREG/CR-4780, 1988.

2. Mosleh, A., "Estimation of Parameters of Common Cause Failure Models," this
   volume.

3. Fleming, K. N., and A. Mosleh, "Classification and Analysis of Reactor
   Operating Experience Involving Dependent Events," Electric Power Research
   Institute, EPRI NP-3967, February 1985.

4. Fleming, K. N., "A Reliability Model for Common Mode Failure in Redundant
   Safety Systems," Proceedings of the Sixth Annual Pittsburgh Conference on
   Modeling and Simulation, General Atomic Report GA-A13284, April 23-25, 1975.

5. Evans, M., G. Parry, and J. Wreathall, "On the Treatment of Common Cause
   Failures in System Analysis," Reliability Engineering, Vol. 9, pp. 107-115, 1984.

6. Parry, G. W., "Incompleteness in Data Bases: Impact on Parameter Estimation
   Uncertainty," SRA 1984 Annual Meeting, 1984.

7. Parry, G. W., "Technical Note: Modeling Uncertainty in Parameter Estimation,"
   Nuclear Safety, Vol. 27, p. 212, 1986.

8. Fleming, K. N., and A. M. Kalinowski, "An Extension of the Beta Factor Method
   to Systems with High Levels of Redundancy," Pickard, Lowe and Garrick, Inc.,
   PLG-0289, June 1983.

9. Mosleh, A., and N. O. Siu, "A Multi-Parameter, Event-Based Common-Cause
   Failure Model," Paper M7/3, Proceedings of the Ninth International Conference
   on Structural Mechanics in Reactor Technology, Lausanne, Switzerland, August
   1987.

10. Vesely, W. E., "Estimating Common Cause Failure Probabilities in Reliability
    and Risk Analyses: Marshall-Olkin Specialization," IL-0454, 1977.

11. Atwood, C. L., "Common Cause Fault Rates for Pumps," NUREG/CR-2098,
    prepared for U.S. Nuclear Regulatory Commission by EG&G Idaho, Inc.,
    February 1983.

12. Poucet, A., A. Amendola, and P. C. Cacciabue, "Summary of the Common
    Cause Failure Reliability Benchmark Exercise," Joint Research Centre Report,
    PER 1133/86, Ispra, Italy, April 1986.

13. Apostolakis, G., and P. Moieni, "On the Correlation of Failure Rates," Reliability
    Data Collection and Use in Risk and Availability Assessment, Proceedings of the
    Fifth EUREDATA Conference, Heidelberg, Germany, April 9-11, 1986, published
    by Springer-Verlag, Berlin/Heidelberg, 1986.

14. Paula, H. M., "Comments on 'On the Analysis of Dependent Failures in Risk
    Assessment and Reliability Evaluation'," Nuclear Safety, Vol. 27, No. 2,
    April-June 1986.

15. Apostolakis, G., and P. Moieni, "The Foundations of Models of Dependence in
    Probabilistic Safety Assessment," Reliability Engineering, Vol. 18, pp. 177-195,
    1987.

16. Mosleh, A., "Hidden Sources of Uncertainty: Judgment in Collection and
    Analysis of Data," Nuclear Engineering and Design, Vol. 93, 1986.

ESTIMATION OF PARAMETERS OF COMMON CAUSE
FAILURE MODELS

A. MOSLEH*
Pickard, Lowe and Garrick, Inc.
2260 University Drive
Newport Beach, CA 92660 USA

ABSTRACT. This paper presents an overview of current understanding of several important


issues regarding estimation of parameters of common cause failure models. The
presentation is primarily based on the treatment of the subject in a two volume document on
procedures for treating common cause failures in safety and reliability studies (Reference 1).
The two main topics discussed are: (1) event data classification and screening, and (2)
parameter estimation. Techniques for developing a pseudo plant-specific database from
generic data are discussed and various sources of uncertainties are investigated.

1. Introduction

In recent years, a number of models have been introduced and used for
quantification of the contribution of common cause failures to the unavailability of
systems. These include single and multiple parameter models, most of which are
described in some detail in another article presented in this volume (Reference 2).
Experience seems to indicate that, given the same treatment of data, all models with
the same level of detail give comparable numerical values for the frequency of
common cause basic events (References 3 and 4). Consequently, what seems to be
driving the answers is the basic information used in estimating various model
parameters. In particular, the quality of failure event reports and the judgment of
the analyst in translating the qualitative and quantitative information content of those
reports into basic statistics for parameter estimation play an important role.
Incompleteness of the data bases and lack of detail in most event reports are
facts of life, and the use of judgment by the data analyst is unavoidable

*Current address:
Department of Chemical and Nuclear Engineering
University of Maryland
College Park, MD 20742

(Reference 5). It is important, therefore, to follow a systematic procedure which
considers the various issues involving the use of data and ensures the most
consistent and effective use of operating experience in estimating common cause
model parameters. This paper focuses on such a procedure, and is primarily based
on the material presented in Reference 1. It is assumed that the reader is familiar
with the content of Reference 2, and in particular with the definition for different
models and their parameters. The models for which estimators are provided here
are basic parameter, multiple Greek letter (MGL), beta factor, alpha-factor, and
binomial failure rate (BFR).
The rest of the paper is organized into two sections addressing the two main
steps of data analysis. Section 2 presents a procedure for failure event data
classification and screening. Among the topics discussed are available data
sources and techniques for developing a plant-specific common cause failure data
base. Section 3 discusses the issues involving parameter estimation and the
various sources of uncertainty.

2. Data Classification and Screening

Ideally, the numerical value of the parameters of the common cause failure models
should be estimated in a manner that makes the maximum possible use of event
data; i.e., reports of operating experience. This requires review, evaluation, and
classification of the available information to obtain specialized failure data. Because
common cause failures can dominate the results of reliability and safety analysis, it
is extremely important that this analysis of data is performed within a context that
represents the engineering and operational aspects of the system being modeled.
Due to the rarity of common cause events and the limited experience of individual
plants, the amount of plant-specific data for common cause analysis is very limited.
Therefore, in almost all cases, we need to use data from the industry experience
and a variety of sources to make statistical inferences about the frequencies of the
common cause events. However, due to the fact that there is a significant variability
in plants, especially with regard to the coupling mechanisms and defenses against
common cause events (Reference 1), the industry experience is not, in most cases,
directly applicable to the specific plant being analyzed although much of it may be
indirectly applicable. Also, and perhaps equally important, the analysis boundary
conditions that dictate what category of components and causes should be
analyzed, require careful review and screening of events to ensure consistency of
the data base with the assumptions of the system model, its boundary conditions,
and other qualitative aspects delineated in the analysis.
The significance of this step has also been emphasized by Reference 4 since an
important conclusion of the Common Cause Failure Reliability Benchmark Exercise
(CCF-RBE) was that the most important source of uncertainty and variation in the

numerical results is data interpretation. Thus, careful attention and documentation
must be given to this step.

2.1 DATA SOURCES

The first step in data analysis is the data gathering task. This involves reviewing
the existing data sources, which generally fall into one of the following categories:

- Plant-Specific Raw Data Records; e.g., component maintenance records,
  operator logs, etc.

- Generic Raw Data Compilations; e.g., References 6 through 12

- Generically Classified Event Data and Estimated Parameters; e.g., References
  13 through 16

The quality and level of detail of the information provided by these sources varies
significantly. The best sources of information are the plant records or "raw data."
However, since common cause events are rare, it is necessary to collect data from a
large number of plants even when the analysis is concerned with a particular plant
or system. This, however, will be costly and impractical in the scope of
plant-specific analyses.
The next best source is a generic raw data compilation, such as LERs, where
failure events, which are selected and interpreted by the compiler, are described.
The obvious drawback of this type of data source is possible incompleteness with
respect to the number of events presented and the fact that the event descriptions
are often summarized and are based on an analyst's interpretation.
Finally, sources that provide classified event reports and/or generic estimates for
model parameters can be and are widely used when resources are limited. However,
as we will discuss later, this is the least desirable option since common cause
events exhibit significant plant-to-plant variability which must be considered in
estimating model parameters.

2.2 DATA CLASSIFICATION

Once the raw data (event reports) are collected, the next step is a review and
classification of the events to identify where each event fits in a set of predefined
categories describing the type of the event, its cause(s), and its impact; e.g.,
number of components failed. For this purpose, a data classification approach, such
as one developed for Electric Power Research Institute (EPRI) (References 16
and 17) is needed.
The EPRI classification system makes use of a cause-effect logic diagram to
portray the interactions between root causes and component states in an event.

Once the event scenario deduced from an event report is modeled in this way,
dependent events are easily identified and their impact on the original system can
be readily seen.
This classification of event reports is a rather subjective exercise, particularly in
light of the quality of many of the event reports. In an attempt to reduce subjectivity
in the screening of event data to identify common cause failures, the CCF-RBE
(Reference 4) identified the following rules, which have been somewhat modified.

1. Component-caused functional unavailabilities were screened out since it was


assumed that this kind of dependency is modeled explicitly.

2. If a specific defense exists that clearly precludes a class of events, all


specific events belonging to that class can be screened out.

3. If the cause of the reported event is a train interconnection that, in the plant
under consideration, does not exist, the event is considered as an
independent failure of one train.

4. Events related to inapplicable plant conditions (e.g., preoperational testing,


etc.) can be screened out unless they reveal general causal mechanisms
capable of occurring during power operation.

5. If the event occurred during shutdown and would be restored before


resuming power operation because of preservice testing or if it cannot occur
during power operation, the event is screened out.

6. If a second failure in an event happened after the restoration of the first, both
failures are considered as independent failures.

7. Events regarding incipient failure modes (e.g., packing leak, etc.) that clearly
do not violate component success criteria can be screened out.

8. Only the events regarding the failure modes of interest were taken into
consideration; events regarding failure modes that are irrelevant to the
system logic model can be screened out.

Rules 2 and 3 are more directed to the screening of events for applicability to
other plants. More detailed discussion about event screening can be found in
References 1 and 16.

2.3 EVENT IMPACT ASSESSMENT

A useful tool towards developing statistical data from event descriptions is to


summarize the outcome of the event classification process up to this point in a form
similar to the example given in Figure 1(a).

Plant (Date): Pilgrim (September 1976)
Status: 95% Power
Event Description: Two residual heat removal torus cooling valves failed to
operate. It was found that the failure was due to excessive pressure differential
across the valves, which exceeded the capacity of the valve motors.
Cause-Effect Diagram: [diagram relating operators and valves; not reproduced]

Figure 1(a). Example of Event Classification and Impact Assessment - Event
Classification

To complete the description of the event impact at the original plant, the analyst
needs to identify the following:

1. Component Group Size. The number (m) of (typically similar) components
   that are believed to have been exposed to the root cause and coupling
   mechanism of the event.

2. Number of Components Affected. The number of components within the
   component group that were affected (e.g., failed) in the event.

3. Shock Type. Whether the cause(s) and coupling mechanism(s) involved
   were of the type that typically results in the failure of all components within
   the component group (lethal shock) or not (nonlethal shock).

4. Failure Mode. The particular component function affected; e.g., failure to
   open on demand.

Figure 1(b) summarizes the information about the event for the example event
described in Figure 1(a) and introduces the representation called the impact vector
(References 3 and 5).

Component Group Size: 2
Impact Vector (F_0, F_1, F_2): 0, 0, 1
Shock Type: Nonlethal (N)
Fault Mode: Fail to Open on Demand

Figure 1(b). Example of Event Classification and Impact Assessment - Event
Impact Assessment

The binary impact vector of an event that has occurred in a common cause
component group of size m has m + 1 elements.*
Each element represents the number of components that can fail in an event. If,
in an event, k components are failed, then a 1 is placed in the F_k position of the
binary impact vector, with 0 in all other positions. In the example of Figure 1, the
component group size is 2; therefore, the binary impact vector has three elements:
{F_0, F_1, F_2}. Since two components were failed, we have F_0 = F_1 = 0 and F_2 = 1.
A condensed representation is

I = {0, 0, 1}    (1)

Most of the time, however, the event descriptions are not clear, the exact states
of components are not always known, and root causes are seldom identified.
Therefore, the interpretation of the event [i.e., the translation of the event
descriptions into a form similar to the example in Figures 1(a) and 1(b)] may require
establishing several hypotheses, each representing a different interpretation of the
event.
As an example, consider the event classified in Figure 2(a). Since it is not clear
whether the third diesel was also actually failed, the binary impact vector is
assessed under two different hypotheses [Figure 2(b)]. Under the first hypothesis,
only two diesels are considered failed, while, according to the second hypothesis,
all three diesels were failed. The analyst at this point needs to assess his or her
degree of confidence in each of the two hypotheses. In the example of Figure 2(b),
a weight of 0.9 is given to the first hypothesis, reflecting a very high degree of
confidence that only two diesels were actually failed. The weight for the second

"Common cause component group is defined as a group of (usually similar)


components that are considered to have a high potential of failing due to the same
cause (Reference 1).

hypothesis is obviously 0.1 since the weights should add up to 1. This property of
the weighting factors assumes all reasonable hypotheses are accounted for. Note
that the data analyst must be in a position to defend and document this assessment.

Plant (Date): Maine Yankee (August 1977)
Event Description: Two diesel generators failed to run due to plugged radiators.
The third unit radiator was also plugged.
Cause-Effect Diagram: [diagram relating the cooling system and the diesel
generators; not reproduced]

Figure 2(a). Example of the Assessment of Impact Vectors Involving Multiple
Interpretation of Event - Event Classification

Component Group Size: 3
Shock Type: Nonlethal (N)
Fault Mode: Failure during Operation

Hypothesis    Probability    F_0   F_1   F_2   F_3
   I_1            0.9         0     0     1     0
   I_2            0.1         0     0     0     1

Average Impact                P_0   P_1   P_2   P_3
Vector (Ī)                     0     0    0.9   0.1

Figure 2(b). Example of the Assessment of Impact Vectors Involving Multiple
Interpretation of Event - Multiple Hypothesis Impact Vector Assessment

The expectation value of the impact vector, taken over the two hypotheses, is

Ī = {P_0, P_1, P_2, P_3} = (0.9) I_1 + (0.1) I_2
                         = {0, 0, 0.9, 0.1}    (2)

which is also shown in Figure 2(b). Note that F refers to a single binary impact
vector and P refers to an average impact vector.
This may be used for point estimation.
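A short Python sketch of this averaging step (added for illustration; not part of the
original text):

def average_impact_vector(hypotheses):
    # hypotheses = list of (weight, binary impact vector) pairs; weights sum to 1.
    length = len(hypotheses[0][1])
    return [sum(w * vec[j] for w, vec in hypotheses) for j in range(length)]

I1 = [0, 0, 1, 0]    # hypothesis 1: exactly two of the three diesels failed
I2 = [0, 0, 0, 1]    # hypothesis 2: all three diesels failed
print(average_impact_vector([(0.9, I1), (0.1, I2)]))   # -> [0.0, 0.0, 0.9, 0.1]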

2.4 REINTERPRETATION OF EVENTS: CREATION OF "PLANT-SPECIFIC" DATA BASE

Up to this point, the event has been analyzed for the original plant. The next step is
to determine what that event implies for the plant and system that are being
analyzed. As was mentioned earlier, the same qualitative and quantitative
information obtained, based on the event at the original plant, may not be directly
applicable to the plant and system of interest due to several reasons, such as
differences in design, operation, common cause defenses, etc. It is therefore
essential to reinterpret the event in light of the specific characteristics of the system
under consideration.
In general, the differences between the system in which the data originated and
the system being analyzed arise in two ways: First, even for systems of the same
size, there are physical differences in system design, component type, operating
conditions, environment, etc.; second, there can be a difference in system size
(degree of redundancy).
In the following, a framework is described with which these two types of
differences can be taken into account explicitly in reinterpretation of the event and
the assessment of the impact vector for the system of interest.

2.4.1 Systems of the Same Size. First, we consider the differences, given the
assumption that the system size is the same. The question to be answered is the
following: given all the qualitative differences between the two systems, could the
same root cause(s) and coupling mechanism(s) occur in the system being analyzed?
In reality, this step involves a considerable amount of judgment. There are a
number of sources of uncertainty. These include the lack of detailed information
about the event, its circumstances, the nature of its causes, the nature of defenses
in the original system, and the effectiveness of defenses in the system being
analyzed. Yet, because of the sparsity of data, there is strong motivation to avoid
tossing the data out and to extract from it that evidence that is applicable. Due to
uncertainties involved and the important implications of screening events out of the
data base by declaring them inapplicable, the analyst must have a concrete reason
for his judgment. In the cases in which the analyst is uncertain about whether an

event is applicable or not, the impact vector of the original system may be modified
by a weight reflecting the degree of applicability of the event, as viewed by the
analyst. This is similar to the multiple hypothesis situation discussed earlier.
Hence, the alternative hypotheses are: (1) applicable with probability p and (2) not
applicable with probability (1 - p).

2.4.2 Ad justments for Size Difference. The next step is to consider the system size
differences. The objective is to estimate or infer what the data base of applicable
events would look like if it all was generated by systems of the same size (i.e., the
number of components in each common cause group) as the system being
analyzed. This is done by simulating, in a thought experiment, the occurrence of
causes of failures (both independent and dependent) in the system of interest and
observing how the impact of these causes changes due to difference in system size.
Reference 1 provides a detailed discussion of the background and justification of the
need for adjustment in an impact assessment based on system size differences.*
Reference 1 also develops a set of rules and equations for changing the event
impact vectors of the original system to a corresponding set for the system being
analyzed.
The rules are presented for the following cases:

1. Mapping Down. The case in which the component group size in the original
system is larger than in the system being analyzed.

2. Mapping Up. The case in which the component group size in the original
system is smaller than in the system being analyzed.

2.4.3 Mapping Down Impact Vectors. A complete set of formulas for mapping down
data from systems having four, three, or two components to any identical system
having fewer components is presented in Table I. In this table, P_k^(m) represents the
k-th element of the average impact vector in a system (or component group) of size
m. The formulas show how to obtain the elements of the impact vector for smaller
size systems when the elements of the impact vector of a larger system are known.

*The numerical importance of this adjustment was first explained by Peter Doerre
of Kraftwerk Union, Federal Republic of Germany, as part of a contribution to the
CCF Reliability Benchmark Exercise (Reference 4). The particular mapping
method presented here is one of several different ways that the impact vectors
can be mapped (see Reference 18 for an example).

Table I. Formulas for Mapping Down Event Impact Vectors

Mapping from a system of four components:

  To three components:
    P_0^(3) = P_0^(4) + (1/4) P_1^(4)
    P_1^(3) = (3/4) P_1^(4) + (1/2) P_2^(4)
    P_2^(3) = (1/2) P_2^(4) + (3/4) P_3^(4)
    P_3^(3) = (1/4) P_3^(4) + P_4^(4)

  To two components:
    P_0^(2) = P_0^(4) + (1/2) P_1^(4) + (1/6) P_2^(4)
    P_1^(2) = (1/2) P_1^(4) + (2/3) P_2^(4) + (1/2) P_3^(4)
    P_2^(2) = (1/6) P_2^(4) + (1/2) P_3^(4) + P_4^(4)

  To one component:
    P_0^(1) = P_0^(4) + (3/4) P_1^(4) + (1/2) P_2^(4) + (1/4) P_3^(4)
    P_1^(1) = (1/4) P_1^(4) + (1/2) P_2^(4) + (3/4) P_3^(4) + P_4^(4)

Mapping from a system of three components:

  To two components:
    P_0^(2) = P_0^(3) + (1/3) P_1^(3)
    P_1^(2) = (2/3) P_1^(3) + (2/3) P_2^(3)
    P_2^(2) = (1/3) P_2^(3) + P_3^(3)

  To one component:
    P_0^(1) = P_0^(3) + (2/3) P_1^(3) + (1/3) P_2^(3)
    P_1^(1) = (1/3) P_1^(3) + (2/3) P_2^(3) + P_3^(3)

Mapping from a system of two components:

  To one component:
    P_0^(1) = P_0^(2) + (1/2) P_1^(2)
    P_1^(1) = (1/2) P_1^(2) + P_2^(2)

Note: The term P_0^(4) is included for completeness, but, in practice, any evidence
that might exist about causes that impact no components in a four-train system
would be "unobservable."

2.4.4 Mapping Up Impact Vectors. It can be seen from the results presented above
that downward mapping is deterministic; i.e., given an impact vector for an identical
system having more components than the system being analyzed, the impact vector
for a system of the same size as the one being analyzed can be calculated without
introducing any new uncertainties. Mapping up, however, as shown in Reference 1,
is not deterministic.
To reduce the uncertainty inherent in upward mapping of impact vectors,
use is made of a powerful concept that is the basis of the binomial failure rate (BFR)
common cause model. This concept is that all events can be classified into one of
three categories:

1. Independent Events. Causal events that act on components singly and


independently.

2. Nonlethal Shocks. Causal events that act on the system as a whole with
some chance that any number of components within the system can fail.
Alternatively, nonlethal shocks can occur when a causal event acts on a
subset of the components in the system.

3. Lethal Shocks. Causal events that always fail all the components in the
system.

When enough is known about the cause (i.e., root cause and coupling
mechanism) of a given event, it can usually be classified in one of the above
categories without difficulty. If an event is identified as being either an independent
event or lethal shock, the impact vectors can be mapped upward deterministically,
as shown below. It is only in the case of nonlethal shocks that an added element of
uncertainty is introduced on mapping upward. How each event is handled is
separately summarized below.

2.4.5 Mapping Up Independent Events. In this case, since the number of
independent events in the data base is simply proportional to the number of
components in the system, it can be shown that P_I^(l) and P_I^(k), the numbers of
independent events in systems of sizes l and k, respectively, are related by the
following equation:

P_I^(k) = (k / l) P_I^(l)    (3)

2.4.6 Mapping Up Lethal Shocks. By definition, a lethal shock wipes out all the
redundant components present within a common cause group. From this, the
following simple relationship follows for mapping from a group of size l to a group
of size k:

P_k^(k) = P_l^(l)    (4)

Hence, for lethal shocks, the impact vector is mapped directly.

2.4.7 Mapping Up Nonlethal Shocks. Nonlethal shock failures are viewed as the
result of a nonlethal shock that acts on the system at a rate that is independent of
the system size. For each shock, there is a constant probability, p, that each
component fails; that is, p is the conditional probability of failure of each component,
given a shock.
Table II includes formulas to cover all the upward mapping possibilities with
system sizes up to four. In the limiting cases of p = 0 and p = 1, the formulas in
Table II become identical to Equation (3) (mapping up independent events) and
Equation (4) (mapping up lethal shocks), respectively.
While it is the analyst's responsibility to assess, document, and defend his
assessment of the parameter p, some simple guidelines should help in its
quantification.

- If an event is classified as a nonlethal shock and it fails only one component
  of a group of three or more components, it is reasonable to expect that p is
  small (p < 0.5).

- If a nonlethal shock fails a number of components intermediate to the number
  present, it is unreasonable to expect that p is either very small (p → 0) or
  very large (p → 1).

- If a nonlethal shock fails all the components present in a system, it is
  reasonable to expect that p is large (p > 0.5).

Table II. Formulas for Upward Mapping of Events Classified as Nonlethal Shocks

Mapping from a system of one component:

  To two components:
    P_1^(2) = 2 (1 - p) P_1^(1)
    P_2^(2) = p P_1^(1)

  To three components:
    P_1^(3) = 3 (1 - p)^2 P_1^(1)
    P_2^(3) = 3 p (1 - p) P_1^(1)
    P_3^(3) = p^2 P_1^(1)

  To four components:
    P_1^(4) = 4 (1 - p)^3 P_1^(1)
    P_2^(4) = 6 p (1 - p)^2 P_1^(1)
    P_3^(4) = 4 p^2 (1 - p) P_1^(1)
    P_4^(4) = p^3 P_1^(1)

Mapping from a system of two components:

  To three components:
    P_1^(3) = (3/2) (1 - p) P_1^(2)
    P_2^(3) = p P_1^(2) + (1 - p) P_2^(2)
    P_3^(3) = p P_2^(2)

  To four components:
    P_1^(4) = 2 (1 - p)^2 P_1^(2)
    P_2^(4) = (5/2) p (1 - p) P_1^(2) + (1 - p)^2 P_2^(2)
    P_3^(4) = p^2 P_1^(2) + 2 p (1 - p) P_2^(2)
    P_4^(4) = p^2 P_2^(2)

Mapping from a system of three components:

  To four components:
    P_1^(4) = (4/3) (1 - p) P_1^(3)
    P_2^(4) = p P_1^(3) + (1 - p) P_2^(3)
    P_3^(4) = p P_2^(3) + (1 - p) P_3^(3)
    P_4^(4) = p P_3^(3)
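A recursion that is consistent with the entries of Table II (an added illustration, not part
of the original text) maps an event up one train at a time: the added train fails with the
assessed probability p, while the number of single-failure events additionally scales with
system size. A Python sketch of this recursion is:

def map_up_one(p_vec, p):
    # p_vec = [P_1, ..., P_l] for a group of size l; returns [P_1, ..., P_{l+1}].
    l = len(p_vec)
    out = [0.0] * (l + 1)
    out[0] = ((l + 1) / l) * (1.0 - p) * p_vec[0]     # single failures scale with size
    for k in range(2, l + 1):
        out[k - 1] = p * p_vec[k - 2] + (1.0 - p) * p_vec[k - 1]
    out[l] = p * p_vec[l - 1]                         # all components fail
    return out

def map_up(p_vec, p, m):
    while len(p_vec) < m:
        p_vec = map_up_one(p_vec, p)
    return p_vec

# Example: a single failure observed in a two-train group, mapped to four trains with p = 0.5.
print(map_up([1.0, 0.0], 0.5, 4))   # -> [0.5, 0.625, 0.25, 0.0], as given by the table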

2.4.8 Development of Event Statistics From Impact Vectors. Once the impact vectors
of all the events in the data base are assessed for the system being analyzed, the
number of events in each impact category can be calculated by adding the impact
vectors. That is,

n_k = Σ_i P_k(i)    (5)

where

n_k = total number of basic events* involving failure of
      k similar components.

P_k(i) = the k-th element of the i-th impact vector.

Later we will see how the n_k's are used to develop estimates of model parameters.

2.5 SUMMARY OF EVENT SCREENING

The result of this process is a set of impact vectors that summarizes the translation
of industry experience to the plant of interest. It is stressed that, for this to be
complete, the exercise has to be performed not only for the potential dependent
events but also for the independent events. In this process, some events have been
screened out as being inapplicable. The validity of this screening out has been
questioned because it implies that the plant in question is somehow better than the
"generic" plant that possesses all characteristics of all plants and that it has no
hidden causes of failure that other plants do not. Although it is clearly not
reasonable to assume each plant has all the characteristics of all plants in the data
base, this screening must be done with care and with specific, justifiable reasons for
excluding any event.
The natural way to deal with the question is to compensate for the deletion of
events by also reducing the exposure (success data) and the number of
independent events. As an example, suppose that it is felt that, because of
significant design differences, the events occurring in one or more plants in the
generic data base are not applicable. One approach then is to exclude events for
the affected components at these plants from the data base. This implies a smaller
exposure, which affects the direct estimation of the basic event probabilities.
Additionally, if the parametric models are used, this implies that the associated
independent failure events also be excluded. This smaller data base leads to larger
statistical uncertainties, although the parameter estimates themselves may increase
or decrease.
An intermediate but less practical solution is one in which some of the
failure causes of a component apply while others do not. In this case, the events
could be modeled or excluded depending on the cause. For example, events
relating to failures of diesel generators due to electric start motor problems do not
apply to diesel generators with air start systems. On the other hand, generic
causes like human error would still be held to apply to systems with such specific
design differences. Each source of events would then be related to the relevant
exposure. This process is probably beyond the capability of current data systems to
support, and the former procedure of deleting plants from the data base for rejected
events is recommended.

*A common cause basic event is defined as an event involving common cause
failure of a specific subset of components within a common cause component
group (References 1 and 2).

3. Parameter Estimation

The next major step is to use the "pseudo-data" generated in the previous step to
provide estimates of either the basic event probabilities themselves (using the basic
parameter model) or the parameters of the common cause failure models (beta,
BFR, etc.). These estimates are subject to many sources of uncertainty and the
ways in which these are addressed are also discussed here.
The information provided by the set of impact vectors is the number of events in
which 1, 2, 3, ..., and up to m components (where m is the degree of redundancy)
failed. To proceed further, it is necessary in the case of the direct estimation of the
basic event probabilities to have estimates of the exposure of the events to the
failures. The exposure may be measured in terms of the number of demands or the
total time, depending on which reliability model is appropriate for the failure mode
of interest. In the case of the parameters of common cause failure models, it is also
necessary to have at least an estimate of the relative exposures in order to derive
estimators. This is illustrated in the following example, which is included for two
reasons: first and most importantly to illustrate how assumptions made about the
way the events in the data base arose affect the estimation of common cause
event probabilities, and second to illustrate the way by which this pseudo-data base
can be anchored to preexisting estimates of single-component failure probabilities.
The example is the derivation of the estimator for the beta factor for the case of a
two-train redundant system in the failure to start mode. The illustration given is for
the case in which the reliability model chosen is that of a constant probability of
failure on demand. An alternative model, the assumption of a constant failure rate
while on standby, is somewhat different.
Suppose that the evidence from the pseudo-data base is that there are n_1 failures
of single components and n_2 failure events in which both components failed.
Suppose further that an estimate of the total single-component failure probability,
Q_t, already exists. Then, the unknown number of single-component demands, N, in
the pseudo-data base can be estimated by making the identification
Q_t = (n_1 + 2n_2)/N. Now, all that is unknown is the number of times, N_2, that there
was an effective test in the pseudo-data base for the common cause failure. For
most redundant systems in nuclear power plants, the greatest number of demands
comes from surveillance testing so that the answer to this question can come from
knowing the testing strategy, as illustrated below. Consider the following two
strategies, both of which comply with a technical specification requirement that says
that each train must be tested once a month.

Strategy 1. Both components are tested at the same time (or at least in the
same shift). In this case, the number of tests against the common cause can
be said to be N/2. The common cause failure probability therefore is 2n_2/N,
and an appropriate beta factor estimator is

β = 2n_2 / (n_1 + 2n_2)    (6)

This is the familiar estimator found in the PRA Procedures Guide, for example.

Strategy 2. The components are tested at staggered intervals, one every 2
weeks, and, if there is a failure, the second component is tested immediately.
In this case, the number of tests against the common cause is higher
because each successful test of a component is a confirmation of the absence
of the common cause. The number of tests against the common cause failure,
N_2, is related to N by the following equation:

N = N_2 + n_1 + n_2    (7)

The terms n_1 and n_2 arise because of the failure of the first component, which
occurs n_1 times on its own and n_2 times in conjunction with the failure of the
other. In this case, therefore, the common cause failure probability is given
by n_2/N_2, which is approximately n_2/N when n_1 and n_2 are small compared
to N. This is approximately half of the failure probability that results from
assuming the first strategy is correct. The appropriate beta factor in this case
is

β = n_2 / (n_1 + 2n_2)    (8)
This example therefore illustrates the importance of recognizing that specific
estimators are based on particular assumptions about such things as testing
strategy. In general, the testing strategies at the plants in the pseudo-data base
may not be known and will probably be mixed. The two extreme cases here should
bound the real situation.
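A small Python sketch of the two bounding estimators (added for illustration; not part of
the original text; the event counts n_1 and n_2 are assumed):

def beta_nonstaggered(n1, n2):
    # Strategy 1: both trains tested together, so about N/2 tests of the common cause.
    return 2.0 * n2 / (n1 + 2.0 * n2)       # Equation (6)

def beta_staggered(n1, n2):
    # Strategy 2: staggered testing, so roughly N tests of the common cause.
    return n2 / (n1 + 2.0 * n2)             # Equation (8)

n1, n2 = 40, 2                              # assumed event counts, illustration only
print(beta_nonstaggered(n1, n2))            # about 0.091
print(beta_staggered(n1, n2))               # about 0.045, half of the value above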

3.1 POINT ESTIMATES

Table III presents simple point estimates for the various parametric models
described in References 1 and 2, based on the assumption that the data are from
plants in which nonstaggered testing is adopted. Formulas for changing these
estimates to correspond with the staggered testing model are provided in
Reference 1. The estimators are provided in terms of the number of basic events
observed in each common cause impact category (i.e., n_1, n_2, ..., n_m) and, if
necessary, the number of system demands, N_D, which is related to the number of
component demands, N, in the following way: N = m N_D. To obtain the time-based
parameters (e.g., failure during operation), the quantity N_D should be replaced by T,
the cumulative system exposure time; e.g., total number of system operating hours.

3.2 ASSESSMENT OF UNCERTAINTY IN PARAMETER ESTIMATES

The point estimates developed above provide only single values for the parameters of
the models. However, there are numerous sources of uncertainty that must be
taken into account to present a realistic picture of what the analyst knows about the
values of these parameters. In performing uncertainty analysis, it is often sufficient
to develop distributions only for the most important contributors to the system
unavailability, identified by ranking the contributors on the basis of point
estimates.

3.2.1 Sources of Uncertainty. The following provides a brief discussion of the most
important elements of uncertainty and some available techniques for incorporating
these elements in assessing parameter distributions. The uncertainties stem from
one or more of the following sources:

- Uncertainty in data classification and impact vector assessment.

- Uncertainty in estimating success (exposure) data and incompleteness of failure event data sources; e.g., underreporting of independent events.

- Statistical uncertainty dictated by the size of the data sample.

- Variation among plants in equipment, system design, and operations.

Uncertainties about PRA parameters are typically represented in the form of
probability distributions, and it is the mean value of these distributions that is the
most suitable "point estimate" for point calculations. Therefore, it is recommended
that the parameters be estimated with the associated uncertainty distributions even
when uncertainty propagation is not intended in system quantification.
The distribution of the parameters is estimated on the basis of evidence. The
evidence could be statistical when data are available or based on expert opinion.
Bayes' theorem provides a very flexible and powerful framework for incorporating
various types of information in parameter estimation. It is particularly useful when
the evidence is of an uncertain nature as is the case with PRA data in general and
common cause failure data in particular. For this reason, parameter estimation
techniques in this report are presented in the Bayesian framework.

Table III. Simple Point Estimators for Various Parametric Models (a, b, c)

Basic Parameter
    Estimator:   Q_k = n_k / [C(m,k) N_D],   k = 1, ..., m
    Remarks:     The estimator is a maximum likelihood estimator. For time-based
                 failure rates, replace system demands (N_D) with the total system
                 exposure time T.

Multiple Greek Letter
    Estimators:  β = (Σ_{k=2..m} k n_k) / (Σ_{k=1..m} k n_k)
                 γ = (Σ_{k=3..m} k n_k) / (Σ_{k=2..m} k n_k)
                 δ = (Σ_{k=4..m} k n_k) / (Σ_{k=3..m} k n_k)
    Remarks:     Estimators are only provided for three parameters (β, γ, and δ);
                 estimators for higher order parameters are derived similarly.
                 Generic values of Q_t, the total component failure frequency, are
                 usually available from risk and reliability data sources. The
                 estimators are based on the approximate method described in
                 Reference 16.

Beta Factor
    Estimator:   β = (Σ_{k=2..m} k n_k) / (Σ_{k=1..m} k n_k)
    Remarks:     Generic values of Q_t, the total failure frequency, are usually
                 available from generic risk and reliability data sources. The
                 estimator is based on the approximate method described in
                 Reference 16.

Alpha Factor
    Estimator:   α_k = n_k / (Σ_{j=1..m} n_j),   k = 1, ..., m
    Remarks:     Generic values of Q_t, the total failure frequency, are usually
                 available from generic risk and reliability data sources. The
                 estimator is a maximum likelihood estimator, described in
                 Reference 16.

Binomial Failure Rate
    Remarks:     Estimators are maximum likelihood estimators. For time-based
                 failure rates, replace system demands with the total system
                 exposure time. n_1 here is the number of single component failures
                 due to common cause shocks; the quantity n_I represents the number
                 of independent failures.

Notes:
(a) All estimators assume that, in every system demand, all components and all possible combinations of components are challenged. Consequently, system tests are assumed to be nonstaggered.
(b) For the definition of the various parameters, see the lecture notes on parametric models.
(c) Estimators are developed for a system of m redundant components.

In general, the statistical uncertainty distributions assume that the required data
(e.g., the n_k's for the MGL model) are known. However, as discussed in the previous
sections, such is not the case, and the full representation of all uncertainties
requires some refinements in the uncertainty models. In fact, the uncertainties are
mostly driven not by the usual statistical uncertainties, but rather by such factors as
the judgment used in data classification, the assumptions made about the population
from which failure and success data are obtained, and the completeness of the data bases.

3.2.2 Uncertainty in Data Classification and Impact Vector Assessment. The
uncertainties due to the judgments required in the interpretation and classification
of failure events and in the assessment of impact vectors, as described before, are
perhaps the most significant of all sources of uncertainty. Using the impact vector,
the analyst's judgment about how a given event should be counted in estimating
parameters is encoded in his probability for each of several hypotheses he sets
forth about the possible impact of the event (number of components failed) for the
system being analyzed.
Formally, this type of uncertain data can be represented as

    E = { <P_ij, I_ij> ;  i = 1, ..., N;  j = 1, ..., M_i }     (9)

where P_ij is the analyst's probability for hypothesis j about event i, and I_ij is the
corresponding binary impact vector. N represents the number of events in the data
base, and M_i is the number of hypotheses about the ith event. Note that

    Σ_{j=1..M_i} P_ij = 1     (10)

As an example, consider a data base composed of two events, with the following
hypotheses and impact vectors:

    Event      Hypothesis   Probability      Impact Vector
                                             F0   F1   F2   F3   NA

    Event 1    I_11         P_11             0    0    1    0    0
               I_12         P_12             0    0    0    1    0

    Event 2    I_21         P_21             1    0    0    0    0
               I_22         P_22             0    1    0    0    0
               I_23         P_23             0    0    1    0    0

There are six possible data sets that can be obtained from the above set of
hypotheses by taking all possible combinations of hypotheses. These data sets and
the associated probabilities are listed in the following.

    Data Set   Probability          Event Statistics
                                    N0   N1   N2   N3   NA

    D1         w_1 = P_11 P_21      1    0    1    0    0
    D2         w_2 = P_11 P_22      0    1    1    0    0
    D3         w_3 = P_11 P_23      0    0    2    0    0
    D4         w_4 = P_12 P_21      1    0    0    1    0
    D5         w_5 = P_12 P_22      0    1    0    1    0
    D6         w_6 = P_12 P_23      0    0    1    1    0

An uncertainty distribution for a given common cause parameter, λ, can be found
by taking any of the six possible data sets listed in the above table as evidence. If
π(λ|D_i) is such a distribution based on data set D_i, then the distribution of λ,
taking into account all possible data sets, will be given by

    π(λ) = Σ_{i=1..6} w_i π(λ|D_i)     (11)

where w_i is the probability associated with data set D_i.
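
The enumeration behind Equation (11) can be carried out mechanically. The short Python sketch below is an added illustration (the hypothesis probabilities are hypothetical); it builds the data sets and their weights w_i by taking one hypothesis per event, exactly as in the table above.

    from itertools import product

    # Each event: list of (probability, number_of_components_failed) hypotheses.
    event1 = [(0.7, 2), (0.3, 3)]             # I_11, I_12
    event2 = [(0.5, 0), (0.3, 1), (0.2, 2)]   # I_21, I_22, I_23

    data_sets = []
    for combo in product(event1, event2):
        weight = 1.0
        counts = [0, 0, 0, 0]                 # N0, N1, N2, N3
        for prob, k in combo:
            weight *= prob
            counts[k] += 1
        data_sets.append((weight, counts))

    for w, counts in data_sets:               # six data sets D1..D6
        print(w, counts)
    print(sum(w for w, _ in data_sets))       # weights sum to 1.0

In a full uncertainty analysis, each data set D_i would then be propagated through the chosen estimator and the resulting distributions mixed with the weights w_i.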

In reality, the number of data sets that can be generated by considering all
possible combinations of the various hypotheses about the events is very large. As a
result, the implementation of the rigorous procedure described here is extremely
difficult. An approximate way of including these effects, at least in the mean values,
is to obtain an "average" impact vector for each event before combining them to
obtain the total number of events in each impact category. Formally,

    E = { Ī_i ,  i = 1, ..., N }     (12)

where

    Ī_i = Σ_j P_ij I_ij     (13)
For instance, in our two-event example, this averaging process results in:

    Event        P0     P1     P2     P3     NA

    Event 1      0      0      P_11   P_12   0
    Event 2      P_21   P_22   P_23   0      0

Then the resulting data set (obtained by adding the P's from each event) is

    Data Set     n_0    n_1    n_2           n_3    NA

    D            P_21   P_22   P_11 + P_23   P_12   0

The implications of this approximation, and a comparison with the rigorous
treatment, are discussed in Reference 19.
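
A brief Python sketch of this averaging approximation (an added illustration, reusing the hypothetical hypothesis probabilities from the previous sketch) shows how the average impact vectors are summed into the event counts n_0, ..., n_3.

    # Hypothetical hypothesis probabilities for the two-event example.
    P11, P12 = 0.7, 0.3
    P21, P22, P23 = 0.5, 0.3, 0.2

    # Average impact vectors (Equation 13), indexed by the number of failed components 0..3.
    event1_avg = [0.0, 0.0, P11, P12]
    event2_avg = [P21, P22, P23, 0.0]

    # "Average" data set: n_k is the sum over events of the k-th entries.
    n = [a + b for a, b in zip(event1_avg, event2_avg)]
    print(n)   # [P21, P22, P11 + P23, P12] = [0.5, 0.3, 0.9, 0.3]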
Another practical approximation that attempts to incorporate the uncertainty more
completely is to choose two bounding cases, one with a consistently pessimistic
view (and nonstaggered estimators), the other with a consistently optimistic view
(and staggered testing assumptions), to provide a measure of the range. A "best
estimate" may also be provided using, perhaps, an average or expected value of the
impact vectors. It is recognized that more work is required for a practical and more
complete treatment of uncertainty.

3.2.3 Uncertainty due to Success and Failure Data Completeness. The problems
associated with estimating the success (exposure) data (e.g., the number of system
demands or operating hours), which are needed directly by some of the parametric
models and indirectly by all others, are not specific to common cause failure
analysis. It is, in general, very difficult to obtain an accurate estimate of the success
data because no success data recording and reporting system exists for the nuclear
industry. Even the reconstruction of the success data from plant-specific records, as
is often done in plant-specific probabilistic risk assessments (PRAs), is not only a
major task but also one that heavily involves the judgment of the data analyst.
However, the problem is exacerbated in the case of common cause failures because
of the problem of estimating the success data for groups of components taken
together. Since the data on which the estimates are based are from groups of plants
that probably have different surveillance test strategies, it is unlikely that "exact"
estimators can even be found, thus adding another dimension to the uncertainty.
Similar uncertainties exist about the completeness of the failure event sources. It
is believed, for instance, that a substantial proportion of all independent failure
events are not reported to the Licensee Event Report (LER) system. Both of these
uncertainties can be represented explicitly in the parametric distributions through
Bayes' theorem by assuming uncertainty distributions for both the success and
failure data.

3.2.4 Statistical Uncertainty. This source of uncertainty is a well-known subject in
statistics. It stems from the fact that the parameters are estimated on only a subset
of the entire population of failure and success data. Larger sample sizes result in a
higher degree of confidence in the estimated parameters simply because they are
more representative of the general population. For instance, the variance of the
distributions of the basic parameter model decreases as n_k and N_D (for
demand-based parameters) increase. Similar behavior is observed in the
distributions of the other models. For a detailed discussion of statistical uncertainty
distributions for the various parametric models presented here, see Reference 1.
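
As a simple numerical illustration of this effect (added here; the report itself refers to Reference 1 for the actual distributions), one can place a Jeffreys Beta prior on a demand-based failure probability and watch the posterior spread narrow as the evidence grows at a fixed observed frequency. The prior choice is an assumption made only for this sketch.

    from math import sqrt

    def beta_posterior(failures, demands, a0=0.5, b0=0.5):
        """Mean and standard deviation of a Beta(a0 + failures, b0 + demands - failures)
        posterior for a demand-based failure probability (Jeffreys prior assumed)."""
        a = a0 + failures
        b = b0 + demands - failures
        mean = a / (a + b)
        var = a * b / ((a + b) ** 2 * (a + b + 1))
        return mean, sqrt(var)

    # Same observed frequency (2 events per 1000 demands) with growing sample size:
    # the mean stays near 0.002 while the standard deviation shrinks.
    for scale in (1, 10, 100):
        print(beta_posterior(2 * scale, 1000 * scale))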

3.2.5 Plant-to-Plant Variability. The fourth source of uncertainty is the familiar
concept of variation of the value of the parameters from plant to plant. This type of
variability stems from the fact that similar equipment and systems in various plants
may show inherently different failure rates for a variety of reasons, such as minor
design differences within the same category of equipment, variation in system
designs, and operating philosophies leading to different coupling mechanisms.
Conceptually, there are two approaches for dealing with this issue. One approach
is to assess the variability of the parameters on the basis of the statistical evidence
from each plant, without screening events based on their applicability to the
situation under consideration. If it were practical, this would result in a wider range
of possible values for the parameters than if this variation were ignored. In the
second approach, which is adopted in this report for the estimation of the common
cause parameters, failure events from various plants are reclassified and mapped
onto the plant or system of interest. The result is the formation of a data base much
larger than one based only on the records of the specific plant under consideration.
The resulting statistical uncertainty range for the estimated parameters will
obviously be smaller in this case, compared with a distribution representing
differences in plants. This reduction in uncertainty is the result of applying the
additional information about the specific characteristics of the system being
analyzed and obviates the need for separate consideration of the plant-to-plant
variation for the common cause parameters. This decrease in statistical uncertainty
is bought at the expense of another uncertainty, that in the impact vector
assessment. It is however still essential to consider plant-to-plant variation for total
failure rates.

3.3 USE OF GENERIC VALUES OF COMMON CAUSE PARAMETERS

The systematic procedures for dependent events analysis require the analyst to
screen and classify event data, use estimators provided, and develop uncertainty
distributions and/or point estimates of model parameters for each specific analysis.
This procedure is recommended instead of using published numerical data for these
parameters for several important reasons. One reason is to prevent the use of data
that are inapplicable to the system being analyzed. Another is to provide a
consistent framework for combining data from systems having different numbers of
components and for accounting for differences between the number of components
being analyzed and those associated with systems providing the data. In addition,
event screening can eliminate all inconsistencies between the data and the
assumptions built into the common cause event models. Finally, the event
screening and classification process provides qualitative insights about possible
approaches to defending against future occurrences of these events in the system.
A formidable obstacle to the adoption of an approach based on event screening in
prior analyses was the amount of time needed to sift and sort through such event
reports as the LERs and the numerous problems associated with extracting
quantitative information from the review of these reports. A useful contribution to
lessening the work has been the development and application of the
EPRI dependent events classification system. The final form of this classification
system (Reference 17) has been and is currently being applied to a large fraction of
the accumulated LERs covering U.S. power reactor experience. As mentioned
earlier, an initial data base of classified event reports, including several hundred
dependent events, is provided in Reference 16. Numerous examples of this
classified data base are presented in this section. This EPRI data base was
expanded in a companion project (Reference 20). The availability of these classified
data bases greatly reduces the time required to incorporate event screening as an
integral part of systems analysis, if one is willing to accept the classification of the
authors of the report. It should be remembered that this classification is subjective.
However, at the very least, the report provides a prescreening of the data to identify
the event reports worth examining in detail.
Despite the availability of classified event reports, it is recognized that there is a
continuing need to support analysts who may need to bypass the event screening
step and use published numerical values of common cause event parameters. For
these analysts, a list of what the authors call "generic beta factors" is provided in
Reference 1 (Table IV). Although the use of these generic factors is strongly
discouraged as a substitute for the event screening approach, they may be used as
a coarse and conservative screen for common cause analysis, provided suitable
qualification of the results is indicated. Implicit assumptions in the use of these
parameters include the following:

- The analyzed system is susceptible to all the same (unspecified) common cause events experienced by all the plants in the data base.

- The analyzed system has the same, yet unspecified, success criteria as those assumed by the analyst who classified the data.

- The Table IV values of the beta factor include both failures to start on demand and failures to run for all components except breakers and valves. Hence, they represent an average of these modes weighted by their relative frequency of occurrence.

- The beta factor estimates have been developed from systems of different sizes. Their application implicitly assumes that the system being analyzed has an "average" number of components.

- The values do not account for underreporting of independent events. The beta factors are therefore additionally conservative.

- Included in this table is a generic beta factor for a "generic component." This factor can be used with components not listed in the table but identified by the analyst as being in a common cause group. It should be used for screening purposes only. It is the responsibility of the analyst to defend any conclusions derived from generic beta factors in light of the above severe limitations. The authors generally discourage this approach and would prefer that each analyst perform his own evaluation of the data on which to base each analysis.
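
As an added illustration of this screening use, the sketch below assumes the usual beta factor split of a generic total failure probability Q_t into an independent part (1 - β)Q_t and a common cause part βQ_t, together with a simple one-out-of-two success criterion; the numerical values are hypothetical, with a beta factor of the order of the Table IV entries.

    def screening_unavailability(q_total, beta, m):
        """Coarse beta factor screening value for an m-train system in which any
        one train suffices: independent contribution plus common cause contribution."""
        q_independent = (1.0 - beta) * q_total
        return q_independent ** m + beta * q_total

    # Hypothetical two-train standby system: Q_t = 2e-2 per demand, generic beta = 0.05.
    print(screening_unavailability(q_total=2e-2, beta=0.05, m=2))   # about 1.4e-3

Because the common cause term dominates whenever βQ_t exceeds (Q_t)^m, such a screen quickly shows whether a more careful, event-based analysis is warranted.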

Table IV. Event Classification and Analysis Summary

    Component                Reactor   Number of        Event Distribution (b)    Generic Common      Generic
                             Years     Events           Independent  Dependent    Cause Events        Beta Factor
                                       Classified (a)                             Potential  Actual

    Reactor Trip Breakers      563         72               56          16            3        8         .19
    Diesel Generators          394        674              639          35            9       13         .05
    Motor-Operated Valves      394        947              842         105           17       25         .08
    Safety/Relief Valves
      PWR                      318         54               30          24            0        0         .07
      BWR                      245        172              136          36            7        7         .22
    Check Valves               654        254              242          12            3        5         .06
    Pumps
      Safety Injection         394        112               77          35            2        6         .17
      RHR                      394        117               67          50            2        3         .11
      Containment Spray        394         48               32          16            1        1         .05
      Auxiliary Feedwater      394        255              194          61            2        3         .03
      Service Water            394        203              159          44            2        2         .03
    Chillers                   654         33               27           6            2        2         .11
    Fans                       654         59               49          10            2        3         .13
    All                          -      3,000            2,550         450           52       78         .10 (c)

Notes:
a. Events classified include those having one or more actual or potential component failures or functionally unavailable states.
b. Independent events are those in category LS (linear, single unit); dependent events are those in the following categories: LM (linear, multiple unit), BSR (branched, single unit, root-caused), and BSC (branched, single unit, component-caused); generic common cause events are a subset of event category BSR that meets screening criteria to be modeled in a systems analysis as a common cause event. Actual common cause events have at least two actual component states.
c. Average of all component beta factors.
4. References.

1. Mosleh, A., et al., "Procedures for Treating Common Cause Failures in Safety and Reliability Studies," prepared for Electric Power Research Institute and U.S. Nuclear Regulatory Commission, NUREG/CR-4780, Vols. I and II, 1988.

2. Fleming, K. N., "Parametric Models for Common Cause Failure Analysis," this volume.

3. Fleming, K. N., A. Mosleh, and R. K. Deremer, "A Systematic Procedure for the Incorporation of Common Cause Events into Risk and Reliability Models," Nuclear Engineering and Design, Vol. 93, pp. 245-273, 1986.

4. Poucet, A., A. Amendola, and P. C. Cacciabue, "Summary of the Common Cause Failure Reliability Benchmark Exercise," Joint Research Centre Report, PER 1133/86, Ispra, Italy, April 1986.

5. Mosleh, A., "Hidden Sources of Uncertainty: Judgment in Collection and Analysis of Data," Nuclear Engineering and Design, Vol. 93, 1986.

6. EG&G Idaho, Inc., "Data Summaries of Licensee Event Reports of Diesel Generators at U.S. Commercial Nuclear Power Plants, January 1, 1976, to December 31, 1978," prepared for the U.S. Nuclear Regulatory Commission, NUREG/CR-1362, EGG-EA-5092, March 1980.

7. EG&G Idaho, Inc., "Data Summaries of Licensee Event Reports of Pumps at U.S. Commercial Nuclear Power Plants, January 1, 1972, to April 30, 1978," prepared for the U.S. Nuclear Regulatory Commission, NUREG/CR-1205, EGG-EA-5044, January 1982.

8. EG&G Idaho, Inc., "Data Summaries of Licensee Event Reports of Valves at U.S. Commercial Nuclear Power Plants, Main Report, January 1, 1976, to December 31, 1978," prepared for the U.S. Nuclear Regulatory Commission, NUREG/CR-1363, EGG-EA-5125, October 1982.

9. Miller, C. F., W. H. Hubble, M. Trojovsky, and S. R. Brown, "Data Summaries of Licensee Event Reports of Selected Instrumentation and Control Components at U.S. Commercial Nuclear Power Plants," prepared for the U.S. Nuclear Regulatory Commission, EG&G Idaho, Inc., NUREG/CR-1363, EGG-EA-5816, Rev. 1, October 1982.

10. Sams, D. W., and M. Trojovsky, "Data Summaries of Licensee Event Reports of Primary Containment Penetrations at U.S. Commercial Nuclear Power Plants, January 1, 1972, to December 31, 1978," prepared for the U.S. Nuclear Regulatory Commission, EG&G Idaho, Inc., NUREG/CR-1730, EGG-EA-5/88.

11. Hubble, W. H., and C. F. Miller, "Data Summaries of Licensee Event Reports of Control Rods and Drive Mechanisms at U.S. Commercial Nuclear Power Plants, January 1, 1972, to April 30, 1978," prepared for the U.S. Nuclear Regulatory Commission, EG&G Idaho, Inc., NUREG/CR-1331, EGG-EA-5079.

12. S. M. Stoller, Nuclear Power Experience, updated monthly.

13. Atwood, C. L., "Common Cause Fault Rates for Pumps," prepared for the U.S. Nuclear Regulatory Commission, EG&G Idaho, Inc., NUREG/CR-2098, EPRI-685-DOC-01, EGG-EA-5289.

14. Atwood, C. L., and J. A. Steverson, "Common Cause Fault Rates for Valves: Estimates Based on Licensee Event Reports at U.S. Commercial Nuclear Power Plants," prepared for the U.S. Nuclear Regulatory Commission, EG&G Idaho, Inc., NUREG/CR-2770, EGG-EA-5485, February 1983.

15. Meachum, T. R., and C. L. Atwood, "Common Cause Fault Rates for Instrumentation and Control Assemblies," prepared for the U.S. Nuclear Regulatory Commission under Department of Energy Contract No. DE-AC07-76ID01570, Idaho National Engineering Laboratory, EG&G Idaho, Inc., NUREG/CR-3289, EPRI-685-DOC-06, EGG-2258, May 1983.

16. Fleming, K. N., and A. Mosleh, "Classification and Analysis of Reactor Operating Experience Involving Dependent Events," Electric Power Research Institute, EPRI NP-3967, February 1985.

17. Los Alamos Technical Associates, Inc., "A Study of Common Cause Failures, Phase II: A Comprehensive Classification System for Component Fault Analysis," EPRI NP-3837, May 1985.

18. Doerre, P., "Possible Pitfalls in the Process of CCF Event Data Evaluation," Kraftwerk Union AG, Proceedings, PSA '87 - International Topical Conference on Probabilistic Safety Assessment and Risk Management, August 30 - September 4, 1987.

19. Mosleh, A., and N. O. Siu, "On the Use of Uncertain Data in Common Cause Failure Analysis," Proceedings, PSA '87 - International Topical Conference on Probabilistic Safety Assessment and Risk Management, August 30 - September 4, 1987.

20. Smith, M., et al., "Data Based Defensive Strategies for Reducing Susceptibility to Common Cause Failures," draft report, prepared for Electric Power Research Institute, Saratoga Engineering Consultants, 1987.

PITFALLS IN COMMON CAUSE FAILURE DATA EVALUATION

P. Dörre
SIEMENS AG (KWU)
P.O. Box 101063
D-6050 Offenbach
FR Germany

ABSTRACT. Two major aspects of the CCF problem are addressed: the nature
of CCF in the contexts of data evaluation and reliability analysis, and
the uncertainties arising from limitations to expert judgment when failure
data are evaluated. Some inconsistencies and pitfalls in contemporary
dependent failure modeling methodology are identified, analyzed and
discussed quantitatively. Methods and solutions are proposed to avoid
inadequately pessimistic evaluations in case of redundancy mismatch.

1. INTRODUCTION
Identification and evaluation of common cause failure (CCF) events
involve a considerable amount of engineering judgment subject to error
and uncertainty. The quality of judgment influences validity and hence
credibility of quantitative results, e. g. of a reliability analysis.
This especially holds for highly redundant systems, in which CCF can be
a major contributor to the system failure probability.
This seminar contribution refers to insights and experiences
mostly gained from the participation in the recent Reliability Benchmark
Exercise on Common Cause Failure (CCF-RBE) of the European Community
(Poucet et al (1)). It consists of two parts.
The first qualitative part deals with the identification of CCF,
especially with unresolved issues, ambiguities, and definition problems.
Its intention is to give an impression of why CCF still lacks a relevant
definition and why so many different approaches to solving the problem exist.
In the second part, it is assumed that any CCF can be identified
with sufficient accuracy (although a "definition" may still be lacking).
However, even perfect identification of all CCF events present in a
given data base alone will not guarantee a realistic reliability
estimate for a system to be analyzed, if this system is different with
respect to technical features, degree of redundancy etc. from the
systems in which the original events occurred: further aspects have to
be adequately accounted for.

Within the general problem of using data from one or more
different data sources (which involves data uncertainties as well as
uncertainties due to variations in technical judgment with respect to
applicability), two special problems arising from different degree of
redundancy can be isolated and treated separately. The reason is that
here mathematical restrictions to the process of engineering judgment
play a decisive role. These two problems are connected with the
following two cases of different degree of redundancy:
- the system to be analyzed is of lower redundancy than (some of) the
systems in the event data base;
- the system to be analyzed shows a higher degree of redundancy than
(some of) the systems for which operational experience exists.
Two methods are introduced to treat these "mismatches" in
redundancy adequately. Whereas the first case is completely determined
by the laws of combinatorial analysis together with symmetry
considerations, the extrapolation to a higher degree of redundancy is
non-deterministic and therefore implies further assumptions.

2. CCF IDENTIFICATION

2.1. Prologue
In this chapter some global aspects of CCF definition and identification
are briefly described. No distinction is made between the notations
"common cause failure", "common mode failure", "dependent failure", as
is sometimes done, e. g. in Bourne et al (2). The essential effect of
CCF on a system is the breakdown of multiplicity (redundancy), i. e. the
loss of independence.

2.2. Simple Cases for Identification

Some kinds of dependencies are often excluded from special CCF modeling,
as they can be modeled explicitly, e. g. as extra fault tree entries.
This holds for the following simple cases:
- common or shared components (e. g. auxiliary systems),
- external events or environmental effects,
- failure propagation to adjacent subsystems (if they are not
separated by appropriate barriers).
Although even for these "simple cases", CCF can in principle be
modeled implicitly by a parametric model, usually only the CCF events
which are not simple cases are retained and modeled that way. As a tool
for disentangling the universe of phenomena, a classification system has
been proposed by Crellin et al (3).

2.3. Existence of a Practical Solution

From a practical standpoint, "CCF" (with respect to malfunction of components in a
given system) can be considered to be defined by a documented list of multiple
failure events, each event being classified either as "CCF" or as "not CCF", just as a
set is defined by its elements.
Variations in basic understanding of what a CCF is will lead to
different lists from different analysts with different subjective
judgment. This can be treated as a modeling uncertainty and quantified
within a parametric CCF model.

2.4. CCF - a Name for a Non-Existent Phenomenon?


The question of non-existence does not at all touch the existence of
multiple failure events (which - as can be doubtlessly seen from
operational experience - do exist), but refers to the explanation theory
associated with these events: do we really need an extra theory for
explanation of these effects, and extra mathema-tical models to describe
them - or does "CCF" only exist as a residual effect, created by using
oversimplified models and concepts for component failure beyond their
domain of applicability?
The above-mentioned practical solution does not solve this basic problem.
Furthermore, it can be argued that any definition like

    "CCF refers to multiple failures which occur simultaneously or in a short
    time interval due to a common cause"
is rarely informative, as
- common causes always exist, since causes are generally not specific to an
  individual component but "generic", i. e. a set property of a set of components
  (a statistical population) rather than an elemental property of a single item;
- any multiple failure of two or more components of the same type within a given
  time interval can be considered to be a CCF if only one cause for a special
  failure mode exists (less strictly: if one "cause" or "failure mode and effect"
  dominates the failure behaviour of this special component type);
- the effects (manifestations and circumstances of the failure), not the causes,
  are observed directly and have to be investigated with respect to common
  properties.
In the author's opinion, any tentative "definition" or description of CCF has to
address at least three major points, namely
- what failure is;
- the time aspect of failure;
- what a component is.

2.5. Failure
As to the notation "failure", Crellin et al (3) clearly distinguish
between two different types of malfunctions, namely failure and
functional unavailability, and propose the use of logic diagrams to
analyze multiple events. We note that the distinction mainly refers to
the (boundaries of the) object of failure and hence to the component
picture. (As this distinction is of minor relevance in this
contribution, the notation "failure" here is used in the most general
meaning of "malfunction".)

2.6. The Time Aspect: Criterion of Correlated Failure Times

What are the reasons for a correlation between the failure behaviour of two or
more components? It was pointed out by Virolainen (4) that two mutually exclusive
types of correlations can be identified:

Physical correlations (direct component-component interaction or interaction via
"third parties", e. g. the same room in case of fire). Here a commonality (a causal
relation) in the physical failure process exists, which is responsible for a correlation
in the failure times of the components involved in a multiple failure.

Statistical correlations. Here no correlations between the exact actual failure times
exist (however, a strong correlation between the times of failure detection may
exist).

The simple cases mentioned before belong to the first category (the reverse is not
true). It can further be expected that for permanently supervised or permanently
operating "redundant" (= shared load) components, all observed CCF events also
refer to the first category.
There are two kinds of statistical correlations:
- state-of-knowledge correlations (Apostolakis and Kaplan (5)), caused
by shared statistical evidence on the failure behaviour of components;
- correlations introduced/created by model assumptions (especially
simplifications).
The most important model assumption may be that of a "homogeneous population",
which especially in the case of scarce knowledge or data cannot be falsified.
However, the usual situation is rather that of an inhomogeneous population
(especially when different data bases have been merged) with unknown, or
unspecified, systematic population variability: such a population can best be
described by a failure rate distribution with respect to objective properties of the
physical world (in contrast to state-of-knowledge distributions, which also exist in
the case of "physical" homogeneity). Causes for the spread may e. g. be different
"environments" (Hughes (6)). Then different random samples of small sets of
identical components will show different failure rates.
As another example, consider a set of events created by random independent
failures due to enhanced or increasing (time-dependent) failure rates, which cannot
be correctly modeled simply by constant "generic" failure rates. The "poor quality"
case of Virolainen (4) is included here. Further potential contributors are model
assumptions like "as good as new", which can rarely be verified, as this implies
perfect knowledge of the component state after test or repair (Doerre (7)).
It is pointed out that although both kinds of statistical correlation can be modeled
by failure rate distributions (and can therefore be treated by the same class of
mathematical models), one should clearly distinguish between the causes of the
distributions: on the one hand lack of knowledge, on the other hand systematic
physical influences. Thus, e. g., "failure rate coupling" can be used as a CCF model,
in disagreement with Apostolakis and Moieni (8), who only consider "failure rate
coupling" in the context of state-of-knowledge correlations, in which it is an
incorrect treatment of CCF.

2.7. The Component Picture

In the author's opinion, a major part of the multiple failure events often interpreted
as "CCF" is created by inadequate modeling in an oversimplified component picture.
Failures or changes in the failure frequency due to imperfect test procedures, as
well as imperfect human actions during routine test, maintenance and repair, are
often considered as absolute properties of the hardware component subjected to
these actions (Doerre and Sobottka (9)). An oversimplified component picture
favours the tendency of the analyst to use an implicit CCF interpretation on the
hardware component level rather than to model shared non-hardware factors
explicitly.
What a component "is", and under which conditions it can be
considered similar or equivalent to another component, can therefore not
generally be defined explicitly. However, with respect to a given
reliability analysis, components are implicitly defined e. g. by the
entries to a fault tree. In data analysis, a "definition" is introduced
e. g. by applying the two categories "root-caused" and "component-
caused" malfunction (Crellin et al (3)).

3. EVALUATION OF OPERATIONAL EXPERIENCE

3.1. Generic Evaluation

Evaluation of operational experience on (components in) other systems (which will
be denoted "source systems" subsequently) is usually necessary due to the lack of
data on the system to be analyzed (the "target system"). It should be noted that
any analytical effort would be completely superfluous if there were sufficient data
on the target system.
The evaluation with respect to CCF events is based on a valid CCF identification
criterion. This problem was discussed in Chapter 2; here we assume that any CCF
can be unanimously identified.
As a result of the CCF screening procedure applied to a set of
events, the following information is obtained
- which events have to be considered as independent failures,
- which events are candidates for CCF, and how many redundancies/
equivalent components have been affected by each event.
When US Licensee Event Reports (LERs) are the major source of information on
CCF events (as was the case in the CCF-RBE), independent failures of single
components with minor safety impact or effect on the plant behaviour could be
"undercounted". Furthermore, the statistical population (i. e. the number of
equivalent or redundant components) is not always known; this especially holds for
motor-operated valves.

3.2. Representation of Evaluation Results: Impact Vectors

A simple and efficient tool for representing multiple malfunction information by
means of a discrete probability distribution for further processing is the impact
vector method introduced by Fleming et al (10). The impact vector shows how many
redundancies are affected. It is possible to account for the estimated severity or
potentiality of the impact by using subjective weight factors. Example impact
vectors are given in the Tables. Impact vectors define a standardized information
"interface" for determining the parameters of implicit CCF models.

3.3. Specific Evaluation

So far only the technical properties of the source systems and the
corresponding boundary conditions of multiple failure events played a
role. When one intends to apply this data base to another system, one
has to ask first of all whether data will still be appropriate under
changed conditions, and how data have to be modified if necessary.
The fundamental (yet often unrecognized) assumption for any data
transfer is the strong principle of causality, which in this context
states that "similar components show similar failure behaviour". Here
similarity of systems or components refers to technical, operational and
environmental features. The following cases of matching relations
between the data base source systems and the target system can be
distinguished:
- equivalent: the impact vector remains unchanged;
- similar (comparable): according to the degree of similarity,
modified weight factors can be chosen for the impact probability
distribution of the event. In this process of engineering judgment,
both technical reasons and mathematical implications have to be
considered;
- not comparable: the event does not apply to the target system.
Two special problems concerning modifications with respect to
different degree of redundancy will be discussed separately in more
detail in what follows. As they are independent from the choice of a
special parametric model, they can be described by using the impact
vector method alone.

4. REDUCTION TO LOWER REDUNDANCY

4.1. The Problem of Mapping Down

It is obvious that when all redundant components failed on demand, all components
in any system of lower redundancy also have to be considered failed. But when
originally fewer than all components have been affected by a CCF event, what are
the corresponding probabilities in the impact vector of a lower-redundant system?
Any event in a higher-redundant system can be decomposed in a deterministic way
into several partial events which occur in systems with a lower degree of redundancy.
Here complete symmetry is assumed, i. e. each component has the same probability
to fail in any malfunction event.

4.2. Examples

It is shown by two striking examples from US NPP operational experience that
disregard of combinatorial limitations, due to the fact that some redundancies did
not fail, leads to inadequate results. The first example, which demonstrates the
effect of neglecting success information, is shown in Tab. 1. The original example is
taken from Fleming et al (10). It is clear from the event description and impact
estimation that the analyst refers to an original system of 6, not of 4 pumps, with 2
pumps remaining intact. Keeping the original event impact assessment unchanged,
the combinatorial restrictions accounted for in a reassessment of the event impact
then lead to a decrease of the target impact vector probability P_3 by a factor of 5.
In the second example (see Tab. 2), two events concerning MOVs can either be
interpreted as two double failures (case 1) or one quadruple failure (case 2). For a
4-redundant target system, the probability P_4 is, even in the worst case,
considerably lowered by correct mapping down.

4.3. Derivation of the General Equation for Mapping Down

This section is taken from Doerre (11). Assume that k components in an
m-redundant source system failed, and that the statistical weight of the associated
source impact vector is w with 0 < w ≤ 1, i. e.

    Source impact vector:   P_k^(m) = w,   k ≤ m     (1)

The corresponding impact vector of an n-redundant target system (n < m) has to be
determined. Let n be the degree of redundancy and l ≤ n the number of failed
components in the target system. The number of possibilities to take a sample of
size n from a set of m components is given by the binomial coefficient

    C(m,n) = m!/(n!(m-n)!)     (2)

Any sample consists of l from the k failed and n-l from the m-k intact items.
Therefore we get for the

    Target impact vector:   P_l^(n) = [C(k,l) C(m-k,n-l) / C(m,n)] w     (3)

Nonvanishing probabilities are obtained for all values of l which satisfy the following
restriction (with max = maximum and min = minimum value):

    max(0, n-m+k) ≤ l ≤ min(n, k)     (4)
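
The mapping-down rule of Equations (3) and (4) is straightforward to implement. The following Python sketch is an added illustration; it reproduces the reassessed weight factors of Tab. 1 below (0.45, 0.45 and 0.02, 0.06, 0.02).

    from math import comb

    def map_down(m, k, w, n):
        """Map a k-of-m failure event with statistical weight w onto an n-redundant
        target system (n < m), following Equations (3) and (4).
        Returns a dict {l: P_l^(n)}."""
        lo, hi = max(0, n - m + k), min(n, k)
        return {l: w * comb(k, l) * comb(m - k, n - l) / comb(m, n)
                for l in range(lo, hi + 1)}

    # Tab. 1: a 1-of-6 event weighted 0.9 and a 4-of-6 event weighted 0.1,
    # both mapped onto a 3-redundant target system.
    print(map_down(6, 1, 0.9, 3))   # {0: 0.45, 1: 0.45}
    print(map_down(6, 4, 0.1, 3))   # {1: 0.02, 2: 0.06, 3: 0.02}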

Event description

Plant: Point Beach 1 and 2

Text: Preoperational strainers left in the suction line plugged, making motor-driven
AFW pump A on Unit 1 inoperable. Similar strainers were found in the Unit 1
motor-driven AFW pump and the Units 1 and 2 turbine-driven AFW pumps.

Original event impact assessment

                             P0     P1     P2     P3     P4     N/A
    Point Beach:             0.     0.9    0.     0.     0.1    0.
    3-redundant system:      0.     0.9    0.     0.1    -      0.

Combinatorial restrictions to the assessment

As there are 3 pumps per Unit, the degree of redundancy is definitely 6. Out of the
C(6,3) = 20 possible samples of 3 pumps:

If the event is weighted with 0.9 as a 1 of 6 failure, then
    C(1,1) C(5,2) = 10 states show a 1 of 3 failure  ->  weight factor 0.45,
    C(1,0) C(5,3) = 10 states show a 0 of 3 failure  ->  weight factor 0.45.

If the event is weighted with 0.1 as a 4 of 6 failure, then
    C(4,3) C(2,0) =  4 states show a 3 of 3 failure  ->  weight factor 0.02,
    C(4,2) C(2,1) = 12 states show a 2 of 3 failure  ->  weight factor 0.06,
    C(4,1) C(2,2) =  4 states show a 1 of 3 failure  ->  weight factor 0.02.

Reassessment for a 3-redundant system

                             P0     P1     P2     P3     P4     P5     P6     N/A
    Point Beach:             0.     0.9    0.     0.     0.1    0.     0.     0.
    3-redundant system:      0.45   0.47   0.06   0.02   -      -      -      0.

Tab. 1: Re-evaluation of a failure event affecting AFW pumps.

Event description

Plant/date: Oconee 1 (Nov. 1975) and 2 (Dec. 1975)

Text: 2 MOVs failed due to low torque setting (2 events).

Original source impact vector (2 impact vectors):
                                          P0    P1    P2    N/A
                                          0.    0.    1.    0.

Worst case target impact vector (4-redundant target system):
                                          P0    P1    P2    P3    P4    N/A
                                          0.    0.    0.    0.    1.    0.

Modified source impact vector (6-redundant source population):
                                          P0    P1    P2    P3    P4    P5    P6    N/A
    - case 1 (2 double-failure events)    0.    0.    1.    0.    0.    0.    0.    0.
    - case 2 (1 quadruple-failure event)  0.    0.    0.    0.    1.    0.    0.    0.

Modified target impact vector (4-redundant target system, mapped down):
                                          P0    P1    P2    P3    P4    N/A
    - case 1 (2 events)                   .07   .53   .40   0.    0.    0.
    - case 2 (1 event; worst case)        0.    0.    .40   .53   .07   0.

Tab. 2: Reassessment of a CCF event affecting MOVs.

5. EXTRAPOLATION TO EXTENDED REDUNDANCY

5.1. The Problem of Mapping Up

When one extrapolates a multiple failure that originally affected 2 of 2 redundancies
to higher-redundant systems (e. g. 4 subsystems), one should introduce a
renormalization factor. This becomes necessary because the number of single
(independent) failures in the statistical population refers to the original 2-redundant
population with 2N components, whereas the number of assumed quadruple failures
then always refers to a fictitious "doubled" population with 4N components.
Generally, if the extrapolation is from k to n with n > k, a renormalization factor k/n
should be introduced, which limits the maximum value of the total weight of the
associated impact vector. Three different cases can be distinguished for the
application of "mapping up" techniques:
- single (independent) failure,
- total system failure (k of k redundancies),
- partial system failure (at least 1 redundancy did not fail).

5.2. Motivation for Applying Mapping Up Techniques

In the following considerations, some special parametric models (as well as special
notions) are used for illustration. These models are described in detail e. g. in
Fleming et al (10) and Mosleh et al (12).
Assume that your data base (as is frequently the case) consists of 2-redundant
systems. Summing up the corresponding impact vector probabilities of all events,
we obtain e. g. the following event frequencies:

    n_1^(2) = Σ P_1^(2) = 180,   n_2^(2) = Σ P_2^(2) = 10.     (5)

For simplicity, it is further assumed that all double failures are lethal shocks. The
"straightforward" extrapolation to a 4-redundant target system would then give

    n_1^(4) = 180,   n_4^(4) = 10,     (6)

as "the shock frequency remains constant". In the framework of the β factor model
(the MGL model for 2-redundant systems),

    β_2 = β^(2) = 2 n_2^(2) / (n_1^(2) + 2 n_2^(2)) = 20/200 = 0.1     (7)

Correspondingly, we get for the 4-redundant system

    β_4 = β^(4) = 4 n_4^(4) / (n_1^(4) + 4 n_4^(4)) = 40/220 = 0.18     (8)

For a k-redundant system with k → ∞, we then have

    β_k = k n_k^(k) / (n_1^(k) + k n_k^(k)) = 10k / (180 + 10k) → 1.     (9)

This effect can be avoided by "renormalization": if, in the case of the 4-redundant
system, we set

    either (type 1):   n_4^(4) = n_2^(2)   and   n_1^(4) = 2 n_1^(2),     (10)

    or (type 2):       n_1^(4) = n_1^(2)   and   n_4^(4) = (1/2) n_2^(2),     (11)

we obtain

    β_4 = β_2,     (12)

i. e. the numerical effect will remain constant (note that the assumption "all original
double failures are quadruple failures" describes the worst case).
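
A short numeric check of Equations (10) to (12) (an added illustration) shows that both renormalization types keep the β factor at its 2-redundant value of 0.1, instead of inflating it to 0.18.

    def beta(n1, nk, k):
        """Beta factor built from single failures n1 and k-of-k failures nk."""
        return k * nk / (n1 + k * nk)

    n1_2, n2_2 = 180, 10
    print(beta(n1_2, n2_2, 2))          # 0.1   (2-redundant data base, Eq. 7)
    print(beta(n1_2, n2_2, 4))          # 0.18  (naive extrapolation, Eq. 8)
    print(beta(2 * n1_2, n2_2, 4))      # 0.1   (type 1 renormalization, Eq. 10)
    print(beta(n1_2, 0.5 * n2_2, 4))    # 0.1   (type 2 renormalization, Eq. 11)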

5.3. Development of a Consistent Mapping Up Procedure

The procedure is developed in two steps. The first, preliminary step is very similar
to the method proposed in the current draft version of the US "CCF guide" (Mosleh
et al (12)), but needs further modification (which will be done as a second step) in
order to avoid statistical inconsistencies.

Step 1. Assume a data base of 3-redundant source systems, a 4-redundant target
system, and the observed system failure behaviour

    n_1^(3), n_2^(3), n_3^(3).     (13)

For the 4-redundant system, there is no doubt that any single (independent) failure
event will remain a single failure event, as independent failure does not at all
depend on the redundancy structure. But due to the extended population size, more
single failures can now be expected (cf. renormalization type 1, Chapter 5.2):

    n_1^(4) = 4/3 n_1^(3).     (14)

We pessimistically assume that there is also no doubt about the corresponding
outcome of any total failure event, i. e.

    n_4^(4) = n_3^(3).     (15)

But what about the remaining case of incomplete failure events (here only: 2 of 3)?
In the worst case, one can assume

    n_2^(4) = 0,   n_3^(4) = n_2^(3),     (16)

according to the evidence "one component has always been observed in the intact
state". Likewise, the best case would result in

    n_3^(4) = n_4^(4) = 0,   n_2^(4) = n_2^(3),     (17)

according to "only two components have always been observed failed". In a third,
"most likely" case, further information on the failure evidence is evaluated: a 2 of 3
failure corresponds to a conditional failure probability of 2/3, which is assumed to
hold for the 4-redundant system, too. Then we get the following contributions to the
4-redundant system event frequencies:

    n_4^(4) = 0,   n_3^(4) = 2/3 n_2^(3),   n_2^(4) = 1/3 n_2^(3).     (18)

In Mosleh et al (12), a slightly different approach is proposed which would also yield
a nonvanishing frequency n_4^(4), in contrast to the observation of one intact
component in each event.

Using the "most l i k e l y " case together with the approach for s i n g l e
and t o t a l f a i l u r e events, we w i l l now show t h a t t h i s procedure fully
conserves t h e most important numerical impact of C C F, i . e. the two "
f a c t o r s " of the 3 and 4redundant system (which contain the t o t a l
system f a i l u r e in t h e MC L model) are numerically i d e n t i c a l

3 = { 3 V 3 ) 3n3(3) / ( n / 3
^ 3
^ 3
*) (19)
, = ^ V " ^ = ^ ' / ^ ' ^ ^ ^ ^ ' ^ ^ ' ) (20)
With the above contributions of the 3redundant event frequencies from
eqs 14, 15, and 18, we get

. = 4 n < 3 ) / (4/3 n 1 (3) +2/3 <3)+6/3 '3)+4n.,(3) )


4 3 1 2 2 3
(21)
= 4/3 3 n 3 ( 3 ) / 4/3 ( n ^ 3 * + 2 n 2 ( 3 ) + 3n 3 < 3 ) ) = &y
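
A quick numeric spot check of Equation (21) (an added illustration with arbitrary event counts) confirms the identity.

    # Arbitrary 3-redundant event counts n_1^(3), n_2^(3), n_3^(3).
    n1, n2, n3 = 180.0, 12.0, 4.0

    beta3 = 3 * n3 / (n1 + 2 * n2 + 3 * n3)

    # "Most likely" mapping up (Eqs. 14, 15, 18): contributions to the 4-redundant counts.
    m1, m2, m3, m4 = 4.0 / 3.0 * n1, 1.0 / 3.0 * n2, 2.0 / 3.0 * n2, n3
    beta4 = 4 * m4 / (m1 + 2 * m2 + 3 * m3 + 4 * m4)

    print(beta3, beta4)   # identical values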

Step 2. An undesired effect of the previous procedure is the creation of fictitious
single (independent) failure events, which will increase the statistical evidence due
to renormalization factors greater than 1. In order to avoid the statistical
inconsistencies associated with this effect (Apostolakis and Moieni (8)), we remember
that there exist two alternative approaches (see Chapter 5.2, eqs 10 and 11): instead
of renormalizing single failure event frequencies (keeping multiple failure
frequencies constant), one can renormalize multiple failure event frequencies as well
(keeping single failure frequencies constant; for total system failure, this has
already been shown in Chapter 5.2). However, the latter concept of renormalization
(type 2) only employs renormalization factors less than 1, thus decreasing the
strength of the actual evidence, which in the context of uncertainty considerations
is a favourable feature.
Therefore the two approaches (renormalization types 1 and 2) are not statistically
equivalent. They are, however, numerically equivalent, as can easily be seen from
the present example:

    Type 1:   n_i^(4) = n_i^(3) for i > 1,   n_1^(4) = 4/3 n_1^(3),

    Type 2:   n_i^(4) = 3/4 n_i^(3) for i > 1,   n_1^(4) = n_1^(3),

    and   A / (4/3 n_1^(3) + B) = 3/4 A / (n_1^(3) + 3/4 B),     (22)

where A refers to the general form of the numerators, and B to the multiple failure
contribution in the denominators of e. g. eqs 8 and 20.

5.4. Proposed Solution for Mapping Up

It is recommended to use the type 2 renormalization procedure.

Single (independent) failure events. If an event can be classified in a k-redundant
system as an independent failure (or, in case of multiple failure, can be decomposed
into multiple independent failure events), no ambiguity will remain about the
behaviour of the other redundancies in an n-redundant system with n > k. The
expected failure event frequency is independent of the redundancy structure of the
population and proportional to the population size, which can therefore be kept
constant, i. e.

    P_1^(n) = P_1^(k).     (23)

Total system failure events. When the target system shows a higher degree of
redundancy (n > k), then any postulated n of n malfunction event corresponds to an
original k of k event, and therefore has to be renormalized by a reduction factor
k/n, as any actual k of k malfunction event is only evidence for the fraction k/n of a
fictitious n of n event.
The failure of all system redundancies is often discussed in the framework of a
shock model as a "lethal shock", with the shock frequency being independent of the
degree of redundancy. It should be emphasized that this assumption is not violated
by the proposed procedure, as the number of total failure events remains constant,
i. e. independent of the system size. However, their statistical weights have been
renormalized, accounting for incomplete information.

Partial system failure events. First of all, a renormalization factor k/n has to be
introduced. Next, only the unknown states of the remaining n-k components have to
be determined, as k states are already known. According to the preferred degree of
pessimism, there are two possible methods:

- The "worst case" method assumes that all n-k components failed. For a 2 of 3
  component failure event and a 4-redundant target system, the fourth component
  is then assumed failed, which yields a target impact vector with the only
  non-vanishing probability

      P_3^(4) = 0.75.     (24)

- The original event is used to derive the probability that any number of the n-k
  components failed: the "most likely" conditional probabilities of 2/3 for "failed"
  and 1/3 for "intact" are employed. The target system impact vector then reads

      P_2^(4) = 0.25,   P_3^(4) = 0.5.     (25)


For realistic estimates, it is recommended to use the evidence-based second
extrapolation, which can be interpreted in the framework of Binomial failure.
However, it is not recommended to replace actual evidence by a theoretical picture,
e. g. to employ the BFR model to calculate all component states, as is proposed in
Mosleh et al (12): there is no need to destroy evidence.
The general solution to extrapolate an incomplete failure event (l of k components
failed) to a higher degree of redundancy n proceeds as follows:

(1) determine the conditional failure probability p = l/k
    (l = 2 of k = 3 components failed in the above example, k-l = 1 remained intact);

(2) set P_{n-k+l+1}^(n), ..., P_n^(n) = 0;     (26)

(3) calculate the n-k+1 impact vector probabilities P_l^(n), ..., P_{n-k+l}^(n) by using
    the Binomial distribution with p = l/k, i. e.

    P_{l+i}^(n) = (k/n) C(n-k, i) (l/k)^i ((k-l)/k)^(n-k-i)     (27)

    for i = 0, ..., n-k.
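
The complete mapping-up recipe for a partial failure event (renormalization factor k/n plus a Binomial distribution over the unknown component states, Equations (26) and (27)) can be sketched in a few lines; the Python below is an added illustration and reproduces the values of Equation (25).

    from math import comb

    def map_up(k, l, n):
        """Map an l-of-k partial failure event onto an n-redundant target system (n > k),
        using the type 2 renormalization factor k/n. Returns a dict {j: P_j^(n)};
        entries above n-k+l are implicitly zero (Equation 26)."""
        p = l / k                                  # conditional failure probability
        vec = {}
        for i in range(n - k + 1):                 # states of the n-k additional components
            vec[l + i] = (k / n) * comb(n - k, i) * p**i * (1 - p)**(n - k - i)
        return vec

    print(map_up(k=3, l=2, n=4))   # {2: 0.25, 3: 0.5}, cf. Equation (25)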

6. SUMMARY

Some current issues referring to CCF identification and definition have been
discussed. In view of the state of the art, one can say that the choice of the most
appropriate method for the treatment of dependent failures ("CCF") in a given
reliability analysis depends on the purpose and aim of this analysis as well as on
the quality and extent of the available data, which may also influence the possible
or necessary degree of detail of the analysis. A unique method for all purposes
cannot be recommended.
For practical purposes, the absence of an appropriate CCF definition can be
overcome by expert judgment, and all kinds of uncertainties inherent in this process
should be treated appropriately.
Two pitfalls are identified in the process of evaluating CCF event data when
systems do not match in the degree of redundancy:
- the neglect of success information when CCF event data are applied to a system
  with lower redundancy,
- the neglect of renormalization when CCF event data are transferred to a system
  with enhanced redundancy.
Both pitfalls lead to a pessimistic overestimation of CCF effects on the system
failure behaviour, especially in systems with a redundancy degree > 3, in which
biases of more than an order of magnitude can be observed (Doerre (11)).
It is recommended not to neglect the underlying implications for expert judgment
in any future CCF analysis, in order to avoid unnecessarily pessimistic and thus
irrelevant results. As to data evaluation, a complete description of a CCF event has
to include both the number of failed and the number of intact components. In case
of extrapolation to a higher degree of redundancy, assumptions have to be
supplemented in a way that neither destroys actual evidence nor creates fictitious
evidence.
Two mathematical methods have been developed which treat the two cases of
redundancy mismatch adequately.

REFERENCES

1. Poucet, A., Amendola, A., and Cacciabue, P. C., "CCF-RBE: Common Cause Failure Reliability Benchmark Exercise", final report, EUR 11054 EN, 1987
2. Bourne, A. J., Edwards, G. T., Hunns, D. M., Poulter, D. R., and Watson, I. A., "Defences against common-mode failures in redundancy systems", UKAEA Report SRD R 196, Safety and Reliability Directorate, Jan. 1981
3. Crellin, G. L., Jacobs, I. M., Smith, A. M., and Worledge, D. H., "Organizing Dependent Event Data - a Classification and Analysis of Multiple Component Fault Reports", Reliab. Engng. 15 (1986), 145-158
4. Virolainen, R., "On common cause failures, statistical dependence and calculation of uncertainty; disagreement in interpretation of data", Nucl. Eng. Des. 77 (1984), 103-108
5. Apostolakis, G., and Kaplan, S., "Pitfalls in Risk Calculation", Reliab. Engng. 2 (1981), 135-145
6. Hughes, R. P., "A New Approach to Common Cause Failure", Reliab. Engng. 17 (1987), 211-236
7. Doerre, P., "An Interaction Model for Periodically Tested Standby Components", Proc. of the 9th Int. Topical Conf. on Probabilistic Safety Assessment and Risk Management (PSA '87), Zurich, Switzerland, Aug. 31 - Sept. 4, 1987, Vol. I, 203-208, Verlag TÜV Rheinland GmbH, Köln 1987
8. Apostolakis, G., and Moieni, P., "The Foundation of Models of Dependence in Probabilistic Safety Assessment", Reliab. Engng. 18 (1987), 177-195
9. Doerre, P., and Sobottka, H., "The Choice of Appropriate Reliability Data as a Decision Problem", Proc. of the 5th EuReDatA Conf., Heidelberg, April 9-11, 1986, Springer Verlag, Berlin, Heidelberg, New York, Tokyo 1986
10. Fleming, K. N., Mosleh, A., and Deremer, R. K., "A systematic procedure for the incorporation of common cause events into risk and reliability models", Nucl. Eng. Des. 93 (1986), 245-273
11. Doerre, P., "Possible Pitfalls in the Process of CCF Event Data Evaluation", Proc. of the 9th Int. Topical Conf. on Probabilistic Safety Assessment and Risk Management (PSA '87), Zurich, Switzerland, Aug. 31 - Sept. 4, 1987, Vol. I, 74-79, Verlag TÜV Rheinland GmbH, Köln 1987
12. Mosleh, A., et al., "Procedures for Treating Common Cause Failure in Safety and Reliability Studies", prepared for EPRI/NRC, NUREG/CR-4780 (PLG-0547) (draft), April 1987

EXPERIENCE AND RESULTS OF THE CCF-RBE.

A. POUCET
Commission of the European Communities
Joint Research Centre
System Engineering Institute
I-21020 Ispra (VA)
Italy

ABSTRACT. The Joint Research Centre of the CEC has programmed a series of benchmark exercises
in order to assess the state of the art in reliability assessment methodology, to assess the uncertainties
involved and to eventually arrive at consensus procedures for carrying out analyses. The Common Cause
Failure Reliability Benchmark Exercise (CCF-RBE) is the second in the series and deals with the problem
of identifying, modelling and quantifying dependent failures. On the basis of a real reference plant and
system, a common set of problems was defined and analysed by ten different teams. The results were then
compared in order to assess the state of the art in CCF analysis, to identify the methods and procedures
applied, to get insights in the magnitude and causes of the uncertainties involved, and to achieve consensus
on the most appropriate procedures, methods and data. The paper discusses these results and presents the
main conclusions and lessons learnt from the CCF-RBE.

1. Organisation of the CCF-RBE

1.1. CASES ANALYSED


The CCF-RBE was organised on the basis of common study cases related to real systems in
a real NPP. The reference power plant was the Grohnde plant (KWU design and operated by
Preussen Elektra) and the systems studied were the systems that can deliver auxiliary feedwater
in the case of a loss of preferred power.
These systems are (fig. 1):

1. a start-up and shut-down system with two 100%-redundant electrical motor driven pump
trains (13 in fig. 1);
2. an emergency feedwater system with four 100%-redundant diesel driven pump trains (17
in fig. 1).

Figure 1: Survey of systems for secondary side heat removal

The TOP event analysed was the failure of sufficient supply of feedwater to the steam generators after a loss of preferred power. The mission time was assumed to be two hours. In order to limit the scope of the exercise, external events, interdependencies with other systems and human interventions during the mission were not considered.

1.2. WORKING PHASES


After a first meeting in which the general rules and objectives of the CCF-RBE were discussed,
the first documentation needed to perform the analysis was prepared by KWU and analysed
by the participants. This documentation and familiarisation phase was concluded in a second
meeting at Grohnde which included a visit to the Grohnde plant and more specifically a closer
look to the systems to be analysed.
After this initial familiarisation, the benchmark exercise was organised in two working phases:
1. Working phase I was organised to achieve an overview of procedures, methods and ap-
proaches used to perform CCF analysis. All teams were asked to perform a complete
qualitative and quantitative analysis of the system and top event according to their usual
procedures. In order to avoid divergence due to different modelling and data for indepen-
dent events, and to focus on CCF, a common fault tree for the systems and top event,
containing only independent failures was provided.

2. Working phase II was organised to get a deeper understanding of the sources and magnitude
of uncertainties and spread in the quantitative results. The scope was limited to the four
train emergency system only and, based on the qualitative analysis in phase I, a common
set of components to be included in the CCF analysis was agreed. Then the following tasks
were performed:
a. repetition of the calculation according to the new scope but using the data assumed in
phase 1 (allowing to investigate the effect of the boundaries);
b. calculation of the unavailability using a common set of parameters estimated in a
consistent way for different models (allowing to analyse the spread due to different
models);
c. calculation of the unavailability using a set of own parameter estimates obtained from
analysis of 'raw' event data (assessment of spread due to parameter estimation).

The CCF-RBE started in September 1984, working phase I was concluded in September 1985 and working phase II ended in April 1986.

2. QUALITATIVE AND QUANTITATIVE METHODS USED

2.1. QUALITATIVE ANALYSIS


The CCF-RBE underlined the importance of systematic and structured qualitative analysis.
The main objectives of a qualitative analysis can be summarised as follows:
1. to understand the mechanisms and factors which might determine dependence between the
components of the system;
2. to identify important potential dependent failure events including:
a. identification of events leading to multiple functional unavailability and components
affected by such events;
b. identification of root cause events and cause-effect chains leading to cascade failures, and the components affected by them;
c. identification of potential human errors in test and maintenance affecting more than one component;
d. identification of groups of components that may be affected by general CCF causes
(CCF component groups);
e. assessment of the effectiveness of the defences built in to prevent dependent failures or to reduce their likelihood;
3. to screen and rank the potential dependent failure events in order to identify the most
important ones which will be carried over to the quantification process;
4. to assist the screening and impact assessment of raw data used for the estimation of CCF model parameters.
Especially with respect to item 4 above, a close link between qualitative analysis and quantification was judged to be absolutely necessary to get meaningful and consistent results during the quantification. The insight in and judgement on the quality of system design and operation,

gained by the qualitative analysis, must be reflected not only in the choice of the model used for CCF analysis and in the CCF component groups, but also in the quantification of the model.
The methods used for qualitative analysis were basically of two different types:

1. component driven methods, similar to Failure Modes and Effects Analysis (FMEA), in which
for every component and failure mode an investigation is made on the causes that can be
shared with other components. An example of a dependent failure oriented FMEA table
is given in fig. 2;
2. cause driven methods, which start from a classification of potential common causes (checklist) and analyse which components can be affected by these causes. An example of a CCF checklist as used by some teams is given in fig. 3.

Figure 2: Dependent failure oriented FMEA table (example rows for the emergency feedwater pump and pump drive trains, with columns for component identification, type/construction, use/function, manufacturer, internal conditions, external environment, initial conditions and operating policy, and test policy)

Normal environment: dust, humidity, temperature, vibrations

Internally generated abnormal environmental conditions: temperature, pressure, radiation, chemical corrosion, pipe whip, missiles, jet impingement, local flooding, explosions, fires

Externally generated abnormal environmental conditions: aircraft crash, explosions in vicinity of site, fire in vicinity of site

Extreme natural environment: extreme weather conditions, floods, earthquakes

Effects of design errors: system/component unfit for mission, systems with potential CCF, systems difficult to maintain

Effects of manufacturing errors

Effects of assembly errors

Effects of human errors: during operation, during test and maintenance

Figure 3: CCF causes checklist

The FMEA approach can be valuable for determining functional dependencies and cascade failures to be incorporated later explicitly in the logic model (fault trees). The approach was to be complemented by a mapping of components according to their CCF related attributes in order to be able to identify the CCF component groups. This mapping was performed on the basis of a limited checklist of attributes (manufacturer, type, hazard and location).
The cause driven (checklist based) approach can be used for identification of dependent failures in the narrow sense: i.e. for identifying groups of components susceptible to experience common cause failures due to the same root cause, but excluding cascade failures and multiple functional unavailability.
The CCF-RBE led to the conclusion that both types of methods are complementary and that both approaches should be present in a wide-scoping analysis. A way to achieve this is to use a modified FMEA table (fig. 4). In this table the major component attributes are represented and columns are inserted for CCF cause categories and for the identification of other components which might be struck by the same causes.

Column headings: component identification; function; component type; component manufacturer; component location; test and maintenance; failure mode; detection possibility; effects on other components; failure cause categories; other components sensitive to the same causes.

Figure 4: CCF qualitative analysis table

As the number of identified dependencies can be large, it is necessary to perform some screening and to extract a list of the most important ones. In this way the CCF component groups to be analysed in a quantitative way can be determined. Qualitative screening can be performed by using a set of rules derived from experience. The following basic rules were used by most participants:

1. identical redundant components have to be considered in CCF component groups;

2. CCF's between completely diverse components in separate redundant trains can be neglected if such trains also contain identical components (these will dominate the CCF contribution anyway);
3. diverse redundant components having identical piece parts cannot be assumed to be independent. However, if identical redundant components are present in the system, they can be assumed to outweigh any CCF's on partially diverse components.

Screening can be performed also on the basis of quantitative criteria: e.g. using a simple model (such as the β-factor model) with generic parameters. A significance level with respect to the independent failure contributions is assumed to perform the screening.

2.2. QUANTITATIVE METHODS
The quantitative methods used in the CCF-RBE included:
1. parametric models such as the β-factor model, the multiple Greek letter (MGL) model, the Marshall-Olkin model, and the binomial failure rate (BFR) model;
2. methods based on more judgemental assessment such as the cut-off technique and the partial β-factor model;
3. the use of failure rate coupling as implicit treatment of dependency.

The β-factor model was used by some participants for screening purposes only or in comparison to other models. The attractiveness of the β-factor model lies in its simplicity and in the availability of generic β factors for different types of components. However, the model may give conservative results in the case of multiple redundant systems.
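To make the conservatism concrete: under the β-factor model the common cause term does not decrease as more redundant trains are added, so a 1-out-of-m group gets little credit for redundancy beyond m = 2. The following rough sketch is our own illustration, not part of the exercise, and the numerical values are hypothetical:

def beta_factor_unavailability(q_total, beta, m):
    # Rare-event estimate for a 1-out-of-m standby group under the
    # beta-factor model: either all m components fail independently,
    # or one common cause event (probability beta*q_total) fails them all.
    q_ind = (1.0 - beta) * q_total
    q_ccf = beta * q_total
    return q_ind ** m + q_ccf

q_total, beta = 3.0e-3, 0.1          # hypothetical per-demand values
for m in (2, 3, 4):
    print(m, beta_factor_unavailability(q_total, beta, m))

The common cause term (3.0e-4 here) dominates for every m, which is why generic β values tend to be pessimistic for three- and four-train systems.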
The Multiple Greek Letter (MGL) model was the most frequently used model in the CCF-RBE. Many participants found it a natural extension of the β-factor model. Some problems concerning the estimation of MGL parameters were identified during the CCF-RBE. They will be discussed later.
The Binomial Failure Rate (BFR) model was used by fewer participants. The BFR model is not general in the sense that it assumes that common cause events either fail all components together because of a lethal shock or have a binomial impact. It appeared that there was no significant difference between the results obtained using the MGL and BFR models, at least not in the cases studied.
Both MG L and BFR models suffer from the fact that the parameters share statistical evidence
and, hence, are not independent from each other. This creates some difficulty in calculating
uncertainties.
Moreover, the parameter estimation for those models is based on the use of estimators in terms
of individual component failures (component statistics). Models based on event statistics (taking
into account the number of events involving single, double, triple... failures) are preferable. The
use of component statistics can be shown to lead to an artificial increase in the strenght of the
evidence and, consequently, to narrower distributions on the parameters and a downward shift
of the mean value of the parameters (Apostolato 1986). However, the numerical impact was
believed to be small and not significant in respect of the much larger impact of analyst judgement
used in parameter estimation.
The Basic Parameter (BP) model has independent parameters and is event based, and, hence,
would be preferable from a theoretical point of view. However, it needs data about sample size
and observation time and such data are not easily obtainable. Recently other event based models
have been proposed (e.g. afactor model (Mosleh, 1987)).
The number of parameters for MG L and BP models and their values depend on the number of
redundancies in the system studied. For this and other reasons, the use of generic parameters for
such models was judged to be inappropriate but for screening purposes.

2.3. PARAMETER ESTIMATION


Parameter estimation was recognised as a very crucial issue in quantitative dependent failure
analysis. This task can introduce a large spread in the results as will be shown when the numeric

results of the CCF-RBE are discussed. The major source of this variability is the extensive use
of expert judgement in the process of event screening.
Parameter estimation involves:
1. analysis of operating experience in order to identify the event data base to be used;
2. screening of the events in the event data base in order to assess the degree of relevance of
the events for the plant under consideration (impact vector assessment);
3. calculation of the parameters for the chosen model.

The parameter estimation task in the CCF-RBE was based on the use of the Nuclear Power Experience (NPE) data base composed of LER events and, hence, based on U.S. experience. LER's only partially report independent failures. Therefore, the combination of MGL or other parameters assessed from this data base with independent failure rates assessed from another type of reporting (more complete with respect to independent failures) may lead to an overestimation of CCF probabilities.
The impact vector method of screening the data was judged to be very convenient, but the use of this method in the CCF-RBE identified some previously unrealised problems related to the extrapolation of events from systems with lower redundancy to systems with higher redundancy and vice versa (mapping up and mapping down). Whereas mapping down impact vectors (i.e. extrapolation from a higher redundancy system to a lower redundancy one) is a deterministic operation and can be performed by using formulas that take into account the difference in system size, mapping up implies some assumptions on the nature of the CCF event (e.g. lethal shock or not). Since the CCF-RBE, some solutions to the problem have been proposed in the literature (Doerre, 1987).
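Mapping down has a simple combinatorial form: if the n trains of the target system are regarded as a random subset of the m trains of the observed system, the impact vector elements are re-weighted hypergeometrically. The sketch below is our own illustration of this idea (the function name and inputs are ours), along the lines of the formulas referred to above:

from math import comb

def map_down(impact_m, m, n):
    # Map an impact vector observed on an m-train system onto an n-train
    # system (n < m): w_k(n) = sum_j C(j,k)*C(m-j,n-k)/C(m,n) * w_j(m).
    assert n < m and len(impact_m) == m + 1
    impact_n = [0.0] * (n + 1)
    for j, w in enumerate(impact_m):          # j trains failed out of m
        for k in range(n + 1):                # k failed among the n selected
            if k <= j and n - k <= m - j:
                impact_n[k] += w * comb(j, k) * comb(m - j, n - k) / comb(m, n)
    return impact_n

# An event judged to have failed 3 of 4 trains, mapped to a 2-train system:
print(map_down([0, 0, 0, 1.0, 0], m=4, n=2))  # -> [0.0, 0.5, 0.5]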
Besides the differences in system size, the analyst performing impact vector assessment is confronted with differences in system design, technology and operation. In facing these differences, the analyst may screen out events which are judged not to be relevant for the plant under analysis, but there is no way of 'pulling in' events which are relevant but do not appear in the event data base. Therefore, it was agreed by all participants that the parameter estimation task should be based on an as extensive as possible data base, including events related to NPP's of whatever design.

3. Results of first working phase


The numeric results obtained in phase I show a significant spread, as can be observed in fig. 5. It was however difficult to get a clear picture of the causes of the analyst-to-analyst variability from the phase I results only.
The, sometimes poor, integration of the qualitative analysis with the modelling and quantitative analysis, and the different degree of structure of the various qualitative analyses, have in our opinion had a big impact on the large spread observed. This is confirmed by the fact that there was in general quite a good agreement between the qualitative conclusions reached by the different participants.
The variety of assumptions made by the participants, leading to differences in the established CCF component groups and in the logic modelling, has also contributed to the variability observed.

In the second working phase, a number of common assumptions have been adopted in order to
achieve a better comparability of the results.

Figure 5: CCF-RBE Phase I: total unavailability of feed function (start-up/shut-down and emergency feed systems). The plot shows, for each team (DK, F, FRG1, FRG2, FRG3, I, S, UK, USA), the best estimate point value, mean, median and range or 90% confidence bounds.

4. Results of second working phase


The calculation of the emergency system unavailability according to the restricted scope agreed
for phase II was first performed using the original data assumed in phase I.
The results of this calculation are represented in Fig. 6.

Figure 6: CCF-RBE Phase II: emergency feed system unavailability calculated using data from Phase I. The plot shows, per team, point estimates, means, medians, ranges or 90% confidence intervals, and the best values from the first phase.

A calculation was then performed using a common set of CCF parameters, assessed in a consistent way for both the MGL and the BFR models. The parameters were derived from event data out of the Nuclear Power Experience data base (a LER based system). The results obtained by the participants using the common parameter set are represented in fig. 7.
The comparison of the results of these two calculations indicates that, given a well defined scope and boundary for the system to analyse, different analysts can achieve consistent results on the condition that the data base is the same. Indeed the results of the second calculation (fig. 7) show a very good agreement even if different parametric models were applied (the MGL method seems to yield somewhat wider uncertainty bounds).
Given this, it follows that the spread observed in fig. 6 for the first calculation is due to the spread in the original data (parameter values) used.
In a third calculation, the participants were asked to estimate themselves the CCF parameters for
the MGL or BFR model.
For this calculation, a set of event reports were provided to the participants. This set was the
same as the one used to assess the common parameters applied in the previous task.
The results of the parameter estimation task are summarised in fig. 8. The emergency feed
system unavailability calculated using the parameter estimates is represented in fig. 9.

Figure 7: CCF-RBE Phase II: emergency feed system unavailability calculated using the reference set of CCF parameters (point estimates, means, medians and ranges or 90% confidence intervals per team).

Again it can be observed that the parameter estimation is a key contributor to the variability
observed in CCF quantification. The major source of the variability is the subjective judgement
applied during the screening and assessment of the events: i.e. the decision whether some event
is applicable to the plant under consideration or not, and to what extent redundancies could be
affected.
In order to reduce the spread introduced in parameter estimation, it was judged helpful to have some pre-established guidelines for performing the event screening process.
The following general guidelines were used in the CCF-RBE:

1. component caused functional unavailabilities were screened out, since it was assumed that those were modelled explicitly in the fault trees;
2. if a specific defence existed against a class of dependent events, specific events of this class were screened out;
3. if a reported event was caused by a train interconnection which does not exist in the plant under consideration, the event was considered as an independent event;
4. events related to inapplicable plant states (e.g. start-up or shutdown) were screened out inasmuch as they did not reveal general CCF mechanisms capable of occurring in normal power operation;

Figure 8: CCF-RBE Phase II: MGL parameters (β, γ, δ) estimated by the participants (pessimistic and optimistic estimates of team DK, FRG1, FRG2, FRG3, UK and USA) for the CCF component groups: emergency feed pump diesel aggregates, demineralised water recirculation pumps, HVAC fans, motor valves (steam generator level control valves, stop valves, flow limiting valves) and check valves (check valves in front of the steam generators, free flow check valves, check valves downstream of the demineralised water recirculation pumps).

5. events related to incipient failure modes were screened out;


6. events regarding failure modes that are irrelevant to the system logic model were screened out.
Apart from the uncertainty arising from the subjective judgement used in processing the event
data, other issues contributing to uncertainty of the data were also identified but were found to
have lesser impact. These include the mapping up and down of impact vectors for extrapolating
to systems with higher or lower redundancy (see above), the event vs. component statistics issue
(see above), the uncertainty in the number of independent events because of underreporting of
such events and the choice of prior distribution if a Bayesian estimation of the parameters is
performed.

Figure 9: CCF-RBE Phase II: emergency feed system unavailability calculated using own parameter estimates (point estimates, means, medians and ranges or 90% confidence intervals per team).

5. Conclusions
The CCF-RBE has been very successful in achieving an assessment of the state of the art of CCF analysis and identifying some key contributors to the uncertainty involved in this activity. First, it has allowed to clarify the confusing terminology related to dependent events and has contributed to a better understanding of the different categories of dependent events and their related terms. Once these different categories were clear, it was also possible to achieve an agreement on the domain of application of the various explicit and implicit (parametric) modelling approaches:

1. dependent events due to a clear deterministic cause, such as unavailability of support functions, cascade failures and human errors, should in principle be modelled explicitly in the fault trees and event trees;
2. the residue of potential multiple failure events for which no clear deterministic cause can be identified in the system logic, for which a multiplicity of causes such as 'environment', 'design', 'maintenance', etc. can be assumed, or that because of cost reasons or impossibility to quantify are not further decomposed, can be captured by parametric modelling.
There was a consensus among the participants about the importance of a well structured qualitative
analysis and about the necessity to link this qualitative analysis closely with the subsequent
quantification.

233
The CCF-RBE has shown clearly that, with the current state of the art, different analysts can come up with rather different quantitative results of a CCF analysis, even if very strict common assumptions on what to quantify and how are taken. The CCF-RBE has demonstrated that one of the main contributors to the spread is the subjective judgement applied in processing event data for parameter estimation.
Last but not least, the CCF-RBE participants agreed that one should use the widest possible event data base for estimating CCF model parameters. Indeed, the class of dependent events that we want to cover using parametric models is linked with a very wide spectrum of causal mechanisms. Hence, every event in whatever design might contain some new information.

6. REFERENCES
Amendola, A. (1985a). Results of the reliability benchmark exercise and the future CEC-JRC Programme, Proc. ANS/ENS Int. Topical Meeting on Probabilistic Safety Methods and Applications, San Francisco, Feb. 24 - March 1, 1985.
Amendola, A. (1985b). Systems reliability benchmark exercise, Final Report, CEC JRC Ispra, EUR 10696.
Apostolakis, G. and P. Moieni (1986). On the correlation of failure rates, Reliability data collection and use in risk and availability assessment, H.J. Wingender (ed.), Springer Verlag, Heidelberg.
Doerre, P. (1987). Possible pitfalls in the process of CCF event data evaluation, International Topical Conference on Probabilistic Safety Assessment and Risk Management, Zurich, August 30 - Sept. 4, 1987.
Mosleh, A., and N.O. Siu (1987). A Multi-parameter, Event-based Common-cause Failure Model, Proc. of the 9th SMiRT, Lausanne, August 1987.
Poucet, A., A. Amendola and P.C. Cacciabue (1987). Common Cause Failure Reliability Benchmark Exercise, Final Report, CEC JRC Ispra, EUR 11054 EN.

ANALYSIS OF CCF-DATA - IDENTIFICATION:
THE EXPERIENCE FROM THE NORDIC BENCHMARK

K.E. Petersen
Systems Analysis Department
Risø National Laboratory
P.O. Box 49
DK-4000 Roskilde
Denmark

ABSTRACT. Within the nuclear safety programme sponsored by NKA, the Nordic Liaison Committee for Atomic Energy, a project "Risk Analysis" is performed. One topic within this project is a Benchmark Exercise on the treatment of common cause failure data. This paper describes the identification of common cause failures from failure data concerning motor-operated valves in Swedish Boiling Water Reactor plants. The paper discusses identification procedures, results of the identification, and experiences and conclusions.
This presentation is based on the work carried out in cooperation between ABB ATOM AB (Sweden), Studsvik Energy Technology AB (Sweden), the Technical Research Centre of Finland (Finland) and Risø National Laboratory (Denmark). The common conclusions are presented in the summary report "Summary Report on Common Cause Failure Data Benchmark Exercise", ref. 1.

1. BACKGROUND

The Benchmark Exercise has been carried out by four working groups:

- ABB ATOM AB, Sweden


- Studsvik Energy Technology AB, Sweden
- Technical Research Centre of Finland (VTT)
- Risø National Laboratory, Denmark.
A total effort of 10 man months has been used for the CCF identifi-
cation phase which was carried out within an 8 months period of time.
Motor-operated valves in Swedish Boiling Water Reactor (BWR)
plants have been chosen as the object of the CCF-data Benchmark Exercise.

2. GOALS AND PRINCIPLES

The goals of the exercise were:
- Identification of CCF's which have occurred during the operation
of the plants

- Estimation of CCF-contributions for four redundant motor-operated


valves in a safety system at the FORSMARK 1 plant.

- All failure multiplicities should be considered.

This paper discusses the identification phase.

A set of basic principles has been established for the exercise:

1) Only multiple failures (actual or potential) are of interest.

2) It should be possible to point out one or several common causes


as a source of each CCF.

3) Critical time period is of great importance when dependencies are


considered. Failure of identical components can be manifested di-
rectly when the failure occurs or a long time after occurrence.

4) Cascade failures and other dependencies which originate from


basic design principles (e.g. functional dependencies related to
power supply, signal exchange or to other explicitly modelled
auxiliary systems), are not regarded as CCF's in this study.

5) Many failures are not manifested as multiple, since measures are


taken before they occur. In some cases the failure event has such
a character that even if only a single failure has actually oc-
curred, the same failure mechanism with the same cause may be de-
tected in other units. Furthermore, seemingly independent multiple
failures which occur close in time cannot a priori be regarded as
fully independent. One way to model such failures is to include
them in the group of potential (as opposed to actual) CCF's. This category can even include components in degraded condition whenever they occur in conjunction with actual failures in a dependent fashion.

6) The analysis is not a priori limited to redundant components


within single systems. It is, however, expected that the evidence
of CCF's involving identical components within different systems
(intercomponent - intersystem CCF's) will be rather weak.

Furthermore, some boundary conditions and limitations associated with


the identification procedure were given:

1) For physical boundaries of the components the definition given in


the Swedish Reliability (T-) Data Book, should be used.

2) Only critical failures should be classified as actual CCF's.

Determination of what is a critical failure is to be performed
independently by each institute using own judgment, ATV's orig-
inal classification and ASEA-ATOM's classification which consti-
tutes the basis of the T-book.

3) Among potential CCF's, combinations involving non-critical failures may be found. The interest should be focused on cases with combinations of e.g. a critical and a non-critical failure, or violations of Technical Specifications. Combinations of non-critical failures discovered during the overhaul period are less serious.

3. DATA SOURCES

Failure reports are contained in the Scandinavian Nuclear Power Reliability Data System. In addition, the Swedish Licensee Event Reports (LER) involving motor-operated valves were selected. In total the following periods have been covered for 7 Swedish nuclear power plants:

Barsebäck 1     771001 - 821231
Barsebäck 2     790101 - 821231
Forsmark 1      810101 - 821231
Forsmark 2      810701 - 821231
Oskarshamn 1    740101 - 821231
Oskarshamn 2    760101 - 821231
Ringhals 1      761001 - 821231

340 failure reports and 2 Licensee Event Reports were collected.


In addition the following documentation was prepared for the exer-
cise:

- a list of all motor-operated valves covered by failure reports, in-


cluding operation times, test intervals, number of activations and
number of critical failures during commercial operations

- a list of all overhaul periods

- failure codes used in the ATV-system

- flowsheets of relevant systems.

4. IDENTIFICATION PROCESS

4.1. Initial Identification

An analysis of the CCF-data was performed independently by each of the


four working groups. The working procedures chosen by each group were
different. Further, differences in scope and additional boundary con-
ditions resulted in differences in the list of identified CCF-contri-

butions. Due to incompleteness of the failure reports subjective judge-
ment was used by all working groups causing differences in interpreta-
tion of the same sources of information.
In table 1 the main features of the approach to identification
chosen by all four working groups are summarized.

TABLE 1

FEATURE                                          ABB ATOM   RISØ     STUDSVIK   VTT

SCOPE
intersystem dependencies                         yes        yes      no         no
limited to redundancies                          no         no       yes        no
LERs considered                                  yes        no 1)    no 1)      no 1)

MAIN IDENTIFICATION FACTORS
critical time period                             yes 2)     yes 3)   yes 3)     yes 3)
failure cause(s)                                 yes        yes      yes        yes
failure mode                                     yes        yes      yes        yes

BOUNDING CONDITIONS
separate treatment of overhaul periods           yes
acceptance of failure criticality according
to the Reliability Data Book                     yes        yes      no         yes 4)
distinction between potential and actual CCFs    yes 5)     yes 5)   yes 5)     no

APPROACH
computerized analysis                            yes        no       no

1) Left out unintentionally
2) One test interval
3) One month
4) With minor exceptions
5) Different definitions of potential CCFs

4.2. Results of Identification

The results of the identification are presented in table 2, from which it is seen that a total of 17 CCF's were identified by at least one team.

Table 2  Identified multiple failure events

No  Plant         Components                       Date(s)          Comments            Event description                ASEA-ATOM       Risø            Studsvik        VTT
 1  Barsebäck 1   311V50, 311V60                   800904           r,R,FH              Internal leakages                potential CCF   CCF-candidate   CCF-candidate   CCF-candidate
 2  Barsebäck 1   721V25, 721V26                   800930, 800918   r,R,FH              Internal leakages                potential CCF   CCF-candidate   CCF-candidate   CCF-candidate
 3  Barsebäck 2   322V1, 322V2, 322V3              810907           r,-R,FH             Wrong connections                actual CCF      not identified  not identified  not identified
 4  Barsebäck 2   311V50, 311V60, 311V70, 311V80   800702           r,R,-FH             Sticking valves                  notable case    potential CCF   CCF-candidate   not identified
 5  Forsmark 1    321V33, 321V34                   810627, 810718   r,R,FH              Internal leakages                potential CCF   not identified  CCF-candidate   CCF-candidate
 6  Forsmark 1    323V201, 323V214                 820709           nr,R,FH             Internal leakages                potential CCF   potential CCF   not identified  not identified
 7  Forsmark 1    321V33, 321V34                   820701, 820816   r,R,FH / r,R,-FH    Internal/external leakage        excluded case   potential CCF   CCF-candidate   not identified
 8  Forsmark 2    323V20, 323V21                   811021           nr,-R,FH            Incorr. torque-switch settings   potential CCF   potential CCF   not identified  not identified
 9  Forsmark 2    322V401, 322V305                 820510, 820511   nr,R,FH             Internal leakages                excluded case   potential CCF   not identified  CCF-candidate
10  Oskarshamn 1  322V1, 322V20                    771116           r,-R,FH             Incorr. torque-switch settings   actual CCF      CCF-candidate   CCF-candidate   not identified
11  Oskarshamn 1  322V1, 322V7                     790518, 790522   nr,R,-FH / nr,R,FH  Sticking valves                  potential CCF   CCF-candidate   not identified  CCF-candidate
12  Oskarshamn 1  721V27, 721V28                   701210           r,R,-FH             No information available         excluded case   CCF-candidate   CCF-candidate   not identified
13  Oskarshamn 2  321V2, 321V32                    810723, 810730   nr,R,FH             Internal leakages                potential CCF   potential CCF   not identified  not identified
14  Oskarshamn 2  323V3, 323V15                    780628, 780714   nr,R,FH             Internal leakages                potential CCF   CCF-candidate   not identified  not identified
15  Oskarshamn 2  721V25, 721V26                   770707, 770703   r,R,FH              Sticking valves                  potential CCF   CCF-candidate   CCF-candidate   CCF-candidate
16  Ringhals 1    323V3, 323V                      811105           r,R,-FH             External leakages                notable case    CCF-candidate   CCF-candidate   not identified
17  Ringhals 1    322V7, 322V8                     791028           r,-R,FH             Incorr. torque-switch settings   not identified  not identified  not identified  CCF-candidate

r = redundant components; nr = non-redundant components; R = overhaul period; -R = normal operation; FH = critical failure; -FH = non-critical failure

4.3. Screening for Applicability

The list of CCF's presented in table 2 was agreed as a common data base.
The list was screened by each working group independently. The screening
was performed with the following purposes:

- it was up to each analyst to decide which of the 17 CCF's are


relevant for the present application

- it was up to each analyst to decide about the applicability of


CCF's involving non-redundant components belonging to the same
system.

The screening for "non-CCF" events caused an exclusion of 1-4 events


from the list. Further, 1-6 events involving non-redundant components
were excluded due to non-applicability. Finally, one team performed a
design oriented screening resulting in exclusion of one additional
event.
Since the identified CCF's originate mainly from plants with a lower level of redundancy than Forsmark 1, it must be decided on a case by case basis whether three or four valves (if present) would have been affected by the shock in question. This may lead to extension of some of the observed failures to higher multiplicity. This type of data modification has been performed in some cases by ASEA-ATOM, Risø and STUDSVIK. According to VTT's opinion such extensions are of a Bayesian nature and should be included in prior distributions, which cannot be done completely by using available methods.

5. CONCLUSIONS

The conclusions from the identification of CCF's are summarized below:

- Excluding the differences in scope and neglecting the differences


in definitions, 9 of the 11 events involving redundant valves were
identified by at least three of the groups. The result is con-
sidered encouraging and indicates that basic identification can be
reasonably performed with the available raw data.

- Taking into account the use of subjective judgement at several stages of the identification process, the results of the analyses performed indicate that some consistency is possible. The discrepancies may be explained by differences in scope, bounding conditions and type of approach.

- When performing this kind of analysis, additional information would be beneficial:
- component specifications
- location of components
- manufacturers of components
- maintenance policies

- interviews with plant personnel
- maintenance logs.

- The use of computer aids in searching, sorting and reorganizing failure reports is highly recommended. However, dependence on the computer alone to directly identify CCF's is discouraged.

- The uncertainty concerning the quality of reports originating from the overhaul period is a serious drawback, emphasizing the need for improvements of these reports.

- Design oriented screening of CCF-data is recommended when sufficient information is available.

- Extension of failures to higher multiplicity may have a strong impact on the results. Since the extensions are based on judgement, they have to be performed with care.

- A relatively large effort is required to perform a proper analysis, including the collection of the relevant information.

- Subjective judgements are needed at several stages of the identification process, requiring a high quality of the documentation of the analysis.

REFERENCES

Summary Report on Common Cause Failure Data Benchmark Exercise. Ed. Stefan Hirschberg, AB ASEA-ATOM, Sweden. NKA Project "Risk Analysis", RAS-470(86)14, June 1987.

SOME COMMENTS ON CCF-QUANTIFICATION:
THE EXPERIENCE FROM THE NORDIC BENCHMARK

Kurt Pörn
Safety and System Analysis
Studsvik AB
S-611 82 Nyköping
Sweden

ABSTRACT. Within the nuclear safety programme partly sponsored by NKA, the Nordic Liaison Committee for Atomic Energy, a project "Risk Analysis" (Ref. 1) is being performed. Within the scope of this project a Benchmark Exercise on the treatment of common cause failure (CCF) data has been accomplished. The common conclusions of this Benchmark are presented in a summary report (Ref. 2) and, more briefly, in another paper of this book (Ref. 3). The experiences from the CCF-identification stage, in particular, are also described by a paper in this book (Ref. 4). This paper describes and comments on the quantitative methods used by the participants in the Benchmark Exercise. The paper defines the parameters of the models and how they were estimated.
This presentation is based, to a great extent, on the work carried out in cooperation between ABB ATOM AB (Sweden), STUDSVIK AB (Sweden), the Technical Research Centre of Finland and Risø National Laboratory (Denmark).

1. BACKGROUND AND GOALS

The Benchmark Exercise has been carried out by the following groups:

ABB ATOM AB, Sweden


STUDSVIK AB, Sweden
Technical Research Centre of Finland (VTT)
Risø National Laboratory, Denmark

Motor-operated valves (MOV) in Swedish Boiling Water Reactor plants were chosen


as the object of the CCF-data Benchmark Exercise. References 3 and 4
provide the general background, main results and experiences from the
CCF-identification work in particular.
Starting from a common data base (Table 1) of CCF-candidates, the objective of the quantification phase was to estimate the CCF contributions to the unavailability of 4-redundant MOVs in a safety system at the Forsmark 1 plant. The quantities to be estimated were the probabilities P(i/m) = Pr {exactly i failures at a test or demand of m redundant components}, including uncertainty and sensitivity analysis.
TABLE 1  Identified Multiple Failure Events (the last four columns give the status assigned by each team)

No  Plant         Components                       Date(s)          Comments            Event description                ASEA-ATOM       Risø            Studsvik        VTT
 1  Barsebäck 1   311V50, 311V60                   800904           r,R,FH              Internal leakages                potential CCF   CCF-candidate   CCF-candidate   CCF-candidate
 2  Barsebäck 1   721V25, 721V26                   800930, 800918   r,R,FH              Internal leakages                potential CCF   CCF-candidate   CCF-candidate   CCF-candidate
 3  Barsebäck 2   322V1, 322V2, 322V3              810907           r,-R,FH             Wrong connections                actual CCF      not identified  not identified  not identified
 4  Barsebäck 2   311V50, 311V60, 311V70, 311V80   800702           r,R,-FH             Sticking valves                  notable case    potential CCF   CCF-candidate   not identified
 5  Forsmark 1    321V33, 321V34                   810627, 810718   r,R,FH              Internal leakages                potential CCF   not identified  CCF-candidate   CCF-candidate
 6  Forsmark 1    323V201, 323V214                 820709           nr,R,FH             Internal leakages                potential CCF   potential CCF   not identified  not identified
 7  Forsmark 1    321V33, 321V34                   820701, 820816   r,R,FH / r,R,-FH    Internal/external leakage        excluded case   potential CCF   CCF-candidate   not identified
 8  Forsmark 2    323V20, 323V21                   811021           nr,-R,FH            Incorr. torque-switch settings   potential CCF   potential CCF   not identified  not identified
 9  Forsmark 2    322V401, 322V305                 820510, 820511   nr,R,FH             Internal leakages                excluded case   potential CCF   not identified  CCF-candidate
10  Oskarshamn 1  322V1, 322V20                    771116           r,-R,FH             Incorr. torque-switch settings   actual CCF      CCF-candidate   CCF-candidate   not identified
11  Oskarshamn 1  322V1, 322V7                     790518, 790522   nr,R,-FH / nr,R,FH  Sticking valves                  potential CCF   CCF-candidate   not identified  CCF-candidate
12  Oskarshamn 1  721V27, 721V28                   701210           r,R,-FH             No information available         excluded case   CCF-candidate   CCF-candidate   not identified
13  Oskarshamn 2  321V2, 321V32                    810723, 810730   nr,R,FH             Internal leakages                potential CCF   potential CCF   not identified  not identified
14  Oskarshamn 2  323V3, 323V15                    780628, 780714   nr,R,FH             Internal leakages                potential CCF   CCF-candidate   not identified  not identified
15  Oskarshamn 2  721V25, 721V26                   770707, 770703   r,R,FH              Sticking valves                  potential CCF   CCF-candidate   CCF-candidate   CCF-candidate
16  Ringhals 1    323V3, 323V                      811105           r,R,-FH             External leakages                notable case    CCF-candidate   CCF-candidate   not identified
17  Ringhals 1    322V7, 322V8                     791028           r,-R,FH             Incorr. torque-switch settings   not identified  not identified  not identified  CCF-candidate

r = redundant components; nr = non-redundant components; R = overhaul period; -R = normal operation; FH = critical failure; -FH = non-critical failure

The following sections will show a great variety in data screening and selection, choice of CCF-models and parameter estimation methods among the four teams participating in the exercise. Thus the basic purpose of learning by the Benchmark Exercise was fulfilled.

2. PARAMETRIC MODELS USED

Several parametric CCF-models were used, quite in accordance with the objective of the exercise. The models were

Multiple Greek Letter (MGL) (Ref. 5)
Binomial Failure Rate (BFR) (Ref. 6)
Multinomial Failure Rate (MFR) (Ref. 7)
Additive Linear Model (ADDEP) (Ref. 8)

In addition, two teams applied also a non-parametric approach, where the probabilities P(i/m) were directly assessed (DA) from the data using Bayes' method (Refs. 8, 9). In order to describe and comment on the use of the forementioned models we start with the parameters of the Basic Parameter (BP) Model (Ref. 5).

λ_j = rate of failure on demand for a specific group of j components; j = 1, ..., m, among m identical components.

Thus each specific group of j components may be hit by a failure process characterized by the failure rate λ_j.
For the convenience of the reader we define below the parameters
of the forementioned models and we present the exact or approximate
relations between these parameters. Questions concerning the estimation
of the parameters are also commented. Throughout this paper we will
consider a system of four identical components.
The probabilities P(i/m) of having exactly i failures at a demand of m redundant components (to be estimated in this Exercise) are rather difficult to express in terms of the BP-parameters. This depends on the "exclusive" nature of the event: exactly i failures. A rare event approximation yields the following relations for a system of four redundant components:

P(1/4) = 4·λ₁

P(2/4) = 6·λ₂ + 6·λ₁²

P(3/4) = 4·λ₃ + 12·λ₁·λ₂ + 4·λ₁³

P(4/4) = λ₄ + 4·λ₁·λ₃ + 3·λ₂² + 6·λ₁²·λ₂ + λ₁⁴
These relations combined with Table 2 give us the possibility to express the probabilities P(i/m) also in terms of the parameters of the other models treated in this paper.
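For readers who want to reproduce the rare-event relations above numerically, a small sketch is given below (our own code; the rate values are purely illustrative):

def p_exactly_i_of_4(l1, l2, l3, l4):
    # Rare-event approximation of P(i/4) in terms of the basic-parameter
    # rates l_j (failure of one specific group of j components on demand).
    p1 = 4 * l1
    p2 = 6 * l2 + 6 * l1**2
    p3 = 4 * l3 + 12 * l1 * l2 + 4 * l1**3
    p4 = l4 + 4 * l1 * l3 + 3 * l2**2 + 6 * l1**2 * l2 + l1**4
    return p1, p2, p3, p4

print(p_exactly_i_of_4(1e-3, 1e-4, 3e-5, 1e-5))   # illustrative values only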

3. MULTIPLE GREEK LETTER METHOD

The parameters of this model are:

λ_t = rate of failure on demand for each component due to both independent and common cause events

β, γ, δ = conditional probability that the cause of a component failure that is shared by i or more additional components will be shared by i + 1 or more additional components, i = 0, 1 and 2 respectively.

The total failure rate, λ_t, of a specific component in a group of m components is given by

λ_t = Σ (from j=1 to m) C(m-1, j-1)·λ_j                                  (1)

which for m = 4 yields

λ_t = λ₁ + 3·λ₂ + 3·λ₃ + λ₄                                              (2)

According to the definitions of β, γ and δ we can write

β = (3·λ₂ + 3·λ₃ + λ₄) / (λ₁ + 3·λ₂ + 3·λ₃ + λ₄)                         (3)

γ = (3·λ₃ + λ₄) / (3·λ₂ + 3·λ₃ + λ₄)                                     (4)

δ = λ₄ / (3·λ₃ + λ₄)                                                     (5)

The MGL method contains as many parameters as identical components. Thus in the present case of four components we have the parameters (λ_t, β, γ, δ), which are related to the BP-parameters λ_j (j = 1, 2, 3, 4) as shown in Table 2.
As Apostolakis and Moieni (Ref. 7) have pointed out, it is very important to note that the conditional rates of failure given by expressions (3)-(5) are component specific and thereby are not equivalent to the conditional rate of occurrence of multiple failures, as they are often used (also in this Exercise). The conditional probability, β, that the cause of failure of component A (among A, B, C and D) is shared
by one or more additional components is not equal to the conditional rate of multiple failures given a failure, which in the present case is

(6·λ₂ + 4·λ₃ + λ₄) / (4·λ₁ + 6·λ₂ + 4·λ₃ + λ₄)                           (6)

The numerator of eq. (6) denotes the rate of multiple failures and the denominator stands for the rate of failures in general, without specifying which components are involved. Ref. 7 also explains very clearly how the likelihood function normally used for the estimation of MGL parameters is incorrectly stated with regard to the evidence of component failures. The use of that function and of the number of component failures leads to overconfidence in the estimate, and in certain situations also to an underestimation of the mean value. In light of these difficulties one may consider the development of the Alpha Factor model described elsewhere in these proceedings.
The MGL method presupposes the knowledge of

k_i = number of failures of multiplicity i,

N = number of system demands (m components)

If λ_t is known from some other source, it is enough to know the number of failures, k_i, to estimate the parameters β, γ, δ. This was just the case in our Benchmark Exercise, where λ_t was assumed known from the Swedish Reliability Data Book (Ref. 10). The MGL parameters and multiple failure probabilities have been estimated by the maximum likelihood method, which is not consistent with the Bayesian estimation used for the uncertainty distributions. Further, concerning the results shown in Fig. 1, it is to be noted that the MGL intervals are not complete uncertainty intervals because λ_t has been assumed exactly known.
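To make the parameter relations concrete, the sketch below inverts eqs. (2)-(5) for a four-component group, i.e. it converts an MGL parameter set into the basic-parameter rates. It is our own illustrative code, not part of the Benchmark, and the numbers are hypothetical:

def mgl_to_bp_4(lam_t, beta, gamma, delta):
    # Basic-parameter rates l1..l4 of a four-component group from the MGL
    # parameters; consistent with lam_t = l1 + 3*l2 + 3*l3 + l4 (eq. (2)).
    l1 = (1 - beta) * lam_t
    l2 = beta * (1 - gamma) * lam_t / 3
    l3 = beta * gamma * (1 - delta) * lam_t / 3
    l4 = beta * gamma * delta * lam_t
    return l1, l2, l3, l4

l1, l2, l3, l4 = mgl_to_bp_4(lam_t=1e-3, beta=0.08, gamma=0.6, delta=0.7)
print(l1 + 3*l2 + 3*l3 + l4)    # reproduces lam_t = 1e-3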

4. BINOMIAL FAILURE RATE MODEL

The parameters of this model are:

λ = occurrence rate of independent failures

μ = occurrence rate of non-lethal shocks

ω = occurrence rate of lethal shocks

p = conditional probability of component failure given that a non-lethal shock has occurred

The simple relations between these parameters and the parameters of the BP-model are shown in Table 2. The parameters of the BFR-model are the same independent of the number of components in the system. This feature is a clear advantage compared to the other models presented here, where
the number of parameters increases with the redundancy level of the system. Thanks to this feature one could make the reasonable assumption that the BFR-parameters really are independent of the redundancy level. One of the Benchmark teams made this assumption and thereby avoided the "mapping up" problem of transferring the lower redundancy data to systems of higher redundancy. The maximum likelihood estimators developed for utilizing data from systems of different redundancy led to the results labelled BFR I in Fig. 1.
A direct estimation of the BFR-parameters above requires observation of independent failures, non-lethal and lethal shocks. In practice this requirement is hard to fulfil, in particular to distinguish between independent failures and single failures caused by shocks. Another difficulty in our Exercise was the definition of redundant components: all valves in the same system or strictly redundant groups of valves. Therefore different interpretations were applied in the Exercise. The estimates of μ and p have shown to be sensitive to the degree of redundancy (m), approximately following the relation m·p = constant. There are also cases where an increased number of visible non-lethal shocks leads to a lower value of p.
The assumption of the binomial distribution in the BFR-model may be too strong. Thanks to this assumption of a certain distribution, given that a non-lethal shock has occurred, the number of parameters keeps constant for all levels of redundancy. Giving up this assumption leads to the model described in the next section.
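The usual BFR bookkeeping for the basic-parameter rates of a group of m components is easy to write down; the sketch below is our own illustration (not any team's code), following the common formulation in which each non-lethal shock fails each component independently with probability p:

def bfr_to_bp(m, lam_i, mu, omega, p):
    # Rate at which one specific group of j components (and no others) fails:
    # non-lethal shocks contribute mu * p**j * (1-p)**(m-j); independent
    # failures add to j = 1 and lethal shocks add to j = m.
    rates = []
    for j in range(1, m + 1):
        l_j = mu * p**j * (1 - p)**(m - j)
        if j == 1:
            l_j += lam_i
        if j == m:
            l_j += omega
        rates.append(l_j)
    return rates

print(bfr_to_bp(m=4, lam_i=1e-3, mu=1e-4, omega=1e-5, p=0.3))   # illustrative values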

5. THE MULTINOMIAL FAILURE RATE MODEL

The MFR-model, developed by Apostolakis and Moieni (Ref. 7), is characterized by the following parameters.

λ = occurrence rate of independent failures

μ = occurrence rate of shocks that may fail more than one component

π_i = conditional fraction of exactly i failures, i = 0, 1, ..., m, given the occurrence of a shock.

If we neglect π₀, the fraction of potential CCFs that lead to no failure, we have the condition (for m = 4)

π₁ + π₂ + π₃ + π₄ = 1

Thus the three free fractions and the rates λ and μ constitute the parameters of the MFR-model for a 4-redundant system. The relations of these parameters to the parameters of other models are shown in Table 2.
As described in Ref. 7, the parameters above can be estimated easily by Bayesian statistics if we know the number of system demands (N_D), the number of independent failures and the number of potential or actual CCFs in which j components fail (n_j). The name of this model
refers to the multinomial likelihood function for the fractions π_i. In our Benchmark Exercise there was only one MFR application, the result of which, unfortunately, is not correct because of a misinterpretation of the observations.
In section 3 we mentioned the overconfident estimates normally used for the MGL parameters. As described in Ref. 7 it is much easier to define consistent Bayesian estimators of the MFR parameters, based on Beta and Dirichlet distributions as prior distributions. The multivariate posterior distribution of the MFR parameters could then be transformed to the corresponding distribution of the MGL parameters according to the relations between these parameters, shown in Table 2.
The recently developed Alpha Factor Model, described elsewhere in this publication, can be considered as a simplified version of the MFR model, where the simplification consists of the fact that no distinction is made between independent failures and single failures caused by CCF shocks. The Alpha factors have a multinomial likelihood and are therefore easy to estimate by a posterior distribution. A posterior distribution of the MGL parameters, if that distribution would be required for reasons of comparison, could then be obtained from the relations in Table 2, or explicitly

β = (2·α₂ + 3·α₃ + 4·α₄) / (α₁ + 2·α₂ + 3·α₃ + 4·α₄)

γ = (3·α₃ + 4·α₄) / (2·α₂ + 3·α₃ + 4·α₄)                                 (7)

δ = 4·α₄ / (3·α₃ + 4·α₄)

In our Benchmark Exercise there was no application of the Alpha Factor


Model.
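Equation (7) is straightforward to apply numerically; the sketch below (our own illustration, with hypothetical alpha factors) performs the conversion for a four-component group:

def alpha_to_mgl_4(a1, a2, a3, a4):
    # MGL parameters (beta, gamma, delta) from the alpha factors of a
    # four-component group, following eq. (7).
    beta  = (2*a2 + 3*a3 + 4*a4) / (a1 + 2*a2 + 3*a3 + 4*a4)
    gamma = (3*a3 + 4*a4) / (2*a2 + 3*a3 + 4*a4)
    delta = (4*a4) / (3*a3 + 4*a4)
    return beta, gamma, delta

print(alpha_to_mgl_4(0.95, 0.03, 0.015, 0.005))   # hypothetical alpha factors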

6. THE ADDITIVE LINEAR MODEL

The basic element of the ADDEP-model, described in more detail in Ref. 8, is the probability

P_k = Pr {k specific components out of k fail}.

This probability is assumed equal to the probability

P_k = Pr {k specific components out of m fail};  m > k.

This assumption is quite fundamental and means that the failure probability of k specific components is not dependent on the total number of redundant components of the system. Between the probabilities P_k and the probabilities of interest in this study, P(i/m), we have the relation
TABLE 2  Relations between the parameters of the various CCF-models (with α_t = α₁ + 2·α₂ + 3·α₃ + 4·α₄)
P_k = Σ (from i=k to m) [C(m-k, i-k) / C(m, i)]·P(i/m)                   (8)
In the ADDEP-model, which is of a purely mathematical nature, the probabilities P_k, k = 1, 2, 3, 4, are expressed in terms of additive dependency factors D_k as follows

P₁ = D₁

P₂ = P₁² + D₂                                                            (9)

P₃ = P₁³ + 3·D₂·P₁ + D₃

P₄ = P₁⁴ + 4·D₃·P₁ + 3·D₂·(D₂ + 2·P₁²) + D₄
Equations (9) are easily extendable to higher multiplicities. Approxi
mate relations between the probabilities and the BPparameters are
shown in Table 2.
In our Benchmark Exercise an application of the ADDEP model was performed, where the probabilities P_k were estimated using MLE (see Fig. 1). The same team had also applied the Bayesian method to estimate the multiple failure probabilities P_{i/m}, starting with both a noninformative and an informative prior distribution. This technique is described in more detail in the next section. In principle, one could have gone further, calculating the multivariate posterior distribution of the ADDEP-probabilities by using eq. (8).
According to the definition of the probabilities P_k, these probabilities are directly applicable for the calculation of the minimal cut set probabilities. The fundamental assumption of P_k being independent of the redundancy level m justifies, in principle at least, a simple pooling of data for different redundancies.

7. DIRECT ASSESSMENT OF A NONPARAMETRIC MODEL

In this Benchmark Exercise we tried to estimate the probabilities P_{i/m} of exactly i failures at a test or demand of m redundant components. In the foregoing sections we have treated different parametric models, the parameters of which denote something else than P_{i/m} but where these probabilities can be expressed in terms of the parameters. Thus the parameters have to be estimated first. Two teams of this Benchmark tried to apply a direct estimation of the probabilities P_{i/m}, combining the available information about the number of failure events of different multiplicities (n_i, i = 1, ..., m) and the number of system demands (N_D) with a prior distribution for the unknown probabilities (Fig. 1).
Given the evidence E_m = (n_0, n_1, ..., n_m) from N_D system demands (N_D = Σ_i n_i) we have the multinomial non-standardized likelihood

   L(P_{./m} | E_m) = P_{0/m}^{n_0} P_{1/m}^{n_1} ··· P_{m/m}^{n_m}          (10)
A prior distribution π(P_{./m}), which is conjugate with respect to this likelihood, is the m-variate Dirichlet distribution with parameter a = (a_0, a_1, ..., a_m):

   π(P_{./m}; a) = d(P_{./m}; a)
                 = [ Γ(a_0 + ... + a_m) / (Γ(a_0) Γ(a_1) ··· Γ(a_m)) ] P_{1/m}^{a_1 - 1} ··· P_{m/m}^{a_m - 1} (1 - Σ_i P_{i/m})^{a_0 - 1} ,   (11)

which is defined at any point in the m-dimensional simplex

   S_m = { (P_{1/m}, ..., P_{m/m}) | P_{i/m} ≥ 0, i = 1, ..., m, and Σ_i P_{i/m} ≤ 1 }.

This distribution has the mean values

   E(P_{i/m}) = a_i / a                                          (12)

and the always negative covariances

   Cov(P_{i/m}, P_{j/m}) = - a_i a_j / ( a² (a + 1) ) ,  i ≠ j,          (13)

where a = Σ_i a_i.

Now the posterior distribution will be an updated Dirichlet

   π(P_{./m} | E_m) = d(P_{./m}; a + E_m) ,                        (14)

for which the moments corresponding to (12) and (13) are also easily updated.
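A minimal sketch of the updating in (14), together with the moments (12) and (13), is given below. The prior corresponds to the noninformative choice discussed further on, and the event counts are hypothetical, not the Benchmark data.

def dirichlet_update(a_prior, counts):
    # Posterior Dirichlet parameter (eq. 14) with its means (12) and covariances (13).
    # Both vectors are ordered as (index 0, 1, ..., m).
    a_post = [a + n for a, n in zip(a_prior, counts)]
    a_tot = sum(a_post)
    means = [a_i / a_tot for a_i in a_post]
    cov = [[(a_i * (a_tot - a_i) if i == j else -a_i * a_post[j]) / (a_tot**2 * (a_tot + 1))
            for j in range(len(a_post))] for i, a_i in enumerate(a_post)]
    return a_post, means, cov

# Hypothetical example: noninformative prior (all 1/2) and fictitious counts n_0 ... n_4.
a_post, means, cov = dirichlet_update([0.5]*5, [480, 12, 2, 1, 0])
print(means)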
Because each element of the simplex S_m corresponds to a discrete distribution in m + 1 points, this approach can be considered as a nonparametric model, where we assign a prior distribution on a set of probability distributions (Ref. 11). The Dirichlet distribution in (11) is our prior distribution, the mean values of which, (12), correspond to our best prior guess. Thus our main problem is how to choose the parameter a of the Dirichlet distribution.
In our Benchmark Exercise two teams applied a so-called noninformative prior, which according to Box & Tiao (Ref. 12) is in this case the following Dirichlet distribution

   π(P_{./m}) = d(P_{./m}; (1/2, 1/2, ..., 1/2)) .                   (15)

In Ref. 12 the concept "noninformative" means "knowing little a priori relative to the information to be provided by the experiment or observation". However, one can raise the question whether we really know so little a priori. A reasonable "engineering judgement" might be that

   P_{0/m} > P_{1/m} > ... > P_{m/m} .                             (16)

Thus we could choose a prior distribution such that

   E(P_{0/m}) > E(P_{1/m}) > ... > E(P_{m/m}) ,                     (17)

or, according to (12), a parameter a such that

   a_0 > a_1 > ... > a_m .                                         (18)
If our a priori judgement can be considered to be equivalent with the evidence of N_D system demands and n_1 single failures, we could start by choosing a_0 = n_0, a_1 = n_1 and a = N_D. Some "typical" correlations of the type

   ρ(P_{i/m}, P_{j/m}) = - [ a_i a_j / ((a - a_i)(a - a_j)) ]^{1/2}          (19)

could possibly help us to choose the other a_i, i = 2, ..., m. From the prior distribution of the application DA II (Fig. 1; N = 4 721, n_0 = 4 606, n_1 = 109, n_2 = 5, n_3 = 1 and n_4 = 0) we get the following set of correlations between P_{0/4} and the other probabilities:

   ρ(P_{1/4}, P_{0/4}) = - 0.97
   ρ(P_{2/4}, P_{0/4}) = - 0.21                                    (20)
   ρ(P_{3/4}, P_{0/4}) = - 0.11
   ρ(P_{4/4}, P_{0/4}) = - 0.06
If these correlation coefficients, as a first guess, could be considered to be typical for CCFs of motor-operated valves, they could guide the analyst in choosing the parameter a of the prior Dirichlet distribution.
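For completeness, the correlations (20) can be reproduced from (19) with a short computation. The parameter vector below assumes that the DA II prior is the Dirichlet obtained by adding the noninformative halves of (15) to the quoted figures; this reading is our assumption, but it reproduces the values in (20).

from math import sqrt

def dirichlet_correlation(a, i, j):
    # Correlation between P_{i/m} and P_{j/m} for a Dirichlet with parameter a, eq. (19).
    a_tot = sum(a)
    return -sqrt(a[i] * a[j] / ((a_tot - a[i]) * (a_tot - a[j])))

# Assumed DA II prior parameter: quoted figures plus the 1/2's of the prior (15).
a = [4606.5, 109.5, 5.5, 1.5, 0.5]
for i in range(1, 5):
    print(i, round(dirichlet_correlation(a, i, 0), 2))  # -0.97, -0.21, -0.11, -0.06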

8. SUMMARY

The most representative results of the Benchmark Exercise are shown in Fig. 1 (Ref. 2). For several reasons, as for instance different screenings of the basic set of CCF-data, different data extension schemes, in some cases a different treatment of functional failures and leakages, and in some other cases an incomplete determination of uncertainty intervals, the results of Fig. 1 are not directly comparable to each other.
FIGURE 1. Estimated probabilities of observing exactly i (i = 1, 2, 3, 4) failures per demand and the corresponding 90% confidence intervals, for the MGL, DA (direct assessment), ADDEP, BFR and MFR applications of the Benchmark Exercise.

Common to all applications described in the previous sections is
the assumption of homogeneous populations of standby systems, which
means that there is no planttoplant variability. The basic set of
data is too scarce to allow the treatment of such a variability.
Different CCF-models require different levels of dissection of the operating and failure data. In this Benchmark Exercise the BFR-model has been applied even to weak redundancies, while all of the other models require strict redundancy. Of course, even the weak redundancy must be such that the BFR-parameters can be considered to be equal for all redundant components.
In the BFR- and MFR-models a distinction is made between independent failures and single failures caused by shocks (potential CCF), while for the other models treated in this paper it is sufficient to distinguish single and multiple related failures. If the former distinction is possible, the BFR- and MFR-models ought to give a more realistic prediction of the CCF-contribution. In the case of the BFR-model, this statement is valid provided that the binomial distribution of that model does not deteriorate its precision.
The nonparametric model applied in this Exercise for the estimation of the probabilities P_{i/m} can be considered as a reference model, with which the parametric models may be compared. Having no appropriate prior information on the probabilities P_{i/m}, a noninformative Dirichlet distribution may be used in a Bayesian approach. The Dirichlet distribution is conjugate and therefore easy to apply in all models which have a multinomial likelihood, such as the nonparametric model, the MFR and the Alpha Factor model. The Dirichlet distribution can also be used as a "state of knowledge" distribution for other models, the parameters of which are easily derivable from the parameters of the multinomial models. In addition, the use of a multivariate distribution such as the Dirichlet has the advantage of displaying the negative correlation between the multinomial parameters.

REFERENCES

1. Hirschberg S., NKA-project 1985-89: Risk Analysis - Proposed Technical Content. RAS-470(85)1 (AB ASEA-ATOM Report KPA 85-124), May 1985.

2. Hirschberg S., ed., NKA-project Risk Analysis (RAS-470): Summary Report on Common Cause Failure Data Benchmark Exercise, Final Report, Report RAS-470(86)14, June 1987.

3. Hirschberg S., "Nordic Common Cause Failure Data Benchmark Exer


cise". Paper in this publication.

4. Petersen K. E., "Analysis of CCF-Data Identification - The Experience from the Nordic Benchmark". Paper in this publication.

5. Fleming K. N., Mosleh A., Deremer R. K., "A Systematic Procedure for the Incorporation of Common Cause Events into Risk and Reliability Models". Nuclear Engineering and Design 93, 1986.

6. Atwood C. L., Estimators for the Binomial Failure Rate Common Cause Model, NUREG/CR-1401, Prepared for USNRC by EG&G Idaho, Inc., April 1980.

7. Apostolakis G., Moieni P., "The Foundations of Models of Dependence


in Probabilistic Safety Assessment". Reliability Engineering 18, 1987.

8. Pulkkinen U., "Statistical Treatment of CCF-Data". VTT Symposium Series, 1986.

9. Dinsmore S., Pörn K., CCF Quantification from Plant Data, Report Studsvik/NP-86, Studsvik Energiteknik, Sweden, April 1986.

10. Reliability Data Book for Components in Swedish Nuclear Power


Plants. RKS 85-25. Nuclear Safety Board of the Swedish Utilities and
Swedish Nuclear Power Inspectorate, May 1985.

11. Ferguson T. S., "A Bayesian Analysis of Some Nonparametric Prob-


lems", The Annals of Statistics, Vol. 1, No. 2, 1973.

12. Box G. E. P., Tiao G. C., Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Massachusetts, 1972.

ANALYSIS OF COMMON CAUSE FAILURES BASED ON OPERATING EXPERIENCE
POSSIBLE APPROACHES AND RESULTS

T. MESLIN
EDF/SPT
3, rue de Messine
75384 PARIS Cedex 08
FRANCE

ABSTRACT. The analysis and quantification of common cause failures may


be performed in two complementary ways :
- by using national operating experience records,
- by specific on-site investigations.
The paper describes the merits and results of both approaches and illustrates them with examples.
In particular, common cause failures have been analysed and quantified through two studies performed at EDF:
- a nationwide compilation of more than 1 200 events or failure sheets concerning common cause failures in 900 MWe units;
- an analysis based on investigations carried out on a 1300 MWe site.
The paper concludes that common cause failures are unavoidable and that no simple solution is available, but that quantification is becoming possible and the orders of magnitude are known (for the second order).
Finally, the analysis of operating experience is an indispensable condition for success in preventing common cause failures.

1. GENERAL REMARKS

The analysis of common cause failures may be performed in two comple-


mentary ways :
- by using national operating experience records,
- by using specific on-site investigations.
These two approaches have somewhat different goals and methods
but are both equally necessary.
When an operator such as EDF, which has already gained a wide operating experience (170 reactor-years by the end of June 1987), compiles the operating experience records, a large number of problems can be identified and quantified, though the final collection of facts can never be considered to be perfectly exhaustive.
On the other hand, failures that have escaped the fine-tooth comb of national records can be detected during specific investigations.
In the paper we examine the merits and results of both approaches
and we illustrate them with concrete examples.

A synthetic table provides a summary of the characteristics of the
applied methods.

                       NATIONAL RECORDS                 INVESTIGATIONS

Objectives of          - Identification of generic      - Identification of problems:
the analysis             problems                         * specific problems
                       - Possible quantification          * undetected problems
                                                        - Quantification difficulties

Methods                - Compilation                    - Investigation on the site
                       - Use of computer files          - Limited data processing means
                                                        - Staff is needed on the spot

Analysis record        - No absolute exhaustivity       - High degree of exhaustivity
content                  guaranteed
                       - No warranted homogeneity

Analysis scope         Covers the various power         In general restricted to the
                       plant series (CP0, CP1,          site studied
                       CP2, P4, P'4...)

The two following examples will be examined in this paper :


- Nationwide compilation of more than 1 200 events or failure sheets
concerning common cause failures in 900 MWe units.
- An analysis based on investigations carried out on a 1300 MW site
(two-years-long investigation at Paluel).

2. ANALYSIS OF COMMON CAUSE FAILURES BASED ON OPERATING EXPERIENCE
RECORDS

2.1. Objectives

2.1.1. General Remarks

This series of studies undertaken on common cause failures comes within


the framework of the probabilistic safety studies of the Paluel power
plant.
As the French operating experience was long considered inadequate, work in that field was often based on the American studies.
This approach can from now on be supplemented by French studies
since :
the operating experience of 900 MWe PWR reactors is growing
(> 124 reactor years when this study was performed),
the Service de la Production Thermique has acquired the necessary
means to undertake the analysis of operating experience (Event sheets
and the S.R.D.F.),
finally, the SPT has access to the N.P.R.D.S. American reliability
data file which has the same objectives as the SRDF.
Thus, two French data sources and an American one are available
for these analyses.

2.1.2. The Analysis Goals

2.1.2.1. Assessment of Data Sources

Two French data sources are available for this assessment but none is
adapted to the quantitative assessment of common cause failures (nor is
the N.P.R.D.S., as a matter of fact).
The analyses are thus first aimed at determining whether a quali
tative and quantitative assessment of common cause failures is feasible
considering the different data sources now available.
Top priority is therefore given to two aspects:
- the amount of data, i.e. the extent of the operating experience required for the purpose of quantification;
- the nature of the information content of the event reports (event sheets or S.R.D.F. and N.P.R.D.S. sheets) required to identify and characterize a common cause failure.

2.1.2.2. Characterization of Common Cause Failures

American studies generally contain lists of events regarded as common


cause failures: for instance, loss of power supply or human error during calibration operations.
Moreover, these studies take into account another category of failures: the non-lethal failures resulting in the failure of k components with a probability as determined by the binomial failure rate method (2).

This attractive and ambitious characterization is tested against avai
lable data. One major problem here is to have sufficiently large sam
ples to make the characterization meaningful.

2.1.2.3. Quantification

These studies are ultimately aimed at quantifying common cause failures. This implies requirements for the data base (content, exhaustivity...) which are not easily met. However, with proper care, the analysis can yield a minimum amount of results:
- determination of a realistic order of magnitude for the independent failure rates using the different sources;
- checking the consistency of the results against other sources (previous American studies, values proposed by experts);
- assessment of common cause failure rates from the three available sources using the same methods; realistic synthesis of the results.

2.2. Method

2.2.1. Scope of the Studies

2.2.1.1 Equipment Considered

The series of studies performed at the SPT up to 1986 concerns six ty


pes of equipment.

- Instrumentation
  . measurement transmitters
  . on-off ("all or nothing") sensors
  . analog measuring channels and, in particular, protection channels
  . process control channels

- AFS pumps
  . the pump itself
  . the electric motor
  . the turbine
  . the lubrication and cooling devices
  . the directly connected controls

- LPSI pumps
  . the pump itself
  . the electric motor

- Containment spray pumps
  . the pump itself
  . the electric motor

- Primary and secondary system valves and especially:
  . electrically-operated valves
  . air-operated valves
  . check valves
  . manually-operated valves

- Circuit-breakers and contactors in primary and secondary systems, especially:
  . RPS (trip breakers)
  . engineered safety systems
  . 6.6 kV busbars

2.2.1.2. Events Considered

Generally, all the events recorded in French 900 MWe PWR units since
their commissioning are taken into account.
The operating experience of 1300 MWe power plants was too limited
when the study was undertaken and was, therefore, not used in these
analyses.
When American data are used, they concern PWR plants in general
and, most often, only those built by Westinghouse.

2.2.2. Data Bases

2.2.2.1. Nature

- Three data bases are used :


- The data base consisting of the SPT Event File. It contains some
15 000 events.
It is supplemented by related documents which are mentioned in
each event sheet as, for instance, incident reports.
This data base may be considered fairly exhaustive as regards the
reported events, but certain items, such as the repair time of the da-
maged equipment, are not yet well documented.

- The S.R.D.F. file (Système de Recueil de Données de Fiabilité)


The SRDF contains over 10 000 failures. It was started in 1978 on
the Fessenheim and Bugey sites and, then, extended to all EDF PWR sites
and even quite soon to Creys-Malville. Some 400 components are monito-
red in each reactor resulting in a data flow of approximately 150 sheets
per unit each year.

- The N.P.R.D.S.
Wherever possible, failure rates have been computed for some equip-
ment from the N.P.R.D.S. in order to compare the results obtained.
For one type of equipment, the AFS pumps, all the available
N.P.R.D.S. sheets have been analyzed and transcribed into the S.R.D.F.
format using the relevant codes.

2.2.2.2. Time Period and Power Plants Studied

. Event File

All the French 900 MWe PWR power plants are taken into account.

The time span studied goes from the commissioning of each unit to
July 1 st, 1985 in general and to December 31 st, 1985 for most recent
studies.
The study thus covers :
- 9 power plant sites (Fessenheim, Bugey, Tricastin, Gravelines, Dampierre, Blayais, Saint-Laurent B, Chinon, Cruas)
- 32 units
- 1 087 790 hours in total (124 reactor-years), that is
  830 576 hours of operation with the unit connected to the grid.

. S.R.D.F.

The sample consists of the Fessenheim, Bugey and CP1 - CP2 power plants.
The time span considered goes from each plant's commissioning to December 31st, 1985 for the most recent investigations.
As it takes a long time to incorporate the failures into the
S.R.D.F. after they have occurred, a relatively low number of 1985
events have been taken into account.

. N.P.R.D.S.

The N.P.R.D.S. sample is very varied.


- 29 different power plants located on 22 sites and belonging to 20 operators.
- 48 different pump models in the AFS system. A specific study has been devoted to this system.
- The total number of hours (sum of the studied time periods for each type of equipment) is over 7 000 000 hours.

2.2.3. Quantitative Analysis

Computation Method

An American statistical model developed by ATWOOD (refs. /2/ and /3/) is used to quantify common cause failure rates.
This model is presently one of the best tools available to determine second order or higher order common cause failure rates.
The computation is based on the following principles: a given type of component belonging to a given elementary system is considered.
The model draws a distinction between three types of events:
- Independent failures (hourly failure rate λ)
  . The event concerns only one failed component, and the occurrence of a CCF cannot be inferred from the nature of that failure.
  . The event concerns several components, and, apparently, no functional or other relation can be identified which would link these failures together.
- Potential or non-lethal common cause failures (hourly failure rate μ)
  . Only one component is involved in the event though, by the very nature of this event, another identical component could be involved.
  . Several components are involved in the event, but the failure is either not catastrophic or not simultaneous.
- Lethal common cause failures (hourly failure rate ω)
  . Only one component is involved in the event though, if there had been several identical components, all would have failed simultaneously. By definition, such events can only occur in non-redundant systems.
  . Several components are involved in the event. They have all failed simultaneously due to the same cause.

Once the parameters λ, μ, p and ω have been determined, the failure rate of any group of components may be computed:

- for a given group of k components out of m (k ≤ m):

      λ_k = μ p^k + ω

- and the coefficients "β":

      β_k = (μ p^k + ω) / (λ + μ p + ω)
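A minimal sketch of these two quantities, using the formulas as reconstructed above and purely illustrative parameter values (the function names and numbers are ours, not EDF estimates):

def bfr_group_rate(mu, p, omega, k):
    # Hourly rate at which a given group of k redundant components (k >= 2) fails
    # together: non-lethal shocks (rate mu, each component failing with
    # probability p) plus lethal shocks (rate omega).
    return mu * p**k + omega

def bfr_beta(lam, mu, p, omega, k):
    # Coefficient "beta_k": k-th order common cause failure rate relative to the
    # total failure rate of one component (lam + mu*p + omega).
    return bfr_group_rate(mu, p, omega, k) / (lam + mu * p + omega)

# Purely illustrative parameter values (not EDF estimates):
lam, mu, p, omega = 1.0e-5, 1.0e-5, 0.2, 1.0e-7
print(bfr_beta(lam, mu, p, omega, 2), bfr_beta(lam, mu, p, omega, 3))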

Results for the analyzed events

The table below gives the number of events considered in the various equipment studies on common cause failures, according to the data base they belong to. "NOT USED" indicates that the values of the reliability parameters given in that data base were used for comparison's sake.
                    EVENT FILE    S.R.D.F.    N.P.R.D.S.                 TOTAL

INSTRUMENTATION         192       NOT USED    NOT TAKEN INTO ACCOUNT       192
VALVES                  503                                                503
AFS PUMPS               173         132              113                   418
SI PUMPS             NOT USED        36                                     36
CS PUMPS             NOT USED        24                                     24
CIRCUIT-BREAKERS     NOT USED        86                                     86

TOTAL                   868         268              113                 1 249

As can be seen, more than 1 200 events have been analyzed one after the
other in the framework of this project.

2.3. Qualitative Analysis

A total of 100 failures or events among those examined have been regar
ded as common cause failures. These events may be broken down as shown
below.

Number of events

                     EF      SRDF     TOTAL

INSTRUMENTATION      16        -        16
VALVES               47        -        47
AFS PUMPS             6        5        11
CS PUMPS              -        2         2
SI PUMPS              -        9         9
CIRCUIT-BREAKERS      -       15        15

TOTAL                69       31       100
The observed failures can be distributed among the large common cause
failure categories as follows :

                                      Instrumen-   AFS   SIS-CSS   Valves   Circuit-   Total
                                      tation                                breakers

Environmental effects
 - normal inner environment                4        1        0        6        0         11
 - internally-generated accident
   conditions                              1        0        0        0        0          1
 - externally-generated accident
   conditions                              0        0        0        0        0          0
 - external natural environment            6        2        0        0        0          8

Design errors
 - unfit component                         0        3        0        1        0          4
 - system layout                           1        0        7        0        1         10
 - inadequate periodic testing             0        0        0        0        0          0
 - loss of auxiliary supporting
   system                                  0        0        0        3        0          3

Manufacturing errors
 - non-compliance with specifications      0        0        0        0        0          0
 - manufacturing defects                   0        1        0        9        3         13
 - insufficient controls                   0        0        0        0        0          0

Assembly errors                            0        0        0        0        3          3

Maintenance errors                         3        2        4        7        8         24

Operation errors                           1        2        0        1        0          4

Unknown                                    0        0        0       20        0         20

TOTAL                                     16       11       11       47       15        100

As can be seen in the total sample, the various common cause failures are rather evenly distributed among the main failure categories:
- 20 % environmental effects
- 20 % design errors
- 15 % manufacturing defects
- 30 % human errors.

Among the large common cause categories, there are :

Environmental effects

- The cold wave recorded during the 1985 winter affected the instrumen-
tation (frozen sensors) and various components of the AFS (pipings
and lube oil pumps).
- The operating conditions of the moisture-separator-reheaters systematically affected the level measurements in these tanks.
- The presence of steam and humidity in the main steam system valve
compartments impaired some sensors.

Design errors

- The Fessenheim plant LPSI pumps have long suffered from the moisture
build-up caused by the SG blowdown system.
- The defective design of the AFS pump discharge lines in some Westinghouse power plants in the USA has, on several occasions, resulted in the loss of the AFS pumps (so-called "backleakage" phenomena).

Operating conditions

- Vibration of all the CS pumps in the CP1 and CP2 standard units.
- Isolation valve jamming in the RCS - RHRS systems due to an inner
thermal effect.

Human errors

- Defective sensor adjustments.


- Defective assembly of all the AFS control valves at the Cruas power plant (events not taken into account in the study as this problem occurred before commercial operation).
- Adjustment problems on some circuit-breakers.

Most of the failures thus identified have induced generic or specific


changes and, in particular :
- The problem of the LP safety injection pump at the Fessenheim power
plant has been solved.
- Systematic changes to protect the units against the effects of low
temperatures.
- Systematic inspection of exposed valves and replacement of defective components in all 900 MWe reactors.

2.4. Quantitative Results

Many parameters can be computed by applying the binomial failure rate


model.
An example of all the results that can be obtained for the different pumps is given below, as well as the results of other similar studies.

(Table: β2 and β3 values obtained for the AFS 001-002, AFS 003, CSS 001-002 and SIS 001-002 pumps from the Event Reports, the S.R.D.F. and the N.P.R.D.S., compared with the EDF/CEA, EPRI and NUREG values; the second order estimates are mostly of the order of a few 10-2.)

Finally, in the light of the studies performed, it seems possible


to derive realistic second order common cause failure rates from the
French operating experience.
Third order failure rates are more difficult to determine since,
generally, the samples are still rather small and few events have in-
volved more than two components. However, relatively sensible third
and fourth order values were obtained for AFS pumps because the sample
is larger than for other components.

EQUIPMENT                                          Used values
                                                β2        β3        β4

Instrumentation                               5.10-2    2.10-2       -

Valves   Electrically-operated valves         6.10-2    2.10-2       -
         Air-operated valves                  7.10-2    2.10-2       -
         Check valves (leaking)               10-1      5.10-2       -
         Check valves (stuck)                 3.10-2       -         -

Pumps    AFS pumps                            5.10-2    2.10-2    5.10-3
         CSS pumps                            5.10-2    2.10-2       -
         SIS pumps                            5.10-2    2.10-2       -
3. ANALYSIS OF COMMON CAUSE FAILURES IDENTIFIED FROM ON-SITE
INVESTIGATIONS

3.1. Objectives

The objectives set for this study were twofold. They are related, on
the one hand, to the PSA of 1300 MWe power plants and, on the other, to
the analysis of plant safety during operation.
- As regards the PSA of 1300 MWe power plants, the purpose is to make
sure that actual operation is taken into account by spotting problems
that may have been overlooked during the design.
- Regarding the safety, the objective is to put to use the teachings of
these investigations to improve the safety level of plant operation
activities by correcting possible design or organization deficiencies
and by making sure these problems have actually been taken into ac-
count in the operating experience.

3.2. Methodology

In the framework of the PSA of 1300 MWe units, an engineer is sent to a


nuclear power plant site to investigate all the events or failures
likely to impair the plant safety : this corresponds to a much wider
definition than that of events significant to safety as found in the
technical specifications.
Since May 1 st 1986, all these events have been collected and ana-
lyzed.
In the framework of the investigation into common cause failures,
the following items are looked into, in particular :
- the root causes,
- the observed consequences in terms of availability and safety,
- the potential hazards,
- the corrective action already implemented as well as the operating
experience.
The studied operating experience amounts to some 9 reactor-years
for 4 identical 1300 MWe units.
Over 300 events have been analyzed among which :
- 17 were classed as common cause failures.
- Approximately 20 sheets bear some relation to these 17 events but do
not actually belong to the common cause failure category.

3.3. Characterization of the Recorded Failures

As in nationwide investigations on 900 MWe power plants, the recorded


events can be distributed among the main CCF categories.
Some events have several different causes.
The recorded common cause failures belong to the following main categories:
- design errors (nearly one third of the sample),
- manufacturing defects (some 15 % ) ,
- human errors during plant operation or maintenance activities (almost
40 % ) .

                                                        Nb.      %

Environmental effects
 - normal inner environment                              0       0
 - internally-generated accident conditions              0       0
 - externally-generated accident conditions              0       0
 - external natural environment                          1       4.5

Design errors
 - unfit component                                       2       9
 - system layout                                         4      18
 - inadequate periodic testing                           0       0
 - loss of auxiliary supporting system                   0       0

Manufacturing errors
 - non-compliance with specifications                    0       0
 - manufacturing defects                                 3      14
 - insufficient controls                                 0       0

Assembly errors                                          0       0

Maintenance errors                                       6      28

Operation errors                                         3      14

Unknown                                                  0       0

TOTAL                                                   19     100

Corrective actions vary according to the nature of the observed common


cause failures. Thus, whenever a design error is generic and common to
all the units of a same series, a study is undertaken to examine the
necessary alterations.

Such a study is undertaken at a centralized level by the Direction
de l'Equipement and by the Service de la Production Thermique. The re-
sulting alterations are introduced in the operating units and taken
into account in the design of the future units.
Concerning manufacturing defects, the replacement of defective
components can be rapidly decided upon and extended to all the concer-
ned units.

3.4. Case Study

3.4.1. Analysis of a Design Error

In 1300 MWe nuclear power plants of the P4 series, both lube oil pumps of the circulating pumps were initially powered by the same busbars: this design could result in a common cause failure of the type described below.
On December 4th, 1986, in the Paluel power plant unit 2, then operating at 100 % of its power, and during a D.G. periodical test, a "loss of fuel oil" alarm brought about the Diesel generator trip. During this test, the Diesel generator is the only power source for the train A busbars, at a load of about 30 %. The loss of the Diesel generator resulted in a loss of the power supply to the busbars, in turn leading to the loss of all the lube oil pumps (trains A and B) of the circulating pumps; the pumps then tripped, thereby inducing the loss of the condenser, the turbine trip and the reactor scram.
This incident resulted in a significant production loss. Moreover, in a probabilistic approach, it can be analyzed as an accident initiator using the logic diagram below.

(Logic diagram: the initiating event "loss of both circulating pumps" leads to the loss of the condenser, a turbine-trip-induced reactor scram and the opening of at least one SG relief valve; if the relief valve re-closes, the plant is stabilised by steam dump to the atmosphere, otherwise the potential consequence is equivalent to a main steam line break.)
- A relatively simple modification consists in powering each lube oil pump from a different busbar (one in train A, the other in train B). This modification was studied and then implemented in the Paluel, Flamanville and St-Alban operating units. It is incorporated in the design of the 1300 MWe units of the next series (Belleville, Nogent...).

3.4.2. Analysis of a Manufacturing Defect

A manufacturing defect in a high temperature water hose affected all


the standby Diesel generators in the Paluel, Flamanville and St-Alban
1300 MWe units.
This failure was detected in particular in the course of the inci-
dent described below :
- On 2nd December 1986, in unit 2 of the Paluel power plant which was
on hot standby, a leak in the hot water deaerator hose upstream of
the gas manifold at the inlet of the turbocompressor resulted in the
ignition of the thermal insulation sodden with antifreeze.
Corrective actions were immediately taken :
. quick fire fighting measures,
. adoption of a different hose model on all the Diesel generators on
the site,
. replacement of thermal insulation.
The changes concerned all the Diesel generators on all the
sites.

4. CONCLUSIONS AND GENERAL REMARKS

- Our knowledge of common cause failures is constantly increasing.

In short, there are basically two complementary approaches when using


the operating experience to solve this problem :
. the statistical analysis of large files (significant events, fai-
lure data system...)
. the detailed analysis based on on-site investigations.

- The following remarks sum up the teachings derived from the work per-
formed by EDF using these two approaches.

. Common cause failures are unavoidable.


Studies show that the nature of these failures is multifarious :
design problems, external environment, extreme weather conditions,
manufacturing defects, human errors... Design errors can be obser-
ved even on facilities benefiting from the experience gained with
previous series of PWRs.

. There is no simple solution to this problem.


For each type of common cause failure, an appropriate defense sys-
tem can be implemented. The solutions are manifold : diversifying
the lines, the components, the suppliers ; high quality operation
and maintenance. Apparently, the multiplication of redundant and

identical lines is not efficient as it results in a rather small
gain only in terms of reliability. On the other hand, compliance
with the basic quality assurance rules may be an appropriate
measure against human-induced common cause failures.

The analysis of operating experience is vital.


Among the efforts to prevent common cause failures, the creation of
a high-performance operating experience recording system is an in-
dispensable condition of success. Indeed, the identification and
analysis of all the events originating from an actual or potential
common cause are vital.
In this perspective, such probabilistic analysis techniques as
fault trees and event trees may be extremely helpful.
Such an analysis may be expedient in devising and implementing
appropriate remedies.

Quantification is possible.
As regards common cause failure quantification, the French opera-
ting experience is now sufficient for a general study of common
cause failures. Moreover, this approach is made possible by the
quality of the data bases used (Event Files and S.R.D.F.) : exhaus-
tivity, homogeneity, easy handling...

The orders of magnitude are known.


The order of magnitude of the results obtained for independent failure rates and common cause failure rates is consistent and similar to that of the results usually proposed and to that derived from the various SPT files. Furthermore, second order common cause failures seem to be well known now, but higher order failure rates remain in many ways uncertain.

REFERENCES

/1/ J. P. BERGER, E. BOURGADE, BLIN, ELLIA HERVY, MILHEM, Groupe de travail "Modes Communs", CEA, FRAMATOME, EDF
Valeurs retenues pour la quantification des défaillances de mode commun de différents matériels.

/2/ C. L. ATWOOD,
"Estimators for the Binomial Failure Rate Common Cause Model"
NUREG/CR-1401, EGG-EA-5112, April 1980

/3/ W.E. VESELY,
"Estimating Common Cause Failure Probabilities in Reliability and Risk Analysis: Marshall-Olkin Specializations", in Nuclear Systems Reliability Engineering and Risk Assessment,
J.B. FUSSELL and G.R. BURDICK (eds.), Philadelphia: Society for Industrial and Applied Mathematics, 1977, pp. 314-341

/4/ J.A. STEVENSON and C. ATWOOD:
"Common cause fault rates for valves"
NUREG/CR-2770, EGG-EA-5485, March 1983

/5/ T. MESLIN
"Analyse et quantification des défauts de cause commune - Synthèse des études"
EPS DCC 008 - EDF/SPT D544 - SN 87/095

/6/ EPRI - D. WORLEDGE
"Classification and analysis of reactor operation experience involving dependent events"
EPRI NP 3967, Research Project 2159-4, June 1985

/7/ EPRI - D. WORLEDGE
"A Study of Common Cause Failures - Phase 2: A comprehensive Classification System for component fault analysis"
EPRI NP 3837, Research Project 2169-1, June 1985

/8/ J.H. MOODY and S.M. FOLLEN
Yankee Atomic Co.
"Common cause modeling of reactor trip breaker configuration"
International Topical Meeting on Probabilistic Safety Methods and Applications
San Francisco, February 24 - March 1, 1985

/9/ T. MESLIN
"Common cause failure analysis and quantification on the basis of the operating experience"
Probabilistic Safety Assessment and Risk Management, PSA '87, Zurich, August 30 - September 4, 1987

APPENDIX 1

CLASSIFICATION OF THE MAIN COMMON CAUSE FAILURES

1. ENVIRONMENTAL EFFECTS

- Normal inner environment (*)


(Dust, salt, humidity, temperature, vibration, corrosive atmosphere,
ionizing radiation...).

- Internally-generated accident conditions

. environmental conditions resulting from an accident ;


. pipe whip ;
. missile ;
. local flooding ;
. fire ;
. explosion.

- Externally-generated accident conditions

. air crash ;
. dam failure - induced flood ;
. explosion ;
. fire.

- External n a t u r a l environment

extreme weather conditions (frost, wind...) ;


earthquakes ;
floods (water rising).

2. DESIGN ERRORS

system component unfit for its mission ;


system layout incorporating potential CCFs
inadequate or detrimental periodic testing
system (or component) difficult to operate
system (or component) difficult to service
oversight and negligence in design studies
inadequate optimization of design as regards common cause failures.

(*) Nuclear power plant, chemical plant.

3. MANUFACTURING ERRORS

. non-compliance with manufacturing technical specifications


. manufacturing defects ;
. inadequate controls.

4. ASSEMBLY ERRORS

5. HUMAN ERRORS

. during plant operation ;


during periodic testing ;
. during maintenance operations.

MULTIPLE RELATED FAILURES FROM THE NORDIC OPERATING EXPERIENCE

K. U. Pulkkinen
Technical Research Centre of Finland
Electrical Engineering Laboratory
Otakaari 7
SF02150 Espoo
Finland

ABSTRACT The operational experiences from Nordic nuclear power plants


are analysed in order to identify dependent and multiple related
failures. The components considered in more detail are motor operated
valves and diesel generators. The multiple failures are quite few but
single failures having causes typical to common cause failures have
occurred rather frequently. The analysis of operational experience with
respect to other components and common cause initiators have not been
performed exhaustively.

1. INTRODUCTION

The studies of the Nordic operating experience directed mainly to the


common cause failures have been rather few. The common cause failure
experiences have been studied as a part of ordinary failure data
analysis or operating experience analyses. The only studies the
objective of which has been the evaluation of common cause failure data
are the studies performed in the Nordic CCF-data Benchmark Exercise
/1-10/. Other studies where common cause failures have been considered
are the failure data analyses of diesel generators at the Nordic
nuclear power plants /11-12/. The studies concerning common cause
initiator experience have been few. However, common cause initiators
have been considered in Swedish PSAs and the analyses of disturbances
at Swedish nuclear power plants /13-15/.
initiators will also be analysed in the Finnish PSAs, but the results
of these analyses are not yet available.
The Nordic operating experience concerning nuclear power plant
components is rather comprehensive. The single component failure data
has been analysed to be used as data in PSAs . The observed number of
multiple failures has not been very high, which has been the main
reason for the lack of extensive common cause failure data analyses.
In the following the common cause or dependent failures of diesel
generators and motor operated valves are considered. Other components
are not considered due to the lack of analysed data bases.

TABLE I. Identified multiple failure events. For each of the 17 events the table lists the plant (Barsebäck 1 and 2, Forsmark 1 and 2, Oskarshamn 1 and 2, Ringhals 1), the components and failure dates involved, the component status, a short event description (internal or external leakages, sticking valves, incorrect torque switch settings, wrong connections), and the judgement of each team (ASEA-ATOM, Risø, Studsvik, VTT): actual CCF, potential CCF, CCF candidate, notable case, excluded case or not identified. Status abbreviations: r = redundant components, nr = non-redundant components, R = overhaul period (as opposed to normal operation), FH = critical failure, non-FH = noncritical failure.
2. THE MULTIPLE RELATED FAILURES OF MOTOR OPERATED VALVES

2.1 The data base

The CCFs or MRFs identified in the Nordic CCF-data Benchmark Exercise
are considered in this paper. The background material was supplied by
ASEA-ATOM. The database comprised 340 failure reports from the
Scandinavian Nuclear Power Reliability Data System (ATV-system,
/16/). The failure reports originated in the following Swedish nuclear
power plants:
power plants:

Barsebäck 1 from period 771001 - 821231
Barsebäck 2 " 700101 - 821231
Forsmark 1 " 810101 - 821231
Forsmark 2 " 810701 - 821231
Oskarshamn 1 " 740101 - 821231
Oskarshamn 2 " 760101 - 821231
Ringhals 1 " 761001 - 821231.

The boundary conditions for the CCF-data Benchmark exercise limited the analysis to multiple failures with one or more possible common causes. The boundary conditions also stated that possible multiple failures (i.e. failures which could have been multiple if preventive measures had not been applied) were to be considered. The multiple failures were not restricted to multiple failures within systems; the inter-system multiple failures should also have been analysed.
The teams, which participated in the CCF-data Benchmark exercise,
applied different methods to identify the dependent failures. In this
paper only the failures which were agreed to be multiple CCFs are
considered. The list of these multiple failures is in table I.

2.2 The multiple failures of motor operated valves

The multiple failures listed in table I were not identified or judged


as common cause failures by all teams, which illustrates the
difficulties connected to the CCF-data analyses. The list of table I
includes all the multiple failure events which bear with them most of
the attributes connected with CCFs: simultaneity or closeness in time,
common causes, common failure mode etc.
The total number of multiple failures considered in the Nordic
CCF-data Benchmark Exercise was 17. All except one of these events
were double failures or double component unavailabilities. Only one
event caused unavailability of three valves.
The number of the multiple events is too high to be explained by independent failures. This was statistically tested by the VTT team in connection with the CCF-Benchmark exercise /8/. However, the validity of the above observation depends on the possibly different time dependency of the different failure modes. For example, the internal leakages are observed only at leakage tests, the interval of which may be much longer than that of ordinary surveillance testing. This leads to a higher number of independent multiple failures, if the standby failure rate of internal leakages is high.
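The flavour of such a test can be sketched as follows: under independence, the number of double failures in a given number of demands of a redundant pair is roughly Poisson distributed, and the probability of observing at least the recorded number of doubles can be compared with it. The figures used below are hypothetical, and the sketch is only a simplified stand-in for the VTT analysis in /8/.

from math import exp

def poisson_tail(expected, observed):
    # P(N >= observed) for N ~ Poisson(expected); a simple significance check.
    term, cum = exp(-expected), exp(-expected)
    for n in range(1, observed):
        term *= expected / n
        cum += term
    return 1.0 - cum

# Hypothetical figures: q = per-demand failure probability of one valve,
# n_demands = number of demands of the redundant pair, 7 observed double failures.
q, n_demands, observed_doubles = 5.0e-3, 4000, 7
expected_doubles = n_demands * q**2        # expectation under independence
print(expected_doubles, poisson_tail(expected_doubles, observed_doubles))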
The number of double internal leakages was 7. In addition to this
number one combination of internal and external leakage was found. As
stated earlier, the high number of multiple internal leakages may be
caused by high standby failure rate of these failures and by long test
intervals (the leakage tests are performed once a year). The cause of
internal leakages could not be exactly determined from the information
in the failure reports, and further, the size of the leakage was not
known. Thus we may conclude that some of the internal leakages may be
noncritical and that they may not be dependent failures. However, the
rather high number of these failures can be judged as evidence of
dependence.
Another type of multiple events was the difficulties in opening
or closing of the valves. This failure category included 7 events.
Four of these events were connected to erroneous torque switch
settings which caused too early torque switching and prevented the
opening or closing of the valve. These failures were often caused by
erroneous maintenance actions and are thus real common cause failures.
Their numerous occurrence gives rather strong evidence for this
conclusion.
One of the opening/closing failures was a triple failure of
valves, which prevented the closing of three valves in a real demand
situation. The cause of this event was a human error in the
maintenance of the control system of the valves: erroneous connection
of a jumper prevented the flow of the valve control signals. This
failure is not a common cause failure, because it was caused by a
failure in the support system of the valves. However, all of the minor
support systems are not explicitly modelled in reliability studies.
This gives a reason to include the above failure in the common cause
failure database as a real common cause failure. On the other hand,
the different design prevents failures of this kind and their
inclusion into the CCF-database may lead to overconservatism.
One of the multiple opening/closing failures was a quadruple
noncritical failure causing sticking of valves. The operation of the
valves was not prevented. Owing to its high multiplicity this kind of
failure is always very important. With as low failure rates as in
Swedish nuclear power plants, the purely random occurrence of
quadruple noncritical failure is extremely unlikely, and thus it is
probable that the cause of this event was connected to some dependent
phenomenon. The cause of this event could not be analysed on the basis
of the failure report. It is possible that the event was only a
preventive maintenance action.
One of the events in the above category was a double failure to
open, the cause of which was not stated clearly in the failure report.
It is, however, probable that the cause was connected to the setting
of the torque switches.
The remaining two of the multiple events were connected to
simultaneous maintenance actions during the overhaul period and they
cannot be classified as common cause failures, which are possible
during normal operation.

TABLE II. Plants and their diesel generators

Plant          First        First            Commercial   Data        Number   Manufacturer   Manufacturer
               criticality  synchronization  operation    collected   of DGs   of diesel      of generators
                                                          from                 engines

Oskarshamn 1   12.1970      08.1971          06.1972      01.01.1974     2     MTU            ASEA
Oskarshamn 2   03.1974      10.1974          12.1974      01.01.1975     2     MTU            ASEA
Barsebäck 1    01.1975      05.1975          07.1975      01.07.1975     2     MTU            ASEA
Barsebäck 2    02.1977      03.1977          07.1977      01.07.1977     2     MTU            ASEA
Forsmark 1     04.1980      06.1980          12.1980      10.12.1980     4     SACM           ASEA
Forsmark 2     11.1980      01.1981          06.1981      10.06.1981     4     SACM           ASEA
Ringhals 1     08.1973      10.1974          01.1976      01.01.1976     4     SACM           ASEA
Ringhals 2     06.1974      08.1974          05.1975      01.05.1975     4     SACM           ASEA
TVO 1          07.1978      09.1978          10.1979      02.09.1978     4     SACM           ASEA
TVO 2          10.1979      02.1980          10.1980      18.02.1980     4     SACM           ASEA
Loviisa 1      01.1977      02.1977          05.1977      08.02.1977     4     SACM           STRÖMBERG
Loviisa 2      10.1980      11.1980          02.1981      04.11.1980     4     SACM           STRÖMBERG

Diesel manufacturers:                                      Generator manufacturers:
SACM: Société Alsacienne de Constructions Mécaniques       ASEA (Sweden)
      de Mulhouse (France)                                 STRÖMBERG: Oy Strömberg Ab (Finland)
MTU:  Motoren und Turbinen Union (FRG)
The multiple events concerning non-redundant valves or valves in different systems were analysed in the CCF-Benchmark exercise by the ASEA-ATOM team /1/. Many multiple events were found. Their characteristics are rather similar to those of the above events. The rather high number of such events is evidence for dependencies between the valves in different systems. However, they are very difficult to include in the statistical analyses of CCF-probabilities.

3. THE MULTIPLE RELATED FAILURES OF DIESEL GENERATORS AT THE NORDIC


NUCLEAR POWER PLANTS

3.1 The data base

The operating experience of the diesel generators at the Nordic


nuclear power plants was analysed in co-operation between ASEA-ATOM
and VTT at the beginning of the eighties. The study was financed by
the Swedish Nuclear Power Inspectorate and the Finnish utilities IVO
and TVO /11/.
The goal of the study was the analysis of the diesel generator
operating experience with special emphasis on

- the impact of the frequency of surveillance testing and of


the test procedure
- the contribution of design and manufacturing errors
- the contribution of testing and maintenance errors and the
respective error mechanisms
- the potential or actual dependent failures (common cause
failures).

The operating experiences of all standby diesel generators at the


Finnish and Swedish nuclear power plants were covered. The data base
contained 40 diesel generators with about 150 diesel generator years
and about 4500 diesel generator starts and about 6000 hours of
operation. The failure data consist of the operating experience until
the end of 1981. The plants and the diesel generators considered are
in table II.
The failure data were collected from work orders, ATV reports and
special failure reports and analysed in co-operation with the plant
personnel.
The failures were classified according to many factors including
the failure cause, the failure criticality, the failure location in
the diesel generator system etc. The failure rates were estimated and
different failure models were used to describe the dependency of the
failure probability on testing interval. Also the maintenance
information was analysed in order to obtain repair and repair waiting
time distributions for different failure modes.

3.2 Dependent failures of the diesel generators

3.2.1 Classification of the failure causes. The total number of the


studied diesel generator failures was 436, which included 65 critical

failures (i.e. failures preventing the safety function of diesel
generator) and 371 noncritical or minor failures. The failures were
observed in start tests, load test, real demands or in routine
inspections. The 65 critical failures included two multiple failures.
The failures were classified according to their cause as
dependent and random failures. The failures were classified dependent
when they were caused by some mechanism which could have been common
for all redundant units. The dependent failures were caused by errors
in testing and maintenance, design errors, manufacturing and
Installation errors or by external events. The failures were
classified random when they occurred without any unexpected cause or
when they were caused by normal ageing. The above classification is
somewhat vague due to imprecise failure descriptions and it is always
based on engineering judgements.
The cause of dependent failure is defined "design error" if the
failure is caused by improper design and if the design is changed
after the occurrence of the failure. Thus design errors can not recur
many times.
The failures caused by errors in manufacturing or installation
are also non-recurring because the cause is removed from the system
after failure detection. The failures caused by faulty materials,
faulty manufacturing and unsatisfactory installation are classified
into this category.

TABLE III. The causes of diesel generator failures

Failure cause               Number/percentage       Number/percentage
                            of critical failures    of noncritical failures

Random                          30 / 46.2%              264 / 71.2%

Errors in testing
and maintenance                 17 / 26.2%               37 / 10.0%

Design errors                   11 / 16.9%               36 /  9.7%

Errors in manufacturing
and installation                 6 /  9.2%               34 /  9.1%

External events                  1 /  1.5%                0 /  0%

Total dependent                 35 / 53.8%              107 / 28.8%

Total                           65 / 100%               371 / 100%
The failures caused by human errors in testing, maintenance and
repair are recurring dependent failures. The failures caused by
external events are also recurring. Sometimes it is very difficult to
make a difference between the above failure categories, and all the
conclusions based on this classification cannot be generalised.
The results of the failure classifications are in table III. The
most dominant failure category is random failures. However, it is
worth noticing that about one half of critical failures are caused by
dependent causes. The human errors in maintenance and testing are the
dominant dependent failure cause. The random failures occurred in all
diesel generator subsystems and they were mainly minor leakages or
failures of electrical devices.

3.2.2 The failures caused by human errors. The human errors in testing
and maintenance were rather frequent and they were thus studied in
more detail. They were further classified into four main categories:
faulty handling, faulty auxiliary activities, omissions and general
carelessness.
The category "faulty handling" refers to errors that originate in
active manipulating with the component and it includes erroneous
actions, imprcisions, careless handling of components etc. This
category is divided further into three subcategories according to the
phase during which the error has been made: testing, maintenance and
calibration.
The category "faulty auxiliary activities" refers to failures
caused by human errors made by personnel that does not normally handle
the component. These errors could possibly be prevented by adequate
communication or tests after auxiliary activities.
The failures caused by omissions form a category rather similar to faulty handling. Instead of faulty handling of the component there are omissions of some important phases of the work. These failures could be prevented by careful restoration of the component after each operation.
The category "general carelessness" refers to failures in
connection of which there are symptoms of improper care of the
component such as dirty parts, loose connections etc. These failures
might, to some extent, be interpreted as inadequate inspecting of the
status of the component during the test.
The results of the human error classification are in table IV.
The dominant failure category is "faulty handling". It is worth
noticing that one of the double common cause failures was caused by a human error, the cause being in the category "faulty handling in testing". The other human error categories caused about 30% of the human-related failures. The distribution of the human error events was rather similar for both critical and noncritical failures.

TABLE IV. Classification of human errors in testing and maintenance

Error category          Number of            Number of               Total
                        critical failures    noncritical failures

Faulty handling
 - in testing                  4                     2                   6
 - in maintenance              4                    10                  14
 - in calibration              3                    15                  18

Faulty auxiliary
activities                     1                     -                   1

Omissions                      1                     2                   3

General carelessness           4                     8                  12

Total                         17                    37                  54

3.2.3 Other dependent failures. The number of failures caused by


design errors was rather low: about one critical failure/plant and
three noncritical failures/plant. The most typical design errors were
cases in which a failure occurred due to some external event for which
there was no protection. Also inadequate protection against vibration
and other internal phenomena caused failures.
The number of manufacturing and installation errors per plant was
0.5 for critical failures and 2.8 for noncritical failures. The
typical failures in this category were leakages (of cooling water, for
instance) caused by unsatisfactory welding seams.

3.2.4 Multiple diesel generator failures. The data base analysed
included two double critical failures. Their descriptions are:

1) Frequent faulty synchronisation of the diesel generator with the
   electrical grid caused mechanical failures in generators.

2) Two diesel generators failed to start due to a relay failure which
   caused a short circuit in the common part of the protection system
   of the diesel generators.

The first of the above common cause failures is a genuine common cause
event caused by erroneous testing. After detection of this failure the
procedures were changed in order to prevent its recurrence.
The second event is a failure of the supporting system which
leads to unavailability of two diesel generators. It is not an actual
common cause failure and it can be included in the fault tree of the
diesel generator system as an independent single failure. It is,
however, doubtful whether this kind of single failure would be taken
into account in fault trees with the typical level of detail.
In addition to the above critical common cause failures, some
events involving multiple noncritical failures have been reported.
These events have not caused multiple unavailabilities of diesel
generators, but there are indications of the risk that redundant
diesel generators may be simultaneously disconnected for repair of
noncritical failures due to lack of communication between testing and
maintenance groups.

4. MULTIPLE RELATED FAILURES OF OTHER COMPONENTS AND COMMON CAUSE INITIATORS

4.1 Common cause initiators

The common cause initiators that have occurred at Nordic nuclear power plants
have not been studied extensively. In a study concerning
operational disturbances at Swedish nuclear power plants several
different events or chains of events have been analysed /13/. Some of
these events might be classified as common cause initiators. Their
analysis in this respect would require further work, which could be
very useful.
The common cause initiators have been considered to some extent
in the Swedish PSAs and in this connection their frequency is also
estimated /14/.
The abnormal events at Nordic nuclear power plants are reported
by the utilities and safety authorities in many different reporting
systems. Many of the events included in these reports may be
classified as common cause initiators. The detailed analysis of these
events is not in the scope of this paper.

4.2 Other multiple related failures

The Nordic operating experience at component level has not been analysed
extensively in order to identify multiple or common cause failures.
Some dependent events may be found in the reports concerning
abnormal events at the nuclear power plants. These reports are not
detailed enough to be used as a basis for the identification of common
cause failures.
It is clear that some dependent failures have occurred. Their
identification requires extensive studies, such as the work done in
the Nordic CCF-Benchmark exercise. In connection with the Nordic
PSAs this kind of work has been started. The work done is, however,
insufficient to be used as a basis for a CCF database.

5. CONCLUSIONS

The multiple related failures considered in this paper have been
failures of motor operated valves and diesel generators at the Nordic
nuclear power plants. The operating experience of those components has
not included many actual common cause failures, but some clear
indications of dependence between failures have been observed. The
events have not been numerous, which reflects two facts: on the one
hand, the Nordic nuclear power plants have operated rather well
compared with other plants and, on the other hand, the analyses of
operating experience focused on dependent events may not have been
extensive enough. Also, there may be some events which have not been
reported in the form required for the identification work.
The common cause failures of other components have not been extensively
analysed. This is a serious deficiency because CCFs play an important
role in PSAs and other reliability analyses and they have an essential
impact on plant safety. Accordingly, such work is recommended. The
analyses can be performed using the methods studied in the Nordic
CCF-Benchmark exercise.
The qualitative analysis of events gives an essential insight
into the dependent failure mechanisms. Such analyses have not been
done in the Nordic countries to any larger extent. Nowadays, methods for
both qualitative and quantitative analyses already exist and they can
be applied in practical work. However, the methods are not very well
known to the personnel having knowledge about the components and their
function. Much work remains to be done in order to increase the
usefulness of these methods, which are reasonable in themselves.

REFERENCES

1. Hirschberg, S. (ed.), NKA-Project "Risk Analysis" (RAS-470). Summary Report on
Common Cause Failure Data Benchmark Exercise, Final Report, ASEA-ATOM,
Västerås, Sweden, June 1987.

2. Bengtz, M., Björe, S., Hirschberg, S., NKA-Project Risk Analysis (RAS-470),
Identification of Common Cause Failure Events for Motor Operated Valves in
Swedish Boiling Water Reactor Plants. Final Report, RAS-470(86)4 (AB ASEA-ATOM
Report RPA 86-40), January 1986.

3. Bengtz, M., Hirschberg, S., NKA-Project Risk Analysis (RAS-470), Benchmark
Exercise, Phase 2: Quantification of Common Cause Failure Contributions. Final
Report, RAS-470(86)8 (AB ASEA-ATOM Report RPA 86-160), July 1986.

4. Dinsmore, S., Common Cause Events Identification from Plant Data. Final Report,
RAS-470(85)11 (Studsvik Report NR-85/120), February 1986.

5. Dinsmore, S., Pörn, K., CCF Quantification from Plant Data. Final Report,
RAS-470(86)10 (Studsvik Report NP-86/72), September 1986.

6. Pulkkinen, U., Järvinen, J., Identification of Common Cause Failure Events,
Phase 1. Final Report, RAS-470(86)5 (VTT/SH 35/85), March 1986.

7. Pulkkinen, U., Järvinen, J., Classification of Common Cause Failure Events,
Phase 2. RAS-470(86)6 (VTT/SH 15/86), March 1986.

8. Pulkkinen, U., Järvinen, J., Huovinen, T., Quantification of Common Cause
Failure Events, Phase 3. RAS-470(86)7 (VTT/SH 15/86), November 1986.

9. Kongsø, H., Martinez, G., Petersen, K.E., NKA Benchmark Exercise on CCF-data.
Identification Phase. Final Report, RAS-470(85)11, Risø, June 1986.

10. Kongsø, H., Petersen, K.E., NKA-Project Risk Analysis, RAS-470, Benchmark
Exercise, Phase II, Quantification of Common Cause Failure Contributions. Final
Report, RAS-470(86)11, Risø, November 1986.

11. Pulkkinen, U. et al., Reliability of Diesel Generators in the Finnish and
Swedish Nuclear Power Plants. Report VTT/SH 7/82.

12. Hirschberg, S., Pulkkinen, U., 'Common Cause Failure Data: Experience from
Diesel Generator Studies', Nuclear Safety, Vol. 26, No. 3, May-June 1985.

13. Laakso, K., A Systematic Feedback of Plant Disturbance Experience in Nuclear
Power Plants, Helsinki University of Technology, Department of Mechanical
Engineering, Otaniemi, 1984 (in Swedish).

14. Hirschberg, S., Retrospective Analysis of Dependencies in the Swedish
Probabilistic Safety Studies. Phase I: Qualitative Overview, RAS-470(87)4.

15. Bengts, M., Hirschberg, S., Retrospective Analysis of Human Interactions in
the Swedish Probabilistic Safety Studies. Phase I: Qualitative Overview,
RAS-470(87)5.

16. Ekberg, K. et al., The ATV-system and Its Use. ANS/ENS International Topical
Meeting on Probabilistic Safety Methods and Applications, San Francisco,
California, U.S.A., 24 February - 1 March 1985.

THE USE OF ABNORMAL EVENT DATA FROM NUCLEAR POWER
REACTORS FOR DEPENDENT FAILURE ANALYSIS

H.W. Kalfsbeek
Commission of the European Communities
Joint Research Centre
21020 Ispra (Va)
Italy

ABSTRACT. The Abnormal Occurrences Reporting System (AORS) of the Commission of
the European Communities is a databank which contains homogenized information on
safety related events that occurred in nuclear power plants in various countries in
Europe and the United States of America, covering about 780 years of reactor
operation from the years 1969 to 1985.
This paper presents the use and analysis potentiality of this information system, in
particular with respect to the support it gives to incident analysts and PSA (Probabil-
istic Safety Assessment) practitioners. As an example of this the attention is focussed
on a particular search procedure identifying various types of dependent failure cases.
Thus it is illustrated how one can use the incident data during the modelling stage of a
PSA study, or more generally, in the course of any type of incident analysis. The re-
trieved reports might yield valuable information for upgrading the completeness of the
system, subsystem and component models in terms of failure modes and effects, fault
propagation paths and unforeseen system interactions. They might also be useful for
identifying the extra information needed if one wants to estimate dependent failure
model parameters.
The search procedure presented in this paper exploits a feature that renders the AORS
unique amongst the international safety related event databases, namely the codifica-
tion and storage of causal sequences extracted from each event report.

1. INTROD UCTION

The Joint Research Centre (JRC) of the Commission of the European Communities
initiated in 1978 a project for creating an information system on operational
experience of nuclear power plants (NPPs), the European Reliability Data System (ERDS).
After a feasibility study the design of this system started in 1980 [] and since 1984
most of it is operational.
The ERDS has been set up as an integral tool for the feedback and use of past
operating experience in the assessment of safety and operation of NPPs by various in
terested organizations such as utilities, constructors and safety authorities.

The system is structured into four separate databanks, three of which contain
event data, the fourth one being dedicated to the organization of reliability parameter
data. The three event databanks are:
- Operating Unit Status Reports (OUSR), collecting data on productivity and outages
of the NPPs;
- Abnormal Occurrences Reporting System (AORS), collecting information on safety
related events in NPPs;
- Component Event Data Bank (CEDB), collecting information on the failure and
operation of safety significant components in some NPPs.
The fourth bank, of different nature and not yet fully implemented is the:
- Reliability Parameter Data Bank (RPDB), collecting reliability parameter data for
homogeneous classes of components, deriving from operational experience, labora-
tory tests, literature etc.
These databanks have logical couplings; in addition, two other databanks containing
reactor unit design data are connected to this system, namely the Power Reactor file,
one of the databanks in the Power Reactor Information System (PRIS) of the
International Atomic Energy Agency in Vienna, and the Engineered Safety Features and
Auxiliary Systems (ESFAS) file. The latter databank contains information on the
philosophy and layout of the emergency systems and their support systems in the NPPs.
This data is important for assessing the severity of the abnormal events as stored in
the AORS.
In the present paper only the AORS is briefly presented and discussed. Descriptions
and presentations concerning the other parts of the ERDS can be found elsewhere, see
for instance [2], [3] and [4]. Also, no detailed description will be given here, since
this information can be found in the AORS Handbook [5]. Section 2 has been added
because it is useful for understanding the content of the paper.
The AORS collects information on safety related abnormal events from NPPs in
Europe and the USA. The data input for the system mainly originates from the various
national reporting organizations, in different languages and with different reporting
schemes and criteria, consisting in general of all the safety related operational events
which occurred in the participating plants, that by law have to be reported to the
competent safety authorities in the respective countries. In the AORS all this informa-
tion is merged together [6], homogenized both in language (English) and content,
codified and stored in a databank which is accessible through the international data
communication networks.
The main theme of the present paper is a search procedure for identifying depen-
dent failure cases, as an illustrative example of the AORS potential for use and analy-
sis. In section 3 an outline of this procedure is given. This type of data retrieval aims
to support the incident or PSA analyst in upgrading the completeness of his insights
and system models by exploiting past operating experience to a maximum extent.
In section 4 an example application of the procedure is described, where the
mechanism(s) leading to multiple failures of breakers in power distribution systems
and electric power systems, in any type of reactor design, is taken as the subject of
investigation.
Also the impact of such failures on safety grade systems in the plant is treated. This
area has been chosen in view of the Sequence Analysis Benchmark Exercise presently

in progress [7]. Finally, the summary can be found in section 5, together with some
concluding remarks.

2. GENERAL DESCRIPTION OF THE AORS

The AORS collects in a unique format all safety relevant events from NPPs as record-
ed in the participating countries. The system has been set up with the specific objec-
tive of providing an advanced tool for a synoptic analysis of a large number of events,
identifying patterns of sequences, trends, precursors to severe incidents, etc. Formally
stated, the principal aim of the system is the collection, assessment and dissemination of
information concerning events that have or could have consequences for the safe
operation of NPPs; the system intends to:
- homogenize national data in one unique databank in order to facilitate the data
  search and analysis;
- be a support for safety assessment both by utilities and safety authorities;
- feed back relevant operational experience and the lessons learned from it to the
  interested parties;
- facilitate the data exchange between the various national reporting systems and
  utilities.
The AORS databank has been designed to reproduce the abnormal event informa
tion in a manner which is particularly tuned to the needs of the incident and safety/
reliability analyst. More specifically, for each abnormal event not only a narrative de-
scription is available, but also a coded sequence of so-called occurrences. These are the
basic elements (steps) into which the event can be decomposed and which are usually
the individual component failures, degraded (sub)system states or faulty human inter
ventions (or lack of interventions) determining together the course of the event. Also
the causal and sequential relationships between these occurrences are codified and
available for use.
An event databank thus structured has an increased potentiality for use and
analysis over the classical, pure narrative event databanks as currently in operation in
the international nuclear safety scene. The large, intrinsically homogeneous and complete
data sample in the AORS gives the possibility to perform trend and pattern analyses,
comparative studies of (time dependent) plant behaviour and statistical frequency esti
mations. Moreover, the coding scheme, i.e. the various free text parts of the AORS
event reports combined with the occurrence sequence coding, allows in conjunction
with the previously mentioned characteristic for more sophisticated types of analysis,
such as the study of precursors to severe incidents, dependent failure analyses, human
behaviour assessment and sequence combining as a support for predictive (probabil
istic) modelling. In Ref. [] an overview is given of the various analysis possibilities
and applications.

The AORS Reference Format

The data sources of the AORS form an inhomogeneous set of safety related operating
experience collections. In order to homogenize this information a manual data proces-
sing scheme is applied, where the original reports (and where possible additional
sources) are screened thoroughly by hand for extracting the items featuring on the
AORS Reference Format, which is the key for unification of all the input information.
This format is treated exhaustively in Volume II of the AORS Handbook [5]. It consists
of three types of sheets, the first one serves to collect general event and plant infor-
mation, including the event date, reactor identification, reactor type and age specifica-
tion, event category, classification of the type of transient (if applicable), operating
conditions at the onset of the event, consequences of the event (on plant operation,
radioactivity releases etc.) and a few lines of free text describing the safety signifi-
cance of the event. The second sheet contains the narrative event and sequence de-
scription. Per AORS report only one sheet of each of these two types is compiled. The third
type of sheet, however, collects data on the individual occurrences identified in the
event sequence. Hence more than one sheet of this type may be present per AORS
report. We refer to the AORS Handbook [5] for a detailed description of all the items
on the Reference Format.
In the context of the present paper it is useful to review in some detail the
items of the occurrence sheet.
Upon screening of all the available information, a chain of occurrences is de-
fined, describing the sequential and causal course of the abnormal event. There exists
no strict definition of what is an occurrence. They must represent the separate equip-
ment faults, operator errors, faulty or degraded system or subsystem states, insofar
discernable and relevant for the evolution of the event. Also, each occurrence can be
linked to one or more other occurrences within the sequence, employing two types of
links, namely purely sequential or causal. For each occurrence the following informa-
tion is compiled:
- title of the occurrence: a two-line narrative indication stating what the occurrence is
to represent, including what failed, in what mode and with what consequences;
- emergency system(s) responding (automatically or manually activated) as a conse-
quence of the occurrence (5 entries possible);
- location of the failure ('failed system'): the system in which the failing equipment is
located or the system affected by the failure (1 entry per occurrence);
- the failing item ('failed component'): the equipment failing or affected by the faulty
human action, or the degraded system or subsystem if none of these is identifiable
(1 entry);
- failure mechanism and causes, including dependency (if any) identification ('cause
of failure', 2 entries possible);
- consequences and effects of the occurrence ('effect on system/component operation',
2 entries);
- detection mode ('way of discovery', 2 entries);
- corrective (remedial) actions ('action taken', 3 entries);
- linking with other occurrence(s) in the sequence (10 entries).
Except for the title all these items are reported by means of codings. The code
dictionaries are given in the AORS Handbook [5] where also some guidance as to the
use of these codes is included. Here it is useful to describe in somewhat more detail
how the cause of failure item has to be interpreted.
There exists no strict separation of failure mechanisms into 'direct' or 'observed'
causes and 'underlying' or 'root' causes. Often it is not possible to extract the necessary
information from the original event reports for making such refined distinctions.
Usually only the direct cause is reported, hence in most cases also the AORS reports
bear only this information. The coding is then straightforward through one single oc-
currence in the AORS report. In the case that more detailed information is available
for establishing an AORS report, root causes may be reported as well. This can be
realized for the more simple cases necessitating only one occurrence in the AORS
report by means of the second cause entry. Otherwise, root cause mechanisms are re-
ported by means of a chain (2 or more) of causally related occurrences. The final
occurrence of such a chain describes the end effect of the failure, with as cause of
failure the observed cause. Also the intermediate occurrences (steps in the failure
process) have as cause of failure only the observed one, because these occurrences
themselves embody as a whole the root cause of the next occurrence in the chain, and
their root causes are represented by the preceding occurrence(s). The initiating occur-
rence of such a causal chain features in one of the cause of failure entries its root
cause, should this information be available in the original event report.
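Purely as an illustration of the structure just described, an occurrence record and its
links could be represented along the following lines; all class and field names below
are assumptions made for this sketch and do not reproduce the actual AORS record
layout or code dictionaries.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Occurrence:
        # Hypothetical mirror of one occurrence sheet (names are illustrative only).
        occ_id: int
        title: str                      # two-line narrative indication
        failed_system: str              # 'failed system' code (1 entry)
        failed_component: str           # 'failed component' code (1 entry)
        causes: List[str] = field(default_factory=list)     # up to 2 cause codes, e.g. 'CSX'
        caused_by: List[int] = field(default_factory=list)  # ids of causally linked occurrences
        follows: List[int] = field(default_factory=list)    # purely sequential links

    @dataclass
    class EventReport:
        # One abnormal event, decomposed into a sequence of linked occurrences.
        report_id: str
        occurrences: List[Occurrence]

        def initiating_occurrences(self) -> List[Occurrence]:
            # Occurrences not caused by any other occurrence in the sequence;
            # if a root cause was reported, it sits in one of their cause entries.
            return [o for o in self.occurrences if not o.caused_by]
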
In summary, the definition of a series of occurrences within an abnormal event
creates searchable aggregate entities, beyond the classical level of simple codings. This
feature of the AORS is unique amongst the international abnormal event/incident
databases. It creates the possibility for optimal exploitation of past operating ex-
perience.

3. THE DEPENDENT FAILURE SEARCH PROCEDURE

The type of data retrieval practice described in this section is justified by recognizing
the fact that (partial) sequences taking place in different plants, of different design
and under different conditions, still may yield valuable insights for the plant under
study, provided that the generic issues from each sequence allow for applicability or
transferability (e.g. human error impact, dependent failure mechanisms).
In order to understand better the search procedure outlined below we first explain
how dependent/multiple failures are handled within the AORS coding scheme.
Dependent/multiple failures (sometimes named common cause or common mode
failures) are in the context of the AORS event reporting all those instances where
(similar) pieces of equipment failed (simultaneously) due to the same underlying cause
mechanism. This cause could be either internal to the equipment ('pure' common cause
cases) or be external, i.e. the failure of some other equipment, an environmental factor
or a human action, involving multiple equipment. Any of these cases would be marked
with the special dependency indicator, the occurrence cause code value 'CSX'.
In order to classify the cases as internal or external cause it is necessary to read
the complete event report, there exist no separate codified representations of the
various possibilities. However, some breakdown of the retrieved material can be estab-
lished automatically by defining 3 classes of 'CSX' cases from the AORS databank:
- event sequences comprising at least one occurrence that has the value 'CSX' for one
of its cause entries and that has not been caused by some other occurrence(s) within
the sequence. These cases have the higher possibility of being 'pure' common cause.

If that is the case, the 'root' cause of the dependent failure mechanism is reported
by means of the second cause of failure entry in the coding of the occurrence
responsible for the selection of the sequence.
- sequences with at least one occurrence marked with 'CSX' that has been caused by
  another occurrence (or occurrences) within that sequence which is itself not a common
  cause failure (otherwise the event would belong to the previous class). Here the pure cases are less
probable, however these could still be present if their root causes are reported by
means of a separate occurrence or a sequence of occurrences (see section 2).
- sequences where there is at least one occurrence that causes more than one other oc-
currence irrespective of any CSX indication. These would be the typical external
cause cases.
The output from a databank search looking for the cause code 'CSX' in any of the oc-
currences, split according to the above 3 classes (and a fourth class containing mixed
cases), enables the user to digest the retrieved material more effectively.
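A minimal sketch of how such a breakdown could be derived automatically is given below.
It uses the illustrative Occurrence/EventReport structures sketched in section 2 and is
not a transcription of the actual AORS search programs; in particular, the rule used
here for the mixed class is a simplification.

    def search_class(report):
        """Assign an event report to search class 'A', 'B', 'C' or 'D' (mixed),
        or return None when the report is not retrieved at all.  Sketch only."""
        occs = report.occurrences
        csx = [o for o in occs if 'CSX' in o.causes]

        # class A: a 'CSX' occurrence that is not caused by another occurrence
        in_a = any(not o.caused_by for o in csx)
        # class B: a 'CSX' occurrence caused by another occurrence (the refinement
        # that the causing occurrence is itself no CCF is omitted in this sketch)
        in_b = any(o.caused_by for o in csx)
        # class C: some occurrence causes more than one other occurrence,
        # irrespective of any 'CSX' indication
        in_c = any(sum(o.occ_id in other.caused_by for other in occs) > 1 for o in occs)

        hits = [c for c, hit in (('A', in_a), ('B', in_b), ('C', in_c)) if hit]
        if not hits:
            return None
        return hits[0] if len(hits) == 1 else 'D'
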
We see that the sequence coding in the AORS is the key for a search procedure
that identifies all the event reports bearing relevant information on dependent failures.
Furthermore the retrieved material is processed and presented in such a way that the
analyst can use it easily in checking his plant system and component model for any
qualitative aspect not yet incorporated, or for any other purpose.

Computer programs for dependent failure searches

As described in more detail in [8] there exists a library named AORSLIB containing
computer programs in NATURAL (a fourth generation programming language) that
serve AORS users as a basis for performing more or less standardized searches and
analyses and for deriving user-specific applications.
Two programs, CCF and CCF-SS, may be used for developing applications where
dependent failures of specific characteristics are retrieved and processed. The program
CCF retrieves from the AORS databank four disjunct sets of reports according to the
above mentioned grouping and prints for all retrieved cases the full report contents.
isolates from these those cases where the dependent failures have had impact on safety
grade systems in the plant. The full contents of these reports are then printed out,
sorted by affected safety system and search class. It is anticipated that safety analysts
are preferably interested in this type of dependent failures.

4. EXAMPLE APPLICATION OF A DEPENDENT FAILURE SEARCH

The scope of this section is to illustrate how a dependent failure search procedure is
applied. The application presented below was not part of an incident analysis activity,
nor of some PSA study. Therefore little attention is paid to the technical conclusions
one could draw from the output. As a matter of fact the screening results would
depend strongly on the scope and the boundary conditions of the study setting the
framework for a search of this type. However, the cases retrieved from the databank
could be relevant in the frame of the present accident sequence analysis Benchmark
Exercise conducted by the JRC-Ispra [7]. In this study a loss of power accident is in-
vestigated, so as an example of a dependent failure search we have chosen common
cause failures with breakers of any type in power transmission systems and in elec-
trical power systems. All systems in class G (power transmission) and class H (electric
power) of the ERDS Reference System Classification (see Vol. III of the AORS Hand-
book [5]) are considered:
Power transmission systems: power transmission system general, generator system, main
bus duct system, main transformer system, auxiliary transformers system, back-up
auxiliary transformers system, switchyard to station high voltage connection.
Electric power systems: electric power system general, medium voltage system, low
voltage system, vital instrument and computer AC system, on-site DC system, emer-
gency power supply system, electrical heat tracing system, lighting and taxed motive
power system, security system, communication system, cathodic protection system and
grounding system.

General search results

From the earlier mentioned NATURAL program CCF a program was derived that
retrieves the four disjunct groups of (potential) dependent failure cases involving
breakers and related equipment in the above systems. The numbers of reports retrieved
in each class are given in Table I below.

TABLE I - Numbers of retrieved reports.

Search class                                                Number of cases

Power transmission systems
A. CSX not due to some causing occurrence                          10
B. CSX due to some causing occurrence                               4
C. one occurrence causing more than one other occurrence            4
D. mixed cases                                                      0
Total for power transmission systems                               18

Electric power systems
A. CSX not due to some causing occurrence                          96
B. CSX due to some causing occurrence                              18
C. one occurrence causing more than one other occurrence           50
D. mixed cases                                                     11
Total for electric power systems                                  175

The complete listings of all 203 event reports have been screened by hand in
order to retain only the possibly relevant cases and to discriminate between the pure

(internal) common cause cases and the external cases. To that end, during this
screening the reports have been assigned to one (or more) of the following classes:
IT "internal technical", i.e. the dependent failure mechanism described in the report
   is of mechanical, hydraulic, electrical etc. nature and operates within the boundary
   of the affected components. Obviously, the root cause of such "technical" processes
   may be environmental or "human" (bad design, construction, maintenance,
   repair etc.) but in the report no explicit indication of this is given.
IE "internal environmental", indicates mechanisms within the component boundary
   such as fouling, ingress of foreign materials, ambient effects (temperature,
   humidity, chemical reactivity etc.).
IH "internal human", where there is a clear reference to some human deficiency as
   root cause of the failure process inside the component boundary, such as design
   fault, manufacturing problem, wrong choice of materials or subcomponents etc.
ET "external technical", where the multiple failure mechanism originates outside the
   component boundary and is of "technical" nature, see class IT.
EE "external environmental", i.e. a mechanism operating from outside the component
   boundary and of environmental nature, such as lightning, fire, flood, tornado,
   smoke, vibration, shocks etc.
EH "external human", where multiple component failure results from erroneous human
   behaviour outside the component boundary.
In Annex I for each of the above categories some illustrative examples are given,
as found amongst the 203 retrieved reports.
The effectiveness of the search procedure in automatically discriminating be-
tween internal and external dependent failure mechanisms may be learned from Table
II, where for the power transmission systems and the electric power systems together a cross
tabulation is shown of the search classes A through D (see above) and the target
classes IT through EH.

TABLE II - Screening results and search classes.

          A      B      C      D     Tot.

IT       16      0     10      0      26
IE        6      1      2      0       9
IH       22      1      2      2      27
ET        2     14     14      3      33
EE        5      4      5      0      14
EH       51      4     15      6      76

Tot.    102     24     48     11     185

It can be seen that the search result did yield some nonrelevant cases; these stem
from the broad definition of "circuit breakers" used in the search algorithm and, for
search class C, from non-multiple failures. Furthermore, the majority of "internal"
cases, 44 out of 62 (71%), is indeed found in search class A, but the majority of class
A reports is concerned with "external" cases (58 out of 102, 57%). This is due to the
coding of human errors in the AORS, which is according to the activity going on during
which the error occurred. Hence this is usually not dealt with by introducing a separate
occurrence describing the erroneous behaviour. Looking at search classes B and C
together, we see that indeed the majority (56 out of 72, 78%) concerns external cases.
Finally we remark that the "external human" cases are the most numerous, 76 out of
the overall total of 185 (41%).
In conclusion, the reports thus grouped constitute an interesting set of real life
examples of a variety of dependent failure mechanisms with circuit breakers and could
form a source of inspiration for the safety/incident analyst who has to model/analyze
this type of events.

Impact on safety grade systems

From the NATURAL program CCF-SS (see section 3) in the AORSLIB analysis pro-
gram library a module was developed that analyses the above type of dependent
failures according to their impact on safety grade systems. The list of such systems as
defined in the AORS is appended as Annex II; their functional descriptions, interfaces
and boundaries are given in Vol. III of the AORS Handbook [5]. The setup of the
module is simple. First the 4 disjunct report sets (according to search classes A through
D) are established. Then a loop is started wherein for all AORS defined safety systems
each subset is checked for the presence of event reports featuring in the sequence of
occurrences at least one occurrence that was caused by another occurrence and with
the current safety system as failed system. These reports are then printed out,
preceded by the current safety system and the search class.
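In the same illustrative terms as before (this is a sketch of the logic only, not the
NATURAL code of CCF-SS), the loop could read:

    def group_by_safety_system(report_sets, safety_systems):
        """report_sets   : dict mapping search class 'A'..'D' to lists of EventReport
        safety_systems: iterable of AORS system codes, e.g. ('B15', 'H05', 'L01')
        Returns {system_code: {search_class: [reports]}}; as noted in the text,
        the same report may appear under several safety systems."""
        out = {}
        for system in safety_systems:
            for search_class, reports in report_sets.items():
                for report in reports:
                    # keep the report if some occurrence, itself caused by another
                    # occurrence, has the current safety system as failed system
                    if any(o.caused_by and o.failed_system == system
                           for o in report.occurrences):
                        out.setdefault(system, {}).setdefault(search_class, []).append(report)
        return out
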
Two remarks concerning this procedure should be made here. First, the printed
reports are not automatically all relevant because the failure in the safety system can
also be caused by another fault reported in the sequence which has no relation with
the dependent breakers failure. Secondly, the same report may be printed more than
once if failures within different safety systems are observed.
The numbers of reports printed for each class and for each safety system are
listed in Table III. Here the results for power transmission systems and electric power
systems have been pooled together.
An end-user of the procedure interested in a certain safety grade system has easy
access to the (possibly) relevant material, given that the original search results (the 203
reports distributed in search classes A through D) have been post-processed in this
manner.

5. SUMMARY AND CONCLUSIONS

The background, design and structure of the Abnormal Occurrences Reporting
System (AORS) of the Commission of the European Communities have been highlighted.
An indication has been given of the methods available for use of the databank. It has

TABLE III - Safety systems in search classes.

tot.  syst.     A     B     C     D   (search class)

  2   A08       1     1
 11   B00       3     -     5     3
  8   B03       2     -     3     3
  2   B04       2
  4   B10       1     -     3     -
  1   B11       1
  1   B12       1
  4   B13       1     -     3     -
  3   B14       1     -     2     -
  8   B15       1     -     6     1
  4   B16       1     -     1     2
  2   B18       1     1
  1   B20       1
  3   B21       3
  4   B22       2     -     2     -
 11   B23       1     -     8     2
  2   B36       2
  7   C05       1     -     5     1
  8   C07       2     -     5     1
  1   C12       1     -     -     -
  1   H04       1
 37   H05      24     4     8     1
  5   L01       3     -     2     -
  6   L03       3     1     2     -

been mentioned that the existence of large, intrinsically homogeneous, complete
samples of data in the AORS gives the possibility to perform trend and pattern analy-
ses and statistical estimates of frequency of occurrence of classes of events. Further-
more, the particular data processing and storage enable a variety of more sophisticated
types of analysis including precursor searches, dependent failure classification and
human behaviour impact studies.
Particular attention has been paid to the retrieval of dependent failure cases. The
search method is based on the decomposition technique as applied to AORS data,
whereby each reported abnormal event is represented by a sequence of occurrences,
and it has been pointed out how event causes are reported within this frame.
The search programs have been applied to circuit breakers of any type in power
transmission systems and electric power systems. A valuable set of reports resulted,
which might be relevant in the frame of the JRC Benchmark Exercise on Event
Sequence Analysis.
By this example, it has been demonstrated that the AORS is a powerful tool for

exploiting safety related operating experience in NPPs, both for operational purposes
and for safety assessments.

References

[1] G. Mancini et al., 'ERDS: An Organized Information Exchange on the Operation
    of European Nuclear Reactors', Proc. Int. Conf. on Nuclear Power Experience,
    IAEA, Vienna, 13-17 September 1982.
[2] J. Amesz et al., 'The European Reliability Data System: Main Developments and
    Use', Proc. ANS/ENS Int. Topical Meeting on Probabilistic Safety Methods and
    Applications, San Francisco, 24 February-1 March 1985.
[3] F. Cattaneo et al., 'Analysis of Operational Experience in Nuclear Power Plants by
    the European Reliability Data System ERDS', Proc. IAEA Int. Conf. on Nuclear
    Power Performance and Safety, Vienna, 28 September-2 October 1987.
[4] S. Balestreri et al., 'Use of CEDB for PSA', Proc. Int. SNS/ENS/ANS Topical
    Conf. on Probabilistic Safety Assessment and Risk Management, Zuerich, 30
    August-4 September 1987.
[5] 'Abnormal Occurrences Reporting System Handbook', Commission of the Euro-
    pean Communities, JRC-Ispra Establishment, Internal document, February 1986.
[6] H.W. Kalfsbeek et al., 'Merging of Heterogeneous Data Sources on Nuclear Power
    Plant Operating Experience', Proc. Fifth EuReData Conf., Heidelberg, 9-11 April
    1986.
[7] A. Poucet, 'Event Sequence Reliability Benchmark Exercise, Summary Record of
    Meeting', Commission of the European Communities, JRC-Ispra Establishment,
    Internal document SCA1006, December 1987.
[8] H.W. Kalfsbeek, 'The Organisation and Use of Abnormal Occurrence Data', Proc.
    Ispra Course Reliability Engineering, Madrid (Spain), 22-26 September 1986,
    Reidel Publ.

ANNEX I - Examples of retrieved cases per screening class.

IT - internal technical factors


Breakers not actuating due to spring-load clutches failure.
Internal breaker contacts fail to close rendering a DC control power line unavailable.
Movable arc contact cracking in several 4 kV breakers.

IE - internal environmental factors


Damp in electrical connections failing circuit breakers.
Breakers trip and restart failure due to binding of the trip mechanisms.
Breakers with loose internal leads due to vibration induced by breaker operation.

IH - internal human factors


Breakers trip due to excessive heat build up in housing: design error.
Breakers not functioning due to interaction between electrical and mechanical design.
Contactor assembly in breaker damaged due to arcing because of too small contactor
size.

ET - external technical factors


Cascading breaker trips initiated by spurious solid state relay card malfunctioning.
Multiple breaker failure to load shed due to undervoltage relay failure.
Two breakers tripping due to spuriously activated current relay.

EE - external environmental factors


Spurious breakers operation by severe electrical storm.
Loss of offsite power by hurricane depositing salt on switchyard equipment.
Two breakers tripped in the switchyard due to galloping conductors on the transmis-
sion line.

EH - external human factors


Turbo generator output breakers not opened due to mispositioned limit switch.
Breakers not returned to normal open position upon test due to procedural deficiency.
Improper breaker alignment for HPSI due to failure in following a written procedure.
Breakers left open upon maintenance rendering one RHR train and one core spray
train unavailable.

ANNEX II - AORS safety grade systems.

A04 PRESSURIZING SYSTEM (PWR)
A08 CONTROL ROD SYSTEM (PWR/GCR)
A09 CONTROL ROD SYSTEM (BWR)
B00 ENGINEERED SAFETY FEATURES
B03 CONTAINMENT SPRAY SYSTEM (PWR/BWR/PHWR)
B04 CONTAINMENT ISOLATION SYSTEM
B05 CONTAINMENT PRESSURE SUPPRESSION SYSTEM (BWR)
B06 PRESSURE RELIEF SYSTEM (PWR/PHWR/GCR)
B07 HYDROGEN VENTING SYSTEM (PWR/BWR)
B08 POST-ACCIDENT CONTAINM. ATMOSPHERE MIXING SYS. (PWR/BWR)
B09 CONTAINMENT GAS CONTROL SYSTEM (PWR/BWR)
B10 AUXILIARY FEEDWATER SYSTEM (PWR/GCR)
B11 REACTOR CORE ISOLATION COOLING SYSTEM (BWR)
B12 EMERGENCY BORATION SYSTEM (PWR)
B13 STANDBY LIQUID CONTROL SYSTEM (BWR)
B14 RESIDUAL HEAT REMOVAL SYSTEM (PWR)
B15 RESIDUAL HEAT REMOVAL SYSTEM (BWR)
B16 HIGH PRESSURE COOLANT INJECTION SYSTEM (PWR)
B17 ACCUMULATOR SYSTEM (PWR)
B18 LOW PRESSURE COOLANT INJECTION SYSTEM (PWR)
B19 NUCLEAR BOILER OVERPRESSURE PROTECTION SYSTEM (BWR)
B20 HIGH PRESSURE CORE SPRAY SYSTEM (BWR)
B21 HIGH PRESSURE COOLANT INJECTION SYSTEM (BWR)
B22 LOW PRESSURE CORE SPRAY SYSTEM (BWR)
B23 LOW PRESSURE COOLANT INJECTION SYSTEM (BWR)
B24 REACTOR CONTAINMENT SYSTEM (PHWR)
B25 CONTAINMENT PRESSURE SUPPRESSION SYSTEM (PHWR)
B26 MODERATOR DUMP SYSTEM (PHWR)
B27 LIQUID POISON INJECTION SYSTEM (PHWR)
B28 RESIDUAL HEAT REMOVAL SYSTEM (PHWR)
B29 EMERGENCY WATER SUPPLY SYSTEM (PHWR)
B30 HIGH PRESSURE COOLANT INJECTION SYSTEM (PHWR)
B31 EMERGENCY CORE COOLING SYSTEM (PHWR)
B33 EMERGENCY SHUTDOWN SYSTEM (GCR)
B34 RESIDUAL HEAT REMOVAL SYSTEM (GCR)
B35 EMERGENCY CORE COOLING SYSTEM (GCR)
B36 AUXILIARY BOILER SYSTEM (GCR)
B37 FUELLING MACHINE EMERGENCY COOLING SYSTEM (GCR)
C05 PRIMARY COMPONENT COOLING WATER SYSTEM
C07 PRIMARY LOADS SERVICE WATER SYSTEM
C08 ULTIMATE HEAT SINK SYSTEM (PWR/BWR/GCR)
C12 SAFETY EQUIPMENT COMPRESSED AIR SYSTEM
C13 NUCLEAR FIRE PROTECTION SYSTEM
C21 SHIELD COOLING CIRCUIT (PHWR/GCR)
F06 TURBINE BYPASS SYSTEM
H04 ON-SITE D.C. SYSTEM
H05 EMERGENCY POWER SUPPLY SYSTEM
L00 PROTECTION AND CONTROL SYSTEM
L01 REACTOR PROTECTION SYSTEM
L02 BOP PROTECTION SYSTEM
L03 ENGINEERED SAFETY FEATURES ACTUATION SYSTEM
L10 REMOTE SHUTDOWN SYSTEM
M14 EMERGENCY POWER SUPPLY BUILDING HVAC SYSTEM
M22 AUXILIARY FEEDWATER PUMPS CHASE HVAC SYST. (PWR/GCR)
N15 BOP FIRE FIGHTING SYSTEM

MRF's FROM THE ANALYSIS OF COMPONENT DATA

Humphreys, A M Ganes, J Holloway


National Centre of Systems Reliability
UKAEA
Wigshaw Lane
Culcheth
Warrington WA3 4NE

ABSTRACT. The feasibility of extracting dependent failures data from


component failure data is established.
The process of data extraction is discussed, together with
relevant coding schemes which allow a more concise description of the
dependent events to be provided.
Examples of the application of the process are given by reference
to analysis of data contained within a component event database.

1. A PHILOSOPHY FOR DATA ANALYSIS

There is a great temptation to begin dependent failure data analysis by


looking at a list of DF models, selecting the most promising and trying
to find data that will suit that particular model. Unfortunately,
there is no single model that is universally accepted as the best and
even if such a model did exist would the data available be of the
correct sort?
It is suggested that a much more realistic approach is to firstly
identify the extent and limitations of the data and then find out what
it has to say about related or dependent failures without the analyst
imposing fixed objectives. A fortuitous outcome of this approach is
that a better engineering understanding of related failures and their
influence on the systems can be derived as well as taking the analyst a
step towards quantification.

2. THE IDENTIFICATION OF DEPENDENT FAILURES IN OPERATIONAL


RELIABILITY DATA

Operational reliability data is generally not collected specifically


for dependent failure studies and even when data has been collected
with the intention of using it for these studies, the event recording
is traditional in approach and does not reflect the concept of
dependence directly. For this reason, it has been necessary to devise

methods of data analysis so that the dependent failures can be
identified.
Operational reliability data is usually one of two types:
1 Complete scenario descriptions, such as US Licensee Event
Reports (LERs) or Nuclear Power Experience Reports (NPERs) or more
generally Abnormal Occurrence Reports; or,
2 Component failure event data taken from plant maintenance
records.
Abnormal occurrence reports, or Licensee Event Reports (LERs) from
the nuclear industry, give a description of an event, including
operator actions, component failures and other event sequence
information. They refer to serious or potentially serious incidents
according to some criteria laid down in nuclear safety regulations.
They do not, therefore, cover the entire population of component
failures.
Component failure event information, which is gleaned from
maintenance reports, can give details of all component failures on a
system or plant. This means that a very complete picture of the total
failure population could be derived (in theory). Some dependent
failure models require the total population failure rate; for example,
the denominator of the beta factor would require this information.
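In the commonly used beta-factor model, for instance, the total failure rate of a
component is split into an independent part and a common cause part, and the model
parameter is the fraction

    \beta = \frac{\lambda_{CCF}}{\lambda_{ind} + \lambda_{CCF}}

where lambda_ind is the rate of independent failures and lambda_CCF the rate of
failures from shared causes; the denominator is thus the total failure rate of the
component population, which is precisely the kind of information that complete
component failure records can supply.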
Unfortunately, component failure data also has some shortcomings.
Firstly, it does not describe accident sequences which means that
human/operator activities are not well represented in the data.
Secondly, it incorporates a species of event which, even in relation to
straightforward failure rate calculations, presents problems. That is,
the data is contaminated with maintenance interventions and it becomes
necessary to distinguish between what is a complete failure during
operation and what is a failure which was prevented from going to
completion because of maintenance intervention or the introduction of
modifications.
Both types of operational data appear to suffer from the same
deficiency which is that full system descriptions are not immediately
available and, therefore, it is almost impossible to determine
components' operating duties and their arrangement in the structure of
the system. Component failure event information will, however, contain
design details of the components, if it is well collected.

2.1 Definition of Multiple Related Failures

Before proceeding further a definition of Dependent Failures is


provided. In order to differentiate between independent, random
failures and Multiple Related Failures, the term Dependent Failure (DF)
is described as:-

Dependent Failure (DF): The occurrence of a set of failure events, the
probability of which cannot be expressed as the simple product of
the unconditional failure probabilities of the individual events.
Included in this category are:-

Common Cause Failure (CCF): This is a specific type of dependent
failure where simultaneous (or near simultaneous) multiple
failures result from a single shared cause.

Common Mode Failure (CMF): This term is reserved for
common-cause failures in which multiple equipment items fail in
the same mode.

Cascade Failures (CF): These are propagating failures.
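Written in probabilistic notation, for two failure events A and B the definition of a
dependent failure above amounts to

    P(A \cap B) \neq P(A) \cdot P(B)

whereas for independent, random failures the two sides are equal.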

2.2 A Method for the Analysis of Event Reports

The Electric Power Research Institute (EPRI) in the US sponsored Los


Alamos Technical Associates (LATA) to develop a classification scheme
for dependent failures. Their scheme was based on the observation that
there existed no accepted system for classifying any type of multiple
component malfunction event. The method they developed was, therefore
"A method for logically dissecting .multiple component
unavailability scenarios into individual component
unavailabilities" (1)
They defined unavailability as the inability of a component to
perform its intended function (either due to damage or lack of proper
input or lack of support function. Failure and functional
unavailability are two mutually exclusive and exhaustive subsets of
component unavailability.
The scheme is based on cause-effect logic ie each component
unavailability is expressed in terms of a cause and an ultimate
component state:

[Diagram: CAUSE 1 --> EFFECT 1 (component state); EFFECT 1 acts in turn as CAUSE 2 --> EFFECT 2]

The cause of unavailability may be another component unavailability as
shown above. It is also possible for a single cause to have multiple
effects:

[Diagram: a single CAUSE branching into multiple EFFECTs (component states)]
Typically, failure root causes might be related to poor design, poor
manufacture, maintenance error, inadequate operating procedures and so
on. Possible component states are:

- Failed
- Functionally unavailable (loss of input / preventative
  maintenance / calibration / testing)
- Potentially failed (degraded / incipient)
- Potentially functionally unavailable

The relationship between dependent events and cause-effect logic
diagram event categories is illustrated in Figure 1. Generally, all
events which involve two or more interdependent actual or potential
component states are dependent events of one type or another. The
type of structure of interest, however, is the branched root-caused
event: a single root cause giving rise to more than one component
unavailability. This is the type of dependent event frequently referred
to as a "common cause failure".

The value of this structured approach is that each event scenario
can be dissected and expressed in a consistent form. There still
remain problems of interpretation regarding what the analyst perceives
as the ultimate root cause or the ultimate effects and so on, as well as
the residual difficulties imposed by lack of certain types of
information in, for example, the Licensee Event Reports. In a
comparison of the results of the analysis of a set of events by four
different analysts it was shown that differences were due almost
entirely to interpretation, which can be overcome by adding some
specific rules and guidelines to the classification scheme (2). It has
also been suggested that inclusion of post-event actions, system
structure and more detailed cause descriptions in event reporting would
facilitate data analysis.
It has been found, however, that this method is a very valuable
analysis tool. The scheme (of which there have been two or three
variants) has been applied to LER and NPER data. Pickard Lowe and
Garrick (PLG) applied their version of the LATA scheme to NPERs with a
view to supporting parametric dependent failure models (3).

2.2 A Method for the Analysis of Component Failure Data

Each failure event within a component event data bank relates to a


SINGLE failure of a SINGLE component. Each component failure event can
initially be classified, under the LATA scheme, as a linear single
cause-effect logic unit:

[Diagram: a single cause --> a single component state]
In order to find dependent events, or branched cause-effect logic
units, the links have to be detected by examination of the event
descriptions.
The common cause or branched events can be identified if two or
more component failure reports have the same recorded or coded failure
cause plus other identical attributes such as mode of failure or
failure descriptor. In addition, a link between events may be inferred
from a combined knowledge of time of occurrence, system configuration,
component location, and so on. The major difficulty is in providing a
systematic framework in which to do this, because when the number of
events is large it is not possible to rely upon an analyst's powers of
recall to detect the links.
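As a sketch of such a framework, candidate links might be flagged by grouping records
on shared attributes and a time window; the record fields and the window length are
assumptions made for this illustration and are not part of the scheme described above.

    from collections import defaultdict
    from datetime import timedelta

    def candidate_related_groups(records, window_days=30):
        """Group single-component failure records sharing the same coded failure
        cause and failure mode and lying close together in time.

        records: objects with attributes component_id, failure_date (datetime),
                 cause_code and failure_mode.
        Returns groups with two or more members, i.e. candidate branched
        (common cause) events for manual review.  Note that a pure time window
        will miss defects, such as design errors, revealed only over time."""
        by_signature = defaultdict(list)
        for rec in records:
            by_signature[(rec.cause_code, rec.failure_mode)].append(rec)

        groups = []
        for recs in by_signature.values():
            recs = sorted(recs, key=lambda r: r.failure_date)
            current = [recs[0]]
            for prev, nxt in zip(recs, recs[1:]):
                if nxt.failure_date - prev.failure_date <= timedelta(days=window_days):
                    current.append(nxt)
                else:
                    if len(current) > 1:
                        groups.append(current)
                    current = [nxt]
            if len(current) > 1:
                groups.append(current)
        return groups
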
Another important feature of component failure data is that
it is derived from maintenance records. Maintenance may be
carried out for several reasons:
1 The component may not be functioning and repair may be
required (the repair may be urgent but because of built-in
redundancy the repair can be carried out when convenient).
2 A scheduled maintenance check may be carried out during
which incipient and degraded conditions may be found.

3 Repair or maintenance of one component may prompt
examination of other similar components.
4 The operation of a component may be degraded. Repair
may be urgent, or, if not, may be carried out when system
operation is not impaired.
5 A component may have a small defect (incipient fault)
but is operating within specifications and, therefore,
maintenance can be carried out when convenient.
Cases 2, 3, 4 and 5 all involve potential failures and these
form the bulk of component failure event data. For this reason,
quite a large proportion of multiple failure events, in
particular branched events, will include potential rather than
actual failures.
It is possible to glean very useful information relating to
multiple failures if the potentially failed components are taken into
consideration.
The discussion can now be extended to deal with the multiple
failure events and the concept of potential common cause failure events
emerges. These events are vitally important as a basis for defence
measures against dependent failures. It was found during a study of
component failure data from the European Reliability Data System that
events of the type

[Diagram: one root cause with two or more actual component failures]

which, as we have seen earlier, is the classical 'common cause failure',


are relatively rare and much information can be gleaned inferentially
by taking into account potentially failed components. Consider the
following examples.
A root-caused component failure is a linear event:

[Diagram: root cause --> failed component]

If, when this repair is carried out, it is discovered that there
was a design defect, it is important to check other, similar
components. These components, even though they have not yet failed
(because, perhaps, the revelation is time related), indicate a potential
common cause failure

which is a branched single unit root-caused structure. This is a long
way from the idea of catastrophic failure of multiple components.
Consider another case where a previous maintenance error results
in component AX's failure:

[Diagram: maintenance error --> AX failed (AX and BX are identical components
 operating in parallel redundancy); the same maintenance error --> AY failed
 (AY and BY are identical components operating in parallel redundancy)]

Suppose too that in neither case was the other redundancy
affected. It is possible to describe the whole 'event' in terms of
cause-effect logic as a single branched structure

[Diagram: maintenance error --> AX failed and AY failed]

even though the failures occurred on different systems. Herein
lies a warning of potential common cause failures on the individual
redundant systems, and some move to protect against such an occurrence
is called for.
The examples chosen above related to redundant systems and the
last case to two separate systems. The cause-effect logic diagrams do
not indicate whether or not components are:
- identical
- in the same system
- in the same plant.
The LATA/EPRI event classification in Figure 1 does include the
cases where components are not identical and permits all components.
It is essential, therefore, that a search of data is not
restricted in terms of mode of failure (that is, whether failure is
catastrophic or incipient) or in terms of time of occurrence.

2.3 Component Failure Sorting

A method has been developed for sorting component failure data based on
a limited number of relevant event attributes. The classification
scheme recognises the possibility of related component failures between
systems and plants. It has been suggested that data should be sorted and
analysed for different combinations of components as defined in Figure
2.
Ideally, each component/system/plant combination should be
considered but this is a formidable task. Table 1 indicates the
likelihood of finding related component failures on in-system,
inter-system and inter-plant levels according to different failure
causes. This is rather subjective and is open to debate, but it does
give an indication of where, perhaps, the emphasis of analysis should
be.

Event Classification System Categories

Independent events (one actual or potential component state):
  Linear, Single Unit                       (LS)
  Linear, Multiple Units                    (LM)

Dependent events (two or more interdependent actual or potential component states):
  Branched, Single Unit, Root Caused        (BSR)
  Branched, Single Unit, Component Caused   (BSC)
  Branched, Multiple Unit, Component Caused (BMC)
  Branched, Multiple Unit, Mixed Causes     (BMM)

FIGURE 1. Relationship between dependent events and logic diagram event categories
PLANT                    SYSTEM                    COMPONENT
SELECTION                SELECTION                 SELECTION

All plants or            All systems or            All components or
combination of plants    combination of systems    combination of components

All plants or            All systems or            One type of
combination of plants    combination of systems    component

All plants or            One type of               All components or
combination of plants    system                    combination of components

All plants or            One type of               One type of
combination of plants    system                    component

One plant                All systems or            All components or
                         combination of systems    combination of components

One plant                All systems or            One type of
                         combination of systems    component

One plant                One type of               All components or
                         system                    combination of components

One plant                One type of               One type of
                         system                    component

FIGURE 2. FIRST TIER OF MULTIPLE FAILURE EVENT CLASSIFICATION SCHEME

TABLE 1. LIKELIHOOD OF DISCOVERY OF RELATED COMPONENT FAILURES
AT IN-SYSTEM, INTER-SYSTEM AND INTER-PLANT LEVELS

CAUSE               IN-SYSTEM (IN-PLANT)   INTER-SYSTEM   INTER-PLANT

DESIGN
  Requirements             /                    Y              Y
  Error                    /                    Y
  Manufacturing            /                    /              Y ) likely when same
  Construction             /                    /              Y ) designers & constructors

PROCEDURES
  Operation                /                                   Y ) likely when same
  Maintenance              /                    /              Y ) management or owners
  Calibration              /                                   Y )

HUMAN
  Procedures               /                                   Y ) likely when
  Misdiagnosis             Y                                   Y ) common training
  Accidental                                                   X )

MAINTENANCE                                                    Y   likely when common training or same teams

ENVIRONMENTAL                                                  Y   if plants on same site

INTERNAL                LIKELY             LESS LIKELY    HIGHLY UNLIKELY

The analyst is more generally interested in related, in-system
component failures on redundant channels for dependent failures
studies. The additional dependencies found in the broader-based search
can, however, be of considerable value (4). These dependencies can
provide indicators for potential links and, if reliability data is
collected and sorted on a continuous basis, the results can help in
performance monitoring and the detection of trends or growth in
dependencies as a plant ages and faults, such as design faults, are
revealed. For example, inter-plant related failures will tend to arise
when the same designers, managers, site or maintenance teams are
involved, while inter-system related failures depend largely on plant
construction or layout, location of components and applied defences,
where causal mechanisms relating to design, manufacturing and
environment may be seen as the main contributors.
Consider next the determination of the links between failure
events.

2.4 Defining Failure Event Attributes for Finding Related Events

It was demonstrated (4) in the ERDS component event data study that
single failure event attributes, such as mode of failure, cause of
failure etc., were not sufficient to ensure that all related failures
were detected.
Firstly, the use of the failure date as the basis for sorting out
component failures was attempted.
This initial choice was made because simultaneous occurrence of
failures, particularly on redundant systems, is potentially more
far-reaching in effect. The problem with this is that, once again, the
detection of potential 'common cause failure' events is limited. A
common design error, for example, may only be revealed with the passage
of time. If it is, therefore, necessary to consider time periods, and
to examine component failures within those periods, the problem becomes
infinitely large.¹
Combining the failure date with one or more other attributes
would have its advantages in providing a tighter link.
Of course, there are two ways of going about the sorting on the
basis of failure date or any other attribute:
1  select a particular group of components and sort out their
   failures on the basis of failure date,
or
2  sort out all failures on the basis of failure date, then
   select groupings of interest and determine the component details
   thereof.
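As an illustration only, the sketch below shows how these two orderings
might look in practice; the record layout (component group, failure
date) and the sample entries are hypothetical and are not taken from
any real data collection.

    from collections import defaultdict

    # Hypothetical failure records: (component_id, component_group, failure_date)
    failures = [
        ("P06", "HPCIS pumps", "070579"),
        ("P07", "HPCIS pumps", "070579"),
        ("V04", "F08 valves",  "300580"),
    ]

    def group_then_date(records, group):
        """Way 1: pick a component group first, then sort its failures by date."""
        by_date = defaultdict(list)
        for comp, grp, date in records:
            if grp == group:
                by_date[date].append(comp)
        return dict(by_date)

    def date_then_group(records):
        """Way 2: sort all failures by date, then look at the groupings of interest."""
        by_date = defaultdict(list)
        for comp, grp, date in records:
            by_date[date].append((grp, comp))
        # dates carrying more than one failure are the groupings worth examining
        return {d: v for d, v in by_date.items() if len(v) > 1}

    print(group_then_date(failures, "HPCIS pumps"))   # {'070579': ['P06', 'P07']}
    print(date_then_group(failures))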
The obvious selection of cause of failure as the main linking
attribute also raises some problems. Suppose that all failures with
cause of failure 'design' are chosen. Not all of the events will
relate to the same design defect (indeed, they may all be different).
It is, therefore, necessary to refine further the selection that has
been made. This can be done by
1  selecting particular component types or groups of identical
   components from the initial group of events
and/or
2  deducing the closer links between failures from
   - failure date
   - failure mode, failure descriptor
   - parts repaired/parts failed
   - method of detection.

¹ There is another problem arising from this method, which came
about from the inclusion of scheduled maintenance activity. A
large number of events were reported in a short time period
(the maintenance period) and obscured any other patterns emerging from
the data. Most of these events are degraded or incipient
component conditions which could have been repaired at any time.
It can be seen immediately that more than one attribute is
required to establish a definite link between failure events.
The traditional 'common cause failure' will drop out quite nicely
because we deal with simultaneous failure of identical components. But
to define this event we already need three attributes:
1  cause
2  time of occurrence
3  a measure of similarity of components.
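A minimal sketch of such a three-attribute link test is given below;
the field names and the use of generic type as the similarity measure
are assumptions made for illustration only, not part of the
classification scheme itself.

    from dataclasses import dataclass

    @dataclass
    class FailureEvent:
        component_id: str
        generic_type: str   # e.g. 'pump' or 'valve', standing in for component similarity
        cause: str          # coded failure cause, e.g. 'S11'
        date: str           # failure date, standing in for time of occurrence

    def traditional_ccf_link(a, b):
        """True when two distinct failures share cause, date and component similarity."""
        return (a.component_id != b.component_id
                and a.cause == b.cause
                and a.date == b.date
                and a.generic_type == b.generic_type)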
The potential common cause failures which have been discussed
above require a more inferential approach based on more attributes.
It is relatively easy for a human being to scan some data and
draw out or infer relationships. Difficulties arise, however,
1  if vast quantities of data have to be analysed,
2  if more than one type of relationship is being sought,
3  if consistency of approach is to be assured,
4  if continuous updating of a lengthy analysis is required.
Ideally, some form of semi- or fully automatic procedure using
computing facilities would be desirable in order to detect links between
failures.
Thus it is necessary to decide which combination of attributes
should be used. Four attributes, namely cause, mode, failure date and
identicality, were used in the ERDS-CEDB study, but this is not
necessarily the only combination. That method was chosen initially and
then, because of the time required to investigate this, no other
combinations (except using event descriptions instead of mode of
failure) were attempted. Because of the ERDS interrogative procedures
only a semi-automatic procedure for multiple failure event
identification could be devised. The searches were inordinately
time-consuming (4). The second level of the classification scheme is
shown in Figure 3.
The way of collecting together related failures may be based on a
histogram - select - histogram ... basis. Suppose that failures with
the same cause and same failure date are required. Firstly, a
histogram of failure causes for the entire population has to be
produced. Each existing failure cause is then selected in turn, a
histogram of the failure dates is obtained, and failure dates attributed
to more than one failure can be identified.
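The sequence can be pictured as nested counting; the fragment below is
one possible rendering with invented records, and is not the ERDS
interrogation procedure itself.

    from collections import Counter

    # Hypothetical records: (cause, failure_date, component_id)
    records = [
        ("S11", "070579", "P06"), ("S11", "070579", "P07"),
        ("E11", "300580", "V04"), ("E11", "150980", "V10"),
    ]

    # First histogram: failure causes over the entire population.
    cause_histogram = Counter(cause for cause, _, _ in records)

    # For each cause, a second histogram of failure dates; a date attributed to
    # more than one failure marks a candidate multiple failure event.
    candidates = {}
    for cause in cause_histogram:
        date_histogram = Counter(date for c, date, _ in records if c == cause)
        for date, count in date_histogram.items():
            if count > 1:
                candidates[(cause, date)] = [cid for c, d, cid in records
                                             if c == cause and d == date]

    print(candidates)    # {('S11', '070579'): ['P06', 'P07']}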
An alternative approach to analysis is to interrogate the data
base for specific combinations which have been chosen. For example it
may be decided that all pumps failing suddenly owing to a design error
should be selected. This is straightforward but has the disadvantage
that the analyst must think up or invent all combinations. This is a
recipe for disorganisation and inconsistency. Multivariate analysis
can be handled for a few variables (4 were used in the ERDS-CEDB
study), but after that more complex codes and techniques become
necessary.
Whatever techniques are ultimately used to identify dependencies
from the component data, the net result should be a set of dependent

GROUP I     (common cause, common mode, simultaneous):
            A identical elements   B non-identical elements   C same element
GROUP II    (common cause, common mode, temporally distributed):
            D identical elements   E non-identical elements   F same element
GROUP III   (common cause, different mode, simultaneous):
            G identical elements   H non-identical elements   I same element
GROUP IV    (common cause, different mode, temporally distributed):
            J identical elements   K non-identical elements   L same element
GROUP V     (different cause, common mode, simultaneous):
            M identical elements   N non-identical elements   O same element
GROUP VI    (different cause, common mode, temporally distributed):
            P identical elements   Q non-identical elements   R same element
GROUP VII   (different cause, different mode, simultaneous):
            S identical elements   T non-identical elements   U same element
GROUP VIII  (different cause, different mode, temporally distributed):
            V identical elements   W non-identical elements   X same element

FIGURE 3  PROPOSED SECOND TIER CLASSIFICATION SYSTEM
          FOR MULTIPLE FAILURE EVENTS

failures. The data sets so derived can then be recoded and expressed
in the same cause-effect logic as discussed earlier.

3. EXAMPLES OF ANALYSIS OF COMPONENT FAILURE DATA

To demonstrate the process, a simple example is now provided. A set of
component failures is examined. It is decided to consider pump
failures on the high pressure coolant injection system of plant X. The
data is sorted so as to find groups of failures which have a common
cause and failure date. It is assumed that "identical" components are
those which are of the same generic type (that is, the components are
not necessarily identical in every design detail). The following group
of failures is found.

PLANT   SYSTEM   COMPONENT   CAUSE   MODE   DATE     CATEGORY

X       B18      Pump 06     S11            070579
X       B18      Pump 07     S11            070579
Further examination of the failure details shows that these
incipient failures were found during a planned maintenance intervention
(hence the same failure date and common incipient failure mode).
It was found that an abnormal environment condition (S11), high
humidity caused by leakage of vapour from certain valves, had resulted
in rusted bearings. The pumps formed part of the redundant high
pressure coolant injection system and were located in the same place,
hence they suffered the common environment.

ANALYSIS. This was a dependent event ("common cause failure").


There could be some debate as to the component states, however.
The event did not occur during operation or when this emergency
system was called into operation. Thus, the components did not
"fail". Whether they would have failed on demand is a matter of
conjecture. The component states could be regarded as potentially
failed and, using the LATA classification method, the event would
be shown as

   [cause-effect logic diagram: abnormal environment E leading to
   degraded states D of PUMP 06 and PUMP 07]

(where E indicates an abnormal environment and D indicates a degraded
condition).

An identical sort of the HPCIS pump failures on plant Y may give

PLANT   SYSTEM   COMPONENT   CAUSE   MODE   DATE     CATEGORY

Y       B18      PUMP 06     S11            091079
Y       B18      PUMP 07     S11            091079

These were two incipient pump failures. They were probably
detected during test or inspection. Thus, whilst the components did
not fail during operation or when required, they were incipiently
failed. Because the test was failed, the system was deemed to be
inoperable and the reactor was shut down for safety reasons. Abnormal
humidity had resulted in water in the lubricating oil, which in turn
led to rusted bearings.

   [cause-effect logic diagram as before: abnormal environment E leading
   to degraded states D of PUMP 06 and PUMP 07]

(Note that these examples imply a common design problem on plants X and
Y. These examples are similar to a real situation found in component
data (4).)
There is a predominance of potential failures in such data for the
reasons stated above, but this information is of great engineering value
and the above examples demonstrate how performance might be monitored.
To demonstrate further the kind of post-search analysis which
might be carried out, some results from the ERDS study are attached
(Appendix 1). The results relate to the Group A multiple failure
events which were found. These groups of failures involve failures
which share a common cause, mode and failure date and relate to
"identical" components (in this study 'identical' implied components of
the same generic type, e.g. pumps or motors).
It is after this preliminary searching that a more precise
definition of "dependent failure" can be imposed by the analyst. This
definition might be one related to a particular dependent failures
model which is being applied.
There are some residual problems, such as what to do with POTENTIAL
dependencies as described above, which are not for the data analyst but
rather for the dependent failures modeller to resolve.
To calculate beta factors it might be that only Group A related
events would be considered and, in particular, only those involving
catastrophic or sudden failure, for example.
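For orientation only: the beta factor referred to here is commonly
defined as the fraction of a component's total failure rate
attributable to common cause failures,

    \beta = \frac{\lambda_{CCF}}{\lambda_{ind} + \lambda_{CCF}}

so that restricting attention to Group A events with catastrophic or
sudden modes amounts to one particular choice of which failures enter
the numerator. This standard definition is recalled here for
convenience; it is not developed in the present paper.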

4. CONCLUSIONS

The extraction of dependent failures from component failure data which
has been obtained from maintenance reports is feasible.
The component failure data needs to be sorted to establish links
between the failure causes, and the sorting must be performed using a
multi-attribute process.
A further phase of analysis is then required to determine actual
and potential dependencies after which the dependencies so detected may
be coded using the LATA scheme.
By the nature of the component failure data it is likely that
potential dependencies will prevail and it will be the responsibility
of system analysts to decide which potential events need to be
considered as part of a plant safety analysis.
In addition to providing a source of dependent failures data, the
analysis of maintenance related data, in the manner described, will
provide a useful mechanism for monitoring plant performance.

REFERENCES

1  Los Alamos Technical Associates, A Study of Common Cause Failures;
   Phase 2: A Comprehensive Classification System for Component
   Fault Analysis, EPRI NP-3837, Project 2169-1, Interim Report, June
   1985.

2  Fleming, K. N., A Reliability Model for Common Mode Failures in
   Redundant Safety Systems, General Atomic Report GA-A13284, 1974.

3  Pickard, Lowe and Garrick, Classification and Analysis of Reactor
   Operating Experience Involving Dependent Failures, EPRI NP-3967,
   Project 2169-4, Interim Report, June 1985.

4  Games, A. M., Some Aspects of Common Cause Failure Analysis in
   Engineering Systems, PhD Thesis, University of Liverpool, October
   1986.

ACKNOWLEDGEMENTS

The bulk of the work by A. M. Games on classification and data analysis


has been performed during her stay at JRC-Ispra, supported by a CEC
PhD-Grant. The Component Event Data Bank (CEDB) is part of the ERDS
system operated by JRC Ispra.

PLANT PWRl

COMPONENTS FAILURES
SYSTEM PUMP ELMO VALV TOT PUMP ELMO VALV TOT
BO 3 2 2 t 1 0 0 1
BIO 3 2 0 5 5 0 0 5
Bit 2 2 2 6 3 0 1 t
B16 0 0 9 9 0 0 0 0
BIS 2 2 t 8 7 2 0 9
FOB 7 5 35 t7 2S 3 7 38
F16 2 6 2 IO 1 2 0 3
TOTAL: 18 19 56 93 ts 7 8 60

PLANT PWR2

COMPONENTS FAILURES
SYSTEM PUMP ELMO VALV TOT PUMP ELMO VALV TOT
BO 3 2 2 t s5 1 0 2
BIO 3 2 0 6 0 0
lit 2 2 2 6 t 0 1
B16 0 0 9 9 0 0 t
BIS 2 2 t S 6 1 0
FOB 9 5 35 t9 t5 t 12 61
F16 2 6 2 10 2 0 0 2
TOTAL: 20 19 56 95 6t 5 19 88

PLANT PWR3

COMPONENTS FAILURES
SYSTEM PUMP ELMO VALV TOT PUMP ELMO VALV TOT
BO 3 t t t 12 2 0 2 t
BIO 3 2 1 6 5 0 0 S
lit 2 0 16 S t 0 It 15
BI6 2 0 9 11 12 0 5 17
BI8 2 2 t a 0 0 1 1
FOB 7 5 35 t7 IS 1 15 3t
FI6 2 6 2 10 1 3 1 5
TOTAL: 22 19 71 112 42 t 35 81

PLANT PWR4

COMPONENTS FAILURES
SYSTEM PUMP ELMO VALV TOT PUMP ELMO VALV TOT
BO 3 t t t 12 2 0 0 2
BIO 5 2 0 7 0 0
lit 2 0 2 t 0 0 0 0
B16 2 0 9 11 t 0 8 12
BIS 2 2 t S 0 0 0 0
FOS 7 5 35 t7 22 0 17 39
F16 2 6 2 10 2 1 0 3
TOTAL: 24 19 56 109 38 I 25 64

PLANT PWR5

COMPONENTS FAILURES
SYSTEM PUMP ELMO VALV TOT PUMP ELMO VALV TOT
BO 3 t t t 12 0 0
BIO 3 2 0 5 0 0
Bit 2 0 2 t 0 1
BI6 2 0 9 11 0 2
BIS 2 2 t 8 0 0
FOB 7 5 35 t7 13 3 6 22
FI6 2 t 6 12 2 0
TOTAL: 22 17 60 99 31 5 9 45

PLANT PWR6

COMPONENTS FAILURES
SYSTEM PUMP ELMO VALV TOT PUMP ELMO VALV TOT
BOt t 2 t 10 2 0 0 2
BIO 3 2 0 5 3 0 0 3
Bit 2 0 2 t 0 0 0 0
B16 2 0 9 11 2 0 0 2
BIB 2 .2 t S 0 0 0 0
F08 7 5 35 t7 t 1 3 S
FI6 2 5 6 13 1 1 0 2
TOTAL: 22 16 60 88 12 2 3 17

FIGURE 4  NUMBERS OF COMPONENTS AND FAILURES FOR SIX PWR'S

APPENDIX 1

SOME RESULTS FROM THE ERDS STUDY

An example of the use of a classification scheme for multiple failure


events (MFEs) (taken from Reference 4) is presented. MFEs of category A,
that is, those events wherein the failures have the same cause, same
mode and same date and relate to identical components, are discussed in
detail.
Using the proposed classification scheme a study was made of data
available for six PWRs. A breakdown of the data by plant, system and
component is given in Fig 4 and the system codes are explained in Fig
5.
Multiple failure events were sought for
- each generic component type (pumps, motors, valves) on
- each system on
- each plant.
This constituted the first-tier selection of in-plant, in-system
related failures.
The ERDS-CEDB definitions of cause and mode were employed and
"identicality" of components was taken as meaning of the same generic
type.
Simultaneous events were those having the same failure date.
The total number of multiple failure events (MFE's) of each type, A
to U, is given in the histogram of Fig 6.
Note that categories, B, E, H, K, N, Q and T are not represented.
These categories refer to non-identical components. Since,
however, the first tier selection refers to each generic type of
component separately and since it is assumed that 'identical'
components simply mean those of the same generic type, these categories
will have no entries.
Group VIII Multiple Failure Events (categories V, W and X) are too
general and have been excluded.
The total number of MFE's in each category for each component
type, each system and each plant is shown in Figs 7, 8 and 9
respectively.

Multiple Failure Events of Category A in Data for 6 PWR's

A full listing of category A MFE's is given in Appendix 2 but the


results can be summarised in Fig 10.
Category A MFE's are events in which the failures are coupled by
common cause, common mode, and common date of occurrence.
The distribution of MFE's for component type, common cause and
common mode is illustrated in Fig 11 and the description of cause and
mode codes is given in Figs 12 and 13.
In interpreting the results there are some important
observations:
1 It is not possible to compare directly the number of MFE's
relating to each type of component. This is because, as seen in

CODE SYSTEM

B03 Containment Spray System

BIO Auxiliary Feedwater System

B14 Residual Heat Removal System

B16 High Pressure Coolant Injection System

B18 Low Pressure Coolant Injection System

F08 Condensate and Feedwater System

F16 Circulating Water System

FIGURE 5 PWR SYSTEM CODES

[Histogram: number of multiple failure events of each category
(1A, 1C, 2D, 2F, 3G, 3I, 4J, 4L, 5M, 5O, 6P, 6R, 7S, 7U)]

FIGURE 6  HISTOGRAM SHOWING THE NUMBER OF MULTIPLE FAILURE EVENTS OF EACH TYPE
          FOUND IN THE ANALYSIS OF DATA FOR SIX PWR'S
[Histogram: number of multiple failure events of each category, broken down
by component type (ELMO, PUMP, VALV)]

FIGURE 7  HISTOGRAM SHOWING THE NUMBER OF MULTIPLE FAILURE EVENTS OF EACH TYPE
          FOR EACH COMPONENT TYPE IN THE ANALYSIS OF DATA FOR SIX PWR'S
[Histogram: number of multiple failure events of each category, broken down
by system (B03, B10, B14, B16, B18, F08, F16)]

FIGURE 8  HISTOGRAM SHOWING THE NUMBER OF MULTIPLE FAILURE EVENTS OF EACH
          TYPE FOR EACH SYSTEM IN THE ANALYSIS OF DATA FOR SIX PWR'S
[Histogram: number of multiple failure events of each category, broken down
by plant (PWR1 to PWR6)]

FIGURE 9  HISTOGRAM SHOWING THE NUMBER OF MULTIPLE FAILURE EVENTS OF EACH TYPE
          FOR EACH PLANT IN THE ANALYSIS OF DATA FOR SIX PWR'S
Event Category Plant System Component Cause Mode

1 IA B18 PUMP Sil


2 1A F08 PUMP Sil
1A F08 PUMP PI
3 1A F08 VALVE Ell
1A F08 VALVE Wl
4 1A 2 B18 PUMP Sil
5 1A 2 B18 PUMP Sil
6 1A 2 F08 PUMP Ell
7 1A 2 F08 PUMP Wl
8 1A 2 F08 PUMP S12
9 1A 2 B03 VALVE Ell
10 1A 2 B16 VALVE 11 *
1A 2 B16 VALVE Ell *
11 1A 2 F08 VALVE Ell
1A 2 F08 VALVE Wl
12 1A 2 F08 VALVE Wl
13 1A 3 F08 PUMP 11
14 1A 3 B14 VALVE Ell C
1A 3 B14 VALVE 11 C
15 1A 3 B14 VALVE Nl
16 1A 3 B16 VALVE Ell
17 1A 3 F08 VALVE Fl
18 1A 3 F08 VALVE Ell
19 1A 4 F16 ELMO N12
20 1A 4 BIO PUMP Ell C
21 1A 4 F08 PUMP Ell
22 1A 4 B16 VALVE Fll *
1A 4 B16 VALVE Sl *
1A 4 B16 VALVE Dl *
23 1A 4 B16 VALVE Nil
24 1A 4 F08 VALVE Wl
25 1A 4 F08 VALVE Nil C
26 1A 5 BIO PUMP Ell
1A 5 BIO PUMP 11
27 1A 5 F16 PUMP Ell
28 1A 5 Fl 6 ELMO Ell
1A 5 F16 ELMO 11
29 1A 5 B16 VALVE Ell C
30 1A 5 F08 VALVE Wl

FIGURE 10  MFE'S OF CATEGORY A

[Histogram: number of category A multiple failure events for each component
type (ELMO, PUMP, VALV), according to cause and mode of failure]

FIGURE 11  HISTOGRAM OF THE NUMBER OF CATEGORY A MULTIPLE FAILURE EVENTS FOR
           EACH COMPONENT TYPE, ACCORDING TO CAUSES AND MODES OF FAILURE,
           AS FOUND IN THE ANALYSIS OF DATA FOR SIX PWR'S
C1   Installation/Construction (in situ)
D1   Failure caused by other plant devices or by off-site influence
E1   Engineering
E11  Engineering/Design (hardware)
E12  Engineering/Design (procedures/specifications)
E13  Other causes related to engineering
F1   Maintenance, testing, measuring and faulty operation
F11  Operating error
F12  Maintenance/Testing/Setting
F13  Incorrect procedure/instruction (for operation)
F14  Incorrect procedure/instruction (for maintenance, testing,
     setting)
I1   Material incompatibility (unexpected)
M1   Manufacturing (in workshop)
N1   No causes
N11  Cause unspecified
N12  Cause unknown
P1   Pollution
S1   Abnormal service condition
S11  Abnormal environment condition
S12  Abnormal component operation (out of specifications)
W1   Expected wear, ageing, corrosion, erosion, distortion, abrasion,
     fouling

FIGURE 12  LIST OF CODED FAILURE CAUSES

Failure modes on demand

A*   Won't open
B*   Won't close
C*   Neither opens nor closes
D*   Fails to start
E*   Fails to stop
F*   Fails to reach design specifications

Failure modes in operation

A    Sudden
B    Incipient
C    Not defined

(Note that in the analysis carried out, codes for modes of failure on
demand are followed by an asterisk to distinguish them from modes of
failure in operation.)

FIGURE 13  LIST OF CODED FAILURE MODES

Fig 4, the total numbers of each component and total number of
failures for each type are very different:

MOTORS PUMPS VALVES

TOTAL FAILURES 24 232 355


TOTAL COMPONENTS 109 148 359

2 Some MFE's are repeated because of multiplicity of failure


causes. These are shown bracketed together in Fig 10.
3  In some cases the causes are given as 'expected wear, etc.'
AND 'engineering design'. This may be regarded as a slight
contradiction in terms but it is sufficient at this stage to be
aware of it.

Adjusting the totals to account for duplication of groupings:

MOTORS PUMPS VALVES

TOTAL CAT A MFE's 2 12 16

RATIO OF CAT A
MFE's TO TOTAL
FAILURES 0.083 0.052 0.045
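As a check on the tabulated ratios, using the failure totals quoted
from Fig 4, for pumps

    \frac{12}{232} \approx 0.052

and similarly 2/24 \approx 0.083 for motors and 16/355 \approx 0.045
for valves.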

Multiple catastrophic failures are generally regarded as naturally
the most important, but it is the writer's opinion that potential
multiple failures are useful indicators and should be included in the
analysis. If it is now assumed that modes A (sudden) and A* (won't
open) are catastrophic events, whilst modes B (incipient) and C
(unknown) are not catastrophic and can therefore be ignored, the
following result is obtained:

MOTORS PUMPS VALVES

TOTAL CAT A MFE'S


WITH SERIOUS FAILURE MODE 1 2 4

RATIO OF TOTAL CAT A


MFE'S WITH SERIOUS MODE
TO TOTAL FAILURES 0.046 0.008 0.011

To make further judgements it is necessary to examine the failure


details a little more closely.

1 2 pump failures (incipient).


During planned maintenance rusted bearings were found owing to
abnormal humidity (caused by leakage of vapour from APG valves).
No effect on system, other components or reactor.

2 2 pump failures (incipient).
During planned maintenance, pumps' mechanical seals found eroded
by humidity and deformed.
No effect on system, other components or reactor.
3 4 valve failures (incipient).
Found during routine surveillance:
i Continuous leakage from sealing was noted; repack and
replace stem, disc and segment.
ii Continuous leakage from packing gland. Replaced at next
outage.
iii Internal leakage. Packing gland replaced,
and iv No effect on system, other components or reactors.
4 2 pump failures (incipient).
Failures detected by operating staff.
Water present in lubricating oil owing to abnormal humidity. Each
failure caused loss of one redundancy (ie one was repaired while
the other was used and vice versa).
There was no effect on the reactor or other systems.
5 2 pump failures (incipient).
There was a complete loss of system function and a forced manual
reactor shut-down because of bearing failure. Abnormal humidity
resulted in water presence in lubrication oil.
6 2 pump failures (sudden).
Method of detection unknown and loss of one redundancy in each
case.
i Fluid leakage from welding of coupling flange with
cooling pipe.
ii 10cm crack on cooling pipe.
7 2 pump failures (incipient).
During planned or preventative maintenance, the pumps were
replaced owing to erosion of the diffuser, casing and shaft and
degradation of bearings and seals.
i Degraded system operation and caused a turbine trip.
ii No significant effect on other systems or components or
on reactor.
Some modifications introduced.
8 2 pump failures (incipient).
Water leakage from the pump cooling pipe occurred because of flow
rates being too high, thereby causing erosion.
i   Loss of one redundancy.
ii  Loss of system function and forced reactor shut-down.
9 2 valve failures (incipient).
One detected during monitoring of operating abnormalities and the
other detected during repair or corrective maintenance. Internal
leakage was greater than allowable, even after adjustment, because
of stem and disk scoring.
10 2 valve failures (failed to open on demand).
The failures were detected by calling the component into operation
during refuelling or revision.
Friction and jamming of the disk occurred owing to material
incompatibility and design error.

11 2 valve failures (incipient).
i Detected during planned maintenance. Stem and bonnet
erosion because of leaking packing gland,
ii Detected during routine surveillance. Leakage from
packing glands.
12 2 valve failures (incipient).
Both detected during planned maintenance and having no significant
effect on the system or reactor.
In one stem scoring had occurred and in the other a gasket needed
replacing.
13 2 pump failures (incipient).
i Detected during routine surveillance. Leakage from both
mechanical seals. Tungsten carbide seals replaced with
silicon carbide (Burgman).
ii Detected during repair or corrective maintenance. Free-
side seal replaced with Burgman type seal.
14 Three valve failures (mode not defined).
Detected during repair or corrective maintenance.
A modification was made to the packing gland by the manufacturer
in order to avoid leakage.
15 2 valve failures (incipient).
Detected by operating staff during reactor refuelling.
Both leakages from the body-bonnet joint. Pins were cleaned,
joint replaced and repacking carried out.
16 4 valve failures (incipient).
Detected by operating staff.
No effects on other systems or reactor.
Modifications were carried out to the packing gland in each case.
17 2 valve failures (incipient).
Detected during planned or preventive maintenance.
No effects on other systems or reactor.
Both valves were replaced owing to bad maintenance.
18 2 valve failures (incipient).
Detected during routine surveillance.
No effects on other systems or reactor.
Both valves had modifications made (braids added) and spring and
thrust-washer grinding.
19 2 motors failed (sudden).
Detected by monitoring of operating abnormalities.
No effects on other systems or reactor.
Insulation failure and earth circuit in stator windings owing to
water ingress in the stator terminals box.
20 2 pump failures (mode not defined).
Failure detected during maintenance.
No effects on other systems or reactor.
A modification was made to the installation of the thermostatic
valves on the bodies of the pumps in order to permit easy
disassembly.
21 2 pump failures (incipient).
Detected by technical deduction (degraded performance).

No effects on other systems or reactor.
Magnetic filters were added on the cooling devices of the pumps.
22 3 valve failures (failure to open on demand).
Detected by calling the system/component into operation. Failure
caused loss of system function but had no effect on other systems.
In one failure report it was stated that the reactor had to be
shut down.
These motorised valves did not open because incorrect adjustment
of the servomotor resulted in excessive loading during closing
which deformed the disk.
23 2 valve failures (sudden failure).
Method of detection unknown.
Each failure caused loss of one redundancy.
Body-bonnet welding was carried out. The cause was unspecified.
24 2 valve failures (sudden).
Method of detection not stated.
Caused loss of one redundancy.
Leaking from packing glands owing to expected wear.
25 2 valve failures (mode not defined).
Detected during planned maintenance.
No significant effects.
Checking and regulation of valves.
26 2 valve failures (incipient).
Detected by monitoring of operating abnormalities.
No effects on other systems or reactor.
Modification in design of thrust bearings (new material) and
packing glands replaced. Modifications also to lube-oil loop
design.
27 2 pump failures (sudden).
Automatic shut-down by protection systems.
Loss of system function.
Loss of pressure in the lube-oil exchanger circuit (water side)
causing automatic shut-down of both circulating pumps.
The cause was a design error but no further details were given.
28 2 motor failures (incipient).
Method of detection not given.
Effects on other systems or reactor: none.
Bearings were replaced on both motors owing to bad quality of
lubricating grease. Excessive vibration was noted on shut-down so
improvements were made in the insulation of motor supports.
29 2 valve failures (mode not defined).
Method of detection not given.
Failures caused loss of redundancy.
The packing glands on the valves were modified.
30 2 valve failures (incipient).
Detected during planned maintenance.
No effects on other systems or reactor.
Eroded valve seals replaced.
There are several crucial observations which can be made from
these events

a  In twenty cases out of thirty no effects on the system, other
systems, or the reactor were observed. In one case (event 10)
both failures were failures to open on demand; the other failure modes
were incipient or unknown.
b Of those events which had some significant effects there
were
7 cases of loss of one redundancy
4 cases of loss of system function
1 case of degraded system operation ) same failure
1 turbine trip )
2 reactor shut-downs.
c There were some apparent inconsistencies in this information.
For example in event 7, one failure was said to have no effects
whereas the other indicated system degradation and turbine trip.
Both pumps were replaced. It can only be assumed that the
occurrence of one failure which had significant effects caused
further investigation of the other pump and both were repaired at
a convenient point in planned maintenance. The causes of failure
were "expected wear" but this is slightly suspicious because of
the extreme damage to both pumps. Similarly, event 8 includes one
failure causing loss of one redundancy whilst the other caused
loss of system function and forced reactor shut-down. Again,
causes of failure were identical. If the pumps were operating in
parallel, as is suspected but cannot be confirmed because of lack
of supporting system design information, it would require both to
fail before system shut-down was necessary. To add to the
confusion, both failures were 'incipient' and were, therefore, in
a certain sense 'under control'.
d  Apart from loss of system function (LOSF) in event 8, this
also occurred in events 5, 22 and 27. In event 5 the failures
were 'incipient'. It can only be surmised that in order to
prevent a catastrophic failure it was decided to shut down and
repair.
In event 22 the two valves failed to open on demand. This is much
more like the common-cause failure that is immediately
recognisable. Similarly in event 27 the two pumps failed suddenly
because of a design error.
e The only two events which were noted as 'common cause'
failures in the event description were events 27 and 28.
f Event 28 involved two incipient motor failures which had no
effect on the system but the failures had a common cause.
g In each case the failures were not of simply generically
similar components but truly identical components.
h In 24, for example, the common cause of failure was "expected
wear". This suggests that the failures were random, and this is
confirmed by the fact that the failures were dealt within both
cases during planned maintenance. The need for packing gland
replacement Is normal at some point in a valve's life.
i  In events 19 and 15 the coded causes of failure were unknown
or unspecified, whilst the failures in these events are very
closely coupled. This is a caveat for the analyst. If only
interesting causes like 'engineering design' are searched for then
some interesting events might be lost.
j In event 6 it would appear that examination and repair of one
failure caused maintenance engineers to check similar systems thus
revealing the second failure. The failures did not, however, have a
truly common cause.
These results to a large extent vindicate the method being adopted
because:
1 Neither cause nor mode of failure is sufficient to identify
related failures. Analysed together these attributes can draw out
closer coupling.
2 An analysis which simply considered events which had serious
effects would not adequately identify many related events.
3 Maintenance intervention appears to successfully prevent
sudden multiple failures. If only events with sudden or
catastrophic failure modes were considered then much useful
information would be lost.
4 If ratios of multiple failure events to total numbers of
failure events are calculated only for those MFE's which caused
loss of system function, the writer believes that this would give
a ratio which is far too conservative.
Events which may have become catastrophic common cause failures
are events 1, 2, 3, 4, 9, 13, 14, 15, 16, 17, 18, 21, 25, 28.
Event 5    Loss of system function (decision to shut down to
           prevent catastrophic failure) (class as actual CCF).
Event 6 Independent failures coupled by maintenance intervention
only.
Events 7 Positive intervention as in 5 to prevent catastrophic
and 8 failure (class as actual CCF's).
Event 10 Actual CCF.
Events 11 Independent events coupled by maintenance intervention
and 12 only.
Event 19 Actual CCF.
Event 20 Modification to permit easy disassembly (not an actual
failure event, only functional unavailability).
Event 22 Actual CCF.
Event 23 Actual CCF.
Event 24 Independent events coupled by maintenance only.
Event 26 Modifications during maintenance.
Event 27 Actual CCF.
Event 29 Modifications to valves.
Event 30 Replacement of parts during planned maintenance.
Where modifications are carried out it is difficult to know how
essential they were to preventing dependent failures. These are not,
therefore, considered potential dependent failures. In essence, these
component outages were functional unavailabilities, not failures.
Recalculating the ratios of MFE's to total failures for each
component type:

                                   MOTORS   PUMPS   VALVES

Total number of failures (N)         24      232     355
Actual CCF's (n_a)                    1        4       3
Potential CCF's (n_p)                 1        5       8
n_a/N                               0.042    0.017   0.008
(n_a + n_p)/N                       0.083    0.039   0.031
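As a check of the tabulated values, for valves

    \frac{n_a}{N} = \frac{3}{355} \approx 0.008, \qquad
    \frac{n_a + n_p}{N} = \frac{3 + 8}{355} \approx 0.031.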
Screening of the events is clearly essential and some engineering
judgement is required to do this.
Of course, category A MFE's are only part of the story and the
observant reader will have noticed that in a previous analysis of PWR's
1 and 2, discussed in Chapter 5, there was one event whereby four
failures occurred, two pumps on identical systems in the two plants on
the same day and for the same cause. One pair of these failures was
not picked up in category A. Instead this appear in category M,
discussed below. This was because the coded failure causes were
different. This emphasises the need to consider attribute combinations
other than the type in category A.

APPENDIX 2

CATEGORY A MULTIPLE RELATED FAILURES

RESULTS OF SEARCH NO 1

MULTIPLE FAILURE EVENTS OF CATEGORY A

SYSTEM  COMPONENT  CAUSE  MODE  DESCRIPTORS  DATE  CAT


PLANT PWR1

B18 P06 S U Ell Wl 03 18 26 33 070579


P07 Sil 03 34 39 070579

F08 P13 Sil PI 07 18 19 21 290580


P14 S U PI 07 18 19 21 290580

F08 P13 PI S U 07 18 19 21 290580


PU PI su 07 18 19 21 290580

F08 V04 EU Wl 26 33 300580


V05 EU Wl 26 33 300580
V06 EU Wl 26 33 300580
V03 EU Wl 25 33 300580

F08 V04 Wl EU 26 33 300580


V05 Wl EU 26 33 300580
V06 Wl EU 26 33 300580
V03 Wl EU 26 33 300580

PLANT PWR2

B18 P20 SU 03 34 39 110479


P21 S U Dl 03 18 25 34 39 110479

B18 P20 SU 03 34 39 120281


P21 S U Dl 03 34 39 120281

F08 P25 E U Dl 26 33 55 040179


P26 EU 26 28 040179

F08 P27 Wl Cl 18 19 290679


P29 Wl Ml 07 18 290679

F08 P25 S12 E12 18 26 33 010479


P26 S12 E U 18 26 33 010479

B03 V08 EU 18 19 25 190880


V09 EU 18 19 25 190880

B16 V18 H EU * 10 25 19 130879

V19 Il Ell A* 10 25 19 130879

B16 V18 Ell II A* 10 25 19 130879 1/


V19 Ell II A* 10 25 19 130879

F08 VIO Eli Wl 18 26 33 150980
Vil Eli Wl 18 26 33 150980

F08 VIO Wl Eli 18 26 33 150980


Vil Wl Eli 18 26 33 150980

F08 VIO Wl 18 26 070879


Vil Wl 25 33 070879

PLANT PWR3

F08 P45 Il Eli 26 33 100779


P46 11 07 33 100779

B14 V31 Eli 11 C 26 33 010780


V32 Eli 11 C 26 33 010780
V34 Eli 11 C 26 33 010780

B14 V31 Il Eli C 26 33 010780


V32 Il Eli C 26 33 010780
V34 Il Eli C 26 33 010780

B14 V28 - _ 25 33 100680


V29 NI 25 33 100680

B16 V35 Eli 26 33 090280


V36 Eli 26 33 090280
V37 Eli 26 33 090280
V38 Eli 26 33 090280

F08 V48 Fl 26 33 100780


V50 Fl 26 33 100780

F08 V45 Ell 26 33 200779


V46 Ell 26 33 200779

F16 Mil N12 09 22 27 34 43 301279


MI 2 N12 09 22 27 34 43 301279

PLANT PWR4

BIO P51 Ell C 30 26 020780


P52 Ell C 26 020780

F08 P59 Ell 17 19 041080


P60 Ell 17 46 19 041080

B16 V54 Fil Sl Dl * 04 21 13 190379


V55 Fll Sl Dl * 21 43 25 190379

B16 V54 Sl Dl Fll * 04 21 13 190379

V55 SI Dl Fil A* 21 43 25 190379

B16 V54 Dl SI Fil A* 04 21 13 190379 U


V55 Dl SI Fil A* 21 43 25 190379

B16 V56 Nil A 25 33 55 220980
V57 Nil A 25 33 55 220980

F08 V60 Wl A 26 33 140480


V61 Wl A 26 33 140480

F08 V59 Nil C 19 071080


V60 Nil C 19 071080
V61 Nil C 19 25 071080

PLANT PWR5

BIO P64 Ell 11 18 25 39 040480


P65 Ell 11 18 19 25 040880

BIO P64 11 Ell 18 25 39 040880


P65 11 Ell 18 19 25 040880

F16 P75 Ell 05 25 39 110180


P76 Ell 05 25 39 110180

F16 Ml 6 Ell 11 07 39 42 260880


Ml 7 Ell 11 07 39 42 260880

F16 M16 11 Ell 07 39 42 260880


Ml 7 11 Ell 07 39 42 260880

B16 V67 Ell C 26 33 181080


V68 Ell C 19 25 181080

F08 V72 Wl S12 18 25 020680


V73 Wl 18 19 020680

PLANT PWR 6

Subject Index

Accident management 109-110


Additive dependence model (ADDEP) 13,16,245,249-251,254
Administrative routines 72,84
Aggregation/disaggregation methods 145,151-156
Aircraft crash (defences against-) 92-93
Airlines CCF - case histories 47,49,68-70
Alpha factor 166,168,170-171,176,192,227,247
249,250,255
Assembly errors (events) 265-266,269,285
Attributes (for CCF identification) 122,124,136,137,140,314
Automation 102,108
Auxiliary Feedwater Systems 2,43-44,103-105,109,152-154,
212,221-234,260-267

Basic parameter model 166,176,192,227,245,250


Bayesian methods 5,191,233,248,255
Benchmark Exercise
- on Systems Reliability (JRC) 2,3
- on CCF Analysis (JRC) 3,27,47,53-56,103,128,169,
176-178,205,209,221-234
- on CCF Data Classification (EPRI) 3,35,38,177
- on CCF Data (Nordic Project) 3,9,10-16,27,235-255,277-279
Beta factor - model 2,4,21,22,117,122,127,166-168,
176,192,226,227
- values 24,197-200
Binomial failure rate
- model 2,13,16,122,127,166,168,
171-172,176,192,227,230,245,
247-248,250,253-255,262-263
- parameter values 267
Breakers 261-267
Building criteria 78,89,92,105

Cascade failure 39,42,48,131


Check list 224-225
CCF/CMF
- analysis procedures in PSA 2-5,9-10,42-45,71,96-99,
113-143,159-174,231-234,277
- classification, definition,
terminology 2,3,9,10,17,19,31-46,48,
49-53,113-117,176-178,197,
205-209,275-276,304-307,315-316
- causes (see also attributes) 33-39,40-41,56-61,225
- data bases 16,175,182
- data treatment 10,14,21,31,37-41,235-242

- data sources 23,49,120,175,177,237
- event data 65-70,264-266,295-297,317-341
- impact in PSA 9-29
- initiators 17,27,96,277,286
C-factor model/values 21,22,24,167
Command faults 132
Common load 2,117,149
Component event data
- general, analysis 4,5,16,37,259,303-341
- collections: CEDB 290,319-341
NPRDS 259,261,262
SRDF 259,261,262
Swedish 11,279
Computer codes: ALMONA 122
PREP-KITT 122
SAMPLE 122
Sstagen-Mmarela 145-147
Coupling mechanism 56,60,61
Cut-off method 52,117,122,127,227

Defences against CCFs 1-4,31,35,47-111,119,126


Dependency structures 1,3-4,42-44, see also CCF
Dependent failures see CCF
Design review 72,83-84
Design error events 111,265-266,269-270
Diesel generators 2,22,26,71,97-99,270-272,
281-286
Direct assessment method 9,10,13,16,166,251-255
Dirichlet distribution 251-252,255
Diversity 10,32,52-53,71-72,102-103,108

Electric power system 76


Emergency cooling system 74,101
Emergency feedwater see auxiliary feedwater
Environment caused failure events 265-266
Event tree 4,42
Explicit modelling 4,42,124,139-140
External events 42,71,84-96,106

Fails-safe criterion 53,102,108


Failure-rate coupling 208,227
Fault tree 3-4,19-20,42,122,124-125,128,
131-143,160-165
Flooding 95
Fire 86-92
FMEA 124,128,131,225-226
Functional dependency/unavailability 39,42,131,305-306

Human error events 265-266,269,284-285


Human interaction 1,17,19,28,43

Identification procedures for CCFs 16,49,113-129,210,237-241
Impact vector 5,179-185,193-196,210-213,
218,228
Implicit modelling see parametric models
Incident report data 4,5,37,259,261
- collections: AORS 289-302
ERDS 290,319,321
LERs 11,38,197,209,228,230,309
NPERs 309
SPT 261
Inspection
- case study 149-151
- maintenance, overhaul procedures 1,13,14,42,108-111,149-151
Instrumentation systems 108,260-267
Intercomponent/
intersystem dependencies 17,96

Judgement (subjective,
engineering, etc.) 4,5,14,21,175,205,206,227,241

Maintenance - policy see inspection


- error events 265,269
Manufacturing error events 265,269,272,285
Mapping down/up 5,13,183-187,210-218,227-228
Markov method 42,145-158
Missile impact 94-95
Monte Carlo method 42,136
Motors 260,320-341
Multiple Greek Letter (MGL) model 13,16,21-22,122,127,166-170,
176,192,230,245-247,250,253-254
- parameter values 232
Multinomial failure rate model 13,16,245,248-250,253-255
Multiple related failures 31-46 (see also CCF)

On-site investigation 257,268-272


Operational experience 5,10,21,176,209-210,257-341

Parameter estimation 2,4,5,173-220,227-228


Parametric models 4,9,10,45,117-118,124,159-203
Partial beta factor model 55-56,117,127,227
Physical correlations/interactions 17,27,43-44,208
Potential CCF 39,306
Primary failures 132
Pumps 2,22,26,212,260-267,320-341
PSA see CCF analysis procedures

Qualitative analysis 3,4,16-21,37,113-129,223 (see


also CCF analysis procedures
and identification procedure)

Quantitative analysis 9,21-22,227-228,243-256 (see
also CCF analysis procedures
and parametric models)

Ranking CCF events 4,13 (see also sensitivity


analysis)
Rare event 2,32
Reactor protection/shutdown systems 22,26,65-67,77,79-83,85,102,
106,108,261
Redundancy 4,5,10,101,108,210-218 (see
also mapping down/up)
Reliability data book (Swedish) 16,236,247
Residual CCFs 19,44-45,131
Residual heat removal system 101
Root cause see CCF causes

Safety injection systems 109


Screening CCF event data/causes 4,13,176,188,232-233,235-242
Search procedures in data bases 293-300,310-317
Secondary failures 132
Seismic hazard 90-92
Sensitivity analysis 3,9,10,22,27,142-143,243
Separation 52-53,71-75,86-90,102,108
Shared equipment 17,39
Shock (models type) 2,117,126,168,179,180,185-187,
217,228
Staggered testing 190
Standardization 72
State-of-knowledge 208,255
Statistical correlations 42,208
Supercomponent technique 145,151-154

Technical specifications 42,132,137


Testing 72,84,103
Trigger-coupling mechanism model 56,122
Turbines 26,260

Uncertainty 4-5,9-10,15,19,22,26,49,
191-197,229-233,241,243,253-255

Valves (MOV and other) 11-16,22,26,27,213,235-242,


243-255,260-267,279-282,320-341

Advanced Seminar on
Common Cause Failure Analysis in
Probabilistic Safety Assessment
Proceedings of the ISPRA-Course held at
the Joint Research Centre, Ispra, 16-19 November 1987

Edited by

ANIELLO AMENDOLA
Commission of the European Communities, Joint Research Centre,
Ispra Establishment, Ispra, Italy

The analysis of dependent events is one of the most critical steps in safety and
reliability assessment of complex systems. However, a wide variety of dependency
structures may exist among the failure modes of equipment and components, thus
requiring a range of different analytical approaches. Several parametric models
have been proposed for the implicit treatment of so-called common cause failures;
the availability of adequate data is still a matter of debate.
To clarify such problems, and to assess the state-of-the-art models, data, and
procedures, at least in regard to nuclear power plants, a series of benchmark studies
have been undertaken, involving the foremost experts available in the USA and
Europe. The participants in these benchmark studies here exchange the fruits of
their experiences, thereby providing a comprehensive review of the cutting edge of
work on: the classification of dependent failures; the identification of causes for
dependencies; implicit and explicit modelling procedures; defence criteria against
multiple failures; and data analysis and operating experience.
Although the studies have been undertaken against a background of nuclear
systems, the lessons learned may readily be generalized to other fields, thus
ensuring that the book will be beneficial to the whole community of reliability
analysts.

Kluwer Academic Publishers


Dordrecht / Boston / London ISBN 0-7923-0268-0
