
THIRD EDITION

ROOT CAUSE
ANALYSIS
Improving Performance
for Bottom-Line Results
Related Titles

Engineering Maintenance: A Modern Approach, B.S. Dhillon
1587161427

Performance Improvement: Making It Happen, D. Enos
1574442821

Robert J. Latino
Reliability Center, Inc.
Hopewell, Virginia

Kenneth C. Latino
Practical Reliability Group
Daleville, Virginia

Taylor & Francis
Boca Raton London New York

CRC is an imprint of the Taylor & Francis Group


In Loving Memory of
Joseph Raymond Latino, William T. Burns,
William Worsham, and Rod Oliva

We also dedicate this text to all of those in Louisiana, Mississippi,
Alabama, and Florida who perished, and those who lost everything
due to Hurricane Katrina on August 29, 2005.
Hopefully this text will shed light on the reasons that allowed
the consequences to be worse than they should have been!

Published in 2006 by
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2006 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-10: 0-8493-5340-8 (Hardcover)
International Standard Book Number-13: 978-0-8493-5340-6 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reprinted material is
quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts
have been made to publish reliable data and information, but the author and the publisher cannot assume
responsibility for the validity of all materials or for the consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic,
mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and
recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive,
Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration
for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate
system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only
for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Catalog record is available from the Library of Congress

informa
Taylor & Francis Group
is the Academic Division of Informa plc.

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com

and the CRC Press Web site at
http://www.crcpress.com

Foreword
In my work as a print and online magazine publisher, I have the good fortune to
communicate with maintenance and reliability professionals from around the world.
These special groups of people are the ones who keep the lights on, the water flowing,
and basic goods and services on the way to the rest of us when they do their jobs
well. When trouble strikes and failure occurs, the pressure mounts. Company
management and customers demand instant solutions. Many times it is the heroic actions
of this group that bring things back online.

Unfortunately, hero-based maintenance, noble as it is, is not the most effective
business strategy and points to a reactive corporate culture. As companies strive for
lean operations, a proactive approach is required to remain competitive. Being lean
and effective requires that waste is minimized. Most would agree that unanticipated
failures are very wasteful. In the rush to get production or other systems back online,
decisions are often made that set up future problems, compounding issues that could
have been avoided in the first place.

When I first ventured out of my last manufacturing management job to become
a writer and publisher, I had an opportunity to meet one of the pioneers of today's
reliability strategies, Mr. Charles Latino, founder of the Reliability Center in
Richmond, Virginia and father of the coauthors of this book. I am not sure if it
was far-sightedness on my part or sheer luck, but I had managed to request an
interview with the senior Latino to write an article on the past 50 years of reliability.
I had already developed my ideas and strategies for Reliability, having spent several
years in a maintenance-focused environment, so I was not prepared for the Reliability
paradigms Mr. Latino exposed me to.

What followed were ideas and concepts designed to ensure the reliability of
almost any system. These concepts could be used to solve complex chains of events
that lead to the problems we all want to avoid. Repetitive problems can be addressed
and eliminated with thorough and calm methodologies. In my mind, maintenance
started to shift from "fixing things" when they fail, to adopting strategies and
techniques that eliminate or greatly reduce the possibility of failure to begin with.

I also learned that people had an equal or greater effect on problems than
electrical and mechanical subsystems do. The problem solving methods discussed
in this book cover much more than how to determine the obvious cause of the
problem. This book discusses how to set up a blame-free methodology that
encourages deeper problem discovery that results in permanent solutions. It stresses that
Root Cause Analysis requires a dedicated team coupled with management support
and explains ways to communicate the impact so a business case can be made. Best
of all it teaches how to create an entire company of proactive problem solvers that
will move your operation to the next level of profitability and effectiveness.

The Latino family has been a contributor to the Reliability community for over
50 years and the next generation has picked up the Reliability baton and is making
their own unique impact. I count myself very fortunate to have them as personal
friends and professional associates and I am confident that this book will have a
profound effect on your problem solving ability.

Terrence O'Hanlon, CMRP
Publisher
Reliabilityweb.com

Preface

What is Root Cause Analysis (RCA)? It seems like such an easy question to answer,
yet from novices to veterans and practitioners to providers, we cannot seem to agree
(nor come to consensus) on an acceptable definition for the industry. Why?
We will discuss our beliefs as to why it is so hard to get such consensus and why
various providers are reluctant for that to happen.
Many who will read this text are seeking to learn the basics about what is
involved with conducting an RCA. Many veterans will peruse this text seeking to
see if they can find any pearls of conventional wisdom that they do not already know
or to dispute and debate our philosophies. This creates a very broad spectrum of
expectation that we will try to accommodate. However, in the end, success shall be
defined by the demonstration of quantifiable results and not on adherence to the
approach of favor.

We have tried to write this text in a conversational style because we believe that
is a format that most "rooticians" can relate to. Basically we are writing like we are
teaching a workshop.

Readers will find that much of our experience comes not only from the practicing
of RCA in the field, but more from our experiences with the over 10,000 analysts
that we have taught and mentored over the years. Additionally, we participate in
many on-line discussion forums where we interact with beginners, veterans and most
providers for the betterment of the RCA field. We will list these sources in this text
in the hopes that our readers will join and also participate in progressing our common
field of study.

So as you can see, we try to bring many diverse perspectives to the table, while
making the pursuit of RCA a practical one, not a complex one. We certainly want
to avoid falling into the "paralysis by analysis" trap when looking at something like
RCA - that would be hypocritical, would it not?

We will bring to light the perspectives of the pragmatic "rooticians" to the
"purist" so that readers can make their own judgment as to what is best for their
applications. We will present debates on definitions of words commonly used in the
RCA lexicon but ultimately come to the conclusion that there are no generally
accepted definitions in the field so we must fend for ourselves (which is part of the
problem with communication).

There are many RCA methodologies on the market, so we will discuss them in
generalities so as not to put the microscope on any individual or proprietary approach.
In this manner we can discuss the pros and cons of each type of approach and readers
can decide the level of breadth and depth that they require in their analysis.

We will discuss the scope of RCA: where does it begin and where does it end?
How does a true RCA effort integrate with the organizational structure and remain
a viable and valuable resource to the organization? Where there is RCA, there is
turf politics. So we will discuss how this activity called RCA fits with existing
initiatives like Total Quality Management (TQM), Reliability Engineering (RE),
Reliability Centered Maintenance (RCM) and Six Sigma.

Our intent with this edition of this text is to expand the various perspectives
brought to light on the topic of RCA and to present a current "state of the RCA
field" so that readers can make their own sound judgments as to how they wish to
design and define RCA for their own organizations.

Will everyone who reads this text agree with its content? No. Can they benefit
regardless? Yes. We hope to spark debate within the minds of our readers where
they contrast the differences between how we approach RCA and how they are
currently conducting them at their facilities.

Perhaps we will sway some to agree with certain premises in this text and others
will improve upon their current approaches with ideas presented. Either way, the
journey of the learning is what is most important. Analysts will collect the necessary
data, sift out the facts and make their own determination as to what they believe is
best for them.

Robert J. Latino
Kenneth C. Latino

Acknowledgments

This book would never have been possible had it not been for our father, Charles J.
Latino, who had the courage to fight for Reliability Engineering early in his career
in Allied Chemical in 1951 when no one would listen. He stood his ground until he
proved his concepts to be of great value within the corporation. He established and
directed Allied Chemical's corporate Reliability Engineering Department in 1972.
Charles retired from Allied in 1985 and purchased the Center from the corporation.

Charles had the further courage to start his own Reliability Consulting firm after
retirement so that he would have a business to leave to his children. Having worked
for Reliability Center Incorporated (RCI) for 21 years ourselves, side-by-side with
our father, we could not help but become experts in the field, if anything, through
osmosis. Charles has embedded tough standards and ethics into the way we conduct
business and for that we are eternally thankful.

We were proud to see Charles Latino accept the first Maintenance and Reliability
Technology Summit (MARTS) Award for his lifetime contribution and achievement
to the field of Reliability Engineering (May 25, 2005, Chicago, IL).

We would also like to thank the following influencers who have not only
helped shape the current Reliability and RCA fields but also helped shape and
balance our perspectives:

Mr. Terry O'Hanlon      Mr. C. Robert Nelms
Dr. Bill Corcoran       Mr. Terry Herrmann
Mr. Bill Salot          Ms. Kim Williams
Vee Narayan             Ms. Michael Mulligan
Mr. Doug Emberley       Ms. Paul Preuss
Mr. Keith Mobley        Mr. Terry Wireman

About the Authors


Robert J. Latino
Executive Vice President
Reliability Center, Inc.
Hopewell, Virginia

Robert J. Latino is Executive Vice President of Strategic Development for Reliability
Center, Inc. (RCI). RCI is a Reliability Consulting firm specializing in improving
Equipment, Process, and Human Reliability. Latino received his bachelor's degree in
business administration and management from Virginia Commonwealth University.

He has been facilitating RCA & FMEA analyses with his clientele around the
world for over 20 years and has taught more than 10,000 students in the PROACT®
Methodology. Latino is coauthor of numerous seminars and workshops on FMEA and
RCA as well as co-designer of the award winning PROACT Suite Software Package.

Mr. Latino is a contributing author of Error Reduction in Healthcare: A Systems
Approach to Improving Patient Safety (1999, c. 284, ISBN: 1-55648-271-X, AHA
Press) and The Handbook of Patient Safety Compliance: A Practical Guide for
Health Care Organizations (2005, c. 350 pp., ISBN 0-7879-6510-3, Jossey-Bass).
He has also published a paper entitled "Optimizing FMEA and RCA Efforts in
Healthcare" in the ASHRM Journal (ASHRM Journal, 2004, Volume 24, No. 3,
pages 21-28). Latino presented a paper entitled "Root Cause Analysis Versus Shallow
Cause Analysis: What's the Difference?" at the ASHRM 2005 National Conference
in San Antonio, TX. He has been published in numerous trade magazines on the topic
of Reliability, FMEA, and RCA as well as a frequent speaker on the topic at domestic
and international trade conferences.

Latino has also applied the PROACT methodology to the field of Terrorism and
CounterTerrorism via a published paper entitled "The Application of PROACT RCA
to Terrorism/Counter Terrorism Related Events" (Muresan, Gheorghe, The
Application of PROACT RCA to Terrorism/Counter Terrorism Related Events, in Proc.
IEEE International Conference on Intelligence and Security Informatics, Kantor, P.,
Roberts, F., Wang, F., Merkle, R., Zeng, D., and Hsinchun, C., Springer, Atlanta,
2005, 579-589).

Kenneth C. Latino
President
Practical Reliability Group, LLC
Troutville, Virginia

Kenneth C. Latino has a bachelor of science degree in computerized information
systems from Virginia Commonwealth University. He began his career developing
and maintaining maintenance software applications in the continuous process
industries. After working with clients to help them become more proactive in their
maintenance activities, he began consulting and teaching industrial plants how to
implement Reliability methodologies and techniques to help improve the overall
performance of plant assets.

Over the past few years, a majority of Latino's focus has centered around
developing Reliability approaches with a heavy emphasis on Root Cause Analysis
(RCA). He has trained thousands of engineers and technical representatives on how
to implement a successful RCA strategy at their respective facilities. He coauthored
two RCA training seminars for engineers and hourly personnel respectively.

Latino is also co-software designer of the RCA program entitled The PROACT
Suite. PROACT was a National Gold Medal Award winner in Plant Engineering's
1998 and 2000 Product of the Year competition for its first two versions on the market.
He is currently President of the Practical Reliability Group, a Reliability consulting
firm dedicated to delivering approaches and solutions that can be practically applied
in any asset intensive industry.

Contents

Chapter 1  Introduction to the PROACT Root Cause Analysis (RCA) Work Process
    Mean Time between Failure (MTBF)
    Number of Events
    Maintenance Cost
    Availability
    Reliability
    Balanced Scorecard
    The RCA Work Process

Chapter 2  Introduction to the Field of Root Cause Analysis
    What Is Root Cause Analysis (RCA)?
    Why Do Undesirable Outcomes Occur? The Big Picture
    Are All RCA Methodologies Created Equal?
    Attempting to Understand RCA - Is This Good for the Industry?
    What Is Not Root Cause Analysis?
    How to Compare Different RCA Methodologies When Comparing Them
    What Are the Primary Differences between Six Sigma and RCA?
    Obstacles to Learning from Things That Go Wrong

Chapter 3  Creating the Environment for RCA to Succeed: The Reliability
           Performance Process (TRPP)
    The Role of Executive Management in RCA
    The Role of a RCA Champion (Sponsor)
    The Role of the RCA Driver
    Setting Financial Expectations: The Reality of the Return
    Institutionalizing Root Cause Analysis (RCA) in the System
    Reliability Center, Inc.
    Appendix 1: Sample RCA Procedure

Chapter 4  Failure Classification
    RCA As An Approach

Chapter 5  Opportunity Analysis: "The Manual Approach"
    Step 1: Perform Preparatory Work
        Define the System to Analyze
        Define Undesirable Event
        Draw Block Diagram (Use the Contact Principle)
        Describe the Function of Each Block
        Calculate the "Gap"
        Develop Preliminary Interview Sheets and Schedule
    Step 2: Collect the Data
    Step 3: Summarize and Encode Data
    Step 4: Calculate Loss
    Step 5: Determine the "Significant Few"
    Step 6: Validate Results
    Step 7: Issue a Report

Chapter 6  Asset Performance Management Systems (APMS):
           Automating the Opportunity Analysis Process
    Determining Our Event Data Needs
    Establish a Workflow to Collect the Data
    Employ a Comprehensive Data Collection System
    Analyze the Digital Data

Chapter 7  The PROACT® RCA Methodology
    Preserving Event Data
    The Error-Change Phenomenon
    The 5-Ps Concept
    Parts
        Continuous Process Industries (Oil, Steel, Aluminum, Paper, Chemicals, etc.)
        Discrete Product Industries (Automobiles, Package Delivery, Bottling Lines, etc.)
        Healthcare (Hospitals, Nursing Homes, Outpatient Care Centers,
        Long-Term Care, etc.)
    Position
    People
    Paper
    Paradigms

Chapter 8  Ordering the Analysis Team
    Novices versus Veterans
    The RCA Team

Chapter 9  Analyzing the Data: Introducing the PROACT Logic Tree
    An Academic Example
    Verification Techniques
    Confidence Factors
    The Troubleshooting Flow Diagram

Chapter 10  Communicating Findings and Recommendations
    The Recommendation Acceptance Criteria
    Developing the Recommendations
    Developing the Report
    The Final Presentation

Chapter 11  Tracking for Bottom-Line Results
    Getting Proactive Work Orders Accomplished in a Reactive Environment
    Sliding the Proactive Work Scale
    Developing Tracking Metrics
    Exploiting Successes
    Creating a Critical Mass
    Recognizing the Life Cycle Effects of RCA on the Organization
    Conclusion

Chapter 12  Automating Root Cause Analysis:
            The Utilization of The PROACT Enterprise Version 3.0+
    Customizing PROACT for Our Facility
    Setting Up a New Analysis in the New PROACT FMEA and OA Module
    Setting Up a New Analysis in the New PROACT RCA Module
    Automating the Preservation of Event Data
    Automating the Analysis Team Structure
    Automating the Root Cause Analysis - Logic Tree Development
    Automating the RCA Report Writing
    Automating Tracking Metrics

Chapter 13  Case Histories
    Case History #1: ISPAT Inland, Inc., East Chicago, IN
        Line Item from Modified FMEA: Identified Root Causes
        Implemented Corrective Actions
        Effect on Bottom Line
        RCA Team Statistics
        RCA Team Acknowledgments
        Core RCA Team Members
    Case History #2: Eastman Chemical Company, Kingsport, TN
        Line Item from Modified FMEA
        Identified Root Causes
        Implemented Corrective Actions
        Effect on Bottom Line
        RCA Team Statistics
        RCA Team Acknowledgments
        Core RCA Team Members
    Case History #3: LYONDELL-CITGO Refining, Houston, TX
        Line Item from Modified FMEA
        Identified Root Causes
        Implemented Corrective Actions
        Effect on Bottom Line
        RCA Team Statistics
        RCA Team Acknowledgments
        Core RCA Team Members
    Case History #4: Eastman Chemical Company World Headquarters,
    Kingsport, TN
        Line Item from Modified FMEA
        Specific RCA Description
        Identified Root Causes
        Implemented Corrective Actions
        Effect on Company Bottom Line
        RCA Acknowledgments
    Case History #5: Southern Companies Alabama Power Company,
    Parrish, AL
        Line Item from Modified FMEA
        Specific RCA Description
        Identified Root Causes
        Implemented Corrective Actions
        Effect on Company Bottom Line
        RCA Team Statistics
        RCA Acknowledgments
        Core RCA Team Members
    Case History #6: Weyerhaeuser Company, Valliant, OK
        Line Item from Modified FMEA Identified Root Causes
        Implemented Corrective Actions
        Effect on Bottom-Line Tracking Metrics
        Bottom-Line Results
        Corrective Action Time Frames
        RCA Team Statistics
        RCA Team Acknowledgments
        Core RCA Team Members

Index

1  Introduction to the PROACT Root Cause Analysis (RCA) Work Process

Effective Root Cause Analysis (RCA) can arguably be one of the most valuable
tools to any organization. This is especially true for large asset-intensive companies.
There are many issues that arise, and if there is not a plan in place to deal with these
issues, then the facility can become very reactive.

The challenge with effective RCA is when to apply the resources to identify the
root causes of a problem. There are simply too many issues that arise to effectively
solve every one. Therefore, a more intelligent approach must be taken to select the
right issues to resolve.

Let's take a simple example. Let's assume that we have two centrifugal pumps.
One of the pumps is a charge pump that is critical to the operation of the unit it
serves. The other is a water pump that is spared and is not deemed a critical service.
Which problem do we analyze if we are experiencing problems with both of these
pumps and there are limited resources to address the root causes? The critical charge
pump, of course.

We often see organizations struggle with which failures to analyze using RCA.
Very often, analysis work is limited to regulatory issues like safety and environmental
events. Many times, equipment or process related issues are simply corrected and
the process is started back up without knowing the cause. Without identifying and
addressing the various root causes, the problem is likely to recur. It seems that
without some sort of outside pressure to perform an analysis it simply does not
happen. Therefore, a strategy should be employed to direct personnel on how and
when to do RCA.

As we stated, there are many issues that occur on a daily basis at a large
asset-intensive facility. When these issues occur they are deemed very important and must
be addressed. We need some way to separate the emotion of the "failure of the day"
to what is truly important to the success of the facility. Therefore, we need to
determine what the perspectives, objectives and measures are for the organization.
For example, perhaps your plant has a mandate to improve profitability without the
expenditure of additional capital. How would you go about doing that? You need a
strategy to determine what you are going to do and how you are going to measure it.

We work in a lot of facilities that are measuring many things related to operation.
Many organizations develop the metrics that they feel are important to measure as
they progress into a maintenance and reliability initiative. We often hear about
Mean-Time-Between-Failure (MTBF), Mean-Time-to-Restore (MTTR) and many
others. Measuring performance for the sake of measuring is not especially useful
unless the measurements are directly related to the performance of the organization
and action is taken to make the needed improvements when the measures are going
in a negative direction.

Therefore, we must first think about what goals or objectives we are trying to
accomplish before we can determine what measures we need to monitor. An effective
methodology for determining your company's objectives is to develop a strategy map.
A strategy map takes all of the objectives of the company and puts them into various
perspectives. The perspectives can vary from company to company but for the area
of asset management there are four main perspectives:

1. Corporate
2. Assets
3. Work Practices
4. Knowledge and Experience

Within each of the four perspectives, a number of individual objectives are defined.
For instance, within the Corporate perspective we look at objectives that directly
relate to goals defined within the company. These are typically related to the fiscal
performance of the business but can also relate to critical operational issues like
environmental and safety performance. Other objectives related to the Corporate
perspective might be customer satisfaction issues like on-time deliveries, quality of
the product and many others. However, in the area of asset management we typically
focus on those areas that relate to financial, safety and environmental performance
as they relate to the utilization of assets.

Below is a table of typical perspectives and objectives related to asset management:

1. Corporate Perspective
   a. Increase Return on Investment (ROI)
   b. Improve Safety and Environmental Conditions
   c. Reduction of Controllable Lost Profit
   d. Reduction of Maintenance Expenses
   e. Increase Revenue from Assets
   f. Reduce Production Unit Costs
   g. Increase Asset Utilization
   h. Minimize Safety and Environmental Incidents
2. Asset Perspective
   a. Minimize Unscheduled Equipment Downtime
   b. Improve System Availability
   c. Reduce Scheduled Maintenance Downtime
   d. Reduce Unscheduled Repairs
   e. Reduce Non-Equipment-Related Downtime
   f. Increase Equipment Reliability
   g. Reduce Equipment Failure Time
3. Work Practices Perspective
   a. Reduce Repair Time
   b. Reduce Maintenance Material Inefficiencies
   c. Improve Labor Efficiency
   d. Improve Material Purchasing
   e. Perform Predictive Maintenance
   f. Optimize Time-Based Maintenance
   g. Optimize Work Processes
   h. Perform Reliability Studies
   i. Perform Criticality and Risk Assessments
   j. Improve Maintenance Planning and Scheduling
4. Knowledge and Experience Perspective
   a. Improve Historical Equipment Data Collection
   b. Improve Operations Communications
   c. Train Maintenance and Operations Personnel

Once the perspectives and objectives are fully defined we need to determine the
relationship of lower level objectives to upper level objectives. Below is an example
of a sample strategy map with the objective relationships defined for the Corporate
perspective (Figure 1.1). Strategy maps are an effective visual vehicle for
demonstrating how every person in the organization can affect the performance of the
overall business. For instance, when a technician is performing vibration analysis
in the field he can see how the application of that skill will improve equipment
reliability. This will ultimately contribute to the corporate goal of achieving higher
returns on the capital employed.

[Figure 1.1 boxes, Corporate perspective: Improve safety and environmental
conditions; Increase return on investment (ROI); Reduce production unit costs;
Increase revenue from assets; Minimize safety and environmental incidents;
Reduction of maintenance expenses; Reduction of controllable lost profits;
Increase asset utilization]

FIGURE 1.1 Sample Corporate Perspective Strategy Map
Let's return to the concept of metrics and Key Performance Indicators (KPIs). Tom Peters once said, "You can't improve what you cannot measure." If you think about it for a minute, it makes a lot of sense. We have been exposed to KPIs since we were very young. From the moment we are born we are weighed and measured, and then we are compared to standards to see which percentile we are in. As we grow and get into school, we are exposed to another set of KPIs, the infamous report card. The report card allows us to compare our performance against our peers or to some standard. An example that many people can certainly relate to is the use of a scale to measure the progress of a diet. We probably would not be very successful if we did not know where we started and what progress we were making week-by-week.

We all need a "scoreboard" to help us determine where we started and where we are at any given time. This certainly applies to measuring the performance of a maintenance and reliability organization. We need to know how many events occur in a given month, on a specific class of equipment, etc. Not until we know which KPIs will effectively measure our maintenance and reliability objectives can we begin to establish which opportunities will afford the greatest returns.

With all of that said, we would like to provide a word of caution. Be very careful to diversify your KPI selections. While a report card in school is a good measurement of a student's performance, it still does not provide a complete picture of the individual student. It is only one data point. Some students perform better on written tests while other students excel in other ways. We need to be careful to make sure that we employ a set of KPIs that most accurately represents our performance. That means having many different metrics that look at different areas of performance so we can get a complete picture.

So let us take a look at a few common Reliability KPIs that can be employed to give us an understanding of our overall asset performance.

MEAN TIME BETWEEN FAILURE (MTBF)
Mean Time Between Failure (MTBF) is a common metric that has been used for many years to establish the average time between failures. Although it can be calculated in different ways, it primarily looks at the total runtime of an asset divided by the total number of failures for that asset.

\[ \text{MTBF} = \frac{\text{Total Runtime}}{\text{Number of Events}} \]

EQUATION 1.1 Sample MTBF Calculation

This is a good metric because it is easy for people to understand and relate to and is common throughout industry.

NUMBER OF EVENTS
This metric simply measures the volume of events that occur on a variety of dimensions. Those dimensions are typically process units, equipment classes (e.g., pumps), equipment types (e.g., centrifugal pumps), manufacturer, and a host of others. This metric is closely related to MTBF as it is the denominator for the calculation. It can also be an accurate reflection of a facility's maintenance and reliability performance.

MAINTENANCE COST
This metric simply measures the number of maintenance dollars that are expended on rectifying the consequence of an event. This is typically the sum of labor and material cost (including contractor costs). This metric is also employed across many different dimensions like equipment, area, manufacturers, etc. This metric is a better business metric as it shows some of the financial consequences of the event. It also has some drawbacks, as it does not totally reflect the complete financial consequence of the event. It does not cover the lost opportunity (e.g., downtime) associated with the event. As we all know, the cost of downtime is much greater than the cost of maintenance on a dramatic downtime event.

AVAILABILITY
This metric is useful to determine how available a given asset or set of assets has been historically. In a 24/7 operation, the calculation is simply the entire year's potential operating time minus downtime divided by total potential operating time.

\[ \text{Availability} = \frac{8{,}760\ \text{(total hrs. in a year)} - 32\ \text{(4 failures of 8 hours each)}}{8{,}760\ \text{(total hrs. in a year)}} = 99.63\% \]

EQUATION 1.2 Sample Availability Calculation

This calculation can be modified in many ways to fit a specific business need. Although this metric is a good reflection of how available the assets are in a given time period, it provides absolutely no data on the reliability or business impact of the assets.

RELIABILITY
This metric can be a better reflection of how reliable a given asset is based on its past performance. In the availability example above, we had an asset that failed four times in a year resulting in 32 hours of downtime. The availability calculation determined that the asset was available 99.63% of the time. This might give the impression of a highly reliable asset. But if we use the reliability calculation shown below we get a much different picture.

The fact of the matter is, an asset that fails four times per year is extremely unreliable and the likelihood of that asset reaching a mission time of one year is highly unlikely even though its availability is very good.

These are only a few common KPIs. As you can imagine there is an array of metrics that can be used to help measure the effectiveness of a maintenance and reliability organization. We will discuss these in more detail in just a moment.
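The worked numbers in this section (four failures of 8 hours each in a 24/7 year) can be reproduced with a few lines of Python. This is a minimal sketch, not part of the PROACT method: the function names are illustrative, and MTBF is approximated here as 365 calendar days divided by the number of failures, ignoring the 32 hours of downtime. The reliability function uses the exponential form of Equation 1.3, with the failure rate taken as 1/MTBF.

```python
import math

HOURS_PER_YEAR = 8760  # potential operating time in a 24/7 operation


def mtbf(total_runtime, number_of_events):
    """Mean time between failure: total runtime divided by number of events."""
    return total_runtime / number_of_events


def availability(potential_hours, downtime_hours):
    """Fraction of potential operating time the asset was actually available."""
    return (potential_hours - downtime_hours) / potential_hours


def reliability(mtbf_value, mission_time):
    """Probability of surviving the mission time, assuming a constant failure rate."""
    failure_rate = 1.0 / mtbf_value
    return math.exp(-failure_rate * mission_time)


failures = 4
downtime = failures * 8                      # 32 hours of downtime in the year
avail = availability(HOURS_PER_YEAR, downtime)
mtbf_days = mtbf(365, failures)              # assumption: runtime ~ 365 calendar days
rel = reliability(mtbf_days, 365)            # mission time of one year, in days

print(f"Availability: {avail:.2%}")              # 99.63%
print(f"MTBF: {mtbf_days:.2f} days")             # 91.25 days
print(f"Reliability over one year: {rel:.2%}")   # 1.83%
```

With MTBF rounded down to 91 days, as in the text's calculation, the same function gives 1.81%. Either way the contrast is the point: an asset can be available more than 99% of the time and still have under a 2% chance of running a full year without failure.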
\[ \text{Reliability} = e^{-\lambda t} \]

Natural logarithmic base: \( e = 2.718 \)
Failure rate: \( \lambda = \frac{1}{\text{MTBF}} = \frac{1}{91} \)
Mission time: \( t = 365 \) (days)

\[ \text{Reliability} = e^{-\lambda t} = 2.718^{-\frac{1}{91}(365)} = 2.718^{-4.0109} = 1.81\% \]

EQUATION 1.3 Sample Reliability Calculation

So we now understand that MTBF, MTTR, availability and many others are common measures for the effectiveness of equipment reliability. But unless these metrics are measuring the performance of a given company objective they might not provide the benefit that is trying to be achieved. Therefore, we need to first look at each objective and then develop pertinent measurements to see if that objective is indeed being met.

For example, if our objective were to reduce production unit costs we would measure the cost per unit of product produced. This will help us to understand if we are getting better, worse or staying constant with respect to our production costs. However, this alone is not enough. We need to be more specific when we are defining our measurements. The term Key Performance Indicator (KPI) as it is often referred to needs to delineate the difference between good and poor performance. For example, let us assume that our average cost per unit of product is $10 this month. Is that cost high, medium or low? In order to have an indicator we must define the measurement thresholds. In our example, we said that the average cost per unit this month was $10. Perhaps our target value for production unit cost is $8. Therefore, our performance is not very good.

A KPI has several thresholds that should be defined prior to the monitoring of the measure's value. These are listed below:

1. Target Value - This value specifies the performance required to meet the objective.
2. Stretch Value - This value represents performance above and beyond what is expected to meet our objectives.
3. Critical Value - This value represents performance that is deemed unacceptable for meeting our objectives.
4. Best Value - This is the best possible value for this objective.
5. Worst Value - This is the worst possible value for this objective.

When these thresholds are set properly for each measurement we can objectively assess our performance. Otherwise, we are simply collecting information with no real sense of whether the value is meeting our specified goals.

Let's get back to the strategy map discussion. The process is to review each objective that we deem important to our strategy and list one or more KPIs that will be accurate measurements for that objective. Once we define the measurement and calculation, we need to determine the target, stretch, critical, best and worst values for that measure. Upon completion of this process we have a completed strategy map. Below is a table with some example KPIs that relate to our objectives and perspectives:

TABLE 1.1
Sample Completed Strategy Map
TABLE 1.1 (continued)
Sample Completed Strategy Map

                                              ... with PM
Optimize Work Processes                       % of Rework
Optimize Work Processes                       Hours of Overtime
                                              % of overdue work orders
                                              % of PdM-Generated Work
Perform Reliability Studies                   ... work orders
Improve Historical Equipment Data Collection  % of populated required fields in work order history
                                              Hours of training per employee
                                              Downtime ... training per ...
                                              Number of failures
                                              Labor Cost of Repairs

BALANCED SCORECARD
... the process of monitoring these KPIs on a routine basis. We will use the balanced scorecard methodology to help us do just that. A balanced scorecard takes the perspectives, objectives, and measures introduced in the strategy map and puts them into an easily understood format. A sample of a balanced scorecard and KPI measurement are displayed below.

Having all of your critical performance information displayed in one place makes it easy for everyone involved in the enterprise to see his or her performance and to determine where to focus attention.

This process ensures that we are working on the critical issues that most affect the performance of the business. Once we begin to monitor the balanced scorecard on a routine basis we will begin to see the areas where we need to make improvements. For example, let us say we are monitoring unscheduled downtime as a measurement of the equipment downtime objective. We observe that our performance for that KPI is well below the target level. We then must investigate and collect information to see which events are contributing to the poor performance for that objective.
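The comparison of a KPI value against its thresholds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `KPI` class, its field names, and the status labels are hypothetical, the best and worst values are left out for brevity, and apart from the $8 target and $10 actual unit cost used earlier in the text, the threshold numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class KPI:
    """A measure with the critical, target, and stretch thresholds described in the text."""
    name: str
    critical: float
    target: float
    stretch: float
    higher_is_better: bool = True

    def assess(self, actual: float) -> str:
        # Flip the sign for lower-is-better KPIs (such as unit cost) so that
        # one chain of comparisons serves both directions.
        sign = 1 if self.higher_is_better else -1
        value = sign * actual
        if value >= sign * self.stretch:
            return "at or beyond stretch"
        if value >= sign * self.target:
            return "meets target"
        if value > sign * self.critical:
            return "below target"
        return "critical"


# Hypothetical availability KPI (percent, higher is better).
unit_availability = KPI(name="Unit availability (%)", critical=92.0, target=97.0, stretch=99.0)

# Unit cost: the $8 target comes from the text; critical and stretch are assumed.
unit_cost = KPI(name="Production unit cost ($)", critical=12.0, target=8.0, stretch=7.0,
                higher_is_better=False)

print(unit_availability.assess(94.1))  # below target
print(unit_cost.assess(10.0))          # below target
```

A "below target" or "critical" status is the signal to go and collect event data for that objective, as described above.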
THE RCA WORK PROCESS
A successful RCA initiative must have a strategic and tactical plan in place. We just discussed the concept of a strategy map to ensure that we are measuring the key metrics that will enable us to achieve our company objectives. Let's talk more about the tactical plan for implementing the RCA initiative.

First of all, we must have a means of collecting data related to the events that affect the performance of our stated objectives. This can be maintenance data, ... data and other data related to the performance of our facility. We will talk much more about event data collection in Chapter 5 and Chapter 6.

Once we have a process for collecting data on these events, we must decide on criteria that will initiate the execution of an RCA analysis. For example, your strategy might dictate that any failures that occur on critical equipment must have an RCA performed. This is very common for events that relate to safety and environmental performance. We do not want to leave this process too ambiguous because people will not know when, and under what circumstances, to conduct an analysis.

It may be that you want to employ different levels of analysis for different performance criteria. Perhaps you have many events that occur on noncritical equipment, but the frequency of the events is causing a large amount of maintenance expenditure. This might not justify a full-blown team to perform the analysis but still would justify some level of analysis to determine the reasons for the chronic maintenance events. These types of analyses might be much less formal than a full-blown RCA but still are valuable.

Since every company is different and thus has different goals and objectives it would not be prudent for us to define a generic criterion. However, we can delineate some examples that might be considered. In any plant, there is a need to optimize maintenance expenditures. Therefore, we may want to consider a criterion that is based on the amount of maintenance expended for a given piece of equipment for a fixed time period (e.g., the last 12 months). If a piece of equipment exceeds the threshold in that time period, then an RCA will automatically be initiated.

Another common criterion can be based on production losses. This is especially true if your plant capacity is limited and you can market and sell everything that is produced in your facility. If there is a production loss that exceeds a specific financial value, then an RCA should be initiated.
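The example criteria above (failures on critical equipment, maintenance spend over a trailing 12 months, and a production loss above a set financial value) could be expressed as a simple screening rule. The sketch below is hypothetical: the `Event` fields, the `rca_triggers` function, the equipment tags, and every threshold value are assumptions chosen for illustration, and each company would substitute its own criteria.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Event:
    equipment_id: str
    occurred_on: date
    maintenance_cost: float        # labor plus material, including contractors
    production_loss: float = 0.0   # financial value of lost production
    critical_equipment: bool = False


def rca_triggers(events, today, cost_threshold, loss_threshold, window_days=365):
    """Return {equipment_id: set of reasons} for equipment meeting any criterion."""
    window_start = today - timedelta(days=window_days)
    recent = [e for e in events if window_start <= e.occurred_on <= today]

    reasons = {}
    cost_by_equipment = {}
    for e in recent:
        cost_by_equipment[e.equipment_id] = (
            cost_by_equipment.get(e.equipment_id, 0.0) + e.maintenance_cost
        )
        if e.critical_equipment:
            reasons.setdefault(e.equipment_id, set()).add("failure on critical equipment")
        if e.production_loss > loss_threshold:
            reasons.setdefault(e.equipment_id, set()).add("production loss above threshold")

    # Chronic events: many small repairs can add up past the spend threshold.
    for equipment_id, total in cost_by_equipment.items():
        if total > cost_threshold:
            reasons.setdefault(equipment_id, set()).add("maintenance cost above threshold")
    return reasons


events = [
    Event("P-101", date(2004, 10, 5), 20_000),
    Event("P-101", date(2005, 2, 14), 20_000),
    Event("P-101", date(2005, 6, 30), 20_000),
    Event("C-201", date(2005, 7, 15), 5_000, production_loss=250_000, critical_equipment=True),
    Event("P-300", date(2003, 11, 2), 90_000),  # outside the 12-month window
]

result = rca_triggers(events, today=date(2005, 8, 1),
                      cost_threshold=50_000, loss_threshold=100_000)
for equipment_id, why in sorted(result.items()):
    print(equipment_id, sorted(why))
```

In this made-up data set, P-101 is flagged for chronic maintenance spend and C-201 for a critical-equipment failure with a large production loss, while P-300 falls outside the window. A rule like this only decides when an analysis is warranted; the level of analysis is still a judgment call, as discussed above.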
[Figure: scorecard detail for the Unit Availability KPI (owner: Latino, Ken; frequency: monthly), showing the actual and previous values against the target (97.00), critical (92.00), and stretch (99.00) thresholds, the KPI ranges worst-critical, critical-target, and target-stretch, and a historical trend chart scaled from 0.00 to 100.00.]
FIGURE 1.3 Sample Balanced Maintenance and Reliability Scorecard (2)

[Figure: RCA work process flow diagram.]
FIGURE 1.4 Sample RCA Work Process
These are simple examples, but it is important to make sure that there is an agreed upon criteria for when RCA analyses will be initiated and who will perform the analyses. At many facilities, there is a Reliability Engineer responsible for a given area of the facility and responsible to perform RCAs on equipment/events in his area. It is then his responsibility to determine which additional team members will be necessary to perform the analysis. We will discuss team formation in greater detail in Chapter 8.

The key to a successful analysis is to make sure that you have the data and subsequent information to determine the underlying causes of the issue being studied. The team will review the problem and determine what data will be needed to determine the root causes. The PROACT methodology offers a simple but effective acronym called the 5Ps to help in this effort. The 5Ps represent the five categories of data required to analyze any problem. We will discuss the data collection effort, and more specifically the 5Ps, in Chapter 6.

Have you ever sat in a brainstorming meeting to solve a particular problem in the company? This is a very common approach to problem solving. We are not against the concept of brainstorming. In reality, we think it is a required activity in the RCA analytical process. The problem with most brainstorming sessions is that the group presents a variety of ideas but sometimes they lack the data to verify that the solution will work. For this reason, the PROACT methodology will utilize a Logic Tree approach to solve problems. This is a visual brainstorming tool. It is a hierarchical approach in which the problem is defined in the beginning of the process and subsequent hypotheses and verifications are formulated and proven. The end goal of the process is to identify the true root causes of the problem. These causes can be physical, human or latent in nature. We will discuss this later in Chapter 9.

Identification of root causes, although important, will not solve the problem. The only way for the problem to be resolved is to implement corrective actions. This is typically done by creating a list of recommendations directed at eliminating or reducing the impact of the identified root causes. These recommendations must be thoroughly reviewed by all parties to ensure that they are the right solutions. Although causes are facts and cannot be disputed, recommendations should be thoroughly scrutinized and modified to ensure that they are the best course of action. We will discuss the process of communicating team findings and recommendations in Chapter 10.

As time passes we sometimes forget to follow up to make sure that our corrective actions were implemented and are providing the specified return we had intended. If the losses related to the problem are still affecting plant performance and negatively affecting our corporate strategy, then we should reevaluate our corrective actions to determine why they are not providing the intended benefit. The strategy map discussed earlier will help but we would recommend having reevaluation criteria set for each recommendation. For example, we might measure the number of failures on that piece of equipment. If another failure occurs in the next 12 months, we should reevaluate to see if the failure was related to the ineffectiveness of our corrective actions. We will discuss tracking results in Chapter 11.

Let's revisit our discussion on data collection methods. We have various methods to collect historical event information. We would like to break it into two categories: a manual and automated data collection process. In Chapter 5 we will discuss a process called Opportunity Analysis (OA) where we collect the data through the use of an interview process of various personnel within the affected area. In the subsequent chapter we will discuss a more automated approach to data collection that will utilize existing information systems that may already be employed at the company.

There are pros and cons to both approaches. It generally comes down to data collection processes and how effectively they have been employed. Many companies utilize a Computerized Maintenance Management System or CMMS to manage maintenance work and to document work history. For many, these systems are not utilized to their full potential and many times the work history on assets is not fully documented. If this is the case, then a manual interview process can be utilized to perform the opportunity analysis.

Now that we have explored the concept of the RCA Work Process, we will narrow the scope and look into the field of RCA itself and what it means in the industry, both from a user and provider perspective.
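The Logic Tree idea described earlier in this chapter (a problem statement at the top, hypotheses beneath it, and each hypothesis verified with data before the branch is pursued) can be pictured as a small tree structure. The sketch below is a loose illustration and not the PROACT tool itself: the `Node` class, the `verified_roots` function, and the pump example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    statement: str
    verified: Optional[bool] = None    # None means not yet checked against data
    root_type: Optional[str] = None    # "physical", "human", or "latent"
    children: List["Node"] = field(default_factory=list)


def verified_roots(node):
    """Collect roots along branches where every hypothesis was verified with data.

    A branch is abandoned as soon as a hypothesis is disproven or still unverified,
    so ideas without supporting evidence never reach the list of root causes.
    """
    if node.verified is not True:
        return []
    found = [(node.root_type, node.statement)] if node.root_type else []
    for child in node.children:
        found.extend(verified_roots(child))
    return found


tree = Node("Pump bearing failed", verified=True, children=[
    Node("Lubrication breakdown", verified=True, root_type="physical", children=[
        Node("Wrong lubricant applied", verified=True, root_type="human", children=[
            Node("No lubrication procedure in place", verified=True, root_type="latent"),
        ]),
    ]),
    Node("Shaft misalignment", verified=False, children=[
        Node("Alignment tool out of calibration"),  # never reached: parent disproven
    ]),
])

for root_type, statement in verified_roots(tree):
    print(f"{root_type}: {statement}")
```

The disproven misalignment branch contributes nothing, which mirrors the point made above about brainstorming: an idea only counts once the data supports it.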
2 Introduction to the Field of Root Cause Analysis

WHAT IS ROOT CAUSE ANALYSIS (RCA)?
What a seemingly easy question to answer, yet no standard, generally accepted definition exists in the industry today. We participate in several RCA on-line discussion forums where practitioners (beginners to novices) and providers interact for the betterment of the industry. The two primary forums that we encourage interested analysts to join are:

1. rootcauseconference@yahoogroups.com¹, and
2. Root_Cause_State_of_the_Practice@yahoogroups.com²

These are two very active forums with some of the most knowledgeable people in the business participating in them. Issues that will be discussed throughout this text are debated on these forums every day. These forums play an important role in how we see the industry as we learn what others are doing and the obstacles they face. This brings us back to the definition of RCA.

To our knowledge, there is no single, generally accepted definition of Root Cause Analysis in the RCA industry. Technical societies, regulatory bodies and corporations have their own definitions, but it is rare that we find two definitions that match. To demonstrate why this is, we will list several definitions used and proposed in various industries to show the many different ways in which people view RCA:

1. Root Cause Analysis is any structured approach to identifying the factors that influenced the consequences of one or more past events in order to identify what behaviors or conditions need to be changed to prevent recurrence of similar consequences, when adverse, and to identify the lessons to be learned to promote the achievement of better consequences. (6/16/04 - Dr. William Corcoran - The Firebird Forum)
2. Root Cause Analysis is any evidence-driven process that, at a minimum, uncovers underlying truths about past adverse events, thereby exposing opportunities for making lasting improvements. (5/20/04 - Mr. William Salot)
3. Root Cause Analysis is any process that uncovers underlying truths concerning the occurrence or severity of an undesirable consequence or condition and identifies opportunities for lasting improvements. (5/18/04 - Mr. Doug Emberley)

¹ This discussion forum is associated with www.rootcauselive.com and moderated by Mr. C. Robert Nelms.
² This discussion forum is moderated by Dr. William Corcoran of NSRC Corporation.

4. Root Cause Analysis is any process that identifies the underlying weaknesses that might lead to an adverse event or condition, in order to identify opportunities for improvement. (5/12/04 - Dr. Kenneth Hirsch)

All of these discussion forum posts¹ resulted from the original definition proposed by Mr. William Salot:

RCA identifies WHAT underlying causes need to be fixed, not HOW to fix them.

Who is right? We do not think that there is one cure-all definition for RCA. As we can tell from above, the proposed definition was re-shaped every time a debate ensued about the definition of individual words with the proposed definition. What we do not want to happen in the industry is for people to be discouraged from doing RCA because some definitions make it seem too complex. For the purposes of this text, we feel that definition two above suits our needs and captures our belief as to what RCA should be. Therefore we will proceed on the basis of that definition.

WHY DO UNDESIRABLE OUTCOMES OCCUR?
THE BIG PICTURE
We must put aside the industry that we work in and follow along from the standpoint of the human being. In order to understand why undesirable outcomes exist, we must understand the mechanics of failure. Virtually all undesirable outcomes are the result of human errors of omission or commission (or decision errors). Experience in industry indicates that any undesirable outcome will have, on average, a series of 10 to 14 cause-and-effect relationships that queue up in a particular pattern in order for that event to occur.

This dispels the commonly held myth that one error causes the ultimate undesirable outcome. All such undesirable outcomes will have their roots embedded in the physical, human and latent areas.

Physical Roots are typically found soon after errors of commission or omission. They are the first physical consequences resulting from a human decision error. Physical roots, as will be described in detail in coming chapters, are in essence tangible.

Human Roots are decision errors. These are the actions (or inactions) that trigger the physical roots to surface. As mentioned above these are the errors of omission or commission of the human being.

Latent Roots are the organizations or systems that are flawed. These are the support systems (i.e., procedures, training, incentive systems, purchasing habits, etc.) that are typically put in place to help our workforce make better decisions. Latent roots are the expressed intent of the human decision making process.

¹ All posts printed with permission of the website moderators at rootcauseconference@yahoogroups.com

ARE ALL RCA METHODOLOGIES CREATED EQUAL?
There are many providers of various RCA methodologies on the market today. Many of these providers use tools that are considered RCA in the RCA community and many do not. Many have been in the RCA business for decades and many have just gotten into it. The point here is that this is a buyer beware field.

Anyone interested in shopping for RCA based solely on initial price should hand out a pencil and piece of paper and just ask his employees to ask themselves "why?" five times and he will have his answers.

For those companies looking to make dramatic strides in their operations, shopping on price alone will not suffice. Those serious about RCA's being a major contributor to their bottom lines will be interested in the methodologies involved and what supporting infrastructure may be required to be successful. We will discuss both of these very important topics in detail in coming chapters.

Many of the most respected providers in the RCA industry normally have their own unique styles and vocabularies, but there are also many commonalities among them. PROACT® is no different. These uniquenesses are what make the different brands of RCA proprietary to a certain provider. They make the brands stand out and separate them from the general commodity term of RCA.

For the users this is both good and bad. It is good to have variety and competition in the market to keep investment down and provide choices for specific work environments. It is sometimes bad because no generally accepted standards emerge to which all true RCA methods should comply. Also, because there are so many RCA methods on the market, the use of terminology is at best inconsistent when comparing them. This further confuses users when they try to compare terms like our physical, human and latent root causes with terms like contributing factors, primary root causes, underlying root causes, approximate root causes, near root causes, mitigating factors, exacerbating factors, proximate causes, etc.

ATTEMPTING TO UNDERSTAND RCA - IS THIS GOOD FOR THE INDUSTRY?
Valiant attempts have been made by the joint provider and user communities to develop a standard for industry. One such attempt was to model it after the SAE JA-1011 RCM standard¹. Debates arose as to whether such a standard is needed at all and if so, can one be developed without constraining the task of RCA itself? Because RCA requires such open boundaries to the disciplined thought process required to find the truth, would developing a standard bias possible outcomes?

Creating an RCA standard may define the boundaries of RCA differently than some providers' methodologies. In some circumstances, some providers' established RCA methodologies may now be deemed non-compliant. This would obviously be a detriment to their businesses, and naturally they would oppose the development of

¹ Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes, G-11 Supportability Committee, SAE Standards, Document # JA1011, August, 1999, (http://www.sae.org/servlets/product-
such a standard. For instance, if an RCA standard listed validation of each hypothesis with hard evidence as essential to RCA, then typical brainstorming techniques would be non-compliant. If another RCA essential were that the team members had to create the logic by exploring the possibilities of how something could have occurred, then the use of pick-list RCA methodologies would be non-compliant.

Pick-list RCA is when the methodologies either provide paper templates with their list of possibilities or, if software oriented, drop down lists appear with the vendor's possibilities provided. While these approaches on the surface seem the most logical and the easiest route, there are dangers. One such danger is the user believes that all the possibilities that could have contributed to her undesirable outcome are provided in this list. That will likely never be the case as no vendor can claim to capture all of the variables associated with any event in every environment. The second danger, and perhaps the greater, is that the task of RCA is meant to raise the knowledge and skill levels of the workforce. A methodology that provides what appears to be all of the answers does not force the users to explore the possibilities on their own and therefore they do not learn. They are simply doing paint-by-the-numbers RCA.

Unfortunately, for the user community especially, the endeavor to develop a common standard never came to pass because the major providers could never come to a consensus (which is not unusual). If readers wanted to take it upon themselves on behalf of their corporations to develop an RCA standard internally that outlines the essential elements of an analysis process in order for it to be considered RCA, we would encourage them to obtain a copy of the SAE JA-1011 RCM standard and use it as a draft baseline for the development of a similar document for RCA in their organization.

As we can tell from reading the SAE standard referenced above, it is not biased to any provider or methodology. It simply clarifies for the organization what they consider to be the essential elements of RCA. This is important because there are divided camps on what is the scope of RCA. Some feel the task of identifying qualified candidates for RCA is not RCA itself. Some feel that the writing of recommendations and their subsequent approval process and implementation is not in the scope of RCA. Having such a document clarifies what the company considers to be RCA, and, more importantly, what is not considered RCA.

WHAT IS NOT ROOT CAUSE ANALYSIS?
It is common knowledge in manufacturing and healthcare today that a majority of departments are understaffed and underfunded. When looking further into the

are not typically based in fact. They typically allow ignorance and assumption (hearsay) to be viewed as fact. These are attractive techniques to such a reactive environment because they can be concluded very quickly, oftentimes in a single session with minimum participation (if any).

Why do such techniques conclude so quickly? Time usually is not required to collect data or evidence to support the hearsay hypotheses. Usually data collection and testing is the bulk of the time required in any investigative occupation. In accident investigations, think of what weight they would carry without providing hard evidence. If the National Transportation Safety Board (NTSB) didn't collect evidence at airline crash scenes, what credibility would they have when issuing conclusions and recommendations? What weight would a prosecutor's case in court carry if they had no evidence except hearsay?

HOW TO COMPARE DIFFERENT RCA METHODOLOGIES WHEN COMPARING THEM
When researching RCA methodologies, we should consider characteristics other than investments. While the initial investment may be very inexpensive, our greatest concerns should be that the methodology has the breadth and depth to uncover all of the root causes associated with any undesirable outcomes. If we focus on cost and not value, we may find that the lifecycle costs to support an inexpensive RCA methodology will cost 100 times the original investment when the undesirable outcomes persist and upset daily operations.

We suggest that when a facility has properly researched the various RCA methodologies on the market, it short-list the top three providers based on the company's internal requirements (i.e., the standard that we discussed earlier). It is also advised that the short-listed providers submit references prior to any future meetings. Discussions with references should focus on comprehensiveness of approach, efficiency and effectiveness, necessary management support and general acceptance by organizational personnel. We would be seeking to sift out the advantages and disadvantages of the provider's approach that these users have experienced.

We want to be sure to understand issues that are under the control of the provider and issues that are under the control of the purchasing organization. For instance, an organization may select the best RCA option for their environment, but if the management support infrastructure is not in place and the effort fails, it may not be due to a flaw in the selected methodology.
increasecl risk of error by a human being, one shoulcl note tl1at being verwhelmecl Once short-listecl lhe providers shoulcI be given the opportunity to present their
wth emergencies fueIs the environment of error. I approaches either in-person or via Eve on-Ene conferencing technologies. This is
The acceptance of common brainstorming techniques such as the ~ishbone Dia- where they shoulc1 be qllestioned and evaluatecl based on the merits of their
gram, the 5-\~Thys a.nd process fl.o~ mal~ping tecl~iques haveyrovicled jmany a .false approachcs ancI t1.1e breaclth and clepth of their offerings. Keep in mind that thi8 will
-v sense of secunty.11us false sense oL seclll1ty comes rom the bellef that these techmques
are compm'able to tme RCA. Again, t1.1is reinforces the need for an internal standard
also require preparation 011 tIle analyst's side in terms o' preparing eelucated and
detailecl questiolls related to the methoclology and not just pricing structure.
that defines the minimum essential elements to be considerecl RCA in theorganization. One tool we provicle om prospects that are researching RCA methodologies is
The aforementionecl techniques are refen:ed to as brainstorming tedmiques anc1 the evaluation tool shown in Figure 2.1. This ls an unbiased way of equally evaluating
not considered RCA techniques within tbe RCA comlllnnity. This is becan,e thoy several approaches basecl on cllstom weighting of methodologv characteristics.

Notice the characteristics (in this case) in which we have decided to compare the methods short-listed are:

1. Simplicity/User Friendliness: One thing to all of us that is an endangered species is time. Therefore, when conducting such analyses the methodology must be very simple to grasp in concept and to execute in practice.
2. Analysis Flexibility: Too much rigidity in a methodology can impose unrealistic constraints that can stifle the analysis itself. As we tell our clients, we are consultants and we live in this ideal world where we make things look so simple. The fact is the best we can do is provide an ideal framework for conducting RCA. The methodology must be pliable enough to work effectively when molded to meet the reality of the working environment.
3. Initial Cost: While this is an important characteristic due to our budgeting constraints, we must not let initial cost cloud lifecycle costs and value. If we always opt for the least expensive we must consider that if the methodology is inferior and the problem happens again, how much did the RCA purchase really cost the organization?
4. Quality of Materials: When the providers are gone, how good is the reference material that you will rely on in their absence?
5. Results and Reports: How well does the approach's reporting capability allow me to meet my compliance obligations and reporting to my superiors? Does the methodology provide me a means for making the business case for implementing my recommendations? What feedback did we receive from the references regarding the reality of results?
6. Training Flexibility: Is the training extensive enough that my analysts will be comfortable in doing analyses when the consultant leaves? Will the training involve canned examples in my industry and/or the use of current problems in my facility? Does the training convey knowledge (lecture) and skill (exercises)? Is there follow-up or refresher training available and/or included? Will upper management be trained in an overview format in what their responsibilities will be to support the RCA effort?
7. Process Credibility and Thoroughness: What attributes does this approach have that will allow it to likely capture issues that other approaches will not? How easy will it be for my people to bypass the discipline of the RCA process resulting in shortcuts that can increase the risk of recurrence of the undesirable outcome?
8. Ability to Track Bottom-Line Results: Does this methodology put any emphasis on return-on-investment (ROI)? What training and tools are provided to ensure that the analysts are capable of making a business case for their analysis results?

Remember, these are only a sampling of criteria in which RCA methodologies can be evaluated. The organization's evaluation team should come up with its own list based on the organization's own needs. Once the criteria have been established, then

the evaluation team can weight each of these factors as to their importance to the overall decision. We typically use a weighting scale of 1 to 5 where "1" has the lower impact on the decision and "5" has the greatest. Once these are established and entered into a simple spreadsheet like Figure 2.1 (after the evaluation team meets with each provider), they will fill out this evaluation form individually and then average them together as a team.

When the individual forms are compared, if there are great disparities in any particular criteria it should be a signal that further discussion is needed to understand why there is such a gap in how team members view the same thing. This approach is a quick and unbiased manner in which to compare offerings of any kind, not just RCA.

WHAT ARE THE PRIMARY DIFFERENCES BETWEEN SIX SIGMA AND RCA?

Where does RCA fit in Six Sigma? The focal point of most Six Sigma efforts will be to achieve precision through the minimization of process variation. However, the goal of RCA is not to minimize process variation, but to eliminate the risk of recurrence of the event that is causing the variation.

For instance, if a bottling operation was the system being analyzed, Six Sigma might seek to minimize the consequences of "line jams" (process variation) by implementing recommendations that would catch any jams at an earlier state in order to fix it and minimize the production consequences (MTTR - Mean Time To Repair [or Restore]).

Whereas RCA would seek to drill down on the individual types of identified line jams and understand the chain of events that lead to the jam in the first place. RCA would uncover the system deficiencies, which triggered poor decisions being made that set off a series of physical consequences until the line production was affected. RCA seeks to understand what causes the undesirable outcomes to occur, and Six Sigma seeks to minimize the consequences of those events when they do occur (i.e., process variation).

Traditionally Six Sigma toolboxes utilize many total productive maintenance/management (TPM) problem solving, brainstorming and RCA tools such as 5-Whys, Fishbone diagrams, fault tree analysis and timeline analysis. While these tools are good for basic problem solving, they are not traditionally used to the extent that Root Cause Analysis will be described in this text. RCA tools used in Six Sigma tend to fall short of the depth achieved in real RCA. Often this lack of depth has resulted in the coining of the term "shallow cause analysis."

Once an organization has identified what its RCA needs are, it must then understand the social ramifications of trying to implement such behavioral changes. Remember, RCA is a thought process and not a tangible product. It involves the complexity and variability associated with the human mind. It involves cultural considerations. While we will delve deeply into the management systems required to support such an effort, we will first explore the reasons such efforts often fail. Again, we will learn from those in the past who have paved the way for us.

OBSTACLES TO LEARNING FROM THINGS THAT GO WRONG

In a recent informal on-line poll¹ presented to a group of beginner and veteran RCA practitioners, the following question was asked on the discussion forum:

"What are the obstacles to learning from things that go wrong?"

The following list is a summary of the responses grouped into appropriate categories by the moderator. Some examples of the actual responses are below each category to help define what was meant by the category title.

1. RCA is almost contrary to human nature: 28%
   a. People don't like to admit they made the mistake.
   b. Accountability. If you are the boss, that is it!
   c. We are unwilling to change our own behavior.
2. Incentives and/or priority to do RCAs are lacking: 19%
   a. It is not expected of them.
   b. There is no personal incentive to do so.
   c. The work environment does not condone, nor accommodate, such a proactive activity.
3. RCA takes time/we have no time: 14%
   a. People are too busy due to daily work/problems.
   b. Variations on "I'm too busy."
4. Ill- or mis-defined RCA processes: 12%
   a. No agreement on either "how far back" you have to go in your analysis.
   b. Vaguely defined processes.
   c. It is a theoretical approach. It is practically impossible.
5. Our "Western Culture": 9%
   a. The stock market, short-term focus.
   b. Managers being rewarded for short term results.
   c. The tyranny of the urgent.
6. We haven't had to do RCA in the past, why now: 8%
   a. Not how I was trained, not how I/we do things.
   b. Some behavior is so entrenched that it would be like being struck by lightning for some individuals to be aware of the need.
7. Most people don't understand how important it is to learn from things that go wrong: 5%
   a. It never occurs to most people that learning from experience is a cost-effective activity.
8. RCAs are not my responsibility: 5%
   a. It's NIMBY (not in my back yard).
   b. That's not our job.

¹ Nelms, Robert. (2004). What Are the Obstacles To Learning From Things that Go Wrong? [Online]. Available: http://www.rootcauselive.com
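The weighted evaluation described earlier (criteria weighted from 1 to 5, each evaluator filling out a form per provider, forms averaged as a team, and large disparities flagged for discussion) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights, the 1-to-5 rating scale for providers, the sample forms, and the disparity threshold of 2 points are all hypothetical and are not taken from Figure 2.1.

```python
from statistics import mean

# Hypothetical weights on the 1-to-5 scale (1 = lowest impact, 5 = greatest).
WEIGHTS = {
    "Simplicity/User Friendliness": 5,
    "Analysis Flexibility": 4,
    "Initial Cost": 2,
    "Process Credibility and Thoroughness": 5,
}


def weighted_score(form, weights):
    """Sum of weight x rating over all criteria on one evaluator's form."""
    return sum(weights[c] * form[c] for c in weights)


def team_score(forms, weights):
    """Average the individual weighted scores for one provider."""
    return mean(weighted_score(form, weights) for form in forms)


def disparities(forms, weights, gap=2):
    """Criteria where evaluators' ratings differ by at least `gap` points.

    The threshold is an assumption; the text only says 'great disparities'.
    """
    flagged = []
    for criterion in weights:
        ratings = [form[criterion] for form in forms]
        if max(ratings) - min(ratings) >= gap:
            flagged.append(criterion)
    return flagged


# Two evaluators' hypothetical ratings (assumed 1-to-5) for a single provider.
forms = [
    {"Simplicity/User Friendliness": 4, "Analysis Flexibility": 3,
     "Initial Cost": 5, "Process Credibility and Thoroughness": 2},
    {"Simplicity/User Friendliness": 4, "Analysis Flexibility": 4,
     "Initial Cost": 2, "Process Credibility and Thoroughness": 3},
]

print("Team score:", team_score(forms, WEIGHTS))
print("Discuss further:", disparities(forms, WEIGHTS))
```

Running the same functions over each short-listed provider gives comparable team scores, while the flagged criteria point to where team members view the same offering differently.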
The previous poll was cited to make an extremely important point to executives. As one can see from the list, every single objection is the result of an improper, inadequate or nonexistent management support structure. Every one of these objections can be overcome with proper strategy, development and implementation of a support structure. As a matter of fact, few of these are even related to methodology considerations.

Conversely, not addressing the support structure will likely make such proactive efforts a lip-service effort that is not capable of producing substantial results. An organization can have the best analysts and the best tools, but without proper support the proactive efforts are not likely to succeed.

The following is a training model developed by Reliability Center, Inc. (RCI)¹ to provide guidance for the design and implementation of a support infrastructure for proactive activities such as RCA. It encompasses not only the elements about specific training objectives necessary to be successful, but it also outlines the specific requirements of the executives/management, the champions and the drivers who are accountable for creating the environment for RCA to be successful.

Specific information will be outlined from this model that is pertinent to creating the environment for RCA to succeed. For the sake of this text, we will focus on RCA being the primary proactive activity to support; however, the reader will recognize that the model will fit any proactive initiative.

3 Creating the Environment for RCA to Succeed: The Reliability Performance Process (TRPP)

The Reliability Performance Process (TRPP)¹ is an RCA management support model developed by Reliability Center, Inc. (RCI). It encompasses not only the elements of specific training objectives necessary to be successful, but it also outlines the specific requirements of the executives/management, the champions and the drivers who are accountable for creating the environment for RCA to be successful.

We will be outlining specific information from TRPP that is pertinent to creating the environment for RCA to succeed.

I
THE ROLE OF EXECUTIVE MANAGEMENT IN RCA

Like any initiative trying to be implemented into an organization, the path of least resistance is typically from the top down, relative to the bottom-up approach. The one thing we should always be cognizant of is the fact that no matter what the new initiative is, it will likely be viewed by the end user as the "program of the month." This should always be in the back of our minds in developing implementation strategies.

Our experience is that the closer we get to the field where the work is actually performed, the sharp end, the more skeptics we will encounter. Every year a new organizational buzz fad emerges and the executives hear and read about it in trade journals, magazines and business texts. Eventually directives are given to implement these fads and by the time it reaches the sharp end, the well-intentioned objectives of the initiatives are so diluted from miscommunication that they are viewed as nonvalue-added work and a burden to an existing workload. This is the paradigm of the end user that must be overcome to be successful at implementing RCA.

Often when we look at instituting these types of initiatives, we look at them strictly from the shareholders' view and work backwards. Do not get us wrong; we are not against new initiatives that are designed to change behavior for the betterment of the corporation. This process is necessary to progress as a society. However, the manner in which we try to attain that end is what has been typically ineffective.

We must look at linking what is different about this initiative from the perception of the end user as opposed to other initiatives we have tried unsuccessfully. We must

¹ TRPP is a registered copyright of Reliability Center, Inc.

¹ Reliability Center, Inc. (2004). The Reliability Performance Process (TRPP). Hopewell: Reliability Center, Inc.
look at the reality of the environment of the people who will make the change happen. How can we change the behavior of a given population to reflect those behaviors that are necessary to meet our objectives?

Let us take an example. If I am a maintenance person in an organization and have been so for my entire career, I am expected to repair equipment to make them more productive. My performance is measured by how well I can make the repair in the shortest time possible. I am given recognition when emergencies occur, and I respond almost heroically. This same scenario can apply to the service industry, healthcare and anywhere else people spend most of their days reacting to problems as opposed to working on opportunities.

Now comes along this Root Cause Analysis (RCA) initiative and they want me to participate in making sure that failures do not occur anymore. In my mind, if this objective is accomplished, I am out of a job. Rather than be perceived as not being a team player, I will superficially participate until the "program of the month" has lived out its average six-month shelf life and then go on with business as usual. We have seen this scenario repeatedly, and it is a very valid concern based on the reality of the end user. This perception must be overcome prior to implementing an RCA initiative in an organization.

Let us face the fact that we are in a global environment today. We must compete not only domestically, but with foreign markets. Oftentimes these markets have an edge in that their costs to produce are significantly lower than here in the U.S. Maintenance, in its true state, is often viewed as a necessary evil to a corporation. But when equipment fails, it generally holds up production, which holds up delivery, which reduces profitability. Imagine a world where the only failures that occurred were wear-out failures that were predictable. This is a world we are moving towards, as precision environments become more the expectation. As we move in this direction, there will be less need for maintenance-type skills on a routine basis.

What about the area of reliability engineering (RE)? Most organizations we deal with never have the resources to properly staff their reliability engineering groups. There are plenty of available roles in the field of reliability. Think about how many reliability jobs are available: vibration analysts, root cause analysts, infrared thermographers, metallurgists, designers, inspectors, nondestructive testing specialists, and many more.

We are continually intrigued by the most frequently used objection to RCA at the sharp end, "I don't have time to do RCA." If you think hard about this statement, it really is an oxymoron. Why do people typically not have time to do RCA? They are so busy fire fighting, they do not have time to analyze why the undesirable outcome occurred in the first place. If this remains as a maintenance strategy, then the organization will never progress, because no level of dedication is put towards getting rid of the need to do the reactive work.

So how can executives get these people to willingly participate in a new RCA initiative?

1. It must start with an executive's putting a rubber stamp on the RCA effort and outlining specifically what his or her expectations are for the process and a time line for when they expect to see bottom-line results.
2. The approving executive should be educated in the RCA process, even if it is an overview version. Such demonstrations of support are worthwhile because the users can be assured that the executives have learned what they are learning and support the process.
3. The executive responsible for the success of the effort should designate a champion or sponsor of the RCA effort. This individual's role will be outlined later in this chapter.
4. The executive should clearly delineate how the RCA effort will benefit the company, but more importantly how it will benefit the work life of every employee and provide quality product for the customer.
5. The executive should outline how the RCA process will be implemented to accomplish the objectives and how management will support those actions.
6. A policy or procedure should be developed to institutionalize the RCA process. This is another physical demonstration of support that also provides continuity of the RCA application and perceived staying power. It gives the effort perceived staying power because even if there is a turnover in management, institutionalized processes have a greater chance of weathering the storm.
7. However, the most important action an executive can take to demonstrate support is to sign a check. We believe this is a universal sign of support. Any organization that has implemented SAP® or Six Sigma should be familiar with this concept.

THE ROLE OF AN RCA CHAMPION (SPONSOR)

All the above actions do not automatically ensure success. How many times have we seen a well-intentioned effort from the top try to make its way to the field and fail miserably? Typically, somewhere in the middle of the organization the translation of the original message begins to deviate from its intended path. This is a common reason of why some very good efforts fail, because of the miscommunication of the original message.

If we are proactive in our thinking, and we foresee such a barrier to success, then we can plan for its occurrence and avoid it. This is where the role of the RCA champion comes into play. We will use the term champion synonymously with the term sponsor.

There are three major roles of an RCA champion:

1. The champion must administer and support the RCA effort from a management standpoint. This includes ensuring the message from the top to the floor is communicated properly and effectively. Any deviations from the plan will be the responsibility of the champion to align or get back on track. This person is truly the champion of the RCA effort.

2. The second primary role of the RCA champion is to be a mentor to the drivers and the analysts. This means that the champion must be educated in the RCA process and have a thorough understanding of what is necessary for success.
3. The third primary role of the RCA champion is to be a protector of those using the process and uncover causes that may be politically sensitive. Sometimes we refer to this role as providing air cover for ground troops. In order to fulfill this responsibility, the RCA champion must be in a position of authority to take a defensive position and protect the person who uncovered these facts supporting the identified causes.

Ideally this would be a full-time position. However, we find it typically to be a part-time effort for an individual. In either situation we have seen the champion work; the key is the role must be made a priority to the organization. This is generally accomplished if the executives perform the designated tasks set out above. When new initiatives come down the pike and the workforce sees no support, it becomes another "they are not going to walk-the-talk" issue. These are viewed as lip service programs that will pass over time. If the RCA effort is going to succeed, it must first break down the current paradigms. RCA must be viewed as different than the other programs. This is also the RCA champion's role in projecting an image that this is different and will work.

The RCA champion's additional responsibilities include ensuring that the following responsibilities are carried out:

1. Selecting and training RCA drivers who will lead RCA teams. What are the personal characteristics that are required to make this a success? What kind of training does the person need to acquire the tools to do the job right?
2. Developing management support systems such as:
   A. RCA performance criteria - What are the expectations of financial returns that are expected from the corporation? What are the time frames? What are the landmarks?
   B. Providing time - In an era of re-engineering and lean manufacturing, how are we going to mandate that designated employees will spend 10% of their week on RCA teams?
   C. Process the recommendations - How are recommendations from RCAs going to be handled in the current work order system? How does improvement (proactive) work get executed in a reactive work order system?
   D. Provide technical resources - What technical resources are going to be made available to the analysts to prove and disprove their hypotheses using the "whatever it takes" mentality?
   E. Provide skill-based training - How will we educate RCA team members and ensure that they are competent enough to participate on such a team?
3. The champion will also be responsible for setting performance expectations. The champion should draft a letter that will be forwarded to all employees attending the RCA training. The letter should clearly outline exactly what is expected of them and how the follow-up system will be implemented.
4. The champion should ensure all training classes are launched either by the champion, an executive or other person in authority, thereby giving credibility and priority to the effort.
5. The champion should also be responsible for developing and setting up a recognition system for RCA successes. Recognition can range from a letter by an executive to tickets to a ball game. Whatever the incentive, it should be of value to the recipient.

Needless to say, the role of a champion is critical to the RCA process. The lack of a champion is usually why most formal RCA efforts fail. There is no one leading the cause or carrying the RCA flag. If an organization has never had a formal RCA effort, or had one and failed, such an endeavor is an uphill battle.

THE ROLE OF THE RCA DRIVER

The RCA driver can be synonymous with the RCA team leader. Drivers are the people who organize all the details and are closest to the work. They carry the burden of producing bottom-line results for the RCA effort. Their teams will meet, analyze, hypothesize, verify information and draw factual conclusions as to why undesirable outcomes occur. Then they will develop recommendations or countermeasures to eliminate the risk of recurrence of the event.

The efforts of the executive, manager and champion to support RCA are directed at supporting the driver's role to ensure success. The driver is in a unique position in that he deals directly with the field experts, the people who will comprise the core team. The personality traits that are most effective in this role as well as that of a core team member will be discussed at length in Chapter 8.

From a functional standpoint the RCA driver's roles are:

1. Making arrangements for RCA training for team leaders and team members - This includes setting up meeting times, approving training objectives, and providing adequate training rooms.
2. Reiterating expectations to students - Clarify to students what is expected of them, when it is expected, and how it will be obtained. The driver should occasionally set and hold RCA class reunions. This reunion should be announced at the initial training so as to set an expectation of demonstrable performance by that time.
3. Ensure that RCA support systems are working - Notify RCA champion of any deficiencies in support systems and see they are corrected.
4. Facilitate RCA teams - The driver shall lead the RCA teams and be responsible and accountable for the team's performance. The driver will be responsible for properly documenting every phase of the analysis.

5. Document performance - The driver will J;le responsible for cleveloping indicators (KPI) of each i.rm, but in this section we will look at provcling a typical
the appropriate metrics to ll1yasure performa~1ce against. This performance business case to justify implemel1ting an RCA effort.
shall always be converted fmm units to dollr~ when demonstrating savings, Because the costs to implement su eh an effort wiU vary based on each facility's
hence snceess. ' procltlct sales margin, labor costs and training costs (in-hollse versus contract), we
6. Ensure regulatory compliance - The clriyer 811a11 be responsible' fOl" will base otlr justifications on the following assumptions:
ensllring tIlat the analyses conductecl are tliorough ancl credible cnqugh
to meet applicable regulatOly stanclards andl guic1elines. 1. Assllmptions
7. Comillunicate pClformimce - The driver sl~all be the chicf spokespe~'son a. Loaded cast al' hourly empJoyee $US 50,000/yr
for the temn. He or she will present updates;to management as well *s to b. Hourly employees will spend 10% of their time
other individuals on-site ancl, at other similruj operations that couId bet1cfit on RCA teams
fmm the infonuation. The driver sha11 develop proper informatlon di,stri- c. Laaded cast af [ull time RCA driver (saJaded) $US 70,OOO/yr
bution mutes so that the RCA resu1ts get to 6thers in the organizatiOlhat eL RCA driver will be a full time position
may have, or have had, similar occurrences' . e, RCA training costs (hourJy) $US 400/person/day
f. RCA training costs (salariecl) tus 500/porsan/day
The driver is the last ofthe support mechunisms .hat should be in place to ~upport g. Population trained Per 100 trained
an RCA effort. :rvrost RCA efforts that we have enqounterec! are put together at the 2. RCA Retllrn Expectations
last minute as a result of an incident which just occurred. We discussed this topic earlier regarding using RCA only as a reactive tool.

A structured RCA effort should be properly placed in an organizational chart. Because RCA is intended to be a proactive task, it should reside under the control of a structured reliability department. In the absence of such a department, it should report to a staff position such as a vice president of operations, engineering, quality or risk. Whatever the case may be, ensure that an RCA effort is never placed under the control of a maintenance department (or any other reactive department). By its nature, a maintenance department is a reactive entity. Its role is to respond to the day-to-day activities in the field. The role of a true reliability department is to look at tomorrow, not today. Any proactive task assigned to a maintenance department is typically doomed from the start.

This is the reason that when reliability became a buzzword of the mid-90s many maintenance engineering departments were renamed reliability departments. The same people worked in the department, and they were performing the same jobs; however, their title was changed and not their function. If you are an individual who is charged with the responsibility of responding to daily problems and also seizing future opportunities, you are likely never to realize those opportunities. Reaction wins every time in this scenario.

Now let's assume at this point we have developed all the necessary systems and personnel to support an RCA effort. How do we know what opportunities to work on first? Working on the wrong events can be counterproductive and yield poor results. In the next chapter we will discuss a technique to use to sell why you should work on one event versus another.

SETTING FINANCIAL EXPECTATIONS:
THE REALITY OF THE RETURN

As discussed earlier, one of the roles of the champion is to delineate financial expectations of the RCA effort. This will obviously vary from the key performance [...]

   a. Train 100 hourly employees in RCA methods
   b. Train 1 salaried employee to lead RCA effort
   c. Critical Mass (assumption): 30% of those trained will actually use the RCA method in the field. This results in 30 personnel trained in RCA methods actually applying in the field (100 trained x 30% applying).
   d. Of the 30 personnel applying the RCA method, let us assume they are working in teams of three (3) at a minimum. This results in 10 RCA teams applying the methodology in the field (30 personnel / 3 per team).
   e. Each RCA team will complete one analysis every two months. This results in 60 completed analyses per year (10 RCA teams x 6 analyses/yr).
   f. Each "Significant Few" (to be discussed in Chapter 4) analysis will net a minimum of $US 50,000 ANNUALLY. This results in an annual return of $US 3 million per 100 people trained in RCA methods.
3. The Costs of Implementing RCA
   YEAR 1
   a. Training 100 hourly employees in 3 days of RCA: $US 120,000
   b. Training 1 salaried person in 5 days of RCA: $US 2,500
   c. 10% of 30 hourly employees' time per week, annually: $US 150,000
   d. Salary of RCA Driver/Year: $US 70,000
   e. Total RCA Implementation Costs for Year 1: $US 342,500

   YEAR 2
   a. Training 100 hourly employees in 3 days of RCA: $US 0
   b. Training 1 salaried person in 5 days of RCA: $US 0
   c. 10% of 30 hourly employees' time per week, annually: $US 150,000
   d. Salary of RCA Driver/Year: $US 70,000
   e. Total RCA Implementation Costs for Year 2: $US 220,000*

   * All costs of resources to prove hypotheses and implement recommendations are considered as sunk costs.
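The return and cost assumptions above reduce to a few lines of arithmetic. The sketch below is only an illustration of that tally: the function name, parameter names and dictionary keys are ours and do not come from the text, while every figure is one of the stated assumptions. Plugging in your own numbers is then a matter of changing the arguments.

```python
# Illustrative tally of the return and cost assumptions listed above.
# Function, parameter and key names are hypothetical; the figures are from the text.

def annual_return(trained=100, usage_rate=0.30, team_size=3,
                  analyses_per_team_per_year=6, net_per_analysis=50_000):
    """Estimated annual return in $US from the stated assumptions."""
    practitioners = round(trained * usage_rate)       # 30 people applying RCA
    teams = practitioners // team_size                # 10 teams of three
    analyses = teams * analyses_per_team_per_year     # 60 analyses per year
    return analyses * net_per_analysis

YEAR_1_COSTS = {
    "train 100 hourly employees (3 days)": 120_000,
    "train 1 salaried person (5 days)": 2_500,
    "10% of 30 hourly employees' time": 150_000,
    "salary of RCA driver": 70_000,
}

# Year 2 repeats Year 1 without the one-time training costs.
YEAR_2_COSTS = {
    **YEAR_1_COSTS,
    "train 100 hourly employees (3 days)": 0,
    "train 1 salaried person (5 days)": 0,
}

if __name__ == "__main__":
    print(f"Annual return: ${annual_return():,}")
    print(f"Year 1 costs:  ${sum(YEAR_1_COSTS.values()):,}")
    print(f"Year 2 costs:  ${sum(YEAR_2_COSTS.values()):,}")
```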
[...] available and budgeted for, regardless of RCA. Also, recommendations from RCA generally result in the implementation of organizational system corrections. For instance: rewriting procedures, providing training, upgrading testing tools, restructuring incentives, etc. These types of recommendations are not generally considered as capital costs. Capital costs resulting from RCA, in our experience, are not the norm, but the exception.

4. Return-On-Investment
   a. Total Expected Return, Year 1: $US 1,500,000*
   b. Total Expected Costs, Year 1: $US 342,500
   c. ROI Year 1: 437%

   * Assumes that it will take six (6) months to train all involved and get up to speed with actually implementing RCA and the associated recommendations. This is the reasoning for cutting this expectation in half for the first year.

   a. Total Expected Return, Year 2: $US 3,000,000
   b. Total Expected Costs, Year 2: $US 220,000
   c. ROI Year 2: 1360%

As we can tell from these numbers, the opportunities are left to the imagination. They are real; they are phenomenal to the point they are unbelievable. When we review the process we just went through, look at the conservativeness built in:

1. Only 30% of those trained will actually apply the RCA method
2. Students will spend only 10% of their time on RCA
3. Students will work in teams of three (3) or more
4. Students will complete only one (1) RCA every two months
5. Each event will net only $US 50,000/year

Use this same cost-benefit thought process and plug in your own numbers to see if the ROIs are any less impressive. Using the most conservative stance, it would appear irrational NOT to perform RCA in the field. How many of our engineering projects would be turned down if we demonstrated to management a ROI ranging from 437% to 1360%? Not many!

INSTITUTIONALIZING ROOT CAUSE ANALYSIS (RCA)
IN THE SYSTEM

In an era where most college graduates will likely be employed by a minimum of five employers in their careers, stability or turnover is difficult to control. This poses a problem with what is often called corporate memory. Corporate memory is the ability to retain the knowledge and experience of the workforce in the midst of a high turnover environment. How does a company expect to produce a quality product in a consistent manner when its workforce is inconsistent? This is an especially [...] the mass exodus of knowledge and experience occurs in industry, how will businesses compensate and be able to compete in the global economy?

RCA actually can play a major role in filling this corporate memory void. RCA is a tool that maps out a process used to successfully solve a problem. This map in essence is an aggregated thought based on the collective knowledge and experience of our workforce. What we need to do is 1) encourage the activity of RCA in a disciplined manner and 2) electronically catalogue these analyses in a manner in which future employees can view how previous analysts derived their conclusions.

Activity one above can be accomplished by writing a procedure for RCA that will survive the absence of a previous RCA champion. We want the activity of RCA to still be expected by the organization via policy and procedure.

The following is a sample RCA procedure¹ we have used in industry in the past. It should be used as a draft to model a more accommodating one for an individual facility.

RELIABILITY CENTER, INC.
Sample PROACT RCA Procedure

1. PURPOSE
   a. To provide consistency to the organization in the application of the PROACT Root Cause Analysis (RCA) Process.
   b. To provide guidance in the following areas:
      Requests
      Analyses
      Reporting
      Presenting
      Tracking
2. APPLICATION/SCOPE
   This procedure applies to all users of the PROACT process conducted in compliance with all Safety Policies and Procedures unless otherwise directed by the Department Manager.
3. RESPONSIBILITY
   a. The Supervisor of Reliability Engineering (or equivalent) shall have the responsibility to review, amend, and revise this procedure as necessary to ensure its integrity and application.
   b. The Supervisor of Reliability Engineering (or equivalent) shall have the responsibility to develop, implement, review, and revise related procedures and/or documents required in this procedure.
4. DEFINITIONS
   a. Champion: Usually a person in authority that sponsors and mentors the principal analysts and supports the RCA effort.
   b. Charter: Defines the charter (or mission) of the RCA effort.
   c. Chronic Events: Events that occur repetitiously.
   d. Critical Success Factor (CSF): Identifiable marker that will signal the RCA effort has been successful. Guidelines in which the RCA team operates.
   e. Logic Tree: A graphical representation of logic used to uncover physical, human and latent root causes.
   f. Opportunity Analysis (OA): A technique to identify the most important failures (significant few) to analyze.
   g. Principal Analyst (PA), Qualified: The individual assigned the responsibility of leading and completing the RCA. The individual is qualified based on their successful completion of the PROACT Certification Workshop.
   h. PROACT: A software program that facilitates the PROACT RCA process.
   i. Root Cause Analysis (RCA): Any evidence-driven process that, at a minimum, uncovers underlying truths about past adverse events, thereby exposing opportunities for making lasting improvements.
   j. Significant Few: The 20% of the failure events that have been deemed to be accountable for 80% of the loss. This information is derived from the OA.
   k. Sporadic Event: A one-time catastrophic event.
   l. Vital Many: The many deviations that occur in a facility that equate to continuous improvement efforts.
5. REFERENCES
   a. Site Policy Manual
   b. Site Safety Manual
   c. Site Quality Manual
6. SPORADIC EVENTS
   a. An RCA is requested for sporadic events with a total cost (maintenance, operations and lost profit opportunities) greater than $100,000. Listed below are several examples:
      Unpredicted Event
      Property Damage
      Lost Production
   b. An RCA is requested for incidents that resulted in or could have resulted in personal injury or damage to equipment or property as defined in Section X of the Safe Practices Manual.
   c. An RCA is requested for repeat customer complaints and complaints from key customers.
7. SIGNIFICANT FEW
   A Qualified PA will lead the RCA of the Significant Few events that were identified by the Department OA, unless redirected by the Reliability Coordinator and/or the Department Manager.
   a. Assignment of Champion: The Division Reliability Coordinator will be assigned as the champion of the event that falls within their Division.
      i. A qualified principal analyst (PA) will be assigned as the PA for the Significant Few events assigned to the department.
8. VITAL MANY/CONTINUOUS IMPROVEMENT
   The RCA of the Vital Many events will be led by a PA or other qualified personnel that are not in the Reliability Engineering group.
   a. Assignment of Champion: The Division Reliability Coordinator will be assigned as the champion of the event that falls within their division.
      i. A PA or other qualified personnel will be assigned or obtained by the Division Reliability Coordinator to lead the RCA.
      ii. The Division Reliability Coordinator's role is to provide the resources or obtain the resources that the PA needs to do the job right and to identify and remove obstacles that hinder their analysis.
9. DETERMINATION OF TEAM MEMBERS
   Certain events will require a team to be formed while others will not. If a team needs to be assembled the PA will make a recommendation to the Division Reliability Coordinator. The following items also need to be addressed when selecting the team.
      Multi-disciplined (i.e., mechanical, electrical, financial, managerial, hourly, etc.).
      Personnel directly affected by problem or event.
      Personnel who may be involved with implementation of solution.
      Excused from normal work assignments while working on RCA (similar to HAZOP Studies).
10. RCA METHODOLOGY
   a. When a team has been formed that is not familiar with RCA, the team will attend, at a minimum, a one-day problem solving methods (PSM) course before proceeding with the analysis.
   b. The team will accurately define the event.
   c. The charter and critical success factors (CSFs) of the analysis need to be developed so each team member knows the purpose of the analysis effort and if the effort is successful.
   d. Develop Strategy for Collecting the 5-Ps: The team or PA needs to develop the strategy for capturing the 5-Ps. This may involve taking pictures, retrieving data from the operating instrumentation, interviewing personnel, etc. The urgency with which this data is collected will depend upon whether this is a chronic or sporadic event.
   e. Assignment of 5-Ps: The PA will assign the 5-Ps (listed below) to team members who will be responsible for collecting the data.
      Parts
      Position
      People
      Paradigms
      Paper
   f. Analyze: Using the data collected, develop a logic tree.
      i. The logic tree will not be considered complete unless all the applicable latent roots are identified.
   g. Hypothesis Verification: Each hypothesis block on the logic tree needs to be verified (proven or disproven). This is one of the most critical
      steps in the RCA process. Without verification, the findings and recommendations of the RCA are meaningless.
   h. Review Logic Tree: The PA will contact the Division Reliability Coordinator when the team is ready to review the logic tree. The review should take place before proceeding with the report and the formal publishing of the analysis in the PROACT software program.
   i. Write Report: The report should include the following sections:
      Executive Summary
      Description of Event
      Description of Mechanism
      Review of Causes and Recommendations
      Assignment of Responsibilities and Time Lines
      Detailed/Technical Section
      Detailed Recommendations
      Appendices
      Participants Involved
      5-Ps Data Collection Forms
      Verification Logs
      Logic Tree
   j. Develop Draft Recommendations: A presentation of the findings of the RCA shall be given to personnel affected by implementing the recommendations and to personnel who will implement the recommendations and others as applicable. This will provide input that may affect or change specifics about the recommendations.
   k. Revise and review the recommendations as necessary.
   l. Develop corrective action items for each of the recommendations.
   m. Formally present findings and recommendations to the Reliability Team and/or appropriate management personnel for implementation approval.
11. UTILIZATION OF PROACT RCA SOFTWARE
   All documentation of RCAs is to be stored electronically using the PROACT RCA software program on the designated client server. Use of this program shall be in strict accordance with the license to the corporation.
   a. User Prerequisites: All users of PROACT must first successfully complete requisite training in one or more of the following courses based on their participation in the analysis.
      i. PROACT RCA Methods: All Principal Analysts (PA) shall complete the five-day RCA Methods course either on-site or at a public location. It will be at the discretion of the PA to determine which team members receive the password for password-protected analyses.
      ii. PSM (Problem Solving Methods): All RCA team members shall successfully complete the one-day PSM training by a licensed PSM trainer.
      iii. PROACT Software Training: All users of PROACT RCA software shall successfully complete either the five-day RCA Methods training or the one-day PSM training before becoming eligible for PROACT software training. All potential PROACT users are required to attend a four-hour short course in hands-on PROACT instructor-led training.
   b. The PA shall be responsible for the complete accuracy of the analysis utilizing the software program. Team members shall update their responsibilities in any given analysis; however, the PA is ultimately responsible for reviewing the accuracy and thoroughness of the complete analysis.
   c. The PA will assume the responsibility of when it is time to publish the RCA. Publishing the analysis in PROACT means that the completed RCA is certified to be credible and thorough. Once published, the analysis serves as a logic template for the rest of the corporation. Publishing also means that all sensitive materials have been reviewed by the legal department and have been approved for publishing in this format.
   d. The PA will reserve the right to password protect the RCA. Only team members of that specific RCA shall be permitted to have the password. It shall be the responsibility of the PA to remove the password once the RCA has been published.
12. CORRECTIVE ACTION AND TRACKING
   Personnel will be assigned responsibility for the corrective actions necessary to implement the recommendations that result from the RCA. These corrective actions will be tracked and a report issued.
   a. The Division Reliability Coordinator and PA will assign responsibility for the corrective action items unless otherwise directed by the Department Manager or his designee.
   b. The PA will notify a member of the reliability group (RG) that the corrective action items have been assigned.
   c. The PA will see that a copy of the full report (hardcopy and electronic) is given to the RG for filing purposes.
   d. RCAs that result from events listing safety procedures will primarily be handled by plant protection or environmental affairs. These departments are responsible for tracking corrective action items that result from these RCAs.
   e. All RCA corrective action items will be issued as needed in a report to the personnel assigned responsibility for the items. The corrective action items will remain in the report until completed.
   f. Updates to the report can be forwarded to the Division Reliability Coordinator as they are completed and will be incorporated into the next quarterly report.
   g. A progress report will be sent to the department manager for review.
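Sections 6 through 8 of the sample procedure amount to a triage rule for incoming events. As a rough sketch only, that routing could be written as below. The Event fields, the function name and the order in which the conditions are checked are our own assumptions, and none of this reflects how the PROACT software itself works; the $100,000 threshold is the one stated in Section 6a.

```python
from dataclasses import dataclass

SPORADIC_COST_THRESHOLD = 100_000  # $US, from Section 6a of the sample procedure


@dataclass
class Event:
    """Hypothetical event record; the field names are illustrative only."""
    total_cost: float = 0.0              # maintenance + operations + lost profit opportunities
    sporadic: bool = False               # one-time catastrophic event
    injury_or_damage_risk: bool = False  # resulted in, or could have resulted in (6b)
    customer_complaint: bool = False     # repeat or key-customer complaint (6c)
    significant_few: bool = False        # identified by the department OA (Section 7)


def route_event(event: Event) -> str:
    """Return the section of the sample procedure that applies to the event."""
    if event.sporadic and event.total_cost > SPORADIC_COST_THRESHOLD:
        return "Section 6a: RCA requested"
    if event.injury_or_damage_risk:
        return "Section 6b: RCA requested"
    if event.customer_complaint:
        return "Section 6c: RCA requested"
    if event.significant_few:
        return "Section 7: RCA led by a Qualified PA"
    # Assumption: everything else is treated as a Vital Many event.
    return "Section 8: RCA led by a PA or other qualified personnel"
```

A site adapting the procedure would replace the threshold and the conditions with its own criteria; the point of the sketch is only that each section of the procedure maps to an explicit, testable rule.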
[Figure 3.1 boxes: Decide to implement root cause analysis methods training; Champion the reliability performance process or choose designee; Establish performance criteria and delineate landmarks; Communicate value of RCA training and TRPP and gain [...]]
FIGURE 3.1 TRPP Executive Management Roles

[Figure 3.2 boxes: Clear the path for improvement work; Assure that the support systems are working; Resource improvement work]
FIGURE 3.2 TRPP Champion Roles

APPENDIX 1: SAMPLE RCA PROCEDURE
[Chart text not recoverable]
[Figure 3.4 boxes: Make appropriate arrangements for RCA training; Provide RCA methods training; Assist in ensuring that support systems are working; Document metrics and savings; Communicate performance]
FIGURE 3.4 TRPP Driver Roles

FIGURE 3.5 Ideal Position for Reliability on Organizational Chart

4
Failure Classification

To begin discussing the issue of Root Cause Analysis we must first begin setting the foundation with some key terminology. As mentioned earlier in this text, one of the primary reasons for the misinterpretation of "RCA" is that there is no standard definition against which to benchmark. Therefore, everyone defines RCA as they please and the result is equating shallow cause methodologies to root cause methodologies.

For our purposes in this chapter, let's begin by discussing the key differences between the terms problems and opportunities. There are many people who tend to use these terms interchangeably. However, the truth is these terms are really at opposite ends of the spectrum in their definitions.

A problem can be defined as a negative deviation from a performance norm. What exactly does this mean? It simply means that we cannot perform up to the normal level or standard that we are used to. For example, let's assume we have a widget factory. We are able to produce 1,000 widgets per day in our factory. At some point we experience an event that interrupts our ability to make widgets at this level. This means that we have experienced a negative deviation from our performance norm, which in this case is 1,000 widgets.

An opportunity is really just the opposite of a problem. It can be defined as a chance to achieve a goal or an ideal state. This means that we are going to make some changes to increase our performance norm or status quo. Let's look back at our widget example. If our normal output were 1,000 widgets per day, then any changes we make to increase our throughput would be considered an opportunity. So if we eliminate certain bottlenecks from the system and start to produce 1,100 widgets in a day, this would be considered an opportunity.

Now let's put these terms into perspective. When a problem occurs and we take action to fix it, do we actually improve or progress? The answer to this question is an emphatic no. When we work on problems we are essentially working to maintain the status quo or performance norm. This is synonymous with the term reaction. We react when a problem occurs to get things back to their normal, status quo state. If all we do is work on problems we will never be able to progress. In our dealings with companies all over the world, we often ask the question, "How much time do you spend reacting versus proacting in your daily routines?" Most surveyed will answer 80% reacting and 20% proacting. If this is true, then there is very little progress being made. This would seem to be a key indicator as to why most productivity increases are minimal from year to year.

Let's consider opportunities for a moment. When we work on opportunities do we progress? The answer is yes. When we achieve opportunities we are striving to raise the status quo to a higher level. Therefore, to progress we have to begin taking advantage of the numerous opportunities presented to us. So if working on problems is like reacting, then working on opportunities is like proacting.
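The two definitions can be made concrete with the widget numbers used above. What follows is a small sketch under our own assumptions: the function name and the "status quo" label for a zero deviation are illustrative and do not come from the text, and the 850-widget day is a made-up input used only to show the negative case.

```python
def classify_deviation(performance_norm, actual_output):
    """Label a day's output relative to the performance norm."""
    deviation = actual_output - performance_norm
    if deviation < 0:
        return "problem"        # negative deviation from the performance norm
    if deviation > 0:
        return "opportunity"    # output above the norm: the status quo is being raised
    return "status quo"         # assumption: no deviation is neither


if __name__ == "__main__":
    norm = 1_000  # widgets per day, from the example in the text
    print(classify_deviation(norm, 850))    # hypothetical day with an interruption
    print(classify_deviation(norm, 1_100))  # bottlenecks removed
```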
[Figure 4.1 text: A negative deviation from a performance norm]
FIGURE 4.1 Problem Definition Graph

[Figure 4.2 text: A chance to achieve a goal or an ideal state.]
FIGURE 4.2 Opportunity Definition Graph

[Figure 4.3 labels: Opportunities; Status quo; Problems]
FIGURE 4.3 Opportunity Graph

The answer is simple. We should all start working on opportunities and disregard problems, right? Why can't we do this? There are many reasons, but a few are obvious. Problems are more obvious to us since they take us away from our normal operation. Therefore they get more attention and priority. We can always put an opportunity off until tomorrow, but problems have to be addressed today. There is also the issue of rewards. People who are good reactors, who come in and save the day, tend to get pats on the back and the old "atta-boys." What a great thing from the reactor's perspective: recognition, overtime pay and most importantly, job security. We have seen many cases where the person who tries to prevent a problem or event from occurring gets the cold shoulder while the person who comes in after the event has occurred gets treated like a king. Not to say we should not reward exemplary reactors, but we also have to reinforce good proactive behavior as well.

Then there is the risk factor. Which are more risky, problems or opportunities? Opportunities are always more risky since there are many unknowns. With problems there are virtually no unknowns. We usually have fixed the problems before, so we certainly have the confidence to fix them again. I once had a colleague who said, "when you get really good at fixing something, you are getting way too much practice." In a perfect world, we should have to pull the manual out to see what steps to take to fix the problem. How many times do we see a craftsman, or even a doctor for that matter, pulling out the manual to troubleshoot a problem? People today do not want to take a lot of chances with their career, so opportunities begin to look like what we like to call "career limiting" activities. One of the top 10 causes of human error is "over confidence."

So with that said, we have to figure out a way of changing the paradigm that reactive is always more important than proactive work. This means opportunities are just as important, if not more important than problems.

Let's switch gears and talk about the different types of failures or events that can occur. Incidentally, when we talk about failures we are not always talking about machines or equipment. "Failure" can also be unexpected patient deaths, operational upsets, administrative delays, quality defects or even customer complaints. There are two basic categories of failures that can exist: sporadic and chronic. Let's look at each of these categories in greater detail.

A sporadic (to be used synonymously with acute) occurrence usually indicates that a dramatic event has occurred. For example, maybe we had a fire or an explosion in our manufacturing plant, we just lost a long-standing contract to a competitor or a patient died unexpectedly. These events tend to demand a lot of attention, and not just attention, but urgent and immediate attention. In other words, everyone in the organization knows something bad has happened. The key characteristic of sporadic events is they happen only once. Sporadic failures have a very dramatic impact when they occur, which is why many people tend to apply financial figures to them. For instance, you might hear someone say, "We had a $10 million failure last year."

Sporadic events are very important, and they certainly do cost a lot of money when they occur. The reality, however, is that they do not happen very often. If we had a lot of sporadic events we certainly would not be in business very long. Sporadic losses can also be distributed over many years. For example, if the engine in your car fails and you need to replace it, it will be a very costly expense, but you can amortize that cost over the remaining life of the car.

Chronic events on the other hand are not very dramatic when they occur. These types of events happen over and over again. They happen so often that they actually become a cost of doing business. We become so proficient at working on these events that they actually become part of the status quo. We can produce our "normal" output in spite of these events.

Let's look at some of the characteristics of chronic events. Chronic events are accepted as part of the routine. We accept the fact they are going to happen. In a manufacturing plant, we will even account for these events by developing a maintenance budget. A maintenance budget is in place to make sure that when routine events occur we have money on hand to fix them. These types of events do, however, demand attention but usually not the attention a big sporadic or acute event would.

The key characteristic of a chronic event is the frequency factor. These chronic events happen over and over again for the same reason or mode. For instance, on a given pump failure, the bearing may fail three or four times a year. Or you have a bottle filling line, and the bottles continuously jam. Both would be considered chronic events. Chronic events tend not to get the attention of sporadic events because on their individual occurrences, they are usually not very costly. Therefore, rarely would we ever assign a dollar figure to an individual chronic event.
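The distinction between the two categories comes down to how often an event recurs for the same reason or mode. A minimal sketch of that idea follows, assuming a hypothetical event log of (failure mode, date) pairs and treating any mode seen more than once in the period as chronic. The log entries, the function name and the "more than once" cutoff are our own illustration, not data or rules from the text.

```python
from collections import Counter


def classify_modes(event_log):
    """Map each failure mode to 'sporadic' (seen once) or 'chronic' (recurs)."""
    counts = Counter(mode for mode, _date in event_log)
    return {mode: ("chronic" if n > 1 else "sporadic") for mode, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical one-year log: a pump bearing that fails four times and one fire.
    log = [
        ("pump bearing failure", "2005-02-03"),
        ("pump bearing failure", "2005-05-17"),
        ("plant fire", "2005-06-30"),
        ("pump bearing failure", "2005-08-09"),
        ("pump bearing failure", "2005-11-21"),
    ]
    for mode, label in classify_modes(log).items():
        print(f"{mode}: {label}")
```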
[Figure 4.4 axes: daily production (ticks at 5,000 and 10,000) against time; labels: Status quo, Chronic failures, Sporadic failures]
FIGURE 4.4 The Linkage

What most people fail to realize is the tremendous effect the frequency factor has on the cost of chronic failures. A stoppage on a bottling line due to a bottle jam may take only five minutes to correct when it occurs. If it happens five times a day, we are looking at 152 hours of downtime per year. If an hour of downtime costs $10,000, then we are looking at a cost of approximately $1,520,000. As we can see, the frequency factor is very powerful. But since we tend to see chronic events only in their individual state we sometimes overlook the accumulated cost. Just imagine if we were to go into a facility and aggregate all of the chronic events over a year's time and multiply their effects by the number of occurrences. The yearly losses would be staggering.

Let's take a look at how chronic and sporadic events relate to the discussion on problems and opportunities. Sporadic events by their definition take us below the status quo and tend to take an extended period of time to restore. When we restore we get back to the status quo. This is very much like what happens when we react to a problem. The problem occurs and we take some action to get back to the status quo. Chronic events, on the other hand, happen so routinely that they actually become part of the status quo or the job. Therefore, when they occur they do not take us below our performance norm. If, in turn, we were to eliminate the chronic or repetitive events, then the elimination would actually cause the status quo to improve. This improvement is the equivalent of realizing an opportunity. So by focusing on chronic events, eliminating the causes and not simply fixing the symptoms, we are really working on opportunities. As we said before, when we work on opportunities the organization actually progresses.

Now that we know that eliminating chronic events can cause the organization to progress, we have to look at the significance of chronic events. Sporadic events by their very nature are high profile and high cost events. But we can amortize those costs over a long period of time so the effect is not as severe. Consider if the engine in your car blew up and you had to replace it. To the average motorist this would be a sporadic event. But if we amortize the cost over the remaining life of the car it becomes less of a burden. Chronic events on the other hand have a relatively low impact on their individual basis, but we often overlook their true impact. If we were to aggregate all of the chronic events from a particular facility and look at their total cost over a one-year period we would see that their impact is far more significant than any given sporadic event, simply due to the frequency factor.

Consider how all of the events actually affect the profitability of a given facility. As we all know, we are all in business to make a profit. When a sporadic event occurs it actually affects the profitability of a facility significantly the year that it occurs, but, once the problem has been resolved, profitability gets back to normal. The dilemma with chronic events is that they usually never get resolved so they affect profitability year after year. If we were to eliminate such events instead of just reacting to their symptoms, we could make great strides in profitability. Imagine if we had ten facilities and we were able to reduce the amount of losses in order to obtain 10% more throughput from each of those facilities. In essence we would have the capacity of one new facility without spending the capital dollars. That is the power of resolving chronic issues.

Let us give an example of a chronic event success story. In a large mining operation the management wanted to uncover its most significant chronic events. This operation has a large crane or "drag line." This drag line mines the surface for the product. The product is then placed on large piles where a machine called a bucket wheel moves up and down the pile putting the product onto a conveyor system. This is where the product is taken downstream to another process of the operation. One day, one of the analysts was talking to one of the field maintenance representatives who said they spend a majority of time resetting conveyor systems whose safety trip cord was triggered. They estimated that this activity took anywhere from 10 to 15 minutes to resolve per trip. Now this individual did not see this activity as a "failure" by any means. It was just part of the job he had to do. Upon further investigation, it was discovered that other people were also resetting tripped conveyors. By their estimation this was happening approximately 500 times a week to the tune of about $7 million per year in lost production. Just identifying this as an undesirable event allowed them to take instant corrective action. By adding a simple procedure of removing large rocks with a bulldozer prior to bucket wheel activity, approximately 60% of the problem went away. These types of stories are not uncommon. We get so ingrained in what we are doing that we sometimes miss the things that are so obvious to outsiders.

Similarly in a hospital setting, we were looking at the number of times blood had to be redrawn in an emergency room of a 225-bed acute care hospital. At the conclusion of our opportunity analysis (to be discussed in detail in Chapter 5), we found that 10,013 blood redraws were taken in the past 12-month period. Next we aggregated the average costs per blood redraw. These costs include things like the costs for syringes, gauze, tech time, transport time, opportunity costs for the real estate in the operating room, etc. When compiled, we found that on average each blood redraw was costing $300. The math is simple from this point on. We multiply 10,000 redraws times $300/redraw and we uncover a whopping $3 million worth of hidden losses. On any individual occurrence no one sees this as a failure. It is viewed as a cost of doing business. This is the power of evaluating chronic failures.

To wrap this up we will end with yet another story. We were working with a major oil company that was trying to reduce its maintenance budget. They hired our
48 I<oot Cause Analysis: Improving Perform\lnce far Bottom-Line Results Failure Classification 49

!
firm to teach thcm tl1e methocls being explainecl in t11is text. The manager opened So to sum up this discussion on failure c1assification, let's look at the key ideas
thc three-day session by stating t11at he had been mand~ted by his superiors to reduce presented. We live in a world 01' problems and opportunities. V/e would all love to
th~ main.tenanc~ ~u.clget signiican~ly, He t~lcl the~1 t,h1t the maintemmce bLld~ct for takc advantagc of every opportunity thal carne about, but it seems as if there are too
ttns parllculax tacll1ty was approxllllate1y $250 mllllO~1. He "vent on to expla1l1! tl1at many problems confronting LIS to take advantage of the opportunities. A good way
somc analysis \Vas clone 011 the buclget to fincl out how the money was being spent. to take ac1vantage in a business situation ls to eliminate the cmonic or repetitive
It tllrns out that 85% of the money was spent in i11crements oE $5,000 or lcs~. So events th<1t confront us each and every duy. By eliminating this expensive, nonvalue-
ir
by his cstlmation he was spending abOl~t $212 millian chronic maintenance 10$ses.
This was just maintenance cost, 110t 10st production q)st. :
added work, we are really achieving opportunities as well as adding additional time
to eliminate more problems. In the next chapter \Ve will discuss a method for
So he teIls the 25 engineers in the: training class J1e has two options to recluce uncovering all 01' the evcnts for a given process and delincating which of those events
this maintenance cost: I . are the most signiticant from a business perspective.

l. He can eliminate the need to do the work in the first place or,
RCA AS AN APPROACH
2. He conle! jnst eliminate maintepance jobs. I

We mentiolled this briefly in the Introcluction, but it is also appropriate to menton


He says that i1' they could eliminate the neeel to alo the work in 1'he f-lrst place here. RCA ls certainly applicable to both chronic and sporadic events in any industry.
(e.g., reduce the <1mount of chronic or repetitive fail~lreS) then he feH they contd I-Iowever, focusing on RCA as only an incident or accident tool is not optimizing
reduce the maintenance expenclitures by about 20%. Th~s would be a savings of about its potential for the organization. Using RCA in this fashion limits its effectiveness
$42 million. If they were real1y successful they couIel ~liminate 30% or $63 miUion. and treats it as an off-the-shelf tool for reactive siruations.
He goes on to say that "if 1 talce option two and Jet alPproximately 100 maintenance When using RCA as an approach, we seek to break the paradigm that chronic
people go, that will probably net the company about $7S million of which 1 will have events are an accepted cost of doing business because they are compensated for in
the sarne, if not more work, ancI fewer people to have t~ address the adclitional work." the budget. V/e seek to solve these chronic events clown to their root causes ancl pass
To make a long story short, the people in the training class opted for option 'one, the kllowledge on to others in the organization \vho may be accepting tbem as a
reclucing the neeel to do the work using their abilities tb solve problems. ! cost of doing business as well. This is the knowledge management and transfer

~
__________-+1_____.----
, I
component of RCA that we discussecl earIier.
Also, many do not realize that the chronic types of events are actually precursors
TABLE 4.1 to the sporadic cvcnts. It is om expericnce that when reviewing the sporaclic inves-
Options to Reduce Maintenance Budget tigations tlUit we have been involved in over the past 20 years that rarely do we find
revelations. M.ost of the time we find the true latent causes to be systems that are ,4-
in place and have been the norm for some time. They have been chTonically accepted
over the years to the point that no one questions them anymore.
AH it takes 18 one trigger, one decision, to make a chronic event a sporadic onel
This was clemonstratecl on the space shuttle Challenger as the o-ring design ftaws
were known from the beginning. That chronic problem existed for years and was
an acccplable risk according lo the flight reacliness plan. In the Challenger Disaster
Final Repoft this gradual deterioration oY standards was refelTed to as normalization
of deviancc. Only \\1hen lhe decision was made to launch at 36F (1SF colder than
any other llight), dicl tbat chronic failure become a sporadic one. Bridging this to
our working environments: Can't this happen to us? Doesn't this happen to us?
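The arithmetic behind the two stories in this chapter can be checked with a minimal sketch. The function names are our own assumptions for illustration, and the figures are the rounded ones quoted in the text (10,000 redraws, and a chronic spend rounded to $212 million before the 20% and 30% savings are taken).

```python
def annual_loss(occurrences_per_year, cost_per_occurrence):
    """Yearly loss from one chronic event: frequency times cost per occurrence."""
    return occurrences_per_year * cost_per_occurrence


def percent_of(amount, percent):
    """Whole-dollar share of an amount, using integer arithmetic."""
    return amount * percent // 100


# Hospital example: about 10,000 blood redraws at an average of $300 each.
redraw_loss = annual_loss(10_000, 300)

# Oil company example: 85% of a $250 million budget spent in small increments.
chronic_spend = percent_of(250_000_000, 85)

# The text rounds the chronic spend to $212 million before taking 20% and 30%.
rounded_spend = 212_000_000
savings_20 = percent_of(rounded_spend, 20)
savings_30 = percent_of(rounded_spend, 30)

print(f"Blood redraw losses:   ${redraw_loss:,}")
print(f"Chronic maintenance:   ${chronic_spend:,}")
print(f"Savings at 20% / 30%:  ${savings_20:,} / ${savings_30:,}")
```

No single redraw or small work order looks like a failure, but the yearly totals are large enough to justify an analysis.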
5 Opportunity Analysis: "The Manual Approach"

With all the noise and distraction of a reactive work environment it is sometimes easy to overlook the obvious. For instance, if we wanted to perform a Root Cause Analysis (RCA) on an event, would we know which event was the most significant or costly? Experience demonstrates that we would not. In a reactive environment, we naturally become focused on the short-term. We tend to look at the problems or events that just happened and naturally think they are the most significant. This is a problem because what happened yesterday, in most cases, is not the most significant or compelling issue. We need to take a more macro look at the situation. For these reasons we must depend on the strategy development process described earlier to ensure that we are working on the events that truly add value to the bottom line of the business.
In order to determine where our most significant issues are we should employ techniques that will allow us to look objectively at all the historical events contributing to our performance or lack thereof. Failure Modes and Effects Analysis or FMEA was developed in the aerospace industry to determine what failure events could occur within a given system (e.g., a new aircraft) and what the associated effects would be if those events did indeed occur. This technique, albeit effective, is very man-hour intensive. It is estimated that a typical FMEA in the aerospace industry takes numerous man-years to perform. There are many good reasons why this technique takes so long to perform as well as significant benefits to this industry. However, this technique is far too laborious to be performed in most industries such as the process and discrete manufacturing arenas. Therefore, we had to take the basic concept and make it more industry friendly. When discussing this modified FMEA technique we will refer to it as opportunity analysis or OA.
Before we continue with the discussion on how to develop an opportunity analysis, let's first talk about why you would want to perform one in the first place. There really are two basic reasons to perform an opportunity analysis. The first and foremost is to make a legitimate business case to analyze one event versus another. In other words, it provides the financial or business reason: a listing of all the events within a given organization or system that delineates, in dollars and cents, why you are choosing one issue versus another. It allows the analyst to speak in the language of business.
The second compelling reason is to focus the organization on what the most significant events really are, so that quantum leaps in productivity can be made with fewer of the organization's resources being utilized. Experience again has shown that the Pareto Principle¹ works with such events just like it does in other areas. It goes something like this: 20% or less of the undesirable events that we uncover

¹ PROACT RCA Methods Course is copyrighted by Reliability Center, Inc., Hopewell, VA



by conducting an in-depth opportunity analysis will represent approximately 80% of the losses for that organization. You may have heard it also called the 80/20 rule. We will talk more about the 80/20 rule later in this chapter.

As we mentioned before, the FMEA technique was developed in the aerospace industry and we will refer to this as the traditional FMEA method. Modifications are necessary to make the traditional FMEA more applicable in other organizations. Therefore, based on the modifications that we will explain in this chapter, we will call this technique the opportunity analysis. The key difference between the two methods is that the traditional method is probabilistic, meaning that it is looking at what could happen.

In contrast, opportunity analysis looks only at historical events. We list only items that have actually happened in the past. For the historical method, we are not as interested in what might happen "tomorrow" as we are in what did happen yesterday.

Let's take a look at a simple example of both a traditional FMEA and an Opportunity Analysis. Our intention is not so much to develop experts in traditional FMEA as it is to give a general understanding of how FMEA and hence Opportunity Analysis were derived. In the aerospace industry, we would perform a traditional FMEA on a new aircraft that is being developed. So the first thing we might do is to break the aircraft down into smaller subsystems. A typical aircraft would have many subsystems such as the wing assembly, instrumentation system, fuselage, engines, etc.

FIGURE 5.1 Aircraft Subsystem Diagram

From there the analysis would look at each of the subsystems and determine what failure modes might occur and, if they did, what their effects would be. Let's take a look at a simple example in the following table:

TABLE 5.1
Traditional FMEA Sample

In Table 5.1, we begin by looking at the turbine engine subsystem. We begin listing all of the potential failure modes that might occur on the turbine engines. In this case, we might determine that a turbine blade could fracture. We then ask what the effects on other items within the turbine engine subsystem might be. If the blade were to release, it could fracture the other turbine blades. The effects on the entire system, or the aircraft as a whole, would be loss of the engine and reduced power and control of the aircraft. We then begin examining the severity of the failure mode. We will use a simple scale of 1 to 10 where "1" is the least severe and "10" is the most severe. We have simplified this for explanation purposes, but a traditional FMEA analyst would have specific criteria for what constitutes severity. In this example, we will say that losing a turbine blade would constitute a severity of 8. Now comes the probability rating. We would have to collect enough data to determine the relative probability of this occurrence based on the design of our aircraft. We will assume that the probability in this case is 0.02 or 2%. The last step is to multiply the severity times the probability to get a criticality rating. In this case the rating would be calculated as follows:

Severity × Probability = Criticality
8 × 0.02 = 0.16

EQUATION 5.1 Sample Criticality Equation

This means that this line item in the FMEA has a criticality rating of 0.16. We would then repeat this process for all of the failure modes in the turbine engines and all of the other major subsystems.

Once all of the items have been identified it is now time to prioritize. We would sort our criticality column in descending order so that the largest criticality ratings would bubble up to the top and the smaller ones would fall to the bottom. At some point the analyst would make a cut specifying that all criticalities below a certain number are delineated as an acceptable risk, and all above need to be evaluated to determine a way to reduce the severity and, more importantly, the probability of occurrence.

Bear in mind that this is a long-term process. There is a great deal of attention placed upon determining all of the possible failure modes and even greater attention paid to substantiating the severity and probability. Thousands of hours are spent running components to failure to determine probability and severity. Computers, however, have helped in this endeavor, in that we can simulate many occurrences by building a computer model and then playing "what if" scenarios to see what the effects would be.

We do not have the time or resources in business, healthcare and industry to perform a thorough traditional FMEA on every system. Nor does it make economic sense to do so on every system. What we have to do is modify the traditional FMEA process to help us to uncover the problems and failures that are currently occurring. This allows us to see what the real costs of these problems are and how they are really affecting our operation. Let's look at a simple example.
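Before moving to that example, the traditional FMEA ranking described above (severity times probability, sorted in descending order, then split at a cutoff) can be sketched as follows. This is only an illustration under stated assumptions: the class and function names are ours, the cutoff of 0.10 is hypothetical, and only the turbine blade line item comes from the text.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    subsystem: str
    mode: str
    severity: int       # 1 (least severe) to 10 (most severe)
    probability: float  # relative probability of occurrence, 0.0 to 1.0

    @property
    def criticality(self) -> float:
        # Severity x Probability = Criticality
        return self.severity * self.probability


def prioritize(modes, cutoff):
    """Sort by criticality, largest first, and split at the analyst's cutoff."""
    ranked = sorted(modes, key=lambda m: m.criticality, reverse=True)
    to_evaluate = [m for m in ranked if m.criticality >= cutoff]
    acceptable_risk = [m for m in ranked if m.criticality < cutoff]
    return to_evaluate, acceptable_risk


modes = [
    FailureMode("Turbine engine", "Turbine blade fracture", 8, 0.02),  # from the text
    FailureMode("Instrumentation", "Gauge drift", 3, 0.01),            # hypothetical
    FailureMode("Fuselage", "Panel fatigue crack", 9, 0.03),           # hypothetical
]

# The cutoff value is a hypothetical choice made by the analyst.
to_evaluate, acceptable_risk = prioritize(modes, cutoff=0.10)

for m in to_evaluate:
    print(f"{m.subsystem}: {m.mode} -> criticality {m.criticality:.2f}")
```

In a real FMEA the severity and probability values would come from documented criteria and test data, not from estimates like these.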

FIGURE 5.2 Sample Lubricants Plant (block diagram: Convey empty bottles, Fill empty bottles, Convey filled bottles, Package filled bottles in boxes, Convey filled boxes, Stack boxes on pallets, Move to warehouse for shipment)

Consider that we are running a lubricants plant. In this plant we are doing the following:

1. Creating the plastic bottles for the lubricant.
2. Conveying the bottles to the filling machine to be filled with lubricant.
3. Conveying the filled bottles to the packaging process to be boxed in cases.
4. Conveying the filled boxes to be put onto pallets.
5. Moving the pallets to the warehouse where they await shipping.

The next step is to determine all of the undesirable events that are occurring in each of our subsystems. For instance, if we were looking at the fill empty bottles subsystem, we would uncover all of the undesirable events related to this subsystem. Let's look at this simple example:

TABLE 5.2
Opportunity Analysis Line Item Sample

Subsystem            Event            Mode        Frequency/Year  Impact/Occurrence  Total Loss/Year
Fill Empty Bottles   Bottle Stoppage  Bottle Jam  1,000           $150               $150,000

The idea is to delineate the events that have occurred that caused an upset in the fill empty bottles subsystem. In this case, one of the events would be a bottle stoppage. The mode of this particular event is that a bottle became jammed in the filling cycle. It occurs approximately 1,000 times a year or about three times a day. The approximate impact for each occurrence is $150 in lost production. If we multiply the frequency times the impact for each occurrence, we would come to a total loss of $150,000 per year.

If we were to continue the analysis, we would pursue each of the subsystems, delineating all of the events and modes that have caused an upset in their respective subsystems. The end result would be a listing of all the items that contribute to lost production and their respective losses. Based on that listing, we would select the events that were the greatest contributors to lost production and perform a disciplined Root Cause Analysis (RCA) to determine the root causes for their existence.

Now that we understand the overall concept of FMEA, let's take a detailed look at the steps involved in conducting an opportunity analysis. There are seven basic steps:

1. Perform preparatory work
2. Collect the data
3. Summarize and encode results
4. Calculate loss
5. Determine the "Significant Few"
6. Validate results
7. Issue a report

STEP 1: PERFORM PREPARATORY WORK

As with any analysis, there is a certain amount of preparation work that has to take place. Opportunity Analysis is no different, in that it also requires several up-front tasks. In order to adequately prepare to perform an opportunity analysis you must accomplish the following tasks:

Define the system to analyze.
Define undesirable event.
Draw block diagram (use contact principle).
Describe the function of each block.
Calculate the "GAP."
Develop preliminary interview sheets and schedule.

DEFINE THE SYSTEM TO ANALYZE

Before we can begin generating a list of problems, we have to decide which system to analyze. This may sound like a simple task but it does require a fair amount of thought on the analyst's part. When we teach this method to our students, their usual response is to take an entire facility and make it the system. This is a prescription for disaster. Trying to delineate all of the failures and/or problems in a huge oil refinery, for instance, would be a daunting task. What we need to do is localize the system down to one system within a larger system. For instance, a large oil refinery is comprised of many operating units. There is a Crude Unit, Fluid Catalytic Cracking Unit (FCCU), Delayed Coking Unit (DCU), and many others. The prudent thing to do would be to select one unit at a time and make that unit the focus of the analysis. For example, the Crude Unit would be the system to study and then we would break the Crude Unit into many subsystems. In other words, we should not bite off more than we can chew when selecting a system to study. We have seen many cases where analysts first do a rough cut to see which area of the facility either comprises a bottleneck or is incurring the greatest expense.
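The loss calculation for the Table 5.2 line item, and the idea of selecting the greatest contributors, can be sketched as below. This is an illustration only: the function names, the 80% share used as the selection threshold, and every line item other than the bottle jam are hypothetical assumptions, not data from the analysis.

```python
def total_loss_per_year(frequency_per_year, impact_per_occurrence):
    """Total yearly loss for one mode: frequency times impact per occurrence."""
    return frequency_per_year * impact_per_occurrence


def significant_few(line_items, share=0.80):
    """Return the highest-loss modes that together reach `share` of all losses."""
    ranked = sorted(line_items, key=lambda item: item[1], reverse=True)
    grand_total = sum(loss for _, loss in ranked)
    selected, running = [], 0
    for mode, loss in ranked:
        if running >= share * grand_total:
            break
        selected.append(mode)
        running += loss
    return selected


# Line item from Table 5.2: 1,000 bottle jams per year at $150 each.
bottle_jam_loss = total_loss_per_year(1_000, 150)

line_items = [
    ("Bottle jam", bottle_jam_loss),
    ("Label misfeed", 20_000),     # hypothetical
    ("Conveyor trip", 15_000),     # hypothetical
    ("Pallet wrap tear", 10_000),  # hypothetical
    ("Cap shortage", 5_000),       # hypothetical
]

print(f"Bottle jam loss per year: ${bottle_jam_loss:,}")
print("Modes to analyze first:", significant_few(line_items))
```

With these sample numbers, two of the five modes account for at least 80% of the listed losses, which is the kind of concentration the 80/20 rule describes.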

DEFINE UNDESIRABLE EVENT

This may sound a little silly, but we have to define exactly what an "undesirable event" is in our facility. During every seminar that we teach on this subject, we ask the students in class to write down their definition of an undesirable event at their facility. Just about every time, every student has a different definition. The fact is, if we are going to collect event data, everyone involved must be using a consistent definition. If we are collecting event data and there is no standardized definition, then everyone will give us their perceptions of what undesirable events are occurring in their work areas. For instance, if we ask a machine operator what undesirable events he sees, he will probably give us processing type events, a maintenance mechanic will probably give us machinery-related events, whereas a safety engineer would probably give all of the safety issues. The dilemma here is that we lose focus when we do not have a common definition of an undesirable event.

The key to making an effective definition of an undesirable event is to ensure that the definition coincides with a particular business objective specified in the strategy map. For example, if we are in a sold-out position and our objective is to increase production utilization, then our definition should be based primarily around continuous production or limiting downtime. Let's take a look at some common definitions that we have run across over the years. Some are pretty good and some others are unacceptable. An undesirable event is:

Any loss that interrupts the continuity of maximum quality production
A loss of asset availability
The unavailability of equipment
A deviation from the status quo
Not meeting target expectations
Any secondary defect

The first one is "An undesirable event is any loss that interrupts the continuity of maximum quality production." This is a pretty good definition and one that we see and use quite frequently. Let's analyze this definition. In most manufacturing facilities, we often take our processes offline to do routine maintenance. The question becomes, when we take these planned shutdowns, "Are we experiencing an undesirable event based on the first definition above?" The answer is an emphatic yes! The definition states any loss that interrupts the continuity of maximum quality production is deemed an undesirable event. Even if we plan to take the machines out of service, it still interrupts the continuity of maximum quality production. Now we are not saying that we should not take periodic shutdowns for maintenance reasons. All we are suggesting is that we look at them as undesirable events so that we can analyze if there is any way to stretch out the intervals between planned shutdowns and reduce the amount of time a planned shutdown actually takes. For instance, in many industries, we still have what we call annual shutdowns. How often do we have an annual shutdown? Every year, of course! It says so right in the name. Obviously, the government and other legislative bodies regulate some shutdowns such as pressure vessel inspections. But in many cases, we are doing these yearly shutdowns just because the calendar dictates it. Instead of performing these planned shutdowns on a time basis, maybe we should consider using a conditional basis. In other words, let the condition of the equipment dictate when a shutdown has to take place.

This idea of looking at planned shutdowns as an undesirable event is not always obvious or popular. But if we are in a sold-out position, we must look at anything that takes us away from our ability to run 8,760 hours a year at 100% throughput rate. Now let's consider a different scenario. In many facilities, we have spare equipment, just in case the primary piece of equipment fails. It is sort of an insurance policy for unreliability. In this scenario, if the primary equipment failed and the spared equipment "kicked in," would this interrupt the continuity of maximum quality production? Providing the spare functions properly, the answer here would have to be no. Since we had the spare equipment in place and operating, we did not lose the production. That event would not end up on our list because it did not meet our definition of an undesirable event. This is also a hard pill for some of us to swallow. But that is the tough part about focusing. Once we define what an undesirable event is, we must list only the events that meet that definition.

Let's consider the definition, "an undesirable event is a deviation from the status quo." This definition has many problems. The primary problem is, "What happens if you have a positive deviation?" Should that be considered a failure? Probably not. How about the words "status quo"? For one thing, status quo is far too vague. If we were to ask 100 people to describe the status quo of the United States today, they would all give us a different answer. In addition, the status quo does not always mean that things are good; it just says that things are the way they are. If we were to rewrite that definition, it would make more sense if it looked like this:

An undesirable event is a negative deviation from 1 million units per day.

So why bother with a definition? It serves multiple purposes. First of all, we cannot perform an Opportunity Analysis without it. But in our opinion, that is the least important reason. The biggest advantage of an agreed upon definition is that it fosters precise communication between everyone in the facility. It gets people focused on the most important issues. In short, it focuses people on what is really important and ensures that we are adhering to the strategy defined in the strategy map.

When we devise a definition of an undesirable event, we need to make sure that it is short and to the point. We certainly would not recommend a definition that is several paragraphs long. A good definition can and should be about one sentence. Our definition should address only one business objective at a time. For example, a definition that states "An undesirable event is anything that causes downtime, an injury, an environmental excursion and/or a quality defect" is trying to capture too many objectives at one time, which in turn will cause the analysis to lose focus. If we feel the need to look at each of those issues, then we need to perform separate analyses for each of them. It may take a little longer, but we will maintain the integrity of the analysis.

Last but not least, it is important to get decision makers involved in the process. We would recommend having someone in authority sign off on the definition to give it some credence and clout. If we are lucky, the person in authority will even modify the definition. This will, in essence, create buy-in from that person.
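One way to see how a single agreed definition focuses the list is to treat it as a test applied to every candidate event. The sketch below assumes the first definition above (any loss that interrupts the continuity of maximum quality production) and measures the interruption as lost production hours. The record fields and the sample events are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EventRecord:
    description: str
    planned: bool
    lost_production_hours: float  # hypothetical measure of the interruption


def is_undesirable(event: EventRecord) -> bool:
    """Apply the definition: any loss that interrupts production counts,
    whether or not the interruption was planned."""
    return event.lost_production_hours > 0


candidates = [
    EventRecord("Annual shutdown", planned=True, lost_production_hours=240.0),
    EventRecord("Primary pump failed, spare kicked in", planned=False,
                lost_production_hours=0.0),
    EventRecord("Bottle jam in filler", planned=False, lost_production_hours=0.5),
]

listed = [e.description for e in candidates if is_undesirable(e)]
print(listed)
```

The planned shutdown makes the list and the pump failure covered by a working spare does not, which matches the two scenarios discussed above.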

DRAW BLOCK DIAGRAM (USE THE CONTACT PRINCIPLE)

Now that we have defined the system to analyze and the definition of an undesirable event that is most appropriate, we now have to create a simple flow diagram of the system being analyzed. This diagram will serve as a job aid later when we begin collecting data. The idea of a diagram is to show the flow of product from point A to point B. We want to list out all of the systems that come in contact with the product. Let's refer to our lubrication facility example.

FIGURE 5.3 Block Diagram Example (blocks in order: Create bottles, Convey empty bottles, Fill empty bottles, Convey filled bottles, Package filled bottles in boxes, Convey filled boxes, Stack boxes on pallets, Move to warehouse for shipment)

Each of these blocks indicates a subsystem that comes in contact with the product. We use this diagram to help us graphically represent a process flow so that it is easy to refer to. Many facilities maintain such detailed drawings and use them on a daily basis. Often such diagrams are referred to as process flow diagrams or PFDs. If we have such diagrams already in our facilities, we are ahead of the game. If we do not, we must simply create a simple diagram like the one above to help represent the overall process. We will discuss how to use both the undesirable event definition and the contact flow diagram in the data collection phase.

DESCRIBE THE FUNCTION OF EACH BLOCK

In some cases, drawing the block diagram in itself is not enough of an explanation. We may possibly be working with some individuals who are not intimately aware of the function of each of the systems. In these cases, it will be necessary for us to add some level of explanation for each of the blocks. This will allow those who are less knowledgeable in the process to participate with some degree of background in the process.

CALCULATE THE "GAP"

In order to determine success, it will be necessary to demonstrate where we are as opposed to where we could be. In order to do this, we will need to create a simple gap analysis. The gap analysis will visually show where we currently are versus where we could be. For instance, let's assume that we have a donut machine that has the potential of making 1,000 donuts per day, but we are able to make only 750 donuts per day. The gap is 250 donuts per day. We will use our opportunity analysis to uncover all of the reasons that are keeping us from reaching our potential of 1,000 donuts per day.

FIGURE 5.4 Sample Gap Analysis (Potential = 1,000 donuts/day; Gap = 250 donuts/day)

DEVELOP PRELIMINARY INTERVIEW SHEETS AND SCHEDULE

The last step in the preparatory stage is to design an interview sheet that is adequate to collect the data consistent with your undesirable event definition and to set up a schedule of people to interview to get the required data. Let's look at the required data elements or fields. In every analysis we will have the following data elements:

Subsystem: This correlates to the blocks in our block diagram.
Event: The event is the actual undesirable event that matches the definition we created earlier.
Mode: The mode is the apparent reason that the undesirable event exists.
Frequency Per Year: This number corresponds to the number of times the mode actually occurs in a year's time.
Impact Per Occurrence: This figure represents the actual cost of the mode when it occurs. For instance, we will look at materials, labor, lost production, fines, scrap, etc. This data element can represent any item that has a determinable cost.
Total Loss Per Year: This is the total loss per year for each mode. It is calculated by simply multiplying the frequency per year by the impact per occurrence.

In order to develop an effective interview sheet we have to create it based on our definition. The first four columns (subsystem, event, mode and frequency) are always the same. The impact column, however, can be expanded upon to include whatever cost elements we feel are appropriate for the given situation. For instance, some do not include straight labor costs since we have to pay such a cost regardless. We will, however, include any overtime costs associated with the mode since we would not have incurred the expense without the event occurring.
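The gap calculation and the way the impact column can be built up from cost elements can be sketched as follows. The function names, the cost element names, and the dollar amounts are hypothetical assumptions for illustration. Only the donut machine figures come from the text.

```python
def gap(potential_per_day, actual_per_day):
    """Difference between where we could be and where we currently are."""
    return potential_per_day - actual_per_day


def impact_per_occurrence(cost_elements, excluded=("straight_labor",)):
    """Sum the cost elements for one occurrence, leaving out any element
    (such as straight labor) that would have been paid regardless."""
    return sum(cost for name, cost in cost_elements.items()
               if name not in excluded)


# Donut machine example from the text: potential 1,000/day, actual 750/day.
donut_gap = gap(1_000, 750)

# Hypothetical cost elements for one occurrence of a mode, in dollars.
costs = {
    "lost_production": 400,
    "overtime": 120,
    "materials": 80,
    "straight_labor": 200,  # excluded below: paid whether or not the event occurs
}
impact = impact_per_occurrence(costs)

# Total Loss Per Year = Frequency Per Year x Impact Per Occurrence
frequency_per_year = 50  # hypothetical
total_loss = frequency_per_year * impact

print(f"Gap: {donut_gap} donuts/day")
print(f"Impact per occurrence: ${impact:,}; total loss per year: ${total_loss:,}")
```

Whether straight labor is excluded is a choice the analyst makes when designing the sheet, so it is left as a parameter here.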

TABLE 5.3
Sample Opportunity Analysis Worksheet

The last item in the preparatory stage is to determine which individuals we should interview and to create a preliminary sheet to list all of the individuals to talk to in order to collect this event data. We will talk more about what types of people to interview in the next section.

STEP 2: COLLECT THE DATA

There are a couple schools of thought when it comes to how to collect the data that is necessary to perform an opportunity analysis. On one side, there is the school that believes that all data can be retrieved from a computerized system within the organization. The other side believes that it would be virtually impossible to get the required data from an internal computer system since the data going into the system is suspect at best. Both sides are to some degree correct. An organization's data systems do not always give the precise information that we need, although they can be useful to verify trends that would be uncovered by interviewing people.

We will explore both of these alternatives in this chapter and the next. However, the analyst leading the Opportunity Analysis will ultimately be responsible for making the decision as to whether the more accurate and timely data comes from the people or the existing information system. In this chapter we will continue with the manual approach of collecting data from the raw source, the people. In the next chapter, we will explore the data collection opportunities that are available from an Asset Performance Management (APM) system, hence automating the effort.

It is recommended that when using the manual method of data collection

connotation. In order to gain employment we typically had to go through an interview, which is sometimes a stressful situation. We often watch TV police shows where a suspect is being interviewed (i.e., interrogated) in a dark smoky room. We would choose to make our interviews much more informal. Think of them more as an information gathering session instead of a formal interview. This will certainly improve the flow of information.

Now who would be good candidates to talk to in an interview or discussion session? It is important to make sure that we have a good cross section of people to talk to. For instance, we would not want to talk to just maintenance personnel because we may get only maintenance-related items. So what we should strive to do is interview across disciplines, meaning that we get information from maintenance, operations, technical and even administrative. Only then will we have the overall depth that we are looking for. There is also the question of what level of person we want to talk with. In most organizations there is a hierarchy of authority and responsibility. For instance, in a manufacturing plant there are the hourly or field level employees who are primarily responsible to operate and maintain the day-to-day operations to keep the products flowing. Then there is a middle supervisory level who typically supervises the craft and operator levels. Above the supervisory levels are the management levels that typically are looking at the operation from a more global perspective.

When trying to uncover undesirable events and modes it makes sense to go to the source. This means talking to the people closest to the work. In most cases, this would describe the hourly workforce. They deal with undesirable events each and every day and are usually the ones responsible for fixing those problems. For this reason, we would recommend spending most of the interview time with this level. If we think about it, the hourly workforce is the most abundant resource and one that, in our experience, is rarely used to its fullest potential. Sometimes getting this wealth of knowledge is as easy as just asking for it. We are certainly not suggesting that we should not talk with supervisory level employees or above. They also have a vast amount of experience and knowledge of the operation. As far as upper level managers go, they usually have a more strategic focus on the operation. They may not have the specific information required to accomplish this type of analysis. There are exceptions to every rule, however. We once worked at a facility where the plant manager routinely would log into the distributive control system (DCS) from his
(interviewing technique), that we take a two-track approach. We beg~n collecting home computer in the midclle of the night to observe the actions of his operators.
data from people through the use of interviews. We use the interviews very loosely When they malle an acljustment that he thought was suspect, he would literally call
as we will explain later. Once we have collectecl and summarized the interview elata, the operator in thc control room to ask why they did what they did. Imagine trying
we can use our existing elata systems to velify financial numbers m~cI see ir the to operate in slIch a micro-managed environment? Although we do not support this
computer data supports the trencls that we uncavered in OUl' interviews. The numbers manager's practice, he probably would have a great deal to offer in our analysis o
will not be the same but the trends may very well be. So let's cliscuss haw to go process upsets because he had intricate knowleclge of the process itself.
about collecting event data using an interview method. Another idea that we have found to be very useful when collecting event 1nfor-
If we remember trom OUT previous discussion, we developed two Job aicls. We mation is to talk to multiple people at the same time. 'Ihis has several benefits. For
had an undesirable event definition and a block diagram 01' the process :Iiow. We are one, when a person is talking it is splllTing something in someone eIse's mInd. It
now going to use those two documents to help us structure an intervi~w. We begi.n also has a psychological etTect. When \Ve ask people abont event information, it
the interview by asking the interviewee to delineate any events that meet our deh- may be perceivecl as a witch hunt. In other worcls, they 111ight feel1ike management
nition of a11 unclesirable event within a certain subsystem. Tls Cl'eates a 1'ocused is trying to blame people. By having 111111tiple interviewees in a session, it appears
interview session. As we saicl before, an interview generally has a kind of negative to be more of a brainstorming session instead of an inteITOQ'ation.

The interviewing process, as we have learned over the years, is really an art form more than a science. When we first started to interview, we soon learned that it can sometimes be a difficult task. It is like golf; the more we practice proper technique the better the final results will be. An interview is nothing more than getting information from one individual to another as clearly and accurately as possible. To that end, here are some suggestions that will help you to become a more effective interviewer. Some of these are very specific to the opportunity analysis process, but others are generic in that they can be applied to any interviewing session.

Be very careful to ask the exact same lead questions to each of the interviewees. This will eliminate the possibility of having different answers depending on the interpretation of the question. Later we can expand on the questions, if further clarification is necessary. We can use our undesirable event definition and block flow diagram to keep the interviewees focused on the analysis.

Make sure that the participants know what an Opportunity Analysis is, as well as the purpose and structure of the interviews. If we are not careful, the process may begin to look more like an interrogation than an interview to the interviewees. An excellent way to make our interviewees comfortable with the process is to conduct the interviews in their work environments instead of ours. For instance, go to the break area or the shop to talk to these people. People will be more forthcoming if they are comfortable in their surroundings.

Allow the interviewees to see your notes. This will set them at ease since they can see that the information they are providing is being recorded accurately. Never use a tape recorder in an opportunity analysis session because it tends to make people uncomfortable and less likely to share information. Remember, this is an information gathering session and not an interrogation.

If we do not understand what someone is telling us, let them use a pen to draw a simple diagram of the event for further understanding. If we still do not understand what they are trying to describe, then we should go out to the actual work area where the problem is occurring so that we can actually visualize the problem.

Never argue with an interviewee. Even if we do not agree with the person, it is best to accept what they are saying at face value and double-check it with the information from other interviews. The minute we become argumentative, it reduces the amount of information that we can get from that person. Not only will that person not give us any more information, but chances are he or she will alert others to the argument and they will not want to participate either.

Always be aware of interviewees' names. There is nothing sweeter to a person's ears than the sound of his own name. If you have trouble remembering, simply write the names down in front of you so that you can always refer to them. This gives any interview or discussion a more personal feel.

It is important to develop a strategy to draw out quiet participants. There are many quiet people in our workforce who have a wealth of data to share but are not comfortable communicating it to others. We have to make sure that we draw out these quiet interviewees in a moderate and inquiring manner. We can use nominal group techniques where we ask each of the people to whom we are talking to write their comments down on an index card and then compile the list on a flip chart. This gives everyone the same chance to have their comments heard.

Be aware of body language in interviewees. There is an entire science behind body language. It is not important that we become an expert in this area. However, it is important to know that a substantial portion of human communication is through body language. Let the body language talk to us. For instance, if someone sits back in a chair with their arms firmly crossed, he may be apprehensive and not feel comfortable providing the information that we are asking for. This should be a clue to alter our questioning technique to make that person more comfortable with the situation.

In any set of interviews, there will be a number of people who are able to contribute more to the process than the others. It is important to make a note of the extraordinary contributors so that they can assist you later in the analysis. They will be extremely helpful if you need additional event information for validating the finished opportunity analysis, as well as assisting when you begin the actual Root Cause Analysis (RCA).

Remember to use our undesirable event definition and block diagram to keep interviewees on track if they begin to wander off of the subject.

We should strive to keep interview sessions relatively short. Usually about one hour is suitable for an interview session. This process can be very intensive and people can become tired and sometimes lose their focus. This is dangerous because it begins to upset the validity of the data. So as a rule, one hour of interview is plenty.

STEP 3: SUMMARIZE AND ENCODE DATA

At this stage, we have generated a vast amount of data from our interviews. We now have to begin summarizing this information for accuracy. While conducting our interviews, we will be getting some redundant data from different interviewees. For instance, a person from the night shift might be giving us the same events that the day shift person gave us. So we have to be very careful to summarize the information and encode it properly so that we do not have redundant events and end up "double dipping."

The easiest way to collect and summarize the data is to enter it into an electronic spreadsheet or database like Microsoft Excel or Microsoft Access (both registered trademarks of the Microsoft Corporation). Of course we could certainly do this manually with a pencil and paper, but if we have a computer
available, we should take the opportunity to use it. It will save many hours of frustration performing the analysis manually. Once we have input all of the information into our spreadsheet, we now have to look for any redundancy. We should always remember to use a logical coding system when inputting information into a computer. Once we define what that logical system is, we stick with it. Otherwise the computer will be unable to provide the results we are trying to achieve. Let's take a look at the following example to help understand logical coding.

TABLE 5.4
Logical and Illogical Coding

Logical Coding

Illogical Coding

If we were to use the coding portrayed on the bottom of Table 5.4 we would get inconsistent results when we tried to summarize data. Therefore, we have to strive to use a coding system like the one depicted in the top of Table 5.4, which should give the required result when summarizing the data.

"How can we eliminate the redundant information that is given in the interview sessions?" The easiest way is to take the raw data from our interviews and input it into a spreadsheet program. From there we can use the powerful sorting capability of the program to help look for the redundant events. The first step is to sort the entire list by the subsystem column. Then within each subsystem, we will need to sort the failure event column. This will group all of the events from a particular area so that we can easily look for duplicate events. Once again, if we do not use logical coding this will not be effective. So we should strive to be disciplined in our data entry efforts. Let's take a look at an example of how to summarize and encode events.

TABLE 5.5
Example of Summarizing and Encoding Results

Subsystem  Event                       Mode               Frequency  Impact
Recovery   Recirculation Pump Fails    Bearing Locks Up   12         12 Hours
Recovery   Recirculation Pump Fails    Oil Contamination  6          1 Day
Recovery   Recirculation Pump Fails    Bearing Fails      12         12 Hours
Recovery   Recirculation Pump Fails    Shaft Fracture     1          5 Days

In this example, we are looking at the recovery subsystem and we have sorted by the "recirculation pump fails" event. Four different people at four separate times described these events. Is there any redundancy? The easiest way to see is to look at the modes. In this case we have two that mention the word bearing. The second is oil contamination. The interviewee was probably trying to help us out by giving us their opinion of the cause of the bearing failures. So in essence the first three events are really the same event, and we will have to summarize the three events into one. This is what it might look like after we summarize the items.

TABLE 5.6
Example of Merging Like Events

STEP 4: CALCULATE LOSS

Calculating the loss from individual modes is a relatively simple process. The idea here is to multiply the frequency per year times the impact per occurrence. So if we have a mode that costs $5,000 per occurrence and it happens once a month, then we have a $60,000 a year problem. We usually choose to use financial measurements (e.g., dollars, euros) to accurately determine loss. We may find that using another metric is a more accurate measurement for our business. For instance, we may want to track pounds, tons, number of defects, etc. But if it is possible, we should try to convert our measurement into financial currency. Money is the language of business and is usually the easiest to communicate to all levels of the organization.

Let's look at a few examples of calculating the loss.

TABLE 5.7
Example of Calculating the Loss
Frequency x Impact = Total Loss

In this example, we are simply multiplying the frequency per year times the impact per occurrence, which in this case is in number of units. In other words, when each of these modes occurs the impact is the number of units lost as a result. Notice that the last column is total loss in dollars. We simply multiply the number of lost units by the cost of each unit to give a total loss in dollars. That's all there is to it!
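As a minimal sketch of the Step 4 arithmetic, the short Python example below multiplies the frequency per year by the impact per occurrence. The function names are our own, and the unit counts and cost per unit in the second call are hypothetical figures chosen only for illustration; only the case of $5,000 per occurrence once a month comes from the text above.

```python
def annual_loss(frequency_per_year, impact_per_occurrence):
    """Annual loss = frequency per year x impact per occurrence (in dollars)."""
    return frequency_per_year * impact_per_occurrence


def annual_loss_from_units(frequency_per_year, units_lost_per_occurrence, cost_per_unit):
    """Same idea when the impact is measured in lost units instead of dollars.

    The lost units are converted to dollars with a cost per unit.
    """
    return frequency_per_year * units_lost_per_occurrence * cost_per_unit


# Example from the text: $5,000 per occurrence, once a month (12 times a year).
print(annual_loss(12, 5_000))  # 60000

# Hypothetical figures: 4 occurrences a year, 10 units lost each time, $25 per unit.
print(annual_loss_from_units(4, 10, 25))  # 1000
```

Keeping the conversion to dollars in a separate function makes it easy to change the cost per unit later without touching the frequency and impact data gathered in the interviews.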

STEP 5: DETERMINE THE "SIGNIFICANT FEW"

We now have to determine which events out of all the ones we have listed are significant. We have all heard of the 80/20 rule, but what does it really mean? This rule is sometimes referred to as the Pareto Principle. The name Pareto comes from the early 20th century Italian economist who once said that, "In any set or collection of objects, ideas, people, and events, a few within the sets or collections are more significant than the remaining majority." This rule or principle demonstrates that in our world, some things are more important than others. Let's look at a few examples of this rule in action:

Banking Industry: In a bank approximately 20% or less of the customers account for approximately 80% or more of the assets in that bank.
Hospital Industry: In a hospital approximately 20% or less of the patients get 80% or more of the care in that hospital.
Airline Industry: 20% or less of the passengers account for 80% or more of the revenues for the airline.

It also works in industrial applications. In our years of experience, and in that of our clients, the rule holds true. Twenty percent or less of the identified events typically represents 80% or more of the resulting losses. This is truly significant if you think about it. It says that if we focus on and eliminate the 20% of the events that represent 80% of the losses, we will achieve tremendous improvement in a relatively short period of time. It just makes common sense!

Think about how the rule applies to everyday living. We probably all are guilty of wearing 20% or less of the clothes in our closet 80% of the time. We probably all have a toolbox in which we use 20% of the tools 80% of the time. We spend all that money on all those exotic tools and most repairs require the screwdriver, hammer and a wrench! We are all guilty of this! The rule even applies to business. Take for instance a major airline as described above. It is not the once-a-year vacationer who generates most of its revenue. It is the guy who flies every Monday morning and returns every Friday afternoon. So it makes sense that very few of the airline's customers represent most of its revenue and profits. Have you ever wondered why Frequent Flyer programs are so important to an airline? They know whom they have to cater to.

Let's take a look at the following example to determine exactly how to take a list of events and narrow it down to the "Significant Few."

Step 1: Multiply the frequency column times the impact column to get a total annual loss figure.
Step 2: Sum the total annual loss column to obtain a global total loss figure for all the events in the analysis.
Step 3: Multiply the global total loss figure from Step 2 by 80% or 0.80. This will give us the "significant few" losses amount.
Step 4: Sort the total loss column in descending order so that the largest events bubble up to the top.
Step 5: Sum the total loss amounts from biggest to smallest until you reach the "significant few" losses amount from Step 3.

TABLE 5.8
Sample Opportunity Analysis Worksheet

Subsystem     Event    Mode     Frequency  Impact    Total Loss
Sub System A  Event 1  Mode 11  30         $40,000   $1,200,000
Sub System A  Event 2  Mode 7   4          $230,000  $920,000
Sub System B  Event 3  Mode 1   365        $1,350    $492,750
Sub System A  Event 2  Mode 5   10         $20,000   $200,000
Sub System A  Event 2  Mode 8   10         $10,000   $100,000
Sub System B  Event 5  Mode 6   35         $2,500    $87,500
Sub System B  Event 4  Mode 4   1,000      $70       $70,000
Sub System A  Event 4  Mode 12  8          $8,000    $64,000
Sub System B  Event 6  Mode 10  6          $8,000    $48,000
Sub System C  Event 4  Mode 13  4          $7,500    $30,000
Sub System B  Event 4  Mode 9   10         $2,500    $25,000
Sub System A  Event 1  Mode 2   12         $2,000    $24,000
Sub System A  Event 1  Mode 3   9          $2,500    $22,500
Sub System C  Event 6  Mode 14  6          $3,500    $21,000
Total Loss                                           $3,304,750
Significant Few Losses (Total Loss x .80)            $2,643,800

In order to get the maximum effect it is always wise to present this information in alternate forms. The use of graphs and charts will help us to effectively communicate this information to others around us. Here is a sample bar chart that takes the spreadsheet data above and converts it into a more understandable format.

FIGURE 5.5 Sample Bar Chart of Opportunity Analysis Results (bar chart titled "Opportunity analysis results": failure events on the horizontal axis, percent of total loss from 0% to 60% on the vertical axis, with the events divided into those representing 80% of the loss and those representing 20% of the loss)

STEP 6: VALIDATE RESULTS

Although our analysis is almost finished, there is still more to accomplish. We have to verify that our findings are accurate. Our opportunity analysis total should be relatively close to the gap that we defined in our preparatory phase. The general rule is plus or minus 10% of the gap.

If we are way under that gap, we have either missed some events or undervalued them, or we do not have an accurate gap (actual versus potential). If we were to overshoot the gap, we probably did not do as good a job at removing the redundancies or we have simply overvalued the loss contribution.

At a minimum we must double-check our "significant few" events to make sure we are relatively close. We do not look for perfection in this analysis simply because it would take too long to accomplish, but we do want to be close. This would be a good opportunity to go to our data sources like our computerized maintenance management system (CMMS) or our distributed control system (DCS) to verify trends and financial numbers. Incidentally, if there were ever a controversy over a financial number it would be prudent to use numbers that the accounting department deems accurate. Also, it is better to be conservative with our financials so we do not risk losing credibility for an exaggerated number. The numbers will be high enough on their own without any exaggeration. Other verification methods might be more interviews or designed experiments in the field to validate interview findings. All in all, we want to be comfortable enough to present these numbers to anyone in the organization and feel that we have enough supporting information to back them up.
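The "significant few" procedure from Step 5 and the plus or minus 10% check from Step 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are the frequency and impact values from Table 5.8, the function names are our own, the event that crosses the 80% threshold is assumed to be included in the "significant few," and the gap figure passed to the validation check is a hypothetical number.

```python
# Rows from Table 5.8: (subsystem, event, mode, frequency per year, impact in dollars).
EVENTS = [
    ("A", "Event 1", "Mode 11", 30, 40_000),
    ("A", "Event 2", "Mode 7", 4, 230_000),
    ("B", "Event 3", "Mode 1", 365, 1_350),
    ("A", "Event 2", "Mode 5", 10, 20_000),
    ("A", "Event 2", "Mode 8", 10, 10_000),
    ("B", "Event 5", "Mode 6", 35, 2_500),
    ("B", "Event 4", "Mode 4", 1_000, 70),
    ("A", "Event 4", "Mode 12", 8, 8_000),
    ("B", "Event 6", "Mode 10", 6, 8_000),
    ("C", "Event 4", "Mode 13", 4, 7_500),
    ("B", "Event 4", "Mode 9", 10, 2_500),
    ("A", "Event 1", "Mode 2", 12, 2_000),
    ("A", "Event 1", "Mode 3", 9, 2_500),
    ("C", "Event 6", "Mode 14", 6, 3_500),
]


def significant_few(events, share=0.80):
    """Apply Steps 1 to 5 and return (total_loss, threshold, selected_rows)."""
    # Step 1: total annual loss per row = frequency x impact.
    rows = [(sub, event, mode, freq * impact)
            for sub, event, mode, freq, impact in events]
    # Step 2: global total loss for all events.
    total_loss = sum(row[3] for row in rows)
    # Step 3: the "significant few" losses amount.
    threshold = total_loss * share
    # Step 4: sort so the largest losses come first.
    rows.sort(key=lambda row: row[3], reverse=True)
    # Step 5: accumulate from the top until the threshold is reached.
    # Assumption: the row that crosses the threshold is kept.
    selected, running = [], 0
    for row in rows:
        if running >= threshold:
            break
        selected.append(row)
        running += row[3]
    return total_loss, threshold, selected


def within_gap(total_loss, gap, tolerance=0.10):
    """Step 6 check: is the analysis total within plus or minus 10% of the gap?"""
    return abs(total_loss - gap) <= tolerance * gap


total_loss, threshold, selected = significant_few(EVENTS)
print(total_loss)                      # 3304750
print([row[2] for row in selected])    # ['Mode 11', 'Mode 7', 'Mode 1', 'Mode 5']
print(within_gap(total_loss, gap=3_500_000))  # hypothetical gap of $3.5 million
```

With these figures, four of the fourteen modes carry the "significant few" losses, which is in keeping with the 80/20 pattern described in Step 5.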

STEP 7: ISSUE A REPORT

Last but certainly not least, we have to communicate our findings to decision makers so that we can proceed to solving some of these pressing issues. Many of us falter here because we do not take the time to adequately prepare a thorough report and presentation. In order to gain maximum benefit from this analysis, we have to prepare a detailed report to present to any and all interested parties. The report format is based primarily on style. This may be our own personal style or even a mandated company reporting style. We would suggest the following items to be included in the report.

Explain the Analysis: Many of our readers may be unfamiliar with the opportunity analysis process. Therefore, it is in our best interest to give them a brief overview of what an opportunity analysis is and what its goal and benefits are. This way, they will have a clear understanding of what they are reading.
Display Results: Provide several charts to represent the data that our analysis uncovered. The classic bar chart demonstrated earlier is certainly a minimal requirement. In addition to supporting graphs, we should provide all the details. This includes any and all worksheets compiled in the analysis.
Add Something Extra: We can be creative with this information to provide further insight into the facility's needs by determining other areas of improvement other than the "significant few." For instance, we could break out the results by subsystem and give a total loss figure for each subsystem. The manager of that area would probably find that information very interesting. We could also show how much the facility spent on particular maintainable items (e.g., components) like bearings or seals.

We must use our imagination as to what we think is useful, but by using the querying capabilities of our spreadsheet or database, we can glean any number of interesting insights from this data.
Recommend Which Event(s) to Analyze: We could conceivably have a couple dozen events from which our "significant few" list is comprised. We cannot work on all of them at once so we must prioritize which events should be analyzed first. Common sense would dictate going after the most costly event first. On the surface this sounds like a good idea, but in reality we might be better off going after a less significant event that has a lesser degree of complexity to solve. We like to call these events the "low hanging fruit." In other words, go after the events that give the greatest amount of payback with the least amount of effort.
Give Credit Where Credit is Due: We must list each and every person who participated in the analysis process. This includes interviewees, support personnel and the like. If we want to gain their support for future analyses, then we have to gain their confidence by giving them credit for the work they helped to perform. It is also critical to make sure that we feed the results of the analysis back to these people so they can see the final product. We have seen any number of analyses fail because participants were left out of the feedback loop.

That is all there is to performing a thorough opportunity analysis. As we mentioned before, this technique is a powerful analysis tool, but it is also an invaluable sales tool in getting people interested in our projects. If we think about it, it appeals to all parties. The people who participated will benefit because it will help eliminate some of their unnecessary work. Management will like it because it clearly demonstrates what the return on investment will be if those events or problems are resolved.

So, if you are struggling with data quality issues in your current data systems and you would still like to determine where to start your RCA process, consider this approach. It will help you learn a great deal about your facility and will provide you with the focus to get started with RCA. In the next chapter we will explore methods for utilizing existing data systems to perform a similar type of analysis. This assumes that there is ample data in these systems and the data is considered to be of good quality for performing opportunity analysis.

6
Asset Performance Management Systems (APMS): Automating the Opportunity Analysis Process

In the last chapter we discussed the manual interview method of collecting event data to determine the candidates for RCA. Now, let's consider automating the process of event data collection. When we talk about automating data collection we are really discussing how to collect event data on a day-to-day basis using modern data collection and analysis tools. When we employ sophisticated data analysis techniques, we actually have the ability to view the data in a way that turns raw data into actionable information.

In this chapter we will discuss what is needed to implement a comprehensive event recording data system. Below are the core activities that need to be established to enable the automated data analysis infrastructure:

Determine your event data needs
Establish a workflow to collect the data
Employ a comprehensive data collection system
Analyze the digital data

DETERMINING OUR EVENT DATA NEEDS

Once we have satisfactorily determined our performance metrics it is time to determine the data required to accurately report on those metrics. Our data requirements will vary depending on our selection of KPIs, so we will provide some common data requirements to satisfy the more common metrics.

Since we are focused on collecting event data it is important to repeat what we discussed in the manual method. The definition of event is still critical whether we are performing Opportunity Analysis manually or with an automated collection system. This definition is critical to the process and is typically the place where efforts like these become unsuccessful. As we might imagine, it is very difficult to collect data on something like events when the term has not been fully defined. What might be an event to you might not be considered an event to someone else. So follow some good advice and accurately define the event for your organization and then communicate that definition to all the relevant data collectors.

What kind of data should be collected when an event occurs? Table 6.1 is a table of common data items that should be collected for any event. The list is by no means complete but it is a good basis for getting a good event reporting system off the ground. Most asset performance KPIs could be calculated with data from this list.

ESTABLISH A WORKFLOW TO COLLECT THE DATA

We do not want to minimize the difficulty related to collecting event data on a regular basis. The fact is that collecting accurate event data is extremely difficult. Event data is different than other types of data. Process data, for instance, is automatically captured in a disciplined and consistent manner through the use of a Distributed Control System (DCS). The data is automatically captured with very little human interaction.

Event data, on the other hand, is dependent on a variety of people collecting data in a uniform way. For instance, what one person might view as a pump event might actually be a motor (driver) issue. So how do we ensure that the data is compiled in a uniform manner?

First, we need to educate all stakeholders on the need for accurate data collection. In today's busy work environment we are constantly asked to collect an array of data. The problem with this approach is that most people have no idea how the data they are being asked to collect is actually used. When this happens we begin to see entries in the Computerized Maintenance Management System (CMMS) stating, "Pump broke, fixed it." This obviously gives no detail into the events and provides no opportunity to summarize the data for useful decision-making. So before we ask anyone to collect data, we need to educate them in how the data will be used to make decisions.

The second step in the process is related to the first in that we need to develop definitions and codes to support the event data collection effort. This means that we need to determine common event codes for our equipment events and then educate our data collectors in the definition of these codes. You might consider ISO 14224 as a guideline for determining your equipment taxonomy and to help you get started with a good code set for documenting events. ISO is the International Organization for Standardization, and it has developed a standard approach for the collection and exchange of reliability and maintenance data for equipment. You can find out more about ISO and the 14224 standard on their website at www.iso.org. A great way to train personnel in this is through the use of scenarios. The groups of data collectors are presented with the various codes and their definitions. They are then subjected to a variety of event scenarios to test how they would use the codes in a variety of common situations.

Last, but not least, a comprehensive workflow will need to be established to collect the data described above. Essentially an array of "W" questions needs to be formulated and answered. For instance:

Who will collect the data?
What data is important?
When will the data be collected?
Where will it be stored?
Who will verify the data?
Who will enter the data?
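To make the idea of agreed definitions and codes more concrete, here is a minimal Python sketch of an event record that is checked against a fixed code set before it is accepted. Everything in it is hypothetical: the field names, the asset ID, the date, and the event and mode codes are our own illustrations, and a real code set would come from the organization's own definitions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Hypothetical code sets, for illustration only. A real set would come from
# the organization's own agreed event definitions.
EVENT_CODES = {"PUMP_FAILS", "MOTOR_FAILS"}
MODE_CODES = {"BEARING_FAILS", "SHAFT_FRACTURE", "SEAL_LEAKS"}


@dataclass
class EventRecord:
    asset_id: str       # which asset the event occurred on
    event_date: date    # when the event occurred
    event_code: str     # must be one of EVENT_CODES
    mode_code: str      # must be one of MODE_CODES
    collected_by: str   # who collected the data


def validation_errors(record: EventRecord) -> List[str]:
    """Return a list of problems; an empty list means the record is usable."""
    errors = []
    if not record.asset_id.strip():
        errors.append("asset_id is missing")
    if record.event_code not in EVENT_CODES:
        errors.append(f"unknown event code: {record.event_code!r}")
    if record.mode_code not in MODE_CODES:
        errors.append(f"unknown mode code: {record.mode_code!r}")
    return errors


# A coded entry passes, while a free-text entry is rejected.
good = EventRecord("P-101", date(2005, 3, 14), "PUMP_FAILS", "BEARING_FAILS", "operator")
vague = EventRecord("P-101", date(2005, 3, 14), "Pump broke, fixed it.", "", "operator")

print(validation_errors(good))        # []
print(len(validation_errors(vague)))  # 2
```

A check like this does not replace training the data collectors, but it keeps free-text entries from slipping into the data set, where they could not be summarized later.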

We will answer many of these workflow questions when we discuss data collection systems. As a prelude to this, what many people do is to try to use their CMMS system as the initial workflow to collect some of the data, and then devise a supplemental workflow to get the remaining data items. This is certainly one method and may be one of the most effective since some key reliability data is being generated through the use of the maintenance system.

EMPLOY A COMPREHENSIVE DATA COLLECTION SYSTEM

To truly automate the Opportunity Analysis process we need to use powerful data collection and analytical tools. Database technology has come to the point where different types of data systems can easily "talk" to each other so that a wide variety of data can be collected, summarized and analyzed to allow analysts to make informed decisions.

We are going to discuss a method for transferring data from existing Computerized Maintenance Management Systems (CMMS) into an Asset Performance Management System or APMS. Before we discuss the interface between CMMS and APMS, let's discuss the role of both of these systems in the operation of the facility.

A CMMS is designed to assist maintenance personnel in the execution of work. The main function of this system is to automate the process of getting maintenance tasks completed in the field. This includes things like generating work requests, prioritizing work, planning and scheduling, materials management and finally the actual execution of the work. CMMSs by nature are transaction-based systems, since many transactions have to take place to completely execute a maintenance event. Many efficiencies are gained by automating the maintenance workflow. Hence, most asset intensive companies have implemented such systems to achieve the many benefits.

Although a CMMS provides a variety of benefits, it was not designed to be an analytical system to provide decision support to Reliability and Maintenance Analysts. It does, however, offer a variety of good data that can be used to perform reliability analysis. For instance, every work order should delineate the asset ID and location of the maintenance event, the date the asset came out of service and the components that were used to repair the asset. There is obviously much more than this, but those items alone can be extremely valuable in determining event probabilities and even optimizing preventive maintenance activities.

An APMS is not designed to handle maintenance workflow and transactions but to take that data and a variety of other data to create actionable information with which to improve the overall reliability and availability of the facility. These tools might contain extensive data manipulation tools, statistical analysis tools like Weibull Analysis, Root Cause Analysis (RCA), Risk Based Inspection (RBI) and many others. We will focus our discussion on how an APMS can be a valuable aid in helping Root Cause Analysts determine the best opportunities for analysis.

So what data can we use from a CMMS that would help an APMS determine where the best opportunities for analysis might be? Table 6.2 is a table of some of the common data fields that would be useful in this type of analysis.

TABLE 6.2
Common Data Fields

Asset ID
Maintenance Start
Event Date
[remaining fields illegible in source]

This data is a solid starting point for performing Opportunity Analysis for Root Cause events. The next step is to transfer this data into an APMS so that the data can be supplemented with additional data about the event and then be "sliced and diced" to determine the opportunities.

In order to make use of this important data, the data must be somewhat easy to find and manipulate. Having worked with Reliability and Maintenance Analysts for many years, we have seen a number of homegrown reliability management systems. I am sure that you too can attest to such systems. For example, what happens when Reliability Engineers cannot seem to acquire the data they need to do their job? They build it themselves. They miraculously go from capable engineer to software developer. I am sure you have seen some of these masterpieces. They build them in spreadsheets, desktop databases or even using full-blown development tools. Although these homegrown systems serve a valuable purpose for their creators, they have many pitfalls for an organization. For one, the data may or may not be accurate. Since the data is typically collected by a handful of users, it may not truly reflect the overall reality. The data may not be properly event coded so it becomes extremely difficult to analyze. The main problem with these homegrown solutions is that the data is not accessible to all the stakeholders who need it.

An APMS is designed to interface with existing data sources like CMMS (Figure 6.1), PDM systems, process systems and a variety of others. This ensures that the data is accurate and is kept up to date, as the interface keeps the system continually in sync. This is critical because it allows the data to be collected once and used for a variety of purposes. An APMS is a secured system so you know that the data is protected. The most important purpose of an APMS is to provide the value-added analysis tools to turn existing maintenance and reliability data into actionable information.

Let's move on to the area of analyzing your digital data.

ANALYZE THE DIGITAL DATA

The tool of choice to perform Opportunity Analysis is the Pareto chart. Just to recap, a Pareto chart is simply a way to delineate the significant items within a collection.

In our case, it will help us determine the few significant issues that represent the majority of the losses within a facility. The Pareto chart can be used on a variety of metrics depending on the need. For instance, some users might simply use maintenance cost as the only measure to determine whether an RCA needs to be initiated. Others might want to compile all the costs associated with an event, such as lost opportunity costs for not producing. Still others might be more interested in Mean Time Between Failure or MTBF. The assets with the lowest MTBF might be the best candidates for RCA. The advantage of using an automated approach to Opportunity Analysis is that the analyst can look for opportunities using a variety of metrics and techniques.
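
To make the Pareto idea concrete, here is a minimal sketch in Python. It assumes hypothetical event records with the fields asset_id, event_date and cost; these field names, the sample figures and the 80% threshold are illustrative assumptions and are not taken from any particular CMMS or APMS.

```python
from collections import defaultdict

# Hypothetical event records (illustrative only; field names are assumptions).
events = [
    {"asset_id": "PMP-001", "event_date": "2005-01-10", "cost": 40000.0},
    {"asset_id": "PMP-001", "event_date": "2005-03-02", "cost": 20000.0},
    {"asset_id": "PMP-002", "event_date": "2005-02-14", "cost": 25000.0},
    {"asset_id": "PMP-003", "event_date": "2005-02-20", "cost": 10000.0},
    {"asset_id": "PMP-004", "event_date": "2005-04-01", "cost": 5000.0},
]


def pareto(records, key, measure, threshold=0.80):
    """Rank groups by their total `measure` and flag the 'significant few'.

    A group is flagged as significant if the running total *before* it
    is still below `threshold` of the grand total, i.e. it is one of the
    top-ranked groups needed to reach that share of the losses.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += rec[measure]

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    rows, cumulative = [], 0.0
    for name, total in ranked:
        significant = cumulative < threshold * grand_total
        cumulative += total
        rows.append((name, total, cumulative / grand_total, significant))
    return rows


pareto_rows = pareto(events, key="asset_id", measure="cost")
for name, total, cum_share, significant in pareto_rows:
    marker = "significant" if significant else ""
    print(f"{name:8s} {total:10,.0f} {cum_share:6.0%} {marker}")
```

The same function could be pointed at a different measure, such as lost opportunity cost or event count, simply by changing the measure argument, which mirrors the point that the Pareto chart can be used on a variety of metrics.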
Today there are some powerful technologies to view and analyze data. One of the best for performing Opportunity Analysis via Pareto charts is a technology called On Line Analytical Processing or OLAP. This technology allows users to view data with a variety of dimensions and measures. For instance, suppose you wanted to know which unit within your plant was responsible for the greatest maintenance expenditures. Once you knew that, the next obvious question might be which pieces of equipment were most responsible for that. To go even deeper, you might want to know what the component was that caused most of that expense. With OLAP tools, you can use powerful drilldown capability to do this type of analysis. Figure 6.2, Figure 6.3, and Figure 6.4 are a series of charts demonstrating these dynamic Pareto charts.

The use of OLAP makes sophisticated data mining easy for end users. It allows users to see what they want to see in the form that is the most useful for them.
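
The unit, equipment and component drilldown described above can be imitated in a few lines of Python. This is only a sketch of the idea, not an OLAP engine: the records, the dimension names (unit, asset, component) and the helper functions rollup and drill_down are all hypothetical.

```python
from collections import defaultdict

# Hypothetical maintenance cost records (names and figures are illustrative).
records = [
    {"unit": "Unit A", "asset": "PMP-1", "component": "Seals", "cost": 30000},
    {"unit": "Unit A", "asset": "PMP-1", "component": "Bearings", "cost": 10000},
    {"unit": "Unit A", "asset": "PMP-2", "component": "Seals", "cost": 5000},
    {"unit": "Unit B", "asset": "PMP-3", "component": "Shaft", "cost": 20000},
    {"unit": "Unit B", "asset": "PMP-3", "component": "Seals", "cost": 8000},
]


def rollup(rows, dimension, filters=None):
    """Total cost per value of `dimension`, limited to rows matching `filters`.

    The result is sorted from highest to lowest cost (a Pareto ordering).
    """
    filters = filters or {}
    totals = defaultdict(int)
    for row in rows:
        if all(row[k] == v for k, v in filters.items()):
            totals[row[dimension]] += row["cost"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def drill_down(rows, dimensions):
    """Follow the top cost contributor through each dimension in turn."""
    filters, path = {}, []
    for dim in dimensions:
        top_value, top_cost = rollup(rows, dim, filters)[0]
        path.append((dim, top_value, top_cost))
        filters[dim] = top_value  # narrow the next level to this winner
    return path


drill_path = drill_down(records, ["unit", "asset", "component"])
for dim, value, cost in drill_path:
    print(f"{dim:10s} {value:10s} {cost:8,d}")
```

Each step answers one of the questions in the text: which unit costs the most, which asset within that unit, and which component on that asset.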
Although OLAP is an incredible tool for dynamic Opportunity Analyses, other tools might be useful as well. Some users might like to see the data presented in a particular format. For instance, suppose there is a corporate reporting standard that needs to be adhered to. If this were the case, the use of preformatted reports might make the most sense. Reports are useful for presenting predetermined metrics that are updated every time the particular report is run. Figure 6.5 is an example of a pump event count and maintenance cost report.

To allow for complete flexibility for data analysis, an APMS would provide a comprehensive tool for performing ad hoc queries. A query is simply a way to extract the information we need from the database. This is commonly done using the structured query language or SQL. SQL is the syntax or language needed to get the relevant data from the database. SQL is not something most analysts are intimately familiar with. So the APMS must provide a highly flexible query tool that does not require the end user to know anything about SQL. Figure 6.6 and Figure 6.7 provide an example of a query designed to determine the MTBF (mean time between failures) for a collection of pumps.
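
For readers who are curious what such a query might look like underneath the query tool, here is a small sketch using Python's built-in sqlite3 module. The table failure_events, its columns and the sample dates are assumptions made for illustration, and MTBF is approximated here as the average number of calendar days between consecutive failure events for each pump, ignoring downtime.

```python
import sqlite3

# In-memory database with a hypothetical table of pump failure events.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE failure_events (asset_id TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO failure_events VALUES (?, ?)",
    [
        ("PMP-1", "2005-01-01"),
        ("PMP-1", "2005-01-31"),
        ("PMP-1", "2005-03-02"),
        ("PMP-2", "2005-01-01"),
        ("PMP-2", "2005-04-11"),
    ],
)

# Days from first to last failure, divided by the number of intervals.
# Pumps with a single recorded failure have no interval and are excluded.
MTBF_QUERY = """
SELECT asset_id,
       (julianday(MAX(event_date)) - julianday(MIN(event_date)))
           / (COUNT(*) - 1) AS mtbf_days
FROM failure_events
GROUP BY asset_id
HAVING COUNT(*) > 1
ORDER BY mtbf_days ASC
"""

mtbf_by_asset = conn.execute(MTBF_QUERY).fetchall()
for asset_id, mtbf_days in mtbf_by_asset:
    print(f"{asset_id}: {mtbf_days:.1f} days between failures")
```

Sorting in ascending order puts the pumps with the lowest MTBF first, which are the likely candidates for RCA.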
This only scratches the surface of what can be accomplished when we automate opportunity analysis. There are far more sophisticated statistical methods that can be employed. Our advice, however, is to start with the basics and slowly move into more sophisticated methods.

By automating opportunity analysis, the users have a dynamic tool that allows them to look at opportunities in a variety of different ways. As business conditions change, then so can the opportunities. The key is to consistently collect the right data on a day-to-day basis.

FIGURE 6.4 Step 3 - Drill down to determine the components for the highest asset cost (i.e., PMP-4543)

FIGURE 6.5 Sample pump event report (Centrifugal Pump Report: Top 15 Bad Actors by Cost)

7 The PROACT RCA Methodology

The term "proact" has recently come to mean the opposite of react. This may seem to be in conflict with PROACT's use as a Root Cause Analysis (RCA) tool. Normally when we think of RCA, the phrase "after-the-fact" comes to mind. After-the-fact, by its nature, suggests an undesirable outcome must occur in order to spark action. So how can RCA be coined proactive?

In the last two chapters on Opportunity Analysis (OA), we clearly outlined a process by which to identify which failures or events were actually worth performing RCA on. We learned from this prioritization technique that, generally, the most important events to analyze are NOT the sporadic incidents, but rather the day-to-day chronic events that continually sap our profitability.

RCA tools can be used in a reactive fashion and a proactive fashion. The RCA analyst will ultimately determine this. When we use RCA to investigate only those incidents that are defined by regulatory agencies, then we are responding to the field. This is strictly reactive. However, if we were to use the OA tools described previously to prioritize our efforts, we will uncover events that many times are not even recorded in our Computerized Maintenance Management Systems (CMMS) or the like. This is because such events happen so often that they are no longer an anomaly. They are a part of the job. They have been absorbed into the daily routine. By uncovering such events and analyzing them, we are being proactive because unless we look at them, no one else will.

The greatest benefits from performing RCA will come from the analysis of chronic events, hence using RCA in a proactive manner. We must understand that often we are getting sucked into the "paralysis by analysis" trap and end up expending too many resources to attack an issue that is relatively unimportant. We also at times refer to these as the political failures-of-the-day. Trying to do RCA on everything will destroy a company. It is overkill, and companies do not have the time or resources to do it effectively (Figure 7.1).

Understanding the difference between chronic and sporadic events will now heighten our awareness of which data collection strategy will be appropriate for the event being analyzed. The key advantage, if there can be one with chronic events, is their frequency of occurrence. This is an advantage because, like the detective stalking a serial killer, we are looking for a pattern to the activities. In this manner, the detective may be able to stake out where he feels the next logical crime will take place and hopefully prevent its occurrence. The same is true for chronic events. With chronic events, we have in our favor that they will likely happen again within a certain time frame and we may be able to plan for their recurrence and capture more data at that point in time. We will discuss this more when we go over Verification Techniques in the Analyze chapter (Chapter 9).

[Figure 7.1 contrasts two tracks. PROACT RCA methods (principal analyst, part time/full time, involves all levels, extremely disciplined with high attention to detail) address the significant few failure events that account for 80% of losses. Basic failure analysis, or BFA (hourly/supervisory, part time, less intensive problem-solving and quality tools, less attention to detail), addresses the many events that account for 20% of losses. Together they give 100% failure coverage.]

FIGURE 7.1 The Two-Track Approach to Failure Avoidance

Conversely, when we look at what data collection strategy would be employed in a sporadic event, we find frequency does NOT work in our favor. Under these circumstances, our detective may be investigating a single homicide and be reliant on the evidence at that scene only. This would mean we must be diligent about collecting the data from the scene before it is tampered with. When a sporadic event occurs, we must be diligent at that time to collect the data in spite of the massive efforts to get the operation running again.

PRESERVING EVENT DATA

The first step in the PROACT RCA Methodology, as is the case in any investigative or analytical process, is to preserve and collect relevant data. Before we discuss the specifics of how and when to collect various forms of data, let's take a look at the psychological side of why people should assist in collecting data from an event scene.

Let's create a scenario in which we are a mechanic in a manufacturing plant. We just completed a 10-day shutdown of the facility to perform scheduled maintenance. Everyone knows at this facility that when the plant manager says the shutdown will last 10 days and no more, we do not want to be the one responsible for extending it past 10 days. A situation arises on the ninth day of the shutdown. During an internal preventive maintenance inspection we find that a part has failed and must be replaced. In good faith we request the part from the storeroom. The storeroom personnel inform us that the particular part is out of stock and that it will take four weeks to expedite the order from the vendor. Knowing this is the ninth day of the 10-day shutdown, we decide to make a "band aid" repair because we do not want to be the person to extend the shutdown. We rationalize that the "band aid" will hold for the four-week duration as we have gotten away with it in the past. So we install a not-like-for-like part in preparation for the start-up of the process.

Within 24 hours of start-up the process fails catastrophically and all indications lead to the area where the "band aid" fix was installed. A formal RCA team is amassed and they assign us to collect some parts data from the scene immediately. Given the witch hunting culture that we know exists, "Why should we uncover data/evidence that will incriminate us?" While this is a hypothetical scenario, it could very well represent many situations in any industry. "What is the incentive to collect event data in hopes of uncovering the truth?" After all, this is a time-consuming task. It will lead to people who used poor judgment and therefore management could witch hunt them and apply certain disciplinary actions.

FIGURE 7.2 Typical Reasons Why Event Data is Not Collected

These are all valid concerns. We have seen the good, the bad and the ugly created by these concerns. The fact of the matter is that if we wish to uncover the truth, the real root causes, we cannot do so without the necessary data. Think about any investigative or analytical profession: the first step is always to design data collection strategies to obtain the data. Is a detective expected to solve a crime without any evidence or leads? Is an NTSB investigator expected to solve the reasons for an airplane crash without any evidence from the scene? Do doctors make diagnoses without any more information than what the patient presented? If these professionals see the necessity of gathering data and information to draw conclusions, then we certainly must recognize the correlation to RCA.

Based on our experience, we have seen a general resistance to data collection for RCA purposes. We can draw two general conclusions from our experience (Figure 7.2):

1. People are resistant to collecting event data because they do not appreciate the value of the data to an analysis or analyst.
2. People are resistant to collecting data because of the paradigms that exist with regard to witch hunting and managerial expectations.

The first conclusion we see is the lesser of the two. Often production of any facility is the ruling body. After all, we are paid to produce quality product whether that product is oil, steel, package delivery or quality patient care. When this mentality is dominant, it forces us to react with certain behaviors. If production is paramount, then whenever an event occurs, we must clean it up and get production started as quickly as possible. The focus is not on why the event occurred; rather it is on the fact it did occur, and we must get back to our status quo as soon as possible.

This paradigm can be overcome merely with awareness and education. Management must first commit to supporting RCA both verbally and on paper. We discussed earlier in the management support chapter that demonstrated actions are seen as "walking the talk" and one of those actions was issuing an RCA policy or procedure. This would require data to be collected instead of making it an option. Secondly, it

is not enough just to support it, but we must link with the individuals who must physically collect the data. They must clearly understand why they should collect the data and how to do it properly.

We should link with people's value systems and show them the purpose of data collection. If we are an operator in a steel mill and the first one to an event scene, we should understand what is important information versus unimportant to an RCA. For instance, we can view a broken shaft as an item to clean up or as an integral piece of information to a metallurgist. If we understand how important the data we collect is to an analysis, we will see and appreciate why it should be collected. If we do not understand or appreciate its value, then the task is seen as a burden. Providing everyone with basic training in proper data collection procedures can prove invaluable to any organization.

We have seen the potential consequences of poor data collection efforts in some recent high profile court cases. Allegations are made as to the sloppy handling of evidence in lab work, improper testing procedures, improper labeling and contaminated samples. Issues of these types can lose your case as well.

Providing the above support and training overcomes one hurdle. But it does not clear the hurdle of perceived witch hunting by an organization. Many people will choose not to collect data for fear that they may be targeted based on the conclusion drawn from the data. This is a prominent cultural issue that must be addressed in order to progress with RCA. We cannot determine "root" causes if a witch hunting culture is prevalent.

THE ERROR-CHANGE PHENOMENON

Our experience indicates that there are an average number of errors that must occur in a particular pattern for a catastrophic event to occur. The Error Chain Concept¹ "describes human error accidents as the result of a sequence of events that culminate in mishaps. There is seldom an overpowering cause, but rather a number of contributing factors of errors, hence the term error chain. Breaking any one link in the chain might potentially break the entire error chain and prevent a mishap." This research comes from the aviation industry and is based on the investigation of more than thirty accidents or incidents. This has been our experience as well in investigating industrial failures.

Flight Safety International states the fewest links discovered in any one accident was four, the average being seven.² Our experience in industrial applications shows the average number of errors that must occur to be between 10 and 14. To us, this is the core to understanding what an analyst needs in order to understand why undesirable events occur (Figure 7.3).

FIGURE 7.3 The Error-Change Phenomenon

We like referring to it as error-change relationships. First we must define some terms in order to communicate more effectively. We will use James Reason's (Human Error: 1990)³ definition of human error for our RCA purposes. Jim Reason defines Human Error as "a generic term to encompass all those occasions in which a planned sequence of mental and physical activities fails to achieve its intended outcome, and when these failures cannot be attributed to some chance agency." This means we intended a satisfactory outcome and it did not occur. We, in some manner, either 1) deviated from our intended path, or 2) the intended path was incorrect.

The change, as a result of an error in our environment, is something that is perceptible to the human senses. An example might be that we commit an error by misaligning a shaft. The change will be that an excessive vibration occurs as a result. A nurse administering the wrong medication to a patient is the human error. The adverse reaction is the perceptible change. These series of human errors and associated changes are occurring around us every day. When they queue up in a particular pattern, that is when catastrophic occurrences happen.

Jim Reason coined the term Swiss Cheese Model³ to depict this scenario graphically and this term has caught on in many industries (Figure 7.4).

Knowing this information, we would like to make two points:

1. We as human beings have the ability through our senses to be more aware of our environments. If we sharpen our senses, we can detect these changes and take action to prevent the error chain from running its course. Many of our organizational systems are put in place to recognize these changes. For example, the predictive maintenance group's sole purpose is to utilize testing equipment to identify changes within the process and equipment. If changes are not within acceptable limits, actions are taken to bring them within acceptable limits.
2. By witch-hunting the last person associated with an event, we give up the right to the information that person possesses on the other errors that lead up to the event. If we discipline a person associated with the event because our culture requires a "head to roll," then that person (or anyone around him) will not likely be honest about why he made decisions that resulted in errors.

1 Flight Safety International, Crew Resource Management Workshop, September 1993.
2 Flight Safety International, Crew Resource Management Workshop, September 1993.
3 Reason, James. Human Error. Victoria: Cambridge University Press, 1990-1992.

FIGURE 7.4 Reason's Swiss Cheese Model

In a later chapter on Analyzing the Data (Chapter 9), we will explore what we
call a Logic Tree that is a graphical representation of an error-change chain and
based on this research. We discuss this research at this point because it is necessary
to understand that any investigation or analysis cannot be performed without data.
We have enough experience in the field application of RCA to make a general
statement that the physical activity of obtaining such data can have many
organizational barriers in front of it. Once these barriers are recognized and overcome, the
task comes of actually preserving and collecting the data.

THE 5-PS CONCEPT

Preserving Failure Data is the PR in PROACT. In a typical high-profile RCA, an
immense amount of data is usually collected and then must be organized and
managed. As we go through this discussion we will relate how to manage this process
manually versus with software. We will discuss automating your RCA using software
technologies in Chapter 12.
Consider this scenario: a major upset just occurred in our facility. We are charged
to collect the necessary data for an investigation. What is the necessary information
to collect for an investigation or analysis? We use a 5-Ps approach, where the Ps
stand for the following:

1. Parts
2. Position
3. People
4. Paper
5. Paradigms

Virtually anything needing to be collected from an event scene can be stored
under one of these categories. Many items will have shades of gray and fit under
two or more categories, but the important thing is to capture the information and
slot it under one category. This categorization process will help document and
manage the data for the analysis.
Let's use the example of the police detective again. What do we see detectives
and police routinely do at a crime scene? We see the police rope off the area
preserving the positional information. We see detectives interviewing people who
may be eyewitnesses. We see forensic teams "bagging and tagging" evidence or
parts. We see a hunt begin for information or a paper trail of a suspect that may
involve past arrests, insurance information, financial situation, etc. And lastly, as a
result of the interviews with the observers we draw tentative conclusions about the
situation such as, "He was always at home during the day and away at night. We
would see children constantly visiting for five minutes at a time. We think he is a
drug dealer." These are the paradigms that people have about situations that are
important because if they believe these paradigms, then they are basing their
decisions on them. This can be dangerous.

PARTS
Parts will generally mean something physical or tangible. The potential list is endless
depending on the industry where the RCA is conducted. For a rough sampling of
what is meant by parts, please review the following lists:

CONTINUOUS PROCESS INDUSTRIES
(OIL, STEEL, ALUMINUM, PAPER, CHEMICALS, ETC.)

Bearings
Seals
Couplings
Impellers
Bolts
Flanges
Grease Samples
Product Samples
Water Samples
Tools
Testing Equipment
Instrumentation
Tanks
Compressors
Motors
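
To make the categorization idea concrete, here is a minimal sketch of how collected items might be filed under the 5-Ps so that each item lands in exactly one category. The class names (`FiveP`, `EvidenceItem`), the helper `group_by_category`, and the sample items are hypothetical illustrations and are not part of the PROACT software or method.

```python
from dataclasses import dataclass
from enum import Enum


class FiveP(Enum):
    """The five evidence categories named in the text."""
    PARTS = "Parts"
    POSITION = "Position"
    PEOPLE = "People"
    PAPER = "Paper"
    PARADIGMS = "Paradigms"


@dataclass(frozen=True)
class EvidenceItem:
    description: str
    category: FiveP  # exactly one category, even for "shades of gray" items


def group_by_category(items):
    """Return a dict mapping every 5-P category to the descriptions filed under it."""
    grouped = {category: [] for category in FiveP}
    for item in items:
        grouped[item.category].append(item.description)
    return grouped


# Hypothetical items, loosely modeled on the detective example in the text.
items = [
    EvidenceItem("Bagged and tagged bearing", FiveP.PARTS),
    EvidenceItem("Sketch of the roped-off area", FiveP.POSITION),
    EvidenceItem("Eyewitness interview notes", FiveP.PEOPLE),
    EvidenceItem("Maintenance history printout", FiveP.PAPER),
    EvidenceItem("Belief that old equipment is supposed to fail", FiveP.PARADIGMS),
]

grouped = group_by_category(items)
for category, descriptions in grouped.items():
    print(f"{category.value}: {descriptions}")
```

Forcing a single `category` field per item mirrors the advice to capture the information and slot it under one category rather than debating where a borderline item belongs.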

DISCRETE PRODUCT INDUSTRIES
(AUTOMOBILES, PACKAGE DELIVERY, BOTTLING LINES, ETC.)

Product Samples
Conveyor Rollers
Pumps
Motors
Instrumentation
Processing Equipment

HEALTHCARE (HOSPITALS, NURSING HOMES,
OUTPATIENT CARE CENTER, LONG-TERM CARE, ETC.)

Medical Diagnostic Equipment
Surgical Tools
Gauze
Fluid Samples
Blood Samples
Biopsies
Medicines
Syringes
Testing Equipment
IV Pumps
Patient Beds

This is just a sampling to give a feel for the type of information that may be
considered under the parts category.

POSITION
Positional data is the least understood and is what we consider to be the most
important. Positional data comes in the form of two different dimensions, one being
physical space and the second being point in time. Positions in terms of space are
vitally important to an analysis because of the facts that can be deduced.
When the space shuttle Challenger exploded on January 28, 1986 it was
approximately five miles in the air. Films from the ground provided millisecond-
by-millisecond footage of the parts that were being dispersed from the initial cloud.
From this positional information, trajectory information was calculated and search
and recovery groups were assigned to the approximate locations of where vital parts
were located. Approximately 93,000 square miles of ocean were involved in the
search and recovery of shuttle evidence in the government investigation.¹ While this
is an extreme case, it shows how position information is used to determine, among
other things, force.
While on the subject of the shuttle Challenger, other positional information that
should be considered is, "Why was it the right solid rocket booster (SRB) and not
the left?" "Why was it the aft (lower) field joint attachment versus the upper field
joint attachment?" "Why was the leak at the O-ring on the inside diameter of the
SRB versus the outside diameter?" These are questions regarding positional
information that had to be answered.
Now let's take a look at positions in time and their relative importance.
Monitoring positions in time at which undesirable outcomes occur can provide information
for correlation analysis. By recording historical occurrences we can plot trends that
identify the presence of certain variables when these occurrences happen. Let's take
a look at the shuttle Challenger again. Most of us remember the incident and the
conclusion reported to the public: an O-ring failure resulting in a leak of solid rocket
fuel. If we look at the positional information from the standpoint of time, we would
learn that the O-rings had evidence of secondary O-ring erosion on 15 of the previous
25 shuttle launches.¹ When the SRBs are released they are parachuted into the ocean,
retrieved and analyzed for damage. The correlation of these past launches, which
incurred secondary O-ring erosion, showed that low temperatures were a common
variable. The positions in time information aided in this correlation.
Ironically, in the case of the shuttle Columbia break-up, there had been seven
occurrences of bipod ramp foam events since the first mission, STS-1. The table below
identifies which missions incurred which types of damage.

TABLE 7.1
Space Shuttle Columbia Debris Damage Events

Mission   Date      Comments
STS-1     04/12/81  Lots of debris damage. 300 tiles replaced.
STS-7     06/18/83  First known left bipod ramp foam shedding event.
STS-27R   12/02/88  Debris knocks off tile; structural damage and near burn-through
                    results.
STS-32R   01/09/90  Second known bipod event.
STS-35    12/02/90  First time NASA calls foam debris "safety of flight issue," and
                    "re-use or turn-around time issue."
STS-42    01/22/92  First mission after which the next mission (STS-45) launched
                    without debris in-flight anomaly closure/resolution.
STS-45    03/24/92  Damage to wing RCC Panel 10-right. Unexplained anomaly, "most
                    likely orbital debris."
STS-50    06/25/92  Third known bipod ramp foam event. Hazard Report 37: Accepted Risk.
STS-52    10/22/92  Undetected bipod ramp foam loss (fourth bipod event).
STS-56    04/08/93  Acreage tile damage (large). Called "within experience base."
STS-62    03/04/94  Undetected bipod ramp foam loss (fifth bipod event).
STS-87    11/19/97  Damage to Orbiter Thermal Protection System spurs NASA to begin
                    9 flight tests to resolve foam shedding. Foam fix ineffective.
                    In-flight anomaly eventually closed after STS-101 as "accepted
                    risk."
STS-112   10/07/02  Sixth known left bipod ramp foam loss. First time major debris
                    event not assigned an in-flight anomaly. External tank was
                    assigned an action. Not closed out until after STS-113 and
                    STS-107.
STS-107   01/16/03  Columbia launch. Seventh known left bipod ramp foam loss event.

¹ Challenger: Disaster and Investigation, Cananta Communications Corp., 1987.
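
The "positions in time" idea can be sketched in a few lines of code: put the recorded occurrences on a time line and look at how often, and how closely spaced, they were. The sketch below uses the seven bipod ramp foam events from Table 7.1 and the 15-of-25 O-ring erosion figure quoted above. The function names (`build_timeline`, `erosion_rate`) are hypothetical, and the dates for STS-52 and STS-62 are the missions' launch dates, entered as an assumption because those table cells are hard to read.

```python
from datetime import date

# Bipod ramp foam events listed in Table 7.1: (mission, launch date).
# Assumption: STS-52 and STS-62 use the missions' launch dates.
BIPOD_FOAM_EVENTS = [
    ("STS-7", date(1983, 6, 18)),
    ("STS-32R", date(1990, 1, 9)),
    ("STS-50", date(1992, 6, 25)),
    ("STS-52", date(1992, 10, 22)),
    ("STS-62", date(1994, 3, 4)),
    ("STS-112", date(2002, 10, 7)),
    ("STS-107", date(2003, 1, 16)),
]


def build_timeline(events):
    """Sort events by date and attach the gap in days since the previous event."""
    ordered = sorted(events, key=lambda event: event[1])
    timeline = []
    previous = None
    for mission, when in ordered:
        gap_days = (when - previous).days if previous is not None else None
        timeline.append((mission, when, gap_days))
        previous = when
    return timeline


def erosion_rate(flights_with_erosion, flights_total):
    """Fraction of flights on which the condition was observed."""
    return flights_with_erosion / flights_total


timeline = build_timeline(BIPOD_FOAM_EVENTS)
for mission, when, gap_days in timeline:
    gap_text = "first recorded event" if gap_days is None else f"{gap_days} days after previous"
    print(f"{when.isoformat()}  {mission:8s} {gap_text}")

print(f"Bipod foam events on record: {len(timeline)}")
print(f"Challenger-era O-ring erosion rate: {erosion_rate(15, 25):.0%}")
```

A listing like this does not explain why the events happened. It only makes the recurrence visible, which is the point the text makes about mapping occurrences on a time line.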



The long and short of it here is that the loss of foam tiles from the main fuel
tanks and their subsequent impact on the shuttle vehicle were not a new phenomenon,
just like the O-ring erosion occurrences. Collecting the positions in time of these
occurrences and mapping them out on a time line prove these correlations.
Now moving into more familiar environments, we can review some general or
common positional information to be collected at most any organization (Figure 7.5):

Physical Position of Parts at Scene of Incident
Point in Time of Current and Past Occurrences
Position of Instrument Readings
Position of Personnel at Time of Occurrence(s)
Position of Occurrence in Relation to Overall Facility
Environmental Information Related to Position of Occurrence such as
Temperature, Humidity, Wind Velocity, etc.

We are not looking to recruit artists for these maps or sketches. We are simply
seeking to ensure that everyone sees the situation the same way based on the facts
at hand. Again, this is just a sampling to get individuals in the right frame of mind
of what we mean by positional information.

FIGURE 7.5 Mapping Example of Sulfur Burner Boiler
(Figure labels: Hotter side no failure; Thermocouple)

PEOPLE
The people category is the more easily defined P. This is simply who needs to be
talked to initially in order to obtain information about an event. The people
we must talk to first should typically be the physical observers or witnesses of the
event. Efforts to obtain such interviews should be relentless and immediate. We risk
the chance of losing direct observation when we interview observers days after an
event occurs. We will ultimately lose some degree of short-term memory and also
risk the observers' having talked to others about their opinion of what happened.
Once observers discuss such an event with outsiders, they will tend to reshape their
direct observation with the new perspectives.
We have always regarded the goal of interviews with observers to be that we
must be able to see through their eyes what they saw at the scene. The description
must be vivid, and it is up to the interviewer to obtain such clarity through their
questioning process.
Interviewing skills are necessary in such analytical work. People must feel
comfortable around an interviewer and not intimidated. A poor interviewing style
can ruin an interview and subsequently an analysis or investigation. A good
interviewer will understand the importance and value of body language. Experts estimate
approximately 55-60% of all communication between people is through body
language. Approximately 30% of communication is through the tonal voice and 10-15%
is through the spoken word.¹ This is very important when interviewing because it
emphasizes the need to interview in person rather than over the telephone. If you
look at the legal profession, lawyers are professionals at reading the body language
of their clients, their opposition and the witnesses. Body language clues will direct
their next move. This should be the same for interviews associated with an
undesirable outcome. The body language will tell interviewers when they are getting
close to information they desire, and this will direct the line and tone of subsequent
questioning. Consider another profession that we might not think of as having a
strong relationship with body language: professional poker players. With the recent
popularity of Texas Hold'em poker, it does not take the novice long to realize that
the strength of the cards you are dealt does not determine if you are a winner.
Professional poker players play their hands based on their read of the body language
of their opponents. They know that there are certain involuntary responses of the
body by certain players that indicate that they are holding a strong hand or that they
are likely bluffing. This further validates the importance and effect of body language
when interviewing.

¹ Lyle, Jane. Body Language. London: The Hamlyn Publishing Group Limited, 1990.

When interviewing during the course of an RCA, it is also important to consider
the logistics of the interview. Where is the appropriate place to interview? How
many people should we interview at a time? What types of people should be in the
room at the same time? How will we record all the information? Preparation and
environment are very important factors to consider.
We discussed the interviewing environment and the ideal number of people in
an interview in Chapter 5. Those same pointers will hold true when interviewing
for the actual RCA versus the Opportunity Analysis.
We have most success in interviews when the interviewees are from various
departments, and more specifically from different kingdoms. We define a kingdom

as entities that build their castles within facilities and tend not to communicate with
each other. Examples can be maintenance versus operations, labor versus
management, doctors versus nurses, hourly versus salary, etc. When such groups get together
they learn a great deal about the other's perspective and tend to earn a respect for
each other's position. This is another added benefit of an RCA: people actually
start to meet and communicate with others from different levels and areas.
If an interviewer is fortunate enough to have an associate analyst to assist, the
associate analyst can take the notes while the interviewer focuses on the interview.
It is not recommended that recording devices be used in routine interviews as they
are intimidating, and people believe that the information may be used against them
at a later date. In some instances where significant legal liabilities may be at play,
legal counsel may impose such actions. However, if they do, they are generally doing
the interviewing. In the case of most chronic failures or events, such extremes are rare.
Common people to interview will again be based on the nature of the industry
and the event being analyzed. As a sample of potential interviewees, please consider
the following list:

Observers
Maintenance Personnel
Operations Personnel
Management Personnel
Administrative Personnel
Clinicians/Medical Staff
Technical Personnel
Purchasing Personnel
Storeroom Personnel
Vendor Representatives
Original Equipment Manufacturers (OEM)
Other Similar Sites with Similar Processes
Inspection/Quality Control Personnel
Risk/Safety Personnel
Environmental Personnel
Lab Personnel
Outside Experts

As stated previously, this is just to give a feel for the variety of people that may
provide information about any given event.

PAPER
Paper data is probably the most understood form of data. In an information age
where we have instant access to data through our communications systems, we tend
to be able to amass a great deal of paper data. However, we must make sure that
we are not collecting paper data for the sake of developing a big file. Some companies
we have seen seem to feel they are getting paid based on the width of the file folder.
We must make sure the data we are collecting is relevant to the analysis at hand.
Keep in mind our detective scenarios discussed earlier and the fact they are
always preparing a solid case for court. Paper data is one of the most effective and
expected categories of evidence in court. Solid, organized documentation is the key
to a winning strategy.
Typical paper data examples are as follows:

Chemistry Lab Reports
Metallurgical Lab Report
Specifications
Procedures
Policies
Financial Reports
Training Records
Purchasing Requisitions/Authorizations
Nondestructive Testing Results
Quality Control Reports
Employee File Information
Maintenance Histories
Production Histories
Medical Histories/Patient Records
Safety Records Information
Internal Memos/E-Mails
Sales Contact Information
Process & Instrumentation Drawings
Past RCA Reports
Labeling of Equipment/Products
Distributive Control System (DCS) Strips
Statistical Process Control/Statistical Quality Control Information
(SPC/SQC)

We will discuss in the chapter "Automating Your RCA" how to keep all this
information organized and properly documented in an efficient and effective manner.

PARADIGMS
Paradigms have been discussed throughout this text as a necessary foundation for
understanding how our thought processes affect our problem solving abilities. But
exactly what are paradigms? We will base the definition we use in RCA on Futurist
Joel Barker's definition:

"A paradigm is a set of rules and regulations that: 1) Defines boundaries; and 2) tells
you what to do to be successful within those boundaries. (Success is measured by the
problems you solve using these rules and regulations.)"¹

¹ Barker, Joel. Discovering the Future: The Business of Paradigms. Elmo, MN: ILI Press, 1985.

This is basically how each individual views the world and reacts and responds to
situations arising around them. This inherently affects how we approach solving
problems and will ultimately be responsible for our success or failure in the RCA effort.
Paradigms are a by-product of interviews carried out in this process and discussed
earlier in this chapter. Paradigms are recognizable because repetitive themes are
expressed in these interviews from various individuals. How an individual sees the
world is a mindset. When a certain population shares the same mindset, it becomes
a paradigm. Paradigms are important because even if they are false, they represent
the beliefs in which we base our decision-making. Therefore, true paradigms represent
reality to the people who possess them.
Below is a list of common paradigms we see in our travels. We are not making
a judgment as to whether or not they are true, but rather that they affect judgment
in decision-making.

We do not have time to perform RCA.
We say safety is number one, but when it comes down to brass tacks on
the floor, cost is really number one.
This is impossible to solve.
We have tried to solve this for twenty years.
It's old equipment; it's supposed to fail.
We know because we have been here for twenty-five years.
This is another program-of-the-month.
We do not need data to support RCA because we know the answer.
This is another way for management to witch hunt.
Failure happens; the best we can do is sharpen our response.
RCA will eliminate maintenance jobs.
It is a career-limiting choice to contradict the doctor (nurse's perspective).
We fully trust the hospitals to be responsible for our care.
Hospitals are safe havens for the sick.
What we get is what we order; there is no need to check.
RCA is RCA; it is all the same.
We don't need RCA; we know the answer.
If the failure is compensated for in the budget, it is not really a failure anymore.
RCA is someone else's job, not mine.

Many of these statements may sound familiar. But think about how each
statement could affect problem solving abilities. Consider these if-then statements.

If we see RCA as another burden (and not a tool), then we will not give
it a high priority.
If we believe that management values profit more than safety, then we
may rationalize that bending the safety rules is really what our
management wants us to do.
If we believe that something is impossible to solve, then we will not solve it.
If we believe that we have not been able to solve the problem in the past,
then no one will be able to solve it.
If we believe that equipment will fail because it is old, then we will be
better prepared to replace it.
If we believe RCA is the program-of-the-month, then we will wait it out
until the fad goes away.
If we do not believe data collection is important, then we will rely on
word of mouth and allow ignorance and assumption to penetrate an RCA
as fact.
If we believe that RCA is a witch-hunting tool, then we will not participate.
If we believe failure is inevitable, then the best we can do is become a
better responder.
If we believe that RCA will eventually eliminate our job, then we will
not let it succeed.
If a nurse believes that it is career limiting to contradict a doctor's order,
then someone will likely die as a result of the silence.
If we believe that the hospital is in total control of our care, then we will
not question things that seem wrong.
If we believe that hospitals are safe-havens for the sick, then we are stating
that we are not responsible for our own safety.
If we believe that what we get is what we order, then we will not ever
inspect when we receive an order and just trust the vendor.
If we believe that all RCA is the same, then techniques like the 5-Whys
will be considered as comprehensive and thorough as PROACT.
If we believe we know all of the answers, then RCA will not be valued.
If we believe that unexpected failure is covered for in the budget, then
we will not attempt to resolve those unexpected failures.
If we believe that RCA is someone else's job, then we are indicating that
our safety is the responsibility of others, and not ourselves.

Our purpose with these "if-then" statements is to show the effect that paradigms
have on human decision-making. When human errors in decision-making occur,
then they are the triggering mechanism for a series of other subsequent errors until
the undesirable event surfaces and is recognized.
We have now discussed in detail the error-change phenomenon and the 5-Ps.
Now we must discuss how we get all of this information. When an RCA team has
been commissioned, a group of data collectors must be assembled to brainstorm
which data will be necessary to start the analysis. This first team session is just
that, a brainstorming session of data needs. This is not a session to analyze anything.
The group must be focused on data needs and not be distracted by the premature
search for solutions. The goal of this first session should not be to collect 100%
of the data needed. Ideally our data collection attempts should result in capturing
about 60-70% of the necessary data. All of the obvious surface data should be
collected first and also the most fragile data. Table 7.2 describes the normal fragility
of data at a typical event scene. By fragility we mean the prioritization of the
5-Ps in terms of which is most important to collect first, second, third and so on.
We should be concerned about which data has the greatest likelihood of being
tainted the fastest.

TABLE 7.2
Data Fragility Rankings

5-Ps        Fragility Ranking
Parts       2
Position    1
Paper       3
People      1
Paradigms   4

You will notice that people and position are tied for first. This is not an accident.
As we discussed earlier in this chapter, the need to interview observers is immediate
in order to obtain direct observation. Positional information is equally important
because it is the most likely to be disturbed the quickest. Therefore attempts to get
such data should be performed immediately. Parts are second because if there is not
a plan to obtain them, they will typically end up in the trash can. Paper data is
generally static with the exception of process or online production data (DCS,
SPC/SQC). Such new technologies allow for automatic averaging of data to the
point that if the information is not retrieved within a certain time frame, it can be
lost forever. Paradigms are last because we wish we could change them faster, but
modifying behavior and belief systems takes more time.
One preparatory step for analysts should be to have a data collection kit
always prepared. Many times such events occur when we least expect it. We do
not want to have to be running around collecting a camera, plastic bags, etc. If
it is all in one place it is much easier to go prepared in a minute's notice. Usually
good models are from other emergency response occupations such as doctor's
bags, fire departments, police departments, EMTs, etc. They always have most
of what they need accessible at any time. Such a bag (in general) may have the
following items:

Caution Tape
Masking Tape
Plastic Zip-Lock Bags
Gloves
Safety Glasses
Ear Plugs
Adhesive Labels
Marking Pens
Digital Camera w/Spare Batteries
Video Camera (if possible)
Marking Paint
Tweezers
Pad and Pen
Measuring Tape
Sample Vials
Wire Tags to ID equipment

This is of course a partial listing, and depending on the organization and nature
of work other items would be added or deleted from the list.
The following form is a typical data collection form used for manually organizing
data collection strategies for an RCA team (Figure 7.6).

1. Data Type/Category: List which of the 5-Ps this form is directed at. Each
"P" should have its own form.
2. Person Responsible: The person responsible for making sure the data is
collected by the assigned date.
3. Data to Collect: During the 5-Ps brainstorming session, list all data
necessary to collect for each "P."
4. Data Collection Strategy: This space is for actually listing the plan of how
to obtain the previously identified data to collect.
5. Date to be Collected By: Date by which the data is to be collected and
ready to be reported to team.

Figure 7.7 is a completed sample Data Collection Worksheet:
FIGURE 7.6 Sample 5-Ps Data Collection Form
(Blank form. Header fields: Analysis name; Data type: people, parts, position, paper,
paradigms (circle one); Champion (person that ensures all data assigned below is
collected by due date). Columns: Data to be collected; How data will be obtained
(data collection strategy); Person responsible; Date to be collected by.)

FIGURE 7.7 Completed Data Collection Form
(Completed form. Analysis name: Recurring failure ... [remainder illegible]; Data type:
paper (circled); Champion: John Smith. Single row: Data to be collected: Shift logs;
How data will be obtained: Have shift foreman collect the shift logs when pump 235
fails and [illegible] to John Smith within 1 day; Person responsible: Ken Latino;
Date to be collected by: 11/30/05.)
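
For teams that keep this worksheet electronically rather than on paper, the form fields and the Table 7.2 fragility rankings can be combined so that the most fragile data is chased first. The following is a minimal sketch under stated assumptions: the names `CollectionTask`, `FRAGILITY_RANK`, and `collection_order` are hypothetical, only the shift-logs row comes from Figure 7.7, and the remaining rows (including the placeholder analyst names) are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Fragility rankings from Table 7.2 (1 = most fragile, collect first).
FRAGILITY_RANK = {
    "People": 1,
    "Position": 1,
    "Parts": 2,
    "Paper": 3,
    "Paradigms": 4,
}


@dataclass
class CollectionTask:
    """One row of a 5-Ps data collection form."""
    data_type: str           # which of the 5-Ps this row belongs to
    data_to_collect: str
    strategy: str            # how the data will be obtained
    person_responsible: str
    due: date                # date to be collected by


def collection_order(tasks):
    """Order tasks by Table 7.2 fragility rank, then by due date."""
    return sorted(tasks, key=lambda task: (FRAGILITY_RANK[task.data_type], task.due))


tasks = [
    # Row based on the completed form in Figure 7.7.
    CollectionTask("Paper", "Shift logs",
                   "Shift foreman collects the shift logs when pump 235 fails",
                   "Ken Latino", date(2005, 11, 30)),
    # Hypothetical rows, for illustration only.
    CollectionTask("Paradigms", "Recurring themes from interviews",
                   "Review interview notes for repeated statements",
                   "Analyst A", date(2005, 12, 5)),
    CollectionTask("Position", "Photos and sketch of the pump area",
                   "Photograph the scene before cleanup",
                   "Analyst B", date(2005, 11, 28)),
    CollectionTask("Parts", "Failed seal",
                   "Bag and tag the seal at teardown",
                   "Analyst C", date(2005, 11, 29)),
    CollectionTask("People", "Interview with shift operator",
                   "Interview in person as soon as possible",
                   "Analyst B", date(2005, 11, 28)),
]

ordered = collection_order(tasks)
for task in ordered:
    rank = FRAGILITY_RANK[task.data_type]
    print(f"{rank}  {task.data_type:10s} {task.data_to_collect} "
          f"({task.person_responsible}, due {task.due.isoformat()})")
```

Because Python's sort is stable, rows that tie on both rank and due date (here, Position and People) keep the order in which they were entered, which matches the table's treatment of the two categories as tied for first.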