ROOT CAUSE ANALYSIS
Improving Performance for Bottom-line Results
Robert J. Latino
Reliability Center, Inc.
Hopewell, Virginia
Kenneth C. Latino
Practical Reliability Group
Daleville, Virginia
Taylor & Francis
Boca Raton London New York
© 2006 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number-10: 0-8493-5340-8 (Hardcover)
International Standard Book Number-13: 978-0-8493-5340-6 (Hardcover)
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.
No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

We also dedicate this text to all of those in Louisiana, Mississippi, Alabama, and Florida who perished, and those who lost everything due to Hurricane Katrina on August 29, 2005. Hopefully this text will shed light on the reasons that allowed the consequences to be worse than they should have been!
Terrence O'Hanlon, CMRP
Publisher
Reliabilityweb.com

What is Root Cause Analysis (RCA)? It seems like such an easy question to answer, yet from novices to veterans and practitioners to providers, we cannot seem to agree on (nor come to consensus on) an acceptable definition for the industry. Why? We will discuss our beliefs as to why it is so hard to get such consensus and why various providers are reluctant for that to happen.
Many who will read this text are seeking to learn the basics about what is involved with conducting an RCA. Many veterans will peruse this text seeking to see if they can find any pearls of conventional wisdom that they do not already know or to dispute and debate our philosophies. This creates a very broad spectrum of expectation that we will try to accommodate. However, in the end, success shall be defined by the demonstration of quantifiable results and not on adherence to the approach of favor.
We have tried to write this text in a conversational style because we believe that is a format that most "rooticians" can relate to. Basically we are writing like we are teaching a workshop.
Readers will find that much of our experience comes not only from the practicing of RCA in the field, but more from our experiences with the over 10,000 analysts that we have taught and mentored over the years. Additionally, we participate in many on-line discussion forums where we interact with beginners, veterans and most providers for the betterment of the RCA field. We will list these sources in this text in the hopes that our readers will join and also participate in progressing our common field of study.
So as you can see, we try to bring many diverse perspectives to the table, while making the pursuit of RCA a practical one, not a complex one. We certainly want to avoid falling into the "paralysis by analysis" trap when looking at something like RCA - that would be hypocritical, would it not?
We will bring to light the perspectives of the pragmatic "rooticians" to the "purist" so that readers can make their own judgment as to what is best for their applications. We will present debates on definitions of words commonly used in the RCA lexicon but ultimately come to the conclusion that there are no generally accepted definitions in the field so we must fend for ourselves (which is part of the problem with communication).
There are many RCA methodologies on the market, so we will discuss them in generalities so as not to put the microscope on any individual or proprietary approach. In this manner we can discuss the pros and cons of each type of approach and readers can decide the level of breadth and depth that they require in their analysis.
We will discuss the scope of RCA: where does it begin and where does it end? How does a true RCA effort integrate with the organizational structure and remain a viable and valuable resource to the organization? Where there is RCA, there is turf politics. So we will discuss how this activity called RCA fits with existing
Kenneth C. Latino
President
Practical Reliability Group, LLC
Troutville, Virginia
tries. After working with clients to help them become more proactive in their maintenance activities, he began consulting and teaching industrial plants how to implement Reliability methodologies and techniques to help improve the overall performance of plant assets.
Over the past few years, a majority of Latino's focus has centered around developing Reliability approaches with a heavy emphasis on Root Cause Analysis (RCA). He has trained thousands of engineers and technical representatives on how to implement a successful RCA strategy at their respective facilities. He coauthored two RCA training seminars for engineers and hourly personnel respectively.
Latino is also co-software designer of the RCA program entitled The PROACT Suite. PROACT was a National Gold Medal Award winner in Plant Engineering's 1998 and 2000 Product of the Year competition for its first two versions on the market. He is currently President of the Practical Reliability Group, a Reliability consulting firm dedicated to delivering approaches and solutions that can be practically applied in any asset intensive industry.

Contents
Chapter 1 Introduction to the PROACT Root Cause Analysis (RCA) Work Process
Mean Time between Failure (MTBF)
Number of Events
Maintenance Cost
Availability
Reliability
Balanced Scorecard
The RCA Work Process
Chapter 2 Introduction to the Field of Root Cause Analysis
What is Root Cause Analysis (RCA)?
Why Do Undesirable Outcomes Occur? The Big Picture
Are All RCA Methodologies Created Equal?
Attempting to Understand RCA - Is This Good for the Industry?
What is Not Root Cause Analysis?
How to Compare Different RCA Methodologies When Comparing Them
What Are the Primary Differences between Six Sigma and RCA?
Obstacles to Learning from Things That Go Wrong
they progress into a maintenance and reliability initiative. We often hear about Mean-Time-Between-Failure (MTBF), Mean-Time-to-Restore (MTTR) and many others. Measuring performance for the sake of measuring is not especially useful unless the measurements are directly related to the performance of the organization and action is taken to make the needed improvements when the measures are going in a negative direction.
Therefore, we must first think about what goals or objectives we are trying to accomplish before we can determine what measures we need to monitor. An effective methodology for determining your company's objectives is a strategy map. A strategy map takes all of the objectives of the company and puts them into various perspectives. The perspectives can vary from company to company but for the area of asset management there are four main perspectives:
1. Corporate
2. Assets
3. Work Practices
4. Knowledge and Experience
Within each of the four perspectives, a number of individual objectives are defined. For instance, within the Corporate perspective we look at objectives that directly relate to goals defined within the company. These are typically related to the fiscal performance of the business but can also relate to critical operational issues like environmental and safety performance. Other objectives related to the Corporate perspective might be customer satisfaction issues like on-time deliveries, quality of the product and many others. However, in the area of asset management we typically focus on those areas that relate to financial, safety and environmental performance as they relate to the utilization of assets.
Below is a table of typical perspectives and objectives related to asset management:
1. Corporate Perspective
a. Increase Return on Investment (ROI)
b. Improve Safety and Environmental Conditions
c. Reduction of Controllable Lost Profit
d. Reduction of Maintenance Expenses
e. Increase Revenue from Assets
f. Reduce Production Unit Costs
g. Increase Asset Utilization
h. Minimize Safety and Environmental Incidents
2. Asset Perspective
a. Minimize Unscheduled Equipment Downtime
b. Improve System Availability
c. Reduce Scheduled Maintenance Downtime
d. Reduce Unscheduled Repairs
e. Reduce Non-Equipment-Related Downtime
f. Increase Equipment Reliability
g. Reduce Equipment Failure Time
3. Work Practices Perspective
a. Reduce Repair Time
b. Reduce Maintenance Material Inefficiencies
c. Improve Labor Efficiency
d. Improve Material Purchasing
e. Perform Predictive Maintenance
f. Optimize Time-Based Maintenance
g. Optimize Work Processes
h. Perform Reliability Studies
i. Perform Criticality and Risk Assessments
j. Improve Maintenance Planning and Scheduling
4. Knowledge and Experience Perspective
a. Improve Historical Equipment Data Collection
b. Improve Operations Communications
c. Train Maintenance and Operations Personnel
Once the perspectives and objectives are fully defined we need to determine the relationship of lower level objectives to upper level objectives. Below is an example of a sample strategy map with the objective relationships defined for the Corporate perspective (Figure 1.1). Strategy maps are an effective visual vehicle for demonstrating how every person in the organization can affect the performance of the overall business. For instance, when a technician is performing vibration analysis in the field he can see how the application of that skill will improve equipment reliability. This will ultimately contribute to the corporate goal of achieving higher returns on the capital employed.
[Figure: strategy map linking the Corporate perspective objectives: Increase return on investment (ROI); Improve safety and environmental conditions; Reduce production unit costs; Increase revenue from assets; Minimize safety and environmental incidents; Reduction of maintenance expenses; Reduction of controllable lost profits; Increase asset utilization.]
FIGURE 1.1 Sample Corporate Perspective Strategy Map
Let's return to the concept of metrics and Key Performance Indicators (KPIs). Tom Peters once said, "You can't improve what you cannot measure." If you think about it for a minute, it makes a lot of sense. We have been exposed to KPIs since we were very young. From the moment we are born we are weighed and measured, and then we are compared to standards to see which percentile we are in. As we grow and get into school, we are exposed to another set of KPIs, the infamous report card. The report card allows us to compare our performance against our peers or to some standard. An example that many people can certainly relate to is the use of a scale to measure the progress of a diet. We probably would not be very successful if we did not know where we started and what progress we were making week-by-week.
We all need a "scoreboard" to help us determine where we started and where we are at any given time. This certainly applies to measuring the performance of a maintenance and reliability organization. We need to know how many events occur in a given month, on a specific class of equipment, etc. Not until we know which KPIs will effectively measure our maintenance and reliability objectives can we [...]

MEAN TIME BETWEEN FAILURE (MTBF)
Mean Time Between Failure (MTBF) is a common metric that has been used for many years to establish the average time between failures. Although it can be calculated in different ways, it primarily looks at the total runtime of an asset divided by the total number of failures for that asset.
\[ \frac{\text{Total Runtime}}{\text{Number of Events}} = \text{MTBF} \]
EQUATION 1.1 Sample MTBF Calculation
This is a good metric because it is easy for people to understand and relate to and is common throughout industry.

NUMBER OF EVENTS
This metric simply measures the volume of events that occur on a variety of dimensions. Those dimensions are typically process units, equipment classes (e.g., pumps), equipment types (e.g., centrifugal pumps), manufacturer, and a host of others. This metric is closely related to MTBF as it is the denominator for the calculation. It can also be an accurate reflection of a facility's maintenance and reliability performance.

MAINTENANCE COST
This metric simply measures the number of maintenance dollars that are expended on rectifying the consequence of an event. This is typically the sum of labor and material cost (including contractor costs). This metric is also employed across many different dimensions like equipment, area, manufacturers, etc. This metric is a better business metric as it shows some of the financial consequences of the event. It also has some drawbacks, as it does not totally reflect the complete financial consequence of the event. It does not cover the lost opportunity (e.g., downtime) associated with the event. As we all know, the cost of downtime is much greater than the cost of maintenance on a dramatic downtime event.

AVAILABILITY
[...] This calculation can be modified in many ways to fit a specific business need. Although this metric is a good reflection of how available the assets are in a given time period, it provides absolutely no data on the reliability or business impact of the assets.

RELIABILITY
This metric can be a better reflection of how reliable a given asset is based on its past performance. In the availability example above, we had an asset that failed four times in a year resulting in 32 hours of downtime. The availability calculation determined that the asset was available 99.63% of the time. This might give the impression of a highly reliable asset. But if we use the reliability calculation shown below we get a much different picture.
The fact of the matter is, an asset that fails four times per year is extremely unreliable, and that asset is highly unlikely to reach a mission time of one year even though its availability is very good.
These are only a few common KPIs. As you can imagine there is an array of metrics that can be used to help measure the effectiveness of a maintenance and reliability organization. We will discuss these in more detail in just a moment.
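The figures quoted for this example (an asset that fails four times in a year with 32 hours of downtime) can be checked numerically. The following is a minimal sketch in Python, not a calculation taken from the text: the function names are hypothetical, the availability formula (period minus downtime, divided by period) is an assumption that happens to reproduce the quoted 99.63%, and runtime is approximated as the full 365 days.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes the asset is scheduled to run all year


def mtbf(total_runtime: float, number_of_events: int) -> float:
    """Mean Time Between Failure: total runtime divided by number of failure events."""
    if number_of_events <= 0:
        raise ValueError("MTBF is undefined without at least one failure event")
    return total_runtime / number_of_events


def availability(period_hours: float, downtime_hours: float) -> float:
    """Fraction of the period in which the asset was available (assumed formula)."""
    return (period_hours - downtime_hours) / period_hours


def reliability(mtbf_value: float, mission_time: float) -> float:
    """Probability of surviving the mission time: R = e^(-lambda * t), lambda = 1 / MTBF."""
    failure_rate = 1.0 / mtbf_value
    return math.exp(-failure_rate * mission_time)


# Four failures in one year, 32 hours of downtime, mission time of one year.
asset_mtbf_days = mtbf(total_runtime=365, number_of_events=4)
asset_availability = availability(HOURS_PER_YEAR, 32)
asset_reliability = reliability(asset_mtbf_days, mission_time=365)

print(f"MTBF:         {asset_mtbf_days:.2f} days")
print(f"Availability: {asset_availability:.2%}")
print(f"Reliability:  {asset_reliability:.2%}")
```

With unrounded inputs the sketch gives an MTBF of 91.25 days and a reliability of about 1.83%; rounding the MTBF down to 91 days gives about 1.81%. Either way, the contrast with an availability above 99% is the point of the example.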
\[ \text{Reliability} = e^{-\lambda t} \]
Natural logarithmic base: \( e = 2.718 \)
Failure rate: \( \lambda = \dfrac{1}{\text{MTBF}} = \dfrac{1}{91} \)
Mission time: \( t = 365 \) (days)
\[ \text{Reliability} = e^{-\lambda t} = 2.718^{-\frac{1}{91}(365)} = 2.718^{-4.0109} = 1.81\% \]

So we now understand that MTBF, MTTR, availability and many others are common measures for the effectiveness of equipment reliability. But unless these metrics are measuring the performance of a given company objective they might not provide the benefit that is trying to be achieved. Therefore, we need to first look at each objective and then develop pertinent measurements to see if that objective is indeed being met.
For example, if our objective were to reduce production unit costs we would measure the cost per unit of product produced. This will help us to understand if we are getting better, worse or staying constant with respect to our production costs. However, this alone is not enough. We need to be more specific when we are defining our measurements. The term Key Performance Indicator (KPI) as it is often referred to needs to delineate the difference between good and poor performance. For example, let us assume that our average cost per unit of product is $10 this month. Is that cost high, medium or low? In order to have an indicator we must define the measurement thresholds. In our example, we said that the average cost per unit this month was $10. Perhaps our target value for production unit cost is $8. Therefore, our performance is not very good.
A KPI has several thresholds that should be defined prior to the monitoring of the measure's value. These are listed below:
1. Target Value - This value specifies the performance required to meet the objective.
2. Stretch Value - This value represents performance above and beyond what is expected to meet our objectives.
3. Critical Value - This value represents performance that is deemed unacceptable for meeting our objectives.
4. Best Value - This is the best possible value for this objective.
5. Worst Value - This is the worst possible value for this objective.
When these thresholds are set properly for a given measurement we can objectively assess our performance. Otherwise, we are simply collecting information with no real sense of whether the value is meeting our specified goals.
Let's get back to the strategy map discussion. The process is to review each objective that we deem important to our strategy and list one or more KPIs that will be accurate measurements for that objective. Once we define the measurement and calculation, we need to determine the target, stretch, critical, best and worst values for that measure. Upon completion of this process we have a completed strategy map. Below is a table with some example KPIs that relate to our objectives and perspectives:
TABLE 1.1 Sample Completed Strategy Map
[Table contents not recoverable from the source; surviving fragments include the objective "Perform Reliability Studies" and the measure "% of PdM-Generated Work".]
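The five thresholds described for a KPI (target, stretch, critical, best and worst) lend themselves to a small data structure. The sketch below is one possible way to model them in Python. The class name, the band labels and every threshold value other than the $10 actual and $8 target from the production unit cost example are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiThresholds:
    """The five threshold values for a KPI (names are illustrative)."""
    worst: float
    critical: float
    target: float
    stretch: float
    best: float
    higher_is_better: bool = True

    def band(self, actual: float) -> str:
        """Return the range that the actual value falls into."""
        # Flip signs for "lower is better" KPIs (e.g., cost per unit) so the
        # comparisons below can always assume that larger means better.
        sign = 1.0 if self.higher_is_better else -1.0
        value = sign * actual
        if value < sign * self.critical:
            return "worst-critical"    # unacceptable performance
        if value < sign * self.target:
            return "critical-target"   # objective not yet met
        if value < sign * self.stretch:
            return "target-stretch"    # objective met
        return "stretch-best"          # above and beyond the objective


# Production unit cost: $10 actual against an $8 target (from the text);
# the worst, critical, stretch and best values are hypothetical.
unit_cost = KpiThresholds(worst=20.0, critical=12.0, target=8.0,
                          stretch=6.0, best=4.0, higher_is_better=False)
print(unit_cost.band(10.0))  # critical-target
```

A cost of $10 lands between the critical and target values, which matches the text's reading that performance "is not very good" without yet being unacceptable.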
BALANCED SCORECARD
[...] the process of monitoring these KPIs on a routine basis. We will use a balanced scorecard methodology to help us do just that. A balanced scorecard takes the perspectives, objectives, and measures introduced in the strategy map and puts them into an easily understood format. A sample of a balanced scorecard and KPI measurement are displayed below.
Having all of your critical performance information displayed in one place makes it easy for everyone involved in the enterprise to see his or her performance and to determine where to focus attention.
This process ensures that we are working on the critical issues that most affect the performance of the business. Once we begin to monitor the balanced scorecard on a routine basis we will begin to see the areas where we need to make improvements. For example, let us say we are monitoring unscheduled downtime as a measurement of the equipment downtime objective. We observe that our performance for that KPI is well below the target level. We then must investigate and collect information to see which events are contributing to the poor performance for that objective.

THE RCA WORK PROCESS
A successful RCA initiative must have a strategic and tactical plan in place. We just discussed the concept of a strategy map to ensure that we are measuring the key metrics that will enable us to achieve our company objectives. Let's talk more [...] more about event data collection in Chapter 5 and Chapter 6.
Once we have a process for collecting data on these events, we must decide on criteria that will initiate the execution of an RCA analysis. For example, your strategy might dictate that any failures that occur on critical equipment must have an RCA performed. This is very common for events that relate to safety and environmental performance. We do not want to leave this process too ambiguous because people will not know when, and under what circumstances, to conduct an analysis.
It may be that you want to employ different levels of analysis for different performance criteria. Perhaps you have many events that occur on noncritical equipment, but the frequency of the events is causing a large amount of maintenance expenditure. This might not justify a full-blown team to perform the analysis but still would justify some level of analysis to determine the reasons for the chronic maintenance events. These types of analyses might be much less formal than a full-blown RCA but still are valuable.
Since every company is different and thus has different goals and objectives it would not be prudent for us to define a generic criterion. However, we can delineate some examples that might be considered. In any plant, there is a need to optimize maintenance expenditures. Therefore, we may want to consider a criterion that is based on the amount of maintenance expended for a given piece of equipment for a fixed time period (e.g., the last 12 months). If a piece of equipment exceeds the threshold in that time period, then an RCA will automatically be initiated.
Another common criterion can be based on production losses. This is especially true if your plant capacity is limited and you can market and sell everything that is produced in your facility. If there is a production loss that exceeds a specific financial value, then an RCA should be initiated.
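The example criteria above (a failure on critical equipment, maintenance spending over a fixed period, and production losses above a financial value) can be written down as explicit rules so that there is no ambiguity about when an analysis starts. The following is a minimal sketch in Python. The record fields, the dollar limits and the function name are all hypothetical, since the text deliberately leaves the actual thresholds to each company.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EquipmentRecord:
    """Twelve-month summary for one piece of equipment (all fields are illustrative)."""
    name: str
    is_critical: bool
    failures: int
    maintenance_cost: float   # maintenance dollars spent in the period
    production_loss: float    # financial value of lost production in the period


# Hypothetical thresholds; each company would set its own.
MAINTENANCE_COST_LIMIT = 50_000.0
PRODUCTION_LOSS_LIMIT = 100_000.0


def rca_triggers(record: EquipmentRecord) -> List[str]:
    """Return the reasons, if any, that an RCA should be initiated."""
    reasons = []
    if record.is_critical and record.failures > 0:
        reasons.append("failure on critical equipment")
    if record.maintenance_cost > MAINTENANCE_COST_LIMIT:
        reasons.append("maintenance cost above threshold")
    if record.production_loss > PRODUCTION_LOSS_LIMIT:
        reasons.append("production loss above threshold")
    return reasons


# A noncritical pump with chronic, costly failures still qualifies for analysis.
pump = EquipmentRecord("P-101", is_critical=False, failures=9,
                       maintenance_cost=62_000.0, production_loss=15_000.0)
print(rca_triggers(pump))  # ['maintenance cost above threshold']
```

Returning the list of reasons, instead of a single yes or no, leaves room for the different levels of analysis the text mentions: a lone cost trigger on noncritical equipment might call for a less formal review than a failure on critical equipment.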
[Figure: scorecard detail for the Unit Availability KPI, showing owner, monthly frequency, current and previous measures, KPI ranges (Worst-Critical, Critical-Target, Target-Stretch) with Target 97.00, Critical 92.00 and Stretch 99.00, and a historical trend chart.]
FIGURE 1.3 Sample Balanced Maintenance and Reliability Scorecard (2)
These are simple examples, but it is important to make sure that there are agreed-upon criteria for when RCA analyses will be initiated and who will perform the analyses. At many facilities, there is a Reliability Engineer responsible for a given area of the facility and responsible to perform RCAs on equipment/events in his area. It is then his responsibility to determine which additional team members will be necessary to perform the analysis. We will discuss team formation in greater detail in Chapter 8.
The key to a successful analysis is to make sure that you have the data and subsequent information to determine the underlying causes of the issue being studied. The team will review the problem and determine what data will be needed to determine the root causes. The PROACT methodology offers a simple but effective acronym called the 5Ps to help in this effort. The 5Ps represent the five categories of data required to analyze any problem. We will discuss the data collection effort, and more specifically the 5Ps, in Chapter 6.
Have you ever sat in a brainstorming meeting to solve a particular problem in the company? This is a very common approach to problem solving. We are not against the concept of brainstorming. In reality, we think it is a required activity in the RCA analytical process. The problem with most brainstorming sessions is that the group presents a variety of ideas but sometimes they lack the data to verify that the solution will work. For this reason, the PROACT methodology will utilize a Logic Tree approach to solve problems. This is a visual brainstorming tool. It is a hierarchical approach in which the problem is defined in the beginning of the process and subsequent hypotheses and verifications are formulated and proven. The end goal of the process is to identify the true root causes of the problem. These causes can be physical, human or latent in nature. We will discuss this later in Chapter 9.
Identification of root causes, although important, will not solve the problem. The only way for the problem to be resolved is to implement corrective actions. This is typically done by creating a list of recommendations directed at eliminating or reducing the impact of the identified root causes. These recommendations must be thoroughly reviewed by all parties to ensure that they are the right solutions. Although causes are facts and cannot be disputed, recommendations should be thoroughly scrutinized and modified to ensure that they are the best course of action. We will discuss the process of communicating team findings and recommendations in Chapter 10.
As time passes we sometimes forget to follow up to make sure that our corrective actions were implemented and are providing the specified return we had intended. If the losses related to the problem are still affecting plant performance and negatively affecting our corporate strategy, then we should reevaluate our corrective actions to determine why they are not providing the intended benefit. The strategy map discussed earlier will help but we would recommend having reevaluation criteria set for each recommendation. For example, we might measure the number of failures on that piece of equipment. If another failure occurs in the next 12 months, we should reevaluate to see if the failure was related to the ineffectiveness of our corrective actions. We will discuss tracking results in Chapter 11.
Let's revisit our discussion on data collection methods. We have various methods to collect historical event information. We would like to break it into two categories: a manual and an automated data collection process. In Chapter 5 we will discuss a process called Opportunity Analysis (OA) where we collect the data through the use of an interview process of various personnel within the affected area. In the subsequent chapter we will discuss a more automated approach to data collection that will utilize existing information systems that may already be employed at the company.
There are pros and cons to both approaches. It generally comes down to data collection processes and how effectively they have been employed. Many companies utilize a Computerized Maintenance Management System or CMMS to manage maintenance work and to document work history. For many, these systems are not utilized to their full potential and many times the work history on assets is not fully documented. If this is the case, then a manual interview process can be utilized to perform the opportunity analysis.
Now that we have explored the concept of the RCA Work Process, we will narrow the scope and look into the field of RCA itself and what it means in the industry, both from a user and provider perspective.
2 Introduction to the Field of Root Cause Analysis

WHAT IS ROOT CAUSE ANALYSIS (RCA)?
What a seemingly easy question to answer, yet no standard, generally accepted definition exists in the industry today. We participate in several RCA on-line discussion forums where practitioners (beginners to veterans) and providers interact for the betterment of the industry. The two primary forums that we encourage interested analysts to join are:
1. rootcauseconference@yahoogroups.com¹, and
2. Root_Cause_State_of_the_Practice@yahoogroups.com²
These are two very active forums with some of the most knowledgeable people in the business participating in them. Issues that will be discussed throughout this text are debated on these forums every day. These forums play an important role in how we see the industry as we learn what others are doing and the obstacles they face. This brings us back to the definition of RCA.
To our knowledge, there is no single, generally accepted definition of Root Cause Analysis in the RCA industry. Technical societies, regulatory bodies and corporations have their own definitions, but it is rare that we find two definitions that match. To demonstrate why this is, we will list several definitions used and proposed in various industries to show the many different ways in which people view RCA:
¹ This discussion forum is associated with www.rootcauselive.com and moderated by Mr. C. Robert Nelms.
² This discussion forum is moderated by Dr. William Corcoran of NSRC Corporation.
lB Root Cause Analysis: Improving Performance far Bottom-Line Results Introduction to the Field of Root Cause Analysis 19
4. Root Cause Analysis is any process that identifies the underlying weaknesses
   that might lead to an adverse event or condition, in order to identify
   opportunities for improvement. (5/12/04 - Dr. Kenneth Hirsch)

All of these discussion forum posts¹ resulted from the original definition proposed
by Mr. William Salot:

RCA identifies WHAT underlying causes need to be fixed, not HOW to fix them.

¹ All posts printed with permission of the website moderators at rootcauseconference@yahoogroups.com

Who is right? We do not think that there is one cure-all definition for RCA. As
we can tell from above, the proposed definition was re-shaped every time a debate
ensued about the definition of individual words within the proposed definition. What
we do not want to happen in the industry is for people to be discouraged from doing
RCA because some definitions make it seem too complex. For the purposes of this
text, we feel that definition two above suits our needs and captures our belief as to
what RCA should be. Therefore we will proceed on the basis of that definition.

WHY DO UNDESIRABLE OUTCOMES OCCUR? THE BIG PICTURE

We must put aside the industry that we work in and follow along from the standpoint
of the human being. In order to understand why undesirable outcomes exist, we
must understand the mechanics of failure. Virtually all undesirable outcomes are the
result of human errors of omission or commission (or decision errors). Experience
in industry indicates that any undesirable outcome will have, on average, a series
of 10 to 14 cause-and-effect relationships that queue up in a particular pattern in
order for that event to occur.
This dispels the commonly held myth that one error causes the ultimate undesirable
outcome. All such undesirable outcomes will have their roots embedded in
the physical, human and latent areas.

Physical Roots are typically found soon after errors of commission or omission.
   They are the first physical consequences resulting from a human
   decision error. Physical roots, as will be described in detail in coming
   chapters, are in essence tangible.
Human Roots are decision errors. These are the actions (or inactions) that
   trigger the physical roots to surface. As mentioned above, these are the
   errors of omission or commission of the human being.
Latent Roots are the organizations or systems that are flawed. These are the
   support systems (i.e., procedures, training, incentive systems, purchasing
   habits, etc.) that are typically put in place to help our workforce make better
   decisions. Latent roots are the expressed intent of the human decision
   making process.

ARE ALL RCA METHODOLOGIES CREATED EQUAL?

There are many providers of various RCA methodologies on the market today. Many
of these providers use tools that are considered RCA in the RCA community and
many do not. Many have been in the RCA business for decades and many have just
gotten into it. The point here is that this is a buyer beware field.
Anyone interested in shopping for RCA based solely on initial price should
hand out a pencil and piece of paper and just ask his employees to ask themselves
"why?" five times and he will have his answers.
For those companies looking to make dramatic strides in their operations, shopping
on price alone will not suffice. Those serious about RCA's being a major
contributor to their bottom lines will be interested in the methodologies involved
and what supporting infrastructure may be required to be successful. We will discuss
both of these very important topics in detail in coming chapters.
Many of the most respected providers in the RCA industry normally have their
own unique styles and vocabularies, but there are also many commonalities among
them. PROACT® is no different. These uniquenesses are what make the different
brands of RCA proprietary to a certain provider. They make the brands stand out
and separate them from the general commodity term of RCA.
For the users this is both good and bad. It is good to have variety and competition
in the market to keep investment down and provide choices for specific work
environments. It is sometimes bad because no generally accepted standards emerge to
which all true RCA methods should comply. Also, because there are so many RCA
methods on the market, the use of terminology is at best inconsistent when comparing
them. This further confuses users when they try to compare terms like our physical,
human and latent root causes with terms like contributing factors, primary root
causes, underlying root causes, approximate root causes, near root causes, mitigating
factors, exacerbating factors, proximate causes, etc.

ATTEMPTING TO UNDERSTAND RCA - IS THIS GOOD FOR THE INDUSTRY?

Valiant attempts have been made by the joint provider and user communities to
develop a standard for industry. One such attempt was to model it after the SAE
JA-1011 RCM standard¹. Debates arose as to whether such a standard is needed at
all and if so, can one be developed without constraining the task of RCA itself?
Because RCA requires such open boundaries to the disciplined thought process
required to find the truth, would developing a standard bias possible outcomes?

¹ Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes, G-11 Supportability
Committee, SAE Standards, Document # JA1011, August, 1999, (http://www.sae.org/servlets/product-

Creating an RCA standard may define the boundaries of RCA differently than
some providers' methodologies. In some circumstances, some providers' established
RCA methodologies may now be deemed non-compliant. This would obviously be
a detriment to their businesses, and naturally they would oppose the development of
such a standard. For instance, if an RCA standard listed validation of each hypothesis
with hard evidence as essential to RCA, then typical brainstorming techniques would
be non-compliant. If another RCA essential were that the team members had to create
the logic by exploring the possibilities of how something could have occurred, then
the use of pick-list RCA methodologies would be non-compliant.
Pick-list RCA is when the methodologies either provide paper templates with their
list of possibilities or, if software oriented, drop-down lists appear with the vendor's
possibilities provided. While these approaches on the surface seem the most logical
and the easiest route, there are dangers. One such danger is the user believes that all
the possibilities that could have contributed to the undesirable outcome are provided
in this list. That will likely never be the case as no vendor can claim to capture all of
the variables associated with any event in every environment. The second danger, and
perhaps the greater, is that the task of RCA is meant to raise the knowledge and skill
levels of the workforce. A methodology that provides what appears to be all of the
answers does not force the users to explore the possibilities on their own and therefore
they do not learn. They are simply doing paint-by-the-numbers RCA.
Unfortunately, for the user community especially, the endeavor to develop a common
standard never came to pass because the major providers could never come to a
consensus (which is not unusual). If readers wanted to take it upon themselves on behalf
of their corporations to develop an RCA standard internally that outlines the essential
elements of an analysis process in order for it to be considered RCA, we would
encourage them to obtain a copy of the SAE JA-1011 RCM standard and use it as a
draft baseline for the development of a similar document for RCA in their organization.
As we can tell from reading the SAE standard referenced above, it is not biased
to any provider or methodology. It simply clarifies for the organization what they
consider to be the essential elements of RCA. This is important because there are
divided camps on what is the scope of RCA. Some feel the task of identifying
qualified candidates for RCA is not RCA itself. Some feel that the writing of
recommendations and their subsequent approval process and implementation is not
in the scope of RCA. Having such a document clarifies what the company considers
to be RCA, and, more importantly, what is not considered RCA.

WHAT IS NOT ROOT CAUSE ANALYSIS?

It is common knowledge in manufacturing and healthcare today that a majority of
departments are understaffed and underfunded. When looking further into the
increased risk of error by a human being, one should note that being overwhelmed
with emergencies fuels the environment of error.
The acceptance of common brainstorming techniques such as the Fishbone Diagram,
the 5-Whys and process flow mapping techniques has provided many a false
sense of security. This false sense of security comes from the belief that these techniques
are comparable to true RCA. Again, this reinforces the need for an internal standard
that defines the minimum essential elements to be considered RCA in the organization.
The aforementioned techniques are referred to as brainstorming techniques and
not considered RCA techniques within the RCA community. This is because they
are not typically based in fact. They typically allow ignorance and assumption
(hearsay) to be viewed as fact. These are attractive techniques to such a reactive
environment because they can be concluded very quickly, oftentimes in a single
session with minimum participation (if any).
Why do such techniques conclude so quickly? Time usually is not required to
collect data or evidence to support the hearsay hypotheses. Usually data collection
and testing is the bulk of the time required in any investigative occupation. In accident
investigations, think of what weight they would carry without providing hard
evidence. If the National Transportation Safety Board (NTSB) didn't collect evidence
at airline crash scenes, what credibility would they have when issuing conclusions
and recommendations? What weight would a prosecutor's case in court carry if they
had no evidence except hearsay?

HOW TO COMPARE DIFFERENT RCA METHODOLOGIES WHEN COMPARING THEM

When researching RCA methodologies, we should consider characteristics other
than investments. While the initial investment may be very inexpensive, our greatest
concerns should be that the methodology has the breadth and depth to uncover all
of the root causes associated with any undesirable outcomes. If we focus on cost
and not value, we may find that the lifecycle costs to support an inexpensive RCA
methodology will cost 100 times the original investment when the undesirable
outcomes persist and upset daily operations.
We suggest that when a facility has properly researched the various RCA
methodologies on the market, it short-list the top three providers based on the
company's internal requirements (i.e., the standard that we discussed earlier). It is
also advised that the short-listed providers submit references prior to any future
meetings. Discussions with references should focus on comprehensiveness of
approach, efficiency and effectiveness, necessary management support and general
acceptance by organizational personnel. We would be seeking to sift out the advantages
and disadvantages of the provider's approach that these users have experienced.
We want to be sure to understand issues that are under the control of the provider
and issues that are under the control of the purchasing organization. For instance an
organization may select the best RCA option for their environment, but if the
management support infrastructure is not in place and the effort fails, it may not be
due to a flaw in the selected methodology.
Once short-listed the providers should be given the opportunity to present their
approaches either in-person or via live on-line conferencing technologies. This is
where they should be questioned and evaluated based on the merits of their
approaches and the breadth and depth of their offerings. Keep in mind that this will
also require preparation on the analyst's side in terms of preparing educated and
detailed questions related to the methodology and not just pricing structure.
One tool we provide our prospects that are researching RCA methodologies is
the evaluation tool shown in Figure 2.1. This is an unbiased way of equally evaluating
several approaches based on custom weighting of methodology characteristics.
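
As a rough illustration of how such a weighted comparison might be tallied, the following Python sketch scores one provider. The criteria names, the 1-to-5 rating scale, the sample ratings and the disparity threshold are assumptions made purely for illustration; they are not taken from Figure 2.1. Each criterion carries a weight from 1 (lowest impact on the decision) to 5 (greatest), each evaluator rates the provider individually, and the ratings are averaged as a team before the weights are applied.

```python
from statistics import mean

# Hypothetical criteria with weights from 1 (lowest impact) to 5 (greatest impact).
WEIGHTS = {
    "breadth_and_depth": 5,
    "training_quality": 4,
    "ease_of_use": 3,
    "initial_price": 2,
}


def team_average(forms):
    """Average each criterion's rating across the evaluators' individual forms."""
    return {c: mean(form[c] for form in forms) for c in WEIGHTS}


def weighted_score(avg_ratings):
    """Sum of weight x team-average rating over all criteria."""
    return sum(WEIGHTS[c] * avg_ratings[c] for c in WEIGHTS)


def disparities(forms, threshold=2):
    """Criteria whose ratings differ by `threshold` points or more between evaluators."""
    return [
        c for c in WEIGHTS
        if max(f[c] for f in forms) - min(f[c] for f in forms) >= threshold
    ]


# Three evaluators rate one hypothetical provider on an assumed 1-to-5 scale.
provider_a_forms = [
    {"breadth_and_depth": 4, "training_quality": 5, "ease_of_use": 3, "initial_price": 2},
    {"breadth_and_depth": 5, "training_quality": 4, "ease_of_use": 3, "initial_price": 2},
    {"breadth_and_depth": 3, "training_quality": 3, "ease_of_use": 3, "initial_price": 2},
]

if __name__ == "__main__":
    print("weighted score:", weighted_score(team_average(provider_a_forms)))
    print("discuss further:", disparities(provider_a_forms))
```

Repeating the same tally for each short-listed provider gives comparable totals, and any criterion returned by `disparities` is a candidate for further team discussion rather than a verdict on the provider.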
Notice the characteristics (in this case) in which we have decided to compare
the methods short-listed are:
Remember, these are only a sampling of criteria in which RCA methodologies can
be evaluated. The organization's evaluation team should come up with its own list
based on the organization's own needs. Once the criteria have been established, then
the evaluation team can weight each of these factors as to their importance to the
overall decision. We typically use a weighting scale of 1 to 5 where "1" has the
lowest impact on the decision and "5" has the greatest. Once these are established
and entered into a simple spreadsheet like Figure 2.1 (after the evaluation team meets
with each provider), they will fill out this evaluation form individually and then
average them together as a team.
When the individual forms are compared, if there are great disparities in any
particular criteria it should be a signal that further discussion is needed to understand
why there is such a gap in how team members view the same thing. This approach is
a quick and unbiased manner in which to compare offerings of any kind, not just RCA.

WHAT ARE THE PRIMARY DIFFERENCES BETWEEN SIX SIGMA AND RCA?

Where does RCA fit in Six Sigma? The focal point of most Six Sigma efforts will [...]

OBSTACLES TO LEARNING FROM THINGS THAT GO WRONG

In a recent informal on-line poll¹ presented to a group of beginner and veteran RCA
practitioners, the following question was asked on the discussion forum:

"What are the obstacles to learning from things that go wrong?"

The following list is a summary of the responses grouped into appropriate
categories by the moderator. Some examples of the actual responses are below each
category to help define what was meant by the category title.

1. RCA is almost contrary to human nature: 28%
   a. People don't like to admit they made the mistake.
   b. Accountability. If you are the boss, that is it!
2. The second primary role of the RCA champion is to be a mentor to the
   drivers and the analysts. This means that the champion must be educated
   in the RCA process and have a thorough understanding of what is necessary
   for success.
3. The third primary role of the RCA champion is to be a protector of those
   who use the process and uncover causes that may be politically sensitive.
   Sometimes we refer to this role as providing air cover for ground troops.
   In order to fulfill this responsibility, the RCA champion must be in a
   position of authority to take a defensive position and protect the person
   who uncovered the facts supporting the identified causes.

Ideally this would be a full-time position. However, we find it typically to be a
part-time effort for an individual. In either situation we have seen the champion
work; the key is the role must be made a priority to the organization. This is generally
accomplished if the executives perform the designated tasks set out above. When
new initiatives come down the pike and the workforce sees no support, it becomes
another "they are not going to walk-the-talk" issue. These are viewed as lip service
programs that will pass over time. If the RCA effort is going to succeed, it must
first break down the current paradigms. RCA must be viewed as different than the
other programs. This is also the RCA champion's role in projecting an image that
this is different and will work.
The RCA champion's additional responsibilities include ensuring that the following
responsibilities are carried out:

1. Selecting and training RCA drivers who will lead RCA teams. What are
   the personal characteristics that are required to make this a success?
   What kind of training does the person need to acquire the tools to do
   the job right?
2. Developing management support systems such as:
   A. RCA performance criteria - What are the expectations of financial
      returns that are expected from the corporation? What are the time
      frames? What are the landmarks?
   B. Providing time - In an era of re-engineering and lean manufacturing,
      how are we going to mandate that designated employees will spend
      10% of their week on RCA teams?
   C. Process the recommendations - How are recommendations from
      RCAs going to be handled in the current work order system?
      How does improvement (proactive) work get executed in a reactive
      work order system?
   D. Provide technical resources - What technical resources are going to
      be made available to the analysts to prove and disprove their hypotheses
      using the "whatever it takes" mentality?
   E. Provide skill-based training - How will we educate RCA team
      members and ensure that they are competent enough to participate
      on such a team?
3. The champion will also be responsible for setting performance expectations.
   The champion should draft a letter that will be forwarded to all employees
   attending the RCA training. The letter should clearly outline exactly what
   is expected of them and how the follow-up system will be implemented.
4. The champion should ensure all training classes are launched either by
   the champion, an executive or other person in authority, thereby giving
   credibility and priority to the effort.
5. The champion should also be responsible for developing and setting up
   a recognition system for RCA successes. Recognition can range from a
   letter by an executive to tickets to a ball game. Whatever the incentive,
   it should be of value to the recipient.

Needless to say, the role of a champion is critical to the RCA process. The lack
of a champion is usually why most formal RCA efforts fail. There is no one leading
the cause or carrying the RCA flag. If an organization has never had a formal RCA
effort, or had one and failed, such an endeavor is an uphill battle.

THE ROLE OF THE RCA DRIVER

The RCA driver can be synonymous with the RCA team leader. Drivers are the
people who organize all the details and are closest to the work. They carry the burden
of producing bottom-line results for the RCA effort. Their teams will meet, analyze,
hypothesize, verify information and draw factual conclusions as to why undesirable
outcomes occur. Then they will develop recommendations or countermeasures to
eliminate the risk of recurrence of the event.
The efforts of the executive, manager and champion to support RCA are directed
at supporting the driver's role to ensure success. The driver is in a unique position
in that he deals directly with the field experts, the people who will comprise the
core team. The personality traits that are most effective in this role as well as that
of a core team member will be discussed at length in Chapter 8.
From a functional standpoint the RCA driver's roles are:

1. Making arrangements for RCA training for team leaders and team members -
   This includes setting up meeting times, approving training
   objectives, and providing adequate training rooms.
2. Reiterating expectations to students - Clarify to students what is expected
   of them, when it is expected, and how it will be obtained. The driver
   should occasionally set and hold RCA class reunions. This reunion should
   be announced at the initial training so as to set an expectation of demonstrable
   performance by that time.
3. Ensure that RCA support systems are working - Notify RCA champion
   of any deficiencies in support systems and see they are corrected.
4. Facilitate RCA teams - The driver shall lead the RCA teams and be
   responsible and accountable for the team's performance. The driver will
   be responsible for properly documenting every phase of the analysis.
32 Root Cause Analysis: Improving Performance for Bottom-Line Results    Creating the Environment for RCA to Succeed 33
5. Document performance - The driver will be responsible for developing
   the appropriate metrics to measure performance against. This performance
   shall always be converted from units to dollars when demonstrating savings,
   hence success.
6. Ensure regulatory compliance - The driver shall be responsible for
   ensuring that the analyses conducted are thorough and credible enough
   to meet applicable regulatory standards and guidelines.
7. Communicate performance - The driver shall be the chief spokesperson
   for the team. He or she will present updates to management as well as to
   other individuals on-site and at other similar operations that could benefit
   from the information. The driver shall develop proper information distribution
   routes so that the RCA results get to others in the organization that
   may have, or have had, similar occurrences.

The driver is the last of the support mechanisms that should be in place to support
an RCA effort. Most RCA efforts that we have encountered are put together at the
last minute as a result of an incident which just occurred. We discussed this topic
earlier regarding using RCA only as a reactive tool.
A structured RCA effort should be properly placed in an organizational chart.
Because RCA is intended to be a proactive task, it should reside under the control
of a structured reliability department. In the absence of such a department, it should
report to a staff position such as a vice president of operations, engineering, quality
or risk. Whatever the case may be, ensure that an RCA effort is never placed under
the control of a maintenance department (or any other reactive department). By its
nature, a maintenance department is a reactive entity. Its role is to respond to the
day-to-day activities in the field. The role of a true reliability department is to look
at tomorrow, not today. Any proactive task assigned to a maintenance department is
typically doomed from the start.
This is the reason that when reliability became a buzzword of the mid-90s many
maintenance engineering departments were renamed reliability departments. The
same people worked in the department, and they were performing the same jobs;
however, their title was changed and not their function. If you are an individual who
is charged with the responsibility of responding to daily problems and also seizing
future opportunities, you are likely never to realize those opportunities. Reaction
wins every time in this scenario.
Now let's assume at this point we have developed all the necessary systems and
personnel to support an RCA effort. How do we know what opportunities to work
on first? Working on the wrong events can be counterproductive and yield poor
results. In the next chapter we will discuss a technique to use to sell why you should
work on one event versus another.

SETTING FINANCIAL EXPECTATIONS: THE REALITY OF THE RETURN

As discussed earlier, one of the roles of the champion is to delineate financial
expectations of the RCA effort. This will obviously vary from the key performance
indicators (KPI) of each firm, but in this section we will look at providing a typical
business case to justify implementing an RCA effort.
Because the costs to implement such an effort will vary based on each facility's
product sales margin, labor costs and training costs (in-house versus contract), we
will base our justifications on the following assumptions:

1. Assumptions
   a. Loaded cost of hourly employee                        $US 50,000/yr
   b. Hourly employees will spend 10% of their time
      on RCA teams
   c. Loaded cost of full-time RCA driver (salaried)        $US 70,000/yr
   d. RCA driver will be a full-time position
   e. RCA training costs (hourly)                           $US 400/person/day
   f. RCA training costs (salaried)                         $US 500/person/day
   g. Population trained                                    Per 100 trained
2. RCA Return Expectations
   a. Train 100 hourly employees in RCA methods
   b. Train 1 salaried employee to lead RCA effort
   c. Critical Mass (assumption): 30% of those trained will actually use the
      RCA method in the field. This results in 30 personnel trained in RCA
      methods actually applying in the field (100 trained x 30% applying).
   d. Of the 30 personnel applying the RCA method, let us assume they are
      working in teams of three (3) at a minimum. This results in 10 RCA
      teams applying the methodology in the field (30 personnel/3 per team).
   e. Each RCA team will complete one analysis every two months. This
      results in 60 completed analyses per year (10 RCA teams x 6 analyses/yr).
   f. Each "Significant Few" (to be discussed in Chapter 4) analysis will
      net a minimum of $US 50,000 ANNUALLY. This results in an annual
      return of $US 3 million per 100 people trained in RCA methods.
3. The Costs of Implementing RCA
   YEAR 1
   a. Training 100 hourly employees in 3 days of RCA        $US 120,000
   b. Training 1 salaried person in 5 days of RCA           $US 2,500
   c. 10% of 30 hourly employees' time per week, annually   $US 150,000
   d. Salary of RCA Driver/Year                             $US 70,000
   e. Total RCA Implementation Costs for Year 1             $US 342,500

   YEAR 2
   a. Training 100 hourly employees in 3 days of RCA        $US 0
   b. Training 1 salaried person in 5 days of RCA           $US 0
   c. 10% of 30 hourly employees' time per week, annually   $US 150,000
   d. Salary of RCA Driver/Year                             $US 70,000
   e. Total RCA Implementation Costs for Year 2             $US 220,000*

   * All costs of resources to prove hypotheses and implement recommendations
   are considered as sunk costs. Technical [...]
   available and budgeted for, regardless of RCA. Also, recommendations
   from RCA generally result in the implementation of organizational
   system corrections. For instance rewriting procedures, providing training,
   upgrading testing tools, restructuring incentives, etc. These types
   of recommendations are not generally considered as capital costs.
   Capital costs resulting from RCA, in our experience, are not the norm,
   but the exception.
4. Return-On-Investment
   a. Total Expected Return - Year 1                        $US 1,500,000*
   b. Total Expected Costs - Year 1                         $US 342,500
   c. ROI Year 1                                            437%
   * Assumes that it will take six (6) months to train all involved and get
   up to speed with actually implementing RCA and the associated
   recommendations. This is the reasoning for cutting this expectation in
   half for the first year.

   a. Total Expected Return - Year 2                        $US 3,000,000
   b. Total Expected Costs - Year 2                         $US 220,000
   c. ROI Year 2                                            1360%

As we can tell from these numbers, the opportunities are left to the imagination.
They are real; they are phenomenal to the point they are unbelievable. When we
review the process we just went through, look at the conservativeness built in:

1. Only 30% of those trained will actually apply the RCA method
2. Students will spend only 10% of their time on RCA
3. Students will work in teams of three (3) or more
4. Students will complete only one (1) RCA every two months
5. Each event will net only $US 50,000/year

Use this same cost-benefit thought process and plug in your own numbers to
see if the ROIs are any less impressive. Using the most conservative stance, it would
appear irrational NOT to perform RCA in the field. How many of our engineering
projects would be turned down if we demonstrated to management a ROI ranging
from 437% to 1360%? Not many!

INSTITUTIONALIZING ROOT CAUSE ANALYSIS (RCA) IN THE SYSTEM

In an era where most college graduates will likely be employed by a minimum of
five employers in their careers, stability of turnover is difficult to control. This poses
a problem with what is often called corporate memory. Corporate memory is the
ability to retain the knowledge and experience of the workforce in the midst of a
high turnover environment. How does a company expect to produce a quality product
in a consistent manner when its workforce is inconsistent? This is an especially [...]
the mass exodus of knowledge and experience occurs in industry, how will businesses
compensate and be able to compete in the global economy?
RCA actually can play a major role in filling this corporate memory void. RCA
is a tool that maps out a process used to successfully solve a problem. This map in
essence is an aggregated thought based on the collective knowledge and experience
of our workforce. What we need to do is 1) encourage the activity of RCA in a
disciplined manner and 2) electronically catalogue these analyses in a manner in
which future employees can view how previous analysts derived their conclusions.
Activity one above can be accomplished by writing a procedure for RCA that
will survive the absence of a previous RCA champion. We want the activity of RCA
to still be expected by the organization via policy and procedure.
The following is a sample RCA procedure¹ we have used in industry in the
past. It should be used as a draft to model a more accommodating one for an
individual facility.

RELIABILITY CENTER, INC.
Sample PROACT RCA Procedure

1. PURPOSE
   a. To provide consistency to the organization in the application of the
      PROACT Root Cause Analysis (RCA) Process.
   b. To provide guidance in the following areas:
      Requests
      Analyses
      Reporting
      Presenting
      Tracking
2. APPLICATION/SCOPE
   This procedure applies to all users of the PROACT process conducted in
   compliance with all Safety Policies and Procedures unless otherwise
   directed by the Department Manager.
3. RESPONSIBILITY
   a. The Supervisor of Reliability Engineering (or equivalent) shall have
      the responsibility to review, amend, and revise this procedure as
      necessary to ensure its integrity and application.
   b. The Supervisor of Reliability Engineering (or equivalent) shall have
      the responsibility to develop, implement, review, and revise related
      procedures and/or documents required in this procedure.
4. DEFINITIONS
   a. Champion: Usually a person in authority that sponsors and mentors
      the principal analysts and supports the RCA effort.
   b. Charter: Defines the charter (or mission) of the RCA effort.
   c. Chronic Events: Events that occur repetitiously.
   d. Critical Success Factor (CSF): Identifiable marker that will signal
      the RCA effort has been successful. Guidelines in which the RCA
      team operates.
   e. Logic Tree: A graphical representation of logic used to uncover physical,
      human and latent root causes.
   f. Opportunity Analysis (OA): A technique to identify the most important
      failures (significant few) to analyze.
   g. Principal Analyst (PA), Qualified: The individual assigned the responsibility
      of leading and completing the RCA. The individual is qualified
      based on their successful completion of the PROACT
      Certification Workshop.
   h. PROACT: A software program that facilitates the PROACT RCA
      process.
   i. Root Cause Analysis (RCA): Any evidence-driven process that, at a
      minimum, uncovers underlying truths about past adverse events,
      thereby exposing opportunities for making lasting improvements.
   j. Significant Few: The 20% of the failure events that have been deemed
      to be accountable for 80% of the loss. This information is derived from
      the OA.
   k. Sporadic Event: A one-time catastrophic event.
   l. Vital Many: The many deviations that occur in a facility that equate
      to continuous improvement efforts.
5. REFERENCES
   a. Site Policy Manual
   b. Site Safety Manual
   c. Site Quality Manual
6. SPORADIC EVENTS
   a. An RCA is requested for sporadic events with a total cost (maintenance,
      operations and lost profit opportunities) greater than $100,000. Listed
      below are several examples:
      Unpredicted Event
      Property Damage
      Lost Production
   b. An RCA is requested for incidents that resulted in or could have
      resulted in personal injury or damage to equipment or property as
      defined in Section X of the Safe Practices Manual.
   c. An RCA is requested for repeat customer complaints and complaints
      from key customers.
7. SIGNIFICANT FEW
8. VITAL MANY/CONTINUOUS IMPROVEMENT
   The RCA of the Vital Many events will be led by a PA or other qualified
   personnel that are not in the Reliability Engineering group.
   a. Assignment of Champion: The Division Reliability Coordinator will
      be assigned as the champion of the event that falls within their division.
      i. A PA or other qualified personnel will be assigned or obtained by
         the Division Reliability Coordinator to lead the RCA.
      ii. The Division Reliability Coordinator's role is to provide the
         resources or obtain the resources that the PA needs to do the job
         right and to identify and remove obstacles that hinder their analysis.
9. DETERMINATION OF TEAM MEMBERS
   Certain events will require a team to be formed while others will not. If a
   team needs to be assembled the PA will make a recommendation to the
   Division Reliability Coordinator. The following items also need to be
   addressed when selecting the team.
      Multi-disciplined (i.e., mechanical, electrical, financial, managerial,
      hourly, etc.).
      Personnel directly affected by problem or event.
      Personnel who may be involved with implementation of solution.
      Excused from normal work assignments while working on RCA
      (similar to HAZOP Studies).
10. RCA METHODOLOGY
   a. When a team has been formed that is not familiar with RCA, the team
      will attend, at a minimum, a one-day problem solving methods (PSM)
      course before proceeding with the analysis.
   b. The team will accurately define the event.
   c. The charter and critical success factors (CSFs) of the analysis need to
      be developed so each team member knows the purpose of the analysis
      effort and if the effort is successful.
   d. Develop Strategy for Collecting the 5-Ps. The team or PA needs to
      develop the strategy for capturing the 5-Ps. This may involve taking
      pictures, retrieving data from the operating instrumentation, interviewing
      personnel, etc. The urgency with which this data is collected will depend
      upon whether this is a chronic or sporadic event.
   e. Assignment of 5-Ps: The PA will assign the 5-Ps (listed below) to team
      members who will be responsible for collecting the data.
      Parts
      Position
      People
A Qualified PA willlead the RCA of the Signific<1nt Few events t11at were Paradigms
identified by the Department OA, unless rec1irected by the Reliability Paper
Coordinator and/or the Department Manager. f. Analyze: Using the data collected, develop a logic tree.
a. Assignment of Champion: The Division Reliability Coorclinator will i. The logic tree will not be considered complete unless all the appli-
be assignecl as the champion of the event that falls within their Divislon. cable latent roots are identified.
]. A qualifiecl principal analyst (PA) wil! be assignec1 as the PA for g. Hypothesis VerificatioIl: Each hypothesis block OIl the logic tree needs
the Sienificant Few evenj's ::SSifTllf':rl t'o 1"hf': rlp.n::n-t1l1pnt t" hp \/p,;r,lPr! {nrr,,pn ",. rliC""\lY\\II"n\ 'T'h;" ;" "nF' "F th", n.,,,,,t {'r1lr';"l
   steps in the RCA process. Without verification, the findings and
   recommendations of the RCA are meaningless.
h. Review Logic Tree: The PA will contact the Division Reliability
   Coordinator when the team is ready to review the logic tree. The review
   should take place before proceeding with the report and the formal
   publishing of the analysis in the PROACT software program.
i. Write Report: The report should include the following sections:
      Executive Summary
         Description of Event
         Description of Mechanism
         Review of Causes and Recommendations
         Assignment of Responsibilities and Time Lines
      Detailed/Technical Section
         Detailed Recommendations
      Appendices
         Participants Involved
         5-Ps Data Collection Forms
         Verification Logs
         Logic Tree
j. Develop Draft Recommendations: A presentation of the findings of the
   RCA shall be given to personnel affected by implementing the
   recommendations and to personnel who will implement the recommendations
   and others as applicable. This will provide input that may affect or
   change specifics about the recommendations.
k. Revise and review the recommendations as necessary.
l. Develop corrective action items for each of the recommendations.
m. Formally present findings and recommendations to the Reliability
   Team and/or appropriate management personnel for implementation
   approval.
11. UTILIZATION OF PROACT RCA SOFTWARE
All documentation of RCAs is to be stored electronically using the
PROACT RCA software program on the designated client server.
Use of this program shall be in strict accordance with the license to
the corporation.
a. User Prerequisites: All users of PROACT must first successfully
   complete requisite training in one or more of the following courses
   based on their participation in the analysis.
   i. PROACT RCA Methods: All Principal Analysts (PA) shall complete
      the five-day RCA Methods course either on-site or at a public
      location. It will be at the discretion of the PA to determine which team
      members receive the password for password-protected analyses.
   ii. PSM (Problem Solving Methods): All RCA team members shall
      successfully complete the one-day PSM training by a licensed
      PSM trainer.
   iii. PROACT Software Training: All users of PROACT RCA software
      shall successfully complete either the five-day RCA Methods
      training or the one-day PSM training before becoming eligible for
      PROACT software training. All potential PROACT users are
      required to attend a four-hour short course in hands-on PROACT
      instructor-led training.
b. The PA shall be responsible for the complete accuracy of the analysis
   utilizing the software program. Team members shall update their
   responsibilities in any given analysis; however, the PA is ultimately
   responsible for reviewing the accuracy and thoroughness of the
   complete analysis.
c. The PA will assume the responsibility of when it is time to publish
   the RCA. Publishing the analysis in PROACT means that the
   completed RCA is certified to be credible and thorough. Once published,
   the analysis serves as a logic template for the rest of the corporation.
   Publishing also means that all sensitive materials have been reviewed
   by the legal department and have been approved for publishing in
   this format.
d. The PA will reserve the right to password protect the RCA. Only team
   members of that specific RCA shall be permitted to have the password.
   It shall be the responsibility of the PA to remove the password once
   the RCA has been published.
12. CORRECTIVE ACTION AND TRACKING
Personnel will be assigned responsibility for the corrective actions
necessary to implement the recommendations that result from the RCA. These
corrective actions will be tracked and a report issued.
a. The Division Reliability Coordinator and PA will assign responsibility
   for the corrective action items unless otherwise directed by the
   Department Manager or his designee.
b. The PA will notify a member of the reliability group (RG) that the
   RCA corrective action items have been assigned.
c. The PA will see that a copy of the full report (hardcopy and electronic)
   is given to the RG for filing purposes.
d. RCAs that result from events listing safety procedures will primarily
   be handled by plant protection or environmental affairs. These
   departments are responsible for tracking corrective action items that result
   from these RCAs.
e. All RCA corrective action items will be issued as needed in a report
   to the personnel assigned responsibility for the items. The corrective
   action items will remain in the report until completed.
f. Updates to the report can be forwarded to the Division Reliability
   Coordinator as they are completed and will be incorporated into the
   next quarterly report.
g. A progress report will be sent to the department manager for review.
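The request criteria in Section 6 of the sample procedure can be read as a simple check. The sketch below is only an illustration of that reading: the function name, parameter names, and the idea of encoding the policy in code are assumptions, not part of the procedure itself. Only the $100,000 threshold and the three triggers come from the text.

```python
# Hypothetical encoding of the Section 6 triggers; names are illustrative only.
SPORADIC_COST_THRESHOLD = 100_000  # dollars, from item 6.a of the procedure


def rca_requested(total_cost=0,
                  injury_or_damage_incident=False,
                  repeat_or_key_customer_complaint=False):
    """Return True if any Section 6 criterion for requesting an RCA is met.

    total_cost covers maintenance, operations and lost profit opportunities.
    injury_or_damage_incident covers actual and potential ("could have
    resulted in") incidents, as item 6.b describes.
    """
    return (
        total_cost > SPORADIC_COST_THRESHOLD
        or injury_or_damage_incident
        or repeat_or_key_customer_complaint
    )


# Example: a $150,000 sporadic event qualifies on cost alone.
print(rca_requested(total_cost=150_000))
```

A real site would likely keep the threshold in its policy manual instead of in code; the point here is only that each criterion is sufficient on its own.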
FIGURE 3.1 TRPP Executive Management Roles (box labels: Clear the path for improvement work; Assure that the support systems are working; Resource improvement work)
4 Failure Classification
an emphatic no. When we work on problems we are essentially working to maintain
the status quo or performance norm. This is synonymous with the term reaction.
We react when a problem occurs to get things back to their normal, status quo state.
If all we do is work on problems we will never be able to progress. In our dealings
with companies all over the world, we often ask the question, "How much time do
you spend reacting versus proacting in your daily routines?" Most surveyed will
answer 80% reacting and 20% proacting. If this is true, then there is very little
progress being made. This would seem to be a key indicator as to why most
productivity increases are minimal from year to year.
Let's consider opportunities for a moment. When we work on opportunities do
we progress? The answer is yes. When we achieve opportunities we are striving to
raise the status quo to a higher level. Therefore, to progress we have to begin taking
advantage of the numerous opportunities presented to us. So if working on problems
is like reacting, then working on opportunities is like proacting.
FIGURE 4.1 Problem Definition Graph (A negative deviation from a performance norm)
FIGURE 4.2 Opportunity Definition Graph (A chance to achieve a goal or an ideal state)
FIGURE 4.3 Opportunity Graph (labels: Opportunities; Status quo; Problems)
The answer is simple. We should all start working on opportunities and disregard
problems, right? Why can't we do this? There are many reasons, but a few are
obvious. Problems are more obvious to us since they take us away from our normal
operation. Therefore they get more attention and priority. We can always put an
opportunity off until tomorrow, but problems have to be addressed today. There is
also the issue of rewards. People who are good reactors, who come in and save the
day, tend to get pats on the back and the old "atta-boys." What a great thing from
the reactor's perspective: recognition, overtime pay and most importantly, job
security. We have seen many cases where the person who tries to prevent a problem or
event from occurring gets the cold shoulder while the person who comes in after
the event has occurred gets treated like a king. Not to say we should not reward
exemplary reactors, but we also have to reinforce good proactive behavior as well.
Then there is the risk factor. Which are more risky, problems or opportunities?
Opportunities are always more risky since there are many unknowns. With problems
there are virtually no unknowns. We usually have fixed the problems before, so we
certainly have the confidence to fix them again. I once had a colleague who said,
"when you get really good at fixing something, you are getting way too much
practice." In a perfect world, we should have to pull the manual out to see what
steps to take to fix the problem. How many times do we see a craftsman, or even
a doctor for that matter, pulling out the manual to troubleshoot a problem? People
today do not want to take a lot of chances with their career, so opportunities begin
to look like what we like to call "career limiting" activities. One of the top 10 causes
of human error is "over confidence."
So with that said, we have to figure out a way of changing the paradigm that
reactive is always more important than proactive work. This means opportunities
are just as important, if not more important than problems.
Let's switch gears and talk about the different types of failures or events that
can occur. Incidentally, when we talk about failures we are not always talking about
machines or equipment. "Failure" can also be unexpected patient deaths, operational
upsets, administrative delays, quality defects or even customer complaints. There
are two basic categories of failures that can exist: sporadic and chronic. Let's look
at each of these categories in greater detail.
A sporadic (to be used synonymously with acute) occurrence usually indicates
that a dramatic event has occurred. For example, maybe we had a fire or an explosion
in our manufacturing plant, we just lost a long-standing contract to a competitor or
a patient died unexpectedly. These events tend to demand a lot of attention - not
just attention, but urgent and immediate attention. In other words, everyone in the
organization knows something bad has happened. The key characteristic of sporadic
events is they happen only once. Sporadic failures have a very dramatic impact when
they occur, which is why many people tend to apply financial figures to them. For
instance, you might hear someone say, "We had a $10 million failure last year."
Sporadic events are very important, and they certainly do cost a lot of money
when they occur. The reality, however, is that they do not happen very often. If we
had a lot of sporadic events we certainly would not be in business very long. Sporadic
losses can also be distributed over many years. For example, if the engine in your
car fails and you need to replace it, it will be a very costly expense, but you can
amortize that cost over the remaining life of the car.
Chronic events on the other hand are not very dramatic when they occur. These
types of events happen over and over again. They happen so often that they actually
become a cost of doing business. We become so proficient at working on these events
that they actually become part of the status quo. We can produce our "normal" output
in spite of these events.
Let's look at some of the characteristics of chronic events. Chronic events are
accepted as part of the routine. We accept the fact they are going to happen. In a
manufacturing plant, we will even account for these events by developing a
maintenance budget. A maintenance budget is in place to make sure that when routine
events occur we have money on hand to fix them. These types of events do, however,
demand attention but usually not the attention a big sporadic or acute event would.
The key characteristic of a chronic event is the frequency factor. These chronic
events happen over and over again for the same reason or mode. For instance, on a
given pump failure, the bearing may fail three or four times a year. Or you have a
bottle filling line, and the bottles continuously jam. Both would be considered
chronic events. Chronic events tend not to get the attention of sporadic events because
on their individual occurrences, they are usually not very costly. Therefore, rarely
would we ever assign a dollar figure to an individual chronic event.
FIGURE 4.4 The Linkage (vertical axis: Daily production, 5,000 to 10,000; horizontal axis: Time; labels: Status quo; Chronic failures; Sporadic failures)
What most people fail to realize is the tremendous effect the frequency factor
has on the cost of chronic failures. A stoppage on a bottling line due to a bottle jam
may take only five minutes to correct when it occurs. If it happens five times a day,
we are looking at 152 hours of downtime per year. If an hour of downtime costs
$10,000, then we are looking at a cost of approximately $1,520,000. As we can see,
the frequency factor is very powerful. But since we tend to see chronic events only
in their individual state we sometimes overlook the accumulated cost. Just imagine
if we were to go into a facility and aggregate all of the chronic events over a year's
time and multiply their effects by the number of occurrences. The yearly losses
would be staggering.
Let's take a look at how chronic and sporadic events relate to the discussion on
problems and opportunities. Sporadic events by their definition take us below the
status quo and tend to take an extended period of time to restore. When we restore
we get back to the status quo. This is very much like what happens when we react
to a problem. The problem occurs and we take some action to get back to the status
quo. Chronic events, on the other hand, happen so routinely that they actually become
part of the status quo or the job. Therefore, when they occur they do not take us
below our performance norm. If, in turn, we were to eliminate the chronic or
repetitive events, then the elimination would actually cause the status quo to improve.
This improvement is the equivalent of realizing an opportunity. So by focusing on
chronic events, eliminating the causes and not simply fixing the symptoms, we are
really working on opportunities. As we said before, when we work on opportunities
the organization actually progresses.
Now that we know that eliminating chronic events can cause the organization
to progress, we have to look at the significance of chronic events. Sporadic events
by their very nature are high profile and high cost events. But we can amortize those
costs over a long period of time so the effect is not as severe. Consider if the engine
in your car blew up and you had to replace it. To the average motorist this would
be a sporadic event. But if we amortize the cost over the remaining life of the car
it becomes less of a burden. Chronic events on the other hand have a relatively low
impact on their individual basis, but we often overlook their true impact. If we were
to aggregate all of the chronic events from a particular facility and look at their total
cost over a one-year period we would see that their impact is far more significant
than any given sporadic event, simply due to the frequency factor.
Consider how all of the events actually affect the profitability of a given facility.
As we all know, we are all in business to make a profit. When a sporadic event
occurs it actually affects the profitability of a facility significantly the year that it
occurs, but, once the problem has been resolved, profitability gets back to normal.
The dilemma with chronic events is that they usually never get resolved so they
affect profitability year after year. If we were to eliminate such events instead of
just reacting to their symptoms, we could make great strides in profitability. Imagine
if we had ten facilities and we were able to reduce the amount of losses in order to
obtain 10% more throughput from each of those facilities. In essence we would have
the capacity of one new facility without spending the capital dollars. That is the
power of resolving chronic issues.
Let us give an example of a chronic event success story. In a large mining
operation the management wanted to uncover its most significant chronic events.
This operation has a large crane or "drag line." This drag line mines the surface for
the product. The product is then placed on large piles where a machine called a
bucket wheel moves up and down the pile putting the product onto a conveyor
system. This is where the product is taken downstream to another process of the
operation. One day, one of the analysts was talking to one of the field maintenance
representatives who said they spend a majority of time resetting conveyor systems
whose safety trip cord was triggered. They estimated that this activity took anywhere
from 10 to 15 minutes to resolve per trip. Now this individual did not see this activity
as a "failure" by any means. It was just part of the job he had to do. Upon further
investigation, it was discovered that other people were also resetting tripped
conveyors. By their estimation this was happening approximately 500 times a week
to the tune of about $7 million per year in lost production. Just identifying this as
an undesirable event allowed them to take instant corrective action. By adding a
simple procedure of removing large rocks with a bulldozer prior to bucket wheel
activity, approximately 60% of the problem went away. These types of stories are
not uncommon. We get so ingrained in what we are doing that we sometimes miss
the things that are so obvious to outsiders.
Similarly in a hospital setting, we were looking at the number of times blood
had to be redrawn in an emergency room of a 225-bed acute care hospital. At the
conclusion of our opportunity analysis (to be discussed in detail in Chapter 5), we
found that 10,013 blood redraws were taken in the past 12-month period. Next we
aggregated the average costs per blood redraw. These costs include things like the
costs for syringes, gauze, tech time, transport time, opportunity costs for the real
estate in the operating room, etc. When compiled, we found that on average each
blood redraw was costing $300. The math is simple from this point on. We multiply
10,000 redraws times $300/redraw and we uncover a whopping $3 million worth of
hidden losses. On any individual occurrence no one sees this as a failure. It is viewed
as a cost of doing business. This is the power of evaluating chronic failures.
To wrap this up we will end with yet another story. We were working with a
major oil company that was trying to reduce its maintenance budget. They hired our
firm to teach them the methods being explained in this text. The manager opened
the three-day session by stating that he had been mandated by his superiors to reduce
the maintenance budget significantly. He told them that the maintenance budget for
this particular facility was approximately $250 million. He went on to explain that
some analysis was done on the budget to find out how the money was being spent.
It turns out that 85% of the money was spent in increments of $5,000 or less. So
by his estimation he was spending about $212 million in chronic maintenance losses.
This was just maintenance cost, not lost production cost.
So he tells the 25 engineers in the training class he has two options to reduce
this maintenance cost:
1. He can eliminate the need to do the work in the first place or,
2. He could just eliminate maintenance jobs.
TABLE 4.1 Options to Reduce Maintenance Budget (table contents not recoverable)
So to sum up this discussion on failure classification, let's look at the key ideas
presented. We live in a world of problems and opportunities. We would all love to
take advantage of every opportunity that came about, but it seems as if there are too
many problems confronting us to take advantage of the opportunities. A good way
to take advantage in a business situation is to eliminate the chronic or repetitive
events that confront us each and every day. By eliminating this expensive, nonvalue-
added work, we are really achieving opportunities as well as adding additional time
to eliminate more problems. In the next chapter we will discuss a method for
uncovering all of the events for a given process and delineating which of those events
are the most significant from a business perspective.
RCA AS AN APPROACH
component of RCA that we discussed earlier.
Also, many do not realize that the chronic types of events are actually precursors
to the sporadic events. It is our experience that when reviewing the sporadic
investigations that we have been involved in over the past 20 years that rarely do we find
revelations. Most of the time we find the true latent causes to be systems that are
in place and have been the norm for some time. They have been chronically accepted
over the years to the point that no one questions them anymore.
All it takes is one trigger, one decision, to make a chronic event a sporadic one!
This was demonstrated on the space shuttle Challenger as the o-ring design flaws
were known from the beginning. That chronic problem existed for years and was
an acceptable risk according to the flight readiness plan. In the Challenger Disaster
Final Report this gradual deterioration of standards was referred to as normalization
of deviance. Only when the decision was made to launch at 36°F (15°F colder than
any other flight), did that chronic failure become a sporadic one. Bridging this to
our working environments: Can't this happen to us? Doesn't this happen to us?
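The frequency-factor arithmetic used in this chapter (the bottle jam and the blood redraw examples) can be sketched in a few lines. This is a minimal illustration, not a method from the text: the function names are hypothetical, and the 365 operating days per year is an assumption chosen because it reproduces the 152 hours quoted for the bottling line.

```python
def annual_downtime_hours(minutes_per_event, events_per_day, days_per_year=365):
    """Hours of downtime per year for a recurring (chronic) stoppage.

    days_per_year=365 is an assumption; the text does not state it.
    """
    return minutes_per_event * events_per_day * days_per_year / 60


def annual_loss(quantity, unit_cost):
    """Aggregate yearly loss: a yearly quantity (hours or occurrences) times its unit cost."""
    return quantity * unit_cost


# Bottle jam: 5 minutes per jam, 5 jams per day, $10,000 per hour of downtime.
jam_hours = round(annual_downtime_hours(5, 5))
jam_cost = annual_loss(jam_hours, 10_000)

# Blood redraws: roughly 10,000 redraws per year at an average of $300 each.
redraw_cost = annual_loss(10_000, 300)

print(f"Bottle jams: {jam_hours} hours/year, ${jam_cost:,}")
print(f"Blood redraws: ${redraw_cost:,}")
```

The same two-factor product (how often it happens, times what each occurrence costs) is what makes an individually cheap chronic event comparable to, or larger than, a single sporadic one.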
Opportunity Analysis: "The Manual Approach"
With all the noise and distraction of a reactive work environment it is sometimes easy
to overlook the obvious. For instance, if we wanted to perform a Root Cause Analysis
(RCA) on an event, would we know which event was the most significant or costly?
Experience demonstrates that we would not. In a reactive environment, we naturally
become focused on the short-term. We tend to look at the problems or events that just
happened and naturally think they are the most significant. This is a problem because
what happened yesterday, in most cases, is not the most significant or compelling
issue. We need to take a more macro look at the situation. For these reasons we must
depend on the strategy development process described earlier to ensure that we are
working on the events that truly add value to the bottom line of the business.
In order to determine where our most significant issues are we should employ
techniques that will allow us to look objectively at all the historical events
contributing to our performance or lack thereof. Failure Modes and Effects Analysis or
FMEA was developed in the aerospace industry to determine what failure events
could occur within a given system (e.g., a new aircraft) and what the associated
effects would be if those events did indeed occur. This technique, albeit effective,
is very man-hour intensive. It is estimated that a typical FMEA in the aerospace
industry takes numerous man-years to perform. There are many good reasons why
this technique takes so long to perform as well as significant benefits to this industry.
However, this technique is far too laborious to be performed in most industries such
as the process and discrete manufacturing arenas. Therefore, we had to take the basic
concept and make it more industry friendly. When discussing this modified FMEA
technique we will refer to it as opportunity analysis or OA.
Before we continue with the discussion on how to develop an opportunity
analysis, let's first talk about why you would want to perform one in the first place.
There really are two basic reasons to perform an opportunity analysis. The first and
foremost is to make a legitimate business case to analyze one event versus another.
In other words, it creates the financial or business reason to show a listing of all the
events within a given organization or system and delineate in dollars and cents, why
you are choosing one issue versus another. It allows the analyst to speak in the
language of business.
The second compelling reason is to focus the organization on what the most
significant events really are, so that quantum leaps in productivity can be made with
fewer of the organization's resources being utilized. Experience again has shown
that the Pareto Principle¹ works with such events just like it does in other areas.
It goes something like this: 20% or less of the undesirable events that we uncover
In TabIe 5.1, we begin by looking at the turbine engine subsystem. 'tie begin
listing a11 oE the potential Eailllre modes that might occur on the turbine engines. In
this case, we l11ight determine that a turbine blade could fracture. We then ask what
the eHects on other items within the turbine engine subsystem might be. If the blade
were to reIease, it could fracture the other turbine blades. The effects on the entire
system, 01' the aircraft as a whole, wOllld be 10ss of lhe engine aneI reduced power
and control ol' the aircraft. V,re then begin examining the severity of the failure mode.
Wc will use a simple scaIe of 1 to 10 where "1" is the least severe and "10" ls the
most severe. We llave slmplifiec1 I"his for explanation purposes, but a traditional
FMEA analyst would have specific criteria for what COl1stitutes severlty. In this
example, we will say that losing a turbine blade would constitute a severity of 8.
FIGURE 5.1 A.ircraft Subsystem qiagram Now comes lhe probability rating. \Ve would have to conect enough data to determine
the relative probability oi' this occun-ence based on the design of our aircraft. We
by conclllcting an in-depth opportunity; analysis wi1l1~epresenl approximately 80% will assul11e that the probability in this case is .02 ol' 2%. The Iast step is to lTIltltiply
ofthe losses 1'or that organization. You m<ly have hcard its also called tIle 80120 "Ll1e. the severity times the probability to get a criticality ratil1g. In this case the rating
V/e will talk more about the 80120 ne later in this cllapter. \Voule! be calculated as follows:
As "ve mcntioned befare, the FMEA technique w<15 developed in the aerospace
industry unc1 we will refcr to this as the traditional FfvIEA methocl. Modifications 8 x .02 = 0.16
are necessary lo malee the traclitional FMEA more appcable in other organizations.
Therefore, based 011 the modifications that we will exblain in this chapter, we will Severity X Probabilty = Criticality
call this technique the opportunity analysis. The key! difference betwecn thc ': two
EQUATlON 5.1 Sample Criticality Equation
mctl.l0dS is that the traditional method is probabiliSC'j' meaning lhat jt is looking at
what cmd happen. :
In contrast, opportunity analysis looks only at historical events. We list only items that have actually happened in the past. For the historical method, we are not as interested in what might happen "tomorrow" as we are in what did happen yesterday.

Let's take a look at a simple example of both a traditional FMEA and an Opportunity Analysis. Our intention is not so much to develop experts in traditional FMEA as it is to give a general understanding of how FMEA, and hence Opportunity Analysis, were derived. In the aerospace industry, we would perform a traditional FMEA on a new aircraft that is being developed. So the first thing we might do is to break the aircraft down into smaller subsystems. A typical aircraft would have many subsystems such as the wing assembly, instrumentation system, fuselage, engines, etc.

From there the analysis would look at each of the subsystems and determine what failure modes might occur and, if they did, what their effects would be. Let's take a look at a simple example in the following table:

TABLE 5.1
Traditional FMEA Sample

This means that this line item in the FMEA has a criticality rating of .016. We would then repeat this process for all of the failure modes in the turbine engines and all of the other major subsystems.

Once all of the items have been identified it is time to prioritize. We would sort our criticality column in descending order so that the largest criticality ratings would bubble up to the top and the smaller ones would fall to the bottom. At some point the analyst would make a cut specifying that all criticalities below a certain number are delineated as an acceptable risk, and all above need to be evaluated to determine a way to reduce the severity and, more importantly, the probability of occurrence.

Bear in mind that this is a long-term process. There is a great deal of attention placed upon determining all of the possible failure modes and even greater attention paid to substantiating the severity and probability. Thousands of hours are spent running components to failure to determine probability and severity. Computers, however, have helped in this endeavor, in that we can simulate many occurrences by building a computer model and then playing "what if" scenarios to see what the effects would be.

We do not have the time or resources in business, healthcare and industry to perform a thorough traditional FMEA on every system. Nor does it make economic sense to do so on every system. What we have to do is modify the traditional FMEA process to help us uncover the problems and failures that are currently occurring. This allows us to see what the real costs of these problems are and how they are really affecting our operation. Let's look at a simple example.
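As a brief aside, the traditional prioritization step described above (sort by criticality, then draw a cut line) might be sketched as follows. This is only an illustration under stated assumptions: the failure modes, most of the ratings, and the cutoff value are hypothetical, and only the .016 rating for a turbine engine line item comes from the text.

```python
# Hypothetical FMEA line items: (subsystem, failure mode, criticality rating).
# Only the 0.016 rating comes from the text; every other value is made up.
fmea_items = [
    ("Turbine engines", "Failure mode A", 0.016),
    ("Wing assembly", "Failure mode B", 0.002),
    ("Instrumentation", "Failure mode C", 0.045),
    ("Fuselage", "Failure mode D", 0.0005),
]

ACCEPTABLE_RISK_CUTOFF = 0.01  # assumed cut line chosen by the analyst


def prioritize(items, cutoff):
    """Sort by criticality (largest first) and split the list at the cutoff."""
    ranked = sorted(items, key=lambda item: item[2], reverse=True)
    needs_evaluation = [item for item in ranked if item[2] >= cutoff]
    acceptable_risk = [item for item in ranked if item[2] < cutoff]
    return needs_evaluation, acceptable_risk


needs_evaluation, acceptable_risk = prioritize(fmea_items, ACCEPTABLE_RISK_CUTOFF)
for subsystem, mode, criticality in needs_evaluation:
    print(f"{subsystem:16s} {mode:16s} {criticality:.4f}")
```

Items in `needs_evaluation` are the ones whose severity or probability of occurrence would be worked on; the rest fall below the cut and are treated as acceptable risk.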
Root Cause Analysis: Improving Performance for Bottom-Line Results    Opportunity Analysis: "The Manual Approach"
FIGURE 5.2 Sample Lubricants Plant (block diagram; blocks in order: Convey empty bottles; Fill empty bottles; Convey filled bottles; Package filled bottles in boxes; Convey filled boxes; Stack boxes on pallets; Move to warehouse for shipment)

Consider that we are running a lubricants plant. In this plant we are doing the following:

1. Creating the plastic bottles for the lubricant.
2. Conveying the bottles to the filling machine to be filled with lubricant.
3. Conveying the filled bottles to the packaging process to be boxed in cases.
4. Conveying the filled boxes to be put onto pallets.
5. Moving the pallets to the warehouse where they await shipping.

The next step is to determine all of the undesirable events that are occurring in each of our subsystems. For instance, if we were looking at the fill empty bottles subsystem, we would uncover all of the undesirable events related to this subsystem. Let's look at this simple example:

Opportunity Analysis Line Item Sample

Subsystem            Event            Mode        Frequency/Year  Impact  Total Loss
Fill Empty Bottles   Bottle Stoppage  Bottle Jam  1,000           $150    $150,000

The idea is to delineate the events that have occurred that caused an upset in the fill empty bottles subsystem. In this case, one of the events would be a bottle stoppage. The mode of this particular event is that a bottle became jammed in the filling cycle. It occurs approximately 1,000 times a year, or about three times a day. The approximate impact for each occurrence is $150 in lost production. If we multiply the frequency times the impact for each occurrence, we come to a total loss of $150,000 per year.

If we were to continue the analysis, we would pursue each of the subsystems, delineating all of the events and modes that have caused an upset in their respective subsystems. The end result would be a listing of all the items that contribute to lost production and their respective losses. Based on that listing, we would select the events that were the greatest contributors to lost production and perform a disciplined Root Cause Analysis (RCA) to determine the root causes for their existence.

Now that we understand the overall concept of FMEA, let's take a detailed look at the steps involved in conducting an opportunity analysis. There are seven basic steps:

1. Perform preparatory work
2. Collect the data
3. Summarize and encode results
4. Calculate loss
5. Determine the "Significant Few"
6. Validate results
7. Issue a report

STEP 1: PERFORM PREPARATORY WORK

As with any analysis, there is a certain amount of preparation work that has to take place. Opportunity Analysis is no different, in that it also requires several up-front tasks. In order to adequately prepare to perform an opportunity analysis you must accomplish the following tasks:

Define the system to analyze.
Define undesirable event.
Draw block diagram (use contact principle).
Describe the function of each block.
Calculate the "GAP."
Develop preliminary interview sheets and schedule.

Before we can begin generating a list of problems, we have to decide which system to analyze. This may sound like a simple task but it does require a fair amount of thought on the analyst's part. When we teach this method to our students, their usual response is to take an entire facility and make it the system. This is a prescription for disaster. Trying to delineate all of the failures and/or problems in a huge oil refinery, for instance, would be a daunting task. What we need to do is localize the system down to one system within a larger system. For instance, a large oil refinery is comprised of many operating units. There is a Crude Unit, Fluid Catalytic Cracking Unit (FCCU), Delayed Coking Unit (DCU), and many others. The prudent thing to do would be to select one unit at a time and make that unit the focus of the analysis. For example, the Crude Unit would be the system to study and then we would break the Crude Unit into many subsystems. In other words, we should not bite off more than we can chew when selecting a system to study. We have seen many cases where analysts first do a rough cut to see which area of the facility either comprises a bottleneck or is incurring the greatest amount of expense.
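Returning to the bottle-jam line item from the lubricants plant example, the loss arithmetic (frequency times impact per occurrence) can be sketched in Python. The `LineItem` class and its field names are illustrative assumptions, not part of the method itself; the numbers are the ones given for the bottle jam.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    """One row of an opportunity analysis worksheet (hypothetical layout)."""
    subsystem: str
    event: str
    mode: str
    frequency_per_year: int       # occurrences per year
    impact_per_occurrence: float  # dollars of lost production per occurrence

    @property
    def total_annual_loss(self) -> float:
        # Total loss = frequency x impact per occurrence
        return self.frequency_per_year * self.impact_per_occurrence


bottle_jam = LineItem("Fill Empty Bottles", "Bottle Stoppage", "Bottle Jam", 1000, 150.0)
print(f"${bottle_jam.total_annual_loss:,.0f} per year")
print(f"{bottle_jam.frequency_per_year / 365:.1f} occurrences per day")
```

Every other line item in the worksheet would be computed the same way, which is what makes the later sorting and summing steps possible.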
DEFINE UNDESIRABLE EVENT

This may sound a little silly, but we have to define exactly what an "undesirable event" is in our facility. During every seminar that we teach on this subject, we ask the students in class to write down their definition of an undesirable event at their facility. Just about every time, every student has a different definition. The fact is, if we are going to collect event data, everyone involved must be using a consistent definition. If we are collecting event data and there is no standardized definition, then everyone will give us their perceptions of what undesirable events are occurring in their work areas. For instance, if we ask a machine operator what undesirable events he sees, he will probably give us processing-type events, a maintenance mechanic will probably give us machinery-related events, whereas a safety engineer would probably give all of the safety issues. The dilemma here is that we lose focus when we do not have a common definition of an undesirable event.

The key to making an effective definition of an undesirable event is to ensure that the definition coincides with a particular business objective specified in the strategy map. For example, if we are in a sold-out position and our objective is to increase production utilization, then our definition should be based primarily around continuous production or limiting downtime. Let's take a look at some common definitions that we have run across over the years. Some are pretty good and some others are unacceptable. An undesirable event is:

Any loss that interrupts the continuity of maximum quality production
A loss of asset availability
The unavailability of equipment
A deviation from the status quo
Not meeting target expectations
Any secondary defect

The first one is "An undesirable event is any loss that interrupts the continuity of maximum quality production." This is a pretty good definition and one that we see and use quite frequently. Let's analyze this definition. In most manufacturing facilities, we often take our processes offline to do routine maintenance. The question becomes, when we take these planned shutdowns, "Are we experiencing an undesirable event based on the first definition above?" The answer is an emphatic yes! The definition states that any loss that interrupts the continuity of maximum quality production is deemed an undesirable event. Even if we plan to take the machines out of service, it still interrupts the continuity of maximum quality production. Now we are not saying that we should not take periodic shutdowns for maintenance reasons. All we are suggesting is that we look at them as undesirable events so that we can analyze if there is any way to stretch out the intervals between planned shutdowns and reduce the amount of time a planned shutdown actually takes. For instance, in many industries, we still have what we call annual shutdowns. How often do we have an annual shutdown? Every year, of course! It says so right in the name. Obviously, the government and other legislative bodies regulate some shutdowns, such as pressure vessel inspections. But in many cases, we are doing these yearly shutdowns just because the calendar dictates it. Instead of performing these planned shutdowns on a time basis, maybe we should consider using a conditional basis. In other words, let the condition of the equipment dictate when a shutdown has to take place.

This idea of looking at planned shutdowns as an undesirable event is not always obvious or popular. But if we are in a sold-out position, we must look at anything that takes us away from our ability to run 8,760 hours a year at 100% throughput rate. Now let's consider a different scenario. In many facilities, we have spare equipment, just in case the primary piece of equipment fails. It is sort of an insurance policy for unreliability. In this scenario, if the primary equipment failed and the spared equipment "kicked in," would this interrupt the continuity of maximum quality production? Providing the spare functions properly, the answer here would have to be no. Since we had the spare equipment in place and operating, we did not lose the production. That event would not end up on our list because it did not meet our definition of an undesirable event. This is also a hard pill for some of us to swallow. But that is the tough part about focusing. Once we define what an undesirable event is, we must list only the events that meet that definition.

Let's consider the definition, "an undesirable event is a deviation from the status quo." This definition has many problems. The primary problem is, "What happens if you have a positive deviation?" Should that be considered a failure? Probably not. How about the words "status quo"? For one thing, status quo is far too vague. If we were to ask 100 people to describe the status quo of the United States today, they would all give us a different answer. In addition, the status quo does not always mean that things are good; it just says that things are the way they are. If we were to rewrite that definition, it would make more sense if it looked like this:

An undesirable event is a negative deviation from 1 million units per day.

So why bother with a definition? It serves multiple purposes. First of all, we cannot perform an Opportunity Analysis without it. But in our opinion, that is the least important reason. The biggest advantage of an agreed-upon definition is that it fosters precise communication between everyone in the facility. It gets people focused on the most important issues. In short, it focuses people on what is really important and ensures that we are adhering to the strategy defined in the strategy map.

When we devise a definition of an undesirable event, we need to make sure that it is short and to the point. We certainly would not recommend a definition that is several paragraphs long. A good definition can and should be about one sentence. Our definition should address only one business objective at a time. For example, a definition that states "An undesirable event is anything that causes downtime, an injury, an environmental excursion and/or a quality defect" is trying to capture too many objectives at one time, which in turn will cause the analysis to lose focus. If we feel the need to look at each of those issues, then we need to perform separate analyses for each of them. It may take a little longer, but we will maintain the integrity of the analysis.

Last but not least, it is important to get decision makers involved in the process. We would recommend having someone in authority sign off on the definition to give it some credence and clout. If we are lucky, the person in authority will even modify the definition. This will, in essence, create buy-in from that person.
The interviewing process, as we have learned over the years, is really an art form more than a science. When we first started to interview, we soon learned that it can sometimes be a difficult task. It is like golf; the more we practice proper technique the better the final results will be. An interview is nothing more than getting information from one individual to another as clearly and accurately as possible. To that end, here are some suggestions that will help you to become a more effective interviewer. Some of these are very specific to the opportunity analysis process, but others are generic in that they can be applied to any interviewing session.

Be very careful to ask the exact same lead questions to each of the interviewees. This will eliminate the possibility of having different answers depending on the interpretation of the question. Later we can expand on the questions, if further clarification is necessary. We can use our undesirable event definition and block flow diagram to keep the interviewees focused on the analysis.

Make sure that the participants know what an Opportunity Analysis is, as well as the purpose and structure of the interviews. If we are not careful, the process may begin to look more like an interrogation than an interview to the interviewees. An excellent way to make our interviewees comfortable with the process is to conduct the interviews in their work environments instead of ours. For instance, go to the break area or the shop to talk to these people. People will be more forthcoming if they are comfortable in their surroundings.

Allow the interviewees to see your notes. This will set them at ease since they can see that the information they are providing is being recorded accurately. Never use a tape recorder in an opportunity analysis session because it tends to make people uncomfortable and less likely to share information. Remember, this is an information gathering session and not an interrogation.

If we do not understand what someone is telling us, let them use a pen to draw a simple diagram of the event for further understanding. If we still do not understand what they are trying to describe, then we should go out to the actual work area where the problem is occurring so that we can actually visualize the problem.

Never argue with an interviewee. Even if we do not agree with the person, it is best to accept what they are saying at face value and double check it with the information from other interviews. The minute we become argumentative, it reduces the amount of information that we can get from that person. Not only will that person not give us any more information, but chances are he or she will alert others to the argument and they will not want to participate either.

Always be aware of interviewees' names. There is nothing sweeter to a person's ears than the sound of his own name. If you have trouble remembering, simply write the names down in front of you so that you can always refer to them. This gives any interview or discussion a more personal feel.

It is important to develop a strategy to draw out quiet participants. There are many quiet people in our workforce who have a wealth of data to share but are not comfortable communicating it to others. We have to make sure that we draw out these quiet interviewees in a moderate and inquiring manner. We can use nominal group techniques where we ask each of the people to whom we are talking to write their comments down on an index card and then compile the list on a flip chart. This gives everyone the same chance to have their comments heard.

Be aware of body language in interviewees. There is an entire science behind body language. It is not important that we become an expert in this area. However, it is important to know that a substantial portion of human communication is through body language. Let the body language talk to us. For instance, if someone sits back in a chair with their arms firmly crossed, he may be apprehensive and not feel comfortable providing the information that we are asking for. This should be a clue to alter our questioning technique to make that person more comfortable with the situation.

In any set of interviews, there will be a number of people who are able to contribute more to the process than the others. It is important to make a note of the extraordinary contributors so that they can assist you later in the analysis. They will be extremely helpful if you need additional event information for validating the finished opportunity analysis, as well as assisting when you begin the actual Root Cause Analysis (RCA).

Remember to use our undesirable event definition and block diagram to keep interviewees on track if they begin to wander off the subject.

We should strive to keep interview sessions relatively short. Usually about one hour is suitable for an interview session. This process can be very intensive and people can become tired and sometimes lose their focus. This is dangerous because it begins to upset the validity of the data. So as a rule, one hour of interview is plenty.

STEP 3: SUMMARIZE AND ENCODE DATA

At this stage, we have generated a vast amount of data from our interviews. We now have to begin summarizing this information for accuracy. While conducting our interviews, we will be getting some redundant data from different interviewees. For instance, a person from the night shift might be giving us the same events that the day shift person gave us. So we have to be very careful to summarize the information and encode it properly so that we do not have redundant events and are essentially "double dipping."

The easiest way to collect and summarize the data is to enter it into an electronic spreadsheet or database like Microsoft Excel¹ or Microsoft Access². Of course we could certainly do this manually with a pencil and paper, but if we have a computer available, we should take the opportunity to use it. It will save many hours of frustration performing the analysis manually. Once we have input all of the information into our spreadsheet, we have to look for any redundancy. We should always remember to use a logical coding system when inputting information into a computer. Once we define what that logical system is, we stick with it. Otherwise the computer will be unable to provide what we are trying to achieve. Let's take a look at the following example to help understand logical coding.

¹ Microsoft Excel is a registered trademark of the Microsoft Corporation.
² Microsoft Access is a registered trademark of the Microsoft Corporation.

TABLE 5.4
Logical and Illogical Coding

Logical Coding

TABLE 5.5
Example of Summarizing and Encoding Results

Subsystem  Event                     Mode               Frequency  Impact
Recovery   Recirculation Pump Fails  Bearing Locks Up   12         12 Hours
Recovery   Recirculation Pump Fails  Oil Contamination  6          1 Day
Recovery   Recirculation Pump Fails  Bearing Fails      12         12 Hours
Recovery   Recirculation Pump Fails  Shaft Fracture     1          5 Days

In this example, we are looking at the recovery subsystem and we have sorted by the event "recirculation pump fails." Four different people at four separate times described these events. Is there any redundancy? The easiest way to see is to look at the modes. In this case we have two that mention the word bearing. The second is oil contamination. The interviewee was probably trying to help us out by giving us their opinion of the cause of the bearing failures. So in essence the first three events are really the same event, and we will have to summarize the three events into one. This is what it might look like after we summarize the items.

TABLE 5.6
Example of Merging Like Events

In this example, we are simply multiplying the frequency per year times the impact per occurrence, which in this case is in number of units. In other words, when each of these modes occurs the impact is the number of units lost as a result. Notice that the last column is total loss in dollars. We simply multiply the number of lost units by the cost of each unit to give a total loss in dollars. That's all there is to it!
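The merging of like events illustrated by Tables 5.5 and 5.6 can be sketched in Python as a small coding table plus a grouping step. The rows are the four recirculation pump reports from Table 5.5; the `MODE_CODES` mapping and the function name are assumptions made for illustration. The sketch deliberately does not pick a merged frequency or impact, since that judgment stays with the analyst.

```python
# Interview records for one event, as listed in Table 5.5:
# (subsystem, event, mode as reported, frequency per year, impact)
records = [
    ("Recovery", "Recirculation Pump Fails", "Bearing Locks Up", 12, "12 Hours"),
    ("Recovery", "Recirculation Pump Fails", "Oil Contamination", 6, "1 Day"),
    ("Recovery", "Recirculation Pump Fails", "Bearing Fails", 12, "12 Hours"),
    ("Recovery", "Recirculation Pump Fails", "Shaft Fracture", 1, "5 Days"),
]

# Assumed coding table: maps each reported mode to one agreed-upon mode code.
# "Oil Contamination" is coded as a bearing failure because, per the text,
# it was offered as an opinion on the cause of the bearing failures.
MODE_CODES = {
    "Bearing Locks Up": "Bearing Fails",
    "Oil Contamination": "Bearing Fails",
    "Bearing Fails": "Bearing Fails",
    "Shaft Fracture": "Shaft Fracture",
}


def merge_like_events(rows, mode_codes):
    """Group rows that share subsystem, event, and coded mode."""
    merged = {}
    for subsystem, event, mode, frequency, impact in rows:
        key = (subsystem, event, mode_codes[mode])
        merged.setdefault(key, []).append((mode, frequency, impact))
    return merged


merged = merge_like_events(records, MODE_CODES)
for key, reports in merged.items():
    print(key, "<-", len(reports), "interview report(s)")
```

Keeping the original reports under each merged key makes it easy to go back to the interviewees when the merged line item is validated.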
significant. We have all heard of the 80/20 rule, but what does it really mean? This rule is sometimes referred to as the Pareto Principle. The name Pareto comes from the early 20th century Italian economist who once said that, "In any set or collection of objects, ideas, people, and events, a few within the sets or collections are more significant than the remaining majority." This rule or principle demonstrates that in our world, some things are more important than others. Let's look at a few examples of this rule in action:

Banking Industry: In a bank approximately 20% or less of the customers account for approximately 80% or more of the assets in that bank.
Hospital Industry: In a hospital approximately 20% or less of the patients

Think about how the rule applies to everyday living. We probably all are guilty of wearing 20% or less of the clothes in our closet 80% of the time. We probably all have a toolbox in which we use 20% of the tools 80% of the time. We spend all that money on all those exotic tools and most repairs require the screwdriver, hammer and a wrench! We are all guilty of this! The rule even applies to business. Take for instance a major airline as described above. It is not the once-a-year vacationer who generates most of its revenue. It is the guy who flies every Monday morning and returns every Friday afternoon. So it makes sense that very few of the airline's customers represent most of its revenue and profits. Have you ever wondered why Frequent Flyer programs are so important to an airline? They know whom they have to cater to.

Let's take a look at the following example to determine exactly how to take a list of events and narrow it down to the "Significant Few."

Step 1: Multiply the frequency column times the impact column to get a total annual loss figure.
Step 2: Sum the total annual loss column to obtain a global total loss figure for all the events in the analysis.
Step 3: Multiply the global total loss figure from Step 2 by 80% or 0.80. This will give us the "significant few" losses amount.
Step 4: Sort the total loss column in descending order so that the largest events bubble up to the top.
Step 5: Sum the total loss amounts from biggest to smallest until you reach the "significant few" losses amount from Step 3.

Subsystem     Event    Mode     Frequency  Impact    Total Loss
Sub System A  Event 1  Mode 11  30         $40,000   $1,200,000
Sub System A  Event 2  Mode 7   4          $230,000  $920,000
Sub System B  Event 3  Mode 1   365        $1,350    $492,750
Sub System A  Event 2  Mode 5   10         $20,000   $200,000
Sub System A  Event 2  Mode 8   10         $10,000   $100,000
Sub System B  Event 5  Mode 6   35         $2,500    $87,500
Sub System B  Event 4  Mode 4   1,000      $70       $70,000
Sub System A  Event 4  Mode 12  8          $8,000    $64,000
Sub System B  Event 6  Mode 10  6          $8,000    $48,000
Sub System C  Event 4  Mode 13  4          $7,500    $30,000

In order to get the maximum effect it is always wise to present this information in alternate forms. The use of graphs and charts will help us to effectively communicate this information to others around us. Here is a sample bar chart that takes the spreadsheet data above and converts it into a more understandable format.

FIGURE 5.5 Sample Bar Chart of Opportunity Analysis Results (bar chart; vertical axis: percent, 0% to 60%; horizontal axis: Events; the leading bars are marked "80% of loss" and the remaining bars "20% of loss")

STEP 6: VALIDATE RESULTS

Although our analysis is almost finished, there is still more to accomplish. We have to verify that our findings are accurate. Our opportunity analysis total should be relatively close to the gap that we defined in our preparatory phase. The general rule is plus or minus 10% of the gap.
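The five steps for narrowing a list of events down to the "Significant Few" can be sketched in Python using the ten rows of the sample table. The function name and tuple layout are illustrative assumptions. One frequency (Event 5, Mode 6) is illegible in the printed table and is taken here as 35, which is the value implied by its $87,500 total and $2,500 impact.

```python
# (subsystem, event, mode, frequency per year, impact per occurrence in dollars)
events = [
    ("Sub System A", "Event 1", "Mode 11", 30, 40_000),
    ("Sub System A", "Event 2", "Mode 7", 4, 230_000),
    ("Sub System B", "Event 3", "Mode 1", 365, 1_350),
    ("Sub System A", "Event 2", "Mode 5", 10, 20_000),
    ("Sub System A", "Event 2", "Mode 8", 10, 10_000),
    ("Sub System B", "Event 5", "Mode 6", 35, 2_500),  # frequency inferred: 87,500 / 2,500
    ("Sub System B", "Event 4", "Mode 4", 1_000, 70),
    ("Sub System A", "Event 4", "Mode 12", 8, 8_000),
    ("Sub System B", "Event 6", "Mode 10", 6, 8_000),
    ("Sub System C", "Event 4", "Mode 13", 4, 7_500),
]


def significant_few(rows, share=0.80):
    """Return (selected rows, their summed loss, global total loss)."""
    # Step 1: total annual loss per row = frequency x impact
    losses = [(s, e, m, f * i) for s, e, m, f, i in rows]
    # Step 2: global total loss across all events
    global_total = sum(row[3] for row in losses)
    # Step 3: the "significant few" losses amount
    target = share * global_total
    # Step 4: sort so the largest losses come first
    losses.sort(key=lambda row: row[3], reverse=True)
    # Step 5: accumulate from the top until the target is reached
    selected, running = [], 0
    for row in losses:
        if running >= target:
            break
        selected.append(row)
        running += row[3]
    return selected, running, global_total


selected, running, global_total = significant_few(events)
print(f"Global total loss: ${global_total:,}")
for subsystem, event, mode, loss in selected:
    print(f"{subsystem}  {event}  {mode}  ${loss:,}")
```

With this sample data, a small number of events at the top of the sorted list carries the 80% share, which is the point of the exercise: those are the candidates for Root Cause Analysis.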
If we are way under that gap, we have either missed some events or undervalued them, or we do not have an accurate gap (actual versus potential). If we were to overshoot the gap, we probably did not do as good a job at removing the redundancies, or we have simply overvalued the loss contribution.
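The validation check can be expressed as a simple comparison of the analysis total against the gap. This is a minimal sketch: the function name and the gap figures are hypothetical, and the default tolerance of 10% is our reading of the general rule given for this step, so treat it as an adjustable assumption.

```python
def validate_against_gap(analysis_total, gap, tolerance=0.10):
    """Classify the analysis total relative to the gap (actual versus potential)."""
    lower = gap * (1 - tolerance)
    upper = gap * (1 + tolerance)
    if analysis_total < lower:
        return "under: events missed or undervalued, or the gap is inaccurate"
    if analysis_total > upper:
        return "over: redundant events remain or losses are overvalued"
    return "within tolerance"


# Hypothetical gap of $3,400,000 compared with three possible analysis totals.
for total in (2_000_000, 3_212_250, 4_000_000):
    print(f"${total:,}: {validate_against_gap(total, 3_400_000)}")
```

A result outside the tolerance does not say which explanation applies; it only tells the analyst to revisit the event list, the valuations, or the gap itself.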
Explain the Analysis: Many of our readers may be unfamiliar with the opportunity analysis process. Therefore, it is in our best interest to give them a brief overview of what an opportunity analysis is and what its goal and benefits are. This way, they will have a clear understanding of what they are reading.

Display Results: Provide several charts to represent the data that our analysis uncovered. The classic bar chart demonstrated earlier is certainly a minimal requirement. In addition to supporting graphs, we should provide all the details. This includes any and all worksheets compiled in the analysis.

Add Something Extra: We can be creative with this information to provide further insight into the facility's needs by determining other areas of improvement other than the "significant few." For instance, we could break out the results by subsystem and give a total loss figure for each subsystem. The manager of that area would probably find that information very interesting. We could also show how much the facility spent on particular maintainable items (e.g., components) like bearings or seals.
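The subsystem breakout suggested under "Add Something Extra" amounts to a group-and-sum over the loss column. A minimal sketch, assuming the total loss figures from the sample table used to determine the "Significant Few" (the function name is hypothetical):

```python
from collections import defaultdict

# (subsystem, total annual loss in dollars) from the sample opportunity analysis table
losses = [
    ("Sub System A", 1_200_000), ("Sub System A", 920_000),
    ("Sub System B", 492_750), ("Sub System A", 200_000),
    ("Sub System A", 100_000), ("Sub System B", 87_500),
    ("Sub System B", 70_000), ("Sub System A", 64_000),
    ("Sub System B", 48_000), ("Sub System C", 30_000),
]


def loss_by_subsystem(rows):
    """Sum total loss per subsystem, largest subsystem first."""
    totals = defaultdict(int)
    for subsystem, loss in rows:
        totals[subsystem] += loss
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


by_subsystem = loss_by_subsystem(losses)
for subsystem, total in by_subsystem.items():
    print(f"{subsystem}: ${total:,}")
```

The same pattern would work for a breakout by maintainable item (bearings, seals) if the worksheet records a component for each line item.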
What kind of data should be collected when an event occurs? Table 6.1 is a table of common data items that should be collected for any event. The list is by no means complete, but it is a good basis for getting a good event reporting system off the ground. Most asset performance KPIs could be calculated with data from this list.

Standardization, and it has developed a standard approach for the collection and exchange of reliability and maintenance data for equipment. You can find out more about ISO and the 14224 standard on their website at www.iso.org. A great way to train personnel in this is through the use of scenarios. The groups of data collectors are presented with the various codes and their definitions. They are then subjected to a variety of event scenarios to test how they would use the codes in a variety of common situations.

Last, but not least, a comprehensive workflow will need to be established to collect the data described above. Essentially an array of "W" questions needs to be formulated and answered. For instance:
We will answer many of these workflow questions when we discuss data collection systems. As a prelude to this, what many people do is to try to use their CMMS system as the initial workflow to collect some of the data, and then devise a supplemental workflow to get the remaining data items. This is certainly one method and may be one of the most effective, since some key reliability data is being generated through the use of the maintenance system.

We are going to discuss a method for transferring data from existing Computerized Maintenance Management Systems (CMMS) into an Asset Performance Management System or APMS. Before we discuss the interface between CMMS and APMS, let's discuss the role of both of these systems in the operation of the facility.

A CMMS is designed to assist maintenance personnel in the execution of work. The main function of this system is to automate the process of getting maintenance tasks completed in the field. This includes things like generating work requests, prioritizing work, planning and scheduling, materials management and finally the actual execution of the work. CMMSs by nature are transaction-based systems, since many transactions have to take place to completely execute a maintenance event. Many efficiencies are gained by automating the maintenance workflow. Hence, most asset-intensive companies have implemented such systems to achieve the many benefits.

Although a CMMS provides a variety of benefits, it was not designed to be an analytical system to provide decision support to Reliability and Maintenance Analysts. It does, however, offer a variety of good data that can be used to perform reliability analysis. For instance, every work order should delineate the asset ID and location of the maintenance event, the date the asset came out of service and the components that were used to repair the asset. There is obviously much more than this, but those items alone can be extremely valuable in determining event probabilities and even optimizing preventive maintenance activities.

An APMS is not designed to handle maintenance workflow and transactions but to take that data and a variety of other data to create actionable information with which to improve the overall reliability and availability of the facility. These tools might contain extensive data manipulation tools, statistical analysis tools like Weibull Analysis, Root Cause Analysis (RCA), Risk Based Inspection (RBI) and many others. We will focus our discussion on how an APMS can be a valuable aid in helping Root Cause Analysts determine the best opportunities for analysis.

So what data can we use from a CMMS that would help an APMS determine where the best opportunities for analysis might be? Table 6.2 is a table of some of the common data fields that would be useful in this type of analysis.

TABLE 6.2
Common Data Fields

Asset ID    Maintenance Start

Cause events. The next step is to transfer this data into an APMS so that the data can be supplemented with additional data about the event and then be "sliced and diced" to determine the opportunities.

In order to make use of this important data, the data must be somewhat easy to find and manipulate. Having worked with Reliability and Maintenance Analysts for many years, we have seen a number of homegrown reliability management systems. I am sure that you too can attest to such systems. For example, what happens when Reliability Engineers cannot seem to acquire the data they need to do their job? They build it themselves: they miraculously go from capable engineer to software developer. I am sure you have seen some of these masterpieces. They build them in spreadsheets, desktop databases or even using full-blown development tools. Although these homegrown systems serve a valuable purpose for their creators, they have many pitfalls for an organization. For one, the data may or may not be accurate. Since the data is typically collected by a handful of users, it may not truly reflect the overall reality. The data may not be properly event coded, so it becomes extremely difficult to analyze. The main problem with these homegrown solutions is that the data is not accessible to all the stakeholders who need it.

An APMS is designed to interface with existing data sources like CMMS (Figure 6.1), PDM systems, process systems and a variety of others. This ensures that the data is accurate and is kept up to date, as the interface keeps the system continually in sync. This is critical because it allows the data to be collected once and used for a variety of purposes. An APMS is a secured system, so you know that the data is protected. The most important purpose of an APMS is to provide the value-added analysis tools to turn existing maintenance and reliability data into actionable information.

Let's move on to the area of analyzing your digital data.

ANALYZE THE DIGITAL DATA

The tool of choice to perform Opportunity Analysis is the Pareto chart. Just to recap, a Pareto chart is simply a way to delineate the significant items within a collection.
In our case, it will help us determine the few significant issues that represent the majority of the losses within a facility. The Pareto chart can be used on a variety of metrics depending on the need. For instance, some users might simply use maintenance cost as the only measure to determine whether an RCA needs to be initiated. Others might want to compile all the costs associated with an event, namely lost opportunity costs for not producing. Still others might be more interested in Mean Time Between Failure or MTBF. The assets with the lowest MTBF might be the best candidates for RCA. The advantage of using an automated approach to Opportunity Analysis is that the analyst can look for opportunities using a variety of metrics and techniques.
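As a minimal sketch of the Pareto idea described above, the following Python example ranks assets by total loss and flags the significant few that account for roughly 80 percent of it. The asset IDs, the cost figures, the 80 percent cutoff and the function name pareto are all assumptions made for illustration; they are not data or rules from this text.

```python
from collections import defaultdict

# Hypothetical events: (asset_id, maintenance_cost, lost_opportunity_cost)
EVENTS = [
    ("PMP-A", 12000, 40000),
    ("PMP-B", 3000, 0),
    ("CMP-C", 25000, 90000),
    ("PMP-A", 8000, 30000),
    ("FAN-D", 1500, 2000),
    ("CMP-C", 10000, 35000),
    ("VLV-E", 500, 0),
]


def pareto(events, cutoff=0.80):
    """Rank assets by total loss and flag the 'significant few'.

    An asset is flagged as significant while the cumulative loss of the
    assets ranked above it is still below cutoff * total loss.
    """
    totals = defaultdict(float)
    for asset, maintenance_cost, lost_opportunity in events:
        totals[asset] += maintenance_cost + lost_opportunity

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grand_total = sum(totals.values())

    rows = []
    running = 0.0
    for asset, loss in ranked:
        significant = cutoff * grand_total > running
        running += loss
        rows.append((asset, loss, running / grand_total, significant))
    return rows


rows = pareto(EVENTS)
for asset, loss, cumulative_share, significant in rows:
    marker = "*" if significant else " "
    print(f"{marker} {asset}  {loss:>10,.0f}  {cumulative_share:6.1%}")
```

Swapping the summed measure (maintenance cost only, total cost, or event count) changes the ranking without changing the technique, which is the flexibility the paragraph above describes.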
Today there are some powerful technologies to view and analyze data. One of the best for performing Opportunity Analysis via Pareto charts is a technology called On Line Analytical Processing or OLAP. This technology allows users to view data with a variety of dimensions and measures. For instance, suppose you wanted to know which unit within your plant was responsible for the greatest maintenance expenditures. Once you knew that, the next obvious question might be which pieces of equipment were most responsible for that. To go even deeper, you might want to know which component caused most of that expense. With OLAP tools, you can use powerful drilldown capability to do this type of analysis. Figure 6.2, Figure 6.3, and Figure 6.4 are a series of charts demonstrating these dynamic Pareto charts.
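The three-step drill-down (unit, then asset, then component) can be sketched in a few lines of Python. This is only an illustration of the roll-up logic, not how any OLAP product is implemented; the work-order records, the dimension names and the roll_up helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical work-order records: (unit, asset, component, cost)
WORK_ORDERS = [
    ("Unit 1", "PMP-101", "Seal", 4000),
    ("Unit 1", "PMP-101", "Bearing", 9000),
    ("Unit 1", "CMP-102", "Coupling", 2000),
    ("Unit 2", "PMP-201", "Seal", 3000),
    ("Unit 2", "FAN-202", "Motor", 1000),
    ("Unit 1", "PMP-101", "Bearing", 5000),
]

# Position of each dimension inside a record
DIMENSIONS = {"unit": 0, "asset": 1, "component": 2}


def roll_up(records, dimension, **filters):
    """Sum cost by one dimension, keeping only records that match the filters."""
    totals = defaultdict(float)
    for record in records:
        if all(record[DIMENSIONS[name]] == value for name, value in filters.items()):
            totals[record[DIMENSIONS[dimension]]] += record[3]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Step 1: which unit has the greatest maintenance expenditure?
top_unit = roll_up(WORK_ORDERS, "unit")[0][0]
# Step 2: within that unit, which asset is most responsible?
top_asset = roll_up(WORK_ORDERS, "asset", unit=top_unit)[0][0]
# Step 3: within that asset, which component caused most of the expense?
top_component = roll_up(WORK_ORDERS, "component", unit=top_unit, asset=top_asset)[0][0]

print(top_unit, top_asset, top_component)
```

Each step reuses the same measure (cost) and simply adds a filter on the dimension chosen in the previous step.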
The use of OLAP makes sophisticated data mining easy for end users. It allows users to see what they want to see in the form that is the most useful for them.

Although OLAP is an incredible tool for dynamic Opportunity Analyses, other tools might be useful as well. Some users might like to see the data presented in a particular format. For instance, suppose there is a corporate reporting standard that needs to be adhered to. If this were the case, the use of preformatted reports might make the most sense. Reports are useful for presenting predetermined metrics that are updated every time the particular report is run. Figure 6.5 is an example of a pump event count and maintenance cost report.
To allow for complete flexibility in data analysis, an APMS would provide a comprehensive tool for ad hoc queries. A query is simply a way to extract the information we need from the database. This is commonly done using the Structured Query Language or SQL. SQL is the syntax or language needed to get the relevant data from the database. SQL is not something most analysts are intimately familiar with, so the APMS must provide a highly flexible query tool that does not require the end user to know anything about SQL. Figure 6.6 and Figure 6.7 provide an example of a query designed to determine the MTBF (mean time between failures) for a collection of pumps.
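For readers who do want to see what such a query looks like underneath a graphical query tool, here is a small sketch using Python's built-in sqlite3 module. The table name failure_events, its columns and the sample dates are assumptions for illustration, and MTBF is simplified to the mean number of calendar days between recorded failures of each pump, ignoring repair time. It is not the query shown in Figure 6.6 and Figure 6.7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE failure_events (asset_id TEXT, asset_type TEXT, failure_date TEXT)"
)
# Hypothetical failure history
conn.executemany(
    "INSERT INTO failure_events VALUES (?, ?, ?)",
    [
        ("PMP-101", "PUMP", "2005-01-01"),
        ("PMP-101", "PUMP", "2005-01-31"),
        ("PMP-101", "PUMP", "2005-03-02"),
        ("PMP-102", "PUMP", "2005-01-10"),
        ("PMP-102", "PUMP", "2005-04-20"),
        ("CMP-201", "COMPRESSOR", "2005-02-01"),
    ],
)

# Mean days between failures = (last failure - first failure) / (failures - 1).
# Assets with a single recorded failure are excluded because no interval exists.
MTBF_QUERY = """
SELECT asset_id,
       COUNT(*) AS failures,
       (julianday(MAX(failure_date)) - julianday(MIN(failure_date)))
           / (COUNT(*) - 1) AS mtbf_days
FROM failure_events
WHERE asset_type = 'PUMP'
GROUP BY asset_id
HAVING COUNT(*) > 1
ORDER BY mtbf_days ASC
"""

rows = conn.execute(MTBF_QUERY).fetchall()
for asset_id, failures, mtbf_days in rows:
    print(f"{asset_id}: {failures} failures, MTBF about {mtbf_days:.1f} days")
```

Sorting in ascending order puts the pumps with the lowest MTBF first, which matches the earlier suggestion that these may be the best candidates for RCA.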
This only scratches the surface of what can be accomplished when we automate opportunity analysis. There are far more sophisticated statistical methods that can be employed. Our advice, however, is to start with the basics and slowly move into more sophisticated methods.

By automating opportunity analysis, the users have a dynamic tool that allows them to look at opportunities in a variety of different ways. As business conditions change, so can the opportunities. The key is to consistently collect the right data on a day-to-day basis.
FIGURE 6.4 Step 3 - Drill down to determine the components for the highest asset cost (i.e., PMP-4543) [chart: cost in dollars by process unit]

[Report figure: Centrifugal Pump Report, Top 15 Bad Actors by Cost]
7 The PROACT RCA Methodology
The term "proact" has recently come to mean the opposite of react. This may seem to be in conflict with PROACT's use as a Root Cause Analysis (RCA) tool. Normally when we think of RCA, the phrase "after-the-fact" comes to mind. After-the-fact, by its nature, suggests an undesirable outcome must occur in order to spark action. So how can RCA be called proactive?

In the last two chapters on Opportunity Analysis (OA), we clearly outlined a process by which to identify which failures or events were actually worth performing RCA on. We learned from this prioritization technique that, generally, the most important events to analyze are NOT the sporadic incidents, but rather the day-to-day chronic events that continually sap our profitability.
RCA tools can be used in a reactive fashion and a proactive fashion. The RCA analyst will ultimately determine this. When we use RCA to investigate only those incidents that are defined by regulatory agencies, then we are responding to the field. This is strictly reactive. However, if we use the OA tools described previously to prioritize our efforts, we will uncover events that many times are not even recorded in our Computerized Maintenance Management Systems (CMMS) or the like. This is because such events happen so often that they are no longer an anomaly. They are a part of the job. They have been absorbed into the daily routine. By uncovering such events and analyzing them, we are being proactive because unless we look at them, no one else will.
The greatest benefits from performing RCA will come from the analysis of chronic events, hence using RCA in a proactive manner. We must understand that often we are getting sucked into the "paralysis by analysis" trap and end up expending too many resources to attack an issue that is relatively unimportant. We also at times refer to these as the political failures-of-the-day. Trying to do RCA on everything will destroy a company. It is overkill, and companies do not have the time or resources to do it effectively (Figure 7.1).
Understanding the difference between chronic and sporadic events will now heighten our awareness of which data collection strategy will be appropriate for the event being analyzed. The key advantage, if there can be one with chronic events, is their frequency of occurrence. This is an advantage in the same way it is for a detective tracking a serial killer, who looks for a pattern in the activities. In this manner, the detective may be able to stake out where he feels the next logical crime will take place and hopefully prevent its occurrence. The same is true for chronic events. With chronic events, we have in our favor that they will likely happen again within a certain time frame, and we may be able to plan for their recurrence and capture more data at that point in time. We will discuss this more when we go over Verification Techniques in the Analyze chapter (Chapter 9).
Conversely, when we look at what data collection strategy would be employed in a sporadic event, we find frequency does NOT work in our favor. Under these circumstances, our detective may be investigating a single homicide and be reliant on the evidence at that scene only. This would mean we must be diligent about collecting the data from the scene before it is tampered with. When a sporadic event occurs, we must be diligent at that time to collect the data in spite of the massive efforts to get the operation running again.

PRESERVING EVENT DATA

The first step in the PROACT RCA Methodology, as is the case in any investigative or analytical process, is to preserve and collect relevant data. Before we discuss the specifics of how and when to collect various forms of data, let's take a look at the psychological side of why people should assist in collecting data from an event scene.

Let's create a scenario in which we are a mechanic in a manufacturing plant. We just completed a 10-day shutdown of the facility to perform scheduled maintenance. Everyone knows at this facility that when the plant manager says the shutdown will last 10 days and no more, we do not want to be the one responsible for extending it past 10 days. A situation arises on the ninth day of the shutdown. During an internal preventive maintenance inspection we find that a part has failed and must be replaced. In good faith we request the part from the storeroom. The storeroom personnel inform us that the particular part is out of stock and that it will take four weeks to expedite the order from the vendor. Knowing this is the ninth day of the 10-day shutdown, we decide to make a "band aid" repair because we do not want to be the person to extend the shutdown. We rationalize that the "band aid" will hold for the four-week duration as we have gotten away with it in the past. So we install a not like-for-like part in preparation for the start-up of the process.

Within 24 hours of start-up the process fails catastrophically and all indications lead to the area where the "band aid" fix was installed. A formal RCA team is amassed and they assign us to collect some parts data from the scene immediately. Given the witch hunting culture that we know exists, "Why should we uncover data/evidence that will incriminate us?" While this is a hypothetical scenario, it could very well represent many situations in any industry. "What is the incentive to collect event data in hopes of uncovering the truth?" After all, this is a time-consuming task. It will lead to people who used poor judgment and therefore management could witch hunt them and apply certain disciplinary actions.

These are all valid concerns. We have seen the good, the bad and the ugly created by these concerns. The fact of the matter is that if we wish to uncover the truth, the real root causes, we cannot do so without the necessary data. Think about any investigative or analytical profession: the first step is always to design data collection strategies to obtain the data. Is a detective expected to solve a crime without any evidence or leads? Is an NTSB investigator expected to solve the reasons for an airplane crash without any evidence from the scene? Do doctors make diagnoses without any more information than what the patient presented? If these professionals see the necessity of gathering data and information to draw conclusions, then we certainly must recognize the correlation to RCA.

Based on our experience, we have seen a general resistance to data collection for RCA purposes. We can draw two general conclusions from our experience (Figure 7.2):

1. People are resistant to collecting event data because they do not appreciate the value of the data to an analysis or analyst.
2. People are resistant to collecting data because of the paradigms that exist with regard to witch hunting and managerial expectations.

FIGURE 7.2 Typical Reasons Why Event Data is Not Collected

The first conclusion we see is the lesser of the two. Often production of any facility is the ruling body. After all, we are paid to produce quality product whether that product is oil, steel, package delivery or quality patient care. When this mentality is dominant, it forces us to react with certain behaviors. If production is paramount, then whenever an event occurs, we must clean it up and get production started as quickly as possible. The focus is not on why the event occurred; rather it is on the fact it did occur, and we must get back to our status quo as soon as possible.

This paradigm can be overcome merely with awareness and education. Management must first commit to supporting RCA both verbally and on paper. We discussed earlier in the management support chapter that demonstrated actions are seen as "walking the talk" and one of those actions was issuing an RCA policy or procedure. This would require data to be collected instead of making it an option. Secondly, it
is not enough just to support it, but we must link with the individuals who must physically collect the data. They must clearly understand why they should collect the data and how to do it properly.

We should link with people's value systems and show them the purpose of data collection. If we are an operator in a steel mill and the first one to an event scene, we should understand what is important information versus unimportant to an RCA. For instance, we can view a broken shaft as an item to clean up or as an integral piece of information to a metallurgist. If we understand how important the data we collect is to an analysis, we will see and appreciate why it should be collected. If we do not understand or appreciate its value, then the task is seen as a burden. Providing everyone with basic training in proper data collection procedures can prove invaluable to any organization.

We have seen the potential consequences of poor data collection efforts in some recent high profile court cases. Allegations are made as to the sloppy handling of evidence in lab work, improper testing procedures, improper labeling and contaminated samples. Issues of these types can lose your case as well.

Providing the above support and training overcomes one hurdle. But it does not clear the hurdle of perceived witch hunting by an organization. Many people will choose not to collect data for fear that they may be targeted based on the conclusion drawn from the data. This is a prominent cultural issue that must be addressed in order to progress with RCA. We cannot determine "root" causes if a witch hunting culture is prevalent.

THE ERROR-CHANGE PHENOMENON

Our experience indicates that there are an average number of errors that must occur in a particular pattern for a catastrophic event to occur. The Error Chain Concept¹ "describes human error accidents as the result of a sequence of events that culminate in mishaps. There is seldom an overpowering cause, but rather a number of contributing factors of errors, hence the term error chain. Breaking any one link in the chain might potentially break the entire error chain and prevent a mishap." This research comes from the aviation industry and is based on the investigation of more than thirty accidents or incidents. This has been our experience as well in investigating industrial failures.

Flight Safety International states the fewest links discovered in any one accident was four, the average being seven.² Our experience in industrial applications shows the average number of errors that must occur to be between 10 and 14. To us, this is the core to understanding what an analyst needs in order to understand why undesirable events occur (Figure 7.3).

FIGURE 7.3 The Error-Change Phenomenon [diagram legend: error, change, random event]

We like referring to it as error-change relationships. First we must define some terms in order to communicate more effectively. We will use James Reason's (Human Error: 1990)³ definition of human error for our RCA purposes. Jim Reason defines Human Error as "a generic term to encompass all those occasions in which a planned sequence of mental and physical activities fails to achieve its intended outcome, and when these failures cannot be attributed to some chance agency." This means we intended a satisfactory outcome and it did not occur. We, in some manner, either 1) deviated from our intended path, or 2) the intended path was incorrect.

The change, as a result of an error in our environment, is something that is perceptible to the human senses. An example might be that we commit an error by misaligning a shaft. The change will be that an excessive vibration occurs as a result. A nurse administering the wrong medication to a patient is the human error. The adverse reaction is the perceptible change. These series of human errors and associated changes are occurring around us every day. When they queue up in a particular pattern, that is when catastrophic occurrences happen.

Jim Reason coined the term Swiss Cheese Model³ to depict this scenario graphically and this term has caught on in many industries (Figure 7.4).

Knowing this information, we would like to make two points:

1. We as human beings have the ability through our senses to be more aware of our environments. If we sharpen our senses, we can detect these changes and take action to prevent the error chain from running its course. Many of our organizational systems are put in place to recognize these changes. For example, the predictive maintenance group's sole purpose is to utilize testing equipment to identify changes within the process and equipment. If changes are not within acceptable limits, actions are taken to bring them within acceptable limits.
2. By witch-hunting the last person associated with an event, we give up the right to the information that person possesses on the other errors that led up to the event. If we discipline a person associated with the event because our culture requires a "head to roll," then that person (or anyone around him) will not likely be honest about why he made decisions that resulted in errors.

1 Flight Safety International, Crew Resource Management Workshop, September 1993.
2 Flight Safety International, Crew Resource Management Workshop, September 1993.
3 Reason, James. Human Error. Victoria: Cambridge University Press, 1990-1992.
In a later chapter on Analyzing the Data (Chapter 9), we will explore what we call a Logic Tree, which is a graphical representation of an error-change chain and is based on this research. We discuss this research at this point because it is necessary to understand that any investigation or analysis cannot be performed without data. We have enough experience in the field application of RCA to make a general statement that the physical activity of obtaining such data can have many organizational barriers in front of it. Once these barriers are recognized and overcome, the task comes of actually preserving and collecting the data.

THE 5-PS CONCEPT

Preserving Failure Data is the PR in PROACT. In a typical high-profile RCA, an immense amount of data is usually collected and then must be organized and managed. As we go through this discussion we will relate how to manage this process manually versus with software. We will discuss automating your RCA using software technologies in Chapter 12.

Consider this scenario: a major upset just occurred in our facility. We are charged to collect the necessary data for an investigation. What is the necessary information to collect for an investigation or analysis? We use a 5-Ps approach, where the Ps stand for the following:

1. Parts
2. Position
3. People
4. Paper
5. Paradigms

PARTS

Parts will generally mean something physical or tangible. The potential list is endless depending on the industry where the RCA is conducted. For a rough sampling of what is meant by parts, please review the following lists:

CONTINUOUS PROCESS INDUSTRIES
(OIL, STEEL, ALUMINUM, PAPER, CHEMICALS, ETC.)

Bearings
Seals
Couplings
Impellers
Bolts
Flanges
Grease Samples
Product Samples
Water Samples
Tools
Testing Equipment
Instrumentation
Tanks
Compressors
Motors
DISCRETE PRODUCT INDUSTRIES
(AUTOMOBILES, PACKAGE DELIVERY, BOTTLING LINES, ETC.)

Product Samples
Conveyor Rollers
Pumps
Motors
Instrumentation
Processing Equipment

HEALTHCARE (HOSPITALS, NURSING HOMES,
OUTPATIENT CARE CENTER, LONG-TERM CARE, ETC.)

Medical Diagnostic Equipment
Surgical Tools
Gauze
Fluid Samples
Blood Samples
Biopsies
Medicines
Syringes
Testing Equipment
IV Pumps
Patient Beds

This is just a sampling to give a feel for the type of information that may be considered under the parts category.

POSITION

Positional data is the least understood and is what we consider to be the most important. Positional data comes in the form of two different dimensions, one being physical space and the second being point in time. Positions in terms of space are vitally important to an analysis because of the facts that can be deduced.

When the space shuttle Challenger exploded on January 28, 1986, it was approximately five miles in the air. Films from the ground provided millisecond-by-millisecond footage of the parts that were being dispersed from the initial cloud. From this positional information, trajectory information was calculated and search and recovery groups were assigned to the approximate locations of where vital parts were located. Approximately 93,000 square miles of ocean were involved in the search and recovery of shuttle evidence in the government investigation.¹ While this is an extreme case, it shows how position information is used to determine, among other things, force.

While on the subject of the shuttle Challenger, other positional information that should be considered is, "Why was it the right solid rocket booster (SRB) and not the left?" "Why was it the aft (lower) field joint attachment versus the upper field joint attachment?" "Why was the leak at the O-ring on the inside diameter of the SRB versus the outside diameter?" These are questions regarding positional information that had to be answered.

Now let's take a look at positions in time and their relative importance. Monitoring positions in time at which undesirable outcomes occur can provide information for correlation analysis. By recording historical occurrences we can plot trends that identify the presence of certain variables when these occurrences happen. Let's take a look at the shuttle Challenger again. Most of us remember the incident and the conclusion reported to the public: an O-ring failure resulting in a leak of solid rocket fuel. If we look at the positional information from the standpoint of time, we would learn that the O-rings had evidence of secondary O-ring erosion on 15 of the previous 25 shuttle launches.¹ When the SRBs are released they are parachuted into the ocean, retrieved and analyzed for damage. The correlation of these past launches, which incurred secondary O-ring erosion, showed that low temperatures were a common variable. The positions in time information aided in this correlation.

Ironically, in the case of the shuttle Columbia break-up, there were seven occurrences of bipod ramp foam events since the first mission, STS-1. The table below identifies which missions incurred which types of damage.

TABLE 7.1
Space Shuttle Columbia Debris Damage Events

Mission   Date      Comments
STS-1     04/12/81  Lots of debris damage. 300 tiles replaced.
STS-7     06/18/83  First known left bipod ramp foam shedding event.
STS-27R   12/02/88  Debris knocks off tile; structural damage and near burn through results.
STS-32R   01/09/90  Second known bipod event.
STS-35    12/02/90  First time NASA calls foam debris "safety of flight issue," and "re-use or turn-around time issue."
STS-42    01/22/92  First mission after which the next mission (STS-45) launched without debris in-flight anomaly closure/resolution.
STS-45    03/24/92  Damage to wing RCC Panel 10-right. Unexplained anomaly, "most likely orbital debris."
STS-50    06/25/92  Third known bipod ramp foam event. Hazard Report 37: Accepted Risk.
STS-52    10/22/92  Undetected bipod ramp foam loss (fourth bipod event).
STS-56    04/08/93  Acreage tile damage (large). Called within "experience base."
STS-62    03/04/94  Undetected bipod ramp foam loss (fifth bipod event).
STS-87    11/19/97  Damage to Orbiter Thermal Protection System spurs NASA to begin flight tests to resolve foam shedding. Foam fix ineffective. In-flight anomaly eventually closed after STS-101 as "accepted risk."
STS-112   10/07/02  Sixth known left bipod ramp foam loss. First time major debris event not assigned an in-flight anomaly. External tank was assigned an action. Not closed out until after STS-113 and STS-107.
STS-107   01/16/03  Columbia launch. Seventh known left bipod ramp foam loss event.
The long and short of it here is that the loss of foam tiles from the main fuel tanks and their subsequent impact on the shuttle vehicle were not a new phenomenon, just like the O-ring erosion occurrences. Collecting the positions in time of these occurrences and mapping them out on a time line proves these correlations.
Now moving into more familiar environments, we can review some general or common positional information to be collected at most any organization (Figure 7.5):

Physical Position of Parts at Scene of Incident
Point in Time of Current and Past Occurrences
Position of Instrument Readings
Position of Personnel at Time of Occurrence(s)
Position of Occurrence in Relation to Overall Facility
Environmental Information Related to Position of Occurrence such as Temperature, Humidity, Wind Velocity, etc.

We are not looking to recruit artists for these maps or sketches. We are simply seeking to ensure that everyone sees the situation the same way based on the facts at hand. Again, this is just a sampling to get individuals in the right frame of mind of what we mean by positional information.
FIGURE 7.5 Mapping Example of Sulfur Burner Boiler

PEOPLE

The people category is the more easily defined P. This is simply who needs to be talked to initially in order to obtain information about an event. The people we must talk to first should typically be the physical observers or witnesses of the event. Efforts to obtain such interviews should be relentless and immediate. We risk the chance of losing direct observation when we interview observers days after an event occurs. We will ultimately lose some degree of short-term memory and also risk the observers' having talked to others about their opinion of what happened. Once observers discuss such an event with outsiders, they will tend to reshape their direct observation with the new perspectives.

We have always regarded the goal of interviews with observers to be that we must be able to see through their eyes what they saw at the scene. The description must be vivid, and it is up to the interviewer to obtain such clarity through their questioning process.

Interviewing skills are necessary in such analytical work. People must feel comfortable around an interviewer and not intimidated. A poor interviewing style can ruin an interview and subsequently an analysis or investigation. A good interviewer will understand the importance and value of body language. Experts estimate approximately 55-60% of all communication between people is through body language. Approximately 30% of communication is through the tonal voice and 10-15% is through the spoken word.1 This is very important when interviewing because it emphasizes the need to interview in person rather than over the telephone. If you look at the legal profession, lawyers are professionals at reading the body language of their clients, their opposition and the witnesses. Body language clues will direct their next move. This should be the same for interviews associated with an undesirable outcome. The body language will tell interviewers when they are getting close to information they desire, and this will direct the line and tone of subsequent questioning. Consider another profession that we might not think of as having a strong relationship with body language - professional poker players. With the recent popularity of Texas Hold'em poker, it does not take the novice long to realize that the strength of the cards you are dealt does not determine if you are a winner. Professional poker players play their hands based on their read of the body language of their opponents. They know that there are certain involuntary responses of the body by certain players that indicate that they are holding a strong hand or that they are likely bluffing. This further validates the importance and effect of body language when interviewing.

1 Lyle, Jane. Body Language. London: The Hamlyn Publishing Group Limited, 1990.

When interviewing during the course of an RCA, it is also important to consider the logistics of the interview. Where is the appropriate place to interview? How many people should we interview at a time? What types of people should be in the room at the same time? How will we record all the information? Preparation and environment are very important factors to consider.

We discussed the interviewing environment and the ideal number of people in an interview in Chapter 5. Those same pointers will hold true when interviewing for the actual RCA versus the Opportunity Analysis.

We have most success in interviews when the interviewees are from various departments, and more specifically from different kingdoms. We define a kingdom as entities that build their castles within facilities and tend not to communicate with each other. Examples can be maintenance versus operations, labor versus management, doctors versus nurses, hourly versus salary, etc. When such groups get together they learn a great deal about the other's perspective and tend to earn a respect for each other's position. Another added benefit of an RCA is that people actually start to meet and communicate with others from different levels and areas.

If an interviewer is fortunate enough to have an associate analyst to assist, the associate analyst can take the notes while the interviewer focuses on the interview. It is not recommended that recording devices be used in routine interviews as they are intimidating, and people believe that the information may be used against them at a later date. In some instances where significant legal liabilities may be at play, legal counsel may impose such actions. However, if they do, they are generally doing the interviewing. In the case of most chronic failures or events, such extremes are rare.

Common people to interview will again be based on the nature of the industry and the event being analyzed. As a sample of potential interviewees, please consider the following list:

Observers
Maintenance Personnel
Operations Personnel
Management Personnel
Administrative Personnel
Clinicians/Medical Staff
Technical Personnel
Purchasing Personnel

Keep in mind our detective scenarios discussed earlier and the fact they are always preparing a solid case for court. Paper data is one of the most effective and expected categories of evidence in court. Solid, organized documentation is the key to a winning strategy.

Typical paper data examples are as follows:

Chemistry Lab Reports
Metallurgical Lab Report
Specifications
Procedures
Policies
Financial Reports
Training Records
Purchasing Requisitions/Authorizations
Nondestructive Testing Results
Quality Control Reports
Employee File Information
Maintenance Histories
Production Histories
Medical Histories/Patient Records
Safety Records Information
Internal Memos/E-Mails
Sales Contact Information
Process & Instrumentation Drawings
Past RCA Reports
This is basically how each individual views the world and reacts and responds to situations arising around them. This inherently affects how we approach solving problems and will ultimately be responsible for our success or failure in the RCA effort. Paradigms are a by-product of interviews carried out in this process and discussed earlier in this chapter. Paradigms are recognizable because repetitive themes are expressed in these interviews from various individuals. How an individual sees the world is a mindset. When a certain population shares the same mindset, it becomes a paradigm. Paradigms are important because even if they are false, they represent the beliefs in which we base our decision-making. Therefore, true paradigms represent reality to the people who possess them.

Below is a list of common paradigms we see in our travels. We are not making a judgment as to whether or not they are true, but rather that they affect judgment in decision-making.

We do not have time to perform RCA.
We say safety is number one, but when it comes down to brass tacks on the floor, cost is really number one.
This is impossible to solve.
We have tried to solve this for twenty years.
It's old equipment; it's supposed to fail.
We know because we have been here for twenty-five years.
This is another program-of-the-month.
We do not need data to support RCA because we know the answer.
This is another way for management to witch hunt.
Failure happens; the best we can do is sharpen our response.
RCA will eliminate maintenance jobs.
It is a career-limiting choice to contradict the doctor (nurse's perspective).
We fully trust the hospitals to be responsible for our care.
Hospitals are safe havens for the sick.
What we get is what we order; there is no need to check.
RCA is RCA; it is all the same.
We don't need RCA; we know the answer.
If the failure is compensated for in the budget, it is not really a failure anymore.
RCA is someone else's job, not mine.

Many of these statements may sound familiar. But think about how each statement could affect problem solving abilities. Consider these if-then statements.

If we see RCA as another burden (and not a tool), then we will not give it a high priority.
If we believe that management values profit more than safety, then we may rationalize that bending the safety rules is really what our management wants us to do.
If we believe that something is impossible to solve, then we will not solve it.
If we believe that we have not been able to solve the problem in the past, then no one will be able to solve it.
If we believe that equipment will fail because it is old, then we will be better prepared to replace it.
If we believe RCA is the program-of-the-month, then we will wait it out until the fad goes away.
If we do not believe data collection is important, then we will rely on word of mouth and allow ignorance and assumption to penetrate an RCA as fact.
If we believe that RCA is a witch-hunting tool, then we will not participate.
If we believe failure is inevitable, then the best we can do is become a better responder.
If we believe that RCA will eventually eliminate our job, then we will not let it succeed.
If a nurse believes that it is career limiting to contradict a doctor's order, then someone will likely die as a result of the silence.
If we believe that the hospital is in total control of our care, then we will not question things that seem wrong.
If we believe that hospitals are safe havens for the sick, then we are stating that we are not responsible for our own safety.
If we believe that what we get is what we order, then we will not ever inspect when we receive an order and just trust the vendor.
If we believe that all RCA is the same, then techniques like the 5-Whys will be considered as comprehensive and thorough as PROACT.
If we believe we know all of the answers, then RCA will not be valued.
If we believe that unexpected failure is covered for in the budget, then we will not attempt to resolve those unexpected failures.
If we believe that RCA is someone else's job, then we are indicating that our safety is the responsibility of others and not ourselves.

Our purpose with these "if-then" statements is to show the effect that paradigms have on human decision-making. When human errors in decision-making occur, then they are the triggering mechanism for a series of other subsequent errors until the undesirable event surfaces and is recognized.

We have now discussed in detail the error-change phenomenon and the 5-Ps. Now we must discuss how we get all of this information. When an RCA team has been commissioned, a group of data collectors must be assembled to brainstorm which data will be necessary to start the analysis. This first team session is just that, a brainstorming session of data needs. This is not a session to analyze anything. The group must be focused on data needs and not be distracted by the premature search for solutions. The goal of this first session should not be to collect 100% of the data needed. Ideally our data collection attempts should result in capturing about 60-70% of the necessary data. All of the obvious surface data should be collected first and also the most fragile data. Table 7.2 describes the normal fragility of data at a typical event scene. By fragility we mean the prioritization of the 5-Ps in terms of which is most important to collect first, second, third and so on. We should be concerned about which data has the greatest likelihood of being tainted the fastest.
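As a rough illustration of this prioritization, the short Python sketch below turns the rankings reported in Table 7.2 (position 1, people 1, parts 2, paper 3, paradigms 4) into a collection order. The function name and the grouping of tied categories into one step are assumptions made for the example, not part of the PROACT method itself.

```python
# Fragility rankings as reported in Table 7.2 (1 = most fragile, collect first).
FRAGILITY_RANK = {
    "Parts": 2,
    "Position": 1,
    "Paper": 3,
    "People": 1,
    "Paradigms": 4,
}


def collection_order(ranks):
    """Group categories by rank so that tied categories share one step.

    Hypothetical helper: returns a list of steps, most fragile first,
    where each step is an alphabetically sorted list of categories.
    """
    groups = {}
    for category, rank in ranks.items():
        groups.setdefault(rank, []).append(category)
    return [sorted(groups[rank]) for rank in sorted(groups)]


# Print one line per collection step.
for step, categories in enumerate(collection_order(FRAGILITY_RANK), start=1):
    print(f"Step {step}: {', '.join(categories)}")
```

Grouping by rank rather than sorting a flat list keeps the tie between people and position visible, so neither appears to take precedence over the other.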
TABLE 7.2
Data Fragility Rankings

5-Ps        Fragility Ranking
Parts       2
Position    1
Paper       3
People      1
Paradigms   4

You will notice that people and position are tied for first. This is not an accident. As we discussed earlier in this chapter, the need to interview observers is immediate in order to obtain direct observation. Positional information is equally important because it is the most likely to be disturbed the quickest. Therefore attempts to get such data should be performed immediately. Parts are second because if there is not a plan to obtain them, they will typically end up in the trash can. Paper data is generally static with the exception of process or online production data (DCS, SPC/SQC). Such new technologies allow for automatic averaging of data to the point that if the information is not retrieved within a certain time frame, it can be lost forever. Paradigms are last because we wish we could change them faster, but modifying behavior and belief systems takes more time.

One preparatory step for analysts should be to have a data collection kit always prepared. Many times such events occur when we least expect it. We do not want to have to be running around collecting a camera, plastic bags, etc. If it is all in one place it is much easier to go prepared in a minute's notice. Usually good models are from other emergency response occupations such as doctor's bags, fire departments, police departments, EMTs, etc. They always have most of what they need accessible at any time. Such a bag (in general) may have the following items:

Caution Tape
Masking Tape
Plastic Zip-Lock Bags
Gloves
Safety Glasses
Ear Plugs
Adhesive Labels
Marking Pens
Digital Camera w/Spare Batteries
Video Camera (if possible)
Marking Paint
Tweezers
Pad and Pen
Measuring Tape
Sample Vials
Wire Tags to ID equipment

This is of course a partial listing, and depending on the organization and nature of work other items would be added or deleted from the list.

The following form is a typical data collection form used for manually organizing data collection strategies for an RCA team (Figure 7.6).

1. Data Type/Category: List which of the 5-Ps this form is directed at. Each "P" should have its own form.
2. Person Responsible: The person responsible for making sure the data is collected by the assigned date.
3. Data to Collect: During the 5-Ps brainstorming session, list all data necessary to collect for each "P."
4. Data Collection Strategy: This space is for actually listing the plan of how to obtain the previously identified data to collect.
5. Date to be Collected By: Date by which the data is to be collected and ready to be reported to team.

Figure 7.7 is a completed sample Data Collection Worksheet.
5-P's Data Collection Form
Analysis Name:
Data type: People, parts, position, paper, paradigms (circle one)
Champion:

FIGURE 7.6 Sample 5-Ps Data Collection Form
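For teams that track this form electronically rather than on paper, the five fields described in the text could be captured in a small record structure. The Python sketch below is one possible layout under stated assumptions: the class names, the `is_overdue` helper, and the sample values (analysis name, person, date) are all hypothetical and are not taken from the book's worksheet.

```python
from dataclasses import dataclass, field
from datetime import date

# The five categories a form can be directed at; one form per "P".
FIVE_PS = ("Parts", "Position", "People", "Paper", "Paradigms")


@dataclass
class DataItem:
    """One row of the form (fields 2-5 in the text)."""
    data_to_collect: str
    person_responsible: str
    collection_strategy: str
    collect_by: date

    def is_overdue(self, today: date) -> bool:
        # Hypothetical helper: flags items past their assigned date.
        return today > self.collect_by


@dataclass
class CollectionForm:
    """One form per data type, as the text recommends."""
    analysis_name: str
    data_type: str
    items: list = field(default_factory=list)

    def __post_init__(self):
        if self.data_type not in FIVE_PS:
            raise ValueError(f"data_type must be one of {FIVE_PS}")


# Illustrative values only.
form = CollectionForm(analysis_name="Example analysis", data_type="Paper")
form.items.append(
    DataItem(
        data_to_collect="Shift logs",
        person_responsible="A. Analyst",
        collection_strategy="Ask the shift foreman for the logs",
        collect_by=date(2024, 1, 15),
    )
)
print(form.items[0].is_overdue(date(2024, 1, 16)))
```

Validating the data type at construction mirrors the "circle one" instruction on the paper form, so a form cannot be created for a category outside the 5-Ps.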
[Completed form; legible entries: Ken Latino; Shift logs; Have shift foreman collect the shift ... to John Smith within 1 day]

FIGURE 7.7 Completed Data Collection Form