Vijaya Deepti
Nimmagadda Ramanamurthy and K. Uma Balasubramanian
Tata Consultancy Services
Bangalore, Karnataka India
Abstract
“Identifying and dealing with risks early in development lessens long-term costs and
helps prevent software disasters”
– Barry W. Boehm, 1991
“FMEA is a technique used to identify, prioritize, and eliminate potential failures from
the system, design or process before they reach the customer”
– Omdahl, 1988
At a high level, the framework diagram shows the steps taken for risk management
at enterprise level:
• Risks are identified using techniques such as Top 10 Top-level Software Risks,
checklists or common risk lists from the organisation’s knowledge repository.
Past experience, problem analysis, assumption analysis and intuition are used to
derive these lists.
• In a programme, during the start-up phase, risks are analysed through what-if
scenarios on the schedule, effort and software-quality variables. Their relative
probability of occurrence and impact on the project are determined. The
probability and impact levels used are shown in Table 1 [Boehm]:
Table 1: Value (Interpretation)
  Probability   0.1-0.3 (Improbable)   0.4-0.6 (Probable)   0.7-1.0 (Frequent)
  Impact        1-3 (Low)              4-6 (Medium)         7-10 (High)
2 Presented at Annual Project Management Leadership Conference 2004 – QAI India TCS
• A prioritised list of identified risks is drawn up using risk exposure analysis, risk
exposure being calculated by multiplying loss probability and loss impact for each
risk. Fig. 2 shows the quick-reference table used for risk exposure. (For instance,
risk exposure values for frequent and low-impact risks range from 0.7 to 3.0.)
• All risks and contingency plans are documented in the risk management plan.
• Risks are reviewed regularly for change, since changes in impact or probability
affect risk exposure, irrespective of a trigger event. The risks are then re-
prioritised and corrective action taken.
• The status of the top 10 critical risks, i.e. those that have the highest impact on
the project’s success, is tracked and reported to management regularly.
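The exposure calculation and prioritisation described in the steps above can be sketched as follows. This is a minimal illustration, not the toolkit used at TCS: the class name `Risk`, the function `prioritise` and the three sample risks are hypothetical, and only the formula (exposure = probability x impact) and the value ranges of Table 1 come from the text.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # 0.1 to 1.0, as in Table 1
    impact: int         # 1 to 10, as in Table 1

    @property
    def exposure(self) -> float:
        # Risk exposure = loss probability x loss impact
        return self.probability * self.impact


def prioritise(risks, top_n=10):
    """Return up to top_n risks, highest exposure first."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)[:top_n]


# Hypothetical risks, for illustration only
risks = [
    Risk("Requirements volatility", 0.7, 3),
    Risk("Key staff attrition", 0.4, 8),
    Risk("Unfamiliar tooling", 0.2, 5),
]

for r in prioritise(risks):
    print(f"{r.name}: exposure {r.exposure:.1f}")
```

Note that a frequent but low-impact risk (0.7 x 3) ranks below a merely probable, high-impact one (0.4 x 8), which is the point of ranking by exposure rather than by probability alone.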
This format includes prioritised (re-prioritised) risks. However, it does not address
prevention, i.e. early identification and mitigation of risk. Risk management planning
remains inadequate unless risk exposure is mapped to a threshold that triggers
contingency plan invocation.
Management found it needed five key features in the risk management plan:
This led to developing a systematic, best-in-class risk analysis model that was
prevention-oriented and could segregate the ‘vital few’ risks from the ‘trivial many’ ones.
TCS thus adopted the Failure Mode and Effects Analysis (FMEA) technique.
The power of FMEA is four-fold. Firstly, all FMEA artifacts are dynamic, living
documents; continuous improvement and risk level reduction drive FMEA. Secondly, the
technique identifies the high-priority, ‘vital few’ risks because, in real life, not all
problems are equally important. Thirdly, FMEA is customer-oriented, although the
customer representative may not be an end-user. Fourthly, FMEA offers audit trails, i.e. a
well-documented record of the improvements arising from the corrective action
implemented. In sum, FMEA provides a mechanism to document and monitor all the data
elements required to meet business drivers.
[Figure: FMEA form. Recoverable column headings: Step; Failure/Error; Effect(s) Of;
Cause(s) Of; Occurrence (P); Controls; RPN = S * P * D; Action; Completion Date;
Action Results (Action Taken, New Severity, New Occurrence, New Detection, New RPN).]
• The customer’s perspective on the effect of failure is described and a severity
rating attached.
• The possible causes of these failures are identified and documented. These
are then granularised to a low level so that corrective action and control is
possible. A probability value is assigned to each cause.
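The Risk Priority Number used on the FMEA form (RPN = S * P * D) can be illustrated with a short sketch. The class `FailureMode`, the two sample entries and their ratings are hypothetical; the paper does not state the rating scales, so the 1 to 10 range noted in the comments is an assumption based on common FMEA practice.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int    # S: customer's view of the effect (assumed scale 1-10)
    occurrence: int  # P: likelihood of the cause (assumed scale 1-10)
    detection: int   # D: detection score (assumed scale 1-10)

    @property
    def rpn(self) -> int:
        # RPN = S * P * D, as on the FMEA form
        return self.severity * self.occurrence * self.detection


# Hypothetical entries, for illustration only
modes = [
    FailureMode("Late environment setup", severity=4, occurrence=3, detection=2),
    FailureMode("Immigration processing delay", severity=8, occurrence=6, detection=5),
]

# Address failure modes in descending order of RPN
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
total_rpn = sum(m.rpn for m in modes)

for m in ranked:
    print(f"{m.description}: RPN {m.rpn}")
print(f"Total Risk Priority Number: {total_rpn}")
```

Re-rating S, P and D after a corrective action and recomputing the product gives the "New RPN" recorded under Action Results.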
The primary reasons for which TCS elected to go the FMEA way were:
• To realise the benefits of the technique and ingrain continuous improvement into
TCS’s organisational culture.
• TCS used FMEA for programme risk management. Process change and
qualitative benefits prompted a discussion with the customer, whose desire for
improvement brought in the needed rigour and spurred FMEA use in the projects.
• Lastly, FMEA is a technique already being used in the Six Sigma process
improvement projects at TCS.
TCS therefore identified the need to enhance FMEA. It discussed this with the
customer and shared with them the resulting gap analysis report.
Gap analysis of FMEA against the CMMi RSKM PA
(columns: Goal; Key Practice; Sub-Practice; Gap in FMEA)

1  Goal:         [SG1] Preparation for risk management is conducted.
   Key Practice: [SP 1.2] Define the parameters used to analyze and categorize risks,
                 and the parameters used to control the risk management effort.
   Sub-Practice: [2] Define thresholds for each risk category
   Gap in FMEA:  Yes. There is no category or threshold. Priority is by RPN value and
                 risks are addressed by priority.

2  Goal:         [SG3] Risks are handled and mitigated, where appropriate, to reduce
                 adverse impacts on achieving objectives.
   Key Practice: [SP 3.1] Develop a risk mitigation plan for the most important risks to
                 the project, as defined by the risk management strategy.
   Sub-Practice: [3] Determine cost-to-benefit ratio of implementing risk mitigation
                 plan for each risk
   Gap in FMEA:  Yes. The technique does not provide means for this.

3  Goal:         [SG3] Risks are handled and mitigated, where appropriate, to reduce
                 adverse impacts on achieving objectives.
   Key Practice: [SP 3.2] Monitor the status of each risk periodically and implement
                 the risk mitigation plan as appropriate
   Sub-Practice: [2] Provide a method for tracking open risk-handling action items to
                 closure
   Gap in FMEA:  No. Responsibility and Target Completion Dates are handled in FMEA.
The enhanced form has additional fields such as failure category, risk
identification date, mitigation start date, cost-benefit ratio, etc (Fig. 5).
Failure categorisation
[Fig. 5: Enhanced FMEA form, titled "Risk Management - Potential Failure Modes and
Effects Analysis" for <Project or Application>. Recoverable column headings, each
appearing twice on the form: Severity (S); Occurrence (P); Detection Score (D);
RPN (S * P * D). The form ends with a Total Risk Priority Number row.]
The toolkit calculates cost versus benefits for recommended mitigation. The
resulting figures are documented in the new risk management plan.
Uniform usage
Anticipatory dates
Each recommended mitigation action is given a start date, which is the earliest
date on which the risk is likely to affect the project and hence the date by which the
risk must be acted on. Target completion dates are also set; these are the latest dates
for action before the impact is felt.
Risk threshold criteria matrix
2  If RPN >= nn2 and < nn1, then Medium Risk. Warning. Needs constant monitoring.
3  If RPN >= nn3 and < nn2, then Low Risk. Under control. No action required.
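The threshold criteria can be expressed as a small lookup. This is a sketch under stated assumptions: the matrix rows available here cover only the Medium and Low bands, so treating RPN >= nn1 as High is inferred from the red/amber/green statuses used in weekly monitoring, and the numeric values chosen for nn1, nn2 and nn3 are purely hypothetical.

```python
def classify(rpn, nn1, nn2, nn3):
    """Map an RPN to a (risk level, colour) pair using thresholds nn1 > nn2 > nn3."""
    if not nn1 > nn2 > nn3:
        raise ValueError("thresholds must satisfy nn1 > nn2 > nn3")
    if rpn >= nn1:
        # Assumption: the High band is not shown in the matrix rows above
        return ("High", "red")
    if rpn >= nn2:
        return ("Medium", "amber")  # warning, needs constant monitoring
    if rpn >= nn3:
        return ("Low", "green")     # under control, no action required
    # The matrix does not say what happens below nn3; rejecting is a design choice
    raise ValueError(f"RPN {rpn} is below the lowest threshold {nn3}")


# Hypothetical thresholds, for illustration only
NN1, NN2, NN3 = 200, 100, 0

for rpn in (240, 150, 24):
    print(rpn, classify(rpn, NN1, NN2, NN3))
```

Keeping the thresholds as parameters rather than constants reflects the fact that they are defined per risk category.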
This form allowed risk monitoring to be done weekly. Active risks were reviewed
and new ratings for severity, occurrence and detection, as observed, were recorded in
the monitoring section of the risk management plan. Depending on the recalculated RPN
value, the status of each risk at the end of the week was determined as high (red),
medium (amber) or low (green). A unique ID was assigned to each risk item to link it
to the risk management plan and to the status tracking and reporting form. All active
risks were reviewed as a matter of course and the top 10 risks reported.
Escalation
High risks that stayed high for more than two weeks were escalated as per the
mechanism defined (Fig. 8).
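The escalation rule can be sketched as a check over a risk's weekly status history. The function name `needs_escalation` and the list-of-strings representation are hypothetical, and the rule is interpreted here as "rated high at more than two consecutive weekly reviews, up to and including the latest one"; the paper does not spell out how the two weeks are counted.

```python
def needs_escalation(weekly_status, max_high_weeks=2):
    """Return True if the most recent run of 'high' ratings exceeds max_high_weeks.

    weekly_status is a list of 'high', 'medium' or 'low', oldest first,
    one entry per weekly review.
    """
    streak = 0
    for status in reversed(weekly_status):
        if status != "high":
            break
        streak += 1
    return streak > max_high_weeks


# Hypothetical histories, for illustration only
print(needs_escalation(["medium", "high", "high", "high"]))  # three weeks high
print(needs_escalation(["high", "high", "low"]))             # already brought down
```

Counting only the trailing run means a risk that was brought down and later rose again starts a fresh two-week clock, which is one reasonable reading of "stayed high".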
6. Challenges Encountered
Some of the challenges encountered in developing and deploying the model are
worth mentioning:
• Defining the categories and criteria to determine threshold values for critical
risks
• Calculating cost-benefit ratios for risk mitigation action
• Fostering an organisational culture that motivates the appropriate
stakeholders to participate actively in risk mitigation
At the programme level, one of the risks identified related to delays in the
immigration processing of associates travelling overseas. Failure in this area would
delay project start-ups and, in turn, result in cost escalation. The details documented in
the new risk management plan (the enhanced FMEA) were as shown in Fig. 9.
This risk was identified on 23 Dec 2003. The start date for action on mitigation
was the following day. The risk was to be mitigated before 16 Jan 2004 (the end date),
as the project was to start on 19 Jan 2004.
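The anticipatory dates in this example can be checked mechanically. The sketch below uses the four dates given above; the function `mitigation_window` and its ordering rule (identified, then start, then end, then the date the impact would be felt) are hypothetical conveniences, not part of the toolkit described in the paper.

```python
from datetime import date


def mitigation_window(identified, start, end, impact_date):
    """Validate date ordering and return (days available for mitigation, buffer days)."""
    if not identified <= start <= end <= impact_date:
        raise ValueError("dates must be ordered: identified, start, end, impact")
    return (end - start).days, (impact_date - end).days


identified = date(2003, 12, 23)       # risk identified
mitigation_start = date(2003, 12, 24) # the following day
mitigation_end = date(2004, 1, 16)    # target completion (end date)
project_start = date(2004, 1, 19)     # when the impact would be felt

window_days, buffer_days = mitigation_window(
    identified, mitigation_start, mitigation_end, project_start
)
print(f"Mitigation window: {window_days} days, buffer before project start: {buffer_days} days")
```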
The mitigation action recommended was to carry out joint capacity planning
and keep all the stakeholders informed whenever a team was identified.
Using the toolkit, the cost-benefit ratio was found to be 5.06. It was calculated as
savings over investment, where
[equation missing]
and
[equation missing]
Risk impact cost was calculated as the cost of retaining current resources.
Mitigation cost was calculated as the sum of the yearly one-time cost of developing a
capacity plan and the recurring cost of regular conference calls to keep the
stakeholders informed.
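The ratio itself is a simple quotient of savings over investment. In the sketch below, how savings and investment are derived from the risk impact cost and the mitigation cost is deliberately left open, since those definitions are not available here; the function name and the two input figures are hypothetical and were chosen only so that the result matches the reported ratio.

```python
def cost_benefit_ratio(savings, investment):
    """Cost-benefit ratio expressed as savings over investment."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return savings / investment


# Hypothetical figures in arbitrary currency units, NOT the project's actual costs
savings = 50_600
investment = 10_000

ratio = cost_benefit_ratio(savings, investment)
print(f"Cost-benefit ratio: {ratio:.2f}")
```

A ratio above 1 indicates that the recommended mitigation saves more than it costs.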
Within two weeks of identification, the risk was under control and is now a low
risk. A snapshot of the status tracking and reporting form of the risk is shown in Fig. 10.
This also shows the audit trail from identification to reporting.
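[Fig. 10: Status tracking and reporting form. Columns: No.; Description; Status Before;
Status Now; weekly review dates (dd/mm/yy). Entry 1: Immigration processing delay for
associates travelling overseas. Status Before: Immediate action. Status Now:
Satisfactorily Dealt With.]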
The mitigation of this risk involved all the stakeholders, namely, the project
leader, the visa cell representative, the project manager at the client end and one team
member.
Fig. 11 shows an analysis of the consolidated status of all risks currently active at
programme level. Clearly the enhanced FMEA model was effective.
[Fig. 11: Consolidated status of active programme-level risks. One panel shows the
distribution of risks in a sample list of categories (Category1 to Category7); two
further panels plot monthly values from Dec'03 to Mar'04, with % of Risks on the
vertical axis.]
Benefits
Drivers for Change:
# Description Yes No
1 A shift from corrective to preventive mode
Going forward, TCS has decided to create a databank on programme risk to help
other customers understand the challenges in offshoring, develop assets to manage the
risks better and institutionalise the process.
That TCS enjoys the confidence of the client for which it piloted the model is
evidenced in the exact words of the customer: “We are happy to [partner with you in
adopting the model for managing project risk] . . . . We are trialling this technique in a
few IT projects.”
References
Project Procedures Manual, Version 8.1. Tata Consultancy Services, October 2003.
Software Risk Management. Boehm, Barry W., IEEE Computer Society Press, 1989.
Software Runaways: Lessons Learned from Massive Software Project Failures. Glass,
Robert L., Prentice-Hall, 1998.
The Six Sigma Way Team Fieldbook. Pande, Pete S., Robert P. Neuman, Roland R.
Cavanagh, Tata-McGraw Hill, 2003.