
SENG 637

Dependability, Reliability &
Testing of Software Systems

SRE Deployment
(Chapter 10)

Department of Electrical & Computer Engineering, University of Calgary
B.H. Far (far@ucalgary.ca)
http://www.enel.ucalgary.ca/People/far/Lectures/SENG637/

far@ucalgary.ca 1
Contents
 Quality in requirements phase
 Quality in design & implementation, testing &
release phases
 Software Quality Assurance (SQA) and
Software Reliability Engineering (SRE)
 Quality, test and data plans
 Roles and responsibilities
 Sample quality and test plan
 Defect reporting procedure
 Best practices of SRE
 Quality in post-release and maintenance phase
far@ucalgary.ca 2
Quality vs. Project Costs
(Chart) Cost distribution for a typical software project:
Product Design, Programming, Integration and test
3
Total Cost Distribution

(Chart) Total cost distribution: Product Design, Programming,
Integration and test, Maintenance
Questions:
 How to build quality into a system?
 How to assess quality of a system?
Developing a better quality system will
contribute to lowering maintenance costs
4
Quality in Software Development Process
Q. How to include quality concerns in the process?
(Diagram) Lifecycle phases and the associated quality activities:
 Requirement & Architecture: architectural analysis of quality attributes
(methods: ATAM, CBAM, etc.)
 Design & Implementation and Test & Release: Software Reliability
Engineering (SRE) and Software Quality Assurance (SQA)
 Maintenance: software quality assessment (method: RAM, etc.)
far@ucalgary.ca 5
Chapter 10 Section 1
Software Quality:
Requirements and
Architecture Phase

far@ucalgary.ca 6
Quality Challenges
 Modern software systems are required to meet
several quality attributes such as: modifiability,
performance, security, interoperability, portability,
reliability, etc.
 Questions for any particular system:
 What precisely do these quality attributes mean?
 Can a system be analyzed to determine desired qualities?
 How soon can such an analysis occur?
 How do you know if the design is suitable without
having to build the system first?

 SW Architecture Evaluation / Assessment!

far@ucalgary.ca 7
Evaluating SW Architecture
 Determining whether an architecture satisfies
its requirements often involves:
 Being very explicit about what the requirements
(functional & non-functional) are and how they are
reflected in the architecture
 Understanding where one has to make trade-offs between
different design alternatives
 Applying analysis wherever possible to determine the
consequences of an architectural choice
 Mediating between desires of different stakeholders
To achieve these goals, an architectural
evaluation process is needed

far@ucalgary.ca 8
SW Architecture Evaluation

 Informal / ad-hoc architectural evaluation
 Pros?
 Quick and Cheap
 Cons?
 … and Dirty? Incomplete? Unreliable?
 … Unrepeatable? Poorly documented?

far@ucalgary.ca 9
SW Architecture Evaluation
 Are there better methods than ad-hoc evaluation?
 The answer is “YES”:
 SAAM (Software Architecture Analysis Method)
 Scenario-based evaluation
 ATAM (Architecture Tradeoff Analysis Method)
 Scenario-based evaluation with focus on trade-offs
 SACAM (Software Architecture Comparison Method)
 Business goal-driven comparison of architecture alternatives
 CBAM (Cost-Benefit Analysis Method)
 Focus on economic aspects
 etc.

far@ucalgary.ca 10
References
 Software Architecture Technology Initiative of the
SEI: http://www.sei.cmu.edu/architecture/
 ATAM: Method for Architecture Evaluation (2000),
Rick Kazman, Mark Klein, Paul Clements, Technical
Report, CMU/SEI-2000-TR-004.
 CBAM: Making Architecture Design Decisions: An
Economic Approach (2002), Rick Kazman, Jai Asundi,
Mark Klein, Technical Report, CMU/SEI-2002-TR-035.
 Software Architecture in Practice, 2nd ed., Len Bass,
Paul Clements, Rick Kazman, Addison-Wesley, 2003.
 Evaluating Software Architectures: Methods and Case
Studies, Paul Clements, Rick Kazman, Mark Klein,
Addison-Wesley, 2001.

far@ucalgary.ca 11
Chapter 10 Section 2
Software Quality: Design &
Implementation, Testing &
Release Phases

far@ucalgary.ca 12
What is Reliable Software?
 Reliable software products are those that run correctly and
consistently, have fewer remaining defects, handle abnormal
situations properly, and need less installation effort
 The remaining defects should not affect the normal behaviour
and use of the software; they will not do any destructive
things to the system and its hardware or software environment,
and will rarely be evident to the users
 Developing reliable software requires:
 Establishing Software Quality System (SQS) and Software
Quality Assurance (SQA) programs
 Establishing Software Reliability Engineering (SRE)
process

far@ucalgary.ca 13
Software Quality System (SQS)
Goals:
 Building quality into the software from the beginning
 Keeping and tracking quality in the software
throughout the software life cycle
John W. Horch: Practical Guide to Software Quality Management

far@ucalgary.ca 14
Software Quality Assurance (SQA)
 Software Quality Assurance (SQA) is a planned and
systematic approach to ensure that both software process and
software product conform to the established standards,
processes, and procedures.
 The goals of SQA are to improve software quality by
monitoring both software and the development process to
ensure full compliance with the established standards and
procedures.
 Steps to establish an SQA program
 Get the top management’s agreement on its goal and support.
 Identify SQA issues, write SQA plan, establish standards and SQA
functions, implement the SQA plan and evaluate SQA program.

far@ucalgary.ca 15
SRE: Process & Plans
(Diagram) The SRE process mapped onto the lifecycle phases
(Requirement & Architecture, Design & Implementation, Test):
 Define Necessary Reliability
 Develop Operational Profile
 Prepare for Test
 Execute Test
 Apply Failure Data
The Quality Plan, Test Plan and Data Plan are produced along this timeline.
There may be many Test and Data (measurement)
plans for various parts of the same project.
far@ucalgary.ca 16
Defect Handling: Without & With SQS
 Defect reporting, tracking, and closure procedure
(Diagram) Defect reports database; SCN: software change notice;
STR: software trouble report
John W. Horch: Practical Guide to Software Quality Management

far@ucalgary.ca 17
SRE: Who is Involved?
 Senior management
 Test coordinator (manager)
 Data coordinator (manager)
 Customer or user

far@ucalgary.ca 18
SRE: Management Concerns
 Perception and specification of a customer’s real needs.
 Translation of specification into a conforming design.
 Maintaining conformity throughout the development
processes.
 Product and sub-product demonstrations which provide
convincing indications of the product and project having met
their requirements.
 Ensuring that the tests and demonstrations are designed and
controlled, so as to be both achievable and manageable.

far@ucalgary.ca 19
Roles & Responsibilities /1
 Test Coordinator (Manager):
Test coordinator is expected to ensure that every specific statement of
intent in the product requirement, specification and design, is matched by
a well designed (cost-effective, convincing, self-reporting, etc.) test,
measurement or demonstration.
 Data Coordinator (Manager) :
Data coordinator ensures that the physical and administrative structures
for data collection exist and are documented in the quality plan, receives
and validates the data during development, and through
analysis and communication ensures that the
meaning of the information is known to all, in
time, for effective application.

far@ucalgary.ca 20
Roles & Responsibilities /2
 Customer or User:
 Actively encouraging the making and following of detailed
quality plans for the products and projects.
 Requiring access to previous quality plans and their
recorded outcomes before accepting the figures and
methods quoted in the new plan.
 Enquiring into the sources and validity of synthetics and
formulae used in estimating and planning.
 Appointing appropriate personnel to provide authoritative
responses to queries from the developer and a managed
interface to the developer.
 Receiving and reviewing reports of significant audits,
reviews, tests and demonstrations.
 Making any queries and objections in detail and in writing,
at the earliest possible time.

far@ucalgary.ca 21
Quality Plans /1
 The most promising mechanisms
for gaining and improving
predictability and controllability
of software qualities are the quality
plan and its subsidiary documents,
including test plans and data
(measurement) plans.
 The creation of the quality plan
can be instrumental in raising
project effectiveness and in
preventing expensive and time-
consuming misunderstandings
during the project, and at
release/acceptance time.
(Diagram: the quality plan with its subsidiary test plan and data plan)

far@ucalgary.ca 22
Quality Plan /2
 The quality plan and quality record provide guidelines
for carrying out and controlling the following:
 Requirement and specification management.
 Development processes.
 Documentation management.
 Design evaluation.
 Product testing.
 Data collection and interpretation. (SRE-related activities)
 Acceptance and release processes.

far@ucalgary.ca 23
Quality Plan /3
 Quality planning should be made at the very earliest point in a
project, preferably before a final decision is made on
feasibility, and before a software development contract is
signed.
 Quality plan should be devised and agreed between all the
concerned parties: senior management, software development
management (both administrative and technical), software
development team, customers, and any involved general
support functions such as resource management and
company-wide quality management.

far@ucalgary.ca 24
Data (Measurement) Plan
 The data (measurement) plan prescribes:
 What should be measured and recorded during a project;
 How it should be checked and collated;
 How it should be interpreted and applied.
 Data may be collected in several ways, within the
specific project and beyond it.
 Ideally, there should be a higher level of data
collection and application into which project data is
fed.

far@ucalgary.ca 25
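As an illustration of "what should be measured and recorded", here is a minimal sketch of the kind of failure record a data plan might prescribe; the field names and values are hypothetical, not taken from the course material.

```python
import csv
from dataclasses import dataclass, asdict

# Hypothetical failure-record layout a data (measurement) plan might prescribe.
@dataclass
class FailureRecord:
    failure_id: str
    date: str            # ISO date of occurrence
    severity: int        # severity class, e.g. 1 (critical) .. 4 (minor)
    operation: str       # operation in use when the failure occurred
    exec_hours: float    # cumulative execution time at failure
    description: str

records = [
    FailureRecord("F-001", "2024-03-02", 2, "generate report", 12.4, "report truncated"),
]

# Append records to a simple CSV log so they can later be checked, collated and analyzed.
with open("failure_log.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(asdict(records[0]).keys()))
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
```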
Test Plan /1
 The purpose of test plan is to ensure that all testing activities
(including those used for controlling the process of
development, and in indicating the progress of the project) are
expected, are manageable and are managed.
 Test plans are created as a subsection or as an associated
document of the quality plan.
 Test plans become progressively more detailed and expanded
during a project.
 Each test plan defines its own objectives and
scope, and the means and methods by which
the objectives are expected to be met.

far@ucalgary.ca 26
Test Plan /2
 For the software product, the test plan is usually restricted by
the scope of the test: certification, feature and load test.
 The plan predicts the resources and means required to reach
the required levels of assurance about the end products, and
the scheduling of all testing, measuring and demonstration
activities.
 Tests, measurements and demonstrations are used to establish
that the software product satisfies the requirements document,
and that each process during a development is carried out
correctly and results in acceptable outcomes.

far@ucalgary.ca 27
Chapter 10 Section 2.1
Elements of Quality & Test Plan

far@ucalgary.ca 28
Sample SQS Plan /1
 1 Purpose
 2 Reference Documents
 3 Management
 3.1 Organization
 3.2 Tasks
 3.3 Responsibilities

Based on IEEE Standard 730.1-1989

far@ucalgary.ca 29
Sample SQS Plan (cont’d) /2
 4 Documentation
 4.1 Purpose
 4.2 Minimum Documentation
 44.2.1
2 1 Software
S ft Requirements
R i t Specification
S ifi ti
 4.2.2 Software Design Description
 4.2.3 Software Verification and Validation Plan
 4.2.4 Software Verification and Validation Report
 4.2.5 User Documentation
 4.2.6 Configuration Management Plan
 4.3 Other Documentation

Based on IEEE Standard 730.1-1989

far@ucalgary.ca 30
Sample SQS Plan (cont’d) /3
 5 Standards, Practices, Conventions, and
Metrics
 5.1 Purpose
 5.2 Documentation, Logic, Coding, and
Commentary Standards and Conventions
 5.3 Testing Standards, Conventions, and Practices
 5.4 Metrics

Based on IEEE Standard 730.1-1989

far@ucalgary.ca 31
Sample SQS Plan (cont’d) /4
 6 Review and Audits
 6.1 Purpose
 6.2 Minimum Requirements
 6.2.1 Software Requirements Review
 6.2.2
6. . Preliminary
e a y Design
es g Review
ev ew
 6.2.3 Critical Design Review
 6.2.4 Software Verification and Validation Review
 6.2.5 Functional Audit
 6.2.6 Physical Audit
 6.2.7 In-process Reviews
 6.2.8 Managerial Reviews
 6.2.9 Configuration Management Plan Review
 6.2.10 Postmortem Review
 6.3 Other Reviews and Audits

Based on IEEE Standard 730.1-1989

far@ucalgary.ca 32
Sample SQS Plan (cont’d) /5
 7 Test
 8 Problem Reporting and Corrective Action
 8.1 Practices and Procedures
 8.2 Organizational Responsibilities
 9 Tools, Techniques, and Methodologies
 10 Code Control
 11 Media Control
 12 Supplier Control
 13 Records Collection, Maintenance, and Retention
 14 Training
 15 Risk Management

Based on IEEE Standard 730.1-1989

far@ucalgary.ca 33
Sample Test Plan /1
 1 Test Plan identifier
 2 Introduction
 2.1 Objectives
 2.2 Background
 2.3
.3 Scope
 2.4 References

Based on IEEE Standard 829-1983

far@ucalgary.ca 34
Sample Test Plan (cont’d) /2
 3 Test Items
 3.1 Program Modules
 3.2 Job Control Procedures
 3.3 User Procedures
 3.4 Operator Procedures
 4 Features To Be Tested
 5 Feature Not To be Tested

Based on IEEE Standard 829-1983

far@ucalgary.ca 35
Sample Test Plan (cont’d) /3
 6 Approach
 6.1 Conversion Testing
 6.2 Job Stream Testing
 6.3 Interface Testing
 6.4 Security Testing
 6.5 Recovery Testing
 6.6 Performance Testing
 6.7 Regression
 6.8 Comprehensiveness
 6.9 Constraints

Based on IEEE Standard 829-1983

far@ucalgary.ca 36
Sample Test Plan (cont’d) /4
 7 Item Pass/Fail Criteria
 8 Suspension Criteria and Resumption Requirements
 8.1 Suspension Criteria
 8.2 Resumption Requirements
 9 Test Deliverables
 10 Testing Tasks

Based on IEEE Standard 829-1983

far@ucalgary.ca 37
Sample Test Plan (cont’d) /5
 11 Environmental Needs
 11.1 Hardware
 11.2 Software
 11.3 Security
 11.4 Tools
 11.5 Publications
 12 Responsibilities
 12.1 Test Group
 12.2 User Department
 12.3 Development Project Group

Based on IEEE Standard 829-1983

far@ucalgary.ca 38
Sample Test Plan (cont’d) /6
 13 Staffing and Training Needs
 13.1 Staffing
 13.2 Training
 14 Schedule
 15 Risks and Contingencies
 16 Approvals

Based on IEEE Standard 829-1983

far@ucalgary.ca 39
Chapter 10 Section 2.2
Best Practices of SRE

far@ucalgary.ca 40
Practice of SRE /1
 The practice of SRE provides the software engineer or
manager the means to predict, estimate, and measure the rate
of failure occurrences in software.
 Using SRE in the context of Software Engineering, one can:
 Analyze, manage, and improve the reliability of software products.
 Balance customer needs for competitive price, timely delivery, and a
reliable product. (Hopefully!)
 Determine when the software is good enough to release to customers,
minimizing the risks of releasing software with serious problems.
 Avoid excessive time to market due to overtesting.
far@ucalgary.ca 41
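As an illustration of "measuring the rate of failure occurrences", a minimal sketch that computes an observed failure intensity from failure timestamps; the data and window size are hypothetical, not from the course material.

```python
# Minimal sketch: observed failure intensity = failures per unit of execution time
# in each time window, computed from raw failure timestamps (CPU-hours).

def failure_intensity(failure_times, window):
    """Return (window_start, failures_per_hour) pairs for consecutive windows."""
    if not failure_times:
        return []
    end = max(failure_times)
    intensities = []
    start = 0.0
    while start < end:
        count = sum(1 for t in failure_times if start <= t < start + window)
        intensities.append((start, count / window))
        start += window
    return intensities

# Hypothetical failure times observed during system test.
times = [2.1, 5.5, 6.0, 9.8, 14.2, 21.0, 33.5, 47.9]
for start, rate in failure_intensity(times, window=10.0):
    print(f"{start:5.1f}-{start + 10.0:5.1f} h: {rate:.2f} failures/h")
```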
Incremental Implementation
 Most projects
implement the
SRE activities
incrementally.
 A typical
implementation
sequence

far@ucalgary.ca 43
Implementing SRE /1
 Feasibility and requirements phase:
 Define and classify failures, i.e., failure severity
classes
 Identify customer reliability needs
 Determine operational profile
 Conduct trade-off studies (among reliability, time,
cost, people, technology)
 Set reliability objectives

far@ucalgary.ca 44
Implementing SRE /2
 Design and implementation phase:
 Allocate reliability among components, acquired
software, hardware and other systems
 Engineer to meet reliability objectives
 Focus resources based on operational profile
 Measure reliability of acquired software, hardware
and other systems, i.e., certification test
 Manage fault introduction and propagation

far@ucalgary.ca 45
Implementing SRE /3
 System test and field trial phase:
 Determine operational profile used for testing, i.e.
test profile
 Conduct reliability growth testing
 Track testing progress
 Project additional testing needed
 Certify reliability objectives and release criteria
are met

far@ucalgary.ca 46
Implementing SRE /4
 Post delivery and maintenance:
 Project post-release staff needs
 Monitor field reliability vs. objectives
 Track customer satisfaction with reliability
 Time new feature introduction by monitoring
reliability
 Guide product and process improvement with
reliability measures

far@ucalgary.ca 47
Feasibility Phase
 Activity 1: Define and classify failures
 Define failure from customer’s perspective
 Group identified failures into a group of severity classes from
customer’s perspective
 Usually 3-4 classes are sufficient
 Activity 2: Identify customer reliability needs
 What is the level of reliability that the customer needs?
 Who are the rival companies and what are rival products and what is
their reliability?
 Activity 3: Determine operational profile
 Based on the tasks performed and the environmental factors

far@ucalgary.ca 48
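As a concrete illustration of Activity 3, here is a minimal sketch of an operational profile and of drawing test operations according to it; the operation names and probabilities are hypothetical, not taken from the course material.

```python
import random

# Hypothetical operational profile: operations and their occurrence probabilities
# (the probabilities must sum to 1).
operational_profile = {
    "process payment": 0.40,
    "query account":   0.35,
    "update profile":  0.15,
    "generate report": 0.08,
    "recover session": 0.02,   # rare but critical operation
}

def select_operations(profile, n, seed=0):
    """Draw n test operations so that test exposure mirrors expected field use."""
    rng = random.Random(seed)
    ops = list(profile)
    weights = [profile[o] for o in ops]
    return [rng.choices(ops, weights=weights)[0] for _ in range(n)]

print(select_operations(operational_profile, 10))
```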
Requirements Phase
 Activity 4: Conduct trade-off studies
 Reliability and functionality
 Reliability, cost, delivery date, technology, team
 Activity 5: Set reliability objectives
based on
 Explicit requirement statements from a request for
proposal or standard document
 Customer satisfaction with a previous release or similar
product
 Capabilities of competition
 Trade-offs with performance, delivery date and cost
 Warranty, technology capabilities

far@ucalgary.ca 49
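The reliability objective set in Activity 5 is often restated as a failure intensity objective (FIO). A minimal sketch of that conversion, assuming a constant failure rate so that R(t) = exp(-λt); the target values are hypothetical.

```python
import math

def failure_intensity_objective(reliability, mission_hours):
    """FIO (failures per hour) implied by a reliability target over a mission time,
    assuming a constant failure rate: R(t) = exp(-lambda * t)."""
    return -math.log(reliability) / mission_hours

# Hypothetical objective: 0.99 reliability over a 10-hour operating period.
fio = failure_intensity_objective(0.99, 10.0)
print(f"FIO = {fio:.5f} failures/hour  (~1 failure per {1/fio:.0f} hours)")
```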
Design Phase
 Activity 6: Allocate reliability among acquired software,
components, hardware and other systems
 Determine which systems and components are involved and how they
affect the overall system reliability
 Activity 7: Engineer to meet reliability objectives
 Plan using fault tolerance, fault removal and fault avoidance
 Activity 8: Focus resources based on operational profile
 Operational profile guides the designer to focus on features that are
supposed to be more critical
 Develop more critical functions first in more detail

far@ucalgary.ca 50
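A minimal sketch of Activity 6 for the simplest case: components in series with the system objective split equally. Real allocations would weight components by criticality, usage and cost; the numbers here are hypothetical.

```python
def equal_allocation(system_reliability, n_components):
    """Per-component reliability target when n components are in series
    and each receives an equal share: R_i = R_sys ** (1/n)."""
    return system_reliability ** (1.0 / n_components)

r_sys = 0.95   # hypothetical system reliability objective for a mission
n = 4          # e.g. application software, acquired software, hardware, network
r_comp = equal_allocation(r_sys, n)
print(f"Each of the {n} series components must reach R >= {r_comp:.4f}")
# Check: the product of the allocated reliabilities recovers the system objective.
print(f"Product = {r_comp ** n:.4f}")
```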
Implementation Phase
 Activity 9: Measure reliability of acquired
software, hardware and other systems
 Certification test using reliability demonstration chart
 Activity 10: Manage fault introduction and propagation
 Practicing a development methodology; constructing
modular system; employing reuse; conducting inspection
and review; controlling change

far@ucalgary.ca 51
System Test Phase
 Activity 11: Determine operational profile used
for testing
 Decide upon critical operations
 Decide upon need of multiplicity of operational profile
 Activity 12: Conduct reliability growth testing
 Activity 13: Track testing progress and certify
that reliability objectives are met
 Conduct feature test, regression test, and performance and
load test
 Conduct reliability growth test

far@ucalgary.ca 52
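Reliability growth testing (Activity 12) is normally tracked with a software reliability growth model. A minimal sketch using the Musa basic execution time model; the parameter values are hypothetical and would in practice be estimated from the failure data collected during test.

```python
import math

# Musa basic execution time model (hypothetical parameters):
lam0 = 10.0    # initial failure intensity (failures / CPU-hour)
nu0  = 150.0   # total expected failures over infinite execution time

def failure_intensity(tau):
    """Failure intensity after tau CPU-hours of execution."""
    return lam0 * math.exp(-lam0 * tau / nu0)

def extra_time_to_reach(lam_present, lam_objective):
    """Additional execution time needed to drive the failure intensity
    from lam_present down to lam_objective."""
    return (nu0 / lam0) * math.log(lam_present / lam_objective)

lam_now = failure_intensity(30.0)   # estimated intensity after 30 CPU-hours of test
print(f"Current failure intensity: {lam_now:.2f} failures/CPU-hour")
print(f"Extra test time to reach 0.5 failures/CPU-hour: "
      f"{extra_time_to_reach(lam_now, 0.5):.1f} CPU-hours")
```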
Field Trial Phase
 Activity 14: Project additional testing needed
 Check accuracy of test: time and coverage
 Plan for changes in test strategies and methods
 Activity 15: Certify that reliability objectives and
release criteria are met
 Check accuracy of data collection
 Check whether test operational profile reflects field
operational profile
 Check customer’s definition of failure matches with what
was defined for testing the product

far@ucalgary.ca 53
Post Delivery Phase /1
 Activity 16: Project post-release staff needs
 Customer’s staff for system recovery; supplier’s staff to
handle customer-reported failures and to remove faults
 Activity 17: Monitor field reliability vs. objectives
 Collect post release failure data systematically
 Activity 18: Track customer satisfaction with
reliability
 Survey product features with a sample customer set

far@ucalgary.ca 54
Post Delivery Phase /2
 Activity 19: Time new feature introduction by
monitoring reliability
 New features bring new defects. Add new features desired
by the customers if they can be managed without
sacrificing reliability of the whole system
 Activity 20: Guide product and process
improvement with reliability measures
 Root-cause analysis for the faults
 Why the fault was not detected earlier in the development
phase and what should be done to reduce the probability of
introducing similar faults

far@ucalgary.ca 55
Chapter 10 Section 2.3
Practice Variations

far@ucalgary.ca 56
Existing vs. New Projects
 There is no essential difference between new and existing
projects in applying SRE for the first time. However,
determining failure intensity objective and operational profile
for existing projects is easier.
 Most of the SRE activities will require only small updates
after they have been completed once, e.g., operational profile
should only be updated for the new operations added.
(remember interaction factor)
 After SRE has been applied to one release, less effort is
needed for succeeding releases, e.g., new test cases should be
added to the existing ones.

far@ucalgary.ca 57
Short-Cycle Projects
 Small projects or releases or those with short
development cycles may require a modified set of
SRE activities to keep costs low or activity durations
short.
 Reduction in cost and time can be obtained by
limiting the number of elements in the operational
profile and by accepting less precision.
 Examples: Setting one operational mode and
performing certification test rather than reliability
growth test.

far@ucalgary.ca 58
Cost Concerns
 There may be a training cost when starting to apply SRE.
 The principal cost in applying SRE is determining
the operational profile.
 Another cost is associated with processing and
analyzing failure data during reliability growth test.
 As most projects have multiple releases, the SRE
cost drops sharply after initial release.

far@ucalgary.ca 59
Practice Variation
 Defining an operational profile based on “customer
modeling”.
 Automatic test cases generation based on frequency
of use reflected in operational profile.
 Employing “cleanroom” development techniques
together with feature and certification testing.
 Automatic tracking of reliability growth.
th
 SRE for Agile software development.

far@ucalgary.ca 60
Conclusions …
 Practical implementation of an effective SRE
program is a non-trivial task.
 Mechanisms for collection and analysis of data on
software product and process quality must be in
place.
 Fault identification and elimination techniques must
be in place.
 Other organizational abilities such as the use of
reviews and inspections, reliability based testing, and
software process improvement are also necessary for
effective SRE.
 Quality oriented mindset and training are necessary!

far@ucalgary.ca 61
Chapter 10 Section 3
Software Quality:
Post Release & Maintenance
Phase

far@ucalgary.ca 62
Quality Assessment
 Post-release quality assessment: evaluation, validation

Ref: Design for Electrical & Comp. Engineers, J.E. Salt et al., Wiley

63
Quality Assessment:
Difficulties
(Images: Leonardo da Vinci, Mona Lisa (1479); Pablo Picasso, Dora Maar (1937))

 Same requirements can lead to different systems


 Need to account for “creativity” in the “design” of the
product and the “requirements” as well as the “product”
itself
 Quality assessment method: RAM

64
How Do We Assess Quality?
Usual (ad-hoc) approach

Systematic
approach: RAM

far@ucalgary.ca 65
Inside RAM
 What is RAM?

RAM: RELIABILITY – AVAILABILITY – MAINTAINABILITY

RAM: a collection of numerical analysis techniques
that quantifies the reliability, availability and
maintainability of a complex system
RAM analysis helps us answer questions
related to dependability (i.e. reliability, safety,
availability and maintainability) of the system

66
RAM: Advantages & Uses
Can be used to understand
 Operation of the system - System reliability versus
through-put rate requirements
 Safety of the system - Identifiable failure modes
which present an unacceptable consequence to
facility workers or the public
 Improvements that can have substantial impacts on
system performance - Recommendations for
improving the safety and reliability of
equipment/processes.

far@ucalgary.ca 67
RAM: Data Requirements
 Failure data
 Maintenance data
 Reliability and availability data from
recognized industry standards (MTTF, MTBF
& MTTR)
Data collection requires:
• Engineering experience and judgment
• Interviews with engineering and maintenance
personnel at the system site

far@ucalgary.ca 68
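As a small illustration of how MTBF and MTTR data are used, a minimal sketch of the standard steady-state availability calculation; the numbers are hypothetical, not plant data.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the item is operational,
    A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical values for one control unit: fails every 4000 h on average,
# takes 8 h on average to repair.
a = availability(4000.0, 8.0)
print(f"Availability = {a:.4%}")                              # ~99.80%
print(f"Expected downtime per year = {(1 - a) * 8760:.1f} hours")
```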
Case Study
RAM Analysis
Distributed Control System of the
Bonnybrook Waste Water Treatment
Plant (City of Calgary)

far@ucalgary.ca 69
Background
 The City of Calgary invested $100 million in the 1994
expansion of the Bonnybrook Wastewater Treatment Plant
(WTP) to serve Calgary's growing population, which was
767,000 in 1996.
 This expansion increased the plant capacity by 25% to
500,000 cubic metres per day, while incorporating state-of-the-
art treatment technologies.
 This study was performed in order to provide the City with an
assessment of quality of the Distributed Control Systems
(DCS) of the Bonnybrook WTP to be used as a guide for the
next WTP plant at Pine Creek.

far@ucalgary.ca 70
Background (cont’d)
 The City’s WTP DCS is real-time, mission
critical, dependable, safe and secure.
 However, the current quality measures for the
City of Calgary’s WTP DCS are unknown.
 To successfully improve the safety and
reliability for the next generation of WTP,
which is to be built in Pine Creek, a study of the current
level of reliability and safety of the existing
Bonnybrook WTP plant was prudent.

far@ucalgary.ca 71
Assumptions & Hints
a) Deal with both hardware (mechanical and
electrical) and software failures
b) Only deal with “failures”, not mandatory
preventative maintenance or minor repairs
where no components are replaced
c) Components whose function is to wear and/or
fail after a certain period of time (e.g.,
batteries, etc.), and regularly replaced items
are not included in the analysis
d) Probes, gauges, or transmitters whose
purpose is to provide information to the user
are not included
far@ucalgary.ca 72
Assumptions (Cont’d)
e) Failures due to an improper installation of
hardware/software are not included
f) Missing parts are not considered failures
(e.g., rivets, screws, bolts)
g) Anything below the subsystem level is
considered to be in series
h) All subsystems are independent (i.e., loss of
one subsystem does not result in loss of
another subsystem’s functionality)
i) Failures are not distinguished based on their
severity
far@ucalgary.ca 73
RAM for Bonnybrook WWTP
 Reason for conducting RAM analysis for Bonnybrook WWTP
 Current scenario at Bonnybrook WWTP
 Methods / Techniques used
 Result of the analysis (How Reliable is Bonnybrook?)
 Key value of RAM analysis to the City of Calgary
 How to use the results (for current and future systems)

far@ucalgary.ca 74
Why RAM?
 Reason for conducting RAM analysis for
Bonnybrook WWTP
What we know?
 Current system runs smoothly
 Minor failures can be repaired easily (e.g. card frame change)
 Connection layout of components

What we DO NOT know?
 Actual reliability of the system
 Cost of each maintenance
 Impact of “minor” failures on the overall system
 Frequency of failure / maintenance
- Accurate failure data
- Accurate maintenance data
 Change in cost / reliability with the change in configuration
 Is the current system configuration good for next projects?
 Is the system serial or parallel?
 Are the components inside each DCU serial or parallel?
 Can the change in layout change the performance?

far@ucalgary.ca 75
Why RAM?
 Reason for conducting RAM analysis for
Bonnybrook WWTP
 Better understand the system (system configuration)
 Better understand the impact of failure / faults of
components on the system
 Establish groundwork for Reliability-Availability-
Maintainability measurement
 Study the method of data collection, fault / maintenance
record keeping
 Design and develop tool to perform what-if scenario

far@ucalgary.ca 76
RAM: Current Scenario
 Current scenario at Bonnybrook WWTP
 Reliability of components and the system as a
whole is not measured
 Established method to measure the system
reliability needs to be put in place

far@ucalgary.ca 77
RAM: Methods Used
 Techniques are selected based on availability
of data and tools
 Techniques
q used in this analysis
y are:
 Reliability Block Diagram (RBD)
 Reliability Demonstration Chart (RDC)
 Fault Tree Analysis (FTA)

far@ucalgary.ca 78
RAM: RBD
 A reliability block diagram is a graphical
representation of how the components of a system
are connected from a reliability point of view

far@ucalgary.ca 79
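A minimal sketch of the arithmetic behind an RBD, assuming independent components and hypothetical reliability values: series blocks multiply their reliabilities, while parallel (redundant) blocks multiply their unreliabilities.

```python
from functools import reduce

def series(*reliabilities):
    """All blocks must work: R = product of R_i."""
    return reduce(lambda acc, r: acc * r, reliabilities, 1.0)

def parallel(*reliabilities):
    """At least one block must work: R = 1 - product of (1 - R_i)."""
    return 1.0 - reduce(lambda acc, r: acc * (1.0 - r), reliabilities, 1.0)

# Hypothetical layout: a controller pair in parallel, in series with an I/O module
# and a network link.
r_system = series(parallel(0.95, 0.95), 0.99, 0.98)
print(f"System reliability = {r_system:.4f}")
```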
RAM: RDC
 RDC analysis is an efficient way of checking
whether the failure intensity objective (FIO) is met or
not.

far@ucalgary.ca 80
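A minimal sketch of the accept/continue/reject decision behind an RDC, using standard sequential-test boundaries. The discrimination ratio and risk levels are common illustrative choices, and the observed failure data is hypothetical.

```python
import math

def rdc_decision(n_failures, t_hours, fio, gamma=2.0, alpha=0.1, beta=0.1):
    """Decide accept / reject / continue on a reliability demonstration chart.
    n_failures: failures observed so far
    t_hours:    test time so far
    fio:        failure intensity objective (failures per hour)
    The horizontal axis of the chart is the normalized measure tau = fio * t.
    """
    tau = fio * t_hours
    a = math.log((1 - beta) / alpha)   # reject-boundary constant
    b = math.log(beta / (1 - alpha))   # accept-boundary constant
    accept_at = (n_failures * math.log(gamma) - b) / (gamma - 1)
    reject_at = (n_failures * math.log(gamma) - a) / (gamma - 1)
    if tau >= accept_at:
        return "accept (FIO met)"
    if tau <= reject_at:
        return "reject (FIO not met)"
    return "continue testing"

# Hypothetical certification test: FIO of 0.01 failures/hour, 1 failure after 300 hours.
print(rdc_decision(n_failures=1, t_hours=300.0, fio=0.01))
```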
RAM: FTA
 Fault tree analysis is
a graphical
representation of the
major (critical)
failures associated
with a product, the
causes for the faults,
and potential
countermeasures.

far@ucalgary.ca 81
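A minimal sketch of the quantitative side of FTA for a hypothetical two-level tree, assuming independent basic events: OR gates combine probabilities via the complement product, AND gates multiply them.

```python
def or_gate(*probs):
    """Event occurs if any input event occurs (independent events)."""
    p = 1.0
    for q in probs:
        p *= (1.0 - q)
    return 1.0 - p

def and_gate(*probs):
    """Event occurs only if all input events occur (independent events)."""
    p = 1.0
    for q in probs:
        p *= q
    return p

# Hypothetical tree: "loss of control function" occurs if the controller fails
# OR both redundant power feeds fail.
p_controller   = 1e-3
p_power_feed_a = 5e-3
p_power_feed_b = 5e-3
p_top = or_gate(p_controller, and_gate(p_power_feed_a, p_power_feed_b))
print(f"P(top event) = {p_top:.6f}")
```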
Analysis: System Configuration

far@ucalgary.ca 82
Analysis: Data Control System

RBD of The DCS layout

far@ucalgary.ca 83
Analysis: Inside a DCU
 Contains serial and parallel subsystems
 Configuration affects total system reliability
 Total 10 Units
far@ucalgary.ca 84
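To illustrate "configuration affects total system reliability", a minimal what-if sketch with hypothetical per-module reliabilities, comparing a purely serial DCU against one with a redundant processor across the 10 DCUs (assumed in series, per the analysis assumptions).

```python
# Hypothetical per-module mission reliabilities inside one DCU.
r_proc, r_io, r_net = 0.990, 0.995, 0.998

r_dcu_serial    = r_proc * r_io * r_net                    # all modules in series
r_dcu_redundant = (1 - (1 - r_proc) ** 2) * r_io * r_net   # duplicated processor

# 10 DCUs in series form the overall DCS.
print(f"10 serial DCUs, no redundancy:   {r_dcu_serial ** 10:.4f}")
print(f"10 serial DCUs, redundant proc.: {r_dcu_redundant ** 10:.4f}")
```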
Analysis: Results

far@ucalgary.ca 85
Analysis: Results
What we know?
 The exact layout of the DCS (inside-out)
 Actual reliability of the current system
 Cost and impact of “minor” failures on the overall system
 Change in cost / reliability with the change in configuration
 Is the system serial or parallel?
 Are the components inside each DCU serial or parallel?
 Can the change in layout change performance?
 Is the current system/configuration fit to be used in the next projects?

What we would like to see?
 More relevant failure data
 More maintenance data
 Failure mode and their effects (FMEA)

far@ucalgary.ca 86
RAM: Key Values
From engineering point of view
 Current:
 Can understand what the system looks like inside-out
 Can use current system as benchmark for future system’s performance
 Can change components and see their effects on reliability
 In future:
 Can be used to pinpoint single points of failures
 Can be used to effectively plan redundancy and refrain
from “over engineering” and over spending (spending can
be made at the right place to complement reliability and
availability)
far@ucalgary.ca 87
RAM: Key Values
From management point of view
 Can help perform what-if scenario evaluation
 Can help planning and design of future projects and plants
 Can help perform cost-value analysis on maintenance vs.
replacement
 Can help make better decisions on system/ subsystem/
component purchase based on reliability data and impact on
performance
 Can help compare systems/subsystems/components from
several vendors.
 Can be used to plan procedures that need to be in place for
data collection, maintenance as well as analysis purposes.

far@ucalgary.ca 88
What Was Accomplished
 Defined benchmark value for reliability metrics for
WTPs DCS components (vendors’ data)
 Defined the architecture of the WTPs DCS
 Identified source of failure data
 Stated ground rules and assumptions
 Identified the confidence level of estimation and
predictions
 Collected failure and maintenance data for the
current WTPs DCS

far@ucalgary.ca 89
What Was Accomplished
 Analyzed data to identify proper distribution that
models failure data
 Performed goodness-of-fit and bias tests (using
reliability demonstration charts, fault tree analysis,
etc.) to validate distribution fit
 Estimated current system reliability
 Based on these:
 A reliability calculation chart to perform what-if analysis
for various units of the system was developed
 A list of recommendations for reliability improvements of
the WTPs DCS was produced
far@ucalgary.ca 90
Conclusions
 System integration & manufacturing are not the final
steps of a development process (usually)
 Quality assessment of hardware/software system can
be performed systematically using RAM
 Mechanisms for failure data collection and
interpretation are necessary
 Engineering judgment (in selecting tools, techniques,
interpreting data, etc.) is essential to the analysis

far@ucalgary.ca 91