
Railway Certification and RAM Calculations

CSW Workshop on Dependability and Certification, Coimbra, Portugal, September 28th-29th, 2011

2011 Critical Software Dependable Technologies For Critical Systems

Contents

PART 1 Railway Certification
PART 2 Safety
PART 3 RAM

PART 1 Railway Certification

Railway Certification Topics


IEC 61508 - Functional Safety
Safety Integrity Levels
CENELEC Standards EN50126/8/9
EN50126 Lifecycle
Safety Cases Organisation
Organisation Independency

IEC 61508
Functional Safety

IEC 61508
Safety Integrity Level Tolerable Hazard Rate

SIL   Tolerable Hazard Rate (THR)
4     10^-9 <= THR < 10^-8
3     10^-8 <= THR < 10^-7
2     10^-7 <= THR < 10^-6
1     10^-6 <= THR < 10^-5
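
The SIL bands above amount to a simple range lookup. A minimal sketch, assuming THR values are given as plain floating-point rates in whatever unit the bands use; the names `SIL_BANDS` and `sil_for_thr` are illustrative and not part of any standard API:

```python
from typing import Optional

# (SIL, lower bound inclusive, upper bound exclusive), copied from the table above.
SIL_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_for_thr(thr: float) -> Optional[int]:
    """Return the SIL whose band contains thr, or None if thr is outside the table."""
    for sil, low, high in SIL_BANDS:
        if low <= thr < high:
            return sil
    return None
```

Values outside the four bands return `None` rather than a guessed level, since the table does not say how they should be treated.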


CENELEC Standards
EN50126/8/9

EN 50126
Railway applications - Specification and demonstration of reliability, availability, maintainability and safety (RAMS)

EN 50128
Railway applications - Communication, signalling and processing systems - Software for railway control and protection systems

EN 50129
Railway applications - Communication, signalling and processing systems - Safety related electronic systems for signalling

EN50126 Lifecycle
Phase 1: Concept
Phase 2: System Definition
Phase 3: Risk Analysis
Phase 4: System Requirements
Phase 5: Apportionment of System Requirements
Phase 6: Design and Implementation
Phase 7: Manufacture
Phase 8: Installation
Phase 9: System Validation
Phase 10: System Acceptance
Phase 11: Operation and Maintenance
Phase 12: Performance Monitoring
Phase 13: Modification and Retrofit
Phase 14: De-commissioning and Disposal

New Lifecycle

EN50126 Lifecycle
GPSC and GASC

Generic Product Safety Case
Generic Application Safety Case

Phase 1: Concept
Phase 2: System Definition
Phase 3: Risk Analysis
Phase 4: System Requirements
Phase 5: Apportionment of System Requirements
Phase 6: Design and Implementation
Phase 7: Manufacture

EN50126 Lifecycle
SASC

Specific Application Safety Case

Phase 8: Installation
Phase 9: System Validation
Phase 10: System Acceptance
Phase 11: Operation and Maintenance
Phase 12: Performance Monitoring
Phase 13: Modification and Retrofit
Phase 14: De-commissioning and Disposal

New Lifecycle

Safety Cases Organisation


GPSC, GASC and SASC

Generic Product
System Requirements Specification
Safety Requirements Specification
Generic Product Safety Case
Safety Assessment Report

Generic Application
System Requirements Specification
Safety Requirements Specification
Generic Application Safety Case
Safety Assessment Report

Specific Application
System Requirements Specification
Safety Requirements Specification
Specific Application Safety Case (Application Design, Physical Implementation)
Safety Assessment Report

Safety Cases
GENERIC PRODUCT
The HW boards that make up a module, together with the base SW that runs on them, represent a generic product.

The base SW is the part of the SW that does not change from customer to customer; it therefore normally includes the OS, the drivers, and the base SW functionalities.

Generic Product Safety Case

Safety Cases
GENERIC APPLICATION
The generic application typically changes from customer to customer. It is defined by a combination of HW modules (minimum and maximum number of each module, types of module interconnection, etc.) and by application SW that specializes the behaviour of each module for a customer. The generic application is normally implemented by application data.

Generic Application Safety Case

Safety Cases
SPECIFIC APPLICATION
The specific application specializes a generic application for a specific usage (typically a single train among all the trains of a customer's fleet). This means that the generic application is configurable and that the specific application represents a specific configuration of it. An item at the specific application configuration level can be the voltage level of an input, a specific behaviour of the logic for a particular train, etc.

Specific Application Safety Case
Application Design
Physical Implementation
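
The relation between a configurable generic application and one specific configuration of it can be sketched as a validation step. This is only an illustration under stated assumptions: the module names, the limits and the function `check_specific_application` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleLimits:
    """Allowed number of instances of one HW module type in the generic application."""
    minimum: int
    maximum: int


# Hypothetical generic application: module types and their allowed counts.
GENERIC_APPLICATION = {
    "io_module": ModuleLimits(1, 4),
    "cpu_module": ModuleLimits(1, 2),
}


def check_specific_application(module_counts: dict) -> list:
    """Return the violations of the generic application limits (empty list if valid)."""
    problems = []
    for name, count in module_counts.items():
        limits = GENERIC_APPLICATION.get(name)
        if limits is None:
            problems.append(f"{name}: not part of the generic application")
        elif not limits.minimum <= count <= limits.maximum:
            problems.append(f"{name}: {count} outside {limits.minimum}..{limits.maximum}")
    # Module types required by the generic application but absent from the configuration.
    for name, limits in GENERIC_APPLICATION.items():
        if name not in module_counts and limits.minimum > 0:
            problems.append(f"{name}: missing, at least {limits.minimum} required")
    return problems
```

A specific application for one train would then be a dictionary such as `{"io_module": 2, "cpu_module": 1}`, accepted only if the returned list is empty.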

Organisation Independency
[Figure: two organisation charts, labelled SIL 3 & SIL 4, relating the Project Manager, the Assessor (Assr), and the Development (Dev.), Verification (Ver.) and Validation (Val.) Teams]


PART 2 Safety

Safety Topics
Preliminary Hazard Analysis / Risk Analysis
Hazard Analysis
Hazard Log
Safety Case
Relation with other Safety Cases

Preliminary Hazard Analysis


Hazard Identification
Hazard Causes Identification
Hazard Consequences
Hazard Initial Risk Evaluation
Hazard Mitigation Recommendations
Hazard Final Risk Evaluation

Preliminary Hazard Analysis

Application Domain
System Context
Past Experience
System Hazards

Preliminary Hazard Analysis


Example of hazard consequences in the railway domain:
Collision
Derailment
Casualties
Injuries

Risk Analysis
Risk Analysis Process
System Analysis -> Hazard Identification -> Hazard Consequence -> Initial Risk Evaluation -> Mitigation Actions -> Final Risk Evaluation

Risk Quantification Process
Check Hazard Frequency -> Verify Hazard Severity -> Make Qualitative Risk Evaluation -> Risk Value

Risk Analysis
Risk Evaluation Matrix

Risk Level
Frequency \ Severity   Insignificant   Marginal      Critical      Catastrophic
Frequent               Undesirable     Intolerable   Intolerable   Intolerable
Probable               Tolerable       Undesirable   Intolerable   Intolerable
Occasional             Tolerable       Undesirable   Undesirable   Intolerable
Remote                 Negligible      Tolerable     Undesirable   Undesirable
Improbable             Negligible      Negligible    Tolerable     Tolerable
Incredible             Negligible      Negligible    Negligible    Negligible
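
The matrix can be read as a lookup from a (frequency, severity) pair to a risk level. A minimal sketch, with the cell values taken from the matrix above; the names `RISK_MATRIX` and `risk_level` are illustrative:

```python
SEVERITIES = ["Insignificant", "Marginal", "Critical", "Catastrophic"]

# One row per frequency category; columns follow the order of SEVERITIES.
RISK_MATRIX = {
    "Frequent":   ["Undesirable", "Intolerable", "Intolerable", "Intolerable"],
    "Probable":   ["Tolerable", "Undesirable", "Intolerable", "Intolerable"],
    "Occasional": ["Tolerable", "Undesirable", "Undesirable", "Intolerable"],
    "Remote":     ["Negligible", "Tolerable", "Undesirable", "Undesirable"],
    "Improbable": ["Negligible", "Negligible", "Tolerable", "Tolerable"],
    "Incredible": ["Negligible", "Negligible", "Negligible", "Negligible"],
}


def risk_level(frequency: str, severity: str) -> str:
    """Look up the qualitative risk level for a frequency/severity pair."""
    return RISK_MATRIX[frequency][SEVERITIES.index(severity)]
```

For example, a hazard rated Occasional/Critical comes out as Undesirable; if mitigations bring its frequency down to Improbable, the same lookup gives Tolerable.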

Hazard Analysis
Hazard Analysis Process

Input
System and Interface Requirements;
Architecture Specification;
Preliminary Hazards Analysis;
Top level system activities.

Preparation
Analyse all inputs;
Define risk analysis methodology;
Define safety criteria;
Define hazard analysis properties.

Execution
Identify all foreseen hazards;
Identify causes and consequences of each hazard;
Evaluate initial risk (frequency and severity) of each hazard;
Define mitigations (both preventive and protective) for each hazard;
Define external customer recommendations;
Evaluate final risk when mitigations are applied.

Output
Hazard Log;
System Safety Requirements;
Safety Exported Constraints;
Functional and Physical SIL Allocation.

Hazard Log
Property                          Description
ID-xxx                            A running number starting from 1
System Activity                   Activity to support the analysis of this hazard.
Architecture Item                 Sub-system where the hazard was identified
Function Name                     Function where the hazard was identified
Component Name                    Component where the hazard was identified
System State                      System state where the hazard was identified
Hazard Description                Hazard Description
Hazard Cause                      Cause of Hazard
Hazard Effect                     Effects of the hazard in the system
Direct Consequence                Description of the direct consequence of this hazard in the environment.
Frequency                         Frequency of the hazard occurrence
Severity                          Severity of the hazard
Risk Evaluation Level             Risk evaluation Level
Preventive Countermeasure         Preventive Countermeasure
Protective Countermeasure         Protective Countermeasure
Mitigated Consequence             Description of the consequence of this hazard in the environment after applying the mitigation action.
Customer Recommendations          Recommendations for the customer. Need to be transmitted to customer
Final Frequency                   Final Frequency of the hazard occurrence
Final Severity                    Final Severity of the hazard
Final Risk Evaluation Level       Final Risk Evaluation Level
Application Conditions            Application conditions code for the correct usage of the system in terms of safety.
Safety Requirement related Code   FDT3_RS_SR_xxx : Code of Safety Requirement related to Hazard.
Hazard Status                     Status of the hazard.
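
One way to picture a Hazard Log entry is as a record carrying the properties listed above. The sketch below is an illustration under assumptions: it keeps only a subset of the properties, and the field names, the "Open" default status and the sample hazard are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HazardLogEntry:
    """Subset of the Hazard Log properties; field names are illustrative."""
    hazard_id: str
    description: str
    cause: str
    effect: str
    frequency: str
    severity: str
    risk_level: str
    preventive_countermeasure: str = ""
    protective_countermeasure: str = ""
    final_frequency: str = ""
    final_severity: str = ""
    final_risk_level: str = ""
    status: str = "Open"  # hypothetical status value

    def has_final_evaluation(self) -> bool:
        """True once a final risk evaluation level has been recorded."""
        return bool(self.final_risk_level)


# Hypothetical sample entry, invented for illustration only.
entry = HazardLogEntry(
    hazard_id="ID-001",
    description="Door release command issued while train is moving",
    cause="Corrupted speed input",
    effect="Doors unlocked in motion",
    frequency="Occasional",
    severity="Critical",
    risk_level="Undesirable",
)
```

Recording the initial and final evaluations in the same entry keeps the effect of the countermeasures visible next to the hazard they address.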

Safety Case
Generic Product Safety Case
Part 1: Definition of the System
Part 2: Quality Management Report
Part 3: Safety Management Report
Part 4: Technical Safety Report
Part 5: Related Safety Cases
Part 6: Conclusions

Part 4: Technical Safety Report
Section 1: Introduction
Section 2: Assurance of Correct Operation
Section 3: Effects of Faults
Section 4: Operation with External Influences
Section 5: Safety-related Application Conditions
Section 6: Safety Qualification Tests

Relation with other Safety Cases


Component 1 GPSC
Component 2 GPSC
Component 3 GPSC
Product 1 GPSC


PART 3 RAM

RAM Topics
Dependability Concepts
RAM Process
RAM Activities
Qualitative Analysis: FMEA, FTA
Quantitative Analysis
Software Reliability

Dependability Concepts
EN50126
Reliability
Probability that an item can perform a required function under given conditions for a given time interval (t1, t2).

Availability
Ability of a product to be in a state to perform a required function under given conditions at a given instant of time or over a given time interval, assuming that the required external resources are provided.

Maintainability
Probability that a given active maintenance action, for an item under given conditions of use, can be carried out within a stated time interval when the maintenance is performed under stated conditions and using stated procedures and resources.

Dependability Concepts
MTTF: Mean Time To Failure
MTBF: Mean Time Between Failures
MDT: Mean Down Time
MTW: Mean Time Waiting
MTTR: Mean Time To Repair

MDT = MTW + MTTR
MTBF = MTTF + MDT
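
The two relations above can be checked with a few lines of arithmetic. The sketch below also derives a steady-state availability as MTTF / MTBF, which is a common textbook definition and an assumption here, since the slide does not state it; all figures are made-up hours.

```python
def mean_down_time(mtw: float, mttr: float) -> float:
    """MDT = MTW + MTTR."""
    return mtw + mttr


def mean_time_between_failures(mttf: float, mdt: float) -> float:
    """MTBF = MTTF + MDT."""
    return mttf + mdt


def steady_state_availability(mttf: float, mdt: float) -> float:
    """Assumption (not on the slide): A = MTTF / (MTTF + MDT) = MTTF / MTBF."""
    return mttf / mean_time_between_failures(mttf, mdt)


# Made-up figures, in hours.
mdt = mean_down_time(mtw=2.0, mttr=6.0)
mtbf = mean_time_between_failures(mttf=992.0, mdt=mdt)
availability = steady_state_availability(mttf=992.0, mdt=mdt)
```

Splitting MDT into waiting and repair time shows that availability can be improved either by faster repairs or by shorter logistic delays.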


RAM Process

RAM Activities
Preliminary RAM Analysis
Reliability Apportionment & Prediction
Detailed RAM Analysis
Qualitative Analysis
Quantitative Analysis
Techniques: FMEA/FMECA, FTA, RBD, FA, CCA, HSIA

Back to Basics
Qualitative Analysis
Failure Mode:
Cause (Fault) -> Local Effect (Error) -> End Effect (Failure)

Fault -> Error -> Failure
Fault -> Detection -> Negation -> Restore

All errors must be properly handled (detected and mitigated).

FMEA/FMECA, FTA, CCA and HSIA are different techniques that assist in identifying all system failures, their effects and their combinations/propagations.

FMECA Table
Failure Modes, Effects, and Criticality Analysis
Field Name
FMEA ID
Trace from Function
Generic Failure Mode
Failure Mode
Failure Cause
Local Effects
Propagates to
End Effects
Impact Type
Severity
Probability of Occurrence
Method of Detection
Compensating Provisions
Mitigated Severity
Mitigated Probability
Notes

FTA
Fault Tree Analysis

Quantitative Analysis
Reliability
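
A minimal sketch of a quantitative reliability calculation, assuming constant failure rates (exponential model) and independent blocks arranged as in a reliability block diagram (RBD). Neither assumption comes from the slides, and the function names are illustrative.

```python
import math


def reliability(failure_rate: float, hours: float) -> float:
    """R(t) = exp(-lambda * t) for a constant failure rate lambda."""
    return math.exp(-failure_rate * hours)


def series(*block_reliabilities: float) -> float:
    """All blocks must work: product of the block reliabilities."""
    result = 1.0
    for r in block_reliabilities:
        result *= r
    return result


def parallel(*block_reliabilities: float) -> float:
    """At least one block must work: one minus the product of unreliabilities."""
    unreliability = 1.0
    for r in block_reliabilities:
        unreliability *= 1.0 - r
    return 1.0 - unreliability
```

Under these assumptions, two blocks with reliability 0.9 give 0.81 in series and 0.99 in parallel, which is why redundancy is the usual answer to a reliability target that a single item cannot meet.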


Quantitative Analysis
Availability

Stand-by:

...


Quantitative Analysis
In Practice


Software Reliability
RAM calculations typically consider only HW failure rates.
SW failures are systematic: software does not wear out or break.
Software failures result from errors in the software.

This does not necessarily imply that a software function containing implementation errors will fail every time it is called!
The error may not reveal itself every time the function is called.

There are no absolute answers for the classification of software reliability.
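
A small illustration of the point above: a systematic software fault is always present in the code, yet it only shows up for the inputs that trigger it. The function and the call data below are hypothetical.

```python
def average_speed(distance_m: float, elapsed_s: float) -> float:
    """Hypothetical function with a systematic fault: no guard for elapsed_s == 0."""
    return distance_m / elapsed_s


def count_failing_calls(calls) -> int:
    """Count how many calls actually expose the fault."""
    failures = 0
    for distance_m, elapsed_s in calls:
        try:
            average_speed(distance_m, elapsed_s)
        except ZeroDivisionError:
            failures += 1
    return failures


# Four calls; only the one with zero elapsed time triggers the fault.
calls = [(100.0, 10.0), (50.0, 5.0), (0.0, 0.0), (80.0, 4.0)]
failing = count_failing_calls(calls)
```

The observed failure frequency depends entirely on how often the triggering input occurs in operation, not on any wear-out mechanism, which is why a HW-style failure rate is hard to assign.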

Software Reliability
Proving the absence of faults in reasonably complex software is a tremendous task, if not an impossible one. To avoid software errors, EN 50128 provides a set of development guidelines and V&V procedures that, for the highest integrity levels, are very demanding.

Compliance with these procedures gives a very high level of confidence in the correctness of the SW implementation.

SW Failure Probability
Classification alternatives:
0/1
No value. Only qualitative analysis.
Evaluate the probability of failure excluding software causes from the calculation, and present this value together with a detailed analysis of the impact of SW failures.
Mapped to SIL level
Value evaluation supported by:
Sound engineering and statistical judgment, analyses, and evidence
Service records

Stress testing
IEC 60605-4

Example

        r = 2, T = 50000, m = 25000    r = 20, T = 500000, m = 25000
90%     ~9400                          ~19300
95%     ~7950                          ~17900
99%     ~6920                          ~15700
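
A hedged sketch of how a lower confidence limit on the mean time between failures can be computed. It reads r as the number of failures, T as the accumulated test time and m = T / r as the observed value (an interpretation consistent with the numbers above), and assumes exponentially distributed times to failure with the commonly used chi-square limit 2T / chi-square(confidence, degrees of freedom). The slide does not state which formula or degrees-of-freedom convention it uses, so both the 2r + 2 (time-terminated) and 2r (failure-terminated) variants are offered; all function names are illustrative.

```python
import math


def chi2_cdf_even(x: float, dof: int) -> float:
    """Chi-square CDF for an even number of degrees of freedom (closed form)."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p: float, dof: int) -> float:
    """Invert the CDF by bisection (0 < p < 1)."""
    low, high = 0.0, 1.0
    while chi2_cdf_even(high, dof) < p:
        high *= 2.0
    for _ in range(200):
        mid = (low + high) / 2.0
        if chi2_cdf_even(mid, dof) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def mtbf_lower_limit(failures: int, total_time: float, confidence: float,
                     time_terminated: bool = True) -> float:
    """One-sided lower limit: 2T / chi2(confidence, 2r + 2) or 2T / chi2(confidence, 2r)."""
    dof = 2 * failures + 2 if time_terminated else 2 * failures
    return 2.0 * total_time / chi2_quantile_even(confidence, dof)
```

With r = 2, T = 50000 and 90% confidence, the time-terminated variant gives a lower limit of roughly 9400. The general pattern is that the same observed m supports a much higher demonstrated value when it rests on more accumulated test time and more failures.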

Contacts
Jorge Almeida
j-almeida@criticalsoftware.com

José Faria
jmfaria@criticalsoftware.com

Coimbra, Lisboa, Porto www.criticalsoftware.com
Southampton www.critical-software.co.uk
San Jose www.criticalsoftware.com
São José dos Campos www.criticalsoftware.com.br
Maputo http://www.criticalsoftware.co.mz