
Human Error

CHAPTER 8: ASSESSING AND REDUCING THE HUMAN ERROR RISK

HKR-6350

Syed Nasir Danial

Outline
2

This chapter focuses on remedial solutions to errors rather than on the question of why errors occur. More specifically, it deals with the various techniques employed by human reliability specialists to assess and mitigate the risks associated with human error, for example probability risk assessment and human reliability analysis.

Outline
3

Topics to be discussed here fall into four major categories:
Probability Risk Assessment (PRA) techniques
Human Reliability Analysis (HRA) techniques
Risk management
Potential measures of error reduction

Basis of this study
4

The development of human reliability analysis is bound up with the fortunes and misfortunes of the nuclear power industry. The first reason is the Chernobyl disaster, which happened on 26 April 1986 at the Chernobyl Nuclear Power Plant in Ukraine. An explosion and fire released large quantities of radioactive contamination into the atmosphere, which spread over much of the then western USSR and Europe.
31 people killed in the immediate aftermath
64 deaths attributed to radioactivity by 2008
WHO estimates a death toll of 4,000 due to radioactivity
Some 50,000 excess cancer cases estimated

Aftereffects: In Europe it is no longer possible to install a nuclear power plant without a public inquiry.

To demonstrate their safety to the public, standards have been established and studies such as Probability Risk Assessment and Human Reliability Analysis have been proposed.

Probability Risk Assessment
6

It has two main aims:
To identify potential areas of significant risk and thereby indicate how improvements can be made.
To quantify the overall risk from a potentially hazardous plant.
The general structure of PRA was established in 1975 with the US Reactor Safety Study, a roughly 10-kg document known as WASH-1400.

Probability Risk Assessment
7

The PRA procedural steps:
Identify the source of potential hazard (using fault trees). In the case of nuclear plants the major hazard is the release of radioactivity from a degraded core.
Identify the initiating events that could lead to this hazard.
Establish the possible sequences that could follow from the various initiating events, using event trees.
Quantify each event sequence (see the sketch after this list):
  assess the frequency of each initiating event;
  assess the probability of failure on demand of the relevant safety systems.
Determine the overall plant risk.
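As a rough illustration of the quantification step, the frequency of a single accident sequence can be approximated as the initiating-event frequency multiplied by the failure probabilities along one branch of the event tree. This is only a minimal sketch; the event names and all numerical values below are invented for illustration.

```python
# Illustrative only: frequency of one event-tree sequence =
# frequency(initiating event) * product of branch probabilities.
# All names and numbers are invented for this example.

def sequence_frequency(initiating_event_freq, branch_probs):
    """Frequency (per year) of one accident sequence.

    initiating_event_freq: initiating events per year
    branch_probs: probability of each branch taken in the sequence
                  (failure-on-demand probability where a safety system
                  fails, or 1 - p where it succeeds).
    """
    freq = initiating_event_freq
    for p in branch_probs:
        freq *= p
    return freq

# Example: loss-of-coolant initiator (1e-3 per year), emergency cooling
# fails on demand (1e-2), containment spray fails on demand (1e-3).
print(sequence_frequency(1e-3, [1e-2, 1e-3]))  # ~1e-8 per year
```

The overall plant risk is then obtained by summing such sequence frequencies, weighted by the severity of their consequences.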

Probability Risk Assessment
8

Event / Fault Trees

An event tree is an inductive analytical diagram in which an event is analyzed using Boolean logic to examine a chronological series of subsequent events or consequences.

Fault tree analysis evaluates risk by tracing backwards in time, or backwards through a causal chain, and takes a given hazard as its premise. Event tree analysis is a logical evaluative process that works by tracing forward in time through a causal chain to model risk; it does not require the premise of a known hazard. A small numerical sketch of a fault tree follows below.
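To make the fault-tree idea concrete, here is a minimal sketch that evaluates a top event through AND/OR gates, assuming independent basic events. The system, gate structure, and probabilities are invented for illustration and do not come from the chapter.

```python
# Minimal fault-tree sketch: the top event "loss of cooling" occurs if
# BOTH pumps fail, OR offsite power is lost AND the diesel generator
# fails to start. Basic events are assumed independent; all
# probabilities are invented.

p = {
    "pump_A_fails": 1e-2,
    "pump_B_fails": 1e-2,
    "offsite_power_lost": 1e-1,
    "diesel_fails_to_start": 5e-2,
}

def AND(*probs):
    """All inputs must occur (independent events)."""
    out = 1.0
    for x in probs:
        out *= x
    return out

def OR(*probs):
    """At least one input occurs: 1 - product of non-occurrence."""
    out = 1.0
    for x in probs:
        out *= (1.0 - x)
    return 1.0 - out

both_pumps_fail = AND(p["pump_A_fails"], p["pump_B_fails"])
no_emergency_power = AND(p["offsite_power_lost"], p["diesel_fails_to_start"])
top_event = OR(both_pumps_fail, no_emergency_power)
print(f"P(loss of cooling) ~ {top_event:.2e}")
```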


Criticism of PRA
10

Leveson et al. (2011) of MIT argue that the chain-of-events conception of accidents typically used for such risk assessments cannot account for the indirect, non-linear, and feedback relationships that characterize many accidents in complex systems. These risk assessments do a poor job of modeling human actions and their impact on known, let alone unknown, failure modes.

In the case of many accidents, probabilistic risk assessment models do not account for unexpected failure modes. For example, at Japan's Kashiwazaki-Kariwa reactors, after the 2007 Chuetsu earthquake, some radioactive material escaped into the sea when ground subsidence pulled underground electric cables downward and created an opening in the reactor's basement wall. As a Tokyo Electric Power Company official remarked at the time, "It was beyond our imagination that a space could be made in the hole on the outer wall for the electric cables."

Outcomes of PRA studies
11

The underlying logic provides an important adjunct to the design process, identifying those areas where redundant or diverse safety systems need to be installed.
Its major failing is its inability to accommodate the substantial contribution of human failures (mistakes). This gap is addressed further under HRA.

Human Reliability Analysis
12

A large number of techniques fall under the umbrella of human reliability analysis:
Schurman and Banks (1984) reviewed nine models.
Hannaman, Spurgin and Lukic (1984) identified ten, including two of their own.
Senders, Moray and Smiley (1985) examined eight such models.
Williams (1985) compared nine techniques.

Human Reliability Analysis
13

Six techniques are discussed here:
Technique for Human Error Rate Prediction (THERP)
Time-reliability techniques
Empirical technique to estimate operators' errors (TESEO)
Confusion matrix
Success Likelihood Index Methodology (SLIM)
Systematic Human Action Reliability Procedure (SHARP)

14

Technique for Human Error Rate Prediction (THERP)

THERP assumes that the operator's actions can be regarded in the same light as the success or failure of a given pump or valve: the reliability of the operator is assessed in the same way as that of a piece of equipment.

15

THERP is performed according to the following steps:
Identify the system functions that may be influenced by human error.
List and analyze the related human operations (i.e., perform a detailed task analysis). This stage requires a comprehensive task and human error analysis: the task analysis provides a complete list of every element and item of information required by the task operators. For each step of the task, the possible errors are listed, and these may be categorized as follows:
Errors of omission: the task is omitted partly or wholly.

16

Errors of commission:
Errors of selection: errors in the use of controls or in issuing a command
Errors of sequence: a required action is carried out in the wrong order
Errors of timing: the task is executed at the wrong time
Errors of quality: the action is done inadequately or to excess

Estimate the relevant error probabilities using a combination of expert judgment and available data.
Estimate the effects of human errors on the system failure events, a step that usually involves the integration of HRA with PRA (a minimal numerical sketch follows below).
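Purely as a minimal sketch of the quantification idea, and not Swain and Guttmann's actual tables or dependence model, the failure probability of a small task can be approximated by assigning each step a nominal HEP and a recovery probability and assuming the steps are independent. All step names and numbers below are invented.

```python
# Illustrative THERP-style quantification (not the real handbook values).
# Each step has a nominal human error probability (HEP) and a probability
# that the error is recovered (e.g., caught by a checker). Steps are
# treated as independent here; real THERP models dependence explicitly.

steps = [
    # (description,                    nominal HEP, P(recovery | error))
    ("omit a required calibration step",   3e-3,        0.5),
    ("misread a quantitative display",     1e-3,        0.1),
    ("operate the wrong control",          1e-3,        0.5),
]

p_task_success = 1.0
for name, hep, p_recover in steps:
    p_unrecovered_error = hep * (1.0 - p_recover)
    p_task_success *= (1.0 - p_unrecovered_error)

p_task_failure = 1.0 - p_task_success
print(f"Approximate task failure probability: {p_task_failure:.2e}")
```

The resulting task failure probability would then feed into the relevant branch of the PRA event tree.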

17

A little more on THERP

The core of THERP is contained in 27 tables of human error probabilities set out in Part IV of the 1983 Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications by Swain & Guttmann. The values given in the tables are nominal human error probabilities (HEPs): generic values based on expert opinion and on data borrowed from activities similar to those of NPP operators. Each table deals with the particular errors associated with a specific activity, e.g., errors of commission in reading and recording quantitative information. Filling out the 27 tables is an extensive job that requires an expert such as Alan Swain himself; otherwise the technique is hard to implement.

18

Time-Reliability Techniques

These are closely related techniques which quantify post-accident errors on the basis of time-reliability curves.

Operator Action Trees (OATS) (1980)

Three types of cognitive error are identified:
Failure to perceive that an event has occurred
Failure to diagnose the nature of the event and its remedy
Failure to implement the response correctly and in a timely manner

Time-reliability curves are used which express the probability of failure as a function of the time available for action (an illustrative curve is sketched below).
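As a rough illustration only, and not the published OATS curves, a time-reliability relationship can be sketched as a non-response probability that falls as more time becomes available; the lognormal form and the parameter values below are assumptions made for the example.

```python
# Illustrative time-reliability curve: probability that the crew has NOT
# yet diagnosed and responded within t minutes, modelled here with a
# lognormal response-time distribution. Median and sigma are invented.
import math

def p_non_response(t_minutes, median=10.0, sigma=0.8):
    """P(response time > t) under a lognormal response-time model."""
    if t_minutes <= 0:
        return 1.0
    z = (math.log(t_minutes) - math.log(median)) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal survival

for t in (5, 10, 30, 60):
    print(f"time available = {t:>3} min -> P(failure) = {p_non_response(t):.3f}")
```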

19

Empirical technique to estimate operators' errors (TESEO)

TESEO is an acronym for the Italian name tecnica empirica stima errori operatori. It yields the probability of operator failure through the combined application of five error probability parameters, K1 to K5:
K1 = type of activity (routine or not, requiring close attention or not): parameter between 0.001 and 0.1
K2 = a temporary stress factor (for routine and non-routine activities): parameter between 0.5 and 10
K3 = operator qualities (assigned according to selection, expertise, and training): parameter between 0.5 and 3

20

K4 = an activity anxiety factor (depending on the situation: grave emergency, potential emergency, or nominal conditions): parameter between 1 and 3
K5 = an activity ergonomic factor (according to the quality of the microclimate and of the plant interface): parameter between 0.7 and 10

The estimated operator error probability is the product K1 × K2 × K3 × K4 × K5 (see the sketch below).
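A minimal sketch of the TESEO arithmetic, with one illustrative (invented) choice of parameter values from the ranges listed above:

```python
# TESEO: operator error probability as the product of the five factors,
# capped at 1. The chosen values are only one illustrative case.

def teseo_hep(k1, k2, k3, k4, k5):
    return min(1.0, k1 * k2 * k3 * k4 * k5)

hep = teseo_hep(
    k1=0.01,  # non-routine activity requiring attention
    k2=2.0,   # moderate time stress
    k3=1.0,   # average operator selection, expertise and training
    k4=2.0,   # potential emergency
    k5=1.0,   # average microclimate and plant interface
)
print(f"TESEO estimate of operator error probability: {hep:.3g}")  # 0.04
```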

21

Confusion Matrix (Potash, 1981)

It is designed to evaluate the various modes of misdiagnosis by operators for a range of possible events under abnormal plant conditions.
The method relies upon the judgment of experts as to the likelihood of different misdiagnoses of specific critical plant conditions.
These judgments are solicited in a structured and systematic way to evaluate the probabilities at different times during a given accident sequence.
The outputs are the probabilities that the operator fails to respond correctly to events A, B, C, etc. at different times after the initiation of the sequence (see the sketch below).
However, different experts may assign different probability values, based on their own experience and expertise.
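To show the structure only, a confusion matrix can be written as a table whose rows are the actual initiating events and whose columns are the events as diagnosed by the operators, each row summing to 1. The events and probabilities below are invented.

```python
# Illustrative confusion matrix: P(diagnosed event | actual event).
# Rows are actual events, columns are diagnoses; each row sums to 1.
# The events and numbers are invented for illustration.

events = ["small LOCA", "SG tube rupture", "loss of feedwater"]

confusion = {
    "small LOCA":        {"small LOCA": 0.90, "SG tube rupture": 0.07, "loss of feedwater": 0.03},
    "SG tube rupture":   {"small LOCA": 0.15, "SG tube rupture": 0.80, "loss of feedwater": 0.05},
    "loss of feedwater": {"small LOCA": 0.02, "SG tube rupture": 0.03, "loss of feedwater": 0.95},
}

for actual in events:
    p_misdiagnosis = 1.0 - confusion[actual][actual]
    print(f"P(misdiagnosis | {actual}) = {p_misdiagnosis:.2f}")
```

In practice such a matrix would be elicited from several experts and re-estimated for different times into the accident sequence.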

22

Success Likelihood Index (SLI) methodology

The SLI is derived from a consideration of the typical variables, called performance influence factors (PIFs), known to influence error rates (e.g., quality of training, quality of procedures, and time available for action).
Judges give each PIF a numerical rating of how good or bad it is in the given situation, together with a relative importance weight.
For each PIF the rating and importance weight are multiplied together, and the sum of these products is the success likelihood index (see the sketch below).
This index is presumed to relate to the probability of success that would be observed over the long run in the particular situation of interest.
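A minimal sketch of the SLIM arithmetic under stated assumptions: normalized importance weights are multiplied by PIF ratings and summed to give the SLI, which is then calibrated against two reference tasks of known HEP via a log-linear relation (as in SLIM-MAUD). All PIFs, weights, ratings, and anchor values are invented.

```python
# Illustrative SLIM calculation; every number here is an assumption.
import math

# PIF: (relative importance weight, rating of how favourable it is, 1..9)
pifs = {
    "quality of training":   (0.4, 7),
    "quality of procedures": (0.3, 5),
    "time available":        (0.3, 3),
}

total_w = sum(w for w, _ in pifs.values())
sli = sum((w / total_w) * rating for w, rating in pifs.values())

# Calibrate log10(HEP) = a * SLI + b with two reference tasks whose
# SLIs and HEPs are assumed known (anchor values invented).
sli_easy, hep_easy = 8.0, 1e-4
sli_hard, hep_hard = 2.0, 1e-1
a = (math.log10(hep_easy) - math.log10(hep_hard)) / (sli_easy - sli_hard)
b = math.log10(hep_easy) - a * sli_easy

hep = 10 ** (a * sli + b)
print(f"SLI = {sli:.2f}, estimated HEP = {hep:.2e}")
```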

23

Systematic Human Action Reliability Procedure (SHARP)

Designed by Hannaman et al. (1984), SHARP is used to guide the selection of appropriate HRA models and techniques (THERP, OATS, etc.) in a given situation.

Risk Management
24

Probability risk assessment does more than identify an acceptable level of risk; it also defines an acceptable envelope of safe plant operation. The residual risk includes:
The use of components and materials which fall outside the populations providing the PRA failure data, or which are substandard.
The fact that the real plant does not conform to the model underlying the PRA.
The plant not being operated or maintained in accordance with the assumptions of the PRA.

Risk Management
25

This risk should be minimized by modifying the design of the plant, by providing an appropriate level of training, or by reducing the contributing factors identified during the PRA.

Potential Measures of Error Reduction
26

Here we discuss some error reduction possibilities which are still in the early stages of research and development:
Eliminating error affordances
Intelligent decision support systems
Memory aids for maintenance personnel
Training issues
Ecological interface design
Self-knowledge about error types and mechanisms

Potential Measures of Error Reduction
27

Eliminating error affordances
The term affordance refers to the basic properties of objects that shape the way in which people react to them. Norman's analysis poses several questions (p. 235):
Humans always behave clumsily when the things they must do are badly conceived and badly designed. Does a commercial airliner crash? Pilot error? Does a Soviet nuclear plant have a serious problem? Human error?
Consider the Three Mile Island (Pennsylvania, USA) nuclear accident, in which a failure in the non-nuclear secondary system, followed by a stuck-open pilot-operated relief valve in the primary system, allowed large amounts of reactor coolant carrying radioactive material to escape.

Potential Measures of Error Reduction
28

The mechanical failures were compounded by the operators' initial failure to recognize the situation, owing to inadequate training and to human factors problems such as oversights in the human-computer interaction design. The reports blame the operators; however, as Norman argues, why was there an initial equipment failure at all, and why was the situation misdiagnosed? To Norman, it was a case of equipment failure coupled with serious design errors.
Error-affording situations involve two types of knowledge:
Knowledge in the world (KIW)
Knowledge in people's heads (KIH)

KIW is always accessible but is subject to the out-of-sight, out-of-mind principle.

Potential Measures of Error Reduction
29

Knowledge in the head (KIH) is efficient and independent of the immediate environment, but it has to be retrieved and may require reminding. KIW need not be learnt, but it must be interpreted, and it is this interpretation that often leads to error.

Potential Measures of Error Reduction
30

Minimization principles for error affordances
Use both KIW and KIH to promote a good conceptual model of the system on the part of its users.
Simplify the structure of the task so as to minimize the load upon vulnerable cognitive processes such as working memory, planning, or problem solving.
Make both the execution and the evaluation sides of an action visible: visibility of execution lets users know what is possible and how it should be done; visibility of evaluation lets them gauge the effects of their actions.
Exploit natural mappings between intentions and possible actions, between actions and their effects on the system, and between the actual system state and what is perceivable.

Potential Measures of Error Reduction
31

Exploit the power of constraints, both natural and artificial. Constraints guide the user to the next appropriate action or decision.
Design for errors: assume they will occur and plan for error recovery.
When all else fails, standardize actions, outcomes, layouts, displays, etc.

Potential Measures of Error Reduction
32

Intelligent Decision Support Systems (IDSS)
Following TMI-2, the industry, and specifically the Electric Power Research Institute in Palo Alto, explored various ways of aiding operator decision making in accident conditions. These included, among others, computerized support systems, ranging from safety parameter display systems (showing trends in important system state variables) to predictive on-line simulations capable of answering "what if?" questions during the course of an emergency.

Potential Measures of Error Reduction
33

Failures of IDSS
In the Davis-Besse accident (Chapter 7) (1985, Ohio), both independent safety parameter display systems were out of action before and during the event. The NUREG report (1985) attributes this to a data transmission problem between the control room terminals and their respective computer processors. Moreover, the shift technical advisor was on a 24-hour shift and was asleep at the time of the reactor trip. He arrived within 15 minutes, but by then the event was essentially over; thereafter he acted as an administrative assistant rather than a technical advisor. Nevertheless, progress in AI techniques should make new and more powerful IDSS tools available for such situations.

Potential Measures of Error Reduction
34

Memory Aids for Maintenance Personnel
A number of nuclear power plant error surveys (Rasmussen, 1980; INPO, 1984, 1985b) suggest that maintenance-related omission errors are a significant cause of failures. Task factors likely to increase the probability of an omission error include:
The larger the number of discrete steps in an action sequence, the greater the chance of omitting one or more of them.
The greater the informational loading of a particular procedural step, the more likely it is that items within that step will be omitted.
Procedural steps that do not follow in a direct linear sequence from the preceding actions are likely to be omitted.

Potential Measures of Error Reduction
35

When verbal instructions contain more than five simple steps, items from the middle are likely to be omitted.
In written instructions, isolated steps at the end of the sequence (e.g., replacing caps or bushes, removing tools, etc.) are likely to be omitted.
Necessary steps in an action sequence are more likely to be omitted during reassembly than during the original disassembly (Chapter 6).
Highly automatic tasks are prone to omissions following unexpected interruptions, either because an unrelated action is unconsciously counted in or because the individual loses his or her place on resuming the task.

To control such slips and omission errors, a memory device has been suggested. A prototype of such a device is termed the portable interactive maintenance auxiliary (PIMA); a rough sketch of the underlying idea follows below.
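The chapter does not describe PIMA's internals; purely as an illustration of what such a memory aid must do, the sketch below tracks the discrete steps of a maintenance task, records which have been completed, and lists any that remain, so that an interrupted job can be resumed without omissions. The class, task, and step names are invented.

```python
# Illustrative sketch of a step-tracking memory aid for maintenance work
# (an assumption about the general idea, not the actual PIMA prototype).

class TaskChecklist:
    def __init__(self, task_name, steps):
        self.task_name = task_name
        self.steps = list(steps)   # procedural order matters
        self.done = set()

    def complete(self, step):
        if step not in self.steps:
            raise ValueError(f"unknown step: {step}")
        self.done.add(step)

    def remaining(self):
        """Steps not yet completed, in procedural order."""
        return [s for s in self.steps if s not in self.done]

job = TaskChecklist("reassemble valve actuator", [
    "refit diaphragm",
    "torque cover bolts",
    "reconnect air line",
    "remove tools and replace caps",  # isolated final steps are the most
])                                     # frequently omitted ones

job.complete("refit diaphragm")
job.complete("torque cover bolts")
print("Outstanding on resumption:", job.remaining())
```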


Potential Measures of Error Reduction
38

Training Issues
Procedures or heuristics: Duncan (1987) observes that most experienced operators rely on their own heuristics rather than working through a standard procedure; they usually devise their own procedures for dealing with a fault condition.
Simulator training: Duncan (1987, p. 266) suggests that "[Simulator] training may succeed in providing operators with generalizable diagnostic skill but there are limits to what may be achieved, and post-training probabilities of diagnostic error remain uncomfortably high." Creating a simulated environment for an event that has never happened in a complex system such as a nuclear power plant is still a dream, and a toy-type simulator would therefore leave post-training probabilities of diagnostic error very high.

Potential Measures of Error Reduction
39

Error Management
This is a procedure developed by Michael Frese's group at the University of Munich from empirical research on errors in human-computer interaction. They note that errors committed during training can have both negative and positive effects; the aim of error management is to promote the positive effects and mitigate the negative ones in a systematic fashion.
The positive aspects of errors: errors can encourage creative problem solutions. For example, if a trainee does not know the difference between insert mode and overwrite mode (in word processing), the errors resulting from this lack of knowledge lead him or her to explore these modes spontaneously.

Potential Measures of Error Reduction
40

It should be noted that human errors are not always stochastic; in fact, they tend to be nonstochastic (systematic) most of the time. They are an intrinsic part of mental functioning and cannot be eliminated by training, no matter how effective the training is. The most productive strategy is therefore to focus on controlling their consequences rather than striving for their elimination.

Potential Measures of Error Reduction
41

Ecological Interface Design
Rasmussen & Vicente focus upon four categories of errors:
(a) errors related to learning and adaptation
(b) interference among competing cognitive control structures
(c) lack of resources
(d) intrinsic human variability
Rasmussen & Vicente present ten guiding principles for improved system design. The important aspect of these guidelines is that they are distinguished at the skill-based, rule-based, and knowledge-based levels of performance.

Potential Measures of Error Reduction
42

Self-knowledge about error types and mechanisms
This means imparting some useful degree of self-knowledge about human error mechanisms to those for whom the consequences of slips, lapses, and mistakes are unacceptable. The operators of high-risk technologies are informed about likely system breakdowns and how to deal with them. To protect aircrew against the effects of in-flight disorientation, the varieties of disorientation are demonstrated to them directly in both actual and simulated flight. They are also told that their earthbound senses (adapted to two dimensions) may mislead them about actual position and motion in the air (three dimensions).

Concluding Remarks
43

The nature of human error was discussed.
Distinctions between error types based on performance levels were proposed.
Error forms were discussed.
A framework theory of error production and several experimental and survey results were discussed (Ch. 4 & 5).
Modes of error detection, and the various processes (especially in high-risk technologies) by which errors may be detected, were described (Ch. 6, 7).
The assessment and mitigation of errors is discussed in the present chapter.

44

Thank you! Questions?
