You are on page 1of 29

Risk Analysis using Simulation

Page 1 of 29

Risk Analysis using simulation


Introduction

Simulation is a process of understanding behavior of a system or


evaluating the various strategies for the operation of the system.

Simulation calculates multiple scenarios of a model by repeatedly


sampling values from the probability distributions for the uncertain
variables.

Simulation involves the generation of the artificial history of the system


and observation of the artificial history to draw inferences concerning the
operating characteristics of the real system that is presented.

Why Simulation?

When it is too costly to do physical studies on the system itself (e.g., trying
alternative layout of a factory, building new facility)

The corresponding analytic models are too complicated to study (e.g. a


queuing/transportation network)

When uncertainty, non-stationarity dominates the output results.

Steps in Simulation
Model development: System defination, objectives to study, decision
variables, output measures, input variables and parameters.
Data collection: Collect data from the real system, obtain probability
distributions of the input parameters by statistical analysis
Model translation: Translate the physical model into computer simulation
software.
Simulation runs & Output Data Analysis: Run the simulation upto
satisfactory level and use statistical analysis of the ouput data to estimate
the performance measures.

Risk Analysis using Simulation

Page 2 of 29

Model Verification and Validation:


z Verification: Ensuring that the model is free from logical errors. It
does what it is intended to do.
z Validation: Ensuring that the model is a valid representation of the
whole system. Model outputs are compared with the real system
outputs.
Optimization/Experimental Design: Analyze alternative strategies on the
validated simulation model.
Sensitivity analysis: Analuze the input data sensitivity in performance
measure of the system. Cheking model robustmess.

Tools for simulation

@Risk

Crystal Ball

MatLab

Minitab

Excel Spreadsheet

And so on, etc

Simulation using Excel Spreadsheet


Basic Excel Skills
To facilitate the understanding of simulation using Excel, it is necessary to be
familiar with the following features of Excel

Copying formula and cell references

Function: It is used in performing special calculations in the Spread sheet


cells. Some of the more common functions that are usually used in
simulation models are including:
9 MlN (range) - finds the smallest value in a range of cells.
9 MAX (range) - finds the largest value in a range of cells
9 SUM (range) - finds the sum of values in a range of cells

Risk Analysis using Simulation

Page 3 of 29

9 AVERAGE (range) - finds the average of the values in a range of


cells
9 STDEV(range) -finds the standard deviation for a sample in a
range of cells
9 AND (condition 1, condition 2, ) -a logical function that returns
TRUE if all conditions are true, and FALSE if not.
9 OR (condition1, conditiott2 . . .) - a logical function that returns
TRUE if any condition is true, and FALSE if not.
9 IF (condition, value if true, value if false) -a logical function that
returns one value if the condition is true and another if the condition
is false.
9 VLOOKIUP (value, table range column number) -looks up a
value in a table.

Charts and graphs

Excel has many other functions for statistical, financial and other
applications

Others useful features

Split Screen: Division of displayed spreadsheet screen

Paste Special: Controlling of direct pasting into the cells

Column and Row width: Customization of Column and Row size.

Displaying Formulas in Worksheets: Showing actual formula in


spreadsheet

Displaying Grid Lines and Row, and Column Headers for Printing

Filling a Range with a Series of Numbers.

Comment Boxes

Building Simulation Models in Excel


However a good design is essential to user understanding. Any good simulation
model design in any spreadsheet should be included with the following features.

A descriptive title

Risk Analysis using Simulation

Page 4 of 29

A separate input data section area

A separate working space

A separate output section that provides the model results.

Appropriate formatting such as in currency or comma formatting.

Complex calculations should be divided into several cells to minimize the


chances of error and enhance understanding.

Comments should be placed next to formula cells or in comment boxes for


explanation, if appropriate.

Example: A Simulation Model for Dave's Candies


Daves Candies is a small family owned business that offers gourmet chocolates
and ice cream fountain service. For special occasions such as Valentines day,
the store must place orders for special packaging several weeks in advance from
their supplier. One product, Valentines day chocolate massacre, is bought for
$7,50 a box and sells for $12.00. Any boxes that are not sold by February 14 are
discounted by 50% and can always be sold easily. Historically Daves candies
has sold between 40-90 boxes each year with no apparent trend. Daves
dilemma is deciding how many boxes to order for the Valentines day customers.
If the order quantity, Q is 70, what is the expected profit?
Formulation:
Selling price=$12
Cost = $7.50
Discount price=$6
| If *D<Q
Profit=selling price*D - cost*Q + discount price*(Q-D)..........[1]
| D>Q
Profit=selling price*Q-cost*Q................................................[2]
*D=Demand & Q=Quantity

Risk Analysis using Simulation

Page 5 of 29

Simulation model in Excel

Model description

The input data are given in Column A and Column B.

Simulation results are shown in Columns D through F.

IF function is used in Column F to calculate the profit based on order


demand and quantity relation using the equations [1] and [2].

Risk Analysis using Simulation

Page 6 of 29

Each row in the results table represents one trial of the simulation.

Demand in each row was generated from probability distribution.

Since selling boxes range 40-90 boxes/year, the values in Column E were
generated by rolling a die and entered into the worksheet to verify the
formulas for profit in column F.

Probability & Statistic in Simulation


Fundamentals

Probability - Probability is a measure of how likely a value or event is to


occur. It can be measured from simulation data as frequency by
calculating the number of occurrences of the value or event divided by the
total number of occurrences. This calculation returns a value between 0
and 1 which then can be converted to percentage by multiplying by 100.

Probabilistic Risk Assessment (PRA) - A risk assessment that yields a


probability distribution for risk, generally by assigning a probability
distribution to represent variability or uncertainty in one or more inputs to
the risk equation.

Probability Density Function (PDF) - A function representing the


probability distribution of a continuous random variable. The density at a
point refers to the probability that the variable will have a value in a narrow
range about that point.

Probability Distribution - A probability distribution is a set of probabilities


associated with all possible outcomes of an uncertain event.

Probability Mass Function (PMF) - A function representing the


probability distribution for a discrete random variable. The mass at a point
refers to the probability that the variable will have a value at that point.

Random Variable - A variable that may assume any value from a set of
values according to chance. Discrete random variables can assume only a
finite or countable infinite number of values (e.g., number of rainfall events
per year). A random value is continuous if its set of possible values is an
entire interval of numbers (e.g., quantity of rain in a year).

Risk Analysis using Simulation

Page 7 of 29

Central Tendency Exposure (CTE) - A risk descriptor representing the


average or typical individual in a population, usually considered to be the
mean or median of the distribution.

Confidence Interval - Confidence intervals refers an interval that provides


a range where the exact or true probability value of parameters may lie.

Confidence Limit - The upper or lower value of a confidence interval.

Cumulative Distribution Function (CDF) - Obtained by integrating the


PDF, gives the cumulative probability of occurrence for a random
independent variable. Each value c of the function is the probability that a
random observation will be less than or equal to c.

Frequency Distribution or Histogram - A graphic (plot) summarizing the


frequency of the values observed or measured from a population. It
conveys the range of values and the count (or proportion of the sample)
that was observed across that range.

Monte Carlo Analysis (MCA) or Monte Carlo Simulation - A technique


for characterizing the uncertainty and variability in risk estimates by
repeatedly sampling the probability distributions of the risk equation inputs
and using these inputs to calculate a range of risk values.

Point Estimate - In statistical theory, a quantity calculated from values in


a sample to estimate a fixed but unknown population parameter. Point
estimates typically represent a central tendency or upper bound estimate
of variability. [Source: U.S. EPA]

Types of Distribution:

Continuous distribution: A probability distribution is continuous when


any value between the minimum and maximum is possible (has finite
probability). For example, an uncertainty function describing the possible
annual rainfall in Ithaca, New York next year would be a continuous
function since any value between 0 and some upper limit is possible.

Risk Analysis using Simulation

Page 8 of 29

Discrete distribution: A discrete probability distribution has only a finite


number of possible values between the maximum and minimum. For
example, an uncertainty function describing the outcome of a coin toss is
discrete since only two values are possible: heads or tails.

Descriptive Statistics
Parameter - A value that characterizes the distribution of a random variable.
Parameters commonly characterize the location, scale, shape, or bounds of the
distribution. For example, a truncated normal probability distribution may be
defined by four parameters: arithmetic mean [location], standard deviation
[scale], and min and max [bounds]. It is important to distinguish between a
variable (e.g., ingestion rate) and a parameter (e.g., arithmetic mean ingestion
rate).
Measures of Central Tendency: Mean, Median, & Mode

Mean: The mean of a set of values is the sum of all the values in the set
divided by the total number of values in the set.

1
N

i =1

xi

Median: The median is the middle data point when the data is ordered.
The middle value (50th percentile) in the ordered sequence of measured
values. For highly skewed data sets the median can give a better

Risk Analysis using Simulation

Page 9 of 29

representation than the mean of the middle of the data distribution. The
median is not as comprehensive a measure of the data set as the mean.

Mode: The most likely value or mode is the value that occurs most often
in a set of values. In a histogram and a result distribution, it is the center
value in the class or bar with the highest probability.

Measures of Variation

Standard deviation: The standard deviation is a measure of how widely


dispersed the values are in a distribution. Equals the square root of the
variance.
N

(x
i =1

)2

N 1

Variance: The variance is a measure of how widely dispersed the values


are in a distribution, and thus is an indication of the "risk" of the
distribution. It is calculated as the average of the squared deviations about
the mean. The variance gives disproportionate weight to "outliers", values
that are far away from the mean. The variance is the square of the
standard deviation.

Range: The range is the absolute difference between the maximum and
minimum values in a set of values. The range is the simplest measure of
the dispersion or "risk" of a distribution

Kurtosis: Kurtosis is a measure of the shape of a distribution. Kurtosis


indicates how flat or peaked the distribution is. The higher the kurtosis
value, the more peaked the distribution. The coefficient of kurtosis is
computed as:
N

CK =

(x
i =1

)4

Risk Analysis using Simulation

Skewness:

Page 10 of 29

Skewness is a measure of the shape of a distribution.

Skewness indicates the degree of asymmetry in a distribution. Skewed


distributions have more values to one side of the peak or most likely value
one tail is much longer than the other. A skewness of 0 indicates a
symmetric distribution, while a negative skewness means the distribution
is skewed to the left. Positive skewness indicates a skew to the right. The
coefficient of skewness is computed as:
N

(x

CS =

i =1

)3

Coefficient of Variation: It is the ratio of standarad deviation and the


sample mean.
CV =

Correlation: It is measure the interdependences of two variables. The


sample correlation coefficient is computed as:
N

r=

(x
i =1

x )( y i y )

( N 1) S x S y

Random numbers & Probability distribution in Simulation

It is a numerical description of the outcome of some experiment. Example


of outcome of experiment could be profit values, failure rate or chance of
an event occurrence, etc.

It can generated from a probability distribution.

In simulation terminology, a random number is one that is uniformly


distributed in between 0 and 1.

Recall from statistic that uniform probability distribution characterizes a


random variable for which all outcomes between a minimum value a and a
maximum value b are equally likely.

Risk Analysis using Simulation

Page 11 of 29

Some Important Continuous Distribution

Exponential Distribution: The exponential distribution is characterized


by the single parameter (.) The exponential random variable T (t > 0) has
a probability density function
f(t) = e-t

for t > 0

Where,

= mean number of occurrences per unit time


t= number of time units until the next occurrence
The cumulative distribution function is
t

F (t ) = e x dx = e x = 1 e t for t > 0.
0

Mean: =

1
and

1
Variance: 2 =

Lognormal Distribution: Distribution parameter (, )

Risk Analysis using Simulation

Page 12 of 29

Risk Analysis using Simulation

Normal Distribution: Distribution parameter (, )

Page 13 of 29

Risk Analysis using Simulation

Page 14 of 29

Triangular Distribution : Distribution Parameter (min, m.likely, max)

Risk Analysis using Simulation

Page 15 of 29

Risk Analysis using Simulation

Uniform Distribution : Distribution parameter (min, max)

Page 16 of 29

Risk Analysis using Simulation

Page 17 of 29

Risk Analysis using simulation

Some Important Discrete Distribution

Binomial Distribution: Distribution parameter (n, p)

Page 18 of 29

Risk Analysis using simulation

Page 19 of 29

Risk Analysis using simulation

Poisson distribution: Distribution parameter ()

Page 20 of 29

Risk Analysis using simulation

Page 21 of 29

Generating random variables in Excel


Two properties of probability distribution are useful to generate random numbers:
1. the probability of any outcome is always between 0 and 1 and
2. the sum of the probabilities of all outcomes adds to 1
Random numbers from discrete distribution

Divide the range from 0 to 1 into intervals that correspond to the probabilities of
the discrete outcomes.

Use VLOOKUP function (look up value, table array, column index number) or
Random number generation function in excel.

Risk Analysis using simulation

Example : Random number generation for Dave's Candies

Page 22 of 29

Risk Analysis using simulation

Page 23 of 29

Random numbers from continuous distribution

Inverse transform method is used in case if there is no direct function to


generate random numbers.

Inverse transform method uses the following steps:


1. generate

random

number

from

the

Uniform

distribution:

u=Uniform(0,1),
2. Calculate inverse cumulative distribution function (ICDF). In Excel, the
function name to calculate the ICDF for Normal distribution NORMINV
and for LogNormal distribution is Lognormal (LOGINV). RAND function
in excel can also be used to generate random numbers from the Uniform
distribution, and apply the built-in functions to calculate the ICDF.

Example:
For example, the following formula will return the inverse CDF of the Normal
distribution with mean=1 and standard_deviation=2 evaluated at p=0.2:
=NORMINV(0.2; 1; 2)
Replacing 0.2 with RAND will yield the Normal random number generation formula:
=NORMINV(RAND(); 1; 2)
Similarly for LogNormal distribution the function is
LOGINV (probability, mean, Standard_dev)
For triangular distribution, It may need to use If function to generate random numbers
in Excel. The structure of the IF function is:
=IF(expression, what is returned if true, what is returned if false)
If function first determines which side of the distribution corresponds to the random
number and then evaluates the appropriate formula.
=IF (RAND () < (Mode Min)/(Max Min),Left formula, Right formula)
Let assume,
In a triangular distribution, if Min is a, Mode is b and Max. is c, then
Left Formula = a + RAND () * (b a ) * (c a )
Right Formula = c (1 RAND()) * (c b) * (c a )

a xb
bxc

Risk Analysis using simulation

Page 24 of 29

Monte-Carlo Simulation

Figure 1: Monte Carlo analysis to a model.


Monte-Carlo simulation uses the following steps for assessing parameters uncertainty:
i.

Select a distribution to describe possible values of a parameter.

ii.

Generate data from this distribution.

iii.

Use the generated data as probable values of the parameters in the model to
produce output.

Monte-Carlo simulation on Excel


Monte-Carlo simulation on excel spreadsheet follows the following steps::
1. Develop the spreadsheet model including a separate input and output
region.
2. Generate random numbers from the assigned probability distribution and
use the random data into the appropriate formula in the simulation model.
3. Repeat step 2 until to obtain the satisfactory output to create a distribution of
results.

Risk Analysis using simulation

Page 25 of 29

4. Make a summary for the descriptive statistics and collect output data in a
frequency distribution or histogram for analysis.
Example: Monte-Carlo simulation on excel for Dave's Candies problem

Output

Example: Methyl-mercury has been inadvertently released to a nearby lake. Use


Monte Carlo simulation. With 90% (subjective) confidence, what is the risk to the
maximally exposed individual? The following table shows the (subjective) probability
distributions for this problem.

Risk Analysis using simulation

Page 26 of 29

Solution:
Step 1: Selecting distribution for model parameters
9 Table 1 is used as input for a Monte Carlo simulation
Table 1: Input data for MCS
Input Region
Parameter

Distribution

Min.

Max.

Mean/Mode

STD

2.06E-01

4.22E-02

Fish concentration(CF),
mg/kg

Normal

Intake of Fish (IF),kg/d

Uniform

2.00E-02

1.30E-01

6.50E-02

Methyl mercury RfD


(RfDMM), mg/kg-d

Triangle

1.50E-04

3.00E-03

3.00E-04

Body mass (BM), kg

Triangle

4.50E+01

1.20E+02

7.00E+01

Step 2: Generate data from this distribution.


9 Table 3 shows the generated dated for this example:
Table 3: Random number generation
Sample
Number

CF

IF

RfD

BM

1
2
3
4
5
6
7
8
9
10
11
12
96
97
98
99
100

0.209
0.268
0.203
0.199
0.140
0.230
0.191
0.182
0.197
0.220
0.218
0.198
0.265
0.213
0.233
0.265
0.161

0.032
0.047
0.052
0.026
0.123
0.083
0.025
0.102
0.095
0.041
0.099
0.024
0.098
0.109
0.113
0.070
0.066

5.39E-04
9.61E-04
2.35E-03
2.51E-04
3.71E-04
6.53E-04
2.77E-03
1.00E-03
1.36E-03
6.73E-04
7.61E-04
1.57E-03
2.90E-04
4.57E-04
1.55E-03
4.98E-04
6.63E-04

59.203
66.066
109.907
95.667
94.443
60.822
78.394
80.310
57.432
84.090
59.938
88.117
90.694
84.283
64.637
93.816
87.898

Risk Analysis using simulation

Page 27 of 29

Step 3: Use the generated data as probable values in the model to calculate HQ.
9 Used model to calculate the Hazard Index (HI):

HI =

CF I F
BM RfD

Table 3: MCS to calculate HI using 100 iterations


Sample
Number

CF

IF

RfD

BM

HI

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-

0.209
0.268
0.203
0.199
0.140
0.230
0.191
0.182
0.197
0.220
0.218
0.198
0.142
0.268
0.217
-

0.032
0.047
0.052
0.026
0.123
0.083
0.025
0.102
0.095
0.041
0.099
0.024
0.037
0.099
0.070
-

5.39E-04
9.61E-04
2.35E-03
2.51E-04
3.71E-04
6.53E-04
2.77E-03
1.00E-03
1.36E-03
6.73E-04
7.61E-04
1.57E-03
2.05E-03
1.83E-03
2.17E-03
-

59.203
66.066
109.907
95.667
94.443
60.822
78.394
80.310
57.432
84.090
59.938
88.117
64.939
87.285
76.220
-

2.12E-01
1.97E-01
4.07E-02
2.13E-01
4.92E-01

94
95
96
97
98
99
100

0.216
0.223
0.265
0.213
0.233
0.265
0.161

0.119
0.078
0.098
0.109
0.113
0.070
0.066

2.01E-03
2.68E-03
2.90E-04
4.57E-04
1.55E-03
4.98E-04
6.63E-04

69.327
74.652
90.694
84.283
64.637
93.816
87.898

4.79E-01
2.16E-02
2.32E-01
2.40E-01
1.59E-01
4.71E-01
3.48E-02
3.92E-02
1.66E-01
9.23E-02

1.85E-01
8.68E-02
9.83E-01
6.04E-01
2.63E-01
3.95E-01
1.83E-01

Risk Analysis using simulation

Page 28 of 29

Output Summary:

The output from MCS is shown in Fig 1.

The frequency distribution for the output from MCS (Monte Carlo simulation) is
shown in Fig 2.

From Monte Carlo simulation, the confidence interval for 90% & 95% are
obtained [0.193, 0.251] & [0.188, 0.256] respectively.

This implies that after taking into account the uncertainties on the parameters,
one is highly confident (at a subjective level of 95%) that the true HI should lie
between 0.188 and 0.256.

Since the 95% upper confidence limit of HI is still below 1, there is high
confidence that the maximally exposed individual for this scenario is not
exposed to an unacceptable level of risk, and remediation should not be
warranted.

Figure 1: MCS output

Risk Analysis using simulation

Page 29 of 29

Forcast: Hazard Index (HI)


25

Frequenc

20
15
10
5

1.
01

0.
91

0.
81

0.
71

0.
61

0.
51

0.
41

0.
31

0.
21

0.
11

0.
01

Hazard Index

Figure 2: Frequency distribution of HI

References:
Hammonds, J. S. , Hoffman, F. O., and Bartell, S .M., An Introductory Guide to
Uncertainty Analysis in Environmental and Health Risk Assessment managed by
MARTIN MARIETTA ENERGY SYSTEMS, INC. for the U.S. DEPARTMENT OF
ENERGY
James R. Evans, David Louis Olson, Introduction to Simulation and Risk Analysis,
Prentice Hall PTR, Upper Saddle River, NJ, 2001
http://www.cas.lancs.ac.uk/glossary_v1.1/prob.html#pdf
http://www.ltcconline.net/greenl/courses/201/probdist/zScore.htm

You might also like