
Monte Carlo Simulation

Overview
In the business world, you often have to make far-reaching decisions based on limited information. To ascertain the full consequences of your decision, you will want to use all tools and methods available to you.

Monte Carlo simulation, one type of risk analysis, is a powerful tool that can make you aware of the positive, as well as the negative, outcomes of your decision.

Course Objectives
This course will:

- Introduce you to the benefits of using Monte Carlo simulation.
- Present some basic statistical terminology.
- Expose you to two Monte Carlo simulation packages.
- Provide realistic practice exercises.

Benefits of Using Monte Carlo Simulation

Benefits of Using Monte Carlo Simulation: Objectives
Once you have completed this section, you should be able to:

- Define Monte Carlo simulation.
- List situations when you would use Monte Carlo simulation.
- Provide common uses of Monte Carlo simulation for the petroleum industry.

Benefits of Using Monte Carlo Simulation: Lesson


Deterministic methods assume that each variable has only one possible value. Many real-world problems, however, involve uncertainty. An efficient technique for analyzing these types of problems is Monte Carlo simulation. By randomly selecting different values for each input variable across the range of its probability distribution, a computer simulates hundreds or thousands of possible combinations. The results are presented in the form of histograms.

Monte Carlo simulation cannot provide a single, absolutely correct answer, nor can it make the decision for you. Used with skill, however, these simulations can help you make reasonable, and better, business decisions. When a problem or question involves uncertainty in the variables, Monte Carlo simulation can support the analysis and lead to better decision-making. It can also:

- Facilitate a thorough investigation of both the direct and indirect consequences of random variation within a system.
- Identify prime sources of fluctuations.

In the petroleum industry, common uses for Monte Carlo simulation are:

- To estimate capital.
- To appraise and evaluate projects.
- To create a production forecast.
- To provide strategic planning and portfolio mix.
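The sampling loop described at the start of this lesson can be sketched in a few lines of Python. This is a minimal illustration, not the method used by Crystal Ball or @Risk: the model (profit = price × volume - cost), the three input distributions, and the bin width are all hypothetical. Each pass through the loop is one trial, and the simulated outcomes are tallied into class intervals to form a text histogram.

```python
import random
from collections import Counter

def model(price, volume, cost):
    """Hypothetical deterministic model: profit from three inputs."""
    return price * volume - cost

def simulate(n_trials, seed=1):
    rng = random.Random(seed)  # seeded so the run is repeatable
    results = []
    for _ in range(n_trials):
        # Draw one value for each input from its assumed distribution.
        price = rng.uniform(40.0, 60.0)
        volume = rng.triangular(800.0, 1500.0, 1000.0)  # low, high, most likely
        cost = rng.normalvariate(30000.0, 5000.0)        # mean, standard deviation
        results.append(model(price, volume, cost))
    return results

def histogram(values, bin_width):
    """Count how many outcomes fall in each class interval of width bin_width."""
    counts = Counter(int(v // bin_width) for v in values)
    return {k * bin_width: counts[k] for k in sorted(counts)}

outcomes = simulate(5000)
for lower_edge, count in histogram(outcomes, 10000.0).items():
    # One '#' per 50 trials in the interval.
    print(f"{lower_edge:>10.0f}  {'#' * (count // 50)}")
```

Swapping in your own model function and input distributions is all it takes to adapt the sketch; the loop and the histogram stay the same.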

In a capital-estimation problem, some of the questions investigated are: What would the total cost of a project be? How long will the project take? Are the benefits worth the costs? Which project should be funded first? Monte Carlo simulation can help you resolve some of the key concerns in capital estimation.
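As an illustration of a capital-estimation run, the sketch below treats total project cost as the sum of three components, each described by a triangular (minimum, most likely, maximum) estimate. The component names, dollar figures, and the $20 million budget are hypothetical. Here P10 means 10% of simulated totals fall below that value; note that some companies use the opposite (exceedance) convention for P10 and P90. `statistics.quantiles` requires Python 3.8 or later.

```python
import random
import statistics

# Hypothetical cost components: (minimum, most likely, maximum), in $ million.
COMPONENTS = {
    "drilling": (8.0, 10.0, 15.0),
    "facilities": (5.0, 6.0, 9.0),
    "pipeline": (2.0, 3.0, 5.0),
}

def total_cost(rng):
    # random.triangular takes its arguments in the order (low, high, mode).
    return sum(rng.triangular(lo, hi, ml) for lo, ml, hi in COMPONENTS.values())

rng = random.Random(42)
totals = [total_cost(rng) for _ in range(10000)]

# quantiles(..., n=10) returns the 9 cut points P10, P20, ..., P90.
deciles = statistics.quantiles(totals, n=10)
p10, p50, p90 = deciles[0], deciles[4], deciles[8]

budget = 20.0
prob_over_budget = sum(t > budget for t in totals) / len(totals)
print(f"P10={p10:.1f}  P50={p50:.1f}  P90={p90:.1f}  ($ million)")
print(f"Chance of exceeding ${budget:.0f}M: {prob_over_budget:.0%}")
```

The probability of exceeding the budget answers "What would the total cost of a project be?" as a range with odds attached rather than a single number.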

When appraising wildcats or evaluating a project, Monte Carlo simulation can help you answer questions such as these: What are the initial well rates? What is the size of the reserves? How viable is the opportunity?

When creating a production forecast, Monte Carlo simulation can help estimate:

- Production demands
- Material requirements
- Material and labor costs
- Capacity
- Rate of return
- Net present value
- Economic forecasts

Your company can use Monte Carlo simulation for strategic planning and portfolio mix. You can simulate estimates of:

- Aggregate capital
- Revenue and NPV
- Rate of return
- Efficiency

Benefits of Using Monte Carlo Simulation: Summary


In summary, Monte Carlo Simulation is any numerical method that uses random sampling to construct the solution to a physical or mathematical problem. Because it can provide glimpses into both positive and negative outcomes, it is a powerful tool to help make important business decisions.

Review of Statistics Fundamentals: Objectives


Upon completion of this portion of the lesson, you will be able to:

- Identify three types of distributions: normal, lognormal, and triangular.
- Calculate measures of central tendency: median, mode, and mean.
- Define skewness.
- Calculate measures of dispersion such as standard deviation, range, and variance.
- Describe the relationship between two variables using correlation.

Review of Statistics Fundamentals: Lesson


Before we elaborate on the simulation process, you need to know a few fundamental statistical definitions. A sample Excel file named AGES.XLS demonstrates the various statistical terms. Right-click on the file name and select Save Target As (IE) or Save Link As (Netscape). Use the Save As dialog to select a drive and directory to store the spreadsheet, then click Save.

For data to be useful, they need to be sorted and classified. You can organize data in many ways. The frequency distribution is a typical method of organizing your data into a few manageable groups. It displays, either as percentages or as actual counts, the number of times a value occurs within a defined boundary. Each such boundary is called a class interval.

Our example uses a sample of 18 people. Their ages and weights are the data. In this figure, the ages are grouped into number of people vs. age categories.

Another way to view this data would be to organize percent vs. age categories. Notice that the shape of the graph is shorter than that of the previous figure.
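Building a frequency table by hand is easy to mimic in code. The sketch below groups ages into 10-year class intervals and reports each class both as a count and as a percentage, the two views described above. The ages are hypothetical; they are not the AGES.XLS data.

```python
from collections import Counter

# Hypothetical ages for illustration only; these are NOT the AGES.XLS values.
ages = [23, 27, 31, 34, 35, 38, 41, 42, 44, 47, 49, 52, 55, 58]

def frequency_table(values, start, width, n_classes):
    """Count the values falling in each class interval [lower, lower + width)."""
    counts = Counter((v - start) // width for v in values)
    table = []
    for i in range(n_classes):
        lower = start + i * width
        table.append((lower, lower + width, counts.get(i, 0)))
    return table

for lower, upper, count in frequency_table(ages, 20, 10, 4):
    pct = 100 * count / len(ages)
    print(f"{lower}-{upper - 1}: {count:2d} people ({pct:4.1f}%)")
```

Dividing each count by the sample size rescales the vertical axis from "number of people" to "percent of people" without changing which class holds the most values.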

The cumulative frequency distribution is used to compare one distribution with another. This is the distribution widely used in risk analysis.


By organizing our age data in cumulative percentages, we can construct a cumulative frequency graph. To do this, we need to calculate the percentage of "older than" and "younger than" for each class interval.

This figure represents the "older than" graph.
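The "younger than" and "older than" percentages are running totals over the class intervals, accumulated from opposite ends. Here is a minimal sketch using hypothetical class counts (not the AGES.XLS data); "older than" is computed as "at that age or older" so that the first class starts at 100%.

```python
# Hypothetical class counts: (lower bound, upper bound, count); not the AGES.XLS data.
classes = [(20, 29, 2), (30, 39, 4), (40, 49, 5), (50, 59, 3)]
total = sum(count for _, _, count in classes)

younger_than = []  # (age, % of people younger than that age)
older_than = []    # (age, % of people at that age or older)
running = 0
for lower, upper, count in classes:
    older_than.append((lower, 100 * (total - running) / total))
    running += count
    younger_than.append((upper + 1, 100 * running / total))

for (age_y, pct_y), (age_o, pct_o) in zip(younger_than, older_than):
    print(f"younger than {age_y}: {pct_y:5.1f}%   {age_o} or older: {pct_o:5.1f}%")
```

Plotting either column against age gives the cumulative curves; the "younger than" curve rises to 100% while the "older than" curve falls from 100%.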

We have been discussing cumulative probability distributions. Common distributions can also be represented by their probability density functions: the familiar bell-shaped curve (the normal distribution), an asymmetrical bell with a tail to the right (the lognormal distribution), and the triangular distribution. These density curves are simply the derivatives of the corresponding cumulative functions. They are useful in depicting ranges, modes, and general shapes, but are not very helpful when looking for percentiles.
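The link between the two curves can be checked numerically. The sketch below uses Python's `statistics.NormalDist` (Python 3.8+) for a standard normal distribution: the slope of the cumulative curve, estimated with a central difference, matches the density at each point. Percentiles, by contrast, are read from the cumulative curve through its inverse.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)

def numeric_derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    slope = numeric_derivative(dist.cdf, x)
    print(f"x={x:+.1f}  slope of CDF={slope:.5f}  pdf={dist.pdf(x):.5f}")

# Percentiles come from the cumulative curve (or its inverse), not the density.
print("P90 =", round(dist.inv_cdf(0.90), 4))
```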

Mathematicians throughout the ages have found that, when measuring certain physical characteristics, the resulting data often follow a consistent frequency distribution that is called the normal distribution.

Notice that the bell-shaped curve is symmetric and that the mean, median, and mode occur at the same location. The normal distribution actually extends infinitely in each direction, but it is customary to draw it to extend only three standard deviations on either side of the mean.

[Figure: normal distribution; point A marks the mode, median, and mean, which coincide]

The lognormal is a right-skewed curve. Notice that for this distribution, the mean, median, and mode have different values. Recall that in the normal distribution, all three values are the same.

[Figure: lognormal distribution with points labeled A, B, and C]

You will find that the lognormal distribution is very important in Monte Carlo simulation used for upstream petroleum models.

The lognormal distribution actually extends from zero to infinity. It always represents items with positive values. There is no conventional cutoff point, as there is for a normal distribution.
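For a lognormal variable whose logarithm is normal with mean μ and standard deviation σ, the closed-form values are mode = exp(μ - σ²), median = exp(μ), and mean = exp(μ + σ²/2), so mode < median < mean whenever σ > 0. The sketch below uses hypothetical parameters (μ = 2.0, σ = 0.6) and confirms the ordering by sampling with `random.lognormvariate`; every draw is positive, as the text says.

```python
import math
import random
import statistics

# Hypothetical parameters of the underlying normal distribution of ln(X).
mu, sigma = 2.0, 0.6

# Closed-form values for a lognormal distribution.
mode = math.exp(mu - sigma ** 2)
median = math.exp(mu)
mean = math.exp(mu + sigma ** 2 / 2)
print(f"mode={mode:.2f}  median={median:.2f}  mean={mean:.2f}")

# Check by sampling: the sample median and mean land near the formulas.
rng = random.Random(7)
sample = [rng.lognormvariate(mu, sigma) for _ in range(50000)]
print(f"sample median={statistics.median(sample):.2f}  "
      f"sample mean={statistics.mean(sample):.2f}  smallest draw={min(sample):.3f}")
```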

[Figure: triangular distribution, peaking at the most likely value]

The triangular distribution uses three points: the minimum, the maximum, and the most likely. When choosing a minimum value, make sure it is a value lower than the lowest value that could ever occur. Likewise, the maximum value should be a value higher than the highest value that could ever occur.

Ranges of values around the most likely point should have the highest probability of occurrence.

Use your experts to help decide on the maximum and minimum points. Make sure they understand that these values are absolute limits that will never actually be reached.
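A triangular input is simple to sample. The sketch below draws from a triangular distribution built on hypothetical expert estimates for an initial well rate (200, 450, and 900 bbl/d). Draws never fall outside the minimum and maximum, and the long-run average approaches (minimum + most likely + maximum) / 3, the mean of a triangular distribution. Note that Python's `random.triangular` expects the arguments in the order (low, high, mode).

```python
import random
import statistics

# Hypothetical expert estimates for an initial well rate (bbl/d).
minimum, most_likely, maximum = 200.0, 450.0, 900.0

rng = random.Random(3)
# Argument order is (low, high, mode).
draws = [rng.triangular(minimum, maximum, most_likely) for _ in range(20000)]

print(f"lowest draw  {min(draws):.0f}  (never below {minimum:.0f})")
print(f"highest draw {max(draws):.0f}  (never above {maximum:.0f})")
print(f"sample mean  {statistics.mean(draws):.0f}  vs (min + mode + max) / 3 = "
      f"{(minimum + most_likely + maximum) / 3:.0f}")
```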

Now that we have learned a little about distributions, the terms on the following pages will help us define specific characteristics of a distribution. We will use the age example to explain these terms.




The Median is the point that separates the members of the data set into two groups, each with an equal number of samples. The median is also referred to as the P50 or the 50th percentile. For a data set with an even number of data points (N), Excel takes the average of the two middle numbers (41 and 44 in this sample), which is why our median for age is 42.5.
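Python's `statistics.median` follows the same rule as Excel: the middle value when N is odd, the average of the two middle values when N is even. The odd example reuses the symmetric data from the skewness discussion; the even example is a hypothetical set whose middle pair (41 and 44) mirrors the age sample.

```python
import statistics

odd = [10, 20, 30, 40, 50]        # N = 5: the median is the middle value
even = [29, 35, 41, 44, 52, 60]   # hypothetical N = 6: average of the two middle values

print(statistics.median(odd))    # 30
print(statistics.median(even))   # 42.5, i.e. (41 + 44) / 2
```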

The Mode is the value that occurs most frequently within a sample. The mode in our age example is 29. Although 49 also occurs twice, Excel returns the first occurring number (not necessarily the smallest) when there is a tie.
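The tie-breaking rule can be seen in Python as well: since Python 3.8, `statistics.mode` returns the first mode encountered in the data, which matches the Excel behavior described above, while `statistics.multimode` lists every tied value. The data here are hypothetical, arranged so that 49 appears before 29.

```python
import statistics

# Hypothetical data in which both 29 and 49 occur twice, with 49 appearing first.
data = [49, 35, 29, 60, 49, 41, 29, 44]

print(statistics.mode(data))       # 49: the first mode encountered
print(statistics.multimode(data))  # [49, 29]: every tied value, in order of first appearance
```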

The Mean of this example is the arithmetic average: the sum of the values divided by the total number of measurements. In this case, we add up all the ages (769) and divide by the sample size, or count (18). The mean in this age example is 42.72.

What would happen to the Mean, Median, and Mode if we changed the value of the highest age from 60 to 75?

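Only the mean changes. It is now 43.56. Notice how the mode and the median remain the same: the mean uses the value of every data point, whereas the median depends only on the middle position and the mode only on the most frequent value.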
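The same experiment can be repeated on the right-skewed example from the skewness discussion (10, 20, 30, 30, 30, 30, 70, 100): raising only the largest value moves the mean but leaves the median and mode untouched. The last line reproduces the lesson's own arithmetic for the 18 ages (sum 769, with 60 replaced by 75).

```python
import statistics

data = [10, 20, 30, 30, 30, 30, 70, 100]
changed = data[:-1] + [130]   # raise only the largest value

for label, values in (("original", data), ("largest raised", changed)):
    print(label, statistics.mean(values), statistics.median(values), statistics.mode(values))

# The same arithmetic for the lesson's age example (sum 769, N = 18):
print(round(769 / 18, 2), round((769 - 60 + 75) / 18, 2))  # 42.72 43.56
```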

These measures, mean, median, and mode are often referred to as characteristics of central tendency. They are very useful and essential in risk analysis.

Skewness is a measure of the lopsidedness of a distribution. It illustrates the relationship between the mode, median, and mean.

When the skewness = 0, the data are symmetric, as in a normal distribution: 10, 20, 30, 40, 50.

When the skewness < 0, there are a few numbers much smaller than the mean: 1, 2, 30, 30, 30, 40, 50.

When the skewness > 0, there are a few numbers much larger than the mean: 10, 20, 30, 30, 30, 30, 70, 100.

These data might have come from a lognormal distribution, which is always skewed right (has positive skewness).

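To calculate the skew, Excel uses this formula:

$$\mathrm{SKEW} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$

where $n$ is the number of data points, $\bar{x}$ is the mean, and $s$ is the sample standard deviation (the N - 1 version). It is almost the average cubed deviation from the mean, divided by the cube of the standard deviation; the factor $n/[(n-1)(n-2)]$ is close to $1/n$.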
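The formula translates directly into Python. This sketch is written from the formula above (it is not Excel's source code) and applies it to the three example data sets, giving zero, negative, and positive skewness respectively.

```python
import statistics

def excel_skew(values):
    """Sample skewness with Excel's SKEW adjustment: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (N - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

for data in ([10, 20, 30, 40, 50],
             [1, 2, 30, 30, 30, 40, 50],
             [10, 20, 30, 30, 30, 30, 70, 100]):
    print(data, round(excel_skew(data), 3))
```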

At times you will need to know more about a set of data than the characteristics of its central tendency. Measures of dispersion can tell us more about the data set as a whole.

Range is a descriptive device. It expresses the gap between the extremes of the data (the maximum minus the minimum). In this example, our age range is 31 (60 - 29).

Variance indicates how scattered the data are.

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$$

It is the sum of the squared differences between the individual values and the mean, divided by the number of data points in the population (N).

If you are calculating variance from a sample, you need to divide by N - 1 (VAR in the Excel spreadsheet) instead of N (VARP). In our example, if these 18 people were the entire population (for example, a physics class at a university), the variance would be 99.98. However, it is more likely that this group is a sample of a larger population (such as all undergraduate students at the university). Using N - 1, the variance becomes 105.86.
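Python's `statistics` module offers both versions: `pvariance` divides by N (like VARP) and `variance` divides by N - 1 (like VAR), so their ratio is always N / (N - 1). The data are the right-skewed example from the skewness discussion; the last line checks the lesson's figures by converting the population variance of the 18 ages to the sample variance.

```python
import statistics

data = [10, 20, 30, 30, 30, 30, 70, 100]  # right-skewed example from the skewness discussion

varp = statistics.pvariance(data)  # divide by N, like Excel's VARP
var = statistics.variance(data)    # divide by N - 1, like Excel's VAR
n = len(data)
print(varp, var, var / varp)       # the ratio is N / (N - 1)

# Converting the lesson's population variance for the 18 ages:
print(round(99.98 * 18 / 17, 2))   # 105.86
```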

The problem with variance is that much of what we measure cannot be thought of in terms of squared units. In this example, how would you use units of years squared?

Standard Deviation, a measure of dispersion, solves this squared-unit problem. It is the square root of the variance.

This is the most popular measure of the dispersion of distributions. It is also the most important way to describe continuous distributions (distributions that assume there are an infinite number of possible values, uninterrupted over a range). In this example, the standard deviation using N (STDEVP) is 10.00. Using N - 1 (STDEV), it is 10.29.

What happens when we change one of the ages from 60 to 75? Because the standard deviation is calculated from the deviations of every value around the mean, every single data point affects it.
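To see this effect, the sketch below raises the largest value of the right-skewed example from the skewness discussion and recomputes both versions of the standard deviation. `statistics.pstdev` corresponds to Excel's STDEVP (divide by N) and `statistics.stdev` to STDEV (divide by N - 1); both increase when a single outlying value moves further from the mean.

```python
import statistics

data = [10, 20, 30, 30, 30, 30, 70, 100]
changed = data[:-1] + [130]   # move only the largest value

for label, values in (("original", data), ("largest raised", changed)):
    # pstdev ~ Excel STDEVP (N); stdev ~ Excel STDEV (N - 1)
    print(f"{label:15s} STDEVP={statistics.pstdev(values):.2f}  STDEV={statistics.stdev(values):.2f}")
```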

Now that we know how to calculate the standard deviation, how is it linked to probability? In a normal distribution, the region from the mean to one standard deviation on one side contains about 34.1% of the total observations. Therefore, if we measure one standard deviation to the right and one standard deviation to the left of the mean, the area covered is 68.3%.

A randomly selected value from this distribution would have only a 31.7% chance of occurring outside this area.

Two standard deviations on either side of the mean include about 95.45% of the total curve. A randomly selected value from this distribution would have only about a 4.55% chance of occurring outside this area.

Three standard deviations cover 99.7% of the area. A randomly selected value would have only a 0.3% chance of occurring outside this area.
This is how normal distributions, standard deviations, and the mean produce estimates of probability.
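These percentages follow from the normal cumulative distribution: the area within k standard deviations of the mean is CDF(k) - CDF(-k) for a standard normal. The sketch below computes them with `statistics.NormalDist` (Python 3.8+) and then checks the one-standard-deviation figure by random sampling, in the spirit of a Monte Carlo run.

```python
import random
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {coverage(k):.2%}   outside: {1 - coverage(k):.2%}")

# The same proportion emerges from random sampling.
rng = random.Random(11)
draws = [rng.gauss(0.0, 1.0) for _ in range(100000)]
print(f"simulated within ±1 SD: {sum(abs(x) <= 1 for x in draws) / len(draws):.2%}")
```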

The next important term is correlation. Correlation (CORREL) is a relationship between two variables.

Correlation is always between -1 and 1. When it is 0, the XY-scatter plot has no apparent trend or relationship. If the correlation is less than 0, then as X increases, Y tends to decrease. If the correlation is greater than 0, then as X increases, Y tends to increase. In this example, there is a negative correlation between age and weight (CORREL = -0.32). According to these data, as people get older, their weight tends to decrease somewhat.
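The ordinary (Pearson) correlation that CORREL returns can be written in a few lines, and rank correlation, which the AGES.XLS notes say Monte Carlo software uses, is the same calculation applied to the ranks of the data. This is a sketch: the age and weight values are hypothetical, and the simple ranking helper does not average tied values, so it is only suitable for tie-free data.

```python
import math

def correl(x, y):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank positions 1..n (ties are not averaged; fine for tie-free data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def rank_correl(x, y):
    """Rank correlation: CORREL applied to the ranks of the data."""
    return correl(ranks(x), ranks(y))

# Hypothetical ages and weights (not the AGES.XLS data).
age = [25, 31, 38, 44, 50, 57]
weight = [180, 172, 176, 165, 168, 160]
print(round(correl(age, weight), 3), round(rank_correl(age, weight), 3))
```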

One final concept that is essential to Monte Carlo simulation is that of sensitivity.

Sensitivity analysis identifies which input variables have the largest impact on your model. These are the variables contributing most to the uncertainty in the output.
Statistically, sensitivity is measured by the correlation coefficient between the inputs and the outputs. This will be discussed further in the first Crystal Ball or @RISK exercise of the lesson.
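As a rough illustration of correlation-based sensitivity, the sketch below runs a hypothetical model (output = 3a + b, with both inputs uniform on 0 to 10) and correlates each input's sampled values with the output. Input a, which drives most of the variation, shows the larger correlation (about 0.95 versus about 0.32 in theory). For simplicity it uses ordinary correlation; the packages' own sensitivity charts may use rank correlation instead.

```python
import math
import random

def correl(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(5)
n_trials = 5000
# Hypothetical model: the output depends strongly on a and weakly on b.
a = [rng.uniform(0.0, 10.0) for _ in range(n_trials)]
b = [rng.uniform(0.0, 10.0) for _ in range(n_trials)]
output = [3.0 * ai + bi for ai, bi in zip(a, b)]

sensitivity = {"a": correl(a, output), "b": correl(b, output)}
# List inputs from most to least influential.
for name, r in sorted(sensitivity.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {r:+.2f}")
```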

Review of Statistics Fundamentals: Summary


In this section we have learned that distributions are a useful method for organizing data. The three distributions commonly used in Monte Carlo simulations are normal, lognormal, and triangular. Measures of central tendency, measures of variance, skew, and correlation provide descriptive and comparative information about a distribution. Knowledge of these fundamental statistical terms is essential when creating Monte Carlo simulations, the next topic of this lesson.

Using Monte Carlo Simulation

Using Monte Carlo Simulation: Objectives


Many software companies have developed statistical programs to run Monte Carlo simulations. @Risk and Crystal Ball are the two most widely used packages; both are add-ins to Excel. This part of the lesson will explain:

- Learning @Risk and Crystal Ball menus
- Running Monte Carlo simulations
- Analyzing three common distributions

Using Monte Carlo Simulation: Exercises


Please choose an exercise: Crystal Ball or @Risk.

Using Monte Carlo Simulation: Summary


Although both @Risk and Crystal Ball provide an easy method of generating Monte Carlo simulations, keep in mind that they can only produce results based upon your input.



For the most accurate results, remember these key fundamentals of risk analysis: You must be able to isolate key variables. Without this information, resulting output could be poor and unusable. You must be able to quantify these key variables. For example, a lot of concrete data are available when trying to ascertain the risk involved in development drilling. Less data are available on outpost drilling and the least amount of data are available for wildcat drilling.



Another fundamental of risk analysis is your basic view of uncertainty. Choose a view that reflects both upside and downside possibilities as well as the most probable values. The triangular distribution is a popular way to capture these parameters.

Search for reality checks: by correlation, by comparison to similar but known situations, or by checks against limits set by reality.

Glossary
Assumption
An estimated value (input to a spreadsheet model in Crystal Ball)

Coefficient of Variability

A measure of relative variation that relates the Standard Deviation to the mean. Results are represented in percentages for comparison purposes.
(Also called Coefficient of Variance or Coefficient of Variation)

Continuous Probability Distribution

A probability distribution that describes a set of uninterrupted values over a range. In contrast to a discrete distribution, a continuous distribution assumes there are an infinite number of possible values.

Correlation

Relationship between two variables.

Correlation Coefficient

A number between -1 and +1 that describes the degree of positive or negative correlation between variables. A correlation of +1 indicates perfect positive correlation, while -1 shows perfect negative correlation; 0 indicates there is no correlation.

Discrete Probability Distribution

A probability distribution that describes distinct values, usually integers, with no intermediate values. In contrast, a continuous distribution assumes there are an infinite number of possible values.

Expected Value (Mean)

Sum of all the values in a set divided by the total number of values in the set.

Forecast

In Crystal Ball, an output for a simulation model.

Iteration (Trial)

One calculation of the user's model during a simulation. A simulation consists of many recalculations or iterations.

Mean (Expected Value)

Sum of all the values in a set divided by the total number of values in the set.

Mean Standard Error

The Standard Deviation of the distribution of possible sample means. This statistic gives one indication of the accuracy of the simulation. Algebraically, this is the standard deviation divided by the square root of N.

Median

For data, the middle number (given an odd number of items) or the average of the two middle numbers (given an even number of items). For a continuous distribution, the value for which there is a 50% probability of not being exceeded (i.e., P50 is the 50th percentile).

Mode

For data, the mode is the item that repeats most. For a continuous distribution, the mode is the value corresponding to the highest point on the probability density function.

Monte Carlo Simulation

Any numerical method that uses random sampling to construct the solution to a physical or mathematical problem. It refers to the traditional method of sampling random variables in simulation modeling. Samples are chosen completely randomly across the range of the distribution, thus necessitating large numbers of samples for convergence for highly skewed or long-tail distributions.

Probability

The measure of how likely a value or event is to occur.

Probability Distribution

A set of all possible events and their associated probabilities.

Range

The difference between the largest and smallest values in a data set. Range is the simplest measure of the dispersion, or "risk," of a distribution.

Risk (Uncertainty)

The uncertainty or variability in the outcome of some event or decision.

Sensitivity

The extent to which a simulation output is influenced by each of the inputs. Thus, an output is more sensitive to some input variables than to others. Sensitivity is measured by a correlation coefficient between the output and the input.

Skewness

The measure of the shape or degree of asymmetry of a distribution. A negatively skewed distribution has most of its values at the upper end of the range; a positively skewed distribution has most of its values at the lower end of the range. A normal distribution has no skewness.

Standard Deviation

The square root of the variance for a distribution. It is the measure of how widely dispersed the values are in a distribution.

Trial (Iteration)

One calculation of the user's model during a simulation. A simulation consists of many recalculations or iterations.

Uncertainty (Risk)

The uncertainty or variability in the outcome of some event or decision.

Variance

The square of the Standard Deviation. It is the measure of how widely dispersed the values are in a distribution. It is one indicator of uncertainty. Variance gives disproportionate weight to outliers, or values that are far away from the mean. When values are close to the mean, the variance is small; when they are widely scattered, the variance is larger.

AGES.XLS worksheet (summary of contents)

The worksheet lists 18 people (count 1 to 18) with each person's age (X) and weight (Y), along with columns for the squared deviation (x - xave)^2 and the cubed deviation (x - xave)^3 of each age from the mean. For the first person (age 29, weight 165), these are 188.30 and -2583.89.

To get the following summary in Excel: Tools > Data Analysis > Descriptive Statistics.

Age (Column 1)
Mean: 42.72
Standard Error: 2.43
Median: 42.50
Mode: 29.00
Standard Deviation: 10.29
Sample Variance: 105.86
Kurtosis: -1.17
Skewness: 0.22
Range: 31.00
Minimum: 29.00
Maximum: 60.00
Sum: 769.00
Count: 18.00

Worksheet functions (ages)
VARP: 99.98
VAR: 105.86
STDEVP: 10.00
STDEV: 10.29
Average cubed deviation divided by the cube of STDEVP: 0.20
SKEW: 0.22
CORREL (age, weight): -0.322008

Nicknames and notes
- Mean = average, arithmetic average
- Median = P50, the 50th percentile
- Mode = most likely
- VARP is the average of the squared deviations from the mean (column C).
- VAR is the sum of the squared deviations divided by N - 1 (instead of N).
- STDEVP is the square root of VARP; STDEV is the square root of VAR.
- SKEW is almost the average cubed deviation from the mean, divided by the cube of the standard deviation. Check out the formula for SKEW in Excel: it uses the factor N/[(N-1)(N-2)], which is close to 1/N. When SKEW is between -0.1 (or even -0.2) and 0.2, the histogram would appear symmetric.
- CORREL is the ordinary correlation coefficient between X and Y, and CORREL(X,Y) = CORREL(Y,X). CORREL is always between -1 and 1. When it is 0, the XY-scatter plot has no apparent trend. CORREL < 0 indicates that as X increases, Y has a tendency to decrease; CORREL > 0 indicates that as X increases, Y has a tendency to increase.
- Monte Carlo software uses rank correlation, which is CORREL applied to the ranks of the data.

A second Descriptive Statistics output (12 values)
Mean: 41.42
Standard Error: 1.25
Skewness: -0.12
Range: 15.00
Minimum: 33.00
Maximum: 48.00
Sum: 497.00
Count: 12.00
