02 - Probabilities, Distribution and Correlation

Risk, Uncertainty & Economic Analysis for Resource Assessment and Production Forecasting in Shale and Tight Reservoirs
Course outline: Introduction; Probability, Distributions and Correlation; Estimating Under Uncertainty; Tight Clastics / Carbonate Assessment; Shale Assessment; Reservoir Flow; Valuation Techniques
"There's nothing more dangerous than a sample of one."
Quote from E.C. Capen
Rose & Associates, LLP
Ch 2 - Probability, Distributions, and Correlations AAPG Cartagena 2D course, Sept. 2013
Definitions
Event: One of two or more things which can occur.
Outcome: What does occur.
Probability: Subjective confidence about the likelihood of an uncertain future event, given trials.
Distribution: An orderly portrayal of related data samples selected from a population. Portrayal types include frequency, % frequency, cumulative frequency, cumulative % frequency, and log - cumulative probability (probit). Various shapes are available in commercial software for curve fitting and forecasting.
Frequency Distributions
Distribution of gross pay intervals, your trend.
Helpful distribution; let's call the peak frequency the mode.
What % of the outcomes are greater than or equal to 80? The frequency distribution is not that helpful anymore; we need another way of looking at the data.
[Figure: histogram of gross pay with the mode marked; x axis: Feet (20 to 100); y axis: Frequency (0 to 60); secondary axis: Probability <= (0% to 100%)]
Cumulative % Distributions
Distribution of gross pay intervals, your trend.
The frequency distribution is not that helpful anymore; we need another way of looking at the data.
Let's accumulate the outcomes from small to large.
Let's also convert the frequency to a % of the total population.
[Figure: cumulative % curve; x axis: Feet (20 to 100); y axis: Probability of <= (0% to 100%)]
Cumulative Frequency Distributions

Distribution of gross pay intervals, your trend.
The frequency distribution is not that helpful anymore; we need another way of looking at the data.
Let's accumulate the outcomes from large to small.
Let's also convert the frequency to a % of the total population.
[Figure: histogram with the mode marked, followed by the cumulative % curve; x axis: Feet (20 to 100); y axis: Probability of >= (0% to 100%)]
Distribution of gross pay intervals, your trend.
[Figure: both cumulative curves on one plot; x axis: Feet (20 to 100); left axis: Probability of <= (0% to 100%); right axis: Probability of >= (100% to 0%)]
One of the main standards in risk analysis is selecting how you accumulate frequency distributions. In this course, we will accumulate from the large end, referred to as the "greater than" convention (P10 is a big number), which is consistent with the latest PRMS guidelines. A percentile is a ...?
Plotting Conventions
Definitions: % >= (GE) or % <= (LE)
Evolving preference: % >= (GE)
Explorers think in terms of large discoveries
Consistent with SEC / SPE / WPC / AAPG guidelines
Commercial threshold truncations easier to apply
Less confusing for decision makers
In a Greater Than convention:
P10 is the larger number
P90 is the smaller number
What Are P10 and P90?

In the GE convention:
P10 is the value on the distribution for which there is a 10% probability that a random selection from that distribution will be greater than or equal to that value; this is a large number.
P90 is the value on the distribution for which there is a 90% probability that a random selection from that distribution will be greater than or equal to that value; this is a small number.
In the LE convention:
P10 is the value on the distribution for which there is a 10% probability that a random selection from that distribution will be less than or equal to that value; this is a small number.
P90 is the value on the distribution for which there is a 90% probability that a random selection from that distribution will be less than or equal to that value; this is a large number.
These definitions apply to any P value.
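The two conventions can be illustrated with a minimal Python sketch. The helper names, the linear-interpolation rule and the sample data are assumptions made for illustration; they are not part of the course material.

```python
def percentile_le(values, pct):
    """Value with `pct` percent of the sample at or below it (linear interpolation)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    pos = (len(xs) - 1) * pct / 100.0   # fractional index into the sorted sample
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_ge(values, pct):
    """Value with `pct` percent of the sample at or above it (GE convention)."""
    return percentile_le(values, 100.0 - pct)


gross_pay_ft = list(range(0, 101))  # hypothetical sample: 0, 1, ..., 100 ft

print("GE: P10 =", percentile_ge(gross_pay_ft, 10), " P90 =", percentile_ge(gross_pay_ft, 90))
print("LE: P10 =", percentile_le(gross_pay_ft, 10), " P90 =", percentile_le(gross_pay_ft, 90))
```

In the GE convention the P10 of this sample is the large value and the P90 the small one; the LE convention swaps them.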
EXERCISE 2: HISTOGRAM OF DIE ROLLS

1. Roll your die six times, placing an x in the box that matches the die outcome.
   a. What is the shape of the distribution?
   b. What is the average outcome of your 6 samples?
2. Roll your die six more times, adding the results to your plot such that n = 12.
   a. What is the shape of the distribution?
   b. What is the average outcome of your 12 samples?
3. Roll your die 12 more times, adding the results to your plot such that n = 24.
   a. What is the shape of the distribution?
   b. What is the average outcome of your 24 samples?
[Figure: blank histogram grid for the exercise; x axis: Die Face Outcome (1 to 6); y axis: Frequency]
It takes more trials than you think to recreate the population average!
Note how the sample average approaches the population average only at large sample sizes. The population mean (3.5) is not an actual die face. The concept of the average comes from observations of numerous repeated trials.
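The exercise can also be run numerically. The following is a small sketch, assuming a fair six-sided die and an arbitrary fixed seed; the function name is hypothetical.

```python
import random


def average_of_rolls(n_rolls, rng):
    """Average face value over n_rolls of a fair six-sided die."""
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return sum(rolls) / n_rolls


rng = random.Random(2013)  # fixed seed (arbitrary) so the run is repeatable
for n in (6, 12, 24, 1000, 100000):
    print(f"n = {n:>6}: sample average = {average_of_rolls(n, rng):.3f}")
```

The small samples of 6, 12 and 24 rolls can land well away from 3.5; only the large samples settle close to it.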
EXERCISE 3: CUMULATIVE FREQ. PLOT
[Figure: cumulative count plots by die face. LE panel: "1 or less", "3 or less", "5 or less". GE panel: "2 or more", "4 or more", "6 or more". y axis: Count or Frequency and % Frequency (25% to 100%); x axis: Die face]
Key points:
Best to know how certain shapes come about, since you are accountable for your forecasts.
You are able to make better forecasts with less data when you know the inherent shape, hence the need to cultivate analogs!
Cumulative percentage plots (especially in a greater than convention) provide insight into the probability of future occurrence.
Sums of Independent Distributions

Die One (likelihood) plus Die Two (likelihood) gives the Outcome.
[Figure: histogram, Outcome of Two Dice Summed; y axis: Frequency]
[Figure: histogram, Outcome of Four Dice Summed; y axis: Frequency]
TENDS TOWARD A SYMMETRICAL (NORMAL) DISTRIBUTION
Products of Independent Distributions

Die One (likelihood) times Die Two (likelihood) gives the Outcome.
[Figure: histogram, Outcome of the Product of Two Dice; y axis: Frequency]
[Figure: histogram, Outcome of the Product of Four Dice (0 to 1,200); y axis: Frequency]
TENDS TOWARD AN ASYMMETRICAL (LOGNORMAL) DISTRIBUTION
[Figure: Niobrara Shale example; Coskey (2011)]
Central Limit Theorem

The SUM of a group of independent random variables tends towards a NORMAL DISTRIBUTION.
[Figure panel: Linear Scale]
The PRODUCT of a group of independent random variables tends towards a LOGNORMAL DISTRIBUTION.
[Figure panels: Linear Scale, Log Scale]
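The dice illustration can be checked by enumerating every outcome rather than sampling. This sketch assumes four fair dice; it only compares the mean with the median as a simple symmetry check and does not test for normality or lognormality as such.

```python
from itertools import product
from math import prod
from statistics import mean, median

faces = range(1, 7)
rolls = list(product(faces, repeat=4))   # all 6**4 = 1296 equally likely outcomes

sums = [sum(r) for r in rolls]           # additive process
products = [prod(r) for r in rolls]      # multiplicative process

print("sum of four dice:     mean =", mean(sums), " median =", median(sums))
print("product of four dice: mean =", mean(products), " median =", median(products))
```

For the sum, the mean and median coincide (symmetric shape). For the product, the mean sits well above the median, which is the signature of a right-skewed, lognormal-like shape.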

A commonly used plot: the log - cumulative probability (aka probit) coordinate system.
More on the probit scale
[Figure: distribution of the product of various numbers of dice on log-probit axes; y axis marked in standard deviations, -3 to +3]
A logarithmic x axis telescopes the uncertainty to a manageable scale, and the probit y axis was developed to permit highly asymmetrical distributions to appear as a straight line.
Understanding the Cumulative Probability Y-Axis

Linear vs Probit: Straight lines are normal distributions.
[Figure: probit y axis labelled in both conventions (P99 to P01 and P01 to P99); linear x axis]
Understanding the Cumulative Probability Y-Axis

Log vs Probit: Straight lines are lognormal distributions.
[Figure: probit y axis labelled in both conventions (P99 to P01 and P01 to P99), with P90, P50 and P10 marked and +1 / -1 standard deviation indicated; log x axis, 1 to 100]
If the x axis were linear, then normal distributions would plot as straight lines.
A logarithmic x axis telescopes the uncertainty to a manageable scale, and the probit y axis was developed to permit highly asymmetrical distributions to appear as a straight line. Note also how the scale balances the area of a standard normal distribution under equal percentile segments.
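A short sketch of why a lognormal plots as a straight line on log-probit axes: the probit axis places each percentile at its standard normal deviate z, and for a lognormal the natural log of each percentile value is linear in z. The function name and the parameter values are assumptions for illustration.

```python
from math import exp, log
from statistics import NormalDist


def probit_ge(pct_exceed):
    """Standard normal deviate for a 'greater than or equal' percentile (P10 -> about +1.28)."""
    return NormalDist().inv_cdf(1.0 - pct_exceed / 100.0)


mu, sigma = log(90.0), 1.8   # hypothetical lognormal: median 90, std dev of natural logs 1.8

for p in (99, 90, 50, 10, 1):
    z = probit_ge(p)
    value = exp(mu + sigma * z)      # percentile value of the lognormal
    print(f"P{p:02d}: z = {z:+.3f}  ln(value) = {log(value):.3f}  value = {value:,.1f}")
```

Plotting ln(value) against z gives a line with intercept mu and slope sigma, which is what the log x axis and probit y axis display directly.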
Practical Attributes of Key Parameters

P01: Geologically possible, but extremely unlikely
P10: Reasonable Maximum
P50: Half below, half above (the median)
P90: Reasonable Minimum
P99: As small as it could be . . . yet detectable
P01 and P99 serve as constraints or reality checks; P10 to P90 spans the 80% confidence level.
[Figure: lognormal line on log-probit axes with P01, P10, P50, P90 and P99 marked; x axis: 1 to 10,000]
The fact that P10 / P50 = P50 / P90 for lognormal distributions gives three very useful equations:
$P50 = \sqrt{P10 \times P90}$
$P10 = P50^2 / P90$
$P90 = P50^2 / P10$
Never estimate what you can calculate!
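The three relations are easy to apply in code. This is a minimal sketch that assumes the distribution is lognormal; the function names are hypothetical, and the inputs 900 and 9 are the P10 and P90 quoted for the Fruitland Coals example in this chapter.

```python
from math import sqrt


def p50_from(p10, p90):
    """Median of a lognormal from its P10 and P90."""
    return sqrt(p10 * p90)


def p10_from(p50, p90):
    """P10 of a lognormal from its P50 and P90."""
    return p50 ** 2 / p90


def p90_from(p50, p10):
    """P90 of a lognormal from its P50 and P10."""
    return p50 ** 2 / p10


print("P50 =", p50_from(900.0, 9.0))
print("P10 =", p10_from(90.0, 9.0))
print("P90 =", p90_from(90.0, 900.0))
```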
Measures of Central Tendency For Lognormal Distributions

Mode: The most frequently occurring value of a data set. Occupies the peak of the frequency curve.
$\text{Mode} = e^{\mu - \sigma^2}$
Median: The point at which there is an equal probability of being larger or smaller. The P50 of a continuous distribution is not equal to the Median of a sample!
$\text{Median} = e^{\mu}$
Mean: The average of all outcomes.
$\text{Mean} = e^{\mu + 0.5\sigma^2}$
Where,
$\mu$ = natural log of the median (mean of natural logs) = $[\ln(P90) + \ln(P10)] / 2$
$\sigma$ = standard deviation of natural logs = $[\ln(P10) - \ln(P90)] / (2 \times 1.282)$
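As a worked check, the formulas can be evaluated for the P10 = 900 and P90 = 9 MCF/D quoted for the Fruitland Coals example in this chapter. This is a sketch only; the function name and the returned dictionary layout are assumptions.

```python
from math import exp, log

Z10 = 1.282  # standard normal deviate at the P10 / P90 points, as used in the text


def lognormal_stats(p10, p90):
    """Mode, median and mean of a lognormal defined by its P10 and P90 (GE convention)."""
    mu = (log(p90) + log(p10)) / 2.0            # mean of the natural logs
    sigma = (log(p10) - log(p90)) / (2.0 * Z10)  # std dev of the natural logs
    return {
        "mu": mu,
        "sigma": sigma,
        "mode": exp(mu - sigma ** 2),
        "median": exp(mu),
        "mean": exp(mu + 0.5 * sigma ** 2),
    }


stats = lognormal_stats(900.0, 9.0)
for name, value in stats.items():
    print(f"{name:>6} = {value:.2f}")
```

The results (mu about 4.5, sigma about 1.8, mean about 452, mode about 4) agree with the values shown on that slide.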
Measures of Dispersion
Spread (P10 / P90): Easy to understand, dimensionless, can be easily calculated.
Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
This equation provides us with an unbiased estimate of $\sigma^2$ given a sample of n values. $\bar{x}$ is the sampled mean from a population with a mean of $\mu$.
Standard Deviation: Square root of the variance.
Skewness: $\frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3$
An estimate of the degree of asymmetry of a sample of n values about $\bar{x}$. s is an estimate of $\sigma$ for a sample of n values.
Range: The difference between the largest and smallest values in a dataset.
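The dispersion measures can be computed directly from a sample. The sketch below uses the seven production rates from Exercise 4 of this chapter as input; the function names are hypothetical, and the formulas are the standard sample variance (n - 1 denominator) and adjusted sample skewness.

```python
from math import sqrt


def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def sample_skewness(xs):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - xbar) / s)**3)."""
    n = len(xs)
    xbar = sum(xs) / n
    s = sqrt(sample_variance(xs))
    return n / ((n - 1) * (n - 2)) * sum(((x - xbar) / s) ** 3 for x in xs)


rates = [850, 2500, 570, 1100, 160, 333, 1333]   # Exercise 4 rates, Mcfd

print("variance =", round(sample_variance(rates), 1))
print("std dev  =", round(sqrt(sample_variance(rates)), 1))
print("skewness =", round(sample_skewness(rates), 2))
print("range    =", max(rates) - min(rates))
```

A positive skewness is what one would expect for a right-skewed, lognormal-like sample.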

But where does "Most Likely" reside?
[Figure: the same log-probit plot of P01, P10, P50, P90 and P99, with constraints or reality checks at the extremes; x axis: 1 to 10,000]
Avg rate MCF / D in the Year of Maximum Production, Fruitland Coals, SJ Basin (n = 564)
From best fit line: P10 = 900, P50 = 90, P90 = 9 ($\mu$ = 4.5, $\sigma$ = 1.8)
Statistical Mean = 452
Mode = 4
P10 / P90 = 900 / 9 = 100
[Figure: log-probit plot; x axis: Maximum Annual Production, MCF / D (10 to 1,000)]
Ayers and Kaiser (1994)
Which single well metric is best transferable from the single well level to the program level?
Mean: 1.5? Median: 1.0? Mode: 0.4?
[Figure: single well EUR distribution on log-probit axes with P90, P50 and P10 marked; x axis: EUR from a prospect (0.1 to 10); Mode ~ 0.4, Mean ~ 1.50]
Single well simulation using Monte Carlo with Crystal Ball
To simulate a program of single wells we use a Monte Carlo simulation approach.
Here's how it works:
10 x Mode = 4. Does the sum of the 10 distributions' modes equal the mode of the sum? NO
10 x Median = 10. Does the sum of the 10 distributions' medians equal the median of the sum? NO
10 x Mean = 15. Does the sum of the 10 distributions' means equal the mean of the sum? YES
Which single well metric is best transferable from the single well level to the program level? The MEAN.
Only the mean is transferable from the well to the program level. The mean of the sum is the sum of the means.
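A minimal Monte Carlo sketch of a 10-well program, written with the Python standard library rather than Crystal Ball. It assumes each well is an independent lognormal with median 1.0 and mean 1.5 (the single-well values quoted above); the trial count, seed and function name are arbitrary choices for illustration.

```python
import random
from math import log, sqrt
from statistics import mean, median

MEDIAN_EUR, MEAN_EUR = 1.0, 1.5   # single-well values quoted on the slide
mu = log(MEDIAN_EUR)
sigma = sqrt(2.0 * log(MEAN_EUR / MEDIAN_EUR))   # from mean = median * exp(0.5 * sigma**2)


def simulate_programs(n_wells, n_trials, seed=0):
    """Total EUR of n_wells independent lognormal wells, repeated n_trials times."""
    rng = random.Random(seed)
    return [
        sum(rng.lognormvariate(mu, sigma) for _ in range(n_wells))
        for _ in range(n_trials)
    ]


totals = simulate_programs(n_wells=10, n_trials=20000)

print("mean of program total:  ", round(mean(totals), 2), " vs 10 x mean   =", 10 * MEAN_EUR)
print("median of program total:", round(median(totals), 2), " vs 10 x median =", 10 * MEDIAN_EUR)
```

The simulated program mean lands close to 10 x the single-well mean, while the program median is noticeably higher than 10 x the single-well median, so only the mean scales by simple addition.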
Lognormal Distributions: Estimates of the Population Mean
Arithmetic Mean = (sum of n values) / n
Statistical Mean = $e^{\mu + 0.5\sigma^2}$
where $\mu$ = natural log of the median = $[\ln(P90) + \ln(P10)] / 2$
and $\sigma$ = standard deviation of the natural logs = $[\ln(P10) - \ln(P90)] / (2 \times 1.282)$
Truncated Statistical Mean (P99 to P01)
Swanson's Mean = $0.3 \times P90 + 0.4 \times P50 + 0.3 \times P10$
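The following sketch compares Swanson's Mean with the statistical mean for lognormal inputs of increasing P10 / P90. The median of 100 and the list of ratios are hypothetical values chosen for illustration, and the truncated statistical mean is not computed here.

```python
from math import exp, log, sqrt


def swanson_mean(p10, p50, p90):
    """Swanson's Mean: 30/40/30 weighting of P90, P50 and P10."""
    return 0.3 * p90 + 0.4 * p50 + 0.3 * p10


def statistical_mean(p10, p90):
    """Mean of the untruncated lognormal defined by P10 and P90."""
    mu = (log(p90) + log(p10)) / 2.0
    sigma = (log(p10) - log(p90)) / (2.0 * 1.282)
    return exp(mu + 0.5 * sigma ** 2)


p50 = 100.0   # hypothetical median
for ratio in (2, 5, 10, 100):
    p10 = p50 * sqrt(ratio)    # keeps P10 / P50 = P50 / P90
    p90 = p50 / sqrt(ratio)
    print(f"P10/P90 = {ratio:>3}: Swanson's = {swanson_mean(p10, p50, p90):7.1f}"
          f"  statistical = {statistical_mean(p10, p90):7.1f}")
```

The two estimates agree closely at low P10 / P90 and separate as the ratio grows. For P10 = 900, P50 = 90 and P90 = 9, Swanson's Mean is 308.7 against a statistical mean of about 452.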
Swanson's Mean: Rationale and Origin
Swanson's Mean = (0.3)(P10) + (0.4)(P50) + (0.3)(P90)
The P10 is representative of the mean value of the top 30% of the curve.
The P50 is representative of the mean value of the central 40% of the curve.
The P90 is representative of the mean value of the lower 30% of the curve.
For the plotted distribution: Swanson's Mean = 456.4; Statistical Mean = 473.4; Truncated Statistical Mean = 445.7.
[Figure: lognormal line on log-probit axes with the three weighted segments marked; x axis: 10 to 10,000]
Swanson's Mean: Practical Application
If PV/BOE is the same for the discrete cases:
Success (Pc): PV of Mean
Failure (Pf): cost
If PV/BOE is different for the discrete cases:
Success (Pc): Mean of PV, from three branches weighted 0.3 (P10), 0.4 (P50) and 0.3 (P90)
Failure (Pf): cost
Variation of Means with Uncertainty

[Figure: two curves, (Statistical Mean - Truncated Mean) / Truncated Mean and (Swanson's Mean - Truncated Mean) / Truncated Mean; y axis: -10% to 40%; x axis: Uncertainty, P10 / P90 (1 to 100, log scale)]
Exercise 4: Production Rate Distribution
Production Rate (Mcfd): 850, 2500, 570, 1100, 160, 333, 1333
Using the mid-point approach, build a rate distribution on log probability paper:
a. Arrange all production rates by size, largest first, and assign a Rank.
b. Determine the percentile values: n = 7; Percentile = (100/n) * (Rank - 0.5)
c. Tabulate production rates with percentiles (%tile).
MCFD    Rank    %tile
2500    1       7.1
1333    2
1100    3
850     4
570     5
333     6
160     7
Plotting on Log Probability Paper
Positions on a log axis are proportional to the logarithm of the value, not to the value itself.
Within the decade from 100 to 1,000: 200 is 0.30 of the distance (about 1/3), 300 is 0.48 (about 1/2), and 500 is 0.70 (about 2/3).
Within the interval from 100 to 200 the fractions are smaller: 120 is 0.26 of the distance, 130 is 0.38, and 150 is 0.58.
[Figure: Patterns of Bias]
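When plotting by hand, the position of a value between two gridlines on a log axis can be computed rather than eyeballed. This is a small sketch; the function name is hypothetical.

```python
from math import log10


def log_axis_fraction(value, low, high):
    """Fraction of the distance from `low` to `high` at which `value` plots on a log axis."""
    return (log10(value) - log10(low)) / (log10(high) - log10(low))


print("Between 100 and 200:")
for v in (120, 130, 150):
    print(f"  {v}: {log_axis_fraction(v, 100, 200):.2f}")

print("Between 100 and 1,000:")
for v in (200, 300, 500):
    print(f"  {v}: {log_axis_fraction(v, 100, 1000):.2f}")
```

The fractions 0.30, 0.48 and 0.70 belong to 200, 300 and 500 within the 100 to 1,000 decade; between 100 and 200 the values 120, 130 and 150 plot at about 0.26, 0.38 and 0.58.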
Rank Order the Rates & Determine the Percentile

MCFD    Rank    Percentile
2500    1       7.1
1333    2       21.4
1100    3       35.7
850     4       50.0
570     5       64.3
333     6       78.6
160     7       92.9
[Figure: the seven rates plotted on log probability paper; y axis: probit scale labelled in both conventions (P99 to P01 and P01 to P99); x axis: 10 to 10,000 MCFD]
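The ranking and mid-point percentile steps of Exercise 4 can be sketched as follows. The function name and the tuple layout are assumptions; the formula is the one given in the exercise, Percentile = (100/n) * (Rank - 0.5).

```python
rates_mcfd = [850, 2500, 570, 1100, 160, 333, 1333]   # Exercise 4 data


def midpoint_percentiles(values):
    """Rank values largest first and assign mid-point percentiles (GE convention)."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    return [(v, rank, (100.0 / n) * (rank - 0.5))
            for rank, v in enumerate(ordered, start=1)]


table = midpoint_percentiles(rates_mcfd)

print("MCFD   Rank   Percentile")
for value, rank, pct in table:
    print(f"{value:>5}  {rank:>4}   {pct:>9.1f}")

print("Arithmetic mean =", round(sum(rates_mcfd) / len(rates_mcfd), 1))
```

The mid-point rule keeps the largest and smallest samples away from 0% and 100%, which cannot be plotted on a probit axis.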
Exercise 4: Production Rate Distribution

Questions:
1. What is the probability that a new well in this trend would be greater than or equal to 500 MCFD? __ 2000 MCFD? ___
2. What is the median & P50 production rate in this trend?
3. How do you know that this trend shows a lognormal distribution?
4. What is the arithmetic mean of this distribution?
5. Calculate the mean of this distribution using Swanson's Rule: Swanson's Mean = (0.3) * P90 + (0.4) * P50 + (0.3) * P10
Limitations of Lognormal and Normal Distributions

Lognormal distributions range from zero to positive infinity (0 to +∞).
Lognormal distributions are good for most EUR parameters, but any variable constrained by an upper limit may not be fairly represented (e.g., percentages such as porosity, saturation, recovery efficiencies or N/G).
Lognormal distributions require truncation.
Normal distributions range from negative infinity to positive infinity (-∞ to +∞).
All variables used in EUR calculations are positive values and may not be fairly represented by normal distributions.
Normal distributions require truncation.
Beta Distributions
Beta distributions range from zero to one (0 to 1).
Just about any variable that ranges from zero to one can be represented by a Beta distribution and not require truncation.
It can take on a variety of shapes, including symmetric, right or left skewed, or U-shaped.
It can be uniquely defined by mean and variance, or by P10 and P90.
Beta distributions are ideal for porosity, saturation, recovery efficiency, or N/G.
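Defining a Beta distribution from a mean and variance can be sketched with the method of moments. The porosity values (mean 0.20, variance 0.01), the function name and the seed are hypothetical; fitting from P10 and P90 requires a numerical solve and is not shown.

```python
import random
from statistics import mean


def beta_params(mean_value, variance):
    """Method-of-moments shape parameters (alpha, beta) of a Beta distribution on 0 to 1."""
    if not 0.0 < mean_value < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    common = mean_value * (1.0 - mean_value) / variance - 1.0
    if common <= 0.0:
        raise ValueError("variance is too large for a Beta distribution with this mean")
    return mean_value * common, (1.0 - mean_value) * common


alpha, beta = beta_params(0.20, 0.01)   # hypothetical porosity: mean 0.20, std dev 0.10
rng = random.Random(0)                   # fixed seed (arbitrary)
samples = [rng.betavariate(alpha, beta) for _ in range(20000)]

print("alpha =", round(alpha, 3), " beta =", round(beta, 3))
print("sample mean =", round(mean(samples), 3))
print("min =", round(min(samples), 3), " max =", round(max(samples), 3))
```

Every sample falls between 0 and 1 without any truncation step, which is the practical advantage over the normal and lognormal shapes for fractional variables.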
Examples of Beta Distributions

[Figure: four example Beta distributions; x axis: 0.00 to 1.00; y axis: 0.0 to 0.4]
Right skewed: water saturation, oil recovery efficiency, N/G in sand-shale sequences.
Left skewed: gas recovery efficiency, N/G of massive sands.
Low variance, largely symmetric: porosity.
U-shaped: percent oil or gas volume (high probability of one or the other, low probability of both together).
Beta Distribution Data Fit

N/G data from deepwater reservoirs.
Three distributions with means and variances that match the data.
Note the lack of fit with both the lognormal and normal curves.
The best fit is clearly the Beta distribution.
[Figure: probit plot of N/G (10% to 100%) showing the data with fitted Beta, Lognormal and Normal curves; y axis labelled in both conventions, P99.9 to P00.1]
Citron and others (2005). Data from the Cossey and Associates Deepwater Database.

Triangular Distributions
No naturally occurring process is triangularly distributed.
Artificial distribution, developed as a computational shortcut.
Particularly harmful as variance increases.
OK for narrow normal distributions.
Avoid triangular distributions when modeling lognormal distributions.
[Figure: triangular distribution; annotations: "Most Likely @ Mean", "Max @ P01?"]

Correlation:
Establishing whether, and the extent to which, variables interrelate is probably more important than deciding on the distributions.
Assuming independence when a dependency really exists biases the results, often quite significantly.
Campbell and others (2001)
The correlation coefficient is a measure of the degree of correlation between parameters.
A high positive correlation implies that, given a high input, the correlated variable will also sample a high value.
In this course we will only use linear correlation coefficients as a measure of the linear association between variables.
Correlation coefficients range from -1.0 to 1.0.
Negative coefficients occur when we want to model inverse correlation, such as the relationship between interest rates and housing starts.
Correlation: examples from x,y plots

[Figure: four x,y scatter plots with correlation coefficients of 0.00, 0.25, 0.50 and 0.75]
Correlation: coefficients
How does a Correlation Coefficient of 0.66 work?
The Pearson Correlation Coefficient (r):
$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$
where x represents Variable A, y represents Variable B, and $\bar{x}$ and $\bar{y}$ are their sample means.
Possible outcomes of B as a function of A are constrained to 34% of the full distribution of B. Hence the Correlation Coefficient r = 1 - 0.34 = 0.66.
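The Pearson coefficient can be computed directly from paired samples. In this sketch the function name and the two small data sets are hypothetical and serve only to show the calculation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


variable_a = [10, 20, 30, 40, 50]   # hypothetical values of Variable A
variable_b = [12, 18, 35, 33, 52]   # hypothetical paired values of Variable B

print("r =", round(pearson_r(variable_a, variable_b), 3))
```

A perfectly increasing straight-line relationship returns +1.0 and a perfectly decreasing one returns -1.0; scattered data fall in between.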
[Figure: cumulative probability curves (0 to 100) for Variable A and Variable B; a sampled value of A restricts the sampled value of B to about 1/3 of its distribution]
Correlation: impact on the mean

[Figure: two panels, R = 1 and R = 0, with the mean marked; y axis: Prob or greater % (0 to 100); x axis: 100, 200, 500, 1000]
Correlation: example of impact

Input or result             P10 / P90    Mean
Avg Net Pay                 5
Area                        5
NRV, corr coef = 0          9.7          119
NRV, corr coef = 1.0        25           175
NRV, corr coef = -1.0       1.01         80
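The pattern in these results can be reproduced analytically under stated assumptions: both inputs are lognormal with P10 / P90 = 5, the correlation coefficient is applied to the natural logs of the inputs, and the median of the product is taken as 80 (read from the -1.0 case, where the spread collapses). A simulation package may apply correlation differently (for example by rank), so this is a sketch rather than a reproduction of the original run; all names are hypothetical.

```python
from math import exp, log, sqrt

Z10 = 1.2816          # standard normal deviate at the P10 / P90 points
MEDIAN_NRV = 80.0     # assumption: median of the product, read from the corr coef = -1.0 case


def sigma_from_ratio(p10_over_p90):
    """Std dev of natural logs for a lognormal with the given P10 / P90."""
    return log(p10_over_p90) / (2.0 * Z10)


def product_of_lognormals(sigma_a, sigma_b, rho):
    """P10 / P90 and mean of A x B when ln(A) and ln(B) have correlation rho."""
    variance = sigma_a ** 2 + sigma_b ** 2 + 2.0 * rho * sigma_a * sigma_b
    sigma_p = sqrt(max(variance, 0.0))       # guard against tiny negative round-off
    ratio = exp(2.0 * Z10 * sigma_p)
    mean_value = MEDIAN_NRV * exp(0.5 * sigma_p ** 2)
    return ratio, mean_value


sigma_pay = sigma_area = sigma_from_ratio(5.0)
for rho in (0.0, 1.0, -1.0):
    ratio, mean_value = product_of_lognormals(sigma_pay, sigma_area, rho)
    print(f"corr coef = {rho:+.1f}: P10/P90 = {ratio:5.2f}  mean = {mean_value:6.1f}")
```

Under these assumptions the spread and mean come out close to the tabulated values, which shows that correlation changes both the uncertainty range and the mean of a product.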
Key Points
Central Limit Theorem
Distribution type, shape and variance
Lognormal shapes are very common in our business
The mean is the best single point representation of a large program's average result
Establishing the nature and scope of correlation and dependency can be more critical than distribution type
The larger the P10/P90 of inputs, the stronger the effect of your correlations
The best distribution is the one you can best defend