
SPE 166449

New Approach to Identify Analogue Reservoirs


H. Martin Rodriguez, E. Escobar, S. Embid, N. Rodriguez, and M. Hegazy, Repsol, and Larry W. Lake, SPE, The
University of Texas at Austin

Copyright 2013, Society of Petroleum Engineers

This paper was prepared for presentation at the SPE Annual Technical Conference and Exhibition held in New Orleans, Louisiana, USA, 30 September–2 October 2013.

This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been
reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its
officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to
reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract
The identification of analogous reservoirs is an important step in planning the development of a new field, because the
information available about the new areas is usually limited or even nonexistent. Traditionally the search for analogous
reservoirs has been made by experienced geoscientists, but this practice is subject to availability of this experience and the
results are heavily dominated by geology. In this paper we present a systematic and unbiased procedure to search for
analogous reservoirs, based on information contained in a large validated database of reservoir parameters, both engineering
and geologic. Each reservoir has its own fingerprint characterized by the set of its own properties, which differ from one
reservoir to another. The method uses multivariate statistical techniques to find a unique and reproducible list of reservoirs
with fingerprints that are most similar to the selected target. The flexibility of the method allows variation of the similarity
function (weights) and evaluation of different scenarios (static, dynamic, PVT behavior, etc.).

Our method basically consists of four steps: Data Preprocessing, Key Parameters Selection, Multivariate Analysis, and
Similarity Ranking. The first step consists of the analysis and preprocessing of the available database. In the second step, Key
Parameter Selection, variables with largest impact on the case to be evaluated are identified. The third step, Multivariate
Analysis, applies several multivariate techniques such as principal component analysis (PCA) and cluster analysis. Finally, in
the Ranking step, we apply a similarity function to the group of previously selected analogous reservoirs, generating a
similarity ranking of analogous reservoirs.

To validate this new method we use the Casablanca oil field as a target reservoir. Casablanca is a mature carbonate reservoir
that is very well known by Repsol, whose experts identified four analogues for this target. The newly developed method was
independently applied to this case and returned 19 analogous reservoirs sorted by similarity. The maximum similarity
found was 85 % for the Amposta Marino reservoir, one of the analogous reservoirs independently identified by the
business unit team. Moreover, the four analogous reservoirs previously identified by the business unit team were among the
first ten positions in the similarity ranking. These results are highly encouraging because the method captures the know-how
of the experts and ensures a reproducible response, regardless of the user's expertise.

The most relevant advantage of this new method is that it is based on a similarity function that takes into account all the
weighted key parameters simultaneously, instead of sequential filters used by some commercial software. As a result, the
procedure we present in this work will support the predictive search of missing properties for the target reservoir, reducing the
uncertainty for decision making.

Introduction
In oil and gas exploration and production, analogous reservoirs are used in many ways to study reservoirs that lack
critical information. In this paper, a target reservoir is any reservoir with a deficit of information for which we want
to identify a ranked list of analogous reservoirs. Analogous reservoirs, in turn, are those with the most similar fingerprints to
the selected target based on the selected key parameters; they are not necessarily geographically close.

Often, relatively similar neighboring reservoirs are used as a first approximation of analogous reservoirs to provide reservoir
data such as PVT properties and petrophysical information. Although this is not always the best decision, it is done
because of the lack of a simple method for identifying analogs in a short time. In other cases, the development plan and
production behavior of analogous reservoirs are used as a reference for preliminary forecasting of less mature projects. The
screening of EOR processes is another example, where analogous reservoir concepts are used to pre-select the potential
recovery processes for application to a target reservoir. One of the best known uses of analogous reservoirs is the
estimation of reserves (Harrel, 2004; Hodgin, 2006; Sidle, 2010). However, a literature review of the processes used in the
oil industry to systematically identify analogous reservoirs showed that this is an area with limited technical references. Also,
some major oil companies consider the analogous reservoir identification skill as a differentiating capability for reservoir
assessments.
There are many reasons to develop a systematic method to identify analogous reservoirs. From a reservoir development point
of view, one of the most important reasons is to support the assessment of new business opportunities that are normally
restricted by short evaluation times, and limited amount of specific information from a particular or target reservoir.
The objective of this paper is to describe a statistical method for systematic identification of analogous reservoirs. It includes
the procedure followed to integrate a reservoir database to be used by the statistical analysis. The developed method was
successfully validated with a well-known Repsol carbonate reservoir, whose analogous reservoirs had been
previously identified by a qualified geoscience team.
Among the advantages of the method presented in this paper are: (1) It generates a list of analogous reservoirs ranked
quantitatively by similarity. (2) It helps to reduce the error introduced by an evaluator through prejudice,
mistakes, or lack of experience. (3) It supports the predictive estimation of missing properties for the target reservoir. (4) It
provides flexibility to improve or to adapt to different needs. (5) It is very easy to use for the final user. (6) It needs only a
minimum number of reservoir key parameters.

Method
The method proposed in this work basically consists of four steps: Data Preprocessing, Key Parameters Selection, Multivariate
Analysis, and Similarity Ranking. Fig. 1 shows the flow diagram of this new method.

[Flow diagram: Data Preprocessing -> Key Parameter Selection -> Multivariate Analysis -> Similarity Ranking]

Figure 1. Flow diagram of the proposed method.

The first step corresponds to the analysis and preprocessing of the available database. It includes data validation,
normalization, outlier identification, missing values imputation and standardization of the properties. During the Key
Parameter Selection, experts identify the variables with the largest impact on the specific use case to be evaluated. In the third
step we apply several multivariate techniques, such as principal component analysis (PCA) (Aminzadeh, 2005; Sharma et al.,
2010) and cluster analysis (Sharma et al., 2010). PCA is used to reduce the dimension of the problem and avoid collinearity
between the properties describing reservoirs. Cluster Analysis is then applied to the principal components extracted to identify
those analogous reservoirs lying in the same cluster as the target reservoir. Finally, at the Ranking step, we apply a similarity
function to the group of previously selected analogous reservoirs and generate a similarity ranking based on the normalized
similarity function.

Data Preprocessing
The main source of information to be used is a large database of reservoirs, created by merging existing databases from
different business units inside the company.

This large database, from now on the master database (MDB), is organized by cases (rows), which will be the different
reservoirs, and variables (columns), that will be the different properties or characteristics taken into account for each reservoir.
Table 1 shows a fragment of the MDB.

Table 1. Snapshot of the first few lines and columns of the Master Database

The data preprocessing consists of five consecutive steps (Tan et al., 2006): (1) data validation, (2) normalization, (3) outlier
identification (and elimination), (4) missing values analysis (and imputation), and (5) standardization. All these five steps must
be done before the multivariate analysis. This data preprocessing needs to be repeated only if the MDB is updated.

Data validation
Depending on the type of variable (string, numeric, continuous, nominal, and ordinal), the treatment of data and the statistical
methods applicable will be different. In our case we have originally numeric (continuous) data and string nominal data (string
categorical variables). So, the first step is to define correctly the type of variable that we will be dealing with.

One issue related to the special characteristic of hierarchical categorical variables is an excessive number of categories and
subcategories for a given property, which makes its analysis and interpretation very difficult. Some of these subcategories have
very few cases (< 5 %), so they are not statistically representative. To solve this inconvenience some of these less
representative subcategories have been grouped into its respective higher level, reducing the number of levels in the
hierarchical variable.

An example of this is in Table 2, which shows a fragment of Klemme's basin classification (Klemme, 1980): Type II,
which refers to Continental Multicyclic Basins. In this case, all items in the third level (2Ca, 2Cb and 2Cc) are merged
into category 2C in the second level.

Table 2. Type II of Klemme's basin classification.

2. Continental Multicyclic Basins
    2A. Craton Margin - Composite
    2B. Craton Accreted Margin - Complex
    2C. Crustal Collision Zone - Convergent plate margin
        2Ca. Closed
        2Cb. Trough
        2Cc. Open
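
The grouping of poorly represented subcategories can be sketched as follows. This is only an illustration, not the authors' implementation: the function name, the sample counts, and the assumption that a parent code is the first two characters of a subcategory code (as in 2Ca -> 2C) are all hypothetical.

```python
from collections import Counter

def merge_rare_subcategories(codes, min_share=0.05, parent_len=2):
    """Replace sub-level codes rarer than min_share by their parent code.

    Assumes (hypothetically) that the parent of a code such as "2Ca"
    is its first parent_len characters, i.e. "2C".
    """
    counts = Counter(codes)
    total = len(codes)
    merged = []
    for code in codes:
        is_sublevel = len(code) > parent_len
        if is_sublevel and counts[code] / total < min_share:
            merged.append(code[:parent_len])  # fold into the higher level
        else:
            merged.append(code)
    return merged

# Hypothetical sample: each third-level code covers fewer than 5 % of cases.
sample = ["2A"] * 50 + ["2B"] * 41 + ["2Ca"] * 4 + ["2Cb"] * 3 + ["2Cc"] * 2
merged = merge_rare_subcategories(sample)
merged_counts = Counter(merged)
```

With this sample, the three third-level codes collapse into a single 2C category, which mirrors the merge shown in Table 2.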

On the other hand, some categorical variables in the MDB have different categories in the same cell (sorted by importance or
age), giving rise to many possible combinations of categories. This excess of combinations makes the interpretation unfeasible.
To avoid this problem we take into account only the most important category or the most recent one in every cell.

Finally, to allow using some needed statistical methods later in the multivariate analysis (without losing any information), all
the string nominal data are changed into numeric nominal data. So, the exact correspondence between the original string
values and the new numeric values must be described and preserved. This change has been done for all the categorical
variables.
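
A minimal sketch of these two operations is given below, under stated assumptions: the separator used inside a multi-category cell (";"), the lithology labels, and both function names are hypothetical and are not taken from the MDB.

```python
def primary_category(cell, sep=";"):
    """Keep only the first (most important or most recent) category in a cell."""
    return cell.split(sep)[0].strip()

def encode_nominal(values):
    """Map string categories to integer codes and return the mapping as well,
    so the correspondence between strings and numbers is preserved."""
    mapping = {}
    codes = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1  # codes start at 1
        codes.append(mapping[value])
    return codes, mapping

# Hypothetical cells, sorted by importance inside each cell.
cells = ["Limestone; Dolomite", "Sandstone", "Dolomite; Limestone", "Limestone"]
primary = [primary_category(c) for c in cells]
codes, mapping = encode_nominal(primary)

# The inverse mapping recovers the original strings, so no information is lost.
decode = {code: label for label, code in mapping.items()}
```

The integer codes remain nominal labels: their numeric order carries no meaning.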

Normalization
One of the hypotheses of many of the statistical analyses used in this method is that the variables follow a normal
probability distribution. To check this hypothesis we build histograms and run a normality analysis for each variable
in the MDB.

As a result, most of the variables were found to follow a normal probability distribution; those that do not follow a lognormal
probability distribution. An example of the latter kind of variable can be seen in Fig. 2. To normalize these variables, a
transformation must be applied (Box-Cox power transformation); for the case shown the transformation is equivalent to
taking logarithms of these variables. An example of the result of applying this transformation is in Fig. 3.

From now on, when we refer to these variables, it is understood that we mean their logarithm. As an example, for
comparison purposes, the following figures show the effect of taking logarithms of the variable Dip Angle.

Figure 2. Example of a variable following a lognormal probability distribution (original variable).

Figure 3. Example of variable in Figure 2 after applying a logarithm transformation.
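
The effect of the logarithmic transformation can be checked numerically with a small sketch such as the one below. The dip angle values are hypothetical, and skewness is used here only as a simple stand-in for the histogram and normality analysis described above.

```python
import math
import statistics

def skewness(values):
    """Population skewness: close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical right-skewed sample of dip angles (degrees).
dip_angle = [1, 2, 3, 5, 8, 15, 30, 60]

# Natural logarithm, i.e. the Box-Cox transformation in the limit lambda -> 0.
log_dip_angle = [math.log(v) for v in dip_angle]

skew_before = skewness(dip_angle)
skew_after = skewness(log_dip_angle)
```

For this sample the skewness is much closer to zero after the transformation, which is the behaviour expected of a lognormal variable.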



Outlier identification
Outliers can have a disastrous effect on the results of the statistical analysis used in this method. To avoid this, a step is added
to identify and delete any outliers found. To detect the presence of outliers we use a box-plot analysis for every variable in
the MDB, a very useful graphical tool for this purpose (Walpole et al., 2012). Fig. 4 shows a box-plot for the variable
Average Matrix Porosity (%), where one reservoir with an anomalous value for this property appears (square symbol on the right
of the figure). The shaded area contains 50 % of the reported values, and the extended arms 95 %. The isolated points are
outliers. The line in the center of the shaded box is the median.

Figure 4. Box-plot for the variable average matrix porosity (%).
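
A box-plot style outlier check can be sketched as follows. The sketch uses the common Tukey convention (whiskers at 1.5 times the interquartile range), which is an assumption and not necessarily the whisker definition used for Fig. 4; the porosity values are hypothetical.

```python
import statistics

def boxplot_outliers(values, whisker=1.5):
    """Return the values lying outside the box-plot whiskers (Tukey fences)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical average matrix porosity values (%), one of them anomalous.
porosity = [8, 10, 11, 12, 12, 13, 14, 15, 16, 45]
outliers = boxplot_outliers(porosity)
```

Flagged values should still be reviewed before deletion, since an extreme value may be a genuine reservoir property rather than a data error.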

Missing values analysis and imputation


Missing values pose an important challenge because typical statistical modelling procedures discard these cases from the
analysis. When there are few missing values (≤ 10 % of the total number of cases) and those values can be considered to be
missing at random, then the typical method of list-wise deletion (if any of the variables have missing values, the whole
case is omitted from the computations) is a good alternative. This quick solution cannot be applied if there are many
cases (> 10 %) with missing values, because then we would lose a great amount of information.

When there are too many cases with missing values, what is needed is an accurate estimation of these missing values. The
method used in this work for imputation of missing numeric scale data is the univariate model Multiple Linear Regression,
while Logistic Regression is used as the univariate model for dichotomous categorical variables. When there are more than
two outcome categories in the variable we use Discriminant Analysis and Multinomial Logistic Regression (César Pérez,
2004).

During imputation, each variable in the MDB may or may not be selected as an independent variable, and the range
of imputed values of a numeric scale variable can be restricted so that they are plausible. In addition, imputation is restricted to
variables with less than a maximum percentage of missing values; in this work we use a 30 % cutoff.
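
The imputation rules above can be illustrated with a deliberately simplified sketch. It uses a single predictor and ordinary least squares instead of the Multiple Linear Regression used in this work, and the function names and sample values are hypothetical; only the 30 % cutoff and the restriction of imputed values to the observed range come from the text.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def impute(target, predictor, max_missing=0.30):
    """Fill None entries of target from predictor, if few enough are missing."""
    missing = [i for i, v in enumerate(target) if v is None]
    if len(missing) / len(target) >= max_missing:
        return list(target)  # too many gaps: leave the variable untouched
    known = [(x, y) for x, y in zip(predictor, target) if y is not None]
    slope, intercept = fit_line([x for x, _ in known], [y for _, y in known])
    low = min(y for _, y in known)
    high = max(y for _, y in known)
    filled = list(target)
    for i in missing:
        estimate = intercept + slope * predictor[i]
        filled[i] = min(max(estimate, low), high)  # keep within observed range
    return filled

# Hypothetical data: one missing value out of five (20 % missing).
predictor = [5.0, 10.0, 15.0, 20.0, 25.0]
target = [1.0, 2.0, None, 4.0, 5.0]
filled = impute(target, predictor)

# Two missing values out of five (40 %) exceed the cutoff and are left as is.
skipped = impute([1.0, None, None, 4.0, 5.0], predictor)
```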

After imputation, a user should check the results. In our case imputed values are inside the original range for any given
property, and the values of the mean, median and standard deviation for the whole set of data (including imputed ones) do not
change significantly from the original ones.

An important issue related to missing values imputation is that a new imputation must be carried out every time the MDB is
updated with new reservoirs or new data. This is because the missing data pattern and the ranges of the properties may both
change, which means that the imputed values could be substantially different.

Standardization
Additionally, all the variables should be standardized to make their ranges have about the same order of magnitude. Having
variables with very different orders of magnitude could influence the results of the multivariate analysis. So, every variable
(column in the MDB) is standardized by subtracting its mean and dividing by its standard deviation.

Thus, in the following analysis all the variables are standardized unless explicitly stated otherwise.
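
The standardization described above is the usual z-score, sketched below. The depth values are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since the text does not specify it.

```python
import statistics

def standardize(values):
    """Subtract the mean and divide by the (sample) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical top reservoir depths (m).
depth = [1500.0, 2200.0, 2750.0, 3100.0, 3900.0]
z_depth = standardize(depth)
```

After this step every variable has zero mean and unit standard deviation, so no variable dominates the multivariate analysis merely because of its units.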

Selection of Key Parameters


The use of analogs is a very common practice in the E&P industry. Identifying reservoirs with similar features and
characteristics is a reliable way to infer unknown parameters from the target reservoir under evaluation and to analyze
production strategies.

Even though it is almost a routine activity inside companies, little has been said about procedures to identify this similarity
among reservoirs. However, it is well accepted that the validity of the analogs depends on the purpose pursued. In this sense,
we may define different kinds of needs or types of problems: static (not related to production), dynamic (related to
production), PVT behaviour (related to fluid properties), etc.

The parameters used to establish the analogy (commonly referred to as Key Parameters, KP) change depending on the future use of
the analogs. Therefore, it is crucial to dedicate some time to define the role of the desired analogs, and to select the appropriate
set of Key Parameters to reduce to a minimum the uncertainty entailed in the act of assuming reservoir characteristics.

The majority of the references which deal with the subject of finding analogous reservoirs (Harrel, 2004; Hodgin, 2006; Sidle,
2010) talk about several categories of data or KP used to find analogies: geological, petrophysical, engineering and
operational. In this work we use the following KP shown in Table 3.

Table 3. List of Key Parameters used in this work.

BASIN CODE (BALLY)
BASIN CODE (KLEMME)
PRESENT TECTONICS CODE
FLUID TYPE CODE
PRINCIPAL STRUCTURAL CODE
PRIMARY TRAP TYPE CODE
log_NUMBER OF STRUCTURAL COMPARTMENTS
TOP RESERVOIR DEPTH (m)
log_DIP ANGLE (°)
log_AREA (km²)
log_ORIGINAL TOTAL HC COLUMN HEIGHT (m)
PRIMARY SEDIMENTARY SYSTEM CODE
PRIMARY SEDIMENTARY ENVIRONMENT CODE
log_NUMBER OF STRATIGRAPHIC COMPARTMENTS
PRIMARY LITHOLOGY CODE
PRIMARY POROSITY TYPE CODE
SECONDARY POROSITY TYPE CODE
log_AVERAGE GROSS THICKNESS (m)
log_AVERAGE NET PAY (m)
AVERAGE MATRIX POROSITY (%)
log_AVERAGE AIR PERMEABILITY (mD)
AVERAGE WATER SATURATION (%)
DIAGENETIC PROCESS CODE
FRACTURE RESERVOIR CLASSIFICATION CODE
AVERAGE API GRAVITY (°API)
ORIGINAL TEMPERATURE GRADIENT (°C/m)
TEMPERATURE AT TOP RESERVOIR DEPTH (°C)
ORIGINAL PRESSURE GRADIENT (kPa/m)
PRESSURE AT TOP RESERVOIR DEPTH (kPa)
PRIMARY DRIVE MECHANISM CODE

Multivariate Analysis
The core of the method presented is a multivariate statistical analysis of the preprocessed KP database
(KPDB). The statistical techniques to be applied are Principal Components Analysis (PCA) and Cluster Analysis.

Principal Components Analysis


PCA is often used in dimension reduction, to identify a small number of new latent variables (principal components, PC) that
explain most of the variance that is observed in a much larger number of original variables. This fact is very important for
conceptual clarification and simplification of the later analysis.

The new variables (principal components) are computed as linear combinations of the original variables. They have the
desirable property that they are uncorrelated with each other, which makes the results easy to
attribute back to their source in Cluster Analysis.

In this study we use the PCA method. This method is used when there is more emphasis on data reduction and less on
interpretation, whereas Factor Analysis is used when there is interest in studying the relationships among the variables.
Another advantage of the PCA method is that it is insensitive to multicollinearity among the variables (Daniel Peña, 2002).

Taking into account that the variables may have very different orders of magnitude, we apply the PCA to the standardized
variables.

PCA tries to account for the maximum amount of variation in the set of variables. The value of every eigenvalue of the
correlation matrix represents the standardized variance in the original variables that is taken into account by its respective
component. On the other hand, the associated eigenvector defines the coefficients of every original variable in its respective
component.

The maximum amount of standardized variance contained in a single variable is 1. So, if an eigenvalue is greater than 1, it
must account for variation in several variables.
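
The relations described in the preceding paragraphs can be written compactly as follows. The symbols are introduced here only for illustration and are assumptions: $\mathbf{R}$ is the correlation matrix of the $p$ standardized variables $z_j$, and $\lambda_k$, $\mathbf{v}_k$ are its eigenvalues and eigenvectors.

```latex
% Eigen-decomposition of the correlation matrix
\mathbf{R}\,\mathbf{v}_k = \lambda_k\,\mathbf{v}_k, \qquad k = 1,\dots,p

% k-th principal component as a linear combination of the standardized variables
PC_k = \sum_{j=1}^{p} v_{jk}\, z_j

% Each standardized variable contributes a variance of 1, so the eigenvalues sum to p
\sum_{k=1}^{p} \lambda_k = p

% Share of the total variance explained by component k
\frac{\lambda_k}{p} \times 100\,\%
```

For example, with the $p = 30$ key parameters of Table 3, the first eigenvalue reported in Table 4 gives $4.64 / 30 \approx 15.5\,\%$ of the total variance.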

The first component selected has maximum variance (the greater the variance, the more information it carries), and
successive principal components explain progressively smaller portions of the variance. The result of applying PCA to our
data is shown in Table 4.

Table 4. Absolute, relative and cumulative variance obtained from applying PCA to the KPDB.

Component   Eigenvalue   Value of the Variance (%)   Cumulative Variance (%)
1           4.64         15.5                        15.5
2           2.99         10.0                        25.5
3           2.72          9.1                        34.6
4           2.06          6.9                        41.5
5           1.80          6.0                        47.5
6           1.62          5.4                        52.9
7           1.39          4.6                        57.5
8           1.23          4.1                        61.6
9           1.12          3.7                        65.3
10          1.05          3.5                        68.8
11          0.98          3.3                        72.1
12          0.92          3.1                        75.2
13          0.82          2.7                        77.9
14          0.76          2.6                        80.5
15          0.69          2.3                        82.8
16          0.67          2.2                        85.0
17          0.63          2.1                        87.1
18          0.58          1.9                        89.0
19          0.53          1.8                        90.8
...
29          0.04          0.2                        99.9
30          0.01          0.1                       100.0

It is customary to select the first principal components that reach a cumulative variance of 90 %. To fulfil this criterion
according to Table 4, we should select the first 19 principal components. If we decided to relax this criterion and
retain only 80 % of the original variance, we should select the first 14 principal components. Another criterion is to take
only those principal components with a variance greater than the mean variance. When the correlation matrix is used to
compute the principal components, the mean value of the variance is 1, so we would select only those PC with an absolute
variance greater than 1. In this case, according to Table 4, we would select the first 10 principal components
(retaining only 69 % of the variance in the original data).

Depending on the criterion used, the reduction in the number of variables may be significant. In this case, with the last
criterion, the number of variables is reduced by two-thirds, from 30 original variables to 10 PC.
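
The three selection criteria can be reproduced from the eigenvalues listed in Table 4 with a short sketch. The function names are hypothetical; the total variance of 30 follows from using the correlation matrix of the 30 key parameters. Only the first 19 eigenvalues are needed, since the later ones fall below every threshold considered here.

```python
from itertools import accumulate

# First 19 eigenvalues from Table 4 (components 1 to 19).
eigenvalues = [4.64, 2.99, 2.72, 2.06, 1.80, 1.62, 1.39, 1.23, 1.12, 1.05,
               0.98, 0.92, 0.82, 0.76, 0.69, 0.67, 0.63, 0.58, 0.53]

# With a correlation matrix, the total variance equals the number of variables.
TOTAL_VARIANCE = 30.0

def components_for_share(share):
    """Smallest number of leading components reaching the cumulative share."""
    for n, cumulative in enumerate(accumulate(eigenvalues), start=1):
        if cumulative / TOTAL_VARIANCE >= share:
            return n
    return None  # not reachable with the eigenvalues listed

def components_above_mean():
    """Components whose variance exceeds the mean variance (equal to 1)."""
    return sum(1 for value in eigenvalues if value > 1.0)

n_90 = components_for_share(0.90)
n_80 = components_for_share(0.80)
n_mean = components_above_mean()
```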

Component loadings for the first 10 PC are in Table 5, which shows the weights for the KP for each PC.

Table 5. Component loadings for the first 10 PCs.

Cluster analysis
Cluster Analysis is a multivariate statistical method for automatic or unsupervised data classification. In this work Cluster
Analysis is used to find groups (clusters) of similar reservoirs based on the variables examined, in this case the PC, such that the
similarity between members of the same group is high and the similarity between members of different groups is low.
The results can be used to identify associations that would otherwise not be apparent.

Applying Cluster Analysis to the component scores resulting from the PCA, instead of applying it directly to the original
variables, has several advantages: (i) it simplifies the later interpretation of results and (ii) it avoids giving excess weight
to concepts that are captured at the same time by several similar properties in the original variables.

We use Cluster Analysis to identify the cluster to which the target reservoir belongs. In that way,
all the reservoirs belonging to that cluster are considered analogous reservoirs. This concept is illustrated in Fig. 5; in this
particular case the target reservoir belongs to cluster number 2.

Figure 5. Idealization of the application of cluster analysis for the identification of analogous reservoirs.

We use two different methods of cluster analysis: the Hierarchical method with Ward's grouping rule and the Two Step method
(César Pérez, 2004).

Hierarchical cluster analysis


This procedure attempts to identify relatively homogeneous groups of reservoirs based on selected characteristics (principal
components in this work). Using an algorithm that starts with each reservoir in a separate cluster, it combines clusters until
only one is left (to which all reservoirs belong).

Ward's grouping rule is selected as an internal parameter in the hierarchical cluster analysis. As its criterion for
merging clusters, this rule chooses the merge that produces the minimum increase in the residual variance, so in 3D it tends to
create spherical clusters.

One of the main outputs of this method is the dendrogram, a graphical representation used to assess the cohesiveness of the
clusters formed and, at the same time, to provide information about the appropriate number of clusters to keep. The dendrogram
has a hierarchical structure, classifying all the reservoirs into clusters at different levels, where the different levels represent
the distances between clusters.
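
The agglomerative procedure with Ward's rule can be sketched in a few lines. This is a naive illustration rather than the statistical package used in this work: the function names and the two-dimensional points (standing in for principal component scores) are hypothetical. The merge cost is the standard Ward increase in the within-cluster sum of squares.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    dist2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return a["n"] * b["n"] / (a["n"] + b["n"]) * dist2

def ward_cluster(points, n_clusters):
    """Start with one cluster per point and merge until n_clusters remain."""
    clusters = [{"members": [i], "n": 1, "centroid": list(p)}
                for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        # Find the pair whose merge increases the residual variance least.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        a, b = clusters[i], clusters[j]
        n = a["n"] + b["n"]
        merged = {
            "members": a["members"] + b["members"],
            "n": n,
            "centroid": [(a["n"] * x + b["n"] * y) / n
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c["members"]) for c in clusters]

# Hypothetical component scores; index 0 plays the role of the target reservoir.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
result = ward_cluster(points, 2)
target_cluster = next(c for c in result if 0 in c)
analogues = [i for i in target_cluster if i != 0]
```

The reservoirs sharing the target's cluster (here indices 1 and 2) are the a priori analogues. Recording the merge cost at each step would give the heights needed to draw a dendrogram.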

Two Step cluster analysis


The two steps in this algorithm refer to pre-clustering and clustering steps. During the pre-clustering step, individual cases are
grouped into pre-clusters in a single data pass. Next, in the clustering step, an agglomerative hierarchical cluster analysis
algorithm is applied to the pre-clusters. Statistical criteria are recorded as clustering is performed and are used to determine the
optimal number of clusters (within a user-specified range).

As internal parameters in the two step cluster analysis we have selected the following choices:
- Distance measure: Log-likelihood. This selection determines how the similarity between two clusters is computed.
  The likelihood measure places a probability distribution on the variables. The two step procedure deals with both
  categorical and continuous variables, and their distinct properties are taken into account by the log-likelihood distance
  measure.
- Clustering criterion: This selection determines how the automatic clustering algorithm determines the number of
  clusters. Either Schwarz's Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) can be
  specified. These two criteria are used to choose the optimal number of clusters, through evaluation of the model's
  goodness-of-fit.

The Two Step Cluster Analysis procedure has several potential advantages compared to other clustering methods, which make
it the better choice for our kind of data:
- It can automatically select the number of clusters based on statistical criteria.
- It works with both continuous and categorical variables, taking account of their different properties.
- It is recommended for large data sets.

Similarity Ranking
Once the analogous reservoirs have been identified (those reservoirs that belong to the same cluster as the target reservoir), we
apply a similarity function to these a priori analogous reservoirs to sort them by decreasing similarity to
the target reservoir. The aim is to measure quantitatively which of the analogous reservoirs are most similar to the
target reservoir, and how strong this similarity is. This concept is illustrated in Fig. 6.

Figure 6. Idealization of the similarity ranking of analogous reservoirs.

In this step we work with only a reduced set of reservoirs, those reservoirs that belong to the same cluster as the target
reservoir. These reservoirs are considered a priori analogous reservoirs. All the information needed to apply the similarity
function is represented in a schematic database in Fig. 7:

                           KP 1    KP 2    ...    KP (j)    ...    KP p
Analogous Reservoir 1
Analogous Reservoir 2
...
Analogous Reservoir (i)
...
Analogous Reservoir n
Target Reservoir (t)

Figure 7. Schematic database containing all the information needed to apply the similarity function.

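If the variable is continuous, for the j-th KP we define the following similarity function between one analogous reservoir (i) and
the target reservoir (t):

$$ s_j(i,t) = 1 - \frac{\left| x_j(i) - x_j(t) \right|}{R(j)} $$

Where $x_j(i)$ is the value of the j-th KP for reservoir (i) and $R(j)$ is the range of the j-th KP:

$$ R(j) = \max\{x_j(k)\} - \min\{x_j(k)\} $$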

If the variable is categorical, for the j-th KP we define in this work the following similarity function between one analogous
reservoir (i) and the target reservoir (t):

$$ s_j(i,t) = \begin{cases} 1 & \text{if } x_j(i) = x_j(t) \\ 0 & \text{if } x_j(i) \neq x_j(t) \end{cases} $$

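Now we define the following global similarity function between one analogous reservoir (i) and the target reservoir (t). This
expression is known as Gower's General Similarity Coefficient (Gower, 1971):

$$ S(i,t) = \frac{\sum_{j=1}^{p} w_j \, s_j(i,t)}{\sum_{j=1}^{p} w_j} $$

Where $w_j$ are binary weights, in the following sense: $w_j = 1$ if the j-th KP is taken into account in the comparison
between (i) and (t), and $w_j = 0$ otherwise.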

In this work all the weights are equal to one, but they may be different depending on the kind of KP and purpose of the
analogs.
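
The similarity ranking can be sketched as follows, assuming the usual form of Gower's coefficient: one minus the range-normalized absolute difference for continuous KP, exact matching for categorical KP, and a weighted average over all KP. The reservoir names, KP names, and values are hypothetical, and the ranges are computed here over the analogous reservoirs plus the target, which is also an assumption.

```python
def gower_similarity(candidate, target, ranges, weights=None):
    """Weighted average of per-KP similarities between two reservoirs."""
    weights = weights or {kp: 1.0 for kp in target}
    numerator = denominator = 0.0
    for kp, t_value in target.items():
        c_value = candidate[kp]
        if isinstance(t_value, str):          # categorical KP: exact match
            s = 1.0 if c_value == t_value else 0.0
        elif ranges[kp] == 0:                 # no spread: identical values
            s = 1.0
        else:                                 # continuous KP
            s = 1.0 - abs(c_value - t_value) / ranges[kp]
        numerator += weights[kp] * s
        denominator += weights[kp]
    return numerator / denominator

# Hypothetical target and a priori analogues sharing the same cluster.
target = {"porosity": 10.0, "depth": 2500.0, "lithology": "carbonate"}
candidates = {
    "A": {"porosity": 12.0, "depth": 2600.0, "lithology": "carbonate"},
    "B": {"porosity": 20.0, "depth": 3500.0, "lithology": "sandstone"},
}

# Range of each continuous KP over the analogues plus the target.
everyone = list(candidates.values()) + [target]
ranges = {kp: max(r[kp] for r in everyone) - min(r[kp] for r in everyone)
          for kp, value in target.items() if not isinstance(value, str)}

ranking = sorted(((name, gower_similarity(res, target, ranges))
                  for name, res in candidates.items()),
                 key=lambda item: item[1], reverse=True)
```

Passing a `weights` dictionary with values other than one would emphasize the KP that matter most for a given purpose (static, dynamic, PVT behaviour).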

Validation of the Method Using a Well-known Target


The Casablanca oil field is one of the main producing fields in the Spanish Mediterranean Sea. Discovered in 1975, it is still in
operation with Repsol as main operator. It is located 45 km offshore, south-east of the city of Tarragona
(Spain). The main reservoir rocks are Upper Jurassic and Lower Cretaceous shallow marine carbonates (Orlopp, 1988;
Lomando et al., 1993).

In general terms, once the method has been developed, it should be checked using a well-known case. For this purpose we
use the Casablanca field (henceforth Casablanca) as the target reservoir, because some of its analogous reservoirs are already
known. In this example, for confidentiality reasons, all the data used come from the C&C Reservoirs DAKS commercial
database (C&C Reservoirs Limited 2013).

In the first step of the multivariate analysis we apply PCA to the KPDB and select 19 PC, following the criterion of
retaining the first PC that reach a cumulative variance of 90 % (losing as little information as possible). Then we apply both
clustering techniques, Hierarchical Ward and Two Step, to these 19 PCs.

The Hierarchical Ward method results in a dendrogram. Fig. 8 below shows a fragment of this dendrogram, where we have
outlined in red the cluster of Casablanca's analogous reservoirs. This cluster consists of 17 reservoirs, including, of course,
Casablanca itself.

Figure 8. Fragment of the dendrogram obtained by applying hierarchical cluster analysis. Casablanca's analogous reservoirs
are shown in red outline.

In this dendrogram, the cluster of Casablanca's analogous reservoirs can be interpreted as the merger of two smaller
clusters with great similarity between them (this similarity is indicated by the short length of the blue horizontal lines that
merge them). The next possible grouping would need a big jump (a yellow horizontal line three times longer)
to merge this cluster with others. So, we interpret that the natural classification of Casablanca's analogous reservoirs
consists of 17 reservoirs.

We also applied the Two Step method with different numbers of clusters, trying to optimize this parameter to obtain a
natural classification of reservoirs. The final number of clusters selected is 25. With this configuration, the analogous cluster
consists of 21 reservoirs, including Casablanca itself. The analogous reservoirs obtained with the Two Step method are in
Table 6.

Table 6. Casablanca's analogous reservoirs obtained with the two-step method.

AMPOSTA MARINO [MONTSIA]
ARDMORE [ZECHSTEIN (HALIBUT-TURBOT)]
AUK [ZECHSTEIN (HALIBUT)]
BLACKBURN [NEVADA]
CASABLANCA
EAGLE SPRINGS [SHEEP PASS]
GELA [TAORMINA]
GRANT CANYON [GUILMETTE]
LIUBEI [WUMISHAN]
MARKOVO [OSA]
MATZEN [HAUPTDOLOMITE-BOCKFLIESS BEDS]
RAGUSA [TAORMINA]
RECHITSA [SEMILUKI]
RECHITSA [VORONEZH]
RECHITSA [ZADONSK]
RENQIU [WUMISHAN]
VEGA [SIRACUSA]
VERKHNECHONA [DANILOV (PREOBRAZHEN HZ)]
VERKHNEVILYUY [YURYAKH]
YANLING [WUMISHAN]
YIHEZHUANG [MAJIAGOU-BADOU]

Comparing the results of the two methods, we find that all analogous reservoirs identified by the Hierarchical Ward
method are also identified by the Two Step method, except the NAGYLENGYEL reservoir. Moreover, the Two Step method
identifies five additional analogous reservoirs that were not identified by the Hierarchical Ward method: ARDMORE, AUK,
MARKOVO, VERKHNECHONA and VERKHNEVILYUY.

In this study, the criterion used to define Casablanca's analogous reservoirs after cluster analysis was to retain every
reservoir identified by at least one of the two clustering methods. The resulting list of Casablanca's analogous
reservoirs after cluster analysis is given in Table 7.
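
The comparison and the "at least one method" criterion amount to simple set operations. The following Python sketch reproduces the counts reported in this section; the reservoir names are abbreviated from Table 6, and the Hierarchical Ward membership is reconstructed here from the differences stated in the text (an assumption of this sketch, since that list is not tabulated above).

```python
# Reservoirs in the Two Step cluster (Table 6), including the target itself.
two_step = {
    "AMPOSTA MARINO", "ARDMORE", "AUK", "BLACKBURN", "CASABLANCA",
    "EAGLE SPRINGS", "GELA", "GRANT CANYON", "LIUBEI", "MARKOVO", "MATZEN",
    "RAGUSA", "RECHITSA [SEMILUKI]", "RECHITSA [VORONEZH]", "RECHITSA [ZADONSK]",
    "RENQIU", "VEGA", "VERKHNECHONA", "VERKHNEVILYUY", "YANLING", "YIHEZHUANG",
}

# Differences between the two methods, as stated in the text.
only_two_step = {"ARDMORE", "AUK", "MARKOVO", "VERKHNECHONA", "VERKHNEVILYUY"}
only_hierarchical = {"NAGYLENGYEL"}

# Hierarchical Ward cluster reconstructed from those differences.
hierarchical = (two_step - only_two_step) | only_hierarchical

# Criterion: keep every reservoir found by at least one method (set union).
combined = hierarchical | two_step
analogues = combined - {"CASABLANCA"}

print(len(hierarchical), len(two_step), len(combined), len(analogues))
# prints: 17 21 22 21
```

The reconstructed Hierarchical Ward cluster has 17 members, consistent with the dendrogram, and the union contains 21 analogous reservoirs plus Casablanca itself.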

Table 7. Casablanca's analogous reservoirs obtained after cluster analysis.

AMPOSTA MARINO [MONTSIA]
ARDMORE [ZECHSTEIN (HALIBUT-TURBOT)]
AUK [ZECHSTEIN (HALIBUT)]
BLACKBURN [NEVADA]
CASABLANCA
EAGLE SPRINGS [SHEEP PASS]
GELA [TAORMINA]
GRANT CANYON [GUILMETTE]
LIUBEI [WUMISHAN]
MARKOVO [OSA]
MATZEN [HAUPTDOLOMITE-BOCKFLIESS BEDS]
NAGYLENGYEL [MAIN DOLOMITE (HAUPTDOLOMITE)]
RAGUSA [TAORMINA]
RECHITSA [SEMILUKI]
RECHITSA [VORONEZH]
RECHITSA [ZADONSK]
RENQIU [WUMISHAN]
VEGA [SIRACUSA]
VERKHNECHONA [DANILOV (PREOBRAZHEN HZ)]
VERKHNEVILYUY [YURYAKH]
YANLING [WUMISHAN]
YIHEZHUANG [MAJIAGOU-BADOU]

To check whether the method developed in this study is valid, we compared the results with the company's previously known
analogous reservoirs for Casablanca:

Amposta Marino
Nagylengyel
Yanling
Renqiu

As can be seen, all four known analogous reservoirs were also identified as analogous reservoirs by the method
developed in this study. In addition, a similarity function was applied to all of Casablanca's analogous reservoirs after cluster
analysis in order to rank them by similarity to the Casablanca reservoir. The similarity index and similarity ranking of
Casablanca's analogous reservoirs are given in Table 8, and Fig. 9 below shows their geographical distribution.

Table 8. Similarity index and ranking of Casablanca's analogous reservoirs obtained with the method
proposed in this work.

RESERVOIR                                      SIMILARITY INDEX   SIMILARITY RANKING
AMPOSTA MARINO [MONTSIA]                       0.85               1
RECHITSA [VORONEZH]                            0.75               2
YIHEZHUANG [MAJIAGOU-BADOU]                    0.75               2
RECHITSA [SEMILUKI]                            0.74               3
LIUBEI [WUMISHAN]                              0.74               3
MATZEN [HAUPTDOLOMITE-BOCKFLIESS BEDS]         0.73               4
RENQIU [WUMISHAN]                              0.72               5
YANLING [WUMISHAN]                             0.72               5
RECHITSA [ZADONSK]                             0.71               6
AUK [ZECHSTEIN (HALIBUT)]                      0.68               7
BLACKBURN [NEVADA]                             0.68               7
GRANT CANYON [GUILMETTE]                       0.67               8
NAGYLENGYEL [MAIN DOLOMITE (HAUPTDOLOMITE)]    0.66               9
ARDMORE [ZECHSTEIN (HALIBUT-TURBOT)]           0.66               9
VEGA [SIRACUSA]                                0.63               10
EAGLE SPRINGS [SHEEP PASS]                     0.61               11
RAGUSA [TAORMINA]                              0.59               12
GELA [TAORMINA]                                0.57               13
MARKOVO [OSA]                                  0.57               13

Figure 9. Geographical distribution of the Casablanca reservoir and its analogous reservoirs.

The method proposed in this work yields the following results:

19 analogous reservoirs were finally obtained.
The 4 previously known reservoirs analogous to Casablanca were also identified.
Among the 4 previously known analogous reservoirs, AMPOSTA MARINO is the most similar (first in the ranking),
RENQIU and YANLING are 5th, and NAGYLENGYEL is 9th. These are in bold in Table 8 above.
The similarity index ranges from 0.57 to 0.85. Only those reservoirs with a similarity index greater than 0.5 are ultimately
identified as analogous reservoirs (note that 2 of the 21 reservoirs originally selected, VERKHNECHONA and
VERKHNEVILYUY, were left out because they had a similarity index of only 0.4).
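
The screening and ranking step can be illustrated with the values of Table 8. The Python sketch below applies the 0.5 threshold and then a dense ranking (tied reservoirs share a rank and no rank is skipped), which is the convention the table appears to follow. The reservoir names are abbreviated, and the value 0.4 for the two excluded reservoirs is the rounded figure quoted in the text, not an exact index.

```python
SIMILARITY_THRESHOLD = 0.5

# Similarity index relative to Casablanca (Table 8, names abbreviated).
similarity_index = {
    "AMPOSTA MARINO": 0.85, "RECHITSA [VORONEZH]": 0.75, "YIHEZHUANG": 0.75,
    "RECHITSA [SEMILUKI]": 0.74, "LIUBEI": 0.74, "MATZEN": 0.73,
    "RENQIU": 0.72, "YANLING": 0.72, "RECHITSA [ZADONSK]": 0.71,
    "AUK": 0.68, "BLACKBURN": 0.68, "GRANT CANYON": 0.67,
    "NAGYLENGYEL": 0.66, "ARDMORE": 0.66, "VEGA": 0.63,
    "EAGLE SPRINGS": 0.61, "RAGUSA": 0.59, "GELA": 0.57, "MARKOVO": 0.57,
    # Excluded reservoirs; 0.4 is the rounded value quoted in the text.
    "VERKHNECHONA": 0.4, "VERKHNEVILYUY": 0.4,
}


def dense_rank(scores):
    """Rank names by descending score; ties share a rank, no gaps."""
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of_value = {value: rank for rank, value in enumerate(distinct, start=1)}
    return {name: rank_of_value[value] for name, value in scores.items()}


# Keep only reservoirs above the threshold, then rank the survivors.
analogues = {
    name: value
    for name, value in similarity_index.items()
    if value > SIMILARITY_THRESHOLD
}
ranking = dense_rank(analogues)

for name in sorted(analogues, key=lambda n: (ranking[n], n)):
    print(f"{ranking[name]:>2}  {analogues[name]:.2f}  {name}")
```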

Summary and Conclusions


Development of the statistical method for the identification of analogous reservoirs
Using statistical criteria, we evaluated the existing data in the master database. In this way, outliers were
detected, analyzed and corrected. All missing data in the master database were then estimated by a statistical imputation
procedure, which allowed us to work with a 100 % complete database.

We developed a statistical and systematic method for the identification of analogous reservoirs. This method is based on
principal component analysis, using the KP in the carbonate database as input data, followed by cluster generation using
the PCs obtained in the previous step as input data. The clustering methods used are Hierarchical Ward and Two Step. With each
method and for every target reservoir, we generated the necessary clusters to obtain a natural grouping of reservoirs.
Additionally, a similarity index was calculated for every analogous reservoir, based on the mean distance between the set of
properties of that reservoir and those of the target reservoir. This similarity index allows us to generate a similarity ranking of
all the analogous reservoirs inside the cluster to which the target reservoir belongs.
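
To make the idea of a similarity index based on mean distance concrete, the following Python sketch uses a Gower-style coefficient for numeric properties: each property contributes one minus its range-normalized absolute difference, and the index is the mean of those contributions. This form is an assumption made for illustration; the exact similarity function and any KP weights used in the study are not reproduced here, and the property names, values and ranges are hypothetical.

```python
def mean_distance_similarity(target, candidate, ranges):
    """Gower-style similarity for numeric properties.

    Each shared property contributes 1 - |difference| / range, and the
    index is the unweighted mean of those contributions.
    """
    contributions = []
    for key, value_range in ranges.items():
        if key in target and key in candidate and value_range > 0:
            distance = abs(target[key] - candidate[key]) / value_range
            contributions.append(1.0 - distance)
    if not contributions:
        raise ValueError("no shared properties to compare")
    return sum(contributions) / len(contributions)


# Hypothetical properties and database-wide ranges (illustration only).
target = {"porosity_pct": 10.0, "depth_m": 2500.0}
candidate = {"porosity_pct": 12.0, "depth_m": 3000.0}
ranges = {"porosity_pct": 20.0, "depth_m": 2000.0}

similarity = mean_distance_similarity(target, candidate, ranges)
print(round(similarity, 3))
```

With these illustrative numbers the two contributions are 0.9 and 0.75, so the index is their mean. An index of 1 would mean identical properties, and values fall toward 0 as the candidate moves away from the target across the database range.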

Validation of the developed method


We validated the new method using the Casablanca reservoir as the target, because it is a carbonate reservoir that is very
well known inside the company and because a set of 4 analogous reservoirs had previously been identified by the business
unit team after an independent analysis of the properties of Casablanca.

The developed method was successfully applied in this case, yielding 19 analogous reservoirs ranked by similarity
to the Casablanca reservoir. The maximum similarity found was 85 %, for the Amposta Marino reservoir.

This new method provides a systematic procedure for the identification of analogous reservoirs, with the following
advantages:
Results are obtained in an objective manner. They are unaffected by personal criteria or lack of previous experience.
The method allows the application of simultaneous similarity concepts. This is an improvement over the tools that are
based on successive application of filters.
The method allows estimation of missing parameters in the target reservoir, based on the information provided by the
identified analogous reservoirs.
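
As an illustration of the last advantage, a missing parameter of the target reservoir could be estimated from its analogues, for example as a similarity-weighted mean. The Python sketch below is one possible scheme under that assumption, not the procedure used in the study; the function name, the parameter name and all values are hypothetical.

```python
def estimate_from_analogues(analogues, key):
    """Similarity-weighted mean of `key` over the analogues that report it.

    `analogues` is a list of (similarity_index, properties) pairs.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for similarity, properties in analogues:
        if key in properties:
            weighted_sum += similarity * properties[key]
            total_weight += similarity
    if total_weight == 0:
        raise ValueError(f"no analogue reports {key!r}")
    return weighted_sum / total_weight


# Hypothetical analogues: (similarity index, known properties).
analogues_with_data = [
    (0.75, {"recovery_factor_pct": 40.0}),
    (0.25, {"recovery_factor_pct": 20.0}),
    (0.50, {}),  # this analogue does not report the parameter
]

estimate = estimate_from_analogues(analogues_with_data, "recovery_factor_pct")
print(estimate)
```

Weighting by similarity lets the closest analogues dominate the estimate, while analogues that do not report the parameter are simply skipped.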

The analogous reservoirs found with this method depend on the Key Parameters (KP) used (their number, weighting and
type). Different KP, or different weightings of them, will inevitably give different classifications. A very important step in this
kind of analysis is therefore the proper definition of the KP to be included and of their weights.

Acknowledgements
The authors would like to thank C&C Reservoirs for granting permission to publish some of the results obtained in the
example shown in this paper.

References

1. Aminzadeh, F. 2005. Applications of AI and soft computing for challenging problems in the oil industry. Journal of Petroleum
Science and Engineering.
2. César Pérez. 2004. Técnicas de Análisis Multivariante de Datos. Prentice Hall.
3. C&C Reservoirs Limited. 2013. C&C Reservoirs DAKS, http://www.ccreservoirs.com/ (accessed 5 April 2013).
4. Daniel Peña. 2002. Análisis de Datos Multivariantes. McGraw-Hill.
5. Harrell, D.R. and Hodgin, J.E. 2004. Oil and gas estimates: recurring mistakes and errors. SPE-91069.
6. Hodgin, J.E. and Harrell, D.R. 2006. The selection, application, and misapplication of reservoir analogs for the estimation of
petroleum reserves. SPE-102505.
7. Gower, J.C. 1971. A general coefficient of similarity and some of its properties. Biometrics 27, 857-74.
8. Klemme, H.D. 1980. Petroleum basins: classification and characteristics. Journal of Petroleum Geology, Vol. 3, No. 2, 187-207.
9. Orlopp, D.E. 1988. Casablanca Oilfield, Spain: a karsted carbonate trap at the shelf edge. Proceedings of the Offshore Technology
Conference, OTC 5734, 441-448.
10. Lomando, A.J., Harris, P.M., Orlopp, D.E. 1993. Casablanca Field, Tarragona Basin, Offshore Spain: A karsted carbonate
reservoir. In: Fritz, R.D., Wilson, J.L., Yurewicz, D.A. (eds.). Paleokarst related hydrocarbon reservoirs, SEPM Core Workshop,
18, 201-225.
11. Sharma, S. Srinivasan and Larry W. Lake. 2010. Classification of oil and gas reservoirs based on recovery factor: a data-mining
approach. SPE-130257.
12. Sidle, R.E. and Lee, W.J. 2010. An update on the use of reservoir analogs for the estimation of oil and gas reserves. SPE
Economics & Management.
13. Tan, P., Steinbach, M. and Kumar, V. 2006. Introduction to Data Mining. Addison-Wesley.
14. Walpole, Myers & Ye. 2012. Probability and Statistics for Engineers and Scientists. Prentice-Hall.