
H. Martin Rodriguez, E. Escobar, S. Embid, N. Rodriguez, and M. Hegazy, Repsol, and Larry W. Lake, SPE, The University of Texas at Austin

This paper was prepared for presentation at the SPE Annual Technical Conference and Exhibition held in New Orleans, Louisiana, USA, 30 September to 2 October 2013.

This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract

The identification of analogous reservoirs is an important step in planning the development of a new field, because the information available about new areas is usually limited or even nonexistent. Traditionally, the search for analogous reservoirs has been made by experienced geoscientists, but this practice depends on the availability of that experience, and the results are heavily dominated by geology. In this paper we present a systematic and unbiased procedure to search for analogous reservoirs, based on the information contained in a large, validated database of reservoir parameters, both engineering and geologic. Each reservoir has its own fingerprint, characterized by the set of its properties, which differ from one reservoir to another. The method uses multivariate statistical techniques to find a unique and reproducible list of reservoirs whose fingerprints are most similar to that of the selected target. The flexibility of the method allows the similarity function (weights) to be varied and different scenarios (static, dynamic, PVT behavior, etc.) to be evaluated.

Our method consists of four steps: Data Preprocessing, Key Parameter Selection, Multivariate Analysis, and Similarity Ranking. The first step consists of the analysis and preprocessing of the available database. In the second step, Key Parameter Selection, the variables with the largest impact on the case to be evaluated are identified. The third step, Multivariate Analysis, applies multivariate techniques such as principal component analysis (PCA) and cluster analysis. Finally, in the Ranking step, we apply a similarity function to the group of previously selected analogous reservoirs, generating a similarity ranking of analogous reservoirs.

To validate the new method we use the Casablanca oil field as the target reservoir. Casablanca is a mature carbonate reservoir, very well known by Repsol, whose experts had identified four analogues for it. The new method was applied independently to this case and produced 19 analogous reservoirs sorted by similarity. The maximum similarity found was 85 % for the Amposta Marino reservoir, one of the analogous reservoirs identified independently by the business unit team. Moreover, the four analogous reservoirs previously identified by the business unit team were all among the first ten positions in the similarity ranking. These results are highly encouraging, because the method captures the know-how of the experts and ensures a reproducible response regardless of the user's expertise.

The most relevant advantage of the new method is that it is based on a similarity function that takes into account all the weighted key parameters simultaneously, instead of the sequential filters used by some commercial software. As a result, the procedure presented in this work supports the predictive estimation of missing properties of the target reservoir, reducing the uncertainty in decision making.

Introduction

In Oil & Gas Exploration and Production (E&P), analogous reservoirs are used in many ways to study reservoirs that lack critical knowledge. In this paper, a target reservoir is any reservoir with a deficit of information for which we want to identify a ranked list of analogous reservoirs. Analogous reservoirs, in turn, are those whose fingerprints, based on the selected key parameters, are most similar to that of the selected target; they are not necessarily geographically close.

2 SPE 166449

Often, relatively similar neighboring reservoirs are used as a first approximation of analogous reservoirs to provide reservoir data such as PVT properties and petrophysical information. Although this is not always the best decision, it is done because there is no simple method for identifying analogs in the short term. In other cases, the development plan and production behavior of analogous reservoirs are used as a reference for preliminary forecasting of less mature projects. The screening of EOR processes is another example, where analogous reservoir concepts are used to preselect the recovery processes that could be applied to a target reservoir. Likewise, one of the best-known uses of analogous reservoirs is the estimation of reserves (Harrel, 2004; Hodgin, 2006; Sidle, 2010). However, a literature review of the processes used in the oil industry to systematically identify analogous reservoirs showed that this is an area with limited technical references. Also, some major oil companies consider the skill of identifying analogous reservoirs a differentiating capability for reservoir assessments.

There are many reasons to develop a systematic method to identify analogous reservoirs. From a reservoir development point of view, one of the most important is to support the assessment of new business opportunities, which are normally constrained by short evaluation times and by a limited amount of specific information about the target reservoir.

The objective of this paper is to describe a statistical method for the systematic identification of analogous reservoirs. It includes the procedure followed to integrate a reservoir database for use in the statistical analysis. The method was successfully validated with a well-known Repsol carbonate reservoir whose analogous reservoirs had previously been identified by a qualified geoscience team.

Among the advantages of the method presented in this paper: (1) it generates a list of analogous reservoirs ranked quantitatively by similarity; (2) it helps to reduce the errors introduced by an evaluator through prejudice, mistakes, or lack of experience; (3) it supports the predictive estimation of missing properties of the target reservoir; (4) it is flexible enough to be improved or adapted to different needs; (5) it is very easy for the final user to apply; and (6) it needs only a minimum number of reservoir key parameters.

Method

The method proposed in this work consists of four steps: Data Preprocessing, Key Parameter Selection, Multivariate Analysis, and Similarity Ranking. Fig. 1 shows the flow diagram of the method.

Figure 1. Flow diagram of the method: Data Preprocessing, Key Parameter Selection, Multivariate Analysis, Similarity Ranking.

The first step corresponds to the analysis and preprocessing of the available database. It includes data validation, normalization, outlier identification, missing-value imputation, and standardization of the properties. During Key Parameter Selection, experts identify the variables with the largest impact on the specific use case to be evaluated. In the third step we apply multivariate techniques such as principal component analysis (PCA) (Aminzadeh, 2005; Sharma et al., 2010) and cluster analysis (Sharma et al., 2010). PCA is used to reduce the dimension of the problem and to avoid collinearity between the properties describing the reservoirs. Cluster analysis is then applied to the extracted principal components to identify the analogous reservoirs that lie in the same cluster as the target reservoir. Finally, in the Ranking step, we apply a similarity function to the group of previously selected analogous reservoirs and generate a similarity ranking based on the normalized similarity function.

Data Preprocessing

The main source of information is a large database of reservoirs, created by merging existing databases from different business units inside the company.


This large database, from now on called the master database (MDB), is organized by cases (rows), which are the different reservoirs, and variables (columns), which are the different properties or characteristics considered for each reservoir. Table 1 shows a fragment of the MDB.

Table 1. Snapshot of the first few lines and columns of the Master Database

Data preprocessing consists of five consecutive steps (Tan et al., 2006): (1) data validation, (2) normalization, (3) outlier identification (and elimination), (4) missing values analysis (and imputation), and (5) standardization. All five steps must be completed before the multivariate analysis, and the preprocessing needs to be repeated only when the MDB is updated.

Data validation

Depending on the type of variable (string, numeric, continuous, nominal, or ordinal), the treatment of the data and the applicable statistical methods differ. In our case the original data are numeric (continuous) data and string nominal data (string categorical variables). So, the first step is to define correctly the type of each variable.

One issue related to hierarchical categorical variables is an excessive number of categories and subcategories for a given property, which makes its analysis and interpretation very difficult. Some of these subcategories have very few cases (< 5 %), so they are not statistically representative. To solve this problem, the least representative subcategories have been grouped into their respective higher level, reducing the number of levels in the hierarchical variable.

An example is given in Table 2, which shows a fragment of Klemme's basin classification (Klemme, 1980) for Type II, Continental Multicyclic Basins. In this case, all the items in the third level (2Ca, 2Cb, and 2Cc) are merged into category 2C in the second level.

2B. Craton Accreted Margin - Complex
2C. Crustal Collision Zone - Convergent plate margin
   2Ca. Closed
   2Cb. Trough
   2Cc. Open
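The grouping rule above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the helper name `collapse_rare_subcategories` is hypothetical, and it assumes that a third-level code is written as its second-level parent plus one trailing lowercase letter (as 2Ca, 2Cb and 2Cc are under 2C) and that a subcategory is "rare" when it covers fewer than 5 % of the cases.

```python
from collections import Counter


def collapse_rare_subcategories(codes, min_share=0.05):
    """Replace rare third-level codes (e.g. '2Ca') by their parent ('2C').

    Hypothetical helper: assumes a third-level code is its second-level
    parent followed by a single lowercase letter, and that a subcategory is
    rare when it covers fewer than `min_share` of all cases.
    """
    n_cases = len(codes)
    counts = Counter(codes)
    collapsed = []
    for code in codes:
        is_third_level = len(code) > 1 and code[-1].islower()
        if is_third_level and counts[code] / n_cases < min_share:
            collapsed.append(code[:-1])  # merge into the parent category
        else:
            collapsed.append(code)
    return collapsed


# Usage: '2Ca' covers 1 of 41 cases (about 2.4 %) and is merged into '2C';
# '2Cb' covers about 24 % of the cases and is kept as it is.
basin_codes = ["2Ca"] + ["2B"] * 30 + ["2Cb"] * 10
print(Counter(collapse_rare_subcategories(basin_codes)))
```

The rule is applied per subcategory here; merging a whole family as soon as one member is rare would be an equally reasonable reading of the text.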

On the other hand, some categorical variables in the MDB have several categories in the same cell (sorted by importance or age), giving rise to many possible combinations of categories. This excess of combinations makes interpretation unfeasible. To avoid this problem, we keep only the most important or the most recent category in every cell.

Finally, to allow the use of some statistical methods needed later in the multivariate analysis (without losing any information), all the string nominal data are converted into numeric nominal data. The exact correspondence between the original string values and the new numeric values must therefore be documented and preserved. This conversion has been applied to all the categorical variables.
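The two cell-level rules above (keep only the first-listed category, then replace strings by numeric codes while preserving the correspondence) can be sketched as follows. The helper names, the ';' separator, and the assumption that the most important or most recent category is listed first are illustrative; the paper does not describe the cell format of the MDB.

```python
def primary_category(cell, sep=";"):
    """Keep only the first category of a multi-valued cell.

    Assumes (hypothetically) that categories are separated by `sep` and
    listed in order of importance or age, most relevant first.
    """
    if cell is None:
        return None  # missing values are left for the imputation step
    return cell.split(sep)[0].strip()


def encode_nominal(values):
    """Map string categories to integer codes (1, 2, ...) in order of first
    appearance; return the codes and the code book."""
    code_book = {}
    codes = []
    for value in values:
        if value is None:
            codes.append(None)
            continue
        if value not in code_book:
            code_book[value] = len(code_book) + 1
        codes.append(code_book[value])
    return codes, code_book


cells = ["Carbonate; Sandstone", "Sandstone", None, "Carbonate"]
primary = [primary_category(c) for c in cells]
codes, code_book = encode_nominal(primary)
# Inverse code book, kept so that every numeric code can be traced back.
decode = {code: label for label, code in code_book.items()}
print(codes, code_book)
```

Keeping `code_book` alongside the data is what makes the conversion lossless, as the text requires.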


Normalization

One of the assumptions of many of the statistical analyses used in this method is that the variables follow a normal probability distribution. To check this assumption, we build histograms and run normality tests for each variable in the MDB.

Most of the variables follow a normal probability distribution; those that do not follow a lognormal probability distribution. An example of the latter is shown in Fig. 2. To normalize these variables a transformation must be applied (Box-Cox power transformation); for the case shown, the transformation is equivalent to taking logarithms of the variable, since the Box-Cox transformation with exponent λ = 0 is the natural logarithm. An example of the result of applying this transformation is shown in Fig. 3.

From now on, when we refer to these variables, we mean their logarithms. As an example, for comparison purposes, the following figures show the effect of taking logarithms of the variable Dip Angle.
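The normality check and Box-Cox transformation can be sketched with SciPy. Assumptions: the Shapiro-Wilk test stands in for the normality analysis (the paper does not name its test), the 0.05 significance level is our choice, and the synthetic "dip angle" sample is lognormal by construction, so the fitted Box-Cox exponent should come out close to 0, the value at which Box-Cox reduces to the natural logarithm.

```python
import numpy as np
from scipy import stats


def normalize_if_skewed(x, alpha=0.05):
    """Box-Cox transform x if it fails a Shapiro-Wilk normality test.

    Returns (values, lmbda); lmbda is None when x is left unchanged.
    Box-Cox requires strictly positive data.
    """
    x = np.asarray(x, dtype=float)
    _, p_value = stats.shapiro(x)
    if p_value >= alpha:
        return x, None
    transformed, lmbda = stats.boxcox(x)  # exponent fitted by maximum likelihood
    return transformed, lmbda


rng = np.random.default_rng(0)
dip_angle = rng.lognormal(mean=1.5, sigma=0.8, size=300)  # synthetic data
values, lam = normalize_if_skewed(dip_angle)
print(f"fitted lambda = {lam:.3f}")  # close to 0, i.e. nearly a log transform

# With the exponent fixed at 0, Box-Cox is exactly the natural logarithm.
assert np.allclose(stats.boxcox(dip_angle, lmbda=0), np.log(dip_angle))
```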


Outlier identification

Outliers can have a disastrous effect on the results of the statistical analyses used in this method. To avoid this, a step is added to identify and delete any outliers found. To detect outliers we use a box-plot analysis for every variable in the MDB, a very useful graphical tool for this purpose (Walpole et al., 2012). Fig. 4 shows a box plot for the variable Average Matrix Porosity (%), where one reservoir with an anomalous value of this property appears (square symbol on the right of the figure). The shaded box contains 50 % of the reported values and the whiskers 95 %. The isolated points are outliers. The line in the center of the shaded box is the median.
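A numeric counterpart of the box-plot check is sketched below. It uses Tukey's fences (points more than 1.5 interquartile ranges beyond the quartiles), the usual box-plot convention; since the paper describes its whiskers as spanning 95 % of the values, the factor `k` is an assumption to tune. The function name and porosity values are made up for illustration.

```python
import numpy as np


def boxplot_outliers(values, k=1.5):
    """Boolean mask of points outside the fences Q1 - k*IQR and Q3 + k*IQR.

    Missing values (NaN) are ignored when computing the quartiles and are
    never flagged as outliers.
    """
    x = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    with np.errstate(invalid="ignore"):
        return (x < lower) | (x > upper)


porosity = np.array([8.0, 9.0, 10.0, 11.0, 12.0, 10.0, 9.0, 11.0, 35.0])
mask = boxplot_outliers(porosity)
print(porosity[mask])       # [35.]: Q1 = 9, Q3 = 11, upper fence = 14
kept = porosity[~mask]      # values retained for the following steps
```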

Missing values analysis

Missing values raise an important challenge because typical statistical modelling procedures discard incomplete cases from the analysis. When there are few missing values (< 10 % of the total number of cases) and the values can be considered missing at random, the typical method of list-wise deletion (if any variable has a missing value, the whole case is omitted from the computations) is a good alternative. This quick solution cannot be applied when many cases (> 10 %) have missing values, because we would then lose a large amount of information.
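The 10 % rule of thumb for list-wise deletion can be written as a small decision function. The function name is hypothetical, missing values are assumed to be stored as NaN, and the function does not test whether the values are actually missing at random, which the text requires as a separate condition.

```python
import numpy as np


def handle_missing(data, max_fraction=0.10):
    """Apply list-wise deletion when fewer than `max_fraction` of the cases
    (rows) have a missing value; otherwise flag the data for imputation.

    Returns (data, strategy). Assumes missing values are NaN and that they
    are missing at random (not checked here).
    """
    data = np.asarray(data, dtype=float)
    incomplete = np.isnan(data).any(axis=1)  # case has at least one gap
    if incomplete.mean() < max_fraction:
        return data[~incomplete], "list-wise deletion"
    return data, "imputation needed"


mdb = np.arange(60, dtype=float).reshape(20, 3)  # 20 cases, 3 variables
mdb[4, 1] = np.nan                                # 1 of 20 cases (5 %)
kept, strategy = handle_missing(mdb)
print(strategy, kept.shape)                       # list-wise deletion (19, 3)

mdb[[7, 9, 11], 0] = np.nan                       # now 4 of 20 cases (20 %)
print(handle_missing(mdb)[1])                     # imputation needed
```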

When too many cases have missing values, an accurate estimation of those values is needed. In this work, missing numeric scale data are imputed with a univariate model (one model per variable) based on multiple linear regression, while logistic regression is used as the univariate model for dichotomous categorical variables. When a variable has more than two outcome categories, we use discriminant analysis and multinomial logistic regression (César Pérez, 2004).

During imputation, each variable in the MDB may or may not be selected as an independent variable, and the range of imputed values of a numeric scale variable can be restricted so that they are plausible. In addition, imputation is restricted to variables with less than a maximum percentage of missing values; in this work we use a 30 % cutoff.

After imputation, the user should check the results. In our case, the imputed values lie inside the original range of each property, and the mean, median, and standard deviation of the whole data set (including the imputed values) do not change significantly from the original ones.
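Regression imputation of a numeric variable, with the 30 % cutoff, the range restriction, and the post-imputation check described above, can be sketched with NumPy least squares. This is a simplified stand-in for the authors' tool: it fits a single ordinary least-squares model on predictors assumed to be complete, clips predictions to the observed range, and uses synthetic data; the helper names are hypothetical, and the logistic, multinomial, and discriminant models for categorical variables are not shown.

```python
import numpy as np


def impute_by_regression(y, X, max_missing=0.30):
    """Fill NaNs in y with a multiple linear regression on X.

    X is assumed complete. Returns None when y has `max_missing` or more
    missing values (the variable is then excluded from imputation).
    Imputed values are clipped to the observed range of y.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(y)
    if missing.mean() >= max_missing:
        return None
    A = np.column_stack([np.ones(len(y)), X])  # intercept + predictors
    coef, *_ = np.linalg.lstsq(A[~missing], y[~missing], rcond=None)
    filled = y.copy()
    filled[missing] = np.clip(A[missing] @ coef, np.nanmin(y), np.nanmax(y))
    return filled


def relative_shift(original, filled):
    """Relative change of mean, median and standard deviation."""
    before = np.array([np.nanmean(original), np.nanmedian(original), np.nanstd(original)])
    after = np.array([filled.mean(), np.median(filled), filled.std()])
    return (after - before) / before


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 10 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=100)
y[rng.choice(100, size=15, replace=False)] = np.nan  # 15 % missing

filled = impute_by_regression(y, X)
print(relative_shift(y, filled))  # expected to be small for all three statistics
```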

An important issue related to missing-value imputation is that a new imputation must be performed every time the MDB is updated with new reservoirs or new data. This is because both the missing-data pattern and the ranges of the properties may change and must be analysed again, which means that the imputed values could be quite different.

Standardization

Additionally, all the variables should be standardized so that their ranges have about the same order of magnitude. Variables with very different orders of magnitude could bias the results of the multivariate analysis. So, every variable (column in the MDB) is standardized by subtracting its mean and dividing by its standard deviation.

Thus, in the following analysis all the variables are standardized unless explicitly stated otherwise.
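Column-wise standardization is short in NumPy. The sketch below also returns the column means and standard deviations, on the assumption that a reservoir added later (such as a target) should be standardized with the same parameters as the MDB. The sample standard deviation (ddof=1) is our choice, since the paper does not specify it, and the two-column example (depth in metres, porosity in percent) is synthetic.

```python
import numpy as np


def standardize(data, ddof=1):
    """Column-wise z-scores: subtract each column mean, divide by its std."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=ddof)
    return (data - mean) / std, mean, std


# Depth (m) and porosity (%) differ by about three orders of magnitude.
mdb = np.array([[2500.0, 8.0],
                [3100.0, 12.0],
                [1800.0, 15.0],
                [4200.0, 6.0]])
z, mean, std = standardize(mdb)
print(z.round(2))

# A new (target) reservoir is scaled with the MDB parameters, not its own.
target_z = (np.array([2900.0, 10.0]) - mean) / std
```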

Key Parameter Selection

The use of analogs is a very common practice in the E&P industry. Identifying reservoirs with similar features and characteristics is a reliable way to infer unknown parameters of the target reservoir under evaluation and to analyze production strategies.

Even though it is almost a routine activity inside companies, little has been published about procedures to identify this similarity among reservoirs. However, it is well accepted that the validity of analogs depends on the purpose they serve. In this sense, we may define different kinds of needs or problem types: static (not related to production), dynamic (related to production), PVT behavior (related to fluid properties), etc.

The parameters used to establish the analogy (commonly referred to as Key Parameters, KP) change depending on the intended use of the analogs. Therefore, it is crucial to spend time defining the role of the desired analogs and selecting the appropriate set of Key Parameters, so as to minimize the uncertainty entailed in assuming reservoir characteristics.

Most of the references dealing with the search for analogous reservoirs (Harrel, 2004; Hodgin, 2006; Sidle, 2010) discuss several categories of data or KP used to find analogies: geological, petrophysical, engineering, and operational. In this work we use the KP shown in Table 3.

BASIN CODE (KLEMME)
PRESENT TECTONICS CODE
FLUID TYPE CODE
PRINCIPAL STRUCTURAL CODE
PRIMARY TRAP TYPE CODE
log_NUMBER OF STRUCTURAL COMPARTMENTS
TOP RESERVOIR DEPTH (m)
log_DIP ANGLE (°)
log_AREA (km²)
log_ORIGINAL TOTAL HC COLUMN HEIGHT (m)
PRIMARY SEDIMENTARY SYSTEM CODE
PRIMARY SEDIMENTARY ENVIRONMENT CODE
log_NUMBER OF STRATIGRAPHIC COMPARTMENTS
PRIMARY LITHOLOGY CODE
PRIMARY POROSITY TYPE CODE
SECONDARY POROSITY TYPE CODE
log_AVERAGE GROSS THICKNESS (m)
log_AVERAGE NET PAY (m)
AVERAGE MATRIX POROSITY (%)
log_AVERAGE AIR PERMEABILITY (mD)
AVERAGE WATER SATURATION (%)
DIAGENETIC PROCESS CODE
FRACTURE RESERVOIR CLASSIFICATION CODE
ORIGINAL TEMPERATURE GRADIENT (°C/m)
TEMPERATURE AT TOP RESERVOIR DEPTH (°C)
ORIGINAL PRESSURE GRADIENT (kPa/m)
PRESSURE AT TOP RESERVOIR DEPTH (kPa)
PRIMARY DRIVE MECHANISM CODE

Multivariate Analysis

The core of the method is a multivariate statistical analysis of the preprocessed KP database (KPDB). The statistical techniques applied are Principal Component Analysis (PCA) and Cluster Analysis.

PCA is often used for dimension reduction, to identify a small number of new latent variables (principal components, PC) that explain most of the variance observed in a much larger number of original variables. This is very important for the conceptual clarification and simplification of the later analysis.

The new variables (principal components) are computed as linear combinations of the original variables. They have the desirable property of being uncorrelated with each other, which makes the results of the Cluster Analysis easier to attribute back to their source.

In this study we use the PCA method. This method is used when there is more emphasis on data reduction and less on interpretation, whereas Factor Analysis is used when there is interest in studying the relationships among the variables. Another advantage of PCA is that it is insensitive to multicollinearity among the variables (Daniel Peña, 2002).

Because the variables may have very different orders of magnitude, we apply PCA to the standardized variables.

PCA tries to account for the maximum amount of variation in the set of variables. Each eigenvalue of the correlation matrix represents the standardized variance of the original variables accounted for by its component, while the associated eigenvector gives the coefficient of every original variable in that component.

The maximum amount of standardized variance contained in a single variable is 1. So, if an eigenvalue is greater than 1, its component must account for variation in several variables.
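PCA on the correlation matrix, as described above, can be sketched with NumPy's symmetric eigensolver. The function name and the data are hypothetical: five correlated synthetic variables built from two latent factors. The sketch returns the eigenvalues in decreasing order, the eigenvectors (the coefficients of each original variable), and the component scores, and it illustrates two properties used in the text: the eigenvalues sum to the number of variables, and the variance of each score column equals its eigenvalue.

```python
import numpy as np


def pca_correlation(data):
    """PCA of standardized data via the eigen-decomposition of its
    correlation matrix.

    Returns eigenvalues (descending), eigenvectors (one column per
    component; entry [k, m] is the coefficient of variable k in component m)
    and the component scores of every case.
    """
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    scores = z @ eigenvectors
    return eigenvalues, eigenvectors, scores


rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 2))
# Five correlated variables built from two latent factors plus noise.
data = latent @ rng.normal(size=(2, 5)) + 0.3 * rng.normal(size=(200, 5))
eigenvalues, eigenvectors, scores = pca_correlation(data)

print(eigenvalues.round(2))                 # sorted from largest to smallest
print(eigenvalues.sum())                    # the number of variables (5), up to rounding
print(scores.var(axis=0, ddof=1).round(2))  # matches the eigenvalues
```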

The first component has the maximum variance (the greater the variance, the more information it carries), and successive principal components explain progressively smaller portions of the variance. The result of applying PCA to our data is shown in Table 4.

Table 4. Absolute, relative and cumulative variance obtained from applying PCA to the KPDB.

Component    Value of the Eigenvalue    Variance (%)    Cumulative Variance (%)

It is customary to select the first principal components that together reach a cumulative variance of 90 %. To fulfil this criterion, Table 4 indicates that we should select the first 19 principal components. If we relaxed this criterion and retained only 80 % of the original variance, we would select the first 14 principal components. Another criterion is to retain only those principal components whose variance is greater than the mean variance. When the correlation matrix is used to compute the principal components, the mean variance is 1 (the eigenvalues sum to the trace of the correlation matrix, which equals the number of variables), so we would select only the PCs with an absolute variance greater than 1. In this case, according to Table 4, we would select the first 10 principal components (retaining only 69 % of the variance in the original data).

Depending on the criterion used, the reduction in the number of variables may be significant. In this case, even with the last criterion, which retains 10 PCs, the number of variables is reduced by only about two-thirds. The component loadings for the first 10 PCs are given in Table 5, which shows the weight of each KP in each PC.
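The three retention criteria discussed above can be computed directly from the eigenvalues. The sketch below uses a made-up set of seven eigenvalues that sum to 7, as the eigenvalues of a 7 x 7 correlation matrix must; it does not reproduce Table 4, and the function names are hypothetical.

```python
import numpy as np


def n_components_for_variance(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `threshold` (e.g. 0.90 or 0.80)."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(ev) / ev.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)


def n_components_above_mean(eigenvalues):
    """Number of components whose variance exceeds the mean variance
    (the mean is 1 when PCA is computed from a correlation matrix)."""
    ev = np.asarray(eigenvalues, dtype=float)
    return int(np.sum(ev > ev.mean()))


ev = [3.0, 1.8, 1.1, 0.5, 0.3, 0.2, 0.1]  # illustrative, sums to 7
print(n_components_for_variance(ev, 0.90))  # 4 (cumulative 91.4 %)
print(n_components_for_variance(ev, 0.80))  # 3 (cumulative 84.3 %)
print(n_components_above_mean(ev))          # 3 (eigenvalues 3.0, 1.8, 1.1)
```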


Cluster analysis

Cluster Analysis is a multivariate statistical method for automatic, or unsupervised, data classification. In this work Cluster Analysis is used to find groups (clusters) of similar reservoirs based on the variables examined, in this case the PCs, such that the similarity between members of the same group is high and the similarity between members of different groups is low. The results can be used to identify associations that would otherwise not be apparent.

Applying Cluster Analysis to the component scores resulting from PCA, instead of directly to the original variables, has two advantages: it (i) simplifies the later interpretation of the results and (ii) avoids overweighting concepts that are captured simultaneously by similar properties among the original variables.

We use Cluster Analysis to identify the cluster to which the target reservoir belongs. All the reservoirs in that cluster are then considered analogous reservoirs. This concept is illustrated in Fig. 5; in this particular case the target reservoir belongs to cluster number 2.

Figure 5. Idealization of the application of cluster analysis for the identification of analogous reservoirs.

We use two different methods of cluster analysis: the hierarchical method with Ward's grouping rule and the Two Step method (César Pérez, 2004).

The hierarchical procedure attempts to identify relatively homogeneous groups of reservoirs based on selected characteristics (principal components in this work). Its algorithm starts with each reservoir in a separate cluster and combines clusters until only one is left, to which all reservoirs belong.

Ward's grouping rule is selected as an internal parameter of the hierarchical cluster analysis. At each step, this rule merges the pair of clusters that produces the minimum increase in the residual (within-cluster) variance, so in 3D it tends to create roughly spherical clusters.

One of the main outputs of this method is the dendrogram, a graphical representation used to assess the cohesiveness of the clusters formed and, at the same time, to provide information about the appropriate number of clusters to keep. The dendrogram has a hierarchical structure that classifies all the reservoirs into clusters at different levels, where the levels represent the distances between clusters.
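Ward clustering of the PC scores, a cut of the dendrogram, and the selection of the target's cluster can be sketched with SciPy. The function names are hypothetical and the data are three synthetic, well-separated groups. The `suggest_n_clusters` heuristic (cut just below the largest jump between successive merge heights) is one common way of reading a dendrogram, not a rule stated in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def suggest_n_clusters(scores):
    """Cut the Ward dendrogram just below its largest jump in merge height."""
    heights = linkage(scores, method="ward")[:, 2]
    k = int(np.argmax(np.diff(heights)))  # largest gap follows merge k
    return len(heights) - k               # clusters left after merge k


def ward_analogues(scores, names, target, n_clusters):
    """Names of the reservoirs that share the target's Ward cluster."""
    Z = linkage(scores, method="ward")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    target_label = labels[names.index(target)]
    return [name for name, label in zip(names, labels) if label == target_label]


rng = np.random.default_rng(3)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
scores = np.vstack([c + 0.5 * rng.normal(size=(10, 2)) for c in centres])
names = [f"R{i:02d}" for i in range(30)]  # R00 to R09 form the first group

n = suggest_n_clusters(scores)
print(n, ward_analogues(scores, names, target="R00", n_clusters=n))
```

The target itself is part of the returned list, just as Casablanca is counted inside its own cluster in the case study.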

The two steps of the Two Step algorithm are the pre-clustering and clustering steps. During the pre-clustering step, individual cases are grouped into pre-clusters in a single data pass. Next, in the clustering step, an agglomerative hierarchical clustering algorithm is applied to the pre-clusters. Statistical criteria are recorded as clustering proceeds and are used to determine the optimal number of clusters (within a user-specified range).

As internal parameters of the Two Step cluster analysis we selected the following options:

Distance measure: Log-likelihood. This option determines how the similarity between two clusters is computed. The likelihood measure places a probability distribution on the variables. The Two Step procedure handles both categorical and continuous variables, and the log-likelihood distance measure takes their distinct properties into account.

Clustering criterion: This option determines how the automatic clustering algorithm chooses the number of clusters. Either Schwarz's Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) can be specified. Both criteria choose the optimal number of clusters by evaluating the model's goodness of fit.

The Two Step Cluster Analysis procedure has several potential advantages over other clustering methods:

It can automatically select the number of clusters based on statistical criteria.

It works with both continuous and categorical variables, taking their different properties into account.

It is recommended for large data sets.
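The Two Step procedure itself (an SPSS algorithm with a log-likelihood distance) is not reproduced here. As a stand-in that shows the idea of letting an information criterion choose the number of clusters, the sketch below fits scikit-learn Gaussian mixtures for a range of cluster counts and keeps the one with the lowest BIC. This swaps in a different clustering model, handles only continuous inputs (such as PC scores), and uses synthetic data; the function name is hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def choose_k_by_bic(scores, k_values=range(1, 7), seed=0):
    """Fit one Gaussian mixture per candidate k; keep the lowest-BIC model.

    Returns the selected k and the cluster label of every case.
    """
    best_k, best_bic, best_model = None, np.inf, None
    for k in k_values:
        model = GaussianMixture(n_components=k, covariance_type="diag",
                                random_state=seed).fit(scores)
        bic = model.bic(scores)  # lower is better
        if bic < best_bic:
            best_k, best_bic, best_model = k, bic, model
    return best_k, best_model.predict(scores)


rng = np.random.default_rng(4)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
scores = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centres])
k, labels = choose_k_by_bic(scores)
print(k)
```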

Similarity Ranking

Once the analogous reservoirs have been identified (those that belong to the same cluster as the target reservoir), we apply a similarity function to these a priori analogous reservoirs to sort them by decreasing similarity to the target reservoir. The aim is to rank the analogous reservoirs with respect to the target, that is, to measure quantitatively which of them are most similar to the target reservoir and how strong that similarity is. This concept is illustrated in Fig. 6.

In this step we work with a reduced set of reservoirs: those that belong to the same cluster as the target reservoir. These reservoirs are considered a priori analogous reservoirs. All the information needed to apply the similarity function is represented in the schematic database of Fig. 7:

Columns: KP 1, KP 2, ..., KP (j), ..., KP p. Rows: Analogous Reservoir 1, Analogous Reservoir 2, ..., Analogous Reservoir (i), ..., Analogous Reservoir n, and Target Reservoir (t).

Figure 7. Schematic database containing all the information needed to apply the similarity function.

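If the variable is continuous, for the jth KP we define the following similarity function between an analogous reservoir (i) and the target reservoir (t):

$$ s_j(i,t) = 1 - \frac{\lvert x_j(i) - x_j(t) \rvert}{R(j)} $$

where $x_j(i)$ and $x_j(t)$ are the values of the jth KP for reservoir (i) and for the target (t), and $R(j)$ is the range of the jth KP:

$$ R(j) = \max_k \{ x_j(k) \} - \min_k \{ x_j(k) \} $$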

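If the variable is categorical, for the jth KP we define in this work the following similarity function between an analogous reservoir (i) and the target reservoir (t):

$$ s_j(i,t) = \begin{cases} 1 & \text{if } x_j(i) = x_j(t) \\ 0 & \text{if } x_j(i) \neq x_j(t) \end{cases} $$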

We now define the following global similarity function between an analogous reservoir (i) and the target reservoir (t). This expression is known as Gower's General Similarity Coefficient (Gower, 1971):

$$ S(i,t) = \frac{\sum_{j=1}^{p} w_j \, s_j(i,t)}{\sum_{j=1}^{p} w_j} $$

where $w_j$ are binary weights. In this work all the weights are equal to one, but they may differ depending on the kind of KP and the purpose of the analogs.
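The similarity functions and Gower's coefficient above can be sketched in NumPy. Assumptions: categorical KP are stored as numeric codes, the range R(j) is taken over the a priori analogues plus the target, the weights default to one as in this work, and the reservoir names and values are made up. Because each continuous term is divided by the range, the result does not change if a KP is standardized first, although a log transformation does change it.

```python
import numpy as np


def gower_similarity(analogues, target, is_categorical, weights=None):
    """Gower's general similarity between each analogue (rows) and the target.

    Continuous KP j:  s_j = 1 - |x_j(i) - x_j(t)| / R(j)
    Categorical KP j: s_j = 1 if x_j(i) == x_j(t) else 0
    Global:           S   = sum_j w_j s_j / sum_j w_j
    """
    X = np.asarray(analogues, dtype=float)
    t = np.asarray(target, dtype=float)
    categorical = np.asarray(is_categorical, dtype=bool)
    w = np.ones(X.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    everyone = np.vstack([X, t])
    R = everyone.max(axis=0) - everyone.min(axis=0)
    R[R == 0] = 1.0  # a constant KP gives zero differences; avoid 0/0
    s = np.where(categorical, (X == t).astype(float), 1.0 - np.abs(X - t) / R)
    return (s * w).sum(axis=1) / w.sum()


# KP: matrix porosity (%), log permeability, lithology code (categorical).
names = ["A", "B", "C"]
analogues = [[11.0, 1.4, 1],
             [20.0, 2.5, 2],
             [12.0, 1.5, 2]]
target = [12.0, 1.5, 1]
similarity = gower_similarity(analogues, target, [False, False, True])

for rank, idx in enumerate(np.argsort(-similarity, kind="stable"), start=1):
    print(rank, names[idx], round(float(similarity[idx]), 2))
# 1 A 0.93, 2 C 0.67, 3 B 0.07
```

Ties are broken by input order in this sketch, whereas Table 8 gives tied reservoirs the same rank.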

The Casablanca oil field is one of the main producing fields in the Spanish Mediterranean Sea. Discovered in 1975, it is still in operation, with Repsol as the main operator. It is located 45 km offshore, southeast of the city of Tarragona (Spain). The main reservoir rocks are Upper Jurassic and Lower Cretaceous shallow-marine carbonates (Orlopp, 1988; Lomando et al., 1993).

Once the method has been developed, it should be checked on a well-known case. For this purpose we use the Casablanca field (henceforth Casablanca) as the target reservoir, because some of its analogous reservoirs are known. In this example, for confidentiality reasons, all the data used come from the C&C Reservoirs DAKS commercial database (C&C Reservoirs Limited, 2013).

In the first step of the multivariate analysis we apply PCA to the KPDB and select 19 PCs, following the criterion of retaining the first PCs that reach a cumulative variance of 90 % (losing as little information as possible). We then apply both clustering techniques, Hierarchical Ward and Two Step, to these 19 PCs.

The Hierarchical Ward method produces a dendrogram. Fig. 8 shows a fragment of this dendrogram, in which the cluster of Casablanca's analogous reservoirs is outlined in red. This cluster consists of 17 reservoirs, including Casablanca itself.

Figure 8. Fragment of the dendrogram obtained by applying hierarchical cluster analysis. Casablanca's analogous reservoirs are shown in red outline.

In this dendrogram, the cluster of Casablanca's analogous reservoirs can be interpreted as the merger of two smaller clusters that are very similar to each other (this similarity is indicated by the short blue horizontal lines that merge them). The next possible grouping would require a big jump (a yellow horizontal line about three times longer) to merge this cluster with others. We therefore interpret the natural classification of Casablanca's analogous reservoirs as consisting of 17 reservoirs.

We also applied the Two Step method with different numbers of clusters, trying to optimize this parameter to obtain a natural classification of the reservoirs. The final number of clusters selected is 25. With this configuration, the analogous cluster consists of 21 reservoirs, including Casablanca itself. The analogous reservoirs obtained with the Two Step method are listed in Table 6.

ARDMORE [ZECHSTEIN (HALIBUT-TURBOT)]

AUK [ZECHSTEIN (HALIBUT)]

BLACKBURN [NEVADA]

CASABLANCA


GELA [TAORMINA]

GRANT CANYON [GUILMETTE]

LIUBEI [WUMISHAN]

MARKOVO [OSA]

MATZEN [HAUPTDOLOMITE-BOCKFLIESS BEDS]

RAGUSA [TAORMINA]

RECHITSA [SEMILUKI]

RECHITSA [VORONEZH]

RECHITSA [ZADONSK]

RENQIU [WUMISHAN]

VEGA [SIRACUSA]

VERKHNECHONA [DANILOV (PREOBRAZHEN HZ)]

VERKHNEVILYUY [YURYAKH]

YANLING [WUMISHAN]

YIHEZHUANG [MAJIAGOU-BADOU]

Comparing the results of both methods, all the analogous reservoirs identified by the Hierarchical Ward method are also identified by the Two Step method, except NAGYLENGYEL. Moreover, the Two Step method identifies five analogous reservoirs that the hierarchical method did not: ARDMORE, AUK, MARKOVO, VERKHNECHONA, and VERKHNEVILYUY.

In this study, the criterion for defining Casablanca's analogous reservoirs after cluster analysis has been to keep every reservoir identified by at least one of the two clustering methods. The final list of Casablanca's analogous reservoirs after cluster analysis is therefore given in Table 7.

Table 7. Casablanca's analogous reservoirs after cluster analysis (identified by at least one of the two methods).

AMPOSTA MARINO
ARDMORE [ZECHSTEIN (HALIBUT-TURBOT)]
AUK [ZECHSTEIN (HALIBUT)]
BLACKBURN [NEVADA]
CASABLANCA
EAGLE SPRINGS [SHEEP PASS]
GELA [TAORMINA]
GRANT CANYON [GUILMETTE]
LIUBEI [WUMISHAN]
MARKOVO [OSA]
MATZEN [HAUPTDOLOMITE-BOCKFLIESS BEDS]
NAGYLENGYEL [MAIN DOLOMITE (HAUPTDOLOMITE)]
RAGUSA [TAORMINA]
RECHITSA [SEMILUKI]
RECHITSA [VORONEZH]
RECHITSA [ZADONSK]
RENQIU [WUMISHAN]
VEGA [SIRACUSA]
VERKHNECHONA [DANILOV (PREOBRAZHEN HZ)]
VERKHNEVILYUY [YURYAKH]
YANLING [WUMISHAN]
YIHEZHUANG [MAJIAGOU-BADOU]
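The comparison of the two memberships and the "identified by at least one method" rule reduce to set operations. The sketch below is a minimal illustration, assuming each method's analogous cluster is available as a set of reservoir names; the function name is hypothetical, and the sets in the usage lines are small illustrative subsets of the memberships discussed above, not the full clusters.

```python
def combine_memberships(hierarchical: set[str], two_step: set[str]) -> dict[str, set[str]]:
    """Compare the analogous clusters from two methods and apply the
    'identified by at least one method' selection rule."""
    return {
        "only_hierarchical": hierarchical - two_step,
        "only_two_step": two_step - hierarchical,
        "both": hierarchical & two_step,
        "selected": hierarchical | two_step,  # union = final list of analogues
    }


# Illustrative subsets only (names taken from the tables above).
hier = {"CASABLANCA", "RENQIU [WUMISHAN]", "NAGYLENGYEL [MAIN DOLOMITE (HAUPTDOLOMITE)]"}
two_step = {"CASABLANCA", "RENQIU [WUMISHAN]", "AUK [ZECHSTEIN (HALIBUT)]"}
result = combine_memberships(hier, two_step)
print(sorted(result["only_hierarchical"]))  # ['NAGYLENGYEL [MAIN DOLOMITE (HAUPTDOLOMITE)]']
print(sorted(result["only_two_step"]))      # ['AUK [ZECHSTEIN (HALIBUT)]']
print(len(result["selected"]))              # 4
```

Taking the union favors recall: a reservoir missed by one method is still retained, and the similarity ranking applied afterwards is what separates close analogues from marginal ones.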

To validate the method developed in this study, we compared these results with the reservoirs that the company had previously identified as analogous to Casablanca:

Amposta Marino

Nagylengyel

Yanling

Renqiu

All four previously known analogous reservoirs were also identified by the method developed in this study. In addition, a similarity function was applied to all of Casablanca's analogous reservoirs after cluster analysis, to rank them by their similarity to the Casablanca reservoir. The similarity index and similarity ranking of these reservoirs are given in Table 8, and Fig. 9 shows their geographical distribution.

Table 8. Similarity index and ranking of Casablanca's analogous reservoirs obtained with the method proposed in this work.

RESERVOIR | SIMILARITY INDEX | SIMILARITY RANKING
AMPOSTA MARINO | 0.85 | 1
RECHITSA [VORONEZH] | 0.75 | 2
RECHITSA [SEMILUKI] | 0.74 | 3
LIUBEI [WUMISHAN] | 0.74 | 3
MATZEN [HAUPTDOLOMITE-BOCKFLIESS BEDS] | 0.73 | 4
RENQIU [WUMISHAN] | 0.72 | 5
YANLING [WUMISHAN] | 0.72 | 5
AUK [ZECHSTEIN (HALIBUT)] | 0.68 | 7
BLACKBURN [NEVADA] | 0.68 | 7
GRANT CANYON [GUILMETTE] | 0.67 | 8
NAGYLENGYEL [MAIN DOLOMITE (HAUPTDOLOMITE)] | 0.66 | 9
ARDMORE [ZECHSTEIN (HALIBUT-TURBOT)] | 0.66 | 9
EAGLE SPRINGS [SHEEP PASS] | 0.61 | 11
RAGUSA [TAORMINA] | 0.59 | 12
GELA [TAORMINA] | 0.57 | 13
MARKOVO [OSA] | 0.57 | 13


Applying the method proposed in this work gives the following results:

- The four previously known reservoirs analogous to Casablanca were also identified.
- Among these four, AMPOSTA MARINO is the most similar (first in the ranking), RENQIU and YANLING are 5th, and NAGYLENGYEL is 9th. These are shown in bold in Table 8.
- The similarity index ranges from 0.57 to 0.85. Only reservoirs with a similarity index greater than 0.5 are ultimately identified as analogous; two of the 21 reservoirs originally selected, VERKHNECHONA and VERKHNEVILYUY, were left out because their similarity index was only 0.4.
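Tied similarity indices in Table 8 share a rank (0.74 and 0.72, for example), which is consistent with a dense ranking. The sketch below applies the 0.5 cutoff stated above and then assigns dense ranks; that ranking convention is an assumption, and the function name is hypothetical. The scores are a subset of the values reported in Table 8 and in the text.

```python
def rank_analogues(scores: dict[str, float], threshold: float = 0.5) -> list[tuple[str, float, int]]:
    """Drop reservoirs at or below the similarity threshold, then assign dense
    ranks: tied indices share a rank and the next distinct index gets the next integer."""
    kept = {name: s for name, s in scores.items() if s > threshold}
    distinct = sorted(set(kept.values()), reverse=True)
    rank_of = {value: position + 1 for position, value in enumerate(distinct)}
    rows = [(name, s, rank_of[s]) for name, s in kept.items()]
    rows.sort(key=lambda row: (row[2], row[0]))  # by rank, then by name
    return rows


# Subset of the values reported in Table 8 and in the text.
scores = {
    "AMPOSTA MARINO": 0.85,
    "RECHITSA [VORONEZH]": 0.75,
    "RECHITSA [SEMILUKI]": 0.74,
    "LIUBEI [WUMISHAN]": 0.74,
    "MATZEN [HAUPTDOLOMITE-BOCKFLIESS BEDS]": 0.73,
    "VERKHNECHONA [DANILOV (PREOBRAZHEN HZ)]": 0.4,
    "VERKHNEVILYUY [YURYAKH]": 0.4,
}
for name, s, rank in rank_analogues(scores):
    print(f"{rank:>2}  {s:.2f}  {name}")
```

Here AMPOSTA MARINO gets rank 1, the two reservoirs at 0.74 share rank 3, MATZEN gets rank 4, and the two reservoirs at 0.4 are dropped, matching the corresponding entries of Table 8 and the text.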

Development of the statistical method for the identification of analogous reservoirs

Using statistical criteria, we evaluated the existing data in the master database; outliers were detected, analyzed and corrected. A statistical imputation procedure was then used to estimate all missing values in the master database, which allowed us to work with a 100% complete database.
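This section does not state which outlier criterion or imputation procedure was used. As a minimal sketch of the two steps, the code below swaps in two simple, well-known stand-ins: Tukey's interquartile-range fences for flagging outliers (for review, not automatic correction) and median imputation for missing values. Both choices, the function names and the porosity values are assumptions; the procedure used on the master database was likely more elaborate.

```python
import math
import statistics
from typing import Optional


def is_missing(v: Optional[float]) -> bool:
    return v is None or math.isnan(v)


def flag_outliers_iqr(values: list[Optional[float]], k: float = 1.5) -> list[bool]:
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
    Missing values are never flagged."""
    present = [v for v in values if not is_missing(v)]
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not is_missing(v) and not (low <= v <= high) for v in values]


def impute_median(values: list[Optional[float]]) -> list[float]:
    """Replace missing values with the median of the observed ones (simple stand-in)."""
    med = statistics.median(v for v in values if not is_missing(v))
    return [med if is_missing(v) else v for v in values]


# Hypothetical porosity fractions for one key parameter; None marks a missing value.
porosity = [0.12, 0.15, 0.14, None, 0.13, 0.90, 0.16]
print(flag_outliers_iqr(porosity))  # [False, False, False, False, False, True, False]
print(impute_median(porosity))
```

In practice a flagged value such as 0.90 would be checked against its source before deciding whether it is an error or a genuine extreme, which is the "analyzed and corrected" step described above.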

We developed a statistical and systematic method for the identification of analogous reservoirs. The method is based on principal component analysis (PCA), using the key parameters (KP) in the carbonate database as input, followed by cluster generation using the principal components (PCs) obtained in the previous step as input. The clustering methods used are Hierarchical Ward and Two Step. With each method and for every target reservoir, we generated the clusters needed to obtain a natural grouping of reservoirs.
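The PCA step can be sketched without external libraries. The version below is a minimal illustration, assuming the KP are numeric, complete (after imputation) and standardized to z-scores, so that PCA is performed on the correlation matrix; it uses a cyclic Jacobi eigen-decomposition. In practice a library routine such as `numpy.linalg.eigh` would normally be used. The function names and the KP values are hypothetical, and the clustering methods would then be applied to the returned scores.

```python
import math


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Convert each column (key parameter) to z-scores so that units do not dominate."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(p)] for r in rows]


def jacobi_eigen(a: list[list[float]], tol: float = 1e-20, max_sweeps: int = 100):
    """Eigenpairs of a symmetric matrix by cyclic Jacobi rotations,
    sorted by decreasing eigenvalue. Eigenvectors are returned as lists."""
    p = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation angle that zeroes a[i][j].
                theta = (a[j][j] - a[i][i]) / (2.0 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # A <- A P (columns i, j), then A <- P^T A (rows i, j), then V <- V P.
                for k in range(p):
                    a[k][i], a[k][j] = c * a[k][i] - s * a[k][j], s * a[k][i] + c * a[k][j]
                for k in range(p):
                    a[i][k], a[j][k] = c * a[i][k] - s * a[j][k], s * a[i][k] + c * a[j][k]
                for k in range(p):
                    v[k][i], v[k][j] = c * v[k][i] - s * v[k][j], s * v[k][i] + c * v[k][j]
    order = sorted(range(p), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[r][i] for r in range(p)] for i in order]


def pca(rows: list[list[float]], k: int):
    """Scores on the first k principal components and their explained-variance ratios."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    corr = [[sum(r[x] * r[y] for r in z) / (n - 1) for y in range(p)] for x in range(p)]
    eigvals, eigvecs = jacobi_eigen(corr)
    total = sum(eigvals)
    explained = [val / total for val in eigvals[:k]]
    scores = [[sum(r[j] * vec[j] for j in range(p)) for vec in eigvecs[:k]] for r in z]
    return scores, explained


# Hypothetical KP table: rows are reservoirs; columns are, e.g., porosity,
# log10 permeability and depth (km).
kp = [
    [0.12, 2.1, 2.4],
    [0.15, 2.6, 2.0],
    [0.08, 1.5, 3.1],
    [0.20, 3.0, 1.8],
    [0.10, 1.9, 2.7],
]
scores, explained = pca(kp, k=2)
print([round(e, 3) for e in explained])
```

Standardizing first matters here because KP such as depth and porosity differ by orders of magnitude; without it, the largest-valued parameter would dominate the first component.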

Additionally, a similarity index was calculated for every analogous reservoir, based on the mean distance between its set of properties and those of the target reservoir. This similarity index allows us to rank all the analogous reservoirs within the cluster to which the target reservoir belongs.
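The exact similarity function is not spelled out in this section. One natural reading of "one minus the mean distance over the set of properties" is a coefficient in the spirit of Gower's general coefficient of similarity (Gower 1971, listed in the references). The sketch below assumes that form: numeric properties contribute a range-normalized absolute difference, categorical ones a 0/1 mismatch, and optional weights allow KP weighting. Whether the authors used exactly this form is an assumption; the property names and values are hypothetical.

```python
from typing import Optional, Union

Value = Union[float, str]


def gower_similarity(target: dict[str, Value], other: dict[str, Value],
                     ranges: dict[str, float],
                     weights: Optional[dict[str, float]] = None) -> float:
    """1 - weighted mean of per-property distances (Gower-style).

    Numeric property: |x - y| / range of that property in the database (capped at 1).
    Categorical property (str): 0 if equal, 1 otherwise.
    """
    weights = weights or {}
    total_w = 0.0
    total_d = 0.0
    for key, a in target.items():
        b = other[key]
        w = weights.get(key, 1.0)
        if isinstance(a, str):
            d = 0.0 if a == b else 1.0
        else:
            r = ranges[key]
            d = min(abs(a - b) / r, 1.0) if r > 0 else 0.0
        total_d += w * d
        total_w += w
    return 1.0 - total_d / total_w


# Hypothetical target and candidate reservoirs described by three KP.
target = {"porosity": 0.10, "depth_km": 2.5, "lithology": "limestone"}
candidate = {"porosity": 0.14, "depth_km": 3.0, "lithology": "limestone"}
ranges = {"porosity": 0.30, "depth_km": 5.0}
print(round(gower_similarity(target, candidate, ranges), 3))  # 0.922
```

Normalizing each difference by the property's range keeps every term between 0 and 1, so properties with large numeric values (such as depth) do not outweigh others, and mixed numeric and categorical KP can be combined in one index.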

We validated the new method using Casablanca as the target reservoir, because it is a carbonate reservoir that is very well known within the company, and because the business unit team had previously identified a set of four analogous reservoirs through an independent analysis of Casablanca's properties.

The method was successfully applied in this case, yielding 19 analogous reservoirs ranked by their similarity to the Casablanca reservoir. The maximum similarity found was 85%, for the Amposta Marino reservoir.

This new method provides a systematic procedure for the identification of analogous reservoirs, with the following advantages:

- Results are obtained objectively; they are unaffected by personal criteria or lack of previous experience.
- The method applies several similarity criteria simultaneously. This is an improvement over tools based on the successive application of filters.
- The method allows missing parameters of the target reservoir to be estimated from the information provided by the identified analogous reservoirs.
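The last advantage, estimating a missing parameter of the target reservoir from its analogues, can be sketched as a similarity-weighted average. Weighting by the similarity index is an assumption (a plain mean over the closest analogues would be another reasonable choice), the function name is hypothetical, and the property values below are invented for illustration.

```python
from typing import Optional


def estimate_from_analogues(analogues: list[tuple[float, Optional[float]]]) -> float:
    """Similarity-weighted mean of one property over the analogous reservoirs.

    analogues: (similarity index, property value or None if unknown) pairs.
    Analogues without a value, or with non-positive similarity, are ignored.
    """
    usable = [(s, v) for s, v in analogues if v is not None and s > 0]
    if not usable:
        raise ValueError("no analogue reports this property")
    total_weight = sum(s for s, _ in usable)
    return sum(s * v for s, v in usable) / total_weight


# Hypothetical recovery factors (%) for three analogues with their similarity indices.
print(estimate_from_analogues([(0.85, 30.0), (0.75, 20.0), (0.66, None)]))
```

Analogues that do not report the property are skipped rather than treated as zero, so the estimate uses only the information actually available.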

The analogous reservoirs found with this method depend on the key parameters (KP) used: their number, type and weighting. Different KP, or different weights, will inevitably give different classifications. A very important step in this kind of analysis is therefore the proper definition of the KP to be included and of their weights.
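A two-parameter toy example shows how weighting alone can change which reservoir appears most analogous. The sketch assumes standardized KP values and a weighted Euclidean distance, which is not necessarily the metric used in this work; the reservoirs "A" and "B" and their values are hypothetical.

```python
import math


def weighted_distance(x: list[float], y: list[float], w: list[float]) -> float:
    """Weighted Euclidean distance between two standardized KP vectors."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def nearest_analogue(target: list[float], candidates: dict[str, list[float]],
                     w: list[float]) -> str:
    """Name of the candidate closest to the target under weights w."""
    return min(candidates, key=lambda name: weighted_distance(target, candidates[name], w))


# Hypothetical standardized KP values (two parameters) for a target and two candidates.
target = [0.0, 0.0]
candidates = {"A": [0.3, 1.0], "B": [1.0, 0.5]}
print(nearest_analogue(target, candidates, [1.0, 1.0]))  # A
print(nearest_analogue(target, candidates, [0.1, 1.0]))  # B
```

With equal weights, A is closer because it differs little in the first parameter; down-weighting that parameter makes B, which is closer in the second parameter, the nearest analogue.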

Acknowledgements

The authors would like to thank C&C Reservoirs for granting permission to publish some of the results obtained in the

example shown in this paper.

References

1. Aminzadeh, F. 2005. Applications of AI and soft computing for challenging problems in the oil industry. Journal of Petroleum Science and Engineering.
2. Pérez, C. 2004. Técnicas de Análisis Multivariante de Datos. Prentice Hall.
3. C&C Reservoirs Limited. 2013. C&C Reservoirs DAKS, http://www.ccreservoirs.com/ (accessed 5 April 2013).
4. Peña, D. 2002. Análisis de Datos Multivariantes. McGraw-Hill.
5. Harrell, D.R. and Hodgin, J.E. 2004. Oil and gas estimates: recurring mistakes and errors. SPE-91069.
6. Hodgin, J.E. and Harrell, D.R. 2006. The selection, application, and misapplication of reservoir analogs for the estimation of petroleum reserves. SPE-102505.
7. Gower, J.C. 1971. A general coefficient of similarity and some of its properties. Biometrics 27, 857-874.
8. Klemme, H.D. 1980. Petroleum basins: classification and characteristics. Journal of Petroleum Geology, Vol. 3, No. 2, 187-207.
9. Orlopp, D.E. 1988. Casablanca Oilfield, Spain: a karsted carbonate trap at the shelf edge. Proceedings of the Offshore Technology Conference, OTC 5734, 441-448.
10. Lomando, A.J., Harris, P.M. and Orlopp, D.E. 1993. Casablanca Field, Tarragona Basin, Offshore Spain: a karsted carbonate reservoir. In: Fritz, R.D., Wilson, J.L. and Yurewicz, D.A. (eds.). Paleokarst Related Hydrocarbon Reservoirs, SEPM Core Workshop 18, 201-225.
11. Sharma, S. Srinivasan and Larry W. Lake. 2010. Classification of oil and gas reservoirs based on recovery factor: a data-mining approach. SPE-130257.
12. Sidle, R.E. and Lee, W.J. 2010. An update on the use of reservoir analogs for the estimation of oil and gas reserves. SPE Economics & Management.
13. Tan, P., Steinbach, M. and Kumar, V. 2006. Introduction to Data Mining. Addison-Wesley.
14. Walpole, Myers and Ye. 2012. Probability and Statistics for Engineers and Scientists. Prentice-Hall.
