
Article

pubs.acs.org/EF

Identification of the Source of Geographical Origin of Iranian Crude Oil by Chemometrics Analysis of Fourier Transform Infrared Spectra
Bahram Hemmateenejad* and Samira Dorostkar
Chemistry Department, Shiraz University, Shiraz 71454, Iran
Supporting Information

ABSTRACT: In this paper, we analyzed crude oils from five different regions of Iran to identify their geographical origin. The
Fourier transform infrared (FTIR) spectra of the samples were analyzed by chemometrics methods to discriminate between the different
crude oil sources. Infrared (IR) spectroscopy in conjunction with chemometrics techniques allows for online monitoring in real time, which
can be of considerable use in the petroleum industry. Principal component analysis (PCA) and extended canonical variates analysis (ECVA),
as unsupervised and supervised classification methods, respectively, were employed. The PCA scores discriminated only partially between
the different crude oil sources, and the degree of classification was not satisfactory. More accurate classification results were
achieved by ECVA. The results show that the spectral region 1350–1490 cm⁻¹ gave much better classification performance with
ECVA. This spectral region, which is attributed to the S=O, aromatic C=C, and methylene C-C vibrations, suggests that the difference
between crude oils of these geographical origins is primarily attributable to differences in sulfoxide and aromatic compounds. The ECVA
technique was found to be a promising classification model and showed good classification power for crude oil sources.

1. INTRODUCTION
Identification of the geographical origin of crude
oils and their products is of high importance not only for their
fractions in the environment but also for forensic and quality control
studies. Building classification models to distinguish crude oils and
petrochemical products of different geographical origins is of
great importance in arson investigations, spills of crude oil, and
geochemical and reservoir management studies.1–3 However,
because these materials possess a complex matrix, assigning a
crude oil sample to its native geographical origin is not simple.
Coupling infrared (IR) spectroscopy with chemometrics methods makes it a fast and reliable analytical
technique, which has found widespread applications in
pharmaceutical4 and industrial5–7 fields, disease detection,8,9
monitoring of chemical and biotechnological processes,10
bacterial identification,11 and the petroleum industry.12–15
The above-mentioned examples show the quantitative analytical
aspects of IR spectroscopy (as a non-destructive measurement)
besides its extensive qualitative applications for structural
elucidation of chemical compounds. IR spectroscopy has the
advantage of studying samples in any state, including liquids (solutions and
thin films), solids (powders, films, and fibers), pastes, and gases.16–18
Because of the importance of classifying crude oil
and petroleum products, several papers have been devoted to
this problem.12,14,19–23 However, to the best of our
knowledge, source identification of Iranian oil using IR
spectroscopy and chemometrics data analyses has not yet
been reported. As holder of 10% of the world's proven oil
reserves and 15% of its gas, Iran is the second largest exporter
of the Organization of the Petroleum Exporting Countries
(OPEC) and, by producing 132.5 billion barrels, is the fourth
largest oil producer in the world.24
In this work, we built discrimination models for the crude
oil samples collected from five different regions of Iran, using
IR spectroscopy analyzed with multivariate classification methods.
While oils of all sources possessed very similar IR spectra, they
were successfully classified employing the extended canonical
variates analysis (ECVA) method. A comparison was made
between ECVA and interval ECVA methods. It was found that, by
multivariate analysis of the IR spectroscopic data, the
geographical origin of crude oils can be identified. This method
does not need expensive chromatographic methods or hazardous
solvents and is non-destructive.

© 2014 American Chemical Society

2. EXPERIMENTAL SECTION
2.1. Data Set. In this study, 100 samples of crude oil collected
from five different regions of Iran (Aghajari, Ahvaz,
Maroon, Rage Sefid, and Ramshir), 20 samples from each
region, were investigated. All regions are located in the Khozestan
state. The oil samples were collected directly from the top of the oil
well after exhausting gases and before entering the transport section.
Information about samples and classes is given in Table 1.

Table 1. Characteristics of the Data Set Used in This Study

class number    source of geographical origin    number of samples
1               Aghajari                         20
2               Ahvaz                            20
3               Maroon                           20
4               Rage Sefid                       20
5               Ramshir                          20
total                                            100

Received: March 12, 2013
Revised: December 31, 2013
Published: January 20, 2014
dx.doi.org/10.1021/ef4017202 | Energy Fuels 2014, 28, 761–765

To ensure the reproducibility of the method and account for the effect of sampling
time, samples were collected at different times over a period of 6 months.
After sampling, to remove residual water, the fresh samples were
incubated for 24 h with calcium chloride before Fourier transform
infrared (FTIR) measurements. The data set was randomly partitioned
into training (70 samples; 14 samples from each region) and test
(30 samples; 6 samples from each region) sets.
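The random partition described in Section 2.1 (14 training and 6 test samples per region) can be sketched as a stratified split. This is only an illustration in Python; the authors worked in MATLAB, and the function name, the seed, and the ordering of the labels are assumptions introduced here, not details from the paper.

```python
import random

def stratified_split(labels, n_test_per_class, seed=0):
    """Return (train_idx, test_idx) with n_test_per_class test samples drawn per class."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        test_idx.extend(shuffled[:n_test_per_class])
        train_idx.extend(shuffled[n_test_per_class:])
    return sorted(train_idx), sorted(test_idx)

# Five regions with 20 samples each, as in Table 1 (sample order is hypothetical).
regions = ["Aghajari", "Ahvaz", "Maroon", "Rage Sefid", "Ramshir"]
labels = [region for region in regions for _ in range(20)]

train_idx, test_idx = stratified_split(labels, n_test_per_class=6)
print(len(train_idx), len(test_idx))  # 70 30
```

Splitting within each class, rather than over the pooled 100 samples, is what keeps every region equally represented in both sets.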
2.2. IR Spectra Measurement. The IR spectra were acquired with
a Shimadzu 8000 series FTIR spectrometer with a deuterated,
L-alanine-doped triglycine sulfate (DLATGS) detector set to 16 cm⁻¹
optical resolution, and 15 scans were averaged. Liquid films of oils were
used for spectral measurements. A demountable liquid IR cell with NaCl
windows and a path length of 30 μm (equipped with a Teflon
spacer) was used throughout. About 5.0 μL aliquots of oil samples were
transferred into the cell, and their FTIR spectra were acquired after 2 min
at room temperature (23–25 °C). The cell was cleaned with toluene
and then dried before further use. Each spectrum was recorded
3 times, and the average spectrum was reported for each sample.
2.3. Software and Data Analysis. Data analyses were run in a
MATLAB (MathWorks, Inc., version 8) environment. The MATLAB
Statistics Toolbox and the ECVA Toolbox, version 2.02, developed by
the Technology and Quality Research Group at the University of
Copenhagen (available at http://www.models.life.ku.dk) were used.
The transmittance IR spectra of the studied samples were digitized
in absorbance units in the form of a data matrix D of size (100 × nw),
where nw is the number of absorbance readings per spectrum. Autoscaling
to zero mean and unit variance was used as data preprocessing.
Spectral data were first analyzed by principal component analysis
(PCA), employing the singular value decomposition (SVD) function of
MATLAB. Then, ECVA as a supervised classification method was
applied to the data. Cross-validation was used to optimize the ECVA
model, i.e., to determine the optimum number of PLS factors
used in the inner part of ECVA for discrimination analysis. To
judge the overall prediction ability of the model, six samples from each
group were randomly selected as the prediction set. These samples did
not contribute to the model development.
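The preprocessing described in Section 2.3 can be illustrated with a short sketch. It assumes that the spectra were read as percent transmittance and converted with the standard relation A = 2 - log10(%T), and that autoscaling uses the sample standard deviation (n - 1 in the denominator); the paper states neither detail, and the readings below are made up for illustration.

```python
import math

def to_absorbance(percent_transmittance):
    """Convert percent transmittance (0 < %T <= 100) to absorbance: A = 2 - log10(%T)."""
    return 2.0 - math.log10(percent_transmittance)

def autoscale(matrix):
    """Center each column to zero mean and scale it to unit variance.

    Uses the sample standard deviation (n - 1), which is an assumption here.
    A constant column has zero variance and would raise ZeroDivisionError.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in matrix) / (n_rows - 1))
        for j in range(n_cols)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(n_cols)] for row in matrix]

# Made-up percent-transmittance readings: 3 spectra x 2 wavenumbers.
percent_t = [[50.0, 10.0], [40.0, 20.0], [25.0, 12.5]]
D = [[to_absorbance(t) for t in row] for row in percent_t]
D_scaled = autoscale(D)
```

After autoscaling, every wavenumber contributes equally to the total variance, so strongly absorbing bands no longer dominate the subsequent PCA purely because of their intensity.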

3. RESULTS AND DISCUSSION


We investigated the multivariate processing of IR spectral data
as a classification method for identification of the crude oil
sources. The averaged IR spectra of samples from different
sources are shown in Figure 1. The oil samples show
significant absorbance in the spectral regions of 2800–3000
and 1350–1500 cm⁻¹. The spectra are very similar,
and discrimination based on visual inspection is not possible.
3.1. Spectral Region Selection. To select the sub-spectral
region that gives the lowest classification error, interval ECVA
(i-ECVA) was employed. In this method, equal-sized sub-spectral regions are selected from the whole spectral data and
then a separate ECVA analysis is applied to each sub-region. The
classification error for different sub-spectral regions is plotted
in Figure 2, and the misclassification errors obtained from different
spectral intervals are listed in Table S1 of the Supporting Information.
Interval 5 (spectral region of 1281–1475 cm⁻¹) resulted
in the smallest number of misclassifications: only one sample was not
classified to its native group. Another minimum is observed
in Figure 2 (and Table S1 of the Supporting Information),
corresponding to the spectral interval of 2841–3035 cm⁻¹. For this
spectral region, four samples were misclassified. Therefore, the
classification model obtained from the spectral region 1281–1475 cm⁻¹
performed much better.
It should be noted that the discussion that follows is
based on the selected spectral region of 1281–1475 cm⁻¹ and not
the whole spectral data. As shown in Figure 1, the oils show
significant absorbance peaks in this spectral region. The size of the
data matrix used for the selected region was then (70 × 290),
which means that 290 wavenumbers were used. The data matrices in
spreadsheet format are available upon request for interested readers.
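The interval search described in Section 3.1 can be sketched generically: split the variable axis into equal-sized contiguous blocks, score each block with a classifier, and keep the block with the fewest misclassifications. In the sketch below, `count_errors` is a hypothetical callback standing in for a cross-validated ECVA model (it is not the ECVA Toolbox API), and the toy scorer in the usage lines is a placeholder with no chemical meaning.

```python
def equal_intervals(n_variables, n_intervals):
    """Split indices 0..n_variables-1 into contiguous, nearly equal (start, stop) blocks."""
    base, extra = divmod(n_variables, n_intervals)
    bounds, start = [], 0
    for k in range(n_intervals):
        stop = start + base + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def best_interval(n_variables, n_intervals, count_errors):
    """Return (interval_number, (start, stop), errors) for the lowest-error interval.

    count_errors(start, stop) must return the number of misclassified samples
    for a model built on columns start..stop-1 (hypothetical callback).
    Ties are broken in favor of the lower interval number.
    """
    scored = [
        (count_errors(start, stop), number, (start, stop))
        for number, (start, stop) in enumerate(
            equal_intervals(n_variables, n_intervals), start=1
        )
    ]
    errors, number, span = min(scored)
    return number, span, errors

# Placeholder scorer (interval width), used only to exercise the search logic.
print(equal_intervals(10, 3))                        # [(0, 4), (4, 7), (7, 10)]
print(best_interval(10, 3, lambda a, b: b - a))      # (2, (4, 7), 3)
```

Passing the scorer in as a function keeps the search independent of the classifier, so the same loop would work with any model that reports a misclassification count per interval.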
Figure 1. Example of crude oil FTIR spectra: (a) Aghajari, (b) Ahvaz,
(c) Maroon, (d) Rage Sefid, and (e) Ramshir.

3.2. PCA of the Spectral Data. The eigenvalues and the
percent of variance explained by each eigenvector, obtained
from application of PCA to the spectral data matrix of the
selected spectral interval (1281–1475 cm⁻¹), are summarized
in Table 2. It is observed that 99.96% of the spectral information is
retained when the 290-dimensional spectral space is projected into
a three-dimensional factor space. Plotting the crude oil samples
in the two- or three-dimensional Cartesian coordinates of the
principal components visualizes their relative positions based on
the information included in their IR spectra. As shown in Figure 3,
PCA achieved some discrimination between the sources of the crude oil
samples, but the degree of discrimination is limited. This reflects
the inherent ability of PCA to extract and distinguish the
minute spectral differences between closely related sources of
crude oil samples. However, the pattern of distribution of the
samples was obtained without using information about the
classes of samples in PCA.
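The percent-of-variance figures quoted from Table 2 follow directly from the eigenvalues: each eigenvalue is divided by the sum of all eigenvalues, and the cumulative value is the running total. The sketch below uses the ten eigenvalues listed in Table 2 and assumes that their sum approximates the total variance (the remaining eigenvalues are not reported and are taken to be negligible).

```python
# Eigenvalues (EV) of the first ten factors, copied from Table 2.
eigenvalues = [420.9, 16.14, 0.7750, 0.1278, 0.0188,
               0.0090, 0.0055, 0.0040, 0.0028, 0.00176]

# Assumption: the ten listed eigenvalues account for essentially all variance.
total = sum(eigenvalues)

percent_variance = [100.0 * ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for pv in percent_variance:
    running += pv
    cumulative.append(running)

for factor in range(3):
    print(factor + 1, round(percent_variance[factor], 2), round(cumulative[factor], 2))
```

Under this assumption, the first three factors account for about 99.96% of the variance, in agreement with the value stated in the text.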

Figure 2. Results of interval ECVA analysis of the FTIR spectral data.
The bars are the number of misclassified samples for each interval.
Italic numbers are the optimal number of latent variables used for each
interval model. The dotted line is the number of misclassifications for
the global model.

Table 2. Results of Application of PCA on the Absorbance Data Matrix

factor number    EV         PV (%)    CPV (%)
1                420.9      96.1      96.1
2                16.14      3.69      99.78
3                0.7750     0.18      99.96
4                0.1278     0.03      99.99
5                0.0188     0         99.99
6                0.0090     0         100
7                0.0055     0         100
8                0.0040     0         100
9                0.0028     0         100
10               0.00176    0         100

EV, eigenvalue; PV, percent of variance; CPV, cumulative percent of variance.

Figure 3. Distribution pattern of the studied crude oil samples in the
(a) two-dimensional and (b) three-dimensional, PCA-based factor
space of their absorbance FTIR spectra.

3.3. Classification by ECVA. A clear reason for the failure of
PCA to correctly classify the sources of crude oil samples is
that the eigenvectors or principal components are calculated solely
from the spectral data matrix, and hence, they are not necessarily
correlated with the discriminant variables.25 On the other hand, the
principal factors obtained by supervised classification methods,
such as discriminant partial least squares (DPLS), are calculated
using the information included in both the spectral data and the classes of
samples.26 However, DPLS does not offer good results for real
data sets, and in some instances, overfitting is observed.27
Canonical variates analysis (CVA)28 is another supervised method,
but it cannot be applied to highly collinear spectroscopic data.
ECVA is an extension of CVA proposed by Nørgaard et al. to
overcome the limitations of CVA.27 Using PLS in the inner part of
ECVA allows for the analysis of collinear data.
Therefore, in the next data analysis step, we used ECVA because of
its growing applications and promising results.25,29–31 The
model refinement procedure used 5-fold cross-validation to select
the number of PLS factors. From the plot of misclassification error
as a function of the number of PLS factors (see Figure S1 of the
Supporting Information), 11 factors were selected as the optimum.
At this number of factors, the cross-validation error is at its
minimum and closest to the calibration error.
In this study, five classes of samples were investigated, and hence,
the ECVA solution is a four-dimensional model (one less than
the number of classes). The values of the canonical variates calculated
for each sample in all four directions are depicted in Figure 4.
The distribution of the sample points in the two-dimensional space
of the first and second variates is represented in Figure 5. To show
the impact of the spectral wavenumbers on the classification ability
of ECVA, the canonical weight vectors are plotted in Figure 6.

Figure 4. Extended canonical variates for Aghajari, Ahvaz, Maroon,
Rage Sefid, and Ramshir (from left to right) obtained by the
11-component inner PLS model.

The canonical variates shown in Figure 4 describe the success
of ECVA in discriminating between crude oils of different
origins. In the first variate, the samples of Ramshir
and Rage Sefid are discriminated from the other sources, whereas
the canonical variates in the second direction represent a clear
discrimination between Ramshir and Rage Sefid samples. The
samples of the Ramshir source have positive and
negative variates in the first and second directions, respectively.
However, both the first and second variates of the Rage Sefid
samples are positive. On the other hand, one can observe a
clear discrimination between Maroon and Ahvaz sources using
the second and third canonical variates, such that those having
negative variates in the first and second directions and positive
variates in the third direction are samples from the Maroon
source. Finally, the canonical variates in the fourth direction
discriminate between the Ahvaz and Aghajari sources.
In the third and fourth directions, crude oil samples from the
Ahvaz source have negative variates and crude oil samples from
the Aghajari source have positive variates.
Plotting the crude oil samples in the two-dimensional
space of the first and second canonical variates (see Figure 5)
gives a visual inspection of the discrimination ability of ECVA
for the studied crude oil samples. There is a relatively
clear separation between the crude oil samples of different
sources in the two-dimensional space of the canonical variates.
However, one sample of Rage Sefid has moved toward the
Ramshir group, and it was the only sample that was classified
erroneously. In addition, the oil samples of the Maroon and
Aghajari regions are located close to each other. Therefore, one
can assume similar compositions for the crude oils of these
regions. Finally, it is observed in Figure 5 that the test samples
are also placed in the regions of the calibration samples.

Figure 5. Distribution pattern of the studied crude oil samples in the
two-dimensional canonical variate space of their absorbance FTIR
spectra. The red markers represent the test set samples.

The FTIR spectra of crude oil samples shown in Figure 1
indicated strong bands around 3000, 2925, and 2860 cm⁻¹,
which correspond to aromatic C-H, methyl C-H, and
methylene C-H. A band positioned at around 1400 cm⁻¹
represents methylene C-C, followed by another intense band
at 1375 cm⁻¹ corresponding to S=O of sulfones.32 The
canonical weight vectors (Figure 6) were used to identify the
wavenumbers with the highest impact on the discrimination of the
crude oil sources. At 1474, 1473, 1468, 1465, 1463, 1457, 1419,
1378, and 1375 cm⁻¹, the weights are larger. These identified
spectral regions are close to the regions where distinctive peaks
of crude oil samples appeared. The spectral region of 1375 cm⁻¹,
which is attributed to the S=O vibrations, and the spectral region
of 1419–1474 cm⁻¹, which is attributed to aromatic C=C
and methylene C-C, suggest that the difference between crude
oils of these geographical origins can be mainly attributed to
differences in the sulfoxide and aromatic compounds in the crude oils.

Figure 6. Extended canonical weights obtained by the 11-component
inner PLS model.

The canonical variates of the ECVA model were used as the
classifier variables of the linear discriminant analysis (LDA)
method, and the number of misclassifications was used as
the validation error. Among the 30 prediction set samples, all
were correctly assigned to their native classes.
The classification results obtained by analysis of the whole
spectral region are given in Figure S2 of the Supporting
Information. It is observed from this figure that, when the
whole spectral region is used as the input of ECVA, the
discrimination between crude oil sources is not strong and the
samples of the Ahvaz and Maroon sources are mixed.

4. CONCLUDING REMARKS
The FTIR spectra of crude oil samples from five regions of
Khozestan, Iran, were analyzed by chemometrics methods to
discriminate between the different crude oil sources.
PCA of the IR spectral data did not discriminate satisfactorily
between the sources of the crude oil samples. However, high
discrimination was obtained by ECVA, so that the crude oil
samples of the prediction set were all correctly assigned to their
native sources. In comparison to analysis of the whole spectral
region, using the selected spectral region (1281–1475 cm⁻¹) resulted
in models with better classification results.

ASSOCIATED CONTENT

Supporting Information
Number of misclassifications (from cross-validation) obtained
by ECVA analysis of different spectral regions (Table S1),
number of misclassifications as a function of the number of PLS
components used in the inner PLS relation of ECVA on the
FTIR data (Figure S1), and distribution pattern of the studied
crude oil samples in the three-dimensional canonical variate
space of their absorbance FTIR spectra of the whole spectral
region (Figure S2). This material is available free of charge via
the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author
*Telephone: 0098-711-646-0724. Fax: 0098-711-646-0788.
E-mail: hemmatb@sums.ac.ir.

Notes
The authors declare no competing financial interest.

REFERENCES

(1) Tan, B.; Hardy, J. K.; Snavely, R. E. Anal. Chim. Acta 2000, 422, 37–46.
(2) Lavine, B. K.; Ritter, J.; Moores, A. J.; Wilson, M.; Faruque, A.; Mayfield, H. T. Anal. Chem. 2000, 72, 423–431.
(3) Peters, K. E.; Fowler, M. G. Org. Geochem. 2002, 33, 5–36.
(4) Sekulic, S.; Ward, H. W.; Brannegan, D.; Stanley, E.; Evans, C.; Sciavolino, S.; Hailey, P.; Aldridge, P. Anal. Chem. 1996, 68, 509–513.
(5) Blanco, M.; Coello, J.; Eustaquio, A.; Iturriaga, H.; Maspoch, S. Anal. Chim. Acta 1999, 392, 237–246.
(6) Ozaki, Y.; Cho, R.; Ikegaya, K.; Muraishi, S.; Kawauchi, K. Appl. Spectrosc. 1992, 46, 1503–1507.
(7) Blanco, M.; Coello, J.; Garcia Fraga, J. M.; Iturriaga, H.; Maspoch, S. Analyst 1999, 122, 77–80.
(8) Backhaus, J.; Mueller, R.; Formanski, N.; Szlama, N.; Meerpohl, H.; Eidt, M.; Bugert, P. Vib. Spectrosc. 2010, 46, 173–177.
(9) Kondepati, V. R.; Keese, M.; Mueller, R.; Manegold, B. C.; Backhaus, J. Vib. Spectrosc. 2007, 44, 236–242.
(10) Huber, W.; Bubendorf, A.; Grieder, A.; Obrecht, D. Anal. Chim. Acta 1999, 393, 213–221.
(11) Marcott, C.; Dowrey, A. E.; Poppel, J. V.; Noda, I. Vib. Spectrosc. 2004, 36, 221–225.
(12) Balabin, R. M.; Safieva, R. Z.; Lomakina, E. I. Anal. Chim. Acta 2010, 671, 27–35.
(13) Midttun, Ø.; Kvalheim, O. M. Fuel 2001, 80, 717–730.
(14) Balabin, R. M.; Safieva, R. Z. Fuel 2008, 87, 1096–1101.
(15) Peinder, P. d.; Visser, T.; Petrauskas, D. D.; Salvatori, F.; Singelenberg, F.; Soulimani, F.; Weckhuysen, B. M. Appl. Spectrosc. 2008, 62, 414–422.
(16) Crowther, M. W. J. Chem. Educ. 2008, 85, 1550–1555.
(17) Stuart, B. H.; George, B.; McIntyre, P. Modern Infrared Spectroscopy; John Wiley and Sons, Ltd.: Hoboken, NJ, 1996.
(18) Mullins, O. C.; Daigle, T.; Crowell, Ch.; Groenzin, H.; Joshi, N. B. Appl. Spectrosc. 2001, 55, 197–201.
(19) Brudzewski, K.; Kesik, A.; Kolodziejczyk, K.; Zborowska, U.; Ulaczyk, J. Fuel 2006, 85, 553–558.
(20) Zhang, Q. K.; Dai, L. K. Control Instrum. Chem. Ind. 2005, 32, 53–59.
(21) Pasadakis, N.; Kardamakis, A.; Sfakianak, P. Energy Fuels 2007, 21, 3406–3409.
(22) Clark, H. A.; Jurs, P. C. Anal. Chem. 1979, 51, 616–623.
(23) Peinder, P. D. Characterization and classification of crude oils using a combination of spectroscopy and chemometrics, Ph.D. Thesis, Utrecht University, Utrecht, Netherlands, 2009; http://igitur-archive.library.uu.nl/dissertations/2010-0113-200208/UUindex.html.
(24) U.S. Energy Information Administration (EIA). Iran; EIA: Washington, D.C., 2012; www.eia.gov/countries/country-data.cfm?ps=IR (accessed July 31, 2012).
(25) Mobaraki, N.; Hemmateenejad, B. Chemom. Intell. Lab. Syst. 2011, 109, 171–177.
(26) Barker, M.; Rayen, W. J. Chemometrics 2003, 17, 166–173.
(27) Nørgaard, L.; Bro, R.; Westad, F.; Engelsen, J. Chemometrics 2006, 20, 425–435.
(28) Russell, E. I.; Chiang, L. H.; Braatz, R. D. Chemom. Intell. Lab. Syst. 2000, 51, 81–93.
(29) Winning, H.; Viereck, N.; Salomonsen, T.; Larsen, J.; Engelsen, S. B. Carbohydr. Res. 2009, 344, 1833–1841.
(30) Nørgaard, L.; Soletormos, G.; Harrit, N.; Albrechtsen, M.; Olsen, O.; Nielsen, D.; Kampmann, K.; Bro, R. J. Chemometrics 2007, 21, 451–458.
(31) Hansen, C. L.; van den Berg, F.; Rasmussen, M. A.; Engelsen, S. B.; Holroyd, S. Chemom. Intell. Lab. Syst. 2010, 104, 243–248.
(32) Shakirullah, M.; Ahmad, W.; Ahmad, I.; Ishaqand, M.; Khan, M. I. J. Chil. Chem. Soc. 2012, 57, 1375–1380.