
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 48, NO. 3, MARCH 2010 1355


Simulated Multispectral Imagery for Tree Species Classification Using Support Vector Machines
Ville Heikkinen, Timo Tokola, Jussi Parkkinen, Ilkka Korpela, and Timo Jääskeläinen
Abstract: The information content of remotely sensed data depends primarily on the spatial and spectral properties of the imaging device. This paper focuses on the classification performance of different spectral features (hyper- and multispectral measurements) with respect to three tree species. The Support Vector Machine was chosen as the classification algorithm for these features. A simulated optical radiation model was constructed to evaluate the identification performance of the given multispectral system for the tree species, and the effects of spectral-band selection and data preprocessing were studied in this setting. Simulations were based on the reflectance measurements of the pine (Pinus sylvestris L.), spruce [Picea abies (L.) H. Karst.], and birch trees (Betula pubescens Ehrh. and Betula pendula Roth). The Leica ADS80 airborne sensor with four spectral bands (channels) was used as a fixed multispectral sensor system that produces response values for the at-sensor radiance signal. Results suggest that this four-band system has inadequate classification performance for the three tree species. The simulations demonstrate an average improvement of 5–15 percentage points in classification performance when the Leica system is combined with one additional spectral band. It is also demonstrated for the Leica data that feature mapping through a Mahalanobis kernel leads to a 5–10 percentage point improvement in classification performance when compared with other kernels.
Index Terms: Feature extraction, image sensors, pattern classification, radiometry, remote sensing.
I. INTRODUCTION
Classification is one of the approaches to deriving information from remotely sensed forest data, and detailed tree species classification is important in forest inventories for technical, ecological, and economic reasons [14]. The adequate accuracy level for practical applications is above 90%, as the value of the forest data deteriorates rapidly at lower accuracies [15]. Tree species classification is an evident bottleneck in current remote sensing of forests, in spite of the ample research carried out into the use of airborne laser scanning and the recently introduced digital aerial multispectral cameras (e.g., [25]). These cameras offer enhanced radiometric and geometric properties when compared with traditional film cameras, but they are customized for surveying and mapping purposes rather than for forestry applications. There is still a substantial lack of basic research into the spectral characteristics distinguishing given forest objects, and such information would be valuable in specifying optimal sensors designed for forest use.

Manuscript received April 6, 2009; revised May 22, 2009 and July 27, 2009. First published December 4, 2009; current version published February 24, 2010. This work was supported by the Academy of Finland under Grant 123193.
V. Heikkinen, J. Parkkinen, and T. Jääskeläinen are with the Faculty of Science, InFotonics Center, University of Joensuu, 80101 Joensuu, Finland (e-mail: ville.heikkinen@ifc.joensuu.fi).
T. Tokola is with the Faculty of Forest Sciences, University of Joensuu, 80101 Joensuu, Finland.
I. Korpela is with the Faculty of Agriculture and Forestry, Department of Forest Resource Management, University of Helsinki, 00014 Helsinki, Finland.
Digital Object Identifier 10.1109/TGRS.2009.2032239
The classification of objects in images is based on their geometrical or spectral features. At the single-tree level, important structures contribute to the image texture only at very high resolutions, and such images are often too expensive to acquire over large areas. We will thus ignore the spatial features of images here and focus entirely on features derived from pointwise multispectral measurements. Tree species recognition algorithms that operate at the individual tree level have been developed, and they are mainly based on the spectral properties of the observed signal [9], [10], [13], [22].
One property that characterizes a spectral imaging system is the number of individual wavelength bands sensed in the electromagnetic spectrum. Every spectral band of the sensor has a corresponding spectral response function with some shape, and the number of these bands defines the dimensionality of the measurement vectors. The bandwidth of an individual band in the sensor system is usually called its spectral resolution. Current sensor technology allows spectral data to be captured simultaneously in hundreds of high-resolution spectral bands. Imaging devices with these capabilities make it possible to use well-known analytical methods to extract representative spectral-space features from the data. It has been shown that classification based on high-dimensional reflectance data can be carried out accurately and efficiently using linear mappings to lower-dimensional subspaces for each class [3], [13], [20].
Although the most informative spectral data are obtained with systems involving hundreds of spectral bands, the use of such imaging devices can be impractical or too costly in some applications. For example, the width of the imaged area (swath width) of hyperspectral devices for remote sensing is smaller than that of multispectral devices. High-dimensional hyperspectral data also usually involve a high level of redundancy, which implies inefficient data management and storage. Identifying and using a small number of data-dependent relevant bands would increase the applicability of such an imaging system.
When the imaging device has a low number of spectral bands, the available data already reside in the fixed lower-dimensional subspace defined by the spectral response functions of the data-independent sensor system. Consequently, efficient linear feature extraction might be hampered by the lower information content of the measured data. It is also possible that the system may have been optimized for some particular task, which could lead to poor performance when it is used for classification purposes. In this paper, the modeling problems due to the lower information content of the multispectral measurements are compensated for by means of various data preprocessing methods and nonlinear feature space mappings through positive definite kernel functions. The kernel technique does not compensate for the lack of information content, but it provides tools to model complex data structures nonlinearly. The features derived from the kernel give representations for the data in a high-dimensional feature space, where the classification task is assumed to be easier to accomplish [24].

0196-2892/$26.00 © 2009 IEEE
Recently, support vector machine (SVM) classifiers have been found to achieve excellent accuracy when used for the classification of remotely sensed data. In particular, when combined with kernel functions, SVMs have been found to give results that compete well with the best previously available classifiers [4], [7], [11], [18], [19]. An SVM classifier is also robust to noise and high-dimensional data and gives the solution (support vectors) in the form of a sparse representation, which is beneficial in practical usage [4].
The performance of a classifier is also affected by the data preprocessing stage. Standard preprocessing methods include different scaling methods, outlier detection, principal component analysis, and whitening. In this paper, we concentrate on outlier removal and the whitening transformation. The whitening transformation is related to the use of the translation-invariant Mahalanobis kernel, which has been suggested to allow fast model selection for the SVM algorithm [1].
The objective of the present paper is to evaluate the effects of spectral-band selection and data preprocessing on tree species identification with the SVM algorithm. Simulated high-spectral-resolution radiance measurements and simulated response values [digital numbers (DNs)] of a four-channel Leica ADS80 airborne camera were used as a basis for this paper [16]. The simulation features are based on real reflectance measurements of pine (Pinus sylvestris L.), spruce [Picea abies (L.) H. Karst.], and birch trees (Betula pubescens Ehrh. and Betula pendula Roth). The simulated setting and the availability of a reflectance ensemble allow the Leica ADS80 system to be studied in conjunction with additional spectral response functions. Simulated measurements obtained using alternative multispectral systems are compared in terms of the classification accuracy of the SVM algorithm when the first-order polynomial, Gaussian, and Mahalanobis kernels are used. The statistical significance of the differences between classifications using different kernels and systems was evaluated with McNemar's test.
We demonstrate that significant improvements in classification accuracy are obtained when an additional spectral response function is added to the Leica system. In addition, we present improvements in classification accuracy obtained by preprocessing the data.
II. OPTICAL RADIATION MODEL
In the following, we introduce the optical radiation model and the data used for the simulations of the hyper- and multispectral measurements. In the notation used below, the wavelength variable is denoted by $\lambda$ (in nanometers) and the wavelength region by $\Lambda$. Operators are written in capital letters and functions in lowercase letters. Vectors are denoted by boldface letters, and the corresponding vector components are superscripted.
In the visible and near-infrared regions, the at-sensor radiance component of a perfectly diffuse reflecting surface (also known as the Lambertian surface model, in which reflectance is independent of the viewing angle) is approximated as

$$\ell(\lambda) = r(\lambda)\, l_0(\lambda)\, \tau_s(\lambda)\, \tau_v(\lambda)\, \frac{\cos(\theta)}{\pi} + r(\lambda)\, l(\lambda)\, \tau_v(\lambda)\, \frac{1}{\pi} + \ell_s(\lambda) \tag{1}$$

where [23]
$l_0: \Lambda \to \mathbb{R}_+$ is the exo-atmospheric solar irradiance;
$l: \Lambda \to \mathbb{R}_+$ is the irradiance at the surface due to skylight;
$\tau_s: \Lambda \to [0, 1]$ is the atmospheric transmittance along the solar path;
$\tau_v: \Lambda \to [0, 1]$ is the atmospheric transmittance along the sensor view path;
$r: \Lambda \to [0, 1]$ is the spectral reflectance of the object;
$\theta$ is the angle between the surface normal and the direction of solar incidence;
$\ell_s(\lambda)$ is the path-scattered at-sensor radiance component.
The dependence on the spatial location is not written explicitly in the equation above. For a non-Lambertian surface, we could assume a fixed viewing angle and replace $r(\lambda)/\pi$ with the bidirectional reflectance distribution function [23].
We approximate $\tau_v = 1$ for our airborne sensor and assume that $\ell_s(\lambda) = 0$ for the indirect component. Assuming that the angle $\theta$ is fixed, the effect of the electromagnetic radiation in the $k$-band camera can be modeled as

$$x^i = \Gamma\left(\eta_1^i \int_\Lambda \ell(\lambda)\, \tau_c(\lambda)\, s^i(\lambda)\, d\lambda\right) \tag{2}$$

where $i = 1, \ldots, k$, $s^i: \Lambda \to [0, 1]$ is the spectral response function of the $i$th camera band, and $\tau_c: \Lambda \to [0, 1]$ is the transmittance of the camera optics. The scalar $\eta_1^i$ corresponds to the chosen measurement geometry and exposure setting, while the function $\Gamma$ gathers together the nonlinearity of the system.
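Substituting the radiance function into (2), we have

$$x^i = \Gamma\left(\eta_2^i \int_\Lambda w^i(\lambda)\, r(\lambda)\, d\lambda\right) \tag{3}$$

with $\eta_2^i = (1/\pi)\, \eta_1^i \cos(\theta)$ as the system calibration constant and

$$w^i(\lambda) = \left(l_0(\lambda)\, \tau_s(\lambda) + l(\lambda)\right) \tau_c(\lambda)\, s^i(\lambda) = l_d(\lambda)\, \tau_c(\lambda)\, s^i(\lambda). \tag{4}$$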
We assume here that the response functions $\{s^i\}_{i=1}^k$ are located in the wavelength interval of visible and near-infrared radiation $\Lambda = [390, 850]$ nm. This wavelength region was chosen on the basis of the properties of the available reflectance ensemble.
We use the discrete high-resolution daylight measurement as an approximation for the irradiance

$$l_d(\lambda) = l_0(\lambda)\, \tau_s(\lambda) + l(\lambda). \tag{5}$$

Fig. 1. Daylight irradiance, sampling of 5 nm.
The spectral power distribution of daylight used here corresponds to spectroradiometer measurements of the hemispheric daylight in the region of 380–780 nm, including the global spectral irradiance E on a horizontal surface from direct sunlight and the entire sky. The measurements correspond to midday conditions in clear weather, and they were carried out in Joensuu, Finland. Owing to the restricted wavelength range of the measurements, the daylight irradiance was extended to the region of 380–850 nm using a constant value in the region of 780–850 nm (see Fig. 1). This daylight irradiance was used for all the simulations.
A. Reflectance Ensemble
The reflectance spectra of the needles of young (less than 40 years old) Scots pines (Pinus sylvestris L.) and Norway spruces [Picea abies (L.) H. Karst.] and of the leaves of birches (Betula pubescens Ehrh. and Betula pendula Roth) collected from Finland and Sweden were used as experimental data [13]. The spectroradiometric measurements were made in clear weather during the growing season in June 1992. Each radiance measurement represented the average spectrum of thousands of leaves on a growing tree. The measured crowns were thick in order to minimize the effects of branch color and background illumination. The reflectance component of the signal was calculated using a baseline measurement at a distance of 5 m with the aid of a BaSO4 reference surface [12]. In the measurement setting, the sun was always behind the sensor with respect to the measurement direction (backscattering), with a clear path toward the object. The solar vector had an almost constant elevation angle, but its azimuth angle with respect to the measurement direction varied.
The measurements can be considered to be free of spectral signatures from other classes. They were carried out on the ground at a distance of 50 m (field of view 0.6 m²) using a PR 713/702 AM spectroradiometer in the wavelength interval of 390–1070 nm with a 4-nm spectral sampling. The repetition accuracy of the device is 3.5%. The original spectra were transformed to the wavelength range of 390–850 nm at 5-nm sampling intervals by linear interpolation. The upper wavelength limit was set to 850 nm, as no significant differences in the spectra were found above this wavelength for these measurements. The ensemble sizes for the three classes were almost equal: 336 samples for birch, 369 samples for pine, and 348 samples for spruce. Differences between the average spectra of the deciduous (birch) and coniferous (pine and spruce) groups are shown in Fig. 2. More details on the measurement setting and the analysis of the data subspaces for discrimination and approximation are given in [13].

Fig. 2. Average spectra of birch, pine, and spruce ensembles.
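The interpolation step above (4-nm samples over 390–1070 nm mapped linearly onto a 5-nm grid over 390–850 nm) can be sketched as follows. This is a minimal illustration assuming NumPy's `np.interp`; the reflectance curve is synthetic rather than measured, and `resample_spectrum` is a hypothetical helper name.

```python
import numpy as np


def resample_spectrum(src_wl, src_values, lo=390.0, hi=850.0, step=5.0):
    """Linearly interpolate a spectrum onto the uniform grid lo, lo + step, ..., hi."""
    target = np.arange(lo, hi + step / 2, step)  # 390, 395, ..., 850 nm
    return target, np.interp(target, src_wl, src_values)


# 4-nm spectroradiometer grid over 390-1070 nm (171 samples).
src_wl = np.arange(390.0, 1070.0 + 2.0, 4.0)
# Hypothetical smooth reflectance curve with a red-edge-like rise, for illustration only.
src_refl = 0.05 + 0.4 / (1.0 + np.exp(-(src_wl - 715.0) / 15.0))

wl5, refl5 = resample_spectrum(src_wl, src_refl)
```

Because `np.interp` interpolates linearly between neighboring source samples, target wavelengths that coincide with source wavelengths (every 20 nm here) reproduce the source values exactly.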
B. Response Functions
Systems $\{s^i\}_{i=1}^k$ based on rectangular response functions were used in this paper. Examples of real multispectral systems using response functions of this kind are the Leica ADS40 and ADS80 (airborne digital sensor) systems [16]. These ideal functions $\{s^i\}_{i=1}^k$ correspond to the characteristic functions of nonoverlapping wavelength intervals $\{\Lambda_i\}_{i=1}^k$ and are defined as

$$s^i(\lambda) = c^i\, \chi_{\Lambda_i}(\lambda) \tag{6}$$

where $c^i \in \mathbb{R}_+$ and

$$\chi_{\Lambda_i}(\lambda) = \begin{cases} 1, & \lambda \in \Lambda_i \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$
The wavelength supports for the Leica system are $\Lambda_1 = [428, 492]$, $\Lambda_2 = [533, 587]$, $\Lambda_3 = [608, 662]$, and $\Lambda_4 = [833, 887]$ nm. These bands are comparable in their spectral properties to Landsat bands 1–4 [2]. The response functions of the Leica system are shown in Fig. 3.
C. Numerical Approximation of Multispectral Responses
High-spectral-resolution reflectance measurements were used as approximations for the true reflectance values $r(\lambda_i)$, where the measurements $i = 1, \ldots, n$ correspond to the spectroradiometer measurements and the $\lambda_i$ correspond to a uniform sampling of the wavelength interval $\Lambda$. For the camera, we assume that $\Gamma$ can be inverted and $\tau_c = 1$. We used Simpson's rule as a standard quadrature model for the numerical integration in order to simulate the camera response values in accordance with (2). The approximated model is formulated as
$$x^i \approx \Gamma\left(\eta_2^i\, \frac{\lambda_m - \lambda_0}{3m} \sum_{t=0}^{m} w^i(\lambda_t)\, q(\lambda_t)\, r(\lambda_t)\right) \tag{8}$$

where $m$ is the number of sampling intervals, given by the fixed sampling interval over the region $\Lambda$, and $q$ is the quadrature weight of Simpson's rule [21].

Fig. 3. Leica system with (solid) an additional band in the 705–755-nm waveband, (dash-dotted) an additional band in the 710–725-nm waveband, and (dashed) an additional band in the 695–725-nm waveband.
Using the vector notation $\mathbf{r} = (r(\lambda_0), \ldots, r(\lambda_m))^T \in \mathbb{R}^n$ ($n = m + 1$) for the measured reflectance and the matrix notation $\mathbf{W} \in \mathbb{R}^{k \times n}$ for the responsivity matrix (including the weight $q$), we can write (2) as

$$\mathbf{x} = (x^1, \ldots, x^k)^T \approx \mathbf{W}\mathbf{r}. \tag{9}$$

This numerical model was used to form the simulated responses from a sampled spectrum $\mathbf{r}$. The sampling interval was set to 5 nm ($n = 93$), which corresponded to the maximal spectral resolution of our reflectance data.
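A minimal NumPy sketch of the discretized model for rectangular bands follows. It assumes unit band heights, unit optics transmittance, a unit calibration constant, and an identity nonlinearity; the flat irradiance and reflectance are placeholders standing in for the measured daylight of Fig. 1 and the reflectance ensemble, and the function names are hypothetical. The Simpson weights are folded into the rows of W, as the text describes.

```python
import numpy as np


def simpson_weights(n: int, h: float) -> np.ndarray:
    """Composite Simpson weights (h/3) * [1, 4, 2, 4, ..., 2, 4, 1] for n equally spaced samples."""
    if n < 3 or n % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number of samples (an even number of intervals)")
    q = np.ones(n)
    q[1:-1:2] = 4.0  # odd interior samples
    q[2:-1:2] = 2.0  # even interior samples
    return q * h / 3.0


def responsivity_matrix(wl, bands, irradiance, tau_c=None):
    """Rows w^i(lambda_t) * q(lambda_t) for rectangular bands (unit height assumed)."""
    q = simpson_weights(len(wl), wl[1] - wl[0])
    tau_c = np.ones_like(wl) if tau_c is None else tau_c  # optics transmittance = 1, as in the text
    rows = []
    for lo, hi in bands:
        s = ((wl >= lo) & (wl <= hi)).astype(float)  # characteristic function of the band support
        rows.append(irradiance * tau_c * s * q)
    return np.vstack(rows)  # shape (k, n)


# 5-nm grid on [390, 850] nm: n = 93 samples, i.e., 92 intervals (even, as Simpson's rule needs).
wl = np.linspace(390.0, 850.0, 93)
leica_bands = [(428, 492), (533, 587), (608, 662), (833, 887)]
l_d = np.ones_like(wl)           # placeholder flat irradiance (the paper uses measured daylight)
W = responsivity_matrix(wl, leica_bands, l_d)
r = np.full_like(wl, 0.3)        # hypothetical flat reflectance
x = W @ r                        # simulated four-band responses, x ≈ W r
```

Note that the fourth Leica band, [833, 887] nm, is truncated by the grid to the samples at 835–850 nm.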
III. EXTENSION OF THE SENSOR SYSTEM
We chose to study how the addition of a new characteristic response function would improve the classification performance of the four-band Leica system introduced in Section II-B. Three alternative response functions with different bandwidths are considered in this paper.
In the first case, the additional response function was chosen based on an existing real system. In some configurations, the Leica system provides an optional near-infrared channel supported in the region of 705–755 nm [2]. The four-band Leica system and the extended system are shown in Fig. 3.
In the second case, the four-band Leica system was extended using an extra response function based on the properties of the reflectance ensembles of the trees and the original Leica system. We fixed two different bandwidths and calculated the positions of these response functions by measuring the separability among the three classes. The positions of the new response functions were allowed to vary only within the unsupported 662–833-nm gap of the original Leica system. The details of the calculation are described below.
Two new response functions corresponding to the different bandwidths are derived by maximizing the average Jeffries–Matusita (JM) distance (see [26]) of the three tree classes. The JM distance between two classes is a function of the Bhattacharyya distance between their density functions. Assuming normal distributions for the classes, the distance between classes $i$ and $j$ is defined as

$$J_{ij} = 2\left(1 - \exp(-\beta_{ij})\right) \tag{10}$$

where

$$\beta_{ij} = \frac{1}{8}(\bar{\mathbf{x}}_i - \bar{\mathbf{x}}_j)^T \left(\frac{\Sigma_i + \Sigma_j}{2}\right)^{-1} (\bar{\mathbf{x}}_i - \bar{\mathbf{x}}_j) + \frac{1}{2}\gamma_{ij} \tag{11}$$

$$\gamma_{ij} = \ln(10)\left[\lg\left(\left|\frac{\Sigma_i + \Sigma_j}{2}\right|\right) - \frac{1}{2}\lg(|\Sigma_i|) - \frac{1}{2}\lg(|\Sigma_j|)\right]. \tag{12}$$

Fig. 4. Leica system and average spectra for the second derivatives of the tree ensembles in the wavelength region of 420–850 nm (solid line = spruce, dash-dotted line = birch, and dashed line = pine).
In the formulation above, the class mean vectors $\bar{\mathbf{x}}_i$, $\bar{\mathbf{x}}_j$ and covariance matrices $\Sigma_i$, $\Sigma_j$ correspond to the simulated 5-D multispectral responses, and the determinants of the covariance matrices are expressed on a base-10 logarithmic scale (see [26] for more information). Multispectral responses were simulated using (9) with the concatenated responsivity matrix $\tilde{\mathbf{W}}_{p,b}^T = [\mathbf{w}_{p,b}^T\;\; \mathbf{W}^T] \in \mathbb{R}^{n \times 5}$, where $\mathbf{W}$ corresponds to the four-band Leica system and $\mathbf{w}_{p,b}$ corresponds to the fifth response function with bandwidth $b$ and position $p$.
In this paper, the bandwidths were chosen to be 30 and 15 nm. For both bandwidths, we calculated the optimal positions within the region of 662–833 nm (using a 5-nm sampling grid), defined as the positions giving the maximal average distance between the classes. In the calculations, we used randomizations of the available dataset, in which 75% of the available reflectance data were used to simulate the multispectral ensembles for the three classes, with approximately the same number of samples for every class. Because the properties of the randomized datasets varied, there was some variation in the optimal positions. This variation was approximately 10 nm, but all the distance values in this region were close to the maximal values. The response functions chosen for this paper are located in the regions of 695–725 and 710–725 nm (see Fig. 3).
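The band-position search described above can be sketched as follows: the JM distance (10) is computed from the Bhattacharyya distance (11)–(12) of Gaussian class models, averaged over the three class pairs, while a rectangular fifth band is slid across the 662–833-nm gap. This is a minimal sketch, not the authors' code. It assumes NumPy, uses natural-log determinants from `np.linalg.slogdet` (equivalent to the ln(10)·lg form of (12)), uses unit-height band rows without quadrature weights or irradiance, and aligns candidate band starts with the 5-nm wavelength grid. All function names are hypothetical.

```python
import itertools

import numpy as np


def bhattacharyya(m_i, S_i, m_j, S_j):
    """Bhattacharyya distance between two Gaussian classes, eqs. (11)-(12)."""
    S = 0.5 * (S_i + S_j)
    d = m_i - m_j
    mean_term = 0.125 * float(d @ np.linalg.solve(S, d))
    # slogdet returns natural-log determinants, equal to ln(10) * lg|.| in (12)
    log_det_term = (np.linalg.slogdet(S)[1]
                    - 0.5 * np.linalg.slogdet(S_i)[1]
                    - 0.5 * np.linalg.slogdet(S_j)[1])
    return mean_term + 0.5 * float(log_det_term)


def jeffries_matusita(X_i, X_j):
    """JM distance (10) between two sample sets (rows are samples)."""
    b = bhattacharyya(X_i.mean(axis=0), np.cov(X_i, rowvar=False),
                      X_j.mean(axis=0), np.cov(X_j, rowvar=False))
    return 2.0 * (1.0 - np.exp(-b))


def average_jm(class_responses):
    """Average JM distance over all class pairs."""
    pairs = itertools.combinations(class_responses, 2)
    return float(np.mean([jeffries_matusita(a, b) for a, b in pairs]))


def best_band_position(wl, spectra_by_class, W_base, width, lo=662.0, hi=833.0):
    """Slide a rectangular fifth band of `width` nm over [lo, hi] on the wavelength grid.

    Returns (start, score) for the start wavelength with the largest average JM distance.
    """
    best_start, best_score = None, -np.inf
    for start in wl[(wl >= lo) & (wl + width <= hi)]:
        w = ((wl >= start) & (wl <= start + width)).astype(float)  # unit-height band row
        W = np.vstack([w, W_base])                                # 5 x n, fifth band first
        responses = [R @ W.T for R in spectra_by_class]           # samples x 5 per class
        score = average_jm(responses)
        if score > best_score:
            best_start, best_score = float(start), score
    return best_start, best_score
```

The log-determinant form is used because determinants of covariance matrices of small-valued responses can underflow, whereas their logarithms stay well scaled.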
To support the choice of the new response functions presented above, we analyzed the differences in second-order derivative features between classes so as to identify changes in the shape of the reflectance curve. Derivative analysis has already been used for remotely sensed data in previous studies [5], [26]. In order to dampen the effect of measurement noise, mean filtering with a window size of three samples was performed before the calculation of the divided-difference approximation of the second derivative [26]. When the average derivative curves are analyzed, it can be seen that the 660–830-nm interval shows interesting behavior compared with other wavelength regions and with the locations of the original Leica response functions (see Fig. 4). The average second derivative of the birch spectra deviates strongly from those of the pine and spruce spectra in the regions of 690–720 and 725–755 nm. It can also be seen that the derivative of the spruce spectra deviates in shape from those of the pine and birch spectra in the region of 710–725 nm (see Fig. 5).

Fig. 5. Average spectra for the second derivatives of the tree ensembles in the unsupported wavelength region of 660–780 nm of the Leica system (solid line = spruce, dash-dotted line = birch, and dashed line = pine).
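The derivative features can be sketched as a three-sample moving average followed by the central second divided difference on the 5-nm grid. This is a minimal NumPy sketch under that reading of the text; the exact filter and difference stencil used in [26] may differ, and the function name is hypothetical.

```python
import numpy as np


def second_derivative_features(r, h=5.0):
    """Three-sample moving average, then the central second divided difference.

    For an input of length n, the output has length n - 4 and is aligned with r[2:-2].
    """
    smooth = np.convolve(r, np.ones(3) / 3.0, mode="valid")        # length n - 2, centered on r[1:-1]
    return (smooth[2:] - 2.0 * smooth[1:-1] + smooth[:-2]) / h**2  # length n - 4
```

A quadratic curve is a convenient sanity check: averaging adds only a constant to it, so the estimated second derivative stays exactly 2 for r(λ) = λ².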
Summarizing the discussion above, the final candidates for the fifth band were $\chi_{[705,755]}(\lambda)$, $\chi_{[695,725]}(\lambda)$, and $\chi_{[710,725]}(\lambda)$. The classification performance was studied for the three five-channel systems corresponding to these response functions. Details of the classification are presented in the following sections.
IV. CLASSIFICATION DETAILS
We used an SVM classification algorithm to discriminate between the simulated multispectral or hyperspectral measurements. The algorithm is based on the optimization problem of finding a separating hyperplane between the feature vectors of two classes [24]. The separating hyperplanes identified by the SVM maximize the margin between the classes and are robust for the classification of unseen samples. Since the multispectral/hyperspectral signatures inside the three classes were not mixed with signatures from other sources, this setting can be defined as pure pixel classification [3].
Let us assume that we have a binary classification problem with training data in the form $\{\mathbf{x}_i, y_i\}_{i=1}^l$, with $\mathbf{x}_i \in \mathbb{R}^k$ and $y_i \in \{-1, 1\}$. In the SVM framework, it is assumed that the data are mapped to some feature space $F$ with the feature map

$$\phi: \mathbb{R}^k \to F \tag{13}$$

and an explicit representation of the decision function is written as

$$f(\mathbf{x}) = \operatorname{sign}\left(\mathbf{w}^T \phi(\mathbf{x}) + b\right) \tag{14}$$

where $b$ is a bias term and $\mathbf{w}^T \phi(\mathbf{x}) + b = 0$ defines a hyperplane in the feature space. If the data are separable in the feature space, $\mathbf{w}$ and $b$ can be scaled so that $\mathbf{w}^T\phi(\mathbf{x}_i) + b \geq +1$ when $y_i = +1$ and $\mathbf{w}^T\phi(\mathbf{x}_i) + b \leq -1$ when $y_i = -1$.
Assuming that the two classes are not separable in the feature space, the classification model is derived as the solution to the minimization problem

$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \ \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{l} \xi_i \quad \text{s.t.}\quad y_i\left(\mathbf{w}^T\phi(\mathbf{x}_i) + b\right) \geq 1 - \xi_i,\ \ \xi_i \geq 0,\ \ i = 1, \ldots, l \tag{15}$$

where minimizing the term $\mathbf{w}^T\mathbf{w}/2$ maximizes the margin $2/\|\mathbf{w}\|$ between the classes, the parameter $C$ controls the penalization of the samples located on the incorrect side of the decision boundary, and $\{\xi_i\}_{i=1}^l$ are slack variables, which indicate misclassification of sample $\mathbf{x}_i$ when $\xi_i > 1$ [24].
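For a fixed weight vector and bias, the smallest slacks that satisfy the constraints of the minimization problem above are max(0, 1 − y_i(wᵀφ(x_i) + b)), so the objective reduces to the familiar hinge-loss form. The NumPy sketch below evaluates that objective for a linear feature map φ(x) = x, an assumption made for illustration; it only evaluates the objective and does not solve the optimization problem. The function name is hypothetical.

```python
import numpy as np


def soft_margin_objective(w, b, X, y, C):
    """Soft-margin objective for a linear feature map, using the smallest feasible slacks."""
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)  # 0 outside the margin, > 1 for misclassified samples
    return 0.5 * float(w @ w) + C * float(xi.sum()), xi
```

In the usage check, the third sample lies on the wrong side of the hyperplane, so its slack exceeds 1, matching the interpretation of the slack variables given in the text.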
The solution is obtained using the dual space of Lagrange multipliers and the property

$$\kappa(\mathbf{x}, \mathbf{z}) = \phi(\mathbf{x})^T \phi(\mathbf{z}) \tag{16}$$

where the kernel $\kappa$ defines the mapping $\phi: \mathbb{R}^k \to F$ of input samples $\mathbf{x}, \mathbf{z} \in \mathbb{R}^k$ to the feature space $F$ [24]. In this way, nonlinear decision boundaries in the input space are defined without explicit use of the possibly infinite-dimensional feature space $F$. The decision function of the SVM becomes

$$f(\mathbf{x}) = \operatorname{sign}\left(\sum_{i=1}^{l_s} \alpha_i y_i \kappa(\mathbf{x}, \mathbf{x}_i) + b\right) \tag{17}$$

where $l_s$ is the number of support vectors, $\{\alpha_i\}_{i=1}^{l_s}$ are the calculated Lagrange multipliers, and $\kappa$ is the selected positive definite kernel function [24]. Multiclass classification is implemented by combining several separate binary classifications. For more information on SVMs and multiclass techniques, see [7], [18], and [24].
The present SVM classification was performed with a polynomial kernel of the first degree

$$\kappa_L(\mathbf{x}, \mathbf{z}) = \mathbf{x}^T\mathbf{z} + 1 \tag{18}$$

and a Gaussian kernel

$$\kappa_G(\mathbf{x}, \mathbf{z}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{z}\|^2}{\sigma^2}\right) \tag{19}$$

where $\sigma$ defines the length scale of the kernel. When the first-degree polynomial kernel is used, it is assumed that the decision boundaries between the classes are hyperplanes in the original input space.
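The kernel expansion of the decision function and the two kernels can be sketched directly in NumPy, as below. The support vectors, multipliers, and bias would normally come from a trained SVM (the paper uses the SimpleSVM toolbox); here they are hand-picked toy values, and the function names are hypothetical. The Gaussian kernel is written with σ² in the denominator; a 2σ² convention would only rescale the length-scale parameter. Mapping a zero sum to +1 is also an assumption.

```python
import numpy as np


def linear_kernel(x, z):
    """First-degree polynomial kernel: x^T z + 1."""
    return float(x @ z) + 1.0


def gaussian_kernel(x, z, sigma):
    """Gaussian kernel with length scale sigma."""
    d = x - z
    return float(np.exp(-(d @ d) / sigma**2))


def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    """Kernel decision function: sign(sum_i alpha_i y_i kernel(x, x_i) + b), with 0 mapped to +1."""
    s = sum(a * y * kernel(x, sv) for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s + b >= 0 else -1


svs = np.array([[1.0, 0.0], [-1.0, 0.0]])
print(svm_decision(np.array([2.0, 0.0]), svs, [1.0, 1.0], [1, -1], 0.0, linear_kernel))  # 1
```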
A. Data Preprocessing
The data were standardized to zero mean and unit variance before the calculations. For the Gaussian kernel, this preprocessing is equivalent to the use of the kernel

$$\kappa_G(\mathbf{x}, \mathbf{z}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{z}\|^2_{D^{-1}}}{\sigma^2}\right). \tag{20}$$

The norm $\|\mathbf{x}\|^2_{D^{-1}} = \mathbf{x}^T D^{-1} \mathbf{x}$ is defined by a diagonal matrix with $D^{-1}_{ii} = 1/\sigma_i^2$, where $\sigma_i$ denotes the standard deviation of the $i$th components of the training set.
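The equivalence stated above can be checked numerically: because the mean cancels in the difference of two samples, applying the Gaussian kernel to standardized data gives the same value as the diagonally weighted kernel applied to the raw data. The following is a minimal NumPy sketch with synthetic responses; the function names are hypothetical, and population standard deviations (`ndarray.std` with its default `ddof=0`) are assumed.

```python
import numpy as np


def gaussian_kernel(x, z, sigma):
    """Gaussian kernel on already-preprocessed inputs."""
    d = x - z
    return float(np.exp(-(d @ d) / sigma**2))


def diag_gaussian_kernel(x, z, sigma, std):
    """Gaussian kernel with the squared norm weighted by diag(1 / std_i^2)."""
    d = x - z
    return float(np.exp(-np.sum(d**2 / std**2) / sigma**2))


rng = np.random.default_rng(1)
# Hypothetical 3-band training responses with very different channel scales.
X_train = rng.normal(loc=[10.0, 200.0, 3.0], scale=[1.0, 50.0, 0.1], size=(200, 3))
mu, std = X_train.mean(axis=0), X_train.std(axis=0)

x, z = X_train[0], X_train[1]
k_standardized = gaussian_kernel((x - mu) / std, (z - mu) / std, sigma=1.5)
k_diagonal = diag_gaussian_kernel(x, z, sigma=1.5, std=std)
```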
The above preprocessing step can be generalized with a full covariance matrix using the translation-invariant Mahalanobis kernel

$$\kappa_M(\mathbf{x}, \mathbf{z}) = \exp\left(-\frac{\delta}{m}\|\mathbf{x} - \mathbf{z}\|^2_{\Sigma^{-1}}\right) \tag{21}$$

where $\Sigma = E[(\mathbf{x} - \bar{\mathbf{x}})(\mathbf{x} - \bar{\mathbf{x}})^T]$ denotes the covariance matrix of the training ensemble $\{\mathbf{x}_i\}_{i=1}^l$, $\bar{\mathbf{x}}$ denotes the expected value of the ensemble, $\delta$ is the kernel parameter, and $m = k$ [1]. This preprocessing corresponds to data whitening, where the data are first represented in the orthogonal directions defined by the eigenvectors of the covariance matrix and then scaled to have equal variance in this orthogonal representation. Using the eigendecomposition of the covariance matrix $\Sigma = USU^T$ and the notation $\mathbf{d} = \mathbf{x} - \mathbf{z}$ for the difference vector, the generalized norm can be written in the form

$$\mathbf{d}^T\Sigma^{-1}\mathbf{d} = \mathbf{d}^T U S^{-1} U^T \mathbf{d} = \sum_{i=1}^{k} \frac{(\mathbf{d}^T\mathbf{u}_i)^2}{S_{ii}} \tag{22}$$

where $\mathbf{u}_i$ and $S_{ii}$ denote the $i$th eigenvector and the corresponding eigenvalue, respectively. Equation (22) shows that the coordinates $\mathbf{d}^T\mathbf{u}_i$ of the difference vector are divided by the data variances in the corresponding directions. Directions corresponding to a small data variance are given more weight.

Fig. 6. Assumed birch outliers (10% of the data) for one training set. The outliers are calculated from the Leica DNs using the Mahalanobis distance. The data in the figure are represented in terms of the two most significant principal components of the set.
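The sketch below evaluates the Mahalanobis norm through the eigendecomposition of the covariance matrix and checks that the Mahalanobis kernel equals a Gaussian-type kernel evaluated on whitened data. It assumes NumPy's `np.linalg.eigh` for the symmetric covariance matrix and a sample covariance from `np.cov`; the function names are hypothetical, and the data are synthetic.

```python
import numpy as np


def mahalanobis_sq(d, cov):
    """d^T Sigma^{-1} d evaluated through Sigma = U S U^T."""
    S, U = np.linalg.eigh(cov)  # eigenvalues S_ii (ascending) and eigenvectors u_i as columns
    proj = U.T @ d              # coordinates d^T u_i
    return float(np.sum(proj**2 / S))


def mahalanobis_kernel(x, z, cov, delta=1.0):
    """Mahalanobis kernel exp(-(delta / m) * ||x - z||^2_{Sigma^-1}) with m = k, the input dimension."""
    m = x.shape[0]
    return float(np.exp(-(delta / m) * mahalanobis_sq(x - z, cov)))


def whiten(X, mean, cov):
    """Rotate onto the covariance eigenvectors and scale each direction to unit variance."""
    S, U = np.linalg.eigh(cov)
    return (X - mean) @ U / np.sqrt(S)
```

Whitening makes the sample covariance the identity, so plain squared Euclidean distances between whitened samples equal the Mahalanobis norms of the raw differences.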
The use of the Mahalanobis kernel is closely related to outlier removal based on the Mahalanobis distance [23]

$$\|\mathbf{x} - \bar{\mathbf{x}}\|^2_{\Sigma^{-1}} = (\mathbf{x} - \bar{\mathbf{x}})^T \Sigma^{-1} (\mathbf{x} - \bar{\mathbf{x}}). \tag{23}$$

When this distance is used for data preprocessing, we assume that the data distribution in one class has an ellipsoidal shape. Under this assumption, large Mahalanobis distances indicate anomalous data items, which are removed from the ensemble. For example, the dataset used in this study includes reflectance measurements with different solar geometries and therefore contains some variation that might disrupt the classification performance. This kind of refinement of the training data can be expected to weight the essential parts of the ensemble more efficiently (see Fig. 6).
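A minimal sketch of per-class outlier removal follows: for each class, the 10% of training samples with the largest Mahalanobis distance to their class mean are dropped. It assumes that the class mean and sample covariance are estimated from the same training samples and that 10% is rounded to the nearest integer; the helper names are hypothetical.

```python
import numpy as np


def remove_outliers(X, fraction=0.10):
    """Drop the given fraction of rows with the largest Mahalanobis distance to the class mean."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    D = X - mean
    d2 = np.einsum("ij,ij->i", D, np.linalg.solve(cov, D.T).T)  # (x - mean)^T Sigma^{-1} (x - mean) per row
    n_remove = int(round(fraction * len(X)))
    keep = np.sort(np.argsort(d2)[: len(X) - n_remove])  # keep the smallest distances, original order
    return X[keep]


def remove_outliers_per_class(X, y, fraction=0.10):
    """Apply the removal separately to each class of the training set."""
    X_parts, y_parts = [], []
    for c in np.unique(y):
        Xc = remove_outliers(X[y == c], fraction)
        X_parts.append(Xc)
        y_parts.append(np.full(len(Xc), c))
    return np.vstack(X_parts), np.concatenate(y_parts)
```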
B. Model Training
The kernel and margin parameters were found using a tenfold cross-validation routine [24], with a grid search for the Gaussian kernel and an alternative two-step line search method for the Mahalanobis kernel [1]. In the first step, a line search for the parameter $C$ was performed using a fixed value $\delta = 1$. In the second step, the resulting parameter $C$ was fixed, and a line search was performed for the parameter $\delta$. This significantly sped up the training phase without any significant decrease in accuracy.

TABLE I
CLASSIFICATION ERROR RATES WHEN THE SIMULATED HYPERSPECTRAL DATA ARE USED. STATISTICALLY SIGNIFICANT DIFFERENCES TO THE BEST PERFORMING KERNEL ARE INDICATED WITH UNDERLINES
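The two-step line search can be sketched independently of any SVM library by passing in a scoring callback. Here `cv_score` is a hypothetical function that would wrap the tenfold cross-validation and return an accuracy, and the candidate grids are illustrative, not the values used in the paper.

```python
import math


def two_step_line_search(cv_score, C_grid, delta_grid, delta0=1.0):
    """Two-step line search over (C, delta) for a Mahalanobis-kernel SVM.

    Step 1: choose C with the kernel parameter fixed at delta0.
    Step 2: keep that C and choose the kernel parameter.
    cv_score(C, delta) is a hypothetical callback returning a cross-validated
    accuracy (higher is better).
    """
    best_C = max(C_grid, key=lambda C: cv_score(C, delta0))
    best_delta = max(delta_grid, key=lambda d: cv_score(best_C, d))
    return best_C, best_delta


def toy_score(C, delta):
    """Toy stand-in for cross-validated accuracy, peaking at C = 10 and delta = 2."""
    return -(math.log10(C) - 1.0) ** 2 - (delta - 2.0) ** 2


print(two_step_line_search(toy_score, [0.1, 1.0, 10.0, 100.0], [0.5, 1.0, 2.0, 4.0]))  # (10.0, 2.0)
```

With grids of a and b candidate values, the two-step search needs a + b cross-validation runs instead of a × b for a full grid, which is where the speedup comes from.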
The kernel evaluations were carried out using the high-dimensional radiance measurements or the simulated camera responses (DNs). The multiclass one-against-one method was used for the classifications, which were carried out using the SimpleSVM Matlab toolbox (v.2.31) [17]. In this method, the SVM is trained separately for each pair of classes, and the class label of a test sample $\mathbf{x}$ is decided by voting among the binary classifiers concerned. This strategy has been verified as accurate for land cover classification [18].
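One-against-one voting can be sketched as follows, given already-trained pairwise classifiers. The dictionary-of-callables interface and the tie-breaking rule are assumptions for illustration and are not taken from the SimpleSVM toolbox.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(x, classes, binary):
    """One-against-one voting.

    binary[(a, b)] is a hypothetical trained classifier for the pair (a, b):
    a callable returning a positive value for class a and a negative value for class b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[a if binary[(a, b)](x) > 0 else b] += 1
    # Ties are broken by the order of `classes` (an assumption; toolboxes differ).
    return max(classes, key=lambda c: votes[c])
```

For the three tree classes, three binary SVMs are trained, and each test sample receives three votes.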
V. EXPERIMENTAL DETAILS
The classification performance was calculated for the DNs from the simulated multispectral systems and for the simulated full-resolution radiance data with 5-nm sampling. The wavelength interval was fixed at 390–850 nm. The covariance matrix is poorly suited for use with full-resolution data because of the small effective dimensionality of the data and the instability of the inversion: the condition numbers of the covariance matrices of the radiance data were of the order of $10^{11}$. Because of this, the results of the classification of the full-resolution data with the Mahalanobis kernel are not presented. A subspace mapping or regularization technique of some kind would be needed in order to use this kernel efficiently with high-dimensional data.
Misclassification ratios were calculated for each combination of sensor system and kernel, employing five randomizations of the available data into training and test sets. The same randomizations were used for all the systems and kernels. The sample sizes for the tree classes were as follows: birch, 336; pine, 369; and spruce, 348. In each randomization, 75% of the samples in each class were assigned to the training set, and the remaining 25% were used as the test set. The data were standardized for the first-degree polynomial and Gaussian kernels.
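The randomization protocol can be sketched as below: each class is split 75/25 independently, and the same list of splits is reused for every system and kernel. The floor rounding of 75% of each class size and the fixed seed are assumptions, since the paper does not state them; the function name is hypothetical.

```python
import numpy as np


def stratified_splits(y, n_splits=5, train_fraction=0.75, seed=0):
    """Return n_splits (train_idx, test_idx) pairs; each class is split 75/25 on its own.

    Generating the list once and reusing it for every sensor system and kernel
    mirrors the use of the same randomizations throughout.
    """
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_splits):
        train, test = [], []
        for c in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == c))
            n_train = int(np.floor(train_fraction * len(idx)))  # rounding rule is an assumption
            train.append(idx[:n_train])
            test.append(idx[n_train:])
        splits.append((np.concatenate(train), np.concatenate(test)))
    return splits


y = np.repeat([0, 1, 2], [336, 369, 348])  # birch, pine, spruce
splits = stratified_splits(y)
```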
This data preprocessing method was also compared with preprocessing that includes outlier removal. In this paper, the 10% of the training data items with the largest Mahalanobis distances were removed; this was done for each class separately using the simulated 4- or 5-D camera responses. The results are presented in Tables I–V. The results obtained after extending the Leica system with additional bands are denoted in Tables IV and V by the respective supported wavelength intervals. In the following, the Gaussian and Mahalanobis kernels are referred to as nonlinear kernels.
TABLE II
CLASSIFICATION ERROR RATES WHEN DNs FROM THE LEICA SYSTEM ARE USED. STATISTICALLY SIGNIFICANT DIFFERENCES TO THE BEST PERFORMING KERNEL ARE INDICATED WITH UNDERLINES

TABLE III
CLASSIFICATION ERROR RATES WHEN DNs FROM THE LEICA SYSTEM ARE USED AND 10% OF THE TRAINING DATA ITEMS ARE REMOVED AS OUTLIERS. STATISTICALLY SIGNIFICANT DIFFERENCES TO THE BEST PERFORMING KERNEL ARE INDICATED WITH UNDERLINES

TABLE IV
CLASSIFICATION ERROR RATES WHEN DNs FROM THE FIVE-CHANNEL SYSTEMS ARE USED. STATISTICALLY SIGNIFICANT DIFFERENCES TO THE BEST PERFORMING KERNEL AND SYSTEM ARE INDICATED WITH UNDERLINES AND BOLDFACE, RESPECTIVELY

TABLE V
CLASSIFICATION ERROR RATES WHEN DNs FROM THE FIVE-CHANNEL SYSTEMS ARE USED AND 10% OF THE TRAINING DATA ITEMS ARE REMOVED AS OUTLIERS. STATISTICALLY SIGNIFICANT DIFFERENCES TO THE BEST PERFORMING KERNEL AND SYSTEM ARE INDICATED WITH UNDERLINES AND BOLDFACE, RESPECTIVELY
A. Statistical Significance of Classification Differences
McNemars test was used to test the statistical signicance
between the classication results [6], [8]. The McNemars value
with continuity correction is dened as
M =
(|f
12
f
21
| 1)
2
f
12
+ f
21
(24)
where $f_{12}$ is the number of samples misclassified by classifier 1 but not by classifier 2, and $f_{21}$ is the number of samples misclassified by classifier 2 but not by classifier 1. The null hypothesis is that classifiers 1 and 2 have the same error rate, which means that $f_{12} = f_{21}$ in expectation. McNemar's test is based on a $\chi^2$ test with one degree of freedom. In this paper, the $\chi^2$ critical value at the 5% significance level was used; with one degree of freedom, this value is 3.8414. If the null hypothesis is true, the probability of obtaining a McNemar value greater than the critical value is approximately 5%.
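As a small illustration of (24) and the decision rule above, the following sketch computes M from the two discordant counts and compares it with 3.8414. The p-value uses the identity P(χ² > M) = erfc(√(M/2)), which holds for one degree of freedom. The function name and the example counts are hypothetical and are not taken from the paper's tables.

```python
import math

# Critical value of chi^2 with one degree of freedom at the 5% level.
CHI2_CRIT_1DF = 3.8414


def mcnemar(f12, f21):
    """McNemar statistic with continuity correction, as in (24), plus p-value.

    f12: samples misclassified by classifier 1 but not by classifier 2.
    f21: samples misclassified by classifier 2 but not by classifier 1.
    """
    n = f12 + f21
    if n == 0:
        # No discordant samples: the classifiers cannot be distinguished.
        return 0.0, 1.0
    m = (abs(f12 - f21) - 1) ** 2 / n
    # For one degree of freedom, P(chi^2 > m) = erfc(sqrt(m / 2)).
    p = math.erfc(math.sqrt(m / 2.0))
    return m, p


# Hypothetical discordant counts for two kernels on the same test set.
m, p = mcnemar(30, 15)
print(round(m, 4), m > CHI2_CRIT_1DF)  # 4.3556 True -> significant at 5%
```

Only the discordant counts enter the statistic. Samples that both classifiers classify correctly, or that both misclassify, do not affect the test.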
In Tables II–V, statistically significant differences from the best performing kernel are underlined for every sensor system. McNemar's test shows significant differences between the first-degree polynomial kernel and the nonlinear kernels. A significant difference between the Gaussian and Mahalanobis kernels appears only for the simulated four-channel Leica system (Sets 2 and 5). When outlier removal is performed, the significant differences between these two kernels vanish. For the five-channel systems, no significant difference is detected between these two kernels.
For every kernel (Tables IV and V), statistically significant differences from the best performing sensor system are indicated in boldface. For all kernels, there were no statistically significant differences between the systems with additional support in the 695–725- and 705–755-nm regions. This also holds when outlier removal is used. Significant differences are found when the system supported in the 710–725-nm region is compared with the two other systems.
B. Classification Results
The results in Table I show that classification based on the hyperspectral radiance measurements performs much better than classification based on the four-band measurements in Table II. The large number of wavelength bands allows small changes in spectral shape to be detected. For the high-dimensional radiance data, a first-degree polynomial kernel gives a slight improvement in classification accuracy over the Gaussian kernel. The results for the low-dimensional inputs suggest that a nonlinear kernel is beneficial. The nonlinear feature space compensates for the poor spectral accuracy of the measurement system. Substituting a nonlinear kernel for the first-degree polynomial kernel decreases the misclassification rate by approximately 5–23 percentage points. The Mahalanobis kernel outperforms the Gaussian kernel in four cases and performs almost equally for one set. The maximal difference in misclassification in favor of the Mahalanobis kernel is 16 percentage points.
Table III shows that removing 10% of the data points as outliers clearly increases the accuracy of the first-degree polynomial and Gaussian kernels compared with training sets that include outliers. For the Mahalanobis kernel, the results are similar with or without this preprocessing.
We also studied how an additional response function would enhance classification performance. Three wavelength locations with varying support were tested. Table IV shows that all these five-channel systems significantly improve classification performance relative to the four-channel system in Table II. The misclassification ratio decreases by 1–23 percentage points, depending on the kernel. Nonlinear kernels seem to benefit most from the new band, and the largest increase in accuracy is obtained for Set 2. For the Gaussian kernel, the misclassification ratio decreases by 22.8 percentage points. The response function in the 705–755-nm region performs almost identically to the one in the 695–725-nm region, with the misclassification ratios of the two regions differing by 0–4 percentage points
in favor of the former. A response function in the 710–725-nm region leads to the most accurate classification with both the Gaussian and Mahalanobis kernels. Relative to the other two five-channel systems, it reduces the average error by approximately 0.5–9 percentage points. With the first-degree polynomial kernel, however, this sensor system increases the errors relative to the other five-channel systems. In most cases, the Gaussian and Mahalanobis kernels perform similarly with these five-channel systems.
For the five-channel systems (Table V), outlier removal again improves the performance of the first-degree polynomial kernel for almost every dataset and system. For the Gaussian kernel with these systems, however, preprocessing makes the results poorer in almost every case. For the Mahalanobis kernel, performance is again broadly similar with or without preprocessing. The results are poorer for the systems with additional support in the 695–725- and 705–755-nm regions but remain almost unaltered for the system with a sensor in the 710–725-nm region.
VI. DISCUSSION
We presented the classification results obtained when additional spectral response functions were used with the Leica system. The best performing response function, supported on [710, 725] nm, has a smaller bandwidth than the other two response functions, supported on [705, 755] nm and [695, 725] nm. The 710–725-nm range is also covered by the other two additional bands. The results therefore suggest that the difference in performance is partly due to the smaller bandwidth of this response function. The effectiveness of the [710, 725]-nm band also depends on the kernel, since the first-degree polynomial kernel did not show a consistent performance difference among the three bands. The results also show no significant performance difference between the [705, 755]-nm and [695, 725]-nm bands, although their bandwidths differ by 20 nm.
The experiments suggest that decreasing the bandwidth (from 30 to 15 nm in this paper) is useful only if the band location is specified accurately. For a response function with a 15-nm bandwidth, a small shift in position sometimes had a significant effect on the classification of samples from the test sets. The experiments also showed that the optimal positions found by the band-selection method varied with the data randomization used (75% of the available data were used in every randomization). Although the deviation in optimal positions was small, careful data evaluation is needed, particularly when the position is calculated for a response function with a small bandwidth.
In this paper, the three new response functions were located in the 660–830-nm wavelength interval. This choice is motivated by the fact that response functions in this region do not overlap with the four Leica response functions. In addition, the derivative analysis shows that the tree spectra have interesting features in this region. Other useful regions may exist outside 660–830 nm. For example, the average second derivatives also have elevated values in the 495–525- and 590–605-nm intervals, as shown in Figs. 4 and 5. The original Leica system does not support these two intervals, and they might also provide valuable information about the tree species. However, further investigation is needed to determine whether these wavelength regions also have potential for classifying these tree species.
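The derivative analysis mentioned above can be illustrated with a short sketch. It averages the second derivative of the spectra over all samples and lists the wavelength runs where the magnitude of that average exceeds a threshold. The following choices are our assumptions: the finite-difference scheme (NumPy's `np.gradient`, applied twice), the absence of smoothing, the threshold, and the function names. The procedure behind Figs. 4 and 5 may differ, for example by smoothing the spectra before differentiation.

```python
import numpy as np


def mean_second_derivative(wavelengths, spectra):
    """Average d^2 R / d lambda^2 over spectra stored as rows of `spectra`."""
    wl = np.asarray(wavelengths, dtype=float)
    R = np.asarray(spectra, dtype=float)
    # Second-order accurate differences, including at the band edges.
    d1 = np.gradient(R, wl, axis=1, edge_order=2)
    d2 = np.gradient(d1, wl, axis=1, edge_order=2)
    return d2.mean(axis=0)


def elevated_intervals(wavelengths, values, threshold):
    """Return (start, end) wavelength runs where |values| > threshold."""
    mask = np.abs(np.asarray(values, dtype=float)) > threshold
    runs, i, n = [], 0, len(mask)
    while i < n:
        if not mask[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and mask[j + 1]:
            j += 1  # extend the run while the condition holds
        runs.append((float(wavelengths[i]), float(wavelengths[j])))
        i = j + 1
    return runs


# Usage with hypothetical data: the second derivative of c * lambda^2 is 2c.
wl = np.arange(400.0, 901.0, 5.0)
spectra = np.vstack([3e-6 * wl**2, 3e-6 * wl**2])
print(np.allclose(mean_second_derivative(wl, spectra), 6e-6))  # True
print(elevated_intervals([500, 505, 510, 515, 520], [0, 2, 3, 0, 5], 1.0))
# [(505.0, 510.0), (520.0, 520.0)]
```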
These reflectance data were previously classified at the same resolution with linear subspace classifiers [13]. In that study, a linear pseudoinverse classifier gave the best performance, which was similar to that of the SVM model used here for the simulated radiance data. Our results for the low-dimensional multispectral measurements can be compared with the finding that increasing the order of the polynomial kernel is beneficial when only a small number of input variables are present [11].
The results suggest that measurements from a simulated hyperspectral imaging device capture the essential information in the training set even with a first-degree polynomial kernel. Some preprocessing techniques for the training sets were evaluated here to further improve the performance of SVM classification based on the low-dimensional multispectral measurements. For the four-channel system, it was verified that classification accuracy improved both when the Mahalanobis distance was used for preprocessing and when the Mahalanobis kernel was used. This suggests that the SVM model based on the Mahalanobis kernel is robust to outliers for the datasets used in this paper. The results for the five-channel systems suggest that the outlier removal used here is beneficial only when the classifier uses a first-degree polynomial kernel. The Gaussian kernel was able to extract the essential information from the training data of these five-channel systems without outlier processing. In addition, the Mahalanobis kernel did not perform worse than the Gaussian kernel.
Some of the so-called outliers in the data are presumably due to, for instance, varying measurement conditions (e.g., variable sensor–object–sun geometry) or measurement errors. However, the set of outliers also includes samples that reflect natural spectral variation within and between species. In practice, it may be difficult to remove the disruptive variation automatically and optimally without removing essential information from the training set. The results for the five-channel systems, for example, suggest that removing 10% of the training samples by Mahalanobis distance was too aggressive and decreased classification accuracy. On the other hand, the results for this dataset suggest that the Mahalanobis kernel handled outliers more efficiently and automatically, without the need to set any threshold value.
It should be noted that the SVM with the Mahalanobis kernel was trained with a different cross-validation routine than the SVM with the Gaussian kernel. The training method based on two line searches affects the performance of the Mahalanobis kernel, but it was found to differ only slightly from the grid search for our data. Note also that, when the Mahalanobis kernel is used, the inverse covariance matrix includes the effect of the training samples from all three classes (total covariance matrix). It was verified for these data that the outliers detected using the Mahalanobis distance for each class separately are similar to those detected using the inverse of the total covariance matrix. This similarity partly explains why the Mahalanobis kernel and the Gaussian kernel with outlier removal give similar results.
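To make the role of the total covariance matrix concrete, the following sketch builds a Mahalanobis-type kernel matrix $K(\mathbf{x},\mathbf{z}) = \exp(-\gamma (\mathbf{x}-\mathbf{z})^T \Sigma^{-1} (\mathbf{x}-\mathbf{z}))$. Here Σ is the covariance of all training samples pooled over the three classes. Abe [1] includes an additional scaling of the exponent. In this sketch, the single parameter `gamma` (written γ above) absorbs it. This parameter, the pseudoinverse, and the function names are our assumptions. With Σ replaced by the identity matrix, the kernel reduces to a Gaussian kernel.

```python
import numpy as np


def total_inverse_covariance(X):
    """Inverse covariance of all training samples, pooled over classes."""
    return np.linalg.pinv(np.cov(np.asarray(X, dtype=float), rowvar=False))


def mahalanobis_kernel(A, B, Q_inv, gamma=1.0):
    """Kernel matrix K[i, j] = exp(-gamma * (a_i - b_j)^T Q_inv (a_i - b_j))."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    diff = A[:, None, :] - B[None, :, :]  # shape (len(A), len(B), d)
    d2 = np.einsum("abi,ij,abj->ab", diff, Q_inv, diff)
    return np.exp(-gamma * d2)


# Usage with hypothetical 4-D training responses of unequal scales.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(20, 4)) * [1.0, 5.0, 0.5, 2.0]
K = mahalanobis_kernel(X_train, X_train, total_inverse_covariance(X_train), 0.25)
print(K.shape, np.allclose(np.diag(K), 1.0))  # (20, 20) True
```

Such a Gram matrix could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Test samples would then use `mahalanobis_kernel(X_test, X_train, ...)` with the same training-set inverse covariance.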
VII. CONCLUSION
The spectral radiance from natural objects contains a significant amount of redundancy, so intelligent signal measurement (or compression) is appropriate. This was achieved here using a four-channel system based on the real Leica ADS40/ADS80 sensor system. A simulated optical radiation model was used to evaluate the tree species classification performance of this sensor system with the SVM classifier and three different kernel functions. The effects of an additional fifth spectral band and of data preprocessing were studied with this simulator.
The model employed in the simulations has the following properties with respect to the sensed signal.
1) The reflectance spectra (as measured at ground level) are assumed to correspond to the signal sensed from the geometry of the airborne camera.
2) A pure pixel assumption is made [3]. In practical measurements, the reflectance distribution is a mixture of the different signatures present in the scene, depending on the distance between the camera and the surface (varying flight altitude).
3) The light incident at the sensor also includes an indirect scattered component, which has been ignored in this paper.
4) An unknown measurement noise component is included in the measured reflectance distributions and is propagated to the simulated responses via the quadrature model in (9).
5) The spectral response functions of the simulated camera were idealized. In reality, they are influenced by the properties of the lens, beam splitter, and interference filters and by the sensitivity of the charge-coupled device system [2].
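Items 4) and 5) can be illustrated with a minimal sketch of a simulated camera response. An idealized rectangular response function is multiplied by a sampled radiance spectrum, and the product is integrated with the trapezoidal rule. This is not a reproduction of (9). The boxcar shape, the trapezoidal quadrature, the 1-nm grid, the ramp spectrum, and the function names are our assumptions, and noise propagation is omitted.

```python
import numpy as np


def boxcar_response(wavelengths, lo, hi):
    """Idealized rectangular response: 1 on [lo, hi] nm, 0 elsewhere."""
    wl = np.asarray(wavelengths, dtype=float)
    return ((wl >= lo) & (wl <= hi)).astype(float)


def simulated_response(wavelengths, radiance, response):
    """Trapezoidal-rule approximation of the integral of response * radiance."""
    wl = np.asarray(wavelengths, dtype=float)
    f = np.asarray(response, dtype=float) * np.asarray(radiance, dtype=float)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(wl)))


wl = np.arange(400.0, 901.0, 1.0)  # hypothetical 1-nm sampling grid
flat = np.ones_like(wl)
# The sampled [710, 725]-nm boxcar integrates to 16, not 15, because the
# trapezoidal rule adds a half-interval shoulder at each band edge.
print(simulated_response(wl, flat, boxcar_response(wl, 710, 725)))  # 16.0

# On a hypothetical red-edge-like ramp, shifting a 15-nm band by 5 nm
# changes the simulated response.
ramp = np.clip((wl - 680.0) / 80.0, 0.0, 1.0)
r_a = simulated_response(wl, ramp, boxcar_response(wl, 710, 725))
r_b = simulated_response(wl, ramp, boxcar_response(wl, 715, 730))
print(r_b > r_a)  # True
```

The shoulder effect is a discretization artifact, and it becomes relatively smaller on finer grids. The second comparison is a toy illustration of the sensitivity of narrow bands to their exact position noted in the discussion.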
A previous study showed that reflectance spectra can be classified accurately in a 3-D subspace using spectral response functions from linear subspace classifiers [13]. The modeling presented in this paper is more closely related to practical band construction, since it allows the subspace-mapped data to be interpreted directly as simulated measurements from a multispectral camera. Spectral response functions derived from the optimal subspace classifier cannot be physically constructed because of their strongly oscillating behavior (see [13]). It may also be unrealistic to use only certain very narrow wavelength bands in the measurement system.
Nevertheless, when the camera DNs of the fixed four-channel Leica system are used as input vectors for the SVM classifier, classification performance degrades significantly compared with the results for the high-dimensional measurements. The results indicate that more bands, narrower bandwidths, or repositioned bands are needed to improve the classification accuracy. Of the three five-channel extensions of the Leica system evaluated here, the one using the 710–725-nm interval showed promising results, with an average misclassification ratio of 15%.
It was also hypothesized that outliers in the class samples degraded the classification performance in low-dimensional multispectral spaces. For the four-channel Leica system, it was shown that using the Mahalanobis kernel or outlier removal increased the accuracy of the SVM classifier. In addition, the results suggest that the Mahalanobis kernel handled outliers automatically, without any user intervention, and also significantly sped up the training phase of the classifier.
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers for their advice and suggestions concerning this paper.
REFERENCES
[1] S. Abe, “Training of support vector machines with Mahalanobis kernel,” in Proc. ICANN, 2005, pp. 571–576.
[2] U. Beisl, “Absolute spectroradiometric calibration of the ADS40 sensor,” in Proc. Congrès ISPRS Commission Technique I. Symp., Marne-la-Vallée, France, 2006, pp. 14–18.
[3] C.-I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification. New York: Kluwer, 2003.
[4] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, Jun. 2005.
[5] T. H. Demetriades-Shah, M. D. Steven, and J. A. Clark, “High-resolution derivatives spectra in remote sensing,” Remote Sens. Environ., vol. 33, no. 1, pp. 55–64, Jul. 1990.
[6] T. G. Dietterich, “Approximate statistical tests for comparing supervised classification learning algorithms,” Neural Comput., vol. 10, no. 7, pp. 1895–1923, Oct. 1998.
[7] G. M. Foody and A. Mathur, “A relative evaluation of multiclass image classification by support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 6, pp. 1335–1343, Jun. 2004.
[8] G. M. Foody, “Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy,” Photogramm. Eng. Remote Sens., vol. 70, no. 5, pp. 627–633, 2004.
[9] F. A. Gougeon, D. A. Leckie, D. Paradine, and I. Scott, “Individual tree crown species recognition: The Nahmint study,” in Proc. Int. Forum Autom. Interpretation High Spatial Resolution Digital Imagery Forestry, D. A. Hill and D. G. Leckie, Eds., Victoria, BC, Canada, 1998, pp. 209–223.
[10] A. Haara and M. Haarala, “Tree species classification using semi-automatic delineation of trees on aerial images,” Scand. J. For. Res., vol. 17, no. 6, pp. 556–565, Nov. 2002.
[11] C. Huang, L. S. Davis, and J. R. G. Townshend, “An assessment of support vector machines for land cover classification,” Int. J. Remote Sens., vol. 23, no. 4, pp. 725–749, Feb. 2002.
[12] R. D. Jackson, S. M. Moran, P. N. Slater, and S. F. Biggar, “Field calibration of reference reflectance panels,” Remote Sens. Environ., vol. 22, no. 1, pp. 145–158, Jun. 1987.
[13] T. Jääskeläinen, R. Silvennoinen, J. Hiltunen, and J. P. S. Parkkinen, “Classification of the reflectance spectra of pine, spruce, and birch,” Appl. Opt., vol. 33, no. 2, pp. 2356–2362, Apr. 1994.
[14] I. Korpela, “Individual tree measurements by means of digital aerial photogrammetry,” Silva Fennica Monographs, vol. 3, 2004.
[15] I. Korpela and T. Tokola, “Potential of aerial image-based monoscopic and multiview single-tree forest inventory: A simulation approach,” For. Sci., vol. 52, no. 2, pp. 136–147, Apr. 2006.
[16] ADS80 Datasheet, Leica Geosystems AG, Heerbrugg, Switzerland, 2008. [Online]. Available: http://www.leica-geosystems.com
[17] G. Loosli, SimpleSVM: The Matlab Toolbox, 2004. [Online]. Available: http://gaelle.loosli.fr/research/tools/simplesvm.html
[18] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[19] G. Mercier and M. Lennon, “Support vector machines for hyperspectral image classification with spectral-based kernels,” in Proc. IEEE IGARSS, 2003, pp. 288–290.
[20] E. Oja, Subspace Methods of Pattern Recognition. Hertfordshire, U.K.: Res. Studies Press, 1983.
[21] G. M. Phillips and P. J. Taylor, Theory and Applications of Numerical Analysis. New York: Academic, 1973.
[22] A. Pinz, “Tree isolation and species classification,” in Proc. Int. Forum Autom. Interpretation High Spatial Resolution Digital Imagery Forestry, D. A. Hill and D. G. Leckie, Eds., Victoria, BC, Canada, 1998, pp. 127–139.
[23] R. A. Schowengerdt, Remote Sensing: Models and Methods for Image Processing, 3rd ed. Amsterdam, The Netherlands: Elsevier, 2007.
[24] B. Schölkopf and A. J. Smola, Learning With Kernels. Cambridge, MA: MIT Press, 2002.
[25] R. Säynäjoki, P. Packalén, M. Maltamo, M. Vehmas, and K. Eerikäinen, “Detection of aspens using high resolution aerial laser scanning data and digital aerial images,” Sensors, vol. 8, no. 8, pp. 5037–5054, 2008.
[26] F. Tsai and W. D. Philpot, “Derivative-aided hyperspectral image analysis system for land-cover classification,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 2, pp. 416–425, Feb. 2002.
Ville Heikkinen received the M.Sc. degree in ap-
plied mathematics from the University of Joensuu,
Joensuu, Finland, in 2004.
He is currently with the Department of Computer
Science and Statistics, University of Joensuu. He has
worked with method development in spectral data
analysis and classification.
Timo Tokola received the D.Sc. degree in forestry
from the University of Joensuu, Joensuu, Finland.
He has over 20 years of professional experience.
He is currently a Professor of forest information
technology with the Faculty of Forest Sciences, Uni-
versity of Joensuu. He had previously mainly worked
in the fields of natural resource inventory, geographical
information systems (GIS), information system
planning, and forest management planning. He has
been working in various projects as a Coordinator.
His private and public sector assignments include
modern forest management including aerial photography, photogrammetry,
satellite remote sensing, terrestrial and airborne laser scanning, GPS-based
mapping, GIS database design, analysis of GIS data, and implementation of
desktop GIS systems. He has published over 50 scientific refereed papers
on database design, GIS, forest resource inventory, and remote sensing. His
main interests include developing methods for using remote sensing in natural
resource inventory and computer applications for supporting regional decision
making.
Jussi Parkkinen received the M.Sc. degree in med-
ical physics and the Ph.D. degree in mathematics
from the University of Kuopio, Kuopio, Finland, in
1982 and 1989, respectively.
In 1989–1990, he was a Visiting Researcher with The University of Iowa, Iowa City. In 1990, he was a Visiting Professor with the University of Saskatchewan, Saskatoon, SK, Canada. In 1991–1992, he was a Professor and the Head of the Department of Computer Science, University of Kuopio. In 1992–1998, he was a Professor of information processing, and in 1995–1998, he was the Dean of the Department of Information Technology, Lappeenranta University of Technology, Lappeenranta, Finland. Since 1999, he has been a Professor of computer science, and since 2007, he has been the Vice Rector responsible for research with the University of Joensuu, Joensuu, Finland. He specializes in spectral color image analysis and pattern recognition. Since 2007, he has been a Visiting Professor with Chiba University, Chiba, Japan.
Dr. Parkkinen was the Chairman of the Finnish Pattern Recognition Society in 1995–1999. He is a fellow and a member of the governing board of the International Association of Pattern Recognition. He was the Chairman of the CIE TC8-07 technical committee on multispectral imaging in 2004–2008.
Ilkka Korpela received the Ph.D. degree in forestry
from the University of Helsinki, Helsinki, Finland,
in 2004.
He is currently a Researcher with the Depart-
ment of Forest Resource Management, University of
Helsinki. He has worked with method development
in 3-D measurement and classification of forest vegetation
using terrestrial and airborne image and light
detection and ranging (LiDAR) data.
Timo Jääskeläinen was born in Luumäki, Finland, in 1953. He received the Ph.D. degree in physics from the University of Joensuu, Joensuu, Finland, in 1981.
From 1981 to 1991, he was a Chief Assistant and Acting Associate Professor with the University of Kuopio, Kuopio, Finland. From 1987 to 1989, he was a Visiting Research Scientist with Saitama University, Saitama, Japan. In 1991, he was an Associate Professor with the Department of Physics, University of Joensuu. Since 1994, he has been a Professor with the same department, of which he is also the Head. He has published approximately 100 articles. His research interests include optical materials research, optical metrology, color research, and information optics.
Prof. Jääskeläinen is a member of the Optical Society of America.