Abstract—This paper addresses the problem of automatically tuning multiple kernel parameters for the kernel-based linear discriminant analysis (LDA) method. The kernel approach has been proposed to solve face recognition problems under complex distribution by mapping the input space to a high-dimensional feature space. Recognition algorithms such as kernel principal components analysis, kernel Fisher discriminant, generalized discriminant analysis, and kernel direct LDA have been developed in the last five years, and experimental results show that the kernel-based method is a good and feasible approach to tackle pose and illumination variations. One of the crucial factors in the kernel approach is the selection of kernel parameters, which highly affects the generalization capability and stability of kernel-based learning methods. In view of this, we propose an eigenvalue-stability-bounded margin maximization (ESBMM) algorithm to automatically tune the multiple parameters of the Gaussian radial basis function kernel for the kernel subspace LDA (KSLDA) method, which is developed based on our previously developed subspace LDA method. The ESBMM algorithm improves the generalization capability of the kernel-based LDA method by maximizing the margin maximization criterion while maintaining the eigenvalue stability of the kernel-based LDA method. An in-depth investigation of the generalization performance along the pose and illumination dimensions is performed using the YaleB and CMU PIE databases. The FERET database is also used for benchmark evaluation. Compared with the existing PCA-based and LDA-based methods, our proposed KSLDA method, with the ESBMM kernel parameter estimation algorithm, gives superior performance.

Index Terms—Gaussian radial basis function (RBF) kernel, generalization capability, kernel Fisher discriminant (KFD), kernel parameter, model selection.

Manuscript received May 22, 2006; revised October 17, 2006. This project was supported in part by Earmarked Research Grant HKBU-2113/06E of the Research Grants Council, by the Science Faculty Research Grant of Hong Kong Baptist University, by the Sun Yat-Sen University Science Foundation, by the National Science Foundation of Guangdong under Contract 06105776, by the National Science Foundation of China (NSFC) under Contract 60373082, and by the 973 Program under Contract 2006CB303104. This paper was recommended by Associate Editor J. Su. J. Huang is with the Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong, and also with the Department of Computer Science, Guangdong Province Key Laboratory of Information Security, School of Information Science and Technology, Sun Yat-Sen (Zhongshan) University, Guangzhou 510275, China (e-mail: jhuang@comp.hkbu.edu.hk; hjian@mail.sysu.edu.cn). P. C. Yuen is with the Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong (e-mail: pcyuen@comp.hkbu.edu.hk). W.-S. Chen is with the Institute of Intelligent Computing Science, College of Mathematics and Computational Science, Shenzhen University, Shenzhen 518060, China (e-mail: chenws@szu.edu.cn). J. H. Lai is with the Department of Electronics and Communication Engineering, School of Information Science and Technology, Sun Yat-Sen (Zhongshan) University, Guangzhou 510275, China (e-mail: stsljh@mail.sysu.edu.cn). Digital Object Identifier 10.1109/TSMCB.2007.895328

I. INTRODUCTION

Face recognition research started in the late 1970s and has become one of the most active and exciting research areas in computer science and information technology since 1990 [1], [2]. Many face recognition algorithms/systems have been developed during the last decade. Among the various approaches, the appearance-based method, in general, gives a promising result. Principal components analysis (PCA) and linear discriminant analysis (LDA) are the two most popular methods in appearance-based approaches for face recognition. Their superior performances have been reported in the literature during the last decade [3]–[6]. Moreover, in general, LDA-based methods perform better than PCA-based methods. This is because LDA-based methods aim to find projections with the most discriminant information, whereas PCA-based methods find projections with minimal reconstruction errors. The first well-known LDA-based face recognition algorithm, called FisherFace, was developed in 1997. After that, a number of LDA-based face recognition algorithms/systems have been developed. Generally speaking, the LDA-based face recognition algorithm gives a satisfactory result under controlled lighting conditions and small face image variations such as facial expressions and small occlusion. However, the performance is not satisfactory under large variations such as pose and illumination variations. To handle such complicated image variations, a kernel trick has been employed. The basic idea is to nonlinearly map the input data from the input space to a higher dimensional feature space and then perform LDA in the feature space. By performing this nonlinear mapping, we hope that the complex distribution becomes linearly separable in the feature space.

Following the success of applying the kernel trick in support vector machines (SVMs), many kernel-based PCA and LDA methods have been developed and applied in pattern recognition tasks. In 1996, a nonlinear form of PCA, namely, kernel PCA (KPCA), was proposed by Schölkopf et al. [7]. In 1999, Mika et al. proposed the kernel Fisher discriminant (KFD) [8] method by introducing the kernel trick on the Fisher discriminant. It provides a classification framework for a two-class problem. In 2000, Baudat and Anouar [9] proposed the generalized discriminant analysis (GDA) method by extending the KFD method to multiple classes. Their method assumes that the kernel matrix K is nonsingular with the applications on low-dimensional patterns. Experimental results on Iris and
$$S_b = \frac{1}{N}\sum_{i=1}^{C} N_i\,(m_i - m)(m_i - m)^T \qquad (2)$$
different because of the different objective functions. Therefore, directly applying the methods in SVM to the kernel-based LDA method is not feasible.

In SVM, the typical approach for tuning a kernel parameter is to use the leave-one-out (LOO) estimator or the cross-validation estimator. However, this is time consuming. Currently, a more computationally efficient strategy is to construct an upper bound or approximate the true generalization error without an LOO procedure.

In 1999, Jaakkola and Haussler presented a cross-validation theoretical bound [29]; no experimental results are provided in their article. In 2002, Chapelle et al. [30] employed a gradient-descent-based method to minimize the LOO estimation of the generalization error of SVM over the parameters, where each optimized parameter is selected individually. Their results show that an accurate estimate of the error is not required, and a simple estimate performs well; thus, the LOO or cross-validation procedure can be avoided. Mika [31] presented two techniques, stability bounds and algorithmic luckiness, and mentioned that these could be used as a basis for deriving the generalization error for the kernel-based LDA method, but no details are given in his Ph.D. dissertation. Schittkowski [32] optimized the kernel parameters for SVM using a two-level approach, splitting the training data into two sets: one set for formulating the SVM and the other for minimizing the generalization error. Lee and Lin [33] studied the relation between the LOO rate and the stopping criteria for SVM from the numerical analysis point of view and proposed loose stopping criteria for the decomposition method in SVM. In 2003, Keerthi and Lin [34] analyzed the behavior of SVM when the kernel parameter takes a very small or a very large value and developed an efficient heuristic method of searching for kernel parameters with small generalization errors. Zhang et al. [35] recently proposed to determine the kernel parameters by optimizing an objective function for a kernel minimum distance classifier. Moreover, some research articles have also discussed and proposed methods for measuring the similarity between two kernel functions [36], [37].

III. OUR PROPOSED METHOD

This section consists of two parts. The first part introduces a newly developed kernel-based LDA method, i.e., the KSLDA method. The KSLDA method is derived by applying the kernel trick to our previously developed subspace LDA method [13]. The ESBMM algorithm is proposed in the second part. The ESBMM algorithm automatically chooses multiple parameters of the Gaussian RBF kernel function for the KSLDA method. With the use of the ESBMM algorithm, the proposed KSLDA face recognition method can further improve the generalization capability. We would like to point out that the ESBMM algorithm is generic and, with minor modifications, can be applied to all kernel-based LDA algorithms. Details of each part are discussed in the following.

A. KSLDA Algorithm

The KSLDA method is developed based on the subspace LDA method [13], which solves the S3 problem and has been demonstrated to give a better performance than existing LDA-based methods such as FisherFace [5], direct LDA [6], and Chen et al. LDA [17]. An earlier version of the KSLDA method has been reported in [38]. The basic idea of our proposed KSLDA method is to apply the subspace LDA method in the kernel feature space, and it consists of two steps, namely, nonlinear mapping from the input space to the feature space and applying the subspace LDA method in the feature space. Details of each step are discussed in the following.

1) Nonlinear Mapping From the Input Space to the Feature Space: Let $\phi : \mathbb{R}^d \to F \subseteq \mathbb{R}^{d_f}$, $x \mapsto \phi(x)$, be a nonlinear mapping from the input space $\mathbb{R}^d$ to a high-dimensional feature space $F$. Considering a $C$-class problem in the feature space, the set of all classes is represented by $\mathcal{C} = \{C_1, C_2, \ldots, C_C\}$, and the $j$th class $C_j$ contains $N_j$ samples, $j = 1, 2, \ldots, C$. Let $N$ be the total number of training samples, i.e., $N = \sum_{j=1}^{C} N_j$. Let $x^i_j$ be the $j$th sample in class $i$, and let $m_i$ and $m$ be the mean of the $i$th class and of all samples, respectively. The within-class scatter matrix $S_w$, between-class scatter matrix $S_b$, and total scatter matrix $S_t$ can be formulated in the feature space $F$ as follows:

$$S_w = \frac{1}{N}\sum_{i=1}^{C}\sum_{j=1}^{N_i}\left(\phi(x^i_j)-m_i\right)\left(\phi(x^i_j)-m_i\right)^T = \sum_{i=1}^{C}\sum_{j=1}^{N_i}\left[\frac{1}{\sqrt{N}}\left(\phi(x^i_j)-m_i\right)\right]\left[\frac{1}{\sqrt{N}}\left(\phi(x^i_j)-m_i\right)\right]^T = \Phi_w\Phi_w^T \qquad (4)$$

where $\Phi_w$ is a $d_f \times N$ matrix defined as

$$\Phi_w = \left[\tilde{\Phi}_w^{i,j}\right]_{i=1,\ldots,C,\; j=1,\ldots,N_i} = \left[\tilde{\Phi}_w^{1,1}, \tilde{\Phi}_w^{1,2}, \ldots, \tilde{\Phi}_w^{1,N_1}, \tilde{\Phi}_w^{2,1}, \tilde{\Phi}_w^{2,2}, \ldots, \tilde{\Phi}_w^{2,N_2}, \ldots, \tilde{\Phi}_w^{C,1}, \tilde{\Phi}_w^{C,2}, \ldots, \tilde{\Phi}_w^{C,N_C}\right]_{d_f\times N} \qquad (5)$$

with $\tilde{\Phi}_w^{i,j} = (1/\sqrt{N})(\phi(x^i_j)-m_i)$ being a $d_f \times 1$ column vector. Then, we have

$$S_b = \frac{1}{N}\sum_{i=1}^{C} N_i (m_i - m)(m_i - m)^T = \sum_{i=1}^{C}\left[\sqrt{\frac{N_i}{N}}(m_i - m)\right]\left[\sqrt{\frac{N_i}{N}}(m_i - m)\right]^T = \Phi_b\Phi_b^T \qquad (6)$$

where $\Phi_b$ is a $d_f \times C$ matrix defined as

$$\Phi_b = \left[\tilde{\Phi}_b^{i}\right]_{i=1,\ldots,C} = \left[\tilde{\Phi}_b^{1}, \tilde{\Phi}_b^{2}, \ldots, \tilde{\Phi}_b^{C}\right]_{d_f\times C} \qquad (7)$$

with $\tilde{\Phi}_b^{i} = (N_i/N)^{1/2}(m_i - m)$ being a $d_f \times 1$ column vector. Finally, we have

$$S_t = \frac{1}{N}\sum_{i=1}^{C}\sum_{j=1}^{N_i}\left(\phi(x^i_j)-m\right)\left(\phi(x^i_j)-m\right)^T = \sum_{i=1}^{C}\sum_{j=1}^{N_i}\left[\frac{1}{\sqrt{N}}\left(\phi(x^i_j)-m\right)\right]\left[\frac{1}{\sqrt{N}}\left(\phi(x^i_j)-m\right)\right]^T = \Phi_t\Phi_t^T \qquad (8)$$
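As a quick sanity check of the factorizations in (4)–(8), the following sketch builds $\Phi_w$ and $\Phi_b$ for a small synthetic data set, using an explicit (toy) quadratic feature map in place of the implicit kernel-induced map $\phi$. The map, the data, and all variable names are illustrative assumptions, not part of the paper's method.

```python
import numpy as np

# Toy explicit feature map standing in for the (normally implicit) kernel map phi.
def phi(x):
    # quadratic features: [x1, x2, x1^2, x2^2, x1*x2] -- illustrative only
    return np.array([x[0], x[1], x[0] ** 2, x[1] ** 2, x[0] * x[1]])

rng = np.random.default_rng(0)
C, Ni = 3, 4                                   # 3 classes, 4 samples each
X = [rng.normal(loc=i, size=(Ni, 2)) for i in range(C)]
mapped = [np.array([phi(x) for x in Xi]) for Xi in X]   # mapped samples per class

N = C * Ni
m_i = [Mi.mean(axis=0) for Mi in mapped]       # class means in feature space
m = np.vstack(mapped).mean(axis=0)             # global mean in feature space

# Columns of Phi_w and Phi_b as in (5) and (7)
Phi_w = np.column_stack([(f - m_i[i]) / np.sqrt(N) for i in range(C) for f in mapped[i]])
Phi_b = np.column_stack([np.sqrt(len(mapped[i]) / N) * (m_i[i] - m) for i in range(C)])

# Direct definitions of S_w and S_b as in (4) and (6)
S_w = sum(np.outer(f - m_i[i], f - m_i[i]) for i in range(C) for f in mapped[i]) / N
S_b = sum(len(mapped[i]) * np.outer(m_i[i] - m, m_i[i] - m) for i in range(C)) / N

assert np.allclose(S_w, Phi_w @ Phi_w.T)       # S_w = Phi_w Phi_w^T
assert np.allclose(S_b, Phi_b @ Phi_b.T)       # S_b = Phi_b Phi_b^T
```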
where $\Phi_t$ is a $d_f \times N$ matrix defined as

$$\Phi_t = \left[\tilde{\Phi}_t^{i,j}\right]_{i=1,\ldots,C,\; j=1,\ldots,N_i} = \left[\tilde{\Phi}_t^{1,1}, \tilde{\Phi}_t^{1,2}, \ldots, \tilde{\Phi}_t^{1,N_1}, \tilde{\Phi}_t^{2,1}, \tilde{\Phi}_t^{2,2}, \ldots, \tilde{\Phi}_t^{2,N_2}, \ldots, \tilde{\Phi}_t^{C,1}, \tilde{\Phi}_t^{C,2}, \ldots, \tilde{\Phi}_t^{C,N_C}\right]_{d_f\times N} \qquad (9)$$

with $\tilde{\Phi}_t^{i,j} = (1/\sqrt{N})(\phi(x^i_j)-m)$ being a $d_f \times 1$ column vector. Here, we also define

$$m_i = \frac{1}{N_i}\sum_{j=1}^{N_i}\phi(x^i_j), \qquad m = \frac{1}{N}\sum_{i=1}^{C}\sum_{j=1}^{N_i}\phi(x^i_j).$$

For classes $C_i$ and $C_l$, an $N_i \times N_l$ dot-product matrix $K^i_l$ is defined as

$$K^i_l = \left[K^{ij}_{lk}\right]_{j=1,\ldots,N_i,\; k=1,\ldots,N_l} \qquad (10)$$

where $K^{ij}_{lk} = k(x^i_j, x^l_k) = \phi(x^i_j)\cdot\phi(x^l_k)$. Then, for all $C$ classes, we can define an $N \times N$ kernel matrix $K$ as

$$K = \left[K^{ij}_{lk}\right]_{i=1,\ldots,C,\; j=1,\ldots,N_i;\; l=1,\ldots,C,\; k=1,\ldots,N_l}. \qquad (11)$$

Here $\Lambda_{N_i}$ is an $N_i \times N_i$ matrix with all terms equal to $1/N_i$. [See Appendix III for the derivation of (12).]

By singular value decomposition, there exist orthonormal matrices $U, V \in \mathbb{R}^{N\times N}$ and a diagonal matrix $\Lambda = \mathrm{diag}(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0) \in \mathbb{R}^{N\times N}$ ($\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$) such that $S_{wt} = U\Lambda V^T$. Since $S_w = \Phi_w\Phi_w^T$, by Theorem 1, we get

$$(\Phi_t V)^T S_w (\Phi_t V) = \mathrm{diag}\left(\sigma_1^2, \ldots, \sigma_r^2, 0, \ldots, 0\right)_{N\times N}.$$

Let $V' = [v_{r+1}, v_{r+2}, \ldots, v_N]$. Then, the sub-null-space of $S_w$, $Y$, is given by $Y = \Phi_t V'$. It can be seen that $Y$ satisfies $Y^T S_w Y = 0_{(N-r)\times(N-r)}$.

b) Discarding the null-space of $S_b$: After determining the sub-null-space of $S_w$, the projection is then determined outside the null-space of $S_b$. Thus, the second step is to discard the null-space of $S_b$ to ensure that the numerator of the Fisher index will not be zero. Define $\hat{S}_b = Y^T S_b Y$, and then $\hat{S}_b = (Y^T\Phi_b)(Y^T\Phi_b)^T$. Let $Z = Y^T\Phi_b = (V')^T\Phi_t^T\Phi_b$, which is a matrix of $(N-r)\times C$ dimensions. Utilizing the kernel function, matrix $Z$ can be expressed in terms of the kernel matrix $K$ as follows:

$$Z = (V')^T\cdot\left(K\cdot\Lambda_{NC} - K\cdot I_{NC} - 1_{NN}\cdot K\cdot\Delta_{NC} + 1_{NN}\cdot K\cdot\Gamma_{NC}\right).$$
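The KSLDA derivation only ever touches the training data through the kernel matrix $K$ of (10) and (11). As a minimal sketch, the snippet below assembles $K$ for a grouped Gaussian RBF kernel of the form that appears later in (27), where the input dimensions are split into groups and each group $u$ has its own width $\theta_u$. The grouping, the data, and the helper names are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def rbf_multi(x, y, groups, theta):
    """Multi-parameter Gaussian RBF kernel:
    k(x, y) = exp(-sum_u ||x_u - y_u||^2 / (2 * theta_u^2)),
    where x_u denotes the coordinates of x belonging to group u."""
    d2 = np.array([np.sum((x[g] - y[g]) ** 2) for g in groups])
    return np.exp(-np.sum(d2 / (2.0 * theta ** 2)))

def kernel_matrix(X, groups, theta):
    """N x N kernel matrix K = [k(x_a, x_b)] over all training samples."""
    N = X.shape[0]
    K = np.empty((N, N))
    for a in range(N):
        for b in range(N):
            K[a, b] = rbf_multi(X[a], X[b], groups, theta)
    return K

# Illustrative usage: 6 samples of dimension 4, two feature groups, two widths.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
groups = [np.array([0, 1]), np.array([2, 3])]   # assumed partition of the dimensions
theta = np.array([1.0, 2.0])                    # one Gaussian width per group
K = kernel_matrix(X, groups, theta)
print(K.shape, np.allclose(K, K.T))             # (6, 6) True
```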
The stability is defined as follows [31].

Definition (Stability): Let $X_1, \ldots, X_M$ be independent random variables taking values in a set $Z$, and let $f : Z^M \to \mathbb{R}$ be

$$\left|\sum_{j=1}^{N}\left[\lambda_j\!\left(S_b^{(X\setminus x_{p_0})\cup x_{p_1}} - S_w^{(X\setminus x_{p_0})\cup x_{p_1}}\right) - \lambda_j(S_b - S_w)\right]\right| \le \frac{8nR^2}{(C-1)(n-1)}. \qquad (21)$$
The proof is given in Appendix V. From Theorem 5, we can see that the term $8nR^2/((C-1)(n-1))$ measures the change of the eigenvalues after replacing one training sample by another from outside the training set. If this term is small, it means that, under distribution $P$, the eigenvalues of the covariance matrix are less sensitive to the training data set. In other words, the kernel-based learning algorithm is less sensitive to the training data set and, in turn, generalizes well from the training data set to the testing data set. From the earlier discussion, we know that $R$, the minimum radius of the hypersphere containing all the training data, is crucial to the generalization capability of the KSLDA algorithm. In maximizing the objective function, the separability between different classes will increase; in turn, the radius $R$ will also increase. This means that the eigenvalue stability and the generalization capability of the learning algorithm decrease. Therefore, the objective of the ESBMM algorithm is to choose kernel parameters that maximize the objective function while guaranteeing a good generalization capability by not exceeding the threshold $R^0$.

2) Objective Function Maximization: We propose to adopt the maximum margin criterion [39] as the objective function

$$F(W, \theta) = \mathrm{tr}\left(W^T S_b W - W^T S_w W\right) \qquad (22)$$

where $W$ is the projection matrix and $\theta = (\theta_1, \ldots, \theta_d)$ are the kernel parameters used in the kernel function. As there is no need to calculate the inverse of $S_w$, the objective function is efficient and stable. The basic idea is to determine the proper kernel parameters by maximizing the objective function $F$ with respect to the parameter set $\theta$. Our method is gradient based, and an iterative procedure is required to update the kernel parameters along the direction that maximizes the objective function. Based on the derivation of the KSLDA method, the projection matrix $W$ is defined as $W = \Phi_t V'' A$ in (14). Denote $B = V'' A$; $B$ is determined by our proposed KSLDA method. Then, the objective function $F$ can be reformulated as follows:

$$F = \mathrm{tr}\left(W^T S_b W - W^T S_w W\right) = \mathrm{tr}\left(B^T\Phi_t^T\Phi_b\Phi_b^T\Phi_t B - B^T\Phi_t^T\Phi_w\Phi_w^T\Phi_t B\right) = \mathrm{tr}\left(B^T P P^T B - B^T Q Q^T B\right) \qquad (23)$$

where $P = \Phi_t^T\Phi_b = [P^{i,j}_l]_{N\times C}$ is an $N \times C$ matrix and $Q = \Phi_t^T\Phi_w = [Q^{i,j}_{l,k}]_{N\times N}$ is an $N \times N$ matrix. Each element in matrix $P$ is an explicit function of the kernel function. From (13) and (12), matrices $P$ and $Q$ can be expressed in terms of the kernel matrix $K$ as follows:

$$P = K\cdot\Lambda_{NC} - K\cdot I_{NC} - 1_{NN}\cdot K\cdot\Delta_{NC} + 1_{NN}\cdot K\cdot\Gamma_{NC} \qquad (24)$$

$$Q = \frac{1}{N}\left(K - K\cdot I_{NN} - \Lambda_{NN}\cdot K + \Lambda_{NN}\cdot K\cdot I_{NN}\right)^T. \qquad (25)$$

Hence, the objective function $F$ can be described as an explicit function of the kernel parameters. When initial values $\theta^0$ of the kernel parameters are given, to maximize the objective function, we need to find the search direction along which to update the kernel parameters. To get the search direction, we need to compute the gradient of $F$ with respect to the kernel parameters $\theta$. In the following, we give the formulation for the gradient computation.

To compute the gradient of $F$ with respect to $\theta$, we partially differentiate the two matrices $P$ and $Q$ with respect to $\theta$. In order to clearly show the formulation, we first formulate a general term $(\partial/\partial\theta_s)K(X^i_j, X^l_k)$. Let $N_\theta$ be the number of kernel parameters to be used and $N_u$, $u \le N_\theta$, be the corresponding feature dimensions. First, we denote the partial derivative of the kernel matrix $K$ with respect to $\theta_s$ as $\partial K_{\theta_s}$, which is an $N \times N$ matrix expressed as follows:

$$\partial K_{\theta_s} = \left[\frac{\partial K^{i,j}_{l,k}}{\partial\theta_s}\right]_{i=1,\ldots,C,\; j=1,\ldots,N_i;\; l=1,\ldots,C,\; k=1,\ldots,N_l}. \qquad (26)$$

Each element in matrix $\partial K_{\theta_s}$ is the derivative of the corresponding element of the kernel matrix $K$ with respect to $\theta_s$ and is formulated as

$$\frac{\partial K^{i,j}_{l,k}}{\partial\theta_s} \equiv \frac{\partial K(X^i_j, X^l_k)}{\partial\theta_s} = \frac{\partial}{\partial\theta_s}\exp\left(-\sum_{u=1}^{N_\theta}\frac{\sum_{t=1}^{N_u}\left(x^{i,u}_{j,t}-x^{l,u}_{k,t}\right)^2}{2\theta_u^2}\right) = \frac{\sum_{t=1}^{N_s}\left(x^{i,s}_{j,t}-x^{l,s}_{k,t}\right)^2}{\theta_s^3}\exp\left(-\sum_{u=1}^{N_\theta}\frac{\sum_{t=1}^{N_u}\left(x^{i,u}_{j,t}-x^{l,u}_{k,t}\right)^2}{2\theta_u^2}\right). \qquad (27)$$

Then, we derive $\partial P/\partial\theta_s$ as follows:

$$\frac{\partial P}{\partial\theta_s} = \frac{\partial}{\partial\theta_s}\left(K\cdot\Lambda_{NC} - K\cdot I_{NC} - 1_{NN}\cdot K\cdot\Delta_{NC} + 1_{NN}\cdot K\cdot\Gamma_{NC}\right) = \partial K_{\theta_s}\cdot\Lambda_{NC} - \partial K_{\theta_s}\cdot I_{NC} - 1_{NN}\cdot\partial K_{\theta_s}\cdot\Delta_{NC} + 1_{NN}\cdot\partial K_{\theta_s}\cdot\Gamma_{NC} \qquad (28)$$

where $\partial P/\partial\theta_s$ is an $N \times C$ matrix.
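To make (26) and (27) concrete, here is a small sketch of the kernel-derivative matrix $\partial K_{\theta_s}$ for the grouped Gaussian RBF kernel, together with a finite-difference check of the analytic derivative. The feature grouping, data, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def k_eval(x, y, groups, theta):
    # grouped Gaussian RBF kernel of (27); the grouping is an assumed illustration
    d2 = np.array([np.sum((x[g] - y[g]) ** 2) for g in groups])
    return np.exp(-np.sum(d2 / (2.0 * theta ** 2))), d2

def dK_dtheta(X, groups, theta, s):
    """N x N matrix of partial derivatives dK/dtheta_s as in (26)-(27):
    dK[a, b] = (||x_a^(s) - x_b^(s)||^2 / theta_s^3) * k(x_a, x_b)."""
    N = X.shape[0]
    dK = np.empty((N, N))
    for a in range(N):
        for b in range(N):
            k_ab, d2 = k_eval(X[a], X[b], groups, theta)
            dK[a, b] = d2[s] / theta[s] ** 3 * k_ab
    return dK

# Finite-difference check of the analytic derivative on illustrative data.
rng = np.random.default_rng(2)
X = rng.normal(size=(5, 4))
groups = [np.array([0, 1]), np.array([2, 3])]
theta = np.array([1.0, 2.0])
eps = 1e-6
K0 = np.array([[k_eval(a, b, groups, theta)[0] for b in X] for a in X])
theta_p = theta.copy(); theta_p[0] += eps
K1 = np.array([[k_eval(a, b, groups, theta_p)[0] for b in X] for a in X])
print(np.allclose((K1 - K0) / eps, dK_dtheta(X, groups, theta, s=0), atol=1e-4))
```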
$$\frac{\partial F}{\partial\theta_s} = \mathrm{tr}\left(B^T\cdot\frac{\partial P}{\partial\theta_s}\cdot P^T\cdot B + B^T\cdot P\cdot\frac{\partial P^T}{\partial\theta_s}\cdot B - B^T\cdot\frac{\partial Q}{\partial\theta_s}\cdot Q^T\cdot B - B^T\cdot Q\cdot\frac{\partial Q^T}{\partial\theta_s}\cdot B\right). \qquad (30)$$

Now, the gradient of $F$ with respect to $\theta$ is formulated in terms of matrix $K$ and matrix $\partial K_{\theta_s}$. The kernel parameters are then updated iteratively along the gradient direction

$$\theta_s \leftarrow \theta_s + \rho\,\frac{\partial F}{\partial\theta_s} \qquad (31)$$

and the step size $\rho$ is set to $\theta^0/100$ in our experiments reported in Section IV. We can use the objective function $F$ as a model selection criterion and will make use of this to optimize the kernel parameters in the next section.

IV. EXPERIMENTAL RESULTS

In this section, the FERET [42] database is selected to evaluate the performance of our KSLDA method with the ESBMM algorithm. After that, an in-depth investigation of the generalization performance of our proposed method and other existing methods along the pose and illumination dimensions is performed using the YaleB and CMU PIE databases.
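Before turning to the experiments, the following sketch shows the overall shape of a gradient-ascent tuning loop in the spirit of (22), (30), and (31) from the preceding section: evaluate the margin objective $F$, move each $\theta_s$ along its gradient with step size $\rho = \theta^0/100$, and stop after a fixed number of iterations. The objective and gradient callables, the stopping rule, and all names are placeholders for illustration; this is not the authors' implementation.

```python
import numpy as np

def tune_kernel_parameters(theta0, objective, gradient, n_iter=50):
    """Gradient-ascent tuning of the RBF widths theta, as sketched in (31).

    objective(theta) -> scalar F(theta)   (margin criterion, cf. (22))
    gradient(theta)  -> array dF/dtheta   (cf. (30))
    Both callables are assumed to wrap the KSLDA quantities; placeholders here.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    rho = theta / 100.0                     # step size rho = theta^0 / 100
    history = [objective(theta)]
    for _ in range(n_iter):
        theta = theta + rho * gradient(theta)
        history.append(objective(theta))
    return theta, history

# Illustrative usage with a toy concave objective standing in for F(theta).
target = np.array([1.5, 0.7])
toy_F = lambda th: -np.sum((th - target) ** 2)
toy_dF = lambda th: -2.0 * (th - target)
theta_opt, hist = tune_kernel_parameters(np.array([1.0, 1.0]), toy_F, toy_dF)
print(np.round(theta_opt, 3), hist[0] < hist[-1])   # moves toward the maximizer
```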
TABLE I: PERFORMANCE USING A SINGLE PARAMETER

TABLE II: PERFORMANCE COMPARISON USING SINGLE AND MULTIPLE PARAMETERS

TABLE III: FIXED POSE VARIATION AND EVALUATION OF THE GENERALIZATION PERFORMANCE ALONG THE ILLUMINATION DIMENSION; ALL IMAGES OF SET 1 FOR TRAINING AND THOSE OF SETS 2–4 FOR TESTING

TABLE IV: FIXED POSE VARIATION AND EVALUATION OF THE GENERALIZATION PERFORMANCE ALONG THE ILLUMINATION DIMENSION; RANDOMLY SELECTED TWO IMAGES FROM EACH SUBSET FOR TRAINING AND THE OTHERS FOR TESTING

TABLE V: FIXED ILLUMINATION AND EVALUATION OF THE GENERALIZATION PERFORMANCE OF THE KSLDA METHOD ALONG THE POSE DIMENSION; ALL IMAGES OF POSE 1 FOR TRAINING AND THOSE OF POSES 2–45 FOR TESTING
TABLE VI: PERFORMANCE WITH BOTH THE POSE AND ILLUMINATION VARIATIONS

degrade dramatically, except for the subspace LDA method. When the illumination directions among the testing samples deviate further from those of the training samples, e.g., subset 4, both linear and nonlinear methods cannot give satisfactory results. The results in Table III show the following.

1) The generalization performance of all linear and nonlinear methods will degrade when the illumination directions among the training images are significantly different from those in the testing images. This implies that both linear and nonlinear methods cannot give satisfactory results when there are no representative samples for training.
2) Generally speaking, nonlinear methods outperform linear methods.
3) LDA-based methods give a relatively better performance compared with PCA-based methods.
4) The proposed method always gives the best (or among the best) performance in all cases.

The above experiment shows that both linear and nonlinear methods require representative samples for training. Therefore, in the next experiment, for each pose, we randomly select two images from each subset for training (2 images × 4 subsets = 8 images for training per individual), and all the other images from the four subsets (45 − 8 = 37 images) are selected for testing. For each pose, the experiment is repeated ten times, and the average rank-1 to rank-3 accuracies are recorded. The average results over all poses are listed in Table IV. It can be seen in Table IV that both the kernel-based and linear-based LDA methods give very good performance with more representative samples for training.

2) Fixed Illumination With Pose Variations: Now, we fix the illumination variations and change the poses. Images of nine poses from illumination condition 1 are used for training, whereas testing is performed on the remaining 44 illumination conditions. As the page width is limited, the results under these 44 illumination conditions are grouped according to four subsets, as shown in Table V. As images under illumination condition 1 are used for training, the result of subset 1 is the average accuracy over illumination conditions 2 to 7. The results of subsets 2–4 are the average accuracies under the illumination conditions of each subset, respectively. It can be seen that the performance of each method degrades as the illumination condition becomes extremely different from illumination condition 1. In all cases, the proposed KSLDA method gives a better performance than the other methods, regardless of the illumination subset. Furthermore, in general, kernel-based LDA methods have a relatively better performance than linear-based LDA methods, except that the subspace LDA method performs as well as the kernel-based LDA methods. Results in Table V show that kernel-based LDA methods have a better generalization performance along the pose dimension than do linear-based LDA methods.

3) Both Pose and Illumination Variations: Finally, we would like to perform experiments on both pose and illumination variations and to evaluate the generalization performance of all methods along both the pose and illumination dimensions. The experimental settings are described as follows. For each pose, we select two images from each illumination subset (out of four subsets). That is to say, we randomly select 720 images (10 persons × 9 poses × 4 subsets × 2 images) for training, and the remaining 3330 images (10 persons × 9 poses × 37 images) are used for testing. The experiments are repeated ten times, and the average rank-1 to rank-3 accuracies are recorded and shown in Table VI. It can be seen in Table VI that when both the pose and illumination variations are considered, the advantage of kernel-based LDA methods becomes obvious compared with the previous cases. All kernel-based LDA methods give a better performance than do linear-based LDA methods.
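The evaluations above all report rank-1 to rank-3 accuracies averaged over ten random training/testing splits. As a small illustration of that protocol (not the authors' code), the sketch below scores a nearest-neighbour classifier in a projected feature space and reports rank-k accuracy; the projection, distance measure, and data are placeholder assumptions.

```python
import numpy as np

def rank_k_accuracy(train_feats, train_labels, test_feats, test_labels, k=3):
    """Fraction of test samples whose true class appears among the k closest
    training classes (by Euclidean distance in the projected space)."""
    hits = 0
    for f, y in zip(test_feats, test_labels):
        order = np.argsort(np.linalg.norm(train_feats - f, axis=1))
        top_classes = []
        for idx in order:                      # first k *distinct* class labels
            c = train_labels[idx]
            if c not in top_classes:
                top_classes.append(c)
            if len(top_classes) == k:
                break
        hits += int(y in top_classes)
    return hits / len(test_labels)

# Illustrative usage with random "projected" features for 10 subjects.
rng = np.random.default_rng(3)
centers = rng.normal(scale=5.0, size=(10, 20))
train = np.vstack([c + rng.normal(size=(8, 20)) for c in centers])
test = np.vstack([c + rng.normal(size=(37, 20)) for c in centers])
train_y = np.repeat(np.arange(10), 8)
test_y = np.repeat(np.arange(10), 37)
for k in (1, 2, 3):
    print(k, rank_k_accuracy(train, train_y, test, test_y, k=k))
```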
By formulating the dot product in terms of the kernel function, we get

$$\left(\tilde{\Phi}_w^{i,j}\right)^T\tilde{\Phi}_t^{l,k} = \left[\frac{1}{\sqrt N}\left(\phi(x^i_j)-m_i\right)\right]^T\left[\frac{1}{\sqrt N}\left(\phi(x^l_k)-m\right)\right] = \frac{1}{N}\left[\phi(x^i_j)^T\phi(x^l_k) - \phi(x^i_j)^T m - m_i^T\phi(x^l_k) + m_i^T m\right]$$
$$= \frac{1}{N}\left[k(x^i_j,x^l_k) - \frac{1}{N}\sum_{p=1}^{C}\sum_{q=1}^{N_p}k(x^i_j,x^p_q) - \frac{1}{N_i}\sum_{s=1}^{N_i}k(x^i_s,x^l_k) + \frac{1}{N N_i}\sum_{s=1}^{N_i}\sum_{p=1}^{C}\sum_{q=1}^{N_p}k(x^i_s,x^p_q)\right]$$
$$= \frac{1}{N}\left[K^{ij}_{lk} - \frac{1}{N}\sum_{p=1}^{C}\sum_{q=1}^{N_p}K^{ij}_{pq} - \frac{1}{N_i}\sum_{s=1}^{N_i}K^{is}_{lk} + \frac{1}{N N_i}\sum_{s=1}^{N_i}\sum_{p=1}^{C}\sum_{q=1}^{N_p}K^{is}_{pq}\right]. \qquad (32)$$

Then, we get

$$S_{wt} = \Phi_w^T\Phi_t = \left[\left(\tilde{\Phi}_w^{i,j}\right)^T\tilde{\Phi}_t^{l,k}\right] = \frac{1}{N}\left(K - K\cdot I_{NN} - \Lambda_{NN}\cdot K + \Lambda_{NN}\cdot K\cdot I_{NN}\right). \qquad (33)$$

APPENDIX IV
DERIVATION OF MATRIX Z

$$\Phi_t^T\Phi_b = \left[\tilde{\Phi}_t^{i,j}\right]^T\left[\tilde{\Phi}_b^{l}\right] = \left[\left(\tilde{\Phi}_t^{i,j}\right)^T\tilde{\Phi}_b^{l}\right]. \qquad (34)$$

By formulating the dot product in terms of the kernel function, we get

$$\left(\tilde{\Phi}_t^{i,j}\right)^T\tilde{\Phi}_b^{l} = \left[\frac{1}{\sqrt N}\left(\phi(x^i_j)-m\right)\right]^T\left[\sqrt{\frac{N_l}{N}}\left(m_l - m\right)\right] = \frac{\sqrt{N_l}}{N}\left[\phi(x^i_j)^T m_l - \phi(x^i_j)^T m - m^T m_l + m^T m\right]$$
$$= \frac{\sqrt{N_l}}{N}\left[\frac{1}{N_l}\sum_{s=1}^{N_l}K^{ij}_{ls} - \frac{1}{N}\sum_{p=1}^{C}\sum_{q=1}^{N_p}K^{ij}_{pq} - \frac{1}{N N_l}\sum_{s=1}^{C}\sum_{t=1}^{N_s}\sum_{q=1}^{N_l}K^{st}_{lq} + \frac{1}{N^2}\sum_{s=1}^{C}\sum_{t=1}^{N_s}\sum_{p=1}^{C}\sum_{q=1}^{N_p}K^{st}_{pq}\right]. \qquad (35)$$

Therefore, we have

$$Z = (V')^T\Phi_t^T\Phi_b = (V')^T\left[\left(\tilde{\Phi}_t^{i,j}\right)^T\tilde{\Phi}_b^{l}\right] = (V')^T\left(K\cdot\Lambda_{NC} - K\cdot I_{NC} - 1_{NN}\cdot K\cdot\Delta_{NC} + 1_{NN}\cdot K\cdot\Gamma_{NC}\right). \qquad (36)$$

APPENDIX V
PROOF OF THEOREM 5

Proof: Assume each class contains $n$ training samples. Then, we can simplify the between-class and within-class scatter matrices as

$$S_b = \frac{1}{C}\sum_{i=1}^{C}(m_i-m)(m_i-m)^T = \frac{C-1}{C}\cdot\frac{1}{C-1}\sum_{i=1}^{C}(m_i-m)(m_i-m)^T = \frac{C-1}{C}\,\mathrm{Cov}_M \qquad (37)$$

where $\mathrm{Cov}_M = \frac{1}{C-1}\sum_{i=1}^{C}(m_i-m)(m_i-m)^T$ is the covariance matrix on the set $M = \{m_1, m_2, \ldots, m_C\}$. Then, we have

$$S_w = \frac{1}{N}\sum_{i=1}^{C}\sum_{j=1}^{n}(x^i_j-m_i)(x^i_j-m_i)^T = \frac{n-1}{N}\sum_{i=1}^{C}\frac{1}{n-1}\sum_{j=1}^{n}(x^i_j-m_i)(x^i_j-m_i)^T = \frac{n-1}{N}\sum_{i=1}^{C}\mathrm{Cov}_i \qquad (38)$$

where $\mathrm{Cov}_i = \frac{1}{n-1}\sum_{j=1}^{n}(x^i_j-m_i)(x^i_j-m_i)^T$ is the covariance matrix of class $i$.

Suppose we replace a training sample $x_{p_0}$ in class $p$ with another sample $x_{p_1}$, so that the class mean $m_p$ is replaced with $m'_p$. Therefore, we have

$$\mathrm{Cov}_{M(M\setminus m_p)\cup m'_p} = \mathrm{Cov}_M - \frac{C}{(C-1)^2}(m-m_p)(m-m_p)^T + \frac{1}{C}\left(m^{(M\setminus m_p)}-m'_p\right)\left(m^{(M\setminus m_p)}-m'_p\right)^T. \qquad (39)$$
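As a quick numerical check of the simplified scatter forms (37) and (38) (equal class sizes, $N = Cn$), the sketch below compares the direct definitions of $S_b$ and $S_w$ with the covariance-based expressions; the data are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
C, n, d = 5, 8, 6                 # C classes, n samples per class, dimension d
N = C * n
X = [rng.normal(loc=i, size=(n, d)) for i in range(C)]

m_i = [Xi.mean(axis=0) for Xi in X]
m = np.vstack(X).mean(axis=0)

# Direct definitions (equal class sizes, so N_i / N = 1 / C).
S_b = sum(np.outer(mi - m, mi - m) for mi in m_i) / C
S_w = sum(np.outer(x - m_i[i], x - m_i[i]) for i in range(C) for x in X[i]) / N

# Covariance-based forms (37) and (38).
Cov_M = sum(np.outer(mi - m, mi - m) for mi in m_i) / (C - 1)
Cov_i = [np.cov(Xi, rowvar=False) for Xi in X]     # (1/(n-1)) * centered scatter
S_b_cov = (C - 1) / C * Cov_M
S_w_cov = (n - 1) / N * sum(Cov_i)

print(np.allclose(S_b, S_b_cov), np.allclose(S_w, S_w_cov))   # True True
```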
After replacing $x_{p_0}$ with $x_{p_1}$, we denote $S_b^{M(M\setminus m_p)\cup m'_p}$ as $S'_b$. Then

$$S'_b = S_b - \frac{C-1}{C}\cdot\frac{C}{(C-1)^2}(m-m_p)(m-m_p)^T + \frac{C-1}{C}\cdot\frac{1}{C}\left(m^{(M\setminus m_p)}-m'_p\right)\left(m^{(M\setminus m_p)}-m'_p\right)^T$$
$$= S_b - \frac{1}{C-1}(m-m_p)(m-m_p)^T + \frac{C-1}{C^2}\left(m^{(M\setminus m_p)}-m'_p\right)\left(m^{(M\setminus m_p)}-m'_p\right)^T. \qquad (40)$$

Let $O_{i1} = (m-m_p)/\|m-m_p\|$ and $O_{i2} = (m^{(M\setminus m_p)}-m'_p)/\|m^{(M\setminus m_p)}-m'_p\|$, where we assume $m-m_p \ne 0$ and $m^{(M\setminus m_p)}-m'_p \ne 0$. Then, we have

$$S'_b = S_b - \frac{1}{C-1}\|m-m_p\|^2\,O_{i1}O_{i1}^T + \frac{C-1}{C^2}\left\|m^{(M\setminus m_p)}-m'_p\right\|^2\,O_{i2}O_{i2}^T. \qquad (41)$$

Suppose we replace a training sample $x_{p_0}$ in class $p$ with another sample $x_{p_1}$ in class $p$. For $S_w$, we know that this will only change the covariance matrix of class $p$, i.e., for $i = 1, 2, \ldots, C$ and $i \ne p$, we have

$$\mathrm{Cov}_{i(X\setminus x_{p_0})} = \mathrm{Cov}_i, \qquad \mathrm{Cov}_{i(X\cup x_{p_1})} = \mathrm{Cov}_i, \qquad \mathrm{Cov}_{i(X\setminus x_{p_0})\cup x_{p_1}} = \mathrm{Cov}_i. \qquad (42)$$

After replacing $x_{p_0}$ with $x_{p_1}$, we denote $S_w^{X(X\setminus x_{p_0})\cup x_{p_1}}$ as $S'_w$. Then

$$S'_w = \frac{n-1}{N}\sum_{i=1}^{C}\mathrm{Cov}_{i(X\setminus x_{p_0})\cup x_{p_1}} \qquad (43)$$
$$= S_w - \frac{n}{N(n-1)}(m_p-x_{p_0})(m_p-x_{p_0})^T + \frac{n-1}{N\cdot n}\left(m_{p(X\setminus x_{p_0})}-x_{p_1}\right)\left(m_{p(X\setminus x_{p_0})}-x_{p_1}\right)^T. \qquad (44)$$

Let $\Delta_{i1} = (m_p-x_{p_0})/\|m_p-x_{p_0}\|$ and $\Delta_{i2} = (m_{p(X\setminus x_{p_0})}-x_{p_1})/\|m_{p(X\setminus x_{p_0})}-x_{p_1}\|$. Then we have

$$S'_w = S_w - \frac{n}{N(n-1)}\|m_p-x_{p_0}\|^2\,\Delta_{i1}\Delta_{i1}^T + \frac{n-1}{N\cdot n}\left\|m_{p(X\setminus x_{p_0})}-x_{p_1}\right\|^2\,\Delta_{i2}\Delta_{i2}^T. \qquad (45)$$

Therefore,

$$(S'_b-S'_w)-(S_b-S_w) = (S'_b-S_b)-(S'_w-S_w) = -\frac{1}{C-1}\|m-m_p\|^2\,O_{i1}O_{i1}^T + \frac{C-1}{C^2}\left\|m^{(M\setminus m_p)}-m'_p\right\|^2\,O_{i2}O_{i2}^T + \frac{n}{N(n-1)}\|m_p-x_{p_0}\|^2\,\Delta_{i1}\Delta_{i1}^T - \frac{n-1}{N\cdot n}\left\|m_{p(X\setminus x_{p_0})}-x_{p_1}\right\|^2\,\Delta_{i2}\Delta_{i2}^T. \qquad (46)$$

Using Theorem 4, we have

$$\lambda_j(S'_b-S'_w) = \lambda_j(S_b-S_w) - \frac{1}{C-1}\|m-m_p\|^2\,m_{j1} + \frac{C-1}{C^2}\left\|m^{(M\setminus m_p)}-m'_p\right\|^2\,m_{j2} + \frac{n}{N(n-1)}\|m_p-x_{p_0}\|^2\,m_{j3} - \frac{n-1}{N\cdot n}\left\|m_{p(X\setminus x_{p_0})}-x_{p_1}\right\|^2\,m_{j4} \qquad (47)$$

where $\sum_j m_{ji} = 1$ for $i = 1, 2, 3, 4$. Then

$$\left|\sum_{j=1}^{N}\left[\lambda_j(S'_b-S'_w)-\lambda_j(S_b-S_w)\right]\right| = \left|-\frac{1}{C-1}\|m-m_p\|^2 + \frac{C-1}{C^2}\left\|m^{(M\setminus m_p)}-m'_p\right\|^2 + \frac{n}{N(n-1)}\|m_p-x_{p_0}\|^2 - \frac{n-1}{N\cdot n}\left\|m_{p(X\setminus x_{p_0})}-x_{p_1}\right\|^2\right| \qquad (48)$$
$$\le \frac{1}{C-1}\|m-m_p\|^2 + \frac{C-1}{C^2}\left\|m^{(M\setminus m_p)}-m'_p\right\|^2 + \frac{n}{N(n-1)}\|m_p-x_{p_0}\|^2 + \frac{n-1}{N\cdot n}\left\|m_{p(X\setminus x_{p_0})}-x_{p_1}\right\|^2. \qquad (49)$$
Since we assume that all training data lie in a hypersphere with radius $R$, we know that $\|m-m_p\|^2 \le 4R^2$, $\|m^{(M\setminus m_p)}-m'_p\|^2 \le 4R^2$, $\|m_p-x_{p_0}\|^2 \le 4R^2$, and $\|m_{p(X\setminus x_{p_0})}-x_{p_1}\|^2 \le 4R^2$. Then, we have

$$\left|\sum_{j=1}^{N}\left[\lambda_j(S'_b-S'_w)-\lambda_j(S_b-S_w)\right]\right| \le \frac{1}{C-1}4R^2 + \frac{C-1}{C^2}4R^2 + \frac{n}{N(n-1)}4R^2 + \frac{n-1}{N\cdot n}4R^2$$
$$\le 4R^2\left[\frac{1}{C-1} + \frac{C-1}{(C-1)^2} + \frac{1}{(C-1)(n-1)} + \frac{n-1}{(C-1)(n-1)(n-1)}\right] = 4R^2\left[\frac{2}{C-1} + \frac{2}{(C-1)(n-1)}\right] = \frac{8nR^2}{(C-1)(n-1)}. \qquad (50)$$

□

ACKNOWLEDGMENT

The authors would like to thank the U.S. Army Research Laboratory for the FERET database and Yale University and Carnegie Mellon University for providing the YaleB and CMU PIE databases. Furthermore, the authors would like to thank Dr. Lu for providing the Matlab code of KDDA and V. Franc for providing the Statistical Pattern Recognition Toolbox, which includes the Matlab code of KPCA and GDA used in this paper.

REFERENCES

[1] R. Chellappa, C. Wilson, and S. Sirohey, "Human and machine recognition of faces: A survey," Proc. IEEE, vol. 83, no. 5, pp. 705–740, May 1995.
[2] W. Zhao, R. Chellappa, A. Rosenfeld, and P. Phillips, "Face recognition: A literature survey," ACM Comput. Surv., vol. 35, no. 4, pp. 399–458, Dec. 2003.
[3] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, no. 1, pp. 71–86, 1991.
[4] A. M. Martinez and A. C. Kak, "PCA versus LDA," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 228–233, Feb. 2001.
[5] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.
[6] D. L. Swets and J. Weng, "Using discriminant eigenfeatures for image retrieval," IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 8, pp. 831–836, Aug. 1996.
[7] B. Schölkopf, A. Smola, and K. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," MPI für biologische Kybernetik, Tübingen, Germany, Tech. Rep. 44, 1996.
[8] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K. R. Müller, "Fisher discriminant analysis with kernels," in Proc. IEEE Workshop Neural Netw. Signal Process. IX, 1999, pp. 41–48.
[9] G. Baudat and F. Anouar, "Generalized discriminant analysis using a kernel approach," Neural Comput., vol. 12, no. 10, pp. 2385–2404, 2000.
[10] M. H. Yang, "Kernel eigenfaces vs. kernel Fisherfaces: Face recognition using kernel methods," in Proc. 5th IEEE Int. Conf. Autom. Face and Gesture Recog., 2002, pp. 215–220.
[11] J. W. Lu, K. Plataniotis, and A. N. Venetsanopoulos, "Face recognition using kernel direct discriminant analysis algorithms," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 117–126, Jan. 2003.
[12] W. Zheng, L. Zhao, and C. Zou, "A modified algorithm for generalized discriminant analysis," Neural Comput., vol. 16, no. 6, pp. 1283–1297, 2004.
[13] J. Huang, P. C. Yuen, W. S. Chen, and J. H. Lai, "Component-based subspace LDA method for face recognition with one training sample," Opt. Eng., vol. 44, no. 5, p. 057002, 2005.
[14] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition," Pattern Recognit., vol. 33, no. 11, pp. 1771–1782, 2000.
[15] W. Zhao, R. Chellappa, and P. Phillips, "Subspace linear discriminant analysis for face recognition," Center Autom. Res., Univ. Maryland, College Park, MD, Tech. Rep. CAR-TR-914, 1999.
[16] H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data—With application to face recognition," Pattern Recognit., vol. 34, no. 10, pp. 2067–2070, 2001.
[17] L. F. Chen, H. Y. Liao, M. T. Ko, J. C. Lin, and G. J. Yu, "A new LDA-based face recognition system which can solve the small sample size problem," Pattern Recognit., vol. 33, no. 10, pp. 1713–1726, 2000.
[18] W. Zheng, L. Zhao, and C. Zou, "An efficient algorithm to solve the small sample size problem for LDA," Pattern Recognit., vol. 37, no. 5, pp. 1077–1079, 2004.
[19] R. Huang, Q. S. Liu, H. Q. Lu, and S. D. Ma, "Solving small sample size problem in LDA," in Proc. Int. Conf. Pattern Recog., Aug. 2002, vol. 3, pp. 29–32.
[20] J. Yang, J.-Y. Yang, and D. Zhang, "What's wrong with Fisher criterion?" Pattern Recognit., vol. 35, no. 11, pp. 2665–2668, 2002.
[21] J. Yang and J.-Y. Yang, "Why can LDA be performed in PCA transformed space?" Pattern Recognit., vol. 36, no. 2, pp. 563–566, 2003.
[22] J. Yang, J.-Y. Yang, and A. Frangi, "Combined Fisherfaces framework," Image Vis. Comput., vol. 21, no. 12, pp. 1037–1044, 2003.
[23] J. Yang, A. Frangi, and J.-Y. Yang, "A new kernel Fisher discriminant algorithm with application to face recognition," Neurocomputing, vol. 56, no. 1, pp. 415–421, 2004.
[24] J. Yang, Z. Jin, J. Yang, D. Zhang, and A. F. Frangi, "Essence of kernel Fisher discriminant: KPCA plus LDA," Pattern Recognit., vol. 37, no. 10, pp. 2097–2100, 2004.
[25] J. Yang, D. Zhang, and J.-Y. Yang, "A generalised K–L expansion method which can deal with small sample size and high-dimensional problems," Pattern Anal. Appl., vol. 6, no. 1, pp. 47–54, 2003.
[26] M. Wilkes, A. Barkana, H. Cevikalp, and M. Neamtu, "Discriminative common vectors for face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 1, pp. 4–13, Jan. 2005.
[27] J. Liu and S. Chen, "Discriminant common vectors versus neighbourhood components analysis and Laplacianfaces: A comparative study in small sample size problem," Image Vis. Comput., vol. 24, no. 3, pp. 249–262, 2006.
[28] C. Liu and H. Wechsler, "Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition," IEEE Trans. Image Process., vol. 11, no. 4, pp. 467–476, Apr. 2002.
[29] T. Jaakkola and D. Haussler, "Probabilistic kernel regression models," in Proc. Conf. AI and Statist., 1999.
[30] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing multiple parameters for support vector machines," Mach. Learn., vol. 46, no. 1, pp. 131–159, 2002.
[31] S. Mika, "Kernel Fisher discriminants," Ph.D. dissertation, Tech. Univ. Berlin, School Electr. Eng. Comput. Sci., Berlin, Germany, Dec. 2002.
[32] K. Schittkowski, "Optimal parameter selection in SVM," J. Ind. Manag. Optim., vol. 1, no. 4, pp. 465–476, 2005.
[33] J.-H. Lee and C.-J. Lin, "Automatic model selection for support vector machines," Dept. Comput. Sci. Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan, Tech. Rep., 2000.
[34] S. Keerthi and C. Lin, "Asymptotic behaviors of support vector machines with Gaussian kernel," Neural Comput., vol. 15, no. 7, pp. 1667–1689, Jul. 2003.
[35] D.-Q. Zhang, S. Chen, and Z.-H. Zhou, "Learning the kernel parameters in kernel minimum distance classifier," Pattern Recognit., vol. 39, no. 1, pp. 133–135, 2006.
[36] N. Cristianini, J. Shawe-Taylor, and A. Elisseeff, "On kernel-target alignment," in Proc. Neural Inf. Process. Syst., 2001, pp. 367–373.
[37] H. Xiong, M. N. S. Swamy, and M. O. Ahmad, "Optimizing the kernel in the empirical feature space," IEEE Trans. Neural Netw., vol. 16, no. 2, pp. 460–474, Mar. 2005.
[38] J. Huang, P. C. Yuen, W. S. Chen, and J. H. Lai, "Face recognition using kernel subspace-LDA algorithm," in Proc. Asian Conf. Comput. Vis., Jan. 2004, vol. 1, pp. 61–66.
[39] H. Li, T. Jiang, and K. Zhang, "Efficient and robust feature extraction by maximum margin criterion," in Advances in Neural Information Processing Systems 16, S. Thrun, L. Saul, and B. Schölkopf, Eds. Cambridge, MA: MIT Press, 2004.
[40] O. Bousquet and A. Elisseeff, "Stability and generalization," J. Mach. Learn. Res., vol. 2, no. 3, pp. 499–526, 2002.
[41] W. H. Rogers and T. J. Wagner, "A finite sample distribution-free performance bound for local discrimination rules," Ann. Stat., vol. 6, no. 3, pp. 506–514, May 1978.
[42] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 1090–1104, Oct. 2000.
[43] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: Illumination cone models for face recognition under variable lighting and pose," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 643–660, Jun. 2001.

Jian Huang received the B.Sc. and M.Sc. degrees in applied mathematics from Sun Yat-Sen (Zhongshan) University, Guangzhou, China, in 1999 and 2002, respectively, and the Ph.D. degree from Hong Kong Baptist University, Kowloon, in 2006. He is currently with the School of Information Science and Technology, Sun Yat-Sen (Zhongshan) University, and Hong Kong Baptist University. His research interests include pattern recognition, face recognition, image processing, linear discriminant analysis algorithms, and kernel methods.

Wen-Sheng Chen received the B.Sc. and Ph.D. degrees in mathematics from Sun Yat-Sen (Zhongshan) University, Guangzhou, China, in 1989 and 1998, respectively. He is currently a Professor in the Institute of Intelligent Computing Science, College of Mathematics and Computational Science, Shenzhen University, Shenzhen, China. His current research interests include pattern recognition, kernel methods, and wavelet analysis and its applications. Dr. Chen is a member of the Chinese Mathematical Society.

Jian Huang Lai received the M.Sc. degree in applied mathematics and the Ph.D. degree in mathematics from Sun Yat-Sen (Zhongshan) University, Guangzhou, China, in 1989 and 1999, respectively. He is currently a Professor in the School of Information Science and Technology, Sun Yat-Sen (Zhongshan) University. He is also a Board Member of the Image and Graphics Association China. His current research interests include image processing, pattern recognition, computer vision, and wavelet analysis. He has published more than 50 scientific articles in these areas.