
Procedure of ESBMM Algorithm

Instead of minimizing an estimate of the generalization error, it is better to maximize an objective function that
explicitly depends on the kernel parameters while maintaining the eigenvalue stability of the EDLDA method. The
objective function adopted here is derived from the maximum margin criterion.

The ESBMM algorithm is divided into two parts, namely, eigenvalue stability analysis and objective function
maximization. The objective of the ESBMM algorithm is to choose proper kernel parameters that maximize the
objective function while guaranteeing a good generalization capability by not exceeding the threshold $R^0$.
When initial values $\theta^0$ of the kernel parameters are given, to maximize the objective function we need to find
the search direction along which to update the kernel parameters. To obtain the search direction, we compute the
gradient of $F$ with respect to the kernel parameters $\theta$. The method is gradient based, and an iterative
procedure is required to update the kernel parameters along the direction that maximizes the objective function.
Based on the analysis above, the detailed procedure of our proposed kernel parameter optimization algorithm
ESBMM is described as follows.

Step 1) Given the threshold $R^0$ and the step size $\rho^0$, initialize the iteration counter $k = 0$ and the initial
values of the kernel parameters $\theta^0 = (\theta_1^0, \theta_2^0, \ldots, \theta_{N_\theta}^0)$, where $N_\theta$ is the
number of kernel parameters used in the RBF kernel function.
Step 2) Using the EDLDA method, find the projection matrix $W$.
Step 3) Construct the objective function.
Step 4) Update the parameter set $\theta$ by a gradient step so that the objective function is maximized, as
follows:
Compute the gradient $D^k = (\partial F/\partial \theta_1, \partial F/\partial \theta_2, \ldots, \partial F/\partial \theta_{N_\theta})$
to obtain the search direction along which the objective function increases. Compute the bound
$R = \max_i \| y_i - E(Y) \|$, $i = 1, 2, \ldots, N$, where $N$ is the number of training samples and
$y_i = W^T \phi(x_i)$, $i = 1, 2, \ldots, N$, is the projected sample vector obtained with the kernel-based LDA
method. If $R > R^0$, then go to Step 5). Otherwise, compute the unit vector $\hat{D}^k = D^k / \| D^k \|$ of the
gradient vector, let $\theta = \theta + \rho^0 \hat{D}^k$ and $k = k + 1$, and go to Step 2).
Step 5) Terminate and output the results.
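
For concreteness, the loop below is a minimal NumPy sketch of Steps 1)-5). The callables edlda_fit, grad_F and phi
are hypothetical placeholders for the EDLDA solver, the gradient of the objective function with respect to the kernel
parameters, and the kernel-induced mapping, none of which are spelled out here; the sketch only illustrates the
control flow of the stability-bounded gradient ascent.

import numpy as np

def esbmm(X, theta0, rho0, R0, edlda_fit, grad_F, phi, max_iter=100):
    """Sketch of the ESBMM loop: gradient ascent on F(theta), stopped once
    the stability bound R exceeds the threshold R0. edlda_fit, grad_F and
    phi are hypothetical stand-ins, not defined in the paper excerpt."""
    theta = np.asarray(theta0, dtype=float)
    W = None
    for k in range(max_iter):
        W = edlda_fit(X, theta)                     # Step 2: projection matrix via EDLDA
        D = grad_F(W, X, theta)                     # Step 4: gradient of F w.r.t. theta
        Y = np.array([W.T @ phi(x, theta) for x in X])           # projected samples y_i
        R = np.max(np.linalg.norm(Y - Y.mean(axis=0), axis=1))   # R = max_i ||y_i - E(Y)||
        if R > R0:                                  # bound exceeded: terminate (Step 5)
            break
        theta = theta + rho0 * D / np.linalg.norm(D)              # move along the unit gradient
    return theta, W

The fixed maximum iteration count is an added safeguard of this sketch and is not part of the procedure described
above.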

Fig.1. Images of two persons from the FERET database.


The threshold $R^0$ in our experiments is set to $5 \times 10^{-3}$. For the initial value of the kernel parameters, we
select the mean distance of all the training vectors as follows:
$$\theta_1^0 = \theta_2^0 = \cdots = \theta_{N_\theta}^0 = \frac{1}{N(N-1)} \sum_{i,j=1}^{N} \| x_i - x_j \|^2 = \theta^0$$
and the step size $\rho^0$ is set to $\theta^0 / 100$ in our experiments reported in Section IV.
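
As an illustration, the snippet below computes this initialization; it assumes the training vectors are the rows of a
NumPy array X and that a single shared width $\theta^0$ is used for all the RBF kernel parameters, as in the formula
above.

import numpy as np

def initial_kernel_parameters(X):
    """Sketch: theta^0 as the mean pairwise squared distance of the training
    vectors, 1/(N(N-1)) * sum_{i,j} ||x_i - x_j||^2, and rho^0 = theta^0/100."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    sq_norms = np.sum(X ** 2, axis=1)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i.x_j, for all pairs at once
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    theta0 = sq_dists.sum() / (N * (N - 1))     # diagonal (i = j) terms are zero
    rho0 = theta0 / 100.0
    return theta0, rho0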

EDLDA ALGORITHM:

Compared with the original DLDA algorithm, the Fisher criterion in the CwDLDA with the weighted estimation of
the between- and within-class scatter matrices is more closely related to the minimization of the classification error.
However, since it is still a suboptimal approximation of the Bayes error criterion, the extracted features may not
retain the complete discriminatory power. Hence, we consider assigning each extracted feature
$f_i = (f_{i1}, f_{i2}, \ldots, f_{iN}) \in R^N$, $i = 1, \ldots, m$, a different weight $K_i$, $i = 1, \ldots, m$. Mutual
information is a nonlinear metric for evaluating the correlation between two random variables and has demonstrated
good performance in measuring the salience of features (Kwak & Choi, 2002). Therefore, in our paper, we choose the
normalized mutual information $I(f_i, C)$ between $f_i$ and the corresponding class labels
$C = \{ c_i, i = 1, \ldots, N \mid c_i \in \{1, \ldots, C\} \}$ to weight the features. As suggested in (Kwak & Choi, 2002),
we evenly discretize each feature into 10 equal intervals over $[\mu_i - 2\sigma_i, \mu_i + 2\sigma_i]$, where $\mu_i$ and
$\sigma_i$ are the mean and standard deviation of $f_i$, respectively, and then count the sample frequency in each
interval as the probability.
Therefore, $I(f_i, C)$ is calculated as
$$I(f_i, C) = \sum_{s=1}^{C} \sum_{t=1}^{10} P(f_i^t, s) \log\!\left(\frac{P(f_i^t, s)}{P(f_i^t)\,P(s)}\right)$$
where $P(f_i^t, s)$ is the joint probability of $f_i^t$ and $s$, and $f_i^t$ is the $t$-th interval of feature $f_i$.
Consequently, $K_i$ can be expressed as
$$K_i = I(f_i, C) \Big/ \sum_{j=1}^{m} I(f_j, C), \quad i = 1, \ldots, m.$$
If the mutual information between $f_i$ and $C$ is large, it means that $f_i$ and $C$ are closely related, and vice
versa. By combining the above step with the CwDLDA solution, an Enhanced Direct LDA (EDLDA) solution is
devised.
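
A small sketch of this weighting scheme is given below. It follows the discretization described above (10 equal
intervals over $[\mu_i - 2\sigma_i, \mu_i + 2\sigma_i]$, sample frequencies used as probabilities); the function name
and the clipping of out-of-range samples into the boundary intervals are implementation choices of this sketch, not
prescribed here.

import numpy as np

def feature_weights(F, labels, n_bins=10):
    """Sketch: normalized mutual-information weights K_i for the m extracted
    features (rows of F, one column per training sample)."""
    F = np.asarray(F, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    m, N = F.shape
    mi = np.zeros(m)
    for i in range(m):
        mu, sigma = F[i].mean(), F[i].std()
        # 10 equal intervals over [mu - 2*sigma, mu + 2*sigma]
        edges = np.linspace(mu - 2 * sigma, mu + 2 * sigma, n_bins + 1)
        t_idx = np.clip(np.digitize(F[i], edges) - 1, 0, n_bins - 1)
        for s in classes:
            for t in range(n_bins):
                p_ts = np.mean((t_idx == t) & (labels == s))    # joint P(f_i^t, s)
                p_t, p_s = np.mean(t_idx == t), np.mean(labels == s)
                if p_ts > 0:
                    mi[i] += p_ts * np.log(p_ts / (p_t * p_s))  # I(f_i, C) term
    return mi / mi.sum()                        # K_i = I(f_i, C) / sum_j I(f_j, C)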

The outline of the EDLDA method


1. Find a set of $C - 1$ orthonormal eigenvectors $\Phi_1$ and the corresponding non-zero eigenvalues of the
between-class scatter matrix $S_b$ defined in the original data space $R^n$. Remove the null space of $S_b$ by
projecting all samples onto $R^{C-1}$, i.e., $x \in R^n \rightarrow \Phi_1^T x \in R^{C-1}$.
2. Calculate $S_w$ in the reduced space $R^{C-1}$. If $S_w$ is not singular, $S_w^{-1}$ is computed; otherwise it is
re-estimated as $S_w^{new}$ and the inverse of $S_w^{new}$ is computed to replace $S_w^{-1}$.
3. Calculate, in $R^{C-1}$, the weighted between-class and within-class scatter matrices $\tilde{S}_b$ and $\tilde{S}_w$.
4. Whiten $\tilde{S}_b$ by $\Phi_2^T \tilde{S}_b \Phi_2 = \Lambda_1 \Rightarrow \Lambda_1^{-1/2} \Phi_2^T \tilde{S}_b \Phi_2 \Lambda_1^{-1/2} = I_{(C-1)\times(C-1)}$
and project all the samples by the transformation matrix $\Lambda_1^{-1/2} \Phi_2^T$. Thus we have
$\tilde{S}_w = (\Lambda_1^{-1/2} \Phi_2^T)\, S_w\, (\Phi_2 \Lambda_1^{-1/2})$.
5. Diagonalize $\tilde{S}_w$ by $\Phi_3^T \tilde{S}_w \Phi_3 = \Lambda_2$ and take the $m$ eigenvectors $\Phi_3$
corresponding to the $m$ smallest eigenvalues. Therefore the total projection transformation is obtained as
$x \in R^n \rightarrow \Phi_3^T \Lambda_1^{-1/2} \Phi_2^T \Phi_1^T x \in R^m$, which transforms data from $R^n$ to $R^m$.
6. Calculate the weights $K_i$, $i = 1, \ldots, m$, for each of the extracted features $f_i \in R^N$, $i = 1, \ldots, m$.
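
The following NumPy sketch traces the projection chain of Steps 1-5 under simplifying assumptions: the weighted
scatter matrices of CwDLDA are assumed to be supplied by the caller, and the re-estimation of a singular $S_w$ in
Step 2 is omitted. It is meant only to make the sequence of eigen-decompositions and the final transform
$\Phi_3^T \Lambda_1^{-1/2} \Phi_2^T \Phi_1^T$ concrete, not to reproduce the full EDLDA method.

import numpy as np

def edlda_projection(Sb, Sb_w, Sw_w, m, tol=1e-10):
    """Sketch of the EDLDA projection chain. Sb is the ordinary between-class
    scatter in R^n; Sb_w and Sw_w are the weighted between/within-class
    scatters (computed as in CwDLDA, not shown) in the same space."""
    # Step 1: range space of Sb -> Phi_1 (eigenvectors with non-zero eigenvalues)
    vals, vecs = np.linalg.eigh(Sb)
    Phi1 = vecs[:, vals > tol]                        # n x (C-1)
    # Step 3: weighted scatter matrices in the reduced space R^{C-1}
    Sb_r = Phi1.T @ Sb_w @ Phi1
    Sw_r = Phi1.T @ Sw_w @ Phi1
    # Step 4: whiten the weighted between-class scatter
    lam1, Phi2 = np.linalg.eigh(Sb_r)
    lam1 = np.maximum(lam1, tol)                      # guard against numerical zeros
    Z = Phi2 @ np.diag(lam1 ** -0.5)                  # Phi_2 Lambda_1^{-1/2}
    Sw_t = Z.T @ Sw_r @ Z                             # within-class scatter after whitening
    # Step 5: keep the m eigenvectors of Sw_t with the smallest eigenvalues
    lam2, Phi3 = np.linalg.eigh(Sw_t)                 # eigh returns ascending eigenvalues
    W = Phi1 @ Z @ Phi3[:, :m]                        # total transform: x -> W.T @ x in R^m
    return W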
