$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p \qquad (1)$$
where $P$ is the number of neighbors around the central pixel and $R$ is the distance of the neighbors from the central pixel [8].
Fig. 1. The flowchart of the multi-modal face recognition system, in which the fusion is implemented in the score stage.

Certain local binary patterns are fundamental properties of textures and account for most of the patterns present in the observed textures. These fundamental patterns are called Uniform Local Binary Patterns (ULBP). In the circular structure these patterns contain few spatial transitions, so they can be viewed as texture micro-structures, such as spots and edges.
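The pattern code and the uniformity test can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes $P = 8$ and $R = 1$ with neighbours read directly from the 3x3 ring (no interpolation), it assumes $s(\cdot)$ is the usual unit step, and it uses the common definition of a uniform pattern as one with at most two circular 0/1 transitions, which this section does not state explicitly. The function names are hypothetical.

```python
# Offsets (row, col) of the P = 8 neighbours at radius R = 1, in circular order.
# Assumption: neighbours lie on the 3x3 square ring, so no interpolation is needed.
NEIGHBOURS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def lbp_code(image, row, col):
    """Return the LBP code of the interior pixel at (row, col), as in Eq. (1)."""
    centre = image[row][col]
    code = 0
    for p, (dr, dc) in enumerate(NEIGHBOURS):
        # Assumed step function: s(g_p - g_c) = 1 when g_p >= g_c, else 0.
        if image[row + dr][col + dc] >= centre:
            code += 1 << p
    return code


def transitions(code, bits=8):
    """Count 0/1 transitions when the bit pattern is read as a circle."""
    count = 0
    for p in range(bits):
        current = (code >> p) & 1
        following = (code >> ((p + 1) % bits)) & 1
        count += current != following
    return count


def is_uniform(code, bits=8):
    """Assumed definition: uniform means at most two circular transitions."""
    return transitions(code, bits) <= 2


if __name__ == "__main__":
    patch = [[9, 9, 9],
             [1, 5, 1],
             [1, 1, 1]]
    code = lbp_code(patch, 1, 1)
    print(code, is_uniform(code))
```

With 8 neighbours, 58 of the 256 possible codes pass this test; a common choice is to give each of them its own histogram bin and to pool all remaining codes into one extra bin.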
B. Representation and Recognition
For the 2D modality, appearance-based features have been successfully adopted for face recognition [9]. For the 3D modality, features in 3D coordinates, such as curvatures, are sensitive to data noise, so more and more researchers have recently adopted appearance-based features for face recognition as well. Therefore, in both the intensity and depth modalities, we adopt ULBP for face representation, and the NN classifier is adopted to compute the matching scores.
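As a rough illustration of how a matching score could be produced from ULBP histograms, the sketch below assumes that NN stands for nearest neighbour and that histograms are compared with the chi-square distance. The section does not name the distance measure, so that choice, the function names, and the gallery layout are all assumptions made for the example.

```python
def chi_square(hist_a, hist_b):
    """Chi-square distance between two histograms (smaller means more similar)."""
    total = 0.0
    for a, b in zip(hist_a, hist_b):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return total


def nearest_neighbour(probe, gallery):
    """Return (identity, score) of the gallery entry closest to the probe.

    `gallery` is a hypothetical dict mapping an identity label to its histogram.
    """
    best_id, best_score = None, float("inf")
    for identity, features in gallery.items():
        score = chi_square(probe, features)
        if score < best_score:
            best_id, best_score = identity, score
    return best_id, best_score


if __name__ == "__main__":
    gallery = {"subject_a": [1, 0, 0], "subject_b": [0, 1, 0]}
    print(nearest_neighbour([0, 2, 0], gallery))
```

Under this assumption a smaller score means a better match, which agrees with the normalization step, where the minimum score corresponds to two images with similar appearance.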
III. FUSION IN SCORE STAGE
A. Score Normalization
Because the recognition scores from different modalities are measured on different scales, they must first be mapped onto the same scale. To reduce the influence of data noise, we use a modified min-max rule for score normalization [10]. Given a set of matching scores $\{S_k\}$, $k = 1, 2, \ldots, n$, the normalized scores are given by
$$S_k^{n} = \frac{S_k - \min}{\max - \min} \qquad (2)$$
where $\min$ is the minimum value estimated from $\{S_k\}$, which corresponds to two images with similar appearance. $\max$, however, is not the maximum value but the value larger than 95% of the scores in $\{S_k\}$. In this way, we discard noisy data with large matching scores and preserve the accuracy of the normalization. We therefore also need a training stage to estimate these values ($\min$ and $\max$) for score normalization.
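The modified min-max rule can be sketched as below. The percentile is computed with a simple index into the sorted training scores, without interpolation; that detail, the function names, and the decision not to clip values above 1 are assumptions, since the text only says that max exceeds 95% of the scores.

```python
def fit_min_max(training_scores, percent=95):
    """Estimate (min, max) for Eq. (2) from a list of training scores.

    `max` is the score that exceeds roughly `percent` percent of the training
    scores, not the true maximum, so a few very large outliers are ignored.
    """
    ordered = sorted(training_scores)
    low = ordered[0]
    # Assumption: index-based percentile, integer arithmetic, no interpolation.
    index = min((percent * len(ordered)) // 100, len(ordered) - 1)
    high = ordered[index]
    return low, high


def normalize(score, low, high):
    """Apply Eq. (2). Scores above `high` map to values greater than 1."""
    if high == low:
        raise ValueError("min and max must differ")
    return (score - low) / (high - low)


if __name__ == "__main__":
    # Scores 1..100 plus one extreme outlier that the rule is meant to ignore.
    scores = list(range(1, 101)) + [10000]
    low, high = fit_min_max(scores)
    print(low, high, normalize(50, low, high))
```

With a plain min-max rule the single outlier would compress every other score into a tiny interval near zero; estimating max from the 95% point keeps the bulk of the scores spread over [0, 1].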
B. Fusion Strategies
In this paper we select six fusion strategies to improve multi-modal face recognition performance. They can be divided into two types: simple fusion methods that need no training stage, such as Sum, Product, Max and Min, and complex fusion methods that need a training stage on pre-prepared data, such as LDA and SVM. For convenience of description, suppose $X_i^j$ is the $i$th sample of class $j$ and there are $C$ classes of samples. The matching scores from each modality for a sample are represented as a feature vector $X_i^j = [s_1, s_2, \ldots, s_N]$, where $N$ is the number of modalities and $s_n$ is the output of the $n$th modality. The fusion score is denoted by $F$. The adopted fusion rules are described in detail as follows:
1) Sum Rule:
$$F = \sum_{i=1}^{N} s_i \qquad (3)$$
2) Product Rule:
$$F = \prod_{i=1}^{N} s_i \qquad (4)$$
3) Max Rule:
$$F = \max_{i=1}^{N} \{s_i\} \qquad (5)$$
4) Min Rule:
$$F = \min_{i=1}^{N} \{s_i\} \qquad (6)$$
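The four training-free rules in Eqs. (3) to (6) translate almost directly into code. The sketch below assumes the scores in the vector have already been normalized to a common scale; the function names are illustrative only.

```python
import math


def sum_rule(scores):
    """Eq. (3): add the per-modality scores."""
    return sum(scores)


def product_rule(scores):
    """Eq. (4): multiply the per-modality scores."""
    return math.prod(scores)


def max_rule(scores):
    """Eq. (5): keep the largest per-modality score."""
    return max(scores)


def min_rule(scores):
    """Eq. (6): keep the smallest per-modality score."""
    return min(scores)


if __name__ == "__main__":
    # Hypothetical normalized scores from the intensity and depth modalities.
    x = [0.2, 0.5]
    for rule in (sum_rule, product_rule, max_rule, min_rule):
        print(rule.__name__, rule(x))
```

Because none of these rules has parameters to estimate, they can be applied as soon as the normalization constants are known.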
5) Fisher Rule (LDA): The Fisher rule can be summarized as follows. Let the between-class scatter matrix be defined as
$$S_B = \sum_{i=1}^{c} N_i (u_i - u)(u_i - u)^T \qquad (7)$$
and the within-class scatter matrix be defined as
$$S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - u_i)(x_k - u_i)^T \qquad (8)$$
where $u_i$ is the mean image of class $X_i$, $N_i$ is the number of images in class $X_i$, $u$ is the mean of all images, and $c$ is the number of classes. The projection matrix $W$ can then be obtained from Equ. (9):
$$W_{opt} = \arg\max_{W} \frac{|W^T S_B W|}{|W^T S_W W|} \qquad (9)$$
The matching score can then be obtained as
$$F = W X \qquad (10)$$
See [11] for detailed information on the Fisher rule.
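For intuition, the Fisher rule can be sketched for the smallest useful case. The example below assumes two modalities (so each sample is a 2-D score vector) and two classes, for instance genuine and impostor comparisons; these choices are assumptions made for the example and are not stated in this section. With two classes, the direction that maximizes the criterion in Equ. (9) is proportional to $S_W^{-1}(u_1 - u_2)$, so a 2x2 inverse suffices. All function names are hypothetical.

```python
def mean_vector(samples):
    """Mean of a list of 2-D score vectors."""
    n = len(samples)
    return [sum(s[d] for s in samples) / n for d in range(2)]


def within_scatter(classes):
    """Eq. (8) for 2-D samples: sum over classes of (x - u_i)(x - u_i)^T."""
    sw = [[0.0, 0.0], [0.0, 0.0]]
    for samples in classes:
        u = mean_vector(samples)
        for x in samples:
            d = [x[0] - u[0], x[1] - u[1]]
            for r in range(2):
                for c in range(2):
                    sw[r][c] += d[r] * d[c]
    return sw


def fisher_direction(class_a, class_b):
    """Two-class Fisher direction w = S_W^{-1} (u_a - u_b) for 2-D scores."""
    sw = within_scatter([class_a, class_b])
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    ua, ub = mean_vector(class_a), mean_vector(class_b)
    diff = [ua[0] - ub[0], ua[1] - ub[1]]
    # Multiply the closed-form inverse of the 2x2 matrix by the mean difference.
    w0 = (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det
    w1 = (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det
    return [w0, w1]


def fuse(w, x):
    """Project a 2-D score vector onto the learned direction, as in Eq. (10)."""
    return w[0] * x[0] + w[1] * x[1]


if __name__ == "__main__":
    # Hypothetical training score vectors [intensity score, depth score].
    genuine = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    impostor = [[2.0, 0.0], [3.0, 0.0], [2.0, 1.0], [3.0, 1.0]]
    w = fisher_direction(genuine, impostor)
    print(w, fuse(w, [0.5, 0.5]), fuse(w, [2.5, 0.5]))
```

In this toy data the two classes differ only in the first score, and the learned direction puts all of its weight on that score and none on the second.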
6) Support Vector Machine (SVM): Define the labeled data set as $(x_i, y_i)$, $i = 1, \ldots, n$, $x \in R^d$, $y \in \{+1, -1\}$. Suppose there exists a hyperplane that separates the positive samples from the negative samples. Then the points that lie on the hyperplane must satisfy Equ. (11)
$$w \cdot x + b = 0 \qquad (11)$$
where $w$ is normal to the hyperplane and $|b|/\|w\|$ is the perpendicular distance from the hyperplane to the origin. Let
d
+
(d