article info

Article history: Available online 12 December 2013

Keywords: Depth image; Alignment; Interpolation; Resampling; Face image

abstract
Face, gender, ethnic and age group classification systems often work through an alignment, feature extraction, and identification pipeline. The quality of the alignment process is thus central to the performance of the identification process. Furthermore, missing portions of depth information can greatly affect results. Appropriate image reconstruction is therefore crucial for the correct operation of those systems. This paper presents a simple and effective approach for the automatic alignment and reconstruction of damaged facial depth images. By using only four facial landmarks and the raw depth data, our approach converts a given damaged depth image into a smooth depth function, performs the 3D alignment of the underlying face with the face of an average person, and produces an aligned depth image having arbitrary resolution. Our experiments show that the proposed approach outperforms commonly used methods. For instance, we show that it improves the quality of a state-of-the-art gender classification technique.

© 2013 Elsevier B.V. All rights reserved.
1. Introduction
The ability to retrieve information from facial depth images has many practical applications, including face recognition, age group estimation, and gender and ethnic group classification. Unfortunately, depth data is often damaged due to limitations intrinsic to off-the-shelf depth-image capturing systems. Examples include, but are not limited to, depth shadowing and the influence of reflective, refractive and infrared-absorbing materials in the scene (Zhu et al., 2008). Also, the number of pixels covering the imaged face and the face's orientation often vary from image to image, making the use of captured images difficult or even impossible without proper alignment and reconstruction of the depth data (Szeliski, 2010).
Virtually every computer vision researcher who needs to perform alignment and reconstruction of facial depth data presents their own solution to the problem. A well-known technique is to identify some facial features by curvature and compute the alignment based on them (Moreno et al., 2005). Solutions based on principal component analysis (PCA) have also been proposed (Stormer and Rigoll, 2008). However, most of the attempts do not make proper use of depth information while performing the alignment, restricting the solution to the 2D image plane. Also, linear interpolation is commonly used to resample the depth data.
E-mail addresses: gtaveira@ic.uff.br (G. Taveira), laffernandes@ic.uff.br (L.A.F. Fernandes).
Fig. 1. For the same subject: (a) the original color image, (b) the original damaged depth image, and (c) the image with reconstructed depth information produced by our technique. Six aligned and reconstructed depth images of different subjects are presented in (d). Images (b), (c) and (d) are presented in false-color, where dark red pixels denote the surface closest to the camera. Notice in (c) the smooth transition of depth values in the originally corrupted portions (navy blue pixels in (b)). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
2. Related work
This section discusses the use of TPS in the interpolation of facial color images, the use of interpolation schemes to resample facial depth data, and alignment schemes for aligning human body surfaces.

Rosen (1996) developed the Java applet entitled AlexWarp. Since its creation, the applet has gained popularity among internet users worldwide for its simple and fast method of facial image warping. When the user provides one pair of landmark points, the applet determines the region to warp, warps it, and then outputs the warped picture. One major drawback of the AlexWarp applet is that transformations can only be applied one at a time. The AlexWarp applet works on colored images in the 2D domain with a limited number of control points.
Fig. 2. Histograms showing the distribution of (a) minimum, (b) mean and (c) maximum squared error values computed from the depth values of a reference image and images produced using the proposed alignment and reconstruction approach (blue), a common 3D ICP-based alignment method with linear reconstruction (green), and a common 2D alignment technique with linear reconstruction (red). Notice that the error values of the proposed approach are smaller than those of the common approaches. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
where $w_{ac}$ and $h_{ac}$ are, respectively, the width and height of the input image having pixel coordinates $u \in [1, w_{ac}]$ and $v \in [1, h_{ac}]$. $u_D$ and $v_D$ are computed, respectively, as

$u_D = \min\left(\lceil 1.5\, d_{le;re}/2 \rceil,\; w_{ac} - u_{no}\right)$ and $v_D = \min\left(\lceil 1.5\, d_{no;ch}/2 \rceil,\; h_{ac} - v_{no}\right).$ (1)
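Under our reading of (1), each cropping extent takes the smaller of a multiple of a landmark distance and the space left between the nose and the image border, so the crop never leaves the image. A minimal sketch (the function name and all numeric values are ours, for illustration only):

```python
import math

def crop_extents(d_le_re, d_no_ch, u_no, v_no, w_ac, h_ac):
    """Cropping extents around the nose pixel (u_no, v_no); a sketch of
    our reading of Eq. (1). Each extent is clamped to the image border."""
    u_D = min(math.ceil(1.5 * d_le_re / 2), w_ac - u_no)
    v_D = min(math.ceil(1.5 * d_no_ch / 2), h_ac - v_no)
    return u_D, v_D

# Example on a 640 x 480 image with the nose at the principal point.
print(crop_extents(d_le_re=80, d_no_ch=110, u_no=320, v_no=240,
                   w_ac=640, h_ac=480))  # (60, 83)
```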
Fig. 3. Image cropping. (left) A grayscale visualization of an original depth image (640 × 480) before any cropping is done. The distances $d_{le;re}$ and $d_{no;ch}$ are proportional to the cropping rectangle. (right) A scaled example of a cropped image (195 × 293). See (1) for details. Lighter shades of gray correspond to points closer to the camera.
A TPS is a height function that can be evaluated at a given $(u, v)^T$ coordinate in order to retrieve the respective scalar value $z$ that best describes the height surface passing through $N$ non-overlapping control points having coordinates $(u, v, z)^T$. In this paper, $u$ and $v$ are coordinates of valid pixels (i.e., pixels storing valid depth information), and $z$ is the associated depth value. It is important to notice that a TPS is adjusted to an unstructured set of control points, and also that it can be evaluated at any real-valued $(u, v)^T$ position, returning a smoothly interpolated $z$ value. Thus, it is clear that sub-pixel sampling and damaged facial depth reconstruction are naturally handled by the TPS-based interpolation scheme adopted in our work.
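This kind of TPS interpolation can be illustrated with SciPy's `RBFInterpolator`, which supports a thin-plate-spline kernel. This is not the paper's implementation; the control points below are a synthetic stand-in for valid depth pixels:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Unstructured control points: (u, v) coordinates of valid pixels
# and their associated depth values z (synthetic surface).
rng = np.random.default_rng(0)
uv = rng.uniform(0, 100, size=(200, 2))
z = np.sin(uv[:, 0] / 20) + np.cos(uv[:, 1] / 20)

# Fit a thin-plate spline to the scattered control points.
tps = RBFInterpolator(uv, z, kernel='thin_plate_spline')

# Evaluate at arbitrary real-valued positions (sub-pixel sampling),
# including locations where the original depth was missing.
query = np.array([[10.5, 42.25], [73.0, 8.75]])
z_hat = tps(query)
print(z_hat.shape)  # (2,)
```

With the default zero smoothing the spline passes exactly through the control points, which matches the role of valid pixels as hard constraints in the reconstruction.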
Fig. 4. An example of how the blocks in the last row and in the last column may be smaller in size than the rest of the blocks in the image. In the proposed algorithm, each block may independently grow in size until enough control points are within its boundaries (see the blocks in the lower right corner of the image).

A TPS is described by $2(N + 3)$ parameters, which include six global affine motion parameters and $2N$ coefficients for correspondences of the control points. These parameters are computed by solving a linear system having a closed-form solution. Due to the large number of parameters, the computation of a single TPS for all valid pixels in a cropped image may be unfeasible. We avoid such an issue by dividing the cropped image into adaptive blocks having a small number of control points, and fit a different TPS to each one of the blocks. Such an approach has two advantages: (i) it allows our technique to handle images having arbitrary size; and (ii) the procedure is less prone to numerical instability.

The adaptive blocks are initially distributed uniformly over the cropped image as a regular grid comprised of square entries having fixed size (most blocks in Fig. 4). However, since depth data may be damaged, some of the blocks may not contain enough control points to define a TPS. In such a case, we incrementally change the size of an ill-defined block by including a ring of surrounding pixels in it. An ill-defined block grows until it has enough valid pixels to solve the linear system of equations that computes the coefficients of the TPS (see the blocks in the lower right corner of Fig. 4). The TPS assigned to a block fits the points covered by the original block size and the points inside the overlapping region. However, after the TPS has been fitted, the evaluation of the smooth surface related to a block is performed only inside the original coverage of that block.

The standard space is the 3D space where all faces will be aligned. The transformation for a given imaged face is computed from the location of four landmarks in the actual coordinate frame (namely, left eye (le), right eye (re), nose (no) and chin (ch)) and the equivalent locations in the standard coordinate frame. In the following equations, each location is represented by a point $P_{ac;F} = (x_{ac;F},\, y_{ac;F},\, z_{ac;F})^T$, with $F \in \{le, re, no, ch\}$, obtained from the landmark's pixel coordinates $(u_F, v_F)^T$ and its depth value $z_{ac;F}$.
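The ring-by-ring growth of ill-defined blocks can be sketched as follows. The function and parameter names are ours, not the paper's, and the minimum-point threshold is illustrative:

```python
import numpy as np

def grow_block(valid_mask, r0, c0, size, min_points):
    """Expand a block by rings of surrounding pixels until it contains
    at least `min_points` valid pixels (a sketch of the adaptive-block
    rule; names and threshold are our assumptions)."""
    h, w = valid_mask.shape
    top, left = r0, c0
    bottom, right = min(r0 + size, h), min(c0 + size, w)
    while valid_mask[top:bottom, left:right].sum() < min_points:
        if top == 0 and left == 0 and bottom == h and right == w:
            break  # block already covers the whole image
        # Include one ring of surrounding pixels.
        top, left = max(top - 1, 0), max(left - 1, 0)
        bottom, right = min(bottom + 1, h), min(right + 1, w)
    return top, left, bottom, right

# Example: a validity mask whose lower-right corner is damaged,
# like the blocks in the lower right corner of Fig. 4.
mask = np.ones((64, 64), dtype=bool)
mask[40:, 40:] = False
print(grow_block(mask, 48, 48, 16, min_points=20))  # (39, 39, 64, 64)
```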
$$K = \begin{pmatrix} f\,m_u & 0 & o_u \\ 0 & f\,m_v & o_v \\ 0 & 0 & 1 \end{pmatrix}.$$
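Under a standard pinhole model, an intrinsic matrix of this form lets a pixel and its depth value be back-projected to a 3D point; whether the paper uses exactly this mapping is our assumption, and the numeric parameter values below are invented:

```python
import numpy as np

# Intrinsic matrix K from focal length f, pixel densities (m_u, m_v)
# and principal point (o_u, o_v); numeric values are illustrative.
f, m_u, m_v, o_u, o_v = 4.7, 120.0, 120.0, 320.0, 240.0
K = np.array([[f * m_u, 0.0, o_u],
              [0.0, f * m_v, o_v],
              [0.0, 0.0, 1.0]])

def back_project(u, v, z):
    """Back-project pixel (u, v) with depth z: x = z * K^{-1} (u, v, 1)^T."""
    return z * np.linalg.solve(K, np.array([u, v, 1.0]))

p = back_project(400.0, 300.0, 1.5)  # 3D point with p[2] == 1.5
```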
$M = Q\,P^{-1}$ and $O = P_{st;le} - M\,P_{ac;le}$, where

$$P = \left( P_{ac;re} - P_{ac;le} \;\;\; P_{ac;no} - P_{ac;le} \;\;\; P_{ac;ch} - P_{ac;le} \right), \qquad (7)$$

$$Q = \left( P_{st;re} - P_{st;le} \;\;\; P_{st;no} - P_{st;le} \;\;\; P_{st;ch} - P_{st;le} \right). \qquad (8)$$

The alignment of a point is then given by $P_{st} = M\,P_{ac} + O$.
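Our reading of the alignment equations (an affine map $M$ computed from landmark offsets, plus a residual translation $O$) can be checked numerically. The landmark coordinates below are invented for illustration:

```python
import numpy as np

# Landmarks in the actual (ac) and standard (st) frames: le, re, no, ch.
ac = {'le': np.array([0.0, 0.0, 0.0]), 're': np.array([6.0, 0.2, 0.1]),
      'no': np.array([3.0, -3.0, 2.0]), 'ch': np.array([3.1, -7.0, 0.5])}
st = {'le': np.array([-3.0, 2.0, 0.0]), 're': np.array([3.0, 2.0, 0.0]),
      'no': np.array([0.0, -1.0, 2.0]), 'ch': np.array([0.0, -5.0, 0.0])}

# Columns of P and Q are the landmark offsets relative to the left eye.
P = np.column_stack([ac[k] - ac['le'] for k in ('re', 'no', 'ch')])
Q = np.column_stack([st[k] - st['le'] for k in ('re', 'no', 'ch')])

M = Q @ np.linalg.inv(P)      # affine part:       M = Q P^{-1}
O = st['le'] - M @ ac['le']   # translation part:  O = P_st,le - M P_ac,le

# By construction, P_st = M P_ac + O maps every landmark exactly.
print(M @ ac['no'] + O)  # equals st['no']
```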
In the standard coordinate frame, the landmark locations $P_{st;le}$, $P_{st;re}$, $P_{st;no}$ and $P_{st;ch}$ are fixed points expressed in terms of the standard face width $w_{st}$ and height $h_{st}$, scaled by the ratios $l_{eye}$, $l_{chin}$ and $l_{nosetip}$. Pixels of the output image are mapped back to the actual coordinate frame through the inverse transformation $P_{ac} = M^{-1}(P_{st} - O)$.
Our techniques were implemented with Microsoft Visual Studio as dynamic link libraries. We set $h_{st} = 118$.
$$\mathrm{accuracy} = \frac{TF + TM}{FF + TF + FM + TM}, \qquad (10)$$

$$TFR = \frac{TF}{TF + FM} \quad \text{and} \quad TMR = \frac{TM}{TM + FF}. \qquad (11)$$
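Computing these measures from confusion counts is a one-liner each. Here TF/TM denote correctly classified females/males, FM females classified as male, and FF males classified as female; the counts themselves are invented for illustration:

```python
# Invented gender-classification confusion counts.
TF, TM, FF, FM = 80, 85, 10, 5

accuracy = (TF + TM) / (FF + TF + FM + TM)  # Eq. (10)
TFR = TF / (TF + FM)                        # true female rate, Eq. (11)
TMR = TM / (TM + FF)                        # true male rate,   Eq. (11)
print(accuracy, TFR, TMR)
```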
Fig. 5. Color visualization of the squared error on the nose region. (top) Using our method. (center) Using ICP for 3D alignment and linear interpolation. (bottom) Using 2D alignment and linear interpolation. Notice the difference of maximum error values. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 6. Histograms showing the distribution of (a) minimum, (b) mean and (c) maximum squared error values computed from the comparison of images aligned using the original location of facial landmarks provided by the database, and locations corrupted by Gaussian noise with mean 0 and standard deviation ($\sigma$) ranging from 1 to 5.
Fig. 7. The results of (a) PGA and (b) SWPGA. Notice how the TPS provides a higher accuracy value than the nearest-neighbor, linear and natural-neighbor interpolations.
Acknowledgments

This work was sponsored by FAPERJ (E26/111.468/2011). Giancarlo was sponsored by a CAPES fellowship. We thank the Computer Vision Research Laboratory of the University of Notre Dame for the database used in this research. We thank Wu, Smith and Hancock for kindly providing the implementation of their gender classification technique, and the anonymous reviewers for their comments and insightful suggestions.
References

Azouz, Z.B., Rioux, M., Shu, C., Lepage, R., 2004. Analysis of human shape variation using volumetric techniques. In: Proc. of CASA, pp. 197–206.

Besl, P.J., McKay, N.D., 1992. A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14, 239–256.

Bookstein, F.L., 1989. Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 11, 567–585.

Chang, K., Bowyer, K., Flynn, P., 2003. Face recognition using 2D and 3D facial data. In: ACM Workshop on Multimodal User Authentication, pp. 25–32.

Guo, H., Jiang, J., Zhang, L., 2004. Building a 3D morphable face model by using thin plate splines for face reconstruction. In: Proc. of SINOBIOMETRICS, pp. 258–267.

Hartley, R.I., Zisserman, A., 2000. Multiple View Geometry in Computer Vision. Cambridge University Press.

Kakadiaris, I.A., Passalis, G., Toderici, G., Murtuza, M.N., Lu, Y., Karampatziakis, N., Theoharis, T., 2007. Three-dimensional face recognition in the presence of facial expressions: an annotated deformable model approach. IEEE Trans. Pattern Anal. Mach. Intell. 29, 640–649.

Kohavi, R., 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proc. of IJCAI, pp. 1137–1143.

Kohavi, R., Provost, F., 1998. Glossary of terms. Mach. Learn. 30, 271–274.

Matthews, B., 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 405, 442–451.

Moreno, A.B., Sanchez, A., Velez, J.F., Diaz, F.J., 2005. Face recognition using 3D local geometrical features: PCA vs. SVM. In: Proc. of ISPA, pp. 185–190.

Perakis, P., Theoharis, T., Passalis, G., Kakadiaris, I.A., 2009. Automatic 3D facial region retrieval from multi-pose facial datasets. In: Proc. of Eurographics Workshop on 3D Object Retrieval, pp. 37–44.

Policarpo, F., Oliveira, M.M., Comba, J., 2005. Real-time relief mapping on arbitrary polygonal surfaces. In: Proc. of ACM SIGGRAPH I3D, pp. 155–162.