Professional Documents
Culture Documents
LIVIU VLADUTU
Dublin City University
School of Computing
Glasnevin, Dublin 9
IRELAND
lvladutu@computing.dcu.ie
Abstract: The recognition of humans and their activities from video sequences is currently a very active area of
research because of its many applications in video surveillance, multimedia communications, medical diagnosis,
forensic research and sign language recognition. Our system is designed with the aim to precisely identify human
gestures for Irish Sign Language (ISL) recognition. The system is to be developed and implemented on a standard
personal computer (PC) connected to a colour video camera. The present paper tackles the problem of shape
recognition for deformable objects like human hands using modern classification techniques derived from artificial
intelligence.
N
C X
h ≤ min([D2 /∆2 ], n) + 1 (6)
X
J= (uij )m d2 (xj , ci ) (3) where [x] denotes the integer part of x.
i=1 j=1
where uij ∈ [0, 1] for all i and obeys the con- In this case, we can find an optimal weight vector w0
straints defined in the equation below: such that kw0 k2 is minimum (in order to maximize the
margin ∆ = kw20 k of Theorem 1) and yi ·(w0 ·xi +b) ≥
1, i = 1, . . . , l.
C
X The support vectors are those training examples
uik = 1, 1 ≤ k ≤ n that satisfy the equality, i.e. yi · (w0 · xi + b) = 1.
i=1 They define two hyperplanes. The one hyperplane
n
X goes through the support vectors of one class and the
0< uik < n, 1 ≤ i ≤ c other through the support vectors of the other class.
k=1 The distance between the two hyperplanes is maxi-
mized when the norm of the weight vector kw0 k is
where C is the number of clusters (3 as explained
minimum. This minimization can proceed by maxi-
above) and d is the distance norm (like Euclidean,
mizing the following function with respect to the vari-
Manhattan aso). The minimization of the objective
ables αi (Lagrange multipliers) [20]:
function with respect to membership values leads to
the following:
l l l
1
X 1 XX
uij = P (4) W (α) = αi − αi · αj · xi · xj · yi · yj (7)
C d2 (x ,c )
j i
1 2
k=1 ( d2 (xj ,c ) )
m−1 i=1 i=1 j=1
k
And the minimization of the objective function subject to the constraint: 0 ≤ αi . If αi > 0 then
with respect to the center of each cluster will gives xi corresponds to a support vector. The classification
us: of an unknown vector x is obtained by computing
PN m l
j=1 (uij ) xj
X
ci = PN (5) F (x) = sgn{w0 · x + b}, where w0 = αi · yi · xi (8)
m
j=1 (uij ) i=1
In the equation above m ∈ [1, ∞] is the fuzzifier. and the sum accounts only Ns ≤ l nonzero sup-
Therefore, in the end the algorithm looks like in the port vectors (i.e. training set vectors xi whose αi are
Table 1. nonzero). Clearly, after the training, the classifica-
tion can be accomplished efficiently by taking the dot
2.4 Short introduction to SVM product of the optimum weight vector w0 with the in-
put vector x.
In the current framework of Machine Learning and
l
data understanding we considered the approach de- 1 X
rived from statistical learning theory of SVM (support minimize w · w + C · ξi (9)
2
vector machines). i=1
Suppose we are given a set of examples The case that the data is not linearly separable is
(x1 , y1 ), . . . , (xl , yl ), xi ∈ X, yi ∈ {±1} and we as- handled by introducing slack variables (ξ1 , ξ2 , . . . , ξl )
sume that the two classes of the classification problem with ξi ≥ 0 [21] such that, yi (w · xi + b) ≥ 1 − ξi , i =
are linearly separable. 1, . . . , l. The introduction of the variables ξi , allows
misclassified points, which have their corresponding
Table 2: Generalization Performance of mySVM clas-
ξi > 1. Thus, li=1 ξi is an upper bound on the num-
P
sifier with polynomial kernels.
ber of training errors. The corresponding generaliza-
Current Performance Description of
tion of the concept of optimal separating hyperplane is Experiment Vector the experiment
obtained by the solution of the optimization problem RS1 6.03% Region-shape only
given by equation (9) above, subject to: RS1 & CLD1 6.39% Region-shape and CLD
coeffs real images
yi · (w · ξi + b) ≥ 1 − ξi and ξi ≥ 0, i = 1, . . . , l (10)
RS2 7.64 % Region-shape coeffs for
The control of the learning capacity is achieved real and virtual signers
by the minimization of the first term of (9) while the
purpose of the second term is to punish for misclassi-
have used a Java version ([15]) of an implementation
fication errors. The parameter C is a kind of regular-
of the Support Vector Machine called mySVM devel-
ization parameter, that controls the tradeoff between
oped by Stefan Rüping. It is based on the optimization
learning capacity and training set errors. Clearly, a
algorithm of SV M light as described in [16]. mySVM
large C corresponds to assigning a higher penalty to
can be used for pattern recognition, regression and
errors.
distribution estimation. In order to cope with the rela-
Finally, the case of nonlinear Support Vector Ma-
tively small number of examples, a cross-validation
chines should be considered. The input data in this
(see, [19] with a factor of 25 was chosen. Sev-
case are mapped into a high dimensional feature space
eral types of kernels were tested (neural, polynomial,
through some nonlinear mapping Φ chosen a priori
Anova, epanechnikov, gaussian-combination, multi-
[20]. The optimal separating hyperplane is then con-
quadric, based on radial-basis functions) but, our ex-
structed in this space. As shown at 5.5 in [13] the
perience [10] is once more confirmed, that (most prob-
corresponding optimization problem is obtained from
ably) the SVM-classifiers based on polynomial ker-
(7) by substituting x by its mapping z = Φ(x) in the
nels are the best for classification problems. There-
feature space, i.e. is the maximization of W (α):
fore, all the results expressed in the table 2 correspond
l l l to the polynomial-kernel supervised learning.
X 1 XX
W (α) = αi − αi · αj ·
2
i=1 i=1 j=1
(Φ(xi ) · Φ(xj )) · yi · yj
3 Experimental results
The experience acquired in the group has shown that
subject to: there are many factors that can influence the qual-
l ity of image understanding, like: the differences be-
X
yi · αi = 0 tween signers clothing, between the lighting sources,
between the skin of the humans or due to the mo-
i=1
tion blur. Therefore the first step was to compare the
∀i : 0 ≤ αi ≤ C
classification performance of our algorithm for two
classes of input data, only 35 coefficients (of the RS-
2.5 The proposed classifier descriptor), or 47 coefficients (of the RS and CLD de-
scriptors). The performance vector (overall) resulted
The number of training examples is denoted by l.In from the confusion matrix of the results is presented
our case l was always 280 (10 frames for each of in the table 2, and it shows that by adding the 12-extra
the 28 one-handed gestures of ISL). α is a vector coefficients corresponding to CLD, the classification
of l variables, where each component αi corresponds error is only slightly increased (by 0.35%). That will
to a training example (xi , yi ). xi represents the fea- allow to gather more information in our training and
tures vector which is formed by either 35 (only the testing database (more real and virtual signers) and
Region-shape coefficients) or 47 coefficients corre- to quantify the differences enumerated above in only
sponding to both the Region-shape (RS) and the CLD few coefficients at virtually no expenses. The second
descriptors. A fast Windows implementation (.dll’s) step of our experiment used the same number of im-
for the extraction of the video descriptors was chosen, ages but, for the same letter/ sign expressed were used
([17]) that can be included in our final real-time Sign- ten static images and ten images extracted from the
Language understanding system. Although there are Poser-generated video-stream- corresponding to the
many available implementations in several program- same sign, like in the Figure 1. The performance is
ming languages (like Matlab, C++, Java, Lisp aso), I given in the third line of the table.
4 Conclusions [8] http://standards.iso.org/ittf/PubliclyAvailableStan
dards/MPEG-7 schema files/
The results explained in the previous sections show
[9] Anupam Joshi, Sansanee Auephanwiriyakul,
that a limited of gestures (executed by a human or a
and Raghu Krishnapuram, On Fuzzy Clustering
robot...) can be learned and understood by combining
and Content Based Access to Networked Video
the shape recognition (hand shapes playing the role of
Databases, Proceedings of the Workshop on Re-
letters in an alphabet) detailed in the current work with
search Issues in Database Engineering, Page:
an understanding of the gesture dynamics represented
42–47 ,1998.
in the feature space (like PCA). In this latter approach,
[10] S. Papadimitriou, L. Vladutu, S. Mavroudi and
the images are represented in the PCA-space and the
A. Bezerianos, Ischemia Detection with a Self-
gestures are represented in a nonlinear manifold, see
Organizing Map Supplemented by Supervised
[18]. The fast procedure exposed- it takes approxi-
learning, IEEE Transactions on Neural Net-
mately 10 msec (average classification time), and ap-
works 12 (2001), 503–515.
prox. 1 second for the VDE-feature extraction on a
PC (2.4 GHz) it’s considered to be a good choice for [11] Wu Hai and Alistair Sutherland, Irish Sign
other researchers in the field. Language Recognition Using Hierarchical PCA,
Irish Machine Vision and Image Processing
Acknowledgements: Conference (IMVIP 2001), National University
The research was supported by the Science Founda- of Ireland, Maynooth, 5–7 September 2001.
tion of Ireland (SFI) and the author is also thank- [12] G. Awad, J. Han and A. Suther-
ful to Dr. Sara Morrisey for the database of syn- land, Hand and face tracking demo:
thetic video streams and to Lecturer Alistair Suther- http://www.youtube.com/watch?v=exbGdHpFiW0
land (both from Dublin City University)- for being a [13] Vladimir Vapnik, The Nature of Statistical
good group leader. Learning Theory, 2nd Edition, Springer –Verlag,
Berlin–Heidelberg–New York–Tokyo 2000.
[14] ”The Standard Dictionary of Irish
References: Sign Language”, by microBooks
[1] Poser official site by e-Frontier America Inc : Ltd. on DVD,2006 by NAD,
http://www.e-frontier.com/. www.deafhear.ie/documents/pdf/1951.pdf.
[2] Junwei Han, George Awad, Alistair Suther- [15] Open-Source Data Mining with the
land, and Hai Wu, Automatic Skin Segmen- Java Software RapidMiner, http://rapid-
tation for Gesture Recognition Combining Re- i.com/content/blogcategory/38/69/
gion and Support Vector Machine Active Learn- [16] Joachims, Thorsten, Making large-Scale SVM
ing, Proceedings of the 7th International Con- Learning Practical, In Advances in Kernel Meth-
ference on Automatic Face and Gesture Recog- ods - Support Vector Learning, chapter 11, MIT
nition, 2006, pp. 237–242. Press, 1999.
[17] Giorgos Tolias, Visual Descriptor Applications,
[3] George M. Awad, A Framework for Sign Lan-
Semantic Multimedia Analysis Group- NTUA,
guage Recognition using Support Vector Ma-
Greece: http://image.ntua.gr/smag/tools/vde.
chines and Active Learning for Skin Segmenta-
tion and Boosted Temporal Sub-units, PhD The- [18] L. Vladutu, A. Sutherland, Gesture Analysis
sis, Dublin City University, 2007. of Deaf People Language using nonlinear anal-
ysis of manifolds, , July 2007, SFI Conference,
[4] I. T. Jolliffe, Principal Component Analysis, Dublin, Ireland.
Springer Verlag, 2002. [19] Kohavi, Ron, ”A study of cross-validation and
[5] J. C. Dunn, A Fuzzy Relative of the ISODATA bootstrap for accuracy estimation and model se-
Process and Its Use in Detecting Compact Well- lection”, Proceedings of the 14th International
Separated Clusters, Cybernetics and Systems: Joint Conference on Artificial Intelligence 2
An International Journal, 1087-6553, Volume 3, (12): 1137-1143,(Morgan Kaufmann, San Ma-
Issue 3, 1973, pp. 32 - 57. teo),1995.
[6] J.C. Bezdek, Pattern Recognition with Fuzzy [20] V. N. Vapnik, Statistical learning theory, Wiley-
Objective Function Algorithms, Plenum Press, Interscience, 1998.
New York, 1981 [21] C. Cortes and V. Vapnik, Support vector net-
[7] J. Kovac, P. Peer and F. Solina, Human Skin works, Machine Learning 20:3,1995, pp. 273–
Colour Clustering for Face Detection, Proc. of 297.
EUROCON 2003, Finland, pp. 144–148.