
Two-Handed Hand Gesture Recognition for Bangla

Sign Language using LDA and ANN


Rahat Yasir
Computer Science and Engineering
North South University
Dhaka, Bangladesh
rahat.anindo@live.com

Riasat Azim Khan
Computer Science and Engineering
North South University
Dhaka, Bangladesh
riasat2274@gmail.com

Abstract: Sign language detection and recognition (SLDR) using computer vision is a very challenging task. In Bangladesh, there are around 2.4 million sign language users [16]. In this paper we focus on communicating with those users through computer vision. To this end, an efficient method is proposed consisting of several significant steps: skin detection, preprocessing, machine learning techniques such as PCA and LDA for feature extraction, and a neural network for training and testing the system. Various hand sign images are used to test the proposed method, and results are presented to demonstrate its effectiveness.

Keywords: YCbCr, PCA, LDA, back propagation neural network, sign language, hand gesture.

I. INTRODUCTION

According to sociolinguistic surveys, deaf, dumb, and sign language users are neglected by society. Over time this scenario has changed: in the modern world they are treated as imaginative, creative, and as intelligent as any other person. Their disabilities, however, remain the main obstacle to leading a normal social life. Instead of verbal communication, deaf and dumb people use sign language, a visual form of communication that combines hand shapes, orientation and movement of the hands, arms, or body, and facial expressions. Sign language is an organized collection of gestures, and gestures are usually understood as hand and body movements that pass information from one person to another. In this paper we work with two-handed gestures. Very little work has been done in the field of Bangla sign language detection, even though the Bangla sign language community is the largest language-based minority community in Bangladesh.

We focused our research on the deaf and dumb people who live in Bangladesh. This paper proposes a framework for Bangla SLDR: a system able to recognize the various alphabets of Bangla Sign Language for human-computer interaction, giving more accurate results in the least possible time.

II. RELATED WORK

Different approaches have been used by different researchers for the recognition of various hand gestures, and these have been implemented in different fields. Some of the approaches were vision-based approaches, data-glove-based approaches, soft computing approaches such as artificial neural networks, fuzzy logic, and genetic algorithms, and others such as PCA and canonical analysis. Many researchers [1-3] used a skin filtering technique for hand segmentation. Fang [9] used the Adaptive Boost algorithm, which can detect not only a single hand but also overlapped hands. In [10-11], external aids such as data gloves and color gloves were used by the researchers for segmentation.

Swapnil [16], in his paper on face detection using color models, used the YCbCr color space to detect skin color in an image. We use a similar process to detect skin and extract it from the image in order to recognize the hand gesture. In "Sign Language Finger Alphabet Recognition from Gabor-PCA Representation of Hand Gestures" [4] by Amin, M.A. et al., classification is performed with fuzzy c-means clustering on lower-dimensional data obtained from principal component analysis (PCA) of Gabor representations of hand gesture images. More methods of hand gesture recognition based on Gabor filters and support vector machines (SVM) can be found in "Vision-Based Hand Gesture Recognition Using PCA + Gabor Filters and SVM" [5] by Deng-Yuan Huang et al. In "Hand alphabet recognition using morphological PCA and neural networks" [6], Lamar, M.V. et al. used a morphological principal component analysis of pixel positions to describe hand postures in colored images.

978-1-4799-6399-7/14/$31.00 ©2014 IEEE.

III. METHODOLOGIES

The algorithms chosen for implementation are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). The basic idea behind PCA is dimensionality reduction: transforming the images into a subspace that is uncorrelated. Linear discriminant analysis, in contrast, works to minimize the within-class scatter while maximizing the between-class distance. We used skin color detection for preprocessing of the images. The images used for training and testing were collected by us, and we used neural networks to test which feature extraction scheme is better.

The process of the two-handed hand gesture recognition system consists of four phases:

a) Skin extraction
b) Feature extraction
c) Training
d) Recognition

The whole procedure, shown in Fig. 1, runs as follows: input RGB image; skin detection using the YCbCr algorithm; binary gray scale conversion of the image; LDA feature extraction; ANN training on the dataset; testing and validation using a different dataset; and decision making.

Fig. 1. Flow diagram comprising the whole procedure of this system.

In the training phase, a person has to provide sample images of his or her hand gesture so that the reference template model, or database, can be built. The training images are 100x100 pixels, and we collected hand gestures for 15 Bangla letters from 22 different people, so in total we collected 330 sample images. We then detect the skin in each image, extract the features of these 330 images, and save them in a database.

IV. RGB IMAGE DATASET

We collected images for 15 Bangla letters from 22 different people, so in total we collected 330 RGB images for a new dataset of Bangla sign language. All the images were originally 561x290 pixels in size, so we first converted each image to 100x100 pixels.

Fig. 2. Bangla sign language for six Bangla letters: (a), (b), (c), (d), (e), (f).

V. YCBCR ALGORITHM FOR SKIN DETECTION

The most widely used color space for skin detection is YCbCr [14, 15]. The conversion from RGB to YCbCr is done using the following equations:

Y  = 0.299R + 0.587G + 0.114B          (5)
Cb = 128 + (-0.169R - 0.331G + 0.5B)   (6)
Cr = 128 + (0.5R - 0.419G - 0.081B)    (7)

Experimental results show that a skin pixel has a Cr value of about 100 and a Cb value of about 150. Each pixel is classified as a skin or non-skin pixel using Eq. (8).

Fig. 3. Skin detection of (a), (b), (c), (d), (e), (f) word images.

Results are shown in Fig. 3. The detected skin area is colored blue in the figure, while for an empty background picture no skin pixels are found. Checking an image for skin pixels took only 0.78 seconds.

Fig. 4. Normal background image.

VI. GRAY SCALE CONVERSION

After detecting the skin pixels, we converted the test image into a binary gray scale image, assigning the value 0 to skin pixels and the value 1 to background pixels. The resulting binary word images are shown in Fig. 5.

Fig. 5. (a), (b), (c), (d), (e), (f) word images in binary.

After converting to gray scale we can see that our process has successfully extracted the hand gesture from the image. If we convert the plain background image to gray scale, there is no skin pixel and no sign of a hand, and the output image is plain black.

Fig. 6. Background image after gray scale conversion.
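The RGB-to-YCbCr conversion and the binary skin mask described above can be sketched in a few lines of numpy. This is a minimal sketch: since Eq. (8) is not reproduced in the text, the Cb/Cr thresholds below are commonly cited literature values (which differ from the approximate values quoted above), and `skin_mask` is a hypothetical helper name, not the authors' code.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an HxWx3 RGB image (values 0-255) to Y, Cb, Cr planes
    using Eqs. (5)-(7)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + (-0.169 * r - 0.331 * g + 0.5 * b)
    cr = 128 + (0.5 * r - 0.419 * g - 0.081 * b)
    return y, cb, cr

def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Binary mask in the paper's convention: 0 for skin pixels, 1 for
    background. The Cb/Cr ranges are assumed literature thresholds,
    standing in for the paper's Eq. (8)."""
    _, cb, cr = rgb_to_ycbcr(rgb.astype(float))
    skin = (cb >= cb_range[0]) & (cb <= cb_range[1]) \
         & (cr >= cr_range[0]) & (cr <= cr_range[1])
    return np.where(skin, 0, 1).astype(np.uint8)
```

For an image with no skin-colored pixels, the mask is all ones, matching the plain output described for the empty background picture.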

VII. PROCESSING OF IMAGE DATA


We collected about 330 sample data sets from 22 different people, of which 65 are used for validation, 65 for testing, and 200 for training. These images are in RGB format, so we used the YCbCr algorithm to separate the skin from the background. The YCbCr image is then converted to gray scale to feed the computer with the data.
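The 200/65/65 partition described above can be sketched as follows. The paper does not say how samples were assigned to the three sets, so the random shuffling here is an assumption, and `split_dataset` is an illustrative helper name.

```python
import numpy as np

def split_dataset(samples, n_train=200, n_val=65, n_test=65, seed=0):
    """Shuffle the 330 samples and partition them into the
    training, validation, and testing sets used in this work."""
    assert len(samples) == n_train + n_val + n_test
    idx = np.random.default_rng(seed).permutation(len(samples))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test
```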
VIII. RAW IMAGE DATA
For learning and testing purposes, using the binary image is the most practical approach in this case, because the 0 values (white) represent the human hand gesture and the 1 values (black) represent the background. The gray scale image is converted to a 100 by 100 matrix, where 0 is assigned to skin pixels and 1 to the background and everything else. The raw image data is then passed through the training system without any feature extraction.
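Passing the raw image to the training system amounts to flattening each 100x100 binary matrix into a 10,000-element vector for the network input. A sketch (`image_to_feature_vector` is an illustrative helper, not the authors' code):

```python
import numpy as np

def image_to_feature_vector(binary_img):
    """Flatten a 100x100 binary image (0 = skin, 1 = background)
    into a 10,000-element vector for the network input."""
    assert binary_img.shape == (100, 100)
    return binary_img.reshape(-1).astype(np.float64)
```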
IX. PCA

Principal component analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. We applied principal component analysis to the raw image data, trained our system on the resulting features, and tested the system with a separate set of data.

X. LDA

Linear discriminant analysis (LDA) and the related Fisher's linear discriminant are methods used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification. We applied linear discriminant analysis to the raw image data, trained our system on the resulting features, and tested the system with a separate set of data.

XI. APPLIED NEURAL NETWORK

We used different approaches to extract image features, and the neural network in this work is a multi-layer perceptron with supervised learning. The training scheme is back propagation with a sum square error, calculated as

E = sum over j = 1..N of e_j^2, where e_j = (t_j - o_j),

N is the number of inputs, t_j is the desired target output, and o_j is the output obtained from the network. Each error e_j influences the total sum error of the network during the learning process: if an individual error is large, the sum error is large, implying poor learning. Learning in back propagation is done by minimizing the sum error, updating the weights with the method known as steepest descent, which can be simplified as

w_ij(new) = w_ij(old) + η δ_j o_i,

where η is the learning rate, δ_j is the signal error at node j in layer L, and o_i is the output of node i in layer L-1. The value of δ_j is calculated from the error at node j and the derivative of the node's activation function, propagated backwards from the output layer.

The network is a supervised network in which the target output is used to guide the learning process through error reduction, so the output nodes for each training pattern need to be well represented. As shown in Fig. 8, we used 22 neurons in this neural network, the total number of layers is 5, and the total number of inputs is 330. The network training process is illustrated in Figures 7 and 8, which show the flow diagram of the back propagation scheme.

Fig. 7. Training process.

Fig. 8. Our developed neural network for training and testing.
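The back propagation scheme above, with sum square error and steepest-descent weight updates, can be sketched as a tiny two-layer perceptron. The layer sizes, sigmoid activation, and learning rate here are illustrative assumptions; the paper's network has 5 layers and 22 neurons.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyMLP:
    """Minimal two-layer perceptron trained by back propagation with
    sum square error and steepest descent, as described above."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5):
        self.w1 = rng.normal(0, 0.5, (n_in, n_hidden))
        self.w2 = rng.normal(0, 0.5, (n_hidden, n_out))
        self.lr = lr

    def forward(self, x):
        self.h = sigmoid(x @ self.w1)       # hidden layer output
        self.o = sigmoid(self.h @ self.w2)  # network output
        return self.o

    def train_step(self, x, t):
        o = self.forward(x)
        e = t - o                           # e_j = (t_j - o_j)
        sse = np.sum(e ** 2)                # sum square error
        delta_o = e * o * (1 - o)           # signal error at output nodes
        delta_h = (delta_o @ self.w2.T) * self.h * (1 - self.h)
        # steepest descent: w_ij(new) = w_ij(old) + lr * delta_j * o_i
        self.w2 += self.lr * np.outer(self.h, delta_o)
        self.w1 += self.lr * np.outer(x, delta_h)
        return sse
```

Repeated calls to `train_step` drive the sum square error down, which is exactly the "minimizing the sum error through updating the weights" behavior described in the text.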

XII. RESULTS
We collected all the data ourselves, from 22 people, for 15 Bengali alphabets. For each symbol, about 17 images were used for training and at least 5 for testing. In total we collected about 330 sets of data, among which 200 are used for training, 65 for validation, and 65 for testing. The table below describes the results of the different methods on these data.

Symbol      Training images   Testing images   Raw Data %   PCA %   LDA %
Letter 1          17                5              68          37     100
Letter 2          17                5              66          33     100
Letter 3          17                5              54          34     100
Letter 4          17                5              59          39     100
Letter 5          17                5              72          28     100
Letter 6          17                5              57          30     100
Letter 7          17                5              68          35     100
Letter 8          17                5              67          33     100
Letter 9          17                5              63          32     100
Letter 10         17                5              68          31     100
Letter 11         17                5              67          34     100
Letter 12         17                5              60          37     100
Letter 13         17                5              70          39     100
Letter 14         17                5              69          37     100
Letter 15         17                5              62          40     100

Table 1. Bengali letter-wise success rate of our system.

Fig. 9. Success rate of RAW image data on our neural system.

A. Model 1
Here we take the 330 raw images, convert them to gray scale, and feed them directly to the neural network for training and testing: 200 sets of data are used for training, 65 for validation, and 65 for testing. The results for the raw data are given below.

Original pics, hidden nodes: 50

                         Run 1    Run 2    Run 3    Mean
Training (60%, 200)
  MSE                    0.002                      0.0054
  %E                     3.97                       2.183
Validation (20%, 65)
  MSE                    0.098    0.086    0.11     0.0904
  %E                     38.09    34.52    42.86    35.236
Testing (20%, 65)
  MSE                    0.096    0.086    0.074    0.0816
  %E                     40.48    35.75    29.75    32.855

Table 2. Segmentation of the RAW image dataset and test results.

About 200 images were used for training on the raw data and 65 each for validation and testing. A graphical representation of the success rate for the raw images is given in Fig. 9.

B. Model 2
This model uses principal component analysis (PCA) to extract the features used to train the system. After training, the validation and testing steps described below are carried out. As we can see, the success rate is not high for the PCA process. Here MSE is the mean square error and %E is the recognition error.

PCA features, hidden nodes: 50

                         Run 1    Run 2    Run 3    Run 4    Mean
Training (60%, 200)
  MSE                    0.12     0.002    0.109    0.139    0.0618
  %E                     69.44    57.14    70.63    36.546
Validation (20%, 65)
  MSE                    0.152    0.141    0.152    0.154    0.1471
  %E                     71.43    67.86    80.95    75       73.689
Testing (20%, 65)
  MSE                    0.15     0.143    0.151    0.156    0.1481
  %E                     72.62    71.43    77.38    72.62    73.572

Table 3. Segmentation of the PCA dataset and test results.

From the graphical representation we can see that the error on the testing data is about 73% and the success rate is only about 26%, which is very poor.

Fig. 10. Success rate of PCA on our neural system.

C. Model 3
In the third model, the LDA algorithm is used for feature extraction. A table of the training, validation, and testing results for LDA is given below.

LDA features, hidden nodes: 50

                         Run 1    Run 2    Run 3    Run 4    Mean
Training (60%, 200)
  MSE
  %E
Validation (20%, 65)
  MSE
  %E
Testing (20%, 65)
  MSE
  %E

Table 4. Segmentation of the LDA dataset and test results.

Here a graphical representation of the LDA process is given with the success rate and the error rate. We can see that LDA is the most successful feature extraction process in this case.
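The PCA and LDA feature-extraction steps compared above can be sketched with plain eigen-decompositions. This is a generic textbook formulation under assumed data shapes (one flattened image per row), not the authors' exact implementation:

```python
import numpy as np

def pca_features(X, k):
    """Project the rows of X onto the top-k principal components,
    i.e. the leading eigenvectors of the covariance matrix."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = Xc.T @ Xc / (len(X) - 1)             # covariance matrix
    vals, vecs = np.linalg.eigh(cov)
    W = vecs[:, np.argsort(vals)[::-1][:k]]    # top-k eigenvectors
    return Xc @ W

def lda_features(X, y, k):
    """Fisher LDA: maximize between-class scatter over
    within-class scatter, then project onto the top-k directions."""
    mean = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))    # within-class scatter
    Sb = np.zeros_like(Sw)                     # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
    # generalized eigenproblem: Sw^-1 Sb w = lambda w
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1][:k]
    return X @ vecs[:, order].real
```

The contrast in the tables above is consistent with the difference between the two criteria: PCA keeps directions of maximum variance regardless of class labels, while LDA explicitly seeks directions that separate the letter classes.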

Fig. 11. Success rate of LDA on our neural system.

XIII. COMPARISON BETWEEN 3 MODELS

As can be seen, the neural network classification works extremely well with the LDA features, while it performs poorly with the PCA features. The reason for the poor performance of PCA-NN might be overfitting: the neural network works well only on specific input data.

Fig. 12. Success rate comparison of the three models.

XIV. CONCLUSION
The purpose of this paper is to detect and recognize sign language using different techniques and to determine which is most accurate; here we have applied them to Bangla sign language. Very little work has been done specifically on Bangla sign language, so our goal was to find a better solution for detecting and recognizing it.

References
[1] T. Kapuscinski and M. Wysocki, "Hand Gesture Recognition for Man-Machine Interaction," Second Workshop on Robot Motion and Control, October 18-20, 2001, pp. 91-96.
[2] D. Y. Huang, W. C. Hu, and S. H. Chang, "Vision-based Hand Gesture Recognition Using PCA+Gabor Filters and SVM," IEEE Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2009.
[3] Manigandan M. and I. M. Jackin, "Wireless Vision based Mobile Robot control using Hand Gesture Recognition through Perceptual Color Space," IEEE International Conference on Advances in Computer Engineering, 2010, pp. 95-99.
[4] M. A. Amin and Hong Yan, "Sign Language Finger Alphabet Recognition from Gabor-PCA Representation of Hand Gestures," International Conference on Machine Learning and Cybernetics, 2007, Vol. 4.
[5] Deng-Yuan Huang, Wu-Chih Hu, and Sung-Hsiang Chang, "Vision-Based Hand Gesture Recognition Using PCA+Gabor Filters and SVM," Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP '09), 2009.
[6] M. V. Lamar, M. S. Bhuiyan, and A. Iwata, "Hand alphabet recognition using morphological PCA and neural networks," International Joint Conference on Neural Networks (IJCNN '99), 1999, Vol. 4.
[7] M. V. Lamar, M. S. Bhuiyan, and A. Iwata, "Hand gesture recognition using morphological principal component analysis and an improved CombNET-II," IEEE SMC '99 Conference Proceedings, 1999, Vol. 4.
[8] William T. Freeman and Michal Roth, "Orientation Histograms for Hand Gesture Recognition," IEEE Intl. Workshop on Automatic Face and Gesture Recognition, Zurich, June 1995.
[9] Y. Fang, K. Wang, J. Cheng, and H. Lu, "A Real-Time Hand Gesture Recognition Method," IEEE ICME, 2007, pp. 995-998.
[10] S. Saengsri, V. Niennattrakul, and C. A. Ratanamahatana, "TFRS: Thai Finger-Spelling Sign Language Recognition System," IEEE, 2012, pp. 457-462.
[11] J. H. Kim, N. D. Thang, and T. S. Kim, "3-D Hand Motion Tracking and Gesture Recognition Using a Data Glove," IEEE International Symposium on Industrial Electronics (ISIE), July 5-8, 2009, Seoul Olympic Parktel, Seoul, Korea, pp. 1013-1018.
[13] Bangla Sign Language Anthology: CDD's JOURNEY, Centre for Disability in Development (CDD), Bangladesh, 2009.
[14] J. Tang and J. Zhang, "Eye tracking based on grey prediction," First International Workshop on Education Technology and Computer Science, pp. 861-864, 2009.
[15] Z. Juan and Z. Jing-xiu, "A novel method of rapid eye location," 2nd International Conference on Bioinformatics and Biomedical Engineering, pp. 2004-2007, 2008.
[16] Swapnil V. Tathe and Sandipan P. Narote, "Face detection using color models," World Journal of Science and Technology, 2012, 2(4):182-185, ISSN: 2231-2587.
