Hand Gesture Recognition System
Submitted by:
Ahsan Ayyaz
2010-MC-09
Hassan Javaid
2010-MC-20
M. Usman Khan
2010-MC-41
Submitted to the faculty of the Mechatronics and Control Engineering Department of the University
of Engineering and Technology Lahore
in partial fulfillment of the requirements for the Degree of
Bachelor of Science
in
Mechatronics and Control Engineering
Internal Examiner
External Examiner
Declaration
I declare that the work contained in this thesis is my own, except where explicitly stated otherwise. In
addition, this work has not been submitted to obtain another degree or professional qualification.
Signed:
Date:
Acknowledgments
Thanks to Allah Almighty, who gave us the strength and courage to work on this project and to
achieve this goal with His grace and blessings. The completion of the project brings satisfaction, but it
would be incomplete without thanking the people who made it possible and whose constant support
crowned our efforts with success.
We would like to thank our project advisor, Mr. Ahsan Naeem, whose valuable suggestions and
guidance brought us to this point of achievement. His support was always with us in every situation
throughout the completion of the project. Finally, we offer our regards and prayers for all who helped
and supported us in this project.
Dedication
This project is dedicated to our beloved parents whose prayers, care and efforts made us
capable of reaching this greatest point of achievement.
Abstract
Hand gesture recognition is a way to create a useful, highly adaptive interface between
machines and their users. The recognition of gestures is difficult because gestures exhibit human
variability. Sign languages are used for communication and for human-machine interfaces, and there
are various types of systems and methods available for sign language recognition. Our approach is
robust and efficient for static hand gesture recognition. The main objective of this thesis is to propose
a system which is able to recognize the 26 static hand gestures of American Sign Language (ASL) for
the letters A-Z successfully, and which is also able to perform the classification on static images
correctly in real time. We propose a pattern recognition method that recognizes the symbols of the
ASL based on features extracted by the SIFT algorithm and classification by the MK-ROD algorithm.
A modified classification technique is also presented which drastically improved our results.
LIST OF TABLES
Table 1: Existing Systems
Table 2: MK-ROD SIFT Results (initial distRatio)
Table 3: MK-ROD SIFT Results, DistRatio = 0.65
Table 4: Matched key points and validity ratios for each gesture
Table 5: Matched key points and validity ratios for each gesture (second set)
Table 6: MK-ROD SIFT Results, DistRatio = 0.65, Changes Applied
Table 7: Offline testing results
Table 8: Online testing results
LIST OF FIGURES
Figure 1: Hardware Setup
Figure 2: Lab Environment
Figure 3: The effect of self-shadowing and cast shadowing
Figure 9: Final Result
Figure 10: Hand gesture recognition system with point pattern matching flowchart
Figure 13: Input image and detected hand with bounding box
Figure 14: Detected SIFT key points
Figure 15: Matched SIFT key points
Figure 16: Final classification result (MK-ROD)
Figure 17: GUI for the demonstration of the project
Figure 18: Gestures Set
CONTENTS
Declaration
Acknowledgments
Dedication
Abstract
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: Background
Chapter 3
Chapter 4: Methodology
Chapter 5: Experimentation
Chapter 6: Results and Conclusions
Appendices
CHAPTER # 1
INTRODUCTION:
In this project we will design and build a man-machine interface using a video camera to interpret
the American one-handed sign language alphabet and number gestures (plus others for additional keyboard
and mouse control).
The keyboard and mouse are currently the main interfaces between man and computer.
In other areas where 3D information is required, such as computer games, robotics and design, other
mechanical devices such as roller-balls, joysticks and data-gloves are used.
Humans communicate mainly by vision and sound; therefore, a man-machine interface would be
more intuitive if it made greater use of vision and audio recognition. Another advantage is that the user
can not only communicate from a distance but also needs no physical contact with the computer.
Moreover, unlike audio commands, a visual system would remain usable in noisy environments or in
situations where sound would cause a disturbance.
The visual system chosen was the recognition of hand gestures. The amount of computation
required to process hand gestures is much greater than that for mechanical devices; however, standard
desktop computers are now fast enough to make hand gesture recognition using computer vision a
viable proposition.
1.1 APPLICATIONS:
A gesture recognition system could be used in any of the following areas:
Man-machine interface: using hand gestures to control the computer mouse and/or keyboard
functions. An example of this, which has been implemented in this project, controls various
keyboard and mouse functions using gestures alone.
3D animation: Rapid and simple conversion of hand movements into 3D computer space for the
purposes of computer animation.
Visualization: Just as objects can be visually examined by rotating them with the hand, so it would
be advantageous if virtual 3D objects (displays on the computer screen) could be manipulated by
rotating the hand in space [Bretzner & Lindeberg, 1998].
Computer games: Using the hand to interact with computer games would be more natural for many
applications.
Control of mechanical systems (such as robotics): Using the hand to remotely control a
manipulator.
1.2 MOTIVATION:
In this modern world of technology, automation has not only provided human beings with comfort
and facilitation but has also opened many doors for the improvement of business and earning.
Technologies like gesture recognition and machine vision have revolutionized industry while creating
luxury and comfort in the lives of common people.
Gesture recognition is an area of keen interest in computer science. Recent developments are driving
the industry both in the domain of gaming and in biometrics. Our interest in alternate human-computer
interfaces, together with support from the faculty, motivated us to take up this project.
CHAPTER # 2
BACKGROUND:
The history of hand gesture recognition for computer control started with the invention of glove-based
control interfaces. Researchers realized that gestures inspired by sign language can be used to offer simple
commands for a computer interface. This gradually evolved with the development of more accurate
accelerometers, infrared cameras and even fiber optic bend-sensors (optical goniometers).
Some of those developments in glove-based systems eventually made it possible to realize computer-vision-based
recognition without any sensors attached to the glove: colored gloves, or gloves with uniquely colored
fingers for tracking, which are discussed here in the context of computer-vision-based gesture
recognition. Over the past 25 years, this evolution has resulted in many successful products that offer a fully
wireless connection with the least resistance to the wearer. This chapter discusses, in chronological order,
some fundamental approaches that significantly contributed to the expansion of the knowledge of hand
gesture recognition.
Gesture recognition is a topic in computer science and language technology with the goal of
interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion
or state but commonly originate from the face or hand. Current focuses in the field include emotion
recognition from the face and hand gesture recognition. Many approaches have been made using cameras
and computer vision algorithms to interpret sign language. However, the identification and recognition of
posture, gait, proxemics, and human behaviors is also the subject of gesture recognition techniques. Gesture
recognition can be seen as a way for computers to begin to understand human body language, thus building
a richer bridge between machines and humans than primitive text user interfaces or even GUIs (graphical
user interfaces), which still limit the majority of input to keyboard and mouse.
Gesture recognition enables humans to communicate with the machine (HMI) and interact naturally
without any mechanical devices. Using the concept of gesture recognition, it is possible to point a finger at
the computer screen so that the cursor will move accordingly. This could potentially make conventional
input devices such as mouse, keyboards and even touch-screens redundant.
2.1 USES:
Gesture recognition is useful for processing information from humans that is not conveyed through speech
or typing. There are also various types of gestures that can be identified by computers.
Sign language recognition. Just as speech recognition can transcribe speech to text, certain types
of gesture recognition software can transcribe the symbols represented through sign language into
text.
For socially assistive robotics. By using proper sensors (accelerometers and gyros) worn on the
body of a patient and by reading the values from those sensors, robots can assist in patient
rehabilitation. The best example can be stroke rehabilitation.
Directional indication through pointing. Pointing has a very specific purpose in our society, to
reference an object or location based on its position relative to ourselves. The use of gesture
recognition to determine where a person is pointing is useful for identifying the context of
statements or instructions. This application is of particular interest in the field of robotics.
Control through facial gestures. Controlling a computer through facial gestures is a useful
application of gesture recognition for users who may not physically be able to use a mouse or
keyboard. Eye tracking in particular may be of use for controlling cursor motion or focusing on
elements of a display.
Alternative computer interfaces. Foregoing the traditional keyboard and mouse setup to interact
with a computer, strong gesture recognition could allow users to accomplish frequent or common
tasks using hand or face gestures to a camera.
Immersive game technology. Gestures can be used to control interactions within video games to
try and make the game player's experience more interactive or immersive.
Virtual controllers. For systems where the act of finding or acquiring a physical controller could
require too much time, gestures can be used as an alternative control mechanism. Controlling
secondary devices in a car, or controlling a television set are examples of such usage.
Affective computing. In affective computing, gesture recognition is used in the process of
identifying emotional expression through computer systems.
Remote control. Through the use of gesture recognition, "remote control with the wave of a hand"
of various devices is possible. The signal must not only indicate the desired response, but also
which device to be controlled.
Wired gloves. These can provide input to the computer about the position and rotation of the hands
using magnetic or inertial tracking devices. Furthermore, some gloves can detect finger bending
with a high degree of accuracy (5-10 degrees), or even provide haptic feedback to the user, which
is a simulation of the sense of touch. The first commercially available hand-tracking glove-type
device was the Data Glove, a glove-type device which could detect hand position, movement and
finger bending. This uses fiber optic cables running down the back of the hand. Light pulses are
created and when the fingers are bent, light leaks through small cracks and the loss is registered,
giving an approximation of the hand pose.
Depth-aware cameras. Using specialized cameras such as structured light or time-of-flight
cameras, one can generate a depth map of what is being seen through the camera at a short range,
and use this data to approximate a 3d representation of what is being seen. These can be effective
for detection of hand gestures due to their short range capabilities.
Stereo cameras. Using two cameras whose relations to one another are known, a 3d representation
can be approximated by the output of the cameras. To get the cameras' relations, one can use a
positioning reference such as a lexian-stripe or infrared emitters. In combination with direct motion
measurement (6D-Vision) gestures can directly be detected.
Controller-based gestures. These controllers act as an extension of the body so that when gestures
are performed, some of their motion can be conveniently captured by software. Mouse gestures are
one such example, where the motion of the mouse is correlated to a symbol being drawn by a
person's hand, as is the Wii Remote or the Myo, which can study changes in acceleration over time
to represent gestures. Devices such as the LG Electronics Magic Wand, the Loop and the Scoop
use Hillcrest Labs' Freespace technology, which uses MEMS accelerometers, gyroscopes and other
sensors to translate gestures into cursor movement. The software also compensates for human
tremor and inadvertent movement. Audio Cubes are another example. The sensors of these smart
light emitting cubes can be used to sense hands and fingers as well as other objects nearby, and can
be used to process data. Most applications are in music and sound synthesis, but can be applied to
other fields.
Single camera. A standard 2D camera can be used for gesture recognition where the
resources or environment would not be convenient for other forms of image-based recognition. Earlier
it was thought that a single camera may not be as effective as stereo or depth-aware cameras, but
some companies are challenging this theory. Software-based gesture recognition technology using
a standard 2D camera that can detect robust hand gestures and hand signs, as well as track hands or
fingertips at high accuracy, has already been embedded in Lenovo's Yoga ultrabooks, Pantech's
Vega LTE smartphones, and Hisense's Smart TV models, among other devices.
Table 1: Existing Systems

Paper | Method | Gestures | Restrictions | Training data | Accuracy
[Bauer & Hienz, 2000] | Hidden Markov Models | 97 (general) | Multicolored gloves | 7 hours of signing | -
[Starner, Weaver & Pentland, 1998] | Hidden Markov Models | 40 (general) | No gloves | 400 training sentences | 97.6%
[Bowden & Sarhadi, 2000] | Linear approximation to nonlinear point distribution models | 26 (static) | Blue screen; no gloves | 7441 images | -
[David Lowe] | Scale Invariant Feature Transform (SIFT) | 24 (static) | Wrist band | 100 examples per gesture | 99.1%
CHAPTER # 3
An optical method has been chosen, since this is more practical (many modern computers come with a
camera attached), cost-effective, and has no moving parts, so it is less likely to be damaged through use.
The first step in any recognition system is collection of relevant data. In this case the raw image information
will have to be processed to differentiate the skin of the hand (and various markers) from the background.
Once the data has been collected it is then possible to use prior information about the hand (for example,
the fingers are always separated from the wrist by the palm) to refine the data and remove as much noise
as possible. This step is important because, as the number of gestures to be distinguished increases, the
collected data has to be increasingly accurate and noise-free in order to permit recognition.
The next step will be to take the refined data and determine what gesture it represents. Any recognition
system will have to simplify the data to allow calculation in a reasonable amount of time (the target
recognition speed for a set of 36 gestures is 25 frames per second). Obvious ways to simplify the data include
translating, rotating and scaling the hand so that it is always presented to the recognition system with the
same position, orientation and effective hand-camera distance.
Figure 3: The effect of self-shadowing (A) and cast shadowing (B). The top three images were lit by
a single light source situated off to the left. A self-shadowing effect can be seen on all three, especially
marked on the right image where the hand is angled away from the source. The bottom three images
are more uniformly lit, with little self-shadowing. Cast shadows do not affect the skin for any of the
images and therefore should not degrade detection. Note how an increase of illumination in the
bottom three images results in a greater contrast between skin and background.
However, since this system is intended to be used by the consumer, it would be a disadvantage if
special lighting equipment was required. It was decided to attempt to extract the hand and marker
information using standard room lighting (in this case a 100 watt bulb and shade mounted on the ceiling).
This would permit the system to be used in a non-specialist environment.
Camera orientation: It is important to carefully choose the direction in which the camera points to
permit an easy choice of background. The two realistic options are to point the camera towards a wall or
towards the floor (or desktop). However, since the lighting was a single overhead bulb, light intensity would
be highest and shadowing effects least if the camera pointed downwards.
Background: In order to maximize differentiation it is important that the color of the background
differs as much as possible from that of the skin. The floor color in the project room was a dull brown. It
was decided that this color would suffice initially.
CHAPTER # 4
METHODOLOGY:
The methodology of the project consists of the following steps.
1. Binarization: The input image is converted into a binary image, i.e. an image whose pixels take
only the intensity values one or zero.
2. Marking Skin Pixels: The RGB input image is used for marking the skin pixels. The image is
first converted to the YCbCr color space, and thresholding is then applied to the chroma channels
with the following ranges:
67 <= Cb <= 137
133 <= Cr <= 173
Pixels which lie in these ranges are marked as BLUE pixels. The YCbCr image is then converted
back to the RGB color space, as sketched below.
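A minimal MATLAB sketch of this skin-marking step follows; it assumes an RGB input frame, and the function and variable names are ours rather than the project's original code.

% Sketch of step 2: mark skin pixels by thresholding the chroma
% channels in the YCbCr color space (ranges taken from the text above).
function mask = markSkinPixels(rgbImage)
    ycbcr = rgb2ycbcr(rgbImage);         % convert RGB to YCbCr
    Cb = ycbcr(:, :, 2);
    Cr = ycbcr(:, :, 3);
    mask = (Cb >= 67) & (Cb <= 137) & ...
           (Cr >= 133) & (Cr <= 173);    % skin pixels -> logical true
end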
3. Morphological Operations: The binary images were further processed through the following
morphological operations:
Cleaning (removal of isolated pixels)
Closing
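A hedged sketch of these operations in MATLAB is given below; the structuring-element size is an assumption, not a value given in the text.

% Step 3 sketch: clean isolated pixels, then close small gaps in the mask.
cleaned = bwmorph(mask, 'clean');              % remove isolated foreground pixels
closed  = imclose(cleaned, strel('disk', 5));  % morphological closing (radius assumed)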
4. Hand Detection: vision.BlobAnalysis, a Computer Vision Toolbox System object, is used to
detect blobs in a binary image and output regional properties of each blob, such as:
Area
Centroid
Bounding Box
The centroid is displayed as a blue star (*) and the bounding box is drawn in yellow.
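A minimal sketch of this step using the Computer Vision Toolbox is shown below; the minimum blob area is an assumed parameter, and insertMarker/insertShape are used here only to reproduce the blue star and yellow box described above.

% Step 4 sketch: detect the hand blob and draw its centroid and bounding box.
blobAnalyser = vision.BlobAnalysis( ...
    'AreaOutputPort',        true, ...
    'CentroidOutputPort',    true, ...
    'BoundingBoxOutputPort', true, ...
    'MinimumBlobArea',       500);             % assumed minimum hand area
[area, centroid, bbox] = step(blobAnalyser, closed);
annotated = insertMarker(rgbImage, centroid, 'star', 'Color', 'blue');
annotated = insertShape(annotated, 'Rectangle', bbox, 'Color', 'yellow');
imshow(annotated);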
Several approaches were considered for recognizing the segmented hand:
Area metric
Radial length signature
Template matching in the canonical frame, which proved convenient to use and more practical.
Figure 10: Hand gesture recognition system with point pattern matching flowchart
The working of the point pattern matching algorithm (Figure 10) is as follows:
1. Take a test image.
2. Pre-process the test image.
3. Initialize distRatio = 0.65 and threshold = 0.035.
4. Run the SIFT match algorithm.
5. Key point matching starts its execution using the threshold. It gets the key points matched
between the test image and all 36 trained images, and we get the validity ratio for each.
6. Check whether we got more than one result.
7. If we get more than one result, increment the SIFT distRatio by 0.05 and the threshold by 0.005,
and repeat steps 4 to 7.
8. If we get only one result, display the result.
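The loop structure can be sketched in MATLAB roughly as follows; mkRodScore is an assumed helper standing in for the project's own SIFT-match-plus-MK-ROD routine, and the iteration cap is our assumption to guarantee termination.

% Sketch of the iterative point pattern matching loop (steps 3-8 above).
distRatio  = 0.65;
threshold  = 0.035;
candidates = 1:36;                        % indices of the 36 trained images
for pass = 1:10                           % assumed cap so the loop terminates
    scores = zeros(1, numel(candidates));
    for k = 1:numel(candidates)
        scores(k) = mkRodScore(testImage, trainImages{candidates(k)}, ...
                               distRatio, threshold);
    end
    candidates = candidates(scores == max(scores));  % keep the best results
    if numel(candidates) == 1
        break;                            % step 8: a single result remains
    end
    distRatio = distRatio + 0.05;         % step 7: adjust and repeat
    threshold = threshold + 0.005;
end
disp(candidates);                         % index of the recognized gesture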
IMPLEMENTATION:
Implementation of the Point Pattern Matching Algorithm:
During the test implementation of the hand gesture recognition system, the point pattern matching algorithm
is executed. First, a test image is taken from the database or from the webcam. The point pattern matching
algorithm then starts its execution to find the matching key points between the test image and the training
database images. After executing this algorithm, the system recognizes the ASL input (query) image by
comparing the test image with all the database images and outputs its equivalent ASCII representation. The
algorithm is implemented in two parts.
i)
SIFT Algorithm
For any object in an image, points of interest of that object can be extracted to provide a "feature
description" of the object. This description, extracted from a training image, can then be used to locate and
identify the object in a test image containing many other objects. For accurate and reliable recognition, the
features extracted from the training image must be detectable even under changes in image scale, noise and
illumination. Also, the relative positions between them in the original image shouldn't change from one
image to another. SIFT detects and uses a much larger number of features from the images, which reduces
the contribution of the errors caused by these local variations in the average error of all feature matching
errors.
The SIFT algorithm consists of the following steps:
Constructing a scale space.
LoG (Laplacian of Gaussian) approximation.
Finding key-points.
Getting rid of bad key-points (edges and low-contrast regions).
Assigning an orientation to the key-points.
Generating SIFT features.
During the test implementation, the point pattern matching algorithm uses the SIFT algorithm to find the
key-points of the images. These key-points are the scale-invariant features located near the high-contrast
regions of the image that can be used to distinguish them. In the SIFT algorithm, image1 and image2 are
taken as the two images to match; in our case the first image is one of the database images and image2 is
the input (query) image. distRatio is a parameter of the SIFT matching algorithm. In the original
implementation this parameter is set as a constant, but for our algorithm's recursive refinement we made it
a variable parameter. threshold is the threshold value for the MK-ROD algorithm.
Here, to find the SIFT key-points of an image, the function sift is called, which returns the key-point
descriptors together with their locations in the given image.
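In Lowe's reference MATLAB implementation (the SIFT demo code referenced in Appendix C), the call looks roughly as follows; the file names are placeholders, not the project's actual file names.

% Key-point extraction with Lowe's demo implementation, whose sift()
% returns the image, the descriptors and the key-point locations.
[im1, descriptors1, locations1] = sift('database_A.pgm');  % a database image
[im2, descriptors2, locations2] = sift('query.pgm');       % input (query) image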
ii)
MK-ROD Algorithm:
The MK-ROD algorithm is used to find the validity ratio. The figure below shows the two images used for
finding the validity ratio, a representation of (a) a trained database image and (b) a test input image with
key points, where:
C denotes the center point of the matched key points;
D denotes the distance mask;
T denotes the number of the test image to match;
M denotes the number of matched points; and 1, 2, 3 are the key points.
The procedure to find the validity ratio of one database image versus the test input image is as follows:
once the distance ratios have been computed, mask the distances by taking the absolute differences of the
corresponding ratios and keeping those which are below the algorithm's threshold. This operation is done
in order to determine the similar pattern of the matched key points about the center of the matched key
points. Matched key points whose absolute ratio difference falls below the given threshold are treated as
valid matched key points.
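A hedged MATLAB sketch of this validity test follows. The normalization direction of the distance ratios is one plausible reading of the procedure, and dbPts/inPts are assumed N-by-2 arrays holding the coordinates of the N matched key points in the database and input images.

% MK-ROD sketch: fraction of matched key points whose normalized distance
% from the match center agrees between database and input image.
function ratio = validityRatio(dbPts, inPts, threshold)
    dDb = vecnorm(dbPts - mean(dbPts, 1), 2, 2);  % distances from center C
    dIn = vecnorm(inPts - mean(inPts, 1), 2, 2);
    rDb = dDb / sum(dDb);                         % normalized distance ratios
    rIn = dIn / sum(dIn);
    valid = abs(rDb - rIn) < threshold;           % below threshold -> valid
    ratio = sum(valid) / numel(valid);            % valid / total matches
end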
CHAPTER # 5
EXPERIMENTATION:
Our experimentation was performed in a controlled environment in the embedded systems lab. Hand gestures
were acquired through the camera, and further image processing and classification were done through a
software program on the computer.
We used the Scale Invariant Feature Transform (SIFT) algorithm along with MK-ROD to properly detect
and recognize a particular gesture: SIFT was used for extracting features and MK-ROD was used as the
classifier.
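Putting the stages together, one recognition cycle can be sketched as follows. This is a hedged outline only: markSkinPixels, detectHand and recognizeGesture are assumed helper names for the stages described in Chapter 4, and the webcam/snapshot calls require MATLAB's USB webcam support package.

% End-to-end sketch of one recognition cycle in the experiments.
cam    = webcam;                        % acquire frames from the USB camera
frame  = snapshot(cam);                 % grab a single RGB frame
mask   = markSkinPixels(frame);         % YCbCr skin segmentation
hand   = detectHand(frame, mask);       % blob analysis + cropped hand image
letter = recognizeGesture(hand);        % SIFT matching + MK-ROD classification
fprintf('Recognized letter: %c\n', letter);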
Figure 13 shows the input image and the hand detected from the image by masking the skin samples and then
drawing the bounding box around it. The figures on the left were used for further processing.
Figure 14 shows the SIFT key points that were detected. Numerous SIFT key points were detected; some of them
result from noise in the texture and cannot be used, so they have to be removed.
Image 1 corresponds to the input image and image 2 corresponds to the image in the database. The SIFT key
points are numerous, but most of them are outliers and will be excluded in the match. This matching is done
by applying the ratio test to the angles obtained from descriptor dot products, which for small angles closely
approximates the Euclidean distance ratio.
An important thing to mention about this pseudo code is that we have not used the Euclidean distance for
calculating the distances. Euclidean distance would have been the best option for offline processing, but as
the main objective of our project was to detect the gesture in real time and then display the result, for
reasons of efficiency we opted to calculate the dot product of the descriptors and compare angles instead.
From the math it was seen that, for small angles, the ratio of the angles obtained from the descriptor dot
products is a close approximation to the ratio of Euclidean distances, so this option cost little accuracy
while being much cheaper to compute.
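Following Lowe's demo matching code, the ratio test on angles can be written as below; descriptors1 and descriptors2 are the unit-length SIFT descriptor matrices from the sketch in the previous chapter.

% Ratio-test matching via dot products: for unit descriptors the angle
% acos(d1 . d2) is a monotonic stand-in for Euclidean distance, so one
% matrix multiply replaces all pairwise distance computations.
dotProds = descriptors1 * descriptors2';      % cosines of descriptor angles
[angles, idx] = sort(acos(dotProds), 2);      % ascending angles per key point
% Keep a match only when the nearest neighbor is clearly better than the
% second nearest one (Lowe's distRatio test).
isMatch  = angles(:, 1) < distRatio * angles(:, 2);
matchIdx = idx(isMatch, 1);                   % matched key points in image 2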
Figure 15 shows the matched key points found after calculating the distances of each of the key points as proposed
in the algorithm; only those points are kept and shown whose distance is less than the distance ratio (which is
defined at the beginning of the process).
Now, after feature extraction and matching, we have to calculate the validity ratio of the results obtained.
The validity ratio is given as:

validity ratio = (number of valid matched key points) / (total number of matched key points)

It is computed for each database image as follows:
1. Compute d1, the sum of the distances of all of the matched key points in the database image from
their center.
2. Compute d2, the sum of the distances of all of the matched key points in the input image from their
center.
3. Calculate the distance ratios of the database image by dividing d1 by each of its matched key-point
distances.
4. Calculate the distance ratios of the input image in the same way using d2.
5. Mark a matched key point as valid when the absolute difference of its two distance ratios is below
the algorithm's threshold.

For example, if 24 of the 30 matched key points are valid, the validity ratio is 24/30 = 0.80 (compare the
first row of Table 4). The validity ratio corresponding to each database image is calculated, and low-scoring
results are removed after each iteration; when only a single result remains, that result is fed to the output.
The classification step comes next; its code is given in the next section.
Figure 16 shows the final result after classification through the MK-ROD algorithm. The image on the left is the
input and the right image is its corresponding match in the database.
The final result of the match shows that MK-ROD successfully classified the gesture. However, its results on
real-time input were not as accurate, as explained in the next section.
Table 2: MK-ROD SIFT Results (initial distRatio)

Input | Most Recurring False-Match | Percent Accuracy (>= 50% is correct) | Correct/Incorrect
A | E | 42% | Incorrect
B | E | 33% | Incorrect
C | C | 64% | Correct
D | G | 23% | Incorrect
E | C | 14% | Incorrect
F | G | 43% | Incorrect
G | W | 22% | Incorrect
H | T | 11% | Incorrect
I | N | 13% | Incorrect
J | J | 52% | Correct
K | X | 10% | Incorrect
L | P | 10% | Incorrect
M | J | 9% | Incorrect
N | A | 7% | Incorrect
O | G | 23% | Incorrect
P | Q | 21% | Incorrect
Q | E | 22% | Incorrect
R | F | 15% | Incorrect
S | A | 5% | Incorrect
T | E | 19% | Incorrect
U | Z | 20% | Incorrect
V | C | 21% | Incorrect
W | M | 18% | Incorrect
X | E | 15% | Incorrect
Y | W | 8% | Incorrect
Z | Q | 38% | Incorrect

NumCorrect: 2 | NumIncorrect: 24 | Total: 26 | Accuracy: 8%
Average Accuracy of Each Gesture: 22%
After changing the initial value of the distance ratio, the results of the MK-ROD classifier improved, as
shown in the table below.
Table 3: MK-ROD SIFT Results, DistRatio = 0.65
Input | Most Recurring False-Match | Percent Accuracy (>= 50% is correct) | Correct/Incorrect
A | A | 56% | Correct
B | P | 46% | Incorrect
C | E | 36% | Incorrect
D | O | 22% | Incorrect
E | I | 17% | Incorrect
F | F | 55% | Correct
G | W | 32% | Incorrect
H | W | 22% | Incorrect
I | S | 44% | Incorrect
J | J | 60% | Correct
K | X | 10% | Incorrect
L | U | 16% | Incorrect
M | M | 50% | Correct
N | X | 42% | Incorrect
O | E | 22% | Incorrect
P | P | 66% | Correct
Q | Q | 51% | Correct
R | R | 50% | Correct
S | T | 20% | Incorrect
T | T | 52% | Correct
U | V | 28% | Incorrect
V | V | 63% | Correct
W | X | 30% | Incorrect
X | T | 30% | Incorrect
Y | X | 40% | Incorrect
Z | Z | 68% | Correct

NumCorrect: 10 | NumIncorrect: 16 | Total: 26 | Accuracy: 38%
Average Accuracy of Each Gesture: 40%
The increase in accuracy due to the decrease in the value of distRatio arises because the matcher now keeps
only those key points which correspond very closely to their nearest neighbors (details can be seen in the
pseudo code). As a result the accuracy increased to 40%, but this was still not a good result: the MK-ROD
classifier was detecting some false positives, so the classification was modified to improve the results.
Table 4: Matched key points and validity ratios for each gesture

num | Letter | ind | # MK | # Valid MK | Validity Ratio
1 | A | 1 | 30 | 24 | 0.80
2 | B | 1 | 40 | 35 | 0.88
3 | C | 1 | 44 | 30 | 0.68
4 | D | 1 | 43 | 31 | 0.72
5 | E | 1 | 36 | 30 | 0.83
6 | F | 1 | 31 | 29 | 0.94
7 | G | 1 | 27 | 23 | 0.85
8 | H | 1 | 35 | 30 | 0.86
9 | I | 1 | 32 | 24 | 0.75
10 | J | 1 | 35 | 33 | 0.94
11 | K | 1 | 45 | 40 | 0.89
12 | L | 1 | 69 | 67 | 0.97
13 | M | 1 | 30 | 29 | 0.97
14 | N | 1 | 35 | 33 | 0.94
15 | O | 1 | 35 | 34 | 0.97
16 | P | 1 | 51 | 51 | 1.00
17 | Q | 1 | 27 | 26 | 0.96
18 | R | 1 | 37 | 31 | 0.84
19 | S | 1 | 30 | 30 | 1.00
20 | T | 1 | 37 | 37 | 1.00
21 | U | 1 | 45 | 44 | 0.98
22 | V | 1 | 38 | 36 | 0.95
23 | W | 1 | 43 | 43 | 1.00
24 | X | 1 | 43 | 42 | 0.98
25 | Y | 1 | 40 | 38 | 0.95
26 | Z | 1 | 25 | 23 | 0.92
Table 5: Matched key points and validity ratios for each gesture (second set)

num | Letter | ind | # MK | # Valid MK | Validity Ratio
1 | A | 1 | 33 | 24 | 0.73
2 | B | 1 | 66 | 65 | 0.98
3 | C | 1 | 29 | 28 | 0.97
4 | D | 1 | 55 | 50 | 0.91
5 | E | 1 | 37 | 37 | 1.00
6 | F | 1 | 47 | 45 | 0.96
7 | G | 1 | 20 | 20 | 1.00
8 | H | 1 | 41 | 41 | 1.00
9 | I | 1 | 37 | 37 | 1.00
10 | J | 1 | 30 | 28 | 0.93
11 | K | 1 | 42 | 42 | 1.00
12 | L | 1 | 42 | 39 | 0.93
13 | M | 1 | 28 | 27 | 0.96
14 | N | 1 | 31 | 29 | 0.94
15 | O | 1 | 39 | 38 | 0.97
16 | P | 1 | 46 | 46 | 1.00
17 | Q | 1 | 22 | 21 | 0.95
18 | R | 1 | 38 | 36 | 0.95
19 | S | 1 | 36 | 36 | 1.00
20 | T | 1 | 38 | 36 | 0.95
21 | U | 1 | 41 | 40 | 0.98
22 | V | 1 | 46 | 45 | 0.98
23 | W | 1 | 50 | 48 | 0.96
24 | X | 1 | 39 | 38 | 0.97
25 | Y | 1 | 45 | 44 | 0.98
26 | Z | 1 | 31 | 31 | 1.00
Both of the tables shown above strengthen the case that the validity ratio alone results in a poor classifier:
many different gestures produce validity ratios at or near 1.00. So, in order to have a better classifier, the
validity ratio as well as the number of valid key points must be included in the determination of the correct
letter for the input gesture. The modified portion of the MK-ROD pseudo code therefore also accounts for
the number of valid points during classification.
The best five results are now selected according to the number of valid points. From these five, if the one
having the highest validity ratio also has the maximum number of valid key points, then it is selected as
the output; the loop is broken and the result is displayed.
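A hedged sketch of this modified selection rule is given below; ratios and validCounts are assumed arrays holding, for each database letter, the validity ratio and the number of valid matched key points.

% Modified MK-ROD sketch: rank by valid key points, then require the
% highest-validity-ratio candidate to also have the most valid points.
[~, order] = sort(validCounts, 'descend');
best5 = order(1:5);                     % best five results by valid points
[~, j] = max(ratios(best5));            % highest validity ratio among the five
if validCounts(best5(j)) == max(validCounts(best5))
    result = best5(j);                  % accept: display and stop the loop
    fprintf('Recognized letter: %c\n', char('A' + result - 1));
end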
5.5 CLASSIFICATION RESULTS 2: MODIFIED MK-ROD ALGORITHM
Table 6: MK-ROD SIFT Results, DistRatio = 0.65, Changes Applied

Input | Most Recurring False-Match | Percent Accuracy (>= 50% is correct) | Correct/Incorrect
A | A | 98% | Correct
B | B | 88% | Correct
C | C | 91% | Correct
D | D | 75% | Correct
E | I | 45% | Incorrect
F | F | 85% | Correct
G | G | 98% | Correct
H | P | 48% | Incorrect
I | I | 87% | Correct
J | J | 90% | Correct
K | K | 70% | Correct
L | L | 80% | Correct
M | T | 45% | Incorrect
N | A | 45% | Incorrect
O | O | 90% | Correct
P | P | 75% | Correct
Q | Q | 75% | Correct
R | R | 90% | Correct
S | T | 49% | Incorrect
T | T | 90% | Correct
U | U | 85% | Correct
V | V | 87% | Correct
W | V | 45% | Incorrect
X | X | 70% | Correct
Y | Y | 90% | Correct
Z | Z | 90% | Correct

NumCorrect: 20 | NumIncorrect: 6 | Total: 26 | Accuracy: 77%
Average Accuracy of Each Gesture: 76%
As we can see, the average accuracy dramatically increased to 76% as a result of these modifications to the
code. This accuracy value includes both online and offline testing.
Figure 17 shows the GUI developed for the demonstration of this project. The GUI shows all of the features mentioned.
CHAPTER # 6
RESULTS AND CONCLUSIONS
6.1. RESULTS:
Offline testing results were more accurate than those obtained through online testing.
In offline mode 20 gestures were correctly recognized resulting in a 77% accuracy.
In online mode 14 gestures were correctly recognized resulting in a 54% accuracy.
The overall system accuracy then comes out to be 66%.
The rotation invariance in the roll direction was only 20 degrees.
Rotation variance due to yaw was removed only to a certain extent, i.e. ±10 degrees.
The tables with the offline and online accuracies are shown below:
Table 7: Offline testing results

Input | Most Recurring False-Match | Correct/Incorrect
A | A | Correct
B | B | Correct
C | C | Correct
D | D | Correct
E | I | Incorrect
F | F | Correct
G | G | Correct
H | P | Incorrect
I | I | Correct
J | J | Correct
K | K | Correct
L | L | Correct
M | T | Incorrect
N | A | Incorrect
O | O | Correct
P | P | Correct
Q | Q | Correct
R | R | Correct
S | T | Incorrect
T | T | Correct
U | U | Correct
V | V | Correct
W | V | Incorrect
X | X | Correct
Y | Y | Correct
Z | Z | Correct

NumCorrect: 20 | NumIncorrect: 6 | Total: 26 | Accuracy: 77%
Table 8: Online testing results

Input | Most Recurring False-Match | Correct/Incorrect
A | A | Correct
B | B | Correct
C | C | Correct
D | D | Correct
E | I | Incorrect
F | B | Incorrect
G | G | Correct
H | H | Correct
I | I | Correct
J | J | Correct
K | B | Incorrect
L | G | Incorrect
M | T | Incorrect
N | A | Incorrect
O | J | Incorrect
P | P | Correct
Q | Q | Correct
R | E | Incorrect
S | T | Incorrect
T | T | Correct
U | I | Incorrect
V | V | Correct
W | V | Incorrect
X | C | Incorrect
Y | Y | Correct
Z | Z | Correct

NumCorrect: 14 | NumIncorrect: 12 | Total: 26 | Accuracy: 54%
6.2.
CONCLUSIONS:
The following conclusions were drawn from the experimentation and the project itself:
As this was a machine vision and image processing project, it was necessary that the
environment remain constant.
Lighting variations did have some impact on the experiments, but the overall accuracy
was not much affected, owing to the use of the SIFT algorithm, which is largely
invariant to illumination.
Yellow light is much better than white light for illuminating the hand, as more detail is
visible under it.
The background in our case was kept constant and non-reflective; had it been shiny,
light variations would have been greater. Light variations must therefore be considered
in the choice of background.
The limited invariance to rotation in roll and yaw did have some impact on the output
result. For a more robust system these rotations must be accounted for.
The dataset images must be clear, and the gesture must face the camera, in order to aid
detection.
In order to process more dataset images in real time, a computer with a suitably fast
processor must be selected.
Hand gesture recognition is an up-and-coming technology and has a future in
automation.
6.3.
FUTURE IMPROVEMENTS:
This project was made to recognize single-handed gestures of American Sign Language
(ASL); it can be extended to include both hands, thus producing entire words or parts
of words in one gesture instead of one letter.
When using two hands, problems of occlusion and person detection can arise; these
problems can be sorted out in future improvements.
The problem of rotation variance about the yaw axis can be removed by using the
Harris corner detector along with SIFT descriptors.
More work can be done on making the project more robust by making the program
learn the gestures, but the processor speed should also be increased accordingly for
real-time processing.
The main objective of our project was to demonstrate the use of gesture recognition;
other applications could serve the same purpose, but ASL recognition is more
challenging.
APPENDIX B: MATLAB GUI CODE (EXCERPTS)
% --- Outputs from this function are returned to the command line.
function varargout = gui_OutputFcn(hObject, eventdata, handles)
% varargout cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Get default command line output from handles structure.
varargout{1} = handles.output;
end
% --- Excerpt from the recognition callback (opening lines omitted).
disp('letter');
disp(handles.letter);
% Append a completion message to the status text field.
str1 = handles.s;
handles.s = str_concat(str1, '\nRecognition Complete.');
set(handles.text12, 'String', sprintf(handles.s));
if handles.letter ~= '-'
    % Show the matched database image.
    if size(handles.image, 3) == 3
        axes(handles.axes4);
        imshow(handles.image);
    end
    axes(handles.axes3);
    % Append the recognized letter to the output word.
    str1 = strcat(handles.word, ' ');
    handles.word = str_concat(str1, handles.letter);
    set(handles.text11, 'String', sprintf('%s', handles.word));
    str1 = handles.s2;
    handles.s2 = str_concat(str1, sprintf('Match Found: %c char.\n', handles.letter));
    set(handles.text14, 'String', handles.s2);
else
    set(handles.text14, 'String', sprintf('No Match Found'));
    set(handles.text11, 'String', '-');
end
guidata(hObject, handles);
else
    errordlg('Database not initialized or Video not playing.', 'Error');
end
guidata(hObject, handles);
% --- Excerpt from the database-matching loop (opening lines omitted).
        guidata(hObject, handles);
        break;   % stop once a single match has been found
    end
    guidata(hObject, handles);
    end
end
else
    errordlg('Database not initialized. Initialize database first', 'Initialize Database.');
end
guidata(hObject, handles);
APPENDIX C: REFERENCES
[1] David G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International
Journal of Computer Vision, University of British Columbia, Canada, 2004.
[2] Kalyani P. Sable, H. R. Deshmukh, N. S. Band and R. N. Gadbail, "Pattern Matching Algorithm
for ASL Sign Gesture Recognition Using Neural Network," 2014.
[3] Ray Lockton, "Hand Gesture Recognition Using Computer Vision," Balliol College, Oxford
University.
[4] MATLAB Image Processing Toolbox, Image Acquisition Toolbox and Computer Vision Toolbox
documentation and online help.
[5] SIFT key point detection code and application by D. Alvaro and J. J. Guerrero, University of
Zaragoza, modified from D. Lowe's original implementation.
APPENDIX D: BIBLIOGRAPHY
http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf
http://www.ijisme.org/attachments/File/v1i7/G0336061713.pdf
http://www.ijpret.com/publishedarticle/2014/4/IJPRET%20-%20CSIT%20248.pdf
http://research.microsoft.com/en-us/um/people/awf/bmvc02/project.pdf
http://www.mathworks.com/
http://www.cs.ubc.ca/~lowe/425/slides/13-ViolaJones.pdf