
GYAN GANGA COLLEGE OF TECHNOLOGY

A Project Report on

GESTURE RECOGNITION USING MATLAB

Submitted in partial fulfillment for the award of the degree
of
BACHELOR OF ENGINEERING
in
ELECTRONICS & COMMUNICATION ENGINEERING
Submitted By:
AMEE VISHWAKARMA, APOORVA SRIVASTAVA, DEBOLINA SUR,
MOMITA SAHA & MONALISA HAZRA
Enrollment No.:
0208EC101017, 0208EC101028, 0208EC101036, 0208EC101055, 0208EC101056

Guided by:
Mr. Rajender Yadav
Designation: Asst. Professor
Deptt. of Electronics & Communication Engineering
Gyan Ganga College of Technology, Jabalpur

CERTIFICATE

This is to certify that the Minor Project entitled GESTURE RECOGNITION
USING MATLAB submitted by AMEE VISHWAKARMA, APOORVA SRIVASTAVA,
DEBOLINA SUR, MOMITA SAHA & MONALISA HAZRA has been carried out under my
guidance & supervision. The project is approved for submission towards
partial fulfillment as required for the award of the degree of BACHELOR OF
ENGINEERING in ELECTRONICS & COMMUNICATION from Gyan Ganga College of
Technology, Jabalpur under RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA,
BHOPAL (M.P.)

SIGNATURE                                    SIGNATURE
Mrs. Papiya Dutta                            Mr. Rajender Yadav
HEAD OF THE DEPARTMENT                       GUIDE

GYAN GANGA COLLEGE OF TECHNOLOGY


CERTIFICATE

This is to certify that the Minor Project report entitled GESTURE
RECOGNITION USING MATLAB has been submitted by AMEE VISHWAKARMA,
APOORVA SRIVASTAVA, DEBOLINA SUR, MOMITA SAHA & MONALISA HAZRA in
partial fulfillment of the requirements for the award of the degree of
BACHELOR OF ENGINEERING in ELECTRONICS & COMMUNICATION at Gyan Ganga
College of Technology under RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA,
BHOPAL (M.P.)

Internal Examiner
Date:

External Examiner
Date:

ACKNOWLEDGEMENT

We would like to express our sincere gratitude to Dr. R.K. Ranjan, Principal, and Mrs. Papiya Dutta, H.O.D. of
the Electronics and Communication Department of Gyan Ganga College of Technology, for providing us with an
opportunity to do our minor project on GESTURE RECOGNITION USING MATLAB.
This project bears the imprint of many people. We sincerely thank our project guide Mr. RAJENDER YADAV,
Assistant Professor, Department of Electronics & Communication, Gyan Ganga College of Technology,
Jabalpur, whose help, stimulating suggestions and encouragement helped to coordinate the project, especially in
writing this report. We would also like to acknowledge with much appreciation the crucial role of the officials
and other staff members of the institute who rendered their help during the period of the project work. Last but
not the least, we wish to avail this opportunity to give special thanks to every team member for their supportive
contribution, project-enhancing comments, and tips that improved our presentation skills and report writing and
brought clarity to the software work.

Place: Jabalpur

Date:

TABLE OF CONTENTS

Chapter 1:- Introduction to Hand Gesture Recognition
1.1 Introduction
1.2 Motivation
1.3 Gesture Analysis

Chapter 2:- Objectives & Tools
2.1 Introduction
2.2 Objectives
2.3 Tools

Chapter 3:- Literature Review & Algorithm
3.1 MATLAB Overview
3.2 Literature Review on Gesture Recognition
3.3 Neural Networks
3.4 Neuron Model
3.5 Perceptron
3.6 Image Database
3.7 Orientation Histogram
3.8 Operation
3.9 Algorithm

Chapter 4:- Results & Discussion

APPENDIX I:- Commands
APPENDIX II:- Coding
References

CHAPTER 1
INTRODUCTION TO HAND GESTURE RECOGNITION

1.1 INTRODUCTION:
The aim of this project is to create a method to recognize hand gestures, based on a pattern recognition technique
developed by McConnell, employing histograms of local orientation. The orientation histogram is used as
a feature vector for gesture classification and interpolation.
Computer recognition of hand gestures may provide a more natural human-computer interface.
Hand gesture recognition is an important area of the computer vision and pattern recognition field. Gestures are
a way by which one can communicate non-verbally.
Gesture recognition is a field with a large number of innovations. A gesture can be defined as a
physical action used to convey information. There are various input/output devices for
interacting with the computer, but nowadays emphasis is given to making human-computer interaction
easier, and for that purpose hand gesture recognition comes to light. The hand can be used as an input
device by making its gestures understandable to the computer, and to this end, this project aims at recognizing
various hand gestures.

1.2 MOTIVATION:
In this project, hand gesture recognition targets basic shapes made by the hand. Communication in our
daily life is generally vocal, but body language has its own significance: hand gestures and facial expressions
sometimes play an important role in conveying information. A hand gesture is an ideal
option for expressing feelings or conveying something simple, like representing a number. The area has many
applications; for example, sign language serves various purposes and plays an important role for people who
are deaf and mute. Gestures were the very first form of communication. This is why the area
motivated us to carry out further work related to hand gesture recognition.

1.3 GESTURE ANALYSIS:


Gestures are defined as physical activity that conveys some message, whether through facial expressions,
body language, hand movements, etc. A gesture can be defined as a motion of the body made in order to
communicate with others.

Hand gestures can be classified into two categories: static and dynamic. A static gesture is a particular hand
configuration and pose, represented by a single image. A dynamic gesture is a moving gesture, represented by
a sequence of images. We will focus on the recognition of static images.
In gesture analysis, there are mainly three steps to follow

Hand localization

Hand feature extraction

Hand model parameter computation of features

There are two types of classification

Rule Based Classification

Learning Based Classification

Our project is based on LEARNING BASED CLASSIFICATION. Under this classification, the Hidden
Markov Model (HMM) approach has been used. An HMM consists of a number of hidden states, each with a
probability of transitioning from itself to another state. The transition probabilities are modeled as nth-order
Markov processes. An important feature of the topology is that states are allowed to transition to themselves.
Alternatively, models based on finite state machines have been used to capture the sequential nature of gestures
by requiring a series of states, estimated from visual data, to match, in order, a learned model of ordered
states.
The advantage of this representation over HMMs is that it does not require a large set of data to train
the models. Finally, temporal extensions to neural networks (time-delay neural networks) have been used to
learn mappings between training exemplars (2D or 3D features) and gestures. Much care must be taken
during the training stage; otherwise the network may overfit on the training gesture set and not generalize
well to variations of gestures outside the training set.
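To make the idea of self-transitioning states concrete, here is a minimal MATLAB sketch (it assumes the Statistics Toolbox; the two-state transition and emission matrices are invented purely for illustration and are not taken from this project) that decodes the most likely hidden-state path for a short observation sequence:

% Hypothetical two-state HMM; note the large diagonal entries of TRANS,
% i.e. each state is allowed to transition to itself.
TRANS = [0.9 0.1;
         0.2 0.8];
EMIS  = [0.7 0.3;       % P(symbol | state 1)
         0.4 0.6];      % P(symbol | state 2)
seq = [1 1 2 2 2 1];                    % an observed symbol sequence
states = hmmviterbi(seq, TRANS, EMIS)   % most likely hidden-state path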

CHAPTER 2
OBJECTIVES & TOOLS

2.1 INTRODUCTION:
This project on Gesture Recognition Using MATLAB emphasizes easy and swift communication using
minimal tools that are easily accessible. The main objectives and tools used for this project are discussed
in the next sections.

2.2 OBJECTIVES:
The main objectives of our project are:
Response time should be very fast.
The computer vision algorithms should be reliable and work for different people.
The vision-based interface should be able to replace existing interfaces at low cost.
More user-friendly man-machine interaction.
Reduced time consumption.
Reduced scope for error.
Easy operation for the operator of the system.

2.3 TOOLS:
This project is developed using the following tools:
HARDWARE:

Processor        : Intel Core i3
Memory           : 3 GB
Processor Speed  : 2.30 GHz
RAM              : 2.00 GB

SOFTWARE:

Windows XP/7
MATLAB 7.01

CHAPTER 3
LITERATURE REVIEW & ALGORITHM

3.1 MATLAB Overview


The name MATLAB stands for matrix laboratory. MATLAB is a high-performance language for technical
computing. It integrates computation, visualization, and programming in an easy-to-use environment where
problems and solutions are expressed in familiar mathematical notation. Typical uses include:

Math and computation

Algorithm development

Modeling, simulation, and prototyping

Data analysis, exploration, and visualization

Scientific and engineering graphics

Application development, including Graphical User Interface building

MATLAB is an interactive system whose basic data element is an array that does not require dimensioning.
This allows solving many technical computing problems, especially those with matrix and vector formulations,
in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or
FORTRAN.
The reason we have decided to use MATLAB for the development of this project is its toolboxes. Toolboxes
allow learning and applying specialized technology. Toolboxes are comprehensive collections of MATLAB
functions (M-files) that extend the MATLAB environment to solve particular classes of problems. It includes
among others image processing and neural networks toolboxes.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from
various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and
research institutions as well as industrial enterprises.

GRAPHICS AND GRAPHICAL USER INTERFACE PROGRAMMING


MATLAB supports developing applications with graphical user interface features. MATLAB includes
GUIDE (GUI development environment) for graphically designing GUIs. It also has tightly integrated
graph-plotting features. For example, the function plot can be used to produce a graph from two vectors x and
y. The code:
x = 0:pi/100:2*pi;
y = sin(x);
plot(x,y)
produces the following figure of the sine function:

Figure 1: sine function

A MATLAB program can produce three-dimensional graphics using the functions surf, plot3 or mesh.
[X,Y] = meshgrid(-10:0.25:10,-10:0.25:10);
f = sinc(sqrt((X/pi).^2+(Y/pi).^2));
mesh(X,Y,f);
axis([-10 10 -10 10 -0.3 1])
xlabel('{\bfx}')
ylabel('{\bfy}')
zlabel('{\bfsinc} ({\bfR})')
hidden off

This code produces a wireframe 3D plot of the two-dimensional unnormalized sinc function:

Figure 2: Wireframe 3D plot of 2D sinc function

[X,Y] = meshgrid(-10:0.25:10,-10:0.25:10);
f = sinc(sqrt((X/pi).^2+(Y/pi).^2));
surf(X,Y,f);
axis([-10 10 -10 10 -0.3 1])
xlabel('{\bfx}')
ylabel('{\bfy}')
zlabel('{\bfsinc} ({\bfR})')

This code produces a surface 3D plot of the two-dimensional unnormalized sinc function:

Figure 3: Surface 3D plot of 2D sinc function

In MATLAB, graphical user interfaces can be programmed with the GUI design environment (GUIDE) tool.

INTERFACING WITH OTHER LANGUAGES:


MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A
wrapper function is created allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed "MEX-files" (for MATLAB executable).
Libraries written in Perl, Java, ActiveX or .NET can be directly called from MATLAB, and many MATLAB
libraries (for example XML or SQL support) are implemented as wrappers around Java or ActiveX libraries.
Calling MATLAB from Java is more complicated, but can be done with a MATLAB toolbox which is sold
separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface),
(which should not be confused with the unrelated Java Metadata Interface that is also called JMI).
As alternatives to the MuPAD based Symbolic Math Toolbox available from MathWorks, MATLAB can be
connected to Maple or Mathematica.
Libraries also exist to import and export MathML.

3.2 Literature Review on Gesture Recognition:


Research on hand gestures can be classified into three categories.
The first category, glove based analysis, employs sensors (mechanical or optical) attached to a glove
that transduce finger flexions into electrical signals for determining the hand posture.
The second category, vision based analysis, is based on the way human beings perceive information
about their surroundings.
The third category, analysis of drawing gestures, usually involves the use of a stylus as an input device.
Analysis of drawing gestures can also lead to recognition of written text.
Our project is based on analysis of drawing gestures.
OBJECT RECOGNITION: It is classified into two categories: large object tracking and shape recognition.

Figure 4: Classification of object recognition

LARGE OBJECT TRACKING - The large-object-tracking method makes use of a low-cost
detector/processor to quickly calculate moments. This is called the artificial retina chip. This chip combines
image detection with some low-level image processing. The chip can compute various functions useful in the
fast algorithms for interactive graphics applications.

SHAPE RECOGNITION - If the hand signals fall in a predetermined set, and the camera views a close-up of
the hand, an example-based approach may be used, combined with a simple method to analyze hand signals
called orientation histograms. These example-based applications involve two phases: training and running. In
the training phase, the user shows the system one or more examples of a specific hand shape. The computer
forms and stores the corresponding orientation histograms. In the run phase, the computer compares the
orientation histogram of the current image with each of the stored templates and selects the category of the
closest match, or interpolates between templates, as appropriate. This method should be robust against small
differences in the size of the hand but would probably be sensitive to changes in hand orientation.
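The run phase described above reduces to a nearest-template search. A minimal MATLAB sketch follows (the four-bin histograms are invented for illustration; the real histograms would have one bin per orientation range):

% Stored templates: one orientation histogram per row (hypothetical values)
h_a = [10 2 3 1];
h_b = [1 9 2 4];
templates = [h_a; h_b];
h_current = [9 3 2 2];          % histogram of the current image
% Squared Euclidean distance of the current histogram to each template
d = sum((templates - repmat(h_current, size(templates,1), 1)).^2, 2);
[dmin, best] = min(d)           % best = row index of the closest match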

3.3 NEURAL NETWORKS:


Neural networks are composed of simple elements operating in parallel. Neural networks are models that are
capable of machine learning and pattern recognition. They are usually presented as systems of interconnected
neurons that can compute values from inputs by feeding information through the network. Commonly neural
networks are adjusted, or trained, so that a particular input leads to a specific target output. The network
is then adjusted, based on a comparison of the output and the target, until the network output matches the target.
Typically many such input/target pairs are used, in this supervised learning, to train a network.
Neural networks have been trained to perform complex functions in various fields of application including
pattern recognition, identification, classification, speech and vision and control systems.

Figure 5: Neural Net block diagram

There are two modes of learning: Supervised and unsupervised.

SUPERVISED LEARNING: Supervised learning is based on the system trying to predict outcomes for
known examples and is a commonly used training method. It compares its predictions to the target answer
and "learns" from its mistakes. The data start as inputs to the input layer neurons. The neurons pass the inputs
along to the next nodes. As inputs are passed along, the weighting, or connection, is applied and when the
inputs reach the next node, the weightings are summed and either intensified or weakened. This continues until
the data reach the output layer where the model predicts an outcome. In a supervised learning system, the
predicted output is compared to the actual output for that case. If the predicted output is equal to the actual
output, no change is made to the weights in the system. But, if the predicted output is higher or lower than the
actual outcome in the data, the error is propagated back through the system and the weights are adjusted
accordingly.
This feeding error backwards through the network is called "back-propagation."

UNSUPERVISED LEARNING: Neural networks which use unsupervised learning are most effective for
describing data rather than predicting it. The neural network is not shown any outputs or answers as part of the
training process--in fact, there is no concept of output fields in this type of system. The advantage of the neural
network for this type of analysis is that it requires no initial assumptions about what constitutes a group or how
many groups there are. The system starts with a clean slate and is not biased about which factors should be
most important.

3.4 NEURON MODEL:


A class of statistical models will be called neural if it:
Consists of sets of adaptive weights (numerical parameters that are tuned by a learning algorithm)
Is capable of approximating non-linear functions of its inputs

Figure 6: Neuron

The scalar input p is transmitted through a connection that multiplies its strength by the scalar weight w, to
form the product wp, again a scalar. Here the weighted input wp is the only argument of the transfer function f,
which produces the scalar output a. The neuron on the right has a scalar bias, b. The bias is much like a weight,
except that it has a constant input of 1. The transfer function net input n, again a scalar, is the sum of the
weighted input wp and the bias b. This sum is the argument of the transfer function f. Here f is a transfer
function, typically a step function or a sigmoid function, that takes the argument n and produces the output a. w
and b are both adjustable scalar parameters of the neuron. The central idea of neural networks is that such
parameters can be adjusted so that the network exhibits some desired or interesting behavior.
Thus, the network can be trained to do a particular job by adjusting the weight or bias parameters, or perhaps
the network itself will adjust these parameters to achieve some desired end. All of the neurons in the program
written in MATLAB have a bias.
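As a numerical illustration of the neuron described above (the values of p, w and b are arbitrary; hardlim assumes the Neural Network Toolbox):

p = 2;              % scalar input
w = 0.5;            % scalar weight
b = 1;              % bias, a weight with a constant input of 1
n = w*p + b;        % net input to the transfer function (here n = 2)
a = hardlim(n)      % hard-limit (step) transfer function gives a = 1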

3.5 PERCEPTRON:
The perceptron is a program that learns concepts, i.e. it can learn to respond with True (1) or False (0) by
repeatedly "studying" examples presented to it.
The structure of a single perceptron is very simple. There are two inputs, a bias, and an output. Both the inputs
and outputs of a perceptron are binary - that is they can only be 0 or 1. Each of the inputs and the bias is
connected to the main perceptron by a weight. A weight is generally a real number between 0 and 1. When the
input number is fed into the perceptron, it is multiplied by the corresponding weight. After this, the weighted
inputs are all summed up and fed through a hard-limiter. Basically, a hard-limiter is a function that defines the
threshold values for 'firing' the perceptron. For example, the limiter could be:

f(n) = 1 if n >= 0, and f(n) = 0 if n < 0

For example, if the sum of the inputs multiplied by the weights is -2, the limiting function would return 0; if
the sum was 3, the function would return 1.

Figure 7: Perceptron block diagram
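A sketch of this learning rule in plain MATLAB (no toolbox required; the AND function is chosen only as a simple, linearly separable example, and the names are our own):

P = [0 0 1 1; 0 1 0 1];         % binary inputs, one column per example
T = [0 0 0 1];                  % binary targets (logical AND)
w = [0 0]; b = 0;               % initial weights and bias
for epoch = 1:20
    for k = 1:4
        a = (w*P(:,k) + b) >= 0;    % hard-limiter output (0 or 1)
        e = T(k) - a;               % difference between desired and actual output
        w = w + e*P(:,k)';          % adjust weights by the error
        b = b + e;                  % bias treated as a weight with input 1
    end
end

After a few epochs the loop converges, and w and b implement the AND function.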

3.6 IMAGE DATABASE:


The starting point of the project was the creation of a database with all the images that would be used for
training and testing. The image database can have different formats: images can be hand drawn, digitized
photographs, or renderings of a 3D hand model. Photographs were used, as they are the most realistic approach.
Images came from two main sources: various ASL databases on the Internet, and photographs taken with a
digital camera. This meant that they had different sizes, different resolutions and sometimes almost
completely different shooting angles. Two operations were carried out on all of the images: they were
converted to grayscale and the background was made uniform. The Internet databases already had uniform
backgrounds; the camera photographs had to be processed in Adobe Photoshop. Drawn images can also
simulate translational variances with the help of an editing program (e.g. Adobe Photoshop).
The database itself was constantly changing throughout the completion of the project, as it would decide the
robustness of the algorithm. Therefore, it had to be built in such a way that different situations could be tested,
and the thresholds above which the algorithm didn't classify correctly could be decided.
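The two automatic operations (grayscale conversion and resizing) can be sketched in MATLAB as follows, assuming the Image Processing Toolbox; the file name is hypothetical, and the background uniforming done manually in Photoshop is not shown:

img = imread('hand_photo.tif');     % a digitized photograph (hypothetical file)
if size(img,3) == 3
    img = rgb2gray(img);            % convert to grayscale
end
img = imresize(img, [150 140]);     % bring every image to the common size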

An example is shown below. In the first row are the training images. In the second, the testing images.

Figure 8: Training images (top row: Train image 1-3) and testing images (bottom row: Test image 1-3)

3.7 ORIENTATION HISTOGRAMS:


To make the gestures the same regardless of where they occur within the image borders, position is ignored
altogether, and a histogram is tabulated of how often each orientation element occurs in the image. Clearly,
this throws out information, and some distinct images will be confused by their orientation histograms. In
practice, however, one can choose a set of training gestures with substantially different orientation histograms
from each other.
One can calculate the local orientation using image gradients. In this project, two 3-tap derivative filters (one
for the x direction and one for the y direction) have been used. The outputs of the x and y derivative operators
are dx and dy; the gradient direction is then atan(dy/dx). The edge orientation is used as the only feature
presented to the neural network. The reason is that, if the edge detector is good enough, it allows testing the
network with images from different databases.

Another feature that could have been extracted from the image is the gradient magnitude,

magnitude = sqrt(dx^2 + dy^2)

where dx and dy are the outputs of the two derivative filters.
This, though, would restrict testing of the algorithm to only similar images. Apart from this, the images, before
being resized, should be of approximately the same size; this is the size of the hand itself in the canvas, not the
size of the canvas. Once the image has been processed, the output is a single vector containing a number
of elements equal to the number of bins of the orientation histogram.
The figure below shows the orientation histogram calculation for a simple image. Blurring can be used to allow
neighboring orientations to sense each other.

Figure 9: Orientation histogram



3.8 OPERATION:
The program can be divided in 6 steps.

Step 1
The first thing for the program to do is to read the image database. A for loop is used to read an entire folder of
images and store them in MATLAB's memory. The folder is selected by the user from menus. A menu first
pops up asking whether to run the algorithm on the test or train sets. Then, a second menu pops up for the
user to choose which ASL sign to use.
Step 2
Resize all the images that were read in Step 1 to 150x140 pixels. This size seems optimal, offering enough
detail while keeping the processing time low.

Step 3
The next thing to do is to find the edges. As mentioned before, two filters were used.
For the x direction: x = [0 -1 1]
For the y direction: y = [0 1 -1]'
which is the same as x but transposed and multiplied by -1.

Step 4
Divide the two resulting matrices (images), dy by dx, element by element and then take the atan (tan^-1). This
gives the gradient orientation.

Step 5
Then the MATLAB function im2col is called to rearrange the image blocks into columns. This is not a
necessary step, but it has to be done to display the orientation histogram. The MATLAB function rose creates
an angle histogram: a polar plot showing the distribution of values grouped according to their numeric range,
with each group shown as one bin. Some examples are shown below. While developing the algorithm, these
histograms are the fastest way of getting a good idea of how well the detection works.
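Steps 2-5 can be condensed into the following sketch for a single image; it mirrors the full program in Appendix II (the file path follows the Appendix II layout):

I = double(imread('test\A\1.tif'));     % read one test image
I = imresize(I, [150 140]);             % Step 2: resize
x = [0 -1 1];                           % Step 3: x-derivative filter
y = [0 1 -1]';                          % y-derivative filter (x transposed, negated)
dx = convn(I, x, 'same');
dy = convn(I, y, 'same');
theta = atan(dy ./ dx);                 % Step 4: gradient orientation (radians)
cl = im2col(theta, [1 1], 'distinct');  % Step 5: rearrange pixels into one row
rose(cl)                                % polar angle histogram of the orientations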

Figure 10: Orientation histograms of a_1 (left) and a_2 (right)

Here we can see the original images that generated the histograms above, in the same order.

Figure 11: Original images a_1 and a_2


3.9 Algorithm


1. Read Text Files from Disk:
This is the first step, where we input the image data to the system.
2. Determine Number of Neurons & Targets:
For multidimensional arrays an RGB image is being considered. To access a sub-image, the syntax is
subimage = RGB(20:40, 50:80, :);
For the optimal number of neurons, the highest among the input and output is always taken, so the
number of neurons determined is 85.
Since we have taken 5 test images in our project, the number of targets to be matched is 5.
3. Initialize Pre-Processing Layer:
Pre-processing of the image is done to enhance the image and to obtain results with minimum error. In
the proposed algorithm the image is pre-processed using the RGB color model, built from the three
primary colors red (R), green (G) and blue (B). The main advantage of this color space is its simplicity.
4. Initialize Learning Layer:
This is a method for initializing the weights of the neural network so as to reduce the training time.
5. Train Perceptron:
A perceptron learns to distinguish patterns by modifying its weights. In the perceptron, the most
common form of learning adjusts the weights by the difference between the desired output and the
actual output.
6. Plot Error:
This step plots the error over the training epochs. The error is calculated by subtracting the output A
from the target T.
7. Select Test Set:
Test images are selected so that they can be matched against the trained images to obtain the desired
output.
8. Display Output:
Finally, the output is displayed, showing the similarity or difference between the trained and test images
in terms of their orientation histograms.
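The legacy initp/trainp/simup calls that implement these steps appear in Appendix II. On newer MATLAB releases with the Neural Network Toolbox, an equivalent hedged sketch (with placeholder data; the real inputs would be the 19-bin orientation histograms) would be:

P = rand(19, 15);           % 19-bin histograms for 15 training images (placeholder)
T = repmat(eye(5), 1, 3);   % 5 target classes, 3 examples each (placeholder)
net = perceptron;           % single-layer perceptron network
net = train(net, P, T);     % adjust weights from the input/target pairs
y = net(P(:,1))             % classify one feature vector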


CHAPTER 4
RESULTS & DISCUSSION
4.1 RESULT:

4.2 DISCUSSION:

CONCLUSION:
We proposed a simple hand gesture recognition algorithm. It begins with pre-processing steps: the image is
converted into the RGB color space, so that varying lighting conditions will not cause any problem, and then
smudge elimination is done in order to get the finest image. These pre-processing steps are as important as any
other step. After performing the pre-processing on the image, the second step is to determine the orientation of
the image; only horizontal and vertical orientations are considered here, and images with a uniform background
are used.
The strengths of this approach include its simplicity and ease of implementation, and it does not require any
significant amount of training or post-processing, as rule-based learning is used. It provides a high
recognition rate with minimum computational time.
The weakness of this method is that certain parameters and threshold values were chosen experimentally; that
is, it does not follow a systematic approach for gesture recognition, and many parameters in this algorithm
are based on assumptions made after testing a number of images.
In this system we have only considered static gestures, but in real time we would need to extract the gesture
from a video or moving scene.
The ultimate goal is humans interfacing with machines on their own natural terms. Gestures are expressive,
meaningful body motions involving physical movements of the fingers, hands, arms, head, face, or body with
the intent of conveying meaningful information or interacting with the environment. Gesture recognition is an
extensively developed technology designed to identify human position, action, and manipulation, and gestures
are used to facilitate communication with digital applications. Among the various kinds of gesture recognition,
such as hand, face and body gesture recognition, hand gesture recognition is an efficient technique for
recognizing human gestures due to its simplicity and greater accuracy.

FUTURE SCOPE:

The future scope lies in making this algorithm applicable to various orientations of hand gestures; different
classification schemes can also be applied. Gesture recognition could be used in many settings in the future.
The algorithm can be improved so that images with a non-uniform background can also be used; this will
enhance human-computer interaction.
Visually impaired people can make use of hand gestures for human-computer interaction, for example
for controlling a television, in games, and in gesture-to-speech conversion.
Georgia Institute of Technology researchers have created the Gesture Panel System to replace
traditional vehicle dashboard controls. Drivers would change, for example, the temperature or
sound-system volume by maneuvering their hand in various ways over a designated area. This could
increase safety by eliminating drivers' current need to take their eyes off the road to search for controls.
During the next few years, according to Gartner's Fenn, gesture recognition will probably be used
primarily in niche applications, because making mainstream applications work with the technology will
take more effort than it's worth.
Hand recognition systems can be useful in many fields like robotics and human-computer interaction,
so extending this offline hand gesture recognition system to work in real time will be future work.
Support Vector Machines can be modified to reduce complexity. Reduced complexity gives less
computation time, so the system can be made to work in real time.
Facial gesture recognition could be used in vehicles to alert drivers who are about to fall asleep.


APPENDIX I
COMMANDS:
1. echo on: The commands in a script M-file will not automatically be displayed in the Command Window. To
display the commands along with results we use echo.
2. clc: Clear command window clears all input and output from the command window display, giving a clear
screen.
3. pause: Each time MATLAB reaches a pause statement, it stops executing the M-file until the user presses a
key. Pauses should be placed after important comments, after each graph, and after critical points where your
script generates numerical outputs. Pauses allow the viewer to read and understand results.
4. fid: file ID=> An integer file identifier obtained from fopen.
5. fopen: Opens file.
6. fid=fopen('train.txt','rt'): Opens the file with the type of access specified by the permission string. 'rt' means
read text (t = text mode).
7. fscanf (file scan format): Read data from device, and format as text.
8. P1=fscanf(fid,'%f',[19,inf]); :
A=fscanf(fid,format,sizeA)
reads data from the file and converts it according to format, a C language conversion specification. A
conversion specification involves the % character and the conversion characters d, i, o, u, x, X, f, e, E, g, G, c
and s.
sizeA can be an integer or can have the form [m,n]:
[m,n]: m-by-n matrix filled in column order; n can be inf, but m cannot.
n: Column vector with n elements.
inf: Column vector with the number of elements in the file (default).

%f: It denotes floating-point numbers. Floating-point fields can contain any of the following: Inf, -Inf or NaN.
[19,inf]: the feature vectors are read as a matrix with 19 rows, one row per bin of the orientation histogram
(see the 19 bin sums in Appendix II).
9. TS1: Test vector.
10. T: Target vector.
11. T=fscanf(fid,'%f',[8,inf]); : [8,inf] means the targets are read as a matrix with 8 rows, filled in column order.
12. Determine optimal number of neurons: For multidimensional arrays an RGB image is being considered.
subimage = RGB(20:40, 50:80, :);
For the optimal number of neurons, the highest among the input and output is always taken. So we have
S1=85
13. initp: This command initializes weights and biases.
14. Pre-processing: Image pre-processing is the very first step; pre-processing of the image is done to enhance
the image and to obtain results with minimum error. Reduction in the dimensionality of the raw input data is
probably the most important reason for pre-processing.
15. A1=simup(P,W1,b1); : The network takes the input data P, processes it and transforms it into A1, an
85x5 matrix. A1 will then be the input to the hidden layer that trains the network. A data sheet of 5 images,
after the first-layer pre-processing, contains 85x5 = 425 elements.
17. TP=[1 500]; : Training parameters: a display frequency of 1 and a maximum of 500 training epochs.
18. clf reset: Clears current figure and resets the window.
19. figure(gcf): Makes the current figure window visible and brings it to the front.
gcf: gets a handle to the current figure.
20. epochs: Presentation of the set of training (input/target) vectors to a network and the calculation of new
weights and biases.
21. ploterr(errors): Plots the network error against the training epochs. The error is calculated by subtracting
the output A from the target T.
22. close(W): Closes the waiting bar.

APPENDIX II
CODING:
echo on
clc
pause
clc
%Store the training information in a text file
fid=fopen('train.txt','rt');
P1=fscanf(fid,'%f',[19,inf]);
P=P1;
%%Open the text file holding the feature vectors of the test images
fid=fopen('testA.txt','rt');
TS1=fscanf(fid,'%f',[19,inf]);
%(As here we are only testing alphabet A)
fid=fopen('target8.txt','rt');
T=fscanf(fid,'%f',[8,inf]);
%%It has been found that the optimal number of neurons for the hidden layer is 85
S1=85;
S2=5;
%%Now we have to initialize the pre-processing layer
[W1,b1]=initp(P,S1);
%%We also have to initialize the learning layer
[W2,b2]=initp(S1,T);
pause
%%Now train the network

A1=simup(P,W1,b1);      %First layer is used to preprocess the input vectors
TP=[1 500];             %Training parameters: display frequency 1, maximum 500 epochs
pause
clf reset
figure(gcf)
%%Resize the frame size
setfsize(600,300);
[W2,b2,epochs,errors]=trainp(W2,b2,A1,T,TP);
pause
clc
ploterr(errors);
pause
M=menu('Choose a file resolution','Test A');
if M==1
    TS=TS1;
else
    disp('Wrong Input');
end
a1=simup(TS,W1,b1);
a2=simup(a1,W2,b2);
echo off
%%Create a menu
clc
F=menu('Choose a database set','Test Set','Train Set');
if F==1
    K=menu('Choose a file','Test A');
    %%For testing a datasheet
    if K==1
        loop=5;
        for i=1:loop
            string=['test\A\' num2str(i) '.tif'];
            Rimages{i}=imread(string);
        end
    end
end
%%For training
if F==2
    loop=3;     %%Set loop to 3 considering all train sets have 3 images
    L=menu('Choose a file','Train A');
    if L==1
        for i=1:loop
            string=['train\A\' num2str(i) '.tif'];
            Rimages{i}=imread(string);
        end
    end
end
%%Extract the orientation histogram of every image that was read
clear S1 S2             %reuse the names S1..S19 for the histogram bin sums
for i=1:loop
Timages{i}=double(imresize(Rimages{i},[150,140]));  %resize; double needed for convn
x=[0 -1 1];             %x-derivative filter
y=[0 1 -1]';            %y-derivative filter (x transposed and multiplied by -1)
dx{i}=convn(Timages{i},x,'same');
dy{i}=convn(Timages{i},y,'same');
gradient{i}=dy{i}./dx{i};
theta{i}=atan(gradient{i});
cl{i}=im2col(theta{i},[1,1],'distinct');
N{i}=(cl{i}*180)/3.14159265359;     %radians to degrees
C1{i}=(N{i}>0)&(N{i}<10);
S1{i}=sum(C1{i});
C2{i}=(N{i}>10.0001)&(N{i}<20);
S2{i}=sum(C2{i});
C3{i}=(N{i}>20.0001)&(N{i}<30);
S3{i}=sum(C3{i});
C4{i}=(N{i}>30.0001)&(N{i}<40);
S4{i}=sum(C4{i});
C5{i}=(N{i}>40.0001)&(N{i}<50);
S5{i}=sum(C5{i});
C6{i}=(N{i}>50.0001)&(N{i}<60);
S6{i}=sum(C6{i});
C7{i}=(N{i}>60.0001)&(N{i}<70);
S7{i}=sum(C7{i});
C8{i}=(N{i}>70.0001)&(N{i}<80);
S8{i}=sum(C8{i});
C9{i}=(N{i}>80.0001)&(N{i}<90);
S9{i}=sum(C9{i});
C10{i}=(N{i}>90.0001)&(N{i}<100);
S10{i}=sum(C10{i});
C11{i}=(N{i}>-89.9)&(N{i}<-80);
S11{i}=sum(C11{i});
C12{i}=(N{i}>-80.0001)&(N{i}<-70);
S12{i}=sum(C12{i});
C13{i}=(N{i}>-70.0001)&(N{i}<-60);
S13{i}=sum(C13{i});
C14{i}=(N{i}>-60.0001)&(N{i}<-50);
S14{i}=sum(C14{i});
C15{i}=(N{i}>-50.0001)&(N{i}<-40);
S15{i}=sum(C15{i});
C16{i}=(N{i}>-40.0001)&(N{i}<-30);
S16{i}=sum(C16{i});
C17{i}=(N{i}>-30.0001)&(N{i}<-20);
S17{i}=sum(C17{i});
C18{i}=(N{i}>-20.0001)&(N{i}<-10);
S18{i}=sum(C18{i});
C19{i}=(N{i}>-10.0001)&(N{i}<-0.001);
S19{i}=sum(C19{i});
%%Collect the 19 bin sums into one feature vector per image
D{i}=[S1{i} S2{i} S3{i} S4{i} S5{i} S6{i} S7{i} S8{i} S9{i} S10{i} S11{i} S12{i} ...
S13{i} S14{i} S15{i} S16{i} S17{i} S18{i} S19{i}];
end
close(W);


REFERENCES

[1] Klimis Symeonidis, "Hand Gesture Recognition Using Neural Networks", final report.
[2] "A Study on Hand Gesture Recognition Technique", Delhi University.
[3] Kishan Mehrotra, "Elements of Artificial Neural Networks".
[4] Christopher M. Bishop, "Neural Networks for Pattern Recognition", Clarendon Press, Oxford, 1995.
[5] M. Hajek, "Neural Networks".
[6] Thomas Holleczek, Daniel Roggen, "MATLAB Based GUI for Gesture Recognition with Hidden Markov Models".
