Abstract— This paper presents a hand gesture and vision driven wheelchair system for physically handicapped people. The system is contrived to have a utilitarian design that caters to most disabilities rather than just one. In order to minimize system-human interaction, we have endeavored to present two distinct but closely related concepts, viz. hand gesture and vision based control, in conjunction for wheelchair control. The wheelchair system is designed to provide a natural and intuitive user interface which can comprehend and react in compliance with the user's instinctive volition, easing the mobility of physically challenged people. All image processing was performed using NI Vision Assistant. NI LabVIEW was used to train and implement Neural Networks for Hand Gesture Classification and also to enable motion control. A prototype was developed on which all our experiments were successfully carried out. The system has been tested with users of varied hand shapes and proved to be extremely reliable.

Keywords— Thresholding, Morphological operations, Image Moments, Artificial Neural Network (ANN).

I. INTRODUCTION
Old age results in infirmity which eventuates in bodily enervation. This state in old people often leads to adverse scenarios where locomotion may pose a major challenge. The situation is more tragic in the case of the paralyzed and the quadriplegic [1]. In perspective of these issues, a locomotive contrivance would serve a monumental purpose, especially one that comprehends and complies with the user's will. The challenge would thus be to devise and construct a locomotive contraption, viz. a wheelchair, which accords with the subject's instinctive volition and reciprocates in a manner that assists navigation.

Wheelchair systems of various kinds are found aplenty. Voice operated devices are by far the most popular. However, voice operated wheelchairs inhibit the user's natural ability to communicate. Moreover, they do not conform to the user's instinctive ability to enable locomotion and may prove very difficult to control and maneuver. Numerous attempts have been made to map eye movements onto a display and to perform certain tasks by drifting the cursor across the options available on the terminal. Though successful, this mode of operation exerts a huge amount of stress on the user and may make the system impracticable. Certain endeavors have also been made to develop thought-controlled wheelchairs.

Our application distinguishes itself from these previous attempts because of a few pronounced differences, viz.:
• An eclectic gesture set is used to offer higher accuracy with minimized confusion. By contrast, conventional wheelchairs comprise buttons, joysticks, paddles and levers to perform various control maneuvers. The subject ought to know the whereabouts of the controls and must be sufficiently agile to reach and operate them, which is difficult in the case of physically challenged and infirm people.
• The system is very robust to lighting variations.
• It uses Image Moments instead of template matching for classification, which has a significant impact on the total processing time due to the simplified algorithm. The extracted features are invariant to scaling, rotation and translation.
• Image classification is done using Artificial Neural Networks, which makes our system more intuitive and adaptive in nature. It is also significantly quicker and more efficient than Template-Matching and nearest neighbor classifiers.

Fig.1 The conceived architecture of the proposed electronic wheelchair

In perspective of affording more natural user communication, and in order to minimize human-system interaction, we have endeavored to present two distinct but closely related concepts, viz. hand gesture [2], [3] and vision based [4] control, in conjunction, which are used as modes of communication with the electronic wheelchair contingent on the user's will, subsequently facilitating locomotion. The idea here is to improve the independence of physically challenged people over complicated systems with complex functionalities, thus offering a locomotive device that comprehends the user's instinctive thought process and enables simplified motion.

This paper is organized into three main sections. Firstly, a detailed elucidation of the principles involved in hand gesture control is presented. This section also illustrates the various
Authorized licensed use limited to: Thiagarajar College of Engineering. Downloaded on February 17,2010 at 07:31:48 EST from IEEE Xplore. Restrictions apply.
rest is made black in color. The basic steps followed in hand gesture control are shown in the flowchart in fig. 3(a).

1) Image Acquisition: Our application focuses on the hand segment enclosed between the wrist and the fingertips, and any image processing is done only on this portion of the hand. This consideration is of chief importance in assisting the system in much easier interpretation of the gestures. The images are continuously acquired by the camera fitted over the hand rest and are pre-processed to check whether the given gesture is static or not.

2) Removal of Erratic Gestures (Motion Detection): In order to eliminate the necessity of processing every frame, we resort to performing motion detection on the sequence of images prior to any image processing. The user is stipulated to issue a legitimate gesture for a specific interval of time such that there is insignificant hand motion in that interval. Thus, with motion detection, all the intermediate frames between any two legitimate gestures which are either blurred or unrecognized can be discarded, and only the valid gestures are considered for further processing. The concomitant flowchart shown in fig. 3(b) elaborates the process of motion detection.

3) Thresholding: The relevant image is now thresholded using Adaptive Thresholding, which automatically sets the threshold value to pick out brighter objects. Since we are using a black background, our hand behaves as a bright object and is perfectly separated from the background after thresholding. To identify the region of interest, which in this case is the region enclosed between the wrist and the fingertips, we resort to the usage of bands worn around the wrist. This proved to be of immense assistance in delineating the ROI from the extraneous segment of the hand.

4) Morphological Operations: Despite thresholding, in the majority of cases the resultant image fails to contain only the concerned hand figure. The image may contain border objects or stray pixels which may debilitate precise feature extraction. In order to remove these, we resort to certain morphological operations, which essentially include removal of border objects, small objects, and extraneous or noisy pixels, resulting in a clean binary image with the requisite segment of the hand. Fig. 4(a)-(e) depict the aforementioned steps, in that order. Fig. 4(e) is the final thresholded image, which is qualified for the subsequent gesture classification process.

Fig. 4 (a) Original image; (b) Thresholded image; (c) Removing border; (d) Removing small objects; (e) Filling holes

5) Gesture Classification:
Calculation of Moment Invariants [7]: For feature extraction we use image moments. Classification using Template Matching and Nearest Neighbor classifiers was tried, but proved highly susceptible to errors when a user is unable to reproduce an exact hand gesture that is already stored in the database. It is also prone to errors when the image is scaled, which is likely when the camera is repositioned. The gesture variations caused by rotation, scaling, and translation can be circumvented by using a set of features that are invariant to these operations.

Fig.5 Image Moments of normal, scaled, translated, and rotated images

Table I— The 7 Image Moments of the gestures in fig. 5

Fig. 5 shows an example of a thresholded hand gesture which is scaled, translated, and rotated respectively, and Table I shows the respective image moments. It is clear from the table that the image moments are significantly close to each other. This example substantiates the usability of image moments in object recognition applications, as they are considerably invariant to scaling, rotation and translation. We have taken 3 image moments, Φ1, Φ2, and Φ3, for our experimentations. The usage of Image Moments largely simplifies our algorithm and decreases the processing time as compared with template matching.

Having achieved the above stages, we have successfully extracted the image moments from an image of a user's hand gesture. However, these moments would remain meaningless unless we are able to classify the newly issued gesture into one of the pre-defined gestures, each of which corresponds to a particular command to control the wheelchair. We have utilized the concept of the Artificial Neural Network (ANN) for Hand Gesture Classification.

Estimating the Best Set of Gestures: Faulty gesture classification due to overlapping moment invariants can be obviated by selecting the most appropriate gestures for our application. Over 5000 sample images with corresponding moment invariants, representing 12 gestures, were chosen, ensuring that these were distinctly identifiable and ergonomically feasible to issue. Accuracy scores for these gestures as a result of the succeeding experimentation are exhibited in fig. 6.
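The moment invariants described above can be sketched in plain Python. The following is an illustrative reimplementation of Hu's first three invariants [7] from normalized central moments, not the NI Vision Assistant code used in the paper; the toy blob images are our own.

```python
# Sketch: first three Hu moment invariants of a binary image, computed
# from scale-normalized central moments (illustrative, not the authors'
# NI Vision implementation).

def raw_moment(img, p, q):
    # img: 2D list of 0/1 pixels; M_pq = sum over set pixels of x^p * y^q
    return sum((x ** p) * (y ** q)
               for y, row in enumerate(img)
               for x, v in enumerate(row) if v)

def hu_phi123(img):
    m00 = raw_moment(img, 0, 0)          # object area
    xb = raw_moment(img, 1, 0) / m00     # centroid x
    yb = raw_moment(img, 0, 1) / m00     # centroid y

    def mu(p, q):                        # central moment about the centroid
        return sum(((x - xb) ** p) * ((y - yb) ** q)
                   for y, row in enumerate(img)
                   for x, v in enumerate(row) if v)

    def eta(p, q):                       # scale-normalized central moment
        return mu(p, q) / (m00 ** (1 + (p + q) / 2))

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    phi3 = (eta(3, 0) - 3 * eta(1, 2)) ** 2 + (3 * eta(2, 1) - eta(0, 3)) ** 2
    return phi1, phi2, phi3

# An L-shaped blob and a translated copy yield the same invariants up to
# floating-point error, because central moments are taken about the centroid.
blob = [[1, 0, 0],
        [1, 0, 0],
        [1, 1, 1]]
shifted = [[0, 0, 0, 0, 0],
           [0, 1, 0, 0, 0],
           [0, 1, 0, 0, 0],
           [0, 1, 1, 1, 0]]
assert all(abs(a - b) < 1e-9
           for a, b in zip(hu_phi123(blob), hu_phi123(shifted)))
```

Scale invariance comes from the m00 normalization in eta, and rotation invariance from the particular combinations in phi1-phi3; translation invariance alone is exercised in the check above.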
Fig.6 The 12 distinct gestures with their respective classification accuracy scores

Furthermore, a Φ1-Φ2-Φ3 3D plot was used to rule out the gestures with overlapping spatial distributions. Fig. 7 shows the resultant 8 gestures that were culled by discarding gestures with overlapping image moments and low accuracy scores. Notice that the moment invariant clusters are distinctly distributed in space and non-overlapping.

were used as the activation functions in layer 1 and layer 2 respectively. We used NI LabVIEW to train the system using the Back Propagation Algorithm and the Delta Rule.

The index of the neuron which fires in the last stage represents the type of gesture recognized. We obtained an output vector of size 8x1 with one element '1' and the remaining elements '0'. Whichever command corresponds to the output of index 1 is executed. For example, if the output is [0 1 0 0 0 0 0 0]', gesture 2 is recognized and the command corresponding to it is carried out. If more than one neuron fires at the same time, or if none fires, the gesture is considered unrecognized [9].

B. Vision Based Control:
Vision based control has been included in our application to enhance the scope of the functionalities offered by the electronic wheelchair. It is not hard to notice that eye movements are far more comfortable than moving any other part of the body. This simple notion plays a momentous role in establishing motion control for maneuvering the wheelchair. Fundamentally, the notion is to afford a system that directs the subject to the desired place merely by eye movements, i.e., the direction of motion is determined by where the subject is looking. This rationale can be corroborated by everyday experience: whenever a person desires to move to a place in close proximity, his instinctive reaction is to look toward the place before making a move. It is possible to track the movement of the iris, which is an obvious indicator of the direction of travel.
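The gaze-to-direction idea just described can be sketched as a simple decision rule: compare the tracked iris centre with the centre of the eye region and let the horizontal offset select left, straight, or right. This is a minimal sketch under our own assumptions; the dead-zone fraction and command names are illustrative, not taken from the paper.

```python
# Sketch: map the iris centre x-coordinate to a coarse motion command.
# The dead_zone fraction (how far the iris may drift from centre while
# still counting as "looking straight ahead") is an assumed value.

def gaze_command(iris_x, eye_left, eye_right, dead_zone=0.15):
    width = eye_right - eye_left
    centre = eye_left + width / 2.0
    offset = (iris_x - centre) / width   # normalized, roughly -0.5 .. +0.5
    if offset < -dead_zone:
        return "TURN_LEFT"
    if offset > dead_zone:
        return "TURN_RIGHT"
    return "FORWARD"

assert gaze_command(30, eye_left=0, eye_right=100) == "TURN_LEFT"
assert gaze_command(50, eye_left=0, eye_right=100) == "FORWARD"
assert gaze_command(72, eye_left=0, eye_right=100) == "TURN_RIGHT"
```

In practice the iris coordinates would come from the circle-detection stage, and some temporal smoothing over consecutive frames would be needed to avoid jitter.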
of the image so as to separate the pupil from the rest of the image. This step is accompanied by morphological operations, such as removal of border pixels and small objects, and by particle filtering to improve the quality of the processed images and to ensure their independence from noise and distortions. We then find the centre coordinates of the iris, with the help of its curvature, using the 'circle detector' function. The concomitant snapshots shown in fig. 8 attest to the aforementioned steps.

in the direction of the line of sight (LoS). If the issued gesture is not gesture 1, then whichever command corresponds to the identified gesture is executed. Five of the 8 gestures are shown in fig. 11 with the corresponding commands performed.
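The dispatch logic described above, combining the one-hot ANN output with the gesture-to-command mapping (where gesture 1 hands direction control over to eye tracking), can be sketched as follows. The command names are illustrative placeholders; only the decoding rule and the roles of gestures 1-5 come from the paper.

```python
# Sketch: decode the 8x1 one-hot ANN output and dispatch the command.
# Gestures 6-8 exist in the system but their commands are not listed in
# the paper, so they fall through to "IGNORE" here (our assumption).

COMMANDS = {
    1: "EYE_CONTROL",   # gesture 1: direction follows line of sight
    2: "HALT",          # gesture 2: stop the wheelchair
    3: "FORWARD",       # gesture 3: traverse straight
    4: "ROTATE_LEFT",   # gesture 4
    5: "ROTATE_RIGHT",  # gesture 5
}

def decode_gesture(output):
    """Return the 1-based gesture index, or None if the vector is not a
    valid one-hot output (zero or multiple neurons firing)."""
    if output.count(1) != 1:
        return None                     # unrecognized gesture
    return output.index(1) + 1

def dispatch(output):
    g = decode_gesture(output)
    if g is None or g not in COMMANDS:
        return "IGNORE"                 # unrecognized: no action taken
    return COMMANDS[g]

assert dispatch([0, 1, 0, 0, 0, 0, 0, 0]) == "HALT"        # gesture 2
assert dispatch([1, 0, 0, 0, 0, 0, 0, 0]) == "EYE_CONTROL" # gesture 1
assert dispatch([0, 1, 1, 0, 0, 0, 0, 0]) == "IGNORE"      # two neurons fired
```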
Fig. 12 (a) The Front Panel of our application; (b) LabVIEW VI — The entire Block Diagram for Hand Gesture and Vision Controlled Wheelchair

Fig. 11 above shows five of the eight eclectic gestures we had obtained earlier, essential for our application. Gesture 1 transfers the responsibility of direction control to eye motion; the wheelchair travels in the direction of the line of sight. Gesture 2 halts the wheelchair. Gestures 3, 4 and 5 make the wheelchair traverse straight, rotate left and rotate right, respectively. Fig. 12(a) and 12(b) show the front panel and the LabVIEW VI block diagram of our application.

All image-processing experimentations were performed using NI Vision Assistant; NI LabVIEW was used for training and implementing the Neural Network and also to effect motion detection and control of the wheelchair. We used a computer running Windows XP with a 2 GHz processor and 512 MB of RAM for all our initial experimentations.

Fig. 13 NI Vision script — Image processing for Hand Gesture Control

Fig. 14 NI Vision script — Image processing for Vision Based Control

The NI Vision scripts prepared for Hand Gesture and Vision based control are shown in fig. 13 and fig. 14 respectively. Our image processing algorithm required, on average, only about 9 ms and 556 ms per image for Hand Gesture and Vision based Control respectively. We measured a total of less than 650 ms for the entire operation on our computer, from image acquisition to the execution of an action.

Our hardware experimentation involved the CVS-1454 and the cRIO-9074 with the cRIO digital I/O module 9472. The Compact Vision System (CVS-1454) was interfaced with two Firewire (IEEE-1394) CMOS cameras. The CVS-1454 was configured to reside on the same subnet as the development computer. The VI was deployed on the CVS-1454, enabling it to run the application without further connection to the development computer. The CMOS cameras perform image acquisition, and the image processing is performed in real time by the CVS-1454. The Image Moments, which form the output of the CVS, are dispatched to the cRIO-9074. An Ethernet cable is used to link the cRIO and the CVS. The requisite information, once passed onto the cRIO, is subject to further processing. The VI prepared on the development computer was also deployed onto the cRIO. The cRIO processes the Image Moments, classifies the issued gesture with the help of the Neural Network, and dispatches the command to the motors via the cRIO digital I/O module 9472.

VI. SUMMARY AND FUTURE WORK
This paper describes a novel vision and hand gesture based locomotive system ideal for physically challenged individuals. The electronic wheelchair also focuses on complying with the user's instinctive volition, enabling much easier mobility. An eclectic set of 8 hand gestures was selected by statistical analysis and used in our experimentations. Our system excels other systems in that it uses only a minimum number of gestures and facilitates easy implementation with accurate results. The system proved to be highly responsive to user inputs and behaved precisely in the manner it was programmed to. We plan to incorporate dynamic, three-dimensional gestures [10] in the future for increased functionality. We also plan to incorporate a GPS based failure notification system, which can also be used to monitor the users' whereabouts.

REFERENCES
[1] M. Weiser, "The Computer for the Twenty-First Century," Scientific American, pp. 94–104, September 1991.
[2] J. Davis and M. Shah, "Recognizing hand gestures," in Proc. European Conf. on Computer Vision, Stockholm, 1994, pp. 331–340.
[3] C. Metzger, M. Anderson, and T. Starner, "FreeDigiter: A contact-free device for gesture control," in Eighth IEEE International Symposium on Wearable Computers (ISWC'04), 2004, pp. 18–21.
[4] Marius et al., "Face detection using color thresholding and template matching," Stanford University, CA, May 2003. http://www.stanford.edu/class/ee368/Project_03/Project/reports/ee368group15.pdf
[5] http://www.ni.com/pdf/products/us/2005_5732_221_101_lo.pdf
[6] http://www.ni.com/pdf/manuals/374639c.pdf
[7] M. Hu, "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory, vol. IT-8, pp. 179–187, 1962.
[8] S. Haykin, Neural Networks (Prentice-Hall, 1999, 2nd edn.)
[9] R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image Processing Using MATLAB (Pearson Prentice-Hall, 2004)
[10] Y. Xiaoming and X. Ming, "Estimation of the fundamental matrix from uncalibrated stereo hand images for 3D hand gesture recognition," Pattern Recognition, vol. 36, pp. 567–584, 2003.