
ICVES 2009

A Smart Mobility Solution for Physically Challenged


Rajat Garg, N. Shriram, Vikrant Gupta, and Vineet Agrawal
Department of Electronics and Communication Engineering
VIT University, Vellore 600014, India
{rajatgarg2006, nshriram2006, vikrantgupta2006, vineetagrawal2006}@vit.ac.in

Abstract— This paper presents a hand gesture and vision driven wheelchair system for physically handicapped people. The system is contrived to have a utilitarian design that caters to most disabilities rather than just one. In order to minimize system-human interaction, we have endeavored to present two distinct but closely related concepts, viz. hand gesture and vision based control, in conjunction for wheelchair control. The wheelchair system is designed to provide a natural and intuitive user interface which comprehends and reacts in compliance with the user's instinctive volition, easing the mobility of physically challenged people. All image processing was performed using NI Vision Assistant. NI LabVIEW was used to train and implement neural networks for hand gesture classification and to enable motion control. A prototype was developed on which all our experiments were successfully carried out. The system has been tested with users of varied hand shapes and proved to be extremely reliable.

Keywords— Thresholding, Morphological operations, Image Moments, Artificial Neural Network (ANN).

I. INTRODUCTION
Old age brings infirmity, which eventuates in bodily enervation. For old people this state often leads to adverse scenarios in which locomotion poses a major challenge, and the situation is more tragic still in the case of the paralyzed and the quadriplegic [1]. In view of these issues, a locomotive contrivance would serve a monumental purpose, especially one that comprehends and complies with the user's will. The challenge is thus to devise and construct a locomotive contraption, viz. a wheelchair, which accords with the subject's instinctive volition and reciprocates in a manner that assists navigation.

Fig. 1 The conceived architecture of the proposed electronic wheelchair
Various wheelchair systems are found aplenty; voice operated devices are by far the most popular. However, voice operated wheelchairs inhibit the user's natural ability to communicate. Moreover, speech does not conform to the user's instinctive means of enabling locomotion, and such chairs may prove very difficult to control and maneuver. Numerous attempts have been made to map eye movements onto a display and to perform tasks by drifting a cursor across the options available on the terminal. Though successful, this mode of operation exerts a huge amount of stress on the user and may make the system impracticable. Endeavors have also been made to develop thought-controlled wheelchairs.

Our application is distinguished from these previous attempts by a few pronounced differences:

• An eclectic gesture set is used to offer higher accuracy with minimized confusion. By contrast, conventional wheelchairs comprise buttons, joysticks, paddles and levers to perform the various control maneuvers. The subject must know the whereabouts of the controls and be sufficiently agile to reach and operate them, which is difficult in the case of physically challenged and infirm people.
• The system is very robust to lighting variations.
• Image moments are used instead of template matching for classification, which significantly reduces total processing time owing to the simplified algorithm. The extracted features are invariant to scaling, rotation and translation.
• Image classification is performed using an artificial neural network, which makes the system more intuitive and adaptive in nature. It also proved significantly quicker and more efficient than template matching and nearest neighbor classifiers.

In the interest of affording more natural user communication, and in order to minimize human-system interaction, we have endeavored to present two distinct but closely related concepts, viz. hand gesture [2,3] and vision based [4] control, used in conjunction as modes of communication to the electronic wheelchair contingent on the user's will, subsequently facilitating locomotion. The idea is to improve the independence of physically challenged people over complicated systems with complex functionalities, thus offering a locomotive device that comprehends the user's instinctive thought process and enables simplified motion.

This paper is organized into three main sections. First, a detailed elucidation of the principles involved in hand gesture control is presented; this section also illustrates the various image processing methods adopted, followed by a discussion of image moments and artificial neural networks, which facilitate convenient classification. This is followed by a description of the methodology for vision based control. The final section presents the test results of our experiments, which use the aforementioned concepts in concurrency.



II. SYSTEM CONFIGURATION

The conceived architecture of the wheelchair is shown in fig. 1. The illustration shows two IEEE-1394, monochrome, CMOS cameras for image acquisition. For capturing hand gestures, one camera is situated facing downward as shown. For enabling vision control, eye movements are captured by the second, lightweight camera, affixed to a band around the forehead and focused on the eye as outlined. The cameras can be repositioned for the user's convenience. The wheelchair has a black colored hand rest. To avoid system complexity, the user is constrained to issue hand gestures while placing the hand on this hand rest. The hand rest is also colored black to make the background uniform and to alleviate complications in the subsequent image processing.

Besides the cameras, the system consists of an image processing unit and a decision-making unit. Image processing was performed exclusively using LabVIEW RT with the Vision Development Module, and decision-making was established using NI LabVIEW RT with the help of features extracted by the image-processing unit. The result of the decision-making unit is signaled as commands to the motors through the digital I/O module 9472.

Video acquisition of the eye movement and hand motion is done either in synchronism or separately, based on the nature of the control to be executed. The appropriate information in the gestures is extracted and issued as commands to maneuver the wheelchair. Fig. 2 demonstrates the general system setup.

Fig. 2 System Configuration

III. SYSTEM IMPLEMENTATION

A Compact Vision System [NI CVS-1454 [5]] and a CompactRIO [NI cRIO-9074 [6]] were used as the image processing and decision-making units respectively. The cameras are interfaced with the CVS-1454. The NI cRIO digital I/O module 9472 was used to signal the decisions made by the NI cRIO-9074 to the respective motors. An NI cRIO analog input module 9201 was used to take analog input from the sensors and microphone and send the information to the NI cRIO for immediate processing. Four proximity sensors were used to check whether the desired direction is obstacle-free or not.

The Firewire CMOS cameras monitor the hand or eye movements constantly. The wheelchair is initially in stand-by mode; a switch is pressed to initialize the system. Following this activation, the camera performs image acquisition for processing, and depending on the results of this image processing, commands are signaled to the motors. However, prior to image processing we perform motion detection on the sequence of acquired frames to discard intermediate and inappropriate frames between any two consecutive legitimate gestures. Thus only those frames in which the hand or eyeball is static for a considerable amount of time are taken for image processing, and all others with motion or blurring are discarded. The required interval is entirely dependent on the convenience of the user. Considering only relevant frames is of primary importance, as the system requires only legitimate gestures to issue commands and every other gesture is unneeded; discarding unnecessary frames avoids overloading the processor.
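The motion-gating step described above was implemented in LabVIEW. Purely as an illustration, the following Python sketch shows one way to realize the same idea with simple frame differencing; the threshold and hold-time values are assumptions, not the authors' settings.

```python
import cv2
import numpy as np

STILL_THRESHOLD = 2.0   # mean absolute pixel difference below which a frame counts as static
HOLD_FRAMES = 15        # consecutive static frames required (user-dependent interval)

def select_static_frames(frames):
    """Yield only frames in which the scene has been static for HOLD_FRAMES
    consecutive frames; blurred, transitional frames between two legitimate
    gestures are dropped, so the processor sees valid gestures only."""
    prev = None
    still_run = 0
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            diff = np.mean(cv2.absdiff(gray, prev))
            still_run = still_run + 1 if diff < STILL_THRESHOLD else 0
            if still_run >= HOLD_FRAMES:
                yield frame          # a legitimate, motion-free gesture frame
                still_run = 0        # wait for the next held gesture
        prev = gray
```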
The following sections elaborate the image processing techniques that were adopted for A) hand gesture control and B) vision based control respectively.

A. Hand Gesture Control

Hand gesture control plays a primal role in our application. In this method of control, the user gesticulates a specific, pre-defined hand gesture to the system, and the wheelchair in turn reacts to the specific gesture issued. The user thus has the control of the wheelchair at his disposal and can maneuver it the way he wants.

Fig. 3 (a) Flowchart describing Hand Gesture control and (b) Flowchart describing the Motion Detection procedure

As mentioned earlier, in view of reducing image processing complexity, the user is constrained to give the gesture while placing the hand on the hand rest. This is not an inconvenience, as placing the hand on the hand rest is an ergonomically viable posture. Also, to reduce the complexity of a non-uniform background, the hand rest is made black in color. The basic steps followed in hand gesture control are shown in the flowchart in fig. 3(a).
1) Image Acquisition: Our application focuses on the hand segment enclosed between the wrist and the fingertips, and all image processing is done only on this portion of the hand. This consideration is of chief importance in assisting the system toward much easier interpretation of the gestures. Images are continuously acquired by the camera fitted over the hand rest and are pre-processed to check whether the gesture given is static or not.

2) Removal of Erratic Gestures (Motion Detection): In order to eliminate the necessity of processing every frame, we perform motion detection on the sequence of images prior to any other image processing. The user is stipulated to hold a legitimate gesture for a specific interval of time such that there is insignificant hand motion in that interval. With motion detection, all the intermediate frames between any two legitimate gestures, which are either blurred or unrecognizable, can be discarded, and only valid gestures are considered for further processing. The flowchart in fig. 3(b) elaborates the process of motion detection.
3) Thresholding: The relevant image is now thresholded using adaptive thresholding, which automatically sets the threshold value to pick out brighter objects. Since we use a black background, the hand behaves as a bright object and is cleanly separated from the background after thresholding. To identify the region of interest, in this case the region enclosed between the wrist and the fingertips, we resort to a band worn around the wrist. This proved to be of immense assistance in delineating the ROI from the extraneous segment of the hand.
4) Morphological Operations: Despite thresholding, in the majority of cases the resultant image contains more than just the hand figure of interest. The image may contain border objects or stray pixels which debilitate precise feature extraction. To remove these, we apply morphological operations, essentially the removal of border objects, small objects, and extraneous or noisy pixels, resulting in a clean binary image with the requisite segment of the hand. Fig. 4(a)-(e) depicts these steps in order; fig. 4(e) is the final thresholded image, qualified for the subsequent gesture classification process.

Fig. 4 (a) Original image; (b) Thresholded image; (c) Removing border objects; (d) Removing small objects; (e) Filling holes.
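Steps 3 and 4 were carried out with NI Vision; purely as an illustrative stand-in, the sketch below reproduces the same pipeline in Python, with Otsu's method substituting for the automatic threshold selection and SciPy connected-component tools handling border-object removal, small-object removal and hole filling. The min_area value and other particulars are assumptions.

```python
import cv2
import numpy as np
from scipy import ndimage

def clean_hand_mask(gray, min_area=500):
    """Threshold the hand (a bright object on the black hand rest) and clean
    the mask as in fig. 4(a)-(e): drop border-touching blobs and small specks,
    then fill holes. The black wrist band disconnects the hand segment of
    interest from the image border, so border removal keeps the ROI intact."""
    # Otsu's method stands in for the automatic threshold selection.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    labels, n = ndimage.label(mask > 0)
    border = set(labels[0, :]) | set(labels[-1, :]) | set(labels[:, 0]) | set(labels[:, -1])
    keep = np.zeros(mask.shape, dtype=bool)
    for lab in range(1, n + 1):
        blob = labels == lab
        # Discard components touching the border and tiny noise components.
        if lab not in border and blob.sum() >= min_area:
            keep |= blob
    keep = ndimage.binary_fill_holes(keep)   # fill holes inside the hand
    return (keep * 255).astype(np.uint8)
```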
5) Gesture Classification:

Calculation of Moment Invariants [7]: For feature extraction we use image moments. Classification using template matching and nearest neighbor classifiers was tried first, but proved highly susceptible to errors whenever a user could not exactly reproduce a hand gesture already stored in the database; template matching is also prone to errors when the image is scaled, which is likely when the camera is repositioned. Gesture variations due to rotation, scaling, and translation can be circumvented by using a set of features that is invariant to these operations.

Fig. 5 Image Moments of normal, scaled, translated, and rotated images

Table I — The 7 image moments of the gestures in fig. 5

Fig. 5 shows an example of a thresholded hand gesture that is scaled, translated, and rotated respectively; Table I lists their respective image moments. It is clear from the table that the image moments are significantly close to one another. This example substantiates the usability of image moments in object recognition applications, as they are considerably invariant to scaling, rotation, and translation. We have taken three image moments, Φ1, Φ2, and Φ3, for our experiments. The usage of image moments greatly simplifies our algorithm and decreases processing time compared with template matching.
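For reference, Φ1, Φ2 and Φ3 are the first three of Hu's seven moment invariants [7], defined from the normalized central moments:

```latex
% Normalized central moments (for p + q >= 2):
\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\,1+(p+q)/2}}
% First three Hu invariants, the \Phi_1, \Phi_2, \Phi_3 used here:
\phi_1 = \eta_{20} + \eta_{02}
\phi_2 = \left(\eta_{20} - \eta_{02}\right)^2 + 4\,\eta_{11}^{2}
\phi_3 = \left(\eta_{30} - 3\eta_{12}\right)^2 + \left(3\eta_{21} - \eta_{03}\right)^2
```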
Having achieved the above stages, we have successfully extracted the image moments from an image of a user's hand gesture. However, these moments would remain meaningless unless we are able to classify the newly issued gesture as one of the pre-defined gestures, each of which corresponds to a particular command to control the wheelchair. We utilize an Artificial Neural Network (ANN) for hand gesture classification.

Estimating the Best Set of Gestures: Faulty gesture classification due to overlapping moment invariants can be obviated by selecting the most appropriate gestures for our application. Over 5000 sample images with corresponding moment invariants, representing 12 gestures, were chosen, ensuring that the gestures were distinctly identifiable and ergonomically feasible to issue. Accuracy scores for these gestures, obtained in the subsequent experimentation, are exhibited in fig. 6.
Fig. 6 The 12 distinct gestures with their respective classification accuracy scores

Furthermore, a Φ1-Φ2-Φ3 3D plot was used to rule out gestures with overlapping spatial distributions. Fig. 7 shows the resultant 8 gestures, culled by discarding gestures with overlapping image moments or low accuracy scores. Notice that the moment invariant clusters are distinctly distributed in space and non-overlapping.

Fig. 7 A Φ1-Φ2-Φ3 3D plot illustrating the spatial distinction of the image moment clusters for the hand gestures chosen.
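Feature extraction itself is only a few lines in any modern vision library. The following sketch, an illustration rather than the authors' NI Vision script, computes the (Φ1, Φ2, Φ3) vector of a cleaned binary mask; the log-scaling at the end is a common practical refinement that is not part of the paper.

```python
import cv2
import numpy as np

def gesture_features(mask):
    """Compute the (phi1, phi2, phi3) feature vector of a binary hand mask.
    These are the first three Hu moment invariants, whose per-gesture
    clusters are visualized in fig. 7."""
    m = cv2.moments(mask, binaryImage=True)
    hu = cv2.HuMoments(m).flatten()
    phi = hu[:3]
    # Log-scaling (an assumption, not from the paper) compresses the large
    # dynamic range of the invariants before classification.
    return -np.sign(phi) * np.log10(np.abs(phi) + 1e-30)
```

Computing this vector for scaled, rotated, and translated versions of the same gesture mask should yield nearly identical values, which is precisely the property Table I illustrates.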

Training of ANN using the optimal gesture set: Neural networks [8] have been applied to perform complex functions in numerous applications, including pattern recognition, classification, and identification. In comparison with other methods, a neural network classifier is more effective and less time-consuming; the ability of an ANN to learn and predict over time makes it a powerful classification tool, ideal for complex computation. ANN-based hand gesture classification thus provides for a more intuitive system.

Following the elimination process, the neural network was trained using this optimal set of gestures. The system was trained with approximately 1000 gesture images corresponding to the 8 gestures selected earlier, around 120 samples per gesture. The number of iterations was set to 1000 with a mean square error target of 0.01, and the ANN was trained twice to develop an accurate transfer function between input and output. We developed a two layer perceptron model to recognize the gestures issued to the system. The first layer consists of ten neurons and the second layer of eight; sigmoid and pure linear functions were used as the activation functions in layers 1 and 2 respectively. We used NI LabVIEW to train the system using the back propagation algorithm and the delta rule.

The index of the neuron which fires in the last stage represents the type of gesture recognized. We obtain an output vector of size 8x1 with one element '1' and the remaining elements '0'; whichever command corresponds to the index of the '1' is executed. For example, if the output is [0 1 0 0 0 0 0 0]', gesture 2 is recognized and the command corresponding to it is carried out. If more than one neuron fires at the same time, or if none fires, the gesture is considered unrecognized [9].
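The forward pass of the classifier just described is compact. The sketch below assumes the weights have already been trained (the authors trained in NI LabVIEW with back propagation and the delta rule); the firing threshold of 0.5 is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify_gesture(phi, W1, b1, W2, b2, fire_threshold=0.5):
    """Forward pass of the two layer perceptron described in the text:
    10 sigmoid hidden neurons, 8 pure linear output neurons (one per gesture).
    phi: (3,) feature vector of Hu invariants.
    W1: (10, 3), b1: (10,), W2: (8, 10), b2: (8,) -- assumed pre-trained."""
    h = sigmoid(W1 @ phi + b1)          # layer 1: sigmoid activations
    y = W2 @ h + b2                     # layer 2: pure linear activations
    fired = y > fire_threshold          # a neuron "fires" above the threshold
    if fired.sum() != 1:
        return None                     # none or several fired: unrecognized [9]
    return int(np.argmax(fired)) + 1    # gesture index 1..8
```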
B. Vision Based Control

Vision based control has been included in our application to enhance the scope of the functionalities offered by the electronic wheelchair. It isn't hard to notice that eye movements are far more comfortable than moving any other part of the body, and this simple notion plays a momentous role in establishing motion control for the wheelchair. Fundamentally, the idea is to afford a system that directs the subject to the desired place merely by eye movements, i.e. the direction of motion is determined by where the subject is looking. This rationale is corroborated by everyday experience: whenever a person desires to move to a place in close proximity, his instinctive reaction is to look toward the place before making a move. It is thus possible to track the movement of the iris, an obvious indicator of the intended direction of travel.

Fig. 8 Image processing using NI Vision on a sample image to determine the center coordinates of the iris

Following the acquisition of applicable images using motion detection as indicated in section II, we perform the following sequence of processes to procure the centre coordinates of the iris. Image calibration tools are first used to remove nonlinear and perspective errors caused by lens distortion and camera placement. We then smooth the acquired image with a smoothing filter in order to remove noise. This is followed by thresholding of the image so as to separate the pupil from the rest of the image, accompanied by morphological operations, such as removal of border pixels and small objects, and particle filtering, to improve the quality of the processed image and ensure its independence from noise and distortion. We then find the centre coordinates of the iris from its curvature using the 'circle detector' function. The snapshots in fig. 8 attest to the aforementioned steps.
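As an illustrative equivalent of the NI Vision sequence above, the following sketch uses a Hough circle transform in place of NI Vision's 'circle detector'; all parameter values are assumptions and would need tuning for a real camera setup.

```python
import cv2
import numpy as np

def iris_center(eye_gray):
    """Estimate the iris/pupil centre: smooth, threshold the dark pupil,
    clean the mask morphologically, then fit a circle. Returns (x, y) or
    None when no circle is found."""
    smoothed = cv2.GaussianBlur(eye_gray, (7, 7), 0)
    # Inverted Otsu threshold: the pupil is the darkest region of the eye image.
    _, mask = cv2.threshold(smoothed, 0, 255,
                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # drop small specks
    circles = cv2.HoughCircles(mask, cv2.HOUGH_GRADIENT, dp=2, minDist=50,
                               param1=100, param2=20, minRadius=5, maxRadius=60)
    if circles is None:
        return None
    x, y, _r = circles[0, 0]
    return float(x), float(y)
```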

Fig. 9 Depiction of the vision control algorithm

The Algorithm: In order to determine where the subject is looking, the following methodology is adopted (a minimal code sketch follows the list):
• CR: x-coordinate of the reference point
• C: x-coordinate of the centre of the eye
• b: parameter that determines the direction of motion; it can be calibrated based on the user's convenience and the surrounding conditions
• If CR > C and |CR - C| > b, turn RIGHT.
• If CR < C and |CR - C| > b, turn LEFT.
• If |CR - C| < b, go STRAIGHT.
We take the reference point to be the centre of the iris when the subject is looking straight ahead. With respect to this reference we can decide where the subject is looking; for example, if he is looking toward the left, his intention is to turn left. The extent to which a turn is negotiated can be calibrated based on the surroundings and the user's comfort level. The accompanying fig. 9 shows the reference point: when the subject moves his eye beyond the parameter 'b' as shown, the system is signaled to execute a turn in that direction to the extent defined by the user.
C. Motion Control

In order to establish full-fledged motion control, we have incorporated both vision based and hand gesture based control in conjunction. Fig. 10 shows a flowchart illustrating the general conception of motion control implemented using a combination of both methods. The issued hand gesture is identified as one of the 8 qualified gestures. If the issued gesture is gesture 1, responsibility for direction control is transferred to the eye motion, and the wheelchair traverses in the direction of the line of sight (LoS). If the issued gesture is not gesture 1, the command corresponding to the identified gesture is executed. Five of the 8 gestures are shown in fig. 11 with the corresponding commands performed.
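The combined scheme then reduces to a small dispatcher. The sketch below reuses the eye_direction helper from the previous sketch; the command names are placeholders, not the authors' LabVIEW signal names, and the halt fallback for unlisted or unrecognized gestures is an assumption.

```python
# Illustrative dispatch of the combined control scheme (cf. fig. 10 and fig. 11).
GESTURE_COMMANDS = {2: "HALT", 3: "STRAIGHT", 4: "ROTATE_LEFT", 5: "ROTATE_RIGHT"}

def motion_command(gesture, iris_c, iris_c_ref, b):
    """Gesture 1 delegates direction control to the eye (line of sight);
    any other recognized gesture maps to its fixed command."""
    if gesture is None:
        return "HALT"                      # unrecognized gesture: stay safe
    if gesture == 1:
        return eye_direction(iris_c_ref, iris_c, b)
    return GESTURE_COMMANDS.get(gesture, "HALT")
```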
Fig. 10 Flowchart illustrating the motion control implementation

IV. LIMITATIONS, ISSUES AND SOLUTIONS

The following are the issues we confronted during the course of our experiments, together with the corresponding proposed solutions.

1) Lighting Conditions: Poor illumination of captured images dramatically degrades the effectiveness of the system. To combat this issue, the light source is placed such that the maximum reflected light comes from the hand. The hand is illuminated using a 60 W light source; to prevent the source from perturbing the user, an 850 nm filter is positioned over it. The illumination cannot be detected by the human eye, as 850 nm falls in the infrared region, and hence does not agitate the user. A diffused, non-obtrusive light source is likewise used to illuminate the subject's eye in the event of poor illumination.

2) Defining the Region of Interest (ROI): To identify the region of interest, in this case the region enclosed between the wrist and the fingertips, we resort to a band worn around the wrist. This proved to be of immense assistance in delineating the ROI from the extraneous segment of the hand. The color of the band was chosen black to match the background, which facilitated the image processing.

V. EXPERIMENTAL RESULTS

Fig. 11 Hand gestures numbered 1-5 corresponding to the commands they execute
Fig. 12 (a) The front panel of our application; (b) LabVIEW VI: the complete block diagram for the hand gesture and vision controlled wheelchair

Fig. 11 above shows five of the eight eclectic gestures obtained earlier, essential for our application. Gesture 1 transfers responsibility for direction control to the eye motion, and the wheelchair travels in the direction of the line of sight. Gesture 2 halts the wheelchair. Gestures 3, 4 and 5 make the wheelchair traverse straight, rotate left, and rotate right respectively. Fig. 12(a) and 12(b) show the front panel and the LabVIEW VI block diagram of our application.

All image-processing experimentation was performed using NI Vision Assistant, and NI LabVIEW was used both for training and implementing the neural network and for effecting motion detection and control of the wheelchair. We used a computer running Windows XP with a 2 GHz processor and 512 MB of RAM for all our initial experimentation.

Fig. 13 NI Vision script: image processing for Hand Gesture Control

Fig. 14 NI Vision script: image processing for Vision Based Control

The NI Vision scripts prepared for hand gesture and vision based control are shown in fig. 13 and fig. 14 respectively. Our image processing algorithm required on average only about 9 ms per image for hand gesture control and 556 ms for vision based control. We measured a total of less than 650 ms for the entire operation on our computer, from image acquisition to an action being executed.

Our hardware experimentation used the CVS-1454 and cRIO-9074 with the cRIO digital I/O module 9472; the Compact Vision System [CVS-1454] was interfaced with the two Firewire [IEEE-1394] CMOS cameras. The CVS-1454 was configured to reside on the same subnet as the development computer, and the VI was deployed on the CVS-1454, enabling it to run the application without further connection to the development computer. The CMOS cameras perform image acquisition, and the image processing is executed in real time by the CVS-1454. The image moments, which form the output of the CVS, are dispatched to the cRIO-9074 over an Ethernet cable linking the cRIO and CVS. The requisite information, once passed to the cRIO, is subject to further processing; the VI prepared on the development computer was also deployed onto the cRIO. The cRIO processes the image moments, classifies the issued gesture with the help of the neural network, and dispatches the command to the motors via the cRIO digital I/O module 9472.

VI. SUMMARY AND FUTURE WORK

This paper describes a novel vision and hand gesture based locomotive system ideal for physically challenged individuals. The electronic wheelchair focuses on complying with the user's instinctive volition, enabling much easier mobility. An eclectic set of 8 hand gestures was selected by statistical analysis and used in our experimentation. Our system excels other systems in that it uses only a minimal number of gestures and facilitates easy implementation with accurate results. The system proved to be highly responsive to user inputs and behaved precisely in the manner it was programmed to. We plan to incorporate dynamic, three-dimensional gestures [10] in the future for increased functionality. We also plan to incorporate a GPS based failure notification system, which can also be used to monitor the user's whereabouts.
REFERENCES
[1] M. Weiser, "The Computer for the Twenty-First Century," Scientific American, pp. 94-104, September 1991.
[2] J. Davis and M. Shah, "Recognizing hand gestures," in Proc. European Conf. on Computer Vision, Stockholm, 1994, pp. 331-340.
[3] C. Metzger, M. Anderson, and T. Starner, "FreeDigiter: A contact-free device for gesture control," in Eighth IEEE International Symposium on Wearable Computers (ISWC'04), 2004, pp. 18-21.
[4] Marius et al., "Face detection using color thresholding and template matching," Stanford University, CA, May 2003. http://www.stanford.edu/class/ee368/Project_03/Project/reports/ee368group15.pdf
[5] http://www.ni.com/pdf/products/us/2005_5732_221_101_lo.pdf
[6] http://www.ni.com/pdf/manuals/374639c.pdf
[7] M. Hu, "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory, vol. IT-8, pp. 179-187, 1962.
[8] S. Haykin, Neural Networks, 2nd ed. Prentice-Hall, 1999.
[9] R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image Processing Using MATLAB. Pearson Prentice-Hall, 2004.
[10] Y. Xiaoming and X. Ming, "Estimation of the fundamental matrix from uncalibrated stereo hand images for 3D hand gesture recognition," Pattern Recognition, vol. 36, pp. 567-584, 2003.
