
2014 International Conference on Fuzzy Theory and Its Applications (iFUZZY2014)

November 26-28, Ambassador Hotel, Kaohsiung, Taiwan

A Kinect-Based Gesture Command Control Method for Human Action Imitations of Humanoid Robots
Ing-Jr Ding, Che-Wei Chang and Chang-Jyun He
Department of Electrical Engineering
National Formosa University
No.64, Wunhua Rd., Huwei Township, Yunlin County 632, Taiwan
E-mail: ingjr@nfu.edu.tw

Abstract: This paper develops a Kinect-based gesture command control method for driving a humanoid robot to learn human actions. The popular Kinect sensor is well known for its high performance on gesture recognition. This work uses three different recognition mechanisms, dynamic time warping (DTW), hidden Markov model (HMM) and principal component analysis (PCA)-based eigenspace approaches, for performing the recognition task on the specified human active gestures captured by the Kinect sensor. The recognized gesture command is then used to control the action of a humanoid robot, where the humanoid robot imitates the human active gesture according to the content of the gesture command. By the presented method, the humanoid robot can effectively learn the human action. A series of experiments on gesture recognition and humanoid robot control are conducted to compare the gesture command recognition performance of the three recognition methods and to evaluate the similarity degree of the humanoid robot imitations of the gesture command-indicated human active gesture.

Fig. 1. The scenario of human actions learning of humanoid robots with a Kinect gesture capture sensor.

Keywords: Kinect; gesture command; human joint; human action imitation; humanoid robot

978-1-4799-4588-7/14/$31.00 ©2014 IEEE

I. INTRODUCTION

With the fast development of human computer interaction (HCI), speech recognition and text recognition techniques have matured and become widely used in many areas of daily life. Gesture recognition using human active gestures is a newer type of HCI technique. The development of the Kinect sensor produced by Microsoft has sped up progress on gesture recognition [1]. Integration of the Microsoft Kinect sensor and a robot, to create a fine interface between the human user and the robot machine for performing specific functions, has been an interesting technical issue in recent years. In the work of [2], the Microsoft Kinect is applied to recognize different body gestures and generate an interaction interface between the body gesture module and the humanoid robot Nao made by the Aldebaran company. In addition, a control system for a manipulator robot using the Microsoft Kinect, based on a proportional-derivative control algorithm, is proposed in [3]. The study of [3] is a combination scheme of a robot arm control system and the Microsoft Kinect. Integrating the Kinect, gesture recognition systems and mobile devices for the application of an interactive discussion can be seen in the work of [4].

Different from those integration studies of the Kinect and the robot, this paper develops an HCI system where Kinect-based gesture command control is performed for human action imitations of the humanoid robot. Figure 1 depicts the practical scenario of human actions learning of the humanoid robot with a Kinect gesture capture sensor. The humanoid robot will operate the same active gesture as the test active user's gesture. The user's active gesture, which is viewed as the operation command for controlling the robot, is recognized by three different recognition schemes: dynamic time warping (DTW) [5], hidden Markov model (HMM) [6] and principal component analysis (PCA)-based eigenspace [7] methods.

II. GESTURE COMMAND RECOGNITION

This work employs three recognition schemes, DTW [5], HMM [6] and PCA-based eigenspace [7] approaches, for performing gesture recognition and then controlling the action of the humanoid robot according to the recognized command result. This section will primarily introduce these three recognition schemes.

A. DTW Recognition

Dynamic time warping [5] belongs to the class of dynamic programming techniques. This dynamic programming algorithm makes appropriate template-matching calculations by a time-warping technique [5]. The time
warping of DTW essentially searches for an optimal path between the testing data and the reference template. When performing DTW time warping calculations, the similarity degree between the testing data and the reference template is derived; a high distortion between the two denotes a low similarity degree.

In this work of Kinect-based gesture command recognition for humanoid robot imitations, fourteen human gesture commands are designed. Training data for these fourteen gestures are collected and then used to establish the DTW reference template database. When the test user undergoes a gesture recognition test in the test phase, each template of the DTW reference template database is compared with the test user's active gesture by time warping.

B. HMM Recognition

The hidden Markov model [6] has been widely used in the field of pattern recognition, such as speech recognition, text recognition and, in this study, human gesture recognition. As with the above-mentioned DTW recognition scheme, the HMM recognition task also includes training and test stages. Different from the template matching of DTW between two templates, the HMM approach belongs to the category of model-based techniques, and therefore, in the training stage, HMM models that contain statistical information of the training patterns are built up. In fact, HMM was used early on in speech recognition, where the HMM probability model is employed to describe the pronunciation characteristics of the speaker's uttered speech signals [8]. In this study of gesture recognition, the HMM probability model is used to describe the movement characteristics of the actor's operative gestures, especially the gestures captured by the Kinect sensing camera.

There are in total fourteen HMM classification gesture models established in this study. Each HMM gesture model among the fourteen classification models is used to calculate the likelihood degree of the test actor's operated test gesture in the test stage. The derived likelihood degree between each of the trained HMM state sequence models and the input test active gesture of a test actor is used to evaluate the recognition result of the test gesture. The label of the trained HMM state sequence model with the highest likelihood degree is the recognition outcome of the test gesture. In this study of Kinect-based gesture command recognition for humanoid robot imitations, left-to-right state transitions are adopted in both HMM training and HMM testing.

C. PCA-based Recognition

The principal component analysis technique has been widely used in image processing [7]. The eigenspace method employing the PCA technique has been proven effective in pattern recognition. This paper also explores the effectiveness of the PCA-based eigenspace approach for achieving humanoid robot imitations by Kinect-based gesture command recognition.

In this study, the PCA-based eigenspace approach for gesture recognition employs the classical principal component analysis technique, and there are two main stages included: PCA operations on the features of human activities, and the establishment of the eigenspace for representing all collected gesture information. As mentioned, fourteen human gesture commands are designed, and the training data of these fourteen gestures are collected and then used to establish the eigenspace. When the test user undergoes a gesture recognition test in the test phase, the recognition decision process locates the position of the test gesture data and then finds the most closely matched gesture category among these fourteen gesture classes as the recognition outcome for this test data.

III. HUMANOID ROBOT IMITATIONS OF HUMAN ACTIVE GESTURES

Gesture recognition adopting the DTW, HMM and PCA-based eigenspace methods is integrated into the overall human machine interactive system as a command mechanism for controlling the action of a humanoid robot. Figure 2 depicts the joint distributions in the indicated humanoid robot (the left side of Fig. 2) and the Kinect-captured human skeleton (the right side of Fig. 2). As can be seen in Fig. 2, the robot adopted to imitate the human gesture is the Bioloid humanoid robot. The adopted Bioloid humanoid robot is produced by the South Korean company Robotis, and the Bioloid robot is composed of components and modular servomechanisms (called artificial joint motors) which can be arranged according to the requirements of the user [9]. In this work, there are in total fourteen gesture commands, and therefore fourteen different setting configurations for the corresponding fourteen robot actions are made.

The number of joints in the Kinect-captured human skeleton is 20, which differs from the number of artificial joint motors in the Bioloid humanoid robot. The Bioloid humanoid robot has 18 modular servomechanisms, each of which represents a corresponding artificial joint motor. In this study, the gesture operated by the test user is recognized, and then the three-dimensional positions of a series of joint sets, each joint set containing 20 joints, in the Kinect-captured human skeleton are determined. The setting configuration for the corresponding robot action is made according to all the derived position information of the joint sets in the Kinect-captured human skeleton and the real action gesture from the human actor.

Fig. 2. Bioloid humanoid robot joints (left) and Kinect-captured human skeleton joints (right).
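To make the template-matching idea of the DTW scheme concrete, the following sketch computes an accumulated warping cost between a test gesture and each reference template, where every frame is a feature vector (for instance, the flattened 3-D coordinates of the 20 Kinect joints). This is an illustrative reconstruction, not the authors' implementation; the function names and the toy one-dimensional templates are assumptions.

```python
import math

def dtw_distance(test, template):
    """Accumulated cost of optimally time-warping 'test' onto 'template'."""
    n, m = len(test), len(template)
    inf = float("inf")
    # cost[i][j]: best warped distance aligning test[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(test[i - 1], template[j - 1])  # local frame distance
            # extend the cheapest of the three admissible warping steps
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize_dtw(test, reference_db):
    """Return the label of the reference template with the lowest DTW cost."""
    return min(reference_db, key=lambda g: dtw_distance(test, reference_db[g]))
```

A low accumulated cost corresponds to a high similarity degree, matching the distortion interpretation given in Section II-A.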

978-1-4799-4588-7/14/$31.00 2014 IEEE 209
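As a hedged sketch of how the likelihood-based HMM decision of Section II-B can work, the code below scores a discrete observation sequence with the scaled forward algorithm and picks the gesture model with the highest log-likelihood. It assumes the Kinect frames have already been quantized into discrete symbols; the two-state toy models and all names are illustrative, not the authors' trained parameters.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM). 'start[i]', 'trans[i][j]'
    and 'emit[i][symbol]' are ordinary probabilities; a left-to-right model,
    as adopted in this paper, simply has trans[i][j] == 0 for j < i."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_like = 0.0
    for t in range(1, len(obs)):
        s = sum(alpha)            # rescale to avoid numerical underflow
        log_like += math.log(s)
        alpha = [a / s for a in alpha]
        alpha = [emit[j][obs[t]] * sum(alpha[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
    return log_like + math.log(sum(alpha))

def recognize_hmm(obs, models):
    """Return the label of the gesture HMM giving the highest log-likelihood."""
    return max(models, key=lambda g: forward_log_likelihood(obs, *models[g]))
```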

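The two PCA stages described in Section II-C, building an eigenspace from the training gestures and then locating a test gesture inside it, can be sketched as below. The use of NumPy, the nearest-centroid decision rule and all variable names are assumptions made for illustration; the paper does not specify its projection dimensionality or distance measure.

```python
import numpy as np

def build_eigenspace(train_vectors, n_components):
    """PCA stage: mean-center the training gesture vectors and keep the
    top principal directions (computed via SVD of the centered data)."""
    mean = train_vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(train_vectors - mean, full_matrices=False)
    return mean, vt[:n_components]          # shapes (d,) and (k, d)

def project(x, mean, basis):
    """Locate a gesture vector inside the eigenspace."""
    return basis @ (x - mean)

def recognize_pca(x, mean, basis, centroids):
    """Nearest class centroid in the eigenspace is the recognition outcome."""
    p = project(x, mean, basis)
    return min(centroids, key=lambda g: np.linalg.norm(p - centroids[g]))
```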

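Because the 20 Kinect joints do not map one-to-one onto the 18 Bioloid servomechanisms, each robot action configuration ultimately involves converting angles measured between captured joint positions into servo goal positions. The sketch below shows one plausible such conversion; the 0-300 degree, 0-1023 goal-position range is that of the AX-12 class servos commonly shipped with the Bioloid kit, and the helper names are assumptions, not the authors' code.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by 3-D points a-b-c, e.g. the
    elbow angle from the Kinect shoulder/elbow/wrist positions."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    # clamp the cosine to [-1, 1] to guard against floating-point drift
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def to_servo_goal(angle_deg):
    """Clamp an angle into the servo's 0-300 degree range and scale it
    to the 0-1023 goal-position units."""
    return round(max(0.0, min(300.0, angle_deg)) / 300.0 * 1023)
```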
IV. EXPERIMENTS AND RESULTS

This section describes the experimental settings and related results of the presented Kinect-based gesture command control method for human action imitations of the humanoid robot. The experimental settings and database collections, the experimental results of gesture command recognition, and the experimental results of humanoid robot imitations are provided as follows.

A. Experimental Settings and the Database Collections

For the gesture recognition experiments, one Kinect sensor with the RGB camera and the infrared camera is used to capture the active user's gestures. The default frame rate of the Kinect is 30, and therefore 30 active frames in total are captured per second. In total, 7 active users are requested to operate a series of the designed active gestures. The collected gesture database includes fourteen active gestures which are popular and frequently seen. Each of five of the 7 active users is requested to operate 10 active gestures for each of the indicated fourteen gesture classes; therefore there are in total 700 active gestures over the fourteen gesture classes, which are used to establish the DTW reference template database in the experiment of DTW gesture recognition for humanoid robot imitations, the HMM gesture models in the experiment of HMM gesture recognition for humanoid robot imitations, and the gesture eigenspace in the experiment of PCA-based eigenspace gesture recognition for humanoid robot imitations. In addition, each of the two of the 7 active users not included in the stage of recognition system establishment is requested to operate 10 active gestures for each of the indicated fourteen gesture classes; therefore there are in total 280 active gestures over the fourteen gesture classes, which are used in the test phase to evaluate the performance of the three different gesture recognition approaches for controlling the action of the humanoid robot.

It is noted that there are mainly two factors in the performance of humanoid robot imitations: one is the recognition accuracy of the gesture recognition methods for dictating the robot, and the other is the matched degree between the joint number and the joint distribution in the Kinect-captured human skeleton and those in the Bioloid humanoid robot. The Bioloid humanoid robot cannot operate the same gesture as the active gesture operated by the test active user if there is an incorrect gesture command recognition result or an imperfect match of the joint number and the joint distribution between the Kinect-captured skeleton and the Bioloid humanoid robot.

Figure 3 shows the motion editor of the RoBoPlus user interface, which is used to set the configuration of Bioloid humanoid actions. For the Bioloid humanoid robot, the premium version of the kit with 18 degrees of freedom (DOF), i.e. 18 artificial joint motors, is used in this study. The Bioloid humanoid robot includes three classes of mechanical designs, Type-A, Type-B and Type-C, according to the functional complexity [9], and this work adopts the Type-A Bioloid humanoid robot. As depicted in Fig. 3, the utility tool of the motion editor of the RoBoPlus software is used in this work for the development of the motion models of the designed fourteen gesture recognition commands. In Fig. 3, there are six fields required to be completed, including starting the creation of the sets of motions, defining the individual motions associated with the motion set, displaying the position data associated with the defined motions and sending this position data directly to the required servomechanism of the Bioloid humanoid robot, displaying the position data associated with the actual position of the servomechanisms of the Bioloid humanoid robot, turning a particular servomechanism on or off, and configuring certain crucial parameters of the servomechanism.

Fig. 3. Humanoid action setup using the motion editor of the RoBoPlus user interface.

B. Experimental Results of Gesture Command Recognition

As mentioned before, the recognition accuracy of the gesture recognition methods has a direct impact on the correctness of the humanoid robot actions. An incorrect gesture command recognition result received by the humanoid robot will make the robot operate an incorrect active gesture without any human imitation.

The averaged recognition performance of the three different gesture recognition schemes on the test database of 280 active gestures collected from the two test active users is evaluated. The PCA-based eigenspace gesture recognition method has the best recognition performance of 87.86%, followed by 82.14% for the DTW gesture recognition method, while the HMM gesture recognition method performs worst, achieving just 81.79%. For Kinect-based gesture recognition for human action imitations of the Bioloid humanoid robot, the PCA-based eigenspace gesture recognition method therefore provides the most accurate recognition control command for dictating the robot.

C. Experimental Results of Humanoid Robot Imitations

An imperfect match of the joint number and the joint distribution between the Kinect-captured skeleton and the
Bioloid humanoid robot could cause a quite dissatisfactory humanoid robot imitation action. As mentioned, the robot used in this work is the Type-A Bioloid humanoid robot, which has 18 artificial joints. This number of 18 joints is smaller than the 20 Kinect-captured skeleton joints, and therefore, in some situations the gesture command is correctly recognized but the Bioloid humanoid robot operates an imperfect imitation action due to the limitation of the restricted joint number and the specific joint distribution. Fourteen human active gestures are devised in this work, and not all gestures are well imitated by the Bioloid humanoid robot even when the correct gesture command is received. Figure 4 depicts the Bioloid humanoid robot action "lifting the left foot with both hands held", with the best imitation due to a perfect match between the human joints and the robot joints. In this category of active gestures, the joint matched degree between the Bioloid humanoid robot and the Kinect-captured human skeleton is high, and therefore the robot imitation action is satisfactory. The humanoid robot action "putting both hands on the hip with the left foot lifted to the left side", with a passable imitation, is shown in Fig. 5. The robot imitation action is not ideal but still acceptable due to a reasonably proper match between the human joints and the robot joints. Figure 6 shows the humanoid robot action "handing the phone using the right hand", with a dissatisfactory imitation due to a bad match between the human joints and the robot joints. In such a joint-mismatch situation, it is hard for the humanoid robot to perfectly imitate the human active gesture according to the received gesture command.

Fig. 4. The humanoid robot action lifting the left foot with both hands held, with the best imitation due to a perfect match between the human joints and the robot joints.

Fig. 5. The humanoid robot action putting both hands on the hip with the left foot lifted to the left side, with a passable imitation due to an acceptable match between the human joints and the robot joints.

Fig. 6. The humanoid robot action handing the phone using the right hand, with a dissatisfactory imitation due to a bad match between the human joints and the robot joints.

V. CONCLUSIONS

In this paper, the popular Microsoft Kinect sensor and the humanoid robot are properly integrated for humanoid robot action imitation applications. The humanoid robot with the artificial joint servomechanisms can imitate the human's specific active gestures according to the gesture command made by the test active user. The actor's active gesture captured by the Kinect platform for humanoid robot imitations is viewed as the control command. DTW, HMM and eigenspace recognition schemes are employed for recognizing the gesture control command in this work. Experiments show that the presented Kinect-based gesture command control method is effective and efficient for humanoid robot action imitation.

ACKNOWLEDGMENT

This research is partially supported by the Ministry of Science and Technology (MOST) in Taiwan under Grant MOST 103-2218-E-150-004.

REFERENCES

[1] Z. Zhang, "Microsoft Kinect sensor and its effect," IEEE Multimedia, vol. 19, no. 2, pp. 4-10, 2012.
[2] L. Cheng, Q. Sun, H. Su, Y. Cong, and S. Zhao, "Design and implementation of human-robot interactive demonstration system based on Kinect," Proc. the 24th Control and Decision Conference (CCDC), 2012, pp. 971-975.
[3] R. Afthoni, A. Rizal, and E. Susanto, "Proportional derivative control based robot arm system using Microsoft Kinect," Proc. IEEE International Conference on Robotics, Biomimetics, and Intelligent Computational Systems (ROBIONETICS), 2013, pp. 24-29.
[4] V. Tam and L.-S. Li, "Integrating the Kinect camera, gesture recognition and mobile devices for interactive discussion," Proc. IEEE International Conference on Teaching, Assessment and Learning for Engineering (TALE), 2012, pp. H4C-11-H4C-13.
[5] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. 43-49, 1978.
[6] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[7] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[8] I.-J. Ding, "Speech recognition using variable-length frame overlaps by intelligent fuzzy control," Journal of Intelligent and Fuzzy Systems, vol. 25, no. 1, pp. 49-56, 2013.
[9] J.-K. Han and I.-Y. Ha, "Educational robotic construction kit: Bioloid," Proc. the 17th World Congress of the International Federation of Automatic Control (IFAC), 2008, pp. 3035-3036.

