
SixthSense

B.Tech Seminar report

by Gokul Sudhakaran ETAHECS017

Department of Computer Science and Engineering
Government Engineering College, Thrissur
December 2010

Seminar Report 2010

Acknowledgment

Let me begin by thanking Mr. Pranav Mistry, the brain behind this incredible technology called SixthSense and the author of the paper SixthSense: A Wearable Gestural Interface. I would also like to thank Ms. Pattie Maes, Mr. Liyan Chang, Mr. Tsuyoshi Kuroki, Mr. Ming-Hsuan Yang, Mr. Narendra Ahuja and Mr. Mark Tabb, the authors of the various papers that I referred to for my seminar. I would also like to thank Prof. Manoj Kumar, Head of the Department, Computer Science and Engineering, for providing the facilities required to conduct the seminar. Next, I would like to express my sincere gratitude to Mr. Ajay James for approving my topic and for helping and guiding me during the entire seminar process. My warmest regards to Mr. Savyan P.V and Mrs. Baby Syla for their support and encouragement. It is always a great confidence booster when you have the backing of your teachers. Last but not least, I would like to extend my gratitude to my classmates for helping me and giving me valuable suggestions regarding the topic.

Gokul Sudhakaran
December 2010
Govt. Engineering College, Thrissur

Dept. of CSE, GEC, Thrissur


Contents
1 Introduction
1.1 Organization Of the Report
2 Gesture Recognition
2.1 Motion Segmentation
2.2 Skin Colour Model
2.3 Geometric Analysis
2.4 Motion Trajectories
2.5 Recognizing Motion Patterns Using Time Delay Neural Network (TDNN)
3 Applications
4 Tangible Public Map - TaPuMa
4.1 How does TaPuMa work?
5 Merits and Demerits
5.1 Merits
5.2 Demerits
6 Future Work
7 Conclusion
References


List of Figures
1.1 SixthSense device
2.1 Affine Transformation
2.2 Architecture of TDNN
3.1 Viewing map using SixthSense device
3.2 Gesture for recognising time
3.3 Gesture for taking a photo
3.4 Making a call using SixthSense device
3.5 Multimedia Reading Experience
4.1 TaPuMa working


Abstract

Information is everything in today's world. Yet the world of information is very small. From details about the various stars and galaxies to the description of the tiniest piece of junk that you can get in a supermarket, everything is present on the internet. Yet it is confined, trapped in a screen on a desktop or a mobile. SixthSense breaks these confines and brings the information into the real world. Rather than adjusting ourselves to the latest machines (gadgets), SixthSense adjusts the machine to us and trains it to understand our natural hand gestures.


Chapter 1 Introduction
Steve Mann was the first person to bring forth the idea of SixthSense, in the form of a device called Telepointer; it was originally referred to as Synthetic Synesthesia of the Sixth Sense. He is also considered the father of SixthSense. The idea was later developed by Pranav Mistry, a PhD student at MIT. SixthSense is a wearable gestural interface that augments the physical world around us with digital information and lets us use natural hand gestures to interact with that information.

Hardware requirements for the device include:
- Camera: recognizes and tracks hand gestures
- Pocket projector: projects digital information on a wall or any other surface
- Cellphone: connects to the cloud and does the processing
- Mirror: helps in projecting the image on a horizontal surface by reflection
- Coloured caps: help in keeping track of the hand and recognising gestures

The technologies used in this project are:
- Hand Augmented Reality
- Gesture Recognition
- Image capturing, processing and manipulation

1.1 Organization Of the Report

1. Chapter 2 describes how gesture recognition is done and its different phases.
2. Chapter 3 describes the applications of the device.
3. Chapter 4 describes how TaPuMa works.
4. Chapter 5 describes the merits and demerits of the device.
5. Chapter 6 discusses the future developments.
6. Chapter 7 gives the conclusion of the report.

Figure 1.1: SixthSense device

Chapter 2 Gesture Recognition


Gesture recognition is an important aspect of this device. This chapter covers a general gesture recognition technique (not specific to SixthSense). The entire process of gesture recognition can be divided into the following parts:
- Motion Segmentation
- Skin Colour Model
- Geometric Analysis
- Motion Trajectories
- Recognizing Motion Patterns Using Time Delay Neural Network (TDNN)

2.1 Motion Segmentation

Early methods can be divided into two types:

1. Pixel based: Motion segmentation in this method is done on the basis of intensity variations, i.e. it treats intensity variations as evidence of motion and vice versa. This method works well for a scene with slow-moving objects or with only a few objects, but its performance falls as the number or speed of the objects increases.
2. Feature based: This method matches image features such as points defined by local intensity, edges, corners, etc. Features are extracted using single-scale segmentation and are then matched across frames. Segmentation errors might make it difficult to find feature correspondence across frames.

In the motion segmentation method described here, region primitives are used to find the 2D motion field. It performs well in situations where pixel-based methods fail. The major advantage of this method is that it is multiscale and therefore provides a rich description of region primitives. The region primitives are also robust to noise and illumination changes. Region correspondence is found by taking region shape, intensity and size into account, whereas previous algorithms used a much simpler approach. For a pair of frames (I_t, I_{t+1}), the algorithm identifies regions in each frame that make up the multiscale intraframe structure. Regions at each scale are then matched across frames. Motion segmentation generates regions that have uniform motion.
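The idea of matching regions across a frame pair by shape, intensity and size can be sketched in a few lines. This is only an illustration under stated assumptions, not the algorithm of the cited work: the Region fields, the equal weighting of the three terms, the greedy one-to-one matching and the max_cost value are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Region:
    area: float        # size of the region in pixels
    intensity: float   # mean grey level, 0..255
    elongation: float  # crude shape cue: major/minor axis ratio (assumed)
    cx: float          # centroid x
    cy: float          # centroid y


def dissimilarity(a, b):
    """Sum of normalised size, intensity and shape differences (assumed cost)."""
    size_term = abs(a.area - b.area) / max(a.area, b.area)
    intensity_term = abs(a.intensity - b.intensity) / 255.0
    shape_term = abs(a.elongation - b.elongation) / max(a.elongation, b.elongation)
    return size_term + intensity_term + shape_term


def match_regions(frame_t, frame_t1, max_cost=0.5):
    """Greedily pair each region of frame t with its most similar unused
    region of frame t+1. Returns (index_t, index_t1, displacement) tuples."""
    matches = []
    used = set()
    for i, a in enumerate(frame_t):
        best_j, best_cost = None, max_cost
        for j, b in enumerate(frame_t1):
            if j in used:
                continue
            cost = dissimilarity(a, b)
            if cost < best_cost:
                best_j, best_cost = j, cost
        if best_j is not None:
            used.add(best_j)
            b = frame_t1[best_j]
            # The centroid shift is a simple estimate of the region's motion.
            matches.append((i, best_j, (b.cx - a.cx, b.cy - a.cy)))
    return matches
```

Calling match_regions on the region lists of two consecutive frames yields one displacement per matched region; regions with similar displacements could then be grouped into areas of uniform motion.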

2.2 Skin Colour Model

Not all of the regions generated by motion segmentation are of use to us. In fact, most of the regions can be easily discarded based on certain criteria. The criteria might differ based on the implementation of the particular project. Here skin colour is used as a cue for selecting the regions that are useful; the rest of the regions can be discarded. Unlike other methods, skin colour is not used here for motion segmentation but to select the motion fields generated by the motion segmentation process. In the case of the SixthSense device, this selection criterion is narrowed further by the use of coloured caps on the fingers. The method is therefore generic, and the criteria for selecting the motion fields can be chosen according to the purpose for which the system is being designed.

Human skin colour has been used and proven to be an effective feature in numerous applications. A Gaussian mixture is used to model the skin colour distribution in CIE LUV colour space, estimated from a database of 2,447 images consisting of faces of different ethnic groups. The luminance value of each pixel is discarded to minimize the effects of lighting conditions, and the parameters of the Gaussian mixture are estimated. A motion region is classified as skin tone if, for most of its pixels, the probability of being skin colour is above a threshold.
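The classification rule described above can be illustrated with a small sketch. The mixture parameters, the threshold and the majority fraction below are made-up placeholder values, not the ones estimated from the 2,447-image database, and diagonal covariances are assumed to keep the example short.

```python
import math

# Hypothetical two-component mixture over the (u, v) chromaticity plane.
# Each entry: (weight, mean_u, mean_v, var_u, var_v). Values are placeholders.
SKIN_MIXTURE = [
    (0.6, 20.0, 15.0, 30.0, 20.0),
    (0.4, 35.0, 25.0, 50.0, 40.0),
]


def skin_likelihood(u, v, mixture=SKIN_MIXTURE):
    """Gaussian-mixture density at (u, v), assuming diagonal covariances."""
    total = 0.0
    for weight, mu_u, mu_v, var_u, var_v in mixture:
        norm = 1.0 / (2.0 * math.pi * math.sqrt(var_u * var_v))
        expo = -0.5 * ((u - mu_u) ** 2 / var_u + (v - mu_v) ** 2 / var_v)
        total += weight * norm * math.exp(expo)
    return total


def is_skin_region(pixels_luv, threshold=1e-3, majority=0.5):
    """pixels_luv is a list of (L, u, v) tuples. The luminance L is ignored,
    so lighting changes have less effect. The region counts as skin tone when
    more than `majority` of its pixels exceed the likelihood threshold."""
    hits = sum(1 for _, u, v in pixels_luv if skin_likelihood(u, v) > threshold)
    return hits > majority * len(pixels_luv)
```

For a SixthSense-style system the same structure could be reused with a colour model fitted to the coloured finger caps instead of skin.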

2.3 Geometric Analysis

Since the shapes of the human head and hands can be approximated by ellipses, motion regions of skin tone were merged until the shape of the merged region was approximately elliptical. Skin tone regions were sorted based on their size. Among the largest regions, a region was randomly selected as a seed. A neighbour of the selected region was iteratively merged if the goodness of fit of an elliptic function to the resulting region did not decrease by more than a threshold. The iteration continued as long as the number of grouped regions did not exceed a preselected maximum value. The whole process was repeated several times with different random seeds to generate multiple candidates, and the largest merged region was selected. The orientation of the ellipse was estimated from the axis of the least moment of inertia.
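The last step, estimating the orientation from the axis of least moment of inertia, has a standard closed form in terms of second-order central moments. The sketch below assumes the merged region is available as a list of (x, y) pixel coordinates; the function name is hypothetical.

```python
import math


def ellipse_orientation(points):
    """Return the centroid and the angle (radians, measured from the x axis)
    of the axis of least moment of inertia of a set of (x, y) points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Second-order central moments of the region.
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    # The axis of least inertia satisfies tan(2*theta) = 2*mu11 / (mu20 - mu02).
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), theta
```

A region stretched along the x axis gives an angle near zero, while one stretched along the diagonal y = x gives an angle near pi/4.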

2.4 Motion Trajectories

Although motion segmentation captures motion details by matching regions at fine scales, it is sufficient to use coarser motion trajectories of the identified hand regions for gesture recognition. The affine transformation of a hand region in each frame pair is computed using the formula shown in Figure 2.1.

Figure 2.1: Affine Transformation

The affine transformations of successive frame pairs are then concatenated to construct the motion trajectories of the hand region.
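In general, a 2D affine transformation maps a point (x, y) to (x', y') with x' = a*x + b*y + tx and y' = c*x + d*y + ty. The sketch below assumes this generic six-parameter form (the exact formula of Figure 2.1 is not reproduced here) and shows how per-frame-pair transformations could be concatenated into a trajectory. The function names are hypothetical.

```python
def compose(first, second):
    """Return the affine map that applies `first`, then `second`.
    A map is a tuple (a, b, c, d, tx, ty) meaning
    x' = a*x + b*y + tx and y' = c*x + d*y + ty."""
    a1, b1, c1, d1, tx1, ty1 = first
    a2, b2, c2, d2, tx2, ty2 = second
    return (
        a2 * a1 + b2 * c1, a2 * b1 + b2 * d1,
        c2 * a1 + d2 * c1, c2 * b1 + d2 * d1,
        a2 * tx1 + b2 * ty1 + tx2,
        c2 * tx1 + d2 * ty1 + ty2,
    )


def apply_affine(m, point):
    """Apply the affine map m to a single (x, y) point."""
    a, b, c, d, tx, ty = m
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def trajectory(start, frame_pair_maps):
    """Concatenate the affine maps of successive frame pairs and record
    where the starting point (e.g. the hand centroid) ends up after each."""
    total = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)  # identity map
    path = [start]
    for m in frame_pair_maps:
        total = compose(total, m)
        path.append(apply_affine(total, start))
    return path
```

The resulting list of positions is the kind of coarse trajectory that can be fed to the recognizer, one point per frame.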

2.5 Recognizing Motion Patterns Using Time Delay Neural Network (TDNN)

TDNN is a dynamic classification approach in that the network sees only a small window of the input motion pattern, and this window slides over the input data while the network makes a series of local decisions. The local decisions are later integrated into a global decision.

The advantages of using TDNN are:
- It is able to identify patterns from poorly aligned training examples.
- The total number of weights in the network is relatively small, since only a small window of the input pattern is fed to the TDNN at any instant.
- Its compact structure economizes on weights and makes it possible for the network to develop general feature detectors.
- Its temporal integration at the output layer makes the network shift invariant (i.e., insensitive to the exact positioning of the gesture).

TDNNs have been applied successfully to speech recognition, where the patterns vary slightly in time. On one hand, we want to recognize gestures with slight time variation as the same gesture. On the other hand, gestures with the same movements but different execution times should be recognized as having different meanings, since it has been noted that some gestures have similar movements but different execution times.

Figure 2.2: Architecture of TDNN
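The sliding-window idea can be made concrete with a deliberately simplified sketch: a single layer whose weights are shared across all window positions, followed by summation of the local decisions over time. A real TDNN stacks several such layers and learns its weights by training; the weights, feature layout and function names used here are assumptions for illustration only.

```python
import math


def local_scores(window, weights, biases):
    """Score one window of frames. `window` is a list of feature vectors,
    `weights[c]` holds one weight per value in the flattened window."""
    flat = [value for frame in window for value in frame]
    scores = []
    for class_weights, bias in zip(weights, biases):
        activation = sum(w * v for w, v in zip(class_weights, flat)) + bias
        scores.append(1.0 / (1.0 + math.exp(-activation)))  # sigmoid unit
    return scores


def tdnn_classify(sequence, weights, biases, window_len):
    """Slide a window over the sequence, make a local decision at each
    position with the same shared weights, and integrate the local
    decisions by summing them. Returns (best_class_index, totals)."""
    totals = [0.0] * len(weights)
    for start in range(len(sequence) - window_len + 1):
        window = sequence[start:start + window_len]
        scores = local_scores(window, weights, biases)
        totals = [t + s for t, s in zip(totals, scores)]
    best = max(range(len(totals)), key=totals.__getitem__)
    return best, totals
```

Because the same weights are reused at every window position, the number of weights depends on the window length rather than on the length of the gesture, and summing over positions makes the decision largely insensitive to where in the sequence the gesture occurs.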


Chapter 3 Applications
The applications of SixthSense technology are numerous:

1. 3D drawing: the application lets the user draw on any surface by tracking the fingertip movements of the user's index finger.
2. Mapping: navigate a map using hand gestures, e.g. zoom in, zoom out, pan, magnify a certain portion of the map, etc.

Figure 3.1: Viewing map using SixthSense device

3. Gesture analysis: multi-touch gestures, iconic gestures and freehand gestures. Common gestures recognised are drawing a circle on the wrist to show the clock, making a square using the hands to capture a photo, and drawing an @ symbol to check emails.

Figure 3.2: Gesture for recognising time

Figure 3.3: Gesture for taking a photo

4. Touch sensation: a microphone attached to the paper makes it receptive to touch, while the camera tracks the movement of the finger.
5. Make a call

Figure 3.4: Making a call using SixthSense device

6. Get product information
7. Create a multimedia reading experience in a newspaper

Figure 3.5: Multimedia Reading Experience

8. Get reviews of a book
9. Get dynamic updates
10. Feed information on people
11. Information acquirement through the things we carry (TaPuMa)


Chapter 4 Tangible Public Map - TaPuMa


Maps are wonderful tools that help us orient and guide ourselves on a street, in a city, in a country, or on the planet. Maps are a universal medium of communication, easily understood and appreciated by most people, regardless of language or culture. Internet search engines have popularized the keyword-based search paradigm. It has been noticed that the intent behind a search is often not informational (what is the solar system); it might be navigational (how can I get there, where is the restroom) or transactional (guide me to the places where I can perform a certain transaction, e.g., buy a coffee or use a service). In the physical world, we use a different process to search for the things, places and people we are looking for: we use the properties and characteristics of objects and places to locate them and differentiate them from other objects and from their environment. In the digital world we rely heavily on words. TaPuMa, on the other hand, does not rely on words. TaPuMa is a digital, tangible public map that allows people to use their own belongings, or the everyday objects they carry with them, to access relevant information.

The main advantages of using TaPuMa are:
- Physical objects as keywords
- Locations and non-spatial information
- Multiple user interactions
- Dynamic and contextual information

4.1 How does TaPuMa work?

The TaPuMa system (Figure 4.1) uses a table-top environment where the map and dynamic content are projected on the table. A camera mounted above the table identifies and tracks the locations of the objects on the surface. The projector and the camera are connected to a computer. A software program identifies and registers the location of objects on the table. On the basis of the identification of the objects, the software program provides a relevant information visualization to be shown on the table. The projector augments the table and the objects on it with digital information projected from overhead, along with the map. The system also includes supplementary components, such as RFID readers, to support user identification and provide customized information relevant to the user.
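The "physical object as keyword" lookup can be pictured with a toy example. Everything in it is hypothetical: the object identifiers, the place table, and the choice to order results by distance from where the object was put down are assumptions for illustration, not details of the TaPuMa implementation.

```python
import math

# Hypothetical table mapping a recognised object to a search keyword.
OBJECT_KEYWORDS = {"coffee_cup": "cafe", "credit_card": "atm"}

# Hypothetical places on the projected map, with table coordinates.
MAP_PLACES = [
    {"name": "Cafe North", "category": "cafe", "pos": (1.0, 1.0)},
    {"name": "ATM Lobby", "category": "atm", "pos": (2.0, 2.0)},
    {"name": "Cafe South", "category": "cafe", "pos": (5.0, 5.0)},
]


def places_for_object(object_id, object_pos, places=MAP_PLACES):
    """Return the places relevant to a recognised object, nearest first.
    `object_pos` is where the camera located the object on the table."""
    keyword = OBJECT_KEYWORDS.get(object_id)
    if keyword is None:
        return []  # unknown object: nothing to highlight
    matches = [p for p in places if p["category"] == keyword]
    return sorted(matches, key=lambda p: math.dist(object_pos, p["pos"]))
```

In such a setup, placing a coffee cup on the table would cause the cafes to be highlighted on the projected map, with no typed keyword involved.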

Figure 4.1: TaPuMa working


Chapter 5 Merits and Demerits


5.1 Merits

- Turns any surface into a touch screen
- User friendly
- Reduces machine dependence
- Supports multi-touch and multi-user interactions
- Portable and easy to carry
- Easy access to information

5.2 Demerits

- The projector cannot cast perfectly sized images from all elevations
- Lower accuracy
- Different surfaces may cause sensitivity problems
- Vulnerable to security threats


Chapter 6 Future Work


- Combining the camera and projector into one single unit, which would make the device more compact and easier to carry
- Use of laser projectors
- Proper gesture recognition algorithms to do away with the use of markers
- Integrating GPS technology
- Making the system wireless, which would allow wireless transmission of information and thus improve the range of the device


Chapter 7 Conclusion
SixthSense technology is currently in its early stages and is regarded as a very powerful project by many experts. The future of this project is very promising, and we will surely be seeing its applications all around the world very soon. Pranav Mistry has indicated that he will be making the project code open soon. This will definitely give a major boost to the development of this project. Recently, Pranav Mistry used SixthSense technology to implement a mouse without using the actual device, relying instead on lasers and gesture recognition.


References
[1] Ming-Hsuan Yang, Narendra Ahuja, and Mark Tabb, "Extraction of 2D Motion Trajectories and Its Application to Hand Gesture Recognition".
[2] Pranav Mistry and Pattie Maes, MIT Media Laboratory, "SixthSense: A Wearable Gestural Interface".
[3] Pranav Mistry, Tsuyoshi Kuroki, and Chaochi Chang, MIT Media Laboratory, "TaPuMa: Tangible Public Map for Information Acquirement through the Things We Carry".
