
Camera Based Hand and Body Driven Interaction with Very Large Displays

Tom Cassey, Tommy Chheng, Jeffrey Lien


{tcassey,tcchheng,jwlien}@ucsd.edu

Advisors: Raj Singh, Ruth West
Sponsor: National Center for Microscopy and Imaging Research (NCMIR), University of California, San Diego

Objective

We introduce a system to track a user's hand and head in 3D and in real time for use with a large tiled display. The system uses overhead stereo cameras and does not require the user to wear any equipment.

Approach

System overview: two overhead cameras capture the user. Our system uses the right frame as the primary source for head and hand detection. The left frame is a support frame used to extrapolate 3D information.

Correspondence Matching

Three-dimensional positions for the hand and head are obtained by correspondence matching between the stereo cameras. Once we have detected the hand or the head, we search for the equivalent object along the same axis in the left image. We use normalized cross correlation (NCC) as the metric to determine the best match. For a detected patch T in the right image and a candidate patch I in the left image, with sums taken over the patch pixels (x, y):

NCC = Σ T(x,y)·I(x,y) / sqrt(Σ T(x,y)² · Σ I(x,y)²)
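
As an illustrative sketch of this step (assuming a rectified stereo pair, so the search runs along the same image rows), OpenCV's template matching in its normalized cross correlation mode can express the search; the function and variable names here are ours, not the original implementation:

```python
import cv2

def find_correspondence(right_img, left_img, box):
    """Search the same rows of the left frame for the best NCC match
    to a detection box (x, y, w, h) found in the right frame."""
    x, y, w, h = box
    template = right_img[y:y + h, x:x + w]   # detected hand/head patch
    strip = left_img[y:y + h, :]             # same axis (rows) in the left image
    # TM_CCORR_NORMED scores each horizontal offset by normalized cross correlation.
    scores = cv2.matchTemplate(strip, template, cv2.TM_CCORR_NORMED)
    _, best_score, _, best_loc = cv2.minMaxLoc(scores)
    return best_loc[0], best_score           # x position in the left image, NCC score
```

The difference between the detection's x position in the right image and the matched x position in the left image is the disparity, from which depth follows given the calibrated baseline and focal length.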

Hardware Setup

We use two Point Grey Research Dragonfly Express cameras to create a stereo pair. IR emitters, along with IR filters on the cameras, are used to help distinguish skin tones. The cameras are mounted about two meters above the user. We capture video at a resolution of 640x480 at 15 frames per second. Processing runs on a Sun workstation equipped with an AMD Opteron 246 and 3 GB of memory.

Vector Extrapolation

After the correspondence matching, we have a 3D vector for both the head and the hand from the camera. We then subtract these two vectors to extrapolate the user's intended pointing direction. This 3D vector is projected onto the large tiled display by a series of translations from the camera coordinate system to the screen coordinate system.
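
One way to sketch this projection, assuming the display has been calibrated as a plane (a point and a normal) in the camera coordinate system, is a ray-plane intersection; this formulation and all names here are illustrative, not the original code:

```python
import numpy as np

def screen_intersection(head, hand, plane_point, plane_normal):
    """Intersect the head-to-hand pointing ray with the display plane.
    All inputs are 3D vectors in the camera coordinate system; the plane
    parameters are assumed known from the camera-to-screen calibration."""
    head = np.asarray(head, dtype=float)
    direction = np.asarray(hand, dtype=float) - head   # subtract the two 3D vectors
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:                              # ray parallel to the display
        return None
    t = np.dot(plane_normal, np.asarray(plane_point, dtype=float) - head) / denom
    if t < 0:                                          # pointing away from the display
        return None
    return head + t * direction                       # hit point on the screen plane
```

The hit point would then be translated into screen pixel coordinates using the measured position and extent of the tiled display.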

Camera Calibration

The cameras were calibrated to infer 3D properties from the stereo camera pair. We used the SRI Small Vision System calibration[1] to calibrate the cameras. Using a known checkerboard pattern, the SRI API allowed us to compute the extrinsic and intrinsic parameters of the cameras. These parameters correct for lens distortion and misalignment.
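
The poster's calibration uses the SRI toolchain; purely as an illustration of the same checkerboard procedure, an equivalent single-camera calibration can be sketched with OpenCV's API (the board size and file paths are assumptions):

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)  # inner corners of the checkerboard (illustrative size)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/right_*.png"):          # placeholder file names
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Intrinsics (camera matrix, lens distortion) plus per-view extrinsics.
_, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
undistorted = cv2.undistort(gray, K, dist)           # correct lens distortion
```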

Hand Detection

Hands are detected using Haar-like features with a cascade of Adaboost classifiers. This approach gives robust detection and a certain degree of invariance to scale and rotation. We used the implementation from the OpenCV libraries[2] to train and create a cascade of classifiers for our hand detection.
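
A minimal sketch of this detector using OpenCV's cascade classifier API, on which the implementation is based; the cascade file name is a placeholder for the custom-trained hand cascade:

```python
import cv2

# Placeholder path for the hand cascade trained with OpenCV's cascade tools.
hand_cascade = cv2.CascadeClassifier("hand_cascade.xml")

def detect_hands(frame):
    """Return candidate hand boxes (x, y, w, h) in a video frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)   # normalize lighting before detection
    return hand_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
```

The returned boxes in the right frame feed the correspondence matching described above.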

Head Detection

Heads are detected using Hough transforms generalized for circles. From above, heads appear roughly circular, so heads can be detected by exploiting this geometric property. Circular regions are detected using the generalized Hough transform with the following equation:

r² = (x − a)² + (y − b)²

where (a, b) is the circle center and r its radius.
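
A sketch of this circle search using OpenCV's Hough circle transform; the blur, radius bounds, and voting thresholds are illustrative guesses that would need tuning for the overhead camera view:

```python
import cv2
import numpy as np

def detect_heads(gray):
    """Find roughly circular regions (heads seen from above)."""
    blurred = cv2.GaussianBlur(gray, (9, 9), 2)   # suppress noise before voting
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=80,
                               param1=100, param2=30, minRadius=20, maxRadius=60)
    # Each detection is (a, b, r): the center and radius of r² = (x−a)² + (y−b)².
    return [] if circles is None else np.round(circles[0]).astype(int)
```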

Background

Many research groups have built large tiled displays to visualize large high-resolution imagery, such as a detailed biological sample under an electron microscope. However, not many intuitive control systems have been created. Our project attempts to solve the interface problem by using overhead stereo cameras to project the user's intended point of interest onto the screens. NCMIR's BioWall, a 20-tile display shown here, is a perfect candidate for our system.

Tracking

We use token tracking to make our object detection more robust. Tokens are data structures used to store state information
(position and area) about detected objects in the image. When an object is detected, its location and area are compared to
pre-existing tokens. If the new region is similar enough to an existing token, the token data is updated. This allows us to select
the best detected object. We use token tracking to help both hand and head detection.
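
A minimal sketch of the token data structure and update rule described above; the distance and area-similarity thresholds are assumptions, not values from the original system:

```python
class Token:
    """State for one tracked object: last known position and area."""
    def __init__(self, x, y, area):
        self.x, self.y, self.area = x, y, area
        self.hits = 1   # how many detections have confirmed this token

def update_tokens(tokens, detections, max_dist=40.0, max_area_ratio=1.5):
    """Match each detection (x, y, area) against existing tokens; update a
    token when position and area are similar enough, else start a new one.
    Returns the most-confirmed token as the best detected object."""
    for x, y, area in detections:
        best = None
        for t in tokens:
            close = ((t.x - x) ** 2 + (t.y - y) ** 2) ** 0.5 <= max_dist
            similar = max(t.area, area) <= max_area_ratio * min(t.area, area)
            if close and similar and (best is None or t.hits > best.hits):
                best = t
        if best is not None:
            best.x, best.y, best.area = x, y, area   # refresh token state
            best.hits += 1
        else:
            tokens.append(Token(x, y, area))
    return max(tokens, key=lambda t: t.hits, default=None)
```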

Conclusion

We developed a testing program to visualize the stereo pair and show the currently detected head and hand. Using a cascade of Adaboost classifiers with Haar-like features for hand detection and a generalized Hough transform for circles as a head detector, we compute 3D positioning using correspondence matching. Finally, a 3D vector of the user's point of interest is extracted. Our system works well enough for users to trace their hands on a drawing board. NCMIR will be interfacing with our system for 3D scientific applications on large tiled displays at research conferences.

Future Work

More work is required for this project to become a truly robust system. Higher processing speed is needed for greater control accuracy, and a robust gesture recognition system would enable richer user actions.

[1] SRI Small Vision System. http://www.ai.sri.com/~konolige/svs/
[2] OpenCV (Open Source Computer Vision Library). http://www.intel.com/technology/computing/opencv/overview.htm