Stereo Vision New

Object Tracking Using Stereo Vision
Yogesh Borhade, Pratik.V.Jani, Sanket Jog

BE-EXTC, KCCOE, Mumbai University.
yogi.borhade@gmail.com
pratik069@gmail.com
sanketjog@gmail.com
Abstract: This document gives a brief description of stereo vision
capabilities that can be used by autonomous robots. A stereo vision
system can detect different objects and estimate, for each one, its
distance, its position along the X, Y and Z coordinate axes, its
colour, its shape and its contour.
Keywords: Intex USB Cameras, Stereo Vision, Linux (Ubuntu 9.10
version).
1. INTRODUCTION
In robots, visual interpretation is an extremely challenging problem
to solve fully. Robot vision technology is needed for stable walking,
object recognition and movement to a target spot. With sensors that
use infrared rays or ultrasound, a robot can handle urgent
situations. Stereo vision of three-dimensional space, however, would
give the robot much more powerful artificial intelligence.
2. STEREO VISION
Human beings are equipped with two eyes. Unlike many animals, whose
eyes sit on the sides of the head, humans have two eyes located side
by side at the front of the head. Thanks to this close side-by-side
positioning, each eye takes a view of the same area from a slightly
different angle. The two eye views have plenty in common, but each
eye picks up visual information that the other doesn't.
Two Eyes = Three Dimensions (3D)!
Each eye captures its own view and the two separate images are sent
to the brain for processing. When the two images arrive
simultaneously in the back of the brain, they are united into one
picture. The brain combines the two images by matching up the
similarities and adding in the small differences. The small
differences between the two images add up to a big difference in the
final picture! The combined image is more than the sum of its parts.
It is a three-dimensional stereo picture.
With stereo vision you see an object as solid in three spatial
dimensions (width, height and depth, or x, y and z). It is the added
perception of the depth dimension that makes stereo vision so rich
and special.
With stereo vision, we can see where objects are in relation to our
own bodies with much greater precision, especially when those objects
are moving toward or away from us in the depth dimension. Instead of
eyes, two USB cameras are used to obtain the two 2D images, and a
computer running Linux takes the place of the brain in performing the
operations needed to obtain the 3D coordinates of an object. The
procedure for obtaining the 3D coordinates is fairly involved and
takes place in the following stages:
1. Stereo Pipeline
2. Epipolar Rectification
3. Stereo Matching
4. Depth via Triangulation
STEREO PIPELINING
In the first stage, a point given in global (world) coordinates is
captured by both cameras and projected onto the image plane of each
camera.
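The first and last stages of the procedure (projection onto the two
image planes, and depth via triangulation) can be illustrated with a
small numerical sketch. It is not taken from this paper: it assumes an
idealized pair of identical pinhole cameras with parallel optical
axes and no lens distortion, and the focal length, baseline,
principal point and the function names project and triangulate are
hypothetical values and names chosen only for illustration.

```python
# Idealized stereo rig: two identical pinhole cameras with parallel axes.
# All numeric values below are assumptions chosen for illustration.
FOCAL_PX = 700.0        # focal length in pixels (hypothetical)
BASELINE_M = 0.10       # distance between optical centers in metres (hypothetical)
CX, CY = 320.0, 240.0   # principal point of a 640x480 image (hypothetical)


def project(point, camera_x):
    """Project a world point (X, Y, Z) onto the image plane of a camera
    whose optical center sits at (camera_x, 0, 0) and looks along +Z."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera")
    u = FOCAL_PX * (X - camera_x) / Z + CX
    v = FOCAL_PX * Y / Z + CY
    return u, v


def triangulate(left_px, right_px):
    """Recover (X, Y, Z) in the left camera frame from a matched pixel pair."""
    disparity = left_px[0] - right_px[0]
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    Z = FOCAL_PX * BASELINE_M / disparity
    X = (left_px[0] - CX) * Z / FOCAL_PX
    Y = (left_px[1] - CY) * Z / FOCAL_PX
    return X, Y, Z


world_point = (0.2, -0.1, 2.0)                      # metres, left camera frame
left = project(world_point, camera_x=0.0)           # left camera at the origin
right = project(world_point, camera_x=BASELINE_M)   # right camera shifted along X
recovered = triangulate(left, right)
print(left, right, recovered)
```

In this idealized setting the two projections share the same vertical
coordinate and differ only in x; that horizontal difference is what
determines the depth Z. Real cameras need calibration and
rectification before this simple relation holds.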
EPIPOLAR RECTIFICATION
Given a pair of stereo images, rectification determines a
transformation of each image plane such that pairs of conjugate
epipolar lines become collinear and parallel to one of the image
axes.
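Once conjugate epipolar lines are collinear and horizontal, the match
for a pixel in one image can be searched for along the same row of
the other image. The sketch below illustrates that one-dimensional
search under stated assumptions and is not the matching method of
this paper: it works on single rows given as lists of grey levels,
scores candidates with a sum of absolute differences (SAD) over a
small window, and the names sad and match_along_row, the window size
and the search range are all hypothetical.

```python
def sad(row_a, start_a, row_b, start_b, width):
    """Sum of absolute differences between two windows of equal width."""
    return sum(abs(row_a[start_a + i] - row_b[start_b + i]) for i in range(width))


def match_along_row(left_row, right_row, x_left, half_window=1, max_disparity=8):
    """Return the offset d >= 0 that minimizes the SAD cost between the window
    centered at x_left in the left row and at x_left - d in the right row."""
    width = 2 * half_window + 1
    start_left = x_left - half_window
    if start_left < 0 or start_left + width > len(left_row):
        raise ValueError("window does not fit inside the left row")
    best_d, best_cost = None, None
    for d in range(max_disparity + 1):
        start_right = start_left - d
        if start_right < 0:
            break  # the window would leave the right image
        cost = sad(left_row, start_left, right_row, start_right, width)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# One rectified scanline from each image: the bright feature (90, 200, 90)
# appears 3 pixels further left in the right image.
left_row = [10, 10, 10, 10, 10, 10, 90, 200, 90, 10, 10, 10]
right_row = [10, 10, 10, 90, 200, 90, 10, 10, 10, 10, 10, 10]
disparity = match_along_row(left_row, right_row, x_left=7)
print(disparity)  # prints 3
```

The search never leaves the row, which is the practical benefit of
rectification. A real matcher would run this for every pixel of a 2D
image and would need to handle textureless regions and occlusions,
which this sketch ignores.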
[Figure: Stereo vision uses triangulation based on epipolar geometry
to determine the distance to an object.]
Between two cameras there is the problem of finding, in the image of
one camera, the point that corresponds to a point viewed by the other
camera. This is called the correspondence problem. In most camera
configurations, finding correspondences requires a search in two
dimensions. But if the two cameras are aligned to have a common image
plane, the search is simplified to one dimension: a line that is
parallel to the line between the cameras (the baseline). Image
rectification is an equivalent and more often used alternative to
this precise camera alignment. It transforms the images to make the
epipolar lines align horizontally.
Epipolar geometry refers to the geometry of stereo vision. When two
cameras view a 3D scene from two distinct positions, there are a
number of geometric relations between the 3D points and their
projections onto the 2D images that lead to constraints between the
image points. To find the pixel corresponding to a given pixel, we
only have to search along the epipolar line (a 1D instead of a 2D
search). This restriction of the search space is known as the
epipolar constraint.
Each camera captures a 2D image of the 3D world. This conversion from
3D to 2D is referred to as a perspective projection. The camera is
therefore modeled by its perspective projection matrix (PPM). Since
the two focal points of the cameras are distinct, each focal point
projects onto a distinct point in the other camera's image plane.
These two image points are denoted by eL and eR and are called
epipoles or epipolar points. The line OL–X is seen by the left camera
as a point because it is directly in line with that camera's focal
point. However, the right camera sees this line as a line in its
image plane. That line (eR–xR) in the right camera is called an
epipolar line. Symmetrically, the line OR–X, seen by the right camera
as a point, is seen as the epipolar line eL–xL by the left camera. An
epipolar line is a function of the 3D point X, i.e. there is a set of
epipolar lines in both images if we allow X to vary over all 3D
points. Since the 3D line OL–X passes through the camera focal point
OL, the corresponding epipolar line in the right image must pass
through the epipole eR (and correspondingly for epipolar lines in the
left image). This means that all epipolar lines in one image must
intersect the epipolar point of that image. In fact, any line which
intersects the epipolar point is an epipolar line, since it can be
derived from some 3D point X.
The process of camera calibration gives us both a model of the
camera’s geometry and a distortion model of the lens. In practice, no
lens is perfect. This is mainly for reasons of manufacturing; it is
much easier to make a “spherical” lens than to make a more
mathematically ideal “parabolic” lens. Radial distortions arise as a
result of the shape of the lens, whereas tangential distortions arise
from the assembly process of the camera as a whole. The lenses of
real cameras often noticeably distort the location of pixels near the
edges of the imager. This bulging phenomenon is the source of the
“barrel” or “fish-eye” effect.
We assume that the stereo rig is calibrated, i.e., the PPMs
(perspective projection matrices) of the two cameras are known. The
idea behind rectification is to define two new PPMs obtained by
rotating the old ones around their optical centers until the focal
planes become coplanar, thereby containing the baseline. This ensures
that the epipoles are at infinity; hence, the epipolar lines are
parallel. To have horizontal epipolar lines, the baseline must be
parallel to the new X axis of both cameras. In addition, to have a
proper rectification, conjugate points must have the same vertical
coordinate. This is obtained by requiring that the new cameras have
the same intrinsic parameters. Note that, the focal length being the
same, the retinal planes are coplanar too. In summary, the positions
(i.e., optical centers) of
the new PPMs are the same as those of the old cameras, whereas the
new orientation (the same for both cameras) differs from the old ones
by suitable rotations; the intrinsic parameters are the same for both
cameras. Therefore, the two resulting PPMs differ only in their
optical centers, and they can be thought of as a single camera
translated along the X axis of its reference system.
Epipolar rectification makes it possible to find the corresponding
pixel along a single horizontal scanline (1D search). The difference
in the x-coordinates of corresponding pixels is called the disparity.
STEREO MATCHING
3. USB CAMERA
The USB camera captures live video of the real world. Using two
cameras, frames from the two videos are compared and processed to
obtain a 3D image.
4. OPENCV LIBRARY
OpenCV is a computer vision library originally developed by Intel. It
is free for commercial and research use under the open source BSD
license. The library is cross-platform and runs on Linux, Windows and
Mac OS X.
Officially launched in 1999, the OpenCV project was initially an
Intel Research initiative to advance CPU-intensive applications, part
of a series of projects including real-time ray tracing and 3D
display walls.
OpenCV's application areas include:
2D and 3D feature toolkits
Ego-motion
Face Recognition
Gesture Recognition
Human-Computer Interface (HCI)
Mobile robotics
Motion Understanding
Object Identification
Segmentation and Recognition
Stereopsis (stereo vision): depth perception from 2 cameras
Structure from motion (SFM)
Motion Tracking
5. APPLICATIONS
1. Serves as a vision system for autonomous robots.
2. Helps in identifying defects in products in manufacturing
industries.
3. Surveillance.
6. CONCLUSION
Thus stereo vision mimics human vision, thereby increasing the
accuracy with which object positions are estimated.
