
Stereovision to Calculate Depth and 3D Map

College: Cummins College of Engineering for Women

Guide: Mrs. Shubhangi V. Tikhe svtikhe@rediffmail.com

Team Members

Anagha Kulkarni anagha.kulkarni03@gmail.com

Amruta Mohod amrutamohod@gmail.com

Priya Rokade priya.rokade12@gmail.com

Dhanashree Shetty shetty.dhanashree@gmail.com

Abstract --- An important research topic in the field of image processing is stereo vision. Stereo vision systems are well suited to the acquisition of 3-dimensional information about real space. Stereo vision is the process of constructing a 3D model of a scene by processing two 2D images of that scene. The main problem is the construction of a disparity map: a map that describes which points in the two images correspond to the same point in the 3D scene.

In a stereo system, the 3D real-world position is derived from a translation of coordinates between the cameras and the world. Thus, to use stereo vision, a precise system must be constructed which provides a kinematically precise translation between camera and world coordinates, in spite of its intricacy and difficulty. In this paper, we propose an approach to develop a system which can easily obtain 3D information by direct computation of position through this translation of coordinates, using the disparity in a stereo pair of images to find the reference depth of objects. The algorithm is based on correlation and uses the epipolar constraint.

Keywords --- Stereo Vision, Epipolar Geometry, Camera Calibration, Rectification, Disparity

I. INTRODUCTION

Making a machine see objects is one of the most important tasks in artificial intelligence, manufacturing automation, and robotics.

Computer vision is one of the fastest growing areas within computer science. Aided by rapid recent progress in hardware and software design, computer vision projects are making use of vast increases in processing and memory capacities to enhance their performance. For computers to effectively process, segment and analyze visual input of their environments, it is often a requirement that the system be able to obtain data about the surrounding world in a format that can be easily equated to the actual environment in which the system finds itself. In the case of many vision systems this could be a 3-dimensional representation of the real world. For humans this is a task that we achieve quite naturally from an early age, and it soon becomes second nature to accurately judge distance, perspective and space; however, when human vision is analyzed it becomes apparent that the brain uses a multitude of techniques to give us a sense of the three-dimensional world in which we live.

For a computer vision system to obtain depth data from a scene, a number of different techniques can be used. Three-dimensional scene data can be obtained from sources including object shading, motion parallax, structured light or laser range finders. However, perhaps the most obvious technique is stereo vision. In a system analogous to a pair of human eyes, the input to two cameras observing the same scene can be analyzed, and the differences between the two images used to compute object depth and hence a model of the scene that the system is viewing. The uses of a robust implementation of such a system are many, and potentially include applications in areas such as space flight, face recognition, immersive video conferencing and industrial inspection, to name just a few [7].

Stereo vision, which is inspired by the human visual process, computes the disparity between corresponding points in images captured by multiple cameras for distance measurement. In robots, visual interpretation is an extremely challenging problem. Robot vision technology is needed for stable walking, object recognition and movement to a target spot. Stereo vision of three-dimensional space would give robots powerful artificial intelligence. Stereo vision is an important research topic in the field of image processing.
The problem is to compute a 3D model of a scene from two (or possibly more) 2D images. Each pixel in these images is represented by a number corresponding to a grey level. In order to construct a 3D model of a scene, a disparity map is needed. Such a map describes, for each point in the source image, where to find the best corresponding point in the target image. From a disparity map and the calibration information of the cameras that generated the two images, a 3D model of the scene can be constructed.

Computational stereo vision involves inferring the 3-dimensional depth of a scene from the small differences between multiple views of the same scene. The most general definition of computational stereo vision involves only two views of the same scene. This is also the most difficult problem in stereo vision, since the amount of available data is small relative to multi-view stereo vision, and hence the ambiguity of the problem is high. With stereo vision, the disparity or image distance d between matching points XL and XR in the left and right images is inversely proportional to the depth z, with the baseline distance b and the focal length f the constant factors. The concept of stereo vision itself is easily grasped, but the stereo matching problem of finding the correct match is much more ambiguous.

Real-time stereo vision has been widely used in intelligent robot navigation, smart human-computer interaction, intelligent surveillance, etc. The estimation of the disparity between two images of the same scene is a long-standing issue for the machine vision community [1]. Stereoscopic vision is based on the principle, first utilized by nature, that two spatially differentiated views of the same scene provide enough information to perceive the depth of the portrayed objects. Thus, the importance of stereo correspondence is apparent in the fields of machine vision, virtual reality, robot navigation, simultaneous localization and mapping [2], [3], depth measurement and 3-D environment reconstruction. The two alternatives for estimating disparity are either to precisely align the stereo camera rig and then perform the required rectification (leading to simple scan-line searches), or to have an arbitrary stereo camera setup and avoid any rectification (searching throughout blocks). Accurately aligned stereo devices are very expensive.
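To make the inverse depth-disparity relation above concrete, the short Python sketch below evaluates z = f*b/d for a few disparities; the focal length and baseline are illustrative placeholder values, not parameters of any particular rig:

```python
# Hypothetical constants chosen only to illustrate z = f*b/d;
# they are not measurements from the paper's setup.
f = 700.0   # focal length, in pixels
b = 0.12    # baseline between camera centres, in metres

def depth_from_disparity(d: float) -> float:
    """Depth is inversely proportional to disparity: z = f*b/d."""
    return f * b / d

for d in (10.0, 20.0, 40.0):
    print(f"disparity {d:5.1f} px -> depth {depth_from_disparity(d):6.2f} m")
# Doubling the disparity halves the depth: near objects shift more
# between the two views than distant ones.
```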

II. CAMERA CALIBRATION

Camera calibration is the process of relating the ideal model of the camera to the actual physical device, and of determining the position and orientation of the camera with respect to a world reference system. For it to be possible to reconstruct a scene from a stereo image pair, several important properties of each of the cameras must be known. Obtaining values for these properties is known as calibration. Depending on the model used, there are different parameters to be determined. Techniques for camera calibration loosely fall into three categories: linear, non-linear and two-step. Linear techniques assume a simple pinhole camera model and do not account for lens distortion effects, which turn out to be "significant in most off-the-shelf charge coupled devices". In non-linear methods, a relationship between parameters is established and an iterative solution is then calculated through minimizations. The parameters to be calibrated are classified in two groups:

1) Internal (or intrinsic) parameters: the internal geometric and optical characteristics of the lenses and the imaging device.
2) External (or extrinsic) parameters: the position and orientation of the camera in a world reference system [7].

Calibration is important for accuracy in 3D reconstruction. In particular, it is a critical task for stereo vision analysis. Calibrating stereo cameras is usually dealt with by calibrating each camera independently and then applying a geometric transformation of the external parameters to find the geometry of the stereo setting [6].

Methods for estimating camera parameters typically rely on targeted test-fields and correspondences between targets and their images in one or more frames. For multi-image configurations, precise 3D test-fields can be replaced by simple 2D patterns, typically of a chess-board type. With unknown exterior orientation (Fiala and Shu, 2005), a further advantage of such patterns is their high contrast and regularity. Several freely available algorithms exist for estimating interior and exterior orientation parameters based on chess-board patterns imaged from different points of view. Among the functional tools presented in this context, Bouguet's camera calibration, implemented in C++ and included in the Open Source Computer Vision library distributed by Intel, is probably the best known. The initialization step includes manual pointing of the four chessboard corners in all images and knowledge of the number of nodes per row and column. Node locations can thus be first approximated and then identified with sub-pixel accuracy by a point operator [6].

If lens distortion is strong, one may need to supply approximations for its coefficients. Initial values are provided by the plane-based calibration algorithm, while an iterative adjustment provides a final solution for camera and pose parameters. This is achieved using image sets of typical chess-board patterns (alternating light and dark squares of equal size), which are the sole a priori information needed. Only those extracted image points are kept which may be ordered in two groups of lines referring to the main orthogonal directions on the object plane. Due to the regularity of the pattern, establishing point correspondences among views is then a trivial task, although it may involve object systems which differ by in-plane rotations and translations [6].
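As a sketch of how such chess-board calibration looks in practice, the Python/OpenCV fragment below recovers the intrinsic matrix and distortion coefficients from a set of board images. It uses OpenCV's automatic pattern detector rather than the manual corner initialization described above, and the pattern size and file paths are our own placeholders:

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)  # inner corners per row and column (assumed board geometry)
# 3D coordinates of the board nodes in the board's own plane (Z = 0).
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):  # placeholder path to calibration shots
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Estimates the intrinsic matrix K, the distortion coefficients, and the
# extrinsic pose (rvec, tvec) of the board in every view.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
```

For a stereo rig, each camera would be calibrated this way and the geometric transformation between the two then recovered, as described above.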
A. Camera calibration algorithm

1) Corner extraction: With parameters chosen after some tests, the Harris corner operator with sub-pixel accuracy, made available by Bouguet, is applied to grayscale images with equalized histograms.

2) Point ordering: After the extraction of feature points in an image, the medians of their x, y coordinates are calculated. These will normally indicate a point close to the centre of the chess-board pattern. The median is preferred over the mean value due to its lower sensitivity to the presence of 'noisy points' outside the chess-board. The point selection and ordering algorithm is initialized by choosing the closest feature point as 'base point' B. All extracted points around B in a window of size equal to 1/3 of the image size are collected and sorted according to their distance from it. Assuming that B is indeed a valid node, the principal directions of the pattern must now be identified by avoiding points not corresponding to pattern nodes, but also points on chess-board diagonals. The linear segment s from 'base point' B to the nearest extracted point is formed. Identification of the main directions succeeds by comparing the grey values of the pixels on either side of s [6].
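A minimal sketch of the corner-extraction step, with OpenCV's Harris-based detector and sub-pixel refinement standing in for the Bouguet operator described above (the detector parameters are illustrative, not the authors' tested values):

```python
import cv2

gray = cv2.imread("board.png", cv2.IMREAD_GRAYSCALE)  # placeholder image
gray = cv2.equalizeHist(gray)  # the algorithm works on equalized histograms

# Harris-based corner detection.
corners = cv2.goodFeaturesToTrack(
    gray, maxCorners=200, qualityLevel=0.01, minDistance=10,
    useHarrisDetector=True, k=0.04)

# Refine every corner to sub-pixel accuracy inside a small search window.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01)
corners = cv2.cornerSubPix(gray, corners, (5, 5), (-1, -1), criteria)
print(corners.reshape(-1, 2)[:5])  # first few refined (x, y) locations
```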

3) Point correspondences: The final outcome of this step is a set of points characterized by the number of the respective chess-board row and column with which they are associated. The lowest row appearing in an image is row 1 and is arbitrarily associated with the object X-axis. The column to the far left is column 1 and is associated with the object Y-axis. Thus, the point belonging to these two lines is point (1, 1) of this image and is associated with the origin (point 1, 1) of the chess-board XY system. If this point does not actually appear in an image or has not been detected, the adjacent node detected in this image is numbered accordingly, e.g. (2, 1) or (1, 2) etc. The process is repeated for all images. Hence, thanks to the symmetric nature of the pattern, it may be assumed that point correspondences among frames, as well as correspondences with the chess-board nodes, have been established. This provides an answer to the problem of correspondences, which is seen as the most difficult part of automatic camera calibration and is often solved manually (Fiala and Shu, 2005) [6].

4) Initial values: Instead of the linear solution of plane-based calibration, the approach adopted here relies on the two principal vanishing points (VPs) of the images. These are found by line-fitting adjustment to nodes ordered in pencils of converging lines. In each direction, initial estimates of VP locations are obtained from the two lines with >3 points forming the largest angle. If the distance of a VP from the image centre exceeds 40 times the size of the image format (equivalent to a rotation angle of ~1.5° for a moderately wide-angle lens), this VP is considered to be at infinity. If both VPs are finite, their locations are refined in a single adjustment, in which the coefficients of radial lens distortion are also included among the unknowns. Using diagonals with >3 points, the VP of the diagonal direction which falls between the two principal VPs is also included as an unknown to enforce the vanishing-line constraint. In case only one VP is finite, it is estimated from all participating lines along with the radial distortion coefficients [6].

B. Epipolar geometry

Epipolar geometry is the geometry of stereo vision. When two cameras view a 3D scene from two distinct positions, there are a number of geometric relations between the 3D points and their projections onto the 2D images that lead to constraints between the image points. To find the corresponding pixel, we only have to search along the epipolar line (a 1D instead of a 2D search). This search space restriction is known as the epipolar constraint [8].

Fig. 1: Epipolar geometry

Each camera captures a 2D image of the 3D world. This conversion from 3D to 2D is referred to as a perspective projection. The camera is therefore modelled by its perspective projection matrix (PPM). Since the two focal points of the cameras are distinct, each focal point projects onto a distinct point in the other camera's image plane. These two image points are denoted by eL and eR and are called epipoles or epipolar points. The line OL–X is seen by the left camera as a point because it is directly in line with that camera's focal point. However, the right camera sees this line as a line in its image plane. That line (eR–xR) in the right camera is called an epipolar line. Symmetrically, the line OR–X, seen by the right camera as a point, is seen as the epipolar line eL–xL by the left camera. An epipolar line is a function of the 3D point X, i.e. there is a set of epipolar lines in both images if we allow X to vary over all 3D points. Since the 3D line OL–X passes through the camera focal point OL, the corresponding epipolar line in the right image must pass through the epipole eR (and correspondingly for epipolar lines in the left image). This means that all epipolar lines in one image must intersect the epipolar point of that image. In fact, any line which intersects the epipolar point is an epipolar line, since it can be derived from some 3D point X [9].
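The constraint can be expressed algebraically through the fundamental matrix F, for which a corresponding pair (x, x') satisfies x'^T F x = 0; the line l' = Fx is then the epipolar line to be searched. The sketch below, with a handful of made-up matched points purely for illustration, estimates F and computes one epipolar line:

```python
import cv2
import numpy as np

# Eight made-up point matches (left/right pixel coordinates), for
# illustration only -- real matches would come from a feature matcher.
pts_left = np.float32([[100, 120], [310, 90], [200, 240], [50, 300],
                       [400, 210], [260, 330], [150, 60], [370, 280]])
pts_right = np.float32([[88, 121], [295, 92], [185, 238], [35, 301],
                        [382, 212], [243, 331], [139, 59], [352, 279]])

# Eight-point estimate of the fundamental matrix F (x'^T F x = 0).
F, mask = cv2.findFundamentalMat(pts_left, pts_right, cv2.FM_8POINT)

# Epipolar line in the right image for each left point: l' = F x.
lines = cv2.computeCorrespondEpilines(pts_left.reshape(-1, 1, 2), 1, F)
a, b, c = lines[0].ravel()  # coefficients of a*x + b*y + c = 0
print(f"epipolar line of point 0: {a:.4f}*x + {b:.4f}*y + {c:.4f} = 0")
```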

The process of camera calibration gives us both a model of the camera's geometry and a distortion model of the lens; in practice, no lens is perfect.

III. IMPLEMENTATION

A. Undistortion

Lens distortion was discussed in the previous section. Using the parameters extracted during calibration, the images are undistorted.
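A minimal undistortion sketch, assuming the intrinsics K and the distortion coefficients come from the calibration stage (the numbers below are placeholders):

```python
import cv2
import numpy as np

# Placeholder intrinsics and distortion coefficients (k1, k2, p1, p2, k3);
# in practice these come from cv2.calibrateCamera as shown earlier.
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([-0.28, 0.09, 0.0, 0.0, 0.0])

img = cv2.imread("left_raw.png")           # placeholder file name
undistorted = cv2.undistort(img, K, dist)  # applies the lens-distortion model
cv2.imwrite("left_undistorted.png", undistorted)
```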
images such that corresponding image points are located on
B. Rectification

Image rectification is an important step in the three-dimensional analysis of scenes. For stereo vision, image rectification can increase both the reliability and the speed of the disparity estimation process. This is because, in the rectified images, the relative rotation between the original images has been removed, and the disparity search happens along the horizontal or vertical image scan lines. The rectification process requires certain camera calibration parameters, or the weakly calibrated (uncalibrated) epipolar geometry of the image pair [10].

To reduce disparity estimation complexity, the cameras used are usually arranged in a parallel-axis configuration or, equivalently, the stereo image pairs are carefully rectified using the camera geometry. The main function of a stereo correspondence algorithm is to match each (in the case of dense stereo) or some (in the case of sparse stereo) pixels of the first image to their corresponding ones in the second image. The outcome of this process is a depth image, i.e. a disparity map. This matching can be done as a 1-D search if the stereo pairs are accurately rectified. In rectified pairs, horizontal scan lines reside on the same epipolar lines, as shown in Fig. 2.

Fig. 2
A point P1 in one image plane may have arisen from any of the points on the line C1P1, and may appear in the alternate image plane at any point on the so-called epipolar line E2 (the epipolar constraint). Thus, the search is theoretically reduced to a scan line, since corresponding point pairs reside on the same epipolar line. The difference in the two horizontal coordinates of these points is the disparity. The disparity map results from assigning to each matched pixel the disparity value of its correspondent. Afterwards, the depth of the scenery can be derived from the disparity map, since the more distant an object is, the smaller the disparity value it is expected to have [4].

Implementations of the correlation algorithms required in the next stage of the reconstruction process can be greatly simplified if the input images can be rectified. The process of rectification involves a 2D transformation of the input images such that corresponding image points are located on equivalent image scan lines. Utilizing geometric properties inherent to epipolar geometry (see Fig. 1), given a point and its projected location on one image plane, it is possible to calculate on which epipolar line in the other image plane the point will appear. This epipolar constraint allows us to calculate and perform the rectifying 2D transformation of the original input images [7].

Fig. 3

The epipolar constraint expresses the relation between two images of the same scene. The effect of the rectification is that the correspondence problem is reduced to one dimension, since we only have to search for matching points across a single horizontal line of the matching input image. Fig. 3 shows the results of rectifying some input images after calibration of a stereo rig and the capture of a stereo image pair. Analysis of the rectified image pairs shows that matching points are indeed positioned on matching scan lines, showing this to be a useful rectification. With the rectification of the input images complete, it then becomes possible to begin attempting additional stages in the reconstruction process [7].

C. Stereo correspondence problem

The dense correspondence problem consists of finding a unique mapping between the points belonging to two images of the same scene. If the camera geometry is known, the images can be rectified, and the problem reduces to the stereo correspondence problem, where points in one image can correspond only to points along the same scan line in the other image. If the geometry is unknown, we have the optical flow estimation problem. In both cases, regions in one image which have no counterparts in the other image are referred to as occlusions [11].

Scharstein and Szeliski have provided an exhaustive comparison of dense stereo correspondence algorithms. Most algorithms generally utilize local measurements such as image intensity (or color) and phase, and aggregate information from multiple pixels using smoothness constraints. The simplest method of aggregation is to minimize the matching error within rectangular windows of fixed size. Better approaches utilize multiple windows, adaptive windows which change their size in order to minimize the error, shiftable windows, or predicted windows, all of which give performance improvements at discontinuities [4].
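The fixed-window scheme mentioned above can be written in a few lines. The toy matcher below aggregates absolute intensity differences over a square window for every candidate disparity and keeps the lowest-cost one (winner-take-all); it assumes a rectified pair and makes no claim to real-time performance:

```python
import cv2
import numpy as np

def sad_disparity(left: np.ndarray, right: np.ndarray,
                  max_disp: int = 32, half_win: int = 3) -> np.ndarray:
    """Fixed-window SAD block matching on a rectified grayscale pair."""
    h, w = left.shape
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    kernel = np.ones((2 * half_win + 1, 2 * half_win + 1), np.float32)
    cost = np.full((max_disp, h, w), np.inf, dtype=np.float32)
    for d in range(max_disp):
        # Absolute difference between the left image and the right image
        # shifted d pixels to the right, then summed over the window.
        diff = np.abs(left[:, d:] - right[:, :w - d])
        cost[d, :, d:] = cv2.filter2D(diff, -1, kernel)
    # Winner-take-all: pick the disparity with the smallest aggregated cost.
    return np.argmin(cost, axis=0).astype(np.uint8)
```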

D. Disparity

A disparity map is a depth map where the depth information is derived from offset images of the same scene. Depth maps can be generated using various other methods, such as time-of-flight (sonic, infrared, laser). Although these active methods can often produce far more accurate maps at short distances, the passive method we use has its benefits, including applicability at long distances [14].

As mentioned above, the differences between two images give depth information. These differences are known as disparities. The key step to obtaining accurate depth information is therefore finding a detailed and accurate disparity map. Disparity maps can be visualized in grayscale. Close objects result in a large disparity value, which is translated into light grayscale values; objects further away appear darker.

Fig. 4

Fig. 4 illustrates the disparity estimation algorithm. As described in Fig. 1, the object of interest registers at a different x-position in each image; however, both images share similar features. By repeatedly shifting one of the images along the x-axis (in this case the algorithm shifts the right image), at a certain point the two images will exhibit the maximum number of overlapping pixels. The shift value that produced this maximum overlapping area is then used to represent the disparity value for this object of interest [13].

Obtaining depth information is achieved through a process of four steps. First, the cameras need to be calibrated. After calibrating the cameras, the assumption is made that the differences between the images lie on the same horizontal or epipolar line [4]. The second step is the decision as to which method is going to be used to find the differences between the two images. Once this decision is made, an algorithm to obtain the disparity map needs to be designed or decided on. The third step is to implement the algorithm to obtain the disparity information. The final step is to use the disparity information, along with the camera calibration from step one, to obtain a detailed three dimensional view of the world [12].
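The four steps map directly onto a few OpenCV calls once a calibrated, rectified pair is available. The sketch below uses the library's block matcher; the file names and matcher parameters are placeholders:

```python
import cv2

left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)

# numDisparities must be a multiple of 16; blockSize is the matching window.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disp = stereo.compute(left, right)  # fixed-point disparities, scaled by 16

# Normalize for display: near objects (large disparity) come out light,
# far objects dark, matching the grayscale convention described above.
vis = cv2.normalize(disp, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", vis)
```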
Fig. 5

1) Feature Based Disparity: This method looks at features in one image and tries to find the corresponding feature in the other. The features can be edges, lines, circles and curves. Nasrabadi applies a curve-segment-based matching algorithm. Curve segments are used as the building blocks in the matching process. Curve segments are extracted from the detected edge points, and the centre of each extracted curve is used as the feature in the matching process [12].

2) Area Based Disparity: There are two techniques that are used in this algorithm. In both of these methods, a window is placed on one image, and the other image is scanned using a window of the same size. The pixels in each window are compared and operated on; these are then summed to give a coefficient for the centre pixel. These techniques were developed by Okutomi and Kanade [12].
E. Triangulation

The same method that is used in navigation and surveying is used to calculate depth. Basic triangulation uses the known distance between two separated points looking at the same scene point. From these parameters, the distance to the scene point can be calculated. This same basic idea is used in stereo vision to find depth information from two images. Fig. 6 graphically shows the geometry [4].

Fig. 6

In the above arrangement, two cameras (C, C') see the same feature point (S). The location of the point in the two image planes is denoted by A and A'. When the cameras are separated by a distance T, the locations of A and A' relative to the cameras' normal axes will differ (denoted by U, U'). Using these differences, the distance (Z) to the point can be calculated from the following formula:

Z = f * T / (U - U')

In order to calculate depth, however, the difference between U and U' needs to be established [4].

This formula allows us to calculate the real-world distance of a point. If what we are interested in is the relative distance of points rather than their exact distance, we can do this with even less information. The base offset and focal length of the camera are the same for both images; hence the distance of different points in the images will vary solely based on the disparity component. Therefore we can gauge the relative distance of points in images without knowing the base offset and focal length [14].
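A numeric sketch of the triangulation formula and of the relative-distance observation; the focal length, baseline and image offsets are illustrative values only, not measurements from the paper's setup:

```python
import numpy as np

f = 700.0                  # focal length, in pixels (assumed)
T = 0.12                   # camera separation, in metres (assumed)
U, U_prime = 250.0, 215.0  # horizontal offsets of the point in each image

Z = f * T / (U - U_prime)  # depth from the triangulation formula above
print(f"disparity = {U - U_prime:.1f} px, depth Z = {Z:.2f} m")

# Relative ordering needs neither f nor T: Z is proportional to
# 1 / (U - U'), so sorting by disparity alone ranks points near-to-far.
disparities = np.array([35.0, 10.0, 20.0])
print("near-to-far point order:", np.argsort(-disparities))
```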
IV. CONCLUSION

In this paper, we propose an approach for stereo vision to calculate depth and a 3D map. In our system we have used cameras, which gives more accuracy in distance calculation than sensors. The algorithm which we have used gives a better disparity map, which leads to 3D visual perception. Our system works well under the following conditions: 1) good image quality, 2) proper camera calibration, and 3) an object distance of up to 5 feet.

ACKNOWLEDGMENT

This paper is the outcome of help, co-operation and guidance from various quarters. We take this opportunity to express our deep sense of gratitude to our internal guide Mrs. S. V. Tikhe and project in-charge Mr. Saurabh Mengale. They took a keen and personal interest in giving us constant encouragement and timely suggestions.

REFERENCES

[1] D. Marr and T. Poggio, "Cooperative computation of stereo disparity," Science, vol. 194, no. 4262, p. 283, 1976.
[2] D. Murray and J. Little, "Using real-time stereo vision for mobile robot navigation," Autonomous Robots, vol. 8, no. 2, pp. 161–171, 2000.
[3] D. Murray and C. Jennings, "Stereo vision based mapping and navigation for mobile robots," in Proc. IEEE Int. Conf. on Robotics and Automation, vol. 2, 1997, pp. 1694–1699.
[4] L. Nalpantidis, A. Amanatiadis, G. Sirakoulis, N. Kyriakoulis, and A. Gasteratos, "Dense disparity estimation using a hierarchical matching technique from uncalibrated stereo vision," IEEE, 2009.
[5] V. Douskos, I. Kalisperakis, and G. Karras, "Automatic calibration of digital cameras using planar chess-board patterns," Department of Surveying, National Technical University of Athens.
[6] "Calibration of Stereo Cameras for Mobile Robots" (online resource).
[7] D. Bardsley, "A Correlation Based Stereo Vision System for Face Recognition Applications," University of Nottingham, 2004.
[8] http://en.wikipedia.org/wiki/Image_rectification
[9] http://en.wikipedia.org/wiki/Epipolar_geometry
[10] C. Sun, "Uncalibrated three-view image rectification," Image and Vision Computing, vol. 21, no. 3, pp. 259–269, March 2003.
[11] http://www.cs.umd.edu/~ogale/papers/ogaleIJCV05shape.pdf
[12] P. Munro and A. P. Gerdelan, "Stereo Vision Computer Depth Perception."
[13] Y. S. Yong and H. W. Hon, "Disparity Estimation for Objects of Interest."
[14] http://www.Wikidot.com
