I. INTRODUCTION
FOR its autonomous movement in real time, a robotic system has to perceive its environment, calculate the position of a target or a block, and move properly. For this reason, many types of sensors and apparatus have been proposed. When cameras are used as sensors, vision systems with mainly one or two cameras are obtained. A stereo-vision system is
composed of two cameras. For the recovery of a 3-D scene
from a pair of stereo images of the scene, it is required to
establish correspondences. The basic steps of the stereo process are the following: 1) detection of features in each captured image; 2) matching of the detected features (correspondence), under certain geometric and other constraints, for each pair of stereo images; and 3) depth calculation by means of the disparity
values found and the geometric parameters of the vision system. From the previous three steps, correspondence between
points (second step) is usually the most difficult step, and it is
generally the most time-consuming.
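As an illustration of these three steps (and not of the correspondence algorithm proposed later in this paper), a minimal sketch in Python with NumPy follows; the function names, the sum-of-absolute-differences window, and the search range are hypothetical choices for a parallel-axis stereo pair.

```python
import numpy as np

def depth_from_disparity(d, f, b):
    """Step 3: depth from disparity d (pixels), with focal length f and
    baseline b expressed in pixel-compatible units."""
    return f * b / d

def match_along_epipolar(left, right, row, xl, win=3, max_disp=32):
    """Steps 1-2 (toy version): for a feature at (row, xl) in the left
    image, find the best-matching column on the same epipolar line of
    the right image using a sum-of-absolute-differences window."""
    patch = left[row - win:row + win + 1, xl - win:xl + win + 1].astype(int)
    best_xr, best_cost = None, np.inf
    for d in range(max_disp):
        xr = xl - d                      # candidate column in the right image
        if xr - win < 0:
            break
        cand = right[row - win:row + win + 1,
                     xr - win:xr + win + 1].astype(int)
        cost = np.abs(patch - cand).sum()
        if cost < best_cost:
            best_cost, best_xr = cost, xr
    return best_xr
```

For a synthetic pair in which the right image is the left image shifted by five pixels, the matcher recovers the true disparity, and the depth formula then converts it to a range value.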
Depth perception via stereo disparity is a passive method that
does not require any special lighting or scanner to acquire the
images. This method may be used to determine the depths of
points in indoor as well as outdoor scenes and the depths of
Manuscript received September 21, 2005; revised June 27, 2007.
The authors are with the Department of Electrical and Computer Engineering, Democritus University of Thrace, 67100 Xanthi, Greece (e-mail:
pated@mail.otenet.gr; ilygour@ee.duth.gr).
Digital Object Identifier 10.1109/TIM.2007.908231
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 56, NO. 6, DECEMBER 2007
Fig. 1. PSVS. Some construction details and the virtual cameras created. The
established coordinate system is also illustrated.
optical axes being parallel to the optical axis of the real camera.
These cameras are located symmetrically with respect to the Z-axis. They
have the same geometric properties and parameters (the same
as the real camera). Consequently, these cameras constitute
an ideal stereo-vision system with two cameras. This vision
system, as it is presented, receives one complex image in a
single shot. A complex image is defined as an image that is
created from the superposition of two images received from the
left and right views of the system. The baseline b of this stereo
system is the distance between the two virtual parallel optical
axes (Fig. 1). The beam splitter selected permits the reflection
of 50% of the incident light, whereas it permits, through its
body, the propagation of the other 50%. As the incident light is
coming from two different directions, 50% of the light in each
direction is lost, whereas the other 50% is driven to the camera
lens. If the intensity of each pixel of an image that is captured
from the left view is IL (i, j), and the intensity of each pixel of
an image that is captured from the right view is IR (i, j), then
the intensity of each pixel of the complex image is given as
I_C(i, j) = k I_L(i, j) + (1 − k) I_R(i, j)    (1)

where i and j are the indices of the current row and column, respectively, of a pixel in an image, and k is the parameter (k = 0.5 with the beam splitter used) indicating the decrease in intensity introduced by the beam splitter.
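Equation (1) can be sketched directly in NumPy; the function name and the float conversion are illustrative choices.

```python
import numpy as np

def complex_image(I_L, I_R, k=0.5):
    """Superpose the left- and right-view images as in (1):
    I_C(i, j) = k*I_L(i, j) + (1 - k)*I_R(i, j).
    k = 0.5 corresponds to the 50/50 beam splitter."""
    I_L = np.asarray(I_L, dtype=float)
    I_R = np.asarray(I_R, dtype=float)
    return k * I_L + (1.0 - k) * I_R
```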
\[
\frac{\overline{OB}\,\tan a}{\sin\theta_{1}+\cos\theta_{1}\tan a}
\tag{2}
\]

\[
\frac{2\,\overline{OB}\,\tan a}{1-\tan a}
+\frac{2\,\overline{OB}\,\tan a}{1+\tan a}
= 2\,\overline{OB}\,\tan 2a.
\tag{3}
\]
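The closed form on the right of (3) follows by combining the two fractions over a common denominator and applying the double-angle identity tan 2a = 2 tan a / (1 − tan² a):

```latex
\frac{2\,\overline{OB}\tan a}{1-\tan a}
+\frac{2\,\overline{OB}\tan a}{1+\tan a}
=\frac{4\,\overline{OB}\tan a}{1-\tan^{2}a}
=2\,\overline{OB}\cdot\frac{2\tan a}{1-\tan^{2}a}
=2\,\overline{OB}\tan 2a .
```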
Fig. 2. (a) PSVS mounted on the end effector of the robotic manipulator PUMA761. (b) Inner view of the apparatus. (c) Complex image captured in the
environment light conditions. (d) Complex image captured by means of the proposed lighting system.
[Equations (4)–(11), which derive tan a, the segment DG, and the offset m from the geometry of Fig. 1, are too garbled in this extraction to be reconstructed.]
Projection of a point onto the image planes of the two virtual cameras (figure caption).

[Equations (12)–(19), which express the 3-D coordinates (x, y, z) of a point in terms of the image coordinates (x_l, y_l) and (x_r, y_r) of its projections onto the two virtual cameras, the baseline b, the focal length f, the offsets m and l, and the coordinates of the virtual optical center O_2, are too garbled in this extraction to be reconstructed.]
V. CORRESPONDENCE ALGORITHM
A large number of correspondence algorithms have been
proposed over the last 20 years [20], [21]. These algorithms
are classified as area- and feature-based. Area-based algorithms
create a description for each image pixel location, usually by
formulating a measure for the local intensity profile of the
area surrounding the pixel, and compare this measure with the
candidate target pixels in the other image. They produce dense
disparity maps as they work at the pixel level, and they are
more suitable for photometrically invariant imagery. Feature-based approaches rely on the matching of explicit features extracted from the images, such as edges, which correspond to physical scene properties. These approaches presume a degree of geometric invariance. Depending on the feature complexity, the feature-based approaches can be low or high level. Low-level methods attempt to match individual edgels (edge points),
whereas high-level methods carry out matching between edge
segments, curves, or regions.
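As a toy illustration of the low-level features mentioned above, the following sketch extracts edgels with a simple finite-difference gradient; the threshold value and the function name are hypothetical.

```python
import numpy as np

def edgels(img, thresh=50):
    """Low-level features: extract edge points (edgels) as the pixels
    whose central-difference gradient magnitude exceeds a threshold."""
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal gradient
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical gradient
    mag = np.hypot(gx, gy)
    return np.argwhere(mag > thresh)         # array of (row, col) pairs
```

A high-level method would then group such edgels into edge segments or curves before matching, which is the level at which the proposed algorithm operates.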
The proposed correspondence algorithm belongs to the high-level feature-based algorithms and, particularly, to the algorithms that can find correspondences in curves. It differs from the other proposed algorithms of this class for the following reasons.
1) It is implemented not only for pairs of stereo images but
also for complex images.
2) It exploits the concept of seeds to find edge correspondences. While some other researchers [22] have used
this concept, in the proposed algorithm, it has a different
meaning that will be explained later.
3) The selection and correspondence of one or a few edges
or of a part of an edge, in a semiautomatic procedure, are
possible.
4) The criteria that are used to find the corresponding edges
are applied not only to some features but also to the
whole edge. However, these criteria depend on low-level
features (edgels).
5) The segmented pairs of edges have different color or
grayscale values, and thus, the priority of edges is
determined.
6) In order to obtain a corresponding edge, only some pixels
are found (seeds), and the edge is detected by passive
propagation of seeds.
7) It is a two-stage algorithm. First, it detects the corresponding edges. Second, it finds the corresponding points
for each pair of edges with a desired density of points.
This second stage permits a robot path to be generated
from a pair of curves.
8) Although the algorithm is applied to large images (i.e., 512 × 512 pixels), the execution time is variable and
depends on the number of the selected straight or curved
edges and the number of the desired points in each
edge. Thus, in some cases, it can be used in real-time
applications.
To implement the proposed algorithm, a complex image is
initially processed. In the application developed in Visual C++,
a variety of filters and edge-detection methods may be used. In
the final edge image, the desired edges are selected as left-view
edges in a semiautomatic procedure, i.e., by manually coloring
a pixel and then by propagating the pixel to the whole edge.
When all the desired edges are colored, with different color
values, the corresponding edges are detected. In an automatic
procedure, each left-view edge is automatically selected first,
the corresponding edge is detected, and the whole procedure
is repeated until all the pairs of the corresponding edges are
detected. Three criteria are used to select the corresponding
edge:
1) the horizontal upper and lower limits of the initial edge
plus a small permissible deviation measured in pixels;
2) the number of pixels in each initial edge, which is extended by a predefined percentage of the initial number
of pixels;
3) the criterion of the most probable edge.
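Criteria 1) and 2) can be sketched as a simple filter over candidate edges; the pixel-list representation and the default values of the deviation dev_px and the percentage extra_pct are illustrative assumptions, not values from the paper.

```python
def passes_limit_criteria(init_edge, cand_edge, dev_px=3, extra_pct=0.30):
    """init_edge and cand_edge are lists of (row, col) edge pixels.
    Returns True only if the candidate satisfies criteria 1 and 2."""
    top_i = min(r for r, _ in init_edge)
    bot_i = max(r for r, _ in init_edge)
    top_c = min(r for r, _ in cand_edge)
    bot_c = max(r for r, _ in cand_edge)
    # Criterion 1: same upper and lower horizontal limits, within a
    # small permissible deviation measured in pixels.
    if abs(top_i - top_c) > dev_px or abs(bot_i - bot_c) > dev_px:
        return False
    # Criterion 2: the candidate pixel count must not exceed the
    # initial count extended by the predefined percentage.
    if len(cand_edge) > len(init_edge) * (1.0 + extra_pct):
        return False
    return True
```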
A number of different criteria have been studied. It was found
that the previous three criteria permit a more reliable detection
of a corresponding edge. To explain these criteria, first, it is
worth noticing the way the edges of the left and right views
are located in a complex image. In such an image, the edges
of an object for the two different views have the same upper
and lower limits. If the object is far away from the PSVS, the
disparities of the different corresponding edge points will have
small differences in value. Contrary to the previous case, if the
object is near the PSVS and, moreover, its end points are along
the Z-axis (optical axis), the difference in disparity values will
be significant. If the object is not located symmetrically with respect to the Y-axis, the number of pixels of the two different views will also differ significantly. Consequently, it becomes clear why the first
criterion is required. A correct corresponding edge will have
the same upper and lower limits with the initial edge. Thus, the
edges whose limits are different from the limits of the initial
edge are excluded from detection. However, a small deviation is
permitted to these limits. This deviation is measured in pixels.
The second criterion checks for a significant deviation in the number of pixels per edge: a candidate edge is rejected if its number of pixels exceeds the number of pixels of the initial edge by more than a predefined percentage of that number. This extra percentage of pixels is set in the application before the algorithm is executed. Once the second
criterion is applied, the edges with an excessively large number
of pixels but probably with the same limits, compared with the
initial edge, are rejected. The third criterion is more powerful
than the other two and can be applied by itself if the differences
in pixel disparity values are too small. The constraint used, as in many correspondence algorithms, is the epipolar constraint. According to this criterion, we assume that
in order to detect a corresponding edge, it is not necessary to
examine all the pixels of the initial edge. Thus, from the colored
pixels of the initial edge, a few are selected. The selection
is automatically made by means of a prespecified number of
desired pixels (the range of this number is from 2 to 100). The
selected pixels are equally distributed along the whole initial
edge. For each one of these pixels, the corresponding epipolar
line of the complex image is scanned. The candidate edge pixels
are detected and stored in a matrix of counters according to their
distance from the initial edge pixel. The procedure is repeated
for all the selected pixels of the initial edge. Thus, only a few
lines are scanned, and at the end of this procedure, each element
of the matrix contains the population of the detected pixels per
distance.
Then, the third criterion is defined as follows.
Definition: The most probable candidate edge corresponds to the maximum population of pixels at the same distance from the initial edge. At this distance, at least one pixel of the candidate edge is detected.
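The scan-and-count procedure behind this criterion can be sketched as a voting histogram over distances; the boolean-mask representation and the assumption that the corresponding edge lies at smaller column values are illustrative.

```python
import numpy as np

def vote_disparities(selected_pixels, edge_mask, max_disp=64):
    """Criterion 3 (sketch): for each selected pixel of the initial
    edge, scan its epipolar line (same row) and count candidate edge
    pixels per distance; the distance with the largest population
    identifies the most probable corresponding edge."""
    counts = np.zeros(max_disp + 1, dtype=int)
    for r, c in selected_pixels:
        for d in range(1, max_disp + 1):
            if c - d >= 0 and edge_mask[r, c - d]:
                counts[d] += 1
    return int(np.argmax(counts)), counts
```

Pixels of the candidate edge found at the winning distance would then serve as the seeds described below.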
If an object is parallel to the XY plane of the camera
system, then its projections onto a complex image plane are
parallel edges, and all the selected pixels are detected in the
corresponding edge. In any other case, the number of the
detected pixels is smaller. At least one pixel is necessary to
create the corresponding edge. Pixels that satisfy the third
criterion and, at the same time, belong to an edge are colored
with the same color as the initial edge. These pixels are called
seeds, and they are propagated as four-neighbor pixels. In this
way, the whole corresponding edge is created. Propagation is
a passive procedure with very low computational cost. When
applying the aforementioned criteria, it is obvious that the
algorithm cannot find correspondences if an edge of the left
view intersects an edge of the right view. However, in the
semiautomatic approach, parts of the corresponding edges can
be found. In a complex image, the intersection of the edges
may occur when the left and right views of the objects overlap.
For this reason, we are currently investigating (the research is not yet complete) the separation of a complex image into the pair of stereo images and its reconstruction. We are examining two
different methods. The first one is based on grayscale cameras
and the use of spatial filters in front of the PSVS, whereas
the second one is based on color cameras and the use of color
filters. The second method separates a complex image into the
pair of stereo images at frame rate.
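The four-neighbor propagation of seeds described above can be sketched as a breadth-first flood fill; the data representation is an illustrative assumption.

```python
from collections import deque

def propagate_seeds(edge_mask, seeds):
    """Grow each seed over the connected edge by four-neighbor
    propagation, collecting the whole corresponding edge."""
    h, w = len(edge_mask), len(edge_mask[0])
    visited = set(seeds)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and edge_mask[nr][nc]
                    and (nr, nc) not in visited):
                visited.add((nr, nc))
                queue.append((nr, nc))
    return visited
```

Since every edge pixel is visited at most once, the propagation cost is linear in the edge length, consistent with the low computational cost claimed in the text.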
After the corresponding edges have been created (detection
and propagation), the corresponding points in each pair of
edges can be found by mapping the points one by one. In
this second stage, the density of points, i.e., the number of
corresponding points in a pair of edges, is determined by means
of a prespecified number. The range of this number is from
3 to 1000, and it is manually selected before the algorithm is
implemented. The corresponding pairs of points are equally
distributed, and their locations and disparities are stored in a
matrix.
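The equal distribution of corresponding points along an edge can be sketched as follows, assuming the edge pixels are already ordered along the curve (an assumption not detailed in the text) and that n >= 2.

```python
def sample_edge_points(edge_points, n):
    """Second stage (sketch): pick n points equally distributed along
    an ordered edge, always including both end points."""
    if n >= len(edge_points):
        return list(edge_points)
    step = (len(edge_points) - 1) / (n - 1)
    return [edge_points[round(i * step)] for i in range(n)]
```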
Fig. 6. Correspondence procedure. (a) In the first stage, seven points are
selected for correspondence from the desired edge. Only one corresponding
point was found. (b) In the second stage, 30 corresponding pairs of points were
found. Parallel lines show this mapping.
Fig. 7. (a) and (b) Complex images of a scene comprising a few objects.
(c) Partial overlapping of the ashtrays is shown. (d) and (e) Complex images
of more complicated scenes are presented.
Fig. 9. Disparity map, computed by means of the complex image of pliers, for
500 points, is illustrated.
Fig. 8. (a) Initial complex image is captured by means of the PSVS. (b) Binary
image is illustrated after the initial processing of the complex image. (c) Left
view of the pliers is selected (with grayscale value). (d) Detection of the right
view is presented. Parallel lines show the mapping of the desired corresponding
pairs of points.
Fig. 11. Complex images of the same iron plate for different distances. The
two different views are always adjacent.
Fig. 12.
Fig. 10. (a) Initial complex image of a complicated scene captured by means
of the PSVS. (b) Edge image is illustrated. (c) Two segments of edges are
selected with different grayscale values. (d) Having implemented the correspondence algorithm, the pairs of the corresponding points are indicated by
means of parallel lines.
selected points of the initial edges is five, and the seeds found
are three (for the segment with a grayscale value that is equal
to 185) and one (for the segment with a grayscale value that is
equal to 192). For this experiment, the number of the desired
corresponding pairs of points is specified to be 35 [Fig. 10(d)].
The coordinates of 3-D points can be computed from the corresponding pairs of points by means of (12)–(14).
C. Complex-Image Views Overlapping
In the two previous experiments, the examples of a scene
comprising an object and a complicated scene are presented.
Overlapping of the left and right views of an object in a complex
image takes place when the dimension of the object along the
X-axis of the camera is greater than the length of baseline b,
whereas it does not depend on the depth z. This is demonstrated experimentally in Fig. 11.
It consists of eight complex images of an iron plate measuring 98 × 68 mm. The long dimension of the plate is parallel to the X-axis of the camera frame, b = 98 mm, and the distance
of the PSVS from the object is increased every 100 mm. As
the long dimension of the plate is equal to b, the two different
views are adjacent at any location of the plate. Although the size of the views decreases as the distance from the iron plate increases, the two views always remain adjacent. Thus, the
overlapping of the views does not depend on the distance of
the objects from the PSVS but only on b.
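The overlap condition stated above reduces to a one-line test; the function and parameter names are illustrative.

```python
def views_overlap(object_width_x, baseline_b):
    """The left and right views of an object overlap in a complex image
    iff its extent along the camera X-axis exceeds the baseline b; the
    depth z does not enter the condition."""
    return object_width_x > baseline_b
```

For the plate of Fig. 11, object_width_x = 98 mm equals b = 98 mm, so the views are adjacent but never overlap, at any distance.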
D. Measurement Accuracy
The pattern of Fig. 12(a) was used to verify the accuracy of
the PSVS measurements. The camera was calibrated by means
of the method of Zhang [23] and the corner-detection algorithm
proposed in [24]. Here, the length of the system baseline is
b = 100 mm, and the active focal length is f = 16.276 mm.
Fig. 12(b) shows a sample of the complex images received
by the PSVS. The diameter of each circle is d = 10 mm, and
the distance between the circles is a = 20 mm. The PSVS
is mounted on the end effector of the robotic manipulator
PUMA 761.
One part of the HumanPT software is used to achieve communication between the robot controller and the personal computer by means of the ALTER communication port, and another part is used to control the robot. Thus, the manipulator is controlled
by means of the personal computer (server), whereas a second
PC (client) is connected with the first one via Ethernet. The
main robotic application is executed on the second PC. After
implementing the correspondence algorithm, a third part of this
software is used to calculate the values of depth z, the values
of circle diameters, and the distance between the centers of the
first and last circles. The Z-axis of the camera is aligned to the
Z-axis of the coordinate system on the base of the manipulator.
Using the aforementioned structure, the PSVS distance from
the pattern is increased every 50 mm (along the Z-axis) by
means of the robotic manipulator. Sixteen images are captured,
and after their processing, the results of the calculated depth,
the circle diameter, and the distance between the centers of the
first and last circles are presented in Table I.
The distances of the PSVS (of the optical center) from the
pattern are manually measured. For this reason, the position of
the optical center is first estimated. To estimate the position of
TABLE I
DEPTH, CIRCLE DIAMETER, AND DISTANCE BETWEEN THE CENTERS OF THE FIRST AND LAST CIRCLES, MEASURED AND CALCULATED IN MILLIMETERS
Fig. 13. (a) Calculated depth-z error percentages versus the measured depth z.
(b) Measured and calculated distances versus the calculated depth z.
Fig. 14. (a) Complex image of the TOB initial pose. (b) Complex image of
the TOB final pose.
Fig. 15. Translation of the end effector of the manipulator along the X-, Y -,
and Z-axes from an initial to a final pose.
Fig. 16. (a) Translation velocities of the end effector of the manipulator.
(b) Rotation velocities of the end effector of the manipulator.
Fig. 18. Errors measured with respect to the real distance of each vision
system from the target.