
Stereo Vision and Navigation within Buildings

Ernst Triendl and David J. Kriegman

Artificial Intelligence Laboratory


Stanford University
Stanford, CA 94305

Abstract: Soft modeling, stereo vision, motion planning, uncertainty reduction, image processing, and locomotion enable the Mobile Autonomous Robot Stanford to explore a benign indoor environment without human intervention. The modeling system describes rooms in terms of floor, walls and hinged doors, and allows for unspecified obstacles. Image processing basically extracts vertical edges along the horizon using an edge appearance model. Stereo vision matches those edges using edge and grey level similarity, constraint propagation and a preference for epipolar ordering. The motion planner tries to move in a way that is likely to increase knowledge about obstacle-free space. Results presented are from an autonomous run that included difficult passages such as navigation around a pillar without a priori knowledge.

1. Introduction

Our goal is to develop a stereo vision system that allows a robot to explore the interior of a typical building; a benign environment that is neither rigged for the purpose, nor filled with tricky obstacles. We expect the robot to explore the inside of buildings.

By the end of 1986 the Mobile Autonomous Robot Stanford, or Mobi, was able to move relatively freely under its own guidance inside our laboratory building. It is

• fast enough for slow walking speed,
• able to understand enough about the indoors to move about and recognize the more important elements of a building,
• able to move autonomously, exploring rooms without human intervention.

These goals are achieved by a combination of modeling and vision. Modeling tells us that buildings have walls, doors and floors that obey certain relations to each other and the vehicle. The vision system determines the regions of free space and the location of obstacles. The motion planning system generates moves that are likely to increase the knowledge about free space so that further moves will be possible.

Examples presented are from the first convincing run on October 18, 1986. Left to its own devices, Mobi moved down the hallway, made a tour of the lobby and came back into the hallway, traveling a total distance of 35 meters.

In this paper we discuss the vision and motion planning algorithms. More about the robot, its other sensors and odometry correction by vision is contained in [Kriegman 87] and in [Triendl 87].

Support for this work was provided by the Air Force Office of Scientific Research under contract F33615-85-C-5106, Image Understanding contract N00039-84-C-0211 and Autonomous Land Vehicle contract AIDS-1085S-1. David Kriegman was supported by a fellowship from the Fannie and John Hertz Foundation.

CH2413-3/87/0000/1725$01.00 © 1987 IEEE

2. Modeling

The base model used to represent knowledge about this environment consists of a flat floor that carries vertical, straight walls with hinged doors. A room is the space between walls. A hallway is a long and narrow room.

The actual implementation of the model allows for surface marks on the walls and a class "Other Object" to cope with things that the stereo vision system sees, modeling cannot explain, but the motion planner should not push over.

Figure 1 is a typical instantiation of our model that contains all possible objects and their joining edges. Compare it to the example of a stereo pair of images (figure 2) seen by the robot during its excursion.

Figure 1. Possible instantiation of the model.

Figure 2. Stereo pair of images seen by Mobi while roaming through our lab.

The choice of this model has implications for the vision system: it needs only to look at vertical edges to generate the model. In fact, looking for edges at the horizon suffices, since all important vertical edges cross a horizontal plane at camera height (otherwise Mobi could not fit through doors). One ambiguity remains though: we may confuse a gap in the wall with the wall itself unless some other object is seen through the gap. A look at the floor edge might resolve this situation.

Whether one should look at the floor at all is also a question of good usage of processing resources: when all information about the model can be gained by looking horizontally, looking down will slow the process. On the other hand, looking down occasionally is needed to avoid obstacles on the floor. Our robot does not do so now and consequently rams into chairs, flower pots (if small) and couches.

A model of the robot (figure 3) is used for vision and motion planning. Mobi is an omnidirectional cylindrical vehicle, 170 cm high and 65 cm in diameter. It has 12 touch-sensitive bumpers, and carries two cameras (17 cm apart) which have a 36 degree field of view.

Figure 3. Model of Mobi with cameras and field of view. The stereo matches are from the image in figure 2.

3. Edge Detection

A vertical edge detector with an aperture of 5 columns by 10 rows proceeds in two stages.

First, a 1 by 10 vertical averaging filter is applied to both images at the horizon, which is known from a calibration program that measures camera orientations. This vertical smearing has the following effects:

1. Vertical edges retain their acuity.
2. Slanted edges get blurred; horizontal edges vanish.
3. Tilt and roll angle misalignment have less effect.
4. Image noise is reduced.
5. Blobs become vertical edges.

Second, a 5 by 1 version of the edge appearance model is applied to the filtered image line. The edge appearance model [Triendl 1978] compares a local patch of image to the image that would have been created if the camera were looking at an ideal step edge. It uses the spatial filter created by the lens-camera-digitizer-preprocessor pipeline for this purpose. The operator returns quality, position, direction, left and right grey levels and an estimate of the localization error (1/8 pixel for good edges).

Alternative Edge Extractions

Before arriving at the above solution we explored several alternatives. Some of them are to be applied later for more specific purposes such as looking for a floor edge.

Detecting all edges in the image first and then linking them into lines took several minutes per stereo pair. Combining edge detection and linkage improved the speed to about 30 seconds, mostly spent on the search for new edge links. Reducing this search risks losing short and marginal chains of edges.

The resulting lines were labeled 'straight' and 'bent' and connected with other lines forming corners, T-junctions, Y-junctions and arrows. The intention was to label the resulting line-graph according to the model and combine monocular and binocular stereo. Again processing time was too long, and in addition lines were easily
broken by door knobs, labels on the wall and the like, so that stereo pairs were difficult to determine. Many real objects did not give sufficiently good and consistent edges for a stereo match.

Non-vertical lines that fit the model, i.e. floor-wall and floor-door edges, are often found outside the field of view of the camera, too low in contrast, ill defined, or obstructed by furniture. When extracting vertical lines the image can be enhanced by application of a vertical low pass filter, a moving average over a few lines. The line follower won't get thrown off by a tack in a door frame. All important information, except edge length, is provided by the first edge of a vertical. So finally we drop the angular sensitivity of the edge detector and line following and arrive at our present solution.

4. Stereo Algorithm

Certainty about a stereo match is not possible, since we can find a possible, if far-fetched, physical explanation for any match. Even in the series of pictures from our test run discussed below, we find matching edges that appear very different. We do tap all available sources of information and cues to make the chance of correct matches in the real world as high as possible. See [Baker 1982] and [Tsuji 1986] for other solutions.

The stereo mechanism we finally settled on uses edges, grey levels, correlation of intensities, constellations of edges and constraint propagation. It tries to preserve left-to-right ordering of edge matches but allows violations, e.g. by a pillar in the middle of a room. Multiple choices of matches are kept along if needed.

The first stage of the matcher proposes all possible matches of pairs of edges that are similar on the left or right side of the edge or have a similar grey level curve. Note that this includes matches of occluding edges with differing background. Edges and their potential matches, superimposed on the grey level curves of the left and right epipolar lines of the image, are shown in figure 4.

The grey level comparison function that we use is similar to the normalized cross correlation but takes into account differences in standard deviations, mean grey levels and interval lengths.

Figure 4. Stereo match proposals and grey level curves.

Next we deal with local consistency. First the grey-level-comparison function is applied to the intervals between pairs of matching edges and their respective first and second neighbors to the left. By comparing these 4 combinations per match, marginal edges present in only one image can be pinpointed and eliminated. As a side effect, high correlation makes it likely that the ribbon between edges results from a solid object, in our model a wall or a closed door (see below).

Local consistency links neighbors and determines which neighboring match is more consistent. These neighbor links form paths of maximal consistency that link groups of likely match choices. Simply summing up match qualities in a group represents an effective means of constraint propagation. A group of equal bars or a checkerboard that is entirely visible will be matched correctly. In the case of an occluding edge, the constraint will propagate up to the edge, and the part visible only to one camera will be left unmatched.

The edge detector, stereo proposer and grey level comparison function are implemented in C and run on a VAX. Processing time is about one second per stereo pair. Data are then sent to a Symbolics 3600 lisp machine which makes the final matching decisions, shown as little circles in figure 5. All matches in the example happen to be unique, but motion matches, if available, would resolve any remaining ambiguity.

5. Modelmaking: Spines, Walls, Doors

Looking at the ribbons formed by pairs of neighboring edges, we tentatively call 'wall' a ribbon whose appearance is similar in the left and right image (see figure 5). The direction angles of these prospective walls are clustered with weights proportional to their lengths. The most prominent angle will be the angle of the main walls in the scene.

After creating an empty model, the wall spines are added. A wall spine is an unbounded straight line (i.e. a vertical plane in 3D) through the center of walls. In the present example 2 spines have been found along the x-axis, in positions 50 cm to the left and 93 cm to the right. They might be called leftwall and rightwall.

Potential walls and doors are added to the model if they are close to a spine. The model created so far from the sample image has two wall spines with 3 doors and several wall slabs (figure 6).

A door is the ribbon between two not necessarily neighboring matches that (1) are 60 to 140 cm apart, (2) have edges with a big brightness difference, (3) are dark on the inside and bright on the outside, or vice versa, and (4) can be associated with a wall spine. If the ribbons look similar in both images they are called 'closed-door'; if they look very different they are labelled 'open-door'; otherwise just 'door'.
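The door test above is a small set of geometric and photometric rules. A minimal sketch of how such a rule-based labeler might look; the function name, the input fields and all thresholds except the stated 60 to 140 cm width are our own illustrative assumptions, not the paper's implementation:

```python
def label_ribbon(width_cm, brightness_diff, dark_bright_pattern,
                 near_spine, lr_similarity):
    """Label a ribbon between two matched edges using the four door
    rules; cutoffs other than the 60-140 cm width are assumed."""
    is_door = (60 <= width_cm <= 140        # rule 1: door-sized gap
               and brightness_diff > 40     # rule 2: big brightness step (assumed cutoff)
               and dark_bright_pattern      # rule 3: dark inside / bright outside or vice versa
               and near_spine)              # rule 4: associated with a wall spine
    if not is_door:
        return None
    if lr_similarity > 0.8:                 # ribbons look similar in both images
        return "closed-door"
    if lr_similarity < 0.3:                 # ribbons look very different
        return "open-door"
    return "door"
```

The left/right similarity score stands in for the paper's grey level comparison function applied to the ribbon in both images.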
Figure 5. Mobi's footprint and matches bound free space. The labels 'wall', 'door' and 'closed-door' are put by the model proposer. Derived from the image of figure 2 and partially shown in figure 3.

Figure 6. Model derived from the image of figure 2. The narrow rectangles are doors.

Six one-meter steps of Mobi later, the model contains 5 doors and several pieces of wall on two spines, as shown in figure 7.

Figure 8. Model seen from the intermediate position. Most of it lies outside the retina of Mobi's real camera.

Figure 9. Picture seen from the intermediate position.

Figure 7. Model after 6 steps. The Mobi icons represent the first and last step. The gap in the left spine is mostly real (it did not see the pillar there yet). The gap in the right series of walls is where a glass-covered display case hides the wall.

6. Free Space for Motion Planning
Free space is the volume, or in our case the floor area, that is known to be free of obstacles. Initially it is the area occupied by the vehicle's footprint. A safe move will
keep the vehicle entirely within free space.
The rays between both cameras and the object go through free space if the match was correct and no window panes or mirrors are involved. For modeling we have assumed that certain ribbons between neighboring matches represent solid objects. For motion planning we now assume that all those ribbons are solid, to be on the safe side. The hull of lines connecting neighboring matches and lines connecting matches with each camera represents our visual free space. Looking at this closed polygon in figure 5 (part of it is cut off by the page layout), we notice that no safe move is possible from a single view, since there is a bottleneck between the footprint and the visual free space.
If the robot has already moved, the new free space consists of the superposition of the new visual free space, the previous visual free space and the space swept out during the motion. Old free space has to be shrunk according to motion uncertainty.
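The visual free-space polygon described above can be sketched as follows. The camera positions and match points are made-up example values (cameras 17 cm apart, as on Mobi), and the shoelace area and even-odd point test merely stand in for whatever geometry routines the original C and Lisp code used:

```python
def free_space_polygon(left_cam, right_cam, matches):
    """Closed polygon bounding visual free space: the left camera,
    the matched points in left-to-right order, the right camera."""
    return [left_cam] + list(matches) + [right_cam]

def area(poly):
    """Shoelace formula for the area of a simple polygon."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def inside(poly, pt):
    """Even-odd ray-casting test: is pt strictly inside the polygon?"""
    x, y = pt
    hit = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            hit = not hit
    return hit
```

A safe candidate position can then be screened by testing it (and, in a fuller version, the whole robot footprint) against this polygon, which is what produces the bottleneck effect noted for figure 5.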
The strategy of the motion planner implemented is to generate enough free space to allow motion, and to make the vehicle cover ground without too many deviations and without getting cornered or caught in repetitive moves. It moves the vehicle into the middle of free space, looking towards the middle of the free space ahead. Smaller steps result when obstacles are close by. If free space bifurcates, i.e. there are several middles, a random choice is made. If there is no free space, or no space to look at, Mobi rotates until it finds space to move.
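A minimal sketch of that decision rule; the function name, the tuple-based move representation and the step-size scaling are our own illustrative assumptions rather than the planner's actual interface:

```python
import random

def choose_move(middles, obstacle_dist, rng=random.Random(0)):
    """One planning step: rotate when there is no free space to look
    at, pick one middle at random when free space bifurcates, and
    take smaller steps when obstacles are close by."""
    if not middles:                        # no free space: the only safe move is rotation
        return ("rotate", None)
    target = rng.choice(middles) if len(middles) > 1 else middles[0]
    step = min(1.0, 0.5 * obstacle_dist)   # assumed scaling, capped at a one-meter step
    return ("translate", (target, step))
```

The one-meter cap matches the step length quoted for the sample run; the seeded random generator makes the bifurcation choice reproducible.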

7. Autonomous Exploration
Now, let us take a look at what a real mobile robot run is like. To make the start interesting, we pointed Mobi at a blank wall and commanded it to go. It did not see any edges while staring at the white wall, and performed the only safe move, rotation. After this move it had not yet seen enough to move forward, and it rotates further until it is looking down the hallway (figure 2) and sees enough correspondence points to build a large enough region of free space to begin translating. Mobi travels towards the end of the hallway without any further incidents, building up the model shown previously (figures 5 to 7).

Figure 11. Motion series: Mobi passes the pillar and turns from the hallway into the lobby.

The first challenge is the white pillar visible in the first image of the sequence in figure 11. This poses a particularly interesting problem because the epipolar ordering constraint may be violated. Furthermore, the building corner causes a major occlusion of the window. Note that the white signs on the window are visible in the right image and are occluded in the left image. The occluding edge changes from light grey to black in the left image while switching from grey to bright white in the right image. Also note the black wire dangling close to the corner. This is the real world after all. The robot still sees a large region of free space and heads towards the water fountain. Finally, Mobi passes the pillar, gets close to the end of the hallway, sees little and, seeing the reflections of the lobby, heads straight toward the window. Fortunately for us, Mobi sees the edges of the paper signs. Mobi rotates again to avoid bumping into the window.

The excursion continued, making similar turns at the next window. Sadly, the run ended because of a communication failure between the lisp machine and the robot, permitting the researchers and onlookers to retire for the evening after celebrating over champagne.

8. Conclusion

So, Mobi can successfully navigate through the inside of a building under automatic visual control, while creating its own symbolic model of the building structure. By choosing a model that can be simply instantiated with the detection of vertical edges in the world, processing time has been greatly reduced.

Acknowledgements

We'd like to thank Tom Binford, Soon Yao Kong, Ron Fearing, Giora Gorali, Shaul Fishman, Leonie Dreschler-Fischer, Rami Rise and Rami Rubenstein for all their help throughout this work.

References

1. Triendl, Ernst; Kriegman, David J.; Binford, Tom, A Mobile Robot: Sensing, Planning and Locomotion, Proc. IEEE Int. Conf. Robotics & Automation, 1987.
2. Triendl, Ernst; Kriegman, David J., Vision and Visual Exploration for the Stanford Mobile Robot, Proc. Image Understanding Workshop, 1987.
3. de Saint Vincent, A.R., A 3D Perception System for the Mobile Robot Hilare, Proc. IEEE Int. Conf. Robotics & Automation, 1986.
4. Tsuji, S., et al., Stereo Vision for a Mobile Robot: World Constraints for Image Matching and Interpretation, Proc. IEEE Int. Conf. Robotics & Automation, 1986.
5. Waxman, A.M., et al., A Visual Navigation System, Proc. IEEE Int. Conf. Robotics & Automation, 1986.
6. Brooks, Rodney A., Visual Map Making for a Mobile Robot, Proc. IEEE Int. Conf. Robotics and Automation, 1985.
7. Chatila, Raja; Laumond, Jean-Paul, Position Referencing and Consistent World Modeling for Mobile Robots, Proc. IEEE Int. Conf. Robotics & Automation, 1985.
8. Moravec, H.P., The Stanford Cart and the CMU Rover, Proc. IEEE, vol. 71, no. 7, July 1983.
9. Baker, H., Depth from Edge and Intensity Based Stereo, AIM-347, Stanford University, 1982.
10. Brooks, R.A., Symbolic Reasoning among 3-D Models and 2-D Images, Ph.D. dissertation, Stanford University, 1981.
11. Triendl, E., Modellierung von Kanten bei unregelmäßiger Rasterung, in Bildverarbeitung und Mustererkennung, E. Triendl (ed.), Springer, Berlin, 1978.
12. —, How to Get the Edge into the Map, Proc. 4th International Conference on Pattern Recognition, Kyoto, 1978.
