November 2010
e-mail: daniel.clarke1@uqconnect.edu.au
November 4, 2010
The Dean
School of Engineering
University of Queensland
St Lucia, Q 4072
I declare that the work submitted in this thesis is my own, except as acknowledged in the text and footnotes, and has not been previously submitted for a degree at the University of Queensland or any other institution.
Yours sincerely,
Daniel Clarke
Acknowledgments
Mum
For everything you’ve done to get me here.
Dr D. Ball
For the constant and copious amounts of contact time and guidance.
S. Heath
For the priceless help and friendship throughout the year.
W. Maddern
For the general advice and last minute proof reading of this document.
N. Ali
For proof reading this, going above and beyond in team projects and for the cookies.
Abstract
The i-Rat currently uses three infra-red proximity sensors to avoid obstacles during navigation. Due to the lack of information received from these sensors, the i-Rat is constrained to operate in tightly structured environments.

The aim of this thesis is to use visual information to detect and avoid obstacles in a laboratory environment using the i-Rat mobile robot platform and an off-board processor. The high bandwidth of information received from the visual sensor will allow the i-Rat to better perceive its environment and liberate it from such tightly structured environments.
Contents

Acknowledgments
Abstract
List of Figures
1 Introduction
1.1 Motivation
1.2 Scope
1.3 Chapter Outlines
2 Background
2.1 Vision Based Obstacle Avoidance
2.1.1 Stereo Vision based Obstacle Avoidance
2.1.2 Issues with Stereo for the i-Rat
2.1.3 Monocular Vision based Obstacle Avoidance
2.2 Lenses
2.2.1 Geometrical Optics
2.2.2 Focal Length
2.2.3 Field of View
2.2.4 Lens Distortion
2.3 Cameras
2.3.1 Camera Calibration
2.3.2 Depth of Field
2.3.3 Aperture
2.3.4 Shutter Speed
2.4 Stereo Vision
2.4.1 Stereopsis
2.4.2 Stereo Geometry
2.4.3 Stereo Camera Calibration
2.5 Optical Flow
3 System Requirements
4 Camera Calibration
4.1 Caltech Calibration Toolbox
4.1.1 Caltech Calibration Toolbox Results
4.2 Direct Mapping
4.2.1 Direct Mapping Results
4.3 Discussion
5 Colour Segmentation
5.1 Colour Characterisation
5.2 Discussion
6 Edge Detection
7 Time Derivative
8 Method Combination
9 Evaluation
9.1 Ranging
9.2 Obstacle Detection
9.3 Obstacle Avoidance
10 Conclusion
11 Future Work
References
List of Figures
Chapter 1
Introduction
The field of vision based obstacle avoidance can be segmented into many different categories: machine learning, stereo vision, motion based perception such as optical flow, and image segmentation methods using colour or edge information. Some robot platforms are unable to support sensors such as laser scanners due to size and/or cost constraints, whereas visual sensors are generally both small and cheap [7]. Other sensors, such as proximity sensors, are not suited to the task of navigation and can only be used successfully in a strictly structured environment [11]. Environmental cues such as colour, lighting conditions and movement are more readily perceived through visual information than through the use of other specialised sensors. Due to the benefits of visual sensors, considerable efforts in the progression of vision based robot platforms have been and are being made [16][10].
1.1 Motivation
The i-Rat mobile robot platform, under development as part of the Thinking Systems research project, is used for research in the field of navigation. It currently uses three infra-red proximity sensors to detect obstacles. The lack of information provided by the sensor array restricts the i-Rat to operation in carefully structured environments.
It is planned that the i-Rat will be able to freely navigate unstructured indoor
environments. For this to become a reality it must use its visual sensor for greater
perception of the environment. The purpose of this thesis is to provide the i-Rat
with the ability to freely explore a minimally structured indoor environment using
machine vision.
1.2 Scope
This thesis will develop an obstacle avoidance system for a mobile robot platform, the i-Rat, purely using the video stream captured from the onboard wide angle camera. As this system is required to provide the obstacle avoidance, it is restricted to image processing and computer vision techniques that can be performed in real time on a live video stream. The obstacle avoidance system will only need to avoid obstacles that are within view of the i-Rat. Obstacles out of view of the i-Rat will have to be avoided using a system not presented in this thesis, such as haptic feedback, or by mapping the obstacles in the environment. The system will be designed for use indoors, where the ground is assumed to be flat and the motions of the robot are restricted to translations parallel and rotations perpendicular to the floor. The obstacle avoidance system is not expected to use machine learning based methods. Design alterations of the i-Rat platform are considered beyond the scope of this thesis. The system will provide a reactive ability to avoid obstacles and is not required to map the environment or perform localisation. The vision based obstacle avoidance system should allow the i-Rat to operate in larger, less structured environments than currently permitted.
Chapter 2
Background
Figure 2.1: Depth Map from Stereo Images (reproduced from [9])
Murray and Little use a trinocular camera module, shown in figure 2.2, to extract
depth information from a scene. They used three cameras in their vision module
to remove ambiguity in the stereo matching. A disparity map is produced using
the three images. Although the camera module captures scene information in three
dimensions, Murray and Little use this information to produce a 2D distance map.
The maximum disparity in each column of the disparity map is extracted and used
to produce a single row of distance measures. They justify this decision by arguing
that indoor robots essentially operate in a 2D top down environment.
The trinocular module is too large for use on the i-Rat. There are other stereo vision modules commercially available, such as Point Grey's BumbleBee [4] or Surveyor's SVS [2], which both use twin parallel cameras. However, these are also too large to fit on the i-Rat.
Elinas, Sim and Little have approached the simultaneous localisation and mapping (SLAM) problem [39] using a Rao-Blackwellised particle filter [17] and a stereo camera [18]. Their method uses the images captured from the stereo camera as the system's only input. They construct dense metric maps of 3D landmarks using the Scale Invariant Feature Transform (SIFT) [32]. The algorithm checks the ratio of old landmarks to total landmarks in the scene. If the ratio is greater than 30%, the pose of the robot is estimated using a weighted sum of a motion model and a second distribution that depends on the latest observation and the current learned map; otherwise the pose is estimated using a vision based motion model.
Figure 2.3 shows the processing time of the algorithm versus the frame number. The algorithm takes between one and two seconds to compute per frame on average. While the algorithm achieves results similar to models that use laser and odometric sensors, the computation time per frame is simply too long for use in real time. The computation could be sped up by processing smaller images; however, this would reduce the image quality, resulting in poorer distance estimation and obstacle detection.
Edge detection, image segmentation and visual sonar approaches commonly stem
from the same principle of assuming the ground to be a flat plane with obstacles
rising vertically from it [45]. From this assumption, the distances to obstacles can
be estimated using camera calibration equations or a look up table.
Optical flow methods use the flow of an image sequence to avoid obstacles. Prominent features, such as corners or boundaries between objects, produce regions of greater optical flow, as do nearby obstacles. During navigation, the robot attempts to travel towards areas of lower optical flow, and so areas containing obstacles are avoided.
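The steering rule described above can be sketched as a simple left/right comparison of flow magnitude. The helper and gain below are illustrative, not taken from any of the cited systems:

```python
import numpy as np

def steer_from_flow(flow_mag, gain=1.0):
    """Compare mean optical flow magnitude in the left and right halves
    of the visual field and steer toward the side with less flow.
    Returns a rotational velocity: positive = turn right."""
    h, w = flow_mag.shape
    left = flow_mag[:, :w // 2].mean()
    right = flow_mag[:, w // 2:].mean()
    # more flow on the left implies closer obstacles there, so turn right
    return gain * (left - right)
```

A uniform flow field yields zero rotation, so the robot only turns when one half of the view flows faster than the other.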
door environments, the process of learning using a SICK laser scanner is financially
impractical for the budget of this project.
obstacle boundary to proportionally set forward velocity and the difference between
the left and right averages to set rotational velocity.
The histogram method used by Lorigo et al. allows the robot to navigate over changing terrain. As it does not require knowledge of what the ground looks like, it can be deployed on any coloured surface and is immediately ready to navigate autonomously.
A problem with this method is encountered if the robot approaches an obstacle so
closely that it covers the entire visual sensor. If this scenario were to occur, the
robot would use the bottom of the obstacle to compare to the rest of the image
and interpret the environment as obstacle free, driving into the obstacle under the
impression that it is part of the ground.
2.2 Lenses
2.2.1 Geometrical Optics
Geometrical optics is a sufficient approximation of light for the optical analysis in this project. Under the approximation of geometrical optics, light obeys three laws:

1. Rectilinear propagation
2. Refraction
3. Reflection
−x = f X/Z (2.1)
2.2. LENSES 9
α = 2 arctan(d/2f) (2.2)
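As a quick numerical check of the field of view relation, the sensor dimension and focal length values below are purely illustrative (they are not the i-Rat's camera parameters):

```python
import math

def field_of_view(d, f):
    """Angular field of view (radians) for a sensor of dimension d
    behind a lens of focal length f: alpha = 2 * arctan(d / 2f)."""
    return 2 * math.atan(d / (2 * f))
```

For example, a sensor dimension equal to twice the focal length gives a 90 degree field of view.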
Figure 2.6: Extreme Distortion by a Fish Eye Lens (reproduced from [40])
Extreme radial distortion can be modelled using a third order Taylor series expansion. Letting r² = x² + y², where the k terms are the radial distortion parameters of the camera, the radial location of a point on the image sensor can be corrected using the equations [12]:

x_corrected = x(1 + k1 r² + k2 r⁴ + k3 r⁶) (2.3)
y_corrected = y(1 + k1 r² + k2 r⁴ + k3 r⁶) (2.4)

Tangential distortion is caused by the lens not being parallel to the image sensor. Tangential distortion can be corrected using the equations [12], where the p terms are the tangential distortion parameters:

x_corrected = x + [2 p1 x y + p2 (r² + 2x²)] (2.5)
y_corrected = y + [p1 (r² + 2y²) + 2 p2 x y] (2.6)
2.3 Cameras
2.3.1 Camera Calibration
It is often desirable to counteract the effects of lens distortion when working with machine vision [37]. Equations 2.3, 2.4, 2.5 and 2.6 can be used with the camera distortion parameters to correct lens distortion, provided the distortion is not too severe.
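The standard radial and tangential corrections translate directly to numpy. This is a sketch of the usual distortion model with coefficients k1–k3 (radial) and p1, p2 (tangential), not the thesis's own code:

```python
import numpy as np

def correct_distortion(x, y, k1, k2, k3, p1, p2):
    """Apply radial and tangential distortion corrections to
    normalised image coordinates (x, y)."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_c = x * radial + (2 * p1 * x * y + p2 * (r2 + 2 * x * x))
    y_c = y * radial + (p1 * (r2 + 2 * y * y) + 2 * p2 * x * y)
    return x_c, y_c
```

With all coefficients set to zero the mapping is the identity, which is a useful sanity check when tuning parameters.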
2.3.2 Depth of Field

An image with a shallow depth of field has only a small portion of the image in focus. An image with a deep depth of field has a large portion,
often the entire image, in focus. In robotics, a deep depth of field is desirable as
more information can be perceived from the environment and the effects of motion
blur are reduced [13].
2.3.3 Aperture

The aperture of a camera describes the size of the opening of the lens diaphragm, usually quoted as an f-number. Adjusting the aperture changes the rate at which light can enter the image sensor. As the f-number decreases, the opening of the diaphragm increases, allowing more light to reach the image sensor in a given duration of time; such a small aperture value creates a narrow depth of field. Conversely, a large f-number reduces the size of the diaphragm, restricting the amount of light that can reach the image sensor in a given duration, and gives a wide depth of field.
2.4 Stereo Vision

2.4.1 Stereopsis
Stereopsis is the act of perceiving depth, or three dimensions, using two or more two dimensional images. This ability to perceive depth from multiple images is used in robot control [22][25] and navigation [29][8], as its versatility enables the robot to navigate in complex environments. The first step of stereopsis is to remove the radial and tangential lens distortions. The images are then rectified by accounting for the difference in angle and position between the two cameras; after rectification the rows of the two images are aligned. Identical features in the two images are identified and a disparity map is produced by taking the differences in position between the same features in the two images. The disparity map is then used to determine distances using triangulation [12].
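The disparity step can be illustrated with a toy block matcher. This brute-force sum-of-absolute-differences (SAD) search along rectified rows is a teaching sketch, not the correlation method used by any of the cited systems:

```python
import numpy as np

def sad_disparity(left, right, max_disp=8, win=3):
    """For each pixel in the left image, find the horizontal shift
    (disparity) whose window in the right image has the smallest
    sum of absolute differences."""
    h, w = left.shape
    pad = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    L = np.pad(left.astype(np.float64), pad, mode='edge')
    R = np.pad(right.astype(np.float64), pad, mode='edge')
    for y in range(h):
        for x in range(w):
            lw = L[y:y + win, x:x + win]
            best, best_d = np.inf, 0
            # search leftwards in the right image, up to max_disp pixels
            for d in range(min(max_disp, x) + 1):
                sad = np.abs(lw - R[y:y + win, x - d:x - d + win]).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[y, x] = best_d
    return disp
```

Real stereo modules replace the inner loops with highly optimised correlation, but the geometry is the same: larger disparity means a closer point.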
pl · [T × (R pr)] = 0 (2.7)

where pl and pr are the homogeneous image coordinate vectors, T is the translation vector from the optic centre of one camera to the other (Ol to Or) and R is the rotation matrix used to describe a vector from one reference frame in the other reference frame.
The equation of a plane through the point a with normal vector n is:

(x − a) · n = 0 (2.8)
The epipolar plane contains the vectors Pl and T. Taking the cross product of the vectors Pl and T results in a vector normal to the epipolar plane. Equation 2.8 can then be used to describe all the points Pl through the point T:

(Pl − T)^T (T × Pl) = 0 (2.9)
The right projection is obtained by transforming the left projection. The trans-
formation is achieved by:
Pr = R(Pl − T ) (2.10)
This transformation allows the points on the right projection plane to be intro-
duced into the equation:
(R^-1 Pr)^T (T × Pl) = 0 (2.11)
Given that:

R^T = R^-1 (2.12)

equation 2.11 can be written as:

(R^T Pr)^T (T × Pl) = 0 (2.13)
The cross product T × Pl can be written as a matrix multiplication S Pl, where S is the skew-symmetric matrix formed from the translation vector:

S = [  0   −Tz   Ty
      Tz    0   −Tx
     −Ty   Tx    0  ] (2.14)

Rewriting equation 2.13 using equation 2.14:

Pr^T R S Pl = 0 (2.15)

Equation 2.15 also produces the essential matrix E, which contains the translation and rotation parameters that relate the two cameras in physical space:

RS = E (2.16)

The relation between the left and right projection planes is then given by:

Pr^T E Pl = 0 (2.17)
The fundamental matrix F is similar to the essential matrix E except that it also contains the intrinsic properties of the cameras. Equation 2.17 relates the physical points of one projection plane to the other; what is in fact needed is the relation between the pixels of the two cameras. Using the relation between a physical point and a pixel, p = M^-1 q, where M is the camera intrinsics matrix, equation 2.17 can be altered to relate the pixels of the two cameras:

qr^T F ql = 0 (2.21)

which relates the pixels from one camera to the pixels in the other camera.
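The epipolar derivation can be verified numerically. The rig parameters below (baseline along x, small rotation about y, and a sample world point) are invented purely for illustration:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix S such that S @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

# hypothetical stereo rig: right camera translated along x, rotated about y
theta = 0.05
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
T = np.array([0.12, 0.0, 0.0])

E = R @ skew(T)        # essential matrix E = R S

# a point in left-camera coordinates and its right-camera counterpart
Pl = np.array([0.3, -0.1, 2.0])
Pr = R @ (Pl - T)      # the transformation Pr = R(Pl - T)

residual = Pr @ E @ Pl  # vanishes by the epipolar constraint
```

The residual is zero up to floating point noise for any choice of R, T and Pl, which is a handy check when implementing stereo calibration.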
OF = −ω + (v/d) cos θ (2.23)
Obstacles in line with the direction of motion will not elicit optical flow but will instead expand as the reference frame moves closer to the obstacle. As the reference frame gets closer, the rate of expansion increases. Optical flow can be used to detect obstacles and provide heading and velocity information [3]. Obstacles can be detected by the rate of flow of the visual field: close obstacles will flow faster than objects that are further away. The translational and rotational velocity of the reference frame can be estimated by taking an average of the optical flow with close objects excluded, as they would otherwise bias the result heavily and perhaps incorrectly [7].
2.8 i-Rat

The "i-Rat" (intelligent Rat animat technology), shown in Figure 2.9, is a biologically inspired robot under development as part of research undertaken by Thinking Systems at the University of Queensland. The intended purpose of the i-Rat is for use in navigation research. Currently, the i-Rat uses a monocular camera for landmark recognition and Sharp infra-red sensors for obstacle avoidance.
From the perspective of this thesis, the i-Rat has some limiting factors. The i-Rat cannot see the floor within a 100 mm radius of the camera. This can pose collision hazards in circumstances where the i-Rat approaches an object and is unable to see it. The video stream received from the i-Rat is currently streamed at 5 Hz and can experience latency of approximately an entire frame on average. This may pose a collision hazard, as obstacles perceived by the i-Rat would move large distances between frames, forcing the i-Rat to operate at lower speeds.
Chapter 3
System Requirements
The vision based obstacle avoidance system needs to operate in real time using only image data. The images from the i-Rat are sent over a wireless connection to an off-board processor, which performs the obstacle detection and avoidance and then sends commands back to the i-Rat. The i-Rat will use the system to wander the laboratory, so the system will need to operate in real time on a video stream of at least 5 Hz with a resolution of 424 × 240. The system needs to incorporate a depth measure to give obstacle distance feedback using the image data. The system will need to operate in an indoor environment cluttered with generic objects such as chairs, desks, cables, boxes and other items often encountered in a laboratory. The i-Rat will use the obstacle avoidance system to wander the indoor environment for extended periods of time, so it must operate robustly and reliably within the environment without colliding.
Chapter 4
Camera Calibration
Two fundamental assumptions form the basis for extracting distance from a single image:

1. It is assumed that the camera is positioned at a fixed height and angle relative to the ground.

2. It is assumed that the ground is flat and that obstacles rise vertically from it, so that the lowest visible point of an obstacle lies on the ground plane.

Each pixel in the image plane can be mapped to a fixed displacement on the ground plane due to assumption 1. The distance to an obstacle is extracted from the image based on the lowermost pixels occupied by the image of the obstacle. Assumption 2 states that this is where the obstacle is in contact with the floor and is thus the minimum distance to the obstacle. Figure 4.1b shows the distance in metric space to the obstacles in Figure 4.1a using this principle.
Calibration of the camera is required to develop the mapping from pixel space
to metric space. Two methods for calibrating the camera were pursued, calibration
using the Caltech calibration toolbox [5] and a direct mapping method.
Figure 4.2: Caltech Calibration Toolbox Corner Extraction (reproduced from [5])
The distortion parameters for each image are estimated by the toolbox. If the image distortion is too severe, the distortion parameters must be manually tuned for the image, as the toolbox will compute an incorrect estimate. After the corner extraction stage has been completed, the toolbox calculates the camera distortion parameters using a least squares optimisation on the data collected from the corners extracted from all the images. The resultant model can be used to map image pixels to normalised image pixels, which are in turn related to three dimensional space by the equation:
(u, v) = (x/z, y/z) (4.1)
where u and v are the horizontal and vertical positions of the pixel and x, y, and z are respectively the lateral distance, height and depth in real space. Under the two assumptions, the heights of obstacles remain unknown, but the height of the floor is known and remains constant. The height of the floor, y, is fixed to the negative of the height of the camera's centre axis above the floor. Because image boundaries between the floor and obstacles occur on the floor, the height of the base of an obstacle is also equal to the negative of the height of the camera's centre axis. The depth of an obstacle can be found by rearranging part of equation 4.1 as such:
z = y/v (4.2)

which is then used to calculate the lateral distance:

x = u · z (4.3)
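In code, mapping a floor-boundary pixel to metric coordinates is a two-liner. The coordinates here are normalised image coordinates, and the camera height value in the usage example is illustrative:

```python
def pixel_to_ground(u, v, camera_height):
    """Map a normalised floor-boundary pixel (u, v) to lateral distance x
    and depth z using equations 4.2 and 4.3. v is negative for pixels
    below the image centre, and the floor sits at y = -camera_height."""
    y = -camera_height
    z = y / v   # equation 4.2
    x = u * z   # equation 4.3
    return x, z
```

Since both y and v are negative for floor pixels, the recovered depth z is positive, as expected.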
Figure 4.4 shows a grid of points in real space. The red crosses are the actual
positions of the points in real space and the blue crosses are the calculated positions
of the points using the camera model produced via the toolbox. The distortion
model maps the points well for points close to the camera, but becomes steadily
worse as the points move away.
4.2 Direct Mapping

The theory behind this method is that, since the size of the squares and the position of the i-Rat relative to them are known, the corresponding distances of pixels that contain corners are also known.
A MATLAB m-file was written to automate the corner extraction process. Candidate corners were found using Harris corner detection [23]. Of these, the local candidates most likely to be corners were extracted using non-maximal suppression [35]. The corners extracted using this method are shown in figure 4.6. The method worked well, extracting 70% of the corners in the image. As visible in figure 4.6, some actual corners were not picked up by the method, and 12% of the points were false positives, usually due to a double detection of one corner.
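The non-maximal suppression step keeps only responses that dominate their neighbourhood. A minimal sketch follows; the window radius and threshold are arbitrary choices here, not values from the thesis:

```python
import numpy as np

def non_max_suppress(response, radius=1, thresh=0.0):
    """Return (row, col) locations whose corner response exceeds the
    threshold and is the maximum within its (2*radius+1)^2 window."""
    h, w = response.shape
    peaks = []
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            window = response[y - radius:y + radius + 1,
                              x - radius:x + radius + 1]
            if response[y, x] > thresh and response[y, x] == window.max():
                peaks.append((y, x))
    return peaks
```

Applied to a Harris response map, this collapses each blob of high response to a single corner location, which is exactly what reduces double detections of one corner.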
(Figure: extracted grid points plotted in metric space, lateral distance (m) versus depth (m))
Manual corner extraction, while tedious and time consuming, has been shown to produce good results when compared to automatic corner extraction methods [24]. It was argued that manual corner extraction on the image in Figure 4.6 would consume less time in the short term than perfecting the automatic process. Figure 4.6 was used for manual corner extraction as many of the corners had already been found by the automatic corner extraction script, with the remaining corners easier to detect when compared to the pre-processed image in figure 4.5. The pixels that contained corners were entered into a look up table along with the metric distances to which they related.
The resolution of the image was a limiting factor for the construction of the look up table. On forming the look up table using this method, a second grid was produced with 80 mm squares, as the resolution of the camera made extraction of the original grid corners impossible for distances exceeding 500 mm. The possible extraction distance using the larger squares increased to 800 mm from the camera.
4.3 Discussion
The look up table produced by the direct mapping method is far more accurate than
the toolbox method discussed in section 4.1. The accuracy of the direct mapping
method is the result of setting each pixel to the actual distance that it corresponds
to instead of performing a calculation based on a model. However, on comparing
figures 4.3 and 4.7, while the direct mapping method is more accurate, the surface
is far less smooth. This is most likely due to the sparsity of the data points and the
complexity of the pixel to distance geometry. The look up table generated by the
direct mapping method was deemed superior to the toolbox method and is the one
in use for pixel to distance transformations.
Chapter 5
Colour Segmentation
The aim of the colour segmentation method was to provide the i-Rat with the ability
to distinguish the floor from obstacles using colour information. The approach of
the colour method was to build a colour model (mean and variance values) of the
floor using images captured from the i-Rat. Then, during operation of the i-Rat,
each pixel in the image would be compared to the model using a likeliness measure:
D = Σ_{i=0}^{n} (p_i − m_i)² / (2v_i² + 1) (5.1)
where n is the size of the colour space, for example n = 3 for an RGB colour space. The terms m_i and v_i are the mean and variance values for the ith component of the colour space and p_i is the ith component of the current pixel p. D can be thought of as a difference measure: as the value of D increases, the likeness of the pixel p to the floor model decreases.
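Equation 5.1 translates directly to numpy; the sample pixel and model values in the usage below are invented for illustration:

```python
import numpy as np

def floorness(pixel, mean, var):
    """Difference measure D of equation 5.1: the larger D, the less the
    pixel resembles the floor colour model."""
    p = np.asarray(pixel, dtype=float)
    m = np.asarray(mean, dtype=float)
    v = np.asarray(var, dtype=float)
    return float(np.sum((p - m) ** 2 / (2 * v ** 2 + 1)))
```

A pixel identical to the model mean scores exactly zero, and the +1 in the denominator keeps the measure finite even for zero-variance components.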
A data set of over 1500 images of the floor was collated to build a mean and variance colour model of the floor. The images, an example of which is shown in figure 5.2, were captured so that the lower half of the image contained only floor. Two colour spaces, RGB and HSV, were used to build independent colour models of the floor. It was hypothesised that the HS model, which only used the hue and saturation values, would perform better than the RGB colour space under varying lighting conditions as it did not consider the V(alue) term, which holds the lightness information. A single mean and variance value was produced for each term in the colour spaces by computing the mean and variance of the captured image data set. A study was performed with the aim of characterising how well each colour space differentiated between floor and non-floor colours.
5.1 Colour Characterisation
Four colour maps were produced in MATLAB, the first three in red-green, red-blue and green-blue space and the fourth in hue-saturation space. These colour maps were used to characterise the colours that the colour segmentation process interpreted as floor and those that it interpreted as not floor. The colours interpreted as not floor were considered to belong to obstacles. A "floorness" value was calculated using equation 5.1 for each pixel in the colour maps. The floorness values were interpreted as floor or not floor using a threshold.
Figure 5.1: (a) Red Green Floor Model (b) Red Blue Floor Model (c) Green Blue Floor Model
Figure 5.1 shows the colours interpreted as floor in white for the three RGB
colour maps. A sample image of the environment, figure 5.2, is overlaid on the three
colour maps. The image is visualised by the red points which represent floor pixels
and the blue points which represent not floor pixels.
Figure 5.3: (a) Red Value Mean (b) Green Value Mean (c) Blue Value Mean

Figure 5.4: (a) Red Value Variance (b) Green Value Variance (c) Blue Value Variance

The red, green and blue mean and variance plots of the floor, shown in figures 5.3 and 5.4, were analysed. Importantly, figure 5.3 indicates that the pixel intensity in the centre of the image differs significantly from the intensity at the edges: up to a 16% difference in mean value is present. Based on this observation, the per-pixel mean and variance plots in figures 5.3 and 5.4 replaced the single mean and variance values previously used as the floor model.
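Building the per-pixel model amounts to taking a mean and variance over the image stack. A sketch, assuming the floor images are collected into one numpy array:

```python
import numpy as np

def build_floor_model(images):
    """Per-pixel mean and variance of a stack of floor images
    (shape: n_images x height x width x channels)."""
    stack = np.asarray(images, dtype=float)
    return stack.mean(axis=0), stack.var(axis=0)
```

Each pixel then carries its own model, so the vignetting-like intensity falloff toward the image edges no longer inflates the variance of a single global model.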
Figures 5.5 and 5.6b show the colours interpreted as floor using the two dimensional floor model. The result of using the two dimensional floor model is approximately a 28% reduction in the range of colours interpreted as floor in RGB and HS colour space compared to the single pixel models shown in figures 5.1 and 5.6a.

Figure 5.5: (a) 2D Red Green Floor Model (b) 2D Red Blue Floor Model (c) 2D Green Blue Floor Model

Figure 5.6: (a) Hue Saturation Floor Model (b) 2D Hue Saturation Floor Model
Figure 5.7b shows an image that has been processed using the colour segmentation method described in RGB colour space. Pixels of greater intensity represent parts of the image more likely to be an obstacle. Note that the floor is coloured blue whilst the maze wall is orange, showing that it is clearly detected as an obstacle.
5.2 Discussion
The HS model could not distinguish between the floor and varying levels of grey, from black to white. This inability to detect black or white obstacles was a crucial flaw in the model, as many of the objects in the laboratory, such as table and chair legs and the plastic skirting of the walls, are black. Another problem with the HS model stemmed from the nonlinear conversion from RGB to HSV space. The converted images, an example of which is shown in figure 5.8, contained large patches of discontinuities, such as the vast areas of pink on blue, and also lost large amounts of information during the conversion process. Also worth noting from figure 5.6 is the colour space that the HS method interprets as floor: all colours with a saturation intensity of less than around 30% are considered to belong to the floor. It is likely that this will cause obstacle detection problems for any object that is not brightly coloured.
While the RGB model lacked the immunity to lighting conditions that the HS model would have granted, it was able to distinguish between varying levels of grey, including black and white. Due to the poor performance of the HS model, the RGB model was selected for use in the colour segmentation method.
Chapter 6
Edge Detection
The aim of the edge detection method was to find the lowermost edges in an image and, under the assumption that the pixels between the bottom of the image and each edge are obstacle free, use these edges to find the distances to the closest obstacles.
Figure 6.1 shows the general process of detecting the obstacles in an image using
Canny edge detection. Each image is first filtered eight times using a 3 × 3 Gaussian
blur. The image edges are detected using OpenCV’s Canny edge detection, shown
in figure 6.1b. Each column of the image is run length encoded, then all pixels
above the lowermost pixel with a value of 1 are also set to 1. An example of the
final image is shown in figure 6.1c. A clear distinction has been formed between
the floor and the obstacles in the image. The pixels in figure 6.1c that occupy the
edge between the black and white pixels are used to determine the distances to the
obstacles in the image using the look up table described in chapter 4.
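The per-column filling step can be sketched in numpy. Here the run length encoding described above is replaced by a direct scan for the lowermost edge pixel, which produces the same final image:

```python
import numpy as np

def fill_above_lowest_edge(edges):
    """Given a binary edge map (1 = edge pixel), set every pixel from the
    top of each column down to that column's lowermost edge to 1. Row
    indices grow downwards, so the lowermost edge has the largest index."""
    out = np.zeros_like(edges)
    for x in range(edges.shape[1]):
        rows = np.flatnonzero(edges[:, x])
        if rows.size:
            out[:rows.max() + 1, x] = 1
    return out
```

The boundary between the filled and unfilled regions of the output is then read off column by column and passed through the pixel-to-distance look up table.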
The edge detection method fails to detect obstacles when the i-Rat comes so close to an object that its base is within the i-Rat's blind spot. Figure 6.2 shows how the edge detection method fails to detect the wall in the bottom left of the image. The wall is interpreted as floor because the first edge encountered is actually the top edge of the wall. This weakness could be counteracted by tracking obstacles from image to image, although this is not foolproof either and is computationally expensive. Not allowing the i-Rat to come so close to an obstacle is another solution; however, avoiding such situations is almost impossible. Consider, for example, the i-Rat rotating on the spot: there could be an unseen obstacle within the tolerance range of the edge detection method that would not be seen until the i-Rat rotated to bring it into view, thus breaking the requirement that such a situation should not occur.
The ideal solution to this problem is to reposition the camera on the i-Rat so that it can view its own outline. This should allow the i-Rat to view the floor at all times, which would prevent the failure case from occurring. The i-Rat is currently undergoing redesigns which will move the camera vertically down, closer to the ground. This redesign will reduce the size of the i-Rat's blind spot.
Chapter 7
Time Derivative
The aim of the time derivative method was to account for the motion of objects
relative to the reference frame. The underlying theory of this method is similar to
the use of optical flow for obstacle avoidance, explained in section 2.5. The time
derivative of the image is calculated by subtracting the previous intensity image
frame from the current intensity image frame [36] as shown in figure 7.1.
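As a minimal sketch, this reduces to a per-pixel frame difference. Taking the absolute value is an assumption here, since only the magnitude of intensity change matters for obstacle detection:

```python
import numpy as np

def time_derivative(prev_frame, curr_frame):
    """Approximate dI/dt by subtracting consecutive intensity frames."""
    # cast to a signed type so the uint8 subtraction cannot wrap around
    diff = curr_frame.astype(np.int16) - prev_frame.astype(np.int16)
    return np.abs(diff).astype(np.uint8)
```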
Although optical flow is often used in robot navigation, the time derivative
method is better suited for the obstacle avoidance method presented in this the-
sis. First and foremost, the time derivative can be directly combined with the other
two obstacle detection methods that form the complete system. The time derivative
method does not need to track features between images, so it does not spend time
searching through the images for matching features, making it
computationally less expensive than optical flow. Furthermore, the low frame rate of
the video stream leads to large pixel displacements from one frame to the next. When
the i-Rat rotates, images can experience pixel displacements in excess of 40 pixels.
This makes tracking features using optical flow more difficult and time consuming,
as optical flow works best for feature displacements of one or a few pixels
between frames [6].
Chapter 8
Method Combination
The obstacle images produced by the three detection methods are combined as a
weighted sum:

I = αC + βE + γD (8.1)
where I is the obstacle intensity image, C is the image produced using colour
segmentation, E is the image produced using edge detection and D is the image
time derivative. α, β and γ are constants used to weight the three methods. Figure
8.1 shows the result of running the obstacle detection method on an image captured
in the laboratory.
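Equation 8.1 amounts to a per-pixel weighted sum. A sketch is given below; the weights are placeholders, as the tuned values of α, β and γ are not fixed here:

```python
import numpy as np

ALPHA, BETA, GAMMA = 0.4, 0.4, 0.2   # illustrative weights, not the tuned values

def combine_obstacle_images(C, E, D):
    """Weighted per-pixel combination of the three obstacle images (eq. 8.1)."""
    I = (ALPHA * C.astype(np.float32)
         + BETA * E.astype(np.float32)
         + GAMMA * D.astype(np.float32))
    # keep the result in the usual 8-bit intensity range
    return np.clip(I, 0, 255).astype(np.uint8)
```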
A distance map and angle map are created using the lateral distance and depth
maps explained in Chapter 4. The distance map is produced by taking the magni-
tude of each element in the look up tables:
Dij = √(Xij² + Zij²) (8.2)
and the angle map by taking the arctan of the ratio of the depth map and lateral
distance map:
θij = arctan(Zij / Xij) (8.3)
where Dij is the distance corresponding to pixel ij, θij is the angle for the pixel
ij, Zij is the depth value for pixel ij and Xij is the lateral distance for the pixel ij.
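Equations 8.2 and 8.3 can be applied element-wise to the lookup tables. In this sketch, np.arctan2 stands in for the plain arctan of equation 8.3 to keep the bearing well defined when Xij is zero:

```python
import numpy as np

def distance_and_angle_maps(X, Z):
    """Per-pixel distance (eq. 8.2) and bearing (eq. 8.3) from the lookup tables."""
    D = np.hypot(X, Z)         # sqrt(X^2 + Z^2), element-wise
    theta = np.arctan2(Z, X)   # arctan(Z / X), safe when X is zero
    return D, theta
```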
Each pixel in the obstacle intensity image is divided by its corresponding cubed
distance to produce a resultant measure of virtual force magnitude [27]. These forces
are then virtually applied to the i-Rat at the orientations contained in θ for each
pixel. Figure 8.2 shows the forces in red virtually acting on the i-Rat.
Figure 8.3 shows how the obstacle avoidance system presented in this thesis
works.
The i-Rat constantly attempts to drive directly ahead at a fixed velocity. The
course that the i-Rat traverses is altered by the virtual forces that the obstacles
impart on it. As the i-Rat approaches an obstacle, the virtual force acting from the
obstacle cubically increases in magnitude with distance. If the i-Rat comes too close
to an obstacle it is forced to stop and rotate on the spot until a clear path has been
found. The i-Rat then resumes its forward motion.
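Under these definitions, the net virtual force can be sketched as a sum over pixels. Treating each force as a magnitude I/D³ resolved along the bearing θ is an assumed sign convention for this illustration:

```python
import numpy as np

def net_virtual_force(intensity, D, theta):
    """Sum the per-pixel forces: magnitude I / D^3, direction theta."""
    # clamp D away from zero to avoid dividing by zero at degenerate pixels
    mag = intensity.astype(np.float64) / np.maximum(D, 1e-6) ** 3
    fx = float(np.sum(mag * np.cos(theta)))   # component along the heading
    fy = float(np.sum(mag * np.sin(theta)))   # lateral component
    return fx, fy
```

The resulting (fx, fy) pair would then be used to steer the i-Rat away from cluttered regions.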
[Figure 8.3: block diagram of the obstacle avoidance system. The image frame is
processed into the obstacle intensity image, which is divided by D³ to give the force
magnitudes and passed through atan to give the force directions.]

Chapter 9

Evaluation
9.1 Ranging
The aim of this study was to characterise the ranging abilities of the colour segmen-
tation and edge detection methods over a range of obstacle angles and distances.
The distance that the i-Rat can sense directly affects the velocity at which it can
travel. The accuracy of the measured distance will affect how well the i-Rat can nav-
igate cluttered environments without collision. Perceived distances that are larger
than the actual distances to obstacles present collision hazards while perceived dis-
tances that are shorter increase the possibility of the i-Rat becoming unnecessarily
trapped in a cluttered environment.
Five orange balls were positioned equidistantly from the camera on the i-Rat at
angles −60°, −30°, 0°, 30° and 60°. The balls were repositioned in decrements
of 50mm, and three photos were captured at each position. The three photos
of each position were averaged to reduce any anomalies that may have occurred in
the images. The balls were initially placed at a maximum distance of 500mm and
ended at a minimum distance of 100mm. Figure 9.1 shows the ranging setup of the
i-Rat with the five orange balls positioned at 500mm.
Figures 9.2a and 9.3a are plots of the perceived distance to the five orange balls
versus the actual distances to each ball for colour segmentation and edge detection
respectively. The mean distance error averaged over the five balls, shown in figures
9.2b and 9.3b, is substantial, with an average error of 23% for colour segmentation
and 16% for edge detection. Multiple factors are likely to have contributed to this
error.
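The error figures above can be reproduced from the raw measurements with a simple mean absolute percentage error, sketched here as an illustration of the calculation rather than the exact analysis script:

```python
import numpy as np

def mean_percent_error(perceived, actual):
    """Mean absolute ranging error across the balls, as a percentage."""
    perceived = np.asarray(perceived, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(perceived - actual) / actual) * 100.0)
```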
[Figure 9.2: perceived distance versus actual distance (left) and percentage error
(right) for the colour segmentation method, for ball angles −60°, −30°, 0°, 30°
and 60°.]

The positions of the balls were measured to their centres; however, the segmentation
and edge detection processes picked up the shadows that the balls cast on the carpet
and considered the edge of each shadow as the closest obstacle. This discrepancy will
account for some of the error, but not a substantial fraction. A critical assumption
of the colour segmentation and edge detection methods is that the camera sits at a
fixed height and angle to the horizon. If the camera is moved, or the i-Rat sits at a
different angle, the distance measure is sure to be erroneous. The large error for the
edge detection method at 100mm can be attributed to the edge detection method
taking the top edge of the ball as the closest distance to an obstacle. This problem
has been discussed in chapter 6 with possible solutions provided.
[Figure 9.3: perceived distance versus actual distance (left) and percentage error
(right) for the edge detection method, for ball angles −60°, −30°, 0°, 30° and 60°.]
9.2 Obstacle Detection

The colour segmentation, edge detection and time derivative obstacle detection
methods were performed on these images. The results are shown in Figures 9.5, 9.6,
9.7, 9.8 and 9.9.
The edge detection method does not detect regions of the wall within the blind
spot as an obstacle (see Figure 9.6c).
The edge detection method is less susceptible to gradual changes in lighting con-
ditions when compared to the RGB method as it is essentially a spatial derivative
of the image as opposed to an absolute difference. Figure 9.6a highlights the edge
detection method’s performance in non-uniform lighting conditions. The high fre-
quency regions in Figure 9.5a, namely the desk legs, are accurately detected by the
method where as the soft shadow in the background is correctly identified as part
of the floor. It also correctly detects the wooden board in Figure 9.5d due to the
distinct edge between the floor and the obstacle. The edge detection method falsely
identifies objects as part of the floor if the boundary between the object and the floor
is significantly softened, or as in Figure 9.6c, if the obstacle is too close to the camera.
The time derivative method functions similarly to the edge detection method as
it too is a derivative based approach, however through time instead of space. The
method performs less well on images with uniformly similar colours and few edges,
as shown in Figure 9.7d. It also has a tendency to duplicate obstacles, generating
a duplicate image of the USB cable in Figure 9.5e. The time derivative method can detect
obstacles when they are within the i-Rat's blind spot, depending on the motion of
the i-Rat relative to the obstacle. Figure 9.7c shows the maze wall correctly detected
by the time derivative method, whereas in Figure 9.6c the edge detection method
failed to completely detect it.
Figure 9.9 shows the obstacles as detected by the combined system. The com-
bined method has performed better at detecting the entire sample set than any of
the methods individually. It correctly detects objects that are undetected by the
individual methods such as the wooden board in Figure 9.9d and the maze wall
in Figure 9.9c. While the method does perform less well on some obstacles when
compared to individual methods, such as the maze wall in Figure 9.9c compared to
the RGB segmentation method in Figure 9.5c, it has correctly detected all of the
obstacles in the sample set, a task no individual method has achieved.
A maze was populated with objects found in the laboratory. Blue and red objects
were excluded, as the overhead tracking uses the blue and red markers on the i-Rat
for tracking. A computer mouse, a cardboard box and a roll
of tape were positioned in the maze as shown in figure 9.10. The i-Rat was placed
in the maze with the combined obstacle avoidance system in operation. Overhead
tracking was set up to track the path of the i-Rat as it traversed the maze using the
obstacle avoidance system.
Figure 9.10 shows the path the i-Rat traversed in the maze over a period of 21
minutes. The i-Rat maintained an average velocity of 31mm/s with a maximum
velocity of 80mm/s. The first collision occurred at 7:13 minutes, when the i-Rat
rotated on the spot to avoid a maze corner and blindly crashed into the cardboard box.
The i-Rat has been allowed to wander freely around the laboratory for up to half
an hour at a time.
Chapter 10
Conclusion
A vision based obstacle avoidance system using only one camera has successfully
been implemented on the i-Rat mobile robot platform. The system comprises
four modules: colour segmentation, edge detection and time derivative modules for
obstacle detection, and a direct pixel to metric space mapping used to infer depth.
A virtual force field completes the system, taking the obstacle intensity image as
the force magnitudes scaled by the inverse cube of the distance corresponding to
each pixel.
The i-Rat requires human intervention in the following situations:

1. An obstacle that is raised from the floor with clearance lower than the height
of the i-Rat is encountered.

2. The i-Rat approaches an object in such a way that it remains out of sight.

3. Video latency causes the i-Rat to respond too late to an obstacle while moving
at speed.
Outside of these situations, the i-Rat rarely requires human intervention. Situ-
ation 1 can be overcome by removing all obstacles from the environment that fit
this description. Repositioning the camera on the i-Rat will reduce its blind spot,
thus reducing the occurrence of situation 2. Running the i-Rat at a slower velocity
reduces the effects of video latency; if run at a slow enough velocity, situation 3 can
be avoided altogether.
Chapter 11
Future Work
The most immediate area for improvement with the vision based obstacle avoid-
ance method presented in this thesis is autonomy. The camera calibration method
presented in Chapter 4 is time consuming to perform manually. If the method were
completely automated, calibration time would be significantly reduced. The pixel to
metric space maps could be drastically improved by moving the camera at small
parallel intervals with respect to the grid to map new pixels, increasing the amount
of data available to the interpolation function.
Development of the i-Rat so it can autonomously learn the floor model would
allow it to be deployed in an indoor environment without the need to pre-process
floor image data. Ideally, the i-Rat would be placed on the ground with a set radius
around it clear of obstacles and told to develop a floor model. The i-Rat would
start by rotating slowly about its axis to begin building a model of the floor using
the knowledge that a certain distance in the image is guaranteed to be part of the
floor. Using this rudimentary model, the i-Rat could then slowly explore its
surrounds whilst reinforcing its internal representation of the floor using the newly
captured image data. This process would continue until the i-Rat had developed a
sufficient model of the floor, or the process could be an ongoing one, which would
allow the i-Rat to slowly adapt to lighting changes throughout the day.
The use of different colour spaces should be further explored. Using a colour
space that is tolerant to lighting conditions would enable the i-Rat to more readily
navigate through areas of varying lighting conditions [45], such as under desks or
through patches of sunlight.
Finally, combining the system with haptic sensors such as whiskers would reduce
the occurrence of collisions. The whiskers could be used to sense outside of the
visual field to check for obstacles before moving into an area not visible to the
i-Rat. Rotating on the spot is an example of where this would be useful. Whiskers would
help with the autonomous learning of the floor previously described. As the i-Rat
builds its model of the floor, the whiskers would be used to provide reinforcement
that the region to learn from is free from obstacles.
References
[8] M. Bertozzi and A. Broggi. GOLD: A parallel real-time stereo vision system for
generic obstacle and lane detection. IEEE Transactions on Image Processing,
7(1):62–81, 1998.
[10] F. Bonin-Font, A. Ortiz, and G. Oliver. Visual navigation for mobile robots:
A survey. Journal of Intelligent and Robotic Systems, 53(3):263–296, 2008.
[12] Gary Bradski and Adrian Kaehler. Learning OpenCV. O’Reilly, 2008.
[13] P.I. Corke. Visual control of robot manipulators, a review. In Visual Servoing: Real-
Time Control of Robot Manipulators Based on Visual Sensory Feedback, 1993.
[15] K. Daniilidis and J.O. Eklundh. 3-D vision and recognition. In Springer Handbook
of Robotics. Springer, 2008.
[16] G.N. DeSouza and A.C. Kak. Vision for mobile robot navigation: A survey.
Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(2):237–
267, 2002.
[18] P. Elinas, R. Sim, and J.J. Little. σSLAM: stereo vision SLAM using
the Rao-Blackwellised particle filter and a novel mixture proposal distribution.
In Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE Inter-
national Conference on, pages 1564–1570. IEEE, 2006.
[19] D.A. Forsyth and J. Ponce. Computer vision: a modern approach. Prentice
Hall Professional Technical Reference, 2002.
[20] G. Gini and A. Marchi. Indoor robot navigation with single camera vision.
Pattern Recognition in Information Systems, 8(1):67–76, 2002.
[22] G.D. Hager, W.C. Chang, and AS Morse. Robot Hand-eye Coordination based
on Stereo Vision. IEEE Control Systems Magazine, 15(1):30–39, 1995.
[23] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey
vision conference, volume 15, page 50. Manchester, UK, 1988.
[24] A. Heyden and K. Rohr. Evaluation of corner extraction schemes using invari-
ance methods. In Pattern Recognition, 1996., Proceedings of the 13th Interna-
tional Conference on, volume 1, pages 895–899. IEEE, 1996.
[27] Y. Koren and J. Borenstein. Potential field methods and their inherent limita-
tions for mobile robot navigation. In Robotics and Automation, 1991. Proceed-
ings., 1991 IEEE International Conference on, pages 1398–1404. IEEE, 1991.
[28] Peter Kovesi. MATLAB and Octave Functions for Computer Vision and Image
Processing. [Online], 2009. http://www.csse.uwa.edu.au/~pk/Research/MatlabFns/.
[29] D.J. Kriegman, E. Triendl, and T.O. Binford. Stereo vision and navigation in
buildings for mobile robots. IEEE Transactions on Robotics and Automation,
5(6):792–803, 1989.
[30] S. Lenser and M. Veloso. Visual sonar: Fast obstacle avoidance using monocular
vision. In Proceedings of IROS, volume 3. Citeseer, 2003.
[31] L.M. Lorigo, R.A. Brooks, and W.E.L. Grimson. Visually-guided obstacle
avoidance in unstructured environments. In Proceedings of IROS, volume 97,
pages 373–379. Citeseer, 1997.
[32] D.G. Lowe. Object recognition from local scale-invariant features. In Proceedings
of the Seventh IEEE International Conference on Computer Vision (ICCV), volume 2,
pages 1150–1157. IEEE, 1999.
[33] MJ Milford, GF Wyeth, and D. Prasser. RatSLAM: a hippocampal model for si-
multaneous localization and mapping. In Robotics and Automation, 2004. Pro-
ceedings. ICRA’04. 2004 IEEE International Conference on, volume 1, pages
403–408. IEEE, 2004.
[34] D. Murray and J.J. Little. Using real-time stereo vision for mobile robot navi-
gation. Autonomous Robots, 8(2):161–171, 2000.
[36] S.H. Park and T. Matsuo. Time-derivative estimation of noisy movie data using
adaptive control theory. International Journal of Signal Processing, 2:3, 2006.
[38] A. Saxena, S.H. Chung, and A.Y. Ng. 3-d Depth Reconstruction from a Single
Still Image. International Journal of Computer Vision, 76(1):53–69, 2008.
[40] D. Slater. Using the 6mm Nikon fisheye lens with the Nikon D1 camera. [On-
line], 2000. http://www.nearfield.com/~dan/photo/wide/fish/index.htm.
[41] K.T. Song and J.H. Huang. Fast optical flow estimation and its application to
real-time obstacle avoidance. In Robotics and Automation, 2001. Proceedings
2001 ICRA. IEEE International Conference on, volume 3, pages 2891–2896.
IEEE, 2001.
[42] C.J. Taylor and J.P. Ostrowski. Robust vision-based pose control. In Robotics
and Automation, 2000. Proceedings. ICRA’00. IEEE International Conference
on, volume 3, pages 2734–2740. IEEE, 2000.
[44] Paul A. Tipler and Gene Mosca. Physics for Scientists and Engineers. W. H.
Freeman, 2004.