November 2010
e-mail: daniel.clarke1@uqconnect.edu.au
November 4, 2010
The Dean
School of Engineering
University of Queensland
St Lucia, Q 4072
I declare that the work submitted in this thesis is my own, except as acknowledged in the text and footnotes, and has not been previously submitted for a degree at the University of Queensland or any other institution.
Yours sincerely,
Daniel Clarke
Acknowledgments
Mum
For everything you’ve done to get me here.
Dr D. Ball
For the constant and copious amounts of contact time and guidance.
S. Heath
For the priceless help and friendship throughout the year.
W. Maddern
For the general advice and last minute proof reading of this document.
N. Ali
For proof reading this, going above and beyond in team projects and for the cookies.
Abstract
The i-Rat currently uses three infra-red proximity sensors to avoid obstacles during navigation. Due to the lack of information received from these sensors, the i-Rat is constrained to operate in tightly structured environments.

The aim of this thesis is to use visual information to detect and avoid obstacles in a laboratory environment using the i-Rat mobile robot platform and an off-board processor. The high bandwidth of information received from the visual sensor will allow the i-Rat to better perceive its environment and liberate it from such tightly structured environments.
Contents

Acknowledgments
Abstract
List of Figures
1 Introduction
1.1 Motivation
1.2 Scope
1.3 Chapter Outlines
2 Background
2.1 Vision Based Obstacle Avoidance
2.1.1 Stereo Vision based Obstacle Avoidance
2.1.2 Issues with Stereo for the i-Rat
2.1.3 Monocular Vision based Obstacle Avoidance
2.2 Lenses
2.2.1 Geometrical Optics
2.2.2 Focal Length
2.2.3 Field of View
2.2.4 Lens Distortion
2.3 Cameras
2.3.1 Camera Calibration
2.3.2 Depth of Field
2.3.3 Aperture
2.3.4 Shutter Speed
2.4 Stereo Vision
2.4.1 Stereopsis
2.4.2 Stereo Geometry
2.4.3 Stereo Camera Calibration
2.5 Optical Flow
3 System Requirements
4 Camera Calibration
4.1 Caltech Calibration Toolbox
4.1.1 Caltech Calibration Toolbox Results
4.2 Direct Mapping
4.2.1 Direct Mapping Results
4.3 Discussion
5 Colour Segmentation
5.1 Colour Characterisation
5.2 Discussion
6 Edge Detection
7 Time Derivative
8 Method Combination
9 Evaluation
9.1 Ranging
9.2 Obstacle Detection
9.3 Obstacle Avoidance
10 Conclusion
11 Future Work
References
List of Figures
Chapter 1
Introduction
The field of vision based obstacle avoidance can be segmented into many different categories: machine learning, stereo vision, motion based perception such as optical flow, and image segmentation methods using colour or edge information. Some robot platforms are unable to support sensors such as laser scanners due to size and/or cost constraints, whereas visual sensors are generally both small and cheap [7]. Other sensors, such as proximity sensors, are not suited to the task of navigation and can only be used successfully in a strictly structured environment [11]. Environmental cues such as colour, lighting conditions and movement are more readily perceived through visual information than through the use of other specialised sensors. Due to the benefits of visual sensors, considerable efforts in the progression of vision based robot platforms have been and are being made [16][10].
1.1 Motivation
The i-Rat mobile robot platform, under development as part of the Thinking Systems research project, is used for research in the field of navigation. It currently uses three infra-red proximity sensors to detect obstacles. The lack of information provided by the sensor array restricts the i-Rat to operation in carefully structured environments.
It is planned that the i-Rat will be able to freely navigate unstructured indoor
environments. For this to become a reality it must use its visual sensor for greater
perception of the environment. The purpose of this thesis is to provide the i-Rat
with the ability to freely explore a minimally structured indoor environment using
machine vision.
1.2 Scope
This thesis will develop an obstacle avoidance system for a mobile robot platform, the i-Rat, purely using the video stream captured from the onboard wide angle camera. As this system is required to provide the obstacle avoidance, it is restricted to image processing and computer vision techniques that can be performed in real time on a live video stream. The obstacle avoidance system will only need to avoid obstacles that are within view of the i-Rat. Obstacles out of view of the i-Rat will have to be avoided using a system not presented in this thesis, such as haptic feedback, or by mapping the obstacles in the environment. The system will be designed for use indoors, where the ground is assumed to be flat and the motions of the robot are restricted to translations parallel and rotations perpendicular to the floor. The obstacle avoidance system is not expected to use machine learning based methods. Design alterations of the i-Rat platform are considered beyond the scope of this thesis. The system will provide a reactive ability to avoid obstacles and is not required to map the environment or perform localisation. The vision based obstacle avoidance system should allow the i-Rat to operate in larger, less structured environments than currently permitted.
Chapter 2
Background
Figure 2.1: Depth Map from Stereo Images (reproduced from [9])
Murray and Little use a trinocular camera module, shown in figure 2.2, to extract
depth information from a scene. They used three cameras in their vision module
to remove ambiguity in the stereo matching. A disparity map is produced using
the three images. Although the camera module captures scene information in three
dimensions, Murray and Little use this information to produce a 2D distance map.
The maximum disparity in each column of the disparity map is extracted and used
to produce a single row of distance measures. They justify this decision by arguing
that indoor robots essentially operate in a 2D top down environment.
The trinocular module is too large for use on the i-Rat. There are other stereo vision modules commercially available, such as Point Grey's BumbleBee [4] or Surveyor's SVS [2], which both use twin parallel cameras. However, these are also too large to fit on the i-Rat.
Elinas, Sim and Little have approached the simultaneous localisation and mapping (SLAM) problem [39] using a Rao-Blackwellised particle filter [17] and a stereo camera [18]. Their method uses the images captured from the stereo camera as the system's only input. They construct dense metric maps of 3D landmarks using the Scale Invariant Feature Transform (SIFT) [32]. The algorithm checks the ratio of old landmarks to total landmarks in the scene. If the ratio is greater than 30%, the pose of the robot is estimated using a weighted sum of a motion model and a second distribution that depends on the latest observation and the current learned map; otherwise the pose is estimated using a vision based motion model.
Figure 2.3 shows the processing time of the algorithm versus the frame number. The algorithm takes between one and two seconds to compute per frame on average. While the algorithm achieves results similar to models that use laser and odometric sensors, the computation time per frame is simply too long for use in real time. The computation could be sped up by processing smaller images; however, this would reduce the image quality, resulting in poorer distance estimation and obstacle detection.
Edge detection, image segmentation and visual sonar approaches commonly stem
from the same principle of assuming the ground to be a flat plane with obstacles
rising vertically from it [45]. From this assumption, the distances to obstacles can
be estimated using camera calibration equations or a look up table.
Optical flow methods use the flow of an image sequence to avoid obstacles. Prominent features, such as corners or boundaries between objects, produce regions of greater optical flow, as do nearby obstacles. During navigation, the robot attempts to travel towards areas of lower optical flow, and so areas containing obstacles are avoided.
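The steering rule described above can be sketched as a simple left/right comparison of flow magnitude. The helper and gain below are illustrative, not taken from any of the cited systems:

```python
import numpy as np

def steer_from_flow(flow_mag, gain=1.0):
    """Compare mean optical flow magnitude in the left and right halves
    of the visual field and steer toward the side with less flow.
    Returns a rotational velocity: positive = turn right."""
    h, w = flow_mag.shape
    left = flow_mag[:, :w // 2].mean()
    right = flow_mag[:, w // 2:].mean()
    # more flow on the left implies closer obstacles there, so turn right
    return gain * (left - right)
```

A uniform flow field yields zero rotation, so the robot only turns when one half of the view flows faster than the other.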
door environments, the process of learning using a SICK laser scanner is financially
impractical for the budget of this project.
obstacle boundary to proportionally set forward velocity and the difference between
the left and right averages to set rotational velocity.
The histogram method used by Lorigo et al. allows the robot to navigate over changing terrain. As it does not require knowledge of what the ground looks like, it can be deployed on any coloured surface and is immediately ready to navigate autonomously.
A problem with this method is encountered if the robot approaches an obstacle so
closely that it covers the entire visual sensor. If this scenario were to occur, the
robot would use the bottom of the obstacle to compare to the rest of the image
and interpret the environment as obstacle free, driving into the obstacle under the
impression that it is part of the ground.
2.2 Lenses
2.2.1 Geometrical Optics
Geometrical optics is a sufficient approximation of light for the optical analysis in this project. Under the approximation of geometrical optics, light obeys three laws:

1. Rectilinear propagation
2. Refraction
3. Reflection
−x = f X/Z (2.1)
2.2. LENSES 9
α = 2 arctan(d/2f) (2.2)
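As a quick numerical check of the field of view relation, the sensor dimension and focal length values below are purely illustrative (they are not the i-Rat's camera parameters):

```python
import math

def field_of_view(d, f):
    """Angular field of view (radians) for a sensor of dimension d
    behind a lens of focal length f: alpha = 2 * arctan(d / 2f)."""
    return 2 * math.atan(d / (2 * f))
```

For example, a sensor dimension equal to twice the focal length gives a 90 degree field of view.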
Figure 2.6: Extreme Distortion by a Fish Eye Lens (reproduced from [40])
Extreme radial distortion can be modelled using a third order Taylor series expansion. Letting r² = x² + y², where the k terms are the radial distortion parameters of the camera, the radial location of a point on the image sensor can be corrected using the equations [12]:

x_corrected = x(1 + k1 r² + k2 r⁴ + k3 r⁶) (2.3)
y_corrected = y(1 + k1 r² + k2 r⁴ + k3 r⁶) (2.4)

Tangential distortion is caused by the lens not being parallel to the image sensor. Tangential distortion can be corrected using the equations [12], where the p terms are the tangential distortion parameters:

x_corrected = x + [2 p1 x y + p2 (r² + 2x²)] (2.5)
y_corrected = y + [p1 (r² + 2y²) + 2 p2 x y] (2.6)
2.3 Cameras
2.3.1 Camera Calibration
It is often desirable to counteract the effects of lens distortion when working with machine vision [37]. Equations 2.3, 2.4, 2.5 and 2.6 can be used with the camera distortion parameters to correct lens distortion, provided the distortion is not too severe.
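The standard radial and tangential corrections translate directly to numpy. This is a sketch of the usual distortion model with coefficients k1–k3 (radial) and p1, p2 (tangential), not the thesis's own code:

```python
import numpy as np

def correct_distortion(x, y, k1, k2, k3, p1, p2):
    """Apply radial and tangential distortion corrections to
    normalised image coordinates (x, y)."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_c = x * radial + (2 * p1 * x * y + p2 * (r2 + 2 * x * x))
    y_c = y * radial + (p1 * (r2 + 2 * y * y) + 2 * p2 * x * y)
    return x_c, y_c
```

With all coefficients set to zero the mapping is the identity, which is a useful sanity check when tuning parameters.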
2.3.2 Depth of Field

An image with a shallow depth of field has only a small portion of the image in focus. An image with a deep depth of field has a large portion,
often the entire image, in focus. In robotics, a deep depth of field is desirable as
more information can be perceived from the environment and the effects of motion
blur are reduced [13].
2.3.3 Aperture

The aperture of a camera describes the size of the opening of the lens diaphragm, usually quoted as an f-number. Adjusting the aperture changes the rate at which light can enter the image sensor. As the f-number decreases, the opening of the diaphragm increases, allowing more light to reach the image sensor in a given duration of time; such a small aperture value creates a narrow depth of field. Conversely, a large f-number reduces the size of the diaphragm, restricting the amount of light that can reach the image sensor in a given duration, and gives a wide depth of field.
2.4 Stereo Vision

2.4.1 Stereopsis
Stereopsis is the act of perceiving depth, or three dimensions, using two or more two dimensional images. This ability to perceive depth from multiple images is used in robot control [22][25] and navigation [29][8], as its versatility enables the robot to navigate in complex environments. The first step of stereopsis is to remove the radial and tangential lens distortions. The images are then rectified by accounting for the difference in angle and position between the two cameras; after rectification the rows of the two images are aligned. Identical features in the two images are identified and a disparity map is produced by taking the differences in position between the same features in the two images. The disparity map is then used to determine distances using triangulation [12].
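The disparity step can be illustrated with a toy block matcher. This brute-force sum-of-absolute-differences (SAD) search along rectified rows is a teaching sketch, not the correlation method used by any of the cited systems:

```python
import numpy as np

def sad_disparity(left, right, max_disp=8, win=3):
    """For each pixel in the left image, find the horizontal shift
    (disparity) whose window in the right image has the smallest
    sum of absolute differences."""
    h, w = left.shape
    pad = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    L = np.pad(left.astype(np.float64), pad, mode='edge')
    R = np.pad(right.astype(np.float64), pad, mode='edge')
    for y in range(h):
        for x in range(w):
            lw = L[y:y + win, x:x + win]
            best, best_d = np.inf, 0
            # search leftwards in the right image, up to max_disp pixels
            for d in range(min(max_disp, x) + 1):
                sad = np.abs(lw - R[y:y + win, x - d:x - d + win]).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[y, x] = best_d
    return disp
```

Real stereo modules replace the inner loops with highly optimised correlation, but the geometry is the same: larger disparity means a closer point.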
pl · [T × (R pr)] = 0 (2.7)

where pl and pr are the homogeneous image coordinate vectors, T is the translation vector from the optic centre of one camera to the other (Ol to Or) and R is the rotation matrix used to describe a vector from one reference frame in the other reference frame.
The equation of a plane through the point a with normal vector n is:

(x − a) · n = 0 (2.8)
The epipolar plane contains the vectors Pl and T. Taking the cross product of the vectors Pl and T results in a vector normal to the epipolar plane. Equation 2.8 can then be used to describe all the points Pl through the point T:

(Pl − T)^T (T × Pl) = 0 (2.9)
The right projection is obtained by transforming the left projection. The trans-
formation is achieved by:
Pr = R(Pl − T ) (2.10)
This transformation allows the points on the right projection plane to be intro-
duced into the equation:
(R^-1 Pr)^T (T × Pl) = 0 (2.11)
Given that:

R^T = R^-1 (2.12)

equation 2.11 can be written as:

(R^T Pr)^T (T × Pl) = 0 (2.13)
The cross product T × Pl can be written as a matrix multiplication S Pl, where S is the skew-symmetric matrix formed from the translation vector:

S = [  0   −Tz   Ty
      Tz    0   −Tx
     −Ty   Tx    0  ] (2.14)

Rewriting equation 2.13 using equation 2.14:

Pr^T R S Pl = 0 (2.15)

Equation 2.15 also produces the essential matrix E, which contains the translation and rotation parameters that relate the two cameras in physical space:

RS = E (2.16)

The relation between the left and right projection planes is then given by:

Pr^T E Pl = 0 (2.17)
The fundamental matrix F is similar to the essential matrix E except that it also contains the intrinsic properties of the cameras. Equation 2.17 relates the physical points of one projection plane to the other; what is in fact needed is the relation between the pixels of the two cameras. Using the relation between a physical point and a pixel, p = M^-1 q, where M is the camera intrinsics matrix, equation 2.17 can be altered to relate the pixels of the two cameras:

qr^T F ql = 0 (2.21)

which relates the pixels from one camera to the pixels in the other camera.
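The epipolar derivation can be verified numerically. The rig parameters below (baseline along x, small rotation about y, and a sample world point) are invented purely for illustration:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix S such that S @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

# hypothetical stereo rig: right camera translated along x, rotated about y
theta = 0.05
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
T = np.array([0.12, 0.0, 0.0])

E = R @ skew(T)        # essential matrix E = R S

# a point in left-camera coordinates and its right-camera counterpart
Pl = np.array([0.3, -0.1, 2.0])
Pr = R @ (Pl - T)      # the transformation Pr = R(Pl - T)

residual = Pr @ E @ Pl  # vanishes by the epipolar constraint
```

The residual is zero up to floating point noise for any choice of R, T and Pl, which is a handy check when implementing stereo calibration.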
OF = −ω + (v/d) cos θ (2.23)
Obstacles in line with the direction of motion will not elicit optical flow but will instead expand as the reference frame moves closer to the obstacle. As the reference frame gets closer, the rate of expansion increases. Optical flow can be used to detect obstacles and provide heading and velocity information [3]. Obstacles can be detected by the rate of flow of the visual field: close obstacles will flow faster than objects that are further away. The translational and rotational velocity of the reference frame can be estimated by taking an average of the optical flow with close objects excluded, as they would otherwise bias the result heavily and perhaps incorrectly [7].
2.8 i-Rat

The "i-Rat" (intelligent Rat animat technology), shown in Figure 2.9, is a biologically inspired robot under development as part of research undertaken by Thinking Systems at the University of Queensland. The intended purpose of the i-Rat is for use in navigation research. Currently, the i-Rat uses a monocular camera for landmark recognition and Sharp infra-red sensors for obstacle avoidance.
From the perspective of this thesis, the i-Rat has some limiting factors. The i-Rat cannot see the floor within a 100 mm radius of the camera. This can pose collision hazards in circumstances where the i-Rat approaches an object and is unable to see it. The video stream received from the i-Rat is currently streamed at 5 Hz and can experience latency of approximately an entire frame on average. This may pose a collision hazard, as obstacles perceived by the i-Rat would move large distances between frames, forcing the i-Rat to operate at lower speeds.
Chapter 3
System Requirements
The vision based obstacle avoidance system needs to operate in real time using only image data. The images from the i-Rat are sent over a wireless connection to an off-board processor, which performs the obstacle detection and avoidance and then sends commands back to the i-Rat. The i-Rat will use the system to wander the laboratory, so the system will need to operate in real time on a video stream of at least 5 Hz with a resolution of 424 × 240. The system needs to incorporate a depth measure to give obstacle distance feedback using the image data. The system will need to operate in an indoor environment cluttered with generic objects such as chairs, desks, cables, boxes and other items often encountered in a laboratory. The i-Rat will use the obstacle avoidance system to wander the indoor environment for extended periods of time, so it must operate robustly and reliably within the environment without colliding.
Chapter 4
Camera Calibration
Two fundamental assumptions form the basis for extracting distance from a single image:

1. It is assumed that the camera is positioned at a fixed height and angle relative to the ground.

2. It is assumed that the ground is flat and that obstacles rise vertically from it, so that the lowest visible point of an obstacle lies on the ground plane.

Each pixel in the image plane can be mapped to a fixed displacement on the ground plane due to assumption 1. The distance to an obstacle is extracted from the image based on the lowermost pixels occupied by the image of the obstacle. Assumption 2 states that this is where the obstacle is in contact with the floor and is thus the minimum distance to the obstacle. Figure 4.1b shows the distance in metric space to the obstacles in Figure 4.1a using this principle.
Calibration of the camera is required to develop the mapping from pixel space
to metric space. Two methods for calibrating the camera were pursued, calibration
using the Caltech calibration toolbox [5] and a direct mapping method.
Figure 4.2: Caltech Calibration Toolbox Corner Extraction (reproduced from [5])
The distortion parameters for each image are estimated by the toolbox. If the image distortion is too severe, the distortion parameters must be manually tuned for the image, as the toolbox will compute an incorrect estimate. After the corner extraction stage has been completed, the toolbox calculates the camera distortion parameters using a least squares optimisation on the data collected from the corners extracted from all the images. The resultant model can be used to map image pixels to normalised image pixels, which are in turn related to three dimensional space by the equation:
(u, v) = (x/z, y/z) (4.1)
where u and v are the horizontal and vertical positions of the pixel and x, y, and z are respectively the lateral distance, height and depth in real space. Under the two assumptions, the heights of obstacles remain unknown, but the height of the floor is known and remains constant. The height of the floor, y, is fixed to the negative of the height of the camera's centre axis above the floor. Because image boundaries between the floor and obstacles occur on the floor, the height of the base of an obstacle is also equal to the negative of the height of the camera's centre axis. The depth of an obstacle can be found by rearranging part of equation 4.1 as such:
z = y/v (4.2)

which is then used to calculate the lateral distance:

x = u · z (4.3)
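In code, mapping a floor-boundary pixel to metric coordinates is a two-liner. The coordinates here are normalised image coordinates, and the camera height value in the usage example is illustrative:

```python
def pixel_to_ground(u, v, camera_height):
    """Map a normalised floor-boundary pixel (u, v) to lateral distance x
    and depth z using equations 4.2 and 4.3. v is negative for pixels
    below the image centre, and the floor sits at y = -camera_height."""
    y = -camera_height
    z = y / v   # equation 4.2
    x = u * z   # equation 4.3
    return x, z
```

Since both y and v are negative for floor pixels, the recovered depth z is positive, as expected.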
Figure 4.4 shows a grid of points in real space. The red crosses are the actual
positions of the points in real space and the blue crosses are the calculated positions
of the points using the camera model produced via the toolbox. The distortion
model maps the points well for points close to the camera, but becomes steadily
worse as the points move away.
4.2 Direct Mapping

The theory behind this method is that, since the size of the squares and the position of the i-Rat relative to them are known, the corresponding distances of pixels that contain corners are also known.
A MATLAB m-file was written to automate the corner extraction process. Candidate corners were found using Harris corner detection [23]. Of these, the local candidates most likely to be corners were extracted using non-maximal suppression [35]. The corners extracted using this method are shown in figure 4.6. The method worked well, extracting 70% of the corners in the image. As visible in figure 4.6, some actual corners were not picked up by the method, and 12% of the points were false positives, usually due to a double detection of one corner.
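The non-maximal suppression step keeps only responses that dominate their neighbourhood. A minimal sketch follows; the window radius and threshold are arbitrary choices here, not values from the thesis:

```python
import numpy as np

def non_max_suppress(response, radius=1, thresh=0.0):
    """Return (row, col) locations whose corner response exceeds the
    threshold and is the maximum within its (2*radius+1)^2 window."""
    h, w = response.shape
    peaks = []
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            window = response[y - radius:y + radius + 1,
                              x - radius:x + radius + 1]
            if response[y, x] > thresh and response[y, x] == window.max():
                peaks.append((y, x))
    return peaks
```

Applied to a Harris response map, this collapses each blob of high response to a single corner location, which is exactly what reduces double detections of one corner.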
(Figure: extracted grid points plotted in metric space, lateral distance (m) versus depth (m))
Manual corner extraction, while tedious and time consuming, has been shown to produce good results when compared to automatic corner extraction methods [24]. It was argued that manual corner extraction on the image in Figure 4.6 would consume less time in the short term than perfecting the automatic process. Figure 4.6 was used for manual corner extraction as many of the corners had already been found by the automatic corner extraction script, with the remaining corners easier to detect when compared to the pre-processed image in figure 4.5. The pixels that contained corners were entered into a look up table along with the metric distances to which they related.
The resolution of the image was a limiting factor for the construction of the look up table. On forming the look up table using this method, a second grid was produced with 80 mm squares, as the resolution of the camera made extraction of the original grid corners impossible for distances exceeding 500 mm. The possible extraction distance using the larger squares increased to 800 mm from the camera.
4.3 Discussion
The look up table produced by the direct mapping method is far more accurate than
the toolbox method discussed in section 4.1. The accuracy of the direct mapping
method is the result of setting each pixel to the actual distance that it corresponds
to instead of performing a calculation based on a model. However, on comparing
figures 4.3 and 4.7, while the direct mapping method is more accurate, the surface
is far less smooth. This is most likely due to the sparsity of the data points and the
complexity of the pixel to distance geometry. The look up table generated by the
direct mapping method was deemed superior to the toolbox method and is the one
in use for pixel to distance transformations.
Chapter 5
Colour Segmentation
The aim of the colour segmentation method was to provide the i-Rat with the ability
to distinguish the floor from obstacles using colour information. The approach of
the colour method was to build a colour model (mean and variance values) of the
floor using images captured from the i-Rat. Then, during operation of the i-Rat,
each pixel in the image would be compared to the model using a likeliness measure:
D = Σ_{i=0}^{n} (p_i − m_i)² / (2v_i² + 1) (5.1)
where n is the size of the colour space, for example n = 3 for an RGB colour space. The terms m_i and v_i are the mean and variance values for the ith component of the colour space and p_i is the ith component of the current pixel p. D can be thought of as a difference measure: as the value of D increases, the likeness of the pixel p to the floor model decreases.
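Equation 5.1 translates directly to numpy; the sample pixel and model values in the usage below are invented for illustration:

```python
import numpy as np

def floorness(pixel, mean, var):
    """Difference measure D of equation 5.1: the larger D, the less the
    pixel resembles the floor colour model."""
    p = np.asarray(pixel, dtype=float)
    m = np.asarray(mean, dtype=float)
    v = np.asarray(var, dtype=float)
    return float(np.sum((p - m) ** 2 / (2 * v ** 2 + 1)))
```

A pixel identical to the model mean scores exactly zero, and the +1 in the denominator keeps the measure finite even for zero-variance components.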
A data set of over 1500 images of the floor was collated to build a mean and variance colour model of the floor. The images, an example of which is shown in figure 5.2, were captured so that the lower half of the image contained only floor. Two colour spaces, RGB and HSV, were used to build independent colour models of the floor. It was hypothesised that the HS model, which only used the hue and saturation values, would perform better than the RGB colour space under varying lighting conditions as it did not consider the V(alue) term, which holds the lightness information. A single mean and variance value was produced for each term in the colour spaces by computing the mean and variance of the captured image data set. A study was performed with the aim of characterising how well each colour space differentiated between floor and non-floor colours.
5.1 Colour Characterisation
Four colour maps were produced in MATLAB, the first three in red-green, red-blue and green-blue space and the fourth in hue-saturation space. These colour maps were used to characterise the colours that the colour segmentation process interpreted as floor and those that it interpreted as not floor. The colours interpreted as not floor were considered to belong to obstacles. A "floorness" value was calculated using equation 5.1 for each pixel in the colour maps. The floorness values were interpreted as floor or not floor using a threshold.
Figure 5.1: (a) Red Green Floor Model (b) Red Blue Floor Model (c) Green Blue Floor Model
Figure 5.1 shows the colours interpreted as floor in white for the three RGB
colour maps. A sample image of the environment, figure 5.2, is overlaid on the three
colour maps. The image is visualised by the red points which represent floor pixels
and the blue points which represent not floor pixels.
Figure 5.3: (a) Red Value Mean (b) Green Value Mean (c) Blue Value Mean

Figure 5.4: (a) Red Value Variance (b) Green Value Variance (c) Blue Value Variance

The red, green and blue mean and variance plots of the floor, shown in figures 5.3 and 5.4, were analysed. Importantly, figure 5.3 indicates that the pixel intensity in the centre of the image differs significantly from the intensity at the edges: up to a 16% difference in mean value is present. Based on this observation, the per-pixel mean and variance plots in figures 5.3 and 5.4 replaced the single mean and variance values previously used as the floor model.
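Building the per-pixel model amounts to taking a mean and variance over the image stack. A sketch, assuming the floor images are collected into one numpy array:

```python
import numpy as np

def build_floor_model(images):
    """Per-pixel mean and variance of a stack of floor images
    (shape: n_images x height x width x channels)."""
    stack = np.asarray(images, dtype=float)
    return stack.mean(axis=0), stack.var(axis=0)
```

Each pixel then carries its own model, so the vignetting-like intensity falloff toward the image edges no longer inflates the variance of a single global model.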
Figures 5.5 and 5.6b show the colours interpreted as floor using the two dimensional floor model. The result of using the two dimensional floor model is approximately a 28% reduction in the range of colours interpreted as floor in RGB and HS colour space compared to the single pixel models shown in figures 5.1 and 5.6a.

Figure 5.5: (a) 2D Red Green Floor Model (b) 2D Red Blue Floor Model (c) 2D Green Blue Floor Model

Figure 5.6: (a) Hue Saturation Floor Model (b) 2D Hue Saturation Floor Model
Figure 5.7b shows an image that has been processed using the colour segmentation method described in RGB colour space. Pixels of greater intensity represent parts of the image more likely to be an obstacle. Note that the floor is coloured blue whilst the maze wall is orange, showing that it is clearly detected as an obstacle.
5.2 Discussion
The HS model could not distinguish between the floor and varying levels of grey, from black to white. This inability to detect black or white obstacles was a crucial flaw in the model, as many of the objects in the laboratory, such as table and chair legs and the plastic skirting of the walls, are black. Another problem with the HS model stemmed from the nonlinear conversion from RGB to HSV space. The converted images, an example of which is shown in figure 5.8, contained large patches of discontinuities, such as the vast areas of pink on blue, and also lost large amounts of information during the conversion process. Also worth noting from figure 5.6 is the colour space that the HS method interprets as floor: all colours with a saturation intensity of less than around 30% are considered to belong to the floor. It is likely that this will cause obstacle detection problems for any object that is not brightly coloured.
While the RGB model lacked the immunity to lighting conditions that the HS model would have granted, it was able to distinguish between varying levels of grey, including black and white. Due to the poor performance of the HS model, the RGB model was selected for use in the colour segmentation method.
Chapter 6
Edge Detection
The aim of the edge detection method was to find the lowermost edges in an image and, under the assumption that the pixels between the bottom of the image and each edge are obstacle free, use these edges to find the distances to the closest obstacles.
Figure 6.1 shows the general process of detecting the obstacles in an image using
Canny edge detection. Each image is first filtered eight times using a 3 × 3 Gaussian
blur. The image edges are detected using OpenCV’s Canny edge detection, shown
in figure 6.1b. Each column of the image is run length encoded, then all pixels
above the lowermost pixel with a value of 1 are also set to 1. An example of the
final image is shown in figure 6.1c. A clear distinction has been formed between
the floor and the obstacles in the image. The pixels in figure 6.1c that occupy the
edge between the black and white pixels are used to determine the distances to the
obstacles in the image using the look up table described in chapter 4.
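The per-column filling step can be sketched in numpy. Here the run length encoding described above is replaced by a direct scan for the lowermost edge pixel, which produces the same final image:

```python
import numpy as np

def fill_above_lowest_edge(edges):
    """Given a binary edge map (1 = edge pixel), set every pixel from the
    top of each column down to that column's lowermost edge to 1. Row
    indices grow downwards, so the lowermost edge has the largest index."""
    out = np.zeros_like(edges)
    for x in range(edges.shape[1]):
        rows = np.flatnonzero(edges[:, x])
        if rows.size:
            out[:rows.max() + 1, x] = 1
    return out
```

The boundary between the filled and unfilled regions of the output is then read off column by column and passed through the pixel-to-distance look up table.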
The edge detection method fails to detect obstacles when the i-Rat comes so close to an object that its base is within the i-Rat's blind spot. Figure 6.2 shows how the edge detection method fails to detect the wall in the bottom left of the image. The wall is interpreted as floor because the first edge encountered is actually the top edge of the wall. This weakness could be counteracted by tracking obstacles from image to image, although this is not foolproof either and is computationally expensive. Not allowing the i-Rat to come so close to an obstacle is another solution; however, avoiding such situations is almost impossible. Consider, for example, the i-Rat rotating on the spot: there could be an unseen obstacle within the tolerance range of the edge detection method that would not be seen until the i-Rat rotated to bring it into view, thus breaking the requirement that such a situation should not occur.
The ideal solution to this problem is to reposition the camera on the i-Rat so that it can view its own outline. This should allow the i-Rat to view the floor at all times, which would prevent the failure case from occurring. The i-Rat is currently undergoing redesigns which will move the camera vertically down, closer to the ground. This redesign will reduce the size of the i-Rat's blind spot.
Chapter 7
Time Derivative
The aim of the time derivative method was to account for the motion of objects
relative to the reference frame. The underlying theory of this method is similar to
the use of optical flow for obstacle avoidance, explained in section 2.5. The time
derivative of the image is calculated by subtracting the previous intensity image
frame from the current intensity image frame [36] as shown in figure 7.1.
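As a minimal sketch, this reduces to a per-pixel frame difference. Taking the absolute value is an assumption here, since only the magnitude of intensity change matters for obstacle detection:

```python
import numpy as np

def time_derivative(prev_frame, curr_frame):
    """Approximate dI/dt by subtracting consecutive intensity frames."""
    # cast to a signed type so the uint8 subtraction cannot wrap around
    diff = curr_frame.astype(np.int16) - prev_frame.astype(np.int16)
    return np.abs(diff).astype(np.uint8)
```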
Although optical flow is often used in robot navigation, the time derivative
method is better suited for the obstacle avoidance method presented in this the-
sis. First and foremost, the time derivative can be directly combined with the other
two obstacle detection methods that form the complete system. The time derivative
method does not need to track features between images, so it does not spend time
searching through the images for matching features, making it
computationally less expensive than optical flow. Furthermore, the low frame rate of
the video stream leads to large pixel displacements from one frame to the next. When
the i-Rat rotates, images can experience pixel displacements in excess of 40 pixels.
This makes tracking features using optical flow more difficult and time consuming,
as optical flow works best for feature displacements of one or a few pixels
between frames [6].
Chapter 8
Method Combination
The obstacle images produced by the three detection methods are combined as a
weighted sum:

I = αC + βE + γD (8.1)
where I is the obstacle intensity image, C is the image produced using colour
segmentation, E is the image produced using edge detection and D is the image
time derivative. α, β and γ are constants used to weight the three methods. Figure
8.1 shows the result of running the obstacle detection method on an image captured
in the laboratory.
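Equation 8.1 amounts to a per-pixel weighted sum. A sketch is given below; the weights are placeholders, as the tuned values of α, β and γ are not fixed here:

```python
import numpy as np

ALPHA, BETA, GAMMA = 0.4, 0.4, 0.2   # illustrative weights, not the tuned values

def combine_obstacle_images(C, E, D):
    """Weighted per-pixel combination of the three obstacle images (eq. 8.1)."""
    I = (ALPHA * C.astype(np.float32)
         + BETA * E.astype(np.float32)
         + GAMMA * D.astype(np.float32))
    # keep the result in the usual 8-bit intensity range
    return np.clip(I, 0, 255).astype(np.uint8)
```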
A distance map and angle map are created using the lateral distance and depth
maps explained in Chapter 4. The distance map is produced by taking the magni-
tude of each element in the look up tables:
Dij = √(Xij² + Zij²) (8.2)
and the angle map by taking the arctan of the ratio of the depth map and lateral
distance map:
θij = arctan(Zij / Xij) (8.3)
where Dij is the distance corresponding to pixel ij, θij is the angle for the pixel
ij, Zij is the depth value for pixel ij and Xij is the lateral distance for the pixel ij.
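Equations 8.2 and 8.3 can be applied element-wise to the lookup tables. In this sketch, np.arctan2 stands in for the plain arctan of equation 8.3 to keep the bearing well defined when Xij is zero:

```python
import numpy as np

def distance_and_angle_maps(X, Z):
    """Per-pixel distance (eq. 8.2) and bearing (eq. 8.3) from the lookup tables."""
    D = np.hypot(X, Z)         # sqrt(X^2 + Z^2), element-wise
    theta = np.arctan2(Z, X)   # arctan(Z / X), safe when X is zero
    return D, theta
```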
Each pixel in the obstacle intensity image is divided by its corresponding cubed
distance to produce a resultant measure of virtual force magnitude [27]. These forces
are then virtually applied to the i-Rat at the orientations contained in θ for each
pixel. Figure 8.2 shows the forces in red virtually acting on the i-Rat.
Figure 8.3 shows how the obstacle avoidance system presented in this thesis
works.
The i-Rat constantly attempts to drive directly ahead at a fixed velocity. The
course that the i-Rat traverses is altered by the virtual forces that the obstacles
impart on it. As the i-Rat approaches an obstacle, the virtual force acting from the
obstacle cubically increases in magnitude with distance. If the i-Rat comes too close
to an obstacle it is forced to stop and rotate on the spot until a clear path has been
found. The i-Rat then resumes its forward motion.
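Under these definitions, the net virtual force can be sketched as a sum over pixels. Treating each force as a magnitude I/D³ resolved along the bearing θ is an assumed sign convention for this illustration:

```python
import numpy as np

def net_virtual_force(intensity, D, theta):
    """Sum the per-pixel forces: magnitude I / D^3, direction theta."""
    # clamp D away from zero to avoid dividing by zero at degenerate pixels
    mag = intensity.astype(np.float64) / np.maximum(D, 1e-6) ** 3
    fx = float(np.sum(mag * np.cos(theta)))   # component along the heading
    fy = float(np.sum(mag * np.sin(theta)))   # lateral component
    return fx, fy
```

The resulting (fx, fy) pair would then be used to steer the i-Rat away from cluttered regions.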
[Figure 8.3: block diagram of the obstacle avoidance system. The image frame is
processed into the obstacle intensity image, which is divided by D³ to give the force
magnitudes and passed through atan to give the force directions.]

Chapter 9

Evaluation
9.1 Ranging
The aim of this study was to characterise the ranging abilities of the colour segmen-
tation and edge detection methods over a range of obstacle angles and distances.
The distance that the i-Rat can sense directly affects the velocity at which it can
travel. The accuracy of the measured distance will affect how well the i-Rat can nav-
igate cluttered environments without collision. Perceived distances that are larger
than the actual distances to obstacles present collision hazards while perceived dis-
tances that are shorter increase the possibility of the i-Rat becoming unnecessarily
trapped in a cluttered environment.
Five orange balls were positioned equidistantly from the camera on the i-Rat at
angles −60°, −30°, 0°, 30° and 60°. The balls were repositioned in decrements
of 50mm, and three photos were captured at each position. The three photos
of each position were averaged to reduce any anomalies that may have occurred in
the images. The balls were initially placed at a maximum distance of 500mm and
ended at a minimum distance of 100mm. Figure 9.1 shows the ranging setup of the
i-Rat with the five orange balls positioned at 500mm.
Figures 9.2a and 9.3a are plots of the perceived distance to the five orange balls
versus the actual distances to each ball for colour segmentation and edge detection
respectively. The mean distance error averaged over the five balls, shown in figures
9.2b and 9.3b, is substantial, with an average error of 23% for colour segmentation
and 16% for edge detection. Multiple factors are likely to have contributed to this
error.
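The error figures above can be reproduced from the raw measurements with a simple mean absolute percentage error, sketched here as an illustration of the calculation rather than the exact analysis script:

```python
import numpy as np

def mean_percent_error(perceived, actual):
    """Mean absolute ranging error across the balls, as a percentage."""
    perceived = np.asarray(perceived, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(perceived - actual) / actual) * 100.0)
```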
[Figure 9.2: perceived distance versus actual distance (left) and percentage error
(right) for the colour segmentation method, for ball angles −60°, −30°, 0°, 30°
and 60°.]

The positions of the balls were measured to their centres; however, the segmentation
and edge detection processes picked up the shadows that the balls cast on the carpet
and considered the edge of each shadow as the closest obstacle. This discrepancy will
account for some of the error, but not a substantial fraction. A critical assumption
of the colour segmentation and edge detection methods is that the camera sits at a
fixed height and angle to the horizon. If the camera is moved, or the i-Rat sits at a
different angle, the distance measure is sure to be erroneous. The large error for the
edge detection method at 100mm can be attributed to the edge detection method
taking the top edge of the ball as the closest distance to an obstacle. This problem
has been discussed in chapter 6 with possible solutions provided.
[Figure 9.3: perceived distance versus actual distance (left) and percentage error
(right) for the edge detection method, for ball angles −60°, −30°, 0°, 30° and 60°.]
9.2 Obstacle Detection

The colour segmentation, edge detection and time derivative obstacle detection
methods were performed on these images. The results are shown in Figures 9.5, 9.6,
9.7, 9.8 and 9.9.
The edge detection method does not detect regions of the wall within the blind
spot as an obstacle (see Figure 9.6c).
The edge detection method is less susceptible to gradual changes in lighting con-
ditions when compared to the RGB method as it is essentially a spatial derivative
of the image as opposed to an absolute difference. Figure 9.6a highlights the edge
detection method’s performance in non-uniform lighting conditions. The high fre-
quency regions in Figure 9.5a, namely the desk legs, are accurately detected by the
method where as the soft shadow in the background is correctly identified as part
of the floor. It also correctly detects the wooden board in Figure 9.5d due to the
distinct edge between the floor and the obstacle. The edge detection method falsely
identifies objects as part of the floor if the boundary between the object and the floor
is significantly softened, or as in Figure 9.6c, if the obstacle is too close to the camera.
The time derivative method functions similarly to the edge detection method as
it too is a derivative based approach, however through time instead of space. The
method performs less well on images with uniformly similar colours and few edges,
as shown in Figure 9.7d. It also has a tendency to duplicate obstacles, generating
a duplicate image of the USB cable in Figure 9.5e. The time derivative method can detect
obstacles when they are within the i-Rat's blind spot, depending on the motion of
the i-Rat relative to the obstacle. Figure 9.7c shows the maze wall correctly detected
by the time derivative method, whereas in Figure 9.6c the edge detection method
failed to completely detect it.
Figure 9.9 shows the obstacles as detected by the combined system. The com-
bined method has performed better at detecting the entire sample set than any of
the methods individually. It correctly detects objects that are undetected by the
individual methods such as the wooden board in Figure 9.9d and the maze wall
in Figure 9.9c. While the method does perform less well on some obstacles when
compared to individual methods, such as the maze wall in Figure 9.9c compared to
the RGB segmentation method in Figure 9.5c, it has correctly detected all of the
obstacles in the sample set, a task no individual method has achieved.
A maze was populated with objects found in the laboratory. Blue and red objects
were excluded, as the overhead tracking uses the blue and red markers on the i-Rat
for tracking. A computer mouse, a cardboard box and a roll
of tape were positioned in the maze as shown in figure 9.10. The i-Rat was placed
in the maze with the combined obstacle avoidance system in operation. Overhead
tracking was set up to track the path of the i-Rat as it traversed the maze using the
obstacle avoidance system.
Figure 9.10 shows the path the i-Rat traversed in the maze over a period of 21
minutes. The i-Rat maintained an average velocity of 31mm/s with a maximum
velocity of 80mm/s. The first collision occurred at 7:13 minutes, when the i-Rat
rotated on the spot to avoid a maze corner and blindly crashed into the cardboard box.
The i-Rat has been allowed to wander freely around the laboratory for up to half
an hour at a time.
Chapter 10
Conclusion
A vision based obstacle avoidance system using only one camera has successfully
been implemented on the i-Rat mobile robot platform. The system comprises
four modules: colour segmentation, edge detection and time derivative modules for
obstacle detection, and a direct pixel to metric space mapping used to infer depth.
A virtual force field completes the system, taking the obstacle intensity image as
the force magnitudes scaled by the inverse cube of the distance corresponding to
each pixel.
The i-Rat requires human intervention in the following situations:

1. An obstacle that is raised from the floor with clearance lower than the height
of the i-Rat is encountered.

2. The i-Rat approaches an object in such a way that it remains out of sight.

3. Video latency causes the i-Rat to respond too late to an obstacle while moving
at speed.
Outside of these situations, the i-Rat rarely requires human intervention. Situ-
ation 1 can be overcome by removing all obstacles from the environment that fit
this description. Repositioning the camera on the i-Rat will reduce its blind spot,
thus reducing the occurrence of situation 2. Running the i-Rat at a slower velocity
reduces the effects of video latency; if run at a slow enough velocity, situation 3 can
be avoided altogether.
Chapter 11
Future Work
The most immediate area for improvement with the vision based obstacle avoid-
ance method presented in this thesis is autonomy. The camera calibration method
presented in Chapter 4 is time consuming to perform manually. If the method were
completely automated, calibration time would be significantly reduced. The pixel to
metric space maps could be drastically improved by moving the camera at small
parallel intervals with respect to the grid to map new pixels, increasing the amount
of data available to the interpolation function.
Development of the i-Rat so it can autonomously learn the floor model would
allow it to be deployed in an indoor environment without the need to pre-process
floor image data. Ideally, the i-Rat would be placed on the ground with a set radius
around it clear of obstacles and told to develop a floor model. The i-Rat would
start by rotating slowly about its axis to begin building a model of the floor using
the knowledge that a certain distance in the image is guaranteed to be part of the
floor. Using this rudimentary model, the i-Rat could then slowly explore its
surrounds whilst reinforcing its internal representation of the floor using the newly
captured image data. This process would continue until the i-Rat had developed a
sufficient model of the floor, or the process could be an ongoing one, which would
allow the i-Rat to slowly adapt to lighting changes throughout the day.
The use of different colour spaces should be further explored. Using a colour
space that is tolerant to lighting conditions would enable the i-Rat to more readily
navigate through areas of varying lighting conditions [45], such as under desks or
through patches of sunlight.
Finally, combining the system with haptic sensors such as whiskers would reduce
the occurrence of collisions. The whiskers could be used to sense outside of the
visual field to check for obstacles before moving into an area not visible to the
i-Rat. Rotating on the spot is an example of where this would be useful. Whiskers would
help with the autonomous learning of the floor previously described. As the i-Rat
builds its model of the floor, the whiskers would be used to provide reinforcement
that the region to learn from is free from obstacles.
References
[8] M. Bertozzi and A. Broggi. GOLD: A parallel real-time stereo vision system for
generic obstacle and lane detection. IEEE Transactions on Image Processing,
7(1):62–81, 1998.
[10] F. Bonin-Font, A. Ortiz, and G. Oliver. Visual navigation for mobile robots:
A survey. Journal of Intelligent and Robotic Systems, 53(3):263–296, 2008.
[12] Gary Bradski and Adrian Kaehler. Learning OpenCV. O’Reilly, 2008.
[13] P.I. Corke. Visual control of robot manipulators, a review. In Visual Servoing: Real-
Time Control of Robot Manipulators Based on Visual Sensory Feedback, 1993.
[15] K. Daniilidis and J.O. Eklundh. 3-D vision and recognition. In Springer Handbook
of Robotics. Springer, 2008.
[16] G.N. DeSouza and A.C. Kak. Vision for mobile robot navigation: A survey.
Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(2):237–
267, 2002.
[18] P. Elinas, R. Sim, and J.J. Little. σSLAM: stereo vision SLAM using
the Rao-Blackwellised particle filter and a novel mixture proposal distribution.
In Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE Inter-
national Conference on, pages 1564–1570. IEEE, 2006.
[19] D.A. Forsyth and J. Ponce. Computer vision: a modern approach. Prentice
Hall Professional Technical Reference, 2002.
[20] G. Gini and A. Marchi. Indoor robot navigation with single camera vision.
Pattern Recognition in Information Systems, 8(1):67–76, 2002.
[22] G.D. Hager, W.C. Chang, and AS Morse. Robot Hand-eye Coordination based
on Stereo Vision. IEEE Control Systems Magazine, 15(1):30–39, 1995.
[23] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey
vision conference, volume 15, page 50. Manchester, UK, 1988.
[24] A. Heyden and K. Rohr. Evaluation of corner extraction schemes using invari-
ance methods. In Pattern Recognition, 1996., Proceedings of the 13th Interna-
tional Conference on, volume 1, pages 895–899. IEEE, 1996.
[27] Y. Koren and J. Borenstein. Potential field methods and their inherent limita-
tions for mobile robot navigation. In Robotics and Automation, 1991. Proceed-
ings., 1991 IEEE International Conference on, pages 1398–1404. IEEE, 1991.
[28] Peter Kovesi. MATLAB and Octave Functions for Computer Vision and Image
Processing. [Online], 2009. http://www.csse.uwa.edu.au/~pk/Research/MatlabFns/.
[29] D.J. Kriegman, E. Triendl, and T.O. Binford. Stereo vision and navigation in
buildings for mobile robots. IEEE Transactions on Robotics and Automation,
5(6):792–803, 1989.
[30] S. Lenser and M. Veloso. Visual sonar: Fast obstacle avoidance using monocular
vision. In Proceedings of IROS, volume 3. Citeseer, 2003.
[31] L.M. Lorigo, R.A. Brooks, and W.E.L. Grimson. Visually-guided obstacle
avoidance in unstructured environments. In Proceedings of IROS, volume 97,
pages 373–379. Citeseer, 1997.
[32] D.G. Lowe. Object recognition from local scale-invariant features. In Proceedings
of the Seventh IEEE International Conference on Computer Vision (ICCV), volume 2,
pages 1150–1157. IEEE, 1999.
[33] MJ Milford, GF Wyeth, and D. Prasser. RatSLAM: a hippocampal model for si-
multaneous localization and mapping. In Robotics and Automation, 2004. Pro-
ceedings. ICRA’04. 2004 IEEE International Conference on, volume 1, pages
403–408. IEEE, 2004.
[34] D. Murray and J.J. Little. Using real-time stereo vision for mobile robot navi-
gation. Autonomous Robots, 8(2):161–171, 2000.
[36] S.H. Park and T. Matsuo. Time-derivative estimation of noisy movie data using
adaptive control theory. International Journal of Signal Processing, 2:3, 2006.
[38] A. Saxena, S.H. Chung, and A.Y. Ng. 3-d Depth Reconstruction from a Single
Still Image. International Journal of Computer Vision, 76(1):53–69, 2008.
[40] D. Slater. Using the 6mm Nikon fisheye lens with the Nikon D1 camera. [On-
line], 2000. http://www.nearfield.com/~dan/photo/wide/fish/index.htm.
[41] K.T. Song and J.H. Huang. Fast optical flow estimation and its application to
real-time obstacle avoidance. In Robotics and Automation, 2001. Proceedings
2001 ICRA. IEEE International Conference on, volume 3, pages 2891–2896.
IEEE, 2001.
[42] C.J. Taylor and J.P. Ostrowski. Robust vision-based pose control. In Robotics
and Automation, 2000. Proceedings. ICRA’00. IEEE International Conference
on, volume 3, pages 2734–2740. IEEE, 2000.
[44] Paul A. Tipler and Gene Mosca. Physics for Scientists and Engineers. W. H.
Freeman, 2004.