
OPTICAL TRIANGULATION-CORRELATION SENSOR

FOR UNDERWATER VEHICLES MOTION ESTIMATION


M. Caccia
Consiglio Nazionale delle Ricerche
Istituto Automazione Navale
Via De Marini, 6
16149 Genova, Italy
fax: +39-010-6475600
e-mail: max@ian.ge.cnr.it

Keywords: motion estimation, underwater vision.
Abstract
The problem of providing a reliable estimate of the
slow motion of unmanned underwater vehicles is
addressed by proposing, for near-bottom applications, an
optical triangulation-correlation sensor. Theoretical
and experimental motivations of the proposed
approach are discussed and preliminary results
obtained from data gathered by a sensor prototype
mounted on a ROV in operating conditions are
reported.
1 Introduction
Nowadays the accurate, reliable and high-sampling-rate
measurement of the slow horizontal motion of
unmanned underwater vehicles is probably the main
bottleneck limiting ROV hovering performance
and the capability of operating in the proximity of the
seabed. Currently the most accurate and reliable
solutions to this problem are provided by acoustics.
On the one hand, commercial echo-sounders
working at relatively high frequency (at least 500
kHz), combined with model-based estimators of
the vehicle-propeller dynamic interactions,
allow very fine control of ROV vertical motion
in operating conditions (i.e. precision of the order of
1 cm) [6][8]; on the other hand, the integration of
measurements supplied by a high-frequency LBL
(300 kHz), a Doppler velocimeter (1.2 MHz) and a
ring-laser gyro has proved capable of
providing a high-precision estimate of the horizontal
motion of a ROV in archaeological applications
[23]. However, such an acoustic-based system for
measuring the horizontal motion of a ROV is quite
expensive and requires a strong structuring of
the operating area, with the construction of the high-frequency
LBL polygon. These constraints are not
compatible with applications, such as the
exploration of benthic areas in harsh environments,
in many cases from vessels of opportunity, which
require light logistics in terms of operating time
and of the size and weight of the devices. This
motivates research on motion estimation techniques
which can potentially lead to the development of
cheap stand-alone devices directly mountable on the
vehicles, such as those based on underwater optical
vision. In the case of underwater vision the main
technical issue is the capability of tackling,
in a large variety of operating conditions, the basic
properties of underwater images, which are usually
characterized by non-uniform lighting, suspended
particles in the water, limited range and an unstructured
environment [1]. A number of pioneering applications
of underwater vision techniques to cable and
pipeline tracking [1][24], mosaicking [15], object
sizing [21], and motion estimation in structured, i.e. off-shore,
[22] and unstructured [20] environments have been evaluated in the
literature, using both mono and stereo devices, up to
the latest experiments in combined motion estimation
and mosaicking proposed in [19]. In the author's
opinion the main result is the revised definition of
optical flow as the perceived transformation of
brightness patterns in an image sequence [18],
which, unlike the classical definition
given by Horn [13], makes it possible to discriminate the
optical flow induced in a stationary scene by the
motion of a light source. It is worth noting that this
is the typical operating condition in underwater
vision, where the light sources are mounted on the
ROV and move together with the camera(s).
Basically, the above-mentioned research efforts on
vision-based motion estimation focused on the
complete computation of the 6-dof motion from
images, rather than on the integration of vision-
based information with other sensor measurements
(it is worth noting that the integration of data
gathered by different sources involves complex
calibration problems when high accuracy is
required). In any case experimental results pointed
out the theoretically expected difficulties, in an
unstructured environment, related to feature
extraction and matching in stereovision systems,
and to the estimate of the ratio between the motion
components orthogonal and parallel to the image
plane in monocular systems. In some experiments,
when trying to stabilize a vehicle using video feedback,
these problems seemed to be amplified by the
interaction with the control systems of ROVs
characterized by under-actuated, strongly coupled
propulsion systems. On the basis of the above-mentioned
research and of the operating experience
gained through ROV exploitation in marine science
applications [2][3][4], this research sets itself the
goal of developing a mono-camera vision system for
measuring the linear motion of a ROV executing
basic operational tasks at low speed, such as
station-keeping and executing near bottom
traverses. Strong constraints on the application, e.g.
station-keeping or constant heading maneuvers
always in the close proximity of the seabed, and
assumptions on the dynamics of the carrier vehicle,
e.g. zero pitch and roll rates during operations, are
not, in the author's opinion, a serious drawback of
the proposed approach: autonomous and reliable
execution of basic tasks such as station-keeping in
an unknown environment is currently a step beyond
the state of the art; working ROVs are usually
stable in pitch and roll by construction; and the
physics of underwater optical propagation forces the
employment of vision only at short range, according
to the natural law of increasing precision and details
and reducing speed when reducing range from
environmental features. On the other hand, high
performance in the slow-motion control of UUVs is
strongly related not only to the quality of the
position and speed measurements, but also to the
precision of the control algorithms and to system
controllability, and, in particular, to the smoothness
of the actuation system and to its symmetry and
decoupling along the motion dofs. In other words,
high performance in hovering control
cannot be reached by a vehicle characterized by a
strong coupling between its actuation along the
heave, sway and roll dofs, but requires a fully
controllable platform, possibly intrinsically stable
with respect to some dofs. Since experience with
ROV images revealed that the availability of direct
measurements of the scene depth increases the
reliability and stability of the motion estimate [5],
the possibility of obtaining this information from
dedicated sensors has been investigated. In order to
reduce costs and sidestep the difficulties related to
the calibration of sensors of different nature, e.g.
optical and acoustic devices, the basic idea of using
a single camera for directly measuring the scene
depth in addition to the motion components parallel to the
image plane has been pursued. In particular, the use
of optical triangulation of laser spots for measuring
range is well known in the literature [10][16] and its
employment in underwater applications has been
proposed in [9]. In addition, the employment on the
Romeo ROV of a small matrix of parallel red laser
spots for immediate sizing of the observed objects
by the human operator, typically a marine scientist,
demonstrated the reliability of laser tracking in a
large variety of operating conditions. The result has
been the design of a video system constituted by a
video camera and four parallel red laser diodes for
measuring distance and orientation from surfaces.
This leads, at least in principle, to a reliable and high
frequency short range altimeter in the case of a
camera mounted vertically below the vehicle. As far
as the problem of motion estimation from video
images is concerned the approach of computing
motion from the displacement of tokens tracked
through correlation techniques has been followed.
For slow motion characterized by small
rotations and small changes in scene depth, correlation
has proved to work well for tracking image tokens
in underwater applications as well [14]. Since the basic
problem of measuring the scene depth is managed
by the triangulation module, the main technical issue
is the detection of the tokens in an unstructured
operating area such as the underwater environment.
In computer vision literature, typical tokens such as
corners, circular symmetric features, and
logarithmic spirals [12], are defined for structured
environments, but these kinds of features are not
usually present in underwater scenes of scientific
interest, i.e. far from man-made structures such as
offshore platforms. So, research first
focused on the basic problem of defining
what a token is in an unstructured environment. A
possible answer to this question consists in defining
a token, i.e. an interest area, as an area with locally
varying image intensity. A method for automatic
extraction of interest areas in the case of
autonomous spacecraft landing is presented in [17].
Specific spatial wavelengths of the original image
are enhanced by a 2-D band-pass filter, implemented
through averaging and sub-sampling followed by
Laplacian filtering. Then local variances of the
filtered image are computed to evaluate contrast,
and high local-variance areas are extracted as
templates (tokens). In a first simple approach, the
displacements of the tokens extracted in this way
have been computed with correlation-based
techniques. The prediction of the token position in
the image at each time step keeps the computation fast
by reducing the number of points where the correlation is
computed. The paper is organized as follows.
After a short summary of the nomenclature used in
the paper (section 2), a brief overview of the system
architecture is given in section 3, while the main
modules constituting the system are presented in
sections from 4 to 6. Section 4 deals with the 3D
optical triangulation sensor focusing on the problem
of detecting and tracking laser spots. A more
rigorous definition of interest areas is given in
section 5, where techniques for their detection and
tracking are presented too. Section 6 summarizes
the algorithms for computing motion from token
displacements, discussing the benefits obtained from the
use of independent measurements of the scene
depth and speed in the direction perpendicular to
the image plane. Experimental set-up and results,
including the first sensor prototype and preliminary
trials at sea with the Romeo ROV, are reported and
discussed in section 7. Concluding remarks and a
discussion of current and future research conclude
the paper.
2 Nomenclature
As discussed in [11], the motion of marine vehicles
is usually described with respect to an earth-fixed
inertial reference frame <X,Y,Z> and a moving
body-fixed reference frame <X_V,Y_V,Z_V>, whose
origin coincides with the center of gravity of the
vehicle. Thus, position and orientation of the vehicle
are described relative to the inertial reference frame,
while linear and angular speeds are expressed
relative to the body-fixed reference frame. The
vehicle kinematics nomenclature follows:
[x y z]^T : UUV position relative to the earth-fixed reference frame;
[φ θ ψ]^T : UUV roll, pitch and yaw angles relative to the earth-fixed reference frame;
[u v w]^T : UUV linear speed (surge, sway, heave) relative to the vehicle-fixed reference frame;
[p q r]^T : UUV angular speed (roll, pitch and yaw rates) relative to the vehicle-fixed reference frame.
A camera-fixed reference frame <X_C,Y_C,Z_C> is
defined with the Z_C axis directed towards the scene.
The camera and image basic nomenclature follows:
f : focal length;
[m n]^T : image point in the image plane <X_C,Y_C,0>;
[ṁ ṅ]^T : image motion field in the image plane <X_C,Y_C,0>;
[X Y Z] : coordinates of a generic point in the 3-D space (usually referred to the camera frame).
The laser diodes are rigidly connected to the
camera-fixed frame and laser spots are given by the
intersection between the laser rays and the
environment:
N_L : number of laser diodes;
[X_L0^(i) Y_L0^(i) Z_L0^(i)]^T, i = 1..N_L : laser diode positions relative to the camera-fixed reference frame;
[X_L^(i) Y_L^(i) Z_L^(i)]^T, i = 1..N_L : laser spot positions relative to the camera-fixed reference frame;
[m_L^(i) n_L^(i)]^T, i = 1..N_L : image of the laser spot in the image plane <X_C,Y_C,0>.
In the case the camera is mounted downward-looking
below the vehicle, the frames <X_C,Y_C,Z_C>
and <X_V,Y_V,Z_V> are assumed to coincide. In
addition, if the vehicle pitch and roll are zero, the
axes Z_V and Z_C are vertical and the laser spot
coordinates Z_L^(i) represent the altitude of the camera
from the surface tracked by the laser spot.
3 System architecture
As shown in Figure 1 the system is constituted by:
an Optical Laser Triangulation Sensor that detects
and tracks the laser spots and computes the image
depth; a Token Detector and Tracker that identifies
and tracks suitable areas of interest in the image
sequence; and a Motion from Tokens Estimator that
on the basis of the measurements supplied by the
previous modules computes the translational motion
of the camera.
Figure 1. System Architecture.
4 3D Optical triangulation sensor
The processing of the image to measure distance
and orientation from the seabed surface is
performed in three steps: i) detection and tracking
of the laser spots in the image coordinates; ii)
estimation of the spatial coordinates of the laser
spots in the camera(vehicle)-fixed frame; iii)
estimation of the surface range and orientation.
As shown in Figure 2, taken with the camera
mounted below the Romeo ROV in the Ligurian
Sea, operating experience reveals the sharpness of
the laser spots when considering the red component
of a RGB image of the seabed.
Figure 2. Red component image of the seabed
and laser spots
In these conditions the laser spots can be easily
detected by searching the pixel with maximum
intensity in the image, and then computing the
intensity center of the light spot in its neighborhood
[9] in order to reduce disturbances and increase the
measurement accuracy. In the initialization step the
whole image is first considered and the first laser
spot is detected. Then its neighborhood is blackened
and the second intensity peak is searched for, and so
forth until all the laser spots have been detected.
During subsequent iterations, each laser spot is
searched for in a sub-image of suitable size centered at
its previously estimated location in the image plane.
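As an illustration, a minimal NumPy sketch of this peak-search, centroid-refinement and sub-image tracking scheme could read as follows; function names, the neighborhood half-size and the search-window size are illustrative choices, not values taken from the actual implementation.

```python
import numpy as np

def intensity_centroid(red, row, col, half=5):
    """Refine a peak location with the intensity-weighted centroid of its
    (2*half+1)^2 neighborhood, as suggested in [9]."""
    r0, r1 = max(row - half, 0), min(row + half + 1, red.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, red.shape[1])
    patch = red[r0:r1, c0:c1].astype(float)
    rows, cols = np.mgrid[r0:r1, c0:c1]
    total = patch.sum()
    if total == 0.0:
        return float(row), float(col)
    return float((rows * patch).sum() / total), float((cols * patch).sum() / total)

def detect_laser_spots(red, n_spots=4, half=5):
    """Initialization: find n_spots intensity peaks in the red channel,
    blackening each detected neighborhood before searching for the next one."""
    work = red.astype(float).copy()
    spots = []
    for _ in range(n_spots):
        row, col = np.unravel_index(np.argmax(work), work.shape)
        spots.append(intensity_centroid(red, row, col, half))
        work[max(row - half, 0):row + half + 1,
             max(col - half, 0):col + half + 1] = 0.0
    return spots

def track_laser_spot(red, prev_rc, search_half=20, half=5):
    """Iteration: search the brightest pixel only in a sub-image centered
    at the previously estimated spot location."""
    r, c = int(round(prev_rc[0])), int(round(prev_rc[1]))
    r0, c0 = max(r - search_half, 0), max(c - search_half, 0)
    sub = red[r0:r + search_half + 1, c0:c + search_half + 1]
    dr, dc = np.unravel_index(np.argmax(sub), sub.shape)
    return intensity_centroid(red, r0 + dr, c0 + dc, half)
```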
Once the image laser spots [m_L^(i) n_L^(i)]^T, i = 1..N_L, have been estimated, combining the laser ray equations

X_L^(i) = X_L0^(i) + a_X^(i) (Z_L^(i) - Z_L0^(i))
Y_L^(i) = Y_L0^(i) + a_Y^(i) (Z_L^(i) - Z_L0^(i)),   i = 1..N_L   (1)

where the laser ray parameters a_X^(i), a_Y^(i), i = 1..N_L, are estimated during the device calibration, and the camera perspective model

[m n]^T = (f/Z) [X Y]^T   (2)

the following LS estimate of the i-th laser spot depth Z_L^(i) is obtained:

Z_L^(i) = [ (m_L^(i)/f - a_X^(i)) (X_L0^(i) - a_X^(i) Z_L0^(i)) + (n_L^(i)/f - a_Y^(i)) (Y_L0^(i) - a_Y^(i) Z_L0^(i)) ] / [ (m_L^(i)/f - a_X^(i))^2 + (n_L^(i)/f - a_Y^(i))^2 ],   i = 1..N_L   (3)

and, substituting in (1), the estimates of the laser spot locations X_L^(i), Y_L^(i), i = 1..N_L, are computed.
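For illustration only, the per-spot computation of eqs. (1)-(3) reduces to a few lines; the sketch below assumes that the spot image coordinates are already referred to the camera frame and that the ray slopes and diode positions come from the off-line calibration (all names are illustrative).

```python
def laser_spot_position(m, n, f, a_x, a_y, X0, Y0, Z0):
    """LS estimate of the laser spot depth (eq. (3)) and back-substitution
    in the ray equations (1) to recover the spot position.

    m, n       : image coordinates of the spot
    f          : focal length (same units as m, n)
    a_x, a_y   : calibrated ray slopes of the diode
    X0, Y0, Z0 : calibrated diode position in the camera frame"""
    # Combining ray (1) and perspective model (2) gives two linear
    # equations in the unknown depth Z:
    #   (m/f - a_x) Z = X0 - a_x Z0
    #   (n/f - a_y) Z = Y0 - a_y Z0
    num = (m / f - a_x) * (X0 - a_x * Z0) + (n / f - a_y) * (Y0 - a_y * Z0)
    den = (m / f - a_x) ** 2 + (n / f - a_y) ** 2
    Z = num / den
    X = X0 + a_x * (Z - Z0)
    Y = Y0 + a_y * (Z - Z0)
    return X, Y, Z
```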
At a generic instant, the seabed can be locally
represented in the camera (vehicle)-fixed reference
frame by the plane
X sin(α) cos(β) + Y sin(α) sin(β) + (h - Z) cos(α) = 0   (4)

where α and β are the seabed maximum slope and
its orientation and h is the vehicle altitude.
Assuming that the seabed profile is not vertical, the
local plane can be written as:

Z = a X + b Y + c   (5)
Substituting eqn. (5) in the camera perspective
model (2), the 3D coordinates of each image point
[m n]^T can be computed as:

[X Y Z]^T = ( c / (f - a m - b n) ) [m n f]^T   (6)
On the other hand, given the N_L estimated laser spot
positions, it is possible to estimate
the coefficients of the plane (5), e.g. using a LS
algorithm with the set of measurement equations:

[X_L^(i)  Y_L^(i)  1] [a b c]^T = Z_L^(i),   i = 1..N_L   (7)
In the case the area S_mn of the image polygon
whose vertices are the laser spots is
considered, the bottom surface range Z can be
derived from the relation

Z^2 = k^2 / S_mn   (8)

where k is a device calibration parameter.
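A compact sketch of the seabed plane fit (7), of the back-projection (6) and of the area-based range (8) follows; the shoelace formula for the spot-polygon area and the averaging fit of the calibration parameter k are illustrative choices, since the paper does not detail these steps.

```python
import numpy as np

def fit_local_plane(X_L, Y_L, Z_L):
    """Eq. (7): LS fit of the plane Z = aX + bY + c (eq. (5)) to the laser spots."""
    A = np.column_stack([X_L, Y_L, np.ones(len(X_L))])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(Z_L, dtype=float), rcond=None)
    return coeffs  # a, b, c

def point_on_plane(m, n, f, a, b, c):
    """Eq. (6): 3-D coordinates of the image point [m n]^T lying on plane (5)."""
    scale = c / (f - a * m - b * n)
    return scale * m, scale * n, scale * f

def range_from_spot_area(m_L, n_L, k):
    """Eq. (8): bottom range from the area S_mn of the laser-spot polygon.
    Vertices must be ordered along the polygon (shoelace formula)."""
    m = np.asarray(m_L, dtype=float)
    n = np.asarray(n_L, dtype=float)
    S_mn = 0.5 * abs(np.dot(m, np.roll(n, -1)) - np.dot(n, np.roll(m, -1)))
    return k / np.sqrt(S_mn)

def calibrate_k_squared(Z_reference, S_mn_samples):
    """Fit k^2 in eq. (8) from reference ranges (e.g. an acoustic altimeter
    over a flat seabed) and the corresponding spot-polygon areas."""
    Z = np.asarray(Z_reference, dtype=float)
    S = np.asarray(S_mn_samples, dtype=float)
    return float(np.mean(Z ** 2 * S))  # each sample gives k^2 = Z^2 * S_mn
```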
5 Natural interest areas: detection and
tracking
The choice of interest areas in images of natural
terrain is usually based on the detection of areas
with locally varying intensity. In particular, the use
of Gabor functions has been proposed in
applications of image registration of natural terrain,
while a less computationally demanding approach based
on averaging, Laplacian filtering and local variance
has been proposed in [17] to automatically choose
and track image templates for the autonomous landing
of spacecraft. The similarity between the
unstructured landing sites shown in
the above-mentioned paper and the unstructured
seabed environment suggested the adoption of this
approach for the detection and tracking of natural
interest areas in underwater applications. In the
following a short summary of this approach is
reported for the sake of completeness, while a
detailed discussion can be found in [17]. Templates,
i.e. tokens, able to guarantee robust and accurate
tracking should be characterized by a shading pattern
with a wavelength comparable to the preselected block
size and should be distinctive in the sense of contrast.
The token extraction procedure consists of three steps:
1) 2-D band-pass filtering to enhance specific
spatial wavelength;
2) local variances computation to evaluate
contrast;
3) extraction of high-local variance areas as
templates.
In order to reduce computation, band-pass filtering
is performed by executing averaging with sub-sampling
as the low-pass filtering and Laplacian filtering as
the high-pass filtering.
Averaging and sub-sampling are performed
simultaneously:
I_S(μ,ν) = ( 1/(S_m S_n) ) Σ_{i=1..S_m} Σ_{j=1..S_n} I( S_m (μ-1) + i, S_n (ν-1) + j )   (9)

while the 8-neighbor Laplacian is computed as:

I_L(μ,ν) = Σ_{i=-1..1} Σ_{j=-1..1} [ I_S(μ+i, ν+j) - I_S(μ,ν) ]   (10)

where I, I_S and I_L represent the intensity of the
original, of the averaged and sub-sampled, and of the
Laplacian-filtered image respectively, and S_m and S_n indicate
the sub-sampling interval. Then, the roughness of
the band-pass filtered image I_L is evaluated by
computing the statistical variance within a sliding
window, whose size is equal to that of the token to
be extracted and which is moved so as to cover the
entire image. Templates used for tracking are extracted
in decreasing order of local variance, which is required
to be higher than a threshold. In order to avoid clustered
templates, which are more sensitive to observation noise,
the extraction of tokens close to an already selected one
is inhibited. Token tracking is performed by looking for
the displacement of highest correlation in a suitable
neighborhood of the previous location. In order to
reduce the computational load, after initialization
each token is searched for in the proximity of its
predicted position, computed assuming a linear
motion of the area of interest on the image plane.
Token tracking fails when the correlation drops below
a suitable threshold.
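The extraction and tracking procedure can be sketched as follows (block sizes, window sizes and thresholds are illustrative; the values actually used in the experiments are reported in section 7). The band-pass filter of eqs. (9)-(10) is implemented as a block average followed by an 8-neighbor Laplacian, templates are selected by decreasing local variance while inhibiting clustered selections, and tracking uses normalized correlation over a search window.

```python
import numpy as np

def average_and_subsample(I, S_m, S_n):
    """Eq. (9): block average over S_m x S_n blocks (low-pass filtering and
    sub-sampling performed simultaneously)."""
    H, W = (I.shape[0] // S_m) * S_m, (I.shape[1] // S_n) * S_n
    return I[:H, :W].reshape(H // S_m, S_m, W // S_n, S_n).mean(axis=(1, 3))

def laplacian_8(I_S):
    """Eq. (10): 8-neighbor Laplacian (sum of neighbor-center differences);
    wrap-around at the borders is ignored in this sketch."""
    I_L = np.zeros_like(I_S)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            I_L += np.roll(np.roll(I_S, di, axis=0), dj, axis=1) - I_S
    return I_L

def extract_templates(I, S_m=4, S_n=4, win=6, var_thr=4.5, min_dist=8, max_tokens=20):
    """Select top-left corners (in the sub-sampled image) of high local-variance
    windows, inhibiting the extraction of tokens close to an already selected one."""
    I_L = laplacian_8(average_and_subsample(I.astype(float), S_m, S_n))
    H, W = I_L.shape
    candidates = []
    for r in range(H - win):
        for c in range(W - win):
            candidates.append((I_L[r:r + win, c:c + win].var(), r, c))
    candidates.sort(reverse=True)
    selected = []
    for v, r, c in candidates:
        if v < var_thr or len(selected) >= max_tokens:
            break
        if all(abs(r - rs) >= min_dist or abs(c - cs) >= min_dist for rs, cs in selected):
            selected.append((r, c))
    return selected

def track_token(I_new, template, predicted_rc, search_half=10, corr_thr=0.75):
    """Look for the displacement of highest normalized correlation around the
    predicted location; tracking fails when the correlation drops below corr_thr."""
    h, w = template.shape
    t = template.astype(float) - template.mean()
    best, best_rc = -1.0, None
    for r in range(predicted_rc[0] - search_half, predicted_rc[0] + search_half + 1):
        for c in range(predicted_rc[1] - search_half, predicted_rc[1] + search_half + 1):
            if r < 0 or c < 0 or r + h > I_new.shape[0] or c + w > I_new.shape[1]:
                continue
            p = I_new[r:r + h, c:c + w].astype(float)
            p -= p.mean()
            den = np.sqrt((t * t).sum() * (p * p).sum())
            corr = (t * p).sum() / den if den > 0 else 0.0
            if corr > best:
                best, best_rc = corr, (r, c)
    return best_rc if best >= corr_thr else None
```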
6 Motion from token displacements
Consider now the problem of estimating the camera
speed [u v w]^T given two frames at times t and t',
expressed in the initial camera-fixed frame. A relationship
between the camera displacements, the token locations
and the scene structure, i.e. the scene depth, can be
obtained by writing the camera perspective model
equations (2) for a generic scene point X_i, whose image
is [m_i n_i]^T, and for X'_i = X_i - [δx δy δz]^T, whose
image is [m'_i n'_i]^T, i.e. for the position of the point
in the camera-fixed frame in the two frames at times t and t'
in the case of pure translational motion described by
[δx δy δz]^T. For the generic token this yields, in terms
of the image motion field and of the camera speed, the equations

ṁ = ( m w - f u ) / Z,   ṅ = ( n w - f v ) / Z   (11)
with the general solution

[u v w]^T = -(Z/f) [ṁ ṅ 0]^T + (w/f) [m n f]^T   (12)
As discussed in [12], observing that [u v w]^T lies in
the span of [ṁ ṅ 0]^T and [m n f]^T, the following
equation holds:

[ f ṅ   -f ṁ   ṁ n - ṅ m ] [u v w]^T = 0   (13)
Thus, for N tracked tokens, the solution is given by
the eigenvector of A^T A associated with the minimum
eigenvalue in the overconstrained system

A [u v w]^T = 0,   A = [ f ṅ_1   -f ṁ_1   ṁ_1 n_1 - ṅ_1 m_1 ; ... ; f ṅ_N   -f ṁ_N   ṁ_N n_N - ṅ_N m_N ]   (14)

with the constraint u^2 + v^2 + w^2 = K^2 > 0.
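A possible implementation of the solution of (14), assuming the image motion field of the tracked tokens is available, is to take the right singular vector of A associated with its smallest singular value, which coincides with the eigenvector of A^T A of minimum eigenvalue; the sketch below uses illustrative names and normalizes the result to unit norm, i.e. K = 1.

```python
import numpy as np

def motion_direction_from_tokens(m, n, m_dot, n_dot, f):
    """Eqs. (13)-(14): translational motion [u v w]^T, up to scale, from the
    image coordinates and image motion field of N tracked tokens."""
    m, n, m_dot, n_dot = map(np.asarray, (m, n, m_dot, n_dot))
    A = np.column_stack([f * n_dot, -f * m_dot, m_dot * n - n_dot * m])
    _, _, Vt = np.linalg.svd(A)       # last row: eigenvector of A^T A with min eigenvalue
    uvw = Vt[-1]
    return uvw / np.linalg.norm(uvw)  # constraint u^2 + v^2 + w^2 = 1
```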
It is worth noting that the above-presented
conventional constrained optimization technique for
the computation of translational motion from token
displacements, where the constraint is the normalized
norm of the camera speed, neglects physical constraints
on the unknown parameters, e.g. the fact that the scene
depth is always positive. As discussed in [5] on the basis of
image sequences acquired by the Romeo ROV pilot
camera, this can lead to inconsistent solutions,
as, for instance, in the case of nonzero-mean radial
components of the token displacement errors, which
affect the estimate of the camera speed along the Z-axis
(this kind of biased error is typically induced by
the approximate compensation of the image
distortion introduced by a wide-angle lens). In particular,
referring to typical images acquired by advanced
ROVs for scientific research such as Romeo, in
addition to the constraint on the sign of the scene
depth, the following constraints on the scene
structure and motion dynamics are neglected by the
above-mentioned approach:
- the scene depth is quite smooth or, in many cases, constrained to a quite narrow range interval, due to the small area covered by the camera;
- the dynamics of the vehicle, and thus of the camera, is such that its motion with respect to the water is quite smooth and well modeled, with limited variations in speed between two consecutive image frames.
In addition, the memory of the estimated motion is
usually available, the depth of some image points
can be measured/estimated by combining video with
other devices (sonar and/or laser), i.e. in some sense
by locally structuring the environment, and an estimate
of the vehicle heave is usually available by integrating
depth measurements and vehicle dynamics. The
precision and robustness of the estimate can therefore be
improved by integrating physical constraints and/or
other sensor data in the process of estimating
motion from video sequences. The basic ideas in
this direction are: i) measuring/estimating the depth
of some points in the scene and solving a
constrained optimization problem where the
constraints are on the scene depth; ii) assuming that
an estimate of the vehicle heave is available; iii)
building a recursive filter of the vehicle speed in order
to exploit vehicle dynamics information.
In the case a 3D optical triangulation sensor of the
type presented in section 4 is available, the scene
depth Z can be directly measured and the vehicle
heave w can be obtained from the vehicle's vertical motion
estimator or, alternatively, approximated by the first
derivative of the scene depth. Introducing these
estimates, denoted Ẑ and ŵ, equation (11) becomes

u = ( ŵ m - Ẑ ṁ ) / f,   v = ( ŵ n - Ẑ ṅ ) / f   (15)
obtaining the LS estimate

[u v]^T = ( 1/(f N) ) Σ_{i=1..N} [ ŵ m_i - Ẑ ṁ_i   ŵ n_i - Ẑ ṅ_i ]^T   (16)
It is worth noting that a single tracked token is
sufficient to compute a (noisy) estimate of the
camera translational motion.
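A sketch of the estimate (16), assuming the triangulation module supplies the scene depth Ẑ and a heave estimate ŵ, could be (illustrative names):

```python
import numpy as np

def surge_sway_from_tokens(m, n, m_dot, n_dot, f, Z_hat, w_hat):
    """Eq. (16): LS estimate of (u, v) given the measured scene depth Z_hat
    and the heave estimate w_hat; inputs are arrays over the N tokens."""
    m, n, m_dot, n_dot = map(np.asarray, (m, n, m_dot, n_dot))
    N = len(m)
    u = np.sum(w_hat * m - Z_hat * m_dot) / (f * N)
    v = np.sum(w_hat * n - Z_hat * n_dot) / (f * N)
    return u, v
```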
In any case, if only a direct measurement of the
scene depth is available the overconstrained system
A [u v w]^T = b   (17)

where A is the 2N x 3 matrix built by stacking, for each
token i = 1..N, the rows [ -f  0  m_i ] and [ 0  -f  n_i ],
and b is the 2N vector built by stacking the corresponding
terms Ẑ ṁ_i and Ẑ ṅ_i,
can be written and solved with a LS algorithm.
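The system (17) can be assembled and solved in the least-squares sense, e.g. as in the following sketch (illustrative names):

```python
import numpy as np

def motion_from_tokens_and_depth(m, n, m_dot, n_dot, f, Z_hat):
    """Eq. (17): LS estimate of [u v w]^T from N tracked tokens and the
    measured scene depth Z_hat (two rows of A and b per token)."""
    m, n, m_dot, n_dot = map(np.asarray, (m, n, m_dot, n_dot))
    N = len(m)
    A = np.zeros((2 * N, 3))
    b = np.zeros(2 * N)
    A[0::2, 0] = -f
    A[0::2, 2] = m
    b[0::2] = Z_hat * m_dot
    A[1::2, 1] = -f
    A[1::2, 2] = n
    b[1::2] = Z_hat * n_dot
    uvw, *_ = np.linalg.lstsq(A, b, rcond=None)
    return uvw
```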
7 Experimental set-up and results
A preliminary set-up for evaluating the system
feasibility and the basic calibration procedures has been
assembled and tested in air. The selected camera is
the high sensitivity (0.8 lux F1.2, 50 IRE; 0.4 lux
F1.2, 30 IRE) Sony SSC-DC330 1/3" High
Resolution Exwave HAD CCD Color Camera
which features 480 TV lines of horizontal resolution
with 768H x 494V picture elements. Four laser
diodes, with rays perpendicular to the image plane,
were mounted in the corners of a 11 cm side square
aligned with the camera frame and symmetric with
respect to the image center. The system, calibrated
using 8 images with a scene depth in the interval
between 60 cm and 95 cm, showed a precision better
than 3 mm in the measurement of surface distance
in the range between 60 cm and 105 cm. After these
encouraging preliminary results, an underwater
version of the device has been assembled. The
camera has been mounted inside a suitable steel
canister, where the four laser diodes were rigidly
connected in the corners of a square of 14 cm side.
Due to scheduling needs of the Romeo ROV [7],
there was no time for an accurate in-water
calibration of the device, which was mounted in the
toolsled below the vehicle and tested at sea for
collecting data. A simplified calibration was
performed in order to evaluate the device
coefficient which links the surface of the image of
the laser spot quadrilateral to the square of the
image depth (see eqn. (8)). To this aim the scene
depth was measured by the ROV acoustic altimeter
with the vehicle operating at different altitudes over
a smooth flat seabed. Calibration values of
k^2 = 4132.8 [m^2 pixel^2] and f = 1194.9 [pixel] have been
computed. During at-sea trials, carried out in the
Ligurian Sea, Portofino Park area, with Romeo
operating at a depth of about 55 m, image
sequences have been recorded in DV format and
then acquired, in post-processing, with a sampling
rate of 5 Hz at the resolution of 360x272 pixels.
The red component of the RGB image has been
processed for laser spot and interest areas detection
and tracking. Laser spot locations in the image have
been detected and tracked by trivially searching the
pixel with maximum intensity in four suitable sub-
images (in the initialization phase this operation was
iterated four times on the whole image with
blackened areas in correspondence with the already
detected light spots). Results, obtained processing a
sequence of 582 images, i.e. 116.4 sec, are shown in
Figure 3, where the estimated image coordinates of
the laser spots are depicted, and in Figure 4 where
the corresponding estimated scene depth, i.e.
vehicle altitude, and its first derivative are plotted.
As far as token extraction and tracking are
concerned, in order to avoid clustered templates, 20
non-overlapping sub-images of 64 by 64 pixels have
been considered, and templates with a side of 24 pixels
have been extracted after processing each sub-image
with a sub-sampling interval equal to 4.
The search for new areas of interest was performed in
the sub-images where no other tokens or laser spots
had been detected. Only tokens with a local
variance higher than a threshold equal to 4.5 have
been considered, and token tracking failed when the
correlation was lower than 0.75. Since the trials have
been performed in an area characterized by a
smooth flat seabed, the vehicle's heave has been
approximated by the first derivative of the scene depth,
i.e. ŵ = dẐ/dt.
Figure 3. Estimated laser spot image coordinates:
m: solid line; and n: bold line.
Figure 4. Estimated scene depth (ROV altitude) and
its first derivative.
Results for the estimation of the vehicle's
translational motion according to (16) and (17) in
the above-mentioned image sequence, in the case of
no constraints on the number of tracked tokens and
of searched sub-images at each step, are shown in
figures 5 to 7. The estimate of the
vehicle's translational speed is smooth, with oscillations on
surge and sway which include the effects of small pitch
and roll displacements; these, in this case, are
not compensated by external direct measurements
and cannot be distinguished from small horizontal
displacements. It is worth noting that when the
number of tracked tokens is lower than 5 the
numerical stability of the estimate neglecting heave
information, i.e. (17), can dramatically decrease,
as pointed out by the spikes in Figures 5 and 7.
Figure 5. Estimated camera translational speed:
scene depth provided by optical triangulation.
Figure 6. Estimated camera speed: scene depth and
ROV heave are provided by triangulation.
Results obtained processing the same image
sequence, with constraints on the maximum number
of tracked tokens, show the presence of a higher
number of measurement outliers, in particular in the
case where only scene depth measurements are considered.
In order to have a clearer view of the sensor and
algorithm performance, the reader can download
at http://web.tiscali.it/maxcaccia/robotlab.htm some
mpg files showing the tracked tokens and laser
spots in the above-mentioned image sequence.
Figure 7. Difference in estimated camera
translational speed with/without heave data.
8 Conclusions
An optical triangulation-correlation sensor for
underwater vehicle motion estimation has been
designed, developed and preliminarily tested. Results,
obtained by post-processing image sequences
acquired by the instrument mounted below the
Romeo ROV in operating conditions, are quite
encouraging in terms of the satisfactory precision and
reliability of the optical triangulation sensor, the good
efficiency of the method for the extraction of areas
of interest and of the correlation-based token tracking
in an unstructured environment, and the smoothness of the
motion estimated from token displacements, i.e. the
signal seems to have a sufficient sampling rate and
small enough noise and delay to be used as feedback in a
control loop for station-keeping. In order to
implement a real-time application the system
requires some basic improvements in particular
regarding the intelligent search for new areas of
interest in the new regions of the image, i.e. those that
have just entered the camera field of view. This should
guarantee the availability at any time of a
sufficiently high number of tokens with a
constrained image processing time. In addition
current work is focusing on: the estimate of the
absolute position of the camera relative to a fixed
reference point (station point) by processing two
images, one at the station time and one at the last
sampling time, and the comparison with the position
obtained by integrating the velocity data computed
from interframe displacements; the
evaluation of calibration problems in integrating
image processing with data supplied by other
vehicle sensors and motion estimation modules,
e.g. pitch and roll angles and rates, heave, etc.; the
execution of extended sea trials to evaluate the
performance of the optical sensor for motion
estimation in various operating conditions; and the
employment of the data as feedback in Romeo's
guidance and control: in particular, the optical
triangulation sensor is a reliable, high-precision,
high-sampling-frequency, short-range altimeter
which could be used for bottom-following and auto-altitude
in near-bottom applications, while its
performance in the estimation of absolute position
and its application to station-keeping have to be
evaluated in the field.
Acknowledgements
This work has been partially funded by a 1999
CNR-NATO Grant "Settore Disciplinare per le
Scienze d'Ingegneria ed Architettura" under the
direction of Prof. A. Pascoal, Instituto Superior
Tecnico-Instituto de Sistemas e Robotica, Lisbon,
Portugal.
The author thanks all the staff of the CNR-IAN
Robotlab, directed by G. Veruggio, for their help in
device design and construction and in collecting and
acquiring Romeo's video data, and all the staff of
the IST-ISR Dynamical Systems and Ocean Robotics
Lab for their kind support and cooperation
during the author's stay in Fall 1999.
References
[1] Balasuriya B.A.A.P., M. Takai, W.C. Lam, T.
Ura, Y. Kuroda. Vision based autonomous
underwater vehicle navigation: underwater cable
tracking, Proc. of OCEANS '97, Vol. 2, pp.
1418-1424, (1997).
[2] Bono R., G. Bruzzone, M. Caccia, F. Grassia,
E. Spirandelli, G. Veruggio. ROBY goes to
Antarctica, Proc. of OCEANS '94, Vol. 3, pp.
621-625, (1994).
[3] Bono, R., Ga. Bruzzone, Gi. Bruzzone, M.
Caccia, E. Spirandelli, G. Veruggio. Romeo
goes to Antarctica, Proc. of OCEANS '98, Vol.
3, pp. 1568-1572, (1998).
[4] Bruzzone G. Notes on Internet-based satellite
tele-operation of the Romeo ROV in
Antarctica, CNR-IAN TR-2002-Rob-01,
(2002).
[5] Caccia M. Vision for estimating the slow
motion of unmanned underwater vehicles,
CNR-IAN NATO Grant Report, (1999).
[6] Caccia M., R. Bono, G. Bruzzone, G.
Veruggio. Bottom-following for remotely
operated vehicles, Proc. of 5th IFAC
Symposium on Manoeuvring and Control of
Marine Craft, pp. 251-256, Aalborg, Denmark,
(2000)
[7] Caccia M., R. Bono, G. Bruzzone, G.
Veruggio. Unmanned Underwater Vehicles for
scientific applications and robotics research: the
ROMEO Project, Marine Technology Society
Journal, Vol. 24, n. 2, pp. 3-17, (2000).
[8] Caccia M., R. Bono, Ga. Bruzzone, Gi.
Bruzzone, E. Spirandelli, G.Veruggio.
Integration and sea trials of ARAMIS with the
Romeo ROV, Proc. of Oceans 2001,
Honolulu, USA, (2001)
[9] Chen H.H., C.J. Lee. A simple underwater
video system for laser tracking, Proc. of
Oceans 2000, vol. 3, pp. 1543-1548, (2000).
[10] Clark J., A.K. Wallace, G.L. Pronzato.
Measuring range using a triangulation sensor
with variable geometry, IEEE Transactions on
Robotics and Automation, Vol. 14, no. 1, pp.
60-68, (1998).
[11] Fossen T.I. Guidance and Control of Ocean
Vehicles, John Wiley & Sons, England, (1994).
[12] Haralick R.M., L.G. Shapiro. Computer
and robot vision, Addison-Wesley, (1992).
[13] Horn, B.K.P. Robot vision, MIT Press,
Cambridge, USA, (1986).
[14] Marks R.L., H.H. Wang, M.J. Lee, S.M.
Rock. Automatic visual station keeping of an
underwater robot, Proc. of OCEANS '94, Vol.
2, pp. 137-142, (1994).
[15] Marks R.L., S.M. Rock, M.J. Lee. Real-
time video mosaicking of the ocean floor,
IEEE Journal of Oceanic Engineering, Vol. 20,
no. 3, pp. 229-241, (1995).
[16] Marques L., U. Nunes, A.T. de Almeida. A
new 3D optical triangulation sensor for
robotics, Proc. of 5th International Workshop
on Advanced Motion Control, pp. 512-517,
(1998).
[17] Misu, T., T. Hashimoto, K. Ninomiya.
Optical guidance for autonomous landing of
spacecraft, IEEE Transactions on aerospace
and electronic systems, vol. 35, no. 2, pp. 459-
473, (1999).
[18] Negahdaripour S. Revised definition of
optical flow: integration of radiometric and
geometric cues for dynamic scene analysis,
IEEE Transactions on Pattern Analysis and
Machine Intelligence, Vol. 20, no. 9, pp. 961-979, (1998).
[19] Xu X., S. Negahdaripour. Application of
extended covariance intersection principle for
mosaic-based optical positioning and navigation
of underwater vehicles, Proc. of IEEE Int.
Conference on Robotics and Automation, Vol.
3, pp. 2759-2766, (2001).
[20] Negahdaripour S., X. Xu, L. Jin. Direct
estimation of motion from seafloor images for
automatic station keeping of submersible
platforms, IEEE Journal of Oceanic
Engineering, vol. 24, no. 3, pp. 370-382,
(1999).
[21] Singh H., F. Weyer, J. Howland, A.
Duester, D. Yoerger, A. Bradley. Quantitative
stereo imaging from the Autonomous Benthic
Explorer (ABE), Proc. of OCEANS '99, Vol.
1, pp. 52-57, (1999).
[22] Wasielewski S., M.J. Aldon. Dynamic
vision for ROV stabilization, Proc. of
OCEANS '96, Vol. 3, pp. 1082-1087, (1996).
[23] Whitcomb L., D. Yoerger, H. Singh, J.
Howland. Advances in underwater robot
vehicles for deep ocean exploration: navigation,
control, and survey operations, Proc. of the 9th
Int. Symposium of Robotics Research,
Snowbird, USA, (1999).
[24] Zingaretti P., S.M. Zanoli. Robust real-time
detection of an underwater pipeline,
Engineering Applications of Artificial
Intelligence, vol. 11, pp. 257-268, (1998).