
2nd IFAC Workshop on Research, Education and Development of Unmanned Aerial Systems
November 20-22, 2013. Compiègne, France

Vision-Based Window Estimation for MAV in Unknown Urban Environments

Shuting Zhou, Gerardo Flores, Rogelio Lozano, Pedro Castillo

Heudiasyc UMR 6599 Laboratory, University of Technology of Compiègne, France (e-mail: shuting.zhou@hds.utc.fr).
Heudiasyc UMR 6599 Laboratory, UTC CNRS France and LAFMIA UMI 3175, Cinvestav, Mexico.

Abstract: This paper addresses the problem of window estimation for a Micro Air Vehicle (MAV) in unknown urban environments. The MAV is required to navigate from an initial outdoor position to a final position inside a building. This paper develops two vision-based methods using the information provided by the onboard vision system. To effectively identify the target and estimate the distance between the camera carrier and the target, a stereo camera system is first applied. In addition, we propose a second approach using a point cloud captured by an RGB-D camera.
Keywords: Computer vision; Window estimation; Stereo vision; Kinect; Point cloud
1. INTRODUCTION
Recently, micro air vehicles (MAVs) have appeared in an increasing number of applications due to their ability to fly in indoor and outdoor environments. The ability to fly at low speed, hover, fly laterally, explore terrains and acquire visual information in narrow spaces makes the MAV a highly suitable platform for tasks such as surveillance, reconnaissance, traffic monitoring and inspection in complex and dangerous environments, where manned or regular-sized aerial vehicles cannot accomplish these missions, even with their full operational capabilities.
To accomplish efficient exploratory navigation and to move among objects while avoiding collisions in cluttered environments, relative positioning is necessary for precise flight with respect to objects of interest. Vision-based positioning is well suited for this type of task, since it frees the system from relying on external positioning devices such as satellite-based GPS. Vision also allows autonomous helicopters to serve as intelligent eyes in the sky and provides a natural sensing modality for object detection and tracking.
As one of the applications of vision-based state estimation, vision-guided object detection and navigation of MAVs has been widely investigated. Relative pose estimation of an unmanned helicopter is studied in Xu et al. (2006), where the UAV must land on a moving target. Object positioning using a visual odometer for an autonomous helicopter is presented in Amidi et al. (1998); the odometer estimates the helicopter position by visually locking on to and tracking ground objects. Work on window tracking for one-dimensional visual control of an unmanned autonomous helicopter is
reported in Mejias et al. (2005). This vision-based feature tracking system was later combined with GPS positioning references to navigate the helicopter towards the detected features and track them, see Mejias et al. (2006).

This work is partially supported by the Institute for Science & Technology of Mexico City (ICyTDF).
In this paper, the MAV is required to accomplish the task of identifying a window and flying through it in order to enter a building. The fulfillment of this objective would be significant for various military and civil missions. This work presents two vision approaches developed for the real-time identification of a window model. The first vision-based strategy relies on stereo vision techniques. The advantage of applying stereo vision is that it can reconstruct the depth information of the scene more precisely than a single camera, which helps to guarantee safe navigation. The second vision algorithm is based on a structured-light RGB-D camera. Although RGB-D cameras are based on stereo techniques and share many properties with stereo cameras, they achieve a higher spatial density of depth data. Since RGB-D cameras illuminate the scene with a structured light pattern, they can estimate depth in areas with poor visual texture.
The remainder of this paper is organized as follows: Section 2 states the window estimation problem. Section 3 presents the window estimation strategy based on stereo vision techniques. The vision-based window detection algorithm using the RGB-D camera is introduced in Section 4. Section 5 compares the two approaches. Finally, Section 6 draws conclusions and gives perspectives on future work.
2. PROBLEM DEFINITION
Suppose that the MAV can arrive at an intermediate point outside the target window, provided by the GPS system. This scenario is shown in Fig. 1. Once the MAV has reached this point, the parameters of the target window model must be estimated so that the flight control system can minimize the distance between

the centroid of the target and the MAV center of gravity, then fly through the window and finally enter the building.

Fig. 1. Three-dimensional scenario of window model estimation problem


3. WINDOW ESTIMATION ALGORITHM BASED ON STEREO VISION

3.1 Algorithm description

Stereo-pair images obtained from two cameras placed in parallel are used to compute the three-dimensional (3D) world coordinates (in the camera coordinate system) of the desired object features. The 3D object reconstruction consists of five main steps: stereo calibration, rectification, stereo correspondence, acquisition of the depth image and extraction of the desired features. These steps are explained next.

Stereo calibration  As accurate calibration of the cameras is especially critical for applications that involve quantitative measurements and depth from stereoscopy, we start with the stereo calibration process using the Camera Calibration Toolbox for Matlab, which implements Bouguet's stereo calibration method (Bouguet and Perona (1998)). First, the two cameras are calibrated individually in order to improve the accuracy of the stereo calibration process. The camera calibration problem consists of computing the extrinsic and intrinsic camera parameters. The extrinsic parameters indicate the position and orientation of the camera with respect to the world coordinate system, while the intrinsic parameters characterize the inherent properties of the camera optics, including the focal length, the image centre, the image scaling factor and the lens distortion coefficients. Typically twelve parameters are found for each camera, and they serve as initial values for the stereo calibration. The stereo calibration is then executed to obtain the refined intrinsic parameters of the two cameras as well as the translation vector and rotation matrix between them.

Stereo rectification  Since it is much easier to compute the stereo disparity when the two image planes are exactly aligned, it is desirable to reproject the image planes of both cameras so that they reside in the same plane, with image rows perfectly aligned in a frontal parallel configuration. This is the goal of stereo rectification: the image rows of the two cameras are aligned after rectification, so that the subsequent processing is more reliable and computationally tractable. We apply Bouguet's algorithm to fulfill the stereo rectification task, see Bradski and Kaehler (2008).
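As a minimal illustration of how this step can be realized (our own sketch, not the authors' code), the OpenCV library cited above exposes Bouguet-style rectification directly; the intrinsics K1, K2, the distortion vectors d1, d2 and the extrinsics R, T are assumed to come from the stereo calibration described before:

#include <opencv2/opencv.hpp>

void rectifyPair(const cv::Mat& K1, const cv::Mat& d1,
                 const cv::Mat& K2, const cv::Mat& d2,
                 const cv::Mat& R,  const cv::Mat& T,
                 const cv::Mat& leftRaw, const cv::Mat& rightRaw,
                 cv::Mat& leftRect, cv::Mat& rightRect, cv::Mat& Q)
{
    cv::Size size = leftRaw.size();
    cv::Mat R1, R2, P1, P2;
    // Bouguet's algorithm: rectifying rotations, projections and the
    // disparity-to-depth reprojection matrix Q.
    cv::stereoRectify(K1, d1, K2, d2, size, R, T, R1, R2, P1, P2, Q);

    // Build the pixel remapping tables and warp both images so that
    // corresponding rows are aligned (frontal parallel configuration).
    cv::Mat map1x, map1y, map2x, map2y;
    cv::initUndistortRectifyMap(K1, d1, R1, P1, size, CV_32FC1, map1x, map1y);
    cv::initUndistortRectifyMap(K2, d2, R2, P2, size, CV_32FC1, map2x, map2y);
    cv::remap(leftRaw,  leftRect,  map1x, map1y, cv::INTER_LINEAR);
    cv::remap(rightRaw, rightRect, map2x, map2y, cv::INTER_LINEAR);
}

The matrix Q returned here is the reprojection matrix that can later be used to turn disparities into 3D coordinates.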
Stereo correspondence and acquisition of the depth image  The goal of this stage is to find the same physical point in the two camera views and match it to a 3D point. Given corresponding points in the two views, depth measurements can be derived from the triangulated disparities. We choose the block-matching stereo algorithm in Bradski and Kaehler (2008) to perform the stereo correspondence. The algorithm uses small sum-of-absolute-differences (SAD) windows to find matching points between the left and right rectified images, see Guevorkian et al. (2002).
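Again purely as an illustrative sketch under stated assumptions (a recent OpenCV release; the number of disparities and the SAD block size would have to be tuned for the actual rig), the block-matching and disparity-to-depth steps could look as follows:

#include <opencv2/opencv.hpp>

cv::Mat depthFromPair(const cv::Mat& leftRectGray, const cv::Mat& rightRectGray,
                      const cv::Mat& Q /* from cv::stereoRectify */)
{
    // SAD block matching on the rectified pair; parameter values are illustrative.
    cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(/*numDisparities=*/64,
                                                    /*blockSize=*/15);
    cv::Mat disp16;                       // fixed-point disparity (16 * d)
    bm->compute(leftRectGray, rightRectGray, disp16);

    cv::Mat disp;
    disp16.convertTo(disp, CV_32F, 1.0 / 16.0);

    // Reproject disparities to (X, Y, Z) coordinates in the camera frame,
    // using the 4x4 reprojection matrix Q produced by the rectification step.
    cv::Mat xyz;
    cv::reprojectImageTo3D(disp, xyz, Q, /*handleMissingValues=*/true);
    return xyz;
}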

Extraction of desired features  Our final goal is to extract the object of interest from the depth image. Since not all the desired features are reflected in the depth image, there are small gaps within the image, which lead to a discontinuous contour. We therefore use the connected domain processing techniques described in Bradski and Kaehler (2008) to patch these gaps and obtain the complete contour.
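One possible way to realize this gap filling, sketched below with OpenCV as an illustration only (the paper's connected domain processing may differ, and the threshold and kernel size are assumptions), is a morphological closing of the binarized depth image followed by contour extraction:

#include <opencv2/opencv.hpp>
#include <vector>

std::vector<std::vector<cv::Point>> closedContours(const cv::Mat& depth8u)
{
    // Keep only the pixels belonging to the window frame (illustrative threshold).
    cv::Mat mask;
    cv::threshold(depth8u, mask, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    // Morphological closing bridges the small gaps so the contour becomes connected.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5));
    cv::morphologyEx(mask, mask, cv::MORPH_CLOSE, kernel);

    // Extract the (now continuous) outer contours of the frame.
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    return contours;
}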

3.2 Experimental setup and result

A stereo rig has been constructed for the purpose of performing stereo imaging studies. Two uEye UI-1226LE-M-G cameras from IDS GmbH (IDS (2011)) are installed with a separation of 35 cm between them, as shown in Fig. 2.

Before applying the vision strategy in the real-time navigation system, some simulations are conducted. To simplify the problem, the window model is designed to be simple and distinctive, which eases its localization, feature extraction and matching. The designed pattern is a square with a black edge on a white wall, where the black border represents the target window frame.

Fig. 2. The stereo rig: uEye UI-1226LE-M-G cameras installed with a separation of 35 cm
Simulations are carried out by placing the target in front of the stereo rig. Fig. 3 shows the row-aligned images after stereo rectification. The depth image of the window frame is then obtained from the disparities between corresponding points in the two rectified images, see Fig. 4.
By restoring the small gaps in the contour of Fig. 4, the depth image with a connected contour is presented in Fig. 5. Finally, in Fig. 6 the Canny edge detector (Canny (1983)) is used to extract the contour of the target window frame from the depth image, so that the three-dimensional coordinates of any point on the window frame can be estimated. The whole experiment takes about 200 ms, which is suitable for the real-time navigation task.
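For reference (this is standard stereo triangulation geometry, not a result specific to this paper), once the images are rectified the 3D coordinates of a matched window-frame pixel (u, v) with disparity d = x_l - x_r follow from the focal length f (in pixels), the baseline b (35 cm for this rig) and the principal point (c_x, c_y):

Z = f b / d,    X = (u - c_x) Z / f,    Y = (v - c_y) Z / f.

This reprojection is exactly what the Q matrix produced during rectification encodes.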

Fig. 3. Row-aligned images after stereo rectification

Fig. 4. Depth image of the object of interest

Fig. 5. Depth image with connected contour

Fig. 6. Edge of the target window frame extracted from the depth image

4. WINDOW ESTIMATION ALGORITHM USING RGB-D CAMERA

As an RGB-D camera captures RGB color images together with the corresponding depth information at each pixel, it is convenient for the MAV to acquire sufficient visual information and locate the target. Such cameras manage all the information as a set of vertices in a three-dimensional coordinate system called a point cloud. A point cloud is a collection of massive points in the same spatial reference system, which expresses the spatial distribution and the surface properties of the target. The three-dimensional coordinates of each sample point on the target surface (x, y, z) and the corresponding color information can be acquired from the point cloud. We take full advantage of the programs available in the open-source Point Cloud Library (Rusu and Cousins (2011)) to improve the efficiency of the window model estimation process.

4.1 Algorithm description

For a fixed-size window with known features (window frame color), the task requires the detection and extraction of the desired features in order to locate the target window precisely. The feature detection algorithm consists of three stages: segmentation, extraction of characteristic points and estimation of the window centroid, which are explained below.

Segmentation  The purpose of this stage is to extract the color that characterizes the object of interest. According to the given color information of the features, which in this case corresponds to the black window frame, the segmentation process filters the input RGB color image to obtain the image coordinates of the window frame region; these coordinates are combined with the corresponding depth data to form the input point cloud. The formulation of the segmentation process is presented in Table 1.
Table 1. Segmentation algorithm

Algorithm 1 Segmentation()
  for i from 0 to N_r, j from 0 to N_c
    if I(i, j)_r < 50 && I(i, j)_g < 50 && I(i, j)_b < 50
      cloud.points.push_back(i, j, I(i, j)_d)
    end if
  end for
  return cloud

where I is the RGB image with components I_r, I_g and I_b; N_r and N_c are the numbers of rows and columns of the RGB image; and I_d is the depth image.
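A minimal C++ sketch of this segmentation stage is given below, purely as an illustration; it mirrors Algorithm 1 directly, while the concrete container (pcl::PointCloud from the Point Cloud Library cited above) and the OpenCV image types are our assumptions rather than details stated in the paper:

#include <opencv2/opencv.hpp>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

// Builds the input point cloud from an RGB image (CV_8UC3, BGR order) and its
// registered depth image (CV_32FC1), keeping only dark pixels (the black frame).
pcl::PointCloud<pcl::PointXYZ> segmentWindowFrame(const cv::Mat& rgb,
                                                  const cv::Mat& depth)
{
    pcl::PointCloud<pcl::PointXYZ> cloud;
    for (int i = 0; i < rgb.rows; ++i) {
        for (int j = 0; j < rgb.cols; ++j) {
            const cv::Vec3b& px = rgb.at<cv::Vec3b>(i, j);   // B, G, R
            if (px[2] < 50 && px[1] < 50 && px[0] < 50) {    // R, G, B all below 50
                // Image coordinates (i, j) plus depth, exactly as in Algorithm 1.
                cloud.push_back(pcl::PointXYZ(static_cast<float>(i),
                                              static_cast<float>(j),
                                              depth.at<float>(i, j)));
            }
        }
    }
    return cloud;
}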
Extraction of characteristic points  This stage is intended to identify the geometric features and extract the key points that approximately represent the four vertices of the extracted window frame. We apply the Normal Aligned Radial Feature (NARF) method of Steder et al. (2010) for interest point detection and feature descriptor calculation in 3D range data. As the NARF method takes information about borders and the surface structure into consideration, the centroid of the target window frame can be estimated from the edge information extracted by the NARF detection algorithm.

How the key points are extracted from the input point cloud is described in Table 2.
Table 2. Key points extraction algorithm

Algorithm 2 ExtractKeyPoints(cloud)
  range_image = CreateRangeImage(cloud)
  points[] = DetectNarfKeypoints(range_image)
  return points[]
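The sketch below shows how this step might look with the Point Cloud Library cited above; it is an illustrative sketch rather than the authors' implementation, and the angular resolution and support size are assumed values that would need tuning:

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/common/angles.h>
#include <pcl/range_image/range_image.h>
#include <pcl/features/range_image_border_extractor.h>
#include <pcl/keypoints/narf_keypoint.h>

// Converts the segmented cloud into a range image and detects NARF key points.
pcl::PointCloud<int> detectNarfKeypoints(const pcl::PointCloud<pcl::PointXYZ>& cloud)
{
    pcl::RangeImage range_image;
    range_image.createFromPointCloud(cloud,
                                     pcl::deg2rad(0.5f),                 // angular resolution (assumed)
                                     pcl::deg2rad(360.0f), pcl::deg2rad(180.0f),
                                     Eigen::Affine3f::Identity(),
                                     pcl::RangeImage::CAMERA_FRAME,
                                     0.0f, 0.0f, 1);

    // NARF relies on border information, so a border extractor feeds the detector.
    pcl::RangeImageBorderExtractor border_extractor;
    pcl::NarfKeypoint detector(&border_extractor);
    detector.setRangeImage(&range_image);
    detector.getParameters().support_size = 0.2f;                        // meters (assumed)

    pcl::PointCloud<int> keypoint_indices;                               // indices into range_image
    detector.compute(keypoint_indices);
    return keypoint_indices;
}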


Estimation of window centroid  With the four estimated vertices of the window frame, the three-dimensional coordinates of the window centroid in the camera coordinate system are easily obtained by intersecting the diagonals defined by the four key points. The relative position of the target window is thereby pinpointed.
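As a small illustration (our own simplification, not code from the paper): for a window whose four detected vertices form an approximate parallelogram, the two diagonals intersect at the mean of the four vertices, so the centroid estimate reduces to an average:

#include <pcl/point_types.h>

// Centroid of the window as the intersection of the diagonals; for a
// (near-)parallelogram this coincides with the mean of the four vertices.
pcl::PointXYZ windowCentroid(const pcl::PointXYZ v[4])
{
    pcl::PointXYZ c;
    c.x = (v[0].x + v[1].x + v[2].x + v[3].x) / 4.0f;
    c.y = (v[0].y + v[1].y + v[2].y + v[3].y) / 4.0f;
    c.z = (v[0].z + v[1].z + v[2].z + v[3].z) / 4.0f;
    return c;
}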

4.2 Experimental setup


In order to implement our strategy, we use a Microsoft Kinect sensor, see Fig. 7. A low-cost RGB-D sensor developed by PrimeSense, the Kinect is equipped with three lenses: the lens in the middle is the RGB color camera, while the lenses on the left and right sides are the infrared transmitter and the infrared CMOS camera, which together constitute a 3D structured-light depth sensor.

Fig. 8. Three-dimensional point cloud of the object of interest

Based on light coding, the Kinect projects a known infrared pattern onto the scene and determines depth from the deformation of this pattern captured by the infrared CMOS imager, see Xiang et al. (2011). Functioning in this way, the Kinect provides a 320x240 depth image at 30 fps and a 640x480 RGB color image at 30 fps. When stripped down to its essential components, the Kinect weighs 115 g, light enough to be carried by a MAV.

Fig. 7. Microsoft Kinect sensor developed by PrimeSense


4.3 Experiments


In this subsection, several tests have been carried out using only the Kinect sensor. The execution of the vision-based window estimation algorithm takes about 800 ms, which meets the requirements of the MAV real-time navigation task. The distances between the Kinect and the target window plane are 0.78 m and 0.59 m, respectively. Fig. 8 shows the input point clouds representing the object of interest, which in this case is the target window frame. Fig. 9 shows the four characteristic points extracted from the range image of the target window frame, which can approximately be regarded as the four vertices of the window.

Fig. 9. Characteristic points extracted from the point cloud


5. COMPARISON AND DISCUSSION
Regarding performance, both algorithms recover the three-dimensional coordinates of the object of interest rapidly and efficiently, and both can be applied in the real-time navigation system of a MAV.

The advantage of the vision algorithm using the RGB-D camera lies in the more direct and easier reconstruction of depth information. Furthermore, the Kinect achieves satisfactory performance even in areas with poor visual texture, since it projects an infrared light pattern onto the scene. A disadvantage is the limited range of its depth sensor (from 0.8 m to 4.0 m), which restricts the navigation capability when using the Kinect. Conversely, the stereo vision algorithm achieves a longer working range (proportional to the separation of the two cameras). Both vision systems are affected by light intensity, since overly strong illumination impairs their performance.
6. CONCLUSION
In this paper, we have presented two vision-based approaches that will be used for the window estimation problem of a MAV, together with some preliminary experimental results to verify the vision algorithms.

Future work will focus on improving the vision-based approaches to make them reliable and robust for the identification of realistic window models, and on combining the vision algorithms with the onboard controller to complete the navigation system of the MAV.
REFERENCES
IDS (2011). IDS Imaging Development Systems GmbH. URL http://www.ids-imaging.com/.
Amidi, O., Kanade, T., and Fujita, K. (1998). A visual odometer for autonomous helicopter flight. Robotics and Autonomous Systems, 185-193.
Bouguet, J. and Perona, P. (1998). Camera calibration from points and lines in dual-space geometry. European Conference on Computer Vision.
Bradski, G. and Kaehler, A. (2008). Learning OpenCV. O'Reilly Media.
Canny, J.F. (1983). Finding edges and lines in images. M.I.T. Artificial Intell. Lab.
Guevorkian, D., Launiainen, A., Liuha, P., and Lappalainen, V. (2002). Architectures for the sum of absolute differences operation. Signal Processing Systems.
Mejias, L., Saripalli, S., Campoy, P., and Sukhatme, G.S. (2006). Visual servoing of an autonomous helicopter in urban areas using feature tracking. Journal of Field Robotics, 185-199.
Mejias, L., Saripalli, S., Sukhatme, G., and Campoy, P. (2005). Detection and tracking of external features in an urban environment using an autonomous helicopter. Robotics and Automation, 3972-3977.
Rusu, R.B. and Cousins, S. (2011). 3D is here: Point Cloud Library (PCL). IEEE International Conference on Robotics and Automation (ICRA).
Steder, B., Rusu, R., Konolige, K., and Burgard, W. (2010). NARF: 3D range image features for object recognition.
Xiang, X., Pan, Z., and Tong, J. (2011). Depth camera in computer vision and computer graphics. Journal of Frontiers of Computer Science and Technology.
Xu, C., Qiu, L., Liu, M., and Kong, B. (2006). Stereo vision based relative pose and motion estimation for unmanned helicopter landing. International Conference on Information Acquisition.
