
Proceedings of the 2007 IEEE International Conference on Robotics and Biomimetics, December 15-18, 2007, Sanya, China

Towards Intelligent Autonomous Vision Systems: Smart Image Processing for Robotic Applications
Andreas Maeder, Hannes Bistry, Jianwei Zhang
University of Hamburg, Institute TAMS, Department of Informatics
Vogt-Koelln-Str. 30, 22527 Hamburg, Germany
maeder | bistry | zhang@informatik.uni-hamburg.de
http://tams-www.informatik.uni-hamburg.de
Abstract: Vision-based sensors are a key component of robot systems, where many tasks depend on image data. Real-time constraints of the control tasks bind a large amount of processing power to a single sensor modality. Dedicated and distributed processing resources are the natural solution to overcome this limitation. This paper presents first experiments using embedded processors as well as dedicated hardware to perform various image (pre)processing tasks. Architectural concepts and requirements for smart vision systems have been derived.

Index Terms: Service Robotics, Computer Vision, System Design, Embedded Control

I. INTRODUCTION

All studies predict an exponential growth of the market for personal and service robots in the near future [1]. In contrast to manufacturing robot systems, these autonomous robots have to interact with a highly dynamic environment. Advanced sensor technology, sensor fusion and new modalities are therefore the keys to dynamic interaction. Vision is one of the most sophisticated perceptual capabilities of humans and animals: based on our sense of vision we are able to localize ourselves in the environment, recognize objects and other persons, control our movements and avoid collisions. It would be desirable to implement such vision capabilities for service robots, too.

II. PROBLEM

The challenge in integrating vision systems into service robots is the high computational effort that current image processing algorithms generate. Besides the complexity of the algorithms, the actual effort depends on the resolution of the image data, the color depth and the number of frames processed each second. Only high-quality image data leads to precise information about the environment, and a high frame rate is needed to keep the image data up to date, so that the system can react to changing conditions as quickly as possible. On the other hand, the resources on a mobile system are limited: there are several sensors and actuators that have to be controlled.

A. Example: Service Robot

Figure 1 shows the current configuration of our service robot TASER [2], [3]. To acquire image data, several camera systems are installed, serving different subsystems of the platform:

- An omnidirectional vision system used for localisation and global vision tasks.
- A stereo camera system mounted on a pan-tilt unit, performing detailed three-dimensional vision for interaction with users and for handling and manipulating objects with arms and hands.
- Two cameras mounted on the hands to monitor and control grasps.

The platform is controlled by standard PC hardware, as sketched in figure 1. Because of the limited processing capabilities, a scenario-based strategy sequentializes the different tasks in such a way that only a few subsystems are active while most of the hardware stays idle. Exploration, localisation and movement of the robot are based on odometric and laser range data; only the omnidirectional vision system is used, outside the real-time control loop, for localisation tasks. The stereo camera system is active when the immobile robot is operating its arms or interacting with the user, e.g. for recognition tasks, gesture and gaze detection, etc. The hand cameras are needed to grasp objects, guiding the approach of the arm and visually controlling the hand. Only with this workaround can a fast system control loop be guaranteed. Sensor-fusion techniques, such as merging odometric data, laser range data and the data of the vision systems to further enhance the localisation and positioning abilities, can only be achieved with continuous image processing in real time.

III. SOLUTION

A distributed processing approach using smart sensors could overcome these problems of centralized computing resources, as stated in [6]. This holds especially for vision, where high data rates in conjunction with the computational complexity of most image processing algorithms impose a high system load on the control hardware of the robotic platform.


Fig. 1. I/O-channels of the service robot

A smart vision system should be able to perform autonomous image (pre)processing, covering some or all of the tasks within the perception-action cycle:

1. Image enhancement: normalisation of contrast and color, calibration
2. Preprocessing: filter and morphological operators
3. Sensor data fusion: e.g. 3-D processing
4. Feature extraction: segmentation, higher-level operators
5. Classification and image analysis

Several so-called smart cameras have been introduced by companies offering computer vision products. They contain an embedded processor which can do some (limited) preprocessing within the first two levels. The approach presented here is much more general, since the proposed vision system should be able to abstract from images at high data rates and generate (significantly less) feature data. Furthermore, an autonomous vision system permits the implementation of direct control loops, similar to reflex actions in biological systems. The desired system should combine the flexibility of a software-programmed processor with the performance of dedicated hardware units to meet real-time conditions. One major design objective is to offer an experimental prototyping environment which allows both hardware and software to be used in a flexible manner. Image processing tasks should be transferred from the control PC of the service robot to the camera-based hardware system, ranging from very simple image enhancement up to extended feature extraction. As service robotics advances, the vision system should also accommodate new algorithms as well as new control strategies.
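As an example of the "very simple image enhancement" end of this range, the minimal sketch below performs a linear contrast stretch on an 8-bit grayscale frame, the kind of level-1 operation a smart vision system could take over from the control PC. The buffer layout and the function name are illustrative assumptions, not part of any of the systems described in this paper.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Linear contrast stretch on an 8-bit grayscale frame (level 1:
// image enhancement).  Buffer layout and names are illustrative only.
void stretchContrast(std::vector<uint8_t>& pixels)
{
    if (pixels.empty()) return;
    auto [minIt, maxIt] = std::minmax_element(pixels.begin(), pixels.end());
    const int lo = *minIt, hi = *maxIt;
    if (hi == lo) return;                      // flat image, nothing to do
    for (uint8_t& p : pixels)
        p = static_cast<uint8_t>((p - lo) * 255 / (hi - lo));
}
```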

IV. SYSTEM DESIGN

Two prototyping systems have been examined to evaluate the specifics of the architectural alternatives. They will help to define interfaces to the software components of the service robot and to derive architectural requirements for a novel vision system comprising both hardware and software. In the remaining sections of the paper the prototyping systems are introduced and the new vision subsystem is presented.

A. Possible Approaches

Several implementations would provide the functionality of an intelligent vision system, each inheriting specific advantages and drawbacks:

- Embedded processors, as used in smart cameras, provide the flexibility of a software-controlled system together with ease of use. Because they transfer a PC-like environment, consisting of the hardware paradigm (von Neumann model), the programming toolchain and even the operating system, into the camera housing, they also have similar problems: the hardware architecture is not suited to performing streaming operations, and the computational complexity of advanced image processing algorithms is too high to run them under real-time conditions.


- Signal processors provide very high data throughput, but efficient programming of their architectures is quite difficult. The desired flexibility of the vision system in implementing new algorithms cannot be guaranteed.
- Dedicated hardware offers even higher-performance solutions, but is even more difficult to handle, since the realisation of new features requires hardware design cycles. The flexibility of the system can only be achieved using FPGAs as a re-programmable hardware pool.

The hardware solutions mentioned above also enforce different system design methodologies, namely the cross-compilation of code, the adaptation of algorithms to different programming paradigms and the (manual) transfer to hardware structures using HDLs (VHDL, SystemC or SystemVerilog) and hardware-software codesign techniques. We propose a combination of these approaches, using customized hardware units or signal processors for stream data processing in real time and flexible software solutions for the control implementation.

V. FPGA-BASED SYSTEM PROTOTYPE

First experiments [5] used an FPGA prototyping board from Altera, which is well suited to hardware-software codesign, since both parts can be implemented easily:

- Hardware is designed using VHDL. Subsequent synthesis and mapping steps generate the data to be downloaded to the FPGA. The board already contains the SRAM and flash memory needed for the embedded processor. A library of parametrizable components enables easy integration of memories, FIFOs, arithmetic units or interface circuits into the HDL design; their implementation is optimized for the FPGA family. Different memory components such as FIFOs, (RAM-based) shift registers and ROMs are an integral part of the overall system architecture, sketched in figures 2 and 3.
- Software is developed for an IP (Intellectual Property) soft core. For software development the GnuPro toolkit provides a C cross compiler, which is adapted to the configuration options of the processor. Console I/O and debugging are handled through a serial interface.

This prototype board serves as the computational resource for the omnidirectional camera and the cameras of the stereo vision system of the service robot. Both camera systems transfer their images at a 400 Mbps isochronous data rate via the IEEE 1394 bus (FireWire). Since the FPGA prototype board is not suitable for interacting with the FireWire bus at these data rates (with differential signals), an extra connection board has been designed.

A. Specific results

Exploring the design space, the FPGA-based system has served as an architectural workbench for evaluating design alternatives and specifying requirements of the overall system. Time-critical parts are implemented as hardware units to ensure the necessary data throughput, while control-driven algorithmic parts are implemented in software running on the processor.

Several tasks of the vision system for the service robot have been implemented. Block-level schematics for two of the designs are sketched in figures 2 and 3.

B. Automatic focus

An autonomous automatic focus is the first application for the vision subsystem board, enhancing the quality of the images; a sample software implementation of an automatic focus already causes a processor load of about 30 % on the robot. This task belongs to the simple class of control applications, where the subsystem listens to the isochronous data on the FireWire bus, computes control values for the camera and sends these values as asynchronous FireWire commands.

1) Hardware implementation: One central component of all designs is a state machine interacting with the FireWire board: FireWire Control. It initializes the external board during power-up and after IEEE 1394 bus identification, and handles all data transfer to and from the link layer circuit. Data received on the FireWire bus is collected from the internal buffer of the link layer chip, analysed and distributed to either the asynchronous data stream (figures 2 and 3: blue) or the isochronous (image) data stream (red). Due to the YUV 4:2:2 data format, two adjacent pixel values are transferred as one FireWire quadlet (32 bit); the arithmetic pipeline (AF-Pipeline) therefore handles both pixels during one cycle. The pipeline itself has three stages, activated by a controller when the image data belongs to the central focus region. The CPU implements the higher levels of the IEEE 1394 protocol, computes the control algorithm for the automatic focus and is used for debugging and control through the serial port. A two-stage algorithm controls the camera, running global and local optimization steps based on the sharpness measure computed within the pipeline.

C. Programmable digital filter

To extend the architecture so that it can also send an isochronous FireWire data stream, the state machine connected to the FireWire board (FireWire Control IO) has to be changed. Dealing with (burst) data streams in both directions requires proper scheduling together with additional buffers, to ensure that neither the internal nor the external buffers (link layer chip) overflow. The FPGA prototype gives the flexibility to tune the parameters and to investigate different scheduling strategies hardwired in the controller. A prototype version based on the previous design, using only 3x3 windows and simple fixed-point arithmetic (division as shift), has been implemented.
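As an illustration of such a window operation and the "division as shift" arithmetic, the sketch below applies a 3x3 binomial smoothing kernel whose weights sum to 16, so the normalisation reduces to a right shift by four. It is a software model under our own assumptions about buffer layout and naming; the FPGA version streams pixels through a pipeline instead of iterating over a frame buffer.

```cpp
#include <cstdint>
#include <vector>

// 3x3 binomial smoothing filter on an 8-bit grayscale frame.
// The kernel weights sum to 16, so the division reduces to a right
// shift by 4 -- the kind of fixed-point shortcut ("division as shift")
// mentioned for the FPGA prototype.  Layout and names are illustrative.
std::vector<uint8_t> filter3x3(const std::vector<uint8_t>& in, int w, int h)
{
    static const int k[3][3] = { {1, 2, 1}, {2, 4, 2}, {1, 2, 1} };   // sum = 16
    std::vector<uint8_t> out(in);                 // borders keep original values
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            int acc = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    acc += k[dy + 1][dx + 1] * in[(y + dy) * w + (x + dx)];
            out[y * w + x] = static_cast<uint8_t>(acc >> 4);          // divide by 16
        }
    return out;
}
```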


Fig. 2. System architecture for the automatic focus

D. Omnidirectional to panoramic image conversion

The conversion of the high-resolution omnidirectional data into panoramic images, needed by different localisation algorithms of the service robot, has also been implemented, resulting in the architecture shown in figure 3. Since each image is quite large (2.4 MB), extra memory implementing an input and an output image buffer was added to the system via the SO-DIMM connector of the prototype board. A control unit (SDRAM Control) interfaces the external memory, generating the access cycles for the dynamic RAM; a simple, request-based interface connects this controller with the main processing unit. To deal with the continuous FireWire input, the memory is divided into logical image areas. Bank-switching techniques during address generation are used to interchange these images whenever processing has finished or an isochronous transfer is complete. The computation itself is implemented completely in hardware, in the processing unit. This component contains a control component and two specialized data paths:

- Address generation: for a given address of the panoramic image it computes the corresponding pixel within the omnidirectional image (a software model of this mapping is sketched below). Two additional ROMs serve as lookup tables for sine and cosine. To speed up memory access, an extra RAM is used as a cache, since several subsequent addresses fall into the same target area.
- Pixel interpolation: to enhance the image quality, an interpolation scheme has been implemented. When addresses do not match exactly, a pixel value is computed from its neighbours in the omnidirectional image.
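The address generation can be read as a polar-to-Cartesian mapping: each column of the panoramic image corresponds to an angle, each row to a radius around the mirror centre. The sketch below is a software model of that mapping with bilinear interpolation; the hardware design replaces the sine/cosine calls by the ROM lookup tables and adds the RAM cache described above. The centre and radius parameters are placeholders, since the actual mirror geometry is not given in the paper.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Software model of the omnidirectional-to-panoramic address generation
// with bilinear interpolation.  Centre and radii are illustrative; the
// hardware uses sine/cosine ROMs instead of std::sin/std::cos per pixel.
std::vector<uint8_t> unwrap(const std::vector<uint8_t>& omni, int srcW, int srcH,
                            float cx, float cy, float rInner, float rOuter,
                            int panoW, int panoH)
{
    constexpr float kPi = 3.14159265f;
    std::vector<uint8_t> pano(static_cast<size_t>(panoW) * panoH);
    for (int v = 0; v < panoH; ++v) {
        // panoramic row -> radius on the omnidirectional image
        float r = rOuter - (rOuter - rInner) * v / (panoH - 1);
        for (int u = 0; u < panoW; ++u) {
            float theta = 2.0f * kPi * u / panoW;        // column -> angle
            float x = cx + r * std::cos(theta);
            float y = cy + r * std::sin(theta);
            int x0 = static_cast<int>(x), y0 = static_cast<int>(y);
            if (x0 < 0 || y0 < 0 || x0 + 1 >= srcW || y0 + 1 >= srcH) continue;
            float fx = x - x0, fy = y - y0;
            // bilinear interpolation over the four neighbouring source pixels
            float p = (1 - fx) * (1 - fy) * omni[y0 * srcW + x0]
                    + fx       * (1 - fy) * omni[y0 * srcW + x0 + 1]
                    + (1 - fx) * fy       * omni[(y0 + 1) * srcW + x0]
                    + fx       * fy       * omni[(y0 + 1) * srcW + x0 + 1];
            pano[v * panoW + u] = static_cast<uint8_t>(p + 0.5f);
        }
    }
    return pano;
}
```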

VI. SMART-CAMERA

The second prototype examined a commercial smart camera, the Basler eXcite exA1390-19c [4], to be integrated into the service robot.

A. Hardware

The camera contains a 1 GHz MIPS processor with 128 MB of RAM and 128 MB of flash ROM; its performance is comparable to an Intel Pentium III at 800 MHz. The system runs an adapted Linux with a 2.6 kernel, and communication is done via Gigabit Ethernet. The camera has a resolution of 1388x1038 pixels; in continuous-shot mode it can take 18 pictures per second.
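To see why on-camera processing matters at this resolution, a back-of-the-envelope calculation of the raw stream is useful. The snippet below assumes 16 bits per pixel (YUV 4:2:2); the actual transfer format of the eXcite camera is not stated in the paper.

```cpp
#include <cstdio>

// Rough data rate of the uncompressed camera stream.
// 16 bits per pixel (YUV 4:2:2) is an assumption, not a camera spec.
int main()
{
    const double width = 1388, height = 1038, fps = 18, bytesPerPixel = 2;
    const double bytesPerSecond = width * height * bytesPerPixel * fps;
    std::printf("raw stream: %.1f MB/s (%.0f Mbit/s)\n",
                bytesPerSecond / 1e6, bytesPerSecond * 8 / 1e6);
    // Roughly 52 MB/s, i.e. about 415 Mbit/s -- a large fraction of the
    // Gigabit Ethernet link before any processed data is added, which is
    // one argument for feature extraction on the camera itself.
    return 0;
}
```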


Fig. 3. System architecture for the image conversion

B. Experiments

Since the camera runs a standard operating system, we were able to reuse a large set of existing image processing functions and algorithms. We developed a distributed software system to control the processing functions of many cameras from a single instance; the setup of the connection is shown in figure 4. Transfer of image data and processed data is done within the GStreamer framework. This open-source multimedia framework can be regarded as a Linux equivalent of DirectShow. Many functions needed for this application are already implemented in GStreamer, such as format conversion, image resizing, encoding, decoding, timing and network data transmission. The GStreamer framework is plugin-based, so its functionality can be expanded by writing new plugins. Plugins are connected into a processing chain, so that many operations can be run consecutively on the image data; a plugin for the integration of the image data of the Basler camera has been developed. An example of a possible image processing strategy running directly on one smart camera, incorporating preprocessed image data of a second smart camera, is shown in figure 5.
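A minimal sketch of such a processing chain, built from C++ through GStreamer's C API, is shown below. It is written against the current GStreamer 1.x API, which differs in element and caps names from the 0.10 release available at the time of the paper, and it uses the stock videotestsrc element as a stand-in for the (not publicly released) Basler camera source plugin; scaling, JPEG encoding and TCP transmission are standard GStreamer elements.

```cpp
#include <gst/gst.h>

// Minimal GStreamer chain: source -> scale -> JPEG encode -> network sink.
// 'videotestsrc' stands in for the Basler camera source plugin; all other
// elements are stock GStreamer plugins.
int main(int argc, char** argv)
{
    gst_init(&argc, &argv);

    GError* err = nullptr;
    GstElement* pipeline = gst_parse_launch(
        "videotestsrc ! videoconvert ! videoscale "
        "! video/x-raw,width=640,height=480 "
        "! jpegenc ! tcpserversink host=0.0.0.0 port=5000",
        &err);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", err->message);
        g_error_free(err);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    // Block until an error occurs or the stream ends.
    GstBus* bus = gst_element_get_bus(pipeline);
    GstMessage* msg = gst_bus_timed_pop_filtered(
        bus, GST_CLOCK_TIME_NONE,
        (GstMessageType)(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));

    if (msg) gst_message_unref(msg);
    gst_object_unref(bus);
    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(pipeline);
    return 0;
}
```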

C. Specific results

The aim of the experiments was to examine appropriate software architectures to be used in conjunction with smart vision systems. Due to its streaming nature and built-in capabilities for network transfer, the GStreamer framework provides an ideal environment for migrating image processing algorithms from the robot's control PC to the vision system hardware. Using this software environment, the experiments yielded two results:

- Simple image processing tasks can be transferred to the processor of the camera, but implementing larger parts of the streaming chain leads to a computational bottleneck on the camera. Complex operations, like the one sketched in figure 5, cannot be done in real time.
- The transfer of preprocessed images puts a high system load on both the camera and the control PC. As a consequence, the amount of data to be transferred must be reduced; this can only be achieved by performing the complete feature extraction on the vision system itself.

VII. RESULTS

Summing up the previous experiments, a novel architecture must provide the flexibility of a software system and the performance of signal processors or data paths implemented in dedicated hardware, all integrated into a framework which allows the implementation of vision algorithms independently of the underlying hardware resources. Funded by the German government (BMBF) and in cooperation with the company Basler, we are investigating appropriate architectures, consisting of a combination of standard processors and hardware-enhanced processing units. Using different sets of libraries, software programming, signal-processor-based realisations and hardware mapping should be transparent to the end user.


Fig. 4. Connection between TASER and the Smart-Camera

Fig. 5. Setup where image data is processed on two Smart-Cameras and transmitted to the control PC of the service robot

VIII. CONCLUSION

This paper presented both hardware and software architectures for a flexible and extendible vision subsystem. Using a network of distributed, modular smart sensors on robotic platforms, the system load on the control units of the robot can be reduced drastically. In the future, smart sensor technology will enable further tasks which were not possible with the initial system architecture. Prototyping environments have been developed and tested on our service robot; they served as an experimental workbench for investigating system architectures and provide information for further developments.

REFERENCES
[1] Robotics Market, RoboNexus Conference and Expo, San Jose, CA, Oct. 6-9, 2005. URL: www.robonexus.com/roboticsmarket.htm

[2] D. Westhoff et al., "A flexible framework for task-oriented programming of service robots," in Robotik 2004, VDI-Berichte (ISBN 3-18-0918411), Munich, Germany, June 2004.
[3] T. Baier, M. Hüser, M. Westhoff, J. Zhang, "A flexible software architecture for multi-modal service robots," in Proc. Multiconference on Computational Engineering in Systems Applications (CESA), Oct. 2006.
[4] Basler Vision Technologies, Intelligent Camera eXcite, Ahrensburg, Germany. URL: www.baslerweb.com
[5] A. Maeder, J. Zhang, "System Design for an Autonomous Smart Vision System," in Proc. 5th WSEAS International Conference on Instrumentation, Measurement, Circuits and Systems (IMCAS 06), Hangzhou, China, April 2006.
[6] H. Bistry, S. Pöhlsen, D. Westhoff, J. Zhang, "Development of a smart laser range finder for an autonomous service robot," in Proc. IEEE International Conference on Integration Technology (ICIT 2007), Shenzhen, China, March 20-24, 2007.

