
Proceedings of the 4th European DSP in Education and Research Conference

A NOVEL ALGORITHM FOR DISPARITY CALCULATION BASED ON STEREO VISION


Zhen Zhang, Yifei Wang and Naim Dahnoun
Department of Electrical and Electronic Engineering, University of Bristol Merchant Venturers Building, Woodland Road, BS8 1UB, Bristol, U.K. phone: + (44) 117 928 8619, fax: + (44) 117 954 5206 email: Zhen.Zhang@bristol.ac.uk; Yifei.Wang@bristol.ac.uk; Naim.Dahnoun@bristol.ac.uk

ABSTRACT

This paper presents a novel algorithm for disparity calculation based on stereo cameras. The proposed algorithm applies colour segmentation to each image row. All pixels within the same segment contribute to the disparity calculation of that segment. The individual disparity of each pixel is then calculated by combining the pixel's cost function with the disparity of the segment it belongs to. The algorithm is implemented on the Texas Instruments TMS320DM648 DSP platform for real-time applications. Our experiments suggest that the proposed algorithm is able to provide a high-resolution disparity map with a large number of disparity levels. Furthermore, the proposed algorithm generates disparity maps with significantly higher accuracy and percentage of correct disparities compared to traditional stereo matching algorithms.

1. INTRODUCTION

Disparity calculation based on stereo cameras is a popular research area [4]. It is widely used for object detection because distance is inversely proportional to disparity in the scene. In order to find the disparity, the cost or likelihood function used to build the correspondence between the left and the right images must first be chosen. Once the correspondence is determined, the displacement between the two corresponding pixels/areas is treated as the disparity. However, a large number of errors will be introduced in occlusion areas. A number of occlusion handling techniques are available, but this part is not included in our system.

The most important part of disparity calculation is finding the correspondence between the pixels of the left and right images. Most correspondence algorithms that can achieve real-time operation fall into two categories: block-based methods and region-based methods. For the block-based methods [8, 10, 1, 9, 6], the correspondence of each pixel is based on the information provided by the pixel and its surrounding pixels. The advantage of these algorithms is that they provide a high-resolution depth map with a large number of depth levels. The disadvantage is that it is very difficult to determine a block size that is suitable for different images, or even for different areas of the same image pair. A small block size results in sharp edges on the depth map but induces errors in homogeneous regions, since only very limited local information is used. A large block size performs well in homogeneous areas but produces very inaccurate disparity measurements along object edges, because the pixels inside the same block have varying disparities.

For the region-based methods [8, 7, 5], image segmentation is applied to separate pixels with different disparities into irregular regions. The disparities of these regions are then measured according to a regional optimum value instead. These algorithms provide sharp edges on the disparity map, but the number of depth levels is limited, since the pixels in a large area share a constant disparity. In this case, region-based algorithms are especially unsuitable for finding gradually changing disparities. Furthermore, accurate segmentation results are needed for these algorithms, which makes them computationally time consuming [4].

For the proposed algorithm, assuming the left and right cameras are calibrated and the two image planes are parallel, colour segmentation is applied to the image rows in a line-by-line fashion. Since the segmentation is a one-dimensional process, the computation for this step is minimised while pixels with different disparities are still separated. In addition, the segment disparity is further processed to calculate the disparity of each individual pixel. The proposed algorithm therefore combines the advantages of both the block-based and the region-based methods. The resolution and the number of depth levels are preserved while accurate results are generated in edge areas (excluding the occlusion areas). Finally, the algorithm is suitable for real-time implementation since each line of the disparity map depends only on the two corresponding lines of the left and right images. Thus the required memory and computation are minimised.

The proposed algorithm is separated into four parts: selecting the cost of each pixel based on a predefined cost function, image line segmentation, segment disparity calculation and, finally, pixel disparity calculation. In this paper, Section 2 introduces the details of the proposed algorithm, Section 3 describes the implementation on the TMS320DM648 DSP platform, Section 4 compares the results of the proposed algorithm with traditional block-based algorithms and Section 5 concludes the paper.

2. PROPOSED ALGORITHM

2.1 Cost function generation

The sum of absolute differences (SAD) and normalised cross-correlation (NCC) have been chosen as the cost function by many traditional algorithms. For both of these cost functions, a fixed correlation block size is set. For the NCC, each block on the right image is compared with the shifted block on the left image using normalised cross-correlation. The maximum value among the correlation results is selected, and the shift required to achieve that value is the disparity of this pixel.


Figure 1: Stereo vision evaluation original images. The first row shows the original left images (Tsukuba, Sawtooth, Venus) and the second row shows the corresponding right images.

Figure 2: Stereo vision evaluation results. The first row shows the results with the NCC cost function, the second row the results with the SAD cost function, and the third row the results of the proposed algorithm.

When using the SAD cost function, the absolute difference is calculated between the block on the right image and the shifted block on the left image. The sum of these differences is treated as the cost value for the centre pixel. The minimum value is selected and its shift distance is the disparity of the pixel.
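For reference, the block-based SAD baseline described above can be sketched in C as follows. This is an illustration of the traditional method, not the proposed algorithm; the greyscale input, the block radius r and the maximum disparity max_d are assumed parameters, not values from the paper.

/* Illustrative sketch of classical block-based SAD matching (the baseline
   described above). Greyscale 8-bit images of size width x height are
   assumed; r is the block radius and max_d the maximum disparity. */
#include <limits.h>
#include <stdlib.h>

void sad_disparity(const unsigned char *left, const unsigned char *right,
                   int width, int height, int r, int max_d,
                   unsigned char *disp)
{
    for (int y = r; y < height - r; y++) {
        for (int x = r; x < width - r; x++) {
            unsigned int best_cost = UINT_MAX;
            int best_d = 0;
            for (int d = 0; d <= max_d && x - d >= r; d++) {
                unsigned int cost = 0;
                /* sum of absolute differences over the (2r+1)x(2r+1) block */
                for (int dy = -r; dy <= r; dy++)
                    for (int dx = -r; dx <= r; dx++)
                        cost += abs(right[(y + dy) * width + (x + dx)]
                                  - left[(y + dy) * width + (x - d + dx)]);
                if (cost < best_cost) { best_cost = cost; best_d = d; }
            }
            disp[y * width + x] = (unsigned char)best_d;
        }
    }
}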


Under the assumption that the left and right cameras are calibrated and the two images share the same image plane, the correspondence of each pixel can only lie in the horizontal direction. In this case, the corresponding point of a pixel on the left image can only appear on the same row of the right image. The absolute difference (AD) is chosen as the cost function in the proposed algorithm. It calculates the absolute difference between a pixel on the left image and the corresponding pixel on the right image. Unlike the block-based processing of the NCC and the SAD, the AD does not require information from the neighbouring pixels. It avoids a large amount of calculation at the expense of lower accuracy and a higher error rate. By calculating the AD between the right image and the shifted left image, the difference array (DA) of each pixel is obtained. The DA is given by Equation 1, where I_r and I_l represent the pixels on the right and the left image, x and y represent the image row and column number respectively, and d is the shift distance. The shift distance d of the minimum value in the difference array is selected as the disparity of the pixel. Unlike many other algorithms, the RGB components (indexed by n) rather than the intensity of each pixel are used in our proposed algorithm. Since colour information is included in the RGB components, better performance can be achieved.

DA(x, y, d) = \sum_{n \in \{R,G,B\}} |I_r^n(x, y) - I_l^n(x - d, y)|    (1)
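As a concrete illustration of Equation 1, the following C sketch computes the difference array of one image row from interleaved 8-bit RGB data. The buffer layout, the function name and the handling of shifts that fall outside the row are our assumptions, not details given in the paper.

/* Minimal sketch of the difference-array (DA) computation in Equation 1,
   assuming 8-bit interleaved RGB rows from the rectified left/right images.
   da[x * max_d + d] holds DA(x, d) for one image row. */
#include <limits.h>

static int abs_diff(int a, int b) { return a > b ? a - b : b - a; }

void compute_da_row(const unsigned char *left_rgb,   /* 3 * width bytes */
                    const unsigned char *right_rgb,  /* 3 * width bytes */
                    int width, int max_d, unsigned int *da)
{
    for (int x = 0; x < width; x++) {
        for (int d = 0; d < max_d; d++) {
            int xl = x - d;                       /* column in the shifted left image */
            unsigned int cost = UINT_MAX;         /* shift falls outside the row */
            if (xl >= 0) {
                cost = 0;
                for (int n = 0; n < 3; n++)       /* sum over R, G, B */
                    cost += abs_diff(right_rgb[3 * x + n], left_rgb[3 * xl + n]);
            }
            da[x * max_d + d] = cost;
        }
    }
}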

Figure 3: Using the absolute difference as the cost function. The first row shows the original left and right images. Figure 3c is the disparity map using the AD cost function; Figure 3d is the disparity map of the proposed algorithm.

In each pixel's difference array, a distinct minimum corresponding to the correct disparity may exist. However, this is not always the case, since the difference array can contain a large number of points with values very similar to the minimum, as shown in Figure 5. It is then hard to determine which point corresponds to the correct disparity. This often happens in the homogeneous areas of an image. To solve this problem, a one-dimensional image segmentation process is applied so that the information provided by the neighbouring pixels is included. Figure 3c shows the resulting disparity map using the AD cost function alone; the error rate is unacceptably high. In order to reduce the errors while maintaining a low computational complexity, a colour segmentation technique is applied. This technique is described in detail in Sections 2.2 and 2.3.

2.2 Image line segmentation

In the traditional NCC and SAD algorithms, a block around the current pixel is chosen to calculate the pixel correspondence. Different areas of an image require different block sizes to achieve optimum results, and it is difficult to choose a fixed block size that performs well across the whole image. Using such a block to determine the correspondence of a pixel results in blurred edges, especially with large block sizes. Figure 4 shows the results of the algorithms using the NCC and the SAD cost functions. In the proposed algorithm, a one-dimensional colour segmentation process is used to overcome this problem. Within an image, adjacent pixels with similar colours are likely to belong to the same object, so these pixels are likely to share the same disparity. Under this assumption, pixels of similar colour in the same image row are grouped into segments. Once segmentation is applied, the coordinates of each pixel can be represented by its group number (g) and its pixel index (i) inside that group.
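The paper does not state the segmentation rule explicitly, so the following C sketch only illustrates one plausible form of the one-dimensional colour segmentation: adjacent pixels whose RGB difference stays below a threshold receive the same group number. The distance measure and the threshold are assumptions made for illustration.

/* Hedged sketch of one-dimensional colour segmentation of a single image
   row. Pixels are interleaved 8-bit RGB; `threshold` is an assumed tuning
   parameter, not a value from the paper. Writes a group label for every
   pixel of the row and returns the number of groups. */
static int colour_distance(const unsigned char *a, const unsigned char *b)
{
    int dist = 0;
    for (int n = 0; n < 3; n++) {            /* sum of per-channel differences */
        int diff = (int)a[n] - (int)b[n];
        dist += diff < 0 ? -diff : diff;
    }
    return dist;
}

int segment_row(const unsigned char *row_rgb, int width,
                int threshold, int *group)
{
    int g = 0;
    group[0] = 0;
    for (int x = 1; x < width; x++) {
        /* start a new segment when the colour changes abruptly */
        if (colour_distance(&row_rgb[3 * x], &row_rgb[3 * (x - 1)]) > threshold)
            g++;
        group[x] = g;
    }
    return g + 1;                            /* total number of segments */
}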

Figure 4: Disparity maps using NCC (a) and SAD (b) as the cost function.

The next step is to combine the difference arrays of all pixels in the same segment to calculate the segment disparity, which is later used to refine the pixel disparity.

2.3 Segment disparity calculation

Many incorrect pixel associations occur in homogeneous areas due to the lack of texture, so the difference arrays of the pixels in these areas contain points whose values are similar to the minimum. Figure 5 shows an example of a situation that often occurs in homogeneous areas. Three yellow blocks represent the pixels on the left and right images, and the numbers on the blocks indicate their values. For each individual pixel, its difference array contains several minima (as shown in Figure 5a), so it is difficult to select the correct corresponding point. The right column of Figure 5 illustrates the summation of the difference arrays of all three pixels. From this figure, we can see that the single minimum point shared by all three difference arrays is the correct one.


In order to differentiate this point from the rest, the summation of the difference arrays is calculated. In the proposed algorithm, the difference arrays of the pixels in a segment are summed to produce a group difference array (GDA), as shown in Equation 2, where g is the group number, i is the pixel index in the group and T_g is the total number of pixels in the group. The segment disparity is the index of the minimum point of the GDA.

GDA_g(d) = \sum_{i=1}^{T_g} DA_g(i, d)    (2)

The correct disparity of each pixel should be close to the segment disparity. However, simply applying the segment disparity to all the pixels in the group would result in depth-resolution errors.
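To make Equation 2 concrete, the following C sketch sums the per-pixel difference arrays of one segment and picks the shift with the smallest summed cost as the segment disparity. The array layout and the function name follow the earlier sketches and are assumptions made for illustration.

/* Sketch of the group difference array (GDA) of Equation 2 and the segment
   disparity, assuming `da` is laid out as in the earlier sketch
   (da[x * max_d + d]) and the segment covers columns [start, start + count). */
#include <limits.h>

int segment_disparity(const unsigned int *da, int max_d,
                      int start, int count, unsigned long long *gda)
{
    for (int d = 0; d < max_d; d++)
        gda[d] = 0;

    /* GDA_g(d): sum of DA_g(i, d) over the pixels i of the segment */
    for (int i = 0; i < count; i++)
        for (int d = 0; d < max_d; d++)
            gda[d] += da[(start + i) * max_d + d];

    /* the segment disparity is the index of the minimum of the GDA */
    int best_d = 0;
    unsigned long long best = ULLONG_MAX;
    for (int d = 0; d < max_d; d++) {
        if (gda[d] < best) { best = gda[d]; best_d = d; }
    }
    return best_d;
}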

Figure 5: Formation of the difference array (DA) and the group difference array (GDA) in a homogeneous area.

2.4 Final disparity calculation

In the proposed algorithm, in order to fully use the local information provided by the DA, a Gaussian-shaped weighting function is used rather than the GDA alone. Instead of using the minimum of the GDA to set the final disparities of all pixels in the group, a Gaussian-shaped weighting function with a fixed variance allows a certain variation of the minimum position. By combining the DA of each pixel with the segment disparity information (represented by the weighting function), the resolution and the disparity levels are preserved. The final difference array (FDA) of each pixel is given by Equation 3,

FDA_g(i, d) = DA_g(i, d) \cdot W_g(d, \mu, \sigma^2)    (3)

where

W_g(d, \mu, \sigma^2) = e^{(d - \mu)^2 / (2\sigma^2)}

denotes the Gaussian-shaped weighting function for group g, and \mu and \sigma^2 represent the mean and variance of the weighting function. The weighting function is scaled appropriately before processing to avoid numerical issues. In this way the weighting function constrains the minimum point to lie around its mean. The position holding the lowest value of the FDA is chosen as the disparity of the pixel. Using this point as the disparity minimises the possibility of incorrect pixel associations, while the difference information of each individual pixel is still taken into consideration. The proposed algorithm thus combines both pixel and segment information for the disparity calculation, so it does not reduce the resolution of the disparity map. Selected results are shown in Figure 3d.
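The following C sketch puts Equations 2 and 3 together for one segment: the weighting function is centred on the segment disparity and grows away from it, so the per-pixel minimum of the FDA is pulled towards the segment disparity while the pixel's own difference array still decides the exact level. The use of floating point, the exp() form of the weighting and the fixed variance value are illustrative assumptions; the DSP implementation described in Section 3 uses scaled fixed-point arithmetic instead.

/* Hedged sketch of the final disparity selection (Equation 3) for the pixels
   of one segment. `mu` is the segment disparity taken from the GDA minimum
   and `sigma2` is an assumed fixed variance. */
#include <math.h>

void final_disparity(const unsigned int *da, int max_d,
                     int start, int count, int mu, double sigma2,
                     unsigned char *disp)
{
    for (int i = 0; i < count; i++) {
        double best = INFINITY;
        int best_d = 0;
        for (int d = 0; d < max_d; d++) {
            /* weighting grows away from the segment disparity, penalising
               shifts that disagree strongly with the segment */
            double w = exp((double)(d - mu) * (d - mu) / (2.0 * sigma2));
            double fda = da[(start + i) * max_d + d] * w;   /* Equation 3 */
            if (fda < best) { best = fda; best_d = d; }
        }
        disp[start + i] = (unsigned char)best_d;
    }
}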

3. IMPLEMENTATION

The proposed method is implemented on the DM648 multi-channel DSP platform from Texas Instruments [3]. The proposed algorithm processes a single line of pixels at a time, and each line is independent of the others, which makes the memory transfer from the frame buffers to the internal memory easy to implement. Having been designed with the implementation in mind, the algorithm is very suitable for fixed-point DSP implementation [2]. The complete algorithm requires only additions, subtractions and very few multiplications, and can therefore be processed in a highly parallel manner by the eight functional units of the DM648 processor. The complete system currently runs at 3 fps with a standard PAL input (720x576).

The DM648 digital media processor contains eight largely independent functional units that can be used in parallel. The processor is clocked at 900 MHz and is based on the C64x+ core. It also supports non-aligned load and store operations, which provide efficient loading and shifting during the DA calculation stage.

The main focus of the implementation is divided into two parts: memory management and code optimisation. Since access to external memory is unacceptably slow for real-time DSP applications, carefully handled and explicitly specified data transfers between the external and the internal memory are necessary. The DM648 provides 64 KB of Level 1 on-chip memory (L1) and 512 KB of Level 2 on-chip memory (L2), both of which can be used to store the image lines being processed. Also, because the Enhanced Direct Memory Access (EDMA) controller works in parallel with the processing unit, the overhead of the transfers can be minimised using double buffering. Figure 6 shows the data flow diagram for the memory transfer. In Figure 6, two pairs of buffers are allocated in the L2 internal memory, each containing one input buffer (DataBuf1 or DataBuf2) and one output buffer (ResBuf1 or ResBuf2). At loop n, the EDMA transfers one line of image data from the capture frame buffer (located in external memory) to DataBuf2 and the result data from ResBuf2 to the display frame buffer (located in external memory), while the data in DataBuf1 is processed by the CPU. In the next loop, n + 1, DataBuf1 is filled with new data lines and ResBuf1 is transferred out, while the data stored in DataBuf2 is processed by the CPU. By applying this double-buffering structure, the memory transfer between the capture frame buffer and the input buffers works in parallel with the processing unit, which removes a large number of operations and much of the queueing time the CPU would otherwise spend accessing data in external memory.

Figure 6: System flow diagram for double buffering.
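The C fragment below sketches the ping-pong buffering just described. The edma_copy_line() helper is a hypothetical stand-in for the real EDMA submission (here it simply copies with the CPU), process_line() stands for the per-line disparity pipeline of Section 2, and the buffer sizes are illustrative; only the alternation between the two buffer pairs reflects the scheme in Figure 6.

/* Hedged sketch of the double-buffering loop of Figure 6. */
#include <string.h>

#define LINE_BYTES   (720 * 3)          /* one PAL line of interleaved RGB */
#define RESULT_BYTES 720                /* one line of 8-bit disparities   */

static unsigned char DataBuf1[LINE_BYTES],  DataBuf2[LINE_BYTES];
static unsigned char ResBuf1[RESULT_BYTES], ResBuf2[RESULT_BYTES];

/* stand-in: on the DM648 this would queue a transfer on the EDMA controller
   so that it runs in parallel with the CPU */
static void edma_copy_line(void *dst, const void *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}

void process_line(const unsigned char *in, unsigned char *out);  /* Section 2 */

void run_frame(const unsigned char *capture, unsigned char *display, int lines)
{
    edma_copy_line(DataBuf1, capture, LINE_BYTES);        /* prime first line */

    for (int n = 0; n < lines; n++) {
        unsigned char *cur_in  = (n & 1) ? DataBuf2 : DataBuf1;
        unsigned char *cur_out = (n & 1) ? ResBuf2  : ResBuf1;
        unsigned char *nxt_in  = (n & 1) ? DataBuf1 : DataBuf2;
        unsigned char *prv_out = (n & 1) ? ResBuf1  : ResBuf2;

        /* move the next line in and the previous result out while the CPU
           works on the current line */
        if (n + 1 < lines)
            edma_copy_line(nxt_in, capture + (size_t)(n + 1) * LINE_BYTES, LINE_BYTES);
        if (n > 0)
            edma_copy_line(display + (size_t)(n - 1) * RESULT_BYTES, prv_out, RESULT_BYTES);

        process_line(cur_in, cur_out);
    }

    /* drain the last result line */
    edma_copy_line(display + (size_t)(lines - 1) * RESULT_BYTES,
                   ((lines - 1) & 1) ? ResBuf2 : ResBuf1, RESULT_BYTES);
}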


The proposed algorithm is programmed in the C language using only fixed-point arithmetic and is highly optimised by the compiler. Because of the intensive calculations involved, further optimisation using intrinsics is necessary. Four intrinsic functions, _subabs4, _mpy2, _saddu4 and _sadd2, are used together with _memd8 to increase efficiency. On each side of the CPU, _subabs4 performs the absolute difference of four 8-bit pairs, _mpy2 multiplies two 16-bit pairs, and _saddu4 and _sadd2 perform saturated additions on 8-bit and 16-bit pairs respectively. Because the C64x+ core contains two such sides, two of these intrinsic instructions can be issued on each clock cycle. By applying these intrinsic functions, with _memd8 used to load a 64-bit double word on each clock cycle, the total number of CPU cycles used to process a captured frame is reduced by 75%.

4. EXPERIMENTAL RESULTS

The most popular evaluation data sets are the Middlebury image pairs [8]. The Tsukuba, Sawtooth and Venus pairs were selected to evaluate the error rate of the algorithm; the results are shown in Figure 2. Comparing the results of the proposed algorithm to the provided ground truth gives error percentages of 13.9%, 7.22% and 6.12% for the Tsukuba, Sawtooth and Venus stereo pairs respectively. Table 1 compares the error rates obtained with the NCC and SAD cost functions against that of the proposed algorithm.

                     Tsukuba %   Sawtooth %   Venus %
NCC                  41.4        9.92         17.4
SAD                  36.9        11.9         24.5
Proposed algorithm   13.9        7.22         6.12

Table 1: Percentage of error pixels for the three image pairs.

These results are quite satisfactory even without occlusion handling techniques; by implementing an occlusion handling technique in the future, the accuracy will be further improved. Also, the ground truth disparities only provide a limited number of disparity levels (Tsukuba 7 levels, Sawtooth 14 levels, Venus 17 levels), while the proposed algorithm produces many more disparity levels.

5. CONCLUSION

This paper presents a novel algorithm for disparity calculation based on stereo vision. The proposed algorithm applies colour segmentation to each image row, and all pixels within the same segment contribute to the disparity calculation of that segment. The individual disparity of each pixel is calculated by combining the pixel's difference array with the Gaussian-shaped weighting function generated from its group difference array. The algorithm can be implemented very efficiently for real-time applications. Compared with traditional algorithms using SAD and NCC as cost functions, the experimental results suggest that the proposed algorithm provides a high-resolution disparity map with a large number of disparity levels. By improving the line segmentation technique to produce more accurate colour segments, the accuracy of the final disparity map can be increased. Further improving the weighting function generation stage of Section 2.4 would also increase the quality of the final disparity map.

REFERENCES

[1] E. Binaghi, I. Gallo, G. Marino, and M. Raspanti. Neural adaptive stereo matching. Pattern Recognition Letters, 25(15):1743-1758, 2004.
[2] N. Dahnoun. Digital Signal Processing Implementation: Using the TMS320C6000 Processors. Prentice Hall PTR, 2000.
[3] N. Dahnoun. C6000 DSP Teaching ROM. Texas Instruments, 2nd edition, 2005.
[4] N. Lazaros, G. Sirakoulis, and A. Gasteratos. Review of stereo vision algorithms: From software to hardware. International Journal of Optomechatronics, 2(4):435-462, 2008.
[5] P. Mordohai and G. Medioni. Stereo using monocular cues within the tensor voting framework. Computer Vision - ECCV 2004, pages 30-35, 2006.
[6] K. Muhlmann, D. Maier, J. Hesser, and R. Manner. Calculating dense disparity maps from color stereo images, an efficient implementation. International Journal of Computer Vision, 47(1):79-88, 2002.
[7] A. Ogale and Y. Aloimonos. Shape and the stereo correspondence problem. International Journal of Computer Vision, 65(3):147-162, 2005.
[8] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1):7-42, 2002.
[9] L. Stefano, M. Marchionni, and S. Mattoccia. A fast area-based stereo matching algorithm. Image and Vision Computing, 22(12):983-1005, 2004.
[10] S. Yoon, S. Park, S. Kang, and Y. Kwak. Fast correlation-based stereo matching with the reduction of systematic errors. Pattern Recognition Letters, 26(14):2221-2231, 2005.
