
IMAGE SEGMENTATION WITH IMPROVED WATERSHED ALGORITHM AND ITS FPGA IMPLEMENTATION

Chung J. Kuo, Souheil F. Odeh¹, and Ming C. Huang
Institute of Communications Engineering, National Chung Cheng University, Chiayi 62107, Taiwan
¹Department of Computer Engineering, University of Jordan, Amman 11942, Jordan

ABSTRACT
This paper is concerned with image segmentation based on watershed transform techniques. It introduces a fast, improved watershed algorithm that processes 3x3 pixels in one pass. Simulation results show that the improved watershed algorithm has a better throughput and yields results comparable to those of Vincent's immersion watershed algorithm. The improved algorithm is modified and formulated such that it is amenable to hardware implementation. An FPGA-based architecture developed to implement the proposed algorithm is presented. This architecture improves the applicability of the algorithm in real-time applications. A description of the improved watershed algorithm, its extension to NxN pixels, and the architecture implementation is presented.

1 INTRODUCTION
Image segmentation is an important pre-processing step in the applications of MPEG-4 and computer vision [4]. Although it is a difficult problem, several solutions have been proposed. Among these existing techniques, the watershed transform [3] is the best known because it guarantees that closed contours of objects can be found. The watershed transform is a morphology-based image segmentation scheme that extracts shape-related information from images. In the watershed transform, an image is considered as a topographic surface where the grayscale value of a pixel denotes the altitude of that point.

Two different watershed transform techniques, namely rain falling and water immersion, can be used to find the watershed within an image. The rain falling algorithm [1] is a straightforward way to find the watershed but requires extensive computation. On the other hand, the water immersion algorithm [2-3] can find the watershed of an image quickly. In the water immersion algorithm, information about an object's boundary in an image is obtained by simulating immersion of the image surface, with water flooding from the lowest point of the topographic surface. When water from two different minima (basins) merges, an artificial dam is created. At the end of the immersion procedure, each local minimum in the surface is completely surrounded by an artificial dam (the watershed of the image). The simulated immersion algorithm is a region-growing method, but in a topographical sense. Since the water immersion algorithm is much faster than the rain falling algorithm, the watershed transform is almost always computed with the simulated immersion algorithm.

The watershed transform usually produces an oversegmented image due to additive noise in natural scene images. In mathematical morphology, the contour of an object within an image corresponds to lines where grayscale values change abruptly compared to the background. Hence, a gradient operation is first used to obtain the position of contours. A small gradient modulus, which may originate from additive noise, corresponds to a small altitude difference between neighboring pixels in the grayscale image. Therefore, pixels with a small gradient value can be ignored by a simple thresholding operation. In this case, a less oversegmented image can be obtained if the watershed transform is applied to the gradient image after thresholding. This process is equivalent to removing the additive noise in an image with a lowpass filter.

In this paper, an improved watershed algorithm, which is based on Vincent's immersion watershed algorithm, is introduced. In the conventional watershed algorithm, the label of a pixel is decided based on its eight neighboring pixels, and thus a processing element (PE) needs 3x3 inputs. Here, a scheme is proposed to decide the labels of 3x3 pixels based on their 16 neighboring pixels. As a result, the watershed can be found quickly and the architecture throughput of the proposed algorithm is increased. Moreover, an added advantage of this algorithm is that a less oversegmented image can be produced compared to the conventional watershed algorithm.

The organization of this paper is as follows. The proposed algorithm is explained in section 2. Section 3 presents the simulation results, which show that the improved watershed algorithm has a better throughput and yields results comparable to those of Vincent's immersion watershed algorithm. An FPGA-based architecture, which is developed to implement the improved algorithm, is described in section 4. Finally, section 5 discusses the advantages of the improved watershed algorithm and the architecture.

2 THE IMPROVED WATERSHED ALGORITHM


In this section, we propose a new watershed algorithm that is faster than Vincent's immersion watershed algorithm. The original idea of Vincent's immersion watershed algorithm is to design a processing element (PE) such that the watershed of a single pixel is decided. In Vincent's algorithm, the PE needs 3x3 inputs, while the algorithm presented in this paper needs 5x5 inputs. (Please note that the algorithm presented here can be easily extended to process nxn inputs with n = 7, 9, 11, ....) To decide the watershed for 3x3 pixels at the same time, modifications to Vincent's algorithm are needed. The improved watershed algorithm is depicted in Fig. 1 and is discussed in the following.

The conventional watershed scheme provides the label of pixel C but requires 8 additional supporting pixels (N1 - N8) to decide C's label, as illustrated in Fig. 2. In this scheme, the image pixels are first sorted and then processed from the smallest to the largest. The PE of the conventional watershed technique takes as input the labels of 3x3 pixels, if available, and outputs the label of the central pixel C. Before processing, pixel C's label is not yet assigned, but the labels of all other pixels (i.e., the N1 - N8 pixels) will already be assigned if their values are smaller than that of pixel C. Pixel C is on the watershed line when its neighboring pixels have two (or more) different labels. Details of this scheme are found in [3]. Apparently, the throughput (1/9) of the PE is far too low.
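The conventional per-pixel decision described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the label encoding (`WATERSHED = 0`, `UNLABELED = -1`, positive integers for basins), the function names, and the choice to ignore neighbors already marked as watershed are all assumptions made here for clarity.

```python
WATERSHED = 0    # assumed encoding for a watershed pixel
UNLABELED = -1   # assumed encoding for a pixel not yet processed


def sorted_pixel_coords(image):
    """Return (row, col) pairs ordered from the smallest to the largest gray value."""
    coords = [(r, c) for r in range(len(image)) for c in range(len(image[0]))]
    return sorted(coords, key=lambda rc: image[rc[0]][rc[1]])


def label_center(neighbor_labels, next_label):
    """Decide the label of pixel C from the labels of its 8 neighbors.

    Simplification (assumption): neighbors that are unlabeled or already
    marked as watershed do not contribute a basin label.
    """
    basins = {lab for lab in neighbor_labels if lab > 0}
    if not basins:
        return next_label           # no labeled neighbor: C starts a new minimum
    if len(basins) == 1:
        return next(iter(basins))   # one basin touches C: C joins it
    return WATERSHED                # two or more different labels: C is watershed
```

Each call consumes the 3x3 window around C and yields a single label, which is where the 1/9 throughput figure comes from.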

0-7803-6685-9/01/$10.00 ©2001 IEEE

Authorized licensed use limited to: VELLORE INSTITUTE OF TECHNOLOGY. Downloaded on August 09,2010 at 15:41:55 UTC from IEEE Xplore. Restrictions apply.

Our scheme, however, decides the labels of pixel C and its neighboring pixels N1 - N8 in one pass but requires 16 additional supporting pixels S1 - S16, as illustrated in Fig. 3. Our objective is to increase the throughput by processing 5x5 pixels together instead of 3x3. The details of the proposed algorithm are as follows.

First, all the label and pixel values of these 25 pixels are input to the PE. The PE here consists of 10 smaller PEs that are almost identical to the PE in the conventional case. Our scheme first decides the label of pixel C and then decides the labels of pixels N1 - N8 by using the pipeline architecture (discussed in section 4) as follows. We first decide the label of pixel C based on the labels of pixels N1 - N8. Since the image pixels are still sorted and processed from the smallest to the largest, the neighboring pixels of pixel C already have labels if their values are smaller than that of pixel C. Therefore, this procedure is identical to the one in Vincent's algorithm.

Once the label of pixel C is decided, the rest of the PEs (9 in total) decide the labels of pixels N1 - N8. Taking pixel N1 as an example, some of its neighboring pixels may not have a label yet although their pixel values are smaller than that of N1. This is because the pixel value of N1 is only known to be larger than or equal to that of pixel C. As a result, some of the neighboring pixels of N1 (S1, S2, S15, S16, S3, N2, and N8) might have a pixel value smaller than that of N1. (Please note that the pixel value of C is always smaller than that of N1.) This fact makes the decision of the label for pixel N1 difficult. To solve this problem, we must consider the pixel values of the neighboring pixels of N1 instead of their labels only.

In processing pixel Ni, i = 1, ..., 8, only the three supporting pixels in the opposite direction of pixel C (with respect to Ni) are considered and processed. For example, in deciding the label of pixel N1, we need to consider the pixel values of pixels C, S1, S2, and S16. Similarly, in deciding the label of pixel N2, we need to consider the pixel values of pixels C, S2, S3, and S4. Deciding the labels of pixels N1 - N8 is very similar to deciding that of pixel C. However, we need an additional criterion to say that pixel N1, for example, is a watershed. That is, the pixel value of N1 must be the largest among pixels C, S1, S2, and S16. We denote this criterion as the "LHL" (low, high, and low) condition because the pixel value of N1 (or N2) must be larger than its four neighboring pixels C, S1, S2, and S16 for N1 (or C, S2, S3, and S4 for N2) to be deemed watershed.

The reason why only some neighboring pixels of Ni, i = 1, ..., 8, are considered in deciding whether Ni is a watershed is as follows. Among the neighboring pixels of pixel Ni, the pixel C has the smallest pixel value. Taking pixel N1 as an example, the watershed is very unlikely to pass through C-N1-S1. In addition, when the image to be processed is large, the watershed passing through pixel N1 can be considered locally as a straight line. Therefore, we only consider the values of pixels C, S1, S2, and S16, because pixels C, N1, and S1 (or S2 or S16) can decide whether there is a watershed passing through S3-N1-S15 (or S16-N1-N2 or S2-N1-N8).

The procedure used in the improved watershed algorithm is illustrated in the flowchart shown in Fig. 3. The improved watershed algorithm increases the throughput from 1/9 to 9/25, since 9 labels are produced from 25 input pixels. Also, the data transfer between the PE and the memory is significantly reduced.

The proposed algorithm can be easily extended to process nxn pixels jointly, where n = 5, 7, .... Take n = 5 and Fig. 3 as an example. Here we first process the pixel C, then the pixels Ni, and finally the pixels Si. A difference is that a 7x7 image block must be considered in order to obtain the watershed of a 5x5 image block. Besides, all the pixel values and labels must be input to the PE to generate the watershed of a 5x5 image block.
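The LHL test for a neighboring pixel Ni can be sketched as below. This is a hedged illustration only: the 5x5 coordinate layout (C at the center, N1 at the upper-left inner position and numbered clockwise, the S ring outside) is inferred from the adjacency statements in the text because the figure is not available here, and the function names are hypothetical. The sketch covers only the LHL comparison, not the full label decision for Ni.

```python
# Clockwise unit offsets around C; index k corresponds to neighbor N(k+1).
# Assumed layout: N1 is the upper-left inner pixel of the 5x5 block.
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
CENTER = (2, 2)


def support_coords(i):
    """Return the position of Ni and its three supporting S pixels opposite to C."""
    k = i - 1
    dr, dc = DIRS[k]
    nr, nc = CENTER[0] + dr, CENTER[1] + dc
    support = []
    for j in (-1, 0, 1):
        sr, sc = DIRS[(k + j) % 8]   # direction away from C, rotated by -45/0/+45 degrees
        support.append((nr + sr, nc + sc))
    return (nr, nc), support


def satisfies_lhl(block, i):
    """True if Ni is strictly larger than C and its three supporting pixels."""
    (nr, nc), support = support_coords(i)
    others = [block[CENTER[0]][CENTER[1]]] + [block[r][c] for r, c in support]
    return all(block[nr][nc] > value for value in others)
```

For N1 the supporting positions come out as (1, 0), (0, 0), and (0, 1), which under the assumed layout correspond to S16, S1, and S2; for N2 they are the three pixels directly above it, corresponding to S2, S3, and S4.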

3 SIMULATION RESULTS
The improved watershed algorithm is used to compute watersheds in digital grayscale images. As in conventional watershed algorithms, the image pixels are sorted in increasing order of their grayscale values. The labels of the image pixels are decided in two steps: first, the label of the center pixel (i.e., the extracted pixel) is decided; next, the labels of the neighboring pixels are decided.

To verify the results of the proposed algorithm, several input patterns were tested, and the final results are close to those of Vincent's algorithm. An example is shown in Fig. 5, in which the output labels of the two algorithms are identical. The proposed algorithm is then tested on the 512x512 Lena image. Fig. 4 shows two similar watershed images resulting from Vincent's algorithm and our proposed algorithm, respectively, for a gradient image with threshold value 17.

Table 1 summarizes the advantages of our proposed algorithm compared with Vincent's algorithm in terms of the number of pixels to be processed, the number of input pixels for the PE, and the I/O translation rate. This table shows that our proposed algorithm is much more efficient in processing and in I/O translation rate while giving results comparable to those of Vincent's algorithm.

The proposed algorithm can process a 12x12 image block jointly; however, the details of the watershed information are lost when the image block is too large. Table 2 shows the watershed results of Vincent's algorithm and the proposed one for images of different formats. Based on this table, our proposed scheme is suitable for images of size QCIF and up when 3x3 blocks are processed jointly. In this situation, no apparent degradation is observed. For SQCIF images, Vincent's algorithm is preferable.

It is known that a major problem with the watershed technique is that it always results in oversegmented images. With the proposed technique, however, an oversegmented image is less likely to appear. This is an added advantage of the proposed scheme.
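The gradient thresholding applied before the watershed transform (threshold value 17 in the Lena experiment) can be sketched as follows. The paper does not specify the gradient operator, so the forward-difference magnitude used here is an assumption chosen for brevity, as are the function names and the choice to keep values equal to the threshold.

```python
def gradient_magnitude(image):
    """Approximate gradient modulus as |dx| + |dy| using forward differences.

    Assumption: the paper's actual gradient operator is not stated; border
    pixels reuse their own value, so the difference there is zero.
    """
    h, w = len(image), len(image[0])
    grad = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dx = image[r][min(c + 1, w - 1)] - image[r][c]
            dy = image[min(r + 1, h - 1)][c] - image[r][c]
            grad[r][c] = abs(dx) + abs(dy)
    return grad


def threshold_gradient(grad, threshold=17):
    """Zero out gradient values below the threshold (small values treated as noise)."""
    return [[g if g >= threshold else 0 for g in row] for row in grad]


if __name__ == "__main__":
    tiny = [[10, 12, 100],
            [10, 11, 100],
            [10, 10, 100]]
    print(threshold_gradient(gradient_magnitude(tiny)))
```

In the tiny example only the strong edge between the second and third columns survives; the small variations on the left are flattened to zero, so they no longer create separate basins.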

4 ARCHITECTURE IMPLEMENTATION
In this section, a description of the ASIC chip architecture that implements the improved watershed PE is presented. Here we consider only the 3x3 window. An ASIC design is necessary to implement the watershed PE because a single CPU (or DSP chip) cannot handle the intensive computations required by the algorithm while giving good performance. A properly designed ASIC architecture can provide the massive computation required in pre-processing and post-processing of the watershed segmentation at a reduced watershed processing time. The block diagram of the pipeline architecture that implements the improved watershed algorithm is shown in Fig. 5. Using this architecture, our scheme first decides the label of pixel C and then decides the labels of pixels N1 - N8. This architecture consists of 10 PEs that are pipelined as shown in Fig. 5. Since the first- and second-stage PEs have many components and functions in common, the overall size is not ten times that of a single PE. The number of data transfers between the PE and the memory is reduced because the throughput rate of the improved watershed algorithm is 9/25, as opposed to 1/9 in the conventional scheme.


This watershed PE architecture has 77 serial input data (i.e., 25 pixel values, 25 label values, 25 assigned signals, 1 clock, and 1 reset) and 9 serial output data. The logic gate count of the watershed PE is 2965 gates. This is less than the gate count of 9 smaller PEs that each process a single pixel, a decrease of about 32% in gate count. The FPGA implementation operates at 50.363 MHz on an XC4013E PQ160-1 device. The schematic diagrams of the circuits for the center label and the neighboring labels are shown in Figs. 6(a) and 6(b), respectively. To test its correctness, this watershed architecture was simulated on several input patterns. Although a single PE requires more inputs in our scheme, the data flow between the PE and the memory is reduced because the throughput is larger, as discussed above.

The performance of the improved watershed algorithm and architecture is also tested when extended to work with NxN pixels, in order to compare the resulting pixel labels with those of Vincent's watershed algorithm. Simulation results show that the percentage of matches decreases as the number of pixels increases. The percentage of matches is better for the case of approximate matches. The rate of decrease flattens out as the number of pixels increases and is better for the approximate matches. We found that the I/O data translation rate efficiency of the improved algorithm increases as the number of pixels increases. Finally, simulation results indicate that the maximum path delay time of the architecture increases as the number of pixels increases, which is expected.
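The throughput and input-count figures quoted in this section follow from simple counting, which the sketch below reproduces. It assumes that deciding an m x m block jointly always needs one surrounding ring of supporting pixels (a 5x5 window for 3x3, a 7x7 window for 5x5, as stated in section 2). The function names are hypothetical, and applying the input-count formula to block sizes other than 3x3 is an extrapolation not stated in the paper.

```python
from fractions import Fraction


def pe_throughput(m):
    """Labels produced per input pixel when an m x m block is decided jointly."""
    window = m + 2  # one ring of supporting pixels around the block
    return Fraction(m * m, window * window)


def pe_serial_inputs(m, control_signals=2):
    """Serial inputs: pixel value, label, and assigned flag per window pixel,
    plus the control signals (clock and reset in the paper's 3x3 case)."""
    window = m + 2
    return 3 * window * window + control_signals


if __name__ == "__main__":
    for m in (1, 3, 5):
        print(m, pe_throughput(m), pe_serial_inputs(m))
```

With m = 1 this gives the conventional 1/9, with m = 3 the 9/25 of the proposed PE and its 77 serial inputs, and with m = 5 it gives 25/49.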

[3] L. Vincent and P. Soille, Watersheds in digital spaces: An efficient algorithm based on immersion simulations, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 583-597, June 1991
[4] ISO/IEC 14496: Information technology - Coding of audio-visual objects, MPEG-4 standard

Table 1: Comparison of the watershed algorithms for the 512x512 Lena image.

               HDTV   HDTV   CCIR-601   CIF   QCIF   SQCIF
Proposed, 9x9  Yes    Yes    No         No    No     No

Table 2: Comparison of the watershed algorithms for images of different formats (Yes: acceptable; No: unacceptable).

5 CONCLUSION
A fast, improved watershed algorithm that processes 3x3 pixels in one pass is introduced. In Vincent's immersion watershed algorithm, the PE needs 3x3 inputs, while the improved watershed algorithm presented in this paper needs 5x5 inputs. The improved watershed algorithm decides the watershed of 3x3 pixels simultaneously by using one PE. Although a single PE requires more inputs in our scheme, the data flow between the PE and the memory is reduced because the throughput is larger. Simulation results show that the improved watershed algorithm has a better throughput and yields results comparable to those of Vincent's immersion watershed algorithm. The improved algorithm is modified and formulated such that it is amenable to hardware implementation. An FPGA-based architecture developed to implement the proposed algorithm is presented. The gate count of the watershed PE is less than that of 9 smaller PEs that each process a single pixel, a decrease of about 32% in gate count. The number of data transfers between the PE and the memory can be reduced because the throughput rate of the proposed algorithm is 9/25, as opposed to 1/9 in the conventional scheme. By increasing the processing speed, this architecture improves the applicability of the algorithm in real-time applications.

Fig. 1. Idea of the proposed watershed algorithm. (Panel labels: sorting array; process 9 pixels in an extraction; new minimum pixel; watershed pixel.)

REFERENCES
[1] A. Moga, B. Cramariuc, and M. Gabbouj, An efficient watershed segmentation algorithm suitable for parallel implementation, Proceedings of IEEE International Conference on Image Processing, vol. 2, pp. 101-104, Oct. 1995
[2] P. Salembier and M. Pardàs, Hierarchical morphological segmentation for image sequence coding, IEEE Transactions on Image Processing, vol. 3, pp. 639-651, Sep. 1994


Fig. 3. Flowchart of the proposed watershed algorithm.

Fig. 6. Architecture of the proposed algorithm

Fig. 4. The result of the Vincent watershed image

Fig. 5. The result of the proposed watershed image (gradient image threshold = 17)

Fig. 7. Schematic diagram of the PE. (a) For center pixel. (b) For neighboring pixel.
