Uvika* et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES Vol No. 10, Issue No. 2, 309 - 313
SYMBOL EXTRACTION FROM DOCUMENT IMAGES USING IMAGE SEGMENTATION IN COLOR DOMAIN
Uvika
Student of Master of Technology, Department of Computer Engineering, YCOE, GuruKashi Campus, Punjabi University, Talwandi Sabo, Punjab, India
uvikataneja01@gmail.com

Sumeet Kaur
Assistant Professor, Department of Computer Engineering, YCOE, GuruKashi Campus, Punjabi University, Talwandi Sabo, Punjab, India
purbasumeet@yahoo.co.in

Abstract- Image segmentation is an important application of image processing. The proposed algorithm segments and extracts symbols using a minimum spanning tree based segmentation method. This paper presents the extraction of symbols and characters from document images and reports the number of symbols extracted from each image. Symbols include all characters, and characters include all letters and numbers. The focus is on black and white images. Extraction is achieved by image segmentation in the color domain, which is why every symbol or character in a document image must be disjoint from the others. The proposed algorithm also extracts handwritten symbols and characters from binary images. Images of text can also be captured with a high resolution camera, and the symbols (including characters) extracted from those images.

Keywords- Image segmentation, Matlab, number of symbols extracted, symbol extraction

ISSN: 2230-7818 © 2011 http://www.ijaest.iserp.org. All rights Reserved.

I. INTRODUCTION
Image segmentation and extraction is the process of dividing an image into segments and then extracting or recognizing objects from it. Image segmentation has basically two parts: color extraction and texture extraction. In the proposed algorithm, symbol extraction is done by image segmentation in the color domain only: where the color intensity changes from white to black, the algorithm extracts each symbol and character from the binary image. Each symbol should therefore be disjoint from the others, because connected symbols are treated as one symbol. Segmentation means finding the coordinates of objects that have the same pixel intensity; cutting that part out of the image using image processing commands is called extraction.

The extraction of textual information from document images has many useful applications in document analysis and understanding, such as optical character recognition, document retrieval, and compression. Document image segmentation is an important component of document image understanding. The extraction of text from an image is a classical problem in computer vision. However, variation of text in size, style, orientation and alignment, together with low image contrast and complex backgrounds, makes automatic text extraction extremely challenging. Characters in a text are sometimes of different shapes and structures. Images may also contain noise and have a complex structure, which makes extraction more difficult [3].

The proposed algorithm uses a minimum spanning tree based segmentation method. A Minimum Spanning Tree (MST) is a minimum-weight, cycle-free subset of a graph's edges such that all nodes are connected. The possibility of stitching together independent subimages motivates adding connectivity information to the pixels: the image can be viewed as a graph whose nodes are pixels and whose edges represent connections between pixels. This method is used to show multiple disjoint symbols which collectively cover the entire image [7].

Symbol extraction is the process of extracting each symbol, including letters, numbers and other symbols, from document images. The purpose of symbol extraction from document images using image segmentation in the color domain is to extract every symbol from the document image and, after extracting the individual symbols, to count the number of symbols extracted. The focus is on black and white images. These include scanned images and live pictures of documents (which may be handwritten). A high resolution camera must be used for this purpose; a 16 megapixel camera was used for the proposed algorithm.

II. PROPOSED SCHEME
This section describes the proposed extraction method. The work is divided into three parts:
A. Binarization
B. Segmentation
C. Extraction
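The graph view of an image described above (pixels as nodes, edges between neighbouring pixels) can be illustrated with a small sketch. The paper's implementation is in MATLAB and its graph construction is not listed, so the Python code below is only an illustration under assumptions: the function name mst_segments, the 4-neighbour edges, the absolute-intensity-difference weights and the max_weight cut-off are all hypothetical choices, not the authors' method. The sketch runs Kruskal's algorithm and stops before adding edges heavier than the cut-off, so the result is a spanning forest whose trees are the segments.

```python
def mst_segments(img, max_weight=0):
    """Group pixels with Kruskal's algorithm, never using edges heavier than max_weight.

    img is a list of rows of numeric intensities. Returns a label matrix of the
    same shape. Assumption: 4-connected pixel graph, weight = |intensity difference|.
    """
    rows, cols = len(img), len(img[0])
    parent = list(range(rows * cols))  # union-find forest, one node per pixel

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Build the edge list: each pixel is linked to its right and lower neighbour.
    edges = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                edges.append((abs(img[r][c] - img[r][c + 1]), node, node + 1))
            if r + 1 < rows:
                edges.append((abs(img[r][c] - img[r + 1][c]), node, node + cols))

    # Kruskal: visit edges by increasing weight and keep those joining two trees.
    for weight, a, b in sorted(edges):
        if weight > max_weight:
            break  # all remaining edges are heavier and would merge different regions
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Relabel the trees compactly as 0, 1, 2, ...
    labels = {}
    return [[labels.setdefault(find(r * cols + c), len(labels)) for c in range(cols)]
            for r in range(rows)]


# Two separate vertical strokes on a uniform background.
example = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
example_labels = mst_segments(example)
```

For a binary image with max_weight = 0, this reduces to grouping connected pixels of equal value, which is why touching symbols end up in a single segment.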
Workflow Model of the Proposed Scheme
A. In the first phase we binarize the RGB image, that is, the algorithm converts the RGB document image containing symbols and characters into a binary image:
1. Read the RGB image, i.e. the document.
2. Convert it into a gray image.
3. Convert the gray image into a binary image using a suitable threshold value (100 has been used).
4. Invert the binary image.
5. Remove all connected components (symbols) that have fewer than 30 pixels from the binary image by morphologically opening the binary image.
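The binarization phase above can be sketched in a few lines. The paper's MATLAB code is not listed, so this pure-Python version is an illustration under stated assumptions: the function names are hypothetical, the gray conversion uses the BT.601 luma weights, the threshold of 100 is interpreted on a 0-255 gray scale, thresholding and inversion are folded into one step, and components are taken as 8-connected.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) triples to gray values using BT.601 luma weights."""
    return [[0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row] for row in rgb]


def binarize_inverted(gray, threshold=100):
    """Threshold and invert in one step: 1 for dark pixels (ink), 0 for light (paper)."""
    return [[1 if v <= threshold else 0 for v in row] for row in gray]


def remove_small(bw, min_pixels=30):
    """Return a copy of bw without 8-connected components smaller than min_pixels."""
    rows, cols = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if bw[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Flood fill one component starting from (r0, c0).
            seen[r0][c0] = True
            stack, component = [(r0, c0)], []
            while stack:
                r, c = stack.pop()
                component.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and bw[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            stack.append((nr, nc))
            # Erase the component if it is too small to be a symbol.
            if len(component) < min_pixels:
                for r, c in component:
                    out[r][c] = 0
    return out
```

A typical call chain would be remove_small(binarize_inverted(to_gray(rgb))), where rgb is a list of rows of (r, g, b) tuples. The 30-pixel limit removes specks of noise but would also remove very small genuine marks, so it should be tuned to the image resolution.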
B. The second phase is segmentation, which determines the coordinates of the objects in the document image:
1. Convert the gray image into RGB format.
2. Label the inverted original image: bwlabel(~(sel_img));
3. Find the coordinates of each object. Take two matrices x, y and get their min (minimum) and max (maximum) using the find function: [x_mat y_mat]=find(img == i);
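The coordinate-finding step above can be sketched as follows. This is a Python illustration, not the authors' MATLAB code: the function name bounding_boxes is hypothetical, the input is assumed to be a label matrix in which 0 is background and each symbol has its own positive label, and x is taken to be the row index and y the column index, matching the order in which find returns its outputs. Indices here start at 0, whereas MATLAB indices start at 1.

```python
def bounding_boxes(labels):
    """Map each nonzero label to (x_min, x_max, y_min, y_max); x = row, y = column."""
    boxes = {}
    for x, row in enumerate(labels):
        for y, lab in enumerate(row):
            if lab == 0:
                continue  # background pixel
            if lab not in boxes:
                boxes[lab] = [x, x, y, y]
            else:
                box = boxes[lab]
                box[0], box[1] = min(box[0], x), max(box[1], x)
                box[2], box[3] = min(box[2], y), max(box[3], y)
    return {lab: tuple(box) for lab, box in boxes.items()}


# Label matrix of an "=" sign: its two bars are not connected,
# so they carry two different labels and yield two boxes.
equals_sign = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 2, 2, 2, 0],
    [0, 0, 0, 0, 0],
]
equals_boxes = bounding_boxes(equals_sign)
```

The example shows why a symbol made of disconnected parts is counted more than once: each part receives its own label and therefore its own bounding box.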
C. The final phase is the extraction algorithm, which extracts the symbols and characters from the document image and counts them:
1. Use the same matrices x, y to plot a bounding box on each symbol.
2. To draw a colored box, convert the binary image into RGB format, then plot a red, green or blue box on the symbols by changing the value.
3. Show each symbol and character separately using the matrices: x_min : x_max, y_min : y_max
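The extraction phase above can be sketched as follows. As before, this is a Python illustration under assumptions rather than the authors' MATLAB code: the function names are hypothetical, the binary image is assumed to hold 1 for ink and 0 for paper, a box is the tuple (x_min, x_max, y_min, y_max) with x as row and y as column, and the outline is drawn one pixel outside the symbol (clamped to the image) so that it does not overwrite the symbol itself.

```python
def crop(bw, box):
    """Cut the sub-image x_min:x_max, y_min:y_max (inclusive) out of bw."""
    x_min, x_max, y_min, y_max = box
    return [row[y_min:y_max + 1] for row in bw[x_min:x_max + 1]]


def to_rgb(bw):
    """Turn a binary image into RGB triples: ink (1) black, paper (0) white."""
    return [[(0, 0, 0) if v else (255, 255, 255) for v in row] for row in bw]


def draw_box(rgb, box, color=(255, 0, 0)):
    """Paint a rectangle outline around box onto rgb, in place."""
    x_min, x_max, y_min, y_max = box
    top, bottom = max(x_min - 1, 0), min(x_max + 1, len(rgb) - 1)
    left, right = max(y_min - 1, 0), min(y_max + 1, len(rgb[0]) - 1)
    for y in range(left, right + 1):
        rgb[top][y] = color
        rgb[bottom][y] = color
    for x in range(top, bottom + 1):
        rgb[x][left] = color
        rgb[x][right] = color


def extract_symbols(bw, boxes):
    """Return the cropped symbols and an RGB copy of bw with every box outlined."""
    rgb = to_rgb(bw)
    symbols = []
    for box in boxes:
        symbols.append(crop(bw, box))
        draw_box(rgb, box)
    return symbols, rgb
```

The number of extracted symbols is then simply the length of the returned list. Changing the color argument of draw_box to (0, 255, 0) or (0, 0, 255) gives a green or blue box instead of a red one.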
We applied the proposed algorithm to three test images, shown in the figures; each original image is followed by its segmented image. Figures 1, 3 and 5 are the original document image, the scanned handwritten image and the captured handwritten image, respectively. Figures 2, 4 and 6 are the resulting segmented images.

Figure 1 Original Image
Figure 2 Segmented Image
Figure 3 Original Scanned Handwritten Image
Figure 5 Original Captured Handwritten Image

III. EXPERIMENTAL RESULTS
The algorithm is implemented in MATLAB. We took RGB document images with different font sizes, generally 24, 26, 28 and 36, set in the Arial Black font. The extraction rate is calculated by dividing the number of extracted symbols by the total number of symbols in the image.

Table 1 Experimental Results
Images               Font Size   Font Type     Extraction Rate
Handwritten Images   Large       -             100%
Document Images      28          Arial black   97%
Document Images      26          Arial black   90%
Document Images      24          Arial black   89%

IV. CONCLUSION
This paper presented the extraction of symbols and characters from document images and handwritten images in which all symbols are disjoint. The major sources of error were symbols such as % and =: because their parts are disconnected, the algorithm takes % as three different symbols and = as two. Black text printed on a white sheet is preferred for a better extraction rate. No extra lighting effects should be present when capturing images with a camera. Our future work is directed towards the segmentation and extraction of symbols from RGB images that contain other objects in addition to the document's symbols.

V. REFERENCES
1. Phalgun Pandya, Mandeep Singh, "Morphology Based Approach To Recognize Number Plates in India", International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume-1, Issue-3, July 2011.
2. Zhe Wang, Yue Lu, Chew Lim Tan, "Word Extraction Using Area Voronoi Diagram", Department of Computer Science, School of Computing, National University of Singapore, Kent Ridge, Singapore, 2009.
3. G. Rama Mohan Babu, P. Srimaiyee, A. Srikrishna, "Text Extraction from Hetrogenous Images Using Mathematical Morphology", Journal of Theoretical and Applied Information Technology, 2005 - 2010 JATIT.
4. Aryuanto, Koichi Yamada, F. Yudi Limpraptono, "Color Segmentation for Extracting Symbols and Characters of Road Sign Images", Department of Electrical Engineering, Institut Teknologi Nasional (ITN) Malang, Indonesia; Department of Management Information Systems Science, Nagaoka University of Technology, Japan.
5. Satadal Saha, Subhadip Basu, Mita Nasipuri and Dipak Kr. Basu, "A Hough Transform based Technique for Text Segmentation", Journal of Computing, Volume 2, Issue 2, February 2010, ISSN 2151-9617.
6. Character recognition overview, http://www.cs.berkeley.edu/~fateman/kathey/char_recognition.html
7. Minimum spanning tree-based segmentation, http://en.wikipedia.org/wiki/Minimum_spanning_tree-based_segmentation
