
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 1, January-February (2013), pp. 337-357
IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2012): 3.9580 (Calculated by GISI) www.jifactor.com

FUZZY RULE BASED CLASSIFICATION AND RECOGNITION OF HANDWRITTEN HINDI CURVE SCRIPT
Gunjan Singh1, Avinash Pokhriyal1, Sushma Lehri2
1 (Faculty of Management & Computer Application, RBS College, Agra, India.)
2 (Professor, IET, Dr. B. R. Ambedkar University, Agra, India.)

ABSTRACT

This paper presents a novel system for the classification and recognition of handwritten Hindi script using a fuzzy rule based approach. Classification and recognition of handwritten Hindi script is a complex task, as the characters are cursive in nature and share many similar features. The ability of fuzzy logic to deal with vague and imprecise data makes it appropriate for such problems. In this paper, we focus on two- or three-letter words without modifiers. Prior to recognition, handwritten words are preprocessed and segmented into individual characters. The performance of an optical character recognition system depends heavily on the procedure used to extract quality features from characters. During the classification stage, characters are classified into seven classes using fuzzy if-then rules based on one of the most important components of Hindi characters, the vertical bar. Features such as curves, lines, junction points and endpoints are used at the recognition stage. A 3x3 mask is used to extract features from the character image. The system was tested on a total of 450 words written by 30 different people. Experimental results show that the proposed method performs classification and recognition at a rate of 92.02%. The proposed system has been implemented in the MATLAB 2009 environment.

Keywords: Classification, Fuzzy rule based approach, Handwritten Hindi curve script, Vertical bar, 8-neighbourhood


1. INTRODUCTION

Character recognition is a broad field that studies all types of machine recognition of characters in various application domains. It includes the recognition of machine printed as well as handwritten characters. Recognition of machine printed characters involves characters produced by a machine, while handwritten character recognition involves characters written by a human being, either online or offline. Recognition of machine printed characters is comparatively easy, as the characters have the same size, font and thickness and a regular shape. Handwritten character recognition is difficult because, owing to varied writing styles, characters may differ in size, width and orientation. A comparison of both approaches is given in [1].

In this paper, we present a fuzzy rule based classification and recognition system for handwritten Hindi script. Hindi is one of the official languages of India. It is the world's third most commonly used language after Chinese and English. Hindi script has 13 vowels (SWARS) and 33 consonants (VYANJANS) in its basic character set. All the characters have two common features: (i) their cursive nature and (ii) the presence of a header line (SHIROREKHA). The header line is an important element of the Hindi script. These features differentiate the script from English and other Latin scripts. Words are formed by combining characters, half characters and/or modifiers using the header line. Fig. 1 shows the basic character set, a list of modifiers and a few words.

Figure 1. (a) Basic character set, (b) swars (vowels) and corresponding matras (modifiers), and (c) a few Hindi language words

Nowadays Hindi is used worldwide in many fields such as banking, medicine, science and technology. Many Hindi words are being included in the world's leading dictionaries and other vocabulary tools. Owing to this increasing popularity, automatic Hindi language recognition systems have become important. Research in this area started in the early 1970s. In 1977, Sethi and Chatterjee [2] presented a constrained recognition system for handwritten Hindi characters. In [3], Sinha and Mahabala presented a syntactic pattern analysis system for the recognition of machine printed and handwritten characters. The first complete OCR system for machine printed characters is presented in [4]. Recognition of handwritten Hindi characters is still difficult for a machine as characters are

cursive in nature and show many similarities, such as the presence of a header line, the presence or absence of a vertical bar, and loops and curves. Plamondon and Srihari [5] published a survey of handwriting recognition in 2000. Most of the work has focused on the recognition of individual characters, and little attention has been paid to the recognition of words, sentences or text. Recognition of words is difficult because words must first be segmented into individual characters. In the present paper, we propose a fuzzy rule based classification and recognition system for handwritten Hindi curve script words of two or three letters without modifiers.

Fuzzy logic is an organized method for solving problems that involve vague, ambiguous, imprecise, noisy, or missing input data. The concept of fuzzy logic was first given by Dr. Lotfi A. Zadeh in 1965 [13]. According to Dr. Zadeh, fuzzy logic is a mathematical tool for dealing with uncertainty. In contrast to crisp logic, which deals with precise values, it is a form of multi-valued logic that provides a way to reason approximately. It therefore gives a machine a better means of simulating human reasoning capabilities. Its ability to deal with approximation makes it appropriate for problems such as handwritten character recognition.

This paper is organized in 5 sections. Section 2 reviews work done in the field of handwritten Hindi character recognition. Section 3 presents the proposed system. Section 4 shows the experimental results. Finally, the conclusion is given in the last section.

2. LITERATURE REVIEW

Hanmandlu et al. [6] presented a fuzzy model based recognition system for handwritten Hindi characters with 90.65% accuracy. The system performs coarse classification of the preprocessed character image by dividing it into 3x3 windows and then determining the presence and position of the vertical bar. Features are then extracted by applying the box approach. For recognition, an exponential variant of the fuzzy membership function, constructed using the normalized vector distance, is used.

Mukherji and Rege [7] presented a shape feature and fuzzy logic based offline handwritten character recognition system for the language with an 86.4% recognition rate. Structural features, such as end points and junction points, and an adaptive thinning algorithm are used for segmenting characters into strokes. Crisp and fuzzy features are then extracted for each stroke of the character. Classification is performed in two stages. Pre-classification uses a tree classifier in which characters are classified based on the presence and position of the vertical line. Final classification and recognition use unordered stroke classification based on mean stroke features.

In [8], a handwritten Hindi vowel character recognition system is presented, in which vowels are segmented into five groups using a projection approach. To extract the core character, the header line is removed by applying horizontal projection and modifiers are removed using vertical projection. Feature extraction is done using invariant moments.

Holambe and Thool [9] presented a system for the recognition of printed and handwritten Devanagari script using support vector machine and k-nearest neighbour classification techniques. Singh, Mittal and Ghosh [10] evaluated a support vector machine with a radial basis function kernel and a k-nearest neighbour classifier and achieved 93.8% accuracy. Two methods, curvelet transform and character geometry, were used for extracting features.


3. PROPOSED SYSTEM

The proposed system works in six stages: preprocessing, segmentation, normalization, classification, feature extraction and recognition. The flow diagram is shown in Fig. 2.

Figure 2. Flow diagram of the proposed system: Start, Preprocessing (Scanning; Noise Reduction by Filtering, Dilation and Erosion; Slant Correction; Binarization; Thinning), Segmentation, Normalization, Classification, Feature Extraction, Recognition

3.1 Preprocessing

During preprocessing, the following operations are performed on the collected data to make it suitable for further processing:

(i) Scanning: Handwritten word samples, collected from various people, are scanned through an optical scanner or camera to convert the data into a gray scale image.
(ii) Noise Reduction: Noise may be introduced into the image during scanning, so the following operations are performed to reduce it:
   (a) Filtering: To reduce noise and false points, a nonlinear spatial filter, the median filter, is applied. A predefined mask is moved over the image and the value of the centre pixel is replaced by the median of the intensity values in the neighbourhood of that pixel [14].
   (b) Dilation: There may be gaps in characters, which are filled by dilation using a structuring element [14].
   (c) Erosion: To eliminate spurious objects from the image, erosion is applied to it.
(iii) Slant Correction: Characters in a word may be inclined upwards or downwards, which makes the feature extraction process difficult. Slant correction is therefore done using the method in [12].
(iv) Binarization: In this paper, features are extracted from binary images of characters, so the image must be converted to binary form. Global thresholding is applied for binarization. The method chooses a single threshold value for the whole image and sets pixels whose value is greater than the threshold to 1 and all other pixels to 0.
(v) Thinning: Finally, the binary image is thinned to single pixel width by the method presented in [11].
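The median filtering and global thresholding steps of Section 3.1 can be sketched as follows. This is a minimal Python/NumPy illustration, not the authors' MATLAB implementation; the function names, the 3x3 window, the edge replication at the image border, and the threshold of 128 are assumptions made for the example.

```python
import numpy as np

def median_filter_3x3(gray):
    """Replace each pixel by the median of its 3x3 neighbourhood (borders replicated)."""
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    # Stack the nine shifted views of the image and take the median across them.
    windows = np.stack([padded[r:r + h, c:c + w] for r in range(3) for c in range(3)])
    return np.median(windows, axis=0)

def binarize(gray, threshold):
    """Global thresholding: 1 where the pixel value exceeds the threshold, else 0."""
    return (gray > threshold).astype(np.uint8)

# Hypothetical test image: a three-pixel-wide bright stroke plus one isolated noise pixel.
image = np.zeros((7, 7))
image[:, 1:4] = 200
image[3, 5] = 255

clean = median_filter_3x3(image)
binary = binarize(clean, threshold=128)
print(binary)
```

In this example the isolated pixel is removed while the stroke survives. A stroke only one pixel wide would also be removed by a 3x3 median filter, which is one reason filtering is applied before thinning.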


3.2 Segmentation

The thinned image of a word is segmented into individual characters using horizontal and vertical histograms (projection profiles) as follows:

(i) First, the horizontal histogram is taken to get the upper and lower boundary of the word.
(ii) Then the vertical histogram is taken to get the region of each character.
(iii) A special case occurs when the number of regions is greater than the number of characters in the word. This may be due to a character whose vertical bar is not connected to the rest of the character. In that case, the region of the vertical bar, which has the highest peak value, is considered to be part of the character to its left.

3.3 Normalization

Binary images of individual characters are normalized to a size of 9x9.

3.4 Classification

All Hindi language characters are made up of mainly three components: the header line or SHIROREKHA, the vertical bar, and curves. In the proposed method, we choose the vertical bar component to classify characters. TABLE 1 shows the features (presence or absence, length, position and connectedness of the vertical bar, and number of junction points) on the basis of which the different classes of characters are formed. A character can belong to one class only.

Table 1: Features used for classification

Feature                                       Symbol   Values
Presence of vertical bar                      VB       P (present), NP (not present)
Position of vertical bar                      POS      M (middle), RE (right end)
Length of vertical bar                        LEN      S (20%-30% of the character width W), L (70%-80% of the character width W)
Connectedness of vertical bar to character    CON      C (connected), NC (not connected)
Number of junction points                     JP       1, 2, 3, 4, or 5

A junction point is a point with 3 or more pixels in its neighbourhood. The method of extracting these features is given in algorithms VERTICALBAR_INFO and NUM_JUNCTIONPOINTS. A movable 3X3 mask (Fig. 3), which shows the 8-neighbourhood of the pixel P0, is applied on the image.
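The histogram-based segmentation of Section 3.2 can be sketched as below. This is an illustrative Python/NumPy sketch under stated assumptions: the function names are hypothetical, foreground pixels are 1, characters are assumed to be separated by at least one empty column, and the expected number of characters is assumed to be known when the detached vertical bar rule (iii) is applied.

```python
import numpy as np

def runs_of_true(mask):
    """Return (start, end) column pairs, end exclusive, for each run of True values."""
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]

def segment_word(binary):
    """Find the word's row bounds, then the column range of each character region."""
    rows = np.flatnonzero(binary.sum(axis=1) > 0)     # horizontal histogram
    top, bottom = int(rows[0]), int(rows[-1]) + 1
    col_hist = binary[top:bottom].sum(axis=0)         # vertical histogram
    return runs_of_true(col_hist > 0), (top, bottom), col_hist

def merge_detached_bar(regions, col_hist, expected):
    """Rule (iii): merge the highest-peak region into the character on its left."""
    regions = list(regions)
    while len(regions) > max(expected, 1):
        peaks = [int(col_hist[s:e].max()) for s, e in regions]
        i = 1 + int(np.argmax(peaks[1:]))             # the first region has no left neighbour
        regions[i - 1] = (regions[i - 1][0], regions[i][1])
        del regions[i]
    return regions

# Hypothetical word: two character bodies and a detached vertical bar in column 7.
word = np.zeros((6, 9), dtype=np.uint8)
word[2:5, 1:3] = 1
word[2:5, 4:6] = 1
word[1:6, 7] = 1

regions, bounds, col_hist = segment_word(word)
merged = merge_detached_bar(regions, col_hist, expected=2)
print(regions, merged, bounds)
```

Here the three raw regions are reduced to two, with the tall bar attached to the character on its left.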


P8 P1 P2
P7 P0 P3
P6 P5 P4

Figure 3: 3X3 mask

In these algorithms, the following notation is used:
CP -- current pixel
CL -- current location
COUNT_1 -- counter variable to count the number of pixels. Initial value is set to 0.
COUNT_2 -- counter variable to count the number of junction points. Initial value is set to 0.
ROW -- current row number
COL -- current column number

Algorithm VERTICALBAR_INFO
To determine the information about the vertical bar, do the following:
1. Starting from the last column of the first row, i.e. ROW == 0 and COL == 8, move the mask over the binary image of the character and check:
   (i) IF the pixel is a foreground pixel THEN call it P0.
       IF the number of neighbouring pixels of P0 ≥ 3 and one of them is P5 THEN do the following:
       (a) Set CP = P0.
       (b) Set N = COL.
       (c) Increase COUNT_1 by 1.
   (ii) ELSE move to the next column to the left and repeat step (i) while COL ≥ 4.
2. To identify the presence of the vertical bar, check the value of COUNT_1.
   IF COUNT_1 == 1 THEN VB is P ELSE VB is NP.
3. To identify the position of the vertical bar, check the value of N.
   IF N ≥ 8 THEN POS is RE ELSE POS is M.
4. To identify the length of the vertical bar and its connectedness to the character, check POS.
   (i) IF POS == M THEN do the following till P5 is encountered:
       (a) Set P5 = P0.
       (b) Increase COUNT_1 by 1.
   (ii) IF COUNT_1 > 3 THEN LEN is L ELSE LEN is S.
   (iii) IF POS == RE THEN set CP = P0 and check the following till P5 is encountered:
       IF P6 OR P7 OR P8 exists THEN CON is C ELSE CON is NC.

Algorithm NUM_JUNCTIONPOINTS
To determine the number of junction points, do the following:
1. Starting from the upper left corner pixel, move the mask over the image from left to right.
2. Find the first foreground pixel P0.
   IF the number of neighbouring pixels of P0 ≥ 3 THEN increase COUNT_2 by 1 ELSE P0 = P3.
3. Repeat step 2 till the rightmost lower pixel is reached.
4. Set JP = COUNT_2.
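The junction point count can be sketched in Python/NumPy as follows. The sketch applies the stated definition (a foreground pixel with 3 or more foreground pixels in its 8-neighbourhood) to every pixel at once instead of stepping the mask pixel by pixel; the function names and the Y-shaped test pattern are assumptions made for illustration.

```python
import numpy as np

def neighbour_count(binary):
    """Number of foreground pixels in the 8-neighbourhood of every pixel."""
    padded = np.pad(binary, 1)            # zero border, so edge pixels have fewer neighbours
    h, w = binary.shape
    total = np.zeros((h, w), dtype=int)
    for dr in range(3):
        for dc in range(3):
            if (dr, dc) != (1, 1):        # skip the centre pixel P0 itself
                total += padded[dr:dr + h, dc:dc + w]
    return total

def count_junction_points(binary):
    """Count foreground pixels that have 3 or more foreground neighbours."""
    return int(np.sum((binary == 1) & (neighbour_count(binary) >= 3)))

# Hypothetical thinned strokes: a Y shape with one branching pixel, and a straight line.
y_shape = np.zeros((5, 5), dtype=np.uint8)
for r, c in [(0, 0), (1, 1), (2, 2), (1, 3), (0, 4), (3, 2), (4, 2)]:
    y_shape[r, c] = 1
line = np.zeros((5, 5), dtype=np.uint8)
line[2, :] = 1

print(count_junction_points(y_shape), count_junction_points(line))
```

With this literal definition, pixels adjacent to a branching point can also reach three neighbours in some stroke configurations (for example a plus-shaped crossing), so the count depends on how cleanly the image has been thinned.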

Using the above mentioned algorithms, the following fuzzy rules are formed to classify the characters into one of the seven classes. The flow process is shown in Fig. 4.

(i) IF VB == NP THEN character belongs to class A ( )
(ii) IF VB == P AND POS == M AND LEN == L THEN character belongs to class B ( )
(iii) IF VB == P AND POS == M AND LEN == S AND JP < 2 THEN character belongs to class C ( )
(iv) IF VB == P AND POS == M AND LEN == S AND JP ≥ 2 THEN character belongs to class D ( )
(v) IF VB == P AND POS == RE AND CON == NC THEN character belongs to class E ( )
(vi) IF VB == P AND POS == RE AND CON == C AND JP < 4 THEN character belongs to class F ( )
(vii) IF VB == P AND POS == RE AND CON == C AND JP ≥ 4 THEN character belongs to class G ( )
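The seven rules above can be written as a small decision function. The sketch below is a hedged Python illustration: the function name and argument names are hypothetical, the rules are evaluated as crisp conditions, and the thresholds in rules (iv) and (vii) are taken as JP ≥ 2 and JP ≥ 4, the complements of rules (iii) and (vi).

```python
def classify(vb, pos=None, length=None, con=None, jp=0):
    """Map the vertical bar features of Table 1 to a class label A to G."""
    if vb == "NP":                          # rule (i): no vertical bar
        return "A"
    if vb != "P":
        raise ValueError("vb must be 'P' or 'NP'")
    if pos == "M":
        if length == "L":                   # rule (ii)
            return "B"
        if length == "S":                   # rules (iii) and (iv)
            return "C" if jp < 2 else "D"
        raise ValueError("length must be 'S' or 'L' when pos is 'M'")
    if pos == "RE":
        if con == "NC":                     # rule (v)
            return "E"
        if con == "C":                      # rules (vi) and (vii)
            return "F" if jp < 4 else "G"
        raise ValueError("con must be 'C' or 'NC' when pos is 'RE'")
    raise ValueError("pos must be 'M' or 'RE'")

print(classify("P", pos="RE", con="C", jp=4))
```

Because each branch tests mutually exclusive conditions, every valid feature combination maps to exactly one class, which matches the statement that a character can belong to one class only.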

Figure 4. Flow process of classification. Starting from the normalized 9X9 image of the character, the flowchart tests the presence of the vertical bar, then its position, then its length (when POS is M) or its connectedness (when POS is RE), and finally the number of junction points, to assign the character to one of the classes A to G.
Legend: VB: vertical bar; A: absent; POS: position of vertical bar; RE: right end; M: middle; LEN: length of vertical bar; L: large; S: small; JP: junction point; CON: connectedness of vertical bar; NC: not connected.

3.5 Feature Extraction

The steps for extracting features are given in the following algorithm.

Algorithm FEATURE_REC
1. Remove the header line by applying the following method:
   (i) Apply the 3X3 movable mask on the normalized image and scan the first row from right to left.
   (ii) IF the pixel is a foreground pixel THEN call it P0.
        IF P7 is a foreground pixel OR P0 is an end point OR P0 is a disconnected component THEN SET P0 = 0
        ELSE move to the left pixel.

The image is scanned from right to left to avoid the deletion of character pixels in characters such as: because these characters, except , may be written in two ways: (a) the header line covers the whole character, or (b) the header line covers only half or a portion of the character. In the first case, this step may delete pixels that are common to the header line and the character, in the characters mentioned above as well as in characters such as , and may produce some disconnected components with a small number of pixels.
2. Delete disconnected components as follows:
   (i) Scan the second row of the image from left to right.
   (ii) Find the first foreground pixel P0.
   (iii) IF P3 == 1
            IF no other pixel exists in the 8-neighbourhood of P3 THEN SET P0 = 0 AND P3 = 0
         ELSE IF P5 == 1
            IF no other pixel exists in the 8-neighbourhood of P5 THEN SET P0 = 0 AND P5 = 0
Fig. 5 shows the process of deleting the header line from a character and its result.

Figure 5: (a) Character with header line, (b) Character without header line and disconnected component, (c) Character after removing disconnected component

3. Apply the 3X3 movable mask on the normalized image of the classified character and scan the image from top to bottom, row wise. Collect the following information for junction points and end points:
   (i) N1: total number of junction points
   (ii) N2: total number of end points
   (iii) JPi: ith junction point, where i = 1 to N1
   (iv) EPi: ith end point, where i = 1 to N2
   (v) Curve(JPi): curve on ith junction point (Table 2)
   (vi) Curve(EPi): curve on ith end point
   (vii) Line(JPi): line on ith junction point (Table 2)

   (viii) Line(EPi): line on ith end point
   (ix) Loop(JPi): loop on ith junction point
   (x) D1(i): direction of next end point from ith end point
   (xi) D2(i): direction of next junction point from ith junction point

Values and symbols of the different types of curves, lines and loops are given in TABLE 2.

Table 2: Values and symbols for curves, lines and loop

Feature   Value               Symbol
Curve     Left curve          LC
          Upper left curve    ULC
          Lower left curve    LLC
          Right curve         RC
          Upper right curve   URC
          Lower right curve   LRC
          U curve             U
Line      Vertical line       VL
          Horizontal line     HL
          Back slash          BS
Loop      Present             P
          Not present         NP

Different forms of the above mentioned curves, lines and loops are shown in Fig. 6. In this algorithm, the following notation is used:
PS -- starting point
CL -- current location
CP -- current pixel
COUNT -- counter variable. Initial value is set to 0.


Algorithm CURVE_LINE_LOOP_INFO
To determine the nature of the curve, do the following:
Move the mask over the binary image of the classified character from bottom to top, row wise. Let P be the first foreground pixel. Call it the current pixel (CP).
1. If CP is a junction point or end point, then check the 8-neighbourhood of CP.
   (a) IF P1 is true THEN
          (i) Repeat till P1 is encountered
          (ii) Increase COUNT by 1.
       ELSE stop.
   (b) IF P3 is true THEN
          (i) Repeat till P3 is encountered
          (ii) Increase COUNT by 1.
       ELSE stop.
   (c) IF P8 is true THEN
          (i) Repeat till P8 is encountered
          (ii) Increase COUNT by 1.
       ELSE stop.
   (d) IF P1 OR P2 is true THEN
          (i) Repeat till P1 OR P2 is encountered
          (ii) Increase COUNT by 1.
       ELSE stop.
   (e) IF P1 OR P8 is true THEN
          (i) Repeat till P1 OR P8 is encountered
          (ii) Increase COUNT by 1.
       ELSE stop.
   (f) IF P2 OR P3 OR P4 is true THEN
          (i) Repeat till P2 OR P3 OR P4 is encountered
          (ii) Increase COUNT by 1.
       ELSE stop.
   (g) IF P4 OR P5 is true THEN
          (i) Repeat till P4 OR P5 is encountered
          (ii) Increase COUNT by 1.
       ELSE stop.
   (h) IF P6 OR P7 OR P8 is true THEN
          (i) Repeat till P6 OR P7 OR P8 is encountered
          (ii) Increase COUNT by 1.
       ELSE stop.
2. Check the following to know the type of curve and line:
   (i) IF step 1(h) is true
          IF step 1(a) is true
             IF step 1(f) is true
                IF COUNT ≥ 3 THEN Curve is LC

   (ii) ELSE IF step 1(e) is true
          IF step 1(f) is true
             IF COUNT ≥ 2 THEN Curve is ULC
   (iii) ELSE IF step 1(h) is true
          IF step 1(e) is true
             IF COUNT ≥ 2 THEN Curve is LLC
   (iv) ELSE IF step 1(f) is true
          IF step 1(a) is true
             IF step 1(h) is true
                IF COUNT ≥ 3 THEN Curve is RC
   (v) ELSE IF step 1(d) is true
          IF step 1(h) is true
             IF COUNT ≥ 2 THEN Curve is URC
   (vi) ELSE IF step 1(f) is true
          IF step 1(e) is true
             IF COUNT ≥ 2 THEN Curve is LRC
   (vii) ELSE IF step 1(g) is true
          IF step 1(h) OR step 1(f) is true
             IF step 1(d) is true
                IF COUNT ≥ 3 THEN Curve is U
   (viii) IF step 1(a) is true
          IF COUNT ≥ 2 THEN Line is VL
   (ix) IF step 1(b) is true
          IF COUNT ≥ 2 THEN Line is HL
   (x) IF step 1(c) is true
          IF COUNT ≥ 2 THEN Line is BS
3. If CP is a junction point, then do the following to check the presence of a loop:
   IF step 1(h) is true
      IF step 1(a) OR step 1(g) is true
         IF step 1(f) is true
            IF Pi == CP AND COUNT ≥ 5 THEN Loop is P.


Figure 6: Different types of curves: (a) Left curve (LC), (b) Upper left curve (ULC), (c) Lower left curve (LLC), (d) Right curve (RC), (e) Upper right curve (URC), (f) Lower right curve (LRC), (g) U curve (U), (h) Vertical line (VL), Horizontal line (HL), Backward slash (BS), (i) Loop


3.6 Recognition

Fuzzy rules are used for recognition. The class wise rules applied for characters are:

1. IF Class is A
      IF Curve(EP1) == RC THEN character is
      ELSE IF Curve(JP1) == LRC
         IF N2 == 4 OR D1(3) == P3 THEN character is
         ELSE character is
2. IF Class is B
      IF Curve(EP2) == LC THEN character is
      ELSE IF Curve(EP2) == URC
         IF Curve(JP1) == LC OR Loop(JP1) == P THEN character is
         ELSE character is
3. IF Class is C
      IF Curve(EP1) == LC THEN character is
      ELSE IF Curve(EP1) == RC
         IF N2 == 3 THEN character is
         ELSE character is
4. IF Class is D
      IF Curve(EP1) == LC THEN character is
      ELSE IF Curve(JP1) == LC
         IF N2 < 2 THEN character is
         ELSE IF N2 == 2 THEN character is
         ELSE character is
      ELSE IF Loop(JP1) == P
         IF Curve(JP1) == RC OR URC THEN character is
         ELSE character is
5. IF Class is E
      IF Loop(JP1) == P
         IF N1 == 2 THEN character is
         ELSE character is
      ELSE IF Curve(EP1) == U THEN character is
6. IF Class is F
      IF N2 > 3
         IF Curve(EP1) == ULC THEN character is
         ELSE IF Curve(EP1) == RC OR Curve(EP2) == RC THEN character is
         ELSE character is
      ELSE IF N2 == 3
         IF Curve(JP1) == LLC THEN character is
         ELSE IF Curve(JP1) == U THEN character is
         ELSE IF Curve(EP1) == ULC THEN character is
         ELSE character is
      ELSE IF Curve(JP1) == U THEN character is
      ELSE IF Curve(JP1) == LLC THEN character is
      ELSE IF Curve(JP1) == LC OR Loop(JP1) == P THEN character is
      ELSE character is
7. IF Class is G
      IF N2 > 4
         IF Curve(EP1) == RC THEN character is
         ELSE IF Line(EP1) == BS
            IF D2(1) == P3 OR D2(2) == P3 THEN character is
            ELSE character is
      ELSE IF N2 == 4
         IF Loop(JP1) == P THEN character is
         ELSE character is
      ELSE IF Curve(JP1) == LLC OR U THEN character is
      ELSE IF Curve(JP1) == LC THEN character is
      ELSE IF Loop(JP1) == P
         IF Loop(JP3) == P OR Line(EP2) == HL THEN character is
         ELSE character is


Table 3: Summary of fuzzy rules for each character
Columns: Class, N1, N2, Curve(JP), Curve(EP), Line(JP), Line(EP), Loop(JP), D1, D2, D3, Character. The table lists, for each character of classes A to G, the feature values tested by the class wise rules of Section 3.6.


4. EXPERIMENTAL RESULTS

The dataset was created by collecting handwritten word samples from 30 people of different age groups. Each person was asked to write 15 predecided words, giving 450 word samples in total. A part of the dataset is shown in the following figure.

Figure 7: Word samples taken for experiment


These word samples were scanned using a flat-bed scanner at 300 dpi. The results of the operations performed during the recognition process on the scanned image of a word are shown in the following figure.

Figure 8. Result of operations performed during preprocessing, segmentation and classification on a sample word. Panels: Original image, Filtered image, Eroded and dilated image, Binarized image, Thinned image, Segmented image, Classification. Features found for the three segmented characters: (VB == P, POS == RE, CON == C, JP ≥ 4), (VB == NP), and (VB == P, POS == RE, CON == C, JP < 4).

After classification, the features listed in TABLE 2 are extracted for each character by applying algorithm FEATURE_REC and are then used at the time of recognition. The recognition rate for each word sample and for the proposed method is given in TABLE 4.

Table 4. Average recognition rate of selected words

Recognition rates of characters 1, 2 and 3 of the word samples: 92.15% 94.08% 88.23% 94% 90.93% 94.14% 83.66% 95% 95.22% 96.31% 88.42% 89.75% 90.68% 96.29% 88.57% 96.81% 87.41% 90.11% 97.26% 90.17% 93.96% 93.48% 92.01% 92.45% 92.31% 83.52% 88.99% 94.43% 93.91% 97.44% 96.80% 87.23% 95.06% 90% 92.07% 84.36% 89.76% 91.19% 94.21% 93.46%

Sample   Avg. recognition rate
S1       91.48%
S2       90.44%
S3       94.41%
S4       91.43%
S5       89.89%
S6       90.94%
S7       92.33%
S8       93.31%
S9       91.64%
S10      88.91%
S11      89.83%
S12      95.36%
S13      91.24%
S14      97.12%
S15      92.10%

Overall Average Recognition Rate: 92.02%
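As a consistency check, the overall rate can be recomputed as the mean of the fifteen per-sample averages in Table 4. The short Python sketch below is only an illustration of that arithmetic; the variable names are hypothetical, and it assumes the overall figure is the unweighted mean of the per-sample averages.

```python
import math

# Average recognition rates of samples S1 to S15, copied from Table 4.
sample_rates = [91.48, 90.44, 94.41, 91.43, 89.89, 90.94, 92.33, 93.31,
                91.64, 88.91, 89.83, 95.36, 91.24, 97.12, 92.10]

overall = sum(sample_rates) / len(sample_rates)
truncated = math.floor(overall * 100) / 100   # keep two decimals without rounding up

print(f"mean = {overall:.4f}, truncated to two decimals = {truncated:.2f}")
```

The mean is about 92.03% when rounded and 92.02% when truncated to two decimals, so it agrees with the reported overall rate to within rounding.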


Figure 9. Graphical representation of recognition rate of sample words (horizontal axis: samples S1 to S15; vertical axis: recognition rate in percent, from 84 to 98)

5. CONCLUSION

In this paper, we have presented a novel method for the classification and recognition of simple Hindi language two- or three-letter words without modifiers using a fuzzy rule based approach. Characters are first classified into seven different classes and then recognized class wise. A few misclassification cases arise due to the presence of characters with similar shapes, such as & and & , and characters which can be written in more than one way, such as & . We have extracted features for all the basic characters of the language for the recognition process. The algorithms developed perform well because the most prominent features, such as the vertical bar, curves, loops and lines, are used at the classification and recognition stages. Experimental results support the proposed system, with a 92.02% recognition rate. Fuzzy logic is well suited to this problem as it can deal with imprecise, incomplete and vague data efficiently without losing important information. In future, we will work to improve the recognition rate by placing more emphasis on characters having similar shapes, such as , and on Hindi words with modifiers.

REFERENCES

Journal Papers:
[1] N. Arica and F.T. Yarman-Vural, An overview of character recognition focused on off-line handwriting, C99-06-C-203, 2000, IEEE.
[2] I.K. Sethi and B. Chatterjee, Machine recognition of constrained hand printed Devnagari, Pattern Recognition, vol. 9, no. 2, 1977, pp. 69-75.
[3] R.M.K. Sinha and H. Mahabala, Machine recognition of Devnagari script, IEEE Trans. System, Man Cybern. 9, 1979, 435-441.
[4] S. Palit, B.B. Chaudhuri, P.P. Das, B.N. Chatterjee, Pattern Recognition, Image Processing and Computer Vision, Narosa Publishing House, India, 1995, 163-168.


[5] R. Plamondon and S. N. Srihari, On-line and off-line handwriting recognition: A comprehensive survey, IEEE Trans. Pattern Anal. Mach. Intell., vol. 22(1), 2000, pp. 63-84.
[6] M. Hanmandlu, O.V. R. Murthy and V. K. Madasu, Fuzzy model based recognition of handwritten Hindi characters, 0-7695-3067-2/07, 2007, IEEE.
[7] P. Mukherji and P.P. Rege, Shape Feature and Fuzzy Logic Based Offline Devnagari Handwritten Optical Character Recognition, Journal of Pattern Recognition Research 4, 2009, 52-68.
[8] R.J. Ramteke, Invariant moments based feature extraction for handwritten Devnagari vowel recognition, IJCA (0975-8887), Vol. 1, No. 18, 2010.
[9] A. N. Holambe and R.C. Thool, Printed and handwritten character & number recognition of Devanagari script using SVM and KNN, Int. Journal of Recent Trends in Engineering and Technology, Vol. 3, No. 2, May 2010.
[10] B. Singh, A. Mittal and D. Ghosh, An evaluation of different feature extractors and classifiers for offline handwritten Devnagari character recognition, Journal of Pattern Recognition Research 2, 2011, 269-277.
[11] A. Pokhriyal and S. Lehri, MERIT: Minutiae Extraction Using Rotation Invariant Thinning, International Journal of Engineering Science & Technology, vol. 2(7), 2010, 3225-3235.
[12] Primekumar K.P and Sumam Mary Idicula, Performance of On-Line Malayalam Handwritten Character Recognition Using HMM and SFAM, International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 115-125, Published by IAEME.

Proceeding Papers:
[12] P. Mukherji, P. P. Rege and L. K. Pradhan, Analytical Verification System for Handwritten Devnagari Script, Proceedings of the Sixth IASTED VIIP, pp. 237-242, Palma De Mallorca, Spain, August 2006.

Books:
[13] S.N. Sivanandam and S. N. Deepa, Principles of Soft Computing (Second Edition, Wiley-India)
[14] R.C. Gonzalez and R.E. Woods, Digital Image Processing (Second Edition, Prentice Hall)
