
Perceptual Zoning for Handwritten Character Recognition

Simone B. K. AIRES a,b, Cinthia O. A. FREITAS a, Flávio BORTOLOZZI a, Robert SABOURIN c
a Pontifical Catholic University of Paraná (PUCPR), R: Imaculada Conceição, 1155, Curitiba (PR), Brazil
b Centro Federal de Educação Tecnológica (CEFET), Av: Monteiro Lobato km 4, Ponta Grossa (PR), Brazil
c École de Technologie Supérieure (ETS), 1100, Rue Notre Dame Ouest, Montreal (QC), Canada
saibes@ppgia.pucpr.br, cinthia@ppgia.pucpr.br, fborto@ppgia.pucpr.br, robert.sabourin@etsmtl.ca

Abstract. This study investigates a perceptual zoning mechanism for handwritten character recognition and proposes a non-symmetrical zoning mechanism as the baseline feature extractor. Zoning is a method for analysing local information on partitions of a given pattern. The basic idea is to work at a high level of representation: feature vectors are extracted from the word images using a feature set built on a zoning mechanism that takes into account the parts of letters that cause confusion. Our feature extraction is based on Concavities/Convexities Deficiencies, which are obtained by labelling the background pixels of the input images. Each letter is circumscribed by a rectangle, which is partitioned into Z parts, with Z = 4, 5H (horizontal), 5V (vertical), and 7. In our experiments the 7-zone configuration gives the best average recognition rate, 84.73%, which is a promising result for the handwritten character recognition field.

1. Introduction
Handwritten character recognition has become important as ICR systems grow more powerful and commercially available. There is still a gap, however, between human reading capabilities and recognition systems, so it is necessary to explore and capture information from human perception when designing new systems (Madhavanath & Govindaraju, 1998), (Schomaker & Segers, 1998), (Suen et al., 1994). To understand the recognition process, let us consider two different views. One view, the feature theory, uses a checklist approach: it claims that letters are represented in the nervous system as a set of features, lines, and contours of various orientations (Sekuler & Blake, 1994). The other view uses a spatial frequency approach derived from the work of Campbell and Robson (Sekuler & Blake, 1994). The tendency of letters to be confused with one another is then used to define their perceptual similarity. The basic idea is that when two letters look very much alike, they will often be confused. Figure 1 illustrates this idea for the letter pairs U and V, and O and Q. A good model of perceptual similarity should predict which pairs of letters are confused and which are not.

Figure 1: Similarity between letters: a) U and V; b) O and Q

Our work is based on the feature approach; it is therefore necessary first to compile a checklist of features, that is, to decide which features should make up the list. Table 1 lists the features distinguishing some letters of the alphabet, namely A, C, and G. A complete list is given in (Sekuler & Blake, 1994), adapted from Geyer & DeWalt (1973).
Table 1: List of the features distinguishing letters of the alphabet

Features (table columns: A, C, G)
External: 1. Slant (/); 2. Slant (\); 3. Convex segment
Open: 4. Horizontal
5. Intersection, internal
6. Bar-horizontal
7. Symmetry, vertical
8. Symmetry, horizontal

Values as extracted (row positions not recoverable): A: 1 1; C: 1; G: 1 1; unassigned: 2 1 1 1

According to the feature theory, two letters that have many features in common will tend to be confused, whereas letters that have few features in common will not. Table 1 shows that the letters C and G have two features in common (convex segment and horizontal open). They differ in two features, bar-horizontal and horizontal symmetry, which are difficult to extract from the letter stroke because they depend on the writing style (Figure 2).
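The checklist argument can be made concrete with a small sketch. The feature sets below follow the C and G comparison in the text; assigning the horizontal bar to G and horizontal symmetry to C is our reading of the letter shapes, and the Jaccard overlap is an illustrative similarity measure chosen here, not one taken from the cited literature.

```python
# Illustrative feature sets for two letters; the names are hypothetical labels
# for the Table 1 features, and the C/G assignment is an assumption.
FEATURES = {
    "C": {"convex_segment", "open_horizontal", "symmetry_horizontal"},
    "G": {"convex_segment", "open_horizontal", "bar_horizontal"},
}


def feature_overlap(a, b):
    """Jaccard overlap between two feature sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


shared = FEATURES["C"] & FEATURES["G"]      # features both letters have
differing = FEATURES["C"] ^ FEATURES["G"]   # features only one letter has
overlap_cg = feature_overlap(FEATURES["C"], FEATURES["G"])
```

Under these assumed sets, C and G share two of four distinct features, which is the kind of high overlap the feature theory associates with confusion.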

When designing a recognition system, we take into account that perception depends on cooperative interaction between the processing of global and local information. Handwritten character or word recognition is an example of a stimulus that contains both kinds of information. We become experts in recognizing characters from early childhood onwards. But when we observe only a part of a letter, its identification is not that obvious. In the first case, we are processing global information; in the second, we need to process local information (Suen et al., 1994). A zoning mechanism has therefore been proposed to evaluate the recognition rates of the distinct parts of characters (Suen et al., 1994). Zoning is a method for analysing local information on partitions of a given pattern. The basic idea in this paper is to work at a high level of representation: feature vectors are extracted from the word images using a feature set built on a zoning mechanism that takes perceptual similarities into account.

2. Visual Perception Concepts


In this section we summarize the visual perception concepts related to handwritten word recognition. The Gestalt theory describes the principles of organization, which tend to encourage the emergence of perceptual forms and promote the grouping of those forms, segregated from their surroundings. This theory is beyond the scope of this paper; an excellent introduction to the subject can be found in (Sekuler & Blake, 1994). Generally speaking, people organize what they see. In Figure 2, the cluster of plus signs (+) stands out in the midst of the little Ls; it is very hard, though, to see the cluster of Ts embedded in the Ls on the right-hand side of the figure. Only by scrutinizing the elements individually can one discover the existence of a group of Ts. Because only particular properties promote grouping, these properties may constitute the basic elements of perception. We call these visual elements primitives (Sekuler & Blake, 1994). On the assumption that knowledge of these primitives might reveal how grouping processes work, many researchers, including Beck, Julesz, Prazdny, and Rosenfeld, have tried to identify vision's primitives. The Gestalt theory postulates that human beings have a tendency to interpret a visual stimulus as a complete scene. This tendency is known in Gestalt theory as the closure concept. In our case, this approach is called the Global Approach.

Figure 2: Visual Perception: a) Difference based on bar-horizontal feature (Letters C and G); b) Demonstration of some limits on object segregation by shape

However, studies can also take the structuralist approach into account. This approach treats form perception as an analytical process in which complex forms are decomposed into small, simple elements. In our case, it corresponds to the approach that segments words into letters or pseudo-letters, which researchers call the Local or Analytical Approach. In Figure 2, the first level of perception is Global, at which the observer can identify the cluster of plus signs. The second level of perception is Local, and only at this level can the observer identify the cluster of Ts. In conclusion, this work takes the Gestalt theory into account by applying first a Global feature extraction based on Concavities/Convexities Deficiencies and second a Local perceptual zoning mechanism. The zoning mechanism allows the elements (features) to be scrutinized individually. The feature set extracted from the images is presented in Sections 3 and 4.

3. Handwritten Character Recognition


Since the late 1960s, research on the recognition of unconstrained handwritten characters has made impressive progress and many systems have been developed, particularly for machine-printed and on-line character recognition. However, there is still a significant performance gap between humans and machines in the recognition of off-line, totally unconstrained handwritten characters. Generally, an off-line handwritten character recognition system includes three stages: image preprocessing, feature extraction, and classification. Preprocessing is primarily used to reduce noise or variations in handwritten characters. Feature extraction is essential for data representation and for extracting meaningful features for later processing. The classification stage assigns each character to one of several classes. Given its influence on recognition performance, feature extraction plays a very important role in handwriting recognition. This has led to the development of a variety of features for handwriting recognition, whose recognition performance has been reported (Madhavanath & Govindaraju, 1998), (Suen et al., 1994). The baseline system used in this work applies a Global Approach for feature extraction combined with a Local Approach based on a zoning mechanism, and uses a class-modular feedforward MLP (Multilayer Perceptron) architecture in the classification stage. The system takes a 256 grey-level image as input. The preprocessing step consists of binarization (Otsu, 1979) and bounding box definition. The feature set is based on Concavities/Convexities Deficiencies, which are obtained by labelling the background pixels of the input images (Parker, 1997). The complete set of symbols was adapted to handwritten characters, giving 24 different symbols. Figure 3 presents an example of feature extraction applied to the letter S.
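The preprocessing and labelling steps described above can be sketched as follows. This is a minimal illustration, assuming dark ink on a light background and an 8-bit image. The labelling function uses a simplified 4-direction code (16 possible values), which is our assumption for illustration and not the 24-symbol alphabet used in this work; all function names are hypothetical.

```python
import numpy as np


def otsu_threshold(gray):
    """Grey level that maximises the between-class variance (Otsu, 1979)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # weight of the class with level <= t
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to level t
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0                         # skip thresholds with an empty class
    sigma_b[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def bounding_box(ink):
    """Crop a boolean ink mask to the smallest rectangle containing all ink.

    Assumes the mask contains at least one ink pixel.
    """
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    return ink[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def label_background(ink):
    """Label each background pixel by the directions in which ink is found.

    Bit 1 = ink above, 2 = ink below, 4 = ink to the left, 8 = ink to the right.
    Ink pixels receive -1. Simplified stand-in for the paper's symbol set.
    """
    up = np.logical_or.accumulate(ink, axis=0)
    down = np.logical_or.accumulate(ink[::-1], axis=0)[::-1]
    left = np.logical_or.accumulate(ink, axis=1)
    right = np.logical_or.accumulate(ink[:, ::-1], axis=1)[:, ::-1]
    code = up * 1 + down * 2 + left * 4 + right * 8
    return np.where(ink, -1, code)


def preprocess_and_label(gray):
    """Binarize, crop to the bounding box, and label the background pixels."""
    ink = gray <= otsu_threshold(gray)        # assumption: ink is darker than paper
    return label_background(bounding_box(ink))
```

A background pixel enclosed on all four sides receives code 15, while a pixel inside a concavity that opens to the right, as in the letter C, receives code 7.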

4. Perceptual Zoning: Symmetrical versus Non-Symmetrical


Several authors have presented zoning mechanisms, or regional decomposition methods, to investigate the recognition rates of patterns based on their parts and to discover potential candidates when an uncertainty (or confusion) occurs at a given part. Let us consider how humans read characters. Humans often concentrate on the significant parts of the characters for effective and efficient reading. But do we really know which parts are meaningful, and where in the characters they are located? In this paper we analyse the meaningful parts of the characters using the confusion matrix obtained in the recognition process. Analysing the confusion matrix allows us to understand which parts of the characters cause the confusion. Suen et al. (1994) applied a zoning mechanism in their experiments on handprinted characters. They analysed 4 different configurations: the letter is circumscribed by a rectangle, which is partitioned into Z parts, for example Z = 2 (Left-Right, 2LR), 4, and 6. That paper observed that the letter D always ranks at the top (100%), and that the letters A, K and G give higher recognition rates (100%) than P, I and T (54%). The authors comment on the 2LR case for Y and explain that this zoning is perfect for its recognition, but that it causes difficulty for B because its left half is confused with E. It should therefore be noted that different partitions may produce large differences in recognition rates. For instance, with Z = 6 the character B is confused with 6 characters: C, G, J, O, S, U. Our experiments were based on these studies and started with Z = 4. We analysed the confusion matrix, looking for the relation between the regions and the confusions. We then experimented with other zoning mechanisms, Z = 5 Horizontal (5H), 5 Vertical (5V), and 7, as shown in Figure 4.

Figure 3: Feature extraction: letter S

Figure 4: Zoning mechanism: Z = 4, 5H, 5V, and 7 parts

5. IRONOFF Database
The experiments were carried out using the handwritten character database from IRESTE/University of Nantes (France), called IRONOFF (IReste ON/OFF Dual Database), consisting of 26 classes of uppercase characters from Form B, fields B27 to B52 (Viard-Gaudin, 1999). The IRONOFF database was selected because it is fully cursive. The samples were collected from about 700 writers, mainly of French nationality. The off-line data were scanned at 300 dpi with 8 bits per pixel. The experiments used 3 subsets, which we call the Training, Validation, and Test databases, comprising 60%, 20%, and 20% of the data, respectively. The database has a total of 10,510 images.
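The 60/20/20 partition can be sketched as below. The text does not say how samples were assigned to the subsets, so the random shuffle, the seed, and the function name are assumptions for illustration only.

```python
import numpy as np


def split_indices(n_samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle sample indices and cut them into training/validation/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])


train_idx, val_idx, test_idx = split_indices(10510)
```

Applied to 10,510 images, these proportions give 6,306, 2,102, and 2,102 indices.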

6. Class-Modular NN Recognition Method


In a modular approach, a single task is decomposed into multiple subtasks and each subtask is allocated to an expert network. In this paper, as in (Oh & Suen, 2002), class-modular classification decomposes the K-classification problem into K 2-classification subproblems, one for each of the K classes. Each 2-classification subproblem is solved by a 2-classifier specifically designed for the corresponding class. The modular MLP classifier consists of K sub-networks, M_i for 0 ≤ i ≤ K-1, each responsible for one of the K classes (K = 26 letters).
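The decomposition can be sketched as follows, leaving the sub-networks themselves out. The first function builds the two-class targets on which each M_i would be trained. The second combines the K sub-network outputs with a winner-takes-all rule, which is an assumption here because the text does not state how the outputs are combined. Function names are hypothetical.

```python
import numpy as np


def binary_targets(labels, n_classes):
    """Row i holds the 0/1 targets for sub-network M_i (class i versus the rest)."""
    labels = np.asarray(labels)
    return (np.arange(n_classes)[:, None] == labels[None, :]).astype(int)


def class_modular_decision(scores):
    """Pick, for each sample, the class whose sub-network responds most strongly.

    `scores` has shape (K, n_samples); row i is the output of M_i.
    """
    return np.argmax(scores, axis=0)
```

For K = 26, `binary_targets` yields 26 target rows, one per letter, so that each sub-network sees every training sample as either its own class or not.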

7. Experimental Results
The recognition rate obtained with Z = 4 is 82.89%, as presented in Table 2. After running the recognition procedure on the validation database, we performed an error analysis based on the confusion matrix (Table 3). The matrix reveals the following confusions: B, D and O; C and E; D and O; H and M; I, F and J; G and Q; J and D; K and M; N and W; R and A; S and D; W, U and V; X and K; Y and X. We then experimented with zoning into 5 parts, 5-Horizontal and 5-Vertical. The idea was to provide a better solution to the confusion problems among shapes that are not symmetrical, such as G and Q (Figure 5a), D and O, and Y and X. The recognition rates obtained are 81.75% for 5H and 80.94% for 5V (see Table 2). Although both averages are below that of Z = 4, the 5H zoning gives better results for the letters G, O, and Y (Table 2). This zoning mechanism helps with letters that are not horizontally symmetric, as presented in Figure 5b. Finally, we experimented with zoning into 7 parts. The idea was to provide a better solution to the confusion problems among shapes that are not symmetrical and to extract and represent the middle zone of the character separately, for pairs such as D and C, N and W, and Y and X. This zoning performs better than the others for the following letters: B, C, D, E, K, N, P, R, U, W, and X (Table 2 and Figure 5c). It achieved a recognition rate of 84.73%, which shows that this approach is promising and that some representations are more robust than others for discriminating among subsets of letters.
Table 2: Recognition Rate (%)

Letter       4      5H      5V       7
A        91.04   86.57   89.55   91.04
B        65.67   64.18   74.63   79.10
C        82.09   79.10   68.66   88.06
D        73.13   65.67   68.66   82.09
E        83.58   85.07   89.55   95.52
F        92.54   91.04   89.55   92.54
G        82.09   86.57   80.60   80.60
H        88.06   85.07   70.15   76.12
I        76.12   71.64   76.12   71.64
J        83.58   79.10   79.10   82.09
K        77.61   76.12   77.61   80.60
L        92.54   89.55   86.57   91.04
M        92.54   82.09   85.07   88.06
N        68.66   77.61   70.15   86.57
O        86.57   89.55   88.06   83.58
P        86.57   92.54   91.04   94.03
Q        82.09   64.18   76.12   80.60
R        86.57   89.55   88.06   91.04
S        79.10   79.10   79.10   76.12
T        95.52   97.01   97.01   97.01
U        80.60   85.07   82.09   86.57
V        95.52   82.09   88.06   82.09
W        70.15   74.63   65.67   79.10
X        76.12   74.63   70.15   79.10
Y        77.61   89.55   85.07   82.09
Z        89.55   88.06   88.06   86.57
Average  82.89   81.75   80.94   84.73
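The Z = 4 rates in Table 2 can be cross-checked against the diagonal of the confusion matrix in Table 3. The sketch below assumes 67 samples per letter; this total is not stated in the text but is the value for which the diagonal counts reproduce the reported percentages.

```python
# Diagonal of Table 3 (correctly classified samples per letter, Z = 4).
CORRECT_Z4 = {
    "A": 61, "B": 44, "C": 55, "D": 49, "E": 56, "F": 62, "G": 55,
    "H": 59, "I": 51, "J": 56, "K": 52, "L": 62, "M": 62, "N": 46,
    "O": 58, "P": 58, "Q": 55, "R": 58, "S": 53, "T": 64, "U": 54,
    "V": 64, "W": 47, "X": 51, "Y": 52, "Z": 60,
}
SAMPLES_PER_LETTER = 67  # assumption inferred from the reported rates

# Per-letter recognition rate: correct samples over samples of that letter.
rates = {
    letter: round(100.0 * correct / SAMPLES_PER_LETTER, 2)
    for letter, correct in CORRECT_Z4.items()
}

# Average rate: all correct samples over all samples.
total = SAMPLES_PER_LETTER * len(CORRECT_Z4)
average = round(100.0 * sum(CORRECT_Z4.values()) / total, 2)
```

Under this assumption the computed values agree with the Z = 4 column of Table 2, for example 91.04% for A and 82.89% on average.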

8. Conclusion
In this paper, we explored perceptual zoning based on the confusion matrix and the information it provides about the parts of letters that cause confusion. We conclude that this type of analysis should make it possible to design a global hierarchical recognition system composed of networks specialised in sub-problems of handwritten character recognition. A representation suited to each specific sub-problem can then be applied to improve the overall recognition performance.

Table 3: Confusion matrix: Z = 4 (NI = not identified)

Correctly classified samples per letter (diagonal of the matrix); the column positions of the off-diagonal entries are not recoverable from the extracted layout.

Letter    A   B   C   D   E   F   G   H   I   J   K   L   M
Correct  61  44  55  49  56  62  55  59  51  56  52  62  62

Letter    N   O   P   Q   R   S   T   U   V   W   X   Y   Z
Correct  46  58  58  55  58  53  64  54  64  47  51  52  60

Figure 5: Perceptual zoning based on confusion parts of the letters: (a) G and Q; (b) Y and X; (c) B, C, D, and E

References
Madhavanath, S. & Govindaraju, V., Perceptual features for off-line handwritten word recognition: a framework for prediction, representation and matching. Advances in Pattern Recognition, August 1998, 524-531.
Oh, I.-S. & Suen, C. Y., A class-modular feedforward neural network for handwriting recognition. Pattern Recognition 35 (2002), 229-244.
Otsu, N., A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, SMC 9, Vol. 1, 1979, pp. 63-66.
Parker, J. R., Algorithms for Image Processing and Computer Vision. John Wiley & Sons, Inc., 1997.
Schomaker, L. & Segers, E., A method for the determination of features used in human reading of cursive handwriting. In 6th International Workshop on Frontiers in Handwriting Recognition, 1998, 157-168.
Sekuler, R. & Blake, R., Perception. 3rd ed. McGraw-Hill, Inc., 1994.
Suen, C. Y., Guo, J. & Li, Z. C., Analysis and Recognition of Alphanumeric Handprints by Parts. IEEE Transactions on Systems, Man, and Cybernetics, Vol. 24, 1994, pp. 614-631.
Viard-Gaudin, C., The IRONOFF user manual. IRESTE, University of Nantes, France, 1999.
