
A Thesis submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology

by

ANSHUL GUPTA

(07010206)

MANISHA SRIVASTAVA

(07010226)

ASSAM, INDIA - 781039

April, 2011

Certificate

This is to certify that work reported in this thesis entitled Offline Handwritten Character

Recognition in partial fulfilment of the requirements for the award of the Degree of Bachelor

of Technology, is submitted by Anshul Gupta(07010206) and Manisha Srivastava(07010226) in

the Department of Electronics and Communication Engineering, Indian Institute of Technology

Guwahati, under the supervision of Dr. Chitralekha Mahanta, Department of ECE, IIT Guwahati. The matter embodied in this thesis has not been submitted elsewhere for the award of any

other degree.

Place :

Date :

(Supervisor Signature)

Dr. Chitralekha Mahanta

Associate Professor

Dept. of Electronics and Communication Engineering

Indian Institute of Technology, Guwahati

Acknowledgment

First and foremost, we would like to take this opportunity to express our deepest and sincere gratitude to our thesis supervisor. We value the freedom she gave us to carry out research in the field of our interest and we sincerely thank her for that. Her stimulating suggestions and encouragement helped us during the research and writing of this thesis. We are very thankful for her continuous help and support during the entire semester.

Finally, we would like to thank our parents, siblings and friends for their immense love and support during our entire student life.

Abstract

Character Recognition (CR) has been an active area of research and, owing to its diverse application environments, continues to be a challenging research topic. In this project, we focus specifically on off-line recognition of handwritten English words. The main approaches for off-line cursive word recognition can be divided into segmentation-based and holistic ones. The holistic approach is used for recognition of limited-size vocabularies, where global features extracted from the entire word image are considered. As the size of the vocabulary increases, the complexity of the algorithms also increases linearly due to the need for a larger search space and a more complex pattern representation. Additionally, the recognition rates decrease rapidly due to the decrease in interclass variances in the feature space. The segmentation-based strategies, on the other hand, employ bottom-up approaches, starting from the stroke or character level and working towards producing meaningful text. With the cooperation of the segmentation stage, the problem is reduced to the recognition of simple isolated characters or strokes, which can be handled for an unlimited vocabulary. We adopt segmentation-based character recognition using neural networks. A number of techniques are available for feature extraction and training of CR systems, each with its own strengths and weaknesses. We explore these techniques in order to obtain a good recognition rate.

Introduction

It is a challenging issue to develop a practical cursive handwritten CR system which can maintain high recognition accuracy and is independent of the quality of the input documents. Very often adjacent characters touch or overlap.

Therefore, in the segmentation-based strategy, it is essential to segment a given string correctly into

its character components. The complexity of character segmentation stems from the wide variety of

fonts, rapidly expanding text styles and poor image characteristics. Touched, overlapped, separated,

and broken characters are major factors for causing segmentation errors. In most of the existing

segmentation algorithms, human writing is evaluated empirically to deduce rules. Sometimes the rules

derived are satisfactory but there is no guarantee for their optimum results in all styles of writing.

Moreover human writing varies from person to person and even for the same person depending on

mood, speed, environment etc. On the other hand researchers have employed techniques like artificial

neural networks, hidden Markov models and statistical classifiers to extract rules based on numerical

data.

Another crucial module is a cursive character classifier for scoring individual characters. It has to cope with the high variability of the cursive letters and their intrinsic ambiguity (letters like e and l or u and n can have the same shape). The features that are used for training the neural net classifier also play a very important role. The choice of a good feature vector can significantly enhance the performance of a character classifier, whereas a poor one can degrade its performance considerably.

A generic character recognition system is shown in Figure 1. Its different stages are as given below:

Input: Samples are read to the system through a scanner.

Preprocessing: Preprocessing converts the image into a form suitable for subsequent processing

and feature extraction.

Segmentation: The most basic step in CR is to segment the input image into individual glyphs.

This step separates out sentences from text and subsequently words and letters from sentences.

Feature extraction: Extraction of features of a character forms a vital part of the recognition

process. Feature extraction captures the vital details of a character.

Classification: During classification, a character is placed in the appropriate class to which it

belongs.

Post Processing: Combining the CR techniques either in parallel or series.

The very first effort in the direction of CR was made by Tyuring who attempted to develop an aid

for the visually handicapped [1]. The first character recognizer appeared around the 1940s. The early

works were concentrated either upon machine-printed text or upon a small set of well-separated handwritten text or symbols. Machine-printed CR generally used template matching and for handwritten

text, low-level image processing techniques were used on the binary image to extract feature vectors,

which were then fed to statistical classifiers [2],[3],[4]. A good survey of the CR techniques used until the 1980s can be found in [5]. The period from 1980 to 1990 witnessed a growth in CR system development due to rapid growth in information technology [6],[7],[8]. Structural approaches were initiated in

many systems in addition to the statistical methods [9],[10]. The syntactic and structural approaches

require efficient extraction of primitives [11]. Chan et al. [12] discussed a structural approach for

recognizing on-line handwriting. The recognition process starts with a sequence of points from the

user and then uses these points to extract the structural primitives. These primitives include different

types of line segments and curves. But there existed an upper limit in the recognition rate, because

the CR research was focused basically on the shape recognition techniques without using any semantic information. Historical review of CR research and development during 1980-1990 can be found in

[13] and [14] for off-line and on-line cases, respectively.

After 1990, image processing techniques and pattern recognition were combined using artificial intelligence. Along with powerful computers and more accurate electronic equipment such as scanners, cameras and electronic tablets came efficient, modern use of methodologies such as neural networks (NNs), hidden Markov models (HMMs), fuzzy set reasoning, and natural language processing.

The 1990s systems for the machine-printed off-line [15],[16] and limited vocabulary, user-dependent

on-line handwritten characters [17],[18] were satisfactory only for restricted applications.

Although research on recognizing isolated handwritten characters has been quite successful, recognizing off-line cursive handwriting has been found to be a challenging problem. There is a large corpus

of research on the application of character recognition in different domains, but no system to date


Applications

One application of CR systems is handwritten word recognition. Current research aims at developing constrained systems for limited domain applications such as postal address reading, check sorting, tax

reading, and office automation for text entry. Since we can make use of the entire word at once, it is

possible to exploit correlations between adjacent characters. One way to do this is through contextual

knowledge of syntax and a dictionary of possible words, which has been shown to be successful for

reading handwritten address information of postmarked mail. Another potential application of CR

systems is in script recognition. CR systems also find applications in newly emerging areas, such

as development of electronic libraries, multimedia database, and systems which require handwriting

data entry.

4 Methodology Used

4.1 Segmentation

Most of the existing CR systems threshold the gray-level image and normalize the slant angle and

baseline skew in the preprocessing stage. Then, they employ the normalized binary image in the

segmentation and recognition stages [19, 20, 21]. However, in some cases, normalization may severely

deform the writing generating improper character shapes. Furthermore, through the binarization of

the gray scale document image, useful information is lost. In order to avoid the limitation of binary

image, some recent methods use gray-level image [22]. There, however, the insignificant details

suppress important shape information. The method used in this project for segmentation is similar

to that in [23] which employs an analytic approach on gray-level image supported by binary image

and a set of global features.

4.1.1

4.1.1.1 Global Feature Estimation : In this stage, first, the input image is binarized using a global

threshold. Secondly, the following operations are performed on the binarized image.

4.1.1.1.1 Stroke Width and Height Estimation : Stroke Width Estimation is a two-scan procedure. The first scan on each row of the binary image calculates the stroke width histogram by

counting the black pixel runs in horizontal direction. Then, the mean width, estimated over all of the

rows, is taken as the upper bound (maximum width) for the run length of the strokes. The second

scan on the stroke width histogram discards those strokes whose run length is greater than maximum

width. Finally, the stroke width of the input-word image is estimated as the average width of the

strokes in the second scan. In order to estimate the stroke height, which is assumed to be the average height of the vertical strokes in writing, a similar algorithm is used with the scanning procedure

applied in vertical direction. Minimum height is estimated instead of maximum width. In the second

scan, those pixels whose run lengths are smaller than the minimum height are discarded.
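The two-scan stroke width estimate described above can be sketched as follows. This is a minimal illustration, assuming the binary image is a list of rows with 1 for black and 0 for white; the function names `black_runs` and `estimate_stroke_width` are hypothetical and not taken from the thesis.

```python
def black_runs(row):
    """Return the lengths of consecutive runs of black pixels (value 1) in a row."""
    runs, length = [], 0
    for pixel in row:
        if pixel == 1:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    if length:
        runs.append(length)
    return runs


def estimate_stroke_width(binary_image):
    """Two-scan stroke width estimate on a binary image (list of rows, 1 = black)."""
    # First scan: collect all horizontal black run lengths.
    runs = [r for row in binary_image for r in black_runs(row)]
    if not runs:
        return 0.0
    max_width = sum(runs) / len(runs)  # mean run length, used as the upper bound
    # Second scan: discard runs longer than the upper bound, then average the rest.
    kept = [r for r in runs if r <= max_width]
    return sum(kept) / len(kept)


# Toy word image: a vertical stroke 3 pixels wide plus one long horizontal bar.
image = [[0] * 20 for _ in range(10)]
for row in image:
    row[5:8] = [1, 1, 1]
image[4] = [1] * 20
stroke_width = estimate_stroke_width(image)
```

In this toy image the long horizontal run is rejected in the second scan, so the estimate is driven by the vertical stroke only. The stroke height estimate would follow the same pattern on columns, keeping runs at or above a minimum height instead.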


4.1.1.1.2 Slant Angle Detection: Slant is the deviation of the strokes from the vertical direction, depending on writing style. In many handwriting recognition studies, slant correction is applied before the segmentation and recognition stages. However, this correction can produce serious deformation in characters. In [24] no slant correction was applied, but the slant angle was used later in the segmentation stage. For slant angle estimation, we have used the method of [25]. The method involves rotating the image from −45° to 45°. The horizontal projection is taken at each rotation to calculate the Wigner-Ville distribution (WVD, a joint function of time and frequency). The angle that gives the maximum intensity after applying the WVD is taken as the estimated slant angle.

4.1.1.1.3 Baseline extraction : Locations of upper and lower baselines determine the existence

of ascending and descending characters in a given word image. Baseline information is used in

segmentation in order to avoid problems introduced by ascending and descending portions of the

characters. In [24], a new baseline extraction algorithm was proposed. First, a preliminary centerline for each word image is determined by finding the horizontal line with the highest number of black pixel runs. Then, the local minima below the preliminary centerline are identified, eliminating the ones on the ascending part. The goal is to find the best fit to the local minima with a high

contribution from the normal characters and low contribution from descending characters. A weight

is computed for each minimum by considering the average angle between that minimum and the

rest of the minima. This approach assumes relatively small average angles among the minima of

normal characters compared to the average angle between a descending minimum and normal minima,

independent of the writing style. Finally, a line-fitting algorithm is performed over the weighted local

minima. To locate the upper baseline, the local maxima above the lower baseline are identified and their distances from the lower baseline are calculated. The ones whose distance is less than the estimated stroke height are pruned. Next, the remaining distances are clustered into two classes, and a line parallel to the lower baseline is drawn through the mean value of the class that includes the local maxima with smaller distances. The center baseline is a line parallel to, and at equal distance from, the upper and lower baselines.

4.1.1.2 Determination of Segmentation Regions : The segmentation regions carry the potential

segmentation boundaries between the connected characters. The first step is to partition each word

image into stripes along the slant angle direction, each of which contains a potential segmentation

boundary. The rules applied on the binary word image for identifying the segmentation regions are

based on the fact that a single maximum above the center baseline indicates a single character or a

portion of a character whereas the region between the two adjacent local maxima carries a potential

segmentation boundary.

Determination of the segmentation regions in each word image is accomplished in three steps:

Step:1 A straight line is drawn in the slant angle direction from each local maximum until the top

of the word image. However, there may be an ascender character on this path which should

be avoided. While going upward in slant direction, if any contour pixel is hit, this contour is

followed until the slope of the contour changes to the opposite direction which marks the end of

the character. The direction of the contour following is selected as the opposite of the relative

position of the local maximum with respect to the first contour pixel hit by the slanted straight

line. After this a line is drawn from the maximum to the top of the word image in the slant

direction.

Step:2 In this step, a path in the slant direction from each maximum to the lower baseline is drawn.

However, the algorithm avoids passing from the white pixels by selecting a black pixel, as long

as there is one in either left or right neighborhood of the white pixels.

Step:3 A process similar to the one in the first step is performed in order to determine the path from

lower baseline to the bottom of the word image. In this case, the aim is to find the path, which

does not cut any part of the descended character.

4.1.1.3 Segmentation Path : The problem of segmentation can be represented as finding the shortest path from the top row to the bottom row, which minimizes the cumulative cost function as given in [24],

$$\text{Cost} = \sum_{1 < i < H} \delta_{(i,j)(i+1,k)} \tag{1}$$

$$j - 1 \le k \le j + 1 \tag{2}$$

where,

$$\delta_{(i,j)(i+1,k)} = \left(\frac{H + (H - i)}{H}\right) I_{i+1,k} + SW \cdot C_{i+1,k}, \tag{3}$$

$$1 \le i \le H, \qquad j - 1 \le k \le j + 1, \tag{4}$$

where $I_{ij}$ and $C_{ij}$ are the values of the pixel (i, j) in the gray-level and boundary images respectively, with zero corresponding to white and one corresponding to black. SW is the estimated stroke width, i is the y-coordinate and j is the x-coordinate of a pixel. The above cost function forces the shortest

path to pass through the white pixels without cutting any boundary of the characters if possible.

The intensity values in gray-level are weighted by the y coordinate of the pixel. Between two pixels

with the same intensity values, the pixel whose coordinate is lower than the other is given higher

weight. If the path is required to cut any stroke in the segmentation region, it cuts the stroke, which

is closest to the bottom. The character contours in the boundary image are represented by black

pixels and weighted by the estimated stroke width. The weight given to each boundary pixel enforces

the path to cut the minimum number of strokes. Therefore, the segmentation path is optimal when it

goes through the common stroke, thus separating two joined characters. The algorithm constrains the possible vertices that can be reached from $v_{i,j}$ to $v_{i+1,j-1}$, $v_{i+1,j}$, and $v_{i+1,j+1}$. This avoids the cuts in

the horizontal direction and the moves in the opposite directions. A dynamic programming algorithm

then searches for the shortest path (segmentation path) from top to bottom rows.
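The dynamic programming search can be sketched as below. This is an illustrative sketch only, assuming that `gray` and `boundary` are lists of rows with values in [0, 1] (0 = white, 1 = black), that a step into row i+1 is charged the weight ((H + (H − i))/H)·I + SW·C of the pixel it enters, and that the search runs over the whole image rather than a single segmentation region. The function name `segmentation_path` is hypothetical.

```python
def segmentation_path(gray, boundary, stroke_width):
    """Minimum-cost top-to-bottom path; returns (cost, list of column indices)."""
    H, W = len(gray), len(gray[0])
    INF = float("inf")
    # cost[r][c]: cheapest cumulative cost of reaching pixel (r, c) from the top row.
    cost = [[0.0] * W] + [[INF] * W for _ in range(H - 1)]
    back = [[0] * W for _ in range(H)]
    for r in range(H - 1):
        i = r + 1  # 1-based row index used in the weight (H + (H - i)) / H
        weight = (H + (H - i)) / H
        for k in range(W):
            step = weight * gray[r + 1][k] + stroke_width * boundary[r + 1][k]
            # Pixel k can only be entered from columns k, k-1, k+1 of the row above;
            # the straight move is tried first so ties prefer a vertical path.
            for j in (k, k - 1, k + 1):
                if 0 <= j < W and cost[r][j] + step < cost[r + 1][k]:
                    cost[r + 1][k] = cost[r][j] + step
                    back[r + 1][k] = j
    # Trace back from the cheapest pixel in the bottom row.
    col = min(range(W), key=lambda c: cost[H - 1][c])
    total, path = cost[H - 1][col], [col]
    for r in range(H - 1, 0, -1):
        col = back[r][col]
        path.append(col)
    return total, path[::-1]


# Toy region: every column is black except a white gap at column 2.
gray = [[1, 1, 0, 1, 1] for _ in range(4)]
boundary = [[0] * 5 for _ in range(4)]
total, path = segmentation_path(gray, boundary, stroke_width=3)
```

In the toy region the path runs straight down the white gap at zero cost, which is the behaviour the cost function is designed to favour.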

4.1.2

In the word recognition system of [26], heuristic and intelligent methods are used for the segmentation of real-world handwritten words. The gray-level image is converted to a binary image. Slant detection similar to the one used in heuristic-based segmentation is employed, and then slant correction is done. For both training and testing phases, a heuristic feature detection algorithm is used to locate prospective segmentation points in handwritten words. Each word is inspected in an attempt to locate characteristics representative of segmentation points.

4.1.2.1 Segmentation using a heuristic algorithm : A simple heuristic segmentation algorithm was

implemented which scanned handwritten words for important features to identify valid segmentation

points between characters. The algorithm first scanned the word looking for minima or arcs between letters, common in handwritten cursive script. For this, a histogram of vertical pixel densities is calculated for each word. The histogram is obtained by calculating total runs of vertical pixels for each column of the word image where black pixels exist. The histogram is examined for minima (low vertical pixel density), which may confirm the location of possible segmentation points in the word. In many cases these arcs are the ideal segmentation points; however, in letters such as a and o, an erroneous segmentation point could be identified. Therefore the algorithm incorporated a hole-seeking component which attempted to prevent invalid segmentation points from being found.

If an arc was found, the algorithm checked to see whether it had not segmented a letter in half,

by checking for a hole. Finally, the algorithm performed a final check to see if one segmentation

point was not too close to another. This was done by ascertaining if the distance between the last

segmentation point and the position being checked was equal to or greater than the average character

width of a particular word. If the segmentation point in question was too close to the previous one,

segmentation was aborted. Conversely, if the distance between the position being checked and the last

segmentation point was greater than the average character width, a segmentation point was forced.
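A simplified sketch of the histogram part of this heuristic is given below. It assumes the density of a column is its count of black pixels (1 = black), takes candidate points at local minima of that histogram, and applies only the minimum-distance check; the hole-seeking component and the forced segmentation rule are left out. The function names are hypothetical.

```python
def column_density(binary_image):
    """Number of black pixels (value 1) in each column."""
    return [sum(col) for col in zip(*binary_image)]


def candidate_points(binary_image, avg_char_width):
    """Columns at local density minima, kept only if far enough from the last one."""
    density = column_density(binary_image)
    points, last = [], 0
    for x in range(1, len(density) - 1):
        # A minimum is the last column of a low run before the density rises again.
        is_minimum = density[x] <= density[x - 1] and density[x] < density[x + 1]
        if is_minimum and x - last >= avg_char_width:
            points.append(x)
            last = x
    return points


# Toy word: two solid blocks joined by a thin ligature along the bottom row.
word = [[1] * 4 + [0] * 3 + [1] * 4 for _ in range(4)]
word.append([1] * 11)  # bottom row carries the ligature
points = candidate_points(word, avg_char_width=4)
```

Here the only candidate found lies on the ligature between the two blocks, which is where a cursive join would be cut.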

4.1.2.2 Manual Segmentation of the database: Since we did not have any database for handwritten

words we created our own database for the training of neural network segmentation. 26 words were

chosen that contained all the upper and lower case alphabets and then 10 different samples of each

word were taken on paper from different writers. The images were then scanned and preprocessed to

create a list of 260 words. Prior to ANN training, the heuristic feature detector was used to segment

all words. The segmentation points output by the heuristic feature detector were manually analyzed

so that the x coordinates can be categorized into correct and incorrect segmentation point classes.

For each segmentation point, a matrix of pixels representing the segmentation area was extracted and

stored in an ANN training file. The feature extractor breaks the segmentation point matrix down into small windows of equal size (5x5) and analyses the density of black and white pixels. Therefore, instead of presenting the raw pixel values of the segmentation points to the ANN, only the densities of each window are presented. As an example, if a window contains 6 black pixels, then a single value of 0.24 (number of black pixels/25) is written to the training file to represent the value of the window. Accompanying each matrix, the desired output was also stored in the training file (0.1 for an incorrect segmentation point and 0.9 for a correct point), ready for ANN training.
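The window density feature can be illustrated with a short sketch. It assumes the segmentation area is a binary matrix (1 = black) whose sides are multiples of the window size; the function name `window_densities` is hypothetical.

```python
def window_densities(matrix, size=5):
    """Split a binary matrix into size x size windows and return black-pixel densities."""
    rows, cols = len(matrix), len(matrix[0])
    densities = []
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            black = sum(
                matrix[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size)
            )
            densities.append(black / (size * size))
    return densities


# One 5x5 window with 6 black pixels, as in the example above.
window = [[0] * 5 for _ in range(5)]
for r, c in [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (2, 0)]:
    window[r][c] = 1
features = window_densities(window)
```

Each segmentation area is thus reduced from raw pixels to one density value per window before being written to the training file.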


4.1.2.3 Training of the ANN : For this step, a multi-layer feed-forward Neural Network trained

with the backpropagation algorithm was used. The ANN was presented with the training pairs found

in the previous step.

4.1.2.4 Testing phase of the segmentation technique : Following ANN training, the words used for

testing are also segmented using the heuristic, feature-based algorithm. This time there is no manual

processing. The segmentation points are automatically extracted and are fed into the trained ANN.

The ANN then verifies which segmentation points are correct and which are incorrect. Finally, upon

ANN verification, each word used for testing should only contain valid segmentation points.

4.1.3 Results of segmentation :

[Figure: segmentation results, panels A and B. Surviving caption text: "... hidden layer)], Training Algorithm: traingdx (Matlab); B : Heuristic segmentation"]

4.2 Feature Extraction

A compact and characteristic representation of the image is required in the CR systems. For this

purpose, a set of features is extracted for each class that helps distinguish it from other classes, while

remaining invariant to intra class differences [27]. A good survey on feature extraction methods for

CR can be found in [22].

The different representation methods can be categorized into three major classes:

1. Global Transformation and Series Expansion: includes Fourier Transform, Gabor Transforms, wavelets, moments and Karhunen-Loève Expansion.

2. Statistical Representation: Zoning, Crossing and Distances, Projections.

3. Geometrical and Topological Representation: Extracting and Counting Topological Structures,

Geometrical Properties, Coding, Graphs and Trees etc.

We have used the following three features.

4.2.1 Gradient Features

4.2.1.1 Skeletonisation : The skeletonisation process has been used on binary pixel image. The

extra pixels which do not belong to the backbone of the character, were deleted and the broad strokes

were reduced to one pixel thin lines. This creates a uniformity in all the testing and training data.

4.2.1.2 Normalization and Compression: Since handwriting varies considerably between writers, after the skeletonisation process we used a normalization process, which normalized each character into a 32 x 32 pixel image that was used as an input to the neural network.

4.2.1.3 Gradient Feature Extraction: Each character is normalized into 32 x 32 size. The gradient

operator, named Sobel operator is used to calculate the gradient. The Sobel operator uses two

templates to compute the gradient components in horizontal and vertical directions, respectively.

The templates are shown below :

The two gradient components at location (i,j) are calculated by:

$$g_v(i,j) = f(i-1,j+1) + 2f(i,j+1) + f(i+1,j+1) - f(i-1,j-1) - 2f(i,j-1) - f(i+1,j-1) \tag{5}$$

$$g_h(i,j) = f(i-1,j-1) + 2f(i-1,j) + f(i-1,j+1) - f(i+1,j-1) - 2f(i+1,j) - f(i+1,j+1) \tag{6}$$

The gradient strength and the direction are calculated as:

$$G(i,j) = \sqrt{g_v^2(i,j) + g_h^2(i,j)} \tag{7}$$

$$\theta(i,j) = \arctan\frac{g_v(i,j)}{g_h(i,j)} \tag{8}$$

Using eq. 7 and eq. 8, we can calculate the gradient at each pixel of a character; the gradient direction lies between 0 and 2π.
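As an illustration, the gradient strength and direction at one pixel can be computed as in the sketch below. It assumes the image is a list of rows indexed as f[i][j] and that (i, j) is an interior pixel. The use of `atan2` instead of a plain arctangent is an assumption made here so that the direction covers the full range from 0 to 2π and a zero horizontal component causes no division by zero. The function name `sobel_gradient` is hypothetical.

```python
import math


def sobel_gradient(f, i, j):
    """Gradient strength and direction at interior pixel (i, j) of image f."""
    gv = (f[i - 1][j + 1] + 2 * f[i][j + 1] + f[i + 1][j + 1]
          - f[i - 1][j - 1] - 2 * f[i][j - 1] - f[i + 1][j - 1])
    gh = (f[i - 1][j - 1] + 2 * f[i - 1][j] + f[i - 1][j + 1]
          - f[i + 1][j - 1] - 2 * f[i + 1][j] - f[i + 1][j + 1])
    strength = math.hypot(gv, gh)
    # atan2 (an assumption, see above) mapped into the interval [0, 2*pi).
    direction = math.atan2(gv, gh) % (2 * math.pi)
    return strength, direction


# Vertical edge: left half white (0), right half black (1).
image = [[0, 0, 1, 1] for _ in range(4)]
strength, direction = sobel_gradient(image, 1, 1)
```

For the vertical edge in this toy image only the first component is non-zero, so the direction comes out as π/2 under this convention.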

4.2.2 Fourier Descriptor

The method adopted is similar to [29], where boundary detection is done first. Once a boundary image is obtained, the Fourier descriptors are found. This involves finding the discrete Fourier coefficients a[k] and b[k] for 0 < k < L − 1, where L is the total number of boundary points found, by applying the equations:

$$a[k] = \frac{1}{L} \sum_{m=1}^{L} x[m]\, e^{-jk\left(\frac{2\pi}{L}\right)m} \tag{9}$$

$$b[k] = \frac{1}{L} \sum_{m=1}^{L} y[m]\, e^{-jk\left(\frac{2\pi}{L}\right)m} \tag{10}$$

where x[m] and y[m] are the x and y coordinates, respectively, of the mth boundary point. As found in the study [29], the descriptor produced using $r[k] = \sqrt{|a[k]|^2 + |b[k]|^2}$ is less effective than using the moduli of the complex coefficients, |a[k]| and |b[k]|. The values for k = 0 are discarded as they only contain information about the position of the image. The coefficients for high values of k describe high frequency features in the image but do not contain much information about the overall shape of the character, and so these high frequency components are also discarded. So the first five coefficients, from k = 1 to k = 5, are considered.

Once the moduli of the coefficients have been found, the input vector is normalized to 1 to compensate for image scaling. To spread the input data more evenly over the input space, the mean and standard deviation vectors are found over the whole set of test and training data. The j th component of input vector $i_p$ is calculated as :

$$i_{pj} = \left(i^{o}_{pj} - \bar{i}^{o}_{j}\right) \frac{1}{\alpha\left(n^{o}_{j} - 1\right) + 1} \tag{11}$$

where $i^{o}_{pj}$ is the j th component of the original vector for pattern p, $\bar{i}^{o}_{j}$ is the mean of the j th components of the original vectors and $n^{o}_{j}$ is the corresponding standard deviation. Coefficient α linearly controls the degree of standard deviation compensation. If α = 0, there is no compensation for variations of standard deviation between dimensions; if α = 1, the standard deviation of all dimensions is forced to equal 1, giving full standard deviation compensation.

4.2.2.1 Fourier angle : It was also mentioned in [29] that if the moduli alone are not successful in discriminating all the classes, experiments can be done to incorporate the angles in the training set as well.

4.2.2.2 Fourier magnitude [30] : The use of FFT is not feasible if one seeks rotation and shift

invariant descriptors for the characters. Further, it has been observed that only the first few (say

10-15) Fourier coefficients are needed to adequately describe the various characters. Under these

conditions there exists no computational advantage in using FFT to evaluate the Fourier coefficients.

The Fourier coefficients derived from eq:9 and eq:10 are not rotation or shift invariant (to clarify, it

is noted that a shift will occur if the starting point of boundary following is arbitrary). In order to

derive a set of Fourier descriptors that have the invariant property with respect to rotation and shift

the following operations are defined. For each n compute a set of invariant descriptors r[n] as :

$$r[n] = \sqrt{|a[n]|^2 + |b[n]|^2} \tag{12}$$

It is easy to show that the r[n] are invariant to rotation or shift. A further refinement in the derivation of the descriptors is realized if dependence of r[n] on the size of the character is eliminated by computing a new set of descriptors :

$$s[n] = r[n]/r[1] \tag{13}$$

The Fourier coefficients |a[n]|, |b[n]| and the invariant descriptors s[n], n = 2, 3, ... were derived for all the character specimens and stored in files for application to reconstruction and recognition.
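The invariance of the descriptors s[n] can be checked numerically with a short sketch. It assumes the boundary is given as an ordered list of (x, y) points, evaluates the coefficients by direct summation rather than FFT, and uses the usual negative sign in the exponent. The toy boundary and the function names are hypothetical.

```python
import cmath
import math


def fourier_coefficients(points, n_max):
    """a[n], b[n] for n = 0..n_max from boundary points [(x, y), ...]."""
    L = len(points)
    a, b = [], []
    for n in range(n_max + 1):
        # m runs from 1 to L over the boundary points.
        phases = [cmath.exp(-1j * n * 2 * math.pi * m / L) for m in range(1, L + 1)]
        a.append(sum(x * p for (x, _), p in zip(points, phases)) / L)
        b.append(sum(y * p for (_, y), p in zip(points, phases)) / L)
    return a, b


def invariant_descriptors(points, n_max=5):
    """s[n] = r[n] / r[1] for n = 2..n_max, with r[n] = sqrt(|a[n]|^2 + |b[n]|^2)."""
    a, b = fourier_coefficients(points, n_max)
    r = [math.sqrt(abs(an) ** 2 + abs(bn) ** 2) for an, bn in zip(a, b)]
    return [r[n] / r[1] for n in range(2, n_max + 1)]


# Toy closed boundary, then a rotated and scaled copy traced from another start point.
shape = [(0, 0), (2, 0), (4, 1), (4, 3), (2, 4), (0, 3), (-1, 2), (-1, 1)]
theta, scale = 0.7, 2.5
moved = [(scale * (x * math.cos(theta) - y * math.sin(theta)),
          scale * (x * math.sin(theta) + y * math.cos(theta))) for x, y in shape]
moved = moved[3:] + moved[:3]
s_original = invariant_descriptors(shape)
s_moved = invariant_descriptors(moved)
```

Up to floating-point rounding, `s_original` and `s_moved` agree: rotation mixes a[n] and b[n] without changing the sum of their squared moduli, a change of starting point only changes their phase, and dividing by r[1] removes the scale factor.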

4.3 Training of classifiers

4.3.1 Neural Network (NN) based classifiers [31]

A neural network is a massively parallel distributed processor that has a natural propensity for storing

experiential knowledge and making it available for use. It resembles the brain in two respects :

1. It adapts through a learning process.

2. Knowledge is stored in interconnections between neurons known as synaptic weights.

Basically, learning is a process by which the free parameters (i.e.,synaptic weights and bias levels) of a

neural network are adapted through a continuing process of stimulation by the environment in which

the network is embedded. The type of learning is determined by the manner in which the parameter

changes take place. Broadly, learning can be classified into two types :

1. Supervised Learning : This form of learning assumes the availability of a labeled (i.e., ground-truthed) set of training data made up of N input-output examples.

2. Unsupervised Learning : This form of learning does not assume the availability of a labeled set of training data made up of N input-output examples. The network learns to classify input vectors according to how they are grouped spatially and tunes itself by considering a neighborhood.

In this project we consider MLP and RBF networks as classifiers based on supervised learning. We have used the Matlab neural network toolbox for the implementation of these networks.

4.3.1.1 Multilayer Perceptron(MLP) : This network is a feed forward network because its structure does not contain any loop. As shown in Fig., a multilayer perceptron has an input layer of source

nodes and an output layer of neurons (i.e., computation nodes); these two layers connect the network

to the outside world. In addition to these two layers, the multilayer perceptron usually has one or

more layers of hidden neurons, which are so called because these neurons are not directly accessible.

The hidden neurons extract important features contained in the input data. Each input node is

connected to each node of hidden layer by a synaptic weight. The input to a hidden node is the sum

of all input nodes weighted by synaptic weights for connection between input nodes and the hidden

neurons.

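There are many activation functions, out of which we selected tan-sigmoid, log-sigmoid and pure linear:

1. tan-sigmoid: $\mathrm{tansig}(n) = \dfrac{2}{1 + \exp(-2n)} - 1$

2. log-sigmoid: $\mathrm{logsig}(n) = \dfrac{1}{1 + \exp(-n)}$

3. pure linear: $\mathrm{purelin}(n) = n$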
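For reference, the three activation functions can be written in a few lines of Python. This is a plain re-statement of the standard formulas behind the Matlab names `tansig`, `logsig` and `purelin`, not the toolbox code itself.

```python
import math


def tansig(n):
    """Tan-sigmoid: 2 / (1 + exp(-2n)) - 1, output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def logsig(n):
    """Log-sigmoid: 1 / (1 + exp(-n)), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))


def purelin(n):
    """Pure linear: output equals input."""
    return n
```

The tan-sigmoid is mathematically the hyperbolic tangent, so `tansig(n)` and `math.tanh(n)` agree up to rounding.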

4.3.1.2 Radial Basis function(RBF NN) : A Radial Basis Function NN (RBF NN) is a two-layer network. It falls under the category of feed-forward networks, whose graphs have no loops. Basic structure of RBF network is given below :

A radial basis function is a real-valued function whose output depends only on the distance between the input and the origin, or more generally a center c:

$$\phi(x, c) = \phi(\lVert x - c \rVert)$$

where $\lVert x - c \rVert$ is the norm, or distance, between the vector c (defined as the center of the radial basis function) and the input.

Different types of radial basis function :

1. Gaussian RBF :

$$z(x) = \exp\left(-\frac{\lVert x - \mu \rVert^2}{\sigma^2}\right)$$

where μ is a vector defined as the center of the Gaussian function, σ is the standard deviation of the function, and $\lVert x - \mu \rVert$ is the distance between the input and the center of the function.

$$\phi(r) = \frac{1}{1 + \exp\left(r / \sigma\right)}$$

where σ is the standard deviation and r is the distance between the input and the origin.

The RBF network is built up as a linear combination of N radial basis functions with N distinct centers. Given an input vector x, the output of the RBF network is the activity vector y given by

$$y = \sum_{j=1}^{N} \omega_j z_j \tag{14}$$

where $\omega_j$ is the weight associated with the jth radial basis function, centered at $\mu_j$, and $z_j = \phi(\lVert x - \mu_j \rVert)$. The output y approximates a set of target values.
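The output computation of such a network can be sketched as follows, assuming Gaussian basis functions of the form exp(−‖x − μ‖²/σ²) and a single scalar output. The centers, widths and weights below are made-up values for illustration, and the function names are hypothetical.

```python
import math


def gaussian_rbf(x, center, sigma):
    """z = exp(-||x - center||^2 / sigma^2)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / sigma ** 2)


def rbf_output(x, centers, sigmas, weights):
    """Weighted sum of the N basis-function activations (scalar output)."""
    return sum(w * gaussian_rbf(x, c, s) for w, c, s in zip(weights, centers, sigmas))


# Two basis functions with made-up parameters.
centers = [(0.0, 0.0), (1.0, 1.0)]
sigmas = [1.0, 1.0]
weights = [2.0, -1.0]
y = rbf_output((0.0, 0.0), centers, sigmas, weights)
```

At the first center the first basis function is fully active (z = 1) while the second contributes exp(−2), so the output is 2 − exp(−2). For a multi-class classifier the same sum would be evaluated once per output unit, each with its own weight vector.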

into two parts.

2. Validation subset used for evaluating the model performance.

The network is finally tuned by using the entire set of training examples and is then tested on the test
data. Training of these networks is usually done with the back-propagation algorithm, which consists of two phases:

1. Forward Phase: In this phase the free parameters of the network are fixed, and the input
signal is propagated through the network, layer by layer. At the end of this phase an error signal is
calculated between the predicted output of the network and the actual output corresponding to the input
sample presented.

2. Backward Phase: During this phase, the error signal e_i is propagated through the network in
the backward direction. It is during this phase that adjustments are applied to the network
weights so as to minimize the error e_i in a statistical sense; generally the MSE criterion is used.
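The two phases can be illustrated with a deliberately small network: one hidden layer of log-sigmoid units and one linear output unit, trained on a single sample with plain gradient descent on the squared error. This is a sketch under those assumptions, not the configuration used in this work; the weights, the learning rate `lr` and the sample are arbitrary illustrative values, and no momentum or bias terms are included.

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

def forward(x, w_hidden, w_out):
    """Forward phase: weights are fixed, the signal flows layer by layer."""
    hidden = [logsig(sum(w * xi for w, xi in zip(row, x)))
              for row in w_hidden]
    output = sum(w * h for w, h in zip(w_out, hidden))  # linear output unit
    return hidden, output

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One forward and one backward pass; returns the error 0.5 * e^2."""
    hidden, output = forward(x, w_hidden, w_out)
    error = output - target  # error signal e
    # Backward phase: propagate e back and move each weight against
    # the gradient of 0.5 * e^2.
    for j, h in enumerate(hidden):
        # Computed before w_out[j] is updated, so it uses the old weight.
        delta_hidden = error * w_out[j] * h * (1.0 - h)
        w_out[j] -= lr * error * h
        for i, xi in enumerate(x):
            w_hidden[j][i] -= lr * delta_hidden * xi
    return 0.5 * error ** 2

# Illustrative run: the error on the single sample shrinks over the epochs.
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.2, -0.1]
errors = [train_step([1.0, 0.5], 1.0, w_hidden, w_out) for _ in range(50)]
```

In practice the error is averaged over all training samples (the MSE criterion), and the validation subset is used to decide when to stop.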


4.3.1.4 Classification using neural networks: In classification problems, the purpose of the network is to assign each input to one of the classes. Each of the output units has a continuous activation
value between 0.0 and 1.0. In order to assign a class definitely from the outputs, the network must
decide whether the outputs are reasonably close to 0.0 or 1.0; otherwise the class is regarded as undecided.
Confidence levels (the accept and reject thresholds) decide how to interpret the network outputs.
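One possible reading of the accept and reject thresholds is sketched below: a class is assigned only when exactly one output unit is at or above the accept threshold and every other unit is at or below the reject threshold. The threshold values 0.95 and 0.05 and the function name `assign_class` are assumptions chosen for illustration.

```python
def assign_class(outputs, accept=0.95, reject=0.05):
    """Return the index of the winning class, or None if undecided.

    outputs: activations of the output units, each in [0.0, 1.0].
    accept / reject: illustrative confidence levels (assumed values).
    """
    high = [i for i, o in enumerate(outputs) if o >= accept]
    low = [i for i, o in enumerate(outputs) if o <= reject]
    # Decided only if one unit is clearly "on" and all others clearly "off".
    if len(high) == 1 and len(low) == len(outputs) - 1:
        return high[0]
    return None
```

Loosening the thresholds reduces the number of undecided samples at the cost of more misclassifications.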

4.3.2

Support Vector Machines are based on the concept of decision planes that define decision boundaries.
A decision plane is one that separates sets of objects having different class memberships.
Most classification tasks, however, are not that simple, and often more complex structures are needed
in order to make an optimal separation, i.e., to correctly classify new objects (test cases) on the basis
of the examples that are available (train cases). For example, in the figure below the GREEN and
RED objects would require a curve (which is more complex than a line) to separate them.
Support Vector Machines are particularly suited to handle such tasks.

The illustration below shows the basic idea behind Support Vector Machines. Here we see the original
objects (left side of the schematic) mapped, i.e., rearranged, using a set of mathematical functions
known as kernels. The process of rearranging the objects is known as mapping (transformation).
Note that in this new setting the mapped objects (right side of the schematic) are linearly separable
and, thus, instead of constructing the complex curve (left schematic), all we have to do is find an
optimal line that can separate the GREEN and the RED objects.
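The effect of such a mapping can be shown on a toy problem. In the sketch below, four one-dimensional points cannot be split by any single threshold on x, but after the illustrative mapping phi(x) = (x, x^2) a straight line separates them. The data, the mapping and the hand-picked line are all assumptions made for this example; an actual SVM would find the maximum-margin line itself and would apply the kernel implicitly rather than computing phi.

```python
def feature_map(x):
    """Map a 1-D point into 2-D: phi(x) = (x, x^2). Illustrative choice."""
    return (x, x * x)

def linear_classifier(point, w=(0.0, 1.0), b=-1.0):
    """Sign of w . point + b; here the separating line is x2 = 1."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return "RED" if score > 0 else "GREEN"

# RED points lie outside the GREEN points on the original axis,
# so no single threshold on x separates the two classes.
samples = [(-2.0, "RED"), (-0.5, "GREEN"), (0.5, "GREEN"), (2.0, "RED")]
predictions = [linear_classifier(feature_map(x)) for x, _ in samples]
```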


The Support Vector Machine (SVM) is primarily a classifier that performs classification tasks
by constructing hyperplanes in a multidimensional space that separate cases of different class labels.

4.4

Fourier features with phase, ||a(k)|| and ||b(k)|| are used for the comparison of classifiers.

4.4.1

Figure 10: [A] MLP with structure [80 (first hidden), 50 (second hidden), 50 (third hidden)]; algorithm
used: gradient descent with momentum (traingdx of Matlab); learning rate: adaptive, with initial value 0.2;
momentum: 0.9. Results are very bad on the training set. [B] RBF: results are good on the training data,
but over-learning is high, hence bad results on the test data.

4.4.2

In the case of SVM, the result on the training data is 98.86%, which indicates very good learning. The result on the
test data is 62.93%.

On the test data SVM outperforms the other two networks.

4.4.3

Now we have to pick the best feature extraction technique for our system. For that we tested SVM
with different feature vectors. The table below shows the recognition rate (%) for all four feature
vectors.

Feature vector                                                  Recognition rate
Fourier with magnitude (s(k)), |a(k)| and |b(k)|                86.66%
Fourier with phase, |a(k)| and |b(k)|                           98.74%
Fourier with magnitude (s(k)), |a(k)|, |b(k)| and phase         98.04%
Gradient features                                               40.50%

4.5

Fusion is one of the most powerful methods for improving the recognition rates produced by various techniques.
It takes advantage of the different errors produced by different techniques, emphasizing the strengths
and avoiding the weaknesses of the individual techniques. Researchers have found that in many real-world
applications it is better to fuse multiple techniques to improve the results. Fusion can be done in
the following two ways:

Serial Architecture: In this method the output of one classifier is fed into the next classifier. There
are four basic methodologies: sequential, selective [33], boosting [34] and cascade [35].

Parallel Architecture: This method combines the results of more than one independent algorithm
by using one of the following methodologies: voting, Bayesian [36], Dempster-Shafer theory
[37], behavior-knowledge space [38], mixture of experts [39] and stacked generalization.

We use a method based on the Borda count, inspired by [40], to combine the following results:

Technique 1: SVM on moduli of Fourier coefficients ||a(k)|| and ||b(k)|| and magnitude s(k).

Technique 2: SVM on moduli of Fourier coefficients ||a(k)|| and ||b(k)|| and phase.

Technique 3: SVM on moduli of Fourier coefficients ||a(k)|| and ||b(k)||, phase and magnitude s(k).

4.5.1

Conventional Borda count for a string in the lexicon is defined as the sum of the number of strings

that are below the string in the different lexicons produced by the various techniques [40].
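A minimal sketch of the conventional Borda count, assuming each technique returns its lexicon as a list ordered from best to worst match. The function name and the example rankings are hypothetical and are not results from this work.

```python
def borda_count(rankings):
    """Conventional Borda count.

    rankings: one ranked list of lexicon strings per technique,
    best match first. Returns a dict mapping string -> Borda count.
    """
    scores = {}
    for ranked in rankings:
        for position, word in enumerate(ranked):
            # Number of strings this technique ranks below the word.
            below = len(ranked) - 1 - position
            scores[word] = scores.get(word, 0) + below
    return scores

# Hypothetical rankings produced by three techniques.
rankings = [
    ["Puzzle", "Climate", "Rolled"],
    ["Puzzle", "Rolled", "Climate"],
    ["Climate", "Puzzle", "Rolled"],
]
counts = borda_count(rankings)
```

The string with the highest count is taken as the fused result; here that is "Puzzle", with a count of 5.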

4.5.2

A rank is assigned and used in the calculation of the Borda count, instead of counting the number
of strings below the string to be recognized. The rank for a particular string is calculated using
the following formula:

$$\mathrm{Rank} = 1 - \frac{\text{position of the string in the top } N \text{ strings}}{N}$$

We have taken N = 3, so only the top three words from each technique are considered when calculating the
rank. Secondly, the confidence values produced by the different techniques are considered. The confidence
value assigned to all three predicted words of a given technique is the confidence that the classifier has
in its predicted string, even if that string is not a valid lexicon word. It can be estimated by summing
the scores of the predicted characters. This is reasonable because the top three strings are chosen based on their similarity with the predicted string. The similarity between the predicted string and
the lexicon words is found from the number of matching characters and their relative positions.
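One simple way to realize such a similarity measure is to count the characters that agree at the same position, ignoring case, and to keep the N best lexicon words. The sketch below assumes exactly that; the measure actually used may weight relative positions differently, and the function names and the three-word lexicon are illustrative.

```python
def similarity(predicted, lexicon_word):
    """Count characters that agree at the same position, ignoring case."""
    pairs = zip(predicted.lower(), lexicon_word.lower())
    return sum(1 for p, q in pairs if p == q)

def top_matches(predicted, lexicon, n=3):
    """Return the n lexicon words most similar to the predicted string."""
    ordered = sorted(lexicon, key=lambda w: similarity(predicted, w),
                     reverse=True)
    return ordered[:n]

# Illustrative lexicon; "PzZzfe" is a classifier output for "Puzzle".
best = top_matches("PzZzfe", ["Climate", "Puzzle", "Rolled"], n=1)
```

With this measure the string "PzZzfe" shares four positions with "Puzzle" and none with the other two words, so "Puzzle" is ranked first.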

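$$\text{Final Borda count of a lexicon word} = (\text{rank} \times \text{confidence})_{\text{tech1}} + (\text{rank} \times \text{confidence})_{\text{tech2}} + (\text{rank} \times \text{confidence})_{\text{tech3}}$$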
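The combination can be sketched as follows, assuming the rank is 1 - position / N with positions counted from zero (so the best word of a technique gets rank 1) and that rank and confidence are multiplied. Both are assumptions of this sketch. The example uses only the top lexicon word of each technique, with confidences 1.2, 1.2 and 2.17.

```python
def modified_borda(results, n=3):
    """Sum rank * confidence over all techniques for each lexicon word.

    results: per technique, a list of (lexicon_word, confidence) pairs,
    best match first. Only the top n entries of each list are used.
    """
    scores = {}
    for top_words in results:
        for position, (word, confidence) in enumerate(top_words[:n]):
            rank = 1.0 - position / n  # assumed: position counted from 0
            scores[word] = scores.get(word, 0.0) + rank * confidence
    return scores

# Top lexicon word of each technique only; lower-ranked words are omitted.
results = [
    [("Puzzle", 1.2)],
    [("Puzzle", 1.2)],
    [("Climate", 2.17)],
]
scores = modified_borda(results)
winner = max(scores, key=scores.get)
```

Although one technique prefers "Climate" with the highest individual confidence, the two agreeing techniques give "Puzzle" the larger combined score (2.4 against 2.17).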


Final Results


Discussions

In the case of "Moderated", the neural network segmentor failed to segment "te". This is understandable, because
it treated them as a hole owing to the way in which this pair of characters was written. The outputs of the three different techniques are "MOrerlmd", "MOGeraED" and "MOrerlmd", which have
very little similarity with the word "Moderated". This error is due to the low discriminative ability
of Fourier features and their combinations in our case, where they have to distinguish 52 different
classes. The error is corrected in the post-processing step, where the Borda count over all three parallel
techniques is highest for the lexicon word "Moderated". Hence, the system outputs the correct word,
"Moderated".

In the case of "Puzzle", "u" is incorrectly segmented into two. The outputs of the three different techniques
are "PzZzfe", "PCZZfc" and "PsZzme", which have very little similarity with the word "Puzzle". Again,
this is due to the low discriminative ability of Fourier features. Here the output of two techniques
is "Puzzle" with confidence 1.2 each, while the third technique predicted "Climate" with confidence
2.17. This error is corrected when the Borda counts of all three techniques are combined, which gives the highest
confidence to the word "Puzzle". Hence, the system outputs the correct word, "Puzzle".

In the case of "Rolled", segmentation is perfect, but the outputs of the three techniques are quite different
from the word "Rolled". When the results of the three parallel techniques are combined, however, the score for the
word "Rolled" is highest; hence the system outputs the correct word.

Conclusions

We thus conclude that the proposed system gives fairly good results on the test samples that were
presented to it. We could not report the recognition accuracy as a percentage because we did not have
enough test samples. We tested both heuristic and neural-network-based segmentation and found
that the latter gave better results. This is reasonable because the heuristic algorithm is based on rules
that are deduced empirically, and there is no guarantee that they give optimum results for different styles
of writing, so their validation using a neural network becomes essential. Moreover, our character
recognition network has 52 output classes, whereas most of the literature uses separate
classifiers for upper-case and lower-case characters. We tested different neural networks that have been
used in the past for character recognition. We tried different configurations of MLP with up to 3 hidden
layers, and the best results were obtained with the [80 50 50] configuration, with a validation performance
of 0.01 in 640 epochs. The training algorithm used in this case was gradient descent with momentum
(traingdx of Matlab). We also tested an RBF neural network and got a performance (MSE) of 0.0010155
in 1800 epochs. This network suffered from over-learning and gave poor results on test data. Apart
from neural networks, we tried a support vector machine classifier on the same feature set and achieved
98% classification accuracy on the training data set and 62.93% on the test data set. Finally, we selected
SVM as it outperformed MLP and RBF. For feature extraction we started with gradient features,
which in our case produced very poor results. We then tried Fourier features such as moduli of Fourier
coefficients, magnitude, phase and their various combinations as feature vectors. We got the best results
with moduli of Fourier coefficients and phase, with a recognition accuracy of 98.74% on the training data
set. We have used three combinations of Fourier descriptors in our final system. Post-processing that
uses a lexicon becomes imperative, as there is no other way to find the errors that have crept in
at any of the previous stages; the only way to do that is to verify whether the predicted word is
a valid lexicon word. Incorporating this in our final system using the Borda count improved
the overall efficiency of the system.

Future work

The performance of neural-network-based segmentation can be improved by using a larger database. More
research can be done to come up with a better feature vector that incorporates transform-based, statistical and directional features for character recognition. SVM outperformed the other classifiers in the classification of
characters because it performs classification tasks by constructing hyperplanes in a multidimensional
space that separate samples of different class labels. Other recently developed techniques, such as Dempster-Shafer theory, could be used for combining different CR techniques. Even in the case of the Borda
count, other techniques can be explored that give different confidences to each predicted lexicon
word for a given classifier. Also, experiments can be done to give different weights to each of the
parallel CR techniques according to their performance on the validation data.

References

[1] J. Mantas, "An overview of character recognition methodologies," Pattern Recognition, vol. 19, no. 6, pp. 425-430, 1986.

[2] T. S. El-Sheikh and R. M. Guindi, "Computer recognition of Arabic cursive scripts," Pattern Recognition, vol. 21, no. 4, pp. 293-302, 1988.

[3] S. Mori, K. Yamamoto, and M. Yasuda, "Research on machine recognition of handprinted characters," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-6, no. 4, pp. 386-405, 1984.

[4] C. Suen, M. Berthod, and S. Mori, "Automatic recognition of handprinted characters - the state of the art," Proceedings of the IEEE, vol. 68, no. 4, pp. 469-487, 1980.

[5] C. Tappert, C. Suen, and T. Wakahara, "The state of the art in online handwriting recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 12, pp. 787-808, Aug. 1990.

[6] R. Bozinovic and S. Srihari, "Off-line cursive script word recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 11, pp. 68-83, Jan. 1989.

[7] V. Govindan and A. Shivaprasad, "Character recognition - a review," Pattern Recognition, vol. 23, no. 7, pp. 671-683, 1990.

[8] Q. Tian, P. Zhang, T. Alexander, and Y. Kim, "Survey: omnifont-printed character recognition," in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series (K.-H. Tzou and T. Koga, eds.), vol. 1606, pp. 260-268, Nov. 1991.

[9] A. Belaid and J.-P. Haton, "A syntactic approach for handwritten mathematical formula recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-6, no. 1, pp. 105-111, 1984.

[10] Y. Ding, F. Kimura, Y. Miyake, and M. Shridhar, "Evaluation and improvement of slant estimation for handwritten words," in Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on, pp. 753-756, Sept. 1999.

[11] S. M. Lucas, E. Vidal, A. Amiri, S. Hanlon, and J.-C. Amengual, "A comparison of syntactic and statistical techniques for off-line OCR," in Proceedings of the Second International Colloquium on Grammatical Inference and Applications, (London, UK), pp. 168-179, Springer-Verlag, 1994.

[12] K.-F. Chan and D.-Y. Yeung, "Recognizing on-line handwritten alphanumeric characters through flexible structural matching," 1999.

[13] S. Mori, C. Suen, and K. Yamamoto, "Historical review of OCR research and development," Proceedings of the IEEE, vol. 80, pp. 1029-1058, July 1992.

[14] C. Tappert, C. Suen, and T. Wakahara, "The state of the art in online handwriting recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 12, pp. 787-808, Aug. 1990.

[15] H. Avi-Itzhak, T. Diep, and H. Garland, "High accuracy optical character recognition using neural networks with centroid dithering," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 17, pp. 218-224, Feb. 1995.

[16] I. Bazzi, R. Schwartz, and J. Makhoul, "An omnifont open-vocabulary OCR system for English and Arabic," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, pp. 495-504, June 1999.

[17] J. Hu, S. G. Lim, and M. K. Brown, "Writer independent on-line handwriting recognition using an HMM approach," Pattern Recognition, vol. 33, no. 1, pp. 133-147, 2000.

[18] A. Meyer, "Pen computing: a technology overview and a vision," SIGCHI Bull., vol. 27, pp. 46-90, July 1995.

[19] G. Kim and V. Govindaraju, "A lexicon driven approach to handwritten word recognition for real-time applications," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 19, pp. 366-379, Apr. 1997.

[20] M. Mohamed and P. Gader, "Handwritten word recognition using segmentation-free hidden Markov modeling and segmentation-based dynamic programming techniques," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 18, pp. 548-554, May 1996.

[21] A. A. Atici and F. T. Yarman-Vural, "A heuristic algorithm for optical character recognition of Arabic script," Signal Processing, vol. 62, no. 1, pp. 87-99, 1997.

[22] Øivind Due Trier, A. K. Jain, and T. Taxt, "Feature extraction methods for character recognition - a survey," Pattern Recognition, vol. 29, no. 4, pp. 641-662, 1996.

[23] S.-W. Lee, D.-J. Lee, and H.-S. Park, "A new methodology for gray-scale character segmentation and recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 18, pp. 1045-1050, Oct. 1996.

[24] N. Arica and F. Yarman-Vural, "Optical character recognition for cursive handwriting," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, pp. 801-813, June 2002.

[25] E. Kavallieratou, N. Fakotakis, and G. Kokkinakis, "Skew angle estimation for printed and handwritten documents using the Wigner-Ville distribution," Image and Vision Computing, vol. 20, no. 11, pp. 813-824, 2002.

[26] M. Blumenstein and B. Verma, "Neural-based solutions for the segmentation and recognition of difficult handwritten words from a benchmark database," in Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on, pp. 281-284, Sept. 1999.

[27] I.-S. Oh, J.-S. Lee, and C. Suen, "Analysis of class separation and combination of class-dependent features for handwriting recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, pp. 1089-1094, Oct. 1999.

[28] D. Singh, M. Dutta, and S. H. Singh, "Neural network based handwritten Hindi character recognition system," in Proceedings of the 2nd Bangalore Annual Computer Conference, COMPUTE '09, (New York, NY, USA), pp. 15:1-15:4, ACM, 2009.

[29] I. P. Morns and S. S. Dlay, "Character recognition using Fourier descriptors and a new form of dynamic semisupervised neural network," Microelectronics Journal, vol. 28, no. 1, pp. 73-84, 1997.

[30] M. Shridhar and A. Badreldin, "High accuracy character recognition algorithm using Fourier and topological descriptors," Pattern Recognition, vol. 17, no. 5, pp. 515-524, 1984.

[31] S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1st ed., 1994.

[32] C. Wei, "Statsoft, Inc., Tulsa, OK: Statistica, version 8," AStA Advances in Statistical Analysis, vol. 91, pp. 339-341, 2007. doi: 10.1007/s10182-007-0038-x.

[33] S. Gopisetty, R. Lorie, J. Mao, M. Mohiuddin, A. Sorin, and E. Yair, "Automated forms-processing software and services," IBM J. Res. Dev., vol. 40, pp. 211-230, March 1996.

[34] H. Drucker, R. E. Schapire, and P. Simard, "Improving performance in neural networks using a boosting algorithm," in Advances in Neural Information Processing Systems 5, [NIPS Conference], (San Francisco, CA, USA), pp. 42-49, Morgan Kaufmann Publishers Inc., 1993.

[35] J. Park, V. Govindaraju, and S. Srihari, "OCR in a hierarchical feature space," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 22, pp. 400-407, Apr. 2000.

[36] H.-J. Kang and S.-W. Lee, "Combining classifiers based on minimization of a Bayes error rate," in Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on, pp. 398-401, Sept. 1999.

[37] L. Xu, A. Krzyzak, and C. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," Systems, Man and Cybernetics, IEEE Transactions on, vol. 22, no. 3, pp. 418-435, 1992.

[38] Y. Huang and C. Suen, "A method of combining multiple experts for the recognition of unconstrained handwritten numerals," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 17, pp. 90-94, Jan. 1995.

[39] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, "Adaptive mixtures of local experts," Neural Comput., vol. 3, pp. 79-87, March 1991.

[40] B. Verma, P. Gader, and W. Chen, "Fusion of multiple handwritten word recognition techniques," Pattern Recognition Letters, vol. 22, no. 9, pp. 991-998, 2001.
