
CHAPTER 1

INTRODUCTION
Digital Image Processing (DIP) is a rapidly changing field with growing applications in science and
engineering. DIP refers to the processing of a two-dimensional picture by a digital computer. Image
processing holds out the possibility of an ultimate model that could perform the visual functions of
human beings.

Character recognition is a concept related to digital signal processing and is quite an upcoming
topic in the IT branch. Today the recognition of signatures is a must in banking institutions. Character
recognition is trivial for humans, but making a computer program that analyzes and correctly
recognizes a character is a difficult task. Extracting information from an image
requires accurate recognition of the text in that image. Camera captured images can suffer from low
resolution, blur, and perspective distortion, as well as complex layout and interaction of the content
and background.

Kannada is one of the major Dravidian languages of southern India and one of the earliest
languages evidenced epigraphically in India. The script has 49 characters in its alphasyllabary and is
phonetic. The characters are classified into three categories: swaras (vowels), vyanjanas (consonants)
and yogavahanas (part vowel, part consonant). Due to the impact and advancement of
information technology, more emphasis is nowadays given in Karnataka to using Kannada at all levels,
and hence the use of Kannada in computer systems has also become a necessity. Therefore, an efficient
character recognition system for Kannada is one of the present day requirements.

In particular, our focus is on the recognition of individual characters in camera based images. From
the images, the characters are manually segmented. 10 samples of each of the 49 characters, 490 characters
in total, are collected. 8 samples of each character are sent to the training phase, where features are extracted
using the feature extraction algorithm. The images are first converted to gray scale and then to binary images.
These images are then scaled to fit a pre-determined area with a fixed but significant number of pixels.
The feature vector is then extracted from the image using Hu's moments. The features are
stored in a knowledge base. The testing phase takes the image to be recognized and, on running the
algorithm, if it is recognized, a message containing the name of the group to which the character belongs
is displayed, based on the Euclidean distance.

The proposed work is planned to develop a system for the recognition of Kannada characters in
images taken with a camera from various sources such as government office boards. We present an
annotated database of images containing basic Kannada characters. Our proposed work concentrates
on invariant features of the image to recognize the character.

Kannada script
Kannada is one of the four popular Dravidian languages of South India. Kannada is written
horizontally from left to right and the concept of lower and upper case is absent. Kannada is a non-
cursive script; that is, a Kannada word is written without joining the characters of the word. The
characters are isolated within a word.

The Kannada language has 16 vowels and 34 consonants as the basic alphabet of the language, as shown
in figures 1 and 2 respectively. Each vowel has a vowel sign (modifier) and each consonant has a basic
form (primitive). The basic form of a consonant can combine with the vowel signs to form another set of
16 Consonant-Vowel (CV) composite characters called 'Gunithaksharas'. In Kannada, all the 34
consonants have a half/short form, called 'vatthus', which can be referred to as half consonants or
subscripts. Any half consonant can appear below any other consonant or a CV character as a subscript
character to form a conjunct-consonant character.

1.1. Related Work

Some of the related works are summarized below:

A new method for text detection and recognition in images and video frames is described in [1].
Text detection is performed in a two-step approach that combines the speed of a text localization
step, enabling text size normalization, with the strength of a machine learning text verification step
applied on background independent features. Text recognition, applied on the detected text lines, is
addressed by a text segmentation step followed by a traditional OCR algorithm within a multi-
hypotheses framework relying on multiple segments, language modeling and OCR statistics.
Experiments conducted on large databases of real broadcast documents demonstrate the validity of
the approach.
A detailed report on Progress in camera based document image analysis is described in [2]. The
increasing availability of high performance, low priced, portable digital imaging devices has created
a tremendous opportunity for supplementing traditional scanning for document image acquisition.
Digital cameras attached to cellular phones, PDAs, or as standalone still or video devices are highly
mobile and easy to use; they can capture images of any kind of document including very thick books,
historical pages too fragile to touch, and text in scenes; and they are much more versatile than
desktop scanners. Should robust solutions to the analysis of documents captured with such devices
become available, there is clearly a demand from many domains. Traditional scanner-based
document analysis techniques provide us with a good reference and starting point, but they cannot be
used directly on camera-captured images. Camera captured images can suffer from low resolution,
blur, and perspective distortion, as well as complex layout and interaction of the content and
background. In this paper we present a survey of application domains, technical challenges and
solutions for recognizing documents captured by digital cameras. We begin by describing typical
imaging devices and the imaging process. We discuss document analysis from a single camera-
captured image as well as multiple frames and highlight some sample applications under
development and feasible ideas for future development.

A system design for optical character recognition on handheld devices is
described in [3]. It presents a complete Optical Character Recognition (OCR) system for camera
captured image/graphics embedded textual documents for handheld devices. At first, text regions are
extracted and skew corrected. Then, these regions are binarized and segmented into lines and
characters. Characters are passed into the recognition module. Experimenting with a set of 100
business card images captured by a cell phone camera, a maximum recognition
accuracy of 92.74% is achieved. Compared to Tesseract, an open source desktop-based powerful OCR engine,
the present recognition accuracy is worth contributing. Moreover, the developed technique is
computationally efficient and consumes low memory so as to be applicable on handheld devices.

A detailed explanation of a mobile based text detection and translation system is given in [4].
Inspired by the well-known iPhone app "Word Lens", the authors developed an Android-platform based text
translation application that is able to recognize the text captured by a mobile phone camera, translate
the text, and display the translation result back onto the screen of the mobile phone. The text
extraction and recognition algorithm has a correct-recognition rate greater than 85% at the
character level. In the report, they demonstrate the system flow, the text detection algorithm and
detailed experimental results.
A feature extraction method for character recognition is described in [5]. The paper presents
an overview of feature extraction methods for off-line recognition of segmented (isolated) characters.
Selection of a feature extraction method is probably the single most important factor in achieving
high recognition performance in character recognition systems. Different feature extraction methods
are designed for different representations of the characters, such as solid binary characters, character
contours, skeletons (thinned characters) or gray-level subimages of each individual character. The
feature extraction methods are discussed in terms of invariance properties, reconstructability and
expected distortions and variability of the characters. The problem of choosing the appropriate
feature extraction method for a given application is also discussed. When a few promising feature
extraction methods have been identified, they need to be evaluated experimentally to find the best
method for the given application.

Character recognition in natural images is described in [6]. The paper tackles the problem of
recognizing characters in images of natural scenes. In particular, we focus on recognizing characters
in situations that would traditionally not be handled well by OCR techniques. We present an
annotated database of images containing English and Kannada characters. The database comprises
images of street scenes taken in Bangalore, India using a standard camera. The problem is
addressed in an object categorization framework based on a bag-of-visual-words representation. We
assess the performance of various features based on nearest neighbor and SVM classification. It is
demonstrated that the performance of the proposed method, using as few as 15 training images, can
be far superior to that of commercial OCR systems. Furthermore, the methods can benefit from
synthetically generated training data obviating the need for expensive data collection and annotation.

Text and character recognition using fuzzy image processing is described in [7]. The current
investigation presents an algorithm and software to detect and recognize characters in an image. Three
types of fonts were under investigation, namely, case (I): Verdana, case (II): Arial and case (III):
Lucida Console. The font size is within the range of 17–29. These types were chosen
because the characters have low variance and there is less redundancy in the single character. Also,
they have no breakpoints in the single character or merge in group of characters as noticed in Times
New Roman type. The proposed algorithm assumed that there are at least three characters of same
colour in a single line, the character is on its best view which means no broken point in a single
character and no merge between group of characters and at last, a single character has only one color.
The basic algorithm is to use the 8-connected component to binarize the image, then to find the
characters in the image and recognize them. Comparing this method with other methods, we note
that the main difference is in the binarizing technique. Previous work depends mainly on histogram
shape. They calculate the histogram, and then smooth it to find the threshold points. These methods
are not working well when the text and background colours are very similar, and also if it may create
an area of negative text. The negative text may be treated as a graphic region which may affect the
system efficiency. The presented approach will take each colour alone. This will make the merge
between the text and the background unlikely to happen. Also there will be no negative text in the
whole image because the negative text to a colour will be normal text for another. The shown and
discussed results will show the efficiency of the proposed algorithm and how it is different compared
with other algorithms.

In the case of south Indian scripts, an approach that has been taken is an efficient zone based
feature extraction algorithm for handwritten numeral recognition, described in [8]. Character
recognition is an important area in the image processing and pattern recognition fields. Handwritten
character recognition has received extensive attention in academic and production fields. The
recognition system can be either on-line or off-line. Off-line handwriting recognition is a subfield
of optical character recognition. India is a multi-lingual and multi-script country, where eighteen
official scripts are accepted and over a hundred regional languages are spoken. In this paper we propose
a zone centroid and image centroid based distance metric feature extraction system. The character
centroid is computed and the image (character/numeral) is further divided into n equal zones.
The average distance from the character centroid to each pixel present in the zone is computed.
Similarly, the zone centroid is computed and the average distance from the zone centroid to each pixel
present in the zone is computed. We repeated this procedure for all the zones/grids/boxes present in
the numeral image. There could be some zones that are empty; the value of such a
zone in the feature vector is then zero. Finally 2*n such features are extracted. Nearest
neighbour and feed forward back propagation neural network classifiers are used for
subsequent classification and recognition. We obtained 99%, 99%, 96% and 95%
recognition rates for Kannada, Telugu, Tamil and Malayalam numerals respectively.

A detailed account of a minimally segmenting high performance Bangla Optical Character
Recognition (OCR) system is presented in [9], providing much higher
performance than the traditional neural network based ones. It describes how Bangla characters are
processed, trained and then recognized with the use of a Kohonen network. While there have been
significant efforts in using the various types of Artificial Neural Networks (ANN) in optical character
recognition, this is the first published account of using a segmentation-free optical character
recognition system for Bangla using a Kohonen network. The methodology presented here assumes
that the OCR pre-processor has minimally segmented the input words into easily segmentable
chunks, presenting each of these as images to the classification engine described here. The size
and the font face used to render the characters are also significant in both training and classification.
The images are first converted into grayscale and then to binary images; these images are then scaled
to fit a pre-determined area with a fixed but significant number of pixels. The feature vectors are
then extracted from the rectangular pixel map, which in this case is simply a series of 0s and 1s of
fixed length. Finally, a Kohonen neural network is chosen for the training and classification process.
Although the steps are simple, and the simplest network is chosen for the training and recognition
process, the resulting classifier is accurate to better than 98%, depending on the quality of the input
images.

System design for optical character recognition in Urdu is described in [10]. The offline optical
character recognition (OCR) for different languages has been developed over the recent years. Since
1965, the US postal service has been using this system for automating their services. The range of
the applications under this area is increasing day by day, due to its utility in almost major areas of
government as well as private sector. This technique has been very useful in making paper free
environment in many major organizations as far as the backup of their previous file record is
concerned. This system has been proposed for offline character recognition of isolated
characters of the Urdu language, as the Urdu language forms words by combining isolated characters. Urdu
is a cursive language, having connected characters that make up words. The major area of utility for Urdu
OCR will be digitizing of a lot of literature related material already stocked in libraries. Urdu
language is famous and spoken in more than 3 big countries including Pakistan, India and
Bangladesh. A lot of work has been done in Urdu poetry and literature up to the recent century.
Creation of OCR for Urdu language will make an important role in converting all those work from
physical libraries to electronic libraries. Most of the material already placed on the internet is in the form of
images containing text, which takes a lot of space to transfer and is even harder to read online, so an
Urdu OCR is a must. The system is of the training type. It consists of image preprocessing,
line and character segmentation, creation of xml file for training purpose. While Recognition system
includes taking xml file, the image to be recognized, segment it and creation of chain codes for
character images and matching with already stored in xml file.
Massively parallel character recognition system is described in [11]. The system is designed to study
the feasibility of the recognition of hand printed text in a loosely constrained environment. The NIST
handprint database, NIST Special Database 1, is used to provide test data for the recognition system.
The system consists of eight functional components. The loading of the image into the system and
storing the recognition results from the system are I/O components. In between are components
responsible for image processing and recognition. The first image processing component is
responsible for image correction for scale and rotation, data field isolation, and character data
location within each field; the second performs character segmentation; and the third does character
normalization. Three recognition components are responsible for feature extraction and character
reconstruction, neural network-based character recognition, and low-confidence classification
rejection.

A study of Optical Character Recognition (OCR) techniques employed in automatic mail sorting
equipment is presented in [12]. Methods and algorithms for image pre-processing, character
recognition, and contextual post processing are discussed and compared. The objective of the study
is to provide a background in the state-of-the-art of this equipment as the first element in a search for
techniques to significantly improve the capabilities of postal address recognition. This document
contains a report of activities that describes the methods used for locating published and patent
literature, and the experimental software developed. A background report is also given on the past and
present USPS OCR environment and research. A description is provided of methods, algorithms and
technical specifications of character recognition equipment manufactured by three firms: Pitney
Bowes - Elsag, Burroughs Nippon Electric (NEC), and Telefunken. Only technical specifications are
provided for two others: Toshiba and Recognition Equipment, Inc. A bibliography of relevant
literature is also presented that includes papers in the open technical literature, U.S. patents and
USPS-sponsored technical reports.

License plate recognition using neural networks is clearly explained in [13]. This paper uses two neural
network techniques for character recognition: one is the Back Propagation Neural Network and the other
is the Learning Vector Quantization Neural Network. Their results are compared based upon their
perfection in the character recognition. It is observed that the character recognition results obtained
using Learning Vector Quantization Neural Network (LVQ NN) is better than the character
recognition results obtained by using Back Propagation Neural Network (BP NN). The efficiency of
the System can be further improved by increasing the number of fonts for training Neural Networks.

A model for recognizing scene text, called a semi-Markov model, using wavelet features is described in [14].
It requires only the approximate location of the text baseline and the font size; no binarization or
prior word segmentation is necessary. The system is aided by a lexicon, yet it also allows non-
lexicon words. To facilitate inference with a large lexicon, we use an approximate Viterbi beam
search. Our system performs robustly on low-resolution images of signs containing text in fonts
atypical of documents.
The automatic recognition method for color text characters extracted from scene
images, which is robust to strong distortions, complex backgrounds, low resolution and
non-uniform lighting, is discussed in [15]. Based on a specific architecture of convolutional
neural networks, the proposed system automatically learns how to recognize characters without
making any assumptions, without applying any preprocessing or post-processing and without using
tunable parameters. For this purpose, we use a training set of scene text images extracted from the
ICDAR 2003 public training database. The proposed method is compared to recent character
recognition techniques for scene images based on the ICDAR 2003 public samples dataset in order to
contribute to the state-of-the-art method comparison efforts initiated in ICDAR 2003. Experimental
results show an encouraging average recognition rate of 84.53%, ranging from 93.47% for clear
images to 67.86% for seriously distorted images.

The method for recognition of text in natural image containing signs is discussed in
[16]. Recognized text from natural images presents valuable information which can be used for
translation, location based services and image searches. As street view services are becoming
important for major IT companies, this street text recognition is an important problem to solve. There
are a lot of previous studies on street text recognition for English, Chinese and mixed languages.
Although a lot of creative recognition methods have been proposed and experimented with, the
average recognition rate is around 60–80%. This paper starts out with the question of how we can
recognize street text appearing in a general street view with high reliability. We proposed two main
tools to tackle this problem: one is spatial GPS information, and the other is string alignment, which
is a good method to handle approximate string matching. Since the text recognized by the previous
methods can be considered a sort of approximated string from the original text (true string), the string
alignment score could give a good candidate set of street text. For a group of street texts collected by
string alignment, we then exploit the spatial information which can be obtained from GPS data.
Our approach guarantees more than 90% recognition rate by combining alignment scores and GPS
information, which is a marked improvement above the 60% rate of previous methods. Since most
digital maps such as Google maps provide a precise GPS information and true textual information
for a designated position, our approach can be automated to tag building information over the street
view images.

The recognition of text in everyday scenes is made difficult by viewing conditions, unusual fonts,
and lack of linguistic context. Most methods integrate a priori appearance information and some sort
of hard or soft constraint on the allowable strings. Weinman and Learned-Miller showed in [17] that the
similarity among characters, used as a supplement to the appearance of the characters with respect to a
model, could be used to improve scene text recognition. In this work, we make further
improvements to scene text recognition by taking a novel approach to the incorporation of similarity.
In particular, we train a similarity expert that learns to classify each pair of characters as equivalent
or not. After removing logical inconsistencies in an equivalence graph, we formulate the search for
the maximum likelihood interpretation of a sign as an integer program. We incorporate the
equivalence information as constraints in the integer program and build an optimization criterion out
of appearance features and character bigrams. Finally, we take the optimal solution from the integer
program, and compare all “nearby” solutions using a probability model for strings derived from
search engine queries. We demonstrate word error reductions of more than 30% relative to previous
methods on the same data set.

A method for text localization and recognition in real-world images is described in [18]. The paper
proposes a flexible probabilistic model for character recognition that integrates local language
properties, such as bigrams, with lexical decision, having open and closed vocabulary modes that
operate simultaneously. Lexical processing is accelerated by performing inference with sparse belief
propagation, a bottom-up method for hypothesis pruning. We give experimental results on
recognizing text from images of signs in outdoor scenes. Incorporating the lexicon reduces word
recognition error by 42% and sparse belief propagation reduces the number of lexicon words
considered by 97%.

A method for scene text recognition using similarity and a lexicon with sparse belief
propagation is described in [19]. Relative to document recognition, it is challenging because of font
variability, minimal language context, and uncontrolled conditions. Much information available to
solve this problem is frequently ignored or used sequentially. Similarity between character images is
often overlooked as useful information. Because of language priors, a recognizer may assign
different labels to identical characters. Directly comparing characters to each other, rather than only a
model, helps ensure that similar instances receive the same label. Lexicons improve recognition
accuracy but are used post hoc. It introduces a probabilistic model for STR that integrates similarity,
language properties, and lexical decision. Inference is accelerated with sparse belief propagation, a
bottom-up method for shortening messages by reducing the dependency between weakly supported
hypotheses. By fusing information sources in one model, we eliminate unrecoverable errors that
result from sequential processing, improving accuracy. In experimental results recognizing text from
images of signs in outdoor scenes, incorporating similarity reduces character recognition error by 19
percent, the lexicon reduces word recognition error by 35 percent, and sparse belief propagation
reduces the lexicon words considered by 99.9 percent with a 12X speedup and no loss in accuracy.

A system that reads the Kannada text encountered in natural scenes with the aim to
provide assistance to the visually impaired persons of Karnataka state is presented in
[20]. The proposed system contains three main stages: text extraction, text recognition and speech
synthesis. The paper concentrates on text extraction from images/videos. In this paper an efficient
algorithm which can automatically detect, localize and extract Kannada text from images (and digital
videos) with complex backgrounds is presented. The proposed approach is based on the application
of a colour reduction technique, a standard deviation base method for edge detection, and the
localization of text regions using new connected component properties. The outputs of the algorithm
are text boxes with a simple background, ready to be fed into an OCR engine for subsequent
character recognition. Our proposal is robust with respect to different font sizes, font colours,
orientation, alignment and background complexities. The performance of the approach is
demonstrated by presenting promising experimental results for a set of images taken from different
types of video sequences.

1.2 Motivation for the Work:

The literature survey reveals that not much work has been carried out on the recognition of Kannada
characters in camera based images. Character recognition in general has quite a lot of relevant
applications in automatic indexing and information retrieval, such as document indexing, content based
image retrieval and license plate recognition, which further opens up the possibility of more
improved and advanced systems. The same characters differ in size, shape and style. Like any
image, the visual characters are subject to degradation due to noise. Due to the peculiarities of the Kannada
script, it is difficult to extract the features needed to recognize a character compared to words. Thus the
present work illustrates Kannada character recognition for scene text images.

1.3 Work Carried Out


A) Images are taken from the DC office of Bagalkot.
B) Images are segmented manually, using paint brush as a tool, to get 490 characters.
C) An algorithm for feature extraction is developed; character recognition has been tested
on 400 samples.
D) Extracted features are stored in the knowledge base.
E) Using the feature vectors in the knowledge base and the algorithm, the characters are tested to see
whether they are recognized or not.
Finally, we have collected 490 samples. 8 samples of each character are trained and stored in the
database, and 2 samples of the same character are tested to see whether they are recognized or not. 100%
efficiency is obtained.

1.4 Organization of Report


The report is organized into six chapters.
 Chapter 1 gives an introduction to the Kannada script and character recognition, some of the
related work on recognition of Kannada script, and the motivation for the work.
 Chapter 2 presents the requirement analysis and the problem definition.
 Chapter 3 presents the methodology to recognize basic Kannada characters from camera based
images obtained from display boards of Karnataka government offices.
 Chapter 4 explains the implementation of each module of the proposed methodology.
 Chapter 5 gives the experimental results and discussion.
 Chapter 6 presents the conclusion and future work, followed by the references.
1.5 Summary
This chapter gave an introduction to a character recognition system that recognizes characters
written on display boards of Karnataka Government offices. An exhaustive literature survey
in different allied domains is also included. The objective of this work is to automate character
recognition using an algorithm developed around Hu's invariant moments. The performance of the proposed
system has been tested with different samples of characters to determine the accuracy rate.

CHAPTER 2
REQUIREMENT ANALYSIS AND PROBLEM
DEFINITION

2.1 Requirement Specification


2.1.1 Purpose
The purpose of this document is to clarify the software requirements related to the Kannada character
recognition system for camera based images. This document details the purpose of the
system along with its capabilities, interfaces, and user interactions.

2.1.2 Scope

This software system will be a character recognition system implemented in MATLAB.
The system will take a .jpeg file as input and output a message. The system will also allow for
some user interaction: users will be able to select the input file and move to the next options. The
system will then display the results of the process in a message box.

2.1.3 Definitions, Acronyms, and Abbreviations.


JPEG - Joint Photographic Experts Group

PNG - Portable Network Graphics

GIF - Graphics Interchange Format

2.2 The Overall Description


The following sections describe the interfaces of the system to its environment. They also describe the
functionality of the system and go into detail about its specific requirements.

2.2.1 Hardware Requirements


A standard computer with MATLAB 7.0 or higher installed is needed to run the system. This
computer should have the necessary input devices such as a keyboard and a mouse. Output devices such
as a display monitor are needed to view the results. The software shall interact with the movement
of the mouse and the mouse buttons; the mouse shall activate input and output, command buttons and
select options from menus. The software shall also interact with the keystrokes of the keyboard; the
keyboard will input data into the active areas of the GUI.

2.2.2 Software Requirements


The database of images is maintained in the form of folders. The MATLAB GUIDE is used for the user
interface.

(a) MATLAB-Version 7.8.0.(R2009a)

(b) Operating System- Windows and Linux (Fedora, Mandriva2007)

(c) Image Formats- .jpg, .png, .jpeg, .gif

2.2.3 Communications Interfaces


No specific interfaces are needed during the operation of the system.

2.2.4 Memory Constraints


At least 250 MB of RAM is required.
2.3 User Characteristics

A basic understanding of executing programs in MATLAB is required. The user should be able to run
the program, input the character images and read and analyse the output in the message box. An understanding
of basic digital image processing techniques would be beneficial.

2.4 Constraints

The computer on which the system is to operate should meet all the minimum system requirements
for running MATLAB 7.0. Depending on the operating system that is running on the computer, these
constraints may vary. Please refer to the MathWorks website
(http://www.mathworks.com/products/matlab/requirements.html) for detailed MATLAB 7.0
operational requirements. Additionally, the character recognition system shall only receive JPEG files as input.

2.5 Problem definition


The work aims at developing a novel method for Kannada basic character recognition in camera
based images.

Kannada Vowels

Kannada Consonants
2.6 Summary
In this chapter, all the functional and non-functional requirements for implementing the project
are listed. The software and hardware requirements for the project are also listed, and the problem
definition of the project is stated at the end of the chapter.

CHAPTER 3
METHODOLOGY FOR CHARACTER
RECOGNITION
Chapter 2 described the functional and non-functional requirements for building a robust character
recognition system for basic Kannada characters in camera based images. The proposed methodology
involves two phases, namely training and testing. The training phase involves pre-processing of the
training images, obtaining the corresponding feature vectors and construction of the knowledge base.
The testing phase involves the recognition of the image based on the Euclidean distance classifier.
The methodology employed is shown in figure 3.1.

3.1 Proposed Methodology


[Figure: block diagram. Training images and the testing image are preprocessed; features are
extracted from the training images and stored in the knowledge base; the features of the testing
image are compared with the knowledge base using the Euclidean distance classifier to give the
recognized character.]

Fig 3.1: Block diagram of proposed model

3.2 Character image acquisition

To build the character recognition system, a database containing a total of 490 character images
with different characteristics of size, orientation, colour and thickness is built.

3.3 Pre-processing

The preprocessing involves mainly four stages: gray scale conversion, binarization, thinning and
applying a bounding box.

3.3.1 Gray scale conversion

The captured images are stored in the database as colour images, three-channel images in which each
pixel is represented by red, green and blue components, which is not suitable for further processing.
So, to get a single-channel image, each is converted into a gray scale image where each pixel is
represented by a value in the range 0 to 255.

3.3.2 Binarization

The binarization process converts the image into binary image where each pixel is represented by
either 0 or 1. It also reduces image information by removing background so that the image is black
and white type. This type of image is much easier for further processing.

3.3.3 Thinning

Thinning is the process of reducing thickness of each line of pattern to just a single pixel. It removes
pixels so that an object without holes shrinks to a minimally connected stroke, and an object with
holes shrinks to a connected ring halfway between each hole and the outer boundary.

3.3.4 Bounding box

Individual captured images are automatically cropped to the size of character using fitting
rectangular bounding box algorithm so that unnecessary areas are removed. It reduces the total
number of pixels in the analyzed image.

3.4 Feature Extraction

In this project we implemented Hu's moments, and each character is represented by a feature vector.
The moments describe the image with respect to its axes. The moments are invariant with respect to
translation, rotation and scaling.

3.4.1 Hu’s Moment

Moment invariants were first introduced by Hu. Hu derived six absolute orthogonal invariants and
one skew orthogonal invariant based upon algebraic invariants, which are not only independent of
position, size and orientation but also independent of parallel projection. The moment invariants
have been proved to be an adequate measure for tracing image patterns under image
translation, scaling and rotation, under the assumption that the images are continuous functions and noise-
free. Moment invariants have been extensively applied to image pattern recognition, image
registration and image reconstruction. However, the digital images in practical applications are not
continuous and noise-free, because images are quantized by finite-precision pixels on a discrete
coordinate grid. In addition, noise may be introduced by various factors such as the camera.
The moments describe the image content (or distribution) with respect to its axes. If the image
function f(x,y) is a piecewise continuous bounded function, the moments of all orders exist and the
moment sequence {mpq} is uniquely determined by f(x,y); and correspondingly, f(x,y) is also
uniquely determined by
the moment sequence {mpq}.

M_{p,q} = \iint_D f(x,y) \, x^p y^q \, dx \, dy

M_{0,0} = area of the object

(M_{1,0}, M_{0,1}) \rightarrow center of mass

Centralized Moments
One should note that the moments may not be invariant when f(x,y) changes by translation,
rotation or scaling. Invariant features can be achieved using the central moments, which are defined
as follows:

\mu_{p,q} = \iint_D f(x,y) \, (x - \bar{x})^p (y - \bar{y})^q \, dx \, dy

\bar{x} = M_{1,0} / M_{0,0}, \qquad \bar{y} = M_{0,1} / M_{0,0}
The central moments \mu_{pq}, computed using the centroid of the image f(x,y), are equivalent to the
moments m_{pq} of an image whose centre has been shifted to the centroid of the image. Therefore, the
central moments are invariant to image translations.
Hu (1962) introduced seven non-linear functions defined on regular moments
which are invariant to rotation, scaling and translation. These functions are listed below.
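The report does not reproduce the formulas at this point; for completeness, the standard Hu invariants, written in terms of the normalized central moments \eta_{pq} = \mu_{pq} / \mu_{00}^{1+(p+q)/2}, are:

\phi_1 = \eta_{20} + \eta_{02}
\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2
\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2
\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2
\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]
\phi_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})
\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]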

These seven invariant moments have the useful property of being unchanged under image scaling,
translation and rotation.

3.5 Euclidean Distance Classifier

Feature extraction by Hu's moments yields a feature vector for each of the trained images as well as
for the testing image. The classification of the testing image is based on these feature values. With this
approach, the unknown image with feature vector X obtained from Hu's moments is classified by
assigning it to the class whose stored feature vector (M) is closest to X. The Euclidean distance is
computed for all the classes and the image is assigned to the class for which the distance is minimum.
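Written out, if X = (x_1, ..., x_7) is the Hu feature vector of the test image and M_k = (m_{k,1}, ..., m_{k,7}) is the stored feature vector of the k-th training sample, the distance used is

d(X, M_k) = \sqrt{ \sum_{i=1}^{7} (x_i - m_{k,i})^2 }

and the test image is assigned to the group of the training sample with the smallest d(X, M_k).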

3.6 Training

The database used consists of 490 images, 10 images of each character. The training phase considers
the images in the database one by one; preprocessing is applied, features are extracted by Hu's moments
and these features are stored in the knowledge base.

3.7 Testing

The testing phase considers a single test image; preprocessing is applied, features are extracted by
Hu's moments and stored in a feature vector. The Euclidean distance from the test image to all images in
the training set is calculated. The test image is assigned to the class with the minimum Euclidean distance.

3.8 Example

For better illustration, consider a basic Kannada character image. The images are collected from DC
office Bagalkot. One of the images is shown below.

Fig 3.8.1 Color image

To get a one dimensional image it is converted into a gray scale image and the converted image is
shown below.
Fig 3.8.2 Gray scale image

The gray scale image is binarized and the image is shown below.

Fig 3.8.3 Binarized image

The binarized image is subjected to thinning and the thinned image is shown below

Fig 3.8.4 Thinned image

The thinned character image is fitted in the rectangular bounding box and resized. And the image is
shown below.
Fig 3.8.5 Resized image

The Hu’s 7 invariant moments for the above sample are

Humom =

1.0e+004 *

0.0006 0.0011 0.0449 0.0294 6.2763 0.0368 -8.6238

The Euclidean distances computed for the above feature vector are

Z = 1.0e+005 *

0.0000, 0.2065, -0.0448, 0.2208, 0.1712

0.1895, 0.2237, 0.0734, 0.3016, 1.4168

Result: 'image belongs to A group'

3.9 Summary

This chapter gave information on the methodology used in the project. It explained the training
phase and the testing phase, with brief descriptions of the preprocessing stage, feature extraction with Hu's
moments and the Euclidean distance classifier. The example illustrated the working of the methodology.
CHAPTER 4
IMPLEMENTATION

In chapter 3, the proposed methodology utilizes the feature vector obtained from Hu's algorithm
to assign the character to the corresponding class, and hence the character is recognized. This chapter
explains the implementation details of each module of the proposed methodology. To build a robust
character recognition system, a database containing a total of 490 characters, 10 samples
corresponding to each character, is required. The obtained images are segmented manually to get
each individual character. These images are stored in the database as colour images (JPEG).

4.1 Preprocessing Module


The wide variety and configuration of devices used to capture the characters requires normalizing the
images before further processing. The purpose of this phase is to bring the characters to a
standard size, ready for feature extraction. Any number of preprocessing methods can be
followed. This experiment involved 5 stages of preprocessing: gray scale conversion,
binarization, thinning, fitting a bounding box, and resizing.

The colour character image is read from the database using the imread() function. The function
rgb2gray converts RGB images to grayscale by eliminating the hue and saturation information while
retaining the luminance. The function im2bw(I, level) converts the grayscale image I to a binary
image. The output image BW replaces all pixels in the input image with luminance greater than level
with the value 1 (white) and replaces all other pixels with the value 0 (black).

bwmorph() is applied to the binarized image a specified number of times (n); n can be infinity,
in which case the operation is repeated until the image no longer changes.
Algorithm 4.1: Image acquisition and preprocessing

Input : Original 24 bit color image

Output: Standard size image (30x30)

Start

Step1: Input the image of a basic Kannada character.


Step2: Convert the rgb image into gray scale image.
Step3: Convert the grayscale image into binary image.
Step4: Apply the thinning algorithm.
Step5: Apply fitting boundary box (refer Algorithm 4.1.2)
Step6: Resize the image to 30x30.
Stop.
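A minimal MATLAB sketch of Algorithm 4.1 is given below. The file name, the use of graythresh for the threshold and the inversion of the binary image (so that character pixels are 1) are assumptions for illustration; cropToCharacter stands for the bounding box step sketched after Algorithm 4.1.2.

% Sketch of Algorithm 4.1: acquire and standardize a character image
img  = imread('character.jpg');          % Step 1: input colour character image (illustrative name)
gray = rgb2gray(img);                    % Step 2: RGB to gray scale
bw   = im2bw(gray, graythresh(gray));    % Step 3: binary image (Otsu threshold assumed)
bw   = ~bw;                              % assume a dark character: make character pixels 1
thin = bwmorph(bw, 'thin', Inf);         % Step 4: thin the strokes until no further change
thin = cropToCharacter(thin);            % Step 5: fit the bounding box (see Algorithm 4.1.2)
stdImg = imresize(thin, [30 30]);        % Step 6: resize to the standard 30x30 image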

Individual captured images are automatically cropped to the size of character using fitting
rectangular bounding box algorithm so that unnecessary areas are removed.

Algorithm 4.1.2 : Fitting boundary box

Input: Thinned image

Output: Character image reduced in size

Start
Step 1: Input the thinned image
Step 2: Find the size of the image(row, column)
Step 3: Scan the image from top row, for (i=1 to row) do the following

-find the sum of i th row pixels

- if sum is less than column

then save the row number in k1 and stop scanning.

Step 4: Scan the image from bottom row ,for(i= row to 1) do the following

- find the sum of i th row pixels

- if sum is less than column


then save the row number in k2 and stop scanning.

Step 5: Scan the image from the leftmost column, for (i=1 to column) do the following
-find the sum of the i th column pixels
-if the sum is less than row
then save that column number in k3 and stop scanning
Step 6: Scan the image from the rightmost column, for (i=column to 1) do the following
- find the sum of the i th column pixels
- if the sum is less than row
then save that column number in k4 and stop scanning.
Step 7: Store the pixels of the original image from row k1 to k2 and from column k3 to k4
in the variable I2
Step 8: Take I2 as reduced character image to specified size
Stop.
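A compact MATLAB sketch of Algorithm 4.1.2 is shown below. It assumes the character pixels are 1 and the background is 0; finding the first and last non-empty rows and columns with find is equivalent to the scanning loops above (with the opposite pixel polarity the test would be "row sum less than the number of columns", as in the algorithm).

% Sketch of Algorithm 4.1.2: crop the thinned image to the character (saved as cropToCharacter.m)
function I2 = cropToCharacter(thinImg)
    rows = find(sum(thinImg, 2) > 0);    % rows containing character pixels
    cols = find(sum(thinImg, 1) > 0);    % columns containing character pixels
    k1 = rows(1);   k2 = rows(end);      % top-most and bottom-most character rows
    k3 = cols(1);   k4 = cols(end);      % left-most and right-most character columns
    I2 = thinImg(k1:k2, k3:k4);          % Step 7: reduced character image
end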

4.2 Feature Extraction Module

Hu's invariant moment method is used to extract the features of the images. These moments are invariant
to rotation, translation and scaling. The images considered are assumed to be free of skew. The algorithm
for Hu's moments is given below.

Algorithm 4.2: Feature Extraction


Input : Standard image
Output: 7 invariant values.

Start

Step1: Find x-axis and y-axis position of pixels for standard image
Step2: Find area of object and center of mass.
Step3: Find Centroid and centroid moment
Step4: Find the invariant values using the formulas specified in section 3.4.1
Step5: Store the values in a feature vector

Stop
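A MATLAB sketch of Algorithm 4.2 follows. It computes the standard seven Hu invariants from the normalized central moments; the normalization by m00 is an assumption of the standard formulation, and the function name is illustrative.

% Sketch of Algorithm 4.2: seven Hu invariant moments of a 30x30 binary character image
function phi = huMoments(stdImg)
    stdImg = double(stdImg);
    [rows, cols] = size(stdImg);
    [X, Y] = meshgrid(1:cols, 1:rows);               % Step 1: pixel coordinates
    m00  = sum(stdImg(:));                           % Step 2: area of the object
    xbar = sum(sum(X .* stdImg)) / m00;              % Step 3: centroid
    ybar = sum(sum(Y .* stdImg)) / m00;
    mu   = @(p, q) sum(sum(((X - xbar).^p) .* ((Y - ybar).^q) .* stdImg));  % central moment
    eta  = @(p, q) mu(p, q) / m00^(1 + (p + q)/2);                          % normalized central moment
    n20 = eta(2,0); n02 = eta(0,2); n11 = eta(1,1);
    n30 = eta(3,0); n03 = eta(0,3); n21 = eta(2,1); n12 = eta(1,2);
    phi = zeros(1, 7);                               % Step 4: the seven invariants
    phi(1) = n20 + n02;
    phi(2) = (n20 - n02)^2 + 4*n11^2;
    phi(3) = (n30 - 3*n12)^2 + (3*n21 - n03)^2;
    phi(4) = (n30 + n12)^2 + (n21 + n03)^2;
    phi(5) = (n30 - 3*n12)*(n30 + n12)*((n30 + n12)^2 - 3*(n21 + n03)^2) + ...
             (3*n21 - n03)*(n21 + n03)*(3*(n30 + n12)^2 - (n21 + n03)^2);
    phi(6) = (n20 - n02)*((n30 + n12)^2 - (n21 + n03)^2) + 4*n11*(n30 + n12)*(n21 + n03);
    phi(7) = (3*n21 - n03)*(n30 + n12)*((n30 + n12)^2 - 3*(n21 + n03)^2) - ...
             (n30 - 3*n12)*(n21 + n03)*(3*(n30 + n12)^2 - (n21 + n03)^2);
end                                                  % Step 5: phi is the stored feature vector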

4.3 Classifier Module


The Euclidean distance method is used for classifying the image by calculating a distance vector. The
difference between the feature vector of each trained image and the feature vector of the test image is
calculated. The test image is assigned to the class with the minimum distance vector value. The algorithm
is given below.

Algorithm 4.3: Classifier


Input : Test image
Output: Distance vector

Start

Step1: Load knowledge base (KB)


Step2: Find feature vector(FV) of test image
Step3: for (i=0 to row)
for (j=0 to column)
find the difference between KB(i,j) and FV(j)
sum the squares of the differences for row i
Step4: Store the result in the distance vector

Stop
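A short MATLAB sketch of Algorithm 4.3 is given below, assuming the knowledge base KB is stored as an N-by-7 matrix with one Hu feature vector per row (this storage layout is an assumption; the report does not specify it).

% Sketch of Algorithm 4.3: Euclidean distance of the test feature vector to every KB entry
function Z = distanceVector(KB, FV)
    N = size(KB, 1);
    Z = zeros(N, 1);
    for i = 1:N
        d = KB(i, :) - FV;        % difference of KB(i,j) and FV(j)
        Z(i) = sqrt(sum(d .^ 2)); % Euclidean distance to training sample i
    end
end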

4.4 Training Module

Among the 10 samples, 8 samples of each character are taken for training. During training, features
are extracted and stored in the knowledge base. The algorithm developed for Hu's moments is used for
feature extraction.
Algorithm 4.4: Training
Input : Image from database
Output: Knowledge base

Start

Step1: Declare the knowledge base (KB)


Step2: For(i=1 to rows) in the database
Extract the directory path and image number
Pass the filename to the function to extract feature vector
Store the feature vector in KB array
Step3: Save KB

Stop
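A MATLAB sketch of Algorithm 4.4 is shown below. The folder name, file pattern and helper names are illustrative: preprocessCharacter stands for the Algorithm 4.1 steps and huMoments for the Algorithm 4.2 sketch.

% Sketch of Algorithm 4.4: build the knowledge base from the training images
files = dir(fullfile('database', '*.jpg'));          % 8 training samples per character assumed
KB = zeros(numel(files), 7);                         % Step 1: knowledge base, one row per sample
for i = 1:numel(files)                               % Step 2: for every image in the database
    img = imread(fullfile('database', files(i).name));
    KB(i, :) = huMoments(preprocessCharacter(img));  % feature vector of the i-th sample
end
save('knowledgeBase.mat', 'KB');                     % Step 3: save the knowledge base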

4.5 Testing Module

A test sample of a character is taken for testing. Preprocessing is applied and features are extracted.
The Euclidean distance is computed for the extracted features and the character is classified.

Algorithm 4.5: Testing


Input : Test image

Output: Recognized character

Start

Step1: Load knowledge base (KB)


Step2: Declare Z array to store distance vector
Step3: Find the feature vector for test image
Step4: Find the Euclidean distance of the test image's feature vector with respect to
the feature vectors in the KB
Step5: Store the distance vector in Z
Step6: Find the minimum value in Z
Step7: The test image belongs to the class at the index of the minimum value in Z

Stop.
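A MATLAB sketch of Algorithm 4.5 is given below, reusing the illustrative helpers from the earlier sketches (preprocessCharacter, huMoments, distanceVector); the test file name is also illustrative.

% Sketch of Algorithm 4.5: recognize a single test image against the knowledge base
load('knowledgeBase.mat', 'KB');                        % Step 1: load the knowledge base
testImg = imread('test_character.jpg');                 % illustrative test image
FV = huMoments(preprocessCharacter(testImg));           % Step 3: feature vector of the test image
Z  = distanceVector(KB, FV);                            % Steps 4-5: distances to every KB entry
[minVal, idx] = min(Z);                                 % Step 6: minimum distance
fprintf('Test image belongs to class index %d\n', idx); % Step 7: class of the nearest sample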

4.6 GUI Module

The user interface is built using the MATLAB GUIDE. GUIDE, the MATLAB
Graphical User Interface Development Environment, provides a set of tools for creating graphical user
interfaces (GUIs). These tools greatly simplify the process of laying out and programming GUIs. The tools
used in the interface are push buttons, text boxes, axes and list boxes. A push button is used to execute
the code written in its callback and also to move to the next figure. The text box displays the
message. The axes display the images used in the code. The list box displays the values calculated
by the algorithms.
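As an illustration of how a GUIDE push button is wired up, a minimal callback sketch is shown below; the callback and handle names are assumptions and do not correspond to the report's actual GUI code.

% Illustrative GUIDE push-button callback: let the user pick an image and show it on the GUI axes
function loadImageButton_Callback(hObject, eventdata, handles)
    [fname, fpath] = uigetfile({'*.jpg;*.jpeg;*.png', 'Image files'});  % browse for an input image
    if isequal(fname, 0), return; end        % user pressed cancel
    img = imread(fullfile(fpath, fname));
    axes(handles.axes1);                     % display on the GUI axes
    imshow(img);
    handles.inputImage = img;                % keep the image for the later stages
    guidata(hObject, handles);               % store the updated handles structure
end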
4.7 Sessions with Packages


Fig 4.7.1 shows the home page of the character recognition system, which leads to the two
phases, namely training and testing, on clicking the push button "next".
Figure 4.7.2 gives a clear picture of the training phase. Before the testing phase it is
recommended to perform the training phase by clicking on the "training" push button. The system then
creates the knowledge base comprising the feature vectors of all the trained samples and
displays the message "the training completed successfully". The user then needs to go to the testing
phase to recognise the desired character by clicking on the "testing" push button.
Figure 4.7.3 shows the different stages of the testing phase. The "Load image"
button allows the user to browse for and load the input coloured character image from the
system. "Binarized image" is the push button which, on clicking, converts the loaded
image to a binary image. On clicking the "Thinned Image" button the binarized image
is thinned, and it is fitted in a rectangular bounding box on clicking the "Bounded image"
button. The various invariant features obtained after applying the feature extraction based
on Hu's moments to the test image can be seen in the list box beside the "features"
button. The output is the message box displaying, in text, the group to which the
character belongs.

4.8 Summary

This chapter explained the modules used in the project. The preprocessing module explains the
functions used for standardizing the image. The training and testing modules explain the
collection of the feature vectors and distance vectors of the images. The Hu's moment module explains
how the feature vector is obtained, and the Euclidean distance module explains how the distance vector
is obtained. The algorithms for all the modules are given, which are useful when writing the code.

CHAPTER 5
EXPERIMENTAL RESULTS AND DISCUSSION
A number of images with different specifications were tested with the proposed methodology, and
the examples below show some input and output images.

1)

Input Image Processed Image

The input image is an RGB image which contains one of the basic Kannada characters with a complex
background and a slightly abnormal orientation, and the output image is the resulting image after
processing the input image. It contains the extracted character information in the form of the feature vector
obtained from Hu's moments, and the image is classified as belonging to the THA group. As can be
noticed, the complex background and other distortions are simplified.

2)
Input Image Processed Image

The input image is an RGB image which contains one of the basic Kannada characters with a uniform and
simple background and normal orientation, and the output image is the resulting image after processing
the input image. It contains the extracted character information in the form of the feature vector obtained
from Hu's moments, and the image is classified as belonging to the AM group. As can be noticed, the
unwanted background has been removed.

3)
Input Image Processed Image

The input image is an RGB image which contains one of the basic Kannada characters with a blurring
effect and a non-uniform background, and the output image is the resulting image after processing the
input image. It contains the extracted character information in the form of the feature vector obtained from
Hu's moments, and the image is classified as belonging to the KSHA group. As can be noticed, the
unwanted background has been removed and the blurring effect has been overcome.

4)

Input Image Processed Image

The input image is an RGB image which contains one of the basic Kannada characters with a not so uniform
background, some distortion and normal orientation, and the output image is the resulting image after
processing the input image. It contains the extracted character information in the form of the feature vector
obtained from Hu's moments, and the image is classified as belonging to the HA group. As can be
noticed, the image is free from distortion and is simplified.

5)
Input Image Processed Image
The input image is an RGB image which contains one of the basic Kannada characters with a not so uniform
background, some distortion, normal orientation and non-uniform illumination, and the
output image is the resulting image after processing the input image. It contains the extracted character
information in the form of the feature vector obtained from Hu's moments, and the image is classified
as belonging to the DHA group. As can be noticed, uniform illumination is obtained.

6)

Input Image Processed Image

The input image is an RGB image which contains one of the basic Kannada characters with a uniform
background, a little distortion and normal orientation but tilted, and the output image is the
resulting image after processing the input image. It contains the extracted character information in the
form of the feature vector obtained from Hu's moments, and the image is classified as belonging to the TA
group. As can be noticed, tilted images are also recognized.
CHAPTER 6
CONCLUSION AND FUTURE WORK

The various results of the proposed system with several examples and the overall
performance of the proposed work has been illustrated in the earlier chapters. In this chapter,
conclusion and future work of the proposed method is discussed.

The proposed system is simple and efficient with respect to the size, font, colour, orientation and
alignment of the character. In this work we present a character recognition system for camera
based images, designed using three stages, pre-processing, feature extraction and a Euclidean distance
classifier, in order to make the right decision. The proposed methodology is robust, employing
invariant moment features. Firstly, the colour image is converted to a gray scale image; secondly, the
image is subjected to binarization and thinning. The image is then scaled to fit a pre-determined area
with a fixed but significant number of pixels. The feature vector is extracted from the image by
using Hu's moments and is classified based on the Euclidean distance classifier.

The experimental result shows that our proposed method is very effective and efficient in
recognizing the characters. Thus our proposed work concentrates on the recognition of characters
from images. The scope of the proposed system is limited to recognition of single Kannada basic
character.

Recognition is often followed by a post-processing stage. We hope and foresee that if post-
processing is done, the accuracy will be even higher. Implementing the presented system with post-
processing on mobile devices is also taken up as part of our future work. The proposed system is
developed for the basic Kannada characters; in the future it can be extended to all the remaining characters.
REFERENCES

1. Jerod J. Weinman, Member, IEEE, Erik Learned-Miller, Member, IEEE, and Allen R. Hanson,
Member, IEEE “Scene Text Recognition Using Similarity and a Lexicon with Sparse Belief
Propagation”, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE
INTELLIGENCE, VOL. 31, NO. 10, OCTOBER 2009.

2. Teófilo E. de Campos, Bodla Rakesh Babu and Manik Varma, "Character Recognition in
Natural Images", International Institute of Information Technology, France.

3. Nobuo Ezaki, Marius Bulacu, Lambert Schomaker, “Text Detection from Natural Scene Images:
Towards a System for Visually Impaired Persons”, Proc. of 17th Int. Conf. on Pattern Recognition
(ICPR 2004), IEEE Computer Society, 2004, pp. 683-686, vol. II, 23-26 August, Cambridge, UK

4. Xuewen Wang, Xiaoqing Ding, Changsong Liu, “Character Extraction and Recognition in Natural
Scene Images”,Image Processing Div., Dept. of E.E., Tsinghua Univ., Beijing, 100084, P.R..China.

5. Lukas Neumann and Jiri Matas, "A method for text localization and recognition in real-world
images", Center for Machine Perception, Czech Technical University in Prague, Czech Republic.
Published at the 10th Asian Conference on Computer Vision (ACCV), New Zealand, 2010.

6. Seon Yeong Kim, Sung-Hwan Kim, Hwan-Gue Cho,”A Novel Post processing Method for Street
Text Recognition Using GPS Information and String Alignment”,Dept. of Computer Science
Engineering Pusan National University Busan, Republic of Korea 978-1-4244-8728-8/11/$26.00
©2011 IEEE

7. Basilios Gatos, Ioannis Pratikakis and Stavros Perantonis, “Towards Text Recognition in Natural
Scene Images”, Computational Intelligence Laboratory, Institute of Informatics and
Telecommunications, NCSR “DEMOKRITOS” Athens 153 10, Greece.
8. Zohra Saidane and Christophe Garcia,”Automatic Scene Text Recognition using a Convolutional
Neural Network”,Orange Labs 4, rue du Clos Courtel BP 9122635512 Cesson S´evign´e Cedex -
France

9. Anand Mishra , Karteek Alahari and C.V. Jawahar, ” An MRF Model for Binarization of Natural
Scene Text” International Institute of Information Technology Hyderabad, India- INRIA - Willow,
ENS, Paris, France

10. Shyama Prosad Chowdhury, Soumyadeep Dhar, Karen Rafferty, Amit Kumar Das and Bhabatosh
Chanda, "Robust Extraction of Text from Camera Images using Colour and Spatial Information
Simultaneously”, Journal of Universal Computer Science, vol. 15, no. 18 (2009), 3325-3342
submitted: 26/10/09, accepted: 20/11/09, appeared: 28/12/09 J.UCS

11. Hsien-Chu Wu, Chwei-Shyong Tsai, and Ching-Hao Lai, "A License Plate Recognition
System in e- Government”, In proc. of An International Journal, 2004, Vol.15, No.2, pp.199-210.

12. Dinesh Dileep, "A Feature Extraction Technique Based On Character Geometry For Character
Recognition”, In proc. of Department of Electronics and Communication Engineering, Amrita
School of Engineering, pp. 1-4.

13. Jimmy Wales, Larry Sanger [online], available: http://en.wikipedia.org/wiki/Optical_character_recognition, 2001.

14. Michael Hogan, John W. Shipman, "OCR (Optical Character Recognition): Converting paper
documents to text", New Mexico Tech Computer Center, 2008, pp. 1-4.

15. Jianlan Feng, Yuping Li, Mianzhou, "The Research of Vehicle License Plate Character
Recognition Method Based on Artificial Neural Network”, In proc. of 2nd International Asia
Conference on Information in Control, Automation and Robotics, 2010, pp. 317-320.

16. Bo Lin, Bin Fang, Dong-Hui Li, "Character Recognition Of License Plate Image Based On
Multiple Classifiers” In proc. of International Conference on Wavelet Analysis and Pattern
Recognition, Boading, 2009, pp. 138-143
17. G. E. Hinton and C. K. I. Williams, "Adaptive Elastic Models for Character Recognition,"
Advances in Neural Information Processing Systems, R. Lippmann, Vol. IV, to be published,
Denver, 1991.

18. C. L. Wilson and M. D. Garris, "Hand printed Character Database," NIST Special Database 1,
HWDB, 1990.

19. M. D. Garris, "Design and Collection of a Handwriting Sample Image Database," Social Science
Computing Journal, Vol.10, published, 1992.

20. M. D. Garris and C. L. Wilson, “A Neural Approach to Concurrent Character Segmentation and
Recognition,” to be published in the proceedings of Southcon, Orlando, 1992.
