Comp Networking - Ijcnwmc - Tamil and Hindi Script Recognition - Bharath Kumar

www.tjprc.org editor@tjprc.org
International Journal of Computer Networking, Wireless
and Mobile Communications (IJCNWMC)
ISSN(P): 2250-1568; ISSN(E): 2278-9448
Vol. 4, Issue 3, Jun 2014, 7-16
TJPRC Pvt. Ltd.

TAMIL AND HINDI SCRIPT RECOGNITION SYSTEM USING HIERARCHICAL
MULTILAYERED NEURAL NETWORK
BHARATH KUMAR M R¹ & S. GANESHMOORTHY²
¹Pre Final Year, Nehru Institute of Engineering & Technology, Coimbatore, Tamil Nadu, India
²Assistant Professor, Department of Computer Applications, Nehru Institute of Engineering & Technology,
Coimbatore, Tamil Nadu, India

ABSTRACT
In this work, a neural network for Tamil and Hindi character recognition is proposed and trained with the feed
forward and back propagation algorithms. The neural network can effectively recognize various characters and
digits with good accuracy. Character recognition has become vital in mobile e-learning, teaching and many other
application areas. The application uses a neural network to recognize digits and characters. An artificial neural
network (ANN) consists of an interconnected group of artificial neurons and processes information using a connectionist
approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or
internal information that flows through the network during the learning phase. In practice, neural networks are non-linear
statistical data modeling tools. They can be used to model complex relationships between inputs and outputs or to find
patterns in the data given for detection.
KEYWORDS: Neural Network, Feed Forward & Back Propagation Algorithm, Digital Recognition, Character
Recognition
INTRODUCTION
A neural network (NN) plays a vital role in a character recognition system. It has a massively parallel structure
composed of many processing elements connected to each other through different weights [1] [3]. Much research is
being carried out in the area of neural networks. An NN has a higher rate of performance and speed of response than
other systems, algorithms or networks; in our previous paper [8] we proposed a similar concept with lower accuracy.
As the field grows rapidly, NNs have been used in digit and character recognition problems, especially where the input
digits and characters are shifted in position. Fukushima and colleagues [4], [5] presented the Neocognitron, which is
insensitive to translation and deformation of input characters and digits, and used it to recognize hand-printed characters.
The Neocognitron is complex and needs a large number of cells. Later, Grossberg and Carpenter [6] proposed a
self-organizing system which can classify different kinds of patterns by adaptive resonance theory. Many internal
exemplars, including noise patterns and distortion patterns, are formed in the network. Pittman and Martin [7] used a
Back Propagation (BP) learning network to recognize hand-printed letters and digits. The simulations indicated that
the genetic neural network can recognize various distorted patterns with a high rate of accuracy and recognition.
Impact Factor (JCC): 5.3963 Index Copernicus Value (ICV): 3.0

Currently, much work is done by computer in order to reduce processing time and to provide more accurate results.
Different types of data, such as characters and digits, are used frequently in everyday operations, and recognizing
them makes it possible to automate systems that deal with numbers, such as registration, Aadhaar cards, voting, postal
codes, banking and transaction numbers, and car plates. In this paper, an automatic number recognition system is
implemented and experimental results are presented. Digit recognition has been studied closely: scientists and engineers
have developed a wide range of methodologies in pattern recognition and image processing to solve this problem [6], [1].
This is due to the demand for this kind of recognition system in various fields.
In this study, a system for recognizing characters and digits is built, which may benefit various fields. The input is
an image of a specific size and format; the image is processed and then recognized, yielding editable digits and
characters. The proposed system acquires an image consisting of Tamil and Hindi characters and recognizes them.
A feed forward back propagation algorithm is applied to train the network, and the recognized characters are finally
converted into text or a value [2].
LITERATURE STUDY
Definition of Character and Digits
Characters and digits are the basic building blocks of any language and are used to build its different structures.
Characters are the letters of the alphabet and digits are the numbers. The structures are words, numerals, strings,
sentences, etc. [1]
Optical Character Recognition
Optical character recognition (OCR) is the most widely used method of recognition; it is the process of converting
an image, such as a scanned document, an electronic fax or an ID number, into computer-editable text. The text and digits
in an image are not editable: the characters and digits are made of tiny dots (pixels) that together form a picture of
text and numbers. In OCR, the software analyzes an image and converts the pictures of the characters and digits into
editable text based on the patterns of the pixels in the input image. After OCR, we can export the converted text and
use it with a variety of word-processing, page layout and spreadsheet applications. OCR reduces the human effort needed
to make editable soft copies of images and printed documents.
Scope of Study
The ultimate scope of this project is to build a system that automatically recognizes the digits and characters input
to it, so that the corresponding outputs can be used for various other purposes in applications from different
domains.
Objective
At present, recognition systems with such high accuracy are rarely deployed; the main objective is to develop a
recognition system that efficiently recognizes characters and digits within a limited processing time.
Figure 1: Examples of Different Shapes of Tamil Scripts

Figure 2: Scenario of Character Recognition with Artificial Neural Network
Framework
Handwritten characters vary from one person to another. Figure 1 shows examples of the Tamil character pronounced
"aah" as written by different users. Figure 2 shows the scenario of recognition with an artificial neural network,
which contains the handwritten script as input, a hidden layer, and an output giving the Tamil character for
evaluation of the network.
DEVELOPMENT OF CHARACTER AND DIGIT RECOGNITION USING NEURAL NETWORK
OCR Data Pre-Process Unit
The characters and digits to be recognized by the optical character recognition system are preprocessed and provided
as input files to the application at the development stage. The digits 0-9 and characters A-Z are represented as images.
Each digit and character is provided with ten samples for initial recognition. The sample images are fixed to the
size 320 x 320.
Feed Forward Neural Network Representation Unit
A feed forward neural network is a kind of artificial neural network in which the connections between the units
do not form a directed cycle, which distinguishes it from a recurrent network. In this network, the information moves in
only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no
cycles or loops in the network.
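The one-directional flow described above can be sketched in a few lines of Python (the paper's own implementation is in MATLAB). This is only an illustrative sketch: the tanh activation, the tiny 3-2-2 layer sizes and the weight values are assumptions chosen for the example, not values from the paper.

```python
import math

def activation(x):
    # Assumed squashing function; the paper does not name its activation f.
    return math.tanh(x)

def layer_forward(weights, inputs):
    # weights holds one row per unit in the layer; each unit takes a
    # weighted sum of the inputs and passes it through the activation.
    return [activation(sum(w * v for w, v in zip(row, inputs)))
            for row in weights]

def forward(w_hidden, w_output, x):
    # Information moves strictly input -> hidden -> output; nothing feeds back.
    hidden = layer_forward(w_hidden, x)
    return layer_forward(w_output, hidden)

# Hypothetical weights for a 3-input, 2-hidden, 2-output network.
w_hidden = [[0.5, -0.2, 0.1],
            [0.3, 0.8, -0.5]]
w_output = [[1.0, -1.0],
            [-0.7, 0.4]]
outputs = forward(w_hidden, w_output, [1.0, 0.0, 1.0])
print(outputs)
```

Each layer only reads the values of the layer before it, so a single pass over the layers is enough to compute the output.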
Training Unit
The back propagation algorithm is one of the common ways of training artificial neural networks.
It requires a teacher that knows, or can calculate, the desired output for any given input. The back propagation algorithm
learns the weights for a multilayered network, given a network with a fixed set of units and interconnections.
It is most useful for feed forward networks. It employs the gradient descent rule to attempt to minimize the squared error
between the network output values and the target values for these outputs.
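As a minimal sketch of one gradient descent step on the squared error, the Python code below updates a two-layer network for a single training sample. It is written in Python rather than the paper's MATLAB, and the tanh activation, the network size, the weights and the sample are all assumptions made for illustration.

```python
import math

def f(x):
    # Assumed activation function.
    return math.tanh(x)

def df(x):
    # Derivative of tanh.
    return 1.0 - math.tanh(x) ** 2

def matvec(w, v):
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def train_step(w_ji, w_kj, x, target, rate):
    # Forward pass.
    net_j = matvec(w_ji, x)
    y = [f(n) for n in net_j]
    net_k = matvec(w_kj, y)
    z = [f(n) for n in net_k]
    # Output-layer sensitivities: (target - output) * f'(net).
    delta_k = [(t - o) * df(n) for t, o, n in zip(target, z, net_k)]
    # Hidden-layer sensitivities, propagated back through the
    # output weights as they were before this update.
    delta_j = [df(net_j[j]) * sum(w_kj[k][j] * delta_k[k]
                                  for k in range(len(w_kj)))
               for j in range(len(w_ji))]
    new_w_kj = [[w_kj[k][j] + rate * delta_k[k] * y[j]
                 for j in range(len(y))] for k in range(len(w_kj))]
    new_w_ji = [[w_ji[j][i] + rate * delta_j[j] * x[i]
                 for i in range(len(x))] for j in range(len(w_ji))]
    error = sum((t - o) ** 2 for t, o in zip(target, z))
    return new_w_ji, new_w_kj, error

# Hypothetical 2-input, 2-hidden, 1-output network and one sample.
w_ji = [[0.1, -0.2], [0.4, 0.3]]
w_kj = [[0.2, -0.1]]
x, target = [1.0, 0.5], [1.0]
w_ji, w_kj, error_before = train_step(w_ji, w_kj, x, target, 0.1)
w_ji, w_kj, error_after = train_step(w_ji, w_kj, x, target, 0.1)
print(error_before, error_after)
```

The second call reports the squared error measured after the first update, so comparing the two printed values shows whether the step moved the output toward the target.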


Digit/Character Recognition Unit
This is the final test, where a test input character or digit is entered by the user and is recognized using a classifier
algorithm. As Figure 3 shows, the input is recognized as the character or digit whose output node has the highest value.
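Selecting the output node with the highest value can be sketched as follows. The function name winning_node is hypothetical, and the code is Python rather than the paper's MATLAB; the output vector is the one printed for testing number 8 in the results section.

```python
def winning_node(outputs):
    # Return the 1-based position of the largest output value.
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return best + 1

# Output vector z reported in the paper for testing number 8.
z = [-1.2297, -1.0973, -1.0917, -1.0240, -0.8005,
     -0.9569, -1.2016, 1.0322, -1.1181, -0.8372]
node = winning_node(z)
print(node)
```

How a node position maps to a particular character or digit depends on how the target vectors were defined during training, which the paper does not list.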

Figure 3: Block Diagram for Character Recognition Using Neural Network
System Components: The input is a group of handwritten numbers (0-9) and characters (A-Z); they are given to the
system handwritten in different styles, sizes and possible orientations.
Algorithm
Step 1: Load the image file
Step 2: Analyze the image for characters; for each character, perform Steps 3 to 5
Step 3: Analyze and process the symbol image to map it into an input vector
Step 4: Feed the input vector to the network and compute the output
Step 5: Convert the Unicode binary output to the corresponding character and render it to a text box.
Figure 1 above is an example for this algorithm.
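Step 3 is not spelled out in the paper. One plausible way to map a 320 x 320 sample image onto the 64 input units used by the training code is to average the pixels in an 8 x 8 grid of 40 x 40 blocks. The Python sketch below illustrates that idea only; the block-averaging scheme, the function name image_to_vector and the synthetic test image are assumptions, not the authors' method.

```python
def image_to_vector(image, grid=8):
    # image is a square list of rows holding 0/1 pixel values.
    size = len(image)
    block = size // grid          # 320 // 8 = 40 pixels per block side
    vector = []
    for gr in range(grid):
        for gc in range(grid):
            total = 0
            for r in range(gr * block, (gr + 1) * block):
                for c in range(gc * block, (gc + 1) * block):
                    total += image[r][c]
            # Fraction of "ink" pixels in this block, between 0 and 1.
            vector.append(total / (block * block))
    return vector

# Synthetic 320 x 320 image: one vertical stroke in columns 160..199.
image = [[1 if 160 <= c < 200 else 0 for c in range(320)]
         for _ in range(320)]
vec = image_to_vector(image)
print(len(vec))
```

The stroke falls entirely inside the fifth block column, so only those eight entries of the 64-element vector are non-zero.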
Training Algorithm
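The algorithm for training on the given data is given below; it also plots the training error.
% Training the BP network weights.
% vector, teach, f and df are helper functions that are not listed in this paper:
% vector returns the training samples, teach returns the target vector of a sample,
% and f and df are the activation function and its derivative.
clear;
clc;
nh=14; % number of hidden units
d=64; % number of input units
k=10; % number of output units
wji=unifrnd(-1/sqrt(d),1/sqrt(d),nh,d); % wji is 14*64
wkj=unifrnd(-1/sqrt(nh),1/sqrt(nh),k,nh); % wkj is 10*14
it=0.1; % learning rate (named it because t holds the target below)
ct=0.6; % RMS error threshold at which training stops
r=0.9; % momentum coefficient
sample=vector(0); % training samples, one 64*1 column per sample
jj=[]; % RMS error after each epoch
erms=2;
while erms>ct
    nrnd=unidrnd(80,1,80); % draw 80 random sample indices
    p=1;
    e=0;
    bji=0; % previous update of wji
    bkj=0; % previous update of wkj
    while p<81
        x=sample(:,nrnd(p)); % x is 64*1
        t=teach(nrnd(p)); % t is 1*10
        netj=wji*x; % netj is 14*1
        yj=f(netj); % yj is 14*1
        netk=wkj*yj; % netk is 10*1
        z=f(netk); % z is 10*1
        % z=round(z);
        sk=(t'-z).*df(netk); % sk is 10*1
        sj=df(netj).*(wkj'*sk); % sj is 14*1, computed before wkj is updated
        bkj=it*sk*yj'*(1-r)+r*bkj; % momentum-smoothed update of wkj
        wkj=wkj+bkj;
        bji=it*sj*x'*(1-r)+r*bji; % momentum-smoothed update of wji
        wji=wji+bji;
        j=(t'-z)'*(t'-z); % squared error of this sample
        e=e+j;
        p=p+1;
    end
    erms=sqrt(e/80); % RMS error over the 80 samples of this epoch
    jj=[jj,erms];
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
plot(jj);
title('training process');
xlabel('epoch (with momentum)');
ylabel('error');
save wkj2(2).dat wkj -ascii
save wji2(2).dat wji -ascii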
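The listing combines two ideas that are easy to check in isolation: the momentum rule of the form b = it*s*(1-r) + r*b, and the root-mean-square error that ends training once it falls to the threshold ct. The Python sketch below restates both on plain lists; the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
import math

def momentum_update(previous_step, gradient_step, rate, momentum):
    # New step = rate * (1 - momentum) * gradient term + momentum * old step.
    return [rate * (1.0 - momentum) * g + momentum * p
            for g, p in zip(gradient_step, previous_step)]

def rms_error(squared_errors):
    # Square root of the mean per-sample squared error.
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# With no previous step, only the scaled gradient term remains.
step = momentum_update([0.0, 0.0], [1.0, -2.0], rate=0.1, momentum=0.9)

# 80 samples that each contribute a squared error of 0.36.
erms = rms_error([0.36] * 80)
stop_training = not (erms > 0.6 + 1e-9)  # mirrors "while erms>ct" with ct=0.6
print(step, erms, stop_training)
```

With a momentum coefficient of 0.9, each new gradient contributes only a tenth of its learning-rate-scaled value, and the rest of the step is carried over from the previous update.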
RESULTS AND DISCUSSIONS
This approach is somewhat time consuming. The output results are rendered in Arial font
(Tamil & Hindi). The font shape and size can be altered in the program as required.
The system is coded so that it can be scaled up with the features users require and is easy to implement.
We used MATLAB to develop the application and to execute it with various inputs, as
shown in Figures 4 and 5.
Please enter the testing number: 8
Please enter the testing picture: 8-1
Testpicvector


Figure 4: Output Shows Recognition of Tamil Character
z =
-1.2297
-1.0973
-1.0917
-1.0240
-0.8005
-0.9569
-1.2016
1.0322
-1.1181
-0.8372
Please enter the testing number: 7
Please enter the testing picture: 7-5
testpicvector

Figure 5: Output Shows Recognition of Hindi Character


z =
-0.9390
0.7977
-0.7539
-0.7443
-0.9931
-1.0887
-0.9798
-1.1637
-1.1746
-1.0259
Our system identifies individual character with an accuracy of 98.3%.

Figure 6: A Comparison between the Theories with Our System in Terms of Recognition Accuracy
CONCLUSIONS AND FUTURE DIRECTION
We have presented our new approach to text recognition and segmentation for Tamil and Hindi characters.
Our proposed character recognition system operates on the input image and efficiently recognizes the
script/characters. We will continue to work in the field to enhance the system and to reduce the processing time of
recognition. The script recognition system that is developed is able to recognize not only single/isolated characters
but also digits and Tamil characters.
REFERENCES
1. D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing:
Explorations in the Microstructure of Cognition, vol. 1: Foundations. Cambridge, MA: MIT Press, 1986.
2. C. Lau, Ed., Neural Networks: Theoretical Foundations and Analysis. Piscataway, NJ: IEEE Press, 1992.
3. E. Sanchez-Sinencio and C. Lau, Eds., Artificial Neural Networks: Paradigms and Hardware Implementations.
Piscataway, NJ: IEEE Press, 1992.
4. K. Fukushima, S. Miyake, and T. Ito, Neocognitron: A neural network model for a mechanism of visual pattern
recognition, IEEE Trans. Syst., Man, and Cybern., vol. SMC-13, no. 5, pp. 826-834, Sept./Oct. 1983.
5. K. Fukushima and N. Wake, Handwritten alphanumeric character recognition by the neocognitron, IEEE Trans.
Neural Networks, vol. 2, no. 3, pp. 355-365, May 1991.
6. G. A. Carpenter and S. Grossberg, The ART of adaptive pattern recognition by a self-organizing neural
network, IEEE Computer Mag., vol. 21, no. 3, pp. 77-88, March 1988.
7. G. L. Martin and J. A. Pittman, Recognizing hand-printed letters and digits using back propagation learning,
Neural Computation, vol. 3, no. 2, pp. 258-267, Summer 1991.
8. M. Gunasekaran, S. Ganeshmoorthy, OCR Recognition System Using Feed Forward And Back Propagation
Neural Network, Second National Conference on Signal Processing, Communications and
VLSI Design (NCSCV10).
