Abstract— Recognition of Indian language scripts is a challenging problem. In Optical Character Recognition (OCR), the
character or symbol to be recognized can be machine printed or handwritten. Several approaches address the recognition of
numerals and characters; they differ in the type of features extracted and in the way those features are extracted. This
paper proposes an automatic recognition system for isolated handwritten Devanagari numerals. We present a feature
extraction technique based on recursive subdivision of the character image such that the sub-images produced at each
iteration contain, as nearly as possible, equal numbers of foreground pixels. A Support Vector Machine (SVM) is used for
classification. An accuracy of 98.98% is obtained on the standard dataset provided by ISI (Indian Statistical Institute),
Kolkata.
1 INTRODUCTION
© 2011 IJEECS
http://www.ijeecs.org
level, the features are calculated and the recognition rate is measured; the level at which the highest recognition rate is achieved is then chosen.

2 DATABASE

The database is provided by the ISI (Indian Statistical Institute, Kolkata) [15]. Devanagari script was initially developed to write Sanskrit but was later adapted to write many other languages, such as Hindi, Marathi and Nepali. The printed Devanagari numerals are shown in figure 1; the shapes of numerals 5, 8 and 9 vary in their printed forms. Figure 2 shows samples from the handwritten Devanagari numerals database. The distribution of training data and test data is shown in table 1.

Figure 1: Devanagari Numerals

Table 1: Distribution of numerals in Devanagari Database
Digits   Training Set   Test Set   Total
0        1844           369        2213
1        1891           378        2269
2        1891           378        2269
3        1882           377        2259
4        1876           376        2252
5        1889           378        2267
6        1869           374        2243
7        1869           378        2247
8        1887           377        2264
9        1886           378        2264
Total    18784          3763       22547

3.1 Preprocessing
i) Adjust the intensity values of the image using the imadjust() function of Matlab.
ii) Convert the image into a binary image using a threshold value of 0.8.
iii) Remove from the binary image all connected components (objects) that have fewer than 30 pixels.
iv) Apply median filtering, a nonlinear operation often used in image processing to reduce "salt and pepper" noise.
v) Normalize the image to 90*90.
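
The five steps above can be sketched in plain Python as follows. This is a minimal sketch, not the Matlab implementation used in this work: the linear stretch is a simplified stand-in for imadjust(), and the 8-connectivity, the 3x3 median window with zero padding, and the nearest-neighbour resizing are assumptions. All function names are hypothetical.

```python
from collections import deque
from statistics import median

def stretch(img):
    """Step i (simplified): linearly rescale intensities to [0, 1]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1.0
    return [[(p - lo) / span for p in row] for row in img]

def binarize(img, threshold=0.8):
    """Step ii: pixels strictly above the threshold become 1, others 0."""
    return [[1 if p > threshold else 0 for p in row] for row in img]

def remove_small_components(bw, min_pixels=30):
    """Step iii: clear 8-connected components of 1-pixels smaller than min_pixels."""
    h, w = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if bw[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            comp, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                        if bw[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(comp) < min_pixels:
                for cy, cx in comp:
                    out[cy][cx] = 0
    return out

def median3x3(bw):
    """Step iv: 3x3 median filter; pixels outside the image count as 0."""
    h, w = len(bw), len(bw[0])
    def at(y, x):
        return bw[y][x] if 0 <= y < h and 0 <= x < w else 0
    return [[int(median(at(y + dy, x + dx)
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
             for x in range(w)] for y in range(h)]

def resize_nearest(bw, size=90):
    """Step v: nearest-neighbour resize to size x size."""
    h, w = len(bw), len(bw[0])
    return [[bw[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]

def preprocess(gray):
    """Run steps i-v on a grayscale image given as a list of rows of floats."""
    bw = remove_small_components(binarize(stretch(gray)))
    return resize_nearest(median3x3(bw))
```

The sketch treats pixels with value 1 after thresholding as the objects of step iii; whether the image is inverted so that the numeral strokes are the 1-pixels is not stated here and is left to the caller.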
Step 6: if xq mod 2 = 0
    2 sub-images are [(1, 1), (x0, ymax) and (x0, 1), (xmax, ymax)]
Else
    2 sub-images are [(1, 1), (x0, ymax) and (x0+1, 1), (xmax, ymax)]

3.3 Classification
The classification step is divided into two phases.
(i) Training phase
In this phase, features are extracted at gradually increasing levels of granularity, starting with level 1. The recognition rate is calculated at each level and plotted against the level of granularity (figure 6). The graph is then used to find the level with the highest recognition rate (Lbest).

Figure 6: Example of finding the best level (Lbest)
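
The two rules stated in this part of the section, the parity rule of Step 6 and the choice of Lbest, can be sketched in Python as follows. This is an illustrative sketch only: xq and x0 are taken as given, the function names are hypothetical, and breaking ties in favour of the lowest level is an assumption.

```python
def split_columns(xq, x0, xmax, ymax):
    """Return the two sub-image rectangles of Step 6.

    Each rectangle is ((x_left, y_top), (x_right, y_bottom)) in 1-based
    coordinates. xq and x0 are assumed to be supplied by the caller;
    this function only applies the parity rule.
    """
    left = ((1, 1), (x0, ymax))
    if xq % 2 == 0:
        right = ((x0, 1), (xmax, ymax))      # both halves share column x0
    else:
        right = ((x0 + 1, 1), (xmax, ymax))  # the halves are disjoint
    return left, right


def best_level(rates):
    """Pick the granularity level with the highest recognition rate.

    `rates` maps level -> recognition rate measured in the training phase.
    Ties go to the lowest level (an assumption, not stated in the text).
    """
    return min(rates, key=lambda level: (-rates[level], level))


# Hypothetical rates for illustration only; these are not measured values.
example_rates = {1: 0.90, 2: 0.97, 3: 0.99, 4: 0.98}
chosen = best_level(example_rates)
```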
4 CLASSIFIER (SVM)
The Support Vector Machine is a supervised machine learning technique. The position of SVM is shown in figure 5. It is primarily a two-class classifier. The optimization criterion is the width of the margin between the classes, i.e. the empty area around the decision boundary, defined by the distance to the nearest training patterns. These patterns, called support vectors, finally define the classification function.

In classification we use a voting strategy: each binary classification is considered to be a vote, where votes can be cast for all data points x; in the end a point is assigned to the class with the maximum number of votes. If two classes have identical votes, we simply choose the class that appears first in the array storing the class names, though this may not be a good strategy.

LIBSVM is used with the Radial Basis Function (RBF) kernel, a popular, general-purpose yet powerful kernel, denoted as

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right), \quad \gamma > 0$$
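
The RBF kernel and the voting rule described above can be illustrated with a short Python sketch. It is not LIBSVM's code: `pairwise_winner` is a hypothetical callable standing in for the trained binary SVM of each class pair, and the function names are chosen for illustration.

```python
import math
from itertools import combinations

def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2) for two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def vote(classes, pairwise_winner):
    """Assign a point to a class by pairwise voting.

    classes: class names in their storage order.
    pairwise_winner(a, b): hypothetical callable that returns a or b,
        i.e. the decision of the binary classifier for that pair.
    Ties are resolved in favour of the class that appears first in
    `classes`, as described in the text.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b)] += 1
    top = max(votes.values())
    return next(c for c in classes if votes[c] == top)
```

With the ten numeral classes, this scheme evaluates 10*9/2 = 45 binary decisions per sample.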
confused with 3, 4 is confused with 5, and 7 with 6; the highest recognition rate is 99.73%, for numeral 8. The computation time taken by the training phase and the testing phase is shown in table 3.

Table 3: Computational time (features extracted at level 3)
Phase      Sample size   Time required
Training   18784         66 seconds
Testing    3763          15 seconds

Experiment 3:
The test dataset and the training dataset are combined, and the cross-validation function of LIBSVM is run with n=10, γ=0.5 and c=500. The feature vector for the whole dataset (22547 samples) is calculated at level 3 (Lbest), and a recognition rate of 98.98% is obtained.

Table 4: Comparison of accuracy obtained by different methods
S.n   Method proposed by   Data Size   Accuracy Obtained

REFERENCES
[1] Øivind Due Trier, Anil K. Jain, Torfinn Taxt, “Feature Extraction Methods for Character Recognition - A Survey”, Pattern Recognition, Vol. 29, No. 4, pp. 641-662, 1996.
[2] Sandhya Arora, Debotosh Bhattacharjee, Mita Nasipuri, D. K. Basu, M. Kundu, “Recognition of Non-Compound Handwritten Devnagari Characters using a Combination of MLP and Minimum Edit Distance”, International Journal of Computer Science and Security (IJCSS), Volume 4, Issue 1, pp. 107-120.
[3] P. M. Patil, T. R. Sontakke, “Rotation, scale and translation invariant handwritten Devanagari numeral character recognition using general fuzzy neural network”, Pattern Recognition, Elsevier, 2007.
[4] Anil K. Jain, Robert P. W. Duin, and Jianchang Mao, “Statistical Pattern Recognition: A Review”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, pp. 4-37, January 2000.
[5] G. S. Lehal and Nivedan Bhatt, “A Recognition System for Devnagri and English Handwritten Numerals”, Proc. of ICMI, 2000.
[6] Reena Bajaj, Lipika Dey, Santanu Chaudhury, “Devanagari Numeral Recognition by Combining Decision of Multiple Connectionist Classifiers”, Sadhana, Vol. 27, Part I, pp. 59-72, 2002.
[7] R. J. Ramteke and S. C. Mehrotra, “Recognition of Handwritten Devanagari Numerals”, International Journal of Computer Processing of Oriental Languages, 2008.
[8] U. Bhattacharya, S. K. Parui, B. Shaw, K. Bhattacharya, “Neural Combination of ANN and HMM for Handwritten Devnagari Numeral Recognition”.
[9] U. Pal, T. Wakabayashi, N. Sharma and F. Kimura, “Handwritten Numeral Recognition of Six Popular Indian Scripts”, Proc. 9th ICDAR, Curitiba, Brazil, Vol. 2, pp. 749-753, 2007.
[10] J. Park, V. Govindaraju, S. N. Srihari, “OCR in Hierarchical Feature Space”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 4, pp. 400-408, 2000.
[11] H. Samet, “The Design and Analysis of Spatial Data Structures”, Addison-Wesley Longman Publishing Co., Inc., 1990.
[12] S. Mozaffari, K. Faez, M. Ziaratban, “Character Representation and Recognition using Quadtree-based Fractal Encoding Scheme”, Proceedings of the 8th International Conference on Document Analysis and Recognition, Seoul, Korea, 2005, Vol. 2, pp. 819-823.
[13] A. P. Sexton, V. Sorge, “Database-Driven Mathematical Character Recognition”, Graphics Recognition, Algorithms and Applications (GREC), Lecture Notes in Computer Science (LNCS), Hong Kong, 2006, pp. 206-217.
[14] Georgios Vamvakas, Basilis Gatos, Stavros J. Perantonis, “Handwritten character recognition through two-stage foreground sub-sampling”, Pattern Recognition, 43 (2010), pp. 2807-2816.
[15] U. Bhattacharya and B. B. Chaudhuri, “Databases for Research on Recognition of Handwritten Characters of Indian Scripts”, Proc. Eighth Int'l Conf. Document Analysis and Recognition (ICDAR '05), vol. 2, pp. 789-793, 2005.
[16] S. Knerr, L. Personnaz, and G. Dreyfus, “Single-layer learning revisited: a stepwise procedure for building and training a neural network”, in J. Fogelman, editor, Neurocomputing: Algorithms, Architectures and Applications, Springer-Verlag, 1990.
[17] U. H.-G. Kressel, “Pairwise classification and support vector machines”, in B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 255-268, Cambridge, MA, 1998. MIT Press.
[18] http://www.csie.ntu.edu.tw/~cjlin/libsvm
[19] http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf
[20] http://www.isical.ac.in/~ujjwal/download/database.html

Mahesh Jangid is an M.Tech. student in the computer science & engineering department of Dr. B R Ambedkar National Institute of Technology. He completed his B.E. degree in 2007 at Rajasthan University. He has 2 years of teaching experience at JECRC Jaipur. His research areas are image processing, optical character recognition and pattern recognition.

Renu Dhir received her Ph.D. in computer science and engineering from Punjabi University in 2007 and her M.Tech. in computer science & engineering from TIET Patiala in 1997. Her research is mainly in image processing and character recognition. She has published more than 35 papers in various international journals and conferences. She is now working as an Associate Professor at NIT Jalandhar.

Rajneesh Rani is pursuing a Ph.D. at NIT Jalandhar. She completed her M.Tech. in computer science and engineering at Punjabi University, Patiala in 2003. She has 7 years of teaching experience. Her research areas are image processing and character recognition.