Abstract—This paper presents a method for detecting sign boards and recognizing their text to aid navigation in indoor environments. Using text as a landmark for vision-based navigation is still an active research area, and to date the algorithms developed for detecting and recognizing text for indoor navigation leave considerable room for improvement in real-time use. Our proposed method is an extension of this work and will aid the ongoing research. We were able to extract sign-board text from actual scenes.

Index Terms—histogram backprojection, OCR, vision, indoor navigation.

I. INTRODUCTION

In robotics, navigation is defined as finding a safe passage for a robot's movement from its current point to a distant goal point. Several types of sensors can be used for this purpose, such as visual, sonar, and lidar. Visual sensors have an edge because the robot then perceives the world more like a human, and its ability to be autonomous increases.

The ability to extract information from an image has huge application in the field of computer vision. Two types of data can be extracted from an image. The first is the perceptual content, from which we can interpret features such as an object's shape, color, and texture. The second is the semantic content, which handles the problem of letters, words, signs, etc., and how they relate to each other. The literature on visual and textual detection is extensive. For example, in [1] the author counts how many pixels fall below a required threshold; text frames are then extracted in which the majority of such pixels lie.

If a robot has to understand its surroundings the way humans do, it should be able to recognize information such as kitchen versus bathroom, or curtain versus door. This can be addressed by semantic mapping, which adds such information to the robot's navigation ability. It also helps in achieving better human-robot interaction [2]. A particular example in this regard is [3], which combines natural language with semantic mapping to help a robot adjust better to human surroundings.

There are also other ways of building a semantic map, such as using metric maps and injecting semantic information into them [4]. Here the metric map helps with movement (static and dynamic obstacles) and the semantic map defines relationships among the detected objects.

This is called self-localization and forms the basis for moving in an outdoor or indoor environment. The important steps are listed:
• Get an image.
• Detect features.
• If the same landmark is seen again, relate it to the previous data.
• Update the position of the robot whenever a new landmark is matched.

If optical character recognition algorithms are used on real-time images, they are not as efficient as they are on scanned documents [5]. So the main hurdle in using such algorithms is to extract the data in a form that yields the best results. This is still an ongoing research problem, and work is being done to make it more robust under varying conditions of light, shades of color, background, orientation, etc. We have explored the option of using text to aid the semantic map and make it robust.

In this paper, text from sign boards is used as landmarks because of its advantages in describing the contents of an image: the required section can easily be cropped from the image, and it aids automatic robot navigation.

The rest of the paper is organized as follows. Section 2 presents a literature review of work done so far on text detection. Section 3 gives the methodology for detection and extraction of sign boards in an indoor environment. Section 4 describes the experiment and the results achieved. Finally, Section 5 discusses the conclusion of the research and future work.

II. LITERATURE REVIEW

The literature is full of work on extraction of text from images, but most of it concerns the initial step, which is localization of the text in an image. The next stage is to recognize the text itself; finally, words are deduced from the individual letters by processes such as machine learning.

In [6], a fast new algorithm called the stroke width transform (SWT) is proposed. A binary edge map is generated by an edge detector, parallel edges determine the stroke width at each pixel, and pixels with the same stroke width are grouped into a single character. This method's efficiency, however, depends on noise and blur levels. In [7], the method differentiates between characters and non-characters on the basis of geometrical connected components. In [8], the authors used maximally stable extremal regions with color and geometrical features and displayed the text output. In [8], MSER and SWT are used with a convolutional neural network; its best performance, correctly matched text, comes on images with single characters, whereas Google Goggles gives
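The shape-detection step described below assumes the sign board's colour has already been located; the index terms name histogram backprojection for that colour step. The following is a minimal NumPy sketch of 1-D hue-histogram backprojection — an illustrative assumption on our part, not the authors' exact pipeline (OpenCV's calcHist/calcBackProject would normally be used):

```python
import numpy as np

def backproject(scene_hue, sample_hue, bins=30):
    """Score each scene pixel by how frequent its hue bin is in the sample.

    Frequent sample hues give high values, unseen hues give 0 — the same
    idea as OpenCV's calcHist + calcBackProject for a 1-D hue histogram.
    """
    edges = np.linspace(0, 180, bins + 1)          # OpenCV-style hue range 0..179
    hist, _ = np.histogram(sample_hue, bins=edges)
    hist = hist / hist.max() * 255.0               # normalise to 0..255
    idx = np.clip(np.digitize(scene_hue, edges) - 1, 0, bins - 1)
    return hist[idx]                               # per-pixel likelihood map

# Synthetic example: a "green" block (hue 60) on a hue-0 background
scene = np.zeros((40, 60), np.uint8)
scene[10:30, 20:50] = 60
sample = np.full((5, 5), 60, np.uint8)             # colour patch from a sign
mask = backproject(scene, sample) > 127            # binary detection mask
```

Thresholding the likelihood map gives a binary mask on which the contour-based square test of the shape-detection step can then run.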
B. Shape Detect

After we have detected the required color in the surroundings, we need to find the shape of that object to confirm it is a sign board. For this, contours of the detected objects are found, and a polygonal curve is generated which stores the sequence of points extracted from the object. The criteria set for determining an object to be square are:
• The shape is convex.
• There are four vertices.
• The detected angles are ninety degrees.
• The contour area is relatively large (so extra noise is eliminated).

To make the input image binary, we convert it to gray level and set a threshold manually. To cater for small noise, we first downscale the image and then scale it back up to the same size. The Canny operator has the property of detecting square shapes having different shades. The Canny edge detection process is divided into the following steps:
1) A Gaussian filter is used to eliminate background noise. The mathematical form of a Gaussian filter disc of size (2k+1) × (2k+1) is:

H_ij = (1 / (2πσ²)) · exp( −((i − k − 1)² + (j − k − 1)²) / (2σ²) ),  1 ≤ i, j ≤ 2k+1

2) Points in the image with a high-intensity gradient are found.
3) Extra thin lines caused by edge detection are removed.
4) A threshold is set again to increase the possibility of detecting an edge.

C. Tesseract

After a sign board is extracted, next we have to recognize the text inside the image. Optical Character Recognition (OCR) is used to find and read text in the images received during the robot's navigation of an environment. OCR techniques were mainly designed for scanned images, so using them directly on images obtained from the robot's navigation leads to errors and bad recognition. We therefore pre-process the image to extract our region of interest; this part is cropped and fed to the OCR algorithm to recognize the text. Our pre-processing technique results in better text recognition. Tesseract is widely used to perform OCR. Experimental results showed that the mode in which a single uniform block of text is assumed produces the best result; to enable it to detect paragraph breaks, we set the mode for automatically segmenting each page. Figure 5 shows the extracted sign board after all processing is done; this image is the input for the text recognition algorithm.

Fig. 5. Extracted Region of Interest

IV. EXPERIMENT AND RESULTS

To analyze the performance of the algorithm under various conditions of lighting and alignment, we collected test images from a local indoor environment. The experimental results show that our proposed method is able to detect sign boards in the indoor environment and recognize the text in them; results vary due to perspective distortion, changes in font
sizes, misalignment, a variable number of characters in the text, and changing lighting conditions.

The algorithm was tested on real-environment images. The results proved that the suggested method was able to extract text from sign boards to be used as landmarks and aid the robot's navigation. A model of the wheelchair was run in Gazebo, as seen in figure 6, to further test the results of the algorithm and to create an image database.

Fig. 6. Gazebo model of wheelchair

Differences in color intensity do alter the results of the algorithm but can be controlled by varying and adjusting the threshold. The method will be tested further for different font sizes, alignment, variable lighting, and reflection effects. The binarized output image is fed to OCR for text recognition. A database of the possible sign boards found in the lab was created. The algorithm can find the most similar match by comparing strings, although for this result the match was found manually.

For instance, from a test input image of an Exit sign board, the output of the OCR is 0EXIT0. By comparison it is deduced that the chances of it being EXIT are much higher. Similarly, we have stored the names of the different sign boards found in the local environment. We were able to achieve an accuracy of 67% in recognizing texts from an image.

V. CONCLUSION

Landmarks are an important feature for mobile robots whose navigation systems are based on vision. They help in localization of the robot and in planning motion to a goal point. We have proposed a robust and simple technique for detection and recognition of text on sign boards to act as landmarks in a navigation system.

To date, the algorithms developed for detection and recognition of text in an indoor environment are not very applicable in real life. Our proposed method is an extension of this work and will aid the ongoing research. We are at the stage of utilizing the method in a larger project in which an autonomous wheelchair navigates and plans a route in an indoor environment.

For future work, we would focus on:
• using a neural network architecture;
• applying correction for affine distortion;
• tracking the detected region using an image tracking technique.

REFERENCES

[1] Y. K. Lim, S. H. Choi, and S. W. Lee, "Text extraction in MPEG compressed video for content-based indexing," in Proceedings of the International Conference on Pattern Recognition, 2000, pp. 409-412.
[2] I. Kostavelis and A. Gasteratos, "Learning spatially semantic representations for cognitive robot navigation," Robotics and Autonomous Systems, vol. 61, no. 12, pp. 1460-1475, 2013.
[3] H. Zender, O. Martinez Mozos, P. Jensfelt, G.-J. Kruijff, and W. Burgard, "Conceptual spatial representations for indoor mobile robots," Robotics and Autonomous Systems, vol. 56, no. 6, pp. 493-502, 2008.
[4] A. Pronobis and P. Jensfelt, "Large-scale semantic mapping and reasoning with heterogeneous modalities," in International Conference on Robotics and Automation, IEEE, 2012, pp. 3515-3522.
[5] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," in CVPR, 2010, pp. 2963-2970.
[6] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 2963-2970.
[7] J.-L. Yao, Y.-Q. Wang, L.-B. Weng, and Y.-P. Yang, "Locating text based on connected component and SVM," in International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR), 2007, vol. 3, pp. 1418-1423.
[8] L. Neumann and J. Matas, "Real-time scene text localization and recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 3538-3545.
[9] P. Corke, O. Lam, F. Dayoub, and R. Schulz, "Text recognition approaches for indoor robotics: a comparison," in Australasian Conference on Robotics and Automation, 2014.
[10] S. Soumya, V. Geethapriya, and S. Shankar Bharathi, "A novel edge detection and pattern recognition algorithm based on beamlet theory for a vision-based wheeled mobile robot," in Communications and Signal Processing (ICCSP), 2014 International Conference on, 2014, pp. 1197-1200.
[11] P. Shopa, N. Sumitha, and P. S. K. Patra, "Traffic sign detection and recognition using OpenCV," in Information Communication and Embedded Systems (ICICES), 2014 International Conference on, 2014, pp. 1-6.
[12] Y.-Y. Nguwi and A. Z. Kouzani, "Automatic road sign recognition using neural networks," in Neural Networks (IJCNN '06), International Joint Conference on, 2006, pp. 3955-3962.
[13] R. Saabni and M. Zwilling, "Text detection and recognition in real world images," in Frontiers in Handwriting Recognition (ICFHR), Bari, 2012, pp. 443-448.
[14] K. Kataoka, K. Sudo, and M. Morimoto, "Region of interest detection using indoor structure and saliency map," in Pattern Recognition (ICPR), 2012 21st International Conference on, 2012, pp. 3329-3332.
[15] X. Liu and J. Samarabandu, "An edge-based text region extraction algorithm for indoor mobile robot navigation," in Mechatronics and Automation, 2005 IEEE International Conference, 2005, vol. 2, pp. 701-706.
[16] C. Case, B. Suresh, A. Coates, and A. Y. Ng, "Autonomous sign reading for semantic mapping," in Robotics and Automation (ICRA), 2011 IEEE International Conference on, 2011, pp. 3297-3303.
[17] C.-S. Fahn and P.-H. Chang, "Text plates detection and recognition techniques used for an autonomous robot navigation in indoor environments," in Robotics, Automation and Mechatronics (RAM), 2013 6th IEEE Conference on, Manila, 2013.
[18] C. Yao, X. Zhang, X. Bai, W. Liu, Y. Ma, and Z. Tu, "Rotation-invariant features for multi-oriented text detection in natural images," PLoS One, vol. 8, no. 8, 2013.
[19] C. Yao, X. Bai, B. Shi, and W. Liu, "Strokelets: A learned multi-scale representation for scene text recognition," in Proc. of CVPR, 2014.
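As a concrete illustration of the string comparison described in Section IV (deducing EXIT from the noisy OCR output 0EXIT0), the lookup against the stored label database could be done with Python's standard difflib; the label list here is a hypothetical example, not the authors' actual database:

```python
import difflib

# Hypothetical database of sign-board labels found in the lab
SIGNS = ["EXIT", "OFFICE", "LAB", "STORE ROOM"]

def best_match(ocr_text, candidates=SIGNS, cutoff=0.6):
    """Return the stored label closest to a noisy OCR string, or None.

    get_close_matches ranks candidates by SequenceMatcher similarity and
    drops anything below `cutoff`, so garbage OCR output yields no match.
    """
    hits = difflib.get_close_matches(ocr_text.upper(), candidates,
                                     n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(best_match("0EXIT0"))   # EXIT
```

For example, "0EXIT0" and "EXIT" share a four-character block out of ten total characters, giving a similarity of 0.8, which clears the cutoff and selects EXIT.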