
Text Detection and Recognition for Semantic Mapping in Indoor Navigation

Muhammad Sami, Yasar Ayaz, Mohsin Jamil, Syed Omer Gilani, Muhammad Naveed
School of Mechanical and Manufacturing Engineering
National University of Sciences and Technology, Pakistan
Email: muhammadsami@smme.edu.pk, {yasar, mohsin, omer, naveed.muhammad}@smme.nust.edu.pk

Abstract—This paper presents a method for sign board detection and text recognition to aid navigation in indoor environments. Using text as a landmark for vision-based navigation is still an active research area, and to date the algorithms developed for detecting and recognizing text for indoor navigation leave considerable room for improvement before they can run in real time. Our proposed method extends this work and will aid the ongoing research. We were able to extract sign board text from actual scenes.

Index Terms—histogram backprojection, OCR, vision, indoor navigation.

I. INTRODUCTION

In robotics, navigation is defined as finding a safe passage for a robot's movement from its current point to a distant goal point. Several types of sensors can be used for this purpose, such as visual, sonar and lidar sensors. Visual sensors have an edge because the robot then perceives the world more as a human does, and its ability to be autonomous increases.

The ability to extract information from an image has huge application in the field of computer vision. Two types of data can be extracted from an image. The first is the perceptual content, from which we interpret features such as an object's shape, color and texture. The second is the semantic content, which handles the problem of letters, words, signs etc. and how they relate to each other. The literature on visual and textual detection is extensive. For example, in [1] the author counts the pixels that fall below a required threshold and extracts the text frames in which the majority of such pixels lie.

If a robot has to understand its surroundings as we humans do, it should be able to recognize information such as kitchen or bathroom, curtain or door. This can be addressed by semantic mapping, which adds such information to the robot's navigation ability. It also helps in achieving better human-robot interaction [2]. A particular example in this regard is [3], which combines natural language with semantic mapping to help a robot adjust better to human surroundings.

There are also other ways of building a semantic map, such as taking a metric map and injecting semantic information into it [4]. Here the metric map helps with movement (static and dynamic obstacles) while the semantic map defines relationships among the detected objects.

This is called self-localization and forms the basis for moving in an outdoor or indoor environment. The important steps are:

• Get an image.
• Detect features.
• If the same landmark is seen again, relate it to the previous data.
• Update the robot's position whenever a new landmark is matched.

If optical character recognition algorithms are applied to real-time images, they are not as efficient as they are on scanned documents [5]. The main hurdle in using such algorithms is therefore to extract the data in a fashion that achieves the best possible results. This is still ongoing research, and work is being done to make such systems perform better under varying conditions of light, shades of color, background, orientation etc. We have explored the option of using text to aid the semantic map and make it robust.

In this paper, text from sign boards is used as landmarks because of its advantages in describing the contents of an image: the required section can be cropped from the image easily, and it aids automatic robot navigation.

The rest of the paper is organized as follows. Section 2 presents a literature review of the work done so far on text detection. Section 3 gives the methodology for detection and extraction of sign boards in an indoor environment. Section 4 describes the experiment and the results achieved. Finally, Section 5 discusses the conclusions of the research and future work.

II. LITERATURE REVIEW

The literature is full of work on the extraction of text from images, but most of it addresses the initial stage, which is the localization of text in an image. The next stage is to recognize the characters individually; finally, words are deduced from the individual letters by processes such as machine learning.

In [6], a fast new algorithm called the stroke width transform (SWT) is proposed. A binary edge map is generated by an edge detector, parallel edges are used to evaluate the stroke width at each pixel, and pixels with the same width are grouped into a single character. However, this method's efficiency depends on noise and blur levels. In [7], the method differentiates between characters and non-characters on the basis of geometrical connected components. In [8], the authors use maximally stable extremal region (MSER) properties of color together with geometrical features and display the text output.

In [8], MSER and SWT are used with a convolutional neural network. Its best performance comes from correctly matched text on images with single characters, whereas Google Goggles gives



the best performance on images with multiple words. They investigated performance on unconstrained text recognition under different textures and lighting conditions, aiming to improve performance in a campus environment; word detection is not implemented. In [9], a vision system is used to detect the indicators on sign boards to assist a wheeled mobile robot; a beamlet-theory algorithm is implemented in MATLAB using edge information. In [10], traffic signs are recognized using different filters. In [11], recognition is done using neural networks, and a rate of 96% is achieved by classifiers trained with scaled conjugate gradient; detection is done in hue-saturation-intensity space. Color-based detection has problems with outdoor illumination.

Fig. 1. Indoor sign board.

In [12], layers of gray images are created and the YCbCr format is used to deal with the problem of non-smooth illumination. Text contained in real-world images can be grouped together because of its similar texture. Parts containing text and those without are differentiated with support vector machines, and the earth mover's distance technique is used to recognize the characters.

In [13], sign boards are detected from the local environment. Scenes are categorized into three different views and the x coordinate is assumed accordingly. The sign board is then extracted from a saliency map, using the rule that sign boards appear on a wall or on a plane orthogonal to a wall. In [14], nameplates and sign boards are extracted with an edge-based text region extraction algorithm: the algorithm detects regions by edge detection, then the text is localized and finally the characters are extracted. The text area is extracted with a dilation operator, using the characteristic that characters are clustered. In [15], a SLAM-generated map is annotated with text; the system predicts text regions with a logistic regression classifier and reads the characters from these regions by OCR. In [16], regions are extracted by color: the image is divided into larger pixels and the colors are then merged into similar blocks of color to detect the text plate.

Fig. 2. Outdoor sign board.

The current problems in achieving an efficient result in text detection and recognition can be listed as follows [17, 18]:

• Text in the real world comes in a variety of sizes, colors, textures, fonts and scales.
• A cluttered background alters the result significantly and makes text very difficult to differentiate.
• Different shades of lighting, occlusion, noise, blur and low resolution also hurt the cause.

III. METHODOLOGY

To find a solution for our own requirement of text recognition in an indoor environment, we dissected the problem into three cases: first, finding the color of the sign boards in the surroundings; second, detecting the shape of the sign board; and finally, recognizing the text once it is detected. Figures 1 and 2 show images from the local indoor and outdoor environments.

A. Color Detect

We detect the required color of the sign boards, i.e. red or green, by histogram backprojection. This has the advantage of being able to cope with non-uniform illumination, contrast changes and occlusion, and it helps to find the region of interest in an image. We create a single-channel image of the same size as our input image in which pixels similar to the target image receive higher values. The histogram of the input image is calculated and backprojected against the target histogram, which gives the probability of finding the required pixels; adjusting the threshold on the result gives us the extracted image.

Initially, the color histogram N of the object to find is calculated, along with the histogram U of the image in which to search. We find the ratio

R = N / U (1)

Then the backprojection E calculates the probability of the required object:

V(x, y) = E[g(x, y), a(x, y)] (2)

where g is the hue and a is the saturation of the pixel at (x, y). Then

V(x, y) = min[V(x, y), 1] (3)

After the disc convolution,

V = S ∗ V (4)

where S is the disc kernel.

The location of maximum intensity now gives us the location of the object. If we are expecting a region in the image, thresholding at a suitable value gives a good result. Figure 3 shows the color detection of an EXIT sign board using the histogram.

Fig. 3. Color detection of sign board.
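For concreteness, the pipeline of Eqs. (1)-(4) maps closely onto OpenCV's standard backprojection calls. The following is a minimal sketch under that assumption, not the paper's exact code; the file names and the threshold value are illustrative.

```python
import cv2

# Hypothetical inputs: a sample patch of the sign color and a camera image.
target = cv2.imread('sign_patch.png')
scene = cv2.imread('scene.png')

# Work in HSV: g = hue, a = saturation, as in Eq. (2).
target_hsv = cv2.cvtColor(target, cv2.COLOR_BGR2HSV)
scene_hsv = cv2.cvtColor(scene, cv2.COLOR_BGR2HSV)

# Histogram N of the object to find, over the hue-saturation plane.
N = cv2.calcHist([target_hsv], [0, 1], None, [180, 256], [0, 180, 0, 256])

# calcBackProject with a normalized histogram realizes the
# ratio-histogram idea of Eqs. (1)-(3).
cv2.normalize(N, N, 0, 255, cv2.NORM_MINMAX)
V = cv2.calcBackProject([scene_hsv], [0, 1], N, [0, 180, 0, 256], scale=1)

# Eq. (4): convolve with a disc kernel S to pool nearby evidence.
S = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
V = cv2.filter2D(V, -1, S)

# Threshold to keep only high-probability (sign-colored) regions.
_, mask = cv2.threshold(V, 50, 255, cv2.THRESH_BINARY)
extracted = cv2.bitwise_and(scene, scene, mask=mask)
cv2.imwrite('extracted.png', extracted)
```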

B. Shape Detect

After we have detected the required color in the surroundings, we need to find the shape of that object to confirm it is a sign board. For this, the contours of the detected objects are found, and a polygonal curve is generated that stores the sequence of points extracted from each object. The criteria set for determining that an object is a square are:

• The shape is convex.
• There are four vertices.
• The detected angles are ninety degrees.
• The contour area is relatively large (so that extra noise is eliminated).

To make the input image binary, we convert it to gray level and set a threshold manually. To cater for small noise, we first downscale the image and then scale it back up to its original size.

The Canny operator has the property of detecting square shapes under different shades. The Canny edge detection process is divided into the following steps (a sketch of the kernel in step 1 follows the list):

1) A Gaussian filter is used to eliminate background noise. The mathematical form of a Gaussian filter of size (2k+1) × (2k+1) is:

H_ij = (1 / (2πσ²)) ∗ exp( −((i − k − 1)² + (j − k − 1)²) / (2σ²) )

2) Points having a high-intensity gradient are found in the image.
3) Extra thin lines caused by the edge detection are removed.
4) A threshold is set again to increase the possibility of detecting an edge.
5) Edges that are not connected to strong edges are removed.
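As a worked example of the kernel formula in step 1, the filter can be generated directly in NumPy. This is a small illustrative sketch, not code from the paper.

```python
import numpy as np

def gaussian_kernel(k: int, sigma: float) -> np.ndarray:
    """Build the (2k+1) x (2k+1) Gaussian filter H from step 1:

    H_ij = 1/(2*pi*sigma^2) * exp(-((i-k-1)^2 + (j-k-1)^2) / (2*sigma^2))

    with 1-based indices i, j in [1, 2k+1].
    """
    i, j = np.mgrid[1:2 * k + 2, 1:2 * k + 2]
    H = np.exp(-((i - k - 1) ** 2 + (j - k - 1) ** 2) / (2 * sigma ** 2))
    H /= 2 * np.pi * sigma ** 2
    return H / H.sum()  # normalize so filtering preserves brightness

# Example: a 5x5 kernel (k = 2) with sigma = 1, as might precede Canny.
print(gaussian_kernel(2, 1.0).round(4))
```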
The result of this process is dilated to eliminate small holes in the image. A function then takes all the contours found as input and draws the detected squares on the image. Figure 4 shows an image with only our required part extracted, with little noise.

Fig. 4. Detection of exit plate.
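The square test described above (convexity, four vertices, right angles, large area) can be sketched with OpenCV's contour utilities. The thresholds below (Canny levels, area cutoff, angle tolerance) are illustrative assumptions, not the paper's tuned values.

```python
import cv2
import numpy as np

def angle_cos(p0, p1, p2):
    # Cosine of the angle at vertex p1; near 0 means near ninety degrees.
    d1, d2 = (p0 - p1).astype(float), (p2 - p1).astype(float)
    return abs(np.dot(d1, d2) / (np.linalg.norm(d1) * np.linalg.norm(d2)))

def find_sign_squares(gray):
    # Suppress small noise by down- and up-scaling, then detect edges.
    gray = cv2.pyrUp(cv2.pyrDown(gray))
    edges = cv2.Canny(gray, 50, 150)
    edges = cv2.dilate(edges, None)  # close small holes in the edge map

    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    squares = []
    for cnt in contours:
        # Polygonal approximation of the contour.
        approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
        if (len(approx) == 4 and cv2.isContourConvex(approx)
                and cv2.contourArea(approx) > 1000):
            pts = approx.reshape(4, 2)
            # All four corner angles must be close to ninety degrees.
            if max(angle_cos(pts[i], pts[(i + 1) % 4], pts[(i + 2) % 4])
                   for i in range(4)) < 0.3:
                squares.append(approx)
    return squares
```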
C. Tesseract

After a sign board is extracted, we next have to recognize the text inside the image. An optical character recognition process is used to find and read text in the images received as the robot navigates its environment.

Optical character recognition (OCR) techniques were mainly designed for scanned documents, so applying them directly to images obtained during the robot's navigation leads to errors and poor recognition. We therefore preprocess the image to extract our region of interest; this part is cropped and fed to the OCR algorithm to recognize the text. Our preprocessing technique results in better text recognition. Tesseract is widely used to perform OCR. Experimental results showed that the mode in which the input is treated as a single uniform block of text produces the best result; to enable it to detect paragraph breaks, we set the mode for automatic page segmentation. Figure 5 shows an image of the extracted sign board after all processing is done. This image is the input to the text recognition algorithm.

Fig. 5. Extracted region of interest.
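A minimal sketch of this step with the pytesseract wrapper is shown below. The page segmentation modes are Tesseract's own: --psm 6 treats the image as a single uniform block of text, while --psm 3 performs fully automatic page segmentation; the file name is an assumption.

```python
import cv2
import pytesseract

# Hypothetical input: the cropped, extracted sign-board region (Fig. 5).
roi = cv2.imread('extracted_sign.png')

# Binarize before OCR, as described in Section IV.
gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Single uniform block of text gave the best results in our tests.
text = pytesseract.image_to_string(binary, config='--psm 6')
print(text.strip())
```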
IV. EXPERIMENT AND RESULTS

To analyze the performance of the algorithm under various conditions of lighting and alignment, we collected test images from the local indoor environment. The experimental results show that our proposed method is able to detect sign boards in the indoor environment and recognize the text in them; the results vary due to perspective distortion, changes in font
sizes, misalignment, a variable number of characters in the text and changing lighting conditions.

The algorithm was tested on real environment images. The results proved that the suggested method was able to extract text from sign boards to be used as landmarks and to aid the robot's navigation. A model of the wheelchair was run in Gazebo, as seen in Figure 6, to further test the results of the algorithm and to create an image database.

Fig. 6. Gazebo model of wheelchair.
Differences in color intensity do alter the results of the algorithm, but this can be controlled by varying and adjusting the threshold. The method will be tested further for different font sizes, alignments, variable lighting and reflection effects. The binarized output image is fed to the OCR for text recognition. A database of the possible sets of sign boards found in the lab was created. The algorithm can find the most similar match by comparing strings, but for the results reported here the matching was done manually.

For instance, for a test input image of an Exit sign board, the output of the OCR is 0EXIT0. By comparison it is deduced that the chance of it being EXIT is much higher. Similarly, we have stored the names of the different sign boards found in the local environment. We were able to achieve an accuracy of 67% in recognizing text from an image.
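The string comparison described above, matching noisy OCR output such as 0EXIT0 against the stored sign names, could be automated with Python's standard difflib; the database contents here are illustrative assumptions.

```python
import difflib

# Hypothetical database of sign-board names found in the lab.
SIGN_DB = ['EXIT', 'KITCHEN', 'BATHROOM', 'LAB 1', 'STORE']

def best_match(ocr_output: str, db=SIGN_DB):
    """Return the stored sign name most similar to the noisy OCR string."""
    cleaned = ocr_output.strip().upper()
    scored = [(difflib.SequenceMatcher(None, cleaned, name).ratio(), name)
              for name in db]
    score, name = max(scored)
    return name, score

# OCR read '0EXIT0' from the Exit sign; EXIT is the closest match.
print(best_match('0EXIT0'))  # -> ('EXIT', 0.8)
```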
V. CONCLUSION

Landmarks are an important feature for mobile robots whose navigation systems are based on vision. They help in localizing the robot and in planning its movement to the goal point. We have proposed a robust and simple technique for the detection and recognition of text on sign boards to act as landmarks in a navigation system.

To date, the algorithms developed for detection and recognition of text in indoor environments are not very applicable to real life. Our proposed method extends this work and will aid the ongoing research. We are at the stage of utilizing the method in a larger project in which an autonomous wheelchair navigates and plans a route in an indoor environment.

In future we will focus on:

• using a neural network architecture;
• applying correction for affine distortion;
• tracking the detected region using an image tracking technique.

REFERENCES

[1] Y. K. Lim, S. H. Choi, and S. W. Lee, "Text extraction in MPEG compressed video for content-based indexing," in Proceedings of the International Conference on Pattern Recognition, 2000, pp. 409-412.
[2] I. Kostavelis and A. Gasteratos, "Learning spatially semantic representations for cognitive robot navigation," Robotics and Autonomous Systems, vol. 61, no. 12, pp. 1460-1475, 2013.
[3] H. Zender, O. Martinez Mozos, P. Jensfelt, G.-J. Kruijff, and W. Burgard, "Conceptual spatial representations for indoor mobile robots," Robotics and Autonomous Systems, vol. 56, no. 6, pp. 493-502, 2008.
[4] A. Pronobis and P. Jensfelt, "Large-scale semantic mapping and reasoning with heterogeneous modalities," in IEEE International Conference on Robotics and Automation, 2012, pp. 3515-3522.
[5] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," in CVPR, 2010, pp. 2963-2970.
[6] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 2963-2970.
[7] J.-L. Yao, Y.-Q. Wang, L.-B. Weng, and Y.-P. Yang, "Locating text based on connected component and SVM," in International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR), 2007, vol. 3, pp. 1418-1423.
[8] L. Neumann and J. Matas, "Real-time scene text localization and recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 3538-3545.
[9] P. Corke, O. Lam, F. Dayoub, and R. Schulz, "Text recognition approaches for indoor robotics: a comparison," in Australasian Conference on Robotics and Automation, Dec. 2014.
[10] S. Soumya, V. Geethapriya, and S. Shankar Bharathi, "A novel edge detection and pattern recognition algorithm based on beamlet theory for a vision-based wheeled mobile robot," in International Conference on Communications and Signal Processing (ICCSP), 2014, pp. 1197-1200.
[11] P. Shopa, N. Sumitha, and P. S. K. Patra, "Traffic sign detection and recognition using OpenCV," in International Conference on Information Communication and Embedded Systems (ICICES), 2014, pp. 1-6.
[12] Y.-Y. Nguwi and A. Z. Kouzani, "Automatic road sign recognition using neural networks," in International Joint Conference on Neural Networks (IJCNN), 2006, pp. 3955-3962.
[13] R. Saabni and M. Zwilling, "Text detection and recognition in real world images," in Frontiers in Handwriting Recognition (ICFHR), Bari, 2012, pp. 443-448.
[14] K. Kataoka, K. Sudo, and M. Morimoto, "Region of interest detection using indoor structure and saliency map," in 21st International Conference on Pattern Recognition (ICPR), 2012, pp. 3329-3332.
[15] X. Liu and J. Samarabandu, "An edge-based text region extraction algorithm for indoor mobile robot navigation," in IEEE International Conference on Mechatronics and Automation, 2005, vol. 2, pp. 701-706.
[16] C. Case, B. Suresh, A. Coates, and A. Y. Ng, "Autonomous sign reading for semantic mapping," in IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 3297-3303.
[17] C.-S. Fahn and P.-H. Chang, "Text plates detection and recognition techniques used for an autonomous robot navigation in indoor environments," in 6th IEEE Conference on Robotics, Automation and Mechatronics (RAM), Manila, 2013.
[18] C. Yao, X. Zhang, X. Bai, W. Liu, Y. Ma, and Z. Tu, "Rotation-invariant features for multi-oriented text detection in natural images," PLoS One, vol. 8, no. 8, 2013.
[19] C. Yao, X. Bai, B. Shi, and W. Liu, "Strokelets: A learned multi-scale representation for scene text recognition," in Proc. of CVPR, 2014.
