
THE HONG KONG POLYTECHNIC UNIVERSITY

Final Year Project


Proposal
Mobile Device Optical Character Recognition
and Applications







This document is the proposal for a final year project supervised by Prof. Henry Chan.
It proposes an optical character recognition (OCR) system and several mobile applications
to be built upon that system.
Student Name: YANG Fan
Student ID: 06846354d
Supervisor: Prof. Henry Chan

Table of Contents
Problem Statement
Objectives and Outcome
    Objectives
    Outcome
Project Methodology
    Developing on PC
        Preprocessing
        Character Recognition
    Porting to Android
        Why Android?
        Porting Method
Project Schedule
Resources Estimation



Problem Statement

The computational capability of mobile devices has grown rapidly over the past two or three
years. Applications that were once impractical on mobile devices because of limited computing
power are now becoming available to mobile users. Moreover, the release of several
state-of-the-art mobile operating systems, especially iOS and Android, has unleashed the
creativity of developers and researchers.
However, although there are plenty of popular desktop OCR applications, there is no robust,
efficient and free optical character recognition (OCR) system available for mobile devices. It is
therefore important to develop an OCR system suited to mobile devices, which can then serve as
a base service for other applications. It may also offer a good opportunity to explore OCR as an
input method for mobile devices. Specifically, the following problems are to be tackled in this
project:
Is it possible to build an OCR system on a mobile device? If so, which algorithm is the most
effective and efficient?
Given a mobile OCR system, what applications can be built on top of it?
Is it feasible to use an OCR system as a complementary input method?
Therefore, I propose to implement a mobile OCR system and to develop mobile applications that
make use of this capability.

Objectives and Outcome
Objectives
This project has three primary objectives:
Design and Implement a Mobile OCR System
Design and implement a mobile OCR system that can recognize characters in an image
accurately and efficiently. Current OCR systems are mostly available on PCs. An effective
and efficient mobile OCR system will become the foundation of other OCR-related
applications.
Implement an Application that Utilizes the Mobile OCR System
Use the OCR system to implement an application that can recognize Chinese 2D codes.
Many existing applications focus on 2D codes based on English characters, which are
easy to send in SMS messages. Unlike a QR code, such a 2D code would not require an
internet connection. A Chinese 2D code application will serve the same purpose, but will
use Chinese characters as the primary encoding characters.
Investigate Possible Ways to Use OCR as Input for Mobile Devices
OCR could become an important supplementary input method for mobile devices. This
project will explore possible ways of using OCR to input characters.
Outcome
The output of this project could benefit mobile users in general. The possible outcomes
include:
An OCR service that runs on a mobile device and converts characters contained in an image
into text.
A mobile application that can recognize and decode Chinese 2D codes.
An input method that uses characters in an image as input on a mobile device.


Project Methodology
Because of its complexity and its nature, the project can be divided into three major
phases.
Developing on PC
In the first phase, an experimental OCR system will be developed on a PC for demonstration
and validation purposes. The major techniques and theory of optical character recognition (OCR)
are largely mature. To build the OCR system, the following preprocessing steps will be
applied:

Preprocessing
The first step in preprocessing is to segment the shape image: a simple thresholding is applied
to convert the gray-level shape image into a binary image. In practice, shape images are often
corrupted with noise, so the shape obtained from thresholding usually has noise around its
boundary; a denoising step is therefore applied. The denoising step eliminates isolated pixels
and small isolated regions or segments. Edge detection is then applied. Applying an edge
detector to an image may yield a set of connected curves that indicate the boundaries of
objects, the boundaries of surface markings, and curves that correspond to discontinuities in
surface orientation. Edge detection can therefore significantly reduce the amount of data to be
processed, filtering out information that may be regarded as less relevant while preserving the
important structural properties of the image.
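The three preprocessing steps above can be sketched in C, the language the PC prototype is planned in. This is a minimal sketch under stated assumptions: the input is an 8-bit grayscale image stored row-major, with dark ink on a light background, and a fixed global threshold is used. The threshold value and the function names `threshold`, `remove_isolated` and `boundary` are illustrative, not part of the proposal. Denoising here only removes isolated pixels; removing small isolated regions would also need connected-component labeling. For edge detection, the sketch uses simple boundary extraction on the binary image (a foreground pixel with a background 4-neighbor) in place of a gradient-based detector such as Canny's.

```c
#include <assert.h>

/* Binary images use 1 for foreground (ink) and 0 for background; all images are row-major. */

/* Step 1: global thresholding. Pixels darker than t are treated as ink. */
void threshold(const unsigned char *gray, unsigned char *bin, int w, int h, unsigned char t)
{
    assert(w > 0 && h > 0);
    for (int i = 0; i < w * h; i++)
        bin[i] = gray[i] < t ? 1 : 0;
}

/* Count 8-connected foreground neighbours of (x, y); pixels outside the image count as background. */
static int fg_neighbours(const unsigned char *bin, int w, int h, int x, int y)
{
    int n = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            int nx = x + dx, ny = y + dy;
            if ((dx || dy) && nx >= 0 && ny >= 0 && nx < w && ny < h)
                n += bin[ny * w + nx];
        }
    return n;
}

/* Step 2: denoising. Drops foreground pixels with no 8-connected foreground neighbour.
   out must be a separate buffer from bin. */
void remove_isolated(const unsigned char *bin, unsigned char *out, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int i = y * w + x;
            out[i] = bin[i] && fg_neighbours(bin, w, h, x, y) > 0;
        }
}

/* Step 3: boundary extraction. A foreground pixel is an edge pixel unless all four of its
   4-connected neighbours are inside the image and also foreground. */
void boundary(const unsigned char *bin, unsigned char *edge, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int i = y * w + x;
            int interior = bin[i]
                && x > 0 && bin[i - 1]
                && x < w - 1 && bin[i + 1]
                && y > 0 && bin[i - w]
                && y < h - 1 && bin[i + w];
            edge[i] = bin[i] && !interior;
        }
}
```

Consider a 7x7 image containing a dark 3x3 square and one stray dark pixel. Running `threshold`, `remove_isolated` and `boundary` on it removes the stray pixel and keeps the 8 outer pixels of the square as its edge.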

Figure 1 Preprocessing: Input image -> Segmentation -> Denoising -> Edge detection -> Character recognition
Figure 2: Original, Segmentation, Denoised, Edge Detected

Character Recognition
After preprocessing, character recognition can be applied. This project will compare three
recognition methods, all of which are widely used today.
Fourier Descriptor
The term "Fourier descriptor" describes a family of related image features. Generally, it refers
to the use of a Fourier transform to analyze a closed planar curve. Much work has studied the
Fourier descriptor as a mechanism for shape identification, and some work has also used Fourier
descriptors to assist OCR. In the context of OCR, the planar curve is generally derived from a
character boundary. Since each character boundary is a closed curve, the sequence of (x, y)
coordinates that specifies it is periodic, which makes it well suited to analysis with the Discrete
Fourier Transform. In this project, the Fourier descriptor approach will be the primary method of
character recognition because of its claimed efficiency and ease of use.


ANN
The Artificial Neural Network (ANN) is a well-established tool for this kind of pattern recognition
problem. An ANN is an information-processing paradigm inspired by the way the human brain
processes information: a collection of mathematical models that represent some of the observed
properties of biological nervous systems and draw on analogies with adaptive biological learning.
The key element of an ANN is its topology. An ANN consists of a large number of highly
interconnected processing elements (nodes) tied together by weighted connections (links).
Learning in biological systems involves adjustments to the synaptic connections between
neurons, and the same is true of ANNs. Learning typically occurs by example through training,
that is, exposure to a set of input/output data (patterns), during which the training algorithm
adjusts the link weights. The link weights store the knowledge necessary to solve specific
problems.
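The following sketch makes the idea of training by adjusting link weights concrete. It trains a single-layer perceptron, the simplest kind of ANN, rather than the multilayer network the project would more likely use. The sketch assumes 3x3 binary glyphs, labels of +1 and -1, integer weights and a step activation. The glyph size and the names `Perceptron`, `perceptron_train` and `perceptron_predict` are hypothetical.

```c
#include <assert.h>

#define N_IN 9 /* a 3x3 binary glyph, flattened row-major */

typedef struct {
    int w[N_IN]; /* link weights, one per input pixel */
    int b;       /* bias weight */
} Perceptron;

/* Step activation on the weighted sum: returns +1 or -1. */
int perceptron_predict(const Perceptron *p, const int *x)
{
    int sum = p->b;
    for (int i = 0; i < N_IN; i++)
        sum += p->w[i] * x[i];
    return sum >= 0 ? 1 : -1;
}

/* Perceptron learning rule: for every misclassified example, add (label +1) or subtract
   (label -1) the input from the weights. Returns the epoch in which no errors occurred,
   or -1 if training did not converge within max_epochs. */
int perceptron_train(Perceptron *p, const int x[][N_IN], const int *label, int n, int max_epochs)
{
    assert(n > 0 && max_epochs > 0);
    for (int e = 1; e <= max_epochs; e++) {
        int errors = 0;
        for (int s = 0; s < n; s++) {
            if (perceptron_predict(p, x[s]) != label[s]) {
                for (int i = 0; i < N_IN; i++)
                    p->w[i] += label[s] * x[s][i];
                p->b += label[s];
                errors++;
            }
        }
        if (errors == 0)
            return e;
    }
    return -1;
}
```

In this sketch the network is trained on a vertical bar (+1) and a horizontal bar (-1), and the weights converge within a few epochs. After training, it still classifies a vertical bar with one pixel moved correctly. The perceptron rule only converges when the classes are linearly separable. Real character sets generally need hidden layers trained with backpropagation.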
Figure: A single connected component (left image) and its boundary curves and centroids (right image).

Neural networks originated in the late 1950s but did not gain much popularity until the 1980s, an
era of booming computing. Today ANNs are mostly used to solve complex real-world problems.
They are often good at problems that are too complex for conventional technologies (e.g.,
problems that have no algorithmic solution, or for which an algorithmic solution is too complex to
be found), and they are often well suited to problems that people are good at solving but
traditional methods are not. They are good pattern recognition engines and robust classifiers,
able to generalize when making decisions based on imprecise input data. They offer good
solutions to a variety of classification problems such as speech, character and signal
recognition, as well as functional prediction and system modeling where the physical processes
are not understood or are highly complex. The advantages of ANNs lie in their resilience against
distortions in the input data and their capability to learn.
However, an ANN is potentially more complex than the Fourier descriptor approach. It will
therefore serve as a point of comparison for the Fourier descriptor approach, unless it proves
efficient enough to be deployed on a mobile device.
Template Matching
For characters without transformations such as scaling or rotation, a template matching approach
may be effective. Template matching finds the best location by testing all, or a sample, of the
viable locations within the search image at which the template image may match. Because this
may require sampling a large number of points, the cost can be reduced in two ways. The first is
to lower the resolution of the search and template images by the same factor and perform the
operation on the downsized images (multiresolution, or pyramid, image processing). The second
is to restrict the search to a window of data points within the search image, so that not every
viable point has to be tested. The two can also be combined.
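Exhaustive template matching can be sketched in C as below, with a few assumptions. Images are 8-bit grayscale and stored row-major. Each candidate position is scored by the sum of squared differences (SSD) between template and image; SSD is one common similarity measure among several, normalized cross-correlation being another. The sketch also includes a 2x2 box-filter downsampling step as one simple way to build a pyramid level, whereas practical pyramids often smooth with a Gaussian first. The function names `match_template` and `downsample2` are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Slide the template over every position where it fits inside the image and return the
   smallest SSD; the top-left corner of the best match is written to *best_x, *best_y. */
long match_template(const unsigned char *img, int iw, int ih,
                    const unsigned char *tpl, int tw, int th,
                    int *best_x, int *best_y)
{
    assert(tw <= iw && th <= ih);
    long best = LONG_MAX;
    for (int y = 0; y + th <= ih; y++)
        for (int x = 0; x + tw <= iw; x++) {
            long ssd = 0;
            /* Stop summing rows early once this position can no longer beat the best. */
            for (int ty = 0; ty < th && ssd < best; ty++)
                for (int tx = 0; tx < tw; tx++) {
                    long d = (long)img[(y + ty) * iw + x + tx] - tpl[ty * tw + tx];
                    ssd += d * d;
                }
            if (ssd < best) {
                best = ssd;
                *best_x = x;
                *best_y = y;
            }
        }
    return best;
}

/* One pyramid level: halve both dimensions by averaging 2x2 blocks (rounded);
   an odd last row or column is dropped. dst must hold (w / 2) * (h / 2) pixels. */
void downsample2(const unsigned char *src, int w, int h, unsigned char *dst)
{
    int dw = w / 2, dh = h / 2;
    for (int y = 0; y < dh; y++)
        for (int x = 0; x < dw; x++) {
            int s = src[2 * y * w + 2 * x] + src[2 * y * w + 2 * x + 1]
                  + src[(2 * y + 1) * w + 2 * x] + src[(2 * y + 1) * w + 2 * x + 1];
            dst[y * dw + x] = (unsigned char)((s + 2) / 4);
        }
}
```

A coarse-to-fine search would proceed in two stages. First, `match_template` runs on downsampled copies of both images. The full-resolution search is then limited to a small window around the doubled coarse position.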


Figure: Template matching (panels: Template, Target, Result)

Porting to Android
Why Android?
Android is a mobile operating system developed by Google and based on a modified version of
the Linux kernel. It was initially developed by Android Inc. (a firm purchased by Google) and
later positioned in the Open Handset Alliance. According to NPD Group, unit sales of Android
smartphones ranked first among all smartphone operating systems sold in the U.S. in the second
quarter of 2010, at 33%. BlackBerry OS was second at 28%, and iOS third at 22%. Android is
therefore popular with both users and developers. Also, because Android is an open-source
system, developing applications for it should be a largely enjoyable process. More importantly,
Android-powered devices usually have considerable computational power, which will benefit the
performance of our system.
Porting Method
Porting a program written in C from the PC platform to Android should be largely
straightforward: Google has released the Android NDK toolchain, which makes the porting
process fairly smooth.
Project Schedule
Following is the tentative project schedule:
Milestone                                  Date     Remarks
FYP proposal                               Oct 7    Required by the department
Implement preprocessing                    Oct 30
Implement Fourier descriptor               Nov 15
Implement ANN                              Nov 30
Implement template matching                Dec 10
Compare three approaches                   Dec 20   Decide which approach is most suitable
Design Chinese 2D code                     Dec 30
Mid-term checkpoint report                 Jan 13   Required by the department
Implement Chinese 2D code                  Jan 20
Explore OCR input method                   Feb 10   Decide whether it is feasible to use OCR input
Implement demonstrative OCR input method   Feb 30
Final report                               Apr 14   Required by the department

Resources Estimation
Only limited resources are required for this project:
One web camera
One Android-powered smartphone
A PC with a Linux operating system installed

