
Algorithm 1: Character Segmentation

Optical character recognition (OCR) is a mechanism to extract text from images. Many
techniques have been employed to do the same, character segmentation being one of them. This
technique is used to segment a given image into sub-components, which in turn are
analysed as individual text characters. The various methods used are:
a. Dissection
b. Recognition-based segmentation
c. Holistic approach.
Method 1: Dissection
In this technique, a given image is split into subimages, which are then tested
individually to determine whether they are text characters, until no further characters can be
recognised in the given image.
Salient Features

General properties are determined for each obtained segment in order to extract
valid characters, such as its width and height, the space between individual characters, and
the space between words.

Vertical white space plays a crucial role in separating characters and dissecting them into
subimages. Thus spacing and pitch play a crucial role in identifying segmentation points.

A generalized three-step process for dissection is:
a. Detecting the start character
b. Sectioning
c. Detecting the last character.
This method is not efficient enough for heavy printing, where adjacent characters tend to touch.

Projection [1] methods are essentially utilized for good-quality machine printing,
employing column separation of characters, as in the sketch below.
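As a concrete illustration, the following is a minimal Python sketch of column-projection segmentation; the binarised image (a NumPy array with 1 for ink and 0 for background) and the function name are illustrative assumptions, not the exact method of [1].

    import numpy as np

    def projection_segments(binary_img):
        """Split a binarised text line into character sub-images by
        cutting at runs of ink-free columns (vertical projection)."""
        column_ink = binary_img.sum(axis=0)    # ink pixels per column
        segments, start = [], None
        for x, ink in enumerate(column_ink):
            if ink > 0 and start is None:
                start = x                      # a character run begins
            elif ink == 0 and start is not None:
                segments.append(binary_img[:, start:x])
                start = None                   # run ended at a blank column
        if start is not None:                  # trailing character
            segments.append(binary_img[:, start:])
        return segments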

The methods described in the earlier points are feasible only if the characters are printed,
since the width of characters and the spacing between them is then constant. In the case of
handwritten text, such pitch-based or column-based approaches produce limited accuracy.
Segmentation of such characters requires a two-dimensional process - identifying the black
regions and then splitting them accordingly. Two major techniques employed for the same are
bounding box analysis and splitting of connected components.
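A minimal sketch of the connected-component route is given below, assuming SciPy is available; the bounding boxes it returns are only candidate regions and would still need the splitting and post-processing described here.

    import numpy as np
    from scipy import ndimage

    def component_bounding_boxes(binary_img):
        """Label 8-connected black regions and return one bounding box
        (row slice, column slice) per connected component."""
        eight_connected = np.ones((3, 3), dtype=int)
        labels, count = ndimage.label(binary_img, structure=eight_connected)
        return ndimage.find_objects(labels)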

Since classification is separated from segmentation, various post-processing
techniques must be applied in order to determine the actual text. Two major techniques used
are the Markov model, which emphasizes correcting errors in single words that are obtained
by splitting or merging, and the non-Markov model, which utilizes a spell-checker to correct
repetitive merge and split errors in the whole text rather than in individual words [4].
Flaws

The techniques used in each stage are completely different from one another,
making it hard to select one method for each input type.

The subimages that are obtained are dependent on the text in the image.

This technique of segmentation does not incorporate classification into
symbols, thus requiring an extra implementation dedicated to the latter.

Initial implementations of dissection constrained the input to the OCR system,
restricting the user to writing information in an array of boxes, as on a cheque into which an
account number is written, or with equally spaced hand-written or printed characters [2].
Inferences
Dissection is best suited for printed characters, where the pitch and column parameters are
constant and can be used as a tool to separate an image into subimages. It is also inferred that
dissection need not necessarily fall between two characters where the space is easily identifiable.
In cursive handwriting, the segmentation points can lie such that they cut a
character into two separate components. After dissection, classification is performed using
post-processing techniques.
References
[1] H.S. Baird, S. Kahan and T. Pavlidis, Components of an omnifont page reader, Proc. 8th Int.
Conf. on Pattern Recognition, Paris, pp. 344-348, 1986.
[2] R.J. Evey, Use of a computer to design character recognition logic, Proc. Eastern Jt. Comp.
Conf., pp. 205-211, 1959.
[4] http://perso.telecom-paristech.fr/~elc/papers/nato94.pdf
Method 2: Recognition based segmentation
This technique is based on semantic and syntactic correctness of the overall result.
Salient Features

No complex dissection algorithms are applied; the approach is essentially segmentation-free.
Instead, the image is divided systematically, without any dependence on its content, and
words are recognised using a variable-length window.

Recognition can be done either serially, identifying words left to right, or in parallel,
wherein an optimal path is generated from a lattice of feature-to-letter combinations (see the
sketch after this list).

The generation of a window can be done in two ways:
1. Direct operation on image pixels.
2. Grouping of positional feature measurements made on images.

For the first type, the techniques implemented are:

A model designed by Kovalevsky [1], based on assumptions such as the pitch
and column variables being known.
Shortest-path segmentation [2], which selects the optimal consistent
combination of cuts from a predefined set of windows.
Selective attention [3], which takes neural networks even further in the
handling of segmentation problems.

For the second type, the methods implemented are:

The Hidden Markov Model, which uses states and their transitions to identify
characters. Essentially, it models transitions from one state to another when a character
is either recognised (transition to a state for the next letter) or when some part of
the character could not be identified (a probabilistic transition back to a state for the
same letter).
Non-Markov approaches, in which various features and their positions of
occurrence are recorded for an image. Each feature contributes an amount of evidence
for the existence of one or more characters at the position of occurrence. The positions
are quantized into bins such that the evidence for each character indicated in a bin can be
summed to give a score for classification. These scores are subjected to contextual
processing using a predefined lexicon in order to recognize words. The method has been
applied to text printed in a known proportional font.
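The sketch below makes the optimal-path idea concrete: dynamic programming chooses the sequence of variable-length windows whose letter confidences sum highest. The classifier score(a, b), the window bound max_win, and all names are illustrative assumptions, not the exact algorithms of [1]-[3].

    def best_segmentation(width, score, max_win=40):
        """Pick the best path through the lattice of window/letter
        hypotheses. `score(a, b)` is a hypothetical classifier that
        returns (letter, confidence) for the sub-image in columns [a, b)."""
        best = [float('-inf')] * (width + 1)   # best total score up to column
        best[0] = 0.0
        back = [None] * (width + 1)
        for b in range(1, width + 1):
            for a in range(max(0, b - max_win), b):
                letter, conf = score(a, b)
                if best[a] + conf > best[b]:
                    best[b] = best[a] + conf
                    back[b] = (a, letter)      # remember the best cut
        letters, b = [], width                 # walk back along the path
        while b > 0:
            a, letter = back[b]
            letters.append(letter)
            b = a
        return ''.join(reversed(letters))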
Inferences

The techniques deployed by recognition-based segmentation aid in making a
given system faster, since they avoid the need for an extra component dedicated to
segmentation; segmentation happens de facto alongside the recognition algorithms.

Progressive developments have been made with this model, thus making the
newer versions more efficient.

There are multiple ways in which this system can be deployed, each of which
has a number of alternatives, depending on the type of input that has been provided and the
need for accuracy.
References
[1] V.A. Kovalevsky, Character readers and Pattern Recognition, Spartan Books, Washington
D.C., 1968.
[2] C.J.C. Burges, J.I. Be and C.R. Nohl, Recognition of Handwritten Cursive Postal Words
using Neural Networks, Proc. USPS 5th Advanced Technology Conference, page A-117,
Nov/Dec. 1992
[3] K. Fukushima and T. Imagawa, Recognition and segmentation of connected characters with
selective attention, Neural Networks, vol. 6, no. 1, pp. 33-41, 1993.

Method 3: Holistic Strategies

The criterion is basically that there is a predefined lexicon of words, which are used as
recognition patterns, rather than dividing words into letters and testing each of them either
serially or in parallel.
Salient Features

This technique was initially designed for online recognition.

A middle-zone technique is used, by which ascenders and descenders are
checked and identified [2].

This technique uses dynamic programming in order to ensure that recognition speed is
maintained [1].
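As an illustration of the dynamic-programming step, the sketch below matches a whole word, described as a string of zone codes (for instance 'a' for ascender, 'd' for descender, 'x' for middle zone), against a small lexicon by edit distance; the zone-code alphabet and lexicon format are illustrative assumptions rather than the exact scheme of [1] or [2].

    def edit_distance(s, t):
        """Dynamic-programming edit distance between two code strings."""
        prev = list(range(len(t) + 1))
        for i, cs in enumerate(s, 1):
            cur = [i]
            for j, ct in enumerate(t, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (cs != ct)))   # substitution
            prev = cur
        return prev[-1]

    def holistic_match(word_codes, lexicon):
        """Return the lexicon word whose zone-code string is closest
        to the codes extracted from the unknown word image."""
        return min(lexicon, key=lambda w: edit_distance(word_codes, lexicon[w]))

    # e.g. holistic_match('xxd', {'and': 'xad', 'dog': 'axd', 'pig': 'dxd'})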
Flaws

A major drawback of this class of methods is that their use is usually restricted to
a predefined lexicon: as they do not deal directly with letters but only with words,
recognition is necessarily constrained to a specific lexicon of words.

Although dynamic programming is applied, this system has limitations on speed
and accuracy, since each lexicon entry is a whole word, and determining the middle zone is
an additional task.
References
[1] http://perso.telecom-paristech.fr/~elc/papers/nato94.pdf
[2] http://lpp.psycho.univ-paris5.fr/pdf/PapersPC/1976/Cavanagh-4-1976-186-199.pdf
Algorithm 2: Character Recognition using Template Matching
Method 1: Template Matching
Template matching is a method that recognizes characters or alphabets by comparing two
images of the characters. This method is commonly used in OCR system prototypes that
recognize various characters, including alphabets in grey-scaled images.
Sub Topic: Feature-Based Template Matching

This approach is well suited when the reference image and the template image have
similar or common features or control points [1].
Salient features

Features that are used to recognize characters would include points, curves, or a
surface model that needs to be matched.

Feature extraction generates features that can be used for selection and
classification.

When features are detected, they are saved into a feature vector. These
descriptors are then matched against a reference image [3].

Features could be extracted from the following representations:
1. Character contour
2. Binary character image
3. Character skeleton
4. Gray-level character image

The main objective is to locate a pair-wise correspondence between the reference and
template using the spatial relations of their features.

This method could be made less tedious by scaling down the reference image and
template image by the same factor, thereby reducing the number of features that need to be
compared, but this would compromise its effectiveness.
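A minimal sketch of feature-based matching using OpenCV's ORB keypoints is given below; the distance cutoff and the choice of ORB (rather than the points, curves, or surface models mentioned above) are illustrative assumptions.

    import cv2

    def feature_match_score(template_gray, reference_gray):
        """Count good ORB keypoint matches between a character template
        and a reference image; a higher count suggests a better match."""
        orb = cv2.ORB_create()
        _, des_t = orb.detectAndCompute(template_gray, None)
        _, des_r = orb.detectAndCompute(reference_gray, None)
        if des_t is None or des_r is None:
            return 0                           # no detectable features
        # Hamming distance suits ORB's binary descriptors; crossCheck
        # keeps only mutually consistent matches.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des_t, des_r)
        return sum(1 for m in matches if m.distance < 40)   # 40: assumed cutoff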
Inference

Feature-based template matching requires a basic set of features that can be
used to characterize the script. The higher the number of relevant features used, the more
effective the algorithm will be. The feature-based algorithm is quite versatile, as it can be
used on various forms of representations. For this algorithm to be efficient in character
recognition, it should use a large number of features, but only relevant ones.
Flaws

Not as well developed as older methods; hence its current efficiency is not high.

For the method to prove efficient, the algorithm requires a large number
of features to attain high accuracy in character recognition [2].

The feature (key) points used should be distinct enough to carry out the character
recognition.

The algorithm is bound to make unusual mistakes while recognizing characters
based on features, as the features used by the algorithm are not necessarily the same
features used by humans to recognize characters.

It is important to identify the needed features and discard the useless ones, as
they could hinder the efficiency of the algorithm.
Reference
[1] http://maxwellsci.com/print/rjaset/v4-5469-5473.pdf
[2] http://www.ccs.neu.edu/home/feneric/charrec.html
[3] http://www.flll.jku.at/sites/default/files/u20/QCAV2011_Template_Matching_Textures.pdf
[4] http://davinci.fmph.uniba.sk/~uhliarik4/recognition/resources/due_trier_1996_feature_extraction_survey.pdf

Sub Topic: Template/Area-Based Template Matching

Template-based matching is used for pattern recognition, where a template is stored on the
system for each character of the script to be recognized; the stored set is compared with the
unknown character and the best match is found [1].

Salient features

This method enables characters to be recognized more easily than methods
that use no stored reference patterns.

The aforementioned method has the following steps: acquiring the image,
filtering it, thresholding it, clustering the image of the character, and recognizing the
character.

The template of a particular character of a script is a representative image for
that character; templates are created in the training phase using training samples. These
templates make the identification of unknown characters easy.

Two types of templates could be used to identify the unknown character:
1. Statistical templates (gray-level templates)
2. Average templates

Statistical templates are those that condense the characteristics of the training
samples of a particular character class. The training samples belonging to the same character
class are superimposed. Superimposition helps determine the probability of each bit being
white in the presence of the class sample. Here white bits are given the value 1 and black
bits the value 0; hence each bit has a value in the range 0-1. This method uses the whiteness
of a bit as a reliability measure, i.e., the whiter the bit, the more reliable it is.

Average templates, also known as binary templates, are derived from statistical
templates. Average templates round off the probability values to obtain an acceptable,
general result.
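A minimal sketch of both template types follows, assuming binarised training samples stored as NumPy arrays with white = 1 and black = 0; the classification rule (smallest mean pixel difference) is an illustrative choice.

    import numpy as np

    def statistical_template(samples):
        """Superimpose the training samples of one character class;
        each pixel becomes the probability (0-1) of being white."""
        return np.stack(samples).mean(axis=0)

    def average_template(stat):
        """Round the probabilities off to 0 or 1 (binary template)."""
        return (stat >= 0.5).astype(np.uint8)

    def classify(unknown, templates):
        """Return the class whose statistical template is closest to
        the unknown character image."""
        return min(templates,
                   key=lambda c: np.abs(templates[c] - unknown).mean())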
Inference

Template-based template matching requires a collection of templates for each
character class. These templates are created from the training set during the training phase,
where two types of templates are generated: statistical templates and average templates.
These help determine the closest match to the unknown character and help recognize it. This
method is better than the previous algorithm, as it requires fewer parameters for
identifying the unknown character.
Flaws

Due to the vast number of templates required to recognize the characters, a large
amount of memory is needed to carry out the algorithm.

If the character to be recognized is not gray-scaled, then color information would
be required, which would increase the complications in the recognition.

If the templates and the unknown character input are not scaled to the same size,
then efficient character recognition would be difficult [2].

Even if no match is present in the set, the algorithm will find one, thereby
resulting in an incorrect recognition.
Reference

[1] http://www1.chihlee.edu.tw/teachers/ctw/paper/2005/imp2005-2.pdf
[2] http://www.ehu.eus/ccwintco/uploads/e/eb/PFC-IonMarques.pdf
[3] http://www.academia.edu/714194/Optical_Character_Recognition_By_Using_Template_Matching_Alphabet_
[4] http://umpir.ump.edu.my/969/1/NaCSES-2007086_Optical_Character_Recognition_By_Using_Templ.pdf
Algorithm 3: Character Recognition using Quantization
Method 1: Character Recognition using Quantization
Quantization in character recognition is the process in which a large, possibly infinite, set of
values is reduced to a much smaller set, which could include rounding values. The function that
performs quantization is known as a quantizer. Quantization is further classified into two types:
scalar quantization and vector quantization.
Sub Topic: Scalar Quantization
A common quantization method where a scalar input is mapped to a scalar output using a
quantization function. Scalar quantization can be as simple as rounding off high-precision
numbers.
Salient features

When many scalars need to be quantized, but each is quantized independently, the
process is called scalar quantization.

It covers many basic ideas such as rounding off a precise number or mapping a
continuous space to a discrete space.

A function that carries out quantization is called a quantizer, and several factors
depend on its design, such as the amount of compression obtained or the loss incurred while
performing the algorithm.

A quantizer consists of:
1. Encoder mapping
2. Decoder mapping

In encoder mapping, the encoder divides the input range into a number of intervals,
each of which is represented by a codeword. Hence the codeword identifies the interval but
not the exact value.

In decoder mapping, the decoder generates a reconstruction value from the codeword
it receives. Since the codeword identifies only the interval, the decoder generates an
estimated value, usually the midpoint of the interval.
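The following sketch shows both mappings for a uniform scalar quantizer; the interval count and value range in the example are illustrative assumptions.

    import numpy as np

    def encode(values, low, high, levels):
        """Encoder mapping: give each value the codeword (index) of
        the uniform interval it falls into."""
        step = (high - low) / levels
        codes = np.floor((np.asarray(values) - low) / step).astype(int)
        return np.clip(codes, 0, levels - 1), step

    def decode(codes, low, step):
        """Decoder mapping: reconstruct each codeword as the midpoint
        of its interval, the usual estimated value."""
        return low + (codes + 0.5) * step

    codes, step = encode([0.12, 0.57, 0.93], low=0.0, high=1.0, levels=4)
    print(decode(codes, 0.0, step))            # -> [0.125 0.625 0.875]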
Inference

Scalar quantization carries out the process on each scalar value independently. It
reduces a large set of values into a smaller one, i.e., a many-to-one mapping is carried out,
which is irreversible. The algorithm is carried out using a quantizer, whose design affects
the compression achieved and the loss incurred. The algorithm reduces the number of values,
thereby saving memory, but it induces noise in the image.
Flaws

Since it is a many-to-one mapping, it is irreversible.

Quantization generates an error, known as quantization error, due to the decrease in
the number of output values as compared to the input provided.

Does not maintain the correlation that exists between the samples.
Reference
[1] http://shodhganga.inflibnet.ac.in/bitstream/10603/2268/17/17_chapter%204.pdf
[2] http://nptel.ac.in/courses/117104069/chapter_5/5_5.html
[3] www.cs.ucf.edu/courses/cap5015/vector.ppt

Sub Topic: Vector Quantization

In vector quantization, the input samples are grouped into vectors and then quantized, as this
is more effective than quantizing individual scalars.
Salient features

In vector quantization, correlated samples are quantized together, preserving the
correlation between the samples.

Data is quantized in the form of contiguous blocks called vectors rather than
individual samples.

Earlier versions of the algorithm provided transparency only for vectors with a large
number of dimensions, but newer versions allow transparency to exist in vectors with smaller
dimensions.

A simple vector quantizer uses a 2-D vector space and divides it into hexagonal
regions called encoding regions. Each region contains a set of vectors that need to be
quantized and a representative codeword; the vectors in a region are best represented by
the codeword of that region.

A training set is required to generate the codewords, and this is carried out using
an iterative algorithm (see the sketch after this list).

There are various vector quantization techniques, such as:
1. Split Vector Quantization (SVQ)
2. Multistage Vector Quantization (MSVQ)
3. Split-Multistage Vector Quantization (S-MSVQ)
4. Switched Split Vector Quantization (SSVQ)
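The sketch below illustrates the iterative codebook training mentioned above with a plain k-means-style loop (in the spirit of the LBG algorithm); the codebook size, iteration count, and seed are illustrative assumptions.

    import numpy as np

    def train_codebook(vectors, k, iters=20, seed=0):
        """Iteratively refine k codewords: assign each training vector
        to its nearest codeword, then move every codeword to the mean
        of the vectors assigned to it."""
        rng = np.random.default_rng(seed)
        vectors = np.asarray(vectors, dtype=float)
        codebook = vectors[rng.choice(len(vectors), k, replace=False)]
        for _ in range(iters):
            dists = np.linalg.norm(vectors[:, None] - codebook[None], axis=2)
            nearest = dists.argmin(axis=1)     # index of closest codeword
            for j in range(k):
                members = vectors[nearest == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
        return codebook

    def quantize(vectors, codebook):
        """Replace each vector by the index of its nearest codeword."""
        vectors = np.asarray(vectors, dtype=float)
        dists = np.linalg.norm(vectors[:, None] - codebook[None], axis=2)
        return dists.argmin(axis=1)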
Inference

Vector quantization groups the samples together and quantizes them jointly, thereby
maintaining the correlation between the samples. This algorithm has higher efficiency than
scalar quantization, as it maintains correlation and transparency.
Flaws

Quantizing vectors increases the system complexity and hence increases the memory
requirement.

If the dimension of the vector is quite small, then poor quality results would be
produced.

If the application requires a high bit-rate, then it would have high complexity and
memory requirements.

A large amount of training data would be required to produce an efficient
codebook.

Reference
[1] http://shodhganga.inflibnet.ac.in/bitstream/10603/2268/17/17_chapter%204.pdf
[2] http://www.dcs.gla.ac.uk/~vincia/papers/lvq.pdf
[3] www.cs.ucf.edu/courses/cap5015/vector.ppt

Algorithm 4: Logistic Regression

Regression techniques are employed to combine the results of various recognition algorithms to
identify text letters and words in the image.
Salient Features

The method computes a weighted sum of the ranks scored by the individual
classifiers and derives a consensus ranking. The weights are estimated by a logistic
regression analysis.

An assumption made here is that each of the corresponding recognition
techniques outputs a ranking over a character set with respect to an image. Since the
combination method is applied on these rankings, it is independent of the internal techniques
used in the algorithms from which they are computed [1].

Regression plays the role of classification in the stages of character recognition.
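A minimal sketch of the weighted-rank combination follows, assuming scikit-learn is available; the rank matrix, the three hypothetical classifiers, and the labels are made-up illustrative data, not results from [1].

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Each row holds the ranks three hypothetical classifiers gave one
    # candidate character (rank 1 = top choice); y marks whether that
    # candidate was actually the true character.
    ranks = np.array([[1, 1, 2], [3, 4, 5], [1, 2, 1],
                      [5, 3, 4], [2, 1, 1], [4, 5, 3]])
    y = np.array([1, 0, 1, 0, 1, 0])

    model = LogisticRegression().fit(ranks, y)
    # The learned coefficients act as the weights of the rank sum; the
    # consensus ranking orders candidates by this combined score.
    scores = model.decision_function(ranks)
    print(model.coef_, scores.argsort()[::-1])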


Inferences

Major applications of regression are in pattern recognition, with either supervised
or unsupervised techniques.

This technique provides features like independence among the individual methods,
thus allowing each of the methods to be evaluated and its performance graded.
Flaws


Logistic regression requires that each data point be independent of all other data
points. If observations are related to one another, then the model will tend to overweight the
significance of those observations. This is a major disadvantage, because a lot of scientific
and social-scientific research relies on research techniques involving multiple observations of
the same individuals.

Logistic regression works well for predicting categorical outcomes like admission
or rejection at a particular college. It can also predict multinomial outcomes, like admission,
rejection or wait list. However, logistic regression cannot predict continuous outcomes [2].
References
[1] http://ect.bell-labs.com/who/tkh/publications/papers/spie92.pdf
[2] http://www.ehow.com/info_8574447_disadvantages-logistic-regression.html
