
HANDWRITTEN OBJECTS RECOGNITION USING REGULARIZED LOGISTIC REGRESSION AND FEEDFORWARD NEURAL NETWORKS
1) SHAHAM SHABANI
B.Sc. Student, Electrical Engineering Department,
Amirkabir University of Technology
Tehran, Iran
shahamshabani@aut.ac.ir

2) YASER NOROUZI
Assistant Professor, Electrical Engineering Department,
Amirkabir University of Technology
Tehran, Iran
y.norouzi@aut.ac.ir

3) MARJAN FARIBORZ
B.Sc. Student, Electrical Engineering Department,
Amirkabir University of Technology
Tehran, Iran
mfariborz@aut.ac.ir

ABSTRACT
In this paper, we present a feedforward training algorithm using Regularized Logistic Regression and Neural Networks to recognize handwritten objects. Furthermore, we consider the effect of Gaussian noise on this procedure in order to examine the versatility of our approach. We may want to transmit the images of our digits through an AWGN channel to a certain destination and then perform the recognition at that destination, so our algorithm needs to remain robust against the noise introduced by AWGN channels and sensors. The main advantage of our approach is that it reduces the amount of computation and, in turn, considerably decreases the processing time.

Index Terms: Learning, Neural networks, AWGN, Feedforward neural networks

1. INTRODUCTION
Owing to recent progress in processing technology, many algorithms have emerged to classify and recognize handwritten objects, paving the way for numerous useful applications. Finding a way to recognize and interpret handwritten notes has long been a goal for scientists. This can be very beneficial in applications in which operators want to unify and integrate handwritten notes so as to facilitate and enhance their use. Automated handwritten digit recognition is widely used today, from recognizing zip codes (postal codes) on mail envelopes to recognizing the amounts written on bank checks. In this paper we introduce an algorithm, based on Logistic Regression, to recognize handwritten notes, and we then improve it by means of neural networks. As a brief summary, in this paper we only consider the case of handwritten digits; however, the algorithm can also be used to recognize alphabetical characters and words. We initially started our learning process with 40×40-pixel images, but the process was noticeably slowed down, so we reduced the images to 20×20-pixel grayscale squares. Each image can now be interpreted as a 20×20 matrix to which the recognition process is applied. We will discuss this procedure further in the following sections.
Taking a brief retrospective look at the achievements of the last three decades in this area, we review the available approaches, among which [1] and [2] stand out. It is nevertheless helpful to summarize past attempts. Many researchers have adopted classical pattern recognition methods, which are based on pre-processing the image in order to extract features and then feeding a neural network with these features. Although there have been many variations, these methods may be described along two dimensions:
1) Statistical/Structural
2) Global/Local
As an example of a global, statistical method, [3] suggested extracting two raw and eight central features. Others extract topological features which depend on global properties of the data. For instance, Shridhar and Badreldin [4] used features extracted from the character profiles in the image; these features can be fed into a tree classifier. Recently, new approaches have emerged, such as [5], [6], [7], which successfully attempt to learn appropriate local features automatically using feedforward neural networks. On the other hand, in order to improve the recognition task, some algorithms combine two or more classifiers, for instance [8], [2], [9].
Optical Character Recognition (OCR) is an approach used to convert scanned images of handwritten or printed text into machine-readable text, and important steps have been taken in OCR. According to [10], an optimized method has reached an error rate of about 15% without rejection, using a standard database of constrained, presegmented handwritten digits, although, in the case of unsegmented sequences of digits, it has yet to match human performance. In order not to use extremely large training sets, recognizers need to be fed prior knowledge about the features and objects they expect to find in images. This makes the recognition process much sharper. This approach is prevalent in structural systems, though seldom found in statistical systems. Some statistical systems have been implemented which permit typical digit transformations [11], but recognizer classifiers do not usually address the issue of explicitly explaining the data.
The digit model we are going to use is a set of 20×20-pixel grayscale images. These images are centered within their segments. With each image regarded as a square matrix, the gray parts function as the zeros of the matrix while the white parts are the ones, and our network will be trained to find the best parameters ($\theta$), which play an important role in our recognition process.
2. MULTI-CLASS CLASSIFICATION
For this algorithm, we will use logistic regression and neural networks to recognize handwritten digits (from 0 to 9). If we regard each single digit as a class, then we need to assign each digit to a class; doing so means that we have recognized the digit. We start this procedure with the dataset.
2.1. Dataset
In this paper we used 5000 training examples in a .mat format, where each training example is a 20×20-pixel grayscale image of a digit. Each pixel is represented by a floating point number indicating the grayscale intensity at that location. By use of the raster technique [12] (a common operation in image processing), the 20 by 20 grid of pixels is unrolled into a vector of 400 elements. Each of these training examples becomes a single row in our data matrix X. This gives us a 5000 by 400 matrix X where every row is a training example for a handwritten digit image:

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix} \qquad (1)$$

The second part of the training set is a 5000-element vector y that contains the labels for the training set. To make things more compatible with Matlab indexing, where there is no zero index, we have mapped the digit 0 to the value 10. Therefore, a "0" digit is labeled as "10", while the digits 1 to 9 are labeled as 1 to 9 in their natural order.
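As an illustration, the following sketch loads such a dataset with NumPy/SciPy. The file name digits.mat and the variable names X and y are assumptions made for this example, not names given in the paper.

```python
# Minimal sketch of loading the dataset described above: 5000 examples,
# each a 20x20 grayscale image unrolled into a 400-element row.
import numpy as np
from scipy.io import loadmat

data = loadmat('digits.mat')   # hypothetical .mat file holding the training set
X = data['X']                  # shape (5000, 400): one unrolled image per row
y = data['y'].ravel()          # shape (5000,): labels 1..10, where 10 stands for digit 0

print(X.shape, y.shape)
```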

2.2. Visualizing the Data
We begin by visualizing a subset of the training set. We randomly select 100 rows from X, map each row back to a 20×20-pixel grayscale image, and display them together. After this step we will see an image as follows:

Figure 1: A subset of the database used in this paper [13]
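As a sketch of this visualization step (assuming the arrays loaded above), the following code tiles 100 random examples into a 10×10 grid with matplotlib. The Fortran-order reshape assumes the images were flattened column-major, Matlab style.

```python
import numpy as np
import matplotlib.pyplot as plt

def show_random_examples(X, n_side=10, img_dim=20, seed=0):
    """Tile n_side*n_side randomly chosen rows of X into one grayscale image."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(X.shape[0], n_side * n_side, replace=False)
    grid = np.zeros((n_side * img_dim, n_side * img_dim))
    for k, r in enumerate(rows):
        i, j = divmod(k, n_side)
        # order='F' assumes Matlab-style column-major flattening of each image
        grid[i * img_dim:(i + 1) * img_dim, j * img_dim:(j + 1) * img_dim] = \
            X[r].reshape(img_dim, img_dim, order='F')
    plt.imshow(grid, cmap='gray')
    plt.axis('off')
    plt.show()

# show_random_examples(X)
```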

2.3. Vectorizing Logistic Regression
We will be using multiple one-vs-all logistic regression models to build a multi-class classifier. Since there are 10 classes, we will need to train 10 separate logistic regression classifiers. To make this training efficient, it is important to ensure that our code is well vectorized. In this section, we implement a vectorized version of logistic regression that does not employ any for-loops. This makes our code run faster and saves time, which is the main feature of this approach.

2.3.1. Vectorizing the Cost Function
We will begin by writing a vectorized version of the cost function. Bear in mind that in (unregularized) logistic regression, the cost function is [14]:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] \qquad (2)$$

To compute each element in the summation, we have to compute $h_\theta(x^{(i)})$ for every example $i$, where $h_\theta(x^{(i)}) = g(\theta^T x^{(i)})$ and $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function [15]. It turns out that we can compute this quickly for all our examples by using matrix multiplication. Let us define X and $\theta$ as:

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix} \quad \text{and} \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} \qquad (3)$$

Then, by computing the matrix product $X\theta$, we have:

$$X\theta = \begin{bmatrix} (x^{(1)})^T\theta \\ (x^{(2)})^T\theta \\ \vdots \\ (x^{(m)})^T\theta \end{bmatrix} = \begin{bmatrix} \theta^T(x^{(1)}) \\ \theta^T(x^{(2)}) \\ \vdots \\ \theta^T(x^{(m)}) \end{bmatrix} \qquad (4)$$
In the last equation, we used the fact that $a^T b = b^T a$ if $a$ and $b$ are vectors. This allows us to compute the products $\theta^T x^{(i)}$ for all our examples $i$ in one line of code. Our implementation uses this strategy to calculate $\theta^T x^{(i)}$. We also use a vectorized approach for the rest of the cost function, so a fully vectorized version of the logistic regression cost function does not contain any loops.
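The following is a minimal NumPy sketch of the vectorized cost in equation (2). It assumes X already carries a leading column of ones (the intercept term) and that theta is an (n+1)-dimensional vector; the function names are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Vectorized unregularized logistic regression cost, eq. (2)."""
    m = y.size
    h = sigmoid(X @ theta)  # h_theta(x^(i)) for all examples at once
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
```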
2.3.2. Vectorizing the Gradient
Note that the gradient of the (unregularized) logistic regression cost is a vector whose $j$-th element is defined as [16]:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad (5)$$

To vectorize this operation over the dataset, we write out all the partial derivatives explicitly for every $\theta_j$:

$$\begin{bmatrix} \frac{\partial J}{\partial \theta_0} \\ \frac{\partial J}{\partial \theta_1} \\ \vdots \\ \frac{\partial J}{\partial \theta_n} \end{bmatrix} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)} = \frac{1}{m}X^T\left(h_\theta(x) - y\right) \qquad (6)$$

where

$$h_\theta(x) - y = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ h_\theta(x^{(2)}) - y^{(2)} \\ \vdots \\ h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix} \qquad (7)$$

Note that $x^{(i)}$ is a vector, while $h_\theta(x^{(i)}) - y^{(i)}$ is a scalar (single number). To understand the last step of the derivation, let $\beta_i = h_\theta(x^{(i)}) - y^{(i)}$; then

$$\sum_i \beta_i x^{(i)} = \begin{bmatrix} | & & | \\ x^{(1)} & \cdots & x^{(m)} \\ | & & | \end{bmatrix}\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_m \end{bmatrix} = X^T\beta \qquad (8)$$

The expression above allows us to compute all the partial derivatives without any loops.
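Correspondingly, a sketch of the vectorized gradient of equation (6), under the same assumptions as the cost sketch above:

```python
def gradient(theta, X, y):
    """Vectorized unregularized gradient, eq. (6): (1/m) * X^T (h - y)."""
    m = y.size
    h = sigmoid(X @ theta)       # uses sigmoid() from the previous sketch
    return X.T @ (h - y) / m
```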
2.3.3. Vectorizing Regularized Logistic Regression
After implementing vectorization for logistic regression, we now add regularization to the cost function in order to obtain a classifier that generalizes better. For regularized logistic regression [17], [18], the cost function is defined as:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2 \qquad (9)$$

It is noteworthy that we should not regularize $\theta_0$, which is used for the bias term.
Correspondingly, the partial derivatives of the regularized logistic regression cost for $\theta_j$ are defined as:

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \quad \text{for } j = 0 \qquad (10)$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \left(\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}\right) + \frac{\lambda}{m}\theta_j \quad \text{for } j \geq 1 \qquad (11)$$
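A sketch of the regularized cost and gradient of equations (9)-(11), continuing the assumptions of the earlier sketches (bias column already in X, sigmoid() as defined above); lmbda stands for the regularization strength λ.

```python
def cost_reg(theta, X, y, lmbda):
    """Regularized cost, eq. (9); theta_0 (the bias term) is not penalized."""
    m = y.size
    h = sigmoid(X @ theta)
    penalty = (lmbda / (2 * m)) * np.sum(theta[1:] ** 2)
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m + penalty

def gradient_reg(theta, X, y, lmbda):
    """Regularized gradient, eqs. (10) and (11)."""
    m = y.size
    h = sigmoid(X @ theta)
    grad = X.T @ (h - y) / m
    grad[1:] += (lmbda / m) * theta[1:]   # no regularization for j = 0
    return grad
```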

2.4. One-Vs-all Classification
In this part, we implement one-vs-all classification [19] by training multiple regularized logistic regression classifiers, one for each of the K classes in our dataset. In the handwritten digits dataset, the total number of classes is ten (K = 10), but our algorithm should work for any value of K. We now train one classifier for each class. In particular, our algorithm should return all the classifier parameters in a matrix $\Theta \in \mathbb{R}^{K \times (N+1)}$, where each row of $\Theta$ corresponds to the learned logistic regression parameters for one class. We can do this with a for-loop from 1 to K, training each classifier independently. Note that the y argument to this function is a vector of labels from 1 to 10, where we have mapped the digit 0 to the label 10 (to avoid confusion with indexing). When training the classifier for class $k \in \{1, 2, \dots, K\}$, we need an m-dimensional vector of labels y, where $y_j \in \{0, 1\}$ indicates whether the j-th training instance belongs to class k ($y_j = 1$) or to a different class ($y_j = 0$).
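A sketch of this training loop is shown below. It reuses cost_reg() and gradient_reg() from the previous sketch and relies on scipy.optimize.minimize as a stand-in optimizer; the paper does not name the optimizer it used, so that choice is an assumption.

```python
from scipy.optimize import minimize

def one_vs_all(X, y, num_labels=10, lmbda=0.1):
    """Return Theta of shape (K, N+1); row k-1 holds the parameters for class k."""
    m, n = X.shape
    X1 = np.hstack([np.ones((m, 1)), X])      # add the bias column
    Theta = np.zeros((num_labels, n + 1))
    for k in range(1, num_labels + 1):        # labels 1..10 (10 stands for digit 0)
        yk = (y == k).astype(float)           # binary labels for class k
        res = minimize(cost_reg, np.zeros(n + 1), args=(X1, yk, lmbda),
                       jac=gradient_reg, method='L-BFGS-B')
        Theta[k - 1] = res.x
    return Theta
```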

2.4.1. One-Vs-all Prediction
After training our one-vs-all classifier, we can now use it to predict the digit contained in a given image. For each input, we compute the probability that it belongs to each class using the trained logistic regression classifiers [20]. Our one-vs-all prediction function picks the class for which the corresponding logistic regression classifier outputs the highest probability and returns that class label (1, 2, ..., or K) as the prediction for the input example. Once we are done, the algorithm computes the accuracy of our prediction by using the learned values of $\theta$. We will see that the training set accuracy is about 94.9% (i.e., it classifies 94.9% of the examples in the training set correctly).
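A sketch of this prediction step: compute the class probabilities for every example and pick the class with the highest one (labels run from 1 to K, with 10 standing for the digit 0). It assumes the Theta matrix returned by the training sketch above.

```python
def predict_one_vs_all(Theta, X):
    """Return the predicted label (1..K) for every row of X."""
    m = X.shape[0]
    X1 = np.hstack([np.ones((m, 1)), X])   # add the bias column
    probs = sigmoid(X1 @ Theta.T)          # (m, K) matrix of class probabilities
    return np.argmax(probs, axis=1) + 1    # +1 because class labels start at 1

# accuracy = 100 * np.mean(predict_one_vs_all(Theta, X) == y)
```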
3. NEURAL NETWORKS
In the previous part, we implemented multi-class logistic regression to recognize handwritten digits. However, logistic regression cannot form more complex hypotheses, as it is only a linear classifier. In this part, we implement a neural network to recognize handwritten digits using the same training set as before. The neural network is able to represent complex models that form non-linear hypotheses. For this part, we use parameters from a neural network that we have already trained with the Gradient Descent method. Our goal is to implement the feedforward propagation algorithm that uses our weights for prediction.

3.1. Model Representation
Our neural network is shown in the following figure. It has 3 layers: an input layer, a hidden layer, and an output layer. Note that our inputs are pixel values of digit images. Since the images are of size 20×20, this gives us 400 input layer units (excluding the extra bias unit, which always outputs +1). As before, the training data is loaded into the variables X and y. We have been provided with a set of network parameters ($\Theta^{(1)}$, $\Theta^{(2)}$) that have already been trained.

3.2. Noise Measurement
Before a handwritten image arrives at our neural network as an input, we add Gaussian noise with a signal-to-noise ratio of 10 dB to the image. This helps us evaluate our approach in more realistic cases, such as the additive noise caused by AWGN channels and sensors. In this case, our algorithm has to classify and recognize an image to which some amount of white Gaussian noise has been added:

$$X_{new} = X + n(t) \qquad (12)$$

Figure 3: Effect of additive noise (before and after adding noise)
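A sketch of this noise model: white Gaussian noise scaled to a target SNR (10 dB in the experiment above) is added to each unrolled image, as in equation (12). The per-image SNR definition below is an assumption about how the noise power was set.

```python
def add_awgn(X, snr_db=10.0, seed=0):
    """Add white Gaussian noise to each row of X at the given SNR (eq. 12)."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(X ** 2, axis=1, keepdims=True)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(X.shape) * np.sqrt(noise_power)
    return X + noise

# X_noisy = add_awgn(X, snr_db=10.0)
```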

1 "

X = # %
1 "

Then we compute z = X

x (1)

#
x (m)

(13)

and next
n C as:
1
(14)
C=
1 + e z

If we define a new parameter callling P:


1
(15)
P=
T
1 + e C
So, P is a 5000 10 matrix eaach of whose columns
corresponds to a digit from 0 to
t 10 and each row is
related to an example. In each
e
row, the index
corresponding to the maximum value
v
of the elements of
that row is the recognized digit.
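A sketch of equations (13)-(15), assuming pretrained weight matrices Theta1 (hidden layer) and Theta2 (output layer). The hidden-layer size is not specified in the text, and the extra bias column added to the hidden activations is an assumption made so that standard pretrained weight shapes would fit; it is not shown in equation (15).

```python
def feedforward_predict(Theta1, Theta2, X):
    """Eqs. (13)-(15): propagate X through the network and return predicted labels."""
    m = X.shape[0]
    A1 = np.hstack([np.ones((m, 1)), X])    # eq. (13): add the input bias column
    C = sigmoid(A1 @ Theta1.T)              # eq. (14): hidden-layer activations
    C1 = np.hstack([np.ones((m, 1)), C])    # hidden-layer bias (assumed, not in the text)
    P = sigmoid(C1 @ Theta2.T)              # eq. (15): one row of class scores per example
    return np.argmax(P, axis=1) + 1         # index of the largest output, labels 1..10
```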
Our algorithm predicts the handwritten digit using the loaded set of parameters Theta1 and Theta2 in the presence of additive white Gaussian noise. We will see that the accuracy is about 90.5%. This decrease in accuracy is due to the white Gaussian noise added to the images; yet, considering the speed of this algorithm and the effect of additive noise, this accuracy confirms the success of the approach. After that, an interactive sequence launches, displaying images from the training set one at a time, while the console prints out the predicted label for the displayed image.
A sample result is depicted in the following image:

Figure 4: An example of the recognition process

We summarize the results in the following table:

Accuracy without noise: 90.5%
Accuracy after adding noise 100 times: 86.8%

4. STATISTICAL ANALYSIS

After the recognition process, we now intend to evaluate our algorithm statistically. This time, right before the recognition process, we apply Additive White Gaussian Noise (AWGN) to each image with different noise powers corresponding to SNR values in the range of 0 to 20 dB, and we repeat this procedure 100 times for each SNR. Our goal is to compare the Mean Square Errors resulting from the differences between the original images, before the recognition process, and the images that underwent AWGN several times. Then, we calculate a term named Bias, which is the variance of the differences between the recognized image and the image resulting from applying AWGN 100 times. If we calculate the Mean Square Error (MSE) of the bias term at each SNR, we obtain the following curve:

Figure 5: Results of applying noise 100 times for each SNR value in the range of 0 to 20 dB
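A sketch of the evaluation loop described in this section: for each SNR from 0 to 20 dB, noise is applied 100 times and the mean squared difference from the clean images is recorded. The exact bias/MSE definitions used for Figure 5 are only paraphrased in the text, so this code is an approximation; it reuses add_awgn() from the noise sketch above.

```python
def mse_vs_snr(X, snr_values=range(0, 21), n_trials=100, seed=0):
    """Average MSE between clean and noisy images for each SNR (cf. Figure 5)."""
    results = {}
    for snr_db in snr_values:
        errs = []
        for t in range(n_trials):
            Xn = add_awgn(X, snr_db=snr_db, seed=seed + t)
            errs.append(np.mean((Xn - X) ** 2))
        results[snr_db] = float(np.mean(errs))
    return results
```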

5. CONCLUSION
Handwritten recognition still remains an open problem, one that is likely to see a great deal of progress and more efficient approaches in the future. In this paper we introduced a feedforward algorithm that recognizes handwritten digits using Logistic Regression and Neural Networks. The most important goal of this algorithm is to perform the recognition process as fast as possible, with the lowest amount of computation and the highest possible efficiency, since the alternative Back-Propagation approach requires many calculations to obtain the back-propagated error and use it as feedback to improve the recognition result. Another important feature of this approach is that we have assumed our handwritten image is passed through an AWGN channel, and our algorithm is capable of recognizing the handwritten image in the presence of some amount of noise. In practice, this noise may stem from the sensors that are used to read handwritten images and pass them on to the recognition block.
In the future, we hope to introduce and implement more comprehensive and efficient algorithms for recognizing handwritten objects.

REFERENCES

[1] S. Impedovo, Fundamentals in Handwriting Recognition, Springer Verlag, 1994.

[2] C.Y. Suen, C. Nadal, R. Legault, T.A. Mai, and L. Lam, "Computer Recognition of Unconstrained Handwritten Numerals," Proc. IEEE, vol. 80, no. 7, pp. 1162-1180, July 1992.

[3] G.L. Cash and M. Hatamian, "Optical Character Recognition by the Method of Moments," Computer Vision, Graphics, and Image Processing, vol. 39, pp. 291-310, 1987.

[4] M. Shridhar and A. Badreldin, "Recognition of Isolated and Simply Connected Handwritten Numerals," Pattern Recognition, vol. 19, no. 1, pp. 1-12, 1986.

[5] Y. Le Cun et al., "Handwritten Digit Recognition with a Back-Propagation Network," Advances in Neural Information Processing Systems 2, D.S. Touretzky, ed., Denver, pp. 396-404, San Mateo: Morgan Kaufmann, 1990.

[6] J.D. Keeler, D.E. Rumelhart, and W.K. Leow, "Integrated Segmentation and Recognition of Hand-Printed Numerals," Advances in Neural Information Processing Systems 3, R.P. Lippmann, J.E. Moody, and D.S. Touretzky, eds., pp. 557-563, San Mateo: Morgan Kaufmann, 1991.

[7] K. Fukushima and N. Wake, "Handwritten Alphanumeric Character Recognition by the Neocognitron," IEEE Trans. Neural Networks, vol. 2, no. 3, pp. 355-365, 1991.

[8] D. Lee and S.N. Shridhar, "Hand-printed Digit Recognition: A Comparison of Algorithms," Third Int'l Workshop on Frontiers in Handwriting Recognition, pp. 153-162, Buffalo, NY, May 1993.

[9] F. Kimura and M. Shridhar, "Handwritten Numerical Recognition Based on Multiple Algorithms," Pattern Recognition, vol. 24, no. 10, pp. 969-983, 1991.

[10] J. Geist et al., "NISTIR 5452: The Second Census Optical Character Recognition Systems Conference," Technical Report, U.S. National Institute of Standards and Technology, 1994.

[11] P. Simard, Y. Le Cun, and J. Denker, "Efficient Pattern Recognition Using a New Transformation Distance," Advances in Neural Information Processing Systems 5, J.D. Cowan, S.J. Hanson, and C.L. Giles, eds., pp. 50-58, Morgan Kaufmann, 1993.

[12] W.S. Yeo, "Raster Scanning," 2006. http://ccrma.stanford.edu/woony/works/raster/

[13] A subset of the MNIST handwritten digit dataset (http://yann.lecun.com/exdb/mnist/).

[14] K. Koh, S.-J. Kim, and S. Boyd, "An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression," Journal of Machine Learning Research, vol. 8, pp. 1519-1555, 2007.

[15] D. von Seggern, CRC Standard Curves and Surfaces with Mathematica, 2nd ed., Boca Raton, FL: CRC Press, 2007.

[16] Y. LeCun, L. Bottou, G.B. Orr, and K.-R. Müller, "Efficient BackProp," Neural Networks: Tricks of the Trade, Springer, pp. 9-50, 1998.

[17] V. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, New York, 1995.

[18] V. Vapnik, S. Golowich, and A. Smola, "Support Vector Method for Function Approximation, Regression Estimation, and Signal Processing," in NIPS 9, San Mateo, CA, 1997.

[19] C.-L. Liu, "One-Vs-All Training of Prototype Classifier for Pattern Classification and Retrieval," 2010 International Conference on Pattern Recognition, 2010.

[20] L. Breiman and J.H. Friedman, "Predicting Multivariate Responses in Multiple Linear Regression," Journal of the Royal Statistical Society, Series B, vol. 59, no. 1, pp. 3-54, 1997.

[21] http://sparkpublic.s3.amazonaws.com/ml/exercises/ex2.zip

[22] C. Anton, C. Ştirbu, and R.V. Badea, "Identify Handwriting Individually Using Feed Forward Neural Networks," International Journal of Intelligent Computing Research (IJICR), vol. 1, no. 4, December 2010.