
Optical Recognition of Handwritten Digits

ECE539 Project Report
Pradeep Rajendran
December 20, 2013

1 Introduction
Recognition of handwritten digits has many everyday uses. It is particularly applied in the
automated sorting of postal items based on zipcode. A general implementation of such a system is
briefly described below.
As the postal item moves on a conveyor belt, robotic mechanisms reposition the item such that
the address label is visible to a camera. Then, the camera registers an image of the label. The
image is then fed into an image processing system which dissects the image into constituent character
blocks. The digit blocks composing the zipcode field are then thresholded and pre-processed to ensure
uniform scale and orientation. After preprocessing, the binary image of a digit block is ready for
use in machine learning tools.
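The thresholding step mentioned above can be sketched as follows. This is a hypothetical illustration rather than the actual pipeline, using a tiny 3 × 3 block of made-up gray values in place of a real digit block; the threshold value 128 is an assumption.

```python
def binarize(block, threshold=128):
    """Map 8-bit gray values to a binary image using a fixed threshold."""
    return [[1 if px >= threshold else 0 for px in row] for row in block]

# Made-up 3 x 3 block of 8-bit gray values standing in for a digit block.
gray = [
    [  0, 200, 255],
    [ 30, 140,  90],
    [  0, 255,  10],
]
print(binarize(gray))  # [[0, 1, 1], [0, 1, 0], [0, 1, 0]]
```

In practice a fixed threshold would likely be replaced by one chosen per image, but the idea is the same.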
The pre-processing step is rather involved and out of the scope of this project, as it requires many
image processing steps. It cannot be skipped, however, as it is crucial to the success of most machine
learning methods. In this project, I focus on how the performances of common machine learning tools
compare with each other.

2 Dataset
The MNIST (Modified National Institute of Standards and Technology) dataset [1] is used in this
project. All the digit images have been pre-processed such that the digit is centered on a 28 × 28
block of 8-bit gray values. Fig. 1 shows an example of a digit block containing the character ‘8’.
Fig. 2 shows an ensemble of digit blocks containing the character ‘5’.
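Each such block is consumed by the classifiers below as a flat feature vector. A small sketch of how a 28 × 28 block of 8-bit gray values becomes a 784-dimensional vector; the pixel values here are made up, not real MNIST data, and the scaling into [0, 1] is one common convention.

```python
# Build a fake 28 x 28 block of 8-bit gray values (stand-ins for MNIST pixels).
ROWS, COLS = 28, 28
block = [[(r * COLS + c) % 256 for c in range(COLS)] for r in range(ROWS)]

# Row-major flatten, then scale each 0-255 gray value into [0, 1].
features = [px / 255.0 for row in block for px in row]
print(len(features))  # 784
```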


Fig. 1: 28 × 28 digit block containing character ‘8’

Fig. 2: A small subset of training ensemble containing character ‘5’

3 Tools applied

3.1 Multi-layer perceptron
A two-layer perceptron implementation found in the Neural Network Toolbox is utilized for this
section. The first layer is the input layer and it has 784 inputs. The second layer has h hidden
neurons. And the final layer is the output layer consisting of 10 neurons corresponding to the 10
class labels (i.e. the 10 digits).
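A forward pass through such a 784-h-10 network can be sketched in plain Python as below. This is a hypothetical illustration with untrained random weights, not the Neural Network Toolbox implementation used in the report; the tanh hidden units and the argmax over the 10 outputs are assumptions for the sketch.

```python
import math
import random

def forward(x, W1, b1, W2, b2):
    """One forward pass: 784 inputs -> h hidden (tanh) -> 10 outputs -> argmax."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = [sum(w * hi for w, hi in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return output.index(max(output))  # predicted class label, 0-9

random.seed(0)
n_in, h, n_out = 784, 100, 10
W1 = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(h)]
b1 = [0.0] * h
W2 = [[random.uniform(-0.1, 0.1) for _ in range(h)] for _ in range(n_out)]
b2 = [0.0] * n_out

digit = [0.0] * n_in  # a blank 28 x 28 image, flattened
print(forward(digit, W1, b1, W2, b2))  # a label in 0..9 (meaningless until trained)
```

Training (back-propagation) is what the toolbox handles; only the layer structure is being illustrated here.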

Fig. 3: An example of a 2-layer neural network with 10 hidden neurons

The table below shows the various values of h that were tried and the corresponding performances.

h      Error rate (%)
10     8.46
20     8.92
100    2.09
120    11.22
200    21.27

Table 1: Various values of h and corresponding error rates

From Table 1, it is clear that h = 100 gives better performance than h = 200. This might be due
to over-fitting associated with h = 200. The ROC plots for h = 100 are given in Fig. 4.

(a) ROC for h = 100    (b) ROC for h = 100 (zoomed)

Fig. 4: ROC plot and its zoomed version (right)

From the ROC, it appears that the number 4 is often misclassified as it is furthest away from the
point (0, 1).

3.2 Support vector machine
The LIBSVM suite of tools developed by Chih-Chung Chang and Chih-Jen Lin is used in this
section [2]. The main tools are: svm-scale (used for scaling data), svm-train (used to obtain support vectors

from given data), and svm-predict (used to make predictions based on a trained model).

Method
svm-scale is first used to scale the input feature vectors in the training file to lie in the range
[-1, +1]. A scale file is also produced along with the scaled training file. The scale file contains the
appropriate scale values that have to be applied to the testing data as a pre-processing step. The
scaled training file is then input to svm-train, which produces a model file. This model file contains
the support vectors identified in the scaled training file. Once the model file is obtained, testing is
performed using svm-predict. svm-predict uses the model file to determine the classification of test
vectors and produces an output file containing these predictions.

Choice of parameters
There are many choices of parameters that can be made during the training phase of the SVM.
Table 2 shows the error rate corresponding to different kernels and parameters.

Kernel Type             Error rate (%)
Linear                  6.85
Polynomial (Order 4)    2.18
Polynomial (Order 5)    2.01
Polynomial (Order 6)    1.92
Polynomial (Order 7)    1.92
Polynomial (Order 8)    1.99
Gaussian                2.66
Sigmoidal               10.71

Table 2: Performance for different kernels

According to Table 2, it seems that the highest performance is achieved when a polynomial kernel
of order 6 or 7 is used.

3.3 K-Nearest Neighbor (K-NN)
K-NN is the simplest classification method. But it is also a slow and memory-intensive method.
This is because, during the testing phase, a similarity metric between each test feature vector and
all the training feature vectors has to be calculated. Since there are 60 000 training vectors, 60 000
similarity metric calculations have to be performed for each test vector. With 10 000 test vectors,
the total number of similarity metric calculations sums up to 600 000 000.

Eigen-digit method
The Eigen-digit method involves using PCA (Principal Component Analysis) to reduce the dimen-
sionality of the feature space from M to m. While performing PCA, the m eigenvectors (vj) and
10 eigen-digits (Ej = [v1 v2 . . . vm]) are also obtained. In this way, each training feature vector
becomes an m-dimensional vector instead of the original 784-dimensional vector, as shown in Fig. 5.
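The PCA projection amounts to taking inner products with the m eigenvectors. A hypothetical sketch with toy numbers: M = 4 and m = 2 stand in for 784 and a typical m, and the eigenvectors below are made up for illustration, not computed from data.

```python
def project(x, eigenvectors):
    """Reduce x from M dimensions to m by projecting onto m eigenvectors."""
    return [sum(v_i * x_i for v_i, x_i in zip(v, x)) for v in eigenvectors]

# Two made-up orthonormal directions in 4-dimensional space.
v1 = [0.5, 0.5, 0.5, 0.5]
v2 = [0.5, -0.5, 0.5, -0.5]

x = [4.0, 2.0, 4.0, 2.0]     # stand-in for a 784-dimensional feature vector
print(project(x, [v1, v2]))  # [6.0, 2.0]
```

The two resulting weights play the role of the m-dimensional feature vector that the K-NN search then operates on.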

Fig. 5: Example of dimensionality reduction for m = 5 (a 784-dimensional vector is projected onto
eigenvectors v1 through v5, giving a 5-dimensional vector of projection weights)

During the testing phase, each 784-dimensional test feature vector is projected to the m-dimensional
feature space using the eigen-digits obtained in the training phase. The resulting m-dimensional
feature vector is then compared with the huge collection of labeled m-dimensional training feature
vectors (the leftmost vectors in Fig. 6). During the comparison, the K labeled closest matches from
the collection are identified. Amongst the closest K matches, the most frequently occurring label is
taken as the classification output of the Eigen-digit method.

Fig. 6: Processing steps illustrated for m = 50 and K = 3
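The distance computation and majority vote of this testing phase can be sketched as follows. The four 2-dimensional training vectors and their labels are made up for illustration (the real collection holds 60 000 m-dimensional vectors), and squared Euclidean distance is assumed as the similarity metric.

```python
from collections import Counter

def knn_label(test_vec, train_vecs, train_labels, k=3):
    """Classify by majority vote among the k nearest training vectors."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(test_vec, tv)), lbl)  # squared Euclidean distance
        for tv, lbl in zip(train_vecs, train_labels)
    )
    votes = Counter(lbl for _, lbl in dists[:k])
    return votes.most_common(1)[0][0]

train = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels = [7, 7, 3, 3]
print(knn_label([0.05, 0.0], train, labels, k=3))  # 7 (two of the three nearest are 7s)
```

Every call scans the whole training collection, which is exactly why the method is slow and memory intensive at full scale.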

Combinations    K = 1     K = 3     K = 5     K = 7     K = 11    Computation time (s)
m = 10          61.156    60.656    62.286    63.546    61.496    71
m = 50          8.841     10.931    9.981     8.011     8.851     410
m = 100         5.261     5.381     6.161     6.451     6.501     837
m = 200         4.01      4.03      4.12      4.23      4.621     1474
m = 250         3.76      3.77      3.78      4.16      4.391     1786
m = 500         3.49      3.59      3.65      3.78      3.731     3348
m = 784         3.47      3.49      3.65      3.76      3.661     5907

Table 3: Error rate and computation time for various combinations of m and K

Increasing K does not seem to improve the error rate. And increasing m has a diminishing improvement
on the error rate. K-NN with m = 784 and K = 1 took nearly 1 hour and 40 minutes of computation
time to calculate labels for the 10 000 test vectors. This translates to about 0.60 seconds for each
test vector, which is too slow. A trade-off between computation time and error rate can be achieved
by picking a value of m for which the classification rate specifications are still met.

4 Conclusion
The SVM seems to be particularly well suited for digit recognition, as it has the best performance
when compared to the MLP or K-NN methods. The best error rates of the SVM, MLP and K-NN are
1.92, 2.09 and 3.47 respectively. SVM is not only accurate but also faster than the other methods
tested.

References
[1] Y. LeCun and C. Cortes, “MNIST handwritten digit database,” http://yann.lecun.com/exdb/mnist/, 2010.
[2] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions
on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011.