
UNIVERSITY OF BIRMINGHAM

Image processing and Computer Vision


Character Interpretation
Folorunsho Solomon Opeyemi 5/22/2013

Image interpretation is one of the most common image processing activities. This document examines character recognition and the decision process that went into choosing the appropriate methods used to solve the problem.

CONTENTS

LIST OF FIGURES
LIST OF TABLES
LIST OF EQUATIONS
DECLARATION OF CONTRIBUTION
1. INTRODUCTION
1.1 PREAMBLE
1.2 APPLICATION OVERVIEW
2. REVIEW AND METHOD
2.1 IMAGE PREPROCESSING
2.1.1 GREY SCALE
2.1.2 IMAGE FILTERING
2.1.3 BINARY THRESHOLDING
2.2 SEGMENTATION
2.3 FEATURE EXTRACTION
2.4 CLASSIFICATION / LABELLING
3. IMPLEMENTATION
3.1 APPLICATION USE CASE
3.2 APPLICATION CLASSES
3.3 OBJECT COLLABORATION
4. EVALUATION
5. CONCLUSION
6. REFERENCES

LIST OF FIGURES
Figure 1, Processing Steps
Figure 2, A contrast between a colour image and a grey scale image
Figure 3, Thresholded uniformly illuminated image
Figure 4, Thresholded non-uniformly illuminated image
Figure 5, Contrast between Otsu and Chow and Kaneko
Figure 6, Use case diagram
Figure 7, Overview of classes in the application
Figure 8, Screen shot of the MainForm
Figure 9, Screen shot of the ContourForm
Figure 10, Training Sequence Diagram
Figure 11, Matching Sequence Diagram

LIST OF TABLES
TO DO LIST OF TABLES

LIST OF EQUATIONS
Equation 1, Grey scale conversion
Equation 2, Binary thresholding
Equation 3, Adaptive thresholding
Equation 4, Image moments
Equation 5, Hu invariant moments
Equation 6, Euclidean distance between Hu moments

DECLARATION OF CONTRIBUTION
TODO Declaration of contribution

1. INTRODUCTION
1.1 PREAMBLE
An image is an artifact that depicts human perception; a digital image is a numeric representation, usually in the form of a 2D matrix. Vision allows humans to perceive and understand the world around them by interpreting images acquired using the eyes [1]. For humans the concept of vision is a relatively simple task, but vision tasks that involve measurement and repetition are quite complex for humans to perform. Computers, on the other hand, are very comfortable performing measurement and repetitive tasks, but interpretations of digital images are more of a challenge for computers.

This document describes work carried out in an attempt to use computers to interpret images of characters of different fonts in a manner that is scale and rotation invariant, and to do so with a great deal of accuracy even if the acquired image contains many unwanted artifacts and random noise, which could have been introduced by the image acquisition hardware or by the environment in which the image was acquired.

1.2 APPLICATION OVERVIEW


Figure 1 shows the steps employed in achieving the objectives of the proposed application. It is assumed that these steps would be followed in order; although that is not always the case, considering the nature of the project most of the steps would be performed in the stated order.

Image Acquisition: This is the process where an image is digitized and stored on a computer; it is the start of any image processing process. For this application, standard test images acquired by Microsoft Research India and hosted by the University of Surrey, UK, were used. The benefit of using standard test images is to ensure that the outcome of this work can be tested against other work and easily verified.

Image Pre-processing: This is often concerned with emphasizing the important features of a captured image; it includes activities such as resizing, filtering etc.

Image Segmentation: This is the process of breaking up a digital image into multiple regions; the purpose of this process is to help identify regions of interest in an image. This is where the application is able to separate the image into different areas, for example distinguishing a coffee cup ring in the image from the actual character in the image.

Feature Extraction: Features are unique and consistent properties of an image that can be used to help classify or label it. The application extracts these properties to serve as inputs to the classifiers to help label the image.

Classification/Labelling: This involves being able to determine or describe as accurately as possible the items or objects within a captured image. The application tries to identify the character within an image.

Figure 1, Processing Steps

2. REVIEW AND METHOD


For all the steps described in section 1.2 there is more than one method of implementation. In some cases the decision of which method to choose is easy to arrive at, while in other cases it is a matter of trade-offs such as speed, data storage, reliability and accuracy. This section describes the methods that were considered at every processing stage the application goes through, the methods that were finally chosen, and why they were picked.

2.1 IMAGE PREPROCESSING


Image pre-processing is an essential part of any image processing application, and effective pre-processing leads to better performance. For this application the pre-processing tasks performed were grey scale conversion, image filtering and binary thresholding. As mentioned earlier, the decisions of which methods to use at this stage were trivial and easy to arrive at.

2.1.1 GREY SCALE
The images were acquired as three-channel coloured 24-bit-per-pixel images. Such an image is fairly large and can make computation more complex; moreover, most image processing algorithms work best on grey scale images, hence the need to convert the acquired images to grey. Equation 1 describes how to obtain a weighted average of the three channels, which represents the grey level of the image, where R(x, y), G(x, y) and B(x, y) are the R, G and B tricolour components of the input colour image (the weights shown are the standard ITU-R BT.601 luminance weights):

f(x, y) = 0.299 R(x, y) + 0.587 G(x, y) + 0.114 B(x, y) ------------------ (1)

Figure 2, A contrast between a colour image and a grey scale image
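As a minimal sketch of this conversion (the application itself calls OpenCV through a .NET wrapper; the equivalent OpenCV Python binding is shown here for brevity, and "character.png" is a placeholder file name):

    import cv2

    # Load an acquired test image as a three-channel colour image
    # ("character.png" is a placeholder for one of the test images).
    image = cv2.imread("character.png", cv2.IMREAD_COLOR)

    # cv2.cvtColor applies the weighted average of Equation 1;
    # OpenCV uses the same BT.601 weights (0.299, 0.587, 0.114).
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)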

2.1.2 IMAGE FILTERING
Image filtering includes methods to enhance images for further computer vision operations. This application needs to reduce noise in the image; certain image details need to be emphasized and others suppressed. The filtering operations used were smoothing, erosion and dilation.
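A sketch of these three operations using the OpenCV Python binding (the kernel sizes are illustrative choices, not values prescribed by this report):

    import cv2
    import numpy as np

    grey = cv2.imread("character.png", cv2.IMREAD_GRAYSCALE)

    # Gaussian smoothing to suppress random noise
    # (the 5x5 kernel size is an illustrative choice).
    smoothed = cv2.GaussianBlur(grey, (5, 5), 0)

    # A 3x3 structuring element for the morphological operations.
    kernel = np.ones((3, 3), np.uint8)

    # Erosion shrinks foreground regions, removing small specks;
    # dilation grows them back, closing small gaps in strokes.
    eroded = cv2.erode(smoothed, kernel, iterations=1)
    dilated = cv2.dilate(eroded, kernel, iterations=1)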

2.1.3 BINARY THRESHOLDING
Although this is usually considered a segmentation process, for the purpose of this application it is included as part of the pre-processing stage, to separate the objects in the image from the background. Equation 2 describes mathematically what happens when an image is thresholded, where T is the threshold value:

g(x, y) = 1 if f(x, y) > T, and g(x, y) = 0 otherwise ------------------- (2)

Figure 3, Thresholded uniformly illuminated image

Figure 4, Thresholded non-uniformly illuminated image

Figure 3 shows an image with uniform illumination that has been thresholded using the algorithm described in Equation 2 with T = 100. Consider also Figure 4, which shows a non-uniformly illuminated image thresholded by the same algorithm. It can be noted that the image is not properly segmented and a lot of the image is lost; this is because the grey levels do not contain peaks that are distinct from the background across the entire image, and a global value of T is not ideal for a non-uniformly illuminated image. Better techniques are therefore required. This class of techniques is generally referred to as adaptive thresholding; such techniques use different values of T throughout the image based on local neighbourhood values. Equation 3 describes adaptive thresholding mathematically, where f(x, y) is the grey level of point (x, y) in the original image and p(x, y) is some local property of the point's neighbourhood:

T = T[x, y, p(x, y), f(x, y)] ----------------------------------------- (3)

Chow and Kaneko thresholding and Otsu thresholding were the adaptive techniques reviewed. The Chow and Kaneko method divides an image into a set of overlapping sub-images and then finds the optimum threshold value T for each sub-image from its histogram; the threshold value T for every single pixel is found by interpolating the results of the sub-images. This method is computationally expensive, which makes it unsuitable for real-time image processing. The Otsu method automatically calculates the optimum threshold separating background and objects in an image by minimizing the variance within each of the two classes. Figure 5 shows a contrast between an image segmented using the Chow and Kaneko method and one segmented using the Otsu method. Although the Chow and Kaneko method provides about the same results as Otsu, Otsu was selected for the application because the Chow and Kaneko method is relatively slow and computationally expensive.
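A brief sketch of both global and Otsu thresholding with the OpenCV Python binding (T = 100 matches the value used for Figure 3; OpenCV has no built-in Chow and Kaneko implementation, so only the selected method is shown):

    import cv2

    grey = cv2.imread("character.png", cv2.IMREAD_GRAYSCALE)

    # Global thresholding (Equation 2) with the fixed T = 100
    # used for Figure 3.
    _, global_binary = cv2.threshold(grey, 100, 255, cv2.THRESH_BINARY)

    # Otsu's method: the threshold argument is ignored and the
    # optimum T is computed from the image histogram.
    otsu_T, otsu_binary = cv2.threshold(
        grey, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    print("Otsu's optimum threshold:", otsu_T)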


Figure 5, Contrast between Otsu and Chow and Kaneko

2.2 SEGMENTATION
Segmentation is the process of separating the objects within an image, and there is a variety of ways in which this can be achieved. One method is to find all edges in the image, thus implying the borders of shapes in the given image; another is to try to detect whole shapes, given that these shapes usually tend to have distinctive properties compared with the background they are set against. For this application both methods were combined for optimal results; the second method has been described in section 2.1.3 as part of the pre-processing stage. This section describes the first method as a form of segmentation within the application. Although the techniques reviewed, Moore neighbour tracing and the Freeman chain code, are generally referred to as contour tracing methods, they still form a part of edge-based segmentation.
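As an illustrative sketch with the OpenCV Python binding (note that OpenCV's findContours performs border following via the Suzuki-Abe algorithm rather than Moore tracing itself, so this is a stand-in for the techniques named above):

    import cv2

    # "character_binary.png" is a placeholder for a thresholded image.
    binary = cv2.imread("character_binary.png", cv2.IMREAD_GRAYSCALE)

    # CHAIN_APPROX_NONE keeps every boundary point, much like a
    # stored chain code. OpenCV 4.x return signature shown.
    contours, _ = cv2.findContours(
        binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

    # Each outer contour is a candidate character region.
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        print("region at", (x, y), "with size", (w, h))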

2.3 FEATURE EXTRACTION


A typical image contains a lot of data; the process of simplifying this large amount of data down to a set of information that can to a good degree identify the image is referred to as feature extraction. For most image identification applications this is a very important step, and the features used and the extraction technique have to be carefully selected. For this application, shape contexts, the scale-invariant feature transform and image moments were reviewed.

Shape Contexts (SC) (Belongie et al., 2002) is a descriptor for point sets and binary images. Points are sampled using the Sobel edge detector. The descriptor is a log-polar histogram, which gives a q×n vector, where q is the angular resolution and n is the radial resolution.

Scale Invariant Feature Transform (SIFT) (Lowe, 1999) features are extracted at points located by the Harris Hessian-Laplace detector, which gives affine transform parameters. The feature descriptor is computed as a set of orientation histograms on 4×4 pixel neighbourhoods. The orientation histograms are relative to the key-point orientation. The histograms contain 8 bins each, and each descriptor contains a 4×4 array of 16 histograms around the key-point. This leads to a feature vector with 128 elements.

Image moments are particular weighted averages (moments) of the image pixels' intensities, or functions of such moments, usually chosen to have some attractive property or interpretation. Image moments are useful for describing objects after segmentation. Simple properties of the image found via image moments include its area (or total intensity), its centroid, and information about its orientation.

M_ij = Σ_x Σ_y x^i y^j I(x, y) ------------------------------------------ (4)

Equation 4 describes how moments are calculated from an image, where I(x, y) is the intensity of the pixel at (x, y). It is possible to calculate moments which are invariant under translation, changes in scale, and also rotation. Most frequently used is the Hu set of invariant moments, computed from the normalized central moments η_pq:

I_1 = η_20 + η_02
I_2 = (η_20 − η_02)^2 + 4 η_11^2
I_3 = (η_30 − 3η_12)^2 + (3η_21 − η_03)^2
I_4 = (η_30 + η_12)^2 + (η_21 + η_03)^2
I_5 = (η_30 − 3η_12)(η_30 + η_12)[(η_30 + η_12)^2 − 3(η_21 + η_03)^2] + (3η_21 − η_03)(η_21 + η_03)[3(η_30 + η_12)^2 − (η_21 + η_03)^2]
I_6 = (η_20 − η_02)[(η_30 + η_12)^2 − (η_21 + η_03)^2] + 4 η_11 (η_30 + η_12)(η_21 + η_03)
I_7 = (3η_21 − η_03)(η_30 + η_12)[(η_30 + η_12)^2 − 3(η_21 + η_03)^2] − (η_30 − 3η_12)(η_21 + η_03)[3(η_30 + η_12)^2 − (η_21 + η_03)^2] ----- (5)

The first one, I_1, is analogous to the moment of inertia around the image's centroid, where the pixels' intensities are analogous to physical density. The last one, I_7, is skew invariant, which enables it to distinguish mirror images of otherwise identical images. In order to normalize the Hu moments and make them easier to process, Euclidean distances between pairs of Hu moments are calculated:

E(I_i, I_j) = √((I_i − I_j)^2) = |I_i − I_j| ------------------------------ (6)
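A minimal sketch of this feature extraction using the OpenCV Python binding (the pairwise-distance reading of Equation 6 is an assumption based on the description above):

    import cv2
    import numpy as np
    from itertools import combinations

    binary = cv2.imread("character_binary.png", cv2.IMREAD_GRAYSCALE)

    # Raw, central and normalized central moments (Equation 4).
    moments = cv2.moments(binary)

    # The seven Hu invariant moments (Equation 5).
    hu = cv2.HuMoments(moments).flatten()

    # Euclidean distances between pairs of Hu moments (Equation 6);
    # 7 moments give 21 pairwise distances as the feature vector.
    features = np.array([abs(a - b) for a, b in combinations(hu, 2)])
    print(features.shape)  # (21,)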

FEATURE EXTRACTION METHOD    % ACCURACY
SC                           64.83
SIFT                         46.94
HU                           70*

Table XX shows a comparison between the feature extraction methods using KNN as a classifier, based on a combination of results from XXX and XXX. It shows that the moment-based approach provided much better accuracy; hence the normalized Hu moments were selected as the feature extraction method for the application.

2.4 CLASSIFICATION / LABELLING


Features can be considered as knowledge representation; the process of interpreting this knowledge for either matching or clustering is referred to as classification. There are essentially two main activities involved in the classification process, training and matching, and most classification algorithms are only as powerful as the amount of training they have been exposed to. There are two categories of classification algorithms, supervised and unsupervised. Supervised algorithms require that samples of the features and their label or class are initially used to train the algorithm, while unsupervised algorithms do not require this initial training. For this application, supervised learning algorithms were reviewed: Support Vector Machines, Artificial Neural Networks and K-Nearest Neighbour.

Support Vector Machines belong to a group of classifiers referred to as binary classifiers because they can be trained with a set of inputs to classify into just two classes. A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class (the so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier.

An Artificial Neural Network is a mathematical model inspired by the human brain's neural networks. It consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation. In most cases a neural network is an adaptive system, changing its structure during a learning phase. Neural networks are used for modelling complex relationships between inputs and outputs or for finding patterns in data.

K-Nearest Neighbour (KNN) is one of the simplest classification techniques; it merely stores all the training data points. To classify a new point, its K nearest points are looked up (for K an integer number) and the new point is labelled according to which class contains the majority of its K neighbours. It is often effective, but it is slow and requires a lot of memory.

KNN was selected for the application because of its relative ease of use, and because training just consists of saving the feature vectors.
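A sketch of training and matching with OpenCV's built-in KNN (Python binding; the random placeholder data, the 21-element feature layout and K = 3 are illustrative assumptions):

    import cv2
    import numpy as np

    # Random arrays stand in for real labelled feature vectors:
    # one row per training sample, labels are ASCII codes 'A'..'Z'.
    train_features = np.random.rand(100, 21).astype(np.float32)
    train_labels = np.random.randint(65, 91, (100, 1)).astype(np.float32)

    # "Training" KNN just stores the labelled feature vectors.
    knn = cv2.ml.KNearest_create()
    knn.train(train_features, cv2.ml.ROW_SAMPLE, train_labels)

    # Matching: find the K nearest stored vectors and take the
    # majority label as the recognized character (K = 3 here).
    sample = np.random.rand(1, 21).astype(np.float32)
    _, results, _, _ = knn.findNearest(sample, 3)
    print("matched character:", chr(int(results[0][0])))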


3. IMPLEMENTATION
Section 2 reviewed the various image processing algorithms and techniques used by this application; many of these algorithms are easy to implement. This application does not implement the algorithms itself; rather, open source implementations packaged as part of OpenCV are used, through a managed wrapper that makes OpenCV available in the .NET runtime. The advantages of using these already implemented libraries include:

1. The implementations are already tested by a large community of engineers and most errors would have been identified and fixed, so they are more stable and reliable.
2. The implementation of the application can concentrate on solving the actual problem of character interpretation rather than on implementing low-level image processing algorithms, leading to a shorter development time.

3.1 APPLICATION USE CASE

Figure 6, Use case diagram

With emphasis on the problem the application is meant to solve, Figure 6 is a use case model of the application requirements. There are two main functional requirements for the application: training and matching. Training involves extracting the features of an image based on the techniques described in section 2.3 and saving them. Matching involves extracting the features of an input image and matching its features to the closest character in the labelled feature set saved in the file, using KNN.


Train: This use case includes other use cases that help it achieve the goal of building a model that can serve as knowledge for the classifier.
Match: This use case also includes other use cases; its goal is to accept an input image and match it with the closest labelled features.

Extract Features: This extracts unique properties from the image that can be used for classification, as described in section 2.
Save Features: This saves the extracted feature vector to a file to allow for later retrieval.
Load Image: This loads an image from a physical file location into the application memory for processing.
Preprocess Image: This performs all the image emphasis techniques described in section 2.1.
KNN Match: This uses the saved features to find the nearest K matches to the input features using the KNN classifier.
Load Saved Features: This reads the saved features from the file and makes them available to the classifier.

3.2 APPLICATION CLASSES

Figure 7, Overview of classes in the Application.

Figure 7 gives an overview of all the entities within the application. The Features class stores the feature vectors that are used for building the knowledge model for each class of characters; the OCREngine class is the controller class and performs all the core operations within the application, while the MainForm class and CountourForm class are display classes that provide an interface for the users to interact with the application.

Features Class: The public fields HU1 to HU7 represent the values of the invariant Hu moments as described in section 2, while the GetE methods calculate the Euclidean distances between the Hu moments as described in section 2.

OCREngine Class: This class contains only static methods that are used to carry out each step in the entire application process; they serve as object-oriented wrappers for the OpenCV function calls.
ConvertToGreyScale: This wraps the OpenCV function that converts an RGB image to a grey scale image.
ConvertToBinary: This wraps the OpenCV function that converts the grey scale image to a binary image.
ExtractFeatures: This wraps the OpenCV functions that compute the image moments and derive the invariant Hu moments used as the feature vector.
Smooth: This wraps the OpenCV smoothing function used to reduce noise in the image.
Filter: This wraps the OpenCV functions that perform the morphological operations, which are erosion and dilation.
FindCountour: This wraps the OpenCV function that performs the contour tracing on the loaded image; it is also responsible for launching the CountourForm for each contour extracted.
Train: This labels the extracted features and saves them as training data for the classifier.
Match: This matches the extracted features of an input image against the saved features using the KNN classifier.

MainForm Class: This class is responsible for interacting with the user, especially for loading the images for processing, and it is the first contact the application user has with the application; Figure 8 is a screen shot of this GUI class when rendered.
CountourForm Class: This class is responsible for interacting with the user to display the extracted features and to label the extracted features from the contours extracted; Figure 9 is a screen shot of this GUI class when rendered.


Figure 8, Screen Shot of the MainForm

Figure 9, Screen Shot of the ContourForm


3.3 OBJECT COLLABORATION

Figure 10, Training Sequence Diagram

Figure 11, Matching Sequence Diagram


Figure 10 and Figure 11 show the interaction between the objects in the application for the training and matching use cases. The MainForm object is responsible for loading the image through user interaction, and it sends requests to the controller object to perform the processing stages; the user interface of the MainForm is updated after each step, except after the contour extraction step, where the controller object interacts with the contour form to display the contour. The ContourForm then sends a message to the controller object to extract features, and the features are updated on the ContourForm's display. Depending on the use case, the ContourForm can send either a train or a match message to the controller object: if it is a train message, the ContourForm labels the features by accepting a user input, which is the ASCII code of the character; if a match message is sent, the controller object responds with the ASCII code of the matched character.


4. EVALUATION

5. CONCLUSION

6. REFERENCES

