Image interpretation is one of the most common image processing activities. This document examines character recognition and the decision process that went into choosing the appropriate methods used to solve the problem.
2. REVIEW AND METHOD ............................................................ 8
   2.1 IMAGE PREPROCESSING ...................................................... 8
       2.1.1 GREY SCALE ......................................................... 8
       2.1.2 IMAGE FILTERING .................................................... 9
       2.1.3 BINARY THRESHOLDING ................................................ 9
   2.2 SEGMENTATION ............................................................ 11
   2.3 FEATURE EXTRACTION ...................................................... 11
   2.4 CLASSIFICATION / LABELLING .............................................. 13
3. IMPLEMENTATION .............................................................. 14
   3.1 APPLICATION USE CASE .................................................... 14
   3.2 APPLICATION CLASSES ..................................................... 15
   3.3 OBJECT COLLABORATION .................................................... 18
4. EVALUATION .................................................................. 20
5. CONCLUSION .................................................................. 20
6. REFERENCES .................................................................. 20
LIST OF FIGURES
Figure 1, Processing Steps ..................................................... 7
Figure 2, A contrast between a colour image and a grey scale image ............. 8
Figure 3, Thresholded uniformly illuminated image .............................. 9
Figure 4, Thresholded non-uniformly illuminated image ......................... 10
Figure 5, Contrast between Otsu and Chow and Kaneko ........................... 11
Figure 6, Use case diagram .................................................... 14
Figure 7, Overview of classes in the application .............................. 15
LIST OF TABLES
TODO: List of tables
LIST OF EQUATIONS
DECLARATION OF CONTRIBUTION
TODO Declaration of contribution
1. INTRODUCTION
1.1 PREAMBLE
An image is an artifact that depicts human perception; a digital image is a numeric representation, usually in the form of a 2D matrix. Vision allows humans to perceive and understand the world around them by interpreting images acquired using the eyes [1]. For humans, vision is a relatively simple task, but vision tasks that involve measurement and repetition are quite complex for humans to do. Computers, on the other hand, are very comfortable performing measurement and repetitive tasks, but interpreting digital images is more of a challenge for them.
This document describes work carried out in an attempt to use computers to interpret images of characters of different fonts in a scale- and rotation-invariant manner, and to do so with a great deal of accuracy even if the acquired image contains many unwanted artifacts and random noise, which could have been introduced by the image acquisition hardware or caused by the environment the image was acquired in.
Feature Extraction: Features are unique and consistent properties of an image that can be used to help classify or label it. The application extracts these properties to serve as inputs for the classifiers that label the image.

Classification/Labelling: This involves being able to determine or describe, as accurately as possible, the items or objects within a captured image. The application tries to identify the character within an image.
2.1.2 IMAGE FILTERING

Image filtering includes methods that enhance images for further computer vision operations. This application needs to reduce noise in the image; certain image details need to be emphasized and others suppressed. The filtering operations used are smoothing, erosion, and dilation.
2.1.3 BINARY THRESHOLDING

Although thresholding is usually considered a segmentation process, for the purpose of this application it is included as part of the pre-processing stage, simply to separate the objects in the image from the background. Equation 2 describes mathematically what happens when an image is thresholded:

    g(x, y) = 1 if f(x, y) ≥ T, 0 otherwise ------------------------------------- (2)

where T is the threshold value and f(x, y) is the grey level at point (x, y).
Figure 3 shows an image with uniform illumination that has been thresholded using the algorithm described in Equation 2 with T = 100. Consider also Figure 4, which shows a non-uniformly illuminated image thresholded by the same algorithm. It can be noted that the image is not properly segmented and much of it is lost. This is because the grey levels do not contain peaks that are distinct from the background across the entire image, and a single global value of T is not ideal for a non-uniformly illuminated image. Better techniques are therefore required; this class of technique is generally referred to as adaptive thresholding. Adaptive techniques use different values of T throughout the image based on local neighbourhood values. Equation 3 describes adaptive thresholding mathematically:

    T = T(x, y, p(x, y), f(x, y)) ------------------------------------------------ (3)

where f(x, y) is the grey level of point (x, y) in the original image and p(x, y) is some local property of the image used for adaptive thresholding.

Chow and Kaneko thresholding and Otsu thresholding were the adaptive techniques that were reviewed. The Chow and Kaneko method divides an image into a set of overlapping sub-images and then finds the optimum threshold value T for each sub-image from its histogram. The threshold value T for every single pixel is found by interpolating the results of the sub-images. This method is computationally expensive, which makes it unsuitable for real-time image processing.
Otsu thresholding belongs to a class of adaptive thresholding referred to as local thresholding. Otsu's method calculates the optimum threshold separating background and objects in an image so that their intra-class variance is very small. Figure 5 shows a contrast between an image segmented using the Chow and Kaneko method and one segmented using Otsu's method. Although the Chow and Kaneko method provides about the same results as Otsu's, Otsu's method was selected for the application because the Chow and Kaneko method is relatively slow and computationally expensive.
2.2 SEGMENTATION
Segmentation is the process of separating the objects within an image. There are a variety of ways in which this can be achieved. One method is to find all edges in the image, thus implying the borders of the shapes in it; another is to try to detect whole shapes, given that these shapes usually have distinctive properties compared with the background they are set against. For this application both methods were combined for optimal results; the second method has been described in section 2.1.3 as part of the preprocessing stage. This section describes the first method as a form of segmentation within the application. Although the techniques reviewed, Moore neighbour tracing and the Freeman chain code, are generally referred to as contour tracing methods, they still form a part of edge-based segmentation.
2.3 FEATURE EXTRACTION

An image moment is a particular weighted average of the image pixels' intensities, or a function of such moments, usually chosen to have some attractive property or interpretation. Image moments are useful for describing objects after segmentation. Simple properties of the image that can be found via image moments include its area (or total intensity), its centroid, and information about its orientation.

    M_ij = Σ_x Σ_y x^i y^j I(x, y) ----------------------------------------------- (4)
Equation 4 describes how moments are calculated from an image, where I(x, y) is the pixel intensity at point (x, y). It is possible to calculate moments that are invariant under translation, changes in scale, and rotation. The most frequently used are the Hu set of invariant moments.
    I1 = η20 + η02
    I2 = (η20 − η02)² + 4η11²
    I3 = (η30 − 3η12)² + (3η21 − η03)²
    I4 = (η30 + η12)² + (η21 + η03)²
    I5 = (η30 − 3η12)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²]
         + (3η21 − η03)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]
    I6 = (η20 − η02)[(η30 + η12)² − (η21 + η03)²] + 4η11(η30 + η12)(η21 + η03)
    I7 = (3η21 − η03)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²]
         − (η30 − 3η12)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²] ----------------- (5)

where η_pq are the normalized central moments. The first one, I1, is analogous to the moment of inertia around the image's centroid, where the pixels' intensities are analogous to physical density. The last one, I7, is skew invariant, which enables it to distinguish mirror images of otherwise identical images. In order to normalize the Hu moments and make them easier to process, Euclidean distances between the Hu moments are calculated:

    d(A, B) = √( Σ_{i=1..7} (Ii(A) − Ii(B))² ) ----------------------------------- (6)
Table XX shows a comparison between the feature extraction methods using KNN as a classifier, based on a combination of results from XXX and XXX. It is shown that the moment-based approach provided much better accuracy; hence the normalized Hu moments were selected as the feature extraction method for the application.
KNN was selected for the application because of its relative ease of use and because training consists simply of saving the feature vectors.
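A minimal sketch of this idea (the class name and the feature values below are invented for illustration; the application's actual classifier is OpenCV's KNN implementation):

```python
# K-nearest-neighbour in miniature: "training" is just storing labelled
# feature vectors; matching returns the majority label among the k
# stored vectors closest (in Euclidean distance) to the query.
import numpy as np

class KNNMatcher:
    def __init__(self):
        self.features, self.labels = [], []

    def train(self, vec, label):
        # Training = saving the feature vector with its label.
        self.features.append(np.asarray(vec, dtype=float))
        self.labels.append(label)

    def match(self, vec, k=1):
        vec = np.asarray(vec, dtype=float)
        dists = [np.linalg.norm(f - vec) for f in self.features]
        nearest = np.argsort(dists)[:k]
        votes = [self.labels[i] for i in nearest]
        return max(set(votes), key=votes.count)   # majority vote

knn = KNNMatcher()
knn.train([0.1, 0.2], ord('A'))   # labels are ASCII codes, as in the app
knn.train([0.9, 0.8], ord('B'))
assert knn.match([0.15, 0.25]) == ord('A')
```

The absence of a fitting step is exactly the "training just consists of saving the feature vector" property noted above; the cost is that every match scans all stored vectors.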
3. IMPLEMENTATION
Section 2 reviewed the various image processing algorithms and techniques used by this application. Many of these algorithms are easy to implement; however, this application does not implement them itself. Instead, open source implementations of these algorithms are used, packaged as part of OpenCV and accessed through a managed wrapper that makes them available in the .NET runtime. The advantages of using these already implemented libraries include:
1. The implementations have already been tested by a large community of engineers and most errors will have been identified and fixed, so they are more stable and reliable.
2. The implementation of the application can concentrate on solving the actual problem of character interpretation rather than on implementing low-level image processing algorithms, resulting in a shorter development time.
With emphasis on the problem the application is meant to solve, Figure 6 is a use case model of the application requirements. There are two main functional requirements for the application: training and matching. Training involves extracting the features of an image based on the techniques described in section 2 and saving them. Matching involves extracting the features of an input image and matching them to the closest character in the labelled feature set saved in the file, using KNN.
Train: This use case includes other use cases that help it achieve the goal of building a model that can serve as knowledge for the classifier.
Match: This use case also includes other use cases; its goal is to accept an input image and match it with the closest labelled features.
Extract Features: Extracts unique properties from the image that can be used for classification, as described in section 2.
Save Features: Saves the extracted feature vector to a file to allow for later retrieval.
Load Image: Loads an image from a physical file location into the application's memory for processing.
Preprocess Image: Performs all the image enhancement techniques.
KNN Match: Uses the saved features to find the nearest K matches to the input features using the KNN classifier.
Load Saved Features: Reads the saved features from the file and makes them available to the classifier.
Figure 7 gives an overview of all the entities within the application. The Features class stores the feature vectors used for building the knowledge model for each class of characters. The OCREngine class is the controller class and performs all the core operations within the application, while the MainForm class and ContourForm class are display classes that provide an interface for the users to interact with the application.

Features Class: The public fields HU1 to HU7 represent the values of the invariant Hu moments as described in section 2, while the methods GetE1() to GetE2() calculate the Euclidean distances between the Hu moments as described in section 2.

OCREngine Class: This class contains only static methods that are used to carry out each step in the application process; they serve as object-oriented wrappers for the OpenCV function calls.
ConvertToGreyScale: Wraps the OpenCV function that converts an RGB image to a grey scale image.
ConvertToBinary: Wraps the OpenCV function that converts the grey scale image to a binary image.
ExtractFeatures: Wraps the OpenCV functions that compute the invariant Hu moments described in section 2.
Smooth: Wraps the OpenCV function that smooths the image to reduce noise.
Filter: Wraps the OpenCV functions that perform the morphological operations, which are erosion and dilation.
FindContour: Wraps the OpenCV function that performs the contour tracing on the loaded image; it is also responsible for launching the ContourForm for each contour extracted.
Train: Saves the labelled feature vectors to build the classifier's knowledge model.
Match: Uses the KNN classifier to match the input features to the closest labelled features.
MainForm Class: This class is responsible for interacting with the user, especially to load the images for processing, and it is the first contact the application user has with the application. Figure 8 is a screenshot of this GUI class when rendered.
ContourForm Class: This class is responsible for interacting with the user to display the extracted features and to label the features extracted from the contours. Figure 9 is a screenshot of this GUI class when rendered.
Figures 10 and 11 show the interaction between the objects in the application for the training and matching use cases. The MainForm object is responsible for loading the image through user interaction, and it sends requests to the controller object to perform the processing stages; the user interface of the MainForm is updated after each step except the contour extraction step, where the controller object interacts with the ContourForm to display the contour. The ContourForm sends a message to the controller object to extract features, and the features are updated on the ContourForm's display. Depending on the use case, the ContourForm can send either a train or a match message to the controller object. If it is a train message, the ContourForm labels the features by accepting a user input, which is the ASCII code of the character; if a match message is sent, the controller object responds with the ASCII code of the matched character.