
INVARIANT MEASURES OF IMAGE FEATURES FROM PHASE INFORMATION

This thesis is presented to the Department of Psychology for the degree of Doctor of Philosophy of the University of Western Australia

By Peter Kovesi
May 1996

© Copyright 1996 by Peter Kovesi


Abstract
Invariant Measures of Image Features From Phase Information

If reliable and general computer vision techniques are to be developed it is crucial that we find ways of characterizing low-level image features with invariant quantities. For example, if edge significance could be measured in a way that was invariant to image illumination and contrast, higher-level image processing operations could be conducted with much greater confidence. However, despite their importance, little attention has been paid to the need for invariant quantities in low-level vision for tasks such as feature detection or feature matching.

This thesis develops a number of invariant low-level image measures for feature detection, local symmetry/asymmetry detection, and for signal matching. These invariant quantities are developed from representations of the image in the frequency domain. In particular, phase data is used as the fundamental building block for constructing these measures. Phase congruency is developed as an illumination and contrast invariant measure of feature significance. This allows edges, lines and other features to be detected reliably, and fixed thresholds can be applied over wide classes of images. Points of local symmetry and asymmetry in images give rise to special arrangements of phase, and these too can be characterized by invariant measures. Finally, a new approach to signal matching that uses correlation of local phase and amplitude information is developed. This approach allows reliable phase based disparity measurements to be made, overcoming many of the difficulties associated with scale-space singularities.


Acknowledgements
First of all I would like to thank my supervisors John Ross and James Trevelyan. With their gentle guidance and encouragement, the odd searching question, and the occasional nudge, they ensured that progress was always maintained. In each of them I have also greatly valued their enormous breadth of knowledge that spanned many disciplines. This helped me keep my thoughts open and wide ranging as I searched for answers to my problems.

I must also thank my other supervisor, my wife Robyn Owens, for an uncountable number of technical discussions, for her proof-reading skills, and for always being there and making the generation of this thesis far less traumatic than I would have dared to hope for. I thank Grace, Genevieve, and later in the generation of this thesis, Gabriel for their tolerance and patience while Daddy did his Pee-Aiche-Dee.

I would also like to acknowledge the many hours of useful discussions I have had with Ben Robbins, Chris Pudney, Mike Robins, and Adrian Baddeley. Ben Robbins pointed out the efficiencies that can be made in the Fourier convolution of an image with a quadrature pair of filters. This must have saved me many hours of waiting and allowed me to do many more experiments than I would have done otherwise.

Others I must thank include the following: Daniel Reisfeld, who introduced me to the problem of finding local symmetry in images, resulting in many long and impassioned discussions on the subject; Concetta Morrone for her amazing grasp of both the psychophysics and computer vision literature, and therefore always being able to suggest yet another paper I should read; Carlo Tomasi for his help in converting an early version of my phase congruency code from C to a MATLAB script; Olivier Faugeras and his colleagues for their hospitality and the fine working environment they have developed at INRIA in Sophia Antipolis, which I was able

to enjoy during my visit there during the first half of 1995. Finally I thank everyone in The Robotics and Vision Research Group in the Department of Computer Science at The University of Western Australia for the enjoyable working environment that they contribute to.


Contents
Abstract

Acknowledgements

1 Introduction
  1.1 The Need for Invariant Quantities in Images
  1.2 The Approach
  1.3 Contributions
  1.4 Thesis Overview

2 Image features
  2.1 Introduction
  2.2 Gradient based feature detection
  2.3 Local energy and phase congruency
      2.3.1 Defining phase congruency
      2.3.2 Local energy
  2.4 Issues in calculating phase congruency
  2.5 Summary

3 Phase congruency from wavelets
  3.1 Introduction
  3.2 Using Wavelets for Local Frequency Analysis
  3.3 Calculating Phase Congruency Via Wavelets
  3.4 Noise
  3.5 Extension to two dimensions
      3.5.1 2D filter design
      3.5.2 Filter orientations
      3.5.3 Noise compensation in two dimensions
      3.5.4 Combining data over several orientations
  3.6 The importance of frequency spread
  3.7 Scale via high-pass filtering
      3.7.1 Difficulties with low-pass filtering
      3.7.2 High-pass filtering
      3.7.3 High-pass filtering and scale-space
  3.8 Experimental Results
  3.9 Summary

4 A second look at phase congruency
  4.1 Introduction
  4.2 Log Gabor wavelets
  4.3 Phase congruency from broad bandwidth filters
  4.4 Another way of defining phase congruency
      4.4.1 Calculation of PC2 via quadrature pairs of filters
  4.5 A third measure of phase congruency
      4.5.1 Calculation of PC3 via quadrature pairs of filters
  4.6 Biological computation of phase congruency
  4.7 Symmetry and Asymmetry: Special patterns of phase
      4.7.1 Introduction
      4.7.2 A frequency approach to symmetry
      4.7.3 Biological computation of symmetry and asymmetry
  4.8 Summary

5 Representation and matching of signals
  5.1 Introduction
  5.2 Spatial Correlation
  5.3 Phase Based Disparity Measurement
  5.4 Matching Using Localized Frequency Data
  5.5 Using Phase to Guide Matching
  5.6 Determining Relative Signal Distortion
  5.7 Conclusion

6 Conclusion
  6.1 Contributions
  6.2 Future Work

Bibliography

A Portfolio of experimental results
  A.1 Introduction
  A.2 Portfolio
      A.2.1 Image acknowledgements
  A.3 Parameter variations

B Noise models and noise compensation
  B.1 Introduction
  B.2 Noise generators
  B.3 Noise spectra measured from images
  B.4 Discussion

C Non-maximal suppression
  C.1 Introduction
  C.2 Non-maximal suppression using feature orientation information
  C.3 Orientation from the feature image
  C.4 Morphological approaches
  C.5 Conclusion

D Implementation details
  D.1 MATLAB Implementation


Chapter 1 Introduction
1.1 The Need for Invariant Quantities in Images

This thesis is concerned with the search for measures of image features that remain constant over wide ranges of viewing conditions. Such invariant quantities provide powerful tools for the analysis of images, allowing image processing algorithms to work more reliably and over wider classes of images. The work presented in this thesis concentrates on invariant quantities in low-level or early vision.

Some effort has been devoted to investigating invariant measures of higher level structures in images; for example, Hu [37] developed a series of invariant moments for recognizing binary objects. More recently there has been considerable interest in geometric invariance, the study of geometric properties of objects that remain invariant to imaging transformations. A collection of papers in this area can be found in the book by Mundy and Zisserman [63]. However, little attention has been paid to the invariant quantities that might exist in low-level or early vision for tasks such as feature detection or feature matching. Some limited exceptions to this include the work of Koenderink and van Doorn [44, 45], who recognized the importance of differential invariants associated with motion fields, and Florack et al. [28], who propose differential invariants for characterizing a number of image contour properties. However, in general, interest in low-level image invariants has been limited. This is surprising considering the fundamental importance of being able to obtain reliable results from low level image operations in order to successfully


perform any higher level operations.

There are two main points about an invariant measure: firstly, of course, it must be dimensionless (that is, have no units attached to it), and secondly, it should represent some meaningful and useful quality. If it does not represent some meaningful quality one has no idea how to use it. It is easy to construct a dimensionless quantity that is meaningless, for example, the ratio of my height to the width of the letter 'o'. It is also easy to find measures that are useful but not dimensionless, for example, the speed of your car. However, it is often hard to define something that is both dimensionless and useful.

Why do we want to find invariant quantities? Quantities that are useful but not dimensionless are generally only useful because they are applied in relatively structured environments and at a specific scale. For example, using the speed of your car to decide whether you are driving safely only works because most cars are similar in size, roadways are standardized and gravitational forces are effectively constant. Images, on the other hand, provide a very dynamic and unstructured environment in which we struggle to make our algorithms operate. Objects can appear with arbitrary orientation and spatial magnification along with arbitrary brightness and contrast. Thus the search for invariant quantities is very important for computer vision.

It is all too easy to forget that a number inside a computer often has units associated with it. The fact that a number has units associated with it imposes some constraints on how it should be used. For example, it does not make sense to add a quantity representing time to one representing a length. Despite this, it is quite common to find such nonsensical combinations of quantities in the computer vision literature. There are many algorithms that involve the minimization of some energy; often the energy is defined to be the sum of many components, each having very different units.
For example, energy minimizing splines (snakes) are usually formulated in terms of the minimization of an energy that is made up of an intensity gradient term and a spline bending term [42]. These two components, while representing meaningful quantities, are not dimensionless. This means that for energy minimizing splines to be effective their parameters have to be tuned


carefully for each individual application. The parameters are used to balance the relative importance of individual components of the overall energy. If, say, the overall image contrast was halved, one would need to double the weighting applied to the intensity gradient term to retain the same snake behaviour. If one was to somehow replace the intensity gradient and spline bending terms with dimensionless quantities that represented, in some way, the closeness of the spline to a feature and the deformation of the spline, one would be able to use fixed parameters over wider classes of images. Clearly there is a pressing need for the identification of low level invariant quantities in images.

The main form of invariance that will be investigated in this thesis is invariance to image illumination and contrast. That is, this thesis will be seeking to construct low level image measures that have a response that is independent of the image illumination and/or contrast.
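The contrast-scaling argument can be made concrete with a small numeric sketch. All numbers here are hypothetical, and the snake energy is reduced to a schematic two-term form; the point is only that a mixed-units energy changes behaviour under a global contrast change, while a normalized (dimensionless) term does not:

```python
def snake_energy(grad, bending, lam):
    # Schematic snake energy: E = -lambda * |gradient| + bending.
    # 'grad' carries intensity units, 'bending' geometric units.
    return -lam * grad + bending

grad, bending, lam = 80.0, 10.0, 1.0   # arbitrary illustrative numbers
e_original = snake_energy(grad, bending, lam)
e_half_contrast = snake_energy(grad / 2, bending, lam)
assert e_half_contrast != e_original   # halving contrast changes the balance

# Doubling the weighting restores the original behaviour -- per-image tuning.
assert snake_energy(grad / 2, bending, 2 * lam) == e_original

# A dimensionless "closeness" term, e.g. the gradient normalized by the
# image's mean gradient magnitude, is unaffected by a global contrast change:
mean_grad = 40.0
closeness = grad / mean_grad
closeness_half = (grad / 2) / (mean_grad / 2)
assert closeness == closeness_half
```

The normalized quantity needs no re-tuning precisely because the contrast factor cancels in the ratio.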

1.2 The Approach

In the search for low level invariant quantities in images the approach taken in this thesis is to make use of data from representations of the image in the frequency domain. Working directly in the spatial domain is avoided for two reasons. Firstly, the spatial domain of an image, while convenient and intuitive, almost always forces one into making use of dimensional measures in the analysis of an image; it is hard to get away from the use of intensity gradients, contrast levels or equivalent quantities. Secondly, low level spatial techniques have been extensively researched, and while one cannot say all possibilities have been exhausted, the opportunities for the development of significantly new techniques appear limited.

The most logical alternative approach is to consider representations of the image in the frequency domain; much of the psychophysical literature in visual perception has been devoted to the development of models in this domain. However, these psychophysical models have generally not been developed to the point where they could be implemented as algorithms in a computer vision system.

With an image represented in terms of the variation of amplitude and phase


values with frequency, one has a number of new and interesting possibilities in the analysis of image signals. However, so far in the computer vision literature, very little work has been done on the use of frequency data to recognize and characterize features in signals. Some notable exceptions include the following: Granlund [30], who proposed a multiscale Fourier transform approach to the analysis of images; Knutsson, Wilson and Granlund [43], who developed these ideas further for image coding and the restoration of noisy images; Morrone and Owens [61], who use phase congruency as a means of finding image features; Fleet and Jepson [26], who used phase to determine image velocities; and Langley, Atherton and Wilson [51], Fleet, Jepson and Jenkin [27] and Calway, Knutsson and Wilson [9], who have investigated the use of phase information to estimate image disparities. Jones and Malik [40, 41] have also used local frequency information for determining disparity, though they do not directly use phase information.

In this thesis considerable effort is devoted to understanding the variations of phase and amplitude over frequency for different image features. In particular, phase data is used as the fundamental building block for the various low-level invariant feature measures that are developed in this thesis. Phase information is an ideal starting point for the development of invariant measures for two reasons. Firstly, phase itself is a dimensionless quantity, and secondly, phase information has been shown to be crucial in the perception of images [65]. This is discussed further in Chapter 2.
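The appeal of phase as a building block can be seen in a few lines: under a global contrast change the Fourier amplitudes of a signal rescale, while its phases are untouched. A minimal sketch using a naive discrete Fourier transform (the signal values are arbitrary illustrative numbers):

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform -- O(N^2), enough for a short demo."""
    N = len(signal)
    return [sum(signal[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

signal = [0.0, 1.0, 3.0, 2.0, 0.5, -1.0, -2.0, -0.5]
bright = [3.0 * s for s in signal]          # global contrast change

for F, G in zip(dft(signal), dft(bright)):
    # Amplitude scales with contrast (a dimensioned quantity)...
    assert abs(abs(G) - 3.0 * abs(F)) < 1e-9
    # ...but phase, being dimensionless, is invariant to it.
    if abs(F) > 1e-9:
        assert abs(cmath.phase(G) - cmath.phase(F)) < 1e-9
```

This contrast invariance of phase is the property that the measures developed in later chapters are built upon.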

1.3 Contributions

This thesis develops a number of invariant frequency based low-level image quantities for feature detection, local symmetry/asymmetry detection, and for signal matching. Most of this thesis is devoted to investigating the use of congruency of the local phase over many scales as an illumination and contrast invariant measure of feature significance at points in images. Phase congruency was first proposed by Morrone et al. [62] and Morrone and Owens [61] as a computational model of the


perception of low-level features such as step edges, lines, and Mach bands in images. However, due to practical difficulties in calculating phase congruency they developed the use of a related quantity, local energy, for feature detection instead.

The main contribution of this thesis is to establish the importance of phase congruency's invariance to illumination and contrast and to develop a practical implementation of it. The goal of an illumination and contrast independent feature detector is achieved and its reliable performance over a wide range of images using fixed thresholds is demonstrated.

In achieving this goal a number of other contributions are developed. These include an effective noise compensation technique, something that is often essential when normalized image measures such as phase congruency are used. This noise compensation technique makes minimal assumptions about the nature of the image noise and can be applied to any image processing technique that makes use of banks of filters over several scales.

Another contribution is the recognition of the importance of the spread of frequencies that are present at each point in a signal when one is considering phase congruency. For phase congruency to be used as a measure of feature significance it must be weighted by some measure of the spread of frequencies present. A method for doing this is presented.

Also presented is an argument that when a frequency based approach is used in the analysis of images a more logical interpretation of scale is obtained by using high-pass filtering rather than low-pass or band-pass filtering. This approach results in feature positions remaining stable over different scales of analysis, something that is not achieved with low-pass or band-pass filtering.

Another contribution is the recognition that points of local symmetry and asymmetry in images also give rise to special arrangements of phase, and these can be readily detected.
The new measures of local image symmetry and asymmetry that are developed are unique in that they are dimensionless and that they do not require any previous image segmentation to have taken place prior to analysis.

Finally, with the insights obtained from this work in the use of phase for feature detection, a new approach to the matching of signals is developed. This technique


uses correlation of local phase and amplitude information, rather than spatial intensity data, for matching. An advantage of this new method is that it also allows the disparity between points in stereo images to be estimated.

1.4 Thesis Overview

Chapter 2 reviews the major approaches that have been used for low-level edge detection and discusses their shortcomings. The main problems are that existing approaches use very simple edge models, and that one cannot know in advance of applying the edge operator what level of edge response will be significant. That is, edge thresholds for individual images have to be set interactively by viewing the output. The local energy and phase congruency model of feature perception is then introduced and previous work in this area is reviewed. A new geometric interpretation of phase congruency is provided and it is argued that phase congruency, rather than local energy, should be used to identify features in images because it is a dimensionless quantity. However, while phase congruency appears to be an attractive measure to use there are some difficulties in calculating it, and the chapter concludes by identifying these problems:

- Phase congruency can be defined in 1D but it is not clear how it should be calculated in 2D.
- Being a normalized quantity, phase congruency responds strongly to noise in signals.
- Phase congruency is only meaningful if there is a spread of frequency components present in a signal; how should this spread be measured?
- Phase congruency appears to require a different interpretation of scale, suggesting that high-pass filtering rather than low-pass filtering should be used.

Chapter 3 sets out to develop a practical method for calculating phase congruency in images. The first requirement is to identify an appropriate method of obtaining local frequency information in images. Complex Gabor wavelets are


adopted for this purpose. It is then shown how phase congruency in 1D signals can be readily calculated from the convolution outputs of a bank of complex Gabor filters. The problem of noise is then considered and a method of automatically recognizing, and compensating for, the influence of noise on phase congruency values in an image is devised. This is followed by a section covering the issues involved in extending the calculation of phase congruency to 2D images. It is then shown how the use of wavelets allows us to obtain a measure of the spread of frequencies present at a point of phase congruency. This helps us determine the degree of significance of a point of phase congruency and allows us to improve feature localization. Finally, the issue of analysis at different scales is considered in more detail and it is concluded that high-pass filtering should be used to obtain image information at different scales instead of the more usually applied low-pass filtering.

Chapter 4 re-examines the work on phase congruency that was developed in the previous chapter. Firstly, the choice of the wavelet function used for the analysis of images is considered. Of particular concern is the limited maximum bandwidth that can be obtained using Gabor functions, and it is concluded that the log Gabor function is more appropriate as it allows one to construct filters of arbitrary bandwidth. However, when these high bandwidth filters were used to calculate phase congruency unexpected results were produced. The analysis of these results led to the development of two new approaches to the calculation of phase congruency, one of which produced far superior results. This work, in turn, led to a new frequency based approach to the detection of points of local symmetry and asymmetry in images. It is shown how symmetry and asymmetry can be thought of as representing generalizations of delta and step features respectively.
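The delta/step correspondence can be illustrated with a toy calculation (a sketch only; the actual measures, including noise compensation, are developed in Section 4.7, and the function names here are illustrative). At a point of even symmetry every local frequency component sits at a phase of 0 or π, while at a step every component sits at ±π/2:

```python
import math

def local_components(kind, x, n_terms=40):
    """(amplitude, phase) pairs of the Fourier components of two canonical
    periodic features at position x.  'delta': pulse train, sum of cos(n x),
    even-symmetric at x = 0.  'step': square wave, sum of sin((2n+1)x)/(2n+1),
    odd-symmetric at x = 0."""
    if kind == "delta":
        return [(1.0, n * x) for n in range(1, n_terms + 1)]
    return [(1.0 / (2*n + 1), (2*n + 1) * x - math.pi / 2)
            for n in range(n_terms)]

def symmetry(components):
    """Amplitude-weighted |even| minus |odd| response, normalized by the total
    amplitude: +1 for purely even (symmetric) phase, -1 for purely odd
    (asymmetric) phase.  A toy version of the Section 4.7 measures."""
    num = sum(a * (abs(math.cos(p)) - abs(math.sin(p))) for a, p in components)
    return num / sum(a for a, _ in components)

symmetry(local_components("delta", 0.0))   # close to +1: a symmetry point
symmetry(local_components("step", 0.0))    # close to -1: an asymmetry point
```

Because the measure is a ratio of amplitude-weighted terms to total amplitude, it is dimensionless: rescaling every amplitude by a contrast factor leaves it unchanged.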
Chapter 5 changes subject and considers the matching of signals and the estimation of disparity using local frequency information. Many of the ideas and insights obtained from the work on phase congruency are employed to great benefit here. Where this work differs mainly from other work in this area is in its integrated use of frequency data over many scales. An approach to signal matching via correlation of local phase and amplitude is developed. A by-product of this approach to signal matching is that an estimate of the spatial shift required in one signal to


match the second is obtained. This allows rapid convergence to the correct matching locations. The chapter concludes with some discussion of the advantages of matching signals represented in the log frequency domain. In this domain spatial scale changes in signals manifest themselves as a translation of the local amplitude spectra along with an amplitude rescaling; however, the shape of the spectra remains unchanged. This invariance in the log frequency domain offers a number of interesting possibilities. For example, it may allow textures to be correctly recognized in foreshortened views, or provide a new way of identifying surface slant from spatial scale change in stereopsis or motion.

Finally, Chapter 6 concludes this work and discusses the areas that might be developed further in future work.

Four appendices are also included. Appendix A presents a comprehensive portfolio of experimental results comparing phase congruency to the output of the Canny edge detector over a wide range of images. Phase symmetry images are also presented for each image in the portfolio. In addition, phase congruency images are presented for a number of test conditions to illustrate its behaviour under different parameter settings. Appendix B looks at the sensitivity of the phase congruency noise compensation technique to different noise models, showing that the noise model is not critical. Appendix C describes the problems in performing non-maximal suppression on phase congruency images. The techniques that were used in generating the final phase congruency edge maps are described, and it is concluded that much work could be done on the problem of non-maximal suppression. Finally, Appendix D describes some of the implementation details for the calculation of phase congruency.
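The log-frequency behaviour described above can be checked in closed form for a Gaussian signal (a hedged illustration; the Gaussian shape and the scale factor are arbitrary choices, not the thesis's matching signals):

```python
import math

def gauss_ft(u, sigma):
    """Continuous Fourier transform magnitude of f(x) = exp(-x^2 / (2 sigma^2)):
    |F(u)| = sigma * sqrt(2 pi) * exp(-2 (pi sigma u)^2)."""
    return sigma * math.sqrt(2 * math.pi) * math.exp(
        -2 * (math.pi * sigma * u) ** 2)

sigma, a = 1.0, 2.0
# Spatially compressing the signal, f(x) -> f(a x), yields a Gaussian of width
# sigma / a, whose spectrum is gauss_ft(u, sigma / a).  On a log-frequency axis
# v = ln(u) this is a translated, height-rescaled copy of the original
# spectrum: |F_a(e^v)| = (1/a) |F(e^(v - ln a))| -- the shape is unchanged.
for v in (-1.0, 0.0, 0.5, 1.0):
    scaled = gauss_ft(math.exp(v), sigma / a)
    translated = gauss_ft(math.exp(v - math.log(a)), sigma) / a
    assert abs(scaled - translated) < 1e-12
```

The same translation-plus-rescaling property holds for any signal (it follows from the Fourier similarity theorem, F_a(u) = (1/a) F(u/a)); the Gaussian is used only because both sides have a closed form.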

Chapter 2 Image features


2.1 Introduction

The detection of edges and other low-level features in images has long been recognized as a fundamental operation of great importance. A good line drawing can provide much of the information that might be contained in a photograph of the same scene, and in doing so requires only a small fraction of the data used by the photograph to represent that information. Indeed, line drawings can be easier to interpret and are often used instead of photographs in technical manuals and 'how to do it' books. However, one has to be cautious in comparing the interpretability of line drawings with photographs. Drawings made by humans are almost always constructed with their semantic content in mind, particularly so for technical manuals. Extraneous details are removed and extra details that would not normally be visible may be added; shading is also often used.¹ Thus a line drawing that has been automatically generated with no regard to the image's semantic content may not provide all the information that one might hope to obtain. Nevertheless, the extraction of a line drawing is an important first step in the automated analysis of a scene.

In searching for parameters to describe the significance of image features, such as edges, we should be looking for measures that are invariant with respect to image
¹ If one had a good automated feature detector one would be able to construct line drawings with no regard to their semantic content; this would allow a fair comparison between line drawings and photographs.



contrast and spatial magnification. Such quantities would provide an absolute measure of the significance of feature points that could be applied universally to any image irrespective of image contrast and magnification. The human visual system is able to reliably identify the significance of image features under widely varying conditions. Even if the illumination of a scene is altered by several orders of magnitude our interpretation of it will remain largely unchanged. Similarly, our interpretation of images is not greatly affected by changes in apparent spatial magnification, though not with the same degree of tolerance that we have to illumination changes.

Despite the obvious importance of characterizing low-level image features in some invariant manner, almost no effort seems to have been devoted to this task. One recent exception is the work of Heeger [35] in his development of a normalized model of contrast sensitivity that qualitatively matches psychophysical data, though this work is not directed at computer vision. This chapter discusses some of the shortcomings of existing feature detectors and introduces the idea of detecting features on the basis of phase congruency.
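As a preview of the idea (the measure is defined formally in Section 2.3), phase congruency at a point can be written, following Morrone and Owens, as the magnitude of the amplitude-weighted sum of local phase vectors divided by the sum of amplitudes, PC(x) = |Σ Aₙ exp(iφₙ(x))| / Σ Aₙ. A sketch for an ideal square wave, all of whose Fourier components pass through the same phase at the step:

```python
import cmath
import math

def phase_congruency(x, n_terms=50):
    """PC(x) = |sum_n A_n exp(i phi_n(x))| / sum_n A_n for the Fourier
    components of a square wave with a step at x = 0:
    A_n = 1/(2n+1), phi_n(x) = (2n+1) x - pi/2."""
    num = sum(cmath.exp(1j * ((2*n + 1) * x - math.pi / 2)) / (2*n + 1)
              for n in range(n_terms))
    den = sum(1.0 / (2*n + 1) for n in range(n_terms))
    return abs(num) / den

phase_congruency(0.0)   # at the step all phases coincide at -pi/2, so PC = 1
phase_congruency(1.0)   # away from the step the phases disperse and PC drops
```

Because both numerator and denominator scale identically with the component amplitudes, PC is dimensionless: multiplying the signal by any contrast factor leaves it unchanged, which is precisely the invariance this chapter argues for.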

2.2 Gradient based feature detection

The majority of work in the detection of low-level image features has been concentrated on the identification of step discontinuities in images using gradient based operators. Gradient based edge detection methods were pioneered by Roberts [77], Prewitt [71] and Sobel [72, 86]. They were then developed in terms of a computational model of human perception by Marr and Hildreth [55, 54]. Inspired by the presence of on-centre and off-centre receptive fields in the retina, Marr and Hildreth developed a model where edges were detected via the zero-crossings of the image after convolution with a Laplacian of Gaussian filter. While this model was attractive it had a number of difficulties: zero-crossings always form closed contours, often not realistically modelling the connectivity of image features; staircase intensity profiles result in false positives being detected; and finally, with the second derivative of the image being used, the results are susceptible to noise.

Marr also introduced the concept of the Primal Sketch, that is, the idea that the brain generates a concise


representation of the scene that contains important image tokens, such as edges and other basic image features, and that this representation permits further analysis of the scene to be done more efficiently by the brain. This concept has greatly influenced much of the research done in computer vision.

A number of variations of second derivative operators have been devised in various attempts to overcome their deficiencies. Some examples of this include the work of Fleck [22], Haralick [32], and Sarkar and Boyer [83]. Fleck and Haralick used directional second derivatives to reduce the influence of noise, with Fleck also employing first and third derivative information to eliminate the detection of false positives. Sarkar and Boyer adopted the optimality criteria proposed by Canny [11, 12] to develop infinite impulse response filters for the detection of edges via zero crossings.

Canny [11, 12] formalized the problem of the detection of step edges in terms of three criteria: good detection; good localization; and uniqueness of the response to a single feature. Subsequently Spacek [87] and Deriche [16] followed Canny's approach to develop similar operators; Deriche allowing the operator to have an infinite impulse response and Spacek modifying the response uniqueness criterion. An objection to these optimal detectors is that they are only optimal in a very limited domain, that of one dimensional step edges in the presence of noise. At 2D features such as corners and junctions, where the intensity gradient becomes poorly defined, these detectors have difficulties.

Thus, a major problem with gradient based operators is that they use a single model of an edge; that is, they assume edges are step discontinuities. In an ideal system a feature detector would mark features wherever a good artist would draw features when making a sketch of a scene. An artist produces marks in a sketch for a wide range of feature types, not just step edges.
Marks are drawn to indicate line, roof and step edges along with other features such as shadow boundaries, highlights, and presumably a range of other (unknown) feature types. Perona and Malik [68] point out that many image features are represented by some combination of step, delta, roof and ramp proles. For example, a very commonly encountered feature type is the occluding boundary of a convex object, such as a ball. If the ball surface


CHAPTER 2. IMAGE FEATURES

[Figure 1 diagram: overhead illumination of a lambertian sphere against a background, with the measured grey value profile plotted below.]

Figure 1: Intensity profile observed across a lambertian sphere against a plain background with overhead illumination. The occlusion boundary is not a simple step edge.

is lambertian and the illumination is aligned with the viewing direction, the feature profile will consist of an intensity profile that starts off brightest at the mid-point of the ball and then gets darker as our view moves across the ball, as a result of the surface normal becoming perpendicular to our viewing direction, finally culminating in a step jump to the grey level of the background (Figure 1). In this simple, idealized situation we have a feature that is considerably more complex than a step edge. In practice the situation will be far more awkward; the ball surface is unlikely to be lambertian, lighting can be from any direction, there may be mutual illumination effects between the ball and other objects, and of course, the background may not be uniform. For this reason the word feature will generally be used in this thesis rather than the word edge, in order to emphasize the aim of finding all important features that represent points of high information content, not just step edges. The definition of what a feature is will be deliberately left vague, though subsequent sections, which describe the phase congruency model of feature perception, will offer a possible definition. Some might argue that an automated feature detector does not need to attempt


to emulate human sketching skills. However, the interest in producing feature detectors has been primarily inspired by the ability of artists to produce line drawings². Artists have shown us that line drawings can provide very compact yet effective descriptions of scenes. Indeed, in the assessment of any automated feature detector perhaps the best we can do is to compare its output against a line drawing of the same scene made by an expert reproductive artist. After all, it is artists who are our best experts in representing scenes via line drawings. It is probably fair to say that excessive emphasis has been placed on finding optimal step edge detectors, and the original objective, that of finding points of high information content in images, has been forgotten. Just because a detector is effective in finding and localizing noisy step edges in a scene does not mean that it will represent the information in the scene well. A second problem with gradient based edge detectors is that they typically characterize edge strength by the magnitude of the intensity gradient. Thus the perceived strength or significance of an edge is sensitive to illumination and spatial magnification variations. Intensity gradient has units of lux/radian (pixel coordinates represent viewing direction and hence have angular units)³. Intensity gradients in images depend on many factors, including scene illumination, blurring and magnification. For example, doubling the size of an image while leaving its intensity values unchanged will halve all the gradients in the image. Any gradient based edge detection process will need to use a threshold modified appropriately. However, in general, one does not know in advance the level of contrast present in an image or its magnification. The image gradient values that correspond to significant edges are usually determined empirically.
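The magnification dependence described above is easy to verify numerically. The following sketch (illustrative, not from the thesis; the test signal and sizes are arbitrary choices) doubles the size of a one dimensional intensity profile by linear interpolation and shows that the finite-difference gradients halve:

```python
import numpy as np

# Doubling an image's size while keeping its grey values fixed halves every
# intensity gradient, so a fixed gradient threshold behaves differently
# after magnification.
x = np.linspace(0.0, 1.0, 65)
signal = np.clip((x - 0.4) * 5.0, 0.0, 1.0)      # a smooth ramp-edge profile

# 2x magnification by linear interpolation; grey values are unchanged.
x2 = np.linspace(0.0, 1.0, 129)
signal2 = np.interp(x2, x, signal)

g1 = np.max(np.abs(np.diff(signal)))             # max gradient, original
g2 = np.max(np.abs(np.diff(signal2)))            # max gradient, magnified
print(g1, g2)                                    # g2 is g1 / 2
```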

² Here the distinction is made between line drawings, which contain only lines, and sketches, which may also include shading.
³ Strictly speaking, image grey values should not be called intensity values. Intensity is defined as the luminous flux that is emitted per solid angle and is a property that is associated with a light source. Intensity has units of candelas (lumens/steradian). In constructing an image a camera measures the illumination at each point in the image plane that is received from a scene. Thus, image grey values have units of lux (lumens/m²). Despite this, the use of the term intensity value for an image grey value appears to be commonly accepted. David Marr used the term in this manner in his book [54].


Little guidance is available for the setting of thresholds; indeed Faugeras⁴ can only offer the following advice: "Thresholding is a plague that occurs in many areas in engineering, but to our knowledge it is unavoidable and must be tackled with courage." A limited number of efforts have been made to determine threshold values automatically. In his thesis, Canny [11] sets his thresholds on the basis of local estimates of image noise obtained via Wiener filtering. However, the details of setting thresholds on this basis, and the effectiveness of this approach, are not reported. Canny also introduced the idea of threshold hysteresis, which has proved to be a useful heuristic for maintaining the continuity of thresholded edges, though one then has the problem of determining two threshold levels. Sarkar and Boyer [83] also employed Wiener filtering to estimate the derivative of the noise output in their zero crossing based detector. Having an estimated slope of the noise response allowed them to set thresholds appropriately. However, this process required them to take three more derivatives after the image had been filtered by their edge operator. This presumably limited the quality of the estimate of the derivative of the noise output. Kundu and Pal [50] devised a method of thresholding based on human psychophysical data where contrast sensitivity varies with overall illumination levels. However, it is hard to provide any concrete guide to the fitting of a model of contrast sensitivity relative to a digitized grey scale of 0–255. More recently Fleck [24, 23] suggested setting thresholds at some multiple (typically 3 to 5) of the expected standard deviation of the operator output when applied to camera noise. This approach, of course, requires detailed a priori knowledge of the noise characteristics of any camera used to take an image. Noise is always a concern for gradient based detectors. The main tool used to reduce the influence of noise is spatial smoothing.
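Canny's threshold hysteresis, mentioned above, can be sketched as follows (a one dimensional illustration with made-up edge strengths and thresholds; Canny's detector is of course two dimensional):

```python
import numpy as np

def hysteresis(strength, t_low, t_high):
    """Keep points above t_low only if their run contains a point above t_high."""
    weak = strength >= t_low
    strong = strength >= t_high
    keep = np.zeros_like(weak)
    i, n = 0, len(strength)
    while i < n:
        if weak[i]:
            j = i
            while j < n and weak[j]:
                j += 1                      # find the end of this weak run
            if strong[i:j].any():           # run contains a strong seed
                keep[i:j] = True
            i = j
        else:
            i += 1
    return keep

edge_strength = np.array([0.1, 0.4, 0.9, 0.5, 0.1, 0.45, 0.3, 0.05])
kept = hysteresis(edge_strength, t_low=0.3, t_high=0.8)
print(kept)   # only the run containing the 0.9 seed survives
```

Note how the run of values 0.45, 0.3 is discarded despite exceeding the low threshold, because it never reaches the high one; this is the continuity-preserving behaviour the heuristic is valued for.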
However, smoothing degrades feature localization, and 2D feature positions such as corners can be severely corrupted (see Perona and Malik [69]). With high degrees of smoothing feature locations can move significantly, and distinct features may
⁴ Olivier Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, 1993, p. 117.


merge. It is very unsatisfactory for the perceived location of a feature to depend on how much smoothing was required to overcome the influence of noise. This issue will be considered in more detail in the next chapter. Bergholm [5] adopts the scale-space model in developing his edge focusing approach to edge detection, and in doing so addresses a number of problems associated with gradient based detectors. He observes that to eliminate the influence of noise on a gradient based detector a heavily smoothed image is required, but this degrades edge localization. To achieve good localization no smoothing should be used, but then noise becomes a problem. Bergholm's solution is to start with an edge map at a heavily smoothed scale. He then proceeds to calculate an edge map at a slightly finer scale, but only at pixels in the image connected to edge pixels found at the previous scale. The old edge points are discarded, the new ones at the slightly finer scale retained, and the process is repeated. In this manner edges are propagated out from their initial, rough locations and focused to their correct positions at the finest scale. An important point is that the problem of noise is overcome by starting with edges at a coarse scale and only looking for edges in adjacent pixels as scale is gradually reduced. Another attractive feature is that edge thresholding is only required to generate the initial edge map. However, if this initial map is thresholded at too high a level then many features will never be found. Conversely, if the threshold is too low many noise features will be found and these will be propagated down to the finest scale. The discussion so far has been directed at gradient based detectors though, of course, other types of detectors have been developed.
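Bergholm's edge focusing loop, described above, can be sketched in one dimension as follows (a hedged illustration: the Gaussian smoothing, the scale schedule, the adjacency radius and the threshold are my choices, not values from Bergholm's paper):

```python
import numpy as np

def gauss_smooth(signal, sigma):
    radius = int(3 * sigma) + 1
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode='same')

def gradient_maxima(signal, sigma, threshold):
    """Local maxima of the smoothed gradient magnitude above a threshold."""
    g = np.abs(np.diff(gauss_smooth(signal, sigma)))
    return {i for i in range(1, len(g) - 1)
            if g[i] >= g[i - 1] and g[i] >= g[i + 1] and g[i] > threshold}

def edge_focus(signal, sigmas, threshold):
    edges = gradient_maxima(signal, sigmas[0], threshold)   # coarse seed map
    for sigma in sigmas[1:]:                                # progressively finer
        candidates = gradient_maxima(signal, sigma, 0.0)
        # keep only fine-scale maxima adjacent to the previous edge set
        edges = {c for c in candidates
                 if any(abs(c - e) <= 2 for e in edges)}
    return sorted(edges)

step = np.zeros(100)
step[60:] = 1.0
edges = edge_focus(step, sigmas=[8.0, 4.0, 2.0, 1.0], threshold=0.02)
print(edges)    # a single edge, focused onto the step
```

Thresholding happens only at the coarsest scale; the finer passes are threshold-free and merely refine positions, which is the attraction Bergholm's scheme offers.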
For example, the weak membrane approach of Blake and Zisserman [6] involves minimizing a global energy function over the image in order to solve for a surface function that fits the image in a manner that is considered to be appropriate. Blake and Zisserman's energy measure is a weighted combination of terms representing the deviation of the surface function from the image, the square of the slope of the function, and the contour length of the function. This can be interpreted as fitting a weak membrane to the image data in such a way that discontinuities are preserved. An objection to this approach is that the energy term is not dimensionally consistent, with different


types of quantities being added together. This makes the result very sensitive to the relative weightings of the terms that make up the energy. Noble [64] devised a number of grey level morphological operations to detect edges. She develops a dilation-erosion residue operator, which is analogous to a first derivative operator and is used as an edge strength map. A second operator, called the signed maximum dilation-erosion residue (analogous to a second derivative operator), is used to guide the tracing of edges, and to classify the responses to the dilation-erosion residue operator. While Noble's approach is morphological, the steps involved can be interpreted in terms of differential operators. Thus it depends on using a simple edge model and it does not escape the thresholding problem. Perona and Malik [69] devised an approach to edge detection using anisotropic diffusion. They developed an approach to scale space smoothing that is based on the heat diffusion equation. To detect edges they make the conduction coefficient a function of the image gradient, so as to impede the flow of heat. Thus step discontinuities in the image form local barriers to the diffusion process. Over repeated iterations of the diffusion process step edges in the image become sharper and regions between the step discontinuities become smoother. Final extraction of the edges then becomes straightforward. A very significant attribute of this approach is that feature positions remain stable over scale. All that changes with scale is the level of contrast (heat difference) required for a feature to persist. However, this approach only detects step edges and is very much dependent on local image contrast. Another interesting approach that has been developed recently by Smith and Brady [85] is the SUSAN edge finder. This non-linear technique involves indexing a circular mask over the image and at each location determining the area of the mask having similar intensity values to the centre pixel value.
This segment of the mask is denoted the Univalue Segment Assimilating Nucleus (USAN). Locations in the image where the USAN is locally at a minimum (locally the Smallest USAN, hence SUSAN) mark the positions of step and line features. The detector performs well, and its tolerance to noise is a significant attribute. However, the detector is not invariant to image contrast, as it requires the setting of a threshold which is


used to decide whether or not elements of the mask are similar to the centre value when determining the size of the USAN. This threshold specifies the minimum edge contrast that can be detected. The discussion above represents a generalized overview and sampling of existing edge detection techniques. Others have conducted far more comprehensive reviews (for example Noble [64]), and it is not intended to repeat such a review here. The main purpose of this overview is to point out that almost all existing edge detectors are based on the calculation of intensity gradients or some other measure of the spatial variation of intensity across the image. These measures are dimensional quantities and hence depend on image contrast and spatial magnification. Thus the fundamental problem is that one does not know in advance what level of edge strength corresponds to a significant feature. As a result, edge thresholds are generally set by humans viewing the output and adjusting the threshold until the result is deemed acceptable. This is not automated feature detection.

2.3  Local energy and phase congruency

The local energy model of feature perception is a relatively new model. It is not based on the use of local intensity gradients for feature detection. Instead it postulates that features are perceived at points in an image where the Fourier components are maximally in phase⁵. For example, when one looks at the Fourier series that makes up a square wave, all the Fourier components are sine waves that are exactly in phase at the point of the step, at an angle of 0 or 180 degrees depending on whether the step is upward or downward. At all other points in the square wave individual phase values vary, making phase congruency low. Similarly one finds that phase congruency is a maximum at the peaks of a triangular wave (at an angle of 90 or 270 degrees). A particularly important point about using phase congruency to mark features of interest is that one is not making any assumption about the
⁵ It should be emphasized that when phase is referred to in this thesis it is local phase that is being considered. That is, we are concerned with the local phase of the signal at some position x. This is distinct from the phase values that one might obtain, say, from an FFT of a signal, in which phase values will be the phase offsets of each of the sinusoidal basis functions in the decomposition.


shape of the waveform at all. One is simply looking for points in the image where there is a high degree of order in the Fourier domain.
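The phase alignment claim for the square wave can be checked numerically. In this sketch (illustrative; it measures local phase against a cosine reference, so the common phase at the step comes out as 270 degrees rather than the sine-convention angles quoted above) the local phase of each Fourier component is evaluated at the step and away from it:

```python
import numpy as np

# A square wave is the sum of (4 / n pi) sin(n x) over odd harmonics n,
# with an upward step at x = 0.
harmonics = np.arange(1, 40, 2)

def local_phases(x):
    # component n is sin(n x) = cos(n x - pi/2), so its local phase is
    # n x - pi/2 (modulo 2 pi, cosine reference)
    return np.mod(harmonics * x - np.pi / 2, 2 * np.pi)

print(local_phases(0.0))   # at the step: every component at 3*pi/2
print(local_phases(1.0))   # away from the step: phases scattered
```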

Figure 2: Construction of square and triangular waveforms from their Fourier series. In both diagrams the first few terms of the respective Fourier series are plotted with broken lines; the sum of these terms is the solid line. Notice how the Fourier components are all in phase at the point of the step in the square wave, and at the peaks and troughs of the triangular wave.

A wide range of feature types give rise to points of high phase congruency. These include step edges, line and roof edges, and Mach bands. It was, in fact, investigations into the phenomenon of Mach bands by Morrone et al. [62] that led to the development of the local energy model. Mach bands are illusory bright and dark bands that appear on the edges of trapezoidal intensity gradient ramps, for example, on the edges of shadows. The classical explanation for the perception of Mach bands has been lateral inhibition (see Ratliff [74]). However, this explanation fails in that it predicts maximal perception of Mach bands on step edges, where in fact we see none. In their paper, Morrone et al. show that at the points where we perceive Mach bands the Fourier components of the signal are maximally in phase (though not exactly in phase); this led to their hypothesis that we perceive features in images at points of high phase congruency. Further work by Morrone and Burr [60] and Ross et al. [80] went on to show that this model successfully explains a number of other psychophysical effects in human feature perception. Other studies of the sensitivity of the human visual system to phase information include those by Burr [8], Field and Nachmias [21] and du Buf [18]. Fleet [25] argues strongly for the use of phase information in the calculation of image velocities. He shows that the motion of contours of constant phase in images provides a better measure of


the motion field than contours of constant intensity amplitude in the image. Phase information is more robust to noise, and to shading and contrast variations in the image. The classic demonstration of the importance of phase was devised by Oppenheim and Lim [65]. They took the Fourier transforms of two images and used the phase information from one image and the magnitude information of the other to construct a new, synthetic Fourier transform, which was then back-transformed to produce a new image. The features seen in such an image, while somewhat scrambled, clearly correspond to those in the image from which the phase data was obtained. Little evidence, if any, of the other image can be perceived. A demonstration of this is repeated here in Figure 3. With phase data demonstrated as being so important in the perception of images, it is natural that one should pursue the development of a feature detector that operates on the basis of phase information. From their work on Mach bands, Morrone and Owens [61] quickly recognized that the local energy model had applications in feature detection for computer vision.
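Oppenheim and Lim's demonstration is straightforward to reproduce. A minimal sketch follows (using synthetic random "images" rather than photographs, and a normalized correlation of my choosing as the resemblance measure): the hybrid image built from the magnitude of one image and the phase of the other resembles the phase donor.

```python
import numpy as np

rng = np.random.default_rng(0)
img_a = rng.random((64, 64))              # supplies magnitude
img_b = rng.random((64, 64))              # supplies phase

Fa, Fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
hybrid = np.abs(Fa) * np.exp(1j * np.angle(Fb))   # |A| with the phase of B
mixed = np.real(np.fft.ifft2(hybrid))

def corr(u, v):
    """Normalized cross-correlation of two arrays."""
    u = u - u.mean()
    v = v - v.mean()
    return float((u * v).sum() / np.sqrt((u * u).sum() * (v * v).sum()))

# The mixed image correlates with the phase donor, not the magnitude donor.
print(corr(mixed, img_b), corr(mixed, img_a))
```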

2.3.1  Defining phase congruency

We shall first consider one dimensional signals. The phase congruency function is developed from the Fourier series expansion of a signal I at some location x,

    I(x) = \sum_n A_n \cos(n \omega x + \phi_{n0})          (1)
         = \sum_n A_n \cos(\phi_n(x)),                      (2)

where A_n represents the amplitude of the nth cosine component, \omega is a constant (usually 2\pi), and \phi_{n0} is the phase offset of the nth component (the phase offset also allows sine terms in the series to be represented). The function \phi_n(x) represents the local phase of the Fourier component at position x. Morrone and Owens define the phase congruency function as

    PC(x) = \max_{\bar\phi(x) \in [0, 2\pi]} \frac{\sum_n A_n \cos(\phi_n(x) - \bar\phi(x))}{\sum_n A_n}.    (3)


(a) image providing magnitude data

(b) image providing phase data

(c) phase and amplitude mixed image

Figure 3: When phase information from one image is combined with magnitude information from another, it is the phase information that prevails.

The value of \bar\phi(x) that maximizes Equation 3 is the amplitude weighted mean local phase angle of all the Fourier terms at the point being considered. Taking the cosine of the difference between the actual phase angle of a frequency component and this weighted mean, \bar\phi(x), generates a quantity approximately equal to one minus half this difference squared (by the Taylor expansion \cos(x) \approx 1 - x^2/2 for small x). Thus finding where phase congruency is a maximum is approximately equivalent to finding where the weighted variance of local phase angles, relative to the weighted average local phase, is a minimum (see Figure 4).
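Equation 3 can also be evaluated directly: the maximizing \bar\phi(x) is the phase of the resultant vector \sum_n A_n e^{i\phi_n(x)}, so phase congruency is the length of that resultant divided by \sum_n A_n. A sketch for the square wave (the truncation at 63 harmonics is an arbitrary choice):

```python
import numpy as np

harmonics = np.arange(1, 64, 2)            # odd harmonics of a square wave
amps = 4.0 / (np.pi * harmonics)           # their Fourier amplitudes

def phase_congruency(x):
    phases = harmonics * x - np.pi / 2     # local phase of each sine term
    resultant = np.sum(amps * np.exp(1j * phases))
    return np.abs(resultant) / np.sum(amps)

print(phase_congruency(0.0))               # at the step: 1 (phases aligned)
print(phase_congruency(1.5))               # elsewhere: much lower
```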



Figure 4: Polar diagram of the components of a Fourier series at a point in a signal. The series is represented as a sequence of vectors, each vector having a length A_n and local phase angle \phi_n(x).

2.3.2  Local energy

As it stands, phase congruency is a rather awkward quantity to calculate. As an alternative, Venkatesh and Owens [89] show that points of maximum phase congruency can be calculated equivalently by searching for peaks in the local energy function. The local energy function is defined for a one dimensional luminance profile, I(x), as the modulus of a complex number,

    E(x) = \sqrt{I^2(x) + H^2(x)},          (4)

where the real component is represented by I(x) and the imaginary component by iH(x), with i = \sqrt{-1} and H(x) the Hilbert transform of I(x) (a 90 degree phase shift of I(x)). Venkatesh and Owens prove that energy is equal to phase congruency scaled by the sum of the Fourier amplitudes, that is,

    E(x) = PC(x) \sum_n A_n.                (5)

Thus the local energy function is directly proportional to the phase congruency function, so peaks in local energy will correspond to peaks in phase congruency.
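Equation 5 can be checked numerically: for a band-limited signal, E(x) computed from the signal and its Hilbert transform matches PC(x) \sum_n A_n computed directly from the Fourier series. This sketch is my construction, not the authors' proof:

```python
import numpy as np

N = 1024
x = np.linspace(0, 2 * np.pi, N, endpoint=False)
harmonics = np.arange(1, 32, 2)
amps = 4.0 / (np.pi * harmonics)
signal = sum(a * np.sin(n * x) for a, n in zip(amps, harmonics))

# Analytic signal via the FFT: zero the negative frequencies, double positives.
F = np.fft.fft(signal)
F[N // 2 + 1:] = 0.0
F[1:N // 2] *= 2.0
analytic = np.fft.ifft(F)               # = signal + i * Hilbert(signal)
energy = np.abs(analytic)               # E(x) = sqrt(I^2 + H^2); DC is zero here

# PC(x) * sum(A_n) from the series: |sum_n A_n exp(i phi_n(x))|
pc_times_suma = np.abs(
    (amps * np.exp(1j * (np.outer(x, harmonics) - np.pi / 2))).sum(axis=1))
print(np.max(np.abs(energy - pc_times_suma)))   # the two agree
```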


Venkatesh and Owens' formal proof is not repeated here, but the relationship between phase congruency, energy and the sum of the Fourier amplitudes can be seen geometrically in Figure 5. The local Fourier components are plotted as complex vectors adding head to tail. The sum of these components projected onto the real axis represents I(x), the original signal, and the projection onto the imaginary axis represents H(x), the Hilbert transform. The magnitude of the vector from the origin to the end point is the total energy, E(x). One can see that E(x) is equal to \sum_n A_n \cos(\phi_n(x) - \bar\phi(x)). Recalling that phase congruency is equal to \sum_n A_n \cos(\phi_n(x) - \bar\phi(x)) / \sum_n A_n, we can see that phase congruency is the ratio of E(x) to the overall path length, \sum_n A_n, taken by the local Fourier components in reaching the end point. Thus, one can clearly see that the degree of phase congruency is independent of the overall magnitude of the signal. This provides invariance to variations in image illumination and/or contrast.
[Figure 5 diagram: Fourier components plotted head to tail in the complex plane; the projections onto the real and imaginary axes give I(x) and H(x), and the vector from the origin to the end point has length E(x).]
Figure 5: Polar diagram showing the Fourier components at a location in the signal plotted head to tail. This arrangement illustrates the construction of energy, the sum of the Fourier amplitudes and phase congruency from the Fourier components of a signal.

Rather than compute local energy via the Hilbert transform of the original luminance profile, one can calculate a measure of local energy by convolving the


signal with a pair of filters in quadrature. The signal is first convolved with a filter designed to remove the DC component from the image. This result is saved and the image is then convolved with a second filter that is in quadrature with the first (the Hilbert transform of the first). This gives us two signals, each being a band passed version of the original, and one being a 90 degree phase shift of the other. The results of the two convolutions are then squared and summed to produce a local energy function. Odd and even-symmetric Gabor functions can be used for the quadrature pair of filters. Thus local energy is defined by

    E(x) = \sqrt{(I(x) * M^e)^2 + (I(x) * M^o)^2},          (6)

where M^e and M^o denote the even and odd symmetric filters in quadrature. Figure 6 illustrates the calculation of local energy on a synthetic signal containing a variety of features. The calculation of energy from spatial filters in quadrature pairs has been central to many models of human visual perception, for example those proposed by Heeger [33, 34, 36], Adelson and Bergen [1] and Watson and Ahumada [93], to name just a few. The significance of Venkatesh and Owens' work is that they provide another explanation for the perceptual importance of energy: peaks in the energy function correspond to points where phase congruency is a maximum. From this early work by Morrone et al. [62], Morrone and Owens [61] and Venkatesh and Owens [89] the local energy model was developed further. Owens et al. [67] investigated the idempotency properties of the local energy feature detector. They argue that when any feature detecting operator is applied to its own output it should not change the output. That is, the primal sketch of a primal sketch should be itself. Gradient based detectors fail in this respect because they attempt to mark edges on each side of any line feature in an image. Local energy, on the other hand, produces a single response on a line feature, and hence satisfies the idempotency requirement. Venkatesh and Owens [88] investigated the classification of image features via the phase angle at which phase congruency occurs. In this manner they show how step, line and shadow edges can be distinguished from each other.
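The quadrature-pair computation of Equation 6 can be sketched as follows (the Gabor parameters and the test signal are illustrative choices, not values from the thesis):

```python
import numpy as np

def gabor_pair(sigma=6.0, freq=0.25):
    """An even/odd (cosine/sine) Gabor-like quadrature pair."""
    t = np.arange(-24, 25)
    g = np.exp(-t**2 / (2 * sigma**2))
    even = g * np.cos(freq * t)
    odd = g * np.sin(freq * t)
    even -= even.mean()                 # remove the filter's DC component
    return even, odd

def local_energy(signal):
    even, odd = gabor_pair()
    e = np.convolve(signal, even, mode='same')
    o = np.convolve(signal, odd, mode='same')
    return np.sqrt(e**2 + o**2)         # Equation 6

step = np.zeros(256)
step[128:] = 1.0
E = local_energy(step)
print(int(np.argmax(E)))                # the energy peak sits at the step
```

A line feature produces a single energy peak in the same way, which is the idempotency behaviour discussed above.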


[Figure 6 panels: the input signal; the even and odd-symmetric filters; the two convolution outputs; and the resulting local energy function.]

Figure 6: Calculation of local energy via convolution with two filters in quadrature.

Aw et al. [4], in their work on image compression, make use of the fact that local energy makes no assumptions about the intensity profiles of features. They used local energy to detect features across a range of images, collecting information about commonly occurring intensity profiles of features in images. This catalogue of feature profiles enabled them to efficiently encode images for compression.


Owens [66] identifies the conditions under which images have no local maxima in local energy, and hence are feature free. She also investigates image transformations under which image features are preserved. It is pointed out that some image operations, such as addition between images, can destroy or create image features. She proposes two new operators for the interaction between images which do not corrupt feature structures within images. These operators are analogous to complex multiplication and complex division. Using these operators Owens shows how it is possible to decompose a signal into its feature component and its feature-free component. Other researchers who have studied the use of local energy for feature detection are Perona and Malik [68], Freeman [29] and Ronse [78]. Perona and Malik's work on local energy is interesting in that they arrive at a generalization of the model without using the concept of phase congruency. They point out that image features are generally composed of combinations of step, delta, roof and ramp structures. Under these conditions it is shown that linear filters will produce systematic errors in localization. Perona and Malik go on to show that a quadratic filtering approach results in the correct detection and localization of composite features. That is, instead of looking for maxima in (I(x) * M) one should look for maxima in \sum_i (I(x) * M_i)^2, where the M_i are a series of different filters. The local energy model, in its use of two filters in quadrature, can be seen to be a specific case of quadratic filtering. Perona and Malik suggest that there is no special reason to use filters in quadrature and argue that one might wish to use quite different sets of filters. However, in the results they presented they chose to use two filters in quadrature; the second derivative of a Gaussian and its Hilbert transform. Freeman, in his thesis [29], studied the local energy model with particular emphasis on multi-orientation analysis and the behaviour of local energy at feature junctions. He devised an approach to the detection and classification of feature junctions. The filters he used were generally second and fourth derivatives of Gaussians along with their corresponding Hilbert transforms, depending on the narrowness of the frequency tuning he required. As a tool for his multi-orientation analysis


Freeman developed the concept of steerable filters, whereby filter outputs at any orientation can be efficiently computed from a linear combination of the outputs of a limited number of basis filters. Of relevance to the work presented in this thesis, Freeman developed a normalized measure of local energy. However, his motivation for doing this was primarily to allow image information to be represented over a small dynamic range, rather than to specifically seek an invariant measure of feature significance. Some of his post-processing techniques might also be considered to be somewhat ad hoc. Despite this, he considers a wide range of issues concerning the use of local energy for feature detection. Ronse [78] makes a detailed mathematical study of the idempotency properties of the local energy model and the conditions of image modification over which local energy remains invariant. An important result, that will be used later, is that the locations of local energy peaks are invariant to smoothing of the image by a Gaussian or any other function having zero Fourier phase. Rosenthaler et al. [79] make a comprehensive study of the behaviour of local energy at 2D image feature points. They develop a model of 2D feature detection based on differential geometry, using the first and second derivatives of oriented local energy to identify what they call keypoints. Robbins and Owens [76] have followed on from Rosenthaler et al.'s work and developed a simpler model of 2D feature detection that does not resort to the use of derivatives of the local energy signal. Instead, they detect 2D features by calculating oriented local energy over the image and then calculating the local energy of this local energy image, but in an orientation perpendicular to the first. The second application of local energy detects the end points of any features detected by the first application of local energy. This process is then repeated over multiple orientations to capture all 2D features.
Wang and Jenkin [92] use complex Gabor filters to detect edges and bars in images. They recognize that step edges and bars have specific local phase properties which can be detected using filters in quadrature; however, they do not connect the significance of high local energy with the concept of phase congruency. One issue that previous work on local energy has not really addressed is the problem of how one should integrate data over many scales. If the perceptual


significance of a peak in local energy is due to it also being a maximum in phase congruency, then it is important to consider many scales simultaneously. After all, it is the occurrence of phase congruency over a range of frequencies that makes it significant. While the use of the local energy function to find peaks in phase congruency is computationally convenient, it does not provide a dimensionless measure of feature significance, as it is weighted by the sum of the Fourier component amplitudes, which have units of lux. Thus, like derivative based feature detectors, local energy suffers from the problem that we are unable to specify in advance what level of response corresponds to a significant feature. Despite this, local energy remains a useful measure in that it responds to a wide range of feature types. Phase congruency, on the other hand, is a dimensionless quantity. We obtain it by normalizing the local energy function; dividing energy by the sum of the Fourier amplitudes. Values of phase congruency vary from a maximum of 1, indicating a very significant feature, down to 0, indicating no significance. This property offers the promise of allowing one to specify universal feature thresholds, that is, we could set thresholds before an image is seen - truly automated feature detection.

2.4  Issues in calculating phase congruency

This section describes an initial attempt at devising a way of calculating phase congruency. What is highlighted is that there are a number of difficulties that have to be overcome if a practical method of calculating phase congruency is to be devised. These problems include the following: How should one extend the idea of phase congruency to 2D signals? What is the appropriate way of controlling the scale of analysis? How should information be integrated over many scales, and how can the influence of noise be overcome? As mentioned earlier, phase congruency is awkward to calculate. An initial approach to calculating phase congruency might be to take a signal, remove its DC component (it is removed because a 90 degree phase shift of a zero frequency does not have any meaning), calculate the Hilbert transform (say, by calculating


the Fourier transform, multiplying the result by i and then performing an inverse Fourier transform), square and sum the Hilbert transform and the AC component of the signal, and finally normalize the result by dividing by the sum of the Fourier amplitudes. Results using this method were reported by Kovesi [47] (further work in which wavelets are used to calculate phase congruency was also presented by Kovesi [48]). An example of the calculation of phase congruency via the FFT is shown in Figure 7.
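The whole-signal FFT procedure just described can be sketched as follows (my paraphrase of the steps, not Kovesi's original code; the Hilbert multiplier applies the sign convention explicitly per frequency half, and the amplitude normalization ignores the Nyquist term):

```python
import numpy as np

def phase_congruency_fft(signal):
    N = len(signal)
    ac = signal - signal.mean()              # remove the DC component
    F = np.fft.fft(ac)
    # Hilbert transform: multiply positive frequencies by -i, negative by +i.
    h = np.zeros(N)
    h[1:N // 2] = -1.0
    h[N // 2 + 1:] = 1.0
    H = np.real(np.fft.ifft(1j * h * F))
    energy = np.sqrt(ac**2 + H**2)           # E(x), Equation 4
    amp_sum = 2.0 * np.sum(np.abs(F[1:N // 2])) / N   # sum of series amplitudes
    return energy / amp_sum                  # PC(x) = E(x) / sum(A_n)

# An odd square wave sampled so the steps fall on samples 0 and 128.
square = np.zeros(256)
square[1:128] = 1.0
square[129:] = -1.0
pc = phase_congruency_fft(square)
print(float(pc.max()))                       # 1 at the steps, lower elsewhere
```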
[Figure 7 panels: signal; signal (DC removed); Hilbert transform; square, sum and divide by sum of Fourier amplitudes; phase congruency.]

Figure 7: Calculation of phase congruency via the FFT. Notice how phase congruency values range between 0 and 1.

There are some problems with the calculation of phase congruency via the FFT. Firstly it is not clear how one adapts this approach for one-dimensional signals to two dimensions; the Hilbert transform is only defined in one dimension. A second difficulty is that the Fourier transform is not good for localizing frequency


information spatially. In the example shown in Figure 7 the Fourier transform was calculated over the whole signal. Thus phase congruency at each point was calculated with respect to the whole signal. To control the local scale and spatial extent over which phase congruency is determined we have to use windowing of the signal. Windowing introduces the problem of having to balance spatial localization against the range of frequencies we wish to analyze; the window width controls spatial localization but also constrains the lowest frequency we can measure. Figure 8 shows the result of calculating phase congruency using a rectangular windowing function 32 points wide. The computational procedure was as follows: Over each windowed section of the signal the Fourier transform was calculated, and the Hilbert transform generated. The signal value (minus the DC value) and the Hilbert transform value at the centre of the window were then squared and summed; this quantity was then divided by the sum of the Fourier amplitudes over the current window to produce a phase congruency value at the centre position of the window. The window was then indexed one point forward in the signal and the process repeated. Notice how the peaks in phase congruency are higher and more distinct. By windowing the signal each feature is considered in relative isolation from the others and hence ends up being considered to be very significant. An important point to note here is that for the calculation of phase congruency the natural scale parameter to vary is the size of the analysis window over which we calculate local frequency information. A large window means that the significance of features is determined in a more global manner, and a small window results in features being treated individually and locally. This leads to a new concept of multi-scale analysis which will be discussed in detail in the next chapter.
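The sliding-window procedure described above can be sketched as follows (the function name, the treatment of the signal borders, and the eps guard are illustrative choices):

```python
import numpy as np

def windowed_phase_congruency(signal, width=32, eps=1e-8):
    """Phase congruency via an FFT over a sliding rectangular window.

    Each output value comes from the FFT of a short windowed section
    centred on that point, as in the procedure described in the text.
    Points within half a window of the borders are left at zero.
    """
    f = np.asarray(signal, dtype=float)
    n, half = len(f), width // 2
    pc = np.zeros(n)
    sgn = np.sign(np.fft.fftfreq(width))
    for c in range(half, n - half):
        w = f[c - half:c + half]             # windowed section
        W = np.fft.fft(w)
        W[0] = 0.0                           # drop the DC component
        ac = np.real(np.fft.ifft(W))
        h = np.real(np.fft.ifft(W * (-1j) * sgn))   # Hilbert transform
        amp_sum = np.sum(np.abs(W)) / width
        # phase congruency at the centre of the current window
        pc[c] = np.hypot(ac[half], h[half]) / (amp_sum + eps)
    return pc
```

A step embedded in an otherwise constant signal produces a high, well-localized peak, while featureless regions of the window give no response at all; with noise present, however, every fluctuation scores highly, as discussed next.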
If the scale of analysis of phase congruency is controlled by window size we must consider what might happen when a windowed section of signal contains no features and consists only of noise. Being a normalized quantity, phase congruency does not depend on the magnitude of a feature on its own; it depends on the magnitude of the feature in the context of the local window. Thus, if the signal is purely noise each fluctuation in the signal will be considered quite significant relative to the surrounding features as they will all be of similar magnitude. Hence, noise poses


[Figure 8 panels: signal; signal (DC removed); Hilbert transform; square, sum and divide by sum of Fourier amplitudes; phase congruency.]

Figure 8: Calculation of phase congruency via the FFT using a rectangular windowing function 32 points wide.

a serious difficulty for us in trying to devise a practical way of calculating phase congruency in images. Figure 9 shows what happens if we introduce a small amount of noise into our signal. In regions that are distant from features the influence of noise becomes very noticeable. A further issue we must also consider is that phase congruency as defined in Equation 3 does not take into account the spread of frequencies that are congruent at a point. For example, a signal containing only one frequency component, say a sine wave, will be in perfect congruence with itself and hence have phase congruency of 1 everywhere (the Hilbert transform of sine is cosine, and sin^2(x) + cos^2(x) is identically 1, and so no point x has maximal local energy). To mark all such points as features would not make sense. Significant feature points are presumably ones


[Figure 9 panels: noisy signal; noisy signal (DC removed); Hilbert transform; square, sum and divide by sum of Fourier amplitudes; phase congruency.]
Figure 9: Phase congruency of a noisy signal calculated using a rectangular windowing function 32 points wide.

with high information content; a point of phase congruency indicates a point of high information content only if we have a wide range of frequencies present. We do not gain much information from knowing the phase congruency of a signal which has only one frequency component.

2.5 Summary

This chapter has briefly re-examined the aims of feature detection. The objective should be to find points of high information content in images. This objective is not necessarily satisfied by finding optimal ways of detecting step edges in the presence of noise. Ideally, the ability to detect features and assess their significance should


be independent of image contrast and spatial magnification. This implies that we need to measure feature significance via a dimensionless quantity. The shortcomings of derivative based feature detectors have been briefly reviewed. The main problems include an inability to specify in advance what level of response corresponds to a significant feature, and the fact that they are generally only designed to detect step edges. The local energy model of feature perception has been introduced. This model has been inspired by psychophysical data and it detects a wide range of feature types. Local energy can be normalized to produce a measure of phase congruency; an approximation of the standard deviation in phase angles of the Fourier components at a point in the signal. Phase congruency is a dimensionless quantity and is thus an attractive way of detecting features and identifying their significance. It provides an absolute measure of the significance of feature points in an image and this offers the promise of allowing constant threshold values to be applied across wide classes of images. Thresholds could then be specified in advance of processing any image, and not have to be determined by trial and error after processing. However, there are a number of issues to be addressed. How should phase congruency be calculated in 2D images? How should we calculate local frequency information and control the scale of analysis? How do we deal with the influence of noise, and how do we identify the range of frequencies present at a point of phase congruency? These issues, and others, are addressed in the following chapter which will describe how phase congruency can be calculated using wavelets.

Chapter 3 Phase congruency from wavelets


3.1 Introduction

This chapter describes a new way of calculating phase congruency using wavelets. In calculating phase congruency it is important to obtain spatially localized frequency information in images, and wavelets offer perhaps the best way of doing this. The use of wavelets is also biologically inspired; the interest in calculating phase congruency is motivated by psychophysical results, hence it would seem natural that one should try to calculate it using biologically plausible computational machinery. In this respect geometrically scaled spatial filters in quadrature pairs will be used. In addition to this it will be seen how the use of wavelets allows one to address the issues raised at the end of the previous chapter regarding the calculation of phase congruency. The material that will be covered in this chapter is organized as follows: First, it will be shown how local frequency information can be obtained using quadrature pairs of wavelets, concentrating in particular on the use of Gabor wavelets. From this it is relatively straightforward to develop the ideas behind the calculation of phase congruency in one dimensional signals using wavelets. Material is then presented to address the difficulties regarding the calculation of phase congruency that were introduced in the previous chapter. First, the influence of noise in the calculation of phase congruency is considered and an effective method for identifying and compensating for noise is developed. This is followed by a section covering the


CHAPTER 3. PHASE CONGRUENCY FROM WAVELETS

issues involved in extending the calculation of phase congruency to 2D images. It is then shown how the use of wavelets allows us to obtain a measure of the spread of frequencies present at a point of phase congruency. This helps one determine the degree of significance of a point of phase congruency and allows one to improve feature localization. The issue of analysis at different scales is then considered and it is argued that high-pass filtering should be used to obtain image information at different scales instead of the more usually applied low-pass filtering. Finally, some results and conclusions are presented.

3.2 Using Wavelets for Local Frequency Analysis

Recently the Wavelet Transform has become one of the methods of choice for obtaining local frequency information. Most of the current literature on wavelets can be traced back to the work of Morlet et al. [59]. Morlet and his co-workers were interested in obtaining temporally localized frequency data in their analysis of geophysical signals. The basic idea behind wavelet analysis is that one uses a bank of filters to analyze the signal. The filters are all created from rescalings of the one wave shape, each scaling designed to pick out particular frequencies of the signal being analyzed. An important feature is that the scales of the filters vary geometrically, giving rise to a logarithmic frequency scale. However, many of these ideas were developed earlier by Granlund [30]. In this remarkable paper he developed many of the ideas behind what we would now call multi-scale wavelet analysis. He also proposed an image feature detector that is closely related to the local energy model. For some reason Granlund's paper has remained relatively unnoticed despite its innovative nature, though his work has been developed by Wilson, Calway and Knutsson (see, for example, Wilson, Calway and Granlund [96], Knutsson, Wilson and Granlund [43], Calway and Wilson [10] and Calway, Knutsson and Wilson [9]). From the initial work of Morlet and his colleagues wavelet theory has been subsequently developed by Grossmann and Morlet [31], Meyer [58], Daubechies [14], Mallat [53] and many others.


We are interested in calculating local frequency and, in particular, phase information in signals. To preserve phase information linear phase filters must be used, that is, we must use wavelets that are symmetric/anti-symmetric. This constraint means that the work on orthogonal wavelets (which dominates much of the literature) is not applicable to us. Chui [13] provides a proof that, with the exception of the Haar wavelet, one cannot have a wavelet of compact support that is both symmetric and orthogonal. The Haar wavelet is rectangular in shape and is clearly not appropriate for our needs. For this work the approach of Morlet will be followed, that is, using wavelets based on complex valued Gabor functions - sine and cosine waves, each modulated by a Gaussian. Using two filters in quadrature enables one to calculate the amplitude and phase of the signal for a particular frequency at a given spatial location. It should be noted that these wavelets are not orthogonal; some conditions must apply in order to achieve reasonable signal reconstruction after decomposition. However, we only require approximate reconstruction up to a scale factor over a band of frequencies or wavelet scales.

Figure 10: Gabor wavelet: a sine and cosine wave modulated by a Gaussian.

If the bank of wavelet filters is designed so that the transfer function of each filter overlaps sufficiently with its neighbours in such a way that the sum of all the transfer functions forms a relatively uniform coverage of the spectrum, one can reconstruct the decomposed signal over a band of frequencies up to a scale factor. (If the transfer functions are scaled so that when their sum is taken we obtain a


uniform transfer function of magnitude one, the reconstructed signal will have the original scale.) Therefore, a problem we have is determining the appropriate scaling factor between successive centre frequencies so that the overlap between transfer functions results in an even spectral coverage. Granlund [30] suggests that the upper cutoff frequency of one transfer function (where it falls to half its maximum value) should coincide with the lower cutoff frequency of the next function. However, in practice this does not produce particularly even coverage, and a closer spacing is generally desirable. In the results presented in this chapter the filters used have had bandwidths of approximately one octave with a scaling between successive centre frequencies of 1.5. This arrangement was arrived at by experimentation; the values are not critical and a wide range of parameters produce satisfactory results. Referring to Figure 11 one can see that, in this example, the sum of the spectra of the five wavelets produces a relatively ideal band-pass filter, especially when viewed on the log frequency scale. Design of the wavelet bank ends up being a compromise between wishing to form a smooth sum of spectra while at the same time minimizing the number of filters used so as to minimize the computation requirements. Analysis of a signal is done by convolving the signal with each of the wavelets.
If we let I denote the signal and M_n^e and M_n^o denote the even and odd wavelets at a scale n, the amplitude of the transform at a given wavelet scale is given by

\[ A_n(x) = \sqrt{(I(x) \ast M_n^e)^2 + (I(x) \ast M_n^o)^2}, \tag{7} \]

and the phase is given by

\[ \phi_n(x) = \mathrm{atan2}(I(x) \ast M_n^e,\; I(x) \ast M_n^o). \tag{8} \]
Note that from now on n will be used to refer to wavelet scale (previously n denoted frequency in the Fourier series of a signal). The results of convolving a signal with a bank of wavelets can be displayed graphically via a scalogram (Figure 12). Each row of the scalogram is the result of convolving the signal with a quadrature pair of wavelets at a certain scale. Phase is plotted by mapping 0-360 degrees to 0-255 grey levels (note therefore, that the black/white discontinuities in the scalogram correspond to the wrap-around in


[Figure 11 panels: even-symmetric wavelets; odd-symmetric wavelets; their spectra on linear and log frequency axes (0.125-0.5); sum of spectra.]

phase). The vertical axis of the scalogram is a logarithmic frequency scale, with the lowest frequency at the bottom. Each column of the scalogram can be considered to be a local Fourier spectrum for each point. Note that to achieve a dense scalogram such as shown here the scaling factor between successive filter centre frequencies will be only slightly greater than 1. The phase plot of the scalogram is of particular interest because it enables one to actually see the points of high phase congruency. At locations in the signal where there are large step changes one can see a vertical line of constant grey value in the phase diagram indicating a constant phase angle over all frequencies at that point in the signal.

Figure 11: Five wavelets and their respective Fourier Transforms indicating which sections of the spectrum each wavelet responds to. Collectively the wavelets provide a wide coverage of the spectrum, though with some overlap. Note that on a logarithmic frequency scale the spectra are identical.


[Figure 12 panels: signal to be analyzed; magnitude of scalogram; phase of scalogram, with asterisks marking lines of constant phase.]
Figure 12: A one dimensional signal and its amplitude and phase scalograms. The horizontal axes of the scalograms correspond directly with the signal's horizontal axis. The vertical axes of the scalograms correspond to a logarithmic frequency scale with low frequencies at the bottom. The asterisks mark vertical lines of constant phase that occur at the step transitions in the signal. These are points of phase congruency. (Note: the phase scalogram is presented by mapping 0-360 degrees to 0-255 grey levels).


3.3 Calculating Phase Congruency Via Wavelets

To calculate phase congruency we need to construct the following quantities: Firstly we need to remove the DC component from our signal I(x), while at the same time retaining as many of the other frequency components as possible. We will denote this DC removed signal F(x). Note, we wish to retain a broad range of frequencies in our signal because phase congruency is only of interest if it occurs over a wide range of frequencies. Secondly we have to construct H(x), the Hilbert transform of F(x), and finally we need to calculate the normalizing component \sum_n A_n(x), the sum of the amplitudes of the frequency components in F(x). Note that the sum of amplitudes becomes a function of x as our frequency analysis of the signal is now spatially localized.

The outputs from a quadrature pair of filters can be thought of as representing a vector in the complex plane; the real component coming from the even-symmetric filter output, and the odd-symmetric output representing the imaginary component (see Figure 13). If we sum the results of convolving our signal with the bank of even wavelets we will reconstruct a band passed version of our signal, amplified according to the scaling and overlap of the transfer functions of our filters. We can use this result for F(x), an approximation of the original signal with the DC component removed. An approximation of the Hilbert transform, corresponding to H(x), can be constructed from the sum of the convolutions of the signal with the odd wavelets; this is a signal covering the same bandwidth as the original signal and amplified in the same way as F(x), but shifted in phase by 90 degrees. Thus

\[ F(x) = \sum_n I(x) \ast M_n^e, \tag{9} \]

\[ H(x) = \sum_n I(x) \ast M_n^o. \tag{10} \]

The sum of the amplitudes of the frequency components in F(x) is given by

\[ \sum_n A_n(x) = \sum_n \sqrt{(I(x) \ast M_n^e)^2 + (I(x) \ast M_n^o)^2}. \tag{11} \]
With these three components we are able to calculate phase congruency. At this point the expression for phase congruency is slightly modified by adding a small positive constant to the denominator. This small constant, \epsilon, prevents the expression from becoming unstable when \sum_n A_n(x) (and hence E(x)) becomes very small. Thus

\[ PC(x) = \frac{E(x)}{\sum_n A_n(x) + \epsilon}, \tag{12} \]

where E(x) = \sqrt{F(x)^2 + H(x)^2}. The appropriate value of \epsilon depends on the precision with which we are able to perform convolutions and other operations on our signal; it does not depend on the signal itself. Large values of \epsilon can be used to suppress the influence of noise, but as we shall see in the next section there is a much better way to compensate for noise. An \epsilon value of 0.01 has been used for all the results presented here. Note that the amplification of the signal due to the scaling and overlap in the filter transfer functions is cancelled out through the normalization process in the calculation of phase congruency. The use of wavelets allows one to obtain high spatial localization in the calculation of phase congruency. The frequency range over which phase congruency is calculated is easily controlled through the number of scales used in the wavelet bank. Figure 14 illustrates all the intermediate steps in the calculation of phase congruency on a one-dimensional signal. Note that this figure incorporates a measure of frequency spread in the calculation of phase congruency. Frequency spread is discussed in section 3.6.


[Figure 13: response vectors of length A_n and phase angle phi_n in the even/odd filter output plane, plotted per frequency and head to tail along the frequency axis, with E(x) the length of their vector sum.]
Figure 13: Calculation of phase congruency from convolution of the signal with quadrature pairs of filters. The convolution output from each quadrature pair of filters at a location in the signal can be considered to represent a response vector having length A_n and phase angle \phi_n. When the response vectors are plotted head to tail phase congruency can be seen to be the ratio of the length of the sum of the vectors to the total path length taken by the response vectors in reaching the end point.


[Figure 14 panels: the signal; its convolutions with even and odd filters at scales 1, 4 and 7; the amplitude at each scale; the sums of the even convolutions, the odd convolutions and the amplitudes; local energy; frequency spread; and the resulting phase congruency (energy normalized by the sum of amplitudes and weighted by frequency spread).]
Figure 14: Steps in the calculation of phase congruency. Note: Not all intermediate convolutions are displayed.


3.4 Noise

A difficulty with phase congruency is its response to noise. Figure 15 illustrates the phase congruency of a step function with and without noise. In this example the signal-to-noise ratio is 80 (here SNR is defined as step size / \sigma_{noise}). In the vicinity of the step, phase congruency is only high at the point of the step. However, away from the step the fluctuations due to noise are considered to be significant relative to the surrounding signal (which is noise). This will occur no matter how small the noise level is. This is the price one pays for using a normalized measure such as phase congruency.
[Figure 15 panels: a step and a noisy step, each with its phase congruency plot.]
Figure 15: Phase congruency of a step function with and without noise.

What is it that makes noise noise? Trying to recognize noise by its spectral characteristics is of no use to us. The amplitude spectrum of noise is typically flat, effectively indistinguishable from the spectral characteristics of a line (delta) feature in an image. However, we can make an attempt to identify the level of noise in an image if we make use of the following two observations:

1. Noise is everywhere in an image and its level is generally constant.

2. Features, such as edges, occur sparsely in an image (in fact on a set of measure zero).


If one assumes these two conditions (and assumes that the noise is additive) we can deduce that the response of the smallest scale (highest centre frequency) filter pair in our wavelet bank will be almost entirely due to noise. Only at feature points will the response differ from the background noise response, and the regions where it will be responding to features will be small due to the small spatial extent of the wavelet. The distribution of response amplitudes from the smallest scale filter pair across the whole image will be asymmetric. The peak of the distribution, representing response to noise, will be strongly skewed to the left and a long tail will extend at the upper end of the distribution as a result of responses to features. A simple mean of such a distribution will not provide us with a good estimate of the average noise response because the response to features will strongly bias the result. The approach that has been taken is to use the exponent of the mean (over x) of the log of the response amplitudes as an estimate of the average response to noise. That is, the estimate of the mean noise response of the smallest scale filter pair over the signal is given by

\[ \bar{A}_0 = e^{\overline{\log A_0(x)}}. \tag{13} \]
Taking the log of the response amplitudes tends to discount the influence of the tail at the upper end of the distribution¹. As illustrated in the example shown in Figure 16 this estimate is not perfect but is certainly better than using a simple mean. Another candidate for estimating the mean noise response is the mode of the distribution. However, in experiments it has been found that use of the mode can produce unreliable results. The response amplitude distribution is not always a smooth curve, and may not have a unique or clearly identifiable maximum. Spurious spikes can also cause the mode to be a very poor estimator. If the noise distribution is uniform the typical maximum amplitude response to noise will be given by twice this estimated average noise response, as the minimum possible response level will always be 0. If the noise distribution is Gaussian
¹Canny [12] encounters a similar problem in estimating the noise response of his optimal detector. He convolves the output of his edge detector with the second derivative of an impulse function. The distribution of the output consists of a Gaussian noise distribution plus an extended tail due to valid edge responses. Canny chooses to take the mean of the lower 80% of the distribution to estimate the noise.


[Figure 16 panels: (a) image; (b) filter response amplitudes; (c) histogram of amplitudes, marked with mean(linear x) and exp(mean(log(x))).]
Figure 16: Example histogram of response amplitudes of a quadrature pair of small scale filters applied in one orientation to an image. (a) Image being analyzed. (b) Amplitude of smallest scale filter pair response (filters oriented horizontally). (c) Histogram of response amplitudes. Marked on the histogram is the simple mean of amplitudes and the exponent of the mean of the log of the amplitude values. (Note the response amplitude image contains floating point values. Values in the image were quantized into 1000 levels in order to construct the histogram. Note also that the histogram plot has been truncated; the maximum amplitude was approximately 53.)

the typical maximum amplitude response will be approximately three times the estimated average noise response (the mean amplitude being an estimate of the standard deviation). It remains for us to estimate the response of the wavelets at other scales in our filter bank to the noise in the image, and then from this deduce the influence of noise on our measure of phase congruency. If it is assumed that the frequency spectrum of the noise is flat we can estimate


the noise response of other wavelets relative to the response of the smallest scale wavelet pair from the relative bandwidths of the wavelets. The square of a filter pair's response amplitude in the spatial domain will be related to the power spectrum of the signal via Rayleigh's theorem: the integral of the squared modulus of a function is equal to the integral of the squared modulus of its spectrum. That is,

\[ \int_{-\infty}^{\infty} |f(x)|^2 \, dx = \int_{-\infty}^{\infty} |F(\omega)|^2 \, d\omega. \]

If we assume the noise power level is constant with frequency then the square of a filter's response amplitude will be proportional to its bandwidth. Thus, the amplitude of a filter's response to noise will be proportional to the square root of its bandwidth (and hence the square root of its centre frequency):

\[ A_{n\,noise}^2 \propto \text{bandwidth} \propto f_n, \qquad A_{n\,noise} \propto \sqrt{f_n}, \tag{14} \]

where A_{n\,noise} is the amplitude of the noise response of the filter at scale n and f_n denotes the centre frequency of the filter at scale n.
[Figure 17: a flat noise power spectrum A^2 with wavelet frequency bands W1, W2 and W3 marked on a log frequency axis, together with the corresponding wavelet amplitude responses A.]
Figure 17: The amplitude response of a 1D wavelet to noise is proportional to the square root of its bandwidth (and hence the square root of its centre frequency) if the noise has a uniform power spectrum.

If the estimate of the mean noise response of the smallest scale wavelet pair over the whole image is given by \bar{A}_0, then the amplitude of the influence of noise on a filter at another scale can be estimated from the relative square roots of the filter bandwidths/centre frequencies:

\[ A_{n\,noise} = \sqrt{\frac{f_n}{f_0}} \, \bar{A}_0, \tag{15} \]


where A_{n\,noise} is the estimated amplitude of the influence of noise on the filter at scale n, f_0 is the centre frequency of the smallest scale (highest centre frequency) filter and f_n is the centre frequency of the filter at scale n. If m is the scaling factor between successive wavelets then the filter at the nth scale will have a centre frequency given by

\[ f_n = \frac{f_0}{m^n}. \tag{16} \]

If equations 15 and 16 are combined the magnitude of the total noise influence over all filter scales can be estimated with the following expression:

\[ T = k \bar{A}_0 \sum_{n=0}^{N-1} \sqrt{\frac{1}{m^n}}, \tag{17} \]

where N is the number of wavelet scales used, m is the scaling factor between successive filters and k is a scaling factor used to estimate the maximum amplitude of the noise response from the mean amplitude (typically k \approx 2.5).
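The noise estimate of equation (13) and the total noise influence T of equation (17), together with the clipped normalization the text goes on to introduce, can be sketched as follows (the function names and the small floor added before taking logs are illustrative choices, not from the thesis):

```python
import numpy as np

def noise_threshold(A0, n_scales, mult, k=2.5):
    """Estimate the total noise influence T (equations 13 and 15-17).

    A0: array of response amplitudes of the smallest scale (highest
    centre frequency) filter pair over the signal or image.
    """
    A0 = np.asarray(A0, dtype=float)
    # eq (13): exponent of the mean log amplitude, robust to the long
    # upper tail produced by genuine features
    A0_bar = np.exp(np.mean(np.log(A0 + 1e-12)))
    # eqs (15)-(17): noise amplitude at scale n falls as sqrt(1/m^n)
    scale_sum = sum(np.sqrt(1.0 / mult**n) for n in range(n_scales))
    return k * A0_bar * scale_sum

def noise_compensated_pc(E, sumA, T, eps=0.01):
    """Clip local energy below the noise level T before normalizing."""
    return np.maximum(E - T, 0.0) / (sumA + eps)
```

Energy values that never rise above the estimated noise floor are driven to zero, while genuine features lose only the fraction of their energy attributable to noise.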
n An (x).

It is an upper bound because in general the noise responses

will not all be in phase; some will cancel. This bound on the eect of noise on
n

An (x) is in turn an upper bound of its eect on the value of local energy, E (x),
n

as

An (x) is always greater than, or equal to, E (x) by a triangle inequality. If

we subtract this estimated noise eect from the local energy before normalizing it by the sum of the wavelet response amplitudes we will eliminate spurious responses to noise. Thus, the expression for phase congruency is modied to the following: P C (x ) = E (x) T , n A n (x ) + (18)

where denotes that the enclosed quantity is equal to itself when its value is positive, and zero otherwise. The phase congruency of a legitimate feature will be reduced according to the magnitude of the noises local energy relative to the feature. Thus we end up with a measure of condence that the feature is signicant relative to the level of noise. This approach proves to be highly eective. Figure 19 shows the results of


CHAPTER 3. PHASE CONGRUENCY FROM WAVELETS

Figure 18: Compensating for noise in the calculation of phase congruency. The background value of energy generated from the image noise results in a noise circle of radius T. Phase congruency is now calculated by only using the amount by which the signal energy exceeds T before normalizing it by Σ_n A_n(x). (The diagram plots the odd-symmetric filter output, H(x), against the even-symmetric filter output, F(x), showing the local energy vector E(x) and the noise circle of radius T.)
Figure 19: Noise compensated phase congruency of two step profiles; SNR 13.3 and 5.3 respectively. (Each column shows a noisy step signal and its computed phase congruency.)

processing two noisy step profiles. In both cases a k value of 2 was used to estimate the maximum influence of noise on local energy.

It should be re-emphasized here that this approach to noise compensation makes the assumption that the noise spectrum is uniform. In practice this will not be


exactly the case, particularly so with images from CCD cameras. Smear in pixel values along the rows of an image from a CCD camera results in the noise content having greater amplitude at lower frequencies in the horizontal direction. However, looking in the vertical direction one does find that the noise spectrum is close to uniform, as assumed. In all the results that are subsequently presented for 2D images in this thesis it has been assumed that the noise spectrum is uniform in all orientations. This has produced satisfactory results, though no doubt improvements could be made if the spectral noise model was modified to match data measured from CCD cameras. Appendix B describes some experiments that were conducted to qualitatively assess the spectral properties of noise in images from CCD cameras.
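The noise compensation scheme of equations 15 to 18 can be sketched compactly in code. The following Python sketch uses log-Gabor-style radial filters built in the frequency domain; the filter shape, the 0.55 bandwidth parameter, and the use of the mean smallest-scale amplitude as the estimate of A_0 are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

def noise_compensated_pc(signal, nscales=4, min_wavelength=4.0,
                         mult=2.0, k=2.5, eps=1e-4):
    """1D noise-compensated phase congruency, following Eqs 15-18.
    Filter construction and parameter values are illustrative only."""
    N = len(signal)
    F = np.fft.fft(signal)
    freqs = np.fft.fftfreq(N)
    radius = np.abs(freqs)
    radius[0] = 1.0                      # avoid log(0) at DC

    sumA = np.zeros(N)
    sumresp = np.zeros(N, dtype=complex)
    A0_mean = 0.0
    for n in range(nscales):
        fo = 1.0 / (min_wavelength * mult**n)    # geometric centre frequencies
        G = np.exp(-np.log(radius / fo)**2 / (2 * np.log(0.55)**2))
        G[0] = 0.0
        # analytic filter: positive frequencies only -> even + i*odd output
        resp = np.fft.ifft(F * np.where(freqs > 0, 2.0, 0.0) * G)
        sumA += np.abs(resp)
        sumresp += resp
        if n == 0:
            # mean noise amplitude estimated from the smallest-scale pair
            A0_mean = np.mean(np.abs(resp))
    E = np.abs(sumresp)                  # local energy
    # Eq 17: noise amplitudes scale with the square root of centre frequency
    T = k * A0_mean * sum(np.sqrt(1.0 / mult**n) for n in range(nscales))
    return np.maximum(E - T, 0.0) / (sumA + eps)   # Eq 18
```

Applied to a clean step profile, the compensated measure peaks at the edge and is driven to zero in flat regions, where the raw normalization of local energy would otherwise amplify whatever small responses are present.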

3.5 Extension to two dimensions

So far the discussion has been limited to signals in one dimension. Calculation of phase congruency requires the formation of a 90° phase shift of the signal, which we have done using odd-symmetric filters. As one cannot construct rotationally symmetric odd-symmetric filters, one is forced to analyze a two dimensional image by applying our one dimensional analysis over several orientations and combining the results in some way. There are four issues to be resolved: the shape of the filters in two dimensions, the number of orientations to analyze, the way in which the results from each orientation are combined, and the changes for noise compensation that are required in two dimensions.

3.5.1 2D filter design

The one dimensional filters described previously can be extended into two dimensions by simply applying some spreading function across the filter, perpendicular to its orientation. The obvious spreading function to use is the Gaussian, and there are good reasons for choosing it. Consider a 2D wavelet being convolved with a step edge feature that is not aligned with the orientation of the filter, as shown in Figure 20. If the filter is separable the convolution can be accomplished by a 1D convolution in the vertical direction with the spreading function, followed by a


1D convolution horizontally with the wavelet function. Since we are interested in the phase information, the important thing to ensure is that convolution with the spreading function does not corrupt the phase data in the image.

Figure 20: Convolution of a 2D wavelet with an angled edge.

In this example, the result of convolving the image vertically with a Gaussian spreading function will be to blur the edge, so that on the subsequent convolution with the wavelet function we encounter a Gaussian smoothed step. Looking in the frequency domain, any function smoothed with a Gaussian suffers amplitude modulation of its components, but phase is unaffected (the transfer function of a Gaussian is a Gaussian). Thus, the phase congruency at the point of the feature is preserved. Ronse provides a formal proof of this result [78]. If, on the other hand, we were to use, say, a rectangular spreading function, some phase angles in the signal would be reversed because the transfer function (a sinc function) has negative values. The phase values would be corrupted and the step edge would be changed to a ramp; we would then perceive two Mach bands along the ramp edges instead of one step edge.

3.5.2 Filter orientations

To detect features at all orientations the bank of filters must be designed so that it tiles the frequency plane uniformly. In the frequency plane the filters appear as 2D Gaussians symmetrically or anti-symmetrically placed around the origin, depending on the spatial symmetry of the filters. The length to width ratio of the 2D wavelets controls their directional selectivity. This ratio can be varied in conjunction with the number of filter orientations used in order to achieve an even coverage of the


2D spectrum.

Figure 21: (a) A rectangular spreading function has a transfer function that reverses some phase components. (b) A Gaussian spreading function only modulates amplitudes.

A logical way to construct 2D filters in the frequency domain is to use polar-separable 2D Gaussians. In the radial direction, along the frequency axis, the filters are designed in the same way as we have been designing 1D filters, that is, Gaussians with geometrically increasing centre frequencies and bandwidths. In the angular direction, the filters have Gaussian cross-sections, where the ratio between the standard deviation and the angular spacing of the filters is some constant. This ensures a fixed length to width ratio of the filters in the spatial domain. Thus the cross-section of the transfer function in the angular direction is

    G(\theta) = e^{-\frac{(\theta - \theta_0)^2}{2\sigma_\theta^2}} ,       (19)

where θ_0 is the orientation angle of the filter, and σ_θ = s Δθ, with s being a scaling factor and Δθ the orientation spacing between filters. The scaling factor s is set so that the overlap of the Gaussians in the angular direction is sufficient to ensure even coverage of the 2D frequency plane. In the results presented here a scaling factor value of 1.2 has been used. This value is not critical, though it has been found that when values greater than about 1.4 are used, the uneven spectral coverage that results starts to cause errors in the normalizing of local energy to produce phase congruency. A filter orientation spacing of 30° has been found to provide a good compromise between the need to achieve an even spectral coverage


while minimizing the number of orientations. The use of more filter orientations does not change the quality of the results significantly (see Appendix A). The final arrangement of filters results in a rosette of overlapping polar-separable 2D Gaussians in the frequency plane (Figure 22). Simoncelli et al. [84] describe a systematic filter design technique for achieving uniform coverage of the frequency plane that could be applied here.
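The evenness of angular coverage can be checked directly by summing the cross-sections of equation 19 over all orientations. The sketch below assumes σ_θ = s·Δθ with s = 1.2 and six orientations 30° apart, as described above; orientations are treated modulo π since a filter and its reflection cover the same orientation.

```python
import numpy as np

norient = 6
dtheta = np.pi / norient          # 30 degree orientation spacing
s = 1.2                           # assumed ratio sigma_theta / dtheta
sigma = s * dtheta

theta = np.linspace(0, np.pi, 720, endpoint=False)

def angular_gauss(theta, theta0, sigma):
    # angular distance wrapped into (-pi/2, pi/2] so orientations
    # are compared modulo pi
    d = np.angle(np.exp(2j * (theta - theta0))) / 2.0
    return np.exp(-d**2 / (2 * sigma**2))     # Eq 19 cross-section

coverage = sum(angular_gauss(theta, o * dtheta, sigma)
               for o in range(norient))
ripple = (coverage.max() - coverage.min()) / coverage.mean()
print(ripple)   # relative unevenness of the angular tiling; small here
```

With these settings the summed cross-sections are flat to well under a few percent, which is consistent with the observation that the exact value of s is not critical until the Gaussians become too narrow to overlap properly.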

Figure 22: Tiling of the frequency plane with oriented filters at different scales.

3.5.3 Noise compensation in two dimensions

When compensating for noise in two dimensions one has to consider the spatial width of the filters in addition to their bandwidths. The spatial width of a filter affects its noise response. In the design of a filter rosette, the spatial width of a filter is varied with its centre frequency in order to maintain constant directional sensitivity. Viewed in the frequency plane, the angular width of the filters is proportional to their bandwidth, resulting in the smallest scale filters gathering an even greater proportion of energy from the spectrum than they do in the 1D case (see Figure 22). Thus, in 2D, the square of the amplitude response in the spatial domain will be proportional to the square of the bandwidth multiplied by the power level of the noise. That is, in 2D, the relative amplitudes of the filter responses to noise will now be directly proportional to their bandwidth. Accordingly the noise


compensation term T now becomes

    T = k \, \bar{A}_0 \sum_{n=0}^{N-1} \frac{1}{m^n}
      = k \, \bar{A}_0 \, \frac{1 - (\frac{1}{m})^N}{1 - \frac{1}{m}} ,     (20)

where \bar{A}_0 = e^{\overline{\log A_0(x,y)}} forms our estimate of the mean noise response of the smallest scale 2D filter pair over the image.
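The closed form in equation 20 is just the finite geometric series Σ_{n=0}^{N-1} (1/m)^n. A quick numerical check, using arbitrary illustrative values for k, the noise estimate, the scale factor m, and the number of scales N:

```python
# Check that the summed and closed forms of Eq 20 agree.
k, A0, m, N = 2.5, 0.1, 2.0, 4

T_sum = k * A0 * sum((1.0 / m)**n for n in range(N))
T_closed = k * A0 * (1 - (1.0 / m)**N) / (1 - 1.0 / m)

assert abs(T_sum - T_closed) < 1e-12
print(T_sum)  # 0.46875
```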

3.5.4 Combining data over several orientations

The important issue here is to ensure that features at all possible orientations are treated equally, and all possible conjunctions of features, such as corners and T junctions, are treated uniformly. Indeed, it is important to avoid making assumptions about the 2D form of features that one may encounter. It is important that the normalization of energy to form phase congruency is done after summing energies over all orientations. We want the final result to be a weighted normalized value, with the result from each orientation contributing to the final result in proportion to its energy. If energy was normalized in each orientation prior to being summed, then the contribution from each orientation would be independent of its energy, clearly an undesirable situation.

The approach that has been adopted is as follows: at each location in the image we calculate energy, E(x), in each orientation, compensate for the influence of noise, and then form the sum over all orientations. This sum of energies is then normalized by dividing by the sum over all orientations and scales of the amplitudes of the individual wavelet responses at that location in the image. This produces the following equation for 2D phase congruency:

    PC(x) = \frac{\sum_o \lfloor E_o(x) - T_o \rfloor}{\sum_o \sum_n A_{no}(x) + \varepsilon} ,   (21)

where o denotes the index over orientations. On a straight edge segment, only filters with orientations roughly perpendicular to the edge will have significant output; responses from other orientations will be negligible and the equation above will reduce to the 1D expression for phase


congruency developed earlier. At more complex feature types such as corners and T junctions, several orientations will have energy at significant levels. However, the sum of the amplitudes of the individual wavelet responses will also increase, maintaining the appropriate normalization. Notice in the equation above that noise compensation is performed in each orientation independently. In practice this has been found to give significantly better results, as it allows effective noise compensation in the case when the image noise content is anisotropic. The perceived noise content, as deduced from the average response of the smallest scale wavelet pair, can vary significantly with orientation. This is due to the correlation in noise along scan lines that is often observed, and other processes that can occur in the digitization of an image. Appendix B discusses this issue in more detail.
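The combination step of equation 21 reduces to a few array operations once the per-orientation energies and amplitudes are available. In this sketch the array layout and function name are illustrative assumptions; the key points from the text are that noise compensation is applied per orientation, and normalization happens only after summing over orientations.

```python
import numpy as np

def pc2d_combine(E_o, A_no, T_o, eps=1e-4):
    """Combine per-orientation data into 2D phase congruency (Eq 21).
    E_o:  array (norient, H, W), local energy per orientation
    A_no: array (norient, nscales, H, W), wavelet response amplitudes
    T_o:  array (norient,), per-orientation noise thresholds"""
    # floor operator: subtract each orientation's noise threshold,
    # clamping negative results to zero, then sum over orientations
    numer = np.maximum(E_o - T_o[:, None, None], 0.0).sum(axis=0)
    # normalize by the total amplitude over all orientations and scales
    denom = A_no.sum(axis=(0, 1)) + eps
    return numer / denom
```

Because E_o never exceeds the sum of amplitudes in its orientation, the result stays in [0, 1]: for example, two orientations with perfectly congruent responses over two scales give a value just below one.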

3.6 The importance of frequency spread

Clearly a point of phase congruency is only significant if it occurs over a wide range of frequencies. As an extreme example of a signal with a narrow frequency spread consider a pure sine wave: its Hilbert transform will be a cosine, the sum of the squares of these two signals will be one everywhere, and thus phase congruency will be one everywhere. Clearly, in this situation phase congruency is not acting as a feature detector. A more common situation is where a feature has undergone Gaussian smoothing, either intentionally or through image degradation. For these smoothed functions phase congruency can be high over extended regions of the signal, resulting in poor localization. The Gaussian smoothing reduces the high frequency components in the signal and accordingly reduces the frequency spread. In the extreme, the frequency spread is reduced so much that locally we approach the situation that arises with pure sine functions. From Figure 23 we can see that to obtain good localization it is important to incorporate low frequency components in the calculation of phase congruency on smoothed profiles. These low frequency components are the least affected by the smoothing of the signal.


Figure 23: Phase congruency of a Gaussian smoothed step profile at two different scales of analysis. (a) Smoothed step profile. (b) Phase congruency considering wavelengths up to 65 units; the undulations in phase congruency are due to numerical effects. (c) Phase congruency considering wavelengths up to 150 units.

Thus we are concerned with two conditions: one is where there is a very narrow range of frequencies present, causing the values of energy and Σ_n A_n to become equal, making phase congruency one. The other is where only a limited range of frequencies is considered, as in Figure 23(b), and one encounters no significant frequency content at all. In this situation both energy and Σ_n A_n become very small, making the calculation of phase congruency poorly conditioned. This second situation is handled through the use of the parameter ε in the denominator of the


expression for phase congruency. Thus, as a measure of feature significance, phase congruency should be weighted by some measure of the spread of frequencies present. What then is a significant distribution of frequencies? If we consider some common feature types such as the square waveform (step edge), the triangular waveform (roof edge) and the delta function (line feature) as some of the "edgiest" waveforms imaginable, we can use their frequency distributions as a guide to the ideal. The power spectrum of a square wave is of the form 1/ω². Each of the wavelets that we use to analyze the signal gathers power from geometrically increasing bands of the spectrum. On a spectrum of this shape each wavelet will gather an amount of power that is inversely proportional to its bandwidth. (The power level is inversely proportional to the centre frequency squared, but the bandwidth over which the power is gathered is proportional to centre frequency.) In a manner analogous to the analysis of filter amplitude responses to noise, we can now deduce the expected distribution of filter amplitude responses to a square waveform. However, unlike the situation with noise, a filter's response to a step feature is localized in space. The spatial width of the response is dictated by the spatial width of the filter, which is inversely proportional to its bandwidth. So for each filter, while the power it captures falls with bandwidth, the spatial width over which that power is expressed, in terms of the square of the amplitude response, also falls with bandwidth. Therefore, to preserve the expected power gathered by each filter, the magnitudes of the squared amplitude response have to remain constant, independent of filter centre frequency. Hence, the expected distribution of amplitude responses to a step feature will be a uniform one. This is illustrated in Figure 24.
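This prediction, that geometrically scaled quadrature filter pairs all respond to a step with roughly equal amplitude, can be checked numerically. The sketch below uses log-Gabor-style radial filters as an illustrative stand-in for the thesis's wavelet bank; the centre frequencies and bandwidth parameter are assumed values.

```python
import numpy as np

N = 4096
signal = np.r_[np.zeros(N // 2), np.ones(N // 2)]   # a single step
F = np.fft.fft(signal)
freqs = np.fft.fftfreq(N)
radius = np.abs(freqs)
radius[0] = 1.0

amps = []
for n in range(4):
    fo = 1.0 / (16.0 * 2**n)                        # geometric centre freqs
    G = np.exp(-np.log(radius / fo)**2 / (2 * np.log(0.55)**2))
    G[0] = 0.0
    # analytic (quadrature pair) filter: positive frequencies only
    resp = np.fft.ifft(F * np.where(freqs > 0, 2.0, 0.0) * G)
    amps.append(float(np.abs(resp)[N // 2]))        # amplitude at the step
print(amps)   # roughly equal across all four scales
```

The amplitudes agree to within a few percent, reflecting the 1/ω² power spectrum of the step being integrated over constant-relative-bandwidth filters.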
Field [20] points out that in many cases images of natural scenes have overall power distributions that are inversely proportional to the frequency squared, and for this reason he also advocates the use of geometrically scaled filter banks. Under these conditions filters at all scales will, on average, be responding with equal magnitudes. This is likely to maximize the precision of any computation (numerical or neural) that we make with the filter outputs. It should be noted that the Gaussian smoothing of a feature that arises whenever


Figure 24: On a step feature, wavelet pairs at all scales will respond with the same amplitude. (The figure shows the power spectrum of a step, together with the transfer functions and amplitude responses of wavelets at three scales.)

a 2D filter having a Gaussian spreading function encounters a feature at some non-orthogonal angle (see Figure 20) does not significantly change the frequency distribution relative to what would be expected from an unsmoothed feature. The reason for this is that the degree of Gaussian smoothing, as seen by each filter, varies directly in proportion with the scale of each filter, due to its fixed length/width ratio. Consider the convolution process in the frequency domain: a filter at some given centre frequency will see a feature spectrum that has been modulated by a Gaussian of some spread. A second filter tuned to a frequency of twice the first


will see a feature spectrum that has been modulated by a Gaussian of twice the spread (spatially, its spreading function is a Gaussian of half the width of the first). The relative modulation of the feature spectrum, as seen by each individual filter over its bandwidth, will therefore remain roughly constant. This is illustrated in Figure 25. Thus, the overall distribution of filter responses, while modulated, will remain largely unchanged.

The other important feature types we must consider are the delta function (corresponding to line features) and roof edges. The power spectrum of a delta function is uniform. Following an analysis similar to that done for the square waveform, one can show that for a delta feature the amplitude of the wavelet filter responses will be proportional to their bandwidths, and hence their centre frequencies. This will give a distribution of filter responses strongly skewed to the high frequency end. On the other hand, for a triangular waveform where all the features are roof edges, the power spectrum falls off like 1/ω⁴. When considered in terms of the amplitude of filter responses on a logarithmic frequency scale this will produce a distribution strongly skewed to the low frequency end.

Thus, the difficulty faced here is that there is no one ideal distribution of filter responses. Given the very different types of distributions expected for delta and roof edges, we cannot even say the ideal distribution should be a function of the phase angle at which the congruency occurs. All we can say is that the distribution of filter responses should not be too narrow in some general sense. We can also say that a uniform distribution is of particular significance, as step discontinuities are a common feature type in images. Accordingly, we can construct a weighting function that devalues phase congruency at locations where the spread of filter responses is narrow.
A measure of filter response spread can be generated by taking the sum of the amplitudes of the responses and dividing by the highest individual response to obtain some notional width of the distribution. If this is then normalized by the number of scales being used, we obtain a fractional measure of spread that varies between 0 and 1. This spread is given by

    s(x) = \frac{1}{N} \, \frac{\sum_n A_n(x)}{A_{max}(x) + \varepsilon} ,  (22)


Figure 25: When filters having fixed length-width ratios are applied to an angled step edge, the degree of smoothing of the feature resulting from the filter's spreading function will vary with the scale of the filter. The effect of this is that the shape of the distribution of filter amplitude responses will not be greatly different from that obtained when the filter encounters a perpendicularly oriented edge. (a) Filters at scales differing by a factor of two encountering a step edge at an angle. (b) Equivalent 1D signals seen by each filter after 1D convolution with each filter's vertical spreading function; the broad scale filter sees a more heavily smoothed feature. (c) Processing of the respective signals in the frequency domain.

where N is the total number of scales being considered, A_max(x) is the amplitude of the filter pair having maximum response at x, and ε is used to avoid division by zero


and to discount the result should both Σ_n A_n(x) and A_max(x) be very small.² If all

the filter responses are equal we have the maximum possible frequency spread, and the spread measure becomes one. If only one filter response is non-zero (minimal spread) the spread measure falls to 1/N. A phase congruency weighting function can then be constructed by applying a sigmoid function to the filter response spread value, namely

    W(x) = \frac{1}{1 + e^{g(c - s(x))}} ,                                  (23)

where c is the cut-off value of filter response spread below which phase congruency values become strongly penalized, and g is a gain factor that controls the sharpness of the cut-off. Typical values for c and g are 0.4 and 10 respectively; a plot of the weighting function with these parameters is shown in Figure 26. Note that the sigmoid function has been chosen merely for its simplicity and ease of manipulation.
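Equations 22 and 23 together take only a few lines. In this sketch the function name and array layout are illustrative; the parameter values are the typical ones quoted above (c = 0.4, g = 10).

```python
import numpy as np

def spread_weight(A, c=0.4, g=10.0, eps=1e-4):
    """Frequency-spread weighting, Eqs 22 and 23.
    A: array (nscales, npoints) of wavelet amplitude responses."""
    N = A.shape[0]
    # Eq 22: notional width of the response distribution, in [~1/N, 1)
    s = (A.sum(axis=0) / (A.max(axis=0) + eps)) / N
    # Eq 23: sigmoid weighting with cut-off c and gain g
    return 1.0 / (1.0 + np.exp(g * (c - s)))

A_flat = np.ones((4, 1))                           # uniform over 4 scales
A_peaked = np.array([[1.0], [0.0], [0.0], [0.0]])  # one scale responding
# weight near 1 for the flat case; strongly penalized for the peaked case
print(spread_weight(A_flat), spread_weight(A_peaked))
```

Uniform responses give s near 1 and a weight near 1; a single responding scale gives s = 1/N, well below the cut-off, so the weight is pushed towards zero.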
Figure 26: Frequency spread weighting function with a cut-off value of 0.4 and g value of 10.

In 2D, the weighting of phase congruency by frequency spread at each point in an image has to be done in each orientation independently. Thus the weighting factor is applied to the energy in each orientation prior to summing over all orientations and normalizing by the total sum of the filter response amplitudes. We modify equation 21 to produce the following:

    PC(x) = \frac{\sum_o W_o(x) \lfloor E_o(x) - T_o \rfloor}{\sum_o \sum_n A_{no}(x) + \varepsilon} .   (24)

Figure 14 provides an example of how the frequency spread weighting function varies across a 1D signal. Weighting by frequency spread, as well as reducing spurious responses where the frequency spread is low, has the additional benefit of sharpening the localization of features, especially those that have been smoothed. Referring to Figure 14, it is of interest to note that peaks in E(x), Σ_n A_n(x) and, to a lesser extent, W(x) are all strong indicators of the presence of a feature, though it is phase congruency that provides a dimensionless quantity that is invariant to contrast.

² The presence of the small constant ε to avoid division by zero will, of course, mean that s(x) can never attain its ideal maximum value of 1.

3.7 Scale via high-pass filtering

3.7.1 Difficulties with low-pass filtering

The traditional approach to analyzing an image at different scales is to consider various low-pass or band-pass versions of the image. Versions of the image having only low frequencies left are considered to contain the broad scale features. This approach is inspired by the presence of receptive fields in the visual cortex that act as band-pass filters [54]. While this approach is intuitive, the justification for assuming the brain uses these band-passed versions of the image directly for multi-scale analysis is perhaps somewhat circular. On being presented with a low-pass version of an image one is asked "What features do you see?" Of course you only see the broad scale features: they are the only things left in the image to be seen.

A major problem with the use of low-pass or band-pass filtering for multi-scale analysis is that the number of features present in an image, and their locations, vary with the scale used. It seems very unsatisfactory for the location of a feature to depend on the scale at which it is analyzed. This is a problem for differential based feature detection schemes that require smoothing to suppress the influence of noise. It also creates difficulties for applications that make use of coarse-to-fine strategies for feature matching. Feature positions drift with scale, localization at broad scales is poor, and the number of features proliferates as scale is reduced. This problem was recognized by Marr, and a good illustration of this is found in his book³, which shows zero crossings in an image of a Henry Moore sculpture at three different scales. In many parts of the image zero crossings at one scale do not correspond to any other

³ Marr, Vision, 1982, Figure 2-20, page 69.


zero crossings at other scales. In his caption for this sequence of images Marr says "... This set of figures neatly poses the next problem: how do we combine all this information into a single description?" Marr proposes a coarse-to-fine strategy to resolve this problem, but he has difficulties in constructing a truly systematic approach and recognizes that many special cases would have to be accounted for.

Witkin [97] introduces the idea of a scale-space image in an attempt to resolve this problem (that is, different settings of the scale parameter result in different descriptions of the scene), and notes that we have no way of knowing what scale of analysis is correct. He argues that signals should be analyzed over a continuum of scales to produce a scale-space image. He also specifies the requirement that while additional zero crossings in the scale-space image may appear as scale is decreased, existing ones should not disappear. Witkin shows that the Gaussian is the only smoothing kernel that guarantees this behaviour. Then, by considering the trajectories of zero crossings over scale, he suggests that the scale-space image can be collapsed into a tree structure that provides a concise, qualitative description of the signal over all scales. Witkin also makes the observation that the features we consider most salient in a signal seem to correspond to zero crossings that remain stable over wide ranges of scale.

Koenderink [46] develops Witkin's analysis of 1D signals and extends it to 2D images. He recognizes the connection between smoothing in scale-space and the diffusion equation, and this allows him to develop a detailed study of the behaviour of scale-space in 2D. While Witkin and Koenderink advocate the use of zero crossings in scale-space for representing signals, Hummel's work [38] suggests that there are considerable difficulties in this approach. Hummel investigates the representation of signals in terms of their zero crossings in scale-space and one's ability to reconstruct the signal from this data. He concludes that for general signals this representation is unstable. Further, he shows that even when gradient information at the zero crossings is used, the representation is still unstable.


Bergholm [5] adopts the scale-space model in developing his edge focusing approach to edge detection (described in Chapter 2). He makes a detailed study of the way feature positions are corrupted by Gaussian smoothing, this information being essential to the design of his coarse-to-fine strategy of finding edges. However, he reports difficulties in tracking blurred edges, where the final edge points end up being scattered and broken up, presumably because scale-space becomes degenerate at the finest scales on a heavily blurred feature.

Perona and Malik [69] make a strong critique of the standard scale-space model. They are particularly concerned with the fact that the position of a feature is not known unless it is tracked through scale-space to the finest level, and they argue that feature positions should be stable over scale. They develop an edge detection technique that is inspired by Koenderink's suggestion that the diffusion equation could possibly be run backwards in time to enhance an image; this, however, is an ill-posed problem. Instead, Perona and Malik devise a method to run the diffusion equation forwards in time to enhance edges. Their approach was described in Chapter 2.

3.7.2 High-pass filtering

It is clear that there are many problems with the concept of scale which have not been satisfactorily resolved. In particular, the idea that a feature's position should vary with the scale of analysis is very unsatisfactory. It is argued here that feature location should not be a function of the scale of analysis; only the relative significance of features should change. Subsequent image operations such as coarse-to-fine strategies would be greatly facilitated (or not even needed) if only the relative significance of features varied with scale, with the number of features and their locations remaining stable.

Much of the thinking about scale has been strongly influenced by the problems of applying differential operators to images. The use of phase congruency to measure feature significance allows one to consider an alternative interpretation of feature scale. Phase congruency at some point in a signal depends on how the feature is built up from the local frequency components. Depending on the size of the


analysis window, features some distance from the point of interest may contribute to the local frequency components considered to be present. Thus features are not considered in isolation but in context with their surrounding features. Therefore, as far as phase congruency is concerned, the natural scale parameter to vary is the size of the window in the image over which we perform the local frequency analysis. In the context of the use of wavelets to calculate phase congruency, the scale of analysis is specified by the spatial extent of the largest filter in the wavelet bank. (Filters in the wavelet bank range from some minimum spatial scale, say corresponding to the Nyquist frequency, up to a scale matching the window size.) With this approach high-pass filtering is being used to specify the analysis scale: low frequency components are being cut out (those having wavelengths larger than the window size) while the high frequency components are left intact.

If we use a small analysis window, each feature will be treated with a great degree of independence from other features in the image. We will only be comparing each feature to a small number of other features that are nearby, and hence each feature is likely to be perceived as being more important locally. At the largest scale (window size equal to image size) each feature is considered in relation to all other features, and we obtain a sense of global significance for each feature. Something that may have high phase congruency/significance when analyzed over a small window may end up having a low measure of phase congruency if we look at it within a larger analysis window and consider the phase of the lower frequencies present. A feature having high phase congruency when analyzed over a wide range of spatial scales, from small to large, is clearly of greater significance than one having congruency over a limited number of small spatial scales.
It should be noted that the original ideal, that the significance of image features should be invariant to image magnification, is not really attainable. In practice we have to compute phase congruency using a finite number of spatial filters that cover a limited range of the spectrum. Changing the magnification of an image may alter the relative responses of individual filters and hence change the perceived phase congruency. However, the changes in measured phase congruency will, in general, be much smaller than the corresponding changes that would occur in the intensity


gradients of the image, and hence in the output of a gradient based edge detector.

In summary, it is proposed that multi-scale analysis be done by considering phase congruency of differing high-passed versions of an image. The high-pass images are constructed from the sum of band-passed images, with the sum ranging from the highest frequency band down to some cut-off frequency. With this approach, no matter what scale we consider, all features are localized precisely and in a stable manner. There is no drift of features as occurs with low-pass filtering. All that changes with analysis at different scales is the relative significance of features. Figure 27 illustrates a one-dimensional signal at three different scales of low-pass, band-pass and high-pass filtering, along with phase congruency at the three high-pass scales. The low-pass signals were obtained by convolving the signal with Gaussian masks of different widths. The band-pass signals were obtained through convolution with even Gabor filters. The high-pass signals were obtained by summing the convolution results from even Gabor filters ranging from the finest scale (wavelength 2 pixels) down to varying levels of coarser scale. In addition to the 1D example shown here, Appendix A contains experimental results where the stability of phase congruency on 2D images at different levels of high-pass filtering is demonstrated.
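The construction of high-pass versions at different scales can be sketched directly in the frequency domain. Here a sharp frequency-domain mask is used as an illustrative stand-in for the summed Gabor bands described above; the function name and cut-off wavelengths are assumptions, not the thesis's implementation.

```python
import numpy as np

def highpass_versions(signal, cutoff_wavelengths=(8, 32, 128)):
    """Sketch of scale via high-pass filtering: each scale keeps all
    frequency components with wavelengths up to the given cut-off."""
    N = len(signal)
    F = np.fft.fft(signal)
    freq = np.abs(np.fft.fftfreq(N))
    out = {}
    for wl in cutoff_wavelengths:
        # keep wavelengths <= wl; DC and lower frequencies are cut out
        mask = freq >= 1.0 / wl
        out[wl] = np.fft.ifft(F * mask).real
    return out
```

Each output is zero-mean (the DC term is always removed), and larger cut-off wavelengths admit strictly more low-frequency content, so the high-pass versions form a nested family: coarser scales of analysis add lower frequencies without disturbing the high-frequency content already present.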


CHAPTER 3. PHASE CONGRUENCY FROM WAVELETS


Figure 27: Analysis at different scales. (a) Three different low-pass versions of a 1D signal. (b) Three different band-pass versions of the signal. (c) Three different high-pass versions of the signal. (d) Phase congruency at the three scales of high-pass filtering shown in (c). Note that the number and location of features as measured by phase congruency remains constant and only their relative significance varies. Under low-pass and band-pass filtering the number and locations of features vary.


3.7.3 High-pass filtering and scale-space

There is a mirror symmetry in the relationship between the classical low-pass filtering model of scale-space and the high-pass model. In the low-pass model the low scale image is the one that has been heavily smoothed, while the highest scale image corresponds to the original image. On the other hand, the high-pass approach to scale considers the original image as representing the lowest scale of image, with higher scales obtained via high-pass filtering. This is illustrated by Figure 28.

Figure 28: Filter transfer functions for low-pass and high-pass approaches to scale-space. With low-pass filtering the finest scale is achieved when the whole of the image spectrum is included. For high-pass filtering this situation is considered to represent the lowest scale of image.

If we look at the phase scalogram of a test profile (Figure 29) we can gain further insight into the differences between using high-pass and low-pass filtering for multi-scale analysis. The lines of phase congruency corresponding to the profile's feature points are readily seen in the scalogram. They all start at the finest scale (at the top of the scalogram) and extend downwards. The extent to which these lines of phase congruency extend down specifies the overall significance of a feature.

High-pass filtering corresponds to extracting sections of the scalogram from the top downwards. Thus, at a fine scale of analysis with high-pass filtering one will



Figure 29: Test profile and its phase scalogram. Highest frequencies are at the top of the scalogram and lowest at the bottom.

be considering a narrow section of the scalogram at the top. All lines of phase congruency will span the full width of the section being considered and hence will all be considered to be equally significant. As the analysis section of the scalogram is progressively widened downwards some lines of phase congruency will terminate. For these features the measured phase congruency will be reduced and their relative significance will fall. The perceived location of these features, given by the point of maximal phase congruency, remains stable. Thus we have an interpretation of scale where only relative feature significance changes with the scale of analysis, not feature location.

In contrast to this approach, low-pass filtering corresponds to extracting sections of the scalogram from the bottom upwards. Analysis of a heavily smoothed image corresponds to the analysis of a narrow section at the bottom of the scalogram.


Here one has only the vaguest idea of what features are present and what their locations are. As the analysis section is progressively extended upwards (in a coarse to fine strategy) new features appear and existing feature positions drift. Clearly, interpretation of the signal with this approach is far more difficult.

3.8 Experimental Results

A problem in discussing the performance of a feature detector is devising a sensible form of evaluation. Performance criteria have been used by a number of researchers to design edge operators, notably Canny [11, 12], Spacek [87] and Deriche [16]. These criteria generally measure the ability of a detector to produce a distinct local maximum at the point of a step discontinuity in the presence of noise. However, these criteria are limited in their usefulness. They are only concerned with specific feature types, usually step discontinuities, and they are not concerned with the absolute value of the resulting maxima in the detector's output. They provide no guide as to one's ability to set general thresholds. A feature detector is of limited use if one does not know in advance what level of response corresponds to a significant feature.

One of the primary motivations for using phase congruency to detect image features is that it is unaffected by image contrast and scale. This allows one to set thresholds that are applicable across wide classes of images. The other motivation for detecting features on the basis of phase congruency is that we are not required to make any assumptions about the luminance profile of the feature; we are simply looking for points where there is order in the frequency domain. Step discontinuities, lines and roof edges are all detected.

Figure 30 illustrates some results on a test image containing line features and step discontinuities of various magnitudes and orientations. For comparison, the output of the Canny detector is also presented [11]. The implementation of the Canny detector used here follows the modifications suggested by Fleck [24]. The raw gradient magnitude image is displayed so that comparison can be made without having to consider any artifacts that may be introduced by the non-maximal suppression


and thresholding processes. The purpose of providing this comparison is to illustrate some of the qualitative differences in performance between the two detectors. It is hard to make meaningful quantitative comparisons as the design objectives of the two detectors are completely different: one is seeking to localize step edges and the other is seeking to identify points of phase congruency.

In all the results presented here a wavelet bank over 6 scales was used, the smallest filters having a period of 3 pixels, and the scaling factor between successive wavelets being 1.5. Filters were constructed directly in the frequency domain as 2D Gaussians in the frequency plane. For each filter the ratio between the centre frequency and the Gaussian standard deviation was set at 3 (see Appendix D for implementation details). The wavelets had a length to width ratio of one and their orientational spacing was 30°. The final output was non-maximally suppressed and hysteresis thresholding applied between phase congruency values of 0.5 and 0.3. A noise compensation k value of 2.5 was used, and ε was set at 0.01. For the frequency spread weighting function the cut-off fraction, c, was set at 0.4 and the gain parameter, g, was set at 10. These parameters have been found to give reasonable results over a wide range of images. The non-maximal suppression technique that was used to produce the phase congruency edge maps is described in Appendix C. The Canny edge detector output was calculated after smoothing the images with a Gaussian mask having a standard deviation of 1 pixel.

The main qualitative difference between the two detectors is the wide range of response values from the Canny detector. For example, with the Canny detector, the low contrast square in the circle at the top right of the image almost disappears, whereas under phase congruency it is marked prominently. This wide range of responses from the Canny detector makes threshold selection difficult.
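For concreteness, the filter-bank geometry just described can be enumerated as follows. This is an illustrative sketch, not the thesis code, and the variable names are my own.

```python
# Illustrative reconstruction of the filter-bank geometry described above:
# 6 scales, smallest wavelength 3 pixels, a scaling factor of 1.5 between
# successive wavelets, a centre-frequency/sigma ratio of 3, and 30-degree
# orientation spacing.
nscale, min_wavelength, mult = 6, 3.0, 1.5

wavelengths = [min_wavelength * mult**s for s in range(nscale)]
centre_freqs = [1.0 / w for w in wavelengths]     # cycles per pixel
sigmas = [f / 3.0 for f in centre_freqs]          # Gaussian std. dev. in frequency
orientations = [30 * o for o in range(6)]         # 0, 30, ..., 150 degrees

print(wavelengths)    # 3.0 pixels up to 22.78125 pixels
print(orientations)
```

Six geometrically scaled wavelengths thus span roughly three octaves of the spectrum, from a 3 pixel period up to about 23 pixels.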
The other obvious difference is that the Canny detector produces responses on each side of line features, whereas the phase congruency detector produces a response centred on the line. (This problem was recognized by Canny, who designed a separate operator to detect line features.)


One problem with the phase congruency detector (or at least this implementation of it) is its behaviour at junctions of features having greatly different magnitudes. Notice how the horizontal edges in the grey scale on the left hand side of the image fade as they meet the strong vertical edge of the grey scale. At the junction between the low magnitude horizontal edges and the high magnitude vertical edge the normalizing component of phase congruency, $\sum_n A_n$, is dominated by the magnitude of the vertical edge. Thus, at the junction, the significance of the horizontal edges relative to the vertical one is downgraded.

Possibly this problem could be overcome through a different approach to combining phase congruency information over several orientations. Freeman [29], in his use of a normalized form of energy for feature detection, encountered this same problem at junctions of varying contrast. His solution was to normalize energy in each orientation independently, using only energy responses from filters in the same orientation. The energy values used for the normalization were blurred spatially in a direction perpendicular to the orientation being considered. A problem here is that one has to decide on the extent of the spatial smoothing. In addition, we cannot employ Freeman's approach to this problem because, as discussed in Section 3.5.4, we have specifically chosen not to apply the normalization in each orientation independently prior to combining the results over all orientations; otherwise the contribution from the result in each direction would be independent of its energy, which is clearly not desirable.

Others who have studied the behaviour of local energy at feature junctions are Rosenthaler et al. [79] and Robbins and Owens [76]. Their interest is specifically in the detection of 2D features, such as junctions and corners. Here, the fact that the sum of energy over multiple orientations becomes large at junctions makes it well suited to the task of 2D feature detection.

Figures 31 and 32 illustrate the output on two real images. Notice how, on the face image, the nose and the cheek features have been detected. These features are effectively broad roof edges. The use of low frequency components in the calculation of phase congruency contributes greatly to the detection of such features.

However, the use of these low frequency components is usually important for another reason. The other major qualitative difference between the raw phase


Figure 30: Results on a 256x256 test image. (a) Test image. (b) Raw output of phase congruency. (c) Phase congruency after non-maximal suppression and hysteresis thresholding between phase congruency values of 0.5 and 0.3. (d) Raw output of the Canny detector (standard deviation of smoothing filter = 1).

congruency output and the Canny output is the localization of the response to features. The raw phase congruency output is generally quite blurred. This is especially evident in the results on these two real images. The features in real images are usually smoothed or blurred in some way due to sampling; this reduces the spread of frequencies present in the signal, which in turn results in a poorly


Figure 31: (a) Grace image (256x256). (b) Raw output of phase congruency. (c) Phase congruency after non-maximal suppression and hysteresis thresholding between phase congruency values of 0.5 and 0.3. (d) Raw output of the Canny detector (standard deviation of smoothing filter = 1).

localized phase congruency output. Hence the importance of using low frequency components in the calculation of phase congruency. However, this alone is not enough to completely solve the problem, and many finer scale features are lost in the poorly localized phase congruency response. Freeman [29] also reports difficulties with local energy producing poorly localized output, though some of his problems


Figure 32: (a) Boat image (512x512). (b) Raw output of phase congruency. (c) Phase congruency after non-maximal suppression and hysteresis thresholding between phase congruency values of 0.5 and 0.3. (d) Raw output of the Canny detector (standard deviation of smoothing filter = 1).

are a result of the blurring he uses to reduce interference effects at feature junctions. He resorts to applying a second local energy calculation over the energy image to produce a sharper output. This approach is not adopted here; the problem of poor localization is addressed in a different manner in the following chapter.


3.9 Summary

This chapter has presented a practical implementation for the calculation of phase congruency in two-dimensional images using wavelets. It has been shown that for a normalized measure of feature significance, such as phase congruency, it is crucial to be able to recognize the level of noise in an image and to compensate for it. A highly effective method of compensation has been presented that only requires that noise be roughly constant across the image and that its power spectrum be approximately constant.

An argument has been presented for the importance of weighting phase congruency by some measure of the spread of the frequencies that are present at each point in an image. This prevents false positives being marked where the frequency spread is very narrow. It also improves the localization of features. While it is not possible to specify one ideal distribution of filter response amplitudes with frequency, it has been shown that when geometrically scaled filters are used, a uniform distribution of responses is a particularly significant one. This distribution matches typical spectral statistics of images and corresponds to the distribution that arises at step discontinuities.

Another contribution of this work has been to offer a new approach to the concept of scale in image analysis. The natural scale parameter to vary in the calculation of phase congruency is the size of the analysis window over which to calculate local frequency information. Thus, under these conditions, scale is varied using high-pass filtering rather than low-pass or band-pass filtering. The significant advantage of this approach is that feature locations remain constant over scale, and only their significance relative to each other varies.

While the motivation for detecting phase congruency in images has been inspired by psychophysical results (Morrone et al. [62], Morrone and Burr [60]), one can only speculate on the biological plausibility of the method of calculation presented in this chapter.
Despite the use of geometrically scaled filters in quadrature, the implementation presented here is not biologically plausible in its present form, as it requires intermediate convolution results to represent arbitrarily large positive


and negative values (see Figure 14). Physical cells eventually saturate in their output, and are generally considered only to be able to represent some kind of rectified response. Despite this, it is of interest to compare the general form of the expression for phase congruency (equation 24) with the form of some of the normalized contrast sensitivity models that have recently been developed to model psychophysical data. For example, some of the similarities between this work and the models proposed by Heeger [36, 35] are very striking.

While many of the issues in the practical implementation of the calculation of phase congruency in images have been addressed in this chapter, the poor localization of blurred features remains a problem. The following chapter reviews a number of issues associated with the calculation of phase congruency and, in doing so, develops a new approach that provides much better localization and also offers other possibilities in the use of local phase information in images.

Chapter 4 A second look at phase congruency


4.1 Introduction

This chapter takes a fresh look at phase congruency and re-examines some of the assumptions that were made and the techniques that were employed. Much of the work presented in this chapter is the result of investigating the use of log Gabor functions for the wavelet design. Unlike Gabor functions, there are no limitations on the maximum bandwidth that can be obtained from log Gabor functions. Increasing the bandwidth of a filter generally corresponds to reducing its size in the spatial domain. Accordingly, it was hoped that log Gabor functions could be exploited to produce filters of small spatial extent, and that this would result in a more localized phase congruency response to features in images. Paradoxically, phase congruency localization is made worse if one takes this approach.

To solve this problem an analysis of the phase congruency function itself is made. It is noted that the phase congruency measure that has been used so far is a function of the cosine of the phase deviation. As this function is not sensitive to small deviations of phase, it is concluded that this is the main cause of the poor localization that had been achieved so far with phase congruency. To remedy this, a new measure of phase congruency, based on the weighted mean absolute deviation of phase from the weighted mean, is devised. This new approach proves to be very


successful, and it also opens up some new ideas in the use of phase information for extracting feature information. This work, in turn, leads to an approach to the detection of points of local symmetry and asymmetry in images. It is shown how symmetry and asymmetry can be thought of as representing generalizations of delta and step features respectively.

4.2 Log Gabor wavelets

So far Gabor wavelets have been used for the calculation of phase congruency. They would appear to be the logical choice as they offer the best way of simultaneously localizing spatial and frequency information. But is this really the objective we want? An alternative objective of our wavelet design might be to obtain as broad as possible spectral information with maximal spatial localization. Under this objective the Gabor function is not the best. One cannot construct Gabor functions of arbitrarily wide bandwidth and still maintain a reasonably small DC component in the even-symmetric filter.

This difficulty can be seen if we look at the transfer function of an even-symmetric Gabor filter in the frequency domain. The transfer function is the sum of two Gaussians centred at plus and minus the centre frequency. If σ, the standard deviation of these Gaussians, becomes more than about one third of the centre frequency, the tails of the two Gaussians will start to overlap excessively at the origin, resulting in a nonzero DC component. This is illustrated in Figure 33. At the limiting situation where the centre frequency is equal to three standard deviations, the bandwidth will be approximately one octave¹. This can be seen as follows: for a Gaussian, the points where its value falls to half the maximum are at approximately plus and minus one standard deviation. Thus the upper and lower cut-off frequencies will be at approximately 4σ and 2σ respectively, giving a bandwidth of one octave. This limitation on bandwidth means that we need many Gabor filters to obtain wide coverage of the spectrum.
¹Bandwidth is the ratio of the upper and lower cut-off frequencies. Here the cut-off frequencies are defined to be the upper and lower values at which the transfer function falls to half its maximum value.
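The limiting case described above is easy to verify numerically. The sketch below is my own illustration (arbitrary frequency units): it evaluates the DC leakage of an even-symmetric Gabor transfer function and its bandwidth when the centre frequency equals three standard deviations.

```python
import math

f0 = 3.0            # centre frequency (arbitrary units)
sigma = 1.0         # the limiting case: f0 equals 3 standard deviations

# The even-symmetric Gabor transfer function is a pair of Gaussians at +/- f0;
# its DC response is the sum of the two tails evaluated at the origin.
dc = 2 * math.exp(-f0**2 / (2 * sigma**2))

# The half-maximum cut-offs sit roughly one standard deviation either side of
# f0, so the bandwidth in octaves is log2((f0 + sigma) / (f0 - sigma)).
bandwidth_octaves = math.log2((f0 + sigma) / (f0 - sigma))

print(round(dc, 4))         # 0.0222 -- small but non-zero DC leakage
print(bandwidth_octaves)    # 1.0 -- approximately one octave
```

At this ratio the DC component is about 2% of the peak response; pushing σ any larger makes the overlap at the origin grow rapidly.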

Figure 33: Transfer function of a high bandwidth even-symmetric Gabor filter. The two Gaussians that make up the function overlap at the origin, resulting in a significant DC component.

With the inspiration for the calculation of phase congruency coming from psychophysical results, it is natural that we would like to use computational machinery that is consistent with what is known about the human visual system. A number of researchers have made measurements of the characteristics of mammalian visual cortical cells, and a review of many of the results can be found in Daugman [15] and in Webster and De Valois [94]. The apparent bandwidth of our spatial filters seems to depend on a variety of factors, including motion, illumination, and the spatial frequency of the stimulus. The overall conclusions that one can draw from these data are limited. There appears to be some consensus that we have spatial filters of bandwidths ranging from about 0.5 octaves up to about 3 octaves, with perhaps 1.5 octaves being considered a typical bandwidth. What is also generally observed is that filter transfer functions are roughly symmetric when viewed on a logarithmic frequency scale [95, 2].

Given the bandwidth limitations of Gabor functions described above, we now pose the following question: how are zero DC filters with bandwidths of up to 3 octaves obtained? Rosenthaler et al. [79], in their study of local energy, recognized this problem and devised a method of modifying even-symmetric Gabor filters to achieve a zero DC value. However, this difficulty does not appear to have been noted by researchers studying cortical cells. The requirement that filters have zero DC response would appear to be overwhelming; otherwise, how would we be able


to operate in lighting conditions that span several orders of magnitude, and how would one be able to construct lters in quadrature pairs? An alternative to the Gabor function is the log Gabor function proposed by Field [20]. Field suggests that natural images would be better coded by lters that have Gaussian transfer functions when viewed on the logarithmic frequency scale. (Gabor functions have Gaussian transfer functions when viewed on the linear frequency scale). On the linear frequency scale the log Gabor function has a transfer function of the form G ( ) = e 2(log(/o ))2 ,
(log(/o ))2

(25)

where $\omega_0$ is the filter's centre frequency. To obtain constant shape ratio filters² the term $\sigma/\omega_0$ must also be held constant for varying $\omega_0$. For example, a $\sigma/\omega_0$ value of 0.74 will result in a filter bandwidth of approximately one octave, 0.55 will result in two octaves, and 0.41 will produce three octaves.
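These bandwidth figures follow directly from equation (25): the half-maximum points occur where $(\log(\omega/\omega_0))^2 = 2\ln 2\,(\log(\sigma/\omega_0))^2$. The following small sketch is my own check of the quoted values, not thesis code.

```python
import math

def log_gabor_bandwidth_octaves(kappa):
    """Bandwidth in octaves between the half-maximum points of
    G(w) = exp(-(log(w/w0))^2 / (2 * (log(kappa))^2)), where kappa = sigma/w0."""
    half_width = math.sqrt(2 * math.log(2)) * abs(math.log(kappa))  # in log frequency
    return 2 * half_width / math.log(2)                             # convert to octaves

for kappa in (0.74, 0.55, 0.41):
    print(kappa, round(log_gabor_bandwidth_octaves(kappa), 2))
# 0.74 -> ~1.02 octaves, 0.55 -> ~2.03 octaves, 0.41 -> ~3.03 octaves
```

The three quoted ratios recover bandwidths of approximately one, two and three octaves respectively.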
Figure 34: An example of a log Gabor transfer function viewed on both linear and logarithmic frequency scales.

There are two important characteristics to note. Firstly, log Gabor functions, by definition, always have no DC component; secondly, the transfer function of the log Gabor function has an extended tail at the high frequency end. Field's studies of the statistics of natural images indicate that natural images have amplitude spectra that fall off at approximately $1/\omega$. To encode images having such spectral characteristics one should use filters having spectra that are similar. Field suggests that log Gabor functions, having extended tails, should be able to encode natural images more efficiently than, say, ordinary Gabor functions, which would over-represent the low frequency components and under-represent the high frequency

²That is, filters that are all geometric scalings of some reference filter.


components in any encoding. Another point in support of the log Gabor function is that it is consistent with measurements on mammalian visual systems, which indicate that we have cell responses that are symmetric on the log frequency scale.

What do log Gabor functions look like in the spatial domain? Unfortunately, due to the singularity in the log function at the origin, one cannot construct an analytic expression for the shape of the log Gabor function in the spatial domain. One is reduced to designing the filters in the frequency domain and then performing a numerical inverse Fourier transform to see what they look like. Their appearance is similar to Gabor functions, though their shape becomes much sharper as the bandwidth is increased. The shapes of log Gabor and Gabor functions are almost identical for bandwidths less than one octave. Figure 35 shows three log Gabor filters of different bandwidths, all tuned to the same centre frequency.
Figure 35: Three quadrature pairs of log Gabor wavelets, all tuned to the same frequency, but having bandwidths of 1, 2 and 3 octaves respectively.

Given that we are now able to construct filters of arbitrary bandwidth and zero DC component, the following question arises: what is the best bandwidth to use? One observation is that as bandwidth increases so too does the sharpness of the filter. Therefore, one constraint might be imposed by the maximum sharpness of filter that we can effectively represent. Of perhaps greater interest is to study the variation of the spatial width of filters with bandwidth. A useful objective might be to minimize the spatial width of filters in order to get maximal spatial localization of our frequency information.
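The frequency-domain construction and numerical inversion described above can be sketched as follows. This is an illustration under my own parameter choices (a 1024-sample filter with centre wavelength 20 samples and $\sigma/\omega_0 = 0.55$, roughly two octaves): a one-sided spectrum yields the even- and odd-symmetric members of the quadrature pair as the real and imaginary parts of the inverse transform.

```python
import numpy as np

N = 1024
f = np.fft.fftfreq(N)                  # frequencies in cycles per sample
f0, kappa = 0.05, 0.55                 # centre wavelength 20 samples, ~2 octaves

# Log Gabor defined on positive frequencies only (zero at DC by construction)
G = np.zeros(N)
pos = f > 0
G[pos] = np.exp(-(np.log(f[pos] / f0))**2 / (2 * np.log(kappa)**2))

# A one-sided spectrum gives an analytic filter: its real part is the
# even-symmetric wavelet and its imaginary part the odd-symmetric one.
h = np.fft.fftshift(np.fft.ifft(2 * G))
even, odd = h.real, h.imag

print(even.argmax() == N // 2)         # True: even filter peaks at its centre
print(abs(even.sum()) < 1e-9)          # True: zero DC component
```

Note that the even filter has exactly zero DC by construction, since the transfer function vanishes at the origin of the frequency axis.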


Normally when a function is wide in the frequency domain it is narrow in the spatial domain; thus we expect broad bandwidth filters to be narrow in the spatial domain. However, changing the bandwidth of a log Gabor filter does not result in a simple linear stretch of its transfer function in the frequency domain, so one's first intuitive thoughts about their behaviour in the spatial domain can be misleading. Careful observation of the behaviour of broad bandwidth log Gabor filters in the spatial domain reveals that while the central spike(s) of the filter may become very narrow, the tails of the filter become extended. To investigate this phenomenon further, two measures of filter width were studied:

1. The width required to represent 99% of the spatial filter's absolute area.

2. The second moment about the centre of the filter with respect to the absolute value of the filter.

Analytical investigation of these quantities is hampered by the singularity in the expression for the log Gabor function at the origin. Thus, the variation of both these width measures with respect to bandwidth could only be investigated numerically, and the results are shown in Figure 36.
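Both width measures are straightforward to estimate once the filter has been realized by an inverse FFT. The sketch below is my own illustration of the two measures for a single even-symmetric log Gabor with arbitrary parameters; it is not the thesis's exact procedure (in particular, the second moment is reported here as an RMS width so the two numbers share units).

```python
import numpy as np

# Realize an even-symmetric log Gabor numerically (illustrative parameters)
N, f0, kappa = 1024, 0.05, 0.55
f = np.fft.fftfreq(N)
G = np.zeros(N)
G[f > 0] = np.exp(-(np.log(f[f > 0] / f0))**2 / (2 * np.log(kappa)**2))
even = np.fft.fftshift(np.fft.ifft(2 * G).real)

a = np.abs(even)
x = np.arange(N) - N // 2

# Measure 1: smallest centred window containing 99% of the absolute area
total, w = a.sum(), 0
while a[N // 2 - w : N // 2 + w + 1].sum() < 0.99 * total:
    w += 1
width_99 = 2 * w + 1

# Measure 2: second moment of |filter| about its centre, as an RMS width
rms_width = np.sqrt((x**2 * a).sum() / total)

print(width_99, float(round(rms_width, 1)))
```

Running this for a range of bandwidths reproduces the qualitative behaviour described in the text: the central lobe narrows with bandwidth while the absolute-area tails spread, so both measures pass through a minimum.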
Figure 36: Variation of the spatial width of log Gabor functions with bandwidth (evaluated numerically).


As one can see, both measures of width are minimized when the bandwidth is about two octaves. The troughs in the curves are very broad, with any value between one and three octaves achieving a near minimal spatial width. The data in Figure 36 were for even-symmetric filters. The results for odd-symmetric filters are similar, though with a more gradual increase in width for bandwidths above three octaves.

These results have to be treated with some caution as they are vulnerable to numerical effects; the spatial form of the filters was calculated via the discrete Fourier transform, and the width measures are also determined numerically. The systematic undulations in the measure of width required to represent 99% of the area are troubling; all attempts to eliminate them were unsuccessful. The magnitude of these undulations would vary with filter centre frequency but their locations would remain constant. A flaw in attempting to measure the width required to represent 99% of the filter's absolute area is that one does not know the actual area; all one knows is the total discrete area in the finite spatial window being considered. The data in Figure 36 were obtained using an FFT applied over 1024 points and with filters having a centre frequency of 0.05 (a wavelength of 20 units). The aim was to achieve a good discrete representation of the filter in both spatial and frequency domains, and also to avoid truncation of the filter tails. Despite the concerns one might have over the absolute accuracy of the data, it is felt that the overall trends of the curves are valid.

It is interesting to note that the range of bandwidths over which filter spatial size is near minimum, 1 to 3 octaves, matches well with measurements obtained on mammalian visual cells.
One should also note that the spatial width of a 3 octave log Gabor function is approximately the same as that of a 1 octave Gabor function, clearly illustrating the ability of the log Gabor function to capture broad spectral information with a compact spatial filter.


4.3 Phase congruency from broad bandwidth filters

Designing a bank of filters for the calculation of phase congruency involves trading filter bandwidth against the size of the scaling factor between the frequencies of successive filters. The aim is to obtain a reasonably broad and uniform coverage of the spectrum without having to use too many filters. As mentioned in the previous section, the maximum bandwidth we can obtain from a Gabor filter is about one octave. To obtain uniform spectral coverage, the scaling between successive filter centre frequencies cannot be much greater than about 1.5. Thus, to construct a filter bank that spans, say, four octaves we need to use 8 filters ($1.5^{(8-1)} \approx 17$).

The larger bandwidths possible with log Gabor functions offer us greater flexibility in filter bank design. For example, instead of using 8 one-octave filters to achieve a four octave wavelet bank, we could make do with only 4 two-octave filters having a scaling of 2.6 between successive centre frequencies (one can use a scaling of up to 3 between centre frequencies and still have an even spectral coverage). Being able to use fewer filters means the computational load is reduced accordingly.

There were high hopes that using two octave bandwidth log Gabor filters would result in improved phase congruency maps. In particular, it was hoped that their compact spatial size might improve feature localization and the detection of small scale features. Unfortunately this was not the case. Using these broad bandwidth filters to calculate phase congruency with the techniques developed in the previous chapter produced similar results to those obtained with one octave Gabor filters. If anything, the results were of slightly lower quality. While sharp features were located as well as before, the localization of any features that had been blurred was generally slightly worse. This was in addition to the fact that the localization of blurred features using one octave filters had not been particularly good in the first place.
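The coverage arithmetic above is easy to check. Assuming a bank of n filters with scaling factor s between centre frequencies spans a factor of s^(n-1) in frequency, the two designs cover almost the same number of octaves (this sketch is my own illustration):

```python
import math

def coverage_octaves(n_filters, scaling):
    """Octave span between the lowest and highest filter centre frequencies,
    assuming a geometric scaling factor between successive filters."""
    return math.log2(scaling ** (n_filters - 1))

print(round(1.5 ** 7, 1))                  # 17.1 -- the frequency ratio quoted above
print(round(coverage_octaves(8, 1.5), 2))  # eight one-octave Gabors, scaling 1.5
print(round(coverage_octaves(4, 2.6), 2))  # four two-octave log Gabors, scaling 2.6
```

Both designs span roughly 4.1 octaves, but the log Gabor bank does so with half the filters.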
Figure 37 shows phase congruency calculated on a smoothed step function using a wavelet bank of 4 two-octave filters and a wavelet bank of 6 one-octave filters. The filter banks were designed to cover approximately the same section of the


spectrum. The smallest scale filters had a wavelength of 4 pixels, with subsequent filters being successively scaled by a factor of 2 for the two-octave filter bank and by a factor of 1.5 for the one-octave filter bank. In both cases the localization of the feature is very poor, and the results become sensitive to numerical effects because the filter outputs become small and the normalization becomes unstable (causing the oscillations in the results for the one-octave filter bank). In many cases it can be the frequency spread weighting function that provides the localization information rather than phase congruency itself. However, in the examples shown in Figure 37 the perceived frequency spread is so low (because only a few low frequency filters are responding) that the final weighted results are heavily suppressed.

Clearly there are two problems that were not properly addressed in the previous chapter: firstly, the poor localization of blurred features, and secondly, the sometimes delicate and unstable nature of the calculation of phase congruency.

The cause of the reduced localization associated with broad bandwidth filters can be identified as follows. Increasing the bandwidth of our filters will reduce the resolution at which we obtain amplitude and phase information with respect to frequency. This is due to the phase and amplitude information now being averaged over larger bandwidths of the spectrum by the filters. In effect we are quantizing amplitude and phase information over frequency more heavily. In particular, the increased averaging associated with the greater phase quantization will result in us not seeing the more extreme variations in phase angles at any point in the signal so strongly. Thus, as we move away from a point of phase congruency in a signal, we will see a reduced rate of decay in the perceived phase congruency.
Here we can clearly see the tradeoff between the frequency resolution of our phase and amplitude information and the computational resources (the number of filters) we are prepared to provide.

4.4 Another way of defining phase congruency

Figure 37: Phase congruency calculated on a Gaussian smoothed step using wavelet banks of one octave filters (left column) and two octave filters (right column). Each column shows the smoothed step function, the raw phase congruency, the frequency spread, and the phase congruency weighted by frequency spread.

We have gained some knowledge about the way in which the localization of phase congruency can be degraded if filter bandwidth is made large. However, even if one uses narrow bandwidth filters we still have poor localization of blurred features. It is therefore appropriate that we re-examine the way in which phase congruency is defined and calculated, with a view to modifying it to improve its sensitivity and localization. To do this it is useful to remind ourselves of the equation for phase congruency,

    PC(x) = \max_{\bar\phi(x) \in [0, 2\pi]} \frac{\sum_n A_n \cos(\phi_n(x) - \bar\phi(x))}{\sum_n A_n} .    (26)

Phase congruency is the ratio of the signal energy, E(x), to the overall path length taken by the filter outputs in reaching the end point, Σ_n A_n(x):

    PC(x) = \frac{E(x)}{\sum_n A_n(x)} .

Looking at the expression for energy we can see the cause of the poor localization. Energy is proportional to the cosine of the deviation of the phase angle, φ_n(x), from the overall mean phase angle, φ̄(x). While the cosine function is maximized when φ_n(x) = φ̄(x), it requires a significant difference between φ_n(x) and φ̄(x) before its value falls appreciably. For example, the filter outputs could be such that all phase angles were φ̄(x) ± 25° and we would still have a phase congruency of 0.9 (cos(25°) ≈ 0.9). Thus, the cosine of the phase deviation is a rather insensitive measure of phase congruency.

We can construct a more sensitive measure of phase congruency by noting that at a point of phase congruency the cosine of the phase deviation should be large, and the absolute value of the sine of the phase deviation should be small. The gradient of the sine function is a maximum at the origin; therefore, making use of the sine of the phase deviation will increase our sensitivity. Accordingly, a more appropriate phase deviation function on which to base the calculation of phase congruency might be

    \Delta\Phi(x) = \cos(\phi_n(x) - \bar\phi(x)) - |\sin(\phi_n(x) - \bar\phi(x))| .    (27)

Figure 38 plots this function along with the cosine function for comparison. The function falls very nearly linearly as the phase deviation moves from 0 to π/2. Thus a near direct measure of phase deviation is obtained without having to resort to inverse trigonometric functions.
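The gain in sensitivity is easy to check numerically. The short sketch below (plain Python, not from the thesis; the function names are mine) evaluates both weighting functions at the 25° phase deviation used in the example above:

```python
import math

def dev_cos(d):
    """Original phase deviation weighting: the cosine of the deviation."""
    return math.cos(d)

def dev_cos_abs_sin(d):
    """Sharper weighting of equation (27): cos(d) - |sin(d)|."""
    return math.cos(d) - abs(math.sin(d))

d = math.radians(25.0)
# The cosine alone still reads a 25 degree deviation as high congruency...
print(round(dev_cos(d), 3))          # 0.906
# ...while the new function penalizes the same deviation much harder.
print(round(dev_cos_abs_sin(d), 3))  # 0.484
```

Both functions equal 1 at zero deviation, but the second falls away almost linearly, which is what gives the new measure its improved localization.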

4.4.1 Calculation of PC_2 via quadrature pairs of filters

Using this new measure of phase deviation, ΔΦ(x), a new measure of phase congruency can be defined as

    PC_2(x) = \frac{\left\lfloor \sum_n A_n(x) \, [\cos(\phi_n(x) - \bar\phi(x)) - |\sin(\phi_n(x) - \bar\phi(x))|] - T \right\rfloor}{\sum_n A_n(x) + \varepsilon} ,    (28)

where, as before, ε is a small constant to avoid division by zero, T is the estimated noise influence (calculated as described in Section 3.4), and ⌊ ⌋ denotes that the

enclosed quantity is itself if it is positive, and zero for all other values. Note that this expression for phase congruency is called PC_2(x) to distinguish it from the previous definition of phase congruency, which will now be referred to as PC_1(x).

Figure 38: Comparison between cos(x) and cos(x) − |sin(x)|.

The relationship between these two phase congruency measures can be seen in statistical terms. The measure PC_2(x) is related to the weighted mean absolute deviation of phase from the weighted mean* in that the phase deviation measure it uses varies almost linearly with the angular deviation of phase. On the other hand, PC_1(x), in using the cosine of the phase deviation, is related to the weighted variance (and hence the weighted standard deviation) with respect to the weighted mean phase. (Using the first few Taylor series terms, cos(x) can be approximated by 1 − x²/2 for small x.)

The calculation of this new measure of phase congruency, PC_2(x), can be done readily using the outputs of quadrature pairs of filters. For convenience we shall denote by e_n(x) and o_n(x) the results of convolving a signal, I(x), with the even and odd-symmetric filters M_n^e and M_n^o, where n denotes scale. Each pair of outputs at scale n can be considered to form a vector, (e_n(x), o_n(x)), having a magnitude of A_n(x). Using dot and cross products between these filter output vectors we can calculate the cosine and sine of (φ_n(x) − φ̄(x)). The unit vector representing the direction of the weighted mean phase angle, φ̄(x), is given by

    (\bar{e}(x), \bar{o}(x)) = \frac{1}{\sqrt{F(x)^2 + H(x)^2}} \, (F(x), H(x)) ,    (29)

where

    F(x) = \sum_n e_n(x) ,    (30)
    H(x) = \sum_n o_n(x) .    (31)

Now, using dot and cross products, one can form the quantities

    A_n(x) \cos(\phi_n(x) - \bar\phi(x)) = e_n(x)\bar{e}(x) + o_n(x)\bar{o}(x)    (32)

    A_n(x) |\sin(\phi_n(x) - \bar\phi(x))| = |e_n(x)\bar{o}(x) - o_n(x)\bar{e}(x)|    (33)

Thus,

    A_n(x) (\cos(\phi_n(x) - \bar\phi(x)) - |\sin(\phi_n(x) - \bar\phi(x))|)
        = (e_n(x)\bar{e}(x) + o_n(x)\bar{o}(x)) - |e_n(x)\bar{o}(x) - o_n(x)\bar{e}(x)|    (34)

* Normally the mean absolute deviation is calculated with respect to the median of a distribution; the median minimizes this quantity. However, the mean of the phase distribution is more accessible to us than the median, especially if we want to weight phase values by amplitude values.

which gives us the quantity needed to calculate this new version of phase congruency.

The results obtained from this new measure of phase congruency are indeed much better localized. Figure 39 compares the new measure against the old on the Gaussian smoothed step function that was used in Figure 37. For both sets of calculations a bank of 4 two octave filters was used, the smallest scale filters having a wavelength of 4 pixels, with subsequent filters being successively scaled by a factor of 2. Note that raw phase congruency results are presented; there is no weighting by frequency spread, but the frequency spread weighting, if applied, would be identical in both cases.

Figure 39: Comparison of the response of the new phase congruency measure, PC_2 (right), against PC_1 (left) on a Gaussian smoothed step.

A 2D implementation of this approach to measuring phase congruency produces very satisfying results. Figures 40 and 41 provide examples of how PC_2 produces a more localized response to features, which allows better detection of fine detail in images. An extensive portfolio of results obtained on a variety of images is presented in Appendix A. In the meantime there are other possibilities for the calculation of phase congruency to consider.
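For concreteness, the combination of equations (28)–(34) can be sketched in a few lines of NumPy. The function below is illustrative only: it assumes the even and odd-symmetric filter outputs e_n(x) and o_n(x) have already been computed by some filter bank (not shown), and the names `pc2`, `even` and `odd` are mine, not the thesis's.

```python
import numpy as np

def pc2(even, odd, T=0.0, eps=1e-4):
    """Illustrative PC2 of equations (28)-(34) for a 1D signal.

    even, odd : (n_scales, n_points) arrays of quadrature filter outputs.
    T         : estimated noise influence; eps avoids division by zero.
    """
    F = even.sum(axis=0)                    # equation (30)
    H = odd.sum(axis=0)                     # equation (31)
    norm = np.sqrt(F ** 2 + H ** 2) + eps
    me, mo = F / norm, H / norm             # unit mean phase vector, eq (29)

    A = np.sqrt(even ** 2 + odd ** 2)       # filter output magnitudes A_n(x)
    acos = even * me + odd * mo             # A_n cos(phase deviation), eq (32)
    asin = np.abs(even * mo - odd * me)     # A_n |sin(phase deviation)|, eq (33)
    numer = (acos - asin).sum(axis=0) - T   # numerator of equation (28)
    return np.maximum(numer, 0.0) / (A.sum(axis=0) + eps)
```

When the filter outputs at all scales share one phase angle, the sine terms vanish and the ratio approaches 1; any scatter in phase is penalized through the |sin| term, giving the sharper localization seen in Figure 39.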

Figure 40: Raw phase congruency calculated on the Boat image, comparing the localization achieved with phase congruency measures PC_1 (left) and PC_2 (right).

Figure 41: Raw phase congruency calculated on the Goldhill image, comparing the localization achieved with phase congruency measures PC_1 (left) and PC_2 (right).


4.5 A third measure of phase congruency

So far the measurement of phase congruency has been approached by considering the scatter of phase components relative to some weighted mean phase angle. An alternative approach might be to look at the deviation of phase angles among the individual components, rather than relative to some mean phase angle. For example, if we analyze a signal over three scales we can look at the differences between the phase of the filter response vector measured at the first scale and those at the second and third scales, and between the phase angle at the second scale and that at the third; these combinations cover all the possible comparisons. A weighted mean of all these phase differences could then be used to identify the degree of phase congruency. A potential attraction of this approach is that we may obtain a more sensitive measure of phase congruency (and hence better localization), as we expect that the deviation amongst individual phase components will be greater than their deviation from some mean phase angle.

A difficulty with this approach is that if we extend this idea to a large number of filter scales we end up with a combinatorial explosion in the number of phase angles to compare. One way of overcoming this is to measure phase differences only between adjacent scales. This limits the number of phase comparisons that have to be made, though possibly at the expense of reducing sensitivity to a gradual phase deviation that is distributed over several scales. Choosing to compare phase differences only between adjacent scales also makes sense in that the magnitudes of the filter response vectors at adjacent scales are likely to be similar. Under these circumstances phase angle comparisons are more likely to be accurate. The amplitudes of filter response vectors from widely separated scales are likely to be very different, making phase angle comparisons less meaningful.
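The growth in the number of comparisons is simple to quantify: comparing every pair of n scales requires n(n−1)/2 comparisons, while comparing adjacent scales only requires n−1. A two-line check (generic Python, nothing thesis-specific assumed):

```python
from itertools import combinations

def n_all_pairs(n_scales):
    """Phase comparisons needed if every pair of scales is compared."""
    return len(list(combinations(range(n_scales), 2)))  # n(n-1)/2

def n_adjacent(n_scales):
    """Phase comparisons needed between adjacent scales only."""
    return n_scales - 1

# 3 scales gives the three comparisons mentioned above; 12 scales: 66 vs 11.
for n in (3, 6, 12):
    print(n, n_all_pairs(n), n_adjacent(n))
```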

4.5.1 Calculation of PC_3 via quadrature pairs of filters

Using dot and cross products between filter output vectors as before, one can construct a weighted measure of the phase deviation, ΔΦ(x), between the phase angles at scales i and j as follows:

    \Delta\Phi_{i,j}(x) = A_i(x) A_j(x) (\cos(\phi_i(x) - \phi_j(x)) - |\sin(\phi_i(x) - \phi_j(x))|)
                        = (e_i(x) e_j(x) + o_i(x) o_j(x)) - |e_i(x) o_j(x) - o_i(x) e_j(x)| .    (35)

From this we can construct the new measure of phase congruency by summing the weighted phase differences between adjacent scales:

    PC_3(x) = \frac{\left\lfloor \sum_{i=1}^{n-1} A_i(x) A_{i+1}(x) [\cos(\phi_i(x) - \phi_{i+1}(x)) - |\sin(\phi_i(x) - \phi_{i+1}(x))|] - T_3 \right\rfloor}{\sum_{i=1}^{n-1} A_i(x) A_{i+1}(x) + \varepsilon} ,    (36)

where T_3, the noise compensation factor, is calculated by taking the estimated noise effect at each scale, T_i (calculated following the approach described in Section 3.4), and forming the sum

    T_3 = \sum_{i=1}^{n-1} T_i T_{i+1} .    (37)

Figure 42: Performance of PC_3 on a step function and a smoothed step function.

Figure 42 shows the raw output of the PC_3 measure applied to a step function and a smoothed step function. The filters used were the same as those used for the results shown in Figure 39. The localization obtained is excellent, especially so on the smoothed step. However, the results are very sensitive to the value of ε. Here a value of 0.001 has been used, rather than the 0.01 that has been used for all other results presented so far. If a value of 0.01 had been used, the magnitude of the phase congruency peak on the smoothed step would have fallen to only 0.2. This sensitivity is a result of the phase difference weighting being a product of the amplitudes of the filter response vectors being compared. When the filter outputs become small (say, when the signal is smooth) the product of the amplitudes becomes very small, and the influence of the ε parameter becomes very significant. This difficulty becomes more apparent when a 2D implementation of this phase congruency measure is used. For this reason the preferred measure of phase congruency remains PC_2.

No doubt PC_3 could be modified and developed further to overcome these problems, but this will not be pursued here. The main point is that there are a number of ways of constructing measures of phase congruency; the three approaches described here merely illustrate some of the possibilities, and better measures may yet be found.
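As with PC_2, the adjacent-scale measure of equations (35)–(37) is straightforward to sketch given precomputed quadrature filter outputs. Again, the function and argument names below are hypothetical, and the small default ε reflects the sensitivity just discussed.

```python
import numpy as np

def pc3(even, odd, Ti=None, eps=1e-3):
    """Illustrative PC3 of equations (35)-(37) for a 1D signal.

    even, odd : (n_scales, n_points) quadrature filter outputs.
    Ti        : optional array of per-scale noise estimates T_i.
    """
    e0, e1 = even[:-1], even[1:]            # adjacent scale pairs i, i+1
    o0, o1 = odd[:-1], odd[1:]
    # weighted phase deviation between adjacent scales, equation (35)
    dphi = (e0 * e1 + o0 * o1) - np.abs(e0 * o1 - o0 * e1)
    A = np.sqrt(even ** 2 + odd ** 2)
    weight = (A[:-1] * A[1:]).sum(axis=0)   # sum of the products A_i A_{i+1}
    T3 = 0.0 if Ti is None else np.sum(Ti[:-1] * Ti[1:])  # equation (37)
    return np.maximum(dphi.sum(axis=0) - T3, 0.0) / (weight + eps)
```

Because the weights are products A_i A_{i+1}, they shrink quadratically as the filter outputs shrink, which is exactly why ε dominates on smooth signals.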

4.6 Biological computation of phase congruency

Given the number of different ways of calculating phase congruency that have been developed in the previous sections, it would be nice to find a way of calculating phase congruency in a biologically plausible manner. This is natural, as the interest in finding points of phase congruency has been motivated by psychophysical results.

From a biological point of view the main problem with the two measures of phase congruency PC_1 and PC_2 is the need to form the quantities F(x) and H(x) from the sums of the outputs of the even and odd-symmetric filters respectively. These quantities can take on arbitrarily large values, both positive and negative. This is problematic because it is generally considered that biological cells are only able to represent signals in some rectified form. Thus, it is not clear how these quantities can be represented, let alone calculated.

The other main requirement in the calculation of phase congruency is the magnitude of the filter output vectors. These can be readily calculated in a biologically plausible manner, as shown by Heeger [35]. Heeger solves the problem of cells only being able to represent rectified quantities by suggesting that one uses the rectified outputs of filters in all four quadratures (positive and negative odd-symmetric, and positive and negative even-symmetric filters). This ensures that no matter what polarity or phase of input signal is encountered, at least one of the four quadrature filters will be responding with a positive signal. If these four rectified filter outputs are then squared and summed, we obtain the amplitude of the response of an equivalent quadrature pair of filters that did not have rectified outputs. Heeger calls this process of squaring the rectified outputs of filters half-squaring.

Therefore, the primary difficulty is in the calculation of the overall energy vector, which provides us with the weighted mean phase angle. The third measure of phase congruency, PC_3, offers some promise in that it does not require the calculation of energy; it looks at the deviation of phase angles between the individual components rather than relative to some mean phase angle. Accordingly, we are now concerned with the calculation of the phase deviation, ΔΦ(x), as defined in equation 27, which requires the formation of the dot product and the amplitude of the cross product between the two response vectors being considered. A rectified value of the dot product is readily calculated from the rectified outputs of the response vectors, and this is sufficient because any dot product value less than zero clearly indicates a large angle of phase deviation. However, it is not apparent that one can determine the amplitude of the cross product from rectified filter outputs. So here too we draw a blank. A possibility might be to use just the rectified dot product as a phase deviation measure, though this would involve some sacrifice of feature localization.

One should not be too discouraged by these initial failures to devise a biologically plausible method of calculating phase congruency. The limited attempts that have been made in this section merely scratch the surface, but hopefully they have illustrated that there are many possibilities to explore.
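Heeger's half-squaring step can be verified in isolation: the four half-wave rectified quadrature outputs, squared and summed, recover the squared amplitude of the unrectified quadrature pair. A minimal check (generic numbers, nothing cell-specific assumed):

```python
def half_square_energy(e, o):
    """Sum of squared half-wave rectified outputs of the four quadrature
    filters (+even, -even, +odd, -odd), following Heeger's half-squaring."""
    rectified = (max(e, 0.0), max(-e, 0.0), max(o, 0.0), max(-o, 0.0))
    return sum(r * r for r in rectified)

# For any polarity of filter output this equals e^2 + o^2, the squared
# amplitude A^2 of the equivalent unrectified quadrature pair.
for e, o in [(3.0, -4.0), (-1.5, 0.5), (0.0, 2.0)]:
    print(half_square_energy(e, o) == e * e + o * o)  # True in each case
```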


4.7 Symmetry and Asymmetry: Special patterns of phase

4.7.1 Introduction

So far this thesis has concentrated on the detection of phase congruency, that is, finding points where all phase components in the signal are aligned. Clearly, these are special points representing locations of high information content in the signal. However, this is not to say that there are no other arrangements of phase that might be important. In this section it will be shown how bilateral and rotational symmetries in image intensity values give rise to special phase patterns that can be readily identified.

Under the most general definition of symmetry, an object is considered symmetric if it remains invariant under some transformation. Two forms of symmetry that we can readily identify in images are bilateral symmetry and rotational symmetry. An object exhibits bilateral symmetry if it remains invariant with respect to reflection about some axis. An object has rotational symmetry if it remains invariant with respect to rotations about some axis. This section will only be considering these two forms of symmetry, and in the following discussion where the word symmetry is used it should be taken to mean bilateral symmetry and/or rotational symmetry.

Symmetry is an important mechanism by which we identify the structure of objects. Man-made objects, plants and animals are usually highly recognizable from the symmetry, or partial symmetries, that they often exhibit.

A limited number of approaches have been tried in the detection of symmetry in images. A fundamental weakness found in most is that they require objects to be segmented prior to any symmetry analysis. For example, Atallah [3] describes an algorithm that requires objects to be represented in terms of points, line segments and circles. Morphological techniques such as medial axis transforms, thinning, and grassfire algorithms can only be applied to binary objects. A survey of these approaches is provided by Xie [98]. A difficulty with morphological approaches is that they are very sensitive to small variations in the outlines of objects; a notch in an object contour will propagate several symmetry axes, complicating the representation of the object. Brady and Asada [7] attempt to overcome these problems by using smoothed object contours as input to an algorithm that is effectively morphological in nature.

Reisfeld et al. [75] provide one of the few approaches to symmetry that does not require object recognition or segmentation. Opposing pairs of points within some distance of a location in the image are considered with respect to the direction and strength of the intensity gradients at these points. At each location in the image a weighted sum of the degree of symmetry of the surrounding opposing pairs of points is computed to obtain an overall symmetry measure. Each pair of points contributes to the measure of symmetry according to the symmetry of the directions and magnitudes of their intensity gradients, and to the strength of the intensity gradients themselves. An objection to Reisfeld et al.'s measure of symmetry is that it depends on the contrast of the feature in addition to its geometric shape: a bright circle will be considered to be more symmetric than a low contrast one. Thus, we have no absolute sense of the degree of symmetry of an object; all one obtains are locations in the image where symmetry is locally maximal.

4.7.2 A frequency approach to symmetry

In this section a new measure of symmetry is presented that does not require any prior recognition or segmentation of objects. An important aspect of symmetry is the periodicity that it implies in the structure of the object one is looking at. Accordingly, it is perhaps natural to use a frequency based approach in attempting to recognize and analyze symmetry in images. Indeed, an inspection of the Fourier series of some simple functions makes this very apparent: at points of symmetry and asymmetry we find readily identifiable patterns of phase.

Figure 43 shows the Fourier series representation of both a square wave and a triangular wave. We can see that the axis of symmetry corresponds to the point where all the frequency components are at either the minimum or maximum points in their cycles, that is, where all the frequency components are at the most symmetric points in their cycles (the mid-point of the square wave and the peaks/troughs of the triangular wave). Similarly, one can see that the axis of asymmetry corresponds to the point where all the frequency components are at the most asymmetric points in their cycles; the inflection point (the steps on the square wave and the mid-point of the ramp on the triangular wave).

Figure 43: Phase patterns at points of symmetry and asymmetry.


Figure 44: At a point of symmetry the local phase pattern will be such that only even-symmetric filters will be responding, and at a point of asymmetry only odd-symmetric filters will be responding.

It should be noted that here we are only considering symmetry and asymmetry of intensity values in images, that is, a low-level view of symmetry and asymmetry. Overall geometric symmetries that might exist in the image are not considered; this level of analysis requires the recognition of higher level structures in images, which is outside the scope of this thesis.

One can readily adapt the existing measures of phase congruency to construct equivalent measures of symmetry and asymmetry. At a point of symmetry the absolute value of the even-symmetric filter outputs will be large and the absolute value of the odd-symmetric filter outputs will be small. Therefore, a natural measure of symmetry to form is:

    Sym(x) = \frac{\left\lfloor \sum_n A_n(x) \, [\,|\cos(\phi_n(x))| - |\sin(\phi_n(x))|\,] - T \right\rfloor}{\sum_n A_n(x) + \varepsilon}    (38)

           = \frac{\left\lfloor \sum_n [\,|e_n(x)| - |o_n(x)|\,] - T \right\rfloor}{\sum_n A_n(x) + \varepsilon} ,    (39)

where T is the estimated noise influence as described in Section 3.4.

Figure 45: Plot of the symmetry measure |cos(x)| − |sin(x)|.

At points of high asymmetry the absolute values of the even and odd-symmetric filter outputs will be reversed from the symmetric case: the magnitude of the odd-symmetric filter output will be large and that of the even-symmetric filter output will be small. Thus, a measure of asymmetry can be expressed as

    ASym(x) = \frac{\left\lfloor \sum_n [\,|o_n(x)| - |e_n(x)|\,] - T \right\rfloor}{\sum_n A_n(x) + \varepsilon} .    (40)
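In terms of the rectified filter outputs, the two measures of equations (38)–(40) reduce to a few array operations. The sketch below assumes precomputed even and odd-symmetric filter outputs, and the name `phase_symmetry` is mine, not the thesis's.

```python
import numpy as np

def phase_symmetry(even, odd, T=0.0, eps=1e-4):
    """Illustrative 1D symmetry/asymmetry measures of equations (38)-(40).

    even, odd : (n_scales, n_points) quadrature filter outputs.
    Returns (Sym, ASym)."""
    denom = np.sqrt(even ** 2 + odd ** 2).sum(axis=0) + eps  # sum of A_n + eps
    sym = np.maximum((np.abs(even) - np.abs(odd)).sum(axis=0) - T, 0.0) / denom
    asym = np.maximum((np.abs(odd) - np.abs(even)).sum(axis=0) - T, 0.0) / denom
    return sym, asym
```

A purely even response (odd outputs zero) scores Sym ≈ 1 and ASym = 0, and vice versa, independent of the overall amplitude: the measures are dimensionless.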

Symmetry, in some sense, represents a generalization of delta features, and asymmetry a generalization of step edges. A delta feature starts off having all frequency components aligned in phase and in symmetry. As a delta feature is broadened into a rectangle function, the alignment in phase of the higher frequency components starts to break down, but the symmetry remains. Similarly, when a step function is gradually degraded to a trapezoidal function, the phase congruency at the centre of the ramp progressively breaks down, but all the frequency components remain asymmetric at that point.

The equations above only deal with symmetry/asymmetry in one dimension. One can extend this 1D analysis to 2D by applying it in multiple orientations and forming a weighted sum of the result. The same techniques described in Chapter 3 that were used to extend the calculation of phase congruency from 1D to 2D have been used here. It should be noted that this approach provides only a basic extension to 2D, in that no attention is paid to the type of symmetry that may be occurring at any point in an image. For example, no distinction is made between bilateral and radial symmetries; the symmetry measures are simply pooled over all orientations. Two examples of phase symmetry images obtained using this simple approach are shown in Figure 46. Ideally, some consideration of the way in which symmetry varies with orientation at each point in the image should be made. This would allow the classification of bilateral and radial symmetries.

An important contribution of this work is that the measures of symmetry and asymmetry developed here are normalized, dimensionless measures. They are independent of the brightness or contrast of image features. That is, unlike symmetry measures developed by others, we obtain an absolute sense of the level of symmetry/asymmetry that is present at a point in the image.

The results obtained from the phase symmetry measure can sometimes be counter-intuitive, because it measures local symmetry to the exclusion of everything else. Firstly, the measure is invariant to the magnitude of the local contrast, and so features that we might consider to be of little significance can be marked as having strong symmetry (see the wave patterns in the Whale image of Figure 46). Secondly, being a low-level measure that considers only local intensity values, it makes no distinction between foreground objects and background; it will faithfully report symmetries that occur in the spaces between foreground objects. This is in contrast to what we generally do when studying a scene, where we consider only the properties of the foreground objects that we have unconsciously segmented out from the scene. Thus, the output of the phase symmetry measure may not quite be what one wants when one searches for symmetry in an image. Perhaps this indicates that we need to think more about what we really mean by symmetry; we may be talking about some other (undefined) quantity that incorporates other properties such as local contrast and foreground/background attributes.

Figure 46: Two examples of phase symmetry images: the Whale image and a test image, each shown with its corresponding phase symmetry image.

Further examples of phase symmetry images can be found in Appendix A.


4.7.3 Biological computation of symmetry and asymmetry

A biological model of the computation of symmetry and asymmetry is very straightforward to construct. To detect symmetry one simply subtractively inhibits the rectified output of the even-symmetric filters with the rectified output of the odd-symmetric filters, before divisively normalizing by the sum of the filter output pair magnitudes. For asymmetry we reverse the subtractive inhibition between the even and odd-symmetric filters.

4.8 Summary

In re-examining the ideas behind phase congruency this chapter has covered a wide range of areas. The design of filters was investigated in an attempt to improve the localization of the original phase congruency measure. Log Gabor filters were shown to overcome the limitations that Gabor functions have in terms of producing high bandwidth filters. With the freedom of filter bandwidth choice made available as a result of using log Gabor functions, an investigation into the variation of the spatial extent of filters with bandwidth was conducted. It was shown that the spatial extent of a log Gabor filter is minimized when its bandwidth is about 2 octaves, though any bandwidth between 1 and 3 octaves produces a filter with near minimal spatial size.

The indifferent results obtained with large bandwidth log Gabor filters prompted a re-examination of the measurement of phase congruency. It was noted that the original measure of phase congruency was a function of the cosine of the phase deviation. While the cosine function is at a maximum when its argument is 0, it does not have a sharply defined peak, and this affects the localization of phase congruency. Accordingly, two new measures of phase congruency were devised, both based on the cosine of the phase deviation minus the absolute value of the sine of the phase deviation. This function has a sharply defined maximum when the phase deviation is 0 and falls almost linearly as the phase deviation increases. The first new measure of phase congruency, PC_2, used this new function to measure phase variation relative to the local weighted mean phase angle. The second new measure, PC_3, used this function to measure the phase variation between adjacent frequency components in the signal. Both new measures provided much better feature localization, though PC_2 was preferred over PC_3 because PC_3 was more sensitive to numerical effects when the signal had little energy.

The new phase congruency measures PC_2 and PC_3 opened up an awareness of some new possibilities in the way one might use phase information to identify local image properties. This led to a recognition that local symmetry and asymmetry in image intensity patterns can be identified as particular arrangements of phase. The new measures of symmetry and asymmetry that were developed are significant in that they are low-level operators that do not require any prior object recognition or segmentation. They are also unique in that they are dimensionless measures that provide an absolute sense of the degree of local symmetry or asymmetry, independent of the image illumination or contrast.

Chapter 5

Representation and matching of signals


5.1 Introduction

The ability to measure the similarity between two or more image points is important for many image processing tasks. Stereo analysis, motion tracking, region segmentation, and many other operations rely on the ability to match signals. In this chapter a new approach to the comparison and matching of image signals for disparity measurement is introduced. It is proposed that localized frequency data, rather than spatial data, should be used as the basis for matching signals against each other. Working in the frequency domain one can use local phase and amplitude information to construct a dimensionless measure of similarity that has high localization. Of greater interest, it is shown that phase information can be used to efficiently guide the search for a point in one signal that best matches a selected point in another signal. It is also shown that the representation of a signal in terms of phase and amplitude information on a logarithmic frequency scale provides a powerful domain for the comparison of signals.

When measuring the degree of similarity between image points we want the result to be independent of the intensity and spatial scales of the images. Therefore, the similarity measure should be a dimensionless quantity. In addition, the matching process should tolerate a degree of spatial distortion between the two signals, and ideally it should recognize the distortion. Being able to determine the relative spatial distortion between the two signals in the vicinity of the matched points allows one to deduce important aspects of the three-dimensional structure of the scene at that point.

5.2 Spatial Correlation

The classic method of signal matching for disparity measurement is spatial correlation. By normalizing the correlation relative to the mean and standard deviation of intensities within the correlation window, a dimensionless measure can be obtained. This has proved to be very successful and is perhaps the most commonly used approach for stereo matching. Faugeras [19] provides an overview of current correlation techniques.
Figure 47: Spatial correlation windows in two images.

The normalized spatial correlation over a rectangular window of (2N+1) by (2M+1) pixels about a point centred at (u_0, v_0) in image 1, with a point (u_0 + \delta, v_0) in image 2, is calculated using the following expression:

    C_s(\delta) = \frac{1}{K} \sum_{u=u_0-N}^{u_0+N} \; \sum_{v=v_0-M}^{v_0+M} [\, I_1(u,v) - \bar{I}_1(u_0,v_0) \,][\, I_2(u+\delta,v) - \bar{I}_2(u_0+\delta,v_0) \,] ,    (41)

where K = (2N+1)(2M+1) \, \sigma_1(u_0,v_0) \, \sigma_2(u_0+\delta,v_0), and \bar{I}_1(u_0,v_0) and \sigma_1(u_0,v_0) are the mean intensity and standard deviation within the window about the point (u_0, v_0) in image 1.
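In code, equation (41) amounts to mean-correcting the two windows and normalizing by their size and standard deviations. The following is a minimal sketch (function and variable names are illustrative, not from this thesis), assuming images stored as NumPy arrays indexed [u, v]:

```python
import numpy as np

def normalized_correlation(I1, I2, u0, v0, d, N, M):
    # Normalized spatial correlation C_s(d) between the (2N+1) x (2M+1)
    # window centred at (u0, v0) in image 1 and the window centred at
    # (u0 + d, v0) in image 2, following equation (41).
    w1 = I1[u0 - N:u0 + N + 1, v0 - M:v0 + M + 1].astype(float)
    w2 = I2[u0 + d - N:u0 + d + N + 1, v0 - M:v0 + M + 1].astype(float)
    # Work relative to the local means so that a bright region does not
    # score highly simply because of its average intensity.
    w1 = w1 - w1.mean()
    w2 = w2 - w2.mean()
    K = w1.size * w1.std() * w2.std()   # (2N+1)(2M+1) * sigma_1 * sigma_2
    return float(np.sum(w1 * w2) / K)
```

Because each window is normalized by its own mean and standard deviation, the result is unchanged by linear rescalings of brightness and contrast in either image, which is what makes the measure dimensionless and invariant.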


It is important that the correlation is calculated using intensity values relative to the local mean, otherwise the correlation measure can become large simply because a region has a high average intensity value, rather than because the region is a good match. The normalization by the factor K ensures that C_s(\delta) lies between -1 and +1. Indeed, this correlation measure is one of the few examples of low-level invariant measures that are currently in use.

While spatial correlation can be very effective in determining whether or not two signals match, finding the locations where the match is maximized is generally done by brute force. The correlation measure on its own provides no guide as to where one should search next in order to increase the degree of match. Nor does the correlation measure provide information about any distortion that may exist between the two signals at the matching points.

5.3 Phase Based Disparity Measurement

As an alternative to spatial correlation, phase based methods of disparity measurement have been proposed by a number of researchers, including Jenkin and Jepson [39], Sanger [82], Langley et al. [51], and Fleet et al. [27]. This approach is based around using Gabor filters in quadrature pairs to obtain local phase information and instantaneous frequency. The disparity is estimated from the local phase difference divided by the instantaneous frequency, where the instantaneous frequency is estimated from the derivative of the local phase.
Figure 48: Estimating disparity between two signals from phase difference and instantaneous frequency.
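A minimal sketch of this single-scale scheme follows (an illustrative reconstruction under assumed filter parameters, not code from any of the cited implementations): local phase is taken from a quadrature pair of Gabor filters, and the shift is estimated as the wrapped phase difference divided by the instantaneous frequency obtained from the phase derivative.

```python
import numpy as np

def gabor_phase(signal, wavelength, sigma_factor=0.65):
    # Local phase from a quadrature (cosine/sine) pair of Gabor filters.
    # sigma_factor sets the Gaussian width relative to the wavelength;
    # its value here is an illustrative assumption.
    x = np.arange(-3 * wavelength, 3 * wavelength + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * (sigma_factor * wavelength) ** 2))
    even = np.convolve(signal, g * np.cos(2 * np.pi * x / wavelength), 'same')
    odd = np.convolve(signal, g * np.sin(2 * np.pi * x / wavelength), 'same')
    return np.arctan2(odd, even)

def single_scale_disparity(s1, s2, x, wavelength):
    # Disparity = wrapped phase difference / instantaneous frequency,
    # with the instantaneous frequency taken from the phase derivative.
    p1, p2 = gabor_phase(s1, wavelength), gabor_phase(s2, wavelength)
    dphi = np.angle(np.exp(1j * (p1[x] - p2[x])))
    freq = np.angle(np.exp(1j * (p1[x + 1] - p1[x])))
    return dphi / freq
```

For a signal dominated by a single frequency near the filter's tuning this recovers the shift accurately; the difficulties arise when the filter amplitude, and hence the phase estimate, becomes unreliable.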


A factor that is common to all these previous approaches is the use of only one scale of quadrature Gabor filters, and this gives rise to a number of difficulties. Firstly, not enough information is available to determine whether the disparity is being measured between matching points. Secondly, at some locations the signal characteristics may be such that the magnitude of the filter responses becomes very small, leading to a scale-space singularity where the phase at the scale being considered becomes ill-defined. Finally, the instantaneous frequency may not be well defined as there may be several distinct frequencies present. Using only one scale of quadrature filters means that only a limited range of frequency components can be detected, and if the amplitude is small the estimation of the instantaneous frequency from the phase derivative may become ill-conditioned. This problem of scale-space singularities features strongly in the papers of Fleet, Jepson and Jenkin, and considerable effort is devoted to the detection of these points.

From the work covered in previous chapters on phase congruency and the study of phase patterns that arise at points of local symmetry and asymmetry we are in a good position to understand the conditions that give rise to scale-space singularities. Clearly, scale-space singularities will not occur near points of phase congruency. At a point of exact phase congruency all frequency components will be at the same phase over all scales. Thus, points of phase congruency will provide very stable information if one is seeking to determine disparity from phase. Of course, points of phase congruency mostly correspond to the step and line features that have been used for so long in the spatial domain for establishing disparity. Therefore, scale-space singularities will occur at locations where the phase congruency is low. In particular, they will occur at points of low phase congruency which also happen to be locations of local symmetry or asymmetry.

Examples of this would be the mid-point of a square waveform or the mid-point of the ramp on a trapezoidal or triangular waveform. As shown in Figure 49, the phase measured at the mid-point of a square wave depends on the scale of analysis used, but it can only be π/2 or 3π/2. The transition between these two phase values occurs at scale-space singularities. Note also that the amplitude of the filter responses is only strong where the phase values are well established; at the scale-space singularities the amplitude falls to zero. A similar situation can be seen at the mid-point of the ramp on the triangular waveform, but with phase angles making the transition between 0 and π.

Figure 49: Fourier series components and amplitude and phase scalograms of square and triangular waveforms. The scale-space singularities occur at the symmetry point of the square wave and at the asymmetry point of the triangular wave (marked by stars). The grey level representing the phase value changes in a discontinuous manner as one moves vertically through the scalogram at these points. At each phase singularity there is a corresponding hole in the amplitude scalogram, as the magnitude of the response falls to zero at these points. (Note that data at the edges of the scalograms is unreliable due to edge effects, particularly for low frequencies.)

The material presented in this chapter will develop some preliminary ideas towards an approach to phase based disparity measurement that uses local phase and amplitude information over many scales. This has a number of advantages. The extra information from data over many scales allows one to test whether or not points match. The use of many scales also largely overcomes the problems of scale-space singularities and eliminates the need to calculate an instantaneous frequency via the local phase derivative.

5.4 Matching Using Localized Frequency Data

As before, it is useful to think of the convolution outputs of a quadrature pair of lters at some scale as forming a response vector:
o e ), , I (x) Mn ( en (x), on (x) ) = ( I (x) Mn o e denote the even-symmetric (cosine) and Mn where I denotes the signal and Mn

and odd-symmetric (sine) wavelets at a scale n. At each point x in a signal we will have an array of these response vectors, one vector for each scale. These response vectors form the basis of the localized representation of the signal that will be used for signal matching and estimating disparity from phase. Given two signals I 1 and I 2 and two locations in these signals x1 and x2 we want to test whether the arrays of lter response vectors at these locations in the signals correlate strongly. The correlation measure devised makes use of the dot product between corresponding response vectors at each scale. This measure gives us the cosine of the phase angle deviation between the vectors weighted by the amplitudes of the vectors. By forming the weighted sum of corresponding pairs of response vector dot products over all lter scales and then normalizing by the sum of the products of the pairs of the response vector amplitudes, we obtain a dimensionless measure of signal correlation C (x 1 , x 2 ) =
n [en (x 1

)en (x2 ) + on (x1)on (x2)] T , 1 2 n A n (x )A n (x ) +

where en (x1), on (x1) denote the nth scale even and odd-symmetric lter outputs at location x1 in signal I 1, An (x1 ) denotes the amplitude of the lter pair response at scale n in signal I 1 at x1. Similar terms correspond to the responses in signal

5.5. USING PHASE TO GUIDE MATCHING

111

I 2 at x2 , and as before, is a small constant to prevent division by zero when the signal is at. The factor T is a noise compensation term representing the maximum correlation response that could be generated from noise alone in the signal. This factor is obtained by taking the estimated inuence of noise on each of the lters (calculated as described in Section 3.4) and forming the sum T =
n1 i=1 2 Ti1T1 ,

(42)

where Ti1 represents the estimated inuence of noise on lter scale i in signal 1, Ti2 represents the equivalent quantity in signal 2.
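A sketch of this correlation measure follows, with the even and odd responses obtained from log Gabor filters applied in the frequency domain (the filter construction and parameter values are illustrative assumptions, not the thesis implementation):

```python
import numpy as np

def quadrature_responses(signal, wavelengths, sigma_on_f=0.55):
    # Even and odd filter outputs at each scale, obtained by applying
    # log Gabor filters in the frequency domain.  The real and imaginary
    # parts of each filtered analytic signal give the even- and
    # odd-symmetric responses.  Parameter values are illustrative.
    n = len(signal)
    S = np.fft.fft(signal)
    freqs = np.fft.fftfreq(n)
    pos = freqs > 0
    responses = []
    for wl in wavelengths:
        G = np.zeros(n)
        G[pos] = np.exp(-np.log(freqs[pos] * wl) ** 2 /
                        (2 * np.log(sigma_on_f) ** 2))
        r = np.fft.ifft(S * G * 2)          # even + i*odd
        responses.append((r.real, r.imag))
    return responses

def phase_amplitude_correlation(resp1, resp2, x1, x2, T=0.0, eps=1e-6):
    # Dimensionless correlation of the arrays of filter response vectors
    # at x1 in signal 1 and x2 in signal 2, with noise compensation T.
    num, den = -T, eps
    for (e1, o1), (e2, o2) in zip(resp1, resp2):
        num += e1[x1] * e2[x2] + o1[x1] * o2[x2]
        den += np.hypot(e1[x1], o1[x1]) * np.hypot(e2[x2], o2[x2])
    return num / den
```

By the Cauchy-Schwarz inequality each scale's dot product is at most the product of its amplitudes, so (with T = 0) the measure cannot exceed 1, and it approaches 1 only when the response vectors are in phase at every scale.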

5.5 Using Phase to Guide Matching

Inevitably, when one first compares a signal at one location against another signal at a second location, the match will not be exact. It would be nice if one could estimate where one should move in one signal in order to improve the match. Assuming one is already close to the correct matching location, it is possible to use the relative phase angle differences between the filter response vectors to estimate the shift required in one signal to improve the match. This approach is used by Langley et al. [51] and Fleet et al. [27] for the estimation of disparity from phase information. Here these ideas are extended to make use of phase information over many scales. At each scale n the estimated shift required to match the signals is

    \delta_n(x^1, x^2) = \frac{\Delta\phi_n \, \lambda_n}{2\pi} ,

where \Delta\phi_n is the phase angle difference between the filter response vectors at the two locations in the signals being compared at scale n, and \lambda_n is the wavelength of the analysis filters at scale n.

One problem is that the shift estimates at different scales may be in error by multiples of the filter wavelengths. We have no way of recognizing this; all we can hope for is that the filter responses with the largest magnitudes will provide us with the most reliable disparity information. Therefore, the approach that has been adopted is to weight the estimated shifts by the filter response amplitudes and compute a weighted average, namely

    \delta(x^1, x^2) = \frac{\sum_n A_n(x^1) A_n(x^2) \, \Delta\phi_n \, \lambda_n}{2\pi \sum_n A_n(x^1) A_n(x^2)} ,

where A_n(x^1) denotes the amplitude of the filter pair response at scale n in signal I^1 at x^1 (and similarly for x^2 in I^2). Thus, the filters with the strongest responses dominate the result¹. This approach is highly effective in matching one-dimensional signals, with convergence on the correct matching location obtained within very few iterations. By using a weighted average phase difference over many scales one overcomes the problems that may arise when the amplitude at some scales becomes small and the phase ill-defined. We don't have to place bets on the best scale of analysis to use.

Figure 50 illustrates the results when this frequency based correlation method is applied at two points in a 1D signal. (That is, in performing the correlation and disparity estimates the two signals I^1 and I^2 are identical; the correlation point marked in the plots represents the position x^1 in signal I^1, and x^2 is indexed through every point in the signal.) Thus the value in the correlation plot at, say, location 150 represents the correlation of the signal at location 150 with the signal at the reference correlation point. Spatial correlation results are also shown for comparison.

The plots of estimated disparity are only valid for a limited region about the matching point. The size of this region is dependent on the local frequency content of the signal. If there are strong low-frequency components then the large scale filters will contribute strongly to the estimate of the mean weighted phase difference, resulting in the disparity estimate being valid over a larger region before phase wrap-around occurs. This effect can be seen in the two sets of plots in Figure 50, where the right hand set of results corresponds to a point in the signal that contains mainly high frequency components, resulting in a relatively small region where disparity measurements are valid. Note that over the region where the disparity estimate is valid the slope of the disparity function is close to 1. Thus, when this information is used to guide the search for a matching point in a signal the convergence to the correct location is very rapid. However, the convergence process is not completely fool-proof, as one can sometimes encounter points near the correct matching point where the estimated disparity falls to zero (as in these plots). Thus, when the disparity estimate falls to zero one needs to check the correlation value to see if small local movements can increase the correlation, and hence improve the estimate of the matching point.

¹ One might argue that once the response magnitude exceeds some value (say, a fixed multiple of the expected noise value) then all responses should be weighted equally. There is also a case for weighting the disparity estimate obtained at each filter scale according to the wavelength of the filter. Filters tuned to the lowest frequencies (greatest wavelengths) will be least affected by any phase wrap-around problems due to disparities that exceed one spatial period. These possibilities have not been investigated here.
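The amplitude-weighted disparity estimate can be sketched as follows, reusing the same style of log Gabor filter bank (again an illustrative reconstruction; parameter values are assumptions, not the thesis code):

```python
import numpy as np

def multiscale_disparity(s1, s2, x1, x2, wavelengths, sigma_on_f=0.55):
    # Amplitude-weighted mean of the per-scale shift estimates
    # delta_n = dphi_n * lambda_n / (2*pi).  Log Gabor filters applied
    # in the frequency domain supply the quadrature responses.
    n = len(s1)
    freqs = np.fft.fftfreq(n)
    pos = freqs > 0
    S1, S2 = np.fft.fft(s1), np.fft.fft(s2)
    num = den = 0.0
    for wl in wavelengths:
        G = np.zeros(n)
        G[pos] = np.exp(-np.log(freqs[pos] * wl) ** 2 /
                        (2 * np.log(sigma_on_f) ** 2))
        r1 = np.fft.ifft(S1 * G * 2)[x1]       # even + i*odd response
        r2 = np.fft.ifft(S2 * G * 2)[x2]
        dphi = np.angle(r1 * np.conj(r2))      # wrapped phase difference
        w = abs(r1) * abs(r2)                  # amplitude weighting
        num += w * dphi * wl
        den += w * 2 * np.pi
    return num / den
```

Each per-scale estimate is only unambiguous for disparities within half a wavelength of that filter, which is why the amplitude weighting, dominated by the filters actually responding to the signal, matters.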

5.6 Determining Relative Signal Distortion

An important parameter to recognize is the relative signal distortion in the vicinity of the matched points. For example, a stereo view of a slanted surface will result in the two images displaying the surface with different degrees of foreshortening. If one can measure the relative foreshortening, or scaling, between the two images one can deduce the surface slant. In general, when one matches two image points, the signal at one point will be some scaled version of the signal seen at the other point. Accordingly, in the frequency domain the spectrum of one of these signals will be a stretched and shifted version of the other's spectrum.

Identifying the scale difference in the two signals that corresponds to the observed differences in the spectra is awkward because it manifests itself as both a shifting and a rescaling of the spectra along the frequency axis, and a rescaling in amplitude. However, if we choose to display the spectra of these two signals on a logarithmic frequency scale we find that the shapes of the two spectra are identical; they differ only by a translation along the log frequency axis and a rescaling of the amplitudes. (We are familiar with the idea that the spectra of geometrically scaled wavelets are all identical when viewed on a logarithmic frequency scale, but this property applies to any signal shape that is scaled.) Thus, to determine the frequency shift, and hence the scale difference, between two signals we simply find the translation that maximizes the correlation between the two spectra. The inverse log of this translation is the frequency shift from which we can deduce the scale change. This is illustrated in Figure 51.

Figure 50: Examples of correlation of a 1D signal against itself at two reference points. Results for both spatial correlation and the new frequency based correlation method are shown for comparison. The spatial correlation was performed over a window of 30 pixels. The frequency based correlation was performed using a bank of 5 log Gabor filters with wavelengths ranging from 4 pixels up to 64 pixels.

Note that the use of a logarithmic scale to simplify the identification of the scale difference cannot be applied directly in the spatial domain, as we do not have an absolute zero point about which to determine the logarithmic scale. However, by transforming our data into the frequency domain we obtain an absolute zero point (zero frequency), and hence allow the application of this approach. This technique has been used to determine image velocities from visual Doppler effects by Kovesi and Trevelyan [49].

Figure 51: A Gabor function at two scales, along with their corresponding Fourier spectra on linear and logarithmic frequency axes.

Another, more obvious, application of this form of signal analysis is in texture segmentation. In an image a texture pattern can appear with different degrees of foreshortening depending on the slant of the surface on which the texture lies. On a logarithmic frequency scale the shape of the texture's amplitude spectra will remain constant throughout the image. If the magnitudes and principal directions of the scaling/distortion of the texture can be extracted then the surface slant should be recoverable.
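The procedure can be sketched as follows: resample the two amplitude spectra onto a logarithmic frequency axis, then search over translations for the one that maximizes their correlation (the frequency range, bin count and normalization here are illustrative assumptions, not prescribed by the thesis):

```python
import numpy as np

def log_frequency_spectrum(signal, log_f):
    # Amplitude spectrum sampled at logarithmically spaced frequencies
    # (linear interpolation of the FFT magnitude).
    mag = np.abs(np.fft.rfft(signal))
    f = np.fft.rfftfreq(len(signal))
    return np.interp(np.exp(log_f), f, mag)

def scale_change(s1, s2, n_bins=240, max_shift=80):
    # Estimate the relative spatial scale of s2 with respect to s1 from
    # the translation that best aligns their amplitude spectra on a
    # logarithmic frequency axis.
    log_f = np.linspace(np.log(0.02), np.log(0.5), n_bins)
    step = log_f[1] - log_f[0]
    # Normalize out any overall amplitude rescaling between the spectra.
    a1 = log_frequency_spectrum(s1, log_f)
    a2 = log_frequency_spectrum(s2, log_f)
    a1 = a1 / np.linalg.norm(a1)
    a2 = a2 / np.linalg.norm(a2)

    def overlap(k):
        # Correlation of a1 against a2 translated by k bins along log f.
        if k >= 0:
            return float(np.dot(a1[k:], a2[:n_bins - k]))
        return float(np.dot(a1[:n_bins + k], a2[-k:]))

    k_best = max(range(-max_shift, max_shift + 1), key=overlap)
    # A translation of k_best * step in log frequency corresponds to a
    # spatial scale factor of exp(k_best * step).
    return np.exp(k_best * step)
```

For example, a Gabor function and a copy dilated to twice its width (as in Figure 51) have spectra of identical shape on the log frequency axis, and the recovered translation then corresponds to a scale factor of about two.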


Figure 52: Identifying a frequency shift/scale change is difficult when the signal's spectrum is broad relative to the frequency range covered by the analysis filters.

While this approach sounds very attractive, there are difficulties in implementing it. In particular, there is an aperture problem. We only see a window of the spectrum, namely the band of frequencies covered by our filters. If the signal spectrum is considerably wider than the analysis window covered by our filters we will be unable to see the end points of the spectrum, and hence be unaware of any frequency/scale shift (see Figure 52). Experimentally it has been found that it is very important to have filters that cover the very low end of the spectrum. At the other end of the frequency scale one finds that any distinguishing features in the signal's spectrum tend to be masked by noise in the signal (which remains unchanged relative to any spatial distortion of features in the image); hence noise compensation is very important.

This aperture problem is not resolved here, though a few suggestions are offered. It would seem that points of maximal local symmetry in the image are probably the best for recognizing any frequency/scale shift between signals. Points of local symmetry will tend to have most of their spectral energy concentrated at the scale corresponding to the spatial width of the symmetry. This will result in the amplitude spectra having a strong peak, which is likely to minimize the spectral aperture problem. In addition, one can argue intuitively that a point of local symmetry is probably going to be the most effective location at which to determine scale change, because one needs to see both sides of an object to determine how much it has changed its shape.

Information about local phase congruency and symmetry/asymmetry is likely to be very useful in analyzing disparity in signals and relative scalings/affine distortions. If a point in an image exhibits a high degree of local asymmetry in the signal then this may indicate a point of occlusion, and hence a possible discontinuity in disparity. If this point is also one of high phase congruency then one will have high confidence in any phase based disparity measurements, because we know there are no scale-space singularities present. If the level of local symmetry is high at a point in the image then this may well indicate that we are in the middle of an object. Thus, the disparity values in this region of the image are likely to be continuous, though our confidence in estimating the disparity will be reduced by the likely presence of scale-space singularities.

5.7 Conclusion

A local frequency based method of signal correlation and disparity measurement has been introduced. By using quadrature filters over multiple scales this new approach overcomes the problems encountered in existing techniques for phase based disparity measurement. Using local phase and amplitude information obtained from a bank of quadrature paired filters one can construct a dimensionless measure of signal similarity. Where it is found that the two signals do not match, one can use the mean weighted phase difference at the two locations in the signals to estimate the disparity between the signals. This greatly improves the efficiency of any search for matching points, and convergence on the correct location is very rapid.

The ideas presented in this chapter are only partially developed. There is much to explore. Obviously this work needs to be extended to 2D; an approach that might be taken is to use banks of 2D wavelets in multiple orientations. Analysis in each orientation will give different disparity and scale distortion measures. Through some principal components analysis one might be able to extract 2D disparity and measures of affine distortion at each point in the image. If this could be done successfully there would be many applications for this approach in the analysis of stereo images, texture gradients, and motion sequences.


Chapter 6 Conclusion
This thesis has argued strongly for the importance of invariant measures in low-level image processing. Images provide a very dynamic and unstructured environment in which we struggle to make our algorithms operate. Objects can appear with arbitrary orientation and spatial magnification, along with arbitrary brightness and contrast. Thus the search for invariant quantities is very important for computer vision. While some work has been directed at finding higher-level geometric invariances in images, little effort has been devoted to finding low-level invariant quantities. The neglect of this research area has compromised the reliability of many of the current algorithms used in computer vision.

It has been argued that a frequency based approach to image analysis, and in particular the use of phase information, is a good basis on which to develop low-level invariant measures. Phase is a dimensionless quantity, and phase information has been shown to be crucial in the perception of images. By studying the behaviour of phase and amplitude over frequency for different image features this thesis has developed a number of low-level invariant measures for feature detection, local symmetry/asymmetry analysis, and signal matching.

6.1 Contributions

The main contribution of this thesis has been to develop the concepts behind phase congruency to the point where one now has a reliable feature detector that gives an absolute measure of the significance of features irrespective of the image illumination and contrast. The reliable performance of the detector over a wide range of images using fixed thresholds has been demonstrated. In achieving this overall objective a number of new ideas were developed. These can be summarized as follows:

- The basic ideas behind phase congruency have been extended to show how it can be calculated via a bank of geometrically scaled filters in quadrature. In this process some new geometrical insights into the interpretation of phase congruency have been gained.

- A new measure of phase congruency that is based on the weighted mean absolute deviation (rather than the weighted variance) of phase from the weighted mean has been developed. This new measure provides a very localized response to features, allowing fine image details to be detected.

- It is recognized that normalized image measures, such as phase congruency, are susceptible to noise. The influence of noise on phase congruency was studied and an effective noise compensation technique has been devised that makes minimal assumptions about the underlying noise model. This noise compensation technique is applicable to any kind of image analysis that uses banks of geometrically scaled filters.

- A new interpretation of scale in image analysis has been developed. It is argued that when a frequency based approach is used in the analysis of images a more logical interpretation of scale is obtained by using high-pass filtering rather than low-pass filtering. The advantage of this model is that the positions of features are not a function of the analysis scale used. All that varies with the scale of analysis is the relative significance of features. Precision of measurement is obtained at all scales; this is of great importance to any automated system wanting to use the output of a vision system to perform some action.

- It is argued that for phase congruency to be used as a measure of feature significance it must be weighted by a measure of the spread of frequencies present at each point in the image. A method for doing this is presented.

- The shortcomings of Gabor filters are recognized and it is suggested that log Gabor filters be used instead. Log Gabor filters allow broad spectral information to be gathered using filters of minimal spatial extent. Even-symmetric log Gabor filters also have the important property of always having a zero DC component.

- It is shown that points of local symmetry and asymmetry in images also give rise to special arrangements of phase, and these can be readily detected. The measure of local image symmetry/asymmetry that has been developed is unique in that it is illumination and contrast invariant, and does not require any image segmentation to have taken place prior to analysis.

- A new approach to the matching of signals that uses correlation of local phase and amplitude information over many scales has been presented. This technique also allows the disparity between signals to be estimated. This approach to disparity measurement differs from other frequency based approaches in that its use of data over many scales provides greater immunity to the difficulties created by scale-space singularities.

- The invariance, under scale changes, of the shape of a signal's amplitude spectrum on a logarithmic frequency scale is recognized. Suggestions are made as to how this invariance might be exploited for texture segmentation, and stereo and motion analysis.

6.2 Future Work

It is hoped that this thesis has shown that there are many possibilities to be explored in the use of local phase and amplitude information in images. Three different ways of measuring phase congruency, a method for measuring local symmetry/asymmetry, and an approach to signal matching, disparity measurement and scale change measurement have all been devised from the information that is available from local phase and amplitude. No doubt many of these ideas can be improved and/or adapted to identify other low-level image properties.

One aspect of the detection of features via phase congruency that was not fully addressed was the problem encountered at junctions of features having greatly differing magnitudes. At these locations the normalization is dominated by the features having the greatest contrast, and this results in lower contrast features at the junction being over-normalized. This problem is perhaps a result of the approach used in combining the filter outputs over several orientations. In this thesis all that has been done is to sum one-dimensional results over many orientations. It might be useful to explore a more two-dimensional approach to combining filter outputs over many orientations. If one recalls the geometric interpretation of phase congruency in 1D signals (as shown in Figure 5 in Chapter 2), phase congruency can be thought of as the ratio of the magnitude of the vector sum of the quadrature pairs of filter outputs to the scalar sum of the amplitudes of the filter pair outputs. This interpretation could be extended to 2D signals, in which case each filter pair response vector would be oriented in a 3D polar coordinate system, the extra coordinate being used to represent the orientation of the filter pair in addition to the phase angle. Thus, geometrically, we could imagine a sequence of 3D filter response vectors spiraling out from the origin. Phase congruency could then be defined in terms of the ratio between the magnitude of the 3D vector sum of these response vectors and the scalar sum of their response amplitudes.

The work presented on feature matching, disparity measurement and distortion measurement is, of course, preliminary. There are many possibilities to develop. The obvious work to be done is to extend the ideas that have been presented to 2D. An approach that might be taken is to use banks of 2D wavelets in multiple orientations and perform a principal components analysis to identify the direction and magnitude of the disparity, and the components of any affine distortion between the two signals. With this information one would be able to reconstruct the 3D structure of the scene.

Invariant measures are important in low-level vision. Phase information is a good basis on which to construct these measures.

Bibliography
[1] E. H. Adelson and J. R. Bergen. Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A, 2(2):284-299, 1985.
[2] S. J. Anderson and D. C. Burr. Spatial and temporal selectivity of the human motion detection system. Vision Research, 25(8):1147-1154, 1985.
[3] J. R. Atallah. On symmetry detection. IEEE Transactions on Computers, C-34:663-666, 1985.
[4] Y. K. Aw, R. Owens, and J. Ross. Image compression and reconstruction by a feature catalogue. In ECCV92, pages 749-756. Springer-Verlag Lecture Series in Computer Science, May 1992. Santa Margherita Ligure, Italy.
[5] F. Bergholm. Edge focusing. IEEE Trans. Pattern Analysis and Machine Intelligence, 9(6):726-741, November 1987.
[6] A. Blake and A. Zisserman. Visual Reconstruction. MIT Press, Cambridge, MA, 1987.
[7] M. Brady and H. Asada. Smoothed local symmetries and their implementation. The International Journal of Robotics Research, 3(3):36-61, 1984.
[8] D. C. Burr. Sensitivity to spatial phase. Vision Research, 20:391-396, 1980.
[9] A. D. Calway, H. Knutsson, and R. Wilson. Multiresolution estimation of 2-D disparity using a frequency domain approach. In D. Hogg and R. Boyle, editors, British Machine Vision Conference 1992, pages 227-236. Springer-Verlag, September 1992.


[10] A. D. Calway and R. Wilson. Curve extraction in images using the multiresolution Fourier transform. In Proc. Int. Conf. Acoust., Speech, and Signal Processing, pages 2129-2132. IEEE, April 1990.
[11] J. F. Canny. Finding edges and lines in images. Master's thesis, MIT AI Lab. TR-720, 1983.
[12] J. F. Canny. A computational approach to edge detection. IEEE Trans. Pattern Analysis and Machine Intelligence, 8(6):112-131, 1986.
[13] C. K. Chui. An Introduction to Wavelets. Academic Press, San Diego, CA, 1992.
[14] I. Daubechies. The wavelet transform, time-frequency localization and signal analysis. IEEE Transactions on Information Theory, 36(5):961-1005, September 1990.
[15] J. G. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A, 2(7):1160-1169, July 1985.
[16] Rachid Deriche. Using Canny's criteria to derive an optimal edge detector recursively implemented. The International Journal of Computer Vision, 1:167-187, April 1987.
[17] Rachid Deriche, Jean Pierre Cocquerez, and Guy Almouzny. An efficient method to build early image descriptions. In International Conference on Pattern Recognition, pages 588-590, 1988. Rome.
[18] J. M. H. du Buf. Ramp edges, Mach bands, and the functional significance of the simple cell assembly. Biological Cybernetics, 70(5):449-461, 1994.
[19] Olivier Faugeras. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, Cambridge, Massachusetts, 1993.

[20] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12):2379-2394, December 1987.
[21] David J. Field and Jacob Nachmias. Phase reversal discrimination. Vision Research, 24(4):333-340, 1984.
[22] M. M. Fleck. Spectre: An improved phantom edge finder. In Proceedings, 5th Alvey Vision Conference, pages 127-132, 1989.
[23] Margaret M. Fleck. Multiple widths yield reliable finite differences. IEEE T-PAMI, 14(4):412-429, April 1992.
[24] Margaret M. Fleck. Some defects in finite-difference edge finders. IEEE T-PAMI, 14(3):337-345, March 1992.
[25] D. J. Fleet. Measurement of Image Velocity. Kluwer Academic Publishers, Massachusetts, USA, 1992.
[26] D. J. Fleet and A. D. Jepson. Computation of component image velocity from local phase information. International Journal of Computer Vision, (5):77-104, 1991.
[27] D. J. Fleet, A. D. Jepson, and M. Jenkin. Phase-based disparity measurement. Computer Vision, Graphics and Image Processing, (53), 1991.
[28] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever. Scale and the differential structure of images. Image and Vision Computing, 10(6):376-388, July 1992.
[29] W. T. Freeman. Steerable Filters and Local Analysis of Image Structure. PhD thesis, MIT Media Lab. TR-190, June 1992.
[30] Goesta H. Granlund. In search of a general picture processing operator. Computer Graphics and Image Processing, (8):155-173, 1978.

[31] A. Grossmann and J. Morlet. Decomposition of Hardy functions into square integrable wavelets of constant shape. SIAM Journal of Mathematical Analysis, 15(4):723-736, July 1984.
[32] R. M. Haralick. Digital step edges from zero crossings of second directional derivatives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1):58-68, January 1984.
[33] D. J. Heeger. Optical flow from spatiotemporal filters. In Proceedings, 1st International Conference on Computer Vision, pages 181-190, June 1987. London.
[34] D. J. Heeger. Optical flow using spatiotemporal filters. International Journal of Computer Vision, 1:279-302, 1988.
[35] D. J. Heeger. Half-squaring in responses of cat striate cortex. Visual Neuroscience, 9:427-443, 1992.
[36] D. J. Heeger. Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9:181-197, 1992.
[37] M. K. Hu. Visual pattern recognition by moment invariants. IRE Transactions on Information Theory, IT-8:179-187, 1962.
[38] A. Hummel. Representations based on zero-crossings in scale-space. In Proceedings IEEE Computer Vision and Pattern Recognition Conference, pages 204-209, June 1986.
[39] M. Jenkin and A. D. Jepson. The measurement of binocular disparity. In Z. Pylyshyn, editor, Computational Processes in Human Vision. Ablex, New Jersey, 1988.
[40] D. G. Jones and J. Malik. A computational framework for determining stereo correspondence from a set of linear spatial filters. In ECCV92, Springer-Verlag Lecture Notes in Computer Science, volume 588, pages 395-410. Springer-Verlag, May 1992. Santa Margherita Ligure, Italy.

[41] D. G. Jones and J. Malik. Determining three-dimensional shape from orientation and spatial frequency disparities. In ECCV92, pages 661-669. Springer-Verlag Lecture Series in Computer Science, May 1992. Santa Margherita Ligure, Italy.
[42] Michael Kass, Andrew Witkin, and Demetri Terzopoulos. Snakes: Active contour models. Intl. J. Computer Vision, 2:321-331, 1988.
[43] H. E. Knutsson, R. Wilson, and G. H. Granlund. Anisotropic nonstationary image estimation and its applications: Part I - restoration of noisy images. IEEE Transactions on Communications, COM-31(3):388-397, March 1983.
[44] J. J. Koenderink and A. J. van Doorn. Invariant properties of the motion parallax field due to the movement of rigid bodies relative to an observer. Optica Acta, 22(9):773-791, 1975.
[45] J. J. Koenderink and A. J. van Doorn. How an ambulant observer can construct a model of the environment from the geometrical structure of the visual inflow. Kybernetik, pages 224-247, 1978.
[46] Jan J. Koenderink. The structure of images. Biological Cybernetics, (50):363-370, 1984.
[47] P. D. Kovesi. A dimensionless measure of edge significance. In The Australian Pattern Recognition Society, Conference on Digital Image Computing: Techniques and Applications, pages 281-288, 4-6 December 1991. Melbourne.
[48] P. D. Kovesi. A dimensionless measure of edge significance from phase congruency calculated via wavelets. In First New Zealand Conference on Image and Vision Computing, pages 87-94, August 1993. Auckland.
[49] P. D. Kovesi and J. P. Trevelyan. Using visual Doppler effects to deduce image motion. In The Australian Pattern Recognition Society, Conference on Digital Image Computing: Techniques and Applications, pages 493-500, December 1993. Sydney.

[50] M. K. Kundu and S. K. Pal. Thresholding for edge detection using human psychovisual phenomena. Pattern Recognition Letters, 4:433-441, 1986.
[51] K. Langley, T. J. Atherton, R. G. Wilson, and M. H. E. Larcombe. Vertical and horizontal disparities from phase. In ECCV90, Lecture Notes in Computer Science, Vol 427, pages 315-325. Springer-Verlag, 1990. Antibes, France.
[52] P. L'Ecuyer. Efficient and portable combined random number generators. Communications of the ACM, 31(6):742-774, June 1988.
[53] S. Mallat. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7):674-693, July 1989.
[54] D. Marr. Vision. Freeman: San Francisco, 1982.
[55] D. Marr and E. C. Hildreth. Theory of edge detection. Proceedings of the Royal Society, London B, 207:187-217, 1980.
[56] Alan M. McIvor. A test of camera noise models. In British Machine Vision Conference, pages 355-359, 1990. Oxford.
[57] Alan M. McIvor. Edge extraction and linking. In First New Zealand Conference on Image and Vision Computing, pages 485-491, August 1993. Auckland.
[58] Y. Meyer. Orthonormal wavelets. In Wavelets, Time-Frequency Methods and Phase Space, pages 21-37. Springer-Verlag, 1989. Marseille, December 1987.
[59] J. Morlet, G. Arens, E. Fourgeau, and D. Giard. Wave propagation and sampling theory - Part II: Sampling theory and complex waves. Geophysics, 47(2):222-236, February 1982.
[60] M. C. Morrone and D. C. Burr. Feature detection in human vision: A phase-dependent energy model. Proc. R. Soc. Lond. B, 235:221-245, 1988.
[61] M. C. Morrone and R. A. Owens. Feature detection from local energy. Pattern Recognition Letters, 6:303-313, 1987.

[62] M. C. Morrone, J. R. Ross, D. C. Burr, and R. A. Owens. Mach bands are phase dependent. Nature, 324(6094):250-253, November 1986.
[63] Joseph L. Mundy and Andrew Zisserman, editors. Geometric Invariance in Computer Vision. Artificial Intelligence Series. MIT Press, Cambridge, MA, 1992.
[64] J. A. Noble. Descriptions of image surfaces, 1989. D.Phil thesis, Department of Engineering Science, University of Oxford.
[65] Alan V. Oppenheim and Jae S. Lim. The importance of phase in signals. In Proceedings of the IEEE 69, pages 529-541, 1981.
[66] R. A. Owens. Feature-free images. Pattern Recognition Letters, 15:35-44, 1994.
[67] R. A. Owens, S. Venkatesh, and J. Ross. Edge detection is a projection. Pattern Recognition Letters, 9:223-244, 1989.
[68] P. Perona and J. Malik. Detecting and localizing edges composed of steps, peaks and roofs. In Proceedings of 3rd International Conference on Computer Vision, pages 52-57, 1990. Osaka.
[69] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Trans. PAMI, 12(7):629-639, July 1990.
[70] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 2nd edition, 1992.
[71] J. M. S. Prewitt. Object enhancement and extraction. In B. S. Lipkin and A. Rosenfeld, editors, Picture Processing and Psychopictorics, pages 75-149. Academic Press, 1970.
[72] K. K. Pringle. Visual perception by a computer. In A. Grasselli, editor, Automatic Interpretation and Classification of Images, pages 277-284. Academic Press, New York, 1969.

[73] C. J. Pudney and M. J. Robins. Surface extraction from 3D images using local energy and ridge tracing. In The Australian Pattern Recognition Society, Conference on Digital Image Computing: Techniques and Applications, pages 240-245, December 1995. Brisbane.
[74] F. Ratliff. Mach Bands. Holden-Day, San Francisco, 1965.
[75] D. Reisfeld, H. Wolfson, and Y. Yeshurun. Detection of interest points using symmetry. In Proceedings of the 3rd International Conference on Computer Vision, pages 62-65, December 1990. Osaka, Japan.
[76] Ben Robbins and Robyn Owens. The 2D local energy model. Technical Report 94/5, Department of Computer Science, The University of Western Australia, August 1994.
[77] L. G. Roberts. Machine perception of three-dimensional solids. In J. Tippet, D. Berkowitz, L. Clapp, C. Koester, and A. Vanderburgh, editors, Optical and Electro-optical Information Processing, pages 159-197. MIT Press, 1965.
[78] C. Ronse. On idempotence and related requirements in edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(5):484-491, May 1993.
[79] L. Rosenthaler, F. Heitger, O. Kubler, and R. von der Heydt. Detection of general edges and keypoints. In ECCV92, Springer-Verlag Lecture Notes in Computer Science, volume 588, pages 78-86. Springer-Verlag, May 1992. Santa Margherita Ligure, Italy.
[80] John Ross, M. Concetta Morrone, and David C. Burr. The conditions under which Mach bands are visible. Vision Research, 29(6):699-715, 1989.
[81] Charlie Rothwell, Joe Mundy, Bill Hoffman, and Van Duc Nguyen. Driving vision by topology. Technical report, INRIA, Sophia-Antipolis, 1995. No 2444.
[82] T. Sanger. Stereo disparity computation using Gabor filters. Biological Cybernetics, (59):405-418, 1988.

[83] S. Sarkar and K. L. Boyer. Optimal infinite impulse response zero crossing based edge detectors. CVGIP: Image Understanding, 54(2):224-243, September 1991.
[84] Eero P. Simoncelli, William T. Freeman, Edward H. Adelson, and David J. Heeger. Shiftable multiscale transforms. IEEE Transactions on Information Theory, 38(2):587-607, March 1992.
[85] S. M. Smith and J. M. Brady. SUSAN - A new approach to low level image processing. Technical Report TR95SMS1b, Defence Research Agency, Farnborough, Hampshire, UK, 1994.
[86] I. Sobel. Neighbourhood coding of binary images for fast contour following and general array binary processing. Computer Graphics and Image Processing, 8:127-135, 1978.
[87] Libor A. Spacek. The Detection of Contours and their Visual Motion. PhD thesis, University of Essex at Colchester, December 1985.
[88] S. Venkatesh and R. Owens. On the classification of image features. Pattern Recognition Letters, 11:339-349, 1990.
[89] S. Venkatesh and R. A. Owens. An energy feature detection scheme. In The International Conference on Image Processing, pages 553-557, 1989. Singapore.
[90] L. Vincent. Efficient computation of various types of skeletons. Image Processing, pages 297-311, 1991.
[91] L. Vincent. Morphological Algorithms. Marcel Dekker Inc., New York, 1993.
[92] Zhengyan Wang and Michael Jenkin. Using complex Gabor filters to detect and localize edges and bars. In Colin Archibald and Emil Petriu, editors, Advances in Machine Vision: Strategies and Applications, World Scientific Series in Computer Science, volume 32, pages 151-170. World Scientific Press, Singapore, 1992.

[93] A. B. Watson and A. J. Ahumada. Model of human visual-motion sensing. Journal of the Optical Society of America A, 2(2):322-341, 1985.
[94] M. A. Webster and R. L. De Valois. Relationship between spatial-frequency and orientation tuning of striate-cortex cells. Journal of the Optical Society of America A, 2(7):1124-1132, July 1985.
[95] Hugh R. Wilson, David K. McFarlane, and Gregory C. Phillips. Spatial frequency tuning of orientation selective units estimated by oblique masking. Vision Research, 23(9):873-882, 1983.
[96] R. Wilson, H. E. Knutsson, and G. H. Granlund. Anisotropic nonstationary image estimation and its applications: Part II - predictive image coding. IEEE Transactions on Communications, COM-31(3):398-406, March 1983.
[97] Andrew P. Witkin. Scale-space filtering. In Proceedings, 8th International Joint Conference on Artificial Intelligence, pages 1019-1022, August 1983. Karlsruhe.
[98] Y. Xia. Skeletonization via the realization of the fire fronts propagation and extinction in digital binary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(10):1076-1086, 1989.

Appendix A

Portfolio of experimental results


A.1 Introduction

This appendix presents a collection of experimental results. The first part of the appendix consists of a portfolio of results obtained on a variety of images. This is followed by a set of results illustrating the effects of varying some of the key parameters in the calculation of phase congruency.

A.2 Portfolio

This portfolio demonstrates the reliability of phase congruency as a feature detector over a wide range of conditions. Displayed on each page are the original image, the raw phase congruency image calculated using the PC2 measure, the raw output from the Canny detector for comparison, and a non-maxima suppressed and hysteresis thresholded phase congruency edge map. The raw phase symmetry image is also displayed.

It should be emphasized that all the phase congruency edge maps presented in this portfolio were obtained using constant threshold values throughout. Hysteresis thresholding was used with upper and lower hysteresis threshold values of 0.3 and 0.15. (Note: the PC2 measure requires lower threshold values than PC1 because it penalizes phase deviations more strongly.)
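For readers unfamiliar with hysteresis thresholding, the step can be sketched as follows. This is a minimal illustration only, not the implementation used to produce these results; the function name and the 8-connected neighbourhood are assumptions of mine.

```python
def hysteresis_threshold(pc, t_high=0.3, t_low=0.15):
    """Keep pixels above t_low that connect to a pixel above t_high.

    pc is a 2-D list of phase congruency values in [0, 1].
    Returns a 2-D list of 0/1 edge labels.
    """
    rows, cols = len(pc), len(pc[0])
    edges = [[0] * cols for _ in range(rows)]
    # Seed the flood fill from all strong pixels.
    stack = [(r, c) for r in range(rows) for c in range(cols)
             if pc[r][c] >= t_high]
    for r, c in stack:
        edges[r][c] = 1
    # Grow the edge regions through weak pixels (8-connected).
    while stack:
        r, c = stack.pop()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and not edges[rr][cc] and pc[rr][cc] >= t_low):
                    edges[rr][cc] = 1
                    stack.append((rr, cc))
    return edges
```

With the threshold pair used in this portfolio (0.3 and 0.15), weak responses survive only where they connect to a strong response, which suppresses isolated low phase congruency pixels.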

Notes on parameters used: Phase congruency was calculated using two-octave bandwidth filters over four scales and six orientations. The wavelength of the smallest scale filters was 3 pixels, and the scaling between successive filters was 2. Thus, over the four scales the filter wavelengths were 3, 6, 12 and 24 pixels. For computational reasons the image convolutions were performed via the Fourier transform. Accordingly, filters were constructed directly in the frequency domain as polar separable functions: a logarithmic Gaussian function in the radial direction and a Gaussian in the angular direction. In the angular direction the ratio between the angular spacing of the filters and the angular standard deviation of the Gaussians was 1.2. (See Appendix D for implementation details.)

A noise compensation k value of 2.5 was used (the estimate of the ratio between the mean noise amplitude and the maximum noise amplitude). The frequency spread weighting function cut-off fraction c was set at 0.4, and the gain parameter g was set at 10. The value of ε, the small constant used to prevent division by zero in the case where local energy in the image becomes very small, was set at 0.01.

The results using the Canny edge detector were obtained using an implementation that followed the suggestions of Fleck (1992a). The raw gradient magnitude image is displayed so that comparisons with the raw phase congruency image can be made without having to consider any artifacts that may be introduced by non-maximal suppression and thresholding. In all cases the Canny edge detection was performed after the images were smoothed with a 2D Gaussian having a standard deviation of 1 pixel.

The phase symmetry images were obtained using the same parameters that were used for the phase congruency calculations, except that five filter scales were used to capture more low frequency information, and no weighting for frequency spread was applied.
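The radial component of the frequency-domain filters described above (a Gaussian on a logarithmic frequency axis, often called a log-Gabor function) can be sketched as below. The value 0.55 for the ratio of the radial standard deviation to the centre frequency is an assumption on my part; it is a commonly quoted setting for roughly two-octave bandwidth filters and may not match the exact value used here.

```python
import math

def log_gabor_radial(f, f0, sigma_on_f=0.55):
    """Radial filter transfer function: a Gaussian on a log frequency axis.

    f          -- frequency at which to evaluate the filter
    f0         -- centre frequency of the filter
    sigma_on_f -- ratio of the Gaussian's standard deviation to f0;
                  smaller values give wider bandwidth (assumed value)
    """
    if f <= 0.0:
        return 0.0  # a log-Gabor filter has no DC component
    return math.exp(-(math.log(f / f0) ** 2) /
                    (2.0 * math.log(sigma_on_f) ** 2))

# Filter banks use geometrically spaced centre frequencies, e.g. the
# wavelengths 3, 6, 12 and 24 pixels mentioned above:
wavelengths = [3.0 * 2 ** s for s in range(4)]   # [3, 6, 12, 24]
centre_freqs = [1.0 / w for w in wavelengths]
```

The transfer function peaks at 1 at the centre frequency and is symmetric on a logarithmic frequency axis, which is what lets a geometric progression of centre frequencies tile the spectrum evenly.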
To produce the phase congruency edge maps some form of non-maximal suppression is required. Most approaches to non-maximal suppression require orientation information to be used in conjunction with the feature strength image to determine local maxima. This is because non-maximal suppression is generally only defined in terms of one-dimensional signals. For gradient based edge detectors this information comes naturally from the local gradient direction. However, phase congruency images pose a problem in that they do not readily provide orientation information. In the results presented here the filter orientation having maximal local energy was used to provide the feature orientation information at each point in the image. This is not ideal, and some of the edge maps have artifacts as a result. Paradoxically, the problems appear to be worst on synthetic images. The responses on the very sharp artificial features in the Test1 and Test2 images, particularly at line terminations and corners, cause problems for the non-maximal suppression technique used. Some of these difficulties are a result of the fine localization achieved with phase congruency: the discretization of object boundaries is faithfully detected (see the small circles in these test images). The problems associated with non-maximal suppression, and the details of the implementation used, are discussed at some length in Appendix C.

Another problem, which only seems to occur on synthetic images containing idealized sharp features, is the appearance of faint ringing artifacts around edges. The cause of this is not clear; it is believed to be due to numerical effects.
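The orientation-driven suppression just described can be sketched as follows. This is a simplified nearest-neighbour version with no sub-pixel interpolation, and it is not the implementation detailed in Appendix C; the function name and the convention that the supplied angle is the direction normal to the feature are my own choices.

```python
import math

def nonmax_suppress(strength, orient):
    """Suppress pixels that are not local maxima across the feature.

    strength -- 2-D list of feature strength (e.g. phase congruency)
    orient   -- 2-D list of angles in radians giving, at each pixel,
                the direction NORMAL to the feature
    Returns a thinned 2-D list (borders are left at zero).
    """
    rows, cols = len(strength), len(strength[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Step to the nearest neighbours along the normal direction.
            dr = round(math.sin(orient[r][c]))
            dc = round(math.cos(orient[r][c]))
            s = strength[r][c]
            if s >= strength[r + dr][c + dc] and s >= strength[r - dr][c - dc]:
                out[r][c] = s
    return out
```

For a vertical line with its normal pointing horizontally (angle 0), this keeps the ridge pixels and zeroes their left and right flanks.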

A.2.1 Image acknowledgements

The following table lists the images used in this portfolio and their sources. In some cases the original images have been clipped from their original size to 512 x 512 or 256 x 256.

Image      Size       Source
Test1      256 x 256  UWA
Test2      256 x 256  UWA
Aerial     512 x 512  SIP
Mandrill   512 x 512  VL
Boat       512 x 512  SIP
Grace      256 x 256  UWA
Goldhill   512 x 512  SIP
Harbour    512 x 512  BT
Lena       512 x 512  SIP
Peppers    512 x 512  SIP
Scissors   256 x 256  UWA
Temple     512 x 512  MSU
VDU        256 x 256  OX
Venice     512 x 512  BT

Key to image sources:

UWA  Robotics and Vision Research Group, Department of Computer Science, The University of Western Australia. (Note: images Test1 and Test2 are modelled after test images from ETH, Zurich.)

VL   Vision list archive.
     ftp://teleosresearch/VISION-LIST-ARCHIVE/IMAGERY

BT   British Telecom image database.
     ftp://teleosresearch/VISION-LIST-ARCHIVE/IMAGERY/BT scenes

SIP  Signal and Image Processing Group, University of East Anglia. Standard Image Page.
     http://www.sys.uea.ac.uk/Research/ResGroups/SIP/images ftp/index.html

MSU  Michigan State University, Department of Computer Science, Pattern Recognition and Image Processing Laboratory. Image archive.
     ftp://ftp.cps.msu.edu/pub/prip

OX   Robotics Group, School of Engineering Science, The University of Oxford.

Figure 53: Test1 image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Comments: Phase congruency marks lines with a single response, not two, and the magnitude of the phase congruency response is largely independent of local contrast. The phase congruency edge map illustrates the problems with non-maximal suppression that were mentioned in the introduction to this portfolio. The phase symmetry image picks out the centres of circles and squares in the image (see the checker board pattern). It also identifies the centres of some of the negative spaces between objects in the image.

Figure 54: Test2 image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Figure 55: Aerial image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Comments: There are many line and point features in this image. Notice how the Canny detector marks edges on each side of lines, and marks circles around point features. The phase symmetry image has mainly responded to roads in the image.

Figure 56: Mandrill image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Comments: This image is largely made up of line features, and this highlights the difference between phase congruency and first derivative edge operators. The Canny detector marks edges around all the hairs while phase congruency marks the hairs directly as line features.

Figure 57: Boat image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Figure 58: Grace image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Figure 59: Goldhill image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Comments: This image illustrates the ability of the PC2 measure to pick out fine features. The window panes and roof tiles in the nearer houses are clearly marked.

Figure 60: Harbour image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Figure 61: Lena image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Comments: The highlights on Lena's nose and upper lip cause difficulties for both detectors, though phase congruency is more successful in picking out the details in the feathers and hat band.

Figure 62: Peppers image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Figure 63: Scissors image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Comments: On a near binary image such as this the phase symmetry image can be very successful.

Figure 64: Temple image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Comments: This image is an example where lower threshold values would have been desirable. The phase symmetry image is not entirely successful in marking the axis of reflection because the reflections in the water are darker; that is, the symmetry in intensity values is not perfect.

Figure 65: VDU image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Comments: The low contrast edges of the VDU against the table are marked quite clearly by phase congruency. Notice also the marking of the right-hand vertical edge of the monitor.

Figure 66: Venice image. (a) original image; (b) phase congruency edge map; (c) Canny edge strength image; (d) raw phase congruency image; (e) raw phase symmetry image.

Comments: Here the value of phase congruency's invariance to contrast can be seen. Notice how some of the buildings along the canal are partly in shadow. The output of the Canny detector almost disappears in the shadowed regions. However, phase congruency successfully picks out many of the features in these regions.

A.3 Parameter variations

The following set of results illustrates the effects of varying some of the key parameters in the calculation of phase congruency.

Figure 67: Test2 image analyzed with different numbers of filter orientations (original image; phase congruency calculated over 6, 9 and 12 orientations). This sequence of images shows the effect of increasing the number of filter orientations. In these three examples the ratio between the angular separation of the filters and the standard deviation of the angular Gaussian distributions of the filters in the frequency plane was kept constant. The appearance of the phase congruency output is mostly unchanged with the number of filter orientations, though some flaring starts to occur at corners and other 2D features as the number of filter orientations increases. The cause of this is not clear.

Figure 68: Goldhill image analyzed at different scales of high-pass filtering (original image; phase congruency calculated over 3, 4 and 5 scales). This sequence of images illustrates the effect of using different scales of high-pass filtering. The phase congruency result calculated over three scales marks the fine details in the image well, and most features are marked at a similar level of significance. As the number of analysis scales is increased, bringing more low frequency information into the phase congruency calculation, the significance of small scale features declines relative to the strong, broad scale features. Note, though, that the positions of features remain the same. Apart from varying the number of filter scales used in the analysis, all other parameters were set at the same values used to produce the portfolio of results in the preceding section. These images should be compared to those presented on the following page, which show the output of the Canny operator on this image with different degrees of low-pass filtering.

Figure 69: Goldhill image analyzed with the Canny operator at different scales of low-pass filtering (original image; Canny with sigma = 1, 2 and 4). In these images sigma refers to the standard deviation (in pixels) of the Gaussian smoothing filter that was applied to the image prior to applying the Canny operator. This sequence of images illustrates the great loss of feature localization that can occur when low-pass filtering is used.

Figure 70: Goldhill image analyzed using differing values of ε (original image; phase congruency with ε = 0.01, 0.1 and 1.0). This series of images shows the effect of using different values of ε, the small constant used to prevent division by zero in the case where local energy in the image becomes very small. As one can see, the results are not sensitive to this parameter. All other parameters were set at the same values used to produce the portfolio of results in the preceding section.
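The role of ε can be understood from a one-dimensional toy version of the phase congruency quotient: local energy divided by the sum of filter response amplitudes, with ε added to the denominator. This sketch uses the basic (PC1-style) quotient rather than the PC2 measure used elsewhere in this portfolio, and the response values below are purely illustrative.

```python
import math

def phase_congruency_1d(responses, eps=0.01):
    """Toy phase congruency from quadrature filter responses at one point.

    responses -- list of (even, odd) filter outputs, one pair per scale.
    PC = |sum of response vectors| / (eps + sum of amplitudes);
    eps keeps the quotient well defined where local energy is tiny.
    """
    sum_e = sum(e for e, o in responses)
    sum_o = sum(o for e, o in responses)
    energy = math.hypot(sum_e, sum_o)          # local energy
    amp_sum = sum(math.hypot(e, o) for e, o in responses)
    return energy / (eps + amp_sum)

# Perfectly congruent phase: all scales respond in the same direction.
congruent = [(1.0, 0.0), (0.5, 0.0), (0.25, 0.0)]
# A near-featureless point: tiny responses with no common phase.
flat = [(1e-6, 0.0), (-1e-6, 1e-6), (0.0, -1e-6)]
```

At a strong, congruent feature the amplitude sum dominates ε, so the quotient is barely affected; in flat regions ε merely pins the quotient near zero instead of leaving it 0/0, which is why the results are so insensitive to its exact value.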

Figure 71: Illustration of noise compensation on the Test2 image (Test2 image with additive Gaussian noise; phase congruency on the noise free image; raw Canny edge strength on the noisy image; phase congruency on the noisy image). In this example additive Gaussian noise with a standard deviation of 20 grey levels has been applied. The noise has been successfully ignored, as is seen by the lack of features being marked in the smooth regions of the image. However, the noise has reduced the confidence of the measured phase congruency at features, resulting in lower phase congruency magnitudes.

Figure 72: Illustration of noise compensation on the Test2 image (Test2 image with additive Gaussian noise; phase congruency on the noise free image; raw Canny edge strength on the noisy image; phase congruency on the noisy image). In this example additive Gaussian noise with a standard deviation of 40 grey levels has been applied. Again, the noise has been successfully ignored in the smooth regions of the image. However, with this higher noise level the confidence of the measured phase congruency at features has been reduced even further.

Appendix B

Noise models and noise compensation


B.1 Introduction

An important part of developing the normalized feature measures described in this thesis has been the development of an effective means of noise compensation. Without effective noise compensation, normalized feature measures are generally unusable. In any noise compensation scheme one has to make some assumptions and form a model of the noise. This of course can be difficult, and the more assumptions one has to make, the greater the chance that some will be wrong. The noise compensation method developed in Chapter 3 makes four assumptions: that the noise is additive, that the amplitude spectrum of the noise is fairly flat, that the level of noise is uniform over the image, and that the maximum influence of noise on filter output can be estimated by taking a scalar multiple of the estimated average noise influence on the filter output.

Fleck [23] conducts a detailed study of actual noise measurements on images and concludes that the commonly used noise model, independent additive Gaussian noise, is generally valid. However, an earlier study by McIvor [56] concludes that (on the cameras and frame-grabbers that he tests) this simple model is not adequate. In his experiments he found significant correlation in the noise, especially temporally, and that the noise variance varied with mean intensity. Thus, he suggests that for some systems a multiplicative noise model may be more appropriate.

Our main interest is in the shape of typical noise spectra in images and, in particular, the degree to which they have uniform amplitude spectra. This appendix describes some basic experiments that were conducted to qualitatively assess these assumptions about the amplitude spectra of image noise.

Initial experiments were conducted using synthetic noise. It was found that the spectra of pseudo-random uniform noise and pseudo-random Gaussian noise are substantially uniform. What does vary between these two noise distributions is the ratio between the mean amplitude and the 99th percentile magnitude, though not significantly. These experiments were followed by an analysis of the noise measured from test images taken with a CCD camera. The main finding was that the noise spectral properties varied with orientation. In the vertical direction the noise tended to be quite uniform, but in the horizontal direction the noise energy was biased to the lower frequencies.

B.2 Noise generators

The pseudo-random number generator used to generate synthetic noise in the test images was the mixed linear congruential generator suggested by LEcuyer [52]. This generator combines two linear congruential generators of dierent periods to obtain a random sequence having a new period that is the least common multiple of the two periods. This generator produces numbers with good statistical properties [52, 70]. A generator such as this produces a uniformly distributed oating point number in the range 01. This approximates the ideal of white noise. To generate noise with a Gaussian distribution the Box-Muller method was used to generate a sequence with a normal distribution from a sequence having a uniform distribution. The implementation used followed that provided by Press et al. ( [70] Ch 7. page 289). Figure 73 provides a comparison of the properties of pseudo-random Uniform and Gaussian noise. In general, the properties we are interested in are very similar. The spectra are at, though noisy, and when the two noise signals are convolved with quadrature pairs of lters the square of the mean amplitude response varies nearly

Figure 73: Comparison of the properties of pseudo-random Uniform and Gaussian noise. (For each noise type the panels show the noise signal, its amplitude spectrum, the mean and mean squared filter response amplitudes against filter centre frequency, and the ratio of the mean response to the 99th percentile response.)

linearly with filter centre frequency. This is the result we expect in 1D. Where the two differ is in the ratio between the 99th percentile of the filter response amplitude and the mean amplitude response. For uniform noise we expect this ratio to be two, and for Gaussian noise we expect it to be approximately three (three standard deviations). In the results shown here, these values were not obtained exactly. The observed ratio for uniform noise was approximately two, though at low frequencies this ratio rose to 2.5. The observed ratio for Gaussian noise was lower than expected,


being approximately 2.5 over all frequencies. This may have been caused by a truncation of the ideal Gaussian distribution in the discrete implementation.
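The Box-Muller step and the amplitude ratios quoted above can be checked with a short sketch. This is an illustrative NumPy snippet, not the thesis code (which used a combined linear congruential generator and the Press et al. C implementation); it confirms that the ratio of the 99th percentile amplitude to the mean amplitude is close to 2 for uniform noise and a little over 3 for Gaussian noise.

```python
import numpy as np

def box_muller(u1, u2):
    """Box-Muller: map two uniform (0,1] samples to two N(0,1) samples."""
    r = np.sqrt(-2.0 * np.log(u1))
    return r * np.cos(2 * np.pi * u2), r * np.sin(2 * np.pi * u2)

def p99_over_mean(x):
    """Ratio of the 99th percentile amplitude to the mean amplitude."""
    a = np.abs(x)
    return np.percentile(a, 99) / a.mean()

rng = np.random.default_rng(0)
n = 200000
u1 = 1.0 - rng.random(n)            # uniform on (0, 1], safe for log()
u2 = rng.random(n)
gauss, _ = box_muller(u1, u2)       # Gaussian noise via Box-Muller
unif = 2.0 * rng.random(n) - 1.0    # zero-mean uniform noise on [-1, 1)

print(p99_over_mean(unif))          # close to 2
print(p99_over_mean(gauss))         # close to 3.2 (2.58 sigma / 0.80 sigma)
```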

B.3 Noise spectra measured from images

What is of real interest is the properties of noise spectra found in real images. A number of images were collected using a Pulnix TM-6CN CCD camera and a PC Vision+ frame-grabber. The camera was pointed at a uniform paper surface with lighting arranged to be as even as possible. The camera was defocused to eliminate the influence of any texture on the paper. A variety of camera gains, frame-grabber gains and offsets, and camera aperture settings were tried under different illumination conditions. The images were processed to remove their DC content by calculating the amount that each pixel value deviated from the local mean calculated within a rectangular window about each pixel. Large window sizes (30-50 pixels square) were used to allow any low frequency noise content to be identified. This processing does not allow the noise levels at the edge of the image to be calculated. Therefore, to eliminate the influence of image edge effects only the central 256x256 portion of the processed 512x512 test images was analyzed for its spectral content. Typical image noise values found after this processing were about 4 grey levels; low light/high gain conditions would increase this to about 9 grey levels. Taking the Fourier transform of these processed images and displaying the amplitude of their spectra revealed that the noise spectral properties varied with orientation. In the vertical direction the noise spectrum would be quite uniform (indicating no vertical correlation) but in the horizontal direction the noise energy would be biased to the lower frequencies (indicating some correlation in the horizontal direction). This observation is consistent with the smear of grey values along pixel rows that is typically observed with CCD cameras. An example of one of these processed noise images and its amplitude spectrum is shown in Figure 74.
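The DC-removal step and the directional spectrum comparison described above can be sketched as follows. This is an illustrative NumPy/SciPy reconstruction, not the original processing: the CCD's horizontal smear is simulated with a small row-wise moving average, the local mean within a square window is subtracted, and mean spectral amplitudes along the two frequency axes are compared. The window size and smear length are assumptions chosen to match the text.

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(1)
noise = rng.normal(0.0, 4.0, (256, 256))     # ~4 grey levels of sensor noise

# Simulate the horizontal smear of grey values along pixel rows that is
# typically observed with CCD cameras (a 3-pixel row-wise moving average).
smeared = uniform_filter(noise, size=(1, 3))

# Remove the DC content: deviation of each pixel from its local mean.
# (A 31-pixel square window stands in for the 30-50 pixel windows used.)
residual = smeared - uniform_filter(smeared, size=31)

spectrum = np.abs(np.fft.fftshift(np.fft.fft2(residual)))
c = 128  # zero frequency sits here after fftshift

# Mean amplitude at high horizontal frequencies vs high vertical frequencies.
horiz_hi = spectrum[c-2:c+3, c+64:].mean()   # varying fx, fy near 0
vert_hi  = spectrum[c+64:, c-2:c+3].mean()   # varying fy, fx near 0
print(horiz_hi < vert_hi)  # True: row smear biases energy to low horizontal freqs
```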


Figure 74: Typical noise image (left), and its amplitude spectrum (right). (In both images the values have been scaled to fill the maximum display range. Zero frequency is at the centre of the amplitude spectrum image.)

B.4 Discussion

The concern here is how much this non-uniformity of the noise spectrum affects the noise compensation method used. Figure 75 shows the transfer function of a horizontally oriented, even symmetric, log Gabor filter with a wavelength of 3 pixels. If the filter transfer function is compared with the noise spectrum image one can see that such a filter will underestimate the image noise content in the horizontal direction.

Figure 75: Transfer function of a horizontally oriented, even symmetric, log Gabor filter with a wavelength of 3 pixels. (Zero frequency is at the centre of the image.)

From this one concludes that the assumption that noise has uniform spectral


properties is simplistic, and this results in an underestimation of noise in images from CCD cameras in the horizontal direction. It is hard to quantify the extent to which this degrades the quality of the noise compensation. In practice this effect will not be as significant as one might first think. The noise level that is estimated will be accurate for estimating the influence of noise on the smaller scale filters, and these are the filters that will be most affected by noise. The estimated noise influence on the larger scale filters will be an underestimate, but these filters are least affected by the noise anyway. The area of the frequency plane that the large scale filters respond to is very much smaller than that for small scale filters. Compare Figure 75 with Figure 76, which shows the transfer function for a log Gabor filter having a wavelength of 24 pixels. This wavelength corresponds to the largest scale filters used in calculating the phase congruency maps presented in the portfolio of results. If considered really necessary, there is no reason in principle why one could not modify the noise compensation technique that has been presented to make use of a more accurate (non-uniform) model of the noise spectral characteristics.

Figure 76: Transfer function of a horizontally oriented, even symmetric, log Gabor filter with a wavelength of 24 pixels. (Zero frequency is at the centre of the image.)
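The claim about the relative frequency-plane support of the small and large scale filters can be checked with a short sketch. The radial log Gabor transfer function below follows the form used in the MATLAB code of Appendix D; the grid size and the use of NumPy are assumptions for illustration.

```python
import numpy as np

def log_gabor_radial(size, wavelength, sigma_on_f=0.55):
    """Radial log Gabor transfer function on a size x size frequency grid,
    zero frequency at the centre, following the form used in the MATLAB
    code of Appendix D."""
    f = np.fft.fftshift(np.fft.fftfreq(size))       # cycles/pixel per axis
    fx, fy = np.meshgrid(f, f)
    radius = np.hypot(fx, fy)
    radius[size // 2, size // 2] = 1.0              # avoid log(0) at DC
    g = np.exp(-np.log(radius * wavelength) ** 2 /
               (2.0 * np.log(sigma_on_f) ** 2))     # radius*wavelength = f/f0
    g[size // 2, size // 2] = 0.0                   # force zero DC response
    return g

small = log_gabor_radial(256, 3)    # wavelength 3 pixels (as in Figure 75)
large = log_gabor_radial(256, 24)   # wavelength 24 pixels (as in Figure 76)

# Relative frequency-plane support (sum of squared transfer functions):
print((small ** 2).sum() / (large ** 2).sum())  # much greater than 1
```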

Appendix C Non-maximal suppression


C.1 Introduction

Non-maximal suppression is generally an afterthought that is tacked on to the end of many image processing algorithms. Unfortunately this thesis does not do much better (the subject is being considered in an appendix!). The aims of presenting the material in this appendix are as follows: to highlight the specific problems of non-maximal suppression on phase congruency images, to clearly describe the non-maximal suppression techniques used in the results that were presented, and to comment generally on the weaknesses of existing techniques. Non-maximal suppression is somewhat of an art that sometimes requires excruciating attention to how things behave at a pixel by pixel level. Various techniques have been developed for ensuring corners and junctions are treated correctly, or at least filled in in a plausible manner. Often the non-maximal suppression algorithm and other post-processing operations are designed to make up for the shortcomings of the underlying edge detector at junctions and corners; examples of this sort can be found in Deriche et al. [17], McIvor [57], and Rothwell et al. [81]. No really systematic study of non-maximal suppression seems to have been done in the computer vision literature. Psychophysicists do not seem to have considered the problem in detail either, despite many psychophysical models of perception implicitly relying on some kind of non-maximal suppression process for ultimate feature localization.


C.2 Non-maximal suppression using feature orientation information

Most non-maximal suppression techniques require an orientation image to be used in conjunction with the feature strength image to determine local maxima. This is because non-maximal suppression is generally only defined in terms of one-dimensional signals. (Is the value of a point greater than the values to the left and right of it?) The orientation image allows one to scan across the feature image to extract the appropriate local one-dimensional signal in which to test whether the point of interest is a local maximum or not. For gradient based edge detectors this orientation is supplied by the direction of the local intensity gradient. Implicit in this approach is the assumption that features are smooth contours of low curvature. Feature junctions are problematic because, strictly speaking, no feature orientation can be defined. Indeed, Rothwell et al. [81] show that the direction of intensity gradients can provide very misleading orientation information in the vicinity of junctions. Despite these difficulties this approach to non-maximal suppression is widely used, and it is the approach that was adopted in producing the results presented in this thesis. A problem in using this approach to perform non-maximal suppression on a phase congruency image is that we do not have a direct equivalent of the intensity gradient to provide orientation information. Instead, the filter orientation having the maximum local energy at each point in the image is used. This produces a somewhat quantized orientation image (quantized to the number of filter orientations). More accurate orientation data can be obtained by applying parabolic interpolation to the energy values across the orientation that has maximum energy and its two neighbouring orientations, and then finding the orientation that maximizes energy on the fitted parabola. In practice the change in the final result from using this extra precision is hardly noticeable.
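The parabolic refinement of the orientation estimate can be sketched as follows. This is an illustrative snippet, not the thesis code: a three-point parabola is fitted to the energies at the best orientation and its two neighbours, and the vertex gives the sub-bin offset of the peak.

```python
import numpy as np

def refine_orientation(e_prev, e_max, e_next, d_theta):
    """Three-point parabolic interpolation of the energy peak across
    orientation.  e_max is the largest energy, e_prev and e_next the
    energies at the two neighbouring filter orientations, and d_theta
    the angular spacing between filters.  Returns the offset (radians)
    of the fitted peak from the maximum-energy orientation."""
    denom = e_prev - 2.0 * e_max + e_next
    if denom == 0.0:
        return 0.0
    return 0.5 * (e_prev - e_next) / denom * d_theta

# Energies sampled from a parabola whose true peak lies a third of a bin
# above the centre orientation: the fit recovers the offset exactly.
d_theta = np.pi / 6                  # 6 orientations -> 30 degree spacing
peak = d_theta / 3
energy = lambda t: 1.0 - (t - peak) ** 2
print(refine_orientation(energy(-d_theta), energy(0.0), energy(d_theta), d_theta))
# ~0.1745 = pi/18
```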
An additional problem found in some non-maximal suppression algorithms (e.g. Canny 1983) is that the discrete nature of the pixel grid is allowed to influence the orientation in which the test for local maxima is made. To minimize the influence


of the discrete grid the following process was used: in considering each pixel in the image, two ideal (real valued) pixel locations at a radius of 1 on each side of the pixel in the required direction are calculated. The phase congruency values at these two real valued locations are then estimated from the phase congruency values at their four surrounding integer pixel locations using bilinear interpolation. These interpolated values are then tested against the centre pixel value to decide whether or not it is a local maximum. This process is illustrated in Figure 77.

Figure 77: Estimation of surrounding pixel values in non-maximal suppression. Note: using a radius greater than 1 results in a thickening of the lines in the final edge map. In practice it can be useful to use a radius slightly greater than 1, up to say 1.5, on each side of the pixel being tested. This prevents occasional breaks in diagonal lines on the edge map, while keeping the thickness of most horizontal and vertical lines in the edge map to 1.
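The interpolation scheme described above can be sketched as follows. This is a hedged NumPy illustration, not the thesis implementation: `scan_dir` is assumed to hold the scan direction (across the feature) at each pixel, and the default radius of 1.2 reflects the note about using a radius slightly greater than 1.

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinearly interpolate img at the real-valued location (x, y),
    where x indexes columns and y indexes rows."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * img[y0, x0] + fx * (1 - fy) * img[y0, x0 + 1]
            + (1 - fx) * fy * img[y0 + 1, x0] + fx * fy * img[y0 + 1, x0 + 1])

def nonmax_suppress(img, scan_dir, radius=1.2):
    """Keep a pixel only if it exceeds the bilinearly interpolated values
    at +/- radius along the scan direction (across the feature)."""
    rows, cols = img.shape
    out = np.zeros_like(img)
    for r in range(2, rows - 2):
        for c in range(2, cols - 2):
            dx = radius * np.cos(scan_dir[r, c])
            dy = radius * np.sin(scan_dir[r, c])
            v1 = bilinear(img, c + dx, r + dy)
            v2 = bilinear(img, c - dx, r - dy)
            if img[r, c] > v1 and img[r, c] > v2:
                out[r, c] = img[r, c]
    return out

# A vertical ridge with soft shoulders; the scan direction is horizontal.
img = np.zeros((9, 9))
img[:, 4] = 1.0
img[:, 3] = img[:, 5] = 0.5
thin = nonmax_suppress(img, np.zeros_like(img))
print(np.flatnonzero(thin[4]))  # [4] - only the ridge crest survives
```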

C.3 Orientation from the feature image

It seems rather desirable that one should not need orientation data obtained via the feature detection process to perform non-maximal suppression. Firstly, orientation information near feature junctions and corners can be very unreliable, and secondly, feature orientation information may not be directly available as is the case with phase congruency. This second problem was encountered by Canny [11] in the implementation of


his optimal line detector. One cannot obtain orientation information when using symmetric operators to detect line features. Canny approached this problem by using the feature strength image itself to calculate the best orientation in which to test for local maxima. Canny chooses the direction n in the feature image that maximizes the second derivative of the signal I in the direction n plus a scalar λ times the first derivative of the signal in the direction n; that is, the direction n such that

    d²I/dn² + λ dI/dn   is a maximum.    (43)

This expression works as follows¹: when we are at the top of a ridge in the feature image, the appropriate direction to scan in the image is across the ridge, and this is the direction that maximizes the expression above; the first term dominates because the second derivative will be largest across the top of the ridge, with the first derivative remaining small in all orientations. When we are on the side of a ridge in the feature strength image, the appropriate direction to scan across the image is up and down the ridge; this is achieved because in this situation the second term in the expression above will dominate. The value of λ controls the point at which one term dominates over the other. The second term, and its scaling factor λ, also controls the point at which the end of a ridge feature is considered to occur; this is illustrated in Figure 78. The direction n represents the direction in which the feature strength image is changing most rapidly. A problem with Canny's expression is that it is not dimensionally consistent; we are adding a second derivative of the signal to its first derivative. This makes the setting of the relative weighting term λ very much dependent on image contrast values. We can make the expression dimensionally consistent by multiplying the first term by the distance over the image in which the second derivative is being considered. In a sense, the expression is modified to represent the first two terms of a Taylor expansion (though with the weighting term λ). We then find which of the

¹In Canny's thesis the expression that is actually published is the direction n such that

    d²I/dn⊥² + λ dI/dn⊥   is a maximum,    (44)

where n⊥ denotes the orientation perpendicular to n. I believe this is a mistake.


Figure 78: Determining the direction in which to perform non-maximal suppression using first and second derivative information in the feature strength image. At point 1 the second derivative is a maximum across the ridge. At points 2 and 3 the direction of maximum first derivative specifies the appropriate direction.

two terms dominates in the signal. If we think of the expression in discrete terms, where the derivatives are calculated in terms of finite differences, the expression now becomes the direction n such that

    Δ(ΔI/Δn) + λ ΔI/Δn   is a maximum.    (45)
This approach was implemented, whereby the direction n was determined by simply testing a fixed number of orientations (usually 8) at each point in the feature strength image. Non-maximal suppression would then be performed in this direction. The results, while satisfactory, were not significantly different from the non-maximal suppression achieved using orientation information obtained from the filter directions that had maximum energy. In some situations the output was marginally inferior, and accordingly this approach was not used in the results presented.
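This search can be sketched as follows, under simplifying assumptions (illustrative only): just the four axis-aligned and diagonal directions are sampled so that the ±1 pixel samples fall exactly on the grid (the thesis tested 8 orientations), absolute values stand in for the sign conventions of the derivatives, and λ = 1 is arbitrary.

```python
import numpy as np

def scan_direction(img, r, c, lam=1.0, n_orient=4):
    """Choose the orientation maximizing |delta(dI/dn)| + lam * |dI/dn|,
    a finite-difference form of expression (45).  Differences are
    normalized by the step length h so diagonal and axial samples are
    comparable; lam = 1 and n_orient = 4 are assumptions."""
    best_theta, best_val = 0.0, -np.inf
    for k in range(n_orient):
        theta = k * np.pi / n_orient
        dr = int(round(np.sin(theta)))
        dc = int(round(np.cos(theta)))
        h = np.hypot(dr, dc)                     # step length along n
        a, b, d = img[r - dr, c - dc], img[r, c], img[r + dr, c + dc]
        first = 0.5 * (d - a) / h                # central first difference
        second = (d - 2.0 * b + a) / h ** 2      # change in first difference
        val = abs(second) + lam * abs(first)
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta

# A horizontal ridge along row 3: the correct scan direction is vertical.
img = np.zeros((7, 7))
img[3, :] = 1.0
print(scan_direction(img, 3, 3))  # pi/2, i.e. scan across the ridge
```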

C.4 Morphological approaches

While morphological approaches to non-maximal suppression have not been investigated here they deserve some comment. They have the attraction that no feature orientation information is required. An example of the morphological approach is the watershed algorithm by Vincent [90, 91]. This algorithm skeletonizes the feature image by progressively flooding low valued portions of the image and recording points where adjacent puddles meet. An important property of this technique is


that the homotopy of junctions is preserved. However, a difficulty with the algorithm is that isolated ridges in the feature strength image can be eroded completely from the final result. Pudney and Robins [73] have recently adapted Vincent's algorithm to incorporate a process that identifies anchor points in the skeleton of the image, which prevents isolated features from being eroded away in the skeletonization process.

C.5 Conclusion

Overall the conclusion is that none of these methods is entirely satisfactory. It is probably fair to say that the problem of non-maximal suppression deserves considerably more attention than it has received so far.

Appendix D Implementation details


The calculation of phase congruency in an image is computationally expensive. This section describes some techniques that reduce the computation requirements significantly. A number of other details that are important in the implementation are also covered. Finally, a MATLAB implementation of the phase congruency measure PC_2 is presented to allow duplication of the results presented in this thesis. The main computational load in calculating phase congruency is the need to perform convolutions with quadrature pairs of filters over many scales and orientations. In particular, convolutions with filters of very large spatial extent are required. For these reasons convolution via the FFT is perhaps the only practical approach. As described in Chapter 3, filters are constructed directly in frequency space as polar-separable 2D Gaussians. In the radial direction, along the frequency axis, the filters have a Gaussian cross-section if we are using Gabor filters, or a log Gaussian cross-section if we wish to construct log Gabor filters. To satisfy the wavelet requirement that filters have a constant shape ratio, the ratio between the centre frequency f and the radial standard deviation σ of the Gaussians is kept constant. Successive values of centre frequency are scaled geometrically. In the angular direction, the filters have Gaussian cross-sections, where the ratio between the standard deviation and the angular spacing of the filters is some constant. This ensures a fixed length to width ratio of the filters in the spatial domain. It is very important that the filters have absolutely no DC component. If one

Trademark of The Mathworks, Inc.


is using log Gabor filters (as recommended in Chapter 4) this is not a problem. However, if Gabor filters are used two measures should be taken to ensure a zero DC value: firstly, the ratio f/σ should be kept at 3 or larger, and secondly, after constructing the filter the magnitude of the residual zero frequency component should be checked, and a 2D Gaussian of that amplitude, centred over the zero frequency point, subtracted from the filter. The practical range of filter frequencies that can be constructed is limited. When constructed in the frequency domain, low frequency filters suffer from quantization effects, and high frequency filters suffer from truncation at the Nyquist frequency. In practice, the maximum frequencies that can be used effectively correspond to filters having wavelengths of no less than about 3 or 4 pixels. A major gain in computational efficiency can be made by combining the otherwise two separate convolutions between the image and a quadrature pair of filters into one multiplication in the frequency domain. This is done by adapting the ideas presented by Press et al. ([70], Ch 12) for combining the inverse Fourier transform of two real valued functions into one inverse FFT operation. Spatially, a quadrature pair of filters are real/even-symmetric and real/odd-symmetric. In the frequency domain they are real/even-symmetric and imaginary/odd-symmetric respectively. If the imaginary/odd-symmetric filter is multiplied by i so that it becomes real and odd-symmetric, its inverse transform will become imaginary and odd-symmetric. Exploiting the linear properties of the Fourier transform, we can add the real/even-symmetric and real/odd-symmetric filters together to produce one filter and then multiply this by the Fourier transform of the image to perform the convolution. After taking the inverse transform, the individual results of the two convolutions are easily extracted due to the even and odd-symmetric properties of the filters.
The convolution with the even-symmetric filter will be found in the real component of the result, and the imaginary component will contain the convolution with the odd-symmetric filter. A further significant efficiency can be gained by only performing complex multiplications between the Fourier transform of the image and the filter within, say, 4 standard deviations of the centre frequency of the filter. Note also that in adding the real/even-symmetric and real/odd-symmetric


filters together, half of the filter cancels out, halving the number of complex multiplications to be performed. If one employs these computational measures in a carefully coded C program, the time taken to calculate phase congruency over 4 scales and 6 orientations on a 256 x 256 image is about 4 minutes on a 486DX2 66MHz PC running Linux.
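The folding of the two convolutions into one complex multiplication can be demonstrated in 1D (a NumPy sketch with an assumed Gaussian band-pass filter pair, not the thesis C code). In the frequency domain the even filter E is real and even and the odd filter O is imaginary and odd; their sum E + iO vanishes over half the frequency axis, and a single inverse FFT returns both convolution results in its real and imaginary parts.

```python
import numpy as np

n = 256
x = np.random.default_rng(2).normal(size=n)   # a test signal
F = np.fft.fft(x)
f = np.fft.fftfreq(n)

# A quadrature pair defined in the frequency domain.  The Gaussian
# band-pass shape is an assumed stand-in for the Gabor / log Gabor
# cross-sections used in the thesis.
f0, sigma = 0.1, 0.02
E = np.exp(-(np.abs(f) - f0) ** 2 / (2.0 * sigma ** 2))  # real, even
O = -1j * np.sign(f) * E                                  # imaginary, odd

# Two separate convolutions (both results are real):
even = np.fft.ifft(F * E).real
odd  = np.fft.ifft(F * O).real

# Combined filter: E + iO is zero for all negative frequencies (half the
# filter cancels out), and a single inverse FFT yields both results.
C = E + 1j * O
both = np.fft.ifft(F * C)
print(np.allclose(C[f < 0], 0.0))    # True: half the filter cancels
print(np.allclose(both.real, even))  # True: even result in real part
print(np.allclose(both.imag, odd))   # True: odd result in imaginary part
```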

D.1 MATLAB Implementation

The MATLAB code provided on the following pages is presented in order to provide a formal specification of the calculation process for phase congruency, and to allow duplication of the results that are presented in this thesis. The code calculates the PC_2 measure of phase congruency. With a slight modification the code can be made to calculate the measure of phase symmetry described in Chapter 4; comments in the code describe how to do this. It should be noted that this MATLAB implementation is considerably slower than an optimized C program.


% PHASECONG - function for computing phase congruency on an image
%
% Usage: [phaseCongruency, orientation] = phasecong(image)
%
% This function calculates the PC_2 measure of phase congruency.
% The input image should be square and preferably have a size that is a
% power of 2.
%
% Return values:
%    phaseCongruency  - phase congruency image
%    orientation      - orientation image.
%
% Parameters:
% The convolutions are done via the FFT. Many of the parameters relate to
% the specification of the filters in the frequency plane. The parameters
% are set within the file rather than being specified as arguments because
% there are so many of them and they rarely need to be changed. You may
% want to experiment with editing the values of nscale and noiseCompFactor.
%
% Author: Peter Kovesi   April 1996

function [phaseCongruency, orientation] = phasecong(image)

sze = size(image);
if (sze(1) ~= sze(2)), error('phasecong: image must be square'); end

nscale          = 4;      % Number of wavelet scales.
norient         = 6;      % Number of filter orientations.
minWaveLength   = 3;      % Wavelength of smallest scale filter.
mult            = 2;      % Scaling factor between successive filters.
sigmaOnf        = 0.55;   % Ratio of the standard deviation of the Gaussian
                          % describing the log Gabor filter's transfer
                          % function in the frequency domain to the filter
                          % center frequency.
dThetaOnSigma   = 1.2;    % Ratio of angular interval between filter
                          % orientations and the standard deviation of the
                          % angular Gaussian function used to construct
                          % filters in the freq. plane.
noiseCompFactor = 2.5;    % Scaling factor used to relate the estimated mean
                          % noise amplitude to its maximum amplitude.
cutOff          = 0.4;    % The fractional measure of frequency spread below
                          % which phase congruency values get penalized.
g               = 10;     % Controls the sharpness of the transition in the
                          % sigmoid function used to weight phase congruency
                          % for frequency spread.
epsilon         = 0.001;  % Used to prevent division by zero.

% This code requires arrays of 2D matrices to be stored. This is done by
% concatenating matrices side by side. If we have N matrices of size
% rows x cols the concatenated matrix will be of size rows x (N*cols).
% The function submat is used to extract the ith sub-matrix from one of
% these concatenated matrices.

thetaSigma = pi/norient/dThetaOnSigma;  % Calculate the standard deviation
                                        % of the angular Gaussian function
                                        % used to construct filters in the
                                        % freq. plane.

% The noise compensation factor is scaled to account for the total effect
% over all scales given one unit of noise measured at the smallest scale.
% The assumption is that the noise spectrum is flat, hence the noise
% response of filters at different scales can be determined from their
% bandwidth relative to the bandwidth of the smallest scale filter.
noiseCompFactor = noiseCompFactor * (1 - (1/mult)^nscale) / (1 - 1/mult);


imagefft = fft2(image);        % Fourier transform of image

sze  = size(imagefft);
rows = sze(1);
cols = sze(2);

totalEnergy = zeros(sze);      % Matrix for accumulating weighted phase
                               % congruency values (energy).
totalSumAn  = zeros(sze);      % Matrix for accumulating filter response
                               % amplitude values.
orientation = zeros(sze);      % Matrix storing orientation with greatest
                               % energy for each pixel.

for o = 1:norient,                    % For each orientation.
  disp(['Processing orientation ' num2str(o)]);
  angl = (o-1)*pi/norient;            % Calculate filter angle.
  wavelength = minWaveLength;         % Initialize filter wavelength.
  sumE_ThisOrient   = zeros(sze);     % Initialize accumulator matrices.
  sumO_ThisOrient   = zeros(sze);
  sumAn_ThisOrient  = zeros(sze);
  Energy_ThisOrient = zeros(sze);
  EOArray = [];  % Array of complex convolution images - one for each scale.
  AnArray = [];  % Array of amplitude images - one for each scale.

  for s = 1:nscale,                   % For each scale.
    % Convolve image with even and odd filters - result is stored in EO.
    EO = convloggabor(imagefft, wavelength, angl, sigmaOnf, thetaSigma);
    EOArray = [EOArray, EO];
    An = abs(EO);                     % Amplitude of even & odd filter response.
    AnArray = [AnArray An];
    sumAn_ThisOrient = sumAn_ThisOrient + An;      % Sum of amplitude responses.
    sumE_ThisOrient = sumE_ThisOrient + real(EO);  % Sum of even filter convolution results.
    sumO_ThisOrient = sumO_ThisOrient + imag(EO);  % Sum of odd filter convolution results.
    wavelength = wavelength * mult;                % Wavelength of next filter.
  end

  % Get weighted mean filter response vector, this gives the weighted mean
  % phase angle.
  MeanE = sumE_ThisOrient ./ (sumAn_ThisOrient + epsilon);
  MeanO = sumO_ThisOrient ./ (sumAn_ThisOrient + epsilon);

  % Now calculate An(cos(phase_deviation) - |sin(phase_deviation)|) by
  % using dot and cross products between the weighted mean filter response
  % vector and the individual filter response vectors at each scale. This
  % quantity is phase congruency multiplied by An (energy).
  for s = 1:nscale,
    Energy_ThisOrient = Energy_ThisOrient ...
      + real(submat(EOArray,s,cols)).*MeanE + imag(submat(EOArray,s,cols)).*MeanO ...
      - abs(real(submat(EOArray,s,cols)).*MeanO - imag(submat(EOArray,s,cols)).*MeanE);
  end

  % Note: To calculate the phase symmetry measure replace the for loop
  % above with the following loop. (The calculation of MeanE, MeanO,
  % sumE_ThisOrient and sumO_ThisOrient can also be omitted.) It is
  % suggested that the value of nscale is increased to 5 for a 256x256
  % image, and that cutOff is set to 0 to eliminate weighting for
  % frequency spread.
  %
  % for s = 1:nscale,
  %   Energy_ThisOrient = Energy_ThisOrient ...
  %     + abs(real(submat(EOArray,s,cols))) - abs(imag(submat(EOArray,s,cols)));
  % end

  % Get mean amplitude for smallest scale at this orientation - this is
  % used for estimating the influence of noise.
  Ao_mean = exp(mean(mean(log(abs(submat(EOArray,1,cols))))));
  estNoiseEffect = Ao_mean * noiseCompFactor;  % Scale by noise comp factor.

  % Subtract estimated noise influence on energy at this orientation.
  % Note, energy is kept >= 0.


  Energy_ThisOrient = max(Energy_ThisOrient - estNoiseEffect, zeros(sze));

  % Form weighting that penalizes frequency distributions that are
  % particularly narrow.
  % First get maximum filter amplitude response at each point in the image.
  maxAn = submat(AnArray,1,cols);
  for s = 2:nscale,
    maxAn = max(maxAn, submat(AnArray,s,cols));
  end

  % Then calculate fractional width of the frequencies present by taking
  % the sum of the filter response amplitudes and dividing by the maximum
  % amplitude at each point on the image.
  width = sumAn_ThisOrient ./ (maxAn + epsilon) / nscale;

  % Now calculate the sigmoidal weighting function...
  weight = ones(sze) ./ (1 + exp((cutOff - width)*g));

  % ...and weight energy by this frequency spread weighting function.
  Energy_ThisOrient = Energy_ThisOrient .* weight;

  % Update accumulator matrices for energy and sumAn.
  totalEnergy = totalEnergy + Energy_ThisOrient;
  totalSumAn  = totalSumAn + sumAn_ThisOrient;

  % Update orientation matrix by finding image points where the energy in
  % this orientation is greater than in any previous orientation (the
  % change matrix) and then replacing these elements in the orientation
  % matrix with the current orientation number.
  if (o == 1),
    maxEnergy = Energy_ThisOrient;
  else
    change = Energy_ThisOrient > maxEnergy;
    orientation = (o - 1).*change + orientation.*(~change);
    maxEnergy = max(maxEnergy, Energy_ThisOrient);
  end

end  % For each orientation.

% Normalize totalEnergy by totalSumAn to obtain phase congruency.
phaseCongruency = totalEnergy ./ (totalSumAn + epsilon);

% Convert orientation matrix values to radians.
orientation = orientation * (pi / norient);

end  % End of phasecong function.

% SUBMAT
%
% Function to extract the ith sub-matrix, cols wide, from a large matrix
% composed of several matrices. The large matrix is used in lieu of an
% array of matrices (arrays of matrices are currently not supported by
% MATLAB).

function a = submat(big, i, cols)
a = big(:, ((i-1)*cols+1):(i*cols));


% CONVLOGGABOR
%
% Usage: EO = convloggabor(imagefft, wavelength, angl, sigmaOnf, thetaSigma);
%
% This function performs convolution via the fft on an image with even and
% odd log Gabor filters. This code is designed to perform the two
% convolutions simultaneously. The output EO is a complex valued matrix.
% The real part of EO contains the result of the even convolution. The
% imaginary part of EO contains the result of the odd convolution.
%
% A polar-separable filter is constructed. It has a log Gabor transfer
% function in the radial direction and a Gaussian transfer function in the
% angular direction.
%
% Arguments:
%    imagefft   - Fourier transform of the image (zero frequency *not*
%                 shifted to the centre).
%    wavelength - Wavelength (in pixels) of filter to be applied.
%    angl       - Orientation in radians of filter to be applied.
%    sigmaOnf   - Ratio of the standard deviation of the Gaussian
%                 describing the log Gabor filter's transfer function in
%                 the frequency domain to the filter center frequency.
%    thetaSigma - Ratio of angular interval between filter orientations
%                 and the standard deviation of the angular Gaussian
%                 function used to construct filters in the freq. plane.
%
% The image must be square and preferably have a size that is a power of 2.
%
% Author: Peter Kovesi   April 1996

function EO = convloggabor(imagefft, wavelength, angl, sigmaOnf, thetaSigma)

cols = size(imagefft,1);
rows = size(imagefft,2);
if (cols ~= rows), error('convloggabor: image must be square'); end

% First we have to construct the log Gabor filter in the frequency plane.
% For ease of understanding the filter is constructed with zero frequency
% placed at the centre of the frequency plane. The filter is then quadrant
% swapped prior to multiplication with the Fourier transform of the image
% and subsequent inverse transformation.

fo  = 1.0/wavelength;      % Centre frequency of filter.
rfo = fo/0.5 * (cols/2);   % Radius from centre of frequency plane
                           % corresponding to fo.

% Create two matrices, x and y. All elements of x have a value equal to
% their x coordinate relative to the centre, elements of y have values
% equal to their y coordinate relative to the centre.
x = ones(rows,1) * (-cols/2 : (cols/2 - 1));
y = (-rows/2 : (rows/2 - 1))' * ones(1,cols);

radius = sqrt(x.^2 + y.^2);  % Matrix values contain radius from centre.
theta  = atan2(y,x);         % Matrix values contain polar angle.

% For each point in the filter matrix calculate the angular distance from
% the specified filter orientation. To overcome the angular wrap-around
% problem sine difference and cosine difference values are first computed
% and then the atan2 function is used to determine angular distance.
ds = sin(theta) * cos(angl) - cos(theta) * sin(angl);  % Difference in sine.
dc = cos(theta) * cos(angl) + sin(theta) * sin(angl);  % Difference in cosine.
dtheta = abs(atan2(ds,dc));                            % Absolute angular distance.

radius(rows/2+1, cols/2+1) = 1;  % Get rid of the 0 radius value in the
                                 % middle so that taking the log of the
                                 % radius will not cause trouble.


% Calculate the radial filter component.
logGabor = exp((-(log(radius/rfo)).^2) / (2 * log(sigmaOnf)^2));
logGabor(rows/2+1, cols/2+1) = 0;  % Set the value at the center of the
                                   % filter back to zero.

% Calculate the angular filter component.
spread = exp((-dtheta.^2) / (2 * thetaSigma^2));

filter = logGabor .* spread;   % Multiply the two to get the filter.
filter = fftshift(filter);     % Swap quadrants to move zero frequency
                               % to the corners.

EOfft = imagefft .* filter;    % Do the convolution.
EO    = ifft2(EOfft);          % Back transform.

end
