INTRODUCTORY TECHNIQUES for 3-D COMPUTER VISION

Emanuele Trucco
Alessandro Verri

"nunc agere incipiam tibi, quod vementer ad has res attinet, esse ea quae rerum simulacra vocamus."
(You shall now see me begin to deal with what is of high importance to the subject, and to show that there exist what we call images of things.)
Lucretius, De Rerum Natura, 4.24-44

Contents

Foreword
Preface: About this Book

1 Introduction
  1.1 What is Computer Vision?
  1.2 The Many Faces of Computer Vision
    1.2.1 Related Disciplines
    1.2.2 Research and Application Areas
  1.3 Exploring the Computer Vision World
    1.3.1 Conferences, Journals, and Books
    1.3.2 Internet
    1.3.3 Some Hints on Math Software
  1.4 The Road Ahead

2 Digital Snapshots
  2.1 Introduction
  2.2 Intensity Images
    2.2.1 Main Concepts
    2.2.2 Basic Optics
    2.2.3 Basic Radiometry
    2.2.4 Geometric Image Formation
  2.3 Acquiring Digital Images
    2.3.1 Basic Facts
    2.3.2 Spatial Sampling
    2.3.3 Acquisition Noise and How to Estimate It
  2.4 Camera Parameters
    2.4.1 Definitions
    2.4.2 Extrinsic Parameters
    2.4.3 Intrinsic Parameters
    2.4.4 Camera Models Revisited
  2.5 Range Data and Range Sensors
    2.5.1 Representing Range Images
    2.5.2 Range Sensors
    2.5.3 Active Triangulation
    2.5.4 A Simple Sensor
  2.6 Summary
  2.7 Further Readings
  2.8 Review

3 Dealing with Image Noise
  3.1 Image Noise
    3.1.1 Gaussian Noise
    3.1.2 Impulsive Noise
  3.2 Noise Filtering
    3.2.1 Smoothing by Averaging
    3.2.2 Gaussian Smoothing
    3.2.3 Are our Samples Really Gaussian?
    3.2.4 Nonlinear Filtering
  3.3 Summary
  3.4 Further Readings
  3.5 Review

4 Image Features
  4.1 What Are Image Features?
  4.2 Edge Detection
    4.2.1 Basics
    4.2.2 The Canny Edge Detector
    4.2.3 Other Edge Detectors
    4.2.4 Concluding Remarks on Edge Detection
  4.3 Point Features: Corners
  4.4 Surface Extraction from Range Images
    4.4.1 Defining Shape Classes
    4.4.2 Estimating Local Shape
  4.5 Summary
  4.6 Further Readings
  4.7 Review

5 More Image Features
  5.1 Introduction: Line and Curve Detection
  5.2 The Hough Transform
    5.2.1 The Hough Transform for Lines
    5.2.2 The Hough Transform for Curves
    5.2.3 Concluding Remarks on Hough Transforms
  5.3 Fitting Ellipses to Image Data
    5.3.1 Euclidean Distance Fit
    5.3.2 Algebraic Distance Fit
    5.3.3 Robust Fitting
    5.3.4 Concluding Remarks on Ellipse Fitting
  5.4 Deformable Contours
    5.4.1 The Energy Functional
    5.4.2 The Elements of the Energy Functional
    5.4.3 A Greedy Algorithm
  5.5 Line Grouping
  5.6 Summary
  5.7 Further Readings
  5.8 Review

6 Camera Calibration
  6.1 Introduction
  6.2 Direct Parameter Calibration
    6.2.1 Basic Equations
    6.2.2 Focal Length, Aspect Ratio, and Extrinsic Parameters
    6.2.3 Estimating the Image Center
  6.3 Camera Parameters from the Projection Matrix
    6.3.1 Estimation of the Projection Matrix
    6.3.2 Computing Camera Parameters
  6.4 Concluding Remarks
  6.5 Summary
  6.6 Further Readings
  6.7 Review

7 Stereopsis
  7.1 Introduction
    7.1.1 The Two Problems of Stereo
    7.1.2 A Simple Stereo System
    7.1.3 The Parameters of a Stereo System
  7.2 The Correspondence Problem
    7.2.1 Basics
    7.2.2 Correlation-Based Methods
    7.2.3 Feature-based Methods
    7.2.4 Concluding Remarks
  7.3 Epipolar Geometry
    7.3.1 Notation
    7.3.2 Basics
    7.3.3 The Essential Matrix, E
    7.3.4 The Fundamental Matrix, F
    7.3.5 Computing E and F: The Eight-point Algorithm
    7.3.6 Locating the Epipoles from E and F
    7.3.7 Rectification
  7.4 3-D Reconstruction
    7.4.1 Reconstruction by Triangulation
    7.4.2 Reconstruction up to a Scale Factor
    7.4.3 Reconstruction up to a Projective Transformation
  7.5 Summary
  7.6 Further Readings
  7.7 Review

8 Motion
  8.1 Introduction
    8.1.1 The Importance of Visual Motion
    8.1.2 The Problems of Motion Analysis
  8.2 The Motion Field of Rigid Objects
    8.2.1 Basics
    8.2.2 Special Case 1: Pure Translation
    8.2.3 Special Case 2: Moving Plane
    8.2.4 Motion Parallax
    8.2.5 The Instantaneous Epipole
  8.3 The Notion of Optical Flow
    8.3.1 The Image Brightness Constancy Equation
    8.3.2 The Aperture Problem
    8.3.3 The Validity of the Constancy Equation: Optical Flow
  8.4 Estimating the Motion Field
    8.4.1 Differential Techniques
    8.4.2 Feature-based Techniques
  8.5 Using the Motion Field
    8.5.1 3-D Motion and Structure from a Sparse Motion Field
    8.5.2 3-D Motion and Structure from a Dense Motion Field
  8.6 Motion-based Segmentation
  8.7 Summary
  8.8 Further Readings
  8.9 Review

9 Shape from Single-image Cues
  9.1 Introduction
  9.2 Shape from Shading
    9.2.1 The Reflectance Map
    9.2.2 The Fundamental Equation
  9.3 Finding Albedo and Illuminant Direction
    9.3.1 Some Necessary Assumptions
    9.3.2 A Simple Method for Lambertian Surfaces
  9.4 A Variational Method for Shape from Shading
    9.4.1 The Functional to be Minimized
    9.4.2 The Euler-Lagrange Equations
    9.4.3 From the Continuous to the Discrete Case
    9.4.4 The Algorithm
    9.4.5 Enforcing Integrability
    9.4.6 Some Necessary Details
  9.5 Shape from Texture
    9.5.1 What is Texture?
    9.5.2 Using Texture to Infer Shape: Fundamentals
    9.5.3 Surface Orientation from Statistical Texture
    9.5.4 Concluding Remarks
  9.6 Summary
  9.7 Further Readings
  9.8 Review

10 Recognition
  10.1 What Does it Mean to Recognize?
  10.2 Interpretation Trees
    10.2.1 An Example
    10.2.2 Wild Cards and Spurious Features
    10.2.3 A Feasible Algorithm
  10.3 Invariants
    10.3.1 Introduction
    10.3.2 Definitions
    10.3.3 Invariant-Based Recognition Algorithms
  10.4 Appearance-Based Identification
    10.4.1 Images or Features?
    10.4.2 Image Eigenspaces
  10.5 Concluding Remarks on Object Identification
  10.6 3-D Object Modelling
    10.6.1 Feature-based and Appearance-based Models
    10.6.2 Object Versus Viewer-centered Representations
    10.6.3 Concluding Remarks
  10.7 Summary
  10.8 Further Readings
  10.9 Review

11 Locating Objects in Space
  11.1 Introduction
  11.2 Matching from Intensity Data
    11.2.1 3-D Location from a Perspective Image
    11.2.2 3-D Location from a Weak-perspective Image
    11.2.3 Pose from Ellipses
    11.2.4 Concluding Remarks
  11.3 Matching from Range Data
    11.3.1 Estimating Translation First
    11.3.2 Estimating Rotation First
    11.3.3 Concluding Remarks
  11.4 Summary
  11.5 Further Readings
  11.6 Review

A Appendix
  A.1 Experiments: Good Practice Hints
  A.2 Numerical Differentiation
  A.3 The Sampling Theorem
  A.4 Projective Geometry
  A.5 Differential Geometry
  A.6 Singular Value Decomposition
  A.7 Robust Estimators and Model Fitting
  A.8 Kalman Filtering
  A.9 Three-dimensional Rotations

Index

Foreword

Until recently, computer vision was regarded as a field of research still in its infancy, not yet mature and stable enough to be considered part of a standard curriculum in computer science. As a consequence, most books on computer vision became obsolete as soon as they were published. No book thus far has ever managed to provide a comprehensive overview of the field, since even the good ones focus on a narrow subarea, typically the author's research endeavor. With Trucco and Verri, the situation has finally changed. Their book promises to be the first true textbook of computer vision, the first to show that computer vision is now a mature discipline with solid foundations.
Among connoisseurs, the authors are well known as careful and critical experts in the field. (I am proud to have figured in the career of one of them: Alessandro Verri worked with me at MIT for a short year, and it was a joy to work with him.)

Over the years I have been asked many times by new graduate students or colleagues what to read in order to learn about computer vision. Until now, my answer was that I could not recommend any single book. As a substitute, I would suggest an ever-changing list of existing books together with a small collection of specific papers. From now on, however, my answer is clear: Introductory Techniques for 3-D Computer Vision is the text to read.

I personally believe that Introductory Techniques for 3-D Computer Vision will be the standard textbook for graduate and undergraduate courses on computer vision in years to come. It is an almost perfect combination of theory and practice. It provides a complete introduction to computer vision, effectively giving the basic background for practitioners and future researchers in the field.

Trucco and Verri have written a textbook that is exemplary in its clarity of exposition and in its intentions. Despite the initial warning ("Fra il dire e il fare c'è di mezzo il mare"), the objectives stated in the preface are indeed achieved. The book not only places a correctly balanced emphasis on theory and practice but also provides needed material about typically neglected but important topics such as measurements, calibration, SVD, robust estimation, and numerical differentiation.

Computer vision is just now maturing from an almost esoteric corner of research to a key discipline in computer science. In the last couple of years, the first billion-dollar computer vision companies have emerged, a phenomenon no doubt facilitated by the irrational exuberance of the stock market.
We will undoubtedly see many more commercial applications of computer vision in the near future, ranging from industrial inspection and measurement to security, database search, surveillance, multimedia, and computer interfaces. This is a transition that other fields in engineering, such as signal processing and computer graphics, underwent long ago. Trucco and Verri's timely book is the first to represent the discipline of computer vision in its new, mature state, as the industries and applications of computer vision grow and mature as well. As it reaches adulthood, computer vision is still far from being a solved problem. The most exciting developments, discoveries, and applications lie ahead of us. Though a similar statement can be made about most areas of computer science, it is true for computer vision in a much deeper sense than, say, for databases or graphics. After all, understanding the principles of vision has implications far beyond engineering, since visual perception is one of the key modules of human intelligence. Ultimately, understanding the problem of vision is likely to help us understand the brain. For this reason, I am sure that a long and successful series of new editions will follow this book, with updates most likely to come in the chapters dedicated to object recognition and in new hot topics such as adaptation and learning.

Introductory Techniques for 3-D Computer Vision is much more than a good textbook: it is the first book to mark the coming of age of our own discipline, computer vision.

Tomaso Poggio
Cambridge, MA
Brain Sciences Department and Artificial Intelligence Laboratory
Massachusetts Institute of Technology

Preface: About this Book

Here, take this book and peruse it well.
Christopher Marlowe, Doctor Faustus

Fra il dire e il fare c'è di mezzo il mare.
("Between saying and doing there is the sea.") — Italian proverb

What this Book is and is Not

This book is meant to be:

• an applied introduction to the problems and solutions of modern computer vision;
• a practical textbook, teaching how to develop and implement algorithms for representative problems;
• a structured, easy-to-follow textbook, in which each chapter concentrates on a specific problem and solves it building on previous results, and all chapters form a logical progression;
• a collection of selected, well-tested methods (theory and algorithms), aiming to balance difficulty and applicability;
• a starting point to understand and investigate the literature of computer vision, including conferences, journals, and Internet sites;
• a self-teaching tool for research students, academics, and professional scientists.

This book is not meant to be:

• an all-embracing book on computer vision and image processing;
• a book reporting research results that only specialists can appreciate: it is meant for teaching;
• an exhaustive or historical review of methods and algorithms proposed for each problem.

The choice of topics has been guided by our feeling as practitioners. There is no implication whatsoever that what is left out is unimportant. A selection has been imposed by space limits and the intention of explaining both theory and algorithms to the level of detail necessary to make implementation really possible.

What are the Objectives of this Book?

• To introduce the fundamental problems of computer vision.
• To enable the reader to implement solutions for reasonably complex problems.
• To develop two parallel tracks, showing how fundamental problems are solved using both intensity and range images, the two most popular types of images in today's computer vision community.
• To enable the reader to make sense of the literature of computer vision.

What is the Reader Expected to Know?
This book has been written for people interested in programming solutions to computer vision problems. The best way of reading it is to try out the algorithms on a computer. We assume that the reader is able to translate our pseudocode into computer programs, and therefore that he or she is familiar with a language suitable for numerical computations (for instance C or Fortran). We also expect that the reader has access to popular numerical libraries like the Numerical Recipes or Meschach, or to high-level languages for developing numerical software like MATLAB, Mathematica, or Scilab.³ The whole book is non-language specific: we have endeavored to present all the necessary vision-specific information, so that the reader only needs some competence in a programming language.

Although some of the mathematics may appear complex at first glance, the whole book revolves around basic calculus, linear algebra (including least squares, eigenvectors, and singular value decomposition), and the fundamentals of analytic and projective geometry.

³ For information on these and the other packages mentioned here, see Chapter 1.

Who can Benefit from this Book?

• Students of university courses on computer vision, typically final-year undergraduates or postgraduates of degrees like Computer Science, Engineering, Mathematics, and Physics. Most of the knowledge required to read this book should be part of their normal background.
• Researchers looking for a modern presentation of computer vision, as well as a collection of practical algorithms covering the main problems of the discipline.
• Teachers and students of professional training courses.
• Industry scientists and academics interested in learning the fundamentals and the practical aspects of computer vision.

How is this Book Organized?

Each chapter is opened by a summary of its contents, and concluded by a self-check list, review questions, a concise guide to further readings, as well as exercises and suggestions for computer projects. For each problem analyzed, we give:

1. a problem statement, defining the objective to be achieved;
2. a theoretical treatment of the problem;
3. one or two algorithms in pseudocode;
4. hints on the practical applicability of the algorithms.

A few mathematical concepts are crucial to the understanding of solutions and algorithms, but not necessarily known to everybody. To make the book reasonably self-contained, we have included an appendix with several brief sections reviewing background topics. We tried to gear the appendix to the level of detail needed to understand the discussions of the main text, in an attempt to avoid just a mere list of vague reminders. We made an effort to keep the tone informal throughout, hopefully without relaxing too much the mathematical rigor.

The pages have been designed to facilitate quick identification of important material. Problem statements, important definitions, and algorithms are enclosed in frames; hints and comments of practical relevance, including coding suggestions, appear in a different point size and are highlighted by a pointer.

Finally, we have included in Chapter 1 information on the computer vision community, including pointers to Internet vision sites (software, images, and documents) and a list of the main publications, electronic newsletters, and conferences.

Suggestions for Instructors

The material in this text should be enough for two semesters at the senior undergraduate level, assuming three hours per week. Ultimately, this depends on the students' background, the desired level of detail, the choice of topics, and how much time is allocated to project work. Instructors may want to review some of the material in the appendix in the first few lectures of the course. In case only one semester is available, we suggest two selections of topics:

• Stereo and Motion: Chapters 1 to 6 (image acquisition, noise attenuation, feature extraction, and calibration), then Chapters 7 (stereopsis) and 8 (motion analysis).
• Object Recognition: Chapters 1 to 6, then Chapters 10 (object recognition) and 11 (object location).
Ideally, the students should be assigned projects to implement and test at least some of the algorithms. It is up to the instructor to decide which ones, depending on how the course is structured, what existing software is available to students, and which parts of the book one wants to cover.

So Why Another Book on Computer Vision?

We like to think of this textbook, first and foremost, as a practical guide to the solutions of problems characteristic of today's computer vision community. As this book is meant for both students and practitioners, we have tried to give a reasonably complete theoretical treatment of each problem while emphasizing practical solutions. We have tried to state algorithms as clearly as possible, and to lay out the material in a graphically appealing manner, in a logical progression.

It seems to us that there is a shortage of such textbooks on computer vision. There are books surveying large numbers of topics and techniques, often large and expensive, sometimes vague in many places because of the amount of material included; books very detailed on theory, but lacking on algorithms and practical advice; books meant for the specialist, reporting advanced results in specific research or application areas, but of little use to students; and books which are nearly completely out of date. Moreover, and not infrequently in computer vision, the style and contents of research articles make it difficult (sometimes close to impossible) to reimplement the algorithms reported. When working on such articles for this book, we have tried to explain the theory in what seemed to us a more understandable manner, and to add the details necessary for implementation. Of course, we take full and sole responsibility for our interpretation.

We hope our book fills a gap, and satisfies a real demand. Whether or not we have succeeded is for you, the reader, to decide, and we would be delighted to hear your comments.
Above all, we hope you enjoy reading this book and find it useful.

Acknowledgments

We are indebted to a number of people who contributed in various ways to the making of this book. We thank Dave Braunegg, Bob Fisher, Andrea Fusiello, Massimiliano Pontil, Claudio Uras, and Larry Wolff for their precious comments, which allowed us to remove several flaws from preliminary drafts. Thanks also to Massimiliano Aonzo, Adele Lorusso, Jean-François Lots, Alessandro Migliorini, Adriano Pascoletti, Piero Parodi, and Maurizio Pilu for their careful proofreading.

Many people kindly contributed various material which has been incorporated in the book; in the hope of mentioning them all, we want to thank Tiziana Acardi, Bill Austin, Brian Calder, Stuart Clarke, Bob Fisher, Andrea Fusiello, Christian Fruhling, Alois Goller, Dave Lane, Gerald McGunigle, Stephen McKenna, Alessandro Migliorini, Majid Mirmehdi, David Murray, Francesca Odone, Maurizio Pilu, Costas Plakas, Joseba Tena Ruiz, John Selkirk, Marco Straforini, Manickam Umasuthan, and Andy Wallace.

Thanks to Marco Campani, Marco Cappello, Bruno Caprile, Enrico De Micheli, Andrea Fusiello, Federico Girosi, Francesco Isgrò, Greg Michaelson, Pasquale Ottonello, and Vito Roberto for many useful discussions.

Our thanks to Chris Glennie and Jackie Harbor of Prentice-Hall UK, the former for taking us through the early stages of this adventure, the latter for following up with remarkably light-hearted patience the development of this book, which was peppered by our consistent infringing of deadlines. Thanks to Irwin Zucker and Tom Robbins of Prentice Hall in the U.S. for taking the book through its very final stages.

Finally, very special thanks to Clare, Daniela, Emanuele, Emily, Francesca, and Lorenzo, who put up with two absent fathers and husbands for many a month, for their support and love. Fortunately for us, maybe unfortunately for them, we are back.

Emanuele Trucco
Dept. of Computing and Electrical Engineering
Heriot-Watt University
Riccarton, Edinburgh, UK

Alessandro Verri*
Dip. di Informatica e Scienze dell'Informazione
Università di Genova
Via Dodecaneso 35
16146 Genova, Italy

* This book was written while the author was with the Department of Physics of the University of Genova.

1 Introduction

"Ready when you are."
Big Trouble in Little China

1.1 What is Computer Vision?

This is the first, inescapable question of this book. Since it is very difficult to produce an uncontroversial definition of such a multifaceted discipline as computer vision, let us ask more precise questions: Which problems are we attempting to tackle? And how do we plan to solve them? Answering these questions will limit and define the scope of this book, and, in doing so, motivate our definition of computer vision.

The Problems of Computer Vision. The target problem of this book is computing properties of the 3-D world from one or more digital images. The properties that interest us are mainly geometric (for instance, shape and position of solid objects) and dynamic (for instance, object velocities). Most of the solutions we present assume that a considerable amount of image processing has already taken place; that is, new images have been computed from the original ones, or some image parts have been identified, to make explicit the information necessary to the target computation.

The Tools of Computer Vision. As the name suggests, computer vision involves computers interpreting images. Therefore, the tools needed by a computer vision system include hardware for acquiring and storing digital images in a computer, processing the images, and communicating results to users or other automated systems. This is a book about the algorithms of computer vision: it contains very little material about hardware, but hopefully enough to realize where digital images come from. This does not mean that algorithms and software are the only important aspect of a vision system. On the contrary, in some applications, one can choose the hardware and can engineer the scene to facilitate the task of the vision system: for instance, by controlling the illumination, using high-resolution cameras, or constraining the pose and location of the objects. In many situations, however, one has little or no control over the scene. For instance, in the case of outdoor surveillance or autonomous navigation in unknown environments, appropriate algorithms are the key to success.

We are now ready to define the scope of computer vision targeted by this book: a set of computational techniques aimed at estimating or making explicit the geometric and dynamic properties of the 3-D world from digital images.

1.2 The Many Faces of Computer Vision

An exhaustive list of all the topics covered by the term "computer vision" is difficult to collate, because the field is vast, multidisciplinary, and in continuous expansion: new, exciting applications appear all the time. So there is more to computer vision than this book can cover, and we complement our definition in the previous section with a quick overview of the main research and application areas, and some related disciplines.

1.2.1 Related Disciplines

Computer vision has been evolving as a multidisciplinary subject for about thirty years. Its contours blend into those of artificial intelligence, robotics, signal processing, pattern recognition, control theory, psychology, neuroscience, and other fields. Two consequences of the rapid growth and young age of the field of computer vision have been that:

• the objectives, tools, and people of the computer vision community overlap those of several other disciplines;
• the definition and scope of computer vision are still matters of discussion, so that all definitions should be taken with a grain of salt.

You are likely to come across terms like image analysis, scene analysis, and image understanding, which in this book we simply regard as synonyms for computer vision. Some other terms, however, denote disciplines closely related but not identical to computer vision. Here are the principal ones:

Image Processing. Image processing is a vast research area. For our purposes, it differs from computer vision in that it concerns image properties and image-to-image transformations, whereas the main target of computer vision is the 3-D world. As most computer vision algorithms require some preliminary image processing, the overlap between the two disciplines is significant. Examples of image processing include enhancement (computing an image of better quality than the original one), compression (devising compact representations for digital images, typically for transmission purposes), restoration (eliminating the effects of known degradations), and feature extraction (locating special image elements like contours, or textured areas). A practical way to understand the difference between representative problems of image processing and computer vision is to compare the contents of Chapters 3, 4, and 5 with those of Chapters 6 to 11.

Pattern Recognition. For a long time, pattern recognition has produced techniques for recognizing and classifying objects using digital images. Many methods developed in the past worked well with 2-D objects, or with 3-D objects presented in constrained poses, but were unsuitable for the general 3-D world. This triggered much of the research which led to today's field of computer vision.
This book does not cover classic pattern recognition, although some of its methods creep up here and there. The International Association for Pattern Recognition (IAPR) gathers many researchers and users interested in the field, and maintains a comprehensive WWW site (http://peipa.essex.ac.uk/iapr/).

Photogrammetry. Photogrammetry is concerned with obtaining reliable and accurate measurements from noncontact imaging. This discipline overlaps less with computer vision than image processing and pattern recognition do. The main differences are that photogrammetry pursues higher levels of accuracy than computer vision, and not all of computer vision is related to measuring. Taking a look at photogrammetric methods before designing a vision system carrying out measurements is always a good idea. The International Society of Photogrammetry and Remote Sensing is the international organization promoting the advancement of photogrammetry. It maintains a very comprehensive Internet site (http://www.p.igp.ethz.ch/isprs/isprs.html) including archives and activities, and publishes the Journal of Photogrammetry and Remote Sensing.

1.2.2 Research and Application Areas

For the purposes of this section, research areas refer to topics addressed by a significant number of computer vision publications (a visible indicator of research), and application areas refer to domains in which computer vision methods are used, possibly in conjunction with other technologies, to solve real-world problems. The following lists and the accompanying figures should give you the flavor of the variety and scope of computer vision; further applications are illustrated in the book. The lists are meant to be suggestive, not exhaustive; most of the terms that may be unclear now will be explained later in the book.
Examples of Research Areas

Image feature detection
Contour representation
Feature-based segmentation
Range image analysis
Shape modelling and representation
Shape reconstruction from single-image cues (shape from X)
Stereo vision
Motion analysis
Color vision
Active and purposive vision
Invariants
Uncalibrated and self-calibrating systems
Object detection
3-D object recognition
3-D object location
High-performance and real-time architectures

Examples of Application Areas

Industrial inspection and quality control
Reverse engineering
Surveillance and security
Face recognition
Gesture recognition
Road monitoring
Autonomous vehicles (land, underwater, space vehicles)
Hand-eye robotics systems
Space applications
Military applications
Medical image analysis (e.g., MRI, CT, X-rays, and sonar scans)
Image databases
Virtual reality, telepresence, and telerobotics

Figure 1.1 A prototype of 3-D inspection cell. The cell includes two types of depth sensors, a laser scanner and a Moiré fringe system (see Chapter 2), which locate the object in space and perform measurements. Notice the turntable for optimal, automatic object positioning.

1.3 Exploring the Computer Vision World

This section provides a starting set of pointers to the multifaceted world of computer vision. In all the following lists, items appear in no particular order.

1.3.1 Conferences, Journals, and Books

Conferences. The following international conferences cover the most significant advancements on the topics central to this book. Printed proceedings are available for all conferences, and details appear regularly on the Internet.

International Conference on Computer Vision (ICCV)
International Conference on Computer Vision and Pattern Recognition (CVPR)
European Conference on Computer Vision (ECCV)
International Conference on Image Processing (ICIP)
International Conference on Pattern Recognition (ICPR)

Several national conferences and international workshops are organized on an annual or biennial basis. A complete list would be too long, so none of these are mentioned, for fairness.

Journals. The following technical journals cover the most significant advancements in the field. They can be found in the libraries of any university hosting research on computer vision or image processing.

International Journal of Computer Vision
IEEE Transactions on Pattern Analysis and Machine Intelligence
Computer Vision and Image Understanding
Machine Vision and its Applications
Image and Vision Computing Journal
Journal of the Optical Society of America A
Pattern Recognition
Pattern Recognition Letters
IEEE Transactions on Image Processing
IEEE Transactions on Systems, Man and Cybernetics
IEE Proceedings: Vision, Image and Signal Processing
Biological Cybernetics
Neural Computation
Artificial Intelligence

Figure 1.2 Left: automatic recognition of road bridges in aerial infrared images (courtesy of Majid Mirmehdi, University of Surrey; Crown copyright, reproduced with the permission of the Controller of Her Majesty's Stationery Office). Right: an example of automatic face detection, particularly important for surveillance and security systems. The face regions detected can be subsequently compared with a database of faces for identification (courtesy of Stephen McKenna, Queen Mary and Westfield College, London).

Books. So many books on computer vision and related fields have been published that it seems futile to produce long lists unless each entry is accompanied by a comment. Since including a complete, commented list here would take too much space, we leave the task of introducing books in specific, technical contexts to the following chapters.
13.2 Internet As the Internet undergoes continuous, ebullient transformation, this information is likely o age faster than the rest of this book, and we can only guarantee that the section 12 Exploring the Computer Vsion World 7 Steaks and avtonomous road navigation: some images from a sequence nthe estimated motion Bed (optical low, cscs in Chapter 8) rapa ndzating the sativa motion of wel and came Figure 12 Computer vsion scguired fom a moving 2 ‘Somputed by motion rays list below is correct a the time of printing, Further Internet sites related to specitic problems ae given in the relevant chapters of this book. cau. edu/-ch1/vision chive home page, nevp:// demos, archives, research + The Computer Vision Home Page, itep:// haem} and the Pilot Buropean Image Processing Ar peipa.essex.ac.uk, contain links to test images, 8 chapter Introduction Section 13 Exploring the Computer Vision World 9 Figure 15 Computervison and vinta eleprsence: the movements ofthe operator’ head ae racked by a sion system (aot shown) and copied in el ine by the head-ee plato (or stereo head) onthe sigh (courtesy of David W. Maray, University of Oxford) roups, research publications, teaching material, frequently asked questions and plenty of pointes to other interesting sites. + The Annotated Computer Vision Bibliography is an excllent, well-organized source of online published papers and eports, as well as announcements of con ferences and journals at http: //iris.usc.edu/Vision-llotes/oibliography/ contents. heal. You can search the contents by keyword, author, journal, con ference, paper te, and other ways. ‘ery competes bibliographies on image analy, pater reopition and Figure 14. 
computer vision are produced every year by Azriel Rosenfeld at the University of Maryland (ftp://ftp.teleos.com/VISION-LIST-ARCHIVE/ROSENFELD).

Figure 1.4 Computer vision at work underwater: remotely operated and autonomous underwater vehicles (ROVs/AUVs), like the one shown here (ANGUS, built by the Ocean Systems Laboratory, Heriot-Watt University), use computer vision as well as acoustic sensors such as sonar (Chapter 2). Bottom: the result of automatic detection of a man-made object of interest (courtesy of Dave Lane, Heriot-Watt University).

- CVonline is a collection of hypertext summaries of methods and applications of computer vision, recently established by the University of Edinburgh at http://www.dai.ed.ac.uk/CVonline/.

Figure 1.6 An example of medical application of computer vision: computer-assisted diagnosis from mammographic images. Top: X-ray image of a female breast, digitized from a conventional X-ray photograph. Bottom: close-up and automatic identification of suspect nodules (courtesy of Clarke and Brian Calder, Heriot-Watt University, and Matthew Freedman, Georgetown University Medical School, Washington DC).

- The Vision List and The Pixel are free electronic bulletins circulating news and requests, and hosting technical debates. To subscribe, email pixel@essex.ac.uk and Vision-List-Request@teleos.com. Both have ftp and WWW archives of useful material.

1.3.3 Some Hints on Math Software

This section gives pointers to numerical computation packages widely used in computer vision, which we found useful. Notice that this list reflects only our experience; no comparison whatsoever with other packages is implied.

- Numerical Recipes is a book and software package very popular in the vision community. The source code, in C, FORTRAN, and Pascal, is published by Cambridge University Press together with the companion book by Press, Teukolsky, Vetterling, and Flannery, Numerical Recipes in C/FORTRAN/Pascal.
The book is an excellent introduction to the practicalities of numerical computation. There is also a Numerical Recipes: Example Book illustrating how to call the library routines.

- Meschach is a public-domain numerical library of C routines for linear algebra, developed by David E. Stewart and Zbigniew Leyk of the Australian National University, Canberra. For information and how to obtain a copy, see the Web page at http://www.netlib.no/netlib/c/meschach/readme.

- MATLAB is a software environment for fast prototyping of numerical programs, with its own language, interpreter, libraries (called toolboxes), and visualization tools, commercialized by the U.S. company The MathWorks. It is designed to be easy to use, and runs on several platforms, including UNIX and DOS machines. MATLAB is described in several recent books, and there is a large community of users. Plenty of information on software, books, bulletins, training, and so on is available at The MathWorks' WWW site, http://www.mathworks.com/, or contact The MathWorks Inc., 24 Prime Park Way, Natick, MA 01760, USA.

- Mathematica is another software environment for mathematical applications, with a large community of users. The standard reference book is Stephen Wolfram's Mathematica. Plenty of information on software, books, and bulletins is available at Wolfram Research's WWW site, http://www.wolfram.com/.

- Scilab is a public-domain scientific software package for numerical computing developed by INRIA (France). It includes linear algebra, control, signal processing, graphics, and animation. You can access Scilab from http://www-rocq.inria.fr/scilab/, or contact Scilab@inria.
fr.

1.4 The Road Ahead

This book is organized in two logical parts. The first part (Chapters 2 to 5) deals with the image acquisition and processing methods (noise attenuation, feature extraction, line and curve detection) necessary to produce the input data expected by subsequent algorithms. The primary purpose of this first part is not to give an exhaustive treatment of image processing, but to make the book self-contained by suggesting image processing methods commonly found in, and in some cases characteristic of, computer vision. The second part of the book (Chapters 6 to 11) deals with the computer vision problems (camera calibration, stereopsis, motion analysis, shape reconstruction, object recognition and location) that we have identified as our targets.

Figure 1.7 The book at a glance: method classes (white boxes), results (grey boxes), their interdependence, and where to find the various topics in this book.

The structure of the book is captured by Figure 1.7, which shows the classes of methods presented, their interdependence, the intermediate results, and the chapters in which the various topics are covered. Our journey starts with the acquisition of one or more images of a scene (stereo pairs and image sequences included). Before being fed to the main algorithms, the images are processed to attenuate the noise introduced by the acquisition process. The target information (3-D structure, location and identity of objects, and motion parameters) is shown at the bottom of Figure 1.7. The diagram suggests that in most cases the same information can be computed in more than one way.

One well-known class of methods relies on the identification of special image elements, called image features.
Examples of such methods are:

- calibration, which determines the value of internal and external parameters of the vision system;
- stereo analysis, which exploits the difference between two images to compute the structure (shape) of 3-D objects, and their location in space;
- recognition, which determines the objects' identity and location;
- feature-based motion analysis, which exploits the finite changes induced in an image sequence by the relative motion of world and camera to estimate 3-D structure and motion; and
- some shape from single image methods, which estimate 3-D structure from the information contained in one image only.

Another class of methods computes the target information from the images directly. Of these, this book includes:

- one shape from single image method, which estimates 3-D structure from the shading of a single image, and
- optical flow methods, a class of motion analysis methods which regard an image sequence as a close approximation of a continuous, time-varying signal.

We are now ready to begin our investigation into the theory and algorithms of computer vision.

Digital Snapshots

Verweile doch! Du bist so schön!
Goethe, Faust

This chapter deals with digital images and their relation to the physical world. We learn the principles of image formation, define the two main types of images in this book (intensity and range images), and discuss how to acquire and store them in a computer.

Chapter Overview

Section 2.2 considers the basic optical, radiometric, and geometric principles underlying the formation of intensity images.
Section 2.3 brings the computer into the picture, laying out the special nature of digital images, their acquisition, and some mathematical models of intensity cameras.

Section 2.4 discusses the fundamental mathematical models of intensity cameras and their parameters.

Section 2.5 introduces range images and describes a class of range sensors based on intensity cameras, so that we can use what we learn about intensity imaging.

What You Need to Know to Understand this Chapter

- Sampling theorem (Appendix, section A.3).
- Rotation matrices (Appendix, section A.9).

2.1 Introduction

This chapter deals with the main ingredients of computer vision: digital images. We concentrate on two types of images frequently used in computer vision:

- intensity images, the familiar, photographic images encoding light intensities, acquired by television cameras;
- range images, encoding shape and distance, acquired by special sensors like sonars or laser scanners.

Intensity images measure the amount of light impinging on a photosensitive device; range images estimate directly the 3-D structure of the viewed scene through a variety of techniques. Throughout the book, we will develop algorithms for both types of images.

It is important to stress immediately that any digital image, irrespective of type, is a 2-D array (matrix) of numbers. Figure 2.1 illustrates this fact for the case of intensity images. Depending on the nature of the image, the numbers may represent light intensities, distances, or other physical quantities. This fact has two fundamental consequences:

- The exact relationship of a digital image to the physical world (i.e., its nature of range or intensity image) is determined by the acquisition process, which depends on the sensor used.
- Any information contained in images (e.g., shape, measurements, or object identity) must ultimately be extracted (computed) from 2-D numerical arrays in which it is encoded.
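Since the point is central, here is a minimal sketch of an image as a plain 2-D array; the tiny 4 × 4 image and its grey values are made up for illustration:

```python
# A digital image is a 2-D array (matrix) of numbers.
# Here, a made-up 4x4 grey-level "image" stored as a nested list;
# each entry is a light intensity between 0 (black) and 255 (white).
image = [
    [ 10,  10, 200, 200],
    [ 10,  10, 200, 200],
    [ 90,  90,  90,  90],
    [255, 255, 255, 255],
]

rows = len(image)
cols = len(image[0])

# Accessing pixel (i, j): row index i, column index j.
pixel = image[2][3]

# Whether these numbers mean light intensities or distances (range image)
# depends only on the acquisition process, not on the array itself.
print(rows, cols, pixel)
```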
In this chapter, we investigate the origin of the numbers forming a digital image; the rest of the book is devoted to computational techniques that make explicit some of the information contained implicitly in these numbers.

2.2 Intensity Images

We start by introducing the main concepts behind intensity image formation.

2.2.1 Main Concepts

In the visual systems of many animals, including man, the process of image formation begins with the light rays coming from the outside world and impinging on the photoreceptors in the retina. A simple look at any ordinary photograph suggests the variety of physical parameters playing a role in image formation. Here is an incomplete list:

Optical parameters of the lens characterize the sensor's optics. They include:
- lens type,
- focal length,
- field of view,
- angular apertures.

Figure 2.1 Digital images are 2-D arrays of numbers: a grey-level image of an eye (pixels have been enlarged for display) and the corresponding 2-D array.

Photometric parameters appear in models of the light energy reaching the sensor after being reflected from the objects in the scene. They include:
- type, intensity, and direction of illumination,
- reflectance properties of the viewed surfaces,
- effects of the sensor's structure on the amount of light reaching the photoreceptors.

Geometric parameters determine the image position onto which a 3-D point is projected. They include:
- type of projection,
- position and orientation of the camera in space,
- perspective distortions introduced by the imaging process.

Figure 2.2 The basic elements of an imaging device.

All the above play a role in any intensity imaging device, be it a photographic camera, camcorder, or computer-based system. However, further parameters are needed to characterize digital images and their acquisition systems.
These include:
- the physical properties of the photosensitive matrix of the viewing camera,
- the discrete nature of the photoreceptors,
- the quantization of the intensity scale.

We will now review the optical, radiometric, and geometric aspects of image formation.

2.2.2 Basic Optics

We first need to establish a few fundamental notions of optics. As for many natural visual systems, the process of image formation in computer vision begins with the light rays which enter the camera through an angular aperture (or pupil), and hit a screen or image plane (Figure 2.2), the camera's photosensitive device which registers light intensities. Notice that most of these rays are the result of the reflections of the rays emitted by the light sources and hitting object surfaces.

Image Focusing. Any single point of a scene reflects light coming from possibly many directions, so that many rays reflected by the same point may enter the camera. In order to obtain sharp images, all rays coming from a single scene point, P, must converge onto a single point on the image plane, p, the image of P. If this happens, we say that the image of P is in focus; if not, the image is spread over a circle. Focusing all rays from a scene point onto a single image point can be achieved in two ways:

1. Reducing the camera's aperture to a point, called a pinhole. This means that only one ray from any given point can enter the camera, and creates a one-to-one correspondence between visible points, rays, and image points. This results in very sharp, undistorted images of objects at different distances from the camera (see Project 2.1).

2. Introducing an optical system composed of lenses, apertures, and other elements, explicitly designed to make all rays coming from the same 3-D point converge onto a single image point.

An obvious disadvantage of a pinhole aperture is its exposure time; that is, how long the image plane is allowed to receive light.
Any photosensitive device (camera film, electronic sensors) needs a minimum amount of light to register a legible image. As a pinhole allows very little light into the camera per time unit, the exposure time necessary to form the image is too long (typically several seconds) to be of practical use. Optical systems, instead, can be adjusted to work under a wide range of illumination conditions and exposure times (the exposure time being controlled by a shutter).

Intuitively, an optical system can be regarded as a device that aims at producing the same image obtained with a pinhole aperture, but by means of a much larger aperture and a shorter exposure time. Moreover, an optical system enhances the light gathering power.

Thin Lenses. Standard optical systems are quite sophisticated, but we can learn the basic ideas from the simplest optical system, the thin lens. The optical behavior of a thin lens (Figure 2.3) is characterized by two elements: an axis, called the optical axis, going through the lens center, O, and perpendicular to the plane of the lens; and two special points, F_l and F_r, called left and right focus, placed on the optical axis on opposite sides of the lens, and at the same distance from O. This distance, called the focal length of the lens, is usually indicated by f.

By construction, a thin lens deflects all rays parallel to the optical axis and coming from one side onto the focus on the other side, as described by two basic properties.

Thin Lens: Basic Properties
1. Any ray entering the lens parallel to the axis on one side goes through the focus on the other side.
2. Any ray entering the lens from the focus on one side emerges parallel to the axis on the other side.

The Fundamental Equation of Thin Lenses. Our next task is to derive the fundamental equation of thin lenses from the basic properties 1 and 2. Consider a point P, not too far from the optical axis, and let Z + f be the distance of P from the lens along the optical axis (Figure 2.4).
By assumption, a thin lens focuses all the rays from P onto the same point, the image point p. Therefore, we can locate p by intersecting only two known rays, and we do not have to worry about tracing the path of any other ray.

Figure 2.3 Geometric optics of a thin lens (a view perpendicular to the plane approximating the lens).

Note that by applying property 1 to the ray PQ and property 2 to the ray PR, PQ and PR are deflected to intersect at a certain point on the other side of the thin lens. But since the lens focuses all rays coming from P onto the same point, PQ and PR must intersect at p! From Figure 2.4, and using the two pairs of similar triangles <PSF_l> and <ROF_l>, and <psF_r> and <QOF_r>, we obtain immediately

Z z = f².    (2.1)

Setting Ẑ = Z + f and ẑ = z + f, (2.1) reduces to our target equation.

The Fundamental Equation of Thin Lenses

1/Ẑ + 1/ẑ = 1/f.    (2.2)

The ray going through the lens center, O, named the principal ray, goes through p undeflected.

Field of View. One last observation about optics. Let d be the effective diameter of the lens, identifying the portion of the lens actually reachable by light rays.

Figure 2.4 Imaging by a thin lens. Note that, in general, a real lens has two different focal lengths, because the curvatures of its two surfaces may be different. The situation depicted here is a special case, but it is sufficient for our purposes. See the Further Readings at the end of this chapter for more on optics.

We call d the effective diameter to emphasize the difference between d and the physical diameter of the lens. The aperture may prevent light rays from reaching the peripheral points of the lens, so that d is usually smaller than the physical diameter of the lens.
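As a quick numerical check of the fundamental equation of thin lenses (a minimal sketch; the focal length and object distance are made-up values):

```python
# Thin-lens equation: 1/Z_hat + 1/z_hat = 1/f, with Z_hat and z_hat the
# object and image distances measured from the lens. Given f and Z_hat,
# solve for the image distance z_hat. Values are made up for illustration.

f = 50.0        # focal length, millimeters
Z_hat = 5000.0  # distance of the scene point from the lens, millimeters

# Solving 1/z_hat = 1/f - 1/Z_hat:
z_hat = 1.0 / (1.0 / f - 1.0 / Z_hat)

# The image forms slightly behind the focus (z_hat > f); as the point
# recedes (Z_hat -> infinity), z_hat approaches f.
print(round(z_hat, 2))
```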
The effective lens diameter and the focal length determine the field of view of the lens, which is an angular measure of the portion of 3-D space actually seen by the camera. It is customary to define the field of view, ω, as half of the angle subtended by the lens diameter as seen from the focus:

tan ω = d/(2f).    (2.3)

This is the minimum amount of optics needed for our purposes. Optical models of real imaging devices are a great deal more complicated than our treatment of thin (and ideal) lenses; problems and phenomena not considered here include spherical aberration (defocusing of nonparaxial rays), chromatic aberration (different defocusing of rays of different colors), and focusing objects at different distances from the camera.³ The Further Readings section at the end of this chapter tells where to find more about optics.

³ The fundamental equation of thin lenses implies that scene points at different distances from the lens are in focus at different image distances. The optical systems of real cameras are designed so that all points within a given range of distances are focused onto, or close to, the image plane, and therefore appear in focus. This range is called the depth of field of the camera.

2.2.3 Basic Radiometry

Radiometry is the essential part of image formation concerned with the relation among the amounts of light energy emitted from light sources, reflected from surfaces, and registered by sensors. We shall use radiometric concepts to pursue two objectives:

1. modelling how much of the illuminating light is reflected by object surfaces;
2. modelling how much of the reflected light actually reaches the image plane of the camera.

Figure 2.5 Illustration of the basic radiometric concepts.

Definitions.
We begin with some definitions, illustrated in Figure 2.5 and summarized as follows:

Image Irradiance and Scene Radiance

The image irradiance is the power of the light, per unit area, at each point p of the image plane.
The scene radiance is the power of the light, per unit area, ideally emitted by each point P of a surface in 3-D space in a given direction d.

"Ideally" refers to the fact that the surface in the definition of scene radiance might be the illuminated surface of an object, the radiating surface of a light source, or even a fictitious surface. The term scene radiance denotes the total radiance emitted by a point; sometimes radiance refers to the energy radiated from a surface (emitted or reflected), whereas irradiance refers to the energy incident on a surface.

Surface Reflectance and Lambertian Model. A model of the way in which a surface reflects incident light is called a surface reflectance model. A well-known one is the Lambertian model, which assumes that each surface point appears equally bright from all viewing directions. This approximates well the behavior of rough, nonspecular surfaces, as well as various materials like matte paint and paper. If we represent the direction and amount of incident light by a vector I, the scene radiance of an ideal Lambertian surface, L, is simply proportional to the dot product between I and the unit normal to the surface, n:

L = ρ Iᵀn,    (2.4)

with ρ > 0 a constant called the surface's albedo, which is typical of the surface's material. We also assume that Iᵀn is positive; that is, the surface faces the light source. This is a necessary condition for the rays of light to reach P; if this condition is not met, the scene radiance should be set equal to 0.

We will use the Lambertian model in several parts of this book; for example, while analyzing image sequences (Chapter 8) and computing shape from shading (Chapter 9).
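The Lambertian model of (2.4), including the clamp to zero for surfaces facing away from the light, fits in a few lines; the albedo, illumination vector, and normal below are made-up values:

```python
import math

def lambertian_radiance(albedo, illumination, normal):
    """Scene radiance of an ideal Lambertian surface, L = albedo * I.n,
    set to 0 when the surface faces away from the light source."""
    dot = sum(i * n for i, n in zip(illumination, normal))
    return albedo * dot if dot > 0 else 0.0

# Made-up example: light of strength 2 along the z axis, and a surface
# whose normal is tilted 60 degrees away from the light direction.
albedo = 0.5
I = (0.0, 0.0, 2.0)                          # incident light vector
theta = math.radians(60.0)
n = (math.sin(theta), 0.0, math.cos(theta))  # unit surface normal

L = lambertian_radiance(albedo, I, n)
# Radiance falls off as the cosine of the angle between I and n:
# 0.5 * 2 * cos(60 deg) = 0.5.
print(round(L, 6))
```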
Intuitively, the Lambertian model is based on the exact cancellation of two factors. Neglecting constant terms, the amount of light reaching any surface is always proportional to the cosine of the angle between the illuminant and the surface normal n (that is, to the effective area of the surface as seen from the illuminant direction). According to the model, a Lambertian surface reflects light in a given direction d proportionally to the cosine of the angle between d and n. But since the surface's area as seen from the direction d is foreshortened by the same cosine factor, the two effects cancel, and the observed radiance is independent of the viewing direction.

In summary, it is convenient to assume that the CCD elements are always in one-to-one correspondence with the image pixels, and to introduce effective horizontal and vertical sizes to account for the possible different scaling along the horizontal and vertical directions. The effective sizes of the CCD elements are our first examples of camera parameters, which are the subject of section 2.4.

2.3.2 Spatial Sampling

The spatial quantization of images originates at the very early stage of the image formation process, as the photoreceptors of a CCD sensor are organized in a rectangular array of photosensitive elements packed closely together. For simplicity, we assume that the distance d between adjacent CCD elements (specified by the camera manufacturer) is the same in the horizontal and vertical directions. We know from the sampling theorem that d determines the highest spatial frequency, ν_N, that can be captured by the system, according to the relation

ν_N = 1/(2d).

How does this characteristic frequency compare with the spatial frequency spectrum of images?
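Before comparing this frequency with the content of real images, note that the relation itself is immediate to evaluate (a minimal sketch; the inter-element distance is a made-up but plausible value):

```python
# Sampling theorem applied to a CCD: with distance d between adjacent
# photosensitive elements, the highest spatial frequency the sensor can
# represent is nu_N = 1 / (2 d). The value of d is made up but plausible.

d_mm = 0.01                  # distance between adjacent CCD elements (mm)
nu_N = 1.0 / (2.0 * d_mm)    # cycles per millimeter on the sensor

# Pattern components at frequencies above nu_N are not resolved:
# they show up as aliasing instead.
print(nu_N)
```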
A classical result of the diffraction theory of aberrations states that the imaging process can be expressed in terms of a linear low-pass filtering of the spatial frequencies of the visual signal. (For more information about the diffraction theory of aberrations, see the Further Readings.) In particular, if a is the linear size of the angular aperture of the optics (e.g., the diameter of a circular aperture), λ the wavelength of light, and f the focal length, spatial frequencies larger than

ν_c = a/(λf)

do not contribute to the spatial spectrum of the image (that is, they are filtered out). In a typical image acquisition system, the spatial frequency ν_N is nearly one order of magnitude smaller than ν_c. Therefore, since the viewed pattern may well contain spatial frequencies larger than ν_N, we expect aliasing. You can convince yourself of the reality of spatial aliasing by taking images of a pattern of equally spaced thin black lines on a white background (see Exercise 2.6) at increasing distance from the camera. As predicted by the sampling theorem, if n is the number of CCD elements in the horizontal direction, the camera cannot see more than n' vertical lines, with n' somewhat less than n/2 (say n' ≈ n/3). Until the number of lines within the field of view remains smaller than n', all the lines are correctly imaged and resolved. Once the limit is reached, if the distance of the pattern is increased further, but before blurring effects take over, the number of imaged lines decreases as the distance of the pattern increases!

The main reason why spatial aliasing is often unnoticed is that the amplitude (that is, the information content) of the high-frequency components of ordinary images is usually, though by no means always, very small.

2.3.3 Acquisition Noise and How to Estimate It

Let us briefly touch upon the problem of noise introduced by the imaging system, and how it is estimated.
The effect of noise is, essentially, that image values are not those expected, as these are corrupted during the various stages of image acquisition. As a consequence, the pixel values of two images of the same scene, taken by the same camera and in the same light conditions, are never exactly the same (try it). Such fluctuations will introduce errors in the results of calculations based on pixel values; it is therefore important to estimate the magnitude of the noise.

The main objective of this section is to suggest a simple characterization of image noise, which can be used by the algorithms of following chapters. Noise attenuation, in particular, is the subject of Chapter 3.

An obvious way to proceed is to regard noisy variations as random variables, and to try to characterize their statistical behavior. To do this, we acquire a sequence of images of the same scene in the same acquisition conditions, and compute the pointwise average of the image brightness over all the images. The same sequence can also be used to estimate the signal-to-noise ratio of the acquisition system, as follows.⁵

Algorithm EST_NOISE

We are given n images of the same scene, E_0, ..., E_{n-1}, which we assume square (N × N) for simplicity. For each i, j = 0, ..., N − 1, let

Ē(i, j) = (1/n) Σ_{k=0}^{n-1} E_k(i, j),

σ(i, j) = sqrt( (1/(n−1)) Σ_{k=0}^{n-1} ( Ē(i, j) − E_k(i, j) )² ).

The quantity σ(i, j) is an estimate of the standard deviation of the acquisition noise at each pixel. The average of σ(i, j) over the image is an estimate of the average noise, while max_{i,j} σ(i, j) is an estimate of the worst-case acquisition noise.

Notice that the beat frequency between the camera and some fluorescent room lights may skew the results of EST_NOISE.

⁵ The signal-to-noise ratio is usually expressed in decibels (dB), and is defined as 10 times the logarithm in base 10 of the ratio of two powers (in our case, of signal and noise). For example, a power ratio of 100 corresponds to 20 dB.

Figure 2.11 shows the noise estimates relative to a particular acquisition system. A static camera was pointed at a picture posted on the wall. A sequence of n = 10 images was then acquired. The graphs in Figure 2.11 reproduce the average plus and minus

Figure 2.11 Estimated acquisition noise.
Graphs of the average image brightness plus (solid line) and minus (dotted line) the estimated standard deviation, over a sequence of images of the same scene, along the same horizontal scan line. The image brightness ranges from 73 to 211 grey levels.

the standard deviation of the image brightness (pixel values) over the entire sequence, along a horizontal scanline (image row). Notice that the standard deviation is almost independent of the average, typically less than 2 and never larger than 2.5 grey values. This corresponds to an average signal-to-noise ratio of nearly one hundred.

Another cause of noise, which is important when a vision system is used for fine measurements, is that pixel values are not completely independent of each other: some cross-talking occurs between adjacent photosensors in each row of the CCD array, due to the way the content of each CCD row is read in order to be sent to the frame buffer. This can be verified by computing the autocovariance C_EE(i, j) of the image of a spatially uniform pattern, parallel to the image plane and illuminated by diffuse light.

Algorithm AUTO_COVARIANCE

Given an image E, for each i, j = 0, ..., N − 1, compute

C_EE(i, j) = (1/N²) Σ_{i'=0}^{N-1} Σ_{j'=0}^{N-1} ( E(i' + i, j' + j) − Ē ) ( E(i', j') − Ē ),    (2.18)

where Ē is the average image brightness.

The autocovariance should actually be estimated as the average of the autocovariances computed on many images of the same pattern. To minimize the effect of radiometric nonlinearities (see (2.13)), C_EE should be computed on a patch in the central portion of the image.

Figure 2.12 Autocovariance of the image of a uniform pattern for a typical image acquisition system, showing cross-talking between adjacent pixels along the rows.

Figure 2.12 displays the graph of the average of the autocovariance computed on many images acquired by the same acquisition system used to generate Figure 2.11.
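EST_NOISE and AUTO_COVARIANCE translate almost literally into code. The sketch below uses plain Python lists and made-up miniature images; for the autocovariance, it wraps indices around at the image border, a simplification, since the boxed algorithm leaves boundary handling open:

```python
import math

def est_noise(images):
    """EST_NOISE: given n images of the same scene (N x N lists), return
    the pointwise mean and the pointwise standard deviation of the
    brightness, the latter an estimate of the acquisition noise."""
    n = len(images)
    N = len(images[0])
    mean = [[sum(img[i][j] for img in images) / n for j in range(N)]
            for i in range(N)]
    sigma = [[math.sqrt(sum((mean[i][j] - img[i][j]) ** 2
                            for img in images) / (n - 1))
              for j in range(N)] for i in range(N)]
    return mean, sigma

def auto_covariance(image):
    """AUTO_COVARIANCE: sample autocovariance C_EE(i, j) of an N x N
    image of a uniform pattern; indices wrap around for simplicity."""
    N = len(image)
    E_bar = sum(sum(row) for row in image) / (N * N)
    return [[sum((image[(ip + i) % N][(jp + j) % N] - E_bar) *
                 (image[ip][jp] - E_bar)
                 for ip in range(N) for jp in range(N)) / (N * N)
             for j in range(N)] for i in range(N)]

# Made-up data: three noisy acquisitions of the same 2 x 2 scene.
imgs = [[[100, 100], [100, 100]],
        [[102, 101], [ 99, 100]],
        [[ 98,  99], [101, 100]]]
mean, sigma = est_noise(imgs)
print(mean[0][0], round(sigma[0][0], 2))
```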
The autocovariance was computed by means of (2.18) on a patch of 16 × 16 pixels centered in the image center. Notice the small but visible covariance along the horizontal direction: consistently with the physical properties of many CCD cameras, this indicates that the grey value of each pixel is not completely independent of that of its neighbors.

2.4 Camera Parameters

We now come back to discuss the geometry of a vision system in greater detail. In particular, we want to characterize the parameters underlying camera models.

2.4.1 Definitions

Computer vision algorithms reconstructing the 3-D structure of a scene, or computing the position of objects in space, need equations linking the coordinates of points in 3-D space with the coordinates of their corresponding image points. These equations are written in the camera reference frame (see (2.14) and section 2.2.4), but it is often assumed that

- the camera reference frame can be located with respect to some other, known, reference frame (the world reference frame), and
- the coordinates of the image points in the camera reference frame can be obtained from pixel coordinates, the only ones directly available from the image.

This is equivalent to assuming knowledge of some camera's characteristics, known in vision as the camera's extrinsic and intrinsic parameters. Our next task is to understand the exact nature of the intrinsic and extrinsic parameters and why the equivalence holds.

Definition: Camera Parameters

The extrinsic parameters are the parameters that define the location and orientation of the camera reference frame with respect to a known world reference frame.

The intrinsic parameters are the parameters necessary to link the pixel coordinates of an image point with the corresponding coordinates in the camera reference frame.

In the next two sections, we write the basic equations that allow us to define the extrinsic and intrinsic parameters in practical terms. The problem of estimating the value of these parameters is called camera calibration. We shall solve this problem in
We shall solve this problem in ‘Chapter 6 since calination methods need algorithms which we discuss in Chapters 4 and 5 242. Extrinsic Parameters ‘The camera reference frame has been introdced forthe purpose of writing the funds ‘mental equations of the perspective projection (2.14) in a simple form. However, the ‘camera reference frane i ofen unknown, and & common problem is determining the location and orientation of the camera frame with respect to some known reference frame, using only image information. The extrinsic parameters are defined as any’ set ‘of geomeric parame that identify uniquely the transformation berween the unknown ‘camera reference frameand a known reference frame, named the world reference fame. ‘A typial choie for describing the transformation between camera and world frame isto wse + 1 3-D translation vector, desribing the relative postion ofthe origins ofthe twoeference frames, and + 43 x 3rotationmatri, R,an orthogonal matrix(R” R= RAT = 1) that brings the corresponding axes of the two frames onto each othe, The orthogonality relations reduce the numberof degrees of freedom of Rto thee (see section A.9 inthe Append). Inn obvious rotation (see Figure 2.13), the relation between the coordinates of ‘point Pin world sed camera frame, Py and P, respectively, is 19) 6 ‘Chapter 2 Digital snapshots Figure 2.13. The elation between camere and world coordinate frames. with my rans Rel rams mre m3 Definition: Extrinsic Parameters The camer extrinsic parameters ae the translation vector, T, and he rotation math, (or, beter it fee parameters) which pei the transformation between the camera andthe word relerence frame 243. 
2.4.3 Intrinsic Parameters

The intrinsic parameters can be defined as the set of parameters needed to characterize the optical, geometric, and digital characteristics of the viewing camera. For a pinhole camera, we need three sets of intrinsic parameters, specifying respectively

- the perspective projection, for which the only parameter is the focal length, f;
- the transformation between camera frame coordinates and pixel coordinates;
- the geometric distortion introduced by the optics.

From Camera to Pixel Coordinates. To find the second set of intrinsic parameters, we must link the coordinates (x_im, y_im) of an image point in pixel units with the coordinates (x, y) of the same point in the camera reference frame. The coordinates (x_im, y_im) can be thought of as coordinates of a new reference frame, sometimes called the image reference frame.

The Transformation between Camera and Image Frame Coordinates

Neglecting any geometric distortions possibly introduced by the optics, and assuming that the CCD array is made of a rectangular grid of photosensitive elements, we have

x = −(x_im − o_x) s_x,
y = −(y_im − o_y) s_y,    (2.20)

with (o_x, o_y) the coordinates in pixels of the image center (the principal point), and (s_x, s_y) the effective size of the pixel (in millimeters) in the horizontal and vertical direction, respectively.

Therefore, the current set of intrinsic parameters is f, o_x, o_y, s_x, s_y.

The sign change in (2.20) is due to the fact that the horizontal and vertical axes of the image and camera reference frames have opposite orientation.

In several cases, the optics introduces image distortions that become evident at the periphery of the image, or even elsewhere when using optics with large fields of view. Fortunately, these distortions can be modelled rather accurately as simple radial distortions, according to the relations

x = x_d (1 + k₁ r² + k₂ r⁴),
y = y_d (1 + k₁ r² + k₂ r⁴),

with (x_d, y_d) the coordinates of the distorted points, and r² = x_d² + y_d².
As shown by the equations above, this distortion is a radial displacement of the image points. The displacement is null at the image center, and increases with the distance of the point from the image center. k_1 and k_2 are further intrinsic parameters. Since they are usually very small, radial distortion is ignored whenever high accuracy is not required in all regions of the image, or when the peripheral pixels can be discarded. If not, as k_2 << k_1, k_2 is often set equal to 0, and k_1 is the only intrinsic parameter to be estimated in the radial distortion model.

The magnitude of geometric distortion depends on the quality of the optics used. As a rule of thumb, with optics of average quality and CCD size around 500 x 500, expect distortions of several pixels (say around 5) in the outer cornice of the image. Under these circumstances, a model with k_2 = 0 is still accurate. It is now time for a summary.

Intrinsic Parameters
The camera intrinsic parameters are defined as the focal length, f, the location of the image center in pixel coordinates, (o_x, o_y), the effective pixel sizes in the horizontal and vertical directions, (s_x, s_y), and, if required, the radial distortion coefficient, k_1.

2.4.4 Camera Models Revisited

We are now fully equipped to write relations linking directly the pixel coordinates of an image point with the world coordinates of the corresponding 3-D point, without explicit reference to the camera reference frame needed by (2.14).

Linear Version of the Perspective Projection Equations. Plugging (2.19) and (2.20) into (2.14) we obtain

    -(x_im - o_x) s_x = f (R_1^T (P_w - T)) / (R_3^T (P_w - T))
    -(y_im - o_y) s_y = f (R_2^T (P_w - T)) / (R_3^T (P_w - T))    (2.21)

where R_i, i = 1, 2, 3, is the 3-D vector formed by the i-th row of the matrix R. Indeed, (2.21) relates the 3-D coordinates of a point in the world frame to the image coordinates of the corresponding image point, via the camera extrinsic and intrinsic parameters.

Notice that, due to the particular form of (2.21), not all the intrinsic parameters are independent.
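The radial model above maps distorted to undistorted coordinates directly, so correcting distortion is a one-line computation (applying distortion, by contrast, requires inverting the relation numerically). A minimal sketch, with our own function name:

```python
def undistort(xd, yd, k1, k2=0.0):
    """Correct radial distortion: distorted (x_d, y_d) -> ideal (x, y),
    following x = x_d (1 + k1 r^2 + k2 r^4), with r^2 = x_d^2 + y_d^2."""
    r2 = xd * xd + yd * yd
    s = 1.0 + k1 * r2 + k2 * r2 * r2
    return xd * s, yd * s
```

The code makes the qualitative behavior explicit: the displacement is exactly zero at the image center and grows with r, as stated in the text.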
In particular, the focal length could be absorbed into the effective sizes of the CCD elements.

Neglecting radial distortion, we can rewrite (2.21) as a simple matrix product. To this purpose, we define two matrices, M_int and M_ext, as

    M_int = ( -f/s_x    0      o_x
               0      -f/s_y   o_y
               0        0       1  )

and

    M_ext = ( r11  r12  r13  -R_1^T T
              r21  r22  r23  -R_2^T T
              r31  r32  r33  -R_3^T T ),

so that the 3 x 3 matrix M_int depends only on the intrinsic parameters, while the 3 x 4 matrix M_ext only on the extrinsic parameters. If we now add a "1" as a fourth coordinate of P_w (that is, express P_w in homogeneous coordinates) and form the product M_int M_ext P_w, we obtain a linear matrix equation describing perspective projections.

The Linear Matrix Equation of Perspective Projections

    ( x_1 )                 ( X_w )
    ( x_2 ) = M_int M_ext   ( Y_w )
    ( x_3 )                 ( Z_w )
                            (  1  )

What is interesting about the vector [x_1, x_2, x_3]^T is that the ratios x_1/x_3 and x_2/x_3 are nothing but the image coordinates:

    x_1/x_3 = x_im,    x_2/x_3 = y_im.

Moreover, we have separated nicely the two steps of the world-image projection:

- M_ext performs the transformation between the world and the camera reference frame;
- M_int performs the transformation between the camera reference frame and the image reference frame.

In more formal terms, the relation between a 3-D point and its perspective projection on the image plane can be seen as a linear transformation from the projective space, the space of vectors [X_w, Y_w, Z_w, 1]^T, to the projective plane, the space of vectors [x_1, x_2, x_3]^T. This transformation is defined up to an arbitrary scale factor, so that the matrix M has only 11 independent entries (see review questions). This fact will be discussed in Chapter 6.

The Perspective Camera Model. Various camera models, including the perspective and weak-perspective ones, can be derived by setting appropriate constraints on the matrix M = M_int M_ext. Assuming, for simplicity, o_x = o_y = 0 and s_x = s_y = 1, M can then be rewritten as

    M = ( -f r11  -f r12  -f r13   f R_1^T T
          -f r21  -f r22  -f r23   f R_2^T T
            r31     r32     r33   -R_3^T T  ).

- Resolution, or precision: the smallest change in range that the sensor can measure or represent.
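The factorization M = M_int M_ext and the division by the third homogeneous coordinate can be sketched directly. This is an illustrative implementation under the conventions above; the function names are ours.

```python
import numpy as np

def projection_matrix(f, sx, sy, ox, oy, R, T):
    """Build the 3x4 perspective projection matrix M = M_int M_ext (up to scale)."""
    M_int = np.array([[-f / sx, 0.0, ox],
                      [0.0, -f / sy, oy],
                      [0.0, 0.0, 1.0]])
    M_ext = np.hstack([R, (-R @ T).reshape(3, 1)])  # fourth column is -R_i^T T
    return M_int @ M_ext

def project(M, P_w):
    """Project a world point; returns pixel coordinates (x_im, y_im) = (x1/x3, x2/x3)."""
    x = M @ np.append(P_w, 1.0)  # homogeneous coordinates
    return x[0] / x[2], x[1] / x[2]
```

With R = I, T = 0, f = 1 and unit pixels, this reduces to the plain pinhole projection, which makes the two-step structure of the matrix easy to verify.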
- Speed: the number of range points measured per second.
- Size and weight: important in some applications (e.g., only small sensors can be fitted on a robot arm).

It is often difficult to know the actual accuracy of a sensor without carrying out your own measurements. Accuracy figures are sometimes reported without specifying which error they refer to (e.g., RMS, absolute mean, maximum), and often omitting the experimental conditions and the optical properties of the surfaces used.

2.6 Summary

After working through this chapter you should be able to:

- explain how digital images are formed, represented and acquired;
- estimate experimentally the noise introduced in an image by an acquisition system;
- explain the concepts of intrinsic and extrinsic parameters, the most common models of intensity cameras, and their applicability;
- design (but not yet implement) an algorithm for calibrating and using a complete range sensor based on direct calibration.

2.7 Further Readings

It is hard to find more on the contents of this chapter in just one book; as a result, if you want to know more you must be willing to do some bibliographic search. A readable account of basic optics can be found in Feynman's Lectures on Physics [4]. A classic on the subject, and beyond, is Born and Wolf [3], which also covers topics like image formation and spatial frequency filtering (though it is not always simple to go through). Our derivation of (2.13) is based on Horn and Sjoberg [6]. Horn [5] gives an extensive treatment of surface reflectance models. Of the many, very good textbooks on signal theory, our favourite is Oppenheim, Willsky and Young [11]. The discussion of camera models via the projection matrix is based on the appendix of Mundy and Zisserman's book Geometric Invariance in Computer Vision [9].

Our discussion of range sensors is largely based on Besl [1], which is a very good introduction to the principles, types and evaluation of range sensors. A recent, detailed review of commercial laser scanners can be found in [14].
Two laser-based, active triangulation range sensors are described in [12, 13]; the latter is based on direct calibration, the former uses a geometric model. References [8] and [2] are examples of triangulation sensors projecting patterns of lines generated using incoherent light (as opposed to laser light) onto the scene. Krotkov [7] and Nayar and Nakagawa [10] make good introductions to focus-based ranging.

2.8 Review

Questions

2.1 How does an image change if the focal length is varied?
2.2 Give an intuitive explanation of the reason why a pinhole camera has an infinite depth of field.
2.3 Use the definition of F-number to explain geometrically why this quantity measures the fraction of the light entering the camera which reaches the image plane.
2.4 Explain why the beat frequency of fluorescent room light (e.g., 60 Hz) can skew the results of EST_NOISE.
2.5 Intensity thresholding is probably the simplest way to locate interesting objects in an image (a problem called image segmentation). The idea is that only the pixels whose value is above a threshold belong to interesting objects. Comment on the shortcomings of this technique, particularly in terms of the relation between scene radiance and image irradiance. Assuming that scene and illumination can be controlled, what would you do to guarantee successful segmentation by thresholding?
2.6 The projection matrix M is a 3 x 4 matrix defined up to an arbitrary scale factor. This leaves only 11 of the 12 entries of M independent. On the other hand, we have seen that M can be written in terms of 10 parameters (4 intrinsic and 6 extrinsic independent parameters). Can you guess the independent intrinsic parameter that has been left out? If you cannot guess now, you will have to wait for Chapter 6.
2.7 Explain the problem of camera calibration, and why calibration is necessary at all.
2.8 Explain why the length in millimeters of an image line with endpoints [x1, y1]^T and [x2, y2]^T is not simply sqrt((x1 - x2)^2 + (y1 - y2)^2). What does this formula miss?
2.9 Explain the difference between a range and an intensity image.
Could range images be acquired using intensity cameras only (i.e., no laser light or the like)?
2.10 Explain the reason for the word "shaded" in "cosine shaded rendering of a range image". What assumptions on the illumination does a cosine shaded image imply? How is the surface gradient linked to shading?
2.11 What is the reason for step 1 in RANGE_CAL?
2.12 Consider a triangulation sensor which scans a whole surface profile by translating an object through a plane of laser light. Now imagine the surface is scanned by making the laser light sweep across the object. In both cases the camera is stationary. What parts of the triangulation algorithm change? Why?
2.13 The performance of a range sensor based on (2.24) depends on the values of f, b and theta. How would you define and determine "optimal" values of f, b and theta for such a sensor?

Exercises

2.1 Show that (2.1) and (2.2) are equivalent.
2.2 Devise an experiment that checks the prediction of (2.13) on your own system. Hint: Use a spatially uniform object (like a flat sheet of matte gray paper) illuminated by perfectly diffuse light. Use optics with a wide field of view. Repeat the experiment by averaging the acquired image over time. What difference does this averaging step make?
2.3 Show that, in the pinhole camera model, three collinear points in 3-D space are imaged into three collinear points on the image plane.
2.4 Use the perspective projection equations to explain why, in a picture of a face taken frontally and from a very small distance, the nose appears much larger than the rest of the face. Can this effect be reduced by acting on the focal length?
2.5 Estimate the noise of your acquisition system using procedures EST_NOISE and AUTO_COVARIANCE.
2.6 Use the equations of section 2.3.2 to estimate the spatial aliasing of your acquisition system, and devise a procedure to estimate, roughly, the number of CCD elements of your camera.
2.7 Write a program which displays a range image as a normal image (grey levels encode distance) or as a cosine shaded image.
2.8 Derive (2.24) from the geometry shown in Figure 2.15. Hint: Use the law of sines and the pinhole projection equation. Why have we chosen to position the reference frame as in Figure 2.15?
2.9 We can predict the sensitivity of measurements obtained through (2.24) by taking partial derivatives with respect to the formula's parameters. Compare such predictions with respect to b and f.

Projects

2.1 You can build your own pinhole camera, and join the adepts of pinhole photography. Pierce a hole about 5 mm in diameter on one side of an old tin box, 10 to 30 cm in depth. Spray the inside of box and lid with black paint. Pierce a pinhole in a piece of thick aluminium foil (e.g., the one used for milk tops), and fix the foil to the hole in the box with black tape. In a dark room, fix a piece of black and white photographic film on the hole in the box, and seal the box with black tape. The nearer the pinhole to the film, the wider the field of view. Cover the pinhole with a piece of black paper to be used as shutter. Your camera is ready. Indicatively, a 125-ASA film may require an exposure of about 5 seconds. Make sure that the camera does not move as you open and close the shutter. Some experimentation will be necessary, but results can be striking!
2.2 Although you will learn how to locate image features and extract straight lines automatically in the next chapter, you can get ready for an implementation of the profile scanner described in section 2.5.4, and set up the equipment necessary. All you need (in addition to camera, frame buffer and computer) is a projector creating a black stripe (easily done with a slide projector and an appropriate slide, or even with a flashlight) and a few, accurately cut blocks. You must also work out the best arrangement for projector, stripe and camera.

References
[1] P.J. Besl, Active Optical Range Imaging Sensors, Machine Vision and Applications, Vol. 1, pp. 127-152 (1988).
[2] A. Blake, H.R. Lo, D. McCowen and B. Lindsey, Trinocular Active Range Sensing, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, pp. 477-483 (1993).
[3] M. Born and E. Wolf, Principles of Optics, Pergamon Press, New York (1959).
[4] R.P. Feynman, R.B. Leighton, and M. Sands, The Feynman Lectures on Physics, Addison-Wesley, Reading, MA (1965).
[5] B.K.P. Horn, Robot Vision, MIT Press, Cambridge, MA (1986).
[6] B.K.P. Horn and R.W. Sjoberg, Calculating the Reflectance Map, Applied Optics, Vol. 18, pp. 1770-1779 (1979).
[7] E. Krotkov, Focusing, International Journal of Computer Vision, Vol. 1, pp. 223-237 (1987).
[8] M. Maruyama and S. Abe, Range Sensing by Projecting Multiple Slits with Random Cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, no. 6, pp. 647-651 (1993).
[9] J.L. Mundy and A. Zisserman, Appendix: Projective Geometry for Machine Vision. In Geometric Invariance in Computer Vision, Mundy, J.L. and Zisserman, A., eds., MIT Press, Cambridge, MA (1992).
[10] S.K. Nayar and Y. Nakagawa, Shape from Focus, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, pp. 824-831 (1994).
[11] A. Oppenheim, A.S. Willsky and I.T. Young, Signals and Systems, Prentice-Hall International Editions (1983).
[12] P. Saint-Marc, J.C. Jezouin and G. Medioni, A Versatile PC-Based Range Finding System, IEEE Transactions on Robotics and Automation, Vol. RA-7, no. 2, pp. 250-256 (1991).
[13] E. Trucco and R.B. Fisher, Acquisition of Consistent Range Data Using Local Calibration, Proc. IEEE Int. Conf. on Robotics and Automation, San Diego, pp. 3410-3415 (1994).
[14] T. Wohlers, 3-D Digitizers, Computer Graphics World, July, pp. 73-77 (1992).
Dealing with Image Noise

The mariachis would serenade,
And they would not shut up till they were paid.
Tom Lehrer, In Old Mexico

Attenuating or, ideally, suppressing image noise is important because any computer vision system begins by processing intensity values. This chapter introduces a few basic noise models and filtering methods, which constitute an initial but useful toolkit for many practical situations.

Chapter Overview

Section 3.1 discusses the concept of noise and how to quantify it. It also introduces Gaussian and impulsive noise, and their effects on images.
Section 3.2 discusses some essential linear and nonlinear filtering methods used to attenuate random and impulsive noise.

What You Need to Know to Understand this Chapter

- The basics of signal theory: sampling theorem (Appendix, section A.3), Fourier transforms, and linear filtering.

3.1 Image Noise

Chapter 2 introduced the concept of acquisition noise, and suggested a method to estimate it. But, in general, the term noise covers much more.

Noise
In computer vision, noise may refer to any entity, in images, data or intermediate results, that is not interesting for the purposes of the main computation.

For example, one can speak of noise in different cases:

- For image processing algorithms like edge or line detection, noise might be the spurious fluctuations of pixel values introduced by the image acquisition system.
- For algorithms taking as input the results of some numerical computation, noise can be the errors introduced in the latter by random fluctuations or inaccuracies of input data, the computer's limited precision, round-off errors, and the like.
- For algorithms trying to group lines into meaningful objects, noise is the set of contours which do not belong to any meaningful object.
In computer vision, what is considered noise for one task is often the interesting signal for a different task.

Different types of noise are countered by different techniques, depending on the noise's nature and characteristics. This chapter concentrates on image noise. It must be clear that noise filtering is a classic topic of both signal and image processing, and the literature on the subject is vast (see section 3.4, Further Readings). This chapter is just meant to provide a few starting tools which prove useful in many practical situations. It is now time to formalize better our dramatis personae.

Image Noise
We shall assume that the main image noise is additive and random; that is, a spurious, random signal, n(i, j), is added to the true pixel values I(i, j):

    Î(i, j) = I(i, j) + n(i, j).    (3.1)

Noise Amount
The amount of noise in an image can be estimated by means of sigma_n, the standard deviation of the random signal n(i, j). It is important to know how strong the noise is with respect to the interesting signal. This is specified by the signal-to-noise ratio, or SNR:

    SNR = sigma_s / sigma_n,    (3.2)

where sigma_s is the standard deviation of the signal (the pixel values I(i, j)). The SNR is often expressed in decibels:

    SNR_dB = 20 log_10 (sigma_s / sigma_n).    (3.3)

Additive noise is an adequate assumption for the image acquisition systems introduced in Chapter 2, but in some cases the noise might not be additive. For instance, multiplicative noise, whereby Î = n I, models image degradation in television lines and photographs owing to grain size.

Notice that we assume that the resolution of the quantized grey levels is sufficient to sample the image appropriately; that is, to represent all significant variations of the image irradiance.² Coarse quantization can introduce spurious contours, and is thus called quantization noise. Byte images (256 grey levels per pixel), introduced in Chapter 2, appear to be adequate for most practical purposes and are extremely popular.
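Definitions (3.2) and (3.3) are straightforward to compute when a clean image and its noisy version are both available, e.g., in simulations. A minimal sketch (our own function name; the 20 log_10 form assumes the amplitude-ratio decibel convention used above):

```python
import numpy as np

def snr_db(clean, noisy):
    """Estimate the SNR in decibels, eqs. (3.2)-(3.3), from a clean image
    and its noisy version: 20 log10(sigma_s / sigma_n)."""
    noise = noisy.astype(float) - clean.astype(float)
    return 20.0 * np.log10(clean.std() / noise.std())
```

For instance, a signal of standard deviation 5 corrupted by noise of standard deviation 0.5 has SNR = 10, i.e., 20 dB.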
3.1.1 Gaussian Noise

In the absence of information, one often assumes n(i, j) to be modelled by a white, Gaussian, zero-mean stochastic process. For each location (i, j), this amounts to thinking of n(i, j) as a random variable, distributed according to a zero-mean Gaussian distribution of fixed standard deviation, which is added to I(i, j), and whose values are completely independent of each other and of the image, in both space and time.

This simple model predicts that noise values are distributed symmetrically around zero and, consequently, pixel values Î(i, j) around their true values I(i, j); this is what you expect from good acquisition systems, which, in addition, should guarantee low noise levels. Moreover, it is easier to deal formally with Gaussian distributions than with many other statistical models. To illustrate the effect of Gaussian noise on images, Figure 3.1 (a) shows a synthetic grey-level "checkerboard" pattern and the profile of the grey levels along a horizontal scanline. Figure 3.1 (b) shows the same image corrupted by additive Gaussian noise, and the profile of the grey levels along the same scanline.

The Gaussian noise model is often a convenient approximation dictated by ignorance: if we do not know and cannot estimate the noise characteristics, we take it to be Gaussian. Be aware, however, that white Gaussian noise is just an approximation of additive, real noise! You should always try to discover as much as possible about the origin of the noise, e.g., by investigating which sensor acquired the data, and design suppression methods optimally tailored to its characteristics. This is known as image restoration, another vast chapter of image processing.

3.1.2 Impulsive Noise

Impulsive noise, also known as spot or peak noise, occurs usually in addition to the noise normally introduced by acquisition. Impulsive noise alters random pixels, making their values very different from the true values, and very often from those of neighboring pixels too. Impulsive noise appears in the image as a sprinkle of dark and light spots. It can be caused by transmission errors, faulty elements in the CCD array, or external noise corrupting the analog-to-digital conversion.
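Experiments like the one in Figure 3.1 (b) are easy to reproduce. The sketch below builds a small synthetic checkerboard (an assumption in the spirit of the figure, not the book's actual image) and corrupts it with white, zero-mean Gaussian noise, clipping the result to the byte-image range.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng=None):
    """Corrupt an image with white, zero-mean Gaussian noise of std sigma."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.astype(float) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255)  # keep values in the byte-image range

# A small synthetic "checkerboard":
tile = np.kron([[0, 1], [1, 0]], np.ones((8, 8))) * 200
noisy = add_gaussian_noise(tile, sigma=5.0, rng=np.random.default_rng(0))
```

Note that the clipping slightly violates the zero-mean assumption near the extremes of the grey-level range; this is one of the ways real noise departs from the ideal model.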
² Of course, what a significant variation is depends on what you are after. This is discussed further in Chapter 4.

Figure 3.1 (a) Synthetic image of a 120 x 120 grey-level "checkerboard", and grey-level profile along a row. (b) After adding zero-mean Gaussian noise (sigma = 5). (c) After adding salt-and-pepper noise (see text for parameters).

Salt-and-pepper noise is a model adopted frequently to simulate impulsive noise in synthetic images. The noisy image values I_sp(h, k) are given by

    I_sp(h, k) = I(h, k)                      if x < l
    I_sp(h, k) = i_min + y (i_max - i_min)    otherwise    (3.4)

where I is the true image, x, y in [0, 1] are two uniformly distributed random variables, l is a parameter controlling how much of the image is corrupted, and i_min, i_max control how severe the noise is. You can obtain saturated salt-and-pepper noise by turning y into a two-valued variable (y = 0 or y = 1) and setting i_min = 0 and i_max = 255.

To illustrate the effects of salt-and-pepper noise on images, Figure 3.1 (c) shows the "checkerboard" pattern and the same scanline of Figure 3.1 (a) corrupted by salt-and-pepper noise with i_min = 0, i_max = 255, and l = 0.99.

3.2 Noise Filtering

Problem Statement: Noise Suppression, Smoothing, and Filtering
Given an image I corrupted by noise, attenuate the noise as much as possible (ideally, eliminate it) without altering the signal significantly.

Attenuating or, if possible, suppressing image noise is important, as the results of most computations on pixel values might be distorted by noise. An important example is computing image derivatives, which is the basis of many algorithms: any noise in the signal can result in serious errors in the derivatives (see Exercise 3.3). A common technique for noise smoothing is linear filtering, which consists in convolving the image with a constant matrix, called mask or kernel.³ As a reminder, here is the basic linear filtering algorithm.
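Model (3.4) translates directly into code. A minimal sketch (our own function name), implementing the saturated variant in which y takes only the values 0 and 1:

```python
import numpy as np

def salt_and_pepper(image, l=0.99, i_min=0, i_max=255, rng=None):
    """Corrupt an image following (3.4): pixels where x >= l are replaced by
    i_min + y (i_max - i_min); here y in {0, 1} (saturated salt and pepper)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.random(image.shape)                 # uniform in [0, 1)
    y = rng.integers(0, 2, image.shape)         # two-valued y
    noise = i_min + y * (i_max - i_min)
    return np.where(x < l, image, noise)
```

With l = 0.99, roughly 1% of the pixels are replaced by pure black or white spots, matching the parameters quoted for Figure 3.1 (c).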
Algorithm LINEAR_FILTER

Let I be an N x M image, m an odd number smaller than both N and M, and A the kernel of a linear filter, that is, an m x m mask. The filtered version I_A of I at each pixel (i, j) is given by the discrete convolution

    I_A(i, j) = (I * A)(i, j) = sum over h, k in [-m/2, m/2] of A(h, k) I(i - h, j - k),    (3.5)

where m/2 indicates integer division.

The filter replaces the value I(i, j) with a weighted sum of I values in a neighborhood of (i, j); the weights are the entries of the kernel. The effect of a linear filter on a signal can be better appreciated in the frequency domain. Through the convolution theorem, the Fourier transform of the convolution of I and A is simply the product of their Fourier transforms, F(I) and F(A). Therefore, the result of convolving a signal with A is to attenuate (or suppress) the signal frequencies corresponding to low (or zero) values of |F(A)|, the spectrum of the filter A.

³ Linear filtering with a constant kernel models space-invariant linear systems; the kernel is the impulse response of the filter.

3.2.1 Smoothing by Averaging

If all entries of A in (3.5) are non-negative, the filter performs average smoothing. The simplest smoothing kernel is the mean filter, which replaces a pixel value with the mean of its neighborhood; for instance, with m = 3,

    A = (1/9) ( 1 1 1
                1 1 1
                1 1 1 ).    (3.6)

If the sum of all kernel entries is not one, as happens for averaging kernels, I_A must be divided by the sum of the entries to avoid that the filtered image becomes brighter than the original one.

Why does such a filter attenuate noise? Intuitively, averaging takes out small variations: Averaging m values around pixel (i, j) divides the standard deviation of the noise by sqrt(m).

Frequency Behavior of the Mean Filter. In the frequency domain, the Fourier transform of a 1-D mean-filter kernel of width 2W is proportional to the "sinc" function sin(wW)/(wW) (Figure 3.3 shows an example in 2-D). Since the signal frequencies falling inside the main lobe are weighted more than the frequencies falling in the secondary lobes, the mean filter can be regarded as an approximate low-pass filter.

Limitations of Averaging.
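Equation (3.5) can be sketched naively in a few lines; the loop version is slow but makes the weighted-sum structure explicit. One caveat: (3.5) is a true convolution (the kernel is flipped), whereas the sketch below computes a plain correlation; for symmetric kernels such as the mean and Gaussian masks the two coincide. Border pixels where the mask does not fit are simply left unfiltered, one of several possible border policies.

```python
import numpy as np

def linear_filter(I, A):
    """Apply an m x m kernel A (m odd, symmetric) to image I as in (3.5).
    Borders where the mask does not fit are left unchanged."""
    N, M = I.shape
    m = A.shape[0]
    r = m // 2
    out = I.astype(float)
    for i in range(r, N - r):
        for j in range(r, M - r):
            out[i, j] = np.sum(A * I[i - r:i + r + 1, j - r:j + r + 1])
    return out

mean3 = np.ones((3, 3)) / 9.0   # the 3 x 3 mean filter of (3.6)
```

As the text notes, the kernel entries sum to one, so a constant image is left unchanged and an isolated impulse is diffused, not removed.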
Averaging is simple but has problems, including at least the following:

1. Signal frequencies shared with noise are lost; this implies that sharp signal variations are filtered out by averaging, and the image is blurred. As we shall see in Chapter 4, blurring affects the accuracy of feature localization.
2. Impulsive noise is only attenuated and diffused, not removed.
3. The secondary lobes in the Fourier transform of the mean filter let noise into the filtered image.

3.2.2 Gaussian Smoothing

Gaussian smoothing is a particular case of averaging, in which the kernel is a 2-D Gaussian. Its effect is illustrated by Figure 3.2, which shows the results of Gaussian smoothing applied to the noisy "checkerboards" of Figure 3.1 (b) and (c), corrupted by Gaussian and impulsive noise respectively. Notice that impulsive noise has only been attenuated; in fact, each spike has also been spread in space.

Frequency Behavior of the Gaussian Kernel. The Fourier transform of a Gaussian is still a Gaussian and, hence, has no secondary lobes. This makes the Gaussian kernel

Figure 3.2 (a) Results of applying Gaussian filtering (kernel 5 pixels wide, sigma = 1) to the "checkerboard" image corrupted by Gaussian noise, and grey-level profile along a row. (b) Same for the "checkerboard" image corrupted by salt-and-pepper noise.

a better low-pass filter than the mean filter. A comparison of the mean and Gaussian filters in both the spatial and frequency domains in 2-D is shown in Figure 3.3.

Separability of the Gaussian Kernel. Gaussian smoothing can be implemented efficiently thanks to the fact that the kernel is separable:

    I_G(i, j) = (I * G)(i, j)
              = sum over h of e^(-h^2 / 2 sigma^2) [ sum over k of e^(-k^2 / 2 sigma^2) I(i - h, j - k) ].    (3.7)

This means that convolving an image I with a 2-D Gaussian kernel G is the same as convolving first all rows, then all columns, with a 1-D Gaussian having the same sigma.
The advantage is that time complexity increases linearly with mask size, instead of quadratically (see Exercise 3.4). The next box gives the obvious algorithm for a separable kernel, implemented in the special case of the Gaussian kernel.

Figure 3.3 (a) The plot of a 5 x 5 Gaussian kernel with sigma = 1 (top) and its Fourier transform (bottom). (b) The same for a mean-filter kernel.

Figure 3.4 (a) 1-D Gaussian (dotted) and real samples (circles) for a 5-pixel kernel. (b) Plot of the corresponding integer kernel.

Algorithm SEPAR_FILTER

To convolve an image I with an m x m 2-D Gaussian kernel G of standard deviation sigma:

1. Build a 1-D Gaussian mask g, of width m, with the same sigma as G.
2. Convolve each row of I with g, yielding a new image I_r.
3. Convolve each column of I_r with g.

Building Gaussian Kernels. Thanks to the separability of the Gaussian kernel, we can consider only 1-D masks. To build a discrete Gaussian mask, one has to sample a continuous Gaussian. To do so, we must determine the mask width given the Gaussian kernel we intend to use, or, conversely, the sigma of the continuous Gaussian given the desired mask width. A relation between sigma and the mask width w (typically an odd number) can be obtained by imposing that w subtends most of the area under the Gaussian. An adequate choice is w = 5 sigma, which subtends 98.76% of the area. Fitting this portion of the Gaussian between the endpoints of the mask, we find that a 3-pixel mask corresponds to sigma_3 = 3/5 = 0.6 pixels; a 5-pixel mask to sigma_5 = 5/5 = 1 pixel; in general,

    sigma_w = w / 5.    (3.8)

Sampling a continuous Gaussian yields real kernel entries. Filtering times can be greatly reduced by approximated, integer kernels: image values being integers too, no floating point operations are necessary at all. To build an integer kernel, you simply normalize the real kernel to make its smallest entry 1, round off the results, and divide by the sum of the entries. Figure 3.4 shows the plot of a 1-D Gaussian profile, the real samples taken, and the integer kernel (1, 9, 18, 9, 1).
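SEPAR_FILTER and the w = 5 sigma rule of (3.8) can be sketched together. This is an illustrative implementation (our own function names); borders are zero-padded by `np.convolve`, one of several possible policies.

```python
import numpy as np

def gaussian_mask(w):
    """1-D Gaussian mask of odd width w, with sigma = w / 5 as in (3.8),
    normalized so the entries sum to one."""
    sigma = w / 5.0
    x = np.arange(w) - w // 2
    g = np.exp(-x * x / (2.0 * sigma * sigma))
    return g / g.sum()

def separable_gauss(I, w=5):
    """SEPAR_FILTER: convolve all rows, then all columns, with the 1-D mask."""
    g = gaussian_mask(w)
    rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode="same"), 1, I)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="same"), 0, rows)
```

For an N x N image and an m-wide mask, the two 1-D passes cost O(N^2 m) operations instead of the O(N^2 m^2) of a direct 2-D convolution, which is the linear-versus-quadratic gain mentioned in the text.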
Algorithm INT_GAUSS_KER

To build an approximate, integer Gaussian kernel G_int:

1. Compute a floating point kernel G(h, k) of the same size as G_int; let g_min be the minimum value of G (a corner entry).
2. Determine the normalization factor f = 1/g_min.
3. Compute the entries of the non-normalized integer kernel as G_int(h, k) = int(f G(h, k)), where int(x) indicates the integer closest to x.

3.2.3 Are our Samples Really Gaussian?

You must be aware that problems lurk behind the straightforward recipe we gave for building discrete Gaussian kernels. It is instructive to take a closer look at least at one: sampling. The pixelization imposes a fixed sampling interval of one pixel (the pixel width is taken as unit); by virtue of the sampling theorem (see Appendix, section A.3), all and only the components of frequency up to w_c = 2 pi (0.5) = pi can be reconstructed; everything above w_c is lost. Notice that w_c is fixed by the pixelization step, not by the signal.

What are the consequences for building discrete Gaussian kernels? In the continuum, the Fourier transform of the Gaussian

    g(x, sigma) = e^(-x^2 / 2 sigma^2)

is the Gaussian g(w, sigma'), with sigma' = 1/sigma. As g(w, sigma') is not bandlimited, sampling g(x, sigma) on the pixel grid implies necessarily the loss of all components with frequency above w_c. For relatively small sigma, this means that the Fourier transform of g(x, sigma), g(w, sigma'), is substantially different from zero well outside the interval [-pi, pi], as shown in Figure 3.5. To avoid aliasing, the best we can do is to try keeping most of the energy of g(w, sigma') within the interval [-pi, pi]. Applying the "98.76% of the area" criterion in the frequency domain, we find

    5 sigma' = 5 / sigma <= 2 pi,

that is, sigma >= 5 / (2 pi), roughly 0.796. The preceding inequality tells you that you cannot sample appropriately a Gaussian kernel whose sigma is less than 0.8 (in pixel units), no matter how many spatial samples you keep!

We can also interpret this result in terms of the minimum size for a Gaussian kernel. Since sigma = w/5, for w = 3 we have sigma = 0.6. Therefore, you cannot build a faithful Gaussian kernel with just 3 samples. For w = 5, instead, we have sigma = 1, which means that 5 samples are enough.

What happens if you ignore all this? Figure 3.6 shows that the inverse FFT of the FFT of the original Gaussian g(x, sigma) is significantly different from g(x, sigma) for sigma = 0.6 (w = 3).
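INT_GAUSS_KER can be sketched for the 1-D case as below (our own function name). Note that the integers you obtain depend on the sigma used to sample the Gaussian; with the sigma = w/5 convention of (3.8) and w = 5, this sketch yields (1, 4, 7, 4, 1), whereas the kernel shown in Figure 3.4 corresponds to a different sampling choice.

```python
import numpy as np

def int_gauss_kernel(w):
    """INT_GAUSS_KER, 1-D case: sample a Gaussian with sigma = w/5,
    scale so the smallest entry becomes 1, and round to integers.
    The (integer) sum of the entries is the final normalization factor."""
    sigma = w / 5.0
    x = np.arange(w) - w // 2
    g = np.exp(-x * x / (2.0 * sigma * sigma))
    f = 1.0 / g.min()                   # step 2: normalization factor
    return np.rint(f * g).astype(int)   # step 3: closest integers
```

During filtering one accumulates integer products and divides once by the kernel sum, so no floating point operations are needed on integer images.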
In accordance with our prediction, a much smaller difference is found for sigma = 1 (w = 5).

Figure 3.5 The Fourier transforms of two sampled Gaussians, for w = 3 (sigma = 0.6, dotted line) and w = 5 (sigma = 1, solid line). Notice that a larger portion of the transform corresponding to sigma = 0.6 is lost outside [-pi, pi].

Repeated averaging (RA) is a simple and efficient way to approximate Gaussian smoothing. It is based on the fact that, by virtue of the central limit theorem, convolving a 3 x 3 averaging mask n times with an image I approximates the convolution of I with a Gaussian mask of sigma = sqrt(n/2) and size (2n + 1) x (2n + 1).

Notice that RA leads to a different relation between sigma and the mask width from the one we obtained from the area criterion (Exercise 3.6).

Figure 3.6 Continuous Gaussian kernel (dotted), sampled real kernel, and continuous kernels reconstructed from the samples (solid), for sigma = 0.6 (w = 3) (a) and sigma = 1 (w = 5) (b), respectively.

Algorithm REP_AVG

Let * indicate convolution, and let A be the 3 x 3 RA mask

    A = (1/16) ( 1 2 1
                 2 4 2
                 1 2 1 ).    (3.9)

Let I be the input image. To convolve I with an approximated Gaussian kernel of sigma = sqrt(n/2):

1. Set I_0 = I.
2. For i = 1, ..., n: I_i = I_{i-1} * A.

You might be tempted to combine separability and repeated averaging, as this would yield a very efficient algorithm indeed. But are you sure that the kernel defined in REP_AVG is separable? Using separability with a nonseparable kernel means that the result of REP_AVG is different from the application of the 2-D mask, which may result in errors in further processing; image differentiation is once again an apt example. A safe way to combine separability and repeated averaging is cascading.
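REP_AVG can be sketched using the separability of each single pass: the 3 x 3 mask in (3.9) is the outer product of the 1-D binomial mask (1, 2, 1)/4 with itself, so every pass can be done as a row pass followed by a column pass. The function names are ours, and the sigma = sqrt(n/2) relation follows from the reconstruction above (each binomial pass has 1-D variance 1/2).

```python
import numpy as np

g1 = np.array([1.0, 2.0, 1.0]) / 4.0   # 1-D binomial mask; A = outer(g1, g1)

def avg_pass(I):
    """One separable pass of the 3 x 3 RA mask: rows, then columns."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, g1, mode="same"), 1, I)
    return np.apply_along_axis(lambda c: np.convolve(c, g1, mode="same"), 0, rows)

def rep_avg(I, n):
    """REP_AVG: n passes of the RA mask approximate Gaussian smoothing."""
    out = I.astype(float)
    for _ in range(n):
        out = avg_pass(out)
    return out
```

Because each pass reuses the same small separable mask, this is also an instance of the cascading idea discussed next.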
The idea is that smoothing with Gaussian kernels of increasingly large standard deviations can also be achieved by convolving an image repeatedly with the same Gaussian kernel. In this way, each filtering pass of REP_AVG is surely separable (see Exercise 3.7).

3.2.4 Nonlinear Filtering

In section 3.2.1, we listed the main problems of the averaging filter: blur, poor feature localization, secondary lobes in the frequency domain, and incomplete suppression of peak noise. Gaussian filters solve only the third one, as the Fourier transform of a Gaussian has no secondary lobes. The remaining problems are tackled efficiently by nonlinear filtering, that is, filtering methods which cannot be modelled by convolution. The median filter is a useful representative of this class. A median filter simply replaces each pixel value I(i, j) with the median of the values found in a local neighborhood of (i, j). As with averaging, the larger the neighborhood, the smoother the result.

Algorithm MED_FILTER

Let I be the input image, I_m the filtered image, and n an odd number. For each pixel (i, j):

1. Compute the median m(i, j) of the values in the n x n neighborhood of (i, j), that is, of I(i + h, j + k) with h, k in [-n/2, n/2], where n/2 indicates integer division.
2. Set I_m(i, j) = m(i, j).

Figure 3.7 (a) Results of applying median filtering (5 pixels wide) to the "checkerboard" image corrupted by Gaussian noise, and grey-level profile along a row. (b) Same for the "checkerboard" image corrupted by impulsive noise.

Figure 3.7 shows the effects of median filtering on the "checkerboard" image corrupted by Gaussian and impulsive noise (Figure 3.1 (b) and (c), respectively). Compare these results with those obtained by Gaussian smoothing (Figure 3.2): Median filtering has suppressed impulsive noise completely. Contours are also blurred less by the median than by the Gaussian filter; therefore, a median filter preserves discontinuities better than linear, averaging filters.
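MED_FILTER is short to sketch. The border policy below (edge replication) is our own assumption, as is the function name; the algorithm box above leaves borders unspecified.

```python
import numpy as np

def med_filter(I, n=3):
    """MED_FILTER: replace each pixel with the median of its n x n
    neighborhood (n odd). Borders are handled by edge replication."""
    r = n // 2
    P = np.pad(I, r, mode="edge")
    out = np.empty(I.shape, dtype=float)
    for i in range(I.shape[0]):
        for j in range(I.shape[1]):
            out[i, j] = np.median(P[i:i + n, j:j + n])
    return out
```

The behavior claimed in the text is easy to check: an isolated impulse is removed entirely (the median of the neighborhood ignores the outlier), while a constant region is left untouched.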
3.3 Summary

After working through this chapter you should be able to:

- explain the concepts of noise and image noise, and why noise smoothing is important for computer vision;
- design noise-smoothing algorithms using Gaussian and median filtering;
- decide whether it is appropriate to use linear or median smoothing filters in specific cases.

3.4 Further Readings

Noise filtering and image restoration are classic topics of signal and image processing. Detailed discussions of image processing methods are found in several books; for instance, [4, 3, 10, 8]. Papoulis [7] is a good reference text for Fourier transforms. Repeated averaging for computer vision was first reported by Brady et al. [1]. Cai [2] discusses several linear filtering methods in the context of diffusion smoothing. Witkin [11] and Lindeberg [5] provide good introductions to scale-space representations, the study of image properties when smoothing with Gaussians of increasing standard deviation (the scale parameter). One reason for keeping multiple scales is that some image features may be lost after filtering with large kernels, but small kernels could keep in too much noise. Alternative methods for representing signals at multiple scales include pyramids [9] and wavelets [6] (see also references therein).

3.5 Review

Questions

3.1 Explain the concept of image noise, how it can be quantified, and how it can affect computer vision computations.
3.2 How would you estimate the quantization noise in a range image in terms of mean and standard deviation? Notice that this allows you to compare directly quantization and acquisition noise.
3.3 Explain why a non-negative kernel works as a low-pass filter, and under what assumptions it can suppress noise.
3.4 What is a separable kernel? What are the advantages of separability?
3.5 What are the problems of the mean filter for noise smoothing? Why and in what sense is Gaussian smoothing better?
3.6 Explain why the sampling accuracy of a 1-D Gaussian filter with sigma = 0.6 cannot be improved using more than three spatial samples.
3.7 What is repeated averaging? What are its effects and benefits?
3.8 What is the difference between cascading and repeated averaging?
3.9 Can you think of any disadvantage of cascading? (Hint: Which standard deviations can be achieved?)

Exercises

3.8 Consider the 1-D step profile f(i) = 1 for i in [0, 3], f(i) = 8 for i in [4, 8]. Work out the result of median filtering with n = 3, and compare the result with the output of filtering with the averaging mask 1/4 [1 2 1].
3.9 Median filtering can degrade thin lines. This can be partially avoided by using nonsquare neighborhoods. What neighborhood shape would you use to preserve horizontal or vertical lines, 1 pixel wide?

Project

3.1 Write programs implementing Gaussian and median noise filtering. The code should allow you to specify the filter's width. The Gaussian implementation should be made as efficient as possible.

References

[1] M. Brady, J. Ponce, A. Yuille and M. Asada, Describing Surfaces, Computer Vision, Graphics and Image Processing, Vol. 32, pp. 1-28 (1985).
[2] L.D. Cai, Scale-Based Surface Understanding Using Diffusion Smoothing, PhD Thesis, Department of Artificial Intelligence, University of Edinburgh (1990).
[3] R.C. Gonzalez and R.E. Woods, Digital Image Processing, Addison-Wesley, Reading (MA) (1992).
[4] R.M. Haralick and L.G. Shapiro, Computer and Robot Vision, Vol. I, Addison-Wesley, Reading (MA) (1992).
[5] T. Lindeberg, Scale-Space for Discrete Signals, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-12, no. 3, pp. 234-254 (1990).
[6] S.G. Mallat, A Theory of Multiresolution Image Processing: the Wavelet Representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-11, no. 7, pp. 674-693 (1989).
[7] A. Papoulis, The Fourier Integral and its Applications, McGraw-Hill, New York (1962).
[8] W.K. Pratt, Digital Image Processing, Wiley, New York (1991).
[9] A. Rosenfeld, Multiresolution Image Processing, Springer-Verlag, New York (1984).
[10] A. Rosenfeld and A.C. Kak, Digital Picture Processing, Academic Press, London (1976).
[11] A.P. Witkin, Scale-Space Filtering, Proc. 8th Int. Conf. on Artificial Intelligence (IJCAI), Karlsruhe, pp. 1019-1022 (1983).

4 Image Features

Paolo Conte, Bartali

This and the following chapter consider the detection, location and representation of special parts of the image, called image features, usually corresponding to interesting elements of the scene.

Chapter Overview

Section 4.1 introduces the concept of image feature, and sketches the fundamental issues of feature detection, on which many computer vision algorithms are based.

Section 4.2 deals with edges, or contour fragments, and how to detect them. Edge detectors are the basis of the line and curve detectors presented in the next chapter.

Section 4.3 presents features which do not correspond necessarily to geometric elements of the scene, but are nevertheless useful.

Section 4.4 discusses surface features and surface segmentation for range images.

What You Need to Know to Understand this Chapter

- Working knowledge of Chapters 2 and 3.
- Basic concepts of signal theory.
- Eigenvalues and eigenvectors of a matrix.
- Elementary differential geometry, mainly surface curvatures (Appendix, section A.5).

4.1 What Are Image Features?

In computer vision, the term image feature refers to two possible entities:

1. a global property of an image or part thereof, for instance the average grey level, the area in pixels (global feature); or
2. a part of the image with some special properties, for instance a circle, a line, or a textured region in an intensity image, a planar surface in a range image (local feature).

The sequence of operations of most computer vision systems begins by detecting and locating some features in the input images.
In this and the following chapter, we concentrate on the second definition above, and illustrate how to detect special parts of intensity and range images like points, curves, particular structures of grey levels, or surface patches. The reason for this choice is that most algorithms in the following chapters assume that specific, local features have already been located. Here, we provide ways of doing that. Global features are indeed used in computer vision, but are less useful to solve the problems tackled by Chapters 7, 8, 9, 10 and 11. We assume therefore the following definition.

Definition: Image Features

Image features are local, meaningful, detectable parts of the image.

Meaningful means that the features are associated to interesting scene elements via the image formation process. Typical examples of meaningful features are sharp intensity variations created by the contours of the objects in the scene, or image regions with uniform grey levels, for instance images of planar surfaces. Sometimes the image features we look for are not associated obviously to any part or property of the scene, but reflect particular arrangements of image values with desirable properties, like invariance or ease of detectability. For instance, section 4.3 discusses an example of features which prove adequate for tracking across several images (Chapter 8). On the other hand, the number of pixels of grey level 134 makes a rather unuseful feature: in general, it cannot be associated to any interesting property of the scene, and individual grey levels change with illumination and viewpoint.

Detectable means that location algorithms must exist, otherwise a particular feature is of no use! Different features are, of course, associated to different detection algorithms; these algorithms output collections of feature descriptors, which specify the position and other essential properties of the features found in the image.
For instance, a descriptor for line features could specify the coordinates of the segment's central point, the segment's length, and its orientation. Feature descriptors are used by higher-level programs; for instance, in this book, chains of edge points (section 4.2) are used by line detectors (Chapter 5); lines, in turn, are used by calibration (Chapter 6) and recognition algorithms (Chapter 10).

In 3-D computer vision, feature extraction is an intermediate step, not the goal of the system. We do not extract lines just to obtain line maps; we extract lines to navigate robots in corridors, to decide whether an image contains certain objects, to calibrate the intrinsic parameters of a camera, and so on. The important corollary is that it does not make much sense to pursue "perfect" feature extraction per se, as the adequacy of a feature extractor depends on the task it supports.

Algorithm HYSTERESIS_THRESH

The input is formed by E, the edge-strength image; the edge orientation image; and two thresholds, tau_l and tau_h, with tau_l < tau_h. Scanning E in a fixed order:

1. Locate the next unvisited edge pixel (i, j) such that E(i, j) > tau_h.
2. Starting from (i, j), follow the chains of connected local maxima, in both directions perpendicular to the edge normal, as long as E > tau_l. Mark all visited points.
3. Save the locations of all points in the connected contour found.

The output is a set of lists, each describing the position of a connected contour in the image, as well as the strength and the orientation images, describing the properties of the edge points.

Hysteresis thresholding reduces the probability of false contours, as they must produce a response higher than tau_h to occur, as well as the probability of streaking, which now requires much larger fluctuations to occur than in the single-threshold case. If the noise level is large, the thresholds must be raised accordingly. Notice that HYSTERESIS_THRESH performs edge tracking: it finds chains of connected edge maxima, or connected contours.
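A minimal sketch of two-threshold edge linking follows. It simplifies the scheme above in one respect, stated in the comments: instead of following chains strictly perpendicular to the edge normal (which requires the orientation image), it grows contours through any of the 8 neighbours.

```python
import numpy as np

def hysteresis_thresh(strength, t_low, t_high):
    """Two-threshold edge linking, a simplified sketch of HYSTERESIS_THRESH.

    Assumption: contours are grown through any of the 8 neighbours rather
    than strictly perpendicular to the edge normal, so no orientation image
    is needed.  Returns a boolean edge map.
    """
    rows, cols = strength.shape
    edges = np.zeros((rows, cols), dtype=bool)
    # seeds: pixels above the high threshold
    stack = list(zip(*np.nonzero(strength > t_high)))
    while stack:
        i, j = stack.pop()
        if edges[i, j]:
            continue
        edges[i, j] = True
        # extend the contour while strength stays above the low threshold
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols \
                        and not edges[ni, nj] and strength[ni, nj] > t_low:
                    stack.append((ni, nj))
    return edges
```

Weak responses survive only when connected to a strong seed, which is precisely the property that suppresses streaking.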
The descriptors for such chains, saved by HYSTERESIS_THRESH in addition to edge point descriptors, can be useful for curve detection.

Notice that Y-junctions will be split by NONMAX_SUPPRESSION. How serious this is depends on what the edges are computed for. A possible solution is to modify HYSTERESIS_THRESH so that it recognizes Y-junctions and interrupts all edges there.

Figure 4.6 shows the output of our implementation of NONMAX_SUPPRESSION and HYSTERESIS_THRESH when run on the images in Figure 4.5. All contours are one pixel wide, as desired.

Figure 4.6 Output of HYSTERESIS_THRESH run on Figure 4.5, showing the effect of varying the filter's size. Left to right: sigma = 1, 2, 3 pixels. The grey levels have been inverted (black on white) for clarity.

4.2.3 Other Edge Detectors

Early edge detection algorithms were less formalized mathematically than Canny's. We sketch two examples, the Roberts and the Sobel edge detectors, which are easily implemented in their essential form.

Algorithm ROBERTS_EDGE_DET

The input is formed by an image, I, and a threshold, tau.

1. Apply noise smoothing as appropriate (for instance, Gaussian smoothing in the absence of information on noise; see Chapter 3), obtaining a new image, I_s.
2. Filter I_s (algorithm LINEAR_FILTER, Chapter 3) with the masks

   1  0        0  1
   0 -1       -1  0

   obtaining two images, I_1 and I_2.
3. Estimate the gradient magnitude at each pixel (i, j) as G(i, j) = sqrt(I_1(i, j)^2 + I_2(i, j)^2), obtaining an image of gradient magnitudes, G.
4. Mark as edges all pixels (i, j) such that G(i, j) > tau.

The output is the location of edge points obtained in the last step.

Algorithm SOBEL_EDGE_DET

Same as ROBERTS_EDGE_DET, but replace step 2 with the following:

2. Filter I_s (algorithm LINEAR_FILTER, Chapter 3) with the masks

   -1  0  1        1  2  1
   -2  0  2        0  0  0
   -1  0  1       -1 -2 -1

   obtaining two images, I_1 and I_2.

Notice that the element specific to these two detectors is the edge-enhancing filter (see Review Questions). Figure 4.7 shows an example of Sobel edge detection.

4.2.4 Concluding Remarks on Edge Detection

Evaluating Edge Detectors.
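A direct sketch of SOBEL_EDGE_DET's enhancement and thresholding steps follows; smoothing (step 1) is left to the caller, and the convolution is written out explicitly rather than through LINEAR_FILTER.

```python
import numpy as np

def sobel_edges(image, tau):
    """Sketch of SOBEL_EDGE_DET: Sobel gradient magnitude plus a threshold.

    Smoothing (step 1 of the algorithm box) is assumed done by the caller.
    Returns the gradient-magnitude image and a boolean edge map; the
    one-pixel image border is left at zero.
    """
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)
    ky = np.array([[1, 2, 1],
                   [0, 0, 0],
                   [-1, -2, -1]], dtype=float)
    rows, cols = image.shape
    g = np.zeros((rows, cols))
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            patch = image[i - 1:i + 2, j - 1:j + 2]
            gx = np.sum(kx * patch)       # horizontal-derivative mask
            gy = np.sum(ky * patch)       # vertical-derivative mask
            g[i, j] = np.hypot(gx, gy)
    return g, g > tau
```

On a vertical step edge of height 10, the mask weights (1, 2, 1 across three rows, step of 10 across the mask) give a response of 40, so a threshold between the noise floor and 40 isolates the contour.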
The ultimate evaluation for edge detectors which are part of larger vision systems is whether or not a particular detector improves the performance of the global system, other conditions being equal. For instance, within an inspection system, the detector leading to the best accuracy in the target measurements, and acceptably fast, is to be preferred.

Figure 4.7 Left: output of the Sobel edge enhancer run on Figure 4.1. Middle: edges detected by thresholding the enhanced image at 38. Right: same, thresholding at 50. Notice that some contours are thicker than one pixel (compare with Figure 4.5).

However, it is useful to evaluate edge detectors per se as well.

The geometric interpretation of the matrix C can be understood through a few particular cases. First, consider a perfectly uniform Q: the image gradient vanishes everywhere, C becomes the null matrix, and we have lambda_1 = lambda_2 = 0. Second, assume that Q contains an ideal black-and-white step edge: we have lambda_2 = 0, lambda_1 > 0, and the eigenvector associated with lambda_1 is parallel to the image gradient. Note that C is rank-deficient in both cases, with rank 0 and 1 respectively. Third, assume that Q contains the corner of a black square against a white background: as there are two principal directions in Q, we expect lambda_1 >= lambda_2 > 0, and the larger the eigenvalues, the stronger (the higher contrast) their corresponding image lines. At this point, you have probably caught on with the fact that the eigenvectors encode edge directions, the eigenvalues edge strength. A corner is identified by two strong edges; therefore, as lambda_1 >= lambda_2, a corner is a location where the smaller eigenvalue, lambda_2, is large enough.

Figure 4.8 Corners found in an 8-bit, synthetic checkerboard image, corrupted by two realizations of synthetic Gaussian noise of standard deviation 2. The corner is the bottom right point of each 15 x 15 neighbourhood (highlighted).

Time for examples. Figure 4.8 shows the corners found in a synthetic image of a checkerboard, with and without additive noise. Figure 4.9 shows the corners found in the image of a building, and the histogram of the lambda_2 values.
The shape of this histogram is rather typical of most natural images. If the image contains uniform regions or many almost-ideal step edges, the histogram has a second peak at lambda_2 = 0. The tail (right) of the histogram is formed by the points for which lambda_2 is large, which are precisely the points (or, equivalently, the neighbourhoods) we are interested in. Figure 4.10 shows another example with a road scene.

Figure 4.9 (a): original image of a building. (b): the 15 x 15 pixel neighbourhoods of some of the image points for which lambda_2 > 0.6. (c): histogram of lambda_2 values across the image.

Figure 4.10 (a): image of an outdoor scene. The corner is the bottom right point of each 15 x 15 neighbourhood (highlighted). (b): corners found using a 15 x 15 neighbourhood.

We reiterate that our feature points include high-contrast image corners and junctions of the scene itself (as the corners in Figure 4.8), but also corners of the local intensity pattern not corresponding to obvious scene features (as some of the corners in Figure 4.10). In general, the method locates image points where the intensity pattern has two well-pronounced, distinct directions, associated to eigenvalues both significantly larger than zero.

Let us summarize the procedure for locating this new type of image features.

Algorithm CORNERS

The input is formed by an image, I, and two parameters: the threshold on lambda_2, tau, and the linear size of a square window (neighbourhood), say 2N + 1 pixels.

1. Compute the image gradient over the entire image I.
2. For each image point p:
   (a) form the matrix C of (4.9) over a (2N + 1) x (2N + 1) neighbourhood Q of p;
   (b) compute lambda_2, the smaller eigenvalue of C;
   (c) if lambda_2 > tau, save the coordinates of p into a list, L.
3. Sort L in decreasing order of lambda_2.
4. Scanning the sorted list top to bottom: for each current point, p, delete all points appearing further on in the list which belong to the neighbourhood of p.

The output is a list of feature points for which lambda_2 > tau and whose neighbourhoods do not overlap.

Algorithm CORNERS has two main parameters: the threshold, tau, and the size of the neighbourhood, (2N + 1).
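A minimal sketch of algorithm CORNERS follows; the gradient is estimated with simple central differences (an assumption, since the book leaves the gradient estimator open), and the non-overlap pruning of step 4 is included.

```python
import numpy as np

def corners(image, tau, N=2):
    """Sketch of algorithm CORNERS.

    For each pixel, builds the matrix C of (4.9) (sums of gradient products
    over a (2N+1)x(2N+1) window) and keeps the point if the smaller
    eigenvalue of C exceeds tau.  Gradients use numpy central differences,
    an assumption not prescribed by the algorithm box.
    Returns a list of (i, j, lambda2) triples, strongest first.
    """
    Ey, Ex = np.gradient(image.astype(float))   # row and column derivatives
    candidates = []
    rows, cols = image.shape
    for i in range(N, rows - N):
        for j in range(N, cols - N):
            ex = Ex[i - N:i + N + 1, j - N:j + N + 1]
            ey = Ey[i - N:i + N + 1, j - N:j + N + 1]
            C = np.array([[np.sum(ex * ex), np.sum(ex * ey)],
                          [np.sum(ex * ey), np.sum(ey * ey)]])
            lam2 = np.linalg.eigvalsh(C)[0]     # smaller eigenvalue
            if lam2 > tau:
                candidates.append((i, j, lam2))
    # steps 3-4: sort by decreasing lambda2, suppress overlapping windows
    candidates.sort(key=lambda t: -t[2])
    kept = []
    for i, j, lam2 in candidates:
        if all(abs(i - ik) > 2 * N or abs(j - jk) > 2 * N for ik, jk, _ in kept):
            kept.append((i, j, lam2))
    return kept
```

On a synthetic image containing a single bright square, the surviving feature points cluster at the square's corners, where both eigenvalues are large; along the straight sides, lambda_2 stays near zero.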
The threshold, tau, can be estimated from the histogram of lambda_2 (Exercise 4.6), as the latter often has an obvious valley near zero (Figure 4.9). Notice that such a valley is not always present (Exercise 4.7). Unfortunately, there is no simple criterion for the estimation of the optimal size of the neighbourhood. Experience indicates that choices of N between 2 and 10 are adequate in most practical cases.

In the case of corner points, the value of N is linked to the location of the corner within the neighbourhood. As you can see from Figure 4.9, for relatively large values of N the corner tends to move away from the neighbourhood center (see Exercise 4.8 for a quantitative analysis of this effect).

4.4 Surface Extraction from Range Images

Many 3-D objects, especially man-made ones, can be conveniently described in terms of the shape and position of the surfaces they are made of. For instance, you can describe a cone as an object formed by two surface patches, one conical and one planar, the latter perpendicular to the axis of the former. Surface-based descriptions are used for object classification, pose estimation, and reverse engineering, and are ubiquitous in computer graphics.

As we have seen in Chapter 2, range images are basically sampled versions of the visible surfaces in the scene. Therefore, ignoring the distortion introduced by sensor imperfections, the shape of the image surface and the shape of the visible scene surfaces are the same, and any geometric property holding for one holds for the other too. This section presents a well-known method to find patches of various shapes composing the visible surface of an object. The method, called HK segmentation, partitions a range image into regions of homogeneous shape, called homogeneous surface patches, or just surface patches for short. The method is based on differential geometry; the Appendix, section A.5, gives a short summary of the basic concepts necessary.
The solutions to several computer vision problems involving 3-D object models are simpler when using 3-D features than 2-D features, as image formation must be taken into account for the latter. Note that surface patches are the basic ingredients for building surface-based CAD models of an object.

Figure 4.11 Illustration of the local shapes resulting from the HK classification.

Problem Statement: HK Segmentation of Range Images

Given a range image, R, in r_ij form, compute a new image, registered with R and of the same size, in which each pixel is associated with a local shape class selected from a given dictionary.

To solve this problem, we need two tools: a dictionary of shape classes, and an algorithm determining which shape class approximates best the surface at each pixel.

4.4.1 Defining Shape Classes

Since we want to estimate surface shape at each point (pixel), we need a local definition of shape. Differential geometry provides a convenient one: using the sign of the mean curvature, H, and of the Gaussian curvature, K, we can classify the local surface shape as shown in Table 4.1, and illustrated by Figure 4.11.

In the table, concave and convex are defined with respect to the viewing direction: a hole in the range surface is concave, and its principal curvatures (Appendix, section A.5) negative. At cylindrical points, one of the two principal curvatures vanishes, as for instance at any point of a simple cylinder or cone (not the vertex).
At elliptic points, both principal curvatures have the same sign, and the surface looks locally like either the inside of a bowl (if concave) or the tip of a nose (if convex). At hyperbolic points, the principal curvatures are nonzero and have different signs: the surface looks like a saddle.

Table 4.1 Surface patch classification scheme.

K    H    Local shape
0    0    planar
0    -    concave cylindrical
0    +    convex cylindrical
+    -    concave elliptic
+    +    convex elliptic
-    any  hyperbolic

Notice that this classification is qualitative, in the sense that only the sign of the curvatures, not their magnitude, influences the result. This offers some robustness, as signs can often be estimated correctly even when magnitude estimates become unreliable.

4.4.2 Estimating Local Shape

Given Table 4.1, all we have to do is to recall the appropriate expressions of H and K, evaluate them at each image point, and use the signs of H and K to index Table 4.1. Here is how to compute H and K from a range image, h, in r_ij form (subscripts indicate partial differentiation):

K = (h_xx h_yy - h_xy^2) / (1 + h_x^2 + h_y^2)^2   (4.10)

H = ((1 + h_x^2) h_yy - 2 h_x h_y h_xy + (1 + h_y^2) h_xx) / (2 (1 + h_x^2 + h_y^2)^(3/2))   (4.11)

Unfortunately, we cannot expect good results without sorting out a few details. First, the input image contains noise, and this distorts the numerical estimates of derivatives and curvatures; noise smoothing is therefore required. Notice that the worst noise may be due to quantisation (if the 8-bit image does not capture all significant depth variations of the scene), or to limited sensor accuracy. The low acquisition noise of state-of-the-art laser scanners should not jeopardise seriously the quality of the HK segmentation.
Segmentation can fail, however, because image quantisation and resolution are not sufficient given the objects and the stand-off distance.

Gaussian smoothing, as any averaging filter, tends to underestimate high curvatures, and to introduce spurious curvatures around contours (Exercise 4.9).

Second, the result may still contain small, noisy patches, even when smoothing is applied to the data. Small patches can be eliminated by additional filtering (Exercise 4.10). Third, planar patches should yield H = K = 0, but numerical estimates of H and K will never be exactly zero. To decide which small numbers can be safely considered zero, we can establish zero-thresholds for H and K. In this case, the accuracy with which planar patches are extracted depends (among others) on the noise level, the orientation of the plane, and the values of the thresholds. Fourth, estimating derivatives and curvatures does not make sense at surface discontinuities. To skip discontinuities, we could therefore run an edge detector on R (e.g., CANNY_EDGE_DETECTOR for step edges), keep a map of edge points, and skip them when estimating H and K.

To summarize, here is a basic HK segmentation algorithm.

Algorithm RANGE_SURF_PATCHES

The input is a range image, R, in r_ij form, and a set of six shape labels, s_1, ..., s_6, associated to the classes of Table 4.1.

1. Apply Gaussian smoothing to R, obtaining R_s.
2. Compute the images of the derivatives h_x, h_y, h_xx, h_yy, h_xy (Appendix, section A.2).
3. Compute the H and K images, using (4.11) and (4.10).
4. Compute the shape image, S, by assigning a shape label to each pixel, according to the rules in Table 4.1.

The output is the shape image S.

Figures 4.12 and 4.13 show two examples of HK segmentation. The range data were acquired and processed using the range image acquisition and processing systems developed by the Computer Vision Group of Heriot-Watt University. The objects are mechanical components formed by planar and curved surfaces.
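Steps 2 to 4 of RANGE_SURF_PATCHES can be sketched as follows; derivatives use finite differences, and the zero-thresholds eps_H and eps_K below are illustrative values, not prescribed by the algorithm (smoothing is left to the caller).

```python
import numpy as np

def hk_shape_image(R):
    """Sketch of RANGE_SURF_PATCHES steps 2-4: estimate the mean and
    Gaussian curvatures H, K of a range image and classify each pixel's
    local shape by their signs.  eps_H and eps_K are assumed zero-thresholds.
    Returns (H, K, S) with S an array of shape labels.
    """
    eps_H, eps_K = 1e-3, 1e-3
    hy, hx = np.gradient(R.astype(float))   # first derivatives
    hxy, hxx = np.gradient(hx)              # second derivatives of hx
    hyy, _ = np.gradient(hy)
    g = 1.0 + hx**2 + hy**2
    K = (hxx * hyy - hxy**2) / g**2                      # Gaussian curvature
    H = ((1 + hx**2) * hyy - 2 * hx * hy * hxy
         + (1 + hy**2) * hxx) / (2 * g**1.5)             # mean curvature
    # index the table by the signs of K and H (0 within the zero-thresholds)
    sH = np.where(np.abs(H) < eps_H, 0, np.sign(H)).astype(int)
    sK = np.where(np.abs(K) < eps_K, 0, np.sign(K)).astype(int)
    labels = {(0, 0): "planar",
              (0, -1): "concave cylindrical", (0, 1): "convex cylindrical",
              (1, -1): "concave elliptic", (1, 1): "convex elliptic",
              (-1, -1): "hyperbolic", (-1, 0): "hyperbolic",
              (-1, 1): "hyperbolic"}
    S = np.empty(R.shape, dtype=object)
    for k in (-1, 0, 1):
        for h in (-1, 0, 1):
            S[(sK == k) & (sH == h)] = labels.get((k, h), "undefined")
    return H, K, S
```

A plane yields H = K = 0 everywhere (planar), while a saddle surface such as z = xy yields K < 0 (hyperbolic), matching the classification table.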
The figures show the input range data, the data after smoothing (both as grey-level image and as 3-D plot), and the patches detected.

Figure 4.12 (a): Input range image, grey coded (the darker, the closer to the sensor). (b): After smoothing, grey coded. (c): Same as top right, as 3-D isometric plot. (d): The patches detected by HK segmentation. Courtesy of M. Umasuthan, Heriot-Watt University.

In order to be used by subsequent tasks like classification or pose estimation, the output of RANGE_SURF_PATCHES is often converted into a list of symbolic patch descriptors. In each descriptor, a surface patch is associated with a number of attributes, which may include a unique identifier, position of patch center, patch area, information on normals and curvatures, contour representations, and pointers to neighbouring patches. Closed-form surface models (e.g., quadrics) are often fitted to the surface patches extracted by the HK segmentation, and only the model's coefficients and type (e.g., plane, cylinder, cone) stored in the symbolic descriptors.

Figure 4.13 (a): input range image, grey coded (the darker, the closer to the sensor). (b): after smoothing, grey coded. (c): same as top right, as 3-D isometric plot. (d): the patches detected by HK segmentation. Courtesy of M. Umasuthan, Heriot-Watt University.

4.5 Summary

After working through this chapter you should be able to:

- explain what image features are and how they relate to the 3-D world;
- design detectors for edges and point features, and performance tests for the related algorithms;
- design a simple HK segmentation program for range images.

4.6 Further Readings

Several books review the theory and algorithms of large collections of edge detectors, for instance [8, 9, 14]. John Canny's description of his edge detector is found in [4]. Spacek [15] derives a slightly different, optimal edge detector.
De Micheli et al. [6] and Pratt [14] give examples of discussions on performance evaluation of edge detectors. A very good electronic textbook on image processing, including material on feature detection, is HIPR (Hypermedia Image Processing Reference), published by Wiley (on-line information at http://www.wiley.co.uk/electronic/hipr).

Second-order derivative filters for edge detection were very popular in the eighties; a classic reference is Marr's book [13]. These filters look for the zero-crossings of the second derivative of a Gaussian-filtered image. Their disadvantages in comparison with Canny's detector include worse directional properties (being isotropic, their output contains contributions from the direction perpendicular to the edge normal; this increases noise without contributing to detection); moreover, they always produce closed contours, which do not always correspond to interesting edges. For a theoretical analysis of the main properties of first-order and second-order derivative filters, see Torre and Poggio [18].

The point feature (corner) detector CORNERS is based on Tomasi and Kanade's [16]; an application to motion-based reconstruction is described in [17], and discussed in Chapter 8. Further corner detectors are reported in [10, 3].

Besl and Jain [1], Hoffman and Jain [11], and Fan [7] are some variations of the HK segmentation method. Hoover et al. [12] and Trucco and Fisher [19] report useful experimental assessments of HK segmentation algorithms from range images. For examples of reverse engineering from range images see [5, 2].

4.7 Review

Questions

4.1 Describe the difference between local and global features, and give examples of image features from both classes.
4.2 Given our definition and classification of edges, discuss the differences between edges in intensity and range images. Do our edge types make sense for range images?
4.3 Would you apply intensity edge detection algorithms to range images? Would the algorithms require modifications? Why?
4.4 Consider the equation giving the orientation of the edge normal in algorithm CANNY_ENHANCER. Why can you ignore the aspect ratio of the pixel here, but not in the situation proposed by Question 2.8?
4.5 In section 4.2.2, we used the image gradient to estimate the edge normal. Discuss the practical approximations implied by this choice.
4.6 Explain why Y-junctions are split by HYSTERESIS_THRESH.
4.7 Discuss the differences between Sobel's edge enhancement scheme preceded by a Gaussian smoothing pass, and Canny's edge enhancement scheme. What are the main design differences? Do you expect different results? Why?
4.8 You can suppress short edge chains in the output of CANNY_EDGE_DETECTOR by filtering the input image with wider Gaussians. How would you achieve the same result with a Sobel edge detector?
4.9 How would you design an experimental comparison of Canny's and Sobel's edge detectors?
4.10 Why is the case K > 0, H = 0 not featured in Table 4.1?
4.11 Explain why HK segmentation cannot be applied to intensity images in the hope to find homogeneous scene surfaces.

Exercises

4.1 Consider the SNR measure formalizing the good detection criterion (4.3). Show that, if the filter has any symmetric components, the SNR measure will worsen (decrease). This shows that the best detection is achieved by purely antisymmetric filters.
4.2 Prove the detection-localization uncertainty equations (4.6) for step edges.

where c is a fixed fraction (e.g., 0.9) and the weights are proportional to the counters' values.

As an example, Figure 5.2 (a) shows a synthetic 64 x 64 image of two lines. Only a subset of the lines' points are present, and spurious points appear at random locations. Figure 5.2 (b) shows the counters in the associated (m, n) parameter space.
Figure 5.2 (a) An image containing two lines, sampled irregularly, and several random points. (b) Plot of the counters in the corresponding parameter space (how many points contribute to each cell (m, n)). Notice that the main peaks are obvious, but there are many secondary peaks.

We are now ready for the following algorithm.

Algorithm HOUGH_LINES

The input is E, an M x N binary image in which each pixel E(i, j) is 1 if an edge pixel, 0 otherwise. Let rho and theta be the arrays containing the discretized intervals of the (rho, theta) parameter space (rho in [0, sqrt(M^2 + N^2)], theta in [0, pi]), and R, T respectively their numbers of elements.

1. Discretize the parameter spaces of rho and theta using sampling steps delta_rho, delta_theta, which must yield acceptable resolution and manageable size for the arrays rho and theta.
2. Let A(R, T) be an array of integer counters (accumulators); initialize all elements of A to zero.
3. For each pixel, E(i, j), such that E(i, j) = 1, and for h = 1, ..., T:
   (a) let d = i cos(theta(h)) + j sin(theta(h));
   (b) find the index, k, of the element of rho closest to d;
   (c) increment A(k, h) by one.
4. Find all local maxima (k_p, h_p) such that A(k_p, h_p) > tau, where tau is a user-defined threshold.

The output is a set of pairs (rho(k_p), theta(h_p)), describing the lines detected in E in polar form.

If an estimate m(p) of the edge direction at image point p is available, and we assume that m(p) is also the direction of the line through p, a single cell, (m_p, n_p), with n_p = y_p - m_p x_p, can be identified. In this case, instead of the whole line, we increment only the counter (m_p, n_p); to allow for the uncertainty associated with edge direction estimates, we increment all the cells of a small segment centered in (m_p, n_p), the length of which depends inversely on the reliability of the direction estimates. This can speed up considerably the construction of the parameter space.

5.2.2 The Hough Transform for Curves

The HT is easily generalized to detect curves y = f(x, a), where a = [a_1, ..., a_P]^T is a vector of P parameters. The basic algorithm is very similar to HOUGH_LINES.

Algorithm HOUGH_CURVES

The input is as in HOUGH_LINES. Let y = f(x, a) be the chosen parametrization of the target curves.
1. Discretize the intervals of variation of a_1, ..., a_P with sampling steps yielding acceptable resolution and manageable size for the parameter space. Let S_1, ..., S_P be the sizes of the discretized intervals.
2. Let A(S_1, ..., S_P) be an array of integer counters (accumulators), and initialize all its elements to zero.
3. For each pixel E(i, j) such that E(i, j) = 1, increment all counters on the curve defined by y = f(x, a) in A.
4. Find all local maxima a_m such that A(a_m) > tau, where tau is a user-defined threshold.

The output is a set of vectors a_1, ..., a_m, describing the curve instances detected in E.

The size of the parameter space increases exponentially with the number of model parameters, and the time needed to find all maxima becomes rapidly unacceptable. This is a serious limitation. In particular, assuming for simplicity that the discretized intervals of all parameters have the same size N, the cost of an exhaustive search for a curve with P parameters is proportional to N^P. This problem can be tackled by variable-resolution parameter spaces (see Question 5.6).

5.2.3 Concluding Remarks on Hough Transforms

The HT algorithm is a voting algorithm: each point "votes" for all combinations of parameters which may have produced it if it were part of the target curve. From this point of view, the array of counters in parameter space can be regarded as a histogram. The final total of votes, c(m), in a counter of coordinates m indicates the relative likelihood of the hypothesis "a curve with parameter set m exists in the image."

The HT can also be regarded as pattern matching: the class of curves identified by the parameter space is the class of patterns. Notice that the HT is more efficient than direct template matching (comparing all possible appearances of the pattern with the image).
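As a concrete sketch of the voting scheme, line detection in the (rho, theta) parametrization might look like the following; peak detection is reduced to simple thresholding of the accumulator, whereas the algorithm box above asks for local maxima.

```python
import numpy as np

def hough_lines(E, d_theta=np.pi / 180, d_rho=1.0, tau=50):
    """Sketch of HOUGH_LINES: accumulate votes in a discretized (rho, theta)
    space for every edge pixel of the binary map E.

    Peak detection is simplified to thresholding the accumulator.
    Returns a list of (rho, theta) line candidates.
    """
    M, N = E.shape
    thetas = np.arange(0.0, np.pi, d_theta)
    rhos = np.arange(0.0, np.hypot(M, N) + d_rho, d_rho)
    A = np.zeros((len(rhos), len(thetas)), dtype=int)
    ii, jj = np.nonzero(E)
    for i, j in zip(ii, jj):
        for h, th in enumerate(thetas):
            d = i * np.cos(th) + j * np.sin(th)
            if d < 0:
                continue                      # keep rho in [0, rho_max]
            A[int(round(d / d_rho)), h] += 1  # vote for the nearest rho cell
    return [(rhos[k], thetas[h]) for k, h in zip(*np.nonzero(A > tau))]
```

For a horizontal line of edge pixels at row i = 10, every pixel votes for the same cell at theta = 0, rho = 10, so that cell collects one vote per pixel and stands far above the background counts.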
"The HT has several attractive features Fist, a all points are procesed indepen: dently, it eopes well with occlusion (fhe noise does ot result in peak as high as those ‘created by the shortest true lines) Secon itis relatively robust to noise, as spurious Secton53 Fitting Elipsesto image Data 101 points are unfikely to contribute consistently to any single bi, and just generate back ‘round noise. Thirdit detects multiple instances ofa model ina single pass ‘The major limiation of the Hs probably the rapid inerease ofthe search time with the numberof parameters inthe curve's representation, Another limitation i that non-target shapes cn produce spurious peaks in parameter space: For instane, line detection can be distarbed by low-curvature circles. 5.3. Fitting Elipses to image Data Many objects contancicular shapes, which almost always appear slipsesin intensity images (but see Exercise 5.5): for this reason, ellipse detectors are usefl tools for ‘computer vision. Theellipse detectors we consider take an image of ee points in input, {ndnd hobs lige ing te points Therefore ths sexton connate ele {iting nd assumes that we have identified a set of mage points plausibly belonging to a single arc of elise ‘Problem Statement: Elipse Fiting Letpr py beasetof image poins and voltae [2a 8ni] fip.a) =a" amar + bay ty? tae bey f=0 ‘heim equation ofthe generis mv aracterzed by the parameter vector ee Find the parameter vector, a, asocated tothe elise which fits p py bes in the east squares sense asthe soutonof| in (Dp. (1) where Dip table distance ‘Notice that the ecuation we wroe fr fia) is eally a generic conic We sal ave more ‘say about his point later Whats asta tance? Tee ae to main answers for ei iting, he Euclidean distance and the algebraic distance. * 5.3.1. Euclidean Distance Fit The first idea sto tryand minimize the Euclidean distance between the ellipse and the ‘mieasured points In this cae, problem (5.1) becomes sin > ip pl? 
(62) 102 Chapter More image Features ‘under the constrain that p belongs to the ellipse: (pa) =0. Geometrically, the Euclidean distance seems the most appropriate, Unfortunately, it leads only to an approximate, numerical algorithm, How does this happen? Let us try Lagrange muliplies to solve problem (52). We define an objective function b= ip— pil ~2/p.a, which yield p-p=2°/(Pa) 3) Since we do not know p, we try to expres it asa function of computable quantitics. To do this, we introduce two approximations 1. We considera first-order approximation ofthe curve = Fp.a)® f(Pi.a) + (PPI V/0..a) G4) 2, We assume that the p, are close enough to the curve, so that V (p) ~ V(@)- Approximation 2 allows us to rewrite (53) as p—Pi=AVf(P.a), which, plugged into (5.4), gives = £0018) Wn. ‘Substituting in (5.3), we finally find 0 ISFeei “Thisisthe equation we were after: Itallows uso replace, n problem (5.2), the unknown ‘quantity ip ~ pil with a function we can compute. The resulting algorithm iss follows: Ip pel= Algorithm EUCL_ELLIPSE_ FIT The inputisaset of Vimage pins p,-..-py- Weassume the notation inroducedin he problem statement box fr elie tng, 1. Stato an iii value my Section53FitingElipses to image Data 103, 2 Using ay asin point, run a numerical minimization to find the soation of ‘The outputs the efnng the best. ise © A reasonale inal value is the solution ofthe clsedorm algorithm discussed next (secon 532), How satisfactory is EUCL_FLLIPSE_FIT? Only partially, We started with the tmue (Euclidean) distance, the best possible, but were forced to iatroduce approxima- tions and arrived a 2 nonlinear minimization that can be solved only numerically, We are not even guaranteed thatthe best-fit solution isan ellipse: It could be any conic, as ‘we imposed no constraints ona. Moreover, we have all the usual problems of numerical ‘optimization, ineludcg how tofind a good initia estimate fora and how to avoid getting stuck in local minima = Thegood news however stat EUCL. 
2. Using a_0 as the initial point, run a numerical minimization to find the solution of

   min_a \sum_{i=1}^{N} f(p_i, a)^2 / ||\nabla f(p_i, a)||^2.

The output is the solution, a, defining the best-fit ellipse.

☞ A reasonable initial value is the solution of the closed-form algorithm discussed next (section 5.3.2).

How satisfactory is EUCL_ELLIPSE_FIT? Only partially. We started with the true (Euclidean) distance, the best possible, but were forced to introduce approximations and arrived at a nonlinear minimization that can be solved only numerically. We are not even guaranteed that the best-fit solution is an ellipse: it could be any conic, as we imposed no constraints on a. Moreover, we have all the usual problems of numerical optimization, including how to find a good initial estimate for a and how to avoid getting stuck in local minima.

☞ The good news, however, is that EUCL_ELLIPSE_FIT can be used for general conics. Of course, there is a risk that the result is not the conic we expect (see Further Readings).

A logical question at this point is: if using the true distance implies approximations and a numerical solution anyway, can we perhaps find an approximate distance leading to a closed-form solution without further approximations? The answer is yes, and the next section explains how to do it.

5.3.2 Algebraic Distance Fit

Definition: Algebraic Distance
The algebraic distance of a point p from a curve f(p, a) = 0 is simply |f(p, a)|.

The algebraic distance is different from the true geometric distance between a curve and a point; in this sense, we start off with an approximation. However, this is the only approximation we introduce, since the algebraic distance turns problem (5.1) into a linear problem that we can solve in closed form and with no further approximations. Problem (5.1) becomes

min_a \sum_{i=1}^{N} f(p_i, a)^2.    (5.5)

To avoid the trivial solution a = 0, we must enforce a constraint on a. Of the several constraints possible (see Further Readings), we choose one which forces the solution to be an ellipse:

a^T C a = -1,  with  C =
[  0  0 -2  0  0  0 ]
[  0  1  0  0  0  0 ]
[ -2  0  0  0  0  0 ]
[  0  0  0  0  0  0 ]
[  0  0  0  0  0  0 ]
[  0  0  0  0  0  0 ]    (5.6)

Notice that this can be regarded as a "normalized" version of the elliptical constraint b^2 - 4ac < 0, as a is only defined up to a scale factor.

We can find a solution to this problem with no approximations. First, we rewrite problem (5.5) as

min_a ||Xa||^2,    (5.7)

where

X =
[ x_1^2  x_1 y_1  y_1^2  x_1  y_1  1 ]
[ x_2^2  x_2 y_2  y_2^2  x_2  y_2  1 ]
[  ...     ...      ...   ...  ... . ]
[ x_N^2  x_N y_N  y_N^2  x_N  y_N  1 ]    (5.8)

In the terminology of constrained least squares, X is called the design matrix, S = X^T X the scatter matrix, and C the constraint matrix. Again using Lagrange multipliers, we obtain that problem (5.7) is solved by

S a = \lambda C a.    (5.9)

This is a so-called generalized eigenvalue problem, which can be solved in closed form.
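To make the closed-form fit concrete, here is a minimal NumPy sketch (the function name and interface are our own). Rather than passing the rank-deficient 6 × 6 generalized eigenproblem (5.9) directly to a numerical package, it uses the equivalent block reduction of Halir and Flusser, which eliminates the linear half of a and leaves a plain 3 × 3 eigenproblem on the quadratic coefficients:

```python
import numpy as np

def alg_ellipse_fit(x, y):
    """Closed-form algebraic ellipse fit (a sketch; name and interface
    are our own). Equivalent to solving S a = lambda C a as in (5.9),
    via the Halir-Flusser reduction to a 3x3 eigenproblem."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([x * x, x * y, y * y])    # quadratic part of (5.8)
    X2 = np.column_stack([x, y, np.ones_like(x)])  # linear part of (5.8)
    S1, S2, S3 = X1.T @ X1, X1.T @ X2, X2.T @ X2
    T = -np.linalg.solve(S3, S2.T)        # optimal linear part: a2 = T a1
    C1inv = np.array([[0.0, 0.0, 0.5],    # inverse of the 3x3 block of the
                      [0.0, -1.0, 0.0],   # constraint matrix, enforcing
                      [0.5, 0.0, 0.0]])   # the elliptical condition
    M = C1inv @ (S1 + S2 @ T)             # reduced problem: M a1 = mu a1
    w, V = np.linalg.eig(M)
    # exactly one eigenvector satisfies the ellipse condition 4ac - b^2 > 0
    cond = 4.0 * np.real(V[0]) * np.real(V[2]) - np.real(V[1]) ** 2
    a1 = np.real(V[:, int(np.argmax(cond))])
    a = np.concatenate([a1, T @ a1])      # full parameter vector [a..f]
    return a / np.linalg.norm(a)
```

For example, fitting exact points on the ellipse (x - 1)^2/9 + (y + 2)^2 = 1 recovers a parameter vector proportional to (1/9, 0, 1, -2/9, 4, 28/9).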
It can be proven that the solution, a, is the eigenvector corresponding to the only negative eigenvalue. Most numerical packages will find the solution of problem (5.9) for you, taking care of the fact that C is rank-deficient. The resulting algorithm is very simple.

Algorithm ALG_ELLIPSE_FIT
The input is a set of N image points p_1, ..., p_N. We assume the notation introduced in the problem statement box for ellipse fitting.
1. Build the design matrix, X, as per (5.8).
2. Build the scatter matrix, S = X^T X.
3. Build the constraint matrix, C, as per (5.6).
4. Use a numerical package to compute the eigenvalues of the generalized eigenvalue problem (5.9), and call λ the only negative eigenvalue.
The output is the best-fit parameter vector, a, given by the eigenvector associated to λ.

Figure 5.3 shows the result of ALG_ELLIPSE_FIT run on an elliptical arc corrupted by increasing quantities of Gaussian noise.

Figure 5.3 Examples of best-fit ellipses found by ALG_ELLIPSE_FIT for the same arc of ellipse, corrupted by increasingly strong Gaussian noise. From left to right, the noise varies from 3% to 20% of the data spread (figure courtesy of Maurizio Pilu, University of Edinburgh).

ALG_ELLIPSE_FIT tends to be biased towards low-eccentricity solutions; this is indeed a characteristic of all methods based on the algebraic distance. Informally, this means that the algorithm prefers fat ellipses to thin ellipses, as shown in Figure 5.4. The reason is best understood through the geometric interpretation of the algebraic distance.

Geometric Interpretation of the Algebraic Distance (Ellipses)
Consider a point p near an ellipse f(p, a) = 0. The algebraic distance, |f(p, a)|, is proportional to

Q = |(d/r)^2 - 1|,

where d is the distance of p from the ellipse's center, and r is the distance of the ellipse from the center along the same line, which goes through p and the center (Figure 5.5).

Figure 5.4 Illustration of the low-eccentricity bias introduced by the algebraic distance.
ALG_ELLIPSE_FIT was run on 20 samples covering half an ellipse, spaced uniformly along x, and corrupted by different realizations of rather strong Gaussian noise with constant standard deviation (σ = 0.05, about 10% of the smaller semiaxis). The best-fit ellipse (solid) is systematically biased to be "flatter" than the true one (dashed).

Figure 5.5 Illustration of the distances d and r in the geometric interpretation of the algebraic distance, Q. At parity of d, Q is larger at P' than at P.

Notice that this interpretation is valid for any conic. For a hyperbola, the center is the intersection of the asymptotes; for a parabola, the center is at infinity.

For any fixed d, Q is maximum at the intersection of the ellipse with its smaller axis (e.g., P' in Figure 5.5) and minimum at the intersection of the ellipse with its larger axis (e.g., P in Figure 5.5). Therefore, the algebraic distance is maximum (high weight) for observed points around the flat parts of the ellipse, and minimum (low weight) for observed points around the pointed parts. As a consequence, a fitting algorithm based on Q tends to believe that most data points are concentrated in the flatter parts of the ellipse, which results in "flatter" best-fit ellipses.

5.3.3 Robust Fitting

One question which might have occurred to you is: where do the data points for ellipse fitting come from? In real applications, and without a priori information on the scene, finding the points most likely to belong to a specific ellipse is a difficult problem. In some cases, it is reasonable to expect that the data points can be selected by hand. Failing that, we can rely on edge chaining, as described for HYSTERESIS_THRESH. In any case, it is very likely that the data points contain outliers; here, outliers are data points which violate the statistical assumptions of the estimator. In our case, an outlier is an edge point erroneously assumed to belong to an ellipse arc. Both EUCL_ELLIPSE_FIT and ALG_ELLIPSE_FIT, as least squares estimators, assume that all data points can be regarded as true points corrupted by additive, Gaussian noise; hence, even a small number of outliers can degrade their results badly.
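Experiments like those of Figures 5.3 through 5.6 need synthetic test data: points on an elliptical arc, spaced roughly uniformly along x, corrupted by Gaussian noise, with a fraction turned into outliers. A minimal NumPy sketch of such a generator (all names and default values are our own, not the book's):

```python
import numpy as np

def make_arc_data(n=40, a=1.0, b=0.5, sigma=0.05, outlier_frac=0.2,
                  outlier_amp=0.5, rng=None):
    """Synthetic test data in the style of the experiments above:
    points on the upper half of the ellipse x^2/a^2 + y^2/b^2 = 1,
    roughly uniform along x, with additive Gaussian noise and a
    fraction of outliers (a sketch; defaults are our own)."""
    rng = np.random.default_rng(rng)
    x = np.linspace(-0.95 * a, 0.95 * a, n)       # uniform along x
    y = b * np.sqrt(1.0 - (x / a) ** 2)           # upper half-ellipse
    pts = np.column_stack([x, y])
    pts += rng.normal(scale=sigma, size=pts.shape)       # Gaussian noise
    m = rng.random(n) < outlier_frac                     # pick outliers
    pts[m] += rng.uniform(-outlier_amp, outlier_amp,     # uniform deviate
                          size=(int(m.sum()), 2))
    return pts
```

Feeding such data to the least squares fits above reproduces the qualitative behavior discussed in the text: the fits degrade gracefully with `sigma`, but break down quickly as `outlier_frac` grows.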
Robust estimators are a class of methods designed to tolerate outliers.¹ A robust distance that often works well is the absolute value, which is adopted by the following algorithm for robust ellipse fitting.

¹ Section A.7 in the Appendix gives an introduction to robust estimators.

Algorithm ROB_ELLIPSE_FIT
The input is a set of N image points p_1, ..., p_N. We assume the notation introduced in the problem statement box for ellipse fitting.
1. Run ALG_ELLIPSE_FIT, and call its solution a_0.
2. Using a_0 as the initial point, run a numerical minimization to find the solution of

   min_a \sum_{i=1}^{N} |f(p_i, a)|.

The output is the solution, a, which identifies the best-fit ellipse.

Figure 5.6 illustrates the problems caused by outliers to ALG_ELLIPSE_FIT, and allows you to compare the results of ALG_ELLIPSE_FIT with those of ROB_ELLIPSE_FIT, started from the solution of ALG_ELLIPSE_FIT, in conditions of severe noise (that is, lots of outliers and Gaussian noise). Both algorithms were run on 40 points from half an ellipse, spaced uniformly along x, and corrupted by different realizations of Gaussian noise with constant standard deviation (σ = 0.05, about 7% of the smaller semiaxis). About 20% of the points were turned into outliers by adding a uniform deviate in [−a, a] to their coordinates. Notice the serious errors caused to ALG_ELLIPSE_FIT by the outliers, which are well tolerated by ROB_ELLIPSE_FIT.

5.3.4 Concluding Remarks on Ellipse Fitting

With moderately noisy data, ALG_ELLIPSE_FIT should be your first choice. With seriously noisy data, the eccentricity of the best-fit ellipse can be severely underestimated (the more so, the smaller the arc of ellipse covered by the data).
If this is a problem for your application, you can try EUCL_ELLIPSE_FIT, starting from the solution of ALG_ELLIPSE_FIT. With data containing many outliers, the results of both EUCL_ELLIPSE_FIT and ALG_ELLIPSE_FIT will be skewed; in this case, ROB_ELLIPSE_FIT, started from the solution of ALG_ELLIPSE_FIT, should do the trick (but you are advised to take a look at the references in section A.7 in the Appendix if robustness is a serious issue for your application). If speed matters, your best bet is ALG_ELLIPSE_FIT alone, assuming you use a reasonably efficient package to solve the eigenvalue problem, and the assumptions of the algorithm are plausibly satisfied.

What "moderately noisy" and "seriously noisy" mean quantitatively depends on your data (number and density of data points along the ellipse, statistical distribution and standard deviation of noise). In our experience, ALG_ELLIPSE_FIT gives good fits with more than 10 points from half an ellipse, spaced uniformly along x, and corrupted by Gaussian noise of standard deviation up to about 5% of the smaller semiaxis.

Figure 5.6 Comparison of ALG_ELLIPSE_FIT and ROB_ELLIPSE_FIT when fitting to data severely corrupted by outliers. The circles show the data points, the asterisks the robust fit, the solid line the algebraic fit, and the dots the true (uncorrupted) ellipse.

Section 5.7 suggests Further Readings on the evaluation and comparison of ellipse-fitting algorithms.

5.4 Deformable Contours

Having discussed how to fit simple curves, we now move on to the general problem of fitting a curve of arbitrary shape to a set of image edge points. We shall deal with closed contours only.

A widely used computer vision model to represent and fit general, closed curves is the snake, or active contour, or again deformable contour. You can think of a snake as an elastic band of arbitrary shape, sensitive to the intensity gradient.
The snake is located initially near the image contour of interest, and is attracted towards the target contour by forces depending on the intensity gradient.

☞ Notice that the snake is applied to the intensity image, not to an image of edge points as were the line and ellipse detectors of the previous sections.

We start by giving a description of the deformable contour model using the notion of energy functional and continuous image coordinates (no pixelization). We then discuss a simple, iterative algorithm fitting a deformable contour to a chain of edge points of a real, pixelized image.

5.4.1 The Energy Functional

The key idea of deformable contours is to associate an energy functional to each possible contour shape, in such a way that the image contour to be detected corresponds to a minimum of the functional. Typically, the energy functional used is a sum of several terms, each corresponding to some force acting on the contour. Consider a contour, c = c(s), parametrized by its arc length² s. A suitable energy functional, E, consists of the sum of three terms:

E = \int (\alpha(s) E_{cont} + \beta(s) E_{curv} + \gamma(s) E_{image}) ds,    (5.10)

where the integral is taken along the contour c, and each of the energy terms E_{cont}, E_{curv} and E_{image} is a function of c or of the derivatives of c with respect to s. The parameters α, β and γ control the relative influence of the corresponding energy term, and can vary along c. Let us now define more precisely the three energy terms in (5.10).

5.4.2 The Elements of the Energy Functional

Each energy term serves a different purpose. The terms E_{cont} and E_{curv} encourage continuity and smoothness of the deformable contour, respectively; they can be regarded as a form of internal energy. E_{image} accounts for edge attraction, dragging the contour toward the closest image edge; it can be regarded as a form of external energy. What functions can achieve these behaviors?

Continuity Term.
We can exploit simple analogies with physical systems to devise a rather natural form for the continuity term. Given a chain of contour points p_1, ..., p_N sampling the curve c = c(s), and calling d̄ the average distance between pairs of consecutive points, we can set

E_{cont} = (d̄ - ||p_i - p_{i-1}||)^2.    (5.11)

By penalizing spacings both larger and smaller than the average distance, (5.11) promotes the formation of equally spaced chains of points and avoids the formation of point clusters.

Smoothness Term. The aim of the smoothness term is to avoid oscillations of the deformable contour. This is achieved by introducing an energy term penalizing high contour curvatures. Since E_{cont} encourages equally spaced points on the contour, the curvature is well approximated by the second derivative of the contour (Exercise 5.8); hence, we can define E_{curv} as

E_{curv} = ||p_{i-1} - 2p_i + p_{i+1}||^2.    (5.12)

Edge Attraction Term. The third term corresponds to the energy associated to the external force attracting the deformable contour towards the desired image contour. This can be achieved by a simple function:

E_{image} = -||\nabla I||,    (5.13)

where ∇I is the spatial gradient of the intensity image I, computed at each snake point. Clearly, E_{image} becomes very small (negative) wherever the norm of the spatial gradient is large (that is, near image edges), making E small and attracting the snake towards image contours. Note that E_{image}, unlike E_{curv} and E_{cont}, depends only on the contour, not on its derivatives with respect to the arc length.

5.4.3 A Greedy Algorithm

We are now ready to describe a method for fitting a snake to an image contour. The method is based on the minimization of the energy functional (5.10). First of all, let us summarize the assumptions and state the problem.

Assumptions
Let I be an image and p̃_1, ..., p̃_N the chain of image locations representing the initial position of the deformable contour, which we assume close to the image contour of interest.

Problem Statement
Starting from p̃_1, ..., p̃_N, find the deformable contour p_1, ...,
p_N which fits the target image contour best, by minimizing the energy functional

\sum_{i=1}^{N} (\alpha_i E_{cont} + \beta_i E_{curv} + \gamma_i E_{image}),

with α_i, β_i, γ_i ≥ 0, and E_{cont}, E_{curv} and E_{image} as per (5.11), (5.12) and (5.13) respectively.

Of the many algorithms proposed to fit deformable contours, we have selected a greedy algorithm. A greedy algorithm makes locally optimal choices, in the hope that they lead to a globally optimal solution. Among the reasons for selecting the greedy algorithm instead of other methods, we emphasize its simplicity and low computational complexity. The algorithm is conceptually simple because it does not require knowledge of the calculus of variations; it has low computational complexity because it converges in a number of iterations proportional to the number of contour points times the number of locations in which each point can move at each iteration, whereas other snake algorithms take much longer.

The core of a greedy algorithm for the computation of a deformable contour consists of two basic steps. First, at each iteration, each point of the contour is moved within a small neighborhood to the point which minimizes the energy functional. Second, before starting a new iteration, the algorithm looks for corners in the contour, and takes appropriate measures on the parameters β_1, ..., β_N controlling E_{curv}. Let us discuss these two steps in more detail.

Step 1: Greedy Minimization. The neighborhood over which the energy functional is locally minimized is typically small (for instance, a 3 × 3 or 5 × 5 window centered at each contour point). Keeping the size of the neighborhood small lowers the computational load of the method (the complexity being linear in the size of the neighborhood). The local minimization is done by direct comparison of the energy functional values at each location.

Step 2: Corner Elimination. During the second step, the algorithm searches for corners as curvature maxima along the contour. If a curvature maximum is found at point p_i, β_i is set to zero. Neglecting the contribution of E_{curv} at p_i makes it possible to keep the deformable contour piecewise smooth.
☞ For a correct implementation of the method, it is important to normalize the contribution of each energy term. For the terms E_{cont} and E_{curv}, it is sufficient to divide by the largest value taken by the term in the neighborhood in which the point can move.

² The calculus of variations is the mathematical technique for determining the functions that minimize a functional.
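A single iteration of the greedy minimization (Step 1) can be sketched as follows. This is our own minimal rendering, not the book's code: interface and defaults are hypothetical, Step 2 (corner elimination) is omitted, and each term is rescaled to [0, 1] over the window, a common variant of the normalization just described.

```python
import numpy as np

def greedy_snake_step(snake, grad_mag, alpha=1.0, beta=1.0, gamma=1.0, win=1):
    """One greedy iteration (Step 1) for a closed deformable contour.
    snake:    (N, 2) integer array of (row, col) contour points.
    grad_mag: gradient-magnitude image providing the edge attraction.
    Each point moves to the window position minimizing the normalized
    sum alpha*E_cont + beta*E_curv + gamma*E_image; returns the updated
    snake and the number of points that moved. A sketch; interface and
    defaults are our own."""
    snake = snake.copy()
    n = len(snake)
    H, W = grad_mag.shape
    moved = 0

    def unit(t):  # rescale a term to [0, 1] over the window
        t = np.asarray(t, dtype=float)
        span = t.max() - t.min()
        return (t - t.min()) / span if span > 0 else np.zeros_like(t)

    for i in range(n):
        prev, nxt = snake[(i - 1) % n], snake[(i + 1) % n]
        # average spacing d_bar over the current (closed) contour
        d_bar = np.linalg.norm(np.roll(snake, -1, axis=0) - snake,
                               axis=1).mean()
        cands, e_cont, e_curv, e_img = [], [], [], []
        for dr in range(-win, win + 1):
            for dc in range(-win, win + 1):
                p = snake[i] + np.array([dr, dc])
                if 0 <= p[0] < H and 0 <= p[1] < W:
                    cands.append(p)
                    e_cont.append((d_bar - np.linalg.norm(p - prev)) ** 2)
                    e_curv.append(float(((prev - 2 * p + nxt) ** 2).sum()))
                    e_img.append(-float(grad_mag[p[0], p[1]]))
        e = alpha * unit(e_cont) + beta * unit(e_curv) + gamma * unit(e_img)
        best = cands[int(np.argmin(e))]
        if (best != snake[i]).any():
            snake[i] = best
            moved += 1
    return snake, moved
```

Iterating this step until `moved` drops below a small threshold, and interleaving the corner-elimination step on the β_i, gives the complete greedy scheme described above.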
