CH#1: INTRODUCTION .......... 9
1-1. INTRODUCTION .......... 10
1-2. PROBLEM STATEMENT .......... 10
1-3. METHODOLOGY .......... 11
1-3.1. HARDWARE DEVELOPMENT .......... 11
1-3.2. SOFTWARE DEVELOPMENT .......... 11
1-4. THESIS ORGANIZATION .......... 13
________________________________________________________________Table of Contents
2-3. ROBOT KINEMATICS & POSITION ANALYSIS .......... 26
2-3.1. MATRIX REPRESENTATION .......... 27
2-3.1.1. Representation of a Point in Space .......... 27
2-3.1.2. Representation of a Vector in Space .......... 27
2-3.1.3. Representation of a Frame at the Origin of a Fixed-Reference Frame .......... 28
2-3.1.4. Representation of a Frame in a Fixed Reference Frame .......... 28
2-3.1.5. Representation of a Rigid Body .......... 29
2-4. HOMOGENEOUS TRANSFORMATION MATRICES .......... 31
2-4.1. REPRESENTATION OF TRANSFORMATIONS .......... 31
2-4.1.1. Representation of a Pure Translation .......... 31
2-4.1.2. Representation of Pure Rotation About an Axis .......... 32
2-4.1.3. Representation of Combined Transformations .......... 34
2-4.1.4. Transformations Relative to the Rotating Frame .......... 35
2-4.2. INVERSE OF TRANSFORMATION MATRICES .......... 35
2-5. VISION SYSTEMS .......... 37
2-5.1. IMAGE ACQUISITION .......... 37
2-5.2. IMAGE PROCESSING .......... 38
2-5.3. IMAGE ANALYSIS .......... 38
2-5.4. IMAGE UNDERSTANDING .......... 38
2-6. WHAT IS AN IMAGE .......... 38
2-6.1. TWO- AND THREE-DIMENSIONAL IMAGES .......... 39
2-6.2. ACQUISITION OF IMAGES .......... 39
2-6.2.1. Vidicon Camera .......... 39
2-6.2.2. Digital Camera .......... 41
2-6.2.3. Analog-to-Digital Conversion .......... 41
2-6.2.4. Pixels .......... 42
2-6.2.5. Digital Images .......... 42
2-6.3. IMAGE PROCESSING .......... 43
2-6.4. IMAGE-PROCESSING TECHNIQUES .......... 43
2-6.4.1. Preprocessing .......... 43
2-6.4.2. Lighting .......... 44
2-6.4.3. Frequency Content of an Image .......... 44
2-6.4.4. Windowing .......... 45
2-6.4.5. Sampling and Quantization .......... 45
2-6.4.6. Sampling Theorem .......... 46
2-6.4.7. Histogram of Images .......... 49
2-6.4.8. Histogram Flattening .......... 49
2-6.4.9. Thresholding .......... 50
2-6.4.10. Connectivity .......... 50
2-6.4.11. Neighborhood Averaging .......... 52
2-6.4.12. Image Averaging .......... 53
2-6.4.13. Median Filters .......... 54
2-6.5. IMAGE ANALYSIS .......... 54
2-6.5.1. Object Recognition by Features .......... 55
2-6.5.2. Basic Features Used for Object Identification .......... 56
2-6.5.3. Binary Morphology Operations .......... 56
2-6.5.4. Thickening Operation .......... 57
2-6.5.5. Dilation .......... 57
2-6.5.6. Erosion .......... 57
2-6.5.7. Skeletonization .......... 58
2-6.5.8. Open Operation .......... 59
2-6.5.9. Close Operation .......... 59
2-6.5.10. Fill Operation .......... 59
2-6.5.11. Edge Detection .......... 59
2-6.5.12. Roberts Cross-Operator .......... 61
2-6.6. IMAGE UNDERSTANDING .......... 62
2-6.6.1. Template Matching .......... 62
2-6.6.2. Other Techniques .......... 63
2-6.6.3. Limitations and Improvements .......... 64
2-6.6.4. Detecting Motion .......... 65
2-7. MATLAB & DIGITAL IMAGE REPRESENTATION .......... 65
COORDINATE CONVENTIONS IN MATLAB .......... 66
IMAGES AS MATRICES .......... 66
2-8. READING IMAGES .......... 67
2-9. DISPLAYING IMAGES .......... 68
2-10. WRITING IMAGES .......... 69
2-11. DATA CLASSES .......... 72
2-12. IMAGE TYPES .......... 73
2-12.1. INTENSITY IMAGES .......... 73
2-12.2. BINARY IMAGES .......... 73
2-13. CONVERTING BETWEEN DATA CLASSES AND IMAGE TYPES .......... 74
2-13.1. CONVERTING BETWEEN DATA CLASSES .......... 74
2-13.2. CONVERTING BETWEEN IMAGE CLASSES AND TYPES .......... 74
2-14. FLOW CONTROL .......... 76
2-14.1. IF, ELSE, AND ELSEIF .......... 76
2-14.2. FOR LOOP .......... 78
2-14.3. WHILE .......... 79
2-14.4. BREAK .......... 79
2-14.5. CONTINUE .......... 79
2-14.6. SWITCH .......... 79
2-15. CODE OPTIMIZATION .......... 80
2-15.1. VECTORIZING LOOPS .......... 80
2-15.2. PREALLOCATING ARRAYS .......... 84
2-16. EDGE DETECTION METHODS .......... 84
2-16.1. DESCRIPTION .......... 85
2-16.2. SOBEL METHOD .......... 85
2-16.3. PREWITT METHOD .......... 86
2-16.4. ROBERTS METHOD .......... 86
2-16.5. LAPLACIAN OF GAUSSIAN METHOD .......... 86
2-16.6. ZERO-CROSS METHOD .......... 86
2-16.7. CANNY METHOD .......... 86
2-16.8. CLASS SUPPORT .......... 87
2-17. PERFORMING MORPHOLOGICAL OPERATIONS ON IMAGES .......... 87
2-17.1. SYNTAX & DESCRIPTION .......... 87
2-17.2. CLASS SUPPORT .......... 89
2-18. CAPTURING IMAGE FROM WEBCAM USING MATLAB .......... 90
2-18.1. SYNTAX .......... 91
2-19. REFERENCES .......... 91
CH#4: HARDWARE COMPONENTS .......... 99
4-1. INTRODUCTION .......... 100
4-2. HARDWARE BUILDING BLOCKS .......... 100
4-2.1. DUAL POLARITY POWER SUPPLY .......... 100
4-2.2. INTERFACING CIRCUITRY OF ROBOT .......... 101
4-2.3. PARALLEL PORT .......... 101
4-2.4. OPTOCOUPLER .......... 102
4-2.5. TRANSISTOR .......... 103
4-2.6. OVERALL CIRCUIT SCHEME .......... 103
4-3. ROBOTIC ARM TRAINER (OWI-007) .......... 104
4-3.1. PRODUCT INFORMATION .......... 104
4-3.2. SPECIFICATIONS .......... 104
4-3.2.1. Five Axes of Motion .......... 104
4-3.2.2. Product Dimensions .......... 104
4-3.2.3. Power Source .......... 104
4-4. ROBOT EYE .......... 105
4-5. CPU REQUIREMENTS .......... 105
4-6. REFERENCES .......... 105
________________________________________________________________Table of Figures
FIGURE 1-1: Our machine vision robot system .......... 12
FIGURE 2-1: A Fanuc M-410iww palletizing robotic arm .......... 17
FIGURE 2-2: Robot types .......... 19
FIGURE 2-3: Some possible robot coordinate frames .......... 19
FIGURE 2-4: A robot's reference frame .......... 20
FIGURE 2-5: Workspaces for common robot configurations .......... 22
FIGURE 2-6: A Fanuc P-15 robot .......... 23
FIGURE 2-7: Representation of a point in space .......... 27
FIGURE 2-8: Representation of a vector in space .......... 28
FIGURE 2-9: Representation of a frame in a frame .......... 29
FIGURE 2-10: Representation of a rigid body in a frame .......... 29
FIGURE 2-11: Representation of a pure translation in space .......... 32
FIGURE 2-12: Coordinates of a point relative to the reference frame and rotating frame .......... 33
FIGURE 2-13: The Universe, robot, hand, part, and end effector frames .......... 35
FIGURE 2-14: Gray intensity creation in printed images .......... 39
FIGURE 2-15: Schematic of a vidicon camera .......... 40
FIGURE 2-16: Raster scan depiction of a vidicon camera .......... 40
FIGURE 2-17: (a) Image data collection model. (b) The CCD element of a VHS .......... 41
FIGURE 2-18: Sampling a video signal .......... 42
FIGURE 2-19: Sample/hold amplifiers .......... 42
FIGURE 2-20: Noise and edge information in an image .......... 44
FIGURE 2-21: Discrete samples and reconstruction of the signal .......... 45
FIGURE 2-22: Different sampling rates for an image .......... 46
FIGURE 2-23: An image quantized at 2, 4, 8, and 44 gray levels .......... 46
FIGURE 2-24: A low-resolution (16x16) image .......... 47
FIGURE 2-25: Sinusoidal signal with a frequency of fs .......... 47
FIGURE 2-26: Reconstruction of signals from sampled data .......... 47
FIGURE 2-27: Sampling rate comparison of two signals .......... 48
FIGURE 2-28: The image of Figure 2-24 presented at higher resolution .......... 48
FIGURE 2-29: Effect of histogram equalization .......... 49
FIGURE 2-30: Histogram flattening to improve detail .......... 50
FIGURE 2-32: Neighborhood connectivity of pixels .......... 51
FIGURE 2-33: The image for Example 2.3 .......... 52
FIGURE 2-34: The results of the connectivity searches for Example 2.3 .......... 52
FIGURE 2-35: Neighborhood averaging of an image .......... 52
FIGURE 2-36: Neighborhood averaging mask .......... 52
FIGURE 2-37: Neighborhood averaging of image .......... 53
FIGURE 2-38: 5x5 and 3x3 Gaussian averaging filters .......... 53
FIGURE 2-39: Improvement of corrupted image with 7x7 median filter .......... 54
FIGURE 2-40: Consult [21] for more details .......... 55
FIGURE 2-41: Image analysis scheme .......... 55
FIGURE 2-42: Aspect ratio of an object .......... 56
FIGURE 2-43: Binary image of a bolt and its skeleton .......... 57
FIGURE 2-44: Removal of threads by thickening operation .......... 57
FIGURE 2-45: Effect of dilation operations .......... 57
FIGURE 2-46: Effect of erosion on objects (a) with (b) 3 and (c) 7 repetitions .......... 58
FIGURE 2-47: Skeletonization of an image without thickening .......... 58
FIGURE 2-48: The skeleton of the object after thickening .......... 59
FIGURE 2-49: Result of a fill operation .......... 59
FIGURE 2-50: Edge detection algorithm .......... 60
FIGURE 2-51: Flow chart for edge detection .......... 61
FIGURE 2-52: The Roberts cross-operator .......... 61
FIGURE 2-53: Types of vertexes that form functions between objects in a scene .......... 64
FIGURE 2-54: Other types of vertexes .......... 64
FIGURE 2-55: Coordinate convention .......... 66
FIGURE 2-56: Result of scaling by using imshow(h, [ ]) .......... 69
FIGURE 2-57: Effect of storing image at different quality values .......... 71
TABLE 2-1: Data classes .......... 73
TABLE 2-2: Functions in IPT for converting between image classes and types .......... 75
TABLE 2-3: Flow control statements .......... 76
FIGURE 2-58: Sinusoidal image generated in previous example .......... 83
FIGURE 2-59: Edge detection methods .......... 87
TABLE 2-4: Morphological operations .......... 88
FIGURE 2-60: Before morphological operation .......... 89
FIGURE 2-61: After morphological operation .......... 89
FIGURE 2-62: Skeletonization .......... 90
FIGURE 2-63: Screen shot of capturing window .......... 90
FIGURE 3-1: Arm Trainer Model: OWI-007 .......... 95
FIGURE 4-1: Circuit diagram of the dual power supply .......... 100
FIGURE 4-2: Actual view of our power supply .......... 101
FIGURE 4-3: Pin configuration of the parallel port .......... 101
TABLE 4-1: Detailed description of parallel port .......... 102
FIGURE 4-4: View of the optocouplers .......... 102
FIGURE 4-5: Component view of the C1383C .......... 103
FIGURE 4-6: Actual view of our PCB .......... 103
FIGURE 4-7: Overall view of our actual circuit scheme .......... 104
FIGURE 4-8: View of the robot arm trainer model: OWI-007 .......... 105
To our Parents and Teachers, whose love and guidance have been instrumental and the driving force behind our efforts.
______________________________________________________________Acknowledgements
Chapter 1
Introduction
1-1. INTRODUCTION
Man has always been fascinated by his five basic senses: sight, hearing, smell, taste, and touch. Our understanding of these functions has come a long way, and we have tried both to comprehend them and to emulate them. The underlying theme is to provide more leisure time to humans. It is for this reason that man has tried to develop these qualities in machines, and most of the work has been done in the fields of sight, hearing, and touch. We have sensors to mimic touch, microphones and speech recognition software to imitate hearing, and cameras and machine vision systems to emulate sight. Scientists have always been fascinated by the human thinking and reasoning process. But as we know, the human brain does more than pure thinking and reasoning: it is the controlling interpreter for all of our senses. The brain must have inputs about its surroundings to control the body, and our senses must provide those inputs. In short, we want to develop a machine that sees, hears, and feels like a human, but never tires. It should be smart enough to deal with changing scenarios. It could work in a hazardous environment, taking decisions and making inferences like a human. This is very important for saving human lives and sparing people who become ill working in such environments. It is with this desire that people have worked in this field, and it is the same passion that we earnestly pursue in developing machine vision. The work we have done may not have great significance in the vast scientific world, as others may already have achieved what we have done here. But it is our belief that every beginning is a new beginning; in the midst of the whole endeavor there may be something new which we can achieve. It is very difficult to explain in true spirit what we see, why we see it, and how we comprehend it. The human brain is a complex circuit of neurons.
Scientists have worked extensively on the processes going on behind the scenes inside the brain. We are still perplexed that such a complex, entangled bundle of neurons is able to think so diversely. The understanding that led us to develop the microprocessor uses the same logic of reasoning as a human. Vision is nothing more than mental reasoning about, and understanding of, one's surroundings. In humans the understanding is done by the brain, whereas in computers it is done by a computer program, which is in turn created by a human.
conventional methods; however, if a robotic vision system is developed, the product can be distinguished. Machine vision is a very complex field; there is no boundary to what can be achieved. At this stage, however, we had to limit ourselves to a great extent due to time constraints and our level of expertise. For that reason we have restricted ourselves to a two-dimensional view, though we plan in the future to move into the third dimension.
1-3. METHODOLOGY
People have used various methods to emulate vision, employing languages such as Visual C++, Visual Basic, and Java. But these languages are very specific and require proper training and intensive study; even then, many people do not master them, let alone use them for vision systems. Our aim was to use a programming environment that is simpler than these languages and yet fast enough. It is for this reason that we chose MATLAB. Some may argue that MATLAB is not fast enough to run vision systems in real time, and we agree to some extent; but we have managed to establish certain ideal conditions that enable us to use MATLAB for vision and to obtain robot movement comparable to that of a robot driven by Visual C++ in the background, though we must concede that it is not yet as fast. In view of such difficulties, we decided to follow a modular approach to the completion of our project. The first step was the study of the basics of robot design; using that knowledge, we defined our design parameters and, keeping market availability and cost factors in mind, selected a robotic arm for our project. For simplicity, and due to the enormity of this field, we have limited ourselves to two dimensions. For vision we used a PC camera that connects to the computer through a USB cable. The camera produces images of 160x120 pixels at a refresh rate of 30 frames per second, which allows images to be captured in real time.
1-3.1. HARDWARE DEVELOPMENT
The next step was to develop an interface between the computer and the robot. We selected the parallel port to communicate with the robot. A special circuit was designed, using a methodology that will be discussed in detail in Chapter 3.
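As a rough illustration of how such a parallel-port interface is typically driven in software, the 8 data pins of the port can each switch one optocoupler input, so a single control byte selects which motors run and in which direction. The pin-to-motor mapping below is hypothetical, not the actual wiring of our circuit; the sketch only shows how such a byte can be composed with bit masks:

```python
# Sketch: composing a control byte for the parallel port's 8 data pins.
# NOTE: the motor/direction pin assignment below is hypothetical and is
# NOT the actual wiring used in the project (see the interfacing circuitry).

MOTOR_BITS = {
    "base": (0, 1),      # pins D0/D1: base motor, two directions
    "shoulder": (2, 3),  # pins D2/D3
    "elbow": (4, 5),     # pins D4/D5
    "gripper": (6, 7),   # pins D6/D7
}

def control_byte(commands):
    """commands: dict mapping motor name -> 'cw', 'ccw', or 'stop'.

    Returns the byte to write to the port's data register: one bit set
    per running motor, chosen by direction."""
    byte = 0
    for motor, direction in commands.items():
        cw_bit, ccw_bit = MOTOR_BITS[motor]
        if direction == "cw":
            byte |= 1 << cw_bit
        elif direction == "ccw":
            byte |= 1 << ccw_bit
        elif direction != "stop":
            raise ValueError("unknown direction: " + direction)
    return byte
```

Writing the resulting byte to the port's data register (for example with a parallel-port library, or an `outportb`-style call under DOS/Windows) would then energize the corresponding optocoupler channels; the sketch stops at byte construction since the register access is platform-specific.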
1-3.2. SOFTWARE DEVELOPMENT
Programming in MATLAB is similar to programming in C, yet it uses simple English words to carry out complex tasks. What we have done here is emulate the process of edge detection for image understanding. We used a plain white background (again, to keep things simple at the start), placed our robot in front of it, and placed the camera in front of the robot to give two-dimensional image feedback to the computer. To understand the image, we wrote algorithms that reduce the different links of the robot to straight lines. We may add here that
this process of converting a real-time image into a line representation involves several processing stages. Once the line image is obtained, the algorithms calculate the angles between the different lines; these angles are what allow us to position the robot. The line representation is stored in the computer's memory, so that the computer knows the difference between the robot and the rest of the image. If any other object is then placed in the environment, the computer is able to differentiate between the object and the robot. In the future, the computer could be taught objects of different shapes, distinguish them from the robot, and make the robot pick the exact object even when many objects are present in the environment. When an object is detected, the software tells the robot to move its extreme point, the gripper, towards the object. Meanwhile there is continuous feedback from the camera to the computer. The camera image and the angles between the different links guide the movement of the robot until its extreme point reaches the object and the gripper comes in line with the center point of the object. Once this happens, a command is given to the gripper to grip the object. The robot then moves the object to another position within the two-dimensional frame, according to previously supplied target coordinates. The arm then moves down towards the floor; when it touches the floor, the movement stops, the gripper loosens, and the product is placed on the floor. Then the cycle starts again. The overview of the whole system is given below. It is quite evident from the diagram that the camera is the eye of the robot, the computer is its brain, and the robotic arm itself is the effector.
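The pipeline above is implemented in MATLAB in the thesis. As a minimal illustrative sketch of its two core computational steps, segmenting the dark robot from the white background and computing the angle at a joint between two adjacent links, the same logic can be written as follows. The Python code, the fixed threshold value, and the representation of links as endpoint pairs are all assumptions for illustration, not the project's actual implementation:

```python
import math

def binarize(image, threshold=128):
    """Segment a dark object from a white background: pixels darker than
    the threshold become 1 (object), brighter pixels become 0 (background).
    `image` is a list of rows of gray values in 0..255."""
    return [[1 if px < threshold else 0 for px in row] for row in image]

def link_angle(p, q, r):
    """Angle in degrees at joint q between link q->p and link q->r,
    where each point is an (x, y) pixel coordinate on the line image."""
    a1 = math.atan2(p[1] - q[1], p[0] - q[0])
    a2 = math.atan2(r[1] - q[1], r[0] - q[0])
    ang = abs(math.degrees(a1 - a2))
    return 360.0 - ang if ang > 180.0 else ang
```

For example, two links meeting at a right angle, say endpoints (0, 1) and (1, 0) joined at (0, 0), yield an angle of 90 degrees; a fully extended pair of links yields 180 degrees. In the real system these joint angles, recomputed from every camera frame, are what the feedback loop compares against the target pose.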
Chapter 2
Literature Study
2-1. MANIPULATORS
An industrial robot defined by U.S robot industries association (RIA) a "reprogrammable multifunctional manipulator designed to move material parts, tools or specialized devices through variable programmed motions for the performance of a verity of tasks. Similar definitions are adopted by British Robot Association and Japanese Robot Association, etc [13] There are several more or less clearly distinguished generations of industrial robots. The first-generation robots are fixed-sequence robots which can repeat a sequence of operations once they have been programmed to do so, to carry out a different job. They have to be reprogrammed, often by "training" or "education." The second-generation robots are equipped with sensory devices which allow a robot to act in a not-completely defined environment, e.g. pick up a part which is misplaced from its ideal position, pick up a needed part from a batch of mixed parts, recognize a need to switch from one succession of motions to another etc. The third-generation robots which are emerging now have the intelligence to allow them to make decisions, such as ones necessary in assembly operations (assembling a proper combination of parts; rejecting faulty parts; selecting necessary combinations of tolerances, etc.). Robots of the first and so-called "1.5" generation 'with some sensing devices (constitute the overwhelming majority of robots now in use and in production. However, regardless of the generation, industrial robots are built of three basic systems: The "mechanical structure" consisting of mechanical linkages and joints capable of various movements. Additional movements are made possible by end effectors fitted at the arm end. The "control system," which can be of "fixed" or "servo" type. Robots with fixed control systems have fixed (but, possibly, adjustable! mechanical stops, limit switches, etc... for positioning and informing the controller. 
Servo-controlled robots can be either point-to-point (PTP) controlled, where only specified point coordinates are under control but not the path between them, or continuous-path (CP) controlled, thus achieving a smooth transition between the critical points. The "power units" can be hydraulic, pneumatic, electrical, or their combination, with or without mechanical transmissions.

If we consider a human being as a manipulator, it would be a very effective and efficient one. With a total mass of 68 to 90 kg (150 to 200 lb) and a "linkage" (lower and upper arm and wrist) mass of 4.5 to 9.0 kg (10 to 20 lb), this manipulator can precisely handle, with a rather high speed, loads up to 4.5 to 9.0 kg (10 to 20 lb); with slightly lower speeds it can handle loads up to 15 to 25 kg (30 to 50 lb), or about one-fifth to one-quarter of its overall mass, far exceeding the "linkage" mass; and it can make simple movements with loads exceeding its overall mass, up to 90 to 135 kg (200 to 300 lb), and in the case of trained athletes, much more. On the other hand, industrial robots have payload limitations (and, in this case, the payload includes the mass of the gripper or end effector) which amount to one-twentieth to one-fiftieth of their total mass, more than 10 times less effective than a human being. And such massive structures cannot move with the required speeds [13]. It was found that human operators can handle loads up to 1.5 kg (3 lb) faster than existing robots; in the 1.5 to 9 kg (3 to 20 lb) range they are very competitive; and only above 9 kg (20 lb) are robots technically more capable. If the mass of end effectors or grippers is considered, which the human operator has built in but which consumes up to half of the maximum payload mass in robots, then one can come to the conclusion that robots with maximum rated loads below 3 kg (6 lb) are mechanically inferior to
human operators; in the 3 to 20 kg (6 to 40 lb) range they are comparable; and only at higher loads are they superior.
2-2.1. CLASSIFICATION OF ROBOTS
The following is the classification of robots according to the Japanese Industrial Robot Association (JIRA):
Class 1: Manual-Handling Device: A device with multiple degrees of freedom that is actuated by an operator.
Class 2: Fixed-Sequence Robot: A device that performs the successive stages of a task according to a predetermined, unchanging method and is hard to modify.
Class 3: Variable-Sequence Robot: Same as class 2, but easy to modify.
Class 4: Playback Robot: A human operator performs the task manually by leading the robot, which records the motions for later playback. The robot repeats the same motions according to the recorded information.
Class 5: Numerical Control Robot: The operator supplies the robot with a movement program rather than teaching it the task manually.
Class 6: Intelligent Robot: A robot with the means to understand its environment and the ability to successfully complete a task despite changes in the surrounding conditions under which it is to be performed [1].
2-2.2. ROBOT COMPONENTS
A robot, as a system, consists of the following elements, which are integrated together to form a whole:
2-2.2.1. Manipulator
This is the main body of the robot and consists of the links, the joints, and other structural elements of the robot. Without the other elements, the manipulator alone is not a robot (Figure 2.1).
2-2.2.2. End Effector
This is the part that is connected to the last joint (hand) of a manipulator, which generally handles objects, makes connections to other machines, or performs the required tasks (Figure 2.1). Robot manufacturers generally do not design or sell end effectors. In most cases, all they supply is a simple gripper. Generally, the hand of a robot has provisions for connecting specialty end effectors that are specifically designed for a purpose. It is the job of a company's engineers or outside consultants to design and install the end effector on the robot and to make it work for the given situation. A welding torch, a paint spray gun, a glue-laying device, and a parts handler are but a few of the possibilities. In most cases, the action of the end effector is either controlled directly by the robot's controller, or the controller communicates with the end effector's controlling device (such as a PLC).
2-2.2.3. Actuators
Actuators are the "muscles" of the manipulator. Common types of actuators are servomotors, stepper motors, pneumatic cylinders, and hydraulic cylinders. There are also other actuators that are more novel and are used in specific situations. Actuators are controlled by the controller.
2-2.2.4. Sensors
Sensors are used to collect information about the internal state of the robot or to communicate with the outside environment as in humans. Robots are often equipped with external sensory devices such as a vision system, touch and tactile sensors, speech synthesizers, etc., which enable the robot to communicate with the outside world.
2-2.2.5. Controller
The controller receives its data from the computer, controls the motions of the actuators, and coordinates the motions with the sensory feedback information. Suppose that, in order for the robot to pick up a part from a bin, it is necessary that its first joint be at 35°. If the joint is not already at this angle, the controller will send a signal to the actuator (a current to an electric motor, air to a pneumatic cylinder, or a signal to a hydraulic servo valve), causing it to move. It will then measure the change in the joint angle through the feedback sensor attached to the joint (a potentiometer, an encoder, etc.). When the joint reaches the desired value, the signal is stopped.
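The feedback loop described above can be sketched in a few lines of code. The sketch below is purely illustrative: the read_angle and send_command functions are hypothetical stand-ins for a real encoder and actuator interface, not any particular controller's API.

```python
def move_joint(target_deg, read_angle, send_command, tol=0.1):
    """Drive one joint toward target_deg using sensor feedback,
    stopping the actuator once the joint is within tolerance."""
    while abs(target_deg - read_angle()) > tol:
        error = target_deg - read_angle()
        send_command(error)          # command proportional to the error
    send_command(0.0)                # desired value reached: stop the signal

# Toy simulation standing in for the real encoder and motor:
angle = 0.0
def read_angle():
    return angle
def send_command(signal):
    global angle
    angle += 0.5 * signal            # the actuator moves the joint a little

move_joint(35.0, read_angle, send_command)
print(round(angle, 2))               # settles within 0.1 degree of 35
```

Each pass through the loop corresponds to the measure-compare-actuate cycle described in the paragraph above.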
2-2.2.6. Processor
The processor is the brain of the robot. It calculates the motions of the robot's joints, determines how much and how fast each joint must move to achieve the desired location and speeds, and oversees the coordinated actions of the controller and the sensors. The processor is generally a computer which works like all other computers, but is dedicated to a single purpose. It requires an operating system, programs, and peripheral equipment such as monitors, and it has many of the same limitations and capabilities of a PC processor.
2-2.2.7. Software
There are perhaps three groups of software that are used in a robot. One is the operating system, which operates the computer. The second is the robotic software, which calculates the necessary motions of each joint based on the kinematic equations of the robot. This information is sent to the controller. This software may be at many different levels, from machine language to sophisticated languages used by modern robots. The third group is the collection of routines and application programs that are developed in order to use the peripheral devices of the robots, such as vision routines, or to perform specific tasks. It is important to note that in many systems, the controller and the processor are placed in the same unit. Although these two units are in the same box, and even if they are integrated into the same circuit, they have two separate functions.
2-2.3. ROBOT COORDINATES
Robot configurations generally follow the coordinate frames with which they are defined, as shown in Figure 2.2. Prismatic joints are denoted by P, revolute joints are denoted by R, and spherical joints are denoted by S. Robot configurations are specified by a succession of P's, R's, or S's. For example, a robot with three prismatic and three revolute joints is specified by 3P3R. The following configurations are common for positioning the hand of the robot:
2-2.5.1. Payload
Payload is the weight a robot can carry and still remain within its other specifications. For example, a robot's maximum load capacity may be much larger than its specified payload, but at
the maximum level, it may become less accurate, may not follow its intended path accurately, or may have excessive deflections. The payload of robots compared with their own weight is usually very small. For example, the Fanuc Robotics LR Mate robot has a mechanical weight of 86 lb and a payload of 6.6 lb, and the M-16i robot has a mechanical weight of 594 lb and a payload of 35 lb.
2-2.5.2. Reach
Reach is the maximum distance a robot can reach within its work envelope. As we will see later, many points within the work envelope of the robot may be reached with any desired orientation (called dexterous points). However, for other points, close to the limit of the robot's reach capability, the orientation cannot be specified as desired (called nondexterous points). Reach is a function of the robot's joint lengths and its configuration.
2-2.5.3. Precision
Precision is defined as how accurately a specified point can be reached. This is a function of the resolution of the actuator, as well as its feedback devices. Most industrial robots can have precision of 0.001 inch or better.
Consequently, the actual size and shape of the workspace depend significantly on parameters other than the basic kinematics, such as structural tolerances, the thermal condition of the linkage, the payload being manipulated, velocities and accelerations, etc. An important parameter of the workspace is its degree of redundancy. A human being can reach most of the workspace of his or her arm from several directions, thus overcoming possible obstacles. Some robots also demonstrate such ability thanks to redundant degrees of freedom. While extra degrees of freedom significantly complicate programming and control strategies, as well as structural design, the use of at least one redundant degree of freedom is advocated to overcome degeneracy problems [7]. The angle at which the arm reaches a certain point is not routinely specified by the robot manufacturer. However, it can be a critical factor for certain applications.
A system with seven degrees of freedom does not have a unique solution. This means that if a robot has seven degrees of freedom, there are an infinite number of ways it can position a part and orientate it at the desired location. For the controller to know what to do, there must be some additional decision-making routine that allows it to pick only one of the infinite ways. As an example, one may use an optimization routine to pick the fastest or the shortest path to the desired destination. Then the computer has to check all solutions to find the shortest or fastest response and perform it. Due to this additional requirement, which can take much computing power and time, no seven-degree-of-freedom robot is used in industry. A similar issue arises when a manipulator robot is mounted on a moving base such as a mobile platform or a conveyor belt (Figure 2.6). The robot then has an additional degree of freedom, which, based on the preceding discussion, is impossible to control. The robot can be at a desired location and orientation from infinitely many distinct positions on the conveyor belt or the mobile platform. However, in this case, although there are too many degrees of freedom, generally, the additional degrees of freedom are not solved for. In other words, when a robot is mounted on a conveyor belt or is otherwise mobile, the location of the base of the robot relative to the belt or other reference frame is known. Since this location does not need to be determined by the controller, the remaining number of degrees of freedom is still 6, and thus unique. So long as the location of the base of the robot on the belt or the location of the mobile platform is known (or picked), there is no need to find it by solving a set of equations of robot motions, and, thus, the system can be solved.
Can you determine how many degrees of freedom the human arm has? This should exclude the hand (palm and fingers), but should include the wrist. Before you go on, please try to see if you can determine it. You will notice that the human arm has three joint clusters in it: the shoulder, the elbow, and the wrist. The shoulder has three degrees of freedom, since the upper arm (humerus) can rotate in the sagittal plane (parallel to the mid-plane of the body), the coronal plane (a plane from shoulder to shoulder), and about the humerus. (Verify this by rotating your arm about the three different axes.) The elbow has only one degree of freedom; it can only flex and extend about the elbow joint. The wrist also has three degrees of freedom. It can abduct and adduct, flex and extend, and, since the radius bone can roll over the ulna bone, it can rotate longitudinally (pronate and supinate). Thus, the human arm has a total of seven degrees of freedom, even if the ranges of some movements are small. Since a seven-degree-of-freedom system does not have a unique solution, how do you think we can use our arms? You must realize that in a robot system, the end effector is never considered as one of the degrees of freedom. All robots have this additional capability, which may appear to be similar to a degree of freedom. However, none of the movements in the end effector are counted towards the robot's degrees of freedom. There are many robots in industry that possess fewer than six degrees of freedom. In fact, robots with 3.5, 4, and 5 degrees of freedom are very common. So long as there is no need for the additional degrees of freedom, these robots perform very well. As an example, suppose that you
desire to insert electronic components into a circuit board. The circuit board is always laid flat on a known work surface; thus, its height (z-value) relative to the base of the robot is known. Therefore, there is only a need for two degrees of freedom along the x- and y-axes to specify any location on the board for insertion. Additionally, suppose that the components may be inserted in any direction on the board, but that the board is always flat. In that case, there will be a need for one degree of freedom to rotate about the vertical axis (z) in order to orientate the component above the surface. Since there is also a need for a 1/2 degree of freedom to fully extend the end effector to insert the part or to fully retract it to lift the robot before moving, all that is needed is 3.5 degrees of freedom: two to move over the board, one to rotate the component, and 1/2 to insert or retract. Insertion robots are very common and are used extensively in the electronics industry. Their advantage is that they are simple to program, are less expensive, and are smaller and faster. Their disadvantage is that, although they may be programmed to insert components on any size board in any direction, they cannot perform other jobs. They are limited to what 3.5 degrees of freedom can achieve, but they can perform a variety of functions within this design limit.
executed. The motions are taught by an operator, either through a model, by physically moving the end effecter, or by directing the robot arm and moving it through its work space. Painting robots, for example, are programmed by skilled painters through this mode.
2-2.9.2. Point-to-Point Level
In this level (such as in Funky and Cincinnati Milacron's T3), the coordinates of the points are entered sequentially, and the robot follows the points as specified. This very primitive and simple type of programming is easy to use, but not very powerful. It also lacks branching, sensory information, and conditional statements.
In these languages, it is possible to develop more sophisticated programs, including sensory information, branching, and conditional statements (such as VAL by Unimation). Most languages of this level are interpreter based.
Most languages of this level are compiler based, are powerful, and allow more sophisticated programming. However, they are also more difficult to learn. Currently, there are no actual languages of this level in existence. Autopass, proposed by IBM in the 1980s, never materialized. Autopass was supposed to be task oriented. This means that instead of programming a robot to perform a task by programming each and every step necessary to complete the task, the user was simply to mention the task, while the controller would create the necessary sequence. Imagine that a robot is to sort three boxes by size. In all existing languages, the programmer will have to tell the robot exactly what to do, which means that every step must be programmed. The robot must be told how to go to the largest box, how to pick up the box, where to place it, go to the next box, etc. In Autopass, the user would only indicate "sort," while the robot controller would create this sequence automatically.
2-3.1.1. Representation of a Point in Space
A point P in space (Figure 2.7) can be represented by its three coordinates relative to a reference frame:

P = ax i + by j + cz k,     (2.1)

where ax, by, and cz are the three coordinates of the point represented in the reference frame. Obviously, other coordinate representations can also be used to describe the location of a point in space.
2-3.1.2. Representation of a Vector in Space
A vector in space can be represented similarly by its three components:

P = ax i + by j + cz k,

where ax, by, and cz are the three components of the vector in the reference frame. In fact, point P in the previous section is in reality represented by a vector connected to it at point P and expressed by the three components of the vector. The three components of the vector can also be written in matrix form, as in Equation (2.5). This format will be used throughout this book to represent all kinematic elements:

P = [ ax ]
    [ by ]
    [ cz ]     (2.5)
This representation can be slightly modified to also include a scale factor w such that, if x, y, and z are divided by w, they will yield ax, by, and cz. Thus, the vector can be written as

P = [ x ]
    [ y ]     where ax = x/w, by = y/w, cz = z/w.
    [ z ]
    [ w ]

Variable w may be any number, and as it changes, it can change the overall size of the vector. This is similar to zooming a picture in computer graphics. As the value of w changes, the size of the vector changes accordingly. If w is greater than unity, all vector components enlarge; if w is less than unity, all vector components become smaller. If w is unity, the size of the components remains unchanged. However, if w = 0, then x/w, y/w, and z/w approach infinity. In this case, x, y, and z (as well as ax, by, and cz) will represent a vector whose length is infinite, but which nonetheless lies in the direction represented by the vector. This means that a directional vector can be represented by a scale factor of w = 0, where the length is not of importance, but the direction is represented by the three components of the vector. This will be used throughout this book to represent directional vectors.
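The effect of the scale factor can be seen numerically. The following sketch (assuming NumPy is available; the helper name homogeneous is ours, not standard) builds the same vector with different values of w and recovers the Cartesian components by dividing by w:

```python
import numpy as np

def homogeneous(ax, by, cz, w=1.0):
    """Return [x, y, z, w] such that x/w = ax, y/w = by, z/w = cz."""
    return np.array([ax * w, by * w, cz * w, w])

v1 = homogeneous(3, 5, 2, w=1.0)        # [3, 5, 2, 1]
v2 = homogeneous(3, 5, 2, w=2.0)        # [6, 10, 4, 2] -- same vector, scaled
direction = np.array([3.0, 5.0, 2.0, 0.0])  # w = 0: a pure directional vector

# Recover the Cartesian components by dividing by w:
print(v2[:3] / v2[3])                   # [3. 5. 2.]
```

Both v1 and v2 represent the same point; only the internal scaling differs, exactly as the zooming analogy above suggests.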
2-3.1.5. Representation of a Rigid Body
An object can be represented in space by attaching a frame to it and representing the frame in space. Since the object is permanently attached to this frame, its position and orientation relative to this frame are always known. As a result, so long as the frame can be described in space, the object's location and orientation relative to the fixed frame will be known (Figure 2.10). As before, a frame in space can be represented by a matrix, where the origin of the frame as well as the three vectors representing its orientation relative to the reference frame are expressed. Thus,

F = [ nx  ox  ax  Px ]
    [ ny  oy  ay  Py ]
    [ nz  oz  az  Pz ]     (2.9)
    [ 0   0   0   1  ]
A rigid body in space has 6 degrees of freedom, meaning that not only can it move along the three axes X, Y, and Z, but it can also rotate about these three axes. Thus, all that is needed to completely define an object in space is 6 pieces of information: the location of the origin of the object relative to the three reference axes, and its orientation about the three axes. However, as can be seen in Equation (2.9), 12 pieces of information are given: 9 for orientation and 3 for position. (This excludes the scale factors on the last row of the matrix, because they do not add to this information.) Obviously, there must be some constraints
present in this representation to limit the preceding to 6. Thus, we need 6 constraint equations to reduce the amount of information from 12 to 6 pieces. The constraints come from the known characteristics of the frame, which we have not used yet: the three unit vectors n, o, and a are mutually perpendicular, and each unit vector's length must be equal to unity. These constraints translate into the following six constraint equations:

(1) n · o = 0   (the dot product of the n and o vectors must be zero)
(2) n · a = 0
(3) a · o = 0
(4) |n| = 1   (the magnitude of each vector must be 1)     (2.10)
(5) |o| = 1
(6) |a| = 1

As a result, the values representing a frame in a matrix must be such that the foregoing equations hold true; otherwise, the frame will not be correct. Alternatively, the first three equations in Equation (2.10) can be replaced by a cross product of the three vectors:

n × o = a.

EXAMPLE 2.1
For the following frame, find the values of the missing elements, and complete the matrix representation of the frame:

F = [   ?     0     ?    5 ]
    [ 0.707   ?     ?    3 ]
    [   ?     ?     0    2 ]
    [   0     0     0    1 ]
Solution: Obviously, the values 5, 3, 2 representing the position of the origin of the frame do not affect the constraint equations. Notice that only three values for the directional vectors are given: ny = 0.707, ox = 0, and az = 0. This is all that is needed. Using Equation (2.10), we get

nx ox + ny oy + nz oz = 0   or   nx(0) + 0.707(oy) + nz(oz) = 0,
nx ax + ny ay + nz az = 0   or   nx(ax) + 0.707(ay) + nz(0) = 0,
ax ox + ay oy + az oz = 0   or   ax(0) + ay(oy) + 0(oz) = 0,
nx² + ny² + nz² = 1          or   nx² + 0.707² + nz² = 1,
ox² + oy² + oz² = 1          or   0² + oy² + oz² = 1,
ax² + ay² + az² = 1          or   ax² + ay² + 0² = 1.

Simplifying these equations yields

0.707(oy) + nz(oz) = 0,
nx(ax) + 0.707(ay) = 0,
ay(oy) = 0,
nx² + nz² = 0.5,
oy² + oz² = 1,
ax² + ay² = 1.

Solving these six equations yields nx = 0.707, nz = 0, oy = 0, oz = 1, ax = 0.707, and ay = -0.707. Notice that both nx and ax must have the same sign. The reason for multiple solutions is that, with the given parameters, it is possible to have two sets of mutually perpendicular vectors in opposite directions. The final matrix will be:
F = [ 0.707   0    0.707   5 ]
    [ 0.707   0   -0.707   3 ]
    [ 0       1    0       2 ]
    [ 0       0    0       1 ]

or

F = [ -0.707   0   -0.707   5 ]
    [  0.707   0   -0.707   3 ]
    [  0      -1    0       2 ]
    [  0       0    0       1 ]
As can be seen, both matrices satisfy all the requirements set by the constraint equations. It is important to realize that the values represented by the three direction vectors are not arbitrary, but are bound by these equations. Thus, you may not arbitrarily use any desired values in the matrix.

The same problem may be solved by taking the cross product of n and o and setting it equal to a, as n × o = a, or

(ny oz - nz oy) i + (nz ox - nx oz) j + (nx oy - ny ox) k = ax i + ay j + az k.

Substituting the known values into this equation yields

(0.707 oz - nz oy) i - (nx oz) j + (nx oy) k = ax i + ay j + az k.

The three simultaneous equations

0.707 oz - nz oy = ax,
-nx oz = ay,
nx oy = 0

replace the three equations for the dot products. Together with the three unit-vector length constraint equations, there are six equations, which ultimately result in the same values for the unknown parameters. Please verify that you get the same results.
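The six constraint equations of the example can be verified numerically. The sketch below (assuming NumPy) checks the first of the two solution frames: the dot products must vanish, the column norms must be unity (within the rounding of 0.707), and n × o must equal a:

```python
import numpy as np

# Completed frame from Example 2.1 (first of the two solutions):
F = np.array([
    [0.707, 0.0,  0.707, 5.0],
    [0.707, 0.0, -0.707, 3.0],
    [0.0,   1.0,  0.0,   2.0],
    [0.0,   0.0,  0.0,   1.0],
])
n, o, a = F[:3, 0], F[:3, 1], F[:3, 2]

# Six constraint equations of Equation (2.10):
print(np.dot(n, o), np.dot(n, a), np.dot(a, o))                 # all ~0
print(np.linalg.norm(n), np.linalg.norm(o), np.linalg.norm(a))  # all ~1
# Cross-product check: n x o must equal a.
print(np.allclose(np.cross(n, o), a))                           # True
```

Replacing F with the second solution matrix (all the signs of nx, ax, and oz reversed) passes the same checks, confirming that both completions are valid frames.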
2-4.1.1. Representation of a Pure Translation
If a frame moves in space without any change in its orientation, the new location of the frame can be found by adding the vector representing the translation to the vector representing the original location of the origin of the frame. In matrix form, the new frame representation may be found by pre-multiplying the frame with a matrix representing the transformation. Since the directional vectors do not change in a pure translation, the transformation T will simply be

T = Trans(dx, dy, dz) = [ 1  0  0  dx ]
                        [ 0  1  0  dy ]
                        [ 0  0  1  dz ]
                        [ 0  0  0  1  ],

where dx, dy, and dz are the three components of a pure translation vector d relative to the x-, y-, and z-axes of the reference frame. As you can see, the first three columns represent no rotational movement (equivalent to an identity), while the last column represents the translation. The new location of the frame will be

Fnew = [ 1  0  0  dx ] [ nx  ox  ax  Px ]   [ nx  ox  ax  Px + dx ]
       [ 0  1  0  dy ] [ ny  oy  ay  Py ] = [ ny  oy  ay  Py + dy ]
       [ 0  0  1  dz ] [ nz  oz  az  Pz ]   [ nz  oz  az  Pz + dz ]
       [ 0  0  0  1  ] [ 0   0   0   1  ]   [ 0   0   0   1      ]
This equation is also symbolically written as

Fnew = Trans(dx, dy, dz) × Fold.     (2.15)

First, as you see, by premultiplying the frame matrix with the transformation matrix, the new location can be found. This, in one form or another, is true for all transformations, as we will see later. Second, you notice that the directional vectors remain the same after a pure translation, and that the new location of the origin of the frame is d + P, as vector addition would give. Third, you also notice how homogeneous transformation matrices facilitate the multiplication of matrices, resulting in the same dimensions as before.
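The relation Fnew = Trans(dx, dy, dz) × Fold can be demonstrated directly. In the sketch below (assuming NumPy; the frame values are made up for illustration), translating a frame moves its origin by d while leaving the orientation columns untouched:

```python
import numpy as np

def trans(dx, dy, dz):
    """Homogeneous matrix for a pure translation Trans(dx, dy, dz)."""
    T = np.eye(4)
    T[:3, 3] = [dx, dy, dz]
    return T

# A frame located at (1, 2, 3); orientation columns n, o, a = identity.
F_old = np.eye(4)
F_old[:3, 3] = [1.0, 2.0, 3.0]

F_new = trans(4, 0, -1) @ F_old        # Fnew = Trans(dx, dy, dz) x Fold
print(F_new[:3, 3])                    # [5. 2. 2.] -- the origin moved by d
print(np.allclose(F_new[:3, :3], F_old[:3, :3]))  # True: orientation unchanged
```

Note that the translation matrix pre-multiplies the frame, matching Equation (2.15).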
the reference frame and Pn, Po, and Pa relative to the moving frame. As the frame rotates about the x-axis, point P attached to the frame will also rotate with it. Before rotation, the coordinates of the point in both frames are the same. (Remember that the two frames are at the same location and are parallel to each other.) After rotation, the Pn, Po, and Pa coordinates of the point remain the same in the rotating frame (n, o, a), but Px, Py, and Pz will be different in the (x, y, z) frame (Figure 2.10). We desire to find the new coordinates of the point relative to the fixed reference frame after the moving frame has rotated. Now let's look at the same coordinates in 2-D as if we were standing on the x-axis. The coordinates of point P are shown before and after rotation in Figure 2.12. The coordinates of point P relative to the reference frame are Px, Py, and Pz,
FIGURE 2-12: Coordinates of a point relative to the reference frame and rotating frame.
From Figure 2.12, you will see that the value of Px does not change as the frame rotates about the x-axis, but the values of Py and Pz do change. Please verify that

Px = Pn,
Py = l1 - l2 = Po cos θ - Pa sin θ,     (2.16)
Pz = l3 + l4 = Po sin θ + Pa cos θ,

which in matrix form is

[ Px ]   [ 1     0        0     ] [ Pn ]
[ Py ] = [ 0   cos θ   -sin θ   ] [ Po ]
[ Pz ]   [ 0   sin θ    cos θ   ] [ Pa ]
This means that the coordinates of the point (or vector) P in the rotated frame must be premultiplied by the rotation matrix, as shown, to get the coordinates in the reference frame. This rotation matrix is only for a pure rotation about the x-axis of the reference frame, and it is denoted as

Pxyz = Rot(x, θ) × Pnoa.     (2.18)

Notice that the first column of the rotation matrix in Equation (2.16), which expresses the location relative to the x-axis, has the values 1, 0, 0, indicating that the coordinate along the x-axis has not changed. To simplify the writing of these matrices, it is customary to designate Cθ to denote cos θ and Sθ to denote sin θ. Thus, the rotation matrix may also be written as

Rot(x, θ) = [ 1   0    0  ]
            [ 0   Cθ  -Sθ ]
            [ 0   Sθ   Cθ ]
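The relation Pxyz = Rot(x, θ) × Pnoa can be checked with a small sketch (assuming NumPy; the point coordinates are arbitrary). For a 90° rotation about x, the y and z components swap with a sign change, while Px is unchanged:

```python
import numpy as np

def rot_x(theta):
    """Rotation matrix Rot(x, theta) derived above."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0,  0],
                     [0, c, -s],
                     [0, s,  c]])

# A point given in the rotating (n, o, a) frame:
P_noa = np.array([2.0, 3.0, 4.0])
# Its coordinates in the fixed frame after a 90-degree rotation about x:
P_xyz = rot_x(np.pi / 2) @ P_noa       # Pxyz = Rot(x, theta) x Pnoa
print(np.round(P_xyz, 3))              # [ 2. -4.  3.] -- Px is unchanged
```

As Equation (2.16) predicts, Py = Po·cos 90° − Pa·sin 90° = −4 and Pz = Po·sin 90° + Pa·cos 90° = 3.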
Equation (2.18) can also be written in a conventional form, which assists in easily following the relationship between different frames. Denoting the transformation as UTR (and reading it as the transformation of frame R relative to frame U, for Universe), denoting Pnoa as RP (P relative to frame R), and denoting Pxyz as UP (P relative to frame U), Equation (2.18) simplifies to

UP = UTR × RP.     (2.21)

As you notice, canceling the R's gives the coordinates of point P relative to U. The same notation will be used throughout this book to relate multiple transformations.
2-4.1.4. Transformations Relative to the Rotating Frame
All transformations we have discussed so far have been relative to the fixed reference frame. This means that all translations, rotations, and distances (except for the location of a point relative to the moving frame) have been measured relative to the reference frame axes. However, it is in fact possible to make transformations relative to the axes of a moving or current frame. This means that, for example, a rotation of 90° may be made relative to the n-axis of the moving frame (also referred to as the current frame), and not the x-axis of the reference frame. To calculate the changes in the coordinates of a point attached to the current frame relative to the reference frame, the transformation matrix is post-multiplied instead. Note that, since the position of a point or an object attached to a moving frame is always measured relative to that moving frame, the position matrix describing the point or object is also always post-multiplied.
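The difference between pre- and post-multiplication can be made concrete. In the sketch below (assuming NumPy; the frame is a made-up example displaced along y), the same 90° rotation is applied once about the reference x-axis (pre-multiply) and once about the frame's own n-axis (post-multiply):

```python
import numpy as np

def rot_x(theta):
    """Homogeneous 4x4 rotation of angle theta about an x/n-type axis."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[1, 1], R[1, 2] = c, -s
    R[2, 1], R[2, 2] = s, c
    return R

F = np.eye(4)
F[:3, 3] = [0.0, 1.0, 0.0]         # a frame displaced along the y-axis

# Rotation about the reference frame's x-axis: pre-multiply.
F_ref = rot_x(np.pi / 2) @ F
# Rotation about the moving frame's own n-axis: post-multiply.
F_cur = F @ rot_x(np.pi / 2)

print(np.round(F_ref[:3, 3], 3))   # [0. 0. 1.]: the origin swings about x
print(np.round(F_cur[:3, 3], 3))   # [0. 1. 0.]: the origin does not move
```

Rotating about the reference axis sweeps the whole frame (origin included) around that axis, while rotating about the frame's own axis reorients the frame in place; the two results differ, which is why the order of multiplication matters.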
2-4.2. INVERSE OF TRANSFORMATION MATRICES
As was mentioned earlier, there are many situations where the inverse of a matrix will be needed in robotic analysis. One situation where transformation matrices may be involved can be seen in the next example. Suppose that the robot in Figure 2.13 is to be moved towards part P in order to drill a hole in the part. The robot's base position relative to the reference frame U is described by a frame R, the robot's hand is described by frame H, and the end effector (let's say the end of the drill bit that will be used to drill the hole) is described by frame E. The part's position is also described by frame P. The location of the point where the hole will be drilled can be related to the reference frame U through two independent paths: one through the part and one through the robot. Thus, we can write
FIGURE 2-13: The Universe, robot, hand, part, and end effector frames.
UTE = UTR RTH HTE = UTP PTE,

which means that the location of point E on the part can be achieved by moving from U to P and from P to E, or, alternatively, by transformations from U to R, from R to H, and from H to E. In reality, the transformation UTR, or the transformation of frame R relative to U (the Universe reference frame), is known, since the location of the robot's base must be known in any setup. For example, if a robot is installed in a work cell, the location of the robot's base will be known, since it is bolted to a table. Even if the robot is mobile or attached to a conveyor belt, its location at any
instant will be known, since a controller must be following the position of the robot's base at all times. The HTE, or the transformation of the end effector relative to the robot's hand, is also known, since any tool used at the end effector is a known tool, and its dimensions and configuration are known. UTP, or the transformation of the part relative to the Universe, is also known, since we must know where the part is located if we are to drill a hole in it. This location is known by putting the part in a jig, through the use of a camera and vision system, through the use of a conveyor belt and sensors, or by other similar devices. PTE is also known, since we need to know where the hole is to be drilled on the part. Consequently, the only unknown transformation is RTH, or the transformation of the robot's hand relative to the robot's base. This means that we need to find out what the robot's joint variables (the angles of the revolute joints and the lengths of the prismatic joints of the robot) must be in order to place the end effector at the hole for drilling. As you see, it is necessary to calculate this transformation, which will tell us what needs to be accomplished. The transformation will later be used to actually solve for joint angles and link lengths. To calculate this matrix, unlike in an algebraic equation, we cannot simply divide the right side by the left side of the equation. We need to pre- or post-multiply by the inverses of the appropriate matrices to eliminate them. As a result, we will have

(UTR)^-1 (UTR RTH HTE) (HTE)^-1 = (UTR)^-1 (UTP PTE) (HTE)^-1.

Since (UTR)^-1 (UTR) = I and (HTE)(HTE)^-1 = I, the left side of the equation simplifies to RTH, and we get

RTH = (UTR)^-1 UTP PTE (HTE)^-1.

We can check the accuracy of this equation by realizing that (HTE)^-1 is the same as ETH.
Thus, the equation can be rewritten as

RTH = (UTR)^-1 UTP PTE (HTE)^-1 = RTU UTP PTE ETH = RTH.

It is now clear that we need to be able to calculate the inverses of transformation matrices for robot kinematic analysis as well. To see what transpires, let's calculate the inverse of a simple rotation matrix about the x-axis. The rotation matrix about the x-axis is

Rot(x, θ) = [ 1   0    0  ]
            [ 0   Cθ  -Sθ ]
            [ 0   Sθ   Cθ ]
Recall that the following steps must be taken to calculate the inverse of a matrix:
1. Calculate the determinant of the matrix.
2. Transpose the matrix.
3. Replace each element of the transposed matrix by its own minor.
4. Divide the converted matrix by the determinant.
Applying this process to the rotation matrix, we get

det = 1(C²θ + S²θ) + 0 = 1,

Rot(x, θ)^T = [ 1    0    0  ]
              [ 0    Cθ   Sθ ]
              [ 0   -Sθ   Cθ ]
Chapter 2
Literature Study
Now calculate each minor. As an example, the minor for the 2,2 element will be Cθ - 0 = Cθ, and the minor for the 1,1 element will be C²θ + S²θ = 1, etc. As you will notice, the minor for each element is the same as the element itself; thus, the matrix of minors equals Rot(x, θ)^T. Since the determinant of the original rotation matrix is unity, dividing the minor matrix by the determinant yields the same result. Thus, the inverse of a rotation matrix about the x-axis is the same as its transpose, or Rot(x, θ)^-1 = Rot(x, θ)^T. (2.29)
Of course, you would get the same result with the second method. A matrix with this characteristic is called a unitary matrix; it turns out that all rotation matrices are unitary. Thus, all we need to do to calculate the inverse of a rotation matrix is to transpose it. You can verify that the rotation matrices about the y- and z-axes are also unitary. Beware, however, that only rotation matrices are unitary; if a matrix is not a simple rotation matrix, it may not be unitary. The preceding result is also true only for a simple 3×3 rotation matrix without a representation of location. For a homogeneous 4×4 transformation matrix, it can be shown that the inverse can be obtained by dividing the matrix into two portions: the rotation portion can simply be transposed, as it is still unitary, and the position portion is replaced by the negative of the dot product of the vector P with each of the vectors n, o, and a, as follows:

     [ nx   ox   ax   Px ]            [ nx   ny   nz   -P·n ]
T =  [ ny   oy   ay   Py ]     T^-1 = [ ox   oy   oz   -P·o ]
     [ nz   oz   az   Pz ]            [ ax   ay   az   -P·a ]
     [ 0    0    0    1  ]            [ 0    0    0     1   ]
As shown, the rotation portion of the matrix is simply transposed, the position portion is replaced by the negative of the dot products, and the last row (scale factors) is unaffected. This is very helpful, since we will need to calculate inverses of transformation matrices, and directly calculating the inverse of a 4×4 matrix is a lengthy process.
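Both results can be illustrated with a short NumPy sketch. The transform values below are made up for illustration; the names UTR, UTP, PTE, and HTE follow the frames in the text. The sketch solves the drilling equation for RTH and checks the transpose/dot-product shortcut against a direct matrix inverse:

```python
import numpy as np

def rot_x(theta):
    # Homogeneous transformation for a pure rotation about the x-axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0,  0, 0],
                     [0, c, -s, 0],
                     [0, s,  c, 0],
                     [0, 0,  0, 1.0]])

def trans(x, y, z):
    # Homogeneous transformation for a pure translation
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def fast_inverse(T):
    # Inverse of a homogeneous transform: transpose the (unitary)
    # rotation portion; the position column becomes [-P.n, -P.o, -P.a].
    R, P = T[:3, :3], T[:3, 3]
    Tinv = np.eye(4)
    Tinv[:3, :3] = R.T
    Tinv[:3, 3] = -R.T @ P
    return Tinv

# A 3x3 rotation is unitary: its inverse equals its transpose.
R3 = rot_x(0.7)[:3, :3]
assert np.allclose(np.linalg.inv(R3), R3.T)

# Illustrative (made-up) known transformations for the drilling example:
UTR = trans(1.0, 0.0, 0.0)               # robot base in the universe
HTE = trans(0.0, 0.0, 0.2)               # end effector in the hand frame
UTP = trans(2.0, 1.0, 0.5) @ rot_x(0.3)  # part in the universe
PTE = trans(0.1, 0.1, 0.0)               # hole location on the part

# RTH = (UTR)^-1  UTP  PTE  (HTE)^-1
RTH = fast_inverse(UTR) @ UTP @ PTE @ fast_inverse(HTE)

# Check: composing back reproduces UTP PTE, and the shortcut inverse
# agrees with a direct matrix inverse.
assert np.allclose(UTR @ RTH @ HTE, UTP @ PTE)
assert np.allclose(fast_inverse(UTP), np.linalg.inv(UTP))
```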
2-5.1.
IMAGE ACQUISITION
Image acquisition in humans, which begins with the eyes, translates visual information into a format that can be further manipulated by the brain. Similarly, a computer vision system needs an eye to take in the sights. In most computer vision systems that eye is a video camera.
The camera translates a scene or image into electrical signals. These signals are translated into binary numbers so that the "intelligent" computer can work with them. The output of the camera is an analog signal whose frequency and amplitude represent the brightness detail in a scene. The camera observes a scene and divides it into hundreds of fine horizontal lines. Each line creates an analog signal whose amplitude represents the brightness changes along that line. Digital computers cannot deal with analog signals; for that reason, an analog-to-digital converter is required.
2-5.2.
IMAGE PROCESSING
The next stage of computer vision involves some initial manipulation of the binary data. Image processing helps improve the quality of the image so that it can be analyzed and comprehended more efficiently. Image processing improves the signal-to-noise ratio. The signal, of course, is the information representing the objects in the image; noise is any interference, flaw, or aberration that obscures those objects. Through various computational means, it is possible to improve the signal-to-noise ratio. For example, the contrast in a scene can be improved by removing flaws, such as unwanted reflections. The process is somewhat akin to retouching a photograph to improve its quality. Once the image has been cleaned up and enhanced, it is ready for analysis.
2-5.3.
IMAGE ANALYSIS
Image analysis explores the scene to determine the major characteristics of the object under investigation. A computer program begins looking through the numbers that represent the visual information to identify specific features and characteristics. More specifically, the image analysis program looks for edges and boundaries. An edge is formed between an object and its background, or between two specific objects. These edges are identifiable because of the different brightness levels on either side of the boundary. The computer produces a simple line drawing of the objects in the scene, just as an artist would draw outlines of all the objects. Image analysis also looks for textures and shading between lines; both are useful in helping to identify the scene. At this point, a considerable amount of computer processing has taken place, yet objects still have not been identified and the scene is not yet understood. The processing, however, has prepared the image for the next step.
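The edge-finding idea described above, in which large brightness differences between neighboring pixels mark a boundary, can be sketched on a single hypothetical scan line:

```python
import numpy as np

# Sketch: find edges in one scan line by looking for large brightness
# differences between neighboring pixels. The line values and the jump
# threshold of 50 are illustrative, not from the text.
line = np.array([12, 11, 13, 12, 200, 205, 202, 14, 12, 13])
jumps = np.abs(np.diff(line))        # pixel-to-pixel brightness change
edges = np.where(jumps > 50)[0]      # indices where an edge begins
# edges marks the two boundaries of the bright object in the line
```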
2-5.4.
IMAGE UNDERSTANDING
The final step in the computer vision process is image understanding, in which specific objects and their relationships are identified. This portion of the computer vision process employs artificial intelligence techniques. The previous steps of image processing and analysis were done with algorithms; now, symbolic processing is used to interpret the scenes. Image understanding can be enhanced by neural networks.
area, it will look darker gray. By changing the size of the printed dot many gray levels may be produced, and collectively, a gray-scale picture may be printed.
2-6.1.
TWO- AND THREE-DIMENSIONAL IMAGES
Although all real scenes are three dimensional, images can be either two or three dimensional. Two-dimensional images are used when the depth of the scene or its features need not be determined. As an example, consider defining the surrounding contour or the silhouette of an object: in that case it is not necessary to determine the depth of any point on the object. Another example is the use of a vision system for inspection of an integrated circuit board. Here, too, there is no need to know the depth relationship between different parts, and since all parts are fixed to a flat plane, no information about the surface is necessary. Thus, a two-dimensional image analysis and inspection will suffice [12]. Three-dimensional image processing deals with operations that require motion detection, depth measurement, remote sensing, relative positioning, and navigation. CAD/CAM-related operations also require three-dimensional image processing, as do many inspection and object recognition tasks. Other techniques, such as the computed tomography (CT) scan, are also three dimensional. In computed tomography, either X-rays or ultrasonic pulses are used to get images of one slice of the object at a time; later, all of the images are put together to create a three-dimensional image of the internal characteristics of the object.
2-6.2.
ACQUISITION OF IMAGES
There are two types of vision cameras: analog and digital. Analog cameras are not very common anymore, but are still around; they used to be standard at television stations. Digital cameras are much more common and are mostly similar to each other. A video camera is a digital camera with an added videotape recording section; otherwise, the mechanism of image acquisition is the same as in other cameras that do not record an image. Whether the captured image is analog or digital, in vision systems the image is eventually digitized. In digital form, all data are binary and are stored in a computer file or memory chip.
2-6.2.1.
Vidicon Camera
A vidicon camera is an analog camera that transforms an image into an analog electrical signal. The signal, a variable voltage (or current) versus time, can be stored, digitized, broadcast, or reconstructed into an image. Figure 2.15 shows a simple schematic of a vidicon camera. With the use of a lens, the scene is projected onto a screen made up of two layers: a transparent metallic film and a photoconductive mosaic that is sensitive to light. The mosaic reacts to the varying intensity of light by varying its resistance. As a result, as the image is projected onto it, the magnitude of the resistance at each location varies with the intensity of the light. An electron gun generates and sends a continuous cathode beam (a stream of electrons with a negative charge) through two pairs of capacitors (deflectors) that are perpendicular to each other. Depending on the charge on each pair of capacitors, the electron beam is deflected up or down and left or right, and is projected onto the photoconductive mosaic. At each instant, as the beam
of electrons hits the mosaic, the charge is conducted to the metallic film and can be measured at the output port. The voltage measured at the output is V = IR, where I is the current (of the beam of electrons) and R is the resistance of the mosaic at the point of interest.
Now suppose that we routinely change the charges on the two capacitors and thus deflect the beam both sideways and up and down so as to cause it to scan the mosaic (a process called a raster scan). As the beam scans the image, at each instant the output is proportional to the resistance of the mosaic, that is, proportional to the intensity of the light on the mosaic. By reading the output voltage continuously, an analog representation of the image can be obtained.

To create moving images in televisions, the image is scanned and reconstructed 30 times a second. Since human eyes possess a temporary hysteresis effect of about 1/10 second, images changing 30 times a second are perceived as continuous and thus moving. The image is divided into two 240-line sub-images, interlaced onto each other; thus, a television image is composed of 480 image lines, changing 30 times a second. In order to return the beam to the top of the mosaic, another 45 lines are used, creating a total of 525 lines. In most other countries, 625 lines are the standard. Figure 8.3 depicts a raster scan in a vidicon camera.

If the signal is to be broadcast, it is usually frequency modulated (FM); that is, the frequency of the carrier signal is a function of the amplitude of the signal. The signal is broadcast and received by a receiver, where it is demodulated back to the original signal, creating a variable voltage with respect to time. To re-create the image, for example in a television set, this voltage must be converted back to an image. To do this, the voltage is fed into a cathode-ray tube (CRT) with an electron gun and similar deflecting capacitors, as in a vidicon camera. The intensity of the electron beam in the television is now proportional to the voltage of the signal, and the beam is scanned similarly to the way it is in the camera.
In the television set, however, the beam is projected onto a phosphorous-based material on the screen, which glows proportionally to the intensity of the beam thus re-creating the image.
For color images, the projected image is decomposed into the three colors red, green, and blue (RGB). The exact same process is repeated for the three images, and three simultaneous signals are produced and broadcast. In the television set, three electron guns regenerate three simultaneous images in RGB on the screen, except that the screen has three sets of small dots (pixels) that react by glowing in the RGB colors and are repeated over the entire screen. All color images in any system are divided into RGB images and are dealt with as three separate images.
2-6.2.2.
Digital Camera
A digital camera is based on solid-state technology. As with other cameras, a set of lenses is used to project the area of interest onto the image area of the camera. The main part of the camera is a solid-state silicon wafer image area that has hundreds of thousands of extremely small photosensitive areas, called photosites, printed on it. Each small area of the wafer is a pixel. As the image is projected onto the image area, a charge develops at each pixel location of the wafer that is proportional to the intensity of light at that location. Thus, a digital camera is also called a charge coupled device (CCD) camera or a charge injection device (CID) camera. The collection of charges, if read sequentially, is a representation of the image pixels. The wafer may have as many as 520,100 pixels in an area with dimensions of a fraction of an inch. Obviously, it is impossible to have direct wire connections to all of these pixels in order to measure the charge in each one. Instead, 30 times a second, the charges are moved to optically isolated shift registers next to each photosite, moved down to an output line, and then read [8][9]. The result is that every thirtieth of a second the charges in all pixel locations are read sequentially and stored or recorded. The output is a discrete representation of the image: a voltage sampled in time, as shown in Figure 2-17(a). Figure 2-17(b) shows the CCD element of a VHS camera. Similar to CCD cameras for visible light, long-wavelength infrared cameras yield a television-like image of the infrared emissions of a scene [3]. In summary, image acquisition with a digital camera involves the development, at each pixel location, of a charge proportional to the light at that pixel; the image is then read by moving the charges to optically isolated shift registers and reading them out at a known rate.
FIGURE 2-17: (a) Image data collection model. (b) The CCD element of a VHS.
2-6.2.3.
Analog-to-Digital Conversion
The video output signal from the camera is fed to an analog-to-digital converter (ADC). The ADC periodically samples the analog signal and converts its amplitude into a parallel binary number. This sampling process is shown in Figure 2.18. The sampling is usually done by a circuit called a track/store or sample/hold (S/H) amplifier (Figure 2.19). As the analog signal is applied to the input of the S/H amplifier, a digital clock signal occurring at a fixed rate drives a switch (a FET) in the S/H amplifier. During the sample state, the FET conducts, and the output of the S/H amplifier is the same as the analog input; the charge on the capacitor follows the input voltage value. When the binary control signal switches states, the FET is cut off, and the amplifier stores, or holds, the value of the analog input signal at the instant the FET cuts off. That analog voltage level is stored as an electrical charge on a capacitor. The output of the S/H amplifier is then a fixed DC voltage. This signal is then fed to the ADC. The ADC performs a
conversion operation and outputs a binary number whose value is proportional to the amplitude of the analog signal.
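The sample/hold-plus-ADC chain described above can be sketched numerically. The signal, sampling rate, and bit depth below are illustrative choices, not values from the text; the `adc` helper is hypothetical:

```python
import numpy as np

# Minimal sketch of the sample/hold plus ADC chain: hold the analog
# amplitude at fixed clock instants, then map each held voltage to an
# n-bit binary code.
def adc(t, signal, sample_rate, n_bits, v_max):
    clock = np.arange(t[0], t[-1], 1.0 / sample_rate)
    held = np.interp(clock, t, signal)       # sample/hold stage
    levels = 2 ** n_bits                     # e.g. 256 codes for 8 bits
    codes = np.round(held / v_max * (levels - 1))
    return np.clip(codes, 0, levels - 1).astype(int)

t = np.linspace(0.0, 1.0, 1000)
line = 0.5 + 0.4 * np.sin(2 * np.pi * 3 * t)  # one scan line, 0..1 V
codes = adc(t, line, sample_rate=100, n_bits=8, v_max=1.0)
# codes is a sequence of 100 integers in the range 0..255
```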
2-6.2.4.
Pixels
Each time the video signal is sampled by the ADC, we say that a pixel has been created. A pixel is the value of light intensity at one particular point on a scan line; a pixel, therefore, is a small element into which each scan line is broken. Each scan line contains approximately 300 to 500 pixels. These samples give a fairly accurate representation of the light intensity variation across the scan line. Naturally, the more pixels per line, the higher the definition. In any case, the pixel is a point of light that is, in effect, some shade of gray. That shade of gray is designated by a particular binary number [12]. The number of output bits in the ADC determines the total number of gray levels that can be represented. In some low-definition systems, a 4-bit output is satisfactory.
With 4 bits, sixteen possible gray levels can be represented, ranging from 0000 for black to 1111 for white. For higher definition systems, more bits are used. Eight-bit ADCs are popular today because they provide 256 individual gray levels, which gives extremely fine detail.
2-6.2.5.
Digital Images
The sampled voltages from the aforementioned processes are first digitized through an analog-to-digital converter (ADC) and then either stored in the computer storage unit in an image format such as TIFF, JPEG, or bitmap, or displayed on a monitor. Since it is digitized, the stored information is a collection of 0's and 1's that represent the intensity of light at each pixel; a digitized image is nothing more than a computer file containing this collection of 0's and 1's, sequentially stored to represent the intensity of light at each pixel. These files can be accessed and read by a program, duplicated and manipulated, or rewritten in a different form. Vision routines generally access this information, perform some function on the data, and either display the result or store the manipulated result in a new file [12].
An image that has different gray levels at each pixel location is called a gray image. The gray values are digitized by a digitizer, yielding strings of 0's and 1's that are subsequently displayed or stored. A binary image is an image in which each pixel is either fully light or fully dark, a 0 or a 1. To achieve a binary image, in most cases a gray image is converted by using the histogram of the image and a cutoff value called a threshold. A histogram determines the distribution of the different gray levels. One can pick the value that best serves as a cutoff level with the least distortion and use it as a threshold, assigning 0 (or "off") to all pixels whose gray levels are below the threshold and 1 (or "on") to all pixels whose gray values are above it.
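A minimal sketch of this gray-to-binary conversion follows; the image values and the chosen threshold are illustrative, not taken from any figure in the text:

```python
import numpy as np

# Sketch: compute the histogram of a small gray image, pick a cutoff
# in the empty "valley" between dark and bright pixels, and assign
# 0 ("off") below the threshold and 1 ("on") at or above it.
gray = np.array([[ 20,  30, 200, 210],
                 [ 25, 190, 205,  35],
                 [ 30,  40, 195, 200],
                 [ 22, 215,  28,  33]])
hist, _ = np.histogram(gray, bins=256, range=(0, 256))
threshold = 128                      # illustrative cutoff in the valley
binary = (gray >= threshold).astype(np.uint8)
# binary now contains only 0's and 1's
```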
2-6.3.
IMAGE PROCESSING
With the binary version of the scene stored in memory, image processing can begin. Image processing, also known as image enhancement, is the process of improving the quality of the image. Anything that can be done to make the image clearer will simplify analysis and lead to improved understanding. From the very instant the light reflected from a scene enters a computer vision system, it is distorted and misrepresented by system imperfections. For instance, the camera lens produces some distortion; inconsistencies in the target area of the vidicon or CCD produce uneven light intensities; and the imperfect process of analog-to-digital conversion introduces further misrepresentations. There can be other problems as well. Extremely low light levels can produce a scene that is difficult for the camera to see, and the camera itself may not be sensitive enough to clearly capture fine detail in the scene. The same is true for scenes with excessive brilliance, where the camera may produce distortions. Regardless of the source of the degradation of the signal, processing techniques can be used to eliminate or minimize these problems.
2-6.4.
IMAGE-PROCESSING TECHNIQUES
As was mentioned earlier, image-processing techniques are used to enhance, improve, or otherwise alter an image and to prepare it for image analysis. Usually, during image processing, information is not extracted from the image; the intention is to remove faults, trivial information, or information that may be important but not useful, and to improve the image. Image processing is divided into many subprocesses, including histogram analysis, thresholding, masking, edge detection, segmentation, region growing, and modeling, among others. In the next few sections, we will study some of these processes and their applications.
2-6.4.1.
Preprocessing
Before image enhancement occurs, preprocessing takes place to improve the scene. First, filters may be attached to the lens to control the amount of light, its color, and the contrast of the various objects in the scene. Second, many computer vision systems operate in a controlled environment where it is possible not only to control illumination level, but also to position light sources or the objects to be viewed for maximum visibility and comprehension.
2-6.4.2.
Lighting
Lighting of the subject should be considered an important element of the image acquisition system; it is mentioned separately here because lighting plays such an important part in the success of the application. Anyone who has studied the moon with a telescope, or even with binoculars, knows that the terrain features are greatly enhanced by the angle of incidence of the dominant light source, the sun. Thus, mountains and canyons are much more visible during the waxing and waning portions of the cycle, when viewed near the sunset or sunrise line, than at full moon. Machine vision systems, too, can be made more effective by cleverly taking advantage of unusual lighting effects.
2-6.4.3.
Frequency Content of an Image
Consider sequentially plotting the gray values of the pixels of an image (on the y-axis) against time or pixel location (on the x-axis) as the image is scanned. Although we are discussing a discrete (digitized) signal, it may be transformed into a large number of sines and cosines with different amplitudes and frequencies that, if added together, will reconstruct the signal. As discussed earlier, slowly changing signals (such as small changes between succeeding pixel gray values) require few sines and cosines to be reconstructed and thus have low frequency content. On the other hand, quickly varying signals (such as large differences between neighboring pixel gray levels) require many more frequencies to be reconstructed and thus have high frequency content. Both noise and edges are instances in which one pixel value is very different from its neighbors; thus, noise and edges create the larger frequencies of a typical frequency spectrum, whereas slowly varying sets of pixel gray levels, representing the object, contribute to the lower frequencies of the spectrum.

Now suppose that a signal is passed through a low-pass filter, that is, a filter that allows lower frequencies through without much attenuation in amplitude but severely attenuates the amplitudes of the higher frequencies in the signal. The filter will reduce the influence of all high frequencies, including those of noise and edges. This means that although a low-pass filter will reduce noise, it will also reduce the clarity of the image by attenuating its edges, softening the image throughout. A high-pass filter, on the other hand, will increase the apparent effect of the higher frequencies by severely attenuating the low-frequency amplitudes. In that case, noise and edges will be left alone, but slowly changing areas will disappear from the image. To see how the Fourier transform can be applied in this case, let's look at the data of Figure 2.20 once again. The grayness level of the pixels of row 9 is
repeated in Figure 2.21. A simple first-approximation Fourier transform of the gray values [10] was performed for the first four harmonic frequencies, and then the signal was reconstructed, as shown in Figure 2.21. Comparing the two graphs reveals that a digital, discrete signal can be reconstructed, although its accuracy depends on the number of sines and cosines used, the method of integration, and so on.
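The low-pass behavior described above can be sketched with a discrete Fourier transform: keep only the first few harmonics of a row of gray values and reconstruct. The row below is made up for illustration; it is not the data of Figure 2.21:

```python
import numpy as np

# A hypothetical row of pixel gray values containing a sharp edge.
row = np.array([10., 12, 9, 11, 180, 182, 179, 181, 10, 11, 9, 12])

coeffs = np.fft.rfft(row)
full = np.fft.irfft(coeffs, n=len(row))
# Keeping all frequencies reconstructs the row exactly.
assert np.allclose(full, row)

# Low-pass: keep the DC term plus the first four harmonics, zero the rest.
lowpass = coeffs.copy()
lowpass[5:] = 0
smooth = np.fft.irfft(lowpass, n=len(row))

# The sharp jumps are attenuated: the steepest pixel-to-pixel change in
# the filtered row is smaller than in the original (the edge softens).
assert np.abs(np.diff(smooth)).max() < np.abs(np.diff(row)).max()
```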
2-6.4.4.
Windowing
Windowing is a means of concentrating vision system analysis on a small field of view, thereby conserving the computer resources of run time and storage. Windowing is an essential first step for virtually every robotic vision system analysis, and the most practical applications of windowing employ fixed windows; that is, the window is always set up in the same place within the image. This usually means that some sort of fixturing must be used to identically position every workpiece so that consistency of the window subject is maintained. More sophisticated machine vision systems are able to employ adaptive windowing, in which the system selects the appropriate window out of context. In such systems, a search of the entire image detects known landmarks that identify the position and orientation of the subject workpiece. The landmarks can then be used by the system to find the window area of interest and proceed as in a fixed-window scheme. The advantage is obvious: orientation and positioning become unnecessary, resulting in dramatic savings in production costs. Some robots equipped with adaptive windowing capability have experienced success with "bin picking," the selection and picking up of workpieces piled randomly in a bin, an easy task for humans but a very difficult one for a robot. Once a window of interest has been identified, a variety of analyses can enhance the target features or identify and describe them in detail. The most basic of these analyses is thresholding.
2-6.4.5.
Sampling and Quantization
To be useful in image processing, the image must be digitized both spatially and in amplitude. The more pixels that are present and individually read, the better the resolution of the camera and the image. This technique is called sampling, as the light intensities are sampled at equally spaced intervals. A larger sampling rate will create a larger number of pixel data and thus better resolution. Figure 2.22 shows the same image sampled at (a) 432 × 576, (b) 108 × 144, (c) 54 × 72, and (d) 27 × 36 pixels. As the sampling rate decreases, the clarity of the image is lost. The voltage or charge read at each pixel is an analog value and must also be digitized. Digitization of the light intensity values at each pixel location is called quantization. Depending on the number of bits used, the resolution of the image will change. The total number of gray level possibilities is 2^n, where n is the number of bits. For a 1-bit analog-to-digital converter (ADC)
there are only two possibilities, "off" or "on" (0 or 1), called a binary image. For quantization with an 8-bit ADC, the maximum number of gray levels will be 256; thus, the image will have 256 different gray levels in it. Quantization and sampling resolutions are completely independent of each other. For example, a high-resolution image may be converted into a binary image, in which case the quantization is into only two digits (0 and 1, or black and white, or "off" and "on"). Still, the same image may instead be quantized into 8 bits, which can yield a spectrum of 256 different shades of gray.
Figure 2.23 shows the same image quantized at (a) 2 levels, (b) 4 levels, (c) 8 levels and (d) the original 44 levels.
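A small sketch of quantization follows, assuming 8-bit input pixels; the `quantize` helper is hypothetical, not a standard routine:

```python
import numpy as np

# Sketch: quantize pixel intensities into 2**n gray levels. Sampling
# (spatial resolution) and quantization (gray-level resolution) are
# independent choices, as noted in the text.
def quantize(gray, n_bits):
    levels = 2 ** n_bits
    step = 256 // levels
    return (gray // step) * step   # map each pixel to its level's base value

gray = np.arange(0, 256, dtype=np.uint8).reshape(16, 16)  # test ramp image
binary_like = quantize(gray, 1)   # 2 levels: a binary-style image
fine = quantize(gray, 8)          # 256 levels: image is unchanged
```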
Example 2.2
Consider an image with 256 by 256 pixels. The total number of pixels in the image will be 256 × 256 = 65,536. If the image is binary, it requires 1 bit to record each pixel as 0 or 1; thus, the total memory needed to record the image will be 65,536 bits, or, with 8 bits to a byte, 8,192 bytes. If each pixel were digitized at the rate of 8 bits (or 256 shades of gray), it would require 65,536 × 8 = 524,288 bits, or 65,536 bytes. If the image were in color, it would require 65,536 bytes for each of the three colors red, green, and blue. For a color video clip changing at the rate of 30 images per second, the memory requirement will be 65,536 × 3 × 30 = 5,898,240 bytes per second. Of course, this is only the memory requirement for recording the image pixels and does not include index information and other bookkeeping requirements.
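The arithmetic of Example 2.2 can be checked directly:

```python
# Memory needed for a 256 x 256 image at various bit depths, following
# the arithmetic in Example 2.2.
pixels = 256 * 256                     # 65,536 pixels
binary_bytes = pixels * 1 // 8         # 1 bit per pixel -> 8,192 bytes
gray_bytes = pixels * 8 // 8           # 8 bits per pixel -> 65,536 bytes
color_bytes = gray_bytes * 3           # one byte per R, G, B channel
video_bytes_per_s = color_bytes * 30   # 30 frames per second
# video_bytes_per_s is 5,898,240 bytes per second, as in the example
```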
2-6.4.6.
SAMPLING THEOREM
Consider a simple sinusoidal signal with frequency f, as shown in Figure 2.25, and suppose that the signal is sampled at the rate fs. This means that the sampling circuit will read the amplitude of the signal with a frequency of fs. Now suppose that we want to use the sampled data to reconstruct the signal. Doing this would be similar to sampling a sound source, such as a CD player, and then trying to reconstruct the sound signal from the sampled data through a speaker. One possibility would be that, by chance, the same signal might be reconstructed. However, as you see in Figure 2.26, it is quite possible that from the same data another signal may be reconstructed that is completely different from the original. Both signals are valid, and in fact many other signals can be valid as well and
might be reconstructed from the sampled data. This loss of information is called aliasing of the sampled data, and it can be a serious problem.
In order to prevent aliasing, according to what is referred to as the sampling theorem, the sampling frequency must be at least twice as large as the largest frequency present in the signal. In that case, one can reconstruct the original signal without aliasing. The highest frequency present in the signal can be determined from the frequency spectrum of the signal. Using the Fourier transform, one finds that the frequency spectrum of a signal will contain many frequencies. However, as we have seen, the higher frequencies may have smaller amplitudes. One can always pick a maximum frequency that may be of interest, while assuming that the frequencies with very low amplitudes beyond that point can be ignored without much effect in the system's total representation. In that case, the sampling rate of the signal must be at least twice as large as this maximum frequency. In practice, the sampling rate is generally chosen to be even larger, to further ensure that aliasing of the signal will not occur. Frequencies four to five times as large as the maximum frequency are common. As an example, consider a CD player. Theoretically, human ears can hear frequencies of up to about 20,000 Hz. If the CD player is to be able to reconstruct the digitized sampled music, the sampling rate of the laser sensor must be at
least twice as large, namely, 40,000 Hz. In practice, CD players sample at a rate of 44,100 Hz; at lower sampling rates, the sound may become distorted. In the example of Figure 2.27, the sampling rate is lower than the higher frequencies of the signal. Although the lower frequencies of the signal are reconstructed, the reconstruction will not contain the higher frequencies of the original signal. The same can happen to any signal, including audio and video signals. For images, too, if the sampling rate (which translates into the resolution of the image) is low, the sampled data may not capture all the necessary detail; information in the image is lost, and the image cannot be reconstructed to match the original. The image in Figure 2.24 is sampled at a very low rate, and the information in it is lost; this is why you cannot tell what the image is. However, if the sampling rate is increased, there comes a point at which there is enough information to recognize the image, and still higher resolutions or sampling rates transfer more information, so that increasingly more detail can be recognized. Figure 2.28 is the same image as in Figure 2.24, but at 2, 4, and 16 times higher resolution. Now suppose that you need to distinguish between a bolt and a nut in a vision system in order to direct a robot to pick up the parts. Because the information representing a bolt is very different from that representing a nut, low-resolution images will still enable you to determine what the part is. However, in order to recognize the license plate number of a car moving in traffic, one would need a high-resolution image to extract enough detail, such as the numbers on the license plate.
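Aliasing can also be demonstrated numerically. In the sketch below, a 3 Hz sine sampled at 4 Hz (below the Nyquist rate of 2 × 3 = 6 Hz) produces exactly the same sample values as a 1 Hz sine with flipped sign, so the two signals cannot be distinguished from the samples. The frequencies are illustrative:

```python
import numpy as np

# Sketch of aliasing: sample a 3 Hz sine at only 4 Hz.
f = 3.0
fs = 4.0                              # sampling rate below 2 * f
n = np.arange(0, 8)
t = n / fs                            # sampling instants
samples = np.sin(2 * np.pi * f * t)

# The aliased component appears at |f - fs| = 1 Hz; its samples match
# the 3 Hz samples exactly (with a sign flip), point for point.
alias = -np.sin(2 * np.pi * 1.0 * t)
assert np.allclose(samples, alias)
```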
2-6.4.7.
Histogram of Images
A histogram is a representation of the total number of pixels of an image at each gray level. Histogram information is used in a number of different processes, including thresholding; for example, histogram information can help in determining a cutoff point when an image is to be transformed into binary values. It can also be used to decide whether there are any prevalent gray levels in an image. For instance, suppose a systematic source of noise in an image causes many pixels to have one "noisy" gray level. A histogram can then be used to determine what the noisy gray level is in order to attempt to remove or neutralize the noise. Now suppose that an image has all its pixel gray levels clustered between two relatively close values, as in Figure 2.29. In this image, all pixel gray values are between the 120 and 180 gray levels, at four-unit intervals. (The image is quantized at 16 distinct levels between 0 and 255.) Figure 2.29 shows the histogram of the image, and clearly, all pixel gray levels lie between 120 and 180, a relatively narrow range; as a result, the image is not very clear and details are not visible. Now suppose that we equalize the histogram such that the same 16 gray levels present in the image are spread out between the 0 and 255 gray levels, at intervals of 17 units, instead of the present 120-180 gray levels at intervals of 4 units. Due to this equalization, the image is vastly improved, as shown in Figure 2.29 along with its corresponding histogram. Notice that the number of pixels at each gray level is exactly the same in both cases; only the gray levels are spread out.
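The spreading described above can be sketched as a lookup-table remapping. The clustered image below is synthetic, generated to match the 120-180 range in the text:

```python
import numpy as np

# Sketch: 16 gray levels clustered in 120..180 (4-unit steps) are
# remapped onto 0..255 (17-unit steps). The count of pixels at each
# level is unchanged; only the levels themselves move apart.
rng = np.random.default_rng(1)
old_levels = np.arange(120, 184, 4)          # 120, 124, ..., 180
new_levels = np.arange(0, 256, 17)           # 0, 17, ..., 255
img = rng.choice(old_levels, size=(8, 8))    # a made-up clustered image

lut = dict(zip(old_levels.tolist(), new_levels.tolist()))
stretched = np.vectorize(lut.get)(img)

# Same histogram shape, wider spread: per-level pixel counts match.
for old, new in lut.items():
    assert (img == old).sum() == (stretched == new).sum()
```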
2-6.4.8. Histogram Flattening
Another technique that helps improve some images is known as histogram flattening. A histogram is a vertical bar chart used to plot statistical information. A histogram can be constructed for the digitized image by counting the number of times that each distinct gray level occurs. The result might look like the chart shown in Figure 2.30(a). With this histogram constructed in memory, a program can determine whether there are excessive high or low values. For example, one analysis may discover an unusually high number of very bright or very dark levels. If certain intensity values occur widely throughout the scene, histogram flattening can reduce them. Brightness values that occur only occasionally can be increased.
2-6.4.9. Thresholding
Thresholding is the process of dividing an image into different portions (or levels) by picking a certain grayness level as a threshold, comparing each pixel value with the threshold, and then assigning the pixel to one of the portions or levels, depending on whether the pixel's grayness level is below the threshold (off, zero, or not belonging) or above it (on, 1, or belonging). Thresholding can be performed either at a single level or at multiple levels, in which case the image is processed by dividing it into "layers," each with a selected threshold [11]. To aid in choosing an appropriate threshold, many different techniques have been suggested, ranging from simple routines for binary images to sophisticated techniques for complicated images. Early routines used for a binary image had the object lighted and the background completely dark. This condition can be achieved with controlled lighting in industrial situations, but may not be possible in other environments. In binary images, the pixels are either on or off, and thus choosing a threshold is simple and straightforward. In certain other situations, the image will have multiple gray levels, and its histogram will exhibit a bimodal distribution; in this case, the valley between the two modes is chosen as the threshold value. More advanced techniques use statistical information and distribution characteristics of the image pixels to develop a thresholding value. As the thresholding value changes, so does the image. Figure 2.31 shows an original image with 256 gray levels and the results of thresholding at grayness levels of 100 and 150.
FIGURE 2-31: Thresholding an image: (a) original image with 256 gray levels; (b) thresholded at gray level 100; (c) thresholded at gray level 150.
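Single-level thresholding as described above can be sketched directly. This is an illustrative helper, not from the original text: the image is a plain list of rows, and pixels at or above the chosen level are assigned 1 (belonging), the rest 0.

```python
# Sketch: single-level thresholding of a gray image into a binary image.
# Hypothetical helper; image is a plain list of rows of gray levels.

def threshold(image, level):
    """Assign 1 to pixels at or above the threshold, 0 to the rest."""
    return [[1 if g >= level else 0 for g in row] for row in image]

gray = [[30, 120, 200],
        [90, 150, 250]]
print(threshold(gray, 100))   # -> [[0, 1, 1], [0, 1, 1]]
print(threshold(gray, 150))   # -> [[0, 0, 1], [0, 1, 1]]
```

Raising the threshold from 100 to 150 removes the mid-gray pixel from the foreground, mirroring how the thresholded images in Figure 2.31 change with the chosen level.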
2-6.4.10. Connectivity
Sometimes we need to decide whether neighboring pixels are somehow "connected" or related to each other. Connectivity establishes whether they have the same properties, such as
being of the same region, coming from the same object, having a similar texture, etc. To establish connectivity of neighboring pixels, we first have to decide upon a connectivity path. There are three fundamental connectivity paths for two-dimensional image processing and analysis: +4- or ×4-connectivity, H6- or V6-connectivity, and 8-connectivity. In three dimensions, connectivity between voxels (volume cells) can range from 6 to 26. The following terms are defined with respect to Figure 2.32. +4-connectivity applies when a pixel p's relationship is analyzed only with respect to the four pixels immediately above, below, to the left, and to the right of p (namely, b, g, d, and e).
×4-connectivity applies when a pixel p's relationship is analyzed only with respect to the four pixels immediately across from it diagonally (a, c, f, and h). For a pixel p(x, y), the relevant pixels are as follows:
For +4-connectivity: (x+1, y), (x-1, y), (x, y+1), (x, y-1);
For ×4-connectivity: (x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1).
H6-connectivity applies when a pixel p's relationship is analyzed only with respect to the six neighboring pixels on the two rows immediately above and below p (a, b, c, f, g, h). V6-connectivity applies when a pixel p's relationship is analyzed only with respect to the six neighboring pixels on the two columns immediately to the right and to the left of p (a, d, f, c, e, h). For a pixel p(x, y), the relevant pixels are as follows:
For H6-connectivity: (x-1, y+1), (x, y+1), (x+1, y+1), (x-1, y-1), (x, y-1), (x+1, y-1);
For V6-connectivity: (x-1, y+1), (x-1, y), (x-1, y-1), (x+1, y+1), (x+1, y), (x+1, y-1).
8-connectivity applies when a pixel p's relationship is analyzed with respect to all eight pixels surrounding it (a, b, c, d, e, f, g, h). For a general pixel p(x, y), the relevant pixels are: (x-1, y-1), (x, y-1), (x+1, y-1), (x-1, y), (x+1, y), (x-1, y+1), (x, y+1), (x+1, y+1).
So far, we have studied some general issues and fundamental techniques that are used in image processing and analysis. Next, we will discuss particular techniques that are used for specific applications.
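The neighbor sets above can be generated mechanically. The sketch below is illustrative only; the function name and rule labels are hypothetical choices, not from the original text.

```python
# Sketch: neighbor sets for the connectivity paths defined above, for a
# pixel p(x, y). Illustrative helper; names are not from the original text.

def neighbors(x, y, rule):
    plus4 = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    diag4 = [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]
    if rule == "+4":
        return plus4
    if rule == "x4":
        return diag4
    if rule == "H6":   # the two rows immediately above and below p
        return [(x - 1, y + 1), (x, y + 1), (x + 1, y + 1),
                (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
    if rule == "V6":   # the two columns immediately left and right of p
        return [(x - 1, y + 1), (x - 1, y), (x - 1, y - 1),
                (x + 1, y + 1), (x + 1, y), (x + 1, y - 1)]
    if rule == "8":
        return plus4 + diag4
    raise ValueError("unknown connectivity rule: " + rule)

print(neighbors(4, 3, "+4"))       # the four pixels above, below, left, right
print(len(neighbors(4, 3, "8")))   # -> 8
```

A connectivity search such as the one in Example 2.3 then amounts to repeatedly expanding a frontier of pixels through whichever neighbor set the chosen rule defines.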
Example 2.3
In the image of Figure 2.33, starting with pixel (4,3), find all succeeding pixels that can be considered connected to each other based on the +4-, ×4-, H6-, V6-, and 8-connectivity rules.
Solution: Figure 2.34 shows the results of the five connectivity searches. For each search, take one pixel, find all others that are connected to it based on the rule you are working with, and then search the pixels that were found to be connected to the
previous ones for additional connected pixels, until no new connected pixels are found; all of the remaining pixels are not connected. We will use the same rules later for other purposes, such as region growing.
FIGURE 2-34: The results of the connectivity searches for Example 2.3.
Applying the mask over the corner of the image, with a normalizing value of 9 (the sum of all the values in the mask), yields:
X = (20×1 + 20×1 + 20×1 + 20×1 + 100×1 + 20×1 + 20×1 + 20×1 + 20×1) / 9 ≈ 29
As a result of applying the mask on the indicated corner, the pixel with the value of 100 changes to 29, and the large difference between the noisy pixel and the surrounding pixels (100 vs. 20) becomes much smaller (29 vs. 20), thus reducing the noise. With this characteristic, the mask acts as a low-pass filter. Notice that the operation will introduce new gray levels into the image (here, 29) and thus will change its histogram. Similarly, this averaging low-pass filter will also reduce the sharpness of edges, making the resulting image softer and less focused. Figure 8.26 shows an original image (a), a corrupted image with noise (b), the image after a 3 × 3 averaging filter application (c), and the image after a 5 × 5 averaging filter application (d). As you can see, the 5 × 5 filter works even better than the 3 × 3 filter, but requires a bit more processing. There are other averaging filters, such as the Gaussian averaging filter (also called the mild isotropic low-pass filter), which is shown in Figure 8.27. This filter will similarly improve an image, but with a slightly different result.
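The numeric example above can be reproduced directly: a 3 × 3 mask of all 1s, normalized by 9, applied at the noisy center pixel. The image representation and the helper name `average_at` are illustrative assumptions, not from the original text.

```python
# Sketch: a 3 x 3 averaging (low-pass) mask of all 1s, normalized by 9,
# applied at one pixel. Image is a plain list of rows; name is illustrative.

def average_at(image, r, c):
    """Average the 3 x 3 neighborhood centered at (r, c)."""
    total = sum(image[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1))
    return round(total / 9)   # normalize by the sum of the mask values

img = [[20, 20, 20],
       [20, 100, 20],
       [20, 20, 20]]
print(average_at(img, 1, 1))   # -> 29, replacing the noisy value of 100
```

Sliding this operation over every pixel gives the full low-pass filtering the text describes, at the cost of the softened edges it also warns about.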
random noise N(x, y), then the desired image I(x, y) can be recovered by averaging, because the summation of the random noise terms tends to zero. If each acquired frame is A_i(x, y) = I(x, y) + N_i(x, y), then

(1/n) Σ_{i=1}^{n} A_i(x, y) = I(x, y) + (1/n) Σ_{i=1}^{n} N_i(x, y) ≈ I(x, y),  since  (1/n) Σ_{i=1}^{n} N_i(x, y) → 0.

Although image averaging reduces random noise, unlike neighborhood averaging, it does not blur the image or reduce its focus.
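Image averaging over frames can be sketched as follows. This is an illustrative toy, not the original method's implementation: frames are plain lists of rows, and the equal-and-opposite noise in the example is chosen so the cancellation is exact.

```python
# Sketch: averaging n frames of the same scene; the random noise terms
# tend to cancel while the underlying image I(x, y) is preserved.
# Frames are plain lists of rows; names are illustrative.

def average_frames(frames):
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]

# Two frames of a 1 x 2 image I = [50, 100], with equal-and-opposite noise:
f1 = [[52, 98]]    # noise (+2, -2)
f2 = [[48, 102]]   # noise (-2, +2)
print(average_frames([f1, f2]))   # -> [[50.0, 100.0]]
```

With truly random noise the cancellation is only statistical, improving as n grows; unlike neighborhood averaging, no spatial blurring is introduced.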
Figure 2.39 shows an original image (a), the image corrupted with random Gaussian noise (b), and the image improved with a 3 × 3 median filter (c) and a 7 × 7 median filter (d).
2-6.5. IMAGE ANALYSIS
Up to this point, we have been concerned generally with improving the quality of the image describing the scene. Image analysis begins the process of locating and defining the various objects in the scene. Artificial intelligence then attempts to determine what the objects are.
Image analysis is accomplished by identifying regions and boundaries or edges. Edges represent boundaries where two surfaces come together. They also identify the interface between two different surfaces or between an object and a background. An edge is also formed between two objects when one is in front of or behind another. The line between an object and its shadow and the outline of the shadow itself form edges. The first step in image analysis, therefore, is to locate all of these edges and boundaries. Edges and regions, or surfaces, completely define the scene. Regions are large, flat areas of an object or scene that have the same intensity value and occur between the various edges and boundary lines. Various mathematical techniques have been developed for detecting edges and surfaces, and these form the core of image analysis.
2-6.5.1. Object Recognition by Features
Objects in an image may be recognized by their features, which may include, but are not limited to, gray-level histograms; morphological features such as area, perimeter, number of holes, etc.; eccentricity; cord length; and moments. In many cases, the information extracted is compared with a priori information about the object, which may be in a lookup table. For example, suppose that two objects are present in the image, one with two holes and one with one hole. By using previously discussed routines, it is possible to determine how many holes each part has, and by comparing the two parts (say they are assigned regions 1 and 2) with information about them in a lookup table, it is possible to determine what each of the two parts is. As another example, suppose that a moment analysis of a known part is performed at different angles, that the moment of the part relative to an axis is calculated for these angles, and that the resulting data are collected in a lookup table. Later, when the moment of the part in the image is calculated relative to the same axis and is compared with the information in the lookup table, the angle of the part in the image can be estimated.
We next discuss a few techniques and different features that may be used for object recognition.
2-6.5.2. Features Used for Object Identification
The following morphological features may be used for object recognition and identification: a. The average, maximum, or minimum gray levels may be used to identify different parts or objects in an image. As an example, assume that the image is divided into three parts, each with a different color or texture that will create different gray levels in the image. If the average, maximum, or minimum gray levels of the objects are found, say, through histogram mapping, the objects can be recognized by comparing them with this information. In other cases, even the presence of one particular gray level may be enough to recognize a part. b. The perimeter, area, and diameter of an object, as well as the number of holes it has and other morphological characteristics, may be used to identify the object. The perimeter of an object may be found by first applying an edge-detection routine and then counting the number of pixels on the perimeter. The left-right search technique of Section 8.18 can also be used to calculate the perimeter of the object by counting the number of pixels that are on the object's path in an accumulator. The area of the object can be calculated by region-growing techniques. Moment equations can also be used, as will be discussed later. The diameter of a noncircular object is defined as the maximum distance between any two points on any line that crosses the identified area of the object. c. An object's aspect ratio is the ratio of the width to the length of a rectangle enclosed about the object, as shown in Figure 2.42. All aspect ratios, except for the minimum aspect ratio, are sensitive to orientation; thus, the minimum aspect ratio is generally used to identify objects. d. Thinness is defined as one of the following two ratios:
Thinness = (perimeter)² / area
Thinness = diameter / area
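The two thinness ratios can be written as simple helpers. The inputs (perimeter, area, diameter) are assumed to have been measured by the earlier routines, and the function names are illustrative, not from the original text.

```python
# Sketch: the two thinness ratios defined above. Illustrative helpers;
# perimeter, area, and diameter come from earlier measurement routines.

def thinness_perimeter(perimeter, area):
    return perimeter ** 2 / area

def thinness_diameter(diameter, area):
    return diameter / area

# For a filled 4 x 4 pixel square (perimeter 16, area 16, diameter taken as 4):
print(thinness_perimeter(16, 16))   # -> 16.0
print(thinness_diameter(4, 16))     # -> 0.25
```

Because both ratios compare a boundary-scale quantity to the area, elongated or thread-like objects score higher than compact ones, which is what makes them useful identifiers.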
2-6.5.3. Morphology Operations
Morphology operations refer to a family of operations that are performed on the shape (and thus the morphology) of subjects in an image. They include many different operations, for both binary and gray images, such as thickening, dilation, erosion, skeletonization, opening,
closing, and filling. These operations are performed on an image in order to aid in its analysis, as well as to reduce the "extra" information that may be present in the image. For example, consider the binary image in Figure 2.43 and the stick figure representing one of the bolts. As we will see later, a moment equation may be used to calculate the orientation of the bolts. However, the same equation can also be applied to the stick figure of the bolt, with much less effort. As a result, it would be desirable to convert the bolt to its stick figure or skeleton. In the sections that follow, we will discuss a few of these operations.
2-6.5.4. Thickening Operation
The thickening operation fills the small holes and cracks on the boundary of an object and can be used to smooth the boundary. In the example of Figure 2.43, the thickening operation will reduce the appearance of the threads of the bolts. This will become important when we try to apply other operations, such as skeletonization, to the object. The initial thickening will prevent the creation of whiskers caused by the threads, as we will see later. Figure 2.44 shows the effect of three rounds of thickening operations on the threads of the bolts.
2-6.5.5. Dilation
In dilation, the background pixels that are 8-connected to the foreground (object) are changed to foreground pixels. As a result, a layer is effectively added to the object every time the process is implemented. Because dilation is performed on pixels that are 8-connected to the object, repeated dilations can change the shape of the object. Figure 2.45(b) is the result of four dilation operations on the objects in Figure 2.45(a). As can be seen, the objects have bled into one piece. With additional applications of dilation, the objects, as well as the disappearing hole, can become one solid piece and hence cannot be recognized as distinct objects anymore [1].
2-6.5.6. Erosion
In erosion, foreground pixels that are 8-connected to a background pixel are eliminated. This effectively eats away a layer of the foreground (the object) each time the operation is performed. Figure 2.46(b) shows the effect of three repetitions of the erosion operation on the
binary image in Figure 2.46(a). However, erosion disregards all other requirements of shape representation. It will remove one pixel from the perimeter (and holes) of the object even if the shape of the object is eventually lost, as in Figure 2.46(c) with seven repetitions, where one bolt is completely lost and the nut will soon disappear. The final result of too many erosions will be the loss of the object; that is to say, if the reversing operation of dilation,
FIGURE 2-46: Effect of erosion on objects (a) with (b) 3 and (c) 7 repetitions.
which adds one pixel to the perimeter of the object with each pass, is used, the dilated object may not resemble the original object at all. In fact, if the object is totally eroded to one pixel, dilation will result in a square or a circle. As a result, erosion can irreparably damage the image. However, it can also be successfully used to eliminate unwanted objects in an image. For example, if one is interested in identifying the largest object in an image, successive erosions will eliminate all other objects before the largest is eliminated. Thus, the object of interest can be identified.
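One pass of dilation and erosion with 8-connected neighborhoods can be sketched as follows. This is a minimal illustration, not the original routines: the image is a plain list of 0/1 rows, out-of-bounds neighbors are simply skipped (a simplifying border assumption), and all function names are hypothetical.

```python
# Sketch: one pass of binary dilation and erosion using 8-connected
# neighborhoods. Illustrative only; border handling is simplified.

def neighbors8(image, r, c):
    """Values of the in-bounds 8-connected neighbors of pixel (r, c)."""
    rows, cols = len(image), len(image[0])
    return [image[r + dr][c + dc]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols]

def dilate(image):
    """Background pixels 8-connected to the foreground become foreground."""
    return [[1 if image[r][c] == 1 or any(neighbors8(image, r, c)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]

def erode(image):
    """Foreground pixels 8-connected to a background pixel are removed."""
    return [[1 if image[r][c] == 1 and all(neighbors8(image, r, c)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]

img = [[0, 0, 0, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 0]]
print(dilate(img))   # the two-pixel object grows a full layer
print(erode(img))    # the thin object is eaten away completely
```

The example shows both effects the text warns about: repeated dilation makes nearby objects bleed together, while a single erosion already destroys an object only one pixel thick.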
2-6.5.7. Skeletonization
A skeleton is a stick representation of an object in which all thicknesses have been reduced to one pixel at every location. Skeletonization is a variation of erosion. Whereas in erosion the thickness of an object may go to zero and the object may be totally lost, in skeletonization, as soon as the thickness of the object becomes one pixel, the operation at that location stops. Also, although in erosion the number of repetitions is chosen by the user, in skeletonization the process automatically continues until all thicknesses are one pixel thick. (The program stops when no new changes are made as a result of the operation.) The final result of skeletonization is a stick figure (skeleton) of the object, which is a good representation of it; indeed, sometimes much better than the edges. Figure 2.47(b) shows the skeleton of the original objects in Figure 2.47(a). The whiskers are created because the objects were not smoothed by thickening; as a result, all threads are reduced to one pixel, creating the whiskers. Figure 2.48 shows the same objects, thickened to eliminate the threads, resulting in a clean skeleton. Figure 2.48(c) is the result of dilating the skeleton seven times. As can be seen, the dilated objects are not the same as the original ones. Notice how the smaller screw appears to be as big as the bigger bolts.
Although dilating a skeleton will also result in a shape different from that of the original object, skeletons are useful in object recognition, since they are generally a better representation of an object than other representations. When a stick representation of an object is found, it can be compared with the available a priori knowledge of the object for matching.
2-6.5.8. Open Operation
Opening is erosion followed by dilation and causes a limited smoothing of convex parts of the object. Opening can be used as an intermediate operation before skeletonization.
2-6.5.9. Close Operation
Closing is dilation followed by erosion and causes a limited smoothing of concave parts of the object. Like opening, closing can be used as an intermediate operation before skeletonization.
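Opening and closing are just compositions of the two basic passes. The sketch below stores the binary image as a set of foreground pixel coordinates, with 8-connected neighborhoods; all names are illustrative choices, not from the original text.

```python
# Sketch: opening (erosion then dilation) and closing (dilation then
# erosion) on a binary image stored as a set of foreground coordinates.
# Illustrative only; 8-connected neighborhoods throughout.

OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def dilate(fg):
    return fg | {(r + dr, c + dc) for (r, c) in fg for dr, dc in OFFSETS}

def erode(fg):
    return {(r, c) for (r, c) in fg
            if all((r + dr, c + dc) in fg for dr, dc in OFFSETS)}

def open_op(fg):
    return dilate(erode(fg))

def close_op(fg):
    return erode(dilate(fg))

# A lone pixel is a thin protrusion: opening removes it, closing keeps it.
print(open_op({(5, 5)}))    # -> set()
print(close_op({(5, 5)}))   # -> {(5, 5)}
```

The asymmetry in the example captures the intuition: opening erases detail thinner than the neighborhood, while closing preserves the object and instead fills small gaps and holes.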
An example edge-detection procedure is described in Figure 2.50, displaying a binary image in which the object is dark (binary 1) and the background is light (binary 0). A similar procedure (with opposite logic) works as well for a light object on a dark background. The system begins on a straight path across the background, examining each pixel it encounters and determining whether it is light or dark. As long as it encounters light pixels, it continues on a straight path. As soon as the system encounters a dark pixel, it knows that it has crossed the boundary into the region within the object, so it turns in an attempt to maintain the edge. From that point forward the system keeps turning, so that by continually crossing the boundary it maintains contact with it. Note that the edge-detection system is able to detect both interior corners and exterior corners and, in a crude fashion, can even follow a curve, although the computer image will be squared off into a series of tiny square corners. The construction of a practical procedure for edge detection is in truth a design problem, because there are many different ways the job can be done. As a design case study, Figure 2.51 demonstrates the construction of a logic flow diagram for the edge-detection procedure described in Figure 2.50. One problem with the edge-detection procedure is that if it ever makes a wrong turn, it can get lost in a tight loop, constantly turning left if it is lost inside the object, and constantly turning right if it is outside. An algorithm can watchdog this situation, and if it is ever discovered that pixels are being repeated in a tight loop, the system can go back to the pixel repeated the most and know that this is where the circular pattern began [11]. At that point the system can reverse the turn taken at that pixel and be back on track. Such an algorithm is especially useful when negotiating curves, for which a clear decision regarding a single pixel may not be obvious.
The logic flow design shown in Figure 2.51 is unable to deal with possible wrong turns that can result in the system being caught in a tight loop either within or outside the edge of the object. One of the exercises at the end of this chapter is a design problem in which the student is asked to add this sophistication to the design of the logic flow diagram shown in Figure 2.51.
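The turn-based follower described above can be sketched as follows (this scheme is often called square tracing). The sketch is illustrative only, without the wrong-turn watchdog discussed in the text: the grid holds 1 for dark (object) and 0 for light (background), the object is assumed not to touch the image border, and `trace_edge` and its arguments are hypothetical names.

```python
# Sketch of the turn-based edge follower described above: on a dark
# pixel turn left, on a light pixel turn right, so the tracer keeps
# re-crossing the boundary. Illustrative; no wrong-turn watchdog.

def trace_edge(image, start, max_steps=1000):
    (r, c), (dr, dc) = start, (0, 1)    # begin at a dark pixel, heading right
    boundary = []
    for _ in range(max_steps):
        if image[r][c] == 1:
            boundary.append((r, c))
            dr, dc = -dc, dr            # dark: turn left
        else:
            dr, dc = dc, -dr            # light: turn right
        r, c = r + dr, c + dc
        if (r, c) == start and (dr, dc) == (0, 1):
            break                       # back where we began: done
    return boundary

img = [[0, 0, 0, 0],
       [0, 1, 1, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 0]]
print(sorted(set(trace_edge(img, (1, 1)))))   # the four boundary pixels
```

The `max_steps` cap is a crude stand-in for the watchdog: a real implementation would detect repeated pixels in a tight loop and back out of the wrong turn, as the text describes.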
Edge-detection capability opens up a host of machine vision applications. One of these is dimensional gauging. If the opposite edges of an object can be detected, then the distance between these edges can be gauged by the vision system's control computer, whether the dimension is along x, along y, or computed at an angle between the axes. Edge detection can also be used in the process of "skeletonization," in which all white pixels are reduced to a one-pixel-wide outline on the periphery of the white field. A similar process with dark pixel areas can produce familiar line drawings. We now examine a small sample of practical industrial applications of machine vision systems.
2-6.6. IMAGE UNDERSTANDING
Up to this point in the computer vision process, a lot of algorithmic computation has taken place, yet none of it is really AI. Even though an image has been acquired, enhanced, and analyzed, the computer still does not know what the scene means. The computer is not aware of the contents of the scene, what objects are represented, or how they are related to one another. The final stage of computer vision, then, is for the computer to have some knowledge about the things it may see in a scene. Object shapes and some kind of AI search and pattern-matching program enable the computer to examine the incoming scene and compare it to the objects in its knowledge base. The computer should then be able to identify the objects and thus understand what it sees. Unlike image analysis, which essentially concerns local features, extraction of global features now takes place. In general, there are four comprehension paradigms in this phase:
1. Hierarchical bottom-up processing: suitable for simple scenes and a limited number of objects known in advance.
2. Hierarchical top-down processing, or hypothesize and test: a goal (object-model)-directed search for a hypothesized object.
3. Heterarchical approach: distributed control (with some monitor) of the various system parts is used to modify their operation as necessary.
4. Blackboard approach: various components interact via a common database (the "blackboard") accessible by all the others. This approach is especially useful when several hypotheses must be kept and tracked at several levels.
Sophisticated industrial systems commonly use both a bottom-up and a top-down approach. Usually a binary (black and white) image is formed, but the trend is toward the more flexible general gray-scale representation. As you might suspect, a considerable amount of research has gone into the comprehension process of computer vision. It is not an easy process.
It is complicated by the fact that a three-dimensional scene is converted into two-dimensional format. From this two-dimensional view the computer must ascertain the existence of three-dimensional objects. However, some computer vision applications do involve what could be called two-dimensional scene analysis. In such cases, a simple template-matching technique is used to pick out specific object shapes. The template, which is stored in memory, is an outline or silhouette of an object that the computer knows. Thus, the comparison process that takes place during search and pattern matching can produce identification.
2-6.6.1. Template Matching
Template matching compares a model of a known object, stored as a binary image in the computer, to the binary image of the scene being viewed. The basis of comparison is the primal sketch (described earlier): both the scene and the template are stored as primal sketches. The comparison is usually done on a pixel-by-pixel basis. Corresponding pixels in the two images are, for example, subtracted from one another to obtain a difference value. If the difference is zero, a match occurs; they are the same. Very small difference numbers produce a reasonably close match. High difference figures, of course, indicate wide disparity. Various statistical calculations can be made to determine just how close the match is between the template and the input scene. It is difficult to match a known shape to a viewed shape because the template stored in memory is fixed in size and orientation. Chances are the template has a different size and position than the object in the scene. Therefore, to cause a match or near match to occur, the template object must be manipulated. Various processing techniques can be used to scale (increase or decrease) the size of the template to more closely match the size of the object in the scene. In addition, other
mathematical routines can be used to rotate or tilt the template to better approximate the orientation of the object in the scene [12]. Many of these calculations may have to be performed before obtaining a good approximation. After each reorientation or rescaling of the template, a new comparison is made. If the object bears any resemblance to the template, a reasonable match will eventually be found. It is at that point that the system understands: it can then state with reasonable certainty that it has viewed a particular shape or object. On the other hand, the template-matching process may not produce a match, and the image comprehension program may simply move on to another template shape. In fact, the knowledge base for a computer vision system may be an entire set of templates of various shapes. The process continues until an object is identified or no match occurs.
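The pixel-by-pixel comparison at the heart of template matching can be sketched as an accumulated absolute difference. This is an illustrative toy, not the original system: images are plain lists of 0/1 rows, and the helper name `difference` is hypothetical.

```python
# Sketch: pixel-by-pixel comparison of a stored binary template with a
# same-size window of the scene. Zero means an exact match; larger
# values mean wider disparity. Names are illustrative.

def difference(template, window):
    return sum(abs(t - w)
               for trow, wrow in zip(template, window)
               for t, w in zip(trow, wrow))

template = [[0, 1],
            [1, 1]]
window   = [[0, 1],
            [1, 0]]
print(difference(template, template))   # -> 0, an exact match
print(difference(template, window))     # -> 1, one pixel differs
```

A full matcher would slide this comparison over every window of the scene and, as the text notes, also rescale and rotate the template between passes.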
2-6.6.2. Other Techniques
Even though template matching is a widely used and highly successful technique, there are other methods of image understanding. The process still involves search and pattern matching, but at a different level. For example, instead of storing entire objects or shapes as templates, the computer can store bits and pieces of objects in its knowledge base. At the lowest level, the computer may store a line segment, which it then manipulates and attempts to match against lines in the primal sketch. It could do the same with squares and rectangles or circles and ellipses. Another alternative is feature matching, in which key features (e.g., diameter, area) are stored, rather than the actual image. A considerable amount of research in computer vision has dealt with what is generally called the blocks world. A blocks world is a contrived environment in which multiple blocks of different geometric shapes are used as objects to form a scene to be identified. Colored cubes, rectangles, cylinders, pyramids, triangles, and other blocks are stacked or placed adjacent to one another to form a limited, controlled environment. The camera of a computer vision system is then focused on this assembly of blocks to acquire, process, and analyze the scene. The primal sketch is used as the basis for attempting to understand what is in the blocks world. But instead of looking for complete patterns or objects, the computer begins to look for identifying pieces. In a blocks world where all of the blocks are made of straight lines, it is easy to determine that all of these lines will come together in a few easily recognized formats. A line is formed between the edge of an object and the background, and between adjoining surfaces of an object. A line is also formed where two separate objects touch, overlap, or appear one behind another. In addition, these lines form vertexes; that is, many of the lines come together at a point. Lines form an L, a fork (Y), an arrowhead, or a T.
These various vertex possibilities are illustrated in Figure 2.53. Each of these is represented in the knowledge base of the computer. Several other types are shown in Figure 2.54. The illustrations in Figure 2.54 show how a computer, knowing the types of vertexes, can misjudge a scene. The arrowhead or peak of the pyramid shown in Figure 2.54b could be mistaken for the corner of a square. In Figure 2.54c, the computer could have a difficult time determining whether this is (1) the picture of a cube with a small cube cut out of one corner or (2) a small cube sitting in the corner where three larger planes form the corner of two walls and a floor. The computer must be given additional knowledge to help it discriminate. For example, in Figure 2.54c, the Y vertex will determine which of the two representations just described is correct. If the vertex is concave, or indented inward, the first description is correct. If that vertex is convex, or pointing outward, the second description is correct. Systems have been invented to determine whether a line or vertex is concave or convex.
FIGURE 2-53: Types of Vertexes That Form Junctions Between Objects in a Scene
Vertexes are not independent, of course. For instance, the Waltz labeling algorithm operates on the assumption that certain vertex types impose constraints on possible adjacent vertexes. Waltz [14] defined eleven line types (including, for instance, shadow edges) and created a vertex catalog with thousands of entries. However, by local elimination of impossible interpretations, the possible matches are filtered very quickly; often, no further search is necessary. The search process attempts to locate vertexes in the primal sketch and match them with the various vertex models stored in memory. When a match occurs, the computer knows that it has identified a particular type of vertex and that certain surfaces and shapes are associated with it. The computer goes about looking for all of the various vertexes and junctions and slowly but surely builds up a complete picture of what is being represented. At some point, sufficient information becomes available so the computer can say that it has identified a cube, a rectangle, a pyramid, or whatever. Rules stored in the knowledge base state facts about the vertexes and what they mean. In this way, the computer can decide what it has "seen."
2-6.6.3. Scene Understanding
Identifying objects in a simple blocks world is relatively easy. However, when we move out into the real world and attempt to identify outdoor or even indoor scenes, the comprehension problem becomes incredibly difficult. In fact, few computer vision systems have been built that are capable of such scene understanding. The reason for this is that most scenes, whether outdoors or indoors, contain an incredible number of different objects. In a room, for example, there may be a personal computer on a desk. There may be a printer, a
typewriter, a telephone, books, and many other objects. There may be pictures on the wall, hanging plants, chairs, and even people. The primal sketch is very complex, and picking out the various objects in the scene is difficult. Moving outside further complicates the process. Some things, like buildings, cars, trucks, and airplanes, are relatively easy to identify. People are somewhat more difficult, and objects such as trees, bushes, and animals are very difficult; anything with a curved or irregular surface is hard to pick out. For these reasons, computer vision systems have limited applications. They work best in controlled environments where the scenes they view are relatively simple and contain only a few objects. An exception is military applications, where there is less pressure to justify the cost. As research continues in this field, more sophisticated techniques will be developed to deal economically with more complex scenes. There are several ways to improve image understanding. One way is to use information from different sensor types (e.g., vision and touch) and fuse them to achieve a better understanding than with either alone. Another way is to use some aspect of the image surfaces, for example, light reflectance. Special filters can be added to the camera, such as polarizing light filters, that enhance the contrast between different material types (e.g., metal and plastic). This is quite useful when inspecting an electronic circuit board, where wires can be highlighted. Another trend that might well be a predominant one in future computer vision is parallel processing [12] of the input image. Vision lends itself well to parallel processing techniques at various phases of the processing, analysis, and comprehension levels. It is the common view nowadays that the human brain is capable of analyzing the huge amount of information it acquires only because of its parallel processing architecture.
Even more ambitious is the hope that neural nets [12] might combine several stages of image processing and understanding and, in particular, learn to recognize objects from the original acquired image or the processed image.
2-6.6.4.
Detecting Motion
Most of the preceding discussion pertains to static images. However, vision is especially important when the object, the camera, or both are moving (as in the case of mobile robots); in such situations motion detection is needed. Detecting motion is a complex process that is mostly experimental at the present time. It usually relies on several continuity constraints common to consecutive frames taken at close temporal intervals. The successive frames are compared and the motion flow (direction) vectors of each point are determined. Several military applications are available, but industrial applications are rare.
2-7.1.
COORDINATE CONVENTIONS
The result of sampling and quantization is a matrix of real numbers. Assume that an image f(x, y) is sampled so that the resulting image has M rows and N columns; we say that the image is of size M x N. The values of the coordinates (x, y) are discrete quantities. In many image processing books, the image origin is defined to be at (x, y) = (0, 0), and the next coordinate values along the first row of the image are (x, y) = (0, 1). The notation (0, 1) signifies the second sample along the first row; it does not mean that these are the actual values of the physical coordinates when the image was sampled. Figure 2.55 shows this coordinate convention. Note that x ranges from 0 to M - 1, and y from 0 to N - 1, in integer increments. The coordinate convention used in the Image Processing Toolbox differs from this in two minor ways. First, instead of using (x, y), the toolbox uses the notation (r, c) to indicate rows and columns. The order of coordinates is the same, however, in the sense that the first element of a coordinate tuple, (a, b), refers to a row and the second to a column. The other difference is that the origin of the coordinate system is at (r, c) = (1, 1); thus, r ranges from 1 to M, and c from 1 to N, in integer increments. This coordinate convention is also shown in Fig. 2.55.
IPT documentation refers to the coordinates in Fig. 2.55 as pixel coordinates. Less frequently, the toolbox also employs another coordinate convention called spatial coordinates, which uses x to refer to columns and y to refer to rows. This is the opposite of our use of the variables x and y. With very few exceptions, we do not use IPT's spatial coordinate convention, but the reader will encounter the terminology in the IPT documentation.
2-7.2.
IMAGES AS MATRICES
The coordinate system in Fig. 2.55 and the preceding discussion lead to the following representation for a digitized image function:

f(x, y) = [ f(0, 0)      f(0, 1)      ...  f(0, N-1)
            f(1, 0)      f(1, 1)      ...  f(1, N-1)
            ...
            f(M-1, 0)    f(M-1, 1)    ...  f(M-1, N-1) ]

The right side of this equation is a digital image by definition. Each element of this array is called an image element, picture element, pixel, or pel. The terms image and pixel are used throughout the rest of our discussions to denote a digital image and its elements [16]. A digital image can be represented naturally as a MATLAB matrix:

f = [ f(1, 1)    f(1, 2)    ...  f(1, N)
      f(2, 1)    f(2, 2)    ...  f(2, N)
      ...
      f(M, 1)    f(M, 2)    ...  f(M, N) ]
where f(1, 1) = f(0, 0) (note the use of a monospace font to denote MATLAB quantities). Clearly the two representations are identical, except for the shift in origin. The notation f(p, q) denotes the element located in row p and column q. For example, f(6, 2) is the element in the sixth row and second column of the matrix f. Typically we use the letters M and N, respectively, to denote the number of rows and columns in a matrix. A 1 x N matrix is called a row vector, whereas an M x 1 matrix is called a column vector. A 1 x 1 matrix is a scalar. Matrices in MATLAB are stored in variables with names such as A, a, RGB, and so on. Variables must begin with a letter and may contain only letters, numerals, and underscores. As noted above, all MATLAB quantities are written using monospace characters, and conventional Roman italic notation, such as f(x, y), is used for mathematical expressions.
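As a small illustrative sketch of the row-column indexing just described (the numeric values are arbitrary, chosen only for illustration):

```matlab
% A small 3 x 3 matrix standing in for an image; values are arbitrary.
f = [10 20 30; 40 50 60; 70 80 90];
f(2, 3)            % element in row 2, column 3 of f, i.e. 60
[M, N] = size(f);  % M = 3 rows, N = 3 columns
```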
2-8.
READING IMAGES
Images are read into the MATLAB environment using function imread, whose basic syntax is imread('filename'). Here, filename is a string containing the complete name of the image file (including any applicable extension). For example, the command line >> f = imread('chestxray.jpg'); reads the JPEG image chestxray.jpg into image array f. Note the use of single quotes (') to delimit the string filename. The semicolon at the end of a command line is used by MATLAB to suppress output [18]. If a semicolon is not included, MATLAB displays the results of the operation(s) specified in that line. The prompt symbol (>>) designates the beginning of a command line, as it appears in the MATLAB Command Window.
When, as in the preceding command line, no path information is included in filename, imread reads the file from the current directory and, if that fails, tries to find the file in the MATLAB search path. The simplest way to read an image from a specified directory is to include a full or relative path to that directory in filename. For example, >> f = imread('D:\myimages\chestxray.jpg'); reads the image from a folder called myimages on the D: drive, whereas >> f = imread('.\myimages\chestxray.jpg'); reads the image from the myimages subdirectory of the current working directory. The Current Directory Window on the MATLAB desktop toolbar displays MATLAB's current working directory and provides a simple, manual way to change it. Table 2.1 lists some of the most popular image/graphics formats supported by imread and imwrite. Function size gives the row and column dimensions of an image. This function is particularly useful in programming when used in the following form to determine automatically the size of an image: >> [M, N] = size(f); This syntax returns the number of rows (M) and columns (N) in the image [15].
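A minimal sketch combining imread and size (the file name chestxray.jpg is assumed to exist on the MATLAB path):

```matlab
f = imread('chestxray.jpg');  % read the image into array f
[M, N] = size(f);             % number of rows and columns
whos f                        % also reports the size, bytes, and class of f
```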
2-9.
DISPLAYING IMAGES
Images are displayed on the MATLAB desktop using function imshow, which has the basic syntax imshow(f). A pixel-value cursor can be superimposed on the last image displayed; clicking the X button on the cursor window turns it off. If another image, g, is displayed using imshow, MATLAB replaces the image on the screen with the new image. To keep the first image and output a second image, we use function figure as follows: >> figure, imshow(g) Using the statement >> imshow(f), figure, imshow(g) displays both images. Note that more than one command can be written on a line, as long as the commands are properly delimited by commas or semicolons. Suppose that we have just read an image h and find that using imshow(h) produces the image in Fig. 2.56(a). It is clear that this image has a low dynamic range, which can be remedied for display purposes by using the statement >> imshow(h, [ ]) Figure 2.56(b) shows the result; the improvement is apparent.
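The display commands above can be combined as in the following sketch (again assuming chestxray.jpg is available; any low-dynamic-range image would show the effect of the empty-bracket form):

```matlab
f = imread('chestxray.jpg');
imshow(f)            % display f
figure, imshow(f)    % open a second figure window for another view
imshow(f, [])        % rescale the display range to the min and max of f
```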
2-10.
WRITING IMAGES
Images are written to disk using function imwrite, which has the following basic syntax: imwrite(f, 'filename') With this syntax, the string contained in filename must include a recognized file format extension (see Table 2.1). Alternatively, the desired format can be specified explicitly with a third input argument. For example, the following command writes f to a TIFF file named patient10_run1: >> imwrite(f, 'patient10_run1', 'tif') or, alternatively, >> imwrite(f, 'patient10_run1.tif')
If filename contains no path information, then imwrite saves the file in the current working directory. The imwrite function can have other parameters, depending on the file format selected. Most of the work in the following chapters deals either with JPEG or TIFF images, so we focus attention here on these two formats [18]. A more general imwrite syntax applicable only to JPEG images is imwrite(f, 'filename.jpg', 'quality', q) where q is an integer between 0 and 100 (the lower the number, the higher the degradation due to JPEG compression). EXAMPLE 2.4 Figure 2.57(a) shows an image, f, typical of sequences of images resulting from a given chemical process. It is desired to transmit these images on a routine basis to a central site for visual and/or automated inspection. In order to reduce storage and transmission time, it is important that the images be compressed as much as possible while not degrading their visual appearance beyond a reasonable level. In this case "reasonable" means no perceptible false contouring. Figures 2.57(b) through (f) show the results obtained by writing image f to disk (in JPEG format), with q = 50, 25, 15, 5, and 0, respectively. For example, for q = 25 the applicable syntax is >> imwrite(f, 'bubbles25.jpg', 'quality', 25) The image for q = 15 [Fig. 2.57(d)] has false contouring that is barely visible, but this effect becomes quite pronounced for q = 5 and q = 0. Thus, an acceptable solution with some margin for error is to compress the images with q = 25. In order to get an idea of the compression achieved and to obtain other image file details, we can use function imfinfo, which has the syntax imfinfo filename where filename is the complete file name of the image stored on disk. For example, >> imfinfo bubbles25.jpg outputs the following information (note that some fields contain no information in this case):
where FileSize is in bytes. The number of bytes in the original image is computed simply by multiplying Width by Height by BitDepth and dividing the result by 8. The result is 486948. Dividing this by FileSize gives the compression ratio: 486948/13849 = 35.16. This compression ratio was achieved while maintaining image quality consistent with the requirements of the application. In addition to the obvious advantages in storage space, this reduction allows the transmission of approximately 35 times the amount of uncompressed data per unit time. The information fields displayed by imfinfo can be captured into a so-called structure variable that can be used for subsequent computations. Using the preceding image as an example, and assigning the name K to the structure variable, we use the syntax >> K = imfinfo('bubbles25.jpg'); to store into variable K all the information generated by command imfinfo. The information generated by imfinfo is appended to the structure variable by means of fields, separated from K by a dot. For example, the image height and width are now stored in structure fields K.Height and K.Width. As an illustration, consider the following use of structure variable K to compute the compression ratio for bubbles25.jpg: >> K = imfinfo('bubbles25.jpg'); >> image_bytes = K.Width*K.Height*K.BitDepth/8;
Note that imfinfo was used in two different ways. The first was to type imfinfo bubbles25.jpg at the prompt, which resulted in the information being displayed on the screen. The second was to type K = imfinfo('bubbles25.jpg'), which resulted in the information generated by imfinfo being stored in K. These two different ways of calling imfinfo are an example of command-function duality, an important concept that is explained in more detail in the MATLAB online documentation. A more general imwrite syntax, applicable only to TIFF images, has the form imwrite(g, 'filename.tif', 'compression', 'parameter', 'resolution', [colres rowres]) where 'parameter' can have one of the following principal values: 'none' indicates no compression; 'packbits' indicates packbits compression (the default for nonbinary images); and 'ccitt' indicates CCITT compression (the default for binary images). The 1 x 2 array [colres rowres] contains two integers that give the column resolution and row resolution in dots per unit (the default values are [72 72]). For example, if the image dimensions are in inches, colres is the number of dots (pixels) per inch (dpi) in the vertical direction, and similarly for rowres in the horizontal direction. Specifying the resolution by a single scalar, res, is equivalent to writing [res res].
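The compression-ratio computation described above can be collected into a few lines (bubbles25.jpg is the example file written earlier; the numeric result depends on the actual file):

```matlab
K = imfinfo('bubbles25.jpg');                        % structure of file details
image_bytes       = K.Width*K.Height*K.BitDepth/8;   % bytes in the uncompressed image
compressed_bytes  = K.FileSize;                      % bytes in the file on disk
compression_ratio = image_bytes/compressed_bytes     % about 35 for this example
```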
2-11.
DATA CLASSES
Although we work with integer coordinates, the values of pixels themselves are not restricted to be integers in MATLAB. Table 2-1 lists the various data classes supported by MATLAB and IPT for representing pixel values. The first eight entries in the table are referred to as numeric data classes. The ninth entry is the char class and, as shown, the last entry is referred to as the logical data class. All numeric computations in MATLAB are done using double quantities, so double is a frequent data class in image processing applications. Class uint8 also is encountered frequently, especially when reading data from storage devices, as 8-bit images are the most common representations found in practice. These two data classes, class logical, and, to a lesser degree, class uint16 constitute the primary data classes on which we focus. Many IPT functions, however, support all the data classes listed in Table 2-1. Data class double requires 8 bytes to represent a number, uint8 and int8 require 1 byte each, uint16 and int16 require 2 bytes, and uint32, int32, and single require 4 bytes each.
TABLE 2-1: Data classes
The char data class holds characters in Unicode representation. A character string is merely a 1 x n array of characters. A logical array contains only the values 0 and 1, with each element stored in memory using one byte. Logical arrays are created by using function logical or by using relational operators [17].
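A short sketch of the char and logical classes (function class, used here to query the data class of a variable, is a standard MATLAB function):

```matlab
s = 'abc';        % a 1 x 3 char array
class(s)          % returns 'char'
L = [1 0 2] > 0;  % relational operators create logical arrays
class(L)          % returns 'logical'
```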
2-12.
IMAGE TYPES
The toolbox supports four types of images:
- Intensity images
- Binary images
- Indexed images
- RGB images
Most monochrome image processing operations are carried out using binary or intensity images, so our initial focus is on these two image types.
2-12.1.
INTENSITY IMAGES
An intensity image is a data matrix whose values have been scaled to represent intensities. When the elements of an intensity image are of class uint8 or class uint16, they have integer values in the range [0, 255] and [0, 65535], respectively. If the image is of class double, the values are floating-point numbers. Values of scaled, class double intensity images are in the range [0, 1] by convention.
2-12.2.
BINARY IMAGES
Binary images have a very specific meaning in MATLAB. A binary image is a logical array of 0s and 1s. Thus, an array of 0s and 1s whose values are of data class, say, uint8, is not considered a binary image in MATLAB. A numeric array is converted to binary using function logical. Thus, if A is a numeric array consisting of 0s and 1s, we create a logical array B using the statement
B = logical(A) If A contains elements other than 0s and 1s, the logical function converts all nonzero quantities to logical 1s and all entries with value 0 to logical 0s. Using relational and logical operators also creates logical arrays. To test whether an array is logical, we use the islogical function: islogical(C) If C is a logical array, this function returns a 1; otherwise it returns a 0. Logical arrays can be converted to numeric arrays using the data class conversion functions [16].
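For instance, a minimal sketch of logical and islogical:

```matlab
A = [0 1 2; 3 0 1];   % numeric array
B = logical(A);       % nonzero entries become logical 1, zeros become logical 0
islogical(A)          % returns 0 (A is numeric)
islogical(B)          % returns 1 (B is logical)
double(B)             % converts B back to a numeric array of 0s and 1s
```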
2-13.1.
CONVERTING BETWEEN DATA CLASSES
Converting between data classes is straightforward. The general syntax is B = data_class_name(A) where data_class_name is one of the names in the first column of Table 2-1. For example, suppose that A is an array of class uint8. A double-precision array, B, is generated by the command B = double(A). This conversion is used routinely because MATLAB expects operands in numerical computations to be double-precision, floating-point numbers. If C is an array of class double in which all values are in the range [0, 255] (but possibly containing fractional values), it can be converted to a uint8 array with the command D = uint8(C). If an array of class double has any values outside the range [0, 255] and it is converted to class uint8 in the manner just described, MATLAB converts to 0 all values that are less than 0, and converts to 255 all values that are greater than 255. Numbers in between are converted to integers by discarding their fractional parts. Thus, proper scaling of a double array so that its elements are in the range [0, 255] is necessary before converting it to uint8. Converting any of the numeric data classes to logical results in an array with logical 1s in locations where the input array had nonzero values, and logical 0s in places where the input array contained 0s.
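The clipping behavior described above can be seen in a two-line sketch (integer values are used so that only the out-of-range handling is illustrated):

```matlab
C = [-5 40 128 300];  % class double, with values outside [0, 255]
D = uint8(C)          % yields [0 40 128 255]: out-of-range values are clipped
```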
TABLE 2-2: Function in IPT for converting between image classes and types
from which we see that function im2uint8 sets to 0 all values in the input that are less than 0, sets to 255 all values in the input that are greater than 1, and multiplies all other values by 255. Rounding the results of the multiplication to the nearest integer completes the conversion. Note that the rounding behavior of im2uint8 is different from that of the data-class conversion function uint8 discussed in the previous section, which simply discards fractional parts. Converting an arbitrary array of class double to an array of class double scaled to the range [0, 1] can be accomplished by using function mat2gray, whose basic syntax is >> g = mat2gray(A, [Amin, Amax]) where image g has values in the range 0 (black) to 1 (white). The specified parameters Amin and Amax are such that values less than Amin in A become 0 in g, and values greater than Amax in A correspond to 1 in g. Writing >> g = mat2gray(A); sets the values of Amin and Amax to the actual minimum and maximum values in A. The input is assumed to be of class double; the output also is of class double. Function im2double converts an input to class double. If the input is of class uint8, uint16, or logical, im2double converts it to class double with values in the range [0, 1]. If the input is already of class double, im2double returns an array that is equal to the input. For example, if an array of class double results from computations that yield values outside the range [0, 1], inputting this array into im2double will have no effect. As mentioned in the preceding paragraph, a double array having arbitrary values can be converted to a double array with values in the range [0, 1] by using function mat2gray. As an illustration, consider the class uint8 image >> h = uint8([25 50; 128 200]); Performing the conversion >> g = im2double(h); yields the result
g =
    0.0980    0.1961
    0.5020    0.7843
from which we infer that the conversion, when the input is of class uint8, is done simply by dividing each value of the input array by 255. If the input is of class uint16, the division is by 65535. Finally, we consider conversion between binary and intensity image types. Function im2bw, which has the syntax g = im2bw(f, T) produces a binary image, g, from an intensity image, f, by thresholding. The output binary image g has values of 0 for all pixels in the input image with intensity values less than threshold T, and 1 for all other pixels. The value specified for T has to be in the range [0, 1], regardless of the class of the input. The output binary image is automatically declared as a logical array by im2bw. If we write g = im2bw(f), IPT uses a default value of 0.5 for T. If the input is a uint8 image, im2bw divides all its pixels by 255 and then applies either the default or a specified threshold. If the input is of class uint16, the division is by 65535. If the input is a double image, im2bw applies either the default or a specified threshold directly. If the input is a logical array, the output is identical to the input. A logical (binary) array can be converted to a numerical array by using any of the four functions in the first column of Table 2-2.
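A small sketch of im2bw using the uint8 image from the previous example:

```matlab
f = uint8([25 50; 128 200]);
g = im2bw(f, 0.5);  % threshold at 0.5 after scaling by 255
% g is the logical array [0 0; 1 1]: only 128 and 200 exceed 0.5*255
islogical(g)        % returns 1
```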
2-14.
FLOW CONTROL
The ability to control the flow of operations based on a set of predefined conditions is at the heart of all programming languages. In fact, conditional branching was one of two key developments that led to the formulation of general-purpose computers in the 1940s (the other was the use of memory to hold stored programs and data). MATLAB provides the eight flow control statements summarized in Table 2-3. Keep in mind the observation made in the previous section that MATLAB treats a logical 1 or nonzero number as true, and a logical or numeric 0 as false [17].
TABLE 2-3: Flow control statements
2-14.1.
IF, ELSE, AND ELSEIF
Conditional statement if has the syntax
if expression
   statements
end
The expression is evaluated and, if the evaluation yields true, MATLAB executes one or more commands, denoted here as statements, between the if and end lines. If expression is false,
MATLAB skips all the statements between the if and end lines and resumes execution at the line following the end line. When nesting ifs, each if must be paired with a matching end. The else and elseif statements further conditionalize the if statement. The general syntax is
if expression1
   statements1
elseif expression2
   statements2
else
   statements3
end
If expression1 is true, statements1 are executed and control is transferred to the end statement. If expression1 evaluates to false, then expression2 is evaluated. If this expression evaluates to true, then statements2 are executed and control is transferred to the end statement. Otherwise (else) statements3 are executed. Note that the else statement has no condition. The else and elseif statements can appear by themselves after an if statement; they do not need to appear in pairs, and it is acceptable to have multiple elseif statements. Example 2.5 Suppose that we want to write a function that computes the average intensity of an image. As discussed earlier, a two-dimensional array f can be converted to a column vector, v, by letting v = f(:). Therefore, we want our function to be able to work with both vector and image inputs. The program should produce an error if the input is not a one- or two-dimensional array.
function av = average(A)
%AVERAGE Computes the average value of an array.
%   AV = AVERAGE(A) computes the average value of input
%   array, A, which must be a 1-D or 2-D array.
% Check the validity of the input. (Keep in mind that
% a 1-D array is a special case of a 2-D array.)
if ndims(A) > 2
   error('The dimensions of the input cannot exceed 2.')
end
% Compute the average.
av = sum(A(:))/length(A(:));
Note that the input is converted to a 1-D array by using A(:). In general, length(A) returns the size of the longest dimension of an array A. In this example, because A(:) is a vector, length(A(:)) gives the number of elements of A.
This eliminates the need to test whether the input is a vector or a 2-D array. Another way to obtain the number of elements in an array directly is to use function numel, whose syntax is n = numel(A). Thus, if A is an image, numel(A) gives its number of pixels. Using this function, the last executable line of the previous program becomes av = sum(A(:))/numel(A);
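Assuming the average function above has been saved as average.m on the MATLAB path, a typical call looks like:

```matlab
f = [1 2; 3 4];  % a small 2-D test array
av = average(f)  % returns 2.5, the mean of all four elements
v = f(:);        % the same data as a column vector
average(v)       % also returns 2.5
```

A 3-D input such as ones(2, 2, 2) would instead trigger the error branch.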
2-14.2.
FOR LOOP
As indicated in Table 2-3, a for loop executes a group of statements a specified number of times. The syntax is
for index = start:increment:end
   statements
end
It is possible to nest two or more for loops, as follows:
for index1 = start1:increment1:end
   statements1
   for index2 = start2:increment2:end
      statements2
   end
   additional loop1 statements
end
For example, the following loop executes 11 times:
count = 0;
for k = 0:0.1:1
   count = count + 1;
end
If the loop increment is omitted, it is taken to be 1. Loop increments also can be negative, as in k = 0:-1:-10. Note that no semicolon is necessary at the end of a for line; MATLAB automatically suppresses printing the values of a loop index. Example 2.6 Example 2.4 compared several images using different JPEG quality values. Here, we show how to write those files to disk using a for loop. Suppose that we have an image, f, and we want to write it to a series of JPEG files with quality factors ranging from 0 to 100 in increments of 5. Further, suppose that we want to write the JPEG files with filenames of the form series_xxx.jpg, where xxx is the quality factor. We can accomplish this using the following for loop:
for q = 0:5:100
   filename = sprintf('series_%3d.jpg', q);
   imwrite(f, filename, 'quality', q);
end
Function sprintf, whose syntax in this case is >> s = sprintf('characters1%ndcharacters2', q) writes formatted data as a string, s. In this syntax form, characters1 and characters2 are character strings and %nd denotes a decimal number (specified by q) with n digits. In this example, characters1 is series_, the value of n is 3, characters2 is .jpg, and q has the value specified in the loop.
2-14.3.
WHILE
A while loop executes a group of statements for as long as the expression controlling the loop is true. The syntax is
while expression
   statements
end
As in the case of for, while loops can be nested:
while expression1
   statements1
   while expression2
      statements2
   end
   additional loop1 statements
end
For example, the following nested while loops terminate when both a and b have been reduced to 0:
a = 10;
b = 5;
while a
   a = a - 1;
   while b
      b = b - 1;
   end
end
Note that to control the loops we used MATLAB's convention of treating a numerical value in a logical context as true when it is nonzero and as false when it is 0. In other words, while a and while b evaluate to true as long as a and b are nonzero.
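A concrete while sketch in the same spirit: count how many halvings reduce a number below 1 (the numbers here are arbitrary):

```matlab
n = 100;
count = 0;
while n >= 1        % the expression is re-evaluated before each pass
   n = n/2;
   count = count + 1;
end
count               % 7 halvings: 100/2^7 is the first value below 1
```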
2-14.4.
BREAK
As its name implies, break terminates the execution of a for or while loop. When a break statement is encountered, execution continues with the next statement outside the loop. In nested loops, break exits only from the innermost loop that contains it.
2-14.5.
CONTINUE
The continue statement passes control to the next iteration of the for or while loop in which it appears, skipping any remaining statements in the body of the loop. In nested loops, continue passes control to the next iteration of the loop enclosing it.
2-14.6.
SWITCH
This is the statement of choice for controlling the flow of an M-function based on different types of inputs. The syntax is
switch switch_expression
   case case_expression
      statement(s)
   case {case_expression1, case_expression2, ...}
      statement(s)
   otherwise
      statement(s)
end
The switch construct executes groups of statements based on the value of a variable or expression. The keywords case and otherwise delineate the groups; only the first matching case is executed. There must always be an end to match the switch statement. The curly braces are used when multiple expressions are included in the same case statement. As a simple example, suppose that we have an M-function that accepts an image f and converts it to a specified class, call it newclass. Only three image classes are acceptable for the conversion: uint8, uint16, and double. The following code fragment performs the desired conversion and outputs an error if the class of the input image is not one of the acceptable classes:
switch newclass
   case 'uint8'
      g = im2uint8(f);
   case 'uint16'
      g = im2uint16(f);
   case 'double'
      g = im2double(f);
   otherwise
      error('Unknown or improper image class.')
end
2-15.
CODE OPTIMIZATION
MATLAB is a programming language specifically designed for array operations. Taking advantage of this fact whenever possible can result in significant increases in computational speed. In this section we discuss two important approaches for MATLAB code optimization: vectorizing loops and preallocating arrays.
2-15.1.
VECTORIZING LOOPS
Vectorizing simply means converting for and while loops to equivalent vector or matrix operations. As will become evident shortly, vectorization can result not only in significant gains in computational speed, but also in improved code readability. Although multidimensional vectorization can be difficult to formulate at times, the forms of vectorization used in image processing generally are straightforward. We begin with a simple example. Suppose that we want to generate a 1-D function of the form f(x) = A sin(x/2π) for x = 0, 1, 2, ..., M - 1. A for loop to implement this computation is
for x = 1:M % Array indices in MATLAB cannot be 0.
   f(x) = A*sin((x - 1)/(2*pi));
end
However, this code can be made considerably more efficient by vectorizing it; that is, by taking advantage of MATLAB indexing, as follows:
x = 0:M - 1;
f = A*sin(x/(2*pi));
As this simple example illustrates, 1-D indexing generally is a simple process. When the functions to be evaluated have two variables, optimized indexing is slightly more subtle. MATLAB provides a direct way to implement 2-D function evaluations via function meshgrid, which has the syntax [C, R] = meshgrid(c, r) This function transforms the domain specified by row vectors c and r into arrays C and R that can be used for the evaluation of functions of two variables and 3-D surface plots (note that columns are listed first in both the input and output of meshgrid). The rows of output array C are copies of the vector c, and the columns of the output array R are copies of the vector r. For example, suppose that we want to form a 2-D function whose elements are the sum of the squares of the values of coordinate variables x and y, for x = 0, 1, 2 and y = 0, 1. The vector r is formed from the row components of the coordinates: r = [0 1 2]. Similarly, c is formed from the column components: c = [0 1] (keep in mind that both r and c are row vectors here). Substituting these two vectors into meshgrid results in the following arrays:
>> [C, R] = meshgrid(c, r)
C =
     0     1
     0     1
     0     1
R =
     0     0
     1     1
     2     2
The function in which we are interested is implemented as
>> h = R.^2 + C.^2
This gives the following result:
h =
     0     1
     1     2
     4     5
Note that the dimensions of h are length(r) x length(c). Also note, for example, that h(1, 1) = R(1, 1)^2 + C(1, 1)^2. Thus, MATLAB automatically took care of indexing h. This is a potential source of confusion when 0s are involved in the coordinates, because of the repeated warnings in this book and in manuals that MATLAB arrays cannot have 0 indices. As this
simple illustration shows, when forming h, MATLAB used the contents of R and C for computations; the indices of h, R, and C started at 1. The power of this indexing scheme is demonstrated in the following example [15]. EXAMPLE 2.7 In this example we write an M-function to compare the implementation of the two-dimensional image function f(x, y) = A sin(u0x + v0y), for x = 0, 1, 2, ..., M - 1 and y = 0, 1, 2, ..., N - 1, using for loops and using vectorization. We also introduce the timing functions tic and toc. The function inputs are A, u0, v0, M, and N. The desired outputs are the images generated by both methods (they should be identical) and the ratio of the time it takes to implement the function with for loops to the time it takes using vectorization. The solution is as follows:
function [rt, f, g] = twodsin(A, u0, v0, M, N)
%TWODSIN Compares for loops vs. vectorization.
%   The comparison is based on implementing the function
%   f(x, y) = A*sin(u0*x + v0*y) for x = 0, 1, 2, ..., M - 1 and
%   y = 0, 1, 2, ..., N - 1. The inputs to the function are
%   M and N and the constants in the function.

% First implement using for loops.
tic % Start timing.
for r = 1:M
   u0x = u0*(r - 1);
   for c = 1:N
      v0y = v0*(c - 1);
      f(r, c) = A*sin(u0x + v0y);
   end
end
t1 = toc; % End timing.

% Now implement using vectorization. Call the image g.
tic % Start timing.
r = 0:M - 1;
c = 0:N - 1;
[C, R] = meshgrid(c, r);
g = A*sin(u0*R + v0*C);
t2 = toc; % End timing.

% Compute the ratio of the two times.
rt = t1/(t2 + eps); % Use eps in case t2 is close to 0.
Running this function at the MATLAB prompt,
>> [rt, f, g] = twodsin(1, 1/(4*pi), 1/(4*pi), 512, 512);
yielded the following value of rt:

>> rt
rt =
    34.2520

We convert the image generated (f and g are identical) to viewable form using function mat2gray:

>> g = mat2gray(g);

and display it using imshow:

>> imshow(g)

Figure 2.58 shows the result. The vectorized code in Example 2.7 runs on the order of 30 times faster than the implementation based on for loops. This is a significant computational advantage that becomes increasingly meaningful as relative execution times become longer. For example, if M and N are large and the vectorized program takes 2 minutes to run, it would take over 1 hour to accomplish the same task using for loops. Numbers like these make it worthwhile to vectorize as much of a program as possible, especially if routine use of the program is envisioned.

The preceding discussion on vectorization focused on computations involving the coordinates of an image. Often, we are interested in extracting and processing regions of an image. Vectorization of programs for extracting such regions is particularly simple if the region to be extracted is rectangular and encompasses all pixels within the rectangle, which generally is the case in this type of operation. The basic vectorized code to extract a region, s, of size m x n and with its top left corner at coordinates (rx, cy) is as follows:

rowhigh = rx + m - 1;
colhigh = cy + n - 1;
s = f(rx:rowhigh, cy:colhigh);

where f is the image from which the region is to be extracted. The for loops to accomplish the same thing were already worked out in Example 2.6. Vectorized code runs on the order of 1000 times faster in this case than the code based on for loops.
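The same rectangular-region extraction can be sketched outside MATLAB; a minimal C++ version, using 0-based indices and row-major storage (the function name and signature are illustrative, not from the thesis):

```cpp
#include <vector>

// Extract an m x n region of image f (stored row-major with 'cols' columns),
// with its top-left corner at (rx, cy) -- the analogue of
// s = f(rx:rowhigh, cy:colhigh) with rowhigh = rx+m-1, colhigh = cy+n-1.
std::vector<double> extract_region(const std::vector<double>& f, int cols,
                                   int rx, int cy, int m, int n) {
    std::vector<double> s;
    s.reserve(static_cast<std::size_t>(m) * n);
    for (int r = rx; r < rx + m; ++r)
        for (int c = cy; c < cy + n; ++c)
            s.push_back(f[static_cast<std::size_t>(r) * cols + c]);
    return s;
}
```

Note that MATLAB's colon syntax expresses the two loops in a single statement; the nested loops here make explicit what that statement does.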
2-15.2. PREALLOCATING ARRAYS
Another simple way to improve code execution time is to preallocate the size of the arrays used in a program. When working with numeric or logical arrays, preallocation simply consists of creating arrays of 0s with the proper dimension. For example, if we are working with two images, f and g, of size 1024 x 1024 pixels, preallocation consists of the statements

>> f = zeros(1024); g = zeros(1024);

Preallocation also helps reduce memory fragmentation when working with large arrays. Memory can become fragmented due to dynamic memory allocation and deallocation. The net result is that there may be sufficient physical memory available during computation, but not enough contiguous memory to hold a large variable. Preallocation helps prevent this by allowing MATLAB to reserve sufficient memory for large data constructs at the beginning of a computation.
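The zeros-based preallocation above corresponds to sizing a container up front instead of growing it element by element; a minimal C++ sketch (the function name is my own, not from the thesis):

```cpp
#include <vector>

// The analogue of f = zeros(rows): construct the full array once,
// zero-filled, rather than appending pixel by pixel.
std::vector<double> make_image(std::size_t rows, std::size_t cols) {
    std::vector<double> f(rows * cols, 0.0);  // preallocated and zero-filled
    return f;
}
```

When the final size is known but the values are filled in later, std::vector::reserve achieves the same goal of avoiding repeated reallocation.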
2-16. EDGE
Find edges in an intensity image. The syntax:

BW = edge(I, 'sobel')
BW = edge(I, 'sobel', thresh)
BW = edge(I, 'sobel', thresh, direction)
[BW, thresh] = edge(I, 'sobel', ...)
BW = edge(I, 'prewitt')
BW = edge(I, 'prewitt', thresh)
BW = edge(I, 'prewitt', thresh, direction)
[BW, thresh] = edge(I, 'prewitt', ...)
BW = edge(I, 'roberts')
BW = edge(I, 'roberts', thresh)
[BW, thresh] = edge(I, 'roberts', ...)
BW = edge(I, 'log')
BW = edge(I, 'log', thresh)
BW = edge(I, 'log', thresh, sigma)
[BW, thresh] = edge(I, 'log', ...)
BW = edge(I, 'zerocross', thresh, h)
[BW, thresh] = edge(I, 'zerocross', ...)
BW = edge(I, 'canny')
BW = edge(I, 'canny', thresh)
BW = edge(I, 'canny', thresh, sigma)
[BW, thresh] = edge(I, 'canny', ...)
2-16.1. DESCRIPTION
edge takes an intensity image I as its input and returns a binary image BW of the same size as I, with 1's where the function finds edges in I and 0's elsewhere. edge supports six different edge-finding methods:

The Sobel method finds edges using the Sobel approximation to the derivative. It returns edges at those points where the gradient of I is maximum.
The Prewitt method finds edges using the Prewitt approximation to the derivative. It returns edges at those points where the gradient of I is maximum.
The Roberts method finds edges using the Roberts approximation to the derivative. It returns edges at those points where the gradient of I is maximum.
The Laplacian of Gaussian method finds edges by looking for zero crossings after filtering I with a Laplacian of Gaussian filter.
The zero-cross method finds edges by looking for zero crossings after filtering I with a filter you specify.
The Canny method finds edges by looking for local maxima of the gradient of I. The gradient is calculated using the derivative of a Gaussian filter. The method uses two thresholds to detect strong and weak edges, and includes the weak edges in the output only if they are connected to strong edges. This method is therefore less likely than the others to be fooled by noise, and more likely to detect true weak edges.

The parameters you can supply differ depending on the method you specify. If you do not specify a method, edge uses the Sobel method [20].
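The Sobel approximation the first method is built on can be made concrete: the gradient at a pixel is estimated by convolving its 3x3 neighborhood with two small kernels. A self-contained C++ sketch of the per-pixel computation (illustrative only; this is the underlying approximation, not the internals of MATLAB's edge):

```cpp
#include <cmath>
#include <vector>

// Gradient magnitude at one interior pixel (r, c) of a row-major image with
// 'cols' columns, using the standard 3x3 Sobel kernels.
double sobel_magnitude(const std::vector<int>& img, int cols, int r, int c) {
    static const int gx_k[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int gy_k[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    int gx = 0, gy = 0;
    for (int i = -1; i <= 1; ++i)
        for (int j = -1; j <= 1; ++j) {
            int p = img[(r + i) * cols + (c + j)];
            gx += gx_k[i + 1][j + 1] * p;  // horizontal derivative estimate
            gy += gy_k[i + 1][j + 1] * p;  // vertical derivative estimate
        }
    return std::sqrt(static_cast<double>(gx * gx + gy * gy));
}
```

An edge map is then obtained by thresholding this magnitude over all interior pixels, which is what the thresh parameter controls.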
2-16.2. SOBEL METHOD
BW = edge(I, 'sobel') specifies the Sobel method. BW = edge(I, 'sobel', thresh) specifies the sensitivity threshold for the Sobel method. edge ignores all edges that are not stronger than thresh. If you do not specify thresh, or if thresh is empty ([]), edge chooses the value automatically. BW = edge(I, 'sobel', thresh, direction) specifies the direction of detection for the Sobel method. direction is a string specifying whether to look for 'horizontal' or 'vertical' edges, or 'both' (the default). [BW, thresh] = edge(I, 'sobel', ...) returns the threshold value.
2-16.3. PREWITT METHOD
BW = edge(I, 'prewitt') specifies the Prewitt method. BW = edge(I, 'prewitt', thresh) specifies the sensitivity threshold for the Prewitt method. edge ignores all edges that are not stronger than thresh. If you do not specify thresh, or if thresh is empty ([]), edge chooses the value automatically. BW = edge(I, 'prewitt', thresh, direction) specifies the direction of detection for the Prewitt method. direction is a string specifying whether to look for 'horizontal' or 'vertical' edges, or 'both' (the default). [BW, thresh] = edge(I, 'prewitt', ...) returns the threshold value.
2-16.4. ROBERTS METHOD
BW = edge (I, 'roberts') specifies the Roberts method. BW = edge (I, 'roberts', thresh) specifies the sensitivity threshold for the Roberts method. edge ignores all edges that are not stronger than thresh. If you do not specify thresh, or if thresh is empty ([]), edge chooses the value automatically. [BW, thresh] = edge (I, 'roberts',...) returns the threshold value.
2-16.5. LAPLACIAN OF GAUSSIAN METHOD
BW = edge(I, 'log') specifies the Laplacian of Gaussian method. BW = edge(I, 'log', thresh) specifies the sensitivity threshold for the Laplacian of Gaussian method. edge ignores all edges that are not stronger than thresh. If you do not specify thresh, or if thresh is empty ([]), edge chooses the value automatically. BW = edge(I, 'log', thresh, sigma) specifies the Laplacian of Gaussian method, using sigma as the standard deviation of the LoG filter. The default sigma is 2; the size of the filter is n-by-n, where n = ceil(sigma*3)*2 + 1. [BW, thresh] = edge(I, 'log', ...) returns the threshold value.
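The filter-size rule quoted above is a one-line computation; expressed in C++ for concreteness:

```cpp
#include <cmath>

// Side length of the square LoG filter as described in the text:
// n = ceil(sigma*3)*2 + 1.
int log_filter_size(double sigma) {
    return static_cast<int>(std::ceil(sigma * 3.0)) * 2 + 1;
}
```

For the default sigma = 2 this gives n = 13, i.e. a 13-by-13 filter; sigma = 1 gives a 7-by-7 filter.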
2-16.6. ZERO-CROSS METHOD
BW = edge(I, 'zerocross', thresh, h) specifies the zero-cross method, using the filter h. thresh is the sensitivity threshold; if the argument is empty ([]), edge chooses the sensitivity threshold automatically. [BW, thresh] = edge(I, 'zerocross', ...) returns the threshold value.
2-16.7. CANNY METHOD
BW = edge(I, 'canny') specifies the Canny method. BW = edge(I, 'canny', thresh) specifies sensitivity thresholds for the Canny method. thresh is a two-element vector in which the first element is the low threshold and the second element is the high threshold. If you specify a scalar for thresh, this value is used for the high threshold and 0.4*thresh is used for the low threshold. If you do not specify thresh, or if thresh is empty ([]), edge chooses low and high values automatically. BW = edge(I, 'canny', thresh, sigma) specifies the Canny method, using sigma as the standard deviation of the Gaussian filter. The default sigma is 1; the size of the filter is chosen automatically, based on sigma. [BW, thresh] = edge(I, 'canny', ...) returns the threshold values as a two-element vector.
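The scalar-to-vector threshold expansion described above can be written out directly (a sketch of the documented rule, not of edge's internal code):

```cpp
#include <utility>

// When a scalar thresh is given to the Canny method, edge uses it as the
// high threshold and 0.4*thresh as the low threshold.
std::pair<double, double> canny_thresholds(double thresh) {
    return std::pair<double, double>(0.4 * thresh, thresh);  // {low, high}
}
```

For example, a scalar threshold of 0.5 becomes the pair [0.2, 0.5].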
2-16.8. CLASS SUPPORT
I can be of class uint8, uint16, or double. BW is of class logical. For the 'log' and 'zerocross' methods, if you specify a threshold of 0, the output image has closed contours, because it includes all the zero crossings in the input image.

Example 2.8
Find the edges of an image using the Prewitt and Canny methods:

I = imread('circuit.tif');
BW1 = edge(I, 'prewitt');
BW2 = edge(I, 'canny');
imshow(BW1);
figure, imshow(BW2)
2-17. BWMORPH
2-17.1. SYNTAX & DESCRIPTION
BW2 = bwmorph(BW, operation)
BW2 = bwmorph(BW, operation, n)

BW2 = bwmorph(BW, operation) applies a specific morphological operation to the binary image BW. BW2 = bwmorph(BW, operation, n) applies the operation n times. n can be Inf, in which case the operation is repeated until the image no longer changes. operation is a string that can have one of the values listed below.
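The n = Inf behavior is a fixed-point iteration: apply the operation, compare with the previous image, and stop when nothing changed. The control flow can be sketched in C++ (illustrative only; the operation itself is left abstract):

```cpp
#include <vector>

using Image = std::vector<int>;  // a flattened binary image, for illustration

// bwmorph(BW, operation, Inf): repeat the operation until the image
// no longer changes, then return the stable result.
template <class Op>
Image apply_until_stable(Image img, Op op) {
    for (;;) {
        Image next = op(img);
        if (next == img)
            return img;       // no change: fixed point reached
        img = std::move(next);
    }
}
```

This terminates for operations like 'shrink', 'skel' and 'thin' because each pass can only remove pixels, so the image can change only finitely many times.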
TABLE 2-4: Morphological operations

'bothat'   - Performs the morphological "bottom hat" operation: closing (dilation followed by erosion) and subtracting the original image.
'bridge'   - Bridges unconnected pixels, that is, sets 0-valued pixels to 1 if they have two nonzero neighbors that are not connected.
'clean'    - Removes isolated pixels (individual 1's surrounded by 0's), such as the center pixel in the pattern 0 0 0; 0 1 0; 0 0 0.
'close'    - Performs morphological closing (dilation followed by erosion).
'diag'     - Uses diagonal fill to eliminate 8-connectivity of the background.
'dilate'   - Performs dilation using the structuring element ones(3).
'erode'    - Performs erosion using the structuring element ones(3).
'fill'     - Fills isolated interior pixels (individual 0's surrounded by 1's), such as the center pixel in the pattern 1 1 1; 1 0 1; 1 1 1.
'hbreak'   - Removes H-connected pixels.
'majority' - Sets a pixel to 1 if five or more pixels in its 3-by-3 neighborhood are 1's; otherwise, it sets the pixel to 0.
'open'     - Performs morphological opening (erosion followed by dilation).
'remove'   - Removes interior pixels. This option sets a pixel to 0 if all its 4-connected neighbors are 1, thus leaving only the boundary pixels on.
'shrink'   - With n = Inf, shrinks objects to points. It removes pixels so that objects without holes shrink to a point, and objects with holes shrink to a connected ring halfway between each hole and the outer boundary. This option preserves the Euler number.
'skel'     - With n = Inf, removes pixels on the boundaries of objects but does not allow objects to break apart. The pixels remaining make up the image skeleton. This option preserves the Euler number.
'spur'     - Removes spur pixels.
'thicken'  - With n = Inf, thickens objects by adding pixels to the exterior of objects until doing so would result in previously unconnected objects being 8-connected. This option preserves the Euler number.
'thin'     - With n = Inf, thins objects to lines. It removes pixels so that an object without holes shrinks to a minimally connected stroke, and an object with holes shrinks to a connected ring halfway between each hole and the outer boundary. This option preserves the Euler number.
'tophat'   - Performs the morphological "top hat" operation, returning the image minus the morphological opening of the image.
2-17.2. CLASS SUPPORT
The input image BW can be numeric or logical. It must be 2-D, real and nonsparse. The output image BW2 is of class logical [20].

Example 2.9

BW = imread('circles.png');
imview(BW);
2-18. VFM
2-18.1. SYNTAX
VFM performs frame grabbing from any Video for Windows source. The function wraps a number of sub-functions, selected by the first parameter. Invocation of a sub-function, say 'grab', is of the form vfm('grab', ...parameters...).

vfm('grab'[, framecount]) grabs framecount frames. framecount defaults to 1. Returns an M x N x 3 x framecount array of uint8, where M and N are the height and width respectively. Images are in RGB format.

vfm('preview'[, bPreview]) switches preview mode on or off, according to the Boolean value bPreview. bPreview defaults to 1.

vfm('show'[, bShow]) shows or hides the capture window according to the value of the Boolean bShow. The window is displayed if and only if bShow is 1. bShow defaults to 1.
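The "first argument selects a sub-function" pattern VFM uses can be sketched as a single entry point dispatching on a command string; a C++ illustration of the calling convention only (the names and return values are mine, not VFM's internals):

```cpp
#include <cstring>

// Dispatch on the sub-function name, as vfm('grab', ...) / vfm('preview', ...)
// / vfm('show', ...) does. 'arg' stands in for framecount or a Boolean flag.
int vfm_dispatch(const char* cmd, int arg) {
    if (std::strcmp(cmd, "grab") == 0)
        return arg;          // would grab 'arg' frames and return them
    if (std::strcmp(cmd, "preview") == 0)
        return arg ? 1 : 0;  // would switch preview mode on or off
    if (std::strcmp(cmd, "show") == 0)
        return arg ? 1 : 0;  // would show or hide the capture window
    return -1;               // unknown sub-function
}
```

This style lets one MEX entry point expose several operations behind a single MATLAB-visible name.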
2-19. REFERENCES
[1]. Niku, Saeed B., Introduction to Robotics: Analysis, Systems, Applications.
[2]. Bonner, Susan, and K. G. Shin, "A Comprehensive Study of Robot Languages," IEEE Computer, December 1982, pp. 82-96.
[3]. Kusiak, Andrew, "Programming, Off-Line Languages," International Encyclopedia of Robotics: Applications and Automation, Richard C. Dorf, Editor, John Wiley & Sons, New York, 1988, pp. 1235-1250.
[4]. Gruver, William, "Programming, High Level Languages," International Encyclopedia of Robotics: Applications and Automation, Richard C. Dorf, Editor, John Wiley & Sons, New York, 1988, pp. 1203-1234.
[5]. Unimation, Inc., VAL-II Programming Manual, Version 4, Pittsburgh, 1988.
[6]. IBM Corp., AML Concepts and User's Guide, 1983.
[7]. Press, Cambridge, Mass., 1981.
[8]. Madonick, N., "Improved CCDs for Industrial Video," Machine Design, April 1982, pp. 167-172.
[9]. Wilson, A., "Solid-State Camera Design and Application," Machine Design, April 1984, pp. 38-46.
[10]. Meagher, J., Fourier.xle Program, Mechanical Engineering Department, California Polytechnic State University, San Luis Obispo, CA, 1999.
[11]. Asfahl, C. Ray, Robots and Manufacturing Automation, 2nd edition, University of Arkansas, Fayetteville.
[12]. Turban, Efraim, Expert Systems and Applied Artificial Intelligence, California State University.
[13]. Irvin, Eugin I., Mechanical Design of Robots.
[14]. Waltz, D. L., Generating Semantic Descriptions from Drawings of Scenes with Shadows, Technical Report AI-TR-271, MIT, Cambridge, Mass.
[15]. Gonzalez, R. C., Woods, R. E., and Eddins, S. L., Digital Image Processing Using MATLAB, Pearson Education.
[16]. Gonzalez, R. C., and Woods, R. E. [2002], Digital Image Processing, 2nd edition, Prentice Hall, Upper Saddle River, NJ.
[17]. Hanselman, D., and Littlefield, B. R. [2001], Mastering MATLAB 6, Prentice Hall, Upper Saddle River, NJ.
[18]. Image Processing Toolbox User's Guide, Version 4 [2003], The MathWorks, Inc., Natick, MA.
[19]. Using MATLAB V6.5 [2000], The MathWorks, Inc., Natick, MA.
[20]. The MathWorks website, www.mathworks.com.
[21]. Jain, Anil K., Fundamentals of Digital Image Processing, University of California, Davis.
[22]. Staugaard, Andrew C., Jr., Robotics and AI: An Introduction to Applied Machine Intelligence, Englewood Cliffs, NJ: Prentice Hall, 1987.
Chapter 3
Our Methodology
3-1.2.1. Program
#include <iostream.h>
#include <conio.h>
#include <dos.h>

void main()
{
    char a, b;
    clrscr();
    cout << "Which part do you want to move?\n";
    cout << "Press G for gripper, B for base, W for wrist, S for shoulder and E for elbow\n";
    // cout << "Hit escape key to stop movement at any stage\n";
    cin >> a;

    //******************************** GRIPPER ***************************
    if (a == 'g' || a == 'G')
    {
        cout << "\nWhich direction do you want to move the gripper?\nPress < for counter-clockwise and > for clockwise\n";
        cin >> b;
        if (b == '>')        // clockwise
        {
            cout << "You are moving the gripper in the clockwise direction\n";
            outportb(0x378, 1);
            getch();
        }
        else if (b == '<')   // counter-clockwise
        {
            cout << "You are moving the gripper in the counter-clockwise direction\n";
            outportb(0x378, 2);   // outportb (8-bit write) used consistently
            getch();
        }
        else
        {
            cout << "Invalid key pressed";
        }
    }
    //******************************** BASE ******************************
    else if (a == 'b' || a == 'B')
    {
        cout << "\nWhich direction do you want to move the base?\nPress < for counter-clockwise and > for clockwise\n";
        cin >> b;
        if (b == '>')        // clockwise
        {
            cout << "You are moving the base in the clockwise direction\n";
            outportb(0x378, 4);
            getch();
        }
        else if (b == '<')   // counter-clockwise
        {
            cout << "You are moving the base in the counter-clockwise direction\n";
            outportb(0x378, 8);
            getch();
        }
        else
        {
            cout << "Invalid key pressed";
        }
    }
    //******************************** WRIST *****************************
    else if (a == 'w' || a == 'W')
    {
        cout << "\nWhich direction do you want to move the wrist?\nPress < for counter-clockwise and > for clockwise\n";
        cin >> b;
        if (b == '>')        // clockwise
        {
            cout << "You are moving the wrist in the clockwise direction\n";
            outportb(0x378, 16);
            getch();
        }
        else if (b == '<')   // counter-clockwise
        {
            cout << "You are moving the wrist in the counter-clockwise direction\n";
            outportb(0x378, 32);
            getch();
        }
        else
        {
            cout << "Invalid key pressed";
        }
    }
    //****************************** SHOULDER ****************************
    else if (a == 's' || a == 'S')
    {
        cout << "\nWhich direction do you want to move the shoulder?\nPress < for counter-clockwise and > for clockwise\n";
        cin >> b;
        if (b == '>')        // clockwise
        {
            cout << "You are moving the shoulder in the clockwise direction\n";
            outportb(0x378, 64);
            getch();
        }
        else if (b == '<')   // counter-clockwise
        {
            cout << "You are moving the shoulder in the counter-clockwise direction\n";
            outportb(0x378, 128);
            getch();
        }
        else
        {
            cout << "Invalid key pressed";
        }
    }
    //******************************** ELBOW *****************************
    else if (a == 'e' || a == 'E')
    {
        cout << "\nWhich direction do you want to move the elbow?\nPress < for counter-clockwise and > for clockwise\n";
        cin >> b;
        if (b == '>')        // clockwise
        {
            cout << "You are moving the elbow in the clockwise direction\n";
            outportb(0x379, 2);
            getch();
        }
        else if (b == '<')   // counter-clockwise
        {
            cout << "You are moving the elbow in the counter-clockwise direction\n";
            outportb(0x379, 1);
            getch();
        }
        else
        {
            cout << "Invalid key pressed";
        }
    }
    //********************************* END ******************************
}
Chapter 4
Hardware Components
4-1. INTRODUCTION
The project is divided into two parts: the hardware part and the software part. The hardware part is mainly related to the control circuitry for the robot's motion, as discussed in the previous chapter on our methodology. Since the robot has to move in a controlled pattern to pick an object from one place and place it at another, we need a controlled electronic power supply. The control circuit handles not only the power-supply requirements but also the robot's motion in different directions, i.e. clockwise and anticlockwise. The circuit comprises a dual-polarity power supply, optocouplers, transistors and a parallel-port connector for the computer interface. A detailed description of each component is given in this chapter.
To increase accessibility, the joints of a robot should have both clockwise and anticlockwise motion. To move the motors in either direction we need both positive and negative voltages, so to accomplish this task a dual-polarity power supply was built. The requirement of our robot was 3.00 V, but we developed a power supply with an output ranging from 0 to 15 V. We regulated the output voltage using the LM317 adjustable positive regulator and the LM337 adjustable negative regulator, to increase the flexibility of the power supply for different requirements. The current requirement of our robot is approximately 1 A. For this we used a center-tapped step-down transformer (220 V to 20 V) rated at 1 A. The output of this transformer is fed to a bridge rectifier and then to the LM317 and LM337; capacitors are placed in parallel to reduce fluctuations in the DC voltage generated by the bridge rectifier. The output is varied by potentiometers connected to the regulators [1]. The circuit diagram is shown below.
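The adjustable output of the LM317 follows the standard datasheet relation for its feedback divider; ignoring the small adjust-pin current, the attainable output for a given resistor pair can be estimated as below (the resistor values in the usage note are illustrative, not read off the thesis schematic):

```cpp
// LM317 output voltage: Vout ~= 1.25 V * (1 + R2/R1), where R1 is the fixed
// resistor from OUT to ADJ and R2 the resistance dialed in on the
// potentiometer from ADJ to ground.
double lm317_vout(double r1_ohm, double r2_ohm) {
    return 1.25 * (1.0 + r2_ohm / r1_ohm);
}
```

With R1 = 220 ohm (the fixed resistor in the parts list) and the potentiometer set to roughly 2.4 kohm, this gives about 15 V, consistent with the 0 to 15 V range quoted above; the LM337 obeys the same relation on the negative rail.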
The components used in the above circuit are as follows: S1 is the main on/off switch; BR1 is the bridge rectifier; C1 and C2 are 2200 uF, 35 V electrolytic capacitors; C3, C4, C5 and C7 are 1 uF, 35 V electrolytic capacitors; C6 and C8 are 100 uF, 35 V electrolytic capacitors; R1 and R4 are 10 K potentiometers; R2 and R3 are 220 Ohm, 1/4 W resistors; U1 is the LM317 adjustable positive regulator and U2 is the LM337 adjustable negative regulator.
4-2.2. INTERFACE CIRCUITRY
To connect the robot to the computer we developed the interface circuitry. The main components are as follows.
4-2.3. PARALLEL PORT
The parallel port is a simple and inexpensive tool for building computer-controlled devices and projects. Its simplicity and ease of programming make the parallel port popular in the electronics hobbyist world [2]. The parallel port is often used in computer-controlled robots, Atmel/PIC programmers, home automation, etc. The hardware of the parallel-port (DB25) connector is shown in the picture below.
FIGURE 4-3: Pin configuration of the parallel port.

The lines in the DB25 connector are divided into three groups: 1) data lines (data bus), 2) control lines, and 3) status lines. As the names suggest, data is transferred over the data lines, the control lines are used to control the peripheral, and the peripheral returns status signals to the computer through the status lines. These lines are connected internally to the data, control and status registers. The details of the parallel-port signal lines are given below.
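The three register groups sit at fixed offsets from the port's base address, which is why the program in Chapter 3 writes to 0x378 and 0x379; the standard layout can be stated as constants:

```cpp
// Standard register layout for a PC parallel port at base address 0x378
// (the usual LPT1 base; 0x278 and 0x3BC were also common).
const unsigned short LPT_BASE    = 0x378;
const unsigned short DATA_REG    = LPT_BASE;      // data lines, pins 2-9
const unsigned short STATUS_REG  = LPT_BASE + 1;  // status lines, pins 10-13 and 15
const unsigned short CONTROL_REG = LPT_BASE + 2;  // control lines, pins 1, 14, 16, 17
```

A byte written to DATA_REG drives pins 2 to 9 directly, one bit per pin, which is the mechanism the interface circuitry relies on.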
TABLE 4-1: Detailed description of parallel port.

Pin No (DB25) | Signal name     | Direction | Register bit | Inverted
1             | nStrobe         | Out       | Control-0    | Yes
2             | Data0           | In/Out    | Data-0       | No
3             | Data1           | In/Out    | Data-1       | No
4             | Data2           | In/Out    | Data-2       | No
5             | Data3           | In/Out    | Data-3       | No
6             | Data4           | In/Out    | Data-4       | No
7             | Data5           | In/Out    | Data-5       | No
8             | Data6           | In/Out    | Data-6       | No
9             | Data7           | In/Out    | Data-7       | No
10            | nAck            | In        | Status-6     | No
11            | Busy            | In        | Status-7     | Yes
12            | Paper-Out       | In        | Status-5     | No
13            | Select          | In        | Status-4     | No
14            | Linefeed        | Out       | Control-1    | Yes
15            | nError          | In        | Status-3     | No
16            | nInitialize     | Out       | Control-2    | No
17            | nSelect-Printer | Out       | Control-3    | Yes
18-25         | Ground          | -         | -            | -
Our robot has five degrees of freedom (base rotation, shoulder and elbow flex, wrist roll and gripping). To move the joints in both clockwise and anticlockwise directions we used ten optocouplers, one for the clockwise and one for the anticlockwise direction of each joint. These ten optocouplers are attached to ten pins of the parallel port. Pins 2 to 9 can be used as output pins and their status is not inverted, so they are the most suitable; the remaining two pins can be taken from pins 1, 14, 16 and 17, and we used pins 1 and 14 because both have inverted status and combine into a pair. Ground can be taken from any of pins 18 to 25, and we used pin 18 as ground.
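The pin pairing above gives each joint two adjacent bits in the data register, which is exactly the pattern of values (1, 2, 4, 8, 16, 32, 64, 128) written by the program in Chapter 3; the mapping can be expressed as a small helper (a sketch of that mapping, not code from the thesis):

```cpp
// Bit values written to the data register (0x378) for each joint:
// gripper 1/2, base 4/8, wrist 16/32, shoulder 64/128.
// The elbow is driven through pins 1 and 14 instead of the data pins.
enum Joint { GRIPPER = 0, BASE = 1, WRIST = 2, SHOULDER = 3 };

unsigned char joint_bit(Joint j, bool clockwise) {
    // Each joint owns bits 2j (clockwise) and 2j+1 (anticlockwise).
    return static_cast<unsigned char>(1u << (2 * j + (clockwise ? 0 : 1)));
}
```

Writing the returned byte to the data register turns on exactly one optocoupler, and therefore one direction of one joint, at a time.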
4-2.4. OPTOCOUPLER
The purpose of the optocouplers is to provide electrical isolation between switching components, that is, between the robot and the parallel port. The optocoupler we used is the P521, which has four pins. Pin 1 is connected to the parallel-port output, pin 2 to the ground of the parallel port, pin 3 to the +3 V rail of the dual-polarity power supply, and pin 4 to the base of the transistor to bias it.
4-2.5. TRANSISTOR
A bipolar junction transistor consists of two back-to-back p-n junctions that share a thin common region of width wB. Contacts are made to all three regions; the two outer regions are called the emitter and collector, and the middle region the base. The structure of an NPN bipolar transistor is shown in the figure. The device is called "bipolar" since its operation involves both types of mobile carrier, electrons and holes. The pin configuration of the transistor used in our project is shown in the figure below; it is a PNP transistor. We use it for rotation in both directions: when +3 V is to be connected to the motor, the +3 V supply is applied to the collector and the output is taken from the emitter, while activation is done by applying a signal to the base via the optocoupler.
4-2.6.
In short, we combine the pins into pairs, one pair for each joint rotation; we have five joints and therefore five pairs. Each pair is responsible for the clockwise and anticlockwise rotation of its joint. Pins 1 and 14 are responsible for elbow flex: pin 1 for clockwise and pin 14 for anticlockwise. Since there is a single output, it is alternately connected to +3.00 V or -3.00 V, depending on whether the input comes from pin 1 or pin 14. The optocoupler configuration for pins 1 and 14 is the same, but the transistor connections differ. For the clockwise direction the positive supply is connected to the collector of the transistor and the output is taken from the emitter, while for anticlockwise the negative supply is connected to the emitter and the output is taken from the collector; the emitter of the first transistor and the collector of the second are combined and given to the
motor. The ground of the robot is connected to the supply ground. The same is repeated for the shoulder, wrist, base and gripper by connecting them to pairs 2 & 3, 4 & 5, 6 & 7 and 8 & 9 respectively. The circuit diagram of the interfacing is shown below. Initially we developed this circuit on Veroboard, but it caused problems and sometimes malfunctioned due to its low quality, so we have now made a PCB of this circuit.
The ROBOTIC ARM TRAINER teaches basic robotic sensing and locomotion principles while testing your motor skills as you build and control the arm. The unit is commanded with five switches on a wired controller, with corresponding lights, to grab, release, lift, lower, rotate the wrist, and pivot sideways through 350 degrees. After assembly, the dynamics of the gear mechanisms can be observed through the transparent arm. Five motors and joints allow for flexibility and fun. For educators and home schoolers, the robotic technology curriculum and personal computer interface are useful tools.
4-3.2. SPECIFICATIONS
4-3.2.1. Five Axes of Motion
Base rotation (left and right): 350 degrees
Shoulder moving range: 120 degrees
Elbow moving range: 135 degrees
Wrist rotation (CW & CCW): 340 degrees
Gripper open and close: 50 mm (2 in)
4-3.2.2. Product Dimensions
360 mm (14.2 in)
510 mm (20.1 in)
130 g (4.6 oz)
4-3.2.3. Power Source
4-6. REFERENCES
[1]. http://www.aaroncake.net/
[2]. http://www.logix4u.net/
[3]. A textbook of robotic arm trainer by