
Sub-Pixel Edge Detection in Character Digitization

Gideon Avrahami, Vaughan Pratt

Department of Computer Science, Stanford University
ABSTRACT: We introduce an algorithm for extracting edge points from a high-quality gray-level image of a glyph. The edge is described as a list of points with sub-pixel precision, roughly one pixel apart. When applied to images obtained from a high-resolution scanner, the precision is essentially that of the human eye.

KEYWORDS: Edge Detection, Digitization

Supported by a gift from SUN Microsystems.

1 Introduction

We have designed and implemented a technique for rapidly digitizing ink artwork at the limit of the human eye's resolving power. We have applied it to the digitization of images obtained from a 400 dot per inch, 8½ × 11 inch scanner. This paper reports on the nature of the problem, the rationale for our design, the technical details of the design and its implementation, and the results obtained on some sample artwork.

Any digitization of artwork renders one vulnerable to the charge of "slavish copying." But copying of some kind is an inherent part of typography, and our premise is that copying to an accuracy within the limits of human vision is sufficient.

These limits are not obvious. The human eye's response to a sinusoidal image peaks at an angular frequency of 5 cycles per degree of arc [VB], which at a viewing distance of ten inches corresponds to a spatial frequency of 30 cycles/inch. Beyond this frequency it falls off, vanishing completely at 50 cycles/degree, or 300 cycles/inch. Using the Nyquist conversion of two dots per cycle, this would be just where a 600 dot per inch scanner would fade out. Thus such a scanner would seem about adequate for the job.

However, the brain does an extraordinary job of post-processing this frequency-limited data. Differences between two distances of as little as 12 arc-seconds can be recognized [Vol], while vernier acuity, the ability to recognize a lateral displacement of part of a line, is 10 arc-seconds [Wul]. About the same accuracy is achieved in detecting collinearity of three dots [Lud]. Westheimer [Wes] gives a comprehensive and more up-to-date treatment of these and related phenomena.

Ten arc-seconds at ten inches corresponds to 1/2000 of an inch. Allowing for possibly closer viewing at five inches, this gives us as a goal an accuracy of 1/4000 of an inch, ten times the resolution of the 400 dot per inch scanner we have used in this work.

What the brain can add to the retina's output the computer should in principle be able to add to the scanner's. We have taken advantage of the 8 bits per pixel of gray scale, and the coherence of typical artwork, to achieve 1/4000 inches of precision on outlines of more than ten pixels radius of curvature. At higher curvatures this precision drops off, until the radius drops below one pixel, at which point our software can no longer tell what the curve is doing at all but can only make an educated guess at which pixels it appears to be visiting. For sufficiently wild outlines, such as space-filling dragon curves involving multiple points of inflection per pixel, this guess can be completely wrong. Our design therefore assumes that regions of high curvature, while permitted, are sufficiently isolated from each other that the tame regions in between can serve to indicate what happens at each tight curve.

The next issue to address is the volume of data represented by such accuracy. We are fortunate that the image is not scanned in at 4000 dots per inch, since that would correspond to 1.5 × 10^9 pixels, which at a byte per pixel is 1.5 gigabytes of data. If we used, say, Deriche's optimal edge-detection algorithm, taking 5 additions and 5 multiplications per pixel [Der], even if this work per pixel could be compressed into 10 microseconds it would take 4 hours. With only 400 dots per inch this time falls to 140 seconds, still somewhat tedious. Before addressing this issue we raise another issue that bears on the volume of data.
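The figures quoted above can be checked with a few lines of arithmetic. The following Python sketch is only illustrative: the function names and the use of exact 8.5 × 11 inch page dimensions are assumptions made here, not part of the original implementation.

```python
import math

# Number of arc-seconds in one radian.
ARCSEC_PER_RADIAN = 180 * 3600 / math.pi


def arcsec_to_inches(arcsec, viewing_distance_in):
    """Length subtended by a small angle at the given viewing distance."""
    return viewing_distance_in * math.tan(arcsec / ARCSEC_PER_RADIAN)


def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Pixel count of a page scanned at `dpi` dots per inch (assumed page size)."""
    return (dpi * width_in) * (dpi * height_in)


acuity_at_10in = arcsec_to_inches(10, 10)   # close to 1/2000 inch
acuity_at_5in = arcsec_to_inches(10, 5)     # close to 1/4000 inch

pixels_4000 = page_pixels(4000)             # about 1.5e9 pixels
pixels_400 = page_pixels(400)               # about 1.5e7 pixels

# Time at an assumed 10 microseconds of work per pixel.
hours_at_4000 = pixels_4000 * 10e-6 / 3600
seconds_at_400 = pixels_400 * 10e-6

print(1 / acuity_at_10in, 1 / acuity_at_5in)
print(pixels_4000, hours_at_4000, seconds_at_400)
```

With these assumed page dimensions the 400 dot per inch estimate comes out near 150 seconds, the same order as the 140 seconds quoted in the text.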
Two equally valid approaches are possible to the analysis of digital images: analytic and combinatorial. Both approaches work with the same amount of information in the image.

The analytic approach treats the image as continuous but bandlimited, namely by the Nyquist frequency of half a cycle per pixel, with no loss of information in so viewing the samples. This approach lends itself to techniques for immunizing analysis against various kinds of noise. The state of the art today in analytic edge detection is represented by the work of Canny [Can] and Deriche [Der].

The combinatorial approach works directly with the individual samples, without attempting to view the situation in the continuous domain. The advantage of this approach is simplicity, while its chief disadvantage is that the level of noise immunity easily achieved by analytic means is extremely difficult to achieve combinatorially.

However, ink artwork is of a very different character from the typical scenes addressed in computer vision. There is a negligibly low level of noise, edges are crisp, and curvature tends to be well-behaved (piecewise continuous). Under these conditions the combinatorial approach can work not only reliably but also very efficiently, due to its greater simplicity. Secondly, it also lends itself to tracking, the process whereby one follows only the outline, as opposed to the area-oriented emphasis of the analytic approach, thus disposing of the concern about volume of data. Thirdly, tracking of low-curvature outlines at subpixel resolution is extremely easy using the grayscale values at each pixel. Fourthly, the amount of filtering required by analytic methods to equal the accuracy of our tracking results in much greater rounding of sharp corners than with our approach.

With the same premises, the same subpixel resolution for sufficiently low curvature outlines can also be achieved analytically, but at considerably higher cost due to more arithmetic and especially to touching more pixels. In the absence of noise there is therefore no reason to prefer the analytic approach.

The cost of the combinatorial approach is extremely low. Not only do we need to examine only those pixels immediately adjacent to the outline, but the logic and arithmetic requirements per pixel are negligibly small: four subtractions per pixel for tracking, plus one more subtraction to compute the position of the curve within the pixel currently visited. Our current implementation does not yet achieve the performance potential of this method because we have not yet tuned it. Instead, the arithmetic has been done in floating point, with unnecessary divisions introduced for ease of understanding while debugging. These will be removed in due course.

2 Tracking the Edge

The tracking algorithm employs a simple finite-state automaton to keep track of the location of the edge. At a given time, the edge is assumed to pass between two neighboring pixels. Call these pixels D and L, the dark and the light. The automaton examines the immediate neighbors of D and L and looks to advance while maximizing the difference in gray-level values between dark and light.


This can be achieved in one of three ways:

1. Replace D by its darkest neighbor, which is also a neighbor of L.
2. Replace L by its lightest neighbor, which is also a neighbor of D.
3. Replace both D and L simultaneously by a pair of neighbors.

The neighbors are tested only on one side of the current pair, so the edge always advances forward while keeping dark pixels on its right. (This choice, to advance in a clockwise fashion around a black region or counterclockwise around a white one, is arbitrary.) To discuss the way we advance along the edge, consider the following diagram:

We are looking at four pixels around a piece of horizontal edge. On the right, the pixels are shown as squares, with various shades of gray. On the left, we denote each pixel by a letter. The current pair is denoted by D and L, so this is a vertical pair. The edge advances with the dark region on its right, so the next two pixels to be considered are the vertical pair on the right, denoted by X and Y. To advance, the algorithm tests which pair among (X,L), (D,Y) and (X,Y) maximizes the gray-level difference. That pair will become the current pair for the next step. In the above picture, it would appear that (X,Y) has the most contrast, so the tracking algorithm advances both pixels at once. Similarly, when the current pair is diagonal, the situation is as follows:

D and L, in the lower-left corner, are the current pair; the edge is advancing towards the upper-right corner; and the relevant neighbors are X, Y and Z. The next pair is selected from among four possible pairs: (D,Y), (L,Y), (X,Y) and (Y,Z).
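The vertical-pair step described above (choosing among (X,L), (D,Y) and (X,Y)) can be sketched in Python as follows. This is a minimal illustration, not the authors' C implementation: the function name, the (row, column) indexing, the convention that larger values are lighter, and the tie-breaking rule are all assumptions made here. Only the case of an edge heading toward increasing column, with D directly below L, is covered.

```python
def advance_vertical_pair(gray, dark, light):
    """One tracking step for a vertical (dark, light) pair.

    gray  : list of rows of gray values; larger means lighter (assumed)
    dark  : (row, col) of the current dark pixel D
    light : (row, col) of the current light pixel L, directly above D
    Returns the (dark, light) pair with the largest light-minus-dark
    contrast among (X, L), (D, Y) and (X, Y), where X and Y are the
    neighbors of D and L in the next column.
    """
    x = (dark[0], dark[1] + 1)     # X: neighbor of D in the direction of travel
    y = (light[0], light[1] + 1)   # Y: neighbor of L in the direction of travel
    candidates = [(x, light), (dark, y), (x, y)]

    def contrast(pair):
        d, l = pair
        return gray[l[0]][l[1]] - gray[d[0]][d[1]]

    # Ties are not discussed in the text; max() keeps the first candidate.
    return max(candidates, key=contrast)


# A clean horizontal edge: light row above, dark row below.
image = [[200, 250],
         [60, 10]]
next_pair = advance_vertical_pair(image, dark=(1, 0), light=(0, 0))
print(next_pair)  # ((1, 1), (0, 1)): both pixels advance, as for (X, Y)
```

Each candidate costs one subtraction, which is in line with the small per-pixel arithmetic the method relies on.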

These steps are small enough to ensure that the algorithm will not lose track of the edge even around sharp corners or deep concave inlets. But problems could arise when the edge is blurred and gets lost in a gray area. For example, Figure 1 shows how the algorithm goes astray when two edges are too close to each other. At the given resolution, the edges are not separated enough, and the area between the two edges is less than one pixel wide. The tracking procedure chooses to maximize the contrast, and instead of following the ill-defined edge inside the gray "fjord" it skips to the other part. Our algorithm assumes that images do not contain such "difficult" features. We require the artwork to be not only clean but scaled to a magnification sufficient to eliminate such interference.

Figure 1: Losing track of an ill-defined edge

We are working towards formulating a precise theorem about our edge detection algorithm: whenever there are two rookwise connected paths, one made of black pixels and the other of white, such that the distance between them is never more than one pixel, then the algorithm will safely follow the edge that lies between the two paths, in a sense to be made precise. (In Figure 1, the gray area, which should have two distinct edges, contains no white pixels.) Based on this theorem, the algorithm can be modified to detect suspect areas, report them, and thus avoid pathological tracking behavior such as failure to return to the start of the edge or skipping to a different edge.

3 Computing An Edge Point

The tracking phase, described in the previous section, provides us with a stream of dark and white neighboring pixels. In order to compute a precise edge point, we choose successive dark-white pairs of rookwise (horizontal or vertical) neighbors, and compute an edge point between them. The tracking phase ensures that these are high-contrast pairs, and our basic assumption is that the edge line passes somewhere between these two pixels.

We assume further that the gray-level value of each pixel corresponds to the distance of its center from the edge. More precisely, we assume that each pixel's value is obtained by integrating the light intensity over a circular pixel in the image. We also assume that the image is strictly black on one side of the edge and strictly white on the other. This yields the situation of Figure 2, where the gray-level value of a pixel is the relative area on the dark side of the edge.

For a straight-line edge, this value can be easily computed as a function of the edge's distance from the pixel's center. Let $d$ be the distance of the chord from the center of the circle, and let $r$ be the circle's radius. The area trapped between the chord and the circle is

$$ r^2 \arccos\left(\frac{d}{r}\right) - d\sqrt{r^2 - d^2}. $$

Since we'd like to use the diameter of one pixel as our unit of length, we set $r = 1/2$ and divide by the area of the pixel, which is $\pi r^2$. Thus, the gray level of a pixel whose center lies inside the white part and at a distance of $d$ ($d < 1/2$) from the edge would be

$$ G(d) = \frac{\arccos(2d) - 2d\sqrt{1 - 4d^2}}{\pi}. $$

If the center lies inside the dark part, the gray level is $1 - G(d)$. (Note that $G(0) = 1/2$.)

Figure 2: The gray level of a pixel on the edge
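As a numerical check on the area formula of this section, the following Python sketch evaluates the circular-segment area for a general radius and the normalized gray level G(d) for a pixel of unit diameter. The function names are hypothetical, and the formulas are written out in the comments as they are understood here (segment area r² arccos(d/r) − d√(r² − d²), normalized by the pixel area πr² with r = 1/2).

```python
import math


def segment_area(d, r):
    """Area between a chord at distance d from the center and the circle of radius r."""
    if not 0.0 <= d <= r:
        raise ValueError("d must lie in [0, r]")
    return r * r * math.acos(d / r) - d * math.sqrt(max(0.0, r * r - d * d))


def dark_fraction(d):
    """G(d) = (arccos(2d) - 2d*sqrt(1 - 4d^2)) / pi, for 0 <= d <= 1/2.

    Fraction of a unit-diameter circular pixel on the dark side of a straight
    edge, when the pixel center lies on the white side at distance d.
    """
    if not 0.0 <= d <= 0.5:
        raise ValueError("d must lie in [0, 1/2]")
    return (math.acos(2 * d) - 2 * d * math.sqrt(max(0.0, 1 - 4 * d * d))) / math.pi


# The normalized form should agree with the general form at r = 1/2.
pixel_area = math.pi * 0.5 ** 2
print(dark_fraction(0.0))                                # edge through the center
print(dark_fraction(0.2), segment_area(0.2, 0.5) / pixel_area)
```

The edge through the center gives exactly half coverage, and the coverage falls to zero when the edge is tangent to the pixel at d = 1/2.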

This formula allows us to compute the gray level of a pixel as a function of the edge's distance from its center. The formula, and hence the assumption of circular pixels, is supported by measurements taken when a straight edge is scanned at a small angle to the scanner's axis: at an angle of 3°, for example, there will be a 20-pixel-long cross section of the edge. Our experiments (see Figure 3) agree with the formula to within 3 percent of gray. A square pixel aligned with the scanner's axis would, in contrast, yield a straight line.

What we really need, however, is the inverse calculation: given a pixel's gray level, infer the location of the edge. The inversion can be done numerically, and it only needs to be done once: during the run of the algorithm, the computed distance is retrieved from a table indexed by the gray-level value.

Figure 3: The G(d) function and a cross-section of a scanned edge.

The next step is to use two neighboring pixels to compute an edge point. The situation is described in Figure 4. Given the values of $d_1$ and $d_2$ (which we calculate from the gray value of each pixel) we can compute the values of $t_1$ and $t_2$. By triangle similarity, $\frac{t_1}{d_1} = \frac{t_2}{d_2} = \frac{1 - t_1}{d_2}$, and so $t_1 = \frac{d_1}{d_1 + d_2}$. Thus, in Figure 4, we can calculate exactly the coordinates of point $E$, which lies on the edge: the $y$ coordinate is the same as that of the two pixels, i.e., $y_1$; and the $x$ coordinate is $x_1 + t_1$ (or $x_2 - t_2$).
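The one-time numerical inversion of G(d) mentioned above could be tabulated along the following lines. This is a sketch under stated assumptions rather than the authors' code: bisection is chosen here only because G is monotonically decreasing on [0, 1/2]; the table is assumed to be indexed by an already normalized 8-bit gray level; and it stores the unsigned distance of the pixel center from the edge, using the symmetry between G(d) and 1 − G(d). All names are hypothetical.

```python
import math


def dark_fraction(d):
    """G(d) for a unit-diameter circular pixel, 0 <= d <= 1/2."""
    return (math.acos(2 * d) - 2 * d * math.sqrt(max(0.0, 1 - 4 * d * d))) / math.pi


def distance_for_fraction(g, iterations=50):
    """Invert G by bisection: the d in [0, 1/2] with G(d) = g, for 0 <= g <= 1/2."""
    lo, hi = 0.0, 0.5
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if dark_fraction(mid) > g:
            lo = mid    # coverage still too large: the edge is farther away
        else:
            hi = mid
    return (lo + hi) / 2


def build_distance_table(levels=256):
    """Unsigned center-to-edge distance for each normalized gray level."""
    table = []
    for i in range(levels):
        g = i / (levels - 1)
        # A pixel covered by fraction g is as far from the edge as one
        # covered by 1 - g; it merely lies on the other side.
        table.append(distance_for_fraction(min(g, 1 - g)))
    return table


distance_table = build_distance_table()
print(distance_table[0], distance_table[64], distance_table[127])
```

Building the table costs a few thousand evaluations of G once, after which each pixel visited during tracking needs only a single table lookup.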



Figure 4: An edge passing through several neighboring pixels

To summarize: in order to compute an edge point, consider two horizontally or vertically neighboring pixels; for each of them, calculate the distance $d_i$ as a function of its gray level (computed through a lookup table). Next, compute $t_1 = \frac{d_1}{d_1 + d_2}$ and calculate the coordinates of $E$: when the two pixels are horizontal neighbors, i.e., both have the same $y$ coordinate, this means setting $y_E$ to $y_1 = y_2$, and $x_E$ to $x_1 + t_1 (x_2 - x_1)$, where $x_2 - x_1$ is either $1$ or $-1$, since the pixels are neighbors. In the vertical case, $x_E$ is set to $x_1 = x_2$ and $y_E$ to $y_1 + t_1 (y_2 - y_1)$.
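The summary above translates almost directly into code. The sketch below is illustrative only: the function name and the error handling are assumptions made here, and the distances d1 and d2 are taken as given (in the paper they come from a lookup table on the gray level).

```python
def edge_point(p1, d1, p2, d2):
    """Edge point E between two rookwise neighboring pixel centers.

    p1, p2 : (x, y) centers of the two pixels, one pixel apart
    d1, d2 : distances of those centers from the edge, in pixel units
    Uses t1 = d1 / (d1 + d2) and moves t1 of the way from p1 to p2.
    """
    (x1, y1), (x2, y2) = p1, p2
    if abs(x2 - x1) + abs(y2 - y1) != 1:
        raise ValueError("pixels must be horizontal or vertical neighbors")
    if d1 + d2 <= 0:
        raise ValueError("at least one distance must be positive")
    t1 = d1 / (d1 + d2)
    # For horizontal neighbors y2 - y1 is 0, so y_E = y1; likewise for
    # vertical neighbors x_E = x1.
    return (x1 + t1 * (x2 - x1), y1 + t1 * (y2 - y1))


print(edge_point((3, 7), 0.1, (4, 7), 0.3))   # horizontal neighbors
print(edge_point((5, 2), 0.2, (5, 1), 0.2))   # vertical neighbors
```

Writing both coordinates with the same expression avoids a separate branch for the horizontal and vertical cases, since the unused difference is zero.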
3.1 Limitations

The first source of error in this computation is in the concept of "gray level." We like to think of the gray level as a number between 0.0 (black) and 1.0 (white). In an 8-bit image, pixel values should be in the range of 0 to 255, respectively. However, the dynamic range of the scanner is smaller. Even on high-quality art, the black and the white values are clustered around, say, 20 and 220. In the current implementation, the first phase creates a histogram of the image and a table that translates pixel values to relative gray levels. Everything below 20 is set to zero, everything above 220 is set to 1.0, and a value $x$ in between is interpolated as $\frac{x - 20}{220 - 20}$; e.g., 0.4 when $x = 100$.
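The translation table described above might look like the following Python sketch. The black and white levels are passed in as parameters here (defaulting to the 20 and 220 of the example); in the implementation described in the text they are derived from a histogram of the image, a step this sketch does not attempt. The function name is hypothetical.

```python
def build_gray_table(black=20, white=220, levels=256):
    """Map raw pixel values to relative gray levels in [0.0, 1.0].

    Values at or below `black` map to 0.0, values at or above `white`
    map to 1.0, and values in between are interpolated linearly.
    """
    if not 0 <= black < white < levels:
        raise ValueError("expected 0 <= black < white < levels")
    table = []
    for x in range(levels):
        if x <= black:
            table.append(0.0)
        elif x >= white:
            table.append(1.0)
        else:
            table.append((x - black) / (white - black))
    return table


gray_table = build_gray_table()
print(gray_table[100])   # (100 - 20) / (220 - 20) = 0.4
```

A device that needs gamma correction could fold that correction into the same table, so the per-pixel cost would remain a single lookup.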


Figure 5: A straight edge vs. a curved edge

The linear interpolation assumes that no gamma correction is necessary. Empirical tests with our scanner support that assumption, but different input devices may require such a correction. In any case, this should not slow down the algorithm, since the correction should also be implemented as a lookup table.

The second source of error is the fact that the edge is not really a straight line. The edge is, in general, curved, and so its intersection with a pixel-sized circle in the image is better modeled by a circular arc than by a straight line. Consider Figure 5: calculating $d$ as a function of the amount of gray is like assuming that the edge is the straight (dashed) line, whereas the true, curved edge lies at a slightly different distance, $d'$, from the center of the pixel.

Fortunately, our numerical tests show that at a given gray level (i.e., area of the intersection with a pixel) the distance of a circular arc does not differ much from the distance of a straight line. In fact, when the radius of curvature is 3 pixels or more, the error is smaller than 0.025 pixels. The maximal error occurs when the edge passes close to the center of the pixel, i.e., at gray levels of about 0.5, and it falls considerably as the gray level moves toward either 0.0 or 1.0. Even at higher curvatures, with radii that approach that of a single pixel, the error remains relatively small. (See Table 1.) We may safely conclude that the straight-line approximation is reasonably accurate.

             radius
gray level   1.0      1.5      2.0      3.0      5.0
0.5          0.0710   0.0467   0.0348   0.0231   0.0138
0.4          0.0648   0.0435   0.0328   0.0221   0.0130
0.3          0.0617   0.0415   0.0285   0.0195   0.0119
0.2          0.0444   0.0328   0.0260   0.0154   0.0094
0.1          0.0314   0.0156   0.0137   0.0076   0.0048

Table 1: The distance between the straight line and circular arc that correspond to the same gray level.

Finally, we always assume in this discussion that no other edge line crosses the pixels involved. If two edges in the image are so close together that they both influence the same pixel, the derivation breaks down and the quality of the reported edge suffers. Thus even a sharp image of a well-defined corner will contain one or two bad edge points, but the damage is isolated and detectable. We expect that a second phase, one where curves are fitted to the edge points, will handle sharp corners with explicit care.

4 Experimental Results

The current implementation of our algorithm is written in C and runs on a SUN SPARCstation 1+. Given an image file and a starting point, it tracks one connected edge and saves the computed edge points to a file. In a sense, it behaves as one step in a pipeline: it counts on previous steps to scan the image, prepare a histogram of all the pixels in it, and point to a starting point, and it leaves the work of interpreting these points, or fitting curves to them, to subsequent steps.

Being still a research project, the algorithm is not yet optimized to run as fast as it could. For example, it still uses floating-point calculation of every edge point (one division and one addition per point) and saves some extra information as it tracks the edge. Its current performance rate is roughly 13,000 edge points per second. Based on 400 dots per inch (with a Microtek MSF-400GS scanner), and with an average distance of 1.1 pixels between successive edge points, we achieve a speed of some 36 inches, or 3 feet, of outline per second.
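The quoted tracking speed follows from the three figures in the paragraph above; the short Python check below simply restates that arithmetic (the function name is hypothetical).

```python
def outline_inches_per_second(points_per_second, pixels_per_point, dots_per_inch):
    """Length of outline traced per second, in inches."""
    return points_per_second * pixels_per_point / dots_per_inch


speed = outline_inches_per_second(13000, 1.1, 400)
print(speed)   # a little under 36 inches per second
```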

Figure 6: Baskerville(TM) italic y (detail)

Figure 7: A closer look at y's edge points.



Figure 6 shows a detail from the letter y in a Baskerville(TM) italic font. The image was scanned from 125 point art, produced under license from International Typeface Corporation. On the right-hand side, we see the output of the edge-detection algorithm, showing one edge point for every pixel along the edge. The full image contains about 1900 edge points, and the processing time is about half a second.

A closer look at this part of the y is shown in Figure 7. We can now see exactly how well the edge points follow the gray-level edge in the image. (A similar closeup, of an image with a less well-defined edge, was afforded by Figure 1, in a previous section.)

On a different level, Figure 8 shows a combined picture of an image and some of its edge points. In this case, the image was of much lower quality: it was scanned from a book [Vac] and was only half an inch in height. The upper right of the found outline comes dangerously close to sharing the fate of Figure 1. Images with such "risky" features are not recommended for our algorithm.

Figure 8: The pictorial character KIN, and some of its edge points



5 Conclusion

We have implemented a simple and effective algorithm for tracking and reporting high-resolution edges in scanned images. Paying close attention to each pixel's gray level, we are able to increase the spatial resolution of our scanner by a factor of ten.

Some of the algorithm's limitations have been mentioned. It relies on clean images with sharp edges and well-defined areas of black and white. Gray or blurred areas can confuse the tracking mechanism or the high-resolution calculation. A small radius of curvature, caused either by blemishes or by true sharp corners in the image, may produce unreliable results.

References

[Can] J. Canny. Finding edges and lines in images. MIT AI Lab report TR-720, 545 Tech. Sq., Camb. MA, 1983.

[Der] R. Deriche. Using Canny's Criteria to Derive a Recursively Implemented Optimal Edge Detector. Int. J. Comp. Vis., 1, 1988.

[Lud] E. Ludvigh. Direction sense of the eye. Am. J. Ophthal., 36:139-142, 1953.

[Vac] O. Vaccari. Pictorial Chinese-Japanese Characters. Tokyo: Vaccari's Language Institute, 1950.

[VB] F.L. Van Nes and M.A. Bouman. Spatial modulation transfer in the human eye. J. Opt. Soc. Amer., 57:401-406, 1967.

[Vol] A.W. Volkmann. Physiologische Untersuchungen im Gebiete der Optik. Leipzig: Breitkopf and Härtel, 1863.

[Wes] G. Westheimer. Visual Acuity and Spatial Modulation Thresholds. In Visual Psychophysics, VII/4:171-187, ed. D. Jameson and L. Hurvich, 1972.

[Wul] E.A. Wülfing. Über den kleinsten Gesichtswinkel. Zeit. Biolog., 29:199-202, 1892.
