
RINGTREE : A VLSI ARCHITECTURE FOR FAST IMAGE GENERATION AND PROCESSING

K. S. Eo, S. S. Kim and C. M. Kyung

Department of Electrical Engineering, KAIST P.O. Box 150, Cheongryang, Seoul 131, Korea

ABSTRACT

This paper describes a new hardware architecture called Ringtree for 2-D geometry generation and processing such as image processing (noise suppression, notch elimination and contour extraction), graphics processing (polygon filling and multiwindowing with nonrectangular windows) and VLSI layout verification (design rule checking). Ringtree consists of Ring memory, which is a special rotating frame buffer, EPT (Edge Painting Tree), which is polygon rasterizing hardware, and LPA (Linear Processor Array). LPA executes a set of basic operations applicable to bit map data manipulation, while EPT generates the bit map data from a set of scanline commands received from the host processor. VLSI implementation issues are also discussed for a practical display screen size of 1024x1024 pixels, utilizing the concept of Super Tree[4,5] for realizing the whole system using identical VLSI chips.

I. INTRODUCTION

Owing to the recent drastic advances in VLSI process and design techniques, tremendous efforts are now being made toward the development of various hardware accelerators that exploit hardware parallelism for diverse applications such as image processing, computer graphics, and layout verification in VLSI design.[1-5]

In dealing with 2-dimensional image generation and processing, we note that there are basically two kinds of processing algorithms. One is what we call global processing, which considers a very wide area of the pixel plane in calculating the transform result of one pixel value, while the other is what we call local processing, which considers only the neighboring or near-range pixels when updating each pixel value. Global processing is generally very complicated and less repetitive compared to local processing, and is therefore suitable for software implementation.
On the other hand, the individual complexity of local processing is very small, while the number of repetitions is huge, i.e., as many as the number of pixels in each frame for generating one frame of data. Local image processing, including noise filtering, contour extraction and geometrical layout verification, can therefore be sped up drastically by utilizing hardware parallelism. Actually, many hardware architectures for 2-dimensional image processing are mainly targeted at implementing the above-mentioned local processing algorithms rather than the global ones; the cytocomputer[1], bit map processor[2] and wire routing machine[3] are typical examples.

In the cytocomputer[1], bit map data representing the 2-D geometry space is scanned one pixel at a time in row-major order and processed by a pipelined processor chain where each unit processor executes one of the basic operations into which the desired instruction is decomposed. Each processor in the pipeline stage itself consists of a subarray processor implemented as combinational logic circuitry and a shift register to perform the neighborhood operations in the vertical direction as well as in the horizontal direction. In this architecture, however, the complexity of each processor becomes excessive because each processor has to contain a shift register array whose length is proportional to (length of two rows of the bit map plane) x (number of layers). Moreover, the structure of the unit processor and the number of stages in the processor pipeline are tied to a specific bit map size and a specific set of operations. Also, its time complexity is O(n^2) for n x n bit map planes due to the serial processing with no hardware parallelism.

The bit map processor in [2] is an array processor for two-dimensional bit map data where the cells, called SAM cells, are linked in a two-dimensional mesh-connected network. Another example of this scheme is an interconnected array of microcomputers to solve the wire routing problem described in [3]. These schemes need minimal external data flow because bit data in each cell are directly accessible from its neighbors, resulting in high speed performance. However, the complexity of this scheme is somewhat excessive even with the present VLSI technology since the size of the bit map data could easily be more than 1000 x 1000 in VLSI layout.

As a trade-off in terms of hardware complexity and performance between the cytocomputer[1] and the bit map processor[2], we proposed a hardware architecture called MultiRing[6], which requires O(n) processors for n x n pixel planes and was proven to be capable of performing DRC (Design Rule Check) for VLSI layout verification.

Another point of major concern in designing an image processing or graphics system is the method of receiving the image/graphics data from the host processor and storing them in such smart frame buffers as MultiRing[6] for optional local 2-dimensional image processing.
To reduce the bandwidth requirement on the channel between the host CPU and the image/graphics processing engine, another hardware unit called Edge Painting Machine[7] was proposed, which converts scanline commands into a stream of pixel data to be stored in frame buffer memory.

In this paper, we propose a new hardware architecture called Ringtree, which performs various 2-D image processing operations, image generation with scanline command processing, and multiwindowing for graphics applications. Ringtree is composed of three parts: EPT[7], Ring memory, which is a kind of smart frame buffer[6], and LPA, as shown in Fig. 1. Ring memory is a special frame buffer where the pixel data rotate in their respective rows while undergoing necessary transformations as they pass through two vertically long gates. One gate, LPA (Gate 1), is responsible for performing most of the local 2-D image processing operations. The other gate (Gate 2) is used either for transmitting the scanline image generated at the EPT to the frame buffer (Ring memory) or for masking the rotating pixel data with the mask data available at the leaves of the EPT. Ringtree has two input ports, i.e., the root processor port of the EPT receiving the scanline commands and the serial input port at Gate 2 receiving the pixel data in serial fashion.

Detailed descriptions of EPT and of Ring memory and LPA are given in sections II and III, respectively. The VLSI implementation issues of Ringtree are discussed in section IV. Finally, three major applications for Ringtree, i.e., image processing, graphics processing and VLSI layout verification, are explained in section V with the simulation results.

ISCAS88

CH2458-8/88/0000-0801$1.00 © 1988 IEEE

II. EPT (EDGE PAINTING TREE)

As is shown in Fig. 2, EPT consists of a pipelined binary tree of specialized processors and an image buffer whose cells are one-to-one connected to the leaves of the tree. The image buffer is only one scanline long and temporarily stores the pixel data produced at the leaves of the tree. To explain the operation of EPT, we assume a scanline command (XL, XR) enters the root. We also assume that there are P pixels per scanline and that there are P/2 first-level nodes (leaves), P/4 second-level nodes, ... and one n-th (n = log2 P) level node (root). XL and XR are then each represented as n-bit binary numbers. The basic operation of the root processor is to compare the MSBs of XL and XR. When the MSB of XL is '0' and the MSB of XR is '1', the painting region is divided into two subregions, i.e., one region in the left subtree extending from XL to the maximum coordinate of the left subtree, and the other region in the right subtree extending from the minimum coordinate of the right subtree to XR. If the MSBs of XL and XR are both '0', the painting region is confined to the left subtree. If the MSBs of XL and XR are both '1', the painting region is confined to the right subtree. The data is transmitted only to the relevant subtree(s) with the MSB of the current data stripped off. Therefore, the XL and XR data in the processors just below the root are (n-1) bits long and the subsequent data movement is performed according to the MSBs of the current XL and XR. Similar operations occur in each node processor of the binary tree. Fig. 3 describes the (XL, XR) data to be propagated to the left and right child of the current node according to the condition of the MSBs of the (XL, XR) data of the current node.
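The MSB-splitting rule described above can be sketched in software as follows. This is a minimal illustration, not the authors' hardware: the function name `ept_paint` and the `base` offset (used only so the simulation knows which image buffer cells a subtree covers) are assumptions, the tree is walked recursively rather than in a pipeline, and each lowest-level node is assumed to drive two image buffer cells from its 1-bit XL and XR.

```python
def ept_paint(xl, xr, n):
    """Paint pixels xl..xr (inclusive) on a scanline of 2**n pixels.

    Software sketch of the Edge Painting Tree: every node looks only at the
    MSBs of XL and XR, then forwards the command with the MSB stripped off.
    """
    width = 1 << n
    if n < 1 or not 0 <= xl <= xr < width:
        raise ValueError("need n >= 1 and 0 <= xl <= xr < 2**n")
    image = [0] * width

    def node(xl, xr, bits, base):
        # bits: number of coordinate bits still attached to the command
        # base: leftmost pixel of this subtree (simulation bookkeeping only)
        if bits == 1:
            # Lowest-level node: write the one or two cells it covers.
            for x in range(xl, xr + 1):
                image[base + x] = 1
            return
        half = 1 << (bits - 1)                 # weight of the current MSB
        msb_l, msb_r = xl >> (bits - 1), xr >> (bits - 1)
        low_l, low_r = xl & (half - 1), xr & (half - 1)   # R(XL), R(XR)
        if (msb_l, msb_r) == (0, 0):
            node(low_l, low_r, bits - 1, base)            # left subtree only
        elif (msb_l, msb_r) == (1, 1):
            node(low_l, low_r, bits - 1, base + half)     # right subtree only
        else:
            # MSBs are (0, 1): the region straddles both subtrees.
            node(low_l, half - 1, bits - 1, base)         # XL .. max of left
            node(0, low_r, bits - 1, base + half)         # min of right .. XR

    node(xl, xr, n, 0)
    return image


print(ept_paint(2, 5, 3))  # [0, 0, 1, 1, 1, 1, 0, 0]
```

Each call inspects a single bit of XL and a single bit of XR, which mirrors the one-bit comparison performed by every node processor.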
Scanline command processing at each level of the binary tree occurs in a pipelined fashion, such that while the second MSBs are compared in the next lower level (children of the root), the MSBs of XL and XR in the next scanline command are used in the root node for determining the redirection of that scanline command. Since the current MSBs of the XL and XR data in the present level of the tree are used only for data redirection, and are stripped off when XL and XR are propagated down to the next level, there are no more bits to propagate at the leaf node level, where the pixel data are written onto the corresponding positions of the image buffer according to the (XL, XR) values at the leaves. While the function of EPT is the same as that of the raster graphics engine in the Super Buffer[4], the hardware complexity of EPT is significantly reduced owing to the use of a one-bit comparator rather than the log2 P-bit comparator in [4]. (P is the number of pixels per scanline.)

III. RING MEMORY AND LPA

As was shown in Fig. 1, Ring memory is a special frame buffer which is a recirculating two-dimensional memory plane. The Ring memory can be implemented as multiple rows of dynamic shift registers. The pixel data stored in Ring memory circle around in a column-synchronized fashion and undergo necessary transformations as they pass through LPA (Linear Processor Array). Fig. 4 shows a block diagram of the unit processor in LPA, which consists of three switches (nx1-SW, nx2-SW, 2x1-SW) and three submodules (Boolean module, Geometry module and Load-back stage), where n is the number of bit map planes. nx1-SW is a simple switch to select one plane out of the n input planes in the Ring memory. We let Ct denote the selected pixel data in the current unit processor at time point t (t = -, 0, + signifies past, present and future, respectively). The output of the nx1 switch is stored in a 3-stage shift register as C-, C0 and C+, constructing a 3x3 window with those from the upper unit processor (U-, U0, U+) and those from the lower unit processor (L-, L0, L+), as shown in Fig. 5. This 3x3 window becomes the data for the geometry operations. On the other hand, the plane data C1, C2, ..., and Cn are also fed to nx2-SW, where two of them (B1, B2) are selected as the inputs to the Boolean module, which executes various Boolean operations such as AND, OR, NOT and Copy between planes. The outputs of the Geometry and Boolean modules are fed to 2x1-SW, whose output is P0. P0 is fed to the output stage either to replace or to be ORed with one of C1, C2, ..., and Cn. In the output stage, C1, C2, ..., and Cn are delayed by four clocks to be synchronized with P0. Basic instructions of LPA consist of 4 Boolean instructions and 16 geometry instructions.
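The way each unit processor assembles its 3x3 window can be sketched as below. The names `window_at` and `lpa_pass` are illustrative and do not come from the paper. Two boundary choices are assumptions made only for this sketch: rows wrap around horizontally (because the Ring memory recirculates), and the processors at the top and bottom of the array see 0 from their missing neighbor.

```python
def window_at(plane, row, col):
    """Return ((U-, U0, U+), (C-, C0, C+), (L-, L0, L+)) centred on (row, col)."""
    height, width = len(plane), len(plane[0])

    def pixel(r, c):
        if not 0 <= r < height:
            return 0                    # assumption: no neighbour beyond the array
        return plane[r][c % width]      # assumption: rows wrap (data recirculates)

    return tuple(
        tuple(pixel(row + dr, col + dc) for dc in (-1, 0, 1))
        for dr in (-1, 0, 1)
    )


def lpa_pass(plane, geometry_op):
    """Apply geometry_op to every 3x3 window during one full rotation."""
    height, width = len(plane), len(plane[0])
    out = [[0] * width for _ in range(height)]
    for col in range(width):        # one column reaches the gate per clock
        for row in range(height):   # one unit processor per row (parallel in hardware)
            out[row][col] = geometry_op(window_at(plane, row, col))
    return out


plane = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]
copied = lpa_pass(plane, lambda w: w[1][1])   # passing C0 through copies the plane
```

The outer loop over columns stands in for time: in the hardware all rows are processed simultaneously as each column passes through the gate, so one rotation costs a number of clocks proportional to the row length rather than to the pixel count.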

1) Boolean instructions

The Boolean instructions that LPA performs are pixelwise AND, OR, NOT, and Copy operations.
2) Geometrical instructions

There are three geometrical instructions: 'Shrink' (SHR), 'Expand' (EXP) and 'Move' (MOV). When the EXP (SHR) instruction is executed for a geometrical primitive, its widths in the X and Y directions are enlarged (shrunk) by 1 unit length.
3) Detection instructions

Detection instructions are responsible for searching for a specific pattern in the given 3 x 3 window, and consist of interior detection (INDTC) and exterior detection (EXDTC). These are described using Boolean expressions as follows.

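$$\mathrm{INDTC} = C_0 \cdot \big( (\tilde{U}_0 \cdot \tilde{L}_0) + (\tilde{C}_- \cdot \tilde{C}_+) + (\tilde{L}_- \cdot \tilde{U}_+ \cdot \ldots) + (\tilde{U}_- \cdot \tilde{L}_+ \cdot \ldots) \big) \tag{1}$$

$$\mathrm{EXDTC} = \tilde{C}_0 \cdot \big( (U_0 \cdot L_0) + (C_- \cdot C_+) + (L_- \cdot U_+ \cdot \ldots) + (U_- \cdot L_+ \cdot \ldots) \big) \tag{2}$$

, where ~ (tilde) denotes complementing.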


INDTC is used to check for the width rule in the 3 x 3 window shown in Fig. 5. A prerequisite for INDTC to be '1' is that C0, the center of the 3 x 3 window, should be 1, which is the first term of eq.(1). A 2λ width rule error in the vertical as well as the horizontal direction is reflected in the first and second terms within the outermost parentheses (OR terms) in eq.(1). The third and fourth OR terms in eq.(1) check for width rule errors in the two diagonal directions. The exterior detection instruction, EXDTC, can be explained in a similar fashion to INDTC except that all the Boolean values are complemented.
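The vertical and horizontal parts of the detection instructions can be sketched as follows. This is a deliberately partial illustration: it assumes that a width error means the center pixel is set while both opposite neighbors are clear, it omits the two diagonal terms, and the function names `indtc_hv` and `exdtc_hv` are hypothetical.

```python
def indtc_hv(window):
    """Interior detection, vertical and horizontal terms only (sketch).

    window is ((U-, U0, U+), (C-, C0, C+), (L-, L0, L+)) with 0/1 entries.
    Assumption: an error is a set centre pixel whose two opposite neighbours
    are both clear, i.e. a feature only one pixel wide in that direction.
    """
    (_, u0, _), (c_minus, c0, c_plus), (_, l0, _) = window
    vertical = (not u0) and (not l0)
    horizontal = (not c_minus) and (not c_plus)
    return int(bool(c0) and (vertical or horizontal))


def exdtc_hv(window):
    """Exterior detection: the same test applied to the complemented window."""
    complemented = tuple(tuple(1 - p for p in row) for row in window)
    return indtc_hv(complemented)


thin_line = ((0, 1, 0),
             (0, 1, 0),
             (0, 1, 0))
print(indtc_hv(thin_line))  # 1
```

Because the exterior test is just the interior test on complemented inputs, a one-pixel gap between two patterns is flagged in the same way as a one-pixel-wide pattern.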
IV. VLSI IMPLEMENTATION

To implement the aforementioned Ringtree in VLSI technology for a 1024x1024 pixel graphic display system application, the whole Ring memory frame buffer is first divided into 64 slices, each of which consists of 1024 (pixels/row) x 16 (rows) = 16K pixels. Accordingly, the number of processors in LPA in each Ringtree chip is reduced to 16. Subdivision of the whole binary tree (EPT), which is 10 levels deep with 1024 leaves, into 64 identical units is less straightforward. A small EPT 4 levels deep and 16 leaves wide is incorporated into each chip, while the upper six levels are realized with a linear chain called EPC (Edge Painting Chain) as shown in Fig. 6. A similar idea was implemented in a binary pipelined multiplier in [5]. EPC is constructed of a cascade of processing elements (PEs) and address registers. Each PE receives two n-bit data, XL and XR, from its upper level PE and sends two (n-1)-bit data, XL' and XR', to its lower level PE, according to the values of the MSBs of XL and XR as well as the corresponding bit value of the chip address, which is unique to each Ringtree chip and thus supplied and stored in the PROM resident in each chip.
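One plausible reading of the EPC is that each chip follows only its own root-to-subtree path through the upper six tree levels, using one chip address bit per level to decide which child it is. The sketch below works under that assumption; the per-level rule is derived from the splitting rule of section II rather than taken from the paper, and the name `epc_clip`, the MSB-first address ordering and the `None` return value for "nothing to paint on this chip" are all illustrative.

```python
def epc_clip(xl, xr, chip_address, chain_bits=6, total_bits=10):
    """Return the (XL', XR') handed to one chip's local EPT, or None.

    Sketch of an Edge Painting Chain: one PE per upper tree level. Each PE
    strips the MSB of XL and XR and keeps only the part of the painting
    region that lies on this chip's side of the tree.
    """
    if not 0 <= xl <= xr < (1 << total_bits):
        raise ValueError("need 0 <= xl <= xr < 2**total_bits")
    for level in range(chain_bits):
        bits = total_bits - level
        half = 1 << (bits - 1)
        msb_l, msb_r = xl >> (bits - 1), xr >> (bits - 1)
        low_l, low_r = xl & (half - 1), xr & (half - 1)
        # Assumption: the chip address is consumed MSB first, one bit per PE.
        addr_bit = (chip_address >> (chain_bits - 1 - level)) & 1
        if addr_bit == 0:                   # this chip is in the left subtree
            if msb_l == 1:
                return None                 # region lies entirely to the right
            xl, xr = low_l, (low_r if msb_r == 0 else half - 1)
        else:                               # this chip is in the right subtree
            if msb_r == 0:
                return None                 # region lies entirely to the left
            xl, xr = (low_l if msb_l == 1 else 0), low_r
    return xl, xr


# A command painting pixels 20..40 reaches chips 1 and 2 only
# (chip k is assumed to cover pixels 16*k .. 16*k + 15).
print(epc_clip(20, 40, 1))  # (4, 15)
print(epc_clip(20, 40, 2))  # (0, 8)
```

With the defaults, the 10-bit command loses six bits in the chain and arrives at the on-chip EPT as a 4-bit command, matching the 6 + 4 split of the 10 tree levels across the 64 chips.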
V. VARIOUS APPLICATIONS OF RINGTREE
1) Image Processing

Noise suppression; Noise is defined as a small image fragment whose width in either the vertical or the horizontal dimension is less than a specified number of pixels. Noise can be suppressed through two steps of basic instructions, that is, shrinking the original image by the amount of the specified noise width and expanding the shrunk image by the same amount.

Notch elimination; A notch is a noise in the bit complement plane of the original image plane. Therefore, a notch can be eliminated by applying the two steps used for noise suppression in reverse order.

Contour extraction; The contour is obtained by ANDing the original image with the negated image of the image obtained by shrinking the original image by one pixel width.
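The three image processing operations above can be sketched with simple bitmap routines. Several choices here are assumptions rather than details from the paper: shrinking and expanding are modelled as 3x3 erosion and dilation (one pixel per side per step), pixels outside the plane are treated as 0, and all function names are illustrative.

```python
def _neighbourhood(plane, r, c):
    """The nine pixels around (r, c); positions outside the plane count as 0."""
    h, w = len(plane), len(plane[0])
    return [plane[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
            for rr in (r - 1, r, r + 1) for cc in (c - 1, c, c + 1)]


def shrink(plane):
    """Keep a pixel only if its whole 3x3 neighbourhood is set (assumed SHR)."""
    return [[int(all(_neighbourhood(plane, r, c))) for c in range(len(plane[0]))]
            for r in range(len(plane))]


def expand(plane):
    """Set a pixel if any pixel in its 3x3 neighbourhood is set (assumed EXP)."""
    return [[int(any(_neighbourhood(plane, r, c))) for c in range(len(plane[0]))]
            for r in range(len(plane))]


def suppress_noise(plane, steps):
    """Shrink `steps` times, then expand `steps` times."""
    for _ in range(steps):
        plane = shrink(plane)
    for _ in range(steps):
        plane = expand(plane)
    return plane


def eliminate_notch(plane, steps):
    """The same two steps in reverse order: expand first, then shrink."""
    for _ in range(steps):
        plane = expand(plane)
    for _ in range(steps):
        plane = shrink(plane)
    return plane


def contour(plane):
    """Original AND NOT(shrunk original): the one-pixel-wide boundary."""
    shrunk = shrink(plane)
    return [[p & (1 - s) for p, s in zip(prow, srow)]
            for prow, srow in zip(plane, shrunk)]
```

For example, applying `suppress_noise(plane, 1)` to a plane holding a 3x3 block and one isolated pixel removes the isolated pixel and restores the block, and `contour` of that block returns its eight border pixels.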

2) Graphics processing

Most recent graphics workstations require fast and powerful multi-window functions. On top of the image generation capability of EPT, Ringtree is capable of performing such functions as pattern filling, PIXBLT (PIXel BLock Transfer) and a hardwired multi-window function including nonrectangular, nonconvex windows.

3) Layout Verification

Another application of Ringtree is DRC (Design Rule Check) in VLSI layout. Compared to other hardware architectures, Ringtree is very flexible with respect to variations in design rules, since the number of basic operations into which the required instruction is decomposed does not affect the hardware architecture. The design rules described in this paper were taken from [8]. The application programs running on Ringtree for DRC in IC layout are presented in a simplified form. Layers 2 and 3 in Ring memory are assumed to be reserved layers, which may not be used as input layers when Ringtree is used for DRC applications.

Width rule checking; Since the interior detection operation, INDTC, of LPA checks only for the 2λ width rule (where a 1λ width is illegal, while the minimal 2λ width is legal), successive shrinking and interior detection instructions are required for reporting errors for all patterns having widths of less than nλ.

Spacing rule checking; Since the intra-layer spacing rule check is a problem which is complementary to the width rule check, it can be understood in a very similar way. The inter-layer spacing rule check needs extra processes such as a zero inter-layer spacing check and the elimination of the bays in the two individual layers whose widths are smaller than nλ. The bay elimination prevents the intra-layer spacing errors in the two layers from being checked.

Extension and Enclosure rule checking; The extension rule check is to report an insufficient amount of extension of the patterns in one layer over the patterns in another layer, for example, polysilicon over diffusion, depletion implant over polysilicon, etc. The extension rule check algorithm consists of a 1λ expansion of all the patterns in layer B in four directions and subtraction of layer A from layer B, followed by an (n+1)λ width rule check for layer B. The extension rule check algorithm can also be used to check for enclosure rule errors, where layer A is regarded as the contact cut layer.

Simulation results of DRC for the example shown in Fig. 7(a) are illustrated in Fig. 7(b). This example consists of rectilinear patterns in layers A and B. Four kinds of design rules, that is, width, spacing, extension, and enclosure rules, were tested successfully.

VI. CONCLUSION

In this paper, we proposed a hardware architecture called Ringtree for image processing, graphics processing and layout verification in VLSI. The proposed processor consists of three parts: Ring memory, which is a special memory, EPT, which is a rasterization processor with a binary tree structure, and LPA, which executes the modification of bit map data stored in Ring memory. Various application examples of Ringtree were explained with the results of software simulation using the C language. VLSI implementation issues of Ringtree for 1024 x 1024 pixel graphics applications were also studied using the slicing structure, where each slice consists of horizontal strips of Ring memory, a small EPT and EPC.

REFERENCES

[1] R. A. Rutenbar, T. N. Mudge and D. E. Atkins, "A Class of Cellular Architectures to Support Physical Design Automation", IEEE Transactions on CAD of IC's and Systems, Vol. CAD-3, No. 4, pp. 264-278, October 1984.
[2] T. Blank, M. Stefik and W. vanCleemput, "A Parallel Bit Map Processor Architecture for DA Algorithms", Proc. 18th Design Automation Conference, pp. 837-845, 1981.
[3] R. Nair, S. J. Hong, S. Liles and R. Villani, "Global Wiring on a Wire Routing Machine", Proc. 19th Design Automation Conference, pp. 224-231, 1982.
[4] N. Gharachorloo and C. Pottle, "SUPER BUFFER : A Systolic VLSI Graphics Engine for Real Time Raster Image Generation", 1985 Chapel Hill Conference on Very Large Scale Integration, Computer Science Press, Inc., pp. 285-305.
[5] J. Poulton, H. Fuchs, J. D. Austin, J. G. Eyles, J. Heinecke, C. H. Hsieh, J. Goldfeather, J. P. Hultquist and S. Spach, "PIXEL PLANES : Building a VLSI-Based Graphic System", 1985 Chapel Hill Conference on Very Large Scale Integration, Computer Science Press, Inc., pp. 35-60.
[6] K. S. Eo and C. M. Kyung, "A Two-Dimensional Geometry Processor for DRC Applications", Proc. 1987 IEEE Region 10 Conference, pp. 266-270, Aug. 1987, Seoul, KOREA.
[7] S. S. Kim, K. S. Eo and C. M. Kyung, "Edge Painting Machine : A Hardware for Image Rasterization", Proc. 1987 IEEE Region 10 Conference, pp. 115-119, Aug. 1987, Seoul, KOREA.
[8] C. Mead and L. Conway, "Introduction to VLSI Systems", Addison Wesley, 1980, Chapter 2.
Figure 1. Block diagram of Ringtree. [Block labels: Gate 2, Edge Painting Tree; Gate 1 (Linear Processor Array); Ring Memory; pixel data in; pixel data out (to Display).]


Figure 2. Block diagram of Edge Painting Tree. [Labels: scanline commands (X-left, X-right); root.]


Figure 6. Implementation scheme of Ringtree using 64 identical VLSI chips. [Labels: XL, XR; Video signal; Processor 0, Processor 1, ..., Processor 63.]

Figure 3. Rule for determining XL and XR data for the left and right child from the MSBs of the XL and XR data of their parent, where R(N) denotes the (n-1)-bit number made by dropping the MSB of the n-bit binary number N.

Figure 4. Block diagram of the unit processor in the Linear Processor Array.

Figure 5. The 3 x 3 window of bit map data used as input of LPA. [Window rows: U-, U0, U+ / C-, C0, C+ / L-, L0, L+.]

Figure 7. Simulation results of Ringtree for DRC applications. (a) Two input layers A and B. (b) Simulation results of Ringtree for the given input layers. Locations of various DRC errors are shown with their identification numbers, explained below: 1, width error for layer A; 2, spacing error for layer A; 3, spacing error between layers A and B; 4, extension error of layer B from A; 5 and 6, width errors in the diagonal direction of layer A; 7, enclosure error of layer A from B.

