
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 3, NO. 3, JUNE 1993

Coding and Cell-Loss Recovery in DCT-Based Packet Video


Qin-Fan Zhu, Student Member, IEEE, Yao Wang, Member, IEEE, and Leonard Shaw, Fellow, IEEE

Abstract-This paper considers the applications of DCT-based image- and video-coding methods in the asynchronous transfer mode (ATM) environment. Coding and reconstruction mechanisms are jointly designed to achieve a good compromise among compression gain, system complexity, processing delay, error-concealment capability, and reconstruction quality. The Joint Photographic Experts Group (JPEG) and Motion Picture Experts Group (MPEG) algorithms for image and video compression are modified to incorporate block interleaving in the spatial domain and discrete cosine transform (DCT) coefficient segmentation in the frequency domain to conceal the errors due to packet loss. A new algorithm is developed that recovers the damaged regions by adaptive interpolation in the spatial, temporal, and frequency domains. The weights used for spatial and temporal interpolations are varied according to the motion content and loss patterns of the damaged regions. When combined with proper layered transmission, the proposed coding and reconstruction methods can handle very high packet-loss rates at only slight cost of compression gain, system complexity, and processing delay.

Manuscript received July 7, 1992; revised November 9, 1992. This work was supported in part by a grant from the New York State Science and Technology Foundation to the Center for Advanced Technology in Telecommunications at Polytechnic University, Brooklyn, NY. This paper was presented in part at the SPIE Conference on Visual Communications and Image Processing, Boston, MA, November 1991 and November 1992, and at the IEEE Data Compression Conference, Snowbird, Utah, March 1992. Paper was recommended by the special issue guest editors. Q.-F. Zhu is with Motorola Codex, Mansfield, MA 02048. Y. Wang and L. Shaw are with the Department of Electrical Engineering, Polytechnic University, Brooklyn, NY 11201. IEEE Log Number 9208514.

I. INTRODUCTION

DISCRETE cosine transform (DCT) coding is currently the most effective and popular technique for image and video compression. It has been adopted in several international standards for image/video storage and transmission [1]-[3]. Although there has been significant progress in the algorithm optimization and very large-scale-integration (VLSI) implementation of these coding algorithms, questions still remain about how to best apply DCT-based coding in broadband integrated-services-digital-network (ISDN) channels employing the asynchronous transfer mode (ATM). In spite of the many advantages offered by the ATM protocol, potential cell loss remains the main obstacle to video transmission over these networks [4]-[6]. Due to the high bit rate associated with video signals, a fairly low loss rate can result in many damaged regions in a short period of time. Even worse, depending on the coding and synchronization methods used, the loss of a single cell can corrupt either part of a frame or as much as several frames.

In the past few years, various approaches have been proposed to combat the cell-loss problem in DCT-based as well as other coding systems [6]. One popular method is to exploit the insensitivity of the human visual system to the high-frequency component of video signals by using layered coding [7]-[10]. With that scheme, the low-frequency coefficients and other more important information (such as coding modes and motion vectors) are transmitted with a higher level of priority and protection than for the less important high-frequency components. When channel congestion occurs, the high-frequency components are simply discarded. To limit error propagation in coders using motion-compensated prediction, studies have also been made on how to perform motion compensation only over the low-frequency components, such that discarding high-frequency coefficients will not cause error propagation [11]. Leaky prediction has also been investigated, which reduces the effect of previously corrupted frames on the current frame by attenuating the prediction results by a leakage factor [7], [12].

With the above hierarchical transmission scheme, the low-priority, high-frequency components can be simply replaced with zeros if they are lost. The resulting image will be blurred, but usually still acceptable. But when the high-priority, low-frequency components are damaged, zero or mean substitution will result in noticeable artifacts. To overcome this problem, simple spatial or temporal interpolation has been used. The former interpolates a damaged area from its adjacent, undamaged regions in the same frame. Its performance depends on the coding method and image content. For block-transform coders, interpolation cannot produce satisfactory images because several consecutive lines will be damaged when one packet is lost. Temporal interpolation replaces the missing region in a current frame with the corresponding area in the previous frame or the region specified by the motion vectors if available. Although working well for slow-motion sequences, this cannot yield satisfactory results in the presence of fast-moving objects and background-lighting changes. It can even have a disastrous effect when a loss occurs during a scene change and the adjacent frames are entirely different. Furthermore, this is not applicable for still-image transmission at all.

To improve the reconstruction quality, we have developed an adaptive scheme that varies the interpolation coefficients in spatial, temporal, as well as frequency domains according to the motion contents and loss patterns of the damaged regions [13]-[15].


The interpolator is derived by imposing smoothing constraints between adjacent (spatially and temporally) samples in the block and along the block boundary. The recovered block is maximally smooth among all those with the same received coefficients and boundary conditions. This method can effectively recover the lost low-frequency components, while still preserving the received high-frequency contents. When the damage occurs in isolated regions, satisfactory images can be obtained even when the dc and many low-frequency ac coefficients are lost. When contiguous blocks are damaged, the algorithm must be iterated several times, using the previously reconstructed values as the boundary condition in the current iteration. The processing in such a case, however, requires more time and yields less satisfactory results.

In this paper, we consider the integration of DCT-based image- and video-coding methods into ATM networks. To minimize the effect of packet loss and facilitate loss recovery using our reconstruction algorithms, we propose to modify the Joint Photographic Experts Group (JPEG) and Motion Picture Experts Group (MPEG) algorithms [1], [2] to incorporate block interleaving in the spatial domain and DCT coefficient segmentation in the frequency domain, such that the information of spatially adjacent blocks is assembled into separate packets and the information of the same block is spread over several packets. In this way, the loss of a packet will only affect isolated (noncontiguous) image blocks, and within each damaged block, only partial information will be lost. Using our reconstruction algorithm, the damaged regions can be reconstructed quickly and satisfactorily.

Fig. 1 shows the overall coding and packetization system incorporating such block interleaving and coefficient segmentation (BICS). The use of BICS can not only significantly enhance the resilience of the system to packet loss, but also facilitate layered/progressive transmission and packet-loss recovery. However, the incorporation of BICS may also add processing delays in both transmitter and receiver. In addition, it may reduce the achievable gain of the run-length coding and DC prediction used in the JPEG and MPEG algorithms. Our objective is to jointly develop approaches for coding, packetization, transmission, and reconstruction, such that the packet-loss effect can be minimized at the least cost of compression gain, system complexity, and processing delay. Toward this goal, different alternatives for BICS have been compared, and a simple scheme using even/odd block interleaving and fixed-break-point coefficient segmentation has been adopted. Layered transmission of the bit streams in different bands has also been investigated. Simulations with real video sequences have shown that the proposed coding and reconstruction algorithms, when combined with proper prioritization in transmission, can handle very severe packet-loss rates at only slight cost of compression gain and processing delay compared to those of the original JPEG/MPEG coding algorithm.

Fig. 1. System configuration incorporating block interleaving (BI) and coefficient segmentation (CS). Other blocks: entropy coding (EC), packetization (PCKT), and network interface (NI).

The arrangement of this paper is as follows. Section II discusses the alternatives for BICS and describes a coding and packetization system incorporating the proposed BICS method. Section III presents the loss-recovery algorithm for the video case, which includes the algorithm for still images previously presented in [14] as a special case. Section IV shows simulation results of the proposed coding and reconstruction algorithms under different layered transmission schemes. Finally, Section V summarizes the main results and suggests some future work. For brevity, certain terminology in the JPEG/MPEG standards is used without giving definitions. The readers are referred to [16], [17] for descriptions of these coding algorithms.
II. CODING AND PACKETIZATION WITH BLOCK INTERLEAVING AND COEFFICIENT SEGMENTATION

A. Block Interleaving

In order for block interleaving to be effective in isolating transmission errors, it should separate the information of nearby blocks as far as possible. However, this will lead to encoding and reconstruction delays. It may also reduce the prediction gain in coding the dc coefficients of intracoded blocks and the motion vectors in intercoded blocks. Although the random block-interleaving method suggested in [18] can effectively conceal the effect of packet loss, it introduces excessively long delay and also defies the use of prediction for dc coefficients and motion vectors. To minimize processing delay and compression-gain reduction, a simple even/odd block-interleaving scheme has been adopted. In this scheme, contiguous 8 × 8 blocks in an image (original or after motion compensation) are coded and packetized in the order shown in Fig. 2. Successive packets are first filled with even-indexed (denoted by a) blocks in one slice of macroblocks and then followed by the odd-indexed (denoted by b) blocks in the same slice. This way, when a packet containing the odd-indexed blocks is damaged, their adjacent even-indexed blocks are usually still available. Using the proposed reconstruction scheme, the damaged odd-indexed blocks can be effectively recovered.

In addition to spatial block interleaving, temporal interleaving can also be used to prevent error propagation across several contiguous frames. For example, using even/odd frame interleaving, the loss can in most cases be limited to nonadjacent even or odd frames, but not both.


Fig. 2. Even-odd block interleaving.

Visually, the distortion may be less objectionable if there is a good frame between every two damaged or poorly reconstructed frames. However, such interleaving may incur excessive delays that are not acceptable for real-time applications. It may also reduce the motion-compensated prediction gain substantially. For these reasons, temporal interleaving is not incorporated in the proposed system.
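To make the even/odd spatial ordering concrete, the following sketch (ours, not part of the standard modifications; the block count is illustrative) lists the order in which the 8 × 8 blocks of one slice would be packetized, treating the slice as a one-dimensional run of blocks: all even-indexed blocks first, then all odd-indexed ones, so that a lost packet leaves each damaged block with intact horizontal neighbors.

```python
def interleave_slice(num_blocks_in_slice):
    """Return the emission order for one slice of 8x8 blocks:
    even-indexed blocks first, then odd-indexed blocks (cf. Fig. 2)."""
    even = [i for i in range(num_blocks_in_slice) if i % 2 == 0]
    odd = [i for i in range(num_blocks_in_slice) if i % 2 == 1]
    return even + odd

# For a SIF frame (352 pixels wide -> 44 blocks of 8x8 samples per block row):
order = interleave_slice(44)
print(order[:6], "...", order[-6:])
# [0, 2, 4, 6, 8, 10] ... [33, 35, 37, 39, 41, 43]
```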

B. Coefficient Segmentation

Coefficient segmentation, or more precisely, information segmentation, splits the information of a block (including side information and motion vector, as well as the DCT coefficients) into a few groups or bands. In order to preserve the compression gains of the JPEG and MPEG algorithms, sequential segmentation is preferred over random interleaving, because the latter will break the run lengths of zeros and render run-length coding inefficient. Furthermore, sequential segmentation also supports both layered and progressive transmission [19], [20].

Two approaches have been considered for sequential segmentation. One is fixed-rate segmentation, by which the ratios of bit rates among different bands are kept at certain prescribed constants. This method performs segmentation over the coded bit stream, and the original JPEG/MPEG algorithm can be applied for coding without modification. The bit rates of different bands are user-controllable, which is desirable in applications requiring constant bit-rate transmission. The drawback of this method is that the content of the first codeword in each band, except in the first band, needs to be provided as side information to guarantee that each received packet be decodable on its own. Fixed-rate segmentation has been used in [7], [21]. In their systems, the coded information is divided into two bands. The information content of the second band is not provided, and the second band is simply discarded when the first band is damaged in the corresponding blocks.

The second approach is fixed-point segmentation, which splits the original information before coding. The advantage of this method is that no side information need be transmitted about the code-word content of each band, and it therefore introduces no overhead due to coefficient segmentation. The problem with this approach is that the JPEG/MPEG algorithms have to be modified such that run-length coding is only exercised within each band. This will break the zero run lengths between nonzero coefficients and reduce the run-length-coding gain. Such loss, however, can be compensated for by using a different codebook for each band, optimized based on its own statistics, as proposed in [20]. Another advantage of fixed-point segmentation over fixed-rate segmentation lies in the ease it provides for cell-loss recovery, as will be shown in Section III. Since our system is designed for ATM networks, data rates from different bands do not have to be fixed. Based on the above considerations, fixed-point segmentation has been adopted.

In the proposed system, the information for each block is split into four bands. The break points between successive bands are chosen based on two criteria. The first is that different bands should form a natural hierarchy in terms of visual importance to facilitate the layered transmission supported by the ATM protocol. The second requirement is that the average bit rates of different bands should be similar, to reduce the processing delay. The first criterion is satisfied by segmenting the information of a block sequentially in the order of side information, motion vectors, and then the DCT coefficients in the usual zigzag order. More specifically, for blocks in still images or intracoded frames, the side information, the dc, and the first few low-frequency coefficients are put into the first band, and the rest of the coefficients are segmented into three bands. For intercoded blocks (in the MC or No-MC mode), the coding mode and other side information are put into the first band and the motion vector (in the MC mode) into the second. The DCT coefficients (in the MC- or No-MC-Coded mode) of the error image are assembled into another two bands. The second requirement is fulfilled by selecting the break points as those that equally divide the cumulative distribution of the bits required by the DCT coefficients. Fig. 3 illustrates the selection of break points for intracoded blocks. In this figure, each dashed line is the cumulative bit distribution of a chosen training image, and the solid line is the average distribution. The break points are those that split the solid curve into four equal segments. The resulting bands consist of DCT coefficients with indices 0 to 2, 3 to 7, 8 to 14, and 15 to 63, respectively, as shown in Fig. 4(a). For intercoded blocks (unidirectional or bidirectional), a similar approach is used to arrive at the segmentation shown in Fig. 4(b), in which the DCT coefficients are split into two bands containing the coefficients with indices 0 to 9 and 10 to 63, respectively.
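A minimal sketch of the fixed-break-point segmentation just described, assuming the coefficients are already quantized and in zigzag order; the band boundaries are those quoted in the text (0-2, 3-7, 8-14, 15-63 for intracoded blocks and 0-9, 10-63 for intercoded blocks), and the helper name is ours.

```python
INTRA_BREAKS = [(0, 2), (3, 7), (8, 14), (15, 63)]   # four coefficient bands (intra)
INTER_BREAKS = [(0, 9), (10, 63)]                     # two coefficient bands (inter)

def segment_coefficients(zigzag_coeffs, breaks):
    """Split a 64-element zigzag-ordered coefficient list at fixed break points."""
    return [zigzag_coeffs[lo:hi + 1] for lo, hi in breaks]

# Example: an intracoded block's 64 quantized coefficients in zigzag order.
coeffs = list(range(64))          # placeholder values
bands = segment_coefficients(coeffs, INTRA_BREAKS)
print([len(b) for b in bands])    # [3, 5, 7, 49]
```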
C. Coding

To incorporate the even-odd block interleaving and fixed-point coefficient segmentation, the JPEG and MPEG standards for image and video compression have been modified. First, the prediction of the dc coefficients for still images or intracoded blocks is performed in the interleaved order shown in Fig. 2. That is, an even-indexed block is predicted from the previous even-indexed block and vice versa. To preserve the gain of motion estimation and motion-vector prediction, predictions are still performed over macroblocks formed according to the original ordering of the blocks. Secondly, the run-length coding of the quantized coefficients is performed within individual bands. More specifically, the coefficients in each band are first converted into symbols, each representing a zero run length and the following nonzero value. The last run length of zeros in each band is replaced by an end-of-band (EOB) symbol (necessary only if the last coefficient in the band is zero). To encode the symbols in different bands, either a single codebook or multiple codebooks can be used, depending on the desired trade-offs between compression gain and complexity. Since each band has different characteristics and statistics, additional compression gains can be achieved by using separate codebooks for individual bands [20]. Several representative video sequences coded at different quantization step sizes have been used for codebook design.

The interleaved prediction of the dc coefficients is slightly less efficient than the sequential prediction used in the original JPEG/MPEG standards. But it can significantly enhance the error resilience of the system. We have compared the total number of bits used for dc coefficients by the sequential and interleaved prediction over three intracoded test images. The average increase in the bits for the dc coefficient by the interleaved prediction is 6.1%. When the comparison is made in terms of the total number of bits, including the ac coefficients and other side information, the increase is negligible.
D. Packetization
In both the image and video cases, the proposed coders generate four bands of bit streams. For all four bands in the still images/intracoded frames and the last two bands in the intercoded frames, the bits over sequential blocks are assembled into ATM cells in the interleaved order shown in Fig. 2. The bits in the first two bands of intercoded frames from consecutive macroblocks are assembled sequentially. The band index and the address of the first block or macroblock are specified in the beginning of each cell. Further, the packetization is performed on the code-word basis, such that a code word that cannot fit into the remaining space of a cell is put into the next cell, and the current cell is filled with null bits.

The specific packetization format is shown in Fig. 5. In the 48-byte payload field of each cell, the first three bits specify the band (2 bits) and odd/even index (1 bit) information. The following few bits describe the spatial location of the first block included in the packet. For images in the SIF format with 352 × 240 active pixels, or 44 × 30 blocks of 8 × 8 samples, 10 bits are sufficient to specify the block address, together with the preceding odd/even bit. The rest of the packet is then filled with the code words left from the last block in the previous packet, followed by the code words of the remaining blocks in this band, with possible insertion of null bits at the end. With the block-address information, the code words in each received packet are directly decodable, except for those in the beginning of the packet left from a block contained in a lost packet, unless backward decoding is performed. The redundancy introduced by adding the band and block-address information is only 3.4%.
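A rough sketch of the cell-header layout described above (2-bit band index, 1-bit even/odd flag, 10-bit block address at the start of the 48-byte payload). The exact bit ordering within the payload is our assumption; the paper specifies only the field widths.

```python
def pack_cell_header(band, odd_flag, block_address):
    """Pack the 13 header bits (band: 2, odd/even: 1, address: 10) into the
    first bytes of a 48-byte ATM cell payload. The remaining bits would carry
    code words; here they are left as zeros."""
    assert 0 <= band < 4 and odd_flag in (0, 1) and 0 <= block_address < 1024
    header = (band << 11) | (odd_flag << 10) | block_address   # 13 bits total
    payload = bytearray(48)
    payload[0] = header >> 5            # top 8 of the 13 header bits
    payload[1] = (header & 0x1F) << 3   # remaining 5 bits, left-aligned
    return bytes(payload)

cell = pack_cell_header(band=2, odd_flag=1, block_address=517)
print(cell[:2].hex())   # header bits of an example cell
```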

Fig. 3. Selection of the break points according to the cumulative bit distribution.

Fig. 4. Coefficient segmentation for (a) intracoded and (b) intercoded blocks.

III. RECOVERY OF PACKET LOSS

With the proposed coding and packetization method incorporating BICS, the loss of a packet will usually only destroy a partial set of coded information in the damaged blocks. The reconstruction of a damaged block from the received partial information is an ill-posed problem in that many solutions exist. Additional a priori information about image signals is required to arrive at a desirable solution. It is well-known that the spectra of common images have low-pass characteristics.


Fig. 5. Contents of the payload field of a cell carrying the coded bit stream: band index (2 bits), odd/even flag (1 bit), block address (10 bits), code words left from the last block in the previous packet, code words of the first through last blocks in the band, and null bits.

In the spatial domain, this is reflected by the abundance of flat areas. Even in edge areas, the transition is often slow. In the temporal domain, this corresponds to the fact that the difference (after motion compensation) between adjacent frames is usually small. Such spatial and temporal smoothness is utilized in the proposed reconstruction algorithm.

Depending on which part of an image is damaged, different operations are required. For still images or intracoded frames, only the missing DCT coefficients need to be recovered. On the other hand, for intercoded frames, the motion vectors as well as the DCT coefficients may need to be estimated. The reconstruction of motion vectors is accomplished by linear interpolation. Since motion vectors are predictively coded and sequentially packetized, the loss of a packet often damages the motion vectors in many adjacent blocks in the same row. Therefore, a missing motion vector is interpolated only from the motion vectors of the upper or lower neighboring blocks, rather than those in the horizontally neighboring blocks. Although the median filter suggested in [12] may be more effective, a simple mean filter is used for simplicity. We have found that it works reasonably well. As will be described later, the inaccuracy of this interpolation is compensated for in the reconstruction process by relaxing the temporal smoothing constraints.

To recover the missing DCT coefficients of an intracoded block or the prediction error of an intercoded block, the proposed method imposes spatial and temporal smoothing constraints on the reconstructed block and finds a block that is maximally smooth among all the blocks with the same received coefficients and boundary condition. To reduce computation complexity, this method is only applied to the lost dc and low-frequency coefficients in the first three bands for the still-image case, and the third band for the video case. Any lost coefficients in the last band are simply replaced by zeros. This is justified because the recovered high-frequency coefficients using the proposed method would be zeros in most cases to satisfy the smoothing constraint. If the coding mode of a block is lost, then its motion vector and DCT coefficients are not decodable and also need to be estimated. In our reconstruction scheme, we simply treat such a block as intracoded and perform reconstruction by imposing the spatial smoothing constraint alone. In the following, we first present the reconstruction algorithm for blocks coded in the intermode and then derive as a special case the solution for the intramode. We assume that the motion vector in a damaged block is either received or previously estimated.

Let $f$ be the vector consisting of the samples in an image block in the current frame to be coded and $f_p$ the (unidirectionally or bidirectionally) predicted block. Let $e = f - f_p$ be the vector corresponding to the prediction-error block. The transform of the error vector can be represented by

$$ a = T^T e $$

where $a$ is the vector consisting of the transform coefficients and $T$ the matrix consisting of the basis vectors of the DCT. Let $\hat{a}$ be the coefficient vector after quantization. Then, in the absence of transmission errors, the prediction error and the original image block can be reconstructed by $\hat{e} = T\hat{a}$ and $\hat{f} = \hat{e} + f_p$.

Suppose that some of the DCT coefficients are lost during transmission. Let $\hat{a}_r$ represent the subvector containing the correctly received coefficients and $\hat{a}_l$ the estimates of the lost coefficients. Further, let $T_r$ and $T_l$ be the submatrices composed of the basis vectors in $T$ corresponding to the entries of $\hat{a}_r$ and $\hat{a}_l$, respectively. Then the reconstructed error and the original image block can be described by

$$ \hat{e} = T_r \hat{a}_r + T_l \hat{a}_l \quad \text{and} \quad \hat{f} = \hat{e} + f_p. \tag{1} $$

To determine $\hat{a}_l$, we require that the resulting image block $\hat{f}$ described in (1) be as smooth as possible. The following smoothness measure is used in the proposed algorithm:

$$ \Phi(\hat{a}_l) = \frac{1}{2}\left[\, w\left( \hat{f}^T S \hat{f} - 2 b^T \hat{f} + c \right) + (1 - w)\, \hat{e}^T \hat{e} \,\right]. \tag{2} $$

The first term in (2) measures spatial smoothness and the second term temporal smoothness. The spatial term is essentially a weighted sum of squared pixel-wise differences, minimization of which forces the pixels to be connected smoothly with each other within the block and with the boundary pixels outside the block. The temporal term is the energy of the error vector, minimization of which enforces a smooth transition between corresponding regions in adjacent frames. The matrices $S_w$, $S_e$, $S_n$, and $S_s$ depend on the amount of smoothing to be imposed between every two samples in the directions towards west, east, north, and south, respectively. The vectors $b_w$, $b_e$, $b_n$, and $b_s$ are composed of samples in the one-pixel-wide boundaries outside the block in the above four directions. More detailed definitions can be found in [13], [14]. The weighting factor $w$ controls the relative contribution of the spatial and temporal smoothing constraints and should be chosen according to the possible degrees of spatial and temporal smoothness in the damaged block.


For blocks coded in the Intra or No-MC-Coded mode, or for regions often uncovered by motion, a larger $w$ should be used to emphasize the spatial smoothness constraints. On the other hand, for blocks coded in the MC mode, a smaller $w$ should be used to emphasize the temporal smoothing. Further, more spatial smoothing should be imposed when the motion vector is lost, to compensate for the error in the estimated motion vector. This should be done even for the blocks in the MC-Not-Coded mode, which would have been simply replaced with the prediction block if the motion vector were received. To reduce the system complexity, all possible combinations of the coding modes and band-loss patterns are clustered into four groups, and each group uses a fixed weighting factor. Table I lists the weighting factors used in our simulations for different coding modes and loss patterns.

Equation (2) is a quadratic function of $\hat{a}_l$, since $\hat{f}$ and $\hat{e}$ are linearly related to $\hat{a}_l$ according to (1). Hence, $\Phi(\hat{a}_l)$ has only one minimum and reaches its minimal point $\hat{a}_{l,\mathrm{opt}}$ when the gradient vanishes, i.e.,

$$ \frac{\partial \Phi}{\partial \hat{a}_l}\bigg|_{\hat{a}_{l,\mathrm{opt}}} = T_l^T \Big( w\big[ S(f_p + T_r \hat{a}_r + T_l \hat{a}_{l,\mathrm{opt}}) - b \big] + (1 - w)\big( T_r \hat{a}_r + T_l \hat{a}_{l,\mathrm{opt}} \big) \Big) = 0. $$

When $w \neq 0$, the optimal solution $\hat{a}_{l,\mathrm{opt}}$ is given by

$$ \hat{a}_{l,\mathrm{opt}} = \left[ T_l^T \left( S + \frac{1-w}{w} I \right) T_l \right]^{-1} T_l^T \left[ b - S f_p - \left( S + \frac{1-w}{w} I \right) T_r \hat{a}_r \right] = A b + B f_p + C \hat{a}_r \tag{3} $$

where

$$ A = \left[ T_l^T \left( S + \frac{1-w}{w} I \right) T_l \right]^{-1} T_l^T, \qquad B = -AS, \qquad C = -A\left( S + \frac{1-w}{w} I \right) T_r. \tag{4} $$

The solution in (3) essentially consists of three linear interpolations, in the spatial, temporal, and frequency domains, from the boundary vector $b$, the prediction block $f_p$, and the received coefficients $\hat{a}_r$, respectively. It can maintain the information from the received coefficients, while enforcing the reconstructed image to be as smooth as possible. Compared to the conventional methods using only spatial and/or temporal interpolation, the reconstructed image is less blurred when some high-frequency coefficients are present. When all the coefficients are lost in a damaged block, the proposed method reduces to spatial and temporal interpolation only. In this case, $\hat{a}_r = 0$ and $T_l = T$, and the solution in (3) becomes

$$ \hat{a}_{\text{no-coef}} = \left[ T^T \left( S + \frac{1-w}{w} I \right) T \right]^{-1} T^T \left( b - S f_p \right). $$

When $w = 0$, which makes use of the temporal correlation alone, the objective function in (2) consists of the second term only, and the optimal solution is $\hat{e} = 0$ and $\hat{f} = f_p$. In this case, the damaged block is simply replaced by the corresponding block in the previous frame, without making use of the received coefficients.

The above discussion is for intercoded blocks. The solution for intracoded blocks can be obtained by letting $w = 1$ and $f_p = 0$. The optimal solution in (3) then becomes

$$ \hat{a}_{\text{intra}} = \left( T_l^T S T_l \right)^{-1} T_l^T \left( b - S T_r \hat{a}_r \right), $$

which includes only spatial- and frequency-domain interpolation. When no coefficients are available, or when the coding mode is lost and the damaged block is treated as an intracoded block, the above solution is further simplified to

$$ \hat{a} = \left( T^T S T \right)^{-1} T^T b, $$

which is a simple spatial interpolation from the boundary values.

The solution defined in (3) can be implemented in two ways. The direct approach is to compute the matrices $A$, $B$, and $C$ defined in (4) on-line according to the coefficient loss pattern (different $T_r$ and $T_l$) and the weighting factor $w$. This will be very time-consuming. An alternative is to precalculate the matrices for all possible coefficient loss patterns and weighting factors and only perform the necessary transformations during reconstruction. This is feasible in the proposed coding/packetization system since there are only a few possible loss patterns and weighting factors, as listed in Table I.

Up until now, we have assumed that each damaged block is surrounded by undamaged blocks, and the values in the one-pixel-wide boundary outside a damaged block (contained in the vector $b$) have been used for its reconstruction. This is true in most cases with the proposed coding/packetization method incorporating even-odd block interleaving, since the loss of a packet will only affect the even- or odd-indexed blocks but not both. Nonetheless, cases where contiguous blocks are damaged may still occur when multiple packets containing adjacent blocks happen to be lost. To handle such situations, each block in an image is first reconstructed using the inverse DCT with the missing coefficients set equal to zero. The optimal solution defined in (3) is then repeatedly applied to the damaged blocks, using the values reconstructed from a previous iteration as the boundary information for the current iteration. It has been found from our simulations that an average of 20 iterations can yield satisfactory results. When a block is surrounded by undamaged blocks, one iteration will yield the optimal solution.
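To make the closed-form recovery in (3) concrete, the following one-dimensional sketch (our own simplification, not the implementation used in the paper) recovers lost DCT coefficients of an 8-sample block by combining a first-difference spatial smoothness term, tied to two known boundary samples, with the temporal term $(1-w)\hat{e}^T\hat{e}$. The 2-D case uses separate west/east/north/south smoothing operators but follows the same algebra.

```python
import numpy as np

np.random.seed(0)
N = 8

# Orthonormal DCT-II basis: columns of T are the basis vectors, so a = T.T @ e.
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
T = (np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))).T
T[:, 0] *= np.sqrt(0.5)

# Spatial smoothness: squared differences between neighbouring samples inside
# the block plus ties to the two known boundary samples (rows of D, vector d).
left_bound, right_bound = 100.0, 120.0
D = np.zeros((N + 1, N))
d = np.zeros(N + 1)
D[0, 0] = 1.0
d[0] = left_bound
for i in range(1, N):
    D[i, i - 1], D[i, i] = -1.0, 1.0
D[N, N - 1] = 1.0
d[N] = right_bound
S, b = D.T @ D, D.T @ d          # ||D f - d||^2 = f^T S f - 2 b^T f + const

f_true = np.linspace(101.0, 119.0, N)        # original samples of the block
f_p = f_true + np.random.randn(N)            # motion-compensated prediction
a = T.T @ (f_true - f_p)                     # prediction-error DCT coefficients

lost = [0, 1, 2]                             # low-frequency coefficients lost
recv = [i for i in range(N) if i not in lost]
T_l, T_r, a_r = T[:, lost], T[:, recv], a[recv]

w = 0.5
M = S + ((1.0 - w) / w) * np.eye(N)
a_l = np.linalg.solve(T_l.T @ M @ T_l,
                      T_l.T @ (b - S @ f_p - M @ (T_r @ a_r)))   # eq. (3)
f_rec = f_p + T_r @ a_r + T_l @ a_l          # reconstructed block
print(np.round(np.abs(f_rec - f_true), 2))   # residual relative to the original
```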


TABLE I
WEIGHTING FACTORS FOR DIFFERENT CODING MODES AND LOSS PATTERNS

Coding mode                     Bands 2 and 3 lost   Band 2 lost   Band 3 lost
MC, Coded                       0.8                  0.5           0.2
MC, Not Coded                   0.5                  0.5           N/A
No MC, Coded                    N/A                  N/A           0.8
No MC, Not Coded                N/A                  N/A           N/A
Intra, or Mode (Band 1) lost    1.0                  1.0           1.0

IV. SIMULATION RESULTS


The proposed coding and reconstruction methods have been implemented in software. To simulate packet loss in transmission, each band of coded information was given a packet-loss probability. The positions of lost packets in each band were generated by a uniform-random-number generator. The performances of the proposed system under different layered transmission schemes have been evaluated by using several combinations of the loss probabilities. For comparison, a single-band coding and transmission system and a copying reconstruction algorithm have also been simulated. The single-band system uses the original MPEG coding algorithm and performs sequential packetization and single-layer transmission. The copying algorithm replaces the damaged blocks by the prediction block or the block with the same spatial position in the previous frame, depending on the availability of the motion vector. When only the last band is damaged, the missing coefficients are replaced with zeros, as is done in the reconstruction algorithm proposed here.

In the following, we show the simulation results of different coding, transmission, and reconstruction methods. Only the results for the video case are shown, and when appropriate, the results for the intracoded and intercoded frames are separately presented. The results for the intraframe are also meant to indicate the performance of the proposed system for still images. The test sequence has 30 frames and is formed by concatenating two video sequences, Football and Table Tennis. The former exemplifies fast-motion sequences, while the latter moderate. The transition at frame 15 emulates a scene cut. Only the luminance components are processed. Frames 1, 15, and 29 are intracoded, while the rest of the odd-indexed frames between 1 and 30 are coded in the predictive mode. The even-indexed frames, to be coded in the interpolative mode, are skipped. The full-search block-matching algorithm is used to find the motion vectors in the predictive frames. The quantization tables for the DCT coefficients in the intramode and intermode are those suggested in [2], with QP = 15. In addition to visual evaluation, the peak signal-to-noise ratio (PSNR), defined by $\text{PSNR} = 10 \log_{10}(255^2/\sigma_e^2)$, is also used as an objective image-quality measure, where $\sigma_e^2$ is the mean square error between the processed and original images.
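For reference, the PSNR measure just defined can be computed with a small helper (assuming 8-bit images stored as NumPy arrays):

```python
import numpy as np

def psnr(original, processed):
    """Peak signal-to-noise ratio, PSNR = 10 log10(255^2 / MSE), in dB."""
    mse = np.mean((original.astype(float) - processed.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```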

The sequence is in SIF format, with 352 active pixels/line, 240 lines/frame, and 30 frames/s. Three combinations of loss probabilities with similar overall loss rates have been used to simulate different layered transmission schemes. Letting $p_i$ be the packet-loss probability in the $i$th band, the loss probabilities in the three transmission systems are: a) for one-layer, $p_1 = p_2 = p_3 = p_4 = 0.1$; b) for two-layer, $p_1 = p_2 = 0.01$, $p_3 = p_4 = 0.15$; and c) for four-layer, $p_1 = 0.002$, $p_2 = 0.01$, $p_3 = 0.05$, $p_4 = 0.2$. For the test sequence, the numbers of packets produced by the coder in the first three bands are similar, while that in the last band is roughly twice as many. The overall loss rates in all three cases are therefore approximately equal to 0.1. Such a high loss rate is meant to simulate a worst-case scenario. The loss rate for the single-band system is chosen to be the same as the overall loss rate in the previous cases.

We first evaluate the compression gains of the proposed coding/packetization method incorporating BICS. Fig. 6 compares the numbers of packets generated in sequential frames of the test sequence by the single-band and BICS systems. For the latter, two cases are presented: one using a single Huffman codebook for the ac symbols in different bands and the other using four and two codebooks for the intrablocks and interblocks, respectively. We can see that the BICS systems require only slightly more bits than the single-band system. The total increase over all the coded frames is 15% when using a single codebook, and that is reduced to 8% when using multiple codebooks. Such a small loss in compression gain by the use of BICS is well justified by its significant enhancement of the system's error-resilience capability. The choice between single and multiple codebooks depends on the affordable memory space in the underlying applications.

Fig. 7 compares the PSNR values of the reconstructed sequences obtained by the copying and the proposed reconstruction algorithms when different coding and transmission systems are used. To smooth out the effect caused by a particularly bad selection of the packet-loss positions, five passes are run for each transmission scheme using different seeds for the random-packet-loss generator, and the result shown here for each case is the average of five runs. For a fair comparison, the packet-loss positions are made the same when comparing different reconstruction algorithms by using the same seed for the random-loss generator in each run.

We first compare different reconstruction algorithms under the same coding-and-transmission system. It can be seen that the proposed reconstruction algorithm significantly outperforms the copying method in all four coding-and-transmission systems. The more prominent difference in the curves at frame 15 is due to the scene change from Football to Table Tennis. Since this frame is intracoded, the proposed method using only spatial interpolation produces much better results than the copying algorithm. The average gain of the proposed reconstruction algorithm over the copying algorithm is slightly larger in the first half of the sequence, because the first part has larger motion and the copying algorithm suffers more from error propagation.
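The per-band loss mechanism described at the beginning of this section can be mimicked in a few lines; the loss probabilities are the three schemes quoted above, while the packet counts per band are illustrative placeholders reflecting the statement that the first three bands carry similar numbers of packets and the last band about twice as many.

```python
import random

LOSS_SCHEMES = {
    "one-layer":  [0.10, 0.10, 0.10, 0.10],
    "two-layer":  [0.01, 0.01, 0.15, 0.15],
    "four-layer": [0.002, 0.01, 0.05, 0.20],
}

def simulate_losses(packets_per_band, scheme, seed=0):
    """Return, for each band, the indices of lost packets under the given scheme."""
    rng = random.Random(seed)
    probs = LOSS_SCHEMES[scheme]
    return [[i for i in range(n) if rng.random() < p]
            for n, p in zip(packets_per_band, probs)]

lost = simulate_losses([100, 100, 100, 200], "two-layer")
print([len(l) for l in lost])   # roughly [1, 1, 15, 30] on average
```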


Fig. 6. Bit rates of different coding methods for sequential frames of the test sequence.

Fig. 7. PSNR comparison of different coding and transmission methods when the copying and the proposed reconstruction algorithms are used to recover the lost packets.

The gains of the proposed reconstruction algorithm are less substantial in the two- and four-layer systems. This is because the use of BICS and multilayer transmission has already partially concealed the error by isolating damaged regions and protecting more important information.

We next compare different coding-and-transmission schemes when the same reconstruction method is used. It can be observed from Fig. 7 that the two- and four-layer BICS systems outperform the single-band and one-layer systems by a large margin. When the copying algorithm is used, the one-layer BICS system is better on the average than the single-band system. This suggests that BICS can effectively conceal the packet loss. On the other hand, when the proposed reconstruction algorithm is applied, the one-layer BICS system is worse on the average than the single-band system. This is because the loss of a packet containing bits from the first band in the BICS system leaves more image blocks completely damaged (no side information available and all other bits not decodable) than the loss of one containing sequential bits in the single-band system. The one-layer BICS system is, however, better than the single-band system for the intracoded frames (frames 1, 15, and 29), because spatial smoothing is most appropriate for such a case. The gain of four-layer transmission over two-layer is about 1.5 dB when the copying algorithm is used for reconstruction and is diminished when the proposed reconstruction method is used. This is because the proposed reconstruction algorithm can effectively recover the lost low-frequency components contained in the third band. Since the current ATM standard supports two-priority transmission, the two-layer BICS system may be preferred in practice.

For subjective evaluation, Figs. 8-11 show the images of intercoded frames obtained by different reconstruction algorithms in four separate coding/transmission systems. In each figure, the image in (a) is obtained by replacing the damaged blocks with black blocks, which shows the positions of the damaged blocks. The images in (b) and (c) are obtained by the copying and maximally smooth reconstruction methods, respectively. The image in (d) is the decoded image without loss. For a fair comparison of different coding and transmission systems, we have selected those frames in which the total numbers of damaged blocks are similar. It can be seen that, with the same coding and prioritization method, the enhancement in image quality by the proposed reconstruction method is significant. With the same loss-recovery algorithm, the improvement by using BICS and multilayer transmission is also evident.

Note that the differences in the images in Figs. 8-11 are not only due to the loss in the presented frames, but also to losses in frames that precede them. In order to evaluate the effect of error propagation, Fig. 12 shows the PSNR values of the first halves of the reconstructed sequences when only the third frame was subject to cell loss. The damage in the third frame has affected all the subsequent frames until the next intracoded frame (frame 15). But, as is clearly demonstrated in Fig. 12, the proposed reconstruction algorithm can satisfactorily suppress the error-propagation effect by correcting the errors immediately after their occurrence. Fig. 13 includes the reconstructed images of frames 3 and 7 in the two-layer BICS system by the copying and the proposed reconstruction algorithms. The decoded images of these two frames in the absence of loss are given previously in Figs. 9(d) and 10(d). Notice that the affected regions in frame 3 that cannot be corrected by the copying algorithm have been propagated to frame 7. On the other hand, frames 3 and 7 obtained by the proposed reconstruction algorithm are almost free of artifacts.

Fig. 8. Reconstructed images of an intercoded frame using different algorithms in the single-band coding-and-transmission system. From left to right, top to bottom: (a) with the damaged blocks replaced by black blocks, (b) obtained by the copying algorithm, (c) obtained by the proposed maximally smooth recovery algorithm, and (d) without packet loss. The PSNRs are 19.24, 23.95, and 28.56 dB for (b), (c), and (d), respectively.

Fig. 11. Reconstructed images of an intercoded frame using different algorithms in the four-layer BICS system. The order of the images is the same as in Fig. 8. The PSNRs are 24.02, 27.63, and 28.56 dB.


Fig. 9. Reconstructed images of an intercoded frame using different algorithms in the one-layer BICS system. The order of the images is the same as in Fig. 8. The PSNRs are 20.34, 23.8, and 29.45 dB.

Fig. 12. Effect of error propagation from a damaged frame to its succeeding frames.

Fig. 10. Reconstructed images of an intercoded frame using different algorithms in the two-layer BICS system. The order of the images is the same as in Fig. 8. The PSNRs are 21.42, 24.66, and 28.42 dB.

Fig. 13. Reconstructed images after packet loss happens in frame 3 in the two-layer BICS system: (a) and (c) frames 3 and 7 obtained by the copying algorithm and (b) and (d) obtained by the proposed reconstruction algorithm. The PSNRs are 23.06, 27.66, 25.18, and 28.08 dB.


Fig. 14. Reconstructed images of a frame after a scene change using different algorithms in the two-layer BICS system. The order of the images is the same as in Fig. 8. The PSNRs are 19.95, 25.78, and 26.64 dB.

Finally, we demonstrate the superiority of the proposed reconstruction algorithm during a scene cut (the Table Tennis frame preceded by the Football sequence). Fig. 14 includes the images reconstructed by different methods when the two-layer BICS system is used. As expected, the copying method gives unacceptable results, while our algorithm produces a quite satisfactory image even in such a difficult case.

The above results are presented for a sequence with moderate to large motion. We have also run the same experiments for Susie, a head-and-shoulder sequence with much less motion. The overall trend is still the same: the two- and four-layer systems outperform the single-band and one-layer systems, and the proposed loss-recovery method improves upon the copying algorithm. The gain in each case is smaller, though, because of the less significant changes from frame to frame. The difference between two- and four-layer transmission is also less significant.

V. CONCLUSION

This paper considers the application of DCT coding in image and video transmission over ATM networks. Joint optimization of image coding, transmission, and reconstruction has been attempted to achieve a good compromise among compression gain, system complexity, processing delay, error-concealment capability, and reconstruction quality. To combat potential packet loss, the JPEG and MPEG standards for image and video compression have been modified to incorporate even-odd block interleaving and fixed-point coefficient segmentation. Compared to a system using the original JPEG or MPEG coding algorithm and performing sequential packetization, the proposed coding and packetization scheme can effectively suppress the effect of packet loss at only slight cost of compression gain and processing delay. It also facilitates both progressive and layered transmission. The proposed reconstruction algorithm can effectively recover the lost low-frequency components, to which the human visual system is most sensitive. It can exploit more thoroughly the available information in a damaged block, including the temporally and spatially neighboring samples, as well as the received DCT coefficients.

When the high-frequency DCT coefficients are present, the reconstructed image blocks suffer from less blurring than the images obtained by the conventional spatial and/or temporal interpolation techniques. By adapting the weights of the spatial and temporal smoothing according to coding modes and loss patterns, the proposed algorithm can work well in various types of regions, including those with slow or fast motion, as well as scene changes.

The performance of the proposed system under different layered transmission schemes has been evaluated. Simulations with real video sequences have shown that the proposed coding and reconstruction methods, when combined with proper layered transmission, can produce satisfactory images for packet-loss rates as high as 0.1. Compared to the simpler copying algorithm, the proposed reconstruction algorithm can render higher quality service under the same transmission environment or reduce the necessary transmission cost to achieve the same degree of service quality. In real applications, the choice between the two- and four-layer transmission and between the copying and the proposed reconstruction methods depends on the character of the video sources, the required service quality, and the affordable transmission and reconstruction complexity. For sources with small motions, such as in videophone and teleconferencing applications, two-layer transmission and the copying algorithm may be sufficient. For sources with moderate motion, either four-layer transmission and the copying algorithm, or two-layer transmission with the proposed reconstruction method, may be appropriate. For fast-moving sequences encountered in entertainment services, four-layer transmission with the proposed reconstruction algorithm may be necessary. These expectations need to be validated by more extensive simulations.

Although the proposed reconstruction method is developed for packet-loss recovery in ATM networks, with minor modifications it can be applied to combat other types of transmission errors in any DCT-based image- and video-coding system. In this paper, we have only considered the coding and reconstruction of the luminance component of color images. While the results can be directly extended to the chrominance components, the correlations between color components can be exploited to further improve the reconstruction quality. As pointed out in [22], there exist significant correlations in the edge content of different color components. Investigations are currently under way on how to exploit this redundancy to recover the lost high-frequency content in one color component from those in the others.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their constructive suggestions and Dr. S. Singhal of Bellcore, Morristown, NJ, for providing the video sequences used in the simulations.


REFERENCES
[1] Joint Photographic Experts Group, ISO/IEC JTC1/SC2/WG8, "JPEG technical specification," Revision 8, Aug. 1990.
[2] Motion Picture Experts Group, ISO/IEC JTC1/SC2/WG11, "MPEG video simulation model three (SM3)," July 1990.
[3] "Draft revision of recommendation H.261," Doc. 572, CCITT SG XV, Working Party XV/1, Special Group on Coding for Visual Telephony.
[4] W. Verbiest, L. Pinnoo, and B. Voeten, "The impact of the ATM concepts on video coding," IEEE J. Select. Areas Commun., vol. 6, pp. 1623-1632, Dec. 1988.
[5] M. Nomura, T. Fujii, and N. Ohta, "Basic characteristics of variable rate video coding in ATM environment," IEEE J. Select. Areas Commun., vol. 7, pp. 752-760, June 1989.
[6] Y. Q. Zhang et al., "Variable bit-rate video in the broadband ISDN environment," Proc. IEEE, vol. 79, pp. 214-222, Feb. 1991.
[7] M. Nomura, T. Fujii, and N. Ohta, "Layered packet-loss protection for variable rate video coding using DCT," in Proc. Int. Workshop Packet Video (Torino, Italy), Sept. 1988.
[8] M. Ghanbari, "Two-layer coding of video signals for VBR networks," IEEE J. Select. Areas Commun., vol. 7, pp. 771-781, June 1989.
[9] F. Kishino et al., "Variable bit-rate coding of video signals for ATM networks," IEEE J. Select. Areas Commun., vol. 7, pp. 801-806, June 1989.
[10] A. Puri and R. Aravind, "Interframe coding scheme for packet video," in Proc. SPIE Conf. Image Commun., Nov. 1989, pp. 1610-1619.
[11] A. R. Reibman, "DCT-based embedded coding for video," Signal Processing: Image Commun., vol. 3, pp. 333-343, Sept. 1991.
[12] P. Haskell and D. Messerschmitt, "Resynchronization of motion compensated video affected by ATM cell loss," in Proc. 1992 Int. Conf. Acoust., Speech, Signal Processing (San Francisco, CA), Mar. 1992, pp. III-545-III-548.
[13] Y. Wang and Q. F. Zhu, "Signal loss recovery in DCT-based still and video image codecs," in Proc. SPIE Int. Conf. Visual Commun. Image Processing (Boston, MA), Nov. 1991, pp. 667-678.
[14] Y. Wang, Q. F. Zhu, and L. Shaw, "Maximally smooth image recovery in transform coding," submitted to IEEE Trans. Commun.
[15] Q. F. Zhu, Y. Wang, and L. Shaw, "Image reconstruction for hybrid video coding systems," in Proc. IEEE Data Compression Conf. (Snowbird, UT), Mar. 1992, pp. 229-238.
[16] G. K. Wallace, "The JPEG still picture compression standard," Commun. Assoc. Comput. Mach., vol. 34, pp. 30-45, Apr. 1991.
[17] D. Le Gall, "MPEG: A video compression standard for multimedia applications," Commun. Assoc. Comput. Mach., vol. 34, pp. 46-58, Apr. 1991.
[18] A. S. Tom, C. L. Yeh, and F. Chu, "Packet video for cell loss protection using deinterleaving and scrambling," in Proc. Int. Conf. Acoust., Speech, Signal Processing (Toronto, Canada), May 1991, pp. 2857-2860.
[19] D. W. Petr and V. S. Frost, "Priority cell discarding for overload control in BISDN/ATM networks: An analysis framework," Int. J. Digital Analog Commun. Syst., vol. 3, pp. 219-227, 1990.
[20] E. Dubois and J. L. Moncet, "Encoding and progressive transmission of still pictures in NTSC composite format using transform domain methods," IEEE Trans. Commun., vol. 34, pp. 310-319, Mar. 1986.
[21] David Sarnoff Research Center, "ADTV system description," final certification document submitted to FCC Working Party I, Feb. 1992.
[22] J. O. Limb, C. B. Rubinstein, and J. E. Thompson, "Digital coding of color video signals - a review," IEEE Trans. Commun., vol. 25, pp. 1349-1385, Nov. 1977.


Qin-Fan Zhu (S'90) was born in Sichuan, China, on Jan. 4, 1964. He received the B.E. degree from Northwestern Polytechnic University, Xian, China, the M.E. degree from the University of Electronic Science and Technology of China, Chengdu, and the M.S. and Ph.D. degrees from Polytechnic University, Brooklyn, NY, all in electrical engineering, in 1983, 1986, 1990, and 1993, respectively. He was a teaching and research fellow from 1989 to 1993 at Polytechnic. He joined Motorola Codex Corporation in March 1993. His research interests include image and video compression, transmission, and reconstruction.


Yao Wang (S'88-M'90) was born in Dec. 1962, in Zhejiang, China. She received the B.S. and M.S. degrees in electrical engineering from Tsinghua University, Beijing, China, in 1983 and 1985, respectively, and the Ph.D. degree in electrical engineering from the University of California, Santa Barbara, in 1990. Since July 1990, she has been with Polytechnic University, Brooklyn, NY, as an Assistant Professor of Electrical Engineering. Her current research interests include image/video compression, medical image processing, image reconstruction from limited data, and pattern recognition.




Leonard Shaw (S'54-M'59-SM'80-F'84) was born on Aug. 15, 1934, in Toledo, OH. He received the B.S. degree in electrical engineering from the University of Pennsylvania, Philadelphia, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA. He is a Professor at Polytechnic University, Brooklyn, NY, where he has been the Head of Electrical Engineering and Computer Science since 1982. Since joining Polytechnic University in 1960 (then Brooklyn Poly), he has also been a Visiting Professor in The Netherlands (1970) and France (1977) and has served as an industrial consultant, especially with Sperry Systems Management Division from 1964 to 1980. His research has involved filtering and modeling of stochastic processes with applications to signal processing, control, and reliability, and he has coauthored a text on signal processing. Dr. Shaw has served on the IEEE Publications Board since 1984, including four years as the Editor-in-Chief of the IEEE Press, and has been active in publications and conferences of the Control Systems Society, of which he is now Vice President for Financial Affairs.

