
Multimed Tools Appl (2006) 28: 283–300 DOI 10.1007/s11042-006-7715-8

Gradual shot boundary detection using localized edge blocks


Hun-Woo Yoo & Han-Jin Ryoo & Dong-Sik Jang

© Springer Science + Business Media, LLC 2006

Abstract A new algorithm for gradual shot boundary detection is proposed in this paper. The algorithm is based on the observation that most gradual transitions can be characterized by the variance distribution of edge information over the frame sequence. An average edge frame sequence is obtained by performing Sobel edge detection. Features are extracted by comparing the variance of each full frame with those of local blocks in the average edge frames. These features are further processed by a morphological opening operation to obtain smoothed variance curves. The lowest variance in the local frame sequence is chosen as the gradual detection point. Experimental results show that the proposed method provides 87.0% precision and 86.3% recall for six selected videos.

Keywords Gradual shot detection · AGI (Average Gradient Image) · Variance · Parabolic curve · Local blocks · Opening

1. Introduction

Digital video is becoming an increasingly common data type in the new generation of multimedia databases. Many broadcasters are switching to digital formats for broadcasting, and some of them already have a significant amount of video material

H.-W. Yoo (*)
Center for Cognitive Science, Yonsei University, 134 Shinchon-Dong, Seodaemun-Ku, Seoul 120-749, South Korea
e-mail: paulyhw@yonsei.ac.kr

H.-J. Ryoo
Department of Electronics and Computer Engineering, Korea University, Sungbuk-gu Anam-Dong 5 Ga 1, Seoul 136-701, South Korea
e-mail: hanjin@mpeg.korea.ac.kr

D.-S. Jang
Department of Industrial Systems and Information Engineering, Korea University, Sungbuk-gu Anam-Dong 5 Ga 1, Seoul 136-701, South Korea
e-mail: jang@korea.ac.kr


Fig. 1 Abrupt shot changes (cuts)

available in digital formats for previewing. Improved compression technologies and increased Internet bandwidth have made webcasting a real possibility. The ever-growing amount of digital video poses new challenges for both storage and access, as vast repositories are being built at an increasing pace. A key step in managing a large video database is to segment the video sequences into shots. Video segmentation makes the video data more manageable by imposing a hierarchy on it. It also forms the first step toward understanding video content by dividing it into shots on which content analysis can be performed. This segmentation process is generally referred to as shot boundary detection. A shot is a sequence of frames generated during a continuous camera operation and represents a continuous action in time and space. Video editing procedures produce abrupt and gradual shot transitions. A cut is an abrupt shot change that occurs in a single frame. A gradual change occurs over multiple frames and is the product of fade-ins, fade-outs, or dissolves (where two shots are superimposed). Figures 1 and 2 show examples of abrupt and gradual changes. A large body of work on shot boundary detection has been reported in the literature over the past few years [1–16, 18–25]. Earlier works concentrated mainly on abrupt cuts, whereas more recent works are geared toward gradual shot boundary detection. The detection of gradual changes is more difficult than that of abrupt cuts, because the inter-frame difference sequence shows a temporally well-separated peak for a cut, whereas for a gradual change no single point of the sequence stands out.

Fig. 2 Gradual shot changes (a) fade-in; (b) fade-out; (c) dissolve


In this paper, a new gradual scene detection algorithm is proposed. The algorithm is based on the observation that most gradual transitions can be characterized by the variance distribution of edge information over the frame sequence, which shows a parabolic shape during a dissolve. We first obtain an average edge frame sequence by applying the Sobel operator to the original frame sequence. We then extract a feature sequence showing a distinct parabolic variance curve by comparing the full-frame variance sequence with those of nine local sub-blocks in the average edge frames. This feature sequence is further processed by the opening operation to obtain a smoothed curve. The local minimum within a sliding window of a certain size is chosen as the gradual detection point. Our contributions in this paper are threefold. First, in a theoretical gradual (dissolve) transition, the variances of consecutive frames follow a parabolic curve. In actual video, however, the parabola is not sufficiently pronounced in the variance graph because of noise and motion. Our method approximates the variance sequence to an ideal curve by taking the most distinct parabolic sequence extracted from local regions of the video frames. Second, the proposed algorithm achieves robust detection by smoothing the saw-like variance sequence through the morphological opening operation and a time-local analysis with a sliding window of a certain size. Third, tests on video data show that it is more accurate and reliable than two commonly used algorithms, DCD and twin comparison.
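The parabolic shape of the variance during a dissolve can be motivated by a standard idealized model. The following is only a sketch under assumptions that are not stated in the paper: a dissolve frame is the linear blend $S_\alpha = (1-\alpha)A + \alpha B$ of a static outgoing frame $A$ and a static incoming frame $B$, the intensities of $A$ and $B$ are uncorrelated with variances $\sigma_A^2$ and $\sigma_B^2$, and $\alpha$ runs from 0 to 1 over the transition. The symbols $A$, $B$, $\alpha$, $\sigma_A$, $\sigma_B$ are introduced here for illustration.

$$\operatorname{Var}(S_\alpha) = (1-\alpha)^2\,\sigma_A^2 + \alpha^2\,\sigma_B^2$$

$$\frac{d}{d\alpha}\operatorname{Var}(S_\alpha) = -2(1-\alpha)\,\sigma_A^2 + 2\alpha\,\sigma_B^2 = 0 \;\Longrightarrow\; \alpha^{*} = \frac{\sigma_A^2}{\sigma_A^2 + \sigma_B^2}, \qquad \operatorname{Var}(S_{\alpha^{*}}) = \frac{\sigma_A^2\,\sigma_B^2}{\sigma_A^2 + \sigma_B^2}$$

Under these assumptions the variance is a quadratic function of $\alpha$ with a single interior minimum that lies below both $\sigma_A^2$ and $\sigma_B^2$, which is the valley the detector looks for. Motion, noise, and correlated content break the assumptions and flatten or distort the curve.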

2. Related research on gradual changes

• Twin comparison method [24]. This was the first attempt to detect and classify both abrupt and gradual changes. In this approach, dual threshold values are applied to the difference of intensity histograms in order to detect gradual transitions. The method requires two thresholds: a higher one, Th, for detecting cuts and a lower one, T1, for detecting gradual transitions. First, the threshold Th is used to detect high discontinuity values corresponding to cuts, and then the threshold T1 is applied to the rest of the discontinuity values. If a discontinuity value is higher than T1, it is considered to be the start of a gradual transition. At that point, the summation of consecutive discontinuity values starts and goes on until the cumulative sum exceeds the threshold Th. The end of the gradual transition is then set at the last discontinuity value included in the sum. However, one of the major problems of this approach is that many false positives can be generated when the thresholds are not properly assigned.
• Plateau detection [22]. Yeo and Liu noted that a comparison based solely on successive frames is not adequate for detecting gradual transitions. They used the difference between the current frame and the following kth frame, first obtaining the sequence of delayed inter-frame distances $D_i^k = d(X_i, X_{i+k})$. If k is chosen greater than the length of the gradual transition, the sequence $D_i^k$ exhibits a plateau of maximal width. A significant plateau at location i is characterized by a sequence of similar values $D_j^k$, $j = i-s, \ldots, i+s$, which are consistently higher than the preceding or succeeding values. The value of s is proportional to the difference between k and the transition length. The method applies to linear and nonlinear gradual transitions; only the shape of the rises and falls at the plateau boundaries differs.

• Algorithm by Meng et al. [14]. In the compressed domain, the intensity variance of successive frames is used to detect gradual changes. This method exploits the DCT DC coefficients and motion vectors. Since most dissolves theoretically show a parabolic variance curve, the authors tried to use the depth and width of that curve. However, in actual cases, the curve is not sufficiently pronounced because of noise and motion in the video.
• Algorithm by Song et al. [18]. A chromatic video edit model for gradual transitions is built on the assumption that the discontinuity values belonging to such a transition form a pattern consisting of two piecewise linear functions of time, one decreasing and one increasing. Such linearity does not apply outside the transition area. The authors search for close-to-linear segments in the series of discontinuity values by investigating the first and second derivatives of the slope in time. A close-to-linear segment is found if the second derivative is less than a pre-specified percentage of the first derivative.
• Feature-based detection [23]. This algorithm is based on calculating the edge change fraction in the temporal domain. During a cut or a dissolve, new intensity edges appear far from the locations of old edges. Edge pixels that appear or disappear far from existing edge pixels are considered entering or exiting edge pixels. Cuts, fades, and dissolves can be detected by counting the entering and exiting edge pixels, while wipes can be detected by looking at their spatial distribution. The algorithm consists of the following steps:
1. Frames Ft and Ft+1 are aligned using a global motion compensation algorithm.
2. Edges are computed by applying the Canny algorithm to a smoothed version of the frames.
3. The binary edge maps are dilated by radius r, so that the condition on the mutual distance of edge pixels can be easily verified by set intersection.
4. The fraction of entering edge pixels ρin and the fraction of exiting edge pixels ρout are computed.
Shot changes are detected by looking at the edge change fraction ρ = max(ρin, ρout). A cut leads to a single isolated high value of ρ, while the other scene breaks lead to an interval over which the value of ρ is high. During a fade-in, the value of ρin is much higher than that of ρout. The reverse happens for fade-outs. A dissolve is characterized by a predominance of ρin during the first phase and of ρout during the second phase. The technique also works properly on heavily compressed image sequences. This approach achieves high accuracy, but it takes a large amount of computation time.


• Algorithm by Truong et al. [20]. This algorithm improves cut detection accuracy by using an adaptive threshold computed from a local window on the luminance histogram difference curve. In addition, the existence of fades and dissolves is examined on the basis of mathematical models for producing ideal fades and dissolves. In that procedure, constraints on the characteristics of the frame luminance mean and variance curves are derived to eliminate false positives caused by camera and object motion during gradual transitions.
• Detection based on spatio-temporal distribution of the macro block types [9]. This method detects dissolves based on the spatio-temporal distribution of the macro block types in MPEG-compressed videos. The ratio of forward macro blocks in the B-type frames and the spatial distribution of forward/backward

macro blocks are used for detecting dissolve changes. After such a sequence of frames is found, two heuristic rules are applied:
1. The global color distributions of the frames at which the dissolve starts and terminates are very different.
2. The duration of a dissolve transition is typically more than 0.3 s.
• Machine learning approach [12]. A dissolve detection algorithm using machine learning and a multi-resolution concept was proposed. The approach is less concerned with the actual features used for dissolve detection than with a general framework for recognizing gradual transitions. First, a huge number of dissolve examples are created from a given video database using a dissolve synthesizer. These examples are then used to train a heuristically optimal classifier, which is employed in a multi-resolution search for dissolves of various durations.
• DCD method [13]. Variance, gradient magnitude, and the double chromatic difference (DCD) of the image sequence are used for dissolve detection. The first step of the DCD segments the video into non-overlapping categories of “potential dissolves” and “non-dissolves” using edge-based or pixel-based statistics. The second step of the DCD detector uses this segmentation to define one synthetic dissolve per potential-dissolve segment, beginning and ending at the first and last frame of the segment, respectively. From these starting and ending frames, the center frame of a synthetic dissolve is formed and compared to the intervening footage. If the comparison error over time has a parabolic shape, the potential-dissolve segment is accepted.

Other algorithms related to gradual transition detection are found in [3, 4], and good surveys can be found in [1, 2, 5–7, 11, 19]. Table 1 summarizes the existing and proposed algorithms.
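As an illustration of the twin-comparison rule described at the start of this section, the following is a minimal sketch that follows only the verbal description given here, not the original implementation in [24]. The function name, the parameter names `t_high` and `t_low` (standing for Th and T1), and the decision to drop a candidate whose run ends before the cumulative sum exceeds Th are assumptions made for this example.

```python
def twin_comparison(diffs, t_high, t_low):
    """Classify a sequence of inter-frame discontinuity values.

    Returns (cuts, graduals): cut indices and (start, end) index pairs of
    gradual transitions. Hypothetical helper, written from the prose only.
    """
    cuts, graduals = [], []
    i = 0
    while i < len(diffs):
        if diffs[i] >= t_high:
            # A single high discontinuity value is declared a cut.
            cuts.append(i)
            i += 1
        elif diffs[i] > t_low:
            # Potential start of a gradual transition: accumulate values.
            start, total = i, 0.0
            while i < len(diffs) and t_low < diffs[i] < t_high:
                total += diffs[i]
                i += 1
                if total > t_high:
                    # End is the last value included in the sum.
                    graduals.append((start, i - 1))
                    break
            # Assumption: if the run ends first, the candidate is dropped.
        else:
            i += 1
    return cuts, graduals
```

For example, `twin_comparison([1, 1, 10, 1, 3, 3, 3, 1], t_high=8, t_low=2)` reports one cut at index 2 and one gradual transition spanning indices 4 to 6. The sketch also makes the sensitivity to the two thresholds visible: lowering `t_low` lets ordinary motion start an accumulation, which is the false-positive problem noted above.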


3. Abrupt shot boundary detection

In order to detect gradual shot boundaries, abrupt cuts are detected first. The existence of gradual shot changes is then examined between neighboring cuts. A fully decoded MPEG video sequence is used to achieve more accurate detection. In this paper, we use a histogram correlation metric to detect cuts, as follows. Let $m_k$ and $\sigma_k$ be the average and standard deviation of the pixel intensities in the kth frame. Then the inter-frame correlation between two consecutive frames k and k+1 is given by

$$\mathrm{Cor}(k, k+1) = \frac{\sum_{i=0}^{H-1}\sum_{j=0}^{W-1}\left(X_k[i][j] - m_k\right)\left(X_{k+1}[i][j] - m_{k+1}\right)}{\sigma_k\,\sigma_{k+1}} \tag{1}$$

where W and H are the width and height of a frame, and $X_k[i][j]$ is the pixel intensity at coordinate (i, j) in the kth frame. If Cor(k, k+1) is below a certain threshold, i.e., the correlation is low, frame k+1 is declared a cut. This is the opposite of a general frame difference metric, where a cut is declared if the difference exceeds a certain threshold, i.e., the difference is high.
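The cut test of Eq. (1) can be sketched as follows. This is an illustrative sketch, not the authors' code: frames are assumed to be lists of rows of gray values, the function names are hypothetical, and a 1/(HW) normalization is assumed so that the result lies between −1 and 1. The default threshold of 0.82 is the value reported in the experimental section.

```python
from math import sqrt


def frame_correlation(frame_a, frame_b):
    """Normalized correlation of two equally sized gray frames."""
    pixels_a = [p for row in frame_a for p in row]
    pixels_b = [p for row in frame_b for p in row]
    n = len(pixels_a)
    mean_a = sum(pixels_a) / n
    mean_b = sum(pixels_b) / n
    # Assumption: covariance and deviations are both averaged over H*W.
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(pixels_a, pixels_b)) / n
    std_a = sqrt(sum((a - mean_a) ** 2 for a in pixels_a) / n)
    std_b = sqrt(sum((b - mean_b) ** 2 for b in pixels_b) / n)
    if std_a == 0 or std_b == 0:
        return 0.0  # flat frame: correlation undefined, treated as 0 here
    return cov / (std_a * std_b)


def detect_cuts(frames, threshold=0.82):
    """Return every index k+1 whose correlation with frame k is too low."""
    return [k + 1 for k in range(len(frames) - 1)
            if frame_correlation(frames[k], frames[k + 1]) < threshold]
```

Because the measure is normalized by the deviations of both frames, a uniform brightness offset between two frames leaves the correlation unchanged, which is one reason a correlation test can be less sensitive to global illumination changes than a raw frame difference.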

Table 1 Summary of existing and proposed gradual shot boundary detection algorithms

Name of method | Approach
Twin comparison [24] | A higher threshold Th for cuts and a lower threshold T1 for gradual boundaries are used. Many false positives can be generated when the thresholds are not properly assigned.
Plateau detection [22] | Plateau detection is performed on the k-interval difference sequence $D_i^k = d(X_i, X_{i+k})$.
Algorithm by Meng et al. [14] | Assuming that the intensity variance of successive frames shows a parabolic shape, it tries to detect the depth and width of the curve. However, in actual cases, the ideal parabolic curve is not pronounced.
Algorithm by Song et al. [18] | It searches for close-to-linear segments in the series of discontinuity values by investigating the first and second derivatives of the slope in time. A close-to-linear segment is found if the second derivative is less than a pre-specified percentage of the first derivative.
Feature-based detection [23] | Gradual changes are detected by examining the fractions of entering and exiting edge pixels, ρ = max(ρin, ρout). It achieves high accuracy, but takes a large amount of computation time.
Algorithm by Truong et al. [20] | Based on mathematical models for producing ideal fades and dissolves, the existence of gradual changes is examined. However, in actual cases, the ideal cases do not exist.
Detection based on spatio-temporal distribution of the macro block types [9] | The ratio of forward macro blocks in the B-type frames and the spatial distribution of forward/backward macro blocks are examined. Two heuristic rules are applied: (1) the global color distributions of the frames at which the dissolve starts and terminates are very different; (2) the duration of a dissolve transition is typically more than 0.3 s.
Machine learning method [12] | Many dissolve examples are used to train a heuristically optimal classifier.
DCD method [13] | It is based on the fact that the variance, gradient magnitude, and double chromatic difference (DCD) of a sequence show a parabolic-like shape. However, in actual cases, the ideal parabolic curve is not pronounced.
Proposed method | It approximates the variance sequence to an ideal curve by taking the most distinct parabolic sequence extracted from local frame regions. It smooths the saw-like variance sequence through morphological opening and a local sliding window to achieve robust detection.

4. Gradual shot boundary detection

A gradual shot boundary tends to have a high correlation between consecutive frames, but small changes accumulate over multiple frames until a different shot results, i.e., a gradual change occurs. There is no distinct characteristic between two successive frames. In an ideal dissolve, the variance of the pixel intensities in a frame is distributed over the frames as in figure 3(a) [14]. It

Fig. 3 Variance distribution over gradual changes (a) dissolve; (b) fade-in; (c) fade-out

looks like a parabolic curve. A fade-in, where a new scene gradually appears as the pixel intensity increases, may look like figure 3(b), and a fade-out, where the scene gradually disappears as the pixel intensity decreases, may look like figure 3(c). Some earlier gradual detection methods operate in the compressed domain. These methods have the advantage of fast detection, since a full decoding procedure is not necessary. However, the information lost by using compressed data yields a distorted parabolic shape and is an obstacle to robust detection. Another method uses the mean of the pixel intensities in a fully decoded frame, since it provides a less distorted sequence. However, this method has the drawback that it can find the transition point only when the mean frame sequence differs strongly among neighboring frames. In fact, video sequences generally do not follow the ideal case of figure 3 because of noise and camera motion. Hence, in this paper, we try to approximate the frame difference sequence to an ideal curve by using “average edge frames” to obtain more robust gradual detection.

4.1. Average edge image

An average edge image is an image reconstructed using only the pixels whose intensities exceed the average intensity of a Sobel edge-detected image [17]. A sequence of these images has a more distinct and smoother variance distribution than that of gray images, as shown in figure 4. This is somewhat similar to the effective average gradient (EAG) in [13]. The average edge image is extracted by the following steps.

Step 1: Convert a color image (frame) to a gray image (frame).

$$Y\,(\text{Luminance}) = 0.299R + 0.587G + 0.114B \tag{2}$$

Fig. 4 Variance distribution in the video source of Missing_You.mpg (a) distribution of a gray image (b) distribution of an average edge image

where Y is the intensity in the gray image and R, G, and B are the red, green, and blue components of an RGB color image.

Step 2: Obtain the edge image by applying a Sobel edge mask to the original image with threshold 100. The detailed edge detection procedure is explained in figure 5.

$$f_{Gradient}(x, y) = \begin{cases} f(x, y), & \text{if } f(x, y) > 100 \\ 0, & \text{if } f(x, y) \le 100 \end{cases} \tag{3}$$

where f(x, y) is the gray value at coordinate (x, y) and $f_{Gradient}(x, y)$ is the gray value at coordinate (x, y) after applying the threshold.

Step 3: Compute an AG (Average Gradient).

$$AG = \frac{\sum_{x,y} f_{Gradient}(x, y)}{\sum_{x,y} p(x, y)}, \qquad \text{where } p(x, y) = \begin{cases} 1, & \text{if } f_{Gradient}(x, y) > 0 \\ 0, & \text{if } f_{Gradient}(x, y) \le 0 \end{cases} \tag{4}$$


Fig. 5 Sobel edge detection: (a) Sobel mask operation; (b) applying direction of the mask on an original image; (c) Sobel mask for x and y directions

Step 4: Extract an average edge image using the average gray value (AG) as a new threshold.

$$f_{AG}(x, y) = \begin{cases} f_{Gradient}(x, y), & \text{if } f_{Gradient}(x, y) > AG \\ 0, & \text{if } f_{Gradient}(x, y) \le AG \end{cases} \tag{5}$$
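Steps 1 to 4 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: images are lists of rows, the Sobel response is taken as |Gx| + |Gy| (the exact combination used in figure 5 is not recoverable from the text), border pixels are left at zero, and the threshold of Eq. (3) is applied to the Sobel response. All function names are hypothetical.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def to_gray(rgb):
    """Eq. (2): luminance of a frame given as rows of (R, G, B) tuples."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]


def sobel_magnitude(gray):
    """Sobel response |Gx| + |Gy|; border pixels stay 0 (assumption)."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out


def average_edge_image(gray, edge_threshold=100):
    """Return (average edge image, AG) following Eqs. (3) to (5)."""
    edges = sobel_magnitude(gray)
    # Eq. (3): keep only responses above the fixed threshold.
    grad = [[v if v > edge_threshold else 0.0 for v in row] for row in edges]
    # Eq. (4): average over the pixels that survived the threshold.
    nonzero = [v for row in grad for v in row if v > 0]
    if not nonzero:
        return grad, 0.0
    ag = sum(nonzero) / len(nonzero)
    # Eq. (5): keep only edge pixels stronger than the average gradient.
    avg_edge = [[v if v > ag else 0.0 for v in row] for row in grad]
    return avg_edge, ag
```

Note that the strict inequality in Eq. (5) means an image whose surviving edge pixels all have the same strength yields an empty average edge image, since no pixel exceeds the average.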

4.2. Feature extraction

In order to emphasize the characteristics of a gradual transition, we extract nine variances from nine equal-sized, non-overlapping blocks (see figure 6) of the average edge image. The reason for computing the variances of localized blocks is to obtain a new, distinct sequence that shows the properties of a gradual change more clearly than the variance of the overall frame does. The complexity of the content (for example, the edge information in our research) differs according to the spatial location within a frame. Hence, we search for the blocks that maximize the depth and width of the parabolic curve. For example, figure 7 shows the variance sequences of the overall frame and three sub-blocks. A distinct gradual sequence is obtained by first computing

Fig. 6 Sub-block image


Fig. 7 Variance distribution of the average edge image and the three sub-blocks (Missing_You.mpg)

maximum and minimum difference sequences between the overall frame and each block using Eq. (6) and then intersecting the two sequences using Eq. (7).

$$S_{\max} = \max_i \left|T_k - S_{ki}\right|, \qquad S_{\min} = \min_i \left|T_k - S_{ki}\right| \tag{6}$$

$$AGI(T, S) = \min\left(S_{\max}, S_{\min}\right) \tag{7}$$

where $T_k$ is the variance of the kth overall frame and $S_{ki}$ is the variance of the ith block in the kth frame. Equations (6) and (7) try to find the block that maximizes the depth and width of the parabolic curve in each frame and take the variance of the corresponding block for computing the gradual point. We refer to the resulting sequence as the AGI (Average Gradient Image) sequence. The resulting AGI sequence is shown in figure 8. This sequence shows a more distinct parabolic shape than the variance sequence of the frames.
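The block statistics of Eqs. (6) and (7) can be sketched as below. This is a hedged illustration: the helper names are hypothetical, blocks are taken in row-major order, rows or columns that do not fill a whole block are ignored, and Eq. (7) is applied literally to the two values of a single frame. How the two sequences are "intersected" across frames is not specified in enough detail to reproduce here.

```python
def variance(values):
    """Population variance of a flat list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def block_variances(image, grid=3):
    """Variances of grid x grid equal, non-overlapping blocks (row-major)."""
    h, w = len(image), len(image[0])
    bh, bw = h // grid, w // grid  # leftover rows/columns are ignored
    result = []
    for by in range(grid):
        for bx in range(grid):
            block = [image[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            result.append(variance(block))
    return result


def block_differences(image, grid=3):
    """S_max and S_min of Eq. (6) for one average edge frame."""
    total = variance([p for row in image for p in row])  # T_k
    diffs = [abs(total - s) for s in block_variances(image, grid)]
    return max(diffs), min(diffs)


def agi_value(image, grid=3):
    """Eq. (7) read literally for a single frame (assumption)."""
    s_max, s_min = block_differences(image, grid)
    return min(s_max, s_min)
```

Calling `agi_value` on each average edge frame in turn gives one value per frame, which is the kind of per-frame sequence the following subsections operate on.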

Fig. 8 Sequence of AGI (Missing_You.mpg)


4.3. Computation of local variance

We now have to pick out the gradual change point using the AGI sequence. In order to investigate the amount of change over the AGI frame sequence, we compute the variance over every 30 frames using Eq. (8). Since gradual changes generally proceed over 30–60 frames (1–2 s), we chose 30 frames as the size of the sliding local window in which the existence of a gradual change is examined. For a sequence of similar shots, the variance is almost constant, while for a gradual sequence it shows nearly parabolic characteristics, i.e., it gradually decreases from the starting frame of the change and gradually increases toward the frame where the new shot appears (for the dissolve case).

$$\mathrm{var}(i) = \frac{1}{L}\sum_{k=i}^{i+L-1}\left(AGI(k) - \mathrm{mean}(i)\right)^2, \qquad \mathrm{mean}(i) = \frac{1}{L}\sum_{k=i}^{i+L-1} AGI(k) \tag{8}$$

where i = 1, 2, ..., n − L (frame number), L is the total number of frames in a window (30 frames), and AGI(k) is the variance of the AGI in the kth frame.

4.4. Filtering

Even though the feature variance sequence is close to an ideal variance curve, an additional filtering procedure that reduces distortions within the curve is needed for more accurate detection. In this paper, we smooth the curve by applying a morphological opening operation. The sequence obtained after the opening on the AGI sequence shows a smoother curve, as in figure 9. The opening is performed by the following equation.

$$\mathrm{Opening}(n) = \left((f \ominus B) \oplus B\right)(n) \tag{9}$$

where $(f \ominus B)(n) = \min[f(n), f(n \pm 1), f(n \pm 2)]$, $(f \oplus B)(n) = \max[f(n), f(n \pm 1), f(n \pm 2)]$, n = 1, 2, ..., m (frame number), and f(n) is the variance in the nth frame. B is a one-dimensional structuring element (window size 5).
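The sliding-window variance of Eq. (8) and the one-dimensional opening of Eq. (9) can be sketched as follows. The sketch assumes the standard definition of a flat opening (a running minimum followed by a running maximum), a window of five samples (radius 2), and truncated windows at the ends of the sequence. The function names and the border handling are choices made for this example.

```python
def sliding_variance(agi, window=30):
    """Eq. (8): variance of the AGI values in each full window."""
    out = []
    for i in range(len(agi) - window + 1):
        chunk = agi[i:i + window]
        mean = sum(chunk) / window
        out.append(sum((v - mean) ** 2 for v in chunk) / window)
    return out


def erode(f, radius=2):
    """Running minimum over 2*radius+1 samples (window truncated at ends)."""
    n = len(f)
    return [min(f[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def dilate(f, radius=2):
    """Running maximum over 2*radius+1 samples (window truncated at ends)."""
    n = len(f)
    return [max(f[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def opening(f, radius=2):
    """Flat 1-D morphological opening: erosion followed by dilation."""
    return dilate(erode(f, radius), radius)
```

With this definition, an isolated spike narrower than the window is removed, for example `opening([1, 1, 9, 1, 1, 1, 1])` returns a list of ones, while a valley such as `[5, 5, 5, 1, 5, 5, 5]` is returned unchanged, so the local minimum used for detection survives the smoothing.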

Fig. 9 Sequence after the opening operation on the AGI sequence in figure 8 (Missing_You.mpg)


4.5. Detection of gradual change frames

One of the frames in a gradual change has the minimum value of the local parabolic sequence. We detect a gradual change point based on the width and depth of the sequence, i.e., the frame interval and the variance difference (see figure 9). We declare a gradual change point if Eq. (10) is satisfied. The two thresholds, 30 for the width and 0.03 (normalized value) for the depth, were chosen heuristically.

$$\Delta f_{variance} = \left|\sigma_{localmax}[i+1] - \sigma_{localmin}[i]\right| \ge 0.03, \qquad \Delta frame = \left|Frm_{localmax}[i+1] - Frm_{localmax}[i]\right| \le 30 \tag{10}$$

where i = 1, 2, ..., n indexes the frames that have a local minimum, $\sigma_{localmin}[i]$ and $\sigma_{localmax}[i]$ are the variances of the ith frames that have a local minimum and a local maximum, respectively, and $Frm_{localmax}[i]$ is the frame number of the ith local maximum.
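A possible reading of the decision rule in Eq. (10) is sketched below. Several points are assumptions rather than statements from the paper: local extrema are found by a simple neighbor comparison, the depth is measured from each local minimum to the next local maximum, the width is the distance between the two local maxima that enclose the minimum, and the width threshold is treated as an upper bound. The function name and parameters are hypothetical.

```python
def detect_gradual_points(seq, min_depth=0.03, max_width=30):
    """Return indices of local minima that pass the depth and width tests."""
    inner = range(1, len(seq) - 1)
    minima = [n for n in inner
              if seq[n] < seq[n - 1] and seq[n] <= seq[n + 1]]
    maxima = [n for n in inner
              if seq[n] > seq[n - 1] and seq[n] >= seq[n + 1]]
    detections = []
    for m in minima:
        left = [p for p in maxima if p < m]
        right = [p for p in maxima if p > m]
        if not left or not right:
            continue  # the valley is not enclosed by two local maxima
        prev_max, next_max = left[-1], right[0]
        depth = abs(seq[next_max] - seq[m])   # variance difference
        width = next_max - prev_max           # frame interval
        # Assumption: width is bounded from above by the threshold.
        if depth >= min_depth and width <= max_width:
            detections.append(m)
    return detections
```

For the normalized sequence `[0.0, 0.5, 0.4, 0.1, 0.4, 0.5, 0.0]` the valley at index 3 has depth 0.4 and width 4, so it is reported. A valley of depth 0.02 is rejected by the depth test.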

5. Experimental result

Experiments were performed on an IBM Pentium PC using Microsoft Visual C++. The graphic user interface (GUI) is shown in figure 10. The video data for the experiments are one music video, two commercials, two movies, and one drama. We selected these videos because they contain many gradual transitions. To evaluate the proposed detection algorithm, precision and recall were computed using Eqs. (11)–(12).

$$\mathrm{Precision} = \frac{N_{CORRECT}}{N_{CORRECT} + N_{FALSE}} \times 100 \tag{11}$$

Fig. 10 GUI (Graphic User Interface)

Table 2 Experimental results with six video sources

MPEG file | Type | NTOTAL | NSCD | NCORRECT | NMISSED | NFALSE | Precision | Recall
Missing you | Music video | 1,686 | 20 | 16 | 4 | 0 | 100 | 80
Sin noodle | Commercial | 477 | 6 | 4 | 2 | 0 | 100 | 75
White valentine | Movie | 1,200 | 5 | 4 | 1 | 1 | 80 | 80
Illwolgie | Movie | 1,035 | 10 | 10 | 0 | 1 | 91 | 100
Posco | Commercial | 441 | 4 | 4 | 0 | 1 | 80 | 100
GaeulDongWha | Drama | 765 | 6 | 5 | 1 | 2 | 71 | 83
Average | | | | | | | 87.0 | 86.3

Where NTOTAL is the total number of frames in the MPEG file.

$$\mathrm{Recall} = \frac{N_{CORRECT}}{N_{SCD}\;(= N_{CORRECT} + N_{MISSED})} \times 100 \tag{12}$$

where NCORRECT is the number of correctly detected frames, NFALSE is the number of falsely detected frames, NMISSED is the number of missed frames, and NSCD is the total number of frames at which transitions occur. In order to detect gradual shot boundaries, abrupt cuts are detected first. The existence of gradual shot changes is then examined between neighboring cuts. As Eq. (1)

Fig. 11 Performance comparison with different number of sub-blocks in terms of (a) precision; (b) recall


Fig. 12 False detection due to the continuous distribution of local minimum (left is falsely detected frame and right is actual change frame) in (a) White_Valentine.mpg; (b) GaeulDongWha.mpg

shows, the correlation has a value between −1 and 1. A correlation of 1 means a perfect match. In our research, a threshold of 0.82 for abrupt cuts was chosen heuristically. In the change detection experiments, the average precision and recall are 87.0 and 86.3%, respectively. The results are given in Table 2. We also performed experiments using different numbers of sub-blocks. Figure 11 shows the performance comparison. As figure 11 shows, dividing the frame into 3×3 blocks yields the best overall performance, and dividing it into 1×1, i.e., using the full-frame sequence, shows

Fig. 13 False detection due to object and camera movement (left is falsely detected frame and right is actual change frame) in (a) Illwolgie.mpg; (b) Posco.mpg


the worst result. This means that using localized variance information is effective for detection. It is not surprising that decreasing the number of sub-blocks does little to improve performance, because the resulting variance sequence is similar to that of the full frame. Conversely, increasing the number of sub-blocks tends to add distortions to the variance sequence. Most of the non-detected frames (NMISSED) occur because the distribution of edge information is almost constant between neighboring shots, so that no distinct dissolve curve appears. Some of the falsely detected frames (NFALSE) in movies such as “White Valentine” and “GaeulDongWha” have a run of consecutive local minima between the gradually changed frame and its neighboring frames. In these cases the change is declared 20–30 frames before the actual transition point. Figures 12 and 13 show the frames detected by the proposed algorithm (left) and the actual change frames (right). In the “Illwolgie” case (figure 13(a)), the falsely detected frames (left) are caused by object movement within the same scene. In the “Posco” case (figure 13(b)), the falsely detected frames (left) are caused by camera movement that continues over the many frames in which the actual transitions occur. We compared the proposed algorithm with the well-known twin comparison method [24] and the DCD (Double Chromatic Difference) method [13], which are robust to object and camera movements. For a proper evaluation, an edge histogram is used for twin comparison and a fully decoded edge image (not a DC image) is used for the DCD method. The experiments on the six videos are depicted in figure 14. In precision, the
Fig. 14 Comparison with other algorithms (twin comparison and DCD) in terms of (a) precision; (b) recall


proposed algorithm was superior to the others except for video 4, where the DCD showed the best performance. In recall, the proposed algorithm was superior in videos 2, 4, 5, and 6, whereas the twin comparison showed the best performance in videos 1 and 3. In average precision, the proposed algorithm performed best (87%), followed by the twin comparison (77.2%) and the DCD (74.3%). In average recall, the proposed algorithm also performed best (86.3%), followed by the twin comparison (82%) and the DCD (65.3%). Through the experiments, we noticed that the twin comparison is sensitive to its thresholds. The two thresholds must be assigned adequately to obtain good results. In the experiments, in order to apply equal thresholds to the six videos, we set the two thresholds based on a linear combination of the average and the standard deviation of the total histogram difference (that is, Th = average + 3.0 × standard deviation and T1 = average + 0.4 × standard deviation).
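The precision and recall figures of Eqs. (11) and (12) can be recomputed from the detection counts. The following small helper is a sketch with a hypothetical function name. It takes N_CORRECT, N_MISSED, and N_FALSE and uses N_SCD = N_CORRECT + N_MISSED as the recall denominator.

```python
def precision_recall(n_correct, n_missed, n_false):
    """Return (precision, recall) in percent, following Eqs. (11)-(12)."""
    precision = 100.0 * n_correct / (n_correct + n_false)
    recall = 100.0 * n_correct / (n_correct + n_missed)
    return precision, recall
```

For the "Missing you" row of Table 2 (16 correct, 4 missed, 0 false), `precision_recall(16, 4, 0)` gives a precision of 100 and a recall of 80, matching the table.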

6. Conclusions and further research

We proposed a new gradual shot boundary detection algorithm in this paper. The algorithm approximates the variance sequence to the ideal parabolic curve that appears in a typical gradual transition. This is done by using the most distinct parabolic sequence extracted from the nine local sub-block frame sequences. Experiments on six video sources showed that the proposed algorithm yields better detection performance than the well-known twin comparison and DCD methods. In average precision, the proposed algorithm achieved 87%, while the twin comparison and the DCD achieved 77.2 and 74.3%, respectively. In average recall, it also performed best (86.3%), followed by the twin comparison (82%) and the DCD (65.3%). The experimental results are encouraging, but some problems that were encountered are worth stressing. Camera motion, object motion, and extensive content change within a shot should be taken into account for high performance. Handling these problems together with the proposed algorithm could yield better results. The algorithm operates on fully decoded frames and therefore takes more time than methods working in the compressed domain. Fast detection that also handles the distortion of compressed data is necessary. Future work includes more testing on different types of videos and work on shot transitions on the basis of camera motion. In another direction, we are investigating the emotions evoked by various video effects in order to perform emotion-based video scene retrieval.
Acknowledgments This work was supported by Korea Research Foundation Grant (KRF-2002 005-H 20002). Comments and suggestions from the reviewers were greatly appreciated.

References
1. Ahanger G, Little TDC (1996) A survey of technologies for parsing and indexing digital video. J Vis Commun Image Represent 7(1):28–43
2. Brunelli R, Mich O, Modena CM (1999) A survey on the automatic indexing of video data. J Vis Commun Image Represent 10(1):78–112
3. Covell M, Ahmad S (2002) Analysis-by-synthesis dissolve detection. In: Proc ICIP, vol 1. pp 23–25
4. Fernando WAC, Canagarajah CN, Bull DR (1999) Fade and dissolve detection in uncompressed and compressed video sequences. In: Proc ICIP, vol 3. pp 299–303
5. Ford RM, Robinson C, Temple D, Gerlach M (2000) Metrics for shot boundary detection in digital video sequences. Multimedia Syst 8(1):37–46
6. Gargi U, Kasturi R, Strayer SH (2000) Performance characterization of video-shot-change detection methods. IEEE Trans Circuits Syst Video Technol 10(1):1–13
7. Hanjalic A (2002) Shot boundary detection: unraveled and resolved? IEEE Trans Circuits Syst Video Technol 12(2):90–105
8. Jain AK, Vailaya A, Xiong W (1999) Query by video clip. Multimedia Systems: Special Issue on Video Libraries 7(5):369–384
9. Jun SB, Yoon K, Lee HY (2000) Dissolve transition detection algorithm using spatio-temporal distribution of MPEG macro-block types. ACM International Conference on Multimedia pp 391–394
10. Lee SW, Kim YM, Choi SW (2000) Fast scene change detection using direct feature extraction from MPEG compressed videos. IEEE Trans Multimedia 2(4):240–254
11. Lienhart R (1999) Comparison of automatic shot boundary detection algorithms. Storage Retrieval for Media Database SPIE 3656:290–301
12. Lienhart R (2001) Reliable dissolve detection. Storage and Retrieval for Media Database SPIE 4315:219–230
13. Lu HB, Zhang YJ, Yao YR (1999) Robust gradual scene change detection. International Conference on Image Processing 3:304–308
14. Meng J, Juan Y, Chang SF (1994) Scene change detection in a MPEG compressed video sequence. In: Proc. SPIE/IS&T Symp. Electronic Imaging Science and Technology: Digital Video Compression: Algorithms and Technologies, vol 2419. pp 14–25
15. Nagasaka A, Tanaka Y (1991) Automatic video indexing and full-motion search for object appearances. In: Proc. IFIP TC2/WG2.6 Second Working Conf. on Visual Database System. pp 113–127
16. Otsuji K, Tonomura Y, Ohba Y (1991) Video browsing using brightness data. Vis Commun Image Process SPIE-1606:980–989
17. Shapiro L, Stockman GC (2001) Computer vision. Prentice Hall
18. Song S, Kwon T, Kim W (1998) Detection of gradual scene changes for parsing of video data. In: Proc IS&T/SPIE, vol 3312. pp 404–413
19. Truong BT (1999) Video genre classification based on shot segmentation. Honours Thesis, Curtin University of Technology, Western Australia, November 1999
20. Truong BT, Dorai C, Venkatesh S (2000) New enhancements to cut, fade, and dissolve detection processes in video segmentation. ACM International Conference on Multimedia pp 219–227
21. Xing W, Lee JC (1998) Efficient scene change detection and camera motion annotation for video classification. Comput Vis Image Underst 71(2):166–181
22. Yeo BL, Liu B (1995) Rapid scene analysis on compressed video. IEEE Trans Circuits Syst Video Technol 5(6):533–544
23. Zabih R, Miller J, Mai K (1999) A feature-based algorithm for detecting and classifying production effects. Multimedia Syst 7(2):119–128
24. Zhang HJ, Kankanhalli A, Smoliar SW, Tan SY (1993) Automatic partitioning of full motion video. ACM Multimedia Systems 1(1):10–28
25. Zhang HJ, Wu J, Zhang D, Smoliar SW (1997) An integrated system for content-based video retrieval and browsing. Pattern Recogn 30(4):643–658


Hun-Woo Yoo is a research professor at the Center for Cognitive Science at Yonsei University. He received his B.S. and M.S. degrees in Electrical Engineering from Inha University, Korea and a Ph.D. degree in Industrial Systems and Information Engineering from Korea University, Korea. From 1994 to 1997, he worked as a research engineer at the Manufacturing Technology Center of LG Electronics. His current research interests include multimedia information retrieval, computer vision, and image processing.

Han-Jin Ryoo received a B.S. degree from Korea Military Academy, Korea and M.S. degree in Industrial Systems and Information Engineering from Korea University, Korea. Currently he is a Ph.D. candidate in Electronics and Computer Engineering at Korea University. His research interests are face detection/recognition, multimedia communication and content based image search.

Dong-Sik Jang is a professor in the Department of Industrial Systems and Information Engineering at Korea University. He received a B.S. degree in Industrial Engineering from Korea University, Korea, an M.S. degree from the University of Texas, and a Ph.D. degree in Industrial Engineering from Texas A&M University. Dr. Jang's research interests are computer vision, multimedia communication, and artificial intelligence.
