Surveillance Video Synopsis Via Scaling Down Moving Objects

IJSTE - International Journal of Science Technology & Engineering | Volume 3 | Issue 09 | March 2017
ISSN (online): 2349-784X
Surafi PS, PG Student, Department of Electronics & Communication Engineering, Marian Engineering College, Trivandrum, India
Hema S Mahesh, Assistant Professor, Department of Electronics & Communication Engineering, Marian Engineering College, Trivandrum, India
Abstract
This paper presents an approach to video synopsis that produces a shorter, condensed version of a video. Most of the spatiotemporal redundancy in surveillance video is removed, while the essential activities in the video are preserved. In most existing approaches, condensing the video causes collisions between objects; most of these collisions can be avoided by reducing the size of the moving objects. The approach has three main steps. First, interesting objects are identified and detected. Second, an optimization framework is used to find the mapping with minimum energy. Finally, the scaled-down objects are added back into a synopsis video of user-defined length.
Keywords: Video Synopsis, Surveillance, Collision, Optimization Framework, Reduce Size
________________________________________________________________________________________________________
I. INTRODUCTION
With the development of society, the number of surveillance cameras keeps increasing. Most of these cameras operate 24 hours a day, which makes analyzing their video very time-consuming [1]; as a result, most of these videos are never reviewed after they are recorded. Such videos contain a great deal of redundant information and relatively few important activities. Video synopsis produces a shorter, condensed version of the original video in which the important events are preserved. It also serves as an efficient index into the original video: by recording the time at which each interesting object appears, the object can easily be located in the original video from its time label. In addition, video synopsis reduces the storage required for surveillance video.
So far, there is no established method for determining whether a synopsis is good or bad, because the quality of a synopsis video depends on the content of the original video and on the requirements of the application. Moreover, different users have different views, which makes defining criteria even more difficult. Generally, there are some qualitative criteria for video synopsis. First, the output video should preserve the activities present in the original video while removing most of the spatiotemporal redundancy. Second, collisions among the active objects should be reduced as much as possible. Third, the consistency of the moving objects should be kept as far as possible in the synopsis video.
Many approaches have been proposed for video synopsis. Some are frame based, treating entire video frames as fundamental building blocks that cannot be decomposed. Frame-based approaches commonly compress the video along the time axis, which lets the user see a summary of the entire video. One such method is fast forward [2], in which several video frames are skipped; unfortunately, fast-moving objects may be missed when frames are skipped. To overcome this, some methods [3]-[7] extract key frames according to certain criteria or skip video clips without activity. However, frame-based approaches tend to lose the dynamic aspect of the original video and retain empty background regions in the synopsis [1], [8]. To reduce these empty regions, a number of approaches have been proposed that combine informative portions of the original video [9]-[12]. To avoid these problems, [1], [13], and [14] proposed an object-based approach, which is now the mainstream in the surveillance video synopsis field. In the object-based approach, objects are first extracted from the original video and then shifted along the time axis. This avoids most spatiotemporal redundancy but causes unwanted collisions among active objects in the synopsis video. To further avoid collisions and improve the compression ratio, some spatiotemporal approaches have been proposed. These methods avoid most collisions, but they violate both the temporal and the location information of objects.
II. RELATED WORKS
There are three main approaches in the video synopsis field: frame-based, object-based, and part-based video synopsis.
Frame based Video Synopsis

It can be subdivided into two classes: key frame selection and video skimming. In key frame selection approaches, a series of important frames is selected from the original video according to certain criteria. Among all key frame selection techniques, uniform temporal sampling is the simplest, but it may also extract less important frames. Bennett and McMillan [15]
All rights reserved by www.ijste.org 298

Surveillance Video Synopsis via Scaling Down Moving Objects
(IJSTE/ Volume 3 / Issue 09 / 062)
proposed a non-uniform sampling approach for creating time-lapse video summaries of a user-specified duration. Adaptive fast-forward techniques were developed in [3] and [6] to avoid losing fast activities when frames are dropped. To select key frames more accurately, selection criteria were proposed in [4], [5], and [16]-[18]. These approaches aim to eliminate video clips with no activity or to discard frames of low interest according to the user's definition. In addition to the visual frames, other multimedia information, such as sound, can be used for video content analysis to find key frames. In [19]-[21], such auxiliary information is taken into account when selecting key frames for the synopsis video.
Object based Video Synopsis

An important concept introduced in [9], [10], and [21] combines activities from different times rather than selecting and concatenating whole video frames. These approaches take video portions as building blocks and first extract informative portions from the original video. The portions are then shifted and stitched together using optimization techniques such as simulated annealing and graph cuts. Although more of the activity in the original video is presented simultaneously, obvious discontinuities usually appear between different portions, resulting in unpleasant visual effects. Rav-Acha et al. [13] identified and extracted moving objects that can be shifted along the time axis to create a compact and continuous synopsis. Similar to [10], this approach also disturbs the chronological order of moving objects to obtain a more condensed video than previous methods. Pritch et al. [3] extended the earlier work [13] to handle continuous video streams captured by webcams. A more detailed approach, proposed in [1], preserves the local chronological order of the moving objects present in the frames. It can be seen as a unification and extension of the approaches presented in [13], [33].
Part based Video Synopsis

More recently, a novel approach called part-based object movement synopsis has been proposed. Traditional object-based video synopsis approaches have difficulty reducing the redundant information within the movements of individual objects. Unlike previous work, [12] uses object parts as the basic building blocks. An object is first decomposed into parts, each of which corresponds to a movement sequence. Subsequences that change only slightly are then discarded, and the remaining ones are assembled and stitched together optimally to obtain a compact video. By eliminating several unimportant movement sequences, this approach reduces redundancy further.
III. PROPOSED METHOD
In this section, the proposed approach is described in detail. To show its effectiveness, an example is presented in Fig. 1. When a source video is condensed using traditional synopsis approaches, including [1], [13], and [14], visual collisions occur as objects are shifted along the time axis, as shown in Fig. 1(a) and (b). Fig. 1(a) shows objects moving across each other, which produces noticeable artifacts. In Fig. 1(b), moving objects collide and the scene is crowded. To decrease collisions and eliminate these crowded visual effects, we reduce the sizes of objects when collisions occur. The corresponding condensed results produced by the proposed approach are presented in Fig. 1(c) and (d). Comparing Fig. 1(a) and (c) shows that there are far fewer collisions once objects are scaled down. In addition, comparing Fig. 1(b) and (d) indicates that reducing object sizes also avoids congestion in the condensed synopsis.
Fig. 2 presents the main procedure of our approach, which can be divided into three parts. First, moving objects are identified and segmented using background subtraction. Second, an optimization framework determines an optimal mapping from the input video to the synopsis video; in this part, the moving objects are shifted in the temporal domain and a reduction coefficient is calculated for each object to avoid collisions. Finally, each object is stitched into the reconstructed background according to the results of the second part.
Fig. 1: (a) Collisions occur while shifting objects along the time axis. (b) Crowded conditions occur. (c) Collisions are reduced by decreasing the sizes of objects. (d) Collisions are fewer, and the scene is less crowded than before.

Fig. 2: Flow chart
Object detection and segmentation

In object-based approaches, object detection and segmentation are the most essential steps. Many detection and segmentation approaches have been proposed recently [15]-[20], but they are not directly suited to our problem. For video synopsis, a qualitative criterion is to preserve the most interesting activities. Similar to [1] and [8], we generally define a moving object as an interesting one for synopsis. However, not all moving objects in the frames deserve the user's attention, and not all static objects can be ignored. For example, swaying leaves are not significant information, whereas a motionless person may be important according to the user's definition. Hence, these exceptions must be considered at the object detection stage.
Similar to [1], [8], [10], and [13], a background subtraction method is employed here to extract the moving objects. Before segmenting objects from the original video, the background must be reconstructed. In most cases the surveillance camera is static, so the background changes only slightly and slowly, owing to variations in illumination or camera vibration. Based on this observation, a temporal median over several surrounding video clips, each consisting of a fixed number of frames, is used to represent the background of the current frame. In this method, the video within one minute of the current frame (30 seconds before and 30 seconds after it) is used to calculate the median. As a consequence, objects that remain stationary for a long time become part of the background. Moving objects are then identified from the difference between the current frame and its corresponding background. To segment the moving objects, a mask of foreground pixels is first constructed, and 2D morphological dilation and erosion are applied to all mask frames to obtain a more precise result.
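The background and foreground steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are assumed to be grayscale images stored as lists of rows of integers, `half_window` stands for 30 seconds of frames (30 × fps, with the frame rate an assumption), the difference threshold is a hypothetical parameter, and the dilation and erosion are assumed to use a 3×3 square structuring element applied in that order (a morphological closing).

```python
from statistics import median


def median_background(frames, t, half_window):
    """Per-pixel temporal median of the frames within +/- half_window of frame t.

    frames: list of grayscale frames (lists of rows of ints). For the one-minute
    window described in the text, half_window would be 30 * fps (fps assumed).
    """
    lo = max(0, t - half_window)
    hi = min(len(frames), t + half_window + 1)
    window = frames[lo:hi]
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in window) for x in range(w)] for y in range(h)]


def foreground_mask(frame, background, thresh):
    """1 where |frame - background| exceeds a (hypothetical) threshold, else 0."""
    return [[1 if abs(p - b) > thresh else 0 for p, b in zip(fr, br)]
            for fr, br in zip(frame, background)]


def _morph(mask, op):
    """Apply op (max = dilation, min = erosion) over each in-bounds 3x3 neighborhood."""
    h, w = len(mask), len(mask[0])
    return [[op(mask[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)]
            for y in range(h)]


def dilate(mask):
    return _morph(mask, max)


def erode(mask):
    return _morph(mask, min)


def clean_mask(mask):
    """Dilation followed by erosion (a closing) fills small holes in the mask."""
    return erode(dilate(mask))
```

Because the median is taken per pixel over the whole window, an object that appears in only a few frames does not affect the background, while one that stays put for most of the window is absorbed into it. A ring-shaped object whose center pixel falls below the threshold comes out of `clean_mask` as a solid blob.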
Background subtraction usually also includes some unwanted noise, such as moving leaves, among the foreground pixels. Therefore, an Aggregated Channel Features (ACF) detector is also employed to extract the moving objects. The ACF detector produces a precise bounding box for each moving object; foreground pixels lying outside the bounding box are ignored, which eliminates most noise from the foreground. After that, a large region of adjacent foreground pixels inside the bounding box is treated as a moving object.
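The detector-gating step can be sketched as below. The ACF detector itself is not reproduced; its output is assumed to be a list of bounding boxes `(x0, y0, x1, y1)` in pixel coordinates with exclusive end coordinates, and "a large region of adjacent foreground pixels" is interpreted here, as an assumption, as the largest 4-connected component of the foreground mask inside each box. The function names are hypothetical.

```python
from collections import deque


def largest_blob_in_box(mask, box):
    """Return the largest 4-connected set of foreground pixels (y, x) inside box.

    Pixels outside the box are ignored, which discards background-subtraction
    noise such as moving leaves. box = (x0, y0, x1, y1), end-exclusive.
    """
    h, w = len(mask), len(mask[0])
    x0, y0 = max(0, box[0]), max(0, box[1])  # clip the box to the frame
    x1, y1 = min(w, box[2]), min(h, box[3])
    seen, best = set(), set()
    for y in range(y0, y1):
        for x in range(x0, x1):
            if not mask[y][x] or (y, x) in seen:
                continue
            # Breadth-first flood fill restricted to the box.
            component, queue = set(), deque([(y, x)])
            seen.add((y, x))
            while queue:
                cy, cx = queue.popleft()
                component.add((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (y0 <= ny < y1 and x0 <= nx < x1
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    return best


def objects_from_detections(mask, boxes):
    """One pixel set per detected bounding box."""
    return [largest_blob_in_box(mask, box) for box in boxes]
```

Keeping only the largest component inside each box drops isolated noise pixels that the detector box happens to contain, in addition to everything outside the box.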
Fig. 3: (a) Detection of moving objects by background subtraction (b) Detection using morphological filters (c) Moving objects detected using
bounding box
Optimization Framework
This section describes the optimization framework in detail. The framework penalizes the loss of activity information, collision artifacts, the size reduction of segmented objects, and violations of the temporal relationships between moving objects. Using this framework, an appropriate mapping is obtained that gives the new time positions of the detected objects in the synopsis video and determines their reduction coefficients. A video synopsis based on this mapping maximizes activity information and preserves the temporal relationships between moving objects while avoiding most collisions. Let E denote the energy function and M a mapping from the source video to the synopsis video. The objective function of the proposed approach is defined as follows:
$$E(M) = E_a(M) + E_c(M) \cdot E_r(M) + E_t(M) \tag{1}$$
where $E_a(M)$ is the activity cost of the current mapping, $E_c(M)$ is the collision cost, $E_r(M)$ is the reduction cost, and $E_t(M)$ is the temporal consistency cost. The activity cost $E_a(M)$ encourages preserving as many observations as possible when an object is mapped to the synopsis video. It can be defined as
$$E_a(M) = \sum_{o \in O} \left( \chi_{o} - \chi_{o^*} \right) \tag{2}$$
In Eq. (2), $O$ is the set of all tubes, $o^*$ is the tube with new time positions and a new object size that results from mapping tube $o$, and $\chi_{o}$ and $\chi_{o^*}$ denote the activity (foreground observations) that tube $o$ contains before and after mapping. Because moving objects are represented by tubes, the two terms are used interchangeably in later formulas.
The second term of Eq. (1) is the product of $E_c(M)$ and $E_r(M)$. $E_c(M)$ penalizes collisions generated by shifting moving objects along the time axis, and $E_r(M)$ penalizes reducing the sizes of objects. The sizes of moving objects are reduced to decrease collisions, but they cannot be reduced indefinitely. Reducing the size of an object decreases the collision cost, so collisions between moving objects can be avoided to a great extent; however, it also increases the reduction cost, and objects in the synopsis video may become difficult to identify. The multiplicative form of these two cost terms is therefore used to strike a compromise between alleviating collisions and keeping objects identifiable in the condensed synopsis. The two cost terms are defined as follows:
$$E_c(M) = \alpha \sum_{o, p \in O,\; o \neq p} C(o^*, p^*) \tag{3}$$
$$E_r(M) = \sum_{o \in O} R(x_o, A_o) \tag{4}$$
In Eq. (3), $o^*$ and $p^*$ are the mapping results of moving objects $o$ and $p$, respectively, $C(o^*, p^*)$ measures the collision (spatiotemporal overlap) between them, and $\alpha$ is a weight that can be adjusted to the user's needs. For example, if collisions cannot be tolerated, a larger value of $\alpha$ can be set; this increases the importance of $E_c(M)$, so collisions are suppressed more strongly during optimization and the final mapping contains fewer collisions. In Eq. (4), $X$ is the set of reduction coefficients, $x_o \in X$ is the coefficient of object $o$, $A_o$ is the area of the bounding box of object $o$, and $R(x_o, A_o)$ is the penalty for shrinking object $o$. $E_t(M)$ prevents serious violations of the temporal relationships between moving objects and is defined as follows:
$$E_t(M) = \beta \sum_{o, p \in O,\; o \neq p} T(o^*, p^*) \tag{5}$$
where $\beta$ is also a weight and $T(o^*, p^*)$ penalizes changes in the relative temporal order of objects $o$ and $p$. To preserve the temporal relationships among objects as much as possible, $\beta$ should be set to a larger value, increasing the importance of $E_t(M)$ in the objective function. Conversely, reducing $\beta$ can also decrease collisions to a certain extent, because reordering objects may make it easier to reduce overlapping areas.
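To make the structure of Eq. (1) concrete, the sketch below evaluates the total energy of one candidate mapping. The exact cost functions are not given in recoverable form, so every term here is an assumed, illustrative choice rather than the paper's definition: a tube is a source start frame plus one bounding box per frame; the collision term sums bounding-box overlaps in shared synopsis frames, weighted by `alpha`; the reduction penalty for each object is taken as A_o / x_o, with A_o the mean box area; the temporal term counts pairs whose order is reversed, weighted by `beta`; and the activity term counts the box area of frames that fall outside a synopsis of `length` frames. Boxes are scaled about their geometric centers, as the conclusion describes. All function names are hypothetical.

```python
from itertools import combinations


def scale_box(box, s):
    """Shrink box (x0, y0, x1, y1) by factor s about its geometric center."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    hw, hh = (x1 - x0) * s / 2, (y1 - y0) * s / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)


def area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def overlap(a, b):
    """Intersection area of two boxes (0 if they do not intersect)."""
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def energy(tubes, mapping, length, alpha=1.0, beta=1.0):
    """E(M) = E_a + E_c * E_r + E_t with the illustrative terms described above.

    tubes:   {id: {"start": source_start_frame, "boxes": [box per frame]}}
    mapping: {id: (synopsis_start_frame, reduction_coefficient)}
    """
    e_a, frames = 0.0, {}
    for k, tube in tubes.items():
        start, s = mapping[k]
        for i, box in enumerate(tube["boxes"]):
            t = start + i
            if 0 <= t < length:
                frames.setdefault(t, []).append(scale_box(box, s))
            else:
                e_a += area(box)  # activity lost outside the synopsis
    e_c = alpha * sum(overlap(a, b)
                      for boxes in frames.values()
                      for a, b in combinations(boxes, 2))
    e_r = sum(sum(map(area, tube["boxes"])) / len(tube["boxes"]) / mapping[k][1]
              for k, tube in tubes.items())  # A_o / x_o, A_o = mean box area
    e_t = beta * sum(1 for p, q in combinations(tubes, 2)
                     if (tubes[p]["start"] - tubes[q]["start"])
                     * (mapping[p][0] - mapping[q][0]) < 0)  # order reversed
    return e_a + e_c * e_r + e_t
```

The assumed A_o / x_o penalty equals A_o for an unscaled object, so the product term still penalizes collisions when no object is shrunk, and it grows without bound as x_o approaches 0. With two 10×10 objects mapped onto the same frame, halving both reduction coefficients halves the product term under these assumptions (from 20000 to 10000), because overlap shrinks with the square of the coefficient while the penalty grows only with its inverse.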
Overall, undesired outcomes such as collisions increase the energy of the objective function during optimization. Therefore, the mapping with the minimum energy preserves activity information to a large extent while also preserving the temporal relations among objects. It also reduces collisions and avoids shrinking objects to a much smaller size. A Gaussian mixture model (GMM) can be used to classify the detected moving objects and to estimate their new positions, so that collisions caused by the moving objects can be avoided to a greater extent.
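Finding the minimum-energy mapping is a combinatorial search. For a handful of objects it can be illustrated by exhaustive enumeration over a small set of candidate start frames and reduction coefficients, as in the sketch below. This is an assumption for illustration only; longer videos would need an approximate optimizer such as the simulated annealing or graph-cut techniques mentioned in the related work. The `toy_energy` function is a hypothetical stand-in for Eq. (1) with two identical 10×10 objects, in which starting an object later in the synopsis carries an assumed cost.

```python
from itertools import product


def best_mapping(object_ids, starts, scales, energy_fn):
    """Evaluate every assignment of (start, scale) to each object.

    Cost grows as (len(starts) * len(scales)) ** len(object_ids), so this is
    only practical for a few objects. Ties keep the first mapping found.
    """
    choices = list(product(starts, scales))
    best, best_e = None, float("inf")
    for combo in product(choices, repeat=len(object_ids)):
        mapping = dict(zip(object_ids, combo))
        e = energy_fn(mapping)
        if e < best_e:
            best, best_e = mapping, e
    return best, best_e


def toy_energy(m):
    """Hypothetical energy for two 10x10 objects p and q (p appears first)."""
    (tp, sp), (tq, sq) = m["p"], m["q"]
    collision = 100 * min(sp, sq) ** 2 if tp == tq else 0.0  # overlap of centered boxes
    reduction = 100 / sp + 100 / sq                          # A_o / x_o per object
    order = 1000 if tp > tq else 0                           # temporal order reversed
    shift = 10000 * (tp + tq)                                # assumed cost of late starts
    return collision * reduction + order + shift


mapping, e = best_mapping(["p", "q"], starts=[0, 1], scales=[1.0, 0.5], energy_fn=toy_energy)
print(mapping, e)  # {'p': (0, 1.0), 'q': (0, 0.5)} 7500.0
```

In this toy setting, shrinking one of the two overlapping objects (energy 7500) beats both leaving them at full size (20000) and starting one of them a frame later (10000), which is the kind of trade-off the multiplicative collision and reduction term is meant to arbitrate.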
Fig. 4: (a) Predicted object position (b) Estimated object position (c) New estimated position
Stitching Objects into Background

After the optimization process, the scaled-down objects are stitched back into the background to create the synopsis video. To stitch the moving objects, a background video of user-defined length is generated and the objects are added to it. As mentioned in the previous section, a corresponding background image is reconstructed for each frame of the original video using a temporal median over a few video clips. From these reconstructed background images, the background video is produced by uniform temporal sampling over the active periods. This approach is simple but effective only for short videos; for a more sophisticated background video, the time-lapse background of [1] can be used. Since perfect segmentation of all objects is not possible, direct stitching without any further processing produces visible seams.
In this case, Poisson Image Editing [12] can be employed. Poisson Image Editing is an effective image fusion tool that blends the gradients of the pasted region with the target image so that two images fuse smoothly. It therefore reduces seams, and it is widely used in many object-based video synopsis techniques [1], [8]. Furthermore, the time at which an object appears in the original video can be calculated from the frame rate and the object's original frame number. This time can then be marked on the corresponding object in the synopsis, which provides an effective index into the original video and compensates for some temporal inconsistency that may occur.
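Two of the bookkeeping steps in this section are simple enough to sketch directly: choosing which reconstructed background images form a background video of user-defined length by uniform temporal sampling, and converting an object's original frame number into a time label using the frame rate. Both functions are hypothetical helpers written for illustration; `active_frames` (indices of original frames in active periods) and `start_seconds` (an offset for the recording's start) are assumed inputs, and Poisson blending itself is not shown.

```python
def sample_background_indices(active_frames, length):
    """Pick `length` background frame indices spread uniformly over active_frames."""
    n = len(active_frames)
    if length <= 0 or n == 0:
        return []
    if length == 1:
        return [active_frames[0]]
    # Integer spacing keeps the first and last active frames in the sample.
    return [active_frames[i * (n - 1) // (length - 1)] for i in range(length)]


def frame_to_timestamp(frame_index, fps, start_seconds=0.0):
    """Time label HH:MM:SS at which an original frame was recorded."""
    total = int(start_seconds + frame_index / fps)
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(sample_background_indices(list(range(100, 110)), 4))  # [100, 103, 106, 109]
print(frame_to_timestamp(90000, 25))                        # 01:00:00
```

If the requested length exceeds the number of active frames, the integer spacing simply repeats background images, which keeps the sketch valid for any user-defined length.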
Fig. 5: Single frame in synopsis video
IV. CONCLUSION
Video synopsis is an efficient way to condense the activities in a long surveillance video into a short synopsis video. The compact synopsis enables effective access to the activities in the original video while preserving most of them. In this paper, an approach is proposed that decreases collisions in the synopsis by reducing the sizes of moving objects within a set limit. The approach determines the optimal time positions as well as appropriate reduction coefficients for all detected objects, while the geometric center of each moving object is kept unchanged to preserve its spatial location information. The results show that the approach is an effective method for video synopsis.
ACKNOWLEDGEMENT
I would like to express my gratitude to all the other faculty members of the Department of Electronics & Communication Engineering.
REFERENCES
[1] Y. Pritch, A. Rav-Acha, and S. Peleg, "Nonchronological video synopsis and indexing," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11, pp. 1971-1984, Nov. 2008.
[2] B. M. Wildemuth et al., "How fast is too fast? Evaluating fast forward surrogates for digital video," in Proc. Joint Conf. Digit. Libraries, May 2003, pp. 221-230.
[3] N. Petrovic, N. Jojic, and T. S. Huang, "Adaptive video fast forward," Multimedia Tools Appl., vol. 26, no. 3, pp. 327-344, 2005.
[4] X. Zhu, X. Wu, J. Fan, A. K. Elmagarmid, and W. G. Aref, "Exploring video content structure for hierarchical summarization," Multimedia Syst., vol. 10, no. 2, pp. 98-115, 2004.
[5] T. Liu, X. Zhang, J. Feng, and K.-T. Lo, "Shot reconstruction degree: A novel criterion for key frame selection," Pattern Recognit. Lett., vol. 25, no. 12, pp. 1451-1457, 2004.
[6] J. Nam and A. H. Tewfik, "Video abstract of video," in Proc. IEEE 3rd Workshop Multimedia Signal Process., Sep. 1999, pp. 117-122.
[7] Y.-F. Ma and H.-J. Zhang, "A model of motion attention for video skimming," in Proc. Int. Conf. Image Process., vol. 1, 2002, pp. I-129-I-132.
[8] Y. Nie, C. Xiao, H. Sun, and P. Li, "Compact video synopsis via global spatiotemporal optimization," IEEE Trans. Vis. Comput. Graphics, vol. 19, no. 10, pp. 1664-1676, Oct. 2013.
[9] C. Pal and N. Jojic, "Interactive montages of sprites for indexing and summarizing security video," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2005, p. 1192.
[10] H. Kang, Y. Matsushita, X. Tang, and X. Chen, "Space-time video montage," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2006, pp. 1331-1338.
[11] N. Manickam and S. Chandran, "Automontage: Photo sessions made easy," in Proc. IEEE Int. Conf. Image Process., Sep. 2013, pp. 1321-1325.
[12] Z. Li, P. Ishwar, and J. Konrad, "Video condensation by ribbon carving," IEEE Trans. Image Process., vol. 18, no. 11, pp. 2572-2583, Nov. 2009.
[13] A. Rav-Acha, Y. Pritch, and S. Peleg, "Making a long video short: Dynamic video synopsis," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2006, pp. 435-441.
[14] Y. Pritch, S. Ratovitch, A. Hendel, and S. Peleg, "Clustered synopsis of surveillance video," in Proc. 6th IEEE Int. Conf. Adv. Video Signal Based Surveill., Sep. 2009, pp. 195-200.
[15] E. P. Bennett and L. McMillan, "Computational time-lapse video," ACM Trans. Graph., vol. 26, no. 3, 2007, Art. ID 102.
[16] B. T. Truong and S. Venkatesh, "Video abstraction: A systematic review and classification," ACM Trans. Multimedia Comput., Commun. Appl., vol. 3, no. 1, 2007, Art. ID 3.
[17] G. Ciocca and R. Schettini, "An innovative algorithm for key frame extraction in video summarization," J. Real-Time Image Process., vol. 1, no. 1, pp. 69-88, 2006.
[18] H. Liu, W. Meng, and Z. Liu, "Key frame extraction of online video based on optimized frame difference," in Proc. 9th Int. Conf. Fuzzy Syst. Knowl. Discovery, May 2012, pp. 1238-1242.
[19] C. M. Taskiran, Z. Pizlo, A. Amir, D. Ponceleon, and E. J. Delp, "Automated video program summarization using speech transcripts," IEEE Trans. Multimedia, vol. 8, no. 4, pp. 775-791, Aug. 2006.
[20] Y.-F. Ma, X.-S. Hua, L. Lu, and H.-J. Zhang, "A generic framework of user attention model and its application in video summarization," IEEE Trans. Multimedia, vol. 7, no. 5, pp. 907-919, Oct. 2005.
[21] X. Zhu, C. C. Loy, and S. Gong, "Video synopsis by heterogeneous multi-source correlation," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 81-88.
