
2012 IEEE Control and System Graduate Research Colloquium (ICSGRC 2012)

Video Stabilization based on Point Feature Matching Technique
Labeeb Mohsin Abdullah, Nooritawati Md Tahir* & Mustaffa Samad
Faculty of Electrical Engineering
Universiti Teknologi MARA (UiTM)
40450, Shah Alam, SELANGOR
Corresponding author: norita_tahir@yahoo.com*


Abstract: This study proposes an algorithm to stabilize jittery videos directly, without the need to estimate camera motion. A stable output video is attained, free of the jitter caused by shaking of the handheld camera during video recording. Firstly, salient points are identified and processed from each frame of the input video, followed by optimization and stabilization of the video. Optimization covers both the quality of the video stabilization and the reduction of unaligned area left after the stabilization process. The method showed good results in terms of stabilization and removed distortion from output videos recorded in different circumstances. Initial results show that the proposed technique is suitable for use and provides a great deal of stabilization.


Keywords: image processing, video stabilization, point feature matching, salient points, image quality measurement
I. INTRODUCTION
Recently, the market for handheld cameras has grown rapidly. However, video captured by non-professional users normally suffers from unanticipated effects. Hence, many researchers have studied such drawbacks in order to enhance the quality of casual videos. Currently, hardware stabilizers attached to cameras are an effective solution. On one hand, pre-processing techniques such as nonlinear filters are applied to discard unwanted noise. On the other hand, using multiple stages of pre- and post-processing can aggravate the existing problems through error accumulation. Moreover, there are shortcomings in processing videos with complicated motion, such as multiple moving foreground objects [1].
Generally, the process of stabilization has to go through three phases, namely motion estimation, motion smoothing and image composition [2]. The purpose of the first phase is to estimate the motion between frames. After that, the estimated motion parameters obtained from the first phase are sent to motion compensation, where the high-frequency distortion is removed and the global transformation is calculated, which is essential for stabilizing the current frame. Next, warping is performed by image composition for the frame under processing [9]. This three-step framework provides the essential steps of most video stabilization algorithms.

II. MATERIALS AND METHOD

This section presents an overview of the proposed methodology and its implementation, as depicted in Figure 1. The pipeline comprises the following steps:

1. Read the frames from the recorded video sequence.
2. Identify salient points from each frame (Harris corner detection).
3. Select correspondences between points (sum of squared differences, SSD).
4. Estimate the transform from the noisy correspondences, starting from the initial points of the two frames.
5. Apply a Gaussian filter.
6. Produce a color composite of the affine and S-R-T transform outputs.
7. Approximate and smooth the transform.
8. Output the corrected frame sequence.

Figure 1: Overview of the proposed method of video stabilization.
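The Gaussian filtering and transform-smoothing stages of Figure 1 can be illustrated with a short sketch. The following is a minimal illustration, not the authors' implementation: it assumes the per-frame translations `dx` and `dy` have already been estimated earlier in the pipeline, accumulates them into a camera trajectory, and low-passes that trajectory with a Gaussian filter to obtain a per-frame correction.

```python
# Sketch of the smoothing stage in Figure 1 (illustrative assumption):
# integrate inter-frame motion into a trajectory, Gaussian-smooth it,
# and return the per-frame corrections that cancel the jitter.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_trajectory(dx, dy, sigma=5.0):
    # Integrate inter-frame motion into an absolute camera trajectory.
    traj_x = np.cumsum(dx)
    traj_y = np.cumsum(dy)
    # Gaussian low-pass keeps intentional motion, removes hand shake.
    smooth_x = gaussian_filter1d(traj_x, sigma)
    smooth_y = gaussian_filter1d(traj_y, sigma)
    # The correction is the gap between the actual and smoothed paths.
    return smooth_x - traj_x, smooth_y - traj_y
```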



A. Identification of salient points from each frame & Harris Corner Detection
The main goal of this step is to correct the distortion between the two frames by finding a suitable transformation; this is done by applying an estimator object that returns an affine transform [3]. At this stage, the object must be supplied with a set of point correspondences between the two frames [4]. Firstly, the points of interest in the two chosen frames have to be identified, followed by selecting the common correspondences between the frames. At this point, candidate points for each frame are identified, but to make sure that these points have corresponding points in the second frame, it is necessary to find points around salient image features, such as corners. Thus, a corner detector object is used to find corner values using Harris Corner Detection, one of the fastest corner detection algorithms.
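The paper uses a MATLAB corner detector system object; a minimal OpenCV equivalent, written here as an assumed sketch rather than the authors' code, could look as follows.

```python
# Sketch of the salient-point step using OpenCV's Harris-based corner
# picker. Function and parameter values are illustrative assumptions.
import cv2
import numpy as np

def detect_corners(frame, max_corners=200):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # goodFeaturesToTrack with useHarrisDetector=True applies the
    # Harris-Stephens corner response [2] and keeps the strongest points.
    pts = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7,
                                  useHarrisDetector=True, k=0.04)
    return pts.reshape(-1, 2) if pts is not None else np.empty((0, 2))
```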

Figure 2: The detected strong corners from both frames, marked with green dots.

B. Select correspondences between points & SSD
After the salient points of each frame are obtained, the correspondences between the previously identified points need to be picked [4]. For each point, the lowest-cost match between the points existing in frame A and frame B must be found. Hence, a 9 x 9 block is extracted around each point in the frame image. The matching cost means the distance between frame A and frame B, measured in pixels. To find this cost, the technique of Sum of Squared Differences (SSD) is used between the consecutive frame images. Each point in frame A is compared with the points in frame B to find the lowest matching cost, in other words the shortest distance between them measured in pixels.
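A minimal sketch of this matching step, under the assumption of 9 x 9 pixel blocks and grayscale frames (names are illustrative, not from the paper):

```python
# Compare a 9x9 block around each corner in frame A against blocks
# around every candidate corner in frame B; keep the lowest SSD cost.
import numpy as np

def match_ssd(gray_a, gray_b, pts_a, pts_b, half=4):
    def block(img, x, y):
        return img[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)

    matches = []
    for (xa, ya) in pts_a.astype(int):
        best_cost, best_pt = np.inf, None
        for (xb, yb) in pts_b.astype(int):
            pa, pb = block(gray_a, xa, ya), block(gray_b, xb, yb)
            if pa.shape != (2 * half + 1, 2 * half + 1) or pa.shape != pb.shape:
                continue  # skip points too close to the image border
            cost = np.sum((pa - pb) ** 2)  # sum of squared differences
            if cost < best_cost:
                best_cost, best_pt = cost, (xb, yb)
        if best_pt is not None:
            matches.append(((xa, ya), best_pt, best_cost))
    return matches
```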


III. RESULTS AND DISCUSSION

In this section, the results attained with the proposed methodology are discussed. Table I shows the basic characteristics of each video utilized in the database for this study. In addition, the size and the number of bytes of the salient points in each video are also tabulated.
A. Strong corners detection
Firstly, an algorithm is developed based on the Harris and Stephens corner detection algorithm [2] to identify all salient points, or strong corners, in each frame. These points are considered the anchor points, serving as a benchmark for deciding which points are kept. Samples of the detected points obtained from two frames are demonstrated in Figure 2. Furthermore, it is observed that the detected points cover the same frame features, for instance the salient points along the trees, the corners of the sidewalk and the moving object.
B. Corresponding points
Next, the initial correspondences between the points identified in the previous step are invoked. Correspondences between the invoked points have to be picked for each point; for that purpose, a 9 x 9 block is extracted around each point from its consecutive image frames. The key operation here is matching the cost between points by finding the Sum of Squared Differences (SSD) between the corresponding regions of consecutive frames; the lowest costs are retained in the solution [8]. Figure 3 shows the initial corresponding points, marked in green at the same positions in both frames.

Figure 3: Corresponding points between frames.

However, not all of these correspondence points are correct: many of them are redundant, and there is a significant number of outlier points as well. This shortcoming is addressed in the next step. SSD ensures that the minimum-cost matching point among the points of frame B is found with the aid of the extracted features, resulting in a loop over the points of frame A that searches for the best matches among the points of frame B.




C. Accurate correspondence
As mentioned above, there are several incorrect point correspondences, but a strong estimate of the geometric transform between the two image frames can still be determined using the random sample consensus algorithm (RANSAC) [5][6]. This algorithm searches through the given set of point correspondences for a valid, consistent correspondence set, as in Figure 4.

Figure 4: Correct correspondences according to RANSAC.

From Figure 4, it is observed that the inlier correspondences are concentrated in the image background, not in the foreground, which itself is not aligned. The reason behind this is that the background features are far enough away to act as if they were on an infinitely distant plane. We can assume that the background plane is static and will not change dramatically between the first and second frames; instead, this transform captures the motion of the camera. Thus, the correction process will stabilize the video. Furthermore, as long as the motion of the camera between frame A and frame B is small, or the sampling rate of the video is high enough, this condition is maintained. The RANSAC algorithm is repeated multiple times, and at each run the cost of the result is calculated by projecting frame B onto frame A and taking the Sum of Absolute Differences between the two image frames; the results attained are as in Table I.

Firstly, the output reflects the effect of the number of corners: Vid1 attained the highest matching-point values but the least SSD, followed by Vid3 and Vid2 respectively. This indicates that Vid2 comprised the maximum number of salient points to be handled, since the SSD attained is the highest.
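A minimal sketch of this robust estimation and its SAD cost, using OpenCV's RANSAC-based affine estimator as an assumed stand-in for the paper's MATLAB implementation:

```python
# Fit a similarity (S-R-T) transform to the noisy correspondences with
# RANSAC [5][6], then score it by the Sum of Absolute Differences after
# warping frame B onto frame A. Illustrative, not the authors' code.
import cv2
import numpy as np

def estimate_and_score(gray_a, gray_b, pts_a, pts_b):
    # RANSAC discards the outlier correspondences.
    M, inliers = cv2.estimateAffinePartial2D(
        pts_b.astype(np.float32), pts_a.astype(np.float32),
        method=cv2.RANSAC, ransacReprojThreshold=3.0)
    h, w = gray_a.shape
    warped_b = cv2.warpAffine(gray_b, M, (w, h))
    # SAD cost of projecting frame B onto frame A.
    sad = np.sum(np.abs(gray_a.astype(np.float64) - warped_b))
    return M, inliers, sad
```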

D. Frames Correction
Further, the mean of the raw video frames and the mean of the corrected frames are computed, as in Figure 5.

Figure 5: Corrected frames (mean of the raw input frames, left; mean of the corrected frames, right).

The left image shows the mean of the raw input frames, which resembles the distorted original video frames due to extreme jitter. On the right side is the mean of the corrected frames, with far less distortion. This demonstrates that the stabilization algorithm worked well. Several more samples of corrected video frames are depicted in Figure 6.
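This visual check can be reproduced with a one-line mean over the frame stacks; `raw_frames` and `corrected_frames` are assumed lists of equally sized grayscale images, not variables from the paper:

```python
# The mean of the raw frames looks blurry under jitter, while the mean
# of the corrected frames is noticeably sharper (cf. Figure 5).
import numpy as np

def mean_frame(frames):
    return np.mean(np.stack(frames).astype(np.float64), axis=0)

# blur_raw = mean_frame(raw_frames)        # distorted-looking average
# blur_fix = mean_frame(corrected_frames)  # noticeably sharper average
```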



E. Quality
The output video quality is also measured for the proposed method. It is evaluated based on an SVD-based grayscale image value and a graphical measurement.

i. SVD Based Grayscale Image Quality
Singular value decomposition (SVD) has been developed as a measurement that can express the quality of distorted images either graphically, as a 2-D measurement, or numerically, as a scalar measurement, both near and above the visual threshold. The experiments here utilized the SVD-based measurement, which outperformed the commonly used PSNR [10]. Equation (1) gives the computed value:




\mathrm{M\text{-}SVD} = \frac{\sum_{i=1}^{(k/n)^2} \lvert D_i - D_{mid} \rvert}{(k/n)^2} \qquad (1)

where D_mid represents the midpoint of the sorted D_i values, k is the image size, n is the block size, and M-SVD is the singular value decomposition based measure.
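Equation (1) can be sketched in NumPy as follows. This is an illustrative implementation following the description in [10], not code from the paper: D_i is taken as the Euclidean distance between the singular values of corresponding n x n blocks of the original and distorted images.

```python
# M-SVD quality measure: per-block singular-value distance D_i, then
# the average deviation of D_i from the midpoint of the sorted D_i.
import numpy as np

def m_svd(original, distorted, n=8):
    k = original.shape[0]  # assumes square k x k grayscale images
    d = []
    for r in range(0, k, n):
        for c in range(0, k, n):
            s_o = np.linalg.svd(original[r:r+n, c:c+n], compute_uv=False)
            s_d = np.linalg.svd(distorted[r:r+n, c:c+n], compute_uv=False)
            d.append(np.sqrt(np.sum((s_o - s_d) ** 2)))  # D_i per block
    d = np.array(d)
    d_mid = np.sort(d)[len(d) // 2]  # midpoint of the sorted D_i
    return np.sum(np.abs(d - d_mid)) / len(d)
```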
As an example of the output quality for Vid1, Equation (1) with k = 8 and n = 1, with D_i and D_mid represented by a 256 x 256 matrix, attained an M-SVD of 22.20. The numerical quality values obtained for the three sample videos are tabulated in Table I. As visualized in Figure 6, Vid3 obtained the best quality, with a calculated value of 40.50%, followed by Vid2 with 39.21% and Vid1 with 22.20%. This indicates that Vid1 suffered great distortion, whilst Vid3 is the least distorted.
Figure 6: Three input videos (Vid1, Vid2 & Vid3) requiring different degrees of stabilization, since they were recorded in different circumstances.

ii. Graphical measurement
The graphical quality of any image or frame can be measured as shown in Figure 7. The graphical measurement indicates the condition of the video with respect to distortion. As seen in the graphical results, Vid1 has the largest block sizes among all the frame blocks, indicating that it is the video that experienced the worst distortion, followed by Vid2 and Vid3. This result agrees with the SVD values calculated earlier.

Figure 7: Graphical measurement for each sample video (Vid1, Vid2 & Vid3) as an indication of stabilization quality.

IV. CONCLUSION
In conclusion, the video stabilization technique based on the proposed method showed remarkable results in terms of stabilizing highly jittery videos suffering from distortion. Initial results also proved that, through the fusion of the RANSAC algorithm, Gaussian filtering, the Harris and Stephens corner detector, and SAD, an efficient stabilization process succeeded based on the output quality attained. Future work includes finding a better feature detector and overcoming the consequences of extreme shaking of a handheld camera in a feasible real-time implementation of video stabilization.


Acknowledgment
Funding for presenting this study was supported by the Faculty of Electrical Engineering, UiTM Shah Alam, Selangor.




Table I: Criteria of sample videos & results (Type: RGB; Extension: AVI)

Sample Inputs | Size  | Bytes | Frames # & Length | SAD Value | Computational Time (s) | Quality Value (M-SVD)
Vid1          | 2x139 | 1112  | 34 & 2            | 8.85e5    | 8.81                   | 22.20%
Vid2          | 2x66  | 528   | 73 & 3            | 0.0824    | 9.18                   | 39.21%
Vid3          | 2x128 | 1024  | 132 & 4           | 3.553e3   | 9.28                   | 40.50%

REFERENCES
[1] M. Gleicher and F. Liu, "Re-cinematography: Improving the camerawork of casual video," ACM Transactions on Multimedia Computing, Communications, and Applications, 5(1), pp. 1-28, 2008.
[2] C. Harris and M. J. Stephens, "A combined corner and edge detector," Proc. of Alvey Vision Conference, pp. 147-152, 1988.
[3] A. Suneja and G. Kumar, "An experimental study of edge detection methods in digital image," Global Journal of Computer Science and Technology, 10(2), 2010.
[4] http://www.mathworks.com/products/computervision/demos.html?file=/products/demos/shipping/vision/videostabilize_pm.html
[5] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, 24, 1981.
[6] B. Tordoff and D. W. Murray, "Guided sampling and consensus for motion estimation," 7th European Conference on Computer Vision, 2002.
[7] J. Jin, Z. Zhu, and G. Xu, "Digital video sequence stabilization based on 2.5D motion estimation and inertial motion filtering," Real-Time Imaging, 7(4), pp. 357-365, 2001.
[8] http://siddhantahuja.wordpress.com/tag/sum-of-squared-differences/
[9] M. Pilu, "Video stabilization as a variation problem and numerical solution with the Viterbi method," Proceedings of Computer Vision and Pattern Recognition, pp. 625-630, 2004.
[10] A. Shnayderman, A. Gusev, and A. M. Eskicioglu, "An SVD-based grayscale image quality measure for local and global assessment," IEEE Transactions on Image Processing, 15(2), 2006.


