
2012 IEEE Control and System Graduate Research Colloquium (ICSGRC 2012)

Video Stabilization based on Point Feature Matching Technique
Labeeb Mohsin Abdullah, Nooritawati Md Tahir* & Mustaffa Samad
Faculty of Electrical Engineering
Universiti Teknologi MARA (UiTM)
40450, Shah Alam, SELANGOR
Corresponding author: norita_tahir@yahoo.com*


Abstract: This study proposes an algorithm to stabilize jittery videos directly, without the need to estimate camera motion. A stable output video is attained, free of the jitter caused by shaking of the handheld camera during video recording. Firstly, salient points are identified and processed from each frame of the input video, followed by optimization and stabilization of the video. Optimization covers both the quality of the video stabilization and the reduction of unaligned area left after the stabilization process. The method showed good results in terms of stabilization and removed distortion from output videos recorded in different circumstances. Initial results show that the proposed technique is suitable for use and provides a great deal of stabilization.


Keywords: image processing, video stabilization, point feature matching, salient points, image quality measurement
I. INTRODUCTION
Recently, the market for handheld cameras has grown rapidly. However, video captured by non-professional users normally suffers from unanticipated effects. Hence, many researchers have studied such drawbacks in order to enhance the quality of casual videos. Currently, hardware stabilizers attached to cameras are an effective solution. On one hand, pre-processing techniques such as nonlinear filters are applied to discard unwanted noise. On the other hand, using multiple stages of pre- and post-processing can aggravate the existing problems through error accumulation. Moreover, there are shortcomings in processing videos with complicated motion, such as multiple moving foreground objects [1].
Generally, the process of stabilization has to go through three phases, namely motion estimation, motion smoothing and image composition [2]. The purpose of the first phase is to estimate the motion between frames. After that, the estimated motion parameters obtained from the first phase are sent to motion compensation, where the high-frequency distortion is removed and the global transformation is calculated, which is essential for stabilizing the current frame. Next, warping is performed by image composition for the frame under processing [9]. This three-step framework provides the essential steps of most video stabilization algorithms.

II. MATERIALS AND METHOD

This section presents an overview of the proposed methodology and its implementation, as depicted in Figure 1. The pipeline comprises the following steps:

1. Read the frames from the recorded video sequence.
2. Identify salient points from each frame (Harris corner detection).
3. Select correspondences between points (sum of squared differences, SSD).
4. Estimate the transform from the noisy correspondences, starting from the initial points of the two frames.
5. Apply a Gaussian filter.
6. Produce a color composite of the affine and S-R-T transform outputs.
7. Approximate and smooth the transform.
8. Output the corrected frame sequence.

Figure 1: Overview of the proposed method of video stabilization.
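The Gaussian filtering and transform-smoothing stages of Figure 1 can be illustrated with a short sketch. The following is a minimal illustration, not the authors' implementation: it assumes the per-frame translations `dx` and `dy` have already been estimated earlier in the pipeline, accumulates them into a camera trajectory, and low-passes that trajectory with a Gaussian filter to obtain a per-frame correction.

```python
# Sketch of the smoothing stage in Figure 1 (illustrative assumption):
# integrate inter-frame motion into a trajectory, Gaussian-smooth it,
# and return the per-frame corrections that cancel the jitter.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_trajectory(dx, dy, sigma=5.0):
    # Integrate inter-frame motion into an absolute camera trajectory.
    traj_x = np.cumsum(dx)
    traj_y = np.cumsum(dy)
    # Gaussian low-pass keeps intentional motion, removes hand shake.
    smooth_x = gaussian_filter1d(traj_x, sigma)
    smooth_y = gaussian_filter1d(traj_y, sigma)
    # The correction is the gap between the actual and smoothed paths.
    return smooth_x - traj_x, smooth_y - traj_y
```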



A. Identification of salient points from each frame & Harris Corner Detection
The main goal of this step is to correct the distortion between the two frames by finding a suitable transformation; this is done by applying an estimator object that returns an affine transform [3]. At this stage, the object must be supplied with a set of point correspondences between the two frames [4]. Firstly, the points of interest in the two chosen frames have to be identified, followed by selecting the common correspondences between the frames. At this point, candidate points for each frame are identified, but to make sure that these points have corresponding points in the second frame, it is necessary to find points around salient image features, such as corners. Thus, a corner detector object is used to find corner values using Harris Corner Detection, one of the fastest corner detection algorithms.
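The paper uses a MATLAB corner detector system object; a minimal OpenCV equivalent, written here as an assumed sketch rather than the authors' code, could look as follows.

```python
# Sketch of the salient-point step using OpenCV's Harris-based corner
# picker. Function and parameter values are illustrative assumptions.
import cv2
import numpy as np

def detect_corners(frame, max_corners=200):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # goodFeaturesToTrack with useHarrisDetector=True applies the
    # Harris-Stephens corner response [2] and keeps the strongest points.
    pts = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7,
                                  useHarrisDetector=True, k=0.04)
    return pts.reshape(-1, 2) if pts is not None else np.empty((0, 2))
```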

Figure 2: The detected strong corners from both frames, marked with green dots.

B. Select correspondences between points & SSD
After the salient points of each frame are obtained, the correspondences between the previously identified points need to be picked [4]. For each point, the lowest-cost match between the points existing in frame A and frame B must be found. Hence, a 9 x 9 block is extracted around each point in the frame image. The matching cost means the distance between frame A and frame B, measured in pixels. To find this cost, the technique of Sum of Squared Differences (SSD) is used between the consecutive frame images. Each point in frame A is compared with the points in frame B to find the lowest matching cost, in other words the shortest distance between them measured in pixels.
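A minimal sketch of this matching step, under the assumption of 9 x 9 pixel blocks and grayscale frames (names are illustrative, not from the paper):

```python
# Compare a 9x9 block around each corner in frame A against blocks
# around every candidate corner in frame B; keep the lowest SSD cost.
import numpy as np

def match_ssd(gray_a, gray_b, pts_a, pts_b, half=4):
    def block(img, x, y):
        return img[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)

    matches = []
    for (xa, ya) in pts_a.astype(int):
        best_cost, best_pt = np.inf, None
        for (xb, yb) in pts_b.astype(int):
            pa, pb = block(gray_a, xa, ya), block(gray_b, xb, yb)
            if pa.shape != (2 * half + 1, 2 * half + 1) or pa.shape != pb.shape:
                continue  # skip points too close to the image border
            cost = np.sum((pa - pb) ** 2)  # sum of squared differences
            if cost < best_cost:
                best_cost, best_pt = cost, (xb, yb)
        if best_pt is not None:
            matches.append(((xa, ya), best_pt, best_cost))
    return matches
```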


III. RESULTS AND DISCUSSION

In this section, the results attained with the proposed methodology are discussed. Table I shows the basic characteristics of each video utilized in the database for this study. In addition, the size and the number of bytes of the salient points in each video are also tabulated.
A. Strong corners detection
Firstly, an algorithm is developed based on the Harris and Stephens corner detection algorithm [2] to identify all salient points, or strong corners, in each frame. These points are considered the anchor points, serving as a benchmark for deciding which points are kept. Samples of the detected points obtained from two frames are demonstrated in Figure 2. Furthermore, it is observed that the detected points cover the same frame features, for instance the salient points along the trees, the corners of the sidewalk and the moving object.
B. Corresponding points
Next, the initial correspondences between the points identified in the previous step are invoked. Correspondences between the invoked points have to be picked for each point; for that purpose, a 9 x 9 block is extracted around each point from its consecutive image frames. The key operation here is matching the cost between points by finding the Sum of Squared Differences (SSD) between the corresponding regions of consecutive frames; the lowest costs are retained in the solution [8]. Figure 3 shows the initial corresponding points, marked in green at the same positions in both frames.

Figure 3: Corresponding points between frames.

However, not all of these correspondence points are correct: many of them are redundant, and there is a significant number of outlier points as well. This shortcoming is addressed in the next step. SSD ensures that the minimum-cost matching point among the points of frame B is found with the aid of the extracted features, resulting in a loop over the points of frame A that searches for the best matches among the points of frame B.




C. Accurate correspondence
As mentioned above, there are several incorrect point correspondences, but a strong estimate of the geometric transform between the two image frames can still be determined using the random sample consensus algorithm (RANSAC) [5][6]. This algorithm searches through the given set of point correspondences for a valid, consistent correspondence set, as in Figure 4.

Figure 4: Correct correspondences according to RANSAC.

From Figure 4, it is observed that the inlier correspondences are concentrated in the image background, not in the foreground, which itself is not aligned. The reason behind this is that the background features are far enough away to act as if they were on an infinitely distant plane. We can assume that the background plane is static and will not change dramatically between the first and second frames; instead, this transform captures the motion of the camera. Thus, the correction process will stabilize the video. Furthermore, as long as the motion of the camera between frame A and frame B is small, or the sampling rate of the video is high enough, this condition is maintained. The RANSAC algorithm is repeated multiple times, and at each run the cost of the result is calculated by projecting frame B onto frame A and taking the Sum of Absolute Differences between the two image frames; the results attained are as in Table I.

Firstly, the output reflects the effect of the number of corners: Vid1 attained the highest matching-point values but the least SSD, followed by Vid3 and Vid2 respectively. This indicates that Vid2 comprised the maximum number of salient points to be handled, since the SSD attained is the highest.
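A minimal sketch of this robust estimation and its SAD cost, using OpenCV's RANSAC-based affine estimator as an assumed stand-in for the paper's MATLAB implementation:

```python
# Fit a similarity (S-R-T) transform to the noisy correspondences with
# RANSAC [5][6], then score it by the Sum of Absolute Differences after
# warping frame B onto frame A. Illustrative, not the authors' code.
import cv2
import numpy as np

def estimate_and_score(gray_a, gray_b, pts_a, pts_b):
    # RANSAC discards the outlier correspondences.
    M, inliers = cv2.estimateAffinePartial2D(
        pts_b.astype(np.float32), pts_a.astype(np.float32),
        method=cv2.RANSAC, ransacReprojThreshold=3.0)
    h, w = gray_a.shape
    warped_b = cv2.warpAffine(gray_b, M, (w, h))
    # SAD cost of projecting frame B onto frame A.
    sad = np.sum(np.abs(gray_a.astype(np.float64) - warped_b))
    return M, inliers, sad
```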

D. Frames Correction
Further, the mean of the raw video frames and the mean of the corrected frames are computed, as in Figure 5.

Figure 5: Corrected frames (mean of the raw input frames, left; mean of the corrected frames, right).

The left image shows the mean of the raw input frames, which resembles the distorted original video frames due to extreme jitter. On the right side is the mean of the corrected frames, with far less distortion. This demonstrates that the stabilization algorithm worked well. Several more samples of corrected video frames are depicted in Figure 6.
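This visual check can be reproduced with a one-line mean over the frame stacks; `raw_frames` and `corrected_frames` are assumed lists of equally sized grayscale images, not variables from the paper:

```python
# The mean of the raw frames looks blurry under jitter, while the mean
# of the corrected frames is noticeably sharper (cf. Figure 5).
import numpy as np

def mean_frame(frames):
    return np.mean(np.stack(frames).astype(np.float64), axis=0)

# blur_raw = mean_frame(raw_frames)        # distorted-looking average
# blur_fix = mean_frame(corrected_frames)  # noticeably sharper average
```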



E. Quality
The output video quality is also measured for the proposed method. It is evaluated based on an SVD-based grayscale image value and a graphical measurement.

i. SVD Based Grayscale Image Quality
Singular value decomposition (SVD) has been developed as a measurement that can express the quality of distorted images either graphically, as a 2-D measurement, or numerically, as a scalar measurement, both near and above the visual threshold. The experiments here utilized the SVD-based measurement, which outperformed the commonly used PSNR [10]. Equation (1) gives the computed value:




\mathrm{M\text{-}SVD} = \frac{\sum_{i=1}^{(k/n)^2} \lvert D_i - D_{mid} \rvert}{(k/n)^2} \qquad (1)

where D_mid represents the midpoint of the sorted D_i values, k is the image size, n is the block size, and M-SVD is the singular value decomposition based measure.
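Equation (1) can be sketched in NumPy as follows. This is an illustrative implementation following the description in [10], not code from the paper: D_i is taken as the Euclidean distance between the singular values of corresponding n x n blocks of the original and distorted images.

```python
# M-SVD quality measure: per-block singular-value distance D_i, then
# the average deviation of D_i from the midpoint of the sorted D_i.
import numpy as np

def m_svd(original, distorted, n=8):
    k = original.shape[0]  # assumes square k x k grayscale images
    d = []
    for r in range(0, k, n):
        for c in range(0, k, n):
            s_o = np.linalg.svd(original[r:r+n, c:c+n], compute_uv=False)
            s_d = np.linalg.svd(distorted[r:r+n, c:c+n], compute_uv=False)
            d.append(np.sqrt(np.sum((s_o - s_d) ** 2)))  # D_i per block
    d = np.array(d)
    d_mid = np.sort(d)[len(d) // 2]  # midpoint of the sorted D_i
    return np.sum(np.abs(d - d_mid)) / len(d)
```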
As an example of the output quality for Vid1, Equation (1) with k = 8 and n = 1, with D_i and D_mid represented by a 256 x 256 matrix, attained an M-SVD of 22.20. The numerical quality values obtained for the three sample videos are tabulated in Table I. As visualized in Figure 6, Vid3 obtained the best quality, with a calculated value of 40.50%, followed by Vid2 with 39.21% and Vid1 with 22.20%. This indicates that Vid1 suffered great distortion, whilst Vid3 is the least distorted.
Figure 6: Three input videos (Vid1, Vid2 & Vid3) requiring different degrees of stabilization, since they were recorded in different circumstances.

ii. Graphical measurement
The graphical quality of any image or frame can be measured as shown in Figure 7. The graphical measurement indicates the condition of the video with respect to distortion. As seen in the graphical results, Vid1 has the largest block sizes among all the frame blocks, indicating that it is the video that experienced the worst distortion, followed by Vid2 and Vid3. This result agrees with the SVD values calculated earlier.

Figure 7: Graphical measurement for each sample video (Vid1, Vid2 & Vid3) as an indication of stabilization quality.

IV. CONCLUSION
In conclusion, the video stabilization technique based on the proposed method showed remarkable results in terms of stabilizing highly jittery videos suffering from distortion. Initial results also proved that, through the fusion of the RANSAC algorithm, Gaussian filtering, the Harris and Stephens corner detector, and SAD, an efficient stabilization process succeeded based on the output quality attained. Future work includes finding a better feature detector and overcoming the consequences of extreme shaking of a handheld camera in a feasible real-time implementation of video stabilization.


Acknowledgment
Funding for presenting this study was supported by the Faculty of Electrical Engineering, UiTM Shah Alam, Selangor.




Table I: Criteria of sample videos & results (Type: RGB; Extension: AVI)

Sample Inputs | Size  | Bytes | Frames # & Length | SAD Value | Computational Time (s) | Quality Value (M-SVD)
Vid1          | 2x139 | 1112  | 34 & 2            | 8.85e5    | 8.81                   | 22.20%
Vid2          | 2x66  | 528   | 73 & 3            | 0.0824    | 9.18                   | 39.21%
Vid3          | 2x128 | 1024  | 132 & 4           | 3.553e3   | 9.28                   | 40.50%

REFERENCES
[1] M. Gleicher and F. Liu, "Re-cinematography: Improving the camerawork of casual video," ACM Transactions on Multimedia Computing, Communications, and Applications, 5(1), pp. 1-28, 2008.
[2] C. Harris and M. J. Stephens, "A combined corner and edge detector," Proc. of Alvey Vision Conference, pp. 147-152, 1988.
[3] A. Suneja and G. Kumar, "An experimental study of edge detection methods in digital image," Global Journal of Computer Science and Technology, 10(2), 2010.
[4] http://www.mathworks.com/products/computervision/demos.html?file=/products/demos/shipping/vision/videostabilize_pm.html
[5] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, 24, 1981.
[6] B. Tordoff and D. W. Murray, "Guided sampling and consensus for motion estimation," 7th European Conference on Computer Vision, 2002.
[7] J. Jin, Z. Zhu, and G. Xu, "Digital video sequence stabilization based on 2.5D motion estimation and inertial motion filtering," Real-Time Imaging, 7(4), pp. 357-365, 2001.
[8] http://siddhantahuja.wordpress.com/tag/sum-of-squared-differences/
[9] M. Pilu, "Video stabilization as a variation problem and numerical solution with the Viterbi method," Proceedings of Computer Vision and Pattern Recognition, pp. 625-630, 2004.
[10] A. Shnayderman, A. Gusev, and A. M. Eskicioglu, "An SVD-based grayscale image quality measure for local and global assessment," IEEE Transactions on Image Processing, 15(2), 2006.


