
A Technical Paper

On

Video Inpainting

Abstract—A framework for inpainting missing parts of a video sequence recorded with a moving or stationary camera is presented in this paper. The region to be inpainted is general: it may be still or moving, in the background or in the foreground; it may occlude one object and be occluded by some other object. The algorithm consists of a simple preprocessing stage and two steps of video inpainting. In the preprocessing stage, each frame is segmented into foreground and background. This segmentation is used to build three image mosaics that help to produce time-consistent results and also improve the performance of the algorithm by reducing the search space. In the first video inpainting step, moving objects in the foreground that are "occluded" by the region to be inpainted are reconstructed. To this end, the gap is filled as much as possible by copying information from the moving foreground in other frames, using a priority-based scheme. In the second step, the remaining hole is inpainted with the background. To accomplish this, the frames are first aligned and information is directly copied when possible. The remaining pixels are filled in by extending spatial texture synthesis techniques to the spatiotemporal domain. The proposed framework has several advantages over state-of-the-art algorithms that deal with similar types of data and constraints. It permits some camera motion, is simple to implement, fast, does not require statistical models of background or foreground, works well in the presence of rich and cluttered backgrounds, and the results show that there is no visible blurring or motion artifacts. A number of real examples are shown supporting these findings.
Index Terms—Camera motion, special effects, texture synthesis, video inpainting.

I. INTRODUCTION AND OVERVIEW

Introduction to the Video Inpainting Problem:

The problem of automatic video restoration in general, and of automatic object removal and modification in particular, is beginning to attract the attention of many researchers. This paper addresses a constrained but important case of video inpainting. Assume that the camera motion is approximately parallel to the plane of image projection, and that the scene essentially consists of a stationary background with a moving foreground, both of which may require inpainting. The algorithm described in this paper is able to inpaint objects that move in any fashion but do not change size appreciably. As we will see below, these assumptions are implicitly or explicitly present in most state-of-the-art algorithms for video inpainting, but they still leave a very challenging task and apply to numerous scenarios.

There is an important amount of user interaction: the user has to manually draw the boundaries of the different depth layers of the sequence. Also, the algorithm has to "learn" the statistics of the background. The motion of the objects in the background is restricted to be periodic, which implies that objects also do not change scale as they move, so movement is approximately on a plane parallel to the projection plane of the camera. All the examples shown involve either a static camera or a very smooth horizontal "lateral dolly" type of camera motion. The results are good, although not free from artifacts. Damaged moving objects are reconstructed by synthesizing a new undamaged object, overlaying it on the sequence, and moving it along a new, interpolated trajectory. This approach produces very noticeable artifacts where objects move in an unrealistic way (for instance, a walking person seems at some points to float over the ground).

II. ASSUMPTIONS AND PREPROCESSING

Basic Assumptions:

• The scene essentially consists of a stationary background with some moving foreground.

• Camera motion is approximately parallel to the plane of image projection. This restriction ensures that background objects will not (significantly) change size.

• Foreground objects move in a repetitive fashion. In order to recover occluded or damaged foreground without the use of probabilistic models or libraries, the vital information must be present in the video itself; hence this "periodicity" assumption.

• Moving objects do not significantly change size. This constraint can be removed by using a multiscale matching algorithm, which can address the change in size when the object moves away from or towards the camera.

Fig. 1. Overview of the proposed video inpainting algorithm.

Preprocessing:

These simple assumptions allow us to compute a rough "motion confidence mask" (MC) for each frame just by comparing it with the following frame using block matching. The median shift of all the blocks in the image gives a good estimate of the camera shift. Any block that has a considerable shift after subtracting this median camera motion is assumed to belong to the moving foreground; the static background, which moves only with the camera translation, can then be easily obtained by a simple thresholding of the block-matching result.

The image mosaics used here are able to deal with some camera motion and to speed up the inpainting process. A mosaic is a panoramic image obtained by stitching a number of frames together. In the preprocessing stage, three mosaics are built: a background mosaic, a foreground mosaic, and an optical-flow mosaic. Their computation gives us a segmentation of the sequence into foreground and background layers, as well as a good estimate of the camera shift for each frame; this shift is used to align (register) the frames. Each mosaic is built from the set of aligned overlapping frames in the following way: each pixel of the mosaic is the average of the overlapping components. This is straightforward in the case of the foreground and background mosaics. In Fig. 3, one can see the mosaics obtained from the video sequence shown in Fig. 2. For the optical-flow mosaic, which contains data used for the Sum of Squared Differences (SSD) computations described below, a two-channel image is used to store the horizontal and vertical components of the residual optical flow, that is, the motion vectors from which the camera shift has been subtracted. In Fig. 3, we use color coding to represent the direction of these 2-D vectors: green tones indicate horizontal motion and red tones indicate vertical motion. We must mention that there are more sophisticated mosaic generation techniques in the literature to handle camera motion. Hence, given that the motion of the camera does not produce any transformation beyond a translation, the three mentioned mosaics can be built. Let us now briefly review the nonparametric sampling procedure on which the inpainting itself is based.
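As an illustration, the block-matching step just described could be sketched as follows. This is a minimal Python sketch under the paper's assumptions, not the authors' implementation; the block size, search range, and threshold values are illustrative choices.

```python
import numpy as np

def motion_confidence_mask(frame, next_frame, block=16, search=8, thresh=2.0):
    """Estimate per-block motion by exhaustive block matching against the
    next frame, take the median shift of all blocks as the camera shift,
    and flag blocks whose residual shift exceeds `thresh` pixels as moving
    foreground. Parameter values are illustrative, not from the paper."""
    h, w = frame.shape
    shifts, positions = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            ref = frame[y:y + block, x:x + block].astype(float)
            best, best_dxy = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        cand = next_frame[yy:yy + block, xx:xx + block].astype(float)
                        ssd = np.sum((ref - cand) ** 2)
                        if ssd < best:
                            best, best_dxy = ssd, (dy, dx)
            shifts.append(best_dxy)
            positions.append((y, x))
    shifts = np.array(shifts, dtype=float)
    camera_shift = np.median(shifts, axis=0)           # robust camera-motion estimate
    mask = np.zeros((h, w), dtype=bool)
    for (y, x), s in zip(positions, shifts):
        if np.linalg.norm(s - camera_shift) > thresh:  # residual motion -> foreground
            mask[y:y + block, x:x + block] = True
    return camera_shift, mask
```

Taking the median, rather than the mean, of the block shifts keeps the camera-shift estimate robust to the minority of blocks that cover the moving foreground.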

Fig. 2. Some frames of a video sequence satisfying the assumptions.

Review of Nonparametric Sampling

Given the problem of inpainting an image "hole" Ω in a still image, this is a simple yet extremely effective algorithm. For each pixel P on the boundary of Ω, consider its surrounding patch ψP, a square centered at P. Compare this patch, using a simple metric such as the sum of squared differences (SSD), with every possible patch in the image. There will be a set of patches with small SSD distance to ψP. Randomly choose a patch ψQ from this set, and copy its central pixel Q to the current pixel P. P is now filled, so we proceed to the following pixel on the boundary of Ω. It was noted that the order in which the pixels of Ω are filled is crucial, so in [11] an inpainting procedure was proposed which is basically that of Efros and Leung with a new ordering scheme that allows restoring long structures "occluded" by the hole Ω: a priority value is computed for each pixel on the boundary of Ω, and at each step the pixel chosen for filling is the one with the highest priority. For any given pixel P, its priority Pr(P) is the product of two terms, a confidence term C(P) and a data term D(P):

Pr(P) = C(P) · D(P).

The confidence term C(P) is proportional to the number of undamaged and reliable pixels surrounding P. The data term D(P) is large if there is an image edge arriving at P, and is highest if the direction of the edge is orthogonal to the boundary of Ω; thus, higher priority values are obtained at significant edges that need to be continued.

Fig. 3. Preprocessing step: background, foreground, and optical-flow mosaics, respectively, of the sequence shown in Fig. 2.

This mosaic generation step allows us to do a quick search for possible candidate frames from which to copy information when filling in the moving object, thereby speeding up the implementation by limiting the search to only these candidate frames instead of the entire sequence. The next section discusses the moving-foreground completion step in detail.

III. MOTION INPAINTING

The algorithm consists of a preprocessing stage and two steps of video inpainting. In the first video inpainting step, that of motion inpainting, we reconstruct the foreground (moving) objects that are "occluded" by the region to be inpainted. To this end, we fill the gap as much as possible by copying information from the moving foreground in other frames, using a priority-based scheme and the above-mentioned mosaics. It is important to note that data from the mosaics is not used to fill in the damaged frames. The mosaics are only used to search for the "candidate undamaged frames," from which information is copied into the damaged frames.

Initial Guess Search

Coming back to our problem, we start by restoring the moving objects "occluded" by the gap in the video sequence. This restoration can be done by copying suitable information from other frames, but searching the entire sequence for a good match would be computationally very inefficient. Hence, we first need a small set of candidate frames which can provide the information to complete those moving objects. To achieve this, we first search in the foreground mosaic, since it is foreground objects that are being inpainted, to find the candidate frames, i.e., a small subset of frames in which to look for the best match. This "initial guess search" is implemented using the following steps:

Fig. 4. Foreground candidate search process. First, (top) the highest-priority patch is located in the damaged frame, and then the mosaic is used to find the candidate frames from which information can be copied into the damaged frame (F).

Fig. 5. Pseudocode for the motion inpainting step.

1) In the current damaged frame under consideration, find the highest-priority location P and its surrounding patch ψP.

2) Using the already available camera shifts computed during the preprocessing step, find the corresponding location Pm for P, and its surrounding patch, in the foreground mosaic.

3) Using this patch as a template, perform a search in the foreground mosaic to find the matching patch(es).

4) Now, using the camera shifts and the motion confidence masks for each frame, identify the frames that have motion at the location corresponding to the mosaic area specified by the matching patch(es). These frames are the candidate frames for searching for a matching patch for the highest-priority location in the current frame.

Fig. 6. Motion inpainting scheme. Green dots indicate highest priority. (Frame A) Red squares indicate the patch to be inpainted and (frame B) the corresponding best match. Areas filled with red are constrained to have zero priority.

When looking for a (2-D) match for the template patch, the SSD metric is used, involving a 5-D vector value composed of the color values R, G, B and the optical flow values VX and VY. The optical flow components are computed using the simple approximation

VX = It / Ix and VY = It / Iy,

where I is the grayscale frame under consideration, and Ix, Iy, and It are its horizontal, vertical, and temporal derivatives, respectively (computed with a very simple numerical scheme, like central differences). The optical flow components can be computed with more recent, robust, and fast state-of-the-art techniques, but all results were obtained with the very simple approximation just described. Adding optical flow to the SSD vector helps to ensure motion consistency. For example, if the moving person in a video has his right leg going forward and his left going backward, there is no way to get a similar match without using optical flow, because in a 2-D image this situation would look similar (in R, G, B) to the situation in which the two legs are in the same position but moving in the opposite directions (i.e., left leg moving forward and right moving backward).
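The 5-D comparison just described could be sketched as follows. This is a minimal illustration, not the authors' code; the helper names (`flow_channels`, `stack5`, `ssd5`) and the `eps` guard against division by zero are assumptions of this sketch, and in practice the flow channels would likely be scaled or clipped before being mixed with color values.

```python
import numpy as np

def flow_channels(prev_gray, gray, eps=1e-3):
    """Crude optical-flow channels from the approximation VX = It/Ix and
    VY = It/Iy, with central differences for the spatial derivatives.
    The eps guard is an implementation choice, not part of the formula."""
    Ix = (np.roll(gray, -1, axis=1) - np.roll(gray, 1, axis=1)) / 2.0
    Iy = (np.roll(gray, -1, axis=0) - np.roll(gray, 1, axis=0)) / 2.0
    It = gray - prev_gray
    Vx = It / np.where(np.abs(Ix) < eps, eps, Ix)
    Vy = It / np.where(np.abs(Iy) < eps, eps, Iy)
    return Vx, Vy

def stack5(rgb, prev_gray, gray):
    """Stack R, G, B and the two flow channels into one 5-channel image."""
    Vx, Vy = flow_channels(prev_gray, gray)
    return np.dstack([np.asarray(rgb, dtype=float), Vx, Vy])

def ssd5(patch_a, patch_b):
    """Sum of squared differences over the 5-D per-pixel vectors."""
    d = patch_a - patch_b
    return float(np.sum(d * d))
```

With this representation, two patches that agree in color but disagree in motion (the two-legs example above) yield a large distance, whereas a plain R, G, B SSD would treat them as near matches.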

Now some details on the above steps. For the data term D(P) in the priority computation, the following formula is used:

D(k) = |(∇⊥MC)k · nk| / α,

where α is a normalizing constant (usually 255) and nk is the normal to the hole boundary. The inner product of the rotated gradient of MC, (∇⊥MC)k, with the normal nk is computed using central differences.

Copy Foreground and Constrain Priority

Once the candidate frames are identified, the main process of motion inpainting is performed (refer to Fig. 6). We search each candidate frame for the best matching patch ψQ, the patch with minimum distance to our target patch ψP. Again, following [24], we use the SSD metric for the distance computation, with a 5-D vector value composed of the three color components and the two optical flow components. Once the matching patch ψQ is found, instead of fully copying it onto the target ψP, we do the following: we look at MC and copy from ψQ only the pixels that correspond to the moving foreground. Pixels that correspond to the background are not filled at this motion inpainting stage; for this reason, they are marked to have zero priority (i.e., "disabled" from any future motion filling-in).

This last point is key to the algorithm. The separation of background and foreground is essential if the background is rich and inhomogeneous. If the whole patch ψQ were copied instead of only its foreground pixels, it would amount to assuming that whenever foreground matches foreground, their surrounding backgrounds match as well. Such an assumption would imply that the background is more or less the same all along the trajectory of the moving foreground object(s).

Update

After inpainting ψP, the MC values at ψP are updated with the MC values at ψQ. Next, we update the confidence C(p) at each newly inpainted pixel p as follows:

C(p) = Σ q∈ψp∩(MC\Ω) C(q) / |ψp|,

where |ψp| is the area of the patch and Ω is the region of inpainting. Finally, the foreground and optical-flow mosaics are updated with the newly filled-in data.

Ending the Foreground Inpainting

We repeat the above steps until all the pixels in the inpainting area are either filled in or have zero priority for motion inpainting (i.e., are "disabled" as explained above). This is precisely the indication that the moving objects have been fully inpainted in the current frame. We then repeat this process for all the frames that require motion inpainting. This gives us a sequence with only the moving objects filled in; the rest of the missing region needs to be filled in with background.

BACKGROUND INPAINTING

Once the motion inpainting stage is finished, we enter the next stage: inpainting the background. To accomplish this, we first align the frames and directly copy whenever possible, while the remaining pixels are filled in by extending spatial texture synthesis techniques to the spatiotemporal domain. Let us see this in a little more detail.

When there is camera motion involved, the background is often less occluded in one frame than in another. When filling in the background, we align all the frames using the precomputed shifts, and then look for background information available in nearby frames. We then copy this temporal information using a "nearest neighbor first" rule, that is, we copy the available information from the "temporally nearest" frame. Note that this will, of course, be faster and of better quality than a simple block synthesizing procedure. In cases where the occluder is stationary (refer to Fig. 7), there is a considerable part of the background that remains occluded in all of the frames. This shows up as a hole in the background mosaic. We fill in this hole directly on the mosaic, using the priority-based scheme.
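The "nearest neighbor first" background copy could be sketched as follows, assuming the frames have already been registered with the precomputed camera shifts. This is a minimal illustration; the function name and the frame/mask representation are assumptions of this sketch.

```python
import numpy as np

def fill_background(frames, known, t):
    """Fill the missing background of frame t: each unknown pixel is copied
    from the temporally nearest frame in which that (already aligned) pixel
    is known background. `frames` is a list of registered frames; `known`
    is a matching list of boolean masks of known background pixels."""
    out = frames[t].copy()
    filled = known[t].copy()
    n = len(frames)
    # Visit the other frames in order of increasing temporal distance from t.
    for dt in range(1, n):
        for s in (t - dt, t + dt):
            if 0 <= s < n:
                take = known[s] & ~filled   # known there, still missing here
                out[take] = frames[s][take]
                filled |= take
    return out, filled
```

Pixels that remain unfilled after this pass are exactly those occluded in every frame; they correspond to the hole in the background mosaic, which is then completed with the priority-based texture synthesis scheme.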
The missing information in each frame is then copied from the inpainted background mosaic, by spatially aligning the frame with the mosaic using the precomputed shifts. This leads to a consistent-looking background throughout the sequence.

Fig. 7. (Left) Missing part of the background is (right) filled in using a priority-based texture synthesis scheme derived from [11].

EXAMPLES

Please note that, even though we display the inpainted videos at full resolution, no blurring artifacts appear; Fig. 8 shows a restored frame at large scale. Also, in the video results, it can be observed that inpainted moving objects have a consistent, natural-looking trajectory. These results are state of the art, lacking the visual artifacts present in recent works on the subject, and are obtained with a faster and generally simpler technique.

Fig. 9. Example of video inpainting with a moving camera. The damaged part in the original sequence is filled in, while the motion generated is globally consistent. (a) Some frames of the original sequence, with the missing area. (b) The moving person is filled in; note the consistency in the generated motion. (c) Completely filled-in sequence.

In Fig. 9, an artificial rectangular hole was placed in each frame at the same location. This presents not only a challenging task but also models practical scenarios like a camera with a damaged set of CCDs, or a speckle in the camera lens or on the film stock. Notice also that the camera is in motion throughout the sequence. The moving person has been successfully inpainted, and the filled-in background is consistent along the sequence, thanks in part to the mosaic filling process.

Fig. 8. Left: A frame from the video in Fig. 11, shown at large scale. Right: Its inpainted result. The resolution is 640 × 480. Notice how there is no blur in the inpainted region, even at this full resolution.

Fig. 10 shows another real-life video sequence where a moving person is occluded by a stationary pole, which also occludes a considerable amount of the static background in all the frames. Notice that the camera does not move exactly parallel to the plane of projection while tracking the person of interest, which shows that our method is not very sensitive to mild relaxations of the assumptions stated in Section II-A. We have successfully removed the pole, and the motion of the person is seamlessly continued through the region of inpainting. Again, Fig. 7 illustrates the hole in the background due to the static occluder, which is inpainted directly on the background mosaic, as described earlier.

Fig. 10. Static occluder is inpainted from a sequence with significant camera motion. (The results are noteworthy as the camera motion is not completely parallel to the plane of projection, leading to parallax. There are also inter-frame light variations in the original sequence.) (a) Original sequence with a large pole occluding the moving person as well as a considerable amount of the background. (b) The occluding pole is removed. (c) The moving person is filled in. (d) Completely filled-in sequence.

Fig. 11 shows a challenging example where the region to inpaint is a moving object that changes shape constantly.

Fig. 11. Person running in from the left (occluder) is removed and the person walking (object of interest) is completed. In the final result (d), we have used the average background (from the background mosaic) in all frames, to compensate for subtle light variations. (a) Original sequence with a moving occluder. (b) Sequence with occluder removed. (c) The moving person is filled in. (d) The area of occlusion is completely filled in.

Fig. 12 shows that the algorithm works well even when the captured video does not strictly adhere to the assumptions mentioned in Section II-A. The moving car moves at an angle to the plane of projection, thereby changing size. The occluder is removed, and the filled-in background is consistent throughout the video, in spite of appreciable hand-held camera motion and small parallax.

Fig. 12. Red car moves at an angle to the camera, thereby slightly changing size as it moves towards the right. (a) Original sequence with a car moving "nonparallel" to the plane of projection. (b) Moving car is inpainted. (c) Background is also filled in.

Consider Figs. 13 and 14. It should be observed that the technique compares favorably even in the presence of the moderate dynamic background in Fig. 14, though the algorithm was not designed to specifically address dynamic backgrounds. This is achieved by incorporating optical flow in the SSD computation for synthesizing the background as well. Also note the better performance of the technique in restoring small moving objects such as the hat in the woman's hand, or her left leg. The inpainted region in Fig. 13 is sharp, and no oversmoothing is observed.

Fig. 14. Lady moving towards the left is occluded. The proposed approach inpaints the occluder and does a good job of filling in the moderate dynamic background. Also note the better performance of our technique in restoring a small moving object such as the hat in the woman's hand. (a) Frames from the original sequence, with dynamic background. (b) Results from [24]. (c) Results using the proposed approach. (d) Detail of the second frame: proposed approach (right).

CONCLUDING REMARKS

A simple framework for filling in video sequences in the presence of camera motion has been presented here. The technique is based on combining motion-based inpainting with spatial inpainting, using three image mosaics that allow us to deal with camera motion and to speed up the process as well. If there are moving objects to be restored, they are filled in first, independently of the changing background from one frame to another. Then the background is filled in by extending spatial texture synthesis techniques to the spatiotemporal domain. Future work includes being able to deal with arbitrary camera motion (including zooms), changes of scale in the moving objects, and dynamic backgrounds. Currently, the algorithm does not address complete occlusion of the moving object; work is in progress towards adapting the technique to such scenarios. Also to be addressed are the automated selection of parameters (such as patch size, mosaic size, etc.) and dealing with illumination changes along the sequence.

Fig. 13. Jumping girl moving from left to right is occluded by another person. The proposed technique inpaints the occluder and fills in the background without oversmoothing.

REFERENCES
[1] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis, "SCAPE: Shape completion and animation of people."
[2] S. Baker, R. Szeliski, and P. Anandan, "A layered approach to stereo reconstruction," Comput. Vis. Pattern Recognit., p. 434, 1998.
[3] C. Ballester, V. Caselles, and J. Verdera, "Disocclusion by joint interpolation of vector fields and gray levels," SIAM Multiscale Model. Simul.
[4] M. Bertalmio, A. L. Bertozzi, and G. Sapiro, "Navier-Stokes, fluid dynamics, and image and video inpainting," in Proc. IEEE Computer Vision and Pattern Recognition, 2001, vol. 1, pp. 355–362.
[5] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, "Image inpainting," in Proc. ACM SIGGRAPH, 2000.
[6] M. Bertalmio, L. Vese, G. Sapiro, and S. Osher, "Simultaneous structure and texture image inpainting," IEEE Trans. Image Process., vol. 12.
[7] M. J. Black and P. Anandan, "The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields," Comput. Vis. Image Understand.
[8] A. Bruhn and J. Weickert, "Towards ultimate motion estimation: Combining highest accuracy with real-time performance," presented at the IEEE Int. Conf. Computer Vision, 2005.
[9] V. Caselles, L. Igual, and L. Garrido, "A contrast invariant approach to motion estimation," presented at the Scale Space Conf., 2005.
[10] V. Cheung, B. J. Frey, and N. Jojic, "Video epitomes," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
