
Boundary Feature Extraction by Contour Matching and an Efficient Algorithm for Automated Reconstruction of Torn-paper Images


Glory Grace P. Grajo, Janine Riva O. Rufo, Vladimir Y. Mariano
Institute of Computer Science
University of the Philippines Los Baños
College, Laguna, Philippines
{glorygracegrajo, janinerufo}@yahoo.com, vymariano@uplb.edu.ph

I. Introduction
Reconstruction of torn images or documents is used in fields such as forensic science, for
piecing together evidence, and archeology, for reconstructing historical papers [1], as well as in
other settings that involve important documents.
Manual reconstruction of torn-paper images is a very difficult task, especially with small pieces,
and often leads to low-quality output that damages the contents of a document or obscures the
image. Torn-paper images or documents can be either hand-torn or shredded [1]. Hand-torn
pieces have variable shapes: unlike the pieces of a common jigsaw puzzle, they may differ in size
and edge curve. These unique edge curves are a good feature for finding the piece adjacent to a
given piece. However, hand tearing may also produce pieces with the same edge curves, namely
pieces that were torn at the same time. For such pieces, the feature used to find a neighbor is
the same as that used for a shredded image.
Reconstruction of shredded documents or images is more difficult since the pieces have the same
shape, the same edges and, most of the time, the same size. The pixel information on the border
of a piece is therefore used to find its adjacent piece [4]: pieces with the same color and
intensity along their borders are connected.
The proposal will implement a method that involves two processes for boundary feature
extraction: contour boundary extraction and color feature extraction. The proposal is set to
reconstruct both hand-torn and shredded documents or images. It also proposes an efficient
algorithm for joining the pieces into the final output.
The proposal, however, has some limitations concerning the input data:
(a). The torn pieces can only have small shears. Large shears, as shown in Fig. 1, can
result in wrong matches. Shears result from twisting and skewing of the grip while tearing the
paper [1].

Fig. 1. An example of a large shear.


(b). The input pieces must be complete. The proposed method cannot work with a missing puzzle
piece.
(c). When the input is scanned, the background and the torn pieces must differ largely in
color and intensity. This is important for separating the pieces from the background.
(d). The scanning of the input must be done neatly, i.e., with no glue marks or dirt on the page
and no folded pieces.
(e). The pieces must have significant space between them when scanned. There should be no
connected or overlapping pieces.
(f). The input image must have four corners, i.e., it can be either a square or a rectangle, but
not a circle.
(g). Only full RGB shredded images can be reconstructed. Grayscale or binary shredded images
cannot be processed.
Automated reconstruction of torn-paper images and documents is highly significant. It is
faster and easier than manual reconstruction, which makes it an important contribution to
forensic science and archeology, where correctness and clarity are important.
II. Review of Related Literature
Several methods and techniques have been proposed for the automated reconstruction of
torn-paper documents. Researchers in digital image processing recognize the importance of
this problem and continue to design algorithms and methods for it.
Biswas, Bhowmick, and Bhattacharya worked on Reconstruction of Torn Documents
Using Contour Maps [1]. Their method uses contour descriptors for shape-based matching and
applies only to torn-paper documents with unique edges. Their reconstruction algorithm
makes use of a feature list and an AVL tree. Pieces are joined as soon as a match is found,
which produces large clumps of connected pieces in the event of a
missing puzzle piece. The algorithm also fails to match pieces with large shears.
The study by Kong and Kimia, On Solving 2D and 3D Puzzles Using Curve Matching [3], has
some similarity with the previous work in that it implements a matching algorithm that
uses an alignment curve to represent the correspondence between two curves. In addition,
the matching of pieces is divided into two stages: the first is based on a coarse-scale
representation of the curves and the second on a fine-scale elastic curve matching
method. Their method, however, focuses on 3D puzzles.
Murakami, Toyama, Shoji, and Miyamichi proposed a method for automated reconstruction called
Assembly of Puzzles by Connecting Between Blocks [4]. Their method reconstructs a puzzle with
equal straight edges by matching the pixel values along the pieces' borders. This method has
several limitations: (1) the pieces must have the same size, (2) the pieces must be rectangular,
(3) the pieces must be scanned in their correct orientation, meaning the direction of the pieces
is fixed, and (4) the input image must be in full RGB color. Their algorithm uses the pixel values
on the border of the puzzle pieces as the baseline for matching. Their method can also assemble
two different puzzles scanned in one image.
Other projects have implemented concepts related to torn-paper image reconstruction,
such as Document Image Mosaicing by Whichello and Yan [5]. Their project joins images of
documents that are too large to be scanned all at once, such as a newspaper page. The page
is divided into several pieces for scanning, and their method reconstructs the image on
the computer. Their algorithm uses boundary pixel extraction, just like the work of Murakami et
al. [4].
The study by Zhi, Ge, and Ji, Image Matching Algorithm Based on Edge Color
Used in Automatic Computer Jigsaw Puzzle [6], allows the construction of puzzles with different
shapes. It applies edge detection before proceeding to the matching of adjacent pieces.
However, this method applies only to jigsaw puzzles, whose pieces have concave and convex sides.
These projects are all able to reconstruct torn-paper images using different designs, methods, and
algorithms, and each has its own limitations. The proposed algorithm for this project uses a
feature extraction mechanism that can match puzzle pieces with different or identical shapes, with
different or identical sizes, and in full RGB color or grayscale. The pieces may also be scanned in
any orientation, meaning they need not face the same direction. The algorithm aims to
implement this new feature extraction mechanism with a short running time.
III. Materials and Methods
The method can be summarized in three major procedures: (1) the Initial Treatment Phase, which
separates the puzzle pieces from the background, (2) the Feature Extraction Phase, which finds the
match of a given puzzle piece, and (3) the Merging Phase, which joins pairs of puzzle pieces to
reconstruct the whole image.
1. Initial Treatment Phase
The puzzle pieces to be scanned will be placed on a background with a solid color that
contrasts strongly with the puzzle pieces. To separate the puzzle pieces
from the background, the scanned image will undergo binarization by histogram thresholding.
Connected component analysis will then be applied to the scanned image because some pixels of
a puzzle piece may have been classified as background by the binarization algorithm. Boundary
extraction will be performed on the binary image to separate the individual pieces from the
background. In all, three images will be saved for each puzzle piece: one in
binary format, one in RGB format, and one of its boundary.
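The three steps of this phase can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the image is a list of rows of grayscale values, the background is darker than the pieces, and the threshold is passed in by the caller rather than derived from the histogram. The function names `binarize`, `label_components`, and `boundary_pixels` are hypothetical and are not part of the proposal.

```python
from collections import deque

# 4-connected neighbourhood used by both labelling and boundary extraction
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def binarize(gray, threshold):
    """Return a 0/1 mask; 1 marks pixels brighter than the (assumed dark) background."""
    return [[1 if value > threshold else 0 for value in row] for row in gray]

def label_components(mask):
    """Label 4-connected foreground regions. Returns (labels, number_of_pieces)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] == 0 or labels[sy][sx] != 0:
                continue
            count += 1                      # new piece found: flood-fill it
            labels[sy][sx] = count
            queue = deque([(sx, sy)])
            while queue:
                x, y = queue.popleft()
                for dx, dy in NEIGHBOURS:
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((nx, ny))
    return labels, count

def boundary_pixels(labels, piece):
    """Pixels of `piece` that touch the image edge or a pixel outside the piece."""
    h, w = len(labels), len(labels[0])
    boundary = []
    for y in range(h):
        for x in range(w):
            if labels[y][x] != piece:
                continue
            for dx, dy in NEIGHBOURS:
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or labels[ny][nx] != piece:
                    boundary.append((x, y))
                    break
    return boundary
```

Each label then identifies one puzzle piece, from which its binary mask, its RGB crop, and its boundary image could be saved.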
2. Feature Extraction Phase
The feature extraction phase has two major subprocesses: contour boundary extraction and color
feature extraction.
Contour boundary extraction will obtain and record information about the edges of each puzzle
piece. It will apply the Image Pattern Analysis (IPAN99) technique by Chetverikov and
Szabó [2] to the binary image of each puzzle piece. The points it detects are called
characteristic points (CPs).
The IPAN99 technique detects high-curvature points in planar curves. It uses
a two-pass algorithm to define a significant corner. The algorithm initially plots sample points p
along the curve of an input image.

Fig. 2. IPAN triangle inscription.


The first pass scans the sequence and selects candidate corner points by inscribing a triangle
$(p^-, p, p^+)$ at each point. For point $p$ to be a high-curvature point, the triangle must
satisfy the following requirements:

$$
d_{\min}^2 \le |p - p^+|^2 \le d_{\max}^2, \qquad
d_{\min}^2 \le |p - p^-|^2 \le d_{\max}^2, \qquad
\alpha \le \alpha_{\max} \tag{1}
$$

where $|p - p^+| = a$ is the distance between $p$ and $p^+$,
$|p - p^-| = b$ is the distance between $p$ and $p^-$,
and $\alpha \in [-\pi, \pi]$ is the opening angle of the triangle, computed as

$$
\alpha = \arccos \frac{a^2 + b^2 - c^2}{2ab} \tag{2}
$$

where $c = |p^+ - p^-|$ [See Fig. 2].
$\alpha_{\max}$, $d_{\min}$ and $d_{\max}$ are constants defined by the user. $d_{\min}$ sets the
resolution, $d_{\max}$ is usually set to $d_{\min} + 2$, and $\alpha_{\max}$ is the angle limit,
that is, the minimum accepted sharpness. Large values of $d_{\min}$ and $d_{\max}$ result
in few corner points, while small values result in a slower running time [insert citation].
If the conditions in (1) are met, the triangle is called admissible. Starting from $p$, the
algorithm moves outward, taking successive points for $p^+$ and $p^-$, and stops when the
conditions are violated. Among the admissible triangles, the one with the smallest opening angle
$\alpha(p)$ is selected. If there are no admissible triangles, point $p$ is discarded.
The second pass removes a candidate point $p$ if it has a sharper neighbor $p_v$. This is done by
discarding $p$ if $\alpha(p) > \alpha(p_v)$.
Given two puzzle pieces, the method will select certain CPs from both pieces to compare. The
CPs will be selected by finding the center of mass of both pieces. The pieces will be divided into
four parts by inscribing two diagonal lines intersecting at the center of mass [See Fig. 3(a)].
CPs within 45° to −45° of the first piece will be compared to CPs within 135° to 225° of the
other piece.

Fig. 3. (a) CP selection, (b) Distance comparison.
After gathering the necessary CPs, called the selected CPs, the method will compare their angles
and distances. For given $p_{1,i}$ and $p_{1,i+1}$ from the selected CPs of piece 1 and
$p_{2,i}$ and $p_{2,i+1}$ from piece 2,
(1) the distance $d_1$ between $p_{1,i}$ and $p_{1,i+1}$ must be equal to the distance $d_2$
between $p_{2,i}$ and $p_{2,i+1}$ [See Fig. 3(b)], and
(2) the angle $A_1$ between $p_{1,i}$ and $p_{1,i+1}$ must be equal to the angle $A_2$ between
$p_{2,i}$ and $p_{2,i+1}$,
for $i = 0$ to $N - 2$, where $N$ is the smaller of the numbers of selected CPs of the two
pieces, so that $p_{1,i+1}$ and $p_{2,i+1}$ always exist.
If these conditions are satisfied for all $i$, piece 1 and piece 2 will be declared a match
within the selected CPs.
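The selection and comparison of CPs can be sketched as follows. Several details here are assumptions made only for illustration: angles are measured with `atan2` around the center of mass in image coordinates, the "angle between" two consecutive CPs is read as the direction of the segment joining them, and equality is tested within tolerances (`dist_tol`, `ang_tol`) because scanned coordinates are never exact. All function names are hypothetical.

```python
import math

def select_cps(cps, center, lo_deg, hi_deg):
    """Keep the CPs whose direction from `center` lies in the sector [lo_deg, hi_deg]."""
    cx, cy = center
    width = (hi_deg - lo_deg) % 360.0
    chosen = []
    for x, y in cps:
        angle = math.degrees(math.atan2(y - cy, x - cx))
        # Measure the angle from the start of the sector; this handles wrap-around at 360.
        if (angle - lo_deg) % 360.0 <= width:
            chosen.append((x, y))
    return chosen

def segments(cps):
    """(length, direction) of the segment between each pair of consecutive CPs."""
    return [
        (math.dist(p, q), math.atan2(q[1] - p[1], q[0] - p[0]))
        for p, q in zip(cps, cps[1:])
    ]

def is_match(cps1, cps2, dist_tol=1.0, ang_tol=math.radians(5)):
    """True if all consecutive distances and angles agree within the tolerances."""
    n = min(len(cps1), len(cps2))
    if n < 2:
        return False                  # nothing to compare
    for (d1, a1), (d2, a2) in zip(segments(cps1[:n]), segments(cps2[:n])):
        angle_diff = abs((a1 - a2 + math.pi) % (2 * math.pi) - math.pi)
        if abs(d1 - d2) > dist_tol or angle_diff > ang_tol:
            return False
    return True
```

A call such as `select_cps(cps, center, -45, 45)` on the first piece and `select_cps(cps, center, 135, 225)` on the second would yield the two CP lists passed to `is_match`, after the second list has been brought into the same frame and traversal order as the first.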
Color feature extraction will obtain the color information of the pixels along the boundary of the
puzzle pieces. The algorithm will implement an RGB comparison on the boundaries of the puzzle
pieces to be matched: the pixel values on the borders of the pieces are taken and compared, and
the pieces with the highest color similarity are considered a match.
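One possible form of this RGB comparison is sketched below. The similarity measure is not specified in the proposal, so the mean Euclidean distance in RGB space used here is an assumption, as are the function names and the requirement that the two borders be sampled with the same number of pixels.

```python
def border_color_distance(border_a, border_b):
    """Mean Euclidean RGB distance between two equally long lists of border pixels."""
    if not border_a or len(border_a) != len(border_b):
        raise ValueError("borders must be non-empty and of equal length")
    total = 0.0
    for (r1, g1, b1), (r2, g2, b2) in zip(border_a, border_b):
        total += ((r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2) ** 0.5
    return total / len(border_a)

def most_similar(target_border, candidate_borders):
    """Name of the candidate whose border is closest in color to the target border."""
    return min(
        candidate_borders,
        key=lambda name: border_color_distance(target_border, candidate_borders[name]),
    )
```

A smaller distance means a higher color similarity, so the candidate returned by `most_similar` would be the one declared a match.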
3. Merging Phase
The merging phase deals with reconstructing the image in a short time. This phase is divided into
three parts: piece classification, match finding, and joining.
After the initial treatment phase, pieces with straight edges will be separated from the rest and
classified as border pieces. The rest will be classified as middle pieces.

Fig. 4. An example of a puzzle piece with a straight edge.


The puzzle pieces will first be fed to the IPAN99 technique to obtain their CPs. For a puzzle
piece to have a straight edge, it must satisfy the following:
(1) It must contain two CPs $p_1$ and $p_2$ such that, given $p_1(x_1, y_1)$ and $p_2(x_2, y_2)$,
$x_1 = x_2$, $y_1 = y_2$, or their slope $= 0$,
(2) and, given angle $\angle p_1$ from $p_1$ to $p_2$ and angle $\angle p_2$ from $p_2$ to $p_1$,
$0^\circ < (\angle p_1 \text{ and } \angle p_2) < 180^\circ$ or
$180^\circ < (\angle p_1 \text{ and } \angle p_2) < 360^\circ$
[See Fig. 4].

If the puzzle piece has two CPs that pass the requirements, it will be classified as a border piece.
Next, the method will look for the corner pieces among the border pieces. A corner piece must
satisfy a further requirement:
given that $p_1$ and $p_2$ satisfy the straight-edge requirements, a second pair of CPs of the
same piece must also satisfy the straight-edge requirements.
If all pieces qualify as corner pieces, the method will not classify any of them as corner pieces,
since this means that the input pieces have the same shape. Naturally, there must be four corner
pieces.
After separating the pieces into two sets, the border with the corner pieces will be the first to
undergo match finding. The method will select one of the corner pieces as starting piece. The
rotation method will be implemented to make one straight edge parallel to the x axis. The
rotation method uses the center of mass formula,
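$$
C_x = \frac{\sum_{x=0}^{w} \sum_{y=0}^{h} \left( x \cdot B_{x,y} \right)}{\sum_{x=0}^{w} \sum_{y=0}^{h} B_{x,y}},
\qquad
C_y = \frac{\sum_{x=0}^{w} \sum_{y=0}^{h} \left( y \cdot B_{x,y} \right)}{\sum_{x=0}^{w} \sum_{y=0}^{h} B_{x,y}}
$$

to specify the center of rotation.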
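The center of mass and the rotation about it can be sketched in a few lines of Python. The sketch assumes that $B_{x,y}$ is the value (0 or 1) of the binary image of the piece at column x and row y, and it rotates a list of points rather than resampling the image. The names `center_of_mass`, `rotate_about`, and `angle_to_level` are hypothetical.

```python
import math

def center_of_mass(binary):
    """(C_x, C_y) of a binary image given as a list of rows of 0/1 values."""
    total = sum_x = sum_y = 0
    for y, row in enumerate(binary):
        for x, b in enumerate(row):
            total += b
            sum_x += x * b
            sum_y += y * b
    if total == 0:
        raise ValueError("the image contains no foreground pixels")
    return sum_x / total, sum_y / total

def rotate_about(points, center, angle):
    """Rotate (x, y) points by `angle` radians about `center`."""
    cx, cy = center
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [
        (cx + (x - cx) * cos_a - (y - cy) * sin_a,
         cy + (x - cx) * sin_a + (y - cy) * cos_a)
        for x, y in points
    ]

def angle_to_level(p, q):
    """Rotation angle that makes the edge from p to q parallel to the x axis."""
    return -math.atan2(q[1] - p[1], q[0] - p[0])
```

For a straight edge between two CPs `p` and `q`, `rotate_about(points, center_of_mass(binary), angle_to_level(p, q))` gives the piece outline with that edge parallel to the x axis.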


One by one, candidate pieces will be tried against the curved side of the selected corner piece.
Each candidate piece will be rotated such that its straight edge is also parallel to the x axis.
The method will use the contour boundary extraction of the feature extraction phase to
determine a match. Even if a match is found, the method will continue until all remaining border
pieces have been tried. In the case of several matches, the method will use color feature
extraction to determine the candidate with the highest color similarity to the selected piece. The
pieces declared a match will then be joined.
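The selection rule described above (try every candidate, then break ties by color) can be written compactly as in the sketch below. The two callbacks `contour_match` and `color_similarity` are placeholders for the contour and color comparisons of the feature extraction phase, and `pick_match` is a hypothetical name.

```python
def pick_match(selected, candidates, contour_match, color_similarity):
    """Return the candidate to join to `selected`, or None if no contour matches.

    contour_match(a, b)    -> bool, True when the contours of a and b fit
    color_similarity(a, b) -> number, larger means more similar border colors
    """
    # Every candidate is tried, even after a first match has been found.
    matches = [piece for piece in candidates if contour_match(selected, piece)]
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0]
    # Several contour matches: keep the one with the highest color similarity.
    return max(matches, key=lambda piece: color_similarity(selected, piece))
```

Color is consulted only when the contour test leaves more than one candidate, so the more expensive pixel comparison is skipped for unambiguous matches.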

Fig. 5. A simulation of the proposed method, panels (a) to (j).
The method will take new candidate pieces and try them against the curved side of the previously
joined pieces until the match it finds is another corner piece [See Fig. 5(a) to 5(c)]. Once a
corner piece is joined to the block of puzzle pieces, the method will look for candidate pieces to
match to the left and right borders.
By convention, the method will first match candidate pieces on the left side, which means
that candidate pieces will be rotated such that their straight edges are parallel to the y axis.
After the method joins another corner piece on the left side, it will match candidate pieces on
the right side until the last corner piece is attached to the border.
The last part of building the image border is to merge the remaining border pieces. These pieces
will be rotated such that their straight side is parallel to the x axis. They will be matched from
the left unmatched corner piece to the right unmatched corner piece. Once the method attaches a
piece to the right corner piece, the border of the image will have been reconstructed [See Fig. 5(h)].
Some border pieces may remain unmatched, meaning they are middle pieces that have a
straight edge.

After completing the border of the image, the method will match the middle pieces against the
border image. The same method applies, except that each candidate piece will now be rotated by
some angle at a time until a match is declared or the piece returns to its original orientation.
Once a match is declared, the piece will be joined to the image block, and the process continues
until the last middle piece is attached and the image has been reconstructed.
IV. Expected Results
The proposed method is expected to reconstruct the input torn-paper pieces into a
complete image, with small gaps between the pieces caused by rotating them [See Fig.
5(j)]. The method is expected to do this with an average running time of less than 15 seconds.
V. References
[1] A. Biswas, P. Bhowmick, and B. Bhattacharya. Reconstruction of Torn Documents
Using Contour Maps. IEEE 2005.
[2] D. Chetverikov and Z. Szabó. A Simple and Efficient Algorithm for Detection of High
Curvature Points in Planar Curves.
[3] W. Kong and B. Kimia. On Solving 2D and 3D Puzzles Using Curve Matching. IEEE
2001.
[4] T. Murakami, F. Toyama, K. Shoji, and J. Miyamichi. Assembly of Puzzles by
Connecting Between Blocks. IEEE 2008.
[5] A.P. Whichello and H. Yan. Document Image Mosaicing. IEEE 2006.
[6] L. Zhi, Q. Ge, and Z. Ji. Image Matching Algorithm Based on Edge Color Used in
Automatic Computer Jigsaw Puzzle. IEEE 2009.
