Abstract: Digital imaging, image forgery and its forensics have become an established field of research nowadays. Digital imaging is used to enhance and restore images to make them more meaningful, while image forgery is done to produce fake facts by tampering with images. Digital forensics is then required to examine questioned images and classify them as authentic or tampered. This paper aims to design and implement a blind classifier to classify original and spliced Joint Photographic Experts Group (JPEG) images. The classifier is based on statistical features obtained by exploiting image compression artifacts, which are extracted as a Blocking Artifact Characteristics Matrix. The experimental results show that the proposed classifier outperforms the existing one, giving improved performance in terms of accuracy and area under curve while classifying images. It supports .bmp and .tiff file formats and is fairly robust to noise.
Keywords: Blocking Artifact Characteristics Matrix (BACM); Image Forensics; Image Splicing; Joint Photographic Experts Group (JPEG) compression artifacts; Support Vector Machine (SVM) classifier.
__________________________________________________*****_________________________________________________
I. INTRODUCTION
The readily available software, tools and techniques have made image processing quite easy these days. Tools developed for the enhancement of images are being misused to hide the truth and establish fallacies. There are numerous ways to manipulate or forge an image. The most common image forgery techniques are copy-move and splicing, as shown in Fig. 1. In copy-move forgery, some part of the image is cropped, processed and then replicated within the image to either hide or add some content. In splicing, two different images are used to create a new image with new content altogether. Thus, before relying on an image, we need to first check its truthfulness using image forensic tools and techniques. These techniques are based on active and passive approaches. In the active approach, features like a watermark or signature are added to the image, which would get distorted if the image is tampered with. This is mainly used for sensitive documents and images, as they are highly prone to fakery. In the absence of such an active approach, a passive approach needs to be used. Passive approaches do not require any background information about the image; rather, they extract features and characteristics from the available image alone to make a decision.

Fig. 1 a) Original image; b) copy move forgery; c) splicing forgery (Dong and Wang, 2011)

Most image processing tools and digital cameras nowadays use the Joint Photographic Experts Group (JPEG) format, so forensics for this format is very crucial. JPEG image forensics is done either by source or camera detection, or by utilizing compression characteristics to identify image tampering. These characteristics are based on quantization and Discrete Cosine Transform (DCT) artifacts present in the image due to double compression.

Initially, Lukas and Fridrich1 (2003) and Lukas et al.2 (2006) proposed image tamper detection by identifying the source camera using sensor pattern noise, but it fails to correctly classify regions where the pattern noise is low. Ng and Chang3 (2004) proposed a physics-based model to detect image splicing, but the detection rate was moderate. Popescu and Farid4-6 (2004; 2005a; 2005b) presented image resampling and color filter interpolation based methods to detect image splicing. The proposed method5 doesn't perform well where images with high quality factors were spliced and resaved at a low quality factor. Pan et al.7 (2004) and Perra et al.8 (2005) utilized edge based features for detecting blocking artifacts in JPEG images and achieved good results. Fan and Queiroz9 (2003) introduced Blocking Artifact Characteristics Matrix (BACM) based features to identify
IJRITCC | July 2017, Available @ http://www.ijritcc.org
International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Volume 5, Issue 7, pp. 168-174
double image compression, which Luo et al.10 (2007) used to determine cropping and forgery, but this method gave a low true positive rate. Chen and Hsu11 (2008) investigated the periodic property of blocking artifacts by using different features, but this method only performed well when the forged image has a high quality factor as compared to the original image. Pan and Lyu12 (2010) proposed region duplication detection using image key-points and feature vectors, as these are robust to usual image transforms. Barni et al.13 (2010) localized tampering by statistically analysing the image both block and region wise. Bianchi and Piva14 (2012) categorized double JPEG compression as either aligned or non-aligned and localized the tampering; although the results presented were very comprehensive, the classifier achieved a low Area Under Curve (AUC) for spliced images with a high Quality Factor. Thing et al.15 (2012) tried to improve the accuracy of JPEG image tampering detection by considering the characteristics of the random distribution of high value bins in the DCT histograms. Then, Tralic et al.16 (2012) proposed a method to detect re-compression using Blocking Artifact Grid extraction, but sufficient illustration of the method on different types of images was lacking. Mall et al.17 (2013) proposed a combined hashing index for images which was capable of detecting structural tampering, brightness level adjustment and contrast manipulations. Chang et al.18 (2013) proposed copy-move detection by searching similar blocks in the image and used a similarity vector field to assure the true positives. Recently, Wattanachote et al.19 (2015) utilized BACM features to identify seam modifications in JPEG images and presented efficient results.

All these researchers contributed significantly to image forensics, but only a few provided a comprehensive study. The aim of the presented work is to design and implement a blind classifier for splice detection of JPEG images at various quality factors, with higher accuracy and area under curve. The proposed classifier works for .bmp and .tiff images as well. It is robust to the presence of noise in images. It detects image splicing even when pre-processing and post-processing operations have been applied and the spliced area varies from small to large. The proposed design and the experimental results obtained are discussed in the following sections.

II. PROPOSED SYSTEM DESIGN FOR SPLICE DETECTION CLASSIFIER

The system design consists of two main components, i.e. training and testing of a Support Vector Machine (LIBSVM20) to classify images, as shown in Fig. 2. The image dataset consists of original and spliced images from the CASIA21 database. The dataset is divided into training and testing datasets. Statistical features from these images are extracted from the image Blocking Artifact Characteristics Matrix (BACM), which is the mean inter-pixel intensity difference inside and across the JPEG sub-block boundaries. This difference is similar for uncompressed images, but when an image is compressed, discontinuities appear in the pixel intensity difference. The statistical features of the images from the training dataset are fed to the SVM and a model is obtained. Then this model is used to test images for their identification as original or spliced.

Fig. 2 System Design for proposed JPEG tool

A. Proposed algorithm for statistical features extraction

The algorithm used for extracting image statistical features, and its complexity, is as follows:
Step 1: Consider an image I. Transform the image I to grayscale such that Ig = rgb_to_gray(I).
Step 2: Subdivide the image into sub-blocks of 8 x 8 pixels. For each sub-block, for every pixel location (i, j), where 1 ≤ i, j ≤ 8, calculate the difference in neighbouring pixel intensities d(i, j) as:
d(i, j) = |[I(i, j) + I(i+1, j+1)] − [I(i+1, j) + I(i, j+1)]| (1)
where I(i, j) represents the intensity of the pixel at location (i, j). Similarly calculate d(i+4, j+4). Then calculate the absolute difference
K(i, j) = |d(i+4, j+4) − d(i, j)|. (2)
Step 3: Calculate the energy E(i, j) at each pixel location (i, j) over all sub-blocks as
E(i, j) = Σ_{b=1}^{N} K_b(i, j) (3)
where N is the total number of image sub-blocks.
Step 4: Calculate the BACM matrix M(i, j) as
M(i, j) = E(i, j)/N. (4)
Step 5: Extract features F1-F20 from the BACM and input them to the SVM to obtain the classifier model.

The algorithm works on a 2x2 pixel neighbourhood in each sub-block. Every pixel is considered a neighbour to 4 pixels, as shown in Fig. 3. The algorithm needs to access each block once and each pixel of the image 4 times to calculate the pixel intensity differences. So, the number of accesses per pixel is 4 and the complexity is equivalent to O(4n) ≈ O(n); it is linearly dependent on the size of the image. The algorithm's main steps, i.e. extracting the BACM and defining the feature set, are further clarified with examples in the following two sections.

B. Extracting BACM

The Blocking Artifact Characteristics Matrix (BACM) is a matrix extracted from the DCT blocks of the image. It reveals important features about the image compression history. To extract the BACM, the greyscale image is subdivided into sub-blocks of 8x8 pixels. For each sub-block and every pixel
location, the inter-pixel intensity difference is calculated. For example, let P, Q, R and S be four consecutive sub-blocks in the image. Then for sub-block P, the inter-pixel difference at i = j = 1 is calculated as d(1, 1) and d(5, 5), and the inter-pixel difference at i = j = 4 is calculated as d(4, 4) and d(8, 8), using Eq. (1) as shown in Fig. 3:
d(1, 1) = |[I(1,1) + I(2,2)] − [I(2,1) + I(1,2)]| and d(5, 5) = |[I(5,5) + I(6,6)] − [I(6,5) + I(5,6)]|
d(4, 4) = |[I(4,4) + I(5,5)] − [I(5,4) + I(4,5)]| and d(8, 8) = |[I(8,8) + I(1,1)] − [I(8,1) + I(1,8)]|
where, in d(8, 8), the indices that wrap past 8 refer to pixels of the adjacent sub-blocks; this is the difference across the block boundary. Further, the absolute difference K(i, j) is calculated using Eq. (2). Then the energy E(i, j) and the BACM M(i, j) are derived using Eqs. (3) and (4). Fig. 4 shows the BACM of an original JPEG image at each pixel location. For example, 2.5364 in the BACM is the mean value of all pixel intensity differences located at (1, 1) in every block.

Fig. 3 Calculation of inter pixel difference a) inside b) across the block boundary
Fig. 4 BACM values of an original JPEG image:
2.5364 2.4623 2.5075 3.5274 2.6886 2.4883 2.6667 4.6139
2.5343 2.4259 2.3567 3.4163 2.3909 2.4273 2.5110 3.9047
2.3738 2.2702 2.4554 3.5679 2.4547 2.3628 2.3608 4.2888
2.5981 2.6049 2.8265 3.5741 2.7606 2.5171 2.6337 4.1214
2.7311 2.5343 2.8656 3.9444 2.9266 2.6317 2.7263 4.2798
2.5995 2.5816 2.6399 3.5178 2.5583 2.4067 2.5178 4.2305
2.6310 2.6879 2.9005 3.7634 2.7908 2.5583 2.7661 4.3909
3.3765 2.9156 3.0391 3.7558 3.2449 3.0192 3.4136 4.4266
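As a concrete illustration of Steps 1 to 4, the BACM extraction can be sketched in NumPy as below. This is our own illustrative implementation, not the paper's code: the function name, the vectorised layout and the edge padding are assumptions, and indices are 0-based rather than the paper's 1-based notation.

```python
import numpy as np

def bacm(gray):
    """8x8 Blocking Artifact Characteristics Matrix, following Eqs. (1)-(4).

    For every pixel position (i, j) inside an 8x8 sub-block, the 2x2
    cross-difference d(i, j) is compared with the difference d(i+4, j+4),
    and the absolute differences are averaged over all sub-blocks.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    h, w = h - h % 8, w - w % 8          # crop to whole 8x8 blocks
    # Pad with edge values so (i+1, j+1) and (i+4, j+4) always exist.
    img = np.pad(gray[:h, :w], ((0, 5), (0, 5)), mode="edge")

    # Eq. (1): d(i,j) = |[I(i,j) + I(i+1,j+1)] - [I(i+1,j) + I(i,j+1)]|
    d = np.abs(img[:-1, :-1] + img[1:, 1:] - img[1:, :-1] - img[:-1, 1:])

    n_blocks = (h // 8) * (w // 8)
    M = np.zeros((8, 8))
    for i in range(8):
        for j in range(8):
            inside = d[i:h:8, j:w:8]                  # d(i, j) in every block
            across = d[i + 4:h + 4:8, j + 4:w + 4:8]  # d(i+4, j+4) in every block
            # Eqs. (2)-(4): K = |d(i+4, j+4) - d(i, j)|, E = sum over the
            # N sub-blocks, M = E / N.
            M[i, j] = np.abs(across - inside).sum() / n_blocks
    return M
```

For a smooth uncompressed image the inside and across differences are similar and M stays small; JPEG compression introduces boundary discontinuities that produce patterns like the one shown in Fig. 4.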
Fig. 5 Comparison of 4th column values of BACM of Au_nat_00093.jpg with its spliced versions at QF100, QF80, and QF60

C. Defining Feature Set

After calculating the BACM, statistical features need to be defined and extracted. For feature extraction, the BACM is divided into various regions, as shown in Fig. 6. In existing techniques9, 10, 19, only some of these regions were used.

Fig. 6 Division of BACM in regions for extracting statistical features

The first set of features is based on the symmetry of horizontal region H1 and vertical region V1. For H1 and V1, features F1 and F2 are extracted as:
F1 = Σ_{k=1}^{3} |M(4, k) − M(4, 8−k)| (5)
F2 = Σ_{k=1}^{3} |M(k, 4) − M(8−k, 4)| (6)
where M(i, j) represents the BACM matrix value at location (i, j). The next set of features is based on the symmetry of four regions R1, R2, R3 and R4. Feature F3 is based on the symmetry of R1 and R2, F4 on the symmetry of R3 and R4, F5 on the symmetry of R1 and R3, F6 on the symmetry of R2 and R4, F7 on the symmetry of R1 and R4, and F8 on the symmetry of R2 and R3:
F3 = Σ_{i=1}^{3} Σ_{j=1}^{3} |M(i, j) − M(i, 8−j)| (7)
F4 = Σ_{i=5}^{7} Σ_{j=1}^{3} |M(i, j) − M(i, 8−j)| (8)
F5 = Σ_{i=1}^{3} Σ_{j=1}^{3} |M(i, j) − M(8−i, j)| (9)
F6 = Σ_{i=1}^{3} Σ_{j=5}^{7} |M(i, j) − M(8−i, j)| (10)
F7 = Σ_{i=1}^{3} Σ_{j=1}^{3} |M(i, j) − M(8−i, 8−j)| (11)
F8 = Σ_{i=1}^{3} Σ_{j=5}^{7} |M(i, j) − M(8−i, 8−j)| (12)
Six further features, F9 to F14, are extracted based on the percentage of occupancy of the centre point C1 against the different regions R1, R2, R3, R4, H1 and V1. These are calculated as:
F9 = C1 / Σ_{i=1}^{3} Σ_{j=1}^{3} M(i, j) (13)
F10 = C1 / Σ_{i=1}^{3} Σ_{j=5}^{7} M(i, j) (14)
F11 = C1 / Σ_{i=5}^{7} Σ_{j=1}^{3} M(i, j) (15)
F12 = C1 / Σ_{i=5}^{7} Σ_{j=5}^{7} M(i, j) (16)
F13 = C1 / (Σ_{j=1}^{7} M(4, j) − C1) (17)
F14 = C1 / (Σ_{i=1}^{7} M(i, 4) − C1) (18)
The next four features, F15 to F18, are new and are extracted from the four sub-regions R5, R6, R7 and R8 as:
F15 = Σ_{i=1}^{4} Σ_{j=1}^{4} M(i, j) (19)
F16 = Σ_{i=5}^{8} Σ_{j=1}^{4} M(i, j) (20)
F17 = Σ_{i=1}^{4} Σ_{j=5}^{8} M(i, j) (21)
F18 = Σ_{i=5}^{8} Σ_{j=5}^{8} M(i, j) (22)
The last two features, F19 and F20, are based on the symmetry of horizontal region H2 and vertical region V2:
F19 = Σ_{k=1}^{3} |M(8, k) − M(8, 8−k)| (23)
F20 = Σ_{k=1}^{3} |M(k, 8) − M(8−k, 8)| (24)
The values of all these features have been studied. Luo et al.10 (2007) used the first fourteen features, i.e. F1 to F14, based on Eq. (5) to Eq. (18), to classify the images. In addition to these fourteen features, another set of six features based on Eq. (19) to Eq. (24) has been added to increase the capability of the classifier. A further set of features, based on the occupancy of centre points C2, C3 and C4, has been studied but is not included in the classifier design, as less deviation is observed in their feature values. Fig. 7 illustrates an example of feature values for original and spliced images for the existing and proposed classifiers, for the image shown in Fig. 1. The first fourteen features (F1-F14) are common to both classifiers and the next six (F15-F20) are added in the proposed classifier.
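Assuming an 8x8 BACM M as extracted in the previous section, the twenty features can be sketched as follows. A 1-based helper m(i, j) keeps the code close to Eqs. (5)-(24); the function name and structure are our own illustration, not the paper's implementation.

```python
import numpy as np

def bacm_features(M):
    """Features F1-F20 from an 8x8 BACM, per Eqs. (5)-(24)."""
    m = lambda i, j: M[i - 1, j - 1]   # 1-based access, as in the paper
    C1 = m(4, 4)
    R, Rp = range(1, 4), range(5, 8)   # 3x3 corner-region index ranges
    F = []
    # F1, F2: symmetry of row 4 (H1) and column 4 (V1)
    F.append(sum(abs(m(4, k) - m(4, 8 - k)) for k in range(1, 4)))
    F.append(sum(abs(m(k, 4) - m(8 - k, 4)) for k in range(1, 4)))
    # F3-F8: symmetry between the 3x3 regions R1-R4
    F.append(sum(abs(m(i, j) - m(i, 8 - j)) for i in R for j in R))       # R1 vs R2
    F.append(sum(abs(m(i, j) - m(i, 8 - j)) for i in Rp for j in R))      # R3 vs R4
    F.append(sum(abs(m(i, j) - m(8 - i, j)) for i in R for j in R))       # R1 vs R3
    F.append(sum(abs(m(i, j) - m(8 - i, j)) for i in R for j in Rp))      # R2 vs R4
    F.append(sum(abs(m(i, j) - m(8 - i, 8 - j)) for i in R for j in R))   # R1 vs R4
    F.append(sum(abs(m(i, j) - m(8 - i, 8 - j)) for i in R for j in Rp))  # R2 vs R3
    # F9-F14: occupancy of centre point C1 against R1-R4, H1 and V1
    F.append(C1 / sum(m(i, j) for i in R for j in R))
    F.append(C1 / sum(m(i, j) for i in R for j in Rp))
    F.append(C1 / sum(m(i, j) for i in Rp for j in R))
    F.append(C1 / sum(m(i, j) for i in Rp for j in Rp))
    F.append(C1 / (sum(m(4, j) for j in range(1, 8)) - C1))
    F.append(C1 / (sum(m(i, 4) for i in range(1, 8)) - C1))
    # F15-F18: the 4x4 sub-regions R5-R8
    F.append(sum(m(i, j) for i in range(1, 5) for j in range(1, 5)))
    F.append(sum(m(i, j) for i in range(5, 9) for j in range(1, 5)))
    F.append(sum(m(i, j) for i in range(1, 5) for j in range(5, 9)))
    F.append(sum(m(i, j) for i in range(5, 9) for j in range(5, 9)))
    # F19, F20: symmetry of row 8 (H2) and column 8 (V2)
    F.append(sum(abs(m(8, k) - m(8, 8 - k)) for k in range(1, 4)))
    F.append(sum(abs(m(k, 8) - m(8 - k, 8)) for k in range(1, 4)))
    return np.array(F)
```

For a perfectly symmetric BACM the symmetry features F1-F8, F19 and F20 vanish; asymmetries introduced by the image's compression history drive them away from zero.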
Fig. 7 Representation of feature values for the original image (Fig. 1a) and spliced images (Fig. 1c at QF100, QF80, QF60)
Performance of the existing and proposed classifiers at different training and testing quality factors (TPR, TNR and Accuracy in %):

Train QF   Test QF    Existing: TPR   TNR    Acc    AUC       Proposed: TPR   TNR    Acc    AUC
QF100      QF60       95.4   98.6   97.0   0.9912             94.3   98.8   96.5   0.9971
QF100      QF80       71.0   95.4   83.2   0.8518             79.6   94.6   85.7   0.9195
QF100      QF100      71.0   96.0   80.8   0.8477             80.6   94.6   86.0   0.9067
QF80       QF60       86.7   98.5   92.4   0.9764             90.7   98.8   95.1   0.9796
QF80       QF80       74.1   95.0   85.2   0.8634             78.2   95.8   87.4   0.9168
QF80       QF100      70.3   97.1   83.8   0.8616             80.5   96.2   88.5   0.9324
QF60       QF60       66.1   96.3   82.2   0.8253             71.1   95.9   84.4   0.8905
QF60       QF80       81.7   96.9   89.8   0.9414             83.4   97.3   90.8   0.9712
QF60       QF100      86.5   97.5   92.6   0.9567             86.0   97.9   92.0   0.9639
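The training and testing pipeline of Fig. 2 can be sketched with scikit-learn's SVC, which wraps the same LIBSVM library the paper cites. The random vectors below are hypothetical placeholders standing in for the 20-dimensional BACM feature vectors of the CASIA images; they only exercise the shape of the pipeline.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, roc_auc_score

# Placeholder data: each image contributes a 20-dimensional BACM
# feature vector with label 0 (original) or 1 (spliced).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 20))
y_train = rng.integers(0, 2, size=200)
X_test = rng.normal(size=(50, 20))
y_test = rng.integers(0, 2, size=50)

# RBF-kernel SVM; probability=True enables class scores for the AUC.
clf = SVC(kernel="rbf", probability=True, random_state=0)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)                 # original vs spliced labels
scores = clf.predict_proba(X_test)[:, 1]   # score for the spliced class
acc = accuracy_score(y_test, pred)
auc = roc_auc_score(y_test, scores)
```

With real BACM features in place of the random data, acc and auc correspond to the Accuracy and AUC columns reported in the table above.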
[Chart: comparison of accuracy for the existing and proposed classifiers]