We will call those coefficients usable coefficients. OutGuess first calculates the maximal length of a randomly-spread message that can be embedded in the image while making sure that one will be able to make corrections to adjust the histogram to its original values. After embedding m pseudo-random bits in the LSBs of the cover-image in randomly selected usable coefficients, the histogram values $(h_{2i}, h_{2i+1})$ will be changed to

$$h_{2i} \to h_{2i} - \alpha\,(h_{2i} - h_{2i+1}),$$

$$p a h_{2i} + q a h_{2i} - 2pqa^2 h_{2i}^2 / h_{2i+1} = a h_{2i}\,(p + q - 2pqa\,h_{2i}/h_{2i+1}),$$

respectively. Thus, the total number of expected changes in the cover image after consecutive embedding of two independent randomly-spread messages of size 2paP and 2qaP bits, 0 ≤ p, q ≤ 1, is

$$T_{pq} = 2a \sum_{i \neq 0} h_{2i}\left(p + q - apq\left(1 + \frac{h_{2i}}{h_{2i+1}}\right)\right). \qquad (2)$$
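As a minimal sketch, Equation (2) can be evaluated numerically as follows. The function name `expected_changes`, the representation of the histogram as a list of $(h_{2i}, h_{2i+1})$ pairs for $i \neq 0$, and the toy values are assumptions made for illustration; they are not taken from OutGuess or from the authors' implementation.

```python
from typing import Sequence, Tuple


def expected_changes(pairs: Sequence[Tuple[float, float]],
                     a: float, p: float, q: float) -> float:
    """Evaluate T_pq of Equation (2).

    pairs holds (h_2i, h_2i+1) for every i != 0 (assumed input layout);
    a is the capacity parameter, p and q the relative message lengths.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("p and q must lie in [0, 1]")
    total = 0.0
    for h_even, h_odd in pairs:
        if h_odd == 0:
            raise ValueError("h_2i+1 must be non-zero")
        # One summand of Equation (2) for the pair (h_2i, h_2i+1).
        total += h_even * (p + q - a * p * q * (1.0 + h_even / h_odd))
    return 2.0 * a * total


if __name__ == "__main__":
    # Toy histogram pairs, chosen only to exercise the formula.
    toy_pairs = [(100.0, 50.0), (40.0, 20.0)]
    print(expected_changes(toy_pairs, a=0.25, p=1.0, q=0.0))
    print(expected_changes(toy_pairs, a=0.25, p=1.0, q=1.0))
```

With q = 0 the cross term vanishes and the expression reduces to $2ap\sum_{i \neq 0} h_{2i}$, which is a quick sanity check on any implementation.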
$$p = \frac{S_0 - S}{S_0 - S_1}. \qquad (4)$$

The linear interpolation and Equation (4) can be justified using Equation (2) for the number of changes. Because the blockiness is a linear function of the number of DCT coefficients with flipped LSBs, we can write $B(p) = c + dT_p$, where $T_p$ is the number of coefficients with flipped LSBs after embedding a message of length 2paP bits, and c and d are constants. Using (2) we can write

$$S_1 = B_1(1) - B(1) = d\,(T_{11} - T_{10}) = 2ad \sum_{i \neq 0} h_{2i}\left(1 - a\left(1 + \frac{h_{2i}}{h_{2i+1}}\right)\right)$$

$$S_0 = B(1) - B(0) = d\,(T_{10} - T_{00}) = 2ad \sum_{i \neq 0} h_{2i}$$

$$S = B_s(1) - B_s(0) = d\,(T_{p1} - T_{p0}) = 2ad \sum_{i \neq 0} h_{2i}\left(1 - ap\left(1 + \frac{h_{2i}}{h_{2i+1}}\right)\right)$$

which, after simple algebra, confirms Equation (4): both $S_0 - S$ and $S_0 - S_1$ are proportional to $\sum_{i \neq 0} h_{2i}\,(1 + h_{2i}/h_{2i+1})$, and $S_0 - S = p\,(S_0 - S_1)$. Equation (4) generally provides an accurate estimate of the secret message length. However, there are some situations when a large error may occur. This happens when the image sent to OutGuess is already a JPEG file. OutGuess always decompresses the cover image to the spatial domain and then recompresses it using a specified quality factor. The message is then embedded into this

$$Q_c = \arg\min_Q \sum_{(i,j)} \sum_d \left( h_d(i,j) - h_d(i,j,Q) \right)^2 .$$

We have tested this algorithm on 70 test grayscale 600×800 JPEG images with quality factors ranging from 70 to 90 and a fixed stego image quality factor $Q_s = 80$. In all but four cases we estimated the cover image quality factor correctly.

The same database of images was used for evaluation of the performance of our detection method. Among the 70 test images, 24 of them were processed using OutGuess with message sizes ranging from the maximal capacity to zero. Because the detection algorithm contains randomization, we have repeated the detection 10 times for each image and averaged the p values obtained from Equation (4). The results are shown in Figure 1. On the y axis is the relative number of changes due to embedding $T_p/aP$ (see Equation (1)) and on the x axis is the image number. Assuming the distribution of the difference between the estimated and actual values is Gaussian, the estimation error is −0.0032 ± 0.0406. From our experiments with Equation (1) on test images, we determined that the number of changes due to the correction step is about 1/3 of the changes due to message embedding. Thus, on average the total number of changes due to embedding m bits is $T_p = \frac{m}{2}\,(1 + 1/3)$. Consequently, the error for the estimated message length m is −0.48 ± 6% of total capacity.
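The estimator of Equation (4) and the conversion of the error in $T_p/aP$ into an error in the message length can be sketched as below. All function names and the toy histogram are hypothetical. The blockiness increments are generated from the closed-form sums for S, $S_0$ and $S_1$ rather than measured on images, so the sketch only checks that the algebra is self-consistent. The factor 3/2 follows from $T_p = \frac{m}{2}(1 + 1/3)$; expressing m in units of aP is an assumption inferred from the reported figures, not something the text states.

```python
from typing import Sequence, Tuple

Pairs = Sequence[Tuple[float, float]]


def blockiness_increment(pairs: Pairs, a: float, d: float, p: float) -> float:
    """Model value of S for an image already carrying a relative message p.

    Implements 2*a*d*sum(h_2i * (1 - a*p*(1 + h_2i/h_2i+1))); p = 0 gives
    S_0 and p = 1 gives S_1.
    """
    return 2.0 * a * d * sum(
        h_even * (1.0 - a * p * (1.0 + h_even / h_odd))
        for h_even, h_odd in pairs
    )


def estimate_p(s0: float, s1: float, s: float) -> float:
    """Equation (4): linear interpolation between S_0 and S_1."""
    if s0 == s1:
        raise ValueError("S_0 and S_1 must differ")
    return (s0 - s) / (s0 - s1)


def message_error_percent(change_error: float) -> float:
    """Convert an error in T_p/aP into an error in m/aP, in percent.

    T_p = (m/2) * (1 + 1/3) = 2m/3, hence m = 1.5 * T_p.
    """
    return 1.5 * change_error * 100.0


if __name__ == "__main__":
    toy_pairs = [(100.0, 50.0), (40.0, 20.0)]  # illustrative values only
    a, d, true_p = 0.25, 0.01, 0.3
    s0 = blockiness_increment(toy_pairs, a, d, 0.0)
    s1 = blockiness_increment(toy_pairs, a, d, 1.0)
    s = blockiness_increment(toy_pairs, a, d, true_p)
    print(estimate_p(s0, s1, s))
    print(message_error_percent(-0.0032), message_error_percent(0.0406))
```

In practice S, $S_0$ and $S_1$ would come from blockiness measurements on the stego image and on the cropped, recompressed approximation of the cover image; the model function here merely stands in for those measurements.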
Figure 1. The actual relative number of changes Tp/aP (circles) compared to the calculated number of changes (triangles) for 70 test JPEG images resized to 600×800 pixels, obtained using a Kodak DC 290 digital camera.
4. CONCLUSION
In this paper, we describe a threshold-free detection methodology for attacking steganographic methods that embed data by modifying quantized DCT coefficients. The detection starts with identifying a macroscopic quantity S(p) that predictably changes with the length of the embedded message. We show how to determine the parameters in S by calculating S(0) and S(1) for an approximation to the cover image obtained by cropping the stego image and recompressing it. Using the values S(0) and S(1), it is possible to calculate an estimate of the length of the embedded message p. For OutGuess, we take the increase in spatial blockiness as a function of p as the macroscopic quantity S. For the database of 70 grayscale images, the estimated relative number of modifications due to embedding is quite close to the actual numbers, with the standard deviation for the error of 4% of the total image capacity.

The detection methodology is based on the assumption that the macroscopic quantity S behaves approximately the same for the cover image and the cropped recompressed stego image. Although this assumption has been verified experimentally, it deserves a more formal mathematical approach. It would be especially useful to automatically detect cases when this assumption is not satisfied and thus the result of the detection may be inaccurate.

For F5, we can take the individual histograms of low-frequency DCT coefficients as the quantity S (for details, see [8]). For J-Steg (including the version of J-Steg with random straddling), one can also use the histogram because it changes predictably with the length of the embedded message.

One of the lessons learned from this paper is that in order to develop a high-capacity steganographic method for JPEGs, one needs to avoid making predictable changes to some macroscopic characteristics of the JPEG file. However, this task seems to be quite difficult if we insist on embedding one bit in each non-zero DCT coefficient. Another lesson is that one should abandon the concept of LSB flipping for embedding and instead use incrementing/decrementing the coefficient values, as already pointed out in [2].

5. ACKNOWLEDGMENTS
The work on this paper was supported by Air Force Research Laboratory, Air Force Material Command, USAF, under a research grant number F30602-02-2-0093. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of Air Force Research Laboratory, or the U.S. Government.

6. REFERENCES
[1] Steganography software for Windows, http://members.tripod.com/steganography/stego/software.html
[2] Westfeld, A. High Capacity Despite Better Steganalysis (F5–A Steganographic Algorithm). In: Moskowitz, I.S. (eds.): Information Hiding. 4th International Workshop. Lecture Notes in Computer Science, Vol. 2137. Springer-Verlag, Berlin Heidelberg New York, 2001, pp. 289–302
[3] Provos, N. Defending Against Statistical Steganalysis. Proc. 10th USENIX Security Symposium. Washington, DC, 2001
[4] Westfeld, A. and Pfitzmann, A. Attacks on Steganographic Systems. In: Pfitzmann, A. (eds.): 3rd International Workshop. Lecture Notes in Computer Science, Vol. 1768. Springer-Verlag, Berlin Heidelberg New York, 2000, pp. 61–75
[5] Provos, N. and Honeyman, P. Detecting Steganographic Content on the Internet. CITI Technical Report 01-11, 2001
[6] Westfeld, A. Detecting Low Embedding Rates. 5th Information Hiding Workshop, Noordwijkerhout, Netherlands, Oct. 7–9, 2002
[7] Farid, H. and Lyu, S. Detecting Hidden Messages Using Higher-Order Statistics and Support Vector Machines. 5th Information Hiding Workshop, Noordwijkerhout, Netherlands, Oct. 7–9, 2002
[8] Fridrich, J., Goljan, M., and Hogea, D. Steganalysis of JPEG Images: Breaking the F5 Algorithm. 5th Information Hiding Workshop, Noordwijkerhout, Netherlands, Oct. 7–9, 2002
[9] Fridrich, J., Goljan, M., and Hogea, D. New Methodology for Breaking Steganographic Techniques for JPEGs. Submitted to SPIE: Electronic Imaging 2003, Security and Watermarking of Multimedia Contents. Santa Clara, California, 2003