
STEGANALYSIS OF BINARY IMAGES

This thesis is presented for the degree of

DOCTOR OF PHILOSOPHY

by

KANG LENG CHIEW

Department of Computing
Faculty of Science
MACQUARIE UNIVERSITY
Australia

June 2011

© 2011 KANG LENG CHIEW

TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
LIST OF PUBLICATIONS
ACKNOWLEDGMENTS

1 Introduction
  1.1 Motivations
  1.2 Research Problems
  1.3 Objectives
  1.4 Research Overview
    1.4.1 Contributions
    1.4.2 Organisation of the Thesis

2 Background and Concepts
  2.1 Overview of Steganography
  2.2 Steganalysis: Model of Adversary
  2.3 Level of Analysis
  2.4 Blind Steganalysis as Pattern Recognition
    2.4.1 Feature Extraction
    2.4.2 Classification
  2.5 Digital Images
    2.5.1 Image File Formats
    2.5.2 Spatial and Frequency Domain Images

3 Literature Review
  3.1 Steganography
    3.1.1 Liang et al. Binary Image Steganography
    3.1.2 Pan et al. Binary Image Steganography
    3.1.3 Tseng and Pan Binary Image Steganography
    3.1.4 Chang et al. Binary Image Steganography
    3.1.5 Wu and Liu Binary Image Steganography
    3.1.6 F5 Steganography
    3.1.7 OutGuess Steganography
    3.1.8 Model-Based Steganography
  3.2 Steganalysis
    3.2.1 Differentiation of Cover and Stego Images
    3.2.2 Classification of Steganographic Methods
    3.2.3 Estimation of Message Length
    3.2.4 Identification of Stego-Bearing Pixels
    3.2.5 Retrieval of Stegokey
    3.2.6 Extracting the Hidden Message

4 Blind Steganalysis
  4.1 Comparison of the Steganography Methods under Analysis
  4.2 Proposed Steganalysis Method
    4.2.1 Grey Level Run Length Matrix
    4.2.2 Pixel Differences
    4.2.3 GLRL Matrix from the Pixel Difference
    4.2.4 GLGL Matrix
    4.2.5 Final Feature Sets
  4.3 Experimental Results
    4.3.1 Experimental Setup
    4.3.2 Results Comparison
  4.4 Conclusion

5 Multi-Class Steganalysis
  5.1 Summary of the Steganographic Methods under Analysis
  5.2 Proposed Steganalysis
    5.2.1 Increasing the Grey Level via the Pixel Difference
    5.2.2 Grey Level Run Length Matrix
    5.2.3 Grey Level Co-Occurrence Matrix
    5.2.4 Cover Image Estimation
    5.2.5 Final Feature Sets
  5.3 Multi-Class Classification
  5.4 Experimental Results
    5.4.1 Experimental Setup
    5.4.2 Results Comparison
  5.5 Conclusion

6 Hidden Message Length Estimation
  6.1 Boundary Pixel Steganography
  6.2 Proposed Method
    6.2.1 512-Pattern Histogram as the Distinguishing Statistic
    6.2.2 Matrix Right Division
    6.2.3 Message Length Estimation
  6.3 Experimental Results
    6.3.1 Experimental Setup
    6.3.2 Results of the Estimation
  6.4 Conclusion

7 Steganographic Payload Location Identification
  7.1 Background
  7.2 Motivation and Challenges
  7.3 Proposed Stego-Bearing Pixel Location Identification
  7.4 Experimental Results
    7.4.1 Experimental Setup
    7.4.2 Results Comparison
  7.5 Conclusion

8 Feature-Pooling Blind JPEG Image Steganalysis
  8.1 Feature Extraction Techniques
    8.1.1 Image Quality Metrics
    8.1.2 Moment of Wavelet Decomposition
    8.1.3 Feature-Based
    8.1.4 Moment of CF of PDF
  8.2 Features-Pooling Steganalysis
    8.2.1 Feature Selection in Feature-Based Method
    8.2.2 Feature-Pooling
  8.3 Experimental Results
    8.3.1 Classifier Selection
    8.3.2 Results Comparison
  8.4 Conclusion

9 Improving JPEG Image Steganalysis
  9.1 Steganography as Additive Noise
  9.2 Image-to-Image Variation Minimisation
  9.3 Steganalysis Improvement
    9.3.1 Moments of Wavelet Decomposition
    9.3.2 Moment of CF of PDF
    9.3.3 Moment of CF of Wavelet Subbands
  9.4 Experimental Results
    9.4.1 Experimental Setup
    9.4.2 Results Comparison
  9.5 Conclusion

10 Conclusions and Future Research Directions
  10.1 Summary
  10.2 Future Research Directions

Bibliography

LIST OF FIGURES
1.1 Overview of the thesis

2.1 General model of steganography
2.2 General framework of blind steganalysis
2.3 Two-class SVM classification

3.1 Example of eligible pixels
3.2 Example of ineligible pixels
3.3 Effect of flipping a pixel
3.4 Measurement of smoothness and connectivity
3.5 Algorithm of model-based steganography
3.6 Co-occurrence matrices extracted from cover and stego images
3.7 Illustration of wavelet decomposition
3.8 Intra- and inter-block correlations in a JPEG image
3.9 The 64 modes of an 8×8 DCT block
3.10 Modified image calibration for double compressed JPEG image
3.11 One-against-one approach for a multi-class classification
3.12 A portion of image histogram before and after LSB embedding
3.13 The boundaries of 8×8 blocks
3.14 The extraction of residual image

4.1 Detection results displayed in ROC curves and AUR

5.1 Pixel difference in vertical direction

6.1 Illustration of a boundary pixel
6.2 Examples of 512 patterns
6.3 Comparison of patterns histogram between cover and stego images
6.4 Histogram difference between two binary images
6.5 Histogram quotient with increasing message length
6.6 Estimated length of hidden messages for all binary images
6.7 Example of a highly distorted stego image
6.8 Estimation error of hidden message length for all binary images

7.1 Identification results for different window sizes
7.2 Comparison of results for image Database A
7.3 Comparison of results for image Database B
7.4 Comparison of results for image Database C

8.1 Features comparison in detecting F5
8.2 Features comparison in detecting OutGuess
8.3 Features comparison in detecting MB1
8.4 Classifier comparison in detecting F5
8.5 Classifier comparison in detecting OutGuess
8.6 Classifier comparison in detecting MB1
8.7 Comparison of steganalysis performance in detecting F5
8.8 Comparison of steganalysis performance in detecting OutGuess
8.9 Comparison of steganalysis performance in detecting MB1

9.1 Two images with their respective underlying statistics
9.2 Transformed image by scaling and cropping

LIST OF TABLES
4.1 Comparison of the steganographic techniques
4.2 Summary of the 68-dimensional feature space
4.3 Experimental parameters

5.1 Properties of features
5.2 Example of majority-voting strategy for multi-class SVM
5.3 Summary of image databases
5.4 Summary of stego image databases
5.5 Confusion matrix for the textual database
5.6 Confusion matrix for the mixture database
5.7 Confusion matrix for the scene database

6.1 Mean and standard deviation of the estimation

7.1 Summary of image databases
7.2 The accuracy of the identification for image Database A
7.3 The accuracy of the identification for image Database B
7.4 The accuracy of the identification for image Database C

8.1 Feature selection comparison for SFFS, T-test and Bhattacharyya

9.1 Comparison for the proposed technique and the Farid technique
9.2 Comparison for the proposed technique and the COM technique
9.3 Comparison for the proposed technique and the MW technique

ABSTRACT

Steganography is the science of hiding messages in multimedia documents. A message can be hidden in a document only if the content of the document has high redundancy. Although the embedded message changes the characteristics and nature of the document, these changes must be difficult for an unsuspecting user to identify. Steganalysis, on the other hand, develops theories, methods and techniques that can be used to detect hidden messages in multimedia documents. Documents without any hidden messages are called cover documents, and documents with hidden messages are called stego documents.

The work of this thesis concentrates on image steganalysis. We present four different types of steganalysis techniques. These techniques are developed to counteract steganographic methods that use binary (black and white) images as the cover media. Unlike greyscale and colour images, binary images have a rather modest statistical nature, which makes it difficult to apply existing steganalysis directly to binary images.

The first steganalysis technique addresses blind steganalysis. Its objective is to detect the existence of a secret message in a binary image. Since the detection of a secret message is often modelled as a classification problem, it can be approached using pattern recognition methodology.

The second steganalysis technique is known as multi-class steganalysis. Its purpose is to identify the type of steganographic method used to create the stego image. This extends the earlier blind steganalysis from two-class (cover or stego image) to multi-class (cover or different types of stego images) classification. Like blind steganalysis, this technique is also based on pattern recognition methodology.

The third steganalysis technique uses a first-order statistic, the binary pattern histogram, to estimate the length of an embedded message. This technique is used specifically to analyse the steganography developed by Liang et al. The estimated message length usually plays an important role and is needed at other levels of analysis.

The fourth steganalysis technique identifies the steganographic payload locations based on multiple stego images. This technique can reveal which pixels in the binary image carry the message bits. It is crucial because it not only reveals the existence of a hidden message, it also provides the information needed to locate the hidden message.

Finally, we propose two improvements to existing JPEG image steganalysis. We combine several feature sets and apply a feature selection technique to obtain a set of powerful features. We show that by minimising the influence of image content, we can improve the features' sensitivity with respect to steganographic alteration.


STATEMENT OF CANDIDATE

I certify that the work in this thesis entitled "Steganalysis of Binary Images" has not previously been submitted for a degree, nor has it been submitted as part of the requirements for a degree, to any other university or institution other than Macquarie University. I also certify that this thesis is an original piece of research and that it has been written by me. Any help and assistance that I have received in my research work and in the preparation of the thesis itself have been appropriately acknowledged. In addition, I certify that all information sources and literature used are indicated in the thesis.

KANG LENG CHIEW (41375521) 8 June 2011


LIST OF PUBLICATIONS

1. K. L. Chiew and J. Pieprzyk. Features-Pooling Blind JPEG Image Steganalysis. IEEE Conference on Digital Image Computing: Techniques and Applications, 96–103, 2008.

2. K. L. Chiew and J. Pieprzyk. JPEG Image Steganalysis Improvement via Image-to-image Variation Minimization. International IEEE Conference on Advanced Computer Theory and Engineering, 223–227, 2008.

3. K. L. Chiew and J. Pieprzyk. Estimating Hidden Message Length in Binary Image Embedded by Using Boundary Pixels Steganography. International Conference on Availability, Reliability and Security, 683–688, 2010.

4. K. L. Chiew and J. Pieprzyk. Blind Steganalysis: A Countermeasure for Binary Image Steganography. International Conference on Availability, Reliability and Security, 653–658, 2010.

5. K. L. Chiew and J. Pieprzyk. Binary Image Steganographic Techniques Classification Based on Multi-Class Steganalysis. 6th International Conference on Information Security, Practice and Experience, 6047:341–358, 2010.

6. K. L. Chiew and J. Pieprzyk. Identifying Steganographic Payload Location in Binary Image. 11th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing, 6297:590–600, 2010.

ACKNOWLEDGMENTS

I would like to express my sincere appreciation to my supervisor, Professor Josef Pieprzyk, for his countless help, assistance and guidance at every stage of my research. I have benefited a lot from the valuable discussions with him since the very beginning of my research. I would also like to express my gratitude and special thanks to Dr Scott McCallum for being so patient and inspiring in guiding my academic writing. The interaction with him has tremendously improved my understanding of academic writing.

I want to take this opportunity to thank the Ministry of Higher Education Malaysia and Universiti Malaysia Sarawak for providing me with the SLAI scholarship for this research. I am also very grateful for the HDR Project Support Funds provided by Macquarie University.

Very special thanks to Joan for spending valuable time proofreading my thesis. I would like to thank Nana, who always provided me with valuable information, hints and updates related to my research. I would also like to thank Gaurav for the enjoyable discussions and interactions. To all the staff in the Department of Computing, your excellent support is highly appreciated.

Thanks to my parents, brother, sister and brother-in-law for their continuous support, encouragement and motivation throughout the years. I am so grateful to my wife for her love, thoughtful comments, support and nurturing in all aspects. Her advice and encouragement have always been my point of reference whenever I am lost. Surviving the difficult moments would have been much tougher without her company. And finally, to all the people who have helped directly and indirectly to support me throughout this undertaking, thank you.

This thesis was edited by Dr Lisa Lines, and editorial intervention was restricted to Standards D and E of the Australian Standards for Editing Practice.


Chapter 1 Introduction
The process of sending messages between two parties through a public channel in such a way that the adversary does not realise the existence of the communication is known as steganography. Tracing back to antiquity, Histiaeus shaved a slave's head, wrote a message on his scalp and, after the hair grew back, sent the slave as a messenger to convey the steganographic content [12]. The Greeks received warning about Xerxes' intention to invade from a message hidden underneath the wax of a writing tablet [3, 84]. In more recent history, invisible ink was used as a form of steganography during World War II [12, 59] to establish covert communication. An application of steganography was reported in the literature around the 1980s, when British Prime Minister Margaret Thatcher had word processors programmed to encode the writer's identity in the word spacing, in order to trace the disloyal ministers responsible for leaks of cabinet documents [2, 3].

The ongoing development of computer and network technologies provides an excellent new channel for steganography. Most digital documents contain redundancy. This means that there are parts of documents that can be modified without an impact on their quality. The redundant parts of a document can be identified in many distinct ways. Consider an image. Typically, the margins of the image do not convey any significant information, and they can be used to hide a secret message. Also, some pixels of the image can be modified to carry a small number of secret bits, as a small modification (e.g., to the least significant bit of pixels) will not be noticeable to an unsuspecting user. As the redundant parts of a digital document can be determined in a variety of ways, many steganographic methods can be developed.

Primarily, steganography considers methods and techniques that can create covert communication channels for unobtrusive transmission for military purposes. Steganography is also used for automatic monitoring of radio advertisements, indexing of videomail (to embed comments) and medical imaging (to embed information such as patient and physician names, DNA sequences and other particulars) [3]. Other applications include smart video-audio synchronisation, secure and invisible storage of confidential information, identity cards (to embed an individual's details) and checksum embedding [12]. Steganography is also used for the less dramatic purpose of watermarking. The applications of watermarking mainly involve the protection of intellectual property, such as ownership protection, file duplication management, document authentication (by inserting an appropriate digital signature) and file annotation.
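The least-significant-bit idea mentioned above can be sketched in a few lines. The following is an illustrative toy for an 8-bit greyscale image, not one of the binary-image schemes analysed in this thesis; real schemes select pixels with a stegokey rather than sequentially:

```python
import numpy as np

def lsb_embed(cover, bits):
    """Hide message bits in the least significant bits of the first
    len(bits) pixels (toy sequential embedding, illustration only)."""
    stego = cover.copy()
    flat = stego.ravel()                 # a view into the copy
    for i, b in enumerate(bits):
        flat[i] = (flat[i] & 0xFE) | b   # clear the LSB, then set it to b
    return stego

def lsb_extract(stego, n):
    """Read back the first n embedded bits."""
    return [int(p) & 1 for p in stego.ravel()[:n]]

cover = np.array([[100, 101], [102, 103]], dtype=np.uint8)
stego = lsb_embed(cover, [1, 0, 1, 1])
```

Each pixel value changes by at most one grey level, which is why such a modification is invisible to an unsuspecting viewer yet detectable statistically.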

1.1 Motivations

Like most other areas, steganography has thrived in the digital era. Many interesting steganographic techniques have been created, and its continuing evolution is guaranteed by a growing need for information security. Inevitably, these techniques are potentially open to abuse and can be used by criminals and terrorists. An article from USA Today stated that steganography was used by terrorists [60], although there was little evidence to substantiate this claim [79]. Nonetheless, the 9/11 incident triggered immediate concern about the possibility that steganography could be used in the planning of terrorism. In addition, several reports in the literature state that steganography has been suspected as a possible means of covert communication and planning of terrorist attacks [6, 103, 52]. A training manual for the Mujahideen, which contains an exposition on image steganography over the Internet, is also reported in Hogan's PhD thesis [52]. While initially the use of steganography by terrorists appeared doubtful, it has since become accepted and should be treated seriously.

In less drastic cases, those who wish to evade surveillance (e.g., those who have reason to fear punishment for expressing sensitive political thoughts) can use steganography. For example, the communication between members of a political dissident organisation is usually under surveillance. The adversary (i.e., government agencies) may arrest the dissidents if evidence of sensitive issues being discussed and planned is found. Therefore, steganography may be the safest form of communication between dissidents. There is a large number of steganographic tools available as commercial software or freeware, which can be easily downloaded. With these tools, accomplishing such activities becomes even simpler¹. As a result, this has created unique challenges for law enforcement agencies.

Digital media and information technology have developed rapidly and are ubiquitous. Information is stored digitally and is abundant. Specifically, there is a multitude of daily tasks that involve dealing with documents. The originals of these documents might be digital, or they may be converted from hardcopies into appropriate digital formats. In general, the majority of documents are binary (black and white), consisting of foreground (black) and background (white). Scanning such a document produces a binary image that can potentially be used as a medium for steganography; this deserves careful analysis. Despite the importance and widespread use of binary images, binary image steganography has received little attention, especially its steganalysis. More research exists on the more commonplace steganalysis of greyscale and colour images; however, these techniques cannot be directly used to analyse binary image steganography. Therefore, a more appropriate and effective set of techniques should be developed.

1.2 Research Problems

In general, steganalysis techniques can be categorised into six levels, depending on how much information about the hidden message we require. These levels (ordered according to the increasing amount of information acquired) are as follows:

- Differentiation between cover and stego documents: this is the first step in steganalysis, and its purpose is to determine whether a given document carries a hidden message.
- Identification of the steganographic method: this technique identifies the type of steganographic method used; it is the so-called multi-class steganalysis.
- Estimation of the length of a hidden message: this technique reveals the amount of embedded message as the acquired information.
- Identification of stego-bearing pixels: this technique uncovers the exact locations of the pixels used to carry the message bits.
- Retrieval of the stegokey: this technique provides access to the stego-bearing pixels as well as the embedding sequence.
- Message extraction: this technique normally involves extracting and deciphering the hidden message to obtain a meaningful message.

¹ A list of free steganographic tools can be found in the citation entry #25 given in [12].

1.3 Objectives

The main part of the thesis is steganalysis of information hiding techniques. The task of steganalysis is to design an algorithm that can tell apart a cover document from a copy that carries a hidden message. The larger part of the steganalysis work published so far deals with greyscale and colour images. We consider the less explored area of binary image steganography, which is becoming more and more important for electronic publishing, the distribution and management of printed documents, and electronic libraries. To summarise, our main objectives cover the following:

- To study techniques that can be applied to distinguish images hiding secret messages from those without. Such a technique will serve as an automated system to perform the analysis on a large number of images.
- To evaluate the functionality of the steganalysis technique across different steganographic methods. In particular, we are going to investigate how the steganalysis technique could be used to detect new and unknown steganographic methods.
- To investigate different types of binary image steganography. This is important to gain an understanding of the internal mechanism used during the embedding operation.
- To make contributions that will extend the steganalysis technique to extract additional secret parameters. These secret parameters include the hidden message length, the type of steganographic method used, the locations of stego-bearing pixels and the secret key.

Note that there are two aspects of steganalysis. The first relates to the attempt to break or attack a steganographic scheme; the second uses steganalysis as an effective way of evaluating and measuring the security performance of steganography. This work studies steganalysis in terms of the first aspect. In particular, we aim to carry out different levels of analysis to extract the relevant secret parameters.

[Figure 1.1: Overview of the thesis. The chapters are grouped into three parts: Background and Review (Chapters 1, 2 and 3), Binary Image Steganalysis (Chapters 4, 5, 6 and 7) and Steganalysis Enhancement (Chapters 8 and 9), concluding with Chapter 10.]

1.4 Research Overview

The general structure of the thesis is shown in Figure 1.1. The chapters can be divided into three parts: background and review, binary image steganalysis and steganalysis enhancement. The background and review part describes the main developments and concepts in steganography and its analysis. It also describes the state of the art and the major publications that have influenced research developments in the field. The binary image steganalysis part presents techniques to counteract binary image steganography; the underlying idea is to employ statistical techniques to analyse the given images. The steganalysis enhancement part provides improvements to some of the existing steganalysis techniques that deal with JPEG images.

1.4.1 Contributions

The major contributions of this thesis are listed below.

Blind steganalysis. We have developed a steganalysis technique to distinguish a stego image from a cover image. Notably, we have broken several steganographic methods from the literature. This technique uses an image processing technique that extracts sensitive statistical data as the feature set. From the feature set, it employs a classifier to determine the existence of a secret message. In addition, this technique can be refined and used to detect a different type of steganographic method. This property is important when dealing with unknown and new steganographic methods.

Multi-class steganalysis. We have extended our blind steganalysis to determine the type of steganographic method used to produce the stego image. This is important information that allows an adversary to mount a more specific attack. Based on our literature review, this is the first multi-class steganalysis technique developed specifically to attack binary image steganography.

Message length estimation. We have designed a simple yet effective technique based on a first-order statistic to estimate the length of an embedded message. This estimation is crucial and is normally required if we intend to extract a hidden message. We have identified that notches and protrusions can be utilised to approximate the degree of image distortion caused by the embedding operation. In particular, this technique attacks the steganographic method developed in [69].

Steganographic payload location identification. We have presented a technique to identify the locations where hidden message bits are embedded. This technique is one of very few results in the literature able to extract additional secret information. This information is very important for an adversary who wishes to remove a hidden message or deceive the communicating parties.

Enhancement of existing steganalysis techniques. We have proposed improvements to existing JPEG image steganalysis. Specifically, we select and combine several types of features from several existing steganalysis techniques, using a feature selection technique, to form a more powerful blind steganalysis. We have shown that the technique improves the detection accuracy and also reduces the computational resources required. We also show that by minimising the influence of image content, the detection accuracy can be improved.
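The pattern-recognition view underlying the blind steganalysis contribution can be illustrated with a minimal sketch: extract a statistical feature vector from each image, then classify it as cover or stego. The run-length feature and nearest-centroid classifier below are simplified stand-ins invented for illustration; the thesis itself builds much richer GLRL/GLGL-matrix features and uses an SVM classifier:

```python
import numpy as np

def run_length_features(img):
    """Toy feature vector for a binary image: mean and variance of the
    horizontal black-pixel run lengths (black = 0). A crude stand-in for
    the GLRL-matrix features developed in Chapter 4."""
    runs = []
    for row in np.asarray(img):
        length = 0
        for px in row:
            if px == 0:            # a black pixel extends the current run
                length += 1
            elif length:           # a white pixel ends the run
                runs.append(length)
                length = 0
        if length:                 # a run reaching the row boundary
            runs.append(length)
    runs = np.array(runs or [0], dtype=float)
    return np.array([runs.mean(), runs.var()])

class NearestCentroid:
    """Minimal two-class classifier; the thesis uses an SVM instead."""
    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.centroids = {c: X[y == c].mean(axis=0) for c in set(y.tolist())}
        return self

    def predict(self, X):
        return [min(self.centroids,
                    key=lambda c: np.linalg.norm(x - self.centroids[c]))
                for x in np.asarray(X, float)]
```

Training amounts to calling `fit` on feature vectors from known cover and stego images; embedding tends to shorten and fragment black runs, which is the kind of statistical disturbance the classifier picks up.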

1.4.2 Organisation of the Thesis

The rest of the thesis is organised into nine chapters. Chapter 2 introduces some background to explore the state-of-the-art techniques studied in this work. Additionally, we introduce the fundamental concepts that will be used in the following chapters. More precisely, this chapter gives short introductions to the eld, including the denitions, terms, synonyms and taxonomy. Chapter 3 reviews the literature related to our work. We select several steganalysis 6

techniques that are going to be analysed in the thesis. To make the presentation as meaningful as possible, the reviews are organised into dierent levels of analysis. There are myriad of possible steganographic methods available; however, we will discuss only the methods selected for our analysis. Please refer to [12] for a comprehensive review of steganography. Our steganalysis starts from nding an algorithm that is able to distinguish a cover image from a stego one. This work employs pattern recognition methodology to perform the classication. Our focus is to extract a discriminative feature set to enable accurate detection of the existence of secret messages. This analysis was published in [20] and is presented in Chapter 4. Chapter 5 discusses an algorithm for identication of a steganographic method that has been used to embed a secret message into a binary image. We assume that the collection of possible methods is known. The objective of this analysis is twofold: to dierentiate an image with hidden message from one without and to identify the type of steganographic method used. This analysis is an extension on the work presented in Chapter 4 to form a more powerful multi-class steganalysis. This work has been published in [19]. In Chapter 6, we present a technique for estimating the length of a hidden message embedded in a binary image. This estimated length is one of the important secret steganographic parameters and is usually required to accomplish further analysis, such as retrieving the stegokey shared between the sender and receiver. The technique presented in this chapter has also been published in [21]. The work done in the previous chapters so far has enabled us to discriminate images with a hidden message from those without one. However, the ability to discriminate images does not enable us to locate the hidden message. Therefore, we wish to investigate the identication of hidden message bits location in an image. 
The work is based on the concept developed by Ker [62], where it is assumed that we may access different stego images with message bits embedded in the same locations. This assumption holds when the same stegokey is reused for a batch of secret communications. The essential difference is the medium under analysis, namely the binary image, which is known to have modest statistical characteristics. This work is presented in Chapter 7. An initial study for this chapter has been published in [22]. Although the previous chapters focus primarily on binary image steganalysis, we have also paid attention to steganalysis in other image domains. Our contribution to greyscale image steganalysis is supplementary, but is as important as that of the other chapters and is presented in Chapters 8 and 9. This work can be considered an adjunct to existing steganalysis techniques that contributes some enhancements. The enhancements discussed in Chapters 8 and 9 have been published in [17] and [18], respectively. We conclude the thesis in Chapter 10, where we discuss possible future directions for the research.

Chapter 2 Background and Concepts


This chapter introduces and defines the concepts used throughout this thesis and provides relevant background information. We start by providing an overview of steganography and a formal definition. We also provide a description of its counterpart, namely steganalysis. We discuss different types of steganalysis, which are referred to as different levels of analysis. For steganalysis that involves classification, we dedicate a section that discusses different types of classifiers. Finally, since this thesis focuses on the analysis of image steganography, we also provide a description of the variety of common digital images used for steganography.

2.1 Overview of Steganography

Usually, cryptography is used to protect a communication from eavesdropping. Messages are encrypted and only a rightful recipient can decrypt and read them. However, encrypted messages are conspicuous, which might arouse the suspicion of an eavesdropper. Consequently, the communication is probably susceptible to attacks. Steganography is an alternative method for privacy and security. Instead of encrypting, we can hide the messages in an innocuous-looking medium (carrier) so that their existence is not revealed. Clearly, while the goal of cryptography is to protect the content of messages, the goal of steganography is to hide the existence of messages. An advantage of steganography is that it can be employed to secretly transmit messages without the fact of the transmission being discovered. Often, cryptography and steganography are used together to achieve higher security.

Figure 2.1: General model of steganography

Steganography can be mathematically defined as follows:

Emb : C × M × K → S,
Ext : S × K → M,   (2.1)

such that Emb(C, M, K) = S and Ext(S, K) = M. Emb and Ext are the embedding and extraction mapping functions, respectively. C is the cover medium, S is the medium embedded with message M and K denotes the key. Figure 2.1 shows a simple representation of the generic embedding and extraction operations in steganography. During the embedding operation, a message is inserted into the medium by altering some portion of it. The extraction operation involves the recovery of the message from the medium. In this example, the message is embedded inside a carrier and is transmitted via a public channel (e.g., the Internet). At the receiving site, the message is extracted using the key shared between the sender and receiver. The message is the hidden information and can be plain text, cipher text, an image or anything that can be converted into a stream of bits. Consider a typical image steganography. In the embedding operation, a secret message is transformed into a stream of bits, which is embedded into the least significant bits (LSBs) of the image pixels. The embedding overwrites the pixel LSB with the message bit if the pixel LSB and message bit do not match. Otherwise, no changes are necessary. For the extraction operation, message bits are retrieved from pixel LSBs and combined to form the secret message. There are two main selection algorithms that can be employed to embed secret message bits: sequential and random. For sequential selection, the locations of


pixels used for embedding are selected sequentially, one after another. For instance, pixels are selected from left to right and top to bottom until all message bits are embedded. With random selection, the locations of the pixels used for embedding are permuted and distributed over the whole image. The distribution of the message bits is controlled by a pseudorandom number generator (PRNG) whose seed is a secret shared by the sender and the receiver. This seed is also called the stegokey. The latter selection method provides better security than the former because random selection scatters the image distortion over the whole image, which makes it less perceptible. In addition, the complexity of tracing the selection path for an adversary is increased when random selection is applied. Apart from this, steganographic security can be enhanced by encrypting the secret message before embedding it. Almost any form of digital media can be used for steganographic purposes as long as the information in the media has redundancy. These media can be classified into (but are not limited to) the following categories: images, videos, audio, texts, executable files and computer file systems [67, 94, 5, 81, 29, 26, 104, 118, 46, 1, 28, 83]. The most common medium is an image, as the large redundancy of images allows easy embedding of messages [78]. The input image used in the embedding operation is called the cover image; the generated output image (with the secret message embedded in it) is called the stego image. Ideally, the cover and stego images should appear identical: it should be difficult for an unsuspecting user to tell the stego image apart from the cover image. A list of possible choices for cover images includes binary (black and white), greyscale and colour images. Tseng and Pan [107] developed a steganography that embeds a secret message in a binary image, and Liang et al. [69] used binary images in their steganography.
OutGuess [90] and F5 [110] are examples of steganography that apply to greyscale and colour images. A more recent steganographic method developed by Yang (see [117]) uses colour images.
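The LSB embedding and extraction operations described above, together with sequential and PRNG-driven random pixel selection, can be sketched as follows. This is a minimal illustration; the function names and the use of Python's `random` module as the PRNG are our own choices, not part of any particular steganographic tool.

```python
import random

def embed_lsb(pixels, message_bits, stegokey=None):
    """Embed message bits into pixel LSBs.

    With a stegokey, pixel locations are permuted by a PRNG seeded
    with the key (random selection); without one, pixels are used
    in order (sequential selection).
    """
    stego = list(pixels)
    positions = list(range(len(stego)))
    if stegokey is not None:
        random.Random(stegokey).shuffle(positions)
    for bit, pos in zip(message_bits, positions):
        stego[pos] = (stego[pos] & ~1) | bit  # overwrite the LSB
    return stego

def extract_lsb(stego, n_bits, stegokey=None):
    """Recover n_bits message bits; the same stegokey reproduces
    the embedding order used by the sender."""
    positions = list(range(len(stego)))
    if stegokey is not None:
        random.Random(stegokey).shuffle(positions)
    return [stego[pos] & 1 for pos in positions[:n_bits]]

cover = [100, 101, 102, 103, 104, 105, 106, 107]
bits = [1, 0, 1, 1]
stego = embed_lsb(cover, bits, stegokey=42)
assert extract_lsb(stego, len(bits), stegokey=42) == bits
```

Note how a pixel whose LSB already matches the message bit is left unchanged, which is one reason detection and localisation of hidden bits is non-trivial.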

2.2 Steganalysis: Model of Adversary

The invasive nature of steganography leaves detectable traces within the stego image. This allows an adversary to use steganalysis techniques to reveal that a secret communication is taking place. Sometimes, an adversary is also referred to as a warden. In general, there are two types of warden: passive and active. A passive warden only examines the communication and wishes to know if the communication contains some hidden messages. The warden does not modify the content of the communication. For example, the communication is allowed if no evidence of a secret message is found. Otherwise, it is blocked. On the other hand, an active warden may introduce distortion to interrupt and destroy the communication even though there is no evidence of secret communication. Most current steganographic methods are designed for the passive warden scenario. Without loss of generality, we will use the term adversary instead of warden in all the following steganalysis scenarios. Besides the warden scenarios discussed above, sometimes an adversary may not have the authority or resources to block the communication. Then, the adversary might wish to acquire related secret information (parameters) or even to extract the secret message. Note that our work is based on this type of adversary, who wants to extract information about a secret message. We will discuss this at length in the next section. In general, there are two types of steganalysis: targeted and blind. Targeted steganalysis is designed to attack one particular embedding algorithm. For example, the work in [7, 49, 57, 42] is considered targeted steganalysis. Targeted steganalysis can produce more accurate results, but it normally fails if the embedding algorithm used is not the target. Blind steganalysis can be considered a universal technique for detecting different types of steganography. Because blind steganalysis can detect a wider class of steganographic techniques, it is generally less accurate; however, blind steganalysis can detect new steganographic techniques for which no targeted steganalysis is available yet. In other words, blind steganalysis is an irreplaceable detection tool if the embedding algorithm is unknown or secret.
The feature-based steganalysis developed in [35] is one example of successful blind steganalysis. Other examples are to be found in [99, 70]. The most widely used definition of steganographic security is based on Cachin's scheme [8]. Let the distributions of the cover image and stego image be denoted as P_C and P_S, respectively. Cachin defined steganographic security by comparing the distributions P_C and P_S. The comparison can be made by using the Kullback-Leibler distance, defined as follows:

D(P_C || P_S) = Σ_{c ∈ C} P_C(c) log [ P_C(c) / P_S(c) ].   (2.2)
When D(P_C || P_S) = 0, the distribution of the stego image, P_S, is identical to the distribution of the cover image, P_C. This implies the steganography is perfectly secure, because it is impossible for the adversary to distinguish between cover and stego images. If D(P_C || P_S) ≤ ε, then Cachin defined the steganography as ε-secure. Thus, the smaller ε is, the greater the likelihood that a covert communication (i.e., steganography) will not be detected. As discussed in [25], another possible way to define steganographic security is based on a specific steganalysis technique. Alternatively, one could define the security with respect to the inability of an adversary to prove the existence of covert communication. In other words, a steganographic method may be considered practically secure if no existing steganalysis technique can be used to mount a successful attack.
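Once the cover and stego distributions have been estimated (e.g., as normalised histograms), Cachin's measure in (2.2) can be computed directly. A minimal sketch, with the distributions given as plain probability lists:

```python
import math

def kl_distance(p_cover, p_stego):
    """Kullback-Leibler distance D(P_C || P_S) between two discrete
    distributions over the same support, given as probability lists."""
    d = 0.0
    for pc, ps in zip(p_cover, p_stego):
        if pc > 0:  # terms with P_C(c) = 0 contribute nothing
            d += pc * math.log(pc / ps)
    return d

# Identical distributions give D = 0: a perfectly secure embedding.
assert kl_distance([0.5, 0.5], [0.5, 0.5]) == 0.0
```

The smaller the returned value, the closer the stego distribution is to the cover distribution, and the harder detection becomes.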

2.3 Level of Analysis

Under ideal circumstances, an adversary applying steganalysis intends to extract the full hidden information. This task can be very difficult, or even impossible, to achieve. Thus, the adversary may start steganalysis with more realistic and modest goals in mind, such as restricting the effort to differentiating cover and stego images, classifying the embedding technique, estimating the length of hidden messages, identifying the locations where bits of hidden information are embedded and retrieving the stegokey. Achieving some of these goals allows improvement of the steganalysis, making it more effective and appropriate for the steganographic method. The first step in analysing steganography can be distinguishing cover from stego images. This involves analysing the characteristics of the image and looking for evidence of abnormalities. This step is plausible because the embedding operation will distort the image content and produce deviations from normal image characteristics. For example, the first-order statistics of a stego image tend to form histogram bin pairing, an abnormal characteristic that practically never occurs in a cover image. Normally, this analysis is known as the most basic level of blind steganalysis. It is also possible to extend this level of blind steganalysis to a more involved level, known as multi-class steganalysis. From a practical perspective, multi-class steganalysis is similar to the basic level; however, instead of classifying two classes (cover and stego images), multi-class steganalysis can classify images into more classes that come from different types of stego images produced by different embedding techniques. Hence, the task of multi-class steganalysis is to identify the embedding algorithm applied to produce a given stego image, or to classify it as a cover image if no embedding is performed on it. Normally, to avoid suspicion, the amount of message embedded is far less than the image can accommodate. Thus, an adversary cannot tell how much information has been embedded based on the size of the image, and a statistical approach needs to be utilised to estimate the hidden message length. Note that the terms message, hidden message and secret message are used interchangeably. The message length is the number of bits embedded in the image. It is normally defined by the ratio between the number of embedded message bits and the maximum number of bits that can be embedded in a given image. It can also be measured as bits per pixel (bpp). The analysis levels discussed so far cannot reveal the locations where hidden message bits are embedded. However, with the help of the estimated hidden message length as side information, an adversary can proceed to identify the stego-bearing pixels. Identifying the exact location of stego-bearing pixels is not easy for two reasons. First, the message bits are often randomly scattered throughout the whole image. Second, it is difficult or impossible to detect hidden message bits that are unchanged with respect to the cover image. Identifying the stego-bearing pixels locates the message bits, but does not determine the sequence of the message bits. Thus, the next level of steganography analysis is to retrieve the stegokey. Successfully retrieving the stegokey can be considered a bigger achievement: it provides access to the stego-bearing pixels as well as the embedding sequence.
In other words, a correct stegokey will give information about the order of the bits that create the hidden message. Studies related to each analysis technique will be elaborated in Section 3.2.

2.4 Blind Steganalysis as Pattern Recognition

A classification problem involves dividing a set of many possible objects into disjoint subsets, where each subset forms a class. Usually, pattern recognition techniques are used to solve this problem. Pattern recognition is an important area of computer science that focuses on recognising complex patterns from samples and making intelligent decisions based on the patterns. As discussed in Section 2.3, blind steganalysis examines the image characteristics (samples) and determines whether these characteristics exhibit abnormalities (decision making). This means that, given an image, the steganalysis should be able to decide the class (cover or stego) to which the image belongs. Hence, the problem of blind steganalysis can be considered a classification problem, and techniques from pattern recognition can be employed. Different embedding techniques are thought to produce different changes in image characteristics. In other words, the characteristics of cover and stego images differ, and those resulting from different stego images (stego images produced by different embedding techniques) differ as well. Therefore, it is possible to extend the pattern recognition techniques to differentiate and classify these images. This extended blind steganalysis is known as multi-class steganalysis. As with any pattern recognition methodology, blind and multi-class steganalysis consist of two processes: feature extraction and classification. The general framework for blind steganalysis is shown in Figure 2.2.

Figure 2.2: General framework of blind steganalysis (an image passes through feature extraction; the resulting features feed a classification stage with training and testing phases; training yields a trained model, and testing yields the decision: cover or stego)

2.4.1 Feature Extraction

Feature extraction is a process of constructing a set of discriminative statistical descriptors or distinctive statistical attributes from an image. These descriptors or

attributes are called features. Alternatively, feature extraction can be considered a form of dimensionality reduction. It is desirable that the extracted features be sensitive to the embedding artefact, as opposed to the image content. Some examples of the features extracted in the early stage of blind steganalysis research include image quality metrics, wavelet decompositions and moments of image statistic histograms. These features were used in the blind steganalysis developed in [4], [73] and [48], respectively. More recently developed features include the Markov empirical transition matrix, moments of image statistics from the spatial and frequency domains and the co-occurrence matrix, which are employed in [54], [14] and [116], respectively. The details of these features will be covered in Section 3.2.
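As a concrete illustration of one such feature, a basic horizontal co-occurrence matrix can be computed as follows. This is a hypothetical minimal version of our own; the steganalysis techniques cited above use richer variants (multiple directions, distances and normalisation).

```python
import numpy as np

def cooccurrence(img, levels):
    """Horizontal co-occurrence matrix: C[a, b] counts how often grey
    level a appears immediately to the left of grey level b."""
    C = np.zeros((levels, levels), dtype=int)
    for row in img:
        for a, b in zip(row[:-1], row[1:]):
            C[a, b] += 1
    return C

img = [[0, 0, 1],
       [1, 1, 0]]
C = cooccurrence(img, 2)
print(C.tolist())  # [[1, 1], [1, 1]]
```

Embedding tends to perturb the joint statistics of neighbouring pixels, which is why such matrices (or moments derived from them) can serve as discriminative features.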

2.4.2 Classification

Classification identifies or categorises images into classes (such as cover or stego image) based on their feature values. The primary classification involved in steganalysis is supervised learning. In supervised learning, a set of training samples (consisting of input features and class labels) is fed in to train the classifier. Once the classifier is trained (the trained model), it predicts the class label based on the given features. Some of the common classifiers used in steganalysis include multivariate regression, the Fisher linear discriminant, neural networks and support vector machines (SVM). Multivariate regression [11] provides a trained model, which consists of regression coefficients. During training, the regression coefficients are estimated by minimising the mean square error. For example, let the target label (or class label) be y_i and let x_ij denote the features, where i = 1, . . . , N indicates the ith image and j = 1, . . . , n indicates the jth feature; then the linear expression would be as shown below:

y_1 = β_1 x_{11} + β_2 x_{12} + · · · + β_n x_{1n} + ε_1,
y_2 = β_1 x_{21} + β_2 x_{22} + · · · + β_n x_{2n} + ε_2,
. . .
y_N = β_1 x_{N1} + β_2 x_{N2} + · · · + β_n x_{Nn} + ε_N,   (2.3)

where β_j is the regression coefficient and ε_i is the zero-mean Gaussian noise. N and

n are the total number of samples and features, respectively. With these regression coefficients, a given image can be classified by regressing the image features. The computed target value is then compared to a threshold to determine the right image class. The Fisher linear discriminant is a classification method that projects multidimensional features, x, onto a linear space [16]. Suppose two classes of observations have means μ_{y=0} and μ_{y=1}, and covariances Σ_{y=0} and Σ_{y=1}; the linear combination of features w · x will have means w · μ_{y=i} and variances wᵀ Σ_{y=i} w for i = 0, 1. The Fisher linear discriminant is defined as the linear combination of features that maximises the following separation, S:

S = σ²_{between class} / σ²_{within class}
  = (w · μ_{y=0} − w · μ_{y=1})² / (wᵀ Σ_{y=0} w + wᵀ Σ_{y=1} w)
  = (w · (μ_{y=0} − μ_{y=1}))² / (wᵀ (Σ_{y=0} + Σ_{y=1}) w).   (2.4)

Next, it can be shown that the optimal w is given by

w = (Σ_{y=0} + Σ_{y=1})⁻¹ (μ_{y=0} − μ_{y=1}).   (2.5)
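Equation (2.5) translates directly into code. A sketch using NumPy on synthetic Gaussian data (the data, function names and threshold choice are our own illustration, not a result from this thesis):

```python
import numpy as np

def fisher_w(X0, X1):
    """Optimal projection w = (S0 + S1)^(-1) (mu0 - mu1), where the
    rows of X0 and X1 are the feature vectors of the two classes."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S0 = np.cov(X0, rowvar=False)
    S1 = np.cov(X1, rowvar=False)
    return np.linalg.solve(S0 + S1, mu0 - mu1)

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, (200, 2))  # e.g., cover-image features
X1 = rng.normal(2.0, 1.0, (200, 2))  # e.g., stego-image features
w = fisher_w(X0, X1)
# Midpoint of the projected class means as a simple threshold.
threshold = w @ (X0.mean(axis=0) + X1.mean(axis=0)) / 2

# Project each sample onto w and compare with the threshold.
acc0 = np.mean(X0 @ w > threshold)  # fraction of class 0 correct
acc1 = np.mean(X1 @ w < threshold)  # fraction of class 1 correct
```

With well-separated synthetic classes like these, both accuracies come out high, illustrating the separation that (2.4) maximises.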

Finally, an image can be classified by linearly combining its extracted features with w and comparing the result to a threshold. An artificial neural network, usually called a neural network, is an information-processing model inspired by the way the biological nervous system (e.g., the brain) processes information. The basic building block of the neural network is the processing element (PE), commonly known as the neuron. The processing capabilities are derived from a collection of interconnected neurons (PEs). Mathematically, a neural network can be considered a mapping function F : X^n → Y, where the n dimensions of features X are the inputs to the neural network, with decision values Y (class labels) [119]. The function F can be defined as a composition of other functions G_i = (G_1, . . . , G_m). In addition, each function G_i can further be defined as a composition of other functions. The composition of these function definitions forms the neural network. The structure of these functions and the dependencies between their inputs and outputs determine the type of neural network. The most common types used in classification are feedforward and backpropagation neural networks. As with any other supervised learning, the classification

process in a neural network involves two operations: training and testing. During training, the neural network learns to associate outputs with input patterns. This is carried out by systematically modifying the weights of the inputs throughout the neural network. When the neural network is used for testing, it identifies the input pattern and tries to determine the associated output. When the input pattern has no associated output, the neural network provides an output that corresponds to the best match of the learned input patterns. Support vector machines (SVM) are a classification technique that can learn from a sample. More precisely, we can train the SVM to recognise and assign class labels based on a given collection of data (i.e., features). For example, we train the SVM to differentiate cover images from stego images by examining the extracted features from many instances of cover and stego images. To illustrate the point, let us interpret this example using the illustration shown in Figure 2.3.

Figure 2.3: Two-class SVM classification

The X and Y axes represent two different features. Cover and stego images are represented by circles and stars, respectively. Given an unknown image (represented by a square), the SVM is required to predict the class to which it belongs. This example is easy, as the two classes (cover and stego) form two distinct clusters that can be separated by a straight line. Hence, the SVM finds the separating line and determines the cluster for the unknown image. Finding the right separating line is crucial, and it is provided during the training. In practice, the feature dimensionalities are higher and we need a separating plane, known as a separating hyperplane, instead of a line. Thus, the goal of the SVM is to find a separating hyperplane that can effectively separate the classes. To do that, the SVM will try to maximise the margin of the separating hyperplane during training. Obtaining this maximum-margin hyperplane will optimise the SVM's ability to predict the correct class of an unknown object (image).

However, there are often non-separable datasets that cannot be separated by a straight separating line or flat plane. The solution to this difficulty is to use a kernel function. A kernel function is a mathematical routine that projects the features from a low-dimensional space into a higher-dimensional space. Note that the choice of kernel function will affect the classification accuracy. For additional reading on SVMs, see [80].
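The kernel idea can be illustrated with an explicit feature map. In this toy example (our own construction, not taken from the SVM literature cited above), a one-dimensional dataset that no single threshold can separate becomes linearly separable after projecting each feature x into the two-dimensional space (x, x²), which is the mapping underlying a simple polynomial kernel:

```python
# Class A sits near zero; class B sits on both sides of it,
# so no threshold on x alone separates the two classes.
class_a = [-0.5, -0.2, 0.1, 0.4]
class_b = [-2.0, -1.5, 1.6, 2.1]

def lift(x):
    """Explicit feature map: 1-D feature -> 2-D space (x, x^2)."""
    return (x, x * x)

# In the lifted space, the second coordinate alone separates the
# classes with the flat hyperplane x^2 = 1.
assert all(lift(x)[1] < 1.0 for x in class_a)
assert all(lift(x)[1] > 1.0 for x in class_b)
```

In practice the SVM never computes the lifted coordinates explicitly; the kernel function evaluates inner products in the high-dimensional space directly.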

2.5 Digital Images

As discussed in the overview of steganography section, practically any form of digital media can be used to carry secret messages. Examples of these media include image, video, audio, text, etc. By far the most popular choice is the image. In this section we will introduce various digital images, since the vast majority of research in steganography is concerned with image steganography. In addition, the work of the thesis is concentrated on image steganalysis. A digital image is produced through a process called digitisation. Digitising an image involves converting analogue information into digital information; thus, a digital image is the representation of an original image by discrete sets of points. Each of these points is called a picture element or pixel. Pixels are normally arranged in a two-dimensional grid corresponding to the spatial coordinates in the original image. The number of distinct colours in a digital image depends on the number of bits per pixel (bpp). Hence, the types of digital image can be classified according to the number of bits per pixel. There are three common types of digital image: Binary image. In a binary image, only 1 bpp is allocated for each pixel. Since a bit has only two possible states (on or off), each pixel in a binary image must represent one of two colours. Usually, the two colours used are black and white. A binary image is also called a bi-level image. Greyscale image. A greyscale image is a digital image in which the only colours are shades of grey. The darkest possible shade is black, whereas the lightest possible shade is white. Normally, there are eight bits per pixel assigned for a greyscale image. This creates 256 possible different shades of grey. Colour image. In general, a pixel in a colour image consists of several primary colours. Red, green and blue are the most commonly used primary colours.

Each primary colour forms a single component called a channel, with eight bits usually allocated for each channel, producing 24 bits per pixel. This corresponds to roughly 16.7 million possible distinct colours. When the channels in a colour image are split, each forms a different greyscale image.
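The three image types can be represented directly as arrays, with the bpp determining the number of representable values per pixel. A small NumPy sketch (the arrays are arbitrary examples of our own):

```python
import numpy as np

# 1 bpp binary image: each pixel is 0 (black) or 1 (white).
binary_img = np.array([[0, 1, 1, 0],
                       [1, 0, 0, 1]], dtype=np.uint8)

# 8 bpp greyscale image: 2**8 = 256 shades of grey.
grey_img = (binary_img * 255).astype(np.uint8)

# 24 bpp colour image: three 8-bit channels (R, G, B); splitting
# the channels yields three greyscale images.
colour_img = np.stack([grey_img] * 3, axis=-1)

print(2 ** 1, 2 ** 8, 2 ** 24)  # 2 256 16777216 distinct values
```

The limited alphabet of the binary image (two values per pixel) is precisely what makes its statistics "modest" and its steganalysis challenging, as discussed in Chapter 1.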

2.5.1 Image File Formats

After digitisation, a digital image can be stored in a specific file format. Although many file formats exist, the major formats include BMP, JPEG, TIFF, GIF and PNG. Images stored in these formats are considered raster graphics. Another type of graphic image is the vector graphic image. Unlike raster graphics, which use pixels, vector graphics use geometric primitives such as points, lines and polygons to represent the images. The rendering of the geometric primitives in vector graphics is based on mathematical equations. This thesis focuses on raster rather than vector graphics. In a raw image, the data captured from a digital device sensor are preserved and stored in a file. The data captured are raw in the sense that no adjustment or processing is applied. The data are merely a collection of pixel values captured at the time of exposure. Note that there is no standard for a raw image and it is device dependent. Hence, a raw image is often considered an image, rather than a standard image file format. The bitmap or BMP format is considered a simple image file format. Normally the data are uncompressed and easy to manipulate. However, the uncompressed BMP format gives a BMP image a larger file size than that of a compressed image. A BMP image can also use a colour palette for indexed-colour images. Nonetheless, a colour palette is not used for BMP images with 16 bpp or higher. The joint photographic experts group (JPEG) format is by far the most common image file format. JPEG images are very popular and primarily used for photographs. Their popularity is due to the excellent image quality they produce despite a smaller file size. This is achieved through lossy compression. Many imaging applications allow users to control the level of compression. This is useful because users can trade off image quality for a smaller file size and vice versa. However, lossy compression reduces the image quality and cannot be reversed.
In situations where the image quality is as important as the file size, the tagged image file format (TIFF) could be a suitable choice. The TIFF format uses lossless compression, which reduces the image file size while preserving the original image quality. This makes TIFF a popular image archive option. In addition, as the name implies, the TIFF format also offers flexible information fields in the image header called tags. These tags are very useful and can be defined to hold application-specific information. The graphics interchange format (GIF) uses a colour palette to produce an indexed-colour image. It also uses lossless compression. GIF can offer optimum compression when the image contains solid colour graphics (such as a logo, diagram, drawing or clipart). In addition, GIF supports transparency and animation. These features make GIF an excellent format for certain web images. However, GIF is not suitable for complex photographs with continuous tones, as a GIF image can store only 256 distinct colours. Compared with GIF, the portable network graphics (PNG) format provides several improvements. These include greater compression, better colour support, gamma correction for brightness control and image transparency. The PNG format is an alternative to GIF and is expected to become a mainstream format for web images.

2.5.2 Spatial and Frequency Domain Images

In a general sense, an image (I) can be considered the result of the projection of a scene (S) [34]. A spatial domain image is said to have a normal image space, which means that each image element at a given location in image I is a projection of the same location in scene S. Distance in the spatial domain corresponds to real distance. A common example of a spatial domain image is the BMP image. A frequency domain image has a space in which each element value at a location in image I represents the rate of change over a specific distance related to that location. A popular frequency domain image is the JPEG image.
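The relationship between the two domains can be seen with a transform. JPEG actually uses the discrete cosine transform, but a two-dimensional FFT illustrates the same spatial-to-frequency idea: an image with no spatial variation has all of its frequency-domain energy in the zero-frequency (DC) term.

```python
import numpy as np

flat = np.ones((8, 8))        # a constant spatial-domain image
spectrum = np.fft.fft2(flat)  # its frequency-domain representation

assert abs(spectrum[0, 0] - 64) < 1e-9         # DC term = sum of pixels
assert np.allclose(spectrum.flatten()[1:], 0)  # no other frequencies
```

Conversely, rapid spatial variation (edges, texture) produces energy at the high-frequency coefficients, which is where JPEG-domain steganography typically hides message bits.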


Chapter 3 Literature Review


This chapter considers research relevant to both steganography and steganalysis. Steganography is presented in Section 3.1. We give an overview of different types of steganography, with the emphasis on image steganography. In particular, we discuss binary image steganography in the first part of the section and JPEG image steganography in the second part. Steganalysis is discussed in Section 3.2. We review and highlight the most relevant existing techniques in steganalysis. These techniques are specifically used in analysing image steganography. The discussion is divided into six subsections. We organise the discussion according to the different levels of analysis presented in Section 2.3.

3.1 Steganography

This section discusses some selected steganographic methods. These particular methods are used as subjects in our analysis. The first five subsections discuss steganographic methods that use binary images as the cover images, and the remaining subsections discuss methods that use JPEG images.

3.1.1 Liang et al. Binary Image Steganography

Consider a variant of boundary pixel steganography proposed by Liang et al. [69]. Boundary pixel steganography hides a message along the edges where white and black pixels meet; these are known as boundary pixels. Note that the boundary pixels are those pixels within the image where a colour transition occurs between white and black pixels. The boundary pixels should not be confused with the four borders of an image. To obtain higher imperceptibility, the pixel locations used for embedding are permuted and distributed over the whole image. The distribution of message bits is controlled by a pseudorandom number generator whose seed is a secret shared by the sender and the receiver of the hidden message. This seed is also called the stegokey. As the message bits are embedded in the boundary pixels of the image, it is important to identify the boundary pixels and their order unambiguously. Once the sequence of boundary pixels is obtained, a pseudorandom number generator is used to determine the places where the message bits should be hidden. The authors of [69] define boundary pixels as those that have at least one neighbouring pixel with a different intensity. For example, a white (black) pixel must have at least one black (white) neighbouring pixel. Note that a pixel can have, at most, four neighbours (left, right, top and bottom). Not all boundary pixels are suitable for carrying message bits, because embedding a bit into an arbitrary boundary pixel may convert it into a non-boundary one. If this happens, then the extraction will not be correct and recovery of the hidden message is impossible. Because of this technical difficulty, the authors have proposed a modified algorithm that adds restrictions on the selection of boundary pixels for embedding. A currently evaluated boundary pixel, P, is considered eligible for embedding if the following two conditions are satisfied:

i. Among the four neighbouring pixels, there exist at least two unmarked neighbouring pixels and their pixel values must be different.

ii. For each marked neighbouring pixel (if any), its neighbouring pixels (excluding the current pixel, P) must also satisfy the first criterion.

A pixel is said to be marked if it has already been evaluated or is assigned a (pseudorandom) index with a smaller value than the current index.
In contrast, a pixel is said to be unmarked if it is evaluated after the current pixel. Figures 3.1 and 3.2 show some examples of eligible and ineligible pixels, respectively. The shaded box represents a pixel value of zero and the white box represents a pixel value of one. These pixels are taken from some portion of a binary image. Pixel P is the currently evaluated pixel and the number inside each box is the

pseudorandom index. This index indicates whether a pixel is unmarked or marked. For example, in Figure 3.1(b), the current pixel, P, has three unmarked neighbours (the left, right and top neighbouring pixels) and one marked neighbour (the bottom neighbouring pixel). Pixel P in Figure 3.1(a) is an eligible pixel because it satisfies the first condition and has no marked neighbouring pixel. Pixel P in Figure 3.1(b) satisfies both conditions and is thus also eligible. On the other hand, pixel P in Figure 3.2(a) is ineligible because it does not satisfy the first condition. Pixel P in Figure 3.2(b) satisfies only the first condition; therefore, it is also considered ineligible.
Figure 3.1: Example of eligible pixels

Figure 3.2: Example of ineligible pixels

Once a boundary pixel is found eligible, the message bit is embedded in the pixel by overwriting its value if the message bit does not match the value; otherwise, the pixel is left intact. This procedure is repeated to embed the remaining message bits.
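The boundary-pixel identification step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the function name is hypothetical, and the eligibility conditions (i) and (ii) would still have to be applied to the resulting mask.

```python
import numpy as np

def boundary_pixels(img):
    """Mark pixels that have at least one 4-neighbour (left, right, top,
    bottom) with a different value, for a 0/1 binary image."""
    b = np.zeros(img.shape, dtype=bool)
    b[:-1, :] |= img[:-1, :] != img[1:, :]   # differs from bottom neighbour
    b[1:, :] |= img[1:, :] != img[:-1, :]    # differs from top neighbour
    b[:, :-1] |= img[:, :-1] != img[:, 1:]   # differs from right neighbour
    b[:, 1:] |= img[:, 1:] != img[:, :-1]    # differs from left neighbour
    return b
```

The mask singles out candidate embedding positions; the pseudorandom permutation and the two eligibility conditions then select among them.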


3.1.2 Pan et al. Binary Image Steganography

Motivated by Wu and Lee [113], Pan et al. developed a steganographic method that embeds secret messages in binary images [82]. Compared with [113], this method is more flexible in terms of choosing the cover image block: it uses every block within an image to carry the secret message, giving it a greater embedding capacity. Security is also improved through less alteration of the cover image. In this embedding algorithm, a random binary matrix K and a secret weight matrix W are defined and shared between the sender and receiver. Both matrices are of size m × n; K is a binary matrix and W has elements in {1, 2, . . . , 2^r − 1}, where r is the number of message bits to be embedded within a block. A given binary image is partitioned into non-overlapping blocks Fi of size m × n and the following quantity λi is computed:

λi = Σ[(Fi ⊕ K) ⊗ W],    (3.1)

where ⊕ and ⊗ are the bitwise exclusive-OR and pair-wise multiplication operators, respectively, and Σ[·] denotes the arithmetic summation of all elements in the matrix. The r message bits mN = (m1 m2 . . . mr) are embedded in block Fi by ensuring the following invariant:

λi ≡ (mN) mod 2^r,    (3.2)

where (mN) is the decimal representation of the message bits (the binary-to-decimal conversion). If the invariant already holds, Fi is left intact; otherwise, some pixels of Fi are altered. In most cases, one pixel is flipped when there is a mismatch; if flipping one pixel is not sufficient, flipping a second pixel guarantees that the invariant holds. Hence, at most two pixels of Fi are altered. This method can embed up to r = ⌊log2(mn + 1)⌋ bits per block. Successfully extracting the secret message requires the correct combination of the random binary matrix and the secret weight matrix; together, they can be considered the stegokey. The receiver also needs to know the correct parameters (m, n and r) used in the embedding. The secret message bits embedded in a block can then be extracted through Equation (3.2): the residue of Equation (3.1) modulo 2^r gives (mN), which is converted into binary bits; the bits extracted from each block are concatenated to form the secret message.
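Equations (3.1) and (3.2) can be sketched numerically as below. The block and the matrices K and W here are made-up toy values (the symbol names follow the description above, not [82] verbatim), with a 2 × 2 block carrying r = 2 bits.

```python
import numpy as np

def block_residue(F, K, W, r):
    """Compute the residue of Equation (3.1) mod 2^r for one block:
    sum over the block of (F XOR K) multiplied element-wise by W."""
    return int(np.sum(np.bitwise_xor(F, K) * W)) % (2 ** r)

# Hypothetical 2x2 block with r = 2 message bits per block.
F = np.array([[1, 0], [0, 1]])   # cover block F_i
K = np.array([[0, 1], [1, 0]])   # shared random binary matrix
W = np.array([[1, 2], [3, 1]])   # shared weight matrix, elements in {1, ..., 2**r - 1}
r = 2

# Extraction (Equation (3.2)): the embedded bits are the residue itself.
m = block_residue(F, K, W, r)
```

Embedding would flip at most two pixels of `F` until this residue equals the desired message value.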


3.1.3 Tseng and Pan Binary Image Steganography

Although the method developed in [82] generally enhanced security (by altering fewer pixels for the same amount of embedded message), the quality of the stego image was not taken into consideration. Noise may become prominent in certain blocks after embedding; for example, an isolated dot may appear in an entirely black or white block. As a sequel to the work done in [82], Tseng and Pan revised and enhanced the method [107]. The main contribution of this work is to maintain the image quality by sacrificing some of the payload. According to the authors, the image quality can be greatly improved while still maintaining a good embedding rate of r = ⌊log2(mn + 1)⌋ − 1 bits per block, where m × n is the size of a block. On average, r is only one bit per block less than in their previous method. To maintain image quality, the method discards any block that is entirely black or entirely white. In addition, when a pixel must be flipped to carry a message bit, the selection of which pixel to flip is governed by a distance matrix. The distance matrix selects only a pixel whose new value (after flipping) matches the value of the majority of its neighbouring pixels. This prevents the generation of isolated dots, which degrade the image quality. For example, Figure 3.3 shows two possible ways of flipping a pixel; the effect of flipping is clearly less visible in Figure 3.3(b) than in Figure 3.3(c). The authors also defined an additional criterion for the secret weight matrix, which further improves the image quality.

Figure 3.3: Effect of flipping a pixel: (a) original block of pixels; (b) no isolated dot; (c) obvious isolated dot

Similar to their previous method, at most two pixels per block must be altered to carry the message bits. The rest of the embedding and extraction algorithms are similar to the previous method. However,

if a block Fi becomes entirely black or white after embedding, it is skipped: the alteration of that block is not reversed, and the same message bits are embedded in the next block. This is important to ensure the correctness of message extraction. Both methods offer the flexibility to trade off security level against payload size. When increased security is necessary, the block size (parameters m and n) can be increased. A larger block size reduces the payload because the total number of blocks per image is reduced; with the same r bits per block, fewer blocks means a smaller total payload.
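The security/payload trade-off above can be made concrete with a small calculation (a sketch under the block-capacity formula quoted earlier; `payload_bits` is a hypothetical helper, and blocks are assumed non-overlapping):

```python
from math import floor, log2

def payload_bits(height, width, m, n):
    """Total payload when each m-by-n block carries r = floor(log2(mn + 1))
    bits and the image is tiled with non-overlapping blocks."""
    r = floor(log2(m * n + 1))
    return (height // m) * (width // n) * r

# Enlarging the block shrinks the number of blocks faster than r grows,
# so the total payload drops.
small = payload_bits(512, 512, 8, 8)     # 4096 blocks of 6 bits each
large = payload_bits(512, 512, 16, 16)   # 1024 blocks of 8 bits each
```

For a 512 × 512 image, doubling the block side quadruples per-block security granularity but cuts the total payload to a third of its former size.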

3.1.4 Chang et al. Binary Image Steganography

The steganographic method developed by Chang et al. [10] can be considered an improved variant of the binary image steganography developed by Pan et al. [82]. In general, this method offers the same embedding rate as the Pan et al. method, r = ⌊log2(mn + 1)⌋ bits per block (where m × n is the block size). However, it is superior in the sense that it alters at most one pixel to embed the same number of message bits within a block (as opposed to two pixels in the Pan et al. method). Thus, this method provides a higher level of security by reducing the alteration of the stego image. In practice, the Chang et al. method also employs two matrices during embedding: a random binary matrix and a serial number matrix. The main difference in the Chang et al. method is the introduction of the serial number matrix to replace the secret weight matrix, which enables the method to work with less image alteration. With the serial number matrix, r linear equations, known as general hiding equations, are defined to embed r bits of message in a block. The general hiding equations are used to determine the pixel suitable for flipping. To obtain valid general hiding equations, the serial number matrix is required to have 2^r − 1 elements with non-duplicate decimal values. For message extraction, each block is transformed using the bitwise exclusive-OR operator with the random binary matrix. For each block, r general hiding equations are defined through the serial number matrix, and the parities of the results calculated from these equations are taken as the message bits. Clearly, the random binary matrix and serial number matrix serve as the stegokey and are shared between the sender and receiver.

3.1.5 Wu and Liu Binary Image Steganography

Another block-based method for embedding secret messages in binary images was developed by Wu and Liu in [112]. This technique also starts by partitioning a given image into blocks. To avoid synchronisation problems between embedding and extraction (which lead to incorrect message extraction), the technique embeds a fixed number of message bits per block; in their implementation, the authors opt to embed one message bit per block. The embedding algorithm is based on the odd-even relationship of the number of black pixels within a block. In other words, the total number of black pixels within a block is kept odd when a message bit of one is embedded, and even for a message bit of zero. If the odd-even relationship already matches the message bit, no flipping is needed; otherwise, a pixel must be flipped. As in any other embedding technique, the most important part is the selection of pixels for flipping, since an efficient selection ensures minimum distortion. That is why, in [112], Wu and Liu introduced a flippability scoring system for selecting pixels to flip. The score for each pixel is computed by examining the pixel and its immediate neighbours (those within a 3 × 3 block). The flippability score is produced by a decision module based on two measurements. The first is smoothness, which counts the total number of transitions in the vertical, horizontal and two diagonal directions. The second is connectivity, which counts the total number of black and white clusters formed within the block. These measurements are all computed within a 3 × 3 block. An illustration of these measurements is shown in Figure 3.4.
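The odd-even rule above can be sketched as below. This is a simplified illustration assuming 1 denotes black; in [112] the pixel to flip is chosen by the flippability score, whereas here it is supplied directly.

```python
import numpy as np

def embed_parity_bit(block, bit, flip_pos):
    """Keep the black-pixel count odd for bit 1 and even for bit 0,
    flipping one pixel only when the parity mismatches."""
    out = block.copy()
    if int(out.sum()) % 2 != bit:
        i, j = flip_pos
        out[i, j] ^= 1          # flip the selected pixel
    return out

def extract_parity_bit(block):
    """Recover the bit as the parity of the black-pixel count."""
    return int(block.sum()) % 2

cover = np.array([[1, 0], [0, 0]])           # one black pixel: parity 1
stego = embed_parity_bit(cover, 0, (0, 1))   # parity mismatch, one pixel flips
```

Extraction needs no key material beyond the block partitioning, which is why the scheme is robust to synchronisation errors only at the block level.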

Figure 3.4: Measurement of smoothness and connectivity: (a) smoothness is measured by the total number of transitions in four directions (the arrows indicate the transition directions, 0 indicates no transition and 1 indicates a transition); (b) connectivity is measured by the number of black and white clusters (four white pixels forming one cluster and five black pixels forming two clusters)

3.1.6 F5 Steganography

In [111], Westfeld and Pfitzmann observed that an embedding algorithm that overwrites the LSBs of JPEG coefficients causes the JPEG distribution to form pairs of values (PoV). A PoV occurs when two adjacent frequencies in the JPEG distribution are similar (Figure 3.12 shows the effect of PoV). By exploiting PoV, Westfeld and Pfitzmann concluded that such a steganographic method can be broken; they showed the analysis and attack on Jsteg using the chi-square test (details of this attack are discussed in Subsection 3.2.3). As a result, Westfeld developed a new steganography called F5 [110].

F5 is formulated to preserve the original property of the statistic (i.e., the JPEG distribution). When an alteration is required during embedding, F5 decrements the absolute value of the JPEG coefficient by one, instead of overwriting the LSB with the message bit. This prevents the formation of PoV; hence, F5 cannot be detected through the chi-square test. To minimise the changes caused by embedding, matrix encoding is employed to increase the embedding efficiency. Finally, to avoid concentrating the embedded message bits in a certain part of the image, F5 embeds along a randomly permuted sequence of coefficients; the random sequences are generated by a PRNG.
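The coefficient adjustment at the heart of F5 can be sketched as follows. This is a simplification: F5's actual LSB convention for negative coefficients is inverted, and a coefficient shrunk to zero ("shrinkage") forces the bit to be re-embedded in the next coefficient.

```python
def f5_adjust(coeff, bit):
    """Embed one bit in a non-zero JPEG coefficient by decrementing its
    absolute value on mismatch, instead of overwriting the LSB.
    Returns (new_coeff, embedded); embedded is False on shrinkage."""
    if coeff == 0:
        return coeff, False                      # zero coefficients are skipped
    if abs(coeff) % 2 == bit:
        return coeff, True                       # LSB already matches the bit
    new = coeff - 1 if coeff > 0 else coeff + 1  # step the magnitude towards zero
    return new, new != 0                         # new == 0: shrinkage, retry next
```

Because the magnitude only ever decreases, adjacent histogram bins are never paired up, which is what defeats the chi-square (PoV) test.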

3.1.7 OutGuess Steganography

OutGuess is a type of JPEG image steganography developed by Provos in [90]. This method was developed to withstand the chi-square attack as well as the extended chi-square attack. The method can be summarised as two main operations: embedding and statistical correction. Similar to other JPEG image steganographies, OutGuess embeds message bits by altering the LSBs of JPEG coefficients. The embedding is spread randomly

throughout the whole image using a random selection that proceeds through the coefficients from the beginning to the end of the image. To select the next coefficient, OutGuess computes a random offset and adds it to the current coefficient location. The random offsets are computed by a PRNG. Note that the embedding causes the image statistics (i.e., the distribution of the coefficients) to deviate; hence, some coefficients are reserved (left unaltered) with the intention of correcting this statistical deviation. In other words, after all the message bits are embedded, the reserved coefficients are adjusted so that the distributions of the cover and stego images are similar.
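The seed-driven offset walk can be sketched as below. The interval heuristic is illustrative only, not the exact computation in [90]; the function name is hypothetical.

```python
import random

def select_positions(n_coeffs, n_bits, seed):
    """Select embedding positions by adding pseudorandom offsets to the
    current location, spreading the message over the remaining coefficients."""
    rng = random.Random(seed)   # the seed acts as the stegokey
    pos, out = -1, []
    for k in range(n_bits):
        remaining = n_coeffs - pos - 1
        # keep the average step proportional to the coefficients still ahead
        interval = max(2, 2 * remaining // (n_bits - k))
        pos += rng.randrange(1, interval)
        if pos >= n_coeffs:
            break
        out.append(pos)
    return out
```

The receiver, holding the same seed, regenerates the identical sequence of positions; the unselected coefficients remain available for the statistical-correction pass.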

3.1.8 Model-Based Steganography

Based on the concepts of statistical modelling and information theory, Sallee developed a steganography called model-based steganography [96]. Model-based steganography is designed to withstand a first-order statistical attack while maintaining a high embedding rate. Unlike OutGuess, which preserves only the global distribution of an image, model-based steganography preserves the distributions of individual coefficient modes. To start the embedding, model-based steganography separates an image into an unalterable part x and an alterable part x′. If a JPEG image is used as the cover image, the most significant bits of the coefficients form x and the least significant bits form x′. x is used to build a conditional probability P(x′|x) from a selected cover image model. Together with this conditional probability and a secret message, a non-adaptive arithmetic decoder is used to generate a new part x′, which carries the message bits. The selection of the coefficients to use is based on a PRNG. Finally, x and x′ are combined to form the stego image. The embedding algorithm is shown in Figure 3.5(a). To extract the secret message, steps similar to those discussed above are followed, with the exception that a non-adaptive arithmetic encoder is used instead of a decoder. The inputs to the encoder are x′ and the conditional probability P(x′|x). Since x is unaltered, the conditional probability can be regenerated; therefore, the secret message can be extracted successfully through the non-adaptive arithmetic encoder. Figure 3.5(b) illustrates the extraction algorithm.



Figure 3.5: (a) Embedding algorithm of model-based steganography (b) extraction algorithm of model-based steganography

3.2 Steganalysis

This review of the types of steganalysis is not intended to be exhaustive; rather, it is organised according to the different levels of possible steganographic analysis. More precisely, these levels are ordered according to the type of secret information or parameter an adversary wishes to extract. We begin with the techniques employed by the adversary to detect the presence of a secret message in an image and to determine which type of steganographic method was used. After that, we discuss the techniques used to recover some attributes (secret parameters) of the embedded secret message. These attributes include the secret message length, the location of stego-bearing pixels and the stegokey.

3.2.1 Differentiation of Cover and Stego Images

In this scenario, it is assumed that the adversary has access to an image (or a collection of images) and tries to determine whether the image contains a secret message (stego image) or not (cover image). This task is feasible only if the statistical features present in cover and stego images differ enough to permit a reliable decision. To that end, different feature extraction techniques

can be applied to extract relevant statistical features. The following statistical features can be found in the literature:

- Co-occurrence matrix
- Statistical moments
- Wavelet subbands
- Pixel difference

The next step is to perform classification based on the extracted features. Because the distributions of cover and stego images are never exactly known, the two classes sometimes overlap. To alleviate this problem, cover image estimation is utilised to derive more sensitive features for steganalysis. In the following subsections, we discuss how these features have been applied in steganalysis, followed by a discussion of classification and, finally, of cover image estimation.

Co-occurrence matrix

Sullivan et al. use an empirical matrix as the feature set to construct a steganalysis [102]. The technique developed can detect several variants of spread-spectrum data hiding [24, 76] and perturbed quantisation steganography [36]. This empirical matrix is also known as a co-occurrence matrix. The authors observe that the empirical matrix of a cover image is highly concentrated along the main diagonal, whereas data hiding spreads the concentration away from the main diagonal. An example of this effect is shown in Figure 3.6. To capture this effect, the six elements of the empirical matrix with the highest probabilities along the main diagonal are chosen, together with the ten nearest elements of each; this creates a 66-dimensional feature set. Next, the authors subsample the remaining main diagonal elements by four, obtaining another 63 dimensions. In total, a 129-dimensional feature set is used in their steganalysis. The feature set selected in [102] is stochastic and may not effectively capture the embedding artefacts. Xuan et al. [116] constructed a better feature set from co-occurrence matrices. They generated four co-occurrence matrices from


Figure 3.6: Plot of co-occurrence matrices extracted from: (a) cover image; (b) stego image

the horizontal, vertical, main diagonal and minor diagonal directions (as opposed to only the horizontal direction in [102]). These four matrices are averaged and normalised to form a final matrix. Because the final co-occurrence matrix is symmetric, it is sufficient to use the main diagonal and part of the upper triangle. Xuan et al. selected 1018 elements from this area to form their feature set (a 1018-dimensional feature set). A specifically tuned classifier (class-wise non-principal components analysis) is used to obtain a high detection rate. Xuan et al. demonstrated its efficiency on JPEG and spatial domain image steganography. However, their high-dimensional features may suffer from the curse of dimensionality when applied to other types of classifier. Although their current implementation is arguably optimal, it is threshold dependent, which limits its flexibility for blind steganalysis. Chen et al. developed a blind steganalysis based on a co-occurrence matrix [15]. It is well known that direct use of a co-occurrence matrix as the feature creates an expansion of the matrix dimension; for example, for an 8-bit image, the co-occurrence matrix has 256 × 256 dimensions. Therefore, Chen et al. projected the co-occurrence matrix onto a first-order statistic to reduce its dimensionality. More precisely, this first-order statistic is the frequency of occurrence along the horizontal axis of the co-occurrence matrix. In [43], the authors exploited the correlations between the discrete cosine transform (DCT) coefficients in intra- and inter-blocks of JPEG images. Intra-block correlation is the correlation between neighbouring coefficients within a block; inter-block correlation measures the correlation between a DCT coefficient in one block and the coefficient at the same position in another block.


The authors arranged the DCT coefficients of a block into a one-dimensional vector using the zigzag order. For each block, only the AC coefficients are considered; the DC coefficient is discarded because DC coefficients are not normally changed in JPEG steganography. In addition, the authors discard some coefficients with a high frequency of occurrence (i.e., coefficients with a value of zero). All the blocks in a JPEG image are scanned in a fixed pattern to form a new re-ordered block called a 2-D array, and only the magnitudes of the coefficients are used. Markov empirical transition matrices are used to capture these dependencies: horizontal and vertical transition matrices capture the intra- and inter-block correlations, respectively. The authors further trim the dimensionality of the matrices by thresholding the 2-D array; elements with a magnitude greater than the threshold are assigned the threshold value.
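The horizontal co-occurrence (empirical) matrix underlying these feature sets can be computed as follows. This is a minimal sketch for integer-valued greyscale images, not any particular author's implementation.

```python
import numpy as np

def cooccurrence(img, levels=256):
    """Normalised co-occurrence matrix of horizontally adjacent pixel pairs
    for an integer-valued image with intensities in [0, levels)."""
    M = np.zeros((levels, levels))
    left, right = img[:, :-1].ravel(), img[:, 1:].ravel()
    np.add.at(M, (left, right), 1)   # count each (left, right) intensity pair
    return M / M.sum()
```

For a cover image, the mass of this matrix concentrates along the main diagonal (neighbouring pixels tend to be equal); embedding noise pushes mass off the diagonal, which is the effect the features above quantify.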

Statistical moments

Harmsen and Pearlman [48] showed that additive-noise data hiding is equivalent to a low-pass filtering of the image histogram. The centre of mass (COM), the first-order statistical moment, is used to quantify this effect. The authors showed that it is better to compute the COM in the frequency domain; hence, the discrete Fourier transform is applied to the image histogram, producing a histogram characteristic function, and the COM is computed from this characteristic function. The reported detection accuracy exceeded 95 per cent at an embedding rate of 1.0 bpp. Unfortunately, the authors did not test smaller embedding rates; in most cases, decreasing the embedding rate reduces the detection accuracy. Further, only 24 images were used to test the detection accuracy, and such a small set of images may not fully represent the actual accuracy. Nevertheless, the use of COM as a feature in [48] has informed much subsequent research. For instance, Shi et al. [100] used a set of statistical moments as the features in their blind steganalysis. First, the authors use the Haar wavelet to decompose the image. After the decomposition, eight wavelet subbands are produced and a discrete Fourier transform is applied to the probability density


function of these subbands. The same discrete Fourier transform is applied to the given image as well. These transformations produce nine characteristic functions. Finally, the first- and second-order statistical moments are computed from the characteristic functions. In a different work [115], Xuan et al. developed an enhanced version based on statistical moments. The enhancement is achieved with an additional level of wavelet decomposition, and the third order is used in addition to the first two orders of statistical moments. The reported experimental results show improvements in detection accuracy and an ability to detect more steganographic types. Shi et al. further improved the use of statistical moments as features [101]. The main difference compared to [100] and [115] is the incorporation of a prediction-error image, obtained from the pixel-wise difference between the given image and its predicted version. The prediction algorithm is based on a predefined relationship within a block of four neighbouring pixels. The statistical moments in [101] are computed from two image components, the given image and the prediction-error image, using the same procedures as in [115] and obtaining a 78-dimensional feature set (39 features from each image component). The experimental results reported are promising. However, the detection accuracy remains unclear over a wider range of embedding rates, since only certain percentages of hidden messages were tested. Research similar to [101] can be found in [15]. The authors of [15] raised the concern of precision degradation in the first-order statistic when a wavelet is used (i.e., the wavelet coefficients are floating point); hence, the co-occurrence matrix (discrete integers) was used instead of wavelet decomposition. Inspired by the work in [101], Chen et al. enhanced and applied the statistical moments to JPEG image steganalysis [14].
This enhancement involves the incorporation of additional high-order statistics. In their work, the first feature set is inherited directly from [101]. The same feature extraction procedure, with some modification to the prediction algorithm, is used to form the second feature set, which is extracted from the absolute values of the non-zero DCT coefficients. For the third feature set, the same set of non-zero DCT coefficients and wavelet subbands are used to construct three co-occurrence matrices. These co-occurrence matrices are transformed into

the characteristic functions, and the statistical moments are calculated from these characteristic functions. According to the authors, it is crucial to use higher-order statistics as features because some modern steganography, such as OutGuess and model-based steganography, tries to preserve the first-order statistics, which may render first-order statistical features less effective; hence, it is suitable to incorporate co-occurrence matrices as features. The statistical moments computed from characteristic functions are more effective than those computed from the image histogram (i.e., the image probability density function). The main difference between moments of characteristic functions and moments of the image histogram is their variance proportionality (i.e., 1/σ and σ, respectively). This means that moments of characteristic functions are governed by a smaller-variance distribution, whereas moments of the image histogram are governed by a larger-variance distribution. Since data hiding involves the addition of small-variance noise, its effect is reflected more strongly in the moments of characteristic functions; hence, these moments are more sensitive to data hiding. This claim has been verified in [115].
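The COM feature of Harmsen and Pearlman, which several of the approaches above build on, can be sketched as below (a minimal NumPy illustration; the function name is hypothetical):

```python
import numpy as np

def histogram_com(img, bins=256):
    """Centre of mass of the histogram characteristic function:
    DFT of the image histogram, then the first-order moment of the
    magnitude over the one-sided frequency range."""
    h, _ = np.histogram(img, bins=bins, range=(0, bins))
    H = np.abs(np.fft.fft(h))[: bins // 2]   # one-sided characteristic function
    k = np.arange(bins // 2)
    return float((k * H).sum() / H.sum())
```

Because additive-noise embedding low-pass filters the histogram, the characteristic function of a stego image loses high-frequency energy and its COM shifts towards zero relative to the cover.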

Wavelet subbands

It is well known that natural images exhibit strong higher-order statistical regularities and consistencies. Thus, wavelet decomposition is often used to represent these characteristics for various image processing purposes. It is equally well known that steganographic embedding significantly disturbs these statistical characteristics; hence, it is natural to employ wavelet decomposition to detect such disturbances. The first steganalysis technique using wavelet decomposition was developed by Farid [32, 33]. In his work, quadrature mirror filters (QMFs) are used to decompose a given image into multiple scales and orientations of wavelet subbands, obtaining nine subbands. A quadrature mirror filter is formed by the combination of low- and high-pass decomposition filters and their associated reconstruction filters, which produces three different directions (i.e., horizontal, vertical and diagonal). An illustration of the decomposition is shown in Figure 3.7. Farid also used a linear predictor to compute the log errors from the magnitudes of the coefficients in each subband. A linear predictor is defined as a linear combination of some scalar weighting values and a subset of neighbouring coefficients. This results in another nine sets of log errors (i.e., one from each of the nine wavelet subbands). Finally, the mean, variance, skewness and kurtosis are used to characterise the wavelet coefficient distribution in all nine subbands, and the same statistics are used to characterise the nine sets of log error distributions. Combining these statistics forms a 72-dimensional final feature set.

Figure 3.7: Illustration of wavelet decomposition. Hi, Vi and Di denote the horizontal, vertical and diagonal subbands, respectively; the index i indicates the scale.

In subsequent work [74], Lyu and Farid extended the wavelet statistics to include the colour components of an image. The wavelet decomposition process is the same as in their prior work: each colour component is treated as a greyscale image and decomposed into wavelet subbands. For example, in a colour image consisting of red, green and blue components, each is processed independently as a greyscale image. However, the main difference in [74] lies in the second part of the feature set, the log errors: the linear predictor used to compute them has been updated to include neighbouring coefficients from different colour components. Identical to their prior work, the mean, variance, skewness and kurtosis are used to characterise the wavelet coefficient and log error distributions, increasing the dimensionality of the final feature set to 216. Through extensive work on wavelet decomposition, Lyu and Farid [75] further extended their work to include phase statistics (in addition to the magnitude statistics of their prior work). Phase statistics are modelled using the local angular harmonic decomposition (LAHD). The LAHD can be regarded as a local decomposition of image structure obtained by projecting onto a set of angular Fourier basis kernels. Different orders of LAHD can be computed from the convolution of the image with the derivatives of a differentiable radial filter, such as a Gaussian filter.
The feature set was thereby extended to 432 dimensions. The reported experimental results are promising and show the ability to detect eight

different steganographic methods. A feature set extracted from wavelet decomposition may seem effective, but the feature dimensionality is typically large, which increases the complexity of the classification process. In addition, a larger-dimensional feature set requires more training samples to achieve stable classification. Other related works that utilise wavelet decomposition to extract feature sets can be found in [100, 14, 120].
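A one-level decomposition and the four per-subband statistics can be sketched with a plain averaging Haar transform, used here as a simplified stand-in for the QMF decomposition in [32, 33]:

```python
import numpy as np

def haar_subbands(img):
    """One-level averaging Haar decomposition of an even-sized image into
    the approximation (LL) and horizontal/vertical/diagonal detail subbands."""
    lo, hi = (img[0::2, :] + img[1::2, :]) / 2, (img[0::2, :] - img[1::2, :]) / 2
    ll = (lo[:, 0::2] + lo[:, 1::2]) / 2
    lh = (lo[:, 0::2] - lo[:, 1::2]) / 2
    hl = (hi[:, 0::2] + hi[:, 1::2]) / 2
    hh = (hi[:, 0::2] - hi[:, 1::2]) / 2
    return ll, lh, hl, hh

def subband_stats(x):
    """Mean, variance, skewness and kurtosis of a subband's coefficients."""
    x = np.asarray(x, dtype=float).ravel()
    m, s = x.mean(), x.std()
    z = (x - m) / s if s > 0 else np.zeros_like(x)
    return float(m), float(s ** 2), float((z ** 3).mean()), float((z ** 4).mean())
```

Applying `subband_stats` to each subband (and, in Farid's scheme, to the log prediction errors) and concatenating the results yields the kind of fixed-length feature vector fed to the classifier.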

Pixel difference

Liu et al. consider the differential operation a high-pass filtering process when applied to images [70]. This is desirable because it can capture the small distortion caused by the embedding operation. In [70], the differential operation is defined as the pixel-wise difference between two neighbouring pixels in the horizontal direction (and similarly in the vertical direction). Similar differentiation operations are extended to obtain higher orders, namely the second and third orders; the authors call these statistics differential statistics. In the feature extraction phase, the differential statistics and the image pixel probability mass function are used to construct first-order (histogram of the frequency of occurrence) and second-order (co-occurrence matrix) statistics. A discrete Fourier transform is then applied to these first- and second-order statistics to obtain the respective characteristic functions. Finally, the COM of each characteristic function is computed as a feature. Note that the COM features computed are identical to the features developed by Harmsen and Pearlman [48]. The experimental results reported in [70] suggest that this method can effectively detect spread-spectrum data hiding. In addition, incorporating the differential statistics feature set significantly improves the JPEG blind steganalysis developed in [35]. According to the authors, the differential statistics enlarge the blockiness effects incurred during embedding, and this enlargement makes their feature set more sensitive to data hiding. In a different work [99], Shi et al. developed an effective steganalysis technique to attack JPEG steganography. The high accuracy achieved by this technique is due to a sensitive feature set, notably the use of a difference JPEG 2-D array. The JPEG 2-D array has the same size as the given image and is filled with the absolute values of the quantised DCT coefficients. Note that the difference JPEG 2-D array is very similar to the differential statistics in [70]; more precisely, it is the first-order


differential statistic. Compared with the differential statistics, where only the horizontal and vertical directions are used, Shi et al. also included the major and minor diagonal directions. For each of the four directions, a transition probability matrix is computed. Thresholding is also utilised to achieve a balance between detection accuracy and computational complexity. The difference JPEG 2-D array in [99] reflects the correlations of neighbouring coefficients within an 8 × 8 block; these are called intra-block correlations. Later, the authors of [13] included the inter-block correlations. For inter-block correlations, the difference between two coefficients with the same mode is computed from two neighbouring 8 × 8 blocks (as opposed to two immediately neighbouring coefficients within an 8 × 8 block for intra-block correlations). Figure 3.8 shows an example of these correlations. Note that there are 64 coefficients per block and the location of each coefficient within a block is known as its mode. The experimental results indicate a significant improvement from incorporating inter-block correlations; clearly, the coefficient differences contribute crucial information to this improvement.
Figure 3.8: Illustration of the intra- and inter-block correlations in a JPEG image

The effectiveness of differential statistics [70] can be attributed to the net result of high-pass filtering. More precisely, differentiation retains only the variable parts of the signal, which are the parts possibly altered during embedding. This characteristic is desirable as it amplifies the embedding artefact. Similarly, the alterations incurred in JPEG steganography can be greatly enlarged and captured. This is the case for the difference 2D array in [99], where the authors examine the difference between a DCT coefficient and its neighbouring coefficient. This may seem optimistic for an image with rich statistics, such as an 8-bit image. However, its applicability may be

questionable for an image with modest statistics, such as a halftone image.
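To make the intra- and inter-block notions concrete, the two kinds of coefficient difference can be sketched as follows. This is illustrative only; `coeffs` is assumed to be an array of absolute quantised DCT coefficients whose dimensions are multiples of 8.

```python
import numpy as np

def block_differences(coeffs):
    """Intra- and inter-block coefficient differences (sketch of [99]/[13])."""
    # intra-block: difference between horizontally neighbouring coefficients
    # *within* each 8x8 block (pairs straddling a block boundary are dropped)
    intra = coeffs[:, 1:] - coeffs[:, :-1]
    keep = np.arange(coeffs.shape[1] - 1) % 8 != 7   # mask out cross-block pairs
    intra = intra[:, keep]
    # inter-block: difference between same-mode coefficients of two
    # horizontally adjacent 8x8 blocks
    h, w = coeffs.shape
    blocks = coeffs.reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    inter = blocks[:, 1:, :, :] - blocks[:, :-1, :, :]
    return intra, inter
```

The vertical and diagonal directions follow the same pattern with the roles of the axes changed.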

Classification As discussed in Section 2.4, differentiating a stego image from a cover image involves classification. From the literature, the most commonly used classifiers include the Fisher linear discriminant, artificial neural networks and support vector machines. These classifiers were discussed in Subsection 2.4.2. Note that most of the work on blind steganalysis focuses on feature extraction; the choice of classifier is secondary. The task of feature extraction is considered more crucial than the selection of the classifier in steganalysis, primarily because the detection accuracy depends significantly on the sensitivity of the features to the embedding artefact. Given a sensitive and discriminating feature set, the overall accuracy can be further optimised by tuning the classifier, and it is not hard to switch from one type of classifier to another. For example, Farid changed from the Fisher linear discriminant in [33] to an SVM in [73]. Similarly, in [101] and [14], the initial neural network classifier was later replaced with an SVM.

Cover image estimation Normally, a cover image is destroyed or kept secret once a stego image is generated, to ensure maximum security of the covert communication [92]. This implies that only one version of the image is typically available. If we had access to both the cover and stego versions, we could tell the differences easily and the steganography scheme would be considered broken. In general, the effect of data hiding can be modelled as the effect of additive noise in an image. If the additive noise or message is independent of the cover image, the probability mass function (PMF) of the stego image is equal to the convolution of the additive noise PMF and the cover image PMF [48]. Hence, the cover image can be estimated from the stego image if the additive noise is eliminated. This has inspired the incorporation of cover image estimation in much steganalysis research, such as image calibration [39] and prediction-error images [101], to increase feature sensitivity with respect to the embedding artefacts and to remove the influence of the image content. This improves the discriminatory power of steganalysis.
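The additive-noise model above can be checked numerically: with independent noise, the stego PMF is the convolution of the cover PMF and the noise PMF [48]. A minimal sketch follows; the ±1 noise model standing in for LSB matching is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=100_000)
# additive, cover-independent noise, e.g. LSB matching: -1/0/+1
noise = rng.choice([-1, 0, 1], size=cover.size, p=[0.25, 0.5, 0.25])
stego = cover + noise

def pmf(x, lo, hi):
    """Empirical PMF of integer samples over the support [lo, hi]."""
    h = np.bincount(x - lo, minlength=hi - lo + 1).astype(float)
    return h / h.sum()

p_cover = pmf(cover, -1, 256)
p_noise = np.array([0.25, 0.5, 0.25])          # exact PMF of the noise
p_stego = pmf(stego, -1, 256)

# predicted stego PMF = cover PMF convolved with noise PMF [48]
p_pred = np.convolve(p_cover, p_noise, mode="same")
max_err = np.abs(p_pred - p_stego).max()       # small (sampling error only)
```

The residual discrepancy shrinks as the sample size grows, which is exactly the relationship cover-estimation techniques try to invert.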


Ker [61] applied image calibration (along with another improvement called the adjacency histogram) to improve the blind steganalysis initially developed by Harmsen and Pearlman [48]. In his work, a given image is down-sampled with an averaging filter. This down-sampling involves addition and rounding operations on the pixels (or coefficients). These operations even out the additive noise, allowing the cover image to be estimated. However, the efficiency degrades when it is used to detect stego images with shorter messages. This suggests that calibration by down-sampling may not be the optimal option. In [121], Zou et al. used a simpler method to obtain the estimated cover image, which they call a prediction-error image. The current pixel is subtracted from the neighbouring pixel to obtain the prediction. For example, x(i, j) − x(i + 1, j) will produce the prediction-error image in the horizontal direction, where x(i, j) is the current pixel at location (i, j). The same prediction is applied in the vertical and diagonal directions. The authors note that this prediction may exhibit high variation in the prediction values within a predicted image. For instance, the prediction values for an 8-bit image lie in [−255, 255]. To overcome this issue, the authors proposed using a threshold T: if the absolute value of the prediction is greater than T, it is set to zero. The authors suggest that thresholding is effective because high variation in the prediction values is mostly caused by the image content (and hence is insignificant in steganalysis). However, it may be possible for adaptive steganography to counteract this thresholding technique by adaptively selecting regions with high variation. This causes the hidden data to be regarded as original image content and discarded; therefore, the detection fails.
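The prediction-error construction with thresholding can be sketched as follows. This is a minimal sketch of the idea in [121]; the direction convention and the default value of T are illustrative.

```python
import numpy as np

def prediction_error(img, T=3):
    """Horizontal prediction-error image with thresholding (sketch of [121]):
    e(i, j) = x(i, j) - x(i+1, j); entries with |e| > T are set to zero."""
    x = img.astype(np.int64)
    e = x[:, :-1] - x[:, 1:]          # difference between a pixel and its neighbour
    e[np.abs(e) > T] = 0              # suppress content-driven variation
    return e
```

Large prediction errors (sharp edges in the content) are zeroed, so the surviving small values are dominated by noise-like changes such as embedding.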

3.2.2

Classification of Steganographic Methods

In this case, the adversary holds an image and wants to discover which steganographic technique has been used. Further on, we may assume that the collection of possible steganographic techniques is public and known to the adversary. This steganalysis problem has been tackled using the following approaches: feature extraction and multi-class classification.


Feature extraction The first part of multi-class steganalysis is feature extraction. Here, several important feature extraction techniques used in multi-class steganalysis are described and analysed to gain an understanding of the analysis. Rodriguez and Peterson focus on determining the embedding technique used in JPEG image steganography [95]. The feature set is extracted from the multilevel energy bands of DCT coefficients. First, all DCT coefficients are arranged into blocks of 8×8 coefficients. Within each block, the DCT coefficients are arranged using zigzag and Peano scan methods to produce the multilevel energy bands. Then, higher-order statistics (such as inertia, energy and entropy) are computed for each band. These higher-order statistics form the first part of the feature set. In addition, log errors are computed for the multilevel energy bands. The log errors are the residuals computed from the DCT coefficients and their predicted coefficients. The predicted coefficients are obtained from a predefined subset of neighbouring coefficients. The same higher-order statistics are applied to these log errors to form the second part of the feature set. In general, the multi-class steganalysis in [95] performs fairly well. However, the method performs poorly on some steganographic techniques, such as OutGuess [90] and StegHide [51], because OutGuess and StegHide can use a similar embedding algorithm, which makes differentiation difficult. On the other hand, this also shows that the developed feature set may not discriminate sufficiently. The weakness is manifested in the feature elimination procedure: it is unclear how the method performs the feature elimination and, as a result, important information can be discarded. The extensive work in multi-class steganalysis carried out by Pevný and Fridrich [85, 86, 87, 88, 89] was aimed at determining the type of embedding algorithm employed in JPEG image steganography.
The first version of their multi-class steganalysis was an enhanced version of their blind steganalysis developed in [35]. Mainly, they utilised their proven discriminative feature set, applying it to multi-class steganalysis. There are 23 features and they can be grouped as global histogram, individual histograms, dual histograms, variation, blockiness and co-occurrence matrices. A global histogram is the histogram of all DCT coefficients in an image. Individual histograms are extracted from the DCT coefficients of the five lowest-frequency AC modes. Note that the mode refers to the position of a DCT coefficient within the block

and there are 64 modes. Figure 3.9 shows an illustration of the modes in a JPEG image. The next features are dual histograms, which represent the distributions of eleven selected DCT coefficient values within the 64 modes. Variation is used to measure the inter-block dependencies among the DCT coefficients. Blockiness measures the spatial inter-block boundary discontinuities. The discontinuities are calculated from the spatial pixel values of the decompressed JPEG image. Finally, the co-occurrence matrices are calculated from the DCT coefficients of neighbouring blocks. In addition, the estimated cover image is also used in feature construction to increase the discriminative power of the feature set. To obtain an estimate of the cover image, the authors decompress the JPEG image, crop off a small portion of the image and re-compress it. This process is called image calibration.
Figure 3.9: The 64 modes of an 8×8 DCT block. The circle represents the DCT coefficient

Note that their multi-class steganalysis shows promising results. Later, Pevný and Fridrich extended it to include a more complicated case, which involved the analysis of double compressed JPEG images [86, 87]. Double compression occurs when a JPEG image has been decompressed and re-compressed with different JPEG quality factors after embedding the secret message. This can occur when F5 or OutGuess is used to generate the stego image. According to the authors, the double compression effect will make cover image estimation inaccurate. Hence, the results of steganalysis may be misleading. The main difficulty lies in the unavailability of the previous or first JPEG quality factor. To alleviate this problem, the authors use an estimation algorithm from [72] to estimate the previous JPEG quality factor. The estimation algorithm utilises a set of neural networks to compute the closest estimate, based on the

Figure 3.10: The modified image calibration steps used for a double compressed JPEG image. The shaded box represents the calibrated image and Q denotes the JPEG quality factor

DCT coefficients of the five lowest-frequency AC modes. With the estimated JPEG quality factor, the updated image calibration process proceeds as follows. First, the JPEG image is decompressed, cropped and re-compressed with the estimated JPEG quality factor. Then, the re-compressed JPEG image is decompressed again and re-compressed a second time using the second JPEG quality factor (the second JPEG quality factor is the one stored in the JPEG image before calibration). These steps are shown in Figure 3.10. The rest of the feature extraction process remains the same. Pevný and Fridrich later discovered that some important information might be lost due to the existing feature representation [88]. Hence, they enhanced some of the features by replacing the L1 norm with the feature differences within a carefully chosen DCT coefficient range. Only a subset of the features is involved in the improvement: the global histogram, individual histograms, dual histograms and co-occurrence matrices. According to the authors, their feature set effectively models the inter-block dependencies of DCT coefficients. Building a strong multi-class steganalysis also requires features that can model intra-block dependency. Hence, the authors incorporate the feature set developed in [99] into their extended feature set. Prior to the incorporation, the feature set developed in [99] is averaged and calibrated. Building on this line of work, Pevný and Fridrich combined and constructed a complete, functional multi-class steganalysis system in [89]. This system was developed to handle both single and double compressed stego images generated by currently popular steganographic techniques. The system can perform classification under a diversified range of JPEG quality factors. In addition, for some non-standard

JPEG quality factors tested, the system also shows reliable classification results. The experimental results reported show that the system could classify stego images that it had not previously been trained to interpret. In a different work [31], Dong et al. constructed a multi-class steganalysis based on the analysis of image run length. This work is an extension of their previous work [30]. The main contribution of this work is the ability to perform multi-class steganalysis across different image domains: the same technique can be used to classify spatial (e.g., BMP) and frequency (e.g., JPEG) domain images. This shows the ability to generalise, which is desirable in multi-class steganalysis. The core feature in their work is the histogram of image run length. Image run length can be considered a compression technique: a sequence of consecutive pixels with the same intensity along a direction can be represented compactly as a single intensity value and a count. This forms a matrix r(g, ℓ) with intensity value g and run length ℓ as the axes. For an 8-bit image and a maximum run length L, the histogram of the image run length can be defined as follows:

H(ℓ) = Σ_{g=0}^{255} r(g, ℓ),    1 ≤ ℓ ≤ L,    (3.3)

Note that the histogram count defined in Equation (3.3) is for one direction. Other directions (e.g., 0°, 45°, 90° and 135°) are computed in a similar manner. Based on the histograms of image run length, several higher-order moments are computed and used as a feature set. The embedding of a secret message alters the distribution of the run lengths. More precisely, an original pixel sequence with identical intensity will be broken into different, shorter sequences. These changes will be significantly reflected in the image run length. The reported experimental results show comparable performance in the spatial and frequency domains; however, these results may not be representative because the experimental message lengths are arbitrary. It is well known that detection accuracy is influenced significantly by the size of the embedded message. It would be useful to determine a fair measurement of message length that can be used in both image domains, such that the detection performance accurately reflects the discriminative power of the multi-class steganalysis.
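Equation (3.3) can be computed directly. A minimal sketch for one (horizontal) direction follows; clipping runs longer than the maximum into the last bin is an assumption of this sketch.

```python
import numpy as np

def run_length_histogram(img, max_len):
    """Histogram H(l) of horizontal run lengths, as in Equation (3.3):
    H(l) = sum over g of r(g, l), for runs of identical intensity."""
    H = np.zeros(max_len + 1, dtype=int)
    for row in img:
        change = np.flatnonzero(np.diff(row)) + 1               # run boundaries
        runs = np.diff(np.concatenate(([0], change, [len(row)])))
        for length in runs:
            H[min(length, max_len)] += 1                        # clip long runs
    return H[1:]                                                # lengths 1..max_len
```

Embedding breaks long same-intensity runs into shorter ones, shifting mass towards the low-length bins of H.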


Multi-class classification The second part of multi-class steganalysis is classification. The most common classifier used in multi-class steganalysis is the support vector machine. In general, there are two methods for constructing a multi-class classifier: the all-together method and the method that combines several two-class classifiers. The all-together method solves the entire classification with a single optimised classifier. Clearly, this method requires more computational resources and involves a complex classifier. The other method solves the multi-class classification problem by combining several two-class classifiers (for brevity, we refer to this as the multiple two-class classifiers method). This method requires relatively fewer computational resources and provides competitive classification accuracy. According to the review in [53], there are three multiple two-class classifier approaches: one-against-one, one-against-all and the directed acyclic graph support vector machine (DAGSVM). Based on the findings in [53], one-against-one is preferable and more suitable for practical applications. Examples of work using this approach in multi-class steganalysis can be found in [97, 89, 31]. The first step in the one-against-one approach is to perform a normal two-class classification among the classes. Every two-class classifier is trained to classify one class against one of the other classes. For instance, the first two-class classifier is assigned to distinguish between cover images and type-1 stego images. The next two-class classifier is assigned to distinguish between type-1 and type-2 stego images, and so on until all pairs of combinations are formed. This method uses K(K − 1)/2 two-class classifiers for all pairs of classes, where K is the total number of classes. The conceptual diagram for this approach is shown in Figure 3.11.
Figure 3.11: The multi-class classification on the left is formed by a combination of several two-class classifications on the right

The second step is to employ a strategy to determine the correct class for the

image. A commonly used strategy is majority voting, or the max-wins strategy. In the majority-voting strategy, the results from each two-class classifier are obtained and accumulated. From the accumulated results, the class receiving the highest count is assigned as the correct class. If two classes obtain the same highest count, one class is randomly selected. Clearly, an embedding algorithm alters a cover image in one way or another. This implies that, with an effective feature set, the distance in the feature space between cover images and all types of stego images should be large, while the distance in the feature space between the different types of stego images should be comparatively smaller. Therefore, it is more efficient to use blind steganalysis (i.e., two-class classification) to initially separate cover images from stego images; multi-class classification can then determine the type of embedding algorithm. This scheme reduces the number of classes and the number of classifiers. The efficiency of this scheme has been demonstrated in [31].
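The one-against-one scheme with max-wins voting can be sketched as follows. This is a toy example with a one-dimensional feature and nearest-centre pairwise classifiers; all names and the decision rule inside each pairwise classifier are illustrative.

```python
from itertools import combinations
from collections import Counter

def one_against_one_predict(classifiers, x):
    """Majority (max-wins) voting over K(K-1)/2 pairwise classifiers.
    `classifiers` maps a class pair (a, b) to a function returning the
    winning class for sample x."""
    votes = Counter(clf(x) for clf in classifiers.values())
    best = max(votes.values())
    winners = [c for c, v in votes.items() if v == best]
    return winners[0]   # a tie could be broken at random, as described above

# toy setting: three classes separated along a single feature axis
classes = ["cover", "stego-1", "stego-2"]
centres = {"cover": 0.0, "stego-1": 1.0, "stego-2": 2.0}
clfs = {
    (a, b): (lambda x, a=a, b=b: a if abs(x - centres[a]) < abs(x - centres[b]) else b)
    for a, b in combinations(classes, 2)
}
print(one_against_one_predict(clfs, 0.1))   # prints "cover"
```

With K = 3 classes this builds K(K − 1)/2 = 3 pairwise classifiers, matching the count given above.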

3.2.3

Estimation of Message Length

If the steganographic method is known to the adversary, he can begin to recover some attributes of the embedded message. For instance, the steganalysis technique used may provide the adversary with an estimate of the number of embedding changes, from which the adversary can approximately infer the embedded message length. Below, we discuss several well-known steganalysis techniques that estimate the length of an embedded message. Note that an LSB embedding algorithm that overwrites the pixel LSBs will not change the grand total frequencies of pixel intensities; only the frequencies of occurrence are swapped between these intensities. In other words, when embedding occurs, the frequencies of occurrence for odd pixel intensities are transferred to the corresponding even pixel intensities and vice versa. These pairs of odd-even pixel intensities, (2i, 2i+1), are called pairs of values (PoV). Embedding swaps the frequencies of occurrence within each PoV while the sum of the frequencies in every PoV remains the same. If the message bits are uniformly distributed (typically the case, because the message is encrypted), the frequencies of the intensities in each PoV will become identical after embedding (refer to Figure 3.12). From this observation, Westfeld and Pfitzmann [111] developed a steganalysis technique based on the chi-square test (known as a chi-square or χ² attack). The

Figure 3.12: (a) A portion of image histogram before embedding. (b) The same portion of image histogram after embedding. Notice that the histogram bins of each PoV have been equalised

chi-square test measures the degree of similarity between the observed sample distribution and the expected frequency distribution. The observed sample distribution is obtained from the given image; the expected frequency distribution is computed from the arithmetic means of the PoVs. The χ² attack can estimate the length of an embedded message as long as the message is embedded sequentially. However, the attack is unable to provide reliable detection if the message bits are randomly embedded in the image. To address this weakness, Provos and Honeyman [90, 91] extended the chi-square attack. In contrast to the previous chi-square attack (where the sample size was increased from a fixed start location during the test), the extended chi-square attack uses a fixed sample size that moves over the entire image. The start location for the fixed sample size is set at the beginning of the image and moved a constant distance along the test. Another difference is that, instead of computing the PoV arithmetic means, the expected frequency distribution is obtained from the arithmetic means of pairs of unrelated coefficients. Although the χ² attack is effective against generic LSB replacement steganography, it fails if the steganography employs a more complicated algorithm such as F5 [110]. For that case, Fridrich et al. [39, 40] developed a steganalysis technique targeted specifically at F5. This technique can estimate the length of a hidden message embedded in a JPEG image. The main idea is based on the proportionality between a defined macroscopic quantity and the hidden message length. In other words, the size of the embedded message will be reflected in the macroscopic quantity. Hence, the hidden message length can be determined by computing the macroscopic quantity.
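The PoV-based χ² statistic described above can be sketched as follows. This is a minimal sketch; the skewed synthetic cover is an assumption used only to make the PoV asymmetry visible.

```python
import numpy as np

def chi_square_pov(pixels):
    """Chi-square statistic over pairs of values (PoV), a sketch of the
    Westfeld-Pfitzmann attack: compares observed odd-bin counts with the
    PoV arithmetic means expected after full LSB embedding."""
    h = np.bincount(pixels.ravel(), minlength=256).astype(float)
    even, odd = h[0::2], h[1::2]
    expected = (even + odd) / 2.0            # arithmetic mean of each PoV
    mask = expected > 0                      # skip empty PoVs
    return float(np.sum((odd[mask] - expected[mask]) ** 2 / expected[mask]))

rng = np.random.default_rng(0)
# synthetic cover biased towards even intensities, so PoVs are unbalanced
cover = 2 * rng.integers(0, 128, (128, 128)) + (rng.random((128, 128)) < 0.2)
# full LSB replacement equalises every PoV, collapsing the statistic
stego = (cover & ~1) | rng.integers(0, 2, cover.shape)
```

A large statistic indicates unbalanced PoVs (a plausible cover); a small one indicates the equalised histogram of a fully embedded stego image.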


The first step in this technique is to estimate a copy of the cover image from the given stego image. The estimation is carried out by cropping four pixels in both the horizontal and vertical directions after decompressing the stego image. The cropped image is then recompressed with the same JPEG quantisation table as the stego image. Note that this process is the image calibration discussed in the preceding subsection. In the next step, the authors use the histograms of several low-frequency DCT coefficient modes as the macroscopic quantity. The histograms used are from the given stego image and the estimated cover image. The modification caused by the embedding will be reflected in the distribution of the histograms. Hence, based on the histograms, the modification rate can be determined. Finally, with the modification rate, the size of the hidden message can be computed. In [38], Fridrich et al. launched an attack on OutGuess using a similar concept. They started by determining a macroscopic quantity that progressively changes with the size of the embedded message. Due to the LSB flipping algorithm of OutGuess, embedding increases the spatial discontinuities at the boundaries of all 8×8 blocks. Hence, the authors used blockiness as the macroscopic quantity for measuring the degree of change that occurred at the boundaries. Figure 3.13 shows an illustration of the 8×8 block boundaries, where the blockiness measurement is calculated.
Figure 3.13: The shaded regions denote the boundaries of 8×8 blocks in a decompressed JPEG image
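A blockiness quantity over these boundaries can be sketched as follows: the sum of absolute pixel differences across every 8×8 block boundary. This is a minimal sketch; the exact norm and normalisation used in [38] may differ.

```python
import numpy as np

def blockiness(img):
    """Sum of absolute differences across 8x8 block boundaries in a
    decompressed (spatial-domain) image -- an illustrative macroscopic
    quantity for the OutGuess attack."""
    x = img.astype(np.int64)
    v = np.abs(x[:, 7:-1:8] - x[:, 8::8]).sum()   # vertical block boundaries
    h = np.abs(x[7:-1:8, :] - x[8::8, :]).sum()   # horizontal block boundaries
    return int(v + h)
```

LSB flipping perturbs pixels on both sides of each boundary, so this quantity grows with the number of embedding changes.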

According to the authors, the increase of blockiness is expected to be smaller in

a stego image than in a cover image when a full-length dummy message (random bits) is artificially re-embedded using OutGuess. This is because of the partial cancellation effect on the stego image. For example, suppose the LSB of a pixel x_i is changed from zero to one during embedding; when a full-length message is re-embedded, the LSB of pixel x_i may be changed back from one to zero. Note that this attack also depends on the estimation of the cover image, using the same technique as in [39]. With the blockiness measurement and the re-embedding of a full-length message in the given stego image and the estimated cover image, linear interpolation is used to estimate the length of the embedded message. In another example, He and Huang [50, 49] analysed non-adaptive stochastic modulation steganography [36] and showed how to estimate the length of a hidden message. Stochastic modulation steganography is a noise-additive steganography, where a signal with a specific probabilistic distribution is modulated and added to the image to carry the message bits. The signal in this context is known as stego noise. The attack is based on the differing probability distributions of pixel differences for cover and stego images. More precisely, the probability distribution of pixel differences for a cover image closely follows a generalised Gaussian distribution (GGD), while the probability distribution for a stego image reflects the statistical characteristics of the hidden message. Note that, for non-adaptive stochastic modulation steganography, the probability distribution of a stego image's pixel differences is the convolution of the probability distributions of the cover image's pixel differences and the stego noise differences. Thus, the attack starts by establishing a model that describes the statistical relationship among the cover image, stego image and stego noise. Next, the required distributional parameters are estimated from the given stego image.
Then, based on the distributional parameters, the authors employ a grid search and a chi-square goodness-of-fit test to estimate the length of the embedded message. The experimental results reported show promising detection accuracy. The authors mention that this steganalysis technique is not only effective for noise-additive steganography, but is also suitable for other types of non-adaptive steganography (e.g., LSB-based steganography and ±k steganography). Unfortunately, no further details regarding this are provided. The technique depends significantly on the assumption that the pixel differences of a cover image are accurately modelled by a GGD. However, this assumption will likely cause the technique to fail when the analysed cover image is a binary image. The failure is due to the modest

statistical characteristics of a binary image. Jiang et al. [57] launched an attack on boundary-based steganography that embeds a secret message in a binary image. Their attack hinges on the observation that embedding disturbs pixel positions, and this degrades the fit of an autoregressive model on binary object boundaries. The attack works by assuming that the boundaries of characters or symbols in a textual document can be modelled by a cubic polynomial. This allows a boundary pixel to be estimated from its neighbouring pixels through an autoregressive process. An estimation error vector is computed from the given and estimated boundary pixels. Then the mean and variance of the estimation error vector are calculated. According to their experiments, the mean and variance increase proportionally with respect to the relative message length. Hence, based on some testing samples, a linear equation is defined for message length estimation. However, this attack is not applicable when the object boundaries cannot be modelled by a cubic polynomial. In [56], Jiang et al. launched another attack on binary image steganography. This attack is based on the idea that the entropy of a stego image is a monotonically increasing function of the embedding rate. A JBIG2 binary image compression algorithm is used to capture the entropy. This compression algorithm establishes a quantitative relationship between the compression rate and the embedding rate. Thus, an estimate of the message length can be derived from the computed compression rate. The message length estimation steganalysis techniques mentioned above mainly focus on a specific steganography. The work developed in [71] generalises the estimation technique so that it can be applied to a wider range of steganographic methods. Indeed, it can be considered a multi-class steganalysis technique that uses a multi-class classifier to estimate the hidden message length.
The authors employ SVM classifiers with a one-against-all strategy to perform the multi-classification tasks. A modified measurement of the standard mean square error is used as the feature set. Unfortunately, using a multi-classification technique to estimate the message length is of limited use and impractical. Unlike the multi-class steganalysis discussed in Subsection 3.2.2, where the number of classes is small, treating each different message length as a single class will produce a large number of classes. For instance, if there are n classes of steganographic methods with m different lengths, the multi-classifier will be required to classify n × m different


classes. Clearly, as n and m increase, the classification will become ineffective as the extracted feature points may overlap significantly. Therefore, the message length estimation technique developed in [71] will likely become unreliable when the number of classes is large.

3.2.4

Identification of Stego-Bearing Pixels

In some cases, when the adversary is certain of and has knowledge about the steganographic method used, he has an opportunity to identify which pixels are used to carry the message bits. The work in [27] is motivated by the concept of outlier detection. A model of the image distribution is built first. After that, any pixel that deviates from the model is identified as an outlier. Together with outlier detection, the authors of [27] opted to utilise an image restoration technique. Their idea is that a pixel altered to carry a message bit will deviate from the image distribution and be identified as an outlier. When the image restoration technique is applied, the pixel (outlier) will be automatically removed. The list of removed pixels identifies the locations of the stego-bearing pixels. Due to the wide variety of image content, a non-parametric model is more suitable and useful than a parametric model. Hence, the image pixel energy is used to model the distribution. In the restoration process, each pixel is examined and may be conditionally updated to minimise the pixel energy. The whole process is repeated until convergence occurs. The developed technique is reported to work with greyscale and colour palette images, such as GIF. However, the authors report that their technique may be defeated if the message is adaptively embedded in the high-energy regions of an image. This also implies that identification of stego-bearing pixels becomes unreliable when an image with rich texture content is used as the cover image. Kong et al. developed a steganalysis technique to identify the regions in a colour image where a secret message is embedded [68]. This technique specifically targets steganography with a sequential embedding algorithm. The idea comes from the fact that when the colour components (e.g., red, green and blue) of a colour image are altered independently, the smoothness of the colour will be disturbed.
According to the authors, this observation becomes prominent under a different colour system (e.g., HSI, YUV or YCbCr)

which uses luminance, hue and saturation to describe the colours. Kong et al. suggest that, in general, the hue of a cover image varies slowly and tends to be constant in a small neighbourhood of pixels. This is no longer true when a hidden message is embedded. Thus, when the coherence of the hue in a region under examination exceeds a certain threshold, there is good reason to suspect that it contains bits of a hidden message. The algorithm of Kong et al.'s technique can be summarised as follows. Given a colour image, the technique partitions the image into blocks and examines each block separately. The pixels within each block are divided into two distinct groups: coherent and incoherent. A pixel is assigned to the coherent group if the maximal difference of hue between this pixel and its neighbouring pixels is less than a threshold. In addition, at least one neighbouring pixel with the same hue as that pixel must exist. Any pixel that fails to fulfil these conditions is assigned to the incoherent group. The ratio of the coherent group to the incoherent group determines whether a block should be labelled as a stego-bearing region. Note that this technique involves a high degree of threshold dependency: it requires three different thresholds to identify the stego-bearing region. This may be a drawback in practice, especially with a wide variety of image content. Hence, careful selection of cover images renders this technique ineffective. Furthermore, this technique only works for steganography with sequential embedding. If the embedding is random, then this attack will fail. In [62], Ker argues that it is possible to have a situation where several different cover images are used for a batch of secret communications. These images are the same size, but embedded with different messages. It is very likely that the same stegokey will be used for this batch of communications. We can relate a simple but plausible scenario to this assumption.
For instance, a batch of secret communications can use a set of dierent images, captured with the same settings of a digital camera. This produces images with the same size. For security reasons, random embedding algorithms are preferable. As usual, the random embedding is controlled by a stegokey and it is quite possible the same stegokey is reused for the entire batch of communications. This will result in dierent messages embedded in the same locations across dierent images. In addition, it is also possible that a sequential embedding algorithm is used, resulting in embedding with the same xed pixel locations.


This is what inspired Ker [62] to develop a technique to identify the locations of these stego-bearing pixels. In this work, Ker employed the weighted stego image (WS), initially developed in [37] and later improved in [63]. The analysis is based on the residuals of the WS. The residual of a WS is the pixel-wise difference between the stego image and the estimated cover image. The residual at the ith pixel can be defined as follows:

r_i = (s_i - s'_i)(s_i - c_i),   (3.4)

where s_i is the ith pixel of the stego image, s'_i is the corresponding stego pixel with its LSB flipped, and c_i is the ith pixel of the estimated cover image. With access to multiple stego images, as in the scenario described above, the mean of the residual at the ith pixel can be computed as follows:

r̄_i = (1/N) Σ_{j=1}^{N} r_ij,   (3.5)

where N is the total number of stego images. r_ij is obtained as in Equation (3.4) for the jth stego image and can be defined as follows:

r_ij = (s_ij - s'_ij)(s_ij - c_ij).   (3.6)
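As a concrete illustration of Equations (3.4) to (3.6), the sketch below simulates a batch of stego images that reuse the same embedding locations and averages the WS residuals pixel-wise. Using the true cover as the "estimated" cover is a simplifying assumption (in practice the estimate is obtained by filtering the stego image), and all sizes and the embedding rate are made up.

```python
import numpy as np

def ws_residual(stego, cover_est):
    # Equation (3.4): r_i = (s_i - s'_i)(s_i - c_i), where s' is the
    # stego image with every LSB flipped and c the estimated cover.
    s = stego.astype(int)
    return (s - (s ^ 1)) * (s - cover_est.astype(int))

# Equations (3.5)/(3.6): average the residuals over N stego images
# that share the same stego-bearing pixel locations.
rng = np.random.default_rng(0)
N, shape = 50, (16, 16)
locations = rng.random(shape) < 0.1          # shared embedding locations
mean_res = np.zeros(shape)
for _ in range(N):
    cover = rng.integers(0, 256, shape)
    stego = cover.copy()
    bits = rng.integers(0, 2, shape)
    stego[locations] = (cover[locations] & ~1) | bits[locations]
    # Simplification: the true cover stands in for the estimated cover.
    mean_res += ws_residual(stego, cover)
mean_res /= N
```

With an exact cover estimate, the mean residual is about 0.5 at stego-bearing pixels (the LSB actually changes for roughly half the random message bits) and exactly 0 elsewhere, which is the separation the analysis exploits.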

When N is sufficiently large, the mean of the residual will provide strong evidence for separating stego-bearing pixels from normal pixels. Note that the analysis developed in [62] is most effective for LSB replacement steganography. It may become ineffective for other steganography, such as LSB-matching steganography. Aware of this limitation, Ker and Lubenko extended the work to cover the analysis of LSB-matching steganography in [64]. They use the residuals of wavelet absolute moments (WAM), which are derived from the feature set developed for blind steganalysis in [44]. The residuals of WAM are computed as follows. Given an image, one level of wavelet decomposition using an eight-tap Daubechies filter is employed. The decomposition produces four subbands: the low-frequency, vertical, horizontal and diagonal subbands. Then, the authors use a quasi-Wiener filter to compute the residuals of WAM from the vertical, horizontal and diagonal subbands. The low-frequency subband, however, is not used and is initialised to zero. The residuals of WAM and the zeroed low-frequency subband are reconstructed through the inverse wavelet transform. This reconstruction produces something similar to a spatial domain image, which the authors call a spatial domain residual image. The whole process is depicted in Figure 3.14.

[Figure 3.14: The extraction of the residual image. L, V, H and D denote the low frequency, vertical, horizontal and diagonal subbands, respectively. L' is the zeroed-out low-frequency subband. R[V], R[H] and R[D] denote the residuals of WAM from the vertical, horizontal and diagonal subbands, respectively.]

Similar to [62], and based on the same assumption that multiple stego images are available and the same stegokey is reused for the embedding, the identification of stego-bearing pixels is performed. The identification starts by computing the mean of the absolute residual for every pixel across all stego images. Note that the mean of the absolute residual is defined analogously to the mean of the residual in Equation (3.5). A pixel used to carry a message bit will have a higher mean of the absolute residual. Thus, the locations of stego-bearing pixels can be identified by selecting the p pixels with the highest mean of the absolute residual. According to the authors, p can be estimated by a quantitative steganalysis technique (the analysis discussed in Section 3.2.3). However, if the estimate of p is inaccurate, the identification of stego-bearing pixels will be inaccurate as well.

One important observation is that cover image estimation has a significant effect on the analysis. In general, the accuracy of the analysis can be greatly improved by using a more accurate estimation technique. In [62, 64] it is also very important to keep the number of required stego images to a minimum: if several hundred stego images were required to obtain an accurate identification, the technique would be of limited use.
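The pipeline of Figure 3.14 can be sketched as follows. This is an illustrative approximation only: it substitutes a one-level Haar transform for the eight-tap Daubechies filter of [64], and it uses one common form of quasi-Wiener shrinkage, r = d * n0 / (local_var + n0), whose exact form and parameters are our assumptions rather than the authors' implementation.

```python
import numpy as np

def haar_decompose(img):
    # One-level 2-D Haar analysis (a stand-in for the eight-tap
    # Daubechies filter of [64]); image sides must be even.
    a = img[0::2, 0::2].astype(float)
    b = img[0::2, 1::2].astype(float)
    c = img[1::2, 0::2].astype(float)
    d = img[1::2, 1::2].astype(float)
    L = (a + b + c + d) / 2.0          # low-frequency subband
    H = (a - b + c - d) / 2.0          # horizontal detail
    V = (a + b - c - d) / 2.0          # vertical detail
    D = (a - b - c + d) / 2.0          # diagonal detail
    return L, V, H, D

def haar_reconstruct(L, V, H, D):
    # Exact inverse of haar_decompose.
    out = np.empty((2 * L.shape[0], 2 * L.shape[1]))
    out[0::2, 0::2] = (L + H + V + D) / 2.0
    out[0::2, 1::2] = (L - H + V - D) / 2.0
    out[1::2, 0::2] = (L + H - V - D) / 2.0
    out[1::2, 1::2] = (L - H - V + D) / 2.0
    return out

def quasi_wiener(band, noise_var=0.5, win=3):
    # Assumed quasi-Wiener shrink keeping the noise-like part of a
    # detail subband, based on a local variance estimate.
    pad = win // 2
    padded = np.pad(band, pad, mode='reflect')
    shrunk = np.empty_like(band, dtype=float)
    for i in range(band.shape[0]):
        for j in range(band.shape[1]):
            local_var = padded[i:i + win, j:j + win].var()
            shrunk[i, j] = band[i, j] * noise_var / (local_var + noise_var)
    return shrunk

def residual_image(img):
    # Figure 3.14: decompose, zero out L, shrink V/H/D, invert.
    L, V, H, D = haar_decompose(img)
    return haar_reconstruct(np.zeros_like(L), quasi_wiener(V),
                            quasi_wiener(H), quasi_wiener(D))
```

A flat image yields an all-zero residual image (no detail content, and the low-frequency subband is discarded), which matches the intent of the pipeline.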


3.2.5 Retrieval of Stegokey

In a scenario where the embedded message is not encrypted and the key space is small, it is very likely that the adversary can mount a dictionary attack or a brute-force search for the stegokey. For every stegokey tried, the adversary obtains an alleged message, and the correct stegokey is revealed when a meaningful message is obtained. The following examples show more advanced versions of this type of attack.

Fridrich et al. [41] developed a steganalysis technique that can retrieve the stegokey. The technique was developed under several assumptions: (i) the retrieved stegokey is the seed of a pseudorandom number generator (PRNG), and (ii) the steganalysis is independent of the encryption algorithm. As the steganography may use a mapping component, such as a hash function, to map the password to the seed, it is reasonable to assume the retrieved stegokey is the seed of a PRNG rather than the password. Normally, a message will be encrypted before the embedding algorithm is applied. For that, a stegokey (or stegokeys) is used in the encryption algorithm as well as in the PRNG that generates the embedding path in the embedding algorithm. Clearly, the computation of stegokey retrieval may become infeasible; this is where the second assumption comes into play. The technique involves only finding the seed used in the embedding algorithm and disregards the encryption algorithm.

Given an image with N pixels, where m < N pixels are randomly selected during embedding to carry the secret message bits, the embedding path generated by the stegokey is a random path. The steganalysis technique starts by taking n samples of pixels, where n < m. The n samples are selected randomly from the stego image and the random selection path is generated from a seed k_j, where k_j is taken from the stegokey space. The correct stegokey is determined through a brute-force search over different k_j within the stegokey space.
The distributions of the n samples for the correct and incorrect stegokeys are different; therefore, it is suitable to use their probability density functions (PDFs) as the statistical properties. Finally, the chi-square test is used: for every tested stegokey, the chi-square statistic is obtained, and the outlier is identified as the correct stegokey. This technique was tested on two JPEG steganographic methods: F5 and OutGuess. Later, the analysis was extended to cover spatial domain image steganography [42]. In [42], Fridrich et al. chose generic LSB replacement and LSB matching steganography as the benchmark steganographic methods for testing their extended steganalysis technique.
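A toy version of this seed search can be sketched as follows: pixels are sampled along the path generated by each candidate seed, a chi-square statistic is computed on the sample, and the correct seed stands out as the outlier. The pairs-of-values statistic, the synthetic cover model and all sizes below are illustrative assumptions, not the exact statistic of [41].

```python
import numpy as np

def pov_chi2(pixels):
    # Chi-square statistic on pairs of values (2k, 2k+1); full LSB
    # embedding equalises each pair, yielding a small statistic.
    hist = np.bincount(pixels, minlength=256).astype(float)
    even, odd = hist[0::2], hist[1::2]
    expected = (even + odd) / 2.0
    mask = expected > 0
    return float(np.sum((even[mask] - expected[mask]) ** 2 / expected[mask]))

rng = np.random.default_rng(1)
N, m, n, true_seed = 100_000, 50_000, 5_000, 7

# Synthetic cover with strongly biased LSBs, so unembedded pixels
# look clearly non-random to the statistic.
cover = 2 * rng.integers(0, 128, N) + (rng.random(N) < 0.1)

# Embed random bits along the path generated from the true seed.
stego = cover.copy()
path = np.random.default_rng(true_seed).permutation(N)[:m]
stego[path] = (stego[path] & ~1) | rng.integers(0, 2, m)

# Brute-force search: sample n pixels along each candidate path; the
# seed whose sample looks fully embedded is the outlier.
stats = []
for seed in range(20):
    sample = stego[np.random.default_rng(seed).permutation(N)[:n]]
    stats.append(pov_chi2(sample))
recovered = int(np.argmin(stats))
```

Only the correct seed reproduces the embedding path, so only its sample consists entirely of embedded pixels; every wrong seed mixes in biased cover pixels and produces a much larger statistic.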

Their extended technique adds a pre-processing step, in which a non-linear filtering operation is used to increase the signal-to-noise ratio (SNR) between the stego signal and the cover image. Thus, instead of pixel values, residuals are used. Residuals are computed as the difference between the pixel values of the image and its filtered version. Although both techniques are powerful and can be applied practically to a wider class of steganography, the following issues may reduce their effectiveness:

i. The embedded message occupies 100 per cent of the image capacity.
ii. Matrix encoding is employed as part of the embedding algorithm.
iii. The speed of the PRNG is reduced (hence the brute-force search time increases exponentially and makes the technique infeasible).

The authors also noted that their technique could become complicated and difficult when the stego image is noisy or the stegokey space is huge. However, if multiple stego images embedded with the same stegokey are available, the probability of retrieving the correct stegokey increases.

A similar analysis is found in [105, 106]. The focus of that analysis is on retrieving the stegokey of a sequential embedding algorithm. The stegokey is defined differently and identified as the start-end locations of a consecutive embedding path. To identify the start-end locations, the analysis employs a sequential analysis technique called the cumulative sum (CUSUM). The idea is to detect a sudden jump in the statistic, which indicates the existence of a message. The authors extended the steganalysis by utilising a locally most powerful (LMP) sequential statistical test. The LMP test is an optimum statistical test for the detection of weak signals. The extended steganalysis is mainly used to handle the difficulty of analysing messages with a low signal-to-noise ratio (SNR)¹ in a stego image.
The CUSUM was later combined with the LMP test to form an enhanced steganalysis technique whose effectiveness was tested with spread-spectrum steganography. In addition, it can detect multiple messages embedded sequentially in different image segments. The developed steganalysis technique seems to be a useful tool for evaluating the security level of watermarking applications. However, this technique may not be suitable for analysing steganography, especially when a random embedding algorithm is used. Note that the analysis presented in [105, 106] is more related to the identification of stego-bearing pixel locations (discussed in Subsection 3.2.4) than to stegokey retrieval.

¹ Low SNR is often required to maintain the imperceptibility of embedding.

3.2.6 Extracting the Hidden Message

In general, messages are encrypted using a cryptographically strong encryption algorithm before embedding. This provides a second layer of security. Therefore, we might not be able to obtain a meaningful message even after extracting a hidden message using the steganalysis techniques discussed. Clearly, it is most desirable for the extracted message to be deciphered. It is reasonable to separate the analysis of steganography into two phases: steganalysis and cryptanalysis. Steganalysis involves the analysis discussed from Subsection 3.2.1 to 3.2.5, whereas cryptanalysis deciphers the hidden message extracted in the steganalysis phase. Note that if a message is not encrypted before embedding, cryptanalysis is not required.


Chapter 4 Blind Steganalysis


In general, there are two types of steganalysis: targeted and blind. Targeted steganalysis is designed to attack one particular embedding algorithm. For example, Böhme and Westfeld [7] broke model-based steganography [96] using analysis of the Cauchy probability distribution. In another example, He and Huang [49] successfully estimated the hidden message length for stochastic modulation steganography [36], where a signal with a specific probability distribution is modulated and added to carry message bits. Jiang et al. [57] launched an attack on boundary-based steganography, which embeds secret messages in binary images. Their attack hinges on the observation that embedding disturbs pixel positions, which degrades the fit of an autoregressive model on binary object boundaries. The work in [41, 42] showed how to estimate the stegokeys used for embedding hidden messages. Targeted steganalysis can produce more accurate results, but can fail if the embedding algorithm differs from the targeted one.

Blind steganalysis can be considered a universal technique that detects different types of steganography. Because blind steganalysis can detect a wider class of steganographic techniques, it is generally less accurate than targeted steganalysis. However, blind steganalysis can detect a new steganographic technique when no targeted steganalysis is available. Thus, blind steganalysis is an irreplaceable detection tool when the embedding algorithm is unknown or secret. Successful blind steganalysis techniques include the feature-based steganalysis proposed in [35], where a set of effective statistical characteristics (features) is extracted to differentiate cover images from stego images. A similar technique relying on pixel differences was used in [99, 70] to detect hidden messages; this feature was proven to work well. Meng et al. employ a run length histogram in their steganalysis to detect hidden messages in binary images [77].

In this work, we further study blind steganalysis and its effectiveness in detecting a secret message embedded in a binary image. To confirm that our attack works, we experiment with steganographic techniques for which we have reduced the length of the embedded message. In other words, we use images with a reduced steganographic payload¹ in our experiments. Our experiments show that our steganalysis works well over a wide range of steganographic payloads. The organisation of this chapter is as follows. In the next section, we give a brief comparison of the steganographic methods. The technique of analysis we apply is given in Section 4.2. Section 4.3 presents the experimental results of the analysis and Section 4.4 concludes the chapter.

4.1 Comparison of the Steganography Methods under Analysis

The steganographic techniques in [107, 10] are actually extended from [82]. Without loss of generality, we will describe the technique given in [82] as an example. A detailed description of these three techniques can be found in Section 3.1.

Steganography involves two basic operations: embedding and extraction. The embedding operation in [82] starts by partitioning a given image into non-overlapping blocks of size m × n. The payload for each non-overlapping block is r bits. The message bits are segmented into streams of r bits and embedded by modifying some pixels in the blocks. The modification of the pixels is governed by certain criteria computed through bitwise exclusive-OR and pair-wise multiplication operations between the non-overlapping block, a random binary matrix and a secret weight matrix. Both matrices are of size m × n and serve as the stegokey. During extraction, parameters such as m, n and r must be communicated correctly between the sender and receiver to construct the correct size of non-overlapping blocks and the number of r bits per stream. In addition, the correct stegokey (both matrices) is needed to extract the secret message. After that, the receiver can derive the message bits by using the inverse of the criteria used in the embedding operation.

The steganography in [107] is an improved version of the steganography developed initially in [82]. The improvement is mainly control of the visual quality of the produced stego image, where only boundary pixels are flipped. The third steganography [10] is also improved from [82]. Whereas the method in [82] requires at most two bit alterations per block, the method in [10] requires only one. The extraction operations for the Tseng and Pan [107] and Chang et al. [10] steganographic methods are similar to that of Pan et al. [82]. All of these steganographic techniques were developed to embed secret messages in binary images.

Table 4.1: Comparison of the steganographic techniques

Steganography      | Secret Matrix        | Payload, r        | Image Quality | Altered Bits
Pan et al. [82]    | weight matrix        | log2(mn + 1)      | -             | 2
Tseng & Pan [107]  | weight matrix        | log2(mn + 1) - 1  | enhanced      | 2
Chang et al. [10]  | serial number matrix | log2(mn + 1)      | -             | 1

Table 4.1 compares the main steganographic characteristics of the techniques. It is interesting that these techniques, for example the method in [82], can embed as many as log2(mn + 1) message bits while altering at most two pixels, whereas in conventional techniques a one-pixel alteration accommodates at most one message bit. Further, adjusting m and n changes the payload and affects the security level. This gives flexibility in balancing payload against security.

¹ Reducing the payload minimises the alteration of image pixels and causes less distortion, hence increasing the steganographic security.
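The weight-matrix invariant behind these block-based schemes can be illustrated with a small sketch. Extraction computes the weighted sum of the XOR-masked block modulo 2^r; embedding here simply brute-forces a set of at most two flips rather than using the constructive flipping procedure of [82], and the matrices in the test case are made up.

```python
import itertools
import numpy as np

def extract(block, K, W, r):
    # Receiver side: the hidden r bits are the weighted sum of the
    # XOR-masked block, modulo 2^r.
    return int(np.sum((block ^ K) * W) % (2 ** r))

def embed(block, K, W, r, bits):
    # Brute-force sketch: try every set of at most two pixel flips
    # until the invariant equals the message value. The scheme in [82]
    # derives the flips analytically; this search is only illustrative.
    if extract(block, K, W, r) == bits:
        return block.copy()
    m, n = block.shape
    cells = list(itertools.product(range(m), range(n)))
    for k in (1, 2):
        for combo in itertools.combinations(cells, k):
            cand = block.copy()
            for (i, j) in combo:
                cand[i, j] ^= 1
            if extract(cand, K, W, r) == bits:
                return cand
    raise ValueError("no solution with at most two flips")
```

A sufficient condition used in such weight-matrix schemes is that the weight matrix contains every value 1, ..., 2^r - 1 at least once; then some combination of at most two flips always reaches the desired value.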

4.2 Proposed Steganalysis Method

Blind steganalysis can be viewed as a supervised machine learning problem that classifies an image as either an original cover image or an image carrying an inserted secret message. Our analysis includes feature extraction and data classification. The first stage is crucial, and we show how to construct the features. The second stage uses the SVM [23] as the classifier. The SVM is based on the idea of hyperplane separation between two classes. It obtains an optimal hyperplane that separates the feature sets of the different classes onto different sides of the hyperplane. Based on the separation, the class an image belongs to can be determined.
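The hyperplane separation idea can be sketched with a minimal linear SVM trained by subgradient descent. This Pegasos-style trainer is a stand-in for the SVM implementation of [23]; the training scheme and all parameters are our assumptions, chosen only to illustrate the decision rule.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    # Pegasos-style subgradient descent on the hinge loss; labels y
    # are -1 (cover) or +1 (stego).
    rng = np.random.default_rng(seed)
    w, b, t = np.zeros(X.shape[1]), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w + b) < 1:
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:
                w = (1 - eta * lam) * w
    return w, b

def classify(X, w, b):
    # The side of the hyperplane w.x + b = 0 decides the class.
    return np.where(X @ w + b >= 0, 1, -1)
```

Trained on two well-separated clusters of feature vectors, the learned hyperplane classifies nearly all points correctly, which is exactly the separation property the feature construction below is designed to support.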


4.2.1 Grey Level Run Length Matrix

The feature we want to extract from images is based on the grey level run length (GLRL). The length is measured by the number of consecutive pixels of a given grey level g in a given direction θ. Note that 0 ≤ g ≤ G - 1, where G is the total number of grey levels, and θ, with 0° ≤ θ ≤ 180°, indicates the direction. A sequence of pixels (at a grey level) is characterised by its length (run length) and its frequency count (run length value), which tells us how many times the run occurs in the image. Thus, our feature is a GLRL matrix that fully characterises the different grey runs in two dimensions: the grey level g and the run length l. The GLRL matrix is defined as follows:

r(g, l | θ) = #{(x, y) | p(x, y) = p(x + s, y + t) = g; p(x + u, y + v) ≠ g;
               0 ≤ s < u & 0 ≤ t < v; u = l cos(θ) & v = l sin(θ);
               0 ≤ g ≤ G - 1 & 1 ≤ l ≤ L & 0° ≤ θ ≤ 180°},   (4.1)

where # denotes the number of elements and p(x, y) is the pixel intensity (grey level) at position (x, y). G is the total number of grey levels and L is the maximum run length. The GLRL matrix extracted from an image can be considered a set of higher-order statistical characteristics. The GLRL matrix alone, however, is not sufficient for an analysis of black and white images, where the number of grey levels is drastically reduced (greyscale and colour images have at least 256 grey levels, whereas binary images have only two). To fix this technical difficulty, we propose a solution that allows us to create more grey levels and, consequently, more meaningful statistics. Our approach is to use pixel differences.
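A direct implementation of the run counting behind Equation (4.1), for the horizontal (θ = 0°) and vertical (θ = 90°) directions used later in this chapter, can be sketched as follows; the indexing conventions are our own.

```python
import numpy as np

def glrl_matrix(img, direction='horizontal', max_len=None):
    # r[g, l-1] counts the runs of grey level g having length exactly l
    # along theta = 0 ('horizontal') or theta = 90 ('vertical').
    lines = img if direction == 'horizontal' else img.T
    G = int(img.max()) + 1
    L = max_len or lines.shape[1]
    r = np.zeros((G, L), dtype=int)
    for row in lines:
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:
                run_len += 1
            else:
                r[run_val, min(run_len, L) - 1] += 1
                run_val, run_len = v, 1
        r[run_val, min(run_len, L) - 1] += 1
    return r
```

For inputs such as the pixel difference p²(x, y) introduced below, whose values can be negative, the grey levels would first be shifted to the range 0 .. G - 1 before counting.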

4.2.2 Pixel Differences

The pixel difference is the difference between a pixel and its neighbouring pixels. Given pixel p(x, y) of an image, with x ∈ [1, X] and y ∈ [1, Y], where X and Y are the image width and height, respectively, the difference for the pixel p(x, y) in the vertical direction is defined as follows:

p_v(x', y') = p(x, y + 1) - p(x, y),   (4.2)

where x' ∈ [1, X - 1] and y' ∈ [1, Y - 1]. The pixel differences in the horizontal, main diagonal and minor diagonal directions are defined similarly. It is easy to observe, and has been confirmed by experiments, that introducing the pixel difference increases (almost doubles) the number of grey levels. To illustrate the point, consider a greyscale image with 256 grey levels. After introducing the pixel difference, the range of grey levels becomes [-255, +255]. The same doubling effect happens for binary images. This effect is desirable for addressing the technical difficulty mentioned in Subsection 4.2.1. The authors in [70] named this kind of pixel difference a high-order differentiation and derived some additional sets. Their features are defined as follows:

p_c^{n+1}(x, y) = p_c^n(x, y + 1) - p_c^n(x, y),
p_r^{n+1}(x, y) = p_r^n(x + 1, y) - p_r^n(x, y),   (4.3)

p^1(x, y) = |p_r^1(x, y)| + |p_c^1(x, y)|,   (4.4)

p^2(x, y) = p_r^1(x, y) - p_r^1(x - 1, y) + p_c^1(x, y) - p_c^1(x, y - 1),   (4.5)

where n = 0, 1, 2 and |·| represents the absolute value. p_c^1(x, y) and p_r^1(x, y) can be considered the pixel differences in the vertical and horizontal directions, respectively. p^1(x, y) and p^2(x, y) are the respective higher-order total differentiations. p^0(x, y) is a special case, which is the given image itself.
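Expanding Equation (4.5) term by term shows that p²(x, y) reduces to the 5-point discrete Laplacian f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) - 4 f(x, y), which for a binary image is confined to [-4, 4]. A minimal sketch, evaluated on interior pixels only:

```python
import numpy as np

def p2(img):
    # Equation (4.5) expanded: the second difference along rows plus
    # the second difference along columns, i.e. the 5-point Laplacian.
    f = img.astype(int)
    return (f[2:, 1:-1] + f[:-2, 1:-1] + f[1:-1, 2:] + f[1:-1, :-2]
            - 4 * f[1:-1, 1:-1])
```

For example, a lone black pixel surrounded by white neighbours attains the extreme value ±4, which is exactly the grey-level doubling the first stage of the feature construction relies on.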

4.2.3 GLRL Matrix from the Pixel Difference

The statistical features we use in our analysis are developed in the following two stages:

1. In the first stage, we use the pixel difference to increase the number of grey levels. We incorporate the pixel difference shown in Equation (4.5). Note that p²(x, y) in Equation (4.5) is obtained by summing the pixel differences computed in the horizontal and vertical directions. The doubling effect increases the range of grey levels from [0, 1] for p(x, y) to [-4, 4] for p²(x, y). This is not hard to verify. The minimum and maximum grey levels for p_r^1 are -1 and +1, respectively, and the same applies to p_c^1. Hence, we can obtain the minimum and maximum grey levels for p²(x, y): the minimum occurs when the neighbouring differences for both p_r^1 and p_c^1 (the right-hand components in Equation (4.5)) are each -2, which produces p²(x, y) = -4, whereas the maximum occurs when the neighbouring differences for both p_r^1 and p_c^1 are each +2, which produces p²(x, y) = +4.

2. In the second stage, we compute the GLRL matrix to obtain the required feature set. This is achieved by extracting the GLRL matrix discussed in Subsection 4.2.1 on top of the pixel difference obtained in the first stage. We do not include p(x, y) (the pixels of the given binary image) or the other pixel differences of Subsection 4.2.2 (except p²(x, y)), because we observed that these features do not improve the results significantly. We also observed that using only the two directions θ = 0° and 90° is sufficient. Thus, by substituting p²(x, y) for p(x, y) in Equation (4.1), we obtain our first set of features.

4.2.4 GLGL Matrix

Since the GLRL matrix features tend to measure the plateaus of the image, we need additional sensitive features to reflect the image peaks². The grey level gap length (GLGL) matrix proposed in [114] seems to be an appropriate choice. The authors in [114] used the GLGL matrix in texture analysis and defined it as follows:

a(g, l | θ) = #{(x, y) | p(x, y) = p(x + u, y + v) = g; p(x + s, y + t) ≠ g;
               0 < s < u & 0 < t < v; u = (l + 1) cos(θ) & v = (l + 1) sin(θ);
               0 ≤ g ≤ G - 1 & 0 ≤ l ≤ L & 0° ≤ θ ≤ 180°},   (4.6)

where # denotes the number of elements, L is the maximum gap length and the rest of the notation is the same as in Equation (4.1). We compose two features from the GLGL matrix for our second feature set. The first feature is composed directly from the binary image, as shown in Equation (4.6). The second feature is from the pixel difference, similar to that from the GLRL matrix in Subsection 4.2.3. We replace p(x, y) in Equation (4.6) with p²(x, y) and set θ = 0° for both features.

² Small pixel-wide notches and protrusions near the boundary pixels caused by embedding.
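The gap counting of Equation (4.6) for θ = 0° can be sketched as follows; the brute-force scan and the convention that l = 0 means adjacent equal pixels follow our reading of the definition.

```python
import numpy as np

def glgl_matrix(img, max_gap):
    # a[g, l] counts, along the horizontal direction (theta = 0), pairs
    # of pixels of grey level g separated by a gap of exactly l pixels,
    # all of a different level (l = 0 means adjacent equal pixels).
    G = int(img.max()) + 1
    a = np.zeros((G, max_gap + 1), dtype=int)
    for row in img:
        for x in range(len(row)):
            g = row[x]
            for l in range(max_gap + 1):
                end = x + l + 1
                if end >= len(row):
                    break
                if row[end] == g and np.all(row[x + 1:end] != g):
                    a[g, l] += 1
    return a
```

For the row [1, 0, 0, 1], for instance, the two 1-pixels form one gap of length 2 and the two adjacent 0-pixels form one gap of length 0, capturing exactly the pixel-wide notches and protrusions the GLGL features are meant to sense.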

4.2.5 Final Feature Sets

It is too computationally expensive to use all elements of the GLRL and GLGL matrices as feature elements. Therefore, we simplify by transforming the two-dimensional GLRL and GLGL matrices into one-dimensional histograms:
h_g^GLRL = Σ_{l=1}^{L} r(g, l | θ),   0 ≤ g ≤ G - 1,   (4.7)

where θ = 0° and 90°, and the rest of the notation is the same as in Equation (4.1). In addition, we observe that, within a GLRL matrix, there is a high concentration of frequencies at the short runs, which may be important. Hence, we propose to extract the first four short runs as a separate histogram, h_g^sr:

h_g^sr = r(g, l | θ),   0 ≤ g ≤ G - 1,   (4.8)

where θ = 0° and 90°, and l = 1, 2, 3, 4 are the selected short runs. The one-dimensional histogram of the GLGL matrix, h_g^GLGL, can be obtained in the same way as in Equation (4.7): r(g, l | θ) is replaced by a(g, l | θ) for both p(x, y) and p²(x, y), θ in this case is 0°, and 0 ≤ l ≤ L. We also incorporate some of the high-order differentiation features developed in [70]. These one-dimensional histograms can be derived from Equations (4.3) to (4.5) as follows:
h_q^{p^n} = Σ_{x=1}^{X} Σ_{y=1}^{Y} δ(q, p^n(x, y)),   min_p ≤ q ≤ max_p,   (4.9)

h_q^{p_c^m + p_r^m} = Σ_{x=1}^{X} Σ_{y=1}^{Y-1} δ(q, p_c^m(x, y)) + Σ_{x=1}^{X-1} Σ_{y=1}^{Y} δ(q, p_r^m(x, y)),   min_p ≤ q ≤ max_p,   (4.10)

where δ(α, β) = 1 if α = β and 0 otherwise, n = 1, 2 and m = 1, 2, 3. min_p and max_p denote the minimum and maximum values of the grey level, respectively.

Other notations are the same as in Subsection 4.2.2. As noted previously, blind steganalysis can be considered a two-class classification, so the extracted feature sets must be sensitive to embedding alterations. That is, the feature values of the cover image should differ from those of the stego image; the larger the difference, the better the features. Hence, we apply the characteristic function, CF, to each of the above histograms to achieve better discrimination. The characteristic function can be computed using a discrete Fourier transform, as shown in Equation (4.11):

CF_k = Σ_{n=0}^{N-1} h_n e^{-2πikn/N},   0 ≤ k ≤ N - 1,   (4.11)
where N is the vector length, i is the imaginary unit and e^{-2πi/N} is the Nth root of unity. For each characteristic function (one for each histogram), we compute the mean, variance, kurtosis and skewness. This includes the characteristic functions calculated from Equations (4.9) and (4.10), whereas the original work in [70] uses only the first-order moment. We form a 68-dimensional feature space, as summarised in Table 4.2.

Table 4.2: Respective feature sets and the total number of dimensions for each set

Histogram Type       | Number of Directions | Number of Matrices | Statistics³ | Total Dimension
h_g^GLRL             | 2                    | 1                  | 4           | 8
h_g^sr               | 2                    | 4                  | 4           | 32
h_g^GLGL             | 1                    | 2                  | 4           | 8
h_q^{p^n}            | -                    | 2                  | 4           | 8
h_q^{p_c^m + p_r^m}  | -                    | 3                  | 4           | 12
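The per-histogram processing of Equations (4.7) and (4.11), collapsing a matrix to a histogram, taking the DFT and computing the four statistics, can be sketched as follows. Taking the magnitude of the complex characteristic function before computing the moments is our assumption; the thesis does not spell this step out, and the toy matrix values are made up.

```python
import numpy as np

def glrl_histogram(r):
    # Equation (4.7): sum the GLRL matrix over all run lengths.
    return r.sum(axis=1)

def cf_moments(h):
    # Equation (4.11) via the DFT, then the four statistics of
    # Table 4.2 (mean, variance, skewness, kurtosis).
    cf = np.abs(np.fft.fft(np.asarray(h, dtype=float)))
    mean, var = cf.mean(), cf.var()
    z = (cf - mean) / np.sqrt(var)
    return mean, var, np.mean(z ** 3), np.mean(z ** 4)

# Toy GLRL matrix: 2 grey levels, run lengths 1..5 (made-up counts).
r = np.array([[5.0, 2, 1, 0, 0],
              [3.0, 3, 0, 1, 0]])
features = cf_moments(glrl_histogram(r))
```

Repeating this for every histogram in Table 4.2 and concatenating the four statistics each time yields the 68-dimensional feature vector.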

Empirical evidence shows that the difference in feature values between a cover image and the re-embedded cover image⁴ is significantly larger than the difference between those of a stego image and the re-embedded stego image. This is helpful in creating discriminating features, and we apply it to our 68-dimensional feature space to form the final feature set.
³ Consists of the mean, variance, kurtosis and skewness.
⁴ The re-embedded image is the same image re-embedded with a full-length random message using the same steganography.


Table 4.3: Experimental parameters

Steganography      | Block Size (pixels) | Payload, r | Message Length (%) | Total Number of Stego Images
Pan et al. [82]    | 32 x 32             | 3          | 10, 30, 50 and 80  | 659 x 4 = 2636
Tseng & Pan [107]  | 32 x 32             | 3          | 10, 30, 50 and 80  | 659 x 4 = 2636
Chang et al. [10]  | 32 x 32             | 3          | 10, 30, 50 and 80  | 659 x 4 = 2636

4.3 Experimental Results

4.3.1 Experimental Setup

In our experiments, we construct a set of 659 binary images as cover images. The images are all textual documents with a white background and black foreground. The image resolution is 200 dpi throughout, with an image size of 800 x 800. The experimental parameters are summarised in Table 4.3. As shown in Table 4.3, we used a larger non-overlapping block size (32 x 32) and a shorter secret message (a smaller payload of r = 3 bits per block). This setup makes our attack more difficult, as the steganographic embedding is more secure. The secret message length is measured as the ratio between the number of embedded message bits and the maximum number of message bits that can be embedded in the binary image. We employ uniformly distributed random message bits for the experiments. We extract the feature sets proposed in Subsection 4.2.5 for each image and use the SVM implemented in [9] to classify the image as cover or stego. For all experiments, we dedicate 80 per cent of the images to training the classifier; the remaining 20 per cent are used for testing. The prototype implementation is coded in Matlab R2008a.

4.3.2 Results Comparison

We use a receiver operating characteristic (ROC) curve to illustrate our detection results. The ROC curve is a plot of detection probability versus false alarm probability; each point on the ROC curve represents the achieved performance of the steganalysis. We also use the area under the ROC curve (AUR) to provide a clearer comparison. The AUR values range from 0.5 to 1.0, where 0.5 is the worst detection performance and 1.0 is optimum: an AUR of 0.5 indicates that the detection is merely random guessing, whereas an AUR of 1.0 means the detection is very reliable (detection probability = 1.0). Therefore, the closer the AUR is to 1.0, the better. The respective ROC curves with the AUR values (in brackets) are shown in Figure 4.1. The area under the dotted diagonal line in each ROC curve is 0.5 (AUR = 0.5), which corresponds to random guessing. The figure clearly shows that the detection results are very promising, and the steganography developed by Tseng and Pan [107] appears to be the most difficult to detect. This is consistent with the claim by Tseng and Pan that their method is an improved version. The detection results for the Pan et al. [82] and Chang et al. [10] methods are nearly perfect.
[Figure 4.1: Detection results using ROC curves and AUR: (a) detection result for Pan et al. [82] (AUR: 10% 0.9203; 30% 1.0000; 50% 1.0000; 80% 1.0000); (b) for Tseng and Pan [107] (AUR: 1% 0.5205; 10% 0.7843; 30% 0.9930; 50% 0.9997; 80% 1.0000); and (c) for Chang et al. [10] (AUR: 10% 0.9182; 30% 1.0000; 50% 1.0000; 80% 1.0000)]

It is also worth mentioning that the longer the embedded secret message, the more image distortion it produces. Hence, it is relatively easier to detect a stego image with a long embedded message than one with a shorter message. This is shown in Figure 4.1, where the detection accuracy increases as the test moves from a shorter message (10 per cent) to a longer message (80 per cent). For message lengths greater than 30 per cent, the detection is very accurate. However, if the message is very short (one per cent), it is very difficult to detect (refer to the yellow line in Figure 4.1(b)). The main reason is that the image alteration caused by the embedding is minimal (e.g., modifying 15 pixels in an 800 x 800 pixel image) and is therefore barely captured by our features.


4.4 Conclusion

Our 48 newly proposed feature dimensions, used in combination with modified versions of 20 existing feature dimensions, achieve reliable and effective detection of secret messages embedded in binary images. The experimental results show that the proposed method can detect embedded messages across a range of lengths, including low embedding rates. In addition, our proposed method can detect more than one steganographic method, which makes it a suitable blind steganalysis for binary images.


Chapter 5 Multi-Class Steganalysis


In general, blind steganalysis is considered two-class classification. This means that, given an image, the steganalysis should be able to decide the class (cover or stego) of the image. It is possible to extend blind steganalysis to form a multi-class steganalysis. From a practical point of view, multi-class steganalysis is similar to blind steganalysis; however, it can accommodate more classes. The additional classes come from different types of stego images, produced by different embedding techniques. The task of multi-class steganalysis is to identify the embedding algorithm applied to produce a given stego image or, if no embedding has been performed on the image, to classify it as a cover image.

In [87], Pevný and Fridrich extended the blind steganalysis developed in [35] to form a multi-class steganalysis. Their multi-class steganalysis can classify embedding algorithms based on the given JPEG stego images. Rodriguez and Peterson [95] studied a different multi-class steganalysis for JPEG images. In [95], the extracted features are based on wavelet decomposition and an SVM is employed as the classifier. The most recent work is the technique developed by Dong et al. [31]. The main contribution of this multi-class steganalysis is its ability to carry out classification in two different image domains: the frequency domain (e.g., JPEG images) and the spatial domain (e.g., BMP images). Other multi-class steganalysis approaches can be found in [88, 97, 108], and all were developed to counter JPEG image steganography.

Note that these multi-class steganalysis techniques are for images with at least eight bits per pixel intensity. This means that the images can be greyscale, colour or true colour images. It is not clear how the existing multi-class steganalysis can be generalised for black and white binary images. Unlike greyscale and colour


images, binary images have a rather modest statistical nature. This makes it difficult to apply the existing multi-class steganalysis techniques to binary images. To the best of our knowledge, there is no multi-class steganalysis proposed for binary images in the literature.

In this chapter, we propose a multi-class steganalysis for binary images. The work in this chapter is based on an extension of our previously developed blind steganalysis for binary images (Chapter 4). There are three main contributions of this chapter. First, we incorporate additional new features into our existing feature sets. Second, the concept of cover image estimation is incorporated to enhance the feature sensitivity. Third, a new multi-class steganalysis technique is developed. Consequently, we are able to assign a given image to its appropriate class. This will provide valuable information for steganalysts (e.g., forensic examiners) towards the goal of extracting hidden messages.

The remainder of this chapter is as follows. In the next section, we summarise the steganographic methods under analysis. The method of analysis we apply is given in Section 5.2. The construction of the multi-class classifier is discussed in Section 5.3. Section 5.4 presents the experimental results of the analysis and Section 5.5 concludes the chapter.

5.1 Summary of the Steganographic Methods under Analysis

This chapter analyses five different types of steganography. These steganographic methods have been described in Section 3.1. In this section, we briefly summarise the methods and focus on their embedding algorithms. All the steganographic methods were developed to embed secret messages in binary images.

The first three methods under analysis are from the work developed in [82, 107, 10]. These methods are all variants of block-based steganography. To perform embedding, a given binary image is partitioned into non-overlapping blocks. The message bits are divided into a stream of r bits before being embedded in the block. Two sets of matrices, the random binary matrix and the secret weight matrix (the method in [10] uses the serial number matrix instead), are used to determine which pixels should be flipped when necessary. The two matrices are shared between the sender and receiver as the stegokey.


The steganography developed in [69] is considered boundary-based steganography. This type of steganography hides a message along the edges where white pixels meet black ones; these pixels are known as boundary pixels. To obtain higher imperceptibility, the locations of pixels used for embedding are permuted and distributed over the whole image. The permutation is controlled by a PRNG whose seed is a secret shared by the sender and the receiver. Not all boundary pixels are suitable for carrying message bits, because embedding a bit into an arbitrary boundary pixel may convert it into a non-boundary one. This would jeopardise the extraction and make recovery of the hidden message impossible. Because of these technical difficulties, some improvements were developed by adding restrictions on the selection of boundary pixels for embedding.

The last steganography under our analysis is that developed by Wu and Liu in [112]. This technique also starts by partitioning a given image into blocks. The odd-even relationship of the pixels within a block is adjusted to hold the message bit. Clearly, when this odd-even relationship already holds for the message bit to be embedded, no alteration is required. Otherwise, some pixels are selected and altered to adjust the odd-even relationship. Moreover, a flippability scoring system is constructed to ensure the pixel selection for alteration is efficient.

5.2 Proposed Steganalysis

The ultimate goal of steganalysis is to extract the full hidden message. This task, however, may be very difficult to achieve. Thus, we may start with more realistic and modest goals, such as identifying the type of steganographic technique used for the embedding. We want to improve our existing technique so that we can identify the embedding algorithm. To do this, we propose a multi-class steganalysis. Multi-class steganalysis can be viewed as a supervised machine learning problem where we want to determine the class of a given image. Our analysis includes feature extraction and data classification stages. The first stage is crucial and we show how to construct the existing and new features in this section. The second stage uses the SVM [23] to construct the multi-class classifier. We will describe the multi-class classifier in detail in Section 5.3.


Figure 5.1: Pixel difference in the vertical direction

5.2.1 Increasing the Grey Level via the Pixel Difference

The number of grey levels is insufficient for an analysis of black and white images, where the number of grey levels is drastically reduced (note that in greyscale and colour images, there are at least 256 grey levels). To resolve this technical difficulty, we propose a solution that allows us to create more grey levels and consequently more meaningful statistics. Our approach is to use the pixel difference. The pixel difference is the difference between a pixel and its neighbouring pixels. Given a pixel p(x, y) of an image, with x ∈ [1, X] and y ∈ [1, Y], where X and Y are the image width and height, respectively, the difference for the pixel p(x, y) in the vertical direction is defined as follows:

p_v(x, y) = p(x, y+1) − p(x, y),    (5.1)

where x ∈ [1, X−1] and y ∈ [1, Y−1]. The pixel differences for the horizontal, main diagonal and minor diagonal directions are defined similarly. Figure 5.1 illustrates the pixel difference in the vertical direction. It is easy to observe, and has been confirmed by experiments, that the introduction of the pixel difference increases (almost doubles) the number of grey levels. To illustrate this point, consider a greyscale image with 256 grey levels. After introducing the pixel difference, the range of grey levels becomes [−255, +255]. The same doubling effect happens for binary images. This effect is desirable for resolving the technical difficulty mentioned before. For this purpose, we incorporated the pixel difference developed in [70]. Their
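As an illustration, the vertical difference of Equation (5.1) can be computed in a few lines of NumPy; this is only a sketch in our own notation (the thesis prototype is written in Matlab):

```python
import numpy as np

def vertical_difference(p):
    # p_v(x, y) = p(x, y+1) - p(x, y): subtract each pixel from its
    # neighbour one step down; the result has one row fewer.
    p = np.asarray(p, dtype=int)
    return p[1:, :] - p[:-1, :]

# For a binary image the grey levels {0, 1} expand to {-1, 0, +1}.
img = np.array([[0, 1, 1],
                [1, 1, 0],
                [0, 0, 0]])
d = vertical_difference(img)
```

The same slicing idea, applied along the other axis or along the diagonals, gives the horizontal and diagonal differences.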


features are defined as follows:

p_c^{n+1}(x, y) = p^n(x, y+1) − p^n(x, y),    (5.2)
p_r^{n+1}(x, y) = p^n(x+1, y) − p^n(x, y),    (5.3)
p^1(x, y) = |p_r^1(x, y)| + |p_c^1(x, y)|,
p^2(x, y) = p_c^1(x, y) − p_c^1(x−1, y) + p_r^1(x, y) − p_r^1(x, y−1),    (5.4)

where n = 0, 1, 2 and | · | represents the absolute value. p_c^1(x, y) and p_r^1(x, y) can be considered as pixel differences in the vertical and horizontal directions, respectively. p^1(x, y) and p^2(x, y) are the respective higher-order total differences. p^0(x, y) is a special case and is actually the given binary image. We further define the pixel difference an order higher as follows:

p_c^3(x, y) = p^2(x, y+1) − p^2(x, y),
p_r^3(x, y) = p^2(x+1, y) − p^2(x, y).    (5.5)

We call these third-order pixel differences. We would like to stress that all the statistical features we use in our analysis are based on this third-order pixel difference and can be summarised in the following two stages:

1. In the first stage, we use the third-order pixel difference to increase the number of grey levels. Note that p^2(x, y) in Equation (5.4) is obtained by summing the pixel differences computed in the horizontal and vertical directions. The doubling effect of the pixel difference increases the grey levels from [0, 1] for p(x, y) to [−4, 4] for p^2(x, y). This is not hard to verify. For example, the minimum and maximum grey levels for p_r^1 are −1 and +1, respectively; the same applies to p_c^1. Hence, we can obtain the minimum and maximum grey level for p^2(x, y). The minimum is obtained when the neighbouring differences for both p_c^1 and p_r^1 (the right-hand components in Equation (5.4)) are −2, which produces p^2(x, y) = −4. The maximum is obtained when the neighbouring differences for both p_c^1 and p_r^1 are +2, which produces p^2(x, y) = 4. Finally, using the same concept for the third-order pixel difference, we can increase the number of grey levels to 17 (i.e., [−8, 8]).

2. In the second stage, we proceed with the computed third-order pixel difference to extract each of the specific feature sets. In other words, a certain feature set (the feature sets will be discussed in Subsections 5.2.2 and 5.2.3) is extracted on top of this third-order pixel difference.
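The first stage can be checked numerically. The following NumPy sketch (our notation, not the thesis implementation) computes the difference images up to the third order and confirms the stated grey-level ranges:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.integers(0, 2, size=(64, 64))   # binary image: grey levels {0, 1}

# First-order differences (Eqs. 5.2/5.3 with n = 0); axis 0 is y, axis 1 is x.
pc1 = p[1:, :] - p[:-1, :]              # vertical:   p(x, y+1) - p(x, y)
pr1 = p[:, 1:] - p[:, :-1]              # horizontal: p(x+1, y) - p(x, y)

# Second-order total difference (Eq. 5.4):
#   p2(x, y) = pc1(x, y) - pc1(x-1, y) + pr1(x, y) - pr1(x, y-1)
p2 = (pc1[:, 1:] - pc1[:, :-1]) + (pr1[1:, :] - pr1[:-1, :])

# Third-order differences (Eq. 5.5), computed from p2.
pc3 = p2[1:, :] - p2[:-1, :]
pr3 = p2[:, 1:] - p2[:, :-1]

# The grey-level range grows from {0, 1} for p to [-4, 4] for p2 and
# to [-8, 8] (17 levels) for the third-order differences.
```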

5.2.2 Grey Level Run Length Matrix

The first feature set we extract is based on the grey level run length (GLRL). The length is measured by the number of consecutive pixels for a given grey level g and a direction θ. Note that 0 ≤ g ≤ G−1, where G is the total number of grey levels, and θ indicates the direction, where 0° ≤ θ ≤ 180°. A sequence of pixels (at a grey level) is characterised by its length ℓ (run length) and its frequency count (run length value) that tells us how many times the run has occurred in the image. Thus, our feature is a GLRL matrix that fully characterises different grey runs in two dimensions: the grey level g and the run length ℓ. The general GLRL matrix is defined as follows:

r(g, ℓ | θ) = #{(x, y) | p(x, y) = p(x+s, y+t) = g; p(x+u, y+v) ≠ g;
               0 ≤ s < u & 0 ≤ t < v; u = ℓ cos(θ) & v = ℓ sin(θ);
               0 ≤ g ≤ G−1 & 1 ≤ ℓ ≤ L & 0° ≤ θ ≤ 180°},    (5.6)

where # denotes the number of elements and p(x, y) is the pixel intensity (grey level) at position (x, y). G is the total number of grey levels and L is the maximum run length. For our practical implementation, we simply concatenate p_c^3(x, y) with p_r^3(x, y) and substitute this for p(x, y) in Equation (5.6). In addition, we observed that only the two directions θ = 0° and θ = 90° are significant. Therefore, the GLRL matrices extracted from the third-order pixel difference can be considered higher-order statistical features.
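Collecting the counts r(g, ℓ|θ) for θ = 0° or 90° amounts to scanning rows or columns and counting maximal runs. A minimal sketch (function name ours), here on a tiny binary example rather than the third-order difference image:

```python
import numpy as np
from collections import Counter

def glrl_runs(img, axis=1):
    """Count (grey level, run length) pairs along rows (axis=1, theta=0)
    or columns (axis=0, theta=90)."""
    a = np.asarray(img)
    if axis == 0:
        a = a.T
    counts = Counter()
    for row in a:
        start = 0
        for i in range(1, len(row) + 1):
            # A run ends where the value changes or the row ends.
            if i == len(row) or row[i] != row[start]:
                counts[(row[start], i - start)] += 1
                start = i
    return counts

img = [[0, 0, 1, 1, 1],
       [1, 0, 0, 0, 1]]
runs = glrl_runs(img)   # horizontal runs, i.e. theta = 0
```

For `img` above the horizontal runs are (0, 2) and (1, 3) in the first row and (1, 1), (0, 3), (1, 1) in the second.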

5.2.3 Grey Level Co-Occurrence Matrix

We replaced the grey level gap length (GLGL) matrix proposed in our previous work with the grey level co-occurrence matrix (GLCM). From empirical studies, we found that GLCM performs better in multi-class classification than GLGL. GLCM can be considered an approach for capturing the inter-pixel relationships. More precisely, the elements in a GLCM matrix represent the relative frequencies of two pixels (with grey levels g1 and g2, respectively) separated by a distance d. GLCM can be defined as follows:

o(g1, g2, d | θ) = #{(x, y) | p(x, y) = g1; p(x+u, y+v) = g2;
                    u = d cos(θ) & v = d sin(θ);
                    0 ≤ g1, g2 ≤ G−1 & 1 ≤ d ≤ D & 0° ≤ θ ≤ 180°},    (5.7)

where # denotes the number of elements and p(x, y) is the pixel intensity (grey level) at position (x, y). G is the total number of grey levels and D is the maximum distance between two pixels. In our implementation, we substitute p(x, y) in Equation (5.7) with p_c^3(x, y), p_r^3(x, y) and |p_c^3(x, y)| + |p_r^3(x, y)|. To avoid confusion, we call the resulting matrices o_1(g1, g2, d|θ), o_2(g1, g2, d|θ) and o_3(g1, g2, d|θ), respectively. Thus, we can obtain four GLCM matrices from each of o_1(g1, g2, d|θ), o_2(g1, g2, d|θ) and o_3(g1, g2, d|θ). Each matrix comes from one direction, for a total of four directions (0°, 45°, 90° and 135°). We set the distance d to one.
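A co-occurrence count for a single offset can be sketched as follows (function name ours). In the thesis the input is a third-order difference image, whose levels would first be shifted to non-negative indices; here a tiny binary example suffices:

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Co-occurrence counts o(g1, g2) for pixel pairs separated by the
    offset (dx, dy); (dx, dy) = (1, 0) corresponds to theta = 0, d = 1."""
    a = np.asarray(img)
    m = np.zeros((levels, levels), dtype=int)
    h, w = a.shape
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                m[a[y, x], a[y2, x2]] += 1
    return m

img = np.array([[0, 0, 1],
                [1, 1, 0]])
m = glcm(img, dx=1, dy=0, levels=2)
```

For `img` above the horizontal neighbour pairs are (0,0), (0,1), (1,1) and (1,0), so every cell of the 2 × 2 matrix holds one count.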

5.2.4 Cover Image Estimation

Cover image estimation is the process of eliminating embedding artefacts¹ in a given image, with the objective of getting close to a clean image. Cover image estimation was first proposed by Fridrich and is known as image calibration [38, 39, 35]. For brevity, consider the following proposition. Let I_c and I_s represent the cover image and stego image, respectively. If |I_c − Ĩ_c| < |I_s − Ĩ_s|, then

f(I_c) − f(Ĩ_c) < f(I_s) − f(Ĩ_s),    (5.8)

where Ĩ_c and Ĩ_s are the estimated cover images from I_c and I_s, respectively. I − Ĩ is the pixel-wise difference between two same-resolution
¹ Embedding artefact refers to any alteration or mark introduced by embedding.


images. | · | represents the absolute value and f(·) indicates the feature extraction function. From this proposition, the feature sets extracted from the feature differences (e.g., f(I_s) − f(Ĩ_s)) can be considered as the differences caused by the embedding operation, as long as the relationship holds. This is desired because we want feature sets that are sensitive to the embedding artefacts and invariant to the image content.

We chose an image filtering approach to cover image estimation. There are several alternative image filters and, from empirical studies, we found that the Gaussian filter produces the best results. Three parameters must be determined to use this filter: the standard deviation of the Gaussian distribution (σ) and the distances for the horizontal and vertical directions (d_h and d_v, respectively). We determined σ = 0.6, d_h = 3 and d_v = 3 by trial and error, which gives us the optimum solution.
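A sketch of the filtering step with the parameters above (σ = 0.6 over a 3 × 3 support). Re-binarising the smoothed image by thresholding at 0.5 is our assumption, since the estimate must again be a binary image:

```python
import numpy as np

def gaussian_kernel(sigma=0.6, radius=1):
    # 3x3 normalised Gaussian weights for radius = 1.
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def estimate_cover(img, sigma=0.6):
    """Low-pass filter with a 3x3 Gaussian (sigma = 0.6, as determined
    above by trial and error), then re-binarise by thresholding."""
    k = gaussian_kernel(sigma)
    h, w = img.shape
    pad = np.pad(img.astype(float), 1, mode='edge')
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = (pad[y:y+3, x:x+3] * k).sum()
    return (out >= 0.5).astype(int)

img = np.array([[1, 1, 1, 1],
                [1, 0, 1, 1],   # isolated flipped pixel: an embedding artefact
                [1, 1, 1, 1]])
est = estimate_cover(img)       # the isolated pixel is smoothed away
```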

5.2.5 Final Feature Sets

It is very computationally expensive to use all elements in the GLRL and GLCM matrices as the feature elements. Therefore, we propose to simplify them by transforming the two-dimensional GLRL and GLCM matrices into one-dimensional histograms:

h_g^{GLRL} = Σ_{ℓ=1}^{L} r(g, ℓ | θ),    0 ≤ g ≤ G−1,    (5.9)

where θ = 0° and 90°, and the rest of the notation is the same as in Equation (5.6). We observe that, within a GLRL matrix, there are some high concentrations of frequencies near the short runs, which may be important. Hence, we propose extracting the first four short runs as histograms, h_g^{sr}:

h_g^{sr} = r(g, ℓ | θ),    0 ≤ g ≤ G−1,    (5.10)

where θ = 0° and 90° and ℓ = 1, 2, 3, 4 are the selected short runs. The one-dimensional histogram of the GLCM matrix, h_g^{GLCM}, can be obtained in a similar manner as in Equation (5.9); it is defined as follows:

h_g^{GLCM} = (1/2) [ Σ_{g1=0}^{G−1} o_α(g1, g, d | θ) + Σ_{g2=0}^{G−1} o_α(g, g2, d | θ) ],    0 ≤ g ≤ G−1,    (5.11)

where α = 1, 2, 3 and θ = 0°, 45°, 90° and 135°; d = 1 and the rest of the notation is the same as in Equation (5.7).

As noted before, multi-class steganalysis can be considered a multi-class classification, so the extracted feature sets must be sensitive to embedding alterations; the feature values should be very distinctive. The larger the differences across the different classes, the better the features. Hence, we apply the characteristic function, CF, to each of the histograms to achieve better discrimination. The characteristic function can be computed by a discrete Fourier transform, as shown in Equation (5.12):

CF_k = Σ_{n=0}^{N−1} h_n e^{−2πikn/N},    0 ≤ k ≤ N−1,    (5.12)

where N is the vector length, i is the imaginary unit and e^{−2πi/N} is an Nth root of unity. For each characteristic function (one for each histogram), we compute the mean, variance, kurtosis and skewness. The exception is the characteristic functions of the four h_g^{GLCM} histograms (i.e., one for each of the four directions) from Equation (5.11): we compute the average of these four characteristic functions and then compute the mean, variance, kurtosis and skewness of the averaged characteristic function.

We include another four statistics for each of the computed GLCM matrices, as discussed in Subsection 5.2.3. These four statistics are defined as follows:

contrast energy homogeneity correlation

=
g1 g2

|g1 g2 |2 o(g1 , g2 ), o(g1 , g2 )2 ,


g1 g2

(5.13) (5.14) (5.15) (5.16)

= =
g1 g2

o(g1 , g2 ) , 1 + |g1 g2 | (g1 g1 )(g2 g2 )o(g1 , g2) , g1 g2

=
g1 g2

where g1 and g2 are the means of o(g1 , g2 ), whereas g1 and g2 are the standard deviations of o(g1 , g2 ). We form a 100-dimensional feature space as summarised in Table 5.1.
² Averaged from the four directions.
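The characteristic-function moments and the four GLCM statistics can be sketched as below (function names ours). Taking the magnitude of the DFT coefficients before computing the moments, and normalising the co-occurrence matrix, are our assumptions:

```python
import numpy as np

def cf_moments(hist):
    """Characteristic function (Eq. 5.12) of a histogram via the DFT,
    followed by the four moments used as features."""
    cf = np.abs(np.fft.fft(hist))      # magnitude of the DFT coefficients
    mean = cf.mean()
    var = cf.var()
    centred = cf - mean
    skew = (centred**3).mean() / var**1.5
    kurt = (centred**4).mean() / var**2
    return mean, var, kurt, skew

def glcm_stats(o):
    """Contrast, energy, homogeneity and correlation (Eqs. 5.13-5.16)
    of a normalised co-occurrence matrix o(g1, g2)."""
    o = np.asarray(o, dtype=float)
    o = o / o.sum()
    g1, g2 = np.indices(o.shape)
    contrast = ((g1 - g2)**2 * o).sum()
    energy = (o**2).sum()
    homogeneity = (o / (1 + np.abs(g1 - g2))).sum()
    mu1, mu2 = (g1 * o).sum(), (g2 * o).sum()
    s1 = np.sqrt(((g1 - mu1)**2 * o).sum())
    s2 = np.sqrt(((g2 - mu2)**2 * o).sum())
    correlation = ((g1 - mu1) * (g2 - mu2) * o).sum() / (s1 * s2)
    return contrast, energy, homogeneity, correlation

m, v, k, s = cf_moments([1, 2, 3, 4])
c, e, h, r = glcm_stats([[2, 0], [0, 2]])   # perfectly diagonal matrix
```

A perfectly diagonal co-occurrence matrix gives zero contrast, homogeneity one and correlation one, as expected.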


Table 5.1: Respective feature sets and the total number of dimensions for each set

Feature set  | Number of Directions | Number of Matrices | CF      | Mean, Variance, Kurtosis and Skewness | Total Dimensions
h_g^{GLRL}   | 2                    | 1                  | applied | 4                                     | 8
h_g^{sr}     | 2                    | 4                  | applied | 4                                     | 32
h_g^{GLCM}   | 1²                   | 3                  | applied | 4                                     | 12
contrast     | 4                    | 3                  |         |                                       | 12
energy       | 4                    | 3                  |         |                                       | 12
homogeneity  | 4                    | 3                  |         |                                       | 12
correlation  | 4                    | 3                  |         |                                       | 12

5.3 Multi-Class Classification

As stated in Section 5.2, the second stage of our proposed steganalysis is multi-class classification. We have chosen the SVM as our multi-class classifier. We start this section by explaining the general terminology of two-class SVM classification and then show how to generalise two-class classification into multi-class classification using the SVM.

The SVM can be considered a classification technique that can learn from a sample. More precisely, we can train the SVM to recognise and assign labels (classes) based on the given data collection (using features). For example, we train the SVM to differentiate a cover image (class-1) from a stego image (class-2) by examining the extracted features from many instances of cover images and stego images. The SVM finds the separating line and determines the cluster in which an unknown image falls. Finding the right separating line is crucial and this is what the training accomplishes. In practice, the feature dimensionality is higher and we need a separating plane instead of a line. This is known as a separating hyperplane.

The goal of the SVM is to find a separating hyperplane that effectively separates the classes. To do that, the SVM tries to maximise the margin of the separating hyperplane during training. Obtaining this maximum-margin hyperplane optimises the ability of the SVM to predict the class of an unknown object (image). However, there are often non-separable datasets that cannot be separated by a straight separating line or flat plane. The solution to this difficulty is to use a

Table 5.2: Example of majority-voting strategy for multi-class SVM

            | class-1 | class-2 | class-3
SVM-a       | 0       | 1       | 0
SVM-b       | 1       | 0       | 0
SVM-c       | 0       | 1       | 0
Total Votes | 1       | 2       | 0

kernel function. The kernel function is a mathematical routine that projects the features from a low-dimensional space to a higher-dimensional space. Note that the choice of kernel function affects the classification accuracy. For further reading on SVMs, readers are referred to [80].

Although the nature of the SVM is two-class classification, it is not hard to generalise the SVM to handle multiple classes. Several approaches can be used, including one-against-one, one-against-all and all-together. According to the recommendations given in [53], one-against-one provides the best and most efficient classifications. Here, therefore, we will be using one-against-one and we discuss only this approach. For other approaches and the details of the comparison, readers are referred to [53].

For a multi-class SVM based on the one-against-one approach, K(K−1)/2 two-class SVMs are constructed, where K is the number of classes. Each of these SVM classifiers is assigned to the classification of a non-overlapping pair of classes (which means there are no two pairs with the same combination of classes). After completing all two-class classifications, a majority-voting strategy determines the final class of an object. With majority voting, the class receiving the most votes during the classification processes is considered the correct class. If there are two or more classes with the same number of votes, one is chosen arbitrarily.

Consider the following example. Suppose we have class-1, class-2 and class-3. We can construct three two-class SVMs: SVM-a classifying classes 1 and 2, SVM-b classifying classes 1 and 3, and SVM-c classifying classes 2 and 3. Assume that, given an image, the two-class SVM classification results are as tabulated in Table 5.2. From the table, the given image can be identified as belonging to class-2 because it received the highest number of votes (typeset in bold).
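The voting procedure can be sketched as follows (function names ours); the pairwise decisions are hard-coded to reproduce the Table 5.2 example, whereas in practice each decision would come from a trained two-class SVM:

```python
from itertools import combinations
from collections import Counter

def one_vs_one_predict(classes, pairwise_winner):
    """Majority voting over K(K-1)/2 pairwise classifiers.
    pairwise_winner(a, b) returns the class (a or b) chosen by the
    two-class SVM trained on the pair (a, b); ties break arbitrarily."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b)] += 1
    return votes.most_common(1)[0][0]

# Table 5.2: SVM-a votes class-2, SVM-b votes class-1, SVM-c votes
# class-2, so class-2 wins with two votes.
outcome = {('class-1', 'class-2'): 'class-2',
           ('class-1', 'class-3'): 'class-1',
           ('class-2', 'class-3'): 'class-2'}
pred = one_vs_one_predict(['class-1', 'class-2', 'class-3'],
                          lambda a, b: outcome[(a, b)])
```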


Table 5.3: Summary of image databases

Database | Total Images | Resolution | Image Type
Textual  | 659          | 200 dpi    | Textual
Mixture  | 659          | 200 dpi    | Textual and Graphic
Scene    | 1338         | 72 dpi     | Scene

5.4 Experimental Results

5.4.1 Experimental Setup

To cover a wider range of images, we constructed three image databases. The first database consists of 659 binary images as cover images. The images are all textual documents with a white background and black foreground. The image resolutions are all 200 dpi, with an image size of 800 × 800. The second image database also consists of 659 binary images as cover images. These images have the same properties as the first image database; however, we added some graphics (i.e., cartoons, clipart and some random shapes) randomly positioned in each textual document. For the third image database, we constructed 1338 binary images from greyscale images using the IrfanView version 4.10 freeware. These images are converted from natural images. Their resolution is 72 dpi, with an image size of 512 × 384. Overall, we constructed 2656 cover images. The image databases are summarised in Table 5.3. For brevity, we will name them the textual, mixture and scene databases, respectively.

As discussed in Section 5.1, we used five different steganographic techniques to generate different types (classes) of stego images. Due to the different embedding algorithm in each technique, the steganographic capacity also varies significantly. Hence, to obtain a fair comparison, we opt to use the absolute steganographic capacity, which can be measured in bits per pixel (bpp). Since a binary image has only one bit per pixel, we can think of bpp as the average number of message bits per image pixel. For example, 0.01 bpp embedding means that, for every 100 pixels, only one pixel is used to carry message bits. 0.01 bpp is significantly small, which also means there is less distortion in the image. This implies that the produced stego image is relatively secure and harder for steganalysis to detect.
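The absolute capacity in bits follows directly from the image size and the bpp rate, e.g. a 800 × 800 textual cover at 0.01 bpp carries 6400 message bits:

```python
def embedded_bits(width, height, bpp):
    # Number of message bits carried at a given embedding rate.
    return int(width * height * bpp)

n = embedded_bits(800, 800, 0.01)   # a textual cover at 0.01 bpp
```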


Table 5.4: Summary of stego image databases

Database | Cover Images | Number of Steganographic Methods | 0.003 bpp | 0.006 bpp | 0.01 bpp | Total Stego Images
Textual  | 659          | 5                                | 1         | 1         | 1        | 9885
Mixture  | 659          | 5                                | 1         | 1         | 1        | 9885
Scene    | 1338         | 5                                | –         | –         | 1        | 6690

To verify the effectiveness of our proposed multi-class steganalysis, we constructed stego images with capacities of 0.003, 0.006 and 0.01 bpp for each steganography approach. Every stego image in the experiment is embedded with randomly generated message bits. This means that we used relatively large stego image databases totalling 26460 images (as summarised in Table 5.4).

We extract the feature sets for each image from all the image databases mentioned above using the feature extraction methods proposed in Section 5.2. These feature sets serve as the inputs for the multi-class SVM. As described in Section 5.3, we use the one-against-one approach to construct our multi-class steganalysis. We use the SVM implemented in [9] and follow the recommendations given in [53] in using the radial basis function (RBF) as the kernel function; the optimum parameters for the SVMs were determined through a grid-search tool from [9]. We dedicated 80 per cent of each image database to training the classifiers and the remaining 20 per cent is used for testing. The prototype is implemented in Matlab R2008a.
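The grid search can be pictured as an exhaustive scan over (C, γ) pairs, keeping the pair with the best validation score. A generic sketch; the scoring function here is a hypothetical stand-in for training and cross-validating an RBF SVM, not the actual LIBSVM tool:

```python
from itertools import product

def grid_search(train_eval, Cs, gammas):
    """Exhaustive search over the RBF-SVM hyperparameters (C, gamma),
    keeping the pair with the highest validation score.
    train_eval(C, gamma) is assumed to train the SVM on the training
    split and return its cross-validation accuracy."""
    return max(product(Cs, gammas), key=lambda cg: train_eval(*cg))

# Toy stand-in score peaking at C = 8, gamma = 0.5 (illustration only).
score = lambda C, g: -abs(C - 8) - abs(g - 0.5)
best_C, best_gamma = grid_search(score, [1, 8, 64], [0.05, 0.5, 5.0])
```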

5.4.2 Results Comparison

To simplify the presentation, we abbreviate the five steganographic methods as PCT, TP, CTL, LWZ and WL for [82], [107], [10], [69] and [112], respectively. Our multi-class steganalysis classification results are displayed in a table format called a confusion matrix. In the confusion matrix, the first column consists of the classes, which include one cover image class and five different classes of stego image (i.e., each class of stego image is produced by one of the five steganographic methods discussed). The value within brackets beside each class indicates the embedded capacity. For the cover image class, there is no embedding and we can consider the embedded capacity as zero (0.0 bpp). The first row of the confusion matrix indicates the class assigned to a given image.


We separated the three databases into three confusion matrices: Tables 5.5, 5.6 and 5.7. To better illustrate the results, we typeset the desired results in bold; in other words, the correct classification results are aligned along the diagonal elements within each confusion matrix. From the confusion matrices, we clearly see that the multi-class steganalysis gives very promising results. Especially in Table 5.5, the detections are nearly perfect. The results obtained for the mixture image database (Table 5.6) are accurate, although slightly less so than the results obtained in Table 5.5. The results for the scene image database (Table 5.7) appear to be the least accurate; however, the detection reliability is good and all the detection results show at least 80 per cent accuracy.

Note that the type of cover image used affects the detection accuracy: it is relatively easier to detect images with textual content than images with natural scenes. This observation is supported by the detection accuracy order (the results in Table 5.5 are the best, followed by the results in Table 5.6 and lastly the results in Table 5.7). We attribute this phenomenon to the fact that the textual content in an image has periodic patterns that are uniform and consistent, whereas an image with scene content has fewer fixed patterns and may appear more random.

It is also worth mentioning that embedding a longer secret message produces more distortion in an image. Hence, it is relatively easier to detect a stego image with a longer embedded message (higher bpp) than with a shorter message (lower bpp). This is seen by comparing the rows with 0.01 bpp to the rows with 0.003 bpp in the confusion matrices.

5.5 Conclusion

We proposed a multi-class steganalysis for binary images. Our proposed 60-dimensional feature sets, used in combination with the existing 40-dimensional feature sets extended from our previous work, effectively and accurately classified images into the appropriate class: one cover image class and five classes of stego images produced by different steganographic techniques. We employed the concept of cover image estimation, which improved the classification. Experimental results showed that our proposed method can detect low embedded capacities. Further, the experimental results showed that a detection accuracy of at least 92 per cent can be achieved with textual or a mixture of textual and graphic images. However, the accuracy decreased slightly, to 80 per cent, for natural scene binary images.

Table 5.5: Confusion matrix of the multi-class steganalysis for the textual database
                                     Classified as
                 Cover(%)  PCT(%)   TP(%)    CTL(%)   LWZ(%)   WL(%)
Cover            100.00    0.00     0.00     0.00     0.00     0.00
PCT (0.003 bpp)  0.00      99.23    0.00     0.77     0.00     0.00
TP (0.003 bpp)   0.00      0.00     100.00   0.00     0.00     0.00
CTL (0.003 bpp)  0.00      0.00     0.00     100.00   0.00     0.00
LWZ (0.003 bpp)  0.00      0.00     0.00     0.77     99.23    0.00
WL (0.003 bpp)   0.77      0.00     1.54     0.77     0.77     96.15
PCT (0.006 bpp)  0.00      100.00   0.00     0.00     0.00     0.00
TP (0.006 bpp)   0.00      0.00     100.00   0.00     0.00     0.00
CTL (0.006 bpp)  0.00      0.00     0.00     100.00   0.00     0.00
LWZ (0.006 bpp)  0.00      0.00     0.00     0.77     99.23    0.00
WL (0.006 bpp)   0.00      0.00     0.77     0.77     0.00     98.46
PCT (0.01 bpp)   0.00      100.00   0.00     0.00     0.00     0.00
TP (0.01 bpp)    0.00      0.00     99.23    0.00     0.00     0.77
CTL (0.01 bpp)   0.00      0.00     0.00     100.00   0.00     0.00
LWZ (0.01 bpp)   0.00      0.00     0.00     0.00     100.00   0.00
WL (0.01 bpp)    0.00      0.00     0.00     0.77     0.00     99.23

Table 5.6: Confusion matrix of the multi-class steganalysis for the mixture database

                                     Classified as
                 Cover(%)  PCT(%)   TP(%)    CTL(%)   LWZ(%)   WL(%)
Cover            99.23     0.00     0.00     0.00     0.77     0.00
PCT (0.003 bpp)  0.00      96.92    0.77     2.31     0.00     0.00
TP (0.003 bpp)   0.00      0.00     96.15    0.00     1.54     2.31
CTL (0.003 bpp)  0.00      6.15     0.77     92.31    0.77     0.00
LWZ (0.003 bpp)  0.00      0.00     1.54     0.77     96.15    1.54
WL (0.003 bpp)   0.00      0.00     0.77     0.77     1.54     96.92
PCT (0.006 bpp)  0.00      97.69    0.00     2.31     0.00     0.00
TP (0.006 bpp)   0.00      0.00     99.23    0.00     0.77     0.00
CTL (0.006 bpp)  0.00      0.77     0.00     99.23    0.00     0.00
LWZ (0.006 bpp)  0.00      0.77     0.00     0.00     99.23    0.00
WL (0.006 bpp)   0.77      0.77     0.00     0.00     0.00     98.46
PCT (0.01 bpp)   0.00      98.46    0.00     1.54     0.00     0.00
TP (0.01 bpp)    0.00      0.00     100.00   0.00     0.00     0.00
CTL (0.01 bpp)   0.00      0.77     0.00     99.23    0.00     0.00
LWZ (0.01 bpp)   0.00      0.77     0.00     0.00     99.23    0.00
WL (0.01 bpp)    0.00      0.77     0.00     0.00     0.00     99.23

Table 5.7: Confusion matrix of the multi-class steganalysis for the scene database

                                     Classified as
                 Cover(%)  PCT(%)   TP(%)    CTL(%)   LWZ(%)   WL(%)
Cover            81.58     0.38     8.65     1.88     5.26     2.26
PCT (0.01 bpp)   0.38      93.99    0.00     4.51     0.38     0.76
TP (0.01 bpp)    4.89      0.00     80.08    0.38     3.01     11.65
CTL (0.01 bpp)   4.51      3.76     0.00     89.10    1.50     1.13
LWZ (0.01 bpp)   3.76      0.38     2.26     0.38     91.35    1.88
WL (0.01 bpp)    0.75      0.38     10.90    0.38     2.26     85.34


Chapter 6 Hidden Message Length Estimation


The field of information hiding has two facets. The first relates to the design of efficient and secure data hiding and embedding methods. The second facet, steganalysis, attempts to discover hidden data in a medium. Under ideal circumstances, an adversary who applies steganalysis wishes to extract the full hidden information. This task, however, may be very difficult or even impossible to achieve. Thus, the adversary may start steganalysis with more realistic and modest goals. These could be restricted to finding the length of hidden messages, identification of the places where bits of hidden information have been embedded, estimation of the stegokey and classification of the embedding algorithms. Achieving some of these goals enables the adversary to improve the steganalysis, making it more effective and appropriate for the steganographic method used.

Most works published on steganalysis relate to methods that use colour or greyscale images. Steganography that uses binary images has received relatively little attention. This can be partially attributed to the difficulty of applying the statistical model used for colour and greyscale images and adapting it to the new environment. In spite of this difficulty, binary images are very popular and frequently used to store textual documents, black and white pictures, signatures and engineering drawings, to name a few.

Colour and greyscale images are characterised by a rich collection of various statistical features that have been used to develop new steganalysis techniques. Unlike colour and greyscale images, binary images have a rather modest statistical nature. In general, it is difficult to convert steganalysis used for colour or greyscale images to an attack on binary images. However, we have successfully adapted the concepts used for colour or greyscale images and we propose a new collection of statistical features that estimate the length of a hidden message embedded in a binary image. Consequently, we can decide whether a given image contains a hidden message. More precisely, we can tell apart the cover images from the stego images. We must emphasise that our steganalysis is designed to attack the steganographic technique developed in [69]. This means that our analysis is a type of targeted steganalysis.

In this work, we define the length of the embedded message as the ratio between the number of bits of the embedded message and the maximum number of bits that can be embedded in a given binary image. Note that we use the terms message length, embedded message length and hidden message length as synonyms.

The organisation of this chapter is as follows. In the next section, we give a brief summary of the steganographic method we analyse. The technique of analysis we apply is given in Section 6.2. Section 6.3 presents the results of the analysis. Section 6.4 concludes the chapter.

6.1 Boundary Pixel Steganography

We briefly introduce the steganographic method under analysis in this section. This steganographic method [69] is described in detail in Subsection 3.1.1 and is only summarised here. The steganography developed in [69] is a variant of boundary pixel steganography. This method uses a binary image as the medium for secret message bits. A set of rules is proposed to determine the data-carrying eligibility of the boundary pixels. This plays an important role in ensuring that embedding produces minimum distortion and in obtaining error-free message extraction. In addition, the embedding algorithm generates no isolated pixels. This method also employs a PRNG to produce a random selection path for embedding.

As the embedding algorithm modifies only boundary pixels, the visual distortions are minimal and there is no pepper-and-salt like noise. However, if we take a close look at an image with an embedded message, we can observe small pixel-wide notches and protrusions near the boundary pixels.


We use these small distortions to launch an attack on the steganographic algorithm. In our attack, we first detect the existence of a hidden message and then estimate its length.

6.2 Proposed Method

We want to propose a steganalysis technique that can counteract the steganography developed in [69]. However, given an image, we do not know whether the image is a cover or a stego image without a priori knowledge. What we can do is to extract some useful characteristics from the given image. These characteristics may reveal some estimate of the length of the embedded hidden message (i.e., if zero per cent is estimated then the given image is a cover image; if a certain nonzero percentage is estimated then the given image is a stego image). We first define a statistic by measuring the number of notches and protrusions in the image. This statistic will reflect the degree of image distortion. Then we define a numerical value associated with this statistic. Finally, we show that this numerical value is approximately proportional to the size of the embedded message, which enables us to compute an estimate of the embedded message length.

6.2.1 512-Pattern Histogram as the Distinguishing Statistic

For any boundary pixel (as shown in Figure 6.1), we can form a certain pixel pattern together with its eight neighbouring pixels. Examples of the pattern are shown in Figure 6.2 (the shaded box represents a pixel value of zero and the white box represents a pixel value of one). Altogether, 512 patterns can be formed by the different combinations of black and white pixels in a block of nine pixels; however, two patterns cannot occur, because they do not contain any boundary pixels. Clearly, these are the patterns formed by either all black or all white pixels. To simplify our considerations, we assume that there are 512 patterns. The 512-pattern histogram H(J) tabulates the frequency of occurrence of each pattern in the given image J. The frequency of occurrence h_i for the ith pattern


Figure 6.1: Illustration of a boundary pixel in the magnified view of some portion of the n character

Figure 6.2: Examples of the patterns formed by a single boundary pixel (denoted by b) and its eight neighbouring pixels (denoted by n) from a binary image

is given by

    h_i = Σ_{k=1}^{M} δ(i, p(k)),    (6.1)

where p(k) denotes the kth pattern in the given image, M is the total number of patterns in the given image and δ is the Dirac delta function (δ(u, v) = 1 if u = v and 0 otherwise). For brevity, we let H represent H(J) and have

    H = {h_i | 1 ≤ i ≤ 512}.    (6.2)
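As a concrete sketch of Equations (6.1) and (6.2), the following Python code builds the 512-pattern histogram of a binary image. The boundary-pixel definition (a pixel that differs from at least one of its four neighbours) follows the description later in this chapter; the edge-replication padding at the image border is our own assumption, since the text does not specify how border pixels are handled.

```python
import numpy as np

def boundary_mask(img):
    """Boundary pixels: value differs from at least one 4-neighbour."""
    p = np.pad(img, 1, mode='edge')
    return ((p[:-2, 1:-1] != img) | (p[2:, 1:-1] != img) |
            (p[1:-1, :-2] != img) | (p[1:-1, 2:] != img))

def pattern_histogram(img):
    """512-bin histogram h_i of the 3x3 patterns centred on boundary
    pixels (Equation (6.1)); each pattern is read as a 9-bit index."""
    img = np.asarray(img, dtype=np.uint8)
    h = np.zeros(512, dtype=np.int64)
    p = np.pad(img, 1, mode='edge')
    rows, cols = np.nonzero(boundary_mask(img))
    for i, j in zip(rows, cols):
        block = p[i:i + 3, j:j + 3].ravel()
        idx = int(np.dot(block, 1 << np.arange(8, -1, -1)))
        h[idx] += 1
    return h

# A tiny binary image: a 2x2 black square on a white background.
img = np.ones((6, 6), dtype=np.uint8)
img[2:4, 2:4] = 0
h = pattern_histogram(img)
```

Since every counted pattern is centred on a boundary pixel, the two all-black and all-white patterns (indices 0 and 511) never occur, matching the observation in the text.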

In this 512-pattern histogram, a cover image has some high-frequency bins (corresponding to the pattern types) and some other bins with low frequency. However, these bins tend to be flattened after embedding (by flattened we mean that some of the local maxima in the histogram decrease and some of the local minima increase; see Figure 6.3 for an example). This happens because, during embedding, some image pixels are flipped to carry message bits, which disturbs the inter-pixel correlation. This is reflected in the pattern changes. The longer the embedded message, the flatter the 512-pattern histogram.


Figure 6.3: Some of the bins from the 512-pattern histogram are selected to illustrate the comparison between a cover image and a stego image (embedded with 80 per cent of the maximum message length). Note that some of the bins are flattened in the stego image.

From this observation, we propose to compute the histogram difference to capture the flatness of the 512-pattern histogram. The histogram difference is the bin-wise absolute difference between the 512-pattern histograms of two images. The first histogram is from the given binary image and the second is from the same image after it has been re-embedded with a random message of the maximum length (100 per cent). The re-embedding operation uses the steganographic technique described in Section 6.1. The following equation defines the second histogram:

    H′ = {h′_i | 1 ≤ i ≤ 512},    (6.3)

where h′_i is the corresponding frequency of occurrence of the ith pattern in the same image after it has been re-embedded with 100 per cent of the length of a random message. Then the histogram difference can be written as follows:

    HD = {|h′_i − h_i| | 1 ≤ i ≤ 512},    (6.4)

where |·| represents the absolute value. We choose to calculate the histogram difference because using the 512-pattern histogram of the given image directly is insufficient. The 512-pattern histogram of a given image does not fully represent the embedding artefacts and may be biased by the image content. It would be easier if we had both the cover and stego versions of the images: the differences between these two images would be


only from the embedded message. However, under normal circumstances we do not have both versions. Therefore, it is useful to work backward by determining how many remaining boundary pixels can still be used for embedding, which gives an estimate of the number of message bits that have (or have not) been embedded. This explains why we opted to use re-embedding to obtain the histogram difference. We use Figure 6.4 to illustrate our considerations. Figure 6.4(a) shows two binary images that are only slightly different, yet their pattern histograms (Figure 6.4(b)) are entirely different. However, as shown in Figure 6.4(c), the histogram differences for the respective bins of the two binary images are (almost) identical. This argument supports the use of the histogram difference.

6.2.2 Matrix Right Division

To allow the histogram difference to measure the embedded message length, we propose using matrix right division. Matrix right division can be considered a transformation of a histogram to a numerical value (a one-dimensional metric). Alternatively, matrix right division can be seen as an attempt to solve an appropriate system of linear equations. The matrix right division used is the standard MATLAB R2007b built-in matrix division function, defined as follows: if A is a non-singular square matrix and B is a row vector, then x = B/A is the solution to the system of linear equations x·A = B computed by Gaussian elimination with partial pivoting. If x·A = B is an over-determined system of linear equations, then x = B/A is the solution of the over-determined system in the least squares sense. In general, when A is non-singular and square, the system has an exact solution given by x = B·A⁻¹, where A⁻¹ is the inverse of matrix A. Hence, the solution can be computed by multiplying the row vector B with the inverse of matrix A. It can also be defined as multiplication with the pseudo-inverse (refer to [58, 45] for details of the pseudo-inverse). However, a solution based on a matrix inverse is inefficient for practical applications and may cause large numerical errors. A better approach is to use matrix division. For an over-determined system of equations, it is impossible to compute the inverse

Figure 6.4: (a) Two sample binary images. (b) Some bins from the 512-pattern histogram of the binary images shown in (a). (c) The respective bins in the histogram difference of the binary images shown in (a).


of matrix A. However, a solution for this system can be computed by minimising the Euclidean length of the residual r = x·A − B. Matrix division computes this by minimising the sum of squares of the elements of r, which yields a solution in the least squares sense. In our application of matrix right division, matrix A is actually a row vector of the same length as B. It is reasonable to consider a pattern histogram as a row vector; since the histogram difference is the bin-wise absolute difference between two 512-pattern histograms, the histogram difference can be considered a row vector as well. Thus, we can perform matrix right division between the histogram difference and the 512-pattern histogram of a given binary image, as in Equation (6.5); note that this is not an element-wise division. We call the resulting numerical value the histogram quotient hq:

    hq = HD/H.    (6.5)

We illustrate matrix right division with the following examples:

Example 1:
    x = [2 4 8]
    y = [1 2 4]
    x/y = 2

Example 2:
    x = [2 4 12 9 18 6]
    y = [1 2 6 3 6 2]
    x/y = 2.5444

In Example 1, every element of x is the corresponding element of the vector y multiplied by two. Thus, the quotient of the matrix right division is two. In Example 2, the first three elements of x are the first three elements of y multiplied by two, and the last three elements of x are the last three elements of y multiplied by three. Thus, the quotient of the matrix right division is 2.5444, which lies between two and three.
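For readers without MATLAB, the two examples can be checked with an ordinary least-squares computation in Python. When A is a row vector, MATLAB's B/A reduces to the scalar (B·Aᵀ)/(A·Aᵀ); this is a sketch of that special case, not the thesis's own implementation.

```python
import numpy as np

def mrdivide_rowvec(b, a):
    """Least-squares solution q of q*a ≈ b for row vectors a and b,
    mirroring MATLAB's b/a for the 1-by-n case: q = (b·aᵀ)/(a·aᵀ)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(b, a) / np.dot(a, a))

# Example 1: every element of x is twice the matching element of y.
q1 = mrdivide_rowvec([2, 4, 8], [1, 2, 4])                      # exactly 2
# Example 2: first half scaled by 2, second half by 3.
q2 = mrdivide_rowvec([2, 4, 12, 9, 18, 6], [1, 2, 6, 3, 6, 2])  # 229/90 ≈ 2.5444
```

The general over-determined case can equally be solved with `np.linalg.lstsq`; the closed form above is exact here because the unknown is a single scalar.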

6.2.3 Message Length Estimation

In this subsection, we select a binary image to demonstrate the consistent behaviour of the histogram quotient. We re-embedded it with messages whose lengths increase in five per cent increments and observed the response of the histogram quotient. As shown in Figure 6.5, the histogram quotient increases almost linearly until a certain point, after which a further increase in the length of the re-embedded message does not increase the histogram quotient. This shows that the desired

Figure 6.5: Histogram quotient with a five per cent increment in the re-embedded message length

consistency can be obtained by using a histogram quotient based on the histogram difference. Therefore, as discussed in Subsection 6.2.1, finding the corresponding difference between the two circles shown in Figure 6.5 proves crucial and provides us with a strong trait for estimating the embedded message length. In short, our proposed method first identifies all boundary pixels in a given binary image. The boundary pixel used here is defined as a pixel that has at least one neighbouring pixel (among the four neighbouring pixels) with a different pixel value. Then the 512-pattern histogram is obtained from these boundary pixels. Based on this pattern histogram, the histogram difference is computed and the histogram quotient is calculated (denoted hq in Equation (6.5)). Finally, we employ linear interpolation to obtain an approximate constant of proportionality c such that hq ≈ c·ℓ, where ℓ is the message length. Then, for any particular value of hq, we can compute an estimate of ℓ as hq/c.
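Putting the pieces together, a minimal sketch of the estimator might look as follows. The re-embedding step requires the steganography of [69] and is assumed to be available elsewhere, so the function takes the two histograms directly; the toy histograms and the constant c below are invented purely for illustration.

```python
import numpy as np

def estimate_message_length(H, H_full, c):
    """Estimate the embedded message length from:
    H      - 512-pattern histogram of the image under test
    H_full - histogram of the same image re-embedded to 100 per cent
             (re-embedding itself is assumed to happen elsewhere)
    c      - constant of proportionality obtained by calibration."""
    H = np.asarray(H, dtype=float)
    HD = np.abs(np.asarray(H_full, dtype=float) - H)  # Equation (6.4)
    hq = float(np.dot(HD, H) / np.dot(H, H))          # Equation (6.5), row-vector case
    return hq / c

# Toy illustration with made-up 4-bin histograms: the flatter H_full is
# relative to H, the larger the quotient and the estimated length.
H = np.array([100.0, 10.0, 80.0, 5.0])
H_full = np.array([70.0, 40.0, 56.0, 29.0])
length = estimate_message_length(H, H_full, c=0.006)
```

If the image is already fully embedded, re-embedding changes little, HD shrinks, and the estimate falls accordingly; a cover image yields the largest HD for a given capacity.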

6.3 Experimental Results

6.3.1 Experimental Setup

The experimental settings are described below. The embedding algorithm used to create the stego images is the steganography proposed in [69]. The total number of embeddable pixels per image produced by this embedding algorithm is about 25 per cent of the total number of boundary pixels. The maximum message length (100 per cent length) is defined as the total
number of embeddable pixels per image. Eight sets of stego images (i.e., 10, 20, 30, 40, 50, 60, 70 and 80 per cent) are created from 659 binary cover images. The cover images are all textual documents with a white background and black foreground. The resolution of all binary images is 200 dpi, with an image size of 800 × 800. The prototype is implemented in Matlab R2008a.

Figure 6.6: Estimated length of hidden messages for all binary images

6.3.2 Results of the Estimation

From the mixture of 5931 cover and stego images, we estimate the length of the embedded message with our proposed method and compare it with the actual embedded lengths of 0, 10, 20, 30, 40, 50, 60, 70 and 80 per cent. Zero per cent represents a cover image. The estimation results are shown in Figure 6.6. The estimated lengths are very close to the actual lengths. The estimations for large embedded messages, such as 80 per cent, are not as close as the other estimations, although they retain good accuracy. At such a high percentage, some stego images are quite distorted and the pixels exhibit a high degree of randomness. We believe this randomness causes slight instability in our proposed method; however, this phenomenon does not pose a serious problem because we can easily spot the embedding artefacts in such a highly distorted stego image (Figure 6.7 shows a highly distorted stego image). Table 6.1 summarises the mean and standard deviation of all the estimated message lengths according to the actual embedded message lengths. The average value for


Figure 6.7: Example of a highly distorted stego image embedded with 80 per cent of the message

Table 6.1: Mean and standard deviation of the estimation

Length (%)   Mean      Standard Deviation
0            0.0277    1.8761
10           9.8540    1.8034
20           19.8438   1.5966
30           29.9271   1.4337
40           39.9608   1.3445
50           50.0210   1.2869
60           60.0763   1.2747
70           70.3436   1.7666
80           79.9598   2.0547

each estimated length is very close to the actual length. The standard deviation is also very small, only about one or two per cent. This implies that the estimated lengths do not deviate much from the actual lengths. The estimation errors are displayed in Figure 6.8. The estimation error for each binary image is computed as the difference between the estimated and the actual embedded message length in percentage terms. The estimation errors are relatively low and concentrated around 0.00 per cent. The highest estimation errors occur only occasionally and are about 6.00 per cent, except for one outlier with an error of 7.43 per cent.


Figure 6.8: Estimation error of hidden message length for all binary images

6.4 Conclusion

The method proposed in this work can detect the steganography developed in [69] and estimate the length of the embedded message. In this estimation, we first build the 512-pattern histogram from a binary image as the distinguishing statistic. From this 512-pattern histogram, we compute the histogram difference to capture the changes caused by the embedding operation. Performing matrix right division then yields a histogram quotient, from which the length of the embedded message is estimated. We used a significantly large image database, consisting of 5931 binary images (one set of cover images and eight sets of stego images), to test the proposed method. From the experimental results, we conclude that our proposed method effectively estimated the hidden message length with low estimation error. We observe that using only a set of rules to ensure suitable data-carrying pixels is insufficient, because the notches and protrusions produced by embedding can still be utilised to mount an attack. To alleviate this shortcoming in the steganography, we suggest incorporating an adaptive pixel selection mechanism for the identification of suitable data-carrying pixels.


Chapter 7 Steganographic Payload Location Identification


In general, as discussed in Section 3.2, the task of steganalysis involves several different levels of analysis (also considered different forms of attacks). They are the determination of the existence of a hidden message, classification of the steganographic methods, finding the length of the hidden message, identification of the locations where bits of the hidden message have been embedded, and retrieval of the stegokey. Compared to other forms of attack, the identification of locations that carry stego pixels and the retrieval of the stegokey have received relatively less attention in the literature. These attacks require extracting extra and more detailed information about the steganography method used. Consequently, they are much more difficult than attacks that extract only partial information. For instance, the estimation of the stegokey in the attack given in [42] requires the identification of the hidden message length. In this chapter, we develop an attack that identifies the steganographic payload locations in binary images, where bit-replacement steganography is used. More precisely, our proposed method will find and locate the pixels in the image used to carry secret message bits. Note that the terms steganographic payload, hidden data and message are used interchangeably throughout the chapter. The remainder of the chapter is structured as follows. In the next section, we provide the related background. The motivation for this research and the main research challenges are discussed in Section 7.2. The attack is discussed in detail in Section 7.3. Section 7.4 gives experimental results for the attack and the chapter is concluded in Section 7.5.

7.1 Background

Some attacks, such as blind steganalysis or stego message length estimation, can determine whether a given image is a stego or cover image. Assume that we have already determined that the image contains a steganographic payload. The next and quite natural step is to identify the location of the hidden message. Because of the invasive nature of steganography, the embedding operation is likely to disturb the inter-pixel correlations. The embedding operation creates pixels with high energy, as defined by Davidson and Paul [27]. They developed a method to measure the energy caused by the embedding disturbance and were able to identify pixels with high energy that are likely to carry the hidden message bits. However, their method suffers from high false negatives, or missed detections, when some message bits do not change the pixel energy. This occurs when the parity of the hidden message bits is the same as the parity of the image pixels. Kong et al. in [68] used the coherence of hue in a colour image to identify a subset of pixels used to carry the hidden message bits. They observed that, in cover images (without a hidden message), the coherence of hue varies slowly and tends to be constant in a small neighbourhood of pixels. This is no longer true when a hidden message is embedded. Thus, when the hue of a region under examination exceeds a certain threshold, there is good reason to suspect that it contains bits of the hidden message. Unfortunately, this analysis only works for steganography with sequential embedding. If the embedding is random, then this attack fails. In [62], Ker showed how to use the residual of the weighted stego image to identify the location of bits of the hidden message. The residual is the pixel-wise difference between the stego image and the estimated cover image. This analysis requires a large number of stego images. The only concern is whether it is possible to obtain multiple different stego images with the payload embedded in the same locations. Nevertheless, this is plausible when the same stegokey is reused across different stego images. The author applied a similar concept in a different paper to attack LSB-matching steganography, and it proved to be effective [64].

7.2 Motivation and Challenges

The promising results obtained by the Ker method in the analysis of greyscale image steganography motivated us to take a closer look at the analysis and extend

the concept to binary image steganography. The Ker method is superior to the methods developed in [27, 68]. More importantly, the method can be applied to both sequential and random embedding and has low false negatives. These two advantages are very important. For example, the problem of false negatives gets worse and becomes critical when a message is encrypted before embedding. The bits of an encrypted message behave like truly random ones with a uniform probability distribution of zeros and ones. This implies that half of the time the message bits will match the pixel LSBs, so no change takes place. Consequently, nothing can be detected. It is well known that sequential embedding is insecure, as it can be easily detected using a visual inspection. Most current steganographic techniques employ random embedding. Thus, steganalysis is of limited use if it can only attack sequential embedding. Let us introduce the ideas used in our work. We follow the conventions used by Ker in his work [62]. Given a stego image as a sequence of pixels S = {s_1, s_2, ..., s_n} and an estimated cover image C̃ = {c̃_1, c̃_2, ..., c̃_n}. C̃ can be estimated from the stego image by taking the average of the four connected neighbouring pixels (linear filtering); n is the total number of pixels. Now we can define a vector of residuals

    r_i = (s_i − s̄_i)(s_i − c̃_i),    (7.1)

where s̄_i is the ith pixel of the stego image with its LSB flipped. Assume that we have N multiple stego images. We can define the residual of the ith pixel in the jth stego image as

    r_ij = (s_ij − s̄_ij)(s_ij − c̃_ij).    (7.2)

The mean of the residual of the ith pixel can be computed as follows:

    r̄_i = (1/N) Σ_{j=1}^{N} r_ij.    (7.3)
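For a binary image, flipping the LSB of a one-bit pixel s_i simply gives 1 − s_i, so Equations (7.1) to (7.3) can be sketched directly; the toy arrays below are illustrative only.

```python
import numpy as np

def residuals(stego, cover_est):
    """Residual r_i = (s_i - s̄_i)(s_i - c̃_i) of Equation (7.1).
    For a binary image the LSB-flipped pixel s̄_i is simply 1 - s_i."""
    s = np.asarray(stego, dtype=float)
    flipped = 1.0 - s
    return (s - flipped) * (s - np.asarray(cover_est, dtype=float))

def mean_residual(stegos, cover_ests):
    """Mean residual r̄_i over N stego images (Equation (7.3)), assuming
    all images share the same payload locations (same stegokey reused)."""
    r = [residuals(s, c) for s, c in zip(stegos, cover_ests)]
    return np.mean(r, axis=0)

# Toy example: a 2x2 stego image against a smoothed cover estimate.
r = residuals(np.array([[1, 0], [0, 1]]), np.full((2, 2), 0.5))
```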

With a sufficient number of stego images, this mean of residuals will provide strong evidence that can be used to separate the stego-bearing pixels from non-stego-bearing pixels. Although there are some similarities in the embedding algorithms for greyscale and binary image steganography, the attack on different image types may require


a different approach. Unlike a greyscale image, a binary image has a rather modest statistical nature. This makes it difficult to apply the existing method directly. In addition, it is clear that the Ker technique offers high accuracy; however, there is always a trade-off between the required number of stego images and the detection accuracy.

7.3 Proposed Stego-Bearing Pixel Location Identification

In this section, we discuss our proposed method for attacking binary stego images embedded using bit-replacement steganography. Let us first introduce bit-replacement steganography for a binary image. Given a cover image C = {c_1, c_2, ..., c_n} and a stego image S = {s_1, s_2, ..., s_n} that contains a hidden message. Since a binary image has only two intensities (black and white), the embedding operation involves simply flipping the one-bit pixel (i.e., changing the zeros to ones and vice versa) when the message bit does not match that of the image pixel. Assume that the order of the hidden message bits that are going to be embedded is randomly permuted. The random permutation is obtained from a PRNG that is controlled by a stegokey. Clearly, it is not impossible for an adversary to get access to multiple stego images that reuse the same stegokey for a batch of covert communications. Although the content of the secret message and the cover image used may differ every time, the steganographic payload locations will be the same because they use the same stegokey. We begin by adapting the method proposed by Ker in [62]. This includes finding the vector of residuals and employing multiple stego images to obtain the mean of the residuals. However, due to the limited characteristics of a binary image, we need a different approach to estimating the cover image. We choose an image smoothing approach to achieve binary cover image estimation. Several alternatives exist; our empirical studies found that a Gaussian filter produces the best results. The Gaussian filter is defined as follows:

    g(x, y) = (1/(2πσ²)) e^(−(x² + y²)/(2σ²)),    (7.4)

where x and y are the coordinates within the filter window in the horizontal and vertical directions, respectively, and σ is the standard deviation of the Gaussian distribution. Once we estimate the cover image, we can compute the vector of residuals for each stego image using Equation (7.1). We compute the mean of residuals, as shown in Equation (7.3), by employing multiple stego images. The identification of the pixel locations containing a steganographic payload can be carried out by choosing the M pixels with the highest mean residual r̄_i. According to the author in [62], M can be calculated as M = 2n·r̄, where

    r̄ = (1/(nN)) Σ_{i=1}^{n} Σ_{j=1}^{N} r_ij.
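A sketch of the Gaussian cover estimation of Equation (7.4) follows. The kernel is normalised to unit sum and the image border uses edge padding; both choices are our own assumptions, as the thesis does not state them.

```python
import numpy as np

def gaussian_kernel(size=3, sigma=0.6):
    """Discrete 2-D Gaussian of Equation (7.4) over a size x size window,
    normalised to sum to one so that smoothing preserves mean intensity."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()

def estimate_cover(stego, size=3, sigma=0.6):
    """Estimate the cover image by Gaussian smoothing of the stego image
    (plain windowed convolution with edge padding)."""
    k = gaussian_kernel(size, sigma)
    p = np.pad(np.asarray(stego, dtype=float), size // 2, mode='edge')
    h, w = stego.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + size, j:j + size] * k)
    return out
```

With the 3 × 3 window and σ = 0.6 used later in the experiments, each estimated pixel is a weighted blend of its neighbourhood, so flipped payload pixels stand out against the estimate.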

However, the estimation of M for binary images does not always give as accurate an estimate as it does for greyscale images. This happens because of the modest statistical characteristics of a binary image, and it becomes more severe when N is small. To overcome this problem, we propose incorporating an entropy measurement. Entropy is defined as follows:
    E(I) = −Σ_{i=1}^{K} p_i log₂ p_i,    (7.5)

where I can be the given stego image S or the estimated cover image C̃, and p_i is the probability of the pixel intensity occurrences, with a total of K possible intensities. However, computing the entropy for the entire image at once would give us a global feature. Instead, we use Equation (7.5) to compute the local entropy of the 3 × 3 neighbourhood around the ith pixel. We can obtain the local entropy for every pixel in both the stego image and its corresponding estimated cover image. Entropy has been widely used as a statistical measure of randomness to characterise the content of an image. Note that the embedding operation alters the image content, which directly changes the degree of randomness in the image. Thus, incorporating entropy appears to be an appropriate way of capturing the embedding artefact. The next question is, how do we combine the mean residual with the local entropy? Firstly, we find the local entropy difference,

    d_i = ε_i^s − ε_i^c,    (7.6)

where ε_i^s and ε_i^c are the local entropies at the ith pixel of the stego image and its estimated cover image, respectively. Secondly, we employ multiple stego images to compute the mean of the local entropy differences d̄_i by replacing r_ij in Equation (7.3) with d_ij; d_ij can be obtained in a manner similar to r_ij in Equation (7.2).
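A sketch of the local entropy of Equation (7.5) over a 3 × 3 neighbourhood; for a binary image K = 2, so the probabilities reduce to the fraction of ones in the window (the edge padding at the border is again our own assumption):

```python
import numpy as np

def local_entropy(img, size=3):
    """Local entropy of Equation (7.5) in a size x size neighbourhood.
    For a binary image K = 2, so p is just the fraction of ones."""
    half = size // 2
    p = np.pad(np.asarray(img, dtype=float), half, mode='edge')
    h, w = img.shape
    e = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            frac = p[i:i + size, j:j + size].mean()
            for q in (frac, 1.0 - frac):
                if q > 0:                      # 0 * log2(0) is taken as 0
                    e[i, j] -= q * np.log2(q)
    return e

def entropy_difference(stego_entropy, cover_entropy):
    """Local entropy difference d_i of Equation (7.6)."""
    return stego_entropy - cover_entropy
```

A uniform region has zero local entropy, while a region disturbed by pixel flips approaches one bit, which is what makes d_i useful evidence.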

Table 7.1: Summary of image databases

Database     Total Images   Resolution   Image Size
Database A   5867           300 dpi      400 × 400
Database B   1338           96 dpi       512 × 384
Database C   2636           200 dpi      400 × 400

Thirdly, we construct two pixel subsets, S_r ⊆ S and S_d ⊆ S, by evaluating the mean of residuals and the mean of local entropy differences, respectively. The elements of subset S_r are the M pixels with the highest mean of residuals plus an extra 10 per cent. The 10 per cent is determined empirically and the aim is to obtain slightly more samples. For the second subset, S_d, we select those pixels with a mean local entropy difference exceeding a threshold. Finally, the identification of the pixels containing the steganographic payload is determined by S_r ∩ S_d.
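The three steps above might be combined as in the following sketch; the function name, the threshold argument and the toy data are hypothetical:

```python
import numpy as np

def payload_locations(mean_res, mean_ent_diff, M, tau=0.05):
    """Combine the two evidence maps: S_r holds the M plus 10 per cent
    pixels of highest mean residual, S_d the pixels whose mean local
    entropy difference exceeds the threshold tau; return S_r ∩ S_d."""
    r = np.asarray(mean_res, dtype=float).ravel()
    d = np.asarray(mean_ent_diff, dtype=float).ravel()
    m = min(int(round(1.1 * M)), r.size)
    s_r = set(np.argsort(r)[::-1][:m])      # top-m mean residuals
    s_d = set(np.flatnonzero(d > tau))      # entropy-based evidence
    return sorted(s_r & s_d)

# Toy example: pixels 2 and 5 look suspicious on both measures.
res = np.array([0.1, 0.2, 0.9, 0.1, 0.3, 0.8])
ent = np.array([0.00, 0.01, 0.30, 0.02, 0.00, 0.40])
locs = payload_locations(res, ent, M=2)
```

Intersecting the two subsets keeps only pixels flagged by both the residual and the entropy evidence, which is what suppresses the false positives that either measure alone would admit.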

7.4 Experimental Results

7.4.1 Experimental Setup

To cover a diverse set of images, we constructed three image databases. The first image database consists of 5867 binary cover images. These images are cropped from a set of 4288 × 2848 pixel RAW images captured by a Nikon D90 digital camera. We then used the conversion software supplied by the camera manufacturer to convert the images to TIFF. For the second image database, we constructed 1338 binary images from the image database used in [98]. The third image database consists of 2636 binary cover images. The images in the first and second databases are natural scene images; the images in the third database are textual document images. The cropping operation and greyscale-to-binary conversion were carried out with the IrfanView version 4.10 freeware. Overall, we constructed 9841 cover images; the databases are summarised in Table 7.1. For brevity, we call these Databases A, B and C. We use bit-replacement steganography (as discussed in the first paragraph of Section 7.3) to generate stego images from the three image databases for different message lengths. We generated three message lengths (0.01, 0.05 and 0.10 bpp)

for each database. Since a binary image has only one bit per pixel, we can think of bpp as the average number of message bits embedded per pixel. For example, 0.01 bpp embedding means that, for every 100 pixels, only one pixel is used to carry message bits. We employ a uniform distribution of random message bits for the experiments. For the parameters of our proposed method, we set a window size of 3 × 3 (x = 3, y = 3) and σ = 0.6 for the Gaussian filter. We also tried several different window sizes for the Gaussian filter; the results are given in Figure 7.1. Window sizes of 3 × 3 and 5 × 5 give the optimum performance. Since the windows of size 3 × 3 and 5 × 5 give about the same accuracy, we chose 3 × 3 to reduce the demand for computational resources. The threshold for the mean local entropy difference is 0.05.
Figure 7.1: Identification results at 0.05 bpp for different window sizes: (a) Database A (b) Database B (c) Database C

7.4.2 Results Comparison

To evaluate the accuracy of the identification, we compared the estimated locations with the actual set of stego-bearing pixel locations. For each message length, we show the accuracy of the identification in terms of true positives (abbreviated as TP), false positives (FP) and false negatives (FN). We divided the results into three tables, one for each image database (Tables 7.2, 7.3 and 7.4, respectively). The identification accuracy shown in each table is a percentage. The tables show clearly that the proposed method gives very promising results. In particular, in Table 7.2, the identification is nearly perfect for N = 100 and perfect for N greater than 300 images. Similarly reliable accuracy is also shown in Table

Table 7.2: The accuracy of the stego-bearing pixel location identification for image Database A (* indicates the message length)

# of images, N     100     200     300     > 320
TP (* 0.01 bpp)    100     100     100     100
FP (* 0.01 bpp)    0.00    0.00    0.00    0.00
FN (* 0.01 bpp)    0.00    0.00    0.00    0.00
TP (* 0.05 bpp)    99.95   99.99   100     100
FP (* 0.05 bpp)    0.00    0.00    0.00    0.00
FN (* 0.05 bpp)    0.05    0.01    0.00    0.00
TP (* 0.10 bpp)    99.80   99.96   99.99   100
FP (* 0.10 bpp)    0.04    0.00    0.00    0.00
FN (* 0.10 bpp)    0.16    0.04    0.01    0.00

Table 7.3: The accuracy of the stego-bearing pixel location identification for image Database B (* indicates the message length)

# of images, N     100     200     300     > 820
TP (* 0.01 bpp)    99.90   100     100     100
FP (* 0.01 bpp)    0.05    0.00    0.00    0.00
FN (* 0.01 bpp)    0.05    0.00    0.00    0.00
TP (* 0.05 bpp)    99.77   100     100     100
FP (* 0.05 bpp)    0.11    0.00    0.00    0.00
FN (* 0.05 bpp)    0.12    0.00    0.00    0.00
TP (* 0.10 bpp)    99.62   99.86   99.92   99.98
FP (* 0.10 bpp)    0.19    0.07    0.04    0.01
FN (* 0.10 bpp)    0.19    0.07    0.04    0.01


Table 7.4: The accuracy of the stego-bearing pixel location identification for image Database C (* indicates the message length)

# of images, N     100     200     300     > 2600
TP (* 0.01 bpp)    84.48   90.33   92.59   99.31
FP (* 0.01 bpp)    1.95    0.94    0.50    0.06
FN (* 0.01 bpp)    13.57   8.73    6.91    0.63
TP (* 0.05 bpp)    86.69   90.84   92.85   99.48
FP (* 0.05 bpp)    3.21    1.91    1.26    0.09
FN (* 0.05 bpp)    10.10   7.25    5.89    0.43
TP (* 0.10 bpp)    85.91   90.41   92.45   99.40
FP (* 0.10 bpp)    4.11    2.35    1.72    0.13
FN (* 0.10 bpp)    9.98    7.23    5.83    0.47

7.3, except for an embedded message length of 0.10 bpp, where near-perfect identification is achieved for N > 820 images. The identification of stego-bearing pixel locations for the images in Database C (Table 7.4) appeared to be the most difficult. However, the detection reliability is still very good: all the identifications show at least 84 per cent accuracy for N = 100 and more than 90 per cent when N > 200. Further analysis reveals that the textual content in image Database C has periodic patterns that are uniform and consistent across the whole image. This increases the entropy significantly across the whole image. Since our method is partly based on the local entropy, this interfered with our identification mechanism. To the best of our knowledge, no stego-bearing pixel identification approach for binary images has been proposed in the literature. Thus, we compare our proposed method to a general method in which just the residual of the weighted stego images and linear filtering are used. As the results in Figures 7.2, 7.3 and 7.4 show, our proposed method performs better. However, with Database C, the identification results for the two methods differ only marginally, as Figure 7.4 illustrates. This is consistent with the explanation given in the previous paragraph about the use of local entropy in textual images and its smaller effect.

Figure 7.2: Comparison of results for image Database A (solid line represents the proposed method and the line with crosses is the general method): (a) 0.01 bpp (b) 0.05 bpp (c) 0.10 bpp

Figure 7.3: Comparison of results for image Database B (solid line represents the proposed method and the line with crosses is the general method): (a) 0.01 bpp (b) 0.05 bpp (c) 0.10 bpp

Figure 7.4: Comparison of results for image Database C (solid line represents the proposed method and the line with crosses is the general method): (a) 0.01 bpp (b) 0.05 bpp (c) 0.10 bpp


7.5

Conclusion

We successfully proposed a steganalysis technique to identify the steganographic payload location in binary stego images. This work was motivated by the concept developed in [62], where greyscale stego images are used. We enhanced the concept and applied it to binary stego images, proposing Gaussian smoothing to estimate the cover images and employing local entropy to improve the identification accuracy. Experimental results showed that our proposed method provides reliable identification accuracy of at least 84 per cent for N = 100 and more than 90 per cent when N > 200. The experimental results also showed that our proposed method can provide nearly perfect (about 99 per cent) identification for N as low as 100 in non-textual stego images. It is important to note that our proposed method will not achieve the same accuracy if only one stego image is available. Although this may seem like a downside, it is unavoidable: with a single stego image, we would not have sufficient evidence to locate the unchanged pixels whose LSBs already matched the message bits. As a result, the problem of high false negatives discussed in Section 7.2 (second paragraph) would occur.


Chapter 8 Feature-Pooling Blind JPEG Image Steganalysis


From a practical point of view, blind steganalysis is more useful because, if an image is suspected of carrying a secret message, we can use blind steganalysis first to detect the existence of a hidden message. We can then carry out further analysis, such as identifying the steganographic technique used. The exception is the rare case where a priori knowledge of the type of steganography is available (for example, when the computer of a suspect is confiscated and a certain steganography tool is found on it). This chapter focuses on blind steganalysis. The analysis is carried out and tested on greyscale JPEG image steganography. We will study several existing blind steganalysis techniques for JPEG images, especially their feature extraction techniques, and then select and combine the features to form a pooled feature set. The rest of the chapter is structured as follows. In the next section, we discuss the feature extraction techniques. The proposed feature-pooling steganalysis is given in Section 8.2. Section 8.3 presents the experimental results and the chapter is concluded in Section 8.4.

8.1

Feature Extraction Techniques

Feature extraction plays an important role in blind steganalysis. A good feature should be representative of, and sensitive to, steganographic operations; moreover, it should be insensitive to image content. In the following subsections, several well-known steganalysis techniques are discussed, with emphasis on their feature extraction algorithms.

8.1.1

Image Quality Metrics

In [4], the authors proposed and selected a set of ten image quality metrics: the mean absolute error, mean square error, Czekanowski correlation, angle mean, image fidelity, cross correlation, spectral magnitude distance, median block spectral phase distance, median block weighted spectral distance and normalised mean square HVS error. These metrics are selected based on a one-way ANOVA test. Among these metrics, some are more sensitive in detecting active warden steganography, while the other four are more sensitive in detecting passive warden steganography. Active warden steganography is constructed to withstand alterations (robustness) made by the warden (steganalyst). Robustness is not the main objective in passive warden steganography; rather, the objective is to conceal the existence of a secret message so as to create a covert communication (the description of active and passive wardens is given in Section 2.2). The metric sensitivity is based on the statistical significance obtained from the ANOVA tests, where the tests are performed on active and passive warden steganography separately.
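Two of the ten metrics can be sketched directly; the others follow the same pattern of comparing an image against a reference (often a filtered version of itself). The helper names below are our own, and this is an illustration, not the implementation of [4].

```python
import numpy as np

def mean_square_error(a, b):
    """Mean square error between two images."""
    return float(np.mean((a.astype(float) - b.astype(float)) ** 2))

def cross_correlation(a, b):
    """Normalised cross correlation between two (non-constant) images."""
    a = a.astype(float).ravel() - a.mean()
    b = b.astype(float).ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16))
assert mean_square_error(img, img + 1) == 1.0          # uniform +1 shift
assert abs(cross_correlation(img, img) - 1.0) < 1e-12  # self-similarity
```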

8.1.2

Moment of Wavelet Decomposition

Lyu and Farid [73] proposed using higher-order statistics as features: the mean, variance, skewness and kurtosis. Two sets of these higher-order statistics are obtained, resulting in 72-dimensional features. The first set is acquired from a wavelet decomposition based on separable quadrature mirror filters. In the decomposition, a given image is decomposed into multiple orientations and scales. Each scale has three orientations: vertical, horizontal and diagonal subbands. The elements in each subband are called wavelet subband coefficients. The original paper used the first three scales, producing nine subbands. The mean, variance, skewness and kurtosis of the coefficients in each wavelet subband give 36 statistics, which are used as the first set of features. Next, based on the nine decomposed subbands, a linear predictor of the wavelet

coefficients is obtained from the neighbouring wavelet coefficients for each vertical, horizontal and diagonal subband. The linear relationship for the predictor is defined as

    V = Qw,    (8.1)

where w is the weight vector, V is the vector of vertical subband coefficients and Q holds the neighbouring coefficients. The error log of the linear predictor is obtained as

    E = \log_2(V) - \log_2(|Qw|).    (8.2)

The same linear predictor and error log are applied to the horizontal and diagonal subbands. The second set of features is composed from the error logs of all nine subbands, again using the mean, variance, skewness and kurtosis, which yields another 36-dimensional feature set.
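The first half of this feature set can be sketched as follows. The sketch uses a one-level Haar decomposition as a simple stand-in for the paper's quadrature mirror filters; the predictor-error half is built the same way from the log errors. Function names are our own.

```python
import numpy as np

def haar_subbands(x):
    """One-level 2-D Haar decomposition (a stand-in for the separable
    QMF decomposition of [73]): returns the LL, HL, LH and HH subbands."""
    x = x.astype(float)
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 4, (a - b + c - d) / 4,
            (a + b - c - d) / 4, (a - b - c + d) / 4)

def four_stats(sub):
    """Mean, variance, skewness and kurtosis of subband coefficients."""
    v = sub.ravel()
    m, s = v.mean(), v.std()
    return [m, s ** 2, ((v - m) ** 3).mean() / s ** 3,
            ((v - m) ** 4).mean() / s ** 4]

rng = np.random.default_rng(1)
img = rng.normal(size=(32, 32))
# One scale contributes 4 statistics per detail subband (HL, LH, HH);
# three scales would give the paper's 36-dimensional first half.
feats = [s for sub in haar_subbands(img)[1:] for s in four_stats(sub)]
assert len(feats) == 12
```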

8.1.3

Feature-Based

In [35], the cover image is estimated from a stego image using calibration. A set of 20-dimensional features is constructed from the L1 norm of the difference between the features of the estimated cover image and those of the stego image, where the L1 norm is defined as the sum of the absolute values of the vector elements. Among these features, 17 are first-order features, defined as follows:

Global histogram: the frequency plot of the quantised DCT coefficients.

Individual histogram: the histogram of an individual low-frequency DCT mode, where five DCT modes are selected.

Dual histogram: the frequency of occurrence of the (i, j)-th quantised DCT coefficient in an 8 × 8 block being equal to a fixed value d over the whole image, defined as

    g^d_{i,j} = \sum_{k=1}^{B} \delta(d, d_k(i,j)),    (8.3)

where \delta(u, v) = 1 if u = v and 0 otherwise, d_k(i,j) is the quantised DCT coefficient at the (i,j)-th position of the k-th block, B is the total number of blocks in the JPEG image and 11 values of d are selected.

The other three second-order features are given below.
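The dual histogram of Equation (8.3) reduces to a vectorised count. A hedged NumPy sketch, assuming the quantised DCT coefficients are already arranged as a stack of 8 × 8 blocks:

```python
import numpy as np

def dual_histogram(blocks, d):
    """Eq. (8.3): for each DCT mode (i, j), count the 8x8 blocks whose
    quantised coefficient at (i, j) equals the fixed value d.
    `blocks` has shape (B, 8, 8)."""
    return (blocks == d).sum(axis=0)

blocks = np.zeros((5, 8, 8), dtype=int)   # five all-zero toy blocks
blocks[0, 0, 1] = 3                       # one coefficient set to 3
assert dual_histogram(blocks, 3)[0, 1] == 1
assert dual_histogram(blocks, 0)[0, 1] == 4
```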

111

Variation measures the inter-block dependency and is defined as

    V = \frac{\sum_{i,j=1}^{8} \sum_{k=1}^{|I_r|-1} |d_{I_r(k)}(i,j) - d_{I_r(k+1)}(i,j)| + \sum_{i,j=1}^{8} \sum_{k=1}^{|I_c|-1} |d_{I_c(k)}(i,j) - d_{I_c(k+1)}(i,j)|}{|I_r| + |I_c|},    (8.4)

where I_r and I_c are the collections of blocks scanned by rows and by columns, respectively, throughout the image, and d_{I_r(k)}(i,j) is the quantised DCT coefficient at the (i,j)-th position of the k-th block in the row scan.

Blockiness measures the spatial inter-block boundary discontinuity and is defined as

    B_\alpha = \frac{\sum_{i=1}^{\lfloor (M-1)/8 \rfloor} \sum_{j=1}^{N} |p_{8i,j} - p_{8i+1,j}|^\alpha + \sum_{j=1}^{\lfloor (N-1)/8 \rfloor} \sum_{i=1}^{M} |p_{i,8j} - p_{i,8j+1}|^\alpha}{N \lfloor (M-1)/8 \rfloor + M \lfloor (N-1)/8 \rfloor},    (8.5)

where \alpha = 1, 2 and p_{i,j} is the spatial pixel value; M and N are the image dimensions.

Another three final features, which make up the total of 23-dimensional features in [35], are the co-occurrence matrices, defined as

    C_{st} = \frac{\sum_{i,j=1}^{8} \sum_{k=1}^{|I_r|-1} \delta(s, d_{I_r(k)}(i,j)) \, \delta(t, d_{I_r(k+1)}(i,j)) + \sum_{i,j=1}^{8} \sum_{k=1}^{|I_c|-1} \delta(s, d_{I_c(k)}(i,j)) \, \delta(t, d_{I_c(k+1)}(i,j))}{|I_r| + |I_c|},    (8.6)

where s, t \in \{-1, 0, 1\}, which form nine combinations. From these combinations, the three final features are obtained as the difference between C_{st} of the estimated cover image and C_{st} of the stego image.
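The blockiness of Equation (8.5) can be sketched as below; the sketch assumes a greyscale image, and the equation's 1-based boundary indices are shifted to 0-based array indices.

```python
import numpy as np

def blockiness(p, alpha=1):
    """Eq. (8.5): sum |p[8i,j] - p[8i+1,j]|^alpha over horizontal
    8x8-block boundaries (1-based indices, hence the -1 below), plus the
    vertical counterpart, normalised by the number of boundary pixels."""
    p = p.astype(float)
    M, N = p.shape
    h = sum(np.sum(np.abs(p[8 * i - 1, :] - p[8 * i, :]) ** alpha)
            for i in range(1, (M - 1) // 8 + 1))
    v = sum(np.sum(np.abs(p[:, 8 * j - 1] - p[:, 8 * j]) ** alpha)
            for j in range(1, (N - 1) // 8 + 1))
    return (h + v) / (N * ((M - 1) // 8) + M * ((N - 1) // 8))

# A sharp step exactly on the block boundary is pure blockiness:
# 16 discontinuous boundary pixels out of 32 boundary pixels in total.
img = np.zeros((16, 16))
img[8:, :] = 1
assert blockiness(img) == 0.5
```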

8.1.4

Moment of CF of PDF

The use of characteristic functions (CF) in steganalysis was pioneered by Harmsen and Pearlman in [48]. In their work, they assume the stego image histogram is the convolution of the hidden message probability mass function and the cover image histogram, because a steganographic operation can be considered as noise addition. The characteristic function is obtained by applying the discrete Fourier transform to the probability density function (PDF) of an image. From this characteristic function, the first-order absolute moment (or centre of mass, in their paper) is computed and used as the feature:

    M = \frac{\sum_{k=0}^{K} k \, |H[k]|}{\sum_{k=0}^{K} |H[k]|},    (8.7)

where H[\cdot] is the characteristic function, K = N/2 - 1 and N is the width of the domain of the PDF.
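The moment of Equation (8.7), generalised to order α, can be sketched directly from a histogram; the half-spectrum cut-off below mirrors the N/2 width mentioned above.

```python
import numpy as np

def cf_moments(hist, orders=(1, 2, 3)):
    """Absolute CF moments: M_alpha = sum k^alpha |H[k]| / sum |H[k]|,
    where H is the DFT of the normalised histogram and k runs over the
    first half of the spectrum (Eq. 8.7 is the alpha = 1 case)."""
    pdf = np.asarray(hist, float)
    pdf = pdf / pdf.sum()
    H = np.abs(np.fft.fft(pdf))[: len(pdf) // 2]
    k = np.arange(len(H))
    return [float((k ** a * H).sum() / H.sum()) for a in orders]

# A flat histogram has an impulse-like CF, so all moments are ~0;
# an impulse histogram has a flat CF, giving the largest first moment.
assert max(cf_moments(np.ones(8))) < 1e-9
assert abs(cf_moments([1, 0, 0, 0, 0, 0, 0, 0])[0] - 1.5) < 1e-12
```

Because embedding acts as lowpass noise on the histogram, these moments tend to shrink after embedding, which is what makes them usable as features.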

8.2

Feature-Pooling Steganalysis

This section discusses the proposed feature-pooling method, motivated by the feature selection capability discussed in [55]. The proposed method selects from the existing sensitive discriminant features and pools them with two additional feature sets from different feature extraction techniques.

8.2.1

Feature Selection in Feature-Based Method

The first set of the proposed pooled features is obtained from [35]. The features from [35] are selected because they include first- and second-order features that are sensitive to steganographic operations; in addition, the experiments carried out in Section 8.3.2 proved the efficacy of these features. The feature selection technique used is the sequential forward floating selection (SFFS) technique from [93]. As shown experimentally by Jain and Zongker [55], SFFS dominated the other feature selection techniques tested. We also tested other selection techniques based on the T-test and the Bhattacharyya distance; the experimental results again showed the superiority of SFFS. A comparison of the results is summarised in Table 8.1.

Table 8.1: Feature selection comparison (AUR) for SFFS, T-test and Bhattacharyya

                  F5        OutGuess   MB1
SFFS              0.86528   0.84505    0.80208
T-test            0.86063   0.84185    0.78638
Bhattacharyya     0.85447   0.84232    0.79566

We also compared the efficiency of the feature set selected through SFFS with that of the original 23-dimensional feature set. The comparison was made using three steganographic models: F5, OutGuess(1) and model-based steganography(2) (MB1), from [110], [90] and [96], respectively. The F5, OutGuess and MB1 steganography are discussed in Section 3.1. The area under the ROC curve (AUR) is used to evaluate the detection accuracy and is shown in Figures 8.1, 8.2 and 8.3; the higher the AUR, the better the detection accuracy. It can be clearly seen that the selected feature set performs better than the original 23-dimensional feature set for all three steganographic models. The Y-axis in each graph represents the AUR, ranging from 0.5 to 1, and the X-axis is the top-n ranked number of selected and combined features produced by SFFS. The squared asterisk corresponds to the AUR of the original 23-dimensional feature set and the circled asterisk to that of the selected top-n ranked features. The selected features form the best feature set with optimum discriminant capability.

Figure 8.1: Comparison between the selected features and the original features in detecting F5 (maximum AUR 0.86528 with a combination of 9 features, versus 0.85447 for the original 23 features)
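The SFFS procedure can be sketched as follows. This is a compact illustration of the forward and floating-backward steps with a user-supplied criterion function, not the exact algorithm of [93], which includes extra safeguards (for example against cycling between the two steps).

```python
import numpy as np

def sffs(n_features, crit, k):
    """Sequential forward floating selection, sketched: each forward step
    adds the best remaining feature; each floating backward step removes
    a feature again while doing so improves on the best score recorded
    for the smaller subset size."""
    selected, best = [], {}
    while len(selected) < k:
        rest = [i for i in range(n_features) if i not in selected]
        selected.append(max(rest, key=lambda i: crit(selected + [i])))
        best[len(selected)] = crit(selected)
        while len(selected) > 2:
            sub = max(([s for s in selected if s != j] for j in selected),
                      key=crit)
            if crit(sub) > best.get(len(sub), -np.inf):
                selected, best[len(sub)] = sub, crit(sub)
            else:
                break
    return selected

# Toy criterion: subset {0, 2} is optimal, extra features cost 0.1 each.
crit = lambda s: len(set(s) & {0, 2}) - 0.1 * len(s)
assert sorted(sffs(5, crit, 2)) == [0, 2]
```

In the thesis's setting, `crit` would be the classification performance (e.g. AUR) obtained with the candidate feature subset.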

8.2.2

Feature-Pooling

Pooling the selected feature set from the SFFS technique in Section 8.2.1 with two additional feature sets from dierent feature extraction techniques creates the
Notes: (1) OutGuess steganography with statistic correction. (2) Model-based steganography without deblocking.


Figure 8.2: Comparison between the selected features and the original features in detecting OutGuess (maximum AUR 0.84505 with a combination of 18 features, versus 0.83394 for the original 23 features)
Figure 8.3: Comparison between the selected features and the original features in detecting MB1 (maximum AUR 0.80208 with a combination of 17 features, versus 0.78451 for the original 23 features)

final feature set for blind steganalysis. The first additional set is extracted from the image quality metrics developed in [4]. The second additional set comes from the feature extraction developed in [48], namely the moment of the characteristic function computed from the image PDF. These two feature sets are discussed in Sections 8.1.1 and 8.1.4, respectively. Based on the analysis given in the original paper [4], we chose the four features assigned to the passive warden steganography case, because the blind steganalysis proposed in this research also targets passive warden steganography. From these four features, however, we excluded the angle mean feature because the images tested here are all greyscale; the contribution of the angle mean feature is significant only when colour images are used. For the next pooled features, the original feature proposed in [48] has only the first moment, so we increase it to the second and third moments according to

    M_\alpha = \frac{\sum_{k=0}^{K} k^\alpha |H[k]|}{\sum_{k=0}^{K} |H[k]|},    (8.8)

for \alpha = 1, 2, 3.

Increasing the moment to a higher order does not always improve the result significantly, as has been well justified in [109]. Furthermore, in our experiments, we found that it is sufficient to use only the first three orders.

8.3

Experimental Results

This section presents and analyses the experimental results. First we choose the optimal classifier for our proposed blind steganalysis; this is followed by comparisons with some existing blind steganalysis techniques. In the construction of the image database, 2037 images of four different sizes (512 × 512, 608 × 608, 768 × 768 and 1024 × 1024) were downloaded from [47]. All images were cropped to obtain the centre portion of the image and converted to greyscale. F5, OutGuess and MB1 were selected as the steganographic models for creating three different types of stego images. To have a percentage-wise equal number of changes over all images, we define the embedding rate in terms of bits per embeddable quantised DCT coefficient of the cover image, where the embeddable coefficients are those that can carry the message bits in each steganographic model. We used four embedding rates (5, 25, 50 and 100 per cent), resulting in a mixture of 10,185 cover and stego images in the database. We employ a uniform distribution of random message bits for the experiments, and the prototype is implemented in Matlab R2008a.

8.3.1

Classifier Selection

We selected four types of classifiers: multivariate regression, Fisher linear discriminant, support vector machine and neural network. Concise explanations of these classifiers are available in Section 2.4.2. In this section, we compare the different classifiers using the same proposed feature set; the purpose is to choose the optimal combination of the proposed feature set and classifier. To test the flexibility and consistency of this combination, the same three steganographic models were used.


Figures 8.4, 8.5 and 8.6 compare the ROC curves for F5, OutGuess and MB1, respectively. In each figure, the Y-axis represents the detection rate and the X-axis the false alarm rate; each axis ranges from zero to one. The value shown inside the brackets is the AUR value, indicating the detection accuracy. NN, FLD, SVM and MR stand for neural network, Fisher linear discriminant, support vector machine and multivariate regression, respectively. For all steganographic models, classification using the neural network produced the highest AUR values. This indicates that the combination of the proposed feature set and the neural network provides optimal blind steganalysis. Thus, our blind steganalysis is constructed by combining the proposed feature set with a neural network classifier.
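The AUR used throughout these comparisons can be computed directly from classifier scores. A minimal sketch using the rank-sum identity (function name is our own):

```python
import numpy as np

def aur(cover_scores, stego_scores):
    """Area under the ROC curve via the Mann-Whitney identity: the
    probability that a randomly chosen stego score exceeds a randomly
    chosen cover score (ties count half)."""
    c = np.asarray(cover_scores, float)[:, None]
    s = np.asarray(stego_scores, float)[None, :]
    return float((s > c).mean() + 0.5 * (s == c).mean())

assert aur([0.1, 0.2], [0.8, 0.9]) == 1.0     # perfect detector
assert aur([0.5, 0.5], [0.5, 0.5]) == 0.5     # chance-level detector
```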
Figure 8.4: Classifier comparison using the proposed features in detecting F5 (AUR: NN 0.9359, FLD 0.87976, SVM 0.87766, MR 0.75908)

Figure 8.5: Classifier comparison using the proposed features in detecting OutGuess (AUR: NN 0.91068, FLD 0.86376, SVM 0.89634, MR 0.88408)


Figure 8.6: Classifier comparison using the proposed features in detecting MB1 (AUR: NN 0.76213, FLD 0.7232, SVM 0.72565, MR 0.64392)

8.3.2

Results Comparison

This section compares the performance of our proposed blind steganalysis to that of selected existing blind steganalysis techniques. From the constructed image database, 80 per cent of the images are used for training and the remaining 20 per cent for testing. The same steganographic models (i.e., F5, OutGuess and MB1) are used, and the classification for each steganography is carried out separately. The following blind steganalysis techniques are selected for the detection performance comparison: image quality metrics combined with the multivariate regression classifier [4] (IQM); moments of wavelet decomposition combined with the SVM classifier [73] (Farid); the feature-based method combined with the SVM classifier [85] (FB); and moments of the characteristic function of the image PDF combined with the Fisher linear discriminant classifier [48] (COM). (Although the original paper [35] used the Fisher linear discriminant as the classifier, the later paper [85] obtained an improvement by using an SVM.) Figure 8.7 shows the ROC curves and AUR values for our proposed method and the other blind steganalysis techniques at an embedding rate of 25 per cent. From the best ROC curve at the top left of the graph down to the diagonal, the AUR values are 0.9359 for our proposed method, followed by the FB method at 0.72827; the Farid method, at 0.53736, is slightly better than the COM and IQM methods at 0.52292 and 0.51072, respectively. Our proposed method outperformed all other


blind steganalysis techniques in detecting F5.

Figure 8.7: Comparison of steganalysis performance in detecting F5 (AUR: PF 0.9359, Farid 0.53736, Feature-Based 0.72827, IQM 0.51072, COM 0.52292)

From Figure 8.8, the steganalysis results in detecting OutGuess show that the FB method is competitive with our proposed method; the difference in AUR values is only 0.0243. Our proposed method is nevertheless better overall, especially at a lower false alarm rate. This property is desirable, because an optimal blind steganalysis should be able to classify correctly with a low false alarm rate. The other three blind steganalysis techniques do not show good performance at this low embedding rate, and their AUR values are centred around 0.51. Although it is well known that, among the three steganographies (F5, OutGuess and MB1), OutGuess is relatively easy to detect, we obtained relatively lower AUR values than for detecting F5. This is because we are using bits per embeddable quantised DCT coefficient as the embedding rate, which reduces the message length embedded by OutGuess; in other words, we are using a shorter message length in our experiments, which makes detection more difficult. Figure 8.9 compares the detection performance in detecting MB1. Again, our proposed method outperformed all other blind steganalysis techniques, with an AUR value of 0.76213. This exceeds the AUR values of 0.64868, 0.53785, 0.53025 and 0.50423 for FB, Farid, IQM and COM, respectively. All the AUR values in Figure 8.9 are low compared to those in Figures 8.7 and 8.8. This finding is consistent with [35, 65, 66], which indicated that MB1 is the hardest to detect.


Figure 8.8: Comparison of steganalysis performance in detecting OutGuess (AUR: PF 0.91068, Farid 0.53844, Feature-Based 0.88637, IQM 0.50571, COM 0.50159)


Figure 8.9: Comparison of steganalysis performance in detecting MB1 (AUR: PF 0.76213, Farid 0.53785, Feature-Based 0.64868, IQM 0.53025, COM 0.50423)

8.4

Conclusion

In this research, we proposed a feature-pooling method for building a blind steganalysis feature set. We applied the SFFS technique to select the key significant features from the feature-based method [35] and then combined them with two additional feature sets: the image quality metrics [4] and the modified first three moments of the characteristic function computed from the image PDF [48]. Based on these feature sets, we employed a neural network as the classifier to construct a blind JPEG image steganalysis. From the experimental results, we concluded that our proposed blind steganalysis showed an improvement over, and dominated, the other tested blind steganalysis techniques.


Chapter 9 Improving JPEG Image Steganalysis


Although the performance of blind steganalysis is often inferior to that of a targeted one, its flexibility and wide coverage of different steganographic methods make it an attractive and practical choice. This chapter focuses on blind steganalysis; specifically, we will propose a technique for improving some of the existing steganalysis techniques. To do so, we propose to minimise the image-to-image variations, which increases the discriminative ability of a feature set. We will illustrate the efficiency of the proposed method by incorporating it into several existing blind JPEG image steganalysis techniques. The experimental results presented will verify the feasibility and applicability of the proposed technique. The remainder of this chapter is as follows. The next section models the steganographic artefact as additive noise. The proposed method is discussed in Sections 9.2 and 9.3. Section 9.4 presents the experimental results and Section 9.5 concludes the chapter.

9.1

Steganography as Additive Noise

Let X denote an instance of a JPEG cover image and let P_C(x) denote the probability mass function of a cover image. In a JPEG image, the probability mass function can be considered as the frequency count of the quantised DCT coefficients.


The secret message probability mass function can be defined as the distribution of the additive stego noise, as follows:

    P_N(n) = P(x' - x = n),    (9.1)

where x and x' are the quantised DCT coefficients before and after embedding, respectively. Generally, we can divide a cover image used in steganography into two parts, x_c and x_s. Part x_c is the unperturbed part and normally consists of a group of the most significant bits; x_s is the part that will be altered and used to carry the secret message, and normally contains a group of less significant bits. Since the additive stego noise is independent of the cover image, perturbing the x_s part by embedding secret message bits into it is equivalent to convolving the additive stego noise probability mass function with the cover image probability mass function:

    P_S(n) = P_N(n) * P_C(n),    (9.2)

where * denotes convolution and P_S(n) is the stego image probability mass function.
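Equation (9.2) is easy to verify numerically. The toy cover and noise PMFs below are illustrative assumptions, not measured data.

```python
import numpy as np

# Toy cover histogram of quantised DCT coefficients over values -2..2.
p_cover = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
# Additive stego noise over -1..1: unchanged with probability 0.5,
# shifted by +/-1 with probability 0.25 each.
p_noise = np.array([0.25, 0.5, 0.25])

# Eq. (9.2): the stego histogram is the convolution of the two PMFs.
p_stego = np.convolve(p_cover, p_noise)

assert abs(p_stego.sum() - 1.0) < 1e-12   # still a valid distribution
assert p_stego.max() < p_cover.max()      # embedding flattens the peak
```

The flattening of the histogram peak is the statistical footprint that CF-moment features (Section 9.3.2) pick up.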

9.2

Image-to-Image Variation Minimisation

Defining a discriminative feature set in image steganalysis is a challenging task because the feature set should be optimally sensitive to steganographic alteration but not to image-to-image variations. Image-to-image variation is defined as the difference between the underlying statistic of one image and that of another. The underlying statistic can be the histogram distribution of the DCT coefficients or of the pixel intensities. For example, the images shown in Figure 9.1 are obviously different and, therefore, their underlying statistics (the histogram distributions shown below each image) differ. This difference is the image-to-image variation; in other words, it is caused by the difference in image content. The question of interest here is how we can categorise these images as either cover or stego images. Clearly, there is no consistent way to differentiate these two images by just examining the histogram distribution, because the distribution is rather random and different for each image.

Figure 9.1: Two images with their respective underlying statistics

If we apply feature extraction directly to the histogram distribution, the extracted feature will have poor discriminative capability because the image-to-image variation is large. Ideally, the cover image would be available together with the stego image during steganalysis detection; we could then subtract the cover image C from the stego image S directly:

    N = S - C.    (9.3)

Hence, N will be the stego noise and the image-to-image variation is minimal. Note that the subtraction is pixel-wise. However, this case is not typical; most of the time we have only one version of the image, either the cover or the stego image. It is therefore reasonable to estimate the cover image from the given image, so that we can minimise the image-to-image variation. To demonstrate the efficiency of our proposed technique, we will apply it to existing steganalysis techniques. Thus, we propose two different techniques for optimum performance of the respective existing steganalysis techniques. In the first technique, given the two versions of an image, we first extract the feature set for each image and compute the difference between them. In the second technique, we compute the pixel-wise difference between the two images and follow it with feature extraction.

Figure 9.2: Transformed image by scaling (left) and cropping (right)

The two proposed techniques are defined as follows:

    \beta_1 = F(\gamma) - F(\hat{\gamma}),    (9.4)
    \beta_2 = F(\epsilon), \quad \epsilon = \gamma - \hat{\gamma},    (9.5)

where \gamma and \hat{\gamma} are the given image (possibly a cover or a stego image) and the estimated cover image, respectively. The variable \epsilon is the additive stego noise generated by the embedding operation and \beta_i is the feature set produced by the feature-extraction technique F(\cdot). If \gamma is given as a cover image, then \beta_i \approx 0; if \gamma is given as a stego image, then the absolute value of \beta_i is always greater than zero, for i = 1, 2. In cover image estimation, we first decompress the JPEG image to the spatial domain and apply a transformation to the decompressed image. In our experiments, we employed scaling by bilinear interpolation (shown on the left of Figure 9.2) and cropping by four pixels in both the horizontal and vertical directions (shown on the right of Figure 9.2). We then recompress the transformed image back to the JPEG domain. In the decompression and recompression processes, we use the same JPEG quality as before the transformation to avoid double compression. Since steganography can be modelled as additive noise, the effect of the transformation can be regarded as noise neutralisation; hence, the cover image estimation is reasonable. Similar estimation approaches have proven efficient and can be found in [35, 61].
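The estimate-and-subtract idea can be sketched as follows. The cropping transform stands in for the full decompress-transform-recompress pipeline (the JPEG recompression step is omitted because it needs a codec); the function names are our own.

```python
import numpy as np

def estimate_cover(img, crop=4):
    """Crude cover estimate in the spirit of Section 9.2: crop a few
    pixels so the embedding no longer lines up with the sampling grid.
    (The thesis also recompresses to JPEG at the original quality; that
    step is omitted here.)"""
    return img[crop:, crop:]

def stego_noise(stego, cover):
    """Eq. (9.3): pixel-wise residual N = S - C."""
    return stego.astype(int) - cover.astype(int)

rng = np.random.default_rng(2)
cover = rng.integers(0, 255, size=(16, 16))
noise = rng.integers(0, 2, size=(16, 16))   # LSB-style +1 perturbation
stego = cover + noise
assert (stego_noise(stego, cover) == noise).all()
assert estimate_cover(stego).shape == (12, 12)
```

With the true cover, the residual recovers the stego noise exactly; with an estimated cover, the residual approximates it, which is all that Equation (9.5) requires.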


9.3

Steganalysis Improvement

We will select three existing steganalysis techniques to demonstrate the efficiency of the proposed technique. In the following subsections, we discuss the incorporation of the proposed technique into each of the selected steganalysis techniques separately.

9.3.1

Moments of Wavelet Decomposition

Lyu and Farid [73] proposed using higher-order statistics as features: the mean, variance, skewness and kurtosis. Two sets of these higher-order statistics are obtained, resulting in 72-dimensional features. The first set is acquired from a wavelet decomposition based on separable quadrature mirror filters. A total of nine subbands are obtained, and the mean, variance, skewness and kurtosis are computed for each subband. These 36 higher-order statistics (nine subbands times four higher-order statistics), w_k for k = 1, 2, \ldots, 36, are used as the feature set. The second set of features is obtained from the log error of the linear predictor for each of the same nine subbands. The four higher-order statistics are computed for each log error, resulting in another 36-dimensional feature set, e_k for k = 1, 2, \ldots, 36. Instead of using the 72-dimensional features extracted directly from the image in the classification, we use the proposed technique discussed in Section 9.2. Specifically, we improve the feature discrimination capability by employing the second proposed technique, defined in Equation (9.5). Thus, our improved feature set is defined in the following equation:

    \epsilon = \gamma - \hat{\gamma},
    \tilde{w}_k = w_k(\epsilon),
    \tilde{e}_k = e_k(\epsilon),
    \beta = \tilde{w}_k + \tilde{e}_k.    (9.6)


9.3.2

Moment of CF of PDF

The use of characteristic functions (CF) was pioneered by Harmsen and Pearlman in [48]. The CF is obtained by applying a discrete Fourier transform to the PDF of an image. From this characteristic function, the first-order absolute moment is computed and used as the feature:

    M = \frac{\sum_{k=0}^{K} k \, |H[k]|}{\sum_{k=0}^{K} |H[k]|},    (9.7)

where H[\cdot] is the characteristic function, K = N/2 - 1 and N is the width of the domain of the PDF. The feature set proposed in [48] has only one feature, the first moment. We increase it to the second and third moments according to

    M_\alpha = \frac{\sum_{k=0}^{K} k^\alpha |H[k]|}{\sum_{k=0}^{K} |H[k]|},    (9.8)

where \alpha \in \{1, 2, 3\}. Increasing the moment to a higher order does not always improve the result significantly and is therefore not necessary [109]. By incorporating the proposed technique defined in Equation (9.4), we obtain a new set of features, defined as follows:

    \beta = F(\gamma) - F(\hat{\gamma}) = M_\alpha - \hat{M}_\alpha,    (9.9)

where \hat{M}_\alpha denotes the moments computed from the estimated cover image.

9.3.3

Moment of CF of Wavelet Subbands

The basis of the feature set proposed in [115] is a Haar wavelet decomposition. The authors decompose the image into 12 subbands, denoted LL_i, HL_i, LH_i and HH_i for i = 1, 2, 3. The given image histogram, denoted LL_0, is also employed. Essentially, the probability mass function can be considered as the distribution for


the wavelet subbands and the image histogram. Motivated by the characteristic function from [48], the authors construct characteristic functions from all the wavelet subbands and the image histogram, resulting in 13 characteristic functions. After that, the first three moments of each characteristic function are computed as

    M_\alpha = \frac{\sum_{k=0}^{N/2} f_k^\alpha |H(f_k)|}{\sum_{k=0}^{N/2} |H(f_k)|},    (9.10)

where \alpha = 1, 2 and 3 are the three moments, f_k is the k-th frequency of the characteristic function H(\cdot), and N is the width of the domain of the probability mass function. Next, we improve this method by incorporating the proposed technique defined in Equation (9.4). The improved feature set can be defined as follows:

= =

() (), M M ,
j j

(9.11)

where Mj is the moment computed from the estimated cover image. j = 1, 2, . . . , 13 are the 13 characteristic functions.
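The 39-dimensional feature set (three moments for each of the 13 characteristic functions) can be sketched as below. Function names and the histogram binning are illustrative choices, and a one-step Haar filter stands in for the decomposition used in [115].

```python
import numpy as np

def haar_level(ll):
    """One Haar decomposition level: returns (LL, HL, LH, HH)."""
    a, b = ll[0::2, 0::2], ll[0::2, 1::2]
    c, d = ll[1::2, 0::2], ll[1::2, 1::2]
    return ((a + b + c + d) / 4, (a - b + c - d) / 4,
            (a + b - c - d) / 4, (a - b - c + d) / 4)

def moments_of_cf(values, orders=(1, 2, 3), bins=64):
    """First three absolute CF moments of one distribution."""
    pdf, _ = np.histogram(values.ravel(), bins=bins, density=True)
    H = np.abs(np.fft.fft(pdf))[: bins // 2]
    k = np.arange(len(H), dtype=float)
    return [(k ** a * H).sum() / H.sum() for a in orders]

def mw_features(img, levels=3):
    """39 features: 3 CF moments for the image histogram (LL0) plus
    the 12 subbands LL_i, HL_i, LH_i, HH_i, i = 1..3."""
    feats = moments_of_cf(np.asarray(img, dtype=float))   # LL0
    ll = np.asarray(img, dtype=float)
    for _ in range(levels):
        ll, hl, lh, hh = haar_level(ll)
        for sb in (ll, hl, lh, hh):
            feats.extend(moments_of_cf(sb))
    return np.array(feats)
```

As in Equation (9.11), the improved features subtract the same 39 moments computed from the estimated cover image.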

9.4 Experimental Results

9.4.1 Experimental Setup

Since we are interested in comparing feature discrimination performance, we standardise the classification by using an SVM [9] as the classifier in all experiments. Three different steganographic methods, F5 [110], OutGuess [90] and MB1 [96], are employed to create three different types of stego images. To have a percentage-wise equal number of changes over all images, we define the embedding rate in terms of bits per embeddable quantised DCT coefficient for each steganographic method. We use four embedding rates: 5, 25, 50 and 100 per cent.

In the construction of the image database, 2037 images of four different sizes (512 x 512, 608 x 608, 768 x 768 and 1024 x 1024) were downloaded from [47].

Table 9.1: Performance comparison between the proposed technique and the Farid technique

| Steganography | 5% Improved | 5% Original | 25% Improved | 25% Original | 50% Improved | 50% Original | 100% Improved | 100% Original |
|---|---|---|---|---|---|---|---|---|
| F5 | 0.5160 | 0.5077 | 0.5850 | 0.5374 | 0.7236 | 0.5940 | 0.90437 | 0.7218 |
| OutGuess | 0.5120 | 0.4625 | 0.6460 | 0.5384 | 0.7983 | 0.6606 | 0.8811 | 0.7650 |
| MB1 | 0.5165 | 0.5064 | 0.6304 | 0.5379 | 0.7564 | 0.5754 | 0.8815 | 0.6485 |

All images were cropped to obtain the centre portion and then converted to greyscale. From the constructed database, 80 per cent is used for training and the remaining 20 per cent for testing. The prototype implementation is coded in Matlab R2008a.

9.4.2 Results Comparison

We compare the improved version, which uses our proposed method, to the original version of each of the three steganalysis techniques discussed in Section 9.3. The detection results were evaluated using the area under the ROC curve (AUR); a higher AUR value indicates better steganalysis performance. The obtained results are tabulated in Tables 9.1, 9.2 and 9.3. We abbreviate the original versions of the steganalysis techniques discussed in Subsections 9.3.1, 9.3.2 and 9.3.3 as Farid, COM and MW, respectively.

Table 9.1 shows the comparison between the improved version and the original version of the Farid technique. From the table, each AUR value of the improved version is larger than that of the corresponding original version. This indicates that the improved version outperformed the original version for all the steganographic models and all four embedded message lengths. Table 9.2 compares the improved version and the original version of the steganalysis technique in [48]. Although the improvement is not as large as that obtained in Table 9.1, the overall performance has been improved.
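The AUR reported in these tables can be computed directly from classifier scores. The sketch below is a minimal numpy implementation (the function name and the lack of tie handling are our own simplifications), using the rank-sum identity that the AUC equals the probability that a randomly chosen stego image scores higher than a randomly chosen cover image.

```python
import numpy as np

def aur(scores_cover, scores_stego):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity.
    Assumes no tied scores; higher score = more likely stego."""
    s = np.concatenate([scores_cover, scores_stego])
    ranks = s.argsort().argsort() + 1.0          # 1-based ranks
    n0, n1 = len(scores_cover), len(scores_stego)
    r1 = ranks[n0:].sum()                        # rank sum of stego scores
    return (r1 - n1 * (n1 + 1) / 2.0) / (n0 * n1)
```

Perfectly separated scores give 1.0, and chance-level scores give about 0.5, matching the interpretation of the AUR values in the tables.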


Table 9.2: Performance comparison between the proposed technique and the COM technique

| Steganography | 5% Improved | 5% Original | 25% Improved | 25% Original | 50% Improved | 50% Original | 100% Improved | 100% Original |
|---|---|---|---|---|---|---|---|---|
| F5 | 0.5055 | 0.5022 | 0.5279 | 0.5097 | 0.5654 | 0.5314 | 0.6392 | 0.5971 |
| OutGuess | 0.4838 | 0.4631 | 0.5261 | 0.4794 | 0.5408 | 0.4799 | 0.5686 | 0.4817 |
| MB1 | 0.5040 | 0.5020 | 0.5080 | 0.5029 | 0.5147 | 0.5030 | 0.5228 | 0.5052 |

Table 9.3: Performance comparison between the proposed technique and the MW technique

| Steganography | 5% Improved | 5% Original | 25% Improved | 25% Original | 50% Improved | 50% Original | 100% Improved | 100% Original |
|---|---|---|---|---|---|---|---|---|
| F5 | 0.5016 | 0.5009 | 0.5514 | 0.5202 | 0.7601 | 0.5625 | 0.8518 | 0.6667 |
| OutGuess | 0.5087 | 0.4793 | 0.5385 | 0.4927 | 0.5635 | 0.5064 | 0.6213 | 0.5307 |
| MB1 | 0.5105 | 0.5042 | 0.5116 | 0.5043 | 0.5648 | 0.5286 | 0.5747 | 0.5520 |


As for the third improved steganalysis technique, Table 9.3 clearly shows significant improvement in detecting all the steganographic models; the detection performance for the F5 steganographic model appears the most improved. This verifies the effectiveness of the proposed technique.

9.5 Conclusion

In conclusion, our proposed technique has improved the three selected steganalysis techniques by minimising image-to-image variations. To minimise the image-to-image variation, we estimate the cover image from the stego image and then compute the difference between the two. Finally, we extract the feature set from this difference. The experimental results demonstrate the effectiveness of the proposed technique.
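The calibration idea summarised here can be sketched as follows. This is an illustration only: the thesis's actual cover estimator (Equation (9.4)) is not reproduced, so a simple 3x3 mean filter stands in for it, and `extract` may be any feature extractor such as those of Sections 9.3.1-9.3.3.

```python
import numpy as np

def estimate_cover(stego):
    """Stand-in cover estimator: a 3x3 mean filter.  The exact estimator
    used in the thesis is not reproduced; smoothing merely illustrates
    'estimate the cover from the stego image'."""
    p = np.pad(np.asarray(stego, dtype=float), 1, mode="edge")
    h, w = np.asarray(stego).shape
    return sum(p[i:i + h, j:j + w]
               for i in range(3) for j in range(3)) / 9.0

def calibrated_features(stego, extract):
    """Feature calibration: subtract the features of the estimated cover
    from the features of the stego image (the Equation (9.5)-style
    variant of the proposed technique)."""
    return extract(stego) - extract(estimate_cover(stego))
```

Because the mean filter preserves constant images, a flat image yields an all-zero calibrated feature vector, which is exactly the behaviour the calibration aims for: features driven by embedding artefacts rather than image content.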


Chapter 10
Conclusions and Future Research Directions

10.1 Summary

In this thesis, we investigated steganalysis techniques that extract information related to a secret message hidden in a multimedia document. In particular, we focused our analysis on steganographic methods that use binary images as the medium for a secret message. We organised our work according to the amount of information extracted about the hidden message (i.e., the organisation structured in Section 3.2). The work presented in this thesis is summarised below.

1. Blind steganalysis. We studied and analysed different image characteristics. These images are produced by three different steganographic methods. Based on the analysis, we developed an effective feature extraction technique to extract a set of sensitive and discriminating features. Using an SVM as the classifier, we constructed a blind steganalysis to detect the presence of secret messages embedded in binary images.

2. Multi-class steganalysis. To the best of our knowledge, no multi-class steganalysis had been proposed for binary images at the time we published our multi-class steganalysis in [19]. Besides being able to detect the presence of a secret message in a binary image, this analysis reveals the type of steganographic method used to produce the stego image. This information is crucial and serves as an additional secret parameter that can narrow the scope of analysis. Thus, our multi-class steganalysis can be considered to extend blind steganalysis to a more involved level of analysis.

3. Message length estimation. Information such as the length of an embedded message is important. In this thesis, we proposed a technique for estimating the length of a message embedded in a binary image. Specifically, our technique attacks the steganographic method developed by Liang et al. in [69]. This type of analysis is normally considered targeted steganalysis, which plays an important role at other levels of analysis (i.e., retrieval of the stegokey).

4. Steganographic payload location identification. In general, the only evidence we need to break a steganography is to verify and prove that a secret message exists in the image. However, this does not provide enough information for us to locate the secret message. We developed a technique for identifying the steganographic payload locations, based on multiple stego images. Our technique can reveal which pixels in the binary image have been used to carry the secret message bits.

Finally, we revisited some of the existing blind steganalysis techniques for analysing JPEG images. We combined several types of features and applied a feature selection technique for the analysis, which not only improves the detection accuracy, but also reduces the computational resources. We showed that an enhancement can be obtained by minimising the influence of image content. In other words, we increased the feature sensitivity with respect to the differences caused by steganographic artefacts, rather than the image content.

Even though this thesis is formulated as an attack on binary image steganography, we hope that it will contribute to the design of more secure steganographic methods. More precisely, the analysis presented in this thesis can be used to evaluate and measure the security level of a steganographic method, instead of using conventional measurements, such as PSNR.

10.2 Future Research Directions

Modern blind steganalysis techniques are not universal in the sense that their effectiveness depends very much on both the type of cover images and the steganographic methods used. For example, blind steganalysis that is effective against JPEG image steganography will not be as effective when applied to a spatial domain image. Hence, future work should focus on constructing a truly universal steganalysis technique.

For multi-class steganalysis, the performance drops when the number of different steganographic methods used to train the classifier increases. This happens because the feature set may not be optimal. Another reason is that similarities in the embedding algorithms might exist across different steganographic methods, making them difficult to identify and differentiate. Thus, a more effective and discriminating feature set should be developed. In addition, a better strategy should be found and employed for constructing the multi-class steganalysis.

The identification of payload locations in this thesis has been simplified to represent a generic case. This can be seen in our experiments, where a generic LSB replacement steganography is used. A more challenging environment can be set up to include other, more complicated steganographies, such as steganography with adaptive embedding functionality.

Even though our steganographic payload identification technique can identify the locations with high accuracy, unfortunately, we cannot retrieve meaningful content of the secret message because what we have is a randomly scattered collection of message bits. We need to re-order them into the correct sequence to extract the message. Obviously, a more complete analysis, involving the correct sequence extraction, should be undertaken. To gain further insight into this problem, we should examine the retrieval of the stegokey. Unfortunately, this area has been scarcely studied, except for the material published in [41, 42].


Bibliography
[1] B. Anckaert, B. D. Sutter, D. Chanet, and K. D. Bosschere. Steganography for Executables and Code Transformation Signatures. 7th International Conference on Information Security and Cryptology, 3506:425-439, 2004.
[2] R. J. Anderson. Stretching the Limits of Steganography. 1st International Workshop on Information Hiding, 1174:39-48, 1996.
[3] R. J. Anderson and F. A. P. Petitcolas. On the limits of steganography. IEEE Journal of Selected Areas in Communications, 16(4):474-481, 1998.
[4] I. Avcibas, M. Nasir, and B. Sankur. Steganalysis based on image quality metrics. IEEE 4th Workshop on Multimedia Signal Processing, pages 517-522, 2001.
[5] S. Badura and S. Rymaszewski. Transform domain steganography in DVD video and audio content. IEEE International Workshop on Imaging Systems and Techniques, pages 1-5, 2007.
[6] J. D. Ballard, J. G. Hornik, and D. Mckenzie. Technological Facilitation of Terrorism: Definitional, Legal, and Policy Issues. American Behavioral Scientist, 45(6):989-1016, 2002.
[7] R. Böhme and A. Westfeld. Breaking Cauchy model-based JPEG steganography with first order statistics. 9th European Symposium on Research in Computer Security, 3193:125-140, 2004.
[8] C. Cachin. An Information-Theoretic Model for Steganography. 2nd International Workshop on Information Hiding, 1525:306-318, 1998.
[9] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm, 2001.
[10] C.-C. Chang, C.-S. Tseng, and C.-C. Lin. Hiding data in binary images. 1st International Conference on Information Security Practice and Experience, 3439:338-349, 2005.
[11] S. Chatterjee and A. S. Hadi. Regression Analysis by Example. John Wiley and Sons, 4th edition, 2006.
[12] A. Cheddad, J. Condell, K. Curran, and P. McKevitt. Digital image steganography: Survey and analysis of current methods. Signal Processing, 90(3):727-752, 2010.

[13] C. Chen and Y. Q. Shi. JPEG Image Steganalysis Utilizing both Intrablock and Interblock Correlations. IEEE International Symposium on Circuits and Systems, pages 3029-3032, 2008.
[14] C. Chen, Y. Q. Shi, W. Chen, and G. Xuan. Statistical Moments Based Universal Steganalysis using JPEG 2-D Array and 2-D Characteristic Function. IEEE International Conference on Image Processing, pages 105-108, 2006.
[15] X. Chen, Y. Wang, T. Tan, and L. Guo. Blind Image Steganalysis Based on Statistical Analysis of Empirical Matrix. International Conference on Pattern Recognition, 3:1107-1110, 2006.
[16] Z. Chen, S. Haykin, J. J. Eggermont, and S. Becker. Correlative Learning: A Basis for Brain and Adaptive Systems. John Wiley and Sons, 2007.
[17] K. L. Chiew and J. Pieprzyk. Features-Pooling Blind JPEG Image Steganalysis. IEEE Conference on Digital Image Computing: Techniques and Applications, pages 96-103, 2008.
[18] K. L. Chiew and J. Pieprzyk. JPEG Image Steganalysis Improvement Via Image-to-Image Variation Minimization. International IEEE Conference on Advanced Computer Theory and Engineering, pages 223-227, 2008.
[19] K. L. Chiew and J. Pieprzyk. Binary Image Steganographic Techniques Classification Based on Multi-Class Steganalysis. 6th International Conference on Information Security, Practice and Experience, 6047:341-358, 2010.
[20] K. L. Chiew and J. Pieprzyk. Blind steganalysis: A countermeasure for binary image steganography. International Conference on Availability, Reliability and Security, pages 653-658, 2010.
[21] K. L. Chiew and J. Pieprzyk. Estimating Hidden Message Length in Binary Image Embedded by Using Boundary Pixels Steganography. International Conference on Availability, Reliability and Security, pages 683-688, 2010.
[22] K. L. Chiew and J. Pieprzyk. Identifying Steganographic Payload Location in Binary Image. 11th Pacific Rim Conference on Multimedia - Advances in Multimedia Information Processing, 6297:590-600, 2010.
[23] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
[24] I. Cox, J. Kilian, F. Leighton, and T. Shamoon. Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing, 6(12):1673-1687, 1997.
[25] I. J. Cox, M. L. Miller, J. A. Bloom, J. Fridrich, and T. Kalker. Digital watermarking and steganography. The Morgan Kaufmann series in multimedia information and systems. Morgan Kaufmann Publishers, 2nd edition, 2008.


[26] N. Cvejic and T. Seppanen. Increasing robustness of LSB audio steganography by reduced distortion LSB coding. Journal of Universal Computer Science, 11(1):56-65, 2005.
[27] I. Davidson and G. Paul. Locating Secret Messages in Images. 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 545-550, 2004.
[28] J. Davis, J. MacLean, and D. Dampier. Methods of information hiding and detection in file systems. 5th IEEE International Workshop on Systematic Approaches to Digital Forensic Engineering, pages 66-69, 2010.
[29] A. Delforouzi and M. Pooyan. Adaptive Digital Audio Steganography Based on Integer Wavelet Transform. 3rd International Conference on International Information Hiding and Multimedia Signal Processing, pages 283-286, 2007.
[30] J. Dong and T. Tan. Blind image steganalysis based on run-length histogram analysis. 15th IEEE International Conference on Image Processing, pages 2064-2067, 2008.
[31] J. Dong, W. Wang, and T. Tan. Multi-class blind steganalysis based on image run-length analysis. 8th International Workshop on Digital Watermarking, 5703:199-210, 2009.
[32] H. Farid. Detecting Steganographic Messages in Digital Images. TR2001-412, Department of Computer Science, Dartmouth College, 2001.
[33] H. Farid. Detecting Hidden Messages Using Higher-Order Statistical Models. International Conference on Image Processing, 2:905-908, 2002.
[34] R. Fisher, S. Perkins, A. Walker, and E. Wolfart. Hypermedia Image Processing Reference. Available at http://homepages.inf.ed.ac.uk/rbf/HIPR2/spatdom.htm.
[35] J. Fridrich. Feature-Based Steganalysis for JPEG Images and its Implications for Future Design of Steganographic Schemes. 6th International Workshop on Information Hiding, 3200:67-81, 2004.
[36] J. Fridrich and M. Goljan. Digital image steganography using stochastic modulation. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents V, 5020(1):191-202, 2003.
[37] J. Fridrich and M. Goljan. On estimation of secret message length in LSB steganography in spatial domain. Security, Steganography, and Watermarking of Multimedia Contents VI, 5306:23-34, 2004.
[38] J. Fridrich, M. Goljan, and D. Hogea. Attacking the OutGuess. Proceedings of ACM: Special Session on Multimedia Security and Watermarking, 2002.
[39] J. Fridrich, M. Goljan, and D. Hogea. Steganalysis of JPEG Images: Breaking the F5 Algorithm. 5th International Workshop on Information Hiding, 2578:310-323, 2003.

[40] J. Fridrich, M. Goljan, D. Hogea, and D. Soukal. Quantitative steganalysis of digital images: estimating the secret message length. Multimedia Systems, 9(3):288-302, 2003.
[41] J. Fridrich, M. Goljan, and D. Soukal. Searching for the Stego-Key. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents VI, 5306:70-82, 2004.
[42] J. Fridrich, M. Goljan, D. Soukal, and T. Holotyak. Forensic Steganalysis: Determining the Stego Key in Spatial Domain Steganography. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents VII, 5681:631-642, 2005.
[43] D. Fu, Y. Q. Shi, and D. Zou. JPEG Steganalysis Using Empirical Transition Matrix in Block DCT Domain. International Workshop on Multimedia Signal Processing, 2006.
[44] M. Goljan, J. Fridrich, and T. Holotyak. New blind steganalysis and its implications. Security, Steganography, and Watermarking of Multimedia Contents VIII, 6072, 2006.
[45] G. Golub and W. Kahan. Calculating the singular values and pseudo-inverse of a matrix. Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, 2(2):205-224, 1965.
[46] D. Gong, F. Liu, B. Lu, P. Wang, and L. Ding. Hiding information in Java class file. International Symposium on Computer Science and Computational Technology, 2:160-164, 2008.
[47] P. Greenspun. Philip Greenspun. Available at http://philip.greenspun.com.
[48] J. Harmsen and W. Pearlman. Steganalysis of additive-noise modelable information hiding. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents V, 5020:131-142, 2003.
[49] J. He and J. Huang. Steganalysis of stochastic modulation steganography. Science in China Series F: Information Sciences, 49(3):273-285, 2006.
[50] J. He, J. Huang, and G. Qiu. A New Approach to Estimating Hidden Message Length in Stochastic Modulation Steganography. 4th International Workshop on Digital Watermarking, 3710:1-14, 2005.
[51] S. Hetzl and P. Mutzel. A Graph-Theoretic Approach to Steganography. 9th IFIP TC-6 TC-11 International Conference on Communications and Multimedia Security, 3677:119-128, 2005.
[52] M. Hogan. Security and Robustness Analysis of Data Hiding Techniques for Steganography. PhD thesis, University College Dublin, 2008.
[53] C.-W. Hsu and C.-J. Lin. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13:415-425, 2002.


[54] F. Huang and J. Huang. Calibration based universal JPEG steganalysis. Science in China Series F: Information Sciences, 52(2):260-268, 2009.
[55] A. Jain and D. Zongker. Feature Selection: Evaluation, Application, and Small Sample Performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2), 1997.
[56] M. Jiang, N. Memon, E. Wong, and X. Wu. Quantitative steganalysis of binary images. IEEE International Conference on Image Processing, pages 29-32, 2004.
[57] M. Jiang, X. Wu, E. Wong, and A. Memon. Steganalysis of boundary-based steganography using autoregressive model of digital boundaries. IEEE International Conference on Multimedia and Expo, 2:883-886, 2004.
[58] L. Jodrá, A. G. Law, A. Rezazadeh, J. H. Watson, and G. Wu. Computations for the Moore-Penrose and other generalized inverses. Congressus Numerantium, pages 57-64, 1991.
[59] N. F. Johnson and S. Jajodia. Exploring steganography: Seeing the unseen. Computer, 31(2):26-34, 1998.
[60] J. Kelley. Terror groups hide behind Web encryption. USA Today. Available at http://www.usatoday.com/tech/news/2001-02-05-binladen.htm, 2 May 2001.
[61] A. Ker. Steganalysis of LSB Matching in Grayscale Images. IEEE Signal Processing Letters, 12(6):441-444, 2005.
[62] A. D. Ker. Locating Steganographic Payload via WS Residuals. 10th ACM Workshop on Multimedia and Security, pages 27-32, 2008.
[63] A. D. Ker and R. Böhme. Revisiting weighted stego-image steganalysis. Security, Forensics, Steganography, and Watermarking of Multimedia Contents X, 6819, 2008.
[64] A. D. Ker and I. Lubenko. Feature Reduction and Payload Location with WAM Steganalysis. Media Forensics and Security, 7254, 2009.
[65] M. Kharrazi, H. Sencar, and N. Memon. Benchmarking steganographic and steganalysis techniques. Proceedings of the SPIE on Security, Steganography, and Watermarking of Multimedia Contents VII, 5681:252-263, 2005.
[66] M. Kharrazi, H. Sencar, and N. Memon. Improving steganalysis by fusion techniques: a case study with image steganography. Proceedings of the SPIE on Security, Steganography, and Watermarking of Multimedia Contents VIII, 6072:51-58, 2006.
[67] C. Kim. Data hiding based on compressed dithering images. Advances in Intelligent Information and Database Systems, 283:89-98, 2010.


[68] X.-W. Kong, W.-F. Liu, and X.-G. You. Secret Message Location Steganalysis Based on Local Coherences of Hue. 6th Pacific-Rim Conference on Multimedia, 3768:301-311, 2005.
[69] G.-l. Liang, S.-z. Wang, and X.-p. Zhang. Steganography in binary image by checking data-carrying eligibility of boundary pixels. Journal of Shanghai University (English Edition), 11(3):272-277, 2007.
[70] Z. Liu, L. Ping, J. Chen, J. Wang, and X. Pan. Steganalysis based on differential statistics. 5th International Conference on Cryptology and Network Security, 4301:224-240, 2006.
[71] D.-C. Lou, C.-L. Liu, and C.-L. Lin. Message estimation for universal steganalysis using multi-classification support vector machine. Computer Standards & Interfaces, 31(2):420-427, 2009.
[72] J. Lukáš and J. Fridrich. Estimation of primary quantization matrix in double compressed JPEG images. Proceedings on Digital Forensic Research Workshop, pages 5-8, 2003.
[73] S. Lyu and H. Farid. Detecting Hidden Messages Using Higher-Order Statistics and Support Vector Machines. 5th International Workshop on Information Hiding, 2002.
[74] S. Lyu and H. Farid. Steganalysis using color wavelet statistics and one-class support vector machines. Security, Steganography, and Watermarking of Multimedia Contents VI, 5306:35-45, 2004.
[75] S. Lyu and H. Farid. Steganalysis Using Higher-Order Image Statistics. IEEE Transactions on Information Forensics and Security, 1(1):111-119, 2006.
[76] L. Marvel, C. G. Boncelet, Jr, and C. T. Retter. Spread Spectrum Image Steganography. IEEE Transactions on Image Processing, 8(8):1075-1083, 1999.
[77] Y.-y. Meng, B.-j. Gao, Q. Yuan, F.-g. Yu, and C.-f. Wang. A novel steganalysis of data hiding in binary text images. 11th IEEE Singapore International Conference on Communication Systems, pages 347-351, 2008.
[78] T. Morkel, J. H. P. Eloff, and M. S. Olivier. Using image steganography for decryptor distribution. OTM Confederated International Workshops, 4277:322-330, 2006.
[79] B. Morrison. Ex-USA Today reporter faked major stories. USA Today. Available at http://www.usatoday.com/news/2004-03-18-2004-0318 kelleymain x.htm, 19 March 2004.
[80] W. S. Noble. What is a support vector machine? Nature Biotechnology, 24(12), 2006.


[81] H. Noda, T. Furuta, M. Niimi, and E. Kawaguchi. Video steganography based on bit-plane decomposition of wavelet-transformed video. Security, Steganography, and Watermarking of Multimedia Contents VI, 5306(1):345-353, 2004.
[82] H.-K. Pan, Y.-Y. Chen, and Y.-C. Tseng. A secure data hiding scheme for two-color images. 5th IEEE Symposium on Computers and Communications, pages 750-755, 2000.
[83] H. Pang, K.-L. Tan, and X. Zhou. Steganographic schemes for file system and B-tree. IEEE Transactions on Knowledge and Data Engineering, 16:701-713, 2004.
[84] F. A. P. Petitcolas, R. J. Anderson, and M. G. Kuhn. Information hiding - a survey. Proceedings of the IEEE, 87(7):1062-1078, 1999.
[85] T. Pevný and J. Fridrich. Towards Multi-class Blind Steganalyzer for JPEG Images. International Workshop on Digital Watermarking, LNCS, 3710:39-53, 2005.
[86] T. Pevný and J. Fridrich. Determining the Stego Algorithm for JPEG Images. In Special Issue of IEE Proceedings - Information Security, 153(3):75-139, 2006.
[87] T. Pevný and J. Fridrich. Multi-class blind steganalysis for JPEG images. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents VIII, 6072(1):257-269, 2006.
[88] T. Pevný and J. Fridrich. Merging Markov and DCT features for multi-class JPEG steganalysis. Proceedings of the SPIE on Security and Watermarking of Multimedia Contents IX, 6505(1):1-13, 2007.
[89] T. Pevný and J. Fridrich. Multi-Class Detector of Current Steganographic Methods for JPEG Format. IEEE Transactions on Information Forensics and Security, 3(4):635-650, 2008.
[90] N. Provos. Defending Against Statistical Steganalysis. Proceedings of the 10th conference on USENIX Security Symposium, 10:323-335, 2001.
[91] N. Provos and P. Honeyman. Detecting Steganographic Content on the Internet. Proceedings of the Network and Distributed System Security Symposium, 2002.
[92] N. Provos and P. Honeyman. Hide and Seek: An Introduction to Steganography. IEEE Security & Privacy, 1(3):32-44, 2003.
[93] P. Pudil, J. Novovičová, and J. Kittler. Floating search methods in feature selection. Pattern Recognition Letters, 15(11):1119-1125, 1994.
[94] N. B. Puhan, A. T. S. Ho, and F. Sattar. High capacity data hiding in binary document images. 8th International Workshop on Digital Watermarking, 5703:149-161, 2009.

[95] B. Rodriguez and G. L. Peterson. Detecting steganography using multi-class classification. IFIP International Conference on Digital Forensics, 242:193-204, 2007.
[96] P. Sallee. Model-based steganography. 2nd International Workshop on Digital Watermarking, 2939:154-167, 2003.
[97] A. Savoldi and P. Gubian. Blind multi-class steganalysis system using wavelet statistics. 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2:93-96, 2007.
[98] G. Schaefer and M. Stich. UCID - An Uncompressed Colour Image Database. Proc. SPIE, Storage and Retrieval Methods and Applications for Multimedia, pages 472-480, 2004.
[99] Y. Q. Shi, C. Chen, and W. Chen. A Markov Process Based Approach to Effective Attacking JPEG Steganography. 8th International Workshop on Information Hiding, 4437:249-264, 2006.
[100] Y. Q. Shi, G. Xuan, C. Yang, J. Gao, Z. Zhang, P. Chai, D. Zou, C. Chen, and W. Chen. Effective Steganalysis Based on Statistical Moments of Wavelet Characteristic Function. International Conference on Information Technology: Coding and Computing, pages 768-773, 2005.
[101] Y. Q. Shi, G. Xuan, D. Zou, J. Gao, C. Yang, Z. Zhang, P. Chai, W. Chen, and C. Chen. Image Steganalysis Based on Moments of Characteristic Functions Using Wavelet Decomposition, Prediction-Error Image, and Neural Network. IEEE International Conference on Multimedia and Expo, pages 269-272, 2005.
[102] K. Sullivan, U. Madhow, S. Chandrasekaran, and B. Manjunath. Steganalysis for Markov Cover Data With Applications to Images. IEEE Transactions on Information Forensics and Security, 1(2):275-287, 2006.
[103] P. S. Tibbetts. Terrorist Use of the Internet And Related Information Technologies. Monograph, School of Advanced Military Studies, Fort Leavenworth, 2002.
[104] U. Topkara, M. Topkara, and M. J. Atallah. The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions. 8th Workshop on Multimedia and Security, pages 164-174, 2006.
[105] S. Trivedi and R. Chandramouli. Locally Most Powerful Detector for Secret Key Estimation in Spread Spectrum Image Steganography. Proceedings of the SPIE on Security, Steganography, and Watermarking of Multimedia Contents VI, 5306:1-12, 2004.
[106] S. Trivedi and R. Chandramouli. Secret Key Estimation in Sequential Steganography. IEEE Transactions on Signal Processing, 53(2):746-757, 2005.


[107] Y.-C. Tseng and H.-K. Pan. Secure and invisible data hiding in 2-color images. 20th Annual Joint Conference of the IEEE Computer and Communications Societies, 2:887-896, 2001.
[108] P. Wang, F. Liu, G. Wang, Y. Sun, and D. Gong. Multi-class steganalysis for JPEG stego algorithms. 15th IEEE International Conference on Image Processing, pages 2076-2079, 2008.
[109] Y. Wang and P. Moulin. Optimized feature extraction for learning-based image steganalysis. IEEE Transactions on Information Forensics and Security, 2(1), 2007.
[110] A. Westfeld. F5 - A Steganographic Algorithm. 4th International Workshop on Information Hiding, 2137:289-302, 2001.
[111] A. Westfeld and A. Pfitzmann. Attacks on Steganographic Systems. 3rd International Workshop on Information Hiding, 1768:61-76, 2000.
[112] M. Wu and B. Liu. Data hiding in binary image for authentication and annotation. IEEE Transactions on Multimedia, 6(4):528-538, 2004.
[113] M. Y. Wu and J. H. Lee. A Novel Data Embedding Method for Two-Color Facsimile Images. International Symposium on Multimedia Information Processing, 1998.
[114] W. Xinli, F. Albregtsen, and B. Foyn. Texture features from gray level gap length matrix. Proceedings of IAPR Workshop on Machine Vision Applications, pages 375-378, 1994.
[115] G. Xuan, Y. Q. Shi, J. Gao, D. Zou, C. Yang, Z. Zhang, P. Chai, C. Chen, and W. Chen. Steganalysis Based on Multiple Features Formed by Statistical Moments of Wavelet Characteristic Functions. 7th International Workshop on Information Hiding, 3727:262-277, 2005.
[116] G. Xuan, Y. Q. Shi, C. Huang, D. Fu, X. Zhu, P. Chai, and J. Gao. Steganalysis Using High-Dimensional Features Derived from Co-occurrence Matrix and Class-Wise Non-Principal Components Analysis (CNPCA). 5th International Workshop on Digital Watermarking, 4283:49-60, 2006.
[117] C.-Y. Yang. Color image steganography based on module substitutions. 3rd IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2:118-121, 2007.
[118] L. Yuling, S. Xingming, G. Can, and W. Hong. An Efficient Linguistic Steganography for Chinese Text. IEEE International Conference on Multimedia and Expo, pages 2094-2097, 2007.
[119] G. Zhang. Neural networks for classification: a survey. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, 30(4):451-462, 2000.


[120] H. Zong, F. Liu, and X. Luo. A wavelet-based blind JPEG image steganalysis using co-occurrence matrix. 11th International Conference on Advanced Communication Technology, 3:1933-1936, 2009.
[121] D. Zou, Y. Q. Shi, W. Su, and G. Xuan. Steganalysis Based on Markov Model of Thresholded Prediction-error Image. IEEE International Conference on Multimedia and Expo, pages 1365-1368, 2006.


Index
512-pattern histogram, 80
AUR, 59, 108
backpropagation, 13
bit per pixel, 10, 14
bitmap, 16
Blind steganalysis, 8, 10
BMP, 16
bpp, 10, 14
center of mass, 20
CF, 107
characteristic function, 20, 107
classification, 10
COM, 20
cover image, 7
cumulative sum, 41
curse of dimensionality, 19
CUSUM, 41
DCT, 28
differential operation, 24
differential statistics, 24
digitisation, 14
discrete Fourier transform, 20
embedding operation, 6
extraction operation, 6
F5, 7, 33
feedforward, 13
Fisher linear discriminant, 12
generalized Gaussian distribution, 35
GGD, 35
GIF, 16
GLCM, 67
grey level co-occurrence, 67
histogram difference, 82
histogram quotient, 85
image calibration, 26
JBIG 2, 36
JPEG, 16
LAHD, 23
least significant bit, 7
LMP, 41
local angular harmonic decomposition, 23
locally most powerful, 41
LSB, 7
machine learning, 10
mode, 25, 29
Model-based steganography, 48
multivariate regression, 12
Neural networks, 13
OutGuess, 7
pairs of values, 33
pattern recognition, 10
pixel, 14
PMF, 26
PNG, 17
PoV, 33
prediction-error, 21, 26
PRNG, 7, 40, 94

probability density function, 21
probability mass function, 26
processing element, 13
pseudorandom number generator, 7, 40
QMF, 23
quadrature mirror filters, 23
radial basis function, 74
random embedding, 7
raster graphic, 15
RBF, 74
Receiver Operating Characteristic, 59
ROC, 59
separating hyperplane, 14
separating line, 13
sequential embedding, 7
sequential forward floating selection, 108
SFFS, 108
steganalysis, 6
steganographic, 6
stego image, 7
stegokey, 7
supervised learning, 11
support vector machine, 13, 31
SVM, 13
Targeted steganalysis, 8
TIFF, 16
vector graphic, 15
WAM, 39
wavelet absolute moments, 39
weighted stego image, 38
WS, 38

