
MP3 STEGANOGRAPHY AND STEGANALYSIS

BY
RAGHU JAYAN MENON

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE


REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
IN
COMPUTER SCIENCE

UNIVERSITY OF RHODE ISLAND


2009

MASTER OF SCIENCE THESIS


OF
RAGHU JAYAN MENON

APPROVED:
Thesis Committee:
Major Professor

DEAN OF THE GRADUATE SCHOOL

UNIVERSITY OF RHODE ISLAND


2009

ABSTRACT
This thesis involves research in the field of MP3 steganography and steganalysis. Steganography is the technique of hiding data in a medium in an oblivious manner. Steganalysis is the detection of the presence of steganographic content in a carrier. A novel method of MP3 steganography is proposed, with emphasis on increasing the steganographic capacity of the carrier medium, MP3 in this case. An interesting problem in the field of steganography is achieving an optimal trade-off between attaining a high capacity to hide data and keeping the noise introduced in the carrier indiscernible. The work presented on the development of a new MP3 steganographic technique focuses on attaining high capacity compared to the existing MP3 steganographic tools. The tool, called BvSteg, achieves 4 times the capacity of MP3Stego and UnderMP3Cover while introducing comparable noise artifacts in the carrier. The technique is novel in its approach of using Huffman codes of quantized MDCT (Modified Discrete Cosine Transform) coefficients to represent the bit to hide. The MDCT is a Fourier-related transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block¹.

The second part of this thesis deals with MP3 steganalysis. MP3 steganalysis analyzes MP3 files for the possible presence of steganographic content, MP3Stego and UnderMP3Cover being the only known steganographic tools. Work in this field by Westfeld [1] [2] has helped in detecting these tools with a high level of confidence. Böhme and Westfeld have in addition worked on the problem of MP3 encoder classification, which involves classifying an MP3 file based on the encoder used to produce it. This acts as a filter to the steganalysis stage of the tool described.

¹ Source: Wikipedia

ACKNOWLEDGMENTS
I would like to thank Dr. Victor Fay-Wolfe for his encouragement and support over the years as my advisor. He gave me the freedom to explore and trusted my abilities. His guidance on the practical aspects of research and the work presented here has been critical. I would like to thank Dr. Lutz Hamel for his invaluable suggestions on machine learning techniques, in particular the knowledge I gained in support vector machines through his classes, his book and his quick responses to my e-mails. I also thank him for his support and careful reading of my work. I would like to thank Dr. Peter Swaszek for accepting my request to join the defense committee. I thank him, as an external member of my committee, for his interest and his careful examination of my work. I would like to thank Dr. Stuart Westin for accepting the role of chair of my defense committee. I would like to thank Dr. Andreas Westfeld for his support and his responses to my frantic e-mails regarding his papers.

I would like to thank Kevin Bryan, who helped me shape my research in more ways than one. Kevin has been instrumental in providing ideas and technical and moral support. Kevin and I have had many fruitful discussions throughout the course of this work. His direct and indirect impact has been critical to the success of my work. I would like to thank Neil Bennett for his suggestions and careful reading of my work. Neil and I have had many discussions, a few of them frustrating when it came to the relevance of steganography. The discussions helped me see both sides.

I would like to thank everyone at the computer science department for having given me an opportunity to study and work at the University of Rhode Island. Finally, I would like to thank my parents, sister and brother for their patience, understanding, encouragement and unyielding support over the years.

PREFACE
This thesis is written in manuscript format and investigates issues related to MP3 (MPEG I/II Layer III) steganography and steganalysis. Steganography is the technique of hiding data in a medium without raising suspicion about the embedding. Steganalysis is the science of analyzing the cover media for the presence of hidden data. Steganographic techniques predate the evolution of multimedia and computers in general. With the advent of multimedia formats such as JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group) to store image, video and audio data, steganography has created its own niche in secure digital multimedia based communication. Almost all digital steganographic techniques exploit the lossy aspect of the compressed formats. Lossy formats like JPEG and MPEG attenuate data that is not perceptually relevant. A general methodology to follow in building a steganographic tool for a multimedia format is shown in the figure on page vi.
Manuscript 1 of the thesis involves MP3 steganography. The nascent nature of MP3 based steganographic techniques is evident from the small number of tools available for the purpose. The work analyzes the existing MP3 steganographic tools MP3Stego and UnderMP3Cover in terms of the techniques employed to hide data, along with the capacity and the noise introduced. In the process the work exposes a bug in the MP3Stego hiding technique that results in the process hanging. Both tools have identical payload capacity bounds, though the bound for MP3Stego is theoretical since it involves encryption for security purposes, which reduces its payload capacity. The BvSteg tool proposed in the work is an MP3 steganographic tool that hides data in the quantized MDCT coefficients. In terms of capacity, the BvSteg tool exceeds MP3Stego and UnderMP3Cover by a factor of nearly 4. In addition, safeguards to prevent perceivable noise distortion have been put into the BvSteg tool by limiting the data hiding to region2 in the bigvalue region of the longblocks. The higher frequency ranges in region2, as a result of the MDCT compaction property, provide good cover in terms of imperceptibility of the noise patterns introduced by the data hiding. In addition, the hiding technique uses Huffman pair swaps to hide data based on the magnitude relationships among pairs of quantized MDCT coefficients. Analysis of the noise introduced in the original signals reveals that BvSteg is comparable to MP3Stego and UnderMP3Cover in terms of the noise introduced in the carrier. BvSteg employs the SHA-1 hash algorithm to hash a user-supplied passphrase and generate the seed for a pseudo-random number generator. A pseudo-random number generator (PRNG) is an algorithm for generating a sequence of numbers that approximates the properties of random numbers². The bits from the pseudo-random generator determine which blocks to embed in and which ones to skip. Introducing randomness using a passphrase enhances the security of the tool. The detectability of this technique has not been studied, even though the Huffman pair swaps ensure that the changes to the cover data are very similar to those made by LSB (Least Significant Bit) hiding, which is hard to detect.

² Source: Wikipedia
Manuscript 2 of the thesis deals with MP3 steganalysis. The work is primarily an implementation of the methods put forth by Westfeld in his papers [1] [2] for detecting MP3Stego and UnderMP3Cover. Machine learning techniques, primarily support vector machines (SVM), are used for the step of encoder classification. MP3 encoders are software that convert a wav file³ to MPEG I/II Layer III format (MP3). MP3 files can achieve a compression ratio of 1/12. Even though MP3 technology is patented, no single party owns it wholly. With the intent of achieving speed and high audio quality, MP3 encoders have mushroomed over the years. The first step in building the steganalysis tool involves MP3 encoder classification using a multiclass SVM.

We thus use SVMs after evaluating their suitability for the purpose of encoder classification. To build statistically significant models for encoder classification, bootstrapping was performed with 200 samples of the original data with optimal parameters to obtain the 95% confidence interval for accuracy. An overall accuracy of 90.47% was achieved in classifying the MP3 files to the appropriate encoder class using a polynomial kernel of degree 2. The error rate of 9.53% is solely attributed to the misclassification of 8Hz and SoloH. The encoder classification is followed by the steganalysis step. The only files that are passed on to the steganalysis stage are the ones that are encoded using 8Hz and SoloH. The MP3 steg tools MP3Stego and UnderMP3Cover are built on top of the open source 8Hz encoder. One of the objectives of the encoder classification is to reduce the false negatives during the steganalysis stage. To achieve this, the inputs to this stage are limited to those files that are classified as either 8Hz or SoloH. MP3Stego detection is implemented using QDA (Quadratic Discriminant Analysis) as the classifier, with the three auto-regression coefficients (indices 0, 1 and 2) over the block lengths as attributes. The classifier separates the files encoded using 8Hz and MP3Stego perfectly. This perfect classification is attainable due to the larger variance observed in the block length in MP3Stego as opposed to 8Hz. The variance is a result of the hiding scheme used in MP3Stego, which modifies the block length to obtain a bit parity that is the same as the bit to hide. UnderMP3Cover detection is worked into the tool by incorporating the updet program written by Westfeld for the purpose [2].

³ Microsoft/IBM file format


TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENTS
PREFACE
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES

MANUSCRIPT

1 BvSteg - A High Capacity MP3 Steganographic Tool using Spectral Pair Swaps in Bigvalue Region of Longblocks
  1.1 Introduction
  1.2 MPEG Audio Compression
  1.3 Overview of existing steg techniques in MP3
    1.3.1 MP3Stego
    1.3.2 UnderMP3Cover
    1.3.3 Other Methods of Data Hiding in MP3
  1.4 BvSteg
    1.4.1 LSB Steganography using Spectral Pairs and Huffman Values
    1.4.2 The Extraction Process
  1.5 Tool Development
  1.6 Steg Capacity and Noise Analysis

2 A Tool Framework for MP3 Steganalysis
  2.1 Introduction
  2.2 MPEG Audio Encoders
  2.3 Overview of MP3 steg tools
    2.3.1 MP3Stego
    2.3.2 UnderMP3Cover
  2.4 Support Vector Machines and Steganalysis
  2.5 MP3 Steganalysis Tool Architecture
    2.5.1 The tool architecture
    2.5.2 Training and Testing Classifier Models
    2.5.3 Validity and Statistical Significance
    2.5.4 MP3Stego Detection
    2.5.5 UnderMP3Cover Detection

LIST OF REFERENCES
BIBLIOGRAPHY

LIST OF TABLES

Features used for classification
The encoder list
SVM Training
SVM Test Results
Confusion Matrix for the polynomial kernel
Confusion Matrix for QDA model for MP3Stego detection

LIST OF FIGURES

1 MP3 Encoding process
2 MP3 Frame Structure
3 Side Information for each granule
4 The bit hiding process in detail
5 Size comparison of payload capacity
6 Noise/Signal Analysis of BvSteg
7 Noise/Signal Analysis of MP3Stego
8 Noise/Signal Analysis of UnderMP3Cover
9 MP3 Encoding process
10 Support Vector Machine Model with a linear decision surface
11 MP3 Steganalysis Tool Framework
12 Block length distribution

MANUSCRIPT 1

BvSteg - A High Capacity MP3 Steganographic Tool using Spectral Pair Swaps in Bigvalue Region of Longblocks

Abstract
Steganography is the technique of hiding information in plain sight. Digital steganographic techniques embed data in multimedia and files of various formats such that a warden perceives the file as normal. The advent of compression techniques for image, audio and video data has also opened up many avenues for hiding data in these formats. This paper presents a new technique for hiding data in MPEG I/II Layer III compressed audio files. The technique has a higher capacity than the existing methods used in MP3Stego and UnderMP3Cover for hiding data in MP3 files. The proposed steganographic method hides data in the bigvalue region of long blocks by modifying pairs of spectral values before they are Huffman coded. Further, the technique reduces the noise introduced by embedding data only in region2 of the bigvalue region. Region2 holds spectral information in the high frequency range (5-14 kHz at a 44.1 kHz sampling rate), which as per the psychoacoustic model would have low amplitude values, thus introducing lower noise in the carrier when perturbed.

1.1 Introduction

Steganography, or data hiding, is the technique of embedding a message (payload) in a medium (carrier) without causing suspicion about the existence of hidden data in the medium. The perturbations to the medium are carried out in such a manner that no perceivable noise component is introduced.
One way to illustrate the concept of steganography is to analyze Simmons' Prisoners' Problem [3]. Two prisoners are allowed to communicate through a medium via an agent trusted by the warden. The prisoners are discouraged from discussing any plans of an escape from the prison. The warden himself, though, has a vested interest in letting them communicate, as he wants to catch them in the act of hatching an escape plan or to foil their plans by modifying the message itself. In the case of a passive warden a cryptographic technique would have worked. In this case, which involves an active warden, the message needs to look innocuous and hence cryptography fails. Steganography comes to the prisoners' rescue. The prisoners, with a strong intention of planning an escape, have already exchanged a codeword before they were captured. They use this codeword to secretly exchange messages, in the process deceiving the warden by hiding the message in plain sight. The codeword lets them embed and extract information. A possible technique would be to use the codeword as a position compass for hiding and extracting letters from the message exchanged. The warden is oblivious to the existence of a secret message. The medium mentioned in the problem above could be photographically produced microdots used by espionage agents during World War II, a Bacon cipher that uses different typefaces to hide information, or a JPEG image file digitally altered using Steghide¹. In all the above mentioned methods the priority is to hide messages in plain sight and make the carrier look innocuous.

Digital steganography often uses compressed or uncompressed image, video and audio formats. Image steganography has grown in prominence with tools like Outguess [4], F5 [5] and Steghide [6], to name a few. Compressed audio formats like MP3 and Ogg lag in their usage as a medium for steganography. The only known steg tools that use MP3 as a carrier are MP3Stego [7] and UnderMP3Cover [8]. MP3Stego hides data in an MP3 file during the encoding process. The technique uses the power of parity [9] to embed a bit in the part2_3_length² of a granule in an MP3 file. The desired value of part2_3_length is obtained in the inner_loop, which quantizes the input data (spectral data) by increasing the quantizer step size until the quantized data can be encoded using the available number of bits. The additional condition that hides the data bit is the check on the parity of the part2_3_length variable. If the parity is the same as the bit to be hidden, the loop exits. The outer loop checks whether the bound put on the quantization noise (which gets introduced in the inner_loop) has been breached. UnderMP3Cover [8] is a steg tool that embeds data by applying LSB steganography to the global_gain³ parameter in an MP3 granule. LSB, or Least Significant Bit, steganography, as the name suggests, embeds data in the least significant bit of a carrier byte. In the case of UnderMP3Cover the carrier byte is the global_gain value. UnderMP3Cover works on an already encoded MP3 file, unlike MP3Stego, which hides data during the encoding of Pulse Code Modulation (PCM) samples to MP3.

This paper proposes a new method of steganography in MP3 files in the bigvalue region of long blocks using a spectral pair swap method. The layout of the paper is as follows. Section 1.2 gives an overview of the MPEG Layer III audio encoding algorithm. Section 1.3 covers the existing MP3 based steg tools. Section 1.4 delves into the proposed high capacity steg technique, BvSteg. Section 1.5 provides notes on the tool development along with a link to the source code. Section 1.6 discusses the noise introduced and compares the capacity of the tools. Section 1.7 concludes by highlighting future work.

¹ An open source steganography tool.
² Indicates the number of bits used for encoding part2 (scalefactors) and part3 (Huffman encoded data).
³ Used to determine the quantizer step size.

1.2 MPEG Audio Compression


An uncompressed audio file is stored as PCM samples. The PCM samples are a digital representation of the analog waveform of an audio signal.

Figure 1: MP3 Encoding process


The MP3 encoding process has the following main components.
1. Analysis filter bank
The analysis filter bank used for MP3 encoding consists of two cascaded filter banks. The first one is a polyphase filter bank, which is the same as in Layer I/II. The polyphase filter bank serves the purpose of making Layer III backward compatible with Layers I/II. The polyphase filter bank has 32 equal bandwidth filters. The input audio is critically sampled to produce 32 spectral components. Each of the 32 components is further split into 18 bands via a Modified Discrete Cosine Transform (MDCT), which is the second filter bank. The MDCT filter bank has been introduced in Layer III to provide better frequency resolution, which further helps in removing possible redundant frequencies for tonal signals. This improves the coding efficiency [10]. The output of the MDCT block is a set of 576 spectral lines. One of the other improvements that the augmented filter bank provides is better control over the error signal. MP3 has two possible window sizes for analysis/coding of the signal. MP3 uses a long window with 576 samples for steady state signals, which provides good frequency resolution, or 3 short windows each containing 192 samples for transient signals, which provides good time resolution. The short windows get introduced when there is an "attack" (transient), since using a long window would spread the noise introduced over a wider range of adjacent frequencies. The shift from a long window to a short one and vice versa employs "start" and "stop" windows as part of the transition. The output of the analysis filter bank is a set of spectral values with the property of energy compaction introduced by the MDCT. Each frame in MP3 audio has 2 granules. Each granule contains 576 spectral values.
2. Psychoacoustic model
A parallel process runs alongside the analysis filter bank; it first converts the time domain samples to the frequency domain using the FFT and then provides the output of the Fourier transform to the psychoacoustic model. A fast Fourier transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. A Hann window is applied prior to the FFT to reduce edge effects. The Fourier analysis provides the psychoacoustic model with the spectral change over time. Once the PCM samples are converted to the frequency domain using the FFT, the psychoacoustic model runs algorithms on the data. These algorithms model the human auditory system. The algorithms provide directives on window switching to reduce noise spreading and compute the allowable distortion in scalefactor bands, which closely resemble the critical bands of human hearing [11]. More importantly, the model provides information on which parts of the audio are audible and which are inaudible. The inaudible part gets eliminated. This is the lossy part of the MP3 compression process.
3. Quantization
Traditional data compression techniques are employed to further compress the spectral data. The psychoacoustic analysis compresses complicated sounds better than simpler sounds. Quantization and Huffman coding are used to further enhance the compression of these simpler sounds. The 576 frequency bins are further split into 12 or 21 scalefactor bands, depending on the use of short or long blocks respectively. Each scalefactor band represents a range of frequencies. The frequencies are then quantized using a non-uniform power law quantizer. Any error that is introduced in the process appears as quantization noise.

The FFT analysis described under the psychoacoustic model has an important role to play in determining how much precision is needed in a scalefactor band. The FFT/psychoacoustic model analyzes the signal for sounds that would be masked by neighboring sounds (masking threshold). In this case the weaker signal can be effectively scaled down without loss of perceptual quality, thus reducing the number of bits needed to code that part of the signal. On the flip side, when the signal is scaled back up during decoding, noise is introduced due to the rounding errors made during the encoding process. An encoder therefore needs to keep track of when the noise introduced makes the SNR (Signal to Noise Ratio) perceptually unfavorable, while at the same time keeping track of the number of bits needed to encode that part of the signal. SNR is defined as the ratio of the signal power to the power of the noise corrupting the signal. A reconciliation between the number of bits used to encode a granule and the noise introduced as a result of quantization is achieved through a feedback process called the outer-inner loop. The inner-loop uses Huffman coding to assign shorter codes to more frequently occurring quantized values. It computes the total number of bits required to code a block of data and checks whether the number is within the bounds provided for a frame of data, as determined by the sampling rate and bit rate⁴. If not, the quantization step size is increased by increasing the global_gain. The quantization step size is changed until the required number of bits is within the allotted bits for the frame.

The outer loop, on the other hand, is responsible for shaping the quantization noise according to the masking threshold that is computed by the FFT/psychoacoustic model for each scalefactor band. The scalefactor bands that have quantization noise above the masking threshold after quantization, i.e. after the inner-loop iteration, are amplified to reduce the noise. In the process of amplification the number of bits needed to encode the spectral values of the amplified bands goes up, increasing the precision and thus reducing the noise in these bands. Amplification of scalefactor bands also mandates a call to the inner loop to check whether the number of bits required to encode the spectral lines is within the set bound. This process of quantization and noise shaping is iterative, with the outer loop calling the inner loop every time the scalefactor bands are amplified.

The terminating condition arises when all the scalefactor bands have noise within the permissible limits and the number of bits used to encode the block is within the allotted value. This however is not always feasible, and hence additional conditions are used in order to terminate the iteration [12].

⁴ For example, a 44.1 kHz, 128 kbit/s MP3 file is allotted about 418 bytes per frame.

4. Bit stream Formatting and Huffman Encoding

Huffman codes are variable length codes. They are used in the lossless part of MP3 compression. Huffman codes assign shorter codes to more frequently occurring strings and longer codes to less frequently occurring ones. The MP3 encoding process makes use of 32 Huffman tables to encode quantized spectral data in various scalefactor bands. Tables 4 and 14 are never used. The quantized spectral values fall in the range [-8191, 8191]. One of the results of modelling compression based on psychoacoustics is that the resultant signal has high amplitude values associated with low frequency components. The amplitude decreases as the frequency increases. The quantized spectral values are hence arranged in order of increasing frequency. Regions of spectral lines are formed according to various frequency ranges. Most of the energy in the audio signal is concentrated in the 20 Hz to 14 kHz frequency range [13] [12]. This frequency range corresponds to the big_value region in an MP3 file. Further, the big_value region is split into 3 sub-regions, with a typical frequency range split of 0-2 kHz (region0), 2-5 kHz (region1) and 5-14 kHz (region2) for an MP3 file which has been sampled at 44.1 kHz. Each of the regions uses a different Huffman table for encoding the quantized values. The selection of the table is done on the basis of the local region statistics of the signal.

The higher frequency components, which have magnitudes of -1, 0 and 1, form the count1 region. The rzero region consists of high frequency spectral values with amplitude 0. The rzero region information is not transmitted as part of the MP3 file. The count1 region uses 2 separate Huffman tables to encode contiguous quadruples of spectral values. The big_value regions, on the other hand, encode pairs of values using one of the 30 Huffman tables. The Huffman encoding tables can be found in the standard [14]. The rzero section does not need any Huffman encoding.

The MP3 decoder uses a decode tree mechanism to decode the Huffman values and form the pairs/quads of spectral values. This is one of the reasons why decoding an MP3 file is faster than encoding PCM samples to MP3. For a more elaborate discussion on MPEG Layer III encoding/decoding refer to [14] [15] [12] [10] [16] [11].
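The three-way split of a granule described above can be sketched as a backward scan over its 576 quantized values. This is a minimal illustration in C, not the encoder's actual routine: the `RegionSplit` type and the `split_regions` function are hypothetical names, the counts are given in spectral lines rather than in the pairs and quadruples an encoder would store, and the further division of big_values into region0, region1 and region2 is left out.

```c
#include <stdlib.h>

/* Hypothetical result type: sizes of the three regions, in spectral lines. */
typedef struct {
    int big_values; /* lines coded as pairs (always a multiple of 2)        */
    int count1;     /* lines coded as quadruples with magnitudes <= 1       */
    int rzero;      /* trailing all-zero lines, not transmitted             */
} RegionSplit;

/* Scan the quantized spectrum from the highest frequency downwards. */
RegionSplit split_regions(const int ix[576])
{
    RegionSplit r;
    int i = 576;

    /* rzero: strip trailing pairs in which both values are zero. */
    while (i > 1 && ix[i - 1] == 0 && ix[i - 2] == 0)
        i -= 2;
    r.rzero = 576 - i;

    /* count1: strip quadruples whose four magnitudes are all 0 or 1. */
    int end_count1 = i;
    while (i > 3 &&
           abs(ix[i - 1]) <= 1 && abs(ix[i - 2]) <= 1 &&
           abs(ix[i - 3]) <= 1 && abs(ix[i - 4]) <= 1)
        i -= 4;
    r.count1 = end_count1 - i;

    /* Everything that remains at the low-frequency end is big_values. */
    r.big_values = i;
    return r;
}
```

Because the scan works from the top of the spectrum down, the three counts always add up to 576, and a silent granule ends up entirely in rzero.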
Figure 2 describes the layout of an MP3 frame [17]. Each block in the diagram indicates a size of 1 bit.

Figure 2: MP3 Frame Structure

Each frame in an MP3 bit stream is further split into 2 granules. The side information for each granule in a frame contains the information needed to decode the main data. Figure 3 describes the component fields along with the size in bits for the side information. The sizes represent the requirements in single channel mode as well as the doubled values needed in dual channel mode.

Figure 3: Side Information for each granule
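The per-frame byte allotment that bounds the inner loop follows directly from the bit rate and the sampling rate. The sketch below assumes the usual MPEG-1 Layer III frame-length relation, in which a frame carries 1152 samples (2 granules of 576), so that 1152 / 8 = 144 appears as a constant; the function name `mp3_frame_bytes` is illustrative and not taken from any encoder.

```c
/* Bytes in one MPEG-1 Layer III frame (assumed relation, see lead-in).
 * bitrate_bps    : bit rate in bits per second, e.g. 128000
 * sample_rate_hz : sampling rate in Hz, e.g. 44100
 * padding        : 0 or 1, the padding bit from the frame header */
int mp3_frame_bytes(int bitrate_bps, int sample_rate_hz, int padding)
{
    /* 1152 samples per frame divided by 8 bits per byte gives 144. */
    return (144 * bitrate_bps) / sample_rate_hz + padding;
}
```

For a 128 kbit/s file sampled at 44.1 kHz this gives 417 bytes, or 418 when the padding bit is set.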

1.3 Overview of existing steg techniques in MP3

As mentioned earlier, MP3 as a steganographic medium is still in its infancy. One of the reasons is the terse MP3 specification and the involved MP3 encoder implementation. Any steganographic tool that uses MP3 as a medium would have to be built on top of an existing encoder. As we will see, the encoder of choice in the 2 tools mentioned below is 8Hz. It is therefore no surprise that BvSteg too is built on top of the 8Hz encoder.
1.3.1 MP3Stego

MP3Stego [7] is one of the earliest MP3 based steganographic tools. Data hiding in MP3Stego is done as part of the encoding process. The tool uses the power-of-parity [9] principle to embed data in the part2_3_length of a granule in an MP3 file. The part2_3_length variable indicates the total number of bits required to encode the scalefactors and the Huffman coded data. The granules to be modified are randomly chosen using SHA-1. The value, and hence the parity, of the part2_3_length variable is modified in the inner-loop during quantization. In addition to the original condition that the number of bits used for encoding a block be within a bound, the inner-loop terminates only if the parity of the variable part2_3_length is the same as the bit to be embedded. The inner-loop in the MP3Stego hiding procedure is shown in Listing 1.1.
Listing 1.1: MP3Stego inner-loop for hiding

    do
    {
        do
        {
            cod_info->quantizerStepSize += 1.0;
            quantize(xrs, ix, cod_info);
        } while (ix_max(ix, 0, 576) > (8191 + 14));       /* within table range? */

        calc_runlen(ix, cod_info);                         /* rzero, count1, big_values */
        bits = c1bits = count1_bitcount(ix, cod_info);     /* count1_table selection */
        subdivide(cod_info);                               /* bigvalues sfb division */
        bigv_tab_select(ix, cod_info);                     /* codebook selection */
        bits += bvbits = bigv_bitcount(ix, cod_info);      /* bit count (Line 13) */

        switch (hiddenBit)
        {
        case 2:
            embedRule = 0;
            break;
        case 0:
        case 1:
            embedRule = ((bits + part2length) % 2) != hiddenBit;   /* Line 22 */
            break;
        default:
            ERROR("inner_loop: unexpected hidden bit.");
        }
    } while ((bits > max_bits) | embedRule);

Problems with MP3Stego

Analyzing MP3Stego revealed the following problems.

1. Hiding process hangs
The hiding process in MP3Stego hangs on occasion because of its inability to satisfy the parity condition. In cases when the inner-loop cannot encode the spectral lines within the allotted bits, it sets all the spectral values to 0 [10]. In doing so, the number of bits used to encode the spectral values, part3 (Huffman coding), reduces to 0. In the code in Listing 1.1, bits would be set to 0 on Line 13, as bigv_bitcount would return 0 under this condition. In the case 0/case 1 branch on Line 22, where the embedding occurs, the condition ((bits + part2length) % 2) != hiddenBit essentially reduces to (part2length % 2) != hiddenBit when bits = 0. The part2length variable, which is the number of bits needed to encode the scalefactors, is fixed for a granule and does not change during the processing of the inner-loop. Suppose that the inner-loop has set all the spectral values to 0 and we have a 0 to embed, i.e. hiddenBit = 0. If the variable part2length is odd, the process (do loop) will execute forever.

A crude way to overcome this problem is to change the passphrase, and hence the seed of the pseudo-random number generator that is responsible for selecting the blocks that undergo embedding. In changing the passphrase one can hope that the seed of the random number generator changes such that the granule which was causing the problem during embedding is not selected.
2. Size constraint
To begin with, MP3Stego has low embedding rates. This is not helped
by the fact that the maximum capacity of 4 × number_of_frames bits is never
achieved. The hiding capacity of the carrier is diminished by two factors:
a. zlib consumption
The overhead associated with compressing a 0 byte file with zlib results
in the usage of 24 bytes [1]. This overhead reduces the capacity of the
carrier.
b. Skip random
The logic in MP3Stego skips random blocks while embedding, thereby
leading to a loss in capacity.
A method of detecting files stegged using MP3Stego is discussed in [1] and is
implemented in the second paper.
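
The hang described in problem 1 can be reproduced outside the encoder. The following Python sketch only mirrors the parity rule of the listing; the function names and the sample bit counts are illustrative assumptions, not MP3Stego code.

```python
def embed_rule(bits, part2length, hidden_bit):
    """Mirror of the listing's parity test: True means 'keep looping'."""
    if hidden_bit == 2:  # 2 means no bit is embedded in this granule
        return False
    return ((bits + part2length) % 2) != hidden_bit


def inner_loop_terminates(bit_counts, part2length, hidden_bit, max_bits):
    """Walk through successive quantizer steps, one bit count per step.

    Returns True as soon as a step satisfies both the size bound and the
    parity rule, and False if no step in the sequence does.
    """
    for bits in bit_counts:
        if bits <= max_bits and not embed_rule(bits, part2length, hidden_bit):
            return True
    return False


# Hypothetical granule: the bit count shrinks until every spectral value is 0,
# after which each further step yields bits == 0.
shrinking = [950, 812, 0, 0, 0, 0]

odd_case = inner_loop_terminates(shrinking, part2length=37, hidden_bit=0, max_bits=700)
even_case = inner_loop_terminates(shrinking, part2length=36, hidden_bit=0, max_bits=700)
```

With an odd part2length and a 0 to embed, no step of the sequence can satisfy the rule once bits has reached 0, which is the situation in which the real do loop never exits.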
1.3.2 UnderMP3Cover

UnderMP3Cover [8] is an MP3 LSB steganographic tool that uses global_gain
to hide data. Unlike MP3Stego, UnderMP3Cover hides data in an already encoded
MP3 file. The modification is done on the LSB of the global_gain variable in selected
granules to reflect the embedded bit. The tool uses a spacing parameter to select
the granules to embed in. MP3Stego and UnderMP3Cover have comparable
data-hiding rates for a given carrier file. A method for detecting files stegged using
UnderMP3Cover is discussed in [2] and is implemented in the second paper.
The maximum steg capacity of the carrier when used with MP3Stego and
UnderMP3Cover is 4 × number_of_frames bits. Both programs can embed
a maximum of 4 bits in a frame if the signal is stereo, since a stereo signal has 2
channels and each channel has two granules.
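
The LSB idea can be illustrated with a short Python sketch. It assumes the global_gain values of successive granules are already available as a list of integers, and that a spacing of n means every n-th granule carries a bit; both are assumptions made for illustration, not the actual UnderMP3Cover implementation.

```python
def embed_lsb(global_gains, bits, spacing=2):
    """Write one payload bit into the LSB of every `spacing`-th global_gain."""
    out = list(global_gains)
    positions = range(0, len(out), spacing)
    for pos, bit in zip(positions, bits):
        out[pos] = (out[pos] & ~1) | bit  # clear the LSB, then set it to the bit
    return out


def extract_lsb(global_gains, n_bits, spacing=2):
    """Read the LSBs back from the same granule positions."""
    positions = range(0, len(global_gains), spacing)
    return [global_gains[pos] & 1 for pos in positions][:n_bits]


gains = [210, 187, 143, 166, 201, 150]  # hypothetical global_gain values
payload = [1, 0, 1]

stego = embed_lsb(gains, payload)
recovered = extract_lsb(stego, len(payload))
```

Each modified global_gain differs from the original by at most 1, which is why the change is small, although it still alters the quantizer step size of the granule.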
1.3.3 Other Methods of Data Hiding in MP3

Do-Hyoung et al. [18] discuss a method of data insertion into the MP3 bitstream
using linbits characteristics. As mentioned in the paper, the method does not have
high capacity but is good for watermarking applications. Litao Gang et al. [19]
analyze data hiding schemes in the amplitude domain and the phase domain, and also
discuss a noise substitution scheme. N Moghadam and Sadeghi [20] propose a watermarking
scheme in the MDCT domain. They describe a genetic algorithm to select the best
coefficients to embed the watermark.

Very few implementations of MP3 based steg techniques exist. In addition,
watermarking techniques, though they share the requirement of security through
obscurity, impose constraints on robustness that are not essential for
steganographic techniques. Watermarks also usually carry a small payload,
whereas steganography is more demanding in terms of the payload capacity of
the carrier. These reasons usually make watermarking techniques inadequate
for steganography.
1.4 BvSteg

BvSteg has almost 4 times the steganographic capacity of MP3Stego and
UnderMP3Cover. The tool hides data in the big_value region of the long blocks.
Embedding is carried out in region2, which for a 44.1 kHz sampling rate corresponds
to the frequency range of 5-14 kHz. Due to the energy compaction properties of
MDCT [21], most of the spectral energy is concentrated in region0 and region1 of
the signal. Changes in region2 introduce low noise components in the signal and,
hence, the perturbed audio signal as perceived by the human ear is not significantly
different from the signal without the embedding. The actual algorithm used for
embedding is based on the magnitude relationship between the pairs of spectral
values that occur in the big_value region (region2) during the encoding process of
an MP3 file.

Algorithm 1 BvSteg hiding process

procedure StegMP3(passphrase, file_to_hide)
    file_size <- 0
    hide_status <- HIDE_IN_BLOCK
    file_size_bit_count <- 0
    file_size <- GetFileSize(file_to_hide)
    while file_size_bit_count < 32 do
        if hide_status == HIDE_IN_BLOCK then
            bit_to_embed <- file_size >> 1
            file_size_bit_count = file_size_bit_count + 1
        end if
        hide_status <- Hide(passphrase, bit_to_embed)
    end while
    file_size_bit_count <- 0
    hide_status <- HIDE_IN_BLOCK
    while file_size_bit_count < file_size do
        if hide_status == HIDE_IN_BLOCK then
            bit_to_embed <- bit from file_to_hide
            file_size_bit_count = file_size_bit_count + 1
        end if
        hide_status <- Hide(passphrase, bit_to_embed)
    end while
end procedure

Require: First time Hide is called in a longblock, embed_count <- 1
procedure Hide(passphrase, bit_to_embed)
Ensure: First time Hide is called in a longblock, embed_count <- 1
    if (In Huffman-coding a long block) then
        if (In region2 of bigvalues) then
            if ((equivmap[x][y] == 1) && (embed_count <= 4)) then
                hide_or_skip <- GetRandBit(passphrase)
                if (hide_or_skip == HIDE_IN_BLOCK) then
                    if ((bit_to_embed == 0) && (x < y)) then
                        SWAP(x, y);
                    else if ((bit_to_embed == 1) && (x > y)) then
                        SWAP(x, y);
                    end if
                    embed_count <- embed_count + 1
                    return HIDE_IN_BLOCK
                end if
            end if
        end if
    end if
    return SKIP_BLOCK
end procedure

In Algorithm 1, the pair (x,y) represents the quantized spectral values in the
bigvalue region. The bit_to_embed, as the name suggests, is the bit that is to be
hidden. equivmap is a global array, one for each of the 30 tables (refer to [14]).
equivmap[x][y] is set to 1 if the number of Huffman bits to encode the pair (x,y)
is the same as the number of Huffman bits required to encode the pair (y,x). The
routine SWAP flips the pair (x,y). The functions GetFileSize and GetRandBit,
which are not defined explicitly, perform the following operations. The GetFileSize
function returns the size of the file that is to be hidden. GetRandBit takes as
its argument the passphrase that the user inputs. The passphrase is then hashed
using the SHA1 algorithm and part of the hash is used to seed a pseudo random
number generator. The return value of the function is the LSB of the random
number generated. The embed_count variable counts the changes made in a
longblock; at most 4 changes are allowed per longblock.
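
A minimal Python sketch of a GetRandBit-style helper is given here for illustration. Which part of the SHA1 hash seeds the generator and which generator is used are not stated in the text, so the first four digest bytes and Python's random.Random are assumptions, as is the name make_rand_bit.

```python
import hashlib
import random


def make_rand_bit(passphrase):
    """Return a function yielding a repeatable hide/skip bit stream for a passphrase."""
    digest = hashlib.sha1(passphrase.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "big")  # assumption: first 4 bytes of the hash
    rng = random.Random(seed)

    def get_rand_bit():
        # LSB of the next pseudo random number
        return rng.getrandbits(32) & 1

    return get_rand_bit


hide_side = make_rand_bit("secret")
extract_side = make_rand_bit("secret")

hide_decisions = [hide_side() for _ in range(16)]
extract_decisions = [extract_side() for _ in range(16)]
```

Because both sides derive the same seed from the same passphrase, the extractor visits exactly the pairs the embedder used, which is what makes the hide/skip decisions repeatable.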
The number of embedded bits is restricted to 4 per granule in the long block.
Our experiments show that this number is a good trade-off between high steg
capacity and low noise. Embedding is skipped if the granule does not contain a
long block. Within a long block, embedding is further restricted to spectral value
pairs in region2 so that the amount of noise introduced is minimal.
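
The swap rule of Algorithm 1 can be sketched in Python for a single long block. The sketch assumes the region2 pairs are given as a list of tuples, takes the equivalence test and the random bit source as callables, and skips pairs with x == y because they cannot express a magnitude relationship; that last choice and all names are illustrative assumptions.

```python
MAX_PER_BLOCK = 4  # at most 4 embedded bits per long block


def embed_in_block(pairs, bits, equiv, rand_bit):
    """Embed up to MAX_PER_BLOCK bits into the (x, y) pairs of one long block.

    Returns the (possibly swapped) pairs and the number of bits consumed.
    After embedding, a used pair has x > y for a 0 and x < y for a 1.
    """
    out = []
    used = 0
    for x, y in pairs:
        eligible = (
            used < min(MAX_PER_BLOCK, len(bits))
            and x != y            # assumption: equal values cannot carry a bit
            and equiv(x, y)       # Huffbits(x, y) == Huffbits(y, x)
            and rand_bit() == 1   # 1 stands for HIDE_IN_BLOCK
        )
        if eligible:
            bit = bits[used]
            if (bit == 0 and x < y) or (bit == 1 and x > y):
                x, y = y, x
            used += 1
        out.append((x, y))
    return out, used


pairs = [(3, 5), (2, 2), (7, 1), (4, 6), (1, 0), (0, 2), (5, 3)]
payload = [0, 1, 1, 0, 1]

stego_pairs, consumed = embed_in_block(
    pairs, payload, equiv=lambda x, y: True, rand_bit=lambda: 1
)
```

Only the order of the two values changes, so the multiset of quantized values in the block, and with the equivalence test the Huffman bit count, stays the same.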

1.4.1 LSB Steganography using Spectral Pairs and Huffman Values
The flowchart in Figure 4 shows, in its diamond boxes, each of the constraints
that are enforced on the candidate pairs of spectral values. The two driving forces
behind the constraints imposed on the candidate pairs for embedding are:

[Figure 4, flowchart "The Hiding Process": obtain the block of data and set
embedCount <- 0; is the block a longblock?; get the (x,y) pair of spectral
values; is the (x,y) pair in region2?; Huffbits(x,y) == Huffbits(y,x)?;
embedCount == 4?; does getRandBit return HIDE_IN_BLOCK?; get the bit to embed
in bit_to_embed; if bit_to_embed == 0 and x < y, or bit_to_embed == 1 and
x > y, SWAP(x,y); embedCount = embedCount + 1; proceed to Huffman code.]

Figure 4: The bit hiding process in detail


1. The overall bitrate
Constant Bit Rate (CBR) MP3 encoding imposes size constraints on an MP3
frame. The frame size in bytes can be computed using the
formula $FrameSize = 144 \times \frac{BitRate}{SampleRate} + Padding$. For example, an MP3 file
encoded at a sampling rate of 44.1 kHz and a bitrate of 128 kbps would
have a frame size of approximately 417 bytes (144 × 128000 / 44100 ≈ 417.96,
truncated to 417, plus a padding byte in some frames). The distribution of bytes
among the granules is encoder dependent. As discussed in section 1.2, it
is the responsibility of the inner-loop to ensure that the total number of
Huffman bits needed to code a block of signal data is within the allotted
number for a frame, which implicitly puts a limit on the granule.
The problem that arises as a result of swapping a spectral pair (x,y) is that
Huffbits(x,y) (see footnote 5) need not be equal to Huffbits(y,x). This causes a
change in the bit count in a granule. In order to keep the total number of bits in a
granule within set bounds, two strategies could be adopted.
a. Restricting (x,y) pairs
In this approach we impose a constraint of only modifying those (x,y)
pairs which satisfy the condition Huffbits(x,y) = Huffbits(y,x).
This case typically arises if the codes for (x,y) and (y,x) are the right
and left leaves of a node in the Huffman code tree. A statistic on the
number of such (x,y) pairs reveals that of the 4995 (table,x,y) triplets,
3302 can be changed even with the above restriction imposed. This
implies that 66% of spectral pairs over all the Huffman tables can be
swapped while preserving the bit count in terms of the Huffman bits
used to encode the granule.

Footnote 5: Huffbits computes the number of bits required to encode a spectral
pair using Huffman codes. For example, Huffbits(1,1) = 3 for table 1, which is
the hlen column value; refer to page 56 of [14].


b. Count1 region modification
The second approach is to modify the count1 region. As mentioned,
the count1 region uses 2 Huffman tables to encode quads of contiguous
values that are either -1, 0 or 1. The encoding of the count1 region
follows the encoding of the big_value region. The extra bits that might
have been used to encode the pair (y,x) after the swap could then be
compensated by finding a suitable quad in the count1 region that can be
replaced with another one whose bit count is lower by the number of extra
bits used to encode the pair (y,x). This approach has the advantage of
providing all the spectral pairs as candidates for a swap, thus increasing
the data hiding capacity of the carrier. The noise introduced due to
count1 region modification is also minimal, as count1 consists mostly of
high frequency components.
The disadvantage however is that count1 modification is not always
possible. A suitable replacement for a quad may not exist in some cases,
or all possible substitutions might have been performed, leading to a
shortage of quads to change. In addition, if the distance between the
values of a pair (x,y) is large, then swapping them might introduce more noise
than desired.
As an example of count1 modification, consider the scenario wherein
modifications (swaps) to spectral pairs in the bigvalue region caused a bit
excess of 4 in the granule. The extra bits would be the result of longer
Huffman codes used to encode the pair (y,x) after the swap. Now suppose
that table A was used to encode the quads in the count1 region.
The bit compensation logic would replace 6 bit codes with 5 bit ones
and 5 bit codes with 4 bit ones. If the logic was able to find 4 such substitutions,
the granule encoding would have been accomplished within the stipulated
bit limit. The problem however arises in cases where no such substitution
is possible. For example, if the count1 region uses table B, which only uses
4 bit codes, no substitution can reduce the excess
bits in the bigvalue region, as there are no codes in the table that are shorter
than 4 bits.
Even though experiments done with the count1 modification show that the
noise introduced using method (b) is not discernible, approach (a) has been
followed in this paper, because the procedure of method (b) fails on occasion
to find a suitable quad replacement. There is however a reduction in
steg capacity when compared to approach (b).
2. Least distance between spectral pair magnitudes
The difference in magnitude between the values of the pair (x,y) was the second
driving force on the constraints imposed on the candidate pairs. The pairs of spectral
values represent the quantized MDCT coefficients within a scalefactor band.
Swapping these values has the effect of pronouncing one of the frequencies in
a band over the adjacent one. As in DCT based steganographic methods, the
amount of change, and hence the noise introduced, can be minimized by the
technique of least significant bit modification. We could impose an additional
constraint on the (x,y) pairs that are modified, such that |x - y| = 1. This
constraint creates an effect similar to LSB steganography, but by imposing
it we reduce the steg capacity of the carrier drastically.
The total number of pairs that satisfy this condition with all the previous
conditions in place falls to 524, thereby reducing the swappable pairs from
66% to 11%. This leads to a drastic reduction in capacity, albeit with the
advantage of reduced noise. Experiments, however, have shown that the noise
introduced without this constraint is negligible when compared to the loss
incurred in the steg capacity of the carrier with the constraint included. We
therefore do not impose this restriction in the implementation. This could
however be interesting for MP3 watermarking applications.
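
The two constraints discussed above can be sketched in Python as a lookup built from a table of Huffman code lengths. The 3 × 3 table below is a made-up toy, not one of the standard MP3 Huffman tables, and counting the diagonal pairs (x,x) as candidates is an assumption about the counting convention.

```python
def build_equivmap(hlen):
    """equivmap[x][y] is 1 when (x, y) and (y, x) have equal Huffman code length."""
    n = len(hlen)
    return [[1 if hlen[x][y] == hlen[y][x] else 0 for y in range(n)]
            for x in range(n)]


def count_candidates(hlen, distance=None):
    """Count swappable (x, y) entries, optionally requiring |x - y| == distance."""
    equiv = build_equivmap(hlen)
    n = len(hlen)
    count = 0
    for x in range(n):
        for y in range(n):
            if not equiv[x][y]:
                continue
            if distance is not None and abs(x - y) != distance:
                continue
            count += 1
    return count


# Hypothetical code-length table: toy_hlen[x][y] = bits needed for the pair (x, y)
toy_hlen = [
    [1, 4, 6],
    [3, 4, 7],
    [6, 7, 8],
]

all_candidates = count_candidates(toy_hlen)
lsb_like_candidates = count_candidates(toy_hlen, distance=1)
```

Running the same count over the standard tables would be one way to reproduce statistics of the kind quoted above (3302 and 524 out of 4995), under whatever counting convention the helper functions of the tool actually use.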
1.4.2 The Extraction Process

The data extraction process is straightforward. The pairs of spectral values
that are obtained after Huffman decoding are compared for their magnitude
relationship. Algorithm 2 details the extraction process. The equivmap array in
the condition is checked for either the (x,y) or the (y,x) pair being set. The logic is
self-explanatory and mirrors that of the hiding process.
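
The decoding rule can be sketched in Python for one long block: a selected pair with x < y reads as 1 and a pair with x > y reads as 0. The function name, the callables for the equivalence test and the random bit source, and the skipping of pairs with x == y are illustrative assumptions.

```python
def extract_from_block(pairs, equiv, rand_bit, max_per_block=4):
    """Recover up to max_per_block bits from the (x, y) pairs of one long block."""
    bits = []
    for x, y in pairs:
        if len(bits) >= max_per_block:
            break
        # Either orientation may be set in the map, since the pair may have been swapped.
        if x == y or not (equiv(x, y) or equiv(y, x)):
            continue
        if rand_bit() != 1:  # 1 stands for RETRIEVE
            continue
        bits.append(1 if x < y else 0)
    return bits


# Hypothetical decoded region2 pairs of a stegged block
decoded_pairs = [(5, 3), (2, 2), (1, 7), (4, 6), (1, 0), (0, 2)]

recovered = extract_from_block(
    decoded_pairs, equiv=lambda x, y: True, rand_bit=lambda: 1
)
skipped = extract_from_block(
    decoded_pairs, equiv=lambda x, y: True, rand_bit=lambda: 0
)
```

The extractor must make its eligibility checks in the same order as the embedder so that both consume the pseudo random bit stream in step.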
1.5 Tool Development

There are quite a few open source MP3 encoders available on the web. The
8Hz MP3 [22] encoder was used for the development of the tool. The 8Hz source code
base is also used by the MP3Stego and UnderMP3Cover tools. The 8Hz MP3 encoder is not
the best available encoder in terms of speed and the quality of sound produced.
It is however one of the earliest encoders and has been the source base for the
development of numerous encoders, prominent among them the LAME [23]
encoder, which started off as a patch to the 8Hz encoder and grew to its present
status as one of the prominent open source encoders.
The code change primarily involved manipulating the quantized MDCT values
in the Huffmancodebits function of the l3bitstream.c file. The quantized MDCT
values are compared to determine if a swap is needed to encode the hidden bit
before they are passed on to the Huffman encoding function HuffmanCode. The
extraction logic is implemented in the decode.c file, in the III_hufman_decode
function. The reverse logic as mentioned in Algorithm 2 is coded to extract the hidden


Algorithm 2 BvSteg - The extraction process

procedure RetrieveMesg(passphrase)
    retrieve_status <- GOT_BIT
    file_size <- GetFileSize(file_to_hide)
    file_size_bit_count <- 0
    while file_size_bit_count < 32 do
        retrieve_status <- Extract(passphrase, 0, FILE_SIZE_BITS)
        if retrieve_status == GOT_BIT then
            file_size <- collectFileSize(bit_retrieved)
            file_size_bit_count += 1
        end if
    end while
    retrieve_status <- GOT_BIT
    file_size_bit_count <- 0
    while ((file_size_bit_count < file_size) &&
           (retrieve_status != EXTRACTION_COMPLETE)) do
        retrieve_status <- Extract(passphrase, file_size, MESSAGE_BITS)
        if retrieve_status == GOT_BIT then
            message <- gatherBits(bit_retrieved)
            file_size_bit_count += 1
        end if
    end while
end procedure

Require: First time call in a granule, change_per_granule <- 1
Require: First time call of the Extract procedure, total_bits <- 0
procedure Extract(passphrase, file_size, file_size_or_mesg_bit)
Ensure: First time call in a granule, change_per_granule <- 1
Ensure: First time call of the Extract procedure, total_bits <- 0
    if (In Huffman-decoding a long block) then
        if (In region2 of bigvalues) then
            if ((equivmap[x][y] == 1) || (equivmap[y][x] == 1)) then
                if (change_per_granule <= 4) then
                    retrieve_or_skip <- GetRandBit(passphrase)
                    if (retrieve_or_skip == RETRIEVE) then
                        if (x < y) then
                            HiddenBit <- 1;
                        else if (x > y) then
                            HiddenBit <- 0;
                        end if
                        change_per_granule <- change_per_granule + 1;
                        if (file_size_or_mesg_bit == MESSAGE_BITS) then
                            total_bits <- total_bits + 1;
                        end if
                    end if
                end if
            end if
        end if
    end if
    if (file_size_or_mesg_bit == MESSAGE_BITS) then
        if total_bits == file_size then
            return EXTRACTION_COMPLETE
        end if
    end if
end procedure

bit based on the magnitude relationship of the quantized Huffman pairs. Helper
functions were written to build a database of equivalent spectral pairs in terms of
Huffman code length.
The hiding and extraction process is integrated into a Python script which calls
the modified encode and decode executables of the 8Hz encoder. Bit conversion
routines that convert a text file to a stream of bits for hiding, and vice-versa after
extraction, are incorporated into the script. In addition, two helper functions in
the main hiding and extraction procedures provide a pseudo random generator
based on the SHA1 hash and the file size embedding logic. A part of the SHA1 hash of
the user input passphrase is used to seed a replicable pseudo random number
generator. This provides the logic to randomly hide and skip blocks. The file size
of the payload file is hidden in the first 32 randomly selected bits. This limits the
size of the payload to 4GB, which is expected to be sufficient in terms of payload capacity.
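
The bit conversion and the 32-bit size header can be sketched as follows. The function names, the most-significant-bit-first ordering and the big-endian byte count are assumptions made for illustration; the thesis script itself is not reproduced here.

```python
def bytes_to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits):
    """Pack bits (MSB first) back into bytes, ignoring an incomplete tail."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def frame_payload(data):
    """Prefix the payload with its size in bytes, stored in 32 bits."""
    header = len(data).to_bytes(4, "big")
    return bytes_to_bits(header + data)


def unframe_payload(bits):
    """Read the 32-bit size, then return exactly that many payload bytes."""
    size = int.from_bytes(bits_to_bytes(bits[:32]), "big")
    return bits_to_bytes(bits[32:32 + 8 * size])


stream = frame_payload(b"hi")
restored = unframe_payload(stream + [1, 0, 1])  # trailing carrier bits are ignored
```

A 32-bit byte count can describe payloads of up to 2^32 bytes, which is where the 4GB limit comes from.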

1.6 Steg Capacity and Noise Analysis

The table in Figure 5 shows the capacity of various wav files under different
steg tools. The files have been selected from different genres. In addition, the
differing sizes of the wav files reflect different durations.
Name          Size of wav (MB)   MP3Stego (KB)   UnderMP3Cover (KB)   BvSteg (KB)
beatles3              9                1                  1                 3
jazz2                65                7                  7                15
sting10              64                7                  7                20
vanmorris7           50                5                  5                17
guitar19            105               11                 11                40
nayyar2              60                6                  6                24

(a) Size of the wav file is in MB. Steg capacity of the MP3 files after encoding the wav files
using MP3Stego and BvSteg is in KB; capacity of the MP3 files with data hidden using
UnderMP3Cover with a spacing of 2 is also in KB.

Figure 5: Size comparison of payload capacity
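
The per-file capacity ratios implied by Figure 5 can be worked out with a few lines of Python. The numbers are copied from the table; since they are rounded to whole KB, the ratios are only approximate.

```python
# (MP3Stego KB, UnderMP3Cover KB, BvSteg KB) per carrier, taken from Figure 5
capacity_kb = {
    "beatles3": (1, 1, 3),
    "jazz2": (7, 7, 15),
    "sting10": (7, 7, 20),
    "vanmorris7": (5, 5, 17),
    "guitar19": (11, 11, 40),
    "nayyar2": (6, 6, 24),
}

# Ratio of BvSteg capacity to MP3Stego capacity for each carrier
ratios = {name: bvsteg / mp3stego
          for name, (mp3stego, _umpc, bvsteg) in capacity_kb.items()}

mean_ratio = sum(ratios.values()) / len(ratios)

for name, ratio in ratios.items():
    print(f"{name:12s} {ratio:.2f}x")
print(f"mean         {mean_ratio:.2f}x")
```

On these rounded figures the gain ranges from a little over 2 times to 4 times depending on the carrier, with the factor of 4 reached for nayyar2.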


In order to analyze the noise introduced as a result of stegging an MP3, the
MP3 signal was analyzed using Audacity [24], an open source audio editor. The
following steps were performed as part of the analysis:
1. The original MP3 signal and the stegged MP3 signal are loaded.
2. The phase of the steg signal is inverted.
3. Both signals are then mixed together to obtain the difference signal.
The difference signal shows the noise introduced by using a particular technique for
data hiding. Also, the stegged MP3 signal was analyzed to check its fidelity
with the original signal, i.e. the signal encoded with 8Hz without the hidden data. All
steganographic methods introduce noise in the carrier. A good steg technique is
one that does not introduce perceivable noise. MP3Stego introduces noise in the
carrier when it changes the quantizer step size in order to meet the additional
constraint in the inner-loop, i.e. parity(part2_3_length) == bit_to_embed, while
UnderMP3Cover introduces noise as part of the modification to the global_gain value.
BvSteg introduces noise due to the spectral pair swap.
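
The invert-and-mix procedure amounts to a sample-wise subtraction. The Python sketch below assumes both signals are already decoded to equal-length lists of samples; the sample values are made up, and the RMS helper is an added convenience rather than part of the analysis described above.

```python
import math


def difference_signal(original, stego):
    """Invert the phase of the stego signal and mix it with the original."""
    if len(original) != len(stego):
        raise ValueError("signals must have the same number of samples")
    inverted = [-s for s in stego]
    return [o + i for o, i in zip(original, inverted)]


def rms(samples):
    """Root mean square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


original = [0.0, 0.5, -0.25, 0.75]   # hypothetical decoded samples
stego = [0.0, 0.5, -0.125, 0.75]     # same signal with one perturbed sample

noise = difference_signal(original, stego)
noise_level = rms(noise)
```

Identical signals cancel completely, so whatever remains in the mix is the perturbation introduced by the hiding step.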
Figures 6a and 6b show the characteristics of the signal and noise respectively
for a non-stegged MP3 file and the same MP3 file stegged with BvSteg. The
stegged MP3 signal follows the original signal quite faithfully. Figures 7a and 7b show
the signal and noise plots for MP3Stego, and Figures 8a and 8b show those of
UnderMP3Cover. Though the plots for all 3 steganographic techniques look
nearly identical, BvSteg displays the same noise and signal characteristics while
carrying up to 4 times the payload of the other tools.
This shows the superiority of BvSteg over both MP3Stego and UnderMP3Cover.

Conclusion
This paper began by introducing the concept of digital steganography. The
MPEG layer III audio encoding process was then illustrated, highlighting the lossy
aspects of the compression process. This was followed by an overview of existing
steg tools and techniques. A bug in the MP3Stego tool that is caused by the
inner_loop constraints was exposed. A new steg technique, BvSteg, was proposed.
The technique is better than the existing steg concepts and tools in terms of
payload capacity. BvSteg has up to 4 times the capacity of both MP3Stego and
UnderMP3Cover. The BvSteg tool modifies the quantized MDCT coefficients in the high
frequencies of the big_value region. The changes in the high frequency region
are less discernible and hence do not introduce substantial noise. The feasibility
of the technique was outlined by presenting the algorithms and flowchart for the
embedding and extraction process. Possible methods for increasing the capacity,
such as count1 modification, and for lowering the noise, i.e. least spectral
magnitude difference, were also discussed. A noise and capacity analysis presented


[Figure 6 plots: amplitude in dB versus frequency in Hz.]
(a) BvSteg/Original Signal. Black: MP3 signal after encoding the wav file using 8Hz,
Green: MP3 signal after hiding using BvSteg. The BvSteg signal
follows the original signal with high fidelity.
(b) BvSteg Noise. Black: MP3 signal after encoding the wav file using 8Hz,
Green: Noise signal introduced after hiding using BvSteg.

Figure 6: Noise/Signal Analysis of BvSteg

[Figure 7 plots: amplitude in dB versus frequency in Hz.]
(a) MP3Stego/Original Signal. Black: MP3 signal after encoding the wav file using 8Hz,
Red: MP3 signal after hiding using MP3Stego.
(b) MP3Stego Noise. Black: MP3 signal after encoding the wav file using 8Hz,
Red: Noise signal introduced after hiding using MP3Stego.

Figure 7: Noise/Signal Analysis of MP3Stego

[Figure 8 plots: amplitude in dB versus frequency in Hz.]
(a) UnderMP3Cover/Original Signal. Black: MP3 signal after encoding the wav file using 8Hz,
Blue: MP3 signal after hiding using UnderMP3Cover.
(b) UnderMP3Cover Noise. Black: MP3 signal after encoding the wav file using
8Hz, Blue: Noise signal introduced after hiding using UnderMP3Cover.

Figure 8: Noise/Signal Analysis of UnderMP3Cover

validates the claim of BvSteg having higher capacity as compared to the existing
tools. The spectrum analysis demonstrated the noise component introduced due
to the perturbations to be within acceptable levels.
The BvSteg tool, despite its high steg capacity, could still improve in the manner
in which the hiding algorithm selects the candidate frequency bands for hiding
data. Methods for the selection of noiseless bands provide scope for future work. A
brief description of one such proposed methodology ensues.
One of the challenges in data insertion as a means of covert communication
or watermarking is that any perturbations should have minimal impact on the
signal quality. In order to achieve this we need to distinguish noisy bands from
noiseless ones. The distortion control/rate loops are designed such that both the
constraints of bit rate and allowed quantization noise are met. In order to meet the
latter requirement, the scalefactor bands that have more than the allowable distortion
are amplified. The amplification is done so that more bits are allocated
during the subsequent call to the inner-loop and hence the quantization noise which
was introduced earlier in the scalefactor band is reduced. This also implies
stealing bits from bands that had noise within permissible levels. The amplification
is done under the assumption that the bands which have tolerable noise levels after
quantization can assimilate more noise without crossing the noise threshold set by
the psychoacoustic model.
It is proposed that identifying scalefactor bands in granules that are not
amplified would help select frequency ranges that introduce the least amount of
noise when their amplitude is modified. In addition, techniques involving the selection
of blocks based on the number of bits in the bigvalue region, as done by [18], would
be an interesting approach. This provides scope for future work.
The paper has been successful in demonstrating a new high capacity steg
technique, BvSteg, which hides data in region2 of long blocks in MP3 files during
encoding. This work aims at popularizing the use of MP3 as a steganographic medium
and lays the foundation for the development of steg tools in the MDCT domain. The open
nature of the tool would also serve as a case study that would aid the steganalysis of
any MDCT based steganographic tools.


MANUSCRIPT 2
A Tool Framework for MP3 Steganalysis

Abstract
Steganalysis is the technique of detecting the presence of hidden data in a
medium which can act as a possible carrier. Digital steganalysis employs
techniques of statistical analysis and machine learning to detect hidden data in
multimedia and files with various formats. Most steganographic techniques
alter statistical characteristics of the underlying media. Steganalysis techniques
employed in the JPEG/MPEG domain try to detect this change to ascertain the
presence of steg in the medium.
In the case of MP3 based steganography the tools are mostly built on top of
an existing open source encoder. The 8Hz encoder, which is one of the oldest MP3
encoders, has been the encoder of choice for the steg tools discussed here. This paper
proposes a framework for a tool for the detection of MP3 based steganographic
tools. MP3Stego and UnderMP3Cover are the only two MP3 steganographic tools
known apart from the new technique proposed in the previous paper.
Step 1 of the detection process uses a Support Vector Machine (SVM) to
identify the encoder used to encode the MP3 file in question, as outlined in [25]. The
tool uses a multi-class SVM for encoder classification. Step 2 employs strategies
described in [1] [2] that target the detection of the specific MP3 steg tools MP3Stego
and UnderMP3Cover. Step 1 of the detection process is used for pre-filtering MP3
files in order to reduce the false positive rate. This step would also be beneficial in
detecting encoders used to implement new steg techniques in the future.


2.1 Introduction

Steganalysis techniques range from simple methods that detect the presence of
hidden data by perceptual analysis of the media or by exploiting weak hiding techniques
(for example, a JPEG steganographic technique that hides data after the end of file
marker) to complex ones that employ machine learning and second order statistics
[26]. In techniques that employ machine learning, the objective is to analyze the
alterations in the media that result from the hiding of data and to build attributes
that can be used to distinguish stegged from non-stegged files. This paper describes
an MP3 steganalysis tool built on the work in MP3 encoder classification and
steganalysis by Böhme and Westfeld in their papers [25] [1] [2]. The layout of
the paper is as follows. Section 2.2 gives a high level view of the
MPEG I/II layer III audio encoding algorithm. Section 2.3 covers the existing MP3
based steg tools. Section 2.4 develops a basic understanding of classification using
support vector machines. Section 2.5 delves into the proposed tool framework with
test results. Section 2.6 concludes, highlighting future work.
2.2 MPEG Audio Encoders

MP3 encoders are software that implement the MPEG audio specifications in
compressing an audio signal to MP3 format, in the process achieving a compression
ratio of nearly 1:12. An uncompressed audio file is stored as PCM (Pulse Code
Modulated) samples. PCM samples are a digital representation of the analog
waveform of an audio signal. In general the audio format is a WAV (Waveform
audio format) file containing uncompressed PCM samples. MP3 encoders that
convert WMA (Windows Media Audio) format to MP3 format also exist.
Though no one holds exclusive rights, MP3 technology has most of its
algorithms patented by Fraunhofer-Gesellschaft. Despite patent issues, quite a few
MP3 'like' encoders exist and follow the same basic encoding blocks as shown in
Figure 9. The normative elements in the MPEG standard specify the format of
the bit stream (compressed audio) and the structure of the decoder. The encoder
implementation is left completely to the implementer. This freedom has given
rise to different encoders which produce MP3 bit streams with the same format albeit
distinct properties. Step 1 of MP3 steganalysis detection uses an SVM with these
properties as features to differentiate between the MP3 encoders.

Figure 9: MP3 Encoding process

2.3 Overview of MP3 steg tools

As mentioned earlier, MP3 based steganography is still in its early stages.
There are 2 known tools for MP3 based steganography, MP3Stego and
UnderMP3Cover. The hiding procedure employed in each tool is
described below.
2.3.1 MP3Stego

MP3Stego [7] is one of the earliest MP3 based steganographic tools. Data
hiding in MP3Stego is done as part of the encoding process. MP3Stego uses
the power-of-parity [9] principle to embed data in the part2_3_length of a granule
in an MP3 file. The part2_3_length variable indicates the total number of bits
required to encode the scalefactors and the Huffman coded data. The granules to
be modified are randomly chosen (using SHA-1). The value, and hence the parity,
of the part2_3_length variable is modified in the inner-loop during quantization. In
addition to the original condition that the number of bits used for encoding a
block be within a bound, the loop terminates only if the parity of the variable
part2_3_length is the same as the bit to be embedded.
2.3.2 UnderMP3Cover

UnderMP3Cover [8] is an MP3 LSB steganographic tool that uses global_gain
to hide data. Unlike MP3Stego, UnderMP3Cover hides data in an already encoded
MP3 file. The modification is done on the LSB of the global_gain variable in selected
granules to reflect the embedded bit. The tool uses a spacing parameter to select
the granules to embed in. MP3Stego and UnderMP3Cover have comparable
data-hiding rates for a given carrier file.
The maximum steg capacity of the carrier when used with MP3Stego and
UnderMP3Cover is 4 × number_of_frames bits. Both programs can embed
a maximum of 4 bits in a frame if the signal is stereo, since a stereo signal has 2
channels and each channel has two granules.
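
The capacity bound can be turned into a small Python calculation. The sketch relies on the fact that an MPEG-1 Layer III frame carries 1152 samples per channel; the 180 second duration and the function names are illustrative assumptions.

```python
SAMPLES_PER_FRAME = 1152  # samples per channel in one MPEG-1 Layer III frame


def frame_count(duration_seconds, sample_rate=44100):
    """Number of complete MP3 frames in a signal of the given duration."""
    return int(duration_seconds * sample_rate // SAMPLES_PER_FRAME)


def max_capacity_bits(n_frames, channels=2, granules_per_frame=2):
    """Upper bound of one hidden bit per granule per channel."""
    return n_frames * channels * granules_per_frame


frames = frame_count(180)            # a hypothetical 3 minute stereo file
bits = max_capacity_bits(frames)
kilobytes = bits / 8 / 1024
```

For such a file the bound comes to a few kilobytes, which is the same order of magnitude as the MP3Stego and UnderMP3Cover columns of Figure 5.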
2.4

Support Ve tor Ma hines and Steganalysis

Support vector machines were invented by Vladimir Vapnik. SVMs are a
classification technique which uses the concept of maximal margin classifiers
as the basis for classification. Maximum margin classifiers allow for better
generalization of the classifier by placing the decision surface equidistant
from the two classes. In addition, the decision surface is placed such that it
maximizes the distance between the classes. The data points in either class
that constrain the decision surface from moving any further in either direction
are called support vectors.

Figure 10: Support Vector Machine Model with a linear decision surface

A soft margin SVM is one wherein the decision function is allowed to make
mistakes. The concept of soft margin classifiers introduces a cost parameter in
the formulation of the decision surface. The cost parameter is introduced to
penalize the decision surface on erroneous classification. Noisy data points
are the reason for the use of soft margin classifiers. The formulation of the
decision surface in such a case is often dictated by the noisy points, which do
not represent the true separation between the classes. In order to have better
generalization a slack variable is introduced in the optimization problem for
the maximum margin classifier, and the classifier is allowed to make mistakes
on these noisy points so that the margin can be made wider. With the slack
variable the optimization has a trade-off between margin width and error. High
costs result in smaller margins because of the large penalties incurred by
possible misclassification. A small margin also means that the model may not
generalize well. On the other hand, smaller costs imply wider margins: the model
is allowed to make mistakes while learning, which makes it more in tune with
noisy real world data for classification.
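
The trade-off between margin width and error can be made concrete with the
usual soft margin objective, 0.5 * ||w||^2 + C * sum(slack_i), where slack_i =
max(0, 1 - y_i * (w . x_i + b)). The notation and the toy data below are
assumptions for illustration and are not taken from the thesis.

```python
def soft_margin_objective(w, b, points, labels, cost):
    """Return (objective, slacks) for a linear decision surface w.x + b = 0."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    slacks = []
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Slack is zero for points on the correct side of the margin.
        slacks.append(max(0.0, 1.0 - y * score))
    return margin_term + cost * sum(slacks), slacks


# Four clean 1-D points plus one noisy point labelled -1 on the positive side.
points = [(-2.0,), (-1.0,), (1.0,), (2.0,), (0.5,)]
labels = [-1, -1, 1, 1, -1]
```

For w = (1.0,) and b = 0.0 only the noisy point has non-zero slack (1.5). With
cost 1 the objective is 2.0; with cost 100 it is 150.5, so a high cost pushes
the optimizer to bend the surface around the noisy point at the expense of
margin width.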
The power of SVMs is highlighted by the fact that the decision surface can be
based on non-linear kernel functions, which make it possible to separate data
that is not linearly separable in the input space. The kernel trick, as it is
called, applies a transformation function that maps data points from the input
space, where the data is not linearly separable, to a higher dimensional space
called the feature space, where the data becomes linearly separable. Kernel
functions are peculiar in the sense that their properties, positive
definiteness among others, enable the feature space computations to be
performed in input space, which is quite remarkable.
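
A small worked example shows what performing feature space computations in
input space means. It uses the homogeneous degree-2 polynomial kernel on 2-D
inputs, chosen here only because its feature map is easy to write out; it is
not necessarily one of the kernels configured in the experiments.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed entirely in the 2-D input space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit map into the 3-D feature space induced by poly_kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))
```

For any pair of inputs, poly_kernel(x, y) agrees with
dot(feature_map(x), feature_map(y)) up to floating point rounding, so the SVM
never has to build the feature vectors explicitly.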
Of the algorithms that implement the SVM methodology, SMO (Sequential
Minimal Optimization) is the popular implementation used in machine learning.
The concept of VC-dimension in defining model complexity plays an important
role in selecting less complex models, as they are the ones that are likely
to generalize better. In most cases the underlying data might not be a true
representation of the data universe in terms of completeness. Without delving
into the details, the VC-dimension of a model class¹ defined for a data set D
(n data points) is the size m of the largest subset of D shattered by the model
class. A model class is said to have shattered a data set D if, for all the
possible label configurations in the data set, the models in the model class
can separate the data points perfectly. Since the data set used for training a
model, and hence a classifier, is only a representation of the data universe,
we cannot expect to have the knowledge needed to reduce the expected risk.
However, we can learn from the observed data in the data set and reduce the
empirical risk. This is called empirical risk minimization. Overly optimistic
empirical risk minimization, that is, reducing the training error to achieve
high accuracy, can lead to models that generalize poorly.

¹ A model class represents all the possible configurations (rotation and
translation) of a decision surface with a given width.

Multi-class SVMs with soft margins use pairwise classification to build
classification models. In general, models with low cost (extremely soft margin)
tend to lean towards a heavily weighted (in terms of instances) class. Since in
multi-class problems the classes might not be evenly represented in the
training data, this could cause a problem. It is resolved by pairwise
classification between each of the classes. In order to classify an unknown
observation a voting scheme is adopted whereby the class that gets the largest
number of votes in the pairwise classifications is assigned to the unknown
observation. For an in-depth understanding of SVMs refer to [27] and [28].
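
The voting scheme can be sketched as below. The pairwise classifiers here are
hypothetical threshold functions on a single made-up feature; in practice each
one would be a trained two-class SVM.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(classes, pairwise_classifiers, x):
    """Ask every pairwise classifier for a winner and return the class
    that collects the most votes."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_classifiers[(a, b)](x)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical two-class deciders on a single feature value.
classes = ["8Hz", "lame", "gogo"]
pairwise_classifiers = {
    ("8Hz", "lame"): lambda x: "8Hz" if x < 1.0 else "lame",
    ("8Hz", "gogo"): lambda x: "8Hz" if x < 2.0 else "gogo",
    ("lame", "gogo"): lambda x: "lame" if x < 3.0 else "gogo",
}
```

For x = 1.5 the three classifiers vote lame, 8Hz and lame, so the observation
is assigned to lame. With k classes the scheme needs k(k-1)/2 classifiers, which
is 45 for the 10 encoders considered here.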
The MP3 encoder classification problem uses a soft margin multi-class SVM.
Various kernels (linear, radial and polynomial) are tested to find the best
model in terms of accuracy and generalization.

2.5 MP3 Steganalysis Tool Architecture

The ensuing sections elaborate on the architecture of the tool and the concepts
of machine learning adopted, along with the validity and significance of the
classification model built.

2.5.1 The tool architecture

Figure 11 describes the layout of the MP3 steganalysis tool in detail. The
tool consists of two parts. The first part is an encoder classification scheme
built using a support vector machine (SVM) for classification of MP3 files based
on the encoder used to create them. The feature extraction procedure was built
using the mpglib MP3 decoding library. All the features are generated as part
of the MP3 decoding process and are written to a file. The feature file is then
loaded in the R programming environment for training/testing using SVM. The
result of the multiclass SVM is an output depicting the class (encoder) to which
the MP3 file belongs. The files encoded using the 8Hz and SoloH encoders are the
only ones that make it to the second stage of steganalysis.

Figure 11: MP3 Steganalysis Tool Framework


The features used for encoder classification are specified in Table 1. Details
on the significance of using these features can be found in [25].
Table 1: Features used for classification
1  Effective bit rate ratio
2  Granule size balance
3  Reservoir usage ramp
4  preflag ratio
5  Block type transitions
6  SCFSI usage
7  Frame length alignment
8  Huffman table selection
9  Stuffing byte value

A dataset of 2000 MP3 files was generated by encoding a set of 200 wav files
with each of the 10 encoders listed in Table 2. These are the encoders that are
detected using the SVM classification. MP3 files encoded with 8Hz and SoloH are
carried over to the second stage.

Table 2: The encoder list
1   8Hz
2   plugger
3   mp3sEnc
4   fastenc
5   shine
6   gogo
7   SoloH
8   m3e
9   lame
10  bladeenc

The encoding speed observed varies, with Gogo being the fastest encoder and
SoloH being the slowest. The reasons for restricting the encoders to the ones
mentioned in the list are:
1. Certain Fraunhofer and Xing MP3 encoders are not freely available due to
licensing issues.
2. For the purpose of steganalysis we are only interested in preclassifying
encoders that have modifiable source code. Steganographic tool development
would only happen on open source encoders.

2.5.2 Training and Testing Classifier Models


In order to build a suitable model for the classification of the 10 encoders in
Table 2 we built a feature extraction program using the mpglib library and
extracted features from over 2000 MP3 files. These MP3 files have a mix of
different genres and durations. The tracks include live performances from The
Beatles, classical guitar, Van Morrison, Pearl Jam and Sting. Additionally, 1000
MP3 files were assembled for testing purposes. Step 1 of the tool extracts the
feature vector from an MP3 file and builds a multi-class SVM for encoder
classification. We use the tune function of the e1071 package (libsvm library)
in R for training the models. The models are built with 10-fold
cross-validation to reduce the bias. A soft-margin classifier is used to train
an SVM model on these feature vectors, whereby we are able to account for
outliers (noise) that would mimic the real world scenario.
Table 3 shows the range of the free parameters that were used to train models
using SVM. Linear, radial and polynomial kernels are used. Not all kernels have
all the free parameters; a '-' in the table represents the absence of the
parameter for the kernel.
Table 3: SVM Training
Kernel      Cost Range  Gamma Range  Coef0 Range  Degree     TrainA %
Linear      0.01-1000   -            -            -          89.12
Radial      0.01-1000   0.0625-256   -            -          90.54
Polynomial  0.01-1000   0.0625-256   -100-1000    2,3,5,7,8  91.74

Table 4 shows the best parameters selected from the models built, along with
the test accuracy on 1000 pristine MP3 files with a mix of all encoders. The
polynomial kernel is chosen based on the accuracy results on the test set of
1000 MP3 files. The bootstrap confidence interval for each of the kernels is
also shown in Table 4.
Table 4: SVM Test Results
Kernel      Cost  Gamma   Coef0  Degree  TestA %  Bootstrap Interval
Linear      1     1       -      -       85.88    85.71 - 90.50
Radial      10    0.0625  -      -       86.82    86.76 - 90.61
Polynomial  0.01  0.5     10     2       90.47    85.05 - 92.29

2.5.3 Validity and Statistical Significance


The confidence interval represents the impact of the uncertainty of the real
world data on the classifier's ability to predict. We ran a bootstrap algorithm
with 200 bootstrap samples, each having 3000 data points, using the optimal
parameters obtained for each of the kernels. Each of the 200 samples was drawn
from the original dataset with replacement. The replacement in sampling
represents the bias in the real world data. A 10-fold cross-validated error is
computed for each sample. Each fold has a split of 90/10 (hold out method). The
cross-validated error results are then sorted in ascending order. In order to
derive a 95% confidence interval for the error we extract the 2.5th percentile,
which forms the lower bound, and the 97.5th percentile, which forms the upper
bound. Thus the accuracy ranges in the confidence interval column of Table 4
imply that we are 95% sure that, with the bias in the real world scenario, the
models have an accuracy that falls in the given range.
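
The percentile bootstrap described above can be sketched as follows. This is an
illustration under stated assumptions: the statistic is a placeholder accuracy
over 0/1 outcomes instead of the 10-fold cross-validated SVM error used in the
thesis, the nearest-rank percentile rule is one of several possible
conventions, and all function names are hypothetical.

```python
import math
import random


def percentile(sorted_values, q):
    """Nearest-rank percentile (q in 0..100) of an ascending list."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def bootstrap_interval(data, statistic, n_samples=200, sample_size=None, seed=0):
    """Resample with replacement, sort the statistics, cut 2.5% off each tail."""
    rng = random.Random(seed)
    size = len(data) if sample_size is None else sample_size
    results = []
    for _ in range(n_samples):
        sample = [rng.choice(data) for _ in range(size)]
        results.append(statistic(sample))
    results.sort()
    return percentile(results, 2.5), percentile(results, 97.5)


def accuracy(outcomes):
    """Placeholder statistic: fraction of correct (1) outcomes."""
    return sum(outcomes) / len(outcomes)
```

Calling bootstrap_interval(outcomes, accuracy) on a list of per-file 0/1
outcomes returns the lower and upper bound of the 95% interval. Replacing
accuracy with a function that trains and cross-validates a classifier on the
resampled data would reproduce the procedure in the text.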
Based on the test results and the bootstrap confidence interval values the
polynomial kernel is chosen as it is statistically significant and has a better
test accuracy.
The confusion matrix of the polynomial kernel model on the 1000 test samples
is given in Table 5. The values represent classification accuracy in terms of a
percentage for each encoder class. As can be observed, the model does not
alleviate the problem faced by the authors of [25], which is the false
classification of 8Hz as SoloH and vice versa. As mentioned in [25], this is
attributed to the similarity in the origin of these encoders. The error rate
does not, however, cause problems in step 2, as we run the steganalysis
detection on the files that have been classified as either 8Hz or SoloH.
Step 2 of the tool detects specific MP3 steg techniques, which include
MP3Stego and UnderMP3Cover. As mentioned earlier, the nascent nature of MP3
steg techniques is the reason for the small number of MP3 steg tools. Both
steg tools have been successfully detected in Westfeld's papers [1] [2].

Table 5: Confusion Matrix for the polynomial kernel
              8Hz  plugger  fastenc  shine  gogo  m3e  lame  SoloH  bladeenc  mp3s
8Hz            86        0        0      0     0    0     0     28         0     0
plugger         0      100        0      0     0    0     0      0         0     0
fastenc         0        0      100      0     0    0     0      0         0     0
shine           0        0        0    100     0    0     0      0         0     0
gogo            0        0        0      0   100    0     0      0         0     0
m3e             0        0        0      0     0  100     0      0         0     0
lame            0        0        0      0     0    0   100      0         0     0
SoloH          24        0        0      0     0    0     0     82         0     0
bladeenc        0        0        0      0     0    0     0      0       100     0
mp3sEncoder     0        0        0      0     0    0     0      0         0   100

2.5.4 MP3Stego Detection

MP3Stego detection employs block length analysis to distinguish MP3 files
that are stegged using the MP3Stego software from the ones that are encoded
using any other MP3 encoder. As explained in [1], an MP3Stego-modified MP3 file
has the same size as the original file, despite the block sizes being
different. This is due to the MP3 rate control process with CBR (Constant Bit
Rate) audio, which results in the encoder compensating for the extra bits in one
frame by reducing the bits allocated to a subsequent one. Although the mean of
the block lengths is the same, their variance in a steganographically modified
file is different from that of a non-stegged MP3 file. Figure 12 shows this
difference in variance using histograms of the block lengths of 2 MP3 files,
stegged and non-stegged. A non-stegged MP3 file has a unimodal distribution of
block lengths which peaks near the average frame length.
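
The point that the mean stays fixed while the variance changes can be checked
on any two lists of block lengths. The numbers below are invented purely for
illustration and are not measurements from MP3 files.

```python
from statistics import mean, pvariance


def block_length_summary(block_lengths):
    """Return (mean, population variance) of a list of part2_3_length values."""
    return mean(block_lengths), pvariance(block_lengths)


# Invented toy sequences with the same mean but different spread.
clean_blocks = [760, 764, 768, 772, 776]
stego_blocks = [700, 740, 768, 796, 836]
```

Both toy sequences have a mean of 768, but the variance is 32 for the first and
2163.2 for the second, which is the kind of difference a detector can exploit
even though the overall file size is unchanged.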
MP3Stego detection involves 2 stages. The first stage in building the
MP3Stego detection engine involves determining the autoregressive coefficients
$\beta_0$, $\beta_1$ and $\beta_2$ as per the block length relationship
$block_i = \beta_0 + \beta_1 \cdot block_{i-1} + \beta_2 \cdot block_{i-2}$
mentioned in [1]. A model using quadratic discriminant analysis (QDA) with
$\beta_0$, $\beta_1$ and $\beta_2$ as the feature vector was built to
distinguish files encoded using MP3Stego and the 8Hz encoder. The model was able
to achieve 100% distinction between 8Hz and MP3Stego, which is supported by the
confusion matrix in Table 6. The model was built using 1500 MP3 files with an
equal number of stegged and non-stegged files. The stegged files had data at 50%
embedding capacity. The model was tested on 1000 pristine MP3 files with an
equal number of files from the steg and the non-steg category.
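
The autoregressive coefficients can be estimated by ordinary least squares. The
sketch below is one possible way to do it in plain Python and is not the code
used in the thesis or in [1]: it forms the normal equations for the lag-1 and
lag-2 regressors and solves the resulting 3x3 system. The function names are
hypothetical, and the QDA step that consumes the coefficients is not shown.

```python
def solve_linear(a, b):
    """Solve a small dense system a.x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ar2(blocks):
    """Least-squares (b0, b1, b2) for
    blocks[i] = b0 + b1 * blocks[i-1] + b2 * blocks[i-2]."""
    rows = [(1.0, blocks[i - 1], blocks[i - 2]) for i in range(2, len(blocks))]
    targets = [blocks[i] for i in range(2, len(blocks))]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    xty = [sum(r[j] * t for r, t in zip(rows, targets)) for j in range(3)]
    return solve_linear(xtx, xty)
```

Applied to the sequence of part2_3_length values of one file, fit_ar2 returns
the three numbers that serve as that file's feature vector. A synthetic
sequence generated from known coefficients is a convenient sanity check,
because the fit should recover them.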

Table 6: Confusion Matrix for QDA model for MP3Stego detection
          MP3Stego  8Hz
MP3Stego       500    0
8Hz              0  500

2.5.5 UnderMP3Cover Detection

The detection of UnderMP3Cover in the tool is a mere integration of the code
cited in the work [2]. The program updet exploits the feature of the steg tool
whereby the size information of the hidden file is stored in the first 6 bits
of the carrier MP3 file. By extracting this data the program updet checks if
this value is larger than the theoretical maximum, which is
4 × total_number_of_frames. The program has a limitation in that it assumes a
default spacing of 2 (though adding detection for other values of spacing is
trivial).
As part of the UnderMP3Cover detection the tool filters out any MP3 files
that have not been encoded using either 8Hz or SoloH. In stage 2 the detector
for UnderMP3Cover is invoked with the MP3 files that pass the stage 1 filter.

Figure 12: Block length distribution (two histograms of frequency against
part2_3_length; panels: Nonstegged block length distribution, MP3Stego block
length distribution)

Conclusion and Future Work

In this paper we have outlined a framework for MP3 encoder classification
and steganalysis. We adopted the work in this field by Böhme and Westfeld and
extended it to our steganalysis framework, which also includes JPEG
steganalysis. The implementation uses SVMs for encoder classification with the
polynomial kernel chosen as a result of better generalization. In order to
analyze the model's effectiveness in the real world we performed a statistical
significance test via bootstrapping. We were able to achieve results comparable
with those in the original work using SVM; in addition, by performing
bootstrapping we have also demonstrated the viability of the SVM classification
methodology in the real world. In terms of steganalysis we built into the tool
procedures to detect MP3Stego and UnderMP3Cover as described in Westfeld's work
[1] [2].
In future we intend to investigate possible features for accurate classification
of 8Hz and SoloH. We propose the use of SVMs for this purpose in order to be
able to tune the free parameters in various kernels, expecting to reveal
decision surfaces that could exploit hidden patterns to differentiate these
encoders. The encoders themselves need to be studied to develop new features.


LIST OF REFERENCES
[1] A. Westfeld, "Detecting Low Embedding Rates," Lecture Notes in Computer
Science, pp. 324-339, 2003.
[2] A. Westfeld, "Steganalysis in the Presence of Weak Cryptography and
Encoding," Lecture Notes in Computer Science, vol. 4283, p. 19, 2006.
[3] G. Simmons, "The prisoners' problem and the subliminal channel," in
Proceedings of CRYPTO, vol. 83, 1984, pp. 51-67.
[4] N. Provos. "Outguess Steganography tool." [Online]. Available:
http://www.outguess.org
[5] A. Westfeld, "F5-A Steganographic Algorithm: High Capacity Despite Better
Steganalysis," in Information Hiding: 4th International Workshop, IH 2001,
Pittsburgh, PA, USA, April 25-27, 2001: Proceedings. Springer, 2001.
[6] "Steghide Steganography tool." [Online]. Available:
http://steghide.sourceforge.net/
[7] University of Cambridge. "MP3Stego." [Online]. Available:
http://www.petitcolas.net/fabien/steganography/mp3stego/
[8] Sourceforge. "UnderMP3Cover." [Online]. Available:
http://www.files-library.com/files/UnderMP3Cover.html
[9] R. J. Anderson and F. A. P. Petitcolas, "On The Limits Of Steganography,"
IEEE Journal of Selected Areas in Communications, vol. 16, no. 4, pp. 474-481,
May 1998.
[10] K. Brandenburg and H. Popp, "An Introduction to MPEG Layer," 2003.
[11] S. Khalid, Introduction to Data Compression. Morgan Kaufmann, 2000.
[12] D. Pan, M. Inc, and I. Schaumburg, "A tutorial on MPEG/audio compression,"
Multimedia, IEEE, vol. 2, no. 2, pp. 60-74, 1995.
[13] A. Servetti, C. Testa, J. De Martin, and D. e Informatica,
"Frequency-selective partial encryption of compressed audio," in Acoustics,
Speech, and Signal Processing, 2003. Proceedings.(ICASSP'03). 2003 IEEE
International Conference on, vol. 5, 2003.
[14] MPEG, "Coding of Moving Pictures and Associated Audio for Digital Storage
Media at Upto 1.5 MBIT/s," 1991.
[15] R. Raissi, "The Theory Behind Mp3," 2002.
[16] J. P. Princen and A. B. Bradley, "Analysis/Synthesis Filter Bank Design
Based on Time Domain Aliasing Cancellation," IEEE Transactions on Acoustics,
Speech and Signal Processing, vol. ASSP-34, no. 5, October 1986.
[17] S. Hacker and S. Hayes, MP3: The Definitive Guide. O'Reilly & Associates,
Inc. Sebastopol, CA, USA, 2000.
[18] K. Do-Hyoung, Y. Seung-Jin, and C. Jae-Ho, "Additive Data Insertion Into
MP3 Bitstream Using linbits Characteristics," Proc. on ICASSP04, IV-181-184,
2004.
[19] L. Gang, A. Akansu, and M. Ramkumar, "MP3 resistant oblivious
steganography," in Acoustics, Speech, and Signal Processing, 2001.
Proceedings.(ICASSP'01). 2001 IEEE International Conference on, vol. 3, 2001.
[20] N. Moghadam and H. Sadeghi, "Genetic Content-Based MP3 Audio Watermarking
in MDCT Domain," watermark, vol. 1, no. 2, p. 3.
[21] Y. Wang, L. Yaroslavsky, M. Vilermo, and M. Vaananen, "Some peculiar
properties of the MDCT," in Signal Processing Proceedings, 2000. WCCC-ICSP
2000. 5th International Conference on, vol. 1, 2000.
[22] "8Hz MP3 Encoder." [Online]. Available: http://www.8hz.com/mp3/
[23] "LAME MP3 Encoder." [Online]. Available: http://lame.sourceforge.net/
[24] "Audacity." [Online]. Available: http://www.audacity.sourceforge.net/
[25] R. Böhme and A. Westfeld, "Statistical characterisation of MP3 encoders for
steganalysis," in Proceedings of the 2004 workshop on Multimedia and security.
ACM New York, NY, USA, 2004, pp. 25-34.
[26] T. Pevny and J. Fridrich, "Merging markov and dct features for multi-class
jpeg steganalysis," IS&T/SPIE EI, vol. 6505, 2007.
[27] L. Hamel, "Knowledge Discovery With Support Vector Machines," unpublished.
[28] C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition,"
Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.


