BY
RAGHU JAYAN MENON
APPROVED:
Thesis Committee:
Major Professor
ABSTRACT
This thesis presents research in the field of MP3 steganography and steganalysis. Steganography is the technique of hiding data in a medium in an inconspicuous manner. Steganalysis is the detection of the presence of steganographic content in a carrier. A novel method of MP3 steganography is proposed, with emphasis on increasing the steganographic capacity of the carrier medium, MP3 in this case.
An interesting problem in the field of steganography is achieving an optimal trade-off between a high capacity for hidden data and keeping the noise introduced in the carrier indiscernible. The new MP3 steganographic technique developed in this work focuses on attaining a higher capacity than the existing MP3 steganographic tools. The tool, called BvSteg, achieves 4 times the capacity of MP3Stego and UnderMP3Cover while introducing comparable noise artifacts in the carrier. The technique is novel in its approach of using the Huffman codes of quantized MDCT (Modified Discrete Cosine Transform) coefficients to represent the bit to hide. The MDCT is a Fourier-related transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block¹.
The second part of this thesis deals with MP3 steganalysis, which analyzes MP3 files for the possible presence of steganographic content. MP3Stego and UnderMP3Cover are the only known MP3 steganographic tools, and work in this field by Westfeld [1] [2] has made it possible to detect both with a high level of confidence. Böhme and Westfeld have, in addition, worked on the problem of MP3 encoder classification, which involves classifying an MP3 file based on the encoder used to produce it. In the tool described here, encoder classification acts as a filter in front of the steganalysis stage.

1 Source: Wikipedia
ACKNOWLEDGMENTS
I would like to thank Dr. Victor Fay-Wolfe for his encouragement and support over the years as my advisor. He gave me the freedom to explore and trusted my abilities. His guidance on the practical aspects of research and on the work presented here has been critical. I would like to thank Dr. Lutz Hamel for his invaluable suggestions on machine learning techniques, in particular for the knowledge of support vector machines I gained through his classes, his book and his quick responses to my e-mails. I also thank him for his support and careful reading of my work. I would like to thank Dr. Peter Swaszek for accepting my request to join the defense committee, and I thank him, as an external member of my committee, for his interest as well as his careful examination of my work. I would like to thank Dr. Stuart Westin for accepting the role of chair of my defense committee.

I would like to thank Dr. Andreas Westfeld for his support and for his responses to my frantic e-mails regarding his papers.

I would like to thank Kevin Bryan, who helped me shape my research in more ways than one. Kevin has been instrumental in providing ideas as well as technical and moral support. Kevin and I have had many fruitful discussions throughout the course of this work, and his direct and indirect impact has been critical to its success. I would like to thank Neil Bennett for his suggestions and careful reading of my work. Neil and I have had many discussions, a few of them frustrating when it came to the relevance of steganography. These discussions helped me see both sides.

I would like to thank everyone at the Computer Science Department for giving me the opportunity to study and work at the University of Rhode Island.

Finally, I would like to thank my parents, sister and brother for their patience, understanding, encouragement and unyielding support over the years.
PREFACE
This thesis is written in manuscript format and investigates issues related to MP3 (MPEG I/II Layer III) steganography and steganalysis. Steganography is the technique of hiding data in a medium without raising suspicion about the embedding. Steganalysis is the science of analyzing cover media for the presence of hidden data. Steganographic techniques predate the evolution of multimedia and of computers in general. With the advent of multimedia formats such as JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group) for storing image, video and audio data, steganography has carved out its own niche in secure digital multimedia-based communication. Almost all digital steganographic techniques exploit the lossy aspect of the compressed formats: lossy formats like JPEG and MPEG attenuate data that is not perceptually relevant. A general methodology for building a steganographic tool for a multimedia format is shown in the figure on page vi.
Manuscript 1 of the thesis covers MP3 steganography. The nascent state of MP3-based steganography is evident from the small number of tools available for the purpose. The work analyzes the existing MP3 steganographic tools, MP3Stego and UnderMP3Cover, in terms of the techniques they employ to hide data, their capacity and the noise they introduce. In the process, the work exposes a bug in the MP3Stego hiding technique that causes the process to hang. Both tools have identical payload capacity bounds, though for MP3Stego the bound is only theoretical: it uses encryption for security purposes, which reduces its payload capacity. The BvSteg tool proposed in this work is an MP3 steganographic tool that hides data in the quantized MDCT coefficients. In terms of capacity, BvSteg exceeds MP3Stego and UnderMP3Cover by a factor of nearly 4. In addition, safeguards against perceivable noise distortion have been built into BvSteg by limiting data hiding to region2 of the bigvalue region of long blocks. Owing to the energy compaction property of the MDCT, the higher frequency range covered by region2 provides good cover for the noise patterns introduced by data hiding. The hiding technique uses Huffman pair swaps to hide data based on the magnitude relationships among pairs of quantized MDCT coefficients. Analysis of the noise introduced in the original signals reveals that BvSteg is comparable to MP3Stego and UnderMP3Cover in terms of the noise introduced in the carrier. BvSteg uses the SHA1 hash algorithm to hash a user-supplied passphrase and generate the seed for a pseudo-random number generator. A pseudorandom number generator (PRNG) is an algorithm for generating a sequence of numbers that approximates the properties of random numbers². The bits from the pseudo-random generator determine which blocks to embed in and which to skip. Introducing randomness through a passphrase enhances the security of the tool. The detectability of this technique has not yet been studied, although the Huffman pair swaps ensure that the changes to the cover data are very similar to those made by LSB (Least Significant Bit) hiding, which is hard to detect.

2 Source: Wikipedia
Manuscript 2 of the thesis deals with MP3 steganalysis. The work is primarily an implementation of the methods put forth by Westfeld in his papers [1] [2] for detecting MP3Stego and UnderMP3Cover. Machine learning techniques, primarily support vector machines (SVM), are used for the encoder classification step.
MP3 encoders are software programs that convert a WAV³ file into the compressed MPEG Layer III format (MP3). MP3 files can achieve a compression ratio of about 12:1, i.e., the MP3 file is roughly 1/12 the size of the original. Even though MP3 technology is patented, no single party owns it wholly. With the aim of achieving speed and high audio quality, MP3 encoders have mushroomed over the years. The first step in building the steganalysis tool is MP3 encoder classification using a multiclass SVM.
SVMs were chosen after evaluating their suitability for encoder classification. To build statistically significant models for encoder classification, bootstrapping was performed with 200 samples of the original data, using optimal parameters, to obtain the 95% confidence interval for accuracy. An overall accuracy of 90.47% was achieved in classifying the MP3 files into the appropriate encoder class using a polynomial kernel of degree 2. The error rate of 9.53% is attributed entirely to confusion between the 8Hz and SoloH classes. Encoder classification is followed by the steganalysis step. Because the MP3 steg tools MP3Stego and UnderMP3Cover are built on top of the open source 8Hz encoder, the only files passed on to the steganalysis stage are those classified as encoded by 8Hz or SoloH. One of the objectives of encoder classification is to reduce false negatives in the steganalysis stage: since 8Hz files are sometimes confused with SoloH files, the inputs to this stage include files classified as either 8Hz or SoloH.
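The bootstrap confidence interval mentioned above can be illustrated with a minimal percentile-bootstrap sketch. It assumes the per-file outcomes of an already-trained classifier are available as 0/1 flags. Unlike the procedure described, it does not retrain the SVM on each resample. The function name `bootstrap_accuracy_ci` and the example flags are hypothetical.

```python
import random

def bootstrap_accuracy_ci(correct, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for classification accuracy.

    correct: list of 0/1 flags, 1 meaning the file was assigned the right encoder.
    Returns (accuracy, lower, upper).
    """
    rng = random.Random(seed)               # fixed seed for repeatable intervals
    n = len(correct)
    means = []
    for _ in range(n_boot):
        # Resample n files with replacement and record the resample's accuracy.
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    k = round(n_boot * alpha / 2)           # resamples trimmed from each tail
    return sum(correct) / n, means[k], means[n_boot - 1 - k]

flags = [1] * 90 + [0] * 10                 # illustrative outcomes, not thesis data
acc, lo, hi = bootstrap_accuracy_ci(flags)
print(f"accuracy {acc:.2%}, 95% CI [{lo:.2%}, {hi:.2%}]")
```

With 200 resamples and alpha = 0.05, the interval bounds are the 6th smallest and 6th largest resampled accuracies.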
MP3Stego detection is implemented using QDA (Quadratic Discriminant Analysis) as the classifier, with the autoregression coefficients of indices 0, 1 and 2 over the block lengths as attributes. The classifier separates files encoded with 8Hz from those produced by MP3Stego perfectly. This perfect classification is attainable because of the larger variance of the block length in MP3Stego files compared with 8Hz files. The variance is a result of the hiding scheme used in MP3Stego, which modifies the block length until its parity equals the bit to hide. UnderMP3Cover detection is worked into the tool by incorporating the updet program written by Westfeld for the purpose [2].

3 Microsoft,
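To make the autoregression attributes concrete, the sketch below fits x[t] = a0 + a1·x[t-1] + a2·x[t-2] by least squares to a sequence of block lengths (part2_3_length values). The exact autoregression model in Westfeld's work may differ. The model form, the function names and the use of plain least squares are assumptions, and the QDA classifier itself is not shown.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):              # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def ar2_coefficients(lengths):
    """Least-squares fit of x[t] = a0 + a1*x[t-1] + a2*x[t-2]; returns [a0, a1, a2]."""
    rows = [(1.0, lengths[t - 1], lengths[t - 2]) for t in range(2, len(lengths))]
    targets = lengths[2:]
    # Normal equations: (A^T A) a = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve3(ata, aty)

# Synthetic block lengths following a known AR(2) law (illustrative only).
lengths = [0.0, 100.0]
for _ in range(30):
    lengths.append(10 + 0.5 * lengths[-1] - 0.2 * lengths[-2])
print([round(c, 3) for c in ar2_coefficients(lengths)])   # [10.0, 0.5, -0.2]
```

In a detector, these coefficients (or simply the variance of the block lengths) would be computed per file and passed to a classifier trained on clean 8Hz files and MP3Stego files.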
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
PREFACE
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
MANUSCRIPT 1
[...]
2.1 Introduction
2.2 MPEG Audio Encoders
2.3 Overview of MP3 steg tools
2.3.1 MP3Stego
2.3.2 UnderMP3Cover
[...]
LIST OF REFERENCES
BIBLIOGRAPHY

LIST OF TABLES
SVM Training
MANUSCRIPT 1
BvSteg - A High Capacity MP3 Steganographic Tool using Spectral Pair Swaps in Bigvalue Region of Longblocks
Abstract
Steganography is the technique of hiding information in plain sight. Digital steganographic techniques embed data in multimedia files of various formats such that a warden perceives the file as normal. The growth of compression techniques for image, audio and video data has also opened up many avenues for hiding data in these formats. This paper presents a new technique for hiding data in MPEG I/II Layer III compressed audio files. The technique has a higher capacity than the existing methods used in MP3Stego and UnderMP3Cover for hiding data in MP3 files. The proposed steganographic method hides data in the bigvalue region of long blocks by modifying pairs of spectral values before they are Huffman coded. Further, the technique reduces the noise introduced by restricting embedding to region2 of the bigvalue region. Region2 holds spectral information in the high frequency range (5-14 kHz at a 44.1 kHz sampling rate), which, as per the psychoacoustic model, would have low amplitude values, so perturbing it introduces less noise in the carrier.
1.1 Introduction
Steganography, or data hiding, is the technique of embedding a message (payload) in a medium (carrier) without causing suspicion about the existence of hidden data in the medium. The perturbations to the medium are carried out in such a manner that no perceivable noise component is introduced.
One way to illustrate the concept of steganography is Simmons' Prisoners' Problem [3]. Two prisoners are allowed to communicate through a medium via an agent trusted by the warden. The prisoners are discouraged from discussing any plans to escape from the prison. The warden, though, has a vested interest in letting them communicate, as he wants to catch them in the act of hatching an escape plan, or to foil their plans by modifying the messages themselves. With a passive warden, a cryptographic technique would have worked. In this case, however, which involves an active warden, the message needs to look innocuous, and hence cryptography fails. Steganography comes to the prisoners' rescue. Intent on planning an escape, the prisoners exchanged a codeword before they were captured. They use this codeword to exchange messages secretly, deceiving the warden in the process by hiding the message in plain sight. The codeword lets them embed and extract information. One possible technique is to use the codeword as a positional key that determines where the letters of the secret message are hidden in, and extracted from, the exchanged message. The warden is oblivious to the existence of a secret message. The medium in the problem above could be photographically produced microdots, as used by espionage agents during World War II; a Bacon cipher that uses different typefaces to hide information; or a digitally altered JPEG image file produced with Steghide. In all of these methods the priority is to hide messages in plain sight and make the carrier look innocuous.
Digital steganography often uses compressed or uncompressed image, video and audio formats. Image steganography has grown in prominence with tools like Outguess [4], F5 [5] and Steghide [6], to name a few. Compressed audio formats like MP3 and Ogg lag behind as media for steganography. The only known steg tools that use MP3 as a carrier are MP3Stego [7] and UnderMP3Cover [8].
MP3Stego hides data in an MP3 file during the encoding process. The technique [...] as the name suggests, embeds data in the least significant bit of a carrier byte. In the case of UnderMP3Cover the carrier byte is the global_gain value. UnderMP3Cover works on an already encoded MP3 file, unlike MP3Stego, which hides data during the encoding of Pulse Code Modulation (PCM) samples to MP3.
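The LSB principle described above can be illustrated with a short sketch. It assumes the carrier is simply a list of 8-bit global_gain values, one per granule, used in order. UnderMP3Cover's actual choice of which values to use is not described here, and the function names are hypothetical.

```python
def embed_lsb(carrier, bits):
    """Return a copy of carrier with the LSBs of the first len(bits) values replaced."""
    if len(bits) > len(carrier):
        raise ValueError("payload longer than carrier")
    stego = list(carrier)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit    # clear the LSB, then set it to the message bit
    return stego

def extract_lsb(stego, n_bits):
    """Read back n_bits message bits from the LSBs."""
    return [value & 1 for value in stego[:n_bits]]

gains = [170, 171, 140, 141, 200]           # illustrative global_gain values (0..255)
stego = embed_lsb(gains, [1, 0, 1, 1])
print(stego)                                # [171, 170, 141, 141, 200]
print(extract_lsb(stego, 4))                # [1, 0, 1, 1]
```

Each value changes by at most 1. Because the decoder scales a granule by 2^((global_gain - 210)/4), a change of 1 alters that granule's gain by a factor of 2^(1/4), about 1.5 dB. This is the noise such a scheme has to tolerate.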
This paper proposes a new method of MP3 steganography that hides data in the bigvalue region of long blocks using spectral pair swaps. The paper is organized as follows. Section 1.2 gives an overview of the MPEG Layer III audio encoding algorithm. Section 1.3 covers the existing MP3-based steg tools. Section 1.4 delves into the proposed high capacity steg technique, BvSteg. Section 1.5 provides notes on the tool development along with a link to the source code. Section 1.6 discusses the noise introduced and compares the capacity of the tools. Section 1.7 concludes and highlights future work.
2 Indicates the number of bits used for encoding part2 (scalefactors) and part3 (Huffman encoded data).
3 Used to determine the quantizer step size.
[...] better control over the error signal. MP3 has two possible window sizes for analysis and coding of the signal. For steady-state signals it uses a long window yielding 576 spectral values, which provides good frequency resolution. For transient signals it uses 3 short windows, each yielding 192 spectral values, which provide good time resolution. Short windows are used when there is an "attack" (transient). With a long window, the quantization noise would be spread in time over the whole window and become audible as a pre-echo before the attack. The transition from a long window to short windows and vice versa uses "start" and "stop" windows. The output of the analysis filter bank is a set of spectral values with the energy compaction property introduced by the MDCT. In MPEG-1, each frame of MP3 audio has 2 granules, and each granule contains 576 spectral values.
2. Psychoacoustic model
A parallel process runs alongside the analysis filter bank. It first converts the time-domain samples to the frequency domain using the FFT and then passes the output of the Fourier transform to the psychoacoustic model. A fast Fourier transform (FFT) is an efficient algorithm for computing the discrete Fourier transform (DFT) and its inverse. A Hann window is applied before the FFT to reduce edge effects. The Fourier analysis provides the psychoacoustic model with the spectral changes over time. Once the PCM samples are converted to the frequency domain, the psychoacoustic model runs algorithms on the data that model the human auditory system. These algorithms provide directives on window switching to reduce noise spreading. They also compute the allowable distortion in scalefactor bands, which closely resemble the critical bands of human hearing [11]. More importantly, the model identifies which parts of the audio are audible and which are inaudible. The inaudible parts are eliminated; this is the lossy part of the MP3 compression process.
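The effect of the Hann window can be seen by comparing spectral leakage with and without it. The sketch uses a direct DFT for clarity; an FFT computes the same values faster. The 64-point size and the test tone, which falls between two frequency bins, are illustrative and much smaller than a real encoder's analysis length.

```python
import cmath
import math

def dft_magnitudes(x):
    """|X[k]| for k = 0 .. n/2 - 1, using the direct DFT definition."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]

def hann(n):
    """Periodic Hann window, the usual choice in front of an FFT."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]

N = 64
tone = [math.cos(2 * math.pi * 10.5 * t / N) for t in range(N)]   # between bins 10 and 11
rect_spec = dft_magnitudes(tone)
hann_spec = dft_magnitudes([s * w for s, w in zip(tone, hann(N))])
# Far from the tone (bin 30), the windowed spectrum shows much less leakage.
print(round(rect_spec[30], 3), round(hann_spec[30], 5))
```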
3. Quantization
Traditional data compression techniques are employed to further compress the spectral data. The psychoacoustic analysis compresses complicated sounds better than simpler sounds, so quantization and Huffman coding are used to further enhance the compression of these simpler sounds. The 576 frequency bins are split into 12 or 21 scalefactor bands, depending on whether short or long blocks are used, respectively. Each scalefactor band represents a range of frequencies. The spectral values are then quantized using a non-uniform power-law quantizer. Any error introduced in the process appears as quantization noise.
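A sketch of the non-uniform power-law quantizer follows. It assumes the commonly cited form: the step size is 2^((global_gain - 210)/4), magnitudes are raised to the power 0.75 before rounding, and the rounding offset 0.0946 of the ISO reference encoder is applied. Scalefactor amplification and subblock gain are ignored.

```python
import math

def quantize(xr, global_gain):
    """Power-law quantizer: ix = nint((|xr| / step) ** 0.75 - 0.0946)."""
    step = 2.0 ** ((global_gain - 210) / 4.0)
    out = []
    for v in xr:
        q = int((abs(v) / step) ** 0.75 + 0.5 - 0.0946)   # rounding with the ISO offset
        out.append(q if v >= 0 else -q)
    return out

def dequantize(ix, global_gain):
    """Decoder side: xr = sign(ix) * |ix| ** (4/3) * step."""
    step = 2.0 ** ((global_gain - 210) / 4.0)
    return [math.copysign(abs(q) ** (4.0 / 3.0) * step, q) for q in ix]

xr = [8.0, -8.0, 0.3, 0.0]
for gain in (210, 218):                      # a larger global_gain means a coarser step
    ix = quantize(xr, gain)
    err = max(abs(a - b) for a, b in zip(xr, dequantize(ix, gain)))
    print(gain, ix, round(err, 3))
```

Raising global_gain by 8 quadruples the step size. This shrinks the quantized magnitudes, and therefore the Huffman bits, but increases the reconstruction error, which is the trade-off the encoder's loops manage.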
The FFT analysis performed for the psychoacoustic model plays an important role in determining how much precision is needed in a scalefactor band. The FFT/psychoacoustic model analyzes the signal for sounds that would be masked by neighboring sounds (masking threshold). The weaker, masked signal can then be scaled down without loss of perceptual quality, reducing the number of bits needed to code that part of the signal. On the flip side, when the signal is scaled back up during decoding, noise appears due to the rounding errors introduced during encoding. An encoder therefore needs to track when the noise introduced makes the SNR (Signal to Noise Ratio) perceptually unfavorable, while also tracking the number of bits needed to encode that part of the signal. SNR is defined as the ratio of the signal power to the power of the noise corrupting the signal. The number of bits used to encode a granule and the noise introduced by quantization are reconciled through a feedback process called the outer-inner loop. The inner loop uses Huffman coding to assign shorter codes to more frequently occurring quantized values. It computes the total number of bits required to code a block of data and checks whether this number is within the bounds for a frame of data, as determined by the sampling rate and bit rate⁴. If not, the quantization step size is increased by increasing the global_gain. The step size keeps changing until the number of bits required is within the bits allotted for the frame.
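The rate (inner) loop described above can be sketched as follows. The real loop counts bits with the Huffman tables. Here a hypothetical cost function, `approx_bits`, stands in for that count. The quantizer uses the commonly cited power-law form with step 2^((global_gain - 210)/4), and the starting global_gain of 150 is arbitrary.

```python
def quantize(xr, global_gain):
    """Power-law quantizer with step 2 ** ((global_gain - 210) / 4)."""
    step = 2.0 ** ((global_gain - 210) / 4.0)
    return [int((abs(v) / step) ** 0.75 + 0.4054) * (1 if v >= 0 else -1) for v in xr]

def approx_bits(ix):
    """Hypothetical bit-cost model standing in for the Huffman table lookups:
    larger magnitudes cost more bits, zeros cost one bit."""
    return sum(2 * abs(q).bit_length() + 1 for q in ix)

def inner_loop(xr, max_bits, global_gain=150):
    """Raise global_gain (coarser quantizer step) until the granule fits in max_bits."""
    while global_gain <= 255:                # global_gain is an 8-bit field
        ix = quantize(xr, global_gain)
        bits = approx_bits(ix)
        if bits <= max_bits:
            return global_gain, ix, bits
        global_gain += 1
    raise ValueError("bit budget cannot be met")

xr = [100.0, -50.0, 20.0, 5.0, 1.0, 0.0]     # illustrative spectral values
gain, ix, bits = inner_loop(xr, max_bits=20)
print(gain, ix, bits)
```

The loop returns the smallest global_gain whose bit count fits the budget. In the real encoder, the outer loop wraps this search and amplifies noisy scalefactor bands between calls.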
The outer loop, on the other hand, shapes the quantization noise according to the masking threshold computed by the FFT/psychoacoustic model for each scalefactor band. Scalefactor bands whose quantization noise exceeds the masking threshold after quantization, i.e. after the inner-loop iteration, are amplified to reduce the noise. Amplification increases the number of bits needed to encode the spectral values of these bands, which increases their precision and thus reduces their noise. Amplifying scalefactor bands also requires a call to the inner loop to check whether the bits required to encode the spectral lines are still within the set bound. This process of quantization and noise shaping is iterative, with the outer loop calling the inner loop every time scalefactor bands are amplified.
The loop terminates when all the scalefactor bands have noise within the permissible limits and the number of bits used to encode the block is within the allotted value. This is not always feasible, however, so additional conditions are used to terminate the iteration [12].
4 For example, a 44.1 kHz, 128 kbps MP3 file is allotted 417 bytes per frame (418 when the padding bit is set).

4. Bit Stream Formatting and Huffman Encoding
Huffman codes are variable length codes used in the lossless part of MP3 compression. They assign shorter codes to more frequently occurring values and longer codes to less frequently occurring ones. The MP3 encoding process defines 32 Huffman tables to encode quantized spectral data in the various scalefactor bands; tables 4 and 14 are never used. The quantized spectral values fall in the range [-8191, 8191]. One result of modelling compression on psychoacoustics is that the resulting signal has high amplitude values at low frequencies, and the amplitude decreases as the frequency increases. The quantized spectral values are therefore arranged in order of increasing frequency.
Regions of spectral lines are formed according to various frequency ranges. Most of the energy in an audio signal is concentrated in the 20 Hz to 14 kHz frequency range [13] [12]. This frequency range corresponds to the big_value region in an MP3 file. The big_value region is further split into 3 sub-regions. For an MP3 file sampled at 44.1 kHz, a typical split is 0-2 kHz (region0), 2-5 kHz (region1) and 5-14 kHz (region2). Each region uses a different Huffman table for encoding the quantized values, selected on the basis of the local statistics of the signal in that region.
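These frequency ranges map onto the 576 spectral lines of a long-block granule: each line covers roughly (sample rate / 2) / 576 Hz, about 38.3 Hz at 44.1 kHz. The helper below is an illustrative approximation. Real encoders place region boundaries on scalefactor band edges through the side-information fields region0_count and region1_count, so actual boundaries vary from granule to granule.

```python
def line_index(freq_hz, sample_rate_hz=44100, lines=576):
    """Index of the long-block spectral line containing freq_hz (approximate)."""
    line_width = sample_rate_hz / 2 / lines      # 38.28125 Hz at 44.1 kHz
    return int(freq_hz / line_width)

for name, lo, hi in (("region0", 0, 2000), ("region1", 2000, 5000), ("region2", 5000, 14000)):
    print(name, line_index(lo), line_index(hi))
# region0 0 52
# region1 52 130
# region2 130 365
```

Under this approximation, region2 spans roughly lines 130 to 365 of the 576 in a granule.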
The higher frequency components with magnitudes of -1, 0 or 1 form the count1 region. The rzero region consists of high frequency spectral values with amplitude 0; it is not transmitted as part of the MP3 file. The count1 region uses 2 separate Huffman tables to encode contiguous quadruples of spectral values. The big_value regions, on the other hand, encode pairs of values using one of the 30 Huffman tables. The Huffman encoding tables can be found in the standard [14]. The rzero [...] main data. Figure 3 describes the component fields of the side information along with their sizes in bits. The sizes represent the requirements in single channel mode, as well as double that, as needed in dual channel mode.
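The split of a granule into big_value, count1 and rzero regions can be sketched by scanning the quantized values from the high-frequency end, in the manner of the calc_runlen step that appears in Listing 1.1. The function name `partition_regions` and its return format are assumptions made for illustration.

```python
def partition_regions(ix):
    """Split quantized values into (big_values pairs, count1 quadruples, rzero lines)."""
    i = len(ix)
    # rzero: trailing pairs of zeros are not transmitted
    while i > 1 and ix[i - 1] == 0 and ix[i - 2] == 0:
        i -= 2
    rzero_start = i
    # count1: trailing quadruples whose magnitudes are all <= 1
    while i > 3 and all(abs(v) <= 1 for v in ix[i - 4:i]):
        i -= 4
    count1_start = i
    big_values = count1_start // 2               # everything below is coded in pairs
    count1 = (rzero_start - count1_start) // 4
    return big_values, count1, len(ix) - rzero_start

print(partition_regions([5, 3, 2, 0, 1, -1, 0, 1, 0, 0, 0, 0]))   # (2, 1, 4)
```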
1.3 MP3Stego
[...] required to encode the scalefactors and the Huffman coded data. The granules to be modified are chosen randomly using SHA-1. The value, and hence the parity, of the part2_3_length variable is modified in the inner loop during quantization. The original condition requires the number of bits used to encode a block to be within a bound. In addition, the inner loop now terminates only if the parity of part2_3_length equals the bit to be embedded. The inner loop of the MP3Stego hiding procedure is shown in Listing 1.1.
Listing 1.1: MP3Stego inner-loop for hiding
 1  do
 2  {
 3      do
 4      {
 5          cod_info->quantizerStepSize += 1.0;
 6          quantize(xrs, ix, cod_info);
 7      } while (ix_max(ix, 0, 576) > (8191 + 14));     /* within table range? */
 8
 9      calc_runlen(ix, cod_info);                      /* rzero, count1, big_values */
10      bits = c1bits = count1_bitcount(ix, cod_info);  /* count1_table selection */
11      subdivide(cod_info);                            /* bigvalues sfb division */
12      bigv_tab_select(ix, cod_info);                  /* codebook selection */
13      bits += bvbits = bigv_bitcount(ix, cod_info);   /* bit count */
14
15      switch (hiddenBit)
16      {
17      case 2:                                         /* hiddenBit == 2: no parity constraint */
18          embedRule = 0;
19          break;
20      case 0:
21      case 1:                                         /* retry while parity of part3 + part2 bits != hiddenBit */
22          embedRule = ((bits + part2length) % 2) != hiddenBit;
23          break;
24      default:
25          /* ... */
26      }
27  } while ((bits > max_bits) || embedRule);
[...] spectral lines within the bit allotment, it sets all the spectral values to 0 [10]. In doing so, the number of bits used to encode the spectral values, part3 (Huffman coding), drops to 0. In Listing 1.1, bits would be set to 0 on Line 13, because bigv_bitcount returns 0 in this condition. The test ((bits + part2length) % 2) != hiddenBit then reduces to (part2length % 2) != hiddenBit. The number of bits needed to encode the scalefactors, part2length, is fixed for a granule and does not change while the inner loop runs. Suppose that the inner loop has set all the spectral values to 0 and we have a 0 to embed, i.e. hiddenBit = 0. If part2length is odd, embedRule stays 1 on every iteration, and the process (do loop) never terminates.
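The hang can be reproduced with a small model of the parity-constrained loop. The function `shrinking_bits` is a hypothetical stand-in for the Huffman bit count after each quantizer step; it drops to 0 once every value quantizes to zero. The `max_iter` guard is an addition for illustration that MP3Stego's loop does not have; without it, the stuck case would loop forever.

```python
def parity_inner_loop(bits_at_step, part2length, hidden_bit, max_bits, max_iter=1000):
    """Model of MP3Stego's loop: stop only when the granule fits in max_bits AND
    (part3 bits + part2 bits) has the parity of hidden_bit (2 = nothing to hide)."""
    for step in range(1, max_iter + 1):
        bits = bits_at_step(step)                       # part3 (Huffman) bits
        embed_rule = hidden_bit in (0, 1) and (bits + part2length) % 2 != hidden_bit
        if bits <= max_bits and not embed_rule:
            return step
    return None                                         # would hang without max_iter

def shrinking_bits(step):
    """Hypothetical cost: Huffman bits shrink with the step and reach 0 at step 15."""
    return max(0, 100 - 7 * step)

# max_bits=0 models a budget so tight that every spectral value must become 0.
print(parity_inner_loop(shrinking_bits, part2length=4, hidden_bit=0, max_bits=0))  # 15
print(parity_inner_loop(shrinking_bits, part2length=5, hidden_bit=0, max_bits=0))  # None
```

Once bits is stuck at 0, only the fixed part2length decides the parity. An odd part2length can therefore never encode a 0.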
Size constraint
To begin with, MP3Stego has low embedding rates. This is made worse by the fact that the maximum capacity of 4 × number_of_frames bits is never achieved. The hiding capacity of the carrier is diminished by two factors:
a. zlib consumption
The overhead of compressing a 0 byte file with zlib is 24 bytes [1]. This overhead reduces the capacity of the carrier.
b. Skip random
[...]
UnderMP3Cover
[...]
Do-Hyoung et al. [18] discuss a method of data insertion into the MP3 bitstream using linbits characteristics. As mentioned in the paper, the method does not have high capacity but is well suited to watermarking applications. Litao Gang et al. [19] analyze data hiding schemes in the amplitude and phase domains and also discuss a noise substitution scheme. N. Moghadam and Sadeghi [20] propose a watermarking scheme in the MDCT domain, using a genetic algorithm to select the best coefficients in which to embed the watermark.
Very few implementations of MP3-based steg techniques exist. Watermarking techniques share the steganographic requirement of security through obscurity, but they also impose robustness constraints that are not essential for steganography. In addition, watermarks usually have a small payload, whereas steganography is more demanding of the payload capacity of the carrier. For these reasons, watermarking techniques are usually inadequate for steganography.
1.4 BvSteg
BvSteg has almost 4 times the steganographic capacity of MP3Stego and UnderMP3Cover. The tool hides data in the big_value region of long blocks. Embedding is carried out in region2, which for a 44.1 kHz sampling rate corresponds to the frequency range of 5-14 kHz. Due to the energy compaction properties of the MDCT [21], most of the spectral energy is concentrated in region0 and region1 of the signal. Changes in region2 therefore introduce only low-level noise components, so the perturbed audio signal, as perceived by the human ear, is not significantly different from the signal without the embedding. The embedding algorithm itself is based on the magnitude relationship between pairs of spectral values in the big_value region (region2) during the encoding of an MP3 file.
[...]
      end if
    end while
  end procedure
Require: first time Hide is called in a longblock, embed_count ← 1
25: procedure Hide(passphrase, bit_to_embed)
26:   if (in Huffman-coding a long block) then
27:     if (in region2 of bigvalues) then
28:       if ((equivmap[x][y] == 1) && (embed_count ≤ 4)) then
29:         hide_or_skip ← GetRandBit(passphrase)
30:         if (hide_or_skip == HIDE_IN_BLOCK) then
31:           if ((bit_to_embed == 0) && (x < y)) then
32:             SWAP(x, y)
33:           else if ((bit_to_embed == 1) && (x > y)) then
34:             SWAP(x, y)
35:           end if
36:           embed_count ← embed_count + 1
37:           return HIDE_IN_BLOCK
38:         end if
39:       end if
40:     end if
41:   end if
42:   return SKIP_BLOCK
43: end procedure
In Algorithm 1, the pair (x,y) represents quantized spectral values in the bigvalue region. bit_to_embed, as the name suggests, is the bit to be hidden. equivmap is a global array with one table of entries for each of the 30 Huffman tables (refer to [14]). equivmap[x][y] is set to 1 if encoding the pair (x,y) takes the same number of Huffman bits as encoding the pair (y,x). The routine SWAP flips the pair (x,y). The functions GetFileSize and GetRandBit are not defined explicitly. GetFileSize returns the size of the file to be hidden. GetRandBit takes the passphrase entered by the user as its argument. The passphrase is hashed using the SHA1 algorithm, and part of the hash seeds a pseudo random number generator. The function returns the LSB of the generated random number. embed_count counts the changes made in the current long block, and the number of changes allowed per long block is limited to 4.
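A minimal sketch of how an equivmap for one table could be derived from that table's code lengths follows. It assumes the lengths are available as a matrix hlen[x][y], i.e. the hlen column of the standard's tables [14]. The example values are those of table 1: lengths 1, 3, 2 and 3 for the pairs (0,0), (0,1), (1,0) and (1,1). This is consistent with Huffbits(1,1) = 3 but should be checked against the standard. Sign bits and linbits are ignored, since a swap leaves their total count unchanged. The function names are hypothetical.

```python
def build_equivmap(hlen):
    """equivmap[(x, y)] is True when the pairs (x, y) and (y, x) have equal code lengths."""
    size = len(hlen)
    return {(x, y): hlen[x][y] == hlen[y][x] for x in range(size) for y in range(size)}

def huffbits(hlen, x, y):
    """Huffman code length of the pair (x, y), excluding sign bits and linbits."""
    return hlen[x][y]

# Code lengths of Huffman table 1 for (x, y) in {0, 1} x {0, 1}.
TABLE1_HLEN = [[1, 3],
               [2, 3]]

equivmap = build_equivmap(TABLE1_HLEN)
print(huffbits(TABLE1_HLEN, 1, 1), equivmap[(0, 1)], equivmap[(1, 1)])   # 3 False True
```

In table 1, only the trivial pairs (0,0) and (1,1) qualify, and swapping equal values carries no information. Useful swaps must therefore come from the larger tables used in region2.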
The number of embedded bits is restricted to 4 per granule in the long block. Our experiments show that this number is a good trade-off between high steg capacity and low noise. Embedding is skipped if the granule does not contain a long block. Within a long block, embedding is further restricted to spectral value pairs in region2 so that the amount of noise introduced is minimal.
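The hiding rule of Algorithm 1, together with its mirror-image extraction, can be sketched in Python. The sketch makes several assumptions: region2 of one long-block granule is given as a list of (x, y) magnitude pairs, equivmap is a dictionary keyed by pairs, and rand_bit is a callable returning the passphrase-driven hide/skip bit, with 1 meaning hide. All names are hypothetical. Unlike Algorithm 1 as printed, the sketch also skips pairs with x == y, because swapping equal values cannot encode a bit.

```python
import random

MAX_CHANGES = 4      # per long-block granule, as stated in the text

def hide(pairs, message, equivmap, rand_bit):
    """Embed bits from the iterator `message` into region2 pairs of one granule.
    Bit 0 is stored as x > y, bit 1 as x < y. Returns (new_pairs, bits_used)."""
    out, used = [], 0
    for x, y in pairs:
        if (used < MAX_CHANGES and x != y and equivmap.get((x, y), False)
                and rand_bit() == 1):
            bit = next(message, None)
            if bit is not None:
                if (bit == 0 and x < y) or (bit == 1 and x > y):
                    x, y = y, x                 # SWAP: same Huffman length by construction
                used += 1
        out.append((x, y))
    return out, used

def extract(pairs, equivmap, rand_bit):
    """Mirror of hide(): the same pairs qualify because equivmap is symmetric."""
    bits = []
    for x, y in pairs:
        if (len(bits) < MAX_CHANGES and x != y and equivmap.get((x, y), False)
                and rand_bit() == 1):
            bits.append(0 if x > y else 1)
    return bits

equivmap = {(1, 2): True, (2, 1): True, (0, 3): True, (3, 0): True}   # toy map
region2 = [(1, 2), (2, 1), (0, 3), (5, 5), (3, 0), (2, 1)]
always_hide = lambda: 1
stego, used = hide(region2, iter([0, 1, 1, 0]), equivmap, always_hide)
print(stego, used)          # [(2, 1), (1, 2), (0, 3), (5, 5), (3, 0), (2, 1)] 4
print(extract(stego, equivmap, always_hide))   # [0, 1, 1, 0]

# With identically seeded generators, hider and extractor replay the same skip pattern.
rng_h, rng_x = random.Random(42), random.Random(42)
stego2, used2 = hide(region2 * 3, iter([1, 0, 1, 1]), equivmap, lambda: rng_h.getrandbits(1))
assert extract(stego2, equivmap, lambda: rng_x.getrandbits(1)) == [1, 0, 1, 1][:used2]
```

Whether a pair qualifies depends only on values that a swap preserves: equivmap membership, x != y, and the random bit. The extractor therefore visits exactly the pairs the hider used.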
1.4.1 LSB Steganography using Spectral Pairs and Huffman Values
The diamond boxes of the flowchart in Figure 4 show each of the constraints enforced on the candidate pairs of spectral values. The two driving forces behind the constraints imposed on the candidate pairs for embedding are:
[Figure 4: Flowchart of the embedding constraints. Decisions: Is the block a longblock? Huffbits(x,y) == Huffbits(y,x)? embedCount == 4? Get the bit to embed in bit_to_embed; bit_to_embed == 0? Depending on the bit, x < y or x > y leads to SWAP(x,y); then embedCount = embedCount + 1.]
1. Constant Bit Rate (CBR) MP3 encoding imposes size constraints on an MP3 frame. The number of bytes per frame can be computed using the formula $\mathit{FrameSize} = \left\lfloor 144 \cdot \frac{\mathit{BitRate}}{\mathit{SampleRate}} \right\rfloor + \mathit{Padding}$. For example, an MP3 file encoded at a sampling rate of 44.1 kHz and a bitrate of 128 kbps would have a frame size of approximately 417 bytes. The distribution of bytes among the granules is encoder dependent. As discussed in section 1.2, the inner loop must ensure that the total number of Huffman bits needed to code a block of signal data is within the number allotted for a frame, which implicitly limits each granule.
Swapping a spectral pair (x,y) creates a problem because Huffbits⁵(x,y) need not equal Huffbits(y,x). The swap can therefore change the bit count of the granule. Two strategies could be adopted to keep the total number of bits in a granule within the set bounds.
a. [...]

5 Huffbits computes the number of bits required to encode a spectral pair using Huffman codes. For example, Huffbits(1,1) = 3 for table 1, which is the hlen column value; refer to page 56 of [14].
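The frame-size formula above, and the resulting average budget per granule, can be checked with a few lines of arithmetic. The sketch assumes MPEG-1 Layer III with a 4-byte header and 32 bytes of side information for stereo (17 for mono). It ignores the optional CRC and the bit reservoir, which lets actual granules borrow bits from earlier frames. The function names are hypothetical.

```python
def frame_size_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """MPEG-1 Layer III frame length: floor(144 * bitrate / sample_rate) + padding."""
    return 144 * bitrate_bps // sample_rate_hz + padding

def avg_main_data_bits_per_granule(frame_bytes, stereo=True):
    """Bits left for scalefactors + Huffman data, averaged over the 2 granules
    (4-byte header, 32 or 17 bytes of side information, no CRC, no reservoir)."""
    side_info = 32 if stereo else 17
    return (frame_bytes - 4 - side_info) * 8 // 2

size = frame_size_bytes(128_000, 44_100)
print(size, frame_size_bytes(128_000, 44_100, padding=1))   # 417 418
print(avg_main_data_bits_per_granule(size))                # 1524
```

A swap that changes Huffbits by even one bit has to fit within this budget, which is shared by both channels of the granule.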
The data extraction process is straightforward. The pairs of spectral values obtained after Huffman decoding are compared for their magnitude relationship. Algorithm 2 details the extraction process. Its condition checks whether the equivmap entry for either (x,y) or (y,x) is set. The logic is self-explanatory and mirrors that of the hiding process.
1.5 Tool Development
There are quite a few open source MP3 encoders available on the web. The 8Hz MP3 [22] encoder was used to develop the tool; the 8Hz code base is also used by the MP3Stego and UnderMP3Cover tools. The 8Hz MP3 encoder is not the best available encoder in terms of speed or sound quality. It is, however, one of the earliest encoders and has been the source base for numerous others. Prominent among them is LAME [23], which started off as a patch to the 8Hz encoder and has grown into one of the leading open source encoders.
The code changes primarily involved manipulating the quantized MDCT values in the Huffmancodebits function of the l3bitstream.c file. Before the quantized MDCT values are passed on to the Huffman encoding function HuffmanCode, they are compared to determine whether a swap is needed to encode the hidden bit. The extraction logic is implemented in the III_hufman_decode function of the decode.c file. There, the reverse logic given in Algorithm 2 extracts the hidden bit from the magnitude relationship of the quantized Huffman pairs. Helper functions were written to build a database of spectral pairs that are equivalent in Huffman code length.

[...] then
38:       change_per_granule ← change_per_granule + 1
39:       if (file_size_or_mesg_bit == MESSAGE_BITS) then
40:         total_bits ← total_bits + 1
41:       end if
42:     end if
43:   end if
44:   end if
45:   end if
46: end if
47: if (file_size_or_mesg_bit == MESSAGE_BITS) then
48:   if total_bits == file_size then
49:     return EXTRACTION_COMPLETE
50:   end if
51: end if
52: end procedure
A Python script integrates the hiding and extraction process and calls the modified encode and decode executables of the 8Hz encoder. The script includes bit conversion routines that turn a text file into a stream of bits for hiding, and turn the extracted bits back into a file. The main hiding and extraction procedures also use two helper functions: a pseudo random generator based on the SHA1 hash, and the file size embedding logic. Part of the SHA1 hash of the user's passphrase seeds a replicable pseudo random number generator, which decides at random which blocks are used for hiding and which are skipped. The size of the payload file is hidden in the first 32 randomly selected bits. This limits the payload to 4 GB, which is expected to be sufficient.
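The sketch below shows the payload framing and the passphrase-seeded block selection. The thesis does not name the generator that the SHA1 seed feeds, so Python's `random.Random` is used here as a stand-in. The first 8 bytes of the digest as the seed and a hide probability of 0.5 are also assumptions. All function names are hypothetical.

```python
import hashlib
import random


def bytes_to_bits(data):
    """MSB-first bit list of a byte string."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def bits_to_bytes(bits):
    """Inverse of bytes_to_bits; len(bits) must be a multiple of 8."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def frame_payload(data):
    """Prefix the payload bits with its byte count as a 32-bit header."""
    if len(data) >= 2 ** 32:
        raise ValueError("payload does not fit a 32-bit size field (4 GB limit)")
    header = [(len(data) >> (31 - i)) & 1 for i in range(32)]
    return header + bytes_to_bits(data)


def seeded_rng(passphrase, seed_bytes=8):
    """Replicable PRNG seeded from part of the SHA-1 hash of the passphrase."""
    digest = hashlib.sha1(passphrase.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:seed_bytes], "big"))


def choose_blocks(passphrase, n_blocks, hide_probability=0.5):
    """True = hide in this block, False = skip it; same passphrase, same choices."""
    rng = seeded_rng(passphrase)
    return [rng.random() < hide_probability for _ in range(n_blocks)]
```

The encoder and decoder both call `choose_blocks` with the same passphrase, so they visit the same blocks. Only the passphrase needs to be shared.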
1.6
The table in Figure 5 shows the capacity of various wav files under different steg tools. The files come from different genres and, as their sizes show, have different durations.
Name         Size of wav (MB)   MP3Stego (KB)   UnderMP3Cover (KB)   BvSteg (KB)
beatles3            9                 1                 1                  3
jazz2              65                 7                 7                 15
sting10            64                 7                 7                 20
vanmorris7         50                 5                 5                 17
guitar19          105                11                11                 40
nayyar2            60                 6                 6                 24
(a) The size of each wav file is given in MB. The steg capacity of the MP3 files produced from the wav files using MP3Stego and BvSteg is given in KB. UnderMP3Cover was run with a spacing of 2, and its capacity is also given in KB.
Conclusion
This paper began by introducing the concept of digital steganography. It then illustrated the MPEG layer III audio encoding process and highlighted the lossy aspects of the compression. An overview of existing steg tools and techniques followed. The paper exposed a bug in the MP3Stego tool that is caused by the inner_loop constraints. It then proposed a new steg technique, BvSteg, which improves on existing steg concepts and tools in terms of payload capacity: BvSteg has 4 times the capacity of both MP3Stego and UnderMP3Cover. The BvSteg tool modifies the quantized MDCT coefficients in the high frequencies of the big_value region. Changes in this region are less discernible and therefore do not introduce substantial noise. The paper showed the feasibility of the technique by presenting the algorithms and flowchart for the embedding and extraction process. It also discussed possible ways to increase the capacity, such as count1 modification, and to lower the noise, such as choosing pairs with the least spectral magnitude difference. The noise and capacity analysis supports the claim that BvSteg has a higher capacity than the existing tools. The spectrum analysis showed that the noise introduced by the perturbations stays within acceptable levels.

[Figure: BvSteg/Original Signal; amplitude in dB vs. frequency in Hz]
(a) Black: MP3 signal after encoding the wav file using 8Hz. Green: MP3 signal after hiding using BvSteg. The BvSteg signal follows the original signal with high fidelity.
[Figure: BvSteg Noise; amplitude in dB vs. frequency in Hz]
(b) Black: MP3 signal after encoding the wav file using 8Hz. Green: noise signal introduced after hiding using BvSteg.
[Figure: MP3Stego/Original Signal; amplitude in dB vs. frequency in Hz]
(a) Black: MP3 signal after encoding the wav file using 8Hz. Red: MP3 signal after hiding using MP3Stego.
[Figure: MP3Stego Noise; amplitude in dB vs. frequency in Hz]
(b) Black: MP3 signal after encoding the wav file using 8Hz. Red: noise signal introduced after hiding using MP3Stego.
[Figure: UMPC/Original Signal; amplitude in dB vs. frequency in Hz]
(a) Black: MP3 signal after encoding the wav file using 8Hz. Blue: MP3 signal after hiding using UnderMP3Cover.
[Figure: UMP3C Noise; amplitude in dB vs. frequency in Hz]
(b) Black: MP3 signal after encoding the wav file using 8Hz. Blue: noise signal introduced after hiding using UnderMP3Cover.
Despite its high steg capacity, BvSteg could improve the way its hiding algorithm selects candidate frequency bands for hiding data. Methods for selecting noiseless bands are a direction for future work, and one such method is briefly described below.
When data insertion is used for covert communication or watermarking, any perturbation should have minimal impact on signal quality. To achieve this, noisy bands must be distinguished from noiseless ones. The distortion control/rate loops are designed to meet two constraints: the bit rate and the allowed quantization noise. To meet the noise constraint, the encoder amplifies the scalefactor bands whose distortion exceeds the allowable level. Amplification causes more bits to be allocated to these bands in the next call to the inner loop, which reduces their quantization noise. The extra bits are taken from bands whose noise was already within permissible levels. This relies on the assumption that bands with tolerable noise after quantization can absorb more noise without crossing the threshold set by the psychoacoustic model.
We propose that scalefactor bands which are not amplified in their granules are the frequency ranges where amplitude changes introduce the least noise, so identifying them would help choose where to hide. Another promising approach is to select blocks based on the number of bits in the big_value region, as done in [18]. Both ideas are left for future work.
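The proposed band selection could look like the following sketch. It assumes that, for every outer-loop pass, the per-band quantization distortion and the allowed distortion from the psychoacoustic model are available. A band counts as amplified if its distortion exceeded the allowed value in any pass, and only bands that were never amplified are returned as hiding candidates. The function name and input layout are hypothetical.

```python
def candidate_bands(passes):
    """Scalefactor-band indices never amplified during the outer loop.

    passes: one (distortion, allowed) tuple per outer-loop pass, each a list
    with one value per scalefactor band. A band is amplified in a pass when
    its distortion exceeds the allowed distortion.
    """
    n_bands = len(passes[0][0])
    amplified = [False] * n_bands
    for distortion, allowed in passes:
        for sfb in range(n_bands):
            if distortion[sfb] > allowed[sfb]:
                amplified[sfb] = True
    return [sfb for sfb in range(n_bands) if not amplified[sfb]]


if __name__ == "__main__":
    passes = [
        ([1.0, 5.0, 2.0], [2.0, 2.0, 2.0]),  # band 1 exceeds its threshold
        ([1.5, 1.0, 3.0], [2.0, 2.0, 2.0]),  # band 2 exceeds on the second pass
    ]
    print(candidate_bands(passes))  # [0]
```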
This paper has demonstrated BvSteg, a new high capacity steg technique that hides data in region2 of long blocks in MP3 files during encoding. The work aims to popularize MP3 as a steganographic medium and to lay a foundation for developing steg tools in the MDCT domain. Because the tool is open, it can also serve as a case study to aid the steganalysis of any MDCT based steganographic tool.
MANUSCRIPT 2
A Tool Framework for MP3 Steganalysis
Abstract
Steganalysis is the technique of detecting hidden data in a medium that can act as a carrier. Digital steganalysis uses statistical analysis and machine learning to detect hidden data in multimedia and in files of various formats. Most steganographic techniques alter the statistical characteristics of the underlying media. Steganalysis techniques in the JPEG/MPEG domain try to detect this change to determine whether the medium contains steg.
MP3 steganography tools are mostly built on top of an existing open source encoder. The 8Hz encoder, one of the oldest MP3 encoders, has been the encoder of choice for the steg tools discussed here. This paper proposes a framework for a tool that detects MP3 based steganographic tools. Apart from the new technique proposed in the previous paper, MP3Stego and UnderMP3Cover are the only two known MP3 steganographic tools.
Step 1 of the detection process uses a Support Vector Machine (SVM) to identify the encoder used to encode the MP3 file in question, as outlined in [25]. The tool uses a multi-class SVM for encoder classification. Step 2 uses the strategies described in [1] [2] to detect the specific MP3 steg tools MP3Stego and UnderMP3Cover. Step 1 acts as a pre-filter on the MP3 files and reduces the false positive rate. It may also help detect the encoders behind new steg techniques in the future.
2.1 Introduction
MP3 encoders are software that implement the MPEG audio specifications to compress an audio signal into MP3 format, achieving a compression ratio of nearly 1:12. An uncompressed audio file is stored as PCM (Pulse Code Modulated) samples, which are a digital representation of the analog waveform of the audio signal. Typically the source is a WAV (Waveform Audio File Format) file containing uncompressed PCM samples. MP3 encoders that convert WMA (Windows Media Audio) files to MP3 also exist.
No one holds exclusive rights to MP3 technology, but most of its algorithms are patented by Fraunhofer-Gesellschaft. Despite the patent issues, quite a few MP3-like encoders exist, and they follow the same basic encoding blocks shown in Figure 9. The normative elements of the MPEG standard specify only the format of the bit stream (compressed audio) and the structure of the decoder. The encoder implementation is left entirely to the implementer. This freedom has produced different encoders whose MP3 bit streams share the same format but have distinct properties. Step 1 of MP3 steganalysis uses an SVM with these properties as features to tell the MP3 encoders apart.
2.3
MP3Stego
required to encode the scalefactors and the Huffman coded data. The granules to be modified are chosen at random (using SHA-1). MP3Stego changes the value, and hence the parity, of the part2_3_length variable in the inner loop during quantization. The loop's original condition requires the number of bits used to encode a block to stay within a bound. MP3Stego adds a second condition: the loop terminates only when the parity of part2_3_length matches the bit to be embedded.
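The parity rule can be sketched as a search over quantizer step sizes. This is a simplification of the real inner loop. `bit_count_for_step` is a hypothetical callback that stands in for quantization plus Huffman coding and returns part2_3_length for a given step size. Larger steps are assumed to need fewer bits.

```python
def quantize_with_parity(bit_count_for_step, max_bits, hidden_bit,
                         start_step=0, max_step=255):
    """Return (step, part2_3_length) for the first step size that both fits
    the bit budget and whose length parity equals the bit to embed."""
    for step in range(start_step, max_step + 1):
        length = bit_count_for_step(step)
        if length <= max_bits and length % 2 == hidden_bit:
            return step, length
    raise RuntimeError("no step size satisfies both constraints")


def extract_parity_bit(part2_3_length):
    """The receiver reads the hidden bit back as the parity of the length."""
    return part2_3_length % 2


if __name__ == "__main__":
    toy_bits = lambda step: 1000 - 7 * step   # hypothetical bit-count model
    print(quantize_with_parity(toy_bits, 950, 0))  # (8, 944)
    print(quantize_with_parity(toy_bits, 950, 1))  # (9, 937)
```

Embedding a 1 here costs one extra quantizer step (7 fewer bits). Changes like this alter the block lengths, which the steganalysis in Section 2.5.4 exploits.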
2.3.2 UnderMP3Cover
Figure 10: Support Vector Machine Model with a linear decision surface
SVM is one in which the decision function is allowed to make mistakes. Soft margin classifiers add a cost parameter to the formulation of the decision surface, which penalizes the surface for each misclassification. Soft margin classifiers are needed because of noisy data points. Without them, the noisy points often dictate the decision surface even though they do not reflect the true separation between the classes. To generalize better, a slack variable is added to the optimization problem of the maximum margin classifier. The classifier may then misclassify these noisy points, which allows a wider margin. With the slack variable, the optimization trades margin width against error. A high cost produces a smaller margin, because every misclassification carries a large penalty, and a small margin means the model may not generalize well. A low cost produces a wider margin: the model may make mistakes on the training data, which makes it better suited to classifying noisy real-world data.
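For reference, the usual textbook soft margin primal problem expresses the trade-off described above in symbols. The notation is ours: $\xi_i$ are the slack variables, $C$ is the cost parameter, and $(\mathbf{x}_i, y_i)$ are the $n$ training points with labels $y_i \in \{-1, +1\}$. A large $C$ penalizes slack heavily and narrows the margin $2/\lVert \mathbf{w} \rVert$. A small $C$ tolerates more slack and widens it.

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\left(\mathbf{w}\cdot\mathbf{x}_i + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,n
```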
Much of the power of SVMs comes from the fact that the decision surface can be based on non-linear kernel functions. These functions can separate data that is not linearly separable by working in higher dimensions. The so-called kernel trick applies a transformation that maps data points from the input space, where the data is not linearly separable, to a higher dimensional space called the feature space, where it becomes linearly separable. Kernel functions have properties, positive definiteness among others, that allow feature space computations to be carried out in the input space, which is quite remarkable.
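A small numerical illustration of the kernel trick follows. It uses the homogeneous degree-2 polynomial kernel $k(\mathbf{x},\mathbf{z}) = (\mathbf{x}\cdot\mathbf{z})^2$ on 2-D inputs, whose explicit feature map is $\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1x_2, x_2^2)$. The kernel is computed entirely in the input space, yet it equals the dot product in the 3-D feature space. This is a special case of the polynomial kernel used later, with gamma 1, coef0 0 and degree 2.

```python
import math


def poly2_kernel(x, z):
    """k(x, z) = (x . z)^2, evaluated in the 2-D input space."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def phi(x):
    """Explicit feature map whose dot product reproduces poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def feature_space_dot(x, z):
    return sum(a * b for a, b in zip(phi(x), phi(z)))


if __name__ == "__main__":
    x, z = (1.0, 2.0), (3.0, 4.0)
    print(poly2_kernel(x, z))       # 121.0
    print(feature_space_dot(x, z))  # approximately 121 (floating point)
```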
Among the algorithms that implement SVMs, SMO (Sequential Minimal Optimization) is a popular choice in machine learning.
The VC-dimension, as a measure of model complexity, plays an important role in model selection, because less complex models are more likely to generalize well. In most cases the available data does not completely represent the data universe. Without going into detail, the VC-dimension of a model class¹ with respect to a data set D is the size m of the largest subset of D that the model class shatters. A model class shatters a data set D if, for every possible labeling of the data set, some model in the class separates the data points perfectly. The training data is only a sample of the data universe, so we lack the knowledge needed to minimize the expected risk directly. We can, however, learn from the observed data and minimize the empirical risk, which is called empirical risk minimization. Pushing empirical risk minimization too far, driving down the training error to reach high training accuracy, can produce models that generalize poorly.
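Shattering can be checked by brute force for lines in the plane, as in the sketch below. A perceptron with a bias term tests each labeling. The perceptron converges on linearly separable data, and failing to converge within `max_epochs` is treated as "not separable", which is a heuristic. The three non-collinear points are shattered. The four XOR-arranged points are not, which is consistent with the known result that lines in the plane have VC-dimension 3.

```python
from itertools import product


def linearly_separable(points, labels, max_epochs=1000):
    """Perceptron test for a separating line w1*x1 + w2*x2 + b (labels +1/-1)."""
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            if y * (w1 * x1 + w2 * x2 + b) <= 0:
                w1 += y * x1
                w2 += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False


def shattered(points):
    """True if lines realize every one of the 2**len(points) labelings."""
    return all(
        linearly_separable(points, labels)
        for labels in product((-1, 1), repeat=len(points))
    )


if __name__ == "__main__":
    print(shattered([(0, 0), (1, 0), (0, 1)]))          # True
    print(shattered([(0, 0), (1, 1), (1, 0), (0, 1)]))  # False (XOR labeling)
```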
Multi-class SVMs with soft margins use pairwise classification to build classification models. In general, models with a low cost (an extremely soft margin) tend to favor the class with the most training instances. In multi-class problems the classes may not be evenly represented in the training data, so this bias can cause problems. Pairwise classification between every pair of classes resolves it. To classify an unknown observation, each pairwise classifier casts a vote, and the observation is assigned to the class with the most votes. For an in-depth treatment of SVMs, see [27] and [28].
1 A model class represents all the possible configurations (rotation and translation) of a decision surface.
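The pairwise voting scheme can be sketched as follows. Each binary classifier is any callable that returns one of its two classes. The 1-D nearest-centroid classifiers in the usage example are hypothetical stand-ins for trained pairwise SVMs. Ties go to the class listed first, which is one common convention and not necessarily the one used by the tool.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """Majority vote over all pairwise classifiers.

    pairwise[(a, b)] must return a or b for observation x, for every pair
    (a, b) taken in the order given by `classes`.
    """
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    best = max(votes.values())
    return next(c for c in classes if votes[c] == best)


def centroid_pairwise(centroids):
    """Hypothetical pairwise classifiers: pick the nearer of two 1-D centroids."""
    return {
        (a, b): (lambda x, a=a, b=b:
                 a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b)
        for a, b in combinations(list(centroids), 2)
    }


if __name__ == "__main__":
    cents = {"8Hz": 0.0, "lame": 5.0, "gogo": 10.0}
    pw = centroid_pairwise(cents)
    print(one_vs_one_predict(4.9, list(cents), pw))  # lame
```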
The MP3 encoder classification problem uses a soft margin multi-class SVM. Linear, radial and polynomial kernels are tested to find the model with the best accuracy and generalization.
2.5
The following sections describe the architecture of the tool and the machine learning concepts it uses, and assess the validity and significance of the classification model.
2.5.1
Figure 11 shows the layout of the MP3 steganalysis tool in detail. The tool consists of two parts. The first part is an encoder classification scheme that uses a support vector machine (SVM) to classify MP3 files by the encoder that created them. The feature extraction procedure is built on the mpglib MP3 decoding library. All features are generated during MP3 decoding and written to a file. This feature file is then loaded into the R programming environment for SVM training and testing. The multi-class SVM outputs the class (encoder) to which the MP3 file belongs. The files encoded using the 8Hz and SoloH encoders are the only
models. The models are built with 10-fold cross-validation to reduce bias. A soft margin classifier is trained on these feature vectors so that the SVM model can account for outliers (noise), as occurs in real-world data.
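The 10-fold split can be sketched as follows. Each sample is placed in exactly one test fold, so every fold uses roughly 90% of the data for training and 10% for testing. The tool itself did this inside R; this Python version and its shuffling seed are only illustrative.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(100):
        print(len(train), len(test))  # 90 10, ten times
```

The cross-validated error is then the average of the ten test-fold errors.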
Table 3 shows the ranges of the free parameters used to train the SVM models with linear, radial and polynomial kernels. Not every kernel has every free parameter; a '-' in the table means the kernel does not have that parameter.

Table 3: SVM Training
Kernel       Cost Range   Gamma Range   Coef0 Range    Degree      Train Acc. (%)
Linear       0.01-1000    -             -              -           89.12
Radial       0.01-1000    0.0625-256    -              -           90.54
Polynomial   0.01-1000    0.0625-256    -100 to 1000   2,3,5,7,8   91.74
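A grid covering the ranges of Table 3 can be enumerated as in the sketch below. The step sizes are assumptions, since the table gives only the end points: powers of ten for cost, powers of two for gamma, and a handful of sample points for coef0. The degrees are taken from the table. The grid is chosen to include the optimal values later reported in Table 4. Training was done in R; this only lists the candidate settings.

```python
from itertools import product

# Hypothetical grids spanning the ranges of Table 3.
COSTS = [10.0 ** e for e in range(-2, 4)]   # 0.01 ... 1000
GAMMAS = [2.0 ** e for e in range(-4, 9)]   # 0.0625 ... 256
COEF0S = [-100, 0, 10, 100, 1000]           # sample points in -100 ... 1000
DEGREES = [2, 3, 5, 7, 8]                   # as listed in Table 3


def parameter_grid(kernel):
    """All candidate parameter settings for one kernel."""
    if kernel == "linear":
        return [{"cost": c} for c in COSTS]
    if kernel == "radial":
        return [{"cost": c, "gamma": g} for c, g in product(COSTS, GAMMAS)]
    if kernel == "polynomial":
        return [
            {"cost": c, "gamma": g, "coef0": r, "degree": d}
            for c, g, r, d in product(COSTS, GAMMAS, COEF0S, DEGREES)
        ]
    raise ValueError(f"unknown kernel: {kernel}")


if __name__ == "__main__":
    for name in ("linear", "radial", "polynomial"):
        print(name, len(parameter_grid(name)))  # 6, 78, 1950
```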
Table 4 shows the best parameters selected from the trained models. For each kernel it also gives the test accuracy on 1000 pristine MP3 files covering all encoders, and the bootstrap confidence interval. The polynomial kernel was chosen based on its accuracy on this test set of 1000 MP3 files.
Table 4: SVM Test Results
Kernel       Cost   Gamma    Coef0   Degree   Test Acc. (%)   Bootstrap Interval (%)
Linear       1      1        -       -        85.88           85.71 - 90.50
Radial       10     0.0625   -       -        86.82           86.76 - 90.61
Polynomial   0.01   0.5      10      2        90.47           85.05 - 92.29
200 bootstrap samples, each having 3000 data points, using the optimal parameters obtained for each of the kernels. Each of the 200 samples was drawn from the original dataset with replacement, which models the bias present in real-world data. A 10-fold cross-validated error is computed for each sample, with a 90/10 split in each fold (hold-out method). The cross-validated errors are then sorted in ascending order. The 2.5th percentile gives the lower bound and the 97.5th percentile the upper bound of a 95% confidence interval for the error. The confidence interval column of Table 4 therefore means that, with the bias of the real-world scenario, we are 95% sure that each model's accuracy falls within the given range.
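The percentile bootstrap described above can be sketched as follows. Here `statistic` is a placeholder for the quantity being studied. In the thesis this is the 10-fold cross-validated error of an SVM trained on each resample; the usage example uses a simple mean so that the sketch runs on its own. The nearest-rank percentile rule is one of several common choices.

```python
import math
import random


def bootstrap_interval(values, statistic, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap (lower, upper) interval for statistic(values).

    Each resample draws len(values) points with replacement; the n_boot
    resampled statistics are sorted and the alpha/2 and 1 - alpha/2
    nearest-rank percentiles are returned.
    """
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )

    def nearest_rank(q):
        # The ceil(q * n_boot)-th smallest resampled statistic.
        return stats[min(n_boot - 1, max(0, math.ceil(q * n_boot) - 1))]

    return nearest_rank(alpha / 2), nearest_rank(1 - alpha / 2)


if __name__ == "__main__":
    data = list(range(1, 101))
    print(bootstrap_interval(data, lambda v: sum(v) / len(v)))
```

With 200 resamples, the bounds are the 5th and 195th smallest of the sorted statistics.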
Based on the test results and the bootstrap confidence intervals, the polynomial kernel is chosen, since it is statistically significant and has the best test accuracy.
Table 5 gives the confusion matrix of the polynomial kernel model on the 1000 test samples. The values are classification accuracies, in percent, for each encoder class. The model does not solve the problem reported by the authors of [25], namely that 8Hz files are misclassified as SoloH and vice versa. As noted in [25], this is due to the common origin of these encoders. The error rate does not, however, cause problems in step 2, because steganalysis is run on all files classified as either 8Hz or SoloH.
Step 2 of the tool detects specific MP3 steg techniques, namely MP3Stego and UnderMP3Cover. As mentioned earlier, MP3 steganography is a young field, which is why so few MP3 steg tools exist. Westfeld's papers [1] [2] detect both of these tools successfully.
Table 5: Confusion matrix of the polynomial kernel model (%)
           8Hz  plugger  fasten  shine  gogo  m3e  lame  SoloH  bladeen  mp3s
8Hz         86      0       0      0      0     0     0     24       0      0
plugger      0    100       0      0      0     0     0      0       0      0
fasten       0      0     100      0      0     0     0      0       0      0
shine        0      0       0    100      0     0     0      0       0      0
gogo         0      0       0      0    100     0     0      0       0      0
m3e          0      0       0      0      0   100     0      0       0      0
lame         0      0       0      0      0     0   100      0       0      0
SoloH       28      0       0      0      0     0     0     82       0      0
bladeen      0      0       0      0      0     0     0      0     100      0
mp3s         0      0       0      0      0     0     0      0       0    100
2.5.4
MP3Stego detection uses block length analysis to distinguish MP3 files stegged with MP3Stego from files encoded with any other MP3 encoder. As explained in [1], an MP3 file modified by MP3Stego has the same size as the original file, even though its block sizes differ. This is due to the rate control process for CBR (Constant Bit Rate) audio: when one frame uses extra bits, the encoder compensates by allocating fewer bits to a later frame. The mean block length is therefore unchanged, but the variance of the block lengths in a steganographically modified file differs from that of a non-stegged MP3 file. Figure 12 illustrates this difference in variance with histograms of the block lengths of two MP3 files, one stegged and one non-stegged. A non-stegged MP3 file has a unimodal distribution of block lengths that peaks near the average frame length.
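The feature behind Figure 12 can be sketched as follows. Two sequences with the same mean block length can have very different variances. The block length is taken to be part2_3_length per granule, as on the figure's axis. The two example sequences are hypothetical values chosen to share a mean of 400 bits.

```python
from statistics import mean, pvariance


def block_length_profile(lengths):
    """Return (mean, population variance) of a sequence of block lengths."""
    return mean(lengths), pvariance(lengths)


if __name__ == "__main__":
    clean = [400, 402, 398, 401, 399]    # hypothetical non-stegged lengths
    stegged = [380, 420, 395, 405, 400]  # hypothetical stegged lengths
    print(block_length_profile(clean))   # same mean, small variance
    print(block_length_profile(stegged)) # same mean, much larger variance
```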
MP3Stego detection involves 2 stages. The MP3Stego detection engine estimates the autoregressive coefficients $\beta_0$, $\beta_1$ and $\beta_2$ of the block length relationship $block_i = \beta_0 + \beta_1 \cdot block_{i-1} + \beta_2 \cdot block_{i-2}$ given in [1]. A quadratic discriminant analysis (QDA) model with $\beta_0$, $\beta_1$ and $\beta_2$ as features was built to distinguish files encoded with MP3Stego from files encoded with the 8Hz encoder. The model separated 8Hz and MP3Stego files with 100% accuracy, as the confusion matrix in Table 6 shows. The model was built using 1500 MP3 files, half stegged and half non-stegged, with the stegged files carrying data at 50% embedding capacity. It was tested on 1000 pristine MP3 files, again split equally between the steg and non-steg categories.
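The coefficient estimation can be sketched as an ordinary least squares fit of the relationship above. This is a minimal sketch, assuming plain least squares is an acceptable estimator; [1] may use a different fitting procedure. It solves the 3x3 normal equations by Gaussian elimination so that it needs only the standard library. The resulting $(\beta_0, \beta_1, \beta_2)$ triple per file is the feature vector given to the QDA model.

```python
def fit_ar2(blocks):
    """Least-squares (b0, b1, b2) for block_i = b0 + b1*block_{i-1} + b2*block_{i-2}."""
    rows = [(1.0, blocks[i - 1], blocks[i - 2]) for i in range(2, len(blocks))]
    targets = blocks[2:]
    # Augmented normal equations [X^T X | X^T y], a 3 x 4 matrix.
    a = []
    for p in range(3):
        row = [sum(r[p] * r[q] for r in rows) for q in range(3)]
        row.append(sum(r[p] * t for r, t in zip(rows, targets)))
        a.append(row)
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(a[r][c] * beta[c] for c in range(r + 1, 3))
        beta[r] = (a[r][3] - tail) / a[r][r]
    return tuple(beta)
```

Here `blocks` would be the sequence of block lengths (part2_3_length values) read from one MP3 file.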
Table 6: Confusion Matrix for QDA model for MP3Stego detection
            MP3Stego   8Hz
MP3Stego    500        0
8Hz         0          500
2.5.5
UnderMP3Cover detection in the tool simply integrates the code cited in [2]. The program updet exploits a property of the steg tool: the size of the hidden file is stored in the first 6 bits of the carrier MP3 file. By extracting this data, the program updet checks if this value
[Figure: two histograms; Frequency vs. part2_3_length]
In this paper we have outlined a framework for MP3 encoder classification and steganalysis. We adopted the work of Böhme and Westfeld in this field and extended it into our steganalysis framework, which also includes JPEG steganalysis. The implementation uses SVMs for encoder classification, with the polynomial kernel chosen for its better generalization. To analyze the model's effectiveness in the real world, we performed a statistical significance test via bootstrapping. Our SVM results are comparable with those of the original work, and the bootstrapping also demonstrates that the SVM classification methodology is viable in the real world. For steganalysis, we built procedures into the tool to detect MP3Stego and UnderMP3Cover as described in Westfeld's work [1] [2].
In the future we intend to investigate features that could accurately separate 8Hz from SoloH. We propose using SVMs for this purpose, because tuning the free parameters of various kernels may reveal decision surfaces that exploit hidden patterns distinguishing these encoders. Developing new features will also require studying the encoders themselves.
LIST OF REFERENCES
[1] A. Westfeld, "Detecting Low Embedding Rates," LECTURE NOTES IN
[2] A. Westfeld, "Steganalysis in the Presence of Weak Cryptography and Encoding," LECTURE NOTES IN COMPUTER SCIENCE, vol. 4283, p. 19, 2006.
[3] G. Simmons, "The prisoners' problem and the subliminal channel," in Proceedings of CRYPTO, vol. 83, 1984, pp. 51-67.
Springer, 2001.
[6] Steghide steganography tool. [Online]. Available: http://steghide.sourceforge.net/
[7] University of Cambridge. MP3Stego. [Online]. Available: http://www.petitcolas.net/fabien/steganography/mp3stego/
[8] Sourceforge. UnderMP3Cover. [Online]. Available: http://www.files-library.com/files/UnderMP3Cover.html
[9] R. J. Anderson and F. A. P. Petitcolas, "On the Limits of Steganography," IEEE Journal on Selected Areas in Communications, vol. 16, no. 4, pp. 474-481, May 1998.
[12] D. Pan, M. Inc, and I. Schaumburg, "A tutorial on MPEG/audio compression," Multimedia, IEEE, vol. 2, no. 2, pp. 60-74, 1995.
[13] A. Servetti, C. Testa, J. De Martin, and D. e Informatica, "Frequency-selective partial encryption of compressed audio," in Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP'03). 2003 IEEE International Conference on, vol. 5, 2003.
[14] MPEG, "Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to 1.5 Mbit/s," 1991.
184, 2004.
[19] L. Gang, A. Akansu, and M. Ramkumar, "MP3 resistant oblivious steganography," in Acoustics, Speech, and Signal Processing, 2001. Proceedings (ICASSP'01). 2001 IEEE International Conference on, vol. 3, 2001.
[20] N. Moghadam and H. Sadeghi, "Genetic Content-Based MP3 Audio Watermarking in MDCT Domain," in Signal Processing Proceedings, 2000. WCCC-ICSP 2000. 5th International Conference on, vol. 1, 2000.
[27] L. Hamel, "Knowledge Discovery With Support Vector Machines," unpublished.
[28] C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
BIBLIOGRAPHY
Audacity. [Online]. Available: http://www.audacity.sourceforge.net/
Böhme, R. and Westfeld, A., "Statistical characterisation of MP3 encoders for steganalysis," in
Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
Do-Hyoung, K., Seung-Jin, Y., and Jae-Ho, C., "Additive Data Insertion Into MP3 Bitstream Using linbits Characteristics," 2004.
8Hz MP3 Encoder. [Online]. Available: http://www.8hz.com/mp3/
LAME MP3 Encoder. [Online]. Available: http://lame.sourceforge.net/
Gang, L., Akansu, A., and Ramkumar, M., "MP3 resistant oblivious steganography," in Acoustics, Speech, and Signal Processing, 2001. Proceedings (ICASSP'01). 2001 IEEE International Conference on, vol. 3, 2001.
http://steghide.sourceforge.net/
Khalid, S.,
Moghadam, N. and Sadeghi, H., "Genetic Content-Based MP3 Audio Watermarking in MDCT Domain,"
MPEG, "Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to 1.5 Mbit/s," 1991.
Pan, D., Inc, M., and Schaumburg, I., "A tutorial on MPEG/audio compression,"
Pevny, T. and Fridrich, J., "Merging Markov and DCT features for multi-class JPEG steganalysis,"
Provos, [Online]. Available: http://www.outguess.org
Raissi, R., "The Theory Behind Mp3," 2002.
IEEE Journal on Selected Areas in Communications, vol. 16, no. 4, pp. 474-481, May 1998.
Servetti, A., Testa, C., De Martin, J., and e Informatica, D., "Frequency-selective partial encryption of compressed audio," in Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP'03). 2003 IEEE International Conference on, vol. 5, 2003.
Simmons, G., "The prisoners' problem and the subliminal channel," in Proceedings of CRYPTO, vol. 83, 1984, pp. 51-67.
Sourceforge. UnderMP3Cover. [Online]. Available: http://www.files-library.com/files/UnderMP3Cover.html
University of Cambridge. MP3Stego. [Online]. Available: http://www.petitcolas.net/fabien/steganography/mp3stego/
Wang, Y., Yaroslavsky, L., Vilermo, M., and Vaananen, M., Some pe
uliar prop-