An adaptive audio watermarking based on the singular value decomposition in the wavelet domain

Article in Digital Signal Processing, December 2010

DOI: 10.1016/j.dsp.2010.02.006 · Source: DBLP

Digital Signal Processing 20 (2010) 1547–1558


Vivekananda Bhat K *, Indranil Sengupta, Abhijit Das
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur 721302, India

Article info

Article history:
Available online 18 February 2010

Keywords:
Audio watermarking
Discrete wavelet transform (DWT)
Quantization index modulation (QIM)
Singular value decomposition (SVD)
Synchronization code

Abstract

This paper presents a secure, robust, and blind adaptive audio watermarking algorithm
based on singular value decomposition (SVD) in the discrete wavelet transform domain
using synchronization code. In our algorithm, a watermark is embedded by applying a
quantization-index-modulation process on the singular values in the SVD of the wavelet
domain blocks. The watermarked signal is perceptually similar to the original audio signal
and gives high quality output. Experimental results show that the hidden watermark data
is robust to additive noise, resampling, low-pass filtering, requantization, MP3 compression,
cropping, echo addition, and denoising. Performance analysis of the proposed scheme
shows low error probability rates. The data embedding rate of the proposed scheme is
45.9 bps. The proposed scheme has high payload and superior performance against MP3
compression compared to the earlier audio watermarking schemes.

© 2010 Elsevier Inc. All rights reserved.

1. Introduction

Recent advances in Internet and digital multimedia technology have allowed transmission and distribution of digital
multimedia (audio, image and video) easily and efficiently to distant places. However, this convenience allows unauthorized
copying and distribution of multimedia data. Copyright protection of digital data has become an important issue. Digital
watermarking [1] technology has received a great deal of attention as a solution to this problem. Digital watermarking is a
process of embedding watermark data into the audio signal. This embedded data can later be detected or extracted from the
audio signal for various applications. There are several applications of audio watermarking, including copyright protection,
copy protection, content authentication, fingerprinting and broadcast monitoring.
In general, an effective audio watermarking [2] scheme must satisfy the following basic requirements: (i) Imperceptibility:
The quality of the audio should be retained after adding the watermark. Imperceptibility can be evaluated using both
objective and subjective measures. According to IFPI (International Federation of the Phonographic Industry)
recommendations, a watermarked audio signal should maintain more than 20 dB SNR. (ii) Security: Watermarked signals should
not reveal any clues about the watermarks in them. Also, the security of the watermarking procedure must depend on secret
keys, but not on the secrecy of the watermarking algorithm. (iii) Robustness: The ability to extract a watermark from a
watermarked audio signal after various signal processing attacks. (iv) Payload: The amount of data that can be embedded into
the host audio signal without losing imperceptibility. For audio signals, data payload refers to the number of watermark data
bits that may be reliably embedded within a host signal per unit of time, usually measured in bits per second (bps).
The data payload should be more than 20 bps.

* Corresponding author.
E-mail addresses: vbk@cse.iitkgp.ernet.in (V. Bhat K), isg@cse.iitkgp.ernet.in (I. Sengupta), abhij@cse.iitkgp.ernet.in (A. Das).

1051-2004/$ – see front matter © 2010 Elsevier Inc. All rights reserved.
doi:10.1016/j.dsp.2010.02.006

Fig. 1. Concatenation of synchronization code and the watermark.

There is a trade-off between these requirements. For example, increasing the data rate of a watermarking system degrades
the quality of the watermarked signal and decreases robustness against attacks.
In the literature, several watermarking techniques have already been proposed for image and video watermarking [3,4].
These techniques can also be applied to audio watermarking. However, audio watermarking algorithms are not easy to
develop because of the sensitivity of the human auditory system [5]. In recent years, audio watermarking techniques have
achieved significant progress, and several good algorithms have been developed. A detailed survey of audio watermarking
algorithms can be found in [6]. Most of the recent audio watermarking algorithms can be broadly classified into two
categories: time-domain algorithms [7,8] and frequency-domain algorithms [9,10]. Time-domain algorithms directly insert
the watermark into the audio signal, whereas frequency-domain algorithms embed the watermark by modifying the
frequency coefficients. Compared with frequency-domain algorithms, time-domain algorithms are easier to implement
and require less computation, but they are less robust to some audio signal-processing attacks. Some of the
novel and popular audio watermarking algorithms use the patchwork method [11] and spread spectrum techniques [12].
The main weaknesses of the existing algorithms are as follows: (i) The watermark embedding positions are not selected
adaptively according to the characteristics of the audio signals, leading to a significant reduction in imperceptibility [9,10].
(ii) The synchronization code is embedded by modifying individual sample values, which greatly reduces the resistance of
the synchronization code against signal processing attacks [9,10]. (iii) Low payload, for example 43 bps [8] (time
domain), 22 bps [13] (cepstrum domain), 16 bps [14] (time domain), 10 bps [11] (modified patchwork algorithm (MPA)),
8.5 bps [15] (frequency domain), 5 bps [16] (salient point), 4 bps [17] (time domain), 0.83 bps [18] (Fourier domain), and
0.51 bps [12] (spread spectrum). (iv) Low robustness, such as vulnerability to cropping [11,13].
In this paper, an adaptive audio watermarking algorithm in the discrete wavelet transform (DWT) domain based on
singular value decomposition (SVD) is proposed. Henceforth, we refer to our scheme as adaptive DWT–SVD. To the best
of our knowledge, this is the first adaptive DWT–SVD audio watermarking scheme. Several image watermarking algorithms
based on SVD can be found in [19]. Adaptive DWT–SVD watermarking was first proposed for images [20]. Our watermark
embedding scheme is based on quantization-index-modulation (QIM) [21] on the singular
values (SVs) of the audio signal in the wavelet domain. The watermark data bits are embedded in the SVs in the SVD of
the wavelet domain of each block with adaptive quantization steps. The proposed scheme maintains overall imperceptibility.
Further, the block size, the block weight and the minimum and maximum quantization steps ensure good embedding rates
and robustness against signal-processing attacks. The proposed scheme is blind, that is, the host audio signal is not needed
during watermark extraction. Moreover, quantization parameters are selected adaptively using block weighting parameters
in the wavelet domain, so it is impossible to detect the watermark without the quantization parameters. Our proposed scheme
has high data payload and better performance against MP3 compression compared to several recent audio watermarking
algorithms.
The rest of this paper is organized as follows. In Section 2, we introduce the concept of synchronization code. Sections 3
and 4 highlight our watermark embedding and extraction strategies using synchronization code. Sections 5 and 6 present
experimental results and performance analysis of our scheme. Finally, Section 7 concludes the paper.

2. Synchronization code

Only a few audio watermarking algorithms based on synchronization code have been proposed in the literature [8–10].
De-synchronization attacks (the watermark is present but cannot be detected because of a loss of synchronization) pose a
serious problem to any watermarking scheme, especially in audio watermarking. Some attacks, such as cropping, shifting
and MP3 compression (some MP3 encoders unintentionally add around 1000 samples), change the length of the
audio signal and usually lead to a failure to extract the watermark. So the correct position of the watermark must be
identified before extraction. This problem can be solved by concatenating a synchronization code and watermark bits to form
a binary sequence as shown in Fig. 1. We have embedded the synchronization code in front of the watermark to locate
the position where the watermark is embedded. The detection of the synchronization code is based on the standard frame
synchronization technology.

3. Watermark embedding algorithm

The block diagram of our watermark embedding algorithm is shown in Fig. 2. SVD is an effective numerical analysis
tool used to analyze matrices. In the SVD transformation, every real matrix is decomposed into a product of three matrices.
Let $A = \{A_{ij}\}_{p \times q}$ be an arbitrary matrix with SVD of the form $A = U S V^T$, where $U$ and $V$ are
orthogonal $p \times p$ and $q \times q$ matrices, respectively, and $S$ is a $p \times q$ diagonal matrix with
nonnegative elements. The diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_u$ of $S$ are
the singular values (SVs) of the matrix $A$, and $u$ is the rank of the matrix $A$. The SVD has some interesting properties:
(i) The sizes of the matrices from the SVD transformation are not fixed, and the matrices need not be square. (ii) Changing
the SVs slightly does not affect the quality of the signal much. (iii) The SVs are invariant under common signal processing
operations. (iv) The SVs satisfy intrinsic algebraic properties.

Fig. 2. Watermark embedding scheme.

3.1. Preprocessing

Let $X = \{x(i),\ 1 \le i \le L\}$ be a host digital audio signal with $L$ samples,
$W = \{w(i, j),\ 1 \le i \le M,\ 1 \le j \le M\}$ a binary image to be embedded within the host signal, and
$w(i, j) \in \{0, 1\}$ the pixel value at $(i, j)$. Finally, let $V = \{v(i),\ 1 \le i \le P\}$
be a synchronization code with $P$ bits, where $v(i) \in \{0, 1\}$. The host audio signal $X$ is divided into two parts
$X_1$ and $X_2$ with $L_1$ and $L_2$ samples. The synchronization code and the watermark are embedded into $X_1$ and
$X_2$, respectively.

3.2. Synchronization code embedding

The synchronization code can be embedded as follows:

Step 1: The first part $X_1$ of the audio signal $X$ is cut into $P$ audio segments, with each audio segment $CX_1(m)$
having $n$ samples,

$$ CX_1(m) = \{\, cx_1(m)(k) = x_1(k + (m - 1) \cdot n),\ 1 \le k \le n \,\}, \quad 1 \le m \le P, \tag{1} $$

where $cx_1(m)$ is the $m$th segment.

Step 2: The mean value of each segment $CX_1(m)$ is given by:

$$ \overline{CX_1}(m) = \frac{1}{n} \sum_{k=1}^{n} cx_1(m)(k) \quad (1 \le m \le P). \tag{2} $$

Step 3: Each bit of the synchronization code is embedded into each $CX_1(m)$ as follows:

$$ cx_1'(m)(k) = \begin{cases} cx_1(m)(k) - \left( \overline{CX_1}(m) - \left| \overline{CX_1}(m) \right| \right), & \text{if } v(m) = 1, \\ cx_1(m)(k) - \left( \overline{CX_1}(m) + \left| \overline{CX_1}(m) \right| \right), & \text{if } v(m) = 0, \end{cases} $$

where $CX_1(m) = \{cx_1(m)(k),\ 1 \le k \le n\}$ is the original segment, and
$CX_1'(m) = \{cx_1'(m)(k),\ 1 \le k \le n\}$ is the modified segment. Subtracting the mean value and then adding or
subtracting its absolute value makes the segment mean nonnegative when the bit 1 is embedded and nonpositive when the
bit 0 is embedded.
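The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `embed_sync_code`, the list-based signal and the toy values are assumptions made for the example, and segments are indexed from 0 instead of 1.

```python
def embed_sync_code(x1, sync_bits, n):
    """Embed one bit per n-sample segment by forcing the sign of the segment mean."""
    if len(x1) < n * len(sync_bits):
        raise ValueError("x1 is too short for the synchronization code")
    out = list(x1)
    for m, bit in enumerate(sync_bits):
        start = m * n
        segment = out[start:start + n]
        mean = sum(segment) / n
        # Bit 1: subtract (mean - |mean|), so the new mean is +|mean|.
        # Bit 0: subtract (mean + |mean|), so the new mean is -|mean|.
        shift = mean - abs(mean) if bit == 1 else mean + abs(mean)
        out[start:start + n] = [s - shift for s in segment]
    return out


# Toy host: the first segment has mean -0.1, the second has mean 0.275.
host = [0.2, -0.4, 0.1, -0.3, 0.5, 0.1, 0.3, 0.2]
marked = embed_sync_code(host, sync_bits=[1, 0], n=4)
means = [sum(marked[i:i + 4]) / 4 for i in (0, 4)]
print(means)
```

A segment whose mean already has the required sign is left unchanged, which is why the distortion introduced by this step depends on the host signal.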

3.3. Synchronization code extraction

The synchronization code is extracted as follows:

Step 1: The audio samples are segmented as in the embedding process, and the mean value of each segment is calculated.
Step 2: If the mean value is greater than or equal to zero, a bit 1 is detected. If the mean is less than zero, a bit 0 is
detected.
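As a rough sketch of the two extraction steps, the following Python function reads one bit from the sign of each segment mean. The name `extract_sync_code` and the sample values are illustrative assumptions; the start of the segment grid is taken as known here.

```python
def extract_sync_code(samples, num_bits, n):
    """Read one bit per n-sample segment from the sign of the segment mean."""
    if len(samples) < num_bits * n:
        raise ValueError("not enough samples for the requested number of bits")
    bits = []
    for m in range(num_bits):
        segment = samples[m * n:(m + 1) * n]
        mean = sum(segment) / n
        bits.append(1 if mean >= 0 else 0)   # Step 2: mean >= 0 gives 1, otherwise 0
    return bits


received = [0.4, -0.2, 0.1, 0.1, -0.05, -0.45, -0.25, -0.35]
bits = extract_sync_code(received, num_bits=2, n=4)
print(bits)
```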

3.4. Watermark embedding

The watermark embedding steps are given below:

Step 1: The first-level DWT is applied to the second part $X_2$ of the audio signal $X$ using the Haar wavelet filter.
Step 2: Segment the low-frequency approximate wavelet coefficients into 2-D matrix blocks $D_j$, $j = 1, 2, 3, \ldots, M \times M$,
each of size $u \times u$, where $M \times M$ is the number of bits in the binary watermark image. The numbers of rows
and columns of the 2-D matrix are selected by the user to achieve good rate-distortion-robustness trade-offs. The SVD of
each block is computed.
Step 3: Let $\Lambda^j = (\lambda_1^j, \lambda_2^j, \ldots, \lambda_u^j)$ be the vector of SVs of block $D_j$. The norm of
this vector is computed as follows:

$$ z_j = \left\| \Lambda^j \right\| = \sqrt{\sum_{q=1}^{u} \left( \lambda_q^j \right)^2}. \tag{3} $$

Step 4: Compute the mean value $m_{D_j}$ and the standard deviation $\sigma_{D_j}$ for each block $D_j$.
Step 5: The weight of each block is given by:

$$ S_j = S_{mean} \times m_{D_j} + S_{std} \times \sigma_{D_j}, \tag{4} $$

where $S_{mean}$ and $S_{std}$ are user-defined weight parameters for $m_{D_j}$ and $\sigma_{D_j}$, respectively.
Step 6: The maximum value $S_M = \max(S_j)$ and the minimum value $S_m = \min(S_j)$ are computed from all the $S_j$ values.
Step 7: To increase robustness and decrease distortion, we propose an adaptive decision method for the quantization steps,
which is better than using constant steps. The quantization step $\Delta_j$ for block $D_j$ is calculated adaptively as
follows:

$$ \Delta_j = \Delta_m + (\Delta_M - \Delta_m) \, \frac{S_j - S_m}{S_M - S_m}, \quad j = 1, 2, \ldots, M \times M, \tag{5} $$

where $\Delta_m$ and $\Delta_M$ are user-defined minimum and maximum quantization step values, respectively.
Step 8: The integer $C = \lfloor z_j / \Delta_j \rfloor$ is computed, where $\Delta_j$ is the quantization step for $z_j$,
corresponding to the block $D_j$.
Step 9: Each bit $w(i, j)$ of the watermark is embedded as follows:
If ($w(i, j) = 1$ and $C \pmod 2 = 1$), then $C = C + 1$.
If ($w(i, j) = 0$ and $C \pmod 2 = 0$), then $C = C + 1$.

Step 10: Calculate the value $z_j' = \Delta_j C + \Delta_j / 2$ and the new values of the SVs as follows:

$$ \left( \lambda_1'^{\,j}, \lambda_2'^{\,j}, \ldots, \lambda_u'^{\,j} \right) = \left( \lambda_1^j, \lambda_2^j, \ldots, \lambda_u^j \right) \times \frac{z_j'}{z_j}. \tag{6} $$

Step 11: The modified matrix of the blocks $D_j$ is obtained using the modified SVs by applying the inverse SVD.
Step 12: Reconstruct the audio segment from all the modified blocks $D_j$. Perform the inverse DWT to get the watermarked
signal $X'$.
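The embedding steps can be sketched in plain Python as follows. The sketch relies on one shortcut that is an assumption of this example and is not stated in the text: because $U$ and $V$ are orthogonal, the Euclidean norm of the SV vector equals the Frobenius norm of the block, and scaling every SV by $z_j'/z_j$ scales the whole block by the same factor, so no explicit SVD routine is needed. The function names, the population standard deviation, the guard for identical weights and the toy host signal are likewise illustrative choices; blocks are kept as flat lists because the norm, mean and standard deviation do not depend on the 2-D shape.

```python
import math


def haar_dwt(x):
    """One-level Haar DWT; returns (approximation, detail) coefficient lists."""
    s = math.sqrt(2.0)
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return [(a + b) / s for a, b in pairs], [(a - b) / s for a, b in pairs]


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def sv_norm(block):
    """Norm of the SV vector (Eq. (3)); it equals the Frobenius norm of the block."""
    return math.sqrt(sum(c * c for c in block))


def adaptive_steps(blocks, s_mean, s_std, step_min, step_max):
    """Eqs. (4)-(5): one quantization step per block."""
    weights = []
    for b in blocks:
        mean = sum(b) / len(b)
        std = math.sqrt(sum((c - mean) ** 2 for c in b) / len(b))  # population std (assumption)
        weights.append(s_mean * mean + s_std * std)
    lo, hi = min(weights), max(weights)
    span = (hi - lo) or 1.0  # assumption: guard against identical weights
    return [step_min + (step_max - step_min) * (w - lo) / span for w in weights]


def embed_watermark(x2, bits, u, s_mean=0.1, s_std=0.6, step_min=0.5, step_max=0.9):
    """Embed one bit per block of u*u approximation coefficients (even-length x2 assumed)."""
    approx, detail = haar_dwt(x2)
    size = u * u
    if len(approx) < size * len(bits):
        raise ValueError("host segment too short for the watermark")
    blocks = [approx[j * size:(j + 1) * size] for j in range(len(bits))]
    steps = adaptive_steps(blocks, s_mean, s_std, step_min, step_max)
    for j, (bit, step) in enumerate(zip(bits, steps)):
        z = sv_norm(blocks[j])
        if z == 0:
            raise ValueError("silent block: the SV norm is zero")
        c = math.floor(z / step)                                   # Step 8
        if (bit == 1 and c % 2 == 1) or (bit == 0 and c % 2 == 0):
            c += 1                                                 # Step 9
        scale = (step * c + step / 2) / z                          # Step 10, Eq. (6)
        approx[j * size:(j + 1) * size] = [v * scale for v in blocks[j]]
    return haar_idwt(approx, detail), steps


host = [math.sin(0.3 * i) + 0.5 for i in range(24)]
bits = [1, 0, 1]
marked, steps = embed_watermark(host, bits, u=2)

# Round-trip check: recompute the norms and read the parity back (Section 4.2).
approx_marked, _ = haar_dwt(marked)
recovered = [1 if math.floor(sv_norm(approx_marked[j * 4:(j + 1) * 4]) / steps[j]) % 2 == 0 else 0
             for j in range(len(bits))]
print(recovered)
```

Note that the function returns the quantization steps together with the watermarked signal: scaling a block changes its mean and standard deviation, so in this sketch the extractor is handed the steps rather than recomputing them.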

4. Watermark extraction algorithm

To find the exact location of the watermark, we first have to search for and extract the synchronization code. The extraction
of the synchronization code is based on the standard frame synchronization technique used in digital communication [8].
The embedded synchronization code is searched for before the watermark extraction process is started. Theoretically, the
maximum number of samples to be searched for the synchronization code will be no more than $B \times (N_1 + N_2)$, where
$B$ is the length of the audio segment and $N_1$ and $N_2$ are the numbers of synchronization code bits and watermark bits,
respectively. $B \times (N_1 + N_2)$ samples from the start of the received audio signal are examined to find a full
synchronization code.

4.1. Search for the synchronization code

The beginning position of the watermarked segment is identified as follows:

Step 1: Set the start index of the received signal to $I = 1$, and select $L = B \times N_1$ samples.
Step 2: Extract the information embedded in the samples from $y(I)$ to $y(I + B \times N_1)$ as explained in Section 3.3.
Step 3: Evaluate the similarity between the extracted synchronization code and the original synchronization code as
follows:

$$ \rho(v, v') = \frac{\sum_{i=1}^{P} v(i) \, v'(i)}{\sqrt{\sum_{i=1}^{P} v^2(i)} \, \sqrt{\sum_{i=1}^{P} v'^2(i)}}, \tag{7} $$

where $v$ and $v'$ are the original and the extracted synchronization codes, respectively. If the similarity between $v$
and $v'$ is greater than or equal to a predefined threshold $e$, then $v'$ is taken as the synchronization code, and one
goes to Step 5. Otherwise, proceed to the next step.
Step 4: Increase $I$ by 1, slide the window to the next $L$ samples, and repeat Steps 2 and 3.
Step 5: Discard the samples up to $y(I + B \times N_1)$ before extracting the watermark.
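The search procedure can be illustrated with the following Python sketch. The names `similarity` and `find_sync`, the 0-based indexing and the toy signal are assumptions of the example; the bit decision reuses the mean-sign rule of Section 3.3.

```python
import math


def similarity(v, v_ext):
    """Normalized correlation between two bit sequences, as in Eq. (7)."""
    num = sum(a * b for a, b in zip(v, v_ext))
    den = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in v_ext))
    return num / den if den else 0.0


def find_sync(y, sync_bits, seg_len, threshold):
    """Return the 0-based index of the first sample after the sync code, or None."""
    window = seg_len * len(sync_bits)
    for start in range(len(y) - window + 1):          # Step 4: slide by one sample
        extracted = []
        for m in range(len(sync_bits)):               # Step 2: mean-sign decision
            seg = y[start + m * seg_len:start + (m + 1) * seg_len]
            extracted.append(1 if sum(seg) / seg_len >= 0 else 0)
        if similarity(sync_bits, extracted) >= threshold:   # Step 3
            return start + window                     # Step 5: skip the sync part
    return None


sync = [1, 0, 1, 1, 0]
coded = [s for bit in sync for s in ([0.2] * 3 if bit else [-0.2] * 3)]
y = [-1.0] * 4 + coded + [0.0] * 6        # four unrelated samples precede the code
position = find_sync(y, sync, seg_len=3, threshold=0.96)
print(position)
```

Because the mean-sign decision tolerates small misalignments, a practical detector may lock on a sample or two away from the true boundary; the toy prefix above is chosen so that only the exact alignment passes the threshold.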

4.2. Watermark extraction

The watermark can be extracted without using the original audio signal as follows:

Step 1: Locate the beginning position (BP) of the watermarked audio signal using the synchronization code searching
technique explained above.
Step 2: Apply the first-level DWT using the Haar wavelet filter to the audio segment after BP.
Step 3: Segment the low-frequency approximate wavelet coefficients into 2-D matrix blocks $D_j$, $j = 1, 2, \ldots, M \times M$,
each of size $u \times u$.
Step 4: For each $j$, the value $z_j = \|\Lambda^j\|$ is computed, where the vector
$\Lambda^j = (\lambda_1^j, \lambda_2^j, \ldots, \lambda_u^j)$ is formed by the SVs of the block $D_j$.
Step 5: Find the integer value $C = \lfloor z_j / \Delta_j \rfloor$.
Step 6: If $C \pmod 2 = 0$, then the embedded bit is 1, otherwise it is 0.
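Steps 4 to 6 reduce to a parity test on the quantized norm, which the short Python sketch below illustrates. The function name `extract_bit` and the sample block are assumptions; the norm is computed as the Frobenius norm of the block, which equals the norm of its SV vector, and the quantization step is assumed to be supplied as part of the key.

```python
import math


def extract_bit(block, step):
    """Recover one bit from a flattened coefficient block and its quantization step."""
    z = math.sqrt(sum(c * c for c in block))   # equals the norm of the SV vector
    c = math.floor(z / step)                   # Step 5
    return 1 if c % 2 == 0 else 0              # Step 6: even C carries 1, odd C carries 0


block = [3.0, 4.0, 0.0, 0.0]                   # norm 5
bits = [extract_bit(block, 2.0), extract_bit(block, 1.5)]
print(bits)
```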

5. Experimental results and discussions

We have performed extensive simulations using MATLAB 7.1 on different audio signals including classical, country, blues,
jazz and pop music. Each piece of music is a 16-bit mono audio signal in the WAVE format sampled at 44 100 Hz. A plot of a
short portion of the jazz audio signal and its watermarked version is shown in Fig. 3. The embedded watermark is the binary
logo image of size $M \times M = 32 \times 32 = 1024$ bits, shown in Fig. 4. We use a 16-bit Barker code 1111100110101110
as the synchronization code and an audio segment length of $n = 484$ samples for embedding the synchronization code. In our
experiment, we have set the 2-D matrix size to $u \times u = 22 \times 22$. We have set the weight parameters
$S_{mean} = 0.1$ and $S_{std} = 0.6$, the minimum quantization parameter $\Delta_m = 0.5$ or 0.6 or 0.7 and the maximum
quantization parameter $\Delta_M = 0.9$. All these
parameters have been chosen so as to achieve a good compromise between the contending requirements of imperceptibility,
robustness and payload. The threshold $e$ defined in Section 4.1 is set to 0.96.

5.1. Imperceptibility test

Generally, there are two approaches to perceptual quality assessment: (i) subjective listening tests based on human
acoustic perception, and (ii) objective evaluation tests that measure the signal-to-noise ratio (SNR).
Subjective quality evaluation [22] of the watermarking method was done by blind listening tests involving 10
persons. Participants listened to the original and the watermarked audio sequences and were asked to report dissimilarities
between them, using a 5-grade impairment scale called Subjective Grade (SG), as shown in Table 1. The average mean
opinion score (MOS) is shown in Table 2. The subjective quality evaluation showed a high transparency of the proposed
algorithm, with high MOS values.

Fig. 3. A plot of a short portion of the jazz audio signal and its watermarked version.

Fig. 4. Binary watermark.

Table 1
Impairment grades.

SG     Impairment                       Quality
5.0    Imperceptible                    Excellent
4.0    Perceptible, but not annoying    Good
3.0    Slightly annoying                Fair
2.0    Annoying                         Poor
1.0    Very annoying                    Bad

Table 2
SNR and MOS between original and watermarked audio.

Audio file    SNR (dB)    Average MOS
Classic       26.84       4.6
Country       25.13       4.5
Blues         23.01       4.4
Jazz          24.76       4.5
Pop           22.11       4.3

The objective quality of the watermarked audio signal is measured by the SNR [23]. It is defined as

$$ \mathrm{SNR}(X, X') = 10 \log_{10} \frac{\sum_{i=1}^{L} X^2(i)}{\sum_{i=1}^{L} \left[ X(i) - X'(i) \right]^2} \ \text{(dB)}, \tag{8} $$

where $X$ and $X'$ are the original and the watermarked audio signals. The SNR values of all selected audio files are above
20 dB, conforming to the IFPI recommendation, and are shown in Table 2.
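Eq. (8) translates directly into code. The following Python sketch is an illustration with an assumed function name (`snr_db`) and toy sample values, not the measurement code used in the experiments.

```python
import math


def snr_db(original, watermarked):
    """Signal-to-noise ratio of Eq. (8), in dB."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, watermarked))
    if noise == 0:
        return math.inf          # identical signals
    return 10.0 * math.log10(signal / noise)


x = [1.0, -1.0, 1.0, -1.0]
y = [1.1, -0.9, 1.1, -0.9]       # every sample is off by 0.1
value = snr_db(x, y)
print(round(value, 2))
```

In this toy case the error power is one hundredth of the signal power, which corresponds to the 20 dB level named in the IFPI recommendation.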

5.2. Robustness test

The following attacks were performed to test the robustness and effectiveness of our scheme. The audio editing and
attacking tools adopted in the experiment are Adobe Audition 1.0 (for echo addition) and GoldWave 5.18 (for resampling,
MP3 compression and denoising). Additive white Gaussian noise, low-pass filtering, requantization, and cropping operations
are implemented using MATLAB 7.1.

(A) Additive white Gaussian noise (AWGN): White Gaussian noise is added to the watermarked signal until the resulting
signal has an SNR of 20 dB.
(B) Resampling: The watermarked signal, originally sampled at 44.1 kHz, is resampled at 22.05 kHz, and then restored
by sampling again at 44.1 kHz.
(C) Low-pass filtering: A second-order Butterworth filter with cut-off frequency 11 025 Hz is used.
(D) Requantization: The 16-bit watermarked audio signal is requantized down to 8 bits/sample and then back to
16 bits/sample.
(E) MP3 compression 64 kbps: The MPEG-1 layer 3 compression is applied. The watermarked audio signal is compressed
at the bit rate of 64 kbps and then decompressed back to the WAVE format.
(F) MP3 compression 32 kbps: The MPEG-1 layer 3 compression is applied. The watermarked audio signal is compressed
at the bit rate of 32 kbps and then decompressed back to the WAVE format.
(G) Cropping: Segments of 500 samples (5 × 100) are removed from the watermarked audio signal at five positions and
subsequently replaced by segments of the watermarked audio signal attacked with low-pass filtering and additive
white Gaussian noise.
(H) Echo addition: An echo signal with a delay of 98 ms and a decay of 41% is added to the watermarked audio signal.
(I) Denoising: The watermarked audio signal is denoised by using the Hiss removal function of GoldWave.
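Two of the attacks above, (A) and (D), are simple enough to sketch in plain Python. The sketch is one possible realization under stated assumptions: the noise is rescaled so that the attacked signal hits the target SNR exactly, and requantization is modelled by discarding the low byte of each 16-bit sample. The function names, the seed and the test signal are illustrative; the MATLAB routines used in the experiments are not described in the text and may differ.

```python
import math
import random


def add_awgn(signal, target_snr_db, seed=0):
    """Add white Gaussian noise scaled so that the result has the target SNR."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal)
    noise_power = sum(e * e for e in noise)
    # Choose the gain so that signal_power / (gain^2 * noise_power) = 10^(SNR/10).
    gain = math.sqrt(signal_power / (noise_power * 10 ** (target_snr_db / 10)))
    return [s + gain * e for s, e in zip(signal, noise)]


def requantize_16_to_8(samples_16bit):
    """Keep the 8 most significant bits of each 16-bit integer sample."""
    return [(s >> 8) << 8 for s in samples_16bit]


clean = [math.sin(0.1 * i) for i in range(200)]
noisy = add_awgn(clean, target_snr_db=20)
error_power = sum((a - b) ** 2 for a, b in zip(noisy, clean))
achieved = 10 * math.log10(sum(a * a for a in clean) / error_power)

coarse = requantize_16_to_8([1000, -1000, 255])
print(round(achieved, 3), coarse)
```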

Normalized cross-correlation (NC) is used to evaluate the similarity between the original watermark and the extracted
watermark and is defined as follows:

$$ \mathrm{NC}(W, W') = \frac{\sum_{i=1}^{M} \sum_{j=1}^{M} W(i, j) \, W'(i, j)}{\sqrt{\sum_{i=1}^{M} \sum_{j=1}^{M} W^2(i, j)} \, \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{M} W'^2(i, j)}}, \tag{9} $$

where $W$ and $W'$ are the original and the extracted watermarks, respectively, and $i$, $j$ are indices of the binary
watermark image. If $\mathrm{NC}(W, W')$ is close to 1, then the similarity between $W$ and $W'$ is very high. If
$\mathrm{NC}(W, W')$ is close to zero, then the similarity between $W$ and $W'$ is very low.
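A minimal Python sketch of Eq. (9) follows. The function name `normalized_correlation` and the 2 × 2 toy watermarks are assumptions made for illustration; images are represented as lists of rows of 0/1 values.

```python
import math


def normalized_correlation(w, w_ext):
    """Normalized cross-correlation of Eq. (9) for two binary images."""
    flat_w = [b for row in w for b in row]
    flat_e = [b for row in w_ext for b in row]
    num = sum(a * b for a, b in zip(flat_w, flat_e))
    den = math.sqrt(sum(a * a for a in flat_w)) * math.sqrt(sum(b * b for b in flat_e))
    return num / den if den else 0.0   # assumption: all-zero images give 0


original = [[1, 0], [1, 1]]
extracted = [[1, 0], [0, 1]]           # one pixel flipped from 1 to 0
nc = normalized_correlation(original, extracted)
print(round(nc, 4))
```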
The bit error rate (BER) is used to evaluate the watermark detection accuracy after signal processing operations. The BER
of the watermarked signal retrieval is defined as follows:

$$ \mathrm{BER}(W, W') = \frac{\sum_{i=1}^{M} \sum_{j=1}^{M} W(i, j) \oplus W'(i, j)}{M \times M}, \tag{10} $$

where $\oplus$ is the exclusive or (XOR) operator.
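Eq. (10) can be sketched in Python as shown below. The function name `bit_error_rate` and the toy images are illustrative assumptions; the function returns a fraction, so it has to be multiplied by 100 to obtain percentages such as those reported in the tables.

```python
def bit_error_rate(w, w_ext):
    """Fraction of differing pixels between two binary images, as in Eq. (10)."""
    pairs = [(a, b) for row_a, row_b in zip(w, w_ext) for a, b in zip(row_a, row_b)]
    errors = sum(a ^ b for a, b in pairs)   # XOR is 1 exactly where the bits differ
    return errors / len(pairs)


original = [[1, 0], [1, 1]]
extracted = [[1, 0], [0, 1]]
ber = bit_error_rate(original, extracted)
print(ber)
```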
Tables 3 and 4 summarize the robustness results of the proposed scheme against the attacks mentioned above
for different audio files, together with the corresponding NC and BER values. We see that our proposed scheme has high
NC and low BER values against various attacks for different audio files. The minimum BER and the maximum BER of the
proposed scheme are 0% and 8%, respectively, for various audio files against different attacks. The extracted watermark has
good visual quality. This clearly shows the good performance of the proposed scheme against different attacks for different
audio files.
From Table 4 we see that adding noise with SNR 20 dB to a watermarked signal with SNR 22 dB does not lead to any
detection error. This is because we have embedded the watermark into the low-frequency approximate wavelet coefficients.
Low-frequency approximate wavelet coefficients have high robustness against noise addition.
In Table 5, we compare the performance of several recent audio watermarking schemes against MP3 compression, sorted
by data payload. The proposed scheme has the highest payload and good performance against MP3 compression at the
bit rate of 32 kbps compared to earlier audio watermarking schemes. Our scheme has a maximum BER of 1% against MP3
compression at the bit rate of 32 kbps for various audio files.

5.3. Security

Since the watermark embedding algorithm depends solely on the quantization parameters, it is impossible to maliciously
detect the watermark without the quantization parameters. These quantization parameters are used as secret keys for wa-
termark recovery.

5.4. Payload

The data payload refers to the number of bits that are embedded into the audio signal within a unit of time and is
measured in bits per second (bps). Suppose the length of the host audio is $L$ seconds, and the watermark data is $M$
bits. Then, the data payload $D$ is defined as follows:

$$ D = \frac{M}{L} \ \text{(bps)}. \tag{11} $$

Table 3
Extracted watermark with NC and BER for jazz audio.

Attack type           Proposed scheme (NC)    Proposed scheme (BER (%))
No attack             1                       0
AWGN                  1                       0
Resampling            0.9854                  2
Low-pass filtering    0.9993                  0
Requantization        1                       0
MP3 64 kbps           0.9980                  0
MP3 32 kbps           0.9907                  1
Cropping              1                       0
Echo addition         0.9847                  2
Denoising             0.9634                  5
The data embedding capacity of the proposed algorithm is 45.9 bps. This performance is superior to other audio water-
marking algorithms, as illustrated in Table 5. The data embedding capacity also depends on the type of the audio signal,
audio imperceptibility and the watermark robustness to be achieved. The data payload can be higher in data-hiding appli-
cations, where robustness is not the primary goal [28] or not fully discussed [29].
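As a rough illustration of Eq. (11), the Python sketch below estimates the payload from the experimental parameters of Section 5. It rests on assumptions that the text does not spell out: that each $22 \times 22$ block of first-level approximation coefficients spans $2 \times 22 \times 22$ host samples, and that $L$ covers both the synchronization part and the watermark part. The names used are illustrative.

```python
def payload_bps(num_bits, duration_seconds):
    """Data payload D = M / L of Eq. (11)."""
    return num_bits / duration_seconds


sample_rate = 44100                    # Hz, from Section 5
bits, u = 1024, 22                     # watermark bits and block side length
sync_bits, n = 16, 484                 # synchronization code length and segment size

# Assumption: a level-1 DWT halves the number of coefficients, so one block of
# u*u approximation coefficients corresponds to 2*u*u host samples.
watermark_samples = bits * 2 * u * u
sync_samples = sync_bits * n
duration = (watermark_samples + sync_samples) / sample_rate
rate = payload_bps(bits, duration)
print(round(duration, 2), round(rate, 1))
```

Under these assumptions the estimate comes out at roughly 45 bps, in the same range as the reported 45.9 bps; the exact figure depends on how $L$ is measured.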

6. Performance analysis

The performance of a watermarking system is generally characterized by two types of errors [30], false positive error
(FPE) and false negative error (FNE).

6.1. False positive error analysis

The FPE is the probability that an unwatermarked audio signal is declared as watermarked by the decoder.
Let $k$ be the total number of watermark bits, and $u_1$ the number of matching bits. For an unwatermarked audio segment,
the extracted watermark bits are assumed to be independent random variables, each matching the corresponding original
watermark bit with probability $P_1$. Then, based on the assumption of Bernoulli trials, $u_1$ is a random variable with
binomial distribution

$$ P_{u_1} = \binom{k}{u_1} P_1^{u_1} (1 - P_1)^{k - u_1}, \tag{12} $$

where $\binom{k}{u_1}$ is the binomial coefficient. Here $P_1$ is assumed to be 1/2.
Further, an audio signal is claimed to be watermarked if the number of matching bits is greater than or equal to a
threshold $T_1$. Then, the probability that $u_1 \ge T_1$ is the FPE, that is,

$$ P_{fp} = \sum_{u_1 = T_1}^{k} P_{u_1}. \tag{13} $$

Table 4
NC and BER of extracted watermark from different audio files.

Audio file    Attack                Proposed scheme (NC)    Proposed scheme (BER (%))
Classical     No attack             1                       0
              AWGN                  1                       0
              Resampling            0.9974                  0
              Low-pass filtering    0.9987                  0
              Requantization        1                       0
              MP3 64 kbps           1                       0
              MP3 32 kbps           1                       0
              Cropping              1                       0
              Echo addition         0.9442                  8
              Denoising             1                       0

Country       No attack             1                       0
              AWGN                  1                       0
              Resampling            0.9947                  1
              Low-pass filtering    0.9980                  0
              Requantization        1                       0
              MP3 64 kbps           1                       0
              MP3 32 kbps           0.9941                  1
              Cropping              1                       0
              Echo addition         0.9934                  1
              Denoising             0.9974                  0

Blues         No attack             1                       0
              AWGN                  1                       0
              Resampling            1                       0
              Low-pass filtering    1                       0
              Requantization        1                       0
              MP3 64 kbps           1                       0
              MP3 32 kbps           0.9980                  0
              Cropping              1                       0
              Echo addition         0.9907                  1
              Denoising             1                       0

Pop           No attack             1                       0
              AWGN                  1                       0
              Resampling            1                       0
              Low-pass filtering    1                       0
              Requantization        1                       0
              MP3 64 kbps           1                       0
              MP3 32 kbps           1                       0
              Cropping              1                       0
              Echo addition         0.9960                  1
              Denoising             1                       0

Table 5
Performance of audio watermarking schemes, sorted by data payload.

Reference                   Method                   Payload (bps)    Blind    Robustness to MP3 compression (bit rate tested)
Proposed                    Adaptive DWT–SVD         45.9             Yes      32 kbps
Lie and Chang [8]           Synchronization-based    43               Yes      80 kbps
Cvejic and Seppanen [24]    Spread spectrum          27.1             Yes      32 kbps
Yeo and Kim [11]            MPA                      10               Yes      96 kbps
Tachibana et al. [15]       DFT                      8.5              Yes      96 kbps
Mansour and Tewfik [16]     Salient point            5                Yes      112 kbps
Li et al. [25]              Content-based            4.2              Yes      32 kbps
Mansour and Tewfik [26]     Salient point            2.3              Yes      56 kbps
Xiang et al. [27]           Histogram-based          2                Yes      64 kbps
Kirovski and Malvar [12]    Spread spectrum          0.51             Yes      32 kbps

Fig. 5. False positive probabilities under various k.

Here $T_1 = \lceil (1 - \mathrm{BER}) \times k \rceil$, where BER is the bit error rate between the original watermark bits
and the extracted watermark bits defined by Eq. (10). In most practical applications we require a very small $P_{fp}$, of
the order of $10^{-6}$. A BER of less than 20% can meet this demand. Therefore we set BER = 20%. Hence $P_{fp}$ may be
described as

$$ P_{fp} = 2^{-k} \sum_{u_1 = \lceil 0.8k \rceil}^{k} \binom{k}{u_1}. \tag{14} $$

Fig. 5 plots the false positive probabilities for $k \in (0, 100]$. It is evident that the false positive probability
approaches 0 when $k$ is larger than 20. In our method, $k = 1024$, hence the false positive probability is close to 0.
Indeed, putting $k = 1024$ in Eq. (14) gives $P_{fp} = 1.18 \times 10^{-529}$.
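Eq. (14) can be evaluated exactly with integer arithmetic, which avoids floating-point underflow for large $k$. The Python sketch below is an illustration; the function name `false_positive_probability` and the use of `fractions.Fraction` are choices made for this example, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import ceil, comb


def false_positive_probability(k, ber=Fraction(1, 5)):
    """Exact value of Eq. (14): P(u1 >= T1) for k fair Bernoulli trials."""
    threshold = ceil((1 - ber) * k)                      # T1 = ceil((1 - BER) * k)
    tail = sum(comb(k, u) for u in range(threshold, k + 1))
    return Fraction(tail, 2 ** k)


p20 = false_positive_probability(20)
p1024 = false_positive_probability(1024)
print(float(p20))
print(float(p1024))
```

Since every term of the sum in Eq. (14) is at least $2^{-k}$, the exact value for $k = 1024$ cannot be smaller than $2^{-1024} \approx 5.6 \times 10^{-309}$, which is a useful sanity check on any quoted magnitude.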

6.2. False negative error analysis

The FNE is the probability that a watermarked audio signal is declared as unwatermarked by the decoder.
Let $k$ be the total number of watermark bits, and $u_2$ the number of matching bits. For a watermarked audio segment,
the extracted watermark bits are assumed to be independent random variables, each matching the corresponding original
watermark bit with probability $P_2$. Then, based on the assumption of Bernoulli trials, we get

$$ P_{u_2} = \binom{k}{u_2} P_2^{u_2} (1 - P_2)^{k - u_2}. \tag{15} $$

Further, an audio signal is claimed to be unwatermarked if the number of matching bits is less than or equal to a
threshold $T_2$. Then, the probability that $u_2 \le T_2$ is the FNE, that is,

$$ P_{fn} = \sum_{u_2 = 0}^{T_2} P_{u_2}. \tag{16} $$

Here $T_2 = \lceil (1 - \mathrm{BER}) \times k \rceil - 1$. If we set BER = 20%, then $P_{fn}$ may be described as

$$ P_{fn} = \sum_{u_2 = 0}^{\lceil 0.8k \rceil - 1} \binom{k}{u_2} (P_2)^{u_2} (1 - P_2)^{k - u_2}. \tag{17} $$

Unlike $P_1$, we cannot assume $P_2$ to be 1/2. For different attacks, $P_2$ has different values. However, the approximate
value of $P_2$ may be obtained from the BER under different attacks. From Tables 3 and 4 we see that the BERs are all at
most 0.08, so $P_2$ is taken as 0.92 in our method. Fig. 6 plots the false negative probabilities for $k \in (0, 100]$. We
see that the false negative probability approaches 0 when $k$ is larger than 40. In our method, $k = 1024$, hence the false
negative probability is close to 0. Indeed, putting $k = 1024$ and $P_2 = 0.92$ in Eq. (17) gives
$P_{fn} = 6.47 \times 10^{-34}$.
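Eq. (17) can be evaluated in the same exact fashion. In the Python sketch below, $P_2 = 0.92$ is written as the fraction 23/25 so that the whole sum stays in integer arithmetic; the function name `false_negative_probability` is an assumption of the example.

```python
from fractions import Fraction
from math import ceil, comb


def false_negative_probability(k, p2=Fraction(23, 25), ber=Fraction(1, 5)):
    """Exact value of Eq. (17): P(u2 <= T2) for k trials with match probability p2."""
    a, b = p2.numerator, p2.denominator
    t2 = ceil((1 - ber) * k) - 1                         # T2 = ceil((1 - BER) * k) - 1
    hits = sum(comb(k, u) * a ** u * (b - a) ** (k - u) for u in range(t2 + 1))
    return Fraction(hits, b ** k)


p5 = false_negative_probability(5)
p1024 = false_negative_probability(1024)
print(float(p5))
print(float(p1024))
```

For small $k$ the result can be checked by hand: with $k = 5$ the threshold is $T_2 = 3$, so the function returns one minus the probabilities of exactly four and exactly five matches.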

Fig. 6. False negative probabilities under various k.

7. Conclusion

This paper proposes an adaptive audio watermarking algorithm based on SVD in the DWT domain using synchronization
code. The watermark data bits are embedded in the SVs of the wavelet blocks of the original audio signals based on QIM.
Experiments demonstrate that the watermarked signals are indistinguishable from the original audio signals. Watermark
detection is efficient and blind: only the quantization parameters are needed, not the original audio signal. Experimental
results show that the proposed scheme is robust against MP3 compression, cropping, low-pass filtering, additive noise,
resampling, requantization, echo addition, and denoising. Our scheme has higher payload and better performance against MP3
compression compared to earlier audio watermarking schemes. The false positive and the false negative error probabilities
are very low. The proposed algorithm is suitable for applications like copyright protection, where the embedded data is the
information relevant to the owner of the digital audio media. Finally, the proposed algorithm involves only easy calculations
and admits easy implementation, and is, therefore, practical for real-time applications.

References

[1] I. Cox, M. Miller, J. Bloom, J. Fridrich, T. Kalke, Digital Watermarking and Steganography, second ed., Morgan Kaufmann, Burlington, MA, 2007.
[2] S. Katzenbeisser, F.A.P. Petitcolas, Information Hiding Techniques for Steganography and Digital Watermarking, Artech House, Boston, 2000.
[3] S. Bhattacharya, T. Chattopadhyay, A. Pal, A survey on different video watermarking techniques and comparative analysis with reference to H.264/AVC,
in: IEEE 10th International Symposium on Consumer Electronics (ISCE06), 2006, pp. 16.
[4] V.M. Potdar, S. Han, E. Chang, A survey of digital image watermarking techniques, in: 3rd IEEE International Conference on Industrial Informatics
(INDIN05), 2005, pp. 709716.
[5] M.D. Swanson, B. Zhu, A.H. Tewk, L. Boney, Robust audio watermarking using perceptual masking, Signal Process. 66 (3) (1998) 337355.
[6] N. Cvejic, T. Seppanen, Digital Audio Watermarking Techniques and Technologies, Information Science Reference, USA, 2007.
[7] P. Basia, I. Pitas, N. Nikolaidis, Robust audio watermarking in the time domain, IEEE Trans. Multimedia 3 (2) (2001) 232241.
[8] W.-N. Lie, L.-C. Chang, Robust high-quality time-domain audio watermarking based on low-frequency amplitude modication, IEEE Trans. Multime-
dia 8 (1) (2006) 4659.
[9] X.-Y. Wang, H. Zhao, A novel synchronization invariant audio watermarking scheme based on DWT and DCT, IEEE Trans. Signal Process. 54 (12) (2006)
48354840.
[10] S. Wu, J. Huang, D. Huang, Y.Q. Shi, Eciently self-synchronized audio watermarking for assured audio data transmission, IEEE Trans. Broadcast-
ing 51 (1) (2005) 6976.
[11] I.-K. Yeo, H.J. Kim, Modied patchwork algorithm: A novel audio watermarking scheme, IEEE Trans. Speech Audio Process. 11 (4) (2003) 381386.
[12] D. Kirovski, H.S. Malvar, Spread-spectrum watermarking of audio signals, IEEE Trans. Signal Process. 51 (4) (2003) 10201033.
[13] X. Li, H.H. Yu, Transparent and robust audio data hiding in cepstrum domain, in: IEEE International Conference on Multimedia and Expo, vol. 1, 2000,
pp. 397400.
[14] D. Gruhl, A. Lu, W. Bender, Echo hiding, in: Information Hiding (IH96), in: Lecture Notes in Computer Science, vol. 1174, Springer, Berlin, 1996,
pp. 295315.
[15] R. Tachibana, S. Shimizu, S. Kobayashi, T. Nakamura, An audio watermarking method using a two-dimensional pseudo-random array, Signal Pro-
cess. 82 (10) (2002) 14551469.
[16] M.F. Mansour, A.H. Tewk, Time-scale invariant audio data embedding, EURASIP J. Appl. Signal Process. 2003 (1) (2003) 9931000.
[17] B.-S. Ko, R. Nishimura, Y. Suzuki, Time-spread echo method for digital audio watermarking using PN sequences, in: IEEE International Conference on
Acoustics, Speech, and Signal Processing (ICASSP02), vol. 2, 2002, pp. 20012004.
[18] M. Arnold, Audio watermarking: features, applications and algorithms, in: IEEE International Conference on Multimedia and Expo (ICME 2000), vol. 2,
2000, pp. 10131016.
1558 V. Bhat K et al. / Digital Signal Processing 20 (2010) 15471558

[19] E. Yavuz, Z. Telatar, Improved SVD-DWT based digital image watermarking against watermark ambiguity, in: Proceedings of the ACM Symposium on
Applied Computing, 2007, pp. 10511055.
[20] P. Bao, X. Ma, Image adaptive watermarking using wavelet domain singular value decomposition, IEEE Trans. Circuits Systems Video Technol. 15 (1)
(2005) 96102.
[21] B. Chen, G.W. Wornell, Quantization index modulation: A class of provably good methods for digital watermarking and information embedding, IEEE
Trans. Inform. Theory 47 (4) (2001) 14231443.
[22] N. Christian, H. Jurgen, Digital watermarking and its inuence on audio quality, in: Proceeding of the 105th AES Convention, 1998, preprint 4823.
[23] E. Ercelebi, L. Batakcil, Audio watermarking scheme based on embedding strategy in low frequency components with a binary image, Digital Signal
Process. 19 (2) (2009) 265277.
[24] N. Cvejic, T. Seppanen, Spread spectrum audio watermarking using frequency hopping and attack characterization, Signal Process. 84 (1) (2004) 207
213.
[25] W. Li, X. Xue, P. Lu, Localized audio watermarking technique robust against time-scale modication, IEEE Trans. Multimedia 8 (1) (2006) 6069.
[26] M.F. Mansour, A.H. Tewk, Data embedding in audio using time-scale modication, IEEE Trans. Speech Audio Process. 13 (3) (2005) 432440.
[27] S. Xiang, H.J. Kim, J. Huang, Audio watermarking robust against time-scale modication and MP3 compression, Signal Process. 88 (10) (2008) 2372
2387.
[28] T.F. Tilki, A.A. Beex, Encoding a hidden auxiliary channel onto a digital audio signal using psychoacoustic masking, in: Proceedings IEEE Southeastcon97,
Engineering New Century, 1997, pp. 331333.
[29] Y. Wang, A new watermarking method of digital audio content for copyright protection, in: Fourth International Conference on Signal Processing
Proceedings (ICSP98), vol. 2, 1998, pp. 14201423.
[30] M. Fan, H. Wang, Chaos-based discrete fractional Sine transform domain audio watermarking scheme, Comp. Electrical Eng. 35 (3) (2009) 506516.

Mr. Vivekananda Bhat K received the M.Sc. degree from Mangalore University, Karnataka, India, in 1995 and the M.Tech.
degree from the National Institute of Technology Karnataka, Surathkal, India, in 2003. He is currently working toward
the Ph.D. degree in the Department of Computer Science and Engineering of the Indian Institute of Technology, Kharagpur,
India. He has 4 publications in international journals and conferences. His research interests include digital watermarking
and cryptography. He can be contacted through email: kvivekbhat@gmail.com.

Dr. Indranil Sengupta obtained B.Tech., M.Tech., and Ph.D. degrees in Computer Science and Engineering from the
University of Calcutta, India, in 1983, 1985, and 1990, respectively. He joined the Indian Institute of Technology, Kharagpur,
as a faculty member in 1988, in the Department of Computer Science and Engineering, where he is presently
a Professor and the Head of the Department. He also heads the School of Information Technology of the Institute.
A Centre of Excellence in Information Assurance has been set up at IIT Kharagpur under his leadership, where a
number of security-related projects are presently being executed. He has over 24 years of teaching and research
experience, and over 100 publications to his credit in international journals and conferences. His research interests
include cryptography and network security, VLSI design and testing, mobile computing, and digital watermarking.

Dr. Abhijit Das received the Bachelor's degree in Electronics and Telecommunication Engineering from Jadavpur University,
Calcutta, India, in 1991, and the M.E. and Ph.D. degrees in Computer Science and Engineering from the Indian Institute of
Science, Bangalore, India, in 1993 and 2000, respectively. He is currently an Assistant Professor in the Department of
Computer Science and Engineering, Indian Institute of Technology (IIT) Kharagpur. Before joining IIT, he held academic
positions at the Indian Institute of Technology Kanpur and Ruhr-Universität Bochum, Germany. He has 12 publications
in international journals and conferences. His research interests include cryptography, number-theoretic and algebraic
computations, and digital watermarking.
