Published in IET Signal Processing
Received on 7th October 2013
Revised on 17th July 2014
Accepted on 26th August 2014
doi: 10.1049/iet-spr.2013.0399

ISSN 1751-9675

Wavelet-domain audio watermarking using optimal modification on low-frequency amplitude

Shuo-Tsung Chen1,2, Chih-Yu Hsu3, Huang-Nan Huang2
1 Sustainability Research Center, Tunghai University, Taichung 40704, Taiwan (R.O.C.)
2 Department of Applied Mathematics, Tunghai University, Taichung 40704, Taiwan (R.O.C.)
3 Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 413, Taiwan (R.O.C.)
E-mail: nhuang@thu.edu.tw

Abstract: On the basis of the Karush–Kuhn–Tucker (KKT) theorem, a novel digital audio watermarking scheme is proposed. To guarantee the robustness of the watermark, this scheme embeds information into the low-frequency coefficients of the audio's discrete wavelet transform (DWT). For the modification of the low-frequency amplitude, this study uses the KKT theorem to minimise the difference between the original and the watermarked coefficients. Accordingly, the embedding strength can be increased to maximise the robustness of the watermarked audio under sufficient embedding capacity and audio quality. In addition, the proposed watermarking scheme can extract the hidden data without knowledge of the original audio signal. Experimental results indicate that the performance of the proposed scheme is mostly better than that of other amplitude-modification methods.

1 Introduction

Digital watermarking is an efficient technique for copyright protection and information hiding. According to the application field, digital watermarking can be classified into three categories: digital image watermarking, digital video watermarking and digital audio watermarking.

For the copyright protection of music, many audio watermarking methods have been proposed recently. These methods can be grouped into two categories: time-domain [1–9] and frequency-domain [10–26]. In the time-domain, different embedding techniques [1–3] were first applied to digital watermarking. Lie and Chang [6] adopted low-frequency amplitude modification to enhance robustness. However, their method has an extremely low capacity, because three segments (length = 1020 points) are used to represent one bit. Xiang and Huang [7] proposed a histogram-based audio watermarking method that uses the invariant features of the audio histogram to achieve robustness against time scaling. Peng and Wang [9] proposed an optimal audio watermarking scheme using genetic optimisation with a variable-length mechanism. The genetic optimisation procedure automatically determines optimal embedding parameters for each audio frame, and the variable-length mechanism effectively searches for the most suitable embedding positions. However, the computing time is large, since the genetic algorithm is a search method and the obtained optimal solution is not verified.

In the frequency-domain, Huang et al. [10] embed the watermark into discrete cosine transform (DCT) coefficients and hide a Bark code in the time-domain as synchronisation codes. Since the time-domain has low embedding strength, the synchronisation codes are not robust enough; however, if the synchronisation codes are embedded in the DCT domain, the computation cost increases. Wu et al. [11] used a single-coefficient quantisation index modulation method to embed information into low-frequency coefficients of the discrete wavelet transform (DWT). Their method achieves robustness against common signal processing procedures and noise corruption, but it is very vulnerable to re-sampling, amplitude scaling and time scaling. Some authors improved this method by using multi-coefficient quantisation [12–14, 24, 25]; the multi-coefficient quantisation methods are still vulnerable to amplitude and time scaling. Based on audio statistics characteristics and a synchronisation code technique, Wang et al. [16, 17] gave a robust audio watermarking scheme in the wavelet domain. However, they only embed the synchronisation code by modifying individual sample values, which greatly reduces the resisting ability, so that their method is vulnerable to re-sampling, amplitude scaling and time scaling attacks. To improve these two methods, Wang et al. [23] proposed a robust digital audio watermarking scheme using wavelet moment invariance. Chen and Huang [18] and Chen et al. [19, 26] proposed a novel audio watermarking method that embeds information by an energy-proportion scheme. Their scheme can resist some attacks, especially amplitude scaling; however, the watermarked audio has low quality and is vulnerable to time scaling. To enhance both the signal-to-noise ratio (SNR) and the robustness of the single-coefficient quantisation method, Chen et al. [20] proposed a wavelet-domain audio watermarking scheme using optimisation-based quantisation, which is robust to common signal processing such as re-sampling, MP3 compression and amplitude scaling.

To achieve high robustness in the time-domain, authors usually use a large number of samples to represent one binary bit. However, such an embedding technique results in low capacity [6, 7]. This paper utilises low-frequency coefficients in the DWT to embed the watermark. To modify these low-frequency coefficients, this paper uses the Karush–Kuhn–Tucker (KKT) theorem to minimise the difference between the original and the watermarked coefficients in matrix form. Accordingly, the embedding strength can be increased to enhance robustness under sufficient audio quality and embedding capacity. In addition, the proposed watermarking scheme can extract the hidden data without the help of the original audio signal. Finally, the performance is evaluated by embedding capacity, SNR, perceptual evaluation of audio quality (PEAQ), mean opinion score and bit error rate (BER). Simulation results verify the efficiency of the proposed scheme.

The rest of this paper is organised as follows. An overview of the DWT and a mathematical clarification of the KKT theorem are presented in Section 2. Section 3 uses the KKT theorem to propose a novel scheme for watermark embedding and detection. Experiments testing the performance of the proposed watermarking scheme are reported in Section 4. Conclusions are finally drawn in Section 5.

2 Mathematical clarifications

To develop an optimisation-based scheme for audio watermarking, the DWT and the KKT theorem are reviewed in this section for later use.

2.1 DWT

Wavelets are obtained from a single prototype function ψ(x), which is regulated by the scaling and shift parameters. The discrete normalised scaling and wavelet basis functions are defined as

φ_{j,k}(t) = 2^{j/2} φ(2^j t − k)    (1)

ψ_{j,k}(t) = 2^{j/2} ψ(2^j t − k)    (2)

where j and k refer to the dilation and translation parameters. The two families of subspaces

V_j = span{φ_{j,k} : k ∈ Z}  and  W_j = span{ψ_{j,k} : k ∈ Z}

are constructed so that the subspaces

{0} ⊂ ··· ⊂ V_1 ⊂ V_0 ⊂ V_{−1} ⊂ ··· ⊂ L²(R)

form a multi-resolution analysis of L²(R), and the subspaces …, W_1, W_0, W_{−1}, … stand for the orthogonal differences of the V_k above. The orthogonality relations imply the existence of sequences h = {h_k}_{k∈Z} and g = {g_k}_{k∈Z} that satisfy the following identities

h_k = ⟨φ_{0,0}, φ_{−1,k}⟩  and  φ(t) = √2 Σ_{k∈Z} h_k φ(2t − k)

g_k = ⟨ψ_{0,0}, φ_{−1,k}⟩  and  ψ(t) = √2 Σ_{k∈Z} g_k φ(2t − k)

where h = {h_k}_{k∈Z} and g = {g_k}_{k∈Z} denote the low-pass and high-pass filters, respectively. Throughout this paper, the host digital audio signal S(n), n ∈ N, denotes the samples of the original audio signal S(t) at the nth sample time. This paper adopts orthogonal Haar wavelet bases to implement the DWT on the host digital audio signal S(n) through the filter bank [27].
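As a concrete illustration of this decomposition step, the following minimal sketch obtains the level-7 Haar approximation (lowest-frequency) coefficients of one audio segment. It is not the authors' code: the use of the PyWavelets package, the array name and the helper name are assumptions.

```python
# Illustrative sketch (assumed implementation, not from the paper):
# level-7 Haar DWT of one audio segment, keeping the approximation sub-band.
import numpy as np
import pywt

def lowest_frequency_coeffs(segment: np.ndarray, level: int = 7) -> np.ndarray:
    """Return the lowest-frequency (approximation) coefficients c_i of a segment."""
    coeffs = pywt.wavedec(segment, wavelet='haar', level=level)
    return coeffs[0]  # coeffs[0] is the level-`level` approximation sub-band
```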
2.2 Optimisation solver

To find the extremum of a matrix function, an optimisation solver based on the KKT theorem is summarised here [28, 29]. Suppose that C_N = [c_1, c_2, …, c_N]^T is an N × 1 matrix (vector). The problem of minimising a performance index f(C_N) subject to the constraints g_i(C_N) ≤ 0, i = 1, 2, …, m, can be described as

minimise  f(C_N)    (3a)

subject to  g_i(C_N) ≤ 0,  i = 1, 2, …, m    (3b)

To solve (3), the KKT theorem is introduced as follows.

KKT theorem: Let f and g_i, i = 1, 2, …, m, be convex and continuously differentiable functions, and suppose there exists a point C_N^0 such that g_i(C_N^0) < 0, i = 1, 2, …, m. Then C_N^* is a global optimal solution of minimising f(C_N) subject to the constraints g_i(C_N) ≤ 0, i = 1, 2, …, m, if and only if there exists an m × 1 vector λ = [λ_1, λ_2, …, λ_m]^T such that

(1) ∇f(C_N^*) + Σ_{i=1}^{m} λ_i ∇g_i(C_N^*) = 0,
(2) g_i(C_N^*) ≤ 0, i = 1, 2, …, m,
(3) λ_i ≥ 0, i = 1, 2, …, m, and
(4) Σ_{i=1}^{m} λ_i g_i(C_N^*) = 0.

In the above theorem, we refer to λ as the KKT multiplier vector and to conditions (1)–(4) as the KKT conditions. In particular, when m = 1 the KKT multiplier λ is a scalar and the KKT conditions become

∇f(C_N^*) + λ∇g(C_N^*) = 0,  g(C_N^*) ≤ 0,  λ ≥ 0,  λg(C_N^*) = 0

Hence the global solution C_N^* can be calculated either by (corresponding to the case λ = 0)

∇f(C_N^*) = 0  with  g(C_N^*) ≤ 0    (4)

or by (corresponding to the case λ > 0)

∇f(C_N^*) + λ∇g(C_N^*) = 0  with  g(C_N^*) = 0    (5)

Fig. 1 Structure of synchronisation codes and watermarks

If we let

H(C_N, λ) = f(C_N) + λg(C_N)    (6)

then solving (5) reduces to minimising the function H(C_N, λ) without any constraint, which leads to the optimal solution computed by

∂H/∂λ = 0  and  ∂H/∂C_N = 0    (7)
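To make the two-case procedure in (4)–(7) concrete, the following minimal sketch (illustrative names x0, a and b; not from the paper) minimises ||x − x0||² subject to a·x ≥ b, which is the same single-constraint structure used later in Section 3.

```python
# Minimal KKT sketch for one constraint: minimise f(x) = ||x - x0||^2
# subject to a.x >= b, i.e. g(x) = b - a.x <= 0.  Names are illustrative.
import numpy as np

def kkt_min_distance(x0: np.ndarray, a: np.ndarray, b: float) -> np.ndarray:
    if a @ x0 >= b:                      # case lambda = 0: constraint inactive, eq. (4)
        return x0.copy()                 # unconstrained minimiser is already feasible
    # case lambda > 0: constraint active, eq. (5); stationarity of the Lagrangian (6)-(7)
    lam = 2.0 * (b - a @ x0) / (a @ a)   # from 2(x - x0) - lam*a = 0 and a.x = b
    return x0 + 0.5 * lam * a

x_star = kkt_min_distance(np.array([1.0, 2.0]), np.array([1.0, 1.0]), 5.0)
print(x_star)  # -> [2. 3.], which satisfies a.x = 5
```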
3 Proposed watermarking scheme

To achieve high robustness in the time-domain, the method of [11] uses a large number of samples to represent one binary bit. However, such an embedding technique results in low capacity. To guarantee the robustness of the watermark under high capacity, this paper embeds information into the low-frequency coefficients of the DWT. Generally, the quality of a watermarked audio is measured by the SNR. To obtain the best quality under the embedding constraint, this section rewrites the SNR as a performance index. The problem of modifying consecutive coefficients is then converted into an optimisation problem. Finally, the KKT theorem is applied to solve this problem, and thus an optimisation-based watermarking formula is obtained.

3.1 Adaptive threshold and embedding process

Since the watermarked audio may suffer shifting or cropping attacks, it is necessary to embed synchronisation codes together with the watermark. These synchronisation codes are used to locate the positions where the watermark is embedded; the structure is shown in Fig. 1. Before embedding, the synchronisation codes and the watermark are arranged into a binary sequence ξ = {ξ_i | ξ_i = 1 or 0}. To determine the adaptive thresholds for embedding and extracting the binary sequence, the original audio signal of length L is first cut into I segments and an H-level DWT is performed on each segment. Accordingly, the total number of lowest-frequency coefficients in each segment is K = L/(I·2^H) and the mean of these coefficients is

μ = (1/K) Σ_{i=1}^{K} |c_i|    (8)

The two thresholds are computed adaptively as

η_1 = μ + ε    (9)

and

η_0 = μ − ε    (10)

Fig. 2 Watermark embedding procedure

where ε > 0 is the embedding strength. Moreover, the parameters μ and ε for each segment are used as additional keys for watermark embedding and extraction.

In the proposed embedding process, the DWT lowest-frequency coefficients of each audio segment are partitioned into several groups, and each group contains N consecutive coefficients. Let

C_N = [ |c_1| |c_2| ··· |c_N| ]^T    (11)

Then, a binary bit of value 1 or 0 is embedded by the following rules

† AĈ_N ≥ η_1, if ξ_i = 1    (12)

† AĈ_N ≤ η_0, if ξ_i = 0    (13)

where Ĉ_N is the watermarked wavelet coefficient vector corresponding to C_N and A = [a_1, …, a_N] is the scaling vector, whose entries are positive and can be arbitrarily assigned by an encoder. To avoid the situation in which the value of some entries becomes arbitrarily large, without loss of generality we set the summation of all the scaling factors equal to a constant N. For example, A = [0.9 1.2 1.2 0.7] is a suitable selection when N = 4. With a suitable scaling vector A, the optimisation problem for watermarking is to select the vector Ĉ_N such that the SNR is maximised under constraint (12) or (13). The solution for the vector Ĉ_N is derived optimally in the next section.
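A small sketch of how the thresholds (8)–(10) and the rules (12)–(13) might be realised in code is given below; the array names, helper names and the use of numpy are assumptions rather than the authors' implementation.

```python
# Illustrative sketch (assumed names, not from the paper) of the adaptive
# thresholds (8)-(10) and the embedding rules (12)-(13).
import numpy as np

def adaptive_thresholds(approx_coeffs: np.ndarray, eps: float):
    """Return mu, eta1, eta0 for one segment's lowest-frequency coefficients."""
    mu = np.mean(np.abs(approx_coeffs))       # eq. (8)
    return mu, mu + eps, mu - eps             # eqs. (9) and (10)

def bit_satisfied(A: np.ndarray, C_hat: np.ndarray, bit: int,
                  eta1: float, eta0: float) -> bool:
    """Check whether a group of N coefficients already carries the given bit."""
    value = A @ np.abs(C_hat)                 # A * C_hat with absolute values, as in (11)
    return value >= eta1 if bit == 1 else value <= eta0   # rules (12)/(13)
```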
3.2 Optimisation-based watermarking formula

In general, the quality of the watermarked audio is evaluated by the SNR, which is defined as

SNR = 10 log_10 ( ||S(n)||_2^2 / ||Ŝ(n) − S(n)||_2^2 )    (14)

Fig. 3 Watermark extraction procedure

or

SNR = −10 log_10 ( ||Ŝ(n) − S(n)||_2^2 / ||S(n)||_2^2 )    (15)

where S(n) and Ŝ(n) denote the original and the modified audio signals. Since we use an orthogonal wavelet basis, another expression of the SNR in (15) is

SNR = −10 log_10 ( ||Ĉ_N − C_N||_2^2 / ||C_N||_2^2 )    (16)

For optimisation, we take from (16) the performance index

||Ĉ_N − C_N||_2^2 / ||C_N||_2^2    (17)

or, equivalently,

(Ĉ_N − C_N)^T (Ĉ_N − C_N) / (C_N^T C_N)    (18)
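For reference, the SNR of (15)/(16) can be computed directly; the following helper is an illustrative sketch with assumed array names, not part of the paper.

```python
# Illustrative SNR computation for eq. (15)/(16); array names are assumptions.
import numpy as np

def snr_db(original: np.ndarray, modified: np.ndarray) -> float:
    """SNR in dB between an original and a modified signal or coefficient vector."""
    noise = modified - original
    return -10.0 * np.log10(np.sum(noise ** 2) / np.sum(original ** 2))
```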

Assuming we are embedding a binary bit of value 1, combining (17) with (12) gives the corresponding optimisation problem

minimise  (Ĉ_N − C_N)^T (Ĉ_N − C_N) / (C_N^T C_N)    (19a)

subject to  AĈ_N ≥ η_1    (19b)

Fig. 4 Relationship between SNR and the scaling factor a2 for popular

Fig. 5 Relationship between SNR and the scaling factor a2 for piano

Fig. 6 Relationship between SNR and the scaling factor a2 for singing

Fig. 7 Relationship between SNR and the scaling factor a2 for symphony

Table 1 Subjective difference grades

SDG    Quality      Impairment
 0.0   excellent    imperceptible
−1.0   good         perceptible but not annoying
−2.0   fair         slightly annoying
−3.0   poor         annoying
−4.0   bad          very annoying

Table 2 Embedding domain, embedding capacity, SNR and PEAQ (entries are mean|variance per music type)

reference paper [6]: embedding domain time-domain low frequency; capacity 500 bits/11.6 s
  average SNR, dB: 22.1|2.62 (popular), 15.7|1.43 (piano), 17.4|1.42 (singing), 20.3|2.53 (symphony)
  average PEAQ: −1.03|0.66 (popular), −2.53|0.56 (piano), −1.68|0.52 (singing), −1.22|0.62 (symphony)
reference paper [19]: embedding domain DWT 7-level; capacity 1333 bits/11.6 s
  average SNR, dB: 20.5|2.83 (popular), 17.1|2.27 (piano), 17.6|2.14 (singing), 17.2|2.68 (symphony)
  average PEAQ: −2.26|2.77 (popular), −2.47|1.78 (piano), −2.15|0.73 (singing), −2.04|1.89 (symphony)
proposed method (N = 2): embedding domain DWT 7-level; capacity 2000 bits/11.6 s
  average SNR, dB: 24.1|0.13 (popular), 20.3|0.08 (piano), 21.8|0.04 (singing), 20.2|0.06 (symphony)
  average PEAQ: −1.01|0.24 (popular), −1.56|0.22 (piano), −1.02|0.18 (singing), −1.26|0.13 (symphony)
proposed method (N = 4): embedding domain DWT 7-level; capacity 1000 bits/11.6 s
  average SNR, dB: 23.5|0.12 (popular), 19.8|0.06 (piano), 21.3|0.04 (singing), 20.2|0.04 (symphony)
  average PEAQ: −1.03|0.09 (popular), −1.13|0.06 (piano), −1.06|0.06 (singing), −1.22|0.08 (symphony)

whose optimal solution can be computed via the KKT theorem with m = 1. Thus, there are two possible solutions, obtained by using (4) or (7). From (4), one optimal solution is given by Ĉ_N^* = C_N provided that AĈ_N^* = AC_N ≥ η_1; that is, no coefficient adjustment is needed to embed the binary bit of value '1'. If the condition AC_N ≥ η_1 does not hold, then we seek another solution via (7). Let λ denote the multiplier used to combine (19a) and (19b) into the unconstrained matrix function

H(Ĉ_N, λ) = (Ĉ_N − C_N)^T (Ĉ_N − C_N) / (C_N^T C_N) + λ(AĈ_N − η_1)    (20)

Since C_N^T C_N is a constant, H(Ĉ_N, λ) can be rescaled and rewritten as

H(Ĉ_N, λ) = (Ĉ_N − C_N)^T (Ĉ_N − C_N) + λ C_N^T C_N (AĈ_N − η_1)    (21)

The necessary conditions for the existence of the minimum of H(Ĉ_N, λ) are

∂H/∂Ĉ_N = 2(Ĉ_N − C_N) + λ A^T C_N^T C_N = 0    (22)

∂H/∂λ = AĈ_N − η_1 = 0    (23)

Table 3 Re-sampling
Music types Re-sampling rate, Hz 96 000 22 050 11 025 8000 6000

popular reference paper [6] BER, % mean 0.28 0.32 0.34 0.41 0.29
variance 0.02 0.03 0.02 0.06 0.08
reference paper [19] BER, % mean 4.22 3.12 4.52 5.13 5.61
variance 0.12 0.04 0.22 0.62 0.44
proposed method (N = 2) BER, % mean 0.04 0 0 0.08 0.21
variance 0.01 0 0 0 0.03
proposed method (N = 4) BER, % mean 0.03 0 0 0 0.06
variance 0 0 0 0 0
piano reference paper [6] BER, % mean 0.80 0.31 0.92 1.62 2.13
variance 0.06 0.04 0.04 0.06 0.04
reference paper [19] BER, % mean 5.26 5.26 6.81 7.43 7.26
variance 0.22 0.12 0.38 0.46 0.42
proposed method (N = 2) BER, % mean 0.29 0.23 0.65 1.14 1.68
variance 0.04 0.01 0.02 0.04 0.04
proposed method (N = 4) BER, % mean 0.12 0 0 0.15 0.42
variance 0.01 0 0 0.02 0.02
singing reference paper [6] BER, % mean 0.22 0.28 0.30 0.54 0.56
variance 0.01 0.02 0.02 0.04 0.04
reference paper [19] BER, % mean 6.93 6.92 11.3 12.3 12.5
variance 0.14 0.06 0.06 0.24 0.32
proposed method (N = 2) BER, % mean 0.02 0 0 0.12 0.02
variance 0 0 0 0.02 0.01
proposed method (N = 4) BER, % mean 0 0 0 0 0
variance 0 0 0 0 0
symphony reference paper [6] BER, % mean 0.12 0.28 0.31 0.48 0.50
variance 0.03 0.01 0.01 0.02 0.01
reference paper [19] BER, % mean 5.74 4.81 9.92 9.98 10.7
variance 0.16 0.32 0.29 0.58 0.62
proposed method (N = 2) BER, % mean 0.12 0.14 0.42 0.44 0.46
variance 0.04 0.01 0.02 0.02 0.02
proposed method (N = 4) BER, % mean 0.04 0 0.05 0.07 0.13
variance 0 0 0 0.01 0

Multiplying (22) by A, we observe that

2(AĈ_N − AC_N) + λ A A^T C_N^T C_N = 0    (24)

Since

AĈ_N = η_1    (25)

from (23), and C_N^T C_N is a constant, (24) can be rewritten as

2(η_1 − AC_N) + λ C_N^T C_N A A^T = 0    (26)

Hence the optimal solution for the parameter λ is

λ^* = (2 / (C_N^T C_N)) (A A^T)^{-1} (AC_N − η_1)    (27)

Substituting (27) into (22), the optimal solution for the modified coefficients is

Ĉ_N^* = C_N − A^T (A A^T)^{-1} (AC_N − η_1)    (28)

where the superscript * denotes the optimal result with respect to the corresponding variable. On the basis of (28), the binary bit '1' is embedded by the optimally modified coefficients Ĉ_N^*. Analogously, the binary bit '0' is embedded by using η_0 instead of η_1

Ĉ_N^* = C_N − A^T (A A^T)^{-1} (AC_N − η_0)

Fig. 2 shows the proposed embedding procedure.
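The closed form (28) is simple to apply per group. The sketch below is an assumed implementation (the function name, the sign handling and the use of numpy are not specified in the paper); note that (A A^T)^{-1} reduces to the scalar 1/(A·A) because A is a 1 × N row vector.

```python
# Illustrative sketch of the optimal modification (28) for one group of N coefficients.
# A is the 1xN scaling vector; sign restoration of the coefficients is an assumption.
import numpy as np

def embed_bit(coeffs: np.ndarray, A: np.ndarray, bit: int,
              eta1: float, eta0: float) -> np.ndarray:
    C = np.abs(coeffs)                       # C_N in eq. (11)
    target = eta1 if bit == 1 else eta0
    if (bit == 1 and A @ C >= eta1) or (bit == 0 and A @ C <= eta0):
        return coeffs.copy()                 # rule (12)/(13) already holds, no change needed
    C_star = C - A * (A @ C - target) / (A @ A)   # eq. (28); (A A^T)^{-1} = 1/(A.A)
    return np.sign(coeffs) * C_star          # restore the original coefficient signs
```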
3.3 Extraction process

When extracting the hidden information, we split the test audio into segments and then perform the DWT on each segment in the same manner as in the embedding process. Let Ĉ_N^* be the optimally watermarked coefficient vector, which contains the N consecutive absolute values ĉ_i^*, i = 1, …, N. We extract the binary sequence ξ̃ using the following rules:

† If AĈ_N^* ≥ η_1, the extracted bit ξ̃_i = 1.
† If AĈ_N^* ≤ η_0, the extracted bit ξ̃_i = 0.

Here η_0 and η_1 are obtained from (9) and (10) with the two stored keys μ and ε for each audio segment. After finding the synchronisation code, we continue to apply the above extraction rules to detect the watermark. The extraction procedure is shown in Fig. 3.
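A corresponding extraction sketch under the same assumptions is shown below; the behaviour when AĈ_N^* falls strictly between the two thresholds is not specified in the paper and is left here as a policy choice.

```python
# Illustrative sketch (assumed names) of the blind extraction rules in Section 3.3.
import numpy as np

def extract_bit(coeffs: np.ndarray, A: np.ndarray, eta1: float, eta0: float):
    """Return 1, 0, or None (undecided) for one group of watermarked coefficients."""
    value = A @ np.abs(coeffs)
    if value >= eta1:
        return 1
    if value <= eta0:
        return 0
    return None  # between the thresholds: handle according to the detector's policy
```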
Table 4 MP3 compression (BER, %)

Bit rate, kbps: 128, 112, 96, 80
popular: reference paper [6]: mean 0.25, 0.31, 1.56, 4.12; variance 0.01, 0.02, 0.02, 0.04
popular: reference paper [19]: mean 0.38, 1.68, 2.46, 5.21; variance 0.04, 0.12, 0.12, 0.34
popular: proposed method (N = 2): mean 0.06, 0.28, 1.74, 3.12; variance 0, 0.02, 0.04, 0.03
popular: proposed method (N = 4): mean 0.04, 0.24, 1.39, 2.86; variance 0, 0.01, 0.01, 0.02
piano: reference paper [6]: mean 0.28, 0.43, 2.92, 4.33; variance 0.02, 0.02, 0.04, 0.12
piano: reference paper [19]: mean 0.45, 1.92, 2.88, 5.43; variance 0.04, 0.22, 0.24, 0.21
piano: proposed method (N = 2): mean 0.08, 0.42, 1.92, 3.56; variance 0.02, 0.04, 0.02, 0.02
piano: proposed method (N = 4): mean 0.07, 0.31, 1.86, 3.32; variance 0, 0.01, 0.02, 0.01
singing: reference paper [6]: mean 0.08, 0.11, 2.01, 2.88; variance 0, 0.02, 0.02, 0.04
singing: reference paper [19]: mean 0.32, 0.98, 1.42, 2.24; variance 0.04, 0.04, 0.05, 0.08
singing: proposed method (N = 2): mean 0.06, 0.18, 1.49, 2.11; variance 0, 0.01, 0.02, 0.02
singing: proposed method (N = 4): mean 0.03, 0.12, 1.18, 2.36; variance 0, 0.01, 0.02, 0.02
symphony: reference paper [6]: mean 0.22, 0.28, 0.96, 1.12; variance 0.01, 0.03, 0.04, 0.04
symphony: reference paper [19]: mean 0.37, 2.37, 3.32, 5.79; variance 0.02, 0.26, 0.54, 0.52
symphony: proposed method (N = 2): mean 0.08, 0.24, 0.58, 0.58; variance 0.01, 0.01, 0.02, 0.01
symphony: proposed method (N = 4): mean 0.01, 0.10, 0.12, 0.26; variance 0, 0.01, 0.02, 0.02

Table 5 Low-pass filter (BER, %)

Cut-off frequency, kHz: 3, 5
popular: reference paper [6]: mean 17.4, 16.8; variance 0.02, 0.02
popular: reference paper [19]: mean 25.1, 24.3; variance 0.64, 0.52
popular: proposed method (N = 2): mean 5.96, 2.26; variance 0.01, 0.02
popular: proposed method (N = 4): mean 6.96, 2.32; variance 0.01, 0.01
piano: reference paper [6]: mean 19.9, 19.3; variance 0.03, 0.02
piano: reference paper [19]: mean 24.6, 24.2; variance 0.44, 0.46
piano: proposed method (N = 2): mean 6.95, 1.78; variance 0.02, 0.02
piano: proposed method (N = 4): mean 6.84, 1.98; variance 0.02, 0.01
singing: reference paper [6]: mean 18.2, 17.3; variance 0.01, 0.02
singing: reference paper [19]: mean 31.9, 29.8; variance 0.46, 0.44
singing: proposed method (N = 2): mean 5.12, 1.98; variance 0.01, 0.01
singing: proposed method (N = 4): mean 5.06, 2.04; variance 0.02, 0.01
symphony: reference paper [6]: mean 20.3, 20.1; variance 0.02, 0.04
symphony: reference paper [19]: mean 28.2, 27.5; variance 0.32, 0.31
symphony: proposed method (N = 2): mean 7.16, 2.04; variance 0.01, 0.01
symphony: proposed method (N = 4): mean 7.34, 2.28; variance 0.01, 0.01

3.4 Relationship between SNR and scaling factors

To understand the effect of adjusting the scaling factors on the SNR, we consider the case N = 4 with the scaling vector A = [a1 a2 1 1], that is, a1 + a2 = 2. Figs. 4–7 show the relationship between the SNR and the scaling factor a2 for the four audio types. For popular and piano, as the scaling factor a2 decreases from one, the SNR still remains at about 21 and 15 dB, respectively. This indicates that the variance of the scaling factors has no obvious effect on the SNR. Hence, we randomly set the scaling vector to A = [1.4 0.6 1.6 0.4] in the experiments in the next section.

4 Experimental results

This section applies the proposed watermarking scheme to four types of 16-bit mono audio, namely popular, piano, singing and symphony music, sampled at 44 100 Hz. Each type has 12 audio files, and the length of each audio is about 11.6 s. In the embedding process, the embedding strength ε is set to 2000 and 4000 for N = 2 and N = 4, respectively. The performance of the proposed watermarking scheme is evaluated by embedding capacity, SNR, PEAQ and BER.

4.1 Embedding domain and capacity

Before embedding the watermark, each audio is cut into four segments. By applying a 7-level DWT to each segment of the original audio, we embed the synchronisation code and watermark into the lowest-frequency sub-band. Since the watermark is embedded at DWT level seven, the embedding capacity of each segment is 44 100 × 11.609 ÷ 4 ÷ 2^7 ÷ 2 ≈ 500 bits for N = 2 and 44 100 × 11.609 ÷ 4 ÷ 2^7 ÷ 4 ≈ 250 bits for N = 4. Accordingly, each audio's embedding capacity is 2000 and 1000 bits for N = 2 and N = 4, respectively.
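The quoted capacity figures follow directly from this arithmetic; the short script below (illustrative only) reproduces them.

```python
# Quick check of the capacity figures quoted in Section 4.1 (illustrative).
fs, duration, segments, level = 44_100, 11.609, 4, 7
coeffs_per_segment = fs * duration / segments / 2 ** level   # ~1000 level-7 coefficients
for N in (2, 4):
    bits_per_segment = coeffs_per_segment / N                # one bit per group of N coefficients
    print(N, round(bits_per_segment), round(bits_per_segment * segments))
# -> N = 2: ~500 bits/segment, ~2000 bits per audio; N = 4: ~250 and ~1000
```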
4.2 SNR and PEAQ

The watermarked audio quality is evaluated mathematically by the SNR defined in (15) or (16). To evaluate the watermarked audio quality perceptually, we performed an objective listening test using PEAQ, a standardised algorithm for objectively measuring perceived audio quality [30–32]. The analysis of the algorithm results is based on the subjective difference grade (SDG). The output values of the algorithm are mapped to the SDG by an artificial neural network with one hidden layer. The algorithm compares the audio signal under test with the original reference audio signal. The SDG indicates the measured basic audio quality of the signal under test on a continuous scale from −4 (very annoying impairment) to 0 (imperceptible impairment). The meaning of each score in the SDG is listed in Table 1. The results are listed in Table 2. We can see that the SNR of our method is mostly above 20 dB under high embedding capacity. On the basis of the sufficient quality of the watermarked audio, the proposed optimisation-based scheme has higher embedding capacity than the methods proposed by Lie and Chang [6] and Chen et al. [19].

4.3 Robustness tests

Various attacks were used to test the robustness, classified into seven types: (i) re-sampling, (ii) MP3 compression, (iii) low-pass filtering, (iv) amplitude scaling, (v) time scaling,

Table 6 Amplitude scaling


Amplification, % 0.8 0.9 1.1 1.2

popular reference paper [6] BER, % mean 0.54 0.54 0.42 0.58
variance 0.02 0.01 0.03 0.03
reference paper [19] BER, % mean 0.45 0.43 0.34 0.38
variance 0.01 0 0 0
proposed method (N = 2) BER, % mean 7.98 6.17 2.88 1.96
variance 0.12 0.04 0.04 0.03
proposed method (N = 4) BER, % mean 29.6 25.4 0.36 0.36
variance 0.21 0.28 0.02 0.01
piano reference paper [6] BER, % mean 0.79 0.79 0.63 0.65
variance 0.03 0.04 0.04 0.04
reference paper [19] BER, % mean 0.43 0.35 0.32 0.38
variance 0.01 0 0 0
proposed method (N = 2) BER, % mean 15.8 12.6 3.22 2.52
variance 0.33 0.22 0.16 0.06
proposed method (N = 4) BER, % mean 34.7 29.4 0.43 0.45
variance 0.22 0.24 0.12 0.08
singing reference paper [6] BER, % mean 0.62 0.64 0.54 0.55
variance 0.01 0.01 0 0
reference paper [19] BER, % mean 0.07 0.07 0.07 0.07
variance 0 0 0 0
proposed method (N = 2) BER, % mean 8.22 6.33 3.11 2.09
variance 0.22 0.08 0.12 0.08
proposed method (N = 4) BER, % mean 29.6 23.2 0.18 0.18
variance 0.36 0.29 0.05 0.04
symphony reference paper [6] BER, % mean 0.72 0.71 0.58 0.55
variance 0.13 0.02 0.01 0.01
reference paper [19] BER, % mean 0.41 0.32 0.24 0.33
variance 0.01 0 0 0.01
proposed method (N = 2) BER, % mean 16.1 14.3 2.92 2.77
variance 0.24 0.31 0.26 0.22
proposed method (N = 4) BER, % mean 31.2 27.5 0.44 0.58
variance 0.44 0.46 0.02 0.02

(vi) echo and (vii) Gaussian noise. The robustness is measured by the BER, defined as

BER = (B_error / B_total) × 100%

where B_error and B_total denote the number of error bits and the total number of bits, respectively.
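The BER of this definition can be computed as follows (illustrative helper, array names assumed).

```python
# Illustrative BER computation matching the definition above.
import numpy as np

def bit_error_rate(embedded_bits: np.ndarray, extracted_bits: np.ndarray) -> float:
    """BER in percent between the embedded and the extracted binary sequences."""
    errors = np.count_nonzero(embedded_bits != extracted_bits)
    return 100.0 * errors / embedded_bits.size
```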
Since we adopt 12 different songs in each category (piano, popular, symphony and singing), the robustness results are reported as their mean and variance. We also compare the proposed scheme with the other two approaches by Lie and Chang [6] and Chen et al. [19]. The results of testing these seven attacks are as follows:

(1) Re-sampling: The sampling rate of the watermarked audio was down-sampled from 44 100 Hz to 22 050, 11 025, 8000 and 6000 Hz and then restored to 44 100 Hz using interpolation. Similarly, the sampling rate of the watermarked audio was up-sampled from 44 100 to 96 000 Hz and then back to 44 100 Hz. Table 3 shows the results of these re-sampling processes for the three approaches, which indicate that the proposed scheme is more robust than the methods proposed by Lie and Chang [6] and Chen et al. [19].
(2) MP3 compression: MP3 compression is the most commonly adopted audio compression method. Table 4 presents the results of applying MP3 compression at different bit rates to the watermarked audio. The proposed method was found to have better robustness against the MP3 compression attack than the methods proposed by Lie and Chang [6] and Chen et al. [19].
(3) Low-pass filtering: Table 5 shows the effect of using a low-pass filter with cut-off frequencies of 3 and 5 kHz. The proposed method has higher robustness against the low-pass filter attack than the methods proposed by Lie and Chang [6] and Chen et al. [19].
(4) Amplitude scaling: An amplitude scaling attack was performed on the watermarked audio with scaling factors 0.8, 0.9, 1.1 and 1.2. Table 6 shows the experimental results of this attack. The method by Lie and Chang [6] uses an embedding technique that modifies the audio samples or DWT coefficients by the same scale within a section of a group; hence, their method is more robust against amplitude scaling attacks than ours. In addition, the method proposed by Chen et al. [19] has high robustness against amplitude scaling attacks.
(5) Time scaling: The watermarked audios were scaled by −5, −2, 2 and 5%. Table 7 presents the results of this scaling. The proposed method was found to have robustness similar to the methods proposed by Lie and Chang [6] and Chen et al. [19].
(6) Echo: An echo was added to the watermarked audios with delay 0.5 ms, volume −6 dB and feedback −60 dB. The proposed method has a BER of about 10–15%, which is much better than the methods proposed by Lie and Chang [6] and Chen et al. [19]. Table 8 shows the experimental results of this attack.
(7) Gaussian noise: Table 9 lists the experimental results of adding Gaussian noise to the audio signal. Our method has better performance than the methods proposed by Lie and Chang [6] and Chen et al. [19].

The results in Tables 3–9 also indicate that the proposed watermarking method is robust against most attacks under higher embedding capacity.

Table 7 Time scaling (BER, %)

Time scaling amount, %: −5, −2, 2, 5
popular: reference paper [6]: mean 38.9, 40.3, 38.7, 38.8; variance 0.24, 0.23, 0.05, 0.04
popular: reference paper [19]: mean 41.4, 41.1, 40.3, 41.4; variance 0.08, 0.04, 0.06, 0.05
popular: proposed method (N = 2): mean 41.9, 40.2, 39.4, 43.4; variance 0.12, 0.03, 0.02, 0.04
popular: proposed method (N = 4): mean 40.7, 40.1, 39.2, 41.4; variance 0.06, 0.03, 0.02, 0.04
piano: reference paper [6]: mean 43.9, 43.5, 45.2, 45.8; variance 0.24, 0.22, 0.03, 0.13
piano: reference paper [19]: mean 46.2, 45.1, 44.2, 46.3; variance 0.32, 0.21, 0.13, 0.23
piano: proposed method (N = 2): mean 42.8, 43.2, 42.4, 44.7; variance 0.14, 0.05, 0.06, 0.07
piano: proposed method (N = 4): mean 42.1, 42.1, 42.3, 42.4; variance 0.11, 0.03, 0.03, 0.06
singing: reference paper [6]: mean 41.6, 40.1, 39.2, 39.8; variance 0.05, 0.04, 0.02, 0.03
singing: reference paper [19]: mean 45.6, 43.7, 42.5, 43.1; variance 0.22, 0.02, 0.04, 0.29
singing: proposed method (N = 2): mean 42.8, 41.2, 39.9, 44.5; variance 0.02, 0.01, 0.01, 0.02
singing: proposed method (N = 4): mean 41.4, 40.1, 39.4, 41.5; variance 0.01, 0.01, 0.01, 0.02
symphony: reference paper [6]: mean 42.9, 42.3, 40.6, 41.9; variance 0.21, 0.12, 0.05, 0.07
symphony: reference paper [19]: mean 42.4, 40.3, 40.2, 41.3; variance 0.43, 0.32, 0.21, 0.29
symphony: proposed method (N = 2): mean 43.4, 42.4, 40.6, 42.1; variance 0.02, 0.01, 0.01, 0.02
symphony: proposed method (N = 4): mean 42.7, 41.1, 39.7, 40.4; variance 0.02, 0.01, 0.01, 0.01

Table 8 Echo (BER, %)

Audio type: Popular, Piano, Singing, Symphony
reference paper [6]: mean 59.3, 61.1, 54.2, 56.8; variance 0.78, 0.75, 0.56, 0.63
reference paper [19]: mean 32.4, 32.7, 33.9, 30.4; variance 0.33, 0.29, 0.35, 0.32
proposed method (N = 2): mean 13.9, 14.9, 12.8, 14.6; variance 0.26, 0.04, 0.02, 0.23
proposed method (N = 4): mean 10.6, 12.4, 10.3, 12.8; variance 0.17, 0.05, 0.08, 0.16

Table 9 Gaussian noise
Amplitude of noise, dB −40 dB −30 dB −20 dB −15 dB

popular reference paper [6] BER, % mean 9.32 16.2 38.8 41.4
variance 0.22 0.43 0.46 0.53
reference paper [19] BER, % mean 4.95 8.16 33.4 37.4
variance 0.21 0.34 0.56 0.63
proposed method (N = 2) BER, % mean 0 0 26.4 28.1
variance 0 0 0.24 0.21
proposed method (N = 4) BER, % mean 0 0 23.5 24.2
variance 0 0 0.18 0.22
piano reference paper [6] BER, % mean 8.34 15.6 39.2 42.5
variance 0.12 0.34 0.34 0.32
reference paper [19] BER, % mean 4.11 7.23 31.8 34.2
variance 0.12 0.14 0.23 0.36
proposed method (N = 2) BER, % mean 0 0 24.5 25.2
variance 0 0 0.22 0.24
proposed method (N = 4) BER, % mean 0 0 21.4 23.6
variance 0 0 0.11 0.22
singing reference paper [6] BER, % mean 9.74 18.3 38.5 42.1
variance 0.24 0.24 0.36 0.34
reference paper [19] BER, % mean 3.26 6.83 29.1 31.6
variance 0.16 0.32 0.44 0.54
proposed method (N = 2) BER, % mean 0 0 20.9 21.3
variance 0 0 0.31 0.32
proposed method (N = 4) BER, % mean 0 0 20.4 21.2
variance 0 0 0.24 0.21
symphony reference paper [6] BER, % mean 11.3 19.4 40.1 42.8
variance 0.56 0.45 0.52 0.51
reference paper [19] BER, % mean 4.83 8.47 34.2 38.9
variance 0.08 0.12 0.35 0.72
proposed method (N = 2) BER, % mean 0 0 22.9 23.3
variance 0 0 0.27 0.32
proposed method (N = 4) BER, % mean 0 0 21.4 22.2
variance 0 0 0.33 0.28

5 Conclusions

This paper presents an optimisation-based watermarking scheme in the wavelet domain. We apply the KKT theorem to minimise the difference between the original and the watermarked coefficients; accordingly, the lowest-frequency amplitude can be modified optimally. Moreover, we also illustrate the relationship between the SNR and the scaling factors, which indicates that the variance of the scaling factors has no obvious effect on the SNR. The experimental results show that the embedded data are robust against most signal processing operations and attacks, except for amplitude scaling, under sufficient audio quality and embedding capacity. In future work, we will focus on improving the robustness to amplitude scaling by improving the embedding rules or adding a corresponding constraint to the optimisation problem.

6 References

1 Chen, B., Wornell, G.W.: 'Quantization index modulation: a class of provably good methods for digital watermarking and information embedding', IEEE Trans. Inf. Theory, 2001, 47, (4), pp. 1423–1443
2 Bassia, P., Pitas, I., Nikolaidis, N.: 'Robust audio watermarking in the time domain', IEEE Trans. Multimed., 2001, 3, (2), pp. 232–241
3 Ko, B.S., Nishimura, R., Suzuki, Y.: 'Time-spread echo method for digital audio watermarking using PN sequence'. Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 2002, vol. II, pp. 2001–2004
4 Alaryani, H., Youssef, A.: 'A novel audio watermarking technique based on frequency components'. Proc. of the Seventh IEEE Int. Symp. on Multimedia, 2005
5 Zaidi, A., Boyer, R., Duhamel, P.: 'Audio watermarking under desynchronization and additive noise attacks', IEEE Trans. Signal Process., 2006, 54, pp. 570–584
6 Lie, W.-N., Chang, L.-C.: 'Robust and high-quality time-domain audio watermarking based on low-frequency amplitude modification', IEEE Trans. Multimed., 2006, 8, (1), pp. 46–59
7 Xiang, S., Huang, J.: 'Histogram-based audio watermarking against time-scale modification and cropping attacks', IEEE Trans. Multimed., 2007, 9, (7), pp. 1357–1372
8 Yamamoto, K., Iwakiri, M.: 'Real-time audio watermarking based on characteristics of PCM in digital instrument', J. Inf. Hiding Multimed. Signal Process., 2010, 1, (2), pp. 59–71
9 Peng, H., Wang, J.: 'Optimal audio watermarking scheme using genetic optimization', Springer: Ann. Telecommun., 2011, 66, (5–6), pp. 307–318
10 Huang, J., Wang, Y., Shi, Y.Q.: 'A blind audio watermarking algorithm with self-synchronization'. Proc. IEEE Int. Symp. Circuits and Systems, 2002, vol. 3, pp. 627–630
11 Wu, S., Huang, J., Huang, D., Shi, Y.Q.: 'Efficiently self-synchronized audio watermarking for assure audio data transmission', IEEE Trans. Broadcast., 2005, 51, (1), pp. 69–76
12 Huang, H.-N., Chen, D.-F., Lin, C.-C., Chen, S.-T.: 'Wavelet-domain image watermarking using optimization-based mean quantization', Advances in Intelligent Systems and Computing, 2014, 297, pp. 279–286
13 Wang, X., Zhao, H.: 'A novel synchronization invariant audio watermarking scheme based on DWT and DCT', IEEE Trans. Signal Process., 2006, 54, pp. 4835–4840
14 Wang, L.-X., Chao, Y., Pang, J.: 'An audio watermark embedding algorithm based on mean-quantization in wavelet domain'. The Eighth Int. Conf. on Electronic Measurement and Instruments (ICEMI'2007), pp. 423–425
15 He, X., Scordilis, M.S.: 'Efficiently synchronized spread-spectrum audio watermarking with improved psychoacoustic model', Res. Lett. Signal Process., 2008, 2008, Article ID 251868, p. 5
16 Wang, X.-Y., Ma, T.-X., Niu, P.-P.: 'Digital audio watermarking technique using pseudo-zernike moments', Springer: Lect. Notes Comput. Sci., 2009, 5927, pp. 459–474
17 Wang, X.-Y., Niu, P.-P., Yang, H.-Y.: 'A robust digital audio watermarking based on statistics characteristics', Pattern Recognit., 2009, 42, (11), pp. 3057–3064
18 Chen, S.-T., Huang, H.-N.: 'Energy-proportion audio watermarking scheme in the wavelet domain'. Fourth Int. Conf. on Genetic and Evolutionary Computing, ShenZhen, China, 13–15 December 2010, pp. 679–682
19 Chen, S.-T., Huang, H.-N., Chen, C.-J., Wu, G.-D.: 'Energy-proportion based scheme for audio watermarking', IET Proc. Signal Process., 2010, 4, (5), pp. 576–587
20 Chen, S.-T., Wu, G.-D., Huang, H.-N.: 'Wavelet-domain audio watermarking scheme using optimization-based quantization', IET Proc. Signal Process., 2010, 4, (6), pp. 720–727
21 Chen, S.-T., Huang, H.-N., Hsu, C.-Y., Tseng, K.-K., Pan, J.-S., Zhao, M.: 'Optimization-based audio watermarking using low-frequency amplitude modification'. Int. Conf. on Information Security and Intelligence Control, Jilin, China, August 2011, pp. 1–4
22 Xiang, S., Huang, J.: 'Robust audio watermarking against the D/A and A/D conversions', EURASIP J. Adv. Signal Process., 2011, 3, pp. 1–14

23 Wang, X.-Y., Niu, P.-P., Lu, M.-Y.: 'A robust digital audio watermarking scheme using wavelet moment invariance', Elsevier: J. Syst. Softw., 2011, 84, pp. 1408–1421
24 Wang, X., Wang, P., Zhang, P., Xu, S., Yang, H.: 'A blind audio watermarking algorithm by logarithmic quantization index modulation', Multimed. Tools Appl., 2012, doi: 10.1007/s11042-012-1259-x
25 Li, D., Quan, W., Kim, J.-W.: 'An audio watermarking algorithm using group quantization of DCT coefficients', Springer: Lect. Notes Comput. Sci., 2012, 7709, pp. 159–166
26 Chen, S.-T., Huang, H.-N., Chen, C.-J., Tseng, K.-K., Tu, S.-Y.: 'Adaptive audio watermarking via the optimization point of view on the wavelet-based entropy', Elsevier: Dig. Signal Process., 2013, 23, pp. 971–980
27 Burrus, C.S., Gopinath, R.A., Gao, H.: 'Introduction to wavelet theory and its application' (Prentice-Hall, New Jersey, 1998)
28 Lewis, F.L.: 'Optimal control' (John Wiley and Sons, New York, 1986)
29 Chong, E.K.P., Zak, S.H.: 'An introduction to optimization' (John Wiley and Sons, Inc., New York, 2001)
30 Salovarda, M., Bolkovac, I., Domitrovic, H.: 'Estimating perceptual audio system quality using PEAQ algorithm'. 18th Int. Conf. on Applied Electromagnetics and Communications, October 2005, pp. 1–4
31 Available at http://www.opticom.de/technology/technology.html: PEAQ information from OPTICOM
32 Available at http://www-mmsp.ece.mcgill.ca/Documents/Software/index.html: PqevalAudio - Matlab and C implementation of PEAQ Basic Model
