Abstract: The Portable Document Format (PDF), developed by Adobe Systems Inc., is a flexible and popular file format for document distribution and delivery, and it is supported on a wide range of operating systems and devices. The ease of reproduction, distribution, and manipulation of digital documents creates problems for authorized parties that wish to prevent illegal use of such documents. To this end, digital watermarking has been proposed as a last line of defence to protect the copyright of PDF files through visible watermarks in such files. In addition, to preserve the integrity of the digital watermark, a symmetric cryptographic algorithm (DES) is employed. The proposed watermarking method does not change the values of the stored data objects. Experimental results show the feasibility of the proposed method, and a detailed security analysis and performance evaluation show that the digital watermarking is robust and can withstand various types of attacks.

Keywords: Portable Document Format (PDF), Watermarking (WM), Data Encryption Standard (DES), Copyright protection, Cryptosystems.

1. Introduction

The number of files that are published and exchanged through the Internet is constantly growing, and electronic document exchange is becoming more and more popular among Internet users. The diversity of platforms, formats and applications has called for a common technology to overcome those differences and produce universally readable documents that can be exchanged without limitations. Even though it is supported by nearly every application on any machine, plain ASCII text has failed to become popular because it does not allow text formatting, image embedding and other features that are required for efficient communication. Portable Document Format (PDF) files [1] are popular nowadays, so using them as carriers of secret messages for covert communication is convenient. Though there are some techniques for embedding data in text files [2-4], studies that use PDF files as cover media are very few, except for Zhong et al. [5], in which integer numerals specifying the positions of the text characters in a PDF file are used to embed secret data. In this paper, a new algorithm for the protection of PDF documents is presented.

The PDF, created by Adobe Systems for document exchange [1], is a fixed-layout format for representing documents in a manner independent of the application software, hardware, and operating system. Each PDF file contains a complete description of a 2-D document, which includes text, fonts, images, and vector graphics described by a context-free grammar modified from PostScript. Many PDF readers are available for reading PDF files; each PDF file appears in the window of a PDF reader as an image-like document.

The main advantage of the PDF format is that it allows documents created within any desktop publishing package to be viewed in the original typeset design, regardless of the system on which they are displayed. Documents with text, images, hyperlinks and other desirable features in document authoring can be easily created with the packages distributed by Adobe or with any other authoring application (e.g., Microsoft Office, OpenOffice, LaTeX) and then converted to the PDF format. The result is an easy-to-distribute, small document that will be displayed exactly as it was created, on any platform and using any viewer application.

Besides being very flexible and portable, PDF documents are also considered to be secure. Popular document formats like the Microsoft Compound Document File Format (MCDFF) have been proven to have security flaws that can leak private user information (see Castiglione et al. (2007) [6]), while PDF documents are widely regarded as immune to such problems. This is one of the reasons why many governmental and educational institutions have chosen PDF as their official document standard. In this paper, we start by giving a concise overview of the PDF format, focusing on how data is stored and managed.

Digital watermarking is a relatively new research area that has attracted the interest of numerous researchers in both academia and industry and has become one of the hottest research topics in the multimedia signal processing community. Although the term watermarking has slightly different meanings in the literature, one definition that seems to prevail is the following [7]: watermarking is the practice of imperceptibly altering a piece of data in order to embed information about the data. This definition reveals two important characteristics of watermarking. First, information embedding should not cause perceptible changes to the host medium (sometimes called the cover medium or cover data). Second, the message should be related to the host medium. In this sense, watermarking techniques form a subset of information hiding techniques. However, certain authors use the term watermarking with a
98 (IJCNS) International Journal of Computer and Network Security,
Vol. 2, No. 10, 2010
• One or more cross-reference tables storing information and pointers to objects stored in the file;
• One or more trailers that provide the location of the cross-reference tables in the file.

Table 1: PDF file header for a version 1.7 document
    Header    Version
    %PDF-     1.7

A newly created PDF document has only one body section, one cross-reference table and one trailer. When a document is modified, its previous version is not updated; instead, any changes and new contents are appended to the end of the file, adding a new body section, a new section of the cross-reference table and a new trailer. The incremental update avoids rewriting the whole file, resulting in a faster saving process, especially when only small amendments are made to very large files. Objects stored in the body section have an object number used to unambiguously identify the object within the file, a non-negative generation number and a list of key-value pairs enclosed between the keywords obj and endobj. Generation numbers are used only when object numbers are reused, that is, when the object number previously assigned to an object that has been deleted is assigned to a new one. Due to incremental updates, whenever an object is modified, a copy of the object with the latest changes is stored in the file. The newly created copy has the same object number as the previous one. Thus, several copies of an object can be stored in the file, each one reflecting the modifications made to that object from the time it was created onwards.

The cross-reference table is composed of several sections and allows random access to file objects. When a document is created, the cross-reference table has only one section, and new sections are added every time the file is updated. Each section contains one entry per object, for a contiguous range of object numbers.

Table 2: An example of a cross-reference table section
    xref
    0 36
    0000000000 65535 f
    0000076327 00000 n
    0000076478 00000 n
    0000076624 00000 n
    0000078478 00000 n
    0000078629 00000 n
    0000078775 00000 n
    0000078488 00000 n
    0000078639 00000 n
    ...
    0000100661 00000 n

An example of a cross-reference table section is given in Table 2. As shown, each section starts with the keyword xref, followed by the object number of the first object that has an entry in that section and the number of its entries. Table 2 shows a section with entries for 36 objects, from object 0 to object 35. Each entry provides the following information:
• The object offset in the file;
• The object generation number;
• The free/in-use flag, with value n if the object is in use or f if the object is free, that is, if the object has been deleted.

Object 0 is a special object: it is always marked as free, with generation number 65535. The latest document trailer is stored at the end of the file and points to the last section of the cross-reference table. A PDF document is always read from the end (except when generated with the "Fast Web View" flag enabled), looking for the offset of the last section of the cross-reference table, which is required to identify the objects that constitute the latest version of the document. Each time the document is updated, by adding new objects or modifying existing ones, a new body, cross-reference table section and trailer are appended to the file. The body section contains the newly created objects or the updated versions of existing ones, the cross-reference table section stores the information needed to retrieve those objects, and the trailer holds a reference to the newly created cross-reference table section as well as a pointer to the previous one.

4. Digital Watermarking and Cryptosystems: basics and overview

4.1 Digital watermarking

Digital watermarking requires elements from many disciplines, including signal processing, telecommunications, cryptography, psychophysics, and law. In this paper, we focus on the process of protecting PDF documents.

An effective watermark should have several properties, listed below, whose importance will vary depending upon the application.
• Robustness
The watermark should be reliably detectable after alterations to the marked document. Robustness means that it must be difficult (ideally impossible) to defeat a watermark without degrading the marked document so severely that it is no longer useful or has no (commercial) value.
• Imperceptibility or a low degree of obtrusiveness
To preserve the quality of the marked document, the watermark should not noticeably distort the original document. Ideally, the original and marked documents should be perceptually identical.
• Security
Unauthorized parties should not be able to read or alter the watermark. Ideally, the watermark should not even be detectable by unauthorized parties.
• No reference to original document
For some applications, it is necessary to recover the watermark without requiring the original, unmarked document (which would otherwise be stored in a secure archive).
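The cross-reference section layout shown in Table 2 (the xref keyword, a line giving the first object number and the entry count, then one offset/generation/flag entry per object) can be illustrated with a small parser. This is a minimal sketch, not a full PDF reader: the names XrefEntry and parse_xref_section are hypothetical, and the function assumes it is handed only the text of one classic cross-reference section, optionally terminated by the trailer keyword.

```python
from typing import NamedTuple


class XrefEntry(NamedTuple):
    obj_num: int      # object number, derived from the subsection start
    offset: int       # byte offset of the object in the file
    generation: int   # generation number
    in_use: bool      # True for flag "n", False for flag "f"


def parse_xref_section(text: str) -> list[XrefEntry]:
    """Parse one classic cross-reference section given as text (assumed input)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "xref":
        raise ValueError("section must start with the 'xref' keyword")

    entries: list[XrefEntry] = []
    i = 1
    # A section may hold several subsections, each with its own header line.
    while i < len(lines) and lines[i] != "trailer":
        first, count = (int(tok) for tok in lines[i].split())
        i += 1
        for obj_num in range(first, first + count):
            if i >= len(lines):
                raise ValueError("fewer entries than the subsection announces")
            offset, generation, flag = lines[i].split()
            if flag not in ("n", "f"):
                raise ValueError(f"unexpected flag {flag!r}")
            entries.append(XrefEntry(obj_num, int(offset), int(generation), flag == "n"))
            i += 1
    return entries


# The first three entries of Table 2, with the count reduced to match.
SAMPLE = """xref
0 3
0000000000 65535 f
0000076327 00000 n
0000076478 00000 n
"""

entries = parse_xref_section(SAMPLE)
for entry in entries:
    print(entry)
```

Object 0 comes back as a free entry with generation 65535, matching the description of the special first entry.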
The DES algorithm is the most widely known block cipher in the world and, even today, is resistant to most practical cryptanalytic attacks other than exhaustive key search. It was created by IBM and defined in 1977 as U.S. standard FIPS 46. It is a 64-bit block cipher with 56-bit keys and 16 rounds. A round in DES is a substitution (confusion) followed by a permutation (diffusion).

For each 64-bit block of plaintext that DES processes, an initial permutation is performed and the block is broken into two 32-bit halves, a left half ($L_i$) and a right half ($R_i$). The 16 rounds of substitutions and permutations, called function f, are then performed. For each round, a DES round key ($K_i$) of 48 bits and the current $R_i$ are input into function f. The output of f is then XORed with the current $L_i$ to give $R_{i+1}$. The current $R_i$ becomes $L_{i+1}$. After the 16 rounds, the two halves are rejoined and a final permutation is the last step. This process is shown in Fig. 1.

KPDF document belong to him. The software consists of two major entities: the KPDF creator and the KPDF viewer.

The KPDF creator is shown in Fig. 2 and its flowchart in Fig. 3. It performs the following steps:
1. Embed the digital watermark into the PDF document.
2. Generate the header string, which contains the publisher information and the parameter values (Expire date, Allow password, Allow print and Allow print screen), convert it into an array of bytes, and append a last byte that contains the number of header bytes.
3. Read the work document (with or without the watermark) into an array of bytes, then transpose the array from last byte to first (Z to A) into a new array of bytes.
4. Create a new array of bytes that contains the header bytes, the reversed bytes, and one trailer byte that holds the header length.
5. To preserve the integrity of the digital watermark, a cryptographic algorithm (DES) is employed, using the public key hashed to create a unique 32-character (256-bit) value.
6. Write all the bytes into the KPDF document.
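Steps 2 to 4 and 6 of the KPDF creator can be sketched as a byte-layout routine. This is only an illustration under stated assumptions: the source does not specify the header encoding, so the key=value;... header format, the field names, and the functions build_kpdf and parse_kpdf are hypothetical; the DES step (step 5) is deliberately left out; and the single length byte limits the header to 255 bytes.

```python
def build_kpdf(pdf_bytes: bytes, publisher: str, expire_date: str,
               allow_password: bool, allow_print: bool,
               allow_print_screen: bool) -> bytes:
    """Assemble header + reversed document + one header-length byte."""
    # Step 2: header string -> bytes (the field format is an assumption).
    header = ";".join([
        f"publisher={publisher}",
        f"expire_date={expire_date}",
        f"allow_password={allow_password}",
        f"allow_print={allow_print}",
        f"allow_print_screen={allow_print_screen}",
    ]).encode("utf-8")
    if len(header) > 255:
        raise ValueError("header length must fit in a single byte")

    # Step 3: transpose the document bytes from last to first.
    reversed_body = pdf_bytes[::-1]

    # Step 4: header bytes + reversed bytes + one trailer byte with the header length.
    # Step 5 (DES encryption) is omitted from this sketch.
    return header + reversed_body + bytes([len(header)])


def parse_kpdf(blob: bytes) -> tuple[dict[str, str], bytes]:
    """Inverse of build_kpdf: recover the header fields and the original bytes."""
    if not blob:
        raise ValueError("empty container")
    header_len = blob[-1]                      # trailer byte holds the header length
    header = blob[:header_len].decode("utf-8")
    body = blob[header_len:-1][::-1]           # undo the byte reversal
    fields = dict(item.split("=", 1) for item in header.split(";"))
    return fields, body


pdf = b"%PDF-1.7\n(sample body)\n%%EOF"
blob = build_kpdf(pdf, "ACME", "2011-01-01", True, False, False)  # step 6 would write blob to disk
fields, recovered = parse_kpdf(blob)
print(fields["publisher"], recovered == pdf)
```

Storing the header length in the last byte lets a viewer locate the header without a delimiter, at the cost of capping the header at 255 bytes; the toy header format also assumes that field values contain neither ";" nor "=".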
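The round structure described for DES earlier in this section (split the block into halves, compute $R_{i+1} = L_i \oplus f(R_i, K_i)$ and $L_{i+1} = R_i$, then rejoin) can be illustrated with a toy Feistel network. This is a sketch of the structure only, not DES: toy_f is a hypothetical placeholder rather than the DES function f (no expansion, S-boxes or permutations), the initial and final permutations are omitted, and the round keys are arbitrary integers rather than values derived from a 56-bit key.

```python
MASK32 = 0xFFFFFFFF


def toy_f(right: int, round_key: int) -> int:
    """Placeholder round function (an assumption, NOT the DES f function)."""
    return ((right * 0x9E3779B1) ^ round_key) & MASK32


def feistel_encrypt(block: int, round_keys: list[int]) -> int:
    """Run a 64-bit block through one Feistel round per round key."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for key in round_keys:
        # L_{i+1} = R_i ;  R_{i+1} = L_i XOR f(R_i, K_i)
        left, right = right, left ^ toy_f(right, key)
    # Rejoin with the halves swapped, as DES does after its last round.
    return (right << 32) | left


def feistel_decrypt(block: int, round_keys: list[int]) -> int:
    """Decryption is the same network with the round keys in reverse order."""
    return feistel_encrypt(block, round_keys[::-1])


keys = list(range(1, 17))            # 16 illustrative round keys
plain = 0x0123456789ABCDEF
cipher = feistel_encrypt(plain, keys)
print(hex(cipher), feistel_decrypt(cipher, keys) == plain)
```

The point of the structure is that the round function never has to be inverted: running the same rounds with the keys reversed undoes the encryption, whatever f is.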
References

[1] Adobe Systems Incorporated, Portable Document Format Reference Manual, version 1.7, November 2006, http://www.adobe.com.
[2] W. Bender, D. Gruhl, N. Morimoto, A. Lu, Techniques for data hiding, IBM Systems Journal 35 (3, 4) (February 1996).
[3] H.M. Meral, E. Sevinc, E. Unkar, B. Sankur, A.S. Ozsoy, T. Gungor, Syntactic tools for text watermarking, in: Proceedings of SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents, San Jose, CA, USA, January 29–February 1, 2007.
[4] M. Topkara, U. Topkara, M.J. Atallah, Information hiding through errors: a confusing approach, in: Proceedings of SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents, San Jose, CA, USA, January 29–February 1, 2007.
[5] S. Zhong, X. Cheng, T. Chen, Data hiding in a kind of PDF texts for secret communication, International Journal of Network Security 4 (1) (January 2007) 17–26.
[6] Castiglione, A., De Santis, A., Soriente, C., 2007. Taking advantages of a disadvantage: digital forensics and steganography using document metadata. Journal of Systems and Software, Elsevier 80, 750–764.
[7] Brassil, J., Low, S., Maxemchuk, N. and O'Gorman, L., Electronic marking and identification techniques to discourage document copying. Proceedings of IEEE INFOCOM '94, 1994, 3, 1278–1287.
[8] S. Voloshynovskiy, F. Deguillaume, O. Koval, T. Pun, Robust watermarking with channel state estimation, Part I: theoretical analysis, Signal Processing: Security of Data Hiding Technologies, (Special Issue) 2003–2004, to appear.
[9] S. Voloshynovskiy, F. Deguillaume, O. Koval, T. Pun, Robust watermarking with channel state estimation, Part II: applied robust watermarking, Signal Processing: Security of Data Hiding Technologies, (Special Issue) 2003–2004, to appear.
[10] Adobe Systems Inc., 2010. Adobe PDF Reference Archives. http://www.adobe.com/devnet/pdf/pdf_reference_archive.html (Last updated January 2010).
[11] Byers, S., 2004. Information leakage caused by hidden data in published documents. IEEE Security and Privacy 2 (2), 23–27.
[12] Wikipedia, the Online Encyclopedia, 2009. The Calipari Incident. http://en.wikipedia.org/wiki/Nicola_Calipari/, http://en.wikipedia.org/wiki/Rescue_of_Giuliana_Sgrena/ (Last updated December 2009).
[13] King, J.C., 2004. A format design case study: PDF. In: HYPERTEXT'04: Proceedings of the Fifteenth ACM Conference on Hypertext and Hypermedia. ACM Press, New York, NY, USA, pp. 95–97.
[14] Bagley, S.R., Brailsford, D.F., Ollis, J.A., 2007. Extracting reusable document components for variable data printing. In: DocEng'07: Proceedings of the ACM Symposium on Document Engineering. ACM Press, New York, NY, USA, pp. 44–52.
[15] Chao, H., Fan, J., 2004. Layout and Content Extraction for PDF Documents. Lecture Notes in Computer Science LNCS 3163, 213–224.
[16] Zhong, S., Cheng, X., Chen, T., 2007. Data hiding in a kind of PDF texts for secret communication. International Journal of Network Security 4 (1), 17–26.
[17] G.W. Braudaway, K.A. Magerlein, F.C. Mintzer, Protecting publicly available images with a visible image watermark, Proc. SPIE, Int. Conf. Electron. Imaging 2659 (1996) 126–132.
[18] J. Meng, S.F. Chang, Embedding visible video watermarks in the compressed domain, Proc. ICIP 1 (1998) 474–477.
[19] S.P. Mohanty, K.R. Ramakrishnan, M.S. Kankanhalli, A dual watermarking technique for image, Proc. 7th ACM Int. Multimedia Conf. 2 (1999) 9–51.
[20] P.M. Chen, A visible watermarking mechanism using a statistic approach, Proc. 5th Int. Conf. Signal Process. 2 (2000) 910–913.
[21] M.S. Kankanhalli, R. Lil, R. Ramakrishnan, Adaptive visible watermarking of images, Proc. IEEE Int'l Conf. Multimedia Comput. Syst. (1999) 68–73.
[22] S.P. Mohanty, M.S. Kankanhalli, R. Ramakrishnan, A DCT domain visible watermarking technique for image, Proc. IEEE Int. Conf. Multimedia Expo 20 (2000) 1029–1032.
[23] S.P. Mohanty, K.R. Ramakrishnan, M.S. Kankanhalli, A dual watermarking technique for image, Proc. 7th ACM Int. Multimedia Conf. 2 (1999) 9–51.
[24] Y. Hu, S. Kwong, Wavelet domain adaptive visible watermarking, Electron. Lett. 37 (20) (2001) 1219–1220.
[25] Y. Hu, S. Kwong, An image fusion-based visible watermarking algorithm, in: Proc. Int'l Symp. Circuits Syst., IEEE Press, 2003, pp. 25–28.
[26] L. Yong, L.Z. Cheng, Y. Wu, Z.H. Xu, Translucent digital watermark based on wavelets and error-correct code, Chinese J. Comput. 27 (11) (2004) 533–1539.
[27] B.B. Huang, S.X. Tang, A contrast-sensitive visible watermarking scheme, IEEE Multimedia 13 (2) (2006) 0–66.
[28] J.L. Mannos, D.J. Sakrison, The effects of a visual fidelity criterion on the encoding of images, IEEE Trans. Info. Theory 20 (4) (1974) 25–536.
[29] D. Levický, P. Foriš, Human Visual System Models in Digital Image Watermarking, Radio Eng. 13 (4) (2004) 38–43.
[30] A.P. Beegan, L.R. Iyer, A.E. Bell, Design and evaluation of perceptual masks for wavelet image compression, in: Proc. 10th IEEE Digital Signal Processing Workshop, 2002, pp. 88–93.
[31] S. Voloshynovskiy, et al., A stochastic approach to content adaptive digital image watermarking, in: Proc. 3rd Int. Workshop Information Hiding, Dresden, Germany, 1999, pp. 211–236.
[32] A.B. Watson, G.Y. Yang, J.A. Solomon, J. Villasenor, Visibility of wavelet quantization noise, IEEE Trans. Image Proc. 6 (8) (1997) 1164–1175.
Author’s Profile