Compression Encryption

Understanding the Raw Materials of the Internet

by
Foreword
By Bruce Schneier
Compression and encryption are two of the Internet's core technologies. Compression saves money and time by making it cheaper and faster to transmit data. But encryption protects something even more important: your privacy. The idea that a man's home is his castle has survived the invention of organized police, indoor plumbing and the telephone. The question now is, can privacy rights also survive the computer?

In the time it takes a shoulder surfer to steal your calling card number in an airport, a computer can steal a million such numbers. A computer can rummage through your electronic mail and catalog the web sites you visit. It can scour 100 million telephone directory listings and find everyone with a Jewish surname, and then correlate them with its collection of people who have requested information on a certain fringe political party. Anything a human snoop can do, a computer can do faster and more thoroughly. Computers are, quite simply, the best way ever invented to invade privacy. Fortunately, however, computers are also the best way to defend your privacy.

This primer introduces compression and encryption. These technologies are, in a sense, opposites. Encryption works by hiding predictable patterns in text. Compression works by finding repetitive patterns in the text and replacing them with shorter tokens. Compression and encryption must therefore be performed in the correct order: compression first, then encryption.

Encryption is what keeps computers secure from prying eyes. This primer presents some very simple examples of encryption. These easy-to-understand ciphers are hundreds of years old. And while they would not protect your data against a skilled modern attack, they do illustrate the basic principles that are at the heart of the strongest modern encryption.
In my book Applied Cryptography,[1] I discuss the kind of encryption that can withstand an attack by the world's fastest computers. Do average people need computer security that is strong enough to foil a major government intelligence service? Yes. Our Constitution guarantees us the right to be secure in our persons, houses, papers, and effects. You have a right to keep your business and personal dealings private, whether they be your tax records, medical records, or personal letters. The government wants to mandate that you make yourself available to surveillance; cryptography can prevent that.[2]

Virtually all information, from baby pictures to alarm system blueprints, will soon be stored in digital form. We are already sitting on top of a huge underground river of binary digits which grows in volume by the second. How we manage that river of data directly affects our privacy and our pocketbooks. Compression and encryption are important. This easy-to-read primer is a great place to start learning about them.

Bruce Schneier
Minneapolis, MN
schneier@counterpane.com
1. Applied Cryptography, Second Edition, John Wiley & Sons, 1996.
2. The Electronic Privacy Papers, John Wiley & Sons, 1997.
Table of Contents
Why encryption and compression are important
    Compression saves money
    Encrypted data can't be compressed
    Encryption must follow compression
How compression works
    Duplicate strings of characters replaced with tokens
    Compression speed is important
    What data makes the smallest files
    Electronic mail messages contain many compressible phrases
    The HTML used for Netscape's homepage compressed at the rate of 5 to 1
    Types of compression programs
    Lossless vs. Lossy compression
How encryption works
    Mathematical functions create ciphertext
    Similarities between code breaking and compression
    Brute force computing
Breaking codes
    Substitution codes
    Frequency patterns
    Breaking codes with Microsoft Office
    Caesar's code
    Hiding letter frequencies with the Vigenere cipher
    Transposition codes
    A known plaintext attack
Contemporary Encryption
    Algorithms and key length
    Symmetric and public keys
    Authentication
Conclusion
    The Internet's building blocks
    LZS compression, the de facto standard
    Encryption: Essential for the Internet's growth
Why encryption and compression are important

Compression and encryption are raw materials of the Internet. Compression shrinks computer data, making it cheaper and faster to send across the Internet. Encryption makes data secret, protecting your privacy.

Even if you are not a computer user yourself, computers and the Internet touch your life every day. Shortly after you wake up, you notice that your breakfast cereal has machine-readable bar codes on it. When you stop at the bank, your teller uses a computer as he talks to you. The credit card you use at lunch zooms across the country and back while you're waiting to sign the bill. Your mail arrives after lunch, and like your breakfast cereal, it is bar-coded. On the way home, a police officer stops you for a broken turn signal. She also checks you for outstanding warrants via computer. The telephone solicitor who interrupts you at dinner stares at a computer terminal as he makes his pitch. On television at night, every commercial ends with www-dot-something-dot-com.

If you own a postal scale or a postage meter, you care about mailing and shipping costs. If you have locks on your doors, you care about loss prevention. On the Internet, there are no postal scales or door locks. Instead, compression and encryption provide the basis for computer-era economy and security.

Compression and encryption programs always include decompression and decryption programs. It would obviously do no good to make information so small, or so secret, that it could never be restored to its former state. So, when we talk about compression and encryption, we are really talking about four things:
Term               Definition
1. Compression     Shrinks data
2. Decompression   Expands data
3. Encryption      Hides contents
4. Decryption      Reveals contents
Compression saves money

The risks of poor computer security are obvious and sometimes dramatic. In 1995, for example, Kevin Mitnick stole more than 20,000 credit cards from Netcom, Inc. Mitnick's computer invasions spurred an intense interest in security, which has been compared to the effect of the Sputnik satellite in 1957 on American education. Out of this concern, three Internet security protocols have emerged: Point-to-Point Protocol (PPP), Secure Sockets Layer (SSL) and Internet Protocol Security (IPSec). All of these protocols support data compression.

The Internet is safer today than it was two years ago. But business users are equally at risk of being robbed when they send uncompressed data across the Internet. Compression saves on computer processing power, memory and transmission costs, all of which are ultimately passed on to the user. The reasons for performing compression are simple, and they are all preceded with dollar signs.

In this primer, we discuss compression before encryption, and not just to observe alphabetical order. For reasons that will become clear shortly, compression must be performed before encryption, or it can't be performed at all.

Compression, even in modest amounts, produces big savings. Most businesses connect to the Internet not through modem lines like home users but through faster and more expensive data links such as ISDN. As the business grows, and the amount of its data increases, the company must buy more equipment and more telephone service. Since it is not possible to buy part of an ISDN line, for example, the business's expenses will double even if the company's need for additional capacity, or bandwidth, is only slightly more than the capacity of a single line. The situation is like a letter that weighs 1.1 ounces; if you can't get the weight below an ounce, you have to buy two stamps. Very large organizations often find themselves in this position when they outgrow their high-capacity T1 or T3 lines.
Type of Service   Approx. Monthly Cost of Telephone Service   Max. Data Transfer Rate (kilobits per second)
Modem             $20                                         56K
ISDN              $50                                         128K
T1                $500-$1000                                  1,544K
T3                $10,000 and up                              44,736K
Encrypted data can't be compressed

Compressing and encrypting data obviously makes good business sense. However, these functions cannot be performed interchangeably. Compression must be performed before encryption. It is impossible to compress encrypted data. Compression depends upon finding patterns within messages that can be represented by shorter symbols, called tokens. In contrast, encryption removes patterns from messages. For this reason, compression must be performed first. Indeed, one of the tests used by professional code breakers is to try to compress a secret message. If the message is compressible, the encryption formula that produced it fails the test. Attempting to compress properly encrypted data not only fails to make the data smaller; it can actually cause the file to grow. Compression may add data to the file that is not compensated for by any reduction in file size. While this data expansion problem can be avoided through good design practices, an encrypted file never gets any smaller and may even get bigger.
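The claim that patterned text shrinks while encrypted data does not can be checked with a small experiment. The sketch below is only an illustration under stated assumptions: it uses Python's zlib module (DEFLATE, not the LZS algorithm discussed in this primer), and it uses random bytes from os.urandom as a stand-in for well-encrypted data, since good ciphertext should look random.

```python
import os
import zlib

# Highly repetitive text: the kind of data compression thrives on.
text = ("The IBM Corporation is a large corporation. " * 100).encode("ascii")

# Assumption: random bytes stand in for properly encrypted data.
random_like = os.urandom(len(text))


def ratio(data):
    """Return original size divided by compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data))


if __name__ == "__main__":
    print("repetitive text ratio:", round(ratio(text), 1))
    print("random-like data ratio:", round(ratio(random_like), 3))
```

A ratio at or below 1 for the random-like data means the "compressed" output is no smaller than the input, and is usually a few bytes larger because of the container overhead.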
Encryption must follow compression

While you cannot compress data after encryption, it is quite possible, and desirable, to compress data before encryption. Compression not only saves on transmission costs; it also saves on the costs of encryption. If you compress a file to half of its former size, encryption will use half as much processing power.
Encryption performs complex mathematical operations on each block of data, requiring lots of computational power. Smaller files save money because they encrypt faster. To be sure, compression also requires processing power, but dedicated compression chips can perform this function much less expensively than general-purpose microprocessors. The newest approach to compression and encryption combines both functions on the same chip. This guarantees that the functions are performed in the right order (compression first, encryption second), and further reduces the demand on the main processor.
How compression works

Duplicate strings of characters replaced with tokens
Compression works by replacing repeating strings of characters with shorter tokens. For example, the following message uses the same phrase in three different places.
Uncompressed
Page 1: The IBM Corporation is a large corporation.
Page 2: The IBM Corporation is a profitable organization.
Page 10: The IBM Corporation is a force in world chess competition.

Compressed
Page 1: The IBM Corporation is a large corporation.
(The first appearance of the phrase is not compressed.)
Page 2: (40,25) profitable organization.
(The text from page two is compressed using the token (40,25). The token means "Go back 40 characters and get the next 25 characters.")
Page 10: The IBM Corporation is a force in world chess competition.
(The text from page 10 is not compressed because it is outside the 2-page window of compression.)
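The token idea in this example can be sketched as a toy sliding-window compressor. This is a simplified illustration, not Hifn's LZS algorithm: the function names, the 2,048-character window default, and the minimum match length of 3 are assumptions chosen for the sketch, and tokens are kept as Python tuples rather than packed into bits.

```python
def compress(text, window=2048, min_len=3):
    """Greedy toy compressor: emit literal characters or (distance, length) tokens."""
    out = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        # Only look back as far as the sliding window allows.
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < len(text)
                   and j + length < i
                   and text[j + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            out.append((best_dist, best_len))  # "go back dist, copy length"
            i += best_len
        else:
            out.append(text[i])  # no useful match: pass the character through
            i += 1
    return out


def decompress(tokens):
    """Rebuild the original text from literals and (distance, length) tokens."""
    chars = []
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            start = len(chars) - dist
            chars.extend(chars[start:start + length])
        else:
            chars.append(tok)
    return "".join(chars)


if __name__ == "__main__":
    text = ("The IBM Corporation is a large corporation. "
            "The IBM Corporation is a profitable organization.")
    tokens = compress(text)
    print(tokens)
    print(decompress(tokens) == text)
```

In this sketch the second sentence begins with a token whose length is 25, matching the figure; the distance it reports depends on exactly how the characters between the two phrases are counted, so it need not equal the 40 shown in the illustration.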
Compression speed is important

In Hifn's LZS compression, the repetitive characters must be within about 2,000 characters (2,048 bytes) of each other. Otherwise, they will be passed along as uncompressed text. Why 2,000 characters? Wouldn't a larger chunk of text produce more compression? The answer is yes, a larger window would produce somewhat more compression but at a cost of slower performance. The point of compression is to speed up the Internet, not slow it down. And since compression must be done on the fly in very fast-moving computer networks, performance is crucial. Extensive Hifn testing has found that a sliding window of about two double-spaced pages of text is best in terms both of compression efficiency and time required to compress. Repeated searching for redundant strings of data can also deliver more compression, but at the expense of speed.

Because of its speed, LZS compression, whose patents are owned by Hifn, is the de facto standard for compression among the major hardware and software companies who are building the Internet. The table below shows the difference in compression ratios and compression speed when Thomas Hardy's 141,000-word novel Far from the Madding Crowd is compressed using LZS and WinZip.
LZS v. WinZip
Compression Program   Original Size (in bytes)   Compressed Size (in bytes)   Ratio   Time to Compress
LZS                   785K                       458K                         1.7:1   1 second
WinZip                785K                       322K                         2.4:1   5 seconds
The table shows that WinZip makes the file smaller but takes five times as long. For personal use, the time taken probably doesn't matter. However, when merging onto the Information Super Highway, speed is all-important. The data stream must be able to travel at the maximum data rate of the telephone line. Most large companies and organizations use a T1 line, which can send data at the rate of 1.54 million bits, or about 180,000 bytes, per second. At WinZip's rate of 157,000 bytes per second (785K compressed in 5 seconds), data bubbles would quickly build up in the data stream. These data bubbles waste T1 capacity, which translates to wasted money.
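The arithmetic behind this comparison can be written out in a few lines. The sketch below uses only figures quoted in the text and the table (a 785,000-byte file, compression times of 1 and 5 seconds, and a T1 rate of about 180,000 bytes per second); the variable and function names are illustrative.

```python
FILE_BYTES = 785_000            # size of the novel, from the table
T1_BYTES_PER_SECOND = 180_000   # approximate T1 rate quoted in the text


def throughput(seconds_to_compress):
    """Bytes of input a compressor handles per second."""
    return FILE_BYTES / seconds_to_compress


lzs_rate = throughput(1)      # LZS: 1 second
winzip_rate = throughput(5)   # WinZip: 5 seconds

if __name__ == "__main__":
    for name, rate in (("LZS", lzs_rate), ("WinZip", winzip_rate)):
        verdict = "keeps up with" if rate >= T1_BYTES_PER_SECOND else "falls behind"
        print(f"{name}: {rate:,.0f} bytes/second, {verdict} the T1 line")
```

A compressor that processes fewer bytes per second than the line can carry leaves the line idle part of the time, which is the "data bubble" described above.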
What data makes the smallest files

Highly repetitive documents can be compressed more than documents with fewer redundant character strings. Electronic mail messages and web pages created in Hypertext Markup Language (HTML) are highly compressible (see examples below). On the other hand, encrypted data, which resembles page after page of random numbers, cannot be compressed at all. Previously compressed data sometimes can be compressed further, although usually not very much.
Electronic mail messages contain many compressible phrases

From - Thu Aug 07 15:43:14 1997
Received: from mailman.hifn.com (mailman.hifn.com [206.19.120.66]) by interstice.com (8.8.6/8.6.9) with SMTP id NAA29787 for <shigh@hifn-north.com>; Thu, 7 Aug 1997 13:45:08 -0700 (PDT)
Received: by mailman.hifn.com with SMTP (Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63) id <01BCA337.1529C900@mailman.hifn.com>; Thu, 7 Aug 1997 13:37:43 -0700
Received: from smtp2.cerf.net by mailman.hifn.com with SMTP (Microsoft Exchange Internet Mail Connector Version 4.0.994.63) id SBC2RB16; Thu, 7 Aug 1997 13:37:37 -0700
Received: from interstice.com (inter2.interstice.com [209.50.32.201]) by smtp2.cerf.net (9.9.8/8.6.10) with ESMTP id NAA21231; Thu, 7 Aug 1997 13:35:24 -0700 (PDT)
Received: from int-226-70.interstice.com (int-226-70.interstice.com [205.199.226.70]) by interstice.com (8.8.6/8.6.9) with SMTP id NAA28636; Thu, 7 Aug 1997 13:01:51 -0700 (PDT)
Received: by int-226-70.interstice.com with Microsoft Mail id <01BCA333.82613700@int-226-70.interstice.com>; Thu, 7 Aug 1997 13:12:08 -0700
Message-ID: <c=US%a=_%p=HIFN_Inc.%l=Internet_Mail_970807203740Z-369@mailman.hifn.com>
X-UIDL: 870986870.001
The HTML used for Netscape's homepage compressed at the rate of 5 to 1

<TITLE>WelcometoNetscape</TITLE> </HEAD> <BODYBGCOLOR=#ffffffLINK=#0000ffVLINK=#ff0000ALINK=#ff 0000TEXT=# 000000ONLOAD=clientSniff();> <MAPNAME=mainmap> <AREACO-ORDS= 0,2,75,24HREF=http:// guide.netscape.com/?h> <AREACO-ORDS= 79,2,156,24HREF=/ bbcomprod/comprod/index.html> <AREACO-ORDS= 158,1,232,24HREF=/bbentsol/comprod/at_work/index.html> <AREACO-ORDS= 234,1,312,24HREF=http:// developer.netscape.com/index.html> <AREACO-ORDS= 315,2,382,24HREF=http://merchant.netscape.com/ms_dom_bin/ bar_index.html> <AREACOORDS=385,1,467,24HREF=/bbhelp/ assist/index.html> </MAP> <!mastheadtable> <CENTER> <IMGSRC=/images/ home_igloo.jpgWIDTH=468HEIGHT=107BORDER=0ALT=WelcomeToNetsc ap> <AHREF=/misc/igloo_nav.map><IMGSRC=/images/ new_nav_home.gifWIDTH=468HEIGHT=25BORDER=0USEMAP=#mainmapI SMAP></A> <!MainFlash> <H3><FONTSIZE=+2>COMMUNICATOR <FONTSIZE=+2>AVAILABLEIN <FONTSIZE=+2>RETAIL <FONTSIZE=+2>STORES </H3> </CENTER> <FORMNAME=menuform><TABLENAME=table BORDER=0CELL-PADDING=0CELLSPACING=0WIDTH=100%> <TR> <TDVALIGN=TOP-WIDTH=58%> <!Date> A<FONTSIZE=-1>UGUST30S<FONTSIZE=-1>SEPTEMBER2,1997 <!FlashImage><AHREF=/flash1/newsref/pr/ newsrelease468.html><IMGSRC=/inserts/images/light-house. gifWIDTH=40HEIGHT=35HSPACE=5ALIGN=LEFTBORDER=0ALT=NewCommun icatoreditions:InternetAccessandDeluxe></A> <!Flash1> BuyNetscapeCommunicatorInternetAccessEditionandDeluxeEditionf rom<AHREF=/flash1/newsref/pr/ newsrelease468.html>leadingresellers<A>,includingBestBuy, ComputerCity,andEgghead,andenjoy<AHREF=/flash1/ad/ rebates.html>bigrebates </A>. 
<BRCLEAR=ALL> <SPACERTYPE=VERTICALSIZE=5> <TABLECELLPADDING=3BOR-DER=0WIDTH=100%> <!Flash2> <TRVALIGN=TOP><TDWIDTH=10ALIGN=RIGHT><IMGSRC=/inserts/ images/bullet_sm.gif HSPACE=10VSPACE=5BORDER=0WIDTH=4HEIGHT=4></TD> <TD><AHREF=/ flash2/comprod/products/tools/ visual_js.html>VisualJavaScript</A>Preview Release3isnow<AHREF=/flash2/download/ visual_javascript.html>available<A>, deliveringadatabaseapplicationwizardandacomponentpalette.</ TD> </TR> <!Flash3> <TRVALIGN=TOP> <TDWIDTH=10ALIGN=RIGHT><IMGSRC=/inserts/images/ bullet_sm.gifHSPACE=10VSPACE=5BORDER=0WIDTH=4HEIGHT=4></TD> <TD>Netscape<AHREF=/flash3/newsref/pr/ newsrelease467.html>unveils</A>twonewCommunicatoreditions<AHREF=/flash3/comprod/products/communicator/product_family/comm_retailia.html>InternetAccess</A>and<AHREF=/flash3/ comprod/products/communicator/product_family/
comm_retailde.html>Deluxe</A>-whichprovidevaluableutilities,plug-ins,andeasyInternetaccess.</TD> </TR> <! Flash4> <TRVALIGN=TOP><TDWIDTH=10ALIGN=RIGHT><IMGSRC=/ inserts/images/ bullet_sm.gifHSPACE=10VSPACE=5BORDER=0WIDTH=4HEIGHT=4></TD> <TD>Netscape<AHREF=/flash4/newsref/pr/ newsrelease475.html>expands</ A>itsindustryleadingsupportforJavawiththeavailabilityofJavaDe velopmentKit1.1forNetscapeCommunicator. </TD> </TR> <! Flash5> <TRVALIGN=TOP>
Types of compression programs

All compression programs work on the same basic principle: replace long redundant strings with shorter ones. The examples in this booklet refer to the LZS compression algorithm owned by Hifn. LZS, which stands for Lempel-Ziv-Stac, is by far the most common type of compression found in networking hardware and software. Hifn's LZS compression is found in many products, including those sold by Cisco, 3Com, Lucent, IBM and Novell. Another compression algorithm, used in the Windows NT operating system, is MPPC (Microsoft Point-to-Point Compression). MPPC uses a licensed version of LZS.

LZS is named for its inventors, Lempel, Ziv and Stac Electronics. Lempel and Ziv developed the mathematical foundations of LZS compression; Stac improved upon their work and developed the engineering solutions that made possible file compression on disks, tape and other computer media. For its efforts, Stac was awarded numerous U.S. patents. Hifn now owns Stac's LZS patents. Some of these patents have been tested in court, including a case won by Stac against Microsoft. Since these patents have been defended successfully once, they can now be enforced easily, without protracted litigation. Both because of the widespread commercial acceptance of LZS, and because of its numerous broad patents, LZS is likely to remain a standard for some time.

Hifn is aware of the responsibilities that come with ownership of critically important technology. We participate in public standards bodies, such as the Internet Engineering Task Force (IETF), and support academic research into compression and encryption. In addition, we are committed to fair and reasonable pricing of all our chips and software, with the goal of improving performance and reducing computing and communications costs for everyone.
Lossless vs. Lossy compression

Lossless compression works the way you would expect. No data is lost during compression and decompression. You would not want your bank statement, for example, to lose a deposit or two during compression. Hifn is in the lossless compression business. Lossy compression is used for pictures and sounds. In this type of compression, data is lost during compression. This is possible because computers store many more colors and sounds than human eyes and ears can see or hear. Lossy compression can safely throw away much of the original data during compression. The decompressed files look and sound fine.
How encryption works

Mathematical functions create ciphertext
Encryption works by applying mathematical functions to ordinary text so that it is apparently changed beyond recognition. But mathematical functions work in both directions. The original text can be restored, provided you know the key used during encryption. Here's a very simple example: Let the letter A be represented by the number 65 (as it is in fact represented by virtually all computer systems). Multiply 65 by the key of 5 to get 325. Since 325 is not generally known to represent A, it is now secret. To convert the number 325 back to 65, divide by the key (5).
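The multiply-by-the-key example can be sketched in a few lines of Python. The function names are illustrative, and this toy scheme offers no real security; it only shows that the same key reverses the operation.

```python
KEY = 5  # the key used in the example


def encrypt(text, key=KEY):
    """Multiply each character's numeric code by the key."""
    return [ord(ch) * key for ch in text]


def decrypt(numbers, key=KEY):
    """Divide each number by the key to recover the character code."""
    return "".join(chr(n // key) for n in numbers)


if __name__ == "__main__":
    secret = encrypt("A")
    print(secret)           # the letter A (65) becomes 325
    print(decrypt(secret))  # dividing by the key restores A
```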
Term                       Definition
1. Cipher                  A secret code
2. Encrypt (or encipher)   Change plaintext to ciphertext
3. Decrypt (or decipher)   Change ciphertext to plaintext
4. Cryptology              Art of secret writing or making codes
5. Cryptanalysis           Code breaking
6. Plaintext               Ordinary writing
7. Ciphertext              Secret writing
The problem with the above code is that it wouldn't fool anyone who makes it his or her business to break codes. Nevertheless, the principle of substitution illustrated here is still at the heart of modern cryptanalysis (literally, "taking apart secrets").
Similarities between code breaking and compression

Interestingly, standard code breaking techniques are similar to those used to compress data. Both find and exploit repetitive patterns in messages. But where compression uses redundancies to save space, decryption (the unauthorized kind) uses repetitive patterns to pry loose the hidden meaning of a message. Earlier we saw that electronic mail messages are highly compressible because of the repetitive nature of their headers. This same characteristic makes electronic mail potentially vulnerable to cryptanalysis.

The computer age has produced better codes, and better code breaking, than the world has ever known. Where a 1950s spy might have hidden secrets under a microdot in a letter, a contemporary spy may hide a text message in the magnetic stripe on his ATM card. Another trick uses the lossy compression principle described above. This technique replaces color information in a scanned photograph with text data. The photo appears to be unaltered, since the missing colors can be detected only by a computer, not by the human eye.

Encryption is a complex subject, and its mathematics are well beyond most people. For this reason, this primer examines only very simple ciphers. By studying these examples, however, you can learn something about the principles used in much more advanced cryptography. Making codes, and trying to break them, is the best way to learn. And, it's fun.

Make no mistake, however, the real business of encryption is deadly serious. For example, selling or giving away computer encryption programs to a foreign country could earn you a stretch in federal prison. Under export laws, encryption was until recently regarded as a munition, such as a cruise missile. It is still illegal to export strong encryption without specific authorization from the Commerce Department. On the other hand, use of strong computer security within the United States and Canada is perfectly legal. Indeed, the government encourages businesses to use good security practices.

You've undoubtedly heard about computer hackers who use their expert knowledge to break into places where they don't belong. Sometimes these invaders hack just for the fun and challenge of it, but sometimes they steal money, confidential business information or even military and diplomatic secrets that affect our national security. Encryption is our primary protection against hackers.
Brute force computing

You might think there is enough computing power available to perform all the compression and encryption in the world, but this is not so, at least not yet. One analyst calculated that the entire output of Intel's factories for the next twenty years would be required to make one month's credit card transactions secure enough for the World Wide Web. This calculation, based on 80 billion transactions and a rate of two credit card transactions per second, was accurate at the time it was made. But engineers are already working on a goal of 1500 transactions per second, which will reduce the hypothetical time required to build the chips from 20 years to 10 days.

Cryptographers no longer talk about "unbreakable" codes, but rather, they talk about the estimated time and money required to crack a particular code. Expert Bruce Schneier writes that a billion chips testing a billion instructions per second would still take longer than the age of the universe to make a successful brute force attack against an encryption engine called IDEA. And while you could get the job done in a single day with enough chips, he observes that there aren't enough silicon atoms in the universe to build that many. It should be noted that this does not mean that IDEA is invulnerable, but only that an attack based on trying every possible key would probably not succeed. The point is, good computer designers, like good military strategists, use a combination of diplomacy, subtlety and raw power to achieve their objectives.
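Both estimates in this section can be reproduced with rough arithmetic. The sketch below rests on assumptions that are not stated in the text: IDEA's key length of 128 bits, an age of the universe of roughly 14 billion years, and one key tested per instruction. It is an order-of-magnitude illustration, not a precise model.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Brute force against IDEA (assumed 128-bit key).
KEY_BITS = 128
CHIPS = 10**9                    # a billion chips
KEYS_PER_CHIP_PER_SECOND = 10**9  # assumption: one key per instruction
AGE_OF_UNIVERSE_YEARS = 1.4e10    # assumption: rough cosmological estimate

total_keys = 2**KEY_BITS
seconds_needed = total_keys / (CHIPS * KEYS_PER_CHIP_PER_SECOND)
years_needed = seconds_needed / SECONDS_PER_YEAR

# Chip-building estimate: 20 years at 2 transactions per second,
# rescaled to 1500 transactions per second.
days_at_new_rate = 20 * 365.25 * (2 / 1500)

if __name__ == "__main__":
    print(f"years to try every key: {years_needed:.2e}")
    print(f"multiples of the universe's age: {years_needed / AGE_OF_UNIVERSE_YEARS:.0f}")
    print(f"days of chip output at 1500/second: {days_at_new_rate:.1f}")
```

Trying every key is the worst case; on average an attacker would find the key after searching about half of them, which does not change the conclusion.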
Breaking codes
All of the examples in this section are taken from ciphers developed hundreds, if not thousands, of years ago. The most recent technique discussed was first published in 1918, well before the invention of programmable electronic computers. Modern encryption is far more complex. Whether simple or complicated, however, almost all encryption methods use substitution or transposition, or both.
Substitution codes
There is a simple code used in the movie 2001: A Space Odyssey. The computer in this movie, named HAL, was really IBM. To encipher a message in this code, just substitute each letter of the alphabet with the one preceding it in the alphabet (B becomes A; C becomes B . . . A becomes Z). Here's what the previous paragraph looks like in the HAL ciphertext:
SGDQD HR Z RHLOKD BNCD TRDC HM SGD LNUHD Z ROZBD NCXRRDX SGD BNLOTSDQ HM SGHR LNUHD MZLDC GZK VZR QDZKKX HAL SN AQDZJ SGHR BNCD ITRS RTARSHSTSD DZBG KDSSDQ NE SGD ZKOGZADS VHSG SGD NMD ADGHMC HS HM ZKOGZADS Z ADBNLDR Y SGHR HR Z UZQHZMS NE NMD NE SGD NKCDRS JMNVM BNCDR TRDC AX ITKHTR BZDRZQ BZDRZQ R BNCD RGHESDC DZBG KDSSDQ SGQDD SN SGD KDES HMRSDZC NE ITRS NMBD
This code is easy to use, and easy for your friends to decipher. Unfortunately, it is also extremely easy for your enemies to decipher. Because each letter is replaced by the one before it, the HAL code uses the simple formula Ciphertext = Plaintext - 1.
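The HAL code is small enough to sketch directly. The function names below are illustrative; the only rule taken from the text is that each letter is replaced by the one preceding it in the alphabet, with A wrapping around to Z.

```python
import string

ALPHABET = string.ascii_uppercase


def shift(text, offset):
    """Move every letter by `offset` places, wrapping around the alphabet."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + offset) % 26])
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)


def encipher(plaintext):
    """HAL code: each letter becomes the one before it (B becomes A)."""
    return shift(plaintext, -1)


def decipher(ciphertext):
    """Reverse the HAL code: each letter becomes the one after it."""
    return shift(ciphertext, +1)


if __name__ == "__main__":
    print(encipher("IBM"))        # the movie's computer name
    print(decipher("SGD BNCD"))   # two words from the ciphertext above
```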
Frequency patterns
Nevertheless, if you are new to codes, cracking this simple cipher is a good way to understand the importance of producing ciphertext that looks like random letters or numbers. The HAL cipher has a discernible nonrandom pattern that makes it vulnerable to attack. Here's how to analyze this passage. Count the number of times each letter appears in the ciphertext. These frequency patterns are an important clue. Here is a partial distribution of letters in the above passage:
Five most common letters
Letter        D    S    R    Z    H
Occurrences   46   32   25   23   20

Five least common letters
Letter        I    U    V    J    Y
Occurrences   3    3    3    2    1
You probably can see at a glance that this is not the normal distribution of letters in English words. In the game of Scrabble, there are 12 E tiles, more than any other letter. Why is E not in the top five? And why does Z appear eight times more frequently than I? If the 390 letters were distributed randomly, each block of five would appear 75 times. Instead, the top five letters appear 146 times while the bottom five appear only 12 times. The human meaning behind this code betrays itself by its non-random distribution of letters. It is reasonable to guess that D stands for E, since E is the most common letter in English.
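Counting letters by hand is tedious, so here is a minimal sketch of the same frequency analysis in Python. The function name and the short sample string are illustrative; to reproduce the table, pass in the full ciphertext instead.

```python
from collections import Counter


def letter_counts(text):
    """Count how often each letter A-Z appears, ignoring everything else."""
    return Counter(ch for ch in text.upper() if "A" <= ch <= "Z")


# If 390 letters were spread evenly over 26 letters, any group of
# five letters would account for this many of them.
expected_per_five = 390 * 5 / 26

if __name__ == "__main__":
    sample = "SGDQD HR Z RHLOKD BNCD TRDC HM SGD LNUHD"
    counts = letter_counts(sample)
    print(counts.most_common(5))
    print(expected_per_five)
```

The most common ciphertext letter is a natural first guess for E, which is the reasoning applied to D above.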
The second most common letter in English is T. While this is not as obvious a guess, let's try it anyway. Here is what the passage looks like now:
TGEQE HR Z RHLOKE BNCE TREC HM TGE LNUHE Z ROZBE NCXRREX TGE BNLOTTEQ HM TGHR LNUHE MZLEC GZK VZR QEZKKX HAL TN AQEZJ TGHR BNCE ITRT RTARTHTTTE EZBG KETTEQ NE TGE ZKOGZAET VHTG TGE NME AEGHMC HT HM ZKOGZAET Z AEBNLER Y TGHR HR Z UZQHZMT NE NME NE TGE NKCERT JMNVM BNCER TREC AX ITKHTR BZERZQ BZERZQ R BNCE RGHETEC EZBG KETTEQ TGQEE TN TGE KEET HMRTEZC NE ITRT NMBE
Breaking codes with Microsoft Office

A. Microsoft Word. Almost any word processor can be used to search and replace ciphertext to crack a substitution code. However, after you have partially solved a message, it can be difficult to avoid accidentally replacing the plaintext letters as you continue to search for ciphertext. For example, after you have replaced all the D's with E's in the example exercise, you want to be able to substitute F for E without also changing the real E's to F's. Word 7 allows you to search and replace by format, so you can replace regular text with bold text as we have done in the examples in this booklet.
1. Go to Edit-Replace.
2. Click on the More button to expand the dialog box.
3. Enter D in the Find What box.
4. Select the D in the Find What box and click the Format button.
5. Select Font and choose regular. Click OK.
6. Enter E in the Replace With box and click the Format button.
7. Select Font and choose bold. Click OK.
B. Microsoft Excel. The MID function will break up any text string into one character per cell: MID($A$1,B1,1). The original text goes into cell A1; column B contains the numbers from one to the largest estimated number of characters in A1. The number of E's can be counted with the statement COUNTIF(C1:C1000,"e").
The very frequent appearance of TGE suggests that this may be the most common three-letter word in English, "the." Let's try it:
THEQE HR Z RHLOKE BNCE TREC HM THE LNUHE Z ROZBE NCXRREX THE BNLOTTEQ HM THHR LNUHE MZLEC HZK VZR QEZKKX HAL TN AQEZJ THHR BNCE ITRT RTARTHTTTE EZBH KETTEQ NE THE ZKOHZAET VHTH THE NME AEHHMC HT HM ZKOHZAET Z AEBNLER Y THHR HR Z UZQHZMT NE NME NE THE NKCERT JMNVM BNCER TREC AX ITKHTR BZERZQ BZERZQ R BNCE RHHETEC EZBH KETTEQ THQEE TN THE KEET HMRTEZC NE ITRT NMBE
This is getting easier and easier. Look at the word THQEE near the end of the passage. Since this can only be THREE, substitute R for Q:
THERE HR Z RHLOKE BNCE TREC HM THE LNUHE Z ROZBE NCXRREX THE BNLOTTER HM THHR LNUHE MZLEC HZK VZR REZKKX HAL TN AREZJ THHR BNCE ITRT RTARTHTTTE EZBH KETTER NE THE ZKOHZAET VHTH THE NME AEHHMC HT HM ZKOHZAET Z AEBNLER Y THHR HR Z UZRHZMT NE NME NE THE NKCERT JMNVM BNCER TREC AX ITKHTR BZERZR BZERZR R BNCE RHHETEC EZBH KETTER THREE TN THE KEET HMRTEZC NE ITRT NMBE
You can see by now how each substitution contributes to the solution, like unraveling a sweater by continuing to pull one thread. There are many ways to solve this puzzle, but one of the best is to attack the one-letter words, A, I and (among poets) O. Trying A for Z produces this:
THERE HR A RHLOKE BNCE TREC HM THE LNUHE A ROABE NCXRREX THE BNLOTTER HM THHR LNUHE MALEC HAK VAR REAKKX HAL TN AREAJ THHR BNCE ITRT RTARTHTTTE EABH KETTER NE THE AKOHAAET VHTH THE NME AEHHMC HT HM AKOHAAET A AEBNLER Y THHR HR A UARHAMT NE NME NE THE NKCERT JMNVM BNCER TREC AX ITKHTR BAERAR BAERAR R BNCE RHHETEC EABH KETTER THREE TN THE KEET HMRTEAC NE ITRT NMBE
By now, you may be able to figure out longer words or entire phrases, like this one:
EABH KETTER NE THE AKOHAAET
The algorithm and the key
Trial and error will solve ciphers of this type fairly quickly. But a faster way is to crack their mathematical secrets. Ciphers have two basic secrets, the algorithm and the key. In the case of the HAL code, the algorithm is: Ciphertext Letter + X = Plaintext Letter. The key is 1. Once you know that X=1, you can decipher hundreds of pages of ciphertext easily. Your computer can do it in the blink of an eye. You don't have to puzzle out half-deciphered words like AKOHAAET (alphabet).
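The rule Ciphertext Letter + X = Plaintext Letter translates almost directly into code. This is a minimal sketch under the assumption that only the 26 capital letters are shifted and that Z wraps around to A; the name `shift_decipher` is illustrative.

```python
import string

ALPHABET = string.ascii_uppercase

def shift_decipher(ciphertext, key):
    """Apply 'ciphertext letter + key = plaintext letter', wrapping Z back to A."""
    out = []
    for ch in ciphertext.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)

print(shift_decipher("HAL", 1))       # IBM
print(shift_decipher("ZKOGZADS", 1))  # ALPHABET
```

If the key is unknown, calling the function with every key from 1 to 25 and reading the results is a quick way to find it.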
Caesar's code
The HAL substitution cipher is one of the oldest in the world. It was used by Julius Caesar to send orders and messages to his legions more than 2,000 years ago. Caesar used the key of three (A=D, B=E, etc.). Like a master passkey, the key to a cipher opens all of its doors. It is possible to make up substitution ciphers that are considerably harder to crack than these very easy examples. One thing you can do is hide word lengths by putting all the text in five-character blocks, like this:
ITRTR TARTH TTTEE ABHKE TTERN ETHEA KOHAA ETVHT HTHEN MEAEH HMCHT
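Both ideas in this passage, Caesar's shift of three and the five-character blocks, can be sketched in a few lines of Python. The function names and the sample phrase are assumptions made for illustration.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar_encipher(plaintext, key=3):
    """Shift each letter forward by `key` places (A becomes D when key is 3)."""
    letters = [ch for ch in plaintext.upper() if ch in ALPHABET]
    return "".join(ALPHABET[(ALPHABET.index(ch) + key) % 26] for ch in letters)

def five_letter_blocks(text):
    """Drop everything except letters, then regroup them five at a time."""
    letters = "".join(ch for ch in text.upper() if ch in ALPHABET)
    return " ".join(letters[i:i + 5] for i in range(0, len(letters), 5))

print(five_letter_blocks(caesar_encipher("EACH LETTER")))  # HDFKO HWWHU
```

Because the spaces are discarded before regrouping, the blocks give no hint of where the original words began or ended.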
Another way to make a better substitution cipher is to use random numbers instead of simply rotating through the alphabet:
A  B  C  D  E  F  G  H  I  J  K  L  M
9  1  13 15 21 5  7  2  8  26 10 18 14
N  O  P  Q  R  S  T  U  V  W  X  Y  Z
23 3  22 16 24 25 20 11 4  6  19 12 17
Breaking this cipher would require you to solve for all 26 letters, since there is no obvious pattern, such as A = B; C = D; E = F. . . . Here E is represented by the 21st letter of the alphabet, U. Interestingly enough, T is represented by the 20th letter, which is also T.
While this is a somewhat harder code to crack, a professional cryptographer would pounce on all those Us and Ts, even without a computer, using the underlying frequency pattern of the letters to tear this cipher open like a can of sardines.
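The random-number table can be turned into a working cipher as follows. This is a sketch, assuming each number in the table names the position in the alphabet of the ciphertext letter (so 21 means U); the helper names are illustrative.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase

# Number assigned to each plaintext letter A..Z, copied from the table above.
TABLE = [9, 1, 13, 15, 21, 5, 7, 2, 8, 26, 10, 18, 14,
         23, 3, 22, 16, 24, 25, 20, 11, 4, 6, 19, 12, 17]

# Turn the numbers into ciphertext letters: 9 -> I, 1 -> A, 13 -> M, ...
CIPHER_ALPHABET = "".join(ALPHABET[n - 1] for n in TABLE)

def encipher(plaintext):
    return plaintext.upper().translate(str.maketrans(ALPHABET, CIPHER_ALPHABET))

def decipher(ciphertext):
    return ciphertext.upper().translate(str.maketrans(CIPHER_ALPHABET, ALPHABET))

secret = encipher("THE LETTER")
print(secret)                                        # TBU RUTTUX
print(Counter(secret.replace(" ", "")).most_common(2))
```

Even in this two-word sample, T and U dominate the count, which is the weakness the paragraph above describes.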
Hiding letter frequencies with the Vigenere cipher

An adaptation of the Caesar Code that does hide letter frequency is the Vigenere cipher, which uses the following table to produce multiple one-letter keys:
  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
B B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
C C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
D D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
E E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
F F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
G G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
H H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
I I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
J J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
K K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
L L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
M M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
N N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
O O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
P P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
Q Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
R R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
S S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
T T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
U U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
V V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
W W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
X X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
Y Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
Z Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
The first row uses a Caesar shift of 0; the second a shift of 1; and the last a shift of 25. To use this table, first choose a keyword or phrase such as "heartburn."
Next, write the keyword above the message without spaces, repeating it as necessary. Finally, encrypt each letter of the message, "To be or not to be," by locating the intersection of the plaintext letter and the keyword letter:
Keyword:    H E A R T B U R N H E A R
Plaintext:  T O B E O R N O T T O B E
Ciphertext: A S B V H S H F G A S B V
The Vigenere cipher is considerably more difficult to break than single-key substitution ciphers, especially if all you have to attack it with is a pencil and paper. Notice, for example, that the two Hs do not stand for the same letter (the first is O, the second N), while the plaintext letter O is enciphered once as S, once as H and once as F. On the other hand, the two Bs do stand for the same letter, which happens to be B itself. Standard frequency analysis as we used with the HAL cipher will not reveal the Es, Ts, As, Os and Ns.
However, this cipher, which is attributed to Blaise de Vigenere in the 16th century, was demolished by modern cryptographers long ago. The Vigenere cipher does not produce truly random ciphertext because it repeats itself every time the key repeats. In the example above, TOBE falls under the same keyword letters both times, so ASBV appears twice. You can see how a two-letter key produces more repetition, and less security, than a nine-letter key.
U.S. export laws measure the strength of encryption programs primarily by looking at the maximum key length supported by the program. It is illegal to export encryption with keys that are longer than 40 bits without a special export license.
The secret to breaking the Vigenere cipher is learning the length of the key. Suppose you discover that the key is nine characters long. Then you can analyze every ninth letter just as you did when you broke the HAL cipher. In other words, the letters in the series 1, 10, 19, 28 will make up a set of letters where the most common and least common letters will reveal themselves. Similar analysis can be performed on letters 2, 11, 20, 29 and so on.
But how do you discover the length of the key? Applying something called the Index of Coincidence can do this. You do this by splitting the ciphertext into two blocks and counting the times that the letter in the upper block is the same as the letter in the lower block. In English plaintext, this coincidence will occur between six and seven percent of the time. On the other hand, randomly chosen letters will match up only about 1/26, or just under four percent, of the time. Then you try the same thing again, shifting the letters by one more position each time. The result of your comparisons is a table that looks like this one:
Shift   Index
1       2.80%
2       4.50%
3       3.40%
4       3.70%
5       4.20%
6       3.50%
7       0.32%
8       3.20%
9       7.00%
10      3.10%
11      4.20%
A shift of nine produces an index close to that of English, meaning that the key used in the Vigenere cipher had nine letters. Knowing the length of the key, you can extract every ninth letter and treat the group as a simple substitution cipher. The Vigenere cipher disguises, but does not ultimately hide, the underlying letter frequency patterns of English.
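The Vigenere procedure and the shift-and-count test for key length can be sketched in Python. This is one reading of the method, assuming the comparison is made between the ciphertext and a copy of itself moved along by the shift; the function names are illustrative, and a message this short is far too small for a reliable result in practice.

```python
import string

ALPHABET = string.ascii_uppercase

def vigenere_encipher(plaintext, keyword):
    """Shift each letter by the matching keyword letter (A=0, B=1, ... Z=25)."""
    letters = [ch for ch in plaintext.upper() if ch in ALPHABET]
    key = keyword.upper()
    out = []
    for i, ch in enumerate(letters):
        shift = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
    return "".join(out)

def coincidence_rate(ciphertext, shift):
    """Fraction of positions where the text matches a copy of itself moved by `shift`."""
    pairs = list(zip(ciphertext, ciphertext[shift:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b) / len(pairs)

def likely_key_length(ciphertext, max_shift=12):
    """Return the shift with the highest coincidence rate (a guess, not a proof)."""
    return max(range(1, max_shift + 1),
               key=lambda s: coincidence_rate(ciphertext, s))

secret = vigenere_encipher("TO BE OR NOT TO BE", "HEARTBURN")
print(secret)
print(likely_key_length(secret))
```

On longer messages, multiples of the true key length also score well, so the smallest strong peak is usually the one to try first.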
Transposition codes
Substitution is a key ingredient in all cryptography; a second equally important one is transposition. Here is a simple way to transpose the same plaintext message we used before. Take a piece of graph paper and write the message, one character per box. (Or, write the message in a spreadsheet such as Microsoft Excel, one character per cell.) Then cipher the message by copying it vertically. (In Excel, use Paste Special with the Transpose box checked.)
Plaintext:
THERE IS A SI
SIMPLE CODE U
SED IN THE MO
VIE 2001: A S
PACE ODYSSEY.
  THE COMPUTE
R IN THIS MOV
IE, NAMED  HA
L,  WAS REALL
Y  IBM.  TO B
REAK THIS COD
E, JUST SUBST
ITUTE EACH LE
TTER OF THE A
LPHABET WITH 
THE ONEZZZZZZ
Ciphertext:
TSSVP RILYREITLT
HIEIA  E, E,TTPH
EMDECTI,  A UEHE
RP  EHN  IKJTRA 
ELI2 E NWB UE BO
 EN0O TAAMTS OEN
I  0DCHMS.HTEFTE
SCT1YOIE  I A  Z
 OH:SMSDR SSCTWZ
ADE SP  ET UHHIZ
 E AEUM AOCB ETZ
S M YTOHL OSL HZ
IUOS.EVALBDTEA Z
To attack a transposition cipher, begin as you did with the Caesar cipher, by analyzing the letter frequency:
A  B  C  D  E  F  G  H  I  J  K  L  M
11 4  5  5  24 1  0  11 11 1  1  6  6
N  O  P  Q  R  S  T  U  V  W  X  Y  Z
4  9  4  0  5  12 17 5  2  2  0  3  6
The table shows that E appears 24 times, T appears 17 times, and that G, Q and X do not appear at all. This conforms both to our Scrabble-playing experience and to the known incidence of letters in English writing, which are listed here in order from the most common to the least:
Standard English: ETAONRISHDLFCMUGPYWBVKXJQZ
Ciphertext:       ETSAHIOLMCDRUBNPYVWFJKGQXZ
Even a small amount of ciphertext (in this case, just 200 characters) is enough to spot the telltale frequency patterns of the English language. This letter frequency strongly suggests a transposition, rather than a substitution cipher.
A known plaintext attack

Transposition ciphers can be hard to crack, especially if you are armed only with graph paper. But in this example, you have a big advantage. You can launch a Known Plaintext Attack against the message. This means you know both the plaintext and the ciphertext of a single paragraph taken from this primer. You can use that knowledge to recover the key, which you can use to decode not just one paragraph, but the entire primer. Knowing that the first words of the message are "There is a simple code," count the characters between the first instance of a T and an H. There are 16. Now count 16 characters from the H. Sure enough, you come to an E. If you count 16 characters from the E, you find an R. Clearly, this is no coincidence. Create a box on your graph paper that is 16 boxes wide. Write the ciphertext, one character per box. Reading downward instead of across exposes the plaintext.
What if you don't know any plaintext? Well, then you have to use some trial and error, but it's not as difficult as you might think. A dedicated code-breaking program can solve much more difficult transposition ciphers than this at the speed of light, but even commonly available tools such as Microsoft Word can automate your cryptanalysis considerably. In Word, use the Text to Table and Table to Text commands to try out different numbers of columns until you find the right one. If you know how to record macros, you can easily print out page after page of possible tables.
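The graph-paper transposition and the known plaintext attack on it can be sketched in Python. The function names are illustrative, and the short message and grid width of 8 are assumptions chosen to keep the example small; in the primer's example the spacing that the attack recovers is 16.

```python
def transpose_encipher(plaintext, width, pad="Z"):
    """Write the text in rows of `width`, then read it off column by column."""
    while len(plaintext) % width:
        plaintext += pad  # fill the last row, as the primer does with Zs
    rows = [plaintext[i:i + width] for i in range(0, len(plaintext), width)]
    return "".join("".join(row[c] for row in rows) for c in range(width))

def transpose_decipher(ciphertext, height):
    """Write the ciphertext in rows of `height` and read downward."""
    rows = [ciphertext[i:i + height] for i in range(0, len(ciphertext), height)]
    return "".join("".join(row[c] for row in rows) for c in range(height))

def find_row_length(ciphertext, known_start):
    """Known plaintext attack: find the spacing at which the known opening letters appear."""
    for gap in range(1, len(ciphertext)):
        positions = [i * gap for i in range(len(known_start))]
        if positions[-1] >= len(ciphertext):
            break
        if all(ciphertext[p] == ch for p, ch in zip(positions, known_start)):
            return gap
    return None

message = "THERE IS A SIMPLE CODE USED IN THE MOVIE"  # 40 characters
secret = transpose_encipher(message, width=8)
gap = find_row_length(secret, "THERE")
print(gap)
print(transpose_decipher(secret, gap))
```

Without known plaintext, the same `transpose_decipher` call can simply be tried with every plausible height until readable text appears.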
Known plaintext attacks are very common in the real world of espionage. A close relative is the chosen plaintext attack, where the code breaker can send a message and compare it to the resulting ciphertext. A famous example of a chosen plaintext attack occurred in World War II. The Americans had broken most but not all of a Japanese code. From this, they knew that the Japanese Navy was moving toward a U.S. island, but they didn't know which one. So U.S. naval intelligence sent a fabricated message, knowing that it would be intercepted. The message said that Midway Island was desperately short of water. Japanese spies sent a coded message home, "__________ is short of water." By intercepting this new message, the Americans confirmed the name of the island toward which the Japanese were steaming: Midway.
There are numerous ways that plaintext can fall into the wrong hands. Suppose you tell someone in a field office something about the competition. Isn't it likely that your information will be encrypted and sent to headquarters for evaluation? Or perhaps a subordinate of the company is asked to send an innocuous message by electronic mail as a favor to a friendly salesperson. The result: known plaintext. Finally, there are out-of-bounds attacks, including rubber-hose cryptanalysis (threats and bribes) to obtain not only plaintext, but also encryption keys and other secrets.
Contemporary Encryption
Algorithms and key length
You have seen how the 26 letters of the alphabet can be substituted and transposed to create ciphertext. Computers do not operate directly on the letters of the alphabet. Instead, they manipulate binary numbers (ones and zeros) that represent data. In modern encryption, substitution and transposition operate directly on these binary digits, or bits. In the following example, each of the last three bits of the number is exchanged for its opposite (ones become zeros; zeros become ones):
Plaintext  Decimal Equivalent  Binary Equivalent  Binary Ciphertext  Decimal Ciphertext  Ciphertext
I          73                  01001001           01001110           78                  N
B          66                  01000010           01000101           69                  E
M          77                  01001101           01001010           74                  J
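Exchanging the last three bits for their opposites is the same as an exclusive-or (XOR) with the binary value 00000111. The sketch below reproduces the table in Python; the function name `flip_low_bits` is an illustrative assumption.

```python
def flip_low_bits(text, mask=0b00000111):
    """XOR each character code with the mask, inverting the last three bits."""
    return "".join(chr(ord(ch) ^ mask) for ch in text)

# Print one row per letter: plaintext, decimal, binary, binary ciphertext,
# decimal ciphertext, ciphertext letter.
for ch in "IBM":
    flipped = ord(ch) ^ 0b111
    print(ch, ord(ch), format(ord(ch), "08b"),
          format(flipped, "08b"), flipped, chr(flipped))

print(flip_low_bits("IBM"))                 # NEJ
print(flip_low_bits(flip_low_bits("IBM")))  # IBM
```

Applying the same operation twice restores the original, so in this toy scheme one function both enciphers and deciphers.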
It hardly needs saying that numbers of these sizes, and the computing power necessary to manipulate them, have resulted in ciphers that are almost unimaginably more complex than the manual ciphers studied in this pamphlet. At the end of the day, however, these machine-produced ciphers use principles of substitution and transposition that have their roots in classical cryptography. And, since the code breakers have the same advanced tools that are available to the code makers, it is fair to say that the game is still definitely afoot. A cardinal principle of modern cryptography is that all security should rest in the key. This means that the inner workings of an algorithm such as DES can be studied and discussed publicly and still produce ciphertext that is unreadable to anyone who does not have the key.
Symmetric and public keys

In all of the ciphers we have experimented with, the key used to encrypt and decrypt was the same. This is called symmetric key encryption or secret key encryption. If you and I want to use symmetric key encryption, we must agree on a single key and keep it secret. If you let someone see your key, then my security is compromised along with yours.
Asymmetric, or public key, encryption uses two pairs of keys, one pair for each of us, and each pair consists of one public key and one private key. I give you my public key so you can send me an encrypted message. I use my private key to read your message. In turn, you give me your public key so I can reply to your message. You use your private key to decipher my message. The advantage of this approach is that if either public key is intercepted, it can only be used to send an encrypted message. It cannot be used to decipher a message. Furthermore, if I give away my private key, it does not affect your security.
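To make the two-key idea concrete, here is a toy sketch using textbook RSA arithmetic. RSA is not named in this primer and is chosen here only as one well-known public key method; the tiny primes make this key pair trivially breakable, and all names are illustrative.

```python
# Toy key pair built from two tiny primes (illustration only, not secure).
p, q = 61, 53
n = p * q                  # the modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, chosen to share no factor with phi
d = pow(e, -1, phi)        # private exponent: modular inverse (Python 3.8+)

public_key = (e, n)        # safe to hand out
private_key = (d, n)       # kept secret by its owner

def apply_key(number, key):
    """Raise the number to the key's exponent, modulo the key's modulus."""
    exponent, modulus = key
    return pow(number, exponent, modulus)

message = 73                                 # the letter I as a number
secret = apply_key(message, public_key)      # anyone can encrypt
print(apply_key(secret, private_key))        # only the key owner recovers 73
```

Intercepting `public_key` lets an eavesdropper encrypt new messages but not read `secret`, which is the property the paragraph above describes.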
Public keys are more convenient than symmetric keys, since they can be exchanged freely, even over insecure electronic mail.
Authentication
Authentication guarantees that a message has not been altered along the way. It is similar to watermarks, pin printing and other measures used to protect paper checks. Authentication works by creating a message digest, derived mathematically from the message itself. This digest accompanies the electronic mail. At the receiving end, another digest is created from the message and compared to the first one. If the two digests are identical, it means that the message has not been changed. Hifn's advanced coprocessors perform compression, encryption and authentication at the chip level for maximum efficiency and performance.
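The digest comparison can be sketched with Python's standard `hashlib` module. SHA-256 is an assumption made for this example, since the primer does not name a digest algorithm, and the message text is invented.

```python
import hashlib

def digest(message: bytes) -> str:
    """Derive a fixed-length fingerprint from the message contents."""
    return hashlib.sha256(message).hexdigest()

original = b"Ship 40 units to the field office on Friday."
sent_digest = digest(original)          # travels with the message

# At the receiving end, a fresh digest is computed and compared.
received = b"Ship 40 units to the field office on Friday."
tampered = b"Ship 90 units to the field office on Friday."

print(digest(received) == sent_digest)  # True: message unchanged
print(digest(tampered) == sent_digest)  # False: message was altered
```

Practical systems usually combine the digest with a secret key so that someone who alters the message cannot simply recompute a matching digest; that step is omitted here.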
Conclusion
The Internet's building blocks
The Internet's explosive growth depends upon the solidity and safety of its underlying structure. The related technologies of compression and encryption are crucial building blocks that go into the switches, routers, bridges and other computer networking equipment.
LZS compression, the de facto standard

If you buy networking equipment for your business, such as a router to connect to your Internet Service Provider, you can save money by making sure that it contains LZS compression, the de facto standard. Most hardware and software manufacturers use either LZS or its licensed variant, MPPC. Just as a package that weighs less than another one costs less to ship, so compressed data costs less than uncompressed data to send along the Internet. Even a relatively small amount of compression can avoid a significant extra expense. Some kinds of files, such as HTML web pages and electronic mail, are very compressible. Previously compressed or encrypted files can be compressed only a little bit or not at all.
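The difference between compressible and incompressible data is easy to measure. LZS is not available in Python's standard library, so this sketch substitutes the `zlib` module (DEFLATE) as a stand-in; the sample data and the function name are assumptions, and pseudo-random bytes stand in for encrypted or already compressed files.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher means more compressible)."""
    return len(data) / len(zlib.compress(data))

# Repetitive markup, loosely resembling an HTML table.
html_like = b"<tr><td>compression</td><td>encryption</td></tr>\n" * 200

# Pseudo-random bytes of the same length, with a fixed seed for repeatability.
rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(len(html_like)))

print(round(compression_ratio(html_like), 1))
print(round(compression_ratio(random_like), 2))
```

The repetitive sample shrinks dramatically, while the random-looking sample stays at roughly its original size or grows slightly.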
Encryption: Essential for the Internet's growth

Encryption is crucial if you plan to buy or sell on the Internet. While standards are not as clear as in the case of compression, three emerging security protocols are SSL, IPSec and PPP. Each of these standards supports data compression, which must always occur before encryption. Authentication is a closely related function that protects electronic messages so you can be sure they have not been altered.
Encryption uses substitution and transposition to hide meaning. These techniques are at least as old as Caesar's Roman legions. Modern computer-assisted encryption methods produce ciphertext that has no discernible pattern. This ciphertext cannot be compressed because the characters appear to be randomly chosen. Known frequency patterns of the English language, including the index of coincidence, are completely disguised. Modern cryptanalysts are frequently able to mount a known plaintext attack. Strong encryption, which includes key lengths of 56 or more bits, should be able to resist such an attack.
Security is not the only design goal of modern computer systems. Efficiency and performance are also crucial. Hifn's latest generation of products integrates compression, encryption and authentication on a single chip.