
DATA COMPRESSION AND ENCRYPTION

Written By

Vivek Laxmikant Shukla

Preface: This paper solution will help students of Mumbai University understand the concepts and how they can be applied to problem solving. The past papers help students become familiar with the types of questions asked in the university exams.

Note: This paper solution is not a substitute for a standard textbook. Please refer to standard textbooks to gain a deep conceptual understanding of each topic before using this book.


MAY 2013

Module No. 1

1). Briefly explain modeling and coding with respect to data compression.
Ans: Data Compression = Modeling + Coding

Data compression consists of taking a stream of symbols and transforming them into codes. The decision to output a certain code for a certain symbol is based on a model. The model is simply a collection of data and rules used to process input symbols and determine which code to output. The coder generates the codes themselves. In short, a program uses the model to estimate the probability of each symbol and a coder to produce an appropriate code based on those probabilities.

The following example illustrates the concept. Consider a Huffman encoder. Broken into its parts, it consists of a model followed by a coder: input symbols -> model -> probabilities -> coder -> output stream.

This is a statistical model for a Huffman encoder. The model generates a set of probabilities, which the coder then uses to generate the output stream. A symbol with a high probability is coded with very few bits, whereas a symbol with a low probability is coded with more bits. Thus compression is made up of two parts:
- Modeling
- Coding

Different modeling techniques can all use the same coding technique to produce output. The types of modeling generally used for lossless compression are:
- Statistical modeling
- Dictionary-based modeling
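The split between model and coder can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the model is an order-0 frequency count, and instead of a real coder the sketch reports the ideal code length -log2(p) that a coder would aim for. The function names are hypothetical.

```python
from collections import Counter
from math import log2

def build_model(data):
    """Statistical model (assumed order-0): estimate each symbol's probability."""
    counts = Counter(data)
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()}

def ideal_code_lengths(model):
    """What the coder aims for: about -log2(p) bits for a symbol of probability p."""
    return {sym: -log2(p) for sym, p in model.items()}

model = build_model("aaaabbcd")
for sym, bits in sorted(ideal_code_lengths(model).items()):
    print(sym, model[sym], bits)
```

Frequent symbols get short codes and rare symbols get long ones, which is exactly the behaviour described for the Huffman encoder.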

2). Explain the principle of public key cryptography.


Ans: Private key (symmetric) cryptography uses a single secret key shared by both sender and receiver. If the key is disclosed, communication is compromised. Public key cryptography was introduced to eliminate this problem. It is also called asymmetric encryption because the parties are not equal: those who can encrypt messages cannot decrypt them.

Principle: Public key cryptography is a form of cryptosystem in which encryption and decryption are performed using different keys:
- A public key
- A private key

The public key may be known by anybody and is used to encrypt messages and verify signatures. The private key is known only to its owner (the recipient) and is used to decrypt messages and create signatures.

For example, the plaintext is transformed into ciphertext using an encryption algorithm and the recipient's public key. Using the paired private key and the decryption algorithm, the recipient recovers the plaintext.

Applications:
- Encryption/decryption
- Digital signatures
- Key exchange

3). Explain the working principle of arithmetic coding with a suitable example.

Ans: Arithmetic coding is a form of entropy coding used in lossless data compression. One drawback of Huffman coding is that it assigns an integer number of bits to each individual symbol, which adds some coding redundancy. Arithmetic coding overcomes this drawback by assigning one long code to a whole string of symbols instead of assigning codes to individual symbols. Arithmetic coding is also based on a probability model of the symbols to be encoded. Coding starts with an interval assigned to the first symbol, which is narrowed as further symbols are added. The interval obtained when the last symbol has been encoded identifies the compressed data. The process of encoding and decoding will be clear from the following example.

Example: Consider a source S = {S1, S2, S3} with P(S1) = 0.80, P(S2) = 0.02, P(S3) = 0.18. Let us encode the sequence {S1 S3 S2 S1}.

Step 1: Map {S1 S3 S2 S1} -> {1 3 2 1} and calculate the cumulative distribution function:
$F_X(k) = 0$ for $k \le 0$
$F_X(1) = 0.8$
$F_X(2) = 0.82$
$F_X(3) = 1.0$

Step 2: Initialization: $l^{(0)} = 0$, $u^{(0)} = 1$.

Step 3: The 1st term in the sequence is 1.
$l^{(1)} = l^{(0)} + (u^{(0)} - l^{(0)}) F_X(0) = 0 + (1 - 0)(0) = 0$
$u^{(1)} = l^{(0)} + (u^{(0)} - l^{(0)}) F_X(1) = 0 + (1 - 0)(0.8) = 0.8$
i.e. the tag is contained in the interval [0, 0.8).

Step 4: The 2nd term in the sequence is 3.
$l^{(2)} = l^{(1)} + (u^{(1)} - l^{(1)}) F_X(2) = 0 + (0.8 - 0)(0.82) = 0.656$
$u^{(2)} = l^{(1)} + (u^{(1)} - l^{(1)}) F_X(3) = 0 + (0.8 - 0)(1) = 0.8$
i.e. the tag is contained in the interval [0.656, 0.8).

Step 5: The 3rd term in the sequence is 2.
$l^{(3)} = l^{(2)} + (u^{(2)} - l^{(2)}) F_X(1) = 0.656 + (0.8 - 0.656)(0.8) = 0.7712$
$u^{(3)} = l^{(2)} + (u^{(2)} - l^{(2)}) F_X(2) = 0.656 + (0.8 - 0.656)(0.82) = 0.77408$
i.e. the tag is contained in the interval [0.7712, 0.77408).

Step 6: The 4th term in the sequence is 1.
$l^{(4)} = l^{(3)} + (u^{(3)} - l^{(3)}) F_X(0) = 0.7712 + (0.77408 - 0.7712)(0) = 0.7712$
$u^{(4)} = l^{(3)} + (u^{(3)} - l^{(3)}) F_X(1) = 0.7712 + (0.77408 - 0.7712)(0.8) = 0.773504$
i.e. the tag is contained in the interval [0.7712, 0.773504).

Step 7: The tag for the sequence 1 3 2 1 can be generated as the midpoint of the final interval:
$T_X(1321) = \frac{l^{(4)} + u^{(4)}}{2} = \frac{0.7712 + 0.773504}{2} = 0.772352$

Deciphering the tag: Let us decode the tag 0.772352.

Step 1: Initialization: $l^{(0)} = 0$, $u^{(0)} = 1$.

Step 2:
$l^{(1)} = l^{(0)} + (u^{(0)} - l^{(0)}) F_X(x_1 - 1) = 0 + (1 - 0) F_X(x_1 - 1) = F_X(x_1 - 1)$
$u^{(1)} = l^{(0)} + (u^{(0)} - l^{(0)}) F_X(x_1) = 0 + (1 - 0) F_X(x_1) = F_X(x_1)$

Candidate intervals $[F_X(x_1 - 1), F_X(x_1))$:
x1 = 1: [0, 0.8)

As 0.772352 lies in the interval [0, 0.8), pick x1 = 1.

Step 3:
$l^{(2)} = l^{(1)} + (u^{(1)} - l^{(1)}) F_X(x_2 - 1) = 0 + (0.8 - 0) F_X(x_2 - 1) = 0.8 F_X(x_2 - 1)$
$u^{(2)} = l^{(1)} + (u^{(1)} - l^{(1)}) F_X(x_2) = 0 + (0.8 - 0) F_X(x_2) = 0.8 F_X(x_2)$

Candidate intervals $[0.8 F_X(x_2 - 1), 0.8 F_X(x_2))$:
x2 = 1: [0, 0.64)
x2 = 2: [0.64, 0.656)
x2 = 3: [0.656, 0.8)

As 0.772352 lies in the interval [0.656, 0.8), pick x2 = 3, i.e. the tag is contained in the interval [0.656, 0.8).

Step 4:
$l^{(3)} = l^{(2)} + (u^{(2)} - l^{(2)}) F_X(x_3 - 1) = 0.656 + 0.144 F_X(x_3 - 1)$
$u^{(3)} = l^{(2)} + (u^{(2)} - l^{(2)}) F_X(x_3) = 0.656 + 0.144 F_X(x_3)$

Candidate intervals $[0.656 + 0.144 F_X(x_3 - 1), 0.656 + 0.144 F_X(x_3))$:
x3 = 1: [0.656, 0.7712)
x3 = 2: [0.7712, 0.77408)

As 0.772352 lies in the interval [0.7712, 0.77408), pick x3 = 2, i.e. the tag is contained in the interval [0.7712, 0.77408).

Step 5:
$l^{(4)} = l^{(3)} + (u^{(3)} - l^{(3)}) F_X(x_4 - 1) = 0.7712 + (0.77408 - 0.7712) F_X(x_4 - 1) = 0.7712 + 0.00288 F_X(x_4 - 1)$
$u^{(4)} = l^{(3)} + (u^{(3)} - l^{(3)}) F_X(x_4) = 0.7712 + 0.00288 F_X(x_4)$

Candidate intervals $[0.7712 + 0.00288 F_X(x_4 - 1), 0.7712 + 0.00288 F_X(x_4))$:
x4 = 1: [0.7712, 0.773504)

As 0.772352 lies in the interval [0.7712, 0.773504), pick x4 = 1, i.e. the tag is contained in the interval [0.7712, 0.773504).

The deciphered sequence is {1 3 2 1} -> {S1 S3 S2 S1}.
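The interval updates used in the encoding and decoding steps can be checked with a short Python sketch. It is a minimal illustration, not a production arithmetic coder: it uses exact fractions rather than the fixed-precision integer arithmetic a real coder would need, and the function names are hypothetical.

```python
from fractions import Fraction

def cdf_table(probs):
    """Cumulative distribution F_X(0..n) for symbols numbered 1..n."""
    table = [Fraction(0)]
    for p in probs:
        table.append(table[-1] + p)
    return table

def encode(seq, cdf):
    """Narrow [low, high) once per symbol; return the interval and its midpoint tag."""
    low, high = Fraction(0), Fraction(1)
    for x in seq:
        width = high - low
        low, high = low + width * cdf[x - 1], low + width * cdf[x]
    return low, high, (low + high) / 2

def decode(tag, cdf, n):
    """Recover n symbols by finding, at each step, the sub-interval holding the tag."""
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        width = high - low
        for x in range(1, len(cdf)):
            lo, hi = low + width * cdf[x - 1], low + width * cdf[x]
            if lo <= tag < hi:
                out.append(x)
                low, high = lo, hi
                break
    return out

cdf = cdf_table([Fraction("0.80"), Fraction("0.02"), Fraction("0.18")])
low, high, tag = encode([1, 3, 2, 1], cdf)
print(float(low), float(high), float(tag))
print(decode(tag, cdf, 4))  # [1, 3, 2, 1]
```

With the probabilities of the example, the final interval is [0.7712, 0.773504) and the midpoint tag is 0.772352, matching the hand calculation.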

4). Explain Chinese remainder theorem.


Ans: The Chinese remainder theorem (CRT) is one of the important theorems of number theory. It says that integers in a certain range can be reconstructed from their residues modulo a set of pairwise relatively prime moduli.

Theorem: Let $m_1, \dots, m_k$ be integers with $\gcd(m_i, m_j) = 1$ whenever $i \ne j$. Let $m$ be the product $m = m_1 m_2 \cdots m_k$. Let $a_1, \dots, a_k$ be integers. Consider the system of congruences:
$x \equiv a_1 \pmod{m_1}$
$x \equiv a_2 \pmod{m_2}$
...
$x \equiv a_k \pmod{m_k}$
Then there exists exactly one $x \in \mathbb{Z}_m$ satisfying this system.

The solution to the system may be obtained by the following algorithm:
(1) For each $i = 1, \dots, k$, let $z_i = m/m_i = m_1 m_2 \cdots m_{i-1} m_{i+1} \cdots m_k$.
(2) For each $i = 1, \dots, k$, let $y_i \equiv z_i^{-1} \pmod{m_i}$. (This is always possible because $\gcd(z_i, m_i) = 1$.)
(3) The solution to the system is $x \equiv a_1 y_1 z_1 + \dots + a_k y_k z_k \pmod{m}$.

Example: Use the Chinese remainder theorem to find all solutions in $\mathbb{Z}_{60}$ such that
$x \equiv 3 \pmod 4$
$x \equiv 2 \pmod 3$
$x \equiv 4 \pmod 5$
We solve this in steps.

Step 0: Establish the basic notation. In this problem we have $k = 3$, $a_1 = 3$, $a_2 = 2$, $a_3 = 4$, $m_1 = 4$, $m_2 = 3$, $m_3 = 5$, and $m = 4 \cdot 3 \cdot 5 = 60$.

Step 1: Implement step (1). $z_1 = m/m_1 = 60/4 = 3 \cdot 5 = 15$, $z_2 = 20$, and $z_3 = 12$.

Step 2: Implement step (2). We solve $z_i y_i \equiv 1 \pmod{m_i}$ for $i = 1, 2, 3$. In this problem, we need to solve
$15 y_1 \equiv 1 \pmod 4$
$20 y_2 \equiv 1 \pmod 3$
$12 y_3 \equiv 1 \pmod 5$
The $y_i$ can be computed using the tally table version of the generalized Euclidean algorithm (cf. Congruence Supplement). For example, in the first equation, the tally method solves $15 y_1 + 4t = 1$ for $y_1$ and $t$, and we find that $y_1 = 3$. Continuing, we find that $y_1 = 3$, $y_2 = 2$, and $y_3 = 3$.

Step 3: Implement step (3). $x \equiv a_1 y_1 z_1 + a_2 y_2 z_2 + a_3 y_3 z_3 \pmod{60}$. Substituting, we obtain $3 \cdot 3 \cdot 15 + 2 \cdot 2 \cdot 20 + 4 \cdot 3 \cdot 12 = 135 + 80 + 144 = 359$, which reduces to $x \equiv 59 \pmod{60}$.
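Steps (1) to (3) of the algorithm translate almost directly into Python. The sketch below is an illustration only; the function name `crt` is hypothetical, and it relies on Python 3.8+ for `math.prod` and for the three-argument `pow` with exponent -1, which computes a modular inverse in place of the tally table method.

```python
from math import prod

def crt(residues, moduli):
    """Solve x = a_i (mod m_i) for pairwise coprime moduli; return x in Z_m."""
    m = prod(moduli)
    x = 0
    for a, m_i in zip(residues, moduli):
        z = m // m_i          # step (1): z_i = m / m_i
        y = pow(z, -1, m_i)   # step (2): y_i = z_i^(-1) mod m_i
        x += a * y * z        # step (3): accumulate a_i * y_i * z_i
    return x % m

print(crt([3, 2, 4], [4, 3, 5]))  # 59
```

The result 59 satisfies all three congruences of the worked example: 59 mod 4 = 3, 59 mod 3 = 2 and 59 mod 5 = 4.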

5). Define the following terms:
- Compression Ratio
- Distortion
- Compression Rate
- Fidelity and Quality
- Self information

Ans:
Compression Ratio: Compression ratio is defined as the ratio between the uncompressed size and the compressed size:
Compression ratio = uncompressed size / compressed size
Thus a representation that compresses a 10 MB file to 2 MB has a compression ratio of 10/2 = 5, often written as an explicit ratio, 5:1 (read "five to one"), or as an implicit ratio, 5/1. Lossless compression techniques typically achieve lower compression ratios than lossy compression techniques.

Compression Rate: Compression rate is the rate of the compressed data (which we imagine to be transmitted in real time), for example in bits per sample or bits per second. Compression rate is an absolute term, while compression ratio is a relative term.

Distortion: In lossy compression, the reconstruction differs from the original data. Therefore, in order to determine the efficiency of a compression algorithm, we need some way of quantifying the difference. The difference between the original and the reconstruction is often called the distortion.

Fidelity and Quality: Fidelity and quality also describe the difference between the reconstruction and the original. When we say that the fidelity or quality of a reconstruction is high, we mean that the difference between the reconstruction and the original is small.

Self Information: Self information is a measure of the information content associated with the outcome of a random variable. It is expressed in a unit of information, for example bits, nats, or hartleys, depending on the base of the logarithm used in its calculation. It is given by:
$i(A) = \log_b \frac{1}{P(A)} = -\log_b P(A)$
where
A = an event, which is a set of outcomes of a random experiment
P(A) = probability that event A will occur
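Two of these definitions are easy to evaluate numerically. The following Python sketch (function names are hypothetical) computes the compression ratio of the 10 MB to 2 MB example and the self information of an event for a chosen logarithm base.

```python
from math import log, e

def compression_ratio(uncompressed_size, compressed_size):
    """Ratio of uncompressed size to compressed size (dimensionless)."""
    return uncompressed_size / compressed_size

def self_information(p, base=2):
    """i(A) = -log_b P(A); base 2 gives bits, base e nats, base 10 hartleys."""
    return -log(p, base)

print(compression_ratio(10, 2))      # 5.0, i.e. 5:1
print(self_information(0.5))         # about 1 bit
print(self_information(0.5, e))      # the same event measured in nats
```

Note how the less likely an event is, the larger its self information: an event with probability 1/8 carries about 3 bits.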

6). Explain Diffie Hellman Key Exchange with an example.


Ans: The Diffie-Hellman key exchange is based on the mathematical difficulty of computing discrete logarithms. The algorithm can be used only for key agreement, not for encryption or decryption of messages.

Algorithm:
1. Select two public numbers: (1) a prime number q and (2) an integer $\alpha$ that is a primitive root of q.
2. Suppose users A and B wish to exchange a key.
3. User A selects a random integer $X_A < q$ and computes $Y_A = \alpha^{X_A} \bmod q$.
4. User B selects a random integer $X_B < q$ and computes $Y_B = \alpha^{X_B} \bmod q$.
5. Each side keeps its X value private and makes its Y value available publicly to the other side.
6. User A computes the key as $K = (Y_B)^{X_A} \bmod q$.
7. User B computes the key as $K = (Y_A)^{X_B} \bmod q$.

Both sides get the same result:
$K = (Y_B)^{X_A} \bmod q = (\alpha^{X_B} \bmod q)^{X_A} \bmod q = (\alpha^{X_B})^{X_A} \bmod q = \alpha^{X_A X_B} \bmod q = (\alpha^{X_A} \bmod q)^{X_B} \bmod q = (Y_A)^{X_B} \bmod q$

Example: Key exchange is based on the use of a prime number and a primitive root of that prime number.
Prime number q = 353
Primitive root $\alpha$ = 3
A and B select secret keys: $X_A = 97$, $X_B = 233$.

Calculate the public keys.
A computes $Y_A = \alpha^{X_A} \bmod q = 3^{97} \bmod 353 = 40$
B computes $Y_B = \alpha^{X_B} \bmod q = 3^{233} \bmod 353 = 248$

After they exchange public keys, each can compute the common secret key.
A computes $K = (Y_B)^{X_A} \bmod q = 248^{97} \bmod 353 = 160$
B computes $K = (Y_A)^{X_B} \bmod q = 40^{233} \bmod 353 = 160$
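The numbers in this example can be reproduced with Python's built-in three-argument `pow`, which performs modular exponentiation without ever forming the huge intermediate powers. This is a toy sketch with the small parameters of the example; real deployments use far larger primes and randomly generated secrets.

```python
q, alpha = 353, 3          # public prime and primitive root from the example
x_a, x_b = 97, 233         # private values chosen by A and B

y_a = pow(alpha, x_a, q)   # A's public value
y_b = pow(alpha, x_b, q)   # B's public value

k_a = pow(y_b, x_a, q)     # key as computed by A
k_b = pow(y_a, x_b, q)     # key as computed by B

print(y_a, y_b)            # 40 248
print(k_a, k_b)            # 160 160
```

Both parties arrive at the same key even though neither private value was ever transmitted.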

7). What is adaptive dictionary based coding? Explain any one method.
Ans: In many applications we do not want the dictionary to be fixed in advance. Rather, we would like the technique to adapt to the characteristics of the source. Adaptive dictionary based coding does this by building the dictionary from the data already processed. The best-known techniques of this kind were developed by Jacob Ziv and Abraham Lempel and are hence known as LZ techniques.

One method: the LZ77 approach.

[Figure: sliding window consisting of a search buffer followed by a look-ahead buffer.]

Introduction: The LZ77 compression method is an adaptive compression method in which the encoder dynamically builds a dictionary from the input data and uses previously occurring strings to compare and compress new strings. The encoder examines the input sequence through a sliding window (search buffer + look-ahead buffer). The search buffer contains a portion of the recently encoded sequence. The look-ahead buffer contains the next portion of the sequence to be encoded. For each encoded string, a token is written to the output stream. The LZ77 token structure is <offset, length, C(unmatched symbol)>. Then the window is shifted to the right and the process is repeated.

Offset: The location of the match in the dictionary (search buffer), counted backwards from the start of the look-ahead buffer.
Length: The number of symbols in the string for which a match was found in the dictionary.
Unmatched symbol: The next symbol in the input string after the matched string.

Example: Let us encode cabracadabrarrarrad using the LZ77 approach.

Step 1: Assign the dimensions of the window (the search buffer and the look-ahead buffer). Let the length of the window be 13, the size of the search buffer be N = 7, and the size of the look-ahead buffer be N - 1 = 6.

Step 2:
Search buffer: c a b r a c a
Look-ahead buffer: d a b r a r
Remaining input: r a r r a d
The first symbol in the look-ahead buffer is d. There is no match for d in the search buffer.
Offset = 0, length = 0, unmatched symbol = C(d)
The triple is <0, 0, C(d)>.

Step 3: Slide the window to the right by length + 1 = 1.
Search buffer: a b r a c a d
Look-ahead buffer: a b r a r r
Remaining input: a r r a d
The first symbol in the look-ahead buffer is a, which has not yet been coded. a has 3 matches in the search buffer, at offsets 2, 4 and 7. We choose offset = 7 because it gives the maximum match length of 4 (abra). Unmatched symbol = C(r).
The triple is <7, 4, C(r)>.

Step 4: Slide the window by length + 1 = 5.
Search buffer: a d a b r a r
Look-ahead buffer: r a r r a d
The first symbol in the look-ahead buffer is r, which has not yet been coded. r has 2 matches, at offsets 1 and 3. We choose offset = 3 because it gives a match length of 3 + 2 = 5: the match rar starts in the search buffer and continues for 2 more symbols (ra) into the look-ahead buffer. Unmatched symbol = C(d).
The triple is <3, 5, C(d)>.

Step 5: Slide the window by length + 1 = 6.
Search buffer: r r a r r a d
Look-ahead buffer: (empty)
The look-ahead buffer is empty, so we stop: the string has been encoded completely.
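The encoding steps can be reproduced with a small Python sketch. It is a simplified illustration, not an optimized implementation: the function names are hypothetical, the `start` parameter is an assumption used to say that the first 7 symbols are already in the search buffer (as in the example), and ties between equally long matches are broken in favour of the largest offset.

```python
def lz77_encode(data, search_size=7, lookahead_size=6, start=0):
    """Emit (offset, length, next_symbol) triples for data[start:]."""
    tokens = []
    pos = start
    while pos < len(data):
        best_off, best_len = 0, 0
        window_start = max(0, pos - search_size)
        # Reserve one look-ahead symbol for the "unmatched symbol" field.
        max_len = min(lookahead_size, len(data) - pos) - 1
        for cand in range(window_start, pos):
            length = 0
            # A match may run past the search buffer into the look-ahead buffer.
            while length < max_len and data[cand + length] == data[pos + length]:
                length += 1
            if length > best_len:
                best_off, best_len = pos - cand, length
        tokens.append((best_off, best_len, data[pos + best_len]))
        pos += best_len + 1
    return tokens

def lz77_decode(tokens, prefix=""):
    """Rebuild the text; prefix is the part assumed to be already decoded."""
    out = list(prefix)
    for off, length, sym in tokens:
        for _ in range(length):
            out.append(out[-off])  # copy one symbol at a time so overlaps work
        out.append(sym)
    return "".join(out)

text = "cabracadabrarrarrad"
tokens = lz77_encode(text, start=7)
print(tokens)  # [(0, 0, 'd'), (7, 4, 'r'), (3, 5, 'd')]
print(lz77_decode(tokens, prefix=text[:7]) == text)  # True
```

The three triples are the ones derived by hand in Steps 2 to 4, and decoding them after the prefix cabraca restores the original string.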

8). What is the significance of prime numbers in public key cryptography? Explain RSA algorithm with a suitable example.
Ans: Importance:

A prime number (or a prime) is a natural number greater than 1 that has no positive divisors other than 1 and itself. A natural number greater than 1 that is not a prime number is called a composite number. A product of large prime numbers is difficult to factorize. This fact is widely used in public key cryptographic algorithms. For example, several public key cryptography algorithms, such as RSA and the Diffie-Hellman key exchange, are based on large prime numbers. Multiplying two large primes to build a composite number is easy and computationally fast, so it is used in encryption. To break the system, however, it is necessary to factorize the large number obtained as the product of the two primes, and this operation is far too slow. For example, factoring a 140-digit number of this form has been reported to take about a month of continuous computation.

RSA algorithm: RSA is a block cipher in which the plaintext and the ciphertext are integers between 0 and n - 1 for some n. A typical size for n is 1024 bits. The RSA algorithm, developed by Rivest, Shamir and Adleman in 1977 at MIT, is a public key encryption algorithm. One user uses the public key and the other uses the private key. Each station independently and randomly chooses two large primes p and q and multiplies them to produce n = pq, which is the modulus used in the arithmetic of the algorithm.

Process of the algorithm:
1. Select p and q, both prime numbers.
2. Calculate $n = pq$.
3. Calculate $z = \phi(n) = (p - 1)(q - 1)$.
4. Select an integer e such that $\gcd(\phi(n), e) = 1$ and $1 < e < \phi(n)$.
5. Calculate $d = e^{-1} \bmod \phi(n)$.
6. For encryption: $C = M^e \bmod n$, where M is the plaintext and C is the ciphertext.
7. For decryption: $M = C^d \bmod n$.

Example:
Select two prime numbers, p = 17 and q = 11.
Calculate $n = pq = 17 \times 11 = 187$.
Calculate $z = \phi(n) = (p - 1)(q - 1) = (17 - 1)(11 - 1) = 16 \times 10 = 160$.
Select e such that e is relatively prime to $\phi(n) = 160$. We select e = 7.
Determine $d = e^{-1} \bmod \phi(n)$: d = 23, since $7 \times 23 = 161 \equiv 1 \pmod{160}$.
Public key PU = {7, 187}
Private key PR = {23, 187}
Suppose the plaintext value is M = 88. Then:
For encryption: $C = M^e \bmod n = 88^7 \bmod 187 = 11$
For decryption: $M = C^d \bmod n = 11^{23} \bmod 187 = 88$
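The seven steps can be followed in Python with the toy primes of the example. This is a sketch for study purposes only (textbook RSA with tiny primes and no padding is not secure); the variable names are chosen for illustration, and `pow(e, -1, phi)` requires Python 3.8 or later.

```python
from math import gcd

p, q = 17, 11
n = p * q                      # modulus, 187
phi = (p - 1) * (q - 1)        # phi(n) = 160

e = 7
assert 1 < e < phi and gcd(e, phi) == 1   # e must be coprime to phi(n)
d = pow(e, -1, phi)            # private exponent: inverse of e modulo phi(n)

m = 88                         # plaintext block, must satisfy 0 <= m < n
c = pow(m, e, n)               # encryption: C = M^e mod n
recovered = pow(c, d, n)       # decryption: M = C^d mod n

print(n, phi, d)               # 187 160 23
print(c, recovered)            # 11 88
```

The output reproduces the public key {7, 187}, the private key {23, 187}, the ciphertext 11 and the recovered plaintext 88.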

9). Explain DPCM and ADPCM techniques used in audio compression.


Ans: DPCM audio compression:

Introduction: Differential pulse code modulation (DPCM) is a derivative of standard PCM. This technique exploits the fact that, for most audio signals, the range of the differences in amplitude between successive samples is smaller than the range of the actual sample amplitudes. As a result, fewer bits are required to encode the difference signal.

DPCM Encoder:

[Figure: DPCM encoder block diagram with a subtractor, Quantizer, Inverse Quantizer, adder and Predictor.]

Input signal: x(n)
Signal difference: d(n)
Quantized difference signal: dq(n)
Reconstructed signal: sr(n)
Signal estimate: se(n)
DPCM code: c(n)

Explanation of encoder: The two main parts of the DPCM encoder are:
i. The quantizer
ii. The linear predictor

Let x(n) be the input signal and se(n) be the estimated or predicted signal. The comparator finds the difference between the input signal x(n) and the estimated signal se(n). This is called the difference signal d(n):
d(n) = x(n) - se(n)
The predicted signal is produced by the linear predictor filter. The quantizer output is dequantized by the inverse quantizer to get the quantized difference signal dq(n), which is added to the estimate se(n), and the sum sr(n) is given to the predictor. This makes the prediction closer and closer to the actual input signal x(n). As the difference signal is very small, fewer bits are required to encode it.

DPCM Decoder:

[Figure: DPCM decoder block diagram. The code c(n) passes through the Inverse Quantizer to give dq(n), which is added to the Predictor output to give sr(n).]

Explanation of decoder: The decoder first reconstructs the quantized difference signal from the incoming binary signal. The prediction filter output is summed with the quantized difference signal to give the quantized version of the original input signal.

ADPCM audio compression:

Introduction: DPCM is a simple technique, but it is not efficient, because it does not adapt to the varying magnitude of the audio stream. To overcome this problem, an adaptive version can be used. Any such version is called ADPCM (adaptive DPCM).

ADPCM encoder:

[Figure: ADPCM encoder block diagram with a subtractor, Adaptive Quantizer, Step size adaptation, Adaptive Inverse Quantizer, adder and Adaptive Predictor.]

Input signal: x(n)
Signal difference: d(n)
Quantized difference signal: dq(n)
Reconstructed signal: sr(n)
Signal estimate: se(n)
ADPCM code: c(n)

Explanation of encoder: ADPCM uses the previous samples stored in a register to predict the sample values. It then computes the difference between the current input signal and the predicted signal and quantizes the difference. Let x(n) be the input signal and se(n) be the predicted value obtained from the stored values. The comparator finds the difference d(n). The quantizer computes and outputs the quantized code c(n). This is then sent to the adaptive dequantizer, which produces the dequantized difference signal dq(n). This value is added to the previous predicted output se(n), and the sum sr(n) is given to the predictor. The predictor uses this to generate the next prediction. The technique is adaptive because of the step size adaptation block, which adjusts the quantizer step size.

ADPCM Decoder:

[Figure: ADPCM decoder block diagram. The code c(n) passes through the Adaptive Inverse Quantizer (with Step size adaptation) to give dq(n), which is added to the Adaptive Predictor output to give sr(n).]

At the decoder, the received value is given to the adaptive dequantizer, which generates the dequantized values. The output of the dequantizer is added to the predicted value from the adaptive predictor to get sr(n).
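The DPCM loop described above can be sketched in Python. The sketch makes several simplifying assumptions that are not part of the text: the predictor is first order (the estimate se(n) is simply the previous reconstructed sample), the quantizer is uniform with a fixed step, and there is no step size adaptation, so this is plain DPCM rather than ADPCM. The function names are hypothetical.

```python
def dpcm_encode(samples, step=4):
    """Return one quantizer code c(n) per input sample x(n)."""
    codes = []
    sr_prev = 0                    # previous reconstructed sample sr(n-1)
    for x in samples:
        se = sr_prev               # predictor (assumed first order): se(n) = sr(n-1)
        d = x - se                 # difference signal d(n)
        c = round(d / step)        # quantizer: code c(n)
        dq = c * step              # inverse quantizer: dq(n)
        sr_prev = se + dq          # reconstructed signal sr(n) fed to the predictor
        codes.append(c)
    return codes

def dpcm_decode(codes, step=4):
    """Mirror of the encoder's feedback loop: sr(n) = se(n) + dq(n)."""
    out = []
    sr_prev = 0
    for c in codes:
        sr_prev = sr_prev + c * step
        out.append(sr_prev)
    return out

samples = [100, 104, 109, 111, 108, 103]
codes = dpcm_encode(samples)
print(codes)
print(dpcm_decode(codes))
```

Because the encoder predicts from the reconstructed signal rather than from the original input, the decoder can track it exactly and the quantization error does not accumulate: each reconstructed sample stays within half a step of the input. After the first sample, the codes are small numbers, which is why fewer bits suffice.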

10). What is a malicious program? Explain different types of malicious programs.

Ans: Malware, short for malicious software, is software used to disrupt computer operation, gather sensitive information, or gain access to private computer systems. Malicious programs exploit the vulnerabilities of a computing system.

One way to classify malicious software is by whether it needs a host:
- Programs needing a host (parasitic): These are fragments of programs that cannot exist independently. E.g. viruses, logic bombs and backdoors.
- Independent programs: Self-contained programs that can be run by an operating system. E.g. worms and bot programs.

Another way to classify malicious software is by whether it replicates:
- Replicating programs: Programs that produce several copies of themselves to be activated later. E.g. viruses and worms.
- Non-replicating programs: Programs that are activated by a trigger. E.g. logic bombs and bot programs.

Types of malicious programs:

Virus: Malware that, when executed, tries to replicate itself into other executable code. When it succeeds, the code is said to be infected. When the infected code executes, the virus also executes.
Worm: A computer program that can run independently and can propagate a complete working version of itself onto other hosts on a network.
Logic Bomb: A program inserted into software by an intruder. A logic bomb lies dormant until a predefined condition is met; the program then triggers an unauthorized act.
Trojan Horse: A computer program that appears to have a useful function, but also has a hidden and potentially malicious function that evades security mechanisms, sometimes by exploiting legitimate authorization of a system entity that invokes the Trojan horse program.
Back Door: Any mechanism that bypasses a normal security check, which allows unauthorized access to functionality.
Mobile Code: Software that can be shipped unchanged to a heterogeneous collection of platforms and execute with identical semantics.
Exploits: Code specific to a single vulnerability or set of vulnerabilities.


11). Explain audio encoder and decoder used in MPEG.


Ans: Introduction: MPEG stands for Moving Picture Experts Group. It was formed by ISO. MPEG has developed standards for video and audio compression. MPEG audio coders are used for compression of audio and use perceptual coding.

Encoder for MPEG audio coding: It is also called the MPEG audio coder. Its blocks are explained below.

PCM encoder: The PCM encoder generates 32 PCM samples of the input audio signal. The time duration of the 32 samples depends upon the sampling frequency.

Analysis filter bank: The 32 PCM samples of an audio segment are given to the analysis filter bank. It calculates a 32-point DFT of the 32 input PCM samples. This conversion generates 32 frequency components, which are treated as 32 subbands by the analysis filter bank. If the maximum signal frequency is 16 kHz and the signal is sampled at 32 kHz, then each subband is 16 kHz / 32 = 500 Hz wide. The analysis filter bank accumulates 12 sets of 32 PCM samples and finds the maximum amplitude among the 12 samples of each subband. The scaling factor is determined from this maximum amplitude. The scaling factor and the 12 samples of each subband are given to their respective quantizers.

Psychoacoustic model: The 12 sets of frequency subbands are also given to the psychoacoustic model. The model determines the masking effect of the various subbands. From this, signal-to-mask ratios are obtained. These ratios indicate which frequency components will be masked and which will be unmasked. The amplitudes of the unmasked components are determined. The bit allocation for each subband is also determined. The frequency components to which the ear is most sensitive are quantized with more bits, which gives higher accuracy. The bit allocation information is given to the frame conversion block. The masking threshold for each subband is given to its respective quantizer.

Quantizer blocks: The quantizer masks the frequency components that are below the masking threshold. There is a separate quantizer for each subband.

Frame Conversion: The quantized frequency samples are carried in a specific frame. Its format is as follows:

Header | SBS format | 12 x 32 subband samples | Ancillary data

Header: The header contains information about the sampling frequency used in encoding.
Subband Sample Format (SBS): The SBS format contains the number of bits used for the scaling factor and the 12 frequency components in each subband.
Subband Samples: This field contains the data samples of each subband. There are 12 x 32 samples in total.
Ancillary Data: This field is optional. It contains additional samples for features like surround sound.

MPEG decoder: The following is the explanation of an MPEG decoder. The encoded MPEG audio signal is given to the demultiplexer. It demultiplexes the incoming encoded frames into 12 sets of 32 subbands. The bit allocations are also extracted. This information is given to the dequantizers. The actual magnitudes of the 12 sets of samples in each subband are determined by the dequantizers. The dequantized 32 subband samples are given to the synthesis filter bank. The synthesis filter bank produces 32 PCM samples for each set. The PCM samples are then decoded to generate the audio data. The decoding process does not require a psychoacoustic model.

12). Design Digital Immune System.

Ans: Introduction:

The digital immune system is a comprehensive approach to virus protection developed by IBM [KEPH97a, KEPH97b, WHIT99] and subsequently refined by Symantec [SYMA01]. The motivation for this development has been the rising threat of Internet-based virus propagation.

Threats: Traditionally, the virus threat was characterized by the relatively slow spread of new viruses and new mutations. Antivirus software was typically updated on a monthly basis, and this was sufficient to control the problem. Also traditionally, the Internet played a comparatively small role in the spread of viruses. But as [CHES97] points out, two major trends in Internet technology have had an increasing impact on the rate of virus propagation in recent years:
- Integrated mail systems: Systems such as Lotus Notes and Microsoft Outlook make it very simple to send anything to anyone and to work with objects that are received.
- Mobile-program systems: Capabilities such as Java and ActiveX allow programs to move on their own from one system to another.

IBM's Approach: In response to the threat posed by these Internet-based capabilities, IBM has developed a prototype digital immune system. This system expands on the use of program emulation and provides a general-purpose emulation and virus-detection system. The objective of this system is to provide rapid response time so that viruses can be stamped out almost as soon as they are introduced. When a new virus enters an organization, the immune system automatically captures it, analyzes it, adds detection and shielding for it, removes it, and passes information about that virus to systems running IBM AntiVirus so that it can be detected before it is allowed to run elsewhere. The typical steps in digital immune system operation are:

1. A monitoring program on each PC uses a variety of heuristics based on system behavior, suspicious changes to programs, or family signature to infer that a virus may be present. The monitoring program forwards a copy of any program thought to be infected to an administrative machine within the organization.
2. The administrative machine encrypts the sample and sends it to a central virus analysis machine.
3. This machine creates an environment in which the infected program can be safely run for analysis. Techniques used for this purpose include emulation, or the creation of a protected environment within which the suspect program can be executed and monitored. The virus analysis machine then produces a prescription for identifying and removing the virus.
4. The resulting prescription is sent back to the administrative machine.
5. The administrative machine forwards the prescription to the infected client.
6. The prescription is also forwarded to other clients in the organization.
7. Subscribers around the world receive regular antivirus updates that protect them from the new virus.

The success of the digital immune system depends on the ability of the virus analysis machine to detect new and innovative virus strains. By constantly analyzing and monitoring the viruses found in the wild, it should be possible to continually update the digital immune software to keep up with the threat.

13). Design Huffman Code for a source which generates letters from an alphabet A{ a1, a2, a3, a4, a5} with P(a1) = P(a3) = 0.2, P(a2) = 0.4, P(a4) = P(a5) = 0.1. Calculate entropy of the source.
Solution: Given:

Letter   Probability
a1       0.2
a2       0.4
a3       0.2
a4       0.1
a5       0.1

Step 1: Arrange the symbols in descending order of their probabilities.

Dummy variable   Letter   Probability
B1               a2       0.4
B2               a1       0.2
B3               a3       0.2
B4               a4       0.1
B5               a5       0.1

Step 2:

[Figure: Huffman tree. B4 (0.1) and B5 (0.1) merge into B45 (0.2); B3 (0.2) and B45 merge into B345 (0.4); B2 (0.2) and B345 merge into B2345 (0.6); B1 (0.4) and B2345 merge into B12345 (1.0). Each pair of edges is labelled with the bits 0 and 1.]

Explanation: The above figure illustrates the procedure. B4 is combined with B5 and both are replaced by the combined symbol B45, whose probability is 0.2. There are now 4 symbols left: B1 with probability 0.4, and B2, B3 and B45 with probability 0.2 each. We arbitrarily select B3 and B45, combine them and replace them with the auxiliary symbol B345 with probability 0.4. Three symbols are now left: B1 with probability 0.4, B2 with probability 0.2 and B345 with probability 0.4. We select B2 and B345, combine them and replace them with the auxiliary symbol B2345 with probability 0.6. Finally, B1 and B2345 are combined and replaced by B12345 with probability 1.

To assign codes, we arbitrarily assign bit 1 to one edge and bit 0 to the other edge of every pair of edges. Reading the bits from the root to each leaf results in:

Dummy variable   Letter   Probability   Codeword
B1               a2       0.4           0
B2               a1       0.2           10
B3               a3       0.2           111
B4               a4       0.1           1101
B5               a5       0.1           1100

Entropy of the source:
$H(S) = -\sum_{i=1}^{5} P(a_i) \log_2 P(a_i) = -(0.4 \log_2 0.4 + 2 \times 0.2 \log_2 0.2 + 2 \times 0.1 \log_2 0.1) = 2.1219$ bits per symbol.
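The construction and the entropy calculation can be cross-checked with a short Python sketch built on the standard `heapq` module. The function names are hypothetical. Because several symbols have equal probabilities, ties may be broken differently than in the hand construction, so the individual codewords produced here can differ from the table; any Huffman code for this source nevertheless has the same average length of 2.2 bits per symbol.

```python
import heapq
from math import log2

def huffman_code(probs):
    """Build a Huffman code; probs maps symbol -> probability."""
    # Heap entries: (probability, tie_breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # two least probable entries
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

def entropy(probs):
    """H(S) = -sum p * log2(p), in bits per symbol."""
    return -sum(p * log2(p) for p in probs.values())

probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
print(code)
print(round(avg_len, 4), round(entropy(probs), 4))
```

The average length (2.2 bits) is slightly above the entropy (about 2.1219 bits), as expected for a code that must use a whole number of bits per symbol.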

14). What is message digest? Explain HMAC algorithm.


Ans: Message Digest: A message digest is also referred to as a hash value, and the function that produces it as a hash function.

A hash function accepts a variable-size message M as input and produces a fixed-size output referred to as the hash code H(M). A hash code does not use any key but is a function only of the input message. The hash code is a function of all the bits of the message and provides an error detection capability: a change in any bit of the message results in a change of the hash code. A hash code can be used to provide authentication in various ways:

- The message plus hash code is encrypted using symmetric encryption. This provides authentication (only A and B share the key K) and confidentiality (encryption is applied to the entire message).
- Only the hash code is encrypted, using symmetric encryption. This provides authentication and reduces the processing burden for applications that do not require confidentiality.
- The two communicating parties share a common secret value S. A computes the hash value over the concatenation of M and S and appends the resulting hash value to M. Because B possesses S, it can recompute the hash value to verify. Because the secret value itself is not sent, an opponent cannot modify an intercepted message and cannot generate a false message. This provides authentication (only A and B share S).
- The previous approach with the entire message plus hash code encrypted. This provides authentication (only A and B share S) and confidentiality.

HMAC Algorithm: The overall operation of HMAC uses the following terms:
H = embedded hash function (e.g., MD5, SHA-1, RIPEMD-160)
IV = initial value input to the hash function
M = message input to HMAC (including the padding specified in the embedded hash function)
$Y_i$ = the i-th block of M, $0 \le i \le (L - 1)$
L = number of blocks in M
b = number of bits in a block
n = length of the hash code produced by the embedded hash function
K = secret key; the recommended length is at least n; if the key length is greater than b, the key is input to the hash function to produce an n-bit key
$K^+$ = K padded with zeros so that the result is b bits in length
ipad = 00110110 (36 in hexadecimal) repeated b/8 times
opad = 01011100 (5C in hexadecimal) repeated b/8 times

Then HMAC can be expressed as
$\mathrm{HMAC}(K, M) = H[(K^+ \oplus \mathrm{opad}) \,\|\, H[(K^+ \oplus \mathrm{ipad}) \,\|\, M]]$

We can describe the algorithm as follows.
1. Append zeros to K to create a b-bit string $K^+$ (e.g., if K is of length 160 bits and b = 512, then 44 zero bytes are appended to K). RFC 2104 appends the zeros to the end of K.
2. XOR (bitwise exclusive-OR) $K^+$ with ipad to produce the b-bit block $S_i$.
3. Append M to $S_i$.
4. Apply H to the stream generated in step 3.
5. XOR $K^+$ with opad to produce the b-bit block $S_o$.
6. Append the hash result from step 4 to $S_o$.
7. Apply H to the stream generated in step 6 and output the result.

Note that the XOR with ipad results in flipping one-half of the bits of K. Similarly, the XOR with opad results in flipping one-half of the bits of K, using a different set of bits. In effect, by passing $S_i$ and $S_o$ through the compression function of the hash algorithm, we have pseudorandomly generated two keys from K. HMAC should execute in approximately the same time as the embedded hash function for long messages. HMAC adds three executions of the hash compression function (for $S_i$, $S_o$, and the block produced from the inner hash).
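The seven steps can be written out with Python's standard `hashlib` module and compared against the standard `hmac` module. This is a study sketch, not a replacement for the library: the function name `hmac_sketch` is hypothetical, SHA-256 is chosen here as the embedded hash only as an assumption (the text lists MD5, SHA-1 and RIPEMD-160 as examples), and the zero padding is appended to the end of the key as in RFC 2104.

```python
import hashlib
import hmac

def hmac_sketch(key: bytes, msg: bytes, hash_name: str = "sha256") -> bytes:
    """HMAC(K, M) = H[(K+ xor opad) || H[(K+ xor ipad) || M]]."""
    def h(data: bytes) -> bytes:
        return hashlib.new(hash_name, data).digest()

    block = hashlib.new(hash_name).block_size   # b, expressed in bytes
    if len(key) > block:                         # over-long keys are hashed first
        key = h(key)
    k_plus = key.ljust(block, b"\x00")           # step 1: pad K with zero bytes
    s_i = bytes(b ^ 0x36 for b in k_plus)        # step 2: K+ xor ipad
    inner = h(s_i + msg)                         # steps 3-4: H(S_i || M)
    s_o = bytes(b ^ 0x5C for b in k_plus)        # step 5: K+ xor opad
    return h(s_o + inner)                        # steps 6-7: H(S_o || inner hash)

tag = hmac_sketch(b"secret key", b"message")
reference = hmac.new(b"secret key", b"message", hashlib.sha256).digest()
print(tag.hex())
print(hmac.compare_digest(tag, reference))  # True
```

In real code the `hmac` module should be used directly, together with `hmac.compare_digest` for comparing tags in constant time.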

15). Write short notes on:
1. JPEG 2000
2. Key distribution center
3. Firewall Design
4. Differentiate between:
- Lossy and Lossless compression
- Audio compression and Image compression
Ans:

JPEG 2000: JPEG is widely used for image compression and makes use of the DCT on 8 x 8 blocks of pixels. However, it results in a reconstructed image that has a blocky appearance. Hence a new standard for compression of still images was developed, known as JPEG 2000.

Features:
- High compression efficiency.
- Ability to handle large images, up to $2^{32} \times 2^{32}$ pixels.
- Easy and fast access to various points in the compressed stream.
- The decoder can zoom the image while decompressing only part of it.
- The decoder can rotate and crop the image while decompressing it.
- Error correcting codes can be included in the compressed stream to improve transmission reliability in noisy environments.

Working of JPEG 2000:
Step 1: If the image being compressed is in colour, it is divided into three components.
Step 2: Each component is partitioned into rectangular non-overlapping regions called tiles, which are compressed individually.
Step 3: A tile is compressed in four main steps:
- Compute a wavelet transform that results in subbands of wavelet coefficients.
- Quantize the wavelet coefficients.
- Use an MQ coder to arithmetically encode the quantized coefficients.
- Construct the compressed bitstream by placing the packets as well as markers that can be used by the decoder.

Image Compression in JPEG 2000 (using subband decomposition): We begin with an N x M image. We filter each row and then downsample to obtain two N x M/2 images. We then filter each column and subsample to obtain four N/2 x M/2 images.
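[Figure: subband decomposition of an N x M image. The rows are filtered with the low-pass filter H0 and the high-pass filter H1 and downsampled by 2; the columns of each result are then filtered with H0 and H1 and downsampled by 2, giving the four N/2 x M/2 subbands LL, LH, HL and HH.]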
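One level of this row-then-column decomposition can be sketched in Python. The sketch rests on assumptions that are not in the text: a simple Haar pair (pairwise average as the low-pass filter, pairwise half-difference as the high-pass filter) stands in for H0 and H1, whereas JPEG 2000 itself uses other wavelet filters, and the naming of the mixed subbands (first letter for the row filter, second for the column filter) is a convention chosen here. The function names are hypothetical.

```python
def analyze_1d(signal):
    """Filter and downsample by 2: pairwise averages (low) and half-differences (high)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high

def analyze_columns(block):
    """Apply analyze_1d to every column of a block and return (low, high) blocks."""
    results = [analyze_1d(list(col)) for col in zip(*block)]
    low_cols = [r[0] for r in results]
    high_cols = [r[1] for r in results]
    # Transpose back so that each inner list is a row again.
    return [list(r) for r in zip(*low_cols)], [list(r) for r in zip(*high_cols)]

def subband_decompose(image):
    """Split an N x M image (even N and M) into four N/2 x M/2 subbands."""
    row_low, row_high = [], []
    for row in image:                      # filter each row, halving the width
        lo, hi = analyze_1d(row)
        row_low.append(lo)
        row_high.append(hi)
    ll, lh = analyze_columns(row_low)      # then filter each column
    hl, hh = analyze_columns(row_high)
    return ll, lh, hl, hh

image = [[8, 8, 8, 8],
         [8, 8, 8, 8],
         [2, 2, 6, 6],
         [2, 2, 6, 6]]
for name, band in zip(("LL", "LH", "HL", "HH"), subband_decompose(image)):
    print(name, band)
```

For a smooth image most of the energy ends up in the LL subband, while the other three subbands contain mostly small values that are cheap to quantize and encode.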

Key Distribution Center: Firewall Design:
