
CHAPTER 1
1.1. Introduction to Wavelet Transforms
1.1.1 Wavelet Transforms

Wavelets are functions generated from one single function (basis function), called the prototype or mother wavelet, by dilations (scaling) and translations (shifts) in the time (frequency) domain. If the mother wavelet is denoted by $\psi(t)$, the other wavelets $\psi_{a,b}(t)$ can be represented as

$$\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right) \tag{1}$$

where $a$ and $b$ are two arbitrary real numbers with $a \neq 0$ [1] [3]. The variables $a$ and $b$ represent the parameters for dilation and translation, respectively, along the time axis. From Eq. (1), the mother wavelet itself is obtained with $a = 1$ and $b = 0$:

$$\psi(t) = \psi_{1,0}(t) \tag{2}$$

For any arbitrary $a \neq 1$ and $b = 0$, it follows that

$$\psi_{a,0}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t}{a}\right) \tag{3}$$

As shown in Eq. (3), $\psi_{a,0}(t)$ is simply a time-scaled (by $a$) and amplitude-scaled (by $1/\sqrt{|a|}$) version of the mother wavelet $\psi(t)$ in Eq. (2). The parameter $a$ causes contraction of $\psi(t)$ along the time axis when $|a| < 1$ and expansion (stretching) when $|a| > 1$. That is why $a$ is called the dilation (scaling) parameter. For $a < 0$, the function $\psi_{a,b}(t)$ is additionally reversed in time. Substituting $t$ in Eq. (3) by $t - b$ shifts the function along the time axis and yields the wavelet $\psi_{a,b}(t)$ of Eq. (1). The function $\psi_{a,b}(t)$ is $\psi_{a,0}(t)$ shifted to the right along the time axis by $b$ when $b > 0$, and shifted to the left by $|b|$ when $b < 0$. That is why $b$ is called the translation (shift) parameter in the time domain.

Figure 1.1 (a) A mother wavelet $\psi(t)$, (b) $\psi(t/\alpha)$ with $0 < \alpha < 1$, and (c) $\psi(t/\alpha)$ with $\alpha > 1$.

Figure 1.1 illustrates a mother wavelet and its dilations in the time domain, with the dilation parameter $a = \alpha$. For the mother wavelet $\psi(t)$ shown in Figure 1.1(a), a contraction of the signal along the time axis when $\alpha < 1$ is shown in Figure 1.1(b), and an expansion of the signal along the time axis when $\alpha > 1$ is shown in Figure 1.1(c). Based on this definition of wavelets, the wavelet transform (WT) of a function (signal) $f(t)$ is mathematically represented by [1]

$$W(a,b) = \int_{-\infty}^{\infty} \psi_{a,b}(t)\, f(t)\, dt \tag{4}$$

1.1.2. Continuous wavelet transform

A continuous wavelet transform (CWT) is used to decompose a continuous-time function into wavelets. Unlike the Fourier transform, the continuous wavelet transform can construct a time-frequency representation of a signal that offers very good time and frequency localization. The continuous wavelet transform is defined as [2]

$$f_{WT}(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} f(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt \tag{5}$$

The transformed signal $f_{WT}(a,b)$ is a function of the dilation parameter $a$ and the translation parameter $b$. The mother wavelet is denoted by $\psi$, and the asterisk ($*$) denotes the complex conjugate, which is needed when the wavelet is complex. The signal energy is normalized at every scale by dividing the wavelet coefficients by $\sqrt{|a|}$, i.e., multiplying them by $1/\sqrt{|a|}$ (16). This ensures that the wavelets have the same energy at every scale.
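Eq. (5) can be approximated numerically by replacing the integral with a Riemann sum. The sketch below is a minimal illustration that assumes a real Mexican-hat wavelet (so the complex conjugate has no effect) and a hypothetical test signal, a Gaussian pulse centred at $t = 2$; `cwt_point` and `pulse` are hypothetical helpers, not library functions. Scanning $b$ at a fixed scale shows the time localization of the CWT: the largest coefficient appears at the translation where the pulse occurs.

```python
import math


def mexican_hat(t: float) -> float:
    """Real Mexican-hat mother wavelet; psi* equals psi because it is real-valued."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def cwt_point(f, a, b, psi=mexican_hat, lo=-20.0, hi=20.0, n=8_000):
    """Approximate f_WT(a, b) = |a|^(-1/2) * integral f(t) psi*((t - b) / a) dt (Eq. 5)."""
    dt = (hi - lo) / n
    acc = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * dt
        acc += f(t) * psi((t - b) / a)
    return acc * dt / math.sqrt(abs(a))


def pulse(t: float) -> float:
    """Hypothetical test signal: a narrow Gaussian pulse centred at t = 2."""
    return math.exp(-((t - 2.0) ** 2) / (2 * 0.5 ** 2))


if __name__ == "__main__":
    a = 0.5
    grid = [k * 0.25 for k in range(-20, 21)]  # translations b in [-5, 5]
    coeffs = [cwt_point(pulse, a, b) for b in grid]
    best = max(range(len(grid)), key=lambda i: coeffs[i])
    print(f"largest coefficient at b = {grid[best]}")  # the pulse location
```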

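The original signal can be recovered with the inverse continuous wavelet transform:

$$f(t) = \frac{1}{C_{\psi}} \int_{-\infty}^{\infty}\int_{0}^{\infty} f_{WT}(a,b)\, \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{t-b}{a}\right) \frac{da\, db}{a^{2}} \tag{6}$$

where

$$C_{\psi} = \int_{0}^{\infty} \frac{|\Psi(f)|^{2}}{f}\, df < \infty$$

and $\Psi(f)$ is the Fourier transform of the mother wavelet $\psi(t)$.

1.1.3. Discrete wavelet transform

One drawback of the CWT is that the representation of the signal is often redundant, since $a$ and $b$ are continuous over $\mathbb{R}$ (the real numbers). The original signal can be completely reconstructed from a sampled version of $W(a,b)$. Typically, $W(a,b)$ is sampled on a dyadic grid, i.e., $a = 2^{-m}$ and $b = n\,2^{-m}$ with integers $m$ and $n$ [3]:

$$W(m,n) = \int_{-\infty}^{\infty} f(t)\, \psi_{m,n}^{*}(t)\, dt \tag{7}$$

where

$$\psi_{m,n}(t) = 2^{m/2}\, \psi(2^{m} t - n)$$

is the dilated and translated version of the mother wavelet $\psi(t)$.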

The transform shown in Eq. (7) is called the wavelet series, which is analogous to the Fourier series because the input function $f(t)$ is still continuous whereas the transform coefficients are discrete. This is often called the discrete-time wavelet transform (DTWT). For digital signal or image processing applications executed by a digital computer, the input signal $f(t)$ must be discrete because of the digital sampling of the original data, which is represented by a finite number of bits. When the input function $f(t)$ as well as the wavelet parameters $a$ and $b$ are represented in discrete form, the transformation is commonly referred to as the discrete wavelet transform (DWT) of the signal $f(t)$.

The DWT became a very versatile signal processing tool after Mallat [3] proposed the multiresolution representation of signals based on wavelet decomposition. Multiresolution analysis represents a function (signal) by a collection of coefficients, each of which provides information about both the position and the frequency content of the signal. The advantage of the DWT over the Fourier transform is that it performs multiresolution analysis of signals with localization. As a result, the DWT decomposes a digital signal into different subbands so that the lower-frequency subbands have finer frequency resolution and coarser time resolution than the higher-frequency subbands. The DWT is increasingly used for image compression because it supports features such as progressive image transmission (by quality or by resolution), ease of compressed-image manipulation and region-of-interest coding. Because of these characteristics, the DWT is the basis of the JPEG2000 image compression standard.
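The multiresolution idea can be illustrated with the simplest wavelet, the orthonormal Haar wavelet (an assumption; the text does not fix a particular filter). In this Python sketch each level splits the current approximation into a half-length approximation and a half-length detail subband, so coarser levels have fewer coefficients (coarser time resolution) but cover narrower frequency bands. The function names are hypothetical.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_level(x):
    """One orthonormal Haar DWT level: approximation and detail, each half as long."""
    if len(x) % 2:
        raise ValueError("length must be even")
    approx = [(x[i] + x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_dwt(x, levels):
    """Multilevel DWT: returns [A_J, D_J, ..., D_1] (coarsest subband first)."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_level(approx)
        details.append(d)
    return [approx] + details[::-1]


if __name__ == "__main__":
    signal = [4, 6, 10, 12, 8, 6, 5, 5]
    bands = haar_dwt(signal, 3)
    print([len(b) for b in bands])  # subband sizes from coarse to fine: [1, 1, 2, 4]
    # An orthonormal transform preserves energy
    print(sum(v * v for v in signal), sum(c * c for b in bands for c in b))
```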

1.1.4. Multiresolution Analysis

A two-dimensional extension of the DWT is essential for transforming two-dimensional signals such as digital images [4]. A two-dimensional digital signal can be represented by a two-dimensional array $X[M, N]$ with $M$ rows and $N$ columns, where $M$ and $N$ are positive integers. The simplest approach to a two-dimensional DWT is to perform the one-dimensional DWT row-wise to produce an intermediate result and then perform the same one-dimensional DWT column-wise on this intermediate result to produce the final result, as shown in Figure 1.2(a). This is possible because the two-dimensional scaling function can be expressed as a separable function, i.e., the product of two one-dimensional scaling functions, $\phi_2(x, y) = \phi_1(x)\,\phi_1(y)$. The same is true for the wavelet function $\psi(x, y)$. Applying the one-dimensional transform to each row produces two subbands in each row. When the low-frequency subbands of all the rows (L) are put together, they form a thin version (of size $M \times N/2$) of the input signal, as shown in Figure 1.2(a). Similarly, the high-frequency subbands of all the rows are put together to produce the H subband of size $M \times N/2$, which contains mainly the high-frequency information around discontinuities (edges in an image) in the input signal. Applying a one-dimensional DWT column-wise on these L and H subbands (the intermediate result) then generates four subbands LL, LH, HL, and HH of size $M/2 \times N/2$, as shown in Figure 1.2(a). LL is a coarser version of the original input signal; LH, HL, and HH are the high-frequency subbands containing the detail information. It is also possible to apply the one-dimensional DWT column-wise first and then row-wise to achieve the same result.

The multiresolution decomposition of a two-dimensional signal is demonstrated in Figures 1.2(b) and (c). The first level of decomposition generates four subbands LL1, HL1, LH1, and HH1, as shown in Figure 1.2(a). If the input signal is an image, the LL1 subband can be considered a 2:1 subsampled (both horizontally and vertically) version of the image. The other three subbands HL1, LH1, and HH1 contain higher-frequency detail information. These spatially oriented (horizontal, vertical or diagonal) subbands mostly contain information about local discontinuities in the image, and the bulk of the energy in each of them is concentrated in the areas corresponding to edge activity in the original image. Since LL1 is a coarser approximation of the input, it has spatial and statistical characteristics similar to those of the original image. As a result, it can be further decomposed into four subbands LL2, LH2, HL2 and HH2, as shown in Figure 1.2(b), based on the principle of multiresolution analysis. Accordingly, after three levels of pyramidal multiresolution subband decomposition the image is decomposed into 10 subbands, LL3, LH3, HL3, HH3, HL2, LH2, HH2, LH1, HL1 and HH1, as shown in Figure 1.2(c). The same computation can continue to decompose LL3 into higher levels [4].
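The row-column procedure and the pyramidal decomposition described above can be sketched in Python as follows. The sketch assumes the unnormalised Haar pair (pairwise average and half-difference) as the 1-D transform and names each subband by (row filter, column filter), so that row-wise L followed by column-wise H gives LH; the text's own LH/HL labelling may follow a different convention, and all function names are illustrative. Three levels on an $8 \times 8$ input yield the 10 subbands listed in the text.

```python
def haar_1d(v):
    """Unnormalised 1-D Haar step: pairwise averages (low) and half-differences (high)."""
    low = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
    high = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
    return low, high


def transpose(m):
    return [list(col) for col in zip(*m)]


def columnwise(band):
    """Apply haar_1d down every column of `band`; returns (low, high) halves."""
    lows, highs = zip(*(haar_1d(col) for col in transpose(band)))
    return transpose(lows), transpose(highs)


def dwt2_level(x):
    """Row-wise DWT (L and H, each M x N/2), then column-wise DWT (four M/2 x N/2 bands)."""
    rows = [haar_1d(r) for r in x]
    L = [lo for lo, _ in rows]
    H = [hi for _, hi in rows]
    LL, LH = columnwise(L)
    HL, HH = columnwise(H)
    return LL, LH, HL, HH


def pyramid(x, levels):
    """Repeatedly decompose the LL band; returns a dict of named subbands."""
    bands, ll = {}, x
    for k in range(1, levels + 1):
        ll, lh, hl, hh = dwt2_level(ll)
        bands.update({f"LH{k}": lh, f"HL{k}": hl, f"HH{k}": hh})
    bands[f"LL{levels}"] = ll
    return bands


if __name__ == "__main__":
    image = [[8 * i + j for j in range(8)] for i in range(8)]
    bands = pyramid(image, 3)
    for name, band in sorted(bands.items()):
        print(name, f"{len(band)}x{len(band[0])}")
```

With this averaging filter, LL3 of the $8 \times 8$ test ramp is simply the mean of the whole image.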

[Figure 1.2: subband layouts with LL1, LH1, HL1, HH1; (c) Third level decomposition]

Figure 1.2. Row-column computation of two-dimensional DWT

1.1.5. Multiresolution filter banks

The wavelet decomposition [5] produces successive levels of approximation and detail coefficients. The algorithm of wavelet signal decomposition is illustrated in Figure 1.3, and the algorithm for reconstructing the signal from the wavelet transform (after any post-processing) is shown in Figure 1.4. This multiresolution analysis makes it possible to analyze the signal in different frequency bands, so any transient can be observed in the time domain as well as in the frequency domain.

[Figure 1.3: Level 1 and Level 2 analysis stages with filters h and g and downsampling by 2, producing $A_1(t)$, $D_1(t)$, $A_2(t)$ and $D_2(t)$]

Figure 1.3. Two-level multiresolution wavelet decomposition filter structure

[Figure 1.4: Level 2 and Level 1 synthesis stages with upsampling by 2 and filter h, combining $A_2(t)$, $A_1(t)$ and the details $D(t)$ into the reconstructed signal]

Figure 1.4. Multi-resolution wavelet reconstruction

The relation between the low-pass and high-pass filters and the scaling function $\phi(t)$ and the wavelet $\psi(t)$ can be stated as follows:

$$\phi(t) = \sum_{k} h(k)\, \phi(2t - k) \tag{8}$$

$$\psi(t) = \sum_{k} g(k)\, \phi(2t - k) \tag{9}$$

where $h$ is the low-pass decomposition filter and $g$ is the high-pass decomposition filter. The low-pass and high-pass filters are not independent of each other; they are related by

$$g(L - 1 - n) = (-1)^{n}\, h(n)$$

where $g(n)$ is the high-pass filter, $h(n)$ is the low-pass filter, and $L$ is the filter length (total number of taps). Filters satisfying this condition are commonly used in signal processing and are known as Quadrature Mirror Filters (QMF). The two filtering and downsampling operations can be expressed as

$$A_i(k) = \sum_{n} A_{i-1}(n)\, h(2k - n) \tag{10}$$

$$D_i(k) = \sum_{n} A_{i-1}(n)\, g(2k - n) \tag{11}$$

Reconstruction is straightforward in this case because the half-band filters form orthonormal bases. The above procedure is followed in reverse order: the signals at every level are upsampled by two, passed through the synthesis filters $g[n]$ and $h[n]$ (high-pass and low-pass, respectively), and then added:

$$A_i(n) = \sum_{k} \big( D_{i+1}(k)\, g(2k - n) + A_{i+1}(k)\, h(2k - n) \big) \tag{12}$$
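The following Python sketch checks Eqs. (10) to (12) numerically. It assumes the 4-tap Daubechies low-pass filter for $h$, derives $g$ from the QMF relation $g(L-1-n) = (-1)^n h(n)$, and treats the signal as periodic (indices taken modulo the signal length), which is one common way of handling the boundaries and is an assumption here. Because these filters form an orthonormal basis, one analysis level followed by synthesis reproduces the input up to rounding error and preserves its energy.

```python
import math

SQRT3 = math.sqrt(3.0)
DEN = 4.0 * math.sqrt(2.0)
# Daubechies 4-tap low-pass decomposition filter h(n) (assumed example filter)
H = [(1 + SQRT3) / DEN, (3 + SQRT3) / DEN, (3 - SQRT3) / DEN, (1 - SQRT3) / DEN]
L = len(H)
# High-pass filter from the QMF relation g(L - 1 - n) = (-1)^n h(n)
G = [0.0] * L
for n in range(L):
    G[L - 1 - n] = (-1) ** n * H[n]


def tap(filt, m):
    """Filter coefficient filt(m), zero outside 0..L-1."""
    return filt[m] if 0 <= m < len(filt) else 0.0


def analysis(a_prev):
    """Eqs. (10)-(11) with periodic indexing; assumes len(a_prev) is even and >= L."""
    N = len(a_prev)
    A = [sum(a_prev[n] * tap(H, (2 * k - n) % N) for n in range(N)) for k in range(N // 2)]
    D = [sum(a_prev[n] * tap(G, (2 * k - n) % N) for n in range(N)) for k in range(N // 2)]
    return A, D


def synthesis(A, D):
    """Eq. (12): upsample, filter with g and h, and add (periodic indexing)."""
    N = 2 * len(A)
    return [
        sum(D[k] * tap(G, (2 * k - n) % N) + A[k] * tap(H, (2 * k - n) % N) for k in range(N // 2))
        for n in range(N)
    ]


if __name__ == "__main__":
    x = [1.0, 4.0, -2.0, 3.0, 0.0, 5.0, 7.0, -1.0]
    A1, D1 = analysis(x)
    x_rec = synthesis(A1, D1)
    print(max(abs(u - v) for u, v in zip(x, x_rec)))  # rounding-level error only
```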

1.1.6. Applications

There is a wide range of applications for wavelet transforms. They are applied in fields ranging from signal processing to biometrics, and the list is still growing. One of the prominent applications is the FBI fingerprint compression standard, where wavelet transforms are used to compress fingerprint images for storage in the FBI's data bank. The previously chosen Discrete Cosine Transform (DCT) did not perform well at high compression ratios: it produced severe blocking effects that made it impossible to follow the ridge lines in the fingerprints after reconstruction. This did not happen with the wavelet transform because of its ability to retain the details present in the data. In the DWT domain, the most important information in the signal appears as high-amplitude coefficients and the less important information as very low-amplitude coefficients, so data compression can be achieved by discarding the low-amplitude coefficients. Wavelet transforms enable high compression ratios with good reconstruction quality. At present, the application of wavelets to image compression is one of the most active areas of research, and wavelet transforms have been chosen for the JPEG 2000 compression standard. Wavelets also find application in speech compression, which reduces transmission time in mobile applications. They are used in denoising, edge detection, feature extraction, speech recognition, echo cancellation and other tasks, and they are very promising for real-time audio and video compression applications. Wavelets also have numerous applications in digital communications, Orthogonal Frequency Division Multiplexing (OFDM) being one of them. Wavelets are used in biomedical imaging as well; for example, ECG signals measured from the heart are analyzed using wavelets or compressed for storage. The popularity of the wavelet transform is growing because of its ability to reduce distortion in the reconstructed signal while retaining all the significant features present in the signal.

1.2. Introduction to Compression

After the DWT was introduced, several codec algorithms were proposed to compress the transform coefficients as much as possible. Among them, Embedded Zerotree Wavelet (EZW) [7], Set Partitioning In Hierarchical Trees (SPIHT) [8] and Embedded Block Coding with Optimized Truncation (EBCOT) [2] are the best known.

1.2.1. Embedded zero tree wavelet algorithm

The embedded zerotree wavelet algorithm (EZW) is a simple yet remarkably effective image compression algorithm with the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code. The embedded code represents a sequence of binary decisions that distinguish an image from the null image. With an embedded coding algorithm, an encoder can terminate the encoding at any point, which allows a target rate or target distortion metric to be met exactly. Also, given a bit stream, the decoder can stop decoding at any point and still produce exactly the same image that would have been encoded at the bit rate corresponding to the truncated bit stream. In addition to producing a fully embedded bit stream, EZW consistently produces compression results that are competitive with virtually all known compression algorithms on standard test images. This performance is achieved with a technique that requires no training, no pre-stored tables or codebooks, and no prior knowledge of the image source. The EZW algorithm is based on four key concepts: 1) a discrete wavelet transform or hierarchical subband decomposition, 2) prediction of the absence of significant information across scales by exploiting the self-similarity inherent in images, 3) entropy-coded successive-approximation quantization, and 4) universal lossless data compression achieved via adaptive arithmetic coding.

1.2.2. Set Partitioning In Hierarchical Trees Algorithm (SPIHT)

SPIHT is a coding technique developed by Said and Pearlman that orders the transform coefficients using a set partitioning algorithm based on the subband pyramid. Because the most important information of the ordered coefficients is sent first, the information required to reconstruct the image is extremely compact. SPIHT is also one of the fastest codecs available, and it provides user-selectable file size or image quality as well as progressive image resolution and transmission. SPIHT is based on three concepts: 1) partial ordering of the image coefficients by magnitude and transmission of the order by a subset partitioning algorithm that is duplicated at the decoder, 2) ordered bit-plane transmission of refinement bits, and 3) exploitation of the self-similarities of the image wavelet transform across different scales.

Let $W$ be the array of wavelet coefficients obtained after the wavelet transform. A wavelet coefficient $W_{ij}$ is said to be significant for bit depth $m$ if $|W_{ij}| \ge 2^{m}$; otherwise it is said to be insignificant. Similarly, a wavelet tree is said to be significant for bit depth $m$ if at least one of its coefficients has an absolute value of at least $2^{m}$. SPIHT repeatedly applies a set partitioning algorithm to identify and refine significant wavelet coefficients until the rate budget is exhausted, and after each set partitioning operation $m$ decreases by one. For each $m$, the set partitioning operation consists of two passes: the sorting pass, in which the significance of each wavelet coefficient is determined with respect to $m$, and the refinement pass, in which the significant coefficients are refined. To realize these two passes, three lists are maintained at every point of coding: the list of significant pixels (LSP), the list of insignificant pixels (LIP) and the list of insignificant sets (LIS). The lists LSP and LIP contain the locations of significant and insignificant wavelet coefficients, respectively, and LIS contains the root nodes of the insignificant wavelet trees.
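The significance test and the decreasing bit depth $m$ can be illustrated with a short Python sketch. It is deliberately simplified: it tracks only individual coefficients (LIP and LSP) and omits the LIS and the tree-based set partitioning, so it is not a complete SPIHT coder. Integer coefficients and the helper names are assumptions for illustration.

```python
def initial_bit_depth(coeffs):
    """Largest m with some |W_ij| >= 2^m, i.e. floor(log2(max |W_ij|)) for integers."""
    peak = max(abs(w) for row in coeffs for w in row)
    if peak == 0:
        raise ValueError("all coefficients are zero")
    return peak.bit_length() - 1


def is_significant(w, m):
    """Coefficient significance test |W_ij| >= 2^m."""
    return abs(w) >= (1 << m)


def simplified_passes(coeffs):
    """Per bit depth m: sorting pass (LIP -> LSP) and refinement pass (bit m of old LSP)."""
    value = {(i, j): w for i, row in enumerate(coeffs) for j, w in enumerate(row)}
    lip = list(value)  # every coefficient starts in the list of insignificant pixels
    lsp = []
    log = []
    for m in range(initial_bit_depth(coeffs), -1, -1):
        previously_significant = list(lsp)
        newly = [p for p in lip if is_significant(value[p], m)]  # sorting pass
        lip = [p for p in lip if p not in newly]
        lsp.extend(newly)
        # Refinement pass: send bit m of coefficients found significant at earlier m
        refinement = [(p, (abs(value[p]) >> m) & 1) for p in previously_significant]
        log.append((m, newly, refinement))
    return log


if __name__ == "__main__":
    W = [[63, -34], [-31, 7]]
    for m, newly, refinement in simplified_passes(W):
        print(f"m={m}: newly significant {newly}, refinement bits {refinement}")
```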

1.3. Motivation
There is a wide range of applications for wavelet transforms. They are applied in fields ranging from signal processing to biometrics. One of the prominent applications is the FBI fingerprint compression standard. Wavelets also find application in speech compression, which reduces transmission time in mobile applications. They are used in denoising, edge detection, feature extraction, speech recognition, echo cancellation and other tasks. They are very promising for real-time audio and video compression applications, and they also have numerous applications in digital communications.

There are two main approaches to computing the m-D DWT: the separable approach and the non-separable approach. The separable approach performs the m-D DWT as a 1-D DWT applied dimension by dimension, which requires a large extra memory to store the intermediate data that must be transposed for the next dimension, and which has long output latency and system latency (SL). The non-separable approach does not require any transposition but requires more multipliers and accumulators (MACs) than the separable approach. To trade off speed and area, several line-based architectures for the 2-D DWT that exploit parallelism and pipelining have been proposed. However, those architectures were all based on convolution and hence had higher hardware complexity. The lifting scheme can efficiently reduce the computational complexity of the DWT. It is an efficient tool for constructing second-generation wavelets and offers advantages such as faster implementation, fully in-place calculation and reversible integer-to-integer transforms. It is a structure that allows the design and implementation of the discrete wavelet transform.

1.4. Objective
The project consists of an efficient VLSI implementation of the Piecewise Lifting Scheme algorithm. A novel and efficient VLSI architecture is proposed and implemented for the Piecewise Lifting Scheme DWT and the Inverse Lifting Scheme. The architecture has been written in VHDL, and its synthesis was done with Xilinx XST. Xilinx ISE Foundation 9.1i was used for mapping, placing and routing, and ModelSim 6.0 was used for behavioral simulation and post-place-and-route simulation. The synthesis tool was configured to optimize for area with high effort. The aim of this work is to obtain a real-time signal processing VLSI architecture for the Lifting Scheme DWT. The Piecewise Lifting Scheme can be used in numerous signal and image processing applications such as denoising, edge detection, feature extraction, speech recognition and echo cancellation.

Thesis Organization
The thesis is organized as follows. Chapter 1 introduces wavelets and compression algorithms and discusses their applications and limitations. Chapter 2 gives an overview of the mathematical definitions and the modules of the Piecewise Lifting Scheme. Chapter 3 discusses the hardware implementation of the Piecewise Lifting Scheme DWT and the Inverse Lifting Scheme DWT. Chapter 4 gives a detailed explanation of the FPGA design flow. Chapter 5 presents the simulation and synthesis results of the Piecewise Lifting Scheme DWT. Chapter 6 provides a summary and future work.

CHAPTER 2 PROPOSED ALGORITHM

2.1. Lifting Scheme
The lifting scheme is an efficient tool for constructing second-generation wavelets, and it has advantages such as faster implementation, fully in-place calculation and reversible integer-to-integer transforms. It is a structure that allows the design and implementation of the discrete wavelet transform. Compared with the classical (convolution-based) implementation of wavelet transforms, the lifting scheme offers faster implementation and easily implements reversible integer-to-integer wavelet transforms. Integer wavelet transforms implemented via the lifting scheme have better computational efficiency and lower memory requirements. Constructed entirely in the spatial domain and based on the theory of biorthogonal wavelet filter banks with perfect reconstruction, the lifting scheme can build up a gradually improved multiresolution analysis through iterative primal lifting and dual lifting. The lifting scheme outperforms the classical implementation especially in terms of efficient implementation: convenient construction, in-place calculation, lower computational complexity and a simple inverse transform. With lifting, wavelets with more vanishing moments and/or more smoothness can also be built, contributing to its flexible adaptivity and non-linearity. The lifting scheme decomposes the samples in the following three steps: splitting, predicting and updating [27], [28], [29].

(1) Split step: The input samples are split into even samples and odd samples.
(2) Predict step (P): The even samples are multiplied by the predict factor and the results are added to the odd samples to generate the detail coefficients.
(3) Update step (U): The detail coefficients computed by the predict step are multiplied by the update factors and the results are added to the even samples to obtain the coarse coefficients.
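The three steps can be sketched in Python for the simplest case, the Haar wavelet, where the predict factor is $-1$ (each odd sample is predicted by its even neighbour) and the update factor is $1/2$. These factors are an assumption for illustration; other wavelets use different predict and update factors. With them, the coarse coefficients are the pairwise averages and the detail coefficients are the pairwise differences. The function name is hypothetical.

```python
def haar_lifting_forward(x):
    """One Haar lifting level: split, predict, update. Returns (coarse, detail)."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]                            # (1) split
    detail = [o + (-1) * e for o, e in zip(odd, even)]      # (2) predict: odd + factor * even
    coarse = [e + 0.5 * d for e, d in zip(even, detail)]    # (3) update: even + factor * detail
    return coarse, detail


if __name__ == "__main__":
    s, d = haar_lifting_forward([4, 6, 10, 12, 8, 6, 5, 5])
    print(s)  # pairwise averages
    print(d)  # pairwise differences (odd - even)
```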

[Figure 2.1: input SPLIT into EVEN SAMPLES and ODD SAMPLES, followed by PREDICT and UPDATE]

Figure 2.1. Forward Lifting Wavelet Transform

SPLIT: In this step, the data is divided into ODD and EVEN elements.

$$s_k^{(0)} = x_{2k}, \qquad d_k^{(0)} = x_{2k+1} \tag{1}$$

where $x$ is the input sequence, $s_k$ represents the even samples, $d_k$ represents the odd samples, and the superscript $r$ represents the level of decomposition.

PREDICT: The PREDICT [27] step uses a function that approximates the data set. The differences between the approximation and the actual data replace the odd elements of the data set. The even elements are left unchanged and become the input for the next step of the transform. The PREDICT step, in which the odd value is "predicted" from the even values, is described by the equation

$$d_k^{(r)} = d_k^{(r-1)} - \sum_{j} p_j^{(r)}\, s_{k+j}^{(r-1)} \tag{2}$$

UPDATE: The UPDATE [27], [28] step replaces the even elements with an average. This results in a smoother input for the next step of the wavelet transform. The odd elements also represent an approximation of the original data set, which allows filters to be constructed. The UPDATE phase follows the PREDICT phase. The original values of the odd elements have been overwritten by the difference between the odd element and its even "predictor", so in calculating an average the UPDATE phase must operate on the differences stored in the odd elements:

$$s_k^{(r)} = s_k^{(r-1)} + \sum_{j} u_j^{(r)}\, d_{k+j}^{(r)} \tag{3}$$

If there are $2^n$ data elements (for example, one row of a $2^n \times 2^n$ image), the first step of the forward transform produces $2^{n-1}$ averages and $2^{n-1}$ differences (between the prediction and the actual odd element value). These differences are sometimes referred to as wavelet coefficients. The split phase that starts each forward transform step moves the odd elements to the second half of the array, leaving the even elements in the lower half. At the end of the transform step, the odd elements are replaced by the differences and the even elements by the averages. The even elements become the input for the next step, which again starts with the split phase.
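Eqs. (1) to (3) translate directly into code once the predict coefficients $p_j$ and update coefficients $u_j$ are chosen. The Python sketch below assumes the LeGall 5/3 coefficients ($p_0 = p_1 = 1/2$, $u_{-1} = u_0 = 1/4$) and clamps indices at the borders; both are illustrative assumptions rather than the coefficients used in this thesis. It also mimics the array layout described above: each level writes the averages to the lower half and the differences to the upper half, and the next level works only on the averages. The inverse undoes the steps in reverse order with the signs flipped.

```python
# Predict/update coefficients p_j and u_j for the LeGall 5/3 wavelet (assumed example)
P_53 = {0: 0.5, 1: 0.5}       # d_k <- d_k - (s_k + s_{k+1}) / 2
U_53 = {-1: 0.25, 0: 0.25}    # s_k <- s_k + (d_{k-1} + d_k) / 4


def _at(v, i):
    """Clamp the index at the edges (simple boundary handling, an assumption)."""
    return v[min(max(i, 0), len(v) - 1)]


def forward_level(x, p=P_53, u=U_53):
    """Split, Eq. (2) predict, Eq. (3) update; returns averages then differences."""
    s, d = x[0::2], x[1::2]
    d = [d[k] - sum(pj * _at(s, k + j) for j, pj in p.items()) for k in range(len(d))]
    s = [s[k] + sum(uj * _at(d, k + j) for j, uj in u.items()) for k in range(len(s))]
    return s + d


def inverse_level(y, p=P_53, u=U_53):
    """Undo the update, undo the predict, then merge even and odd samples."""
    half = len(y) // 2
    s, d = y[:half], y[half:]
    s = [s[k] - sum(uj * _at(d, k + j) for j, uj in u.items()) for k in range(half)]
    d = [d[k] + sum(pj * _at(s, k + j) for j, pj in p.items()) for k in range(half)]
    x = [0.0] * len(y)
    x[0::2], x[1::2] = s, d
    return x


def forward(x, levels):
    """Each level transforms only the averages left in the lower part of the array."""
    y, n = list(x), len(x)
    for _ in range(levels):
        y[:n] = forward_level(y[:n])
        n //= 2
    return y


def inverse(y, levels):
    """Reverse of forward(); assumes len(y) is divisible by 2**levels."""
    x, n = list(y), len(y) >> (levels - 1)
    for _ in range(levels):
        x[:n] = inverse_level(x[:n])
        n *= 2
    return x


if __name__ == "__main__":
    x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
    y = forward(x, 3)
    print(y)
    print(inverse(y, 3))  # equals x up to rounding
```

For a linear ramp the interior differences are zero, since the 5/3 predictor reproduces straight lines exactly.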

2.2. Inverse Lifting Scheme

One of the elegant features of the lifting scheme is that the inverse transform is a mirror of the forward transform. The block schematic of the inverse lifting scheme is shown in Figure 2.2. In the case of the Haar transform, additions are substituted for subtractions and subtractions for additions, and the merge step replaces the split step.

[Figure 2.2: EVEN SAMPLES and ODD SAMPLES pass through UPDATE and PREDICT and are recombined by MERGE]
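For the Haar case, the mirror structure can be sketched in Python: the inverse first undoes the update (subtracting instead of adding), then undoes the predict (adding instead of subtracting), and finally merges the even and odd samples. The Haar predict factor $-1$ and update factor $1/2$ are assumptions carried over for illustration, and the function names are hypothetical.

```python
def haar_forward(x):
    """Forward Haar lifting: split, predict (odd - even), update (even + d/2)."""
    even, odd = x[0::2], x[1::2]
    d = [o - e for o, e in zip(odd, even)]
    s = [e + dk / 2 for e, dk in zip(even, d)]
    return s, d


def haar_inverse(s, d):
    """Mirror of haar_forward: undo update, undo predict, then merge."""
    even = [sk - dk / 2 for sk, dk in zip(s, d)]   # update with the sign flipped
    odd = [dk + e for dk, e in zip(d, even)]       # predict with the sign flipped
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = even, odd                   # merge replaces split
    return x


if __name__ == "__main__":
    x = [4, 6, 10, 12, 8, 6, 5, 5]
    print(haar_inverse(*haar_forward(x)))  # reproduces x
```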

2.3. Piecewise Lifting scheme DWT

In the conventional lifting-scheme-based DWT, the complete image is divided into two parts, the even and the odd image pixels, and each pair of one even and one odd pixel goes through the PREDICT and UPDATE steps described above. In the modified version of the lifting-scheme-based DWT, the image is not divided into even and odd sections; instead, the complete image is windowed. The windowing is applied throughout the image so that each window contains an equal number of pixels, and the number of windows depends on the percentage of reduction (or magnification) required. For example, if an image of size 256 × 256 is to be reduced by 10% of its original size, then 26 rows and 26 columns (10% of 256 = 25.6, rounded to 26) must be removed, so that the resulting image is of size 230 × 230. To achieve this, the image is divided into windows of size 256/26 ≈ 9.85, rounded to 10 pixels, and the lifting scheme is applied on each window of 10 pixels. Thus 26 windows of 10 pixels are formed for a 256 × 256 image with 10% reduction. Because 256 = 25 × 10 + 6, the last window would contain only 6 samples; to equalize it, the complete image is padded with 2 rows of zeros at the top and bottom and 2 columns of zeros at the left and right (256 + 4 = 260 = 26 × 10), and the lifting scheme is then applied on each window of 10 samples. Applying the PREDICT and UPDATE steps on each window throughout the image therefore yields the 10% reduction in image size.

Magnification of the image, to increase its size by 10%, can be achieved with the inverse lifting scheme. For this purpose, the difference components obtained at every stage of the forward lifting procedure are stored and reused in the inverse lifting procedure: the currently available average component and the stored difference components undergo the inverse lifting procedure to yield the magnified image. The only difference lies in the application of the PREDICT and UPDATE steps, which are interchanged. Thus, piecewise application of the lifting-scheme-based DWT results in both reduction and magnification of an image. Figure 2.3 shows the piecewise application of the lifting scheme DWT to an original image of size 30 × 30, which is divided into 3 windows of 10 samples each for a 10% reduction. The modified lifting scheme is applied to each window individually to achieve the required reduction, and the reverse procedure, the inverse lifting scheme, is applied to obtain magnification of the image. Whereas the generalized lifting scheme requires the data to be divided into even and odd values, the modified piecewise procedure divides the image into a number of windows and applies the lifting scheme to each window.

[Figure 2.3: piecewise application of the lifting scheme DWT; the image is divided into Window1, Window2 and Window3]
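The window arithmetic in the 256 × 256 and 30 × 30 examples can be reproduced with a small Python helper. `window_plan` is a hypothetical name; it only computes, for one image dimension, the number of rows/columns to drop, the window size, the number of windows and the zero padding, and it assumes the padding splits evenly between the two sides, as in the examples. It does not model how each window is reduced.

```python
import math


def window_plan(size: int, reduction: float) -> dict:
    """Windowing arithmetic for one image dimension (rows or columns)."""
    dropped = round(size * reduction)      # rows/columns to remove, e.g. 25.6 -> 26
    window = round(size / dropped)         # samples per window, e.g. 9.85 -> 10
    windows = math.ceil(size / window)     # number of windows needed to cover the image
    pad_total = windows * window - size    # zeros needed to fill the last window
    return {
        "dropped": dropped,
        "window": window,
        "windows": windows,
        "pad_each_side": pad_total // 2,   # assumes pad_total is even, as in the examples
        "output": size - dropped,
    }


if __name__ == "__main__":
    print(window_plan(256, 0.10))
    print(window_plan(30, 0.10))
```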

CHAPTER 3 IMPLEMENTATION OF PIECEWISE LIFTING SCHEME DWT

The architecture for the implementation of the Piecewise Lifting Scheme DWT algorithm consists of two main components: the windowing technique and the lifting scheme. In the windowing technique, the complete image is divided into windows of equal size, and the lifting scheme is then applied to every window to reduce the image size. Reconstruction is also possible by applying the inverse lifting scheme.

3.1. Piecewise Lifting Scheme

In the hardware implementation, the entire design is divided into the modules given below.
1. Windowing.
2. Lifting scheme: Split, Predict and Update.

3.1.2. Flow chart for Piecewise Lifting Scheme DWT

[Flow chart: Start → Apply window technique → Apply row-wise lifting scheme DWT → Apply column-wise lifting scheme DWT → Compressed image → Apply inverse lifting scheme → Original image → End]

Figure 3.1. Flow chart for the piecewise lifting scheme DWT

3.1.3. Windowing

In the conventional lifting-scheme-based DWT, the complete image is divided into two parts, the even and the odd image pixels, and each pair of one even and one odd pixel goes through the PREDICT and UPDATE steps. In the modified version, the image is not divided into even and odd sections; instead, the complete image is windowed so that each window contains an equal number of pixels. The number of windows depends on the percentage of reduction (or magnification) required.

3.1.4. Lifting Scheme

The lifting scheme consists of three steps: Split, Predict and Update.
(1) Split step: The input samples are split into even samples and odd samples.

$$s_k^{(0)} = x_{2k}, \qquad d_k^{(0)} = x_{2k+1} \tag{3.1}$$

[Figure 3.2: input sequence, Reset, Clk, Counter, Controller and Mux with Even and Odd outputs]

Figure 3.2. Architecture for Split Module
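The behaviour of the split module can be modelled cycle by cycle in Python, as a sketch rather than the VHDL design itself. The model assumes a 1-bit counter that is cleared by reset and toggles on every clock, and a multiplexer that routes the incoming sample to the Even output when the counter is 0 and to the Odd output when it is 1; these details are inferred from the block diagram and are assumptions, and `split_module` is a hypothetical name.

```python
def split_module(samples):
    """Return (even, odd, trace) for a stream of samples, one sample per clock."""
    counter = 0  # state after reset: the first sample is treated as even
    even, odd, trace = [], [], []
    for cycle, x in enumerate(samples):
        port = "even" if counter == 0 else "odd"   # mux select driven by the counter
        (even if port == "even" else odd).append(x)
        trace.append((cycle, counter, x, port))
        counter ^= 1                               # toggle on each clock edge
    return even, odd, trace


if __name__ == "__main__":
    e, o, t = split_module([10, 20, 30, 40, 50])
    for row in t:
        print(row)
```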

(2) Predict step: The PREDICT [8] step uses a function that approximates the data set. The differences between the approximation and the actual data replace the odd elements of the data set. The even elements are left unchanged and become the input for the next step of the transform. In this implementation, the odd value is "predicted" from the even value by subtracting the even sample from the odd sample:

$$d_k^{(1)} = d_k^{(0)} - s_k^{(0)} \tag{3.2}$$

[Figure 3.3: a subtractor takes the even sample $s_1^{(0)}$ and the odd sample $d_1^{(0)}$ and outputs the predicted odd sample]

Figure 3.3. Architecture for Prediction Module

(3) Update step: The UPDATE [2], [3] step replaces the even elements with an average. This results in a smoother input for the next step of the wavelet transform. The odd elements also represent an approximation of the original data set, which allows filters to be constructed. The UPDATE phase follows the PREDICT phase. The original values of the odd elements have been overwritten by the difference between the odd element and its even "predictor", so in calculating an average the UPDATE phase must operate on the differences stored in the odd elements. The halving is implemented as a right shift by one:

$$s_k^{(1)} = s_k^{(0)} + \left\lfloor \frac{d_k^{(1)}}{2} \right\rfloor \tag{3.3}$$

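[Figure 3.4: Split produces even samples $s_1^{(0)}$ and odd samples $d_1^{(0)}$; the predicted sample $d_1^{(1)}$ is right-shifted by one ($d_1^{(1)}/2$) and combined with the even sample to give the updated sample $s_1^{(1)}$]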
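The split, predict and update modules together can be modelled in Python with integer arithmetic, which is how the VHDL datapath is assumed to operate. Python's `>>` on a negative integer is an arithmetic shift that rounds toward minus infinity, matching a two's-complement right shift by one; this correspondence and the function name are assumptions of the sketch. A useful consequence for register sizing: for 8-bit input pixels the updated sample equals $\lfloor (\text{even} + \text{odd})/2 \rfloor$ and therefore stays within 0 to 255, while the predicted sample lies in $-255$ to $255$ and needs one extra bit for its sign.

```python
def forward_lifting_int(x):
    """Integer split/predict/update following Eqs. (3.1)-(3.3)."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]                            # Split, Eq. (3.1)
    detail = [o - e for o, e in zip(odd, even)]             # Predict (subtractor), Eq. (3.2)
    coarse = [e + (d >> 1) for e, d in zip(even, detail)]   # Update (shift + adder), Eq. (3.3)
    return coarse, detail


if __name__ == "__main__":
    s, d = forward_lifting_int([100, 103, 50, 47, 0, 255])
    print(s, d)
```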

3.2. Inverse Piecewise Lifting Scheme

Magnification of the image, i.e., increasing its size, can be achieved using the inverse lifting scheme. For this purpose, the difference components obtained at every stage of the forward lifting procedure are stored and reused in the inverse lifting procedure. The currently available average component and the stored difference components undergo the inverse lifting procedure to yield the magnified image.

3.2.1. Inverse Lifting Scheme

One of the elegant features of the lifting scheme is that the inverse transform is a mirror of the forward transform. The block schematic of the inverse lifting scheme is shown in the figure. In the case of the Haar transform, additions are substituted for subtractions and subtractions for additions, and the merge step replaces the split step. In the hardware implementation, the design is divided into two modules: Update and Prediction.

(1) Update: In the Update step, the even samples are reconstructed from the predicted and updated samples of the forward transform, as described by the equation

$$s_k^{(0)} = s_k^{(1)} - \left\lfloor \frac{d_k^{(1)}}{2} \right\rfloor \tag{3.4}$$

[Figure 3.5: the forward-transform predicted sample $d_1^{(1)}$ is right-shifted by one and subtracted from the updated sample $s_1^{(1)}$ to give the even sample $s_1^{(0)}$]

Figure 3.5. Architecture for Inverse Update Module

(2) Prediction: In the Prediction step, the odd values are reconstructed from the predicted values of the forward transform and the reconstructed even samples, as described by the equation

$$ d_k^{(0)} = d_k^{(1)} + s_k^{(0)} \tag{3.5} $$

Figure 3.6. Architecture for Inverse Predict Module (inputs from the forward transform: predicted samples $d_1^{(1)}$ and updated samples $s_1^{(1)}$; outputs: even samples $s_1^{(0)}$ and odd samples)

After the even and odd samples are obtained, they are merged to reconstruct the original sequence.
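Because each lifting step is undone exactly, the inverse can be sketched by running the steps in reverse order with the signs flipped: undo the update of Eq. (3.3), apply Eq. (3.5), then merge. The Python below is a hypothetical model, not the author's code; it assumes integer samples, a Haar-style forward predictor (detail = odd − even), and an update implemented as a right shift by one, which must be matched bit-for-bit here for perfect reconstruction.

```python
from typing import List


def inverse_update(coarse: List[int], detail: List[int]) -> List[int]:
    """Undo Eq. (3.3): s^(0) = s^(1) - d^(1)/2. The shift must match the forward
    one exactly (right shift by one) for perfect reconstruction."""
    return [s - (d >> 1) for s, d in zip(coarse, detail)]


def inverse_predict(even: List[int], detail: List[int]) -> List[int]:
    """Eq. (3.5): d^(0) = d^(1) + s^(0), i.e. odd = detail + even."""
    return [d + e for e, d in zip(even, detail)]


def merge(even: List[int], odd: List[int]) -> List[int]:
    """Merge step: interleave even and odd samples (mirror of split)."""
    out: List[int] = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out


def inverse_lifting(coarse: List[int], detail: List[int]) -> List[int]:
    """One level of the inverse transform: (coarse, detail) -> original samples."""
    if len(coarse) != len(detail):
        raise ValueError("coarse and detail must have the same length")
    even = inverse_update(coarse, detail)
    odd = inverse_predict(even, detail)
    return merge(even, odd)


if __name__ == "__main__":
    print(inverse_lifting([12, 5], [4, -4]))  # [10, 14, 7, 3]
```

The integer shift discards one bit in the forward update, but the same discarded bit is recomputed from the stored detail in the inverse, which is why the reconstruction is lossless.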

CHAPTER-4 FPGA DESIGN FLOW

This chapter describes the implementation flow, the significance of the various tool properties, the reports obtained, and the simulation waveforms of the architectures developed.

4.1. FPGA Design flow

The various steps involved in the design flow are as follows:
1) Design entry.
2) Functional simulation.
3) Synthesizing and optimizing (translating) the design.
4) Placing and routing the design.
5) Timing simulation of the design after place and route (post-PAR).
6) Static timing analysis.
7) Configuring the device by bitstream generation.

4.1.1. Design entry
The first step in implementing the design is to create the HDL code based on the design criteria. When the code instantiates Xilinx primitives, the UNISIM library must be included, and all design libraries must be compiled before functional simulation is performed. The constraints (timing and area constraints) can also be included during design entry; Xilinx accepts constraints in the form of a user constraints file (UCF).

4.1.2. Functional Simulation
This step verifies the functionality of the written source code. ISE provides its own simulator (ISE Simulator) and also allows integration with other tools such as ModelSim. This project uses ModelSim, which was selected as the simulator during project creation and used for functional verification. Functional simulation determines whether the logic in the design is correct before it is implemented in a device, and it can take place at the earliest stages of the design flow. Because timing information for the implemented design is not available at this stage, the simulator tests the logic using unit delays.

4.1.3. Synthesizing and Optimizing
In this stage, the behavioral information in the HDL file is translated into a structural netlist, and the design is optimized for a Xilinx device. This project uses the Xilinx XST tool for synthesis. A netlist is created from the original design, synthesized, and translated into a native generic object (NGO)

file. This file is fed into the Xilinx program NGDBuild, which produces a logical native generic database (NGD) file.

4.1.4. Design implementation
In this stage, the MAP program maps a logical design to a Xilinx FPGA. The input to MAP is an NGD file, which is generated by the NGDBuild program. The NGD file contains a logical description of the design that includes both the hierarchical components used to develop the design and the lower-level Xilinx primitives. The NGD file also contains any number of NMC (macro library) files, each of which contains the definition of a physical macro. MAP first performs a logical DRC (Design Rule Check) on the design in the NGD file, and then maps the design logic to the components (logic cells, I/O cells, and other components) in the target Xilinx FPGA. The output from MAP is an NCD file and a PCF file:
NCD (Native Circuit Description) file: a physical description of the design in terms of the components in the target Xilinx device.
PCF (Physical Constraints File): an ASCII text file that contains the constraints specified during design entry, expressed in terms of physical elements. The physical constraints in the PCF are expressed in Xilinx's constraint language.
After MAP has created the NCD file, the design is placed and routed using PAR. PAR accepts a mapped

NCD file as input, places and routes the design, and outputs an NCD file to be used by the bitstream generator (BitGen). PAR runs the placer in multiple phases and writes the NCD after all the placer phases are complete. During placement, PAR places components into sites based on factors such as the constraints specified in the PCF file, the length of connections, and the available routing resources. After placing the design, PAR executes multiple phases of the router. The router iteratively converges on a solution that routes the design to completion and meets the timing constraints. PAR writes a new NCD as the routing improves throughout the router phases; once the design is fully routed, the final NCD file can be analyzed against timing.

4.1.5. Timing simulation after PAR
Timing simulation at this stage verifies that the design runs at the desired speed for the device under worst-case conditions. This process is performed after the design is mapped, placed, and routed, when all design delays are known. Timing simulation is valuable because it can verify timing relationships and determine the critical paths for the design under worst-case conditions. It can also determine whether the design contains setup or hold violations. In most designs, the same test bench can be used for simulation at this stage.

4.1.6. Static timing analysis

Static timing analysis is best suited for quick timing checks of a design after it is placed and routed. It also allows you to determine the path delays in your design. The two major goals of static timing analysis are:
Timing verification: verifying that the design meets your timing constraints.
Reporting: enumerating input constraint violations and placing them into an accessible file.
ISE provides the Timing Reporter and Circuit Evaluator (TRACE) tool to perform STA. The inputs to TRACE are the .ncd and .pcf files from PAR, and the output is a .twr file.

4.1.7. Configuring the device by BitGen
After the design is completely routed, the device must be configured so that it can execute the desired function. This is done using files generated by BitGen, the Xilinx bitstream generation program. BitGen takes a fully routed NCD (Native Circuit Description) file as input and produces a configuration bitstream: a binary file with a .bit extension. The BIT file contains all of the configuration information from the NCD file that defines the internal logic and interconnections of the FPGA, plus device-specific information from other files associated with the target device. The binary data in the BIT file is then downloaded into the FPGA's memory cells, or it is used to create a PROM file.
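The core idea behind static timing analysis (4.1.6) is a longest-path search over an acyclic timing graph followed by a slack check against the clock constraint. The Python below is a toy illustration of that idea only, not how TRACE works internally: it ignores setup and hold times, clock skew, and clock-to-out delays, and the node names and delays in the example are made up.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (from_node, to_node, delay_ns)


def worst_slack(edges: List[Edge], period_ns: float) -> Tuple[float, Dict[str, float]]:
    """Toy STA: propagate latest arrival times through an acyclic timing
    graph in topological order, then compute slack = period - arrival at
    every endpoint (a node with no fan-out)."""
    if not edges:
        raise ValueError("empty timing graph")
    fanout: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    fanin_count: Dict[str, int] = defaultdict(int)
    nodes = set()
    for src, dst, delay in edges:
        fanout[src].append((dst, delay))
        fanin_count[dst] += 1
        nodes.update((src, dst))

    arrival = {n: 0.0 for n in nodes}            # start points launch at t = 0
    ready = deque(n for n in nodes if fanin_count[n] == 0)
    visited = 0
    while ready:                                  # Kahn's topological traversal
        u = ready.popleft()
        visited += 1
        for v, delay in fanout[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay)  # latest arrival wins
            fanin_count[v] -= 1
            if fanin_count[v] == 0:
                ready.append(v)
    if visited != len(nodes):
        raise ValueError("combinational loop: timing graph is not acyclic")

    slacks = {n: period_ns - arrival[n] for n in nodes if not fanout[n]}
    return min(slacks.values()), slacks


if __name__ == "__main__":
    edges = [("reg_a", "adder", 1.0), ("adder", "reg_b", 1.25),
             ("reg_a", "mux", 0.5), ("mux", "reg_b", 0.25)]
    ws, per_endpoint = worst_slack(edges, period_ns=2.0)
    print(ws)  # -0.25
```

A negative worst slack means the critical path (here reg_a → adder → reg_b, 2.25 ns) does not fit in the clock period, which is the kind of violation the .twr report lists.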

4.2. Processes and properties

Processes and properties enable our design to interact with the functionality available in the ISE suite of tools.
4.2.1. Processes
Processes are the functions listed hierarchically in the Processes window. They perform functions from the start to the end of the design flow.
4.2.2. Properties
Process properties are accessible from the right-click menu for certain processes. They enable us to customize the parameters used by the process. Process properties are set in the synthesis and implementation phases.

4.3. Synthesize options

The following properties apply to the Synthesize process when using the Xilinx Synthesis Technology (XST) synthesis tool.
Optimization Goal
Specifies the global optimization goal for area or speed. Select an option from the drop-down list.
Speed: Optimizes the design for speed by reducing the levels of logic.
Area: Optimizes the design for area by reducing the total amount of logic used for design implementation.
By default, this property is set to Speed.
4.3.1. Optimization Effort
Specifies the synthesis optimization effort level. Select an option from the drop-down list.

Normal: Optimizes the design using minimization and algebraic factoring algorithms.

High: Performs additional optimizations that are tuned to the selected device architecture. High takes more CPU time than Normal because multiple optimization algorithms are tried to get the best result for the target architecture.

By default, this property is set to Normal. Because this project targets timing performance, the High effort level was selected.
4.3.2. Power Reduction
When set to Yes (checkbox is checked), XST optimizes the design to consume as little power as possible. By default, this property is set to No (checkbox is blank).
4.3.3. Use Synthesis Constraints File
Specifies whether or not to use the constraints file entered in the previous property. By default, this constraints file is used (property checkbox is checked).
4.3.4. Keep Hierarchy
Specifies whether or not the corresponding design unit should be preserved and not merged with the rest of the design. You can specify Yes, No, or Soft. Soft is used when you wish to maintain the hierarchy through

synthesis, but do not wish to pass the keep_hierarchy attributes to place and route. By default, this property is set to No. In this project, changing this property from No to Yes almost doubled the achieved speed.
4.3.5. Global Optimization Goal
Specifies the global timing optimization goal. Select an option from the drop-down list.
AllClockNets: Optimizes the period of the entire design.
Inpad to Outpad: Optimizes the maximum delay from input pad to output pad throughout an entire design.
Offset In Before: Optimizes the maximum delay from input pad to clock, either for a specific clock or for an entire design.
Offset Out After: Optimizes the maximum delay from clock to output pad, either for a specific clock or for an entire design.
Maximum Delay: Sets global optimization to maximum delay constraints for paths that start at an input and end at an output. This option incorporates the goals of all the above options.
By default, this property is set to AllClockNets.
4.3.6. Generate RTL Schematic

Generates a pre-optimization RTL schematic of the design. Values for this property are Yes, No, and Only. Only stops the synthesis process before optimization, after the RTL schematic has been generated. The default value is Yes.
4.3.7. Read Cores
Specifies whether or not black-box cores are read for timing and area estimation in order to get better optimization of the rest of the design. When set to True (checkbox is checked), XST parses any black boxes that have been instantiated in your code to extract timing and resource usage information. The black-box netlist is not modified or rewritten. When set to False (checkbox is blank), cores are not read. By default, this property is set to True (checkbox is checked).

4.4. Write Timing Constraints (FPGA only)

Specifies whether or not to place timing constraints in the NGC file. The timing constraints in the NGC file will be used during place and route, as well as during synthesis optimization. By default, this property is set to False (checkbox is blank).
4.4.1. Slice Utilization Ratio
Specifies the area size (in %) that XST will not exceed during timing optimization. If the area constraint cannot be satisfied, XST performs timing optimization regardless of the area constraint. The default ratio is 100%. You can disable automatic resource management by entering -1 here.
4.4.2. LUT-FF Pairs Utilization Ratio

Specifies the area size (in %) that XST will not exceed during timing optimization. If the area constraint cannot be satisfied, XST performs timing optimization regardless of the area constraint. The default ratio is 100%. You can disable automatic resource management by entering -1 here.
4.4.3. BRAM Utilization Ratio
Specifies the number of BRAM blocks (in %) that XST will not exceed during synthesis. The default percentage is 100%. You can disable automatic BRAM resource management by entering -1 here.

4.5. Implementation options

4.5.1. Map Properties
4.5.2. Perform Timing-Driven Packing and Placement
Specifies whether or not to give priority to timing-critical paths during packing in the Map process. User-generated timing constraints are used to drive the packing and placement operations. The timing constraints are generally specified in the User Constraints File (UCF) and are annotated onto the design during the Translate process. At the completion of the process, the result is a completely placed design, and the design is ready for routing. If Timing-Driven Packing and Placement is selected in the absence of user timing constraints, the tools will automatically generate and dynamically adjust timing constraints for all internal clocks. This feature is referred to as Performance Evaluation mode. This mode allows the clock performance for all clocks in the design to be evaluated in one pass. The performance achieved

by this mode is not necessarily the best possible performance each clock can achieve; instead, it is a balance of performance between all clocks in the design. By default, this property is set to False (checkbox is blank). Because this project targets speed, this option was selected.
4.5.3. Map Effort Level
Note: Available only when Perform Timing-Driven Packing and

Placement is set to True (checkbox is checked).
Specifies the effort level to apply to the Map process. The effort level controls the amount of time used for packing and placement by selecting a more or less CPU-intensive algorithm for placement. Select an option from the drop-down list.
Standard: Gives the fastest run time with the lowest mapping effort. Appropriate for a less complex design.
Medium: Gives a medium run time with good mapping results.
High: Gives the longest run time with the best mapping results. Appropriate for a more complex design.
By default, this property is set to Medium. Because this project is a complex design, High was selected.

4.5.4. Extra Effort
Map spends additional run time in an effort to meet difficult timing constraints. Note: The Extra Effort property is available only when the Map Effort Level is set to High. Select an option from the drop-down list.

None: No extra effort is applied.

Normal: Runs until timing constraints are met unless they are found to be

impossible to meet. This option focuses on meeting timing constraints.
Continue on Impossible: Continues working to improve timing until no more progress is made, even if timing constraints are impossible. This option focuses on getting close to meeting timing constraints.
By default, this property is set to None. This project has a timing constraint of 100 ns; to meet this constraint, Normal was selected.

4.6. Combinatorial Logic Optimization

Specifies whether or not to run a process that revisits the combinatorial logic within a design to see if any improvements can be made that will improve the overall quality of results. Timing constraints and logic packing information are considered when this process is run. By default, this property is set to False (checkbox is blank), and this process is not run on the design. Because this project aims to meet its timing constraint, this option was selected.

4.7. Optimization Strategy (Cover Mode)

Specifies the criteria used during the "cover" phase of MAP. In the "cover" phase, MAP assigns the logic to CLB function generators (LUTs). Select an option from the drop-down list.
Area: Select Area to make reducing the number of LUTs (and therefore the number of CLBs) the highest priority.
Speed: Select Speed to make reducing the number of levels of LUTs (the number of LUTs a path passes through) the highest priority. This setting makes it easiest to achieve your timing constraints after the design is placed and routed. For most designs there is a small increase in the number of LUTs (compared to the Area setting), and in some cases the increase may be large.

Balanced: Select Balanced to balance two priorities: reducing the number of LUTs

and reducing the number of levels of LUTs. The Balanced option produces results similar to the Speed setting but avoids the possibility of a large increase in the number of LUTs.
Off: Select Off to disable optimization.
By default, this property is set to Area. To meet the timing constraints, Speed was selected for this project.

4.8. PAR properties

4.8.1. Place and Route Effort Level (Overall)
Specifies the effort level you want to apply to the Place & Route process. The effort level controls the placement and route times by selecting a more or less CPU-intensive algorithm for placement and routing. You can set the overall level from Standard (fastest run time) to High (best results). By default, this property is set to Standard. To meet the timing constraint, High was selected for this project.

4.9. Xilinx Core Generator

The Xilinx CORE Generator System provides you with a catalog of ready-made functions ranging in complexity from simple arithmetic operators such as adders, accumulators and multipliers, to system level building blocks including filters, transforms and memories.

The CORE Generator System can customize a generic functional building block such as a FIR filter or a multiplier to meet the needs of your application and simultaneously deliver high levels of performance and area efficiency.
4.9.1. Block Memory Generator
The Block Memory Generator core is an advanced memory constructor that generates area- and performance-optimized memories using embedded block RAM resources in Xilinx FPGAs. Through the CORE Generator software, users can quickly create optimized memories that leverage the performance and features of block RAMs in Xilinx FPGAs. The Block Memory Generator core uses embedded Block Memory primitives in Xilinx FPGAs to extend the functionality and capability of a single primitive to memories of arbitrary widths and depths. Sophisticated algorithms within the core produce optimized solutions that provide convenient access to memories for a wide range of configurations. The Block Memory Generator has two fully independent ports that access a shared memory space. Both the A and B ports have a write and a read interface. In Virtex-6, Virtex-5, and Virtex-4 FPGA architectures, all four interfaces can be uniquely configured, each with a different data width. When not all four interfaces are used, the user can select a simplified memory configuration (for example, a Single-Port Memory or Simple Dual-Port Memory), allowing the core to use available resources more efficiently.
4.9.2. Memory Types

The Block Memory Generator core uses embedded block RAM to generate five types of memories: Single-port RAM, Simple Dual-port RAM, True Dual-port RAM, Single-port ROM, and Dual-port ROM. For dual-port memories, each port operates independently. Operating mode, clock frequency, optional output registers, and optional pins are selectable per port. For Simple Dual-port RAM, the operating modes are not selectable; they are fixed as READ_FIRST.
4.9.3. Configurable Width and Depth
The Block Memory Generator can generate memory structures from 1 to 1152 bits wide, and at least two locations deep. The maximum depth of the memory is limited only by the number of block RAM primitives in the target device.
4.9.4. Selectable Operating Mode per Port
The Block Memory Generator supports the following block RAM primitive operating modes: WRITE_FIRST, READ_FIRST, and NO_CHANGE. Each port may be assigned an operating mode.
4.9.5. Selectable Port Aspect Ratios
The core supports the same port aspect ratios as the block RAM primitives: in all supported device families, the A port width may differ from the B port width by a factor of 1, 2, 4, 8, 16, or 32.

In Virtex-6, Virtex-5, and Virtex-4 FPGA-based memories, the read width may differ from the write width by a factor of 1, 2, 4, 8, 16, or 32 for each port. The maximum ratio between any two of the data widths (DINA, DOUTA, DINB, and DOUTB) is 32:1.
4.9.8. Optional Byte-Write Enable
In Virtex-6, Virtex-5, Virtex-4, Spartan-6, and Spartan-3A/3A DSP FPGA-based memories, the Block Memory Generator core provides byte-write support for memory widths of 8-bit (no parity) or 9-bit multiples (with parity).
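The width rules in 4.9.3, 4.9.5, and 4.9.8 can be checked before a core is generated. The Python below is a hypothetical helper written from the rules as stated above (widths of 1 to 1152 bits, power-of-two ratios of at most 32:1 between any two data widths, byte-write widths that are multiples of 8 or 9); it is not part of the CORE Generator tools, and it treats all supplied widths as one group, which matches the Virtex-6/5/4 case where read and write widths may also differ.

```python
from typing import Sequence

MAX_WIDTH = 1152  # widest memory the core can generate (4.9.3)
MAX_RATIO = 32    # largest allowed ratio between any two data widths


def widths_compatible(widths: Sequence[int]) -> bool:
    """Check DINA/DOUTA/DINB/DOUTB widths against the aspect-ratio rule:
    every width must be the narrowest one times 1, 2, 4, 8, 16 or 32,
    which makes every pairwise ratio a power of two of at most 32."""
    if not widths or any(w < 1 or w > MAX_WIDTH for w in widths):
        return False
    narrowest = min(widths)
    for w in widths:
        if w % narrowest:
            return False
        ratio = w // narrowest
        if ratio > MAX_RATIO or ratio & (ratio - 1):  # must be a power of two
            return False
    return True


def byte_write_ok(width: int, with_parity: bool) -> bool:
    """Byte-write needs a width that is a multiple of 8 (no parity) or 9 (parity)."""
    unit = 9 if with_parity else 8
    return width > 0 and width % unit == 0


if __name__ == "__main__":
    print(widths_compatible([8, 32, 16, 8]), widths_compatible([8, 24]))  # True False
```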

4.9.9. Optional Pipeline Stages
The core provides optional pipeline stages within the MUX, available only when the registers at the output of the memory core are enabled and only for specific configurations. For the available configurations, the number of pipeline stages can be 1, 2, or 3.
4.9.10. Memory Initialization
The memory contents can be optionally initialized using a memory coefficient (COE) file or by using the default data option. A COE file can define the initial contents of each individual memory location, while the default data option defines the initial content of all locations.
4.9.11. Simulation Models
The Block Memory Generator core provides behavioral and structural simulation models in VHDL and Verilog for both simple and precise modeling of memory behaviors, for example, debugging, probing the contents of the memory, and collision detection.
4.9.12. Functional Description
The Block Memory Generator is used to build custom memory modules from block RAM primitives in Xilinx FPGAs. The core implements an optimal memory by arranging block RAM primitives based on user selections, automating the process of primitive instantiation and concatenation. Using the CORE Generator Graphical User Interface (GUI), users can configure the core and rapidly generate a highly optimized custom memory solution.
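The COE initialization described in 4.9.10 can be produced from software, for example to preload image samples into block RAM before simulation. The helper below is a hypothetical Python sketch, not part of the CORE Generator tools; it assumes the COE keywords memory_initialization_radix and memory_initialization_vector used by the Block Memory Generator, and it accepts unsigned words only, so negative detail coefficients would first need a two's-complement encoding.

```python
from typing import Sequence


def to_coe(values: Sequence[int], width_bits: int, radix: int = 16) -> str:
    """Render unsigned memory words as COE text, one word per line."""
    if radix not in (2, 10, 16):
        raise ValueError("radix must be 2, 10 or 16")
    if not values:
        raise ValueError("at least one word is required")
    digits = {2: width_bits, 16: (width_bits + 3) // 4}  # zero-pad to the word width
    spec = {2: "b", 10: "d", 16: "X"}[radix]
    words = []
    for v in values:
        if not 0 <= v < (1 << width_bits):
            raise ValueError(f"{v} does not fit in {width_bits} unsigned bits")
        pad = digits.get(radix)  # None for radix 10: no padding
        words.append(format(v, f"0{pad}{spec}") if pad else format(v, spec))
    return (
        f"memory_initialization_radix={radix};\n"
        "memory_initialization_vector=\n"
        + ",\n".join(words)
        + ";\n"
    )


if __name__ == "__main__":
    print(to_coe([0, 10, 255], width_bits=8), end="")
```

For the example call, the output is the radix line, the vector keyword line, and the words 00, 0A, and FF, one per line, with the last word terminated by a semicolon instead of a comma.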

CHAPTER-5 RESULTS AND ANALYSIS

5.1. Simulation Results
The behavioral and post-route simulation waveforms for the Split function are shown in Figure 5.1 and Figure 5.2. In Figure 5.1, the inputs are clock, reset, enable, and a 143-bit input sequence. When reset is high, all signals are cleared to zero. When the enable signal (ena) goes high after reset has been set low, the 143-bit input is split into even and odd samples, which appear at the outputs.

Figure 5.2. Post-route simulation waveform for the Split function

The behavioral and post-route simulation waveforms for the prediction and update functions are shown in Figure 5.3 and Figure 5.4. In Figure 5.3, the inputs are clock, reset, enable, and a 143-bit input sequence. When enable is high, the sequence is split into even and odd samples. The prediction operation is then performed to generate the detail coefficients, followed by the update operation, which generates the coarse coefficients.

Figure 5.3. Behavioral simulation waveform for the prediction and update function

Figure 5.4. Post-route simulation waveform for the prediction and update function

The behavioral and post-route simulation waveforms for the inverse lifting scheme are shown in Figure 5.5 and Figure 5.6. In these figures, the inputs are the detail and coarse samples from the forward transform. Whenever the enable signal is high, the inverse update and prediction functions are performed to regenerate the original sequence.

Figure 5.5. Behavioral simulation waveform for the inverse lifting scheme

Figure 5.6. Post-route simulation waveform for the inverse lifting scheme

5.2. Design Summary of the Piecewise Lifting Scheme

The design implementation summaries of the Forward Lifting Scheme and the Inverse Lifting Scheme are shown in Table 5.1 and Table 5.2, respectively.

Table 5.1: Design implementation summary for the Forward Lifting Scheme
Logic Utilization              Used    Available    Utilization
Number of Slices               105     14,752       0%
Number of Slice Flip Flops     31      29,504       0%
Number of 4-input LUTs         208     29,504       0%
Number of IOs                  165     -            -
Number used as Flip Flops      5       -            -
Number used as Latches         26      -            -

Logic Distribution:

Number of occupied Slices Number of Slices containing only related logic Number of Slices containing unrelated logic Total Number of 4 input LUTs Number used as logic IOB Latches Number of GCLKs Total equivalent gate count for design Additional JTAG gate count for IOBs Peak Memory Usage

1% 100% 0% 1% 43% -8%

Timing summary (Forward Lifting Scheme):
Minimum period: 2.346 ns (Maximum Frequency: 426.212 MHz)
Minimum input arrival time before clock: 3.141 ns
Maximum output required time after clock: 8.386 ns
Maximum combinational path delay: No path found

Table 5.2: Design implementation summary for the Inverse Lifting Scheme
Logic Utilization              Used    Available    Utilization
Number of Slices               68      14,752       0%
Number of Slice Flip Flops     9       29,504       0%
Number of 4-input LUTs         132     29,504       1%
Number of IOs                  36      -            -

Number used as logic Logic Distribution Number of occupied Slices Number of Slices containing only related logic Number of Slices containing unrelated logic Total Number of 4 input LUTs Number used as logic IOB Latches Number of GCLKs Total equivalent gate count for design Additional JTAG gate count for IOBs Peak Memory usage

-1% 100% 0% 1% --% -4%

Timing summary (Inverse Lifting Scheme):
Minimum period: No path found
Minimum input arrival time before clock: 6.895 ns
Maximum output required time after clock: 8.188 ns
Maximum combinational path delay: 8.852 ns

5.3. RTL Schematic

In integrated circuit design, register transfer level (RTL) description is a way of describing the operation of a synchronous digital circuit. In RTL design, a circuit's behavior is defined in terms of the flow of signals (or transfer of data) between hardware registers, and the logical operations performed on those signals.

After the HDL synthesis phase of the synthesis process, the RTL Viewer can be used to view a schematic representation of the pre-optimized design in terms of generic symbols that are independent of the targeted Xilinx device, for example, adders, multipliers, counters, AND gates, and OR gates. The RTL schematic for the Forward Piecewise Lifting Scheme generated by the Xilinx Synthesis tool is shown in Figure 5.7 below.

Figure 5.7. RTL Schematic for Forward Lifting Scheme

The RTL schematic for the Inverse Piecewise Lifting Scheme generated by the Xilinx Synthesis tool is shown in Figure 5.8 below.

Figure 5.8. RTL Schematic for Inverse Lifting Scheme

[1] O. Rioul and M. Vetterli, "Wavelets and Signal Processing," IEEE Signal Processing Magazine, vol. 8, no. 4, pp. 14-38, Oct. 1991.
[2] P. S. Addison, The Illustrated Wavelet Transform Handbook. IOP Publishing Ltd, 2002. ISBN 0-7503-0692-0.
[3] S. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, July 1989.
[4] I. Daubechies, "The Wavelet Transform, Time-Frequency Localization and Signal Analysis," IEEE Trans. on Information Theory, vol. 36, no. 5, pp. 961-1005, Sept. 1990.
[5] J. Villasenor, B. Belzer, and J. Liao, "Wavelet Filter Evaluation for Image Compression," IEEE Trans. on Image Processing, vol. 4, no. 8, pp. 1053-1060, Aug. 1995.

1. W. Sweldens, "The lifting scheme: A custom-design construction of biorthogonal wavelets," Appl. Comput. Harmon. Anal., vol. 3, no. 2, pp. 186-200, 1996.
2. W. Sweldens, "The lifting scheme: A construction of second generation wavelets," SIAM J. Math. Anal., vol. 29, no. 2, pp. 511-546, 1998.
3. I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps," J. Fourier Anal. Appl., vol. 4, no. 3, pp. 247-269, 1998.
4. W. Sweldens, "The lifting scheme: A custom-design construction of biorthogonal wavelets," Appl. Comput. Harmon. Anal., vol. 3, no. 2, pp. 186-200, 1996.