
Adaptive compressed sensing of speech signal

CHAPTER-1

INTRODUCTION
Compressive sensing is an emerging and revolutionary technology that strongly relies on the sparsity of the signal. In compressive sensing the signal is compressively sampled by taking a small number of random projections, which contain most of the salient information. Compressive sensing has previously been applied in areas such as image processing, radar systems and sonar systems, and it is now being used in speech processing as an advanced technique for acquiring data. The key objective in compressed sensing (also referred to as sparse signal recovery or compressive sampling) is to reconstruct a signal accurately and efficiently from a set of few non-adaptive linear measurements. Of course, linear algebra easily shows that in general it is not possible to reconstruct an arbitrary signal from an incomplete set of linear measurements. Thus one must restrict the domain in which the signals belong. To this end, we consider sparse signals, those with few non-zero coordinates. It is now known that many signals, such as real-world images or audio signals, are sparse. Since sparse signals lie in a lower dimensional space, one would think that they may be represented by few linear measurements. This is indeed correct, but the difficulty is determining in which lower dimensional subspace such a signal lies. That is, we may know that the signal has few non-zero coordinates, but we do not know which coordinates those are. It is thus clear that we may not reconstruct such signals using a simple linear operator, and that the recovery requires more sophisticated techniques. The compressed sensing field has provided many recovery algorithms, most with provable as well as empirical results. There are several important traits that an optimal recovery algorithm must possess. The algorithm needs to be fast, so that it can efficiently recover signals in practice. Minimal storage requirements would also be ideal.

The algorithm should provide uniform guarantees, meaning that given a specific method of acquiring linear measurements, the algorithm recovers all sparse signals (possibly with high probability). Ideally, the algorithm would require as few linear measurements as possible; however, recovering a signal from the theoretical minimum number of measurements would require searching through the exponentially large set of all possible lower dimensional subspaces, which is not numerically feasible in practice. Thus, in the more realistic setting, we may need slightly more measurements. Finally, we wish our ideal recovery algorithm to be stable: if the signal or its measurements are perturbed slightly, then the recovery should still be approximately accurate. This is essential, since in practice we often encounter not only noisy signals or measurements, but also signals that are not exactly sparse, but close to being sparse. The conventional scheme in signal processing, acquiring the entire signal and then compressing it, was questioned by Donoho. Indeed, this technique uses tremendous resources to acquire often very large signals, just to throw away information during compression. The natural question then is whether we can combine these two processes and directly sense the signal, or its essential parts, using few linear measurements. Recent work in compressed sensing has answered this question in the positive, and the field continues to rapidly produce encouraging results.

1.1 Objective
Compressed sensing (CS) is an emerging signal acquisition theory that directly collects signals in a compressed form if they are sparse in some basis. It originates from the idea that it is not necessary to invest a lot of power into observing the entries of a sparse signal in all coordinates when most of them are zero anyway. Rather, it should be possible to collect only a small number of measurements that still allow for reconstruction. This is potentially useful in applications where one cannot afford to collect or transmit many measurements but has rich resources at the decoder.


Observing that different kinds of speech frames have different intra-frame correlations, a frame-based adaptive compressed sensing framework for speech signals has been proposed. The objective of this project is to further improve the performance of the existing compressed sensing process, which uses a non-adaptive projection matrix, by using an adaptive projection matrix based on frame analysis. The average frame signal-to-noise ratio (AFSNR) is calculated to compare the performance of the frame-based adaptive CS with that of the non-adaptive CS.

1.2 Existing System


Compressed sensing is a technique used to overcome the constraints of the conventional sampling theorem: it allows the signal to be sampled below the Nyquist rate. In a typical communication system, the signal is sampled at least at twice the highest frequency contained in the signal. However, this limits efficient ways to compress the signal, as it places a huge burden on sampling the entire signal while only a small number of the transform coefficients are needed to represent it. On the other hand, compressive sampling provides a new way to reconstruct the original signal from a minimal number of observations. CS is a sampling paradigm that allows us to go beyond the Shannon limit by exploiting the sparsity structure of the signal. It allows us to capture and represent compressible signals at a rate significantly below the Nyquist rate. The signal is then reconstructed from these projections by using different optimization techniques. During compressive sampling only the important information about a signal is acquired, rather than acquiring the important information plus the information that will eventually be discarded at the receiver. However, the existing compressed sensing scheme uses a non-adaptive projection matrix and takes the same number of projections for all frames, which degrades the system's efficiency.

1.3 Proposed System


The efficiency of conventional non-adaptive compressed sensing can be increased by using an adaptive projection matrix. The adaptive projection matrix uses a different number of projections for different frames based on their intra-frame correlations, thus improving the efficiency of the system.

Most work in CS research focuses on a random projection matrix, which is constructed by considering only the signal's sparsity rather than its other properties. In other words, the construction of the projection matrix is non-adaptive. Observing that different kinds of speech frames have different intra-frame correlations, a frame-based adaptive compressed sensing framework, which applies an adaptive projection matrix to speech signals, has been proposed. To do so, neighbouring frames are compared to estimate their intra-frame correlation, every frame is classified into one of several categories, and the number of projections for each frame is adjusted accordingly. The experimental results show that the adaptive projection matrix can significantly improve the speech reconstruction quality.


CHAPTER-2

LITERATURE SURVEY
According to information theory, the bit rate at which distortionless transmission of any source signal is possible is determined by the entropy of the speech source message. In practical terms, however, the source rate corresponding to the entropy is only asymptotically achievable, as the encoding memory length or delay tends to infinity. Any further compression is associated with information loss or coding distortion. Many practical source compression techniques employ lossy coding, which typically guarantees further bit rate economy at the cost of nearly imperceptible degradation of the speech, audio, video, or other source representation. Note that the optimum Shannonian source encoder generates a perfectly uncorrelated source-coded stream, in which all the source redundancy has been removed. Therefore, the encoded source symbols, which in most practical cases are binary bits, are independent, and each one has the same significance. Having the same significance implies that the corruption of any of the source-encoded symbols results in identical source signal distortion over imperfect channels. Under these conditions, according to Shannon's fundamental work, the best protection against transmission errors is achieved if source and channel coding are treated as separate entities. When using a block code of length N channel-coded symbols in order to encode K source symbols with a coding rate of R = K/N, the symbol error rate can be rendered arbitrarily low as N tends to infinity.

2.1 Speech Production


Speech is a natural form of communication for human beings; we use speech every day almost unconsciously, but an understanding of the mechanisms on which it is based helps to clarify how the brain processes this information. Figure 2.1 shows the process of human speech production.


Figure 2.1: Human Speech Production System

The human speech production system comprises the lungs, vocal cords, and the vocal tract. The vocal cords can be modelled as a simple vibration source, and the pitch of the speech changes according to adjustments in the tension of the vocal cords. Speech is generated by emitting sound pressure waves, radiated primarily from the lips, although significant energy also emanates from the nostrils, throat, and so on. The air compressed by the lungs excites the vocal cords in two typical modes. When generating voiced sounds, the vocal cords vibrate and generate a high-energy quasi-periodic speech waveform, while in the case of lower energy unvoiced sounds, the vocal cords do not participate in voice production and the source behaves like a noise generator. In a somewhat simplistic approach, the excitation signal, denoted by E(z), is then filtered through the vocal apparatus, which behaves like a spectral shaping filter with a transfer function H(z) constituted by the spectral shaping action of the glottis, which is the opening between the vocal folds. Further spectral shaping is carried out by the vocal tract, lip radiation characteristics, and so on.


Human speech in its pristine form is an acoustic signal. For the purpose of communication and storage, it is necessary to convert it into an electrical signal. This is accomplished with the help of certain instruments called transducers. This electrical representation of speech has certain properties:
1. It is a one-dimensional signal, with time as its independent variable.
2. It is random in nature.
3. It is non-stationary, i.e. the frequency spectrum is not constant in time.
A microphone is a transducer that converts the acoustic speech signal into an electrical signal. The microphone receives the acoustic voice signal and produces an electrical signal whose amplitude is proportional to the intensity of the input acoustic voice signal.

Figure 2.2: Block diagram of a microphone

The electrical signal produced by the microphone is an analog signal whose amplitude varies continuously with time; it is continuous in both time and amplitude, as shown in the figure below.

Figure 2.3: Analog speech signal


2.2 Digitization of Speech


Speech is a very basic way for humans to convey information to one another. With a bandwidth of only about 4 kHz, speech can convey information with the emotion of a human voice. People want to be able to hear someone's voice from anywhere in the world as if the person were in the same room. As a result, a greater emphasis is being placed on the design of new and efficient speech coders for voice communication and transmission. Today the applications of speech coding and compression have become very numerous. Although the electrical signal coming from the microphone can be processed in its original analog form, this is not as efficient as processing it in digital form. With the advent of digital computing machines, it was proposed to exploit their power for the processing of speech signals. This required a digital representation of speech. To achieve this, the analog signal is sampled at some frequency and then quantized at discrete levels. There are many advantages in processing the signal in digital rather than analog form. The analog speech signal is converted into digital form by using an Analog to Digital Converter (ADC).

Figure 2.4: Block diagram of Analog to digital converter

As shown in the figure above, the analog to digital converter performs three basic functions:
1. Sampling
2. Quantization
3. Encoding


2.2.1 Sampling
The sampling process converts the continuous-time signal into a discrete-time signal. This is achieved by multiplying the input continuous signal with an impulse train of unity magnitude. The frequency of the impulse train, also known as the sampling frequency, should be at least twice the highest frequency component present in the input analog signal (Fs >= 2Fm); this rate is called the Nyquist rate (the Nyquist condition for sampling).

Figure 2.5: Block diagram of Sampler
The output of the sampler is a discrete-time signal in which samples are present only at discrete intervals of time.
2.2.2 Quantization
The output obtained from the sampling process is discrete only in time; its amplitude values still assume continuous values. In order to make the amplitude discrete as well, quantization is used.

Figure 2.6: Block diagram of Quantizer

The quantizer assigns discrete values to the input samples by either a rounding-off process or a truncation process. In the rounding-off process the values are assigned to the nearest integer multiple of the step size, whereas in the truncation process the values are assigned by truncating them to the integer multiple of the step size below. Compared with rounding off, the truncation process produces more error. Thus the output of the quantizer is a signal which is discrete in both time and amplitude.
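To make the sampling and quantization steps above concrete, the following minimal NumPy sketch samples a synthetic two-tone signal and applies uniform quantization by both rounding and truncation. The 8 kHz rate, 20 ms window, 8-bit resolution and the test signal are illustrative assumptions, not parameters taken from this project.

    import numpy as np

    # Illustrative parameters (assumptions, not project settings)
    fs = 8000          # sampling frequency, Hz (>= 2 * highest frequency of interest)
    duration = 0.02    # 20 ms analysis window
    n_bits = 8         # quantizer resolution
    t = np.arange(0, duration, 1.0 / fs)

    # A toy speech-like signal: 200 Hz fundamental plus a weaker 700 Hz component
    x = 0.6 * np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 700 * t)

    # Uniform quantization over [-1, 1)
    levels = 2 ** n_bits
    step = 2.0 / levels

    x_round = step * np.round(x / step)      # rounding to the nearest level
    x_trunc = step * np.floor(x / step)      # truncation (always rounds toward -inf)

    print("max rounding error  :", np.max(np.abs(x - x_round)))   # bounded by step/2
    print("max truncation error:", np.max(np.abs(x - x_trunc)))   # bounded by step

Running the sketch shows the truncation error roughly twice as large as the rounding error, which is the point made in the paragraph above.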

2.2.3 Encoding
The encoder assigns binary values to the sample values coming from the quantizer.

Figure 2.7: Block diagram of Encoder

Thus the output of the encoder is a purely digital signal represented by zeros and ones.
2.2.4 Anti-Aliasing Filter
Apart from the three building blocks discussed above (sampler, quantizer and encoder), the Analog to Digital Converter also contains a low pass filter called an anti-aliasing filter. Anti-aliasing low-pass filtering (LPF) is necessary in order to band limit the signal to a bandwidth of B before sampling. In the case of speech signals, about 1% of the energy resides above 4 kHz and only a negligible proportion above 7 kHz. Hence, commentary-quality speech links, which are also often referred to as wideband speech systems, typically band limit the speech signal to 7-8 kHz. Conventional telephone systems usually employ a bandwidth limitation of 0.3-3.4 kHz, which results in only a minor speech degradation, hardly perceptible by the listener.

2.3 Speech Signal Analysis


In contrast to deterministic signals, random signals, such as speech, music, video, and other information signals, cannot be described by analytical formulas. They are typically characterized by statistical functions. The power spectral density (PSD), autocorrelation function (ACF), cumulative distribution function (CDF), and probability density function (PDF) are some of the most frequently invoked ones. A typical speech sentence signal consists of two main parts.


One part carries the speech information, and the other consists of silent or noise sections between the utterances, without any verbal information. The verbal (informative) part of speech can be further divided into two categories, namely voiced speech and unvoiced speech.
2.3.1 Voiced and Unvoiced Speech
In speech there are two major types of excitation, voiced and unvoiced. Voiced speech consists mainly of vowel sounds. It is produced by forcing air through the glottis; proper adjustment of the tension of the vocal cords results in the opening and closing of the cords and the production of almost periodic pulses of air. These pulses excite the vocal tract. Psychoacoustic experiments show that this part holds most of the information in the speech and thus holds the key to characterizing a speaker. Unvoiced speech sections are generated by forcing air through a constriction formed at a point in the vocal tract (usually toward the mouth end), thus producing turbulence. Being able to distinguish between voiced speech, unvoiced speech and silence is very important for speech signal analysis. Voiced speech tends to be periodic in nature. Examples of voiced sounds are English vowels, such as the /a/ in "bay" and the /e/ in "see". Since unvoiced speech is due to turbulence, it is aperiodic and has a noise-like structure. Some examples of unvoiced English sounds are the /s/ in "so" and the /h/ in "he". In general, at least 90% of the speech energy is retained in the first N/2 transform coefficients if the speech is a voiced frame. However, for an unvoiced frame the energy is spread across several frequency bands and typically the first N/2 coefficients hold less than 40% of the total energy. For this reason wavelets are inefficient at coding unvoiced speech. Unvoiced speech frames are infrequent; by detecting unvoiced speech frames and directly encoding them (perhaps using entropy coding), no unvoiced data is lost and the quality of the compressed speech remains transparent. Typical voiced and unvoiced speech waveform segments are shown in Figures 2.8 and 2.9 respectively, along with their corresponding power spectral densities. Clearly, the unvoiced segment has a significantly lower magnitude, which is also reflected by its PSD.

Adaptive compressed sensing of speech signal

Figure 2.8: Voiced speech segment and its PSD

Figure 2.9: Unvoiced speech segment and its PSD

The voiced segment shown in Figure 2.8 is quasi-periodic in the time domain, with an approximately 80-sample periodicity identified by the positions of the largest time-domain signal peaks, which corresponds to 10 ms. This interval is referred to as the pitch period, and it is also often expressed in terms of the pitch frequency p, which in this example is 1/(10 ms) = 100 Hz. In the case of male speakers the typical pitch frequency range is between 40 and 120 Hz, whereas for females it can be as high as 300-400 Hz.


Observe furthermore that within each pitch period there is a gradually decaying oscillation, which is associated with the excitation and the gradually decaying vibration of the vocal cords. A perfectly periodic time-domain signal would have a line spectrum; since the voiced speech signal is only quasi-periodic with pitch frequency p, its spectrum exhibits somewhat widened but still distinctive spectral needles at the frequencies np. As a second phenomenon, three, sometimes four, spectral envelope peaks can also be observed. In the voiced spectrum of Figure 2.8 these formant frequencies are observable around 500 Hz, 1500 Hz, and 2700 Hz, and they are the manifestation of the resonances of the vocal tract at these frequencies. In contrast, the unvoiced segment of Figure 2.9 does not have a formant structure; rather, it has a more dominant high-pass nature, exhibiting a peak around 2500 Hz. Observe, furthermore, that its energy is much lower than that of the voiced segment. It is equally instructive to study the ACF of voiced and unvoiced segments, which are portrayed on an expanded scale in Figure 2.10 and Figure 2.11 respectively.

Figure 2.10: Voiced speech segment and its ACF

The voiced ACF shows a set of periodic peaks at displacements of about 20 samples, corresponding to 2.5 ms, which coincides with the quasi-periodic positive time-domain segments. Following four monotonically decaying peaks, there is a more dominant one around a displacement of 80 samples, which indicates the pitch periodicity.

Figure 2.11: Unvoiced speech segment and its ACF

The periodic nature of the ACF can therefore be exploited, for example, to detect and measure the pitch periodicity in a range of applications, such as speech codecs and voice activity detectors. Observe, however, that the first peak at a displacement of 20 samples is about as high as the one near 80. Hence, a reliable pitch detector has to attempt to identify and rank all these peaks in order of prominence, also exploiting a priori knowledge of the expected range of pitch frequencies. By contrast, the unvoiced segment has a much more rapidly decaying ACF, indicating no inherent correlation between adjacent samples and no long-term periodicity. The voiced and unvoiced speech signals thus have some distinct characteristic features that enable us to distinguish between them.
2.3.2 Zero Crossing Rate
The rate at which the speech signal crosses zero can provide information about the source of its creation. It is well known that unvoiced speech has a much higher zero crossing rate (ZCR) than voiced speech; this is because most of the energy in unvoiced speech is found at higher frequencies than in voiced speech, implying a higher ZCR for the former.

2.3.3 Cross-Correlation
Cross-correlation is calculated between two consecutive pitch cycles. The cross-correlation values between pitch cycles are higher (close to 1) in voiced speech than in unvoiced speech.
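A small NumPy sketch of the two discriminating features just described (zero crossing rate and pitch-cycle cross-correlation), evaluated on a synthetic quasi-periodic "voiced-like" frame and a noise-like "unvoiced-like" frame. The 320-sample frame and 80-sample pitch period are assumed purely for illustration.

    import numpy as np

    def zero_crossing_rate(frame):
        # Fraction of consecutive sample pairs whose signs differ
        return np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:]))

    def pitch_cycle_correlation(frame, period):
        # Normalized cross-correlation between two consecutive pitch cycles
        a, b = frame[:period], frame[period:2 * period]
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    rng = np.random.default_rng(0)
    n, period = 320, 80                       # assumed 40 ms frame at 8 kHz, 80-sample pitch period
    t = np.arange(n)
    voiced = np.sin(2 * np.pi * t / period)   # quasi-periodic, low-frequency dominated
    unvoiced = rng.standard_normal(n)         # noise-like

    print("ZCR   voiced/unvoiced:", zero_crossing_rate(voiced), zero_crossing_rate(unvoiced))
    print("xcorr voiced/unvoiced:", pitch_cycle_correlation(voiced, period),
          pitch_cycle_correlation(unvoiced, period))

The voiced frame yields a low ZCR and a pitch-cycle correlation close to 1, while the noise-like frame yields a ZCR near 0.5 and a correlation near 0, matching the behaviour described above.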


CHAPTER-3

SPEECH COMPRESSION
In recent years, large scale information transfer by remote computing and the development of massive storage and retrieval systems have witnessed tremendous growth. To cope with the growth in the size of databases, additional storage devices need to be installed, and the modems and multiplexers have to be continuously upgraded in order to permit large amounts of data transfer between computers and remote terminals. This leads to an increase in cost as well as in equipment. One solution to these problems is compression, whereby the database and the transmission sequence can be encoded efficiently.

3.1 Fourier Analysis


Historically, the Fourier transform has been the most widely used tool for signal processing. As signal processing began spreading its tentacles and encompassing newer signals, the Fourier transform was found unable to satisfy the growing need for processing many of these signals, and it has therefore been replaced by the wavelet transform in many applications. A major drawback of Fourier analysis is that in transforming to the frequency domain, the time domain information is lost. The most important difference between the Fourier transform and the wavelet transform is that individual wavelet functions are localized in space. In contrast, the Fourier sine and cosine functions are non-local and are active for all time t.

3.2 Continuous Wavelet Transform (CWT)


The drawbacks inherent in the Fourier methods are overcome with wavelets. Consider a real- or complex-valued continuous-time function ψ(t) with the following properties:
1. The function integrates to zero: ∫ ψ(t) dt = 0, where the integral is taken over all time.


2. It is square integrable or, equivalently, has finite energy: ∫ |ψ(t)|² dt < ∞.

A function is called a mother wavelet if it satisfies these two properties. There is an infinity of functions that satisfy these properties and thus qualify as mother wavelets. The simplest of them is the Haar wavelet; other examples are the Mexican hat and Morlet wavelets. Apart from these, there are various families of wavelets, such as the Daubechies, Symlet and Coiflet families. Consider the following figure, which juxtaposes a sinusoid and a wavelet.

Figure 3.1: Comparing a sine wave and a wavelet

A wavelet is a waveform of effectively limited duration that has an average value of zero. Compare wavelets with sine waves, which are the basis of Fourier analysis: sinusoids do not have limited duration, they extend from minus to plus infinity. And while sinusoids are smooth and predictable, wavelets tend to be irregular and asymmetric. Fourier analysis consists of breaking up a signal into sine waves of various frequencies. Similarly, wavelet analysis is the breaking up of a signal into shifted and scaled versions of the original (or mother) wavelet.


Figure 3.2: Decomposition of a signal into wavelets

The above diagram suggests the existence of a synthesis equation that represents the original signal as a linear combination of wavelets, which are the basis functions for wavelet analysis (recall that in Fourier analysis the basis functions are sines and cosines). This is indeed the case. The wavelets in the synthesis equation are multiplied by scalars.

3.3 Discrete Wavelet Transform (DWT)


Calculating wavelet coefficients at every possible scale (as in the continuous WT) is a fair amount of work, and it generates a lot of data. If scales and positions are chosen based on powers of two, the analysis is much more efficient and just as accurate. Such an analysis is obtained from the discrete wavelet transform (DWT).
3.3.1 Vanishing Moments
The number of vanishing moments of a wavelet indicates the smoothness of the wavelet function as well as the flatness of the frequency response of the wavelet filters (the filters used to compute the DWT). A wavelet with p vanishing moments satisfies
∫ t^m ψ(t) dt = 0,   for m = 0, 1, ..., p-1.

Wavelets with a high number of vanishing moments lead to a more compact signal representation and are hence useful in coding applications. However, in general, the length of the filters increases with the number of vanishing moments, and the complexity of computing the DWT coefficients increases with the size of the wavelet filters.
3.3.2 Fast Wavelet Transform
The Discrete Wavelet Transform (DWT) coefficients can be computed by using Mallat's fast wavelet transform algorithm. This algorithm is sometimes referred to as the two-channel subband coder and involves filtering the input signal with filters derived from the wavelet function used. The algorithm is based on the following two equations:
1. φ(t) = 2 Σ_k h(k) φ(2t - k)
2. ψ(t) = 2 Σ_k g(k) φ(2t - k)
The first equation is known as the twin-scale relation (or the dilation equation) and defines the scaling function φ. The second equation expresses the wavelet ψ in terms of the scaling function. The coefficients h(k) form the impulse response of a low pass filter of length 2N, with a sum of 1 and a norm of 1/√2. The high pass filter coefficients g(k) are obtained from the low pass filter using the relationship
g(k) = (-1)^k h(1 - k)
where k varies over the range (1 - (2N - 1)) to 1.
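The filter relations above can be illustrated with a one-level Haar analysis stage. The sketch below uses the orthonormal normalization of the Haar filters (a constant factor of √2 away from the sum-1 convention quoted above, so that energy is preserved exactly) and derives the highpass filter from the lowpass one via g(k) = (-1)^k h(1-k). It is an illustrative sketch, not the implementation used in this project.

    import numpy as np

    # Orthonormal Haar filters (the text above uses a convention where the lowpass
    # coefficients sum to 1; these differ only by a constant factor of sqrt(2)).
    h = np.array([1.0, 1.0]) / np.sqrt(2.0)                  # lowpass (scaling) filter
    g = np.array([(-1) ** k * h[1 - k] for k in range(2)])   # highpass via g(k) = (-1)^k h(1-k)

    def haar_analysis(x):
        # One level of Mallat's algorithm for the Haar wavelet: each pair of input
        # samples (x[2k], x[2k+1]) is projected onto the lowpass and highpass filters.
        pairs = np.asarray(x, dtype=float).reshape(-1, 2)    # frame length must be even
        return pairs @ h, pairs @ g

    x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
    cA, cD = haar_analysis(x)
    print("approximation coefficients:", cA)
    print("detail coefficients       :", cD)
    print("energy preserved:", np.isclose(np.sum(x ** 2), np.sum(cA ** 2) + np.sum(cD ** 2)))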

3.4 Wavelets and Speech Compression


The idea behind signal compression using wavelets is primarily linked to the relative sparseness of the wavelet domain representation of the signal. Wavelets concentrate speech information (energy and perception) into a few neighbouring coefficients. Therefore, as a result of taking the wavelet transform of a signal, many coefficients will either be zero or have negligible magnitudes.


Another factor that comes into the picture is taken from psychoacoustic studies. Since our ears are more sensitive to low frequencies than to high frequencies, and our hearing threshold is very high in the high frequency regions, a compression method is used in which the detail coefficients (corresponding to high frequency components) of the wavelet transform are thresholded such that the error due to thresholding is inaudible. Since some of the high frequency components are discarded, a smoothed output signal is expected, as shown in the following figure.

Figure 3.3: Original signal and compressed signal using DWT

In summary, the notion behind compression is based on the concept that the regular signal component can be accurately approximated using a small number of approximation coefficients (at a suitably chosen level) and some of the detail coefficients. Data compression is then achieved by treating small-valued coefficients as insignificant data and discarding them. The process of compressing a speech signal using wavelets involves a number of different stages, each of which is discussed below.
3.4.1 Choice of Wavelet
The choice of the mother-wavelet function used in designing high quality speech coders is of prime importance. Choosing a wavelet that has compact support in both time and frequency, in addition to a significant number of vanishing moments, is essential for an optimum wavelet speech compressor.


This is followed very closely by the Daubechies D20, D12, D10 and D8 wavelets, all of which concentrate more than 96% of the signal energy in the level 1 approximation coefficients. Wavelets with more vanishing moments provide better reconstruction quality, as they introduce less distortion into the processed speech and concentrate more signal energy in a few neighbouring coefficients.
3.4.2 Wavelet Decomposition
Wavelets work by decomposing a signal into different resolutions or frequency bands, and this task is carried out by choosing the wavelet function and computing the Discrete Wavelet Transform (DWT). Signal compression is based on the concept that selecting a small number of approximation coefficients (at a suitably chosen level) and some of the detail coefficients can accurately represent regular signal components. The choice of decomposition level for the DWT usually depends on the type of signal being analyzed or on some other suitable criterion such as entropy. For the processing of speech signals, decomposition up to scale 5 is adequate, with no further advantage gained by processing beyond scale 5.
3.4.3 Truncation of Coefficients
After calculating the wavelet transform of the speech signal, compression involves truncating wavelet coefficients below a threshold. Most of the speech energy resides in the few high-valued coefficients; thus the small-valued coefficients can be truncated or zeroed out, and the remaining coefficients used to reconstruct the signal. Such a compression scheme has provided a segmental signal-to-noise ratio (SEGSNR) of 20 dB while retaining only 10% of the coefficients. Two different approaches are available for calculating thresholds. The first, known as global thresholding, involves taking the wavelet expansion of the signal and keeping the largest absolute-value coefficients. In this case one can manually set a global threshold, a compression performance target, or a relative square norm recovery performance target, so only a single parameter needs to be selected. The second approach, known as by-level thresholding, consists of applying level-dependent thresholds to each decomposition level in the wavelet transform.
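A possible sketch of a scale-5 decomposition followed by global thresholding, written against the PyWavelets package (assumed available). The db10 wavelet, the 10% retention ratio and the synthetic test frame are illustrative choices consistent with the discussion above, not this project's actual settings.

    import numpy as np
    import pywt   # PyWavelets, assumed available

    def compress_frame(x, wavelet="db10", level=5, keep=0.10):
        # Global thresholding: keep only the largest `keep` fraction of DWT coefficients.
        coeffs = pywt.wavedec(x, wavelet, level=level)
        flat = np.concatenate(coeffs)
        k = max(1, int(keep * flat.size))
        threshold = np.sort(np.abs(flat))[-k]          # magnitude of the k-th largest coefficient
        return [np.where(np.abs(c) >= threshold, c, 0.0) for c in coeffs]

    def reconstruct(coeffs, wavelet="db10"):
        return pywt.waverec(coeffs, wavelet)

    # Toy voiced-like frame (assumed parameters, for illustration only)
    fs, n = 8000, 1024
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 700 * t)

    coeffs = compress_frame(x)
    x_hat = reconstruct(coeffs)[:n]
    snr = 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))
    print("SNR of reconstruction from 10%% of coefficients: %.1f dB" % snr)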


3.4.4 Encoding Coefficients
Signal compression is achieved by first truncating small-valued coefficients and then efficiently encoding them. One approach is to encode each run of consecutive zero-valued coefficients with two bytes: one byte to indicate that a sequence of zeros occurs in the wavelet transform vector, and the second byte to give the number of consecutive zeros. For further data compaction a suitable bit-encoding format can be used to quantize and transmit the data at low bit rates. A low bit rate representation can be achieved by using an entropy coder such as Huffman coding or arithmetic coding.
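A minimal sketch of the two-byte zero-run encoding described above; the choice of 0 as the run marker and the 255-sample cap on run lengths are assumptions made for illustration.

    def encode_zero_runs(coeffs, zero_flag=0):
        # Each run of zeros becomes the pair (zero_flag, run_length); other values pass through.
        # Run lengths are capped at 255 so they fit in a single byte.
        out, i = [], 0
        c = list(coeffs)
        while i < len(c):
            if c[i] == 0:
                run = 0
                while i < len(c) and c[i] == 0 and run < 255:
                    run += 1
                    i += 1
                out.extend([zero_flag, run])
            else:
                out.append(c[i])
                i += 1
        return out

    def decode_zero_runs(stream, zero_flag=0):
        out, i = [], 0
        while i < len(stream):
            if stream[i] == zero_flag:
                out.extend([0] * int(stream[i + 1]))
                i += 2
            else:
                out.append(stream[i])
                i += 1
        return out

    coeffs = [5, 0, 0, 0, 0, 7, 0, 0, 3]
    stream = encode_zero_runs(coeffs)
    assert decode_zero_runs(stream) == coeffs
    print("original length:", len(coeffs), "encoded length:", len(stream))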


CHAPTER-4

COMPRESSIVE SENSING
The theory of compressive sensing was developed by Candes et al. and Donoho. Compressed sensing (CS) is also named compressive sampling. In a typical communication system, the signal is sampled at least at twice the highest frequency contained in the signal. However, this limits efficient ways to compress the signal, as it places a huge burden on sampling the entire signal while only a small number of the transform coefficients are needed to represent it. On the other hand, compressive sampling provides a new way to reconstruct the original signal from a minimal number of observations. CS is a sampling paradigm that allows us to go beyond the Shannon limit by exploiting the sparsity structure of the signal. It allows us to capture and represent compressible signals at a rate significantly below the Nyquist rate. The signal is then reconstructed from these projections by using different optimization techniques. During compressive sampling only the important information about a signal is acquired, rather than acquiring the important information plus the information that will eventually be discarded at the receiver. The key elements that need to be addressed before using compressive sensing are the following: how to find the transform domain in which the signal has a sparse representation, how to effectively sample the sparse signal in the time domain, and how to recover the original signal from the samples by using optimization techniques. In summary, the large amount of data needed to sample at the Nyquist rate, especially for speech, image and video signals, motivates the study of compressive sensing as a feasible solution for future mobile communication systems. Sparse signals are defined as signals that can be represented by a limited number of data points in the transform domain. Many real-world signals can be classified into this category using an appropriate transform domain. For instance, a sinusoidal signal is clearly not sparse in the time domain, but its Fourier transform is extremely sparse. The consequences of acquiring large amounts of data, added to the overhead of compression, can be avoided by using compressive sensing. As a result, there are potential savings in terms of energy, memory and processing.

4.1 Signal Sparsity


The sparsity of a signal refers to the fact that only a few of its transform coefficients are significant. Sparsity allows the signal to be reconstructed from a small number of projections (samples). The procedure used to ensure the sparsity of the signal is called transform coding, which is performed by the following four steps:
1. Obtain the full N points of the signal x at the Nyquist rate.
2. Compute the complete set of transform coefficients (e.g. the DFT).
3. Locate the K largest coefficients and throw away the smallest coefficients.
4. Multiply the signal by the measurement matrix to obtain the observation vector of length M.
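The four transform-coding steps listed above can be sketched as follows for a signal that is sparse in the DFT domain; the sizes N = 300, K = 10 and M = 60 are assumed for illustration only.

    import numpy as np

    rng = np.random.default_rng(1)
    N, K, M = 300, 10, 60            # assumed sizes, for illustration only

    # 1. Acquire N Nyquist-rate samples of a signal that is sparse in frequency
    t = np.arange(N)
    freqs = rng.choice(N // 2, size=K // 2, replace=False) + 1
    x = sum(np.cos(2 * np.pi * f * t / N) for f in freqs)

    # 2. Full set of transform (DFT) coefficients
    X = np.fft.fft(x)

    # 3. Keep the K largest-magnitude coefficients, discard the rest
    idx = np.argsort(np.abs(X))[-K:]
    X_sparse = np.zeros_like(X)
    X_sparse[idx] = X[idx]
    print("energy retained by the K largest coefficients: %.4f"
          % (np.sum(np.abs(X_sparse) ** 2) / np.sum(np.abs(X) ** 2)))

    # 4. Take M < N random projections of the time-domain signal
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    y = Phi @ x
    print("observation vector length:", y.shape[0])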

Figure 4.1 shows an example of how compressive sensing can be used to sample a signal below the Nyquist rate. In this example, the original sampled signal is composed of 300 samples, and the intent is to reconstruct the signal using only 30 samples. Figure 4.1(a) shows the time domain representation of the sampled signal. From this figure it is evident that by selecting only 30 samples (red dots) from the 300 samples it would be impossible to reconstruct the original signal perfectly in the time domain. On the other hand, by applying compressive sensing to the frequency representation of the signal it is possible to reconstruct it perfectly from a significantly smaller number of samples. In order to achieve this goal it is necessary to implement optimization techniques. However, not every optimization technique can be used for this purpose. For example, Figure 4.1(c) represents the spectrum reconstructed using l2 minimization; clearly, there are significant differences between the signal in Figure 4.1(b) and the signal in Figure 4.1(c).


Figure 4.1: (a) Time domain representation of the signal composed of 300 samples, (b) Fourier spectrum of the signal to be encoded, (c) Reconstruction of the Fourier spectrum via l2 minimization, (d) Reconstruction of the Fourier spectrum via l1 minimization
In contrast, the reconstruction using l1 minimization results in a perfect reconstruction, as can be clearly seen by comparing Figure 4.1(b) and Figure 4.1(d). In summary, optimization techniques based on l1 minimization are preferred when compressive sensing is used.


4.2 Measurement Matrix


In compressed sensing special emphasis is given to representing the signal with an incoherent basis. The linear measurement process computes M < N inner products between x and a collection of measurement vectors {φ_j}, j = 1, ..., M, via
y_j = <x, φ_j>
so that y = Φx, where Φ is an M x N measurement matrix whose rows are the measurement vectors φ_j. It has been seen that some measurement matrices can be used in any scenario, in the sense that they are incoherent with any fixed basis such as Gabor functions, spikes, sinusoids and wavelets. The compressive sensing measurement process with a K-sparse coefficient vector x is depicted in Figure 4.2.

Figure 4.2: Compressive sensing measurement process

The measurement matrix plays a vital role in the process of recovering the original signal. There are two types of measurement matrices that can be used in compressive sensing: random measurement matrices and predefined measurement matrices. The fundamental revelation is that if a signal x composed of N samples is K-sparse, then the actual signal can be reconstructed from
M >= O(K log(N/K))
measurements.


Furthermore, x can be perfectly reconstructed using different optimization techniques. If Φ is a structurally random matrix, its rows are not stochastically independent because they are generated from the same random seed vector. The random matrix is transposed and then orthogonalized; this has the effect of creating a matrix whose rows form an orthonormal basis. In a predefined measurement matrix, the matrix is created by using functions such as Dirac functions and sine functions. In this case, the signal is multiplied by several Dirac functions centered at different locations to obtain the observation vector. The speech signal can then be reconstructed by l1 minimization using the observation vector and the predefined measurement matrix. Linear programming is another procedure that plays a vital role in reconstructing the original signal. It is a mathematical approach designed to obtain the best outcome in a given mathematical model, and is a special case of mathematical programming. A linear program can be expressed in the following canonical form:
maximize e^T x subject to Ax <= b

where x represents the variable to be determined, e and b are vectors of coefficients, and A is a matrix of coefficients. The expression to be maximized or minimized is called the objective function, and the inequality Ax <= b defines the constraints over which the objective function has to be optimized. In the end, the reconstruction of the speech signal depends upon the observation vector and the measurement matrix.

4.3 Signal Reconstruction in Compressive Sensing


Recent developments in signal theory have shown that a sparse signal is a useful model in areas such as communications, radar and image processing. Therefore the assumption that the signal can be represented in a sparse form has helped in the compression of the signal of interest. The perfect reconstruction of a signal x depends on the measurement matrix Φ and the measurement vector y.

Compressive sensing theory tells us that when the matrix Φ satisfies the Restricted Isometry Property (RIP), i.e. its submatrices of K columns are nearly orthonormal, it is possible to recover the K largest significant coefficients from a set of M = O(K log(N/K)) measurements y. As a result, the sparse signal can be reconstructed by different optimization techniques such as l1-norm minimization and convex optimization. The first minimization technique which has been used to reconstruct the signal is l1 minimization:
(P1)   min ||x||_1   subject to   Φx = y
This is also known as basis pursuit. The goal of this technique is to find the vector with the smallest l1 norm, ||x||_1 = Σ_i |x_i|, which is also known as the Taxicab norm or Manhattan norm. The name relates to the distance a taxi has to drive in a rectangular street grid to get from the origin to the point x; the distance obtained from this norm is called the Manhattan distance or l1 distance. The other optimization technique, known as convex optimization (cvx), can solve many small and medium scale problems. Using cvx, an objective is minimized in order to reconstruct the original signal.

4.4 Optimization Techniques


Signal reconstruction plays an important role in compressive sensing theory, where the signal is reconstructed or recovered from a minimum number of measurements. By using optimization techniques it is possible to recover the signal without losing information at the receiver.
4.4.1 l1 Minimization
A recent series of papers has developed a theory of signal recovery from highly incomplete information. The results state that a sparse vector x in R^N can be recovered from a small number of linear measurements b = Ax in R^K, K << N, by solving a convex program. l1 minimization is used to solve underdetermined linear equations, or to find a sparsely corrupted solution to an overdetermined system of equations. Recently, l1 minimization has been proposed as a convex alternative to the combinatorial l0 norm, which simply counts the number of nonzero entries in a vector, for synthesizing the signal as a sparse superposition of waveforms. The program
(P1)   min ||x||_1   subject to   Ax = b
is also known as basis pursuit. The goal of this program is to find the vector with the smallest l1 norm that explains the observation b. If the signal x is sufficiently sparse, then (P1) will recover x from the values of A and b. When x, A, and b have real-valued entries, (P1) can be recast as a linear program.
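As a sketch of how (P1) can be recast as a linear program, the snippet below splits x into its positive and negative parts and solves the resulting LP with SciPy's linprog (SciPy is assumed available). The problem sizes are illustrative, and this is not the solver used in the project.

    import numpy as np
    from scipy.optimize import linprog   # SciPy assumed available

    def basis_pursuit(A, b):
        # Solve (P1): min ||x||_1 subject to A x = b, recast as a linear program
        # by splitting x = u - v with u, v >= 0 (so ||x||_1 = sum(u + v)).
        m, n = A.shape
        c = np.ones(2 * n)
        A_eq = np.hstack([A, -A])
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
        u, v = res.x[:n], res.x[n:]
        return u - v

    rng = np.random.default_rng(2)
    n, m, s = 128, 40, 5                          # assumed problem sizes
    x = np.zeros(n)
    x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    b = A @ x

    x_hat = basis_pursuit(A, b)
    print("max recovery error:", np.max(np.abs(x - x_hat)))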

4.4.2 Matching Pursuit
Orthogonal matching pursuit (OMP) is a canonical greedy algorithm for sparse approximation. Let Φ represent a matrix of size M x N (where typically M < N) and let y denote a vector in R^M; the goal of OMP is to recover a coefficient vector x in R^N with roughly K < M non-zero terms so that Φx equals y exactly or approximately. OMP is frequently used to find sparse representations of a signal y in R^M in settings where Φ represents a dictionary for the signal space. It is also commonly used in compressive sensing, where y = Φx represents compressive measurements of a sparse signal x in R^N to be recovered. One of the attractive features of OMP is its simplicity, and it is empirically competitive in terms of approximation performance.
4.4.3 Orthogonal Matching Pursuit (OMP)
In this project, the signal is reconstructed frame by frame using the OMP method. OMP uses sub-Gaussian measurement matrices to reconstruct sparse signals. If Φ is such a measurement matrix, then Φ*Φ is, in a loose sense, close to the identity. Therefore the largest coordinate of the vector Φ*y, where y = Φx is the observation vector, is expected to correspond to a non-zero entry of x. Thus one coordinate of the support of the signal x is estimated. Subtracting that contribution from the observation vector y and repeating eventually yields the entire support of the signal x. OMP is quite fast, both in theory and in practice, but its guarantees are not as strong as those of basis pursuit. The algorithm's simplicity enables a fast runtime: it iterates s times, and each iteration performs a selection over d elements, multiplies by Φ*, and solves a least squares problem.
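A compact sketch of the OMP iteration just described (correlate with the residual, grow the support, re-solve a least-squares problem). The problem sizes and the Gaussian measurement matrix are assumptions for illustration; the project's actual implementation may differ.

    import numpy as np

    def omp(Phi, y, s, tol=1e-10):
        # Greedily build the support of an s-sparse x from measurements y = Phi @ x.
        m, n = Phi.shape
        residual = y.copy()
        support = []
        x_hat = np.zeros(n)
        coef = np.zeros(0)
        for _ in range(s):
            # pick the column most correlated with the current residual
            j = int(np.argmax(np.abs(Phi.T @ residual)))
            if j not in support:
                support.append(j)
            # least-squares fit on the selected columns, then update the residual
            coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
            residual = y - Phi[:, support] @ coef
            if np.linalg.norm(residual) < tol:
                break
        x_hat[support] = coef
        return x_hat

    rng = np.random.default_rng(3)
    n, m, s = 256, 64, 8                              # assumed sizes
    x = np.zeros(n)
    x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)    # sub-Gaussian measurement matrix
    y = Phi @ x

    x_hat = omp(Phi, y, s)
    print("max recovery error:", np.max(np.abs(x - x_hat)))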


CHAPTER-5

ADAPTIVE COMPRESSIVE SENSING


In the conventional compressed sensing process, the projection matrix used to generate the compressed signal is generated randomly and is kept fixed during the entire conversion process; that is, the projection matrix is non-adaptive. Although this process performs better than conventional sampling, even better results can be obtained by using an adaptive projection matrix.

5.1 Adaptive Projection Matrix


Most work in CS research focuses on a random projection matrix, which is constructed by considering only the signal's sparsity rather than its other properties. In other words, the construction of the projection matrix is non-adaptive. Observing that different kinds of speech frames have different intra-frame correlations, a frame-based adaptive compressed sensing framework for speech signals, which applies an adaptive projection matrix, has been proposed. To do so, neighbouring frames are compared to estimate their intra-frame correlation, every frame is classified into one of several categories, and the number of projections is adjusted accordingly. The experimental results show that the adaptive projection matrix can significantly improve the speech reconstruction quality. The intra-frame correlation of speech signals is thus exploited to achieve efficient sampling: because different kinds of speech signals may have different intra-frame correlations, the frame-based adaptive CS framework uses different sampling strategies for different kinds of speech frames.

5.2 Frame Analysis


Each speech sequence is divided into non-overlapping frames of size 1 x n, and all frames in a speech sequence are processed independently. The projection matrix is initialised as a Gaussian random matrix, which has been proven to be incoherent with most sparse bases with high probability.


Figure 5.1: The frame-based adaptive CS framework for speech

As shown in Figure 5.1, for each frame in a speech sequence a small number of projections is first collected and compared with the projections collected for the previous frame. Based on the comparison results, the correlation between these two frames is estimated and classified into different categories. The sampling strategy is then adjusted according to the correlation type, and a different number of samples is collected for the current frame. The current, t-th, frame of the original speech signal is represented as x_t and its previous frame as x_{t-1}. The difference between x_t and x_{t-1} reflects the correlation between the two neighbouring frames and can be used to classify that correlation. Since x_t - x_{t-1} is not available at the sampling stage, the collected measurements are used to estimate the correlation instead. The same projection matrix Φ is applied to all frames in the partial sampling stage, so we have y_t - y_{t-1} = Φ(x_t - x_{t-1}), where y_t and y_{t-1} are the projection vectors of x_t and x_{t-1} respectively. As each sample of y_t - y_{t-1} is a linear combination of the entries of x_t - x_{t-1}, the difference between the two projection vectors also reflects the intensity changes between the two frames. Therefore, the amount of change between the two frames can be estimated using only a small number of projections. Let Φ_M0 be a matrix containing the first M0 rows of the Gaussian random matrix Φ. For the current frame t, Φ_M0 is first used to collect M0 measurements, y_t^M0 = Φ_M0 x_t, in the partial sampling stage. This is then compared with the first M0 measurements of y_{t-1}, and the difference y_t^d = y_t^M0 - y_{t-1}^M0 is calculated. In the frame analysis module, given y_t^d, its l2 norm normalized by M0 is calculated and compared with two thresholds T1 and T2 (T1 < T2).

Adaptive compressed sensing of speech signal If y (t) ^d/M0<=T1, the current frame is almost the same as its previous frame.We consider the two neighbouring frames may be both surd and label the intraframe correlation as surd vs. surd. If T1< y (t) ^d/M0<=T2, it indicates that these two neighbouring frames undergo small changes. In this situation, the two neighbouring frames may be both sonant at high probability and the intra-correlation is labelled as sonant vs. sonant. If y(t)^d/M0>T0, the two frames are significantly different from each other, which is most likely due to the change of the frame type, and we label the correlation as surd vs. sonant.

5.3 Partial Sampling


For each frame in a speech sequence, a small number of projections is first collected and compared with the projections collected for the previous frame. Based on the comparison results, the correlation between these two frames is estimated and classified into different categories. The sampling strategy is then adjusted according to the correlation type, and a different number of samples is collected for the current frame. The following sections discuss the details of each step in this framework. The current, t-th, frame of the original speech signal is represented as x_t and its previous frame as x_{t-1}. The difference between x_t and x_{t-1} reflects the correlation between the two neighbouring frames and can be used to classify the correlation. Since x_t - x_{t-1} is not available at the sampling stage, the collected measurements are used to estimate the correlation instead. The same projection matrix is applied to all frames in the partial sampling stage, so y_t - y_{t-1} = Φ(x_t - x_{t-1}), where y_t and y_{t-1} are the projection vectors of x_t and x_{t-1} respectively. As each sample of y_t - y_{t-1} is a linear combination of the entries of x_t - x_{t-1}, the difference between the two projection vectors also reflects the intensity changes between the two frames, so the amount of change can be estimated using only a small number of projections. Let Φ_M0 be a matrix containing the first M0 rows of the Gaussian random matrix Φ. For the current frame t, Φ_M0 is used to collect M0 measurements, y_t^M0 = Φ_M0 x_t, in the partial sampling stage. This is then compared with the first M0 measurements of y_{t-1}, and the difference y_t^d = y_t^M0 - y_{t-1}^M0 is calculated. In the frame analysis module, given y_t^d, its l2 norm normalized by M0 is computed and compared with the two thresholds T1 and T2. If ||y_t^d||_2 / M0 <= T1, the current frame is almost the same as its previous frame; the two neighbouring frames are considered to both be surd, and the intra-frame correlation is labelled surd vs. surd. If T1 < ||y_t^d||_2 / M0 <= T2, the two neighbouring frames undergo small changes; in this case they are, with high probability, both sonant, and the intra-frame correlation is labelled sonant vs. sonant. If ||y_t^d||_2 / M0 > T2, the two frames are significantly different from each other, which is most likely due to a change of frame type, and the correlation is labelled surd vs. sonant.

5.4 Adaptive Sampling


Depending on their classified intra-frame correlation types, different numbers of projections are used for the speech frames. A frame is considered a surd frame if its intra-frame correlation type is surd vs. surd. A surd frame contains the least new information in the speech; thus the M0 measurements collected in the partial sampling stage are sufficient and no additional sampling is needed. When its intra-frame correlation is sonant vs. sonant, the frame is considered sonant and contains some new information, which requires more measurements to be collected. For such frames, M1 (M1 > M0) measurements are collected: the (M0+1)-th to the M1-th rows of the Gaussian random matrix Φ are used and combined with Φ_M0 to generate the final projection vector y_t. The frames that experience large changes must contain the most new information; therefore, a total of M2 (M2 > M1 > M0) measurements is collected during the sampling process, and the total projection matrix is the first M2 rows of the Gaussian random matrix Φ.
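A minimal sketch of the partial sampling, frame analysis and adaptive sampling stages described in Sections 5.2-5.4. The frame length, the budgets M0, M1, M2 and the thresholds T1, T2 are placeholder values that would have to be tuned, and the toy "surd" and "sonant" frames are synthetic.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 256                       # frame length (assumed)
    M0, M1, M2 = 32, 96, 160      # measurement budgets (assumed, M0 < M1 < M2)
    T1, T2 = 0.02, 0.08           # classification thresholds (assumed, to be tuned)

    Phi = rng.standard_normal((M2, n)) / np.sqrt(M2)   # fixed Gaussian random matrix

    def adaptive_measurements(frame, prev_partial):
        # Partial sampling with the first M0 rows, frame analysis against the previous
        # frame's partial projections, then adaptive sampling with M0, M1 or M2 rows.
        y_partial = Phi[:M0] @ frame
        d = np.linalg.norm(y_partial - prev_partial) / M0
        if d <= T1:          # "surd vs. surd": the partial measurements suffice
            M = M0
        elif d <= T2:        # "sonant vs. sonant": collect some extra measurements
            M = M1
        else:                # "surd vs. sonant": frame type changed, collect the most
            M = M2
        return Phi[:M] @ frame, y_partial, M

    # Two toy frames: a low-energy noise-like ("surd") frame and a periodic ("sonant") frame
    surd = 0.01 * rng.standard_normal(n)
    sonant = np.sin(2 * np.pi * np.arange(n) / 80)

    prev = Phi[:M0] @ surd
    for name, frame in [("surd", surd), ("sonant", sonant)]:
        y, prev, M = adaptive_measurements(frame, prev)
        print(f"{name:7s} frame -> {M} measurements collected")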


5.5 Reconstruction
The original signal is reconstructed from a significantly smaller number of samples by using optimization techniques such as l1-norm minimization or convex optimization. In this project orthogonal matching pursuit (OMP) is used, which gives better results than the other optimization techniques considered, and the signal is reconstructed frame by frame using the OMP method. OMP uses sub-Gaussian measurement matrices to reconstruct sparse signals. If Φ is such a measurement matrix, then Φ*Φ is, in a loose sense, close to the identity. Therefore one would expect the largest coordinate of the vector Φ*y, where y = Φx is the observation vector, to correspond to a non-zero entry of x. Thus one coordinate of the support of the signal x is estimated. Subtracting that contribution from the observation vector y and repeating eventually yields the entire support of the signal x. OMP is quite fast, both in theory and in practice, but its guarantees are not as strong as those of basis pursuit. The algorithm's simplicity enables a fast runtime: the algorithm iterates s times, and each iteration performs a selection over d elements, multiplies by Φ*, and solves a least squares problem. Reconstruction algorithms compute the support of the sparse signal x iteratively; once the support of the signal is computed correctly, the pseudo-inverse of the measurement matrix restricted to the corresponding columns can be used to reconstruct the actual signal x. The clear advantage of this approach is speed, but the approach also presents new challenges.
5.5.1 Orthogonal Matching Pursuit
Orthogonal Matching Pursuit (OMP) was put forth by Mallat and his collaborators and analyzed by Tropp and Gilbert. OMP uses sub-Gaussian measurement matrices to reconstruct sparse signals. If Φ is such a measurement matrix, then Φ*Φ is, in a loose sense, close to the identity. Therefore one would expect the largest coordinate of the vector Φ*y, where y = Φx, to correspond to a non-zero entry of x. Thus one coordinate of the support of the signal x is estimated. Subtracting off that contribution from the observation vector y and repeating eventually yields the entire support of the signal x.


OMP is quite fast, both in theory and in practice, but its guarantees are not as strong as those of basis pursuit. The algorithm's simplicity enables a fast runtime. The algorithm iterates s times, and each iteration performs a selection over d elements, multiplies by Φ*, and solves a least squares problem. The selection can easily be done in O(d) time, and the multiplication by Φ* in the general case takes O(md). When Φ is an unstructured matrix, the cost of solving the least squares problem is O(s^2 d). However, maintaining a QR factorization of the restricted matrix Φ_I and using the modified Gram-Schmidt algorithm reduces this time to O(|I|d) at each iteration. Using this method, the overall cost of OMP becomes O(smd). In the case where the measurement matrix is structured with a fast multiply, this can clearly be improved.
5.5.2 Stagewise Orthogonal Matching Pursuit
An alternative greedy approach, Stagewise Orthogonal Matching Pursuit (StOMP), developed and analyzed by Donoho and his collaborators, uses ideas inspired by wireless communications. As in OMP, StOMP utilizes the observation vector y = Φ*u, where u = Φx is the measurement vector. However, instead of simply selecting the largest component of the vector y, it selects all of the coordinates whose values are above a specified threshold. It then solves a least-squares problem to update the residual. The algorithm iterates through only a fixed number of stages and then terminates, whereas OMP requires s iterations, where s is the sparsity level. The thresholding strategy is designed so that many terms enter at each stage and so that the algorithm halts after a fixed number of iterations. The formal noise level at each iteration is proportional to the Euclidean norm of the residual at that iteration. This method appears to provide slightly weaker results; it appears, however, that StOMP outperforms OMP and basis pursuit in some cases. Although the structure of StOMP is similar to that of OMP, because StOMP selects many coordinates at each stage, the runtime is much improved. Indeed, using iterative methods to solve the least-squares problem yields a runtime bound of CNsd + O(d), where N is the fixed number of iterations run by StOMP and C is a constant that depends only on the accuracy level of the least-squares problem.


5.5.3 Regularized Orthogonal Matching Pursuit

As is now evident, the two approaches to compressed sensing each present disjoint advantages and challenges. While the optimization method provides robustness and uniform guarantees, it lacks the speed of the greedy approach. The greedy methods, on the other hand, had not been able to provide the strong guarantees of Basis Pursuit. This changed with the development of a new greedy algorithm, Regularized Orthogonal Matching Pursuit, that provided the strong guarantees of the optimization method. This work bridged the gap between the two approaches and provided the first algorithm possessing the advantages of both. Regularized Orthogonal Matching Pursuit (ROMP) is a greedy algorithm, but it will correctly recover any sparse signal using any measurement matrix that satisfies the Restricted Isometry Condition. Again, as in the case of OMP, the observation vector u = Φ*Φx is used as a good local approximation to the s-sparse signal x. Since the Restricted Isometry Condition guarantees that every s columns of Φ are close to an orthonormal system, at each iteration not just one coordinate is chosen, as in OMP, but up to s coordinates using the observation vector. It is then acceptable to choose some incorrect coordinates, so long as the number of those is limited. To ensure that not too many incorrect coordinates are selected at each iteration, a regularization step is included which guarantees that each coordinate selected contains an even share of the information about the signal. We remark here that knowledge about the sparsity level s is required in ROMP, as in OMP. Clearly, in the case where the signal is not exactly sparse and the signal and measurements are corrupted with noise, the algorithm as described above will never halt. Thus in the noisy case, the halting criterion is simply changed by allowing the algorithm to iterate at most s times, or until |I| ≥ 2s. With this modification, ROMP approximately reconstructs arbitrary signals.
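As an illustration of the regularization step described above, the sketch below selects, from the indices J of the s largest entries of the observation vector u, a subset whose magnitudes are within a factor of two of each other and whose energy is maximal. The function name romp_regularize and its interface are our own; this is only a sketch of the selection rule, not the published implementation.

% Illustrative ROMP-style regularization step (interface and names are our own).
% u : observation (proxy) vector; J : indices of its s largest magnitudes.
function J0 = romp_regularize(u, J)
    [vals, order] = sort(abs(u(J)), 'descend');   % magnitudes, largest first
    Jsorted = J(order);
    bestEnergy = 0;
    J0 = [];
    for i = 1:numel(vals)
        % maximal run of coordinates comparable to vals(i) (within a factor of 2)
        last = find(vals >= vals(i) / 2, 1, 'last');
        run  = i:last;
        energy = sum(vals(run).^2);
        if energy > bestEnergy
            bestEnergy = energy;
            J0 = Jsorted(run);                    % keep the highest-energy comparable set
        end
    end
end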


5.6 Compressed Sensing Using DCT


The algorithm for compressed sensing using the discrete cosine transform (DCT) to make the signal sparse is given below:

1. Expand the signal in the cosine basis ψ_j(t) = cos(2πjt/n), t = 0, 1, ..., n−1, and obtain the coefficient vector x.

2. Choose the parameter M and obtain a new observation vector y by correlating the signal with the M rows of a random sensing matrix.

3. Create the new sensing matrix for x and solve the l1-norm minimization problem for it.

4. Get the estimated coefficient vector s* and the recovered signal y.

5. Compare y with x to check whether the reconstruction is exact.

6. If not, go back to step 2 and increase M (a MATLAB sketch of these steps is given below).

Conventionally, after acquisition of a scene, the Discrete Cosine Transform (DCT) is performed on the image using the values assigned to each pixel. After the DCT, many coefficients will be zero or will carry negligible energy; these coefficients are discarded before quantization and/or entropy coding. Hence, though each frame of the image is acquired fully, much of the acquired information is discarded after the DCT, causing an unnecessary burden on the acquisition process. This makes compressive sampling a good candidate for digital image and video applications, where the Nyquist rate is so high that compressing the sheer volume of samples is a problem for transmission or storage.
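A minimal MATLAB sketch of steps 1-6 follows. It assumes the speech samples are available in a vector named signal and that the l1-magic routine l1eq_pd (the solver also used in Appendix A) is on the path; the signal length, initial M, step size and tolerance are illustrative choices, not the project's actual settings.

% Illustrative MATLAB sketch of steps 1-6 above (not the project's actual code).
n   = 2000;                       % signal length (example value)
x   = dct(signal(1:n));           % step 1: coefficient vector in the cosine basis
M   = 200;                        % step 2: initial number of measurements (example)
tol = 1e-3;                       % acceptance tolerance for step 5 (our choice)
while true
    Phi = randn(M, n);            % random sensing matrix with M rows
    y   = Phi * x;                % step 2: observation vector obtained by correlation
    s_hat = l1eq_pd(Phi' * y, Phi, [], y, 1e-2);   % steps 3-4: l1-norm recovery
    if norm(s_hat - x) / norm(x) < tol             % step 5: compare with x
        break;                    % reconstruction judged exact enough
    end
    M = M + 50;                   % step 6: increase M and repeat
end
x_rec = idct(s_hat);              % recovered time-domain signal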


Applying compressive sampling to the whole image is ineffective. In order to exploit any sparsity within an image, we split the image into small non-overlapping blocks of equal size. Compressive sampling is then performed only on blocks determined to be sparse, i.e., we exploit local sparsity within an image. Note that, in real-time acquisition, it is not possible to test for sparsity of a block before sampling. Reference frames are sampled fully, and the DCT is applied on each of the B blocks. We select B_s sparse blocks in the following manner. Let C be a small positive constant, and T an integer threshold that is representative of the average number of non-significant DCT coefficients over all blocks. If the number of DCT coefficients in the block whose absolute value is less than C is larger than T, the block is selected for compressive sampling.

Consider a real-valued, finite-length, one-dimensional, discrete-time signal x, which we view as an N × 1 column vector with elements x[n], n = 1, 2, ..., N. An image or higher-dimensional data set is treated by vectorizing it into a long one-dimensional vector. Any such signal can be represented in terms of a basis of N × 1 vectors {ψ_i}, i = 1, ..., N. For simplicity, assume that the basis is orthonormal. Forming the N × N basis matrix Ψ = [ψ_1 | ψ_2 | ... | ψ_N] by stacking the vectors {ψ_i} as columns, we can express any signal x as x = Ψs, where s is the N × 1 column vector of weighting coefficients s_i = <x, ψ_i> = ψ_i^T x, and where ^T denotes the (Hermitian) transpose operation. Clearly, x and s are equivalent representations of the same signal, with x in the time domain and s in the Ψ domain. We will focus on signals that have a sparse representation, where x is a linear combination of just K basis vectors, with K << N. That is, only K of the coefficients s_i in the above equation are nonzero and (N − K) are zero. Sparsity is motivated by the fact that many natural and manmade signals are compressible, in the sense that there exists a basis where the representation above has just a few large coefficients and many small coefficients. Compressible signals are well approximated by K-sparse representations; this is the basis of transform coding.

For example, natural images tend to be compressible in the discrete cosine transform (DCT) and wavelet bases, on which the JPEG and JPEG-2000 compression standards are based. Audio signals and many communication signals are compressible in a localized Fourier basis. The main idea of compressive sampling is to remove this sampling redundancy by needing only M samples of the signal, where K < M << N. Let y be an M-length measurement vector given by y = Φx, where Φ is an M × N measurement matrix. The above expression can be written in terms of s as y = ΦΨs = Θs. The signal x can be recovered losslessly from M ≈ K or slightly more measurements if the measurement matrix is properly designed, so that Θ = ΦΨ satisfies the so-called restricted isometry property. This will always be true if Φ and Ψ are incoherent, that is, the rows of Φ cannot sparsely represent the columns of Ψ and vice versa.
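These relations can be checked numerically. The short sketch below builds an orthonormal DCT basis Psi, a K-sparse coefficient vector s and a random Phi (all sizes are example values of our own) and verifies that Phi*x and Theta*s give the same measurements.

% Numerical check of x = Psi*s, y = Phi*x and y = Phi*Psi*s = Theta*s.
N   = 256;  K = 10;  M = 64;            % example sizes with K < M << N
Psi = idct(eye(N));                      % columns of Psi are orthonormal DCT basis vectors
s   = zeros(N, 1);
s(randperm(N, K)) = randn(K, 1);         % K-sparse coefficient vector
x   = Psi * s;                           % time-domain signal
Phi = randn(M, N) / sqrt(M);             % random M x N measurement matrix
Theta = Phi * Psi;                       % combined sensing matrix
disp(norm(Phi * x - Theta * s));         % ~0: both expressions give the same measurements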


CHAPTER-6

APPLICATIONS OF COMPRESSED SENSING


Compressed sensing finds many applications in almost every signal processing and communication system due to its efficient signal recovery capability. Sparse signals arise in practice in very natural ways, so compressed sensing lends itself well to many settings. Some of the applications are given below.

6.1 Imaging
Many images are sparse with respect to some basis. Because of this, many applications in imaging are able to take advantage of the tools provided by Compressed Sensing. The typical digital camera today records every pixel in an image before compressing that data and storing the compressed image. Due to the use of silicon, everyday digital cameras can operate in the megapixel range. A natural question asks why we need to acquire this abundance of data, just to throw most of it away immediately. This notion sparked the emerging theory of Compressive Imaging. In this new framework, the idea is to directly acquire random linear measurements of an image without the burdensome step of capturing every pixel initially. Several issues of course arise from this. The first problem is how to reconstruct the image from its random linear measurements. The solution to this problem is provided by Compressed Sensing. The next issue lies in actually sampling the random linear measurements without first acquiring the entire image. A compressive sampling camera consists of a digital micromirror device (DMD), two lenses, a single photon detector and an analog-to-digital (A/D) converter. The first lens focuses the light onto the DMD. Each mirror on the DMD corresponds to a pixel in the image, and can be tilted toward or away from the second lens. Since this camera utilizes only one photon detector, its design is a stark contrast to the usual large photon detector array in most cameras. The single-pixel compressive sampling camera also operates at a much broader range of the light spectrum than traditional cameras that use silicon.

For example, because silicon cannot capture a wide range of the spectrum, a digital camera to capture infrared images is much more complex and costly. Compressed Sensing is also used in medical imaging, in particular with magnetic resonance (MR) images which sample Fourier coefficients of an image. MR images are implicitly sparse and can thus capitalize on the theories of Compressed Sensing. Some MR images such as angiograms are sparse in their actual pixel representation, whereas more complicated MR images are sparse with respect to some other basis, such as the wavelet Fourier basis. MR imaging in general is very time costly, as the speed of data collection is limited by physical and physiological constraints. Thus it is extremely beneficial to reduce the number of measurements collected without sacrificing quality of the MR image. Compressed Sensing again provides exactly this, and many Compressed Sensing algorithms have been specifically designed with MR images in mind. The common approach in digital image systems is to capture as many pixels as possible and later compress the captured image by digital means. Compression is desired to increase the storage capacity and enhance the communication process. Compression techniques exploit the visual redundancy typical to human intelligible images. After capturing the optical image and applying data compression, the image is represented by a smaller number of pixels than the original image. The decompressed image should satisfy some desired visual quality.

6.2 Analog to Information Conversion


Analog-to-digital converters (ADCs) have been widely used in sensing and communications due to the advancement of digital signal processing. The process of ADC is based on the Nyquist sampling theorem, which uniformly samples the signal at a rate of at least twice its bandwidth in order to reconstruct the signal perfectly. Emerging applications like radar detection and ultra-wideband communication are pushing the limits of ADCs. The recent developments in the field of compressive sensing (CS) have helped in the design of analog-to-information converters (AICs) that are able to acquire samples at a lower sampling rate.


6.3 Compressive Radar


One additional application is Compressive Radar Imaging. A standard radar system transmits some sort of pulse (for example a linear chirp), and then uses a matched filter to correlate the received signal with that pulse. The receiver uses a pulse compression system along with a high-rate analog-to-digital (A/D) converter. This conventional approach is not only complicated and expensive, but the resolution of targets in this classical framework is limited by the radar uncertainty principle. Compressive Radar Imaging tackles these problems by discretizing the time-frequency plane into a grid and considering each possible target scene as a matrix. If the number of targets is small enough, then the grid will be sparsely populated, and we can employ Compressed Sensing techniques to recover the target scene. The new theory of compressive sensing can be used in radar imaging systems which are designed to determine the range, altitude, direction and speed of moving and fixed objects. The received radar signal can be reconstructed from fewer measurements by solving an inverse problem through a linear program or a greedy algorithm. With the implementation of compressive sensing in radar systems, the need for a pulse compression matched filter at the receiver side and for analog-to-digital conversion operating at high Nyquist rates can be eliminated. As a result, the complexity and the cost of the receiver hardware are greatly reduced.


6.4 Compressive Sensing in Mobile Communication System


Wireless Sensor Networks (WSNs) are a special topic in wireless communications, because in WSNs the main constraint is power, unlike in almost all other wireless networks. The power constraint poses a serious restriction on the design of the sensor nodes and the network layers (especially the physical and MAC layers). A typical sensor node consists of a wireless transceiver, a processor, a sensor and the power supply. The sensor node has a number of modes of operation associated with the activity of its components, which may be broadly divided into sleep and active modes. In sleep mode the sensor typically consumes much less power than in active mode. In order to minimize the power consumption, the ratio of active to sleep time must be as small as possible. The lifetime of a node in a sensor network is of utmost importance: once the sensors are deployed it is very difficult to identify the ones whose power supply is dead and to maintain them (consider a WSN for monitoring the traffic on a freeway), and the sensor nodes cannot be recharged with the same ease that a cell phone or other handheld device offers. There are also the additional problems associated with ad-hoc networks, such as synchronization, hidden nodes and exposed nodes, that need to be solved in order to maximize the sleep time. Also, while in active mode, the node consumes considerable energy to transmit bits. This is where CS comes in: it helps to reduce the amount of data that must circulate through the network. An important point here is that reducing the amount of data will significantly reduce the transmit power, because the number of collisions, bit errors and retransmissions will all decrease as well. Compressed sensing (CS) theory specifies a new signal acquisition approach, potentially allowing the acquisition of signals at a much lower data rate than the Nyquist sampling rate. In CS, the signal is not directly acquired but reconstructed from a few measurements. One of the key problems in CS is how to recover the original signal from measurements in the presence of noise. Recent research addresses these signal reconstruction problems.


In that research, a feedback structure and signal recovery algorithm, orthogonal pruning pursuit (OPP), is first proposed to exploit prior knowledge to reconstruct the signal in the noise-free situation. To handle noise, a noise-aware signal reconstruction algorithm based on Bayesian Compressed Sensing (BCS) is developed. Moreover, a novel Turbo Bayesian Compressed Sensing (TBCS) algorithm is developed for joint signal reconstruction by exploiting both spatial and temporal redundancy. The TBCS algorithm is then applied to a UWB positioning system to achieve mm-accuracy with low-sampling-rate ADCs. Finally, hardware implementation of BCS signal reconstruction on FPGAs and GPUs is investigated, exploring parallel Cholesky decomposition, which is a key component of BCS. Simulation results on software and hardware have demonstrated that OPP and TBCS outperform previous approaches, with improved UWB positioning accuracy, and the accelerated computation helps enable real-time application of this work. By using compressive sensing techniques, the speech signal is precoded at the transmitter side and then sent to the receiver through a wireless channel. As a result, a small number of samples is transmitted, which increases the transmission data rates when compared to current communication systems. In the proposed communication system, the speech signal is first modelled in such a way that the input signal is sparse enough before applying compressive sensing. The sparse signal is then multiplied by the predefined measurement matrix.

Figure 6.1: Compressive sensing in a mobile communication system


The output of the compressive sensing module is then transmitted to the receiver. At the receiver, the signal is reconstructed from a significantly smaller number of measurements by using different optimization techniques such as l1-norm minimization or convex optimization.
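A compact end-to-end sketch of this transmit/receive chain is shown below. The frame length, number of measurements and threshold are illustrative, the vector frame is assumed to hold one frame of speech samples, and the l1-magic routine l1eq_pd is assumed to be available; the full script used in this project is given in Appendix A.

% Transmitter side: sparsify one speech frame with the DCT, threshold, then project.
N   = 320;  M = 128;  Th = 0.05;         % illustrative frame length, measurements, threshold
s   = dct(frame);                        % 'frame' holds N speech samples (assumed given)
s(abs(s) < Th) = 0;                      % thresholding enforces sparsity before sensing
Phi = randn(M, N);                       % predefined M x N measurement matrix, M < N
y   = Phi * s;                           % compressed measurements sent over the channel

% Receiver side: recover the sparse spectrum by l1-minimization, then invert the DCT.
s_hat     = l1eq_pd(Phi' * y, Phi, [], y, 1e-2);   % l1-magic solver, as in Appendix A
frame_hat = idct(s_hat);                 % reconstructed speech frame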


CHAPTER-7

SIMULATION RESULTS
To compare the performance of the proposed adaptive compressed sensing with conventional non-adaptive CS, some experiments were conducted. An arbitrary speech signal was chosen, sampled at 16 kHz and quantized with 16 bits per sample. Adaptive CS and conventional CS sampling and reconstruction are performed frame by frame, with a frame length of N=320 samples. The threshold values T1 and T2 are chosen as 0.08 and 0.4 respectively, values which were tested through a great number of experiments. The Average Frame Signal-to-Noise Ratio (AFSNR) is used to evaluate the reconstruction quality of the speech signal and is calculated using the formula shown below:

AFSNR = (1/K) Σ_{k=1}^{K} 10 log10( ||x_k||^2 / ||x_k - x̂_k||^2 )

where K is the total number of frames of the speech sequence, and x_k and x̂_k represent the k-th frame of the original speech and the k-th frame of the reconstructed speech, respectively. Under different compressed ratios (r=0.2, r=0.4 and r=0.6), where the compressed ratio is defined as r=M/N, different test results are obtained based on the proposed frame-based adaptive CS using the OMP reconstruction algorithm.
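For reference, the AFSNR defined above can be computed frame by frame as in the sketch below; the function name and variable names are our own.

% Illustrative AFSNR computation over K frames of length N.
function afsnr = compute_afsnr(x, x_rec, N)
    K    = floor(length(x) / N);         % total number of full frames
    fsnr = zeros(K, 1);
    for k = 1:K
        idx     = (k-1)*N + (1:N);       % sample indices of the k-th frame
        xk      = x(idx);                % original frame
        xk_rec  = x_rec(idx);            % reconstructed frame
        fsnr(k) = 10 * log10(sum(xk.^2) / sum((xk - xk_rec).^2));   % frame SNR in dB
    end
    afsnr = mean(fsnr);                  % average frame SNR in dB
end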


7.1 Compression Ratio r=0.2


Here the compression ratio r is equal to 0.2, which indicates that the number of projections is M=64 for a frame of N=320 samples. Figure 7.1 below shows the time-domain waveforms of the original speech signal and the adaptive CS reconstructed speech with a compressed ratio of 0.2.


Figure 7.1: Original and reconstructed signals for N=320, M=64, r=0.2

The average frame signal-to-noise ratio for compression ratio r=0.2 is 12.030 dB with the adaptive projection matrix, compared with 4.970 dB with the non-adaptive projection matrix. Thus the AFSNR is increased by the adaptive projection matrix.


7.2 Compression Ratio r=0.4


Here the compression ratio r is equal to 0.4, which indicates that the number of projections is M=128 for a frame of N=320 samples. Figure 7.2 below shows the time-domain waveforms of the original speech signal and the adaptive CS reconstructed speech with a compressed ratio of 0.4.


Figure 7.2: Original and reconstructed signals for N=320, M=128, r=0.4

The average frame signal-to-noise ratio for compression ratio r=0.4 is 15.1714 dB with the adaptive projection matrix, compared with 12.103 dB with the non-adaptive projection matrix. Thus the AFSNR is increased by the adaptive projection matrix. From the above figure, it is observed that when the compressed ratio is r=0.4, the quality of the reconstructed signal is higher than that obtained with compressed ratio r=0.2.


7.3 Compression Ratio r=0.6


Here the compression ratio r is equal to 0.6, which indicates that the number of projections is M=192 for a frame of N=320 samples. Figure 7.3 below shows the time-domain waveforms of the original speech signal and the adaptive CS reconstructed speech with a compressed ratio of 0.6.


Figure 7.3: Original and reconstructed signals for N=320, M=192, r=0.6

The average frame signal-to-noise ratio for compression ratio r=0.6 is 25.1721 dB with the adaptive projection matrix, compared with 21.630 dB with the non-adaptive projection matrix. Thus the AFSNR is increased by the adaptive projection matrix. From the above figure, it is observed that when the compressed ratio is r=0.6, the quality of the reconstructed signal is higher than that obtained with compressed ratios r=0.2 and r=0.4.


7.4 Comparison with the Non-Adaptive Process


The results obtained by using the adaptive projection matrix to implement compressed sensing are compared with those of conventional non-adaptive compressed sensing for the three compressed ratios r=0.2, r=0.4 and r=0.6. Figure 7.4 below shows the reconstruction quality of the speech signal based on the proposed frame-based adaptive CS and the conventional non-adaptive CS. From Figure 7.4 it can be observed that the reconstruction quality is improved when the adaptive projection matrix is used. For example, when the compressed ratio is 0.2, the AFSNR increases by more than 7 dB.

Figure 7.4: Reconstructed speech quality using the proposed adaptive CS and non-adaptive compressed sensing.


Table 7.1 gives the Average Frame Signal-to-Noise Ratio (AFSNR) of the reconstructed speech based on the proposed frame-based adaptive CS and the conventional non-adaptive CS. Under each compressed ratio, it can be seen that the speech reconstructed using the proposed adaptive frame has a higher AFSNR than that using the non-adaptive frame.

Table 7.1: AFSNR of the reconstructed speech using non-adaptive CS and adaptive CS

Thus it is clearly observed that the proposed adaptive compressed sensing is more efficient than conventional non-adaptive compressed sensing.


CHAPTER-8

CONCLUSION AND FUTURE SCOPE


The adaptive projection matrix has been applied to conventional compressed sensing and has improved the average frame signal-to-noise ratio. It is also observed that the quality of the reconstructed signal increases as the compressed ratio increases. Thus conventional non-adaptive compressed sensing can be replaced by adaptive compressed sensing to improve the efficiency of the system. During the design process, this module went through different tests and analyses in order to find the most adequate optimization technique to reconstruct the speech signal from a few random measurements without losing information. For simulation purposes, code was created to compress the speech signal below the Nyquist rate by taking only a few measurements of the signal. The results show that, by keeping the length of the signal (L) and the threshold window (Th) constant, the desired compression of the signal can be achieved by making the signal sparse to a certain level (K), which in turn increases the data rates. The speech signal was reconstructed without losing important information in order to achieve an increase in the data rates. After multiple simulations, it was found that the system worked as expected and the speech signal was reconstructed efficiently with minimal error. The performance of compressive sensing is better than that of wavelet compression, as it gives a smaller error at the same compression rate using different parameters. In this project, the design of a new signal acquisition system using adaptive compressive sensing has been implemented, and the proposed system should fulfil the accurate reconstruction of the speech signal. As future work, different transformations need to be tested in order to find the most efficient one for this application, and a measurement matrix that is optimal for speech signals needs to be designed.


REFERENCES

[1] Donoho D. L. Compressed sensing. IEEE Transactions on Information Theory, 2006, 52(4): 1289-1306.

[2] Donoho D., Tsaig Y. Extensions of compressed sensing. Signal Processing, 2006, 86(3): 533-548.

[3] Chen S. S., Donoho D. L., Saunders M. A. Atomic decomposition by basis pursuit. SIAM Review, 2001, 43(1): 129-159.

[4] Tropp J., Gilbert A. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 2007, 53(12): 4655-4666.

[5] Wang Z. M., Arce G. R., Paredes J. L. Colored random projections for compressed sensing. ICASSP, 2007: 873-876.

[6] Liu Z., Zhao V., Elezzabi A. Y. Block-based adaptive compressed sensing for video. Proceedings of the 2010 IEEE 17th International Conference on Image Processing, 2010: 1649-1652.

[7] Haupt J., Nowak R., Castro R. Adaptive sensing for sparse signal recovery. ICASSP, 2009: 702-707.

[8] Gribonval R., Nielsen M. Sparse representations in unions of bases. IEEE Transactions on Information Theory, 2004, 49(12): 3320-3325.

[9] Tropp J. A. Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 2004, 50(10): 2231-2242.


Appendix A: MATLAB code for sampling the speech signal in a mobile system using compressed sensing

clc; clear all;
% Fs Hz (samples per second) is the rate at which the speech signal is sampled
Fs=2000;
x=wavread('test.wav');
figure(1)
stem(x)
title('Recorded input speech signal');
xlabel('Length of the input speech signal');
ylabel('Amplitude of the input speech signal');
% Discrete cosine transform of the recorded signal
a0=dct(x)
figure(2)
stem(a0)
axis([0 2000 -1 1]);
title('Discrete cosine transform of the recorded signal');
xlabel('Length of the DCT spectrum');
ylabel('Amplitude of the DCT spectrum');
% Thresholding the spectrum to make it sparse
for i=1:1:2000;
    if a0(i,1)<=0.04 && a0(i,1)>=-0.06
        a0(i,1)=0;

    else
        a0(i,1)=a0(i,1);
    end
end
a0;
figure(3)
stem(a0)
axis([0 2000 -1 1]);
title('The Threshold spectrum');
xlabel('The length of the threshold spectrum');
ylabel('Amplitude of the threshold spectrum');
% Sparsity of the spectrum (K) and Length of the signal (N)
K=800
N=2000
% Random measurement matrix
disp('Creating measurement matrix...');
A = randn(K,N);
A = orth(A')';
figure(4)
imagesc(A)
colorbar;
colormap('lines');
title('Random Measurement matrix');
disp('Done.');
% observations vector


y = A*a0;
figure(5)
plot(y)
title('Observation Vector');
% initial guess = min energy
x0 = A'*y;
% solve the LP
tic
xp = l1eq_pd(x0, A, [], y, 1e-2);
toc
figure(6)
plot(xp)
axis([0 2000 -0.6 0.6]);
title('Reconstructed Spectrum using l1-minimization');
% Inverse discrete cosine transform of reconstructed signal (IDCT)
Xrec=idct(xp)
wavplay(Xrec,Fs)
figure(7)
stem(Xrec)
title('Reconstructed signal at the receiver');
xlabel('Length of the reconstructed signal using IDCT');
ylabel('Amplitude of the reconstructed signal using IDCT');
% Calculating Absolute error between the reconstructed and actual signal
err=(max(abs(Xrec-x)))
stem(err);
title('Absolute Error of Reconstructed spectrum and Threshold spectrum');


xlabel('Length of the Maximum Absolute Error');
ylabel('Maximum Absolute error')


Appendix B: MATLAB code for compressing the test signal using wavelet compression

% Load original one-dimensional signal.
Fs=2000
s=wavread('test.wav')';
figure(1)
stem(s)
title('Input speech signal');
xlabel('Length of the input speech signal');
ylabel('Amplitude of the input speech signal');
l_s = length(s);
% Wavelet transform of input signal
[cA1,cD1] = dwt(s,'db1');
% To extract the Level-1 Approximation and Detail coefficient
A1 = idwt(cA1,[],'db1',l_s);
D1 = idwt([],cD1,'db1',l_s);
figure(2)
subplot(1,2,1); plot(A1); title('Approximation A1')
subplot(1,2,2); plot(D1); title('Detail D1')
% Inverse wavelet transform of the level-1 approximation and detail coefficients
A0 = idwt(cA1,cD1,'db1',l_s);
wavplay(A0,Fs)
figure(3)
stem(A0)
title('Reconstructed speech signal');

xlabel('Length of the reconstructed speech signal');
ylabel('Amplitude of the reconstructed speech signal');
err = max(abs(s-A0))

