CHAPTER-1
INTRODUCTION
Compressive sensing is an emerging and revolutionary technology that relies strongly on the sparsity of the signal. In compressive sensing the signal is compressively sampled by taking a small number of random projections, which contain most of the salient information. Compressive sensing has previously been applied in areas such as image processing, radar systems, and sonar systems, and it is now being used in speech processing as an advanced technique for acquiring data. The key objective in compressed sensing (also referred to as sparse signal recovery or compressive sampling) is to reconstruct a signal accurately and efficiently from a small set of non-adaptive linear measurements. Of course, linear algebra easily shows that in general it is not possible to reconstruct an arbitrary signal from an incomplete set of linear measurements. Thus one must restrict the domain to which the signals belong. To this end, we consider sparse signals: those with few non-zero coordinates. It is now known that many signals, such as real-world images or audio signals, are sparse. Since sparse signals lie in a lower-dimensional space, one might expect that they can be represented by a few linear measurements. This is indeed correct, but the difficulty is determining in which lower-dimensional subspace such a signal lies. That is, we may know that the signal has few non-zero coordinates, but we do not know which coordinates those are. It is thus clear that such signals cannot be reconstructed by a simple linear operator, and the recovery requires more sophisticated techniques. The compressed sensing field has provided many recovery algorithms, most with provable as well as empirical results. There are several important traits that an optimal recovery algorithm must possess. The algorithm needs to be fast, so that it can efficiently recover signals in practice; minimal storage requirements would also be ideal.
MRITS, DEPARTMENT OF ECE 1
Adaptive compressed sensing of speech signal
The algorithm should provide uniform guarantees: given a specific method of acquiring linear measurements, the algorithm recovers all sparse signals (possibly with high probability). Ideally, the algorithm would require as few linear measurements as possible. However, recovery from the bare minimum would require searching through the exponentially large set of all possible lower-dimensional subspaces, which is not numerically feasible in practice. Thus, in a more realistic setting, we may need slightly more measurements. Finally, we wish our ideal recovery algorithm to be stable: if the signal or its measurements are perturbed slightly, the recovery should still be approximately accurate. This is essential, since in practice we often encounter not only noisy signals or measurements, but also signals that are not exactly sparse, only close to being sparse. The conventional scheme in signal processing, acquiring the entire signal and then compressing it, was questioned by Donoho. Indeed, this approach uses tremendous resources to acquire often very large signals, only to throw away information during compression. The natural question, then, is whether these two processes can be combined, directly sensing the signal or its essential parts using few linear measurements. Recent work in compressed sensing has answered this question in the positive, and the field continues to rapidly produce encouraging results.
1.1 Objective
Compressed sensing (CS) is an emerging signal acquisition theory that directly collects signals in a compressed form if they are sparse in some basis. It originates from the idea that it is not necessary to invest a lot of power into observing the entries of a sparse signal in all coordinates when most of them are zero anyway. Rather, it should be possible to collect only a small number of measurements that still allow for reconstruction. This is potentially useful in applications where one cannot afford to collect or transmit many measurements but has rich resources at the decoder.
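The core idea can be sketched in a few lines of NumPy. The sizes N, K, and M below are illustrative choices, not values used in this project: a K-sparse signal of length N is observed through M random projections, so only M numbers need to be collected or transmitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: N-sample signal, K non-zero entries, M measurements.
N, K, M = 256, 5, 40

# Build a K-sparse signal: only K of its N coordinates are non-zero.
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.standard_normal(K)

# Non-adaptive sensing: M random linear projections y = Phi @ x.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
y = Phi @ x

print(y.shape)  # M measurements instead of N samples
```

The decoder later recovers x from y and Phi using the optimization techniques discussed in Chapter 4.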
Observing that different kinds of speech frames have different intra-frame correlations, a frame-based adaptive compressed sensing framework for speech signals has been proposed. The objective of this project is to improve the performance of the existing compressed sensing process, which uses a non-adaptive projection matrix, by using an adaptive projection matrix based on frame analysis. The average-frame signal-to-noise ratio (AFSNR) is calculated to compare the performance of frame-based adaptive CS against non-adaptive CS.
Most work in CS research focuses on random projection matrices constructed by considering only the signal's sparsity rather than its other properties. In other words, the construction of the projection matrix is non-adaptive. Observing that different kinds of speech frames have different intra-frame correlations, a frame-based adaptive compressed sensing framework, which applies an adaptive projection matrix to speech signals, has been proposed. To do so, neighbouring frames are compared to estimate their intra-frame correlation, every frame is classified into one of several categories, and the number of projections for each frame is adjusted accordingly. The experimental results show that the adaptive projection matrix can significantly improve the speech reconstruction quality.
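The frame-classification idea can be sketched as follows. The frame length, correlation threshold, and projection counts below are hypothetical choices for illustration, not the project's actual parameters: each frame's lag-1 autocorrelation stands in for the intra-frame correlation estimate, and strongly correlated frames receive fewer projections.

```python
import numpy as np

def frame_measurement_counts(signal, frame_len=160, m_low=32, m_high=80):
    """Sketch of adaptive measurement allocation: classify each frame by
    its lag-1 autocorrelation and assign fewer projections to highly
    correlated frames, more to weakly correlated (noise-like) ones."""
    counts = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        f = signal[start:start + frame_len]
        f = f - f.mean()
        denom = np.dot(f, f)
        rho = np.dot(f[:-1], f[1:]) / denom if denom > 0 else 0.0
        counts.append(m_low if rho > 0.5 else m_high)
    return counts
```

A slowly varying (voiced-like) frame then gets m_low projections, while a noise-like (unvoiced-like) frame gets m_high.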
CHAPTER-2
LITERATURE SURVEY
According to information theory, the bit rate at which distortionless transmission of any source signal is possible is determined by the entropy of the speech source message. In practical terms, however, the source rate corresponding to the entropy is only asymptotically achievable, as the encoding memory length or delay tends to infinity. Any further compression is associated with information loss or coding distortion. Many practical source compression techniques employ lossy coding, which typically buys further bit rate economy at the cost of nearly imperceptible degradation of the speech, audio, video, or other source representation. Note that the optimum Shannonian source encoder generates a perfectly uncorrelated source-coded stream, in which all the source redundancy has been removed. Therefore the encoded source symbols, which in most practical cases are binary bits, are independent, and each one has the same significance. Having the same significance implies that the corruption of any source-encoded symbol over an imperfect channel results in identical source signal distortion. Under these conditions, according to Shannon's fundamental work, the best protection against transmission errors is achieved if source and channel coding are treated as separate entities. When using a block code of length N channel-coded symbols to encode K source symbols with a coding rate of R = K/N, the symbol error rate can be rendered arbitrarily low as N tends to infinity.
The human speech production system comprises the lungs, vocal cords, and vocal tract. The vocal cords are modelled as a simple vibration system, and the pitch of the speech changes as the tension of the vocal cords is adjusted. Speech is generated by emitting sound pressure waves, radiated primarily from the lips, although significant energy also emanates from the nostrils, the throat, and so on. The air compressed by the lungs excites the vocal cords in two typical modes. When generating voiced sounds, the vocal cords vibrate and generate a high-energy quasi-periodic speech waveform, whereas for lower-energy unvoiced sounds the vocal cords do not participate in voice production and the source behaves like a noise generator. In a somewhat simplistic approach, the excitation signal, denoted by E(z), is then filtered through the vocal apparatus, which behaves like a spectral shaping filter with a transfer function H(z) constituted by the spectral shaping action of the glottis, the opening between the vocal folds. Further spectral shaping is carried out by the vocal tract, lip radiation characteristics, and so on.
The human speech in its pristine form is an acoustic signal. For the purposes of communication and storage, it is necessary to convert it into an electrical signal. This is accomplished with the help of instruments called transducers. The electrical representation of speech has certain properties:
1. It is a one-dimensional signal, with time as its independent variable.
2. It is random in nature.
3. It is non-stationary, i.e. its frequency spectrum is not constant in time.
A microphone is a transducer that converts the acoustic speech signal into an electrical signal. The microphone receives the acoustic voice signal and produces an electrical output whose amplitude is proportional to the intensity of the input acoustic voice signal.
The electrical signal produced by the microphone is an analog signal: its amplitude varies continuously with time, and it is continuous in both time and amplitude, as shown in the figure below.
Processed in its original analog form, the signal cannot be handled as efficiently as in digital form. With the advent of digital computing machines, it was proposed to exploit their power for processing speech signals. This requires a digital representation of speech: the analog signal is sampled at some frequency and then quantized to discrete levels. There are many advantages to processing the signal digitally rather than in analog form. The analog speech signal is converted into digital form using an Analog-to-Digital Converter (ADC).
As shown in the figure above, the analog-to-digital converter performs three basic functions:
1. Sampling
2. Quantization
3. Encoding
2.2.1 Sampling
The sampling process converts the continuous-time signal into a discrete-time signal. This is achieved by multiplying the input continuous signal with an impulse train of unity magnitude. The impulse train frequency, also known as the sampling frequency, should be at least twice the highest frequency component present in the input analog signal (Fs >= 2Fm); this rate is called the Nyquist rate (the Nyquist condition for sampling).
Figure 2.5: Block diagram of the sampler
The output of the sampler is a discrete-time signal in which samples are present only at discrete intervals of time.
2.2.2 Quantization
The output of the sampling process is discrete only in time; its amplitude can still assume continuous values. To make the amplitude discrete as well, quantization is used.
The quantizer assigns discrete values to the input samples by either a rounding-off process or a truncation process. In rounding off, each value is assigned to the nearest integer multiple of the step size, whereas in truncation each value is assigned by truncating it down to an integer multiple of the step size. Compared with rounding off, truncation produces a larger error. Thus the output of the quantizer is a signal that is discrete in both time and amplitude.
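A minimal sketch of the two quantization rules; the step size and sample values are arbitrary illustrative choices:

```python
import numpy as np

def quantize(samples, step, mode="round"):
    """Uniform quantizer: map each sample to an integer multiple of `step`,
    either by rounding to the nearest level or by truncating toward zero."""
    if mode == "round":
        return np.round(samples / step) * step
    return np.trunc(samples / step) * step  # truncation

x = np.array([0.23, -0.41, 0.58])
print(quantize(x, 0.1, "round"))  # nearest quantization levels
print(quantize(x, 0.1, "trunc"))  # truncated quantization levels
```

On this example, the mean absolute error of the rounding rule is no larger than that of truncation, matching the statement above.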
2.2.3 Encoding
The encoder assigns binary values to the sample values coming from the quantizer.
Thus the output from the encoder is a purely digital signal represented by zeros and ones.
2.2.4 Anti-Aliasing Filter
Apart from the three building blocks discussed above (sampler, quantizer, and encoder), the analog-to-digital converter also contains a low-pass filter called the anti-aliasing filter. Anti-aliasing low-pass filtering (LPF) is necessary in order to band-limit the signal to a bandwidth of B before sampling. In the case of speech signals, about 1% of the energy resides above 4 kHz and only a negligible proportion above 7 kHz. Hence high-quality speech links, often referred to as wideband speech systems, typically band-limit the speech signal to 7-8 kHz. Conventional telephone systems usually employ a bandwidth limitation of 0.3-3.4 kHz, which results in only minor speech degradation, hardly perceptible to the listener.
The speech signal can be divided into two parts: one carries the speech information, and the other comprises the silent or noise sections between the utterances, which carry no verbal information. The verbal (informative) part of speech can be further divided into two categories, namely voiced speech and unvoiced speech.
2.3.1 Voiced and Unvoiced Speech
In speech there are two major types of excitation: voiced and unvoiced. Voiced speech consists mainly of vowel sounds. It is produced by forcing air through the glottis; proper adjustment of the tension of the vocal cords results in the opening and closing of the cords and the production of almost periodic pulses of air. These pulses excite the vocal tract. Psychoacoustic experiments show that this part holds most of the information in speech and thus holds the key to characterizing a speaker. Unvoiced speech sections are generated by forcing air through a constriction formed at a point in the vocal tract (usually toward the mouth end), thus producing turbulence. Being able to distinguish between voiced speech, unvoiced speech, and silence is very important for speech signal analysis. Voiced speech tends to be periodic in nature; examples of voiced sounds are English vowels, such as the /a/ in "bay" and the /e/ in "see". Since unvoiced speech is due to turbulence, it is aperiodic and has a noise-like structure; some examples of unvoiced English sounds are the /s/ in "so" and the /h/ in "he". In general, at least 90% of the speech energy is retained in the first N/2 transform coefficients if the frame is voiced. For an unvoiced frame, however, the energy is spread across several frequency bands, and the first N/2 coefficients typically hold less than 40% of the total energy. For this reason wavelets are inefficient at coding unvoiced speech. Unvoiced speech frames are infrequent.
By detecting unvoiced speech frames and encoding them directly (perhaps using entropy coding), no unvoiced data is lost and the quality of the compressed speech remains transparent. Typical voiced and unvoiced speech waveform segments are shown in Figures 2.8 and 2.9 respectively, along with their corresponding power spectral densities. Clearly, the unvoiced segment has a significantly lower magnitude, which is also reflected by its PSD.
The voiced segment shown in Figure 2.10 is quasi-periodic in the time domain, with an approximately 80-sample periodicity identified by the positions of the largest time-domain signal peaks, which corresponds to 10 ms. This interval is referred to as the pitch period, and it is often expressed in terms of the pitch frequency p, in this example 1/(10 ms) = 100 Hz. For male speakers the typical pitch frequency range is between 40 and 120 Hz, whereas for females it can be as high as 300-400 Hz.
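The pitch-frequency arithmetic above can be written out directly, assuming the 8 kHz sampling rate implied by "80 samples corresponds to 10 ms":

```python
fs = 8000            # assumed sampling rate: 8 kHz, so 80 samples = 10 ms
period_samples = 80  # pitch period measured from the time-domain peaks
pitch_hz = fs / period_samples
print(pitch_hz)      # 100.0
```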
Observe furthermore that within each pitch period there is a gradually decaying oscillation, associated with the excitation and the gradually decaying vibration of the vocal cords. A perfectly periodic time-domain signal would have a line spectrum; since the voiced speech signal is only quasi-periodic with frequency p, its spectrum instead exhibits somewhat widened but distinctive spectral needles at frequencies of np. As a second phenomenon, three, sometimes four, spectral envelope peaks can also be observed. In the voiced spectrum of Figure 2.8 these formant frequencies are observable around 500 Hz, 1500 Hz, and 2700 Hz, and they are the manifestation of the resonances of the vocal tract at those frequencies. In contrast, the unvoiced segment of Figure 2.9 does not have a formant structure; rather, it has a more dominant high-pass nature, exhibiting a peak around 2500 Hz. Observe, furthermore, that its energy is much lower than that of the voiced segment. It is equally instructive to study the ACF of voiced and unvoiced segments, which are portrayed on an expanded scale in Figures 2.10 and 2.11 respectively.
The voiced ACF shows a set of periodic peaks at displacements of about 20 samples, corresponding to 2.5 ms, which coincides with the positive quasi-periodic time-domain segments. Following four monotonically decaying peaks, there is a more dominant one around a displacement of 80 samples, which indicates the pitch periodicity.
The periodic nature of the ACF can therefore be exploited to detect and measure the pitch periodicity in a range of applications, such as speech codecs and voice activity detectors. Observe, however, that the first peak at a displacement of 20 samples is about as high as the one near 80. Hence, a reliable pitch detector has to identify and rank all these peaks in order of prominence, also exploiting a priori knowledge of the expected range of pitch frequencies. By contrast, the unvoiced segment has a much more rapidly decaying ACF, indicating no inherent correlation between adjacent samples and no long-term periodicity. Voiced and unvoiced speech signals have some distinct characteristic features that enable us to distinguish between them.
2.3.2 Zero Crossing Rate
The rate at which the speech signal crosses zero can provide information about the source of its creation. It is well known that unvoiced speech has a much higher zero crossing rate (ZCR) than voiced speech; this is because most of the energy in unvoiced speech is found at higher frequencies than in voiced speech, implying a higher ZCR for the former.
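A minimal ZCR computation illustrating this contrast; the two test signals below are synthetic stand-ins for a voiced frame (a 100 Hz sinusoid) and an unvoiced frame (white noise):

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    return float(np.mean(signs[:-1] != signs[1:]))

t = np.arange(800) / 8000.0               # 100 ms at an assumed 8 kHz rate
voiced_like = np.sin(2 * np.pi * 100 * t) # low-frequency, voiced-like
unvoiced_like = np.random.default_rng(0).standard_normal(800)  # noise-like

print(zero_crossing_rate(voiced_like))    # low ZCR
print(zero_crossing_rate(unvoiced_like))  # much higher ZCR
```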
2.3.3 Cross-Correlation
Cross-correlation is calculated between two consecutive pitch cycles. The cross-correlation values between pitch cycles are higher (close to 1) in voiced speech than in unvoiced speech.
CHAPTER-3
SPEECH COMPRESSION
In recent years, large-scale information transfer by remote computing and the development of massive storage and retrieval systems have grown tremendously. To cope with the growth in the size of databases, additional storage devices need to be installed, and modems and multiplexers have to be continuously upgraded to permit large amounts of data transfer between computers and remote terminals. This increases both cost and the amount of equipment. One solution to these problems is compression, whereby the database and the transmission sequence can be encoded efficiently.
2. It is square integrable or, equivalently, has finite energy:

∫ |ψ(t)|² dt < ∞
A function ψ(t) is called a mother wavelet if it satisfies these two properties. There are infinitely many functions that satisfy them and thus qualify as mother wavelets. The simplest is the Haar wavelet; other examples are the Mexican hat and Morlet wavelets. Beyond these, there are various families of wavelets, such as the Daubechies, Symlet, and Coiflet families. Consider the following figure, which juxtaposes a sinusoid and a wavelet.
A wavelet is a waveform of effectively limited duration that has an average value of zero. Compare wavelets with sine waves, which are the basis of Fourier analysis: sinusoids do not have limited duration, extending from minus to plus infinity, and while sinusoids are smooth and predictable, wavelets tend to be irregular and asymmetric. Fourier analysis consists of breaking up a signal into sine waves of various frequencies. Similarly, wavelet analysis is the breaking up of a signal into shifted and scaled versions of the original (or mother) wavelet.
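The analysis/synthesis idea can be illustrated with the simplest mother wavelet, the Haar wavelet. This is a one-level sketch, not the full multi-level DWT: analysis splits the signal into averages (approximation) and differences (detail), and synthesis recombines them exactly.

```python
import numpy as np

def haar_analysis(x):
    """One-level Haar analysis: pairwise averages (low-pass) and
    pairwise differences (high-pass), each scaled by 1/sqrt(2).
    Assumes an even-length input."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse of haar_analysis: perfect reconstruction."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x
```

Applying synthesis after analysis returns the original signal, which is the property the synthesis equation below expresses in general.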
The above diagram suggests the existence of a synthesis equation representing the original signal as a linear combination of wavelets, which are the basis functions of wavelet analysis (recall that in Fourier analysis the basis functions are sines and cosines). This is indeed the case. The wavelets in the synthesis equation are multiplied by scalar coefficients.
A wavelet ψ(t) is said to have p vanishing moments if

∫ t^m ψ(t) dt = 0,   for m = 0, ..., p-1
Wavelets with a high number of vanishing moments lead to a more compact signal representation and are hence useful in coding applications. However, in general, the length of the filters increases with the number of vanishing moments, and the complexity of computing the DWT coefficients increases with the size of the wavelet filters.
3.3.2 Fast Wavelet Transform
The Discrete Wavelet Transform (DWT) coefficients can be computed using Mallat's Fast Wavelet Transform algorithm. This algorithm is sometimes referred to as the two-channel subband coder and involves filtering the input signal based on the wavelet function used. To explain the implementation of the Fast Wavelet Transform algorithm, consider the following equations:

1. φ(t) = √2 Σ_k h(k) φ(2t − k)
2. ψ(t) = √2 Σ_k g(k) φ(2t − k)
The first equation is known as the twin-scale relation (or the dilation equation) and defines the scaling function. The second expresses the wavelet in terms of the scaling function. The coefficients h(k) form the impulse response of a low-pass filter of length 2N, with a sum of 1 and a norm of 1/√2. The high-pass filter g(k) is obtained from the low-pass filter using the relationship

g(k) = (−1)^k h(2N − 1 − k)
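The low-pass/high-pass relationship can be checked numerically. With L = 2N filter taps it reads g(k) = (−1)^k h(L − 1 − k); the Haar low-pass filter is used below as a concrete example.

```python
import numpy as np

def highpass_from_lowpass(h):
    """Quadrature-mirror relation: g[k] = (-1)^k * h[L-1-k]."""
    h = np.asarray(h, dtype=float)
    signs = (-1.0) ** np.arange(len(h))
    return signs * h[::-1]

h_haar = np.array([1.0, 1.0]) / np.sqrt(2)  # Haar scaling (low-pass) filter
g_haar = highpass_from_lowpass(h_haar)       # Haar wavelet (high-pass) filter
print(g_haar)
```

The resulting pair of filters is orthogonal (their inner product is zero), which is what makes the two-channel subband structure invertible.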
Another factor comes from psychoacoustic studies. Since our ears are more sensitive to low frequencies than to high frequencies, and our hearing threshold is very high in the high-frequency regions, a compression method is used whereby the detail coefficients (corresponding to high-frequency components) of the wavelet transform are thresholded such that the error due to thresholding is inaudible. Since some of the high-frequency components are discarded, a smoothed output signal is expected, as shown in the following figure.
In summary, the notion behind compression is based on the concept that the regular signal component can be accurately approximated using a small number of approximation coefficients (at a suitably chosen level) and some of the detail coefficients. Data compression is then achieved by treating small-valued coefficients as insignificant data and discarding them. The process of compressing a speech signal using wavelets involves a number of different stages, each of which is discussed below.
3.4.1 Choice of Wavelet
The choice of the mother-wavelet function used in designing high-quality speech coders is of prime importance. Choosing a wavelet that has compact support in both time and frequency, in addition to a significant number of vanishing moments, is essential for an optimum wavelet speech compressor.
This is followed very closely by the Daubechies D20, D12, D10, and D8 wavelets, all concentrating more than 96% of the signal energy in the level-1 approximation coefficients. Wavelets with more vanishing moments provide better reconstruction quality, as they introduce less distortion into the processed speech and concentrate more signal energy in a few neighbouring coefficients.
3.4.2 Wavelet Decomposition
Wavelets work by decomposing a signal into different resolutions or frequency bands; this task is carried out by choosing a wavelet function and computing the Discrete Wavelet Transform (DWT). Signal compression is based on the concept that selecting a small number of approximation coefficients (at a suitably chosen level) and some of the detail coefficients can accurately represent the regular signal components. The choice of decomposition level for the DWT usually depends on the type of signal being analyzed, or on some other suitable criterion such as entropy. For processing speech signals, decomposition up to scale 5 is adequate, with no further advantage gained beyond scale 5.
3.4.3 Truncation of Coefficients
After calculating the wavelet transform of the speech signal, compression involves truncating the wavelet coefficients below a threshold. Most of the speech energy resides in the few high-valued coefficients, so the small-valued coefficients can be truncated or zeroed, and the remaining coefficients used to reconstruct the signal. This compression scheme provided a segmental signal-to-noise ratio (SEGSNR) of 20 dB with only 10% of the coefficients. Two different approaches are available for calculating thresholds. The first, known as global thresholding, involves taking the wavelet expansion of the signal and keeping the largest absolute-value coefficients. In this case one can manually set a global threshold, a compression performance target, or a relative square-norm recovery performance target.
Thus only a single parameter needs to be selected. The second approach, known as by-level thresholding, consists of applying visually determined, level-dependent thresholds to each decomposition level in the wavelet transform.
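Global thresholding by keeping a fixed fraction of the largest-magnitude coefficients can be sketched as below. The function name and the keep fraction are illustrative choices (the 10% default echoes the SEGSNR experiment mentioned above, but this is not the project's exact procedure):

```python
import numpy as np

def global_threshold(coeffs, keep_fraction=0.10):
    """Keep only the largest-magnitude fraction of coefficients,
    zeroing the rest (global thresholding)."""
    c = np.asarray(coeffs, dtype=float)
    k = max(1, int(keep_fraction * len(c)))
    thresh = np.sort(np.abs(c))[-k]          # k-th largest magnitude
    return np.where(np.abs(c) >= thresh, c, 0.0)
```

The zeroed vector is then passed to the encoding stage, where the long runs of zeros compress well.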
3.4.4 Encoding Coefficients
Signal compression is achieved by first truncating small-valued coefficients and then encoding them efficiently. One approach is to encode each run of consecutive zero-valued coefficients with two bytes: one byte to indicate a sequence of zeros in the wavelet transform vector, and a second byte giving the number of consecutive zeros. For further data compaction, a suitable bit-encoding format can be used to quantize and transmit the data at low bit rates. A low bit rate representation can be achieved by using an entropy coder such as Huffman coding or arithmetic coding.
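The two-byte zero-run scheme can be sketched as follows; here a Python tuple stands in for the (marker, count) byte pair:

```python
def rle_zeros(coeffs, zero_marker=0):
    """Encode runs of zero-valued coefficients as (marker, run_length)
    pairs; non-zero coefficients pass through unchanged."""
    out = []
    i = 0
    while i < len(coeffs):
        if coeffs[i] == 0:
            run = 0
            while i < len(coeffs) and coeffs[i] == 0:
                run += 1
                i += 1
            out.append((zero_marker, run))
        else:
            out.append(coeffs[i])
            i += 1
    return out

print(rle_zeros([5, 0, 0, 0, 7, 0, 0, 3]))
# [5, (0, 3), 7, (0, 2), 3]
```

Each run of zeros, however long, now costs two symbols; an entropy coder can then be applied to the result.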
CHAPTER-4
COMPRESSIVE SENSING
The theory of compressive sensing was developed by Candès et al. and Donoho; compressed sensing (CS) is also called compressive sampling. In a typical communication system, the signal is sampled at least at twice the highest frequency contained in the signal. However, this limits efficient ways to compress the signal, as it places a huge burden on sampling the entire signal while only a small number of transform coefficients are needed to represent it. Compressive sampling, by contrast, provides a new way to reconstruct the original signal from a minimal number of observations. CS is a sampling paradigm that goes beyond the Shannon-Nyquist limit by exploiting the sparsity structure of the signal: it allows capturing and representing compressible signals at a rate significantly below the Nyquist rate. The signal is then reconstructed from these projections using different optimization techniques. During compressive sampling, only the important information about a signal is acquired, rather than the important information plus information that would eventually be discarded at the receiver. The key questions that need to be addressed before using compressive sensing are the following: how to find the transform domain in which the signal has a sparse representation; how to effectively sample the sparse signal in the time domain; and how to recover the original signal from the samples using optimization techniques. In summary, the large amount of data needed to sample at the Nyquist rate, especially for speech, image, and video signals, motivates the study of compressive sensing as a feasible solution for future mobile communication systems. Sparse signals are defined as signals that can be represented by a limited number of data
points in the transform domain. Many real-world signals fall into this category under an appropriate transform. For instance, if a signal is a sinusoid, it is clearly not sparse in the time domain, but its Fourier transform is extremely sparse. The cost of acquiring large amounts of data, added to the overhead of compression, can be reduced by using compressive sensing. As a result, there are potential savings in terms of energy, memory, and processing.
Figure 4.1 shows an example of how compressive sensing can be used to compress a signal below the Nyquist rate. In this example, the original sampled signal is composed of 300 samples, and the intent is to reconstruct the signal using only 30 samples. Figure 4.1(a) shows the time-domain representation of the sampled signal. From this figure it is evident that, by selecting only 30 samples (red dots) out of the 300, it would be impossible to reconstruct the original signal perfectly in the time domain. On the other hand, by applying compressive sensing to the frequency representation of the signal, it is possible to reconstruct it perfectly from a significantly smaller number of samples. Achieving this requires optimization techniques, but not every optimization technique can be used for this purpose. For example, Figure 4.1(c) shows the spectrum reconstructed using l2 minimization; clearly there are significant differences between the signal in Figure 4.1(b) and the one in Figure 4.1(c).
Figure 4.1: (a) Time-domain representation of the signal composed of 300 samples; (b) Fourier spectrum of the signal to be encoded; (c) reconstruction of the Fourier spectrum via l2 minimization; (d) reconstruction of the Fourier spectrum via l1 minimization.
In contrast, reconstruction using l1 minimization results in a perfect reconstruction, as can be clearly seen by comparing Figure 4.1(b) and Figure 4.1(d). In summary, optimization techniques based on l1 minimization are desired when compressive sensing is used.
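l1 minimization can be posed as a linear program and handed to an off-the-shelf LP solver. The sketch below uses SciPy's linprog with the standard x = u − v splitting (u, v ≥ 0); the problem sizes in the usage example are illustrative, not those of Figure 4.1.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """l1 minimization (basis pursuit) as a linear program:
    write x = u - v with u, v >= 0 and minimize sum(u) + sum(v)
    subject to Phi @ (u - v) = y."""
    M, N = Phi.shape
    c = np.ones(2 * N)
    A_eq = np.hstack([Phi, -Phi])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    uv = res.x
    return uv[:N] - uv[N:]

# Illustrative usage: recover a 3-sparse signal from 25 random measurements.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((25, 50))
x_true = np.zeros(50)
x_true[[5, 20, 40]] = [1.5, -2.0, 1.0]
x_hat = basis_pursuit(Phi, Phi @ x_true)
```

Any solution returned is feasible (it reproduces the measurements) and has l1 norm no larger than that of the true sparse signal, which is what the l1 objective guarantees.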
Consider a signal x of length N that is K-sparse in some basis. In compressive sensing, instead of acquiring all N samples, we take M < N linear measurements

y_j = < x, φ_j >,   j = 1, ..., M

or, in matrix form, y = Φx, where Φ is the M × N measurement matrix whose rows are the vectors φ_j, and y is the M × 1 observation vector. It has been seen that some measurement matrices can be used in any scenario, in the sense that they are incoherent with any fixed basis such as Gabor, spikes, sinusoids, and wavelets. The compressive sensing measurement process with a K-sparse coefficient vector x is depicted in Figure 4.2.
The measurement matrix plays a vital role in the process of recovering the original signal. There are two types of measurement matrices that can be used in compressive sensing: the random measurement matrix and the predefined measurement matrix. The fundamental revelation is that, if a signal x composed of N samples is K-sparse, then the signal can be reconstructed from a number of measurements on the order of

M ≥ O(K log(N/K))
Furthermore, x can be perfectly reconstructed using different optimization techniques. If Φ is a structurally random matrix, its rows are not stochastically independent, because they are randomized from the same random seed vector. The random matrix is transposed and then orthogonalized; this has the effect of creating a matrix that represents an orthonormal basis. A predefined measurement matrix, on the other hand, is created using functions such as Dirac functions and sine functions. In this case, the signal is multiplied by several Dirac functions centered at different locations to obtain the observation vector; the speech signal can then be reconstructed by the l1 minimization method from the observation vector and the predefined measurement matrix. Linear programming is another procedure that plays a vital role in reconstructing the original signal. It is a mathematical approach designed to obtain the best outcome in a given mathematical model, and it is a special case of mathematical programming. A linear program can be expressed in the following canonical form:

maximize e^T x   subject to   Ax ≤ b
where x represents the variable to be determined, e and b are vectors of coefficients, and A is a matrix of coefficients. The expression to be maximized or minimized is called the objective function, and the inequality defines the constraints over which the objective function is optimized. Ultimately, the reconstruction of the speech signal depends on the observation vector and the measurement matrix.
If the measurement matrix satisfies the Restricted Isometry Property (RIP), i.e., its column submatrices are nearly orthonormal, then it is possible to recover the K largest significant coefficients from a set of M = O(K log(N/K)) measurements of y. As a result, the sparse signal can be reconstructed by different optimization techniques such as l1-norm minimization and convex optimization. The first minimization technique used to reconstruct the signal is l1 minimization:

(P1)  min ||x||_1  subject to  Φx = y
This is also known as basis pursuit (P1). The goal of this technique is to find the vector with the smallest l1 norm. The l1 norm is also known as the taxicab norm or Manhattan norm; the name refers to the distance a taxi has to drive in a rectangular street grid to get from the origin to the point x. The distance induced by this norm is called the Manhattan distance or l1 distance. The other optimization technique, convex optimization (cvx), can solve many small and medium scale problems. Using cvx, the l1 norm of the signal is minimized in order to reconstruct the original signal.
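The l1 norm itself is a one-liner; a small NumPy sketch of the taxicab interpretation:

```python
import numpy as np

v = np.array([3.0, -4.0, 0.0, 1.0])

# l1 (taxicab / Manhattan) norm: sum of absolute values
l1 = np.sum(np.abs(v))                  # 8.0
assert l1 == np.linalg.norm(v, 1)

# Manhattan distance: blocks a taxi drives on a rectangular grid
p, q = np.array([0.0, 0.0]), np.array([2.0, 3.0])
manhattan = np.sum(np.abs(p - q))       # 5.0: two blocks east, three north
print(l1, manhattan)
```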
Recently, l1 minimization has been proposed as a convex alternative to the combinatorial l0 norm, which simply counts the number of nonzero entries in a vector, for synthesizing the signal as a sparse superposition of waveforms. The program (P1) min ||x||_1 subject to Ax = b is also known as basis pursuit. The goal of this program is to find the vector with the smallest l1 norm; the algorithm searches for a vector x that explains the observation b. If the signal x is sufficiently sparse, then (P1) will recover x from the values of A and b. When x, A, and b have real-valued entries, (P1) can be recast as a linear program.
4.4.2 Matching Pursuit
Orthogonal matching pursuit (OMP) is a canonical greedy algorithm for sparse approximation. Let Φ represent a matrix of size M×N (where typically M < N) and let y denote a vector in R^M. The goal of OMP is to recover a coefficient vector x in R^N with roughly K < M non-zero terms so that Φx equals y exactly or
approximately. OMP is frequently used to find a sparse representation of a signal y in R^M in settings where Φ represents a dictionary for the signal space. It is also commonly used in compressive sensing, where y = Φx represents compressive measurements of a sparse signal x in R^N to be recovered. One of the attractive features of OMP is its simplicity, and it is empirically competitive in terms of approximation performance.
4.4.3 Orthogonal Matching Pursuit (OMP)
In this project, the signal is reconstructed frame by frame using the OMP method. OMP uses sub-Gaussian measurement matrices to reconstruct sparse signals. If Φ is such a measurement matrix, then Φ*Φ is, in a loose sense, close to the identity. Therefore the largest coordinate of the observation vector Φ*y, where y = Φx, is expected to correspond to a non-zero entry of x. Thus one coordinate of the support of the signal x is estimated. Subtracting off that contribution from the observation vector y and repeating eventually yields the entire support of the signal x. OMP is quite fast, both in theory and in practice, but its guarantees are not as strong as those of Basis Pursuit. The algorithm's simplicity enables a fast runtime: it iterates s times, and each iteration performs a selection through d elements, multiplies by Φ*, and solves a least squares problem.
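The greedy loop just described can be sketched in a few lines of Python/NumPy (an illustrative implementation, not the project's MATLAB code; the function name omp and the test dimensions are choices made here):

```python
import numpy as np

def omp(Phi, y, K):
    """Greedy OMP: pick the column most correlated with the residual,
    re-solve least squares on the enlarged support, repeat K times."""
    M, N = Phi.shape
    support = []
    residual = y.astype(float)
    for _ in range(K):
        j = int(np.argmax(np.abs(Phi.T @ residual)))   # best-matching column
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef          # orthogonal residual
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(1)
N, M, K = 256, 80, 5
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_hat = omp(Phi, Phi @ x, K)
print(np.linalg.norm(x - x_hat))   # near-zero for exact recovery
```

Re-solving the least squares problem on the whole support at every step is what makes the residual orthogonal to all previously chosen columns.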
CHAPTER-5
As shown in Figure 5.1, for each frame in a speech sequence, a small number of projections is collected and compared with the projections collected for the previous frame. Based on the comparison results, the correlation between the two frames is estimated and classified into different categories. The sampling strategy is then adjusted according to the correlation type, and a different number of samples is collected for the current frame. We represent the original speech signal of the current t-th frame as x(t), and its previous frame as x(t-1). The difference between x(t) and x(t-1) reflects the correlation between the two neighbouring frames, and can be used to classify the correlation. We use the collected measurements to estimate the correlation instead. The same projection matrix Φ is applied to all frames in the partial sampling stage, so we have y(t) - y(t-1) = Φ(x(t) - x(t-1)), where y(t) and y(t-1) are the projection vectors of x(t) and x(t-1) respectively. As each sample in y(t) - y(t-1) is a linear combination of x(t) - x(t-1), the difference between the two projection vectors also reflects the intensity changes between the two frames. Therefore, we can estimate the amount of intensity change using only a small number of projections. Let Φ_M0 be a matrix containing the first M0 rows of the Gaussian random matrix Φ. For the current frame t, we first use Φ_M0 to collect M0 measurements y_M0(t) = Φ_M0 x(t) in the partial sampling stage. Then, we compare them with the first M0 measurements in y(t-1) and calculate the difference y_d(t) = y_M0(t) - y_M0(t-1). In the frame analysis module, given y_d(t), we calculate its l2 norm normalized by M0 and compare it with two thresholds T1 and T2 (T1 < T2).
If ||y_d(t)||_2 / M0 <= T1, the current frame is almost the same as its previous frame. We consider that the two neighbouring frames may both be surd, and label the intra-frame correlation as surd vs. surd. If T1 < ||y_d(t)||_2 / M0 <= T2, the two neighbouring frames undergo small changes. In this situation, the two neighbouring frames are both sonant with high probability, and the intra-frame correlation is labelled as sonant vs. sonant. If ||y_d(t)||_2 / M0 > T2, the two frames are significantly different from each other, which is most likely due to a change of frame type, and we label the correlation as surd vs. sonant.
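The three-way decision rule can be sketched directly (the helper classify_frames is a hypothetical name introduced here; the thresholds default to the T1 = 0.08, T2 = 0.4 used later in the experiments):

```python
import numpy as np

def classify_frames(y_cur, y_prev, M0, T1=0.08, T2=0.4):
    """Label the correlation between two neighbouring frames from the
    difference of their first M0 projections (thresholds as in the text)."""
    yd = y_cur[:M0] - y_prev[:M0]
    score = np.linalg.norm(yd) / M0       # l2 norm normalised by M0
    if score <= T1:
        return "surd vs. surd"            # frames almost identical
    elif score <= T2:
        return "sonant vs. sonant"        # small changes between frames
    return "surd vs. sonant"              # frame type has changed

M0 = 32
y_prev = np.ones(64)
print(classify_frames(y_prev, y_prev, M0))           # identical frames
print(classify_frames(y_prev + 1.0, y_prev, M0))     # small change
print(classify_frames(y_prev + 100.0, y_prev, M0))   # large change
```

Only the first M0 projections are touched, which is the point: the label is decided before the full measurement budget for the frame is spent.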
5.5 Reconstruction
The original signal is reconstructed from a significantly small number of samples using different optimization techniques such as l1-norm minimization or convex optimization. In this project we have used orthogonal matching pursuit (OMP), which gives better results when compared to the other optimization techniques, and the signal is reconstructed frame by frame. Reconstruction algorithms of this kind compute the support of the sparse signal x iteratively. Once the support of the signal is computed correctly, the pseudo-inverse of the measurement matrix restricted to the corresponding columns can be used to reconstruct the actual signal x. The clear advantage of this approach is speed, but the approach also presents new challenges. 5.5.1 Orthogonal Matching Pursuit Orthogonal Matching Pursuit (OMP) was put forth by Mallat and his collaborators and analyzed by Gilbert and Tropp. OMP uses sub-Gaussian measurement matrices to reconstruct sparse signals. If Φ is such a measurement matrix, then Φ*Φ is, in a loose sense, close to the identity. Therefore one would expect the largest coordinate of the observation vector Φ*y, where y = Φx, to correspond to a non-zero entry of x. Thus one coordinate of the support of the signal x is estimated. Subtracting off that contribution from the observation vector y and repeating eventually yields the entire support of the signal x.
OMP is quite fast, both in theory and in practice, but its guarantees are not as strong as those of Basis Pursuit. The algorithm's simplicity enables a fast runtime. The algorithm iterates s times, and each iteration performs a selection through d elements, multiplies by Φ*, and solves a least squares problem. The selection can easily be done in O(d) time, and the multiplication by Φ* in the general case takes O(md). When Φ is an unstructured matrix, the cost of solving the least squares problem is O(s^2 d). However, maintaining a QR factorization of Φ_I and using the modified Gram-Schmidt algorithm reduces this time to O(|I|d) at each iteration. Using this method, the overall cost of OMP becomes O(smd). In the case where the measurement matrix is structured with a fast multiply, this can clearly be improved. 5.5.2 Stagewise Orthogonal Matching Pursuit An alternative greedy approach, Stagewise Orthogonal Matching Pursuit (StOMP), developed and analyzed by Donoho and his collaborators, uses ideas inspired by wireless communications. As in OMP, StOMP utilizes the observation vector y = Φ*u, where u = Φx is the measurement vector. However, instead of simply selecting the largest component of the vector y, it selects all of the coordinates whose values are above a specified threshold. It then solves a least-squares problem to update the residual. The algorithm iterates through only a fixed number of stages and then terminates, whereas OMP requires s iterations, where s is the sparsity level. The thresholding strategy is designed so that many terms enter at each stage and so that the algorithm halts after a fixed number of iterations. The formal noise level at each iteration is proportional to the Euclidean norm of the residual at that iteration. This method appears to provide slightly weaker results; it appears, however, that StOMP outperforms OMP and Basis Pursuit in some cases.
Although the structure of StOMP is similar to that of OMP, because StOMP selects many coordinates at each stage, the runtime is quite improved. Indeed, using iterative methods to solve the least-squares problem yields a runtime bound of CNsd + O(d), where N is the fixed number of iterations run by StOMP, and C is a constant that depends only on the accuracy level of the least-squares problem.
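A simplified StOMP sketch in Python/NumPy (the specific threshold rule t·||r||/sqrt(M), the stage count, and the early-exit tolerance are illustrative assumptions, not the exact formal noise level from the text):

```python
import numpy as np

def stomp(Phi, y, n_stages=10, t=2.0):
    """Stagewise OMP sketch: at each stage keep every coordinate whose
    matched-filter value exceeds a threshold proportional to the residual
    norm, then re-fit by least squares on the merged support."""
    M, N = Phi.shape
    support = np.zeros(N, dtype=bool)
    residual = y.astype(float)
    x_hat = np.zeros(N)
    for _ in range(n_stages):
        if np.linalg.norm(residual) <= 1e-10 * np.linalg.norm(y):
            break                          # fit is already (almost) exact
        corr = Phi.T @ residual            # matched filter
        thresh = t * np.linalg.norm(residual) / np.sqrt(M)
        new = np.abs(corr) > thresh
        if not new.any():
            break
        support |= new
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
        x_hat = np.zeros(N)
        x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(2)
N, M, K = 128, 64, 4
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_hat = stomp(Phi, Phi @ x)
print(np.linalg.norm(x - x_hat))
```

Because many coordinates enter per stage, only a handful of least squares solves are needed, which is where the runtime advantage over OMP comes from.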
5.5.3 Regularized Orthogonal Matching Pursuit As is now evident, the two approaches to compressed sensing each present disjoint advantages and challenges. While the optimization method provides robustness and uniform guarantees, it lacks the speed of the greedy approach. The greedy methods, on the other hand, had not been able to provide the strong guarantees of Basis Pursuit. This changed with the development of a new greedy algorithm, Regularized Orthogonal Matching Pursuit, that provided the strong guarantees of the optimization method. This work bridged the gap between the two approaches and provided the first algorithm possessing the advantages of both. Regularized Orthogonal Matching Pursuit (ROMP) is a greedy algorithm, but it will correctly recover any sparse signal using any measurement matrix that satisfies the Restricted Isometry Condition. Again, as in the case of OMP, we use the observation vector Φ*y as a good local approximation to the s-sparse signal x. Since the Restricted Isometry Condition guarantees that every s columns of Φ are close to an orthonormal system, we choose at each iteration not just one coordinate as in OMP, but up to s coordinates using the observation vector. It is then acceptable to choose some incorrect coordinates, so long as their number is limited. To ensure that we do not select too many incorrect coordinates at each iteration, we include a regularization step which guarantees that each coordinate selected contains an even share of the information about the signal. We remark here that knowledge of the sparsity level s is required in ROMP, as in OMP. Clearly, in the case where the signal is not exactly sparse and the signal and measurements are corrupted with noise, the algorithm as described above will never halt. Thus in the noisy case we simply change the halting criterion, allowing the algorithm to iterate at most s times, or until |I| >= s.
With this modification, ROMP approximately reconstructs arbitrary signals.
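The regularization step can be made concrete. A common formulation picks, among the s largest coordinates of the observation vector, the maximal-energy group whose magnitudes are within a factor of 2 of each other; the sketch below implements that rule (the function name and the factor-of-2 comparability rule are stated here as assumptions about the regularization, not taken verbatim from this text):

```python
import numpy as np

def regularized_selection(u, s):
    """ROMP-style regularization sketch: among the s largest entries of
    |u|, pick the group with comparable magnitudes (max <= 2 * min within
    the group) that carries the most energy."""
    idx = np.argsort(np.abs(u))[::-1][:s]      # s biggest coordinates
    mags = np.abs(u[idx])
    best_energy, best = -1.0, idx[:1]
    for i in range(s):
        j = i
        # grow the window while magnitudes stay within a factor of 2
        while j < s and mags[i] <= 2 * mags[j]:
            j += 1
        energy = np.sum(mags[i:j] ** 2)
        if energy > best_energy:
            best_energy, best = energy, idx[i:j]
    return best

u = np.array([10.0, 9.0, 0.5, 0.4, 5.0, 0.1])
print(sorted(int(k) for k in regularized_selection(u, 4)))
```

Here the comparable group {10, 9, 5} beats any group containing the stray 0.5, so up to s coordinates enter per iteration while wildly unequal "shares of information" are excluded.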
1. Apply the sparsifying transform to the input signal and obtain the coefficient vector x.
2. Choose the parameter M to obtain a new observation vector y by correlating the signal with the measurement matrix.
3. Create a new sensing matrix and solve the l1-norm equation for it.
4. Get the estimated coefficient vector s* and the recovered signal y.
5. Compare y with x to check whether the reconstruction is exact.
6. If not, go back to step 2 and increase M.
Conventionally, after acquisition of a scene, the Discrete Cosine Transform (DCT) is performed on the image using the values assigned to each pixel. After the DCT, many coefficients will be zero or will carry negligible energy; these coefficients are discarded before quantization and/or entropy coding. Hence, though each frame of the image is acquired fully, much of the acquired information is discarded after the DCT, placing an unnecessary burden on the acquisition process. This makes compressive sampling a good candidate for digital image and video
applications, where the Nyquist rate is so high that compressing the sheer volume of samples is a problem for transmission or storage.
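The transform-then-discard step described above can be sketched in Python (SciPy's dct stands in for the MATLAB dct used in Appendix A; the 5% threshold and the two-cosine test signal are arbitrary illustrative choices):

```python
import numpy as np
from scipy.fftpack import dct, idct

# A compressible test signal: a sum of two cosines
n = np.arange(512)
x = np.cos(2 * np.pi * 5 * n / 512) + 0.5 * np.cos(2 * np.pi * 20 * n / 512)

c = dct(x, norm="ortho")                  # DCT concentrates the energy
keep = np.abs(c) > 0.05 * np.max(np.abs(c))
c_sparse = np.where(keep, c, 0.0)         # discard negligible coefficients

x_rec = idct(c_sparse, norm="ortho")      # reconstruct from the kept terms
print(np.count_nonzero(c_sparse),
      np.linalg.norm(x - x_rec) / np.linalg.norm(x))
```

Only a handful of coefficients survive, yet the reconstruction error stays small — exactly the redundancy that conventional acquisition pays for and compressive sampling avoids.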
Applying compressive sampling to the whole image is ineffective. In order to exploit any sparsity within an image, we split the image into small non-overlapping blocks of equal size. Compressive sampling is then performed only on blocks determined to be sparse, i.e., we exploit local sparsity within an image. Note that, in real-time acquisition, it is not possible to test for the sparsity of a block before sampling. Reference frames are sampled fully, and the DCT is applied on each of the B blocks. We select Bs sparse blocks in the following manner. Let C be a small positive constant, and T an integer threshold that is representative of the average number of non-significant DCT coefficients over all blocks. If the number of DCT coefficients in the block whose absolute value is less than C is larger than T, the block is selected as a reference for compressive sampling. Consider a real-valued, finite-length, one-dimensional, discrete-time signal x, which we view as an N×1 column vector in R^N with elements x[n], n = 1, 2, ..., N. We treat an image or higher-dimensional data by vectorizing it into a long one-dimensional vector. Any signal in R^N can be represented in terms of a basis of N×1 vectors {ψ_i}, i = 1, ..., N. For simplicity, assume that the basis is orthonormal. Forming the N×N basis matrix Ψ = [ψ_1 | ψ_2 | ... | ψ_N] by stacking the vectors {ψ_i} as columns, we can express any signal x as x = Ψs, where s is the N×1 column vector of weighting coefficients s_i = <x, ψ_i> = ψ_i^T x, and where ^T denotes the (Hermitian) transpose operation. Clearly, x and s are equivalent representations of the same signal, with x in the time domain and s in the Ψ domain. We will focus on signals that have a sparse representation, where x is a linear combination of just K basis vectors, with K << N. That is, only K of the coefficients s_i in the above equation are nonzero and (N − K) are zero. Sparsity is motivated by the fact that many natural and manmade signals are compressible, in the sense that there exists a basis where the representation above has just a few large coefficients and many small coefficients. Compressible signals are well approximated by K-sparse representations; this is the basis of transform coding.
For example, natural images tend to be compressible in the discrete cosine transform (DCT) and wavelet bases, on which the JPEG and JPEG-2000 compression standards are based. Audio signals and many communication signals are compressible in a localized Fourier basis. The main idea of compressive sampling is to remove this sampling redundancy by needing only M samples of the signal, where K < M << N. Let y be an M-length measurement vector given by y = Φx, where Φ is an M×N measurement matrix. The above expression can be written in terms of s as y = ΦΨs = Θs. The signal x can be recovered losslessly from M ≈ K or slightly more measurements if the measurement matrix is properly designed, so that Θ = ΦΨ satisfies the so-called restricted isometry property. This will always be true if Φ and Ψ are incoherent, that is, the vectors of Φ cannot sparsely represent the basis vectors of Ψ and vice versa.
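The chain x = Ψs, y = Φx = Θs can be verified numerically. The sketch below (an illustration with hypothetical dimensions, using the orthonormal DCT as Ψ) builds the explicit basis matrix and checks that measuring x is the same as measuring through Θ = ΦΨ:

```python
import numpy as np
from scipy.fftpack import idct

rng = np.random.default_rng(4)
N, K, M = 256, 8, 64

# K-sparse coefficient vector s in the DCT basis
s = np.zeros(N)
s[rng.choice(N, K, replace=False)] = rng.standard_normal(K)

# x = Psi s: synthesise the time-domain signal from its sparse representation
x = idct(s, norm="ortho")

# Explicit orthonormal basis matrix Psi, and an M x N Gaussian Phi
Psi = idct(np.eye(N), axis=0, norm="ortho")
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# y = Phi x = Phi Psi s = Theta s: M measurements of an N-sample signal
y = Phi @ x
print(y.shape)
```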
CHAPTER-6
6.1 Imaging
Many images are sparse with respect to some basis. Because of this, many applications in imaging are able to take advantage of the tools provided by compressed sensing. The typical digital camera today records every pixel in an image before compressing that data and storing the compressed image. Due to the use of silicon, everyday digital cameras can operate in the megapixel range. A natural question is why we need to acquire this abundance of data, just to throw most of it away immediately. This notion sparked the emerging theory of compressive imaging. In this new framework, the idea is to directly acquire random linear measurements of an image without the burdensome step of capturing every pixel initially. Several issues of course arise from this. The first problem is how to reconstruct the image from its random linear measurements; the solution to this problem is provided by compressed sensing. The next issue lies in actually sampling the random linear measurements without first acquiring the entire image. A compressive sampling camera consists of a digital micromirror device (DMD), two lenses, a single photon detector and an analog-to-digital (A/D) converter. The first lens focuses the light onto the DMD. Each mirror on the DMD corresponds to a pixel in the image, and can be tilted toward or away from the second lens. Since this camera utilizes only one photon detector, its design is a stark contrast to the usual large photon detector array in most cameras. The single-pixel compressive sampling camera also operates at a much broader range of the light spectrum than traditional cameras that use silicon.
For example, because silicon cannot capture a wide range of the spectrum, a digital camera that captures infrared images is much more complex and costly. Compressed sensing is also used in medical imaging, in particular with magnetic resonance (MR) images, which sample Fourier coefficients of an image. MR images are implicitly sparse and can thus capitalize on the theories of compressed sensing. Some MR images, such as angiograms, are sparse in their actual pixel representation, whereas more complicated MR images are sparse with respect to some other basis, such as the wavelet Fourier basis. MR imaging in general is very time-costly, as the speed of data collection is limited by physical and physiological constraints. Thus it is extremely beneficial to reduce the number of measurements collected without sacrificing quality of the MR image. Compressed sensing again provides exactly this, and many compressed sensing algorithms have been specifically designed with MR images in mind. The common approach in digital image systems is to capture as many pixels as possible and later compress the captured image by digital means. Compression is desired to increase storage capacity and enhance the communication process. Compression techniques exploit the visual redundancy typical of human-intelligible images. After capturing the optical image and applying data compression, the image is represented by a smaller number of pixels than the original image. The decompressed image should satisfy some desired visual quality.
First, a feedback structure and signal recovery algorithm, orthogonal pruning pursuit (OPP), is proposed to exploit prior knowledge to reconstruct the signal in the noise-free situation. To handle noise, a noise-aware signal reconstruction algorithm based on Bayesian Compressed Sensing (BCS) is developed. Moreover, a novel Turbo Bayesian Compressed Sensing (TBCS) algorithm is developed for joint signal reconstruction by exploiting both spatial and temporal redundancy. The TBCS algorithm is then applied to a UWB positioning system to achieve mm-accuracy with low sampling-rate ADCs. Finally, hardware implementation of BCS signal reconstruction on FPGAs and GPUs is investigated, including implementation of parallel Cholesky decomposition, which is a key component of BCS. Simulation results on software and hardware have demonstrated that OPP and TBCS outperform previous approaches, and UWB positioning accuracy is improved. The accelerated computation helps enable real-time application of this work. Using compressive sensing techniques, the speech signal is precoded at the transmitter side before being sent to the receiver through a wireless channel. As a result, a small number of samples is transmitted, which increases the transmission data rate when compared to current communication systems. In the proposed communication system, the speech signal is first modelled in such a way that the input signal is sparse enough before applying compressive sensing. The sparse signal is then multiplied by the predefined measurement matrix.
The output of the compressive sensing module is then transmitted to the receiver. At the receiver, the signal is reconstructed from a significantly small number of measurements by using different optimization techniques such as l1-norm minimization or convex optimization.
CHAPTER-7
SIMULATION RESULTS
To compare the performance of the proposed adaptive compressed sensing with conventional non-adaptive CS, some experiments were conducted. An arbitrary speech signal was chosen, sampled at 16 kHz and quantized with 16 bits per sample. Adaptive CS and conventional CS sampling and reconstruction are performed frame by frame, with a frame length of N = 320 samples. The threshold values T1 and T2 are chosen as 0.08 and 0.4 respectively, values validated through a great number of experiments. The Average Frame Signal to Noise Ratio (AFSNR) is calculated and used to evaluate the reconstruction quality of the speech signal, using the formula shown below:
AFSNR = (1/K) Σ_{k=1}^{K} 10 log10( ||x_k||^2 / ||x_k − x̂_k||^2 )

where K is the total number of frames in the speech sequence, and x_k and x̂_k represent the original and reconstructed k-th frame respectively. For each compressed ratio (r = 0.2, r = 0.4 and r = 0.6), defined as r = M/N, different test results are obtained based on the proposed frame-based adaptive CS using the OMP reconstruction algorithm.
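The AFSNR formula can be sketched in Python/NumPy (the helper name and the constant-error test frames are hypothetical; per-frame SNR is the usual 10·log10 of signal power over error power):

```python
import numpy as np

def afsnr(frames, frames_rec):
    """Average Frame SNR in dB: mean over frames of
    10*log10(||x_k||^2 / ||x_k - x_hat_k||^2)."""
    snrs = [10 * np.log10(np.sum(xk ** 2) / np.sum((xk - xr) ** 2))
            for xk, xr in zip(frames, frames_rec)]
    return float(np.mean(snrs))

frame = np.ones(320)
frame_rec = frame + 0.1        # constant 0.1 error: per-frame SNR = 20 dB
print(afsnr([frame, frame], [frame_rec, frame_rec]))
```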
Figure 7.1: Original and reconstructed signals for N=320, M=64, r=0.2
The average frame signal to noise ratio for compression ratio r=0.2 is 12.030 dB with the adaptive projection matrix, compared with 4.970 dB for the non-adaptive projection matrix. Thus the AFSNR is increased by the adaptive projection matrix.
Figure 7.2: Original and reconstructed signals for N=320, M=128, r=0.4
The average frame signal to noise ratio for compression ratio r=0.4 is 15.1714 dB with the adaptive projection matrix, compared with 12.103 dB for the non-adaptive projection matrix. Thus the AFSNR is increased by the adaptive projection matrix. From the above figure, it is observed that for compressed ratio r=0.4 the quality of the reconstructed signal is higher than that obtained for compressed ratio r=0.2.
Figure 7.3: Original and reconstructed signals for N=320, r=0.6
The average frame signal to noise ratio for compression ratio r=0.6 is 25.1721 dB with the adaptive projection matrix, compared with 21.630 dB for the non-adaptive projection matrix. Thus the AFSNR is increased by the adaptive projection matrix. From the above figure, it is observed that for compressed ratio r=0.6 the quality of the reconstructed signal is higher than that obtained for compressed ratios r=0.2 and r=0.4.
Figure 7.4: Reconstructed speech quality using the proposed adaptive CS and non-adaptive compressed sensing.
Table 7.1 gives the Average Frame Signal to Noise Ratio (AFSNR) of the reconstructed speech based on the proposed frame-based adaptive CS and the conventional non-adaptive CS. Under each compressed ratio, we can see that speech reconstructed using the proposed adaptive scheme has a higher AFSNR than that using the non-adaptive scheme.
Table 7.1: AFSNR of the reconstructed speech using non-adaptive CS and the adaptive CS
Thus it is clearly observed that the proposed adaptive compressed sensing is more efficient than conventional non-adaptive compressed sensing.
CHAPTER-8
REFERENCES
[1] Donoho D L. Compressed sensing. IEEE Transactions on Information Theory, 2006, 52(4): 1289-1306.
[2] Donoho D, Tsaig Y. Extensions of compressed sensing. Signal Processing, 2006, 86(3): 533-548.
[3] Scott S. Chen, David L. Donoho, Michael A. Saunders. Atomic decomposition by basis pursuit. SIAM Review, 2001, 43(1): 129-159.
[4] J. Tropp, A. Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 2007, 53(12): 4655-4666.
[5] Zh. M. Wang, G. R. Arce, J. L. Paredes. Colored random projections for compressed sensing. ICASSP, 2007: 873-876.
[6] Zhaorui Liu, Vicky Zhao, A. Y. Elezzabi. Block-based adaptive compressed sensing for video. Proceedings of the 2010 IEEE 17th International Conference on Image Processing, 2010: 1649-1652.
[7] Jarvis Haupt, Robert Nowak, Rui Castro. Adaptive sensing for sparse signal recovery. ICASSP, 2009: 702-707.
[8] R. Gribonval, M. Nielsen. Sparse representations in unions of bases. IEEE Transactions on Information Theory, 2003, 49(12): 3320-3325.
[9] J. A. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 2004, 50(10): 2231-2242.
Appendix A: MATLAB code for sampling the speech signal in a mobile system using compressed sensing
clc;
clear all;

% Fs (samples per second) is the rate at which the speech signal is sampled
Fs = 2000;
x = wavread('test.wav');
figure(1)
stem(x)
title('Recorded input speech signal');
xlabel('Length of the input speech signal');
ylabel('Amplitude of the input speech signal');

% Discrete cosine transform of the recorded signal
a0 = dct(x);
figure(2)
stem(a0)
axis([0 2000 -1 1]);
title('Discrete cosine transform of the recorded signal');
xlabel('Length of the DCT spectrum');
ylabel('Amplitude of the DCT spectrum');

% Thresholding the spectrum to make it sparse
for i = 1:1:2000
    if a0(i,1) <= 0.04 && a0(i,1) >= -0.06
        a0(i,1) = 0;
    end
end
figure(3)
stem(a0)
axis([0 2000 -1 1]);
title('The threshold spectrum');
xlabel('Length of the threshold spectrum');
ylabel('Amplitude of the threshold spectrum');

% Sparsity of the spectrum (K) and length of the signal (N)
K = 800;
N = 2000;

% Random measurement matrix
disp('Creating measurement matrix...');
A = randn(K,N);
A = orth(A')';
figure(4)
imagesc(A)
colorbar;
colormap('lines');
title('Random measurement matrix');
disp('Done.');

% Observation vector
y = A*a0;
figure(5)
plot(y)
title('Observation vector');

% Initial guess = minimum energy solution
x0 = A'*y;

% Solve the linear program (l1-minimization)
tic
xp = l1eq_pd(x0, A, [], y, 1e-2);
toc
figure(6)
plot(xp)
axis([0 2000 -0.6 0.6]);
title('Reconstructed spectrum using l1-minimization');

% Inverse discrete cosine transform (IDCT) of the reconstructed spectrum
Xrec = idct(xp);
wavplay(Xrec, Fs)
figure(7)
stem(Xrec)
title('Reconstructed signal at the receiver');
xlabel('Length of the reconstructed signal using IDCT');
ylabel('Amplitude of the reconstructed signal using IDCT');

% Maximum absolute error between the reconstructed and actual signals
err = max(abs(Xrec - x));
figure(8)
stem(err);
title('Absolute error of reconstructed spectrum and threshold spectrum');
xlabel('Length of the maximum absolute error');
ylabel('Maximum absolute error');
Appendix B: MATLAB code for compressing the test signal using wavelet compression
% Load original one-dimensional signal
Fs = 2000;
s = wavread('test.wav')';
figure(1)
stem(s)
title('Input speech signal');
xlabel('Length of the input speech signal');
ylabel('Amplitude of the input speech signal');
l_s = length(s);

% Wavelet transform of the input signal
[cA1,cD1] = dwt(s,'db1');

% Extract the level-1 approximation and detail signals
A1 = idwt(cA1,[],'db1',l_s);
D1 = idwt([],cD1,'db1',l_s);
figure(2)
subplot(1,2,1); plot(A1); title('Approximation A1')
subplot(1,2,2); plot(D1); title('Detail D1')

% Inverse wavelet transform of the approximation and detail coefficients
A0 = idwt(cA1,cD1,'db1',l_s);
wavplay(A0,Fs)
figure(3)
stem(A0)
title('Reconstructed speech signal');
xlabel('Length of the reconstructed speech signal');
ylabel('Amplitude of the reconstructed speech signal');

% Maximum absolute error between original and reconstructed signals
err = max(abs(s - A0))