Contents
Introduction to speech compression and its need
Polynomial approximation
Frame parameters
Interpolation
Encoding
Compression of spectral component, gain, pitch
Speech Signal
Human speech is an acoustic signal. It is converted to an electrical signal by a transducer.
With the advent of digital computers, it became practical to use them for processing speech signals. The analog signal is sampled at a fixed frequency, and each sample is then quantized to one of a finite set of discrete levels.
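The sampling and quantization steps can be sketched as follows. This is a minimal illustration, not a description of any particular codec: the function names, the 200 Hz test tone, the 8 kHz sampling rate, and the 8-bit resolution are all assumptions chosen for the example, and the signal is assumed to lie in [-1, 1].

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz, bits):
    """Sample signal(t) on [0, duration_s) and quantize to 2**bits uniform levels.

    Assumes the signal amplitude lies in [-1.0, 1.0]; values outside are clipped.
    Returns a list of integer codes in range(2**bits).
    """
    levels = 2 ** bits
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for k in range(n_samples):
        t = k / sample_rate_hz                      # sampling instant
        x = max(-1.0, min(1.0, signal(t)))          # clip to the assumed range
        # Map [-1, 1] onto the integer codes 0 .. levels-1.
        code = min(levels - 1, int((x + 1.0) / 2.0 * levels))
        codes.append(code)
    return codes

def dequantize(codes, bits):
    """Map each code back to the midpoint of its quantization interval."""
    step = 2.0 / (2 ** bits)
    return [-1.0 + (c + 0.5) * step for c in codes]

# Hypothetical test signal: a 200 Hz tone, 10 ms long, 8 kHz sampling, 8 bits.
def tone(t):
    return math.sin(2.0 * math.pi * 200.0 * t)

codes = sample_and_quantize(tone, duration_s=0.01, sample_rate_hz=8000, bits=8)
reconstructed = dequantize(codes, bits=8)
```

With a uniform quantizer of this kind, the reconstruction error of each sample is bounded by half a quantization step, so adding one bit halves the worst-case error.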
What is COMPRESSION?
Compression is a process of converting an input data stream into another data stream that has a smaller size. Compression is possible only because input data has some amount of redundancy associated with it. The main objective of compression systems is to eliminate this redundancy.
Why Compression?
Multimedia files in general need plenty of disk space for storage, and sound files are no exception. Hence compression of these files has become a necessity. When compression is used to reduce storage requirements, overall program execution time may also be reduced, because less stored data means fewer disk accesses.
Applications of Compression
1. The use of compression in recording applications is extremely powerful. The playing time of the medium is extended in proportion to the compression factor.
2. In the case of tapes, the access time is improved because the length of the tape needed for a given recording is reduced and so it can be rewound more quickly.
3. In digital audio broadcasting and in digital television transmission, compression is used to reduce the bandwidth needed.
4. The time required for a web page to be displayed, and the download time for files, is greatly reduced by compression.
Polynomial Approximation-Introduction
Methods for speech compression aim at reducing the transmission bit rate while preserving the quality and intelligibility of speech. A method for compressing speech is based on polynomial approximations of the trajectories in time of various speech features (i.e., spectrum, gain, and pitch).
Continued..
One method of compression, called segmental coding, uses polynomial functions to approximate trajectories of speech features present in successive time frames. Useful compression results if the number of bits per second needed to transmit the polynomial coefficients is smaller than the number of bits per second needed to transmit the original feature frames for the segment.
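The bit-rate condition above can be checked with simple arithmetic. The sketch below is only an illustration for a single feature element: the function names and all numbers (10 frames of 20 ms each, 8 bits per frame value, 16 bits per coefficient, order 3) are hypothetical and do not come from the source.

```python
def frame_bit_rate(n_frames, bits_per_value, segment_duration_s):
    """Bits per second needed to send one feature element frame by frame."""
    return n_frames * bits_per_value / segment_duration_s

def polynomial_bit_rate(order, bits_per_coeff, segment_duration_s):
    """Bits per second needed to send the order+1 polynomial coefficients."""
    return (order + 1) * bits_per_coeff / segment_duration_s

def is_useful_compression(n_frames, order, bits_per_value, bits_per_coeff,
                          segment_duration_s):
    """True when the coefficients cost fewer bits per second than the frames."""
    return (polynomial_bit_rate(order, bits_per_coeff, segment_duration_s)
            < frame_bit_rate(n_frames, bits_per_value, segment_duration_s))

# Hypothetical segment: 10 frames of 20 ms each, so 0.2 s in total.
segment_s = 10 * 0.020
original_bps = frame_bit_rate(10, bits_per_value=8, segment_duration_s=segment_s)
coded_bps = polynomial_bit_rate(3, bits_per_coeff=16, segment_duration_s=segment_s)
useful = is_useful_compression(10, 3, 8, 16, segment_s)
```

Under these assumed numbers the frames cost about 400 bits per second and the four coefficients about 320, so the polynomial representation is cheaper even though each coefficient is given more bits than a frame value.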
The frame parameters are transmitted to the receiver (decoder), where the signal is synthesized to resemble the original signal. Linear interpolation is usually employed in the decoder between successive frames to smooth the transitions of the parameters across frames. Successive frames are analyzed independently of each other, so they usually contain some redundant information. Additional compression can be obtained by exploiting this redundancy across successive frames.
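Linear interpolation between two successive parameter frames can be sketched as below. The function name, the number of interpolation steps, and the two example frames (a gain value and a pitch value in Hz) are assumptions made for illustration; the source does not specify how many intermediate points a decoder uses.

```python
def interpolate_frames(prev_frame, next_frame, steps):
    """Return `steps` parameter vectors moving linearly from prev_frame toward next_frame.

    The first vector equals prev_frame. next_frame itself is not included, so
    consecutive frame pairs can be chained without duplicating a frame.
    """
    if len(prev_frame) != len(next_frame):
        raise ValueError("frames must have the same number of parameters")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    out = []
    for k in range(steps):
        w = k / steps  # interpolation weight, 0 at prev_frame
        out.append([(1.0 - w) * a + w * b for a, b in zip(prev_frame, next_frame)])
    return out

# Hypothetical frames: [gain, pitch in Hz].
smoothed = interpolate_frames([0.0, 100.0], [1.0, 120.0], steps=4)
# smoothed == [[0.0, 100.0], [0.25, 105.0], [0.5, 110.0], [0.75, 115.0]]
```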
Continued..
There are two main advantages to using polynomial functions for approximation. First, they can approximate various shapes of feature trajectories with arbitrary accuracy, depending on the polynomial order. Second, they are described by a relatively small number of parameters, which is necessary to achieve significant compression.
Approximation
One of the most popular approaches to function approximation is the least squares method. For any arbitrary function $f(x)$, continuous on a closed interval $[a, b]$, there exists an algebraic polynomial $p(x)$, of order $d$, that best approximates the function on that interval in the $L^2$ norm.
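Written out, the least-squares criterion picks the polynomial that minimizes the integrated squared error over the interval. In the expression below, $a$ and $b$ denote the interval endpoints and $\mathcal{P}_d$ the set of polynomials of order at most $d$; these symbol names are assumptions introduced here for illustration.

```latex
p^{*} = \arg\min_{p \in \mathcal{P}_d} \int_{a}^{b} \bigl( f(x) - p(x) \bigr)^{2} \, dx
```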
It is assumed that there is a bidirectional transform between the original signal and its vector space representation. Such bidirectional transformation is also required in order to reconstruct the original signal from the vector-space representation.
Interpolation
Interpolation is done in the least-squares sense by a polynomial function defined as follows:
$F_{i,P}(n) = a_{i,0} + a_{i,1} n + a_{i,2} n^2 + \cdots + a_{i,P} n^P$
where $a_{i,0}, \ldots, a_{i,P}$ are the polynomial coefficients, and $P$ is the polynomial order for feature element $i$.
The maximum order of the polynomial is limited to $P = N - 1$ because there are only $N$ data points available to estimate the coefficients in the least-squares sense. The lower the polynomial order $P$ in the range $[0, \ldots, N-1]$, the higher the approximation error for an arbitrary trajectory.
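The effect of the polynomial order on the approximation error can be seen in a small experiment. The sketch below fits one feature trajectory of N = 5 points for every order from 0 to N-1. It is an illustration under stated assumptions: the trajectory values are made up, the frame index n is taken as 0..N-1, and the fit solves the normal equations with plain Gaussian elimination, which is adequate only for small N. A production coder would more likely use a numerically robust library routine.

```python
import math

def fit_polynomial(values, order):
    """Least-squares fit of values[n], n = 0..N-1, by a polynomial of the given order.

    Returns the coefficients [a_0, ..., a_order]. Solves the normal equations
    by Gaussian elimination with partial pivoting.
    """
    n_points = len(values)
    if not 0 <= order <= n_points - 1:
        raise ValueError("order must lie in [0, N-1]")
    m = order + 1
    # Normal equations: sum_k (sum_n n^(j+k)) a_k = sum_n n^j values[n]
    A = [[float(sum(n ** (j + k) for n in range(n_points))) for k in range(m)]
         for j in range(m)]
    b = [float(sum((n ** j) * values[n] for n in range(n_points))) for j in range(m)]
    for col in range(m):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(A[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - s) / A[r][r]
    return coeffs

def evaluate(coeffs, n):
    """Evaluate a_0 + a_1 n + ... + a_P n^P."""
    return sum(a * n ** k for k, a in enumerate(coeffs))

def rms_error(values, coeffs):
    """Root-mean-square difference between the trajectory and its approximation."""
    sq = [(v - evaluate(coeffs, n)) ** 2 for n, v in enumerate(values)]
    return math.sqrt(sum(sq) / len(values))

# Hypothetical trajectory of one feature element over N = 5 frames.
trajectory = [1.0, 1.8, 2.1, 1.9, 1.2]
errors = [rms_error(trajectory, fit_polynomial(trajectory, P))
          for P in range(len(trajectory))]
```

The list `errors` holds one RMS error per order. The error shrinks as the order grows and is zero up to rounding at P = N-1, where the polynomial passes through all N points and no compression is obtained.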
Compression
Thus, the condition for achieving compression is $P + 1 < N$, which presupposes some approximation error. A feature compression factor is defined as follows:
Conclusion
Polynomial approximation has proved to be a useful and efficient method for compression of the speech parameters. Such a method can be applied to both coding and storing of speech. The spectral parameters, especially those with low dynamics such as line spectral frequency (LSF) parameters, are particularly suitable for supplementary compression by polynomial approximation. In addition, the gain and pitch parameters can also be compressed by polynomial approximation methods.