Audio compression
By K.Murugan,DDE
STI(T),Delhi.
Generated by Foxit PDF Creator © Foxit Software
http://www.foxitsoftware.com For evaluation only.
Agenda
1) What is compression?
Compression
- Content compression can be as simple as removing all extra space characters,
inserting a single repeat character to indicate a string of repeated characters, and
substituting smaller bit strings for frequently occurring characters.
- Audio compression is a form of data compression designed to reduce the size of audio
files.
Digital audio
— Compact Disc (CD) quality stereo audio has the following
specification:
— • Sampling frequency: 44.1 kHz
— • 16 bits/sample for each of the two stereo channels
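The CD specification above fixes the uncompressed bit rate. A minimal sketch of the arithmetic follows; the function name `pcm_bit_rate` is an illustrative assumption, not part of any standard.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# CD quality: 44.1 kHz, 16 bits/sample, two stereo channels
cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)  # 1411200 bit/s, i.e. about 1.41 Mbit/s
```

This is the figure that later slides round to 1.4 Mbit/s.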
Digital audio
— Although high-bandwidth channels are available,
compression is still needed for low bit rate
applications and for cost-effective storage and
transmission.
— Spectral efficiency.
— Excellent coding quality can be achieved with bit rates
of 0.5 to 1 bit/sample for speech and wideband speech
and 1 to 2 bits/sample for audio.
Compression standards
— a) MPEG-1 audio: total bit rate of 1.5 Mbit/s for
CD-quality multimedia storage.
Compression standards
— b) MPEG-2 audio: HDTV applications.
Compression standards
— c) MPEG-4 audio: the MPEG-4 standard for audiovisual
coding addresses applications ranging from
— mobile access and
— low-complexity multimedia terminals to high-complexity
multichannel sound systems.
— The major feature of MPEG audio is that the coders exploit the
perceptual limitations of the human auditory system.
Compression Goals
— Reduced Bandwidth.
— Scalable.
— Robust.
— Standards.
Compression
— Low bit rate audio coding is an “enabling technology”
for applications such as
— digital radio,
— Internet streaming,
— mobile multimedia applications.
— In broadcasting, it allows audio to be transmitted
from remote locations to a central broadcast
facility over the
— Public Switched Telephone Network,
— ISDN.
Audio compression
Psychoacoustics
— Relationship between what comes to your ear and what you hear.
— Auditory Masking.
Audio compression
— Simultaneous masking is a
frequency-domain phenomenon
in which a low-level signal can be
made inaudible by a
simultaneously occurring stronger
signal (as long as masker and
maskee are close enough to each
other in frequency).
Audio compression
— In addition to simultaneous masking, the time-domain phenomenon
of temporal masking plays an important role in human auditory
perception.
— It may occur when two sounds appear within a small interval of time.
— Depending on the signal levels, the stronger sound may mask the
weaker one, even if the maskee precedes the masker.
Audio compression
— When compressing speech and music signals, it is not crucial to retain
the input signal exactly. It is sufficient that the output signal appears
to sound identical to a human listener.
Audio compression
— MP3 has become very popular for compressing CD-quality music
with almost no audible degradation, from 1.4 Mbit/s down to 128
kbit/s. This means that an ISDN connection can be used for real-time
transmission and that a full-length song can be stored in 3-4 Mbytes.
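The storage figure quoted above can be checked with a short calculation. This is a sketch only: the 4-minute song length is an assumption chosen for illustration, and `compressed_size_bytes` is a hypothetical helper name.

```python
def compressed_size_bytes(bit_rate_bps: int, duration_s: float) -> float:
    """Size of a constant-bit-rate stream, ignoring any container overhead."""
    return bit_rate_bps * duration_s / 8


# Assumed song length: 4 minutes (240 s) at 128 kbit/s
size = compressed_size_bytes(128_000, 240)
print(size / 1e6)  # 3.84 (Mbytes)

# Compression ratio relative to the 1,411,200 bit/s CD stream
ratio = 1_411_200 / 128_000
print(ratio)  # 11.025
```

A 4-minute song therefore lands inside the 3-4 Mbyte range, at a ratio of roughly 11:1.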
Perceptual coder
— A perceptual coder maintains the sampling frequency, like PCM, but
selectively decreases the word length.
— The coder analyzes the frequency and amplitude of the input signal and
compares them with a model of human auditory perception.
— Using the model, it removes the irrelevancy and redundancy of the audio
signal.
— 16 bits/sample is reduced to an average of 2.67 bits/sample.
— Two kinds of bit allocation are performed.
— Forward adaptive allocation: all allocation is performed in the encoder
and the details are included in the bit stream.
— The encoder is sophisticated.
— Backward adaptive allocation: bit allocation information is derived
from the coded audio itself, without explicit information
from the encoder.
— Errors are limited to the critical band.
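Forward adaptive allocation, as described above, can be illustrated with a toy greedy loop. This is a sketch under stated assumptions, not the allocation procedure of any particular standard: the signal-to-mask ratios are made-up inputs, each extra bit is assumed to lower quantization noise by about 6.02 dB, and `allocate_bits` is a hypothetical name.

```python
def allocate_bits(smr_db, bit_budget, max_bits=15):
    """Greedily give bits to the band whose noise is furthest above its mask.

    smr_db     -- assumed signal-to-mask ratio of each band, in dB
    bit_budget -- total number of bits (per sample, over all bands) to hand out
    """
    bits = [0] * len(smr_db)
    for _ in range(bit_budget):
        # Noise-to-mask ratio: positive means the noise is still audible.
        nmr = [smr - 6.02 * b for smr, b in zip(smr_db, bits)]
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # all quantization noise is already below the mask
        bits[worst] += 1
    return bits


# Three bands with assumed SMRs of 20 dB, 5 dB and -3 dB
print(allocate_bits([20.0, 5.0, -3.0], bit_budget=10))  # [4, 1, 0]
```

The band that is already masked (negative SMR) receives no bits at all, which is how the average word length can fall far below 16 bits/sample. In a forward adaptive coder the resulting list would be written to the bit stream for the decoder to read.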
Audio compression
[Figure: perceptual encoder block diagram. Blocks: PCM audio in, band splitting, perceptual model, quantization and scale factors, framing, data-reduced bit stream out.]
Perceptual compression
— Spectral analysis.
— Quantize.
— Code.
Perceptual coder
— Perceptual coders maintain the sampling frequency but selectively
decrease the word length.
Spectral analysis/Synthesis
Subband coding
— Blocks of consecutive time domain samples of broadband signals are
collected over a short period and applied to the filter banks.
Audio compression
— At low and medium bit rates, SBR (spectral band replication) can improve
efficiency by about 30% over conventional perceptual coders.
— SBR with MP3 (mp3PRO) can reduce the bit rate of a stereo MP3 bit
stream from 96 kbps to 64 kbps.
SBR
Audio compression
— 1. Introduction.
— The mode can be one of single channel, dual channel, stereo, or
joint stereo.
MPEG1 Layer 1
— Input can be analog audio or PCM digital audio with a 32, 44.1, or 48 kHz sampling
frequency.
— In all, 12 samples from each of the 32 sub-bands make 384 samples in a frame, which lasts 8 ms at 48 kHz sampling.
— The coder analyzes the energy in each sub-band for audible information.
— Perceptual coding calculates the average power level in each sub-band over 8 ms.
— Masking levels in each sub-band and adjacent sub-bands are estimated.
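The frame duration follows directly from the frame length and the sampling frequency. A minimal sketch, with `frame_duration_ms` as an illustrative helper name:

```python
def frame_duration_ms(samples_per_frame: int, sample_rate_hz: int) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


# Layer 1 frames hold 384 samples
for fs in (32_000, 44_100, 48_000):
    print(fs, round(frame_duration_ms(384, fs), 2))
# 32000 12.0
# 44100 8.71
# 48000 8.0
```

Only the 48 kHz case gives exactly 8 ms; at the other two rates the frame lasts slightly longer.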
MPEG1 Layer 1
— Minimum threshold levels are applied.
— Peak power levels in each sub-band are calculated and compared with
masking levels.
— The minimum masking threshold is calculated.
— The signal-to-mask ratio (SMR) is calculated; it determines the quantization word length, which ranges from 2 to 15 bits.
— For every sub-band, the absolute peak value of the 12 samples is compared
to the scale factor table.
— S/N = 6.02n + 1.76 dB, where n is the number of bits per sample.
— 15 bits give 92 dB S/N.
— 6-bit scale factor in 2 dB steps.
— Quantized values are scaled to give -118 dB to +6 dB in 2 dB steps.
— Masking threshold and scale factors are calculated only once for every
group of 12 samples, forming 12 x 32 = 384 samples.
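The signal-to-noise rule and the scale factor range above can be verified numerically. A small sketch; the names `quantizer_snr_db` and `scale_levels_db` are illustrative assumptions.

```python
def quantizer_snr_db(bits: int) -> float:
    """Approximate S/N of an n-bit uniform quantizer: 6.02 n + 1.76 dB."""
    return 6.02 * bits + 1.76


print(round(quantizer_snr_db(15), 2))  # 92.06
print(round(quantizer_snr_db(2), 2))   # 13.8

# Scale factor levels from +6 dB down to -118 dB in 2 dB steps
scale_levels_db = list(range(6, -120, -2))
print(len(scale_levels_db))  # 63, which fits in a 6-bit index (64 codes)
```

The 63 levels explain why six bits are enough for the scale factor.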
MPEG1 Layer 1
— Frames:
— Subband samples.
— Synchronization information.
— Bit rate is fixed to 384 kbps.
— For 44.1 kHz with a fixed 32 sub-bands, the width of each sub-band becomes 689.06 Hz.
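The 689.06 Hz sub-band width comes from dividing the audio bandwidth (half the sampling frequency) equally among the 32 sub-bands. A minimal sketch, with `subband_width_hz` as an illustrative name:

```python
def subband_width_hz(sample_rate_hz: float, num_subbands: int = 32) -> float:
    """Width of each equal-width sub-band across the 0..fs/2 range."""
    return (sample_rate_hz / 2) / num_subbands


print(subband_width_hz(44_100))  # 689.0625
print(subband_width_hz(48_000))  # 750.0
print(subband_width_hz(32_000))  # 500.0
```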
MPEG1 Layer 1
— The header describes the sampling rate and any use of pre-emphasis.
— A block of 32 four-bit allocation codes specifies the word length used in each
sub-band.
— 32 six-bit scale factors specify the gain given to each band during
companding.
MPEG1 Layer 1
— Decoder: Layer 1 decoding proceeds frame by frame.
MPEG1 Layer 1
— Decoder:
— No psychoacoustic processing.
— No bit allocation.
— Cheaper.
MPEG1 Layer 2
— More sophisticated in
design.
— 32 – 192 kbps.
— DAB
MPEG1 Layer2
— Filter creates 32 equal width sub bands.
MPEG1 Layer2
— The 1152 sample block of Layer II is divided into three blocks of 384 samples so
that the same companding structure as Layer I can be used.
MPEG1 Layer2
— Quantization:
— For greater efficiency, three successive samples from each of the 32 sub-bands are
grouped to form a granule and quantized together.
MPEG1 Layer3
— Moderate fidelity
even at very low bit
rate.
MPEG1 Layer3
— Layer III is the most complex layer.
— The original ASPEC coder used a direct MDCT on the input samples.
— In Layer III this was modified to use a hybrid transform incorporating the
existing polyphase 32 band QMF of Layers I and II and retaining the block size
of 1152 samples.
— In Layer III, the 32 sub-bands from the QMF are further processed by a
critically sampled MDCT.
MPEG1 Layer3
— The input to the encoder is normally PCM coded data that is split into
frames of 1152 samples.
— The frames are further divided into two granules of 576 samples each.
— The frames are sent to both the Fast Fourier Transform (FFT) block
and the analysis filter bank.
— The FFT block transforms granules of 576 samples to the frequency
domain using a Fourier transform.
— The frequency information from the FFT block is used together with a
psychoacoustic model to determine the masking thresholds for all
frequencies.
— The masking thresholds are applied by the quantizer to determine
how many bits are needed to encode each sample.
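The framing step described above (frames of 1152 samples, each split into two granules of 576) can be sketched as follows. This is an illustration only: `split_frames` is a hypothetical helper, and dropping a trailing partial frame is an assumption made to keep the example short.

```python
def split_frames(samples, frame_len=1152, granules_per_frame=2):
    """Split PCM samples into frames, each a list of equal-length granules.

    A trailing partial frame is dropped (assumption for this sketch).
    """
    granule_len = frame_len // granules_per_frame
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        granules = [frame[g * granule_len:(g + 1) * granule_len]
                    for g in range(granules_per_frame)]
        frames.append(granules)
    return frames


pcm = list(range(2500))            # stand-in for 2500 PCM samples
frames = split_frames(pcm)
print(len(frames))                 # 2 complete frames
print(len(frames[0][0]))           # 576 samples per granule
print(frames[1][1][0])             # 1728: first sample of frame 2, granule 2
```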
MPEG1 Layer 3
Analysis Filterbank
MPEG1 Layer3
MDCT with Dynamic Windowing
— The subband samples are transformed to the frequency domain through a
modified discrete cosine transform (MDCT). The MDCT is performed on blocks
that are windowed and overlapped 50%.
— The MDCT normally yields 18 frequency lines for each of the 32 sub-bands
(long blocks, 24 ms) to achieve good frequency resolution.
— 18 samples at a time (long blocks).
— It can also yield 6 frequency lines for each of the 32 sub-bands (short
blocks) to achieve better time resolution and to minimize pre-echoes.
— 6 samples at a time (short blocks).
— There are special window types for the transition between long and short blocks.
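The block sizes above can be made concrete with a direct-form MDCT. This is a slow, unoptimized sketch for illustration: with 50% overlap, a long block takes 36 windowed sub-band samples and yields 18 coefficients, and a short block takes 12 and yields 6. The sine window and the function names are assumptions of this example.

```python
import math


def sine_window(length):
    """Sine window (assumed here for both long and short blocks)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Direct-form MDCT: 2N time samples in, N coefficients out."""
    two_n = len(block)
    n_half = two_n // 2
    return [
        sum(block[n] * math.cos(math.pi / n_half
                                * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]


def windowed_mdct(samples):
    """Apply the window, then transform."""
    window = sine_window(len(samples))
    return mdct([w * s for w, s in zip(window, samples)])


long_coeffs = windowed_mdct([1.0] * 36)   # long block: 36 in, 18 out
short_coeffs = windowed_mdct([1.0] * 12)  # short block: 12 in, 6 out
print(len(long_coeffs), len(short_coeffs))  # 18 6
print(32 * len(long_coeffs))                # 576 spectral lines per granule
```

Because each block overlaps its neighbour by half, the transform stays critically sampled: 18 new input samples per sub-band produce 18 coefficients.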
MPEG1 Layer 3
— Scaling and Quantization
— The masking thresholds are used to iteratively determine how many bits are
needed in each critical band to code the samples so that the quantization
noise is not audible.
— The encoder usually also has to meet a fixed bit rate requirement.
— Non-uniform quantizing is used, in which the quantizing step size becomes
larger as the magnitude of the coefficient increases.
— The quantized samples are Huffman coded and stored in the bit stream along
with the scale factors and side information.
— Huffman coding is a technique in which the most common code values are allocated the
shortest word length.
— The frame consists of four parts: header, side information, main data, and
ancillary data.
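The Huffman principle mentioned above, short codes for common values, can be shown with a generic code builder. This is a textbook sketch, not the MP3 procedure: as the side information slide notes, MP3 selects among predefined tables rather than building a tree per file. The function name and the sample frequencies are assumptions.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a dict of symbol frequencies."""
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: 1 for sym in freqs}
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, b = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


# Assumed histogram of quantized values: small values dominate
lengths = huffman_code_lengths({0: 50, 1: 25, 2: 15, 3: 10})
print(sorted(lengths.items()))  # [(0, 1), (1, 2), (2, 3), (3, 3)]
```

With this histogram the average code length is 1.75 bits per value, compared with 2 bits for a fixed-length code.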
MPEG1 Layer 3
— The main data (coded scale factor value and Huffman coded data) are
not necessarily located adjacent to the side information.
— All the main data for one frame is stored in that frame and previous frames (the bit reservoir).
The maximum size of the bit reservoir is 511 bytes.
— The header is always 4 bytes long and contains information about the
layer, bit rate, sampling frequency and stereo mode. It also contains a
12-bit sync word that is used to find the start of a frame in a bit stream.
Side Information:
— Huffman table selection, scale factors, requantization parameters and
window selection.
— This section is 17 bytes long in single channel mode and 32 bytes in dual
channel mode
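The header facts above (a 12-bit sync word at the start of a 4-byte header) suggest how a decoder can locate a frame. The sketch below searches for twelve consecutive one bits and computes the Layer III frame length from 1152 samples per frame, that is 144 bytes per unit of bit rate over sampling rate. The function names are illustrative, and a real decoder would also validate the remaining header fields, since the sync pattern can occur by chance inside the data.

```python
def find_sync(data: bytes) -> int:
    """Index of the first 12-bit sync word (all ones), or -1 if none."""
    for i in range(len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xF0) == 0xF0:
            return i
    return -1


def layer3_frame_bytes(bit_rate_bps: int, sample_rate_hz: int, padding: int = 0) -> int:
    """Frame length in bytes: 1152 samples / 8 bits = 144, plus optional padding byte."""
    return 144 * bit_rate_bps // sample_rate_hz + padding


stream = bytes([0x00, 0x12, 0xFF, 0xFB, 0x90, 0x00])  # made-up example bytes
print(find_sync(stream))                    # 2
print(layer3_frame_bytes(128_000, 44_100))  # 417
```

At 128 kbps and 44.1 kHz, a 417-byte frame leaves 417 - 4 - 32 = 381 bytes for main data in a two-channel frame after the header and side information.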
Mp3 Frame
MPEG1 Layer 3
— Decoder:
— Huffman variable-length coded samples.
— The 576 spectral lines of each granule are partitioned into five regions.
— The Huffman coding scheme assumes that large values occur at the low
spectral frequencies and mainly low values and zeroes occur at the high
spectral frequencies.
— The sample requantization block uses the scale factors to convert the
Huffman decoded values back to their spectral values.
MPEG1 Layer 3
— The alias reduction is required to negate the aliasing effects of the
polyphase filterbank in the encoder. It is not applied to granules
that use short blocks.
MPEG1 Layer 3
— 128 kbps is the most common, because it typically offers
adequate audio quality in a relatively small space.
MPEG1 Layer 3
— In the early years of MP3, all files were encoded at a constant bitrate or CBR.
— It will take too much time to force mp3 out of the market.
AAC
— The input signal passes to the filter bank and the perceptual model in parallel.
The filter bank consists of a 50 per cent overlapped critically sampled MDCT
which can be switched between block lengths of 2048 and 256 samples.
— At 48 kHz the filter allows resolutions of 23 Hz and 21 ms or 187 Hz and 2.6 ms.
— Following the filter bank is the intra-block predictive coding module. When
enabled this module finds redundancy between the coefficients within one
transform block.
— In the time domain, predictive coding works well on stationary signals but fails
on transients. The dual of this characteristic is that in the frequency domain,
predictive coding works well on transients but fails on stationary signals
— The prediction method is a conventional forward predictor structure in which
the result of filtering a number of earlier coefficients is used to predict the
current one.
— The prediction is subtracted from the actual value to produce a prediction error
or residual which is transmitted.
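The frequency and time resolutions quoted above for the switched MDCT can be reproduced from the block length. A sketch, assuming that a block of 2N samples yields N coefficients spread over 0 to fs/2 and that the time resolution is the N-sample hop; `mdct_resolution` is an illustrative name.

```python
def mdct_resolution(block_length: int, sample_rate_hz: int):
    """Return (frequency resolution in Hz, time resolution in ms)."""
    coeffs = block_length // 2                 # 50% overlap: N outputs per 2N inputs
    freq_res_hz = (sample_rate_hz / 2) / coeffs
    time_res_ms = 1000.0 * coeffs / sample_rate_hz
    return freq_res_hz, time_res_ms


print(mdct_resolution(2048, 48_000))  # about 23.4 Hz and 21.3 ms
print(mdct_resolution(256, 48_000))   # 187.5 Hz and about 2.67 ms
```

The long block trades time resolution for frequency resolution, and the short block does the opposite.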
MPEG2 AAC
— AAC supports up to 48 audio channels.
— The use of temporal noise shaping (TNS) also allows the coder to use longer
blocks more of the time.
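The forward predictor described for AAC, in which earlier coefficients within a block predict the current one and only the residual is sent, can be sketched in a lossless toy form. This is not the AAC tool itself: the filter taps are assumed values, no quantization is applied, and all function names are hypothetical.

```python
def predict(history, taps):
    """Weighted sum of earlier coefficients, most recent first."""
    return sum(t * h for t, h in zip(taps, reversed(history)))


def encode_residual(coeffs, taps):
    """Replace each coefficient by its prediction error."""
    order = len(taps)
    return [c - predict(coeffs[max(0, i - order):i], taps)
            for i, c in enumerate(coeffs)]


def decode_residual(residual, taps):
    """Rebuild the coefficients by adding the prediction back."""
    order = len(taps)
    out = []
    for i, r in enumerate(residual):
        out.append(r + predict(out[max(0, i - order):i], taps))
    return out


coeffs = [1, 2, 3, 4, 6]           # made-up coefficients from one transform block
taps = [2, -1]                     # assumed taps: linear extrapolation
residual = encode_residual(coeffs, taps)
print(residual)                    # [1, 0, 0, 0, 1]
print(decode_residual(residual, taps) == coeffs)  # True
```

Where the coefficients follow a predictable trend the residual is mostly zeros, which is the redundancy the module removes.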
AAC
— AAC stands for Advanced Audio Coding and is a lossy audio
compression format.
— Audio files that have been encoded with AAC are generally
smaller in size and deliver a higher quality of sound than
MP3.
AAC
— AAC is about 100% more efficient than Layer 2 and 30% more
efficient than Layer 3.
— AAC uses tools such as temporal noise shaping and backward
adaptive coding,
— together with techniques used in MPEG-1 Layer 3.
— Together, these give high-fidelity audio at low bit
rates.
— AAC with SBR (aacPlus) gives stereo at 48 kbps.
— aacPlus is used for Digital Radio Mondiale.
FLAC
— Free Lossless Audio
Codec.
SBR
Audio compression
— Digital compression of audio has become increasingly more important
with the advent of fast and inexpensive microprocessors. It is used in
many applications such as transmission of speech in the GSM mobile
phone system, storing music in the digital cassette format, and for the
DAB digital broadcast radio.
— Normally no information loss is acceptable when compressing digital
data such as programs, source code, and text documents.
— Entropy coding is the method most commonly used for lossless
compression. It exploits the fact that not all bit combinations are equally
likely to appear in the data, which is used in coding algorithms such as
Huffman coding.
— This approach works for the data types mentioned above, however
audio signals such as music and speech cannot be efficiently
compressed with entropy coding.
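How much entropy coding can gain depends on how unevenly the symbols are distributed. A minimal sketch of the Shannon entropy, which is the lower bound in bits per symbol for a coder that treats symbols independently; the function name and the sample strings are assumptions.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(data) -> float:
    """Shannon entropy of the symbol distribution in data."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(entropy_bits_per_symbol("abab"))            # 1.0
print(round(entropy_bits_per_symbol("aaab"), 3))  # 0.811
```

Text and source code have skewed symbol statistics and so a low entropy. Sampled audio tends to spread over many sample values, which leaves entropy coding alone with little to remove.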