Audio CompressionEA

Audio compression
By K. Murugan, DDE
STI(T), Delhi.
Agenda
1) What is compression?
2) Why audio compression?
3) How audio compression?
4) Advantages of audio compression
5) Applications and standards of compressed audio.
Compression
- Content compression can be as simple as removing all extra space characters,
inserting a single repeat character to indicate a string of repeated characters, and
substituting smaller bit strings for frequently occurring characters.
- Audio compression is a form of data compression designed to reduce the size of audio files.
- Compression is the reduction in size of data in order to save storage space or transmission time.
- In its general (physical) sense, compression is a force that tends to shorten or squeeze something, decreasing its volume.
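The repeat-character idea in the first bullet is run-length encoding. A minimal Python sketch, assuming the data is a string and that each run is stored as a (character, count) pair; the function names are illustrative:

```python
def rle_encode(text):
    """Run-length encode a string into (character, run_length) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs


def rle_decode(runs):
    """Expand (character, run_length) pairs back into the original string."""
    return "".join(ch * n for ch, n in runs)


runs = rle_encode("aaaabbbcccd")
print(runs)  # [('a', 4), ('b', 3), ('c', 3), ('d', 1)]
assert rle_decode(runs) == "aaaabbbcccd"
```

RLE only pays off when long runs are common, which is why it suits text padding or simple graphics better than sampled audio.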

Digital audio
— Compact Disc (CD) quality stereo audio has the following specification:
— • Sampling frequency: 44.1 kHz
— • 16 bits/sample for each of the two stereo channels
— Therefore, the net bit rate required is $2 \times 16 \times 44.1 \times 10^{3} \approx 1.41 \times 10^{6}$ bit/s, i.e. 1.41 Mbit/s.
— Extra bits are required for synchronization and error correction, resulting in 49 bits for every 16-bit audio sample.
— Thus, the total stereo bit-rate requirement is $1.41 \times 49/16 \approx 4.32$ Mbit/s.
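The arithmetic above can be checked with a few lines of Python (the variable names are illustrative):

```python
SAMPLE_RATE_HZ = 44_100   # CD sampling frequency
BITS_PER_SAMPLE = 16      # PCM word length per channel
CHANNELS = 2              # stereo

# Net audio bit rate: channels x bits/sample x samples/s
net_bps = CHANNELS * BITS_PER_SAMPLE * SAMPLE_RATE_HZ
# Each 16-bit sample occupies 49 channel bits once sync and error correction are added
total_bps = net_bps * 49 / 16

print(f"net:   {net_bps / 1e6:.2f} Mbit/s")    # net:   1.41 Mbit/s
print(f"total: {total_bps / 1e6:.2f} Mbit/s")  # total: 4.32 Mbit/s
```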
Digital audio
— Although high-bandwidth channels are available, compression is still necessary for low bit rate applications and for cost-effective storage and transmission.
— Spectral efficiency.
— Excellent coding quality can be achieved with bit rates of 0.5 to 1 bit/sample for speech and wideband speech, and 1 to 2 bits/sample for audio.
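To put the audio figure in context, the sketch below assumes CD-format stereo at 44.1 kHz (an assumption; the slide does not fix a sampling rate) and converts 1 to 2 bits/sample into bit rates and compression ratios against 16-bit PCM. The helper name is illustrative:

```python
def coded_bit_rate(bits_per_sample, sample_rate_hz, channels):
    """Bit rate (bit/s) of a coder spending bits_per_sample on average per sample."""
    return bits_per_sample * sample_rate_hz * channels

pcm = coded_bit_rate(16, 44_100, 2)
for bps in (1, 2):
    rate = coded_bit_rate(bps, 44_100, 2)
    print(f"{bps} bit/sample: {rate / 1000:.1f} kbit/s, ratio {pcm / rate:.0f}:1")
# 1 bit/sample: 88.2 kbit/s, ratio 16:1
# 2 bit/sample: 176.4 kbit/s, ratio 8:1
```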
Compression standards
— a) MPEG-1 audio: a total bit rate of 1.5 Mbit/s for CD-quality multimedia storage,
— of which 1.2 Mbit/s is allocated to video
— and 256 kbit/s is allocated to audio.
— Up to two audio channels are accommodated.
— b) MPEG-2 audio: HDTV applications.
— In its audio part, two to five full-bandwidth audio channels are accommodated.
— The standard also offers a collection of tools known as Advanced Audio Coding (MPEG-2 AAC).
— c) MPEG-4 audio: the MPEG-4 standard for audiovisual coding addresses applications ranging from
— mobile access and
— low-complexity multimedia terminals to high-complexity multichannel sound systems.
— The major feature of MPEG audio is that the coders exploit the perceptual limitations of the human auditory system.
— Compression is achieved by eliminating the perceptually irrelevant parts of the audio, whose removal produces no audible distortion.
Compression Goals
— Reduced Bandwidth.
— Reduced storage space.
— Make the decoded signal as close as possible to the original signal.
— Don’t transmit what your ear cannot hear.
— Scalable.
— Robust.
— Standards.
Compression
— Low bit rate audio coding is an “enabling technology” for applications such as
— digital radio,
— Internet streaming,
— mobile multimedia applications.
— In broadcasting, it also enables the transmission of audio from remote locations to a central broadcast facility over the
— Public Switched Telephone Network (PSTN),
— ISDN.
Audio compression
Psychoacoustics
— Relationship between what comes to your ear and what you hear.
— Range of human hearing.
— Auditory Masking.
— One signal can make another inaudible.
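The range of human hearing is often modelled by the absolute threshold of hearing. The sketch below uses Terhardt's well-known approximation, a standard formula from the psychoacoustics literature rather than one given in this presentation; it returns the quietest audible level, in dB SPL, at a given frequency:

```python
import math

def absolute_threshold_db(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    f = freq_hz / 1000.0  # the formula is written in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

for f in (100, 1000, 3300, 15000):
    print(f"{f:>6} Hz: {absolute_threshold_db(f):6.1f} dB SPL")
```

The curve is lowest around 3 to 4 kHz and rises steeply toward both ends of the audible range, so components near the extremes must be much louder before they can be heard at all.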

Audio compression
— Simultaneous masking is a frequency-domain phenomenon in which a low-level signal can be made inaudible by a simultaneously occurring stronger signal (as long as masker and maskee are close enough to each other in frequency).
— The slope of the masking threshold is steeper toward lower frequencies, i.e. higher frequencies are more easily masked.
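The asymmetric slope can be illustrated with the Schroeder spreading function, a common textbook model of simultaneous masking (an assumption: the presentation does not say which model its figures use). Distances are measured in Bark, i.e. critical-band units; a negative distance means the maskee lies below the masker in frequency:

```python
import math

def hz_to_bark(freq_hz):
    """Zwicker's approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def spreading_db(dz):
    """Schroeder spreading function: masking level in dB relative to the masker,
    at a distance of dz Bark from it."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1.0 + (dz + 0.474) ** 2)

for dz in (-3, -1, 1, 3):
    print(f"dz = {dz:+d} Bark: {spreading_db(dz):6.1f} dB")
print(f"a 1 kHz masker sits at about {hz_to_bark(1000):.1f} Bark")
```

The masking level falls off much faster for negative dz (maskees below the masker) than for positive dz, which is the "steeper toward lower frequencies" behaviour described above.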
Audio compression
— In addition to simultaneous masking, the time-domain phenomenon of temporal masking plays an important role in human auditory perception.
— It may occur when two sounds appear within a small interval of time.
— Depending on the signal levels, the stronger sound may mask the weaker one, even if the maskee precedes the masker (premasking).
— The duration within which premasking applies is significantly shorter than that of postmasking, which is of the order of 50 to 200 ms.
Audio compression
— When compressing speech and music signals, it is not crucial to retain the input signal exactly. It is sufficient that the output signal sounds identical to the input to a human listener.
— This is the method used in perceptual audio coders.
— A perceptual audio coder uses a psychoacoustic effect called 'auditory masking', where the parts of a signal that are not audible due to the function of the human auditory system are reduced in accuracy or removed completely.
Audio compression
— MP3 has become very popular for compressing CD-quality music from 1.4 Mbit/s down to 128 kbit/s with almost no audible degradation. This means that an ISDN connection can be used for real-time transmission and that a full-length song can be stored in 3-4 Mbytes.
— The lossy compression scheme described here achieves coding gain by exploiting both perceptual irrelevancies and statistical redundancies.
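The MP3 figures can be checked quickly, assuming a 4-minute song (the slide gives no length) and noting that a basic-rate ISDN connection (two 64 kbit/s B channels) carries exactly 128 kbit/s:

```python
CD_BPS = 2 * 16 * 44_100   # 1,411,200 bit/s stereo PCM
MP3_BPS = 128_000          # common MP3 bit rate

ratio = CD_BPS / MP3_BPS
song_seconds = 4 * 60      # assumed song length: 4 minutes
song_bytes = MP3_BPS * song_seconds / 8

print(f"compression ratio ~ {ratio:.1f}:1")          # compression ratio ~ 11.0:1
print(f"4-minute song ~ {song_bytes / 1e6:.2f} MB")  # 4-minute song ~ 3.84 MB
```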
Perceptual coder
— Perceptual coders maintain the sampling frequency, like PCM, but selectively decrease the word length.
— The coder analyzes the frequency and amplitude of the input signal and compares them with a model of human auditory perception.
— Using this model, the coder removes the irrelevancy and redundancy of the audio signal.
— 16 bits/sample is reduced to an average of 2.67 bits/sample (a ratio of about 6:1).
— Two kinds of bit allocation are performed.
— Forward adaptive allocation: all allocation is performed in the encoder and the details are included in the bit stream.
— The encoder is sophisticated.
— Backward adaptive allocation: bit allocation information is derived from the coded audio itself, which does not contain explicit allocation information from the encoder.
— Errors are limited to the critical band.
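Forward adaptive allocation can be sketched as a greedy loop in the encoder. This is an illustrative sketch, not the allocation procedure of any particular standard: it assumes the per-band signal-to-mask ratios (SMR, in dB) are already known, models quantizer S/N with the 6.02n + 1.76 dB rule, and counts the budget in bits per sample. The resulting allocation is what a forward adaptive coder would then send in the bit stream:

```python
def snr_db(bits):
    """Approximate S/N of a uniform quantizer with `bits` bits (0 bits: nothing coded)."""
    return 6.02 * bits + 1.76 if bits > 0 else 0.0

def allocate_bits(smr_db, bit_budget, max_bits=15):
    """Greedy allocation: repeatedly give one more bit to the band whose
    quantization noise is most audible (largest noise-to-mask ratio)."""
    bits = [0] * len(smr_db)
    for _ in range(bit_budget):
        nmr = [smr - snr_db(b) if b < max_bits else float("-inf")
               for smr, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:   # noise is already below the mask in every band
            break
        bits[worst] += 1
    return bits

print(allocate_bits([20.0, 5.0, -10.0], bit_budget=6))  # [4, 1, 0]
```

The band with a negative SMR (already below the masking threshold) receives no bits at all, which is the perceptual coder's main source of savings.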
Audio compression
Fig. 7: Psychoacoustic low bit rate coder (blocks: PCM audio input, band splitting, scale factors, perceptual model, quantization, framing, data-reduced bit stream)

Perceptual compression
— Spectral analysis.
— Calculate the threshold.
— Remove inaudible components.
— Quantize.
— Code.
— Pack data and frame.
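Steps 3 and 4 can be sketched for a single block of spectral values. This sketch assumes that spectral analysis and threshold calculation are already done, with the masking threshold given per spectral bin as a noise power; the function names are illustrative. A component whose power lies below the threshold is dropped; otherwise the quantizer step Δ is chosen so that the uniform quantization noise power, Δ²/12, just equals the threshold:

```python
import math

def perceptual_quantize(spectrum, threshold_power):
    """Drop inaudible bins and quantize audible ones with noise at the threshold.
    Returns (index, step) pairs; a step of 0.0 marks a removed component."""
    coded = []
    for x, t in zip(spectrum, threshold_power):
        if x * x < t:                  # below the masking threshold: inaudible
            coded.append((0, 0.0))
            continue
        step = math.sqrt(12.0 * t)     # noise power step**2 / 12 equals t
        coded.append((round(x / step), step))
    return coded

def dequantize(coded):
    """Rebuild spectral values from (index, step) pairs."""
    return [index * step for index, step in coded]

coded = perceptual_quantize([0.1, 3.0, -1.2], [0.05, 0.03, 0.03])
print(coded)
print(dequantize(coded))
```

Steps 5 and 6 (entropy coding and framing) are left out here.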

Perceptual coder
— Perceptual coders maintain the sampling frequency but selectively decrease the word length.
— In PCM, all signals are given equal word lengths.
— Perceptual coders assign bits according to audibility.
— A prominent tone is given a large number of bits to ensure audible integrity.
Spectral analysis/Synthesis
— Break the signal into its spectrum, i.e. time domain to frequency domain.
— Recover the signal from the spectrum, i.e. frequency domain to time domain.

Subband coding
— Blocks of consecutive time-domain samples of the broadband signal are collected over a short period and applied to the filter bank.
— The filter bank divides the signal into multiple band-limited channels that approximate the critical band response of the human ear.
— Each subband is coded independently, with more or fewer bits allocated to the samples in the subband.
— The bit allocation is decided by the psychoacoustic model.
— This operation is repeated for every new block of data.
— An inverse (synthesis) filter bank in the decoder sums the subband signals to reconstruct the broadband signal.
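The analysis/synthesis idea can be shown with the simplest possible filter bank, a two-band Haar (sum/difference) pair. This is an illustrative stand-in chosen for brevity; MPEG-1 uses a 32-band polyphase filter bank with much sharper bands. Each band is critically sampled (half as many samples as the input), and synthesis reconstructs the input exactly when the subbands are left unquantized:

```python
import math

def analysis(x):
    """Split x (even length) into low and high half-band signals, each decimated by 2."""
    s = 1 / math.sqrt(2)
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def synthesis(low, high):
    """Inverse of analysis(): rebuild each pair of input samples."""
    s = 1 / math.sqrt(2)
    out = []
    for lo, hi in zip(low, high):
        out += [s * (lo + hi), s * (lo - hi)]
    return out

x = [4.0, 4.0, 5.0, 7.0, 6.0, 6.0, 2.0, 0.0]
low, high = analysis(x)
# A crude "bit allocation": keep the low band as is, quantize the high band coarsely.
high_q = [round(h) for h in high]
print([round(v, 3) for v in synthesis(low, high)])    # exact reconstruction
print([round(v, 3) for v in synthesis(low, high_q)])  # error comes only from the high band
```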
Audio compression
Spectral Band Replication

— SBR allows a codec to reduce the bit rate through bandwidth reduction.
— SBR is primarily a post-process that occurs at the receiver.
— The lower part of the spectrum is transmitted, and the higher frequency components are reconstructed from the transmitted lower frequencies and control information.
— At low and medium bit rates, SBR can give an efficiency gain of about 30% over perceptual coders.
— SBR with MP3 (mp3PRO) can reduce the bit rate of a stereo MP3 bit stream from 96 kbps to 64 kbps.
— SBR techniques are used in MPEG-4 AAC.
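A heavily simplified sketch of the decoder-side idea, assuming magnitude spectra and an energy envelope already decoded from the control information. The function and parameter names are hypothetical, and real SBR works on a QMF filter bank with tonality and noise controls that are omitted here. The high band is rebuilt by copying low-band bins upward and scaling each copied group to the transmitted energy:

```python
import math

def sbr_reconstruct(low_band, envelope, band_width):
    """Toy SBR: append one group of band_width bins per envelope value,
    copied from the low band and scaled to that value's energy."""
    high = []
    for b, target_energy in enumerate(envelope):
        start = (b * band_width) % len(low_band)
        patch = [low_band[(start + i) % len(low_band)] for i in range(band_width)]
        energy = sum(v * v for v in patch)
        gain = math.sqrt(target_energy / energy) if energy > 0 else 0.0
        high.extend(v * gain for v in patch)   # scaled copy of the low band
    return list(low_band) + high

full = sbr_reconstruct([1.0, 2.0, 3.0, 4.0], envelope=[0.5, 2.0], band_width=2)
print(full)
```

Only the envelope values travel in the bit stream for the high band, which is where the bit-rate saving comes from.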

SBR
Audio compression
— 1. Introduction.
— MPEG-1 audio has become a popular algorithm for audio data compression because of its high quality across various compression rates, sampling rates and modes.
— It can compress 1.41 Mbit/s CD-quality audio data into
— 32 to 448 kbit/s for Layer I,
— 32 to 384 kbit/s for Layer II,
— 32 to 320 kbit/s for Layer III.
— It must support sampling rates of 32, 44.1 and 48 kHz.
— The mode can be selected from single channel, dual channel, stereo and joint stereo.
MPEG1 Layer 1
— Input can be analog audio or PCM digital audio with a 32, 44.1 or 48 kHz sampling frequency.
— The filter creates 32 equal-width subbands.
— Each subband has 12 samples.
— In all, they make 384 samples in a frame, i.e. 8 ms at 48 kHz sampling.
— A 512-point FFT is used in Layer I to determine the masking threshold.
— The coder analyzes the energy in each subband for audible information.
— Perceptual coding calculates the average power level in each subband over 8 ms.
— Masking levels in each subband and adjacent subbands are estimated.
MPEG1 Layer 1
— Minimum threshold levels are applied.
— Peak power levels in each subband are calculated and compared with masking levels.
— The minimum masking threshold is calculated.
— The signal-to-mask ratio (SMR) is calculated; quantized values use word lengths from 2 to 15 bits.
— For every subband, the absolute peak value of the 12 samples is compared with the scale factor table.
— $S/N = 6.02n + 1.76$ dB for an n-bit quantizer.
— 15 bits give about 92 dB S/N.
— 6-bit scale factors in 2 dB steps.
— Quantized values are scaled over a range of -118 dB to +6 dB in 2 dB steps.
— The masking threshold and scale factors are calculated only once for every group of 12 samples per subband, forming a frame of 12 x 32 = 384 samples.
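The S/N rule and the scale factor search can be sketched as follows. The table assumes the usual MPEG-1 Layer I/II construction, 2.0 x 2^(-i/3) for index i = 0 to 62, which gives steps of about 2 dB from about +6 dB down to about -118 dB; the function names are illustrative:

```python
import math

def snr_db(bits):
    """Theoretical S/N of an n-bit uniform quantizer: 6.02n + 1.76 dB."""
    return 6.02 * bits + 1.76

# 63 scale factors, 2**(1/3) apart (about 2 dB), starting at 2.0 (about +6 dB)
SCALE_FACTORS = [2.0 * 2 ** (-i / 3) for i in range(63)]

def scale_factor_index(samples):
    """Index of the smallest scale factor that still covers the peak of the samples."""
    peak = max(abs(s) for s in samples)
    candidates = [i for i, sf in enumerate(SCALE_FACTORS) if sf >= peak]
    return candidates[-1] if candidates else 0   # peak above 2.0: clip to the largest factor

print(f"15 bits -> {snr_db(15):.1f} dB")  # 15 bits -> 92.1 dB
print(f"table spans {20 * math.log10(SCALE_FACTORS[0]):+.1f} dB to "
      f"{20 * math.log10(SCALE_FACTORS[-1]):+.1f} dB")
```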
MPEG1 Layer 1
— Frames contain:
— Subband samples.
— Synchronization information.
— Scale factors.
— Bit allocation details.
— Control bits for sampling frequency.
— Bit rate is fixed to 384 kbps.
— For 44.1 kHz with 32 fixed subbands, the width of each subband becomes 689.06 Hz.
— Reduction in frame rate.
— Additional bits per frame are used.
— Mild compression, low cost.
MPEG1 Layer 1
— The header describes the sampling rate and any use of pre-emphasis.
— A block of 32 four-bit allocation codes specifies the word length used in each subband.
— 32 six-bit scale factors specify the gain given to each band during companding.
— The last block contains 32 sets of 12 samples. These samples vary in word length from one block to the next, and can be from 0 to 15 bits long.
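The frame size follows from the 384-sample frame. A small sketch, assuming the usual MPEG-1 Layer I convention that a frame is counted in 4-byte slots (12 x bit rate / sampling rate slots, plus one optional padding slot); the function name is illustrative:

```python
def layer1_frame(bit_rate, sample_rate, padding=False):
    """Return (frame length in bytes, frame duration in seconds) for MPEG-1 Layer I."""
    slots = 12 * bit_rate // sample_rate + (1 if padding else 0)  # 4-byte slots per frame
    duration = 384 / sample_rate                                  # 384 samples per frame
    return slots * 4, duration

print(layer1_frame(384_000, 48_000))  # (384, 0.008)
print(layer1_frame(384_000, 44_100))
```

At 44.1 kHz the division is not exact, which is why an occasional padding slot is needed to keep the average bit rate correct.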
Decoder
MPEG1 Layer 1
— Decoder Layer 1 decoding proceeds frame by frame.
— Quantized samples are dequantized to form normalized samples.
— Received scale factors are arranged in an array of 2 columns and 32 rows.
— Column: channel.
— Row: subband.
— Scale factor corresponding to the particular sample is applied.
— Subband samples are multiplied by the SF.
— Empty subbands are assigned zero value.
— Filter ( inverse) gives the broadband audio.
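Dequantization and scaling of one subband can be sketched as below. This assumes the Layer I midtread mapping in which an nb-bit code c (nb = 2 to 15, using 2^nb - 1 levels) becomes the normalized value (2c - 2^nb + 2)/(2^nb - 1); treat this formula as an assumption to be checked against ISO/IEC 11172-3. A zero allocation marks an empty subband:

```python
def dequantize_subband(codes, nb, scale_factor):
    """Map the received nb-bit codes of one subband to scaled subband samples.
    nb == 0 marks an empty subband: no codes were sent and 12 zeros are returned."""
    if nb == 0:
        return [0.0] * 12
    levels = 2 ** nb - 1                                    # odd number of levels: zero is a level
    return [(2 * c - (levels - 1)) / levels * scale_factor  # normalized value in (-1, 1), then scaled
            for c in codes]

print(dequantize_subband([0, 1, 2] * 4, nb=2, scale_factor=1.5))
print(dequantize_subband([], nb=0, scale_factor=1.5))
```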

MPEG1 Layer 1
— Decoder:
— No psychoacoustic processing.
— No bit allocation.
— Cheaper.
— Transparent to improvement in encoder technology.
— It is difficult to distinguish a Layer I recording from the original CD recording.

MPEG1 Layer 2
— More sophisticated in design.
— Provides high quality at moderate bit rates.
— Marginally higher cost.
— 32 – 192 kbps.
— Used for DAB (Digital Audio Broadcasting).
MPEG1 Layer2
— The filter creates 32 equal-width subbands,
— but the frame size is tripled (1152 samples).
— A 1024-point FFT gives better differentiation of tonal and non-tonal components for the masking calculation.
— 3 groups of 12 samples in each subband.
— A maximum of 3 scale factors for each subband.
— To reduce the scale factor bit rate, scale factors are shared between groups.
— The subbands are categorized into three frequency ranges: low, medium and high.
MPEG1 Layer2
— The 1152-sample block of Layer II is divided into three blocks of 384 samples so that the same companding structure as Layer I can be used.
— Not all the scale factors are transmitted, because they are often redundant.
— This technique effectively halves the scale factor bit rate.
— As the transient content in a given subband increases, two or three scale factors are sent.
— As in Layer I, the requantizing process always uses an odd number of steps to allow a true centre zero step.
— But when three, five or nine quantizing intervals are used for samples, plain binary coding is inefficient.
— When three-, five- or nine-level coding is used in a subband, sets of three samples are encoded together into a granule.
— With five quantizing intervals, for example, each sample can take five different values, so three samples have 5 x 5 x 5 = 125 combinations, which fit in a 7-bit codeword instead of 3 x 3 = 9 bits.
— The quantized samples/granules in each subband, bit allocation data, scale factors and scale factor select codes are multiplexed into the output bit stream.
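The saving from grouping can be checked directly. In this sketch, three samples that each take one of L levels are packed as one base-L number v = s0 + L·s1 + L²·s2 (this packing order is an assumption for illustration), so the codeword needs only as many bits as L³ - 1 requires, instead of three separate codes:

```python
def pack_granule(samples, levels):
    """Pack three quantized samples (each 0..levels-1) into one integer codeword."""
    s0, s1, s2 = samples
    return s0 + levels * s1 + levels * levels * s2

def unpack_granule(code, levels):
    """Recover the three samples from a packed codeword."""
    return [code % levels, (code // levels) % levels, code // (levels * levels)]

for levels in (3, 5, 9):
    grouped = (levels ** 3 - 1).bit_length()   # bits for one codeword covering 3 samples
    separate = 3 * (levels - 1).bit_length()   # bits if each sample were coded on its own
    print(f"{levels} levels: {grouped} bits grouped vs {separate} bits separate")
# 3 levels: 5 bits grouped vs 6 bits separate
# 5 levels: 7 bits grouped vs 9 bits separate
# 9 levels: 10 bits grouped vs 12 bits separate
```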
MPEG1 Layer2
— Quantization:
— Lower subbands can receive as many as 15 bits.
— Middle subbands can receive up to 7 bits.
— Higher subbands are limited to 3 bits.
— For greater efficiency, 3 successive samples from each of the 32 subbands are grouped to form a granule and quantized together.
MPEG1 Layer2 Decoder

MPEG1 Layer3
— More complex design than Layers I and II.
— Moderate fidelity even at very low bit rates.
— Layer III files are known as MP3.
MPEG1 Layer3 Encoder

MPEG1 Layer3
— Layer III is the most complex layer.
— It is a transform coder based on the ASPEC system, with certain modifications to give a degree of commonality with Layer II.
— The original ASPEC coder used a direct MDCT on the input samples.
— In Layer III this was modified to use a hybrid transform incorporating the
existing polyphase 32 band QMF of Layers I and II and retaining the block size
of 1152 samples.
— In Layer III, the 32 sub-bands from the QMF are further processed by a
critically sampled MDCT.
MPEG1 Layer3
— The input to the encoder is normally PCM coded data that is split into
frames of 1152 samples.
— The frames are further divided into two granules of 576 samples each.
— The frames are sent to both the Fast Fourier Transform (FFT) block
and the analysis filter bank.
— The FFT block transforms granules of 576 samples to the frequency
domain using a Fourier transform.
— The frequency information from the FFT block is used together with a
psychoacoustic model to determine the masking thresholds for all
frequencies.
— The masking thresholds are applied by the quantizer to determine
how many bits are needed to encode each sample.
MPEG1 Layer 3
Analysis Filterbank
— The analysis filter bank consists of 32 bandpass filters of equal width.
— The outputs of the filters are critically sampled.
— There are 18 samples output from each of the 32 bandpass filters, which gives a total of 576 subband samples.
MPEG1 Layer3
MDCT with Dynamic Windowing
— The subband samples are transformed to the frequency domain through a modified discrete cosine transform (MDCT). The MDCT is performed on blocks that are windowed and overlapped by 50%.
— The MDCT is normally performed on 18 samples from each of the 32 subbands at a time (long blocks, 24 ms) to achieve good frequency resolution.
— It can also be performed on 6 samples from each subband at a time (short blocks) to achieve better time resolution and to minimize pre-echoes.
— There are special window types for the transition between long and short blocks: the start and stop windows.
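The MDCT with 50% overlap can be sketched in plain Python. This is a direct O(N²) implementation for illustration only. It assumes a sine window (which satisfies the Princen-Bradley condition w[n]² + w[n+N]² = 1) and an IMDCT scaled by 2/N; Layer III itself uses N = 18 (long) or N = 6 (short) with its own window shapes. Each IMDCT output contains time-domain aliasing, and overlap-adding the windowed outputs of neighbouring blocks cancels it:

```python
import math

def sine_window(length):
    """Sine window of length 2N; satisfies w[n]**2 + w[n+N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Direct MDCT: 2N input samples -> N coefficients."""
    two_n = len(block)
    n = two_n // 2
    return [sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(two_n))
            for k in range(n)]

def imdct(coeffs):
    """Inverse MDCT scaled by 2/N: N coefficients -> 2N time-aliased samples."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for i in range(2 * n)]

def mdct_round_trip(signal, n):
    """Windowed MDCT analysis with 50% overlap, then IMDCT and overlap-add synthesis."""
    if len(signal) % n:
        raise ValueError("signal length must be a multiple of N")
    padded = [0.0] * n + list(signal) + [0.0] * n   # every sample is covered by two blocks
    window = sine_window(2 * n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):      # hop size N: 50% overlap
        block = [padded[start + i] * window[i] for i in range(2 * n)]
        aliased = imdct(mdct(block))                # contains time-domain aliasing
        for i in range(2 * n):
            out[start + i] += aliased[i] * window[i]  # aliasing cancels in the overlap-add
    return out[n:n + len(signal)]

signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(36)]
rebuilt = mdct_round_trip(signal, 18)  # N = 18, as for Layer III long blocks
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # rounding error only
```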

MPEG1 Layer 3
— Scaling and Quantization
— The masking thresholds are used to iteratively determine how many bits are
needed in each critical band to code the samples so that the quantization
noise is not audible.
— The encoder usually also has to meet a fixed bit rate requirement.
— Non-uniform quantizing is used, in which the quantizing step size becomes
larger as the magnitude of the coefficient increases.
— The quantized samples are Huffman coded and stored in the bit stream along
with the scale factors and side information.
— Huffman coding is a technique in which the most common values are allocated the shortest codewords.
— The frame consists of four parts: header, side information, main data, and
ancillary data.
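The Huffman principle can be sketched by building a code from symbol counts. This is only an illustration of the idea: MP3 does not build tables on the fly but selects among fixed tables defined in the standard. The counts below are made-up example values:

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """Return {symbol: codeword length} for a Huffman code built from symbol counts."""
    if len(freqs) == 1:                  # a lone symbol still needs a 1-bit codeword
        return {sym: 1 for sym in freqs}
    order = count()                      # tie-breaker so heap entries never compare dicts
    heap = [(f, next(order), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)   # merge the two least frequent subtrees;
        f2, _, b = heapq.heappop(heap)   # every symbol in them moves one level deeper
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(order), merged))
    return heap[0][2]

counts = {0: 50, 1: 20, -1: 20, 2: 5, -2: 5}   # made-up counts of quantized values
print(sorted(huffman_code_lengths(counts).items()))
# [(-2, 4), (-1, 2), (0, 1), (1, 3), (2, 4)]
```

The most frequent value (0) gets a 1-bit codeword and the rare values get 4 bits, matching the principle stated above.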
MPEG1 Layer 3
— The main data (coded scale factor values and Huffman coded data) are not necessarily located adjacent to the side information.
— The main data for one frame may be stored in that frame and in previous frames (the bit reservoir). The maximum size of the bit reservoir is 511 bytes.
— The header is always 4 bytes long and contains information about the layer, bit rate, sampling frequency and stereo mode. It also contains a 12-bit sync word that is used to find the start of a frame in a bit stream.
Side Information:
— Huffman table selection, scale factor selection information, requantization parameters and window selection.
— This section is 17 bytes long in single channel mode and 32 bytes in the two-channel modes (stereo, joint stereo and dual channel).
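The 4-byte header can be decoded with a few bit operations. This sketch handles only MPEG-1 Layer III headers and assumes the standard field layout (12-bit sync, ID, 2-bit layer, protection bit, 4-bit bit-rate index, 2-bit sampling-frequency index, padding, private, 2-bit mode, and so on) and the standard MPEG-1 Layer III bit-rate table; the frame length uses the usual 144 x bit rate / sampling rate (+ padding) formula. The function name is illustrative:

```python
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44_100, 48_000, 32_000, None]
MODES = ["stereo", "joint stereo", "dual channel", "single channel"]

def parse_mpeg1_layer3_header(header: bytes):
    """Decode the fields of a 4-byte MPEG-1 Layer III frame header."""
    word = int.from_bytes(header[:4], "big")
    if word >> 20 != 0xFFF:                         # 12-bit sync word, all ones
        raise ValueError("no sync word")
    if (word >> 19) & 1 != 1 or (word >> 17) & 0b11 != 0b01:
        raise ValueError("not MPEG-1 Layer III")    # ID bit 1 = MPEG-1, layer bits 01 = Layer III
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES[(word >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved value")
    padding = (word >> 9) & 1
    mode = MODES[(word >> 6) & 0b11]
    return {
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "mode": mode,
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
        "side_info_bytes": 17 if mode == "single channel" else 32,
    }

print(parse_mpeg1_layer3_header(bytes([0xFF, 0xFB, 0x90, 0x00])))
# {'bitrate_kbps': 128, 'sample_rate': 44100, 'mode': 'stereo', 'frame_bytes': 417, 'side_info_bytes': 32}
```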
Mp3 Frame
MPEG1 Layer 3
— Decoder:
— Huffman decoding of the variable-length coded samples.
— The 576 spectral lines of each granule are partitioned into five regions: the high-value (big_values) region, itself split into three subregions, the count1 region and the rzero region.
— The Huffman coding scheme assumes that large values occur mainly at low spectral frequencies, and that mainly small values and zeros occur at high spectral frequencies.
— The high-value region contains values from -8206 to 8206.
— The count1 region contains only the values -1, 0 and +1.
— The rzero region contains only zeros.
— The sample requantization block uses the scale factors to convert the Huffman decoded values back to their spectral values.
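The non-uniform quantizer mentioned earlier can be sketched with the power law usually used to describe Layer III: the encoder raises magnitudes to the 3/4 power before rounding, and the decoder's requantization raises the Huffman-decoded integers to the 4/3 power. This simplification assumes a single global step size; the full requantization also involves global gain, scale factors and subblock gain, which are omitted here:

```python
def quantize(x, step):
    """Encoder side: compress the magnitude with a 3/4 power law, then round."""
    index = round((abs(x) / step) ** 0.75)
    return -index if x < 0 else index

def requantize(index, step):
    """Decoder side: expand the integer with the inverse 4/3 power law."""
    magnitude = abs(index) ** (4 / 3) * step
    return -magnitude if index < 0 else magnitude

levels = [requantize(k, 1.0) for k in range(6)]
print([round(v, 2) for v in levels])                          # reconstruction levels
print([round(b - a, 2) for a, b in zip(levels, levels[1:])])  # gaps grow with magnitude
```

Because the gaps between reconstruction levels widen as the magnitude grows, large coefficients are coded with a larger step, as described in the scaling and quantization slide.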
MPEG1 Layer3 Decoder

MPEG1 Layer 3
— Alias reduction is required to negate the aliasing effects of the polyphase filter bank in the encoder. It is not applied to granules that use short blocks.
— The IMDCT (Inverse Modified Discrete Cosine Transform) transforms the frequency lines to polyphase filter subband samples.
— The synthesis polyphase filter bank transforms the 32 subband blocks of 18 time-domain samples in each granule to 18 blocks of 32 PCM samples.
— The filter bank operates on 32 samples at a time, one from each subband block.
MPEG1 Layer 3
— 128 kbps is the most common bit rate, because it typically offers adequate audio quality in a relatively small space.
— 192 kbps is often used by those who notice artifacts at lower bit rates.
— As Internet bandwidth and hard drive sizes have increased, 128 kbps files are slowly being replaced by files at higher bit rates such as 192 and 256 kbps.
MPEG1 Layer 3
— In the early years of MP3, all files were encoded at a constant bit rate (CBR).
— Musically dense passages, however, contain more audio data than quiet passages, so a variable bit rate (VBR) can be used to achieve higher sound quality.
— Audio files with no compression, as found on a normal audio CD, can be very large: around 10 MB per minute of stereo sound.
— It will take a long time to force MP3 out of the market.
— Some formats ensure even better quality than MP3, e.g. AAC, which is often called MP4,
— but its bit rate is limited to 192 kbps.
AAC
— The input signal passes to the filter bank and the perceptual model in parallel.
The filter bank consists of a 50 per cent overlapped critically sampled MDCT
which can be switched between block lengths of 2048 and 256 samples.
— At 48 kHz the filter allows resolutions of 23 Hz and 21 ms or 187 Hz and 2.6 ms.
— Following the filter bank is the intra-block predictive coding module (temporal noise shaping, TNS). When enabled, this module finds redundancy between the coefficients within one transform block.
— In the time domain, predictive coding works well on stationary signals but fails on transients. The dual of this characteristic is that, in the frequency domain, predictive coding works well on transients but fails on stationary signals.
— The prediction method is a conventional forward predictor structure in which
the result of filtering a number of earlier coefficients is used to predict the
current one.
— The prediction is subtracted from the actual value to produce a prediction error
or residual which is transmitted.
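The forward predictor across frequency can be sketched as below. The tap value and the coefficient values are made up so that the spectrum is exactly predictable; AAC's actual filter order and the quantization of the filter coefficients and residual are not shown. The decoder rebuilds each coefficient by adding the same prediction, computed from values it has already decoded:

```python
def prediction_residual(coeffs, taps):
    """Encoder: predict each spectral coefficient from the previous ones and keep the error."""
    residual = []
    for k, value in enumerate(coeffs):
        prediction = sum(a * coeffs[k - 1 - j] for j, a in enumerate(taps) if k - 1 - j >= 0)
        residual.append(value - prediction)
    return residual

def reconstruct(residual, taps):
    """Decoder: add the prediction, built from already decoded coefficients, back on."""
    coeffs = []
    for k, error in enumerate(residual):
        prediction = sum(a * coeffs[k - 1 - j] for j, a in enumerate(taps) if k - 1 - j >= 0)
        coeffs.append(error + prediction)
    return coeffs

spectrum = [1.0, 0.9, 0.81, 0.729, 0.6561]       # made-up, exactly predictable values
res = prediction_residual(spectrum, taps=[0.9])  # hypothetical first-order tap
print(res)                                       # only the first value is large
print(reconstruct(res, taps=[0.9]))
```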
MPEG2 AAC
— AAC supports up to 48 audio channels.
— The use of TNS (temporal noise shaping) also allows the coder to use longer blocks more of the time.
AAC
— AAC stands for Advanced Audio Coding and is a lossy audio compression format.
— Audio files encoded with AAC generally deliver higher sound quality than MP3 files at the same bit rate, or similar quality in a smaller file.
— File extensions are:
— .aac
— .m4a
— .mp4
— .m4p
AAC
— AAC is about 100% more efficient than Layer II and about 30% more efficient than Layer III.
— AAC uses tools such as temporal noise shaping and backward adaptive coding,
— as well as techniques used in MPEG-1 Layer III.
— Together, these give high-fidelity audio at low bit rates.
— AAC with SBR (aacPlus) gives stereo at 48 kbps.
— aacPlus is used for Digital Radio Mondiale (DRM).
FLAC
— Free Lossless Audio Codec.
— Average bit rate: about 1043 kbps.
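The lossless principle behind FLAC (predict each sample, then entropy-code the small residual) can be sketched with a fixed second-order polynomial predictor and Rice codes. This is a simplified sketch, not the FLAC bit stream format: the unary bit convention, the hand-picked Rice parameter k and the sample values are all assumptions made for illustration:

```python
def residuals_order2(x):
    """Fixed 2nd-order predictor: predict x[n] as 2*x[n-1] - x[n-2]; keep 2 warm-up samples."""
    return x[:2] + [x[n] - 2 * x[n - 1] + x[n - 2] for n in range(2, len(x))]

def restore_order2(r):
    """Invert residuals_order2 exactly (lossless)."""
    x = r[:2]
    for n in range(2, len(r)):
        x.append(r[n] + 2 * x[n - 1] - x[n - 2])
    return x

def rice_encode(value, k):
    """Rice code of a signed integer: fold to unsigned, unary quotient, k-bit remainder."""
    u = 2 * value if value >= 0 else -2 * value - 1
    q, rem = u >> k, u & ((1 << k) - 1)
    return "1" * q + "0" + (format(rem, f"0{k}b") if k else "")

def rice_decode(bits, k):
    """Decode a concatenation of Rice codewords back into signed integers."""
    values, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":
            q, i = q + 1, i + 1
        i += 1                                   # skip the terminating "0"
        rem = int(bits[i:i + k], 2) if k else 0
        i += k
        u = (q << k) | rem
        values.append(u // 2 if u % 2 == 0 else -(u + 1) // 2)
    return values

samples = [1000, 1010, 1021, 1030, 1041, 1050, 1058, 1067]  # made-up 16-bit samples
res = residuals_order2(samples)
print(res)  # [1000, 1010, 1, -2, 2, -2, -1, 1]
bits = "".join(rice_encode(r, k=1) for r in res[2:])
print(len(bits), "bits for", len(res) - 2, "residuals")  # 18 bits for 6 residuals
assert restore_order2(res[:2] + rice_decode(bits, k=1)) == samples
```

Six residuals cost 18 bits here instead of 96 bits as raw 16-bit samples, and decoding returns the input exactly.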

SBR
Audio compression
— Digital compression of audio has become increasingly important with the advent of fast and inexpensive microprocessors. It is used in many applications such as transmission of speech in the GSM mobile phone system, storing music in the digital cassette format, and DAB digital broadcast radio.
— Normally no information loss is acceptable when compressing digital data such as programs, source code and text documents.
— Entropy coding is the method most commonly used for lossless compression. It exploits the fact that not all bit combinations are equally likely to appear in the data, a property used by coding algorithms such as Huffman coding.
— This approach works for the data types mentioned above; however, audio signals such as music and speech cannot be compressed efficiently with entropy coding alone.
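The limit of entropy coding can be illustrated by measuring the zeroth-order entropy of a symbol stream, the lower bound in bits per symbol for a coder such as Huffman that codes symbols independently. A minimal sketch; both example inputs are made up, and the second is a noise-like stand-in for PCM audio in which no 16-bit value repeats:

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Zeroth-order Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

print(entropy_bits("aaaaaaab"))  # skewed text: far below 8 bits/character

# 4096 distinct 16-bit values (multiplying by an odd number permutes values mod 65536)
noise_like = [(n * 40503) % 65536 for n in range(4096)]
print(entropy_bits(noise_like))  # 12.0 bits/sample, barely below the original 16
```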
