Specification v1.0
June 2014
Contents
1 PURPOSE
2 INTRODUCTION
3.1 METADATA
3.2 AUDIO DATA
3.3 BLOCKING
3.4 INTER-CHANNEL DECORRELATION
3.5 MODELING
3.6 RESIDUAL CODING
3.7 FRAMING
3.8 BITSTREAM FORMAT
4 FLAC DECODER
4.1 ARCHITECTURE
4.2 METADATA DECODER
4.3 STREAM DECODER
4.4 FRAME DECODER
4.5 PDMA BUFFER
4.6 DAC
5 IMPLEMENTATION
6 CONCLUSION
A Cyclic Redundancy Check
A.1 Computation of CRC
List of Tables
1 Revision History
2 Acronyms and Abbreviations
3 Glossary of terms
REVISION HISTORY
The following table lists critical changes that were made in each revision of the
document.
Table 1: Revision History
Revision | Reason | Author | Date
1.0 | Initial release | Delfino, Oscar Ariel | Jun 11 2014
GLOSSARY
Table 3: Glossary of terms
Term | Meaning
Block | One or more audio samples that span several channels.
Subblock | One or more audio samples within a channel. A block contains one subblock for each channel, and all subblocks contain the same number of samples.
Blocksize | The number of samples in any of a block's subblocks. For example, a one-second block sampled at 44.1 kHz has a blocksize of 44100, regardless of the number of channels.
Frame | A frame header plus one or more subframes.
Subframe | A subframe header plus one or more encoded samples from a given channel. All subframes within a frame contain the same number of samples.
PURPOSE
This document presents the technical specification of the FLAC Hardware Decoder. It describes the design and implementation of a FLAC audio decoder with the following objectives:
• Decoding an encoded *.flac file and playing the decoded samples in real time.
• Implementing the FLAC decoding algorithm in an FPGA to improve decoding speed.
This FLAC decoder design example supports sample rates from 8 kHz to 44.1 kHz and a word length of 16 bits.
INTRODUCTION
The audio input to the FLAC encoder is passed block by block to its prediction
stage where the encoder tries to find a mathematical description of the signal. This
description is typically much smaller than the raw signal itself. Since the methods
of prediction are known to both the encoder and decoder, only the parameters of
the predictor need be included in the compressed stream. FLAC currently uses four
different classes of predictors to approximate the audio signal based on its content.
If the predictor does not describe the signal exactly, the difference between the
original signal and the predicted signal (called the error or residual signal) must be
coded losslessly. If the predictor is effective, the residual signal will require fewer
bits per sample than the original signal. The FLAC encoder currently uses Rice
coding to code the error signal (residual signal). Each compressed block becomes a
frame in the compressed format.
An encoded frame is written to the FLAC output file for each block of the input file. The frame starts with a header and is followed by a number of subframes. Each subframe starts with its own header, followed by Rice codes for the encoded audio samples of one channel. A frame contains one subframe for each audio channel, and every subframe contains the same number of Rice-encoded audio samples.
FLAC is an open, royalty-free format, and a free software implementation is available.
3.1 METADATA
Metadata blocks can be of any length, and new block types can be defined. A decoder is allowed to skip any metadata types it does not understand. Only one is mandatory: the STREAMINFO block. This block holds information such as the sample rate and number of channels, plus data that can help the decoder manage its buffers, such as the minimum and maximum frame size and the minimum and maximum block size. The STREAMINFO block also includes the MD5 signature of the unencoded audio data, which is useful for checking an entire stream for transmission errors.
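As an illustration of what the metadata decoder has to do, the following minimal Python sketch locates STREAMINFO and unpacks its fields. It assumes the whole file is already in memory as a bytes object, and the function names are hypothetical. The field widths follow the FLAC format specification: each metadata block starts with a 1-bit last-block flag, a 7-bit block type (0 for STREAMINFO) and a 24-bit length, and the 34-byte STREAMINFO body packs the sample rate (20 bits), channels minus one (3 bits), bits per sample minus one (5 bits) and total samples (36 bits) into one 64-bit word.

```python
import struct

STREAMINFO_TYPE = 0  # metadata block type code of STREAMINFO


def parse_streaminfo(body: bytes) -> dict:
    """Unpack the 34-byte body of a STREAMINFO metadata block (big-endian fields)."""
    if len(body) != 34:
        raise ValueError("STREAMINFO body must be 34 bytes")
    min_block, max_block = struct.unpack(">HH", body[0:4])
    # 64 bits: sample rate (20) | channels-1 (3) | bits per sample-1 (5) | total samples (36)
    packed = int.from_bytes(body[10:18], "big")
    return {
        "min_blocksize": min_block,
        "max_blocksize": max_block,
        "min_framesize": int.from_bytes(body[4:7], "big"),
        "max_framesize": int.from_bytes(body[7:10], "big"),
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
        "md5": body[18:34].hex(),
    }


def read_streaminfo(stream: bytes) -> dict:
    """Check the fLaC marker, then walk the metadata block headers until STREAMINFO."""
    if stream[:4] != b"fLaC":
        raise ValueError("missing fLaC marker")
    pos = 4
    while pos + 4 <= len(stream):
        header = stream[pos]
        is_last = bool(header & 0x80)            # 1-bit last-metadata-block flag
        block_type = header & 0x7F               # 7-bit block type
        length = int.from_bytes(stream[pos + 1:pos + 4], "big")  # 24-bit length
        if block_type == STREAMINFO_TYPE:
            return parse_streaminfo(stream[pos + 4:pos + 4 + length])
        if is_last:
            break
        pos += 4 + length                        # skip a block this decoder does not need
    raise ValueError("no STREAMINFO block found")
```

Blocks of other types are skipped using their length field, which is the behaviour the text allows for metadata the decoder does not understand.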
3.2 AUDIO DATA
After the metadata comes the encoded audio data. Audio data and metadata are
not interleaved. Like most audio codecs, FLAC splits the unencoded audio data
into blocks, and encodes each block separately. The encoded block is packed into a
frame and appended to the stream. The reference encoder uses a single block size
for the whole stream but the FLAC format does not require it.
3.3 BLOCKING
The block size is an important encoding parameter. If it is too small, the frame overhead lowers the compression. If it is too large, the modeling stage of the compressor cannot generate an efficient model. Understanding FLAC's modeling helps to improve compression for some kinds of input by varying the block size. In the most general case, using linear prediction on 44.1 kHz audio, the optimal block size is between 2 and 6 ksamples, and the flac encoder defaults to a block size of 4096 in this case. With the fast fixed predictors, a smaller block size is usually preferable because of the smaller frame header.
In order to simplify encoder/decoder design, FLAC imposes a minimum block
size of 16 samples, and a maximum block size of 65535 samples. This range covers
the optimal size for all of the audio data FLAC supports.
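When a stream uses one fixed block size, as the reference encoder does, the number of frames follows directly from the total sample count in STREAMINFO. The small Python sketch below (the helper name is hypothetical) also enforces the 16 to 65535 range; note that the FLAC format allows the final block of a stream to be shorter than the minimum.

```python
MIN_BLOCKSIZE = 16      # format minimum (the last block of a stream may be shorter)
MAX_BLOCKSIZE = 65535   # format maximum


def frame_plan(total_samples: int, blocksize: int) -> tuple:
    """Return (number_of_frames, samples_in_last_frame) for a fixed-blocksize stream."""
    if not MIN_BLOCKSIZE <= blocksize <= MAX_BLOCKSIZE:
        raise ValueError("block size outside the FLAC range 16..65535")
    if total_samples == 0:
        return 0, 0
    full_frames, remainder = divmod(total_samples, blocksize)
    if remainder:
        return full_frames + 1, remainder   # short final block
    return full_frames, blocksize
```

For example, 1,000,000 samples at a block size of 4096 give 244 full frames plus a final frame of 576 samples.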
Blocked data is passed to the predictor stage one subblock (channel) at a time. Each subblock is independently coded into a subframe, and the subframes are concatenated into a frame. Because each channel is coded separately, one channel of a stereo frame may be encoded as a constant subframe and the other as an LPC subframe.
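3.4 INTER-CHANNEL DECORRELATION
In the case of stereo input, once the data is blocked it is optionally passed through an inter-channel decorrelation stage. The left and right channels are converted to mid and side channels through the transformation $\mathit{mid} = \lfloor (\mathit{left} + \mathit{right})/2 \rfloor$ and $\mathit{side} = \mathit{left} - \mathit{right}$. Although the division discards the least significant bit of the sum, that bit equals the least significant bit of $\mathit{side}$ (the sum and the difference of two integers always have the same parity), so the process is lossless, unlike joint stereo. For normal CD audio this can result in significant extra compression.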
In stereo streams, many times there is an exploitable amount of correlation
between the left and right channels. FLAC allows the frames of stereo streams to
have different channel assignments, and an encoder may choose to use the best
representation on a frame-by-frame basis.
Independent: The left and right channels are coded independently.
Mid-side: The left and right channels are transformed into mid and side
channels. The mid channel is the midpoint (average) of the left and right
signals, and the side is the difference signal (left minus right).
Left-side: The left channel and side channel are coded.
Right-side: The right channel and side channel are coded.
Surprisingly, the left-side and right-side forms can be the most efficient in many
frames, even though the raw number of bits per sample needed for the original
signal is slightly more than that needed for independent or mid-side coding.
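On the decoder side the transformation has to be undone for every frame. The Python sketch below (function name hypothetical) assumes the subframe order defined by the FLAC format (left then side, side then right, mid then side) and restores the bit that the mid computation dropped. The side channel needs one bit more than the input samples, i.e. 17 bits for 16-bit audio, so a hardware datapath carrying it must be at least that wide.

```python
def restore_stereo(assignment: str, ch0: list, ch1: list) -> tuple:
    """Undo inter-channel decorrelation and return (left, right) sample lists.

    ch0 and ch1 are the two decoded subframes in stream order:
      "independent": left, right
      "left_side":   left, side
      "right_side":  side, right
      "mid_side":    mid, side
    """
    if assignment == "independent":
        return list(ch0), list(ch1)
    if assignment == "left_side":
        return list(ch0), [l - s for l, s in zip(ch0, ch1)]      # right = left - side
    if assignment == "right_side":
        return [s + r for s, r in zip(ch0, ch1)], list(ch1)      # left = side + right
    if assignment == "mid_side":
        left, right = [], []
        for mid, side in zip(ch0, ch1):
            # The encoder computed mid = (left + right) >> 1, dropping one bit.
            # left + right and left - right share their lowest bit, so take it from side.
            total = (mid << 1) | (side & 1)
            left.append((total + side) >> 1)
            right.append((total - side) >> 1)
        return left, right
    raise ValueError("unknown channel assignment: " + assignment)
```

Because Python's `>>` is an arithmetic shift (floor division by two), the same code handles negative samples, which mirrors what a signed hardware shifter would do.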
3.5 MODELING
In the next stage, the encoder tries to approximate the signal with a function in such a way that when the approximation is subtracted, the result (called the residual, residue, or error) requires fewer bits per sample to encode. The function's parameters also have to be transmitted, so they should not be so complex as to eat up the savings.
FLAC has two methods of forming approximations:
1. Fitting a simple polynomial to the signal.
2. General linear predictive coding (LPC).
First, fixed polynomial prediction is much faster, but less accurate than LPC.
The higher the maximum LPC order, the slower, but more accurate, the model
will be. However, there are diminishing returns with increasing orders.
Second, the parameters for the fixed predictors can be transmitted in 3 bits, whereas the size of the parameters for the LPC model depends on the bits per sample and the LPC order. This means the subframe header length varies with the method and order chosen, which can affect the optimal block size.
Methods for modeling the input signal:
1. Verbatim: This is essentially a zero-order predictor of the signal. The
predicted signal is zero, meaning the residual is the signal itself, and the
compression is zero.
2. Constant: This predictor is used whenever the subblock is pure DC (digital
silence), i.e. a constant value throughout. The signal is run-length encoded
and added to the stream.
3. Fixed linear predictor: FLAC adds a fourth-order predictor to the zero- to third-order predictors. Since the predictors are fixed, the predictor order is the only parameter that needs to be stored in the compressed stream. The error signal is then passed to the residual coder.
4. FIR Linear prediction: The reference encoder uses the Levinson-Durbin
method for calculating the LPC coefficients from the autocorrelation coefficients, and the coefficients are quantized before computing the residual.
FLAC allows the quantized coefficient precision to vary from subframe to
subframe. The FLAC reference encoder estimates the optimal precision to
use based on the block size and dynamic range of the original signal.
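In the decoder, both predictor types are undone the same way: each new sample is the prediction from previous samples plus the decoded residual, starting from the warm-up samples stored verbatim in the subframe. The following Python sketch (function names hypothetical) uses the fixed-predictor coefficients of the FLAC format and, for LPC, the quantized coefficients and right-shift amount that would be read from the subframe header; parsing of those fields is not shown.

```python
# Fixed polynomial predictor coefficients, newest sample first:
# prediction = sum(c[j] * s[n-1-j]).
FIXED_COEFFS = {
    0: [],
    1: [1],
    2: [2, -1],
    3: [3, -3, 1],
    4: [4, -6, 4, -1],
}


def restore_fixed(warmup: list, residual: list) -> list:
    """Rebuild a FIXED subframe; the order equals the number of warm-up samples."""
    coeffs = FIXED_COEFFS[len(warmup)]
    samples = list(warmup)
    for r in residual:
        prediction = sum(c * samples[-1 - j] for j, c in enumerate(coeffs))
        samples.append(prediction + r)
    return samples


def restore_lpc(warmup: list, residual: list, qlp_coeffs: list, shift: int) -> list:
    """Rebuild an LPC subframe from quantized coefficients and the quantization shift."""
    if len(qlp_coeffs) != len(warmup):
        raise ValueError("LPC order must match the number of warm-up samples")
    samples = list(warmup)
    for r in residual:
        acc = sum(c * samples[-1 - j] for j, c in enumerate(qlp_coeffs))
        samples.append(r + (acc >> shift))  # >> on Python ints is an arithmetic shift
    return samples
```

Order 3, for example, reproduces a quadratic sequence exactly, so its residual is zero: `restore_fixed([1, 4, 9], [0, 0])` returns `[1, 4, 9, 16, 25]`.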
Page 9
3.6 RESIDUAL CODING
Once the model is generated, the encoder subtracts the approximation from the original signal to get the residual (error) signal. The error signal is then losslessly coded. To do this, FLAC takes advantage of the fact that the error signal generally has a Laplacian (two-sided geometric) distribution, and that a set of special Huffman codes called Rice codes can encode this kind of signal efficiently and quickly, without needing a dictionary.
Rice coding involves finding a single parameter that matches a signal's distribution, then using that parameter to generate the codes. As the distribution changes, the optimal parameter changes, so FLAC supports a method that allows the parameter to change as needed: the residual can be broken into several contexts or partitions, each with its own Rice parameter.
The error signal is coded using Rice codes in one of two ways:
1. The encoder estimates a single Rice parameter based on the variance of the residual and Rice codes the entire residual using this parameter.
2. The residual is partitioned into several equal-length regions of contiguous samples, and each region is coded with its own Rice parameter based on the region's mean.
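In the FLAC bitstream, each residual value is stored as a Rice code with parameter k: a unary quotient (a run of 0 bits terminated by a 1) followed by the k low-order bits. The resulting unsigned value is then folded back to a signed residual, with even values mapping to non-negative numbers and odd values to negative ones. A minimal Python sketch, assuming an MSB-first bit reader (the class and function names are hypothetical) and a parameter k already read from the partition header:

```python
class BitReader:
    """Read bits MSB-first from a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read_bit(self) -> int:
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value


def read_rice_signed(reader: BitReader, k: int) -> int:
    """Decode one Rice-coded residual with parameter k."""
    quotient = 0
    while reader.read_bit() == 0:      # unary part: count zeros up to the stop bit
        quotient += 1
    folded = (quotient << k) | reader.read_bits(k)
    return (folded >> 1) ^ -(folded & 1)   # 0, 1, 2, 3, ... -> 0, -1, 1, -2, ...
```

For k = 2, the bits 0101 decode to -3 and the bits 00100 decode to 4.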
3.7 FRAMING
An audio frame is preceded by a frame header and followed by a frame footer. The header starts with a sync code and contains the minimum information necessary for a decoder to play the stream, such as sample rate and bits per sample. It also contains the block or sample number and an 8-bit CRC of the frame header. The sync code, frame header CRC, and block/sample number allow resynchronization and seeking even in the absence of seek points. The frame footer contains a 16-bit CRC of the entire encoded frame for error detection. If the reference decoder detects a CRC error (see Appendix A), it generates a silent block.
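The two frame checksums can be computed bit by bit as sketched below in Python. The generator polynomials are those of the FLAC format specification ($x^8 + x^2 + x + 1$ for the header CRC-8 and $x^{16} + x^{15} + x^2 + 1$ for the footer CRC-16, both with an initial value of zero and no bit reflection); the function names are hypothetical, and a hardware implementation would more likely use a lookup table or a parallel XOR network.

```python
def crc8(data: bytes) -> int:
    """Frame-header CRC-8: polynomial 0x07, initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift left; when a 1 falls off the top, XOR in the generator polynomial.
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def crc16(data: bytes) -> int:
    """Frame-footer CRC-16: polynomial 0x8005, initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8005) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc
```

Because neither CRC is reflected or inverted at the end, running `crc8` over a frame header including its stored CRC byte yields 0 for an undamaged header, which gives the frame decoder a simple validity test.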
3.8 BITSTREAM FORMAT
All numbers used in a FLAC bitstream are integers; there are no floating-point
representations. All numbers are big-endian coded. All numbers are unsigned
unless otherwise specified.
Description of the stream:
because the decoder may not have access to the STREAMINFO metadata
block at the start of the stream. This information includes sample rate, bits
per sample, number of channels, etc. Since the frame header is pure overhead,
it has a direct effect on the compression ratio. To keep the frame header
as small as possible, FLAC uses lookup tables for the most commonly used
values for frame parameters. For instance, the sample rate part of the frame
header is specified using 4 bits. Eight of the bit patterns correspond to the
commonly used sample rates of 8/16/22.05/24/32/44.1/48/96 kHz. However,
odd sample rates can be specified by using one of the hint bit patterns,
directing the decoder to find the exact sample rate at the end of the frame
header. The same method is used for specifying the block size and bits per
sample. In this way, the frame header size stays small for all of the most
common forms of audio data.
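A sketch of how a frame decoder might resolve the 4-bit sample-rate field, written in Python. The code assignments (0100 to 1011 for the eight common rates, 1100 to 1110 as hints to an 8-bit value in kHz, a 16-bit value in Hz or a 16-bit value in tens of Hz, and 0000 meaning "take the rate from STREAMINFO") follow the FLAC format specification. The function name and the `header_tail` argument, which stands for the extra bytes read from the end of the frame header, are hypothetical; other codes are simply rejected here.

```python
# Frame-header sample-rate codes for the eight common rates.
COMMON_SAMPLE_RATES = {
    0b0100: 8000, 0b0101: 16000, 0b0110: 22050, 0b0111: 24000,
    0b1000: 32000, 0b1001: 44100, 0b1010: 48000, 0b1011: 96000,
}


def frame_sample_rate(code: int, header_tail: bytes, streaminfo_rate: int) -> int:
    """Resolve the 4-bit sample-rate field of a frame header.

    header_tail holds the extra bytes stored at the end of the frame header
    for the hint codes 1100-1110; it is ignored for the other codes.
    """
    if code == 0b0000:
        return streaminfo_rate                              # use the STREAMINFO value
    if code in COMMON_SAMPLE_RATES:
        return COMMON_SAMPLE_RATES[code]
    if code == 0b1100:
        return header_tail[0] * 1000                        # 8-bit value in kHz
    if code == 0b1101:
        return int.from_bytes(header_tail[:2], "big")       # 16-bit value in Hz
    if code == 0b1110:
        return int.from_bytes(header_tail[:2], "big") * 10  # 16-bit value in tens of Hz
    raise ValueError("sample-rate code not handled by this sketch: " + format(code, "04b"))
```

The block-size and bits-per-sample fields can be resolved with the same table-plus-hint pattern.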
Individual subframes (one for each channel) are coded separately within a
frame, and appear serially in the stream. In other words, the encoded audio
data is NOT channel-interleaved. This reduces decoder complexity at the
cost of requiring larger decode buffers. Each subframe has its own header
specifying the attributes of the subframe, like prediction method and order,
residual coding parameters, etc. The header is followed by the encoded audio
data for that channel.
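Because the subframes of a frame arrive one channel after another, the decoder must buffer a whole block before it can emit channel-interleaved samples for playback. A minimal Python sketch of that final step; the function names are hypothetical, and the little-endian signed 16-bit output format is an assumption about what the DAC path expects, not something this specification defines.

```python
import struct


def interleave(subblocks: list) -> list:
    """Merge per-channel subblocks [[ch0 samples], [ch1 samples], ...] into interleaved order."""
    if len({len(s) for s in subblocks}) != 1:
        raise ValueError("all subblocks in a block must have the same length")
    return [sample for frame in zip(*subblocks) for sample in frame]


def to_pcm16le(samples: list) -> bytes:
    """Pack signed 16-bit samples as little-endian PCM (assumed output format)."""
    return struct.pack("<%dh" % len(samples), *samples)
```

For instance, `interleave([[1, 2, 3], [10, 20, 30]])` returns `[1, 10, 2, 20, 3, 30]`.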
FLAC specifies a subset of itself as the Subset format. The purpose of this
is to ensure that any streams encoded according to the Subset are truly
streamable, meaning that a decoder that cannot seek within the stream can
still pick up in the middle of the stream and start decoding. It also makes
hardware decoder implementations more practical by limiting the encoding
parameters such that decoder buffer sizes and other resource requirements
can be easily determined. The flac encoder generates Subset streams by default unless the --lax command-line option is used. The Subset makes the following
limitations on what may be used in the stream:
The blocksize bits in the frame header must be 0001-1110. The blocksize
must be <= 16384; if the sample rate is <= 48000Hz, the blocksize
must be <= 4608.
The sample rate bits in the frame header must be 0001-1110.
The bits-per-sample bits in the frame header must be 001-111.
If the sample rate is <= 48000Hz, the filter order in LPC subframes
must be less than or equal to 12, i.e. the subframe type bits in the
subframe header may not be 101100-111111.
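These limits are what make buffer sizing predictable for a hardware decoder. As a sketch (Python, hypothetical helper names), the checks below encode the block-size and LPC-order rules listed above and compute the worst-case size of one decoded block; for the 16-bit stereo streams of up to 44.1 kHz targeted by this design, the result is 4608 × 2 × 2 = 18432 bytes. The restrictions on the blocksize, sample-rate and bits-per-sample codes are not checked here.

```python
SUBSET_MAX_BLOCKSIZE_LOW_RATE = 4608   # when sample rate <= 48000 Hz
SUBSET_MAX_BLOCKSIZE = 16384
SUBSET_MAX_LPC_ORDER_LOW_RATE = 12     # when sample rate <= 48000 Hz


def within_subset(blocksize: int, sample_rate: int, lpc_order: int = 0) -> bool:
    """Check the Subset block-size and LPC-order limits."""
    if sample_rate <= 48000:
        return (blocksize <= SUBSET_MAX_BLOCKSIZE_LOW_RATE
                and lpc_order <= SUBSET_MAX_LPC_ORDER_LOW_RATE)
    return blocksize <= SUBSET_MAX_BLOCKSIZE


def output_buffer_bytes(sample_rate: int, channels: int, bits_per_sample: int) -> int:
    """Worst-case bytes needed to hold one fully decoded Subset block."""
    if sample_rate <= 48000:
        max_block = SUBSET_MAX_BLOCKSIZE_LOW_RATE
    else:
        max_block = SUBSET_MAX_BLOCKSIZE
    return max_block * channels * ((bits_per_sample + 7) // 8)
```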
FLAC DECODER
This section describes the design and implementation of the FLAC audio decoder. Its objectives are:
• Decoding an encoded *.flac file and playing the decoded samples in real time.
• Implementing the FLAC decoding algorithm in an FPGA to improve decoding speed.
The design example supports sample rates from 8 kHz to 44.1 kHz and a word length of 16 bits.
4.1 ARCHITECTURE
The FLAC decoder decompresses a FLAC file back into the original audio without any loss in quality.
An encoded FLAC stream starts with the four-byte identifying string fLaC. This is followed by the STREAMINFO metadata block, which carries the side information needed by the decoder and by the end user. The remainder of the stream consists of encoded audio frames.
The FLAC decoder consists primarily of two functional blocks, each responsible for processing a different type of information in the FLAC stream received from the metadata decoder: the stream decoder and the frame decoder. Figure 1 illustrates the complete FLAC audio player architecture.
*.flac file and plays back the audio. The following functional blocks are involved in
decoding and playing the audio.
4.2 METADATA DECODER
4.3 STREAM DECODER
4.4 FRAME DECODER
4.5 PDMA BUFFER
4.6 DAC
IMPLEMENTATION
CONCLUSION
A Cyclic Redundancy Check
A cyclic redundancy check (CRC) is an error-detecting code commonly used in
digital networks and storage devices to detect accidental changes to raw data.
Blocks of data entering these systems get a short check value attached, based on
the remainder of a polynomial division of their contents; on retrieval the calculation
is repeated, and corrective action can be taken against presumed data corruption
if the check values do not match.
CRCs are popular because they are simple to implement in binary hardware,
easy to analyze mathematically, and particularly good at detecting common errors
caused by noise in transmission channels. Because the check value has a fixed
length, the function that generates it is occasionally used as a hash function.
Specification of a CRC code requires definition of a so-called generator polynomial. This polynomial becomes the divisor in a polynomial long division, which
takes the message as the dividend and in which the quotient is discarded and the
remainder becomes the result. The length of the remainder is always less than
the length of the generator polynomial, which therefore determines how long the
result can be.
The simplest error-detection system, the parity bit, is in fact a trivial 1-bit CRC:
it uses the generator polynomial x + 1 (two terms), and has the name CRC-1.
A.1 Computation of CRC
To compute an n-bit binary CRC, line up the bits representing the input in a row, and position the (n + 1)-bit pattern representing the CRC's divisor (called a polynomial) underneath the left-hand end of the row.
In this example, we encode 14 bits of message with a 3-bit CRC, using the polynomial $x^3 + x + 1$. The polynomial is written in binary as its coefficients; a third-order polynomial has 4 coefficients ($1x^3 + 0x^2 + 1x + 1$). In this case, the coefficients are 1, 0, 1 and 1. The result of the calculation is 3 bits long.
Start with the message to be encoded:
11010011101100
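This is first padded with zeros corresponding to the bit length n of the CRC. Here is the first calculation for computing a 3-bit CRC:
11010011101100 000   < input right padded by 3 bits
1011                 < divisor (4 bits) = x^3 + x + 1
01100011101100 000   < result (XOR with the divisor)
The divisor is then moved right to align with the leftmost remaining 1 bit and the XOR is repeated, so it does not necessarily move one bit per step. The complete division is:
11010011101100 000   < input right padded by 3 bits
1011                 < divisor
01100011101100 000   < result
 1011                < divisor ...
00111011101100 000
  1011
00010111101100 000
   1011
00000001101100 000
       1011
00000000110100 000
        1011
00000000011000 000
         1011
00000000001110 000
          1011
00000000000101 000
           101 1
00000000000000 100   < remainder (3 bits)
The division stops here because every bit of the message portion is now zero.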
Since the leftmost divisor bit zeroed every input bit it touched, when this process ends the only bits in the input row that can be nonzero are the n bits at the right-hand end of the row. These n bits are the remainder of the division step and, unless the chosen CRC specification calls for some post-processing, they are also the value of the CRC.
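To check a received message, the same division is performed on the message with the check value appended instead of zeros; a zero remainder means that no error was detected:
11010011101100 100   < input with check value
1011                 < divisor
01100011101100 100   < result
 1011                < divisor ...
00111011101100 100
  1011
00010111101100 100
   1011
00000001101100 100
       1011
00000000110100 100
        1011
00000000011000 100
         1011
00000000001110 100
          1011
00000000000101 100
           101 1
00000000000000 000   < remainder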
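The long division of the worked example can be reproduced with a short Python sketch operating on strings of "0" and "1" characters (the function name is hypothetical). It XORs the divisor under each leading 1 of the message part exactly as in the layout above, so it is meant for illustration rather than speed.

```python
from typing import Optional


def crc_remainder(message: str, divisor: str, check: Optional[str] = None) -> str:
    """Polynomial long division over GF(2) on bit strings.

    With check=None the message is padded with n zeros and the n-bit CRC is returned.
    With the check value appended instead, an all-zero result means no error was detected.
    """
    n = len(divisor) - 1
    row = list(message + (check if check is not None else "0" * n))
    for i in range(len(message)):
        if row[i] == "1":                       # align the divisor under this leading 1
            for j, bit in enumerate(divisor):
                row[i + j] = "0" if row[i + j] == bit else "1"   # bitwise XOR
    return "".join(row[len(message):])
```

With the example values, `crc_remainder("11010011101100", "1011")` gives `"100"`, and passing `"100"` as the check value gives `"000"`. With the divisor `"11"` (the polynomial x + 1), the function returns the even-parity bit of the message, matching the CRC-1 remark above.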