
The Parks-McClellan algorithm: a scalable approach for
designing FIR filters

Silviu Filip
under the supervision of
N. Brisebarre and G. Hanrot
(AriC, LIP, ENS Lyon)

PEQUAN Seminar, February 26, 2015
Digital Signal Processing

became increasingly relevant over the past four decades:

ANALOG → DIGITAL

think of:
data communications (e.g. the Internet, HD TV and digital radio)
audio and video systems (e.g. CD, DVD, BD players)
many more

What are the 'engines' powering all these?
Digital filters

→ we get two categories of filters

finite impulse response (FIR) filters
non-recursive structure (i.e. a_k = 0, k = 1, . . . , M)
infinite impulse response (IIR) filters
recursive structure (i.e. ∃k s.t. a_k ≠ 0)

→ natural to work in the frequency domain
H is the transfer function of the filter
for FIR filters, H is a polynomial
for IIR filters, H is a rational fraction
The filtering toolchain

Steps:
1. derive a concrete mathematical representation of the filter
→ use the theory of minimax approximation

2. quantize the filter coefficients using fixed-point or floating-point formats
→ use tools from algorithmic number theory (Euclidean lattices)

3. hardware synthesis of the filter

Today's focus: the first step, for FIR filters
Finite Impulse Response (FIR) filters

large class of filters, with a lot of desirable properties

Usual representation: H(ω) = ∑_{k=0}^{n} h_k cos(ωk) = ∑_{k=0}^{n} h_k T_k(cos(ω))

→ if x = cos(ω), view H in the basis of Chebyshev polynomials

Specification:

H(ω) = ∑_{k=0}^{8} h_k cos(ωk)
Optimal FIR design with real coefficients

The problem: given a closed real set F ⊆ [0, π], find an approximation
H(ω) = ∑_{k=0}^{n} h_k cos(ωk) of degree at most n to a continuous function
D(ω), ω ∈ F, such that

δ = ‖E(ω)‖_{∞,F} = max_{ω∈F} |W(ω) (H(ω) − D(ω))|

is minimal.
W – a real-valued weight function, continuous and positive over F.
Optimal FIR design with real coefficients

The solution: characterized by the Alternation Theorem

Theorem
The unique solution H(ω) = ∑_{k=0}^{n} h_k cos(ωk) has an error function E(ω) for
which there exist n + 2 values ω_0 < ω_1 < · · · < ω_{n+1}, belonging to F, such that

E(ω_i) = −E(ω_{i+1}) = ±δ,

for i = 0, . . . , n, and δ = ‖E(ω)‖_{∞,F}.

→ well studied in the Digital Signal Processing literature
1972: Parks and McClellan
→ based on a powerful iterative approach from Approximation Theory:
1934: Remez
The Parks-McClellan design method: Motivation

Why work on such a problem?

one of the most well-known filter design methods
no concrete study of its numerical behavior in practice
high-degree (n > 500) filters are needed, yet existing implementations
(e.g. MATLAB, SciPy, GNURadio) are not able to provide them
useful for attacking the coefficient quantization problem
The Parks-McClellan design method: Example

[figure series: the filter's magnitude response over successive stages of the algorithm; y-axis ticks from −0.2 to 1.2]
The Parks-McClellan design method: Steps

Step 1: Choosing the n + 2 initial references

Traditional approach: take the n + 2 references uniformly from F
→ can lead to convergence problems
→ want to start from better approximations
Existing approaches: most are not general enough and/or costly to execute

Our approach: extrapolate the extrema positions from smaller filters

→ although empirical, this reference scaling idea is rather robust in practice
Step 1: Choosing the n + 2 initial references

Examples + comparison with uniform initialization

1. degree n = 520 unit weight filter with passband [0, 0.99π] and stopband
centered at π
→ removal of harmonic interference inside signals
2. degree n = 53248 lowpass filter with passband [0, π/8192] and stopband
[3π/8192, π]
→ design of efficient wideband channelizers for software radio systems

            Degree   Iterations (uniform / scaling)
Example 1   520      12 / 3
Example 2   53248    NC¹ / 3

Advantages:
reduced number of iterations
improved numerical behavior

¹ our implementation did not converge when using uniform initialization
The Parks-McClellan design method: Steps

Step 2: Computing the current error function E(ω) and δ

Amounts to solving a linear system in h_0, . . . , h_n and δ:

[ 1  cos(ω_0)      ⋯  cos(nω_0)      1/W(ω_0)              ] [ h_0 ]   [ D(ω_0)     ]
[ ⋮  ⋮                ⋮              ⋮                     ] [  ⋮  ] = [  ⋮         ]
[ 1  cos(ω_n)      ⋯  cos(nω_n)      (−1)^n/W(ω_n)         ] [ h_n ]   [ D(ω_n)     ]
[ 1  cos(ω_{n+1})  ⋯  cos(nω_{n+1})  (−1)^{n+1}/W(ω_{n+1}) ] [ δ   ]   [ D(ω_{n+1}) ]

→ solving the system directly can be numerically unstable

→ Parks & McClellan's idea: use the barycentric form of Lagrange interpolation
Barycentric Lagrange interpolation

Problem: p polynomial with deg p ≤ n interpolates f at points x_k, i.e.,

p(x_k) = f_k,  k = 0, . . . , n

→ schoolbook solution:

p(x) = ∑_{k=0}^{n} f_k ℓ_k(x),   ℓ_k(x) = ∏_{i=0, i≠k}^{n} (x − x_i)/(x_k − x_i)

Cost: O(n²) operations for each evaluation of p(x)

Can we do better?
Barycentric Lagrange interpolation

YES → the barycentric form of p:

p(x) = ( ∑_{k=0}^{n} (w_k/(x − x_k)) f_k ) / ( ∑_{k=0}^{n} w_k/(x − x_k) ),   w_k = 1/∏_{i≠k}(x_k − x_i)

Cost: O(n²) operations for computing the weights w_k (done once) + O(n)
operations for each evaluation of p(x)
Barycentric Lagrange interpolation

→ we get:

δ = ( ∑_{k=0}^{n+1} w_k D(ω_k) ) / ( ∑_{k=0}^{n+1} (−1)^k w_k / W(ω_k) ),   w_k = 1/∏_{i≠k}(x_k − x_i)

and

H(ω) = ( ∑_{k=0}^{n+1} (w_k/(x − x_k)) c_k ) / ( ∑_{k=0}^{n+1} w_k/(x − x_k) ),

where x = cos(ω), x_k = cos(ω_k) and c_k = D(ω_k) − (−1)^k δ/W(ω_k).
Barycentric Lagrange interpolation

Why should we use it?

→ numerically stable if the family of interpolation nodes used has a small
Lebesgue constant [Higham 2004; Mascarenhas & Camargo 2014]

The Lebesgue constant: specific to each grid of points; it measures the quality
of a polynomial interpolant with respect to the function to be approximated

→ empirically, the families of points used inside the Parks-McClellan
algorithm (Step 1 + Step 3) usually converge to sets of points with small
Lebesgue constant
The Parks-McClellan design method: Steps

Step 3: Finding the local extrema of E(ω)

Traditional approach: evaluate E(ω) on a dense grid of uniformly distributed
points (in practice, usually 16n of them)
→ can sometimes fail to find all the extrema
→ need for a more robust alternative
Chebyshev-proxy rootfinders [Boyd 2006, Boyd 2013]

→ numerically stable for finding the real roots of f_m(x) located inside [−1, 1]
Cost: around 10m³ operations, according to [Boyd 2013]

Important questions:
What value is suitable for m?
Depends on f. It can be computed adaptively (as inside the Chebfun² MATLAB
package)
Can the computational cost be reduced?
YES, to O(m²), through interval subdivision OR direct quadratic solvers.

² http://www.chebfun.org/
Interval subdivision

Key idea: use the trigonometric form of f_m (i.e. x = cos(ω), ω ∈ [0, π])

→ several alternatives [Boyd 2006] for finding the roots of f_m:

k-m subdivision algorithms: split [0, π] into m uniform subintervals +
degree-k Chebyshev interpolation on each subinterval
Cost estimate: 22000m + 42m² operations, for k = 13 ("tredecic" subdivision)
linear-with-cubic solve algorithm: adaptive interval subdivision + cubic
interpolation + zero-free interval testing
Cost estimate: 400m² operations, for problems with O(m) simple roots
Which one to use? → problem-dependent
Interval subdivision: Our problem

local extrema of E(ω) → roots of E′(ω)

What we know, at each iteration:
E(ω) usually has very close to n + 2 local extrema inside F
placement information for the local extrema

Our approach: k-n-type subdivision with non-uniform subintervals
Interval subdivision: Our solution

Why use it?


works very well in practice
k = 4 is usually sufficient → small computational cost
no need for zero-free interval testing
embarrassingly parallel approach

Direct quadratic solvers

→ investigated in a number of recent articles
→ tend to be faster than classic QR/QZ schemes for n > 80

Some questions:
How do such methods compare to subdivision approaches?
When is it worthwhile to use them with Chebyshev basis expansions?
Can they be easily parallelized?
The Parks-McClellan design method: Steps

Step 4: Recover the coefficients of H(ω) upon convergence

→ can use the Inverse Discrete Fourier Transform
→ implement it using Clenshaw's algorithm for evaluating linear
combinations of Chebyshev polynomials (a numerically robust approach)

Cost: O(n²) arithmetic operations
Some remarks about convergence

Notation:
H* – the final minimax filter
H_k – the filter computed at the k-th iteration of the Parks-McClellan algorithm

→ theoretically, linear convergence is always possible [Cheney 1966], i.e.

max_{ω∈F} |W(ω)(H*(ω) − H_k(ω))| ≤ Aθ^k,

for some A > 0 and θ ∈ (0, 1).

→ if D(ω) is twice differentiable and E*(ω) = W(ω)(D(ω) − H*(ω)) equioscillates
exactly n + 2 times, we have quadratic convergence [Veidinger 1960]
Our implementation

→ written in C++
→ small number of external dependencies:
Eigen;
Intel TBB;
MPFR.
→ optional parallelism with OpenMP
→ comes in three flavors:
double
long double
MPFR

Our implementation: Results

Examples:
1. degree n = 100 unit weight filter with passband [0, 0.4π] and stopband
[0.5π, π]
2. degree n = 100 unit weight filter with passbands [0, 0.2π], [0.6π, π] and
stopband [0.3π, 0.5π]
3. degree n = 520 unit weight filter with passband [0, 0.99π] and stopband
centered at π
4. degree n = 53248 lowpass filter with passband [0, π/8192] and stopband
[3π/8192, π]
Our implementation: Results

→ running times (in seconds) on a 3.6 GHz 64-bit Intel Xeon(R) E5-1620

Problem                  Uniform (sequential)   GNURadio   MATLAB   SciPy
Example 1 (n = 100)      0.0112                 NC         0.1491   0.3511
Example 2 (n = 100)      0.0395                 NC         NC       NC
Example 3 (n = 520)      0.3519                 NC         NC       NC
Example 4 (n = 53248)    NC                     NC         NC       NC

Example (degree)         Uniform        Uniform      Scaling        Scaling
                         (sequential)   (parallel)   (sequential)   (parallel)
Example 1 (n = 100)      0.0112         0.0073       0.0147         0.011
Example 2 (n = 100)      0.0395         0.0274       0.0339         0.0275
Example 3 (n = 520)      0.3519         0.2251       0.0982         0.0716
Example 4 (n = 53248)³   NC             NC           537.8          162.6

³ used the long double version of our code
Perspectives

Conclusion:
improved the practical behavior of a well-known polynomial approximation
algorithm for filter design
→ numerically stable barycentric Lagrange interpolation + robust rootfinders,
without sacrificing efficiency
this new approach benefits greatly from parallel architectures

Future work:
provide a complete toolchain for constructing FIR filters (approximation +
quantization + hardware synthesis)
tackle the IIR filter setting (rational fractions)
non-linear problem
constraints: poles located inside the unit circle
