Pipelined Processing
Basic Ideas
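[Figure: two timing diagrams for processors P1 to P4 against a time axis.
Parallel processing: P1 runs a1 a2 a3 a4, P2 runs b1 b2 b3 b4, P3 runs c1 c2 c3 c4, P4 runs d1 d2 d3 d4.
Pipelined processing: the rows read a1 b1 c1 d1, a2 b2 c2 d2, a3 b3 c3 d3, a4 b4 c4 d4, staggered in time across P1 to P4.]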
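The two schedules in the figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: four jobs a to d, each split into four unit-time steps, with step i of a job required to finish before step i+1 starts. The function names are hypothetical and not taken from the slides.

```python
# Sketch: build the two schedules as {processor: [task per time slot]}.
# Assumption: 4 jobs a-d, each split into 4 unit-time steps 1-4,
# where step i of a job must finish before step i+1 starts.
JOBS = "abcd"
STEPS = 4


def parallel_schedule():
    # Each processor runs every step of one job, back to back.
    return {f"P{i + 1}": [f"{job}{s}" for s in range(1, STEPS + 1)]
            for i, job in enumerate(JOBS)}


def pipelined_schedule():
    # Processor s runs step s of every job. Stage s starts s-1 slots late
    # so that the previous step of the same job has already finished.
    total_slots = len(JOBS) + STEPS - 1
    sched = {}
    for s in range(1, STEPS + 1):
        row = [None] * total_slots
        for j, job in enumerate(JOBS):
            row[j + s - 1] = f"{job}{s}"
        sched[f"P{s}"] = row
    return sched


if __name__ == "__main__":
    for name, sched in (("parallel", parallel_schedule()),
                        ("pipelined", pipelined_schedule())):
        print(name)
        for proc, row in sched.items():
            print(" ", proc, row)
```

Under these assumptions the parallel schedule needs 4 time slots and four full-function processors, while the pipelined schedule needs 7 slots but each processor only ever performs one kind of step.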
Data Dependence
[Figure: two timing diagrams showing processors P1 to P4 against a time axis, illustrating data dependence between their tasks.]
Method to incorporate pipelining: Cut-set retiming
Cut set:
A cut set is a set of edges of a graph such that, if these edges are removed from the original graph, the remaining graph splits into two separate graphs.
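The definition can be checked mechanically. The following is a minimal sketch, assuming the original graph is connected and is given as a list of node names and a list of (u, v) edge pairs; the helper names `num_components` and `is_cut_set` are hypothetical.

```python
from collections import defaultdict


def num_components(nodes, edges):
    """Count connected components, ignoring edge direction."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:  # depth-first search from `start`
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj[node] - seen)
    return count


def is_cut_set(nodes, edges, candidate):
    """True if removing the `candidate` edges leaves exactly two separate graphs."""
    remaining = [e for e in edges if e not in candidate]
    return num_components(nodes, remaining) == 2


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D"]
    edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
    print(is_cut_set(nodes, edges, [("A", "B"), ("C", "D")]))  # splits {A, C} from {B, D}
    print(is_cut_set(nodes, edges, [("A", "B")]))              # graph stays connected
```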
Retiming:
The timing of an algorithm is re-adjusted while keeping the partial ordering of execution unchanged, so that the results remain correct.
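[Figure: signal flow graph of a 3-tap FIR filter with input x[n], delays z^{-1}, coefficients h[0], h[1], h[2] and output y[n], compared ("= ?") with the graph we obtain after cut-set retiming, which has the same coefficients and delays plus an intermediate signal u[n].]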
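Whether a retimed graph still computes the same output can be checked numerically. The sketch below makes an assumption the extracted figure does not confirm: that the retimed structure places the delays between the adders, with u[n] as the partial sum h[1]x[n] + h[2]x[n-1], so that y[n] = h[0]x[n] + u[n-1]. The function names are illustrative only.

```python
def fir_direct(x, h):
    """Direct form: y[n] = h[0]x[n] + h[1]x[n-1] + h[2]x[n-2]."""
    y = []
    for n in range(len(x)):
        acc = 0
        for i, coeff in enumerate(h):
            if n - i >= 0:  # samples before n = 0 are taken as zero
                acc += coeff * x[n - i]
        y.append(acc)
    return y


def fir_retimed(x, h):
    """Assumed retimed form: delays sit between the adders, not on the input line."""
    h0, h1, h2 = h
    w_prev = 0  # h[2]*x[n-1], held in the first delay
    u_prev = 0  # u[n-1], held in the second delay
    y = []
    for sample in x:
        u = h1 * sample + w_prev        # u[n] = h[1]x[n] + h[2]x[n-1]
        y.append(h0 * sample + u_prev)  # y[n] = h[0]x[n] + u[n-1]
        w_prev = h2 * sample
        u_prev = u
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    h = [3, -1, 2]
    print(fir_direct(x, h))
    print(fir_retimed(x, h))
```

In the assumed retimed form each output needs only one multiply and one add after the registers, which is what shortens the critical path compared with the chain of adders in the direct form.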
Fine-grain pipelining
Block Processing
One form of vectorized parallel processing of DSP algorithms (not parallel processing in the most general sense).
Block vector: [x(3k) x(3k+1) x(3k+2)]
Clock cycle: can be 3 times longer
Original (FIR filter):
$$y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$$
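Rewrite 3 equations at a time:
$$\begin{bmatrix} y(3k) \\ y(3k+1) \\ y(3k+2) \end{bmatrix} = a \begin{bmatrix} x(3k) \\ x(3k+1) \\ x(3k+2) \end{bmatrix} + b \begin{bmatrix} x(3k-1) \\ x(3k) \\ x(3k+1) \end{bmatrix} + c \begin{bmatrix} x(3k-2) \\ x(3k-1) \\ x(3k) \end{bmatrix}$$
With the block vector
$$\mathbf{x}(k) = \begin{bmatrix} x(3k) \\ x(3k+1) \\ x(3k+2) \end{bmatrix}$$
this becomes
$$\mathbf{y}(k) = \begin{bmatrix} a & 0 & 0 \\ b & a & 0 \\ c & b & a \end{bmatrix} \mathbf{x}(k) + \begin{bmatrix} 0 & c & b \\ 0 & 0 & c \\ 0 & 0 & 0 \end{bmatrix} \mathbf{x}(k-1)$$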
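The block formulation can be compared against the sample-by-sample filter in code. This is a minimal sketch that assumes zero initial conditions and an input length that is a multiple of 3; the function names are hypothetical.

```python
def fir_direct(x, a, b, c):
    """y(n) = a*x(n) + b*x(n-1) + c*x(n-2), with x(n) = 0 for n < 0."""
    def at(n):
        return x[n] if n >= 0 else 0
    return [a * at(n) + b * at(n - 1) + c * at(n - 2) for n in range(len(x))]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def fir_block(x, a, b, c):
    """Process 3 samples per iteration; len(x) must be a multiple of 3."""
    cur = [[a, 0, 0], [b, a, 0], [c, b, a]]   # acts on the current block x(k)
    prev = [[0, c, b], [0, 0, c], [0, 0, 0]]  # acts on the previous block x(k-1)
    x_prev = [0, 0, 0]
    y = []
    for k in range(len(x) // 3):
        x_blk = x[3 * k: 3 * k + 3]
        y_blk = [p + q for p, q in zip(matvec(cur, x_blk), matvec(prev, x_prev))]
        y.extend(y_blk)
        x_prev = x_blk
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(fir_direct(x, 2, 3, 5))
    print(fir_block(x, 2, 3, 5))
```

Each loop iteration produces three outputs, which is why the clock cycle can be three times longer for the same sample rate.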
Block Processing
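Time indices
n: sampling period
k: clock period (processor)
k = 2n (one clock period spans two sampling periods)
Rewrite
$$y(2n) = a\,y(2n-2) + x(2n)$$
$$y(2n+1) = a\,y(2n-1) + x(2n+1)$$
With
$$\mathbf{x}(k) = \begin{bmatrix} x(2n) \\ x(2n+1) \end{bmatrix}, \quad \mathbf{y}(k) = \begin{bmatrix} y(2n) \\ y(2n+1) \end{bmatrix}$$
Then
$$\mathbf{y}(k) = a\,\mathbf{y}(k-1) + \mathbf{x}(k)$$
Note:
Pipelining: clock period = sampling period.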
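The block recursion can be checked against a sample-by-sample loop. The sketch below assumes the underlying recursion is y(n) = a y(n-2) + x(n), which is what the two rewritten equations imply, with zero initial conditions and an even input length. The function names are illustrative.

```python
def iir_sequential(x, a):
    """y(n) = a*y(n-2) + x(n), zero initial conditions."""
    y = []
    for n, sample in enumerate(x):
        y_back2 = y[n - 2] if n >= 2 else 0
        y.append(a * y_back2 + sample)
    return y


def iir_block(x, a):
    """Block form y(k) = a*y(k-1) + x(k) with 2-sample blocks; len(x) must be even."""
    y_prev = [0, 0]
    y = []
    for k in range(len(x) // 2):
        x_blk = x[2 * k: 2 * k + 2]
        # The two entries of the block do not depend on each other,
        # so in hardware they could be computed side by side.
        y_blk = [a * yp + xs for yp, xs in zip(y_prev, x_blk)]
        y.extend(y_blk)
        y_prev = y_blk
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    print(iir_sequential(x, 2))
    print(iir_block(x, 2))
```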
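[Figure: block-processing structure. The input x(n) passes through a serial-to-parallel converter (S/P) to give x(2k) and x(2k+1). The two branches produce y(2k) and y(2k+1) from the fed-back values y(2(k-1)) and y(2(k-1)+1), and a parallel-to-serial converter (P/S) produces y(n).]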
Timing Comparison
[Figure: timing diagrams comparing pipelining and block processing. The diagrams show when the inputs x(1), x(2), ... and outputs y(1), y(2), ... occupy the MAC, Add and Mul units. In the block processing diagram the inputs are grouped in pairs.]