
Pipelined Architectures for High-Speed and Area-Efficient Viterbi Decoders

Chen, Chao-Nan
Chu, Hsi-Cheng
Convolutional code
Viterbi decoder
In-place path metric updating
Inserting pipeline levels into ACS
Convolutional Codes
Convolutional encoders map an information stream into a longer code sequence.
Each input block of k = 1 bit produces n = 2 code symbols.
The code rate k/n expresses the information carried per coded bit, and the
constraint length v defines the encoder memory order.
This encoder has 2^(v-1) = 4 states.

[Figure: encoder block diagram with one input and two outputs, the 1st and 2nd code symbols]
Fig.1 A simple rate 1/2, v = 3 convolutional encoder
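The encoder described above can be sketched in a few lines of Python. The generator polynomials are not recoverable from the slide, so the common pair (7, 5) in octal is assumed here; the function name and parameters are illustrative only.

```python
def conv_encode(bits, generators=(0b111, 0b101), constraint_length=3):
    """Rate-1/n convolutional encoder: each input bit yields n code symbols.

    ASSUMPTION: generators (7, 5) octal; the slide does not state them.
    """
    n_states = 1 << (constraint_length - 1)   # 2^(v-1) encoder states
    state = 0                                 # the v-1 most recent input bits
    out = []
    for b in bits:
        reg = (state << 1) | b                # shift register, newest bit in the LSB
        for g in generators:
            # each code symbol is the parity of the register bits tapped by g
            out.append(bin(reg & g).count("1") % 2)
        state = reg & (n_states - 1)          # drop the oldest bit
    return out


print(conv_encode([1, 0, 1, 1]))  # [1, 1, 1, 0, 0, 0, 0, 1]
```

With k = 1 and n = 2, four input bits produce eight code symbols, which is the rate 1/2 stated on the slide.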


Viterbi Algorithm (VA)
The most commonly employed decoding technique; it can be implemented in
either software or digital hardware.

VA operates on the trellis diagram (Fig.2) and can theoretically perform
maximum-likelihood decoding.

It finds the most likely path by applying a suitable distance metric between the
received sequence and all the trellis paths.

[Figure: trellis for the four encoder states; each branch is labeled with its two output code symbols, and a legend distinguishes input bit 0 from input bit 1]
Fig.2 Trellis diagram representation of the encoder of Fig.1
Viterbi Decoder
BMU: Branch metrics (BMs) are computed from the incoming input data.
ACSU: The path metrics (PMs) of all states are updated according to equation (1).
SMU: The stored decisions are used to build a unique decoded output.

$$\mathrm{PM}[i](t+1) = \min_{\text{all possible } k} \bigl( \mathrm{PM}[k](t) + \mathrm{BM}[k][i](t) \bigr) \qquad (1)$$

PM[k](t): Path metric corresponding to state k at instant t

BM[k][i](t): Branch metric of the transition from state k at t to state i at t+1
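As a minimal sketch of how the three units cooperate, the following hard-decision decoder applies equation (1) at every step. It assumes the (7, 5) octal generators, Hamming distance as the branch metric, and a full traceback at the end of the block; none of these choices is stated on the slides, and all names are illustrative.

```python
def viterbi_decode(received, n_bits, generators=(0b111, 0b101), constraint_length=3):
    """Hard-decision Viterbi decoding of a rate-1/n code (ASSUMED generators)."""
    n_states = 1 << (constraint_length - 1)
    n_out = len(generators)
    inf = float("inf")

    def branch_output(prev_state, bit):
        # code symbols emitted on the branch leaving prev_state with input bit
        reg = (prev_state << 1) | bit
        return [bin(reg & g).count("1") % 2 for g in generators]

    pm = [0] + [inf] * (n_states - 1)      # encoder is assumed to start in state 0
    decisions = []                         # survivor memory (SMU): predecessor per state
    for t in range(n_bits):
        symbols = received[t * n_out:(t + 1) * n_out]
        new_pm = [inf] * n_states
        step = [0] * n_states
        for i in range(n_states):
            bit = i & 1                    # input bit that leads into state i
            # the two possible predecessors k of state i
            for k in (i >> 1, (i >> 1) + n_states // 2):
                # BMU: Hamming distance between branch output and received symbols
                bm = sum(a != b for a, b in zip(branch_output(k, bit), symbols))
                # ACSU: add, compare, select, as in equation (1)
                cand = pm[k] + bm
                if cand < new_pm[i]:
                    new_pm[i], step[i] = cand, k
        pm = new_pm
        decisions.append(step)

    # SMU: trace back from the best final state to recover the input bits
    state = min(range(n_states), key=pm.__getitem__)
    bits = []
    for step in reversed(decisions):
        bits.append(state & 1)
        state = step[state]
    return bits[::-1]


clean = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]   # code sequence for input 1 0 1 1 0 0
noisy = clean[:]
noisy[2] ^= 1                                  # inject a single bit error
print(viterbi_decode(noisy, 6))                # [1, 0, 1, 1, 0, 0]
```

A hardware decoder would bound the traceback depth and keep the path metrics in fixed-width registers; the sketch ignores both to keep the recursion visible.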

[Block diagram: Input -> Branch Metric Unit (BMU) -> ACS Unit -> Survivor-Path Memory Unit (SMU) -> Output]

Fig.3 Basic computation units in Viterbi decoder


In-place Path Metric Updating
Efficiently saves half of the memory size.
  State i           -> State 2i    (overwrites the previous metric of state i)
  State i + 2^(v-1) -> State 2i+1  (overwrites the previous metric of state i + 2^(v-1))
Fig. 3. Partial trellis diagram or butterfly for
in-place computation of updated path metrics.

       (a)                     (b)
State  State     State  State  State  State
0      0         0      0      0      0
1      1         1      2      4      1
2      2         2      4      1      2
3      3         3      6      5      3
4      4         4      1      2      4
5      5         5      3      6      5
6      6         6      5      3      6
7      7         7      7      7      7
Fig. 4. Example for v=3: (a) butterflies in the traditional approach;
(b) states and butterflies during one full cycle of in-place computation
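The state ordering in Fig. 4(b) can be reproduced with a short simulation. The sketch below tracks only which state's metric occupies each memory location, not the metrics themselves. It assumes, as in the Fig. 4 example, that v = 3 corresponds to 8 states and that each butterfly writes states 2i and 2i+1 back to the locations it read states i and i + 2^(v-1) from. The function name is hypothetical.

```python
def in_place_schedule(v):
    """List, per iteration, the state whose metric sits at each memory location.

    ASSUMPTION: 2^v states, as in the v = 3 example of Fig. 4.
    """
    n = 1 << v
    half = n >> 1
    contents = list(range(n))          # contents[loc] = state stored at loc
    history = [contents[:]]
    for _ in range(v):
        # where[s] = memory location currently holding state s
        where = {s: loc for loc, s in enumerate(contents)}
        new = contents[:]
        for i in range(half):
            # butterfly: states i and i + half feed states 2i and 2i + 1,
            # and the results overwrite the two locations just read
            new[where[i]] = 2 * i
            new[where[i + half]] = 2 * i + 1
        contents = new
        history.append(contents[:])
    return history


for column in in_place_schedule(3):
    print(column)
# [0, 1, 2, 3, 4, 5, 6, 7]
# [0, 2, 4, 6, 1, 3, 5, 7]
# [0, 4, 1, 5, 2, 6, 3, 7]
# [0, 1, 2, 3, 4, 5, 6, 7]
```

After v iterations the states return to their original locations, which is why one full cycle is enough to describe the addressing pattern.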
  State i      -> State 2i
  State i + 32 -> State 2i+1
Figure 5. The diagram of BF unit

Table 1. State arrangement and path metric updating for constraint length 7 (64 states)

Figure 6. A novel architecture for the Viterbi decoder

Table 2. Address scrambling of path metric memory for constraint length 7 (64 states)
Cycle                 0 1 2 3 4 5 6 7
Iteration 0
  Address (DpRAM0-3)  0 1 2 3 4 5 6 7
  Address (DpRAM4-7)  0 1 2 3 4 5 6 7
Iteration 1
  Address (DpRAM0-3)  0 2 4 6 1 3 5 7
  Address (DpRAM4-7)  1 3 5 7 0 2 4 6
Insert Pipeline Levels into ACS
Generally, the maximum number of ACS pipeline levels depends only on the
ratio N/P (N: number of states; P: number of ACS units).

Table 3. The maximum pipeline levels for N/P from 1 to 64
N/P                  1  2  4  8  16  32  64
ACS pipeline levels  1  1  2  5  10  20  40
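To see what these pipeline levels buy, the short calculation below evaluates the speed ratio LP/N quoted in the conclusion for each column of Table 3 and compares it with the unpipelined ratio P/N. Reading L as the number of ACS pipeline levels is an interpretation of the slides, and the names used are illustrative.

```python
from fractions import Fraction

# N/P -> maximum ACS pipeline levels, copied from Table 3
TABLE_3 = {1: 1, 2: 1, 4: 2, 8: 5, 16: 10, 32: 20, 64: 40}


def relative_speed(n_over_p, levels):
    """Decoding speed relative to a state-parallel ACS, taken as L*P/N."""
    return Fraction(levels, n_over_p)


for n_over_p, levels in TABLE_3.items():
    pipelined = relative_speed(n_over_p, levels)
    unpipelined = Fraction(1, n_over_p)        # P/N without extra pipeline levels
    print(f"N/P={n_over_p:2d}  L={levels:2d}  LP/N={pipelined}  P/N={unpipelined}")
```

For N/P of 8 and above, the table entries give LP/N = 5/8 exactly, which matches the figure stated in the conclusion.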

Figure 7. A simple example of inserting pipeline levels into ACS unit

[Diagram: two adders compute PM[k](t) + BM[k][i](t) and PM[j](t) + BM[j][i](t); a comparator and a selector pass the smaller sum on as PM[i](t+1)]
Conclusion

Assuming the pipeline levels are equally distributed in the ACS,
the decoding speed is LP/N ≈ 5/8 of that of a state-parallel ACS
instead of P/N (L: number of ACS pipeline levels).

The maximum possible area saving can be obtained by
selecting a large enough ratio N/P.

A favorable solution for applications where area saving,
and hence power, is the most crucial concern and a moderate
degradation in decoding speed is acceptable.