
The Evolution of DSP Processors

Berkeley Design Technology, Inc.
2107 Dwight Way, Second Floor
Berkeley, California U.S.A.
+1 (510) 665-1600
info@BDTI.com
http://www.BDTI.com

Copyright © 1998 Berkeley Design Technology, Inc.

Outline

• DSP applications
• Digital filtering as a motivating problem
• The first generation of DSPs, with an example
• Comparison of DSP processors to general-purpose processors
• DSP evolution continues... later-generation DSPs and alternatives
• Modern DSP-enhanced general-purpose processors
• Benchmark results
• Conclusions

© 2000 Berkeley Design Technology, Inc.

Who Cares?

• DSP is a key enabling technology for many types of electronic products
• DSP-intensive tasks are the performance bottleneck in many computer applications today
• Computational demands of DSP-intensive tasks are increasing very rapidly
• In many embedded DSP applications, general-purpose microprocessors are not competitive with DSP-oriented processors today
• 2000 market for DSP processors: US $6.2 billion (2x 1998)

Example DSP Applications

• Digital cell phones
• Automated inspection
• Vehicle collision avoidance
• Voice-over-Internet
• Motor control
• Consumer audio
• Voice mail
• Navigation equipment
• Audio production
• Videoconferencing
• Toys, games consoles
• Music synthesis, effects
• Satellite communications
• Seismic analysis
• Secure communications
• Tapeless answering machines
• Sonar
• Cordless phones
• Digital cameras
• Modems (POTS, ISDN, cable, ...)
• Noise cancellation
• Medical ultrasound
• Patient monitoring
• Radar

And more to come...

This is Your Palm Pilot

[Image]

This is Your Palm Pilot... On DSP

[Image; slide text: "Hello, Dave. You have a meeting in 10 minutes."]

Today's DSP "Killer Apps"

• In terms of dollar volume, the biggest markets for DSP processors today include:
  - Digital cellular telephony
  - Pagers and other wireless systems
  - Modems
  - Disk drive servo control
• Of these applications:
  - Most demand good performance
  - All demand low cost
  - Many demand high energy efficiency
• Trends are toward better support for these (and similar) major applications

DSP Tasks for Microprocessors

• Speech and audio compression
• Filtering
• Modulation and demodulation
• Error correction coding and decoding
• Servo control
• Audio processing (e.g., surround sound, noise reduction, equalization, sample rate conversion, echo cancellation)
• Signaling (e.g., DTMF)
• Speech recognition
• Signal synthesis (e.g., music, speech)

What Do DSP Processors Need to Do Well?

Most DSP tasks require:
• Repetitive numeric calculations
• Attention to numeric fidelity
  - Fixed- vs. floating-point
  - Standards
• High memory bandwidth
  - Streaming data
• Real-time processing

Processors must perform these tasks efficiently while minimizing:
• Cost
• Power consumption
• Memory use
• Development time

A Motivating Example: FIR Filtering

[Figure: tapped delay line. Input samples x pass through a chain of unit delays D; at each "tap" the delayed sample is multiplied by a coefficient h_0, h_1, ..., h_M, and the products are summed to form the output y.]

$$y(n) = \sum_{k=0}^{M} h_k \, x(n-k)$$

x = input samples
y = output samples
h = filter coefficients
D = unit time delay

Each tap (M+1 taps total) nominally requires:
• Two data fetches
• Multiply
• Accumulate
• Memory write-back to update delay line
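The per-tap work listed above maps directly onto a loop. The following is a minimal C sketch, not code from the original slides: the function name `fir_step`, its arguments, and the use of `double` instead of the 16-bit fixed-point arithmetic a first-generation DSP would use are assumptions made for readability. The delay line is stored so that `delay[k]` holds x(n-k), and each iteration performs the two fetches, the multiply, the accumulate, and the write-back that ages the delay line by one sample.

```c
#include <stddef.h>

/* Direct-form FIR filter producing one output sample per call.
 * delay[k] holds x(n-k); h[k] holds coefficient h_k; ntaps = M + 1 (>= 1).
 * Hypothetical interface: the caller owns and zero-initializes delay[]. */
double fir_step(double x_new, double *delay, const double *h, size_t ntaps)
{
    double acc = 0.0;

    delay[0] = x_new;                      /* newest sample enters the line */

    /* Walk from the oldest tap to the newest so that each sample can be
     * moved one slot down the line right after it has been used. */
    for (size_t k = ntaps - 1; k > 0; k--) {
        acc += h[k] * delay[k];            /* two fetches, multiply, accumulate */
        delay[k] = delay[k - 1];           /* write-back: update the delay line */
    }
    acc += h[0] * delay[0];                /* tap 0 needs no shift */
    return acc;
}
```

Feeding a single 1.0 followed by zeros returns h_0, h_1, ..., h_M, one per call (the filter's impulse response), and then 0.0.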

FIR Filter on Von Neumann Architecture

    loop:  mov  *r0,x0
           mov  *r1,y0
           mpy  x0,y0,a
           add  a,b
           mov  y0,*r2
           inc  r0
           inc  r1
           inc  r2
           dec  ctr
           tst  ctr
           jnz  loop

(Computes one tap per loop iteration)

[Figure: memory structure. A single memory connected to the data path.]

Problems:
• Memory bandwidth bottleneck
• Control code and addressing overhead
• Possibly slow multiply

First-Generation DSP (1982): Texas Instruments TMS32010

[Figure: memory structure. Separate program and data memories connected to a data path containing a multiplier, a register, and an accumulator.]

• 16-bit fixed-point ALU
• Harvard architecture
• Accumulator
• Specialized instruction set
• 390 ns MAC time (228 ns today)

TMS32010 Filter Code

    LT   X4   ;Load T with x(n-4)
    MPY  H4   ;P = H4*X4
    LTD  X3   ;Load T with x(n-3); x(n-4) = x(n-3);
              ;Acc = Acc + P
    MPY  H3   ;P = H3*X3
    LTD  X2
    MPY  H2
    etc.

• Two instructions per tap, but requires loop unrolling

Features Common to Most DSP Processors

• Data path configured for DSP
• Specialized instruction set
• Multiple memory banks and buses
• Specialized addressing modes
• Specialized execution control
• Specialized peripherals for DSP
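The LTD instruction is what lets the TMS32010 get by with two instructions per tap: in one step it adds the previous product to the accumulator, loads T with the next sample, and performs the delay-line data move. The C model below is a hypothetical illustration of that behavior, not TI code. The register struct, the helper names `lt`, `mpy`, and `ltd`, the fixed 5-tap layout, and passing coefficients by value are all assumptions, and register widths are simplified. As in the listing, the oldest tap uses LT because x(n-4) is discarded and does not need to be moved.

```c
#include <stdint.h>

/* Hypothetical model of three TMS32010 registers (widths simplified). */
typedef struct {
    int16_t t;      /* T: multiplicand register */
    int32_t p;      /* P: product register */
    int32_t acc;    /* accumulator */
} c10_regs;

/* LT addr: load T from data memory. */
static void lt(c10_regs *r, const int16_t *mem, int addr) { r->t = mem[addr]; }

/* MPY: P = T * coefficient. The real instruction reads the coefficient
 * from data memory; passing it by value is a simplification. */
static void mpy(c10_regs *r, int16_t coef) { r->p = (int32_t)r->t * coef; }

/* LTD addr: ACC += P, T = mem[addr], mem[addr + 1] = mem[addr]. */
static void ltd(c10_regs *r, int16_t *mem, int addr)
{
    r->acc += r->p;            /* accumulate the previous product */
    r->t = mem[addr];          /* load the next sample */
    mem[addr + 1] = mem[addr]; /* data move: age the delay line */
}

/* 5-tap filter, fully unrolled: mem[k] holds x(n-k), h[k] holds h_k. */
int32_t fir5_unrolled(int16_t *mem, const int16_t *h, int16_t x_new)
{
    c10_regs r = {0, 0, 0};
    mem[0] = x_new;
    lt(&r, mem, 4);  mpy(&r, h[4]);  /* oldest sample: no data move */
    ltd(&r, mem, 3); mpy(&r, h[3]);
    ltd(&r, mem, 2); mpy(&r, h[2]);
    ltd(&r, mem, 1); mpy(&r, h[1]);
    ltd(&r, mem, 0); mpy(&r, h[0]);
    r.acc += r.p;                    /* final add of the last product */
    return r.acc;
}
```

Because the data move is folded into LTD, no separate write-back instructions are needed; the price is that the loop must be unrolled, since each instruction names a fixed address.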

Data Path Comparison

DSP Processor:
• Specialized hardware performs all key arithmetic operations in 1 cycle
• Hardware support for managing numeric fidelity:
  - Shifters
  - Guard bits
  - Saturation

General-Purpose Processor:
• Multiplies often take >1 cycle
• Shifts often take >1 cycle
• Other operations (e.g., saturation, rounding) typically take multiple cycles

Instruction Set Comparison

DSP Processor:
• Specialized, complex instructions
• Multiple operations per instruction

    mac  x0,y0,a  x:(r0)+,x0  y:(r4)+,y0

General-Purpose Processor:
• General-purpose instructions
• Typically only one operation per instruction

    mov  *r0,x0
    mov  *r1,y0
    mpy  x0,y0,a
    add  a,b
    mov  y0,*r2
    inc  r0
    inc  r1
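The numeric-fidelity hardware listed for DSPs (guard bits, saturation, rounding) is what a general-purpose processor has to emulate with extra instructions. Below is a minimal C sketch of that emulation, assuming Q15 fixed-point data (16-bit values representing fractions in [-1, 1)). The function names are hypothetical, and `int64_t` simply stands in for an accumulator with guard bits rather than modeling any particular processor's accumulator width.

```c
#include <stdint.h>

/* Saturating 16-bit add: clamp to the int16_t range instead of wrapping. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;               /* exact in 32 bits */
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Q15 dot product with a wide accumulator. Bits above bit 31 act as guard
 * bits, so intermediate sums may exceed 32 bits without overflowing. The
 * Q30 result is rounded to Q15 and saturated once at the end.
 * Assumes >> on a negative value is an arithmetic shift, as on mainstream
 * compilers (the C standard leaves it implementation-defined). */
int16_t dot_q15(const int16_t *x, const int16_t *h, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * h[i];          /* Q15 * Q15 = Q30 */
    acc = (acc + (1 << 14)) >> 15;            /* round to nearest Q15 */
    if (acc > INT16_MAX) return INT16_MAX;    /* saturate, don't wrap */
    if (acc < INT16_MIN) return INT16_MIN;
    return (int16_t)acc;
}
```

For example, four products of 32767 × 32767 sum to more than a 32-bit accumulator can hold, but the wide accumulator keeps the sum and the final result saturates to 32767 (just under 1.0) rather than wrapping to a negative value. On a DSP, the rounding and saturation steps cost no extra cycles.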

Memory Architecture Comparison

DSP Processor:
• Harvard architecture
• 2-4 memory accesses per cycle
• No caches; on-chip SRAM

General-Purpose Processor:
• Von Neumann architecture
• Typically 1 access per cycle
• May use caches

[Figure: the DSP processor connects to separate program and data memories; the general-purpose processor connects to a single memory.]

Addressing Comparison

DSP Processor:
• Dedicated address-generation units
• Specialized addressing modes
  - Autoincrement
  - Modulo (circular)
  - Bit-reversed (for FFT)

General-Purpose Processor:
• Often, no separate address-generation units
• General-purpose addressing modes
  - Favor compiler-generated code
• Good immediate data support
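Modulo and bit-reversed addressing come for free in a DSP's address-generation units but cost explicit instructions elsewhere. The C functions below sketch only the index arithmetic those modes perform; the names `mod_inc` and `bit_reverse` are hypothetical, and `mod_inc` assumes the index is already inside the buffer and the step is no larger than the buffer length.

```c
/* Modulo (circular) post-increment: advance an index by `step` and wrap
 * at `len`. Assumes idx < len and step <= len. */
unsigned mod_inc(unsigned idx, unsigned step, unsigned len)
{
    idx += step;
    return (idx >= len) ? idx - len : idx;
}

/* Bit-reversed addressing: reverse the low `bits` bits of i. For an
 * N = 2^bits point radix-2 FFT, this is the order in which a
 * decimation-in-time FFT takes its inputs (or a decimation-in-frequency
 * FFT produces its outputs). */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);   /* shift the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}
```

For an 8-point FFT (bits = 3), `bit_reverse` maps indices 0 through 7 to 0, 4, 2, 6, 1, 5, 3, 7. The loop above takes several instructions per address on a general-purpose processor; a DSP produces the same sequence as a side effect of a memory access.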

Execution Control

DSP Processor:
• Hardware support for fast looping
• "Fast interrupts" for I/O handling
• Real-time debugging support

General-Purpose Processor:
• Loops implemented in software
  - Pipelines can increase cost of loops
• Interrupt overhead can be large for simple interrupts
• On-chip debug; usually not real-time

Specialized I/O for DSP

• Synchronous serial ports
• Parallel ports
• Timers
• On-chip A/D, D/A converters
• Host ports
• Bit I/O ports
• On-chip DMA controller
• Clock generators
• On-chip peripherals often designed for "background" operation, even when core is powered down

Summary of DSP Attributes

• Computational demands: multiple parallel execution units, hardware acceleration of common DSP functions
• Numeric fidelity: accumulator registers, guard bits, saturation hardware
• High memory bandwidth: Harvard architecture, support for parallel moves
• Predictable data access patterns: specialized addressing modes, e.g., modulo addressing, bit-reversed addressing

Summary of DSP Attributes

• Execution-time locality: hardware-assisted zero-overhead looping, specialized instruction caches, streamlined interrupt handling
• MAC-centricity: single-cycle multiplier(s) or MAC unit(s), MAC instruction
• Streaming data: no data cache; powerful DMA
• Real-time constraints: few dynamic features, on-chip RAM instead of cache
• Standards: rounding, saturation

Second-Generation DSPs (1987-): Motorola DSP56001

• 24-bit data, instructions
• 3 memory spaces (X, Y, P)
• Single- and multi-instruction hardware loops
• Modulo addressing
• 75 ns MAC (21 ns today)

[Figure: memory structure. Separate P (program), X, and Y memories connected to the data path.]

    move  #Xaddr,r0
    move  #Haddr,r4
    rep   #Ntaps
    mac   x0,y0,a  x:(r0)+,x0  y:(r4)+,y0

• Other 2nd-generation processors: Analog Devices ADSP-2100, TI TMS320C50

Low-Cost GPP vs. Low-Cost DSP

[Chart: speed in BDTImarks™. ARM7TDMI (80 MIPS): 7; ADSP-218x (75 MIPS): 19.]

Note that MIPS ≠ Performance!
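In the DSP56001 kernel, a single `mac` instruction inside a `rep` loop multiplies, accumulates, fetches the next data and coefficient values, and post-increments both address registers; with modulo addressing on r0, the delay line never has to be shifted. The C sketch below spells out those steps, one tap per iteration. It is an illustration under assumptions: the struct `circ_fir`, the function `circ_fir_step`, and the use of `double` are hypothetical, and C performs one after another what the hardware does in parallel within one instruction.

```c
#include <stddef.h>

/* Circular delay line: buf[head] is x(n), buf[head+1] is x(n-1), ...,
 * with indices wrapping modulo ntaps (ntaps >= 1). Hypothetical layout. */
typedef struct {
    double *buf;     /* ntaps samples ("X memory") */
    size_t ntaps;
    size_t head;     /* index of the newest sample */
} circ_fir;

double circ_fir_step(circ_fir *f, const double *h, double x_new)
{
    /* Step head backward with wrap; the new sample overwrites the oldest. */
    f->head = (f->head == 0) ? f->ntaps - 1 : f->head - 1;
    f->buf[f->head] = x_new;

    double a = 0.0;          /* accumulator a */
    size_t r0 = f->head;     /* data pointer, modulo ntaps */
    size_t r4 = 0;           /* coefficient pointer ("Y memory") */
    for (size_t k = 0; k < f->ntaps; k++) {        /* rep #Ntaps */
        a += f->buf[r0] * h[r4];                   /* mac x0,y0,a */
        r0 = (r0 + 1 == f->ntaps) ? 0 : r0 + 1;    /* x:(r0)+ with wrap */
        r4++;                                      /* y:(r4)+ */
    }
    return a;
}
```

Compared with physically shifting the delay line, only one memory write per output sample is needed: the new sample replaces the oldest one, and the wrap-around in the address register does the rest.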

Third Generation (1995)
Ex: Motorola DSP56301, TI TMS320C541

• Enhanced conventional DSP architectures
• 3.0 or 3.3 volts
• More on-chip memory
• Application-specific function units in data path or as co-processors
• More sophisticated debugging and application development tools
• DSP cores (Pine, Oak from DSP Group, cDSP from TI)
• 20 ns MAC (10 ns today)
• Architectural innovation mostly limited to adding application-specific function units and miscellaneous minor refinements
• Also, multiple processors on a chip (TI TMS320C80, Motorola MC68356)

Fourth Generation (1997-2000)
Ex: TMS320C6201/6701, LSI401Z, MMX Pentium

Today's top DSP performers adopt architectures far different from conventional DSP processor designs:
• SIMD
  - Single instruction, multiple data (e.g., MMX, AltiVec, MDMX)
• VLIW
  - "Very long instruction word"
  - Compile-time scheduling and parallel execution of multiple simple instructions (e.g., TMS320C6201/C6701)
• Superscalar
  - Run-time scheduling and execution of >1 (usually 2-4) instructions per cycle (e.g., Pentium, PowerPC, ZSP164xx)
• User-defined instructions

General-Purpose Processors Add DSP

SIMD
Single Instruction, Multiple Data

• Virtually all high-performance CPUs (and some modern DSPs) support SIMD operations
• One SIMD instruction performs the same operation on multiple (independent) sets of data
• For each SIMD instruction, you can get 2x (or 4x, or 8x, ...) the work
• Two ways to implement SIMD
  - Split execution units
  - Multiple execution units (or data paths) operating in lock-step

SIMD
Split Execution Unit

[Figure: two 32-bit input registers, each holding two 16-bit values. The split execution unit applies the same operation (+, −, or ×) to each pair of 16-bit values, and one 32-bit output register holds the two 16-bit results.]

• May support multiple data widths (e.g., 16-bit and 8-bit)

SIMD Characteristics

• Each instruction performs lots of work
• Algorithms, data organization must be amenable to data-parallel processing
  - Programmers must be creative, and sometimes pursue alternative algorithms
  - Reorganization penalties can be significant
• Most effective on algorithms that process large blocks of data
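A split execution unit is, in effect, an ordinary adder whose carry chain is cut between lanes. The sketch below emulates a 2 × 16-bit add inside a 32-bit word in portable C (a technique often called SWAR, "SIMD within a register"). The function names are hypothetical; real SIMD extensions do this in a single instruction (MMX's PADDW, for example, adds four packed 16-bit values in a 64-bit register).

```c
#include <stdint.h>

/* Pack two 16-bit lanes: hi in bits 31..16, lo in bits 15..0. */
uint32_t pack2x16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Two independent 16-bit adds in one 32-bit word. Each lane wraps
 * modulo 2^16; the carry out of the low lane must not reach the high lane. */
uint32_t add2x16(uint32_t a, uint32_t b)
{
    /* Low lane: the low 16 bits of a full add are already correct. */
    uint32_t lo = (a + b) & 0x0000FFFFu;
    /* High lane: add with the low halves cleared, so no carry can enter;
     * any carry out of bit 31 is discarded by unsigned wraparound. */
    uint32_t hi = (a & 0xFFFF0000u) + (b & 0xFFFF0000u);
    return hi | lo;
}
```

For example, adding pack2x16(1, 0xFFFF) and pack2x16(1, 1) with `add2x16` gives pack2x16(2, 0); a plain 32-bit add of the same words would give pack2x16(3, 0), because the carry out of the low lane spills into the high lane.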

SIMD Challenges

• Loss of generality
  - Each iteration of a loop processes N elements (typically 4 ≤ N ≤ 8)
  - Amplified if loops are unrolled for speed
• High program memory usage
  - Re-arranging data for SIMD processing
  - Merging partial results
  - Loop unrolling
• Often, only fixed-point supported

High-Performance GPPs with SIMD

• Most high-performance GPPs targeting desktop applications are superscalar architectures
  - Pentium, PowerPC
• Often have many dynamic features to accelerate performance, enable higher clock speeds
  - Sophisticated, multi-level caches
  - Branch prediction
  - Speculative execution
• Most offer SIMD extensions to increase performance on DSP and multimedia applications (audio, video)
  - MMX/SSE, AltiVec
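Two of the SIMD challenges listed above, loops that only handle multiples of N elements and the need to merge partial results, show up even in a simple dot product. The C sketch below organizes the loop the way a SIMD version would, with an array of per-lane partial sums standing in for a SIMD register. It uses no real SIMD intrinsics, and `LANES` (set to 4 here) and the function name are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4   /* hypothetical SIMD width N */

/* Dot product organized like a SIMD loop: a main loop over whole groups
 * of LANES elements, a merge of the per-lane partial sums, and a scalar
 * cleanup loop for any leftover elements. */
int64_t dot_simd_style(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t lane[LANES] = {0};            /* one partial sum per lane */
    size_t i = 0;

    for (; i + LANES <= n; i += LANES)    /* each iteration: LANES elements */
        for (size_t k = 0; k < LANES; k++)
            lane[k] += (int32_t)x[i + k] * h[i + k];

    int64_t sum = 0;
    for (size_t k = 0; k < LANES; k++)    /* merge partial results */
        sum += lane[k];

    for (; i < n; i++)                    /* leftovers when n % LANES != 0 */
        sum += (int32_t)x[i] * h[i];
    return sum;
}
```

The merge and cleanup loops are pure overhead relative to a scalar loop, which is one source of the extra program memory the slide mentions; they matter less as blocks get larger.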

High-Performance GPPs with SIMD

• These processors can often execute DSP tasks faster than DSP processors
• So why do people still use DSPs?
  - Price
  - Power consumption
  - Availability of off-the-shelf DSP software
  - DSP-oriented development tools
  - DSP-oriented on-chip integration
  - Execution-time predictability is especially problematic with high-performance GPPs

Hybrid DSP/Microcontrollers

• GPPs designed for embedded applications are starting to address DSP needs
• Embedded GPPs typically don't have the advanced features that affect execution-time predictability, so they are easier to use for DSP
• There is a wide variety of approaches to combining DSP and microcontroller functionality

Hybrid DSP/Microcontrollers
Approaches

• Multiple processors on a die
  - e.g., Motorola DSP5665x
• DSP co-processor
  - e.g., Massana FILU-200
• DSP brain transplant in existing µC
  - e.g., SH-DSP
• Microcontroller tweaks to existing DSP
  - e.g., TMS320C27xx
• Totally new design
  - e.g., TriCore

Hybrid DSP/Microcontrollers
Advantages, Disadvantages

• Multiple processors on a die
  - Two entirely different instruction sets, debugging tools, etc.
  - Both cores can operate in parallel
  - No resource contention...
  - ...but probably resource duplication

Hybrid DSP/Microcontrollers
Advantages, Disadvantages

• DSP co-processor
  - May result in complicated programming model
    - Dual instruction sets
    - Possible deadlocks
  - Transferring data between the host and the co-processor may be time-consuming
  - Both cores can operate in parallel

Hybrid DSP/Microcontrollers
Advantages, Disadvantages

• DSP brain transplant in existing µC; microcontroller tweaks to existing DSP
  - Simpler programming model than dual cores
  - Subject to constraints imposed by "legacy" architecture
  - Allows code re-use
• Totally new design
  - Avoids legacy constraints
  - May result in a cleaner architecture
  - Adopting a totally new architecture can be risky

Processor DSP Speed: BDTImarks (Higher is Better)

[Chart, logarithmic scale from 0.1 to 1000 BDTImarks:
1st gen (1982), TMS32010, 5 MIPS: 0.5
2nd gen (1987), DSP56001, 13 MIPS: 4
3rd gen (1995), TMS320C54x, 50 MIPS: 13
4th gen (2000), TMS320C6201, 250 MHz: 123]

Processor DSP Speed: BDTImarks (Higher is Better)

[Same chart, with an added "Speech" annotation.]

Processor DSP Speed: BDTImarks (Higher is Better)

[Same chart, with "Speech" and "2G" Wireless annotations.]

Processor DSP Speed: BDTImarks (Higher is Better)

[Same chart, with "Speech", "2G" Wireless, and Audio annotations.]

Processor DSP Speed: BDTImarks (Higher is Better)

[Same chart, with "Speech", "2G" Wireless, Audio, and "3G" Wireless annotations.]

Execution Times
Complex Block FIR Filter Benchmark (lower is better)
High-Performance DSPs vs. High-Performance CPU

[Chart: execution time in microseconds (axis 0 to 16) for the TMS320C6202 (250 MHz), SC140 (300 MHz), ADSP-21160 (100 MHz), TMS320C6701 (167 MHz), and Pentium III (1000 MHz); the legend distinguishes fixed-point results from floating-point results.]

Conclusions

• DSP processor performance has increased by about 150x over the past 15 years (~40% per year)
• Multi-issue architectures dominate the field of new high-performance processors
  - But conventional DSPs still make up most of the volume shipping today
• General-purpose processors are increasingly tackling DSP, providing competition for dedicated DSP processors
• Users of processors for DSP will have an expanding array of choices
• Compiler-friendliness is an increasingly important factor...
  - ... as time-to-market pressures increase and applications become larger
• Selecting processors requires careful, application-specific analysis (see www.BDTI.com)

For More Information

• http://www.BDTI.com: collection of BDTI's papers on DSP processors, tools, and benchmarking
• http://www.eg3.com/dsp: links to other good DSP sites
• comp.dsp: Usenet group
• Microprocessor Report: for info on newer DSPs
• DSP Processor Fundamentals: BDTI textbook on DSP processors

Or, join BDTI... We're hiring!
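The two growth figures in the first conclusion (about 150x over 15 years, and about 40% per year) are consistent under compound growth: an annual factor of 1.4 sustained for 15 years gives 1.4^15 ≈ 156. The short C sketch below checks this arithmetic. The helper names are hypothetical, and the annual rate is found by bisection only to avoid depending on the math library.

```c
/* Performance ratio after `years` of constant annual growth. */
double compound(double annual_rate, int years)
{
    double f = 1.0;
    for (int y = 0; y < years; y++)
        f *= 1.0 + annual_rate;
    return f;
}

/* Solve compound(rate, years) == ratio for rate by bisection.
 * Assumes ratio >= 1 and years >= 1, so the rate lies in [0, ratio - 1]. */
double annual_rate(double ratio, int years)
{
    double lo = 0.0, hi = ratio - 1.0;
    for (int i = 0; i < 100; i++) {
        double mid = 0.5 * (lo + hi);
        if (compound(mid, years) < ratio)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

With these helpers, annual_rate(150.0, 15) comes out at about 0.397, i.e., roughly 40% per year, and compound(0.40, 15) is about 155.6.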
