Professional Documents
Culture Documents
2010 11 15
These slides use Microsoft clip arts. Microsoft copyright restrictions apply.
fakultt fr informatik
- 2-
- 3-
0 Instr. A Instr. B
1 Instr. C
1 Instr. D
0 Instr. E
1 Instr. F
1 Instr. G
Cycle 1 2 3
technische universitt dortmund
Instruction A B E C F
fakultt fr informatik
D G
Instructions B, C and D use disjoint functional units, cross paths and other data path resources. The same is also true for E, F and G.
- 4-
fakultt fr informatik
- 5-
instruc 1
instruc 2
instruc 3
template
Instruction There are 5 instruction types: grouping A: common ALU instructions I: more special integer instructions (e.g. shifts) information M: Memory instructions F: floating point instructions B: branches The following combinations can be encoded in templates: MII, MMI, MFI, MIB, MMB, MFB, MMF, MBB, BBB, MLX with LX = move 64-bit immediate encoded in 2 slots
technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 6-
MMI
M_II
MFI_
MII
Group 1
Group 2
Very restricted placement of stops within bundle. Parallel execution within groups possible. Parallel execution can span several bundles
technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 7-
There are 4 functional unit (FU) types: M: Memory Unit I: Integer Unit F: Floating-Point Unit B: Branch Unit Instruction types corresponding FU type, except type A (mapping to either I or M-functional units).
fakultt fr informatik
- 8-
L3 cache
[ftp://download.intel.com/design/itaniu m2/download/madison_slides_r1.pdf] technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
Intel, 2003
- 9-
Philips TriMediaProcessor
For For multimediamultimediaapplications, applications, up to 5 up to 5 instructions/ instructions/ cycle. cycle.
- 10 -
fakultt fr informatik
- 11 -
fakultt fr informatik
- 12 -
add sub ld
sub mult st
and xor mv
or div beq
The execution of many instructions has been started before it is realized that a branch was required. Nullifying those instructions would waste compute power Executing those instructions is declared a feature, not a bug. How to fill all delay slots with useful instructions? Avoid branches wherever possible.
technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 13 -
fakultt fr informatik
- 14 -
Predicated execution [c] ADD x,y,a || [c] ADD x,z,b || [!c] SUB x,y,a || [!c] SUB x,z,b
1 cycle
p.marwedel, informatik 12, 2010
- 15 -
Microcontrollers - MHS 80C51 as an example Features for Embedded Systems 8-bit CPU optimised for control applications Extensive Boolean processing capabilities 64 k Program Memory address space 64 k Data Memory address space 4 k bytes of on chip Program Memory 128 bytes of on chip data RAM 32 bi-directional and individually addressable I/O lines Two 16-bit timers/counters Full duplex UART 6 sources/5-vector interrupt structure with 2 priority levels On chip clock oscillators Very popular CPU with many different variations
technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 16 -
fakultt fr informatik
- 17 -
http://www.mpsoc-forum.org/2007/slides/Hattori.pdf
- 18 -
- 19 -
fakultt fr p.marwedel, ~50% inherent power efficiency12, 2010 informatik informatik of silicon
- 20 -
2010 06 12
fakultt fr informatik
- 22 -
Reconfigurable Logic
Full custom chips may be too expensive, software too slow. Combine the speed of HW with the flexibility of SW HW with programmable functions and interconnect. Use of configurable hardware; common form: field programmable gate arrays (FPGAs) Applications: bit-oriented algorithms like encryption, fast object recognition (medical and military) Adapting mobile phones to different standards. Very popular devices from XILINX (XILINX Vertex II are recent devices) Actel, Altera and others
technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 23 -
- 24 -
fakultt fr informatik
- 25 -
fakultt fr informatik
- 26 -
Virtex 5 SliceM
SliceM supports using memories for storing data and as shift registers
fakultt fr informatik
- 27 -
[ and source: Xilinx Inc.: Virtex 5 FPGA User Guide, May, 2009 //www.xilinx.com] technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 28 -
fakultt fr informatik
- 29 -
[ and source: Xilinx Inc.: Virtex-II Pro Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com] technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 30 -
Memory
Peter Marwedel Informatik 12 TU Dortmund Germany 2009/11/22
Memory
Memories?
Oops! Memories!
For the memory, efficiency is again a concern: speed (latency and throughput); predictable timing energy efficiency size cost other attributes (volatile vs. persistent, etc)
technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 32 -
Access times and energy consumption increases with the size of the memory
Example (CACTI Model): "Currently, the size of some applications is doubling every 10 months"
[STMicroelectronics, Medea+ Workshop, Stuttgart, Nov. 2003]
fakultt fr informatik
- 33 -
Area (2x106)
14 12 10 8
Power (W)
GP6M2
GP6M3
fakultt fr informatik
- 34 -
Memory system frequently consumes >50 % of the energy used for processing
29%
Proc. Energy
51,9%
I-Cache Energy
28,1%
D-Cache Energy Main Mem. Energy
14,8%
[M. Verma, P. Marwedel: Advanced Memory Optimization Techniques for Low-Power Embedded Processors, Springer, 2007] technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 35 -
Icache 26%
EBOX 8%
Ibox 18%
D Cache 19%
Dcache 16%
Strong ARM
IEEE Journal of SSC Nov. 96
CP 15 2% PATag RAM 1%
D MMU 5%
[Based on slide by and : Osman S. Unsal, Israel Koren, C. Mani Krishna, Csaba Andras Moritz, U. of Massachusetts, Amherst, 2001] technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 36 -
[O. Vargas (Infineon Technologies): Minimum power consumption in mobile-phone memory subsystems; Pennwell Portable Design - September 2005;] Thanks to Thorsten Koch (Nokia/ Univ. Dortmund) for providing this source. technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 37 -
Speed
2x every 2 years
.)
Similar problems also for embedded systems & MPSoCs In the future: Memory access times >> processor cycle times Memory wall problem
.a (1.07 p DRAM
1 0 1 2 3 4
5 years
[P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane] technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 38 -
Index $ () way 1
Data
technische universitt dortmund fakultt fr informatik p.marwedel, informatik 12, 2010
- 39 -
Example Example
0
scratch pad memory
FFF..
no tag memory ARM7TDMI cores, wellknown for low power consumption
select
SPM processor
technische universitt dortmund
SPM
fakultt fr informatik
- 40 -
Current
32 Bit-Load Instruction (Thumb)
116
77,2 50,9
82,2 44,4
Prog SPM/ Data Main
1,16 53,1
Prog SPM/ Data SPM
Core+SPM (mA)
fakultt fr informatik
- 41 -
Cache, 2way, 4GB space Cache, 2way, 16 MB space Cache, 2way, 1 MB space
memory size
technische universitt dortmund fakultt fr informatik
- 42 -
- 43 -
Summary
Processing VLIW/EPIC processors MPSoCs FPGAs Memories Small is beautiful (in terms of energy consumption, access times, size)
fakultt fr informatik
- 44 -