Memory requirement
HBM: Memory Solution for Density & Bandwidth-Hungry Processors
< Exa-scale Roadmap >
High-End Graphics
40G/100G Ethernet
Exa-scale HPC
Source : SciDAC,
www.scidacreview.org
High Bandwidth
High Capacity
solution beyond
DDR4/GDDR5/LPDDR4
Scaling Limit
Power Efficiency
solution to meet
solution to meet
< Figure: wire-bond die stack vs. TSV die stack. Labels: Die #1, Die #2, Die Pad, PKG Pad, Bond wire, TSV >
Bottleneck 1) Bandwidth
Config. | DDR3 | TSV(HBM)
IO | 64 DQ | 1024 DQ
Speed | 1.6Gbps | 1~2Gbps
Bandwidth | 64 x 1.6Gbps = 12.8GBps | 1024 x 1~2Gbps = Max 256GBps
Compare | Long Length, RLC increase | Short Length, RLC decrease
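The bandwidth figures in this comparison follow from pin count times per-pin data rate, divided by 8 bits per byte. A minimal sketch of that arithmetic is shown here; the function name `peak_bandwidth_gbytes` is made up for illustration and is not taken from the slides.

```python
def peak_bandwidth_gbytes(dq_pins: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: pins x per-pin rate (Gbit/s) / 8 bits per byte."""
    return dq_pins * gbps_per_pin / 8


# DDR3 case from the slide: 64 DQ at 1.6 Gbps per pin.
ddr3 = peak_bandwidth_gbytes(64, 1.6)

# HBM case from the slide: 1024 DQ at 1 to 2 Gbps per pin.
hbm_low = peak_bandwidth_gbytes(1024, 1.0)
hbm_high = peak_bandwidth_gbytes(1024, 2.0)

print(f"DDR3: {ddr3:.1f} GB/s")
print(f"HBM : {hbm_low:.0f} to {hbm_high:.0f} GB/s")
```

The wide, slow interface is the point of the comparison: the per-pin speed is roughly the same as DDR3, and the gain comes from the 16x wider bus that short TSV connections make practical.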
1. Capacity Limit
TSV - HBM
Image
http://www.shmj.or.jp/english/
packaging/pac90s.html
PKG Size@die
100% (117mm2)
36% (42mm2)
mm2@128GB/s
100% (3744mm2)
1.1% (42mm2)
Power
Consumption*
@128GB/s
100% (6.4W)
51% (3.3W)
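The percentages in this comparison can be recomputed from the absolute figures. The sketch below does that; `baseline` stands for the 100% column, which the extracted slide text does not name, and the reading that 3744mm2 corresponds to 32 baseline packages is an inference from 3744 / 117, not something the slide states.

```python
# Figures as printed on the slide. "baseline" is the unnamed 100% column.
baseline_pkg_mm2 = 117.0
hbm_pkg_mm2 = 42.0
baseline_area_at_128gbs_mm2 = 3744.0
baseline_power_w = 6.4
hbm_power_w = 3.3


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Inferred: how many baseline packages add up to the 128GB/s area figure.
packages_needed = baseline_area_at_128gbs_mm2 / baseline_pkg_mm2

pkg_size_pct = percent(hbm_pkg_mm2, baseline_pkg_mm2)
area_pct = percent(hbm_pkg_mm2, baseline_area_at_128gbs_mm2)
power_pct = percent(hbm_power_w, baseline_power_w)

print(f"baseline packages for 128GB/s: {packages_needed:.0f}")
print(f"package size per die: {pkg_size_pct:.0f}%")
print(f"area at 128GB/s     : {area_pct:.1f}%")
print(f"power at 128GB/s    : {power_pct:.1f}%")
```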
1.0
42%
0.63
0.55
0.32
DDR3 x16
DDR4 x16
GDDR5 x32
HBM
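The chart values can be cross-checked against the percentages quoted in this deck. The sketch below assumes that the four numbers 1.0, 0.63, 0.55 and 0.32 belong, in that order, to DDR3 x16, DDR4 x16, GDDR5 x32 and HBM, and that they are power figures normalized to DDR3; the extracted text does not state this mapping explicitly.

```python
# Assumed mapping of chart values to devices (normalized to DDR3 x16 = 1.0).
relative_power = {
    "DDR3 x16": 1.0,
    "DDR4 x16": 0.63,
    "GDDR5 x32": 0.55,
    "HBM": 0.32,
}


def reduction_pct(new: float, old: float) -> float:
    """Percentage by which `new` is lower than `old`."""
    return 100.0 * (1.0 - new / old)


hbm_vs_gddr5 = reduction_pct(relative_power["HBM"], relative_power["GDDR5 x32"])
hbm_vs_ddr3 = reduction_pct(relative_power["HBM"], relative_power["DDR3 x16"])

print(f"HBM vs GDDR5 x32: {hbm_vs_gddr5:.0f}% lower")
print(f"HBM vs DDR3 x16 : {hbm_vs_ddr3:.0f}% lower")
```

Under that assumed mapping, the two results line up with the 42% label on the chart and the 68% power-efficiency figure in the summary.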
Items
Target
# of Stack
4(Core) + 1(Base)
Ch./Slice
IO/Ch.
128
Total IO/KGSD
1024(=128 x 8)
Address/CMD
Dual CMD
Data Rate
DDR
FBGA
Side Molding
Side Molding
DRAM Slice
DRAM Slice
DRAM Slice
DRAM Slice
DA ball
TSV
PHY
SoC
PHY
Interposer
KGSD
Base Die
DFT Area
TSV Area
DA BALL
TSV OVERLAP
(TEST Logic)
ADD/CMD,
DQ (RX/TX),
(Direct Access)
DFT
PHY
DRAM POWER
Supply
SIGNAL
Connectivity
Test
Low
Power
Capacity
& BW
Improve
RAS
POST
GDDR5
128GBps BW
JEDEC std
Density flexibility
>256GBps BW
VDD=1.XV
Good I/O power eff.
Digital Consumer,
Automotive, etc.
ECC
Post-package Repair
MBIST
KGSD w/ Bump
2nd
Gen HBM
Interposer
490um
25um
5.48mm 25um
<Top View>
7.29mm 25um
Exposed Silicon
<Bottom View>
Item | Value | Remark
(a) uBump Diameter | 25um (±3 um) |
(b) uBump Height | 35um (±3.5 um) | Cu/Ni/SnAG (17/3/15um)
(c) uBump Pitch | 55um |
(d) uBump Array (MPGA) | JEDEC | JC11-2.883, JC11-4.884
Side Mold(190um)
Item | DDR3 (x8) | GDDR5 (x32) | HBM
I/O | | 32 | 1024
Access Granularity (=I/O x Prefetch) | 8Byte | 32Byte | 256Byte
Max. Bandwidth | 2GB/s | 28GB/s | 128~256GB/s
tRC | 40~48ns | 40ns (=1.6V, 1.5V), 48ns (=1.35V) | 40~48ns
tCCD | 4ns (=4tCK) | 2ns (=4tCK) | 2ns (=1tCK)
VPP | Internal VPP | Internal VPP, (Opt. Ext. VPP) | Ext. VPP
VDD | 1.5, 1.35 | | 1.2
CMD Input | Single CMD | Single CMD | Dual CMD
DBI mode | | O (DBI_DC) | O (DBI_AC)
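The access granularity row is defined as I/O x Prefetch, and the tCCD row gives the same interval in both nanoseconds and clock cycles, which fixes the clock period. The sketch below works both out. The prefetch depths (8n for DDR3 and GDDR5, 2n for HBM) and the DDR3 I/O width of 8 (read from the "x8" label) are assumptions not printed in the comparison, and the `Dram` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dram:
    name: str
    io: int          # data pins per device
    prefetch: int    # bits per pin per column access (assumed, see lead-in)
    tccd_ns: float   # column-to-column delay in ns, from the comparison
    tccd_tck: int    # the same delay in clock cycles, from the comparison

    @property
    def granularity_bytes(self) -> int:
        """Access granularity = I/O x prefetch, converted from bits to bytes."""
        return self.io * self.prefetch // 8

    @property
    def tck_ns(self) -> float:
        """Clock period implied by the tCCD row."""
        return self.tccd_ns / self.tccd_tck


devices = [
    Dram("DDR3 (x8)", io=8, prefetch=8, tccd_ns=4.0, tccd_tck=4),
    Dram("GDDR5 (x32)", io=32, prefetch=8, tccd_ns=2.0, tccd_tck=4),
    Dram("HBM", io=1024, prefetch=2, tccd_ns=2.0, tccd_tck=1),
]

for d in devices:
    print(f"{d.name:12s} granularity={d.granularity_bytes:3d} B  tCK={d.tck_ns} ns")
```

With these assumptions the granularities come out as 8, 32 and 256 bytes, matching the comparison. The implied HBM clock period of 2 ns shows that HBM reaches its bandwidth with a slow clock and a very wide bus rather than a fast one.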
< Figure: Dual CMD input block diagram. Labels: RAS CMD, CAS CMD, RAS/CAS/WE/CS decoder, CAS decoder, RA, CA, Bank, Refresh >
< Figure: command timing diagram. CLK T+1 to T+8 across BANK0 to BANK7, showing ACT, WT, RD, PCG and REFSB commands with tRCD and tRRD intervals >
< Figure: channel floorplan (CH-Right). Labels: DWORD 0 to DWORD 3 with 32 I/O each, AWORD, banks B0 to B7, YCTRL >
1 bank : 2 sub-banks (64Mb), non-shared I/O between sub-banks
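The DWORD labels in the floorplan tie back to the I/O counts quoted earlier in the deck (128 I/O per channel, 1024 I/O per KGSD). A minimal sketch of that bookkeeping follows; the constant names are illustrative, and the channel count of 8 is taken from the "1024(=128 x 8)" entry.

```python
DWORDS_PER_CHANNEL = 4   # DWORD 0 to DWORD 3 in the floorplan
IO_PER_DWORD = 32        # "32 I/O" under each DWORD
CHANNELS_PER_KGSD = 8    # from "1024(=128 x 8)"

# Data I/O per channel and per stack.
io_per_channel = DWORDS_PER_CHANNEL * IO_PER_DWORD
total_io = io_per_channel * CHANNELS_PER_KGSD

# Bytes moved per data transfer across the whole stack.
bytes_per_transfer = total_io // 8

print(f"I/O per channel   : {io_per_channel}")
print(f"I/O per KGSD      : {total_io}")
print(f"bytes per transfer: {bytes_per_transfer}")
```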
Overcome
Memory Scaling
Memory Die
Customization to
meet various
requirements
- Timing
- Refresh
Parallel-to-Serial(P2S)/S2P
JTAG, PMBIST
Configuration Registers
Error Handling
Temp. [℃]
Thermal dummy bumps, together with a well-designed device architecture, help thermal dissipation. The thermal dummy bumps cause no mechanical reliability issues.
Temp. Saturation
TSV
Thermal
Dummy Bumps
< Timeline: 2015 ~ 2023 >
HBM Summary
Memory solution that meets a wide range of application requirements
High Bandwidth
~256GB/s
Up to 8GB
HBM
Smaller
Form Factor
-65%
1 ~ 4 Cubes
per GPU
4 ~ 8 Stacks
KGSD
1)
Good Power
Efficiency
68%
Thank You !