
Rick Coulson

Senior Fellow, Intel NVM Solutions Group

Outline

- The history of storage latency and where we stand today
- The promise of Storage Class Memory (SCM) and 3D XPoint memory
- Extensive compute platform changes driven by the quest for lower storage latency with SCM / 3D XPoint memory
  - As traditional storage
  - As persistent memory
- Innovation opportunities abound
1956: IBM RAMAC 350

- 5 MBytes
- $57,000
- $15,200/MByte
- ~1.5 random IOPs*
- 600 ms latency

* IOPs depend on the workload, so this is a range

*Other names and brands may be claimed as the property of others.
1980: IBM 3350

- 300 MBytes
- $60,000
- $200/MByte
- 30 random IOPs
- 33 ms latency

1983: IBM 3380

- 2.52 GBytes
- $82,000
- $36/MByte
- ~160 IOPs total
- 25 ms latency

Two hard disk assemblies, each with two independent actuators, each actuator accessing 630 MB within one chassis.

2007: 15K RPM HDD

- ~200 random IOPs*
- ~5 ms latency

The IOPS scaling problem was addressed by running HDDs in parallel in the enterprise.

* IOPs depend on the workload, so this is a range

2016: 10K RPM HDD

- 1.8 TB
- ~150 IOPs
- 6.6 ms latency

2016: NVMe NAND SSD

- 2 TB
- 500,000+ IOPs
- ~60 µsec latency
The Continuing Need For Lower Latency

In 59 years:

- RAMAC 350 HDD → 10K RPM HDD: ~100x access time reduction (600 ms → ~6 ms)
- RAMAC 350 HDD → NAND SSD: ~10,000x access time reduction (600 ms → ~60 µs)
- RAMAC 305 → Core i7: ~40,000,000x clock speed increase (~100 Hz best case → ~4 GHz)

Source: Wikipedia
Lower Storage Latency Requires Media and Platform Improvements

The drive for lower latency pushes on two fronts:

- Media bottlenecks → 3D XPoint memory (SCM)
- Platform HW/SW bottlenecks → ultra-fast SSDs and persistent memory
Addressing Media Latency: Next-Gen NVM / SCM Resistive RAM NVM Options

[Diagram: cross point array in backend layers, ~4λ² cell; wordlines, memory element, selector device]

Scalable resistive memory element families and their defining switching characteristics:

- Phase Change: energy (heat) converts material between crystalline (conductive) and amorphous (resistive) phases
- Magnetic Tunnel Junction (MTJ): switching of a magnetic resistive layer by spin-polarized electrons
- Electrochemical Cells (ECM): formation / dissolution of a nano-bridge by electrochemistry
- Binary Oxide Filament Cells: reversible filament formation by oxidation-reduction
- Interfacial Switching: oxygen vacancy drift/diffusion induced barrier modulation

Scalable, with potential for near-DRAM access times


3D XPoint Technology

- Crosspoint Structure: selectors allow dense packing and individual access to bits
- Breakthrough Material Advances: compatible switch and memory cell materials
- Scalable: memory layers can be stacked in a 3D manner
- High Performance: cell and array architecture that can switch states 1000x faster than NAND

1000x faster than NAND · 1000x the endurance of NAND · 10x denser than DRAM

*Results have been estimated or simulated using internal analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
3D XPoint Technology Instantiation

- Intel Optane SSDs
- DIMMs based on 3D XPoint

3D XPoint Technology Video

Please excuse the marketing.

3D XPoint Technology Video: Demonstration of 3D XPoint SSD Prototype
Need to Address System Architecture To Go Lower

[Chart: latency (µsecs, 0 to 120) for a NAND MLC NVMe SSD (4 kB read), a 3D XPoint NVMe SSD (4 kB read), and 3D XPoint DIMM memory (64 B read)]
Block Storage Platform Changes

Intel Optane SSDs

Addressing Interface Efficiency With NVMe / PCIe

- SSD NAND technology offers ~500x reduction in media latency over HDD
- NVMe eliminates ~20 µs of controller latency (i.e., SAS HBA)
- 3D XPoint SSD delivers < 10 µs latency (~7x)
- 3D XPoint persistent memory lowers latency further

[Chart: total latency (µs) split into drive, controller, and software latency for HDD + SAS/SATA, NAND SSD + SAS/SATA, NAND SSD + NVMe, 3D XPoint SSD + NVMe, and 3D XPoint PM]

Source: Storage Technologies Group, Intel


NVMe Delivers Superior Latency

Platform HW/SW average latency, excluding media, for 4 kB transfers. PCIe NVMe approaches its theoretical max of 800K IOPS at 18 µs.

[Chart: latency (µs) vs IOPS up to ~900K; series: 2/4/6 WV LSI (SAS), 2/4 WV AHCI (SATA), and NVMe with 1 FD and 1/2/4 CPUs]

Source: Storage Technologies Group, Intel
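The 800K IOPS at 18 µs figure can be sanity-checked with Little's law, which relates throughput, latency, and the number of outstanding requests. A minimal sketch (the function name is ours, not from the slide):

```c
/* Little's law: mean number of outstanding requests L = X * W,
 * where X is throughput (IOPS) and W is latency (seconds).
 * The 800K IOPS / 18 us figures come from the chart above. */
double littles_law_queue_depth(double iops, double latency_sec)
{
    return iops * latency_sec;
}

/* littles_law_queue_depth(800000.0, 18e-6) == 14.4: a workload must keep
 * roughly 14-15 I/Os in flight to sustain 800K IOPS at 18 us of latency. */
```

This is why the high-IOPS points on the curve are only reachable with multiple submitters (the 2/4 CPU series); a queue-depth-1 workload is latency-bound.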


NVMe/PCIe Provides More Bandwidth

- SATA: 0.55 GB/s
- 4x PCIe Gen3 / NVMe: 3.2 GB/s
- 8x PCIe Gen3 / NVMe: 6.4 GB/s

PCIe/NVMe provides more than 10x the bandwidth of SATA, and even more with Gen 4.

Source: Storage Technologies Group, Intel
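The measured figures above can be compared against the theoretical PCIe Gen3 ceiling (8 GT/s per lane with 128b/130b encoding). A small sketch, with a helper name of our choosing:

```c
/* Theoretical PCIe Gen3 ceiling: 8 GT/s per lane, 128b/130b encoding,
 * 8 bits per byte. The measured 3.2 / 6.4 GB/s figures above sit below
 * this raw ceiling because of packet and protocol overhead. */
double pcie_gen3_gbytes_per_sec(int lanes)
{
    const double transfers_per_sec = 8e9;   /* per lane */
    const double encoding = 128.0 / 130.0;  /* usable fraction of the link */
    return lanes * transfers_per_sec * encoding / 8.0 / 1e9;
}

/* x4: ~3.94 GB/s raw; x8: ~7.88 GB/s raw. */
```

For x4 this gives ~3.94 GB/s raw, so the measured 3.2 GB/s is roughly 80% of the link ceiling.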


Storage SW Stack Optimizations

- Much of the storage stack was designed with HDD latencies in mind
- There was no point in optimizing it until now
- Example: paging algorithms with seek optimization and grouping
Synchronous Completion for Queue Depth 1?

From Yang, FAST '12 (10th USENIX Conference on File and Storage Technologies):

- Async (interrupt-driven): 9.0 µs end to end. The system call crosses into the kernel, the CPU context-switches to another process (P2) while the device works (4.1 µs), and the device interrupt switches back before returning to user space. OS cost = Ta + Tb = 4.9 + 1.4 = 6.3 µs.
- Sync (polling): 4.4 µs end to end. The CPU stays in the kernel, polling, while the device works (2.9 µs). OS cost = 4.4 µs.

At queue depth 1 on a fast device, synchronous polling completes I/O faster than interrupt-driven completion.
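The two completion paths can be sketched as a toy model in C: one submitter spins on a completion flag (polling) while the other sleeps on a condition variable until signaled (the interrupt-style path, which pays the wakeup and context-switch cost). The names and the 100 µs delay are our illustrative assumptions, not the instrumentation from the paper; compile with -pthread.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

/* A "device" thread finishes a request after a delay; the submitter either
 * spins (sync/polling) or sleeps until signaled (async/interrupt-style). */

static atomic_int poll_done;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int intr_done;

static void *device_polling(void *arg)
{
    (void)arg;
    usleep(100);                  /* stand-in for media access time */
    atomic_store(&poll_done, 1);  /* completion record the host polls */
    return NULL;
}

static void *device_interrupt(void *arg)
{
    (void)arg;
    usleep(100);
    pthread_mutex_lock(&mtx);
    intr_done = 1;
    pthread_cond_signal(&cv);     /* the "interrupt": wake the sleeper */
    pthread_mutex_unlock(&mtx);
    return NULL;
}

int submit_and_poll(void)         /* sync completion: burn CPU, no switch */
{
    pthread_t t;
    atomic_store(&poll_done, 0);
    if (pthread_create(&t, NULL, device_polling, NULL) != 0)
        return 0;
    while (!atomic_load(&poll_done))
        ;                         /* spin until the device posts completion */
    pthread_join(t, NULL);
    return 1;
}

int submit_and_wait(void)         /* async completion: sleep until woken */
{
    pthread_t t;
    intr_done = 0;
    if (pthread_create(&t, NULL, device_interrupt, NULL) != 0)
        return 0;
    pthread_mutex_lock(&mtx);
    while (!intr_done)
        pthread_cond_wait(&cv, &mtx);
    pthread_mutex_unlock(&mtx);
    pthread_join(t, NULL);
    return 1;
}
```

Timing the two functions shows why polling wins at queue depth 1: the spin path never deschedules the submitter, while the condition-variable path adds scheduler latency on every completion.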
Standards for Low Latency Replication

- In most datacenter usage models, a storage write does not count until it is replicated
- High replication overhead diminishes the performance differentiation of 3D XPoint technology
- NVMe over Fabrics is a developing specification for low-overhead replication
Summary: Block Storage Platform Changes

- Move to PCIe-based storage (Intel Optane SSDs)
- Streamlined command set: NVM Express
- OS / SW stack optimizations
- Fast replication standards
Persistent Memory Oriented Platform Changes

DIMMs based on 3D XPoint

Why Persistent Memory?

[Chart: latency (µsecs, 0 to 30) for a NAND MLC NVMe SSD (4 kB read), a 3D XPoint NVMe SSD (4 kB read), and 3D XPoint DIMM memory (64 B read)]
Open NVM Programming Model

- SNIA Technical Working Group with 50+ member companies
- Initially defined the 4 programming modes required by developers
- Spec 1.0 developed, approved by SNIA voting members, and published

The four modes:

- Interfaces for a PM-aware file system accessing kernel PM support
- Interfaces for an application accessing a PM-aware file system
- Kernel support for block NVM extensions
- Interfaces for legacy applications to access block NVM extensions
NVM Library: pmem.io

- 64-bit Linux initially
- Open source: http://pmem.io
- Libraries: libpmem, libpmemobj (transactional), libpmemblk, libpmemlog, libvmem, libvmmalloc

[Diagram: an application reaches persistent memory either through the standard file API into a pmem-aware file system (kernel space) or by direct load/store through MMU mappings (user space), with the NVM library above an Intel 3D XPoint DIMM]
Write I/O Replaced with Persist Points

[Diagram: same stack as the previous slide, but with no page cache; the application stores directly through MMU mappings into the NVDIMM and makes data durable at persist points]

- Traditional APIs: msync(), FlushViewOfFile()
- NVML API: pmem_persist()
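The traditional persist point above can be sketched with plain POSIX calls: map a file, store through the mapping, and reach durability with msync(). On a pmem-aware file system, NVML's pmem_persist() takes the place of the msync() step with user-space cache flushes, avoiding the kernel call and the page cache. The function name and file path here are ours, for illustration.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file, store through the mapping, then hit a persist point. */
int store_and_persist(const char *path)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 4096) != 0) {
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    strcpy(p, "hello, persistent world");  /* plain store, no write() syscall */
    int rc = msync(p, 4096, MS_SYNC);      /* persist point (pmem_persist()
                                            * on a pmem-aware file system) */
    munmap(p, 4096);
    close(fd);
    return rc;
}
```

The application's data path is just loads and stores; only the persist point differs between the traditional and NVML APIs.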
Operating System Support for Persistent Memory

The Data Path

[Diagram: a MOV instruction travels from a core through its L1/L2 caches and the shared L3, to the memory controllers, and finally to the NVDIMMs]
New Instructions For Flushing Writes

[Diagram: CLFLUSH, CLFLUSHOPT, and CLWB flush data from the core caches (L1/L2/L3); PCOMMIT flushes data from the memory controllers to the NVDIMMs]
Flushing Writes from Caches

- CLFLUSH addr: Cache Line Flush, available for a long time
- CLFLUSHOPT addr: Optimized Cache Line Flush, new to allow concurrency
- CLWB addr: Cache Line Write Back, leaves the value in the cache for performance of the next access
Flushing Writes from the Memory Controller

- PCOMMIT: Persistent Commit, flushes stores accepted by the memory subsystem
- Asynchronous DRAM Refresh (ADR): flushes outstanding writes on power failure; a platform-specific feature
Example Code

    MOV X1, 10
    MOV X2, 20          ; X2, X1 are in pmem
    ...
    MOV R1, X1          ; Stores to X1 and X2 are globally visible,
    ...                 ; but may not be persistent
    CLFLUSHOPT X1
    CLFLUSHOPT X2       ; X1 and X2 moved from caches to memory
    ...
    SFENCE              ; Ensures the flushes have completed
    PCOMMIT
    ...
    SFENCE              ; Ensures PCOMMIT has completed
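The same sequence can be written in C with compiler intrinsics (x86 only). This is a sketch under two assumptions of ours: CLFLUSH stands in for CLFLUSHOPT so the code runs on older CPUs, and PCOMMIT is omitted, since Intel later deprecated that instruction in favor of platform-level ADR, leaving flush + fence as the persist sequence.

```c
#include <emmintrin.h>   /* _mm_clflush (SSE2); also pulls in _mm_sfence */

/* Variable names mirror the slide's X1 and X2. */
int X1, X2;

void persist_two_ints(void)
{
    X1 = 10;             /* globally visible, but possibly only in cache */
    X2 = 20;
    _mm_clflush(&X1);    /* push the cache lines toward the memory subsystem */
    _mm_clflush(&X2);
    _mm_sfence();        /* order: flushes complete before later stores */
}
```

CLFLUSH is itself ordered with respect to other CLFLUSHes, which is why the slide's CLFLUSHOPT variant (unordered, concurrent) needs the explicit SFENCE to regain ordering.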
Join the Discussion about Persistent Memory

- Learn about the persistent memory programming model: http://www.snia.org/forums/sssi/nvmp
- Join the pmem NVM Libraries open source project: http://pmem.io
- Read the documents and code supporting ACPI 6.0 and Linux NFIT drivers:
  http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
  https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/log/?h=nd
  https://github.com/pmem/ndctl
  http://pmem.io/documents/
  https://github.com/01org/prd
- Intel Architecture Instruction Set Extensions Programming Reference: https://software.intel.com/en-us/intel-isa-extensions
- Intel 3D XPoint™ Memory: http://www.intel.com/content/www/us/en/architecture-and-technology/non-volatile-memory.html
Persistent Memory Summary

- New storage model for low latency: DIMMs based on 3D XPoint
- New instructions to support persistence
- OS support
- Lots of innovation opportunity
Low Latency Ahead

- Persistent memory (3D XPoint memory): < 1 µsec
- Ultra-fast NVMe SSD: < 10 µsec
