Manas Minglani
Wednesday, 2015
Topics
Block-Based Data Access
File-Based Data Access
Object-Based Data Access
Object-Based Storage Devices (OSD)
Object Storage Systems
Object Storage Server (OSS)
Content Addressable Storage (CAS)
Content Aware Storage (CAS)
Data Access
Block
SCSI
SAS
iSCSI
SATA
Object
OSD
File
Local FS
Distributed FS
Global, Distributed & Parallel FS
Inode Metadata
File systems allow concurrent access by smaller groups of users and enable read and
write operations, but they carry overhead to manage permissions and operations such
as file locking.
Much of the unstructured data generated today does not require concurrent
access, so the file-system overhead is unnecessary and simply adds cost and
complexity.
A hierarchical file structure consists of directories and a hierarchy of nested
folders, subfolders, and files; the content and data contained within a file matter
far less than its location in the directory tree. As a consequence, each file
carries only basic metadata, such as file name, date created, date last modified,
file type, and the person who created the file.
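The contrast above can be sketched in a few lines of Python. The field names and values below are purely illustrative, not any particular system's schema: a file system keeps a small, fixed set of attributes, while an object pairs the same data with an arbitrary, extensible key/value metadata dictionary.

```python
# Illustrative only: basic per-file metadata in a hierarchical file system.
file_metadata = {
    "name": "report.docx",
    "created": "2015-03-11",
    "modified": "2015-03-12",
    "type": "docx",
    "owner": "manas",
}

# An object carries the same information plus arbitrary custom metadata,
# so meaning travels with the content rather than with a directory path.
object_record = {
    "id": "a3f1c9",                 # flat-namespace identifier (made up)
    "data": b"report contents",
    "metadata": {
        **file_metadata,
        "project": "storage-survey",   # custom, application-defined attributes
        "retention_days": 365,
        "content_type": "application/msword",
    },
}
```

Because the metadata is just key/value pairs, an object store can index and query it at scale, which is what "scalable metadata" refers to later in this deck.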
In a Flat Namespace
Objects
Inodes vs Objects
Comparison
We don't recommend using object storage for transactional data,
especially because of the eventual consistency model outlined
previously. In addition, it's important to recognize that object
storage was not created as a replacement for NAS file access and
sharing; it does not support the locking and sharing mechanisms
needed to maintain a single, accurately updated version of a file.
Good examples of block storage use cases are structured database
storage, random read/write workloads, and virtual machine file system
(VMFS) volumes.
Object Types
The ANSI T10 SCSI OSD standard defines four different objects:
The root object -- The OSD itself
User objects -- Created by SCSI commands from the application or client
Collection objects -- A group of user objects, such as all .mp3 objects or all objects belonging to a project
Partition objects -- Containers for user objects and collections that share common security and space management
characteristics, such as quotas and keys
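The four object types form a simple containment hierarchy, which the following sketch models in Python. This is an illustration of the relationships only, not the actual T10 command set or wire protocol; the class and method names are invented for clarity.

```python
class UserObject:
    """Created by SCSI commands from the application or client."""
    def __init__(self, name, data=b""):
        self.name, self.data = name, data

class Collection:
    """A named group of user objects, e.g. all .mp3 objects."""
    def __init__(self, name):
        self.name, self.members = name, []

class Partition:
    """Container for user objects and collections that share common
    security and space-management characteristics (quotas, keys)."""
    def __init__(self, name, quota_bytes, key):
        self.name, self.quota_bytes, self.key = name, quota_bytes, key
        self.user_objects, self.collections = {}, {}

    def create_object(self, name, data=b""):
        used = sum(len(o.data) for o in self.user_objects.values())
        if used + len(data) > self.quota_bytes:
            raise RuntimeError("partition quota exceeded")
        obj = UserObject(name, data)
        self.user_objects[name] = obj
        return obj

class RootObject:
    """The OSD itself; the root holds the partitions."""
    def __init__(self):
        self.partitions = {}

    def create_partition(self, name, quota_bytes, key):
        p = Partition(name, quota_bytes, key)
        self.partitions[name] = p
        return p

osd = RootObject()
part = osd.create_partition("project-a", quota_bytes=1024, key="secret")
song = part.create_object("track.mp3", b"\x00" * 100)
```

Note how the quota check lives on the partition: that is the sense in which a partition is a security and space-management domain for the objects it contains.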
Durability: Due to the design of most object storage systems (three replicas per
object is the most common paradigm), durability levels at scale are extremely high
compared to conventional storage solutions (think 99.99999% to 99.999999999%, 7 to
11 nines). Object storage systems have internal mechanisms to verify file
consistency and to handle failed drives, bit rot, server and cabinet failures, etc.
These features allow the system to automatically replicate data as needed to retain
the desired number of replicas, which results in extremely high durability and
availability of data.
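A rough back-of-envelope calculation shows why independent replicas multiply durability. The failure probability below is an assumed, illustrative figure, and the model is deliberately pessimistic in that it ignores repair: real systems re-replicate lost copies, so data is only at risk during the short repair window, which pushes durability far higher than this estimate.

```python
# Assume each replica is lost independently within a year with
# probability p, and (pessimistically) that no repair happens.
p = 0.02            # assumed annual loss probability of one replica
replicas = 3

annual_loss = p ** replicas          # data lost only if all replicas fail
durability = 1 - annual_loss

print(f"annual loss probability: {annual_loss:.1e}")
print(f"durability: {durability:.6%}")
```

With these assumed numbers, three copies turn a 2% per-copy loss rate into roughly a 1-in-125,000 annual loss probability; adding repair shrinks the exposure window from a year to hours, which is how systems reach the 7-to-11-nines range quoted above.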
Cost: Because many object storage platforms are designed to run on commodity
hardware, even with 3x replication overhead the price point is attractive compared
to block or file storage. At scale, costs of pennies per gigabyte per month are
typical: durability comparable to or better than tape, at nearly the same cost as
tape, with the convenience and performance of hot storage, plus all the benefits of
a cloud storage platform.
New Paradigm
Evolving the storage interface. The figure indicates how the
OSD migrates (lower arrow) a portion of the file system into
the storage device while leaving the file system high-level
policy functions (e.g., user authentication) to the server
(upper arrow). A cryptographic mechanism (symbolized by
the keys in the figure and generated using private keys
stored in the OSD) is used to authorize client access.
Rationale
An object can grow or shrink dynamically and is completely contained
and managed within a single OSD.
Objects are grouped into partitions that enable security, space
management, and quota management. Each partition represents a
security domain with its own set of keys.
The OSD object model is flexible, allowing higher-level software to
map each of its entities (e.g., file and table) to a single object,
multiple objects, or a partial object.
OSD Optimizations
One of the common usage patterns of an OSD involves an operation (e.g., WRITE)
on the data segment of an object and then accessing some attributes (e.g., a file
system maintained last-access-time attribute). To support this common access
pattern, the OSD standard allows all commands to include a SET_ATTR and/or
GET_ATTR operation. We call this piggybacking an attribute access on a
command. Piggybacking attribute updates can significantly reduce the number of
messages and hence the message processing on each OSD. Piggybacking can also
improve client latency by removing the additional network roundtrip that a
separate GET_ATTR or SET_ATTR command would create.
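The message-count saving from piggybacking can be made concrete with a small sketch. The `OSDTarget` class and its methods below are a hypothetical stand-in for an OSD, not the T10 command set; the point is only that one message can carry both the data operation and the attribute access.

```python
class OSDTarget:
    """Toy OSD that counts how many messages it receives."""
    def __init__(self):
        self.data, self.attrs, self.messages = {}, {}, 0

    def write(self, oid, payload, set_attr=None, get_attr=None):
        """A WRITE that can carry piggybacked SET_ATTR / GET_ATTR."""
        self.messages += 1
        self.data[oid] = payload
        if set_attr:
            self.attrs.setdefault(oid, {}).update(set_attr)
        if get_attr:
            return {k: self.attrs.get(oid, {}).get(k) for k in get_attr}

    def set_attr(self, oid, attrs):
        """A standalone SET_ATTR command: costs an extra roundtrip."""
        self.messages += 1
        self.attrs.setdefault(oid, {}).update(attrs)

# Without piggybacking: WRITE then SET_ATTR = two messages.
plain = OSDTarget()
plain.write(1, b"data")
plain.set_attr(1, {"last_access": "2015-03-11"})

# With piggybacking: one WRITE carries the attribute update.
piggy = OSDTarget()
piggy.write(1, b"data", set_attr={"last_access": "2015-03-11"})

print(plain.messages, piggy.messages)
```

For a pattern executed on every I/O (such as maintaining a last-access-time attribute), halving the message count per operation is exactly the reduction in OSD message processing and client latency described above.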
Security
Eventual Consistency
Scale out by adding more x86 servers (nodes) to the object store.
Due to the distributed nature of object storage, it is subject to Brewer's CAP
theorem, which states that it is impossible for a distributed system to
simultaneously provide consistency, availability, and partition tolerance.
Object stores can offer two of the three, but not all three.
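A toy simulation makes the eventual-consistency behavior concrete: a write is acknowledged after reaching one replica, and the other replicas catch up only during a background sync, so a read served by a stale replica can return the previous version in between. The replica/sync functions here are an illustration of the concept, not any real store's replication protocol.

```python
class Replica:
    def __init__(self):
        self.store = {}

replicas = [Replica() for _ in range(3)]

def write(key, value):
    replicas[0].store[key] = value        # acknowledged after one replica

def sync():
    for r in replicas[1:]:                # background anti-entropy pass
        r.store.update(replicas[0].store)

def read(key, from_replica):
    return replicas[from_replica].store.get(key)

write("obj", "v1")
sync()                                    # all replicas now hold v1
write("obj", "v2")                        # v2 not yet propagated
stale = read("obj", from_replica=2)       # a stale replica still serves v1
sync()
fresh = read("obj", from_replica=2)       # after convergence, v2
print(stale, fresh)
```

This stale-read window is precisely why the deck recommends against object storage for transactional data: between the write and the sync, two clients can observe different versions of the same object.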
Open vStorage takes a different approach and is designed from the ground up
with virtual machines and their performance requirements in mind. It uses a
well-considered architecture that turns object storage into block storage for
virtual machines, avoiding pitfalls such as those seen with distributed file
systems layered on object storage.
Open vStorage creates a unified namespace for the Virtual Machines across
multiple Hosts.
OpenStack Swift
Data is distributed by making copies (replicas) on different nodes and
drives across the Swift cluster
The number of replicas is user-configurable
Access is provided via a RESTful API and an Amazon S3-compatible RESTful API
Swift is based on a ring architecture with three categories: account, container,
and object
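The ring idea can be sketched with a simplified placement function: hash the object's path, take a few high-order bits as the partition number, and map each partition to several replica devices. Real Swift rings also account for device weights, zones, and rebalancing; the partition power, device names, and modular placement below are simplifying assumptions for illustration.

```python
import hashlib

PART_POWER = 4                       # 2**4 = 16 partitions (illustrative)
DEVICES = ["dev0", "dev1", "dev2", "dev3", "dev4", "dev5"]
REPLICAS = 3

def partition(path):
    """Top PART_POWER bits of the md5 of the object path."""
    digest = hashlib.md5(path.encode()).hexdigest()
    return int(digest, 16) >> (128 - PART_POWER)

def devices_for(path):
    """Map a partition to REPLICAS distinct devices (toy placement)."""
    part = partition(path)
    return [DEVICES[(part + i) % len(DEVICES)] for i in range(REPLICAS)]

nodes = devices_for("/AUTH_test/photos/cat.jpg")
print(nodes)
```

Because placement is a pure function of the name, any node can compute where an object's replicas live without consulting a central metadata server, which is what lets the cluster scale out by simply adding devices to the ring.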
Ceph
Stores data on a single distributed computer cluster and provides interfaces for
object-, block-, and file-level storage
Has no single point of failure
Scales to the exabyte level
Replicates data, making it fault tolerant
Characteristics
API-level access vs. filesystem-level
Flat structure vs. hierarchical structure
Scalable metadata
Scalable platform
Durable data storage
Low-cost data storage
Advantages
Scalable capacity (many PB easily)
Scalable performance (environment-level performance scales in a
linear manner)
Durable
Low cost
Simplified management
Single Access Point
No volumes to manage/resize/etc.
Disadvantages
No random access within files
POSIX utilities do not work directly with object storage (it is not a
filesystem)
Integration may require modification of application and workflow logic.
(We saw this when trying to use FIO with the Kinetic drives: because access
goes through an API, we had to modify FIO to run benchmarks on the Kinetic
drive.)
Typically, lower performance on a per-object basis than block storage
Use Cases
Currently, the datasets best suited for object storage are the following:
Unstructured data
Thanks