Copyright 2014 EMC Corporation. All rights reserved.
Why Hadoop?
A Fast and Cheap Way to Exploit Massive Amounts of New Data Sources
- Internet of Things
- Mobile Sensors
- Dark Data
- Smart Grids
- Social Media
- Oil Exploration
- Video Surveillance
- Medical Imaging
Why Hadoop?
Save Money or Make Money: Improve Company Performance

Increase Revenue
- Increase Demand and Spend Efficiency
- Increase Customer Acquisition: Ad Optimization, Hyper-Targeting, Campaign Optimization, Purchase Funnel Analysis
- Increase Customer Engagement: Customer Segmentation, Churn Prevention, Customer Lifetime Value, Ad Effectiveness Analytics
- Manage Demand / Increase Basket Size: Demand Analysis, Price Optimization, Affinity Analytics, Next Best Offer, Cross-Sell / Upsell
- Increase Reach: Digital Marketing, Social Media

Reduce Costs
- Click Fraud Detection
- Improve Customer Loyalty: Social Graph / Influencers, Transaction Anomaly Detection
- Production Cost / Efficiency, Supply / Demand Forecasting
- General and Administrative: Workforce Analytics, Employee Churn, IT / Security Analytics
- Loyalty Program Analytics, Customer Satisfaction, Customer Care Analytics
- Market Mix Modeling, Coupon Redemption
Hadoop Overview
Hadoop is an open-source framework from Apache that enables parallel batch processing of very large data sets.
MapReduce is the Hadoop processing model that divides the workload so multiple nodes can process it in parallel.
HDFS is the distributed file system that holds the data. It provides data protection and locality by storing multiple replicas of each block (usually 3).
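The divide-the-workload idea behind MapReduce can be illustrated with a toy sketch. This is plain Python, not actual Hadoop code: the three phases below (map, shuffle, reduce) run in one process here, but in Hadoop each phase runs distributed across the cluster's nodes.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Map phase: each input line is split into (word, 1) pairs.
# In Hadoop, different nodes would map different splits of the input.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle: group intermediate pairs by key, so each reducer
# sees every count emitted for one word.
def shuffle(pairs):
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, [count for _, count in group]

# Reduce phase: sum the counts per word.
def reduce_phase(grouped):
    return {word: sum(counts) for word, counts in grouped}

lines = ["hadoop splits the work", "the work runs in parallel"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])   # "the" appears in both lines -> 2
```

Because the map and reduce functions are pure per-record/per-key operations, the framework is free to run them on as many nodes as the cluster has, which is the source of Hadoop's scalability.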
The traditional deployment pattern creates problems:
- Multiple, siloed clusters to manage
- Redundant common data duplicated in separate clusters
- Peak compute and I/O resource is limited to the number of nodes in each independent cluster

[Diagram: separate Production, Test, and Experimentation clusters per department and workload — e.g. log files, social data, Dept B's ad targeting, a recommendation engine — each holding its own copy of common data.]
GUI simplifies management tasks

[Diagram: Apache HDFS architecture — a single NameNode tracking metadata for the data nodes, with each block replicated (3x by default) across the cluster.]
Traditional Apache Hadoop on direct-attached storage:
- 3X mirroring of every block
- Fixed scalability (compute and storage must scale together)
- Manual import/export of data
- Single NameNode
- No protocol support beyond HDFS

EMC Isilon scale-out storage:
- Independent scalability of compute and storage
- Multi-protocol access to the same data
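The "3X mirroring" figure above is the stock HDFS default, controlled by the documented `dfs.replication` property in `hdfs-site.xml`. As a sketch, lowering it is a one-line config change (whether that is safe depends on what other protection, such as Isilon's, is in place):

```xml
<!-- hdfs-site.xml: default block replication factor.
     3 is the Apache Hadoop default behind the "3X mirroring" cost. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```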
Hadoop in a VM

Combined storage/compute in each VM:
- VM lifecycle determined by the DataNode
- Limited elasticity
- Limited to Hadoop workloads

Separate storage with multi-tenant compute (compute VMs for tenants T1 and T2 sharing a storage layer):
- Separates compute from data
- Elastic compute
- Enables shared workloads and raises utilization
- Enables deployment of multiple Hadoop runtime versions
Source: http://www.vmware.com/resources/techresources/10360
Customer Case Study: WGSN (Retail)

Challenges
- Rapidly launch a new market intelligence service for fashion retailers
- Support large and growing volumes of Big Data

Solution
- Pivotal Greenplum Database
- Pivotal HD
- EMC Isilon
- Pivotal Data Science Labs

"Performance, scalability, and tight integration with Hadoop were the key reasons we chose Isilon. We also felt very comfortable with the partnership between EMC and Pivotal. In the end, the EMC and Pivotal solution offered the ideal balance of storage and compute with the right level of support."

Results
- Fast deployment with native Hadoop integration, enabling rapid launch of the new service
- Delivered high-performance scalability
- Simplified platform administration
Elasticity
Multi-tenancy
Portability
https://community.emc.com/docs/DOC-26892