
Delivering Hadoop-as-a-Service To Your Organization
Copyright 2014 EMC Corporation. All rights reserved.

Why Hadoop?

Fast and Cheap Way For Exploiting Massive Amounts of New Data Sources

Internet of Things

Mobile Sensors

Dark Data

Smart Grids

Social Media

Oil Exploration

Video Surveillance

Medical Imaging

Why Hadoop?
Save Money Or Make Money
Improve Company Performance
Increase Revenue
Increase Demand
Increase Spend Efficiency
Increase Customer Acquisition
Ad Optimization
Hyper Targeting
Campaign Optimization
Purchase Funnel Analysis
Increase Customer Engagement
Customer Segmentation
Churn Prevention
Customer Lifetime Value
Ad Effectiveness Analytics
Manage Demand
Increase Basket Size
Demand Analysis
Price Optimization
Affinity Analytics
Next Best Offer
Cross-Sell / Upsell
Reduce Costs
Build Brand Equity
Increase Reach
Digital Marketing
Social Media
Click Fraud
Improve Customer Loyalty
Social Graph / Influencers
Transaction Anomaly Detection
Production Cost / Efficiency
Supply / Demand Forecasting
General and Administrative
Workforce Analytics
Employee Churn
IT / Security Analytics
Loyalty Program Analytics
Customer Satisfaction
Customer Care Analytics
Market Mix Modeling
Coupon Redemption


Hadoop Overview
Hadoop is an open-source framework from Apache that allows for parallel batch processing of very large data sets.
MapReduce is the Hadoop process that divides the workload so multiple devices can process it.
HDFS is the file system for the data. It provides data protection and locality by storing multiple copies of the data (usually three).
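To make the MapReduce idea concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps for a word count. It is an illustration only and does not use the Hadoop API; the function names and the sample input are assumptions. In a real cluster each input split would be handled by a different node.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    mapped = []
    for split in splits:  # each split could run on a separate node
        mapped.extend(map_phase(split))
    return reduce_phase(shuffle(mapped))


# Hypothetical input: two splits, each a list of text lines.
splits = [["big data big"], ["data lake"]]
counts = word_count(splits)
print(counts)
```

The workload is divided by split, so adding nodes lets more splits be mapped at the same time.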


IT Challenges With Hadoop


Time-consuming and complex, creating shadow IT

Bare-metal capacity utilization is low

Multiple Hadoop distribution deployments creating data silos


Typical Enterprise Deployment

Multiple, siloed clusters to manage

Redundant common data in separate clusters

Peak compute and I/O resource is limited to number of nodes in each independent cluster

Dept A: Recommendation engine
Clusters: Production, Test, Experimentation

Dept B: Ad targeting
Clusters: Production, Test, Experimentation

Data sets shown: Log files, Historical customer behavior, Social data

What If You Consolidate & Virtualize?


[Figure: the separate Recommendation engine and Ad targeting clusters (Production, Test, Experimentation) are consolidated into virtual clusters: Production recommendation engine, Test/Dev, and Production Ad Targeting]

One physical platform to support multiple virtual big data clusters
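The effect of consolidation on peak capacity can be sketched with simple arithmetic. The node counts and cluster names below are hypothetical, since the slides give no figures. The sketch also assumes that a consolidated platform can lend every node to one workload while the others are idle, which is a best case.

```python
# Hypothetical node counts per siloed cluster; the slides give no figures.
SILOED_CLUSTERS = {
    "recommendation-production": 20,
    "recommendation-test": 5,
    "ad-targeting-production": 15,
    "ad-targeting-experimentation": 10,
}


def peak_nodes_siloed(clusters, name):
    """A siloed workload can never use more nodes than its own cluster has."""
    return clusters[name]


def peak_nodes_consolidated(clusters):
    """Best case on a shared platform: every node is available to one workload."""
    return sum(clusters.values())


siloed_peak = peak_nodes_siloed(SILOED_CLUSTERS, "ad-targeting-production")
shared_peak = peak_nodes_consolidated(SILOED_CLUSTERS)
print(f"siloed peak: {siloed_peak} nodes, consolidated peak: {shared_peak} nodes")
```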

EMC Hadoop Starter Kit


Consolidate And Virtualize Hadoop With EMC Isilon And VMware

Support for major Hadoop distributions

GUI simplifies management tasks

Quickly deploy, manage, and scale Hadoop clusters

Elastic scaling optimizes cluster performance and resource utilization

[Figure: Apache Hadoop name node and data node components over HDFS]

Why Shared Storage For Hadoop?


Hadoop Bare-Metal Deployment


Hadoop DAS Environment

1. Dedicated Storage Infrastructure: One-off for Hadoop only
2. Lacking Enterprise Data Protection: No Snapshots, replication, backup
3. Poor Storage Efficiency: 3X mirroring
4. Fixed Scalability: Rigid compute to storage ratio
5. Manual Import/Export: No protocol support

[Figure: DAS cluster with a NameNode and data held in multiple copies (labels 1x, 2x, 3x)]


Hadoop On EMC Isilon Scale Out NAS


1. Scale-Out Storage Platform: Multiple applications & workflows
2. End-to-End Data Protection: SnapshotIQ, SyncIQ, NDMP Backup
3. Industry-Leading Storage Efficiency: >80% Storage Utilization
4. Independent Scalability: Add compute & storage separately
5. Multi-Protocol: Industry standard protocols (NFS, CIFS, FTP, HTTP, HDFS)


EMC Isilon Addresses Hadoop Challenges


1. DAS: Dedicated Storage Infrastructure (One-off for Hadoop only)
   Isilon: Scale-Out Storage Platform (Multiple applications & workflows)
2. DAS: Lacking Enterprise Data Protection (No Snapshots, replication, backup)
   Isilon: End-to-End Data Protection (SnapshotIQ, SyncIQ, NDMP Backup)
3. DAS: Poor Storage Efficiency (3X mirroring)
   Isilon: Industry-Leading Storage Efficiency (>80% Storage Utilization)
4. DAS: Fixed Scalability (Rigid compute to storage ratio)
   Isilon: Independent Scalability (Add compute & storage separately)
5. DAS: Manual Import/Export (No protocol support)
   Isilon: Multi-Protocol (Industry standard protocols: NFS, CIFS, FTP, HTTP, HDFS)
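The storage efficiency comparison can be worked through as arithmetic. The sketch below assumes 100 TB of data to store (a hypothetical figure) and takes the ">80%" utilization claim at its lower bound of 0.80. It ignores other overheads such as temporary space, so the results are rough estimates only.

```python
def raw_for_replication(usable_tb, copies):
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies


def raw_for_utilization(usable_tb, utilization):
    """Raw capacity needed when `utilization` of raw space holds user data."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return usable_tb / utilization


DATA_TB = 100              # hypothetical amount of data to store
das_raw = raw_for_replication(DATA_TB, copies=3)             # 3X mirroring
isilon_raw = raw_for_utilization(DATA_TB, utilization=0.80)  # ">80%" lower bound

print(f"DAS with 3X mirroring: about {das_raw:.0f} TB raw")
print(f"At 80% utilization:    about {isilon_raw:.0f} TB raw")
```

Under these assumptions, three-way mirroring needs about 300 TB of raw disk for 100 TB of data, while 80% utilization needs about 125 TB.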


Why Virtualize Hadoop?


Hadoop with Virtualization


Elastic, Multi-Tenant

Combined Storage/Compute (Hadoop in VM)
VM lifecycle determined by Datanode
Limited elasticity
Limited to Hadoop Multi-Tenancy

Separate Storage
Separate compute from data
Elastic compute
Enable shared workloads
Raise utilization

Separate Compute Tenants
Compute cluster per tenant
Stronger VM-grade security and resource isolation
Enable deployment of multiple Hadoop runtime versions

Virtualized Hadoop Performance


Native vs. Virtual, 32 hosts, 16 disks/host

Source: http://www.vmware.com/resources/techresources/10360

Example Deployment With Pivotal HD


Prerequisites
Isilon OneFS version 6.5.5 or higher
VMware vSphere 5.0 (or later), Enterprise or Enterprise Plus

Download VMware Big Data Extensions (Free)
Configure Isilon cluster for HDFS (Free license)
Configure Big Data Extensions to use Pivotal HD
Deploy Hadoop Cluster
Run a simple program to test
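Before starting the steps above, the prerequisites can be checked with a short script. This is a sketch under assumptions: the version strings and edition name are supplied by hand, and the function names are hypothetical. It does not query Isilon or vSphere.

```python
def parse_version(text):
    """Turn '6.5.5' into (6, 5, 5), padding short versions such as '5' with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


MIN_ONEFS = parse_version("6.5.5")
MIN_VSPHERE = parse_version("5.0")
ALLOWED_EDITIONS = {"Enterprise", "Enterprise Plus"}


def check_prerequisites(onefs_version, vsphere_version, vsphere_edition):
    """Return a list of unmet prerequisites; an empty list means all are met."""
    problems = []
    if parse_version(onefs_version) < MIN_ONEFS:
        problems.append("Isilon OneFS must be 6.5.5 or higher")
    if parse_version(vsphere_version) < MIN_VSPHERE:
        problems.append("VMware vSphere must be 5.0 or later")
    if vsphere_edition not in ALLOWED_EDITIONS:
        problems.append("vSphere edition must be Enterprise or Enterprise Plus")
    return problems


# Hypothetical environment values, entered by hand.
issues = check_prerequisites("7.0.2", "5.1", "Enterprise")
print(issues or "All prerequisites met")
```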



Hadoop Data Services


Real-time, Interactive, And Batch Processing


WGSN
Retail

Challenges
Rapidly launch new market intelligence service for fashion retailers
Support large and growing volumes of Big Data

"Performance, scalability, and tight integration with Hadoop were the key reasons we chose Isilon. We also felt very comfortable with the partnership between EMC and Pivotal. In the end, the EMC and Pivotal solution offered the ideal balance of storage and compute with the right level of support."

Solution
Pivotal Greenplum Database
Pivotal HD
EMC Isilon
Pivotal Data Science Labs

Results
Fast deployment with native Hadoop integration,
enabling rapid launch of new service
Delivered high performance scalability
Simplified platform administration


Download Hadoop Starter Kit Now


Rapid provisioning
High availability

Elasticity
Multi-tenancy
Portability
https://community.emc.com/docs/DOC-26892