An Executive's Cheat Sheet on Hadoop, the Enterprise Data Warehouse and the Data Lake


Tamara Dull, Director of Emerging Technologies, SAS Best Practices

@TamaraDull

Big data is not new.

CRM
FINANCIAL DATA
LOYALTY CARD DATA
TROUBLE TICKETS
EMAIL
PDF FILES
SPREADSHEETS
WORD PROCESSING DOCUMENTS
RFID TAGS
GPS
WEB LOG DATA
PHOTOS
SATELLITE IMAGES
SOCIAL MEDIA DATA
BLOGS
FORUMS
CLICKSTREAM DATA
VIDEOS
XML DATA
MOBILE DATA
WEBSITE CONTENT
RSS FEEDS
AUDIO FILES
CALL CENTER TRANSCRIPTS
POS DATA

On Today's Agenda
What's Trending?
The 5 Questions
A Comparison and Contrast Exercise

Part 1 of 3

What's Trending?

The market is growing.

SOURCE: http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2013-2017

The success rate is meh.

People issues trump technology issues.

Analytics keep them coming back.

Part 2 of 3

The 5 Questions

The 5 Questions
1) What can Hadoop do that my data warehouse can't?
2) We're not doing big data, so why do we need Hadoop?
3) Is Hadoop enterprise-ready?
4) Isn't a data lake just the data warehouse revisited?
5) What are some of the pros and cons of a data lake?
1) What can Hadoop do that my data warehouse can't?

1. Store data more cheaply.

2. Process data more quickly (and cheaply).

2) We're not doing big data, so why do we need Hadoop?

Stage structured data.
Process structured data.
Process any data.
Archive any data.
Access any data (via data warehouse).
Access any data (via Hadoop).

3) Is Hadoop really enterprise-ready?


For your organization: Maybe
For all organizations: No

Are we there yet?

4) Isn't a data lake just the data warehouse revisited?


            DATA WAREHOUSE                     vs.   DATA LAKE
DATA        structured, processed                    structured / semi-structured / unstructured, raw
PROCESSING  schema-on-write                          schema-on-read
STORAGE     hierarchically archived                  object-based, no hierarchy
AGILITY     less agile, fixed configuration          highly agile, configure and reconfigure as needed
SECURITY    mature                                   maturing
USERS       business professionals                   data scientists et al.
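
The PROCESSING row is the most technical contrast in this comparison, so a small illustration may help. The following is a minimal sketch, not anything taken from the presentation: the sample events, the two-column warehouse schema, and all function names are hypothetical. It shows a warehouse-style load that enforces structure when data is written, next to a lake-style load that keeps everything raw and applies structure only when a question is asked.

```python
import json

# Hypothetical raw events, as they might arrive from different sources.
RAW_EVENTS = [
    '{"customer_id": 1, "amount": "19.99", "channel": "web"}',
    '{"customer_id": 2, "amount": "5.00"}',
    '{"customer": "unknown", "note": "free-text trouble ticket"}',
]

# Hypothetical warehouse schema: column name -> type converter.
WAREHOUSE_SCHEMA = {"customer_id": int, "amount": float}


def load_schema_on_write(lines):
    """Warehouse style: convert and validate at load time, reject misfits."""
    table, rejected = [], []
    for line in lines:
        record = json.loads(line)
        try:
            row = {col: conv(record[col]) for col, conv in WAREHOUSE_SCHEMA.items()}
        except (KeyError, ValueError, TypeError):
            rejected.append(line)  # does not fit the fixed schema
        else:
            table.append(row)
    return table, rejected


def load_raw(lines):
    """Lake style: keep every record exactly as it arrived."""
    return list(lines)


def read_with_schema(raw_lines, fields):
    """Lake style: pick the fields of interest only at read time."""
    for line in raw_lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


if __name__ == "__main__":
    table, rejected = load_schema_on_write(RAW_EVENTS)
    print("warehouse rows:", table)
    print("rejected at load:", len(rejected))

    lake = load_raw(RAW_EVENTS)
    print("lake records kept:", len(lake))
    print("read-time view:", list(read_with_schema(lake, ["note", "channel"])))
```

In this sketch the warehouse load yields clean, typed rows but drops the record that does not fit, while the lake keeps all three records and leaves the interpretation work to whoever reads them. That mirrors the agility and user rows of the comparison.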

5) What are some of the pros and cons of a data lake?

Strengths: lower costs, one-stop data shopping
Weaknesses: data management, security
Opportunities: discovery, advanced analytics
Threats: status quo, skills


Part 3 of 3

A Comparison and Contrast Exercise

A Functional Comparison
Business Requirements (compared for Traditional vs. Big Data):
Discovery of unexplored business questions
Clean, transformed, high-quality aggregated data
Low latency, interactive reports, OLAP
High volumes of raw, highly granular, unstructured data
Exploratory analysis of preliminary data


A Cost Comparison: The TCOD Model

Challenge: Which platform is most cost-effective: EDW or Hadoop?

The Total Cost of Data (TCOD) model:
Calculates the cost of using data over a 5-year period
Includes these costs:
    System and data administration
    Data integration
    Query development
    Procedural program development
    Analytic application development

Free downloads:

Special Report: http://www.wintercorp.com/tcod-report


Spreadsheet: http://www.wintercorp.com/tcodspreadsheet

TCOD Example 1: Building a Data Warehouse

Requirements:
Large number of data sources, users, complex queries, analyses and analytic applications
Data integration and integrity
Reusability and agility to accommodate rapidly changing business requirements and long data life

Data volume: 500 TB

Source: Special Report, "Big Data: What Does It Really Cost?", Wintercorp, 2013

TCOD Example 2: Building a Data Refinery

Objective: Refine the sensor output of large industrial diesel engines

Requirements:
Rapid, intensive processing of a small number of closely related data sets
Analysis reads the entire data set
Life of the raw data is relatively short
Small group of experts collaborates on analysis

Data volume: 500 TB

Source: Special Report, "Big Data: What Does It Really Cost?", Wintercorp, 2013

A Cost Comparison: TCOD 5-Year Summary


                           Example 1: Data Warehouse               Example 2: Data Refinery
Cost ($ millions)          Data Warehouse Platform   Hadoop        Data Warehouse Appliance   Hadoop
System Cost                $44.6                     $1.4          $22.7                      $1.4
  Initial acquisition      $10.8                     $0.2          $5.5                       $0.2
  Upgrades                 $16.4                     $0.3          $8.4                       $0.3
  Maintenance/support      $15.9                     $0.2          $8.2                       $0.2
  Power/space/cooling      $1.5                      $0.6          $0.6                       $0.7
Administration             $7.7                      $8.5          $0.8                       $0.8
Application development    $16.5                     $36.0         $6.6                       $7.2
ETL                        $18.4                     --            --                         --
Complex queries            $88.7                     $475.0        --                         --
Analysis                   $88.7                     $219.0        --                         --
Total Cost of Data         $265.0 million            $740.0 million  $30.0 million            $9.3 million

Source: Special Report, "Big Data: What Does It Really Cost?", Wintercorp, 2013
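
The arithmetic behind the 5-year summary can be checked by adding up the cost lines for each platform. The sketch below is only an illustration and is not the Wintercorp spreadsheet: it transcribes the figures from the summary above (in $ millions), assumes that the system cost line already includes acquisition, upgrades, maintenance/support and power/space/cooling, and uses hypothetical names throughout.

```python
# Figures in $ millions, transcribed from the TCOD 5-year summary above.
# Assumption: "system" already includes acquisition, upgrades,
# maintenance/support and power/space/cooling, so those are not repeated.
COMPONENTS = {
    "Example 1, data warehouse platform": {
        "system": 44.6, "administration": 7.7, "application development": 16.5,
        "etl": 18.4, "complex queries": 88.7, "analysis": 88.7,
    },
    "Example 1, Hadoop": {
        "system": 1.4, "administration": 8.5, "application development": 36.0,
        "complex queries": 475.0, "analysis": 219.0,
    },
    "Example 2, data warehouse appliance": {
        "system": 22.7, "administration": 0.8, "application development": 6.6,
    },
    "Example 2, Hadoop": {
        "system": 1.4, "administration": 0.8, "application development": 7.2,
    },
}

# Totals as reported on the slide, in $ millions.
REPORTED_TOTALS = {
    "Example 1, data warehouse platform": 265.0,
    "Example 1, Hadoop": 740.0,
    "Example 2, data warehouse appliance": 30.0,
    "Example 2, Hadoop": 9.3,
}


def total_cost_of_data(components):
    """Sum the 5-year cost lines for one platform, rounded to $0.1 million."""
    return round(sum(components.values()), 1)


def totals_agree(tolerance=0.5):
    """Report whether each summed total matches the slide within rounding."""
    return {
        name: abs(total_cost_of_data(parts) - REPORTED_TOTALS[name]) <= tolerance
        for name, parts in COMPONENTS.items()
    }


if __name__ == "__main__":
    for name, parts in COMPONENTS.items():
        print(f"{name}: summed {total_cost_of_data(parts)}, "
              f"reported {REPORTED_TOTALS[name]}")
    print(totals_agree())
```

The summed lines land within rounding of the reported totals. Using those reported totals, Hadoop costs roughly 2.8 times the data warehouse platform in Example 1, while the data warehouse appliance costs roughly 3.2 times Hadoop in Example 2. The difference comes almost entirely from the query and analysis lines, not from the system cost.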

Executive's Cheat Sheet

Wrap-Up

Executive's Cheat Sheet


Big data is not new.
Analytics keep them coming back.
Big data requires new ways of working.
Use the best tool for the job.

It's a big data world out there. Now let's be safe.


Tamara Dull
tamara.dull@sas.com
@tamaradull
