DATA MODELING
FOR MODERN BI
AND ANALYTICS
13th TDWI European Conference | 17-19 June, 2013
John O'Brien
@obrienjw @radiantadvisors
john.obrien@radiantadvisors.com
#modernBI
v2.10.000
JOHN O'BRIEN
With over 25 years of experience delivering value through data warehousing and business
intelligence (BI) programs, John O'Brien's unique perspective comes from the combination of
his roles as a practitioner, consultant, and vendor in the BI industry. His knowledge of
designing, building, and growing enterprise BI systems and teams brings real-world insights to
each role and phase within a BI program.
Today, through Radiant Advisors, John provides research and advisory services that guide
companies in meeting the demands of next generation information management, architecture,
and emerging technologies.
Experienced
Education
OUTLINE
Section 1
Section 2
Section 3
Section 4
ANALYSIS AND
INFORMATION
MODELING
Copyright 2013 Radiant Advisors. All Rights Reserved
INDUSTRY EVOLUTION
Business
Analytics
2010+
DECADES OF BI EVOLUTION
1970s-1980s: Bill Inmon's Building Strategic Data Warehouses
Subject Oriented and Business Model Oriented
"Build it and they will come" or "One version of the truth"
80% of data warehouse projects fail
what happened?
why did that happen?
what's happening now?
what's going to happen?
BUSINESS ARCHITECTURES
Functionally Oriented
Business Oriented
Line of Business
CEO
Operations
Inventory
Product
Development
Line of Business
CEO
Finance
Sales
Line of Business
Customer
Support
Global Finance
BUSINESS SYSTEMS
[Figure: Marketing, Sales, Billing, and Support systems each hold their own order, product, and customer data]
Processes shown: Prospecting, Opportunities, Process Orders, Collect Revenue, Support/Retain
Customer attributes scattered across the systems: Customer rating, Customer budget, Customer Balance, Customer Last Pmt, Customer Bill Term, Customer Value, Customer Warranty, Customer Status
BUSINESS ALIGNMENT
Align People
Align Goals
Align Information
BI FEEDBACK LOOP
[Figure: the BI feedback loop]
Business Processes Executed: Quote, Order, Ship, Invoice, Support
Operational Systems Capture Business Events: Quoting App, Ordering System, Shipping App, Finance/Accounting, Customer Care Sys
Business Intelligence: Acquire, Cleanse, Integrate, Transform to Business Context, Publish Actionable Business Data
Analyze, Decide & Act: Improve Processes, Achieve Goals
MODELING
Modeling
MODELING GENERALIZED
What is modeling?
Understanding something
Analysis, Learning, Investigating, Testing Assumptions
BUSINESS MODELING
Business subjects and their relationships
Customer places Order
SUBJECTS TO NORMALIZED
Product
Customer
Finance
Order
Dimensional Business Subjects
Sales organization and processes
Finance organization and processes
Customer Relationship Management (CRM)
Operations organization and processes
BILL INMON
"A subject-oriented, nonvolatile, integrated, time-variant collection of
data in support of management's decisions."
RALPH KIMBALL
"A data warehouse is merely the union of all its constituent data
marts."
PERCEPTIONS
Are you Kimball or Inmon?
Departmental Focused or Enterprise Driven?
Are you Tactical or Strategic?
[Figure: multiple sources feed a Data Warehouse made up of a Staging Layer, an ODS, Normalized Subject Areas, and Conformed Dimensions & Metrics, which in turn serve the Operations & Supply, Finance, Order to Cash, and CRM areas]
[Figure: top-down flow. An operational database, a CRM database, and flat files are extracted; transformation code applies business rules and cleansing; the data is loaded into the Data Warehouse and then distributed to a Customer Data Mart, a Sales Data Mart, and a Financial Cube]
Integrated
Nonvolatile
Once loaded into the data warehouse, the data is not updated or
changed
Time Variant
Stores near-current data (as of the last acquisition process) and all
historical data and changes for analysis
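A minimal sketch of "nonvolatile" and "time variant" together. It assumes a hypothetical `customer_history` table, and uses SQLite only as a stand-in for the warehouse. Each load appends a row and nothing is updated in place, so the state as of any past date can still be queried.

```python
import sqlite3

# Hypothetical history table: every load appends; rows are never updated or deleted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_history (customer_id INTEGER, status TEXT, load_date TEXT)")

def load(customer_id, status, load_date):
    """Nonvolatile: each acquisition adds a new snapshot row."""
    conn.execute("INSERT INTO customer_history VALUES (?, ?, ?)",
                 (customer_id, status, load_date))

def status_as_of(customer_id, as_of):
    """Time variant: return the latest row loaded on or before `as_of` (ISO dates)."""
    row = conn.execute(
        "SELECT status FROM customer_history"
        " WHERE customer_id = ? AND load_date <= ?"
        " ORDER BY load_date DESC LIMIT 1",
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None

load(42, "Prospect", "2013-01-15")
load(42, "Active", "2013-03-01")
load(42, "Inactive", "2013-06-01")

print(status_as_of(42, "2013-04-30"))  # Active
print(status_as_of(42, "2013-06-17"))  # Inactive
```

ISO-formatted date strings sort lexically in date order, which is why a plain `<=` comparison works here.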
Department Scope
Cons
Scalable architecture
[Figure: bottom-up (bus) architecture. ERP, operational, and CRM databases and flat files are extracted into a Stage Area; transform code applies business rules; the data is loaded into a Customer Data Mart, a Sales Data Mart, and a Financial Cube, which share Conformed Dimensions along the ETL bus]
DIMENSIONAL MODELING
Answers Business Questions:
1. What were sales last month by salesperson and product category?
2. Are sales quantities of Product Category A increasing each month over the past year?
3. What products do customers in City A buy most this month?
4. Who are our repeat-buying customers?
BOTTOM UP APPROACH
Enterprise
Scope
Department Scope
Cons
Quicker Deliveries
Focused on Department
Information Needs
DIMENSIONAL MODELING
DATE
TRADE
JOB
Facts related to
Qualifiers
Facts are numeric
Facts have standard
units
Every Fact has all
Qualifiers
Modeling Answers to
business questions
BI CAPABILITIES AND
DATA MODELING
GENERATIONS OF CAPABILITIES
Next:
Opportunities
Manage Risk
Now:
What's happening
Operational
Understand
Insight
Make Decisions
What happened:
Last Year
Last Month
Last Week
Yesterday
Scorecards
Performance
Dashboard
Analytics
Director
Manager
Reports
Director
Manager
Team
Analyst
Knowledge
Worker
Manager
Knowledge
Worker
Information
Consumers
Standard Reports
Knowledge Worker
Parameterized
Reporting
Business Analysts
Analytics
Managers &
Directors
Performance
Dashboards
Executives
Scorecards
Statisticians
Data Mining
Customers
Partners & Suppliers
External
Operational
Data Mart
BI Applications
Internal
Understand how the user needs to work with information and experience level
1 STATIC REPORTING
2 INTERACTIVE REPORTING
What I want to know
Customizing: the user sets parameters
Adds value by letting users access, select, filter, and sort
Introduces a UI (user interface) that gives users control
Avoids many separate reports built on the same data
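A minimal sketch of parameterized reporting. One report definition serves many requests because the user supplies the parameters at run time, instead of someone maintaining a separate report per region and month. The `sales` table, its columns, and its rows are hypothetical. SQLite's named-parameter binding stands in for a BI tool's prompt.

```python
import sqlite3

# Hypothetical sales table standing in for the report's data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, month TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("West", "Jackets", "2013-05", 1200.0),
    ("West", "Hats", "2013-05", 300.0),
    ("East", "Jackets", "2013-05", 900.0),
    ("West", "Jackets", "2013-06", 1500.0),
])

# A single report definition; :region and :month are filled in per request.
REPORT_SQL = """
    SELECT product, SUM(amount) AS total
    FROM sales
    WHERE region = :region AND month = :month
    GROUP BY product
    ORDER BY total DESC
"""

def run_report(region, month):
    """Bind the user's parameter values safely (no string concatenation)."""
    return conn.execute(REPORT_SQL, {"region": region, "month": month}).fetchall()

print(run_report("West", "2013-05"))  # [('Jackets', 1200.0), ('Hats', 300.0)]
```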
4 MONITORING
5 PREDICTIVE ANALYTICS
What is going to happen, or what is likely to happen?
"I don't know what question to ask the data"
The data asks the person: "Is this relevant?"
MULTIDIMENSIONAL
ANALYSIS
Dimensional Modeling Paradigm
TOPICS
Data Models for Dimensional
Process for Requirements Gathering
Data Modeling for Dimensional Analysis
Physical Modeling for Dimensional Analysis
DIMENSIONAL
MODELING
Dimensional Modeling
[Figure: facts grouped under meters]
Meter: Profitability (a set of facts)
Meter: Revenue (a set of facts)
LOGICAL MODELING
ADDITIVE FACTS
Additive
Useful and meaningful when summarized along any set of
dimensions
Semi-Additive
Useful to summarize using some dimensions but not all
dimensions
Non-Additive
Impractical to sum along any set of dimensions
Example (semi-additive): the Employee Count fact can be summed across departments and
job trades, but not over time (e.g., adding up monthly counts to get a quarter)
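A small sketch of why Employee Count is semi-additive. The monthly snapshot rows are hypothetical. Summing across departments within a month is meaningful. Summing across months counts the same people once per month, so a quarter should use the period-end value or an average instead.

```python
from collections import defaultdict

# Hypothetical monthly headcount snapshots: (quarter, month, department, employee_count)
snapshots = [
    ("Q1", "Jan", "Sales", 10), ("Q1", "Jan", "Finance", 5),
    ("Q1", "Feb", "Sales", 12), ("Q1", "Feb", "Finance", 5),
    ("Q1", "Mar", "Sales", 11), ("Q1", "Mar", "Finance", 6),
]

# Additive across departments: total headcount per month.
by_month = defaultdict(int)
for _quarter, month, _dept, count in snapshots:
    by_month[month] += count
print(dict(by_month))  # {'Jan': 15, 'Feb': 17, 'Mar': 17}

# Not additive across time: 15 + 17 + 17 = 49 is not the quarter's headcount.
months = ["Jan", "Feb", "Mar"]
quarter_end = by_month[months[-1]]                           # period-end value
quarter_avg = sum(by_month[m] for m in months) / len(months)  # or an average
print(quarter_end, round(quarter_avg, 2))  # 17 16.33
```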
PHYSICAL STAR
SCHEMAS
STAR SCHEMA
[Figure: employee satisfaction star schema]
employee satisfaction (fact): time_key, emp_key, location_key, employment_org_key, labor_org_key, emp_age, emp_gender, job_change_count, employment_length_months, complaint_count, resignation_count, termination_count, promotion_count, demotion_count, disciplinary_action_count, satisfaction_score
time dimension: time_key, date_yyyymm, fiscal_yyyymm, fiscal_yyyyqq
location: location_key, location_code, location_name, location_address, location_city_name, location_state_abbr, location_zip_code
labor organization: labor_org_key, union_id_number, union_name, union_group_code, trade_code, trade_SOC_code, trade_name, contract_start_date, contract_end_date
employee: emp_key, emp_id_number, emp_name, emp_age, emp_gender, emp_hire_date, emp_term_date, emp_status_code, emp_term_reason
employment organization: emplmt_org_key, dept_number, dept_name, dept_abbr, job_id_number, job_title, job_shift_code, job_shift_name
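A cut-down sketch of the star schema above in SQLite, showing the typical star join: group by dimension attributes and aggregate the numeric facts. Column names come from the slide. The table names (`time_dim`, `employment_organization`, `employee_satisfaction`) and all rows are hypothetical, and only two of the five dimensions are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, date_yyyymm TEXT, fiscal_yyyyqq TEXT);
    CREATE TABLE employment_organization (
        emplmt_org_key INTEGER PRIMARY KEY, dept_name TEXT, job_title TEXT);
    -- Fact table: foreign keys to the dimensions plus numeric facts.
    CREATE TABLE employee_satisfaction (
        time_key INTEGER REFERENCES time_dim,
        employment_org_key INTEGER REFERENCES employment_organization,
        complaint_count INTEGER,
        satisfaction_score REAL);
    INSERT INTO time_dim VALUES (1, '201301', '201301'), (2, '201302', '201301');
    INSERT INTO employment_organization VALUES (10, 'Finance', 'Analyst'), (20, 'Sales', 'Rep');
    INSERT INTO employee_satisfaction VALUES
        (1, 10, 0, 8.0), (2, 10, 1, 7.0), (1, 20, 2, 6.0), (2, 20, 3, 5.0);
""")

# Star join: constrain and group by dimension attributes, aggregate facts.
rows = conn.execute("""
    SELECT o.dept_name, t.fiscal_yyyyqq,
           SUM(f.complaint_count) AS complaints,
           AVG(f.satisfaction_score) AS avg_score
    FROM employee_satisfaction f
    JOIN time_dim t ON f.time_key = t.time_key
    JOIN employment_organization o ON f.employment_org_key = o.emplmt_org_key
    GROUP BY o.dept_name, t.fiscal_yyyyqq
    ORDER BY o.dept_name
""").fetchall()
print(rows)  # [('Finance', '201301', 1, 7.5), ('Sales', '201301', 5, 5.5)]
```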
Advanced topic
Users interact with star schemas
BI Tool Layer: Universes, Catalogs (stars mapped to tables)
Database Layer: Views
Data stored in normalized tables
Physical Database Structures
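A minimal sketch of the database-layer idea: data stays in normalized tables, and a view presents it as a denormalized dimension that a BI tool can map as part of a star. The `department` and `job` tables are hypothetical. The view reuses column names from the employment organization dimension on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized storage tables.
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE job (job_id INTEGER PRIMARY KEY, dept_id INTEGER, job_title TEXT);
    INSERT INTO department VALUES (1, 'Finance'), (2, 'Sales');
    INSERT INTO job VALUES (100, 1, 'Analyst'), (200, 2, 'Rep');

    -- Database layer: a view flattens the normalized tables into one dimension.
    CREATE VIEW employment_org_dim AS
    SELECT j.job_id AS emplmt_org_key, d.dept_name, j.job_title
    FROM job j JOIN department d ON j.dept_id = d.dept_id;
""")

print(conn.execute("SELECT * FROM employment_org_dim ORDER BY emplmt_org_key").fetchall())
# [(100, 'Finance', 'Analyst'), (200, 'Sales', 'Rep')]
```

The trade-off is that every star query now pays for the underlying joins at run time, in exchange for keeping one normalized copy of the data.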
SNOWFLAKE SCHEMA
MULTI-DIMENSIONAL
OLAP (CUBES)
Multi-Dimensional Cubes
OVERVIEW
45+ YEARS
Some of the highlights:
2007 Oracle buys Hyperion (just when it delivers 11g embedded OLAP)
2008 IBM buys Cognos
2008 SAP buys Business Objects
MOLAP CUBES
[Figure: a MOLAP cube with three dimensions: Products (Clothing > Hats, Sweaters, Jackets), Geography (U.S. > West > Hawaii, Alaska), and Time]
Example cells: 66 clothing items sold in Alaska on December 2nd; 40 jackets sold in Alaska on December 2nd
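A toy sketch of MOLAP pre-calculation. Cells are keyed by one member from each dimension, and parent-level totals are computed ahead of time and stored beside the leaf cells, so a query for "Clothing" is a single lookup. The jacket count matches the slide's callout. The hat and sweater values are hypothetical, chosen so that the Clothing roll-up equals the slide's 66.

```python
from collections import defaultdict

# Product hierarchy from the slide: Clothing -> Hats, Sweaters, Jackets.
PRODUCT_PARENT = {"Hats": "Clothing", "Sweaters": "Clothing", "Jackets": "Clothing"}

# Leaf cells keyed by (product, region, date); hats/sweaters values are hypothetical.
leaf_cells = {
    ("Hats", "Alaska", "Dec 2"): 6,
    ("Sweaters", "Alaska", "Dec 2"): 20,
    ("Jackets", "Alaska", "Dec 2"): 40,
}

def precalculate(cells):
    """Store parent-level totals next to the leaf cells (MOLAP-style aggregation)."""
    cube = dict(cells)
    rollups = defaultdict(int)
    for (product, region, date), value in cells.items():
        rollups[(PRODUCT_PARENT[product], region, date)] += value
    cube.update(rollups)
    return cube

cube = precalculate(leaf_cells)
print(cube[("Clothing", "Alaska", "Dec 2")])  # 66
print(cube[("Jackets", "Alaska", "Dec 2")])   # 40
```

This also shows the first disadvantage listed next: every new leaf value means recomputing the stored totals, which adds latency.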
Disadvantages
Pre-Calculations
Latency
Scalability
Hierarchies
Ability to define multiple hierarchies
on a dimension
Sparsity
Far fewer cells hold actual data than the cube allocates, but
every cell is assigned a value
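A quick sketch of how sparse a dense cube gets. All members and facts are hypothetical. The cube allocates one cell per combination of dimension members, but only a handful of combinations actually have data.

```python
from itertools import product

# Hypothetical dimension members.
products = ["Hats", "Sweaters", "Jackets"]
regions = ["Hawaii", "Alaska"]
days = [f"Dec {d}" for d in range(1, 32)]

# Only a few combinations actually had sales (hypothetical facts).
actual = {
    ("Jackets", "Alaska", "Dec 2"): 40,
    ("Sweaters", "Alaska", "Dec 2"): 20,
    ("Hats", "Hawaii", "Dec 5"): 3,
}

# A dense array reserves a cell for every member combination.
allocated = len(products) * len(regions) * len(days)
print(allocated, f"{len(actual) / allocated:.1%}")  # 186 1.6%

# Materializing it assigns a value (here 0) to every empty cell.
dense = {cell: actual.get(cell, 0) for cell in product(products, regions, days)}
print(sum(1 for v in dense.values() if v == 0))  # 183
```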
SPARSITY SOLUTIONS
Cube Farming or Cube Chunking
Horizontal Partitioning
Vertical Partitioning
Hybrid OLAP
CUBE FARMING
LOB 1
LOB 2
Q1
Asia
Europe
North America
HORIZONTAL PARTITIONING
VERTICAL PARTITIONING
OEM Business
Retail Business
Asia
Europe
North America
CHUNKING CUBES
OLAP ARCHITECTURE
Information Data Flow
[Figure: multiple sources flow through Stage into the EDL and out to several data marts (DM)]
Extract Engine (little to no code)
Transformation (the real work)
Load Engine (select, filter, target)
BASIC DW ARCHITECTURE
Departmental Specialization
Localized and sub-dimensions
Local facts and derivations
Tailored Delivery
DIMENSIONAL ARCHITECTURE
ENABLING TECHNOLOGIES
SEMANTIC
VIRTUALIZATION
Semantic Virtualization
[Figure, built up over several slides: data arranged on a spectrum from More Rigid to More Agile]
Structured side: context lives in database structures, and BI tools leverage that context through direct access
Unstructured side: data sits in Hadoop HDFS and is processed with MapReduce; Pig supports individual context applied by data scientists, while Hive (and later HCatalog) provides centralized context in an abstraction layer that BI tools can also leverage
[Figure: Yesterday vs. Tomorrow]
Yesterday (constrained): very few data scientists and power users work with Hadoop HDFS through MapReduce, Pig, and Hive, so value is limited to the few users involved
Tomorrow (unleashed): HCatalog exposes the same data to databases and BI tools, so many consumers (analysts and casual users) join the power users and data scientists, and value grows with the number of users involved
DISCOVERY IN BI PROCESSES
[Figure: very few data scientists discover in Hadoop HDFS (MapReduce, Pig, Hive); a few analysts/modelers discover context and test it; the defined context is made available through HCatalog to BI tools; verified context migrates to a structured database, where more analysts/modelers and then many more analysts iterate with BI tools]
Semantic Virtualization
REVIEW
ROLE OF DATA
VIRTUALIZATION
VIRTUALIZATION DEFINED
Virtualization refers to technologies that provide a layer of
abstraction between technology stacks, allowing a single
logical view rather than a physical view.
[Figure: seven separate technology stacks, each a vertical silo]
BI Layer: RS, BO, COG, HYP, REP, REP, AS
DB Layer: SS, ORA, ORA, ORA, PG, ORA, SS
OS Layer: MS, LNX, SOL, MS, LNX, MAC, MS
Server Layer: x86 in all seven stacks
Disks Layer
BENEFITS
[Figure: users of BO, COG, HYP, and MS tools query a Virtual Database made of views over the underlying data: ORA, MS, DB2, and PG databases plus XML, log, and flat files]
DATA LEGACIES
COMPLEMENTARY SOLUTIONS
ETL:
Focused on data movement
The data warehouse is another copy of data that already exists in operational systems
Normally operates in batch mode
EAI:
Focused on moving data, with its corresponding business logic, between systems
Guaranteed delivery
Transactional in nature
Data Virtualization:
Virtualizes all data in a single view
Persists no data; metamodel driven
Data is current, up to real time
DATA MOVEMENT
[Figure: EAI moves data between client applications and the ERP; ETL extracts data from these systems and loads it into the DW]
Only DV (data virtualization) does not move data, except when returning result sets
Data Services
[Figure: query tools and other clients connect through SOAP, JDBC, and ODBC interfaces to a Virtual DB, whose virtual tables (Cust, Top Cust, Order) are built by transforms over Oracle, SQL Server, DB2, and Sybase databases plus logs, XML, and text files]
DISTRIBUTED QUERY
The original intent of data virtualization tools was
to solve the problem of querying across multiple
databases
Today these tools can query across:
Multiple database vendors
Multiple data formats
Legacy APIs
Data on multiple system platforms
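A minimal sketch of a distributed query through a virtual table. It is an approximation, not a data virtualization product: one SQLite engine with two attached in-memory databases stands in for two separate source systems (hypothetical `crm` and `billing`). A temporary view joins them without copying data into a new store.

```python
import sqlite3

# Two attached databases stand in for two separate source systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")
conn.executescript("""
    CREATE TABLE crm.customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE billing.invoice (cust_id INTEGER, amount REAL);
    INSERT INTO crm.customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO billing.invoice VALUES (1, 100.0), (1, 50.0), (2, 75.0);

    -- The "virtual table": stores no data; resolved against both sources per query.
    -- (SQLite only lets TEMP views reference tables in other attached databases.)
    CREATE TEMP VIEW customer_balance AS
    SELECT c.name, SUM(i.amount) AS balance
    FROM crm.customer c JOIN billing.invoice i ON c.cust_id = i.cust_id
    GROUP BY c.name;
""")

print(conn.execute("SELECT * FROM customer_balance ORDER BY name").fetchall())
# [('Acme', 150.0), ('Globex', 75.0)]

# A change in a source is visible immediately, with no reload step.
conn.execute("INSERT INTO billing.invoice VALUES (2, 25.0)")
print(conn.execute("SELECT balance FROM customer_balance WHERE name = 'Globex'").fetchone())
# (100.0,)
```

Real virtualization servers federate across different vendors, formats, and platforms, and push work down to each source. This sketch only shows the logical-view idea.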
PERFORMANCE
CACHING
CONFIGURATION MANAGEMENT
Strong configuration management best practices must
be followed to store versions of:
All virtual tables
All transformations
Virtual relationships
Mappings to databases
JOB ROLE
TIME VARIANT
TRANSFORMATIONS
TRANSFORMATIONS
[Figure: a client sends SQL through the SOAP/JDBC interface to a result virtual table, which is built by joining and aggregating other virtual tables and by Java code; all of these transformations are described in the virtual table metadata]
TRANSFORMATION ARCHITECTURE
Data Virtualization Server / Virtual Database
[Figure: the client connects through a SOAP/JDBC interface to joined and aggregated tables built from base tables; all source tables are represented 1 for 1, whether they come from RDBMSs over JDBC/ODBC, from legacy APIs, or from text files]
COLUMNAR SQL
DATABASES
COLUMNAR DATABASES
Transactional (OLTP) databases are built around row management,
which is what OLTP workloads need
Analytics works with data sets, and more specifically with selecting, filtering,
and grouping by columns
Therefore, columnar databases, which store data on disk in a column
orientation, typically perform much better for analytic queries than reading
entire rows of data to find the columns of interest
Columnar Databases
COLUMNAR ORIENTATION
1, John, O'Brien, CO, 50000
2, Betty, Smith, CA, 55000
3, Sue, Hughes, TX, 60000
4, Tom, Jones, NV, 65000
5, David, Saunders, CO, 70000
Row-oriented storage: 1,John,O'Brien,CO,50000,2,Betty,Smith,CA,55000,3,Sue,Hughes,TX,60000,4,Tom,Jones,NV,65000,5,David,Saunders,CO,70000
Column-oriented storage: 1,2,3,4,5,John,Betty,Sue,Tom,David,O'Brien,Smith,Hughes,Jones,Saunders,CO,CA,TX,NV,CO,50000,55000,60000,65000,70000
SELECT State, SUM(Sales) FROM Table1 GROUP BY State ORDER BY 2;
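A sketch of the two storage orientations for the five records above, and of why the query favors columns. The column names are assumptions for illustration; the slide's query only names State and Sales. Computing the aggregate from the column layout reads two arrays. The row layout interleaves every field of every record.

```python
# The five records from the slide; column names assumed for illustration.
rows = [
    (1, "John", "O'Brien", "CO", 50000),
    (2, "Betty", "Smith", "CA", 55000),
    (3, "Sue", "Hughes", "TX", 60000),
    (4, "Tom", "Jones", "NV", 65000),
    (5, "David", "Saunders", "CO", 70000),
]
COLUMNS = ["id", "first_name", "last_name", "state", "sales"]

# Row orientation: all values of one record are stored together.
row_store = [value for row in rows for value in row]

# Column orientation: all values of one column are stored together.
column_store = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}

def sales_by_state(store):
    """SELECT state, SUM(sales) ... GROUP BY state ORDER BY 2, reading only two columns."""
    totals = {}
    for state, sales in zip(store["state"], store["sales"]):
        totals[state] = totals.get(state, 0) + sales
    return sorted(totals.items(), key=lambda item: item[1])

print(sales_by_state(column_store))
# [('CA', 55000), ('TX', 60000), ('NV', 65000), ('CO', 120000)]
print(len(row_store), len(column_store["state"]) + len(column_store["sales"]))  # 25 10
```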
NOSQL DATABASES
FOR ANALYTICS
NoSQL Landscape
LOOKING AHEAD
Relational Databases are not going away
Not ACID like an RDBMS; offers options for consistency and
distribution
"Not only SQL" enables an ecosystem of storage options, so you
must understand your data better
Polyglot persistence is using different data stores in different
circumstances
UNDERSTANDING
NOSQL COLUMNAR
DATABASES AVAILABLE
Cassandra
HBase
Hypertable
Amazon DynamoDB
GOOD FOR
UNDERSTANDING
NOSQL GRAPHS
[Figure: example property graph]
Nodes: BigCo, Anna, Barbara, Carol, Dawn, Elizabeth
An employee_of edge to BigCo with properties Role=Architect, Hired=Feb 04
friend edges with properties such as Since=2005 and Since=1989
An edge property Share=[books,movies,tweets]
DATABASES AVAILABLE
Neo4j
InfiniteGraph
OrientDB
FlockDB
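A small sketch of the kind of query graph databases are built for: following relationships hop by hop. The slide's exact edges are not recoverable, so the friend edges below are hypothetical. The traversal is a plain breadth-first search over an in-memory adjacency list, not any graph database's API.

```python
from collections import deque

# Hypothetical friend edges with properties (as in a property graph).
friends = {
    ("Anna", "Barbara"): {"since": 2005},
    ("Barbara", "Carol"): {"since": 1989},
    ("Carol", "Dawn"): {"since": 2010},
    ("Anna", "Elizabeth"): {"since": 2001},
}

# Undirected adjacency list so traversal follows relationships directly.
adjacency = {}
for a, b in friends:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

def within_hops(start, max_hops):
    """Breadth-first search: people reachable in at most max_hops friend edges."""
    seen, queue = {start: 0}, deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in adjacency.get(person, ()):
            if other not in seen:
                seen[other] = seen[person] + 1
                queue.append(other)
    return sorted(p for p, d in seen.items() if 0 < d <= max_hops)

print(within_hops("Anna", 2))  # ['Barbara', 'Carol', 'Elizabeth']
```

In a relational schema the same question needs one self-join per hop. A graph store follows the stored relationships directly.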
UNDERSTANDING
NOSQL KEY VALUES
Key-Value Stores
KEY-VALUE STORES
Simple data structure of two attributes: a key and value
Simple programming interface:
Store key-value pair
Retrieve value given a key
KEY-VALUE STORES
Simplest of NoSQL data stores
Key-value is a simple hash table
Used when all access to the database is via the
primary key
TERMINOLOGY
Oracle | Riak
Database instance | Riak cluster
Table | Bucket
Row | Key-value
Rowid | Key
TERMINOLOGY
<Bucket = userData>
<Key = sessionID>
<Value = Object>, containing UserProfile, SessionData, and a ShoppingCart of CartItems
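A minimal sketch of the key-value interface described above: store a pair, retrieve a value by key, with buckets as namespaces. The `KeyValueStore` class is hypothetical and is not Riak's or any other product's API. Each bucket is simply a hash table, and the value is an opaque object such as the session in the `userData` example.

```python
class KeyValueStore:
    """Hypothetical in-memory key-value store with buckets (not a real product API)."""

    def __init__(self):
        self._buckets = {}  # bucket name -> hash table of key -> value

    def put(self, bucket, key, value):
        """Store a key-value pair; the value is opaque to the store."""
        self._buckets.setdefault(bucket, {})[key] = value

    def get(self, bucket, key, default=None):
        """Retrieve a value by key only; there is no query on the value's contents."""
        return self._buckets.get(bucket, {}).get(key, default)

    def delete(self, bucket, key):
        self._buckets.get(bucket, {}).pop(key, None)

store = KeyValueStore()
session = {"user": "jobrien", "cart": [{"sku": "A1", "qty": 2}]}
store.put("userData", "session-123", session)
print(store.get("userData", "session-123")["cart"])  # [{'sku': 'A1', 'qty': 2}]
print(store.get("userData", "session-999"))          # None
```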
DATABASES AVAILABLE
Riak
Redis (referred to as Data Structure server)
MemcacheDB
Berkeley DB
HamsterDB (for embedded use)
Amazon DynamoDB (not open-source)
Project Voldemort (open-source, based on the design of Amazon's Dynamo)
GOOD FOR
Column-oriented
Column family
HBASE
TERMINOLOGY
RDBMS | Cassandra
database instance | cluster
database | keyspace
table | column family
row | row
TERMINOLOGY
Column family
Row KeyX: Column1 (Name1:value1), Column2 (Name2:value2), ..., ColumnN (NameN:valueN)
Row KeyY: Column1 (Name1:value1), Column9 (Name9:value9), ..., ColumnN (NameN:valueN)
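A sketch of the column-family shape above, using plain nested dictionaries as a stand-in. Each row key maps to its own set of name:value columns, and rows need not share the same columns. The data is hypothetical, and the structure only mimics the model. It is not Cassandra's or HBase's API.

```python
# Hypothetical column family: row key -> {column name: value}.
users = {
    "KeyX": {"name": "Anna", "city": "Denver", "email": "anna@example.com"},
    "KeyY": {"name": "Barbara", "twitter": "@barbara"},
}

def get_columns(column_family, row_key, *names):
    """Read selected columns of one row; columns the row lacks are simply skipped."""
    row = column_family.get(row_key, {})
    return {n: row[n] for n in names if n in row}

# Adding a column to one row does not change any other row (no fixed schema).
users["KeyY"]["city"] = "Boulder"

print(get_columns(users, "KeyX", "name", "twitter"))  # {'name': 'Anna'}
print(sorted(users["KeyY"]))                          # ['city', 'name', 'twitter']
```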
UNDERSTANDING
HADOOP KEY VALUES
Hadoop Scale-out:
Rather than rely on hardware to deliver high availability, the library is designed to detect and
handle failures at the application layer, delivering a
highly available service on top of a cluster of
computers, each of which may be prone to failures.
HCatalog
Hive
Cassandra
Mahout
Hadoop MapReduce
Pig
Chukwa
HBase
ZooKeeper
WHAT IS HADOOP?
Data stored in
Hadoop Distributed File System
SCHEMA-LESS
Event Data
Data that contains standard or fixed data structures, which allows
it to be stored easily in an RDBMS
However, high-volume event data presents challenges in
loading, managing, and accessing it
ARCHITECTURE COMPONENTS
Master / Slave Architecture
NameNode
DataNode
Secondary NameNode
JobTracker
TaskTracker
ARCHITECTURE TOPOLOGY
[Figure: a client talks to the master node, which runs the NameNode and the JobTracker; a backup node runs the Secondary NameNode; each slave node runs a DataNode and a TaskTracker]
UNDERSTANDING MAPREDUCE
MapReduce is a framework for processing huge datasets
on certain kinds of distributable problems using a large
number of computers (nodes)
Collectively referred to as a cluster (if all nodes use the
same hardware) or as a grid (if the nodes use different
hardware)
Computational processing can occur on data stored either
in a file system (unstructured) or within a database
(structured)
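A single-process sketch of the MapReduce model, using word counting as the classic example. The map step emits (key, value) pairs. A shuffle groups the values by key, which the framework handles between the two steps. The reduce step combines each key's values. This illustrates the programming model only. It does not use Hadoop's Java API, and in a real cluster each document would be mapped on a different node.

```python
from collections import defaultdict
from itertools import chain

def map_step(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

documents = ["big data big clusters", "data nodes process data"]
mapped = chain.from_iterable(map_step(doc) for doc in documents)
result = dict(reduce_step(k, v) for k, v in shuffle(mapped).items())
print(result["data"], result["big"])  # 3 2
```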
MAP STEP
REDUCE STEP
PIG
HIVE
Features:
Overview:
HBASE
HADOOP ECOSYSTEM
Cloudera Distribution
Including Apache Hadoop
MODERN DATA
PLATFORM IN BI
POLYGLOT PERSISTENCE
Different databases are designed to solve
different problems
Disparate Data Storage Needs
Hybrid Approach to Persistence
Similar to today's RDBMS and MOLAP
combination
POLYGLOT PERSISTENCE
In 2006, Neal Ford coined the term
polyglot programming
The idea that applications should be written in a mix
of languages, taking advantage of the fact that different languages
are suited to tackling different problems
Polyglot Persistence
TYPICAL ARCHITECTURE
[Figure: the e-commerce platform stores both shopping cart data and completed orders in an RDBMS store, which feeds the BI/DW]
POLYGLOT ARCHITECTURE
Disparate Data Storage Needs
[Figure: the e-commerce platform keeps shopping cart data and session data in a key-value store and completed orders in a relational (RDBMS) store]
POLYGLOT ARCHITECTURE
Disparate Data Storage Needs
[Figure: the e-commerce platform uses a key-value store for shopping cart and session data, a document store for inventory and item prices, a relational (RDBMS) store for completed orders, and a graph store for the customer social graph]
[Figure: each data store sits behind its own service]
Session Storage service: shopping cart and session data in a key-value store
Inventory and Price service: inventory and item prices in a document store
Order persistence service: completed orders in an RDBMS store
"Friends bought these products" service: customer social graph in a graph store
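A minimal sketch of the service-per-store pattern above, limited to two of the four stores. A dictionary stands in for the key-value store behind the session service, and SQLite stands in for the RDBMS behind the order persistence service. The class, table, and column names are hypothetical. The point is that callers talk to services, and each service owns the store best suited to its data.

```python
import sqlite3

class SessionStorageService:
    """Shopping cart/session data -> key-value store (a dict stands in for it)."""
    def __init__(self):
        self.kv = {}
    def save_cart(self, session_id, cart):
        self.kv[session_id] = cart
    def cart(self, session_id):
        return self.kv.get(session_id, [])

class OrderPersistenceService:
    """Completed orders -> relational store (SQLite stands in for the RDBMS)."""
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE order_lines (line_id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
    def place(self, cart):
        with self.db:  # commit the whole order as one transaction
            self.db.executemany("INSERT INTO order_lines (sku, qty) VALUES (?, ?)", cart)
    def total_qty(self):
        return self.db.execute("SELECT COALESCE(SUM(qty), 0) FROM order_lines").fetchone()[0]

sessions, orders = SessionStorageService(), OrderPersistenceService()
sessions.save_cart("session-123", [("A1", 2), ("B7", 1)])
orders.place(sessions.cart("session-123"))
print(orders.total_qty())  # 3
```

The cart uses the store that gives fast lookups by session key. The completed order uses the store that gives transactional writes and SQL access for downstream BI.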
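Analytic Database Technologies
[Figure: a spectrum of analytic database technologies. At one end: highest scalability, lowest cost, schema-less data without context, accessed by programming, emerging maturity. At the other end: the EDW on an RDBMS, accepted and mature, with SQL access. Axes: Accessibility, Workload, Maturity]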
[Figure: tiers of the analytic data platform]
Labels recovered: Tier 2 - Optimized; Tier 3 - Reference; Cluster (HCatalog / Hive-QL, MapReduce); MPP; Columnar; MOLAP; In-memory; Projections; Document Stores; Graphs; Text Analytics; Data Warehouses; Master Reference Data; Integration (Links, Gateways); "Discovery, Scalable, Programmable"; "Analytics Oriented"
Data Warehouse: Optimized Workloads
[Figure: operational systems feed staging through ETL or acquisition, and history is migrated; Internet and sensor data land in Hadoop HDFS, where very few data scientists work with MapReduce, Pig, and Hive and a few analysts/modelers define context in HCatalog; ETL then feeds data marts, persisted or virtual. Operational data benefits from this context]
[Figure, over three slides: the EDW (RDBMS), an analytic DBMS, and Hadoop working together. Hadoop offers MapReduce and HCatalog / Hive-QL; the analytic DBMS offers columnar storage, in-memory access, document stores, text analysis, graph analysis, and ROLAP/MOLAP; SQL, integration, semantic projections, links, and gateways connect them. Future BI tools reach all three through services, semantic, discovery, and in-memory layers]
Polyglot Persistence
SUMMARY
NoSQL Landscape
GREAT RESOURCES
NoSQL Distilled: A Brief Guide to the Emerging World of
Polyglot Persistence. Pramod J. Sadalage, Martin Fowler,
Addison-Wesley, 2013
Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement. Eric Redmond, Jim R.
Wilson, The Pragmatic Bookshelf, 2012
Rediscovering BI
Today's BI environment is all about rethinking how we do BI and
imagining new, innovative ways to approach BI.
Rediscovering BI is a free, monthly eMagazine that challenges readers
to rethink, reexamine, and rediscover the way they approach business
intelligence.
We publish pieces that provide thought leadership, foster innovation,
challenge the status quo, and inspire you to rediscover BI.
www.RadiantAdvisors.com/RediscoveringBI
#rediscoveringBI
Subscribe
RSS:
feed://radiantadvisors.com/feed/
Email us at:
info@RadiantAdvisors.com
Linked IN:
www.linkedin.com/company/radiant-advisors
Subscribe:
THANK YOU!