
IBM Big Data Platform

Overview

Martin Pavlík
+420 731 435 691
martin_pavlik@cz.ibm.com

January 2013

2013 IBM Corporation

Big Data is a Hot Topic Because Technology Makes it Possible to Analyze ALL Available Data

Cost-effectively manage and analyze all available data in its native form: unstructured, structured, streaming

Social Media

Website

Billing
ERP

CRM

RFID

Network Switches
2012 IBM Corporation

BIG DATA is not just HADOOP

Understand and navigate federated big data sources: Federated Discovery and Navigation
Manage & store huge volume of any data: Hadoop File System, MapReduce
Structure and control data: Data Warehousing
Manage streaming data: Stream Computing
Analyze unstructured data: Text Analytics Engine
Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM

Business-Centric Big Data Enables You to Start With a Critical Business Pain and Expand the Foundation for Future Requirements

Big data isn't just a technology; it's a business strategy for capitalizing on information resources
Getting started is crucial
Success at each entry point is accelerated by products within the Big Data platform
Build the foundation for future requirements by expanding further into the big data platform

1 Unlock Big Data


Customer need
Understand existing data sources
Search and navigate data within
existing systems
No copying of data
Value statement
Get up and running quickly
Discover and retrieve big data
Business users can work even with big data sources
Solution
Vivisimo Velocity, renamed to IBM InfoSphere Data Explorer

2 Analyze Raw Data


Customer need
Ingest data as-is into Hadoop
Combine it with data from DWH
Process very large volumes of data
Value statement
Gain new insight
Overcome the high cost of converting data from unstructured to structured format
Experiment with analysis on different data and combine it with other sources
Solution
IBM InfoSphere BigInsights


Merging the Traditional and Big Data Approaches

Traditional Approach: Structured & Repeatable Analysis
Business Users: Determine what question to ask
IT: Structures the data to answer that question
Monthly sales reports, Profitability analysis, Customer surveys

Big Data Approach: Iterative & Exploratory Analysis
IT: Delivers a platform to enable creative discovery
Business: Explores what questions could be asked
Brand sentiment, Product strategy, Maximum asset utilization

InfoSphere BigInsights is more than just HADOOP

IBM InfoSphere BigInsights is much more than HADOOP

IBM Big Data platform includes much more than IBM InfoSphere BigInsights

Hadoop
Open-source software framework from Apache
Inspired by
Google MapReduce
GFS (Google File System)

HDFS
Map/Reduce
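
The Map/Reduce model named above can be sketched in a few lines of plain Python. This is only an illustration of the programming model (map, shuffle, reduce) on a single machine, not Hadoop's API; the function names and the sample sentences are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input; in Hadoop the lines would come from files in HDFS.
    sample = ["big data is not just hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

In a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves data across the network; the sketch keeps only the data flow.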


InfoSphere BigInsights
Platform for volume, variety, velocity
Enhanced Hadoop foundation

Enterprise class
Storage, security, cluster management

Analytics
Text analytics & tooling
Application accelerators

Usability
Web console
Spreadsheet-style tool
Ready-made apps

Integration
Connectivity to Netezza, DB2, JDBC databases, etc.

Can run also on top of

Breadth of capabilities
Apache Hadoop
Basic Edition (free download): Integrated install, Online InfoCenter, BigData Univ.
Enterprise Edition (licensed): Application accelerators, Pre-built applications, Text analytics, Spreadsheet-style tool, RDBMS and warehouse connectivity, Administrative tools, security, Eclipse development tools, Performance enhancements, ....

Spreadsheet-style Analysis
Web-based analysis and visualization
Spreadsheet-like interface
Define and manage long-running data collection jobs
Analyze the content of the text on the pages that have been retrieved

Build a Big Data Program: MapReduce example

Eclipse tools
For Jaql, Hive, Pig, Java MapReduce, BigSheets plug-ins, text analytics, etc.

Jaql: IBM's programming language in the Hadoop world

Integration point for various data sources
Local and distributed file systems
NoSQL databases
Content repositories
Relational sources (warehouses, operational databases)

Integration point for various analytics
Text analytics
Statistical analysis
Machine learning
Ad-hoc analysis

Jaql is a complete solutions environment supporting all other BigInsights components
Integration (DB2, Netezza, Streams)
Ad-hoc analysis (BigSheets)
Machine learning (SystemML)
Statistical analysis (R module)
BigInsights Text Analytics

Jaql
Jaql I/O: DFS, NoSQL, RDBMS, File System
Jaql Core Operators
Jaql Modules

BigInsights and the data warehouse


Big Data analytic applications

Traditional analytic tools

Data warehouse

BigInsights

Filter

Transform

Aggregate
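
The filter, transform and aggregate steps named on this slide can be sketched as a small pipeline. This is a minimal single-machine illustration, not BigInsights code; the record layout (`region`, `amount`) and all function names are assumptions made for the example.

```python
from collections import defaultdict


def filter_records(records):
    """Filter: drop raw records that carry no usable amount."""
    return (r for r in records if r.get("amount") is not None)


def transform(records):
    """Transform: normalise field formats before loading into the warehouse."""
    for r in records:
        yield {"region": r["region"].strip().upper(), "amount": float(r["amount"])}


def aggregate(records):
    """Aggregate: total amount per region, the shape a warehouse table expects."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)


def prepare_for_warehouse(raw):
    """Chain the three steps; only the small aggregated result leaves the pipeline."""
    return aggregate(transform(filter_records(raw)))


if __name__ == "__main__":
    # Hypothetical raw records as they might arrive, untyped and untidy.
    raw = [
        {"region": " cz ", "amount": "10.5"},
        {"region": "CZ", "amount": "4.5"},
        {"region": "sk", "amount": None},
        {"region": "sk", "amount": "1"},
    ]
    print(prepare_for_warehouse(raw))
```

The point of the design is that the bulky raw data stays on the Hadoop side and only the reduced, structured result is loaded into the warehouse.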

3 Simplify your warehouse


Customer need SIGNIFICANTLY
Make performance of DWH better
Reduce DWH administration costs
Value statement
Speed: 10-100x better performance
Simplicity: Administration costs reduced by 75%-90%
Scalability
Smart system
In-database analytics
Out-of-the-box integration with SPSS

Solution
IBM Netezza, renamed to PureData System for Analytics

Analyst: I need to evaluate the possible relationship between client salary and overdrafts.

IT: OK. We have to evaluate a lot of statistics, set the correct db indexes and db partitioning. It will take us 5 days.

IT: Done. You can run your analytical query.

Analyst: Great. Thanks a lot. I'm going to check the results.

Analyst: Great. I can see here some nice correlations. Now I need to look at it from a different perspective.

IT: Noooo!!! It's not possible to work here!

IT: Ohhh, welcome dear friend. Understand. So, it's ... another 5 days of our work.

And now with Netezza ...


Analyst: I need to evaluate the possible relationship between client salary and overdrafts. I will use Netezza.

IT


Analyst: Great. I can see here some nice correlations. Now I need to look at it from a different perspective. With Netezza I can run the query immediately. The response time will be the same.

IT can do something else, much more useful.


Built-In Expertise Makes This as Simple as an Appliance

Dedicated device
Optimized for purpose
Complete solution
Fast installation
Very easy operation
Standard interfaces
Low cost


In October 2012

IBM Netezza was renamed to IBM PureData System for Analytics


Netezza Genesis in T-Mobile CZ

Proof-of-Concept Project
New Enterprise Data Warehouse platform selection
Comparison of existing and other platforms
Selection Criteria

Performance
Operational Savings

...and the winner was: Netezza

Netezza Genesis in T-Mobile CZ


Expectations
Significant response improvement:
Faster platform means better reports response
Direct Data Availability
Higher trust in data, one version of truth
Aggregation reduction
Any attribute available
Operational Benefits
Storage savings (no data replicas)
Administration cost reduction (DBA)
Infrastructure Simplification
Lower environment complexity

Netezza Genesis in T-Mobile CZ


Project Implementation
EDW platform migration
Netezza platform implementation
ETL graphs/processes redesign

BI Front-End Tool Migration
SAP BusinessObjects implementation
All reports redesign

Main Integration Partner: T-Systems CZ

Netezza Genesis in T-Mobile CZ


Current Status
All relevant ETL processing redesigned
Parallel run of the original and Netezza platforms finished
Netezza is now the only primary platform

Real Netezza experience from T-Mobile Czech Rep.


Original Platform

Netezza

2 hours

1 minute

Payment discipline of current month invoices

33 minutes

17 seconds

Overdue Debt of Invoices in Current Month

10 hours

23 seconds

Average Monthly Invoice Figures

50 minutes

38 seconds

Workflow Reporting
Invoicing and Payments reporting

RESPONSE TIME MASSIVELY IMPROVED
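
To put a number on "massively", the four response-time pairs quoted on this slide can be converted to seconds and divided. This is a small arithmetic sketch; the pairs are taken in the order they appear on the slide, and the variable and function names are assumptions made for the example.

```python
# (original platform, Netezza) response times in seconds, as quoted on the slide
TIMINGS = [
    (2 * 3600, 60),    # 2 hours    vs 1 minute
    (33 * 60, 17),     # 33 minutes vs 17 seconds
    (10 * 3600, 23),   # 10 hours   vs 23 seconds
    (50 * 60, 38),     # 50 minutes vs 38 seconds
]


def speedup(original_s, netezza_s):
    """How many times faster the Netezza run was than the original run."""
    return original_s / netezza_s


# Speedup factor for each pair, rounded to the nearest whole number
SPEEDUPS = [round(speedup(original, netezza)) for original, netezza in TIMINGS]

if __name__ == "__main__":
    for (original, netezza), factor in zip(TIMINGS, SPEEDUPS):
        print(f"{original:>6} s -> {netezza:>3} s  (about {factor}x faster)")
```

The factors work out to roughly 120x, 116x, 1565x and 79x, so three of the four pairs sit near or above the upper end of the 10-100x range.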

4 Reduce costs with Hadoop


Customer need SIGNIFICANTLY
Too much data => too expensive to store and maintain
A big portion is kept just in case
Data volume is still growing => it's even more expensive
=> too expensive to keep all data in a standard DWH
Value statement
Leverage the parallel processing architecture of Hadoop
Hadoop uses cheap commodity HW
Enable business users to keep working in the same or similar way
Solution
IBM InfoSphere BigInsights


BigInsights and the data warehouse


Traditional analytic tools

Big Data analytic applications

From Cognos BI via Hive JDBC

BigInsights

Data Warehouse

Query-ready archive for cold warehouse data

Future: The SQL interface . . . .

Rich SQL query capabilities
SQL '92 and 2011 features
Correlated subqueries
Windowed aggregates

SQL access to all data stored in InfoSphere BigInsights

Robust JDBC/ODBC support

Take advantage of key features of each data source
Leverage MapReduce parallelism
OR
Achieve low latency

Application
SQL Language, JDBC / ODBC Driver
JDBC / ODBC Server
SQL interface Engine
Data Sources: Hive tables, HBase tables, CSV files
InfoSphere BigInsights
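
The two SQL features named on this slide, correlated subqueries and windowed aggregates, look as follows in practice. The sketch uses Python's built-in `sqlite3` module purely as a stand-in engine (window functions need SQLite 3.25 or later); it says nothing about the BigInsights SQL interface itself, and the `invoices` table and its rows are invented for the example.

```python
import sqlite3

# In-memory stand-in database with a hypothetical invoices table
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (customer TEXT, month TEXT, amount REAL);
    INSERT INTO invoices VALUES
        ('A', '2012-01', 100), ('A', '2012-02', 150),
        ('B', '2012-01', 80),  ('B', '2012-02', 40);
""")

# Correlated subquery: invoices above that same customer's average amount.
# The inner query refers to the outer row through the alias i.
above_avg = conn.execute("""
    SELECT customer, month, amount
    FROM invoices AS i
    WHERE amount > (SELECT AVG(amount) FROM invoices WHERE customer = i.customer)
    ORDER BY customer
""").fetchall()

# Windowed aggregate: running total per customer, ordered by month.
running = conn.execute("""
    SELECT customer, month,
           SUM(amount) OVER (PARTITION BY customer ORDER BY month) AS running_total
    FROM invoices
    ORDER BY customer, month
""").fetchall()

if __name__ == "__main__":
    print(above_avg)
    print(running)
```

Both queries are standard SQL, which is the point of the slide: an analyst or a BI tool can keep issuing familiar statements over JDBC/ODBC while the engine decides how to execute them.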

5 Analyze Streaming Data


Customer need
Process and leverage streaming data
Select valuable data from the data stream for future processing
Quickly process data that becomes useless if it's not processed immediately
Value statement
React in real time to seize an opportunity before it expires
Periodically adjust streaming models based on analysis of data at rest

Solution
IBM InfoSphere Streams


Why and when to use InfoSphere Streams?

Applications needing on-the-fly processing, filtering and analysis of streaming data
Sensors: environmental, industrial, GPS, images, videos
Data exhaust: network data, system logs (web server, app server)
High-rate transaction data: financial transactions, CDRs

At least 2 criteria from the list below should be fulfilled
Isolation: processing in isolation or in limited windows (time / number of records)
Non-traditional formats included: spatial data, images, text, voice
Integration challenges: different connection methods, different data rates, different processing requirements
Multiple processing nodes: volume / rate very high => scalability required
Sub-millisecond latency: immediate analysis and response
Store & mine approach doesn't work: because of the very high volume of data (and its rate)
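
The "limited windows (time / number of records)" criterion above can be sketched with two small generators. This is plain Python for illustration only, not the InfoSphere Streams programming model; the function names and the sample values are assumptions made for the example.

```python
from collections import deque


def count_window(stream, size):
    """Yield the average over a sliding window of the last `size` records."""
    window = deque(maxlen=size)  # the oldest record falls out automatically
    for value in stream:
        window.append(value)
        if len(window) == size:
            yield sum(window) / size


def time_window(timestamps, seconds):
    """Yield (timestamp, events seen in the last `seconds`) for each event.

    `timestamps` must arrive in ascending order, as they would on a live feed.
    """
    window = deque()
    for ts in timestamps:
        window.append(ts)
        # Evict events that are `seconds` or more older than the newest one
        while window and window[0] <= ts - seconds:
            window.popleft()
        yield ts, len(window)


if __name__ == "__main__":
    print(list(count_window([1, 2, 3, 4, 5], size=3)))
    print(list(time_window([0, 1, 2, 10, 11], seconds=5)))
```

Each record is processed once as it arrives and only the window contents are held in memory, which is why this style copes with data volumes that a store-and-mine approach cannot.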



Streams and BigInsights - Integrated Analytics on Data in Motion & Data at Rest

Visualization of real-time and historical insights

InfoSphere Streams: Data ingest, preparation, online analysis, model validation
InfoSphere BigInsights, Database & Warehouse: Data integration, data mining, machine learning, statistical modeling

Data
Control flow
1. Data Ingest
2. Bootstrap/Enrich
3. Adaptive Analytics Model
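
The adaptive loop on this slide, where a model scores data in motion and is periodically re-derived from data at rest, can be sketched as follows. The model here is a deliberately simple threshold (mean plus k standard deviations); that choice, the class name and the sample values are all assumptions made for the example, not what Streams or BigInsights ship.

```python
from statistics import mean, stdev


class ThresholdModel:
    """Flags a streaming value as unusual when it exceeds mean + k * stdev."""

    def __init__(self, history, k=3.0):
        self.k = k
        self.refit(history)

    def refit(self, history):
        """Re-derive the threshold from data at rest (the adaptive step)."""
        self.threshold = mean(history) + self.k * stdev(history)

    def score(self, value):
        """Score one record in motion against the current threshold."""
        return value > self.threshold


if __name__ == "__main__":
    # Hypothetical historical values held in the warehouse or in Hadoop
    model = ThresholdModel([10, 12, 11, 9, 8], k=2.0)
    print(model.score(13), model.score(14))

    # Later, the batch side sees that normal levels have shifted and refits
    model.refit([20, 22, 21, 19, 18])
    print(model.score(14))
```

The streaming side stays cheap because it only compares against a number; the expensive work of finding the right number happens offline and is pushed back through the control flow.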


The Platform Advantage


BENEFITS

IN DETAIL

Increase over time: by moving from entry to a 2nd and 3rd project

Lowering deployment costs: shared components

Analytic Applications
BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics

IBM Big Data Platform


Visualization
& Discovery

Application
Development

Systems
Management

Integration
Accelerators
Points of leverage

Shared text analytics for Streams and BigInsights
HDFS connectors (data integration (ETL, ...), Streams)
Accelerators
Build across multiple engines


Hadoop
System

Stream
Computing

Data
Warehouse

Information Integration & Governance

