Overview
Martin Pavlík
+420 731 435 691
martin_pavlik@cz.ibm.com
January 2013
Social Media
Website
Billing
ERP
CRM
RFID
Network Switches
2012 IBM Corporation
Data Warehousing
Stream Computing
Business Users
Determine what question to ask
Delivers a platform to enable creative discovery
IT
Business
Structures the data to answer that question
Explores what questions could be asked
IT
Brand sentiment
Product strategy
Maximum asset utilization
Hadoop
Open-source software framework from Apache
Inspired by
Google MapReduce
GFS (Google File System)
HDFS
Map/Reduce
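The Map/Reduce model named above can be sketched in plain Python. This is only an illustration of the programming model (map, group by key, reduce) running in a single process; it is not Hadoop's API, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, the step a framework performs between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce functions run in parallel on many nodes over data stored in HDFS; the sketch keeps only the data flow.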
InfoSphere BigInsights
Platform for volume, variety, velocity
Enhanced Hadoop foundation
Enterprise Edition
Licensed
Enterprise Class
Storage, security, cluster management
Enterprise class
Analytics
Text analytics & tooling
Application accelerators
Usability
Web console
Spreadsheet-style tool
Ready-made apps
Application accelerators
Pre-built applications
Text analytics
Spreadsheet-style tool
RDBMS, warehouse connectivity
Basic Edition
Administrative tools, security
Eclipse development tools
Free download
Performance enhancements
....
Integrated install
Online InfoCenter
BigData Univ.
Apache
Hadoop
Integration
Connectivity to Netezza, DB2, JDBC databases, etc.
Breadth of capabilities
Spreadsheet-style Analysis
Web-based analysis and visualization
Spreadsheet-like interface
Define and manage long-running data collection jobs
Analyze the text content of the pages that have been retrieved
(Integration)
DB2, Netezza,
Streams,
Ad-Hoc analysis
(BigSheets)
Text analytics
Statistical analysis
Machine learning
Ad-hoc analysis
Machine learning (SystemML)
Statistical analysis (R module)
BigInsights Text Analytics
Jaql
Jaql I/O
DFS
Jaql Core
Operators
NoSQL
RDBMS
Jaql
Modules
File
System
Traditional analytic tools
Data warehouse
BigInsights
Filter
Transform
Aggregate
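The Filter, Transform, Aggregate stages can be illustrated as a small pipeline. The sketch below is plain Python, not Jaql syntax, and the sample records and field names (`customer`, `type`, `seconds`) are made up for the example.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"customer": "A", "type": "call", "seconds": 120},
    {"customer": "B", "type": "sms", "seconds": 0},
    {"customer": "A", "type": "call", "seconds": 60},
    {"customer": "B", "type": "call", "seconds": 30},
]


def filter_calls(rows):
    """Filter: keep only voice-call records."""
    return (r for r in rows if r["type"] == "call")


def to_minutes(rows):
    """Transform: project the fields we need and convert seconds to minutes."""
    return ({"customer": r["customer"], "minutes": r["seconds"] / 60} for r in rows)


def total_minutes(rows):
    """Aggregate: sum minutes per customer."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["customer"]] += r["minutes"]
    return dict(totals)


# Chain the three stages; each stage consumes the output of the previous one.
summary = total_minutes(to_minutes(filter_calls(records)))
```

Each stage takes a stream of records and returns a stream, which is the property that lets such pipelines be distributed across a cluster.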
Solution
IBM Netezza renamed to PureData System for Analytics
Analyst
IT
IT
IT
Analyst
Dedicated device
Optimized for purpose
Complete solution
Fast installation
Very easy operation
Standard interfaces
Low cost
In October 2012
Netezza
Genesis in T-Mobile CZ
Proof-of-Concept Project
New Enterprise Data Warehouse platform selection
Comparison of existing and other platforms
Selection Criteria
Performance
Operational Savings
Netezza
2 hours
1 minute
33 minutes
17 seconds
10 hours
23 seconds
50 minutes
38 seconds
Workflow Reporting
Invoicing and Payments reporting
Big Data analytic applications
From Cognos BI via Hive JDBC
BigInsights
Data Warehouse
SQL Language
JDBC / ODBC Driver
Data Sources
Hive tables
HBase tables
CSV Files
InfoSphere BigInsights
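The idea of putting a SQL layer over file-based data can be sketched as follows. This example deliberately swaps in Python's built-in `sqlite3` module as a stand-in; it does not use Hive, JDBC, or BigInsights, and the table name `invoices` and the CSV contents are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a file stored on the cluster.
CSV_TEXT = """customer,amount
A,100
B,250
A,50
"""


def load_csv(conn, text):
    """Expose CSV rows as a SQL table so they can be queried like any other source."""
    rows = list(csv.DictReader(io.StringIO(text)))
    conn.execute("CREATE TABLE invoices (customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?)",
        [(r["customer"], int(r["amount"])) for r in rows],
    )


# In-memory database used only as a stand-in SQL engine for the illustration.
conn = sqlite3.connect(":memory:")
load_csv(conn, CSV_TEXT)

# A reporting tool would issue a query like this through its database driver.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM invoices GROUP BY customer ORDER BY customer"
).fetchall()
```

The point of the pattern is that the reporting tool only sees tables and SQL, regardless of whether the data sits in Hive tables, HBase tables, or CSV files.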
Solution
IBM InfoSphere Streams
Data Exhaust
Network data
System logs (web server, app server)
Financial transactions
CDRs
Isolation
Processing in isolation or in limited windows (time / number of records)
Integration challenges
Sub-millisecond latency
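The two window types mentioned above (limited by number of records or by time) can be sketched as follows. This is a plain-Python illustration, not InfoSphere Streams code; the class names `CountWindow` and `TimeWindow` are hypothetical, and a running average is used only as an example computation.

```python
from collections import deque


class CountWindow:
    """Sliding window over the most recent `size` values."""

    def __init__(self, size):
        # deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=size)

    def add(self, value):
        """Add a value and return the average over the current window."""
        self.values.append(value)
        return sum(self.values) / len(self.values)


class TimeWindow:
    """Sliding window over values newer than `span_seconds` before the latest one."""

    def __init__(self, span_seconds):
        self.span = span_seconds
        self.items = deque()  # (timestamp, value) pairs in arrival order

    def add(self, timestamp, value):
        """Add a timestamped value, evict expired ones, return the window average."""
        self.items.append((timestamp, value))
        while self.items[0][0] <= timestamp - self.span:
            self.items.popleft()
        return sum(v for _, v in self.items) / len(self.items)
```

Both windows process each record once as it arrives and keep only a bounded amount of state, which is what makes low-latency processing of unbounded streams feasible.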
InfoSphere Streams
Data
Data integration, data mining, machine learning, statistical modeling
1. Data Ingest
2. Bootstrap/Enrich
Data ingest, preparation, online analysis, model validation
Control flow
InfoSphere BigInsights, Database & Warehouse
IN DETAIL
Increase over time
Lowering deployment costs
Shared components
Analytic Applications
BI / Reporting
Exploration / Visualization
Functional App
Industry App
Predictive Analytics
Content Analytics
Application Development
Systems Management
Integration
Accelerators
Points of leverage
Hadoop System
Stream Computing
Data Warehouse
THINK