[Figure: big data and analytics architecture diagram. Labels recoverable from the diagram:]
Zones: Real-time analytics zone; Enterprise warehouse, data mart and analytic appliances zone; Information ingestion and operational information zone; Exploration, landing and archive zone; Information governance zone
Products: InfoSphere Streams; InfoSphere Discovery; DB2 BLU; PureData System for Analytics; BigInsights and Hadoop; Cognos; SPSS; InfoSphere Server and DataStage; Cognitive Fabric
What is happening? Discovery and exploration
Why did it happen? Reporting, analysis, content analytics
What could happen? Predictive analytics and modeling
What action should I take? Decision management
[Figure: Throughput (emails/s, 0 to 35,000; more throughput is better) of Streams vs. Storm by parallelism (dataset): x4 (100%), x8 (100%), x8 (200%), x8 (400%). Chart annotation: 2.6x-12.3x.]
Based on IBM internal tests comparing InfoSphere Streams against Apache Storm. Results may not be typical and will vary based on actual workload, configuration, applications and other
variables in a production environment. Users of this document should verify the applicable data for their specific environment. Contact IBM and see what we can do for you.
https://www.ibmdw.net/streamsdev/wp-content/uploads/sites/15/2014/04/Streams-and-Storm-April-2014-Final.pdf
Big Data and Real Time Analytics 11
InfoSphere BigInsights includes Apache
Hadoop to process data at rest
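[Figure: Hadoop cluster with a processing layer and a storage layer. Input is processed by a MapReduce Java program to produce a result; the storage layer shows pieces labeled B1, B2, B3 and R1, R2, R3 spread across the nodes.]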
A distributed file system that spans all the nodes in a Hadoop cluster
Files are split automatically at load time into blocks and spread among DataNodes
Elastically scalable
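
The block splitting described above can be pictured with a small simulation. This is only a sketch, not how HDFS is implemented: the 128 MB block size, the replication factor of 3, the node names, and the round-robin placement are all assumptions made for illustration (real HDFS placement is rack-aware and configurable).

```python
def split_into_blocks(file_size, block_size):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Round-robin is an illustrative stand-in for a real placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


MB = 1024 * 1024
# Hypothetical 300 MB file and 128 MB block size (assumed values).
blocks = split_into_blocks(300 * MB, 128 * MB)
placement = place_blocks(len(blocks), ["node1", "node2", "node3", "node4"])

for index, (offset, length) in enumerate(blocks):
    print(f"block {index}: offset={offset} length={length} on {placement[index]}")
```

Adding a node to the `nodes` list spreads later blocks over more machines, which is the intuition behind the "elastically scalable" point.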
The MapReduce framework sends
programs out to the data
[Figure: A MapReduce job is distributed across the cluster; each node runs its own map and reduce tasks.]
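
To make the idea of sending programs out to the data concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps using a word count. It does not use the Hadoop API; the function names and the sample partitions are hypothetical, and each inner list stands in for the data held on one node.

```python
from collections import defaultdict


def map_phase(record):
    """Map task: emit (word, 1) for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce task: combine all values emitted for one key."""
    return key, sum(values)


def run_job(partitions):
    """Run the same map function over every partition, then reduce by key."""
    mapped = []
    for partition in partitions:  # on a cluster, each partition is mapped where it is stored
        for record in partition:
            mapped.extend(map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input: two partitions standing in for two nodes.
partitions = [["big data at rest"], ["data in motion", "big insights"]]
counts = run_job(partitions)
print(counts)
```

The point of the sketch is that `map_phase` is small and identical everywhere, so it is cheaper to ship it to each partition than to move the partitions to one place.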
Developer Role
Eclipse-based tooling
InfoSphere BigInsights Console
Read/write access to HDFS
Extensive views of jobs and workflows in the system
Application staging, launch and scheduling center
Many built-in accelerators
Administrator Role
Complete management of cluster
− Monitor/start/stop components
− Add/remove nodes
Portal style dashboards
Huge volumes of unstructured data
Feelings - Attitudes - Emotions - Opinions - Thoughts - Desires
Product demand
New product acceptance
Competitive threats
Threats to brand reputations
Advertisement targets
Topic
Data Source
Service Oriented
Twitter
Finance
Likes                                              Dislikes
Love the check guard feature                       Don’t trust the on-line banking feature
Like the on-line bill pay feature                  Don’t like to wait in line for a long time
Like that the ATMs are located all over the city   Don’t like the ATM fees
Like the service representatives                   Hate the overdraft fees
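
As a rough illustration of how comments like these could be sorted into likes and dislikes, the sketch below uses a simple keyword rule. This is an assumption made for illustration only: it is not the text analytics used by BigInsights, and the keyword lists and function names are hypothetical.

```python
import re

# Hypothetical keyword lists chosen to fit the sample comments.
POSITIVE = ("love", "like")
NEGATIVE = ("don't", "dont", "hate")


def classify(comment):
    """Return 'like', 'dislike', or 'unknown' for one comment."""
    text = comment.lower().replace("\u2019", "'")  # normalize curly apostrophes
    tokens = re.findall(r"[a-z']+", text)
    # Negative cues are checked first so that "Don't like ..." counts as a dislike.
    if any(token in NEGATIVE for token in tokens):
        return "dislike"
    if any(token in POSITIVE for token in tokens):
        return "like"
    return "unknown"


def bucket(comments):
    """Group comments by their classification."""
    table = {"like": [], "dislike": [], "unknown": []}
    for comment in comments:
        table[classify(comment)].append(comment)
    return table


comments = [
    "Love the check guard feature",
    "Don\u2019t like the ATM fees",
    "Like the service representatives",
    "Hate the overdraft fees",
]
result = bucket(comments)
print(result)
```

A keyword rule like this breaks quickly on sarcasm or longer sentences, which is one reason production text analytics relies on richer extraction rules.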
[Figure: RPE2 per core, 2-socket HP servers (y-axis 0 to 2500)]
Sandy Bridge EP, 2.9 GHz, 16 cores: 2283
Ivy Bridge EP, 2.7 GHz, 24 cores: 2069
Ivy Bridge EX, 2.8 GHz, 30 cores: 2049
The number shown is the best in each category (sockets and number of cores). RPE2 numbers are derived from the following six benchmark inputs:
The data in this tool is derived from RPE2 from Ideas International. Ideas International was acquired by Gartner, Inc. in 2012. © 2014 Gartner, Inc. and/or its affiliates. All rights reserved.
Scale-out
Power S822L servers are priced competitively with Intel Ivy Bridge servers
Linux on Intel Ivy Bridge + KVM vs. Linux on POWER8 + KVM

                      Dell PowerEdge R720   HP ProLiant DL380 G8   IBM Power S822L
Comparable TCA        $21,300               $22,763                $22,382
Server list price*    $12,605               $14,068                $14,895

* 3-year warranty, on-site
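
The comparison can be checked with a few lines of arithmetic. The sketch below assumes that the first row of figures ($21,300, $22,763, $22,382) is the comparable TCA and the second row ($12,605, $14,068, $14,895) is the server list price; the slide does not say what makes up the difference between the two, so the code only reports it.

```python
# Figures copied from the slide; the row-to-label mapping is an assumption.
prices = {
    "Dell PowerEdge R720": {"tca": 21300, "list": 12605},
    "HP ProLiant DL380 G8": {"tca": 22763, "list": 14068},
    "IBM Power S822L": {"tca": 22382, "list": 14895},
}


def percent_diff(value, baseline):
    """Percentage by which value is above (+) or below (-) baseline."""
    return 100.0 * (value - baseline) / baseline


def summarize(prices, reference="IBM Power S822L"):
    """For each server: TCA minus list price, and the reference TCA relative to it."""
    ref_tca = prices[reference]["tca"]
    return {
        name: {
            "beyond_list": p["tca"] - p["list"],
            "reference_vs_this_pct": round(percent_diff(ref_tca, p["tca"]), 1),
        }
        for name, p in prices.items()
    }


summary = summarize(prices)
for name, row in summary.items():
    print(name, row)
```

Under that reading, the S822L TCA is about 5% above the Dell configuration and about 2% below the HP configuration, which is the sense in which the prices are comparable.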
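[Figure: benchmark comparison chart; the plotted values and the y-axis label are not recoverable. Configurations shown:]
Xeon E5-2697v2: 8 nodes, 192 cores, 1 TB
Xeon E5-2630v2: 24 nodes, 288 cores, 10 TB
Xeon E5-2665: 16 nodes, 256 cores, 10 TB
POWER7+ 7R2: 10 nodes, 120 cores, 1 TB
POWER8 S822L: 8 nodes, 192 cores, 10 TB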
Allows variable block sizes to exist on the same cluster to meet the needs of different applications
@thinkpowerlinux