Jürgen Urbanski
Board Member Big Data & Analytics of BITKOM (German IT Industry Association)
Views and opinions expressed are not necessarily those of his employer T-Systems, the enterprise arm of Deutsche Telekom.
Volume
Velocity
N = Highlighted today
- Enterprise data warehouse offload
- Data landing zone
- Active archive
- Enterprise-wide data lake augmenting the EDW
- Mainframe offload

- 360 degree view of customer value
- Targeted TV advertising, serving the most profitable TV ad to individual viewers based on analysis of their viewing behavior (e.g., what ads prompt the user to switch channels)
- Personalized marketing campaigns, integrating analytics across various marketing channels
- Big data as a product

- Network maintenance and upgrades, operational intelligence for better capacity planning and customer experience
- Analytics and archiving of call detail records (CDRs) on Hadoop for compliance, billing disputes, and congestion monitoring*
- Real time analytics of CDRs via streaming and in-memory technology to reduce fraud on pre-paid services (e.g., SIM cloning)
- Security analytics (e.g., intrusion detection)
- M2M device telemetry analytics (e.g., home security and assisted living)
- Log analytics for customer support
* A large mobile carrier might reach 1 billion new CDRs, ingesting 20 TB per day
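To make the footnote's scale tangible, the two figures can be turned into per-record and per-second rates. This is only a back-of-the-envelope sketch: it assumes both figures refer to the same day and that 1 TB means 10**12 bytes, neither of which the slide states. The function name `ingest_profile` is made up for illustration.

```python
# Back-of-the-envelope sizing from the footnote's two figures.
# Assumptions (not stated in the source): both figures are per day,
# and 1 TB = 10**12 bytes.
CDRS_PER_DAY = 1_000_000_000
BYTES_PER_DAY = 20 * 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_profile(records_per_day, bytes_per_day):
    """Derive average record size and sustained ingest rates."""
    return {
        "bytes_per_record": bytes_per_day / records_per_day,
        "records_per_second": records_per_day / SECONDS_PER_DAY,
        "megabytes_per_second": bytes_per_day / SECONDS_PER_DAY / 10**6,
    }


profile = ingest_profile(CDRS_PER_DAY, BYTES_PER_DAY)
for name, value in profile.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions the platform would have to sustain roughly 11,600 records and 230 MB per second around the clock, before any replication or peak-hour headroom.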
The Solution
- Many EDWs are at capacity
- Running out of budget before running out of relevant data
- Older data archived in the dark, not available for exploration

- Hadoop for data storage and processing: parse, cleanse, apply structure and transform
- Free EDW for valuable queries
- Retain all data for analysis!
[Figure: data warehouse workload mix]
Data warehouse before offload: Operational (44%), Analytics (11%), ETL Processing (42%)
Data warehouse after offload: Operational (50%), Analytics (50%)
The Solution
Functionality includes
- Price heat maps and trending
- Demand- and supply-side info
- Local area information
- Comparable transactions
Team
3 Network Maintenance and Upgrades to Improve the Customer Experience: The Situation
Challenge
- Poor visibility into how cable network congestion affects churn, and where exactly network upgrades produce the most incremental revenue
Approach
- Analyze node utilization against customer experience indicators to see if congestion correlates with experience
  - Increase of dropped packets (IPDR data)
  - Increase of calls into the call center for customers associated with these nodes (customer experience data)
  - Increase of requests to drop service (work order data)
Hypothesis
- Nodes considered to be causing customer experience issues can be prioritized for maintenance and upgrade based on the value of the customers served by each node
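The analysis described on this slide can be sketched in a few lines: correlate node utilization with each of the three experience indicators, then rank the problem nodes by the value of the customers they serve. This is a minimal illustration, not the project's actual method. The `NodeStats` fields, the 0.8 utilization threshold used to flag a node, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class NodeStats:
    node_id: str
    utilization: float      # share of node capacity in use, 0..1
    dropped_packets: int    # from IPDR data
    support_calls: int      # from customer experience data
    drop_requests: int      # from work order data
    customer_value: float   # hypothetical value of customers on the node


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def congestion_correlations(nodes):
    """Correlate utilization with each experience indicator across nodes."""
    util = [n.utilization for n in nodes]
    return {
        "dropped_packets": pearson(util, [n.dropped_packets for n in nodes]),
        "support_calls": pearson(util, [n.support_calls for n in nodes]),
        "drop_requests": pearson(util, [n.drop_requests for n in nodes]),
    }


def prioritize(nodes, utilization_threshold=0.8):
    """Rank congested nodes by customer value, highest first.

    The threshold is an assumed stand-in for "nodes considered to be
    causing customer experience issues".
    """
    congested = [n for n in nodes if n.utilization >= utilization_threshold]
    return sorted(congested, key=lambda n: n.customer_value, reverse=True)


# Made-up sample data for four nodes.
nodes = [
    NodeStats("A", 0.95, 900, 120, 30, 50_000.0),
    NodeStats("B", 0.40, 100, 15, 2, 80_000.0),
    NodeStats("C", 0.85, 700, 90, 20, 120_000.0),
    NodeStats("D", 0.30, 50, 10, 1, 20_000.0),
]
correlations = congestion_correlations(nodes)
ranked = [n.node_id for n in prioritize(nodes)]
print(correlations)
print(ranked)
```

In this toy data set node C is upgraded before node A even though A is more congested, because the customers on C are worth more. That is the value-based prioritization the slide describes.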
3 Network Maintenance and Upgrades to Improve the Customer Experience: The Solution
- Analysis that integrates subscriber and network node data to see correlations between network congestion and customer experience
  - 11 different data sources
  - 4m subscriber records, 12m work orders, 9m calls, 42m IPDRs [1], 20m Tivoli NPMs [2]
- Finding: Only a small number of nodes are responsible for the majority of the negative customer experience
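A finding of this kind is typically checked with a simple concentration (Pareto) analysis: sort nodes by their count of negative events and see how few are needed to cover more than half of the total. The sketch below assumes a per-node event count is already available. The function name, the 50% cut-off for "majority", and the numbers are illustrative, not taken from the project.

```python
def nodes_covering_share(events_per_node, share=0.5):
    """Return the smallest set of top nodes whose negative events
    together exceed `share` of all negative events."""
    total = sum(events_per_node.values())
    if total == 0:
        return []
    ranked = sorted(events_per_node.items(), key=lambda kv: kv[1], reverse=True)
    selected, covered = [], 0
    for node_id, count in ranked:
        if covered > share * total:
            break
        selected.append(node_id)
        covered += count
    return selected


# Made-up counts of negative events (dropped packets, calls, drop requests).
events = {"n1": 450, "n2": 300, "n3": 50, "n4": 40, "n5": 30, "n6": 80}
worst = nodes_covering_share(events)
print(worst, f"{len(worst) / len(events):.0%} of nodes")
```

Here two of six nodes account for more than half of all negative events, which is the shape of result the slide reports.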
[Figure: diagram relating Subscriber, Caller Experience, Work Orders, Network Node and Products data]
[1] IPDR = Internet Protocol Detail Record, provides information about IP-based service usage, usually to inform OSS and BSS systems.
[2] NPM = NetView Performance Monitor Messages.
Source: Zaloni project for a large cable provider
Approach to Execution
Agile learning on each project
Architecture & Design, Proof of Concept, Pilot, Production Implementation
Training + Programmatic steering
Supply-side requirements:
- Higher capital efficiency
- Lower upfront investment
- Faster time to value
- More rapid innovation
GOAL: Platform that natively supports mixed workloads as a shared service
[Figure: mixed workloads on a shared Big Data platform, including Interactive (Explore) and Online (Enrich); data sources: Transactions, Interactions, Observations]
Presentation: Reports & Dashboards, Clients, Advanced Visualization, Real-Time Monitoring
Application: OLAP, Text & Semantics, Web & Social Media, Video & Audio, Geo-spatial, Data Mining & Predictive
Data Processing: Batch Processing, Streaming & Complex Event Processing, Search [2]
Data Management: (Scale-out) Relational DB, NoSQL DB [1], In-memory DB, (MPP) EDW
Physical Infrastructure
[Side labels in the original diagram: Operations, Data Encryption, Data Masking, Custodian Gateways]
- Store first, ask questions later (HDFS)
- Parallel scale-out processing (MapReduce)
- Much cheaper storage
- Any data type, including unstructured
[1] Includes key-value, document, graph and object databases.
[2] Solr and Lucene open source projects, also applicable outside Hadoop.
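The "store first, ask questions later" and MapReduce points above can be illustrated with a single-process sketch of the programming model: raw records are kept as unparsed text, and structure is applied only when a question is asked. This is plain Python that mimics the map, shuffle and reduce phases. It does not use Hadoop's APIs and does not distribute work across machines. The CDR line layout (timestamp, cell, subscriber, duration in seconds) is a hypothetical format chosen for the example.

```python
from collections import defaultdict

# Raw lines stored as-is; no schema was enforced when they were written.
RAW_CDRS = [
    "2013-10-01T08:00:03,cell-17,subscriber-1,62",
    "2013-10-01T08:00:09,cell-17,subscriber-2,0",
    "malformed line",
    "2013-10-01T08:01:41,cell-04,subscriber-3,180",
]


def map_cdr(line):
    """Map phase: apply structure at read time, emit (cell_id, duration)."""
    parts = line.split(",")
    if len(parts) != 4:
        return  # skip records that do not fit the assumed layout
    _, cell_id, _, duration = parts
    yield cell_id, int(duration)


def shuffle(pairs):
    """Shuffle phase: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_durations(cell_id, durations):
    """Reduce phase: total call seconds per cell."""
    return cell_id, sum(durations)


def run_job(lines):
    mapped = (pair for line in lines for pair in map_cdr(line))
    return dict(reduce_durations(k, v) for k, v in shuffle(mapped).items())


totals = run_job(RAW_CDRS)
print(totals)
```

Because the map step works on one line at a time and the reduce step on one key at a time, a framework can run many copies of each in parallel on different machines, which is where the scale-out property comes from.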
EXTRACT ONLY
Category: Strategic Fit
- Relevance of ISV ecosystem (notably EDW & BI) that is certified
- Avoidance of vendor lock-in (open source vs. proprietary)
Category: Data Management
- Support of batch, interactive, online & streaming use cases
- Full data lifecycle management
Category: Operations
- Rolling upgrades without service disruption and fallback capability
- Support for end-to-end management and automation frameworks (e.g., Puppet, Chef)
Category: Security
- Granular role-based access control via AD, LDAP, Kerberos
- Tenant, data, network and namespace separation in all services
- Auditability
Category: Infrastructure
- Deployment flexibility: virtual on-premise environments, public clouds, appliances
Our perspective
- In 3 years, 50% of new data for enterprise workloads will land on Hadoop
- Big Data can deliver value in every function of a telco
- Big Data has a high return-on-investment, if you master the learning curve
- Operators who embrace Hadoop today will see their business performance pull away from those who are late to join the new world of Big Data
Further Reading
Start Small, Grow Tall: Debunking Three Big Data Myths
WIRED Innovation Insights, October 4, 2013
Enterprises don't need petabytes of data, a small army of data scientists, or even a big budget to get a meaningful start with Big Data, thanks to Hadoop.

Information Week, October 2, 2013
The first Hadoop tools weren't easy to deploy or manage. But the second-wave tools deliver great advances in usability.

TDWI, June 2013
Deutsche Telekom's perspective on how the open-source Hadoop ecosystem delivers powerful innovation in storage, databases and business intelligence at a fraction of the cost of legacy systems.

Been there, forked that: What the Unix-Linux schism can teach us about Hadoop's future
GigaOm, June 26, 2013
Concerned about proprietary and expensive forks of Hadoop, T-Systems' Juergen Urbanski explains how to tell if you are buying an open version of Hadoop or something you might later regret.