
CTO Perspective on Capturing the Potential of Big Data in a Service Provider

Jürgen Urbanski
Board Member Big Data & Analytics of BITKOM (German IT Industry Association)
Views and opinions expressed are not necessarily those of his employer T-Systems, the enterprise arm of Deutsche Telekom.

Big Data 1956

IBM 305 RAMAC, 5 MB!

Big Data 2013

Volume
Velocity
Variety

Big Data Use Cases for Telcos


(N) = Highlighted today

Business Intelligence
- Enterprise data warehouse offload (1)
- Data landing zone
- Active archive
- Enterprise-wide data lake augmenting the EDW
- Mainframe offload

Marketing & Sales
- 360 degree view of customer value
- Targeted TV advertising, serving the most profitable TV ad to individual viewers based on analysis of their viewing behavior (e.g., what ads prompt the user to switch channels)
- Personalized marketing campaigns, integrating analytics across various marketing channels
- Big data as a product (2)

Service & Operations
- Network maintenance and upgrades, operational intelligence for better capacity planning and customer experience (3)
- Analytics and archiving of call detail records (CDRs) on Hadoop for compliance, billing disputes, and congestion monitoring*
- Real-time analytics of CDRs via streaming and in-memory technology to reduce fraud on pre-paid services (e.g., SIM cloning)
- Security analytics (e.g., intrusion detection)
- M2M device telemetry analytics (e.g., home security and assisted living)
- Log analytics for customer support

* A large mobile carrier might reach 1 billion new CDRs, ingesting 20 TB per day
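
The footnote's figures can be turned into sustained ingest rates with simple arithmetic. The sketch below reads both figures as daily totals and assumes decimal units (1 TB = 10^12 bytes) and arrivals spread evenly over 24 hours. None of these assumptions is stated on the slide, and real traffic will peak well above the daily average.

```python
# Back-of-the-envelope ingest rates for the CDR footnote.
# Assumptions (not from the slide): both figures are per day,
# decimal terabytes, arrivals spread evenly over the day.
CDRS_PER_DAY = 1_000_000_000      # 1 billion new CDRs (from the footnote)
BYTES_PER_DAY = 20 * 10**12       # 20 TB per day (from the footnote)
SECONDS_PER_DAY = 24 * 60 * 60

records_per_second = CDRS_PER_DAY / SECONDS_PER_DAY
megabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**6
bytes_per_record = BYTES_PER_DAY / CDRS_PER_DAY   # implied average size

print(f"{records_per_second:,.0f} CDRs/s")
print(f"{megabytes_per_second:,.0f} MB/s")
print(f"{bytes_per_record:,.0f} bytes per CDR")
```

Under these assumptions the platform has to absorb on the order of ten thousand records and a few hundred megabytes every second, around the clock.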

1 Enterprise Data Warehouse Offload


The Situation
- Many EDWs are at capacity
- Running out of budget before running out of relevant data
- Older data archived "in the dark", not available for exploration

DATA WAREHOUSE (Situation)
Operational (44%), Analytics (11%), ETL Processing (42%)

The Solution
- Hadoop for data storage and processing: parse, cleanse, apply structure and transform
- Free EDW for valuable queries
- Retain all data for analysis!

DATA WAREHOUSE (Solution)
Operational (50%), Analytics (50%)

HADOOP
Storage & Processing, cost is 1/10th

2 Data Products: ImmobilienScout (a DT subsidiary)


The Situation

Europe's leading real estate marketplace with data on...
- 1m properties listed currently
- 20m properties cumulative
- 6m saved searches
- Geographical coordinates
- Enriched by socio-demographic data on 19m properties

The Solution

Market Navigator service
- Supports realtors in acquiring customers
- Local market analysis helps with price setting for rent and buy
- Integrates third-party data

Functionality includes
- Price heat maps and trending
- Demand- and supply-side info
- Local area information
- Comparable transactions

Team
- Product manager
- Data scientists
- 2 scrum teams

2 Turning Big Data into Products!

3 Network Maintenance and Upgrades to Improve the Customer Experience: The Situation

Challenge
Poor visibility into how cable network congestion affects churn, and where exactly network upgrades produce the most incremental revenue

Approach
Analyze node utilization against customer experience indicators to see if congestion correlates to experience
- Increase of dropped packets (IPDR data)
- Increase of calls into the call center for customers associated with these nodes (customer experience data)
- Increase of requests to drop service (work order data)

Hypothesis
Nodes considered to be causing customer experience issues can be prioritized for maintenance and upgrade based on the value of the customers served by each node

Source: Zaloni project for a large cable provider

3 Network Maintenance and Upgrades to Improve the Customer Experience: The Solution
- Analysis that integrates subscriber and network node data to see correlations between network congestion and customer experience
- 11 different data sources
- 4m subscriber records, 12m work orders, 9m calls, 42m IPDRs¹, 20m Tivoli NPMs²
- Finding: Only a small number of nodes are responsible for the majority of the negative customer experience

[Diagram labels: Subscriber; Network Node; Master Subscriber Record; Equipment; Marketing Demographics; TNMP CMTS Performance; Caller Experience; Work Orders; Nodes to CMTS Map; Products; Competitive Spend Data; IPDR Cable Modem Usage]

¹ IPDR = Internet Protocol Detail Record, provides information about IP-based service usage, usually to inform OSS and BSS systems.
² NPM = NetView Performance Monitor Messages.
Source: Zaloni project for a large cable provider
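
The node analysis described on these two slides can be sketched in a few lines. This is a minimal illustration, not the method used in the Zaloni project: the field names, the sample values, and the scoring rule (an unweighted sum of the three indicators, multiplied by customer value) are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class NodeStats:
    """Per-node aggregates; all field names and values are hypothetical."""
    node_id: str
    dropped_packet_increase: float   # derived from IPDR data
    support_call_increase: float     # derived from call center data
    drop_requests: int               # derived from work order data
    customer_value: float            # e.g. revenue of customers on the node

def incidents(n: NodeStats) -> float:
    # Naive unweighted sum of the three indicators (an assumption).
    return n.dropped_packet_increase + n.support_call_increase + n.drop_requests

def prioritize(nodes):
    """Rank nodes by negative-experience score weighted by customer value."""
    return sorted(nodes, key=lambda n: incidents(n) * n.customer_value, reverse=True)

def share_of_top(nodes, k):
    """Fraction of all incidents attributable to the k worst nodes."""
    scores = sorted((incidents(n) for n in nodes), reverse=True)
    total = sum(scores)
    return sum(scores[:k]) / total if total else 0.0

nodes = [
    NodeStats("node-a", 40.0, 25.0, 15, 1000.0),
    NodeStats("node-b", 2.0, 1.0, 1, 5000.0),
    NodeStats("node-c", 3.0, 2.0, 1, 800.0),
    NodeStats("node-d", 1.0, 2.0, 2, 1200.0),
]
ranked = prioritize(nodes)
print([n.node_id for n in ranked])
print(f"top node share of incidents: {share_of_top(nodes, 1):.0%}")
```

Weighting by customer value means a lightly affected node serving high-value customers (node-b in the sample data) can outrank a more affected node serving lower-value customers, which is the prioritization the hypothesis describes.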

Approach to Execution
Agile learning on each project
- Architecture & Design
- Proof of Concept
- Pilot
- Production Implementation
- Training¹
- Maintenance & Support

+

Programmatic steering

Supply-side requirements:
- Higher capital efficiency
- Lower upfront investment
- Faster time to value
- More rapid innovation
- Market-facing differentiation
- Security & compliance

Implications:
- Technology strategy (A)
- Target architecture (B)
- Vendor selection (C)
- New processes
- New skills
- Privacy considerations
- Etc.

(N) = Highlighted today

¹ Architect, Developer, Data Science and Admin

Technology Strategy: From Data Puddles to Data Lakes

AVOID:
Systems separated by workload type due to contention

GOAL:
Platform that natively supports mixed workloads as shared service

[Diagram labels: BU1, BU2, BU3; Batch (Refine), Interactive (Explore), Online (Enrich); Big Data; Transactions, Interactions, Observations]

Target Architecture: Modular against a Fragmented Ecosystem


Legend: Not in focus

Layers
- Presentation: Reports & Dashboards; Clients; Advanced Visualization; Real-Time Monitoring
- Application: OLAP; Text & Semantics; Web & Social Media; Video & Audio; Geo-spatial; Data Mining & Predictive
- Data Processing: Batch Processing; Streaming & Complex Event Processing; Search
- Data Management: (Scale-out) Relational DB; NoSQL DB¹; In-memory DB; (MPP) EDW
- Distributed Storage & Processing
- Physical Infrastructure

Cross-cutting
- Data Integration & Governance: Extract, Transform, Load; Real Time & Batch Ingestion; Data Connectors; Life Cycle Management; Custodian Gateways
- Security & Privacy: Identity & Access Management; Data Encryption; Data Isolation & Multitenancy; Data Masking
- Operations

¹ Includes key-value, document, graph and object databases.

Hadoop Is the Foundation for Much of the Innovation


Legend: Hadoop Projects & Ecosystem; Adjacent Categories

Layers
- Presentation: Reports & Dashboards; Clients; Advanced Visualization; Real-Time Monitoring
- Application: OLAP; Text & Semantics; Web & Social Media; Video & Audio; Geo-spatial; Data Mining & Predictive
- Data Processing: Batch Processing; Streaming & Complex Event Processing; Search²
- Data Management: (Scale-out) Relational DB; NoSQL DB¹; In-memory DB; (MPP) EDW
- Distributed Storage & Processing
- Physical Infrastructure

Cross-cutting
- Data Integration & Governance: Extract, Transform, Load; Real Time & Batch Ingestion; Data Connectors; Life Cycle Management; Custodian Gateways
- Security & Privacy: Identity & Access Management; Data Encryption; Data Isolation & Multitenancy; Data Masking
- Operations

Callouts
- Store first, ask questions later (HDFS)
- Parallel scale-out processing (MapReduce)
- Much cheaper storage
- Any data type, including unstructured

¹ Includes key-value, document, graph and object databases.
² Solr and Lucene open source projects, also applicable outside Hadoop.
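
"Store first, ask questions later" and "parallel scale-out processing" can be illustrated without a cluster. The following is a sketch in plain Python, not the Hadoop API: raw lines are kept untouched, structure is applied only when a query reads them (the map step), and a reduce step sums the results per key. The pipe-delimited record layout and the field names are hypothetical.

```python
from collections import defaultdict

# Raw records are stored as-is; no schema is enforced at write time.
# The pipe-delimited layout below is a hypothetical example.
raw_store = [
    "2013-10-01|cell-17|voice|120",
    "2013-10-01|cell-17|data|4096",
    "corrupted line without delimiters",
    "2013-10-01|cell-42|voice|30",
]

def map_record(line):
    """Apply structure at read time; emit (cell_id, 1), or nothing for bad lines."""
    parts = line.split("|")
    if len(parts) != 4:
        return []   # unparseable data stays stored but is skipped by this query
    _date, cell_id, _kind, _amount = parts
    return [(cell_id, 1)]

def reduce_counts(pairs):
    """Sum the values for each key, as a reducer would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# On a cluster the map calls would run in parallel across data blocks;
# here they simply run one after another.
mapped = [pair for line in raw_store for pair in map_record(line)]
records_per_cell = reduce_counts(mapped)
print(records_per_cell)
```

Because the malformed line is only skipped at query time, a later query with a different parser could still make use of it, which is the point of storing first and structuring later.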

Vendor Selection Considerations


Category: Select requirements (EXTRACT ONLY)

Strategic Fit
- Relevance of ISV ecosystem (notably EDW & BI) that is certified
- Avoidance of vendor lock-in (open source vs. proprietary)

Data Management
- Support of batch, interactive, online & streaming use cases
- Full data lifecycle management

Operations
- Rolling upgrades without service disruption and fallback capability
- Support for end-to-end management and automation frameworks (e.g., Puppet, Chef)

Security
- Granular role-based access control via AD, LDAP, Kerberos
- Tenant, data, network and namespace separation in all services
- Auditability

Infrastructure
- Deployment flexibility: virtual on-premise environments, public clouds, appliances

Our perspective
- In 3 years, 50% of new data for enterprise workloads will land on Hadoop
- Big Data can deliver value in every function of a telco
- Big Data has a high return-on-investment, if you master the learning curve
- Operators who embrace Hadoop today will see their business performance pull away from those who are late to join the new world of Big Data

Further Reading
- Start Small, Grow Tall: Debunking Three Big Data Myths (Link)
  WIRED Innovation Insights, October 4, 2013
  Enterprises don't need petabytes of data, a small army of data scientists, not even a big budget to get a meaningful start with Big Data, thanks to Hadoop.

- Hadoop's Second Generation Offers More To Enterprises (Link)
  Information Week, October 2, 2013
  The first Hadoop tools weren't easy to deploy or manage. But the second-wave tools deliver great advances in usability.

- Hadoop! Coming soon to an enterprise data warehouse near you (Link)
  TDWI, June 2013
  Deutsche Telekom's perspective on how the open-source Hadoop ecosystem delivers powerful innovation in storage, databases and business intelligence at a fraction of the cost of legacy systems.

- Been there, forked that: What the Unix-Linux schism can teach us about Hadoop's future (Link)
  GigaOm, June 26, 2013
  Concerned about proprietary and expensive forks of Hadoop, T-Systems' Juergen Urbanski explains how to tell if you are buying an open version of Hadoop or something you might later regret.
