
Global Consulting Practice (GCP)
Big Data Point of View
GCP Information Management

INTERNAL & CONFIDENTIAL


October 30, 2012
Copyright 2012 Tata Consultancy Services Limited

Why Big Data?


Explosion of Big Data: Social Media, Sensor Data, Video Feeds, Audio Clips, Images, News Feeds, Log Files

Emergence of Big Data Platforms: Google, Amazon, Yahoo, eBay, Apple, Hadoop, Map/Reduce

Maturation of Analytic Tools (Advanced A.I.): Listening, Text Mining, Machine Learning, Automated Reasoning, Artificial Intelligence

Big Data: Digital Expansion, Social Explosion, Mobility / Location, Cloud Computing

Explosion of Information plus Multiple Innovations are creating a Perfect Storm


Document Name TCS Confidential

Leveraging Big Data: The New Challenge


Big Data: Web Scale
50 billion web pages
800 million Facebook users
1000 million Facebook pages
200 million Twitter accounts
100 million tweets per day
5 billion Google queries per day
Millions of servers, petabytes of data

Big Data: Digital Expansion, Social Explosion, Mobility / Location, Cloud Computing

Varieties of Data: Video / Audio; Images / Pictures; Diverse internal and external data

Sources of Data: News / Feeds / Blogs / Forums; Groups / Polls / Chats / Wiki

Information is exploding all around, but the challenge is to understand it.



The Net Generation is Here

The Net Generation is interconnected on a variety of Web-based and digital channels: Facebook, Twitter, Google, YouTube, LinkedIn, Wikipedia, blogs, forums and groups.

This is changing the rules of Customer engagement



The Voice of the Customer must be heard


Listening to the voice of the customer (VoC) has acquired new meaning in the wake of Social Media.

Sales and Marketing: Increased sales; Improved lead conversion rates; Reduced sales and marketing expense

Product Innovation: Identify new value-added service ideas; Accelerated new product introductions; Improved new product adoption rates

Customer Acquisition: Acquire new customers; Grow share-of-wallet from existing customers

Customer Retention: Retained customers; Improved customer responsiveness and service levels; Improved customer satisfaction

Customer Service: Higher customer satisfaction; Faster implementation of service improvements; Reduced customer service expense

Brand Reputation: Proactively manage brand risk; Identify areas where damage control is required


TCS Point of View # 1

POV: Big Data is here to stay and is going to be an increasingly relevant arena of competitive differentiation.

Rationale: Given the information explosion going on all around, and the stream of innovations happening at the same time, Big Data is going to be very important. Organizations that learn how to harness Big Data and harvest useful information and insight from it will create competitive advantage for themselves. They will be seen by their customers as keeping up with the march of technology capabilities. Others that are not current will appear to be behind the times, and therefore not competitive.

Implication: Most organizations will invest resources and time to uncover use case scenarios for Big Data in various business processes, and deploy Big Data platforms to harness and harvest useful insight from Big Data. While the particular sources of data that are relevant for a given business scenario may vary from use case to use case within an organization, and from one industry vertical to another, the application of techniques for harnessing Big Data and harvesting useful insight will be nearly universally adopted.

Big Data: The New Frontier

VELOCITY
"Worldwide digital content will double in 18 months, and every 18 months thereafter." (IDC)

VOLUME
"In 2005, humankind created 150 exabytes of information. In 2011, 1,200 exabytes were created." (The Economist)

VARIETY
"80% of enterprise data will be unstructured, spanning traditional and non-traditional sources." (Gartner)

Diagram labels: Opportunities, Processing, Planning, Speed, Velocity

Data sources shown: Mobile, Emails, GPS, CRM Data, Tweets, Inventory, Demand, Instant Messages, Sales Orders, Customer, Things, Service Calls, Transactions
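
As a rough cross-check of the volume and velocity figures quoted on this slide, the implied doubling time can be worked out from two data points. The sketch below assumes steady exponential growth between the two years, which the quoted sources do not state, and the function name is purely illustrative.

```python
import math

def doubling_time_months(start_value, end_value, years):
    """Months needed to double, assuming steady exponential growth."""
    growth_factor = end_value / start_value   # total growth over the period
    doublings = math.log2(growth_factor)      # number of doublings in the period
    return years * 12 / doublings

# Figures attributed to The Economist on this slide: 150 EB (2005), 1,200 EB (2011)
months = doubling_time_months(150, 1200, 2011 - 2005)
print(f"Implied doubling time: {months:.0f} months")
```

Under that assumption the Economist figures imply a doubling roughly every 24 months, the same order of magnitude as the 18-month doubling attributed to IDC. The two quotes may measure different things, so only the broad trend should be compared.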


Big Data Management and Interpretation

Data: Structured / Unstructured x Internal / External

Services: Management Services, Analytics Services


TCS Point of View # 2


POV: There are two fundamental aspects to Big Data: the harnessing aspect, i.e. the technology required to manage Big Data, and the harvesting aspect, i.e. the technology required to analyze and derive insight from Big Data.

Rationale: Given its volume, variety and velocity characteristics, Big Data is not amenable to being managed by traditional technologies. Harnessing it requires a new class of Big Data platforms, e.g. the Hadoop ecosystem, the Map/Reduce algorithm and technologies built on top of them. At the same time, analyzing Big Data with a view to harvesting useful nuggets of insight from a variety of Big Data sources requires completely different technologies as well. These two domains of technology are complementary to each other, i.e. two sides of the Big Data coin.

Implication: Both technology domains need to be deployed for Big Data to be useful. Correspondingly, both the skills required to harness and manage Big Data and the skills required to analyze and interpret Big Data are necessary. However, they are generally different skills. Harnessing Big Data requires a purely technology orientation, while harvesting insights from Big Data requires a more comprehensive business context, i.e. the business problem we are trying to solve, the metrics we are trying to impact, etc.

Big Data Technology is Here Now


Hadoop : Massively Parallel Processing Capability, running on commodity hardware

Big Data Technology handles data at extreme scale and is characterized by:
Massively parallel computing to divide and conquer workloads
Extreme flexibility to allow unlimited data manipulation and transformation
Massive scalability in terms of both technology and cost

HBase and Hadoop/HDFS are designed to store and manage massive amounts of data

Hive, Mahout and R enable query, analysis and the running of in-memory, compute-intensive applications

The ecosystem of Big Data Technology is affordable, and within the reach of companies
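
The divide-and-conquer idea behind Map/Reduce can be illustrated with a minimal, single-machine sketch. This is not Hadoop's API. It is plain Python that mimics the map, shuffle and reduce phases, and the function names and sample input are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    # On a real cluster each split would be mapped on a different node;
    # here the splits are processed one after another for illustration.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(mapped))

splits = ["big data is big", "data is everywhere"]  # hypothetical input splits
print(word_count(splits))
```

Because each call to `map_phase` depends only on its own split, the map work can be spread across many commodity machines, which is the property the slide refers to as massively parallel processing.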


What Does a Big Data Platform Do?


TCS Point of View # 3


POV: Big Data technology platforms built around the Hadoop ecosystem, using Map/Reduce algorithms, can be used to solve many traditional problems, i.e. problems not involving Big Data per se.

Rationale: The Hadoop and Map/Reduce based frameworks represent a paradigm shift in data processing capabilities. While they originated at vendors such as Google, Yahoo, Amazon etc. in the context of handling Big Data, they can be used to handle many traditional data processing contexts as well. One example is the use of the Hadoop platform as an ETL toolset working exclusively with traditional structured, transactional and master data. Thus the Big Data technology platform has uses in contexts such as ETL, DWH, MDM, Analytics etc.

Implication: Organizations which are experiencing extremely high workloads in traditional Data Warehousing and Analytics contexts are likely to experiment with Big Data technologies for solving traditional data processing problems. In fact, many benefits, ranging from significant performance improvements and improved total cost of ownership to increased processing throughput and improved availability of data to end users, can be generated from deploying Big Data platforms without incorporating Big Data sources.

Hadoop as Transformation Platform in ETL

Components: Transactional Systems; Hadoop Cluster (MapReduce / Hive / Pig on HDFS); Data Warehouse

Within the Hadoop ecosystem, MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS).

Hadoop Cluster: fewer, higher-end nodes

Tools like Sqoop could be used to load data into and out of HDFS.

Hadoop complements Data Warehouse


Components: Transactional Systems; Hadoop Cluster (MapReduce / Hive / Pig on HDFS); Data Warehouse; Data Marts at Aggregate Levels

Data mart on Hadoop (to store more granular data): MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS) and create the aggregates, which could then be moved to aggregate-level data marts.

Hadoop Cluster: higher number of nodes for larger storage

Tools like Sqoop could be used to load data into and out of HDFS.
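
The transform-then-aggregate flow described on this slide can be sketched in a few lines. In practice this logic would more likely be written in Hive, Pig or a MapReduce job. The plain Python below is only a stand-in, and the record layout (date, region, amount) and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical granular sales records, as they might be landed in HDFS
RAW_RECORDS = [
    "2012-10-01, north ,120.50",
    "2012-10-01,NORTH,79.50",
    "2012-10-02,south,200.00",
    "bad,record",  # malformed rows are skipped during transformation
]

def transform(line):
    """Transform step: parse and normalize one raw line, or return None."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3:
        return None
    date, region, amount = parts
    try:
        return date, region.upper(), float(amount)
    except ValueError:
        return None

def aggregate(records):
    """Aggregate step: total sales per (date, region) for the data mart."""
    totals = defaultdict(float)
    for date, region, amount in records:
        totals[(date, region)] += amount
    return dict(totals)

clean = [r for r in map(transform, RAW_RECORDS) if r is not None]
data_mart_rows = aggregate(clean)
print(data_mart_rows)
```

The granular records stay on the cluster, and only the much smaller aggregate rows would be exported to the aggregate-level data marts.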


Hadoop as an ad-hoc analysis platform


Components: Transactional Systems; Hadoop Cluster (MapReduce / Hive / Pig on HDFS); Data Warehouse

MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS). This could provide the business analytics team with a platform for innovation.

Hadoop Cluster: higher number of nodes for larger storage

Tools like Sqoop could be used to load data into and out of HDFS.

TCS Point of View # 4


POV: The Big Data technology and product landscape is quite vast and varied right now. There are hundreds of products and offerings. Consolidation of products and offerings will be natural over the next 2-5 years.

Rationale: The basic Hadoop and Map/Reduce technologies which are at the heart of all Big Data technology platforms are available in three forms, i.e. open source, proprietary and hybrid. Open source technologies can be deployed as they are, and many companies are choosing to do this. However, they will have issues of security, privacy, robustness of management etc. Niche players are relatively new and will get consolidated in the course of time. The major technology vendors such as IBM, HP, Oracle, Teradata, Informatica etc. will complement, fill gaps in and improve their offerings.

Implication: It is difficult to predict at this time which technologies will survive, which will get acquired and consolidated, and which will simply die. Companies which are committed to the open source idea and wish to exploit this technology may invest in these directly, and build skills in this area. On the other hand, companies which are committed to vendors such as IBM or Teradata may weigh the costs versus benefits of going with pure open source, or buy into a hybrid strategy where some of the capability gaps are filled by the vendors. This needs careful evaluation.

Big Data Product and Offering Landscape


Categories: Analytics / Visualization; Search; CEP; NoSQL; Data Integration; Tools; Hadoop Distributions; Appliance / Vendor; Cloud Distributions


Pure-Play Vendors


Big Data Product Landscape

Commercial

Open Source

Hybrid


TCS Point of View # 5


POV: Unstructured data cannot be consumed as it is, in its raw form. It must be processed into useful nuggets of information, i.e. converted into a consumable structured form, before it can be interpreted and acted upon.

Rationale: Unstructured information cannot be interpreted and used by end users as it is. It must be converted into a useful form. This requires filtering a lot of noise out of the data, since Big Data tends to have a lot of noise relative to useful data. Further, the information content of Big Data streams must be interpreted in the context of other, more traditional types of information before it can be deemed useful. This requires the fusion of Big Data based information with more traditional structured information to derive useful insight.

Implication: Big Data is not a new opportunity or capability that stands on its own. It is better considered as augmenting already existing Data Management and Analytics capabilities in an organization. Big Data platforms are not replacements for existing traditional Data Management and Analytics platforms. They merely add to, mature and improve upon existing environments and capabilities. Information fusion, i.e. the ability to bring together structured and unstructured information in the context of specific business problems and opportunities, is what is needed to exploit Big Data.
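
The filter-then-fuse idea in this point of view can be sketched as follows. Everything here is an assumption made for illustration: the customer records, the social posts, the keyword list and the function names are hypothetical, and simple keyword matching stands in for the text mining or machine learning a real system would use.

```python
# Hypothetical structured data: customer master records keyed by social handle
CUSTOMERS = {
    "@asha": {"customer_id": 101, "segment": "premium"},
    "@ravi": {"customer_id": 102, "segment": "standard"},
}

# Hypothetical unstructured stream: raw social media posts
POSTS = [
    {"handle": "@asha", "text": "My new phone keeps dropping calls, very poor service"},
    {"handle": "@ravi", "text": "Lovely weather today"},
    {"handle": "@unknown", "text": "poor service again"},
]

# Assumed keyword list; a real system would use text mining or machine learning
COMPLAINT_TERMS = {"poor", "dropping", "broken", "refund"}

def is_relevant(post):
    """Filter step: keep only posts that mention a complaint term."""
    words = {w.strip(".,!?").lower() for w in post["text"].split()}
    return bool(words & COMPLAINT_TERMS)

def fuse(posts, customers):
    """Fusion step: attach structured customer context to relevant posts."""
    fused = []
    for post in filter(is_relevant, posts):
        customer = customers.get(post["handle"])
        if customer is not None:  # posts without a known customer are dropped
            fused.append({**customer, "issue_text": post["text"]})
    return fused

insights = fuse(POSTS, CUSTOMERS)
print(insights)
```

Only the post that is both relevant and attributable to a known customer survives, which is the sense in which noisy unstructured data becomes a consumable structured record.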

An Example - Social Intelligence


Social Intelligence, i.e. the process of generating useful knowledge from the web of social media activity, is maturing. However, the social Web is too big, moving too fast and too full of irrelevant data trash.

Capabilities: Listening, Filtering, Analysis, Fusion, Dashboards

Vendors: Radian6, Visible Technologies, Synthesio, Attensity, Converseon, SDL, Networked Insights, Lithium

Audience: Friends, Fans, Followers, Value Network, Influencers



Listen & Learn: Machine Learning

Inputs: News, Chatter, Events

Steps: Listen; Learn, Focus, Filter, Reason; Fuse, Connect; Analyze; Respond, Alert

This requires Information Fusion


Outputs: Real-Time Business Insights and Alerts; Early Problem Detection; Market Intelligence; Demand Signal Refinement

EIF Framework components: Real-Time Streams; Real-Time Structured Database; Big SQL; NoSQL Processing; Unstructured Data (HDFS); Marketing; Analytics

Integrated Customer Insights environment


Enterprise Information Fusion (EIF)

Structured Information

Unstructured Information


Big Data requires connecting the dots


Business functions: Marketing, Public Relations, Customer Service, Sales, Product Development, Human Resources, Finance

Big Data channels:

Web: Website, Intranet, Partner Portals, SEO, SEM, Online Advertising, Web presence, Micro-sites, ecommerce

Mobile: Tablets, Smartphones, Mobile Applications, Mobile App Stores, Mobile Web, Mobile Messaging, Location-based services

Social: Social Network Applications, Social Search Engine Optimization, Community management, Social Media Expansion, Social Business Initiatives, Crowd sourcing

Traditional Channels: Call Center; RFID, Monitors and Sensors


In order to generate useful Insights


Business functions: Marketing, Public Relations, Customer Service, Sales, Product Development, Human Resources, Finance

Big Data -> Big Insights

Web: Website, Intranet, Partner Portals, SEO, SEM, Online Advertising, Web presence, Micro-sites, ecommerce

Mobile: Tablets, Smartphones, Mobile Applications, Mobile App Stores, Mobile Web, Mobile Messaging, Location-based services

Social: Social Network Applications, Social Search Engine Optimization, Community management, Social Media Expansion, Social Business Initiatives, Crowd sourcing

Traditional Channels: Call Center; RFID, Monitors and Sensors

The new Technology Challenge: Harnessing the power of Big Insights



TCS Point of View # 6


POV: The fusion of unstructured and structured information for a given business context requires business domain expertise in addition to data analysis expertise. This is a new science, i.e. Data Science.

Rationale: While information fusion is a general expertise, its application is usually within the confines of a specific business context. Examples of specific business contexts are Marketing, Sales, Brand Management, Customer Service, Fraud and Risk analytics etc. Within each business context, the information sources that are relevant, and the process of extracting useful insights from Big Data, are unique and distinct. This requires knowledge and understanding of data sources and of the processes for deriving useful information from Big Data in business contexts.

Implication: Data Science, and the role of the Data Scientist, are going to be a new area of growth and development. Traditional analysts, who were equipped to manage and analyze structured data, are going to have to extend themselves to understand and work with non-traditional Big Data sources, and with the tools appropriate to working with them. There is likely to be tremendous demand for Data Scientists in the future. It is possible that many universities and colleges may offer courses on Data Science and the tools required to work with Big Data.

Data Science and Advanced Analytics


Analytics is evolving to meet the needs of the market. Leaders can expect:

Future Direction: Description
Business Analytics: Business intelligence combines with advanced analytics to form a new category called business analytics
Social Data: Social data will play a greater role in decision processes
Analytic Applications: The emergence of applications that bundle data, knowledge, and analytics to solve business problems
The Awareness-to-Action Imperative: Analytics will increasingly identify market signals and initiate action, through context-sensitive alerts
Analytic Centers of Excellence: The growing enterprise realization that Analytic COEs are required
Analytic Outsourcing: McKinsey Global Institute predicts a future shortage of analysts and managers with the necessary analytical skills
Text Analytics Maturation: Text Analytics is absorbed into business applications
Process Enablement: The shift from analytics as a reporter of process to analytics as an enabler of process
The Information Lifecycle: The growing role of analytics throughout the information life cycle

Big Data channels:
Social Channels: Blogs, Wikis, Forums; Social networking; Groups; User profiles; Ratings, reviews, etc.; Polls, chat, podcasting; Audio, video, photos; Events & calendar; Private messaging
Instrumented Channels: Smart grid; Home appliances; Cars; Sensors; Monitors; Supply chain devices; Other mobile devices
Mobile Channels: Mobile Applications
Other Channels: Video; Audio; Other

Analytics Classifications

Text Analytics: Social Analytics; Sentiment Analysis; Brand Identity; Product & Brand Affinity; Reputation-Driven Online Economy
Predictive Analytics: Forecasting; Targeting; Fraud Detection, Anti-Fraud Analytics; Regression, Predictive, Multivariate; Propensity; Price Elasticity
Mobile Analytics: Digital Delivery Channels & Services; Property Effectiveness; Application Analytics; Ad Analytics; Geo-Spatial Analytics; User profile and Relevance; Identify New Opportunities
Segmentation Analytics: Customer Segmentation in real-time; Churn Analysis, Attrition; Funnel Analysis; Behavioral Segmentations


Big Data Analytics

Prescriptive (What should happen?)
Optimizing outcomes: Optimization, Simulation

Predictive (What will happen?)
Identifying possible outcomes: Predictive Modeling, Statistical Analysis, Visual Analytics, Forecasting

Descriptive (What has happened?)
Describing and analyzing outcomes: Query, Analysis, Drill-Down, Ad-Hoc Reporting; Dashboards and Scorecards; Visual Analytics

Supporting skills: Domain Expertise, Text Analytics, Data Mining, Knowledge

* Source: GCP Business Analytics



Examples of Uses of Big Data


Log Analytics & Storage
Smart Grid / Smarter Utilities
RFID Tracking & Analytics
Fraud / Risk Management & Modeling
360 View of the Customer
Warehouse Extension
Email / Call Center Transcript Analysis
Call Detail Record Analysis


Some Examples of Use Cases


Data Source High-Frequency Operations Low-Frequency Operations


Applications for Big Data Analytics


Smarter Healthcare
Multi-channel Sales
Finance
Log Analysis
Homeland Security
Traffic Control
Telecom
Search Quality
Manufacturing
Trading Analytics
Fraud and Risk
Retail: Churn, NBO


TCS Point of View # 7


POV: We are still in the very early days of Big Data adoption. The companies that have deployed and exploited Big Data technologies are Google, Yahoo, Amazon etc. The rest are just beginning their Big Data journey.

Rationale: Big Data technologies have so far been used almost exclusively in companies that deal with Web-scale data. This technology is now slowly becoming viable for large commercial enterprises. Use cases which represent possible scenarios where Big Data can be fruitfully exploited are still being discovered and documented. Very few case studies are available which represent full-scale adoption of Big Data technologies. We are still in an era of experimentation, trial and error, do and learn, proof of concept and proof of value cycles.

Implication: Big Data adoption will increase steadily over the next few years. Gartner places Big Data in the early Technology Trigger phase of its Hype Cycle. IDC and Wikibon are predicting a ten-fold growth in the Big Data market over the next five years. Most companies will do well to set aside budgets for experimentation and laboratory-scale projects to explore the uses of Big Data in various business contexts, and in the process develop some skills in these new technologies and Data Science areas.
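
To put the ten-fold, five-year market growth forecast into annual terms, the implied compound growth rate can be computed directly. This is only an arithmetic illustration that assumes the growth compounds evenly each year, which the forecasts themselves may not claim. The function name is hypothetical.

```python
def implied_annual_growth(multiple, years):
    """Compound annual growth rate implied by a total growth multiple."""
    return multiple ** (1 / years) - 1

rate = implied_annual_growth(10, 5)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption, a ten-fold increase over five years corresponds to growth of roughly 58% per year.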

The Gartner Hype Cycle


What is the Market?


Business Drivers for Big Data


Thank You
Big data analytics will push businesses to become smarter, social, more relevant

