
APP-CAP2985

Real-Time Analytics

Karthik Kannan, VMware, Inc. David McMath, VMware, Inc.

#vmworldapps

Disclaimer

This session may contain product features that are currently under development.

This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product.

Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined.

It is a Big Data World

[Chart: data volume (terabyte, petabyte, exabyte, zettabyte) over time across the mainframe, PC, Internet, mobile, and machine eras, driven by transactions, interactions, and machine-to-machine data. 2.0 zettabytes in enterprise data in 2011.]

Chart based on IDC and UC Berkeley data growth estimates. Source: IDC & CosmoBC.com: http://techblog.cosmobc.com/2011/08/26/data-storage-infographic/

The Big Data Problem

- Velocity: 10s of billions of daily records
- Volume: from terabytes to petabytes
- Variety: multi-structured
- Value ($): business insights

Big Data Use Cases

Big Data Technology Stack

Transform IT + Transform Business + Transform Yourself

Leverage Cetas for Big data and Cloud Analytics

Customer profiles

1. Business analysts, LOB managers, execs
Need: out-of-the-box analytics
Designed for: self-service for end users, leveraging app developers

2. Data engineers/analysts
Need: out-of-the-box + some customization
Designed for: admin + operations

3. Data scientists
Need: power capabilities + heavy customization
Designed for: data scientists

4. IT, Operations
Need: out-of-the-box + some customization
Designed for: IT/admin, ops

Traditional BI vs. Next-Gen Analytics


Old vs. Cetas (layers: apps, analytics, interface, store)
- Analytics: BI with static charts, reports, dashboards (old) vs. multi-dimensional analytics, AutoInsights, behavior analytics (Cetas)
- Interface: SQL interface (old) vs. Map/Reduce, no ETL/schema-less, streaming interface, data modeling (Cetas)
- Store: database, warehouse (old) vs. NoSQL, object DBs (Cetas)
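The "No ETL/Schema-less" and "Map/Reduce" entries describe ingesting data without first fitting it to a warehouse schema. The following Python sketch illustrates that idea under stated assumptions: the JSON events and field names are hypothetical, dimensions are simply discovered as the union of keys seen, and a toy in-process map/reduce counts events per dimension value. It is not Cetas's implementation.

```python
import json
from collections import Counter
from functools import reduce

# Hypothetical raw events: each JSON line may carry different fields (no fixed schema).
RAW_EVENTS = [
    '{"user": "alice", "action": "login", "country": "US"}',
    '{"user": "bob", "action": "purchase", "amount": 18}',
    '{"user": "alice", "action": "purchase", "country": "US", "amount": 5}',
]

def parse(lines):
    """Parse JSON lines as-is; no schema or ETL step is applied."""
    return [json.loads(line) for line in lines]

def discover_dimensions(events):
    """Infer the available dimensions as the union of all keys seen."""
    return sorted(set().union(*(e.keys() for e in events)))

def map_phase(event, dimension):
    """Emit a (value, 1) pair only if this event carries the dimension."""
    return [(event[dimension], 1)] if dimension in event else []

def reduce_phase(acc, pair):
    """Sum the emitted counts per dimension value."""
    value, count = pair
    acc[value] += count
    return acc

def count_by(events, dimension):
    pairs = (p for e in events for p in map_phase(e, dimension))
    return dict(reduce(reduce_phase, pairs, Counter()))

events = parse(RAW_EVENTS)
print(discover_dimensions(events))  # ['action', 'amount', 'country', 'user']
print(count_by(events, "action"))   # {'login': 1, 'purchase': 2}
```

A new field such as `amount` becomes queryable as soon as one event carries it, which is the practical appeal of skipping an up-front schema.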

Real-time and Predictive Business Analytics

Category: Online app analytics
Problem:
- Provide real-time visibility into the business for companies
- Insights and interpretation of big data
Use cases:
- Customer behavior analytics and audience segmentation
- Grow revenue and focus on monetization
- Grow traffic, user engagement and user acquisition

IT & Operational Intelligence


Category: Operational analytics
Problem:
- Provide real-time visibility into business and IT operations
- Scalable time-event correlation and log processing
Use cases:
- Log analytics, storage system analytics
- Investigative analytics and root cause analyses
- Application/infrastructure performance and management

Operational Intelligence Use case

For IT/ops users:
- Cut server/application downtime
- Investigate security breaches and risks
- Troubleshoot application performance
- Detect abuse traffic and DoS attacks
- Create management dashboards
- Manage infrastructure globally
- Track SLA violations

Pattern Extraction/Anomaly Detection


- Rule-based/user-defined pattern extraction
- Event classification (machine learning)
  - Classify events by volume, type of events, source of events, etc.
  - Group events based on similarities
  - Anomaly detection via supervised and unsupervised machine learning
- Event correlation
  - Use (graph-based or vector-based) clustering
- Causality analysis
  - Link analysis
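As a minimal sketch of two items on this list, the following Python uses a hypothetical log format for rule-based pattern extraction (a user-defined regular expression) and substitutes a simple z-score test for the machine-learning anomaly detection the slide names. The log format, the per-minute error counts, and the 3.0 threshold are all assumptions for illustration.

```python
import re
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical user-defined rule: "<timestamp> <LEVEL> <source>: <message>".
RULE = re.compile(r"^(?P<ts>\S+) (?P<level>ERROR|WARN|INFO) (?P<source>\w+):")

def extract(lines: List[str]) -> List[Dict[str, str]]:
    """Rule-based pattern extraction: keep the named fields of matching lines."""
    return [m.groupdict() for line in lines if (m := RULE.match(line))]

def zscore_anomalies(counts: List[float], threshold: float = 3.0) -> List[int]:
    """Return indexes whose count lies more than `threshold` population
    standard deviations from the mean. This is a simple statistical
    baseline, not the supervised/unsupervised ML named on the slide."""
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return []  # all counts identical: nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]

logs = ["12:00 ERROR db: timeout", "12:01 INFO web: ok", "garbage line"]
print(extract(logs))  # the unparseable third line is dropped
per_minute_errors = [10, 12, 11, 9, 10, 11, 10, 12, 9, 11, 10, 60]
print(zscore_anomalies(per_minute_errors))  # [11]
```

With a population standard deviation over n points, no z-score can exceed sqrt(n - 1), so a 3-sigma rule needs at least 11 points before it can fire; short windows call for a lower threshold or a more robust method.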

Depth of Cetas Analytics


Recommendation, Closed-Loop Decision Engine
Real-time decisions, recommender system workflow

Data Modeling & Predictive Analytics
What-if analysis, machine learning, statistical/math modeling, support for external models (R, Mahout)

Audience Segmentation & Cohorts
Static & dynamic (auto-derived) clusters, product cohorts, cluster churn over time, market basket analysis

Data Discovery
Pattern extraction, auto-derived data insights

Ad-hoc Navigation & Root Cause Analysis
Interactive search-based analytics, filtering and drill-downs

Visualization, Summary & Behavior Analysis
Aggregate metrics (standard and custom), multi-dimensional graphs, time series, click-pathing, funnel conversion

Market is here today

Example insight callouts on the slide:
- "Did you know the top 10 countries are..."
- "Users whose virality > 1 declined 13% in Dec"
- "The male population aged 18-34 plays Level 3 and buys $18 worth of virtual goods"
- "Number of invites and number of downloads are both down in Jan"
- "Campaign X increased CTR by 15% in Jan"
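The visualization layer lists click-pathing and funnel conversion. A minimal Python sketch of ordered funnel counting follows; the user IDs, event names, and the rule that a step counts only after all earlier steps have occurred in order are assumptions, not a description of how Cetas computes funnels.

```python
from typing import Dict, List

def funnel(paths: Dict[str, List[str]], steps: List[str]) -> List[int]:
    """Count users reaching each funnel step in order.

    A user reaches step k if steps[0..k] occur in their click path in that
    order; unrelated events in between are ignored.
    """
    reached = [0] * len(steps)
    for path in paths.values():
        k = 0  # number of steps this user has completed so far
        for event in path:
            if k < len(steps) and event == steps[k]:
                k += 1
        for i in range(k):
            reached[i] += 1
    return reached

def conversion_rates(reached: List[int]) -> List[float]:
    """Step-to-step conversion: reached[i + 1] / reached[i]."""
    return [b / a if a else 0.0 for a, b in zip(reached, reached[1:])]

# Hypothetical click paths keyed by user ID.
paths = {
    "u1": ["visit", "invite", "download", "purchase"],
    "u2": ["visit", "download"],
    "u3": ["visit", "invite", "visit"],
    "u4": ["invite"],
}
reached = funnel(paths, ["visit", "invite", "download"])
print(reached, conversion_rates(reached))  # [3, 2, 1] [0.6666666666666666, 0.5]
```

User u2 downloads without inviting, so under this strict-order rule they stop at the first step; a looser funnel definition would count them differently.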

Analytics building blocks

Recommendation engine

Predictive analytics

Data modeling

Analytics


Cetas real-time analytics for the end user


Customer deployment
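[Diagram: the customer's application connects to Cetas through APIs, an SDK, and JS. Cetas offers seamless ingestion through app integration, rapid parsing and processing, and multidimensional analytics over transactional data and real-time streaming events (short-term data). A traditional RDBMS, ETL, and data warehouse (DW) path holds historical data.]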

A Holistic View of a Big Data System

- Real-time streams, real-time processing
- Analytics: seamless ingestion
- Real-time database (HBase, GemFire, Cassandra)
- Big query, search (Google, Splunk, Cetas)
- Batch processing
- Structured and unstructured data

How do customers use Cetas analytics?

- Discover multiple sources of live data
- Navigate data and perform multi-dimensional analysis
- Drill into and analyze time trends
- Auto-discover key insights
- Use pre-defined measures to track business metrics and KPIs
- Define custom functions as needed
- Create dynamic dashboards
- Take instant action to monetize users/customers optimally

Customer On-Boarding

Step 1: Self-registration and login by the customer
Step 2: Setting up live and batch data feeds
Step 3: Raw dimension and time analytics
Step 4: Development of custom measures
Step 5: Creation of dashboards and taking action

Value proposition

- Instant intelligence for operations & business users
  - Real-time analysis of non-structured and transactional data
- Auto-derive trends and patterns
  - Enable businesses to take action
  - Provide user behavior analytics
- Unique predictive algorithms
  - To optimize IT operations
  - To monetize user/customer interactions
- Dramatically reduce time, effort and cost
  - 10x lower cost, 10x less effort, and 10x better intelligence

Evolution of business functions as services

Salesforce data has been trusted to the cloud.
Payroll data has been trusted to the cloud.
HR data has been trusted to the cloud.
In fact, a lot more of your data has been trusted to the cloud.

What's next?

Analytics-as-a-service

VMware DSS Mission

To provide the right information, at the right level, to the right user, at the right time, to create the right insights and make the right decision.


Decision Support Systems at VMware


Environment
- MDM: Trillium, UCM
- ETL: Informatica
- Database: Oracle (on VMware)
- Reporting layer: OBIEE
- Data warehouse structures: EDW and federated data marts (dimensional)
- 4000+ analytical consumers
- 2400+ jobs running 24x7
- 20+ sources of information
- 12 major categories of data: Billing, Sales Forecasting, Customers, Orders, Leads/Opportunities, Locations/Sites, Products/Bundles, Support, Keys/Entitlements, Marketing/Campaigns, Ecommerce/Web, Partners

Challenges
- Hypergrowth: data, users, business lines, products
- No downtime: there's always someone analyzing something
- Self-serve vs. concierge: different styles in different departments and geographies
- Technology/skill balance: there's always something new
- "Apploranges" (apples + oranges): data integration is difficult, with different atomic levels of data and different subject areas

Decision Support Systems at VMware: Business Needs

Our business needs: information that is cleaner, cheaper, faster, and integrated.

What has changed in the past year


Philosophy of Data
Data is water. It should be clean (impurities removed), safe (standardized), understood (know how and when to use it), available as soon as you turn the faucet (near real time, never wait), and consumable in any container (Excel, OBIEE, cubes, mobile, web, app, etc.).

POCs
We now do 10-12 proofs of concept per year.

Virtualize
We use public, private, and hybrid clouds to bring data together from in-house and SaaS systems.

Compartmentalization
Just as Apple consumers are now accustomed to apps that are singularly focused, we have started to see report suites and OLAP capabilities that must be specialized for deep reactive and predictive analysis, yet remain linked to a larger ecosystem of data.

Just as water sustains life and enables the growth of organisms, data sustains the growth of organizations and enables insights.

Rapid Evolutionary Trends Right Here, Right Now


No Batch
Daily loads quickly become 2x/day, then 4x/day, then hourly. QE: 3,000 ETL and aggregation jobs.
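One common way to move from daily to hourly loads without reprocessing everything is a watermark-based incremental load: each run copies only rows newer than the newest row already loaded. The following is a generic sketch of that technique, not VMware's ETL; the row shape and the in-memory `target` table are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

Row = Tuple[datetime, str]  # hypothetical (event time, payload) row

class IncrementalLoader:
    """Watermark-based incremental load (illustrative technique, not VMware's ETL)."""

    def __init__(self) -> None:
        self.watermark = datetime.min  # timestamp of the newest row loaded so far
        self.target: List[Row] = []    # stand-in for a warehouse table

    def run(self, source: List[Row]) -> int:
        """Copy only rows newer than the watermark; return how many were loaded."""
        new_rows = [r for r in source if r[0] > self.watermark]
        self.target.extend(new_rows)
        if new_rows:
            self.watermark = max(r[0] for r in new_rows)
        return len(new_rows)

source: List[Row] = [(datetime(2012, 8, 27, 9), "order-1"),
                     (datetime(2012, 8, 27, 10), "order-2")]
loader = IncrementalLoader()
first = loader.run(source)                             # initial load
source.append((datetime(2012, 8, 27, 11), "order-3"))
second = loader.run(source)                            # only the new row
print(first, second, loader.run(source))               # 2 1 0
```

Because each run touches only the delta, running it hourly costs roughly the same total work as running it daily. A timestamp watermark does skip rows that arrive late with an older timestamp; pipelines often use a monotonically increasing sequence number or re-read a small lookback window instead.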

Data, Data Everywhere, but Not a Byte to View

Visualize
We have been experimenting with various visualization techniques to turn the data sideways to see if it reveals something different/new.

DMTOF (Data Marts on the Fly)
We have been testing data marts on the fly, which let users quickly save subsets of enterprise data (columns, rows, subject areas, time frame) as static and dynamic snapshots in a temporary cloud space for custom analysis.
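The difference between static and dynamic snapshots can be sketched in a few lines of Python. Everything here is hypothetical (the `MartSpec` class, the `day` column, the sample rows): a static snapshot materializes the subset once, while a dynamic snapshot stores only the definition and re-evaluates it against current enterprise data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Sequence

Record = Dict[str, object]

@dataclass(frozen=True)
class MartSpec:
    """Hypothetical on-the-fly mart definition: columns, row filter, time frame."""
    columns: Sequence[str]
    row_filter: Callable[[Record], bool]
    start: date
    end: date

    def apply(self, rows: List[Record]) -> List[Record]:
        """Project the chosen columns from rows inside [start, end] that pass the filter."""
        return [
            {c: r[c] for c in self.columns}
            for r in rows
            if self.start <= r["day"] <= self.end and self.row_filter(r)
        ]

def static_snapshot(spec: MartSpec, rows: List[Record]) -> List[Record]:
    """Materialize once; later changes to `rows` are not reflected."""
    return spec.apply(rows)

def dynamic_snapshot(spec: MartSpec, rows: List[Record]) -> Callable[[], List[Record]]:
    """Keep only the definition; every call re-evaluates against current rows."""
    return lambda: spec.apply(rows)

enterprise: List[Record] = [
    {"day": date(2012, 1, 5), "region": "EMEA", "bookings": 100},
    {"day": date(2012, 1, 20), "region": "APAC", "bookings": 80},
]
spec = MartSpec(["day", "bookings"], lambda r: r["region"] == "EMEA",
                date(2012, 1, 1), date(2012, 1, 31))
static = static_snapshot(spec, enterprise)
dynamic = dynamic_snapshot(spec, enterprise)
enterprise.append({"day": date(2012, 1, 25), "region": "EMEA", "bookings": 50})
print(len(static), len(dynamic()))  # 1 2
```

The static form is cheap to query repeatedly and stable for a report; the dynamic form stays current at the cost of recomputing on each use.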

We have started using replication in lieu of ETL to put the data AS PHYSICALLY CLOSE to the users as possible, in order to eliminate single points of failure (SPOF), speed access, and provide autonomy and policing.

Please Write Back


We have been testing write-back capabilities for reports and data, allowing users to annotate charts and add custom data without affecting read-only production data.
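One way to give users write-back without touching production data is an overlay: user edits and annotations go to a private layer that shadows the read-only source. This Python sketch uses the standard library's `collections.ChainMap`, which sends all writes to its first mapping, layered over a `types.MappingProxyType` read-only view. The metric names and values are illustrative, and this is not a description of the VMware implementation.

```python
from collections import ChainMap
from types import MappingProxyType

# Read-only production data (illustrative values); MappingProxyType rejects writes.
production = MappingProxyType({"q1_bookings": 120, "q2_bookings": 135})

def user_view(production_data):
    """Write-back overlay: writes land in a private per-user dict (maps[0]);
    reads fall through to production when the user has not overridden a key."""
    return ChainMap({}, production_data)

view = user_view(production)
view["q2_bookings"] = 140                   # user's custom value
view["q2_note"] = "Includes one late deal"  # chart annotation
print(view["q1_bookings"], view["q2_bookings"], production["q2_bookings"])  # 120 140 135
```

Each user (or report) gets its own overlay, so annotations never collide with each other or leak into the shared source; discarding `view.maps[0]` reverts to pure production data.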

Storm Clouds
The use of clouds is accelerating


Conclusions and Recommendations

New paradigm for real-time analytics
- Massively scalable
- Event-driven
- Search-oriented
- Schema-less
- Machine and human data

End-user focused
- Interactive, fast, simple
- Drill-downs, pre-defined and custom analytics
- Self-service (mostly!); no need for specialized skills

Discover data trends and patterns rather than running predefined queries
- Let the data tell you what to mine

Free analytics offer!

www.cetas.net


Product showcase


Data sources


Management (configure data sources) app integration

App integration, SDK


Date fields
Start Dt, End Dt, our time stamp

Raw dimension analytics


Filtering raw dimension values interactively

Before and after applying a filter: CACHEROLE values excluded
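The before/after screenshots show raw dimension values being filtered interactively. A minimal Python equivalent, assuming the events carry a hypothetical `user` dimension and that CACHEROLE is one of its raw values, counts events per value and drops excluded values before display.

```python
from collections import Counter
from typing import Dict, Iterable, List

def dimension_counts(events: List[Dict[str, str]], dimension: str,
                     exclude: Iterable[str] = ()) -> Counter:
    """Count events per raw value of `dimension`, skipping excluded values."""
    excluded = set(exclude)
    return Counter(
        e[dimension] for e in events
        if dimension in e and e[dimension] not in excluded
    )

events = [{"user": "CACHEROLE"}, {"user": "alice"}, {"user": "alice"}, {"user": "bob"}]
before = dimension_counts(events, "user")
after = dimension_counts(events, "user", exclude={"CACHEROLE"})
print(before["CACHEROLE"], dict(after))  # 1 {'alice': 2, 'bob': 1}
```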

Activity analytics time trend


Multi-dimensional analytics by time

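Multi-dimensional analytics by time amounts to counting events per (time bucket, dimension value) between a selected start and end date. The following Python sketch assumes each event has a hypothetical `timestamp` field and a dimension such as `presentation`; the bucket granularity is a `strftime` pattern.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple

def counts_by_time(events: List[Dict[str, object]], dimension: str,
                   start: datetime, end: datetime,
                   bucket: str = "%Y-%m-%d") -> Dict[Tuple[str, object], int]:
    """Count events per (time bucket, dimension value) with start <= timestamp < end.

    `bucket` is a strftime pattern: "%Y-%m-%d" buckets by day,
    "%Y-%m-%d %H:00" by hour.
    """
    counts: Counter = Counter()
    for e in events:
        ts = e["timestamp"]
        if start <= ts < end:
            counts[(ts.strftime(bucket), e.get(dimension))] += 1
    return dict(counts)

# Hypothetical usage-tracking events.
events = [
    {"timestamp": datetime(2012, 8, 27, 9, 15), "presentation": "Sales"},
    {"timestamp": datetime(2012, 8, 27, 9, 45), "presentation": "Sales"},
    {"timestamp": datetime(2012, 8, 28, 10, 0), "presentation": "Support"},
    {"timestamp": datetime(2012, 9, 1, 8, 0), "presentation": "Sales"},
]
daily = counts_by_time(events, "presentation", datetime(2012, 8, 27), datetime(2012, 8, 29))
print(daily)  # {('2012-08-27', 'Sales'): 2, ('2012-08-28', 'Support'): 1}
```

The half-open range [start, end) keeps adjacent date windows from double-counting events that fall exactly on a boundary.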

Highest activity (Top 10) Presentation Name


Highest activity (Top 10) Users


Highest activity (Top 10) Users by Subject Area Name

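The Top 10 views reduce to counting activity per value and keeping the largest counts, optionally within each group. A minimal Python sketch with `collections.Counter.most_common`; the `user` and `subject_area` field names are assumptions standing in for the Users and Subject Area Name dimensions.

```python
from collections import Counter, defaultdict

def top_n(events, key, n=10):
    """Highest-activity values of `key` as (value, count) pairs, busiest first."""
    return Counter(e[key] for e in events).most_common(n)

def top_n_by(events, key, group, n=10):
    """Top-n values of `key`, computed separately within each value of `group`."""
    grouped = defaultdict(Counter)
    for e in events:
        grouped[e[group]][e[key]] += 1
    return {g: c.most_common(n) for g, c in grouped.items()}

# Hypothetical activity events.
events = (
    [{"user": "alice", "subject_area": "Sales"}] * 2
    + [{"user": "bob", "subject_area": "Sales"}]
    + [{"user": "carol", "subject_area": "Support"}] * 3
)
print(top_n(events, "user", n=2))  # [('carol', 3), ('alice', 2)]
print(top_n_by(events, "user", "subject_area"))
```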

Granular analytics Individual user by Presentation Name


Dynamic Dashboard


Aggregates: Summaries and Uniques

Create custom aggregates: summaries (totals) and uniques (distinct counts)

E.g., summary number of users by Subject Area Name
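Summaries and uniques differ only in what is counted: a summary totals every event, while a unique counts distinct values. A Python sketch of "number of users by Subject Area Name", assuming hypothetical `user` and `subject_area` fields, returns both per subject area.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def summaries_and_uniques(events: List[Dict[str, str]], group: str,
                          measure: str) -> Dict[str, Tuple[int, int]]:
    """Per `group` value, return (summary, unique):
    summary = total events carrying `measure`; unique = distinct `measure` values."""
    totals: Dict[str, int] = defaultdict(int)
    distinct: Dict[str, Set[str]] = defaultdict(set)
    for e in events:
        if measure in e:
            totals[e[group]] += 1
            distinct[e[group]].add(e[measure])
    return {g: (totals[g], len(distinct[g])) for g in totals}

# Hypothetical activity events.
events = (
    [{"user": "alice", "subject_area": "Sales"}] * 2
    + [{"user": "bob", "subject_area": "Sales"}]
    + [{"user": "carol", "subject_area": "Support"}] * 3
)
print(summaries_and_uniques(events, "subject_area", "user"))
# {'Sales': (3, 2), 'Support': (3, 1)}
```

Both subject areas have the same summary (3 events), but the unique count shows Sales activity spread across two users and Support concentrated in one.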

Table view of data


Collaborate: email, export, etc.

Project container

Export


Q&A


