
BIG DATA: MORE THAN INFORMATION MANAGEMENT

INTRODUCTION: KISHOR BAGUL, CHIEF TECHNICAL OFFICER FOR NEW YORK STATE
DATA GENERATION: BRAD KRAMER, CA SR. BUSINESS TECHNOLOGY ARCHITECT
DATA MANAGEMENT: DENNIS GOO, DIRECTOR OF PATHFINDING, INTEL DATA CENTER GROUP
DATA DELIVERY: PAT HERBERT, PRINCIPAL SYSTEMS ARCHITECT FOR BIG DATA AT SAS
BRINGING IT TOGETHER: NEAL FISHMAN, SENIOR IBM OPEN GROUP CERTIFIED IT ARCHITECT
HEALTH CLUSTER AND BIG DATA: JOHN NORTON, HEALTH CLUSTER CIO, NYS OFFICE OF IT SERVICES

NYS IT Leadership Academy


Big Data - More than Information Management March 5, 2014

Enterprises have Data they never had access to before!


Transactions, Monitoring, Contracts, Sensor, Economic, Population, Sentiment, Health, Location

Operational

Enterprise ("Dark Data")

Email
Reports

Public & Open Data

Commercial

Mobile Industry

Social Media
Network

Correlations and patterns from disparate, linked data sources yield the greatest insights and transformative opportunities
Source: Gartner 2013

NYS: Unstructured Data Framework

Reporting / Monitoring, Content / Process Analytics, Workflow / BPMS

Document Management
APIs and Protocols
Proprietary Interfaces, Portlet Standards, CMIS Interfaces, File Sharing & Transfer

Services
Repository, Metadata, Search, Versioning, Workflow, Rendering, Audit, Life Cycle

Repositories
Content, Image, Databases, SharePoint, File Systems

Records Management

Archive: Authentic, Immutable, Highly Available, Secure


Enterprise Content Management

Security & Access Control

Let's hear from our Big Data Experts!

Unlocking the Value in your BIG DATA


Brad Kramer, Sr. Business Technology Architect, NY State & Local Government
March 5, 2014

© 2013 CA. All rights reserved.

Big Data and Analytics


The remarkable cognitive capacity of Humans and Big Data

Determining the context for all interactions
Converting audio/video to text, providing semantic context and integrating with enterprise apps.

Sense: collecting customer sentiments and integrating them into business processes.
Respond: responding to positive/negative product issues.
Processing reward and loyalty points that may hold unforeseen customer information patterns.

Big Data and Analytics


Connecting with outside patterns to gain an edge
Collecting external weather patterns and integrating with insurance, disaster recovery, even resort management systems.
Analyzing social media trending after a comms outage to pinpoint problem location, times and response, even PR.

detecting and responding to trends

Determining Big Data Value Potential: Industry Landscape

** Source: McKinsey & Company, Big data: The next frontier for innovation, competition, and productivity

So What's Holding Us Back? Business Constraints, Operational Readiness?


Economics
57% of IT executives cited budget constraints as the top barrier to big data [1]
140,000 to 190,000 deep analytical talent positions and 1.5 million data-savvy managers are needed to take advantage of Big Data (US alone) [2]

Operations
More than half of government agencies report a break in their system, saying at least one dataset has outgrown current management tools [3]
Big data is all about aggregating data across the organization; IT professionals must eliminate the silos of data controls

[1] InformationWeek, The Big Data Management Challenge
[2] McKinsey & Company, Big data: The next frontier for innovation, competition, and productivity
[3] MeriTalk survey, The Big Data Gap

Big Data: The Management Challenge


Barriers to Entry

To manage big data, companies need to determine the right mix of policies and technologies to balance access and performance with capacity, security, and short and long-term costs.

Data Mentality

Dev / Ops / Risk

Analytical Methods Policies Business Model (consumer / provider)

Resources Talent

BIGGER THAN A VOLUME ISSUE

Accessibility and Capture

Big Data: Bridging the Visibility Gap
Key Management Considerations

Big Data available

VOLUME / VARIETY

Gain critical insights needed to capture and support your Big Data mix
Eliminate big data pollutants and prevent leakages

Manage information flows - as Big Data consumer and/or provider


Blend Big Data development and operations for speed and resilience

Technical processing capability

TIME/VELOCITY

Big Data Stack and Management Considerations


Insurance Manufacturing Marketing Procurement Finance

Business Healthcare Intelligence Social Analytics On premise

Visualization

DaaS PaaS

Predictive Modeling

IaaS
In-Memory Computing

Relational, Hadoop Clusters, ERP, CRM, HR

Refine and adapt management policies as conditions and information mix change
Protect the digital exhaust with flexible policies and data leakage controls
Incorporate identity management for new apps, roles & responsibilities
Manage the end-user experience and triage performance problems
Infuse DevOps best practices from testing & QA to problem & change mgmt
Scale using appropriate cloud models and automate information flows
Carefully assess & mitigate new technology mgmt shortfalls (performance, CPU)
Address privileged access reqs for new technologies & supporting platforms
Outline architectural changes and refresh/upgrade priorities
Baseline capacity & performance requirements according to Info Mix

Parallel Management Strategies

Big Data Winners...


Boston
App detects potholes, alerts Boston city officials

Philadelphia

Making it easier to share knowledge within City Hall

Improving the ability to manage and secure data and applications with business context

SUNY

HPC's Next Frontier

Dennis Goo, Intel Corporation
Director, Government Programs-R&D Pathfinding, IAG & DCG Pathfinding

Legal Disclaimer
Today's presentations contain forward-looking statements. All statements made that are not historical facts are subject to a number of risks and uncertainties, and actual results may differ materially.

NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.

Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See www.intel.com/products/processor_number for details.

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel, Intel Xeon, Intel Core microarchitecture, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Copyright 2013, Intel Corporation. All rights reserved.

Executive Summary
There are Four Key Technology Initiatives that will Strengthen our Economy, Security and Competitiveness: High Performance Computing, Cybersecurity, BIG DATA and Workforce/Education

Innovate Faster

Intel's Vision
This decade we will create and extend computing technology to connect and enrich the lives of every person on earth

http://mimobaby.com/ Intel Edison

Data Evolution

Counting, to Sampling with Randomization, to Big Data
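
As a rough illustration of the middle step, sampling with randomization, the sketch below estimates a mean from a small random sample and compares it with a full count of every record. The population is synthetic (the integers 1 to 100,000) and `sample_mean` is a hypothetical helper name; neither comes from the presentation.

```python
import random
from statistics import mean


def sample_mean(population, n, seed=0):
    """Estimate the population mean from n items drawn at random without replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return mean(rng.sample(population, n))


# Synthetic population, assumed purely for illustration.
population = list(range(1, 100_001))

full_count_mean = mean(population)           # "counting": touch every record
estimate = sample_mean(population, n=1_000)  # "sampling": touch 1% of the records

print(f"full mean = {full_count_mean}, sampled estimate = {estimate:.1f}")
```

The sample gives an approximate answer at a fraction of the cost. The Big Data step in the slide is about no longer needing that trade-off when every record can be processed.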

Harnessing Analytics
Transform Data into actionable knowledge

Actionable Insights

Drowning in data. Starving for knowledge

Constituent Benefits

450M models

Technology R&D for a Balanced Architecture

CORE

Workforce

The Five Vs of Big Data


Traditional Data vs. Big Data

Volume: Gigabytes to Terabytes (traditional) vs. Many Petabytes (big data)
Velocity: Occasional Batch Processing (traditional) vs. All the time, Realtime (big data)
Variety: Centralized, Structured (traditional) vs. All the Datatypes (big data)
Veracity: Monitor, Contain, Recover; Protect, Detect, Report

VALUE

What will it take?


Grow the Skill Base for Big Data and Computing Innovation
Improve Open Source Tools and User Friendliness
Build more energy efficient cores for scaling
Create low-cost, high-density memory solutions
Integrate adaptive interconnect fabrics
Design in Resiliency and Security
Resolve our Wireless Spectrum challenges
Engage the User Community Sooner: Co-Design and Makers

43+ YEARS OF INNOVATION

It's not just what we make. It's what we make possible.

Data Delivery
Big Data: More than Information Management!

PAT HERBERT, SAS PRINCIPAL SYSTEMS ARCHITECT FOR BIG DATA
NEW YORK STATE IT LEADERSHIP ACADEMY
MARCH 5, 2014

Data Delivery

OVERVIEW

The Value of Big Data is Directly Related to the Ability to Analyze and Deliver Timely *and* Useful Insights About/From that Data.

History Repeats Itself
Bringing Value to Big Data
Old Ways / New Ways
Use Cases

Questions?


HISTORY REPEATS ITSELF

Old problems become easily solved
More data/problems/issues are added to the mix and they become new challenges
New solutions allow us to handle new/bigger challenges, but they are hard to use
Solutions are refined to make them easier to use/more practical
These approaches become the norm (aka old problems)
We push the edges of these new normal approaches, which then require new solutions to new problems (wash-rinse-repeat)


BRINGING VALUE TO BIG DATA

[Figure: THRIVING IN THE BIG DATA ERA. Labels: VOLUME, VARIETY, VELOCITY, DATA SIZE, VALUE; TODAY, THE FUTURE]


Many of today's new problems are tied to big data challenges. New and existing solutions are being refined, rearchitected and redefined in order to make them easier to use for this class of problems.

Techniques to improve and/or ease use of big data:
Simplify access to breadth of new capabilities (e.g., HDFS, Hive, Pig, MapReduce, HCatalog, ...)
Enable existing applications to use big data platforms without demanding new code
Fully utilize big data platforms in natural/native ways
Deliver results in appropriate formats rather than returning overly dense results in styles designed for lesser data


[Figure: ANALYTICAL LIFECYCLE with three stages: DATA EXPLORATION, MODEL DEVELOPMENT, MODEL DEPLOYMENT. Labels: Predictive Modeling, Descriptive Statistics, Variable Selection, Summarization, Model Comparison, Scoring]


OLD WAYS / NEW WAYS

If all you have is a hammer then everything looks like a nail!

Until tools, applications and systems are provided to address new challenges, we will keep trying to address them in the old ways. This leads to:
Wasted resources and effort
Inconsistent and/or inaccurate results
Frustration
Failure

OLD WAYS

OLD WAYS: SAMPLING

NEWER WAYS: OLD TOOLS JUST DON'T WORK

NEWEST WAYS: CONSOLIDATED APPROACH

Central Entry Point

Integration

Role-Based Views

PREPARE
Manage data
Load and join data
Create calculated columns

EXPLORE
Perform ad hoc data exploration
Insights generated through analytic visualizations

DESIGN
Create dashboard style reports for web or mobile

DELIVER
Mobile BI: native tablet applications delivering interactive reports
Web and PDF

IN-MEMORY ANALYTICS ENGINE

NEWEST WAYS: SIMPLIFIED USER EXPERIENCE FOR COMPLEX CONCEPTS/PROBLEMS

Advanced Modeling and Machine Learning techniques

Data Delivery Use Case: PENSIONS & HEALTH BENEFITS

Issues:
Unable to get timely/accurate info on health plans/outcomes
Uncoordinated businesses (investments, actuarials, purchasing, IT services)
BI did not include analytics
Poorly chosen/negotiated benefit plans
Inaccurate near/long-term risk profiles

Goals:
Improved negotiation results with providers by answering questions like:
How many flu shots? How many physician-treated flu cases?
How many knee surgeries? How many post-surgery complications?
What are the largest procedure costs? Where do they occur?

Measure health plan effectiveness, improving best/most used components while eliminating most burdensome components
Develop 360-degree view of members' benefits selected and used/unused
Every dollar saved (not wasted) increases baseline for investments to increase member benefits
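
The negotiation questions in this use case are, at their simplest, counts and sums grouped by procedure and by location. A minimal sketch follows, assuming a flat list of claim records. The field names, the regions and every value are invented for illustration; they are not taken from the presentation or from any SAS product.

```python
from collections import Counter, defaultdict

# Hypothetical claim records; all fields and values are made up for illustration.
claims = [
    {"procedure": "flu shot",      "region": "Region A", "cost": 25.0,    "complication": False},
    {"procedure": "flu shot",      "region": "Region B", "cost": 30.0,    "complication": False},
    {"procedure": "flu treatment", "region": "Region A", "cost": 120.0,   "complication": False},
    {"procedure": "knee surgery",  "region": "Region A", "cost": 18000.0, "complication": True},
    {"procedure": "knee surgery",  "region": "Region B", "cost": 21000.0, "complication": False},
]


def count_by_procedure(records):
    """How many claims of each procedure type (e.g. flu shots, knee surgeries)?"""
    return Counter(r["procedure"] for r in records)


def cost_by(records, key):
    """Total cost grouped by `key`, largest first (what costs most, and where)."""
    totals = defaultdict(float)
    for r in records:
        totals[r[key]] += r["cost"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


counts = count_by_procedure(claims)
knee_complications = sum(
    1 for r in claims if r["procedure"] == "knee surgery" and r["complication"]
)

print(counts["flu shot"], counts["flu treatment"])   # flu shots vs. treated flu cases
print(counts["knee surgery"], knee_complications)    # surgeries vs. complications
print(cost_by(claims, "procedure")[0])               # largest procedure cost
print(cost_by(claims, "region")[0])                  # where the cost occurs
```

The queries themselves are simple; the issues listed above concern getting timely, accurate and coordinated data to run them against.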

Data Delivery Use Case: MICHIGAN DEPARTMENT OF COMMUNITY HEALTH

Issues:
Perception that fraud was a major issue, with no way to prove/disprove it
Models used were incomplete/inaccurate, resulting in wasted effort by case-workers

Goals:
Identify, report on and eliminate Medicaid-related fraud
Minimize cost in developing fraud-related models
Avoid fundamental changes to structure of existing data
Provide near real-time reporting for analysts that could be easily shared
9:1 ROI

Big Data: More Than Information Management


Bringing it Together; Creating New Types of Solutions
Presenter: Neal Fishman

Neal Fishman
Program Director, Data Based Pathology
Big Data Leadership Team, Public Sector
Sr. Certified IT Architect, IBM
Distinguished Chief/Lead IT Architect, Open Group
Neal leads the Public Sector architects within the Big Data Leadership Team.
Neal is the author of several books, including Viral Data in SOA and Enterprise Architecture Using the Zachman Framework.

nfishman@us.ibm.com +1 646.457.0798

Bring it Together: Context Diagram


[Diagram: the Data Lake in context.
Systems and roles shown: Information Sources (Mobile and other Channels, System of Record Applications, Front Office Applications, Back Office Applications, Support Services), New Sources, Third Party Feeds, Third Party Services, Internal Sources, Enterprise Service Bus, Line of Business Applications, Decision Model Management, Information Owner, Governance, Risk and Compliance Platform, Simple, Ad Hoc Discovery and Analytics, Reporting, Information Management, Other Data Lakes.
Labeled flows: Information Service Calls, Information Federation Calls, Data Export, Data Load, Data Feeds, Events, Search Requests, Browse Requests, Report Queries, Deploy Decision Models, Deploy Real-time Decision Models, Understand Information Sources, Advertise Information Source, Understand Compliance, Report Compliance, Inter-lake Exchange.]

Major Subsystems
[Diagram: the context diagram repeated with the Data Lake's major subsystems drawn in: Real-time Decisions, Advanced Data Provisioning, Catalog Interfaces, Enterprise IT Interaction, Line of Business Interaction, Data Lake Repositories, Information Integration & Governance.]

Logical Architecture
[Diagram: the context diagram repeated with logical components drawn inside the Data Lake subsystems.
Real-time Decisions: STREAMING ANALYTICS.
Line of Business Interaction: Self-service Data Access. Enterprise IT Interaction: Real-time Interfaces.
Catalog Interfaces and Virtualized Data: CATALOG, INFORMATION VIEWS, Search.
Data Lake Repositories: Shared Operational Data (ASSET HUB, ACTIVITY HUB, CONTENT HUB), OPERATIONAL STATUS, Publishing Feeds, Deposited Data, Mash-up, APIs, INFORMATION WAREHOUSE, Information Ingestion, Harvested Data, DEEP DATA, Provision Reporting Data Marts.
Information Integration & Governance: INFORMATION BROKER, CODE HUB, OPERATIONAL GOVERNANCE HUB, MONITOR, WORKFLOW.]

Cognitive Computing: Creating New Types of Solutions


Decision Maker
Has a Question
Distills to 2-3 Keywords
Reads Documents, Finds Answers
Finds & Analyzes Evidence

Search Engine
Finds Documents containing Keywords
Delivers Documents based on Popularity

Expert
Understands Question
Produces Possible Answers & Evidence
Analyzes Evidence, Computes Confidence
Delivers Response, Evidence & Confidence

Decision Maker
Asks a Natural Language Question
Considers Answer & Evidence

Why Cognitive is Challenging


Answering complex natural language questions requires more than keyword evidence

Question: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India
Reference text: In May, Gary arrived in India after he celebrated his anniversary in Portugal

Keyword hits: "In May", "celebrated", "anniversary", "Portugal", "arrival in" / "arrived in", "India"
Answer: explorer = Gary (weak evidence)

This evidence suggests "Gary" is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence
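
To make the weakness concrete, here is a minimal sketch of keyword-overlap scoring applied to the question on this slide. It is an illustration only, not how any IBM system scores evidence. The function names, the tiny stopword list and the second passage (the Vasco da Gama sentence, used here for contrast) are assumptions of this sketch.

```python
import re

# Tiny illustrative stopword list; a real system would use a far larger one.
STOPWORDS = {"the", "of", "this", "in", "on", "he", "his", "after"}


def keywords(text):
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


def keyword_score(question, passage):
    """Fraction of the question's keywords that also appear in the passage."""
    q = keywords(question)
    return len(q & keywords(passage)) / len(q)


QUESTION = ("In May 1898 Portugal celebrated the 400th anniversary "
            "of this explorer's arrival in India")
GARY = "In May, Gary arrived in India after he celebrated his anniversary in Portugal"
GAMA = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach"

print("Gary passage:", round(keyword_score(QUESTION, GARY), 2))
print("Gama passage:", round(keyword_score(QUESTION, GAMA), 2))
```

On keywords alone, the irrelevant Gary sentence outscores the sentence that contains the correct answer, which is the failure the slide describes.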

Leveraging Multiple Algorithms to Gather Deeper Evidence


Question: In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India.
Reference text: On the 27th of May 1498, Vasco da Gama landed in Kappad Beach

Temporal Reasoning (date match): May 1898 and the 400th anniversary point to 27th May 1498
Statistical Paraphrasing: "arrival in" paraphrases "landed in"
GeoSpatial Reasoning (Geo-KB): Kappad Beach is in India
Answer: explorer = Vasco da Gama

Stronger evidence can be much harder to find and score:
Search far and wide
Explore many hypotheses
Find & judge evidence
Many inference algorithms
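
The temporal reasoning step on this slide reduces to a small piece of arithmetic: 1898 minus 400 gives 1498, the year that appears in the Vasco da Gama passage. The sketch below shows only that one check, as an assumption-laden illustration. The function names and regular expressions are hypothetical, and a real system would combine many such scorers rather than rely on one.

```python
import re


def implied_event_year(question):
    """Year of the original event implied by 'Nth anniversary' in a dated question.

    Returns None when the question lacks either a four-digit year or an
    'Nth anniversary' phrase.
    """
    year = re.search(r"\b(1\d{3}|20\d{2})\b", question)
    nth = re.search(r"\b(\d+)(?:st|nd|rd|th) anniversary\b", question)
    if not (year and nth):
        return None
    return int(year.group(1)) - int(nth.group(1))


def date_match(question, passage):
    """True when the passage mentions the year implied by the question."""
    target = implied_event_year(question)
    return target is not None and str(target) in re.findall(r"\b\d{4}\b", passage)


QUESTION = ("In May 1898 Portugal celebrated the 400th anniversary "
            "of this explorer's arrival in India")
GAMA = "On the 27th of May 1498, Vasco da Gama landed in Kappad Beach"
GARY = "In May, Gary arrived in India after he celebrated his anniversary in Portugal"

print(implied_event_year(QUESTION))   # 1898 - 400
print(date_match(QUESTION, GAMA))     # passage contains the implied year
print(date_match(QUESTION, GARY))     # passage has no year at all
```

This check separates the two passages where keyword overlap could not, which is why the slide calls for several algorithms to gather deeper evidence.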

HEALTH CLUSTER AND BIG DATA


John Norton, Health Cluster CIO, NYS Office of IT Services

HEALTH CLUSTER AND BIG DATA


1. Organizational Governance
2. Current Capabilities
3. Strategic Plan
4. Next Steps

Questions?

THANK YOU TO OUR TEAM


Arielle Bernstein (SAS), Jack Davis (VMware), Kishor Bagul (ITS), Leslie Woodin (HP), John Norton (ITS), Elizabeth Bush (OMH), Samikya Balguri (ITS), Mike O'Boyle (IBM), Gerard Mule (FireEye), Peter Welling (CA), John Gable (Citrix), Joe Lynch (Oracle), Jim Hendler (RPI), Melanie Fekete (Intel), Rick Cobello (Schenectady County), Jalila Smith (IBM)
