INTRODUCTION: KISHOR BAGUL - CHIEF TECHNICAL OFFICER FOR NEW YORK STATE
DATA DELIVERY: PAT HERBERT - PRINCIPAL SYSTEMS ARCHITECT FOR BIG DATA AT SAS
BRINGING IT TOGETHER: NEAL FISHMAN - SENIOR IBM OPEN GROUP CERTIFIED IT ARCHITECT
HEALTH CLUSTER AND BIG DATA: JOHN NORTON - HEALTH CLUSTER CIO, NYS OFFICE OF IT SERVICES
Operational
Email
Reports
Commercial
Mobile Industry
Social Media
Network
Correlations and patterns from disparate, linked data sources yield the greatest insights and transformative opportunities
Source: Gartner 2013
Document Management
APIs and Protocols: Proprietary Interfaces, Portlet Standards, CMIS Interfaces, File Sharing & Transfer
Services: Repository, Metadata, Search, Versioning, Workflow, Rendering, Audit, Life Cycle
Repositories: Content, Image, Databases, SharePoint, File Systems
Records Management
Sense: collecting customer sentiment and integrating it into business processes; responding to positive/negative product issues. Processing reward and loyalty points that may hold unforeseen customer information patterns.
insurance, disaster recovery, even resort management systems. Analyzing social media trends after a comms outage can pinpoint problem locations, times and response, even PR.
** - McKinsey & Company - Big data: The next frontier for innovation, competition, and productivity
More than half of government agencies report a break in their system, saying at least one dataset has outgrown current management tools (3).
Big data is all about aggregating data across the organization; IT professionals must eliminate the silos of data controls.
1 - InformationWeek, The Big Data Management Challenge
2 - McKinsey & Company, Big data: The next frontier for innovation, competition, and productivity
3 - MeriTalk survey, The Big Data Gap
Operations
To manage big data, companies need to determine the right mix of policies and technologies to balance access and performance with capacity, security, and short and long-term costs.
Data Mentality
Resources Talent
VOLUME / VARIETY
Gain critical insights needed to capture and support your Big Data mix
Eliminate big data pollutants and prevent leakages
TIME/VELOCITY
Visualization
DaaS PaaS
Predictive Modeling
IaaS
In-Memory Computing
CRM
HR
Refine and adapt management policies as conditions and information mix change
Protect the digital exhaust with flexible policies and data leakage controls
Incorporate identity management for new apps, roles & responsibilities
Manage the end-user experience and triage performance problems
Infuse DevOps best practices from testing & QA to problem & change mgmt
Scale using appropriate cloud models and automate information flows
Carefully assess & mitigate new technology mgmt shortfalls (performance, CPU)
Address privileged access reqs for new technologies & supporting platforms
Outline architectural changes and refresh/upgrade priorities
Baseline capacity & performance requirements according to Info Mix
Philadelphia
Improving the ability to manage and secure data and applications with business context
SUNY
Legal Disclaimer
Today's presentations contain forward-looking statements. All statements made that are not historical facts are subject to a number of risks and uncertainties, and actual results may differ materially. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See www.intel.com/products/processor_number for details. Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Intel, Intel Xeon, Intel Core microarchitecture, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others.
Executive Summary
There are Four Key Technology Initiatives that will Strengthen our Economy, Security and Competitiveness: High Performance Computing, Cybersecurity, BIG DATA and Workforce/Education
Innovate Faster
Intel's Vision
This decade we will create and extend computing technology to connect and enrich the lives of every person on earth
Data Evolution
Counting, to Sampling With Randomization, to Big Data
Harnessing Analytics
Transform Data into actionable knowledge
Actionable Insights
Constituent Benefits
450M models
CORE
Workforce
Big Data
Gigabytes to Terabytes
Many Petabytes
All the time, Realtime All the Datatypes
Velocity
Variety
Centralized, Structured
Veracity
VALUE
PAT HERBERT, SAS PRINCIPAL SYSTEMS ARCHITECT FOR BIG DATA
NEW YORK STATE IT LEADERSHIP ACADEMY
MARCH 5, 2014
Data Delivery
OVERVIEW
The Value of Big Data is Directly Related to the Ability to Analyze and Deliver Timely *and* Useful Insights About/From that Data.
History Repeats Itself
Questions?
Old problems become easily solved
More data/problems/issues are added to the mix and they become new challenges
New solutions allow us to handle new/bigger challenges but they are hard to use
VOLUME
VARIETY
DATA SIZE
VELOCITY VALUE
TODAY
THE FUTURE
Many of today's new problems are tied to big data challenges. New and existing solutions are being refined, rearchitected and redefined to make them easier to use for this class of problems.
Techniques to improve and/or ease use of big data:
Simplify access to breadth of new capabilities (e.g. HDFS, Hive, Pig, MapReduce, HCatalog, ...)
Enable existing applications to use big data platforms without demanding new code
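For readers unfamiliar with MapReduce, one of the capabilities named above, the following is a minimal single-process sketch of the map / shuffle / reduce pattern in plain Python. It does not use Hadoop or any cluster; the function names and sample records are hypothetical and only illustrate the shape of the computation.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (word, 1) pair per word in each input record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical sample input, for illustration only.
sample = ["big data big value", "data delivery"]
counts = word_count(sample)
```

On a real platform the map and reduce functions would run in parallel across many nodes; the "simplify access" goal is largely about hiding that distribution from the person writing them.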
DATA EXPLORATION
Predictive Descriptive
Modeling
Statistics
Variable Selection
Summarization
ANALYTICAL LIFECYCLE
MODEL DEVELOPMENT
MODEL DEPLOYMENT
If all you have is a hammer then everything looks like a nail!
Until tools, applications and systems are provided to address new challenges, we will keep trying to address them in the old ways. This leads to:
Wasted resources and effort
Inconsistent and/or inaccurate results
Frustration
Failure
OLD WAYS
Integration
Role-Based Views
PREPARE
Manage data
Load and join data
Create calculated columns
EXPLORE
Perform ad hoc data exploration
Insights generated through analytic visualizations
DESIGN
Create dashboard style reports for web or mobile
DELIVER
Mobile BI - native tablet applications delivering interactive reports
Web and PDF
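The PREPARE step above (load and join data, create calculated columns) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the workflow of any particular product: the tables, column names and helper functions (`join_on`, `add_calculated`) are all hypothetical, and the join assumes the key is unique in the left table.

```python
# Hypothetical input tables, for illustration only.
members = [
    {"member_id": 1, "plan": "A"},
    {"member_id": 2, "plan": "B"},
]
claims = [
    {"member_id": 1, "billed": 200.0, "paid": 150.0},
    {"member_id": 1, "billed": 80.0, "paid": 80.0},
    {"member_id": 2, "billed": 500.0, "paid": 300.0},
]


def join_on(left, right, key):
    """Inner-join two lists of dicts on a shared key (key unique in left)."""
    index = {row[key]: row for row in left}
    return [{**index[r[key]], **r} for r in right if r[key] in index]


def add_calculated(rows, name, fn):
    """Return new rows with one extra column computed from each row."""
    return [{**row, name: fn(row)} for row in rows]


joined = join_on(members, claims, "member_id")
prepared = add_calculated(joined, "unpaid", lambda r: r["billed"] - r["paid"])
```

The EXPLORE, DESIGN and DELIVER steps would then work from `prepared` rather than from the raw tables.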
NEWEST WAYS
Unable to get timely/accurate info on health plans/outcomes
Uncoordinated businesses (investments, actuarials, purchasing, IT services)
BI did not include analytics
Poorly chosen/negotiated benefit plans
Inaccurate near/long-term risk profiles
Goals:
How many flu shots? How many physician-treated flu cases?
How many knee surgeries? How many post-surgery complications? What are largest procedure costs? Where do they occur?
Measure health plan effectiveness, improving best/most used components while eliminating most burdensome components
Develop 360-degree view of members' benefits selected and used/unused
Every dollar saved (not wasted) increases baseline for investments to increase member benefits
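Questions such as "How many flu shots?" and "What are largest procedure costs? Where do they occur?" reduce to counting and grouping over claim records. The sketch below shows that idea with the Python standard library only; the records, procedure names, locations and costs are invented for illustration and do not come from the presentation.

```python
from collections import Counter, defaultdict

# Hypothetical claim records; every value here is illustrative only.
claims = [
    {"procedure": "flu_shot", "location": "Albany", "cost": 25.0},
    {"procedure": "flu_shot", "location": "Buffalo", "cost": 30.0},
    {"procedure": "knee_surgery", "location": "Albany", "cost": 18000.0},
    {"procedure": "flu_treatment", "location": "Buffalo", "cost": 140.0},
]

# "How many ...?" is a count per procedure.
procedure_counts = Counter(c["procedure"] for c in claims)

# "What are largest procedure costs?" is a total per procedure.
cost_by_procedure = defaultdict(float)
for c in claims:
    cost_by_procedure[c["procedure"]] += c["cost"]
costliest = max(cost_by_procedure, key=cost_by_procedure.get)

# "Where do they occur?" is the set of locations for that procedure.
costliest_locations = sorted(
    {c["location"] for c in claims if c["procedure"] == costliest}
)
```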
Perception that fraud was a major issue, with no way to prove/disprove
Models used were incomplete/inaccurate, resulting in wasted effort by case-workers
Goals:
Identify, report on and eliminate Medicaid related fraud
Minimize cost in developing fraud-related models
Avoid fundamental changes to structure of existing data
Provide near real-time reporting for analysts that could be easily shared
9:1 ROI
Neal Fishman
Program Director Data Based Pathology
Big Data Leadership Team Public Sector
Sr. Certified IT Architect IBM
Distinguished Chief/Lead IT Architect Open Group
Neal leads the Public Sector architects within the Big Data Leadership Team
Neal is the author of several books including Viral Data in SOA and Enterprise Architecture Using the Zachman Framework
nfishman@us.ibm.com +1 646.457.0798
Information Owner
Understand Information Sources Advertise Information Source
Search Requests
New Sources
Third Party Feeds
Information Service Calls
Browse Requests
Data Lake
Data Load
Data Feeds
Report Queries
Reporting
Information Management
Inter-lake Exchange
Major Subsystems
Information Sources
Information Owner
Understand Information Sources Advertise Information Source
Events
Real-time Decisions
Catalog Interfaces
Search Requests
New Sources
Third Party Feeds
Information Service Calls
Enterprise IT Interaction
Browse Requests
Data Load
Data Feeds
Report Queries
Reporting
Information Management
Inter-lake Exchange
Data Lake
Logical Architecture
Information Sources
Information Owner
Understand Information Sources Advertise Information Source
Events
Real-time Decisions
STREAMING ANALYTICS
Catalog Interfaces
Line of Business Interaction Self-service Data Access
Virtualized Data
CATALOG
INFORMATION VIEWS
Search Requests
New Sources
Third Party Feeds
Information Service Calls
ASSET HUB
ACTIVITY HUB
CONTENT HUB
Search
Browse Requests
OPERATIONAL STATUS
Publishing Feeds
Deposited Data
Mash-up
Data Load
APIs
INFORMATION WAREHOUSE
Data Feeds Information Ingestion Harvested Data DEEP DATA
Reporting
Inter-lake Exchange
INFORMATION BROKER
CODE HUB
MONITOR
WORKFLOW
Information Management
Data Lake
Search Engine
Finds Documents containing Keywords
Delivers Documents based on Popularity
Expert
Understands Question
Produces Possible Answers & Evidence
Analyzes Evidence, Computes Confidence
Delivers Response, Evidence & Confidence
Decision Maker
Asks a Natural Language Question
Considers Answer & Evidence
Keyword matches (In May; anniversary in Portugal; India; explorer) point to: Gary
This evidence suggests Gary is the answer BUT the system must learn that keyword matching may be weak relative to other types of evidence
Other evidence types: Statistical Paraphrasing, GeoSpatial Reasoning, Reference Text
Matches: 27th May 1498; arrival in (Paraphrases); India (Geo-KB: Kappad Beach); explorer
Answer: Vasco da Gama
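The point made above, that keyword matching may be weak relative to other types of evidence, can be illustrated with a small sketch of weighted evidence combination. This is not how any specific product computes confidence: the evidence types, weights, bias and scores below are all invented assumptions, and a real system would learn its weights from training data rather than hard-code them.

```python
import math

# Hypothetical weights: keyword overlap counts for little, while
# paraphrase, geospatial and date evidence count for more.
WEIGHTS = {"keyword": 0.5, "paraphrase": 2.0, "geospatial": 2.0, "date": 1.5}
BIAS = -2.0


def confidence(evidence):
    """Combine per-type evidence scores (each in [0, 1]) with a logistic."""
    z = BIAS + sum(WEIGHTS[kind] * score for kind, score in evidence.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical evidence scores for the two candidate answers.
candidates = {
    "Gary": {"keyword": 1.0, "paraphrase": 0.0, "geospatial": 0.0, "date": 0.0},
    "Vasco da Gama": {"keyword": 0.2, "paraphrase": 0.9, "geospatial": 0.9, "date": 0.9},
}

ranked = sorted(
    candidates, key=lambda name: confidence(candidates[name]), reverse=True
)
```

With these made-up numbers the perfect keyword match for "Gary" still yields a low confidence, while the candidate supported by several independent evidence types ranks first.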
Questions?