Query Across the Cloud the Easy Way

Marty Gubar
Big Data SQL PM
Oracle Corporation
October 2018

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Confidential – Oracle Internal/Restricted/Highly Restricted
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, timing and pricing of any
features or functionality described for Oracle’s products may change and remains at the
sole discretion of Oracle Corporation.

Optimizing the Data Platform: Goals

Platform layers: Business Analytics, AI/ML | Data Science, Integration, Streaming, Data Lake, Data Warehouse, Infrastructure

• Easily query and analyze all data using your current apps
• Safeguard sensitive data
• Fast performance
• Deploy in minutes
• Reduce runtime costs
• Support hybrid deployments
– On-premise to cloud
– Multi-cloud

Roadmap

Big Data SQL 3.2
• Use Oracle SQL to query all data
• Safeguard access using Oracle security policies
• Fast performance thru scale-out processing

Big Data SQL 4.0
• Simplify deployments
• Major performance breakthroughs
• Automated metadata usage
• Extend breadth of sources

Cloud Smart Scan
• Autonomous Database enablement
• Serverless - Zero Administration
• Pay for usage

Big Data SQL Today

Big Data SQL Architecture
• Any application that queries the enhanced Oracle Database
– Seamlessly query external stores
– Oracle Database Big Data SQL-enabled
• Scale-out, data local processing
– Big Data SQL Cells deployed to Hadoop cluster
– Fan-out data local processing
• Uses shared Hadoop metadata
– Hive metastore captures data structure and location

Diagram labels: REST, Python, node.js, SQL, Java, R, Graph; Oracle Database with Big Data SQL; Hadoop Data Lake with Hive Metadata, Big Data SQL Cells, Kafka (Streaming), HDFS

Flexible Deployment Options

Engineered Systems
• Oracle Big Data Appliance
• Oracle Exadata

Oracle Cloud and Oracle Cloud at Customer
• Oracle Big Data Cloud Service
• Oracle Exadata Cloud Service

Commodity
• DIY Cloudera or Hortonworks
• Oracle Database

Start With “Big Data Enabled” External Tables
• New types of external tables
– ORACLE_HIVE (leverage Hive metadata)
– ORACLE_HDFS (specify metadata)
• Access parameters used to describe how to identify sources and process data on the Hadoop cluster

CREATE TABLE movielog (
  click VARCHAR2(4000))
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS
  (
    com.oracle.bigdata.tablename=logs
    com.oracle.bigdata.cluster=mycluster
  ))
REJECT LIMIT UNLIMITED;

Use Oracle SQL Analytics for All Data
• Full breadth of Oracle SQL query language supported
• Existing applications now access data across stores without modification
• Leverage existing skills and management infrastructure

Basic analytic SQL
• Window functions
• Ranking
• Lag/lead
• Top-N
• WITH clause
• Pivot/Unpivot
• Aggregation functions
• Approximate Queries
• Ratio to Report

Advanced analytic SQL
• Pareto analysis
• Pattern matching
• Model clause

Descriptive statistical SQL
• DBMS_STAT_FUNCS

Advanced statistical SQL
• Statistical aggregates
• Linear regression
• Descriptive stats
• Correlations
• Cross Tabs
• Hypothesis testing
• Distribution fitting

Why an Advanced SQL Engine Matters
Simple, Productive and Optimized Development
Simplified, sophisticated, standards based syntax

Finding Patterns in Stock Market Data - Double Bottom (W)
[Chart: Ticker price over time, 10:00 to 10:25]

250+ Lines of Java UDF (excerpt)

        next = lineNext.getQuantity();
    }

    if (!q.isEmpty() && (prev.isEmpty() || (eq(q, prev) && gt(q, next)))) {
        state = "S";
        return state;
    }

    if (gt(q, prev) && gt(q, next)) {
        state = "T";
        return state;
    }

    if (lt(q, prev) && lt(q, next)) {
        state = "B";
        return state;
    }

    if (!q.isEmpty() && (next.isEmpty() || (gt(q, prev) && eq(q, next)))) {
        state = "E";
        return state;
    }

    if (q.isEmpty() || eq(q, prev)) {
        state = "F";
        return state;
    }

    return state;
}

private boolean eq(String a, String b) {
    if (a.isEmpty() || b.isEmpty()) {
        return false;
    }
    return a.equals(b);
}

private boolean gt(String a, String b) {
    if (a.isEmpty() || b.isEmpty()) {
        return false;
    }
    return Double.parseDouble(a) > Double.parseDouble(b);
}

private boolean lt(String a, String b) {
    if (a.isEmpty() || b.isEmpty()) {
        return false;
    }
    return Double.parseDouble(a) < Double.parseDouble(b);
}

public String getState() {
    return this.state;
}
}

BagFactory bagFactory = BagFactory.getInstance();

@Override
public Tuple exec(Tuple input) throws IOException {
    long c = 0;
    String line = "";
    String pbkey = "";
    V0Line nextLine;
    V0Line thisLine;
    V0Line processLine;
    V0Line evalLine = null;
    V0Line prevLine;
    boolean noMoreValues = false;
    String matchList = "";
    ArrayList<V0Line> lineFifo = new ArrayList<V0Line>();
    boolean finished = false;

    DataBag output = bagFactory.newDefaultBag();

    if (input == null) {
        return null;
    }
    if (input.size() == 0) {
        return null;
    }
    Object o = input.get(0);
    if (o == null) {
        return null;
    }

    //Object o = input.get(0);
    if (!(o instanceof DataBag)) {
        int errCode = 2114;

12 Lines of SQL

SELECT first_x, last_z
FROM ticker MATCH_RECOGNIZE (
  PARTITION BY name ORDER BY time
  MEASURES FIRST(x.time) AS first_x,
           LAST(z.time) AS last_z
  ONE ROW PER MATCH
  PATTERN (X+ Y+ W+ Z+)
  DEFINE X AS (price < PREV(price)),
         Y AS (price > PREV(price)),
         W AS (price < PREV(price)),
         Z AS (price > PREV(price) AND
               z.time - FIRST(x.time) <= 7 ))

20x less code
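
To make the PATTERN (X+ Y+ W+ Z+) clause of the MATCH_RECOGNIZE query more concrete, here is a minimal Python sketch of the same double-bottom search over a single price series. It is an illustration under stated assumptions, not how Oracle evaluates the query: the function name `find_double_bottoms`, the `max_span` parameter, and the use of the row index as the `time` value are all hypothetical, and only one partition is handled.

```python
import re

# One letter per row transition: D = price fell, U = price rose, F = flat.
# The regex mirrors PATTERN (X+ Y+ W+ Z+): down, up, down, up.
W_SHAPE = re.compile(r"(D+)(U+)(D+)(U+)")


def find_double_bottoms(prices, max_span=7):
    """Return (first_x, last_z) row-index pairs for W-shaped price runs.

    Assumption: the row index stands in for the SQL `time` column, so
    `max_span` plays the role of `z.time - FIRST(x.time) <= 7`.
    """
    # moves[i] describes row i + 1 relative to row i (like PREV(price)).
    moves = "".join(
        "D" if cur < prev else "U" if cur > prev else "F"
        for prev, cur in zip(prices, prices[1:])
    )
    matches = []
    pos = 0
    while pos < len(moves):
        m = W_SHAPE.match(moves, pos)
        if m:
            first_x = m.start(1) + 1          # first falling row
            first_z = m.start(4) + 1          # first row of the final rise
            # Rising rows only count as Z while they stay within max_span.
            last_z = min(m.end(4), first_x + max_span)
            if last_z >= first_z:
                matches.append((first_x, last_z))
                pos = last_z                  # resume after the match
                continue
        pos += 1                              # no match here, try the next row
    return matches


prices = [10, 9, 8, 9, 10, 9, 8, 9, 10, 11]
print(find_double_bottoms(prices))
```

Even this simplified version needs explicit bookkeeping for row offsets and restart positions, which is the work the declarative PATTERN and DEFINE clauses take over.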
Data Visualization: No Changes to Query Kafka & Oracle
• Current Network Error Stream
• Compare Sales Stream to Benchmark (History)

Securing Access to Data
• Support Source Security Rules
– Use access privileges defined on HDFS sources with multiuser authorization
• Extend Protection with Advanced Oracle Security Policies
– Redaction
– VPD
– Database Vault
– Database Security Assessment Tool

Diagram labels: Single User, Application Users, LDAP / DB Users; Employee Dir, My HR Application, Direct Access; Oracle Big Data SQL; Salary, Emp; Hadoop Cluster
Diagram callouts:
• Support Varied Authentication Methods
• Add Oracle Advanced security options
• Automatically use ACLs on protected files
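
As a rough illustration of what row-level (VPD-style) and column-level (Redaction-style) policies do at query time, the following Python sketch filters rows per user and masks a sensitive column. It is a conceptual model only, not Oracle's implementation or API; `apply_policies`, `same_dept`, `mask`, and the sample data are hypothetical names introduced for this example.

```python
def apply_policies(rows, user, row_filter, redactions):
    """Return the rows a user may see, with sensitive columns masked.

    row_filter(user, row) -> bool models a VPD-style row predicate.
    redactions maps column name -> masking function (Redaction-style).
    """
    visible = [row for row in rows if row_filter(user, row)]
    return [
        {col: (redactions[col](val) if col in redactions else val)
         for col, val in row.items()}
        for row in visible
    ]


# Hypothetical sample standing in for Emp/Salary data on an external source.
employees = [
    {"emp": "alice", "dept": "HR", "salary": 120000},
    {"emp": "bob", "dept": "IT", "salary": 95000},
]


def same_dept(user, row):
    # Row-level rule: users only see rows from their own department.
    return row["dept"] == user["dept"]


def mask(_value):
    # Column-level rule: hide the real value entirely.
    return "*****"


result = apply_policies(
    employees, {"name": "carol", "dept": "HR"}, same_dept, {"salary": mask}
)
print(result)
```

The point of the slide is that such rules are defined once in the database and apply regardless of whether the rows come from Oracle tables or from Hadoop.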

Demonstration
Seamlessly extend your warehouse
1. Review data in warehouse
2. Extend customer attributes with external data
3. Secure it
4. Add new detail facts – customer behavior information
5. Gain insights using advanced Oracle SQL

Big Data SQL 4.0

Query Server: SQL on Hadoop
• An Oracle query engine deployed to a Hadoop Cluster
• Simple, zero maintenance
– Uses Hive metadata and Hadoop authorization
– Oracle data not saved to Query Server
• Included with Big Data SQL license
– Limited use Oracle Database license
• Use in addition to Big Data SQL-enabled Oracle Database

Diagram labels: REST, Python, node.js, SQL, R, Graph, Java; Oracle Database with Big Data SQL; Data Lake with Hive Metadata, Big Data SQL Cells, Big Data SQL Query Server, Kafka (Streaming), HDFS

Query Server: Manage Using Hadoop Cluster Mgmt Tools

Performance Breakthroughs

• Significant performance enhancements with distributed aggregation
– Utilize processing of Hadoop compute nodes for massive query acceleration (sum, min, max, avg, count)
– Single table, multi-table joins
• Optimized C-Drivers for Text, Parquet, Enterprise Parquet and Avro
• Cell-based JSON processing for CLOBs

Aggregation Offload: Major Performance Breakthrough

Single table Count(*)
Elapsed (sec): OFF 34.85, ON 5.38

SELECT COUNT(*)
FROM store_sales

Single table: Add columns + Group By
Elapsed seconds for 1, 2 and 4 SUM columns: OFF between 124.2 and 206.0, ON between 8.9 and 14.3

SELECT ss_store_sk,
       sum(ss_wholesale_cost),
       sum(ss_list_price)
FROM store_sales
GROUP BY ss_store_sk

Multi-table: Join fact to dimension table
Elapsed seconds (x-axis 1, 2, 4): OFF between 151.5 and 256.8, ON between 11.4 and 17.1

SELECT d_dom,
       sum(ss_wholesale_cost),
       sum(ss_list_price)
FROM store_sales, date_dim
WHERE ss_sold_date_sk=d_date_sk
GROUP BY d_dom
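
The idea behind aggregation offload can be sketched in a few lines of Python: each storage node reduces its local rows to one partial result per group, and only those small partials travel to the database, which merges them. This is a conceptual sketch under stated assumptions, not Big Data SQL's actual implementation; `partial_aggregate`, `merge_partials`, and the sample rows are hypothetical.

```python
def partial_aggregate(rows):
    """Runs where the data lives: collapse local (key, value) rows per group."""
    partials = {}
    for key, value in rows:
        if key not in partials:
            partials[key] = {"sum": value, "count": 1, "min": value, "max": value}
        else:
            p = partials[key]
            p["sum"] += value
            p["count"] += 1
            p["min"] = min(p["min"], value)
            p["max"] = max(p["max"], value)
    return partials


def merge_partials(partials_per_node):
    """Runs centrally: combine per-node partials into final aggregates."""
    merged = {}
    for partials in partials_per_node:
        for key, p in partials.items():
            if key not in merged:
                merged[key] = dict(p)
            else:
                m = merged[key]
                m["sum"] += p["sum"]
                m["count"] += p["count"]
                m["min"] = min(m["min"], p["min"])
                m["max"] = max(m["max"], p["max"])
    # avg cannot be merged directly, so derive it from the merged sum and count.
    for m in merged.values():
        m["avg"] = m["sum"] / m["count"]
    return merged


# Hypothetical (ss_store_sk, ss_list_price) rows held on two nodes.
node_a = [(1, 10.0), (2, 5.0), (1, 20.0)]
node_b = [(1, 30.0), (2, 15.0)]
totals = merge_partials([partial_aggregate(node_a), partial_aggregate(node_b)])
print(totals)
```

Five input rows shrink to two partial rows per node here; on real fact tables the reduction in data shipped to the database is what drives the OFF versus ON gap shown in the charts.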
Support Object Store Sources
• Support data captured in object stores
– Oracle Object Storage, Amazon S3, Azure Blob Storage
• Use new ORACLE_BIGDATA driver
– Optimized C-mode driver support
– Support text, parquet, avro, json

Diagram: Oracle Database with Oracle Big Data SQL over object stores (limitless, highly available, economical storage)

Big Data SQL Cluster: Separate Compute from Storage
Post-Big Data SQL 4.0
• Support hybrid deployments
– Data local processing for Hadoop
– Separate compute and storage for other cloud deployments
• Improve performance and decrease costs
– Extends database processing to local data center
– Minimize data movement across cloud

Diagram labels: Oracle Database with Big Data SQL; Hive Metadata and Big Data SQL Cells on HDFS; Big Data SQL Cell Clusters for Oracle Object Store, Azure Blob Storage and Amazon S3

Important Investments on the Road to Autonomous DB
• Performance: Major breakthroughs in distributed database processing
• Data Sources: Added support for Object Stores
• Deployment: Option to separate compute and storage
• Metadata: Query Server automatically synchronizes external metadata
with Oracle Database query engine

These investments are critical enablers for autonomous query processing of external data

Autonomous Database with
Cloud Smart Scan

Autonomous Data Warehouse

• Easy
– Automated management
– Automated tuning: Simply load data and run
• Fast
– Based on Exadata technology
• Elastic
– Instant scaling of compute or storage with no downtime

Autonomous Database with Cloud Smart Scan
• Economically query object store at scale
– File types: text, json, parquet, avro
• Cloud Smart Scan serverless compute
– Always available. Zero setup/configuration
– Resources automatically allocated based on query requirements to meet SLA
– Scans, filters and aggregates data
• Pay per query based on usage

Diagram: SQL to Autonomous Database; Cloud Smart Scan compute pool scans, filters and aggregates data from Oracle Object Storage, Amazon S3 and Azure Blob Storage

Metadata Catalog
• Catalog scans and inventories data from across all locations
– Crawl sources and identify schemas
– Understand business definition, lineage and more
– Identify and tag sensitive data
• Metadata immediately available for query
– No need to define tables and parsing rules

Diagram labels: Autonomous Database, Metadata Catalog
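
The "crawl sources and identify schemas" step can be pictured with a small Python sketch that derives column names and observed types from sample JSON records, so that no table definition has to be written by hand. This is only an illustration of the concept; `infer_schema` and the sample records are hypothetical and say nothing about how the Metadata Catalog is actually built.

```python
import json


def infer_schema(json_lines):
    """Map each field name to the set of Python type names seen for it."""
    schema = {}
    for line in json_lines:
        record = json.loads(line)
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Hypothetical click records as they might sit in an object store file.
sample = [
    '{"custid": 1, "movie": "Alien", "rating": 4.5}',
    '{"custid": 2, "movie": "Heat", "rating": null}',
]
schema = infer_schema(sample)
print(schema)
```

A field that shows more than one type (here `rating`, which is sometimes missing a value) is the kind of detail a crawler has to surface before the data can be queried reliably.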

Oracle Data Management Vision

PROCESSING: Spark, Oracle SQL, Machine Learning, Graph, Spatial, Search
METADATA CATALOG
STREAMING: Event Hub
PERSISTENCE
– STORAGE: Object Store, HDFS, Block
– DATABASE: Autonomous DW, Autonomous TP, NoSQL, MySQL
INTEGRATION: Data Integration Platform

Day-by-Day Guide to OOW

https://cloudcustomerconnect.oracle.com/posts/9aaeb6c91a
http://cloudcustomerconnect.oracle.com
