Information Management
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM'S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
- CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
- ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
Agenda
New DB2 10.1 features
- Star schema query optimization
  - Zig-zag join
- Multi-core query parallelism
  - Intra-partition query parallelism
  - Existing functionality, significantly improved
- Provides improved performance for star schema queries
  - Star schemas are typically found in data marts or some data warehouses
- Introduces a new star schema join method (zig-zag join)
  - Complementary to existing star schema join methods
- Improves existing star schema detection algorithms
  - Supports a wider range of queries
Star Schemas
Fact table:
  Daily Sales: perkey, prodkey, storekey, promokey, custkey, quantity_sold, price, cost
Dimension tables:
  Customer:  custkey, name, address
  Product:   prodkey, category, upc_number
  Period:    perkey, year, month
  Store:     storekey, storenumber, region
  Promotion: promokey, promotype, promodesc
Star joins
Queries performed against star schemas
  SELECT ITEM_DESC, SUM(QUANTITY_SOLD), AVG(PRICE), AVG(COST)
  FROM PERIOD, DAILY_SALES, PRODUCT, STORE
  WHERE PERIOD.PERKEY = DAILY_SALES.PERKEY
    AND PRODUCT.PRODKEY = DAILY_SALES.PRODKEY
    AND STORE.STOREKEY = DAILY_SALES.STOREKEY
    AND CALENDAR_DATE BETWEEN '01/01/2005' AND '04/28/2005'
    AND STORE_NUMBER = '03'
    AND CATEGORY = 72
  GROUP BY ITEM_DESC
- Aggregate on dimension attribute, sum on fact measures
- Join fact to some subset of the dimensions
- Join fact foreign keys to dimension primary keys
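The shape of such a star join can be tried out on toy data. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for DB2; the rows are hypothetical, the tables carry only the columns the query touches, and dates are stored in ISO format because SQLite compares them as text.

```python
import sqlite3

# In-memory stand-in for the star schema; all rows are made-up illustration data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE period  (perkey INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE product (prodkey INTEGER PRIMARY KEY, category INTEGER, item_desc TEXT);
CREATE TABLE store   (storekey INTEGER PRIMARY KEY, store_number TEXT);
CREATE TABLE daily_sales (perkey INTEGER, prodkey INTEGER, storekey INTEGER,
                          quantity_sold INTEGER, price REAL, cost REAL);
INSERT INTO period  VALUES (1, '2005-01-15'), (2, '2005-06-01');
INSERT INTO product VALUES (10, 72, 'widget'), (20, 99, 'gadget');
INSERT INTO store   VALUES (30, '03'), (40, '07');
INSERT INTO daily_sales VALUES
  (1, 10, 30, 5, 2.0, 1.0),
  (1, 10, 30, 3, 4.0, 2.0),
  (2, 10, 30, 9, 2.0, 1.0),
  (1, 20, 30, 7, 9.0, 8.0),
  (1, 10, 40, 4, 2.0, 1.0);
""")

# Fact foreign keys join to dimension primary keys; local predicates sit on the dimensions.
rows = conn.execute("""
SELECT item_desc, SUM(quantity_sold), AVG(price), AVG(cost)
FROM period, daily_sales, product, store
WHERE period.perkey = daily_sales.perkey
  AND product.prodkey = daily_sales.prodkey
  AND store.storekey = daily_sales.storekey
  AND calendar_date BETWEEN '2005-01-01' AND '2005-04-28'
  AND store_number = '03'
  AND category = 72
GROUP BY item_desc
""").fetchall()

print(rows)  # [('widget', 8, 3.0, 1.5)]
```

Only the two fact rows that satisfy all three dimension predicates contribute to the aggregate; the other three are each eliminated by exactly one dimension.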
[Figure: local predicates on the dimensions joined to the fact table; figure label: 30M]
Period:  CALENDAR_DATE BETWEEN '01/01/2005' AND '04/28/2005'
Product: CATEGORY=72
Store:   STORE_NUMBER='03'
[Figure: three semi-join NLJOINs, one per dimension: Product with Daily Sales, Period with Daily Sales, Store with Daily Sales]
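[Figure: Cartesian join of the dimensions. NLJOIN of Product (PRODKEY 10, 20) with Store (STOREKEY 30, 40), followed by NLJOIN with Period. Intermediate rows shown: STOREKEY 30 30 40 40, PERIODKEY 50 50 50 50]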
NLJOIN (Cartesian product of dimension keys, joined to Daily Sales)
PRODKEY  STOREKEY  PERIODKEY
10       30        50
20       30        50
10       40        50
20       40        50
10       30        60
20       30        60
10       40        60
20       40        60
10       30        70
20       30        70
10       40        70
20       40        70
DB2 10.1
Necessary to form a star. | Necessary to form a star.
Necessary to form a star. | Necessary to form a star.
Used by the Zigzag join plan, if available. | Used by the Cartesian Hub plan, if available.
One | Unlimited
Star cannot be formed in the query block in the presence of this SQL feature.
Star can be formed in the query block in the presence of these features and may include the feature in the star.
9 10
The new zigzag join method for star schema based queries
How does it work?
- First forms the virtual Cartesian product of dimensions.
- Avoids most non-productive probes from the Cartesian product into the fact table.
- Fact table index provides feedback to dimensions.
- Zigzags through the dimensions and the fact table.
Prerequisite: a multi-column index on the fact table on the columns that join with the dimensions.
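The feedback idea can be sketched in a few lines of Python. This is an illustration only, not DB2's implementation: the function name `zigzag_join` and the data are hypothetical, the multi-column fact index is modelled as a sorted list of key tuples, and the Cartesian product is materialized for brevity even though the real method keeps it virtual.

```python
from bisect import bisect_left
from itertools import product

def zigzag_join(dims, fact_index):
    """dims: one sorted list of qualifying keys per dimension.
    fact_index: sorted list of key tuples, standing in for the multi-column fact index."""
    combos = list(product(*dims))  # lexicographic order, because each dimension is sorted
    matches, probes, i = [], 0, 0
    while i < len(combos):
        probes += 1
        pos = bisect_left(fact_index, combos[i])  # probe the fact index
        if pos == len(fact_index):
            break  # no fact entries at or beyond this probe: nothing left to find
        if fact_index[pos] == combos[i]:
            # Productive probe: collect every fact entry with this key combination.
            while pos < len(fact_index) and fact_index[pos] == combos[i]:
                matches.append(fact_index[pos])
                pos += 1
            i += 1
        else:
            # Feedback: the index reports the next key it holds, so all
            # dimension combinations below that key are skipped unprobed.
            i = bisect_left(combos, fact_index[pos], i + 1)
    return matches, probes

dims = [[10, 20, 30], [1, 2, 3, 4]]        # 12 combinations in the Cartesian product
fact_index = [(10, 3), (30, 2), (30, 2)]   # hypothetical fact rows
matches, probes = zigzag_join(dims, fact_index)
print(matches, probes)  # [(10, 3), (30, 2), (30, 2)] 5
```

A plain Cartesian hub join would probe the fact index once for each of the 12 combinations; with feedback, this example needs 5 probes.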
Fact
B,C
D2 (B,C)
(A,B), (C,D)
(B,A,C)
NO: the columns B and C in the composite index are not in contiguous positions in the index.
Fact
D2 (B)
- Gap processing is implemented using new jump scan technology
- Explain facility indicates when gap processing is used
  - New JUMPSCAN argument on IXSCAN operator
  - Gap columns identified

  Arguments:
  ---------
  JUMPSCAN: (JumpScan Plan)
          TRUE

  Gap Info:
  ---------------------
  Index Column 0:
  Index Column 1:
  Index Column 2:
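A rough sketch of what a jump scan does, under stated assumptions: the index is modelled as a sorted Python list of (A, B, C) entries, predicates exist on A and C only, and B is the gap column. The function name `jump_scan` and the data are hypothetical; this illustrates the idea, not DB2's internal algorithm.

```python
from bisect import bisect_left
import math

def jump_scan(index, a, c):
    """Evaluate A = a AND C = c on a sorted (A, B, C) index where B has no predicate."""
    hits = []
    pos = bisect_left(index, (a,))  # first entry with A >= a
    while pos < len(index) and index[pos][0] == a:
        b = index[pos][1]  # next distinct value of the gap column
        # Seek straight to (a, b, c) instead of reading every entry for this b.
        pos = bisect_left(index, (a, b, c), pos)
        while pos < len(index) and index[pos] == (a, b, c):
            hits.append(index[pos])
            pos += 1
        # Jump past the remaining entries for this b to the next gap value.
        pos = bisect_left(index, (a, b, math.inf), pos)
    return hits

index = sorted([(1, 1, 5), (1, 1, 7), (1, 2, 5), (1, 2, 9), (1, 3, 7), (2, 1, 7)])
hits = jump_scan(index, a=1, c=7)
print(hits)  # [(1, 1, 7), (1, 3, 7)]
```

The scan performs a small number of seeks per distinct gap value rather than filtering every index entry under A = 1, which is where the benefit comes from when the gap column has few distinct values.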
FETCH
RIDSCAN
SORT
  Performs back-join to get dimension table columns required for subsequent operations if fact table access is all-probes list-prefetch.
ZZJOIN(1)
  Performs the zigzag join operation:
  1) Last leg is the fact table
  2) Preceding legs are dimensions
TBSCAN / TEMP (one pair per dimension leg)
  Snowflake plans could be either:
  1) Access of a single table, or
  2) Joins of multiple tables
Fact table access could be one of the following:
  1) Index scan
  2) Single-probe list-prefetch
  3) All-probes list-prefetch
A dimension leg must have TBSCAN-TEMP on top of the base dimension access plan.
ZZJOIN(1)
  TBSCAN
    TEMP
  TBSCAN
    TEMP
TEMP: RANDOM_ACCESS (random access on the temp table is available using the Fast Integer Sort method or Index over Temp).
To simplify the query plans in the following discussion, please assume the TBSCAN-TEMP operators exist on top of the base dimension access plan.
© 2012 IBM Corporation
IDXOVTMP: (A temporary index will be created and used on this temp)
  TRUE  - the scan builds an index over the temporary table for random access.
  FALSE - the scan builds a fast integer sort structure for random access.
The feedback predicates applicable to that dimension are displayed in the form of start-stop key conditions.
- Index scan and data page fetch
- Single-probe list-prefetch
- All-probes list-prefetch
ZZJOIN
  Any access on D1
  Any access on D2
  FETCH
    IXSCAN
    FACT
ZZJOIN
  Any access on D1
  Any access on D2
  FETCH
    RIDSCAN
      SORT
        IXSCAN
    FACT
FETCH
  RIDSCN
    SORT
      ZZJOIN(1)
        Any access on D1
        Any access on D2
        IXSCAN on FACT
- Also known as intra-partition parallelism
- Supported in DB2 since V5
- Query parallelism within a database partition
- Parallelism achieved without the use of the database partitioning feature
  - Does not require any form of data partitioning
- Exploits symmetric multi-processor and/or multi-core processors
- DB2 10.1:
  - Extends the existing implementation
  - Removes scalability bottlenecks
Combination of data and functional parallelism
Data parallelism
- Dynamically partition data
- Assign partition to query task
- Easier to load balance
- User not required to partition data (e.g. range, hash, etc.)
- Data dynamically assigned to query tasks
  - Assign range of pages or rows (range is a fixed size prior to DB2 10.1)
  - Assign new range when range is consumed
  - Provides dynamic load balancing
- Supports table and index scans
[Figure: page ranges handed out to subagents as each one finishes its previous range]
Subagent 2: Pages 2-3
Subagent 3: Pages 4-5
Subagent 4: Pages 6-7
Subagent 3: Pages 8-9
Subagent 2: etc...
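The dynamic assignment of page ranges can be sketched with Python threads pulling from a shared cursor. This is a simplified model, not DB2 code: `straw_scan`, `range_size` and the page counts are hypothetical names and values, and the fixed range size mirrors the pre-10.1 behaviour mentioned on the slide.

```python
import threading

def straw_scan(num_pages, num_subagents, range_size=2):
    """Each subagent repeatedly claims the next unassigned page range until none remain."""
    next_page = 0
    lock = threading.Lock()
    assignments = {sid: [] for sid in range(1, num_subagents + 1)}

    def subagent(sid):
        nonlocal next_page
        while True:
            with lock:  # claiming a range is the only serialized step
                if next_page >= num_pages:
                    return
                start = next_page
                next_page = min(start + range_size, num_pages)
                end = next_page - 1
            # A real scan would read pages start..end here, outside the lock.
            assignments[sid].append((start, end))

    threads = [threading.Thread(target=subagent, args=(sid,)) for sid in assignments]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return assignments

assignments = straw_scan(num_pages=10, num_subagents=4)
for sid, ranges in assignments.items():
    print(f"Subagent {sid}: {ranges}")
```

Which subagent receives which range varies from run to run, which is the point: a subagent that finishes early simply claims more work, so no up-front partitioning of the data is needed.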
Functional parallelism
- Divide query task by function
- Assign functional task to different execution units
- Doesn't require data partitioning
- Harder to load balance
  - Must ensure execution units are equally busy
- DB2 implementation
  - Single co-ordinator process services application requests
  - Multiple sub-agent processes return data through local table queue
  - Only 1 parallelized functional unit (section)
Functional parallelism
- Query contains only 2 subsections and 1 local table queue
- Runtime operators coordinated using latches, semaphores, shared memory control blocks
Co-ordinator:

      RETURN
      (   9)
        |
       LTQ
      (   8)

Subagents 1 to 4 (each runs the same subsection):

         LTQ
        (   8)
          |
        MSJOIN
        (   7)
      /---+---\
  TBSCAN     TBSCAN
  (   3)     (   6)
    |          |
   SORT       SORT
  (   2)     (   5)
    |          |
  TBSCAN     TBSCAN
  (   1)     (   4)
    |          |
  PRODUCT   PRODATR
- Results returned via shared memory table queue to co-ordinator agent
- Join processed in parallel by each agent by joining corresponding partitions
- Each agent scans a sort partition
- Hash partitioned sorts on prod_id, one partition per agent
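The plan above can be mimicked in a short Python sketch: both inputs are hash-partitioned on prod_id and sorted, each worker merge-joins one pair of corresponding partitions, and the results are concatenated in place of the local table queue. The helper names, the `name` and `attr` columns, and the use of a thread pool are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

def partition_and_sort(rows, key, n):
    """Hash-partition rows on `key` into n partitions, then sort each partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return [sorted(p, key=lambda r: r[key]) for p in parts]

def merge_join(left, right, key):
    """Merge join of two inputs that are already sorted on `key`."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            k, j_end = left[i][key], j
            while j_end < len(right) and right[j_end][key] == k:
                j_end += 1
            while i < len(left) and left[i][key] == k:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out

def parallel_join(product, prodatr, degree=4, key="prod_id"):
    lparts = partition_and_sort(product, key, degree)
    rparts = partition_and_sort(prodatr, key, degree)
    # Equal keys hash to the same partition number, so partition p of one
    # input only ever needs to be joined with partition p of the other.
    with ThreadPoolExecutor(max_workers=degree) as pool:
        results = pool.map(merge_join, lparts, rparts, [key] * degree)
        return [row for part in results for row in part]

product = [{"prod_id": i, "name": f"p{i}"} for i in range(6)]
prodatr = [{"prod_id": i % 3, "attr": i} for i in range(6)]
joined = parallel_join(product, prodatr)
print(len(joined))  # 6
```

Because the partitioning is by hash of the join key, no subagent needs rows held by another, which is what lets the join run without further coordination until the results are handed back.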
Improved scalability
- Load imbalance results in poor scalability
- REBAL redistributes rows to ensure all subagents do equal work
- Optimizer performs load balance analysis to determine REBAL placement
                 6.77122e+06
                   NLJOIN
                   (   6)
                   713706
                     63
          /---------+----------\
       292.2                 23173.3
       REBAL                  FETCH
       (   7)                 (   9)
       325.265               2456.85
         11                     2
         |                 /---+----\
       292.2          23173.3     6.77122e+07
       TBSCAN          IXSCAN   TABLE: DB2USER
       (   8)          (  10)     DAILY_SALES
       325.265        1605.23         Q1
         11              1
         |               |
        2922        6.77122e+07
   TABLE: DB2USER  INDEX: SYSIBM
       PERIOD    SQL091218161022180
         Q2              Q1
Improved scalability
- More efficient parallelization techniques
  - Partial-final UNIQUE
  - GRPBY on unique key
    - Can perform complete GRPBY without a partitioned SORT
- Improved access plan parallelization transformation costing
- Improved exploitation of stream partitioning
  - Avoid partitioned SORT
- Reduce latch contention
  - Dynamic straw scan unit (straw gulp size)
  - Improved NLJOIN inner access
  - Improved HSJOIN
  - Improved partitioned SORT
  - Prefetcher queues
  - Various others
DB2 10.1:
- Use Workload Manager (WLM) to toggle INTRA_PARALLEL and maximum DEGREE for a workload
- Improved automatic degree determination (degree=ANY)
  - Avoid parallelizing queries that won't benefit
- Improved automatic runtime degree reduction
A BI workload using parallelism
- MAXIMUM DEGREE >1 implies INTRA_PARALLEL=YES
- Also specifies the degree upper limit
- The application specifies the requested degree using existing external controls

  CREATE WORKLOAD report_wl APPLNAME('cognos') MAXIMUM DEGREE 8;
  ALTER WORKLOAD report_wl MAXIMUM DEGREE 4;

Application control:

  CALL SYSPROC.ADMIN_SET_INTRA_PARALLEL('YES')

- Toggles intra-partition parallelism at transaction boundaries
- Must not have open cursors across transaction boundaries (e.g. WITH HOLD cursors)
INTRA_PARALLEL: NO. Level: Instance.
MAX_QUERYDEGREE: ANY. Level: Instance.
DFT_DEGREE: ANY, 1~32,767. Level: Database. DB configuration; initial value for the CURRENT DEGREE special register or the package bind DEGREE option.
CURRENT DEGREE: ANY, 1~32,767 (DFT_DEGREE). Level: Application. Special register; the degree of parallelism considered by the SQL compiler for dynamic SQL access plans.
Bind DEGREE: ANY, 1~32,767 (DFT_DEGREE). Level: Package. DB2 bind option; the degree of parallelism considered by the SQL compiler for static SQL access plans.
CLP command: 1~32,767. Level: Application. The degree of parallelism allowed at runtime for any access plans (dynamic or static SQL).
Appendix
Additional material
- Typically contains far fewer rows than the fact table
- May be represented as a hierarchy of tables or a snowflake
  - e.g. product is further normalized to product, brand and category, but this requires extra joins
[Figure: Brand, Product, Category]
Star schemas
Fact table
- Contains numeric measures of business information
  - Queries perform computation (sum, avg, etc.) on measures
- Contains primary key columns from each dimension
  - Represent foreign keys referencing each parent dimension
  - Can have explicit referential integrity, but not necessary for DB2
- May have a primary key
- hourly, daily, etc.
  - finer granularity -> more rows
  - coarser granularity -> limits drill down ability
- Typically, local predicates aren't applied directly
Star schemas
Data Marts
- Can contain multiple fact tables
  - Each fact usually denotes a separate star
- Dimensions can be shared across stars
  - e.g. Daily_Sales and Daily_Forecast facts can share the Store and Product dimensions
- Queries may join multiple fact tables
ZZJOIN(1) operator
- An n-ary join method that joins together the dimension tables/snowflakes and the fact table.
  - Drives the process of forming probe rows from dimension tables/snowflakes
  - Probes the fact table to find matching fact table rows
  - Uses the feedback from the fact table to advance to the next rows on the temporary table over the dimension tables/snowflakes
- Feedback predicates identified in explain information
  - New EXPLAIN_PREDICATE.HOW_APPLIED value: FEEDBACK
  - Displayed in the ZZJOIN operator details in db2exfmt
  Predicates:
  ----------
  2) Feedback Predicate used in Join,
     Comparison Operator:
     Subquery Input Required:
     Filter Factor:

     Predicate Text:
     --------------
     (Q3.D2FK = Q1.D2PK)

  3) Feedback Predicate used in Join,
     Comparison Operator:
     Subquery Input Required:
     Filter Factor:

     Predicate Text:
     --------------
     (Q3.D1FK = Q2.D1PK)

  Values shown: Equal (=), No, 0.25