DB2 LUW 10 Star Schema and MCP

Deep Dive into DB2 10 Query Performance Optimization: Star Schemas and Multi-core Query Parallelism
John Hornibrook IBM Canada
Information Management
2012 IBM Corporation
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM'S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
  CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
  ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
Agenda
New DB2 10.1 features
  Star schema query optimization
    Zig-zag join
  Multi-core query parallelism
    Intra-partition query parallelism
    Existing functionality, significantly improved
Star Schema Query Optimization
Provides improved performance for star schema queries
  Star schemas are typically found in data marts or some data warehouses
Introduces a new star schema join method (zig-zag join)
  Complementary to existing star schema join methods
Improves existing star schema detection algorithms
  Supports a wider range of queries
Star Schemas
Logical DB design resembles a star
Central table contains business facts
  Sales prices, cost, quantities, etc.
Surrounding tables contain dimensional data
  Time, location, characteristics, etc.
Each dimension is a parent of the fact table
  1:N from a dimension to the fact

Schema diagram (tables and columns):
  Daily Sales (fact): perkey, prodkey, storekey, promokey, custkey, quantity_sold, price, cost
  Customer: custkey, name, address
  Product: prodkey, category, upc_number
  Period: perkey, year, month
  Store: storekey, storenumber, region
  Promotion: promokey, promotype, promodesc
Star joins
Queries performed against star schemas
SELECT ITEM_DESC, SUM(QUANTITY_SOLD), AVG(PRICE), AVG(COST)
FROM PERIOD, DAILY_SALES, PRODUCT, STORE
WHERE PERIOD.PERKEY = DAILY_SALES.PERKEY
  AND PRODUCT.PRODKEY = DAILY_SALES.PRODKEY
  AND STORE.STOREKEY = DAILY_SALES.STOREKEY
  AND CALENDAR_DATE BETWEEN '01/01/2005' AND '04/28/2005'
  AND STORE_NUMBER = '03'
  AND CATEGORY = 72
GROUP BY ITEM_DESC

Aggregate on dimension attribute, sum on fact measures
Join fact to some subset of the dimensions
Join fact foreign keys to dimension primary keys
Constrain on dimension attributes
Star join dilemma

No single dimension may filter the fact table well
But a combination of dimensions may filter well
  840,000 category 72 products sold during Jan. to April 2005
  53,000 category 72 products sold during Jan. to April 2005 in store #3
How do we filter with a combination of dimensions?

Diagram: Daily Sales (750M rows) joined to three filtering dimensions; the join arrows are labeled 50M, 30M and 20M rows.
  Period: CALENDAR_DATE BETWEEN '01/01/2005' AND '04/28/2005'
  Product: CATEGORY=72
  Store: STORE_NUMBER='03'
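To make the dilemma concrete, the row counts quoted on this slide can be turned into selectivities. The following is only a worked-arithmetic sketch: the constant and function names are invented here, and the three single-dimension counts are taken from the diagram labels without assuming which dimension each belongs to.

```python
# Row counts quoted on the slide (fact table and filtered results).
FACT_ROWS = 750_000_000
SINGLE_DIMENSION_COUNTS = [50_000_000, 30_000_000, 20_000_000]  # one filter at a time
TWO_DIMENSIONS = 840_000     # category 72, Jan. to April 2005
THREE_DIMENSIONS = 53_000    # category 72, Jan. to April 2005, store #3


def selectivity(qualifying_rows: int, total_rows: int = FACT_ROWS) -> float:
    """Fraction of fact rows that survive a filter."""
    return qualifying_rows / total_rows


for rows in SINGLE_DIMENSION_COUNTS:
    print(f"one dimension:    {rows:>11,} rows  {selectivity(rows):.4%}")
print(f"two dimensions:   {TWO_DIMENSIONS:>11,} rows  {selectivity(TWO_DIMENSIONS):.4%}")
print(f"three dimensions: {THREE_DIMENSIONS:>11,} rows  {selectivity(THREE_DIMENSIONS):.4%}")
```

Each single dimension keeps a few percent of the fact table, while the three filters together keep well under a hundredth of a percent, which is why the join method needs to apply the dimensions in combination.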
Star join solutions

Specialized join methods (pre-DB2 10.1):
  Semi-join with index ANDing
    Use combinations of fact table indexes to avoid accessing data pages
  Hub join
    Use Cartesian product of dimension rows to provide better fact keys
Query must meet star join criteria
Both methods can be built and competed
Regular join plans are still built and competed with either method
  Costing decides
  Specialized methods aren't always the winners
Semi-join index ANDing star join

Produce filtered fact table (Daily_Sales) with foreign key indices
Execute "semi-join" with each dimension that filters the fact table
"AND" RID-maps from each semi-join with next semi-join
Retrieve fact table columns via RIDs

Plan diagram: FETCH on Daily Sales, driven by the ANDed RID bitmap of three semi-join NLJOINs. Each NLJOIN joins one dimension (Product, Period, Store) to a Daily Sales foreign key index.

RID bitmap (each semi-join eliminates bits):
  101101101001111011011001
  100100101001001010011001
  000100001000000010001000
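The bitmap steps above can be sketched in a few lines of Python. This is an illustrative model only, not DB2's implementation: the toy `DAILY_SALES` rows, the key values and the function names are all invented, and a list position stands in for a RID.

```python
# Toy fact table: position in the list plays the role of the RID.
# Columns: (perkey, prodkey, storekey, quantity_sold)
DAILY_SALES = [
    (1, 10, 30, 5),
    (1, 20, 30, 2),
    (2, 10, 40, 7),
    (1, 10, 30, 1),
    (3, 10, 30, 9),
    (1, 10, 40, 4),
]
PERKEY, PRODKEY, STOREKEY = 0, 1, 2


def semi_join_bitmap(fact, column, qualifying_keys):
    """Return an int bitmap with bit i set when fact row i matches a qualifying key.

    A real plan would read the foreign key index instead of scanning the rows.
    """
    bitmap = 0
    for rid, row in enumerate(fact):
        if row[column] in qualifying_keys:
            bitmap |= 1 << rid
    return bitmap


def index_and_star_join(fact, filters):
    """AND the per-dimension bitmaps, then fetch only the surviving RIDs."""
    combined = (1 << len(fact)) - 1  # start with every RID set
    for column, keys in filters:
        combined &= semi_join_bitmap(fact, column, keys)
    return [fact[rid] for rid in range(len(fact)) if (combined >> rid) & 1]


# Keys that survive each dimension's local predicate (hypothetical values).
rows = index_and_star_join(
    DAILY_SALES,
    [(PERKEY, {1}), (PRODKEY, {10}), (STOREKEY, {30})],
)
print(rows)
```

Only rows whose bit survives every AND are fetched, so the data pages of the fact table are touched once, at the end.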
Hub star join

Form a Cartesian join of filtering dimensions
  Cartesian join -> no join predicates
  Cartesian join result should be small to be effective
Join the Cartesian join result to the fact table using a multi-column fact table index

Plan diagram (bottom to top):
  Product supplies PRODKEY 10, 20; Store supplies STOREKEY 30, 40; Period supplies PERIODKEY 50
  NLJOIN of Product and Store:
    PRODKEY   10 20 10 20
    STOREKEY  30 30 40 40
  NLJOIN of that result with Period:
    PRODKEY   10 20 10 20
    STOREKEY  30 30 40 40
    PERIODKEY 50 50 50 50
  NLJOIN with Daily Sales: probe fact table with multi-column index on PRODKEY, STOREKEY, PERIODKEY
Hub star join

Works well if Cartesian result is small
Cartesian may contain many key combinations that don't exist in the fact table
  Results in unnecessary fact table index probes

Example Cartesian result probed into Daily Sales by the NLJOIN (one column per probe):
  PRODKEY   10 20 10 20 10 20 10 20 10 20 10 20
  STOREKEY  30 30 40 40 30 30 40 40 30 30 40 40
  PERIODKEY 50 50 50 50 60 60 60 60 70 70 70 70
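The cost of unproductive probes can be illustrated with a small simulation. This is a sketch under stated assumptions: the dimension keys are the ones shown on the slide, but the `FACT_INDEX` contents and the `hub_join` helper are invented, and a dictionary stands in for the multi-column fact table index.

```python
from itertools import product

# Qualifying dimension keys, taken from the slide's example.
PRODKEYS = [10, 20]
STOREKEYS = [30, 40]
PERIODKEYS = [50, 60, 70]

# Stand-in for the multi-column fact index on (PRODKEY, STOREKEY, PERIODKEY):
# key -> list of fact rows. The contents are invented for illustration.
FACT_INDEX = {
    (10, 30, 50): ["sale-a", "sale-b"],
    (20, 40, 70): ["sale-c"],
}


def hub_join(dimension_key_lists, fact_index):
    """Probe the fact index once per Cartesian combination of dimension keys."""
    matches, probes, wasted = [], 0, 0
    for key in product(*dimension_key_lists):
        probes += 1
        rows = fact_index.get(key)
        if rows is None:
            wasted += 1  # combination does not exist in the fact table
        else:
            matches.extend((key, row) for row in rows)
    return matches, probes, wasted


matches, probes, wasted = hub_join([PRODKEYS, STOREKEYS, PERIODKEYS], FACT_INDEX)
print(f"{probes} probes, {wasted} found nothing, {len(matches)} fact rows returned")
```

With sparse fact data, most of the Cartesian combinations probe the index and find nothing, which is the weakness the zigzag join is designed to address.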
DB2 10.1 Star Schema Highlights

Introduces a new zigzag join method that builds upon the zigzag join technology available in Red Brick, which has a proven, unique performance advantage in the industry.
Provides consistent performance for warehouse queries.
Adds a new star detection method that is more reliable.
Supports star schema queries in single and multiple subject areas with snowflakes.
Exploits indexes even when there is a gap in the probing key, reducing the number of indexes that need to be created.
Works seamlessly for range partitioned tables and in serial, SMP and DPF environments.
Can use MDC block indexes on the fact table for enabling zigzag join.
Recommends multi-column indexes to enable zigzag join through explain diagnostics and the index advisor in Optim Query Tuner (OQT).
Enhancing the star detection in DB2 pre-10.1

DB2 (pre-10.1) recognizes a star
  By analysis of sizes of tables and join predicates
  A star is detected after application of local filtering and snowflake joins
The new star detection in DB2 10.1:
  Only requirement: joining dimension column(s) must be unique
  Detects multiple stars per query block
  Allows a star to be detected with fewer restrictions
  Much more reliable
The new star detection method also enables pre-DB2 10.1 star schema plans
Pre-DB2 10.1 detection is invoked if the new star detection fails to detect any star
Comparison of old and new star detection methods:

No. | Requirement/Restriction | Before DB2 10.1 | DB2 10.1
1 | Minimum of three base tables | Necessary to form a star. | Necessary to form a star.
2 | Minimum of two equijoin predicates | Necessary to form a star. | Necessary to form a star.
3 | Multi-column index on fact table | Used by the Cartesian Hub plan, if available. | Used by the Zigzag join plan, if available.
4 | Number of fact tables allowed | One | Unlimited
5 | Non-deterministic or side-effect predicates | Star can not be formed in the query block in the presence of this SQL feature. | Star can be formed in the query block in the presence of these features and may include the feature in the star.
6 | Non-equijoin predicates | As row 5. | As row 5.
7 | Sub-query predicates | As row 5. | As row 5.
8 | Correlation among tables in a snowflake | As row 5. | As row 5.
9 | Simple XML predicates | Excluded from the star. | Can be included in the star.
10 | Derived (non-base) tables | Excluded from the star. | Can be included in the star.
The new zigzag join method for star schema based queries
How does it work?
  First forms the virtual Cartesian product of dimensions.
  Avoids most non-productive probes from the Cartesian product into the fact table.
  Fact table index provides feedback to dimensions.
  Zigzags through the dimensions and the fact table.
Prerequisite: A multi-column index on the fact table on columns that join with the dimensions.
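The feedback idea can be sketched for two dimensions as follows. This is a simplified model rather than the ZZJOIN operator itself: sorted Python lists stand in for the dimension temps and the multi-column fact index, and every name and key value is invented for illustration.

```python
from bisect import bisect_left


def zigzag_join(d1_keys, d2_keys, fact_index):
    """Join two sorted dimension key lists to a sorted fact index on (k1, k2).

    Returns (matching fact keys, number of fact index probes).
    """
    matches, probes = [], 0
    i = j = 0
    while i < len(d1_keys):
        if j >= len(d2_keys):            # exhausted D2 for this D1 key
            i, j = i + 1, 0
            continue
        probe = (d1_keys[i], d2_keys[j])
        pos = bisect_left(fact_index, probe)
        probes += 1
        if pos == len(fact_index):
            break                        # no fact key >= probe remains
        feedback = fact_index[pos]       # smallest fact key >= probe
        if feedback == probe:
            while pos < len(fact_index) and fact_index[pos] == probe:
                matches.append(fact_index[pos])
                pos += 1
            j += 1
            continue
        # Zig back to the dimensions: skip every combination below the feedback key.
        i = bisect_left(d1_keys, feedback[0])
        if i < len(d1_keys) and d1_keys[i] == feedback[0]:
            j = bisect_left(d2_keys, feedback[1])
        else:
            j = 0
    return matches, probes


D1 = [10, 20, 30, 40]
D2 = [1, 2, 3, 4, 5]
FACT = [(10, 5), (25, 1), (30, 2), (30, 2), (40, 4)]  # sorted index entries

matches, probes = zigzag_join(D1, D2, FACT)
print(matches, probes, "probes instead of", len(D1) * len(D2))
```

When a probe misses, the index entry it landed on tells the dimensions where the next possible match is, so whole runs of the virtual Cartesian product are skipped without being probed.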
Using a multi-column index in a zigzag join

Prerequisite
  Columns that participate in the join are included in the index
  Index columns from at least two dimension tables are completely covered by join predicates
Consider this star schema based query:
  D1 has primary key A
  D2 has a composite primary key (B,C)
  D3 has primary key D
  These PK columns are used in equi-join operations with the fact table

Diagram: Fact joins D1 on A, D2 on (B,C) and D3 on D.

Fact table index definition | Qualified? | Why?
(A,D), (A,B,C), (B,C,D), (C,B,D) | YES | The index completely covers two dimensions.
(A,B,C,D), (A,C,B,D) | YES | The index completely covers three dimensions.
(A,B), (C,D) | NO | The index does not completely cover the dimension D2.
(B,A,C) | NO | The columns B and C in the composite index are not in contiguous positions in the index.
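The qualification table can be reproduced with a small checker. The rule coded here is a simplification inferred from the examples on this slide (a dimension counts as covered when all of its join columns appear in contiguous index positions, and an index qualifies when at least two dimensions are covered); it is not the optimizer's actual logic, and the function names are invented.

```python
# Dimension name -> fact table columns that join to that dimension's primary key.
DIMENSION_JOIN_COLUMNS = {
    "D1": {"A"},
    "D2": {"B", "C"},
    "D3": {"D"},
}


def covered_dimensions(index_columns, dimensions=DIMENSION_JOIN_COLUMNS):
    """Return the dimensions whose join columns all appear, side by side, in the index."""
    index_columns = list(index_columns)
    covered = []
    for name, join_columns in dimensions.items():
        if not join_columns.issubset(index_columns):
            continue  # index does not completely cover this dimension
        positions = sorted(index_columns.index(col) for col in join_columns)
        if positions[-1] - positions[0] == len(positions) - 1:  # contiguous
            covered.append(name)
    return covered


def qualifies_for_zigzag(index_columns):
    """Simplified rule: at least two dimensions are completely covered."""
    return len(covered_dimensions(index_columns)) >= 2


for index in ["AD", "ABC", "BCD", "CBD", "ABCD", "ACBD", "AB", "CD", "BAC"]:
    print(tuple(index), "YES" if qualifies_for_zigzag(index) else "NO")
```

Each index is written as a string of single-letter column names purely for brevity; the checker accepts any sequence of column names.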
Zigzag join with index key gap processing

Gap processing allows a single multi-column index to be used for a bigger set of queries
  Greatly reduces the number of fact table indexes
  E.g., a fact table index on (A, C, B) allows zigzag join when there is no join on C

Diagram: Fact joins D1 on A and D2 on B.

Gap processing is implemented using new jump scan technology
Explain facility indicates when gap processing is used
  New JUMPSCAN argument on IXSCAN operator
  Gap columns identified

Arguments:
---------
JUMPSCAN: (JumpScan Plan)
        TRUE
Gap Info:             Status
---------             ------
Index Column 0:       No Gap
Index Column 1:       Positioning Gap
Index Column 2:       No Gap
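One way to picture a jump scan is as a series of repositioning seeks, one per distinct value of the gap column. The sketch below models that idea on a sorted list; it is an assumption-laden illustration, not DB2's jump scan code, and the index contents and function name are invented.

```python
from bisect import bisect_left, bisect_right

# Sorted entries of a fact index on (A, C, B); values are invented.
INDEX_ACB = sorted([
    (1, 1, 5), (1, 1, 7), (1, 2, 5), (1, 2, 6), (1, 4, 7),
    (2, 1, 5), (2, 3, 7), (2, 3, 8),
])


def jump_scan(index, a, b):
    """Find entries with A == a and B == b when the middle column C has no predicate.

    Instead of reading every entry for A == a, the scan repositions once per
    distinct C value and seeks straight to (a, c, b).
    """
    matches = []
    pos = bisect_left(index, (a,))            # first entry with A >= a
    while pos < len(index) and index[pos][0] == a:
        c = index[pos][1]                     # next distinct value of the gap column
        pos = bisect_left(index, (a, c, b))   # seek to the wanted B under this C
        while pos < len(index) and index[pos] == (a, c, b):
            matches.append(index[pos])
            pos += 1
        pos = bisect_right(index, (a, c, float("inf")))  # jump past this C value
    return matches


print(jump_scan(INDEX_ACB, a=1, b=7))
```

The fewer distinct values the gap column has under each leading key, the fewer repositioning seeks are needed.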

Multi-column index recommendations

New explain diagnostic message recommending multi-column fact table indexes
The optimizer performs analysis of primary/unique keys and equi-join predicates in the query and detects that:
  the query is based on a star schema and
  a multi-column index does not exist or a different multi-column index might provide better performance

Extended Diagnostic Information:
--------------------------------
Diagnostic Identifier: 1
Diagnostic Details: EXP0256I Analysis of the query shows that the query might execute faster if an additional index was created. Schema name: "STAR". Table name: "FACT". Column list: "(F3, F2, F1, F0)".

Optim Query Tuner provides a workload based index advisor that uses the above feature to determine a consolidated set of index recommendations.
Understanding ZZJOIN plan components

Plan operators, top to bottom:
  ZZJOIN(2): Performs back-join to get dimension table columns required for subsequent operations if fact table access is all-probes list-prefetch.
  FETCH, RIDSCAN, SORT: Perform data prefetch of the fact table for an all-probes list-prefetch.
  ZZJOIN(1): Performs the zigzag join operation.
    1) Last leg is the fact table
    2) Preceding legs are dimensions
  TBSCAN (one per dimension leg): Scans either:
    1) Index over temp or
    2) Fast integer sort array
  TEMP (one per dimension leg): Builds either:
    1) Index over temp or
    2) Fast integer sort
  Plan for snowflake 1, plan for snowflake 2: Snowflake plans could either be:
    1) Access of a single table or
    2) Joins of multiple tables
  Access plan for fact table: Could be one of the following:
    1) Index scan
    2) Single-probe list-prefetch
    3) All-probes list-prefetch
Accessing a dimension in a zigzag join plan
A dimension leg must have TBSCAN-TEMP on top of the base dimension access plan.
Plan diagram: ZZJOIN(1) with a TBSCAN over TEMP for each dimension leg, followed by the access plan for the fact table.

The TEMP operator shows the following information (new operator argument):
  RANDOM_ACCESS (Random Access on temp table is available using Fast Integer Sort method or Index over Temp).
To simplify the query plans in the following discussion, please assume the TBSCAN-TEMP operators exist on top of the base dimension access plan.
Fast integer sort and index-over-temp

Two new dimension access methods are implemented to ensure efficient random access of the dimensions by the zigzag join operator.
  An index is created over the TEMP operator (IOT) using dimension join columns. Additional columns may be included in the index as include columns.
  A fast integer sort (FIS) data structure is built using the join key from the dimension. This method has an extension to allow additional columns if the join key is of type INTEGER.
In order for the optimizer to pick fast integer sort, the dimension must not have a composite key and the joining column must be of type INTEGER or BIGINT. If the join column is of type BIGINT, fast integer sort can be used only if no other dimension column is required for subsequent operations.
The TBSCAN operator (input to ZZJOIN(1) operator) shows the following:
  IDXOVTMP: (A temporary index will be created and used on this temp)
    TRUE - the scan builds an index over the temporary table for random access.
    FALSE - the scan builds a fast integer sort structure for random access.
  The feedback predicates applicable to that dimension are displayed in the form of start-stop key conditions.
Fact table index access strategies
Index scan and data page fetch
Single-probe list-prefetch
All-probes list-prefetch
Fact table index access

IXSCAN-FETCH plan: The index scan accesses the index over the fact table to retrieve RIDs from the fact table matching the input probe values. These fact table RIDs are then used to fetch the necessary fact table data.
Plan diagram: ZZJOIN with three legs: any access on D1, any access on D2, and FETCH over IXSCAN on FACT.
Fact table access using single-probe list-prefetch plan

The list prefetch plan executes for every probe row from the combination of dimension tables/snowflakes.
The index scan over the fact table finds fact table RIDs matching the input probe values.
The SORT, RIDSCAN and FETCH operators sort RIDs according to data page ids and start off list prefetchers to get the fact table data.

Plan diagram: ZZJOIN with three legs: any access on D1, any access on D2, and FETCH on FACT over RIDSCAN over SORT over IXSCAN.
Fact table access using all-probes list-prefetch plan

All matching RIDs from all the probes are sorted together in the order of the fact table data pages and the list prefetchers are started to retrieve the necessary fact table data.
The benefit of sorting all the RIDs in this fashion is that it helps achieve better prefetching and can lower the number of physical I/Os.
A back-join with each of the dimension tables is necessary to retrieve the dimension table columns required for subsequent operations
  Dimension columns do not flow through list-prefetch operation
  Back-join represented as a 2nd ZZJOIN operator

Plan diagram (top to bottom): ZZJOIN(2) over FETCH over RIDSCAN over SORT over ZZJOIN(1). ZZJOIN(1) has three legs: any access on D1, any access on D2, and IXSCAN on FACT.
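The I/O difference between the two list-prefetch strategies can be illustrated with a toy page-read counter. This is a deliberately crude model: it ignores the buffer pool and prefetch depth, the RIDs are invented, and the function names are hypothetical.

```python
# RID = (data page number, slot). Each inner list holds the RIDs found by one index probe.
# Values are invented for illustration.
PROBE_RESULTS = [
    [(7, 1), (2, 4)],
    [(2, 1), (9, 3)],
    [(7, 5), (2, 2)],
]


def page_reads(rid_stream):
    """Count page switches, assuming a page is re-read whenever the fetch leaves and returns."""
    reads, current = 0, None
    for page, _slot in rid_stream:
        if page != current:
            reads += 1
            current = page
    return reads


def single_probe_list_prefetch(probe_results):
    """Sort and fetch RIDs separately for every probe."""
    return sum(page_reads(sorted(rids)) for rids in probe_results)


def all_probes_list_prefetch(probe_results):
    """Pool the RIDs from every probe, sort once by page, then fetch."""
    pooled = sorted(rid for rids in probe_results for rid in rids)
    return page_reads(pooled)


print(single_probe_list_prefetch(PROBE_RESULTS), all_probes_list_prefetch(PROBE_RESULTS))
```

Pooling the RIDs lets each data page be visited once, at the price of losing the dimension columns that travelled with each probe, which is what the back-join restores.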
Multi-core Query Parallelism
Also known as intra-partition parallelism
Supported in DB2 since V5
Query parallelism within a database partition
  Parallelism achieved without the use of the database partitioning feature
  Does not require any form of data partitioning
Exploits symmetric multi-processor and/or multi-core processors
DB2 10.1:
  Extend the existing implementation
  Remove scalability bottlenecks
Multi-core Query Parallelism Use Cases

Large OLTP reporting systems
  Reporting jobs can often be a large part of the batch processing
  Workloads are normally running on large multi-processor machines
    SMP, with multiple cores, sometimes with hyper-threading
  Improve multi-core query parallelism to reduce the time the reporting jobs take within the batch windows
C-Class warehouse workloads
  Targeting warehouses and marts that are up to 4-5 TB
  Will be running on x or p servers with anywhere from 8 to 32 cores
  Simple setup using ESE (i.e. not database partitioned)
  Improve query response through multi-core parallelism
Current intra-partition parallelism architecture
Combination of data and functional parallelism
Data parallelism
  Dynamically partition data
    Assign partition to query task
    Easier to load balance
  User not required to partition data, e.g. range, hash, etc.
  Data dynamically assigned to query tasks
    Assign range of pages or rows (range is a fixed size prior to DB2 10.1)
    Assign new range when range is consumed
    Provides dynamic load balancing
  Support table and index scans
Dynamic data allocation straw scans

Degree=4
  Subagent 1: Pages 0-1
  Subagent 2: Pages 2-3
  Subagent 3: Pages 4-5
  Subagent 4: Pages 6-7
  Subagent 3: Pages 8-9
  Subagent 2: etc...
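The dynamic assignment shown above can be modelled with a shared dispenser that hands the next page range to whichever subagent asks first. This is a minimal sketch using Python threads as stand-ins for subagents; the class and function names are invented, and the fixed range size mirrors the pre-DB2 10.1 behaviour described earlier.

```python
import threading

TOTAL_PAGES = 20
PAGES_PER_STRAW = 2   # fixed range size, as before DB2 10.1
DEGREE = 4


class StrawDispenser:
    """Hands out the next unread page range to whichever subagent asks first."""

    def __init__(self, total_pages, pages_per_straw):
        self._next_page = 0
        self._total = total_pages
        self._size = pages_per_straw
        self._lock = threading.Lock()

    def next_range(self):
        with self._lock:
            if self._next_page >= self._total:
                return None
            start = self._next_page
            self._next_page = min(start + self._size, self._total)
            return range(start, self._next_page)


def subagent(agent_id, dispenser, log):
    """Keep asking for page ranges until the table is exhausted."""
    while (pages := dispenser.next_range()) is not None:
        for page in pages:
            log.append((agent_id, page))   # stand-in for scanning the page


dispenser = StrawDispenser(TOTAL_PAGES, PAGES_PER_STRAW)
log = []
threads = [
    threading.Thread(target=subagent, args=(n, dispenser, log))
    for n in range(1, DEGREE + 1)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

pages_scanned = sorted(page for _, page in log)
print(pages_scanned == list(range(TOTAL_PAGES)))
```

Which subagent gets which range varies from run to run, but every page is handed out exactly once; a faster subagent simply comes back for more ranges, which is the load-balancing effect.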
Functional parallelism
Divide query task by function
  Assign functional task to different execution units
  Doesn't require data partitioning
  Harder to load balance
    Must ensure execution units are equally busy
DB2 implementation
  Single co-ordinator process services application requests
  Multiple sub-agent processes return data through local table queue
  Only 1 parallelized functional unit (section)
Functional parallelism
Query contains only 2 subsections and 1 local table queue
Runtime operators coordinated using latches, semaphores, shared memory control blocks

Co-ordinator:
        RETURN (9)
            |
         LTQ (8)

Subagents 1 to 4 (each runs):
         LTQ (8)
            |
        MSJOIN (7)
        /       \
  TBSCAN (3)   TBSCAN (6)
      |            |
   SORT (2)     SORT (5)
      |            |
  TBSCAN (1)   TBSCAN (4)
      |            |
   PRODUCT      PRODATR
Intra-partition parallelism example

select p.name, p.prod_id, pa.attribute
from product p, prodatr pa
where p.prod_id = pa.prod_id;

Access plan, top to bottom, with annotations:
  LTQ (8): Results returned via shared memory table queue to co-ordinator agent
  MSJOIN (7): Join processed in parallel by each agent by joining corresponding partitions
  TBSCAN (3) and TBSCAN (6): Each agent scans a sort partition
  SORT (2) and SORT (5): Hash partitioned sorts on prod_id, one partition per agent
  TBSCAN (1) on PRODUCT and TBSCAN (4) on PRODATR: Parallel table scans ("straw" scans)
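The shape of this plan (hash partition on prod_id, sort each partition, merge join matching partitions) can be sketched as follows. The sketch runs the per-agent work sequentially for clarity, the sample rows are invented, and the function names are hypothetical; it illustrates the data flow, not DB2's runtime.

```python
DEGREE = 4

# Invented sample rows: product(prod_id, name), prodatr(prod_id, attribute).
PRODUCT = [(3, "bolt"), (1, "nut"), (8, "gear"), (5, "cog")]
PRODATR = [(5, "steel"), (1, "brass"), (1, "m6"), (9, "unused"), (8, "nylon")]


def hash_partition(rows, degree):
    """Route each row to a partition by hashing prod_id (column 0)."""
    partitions = [[] for _ in range(degree)]
    for row in rows:
        partitions[hash(row[0]) % degree].append(row)
    return partitions


def merge_join(left, right):
    """Merge join of two inputs sorted on column 0; left keys are assumed unique."""
    out, j = [], 0
    for key, name in left:
        while j < len(right) and right[j][0] < key:
            j += 1
        k = j
        while k < len(right) and right[k][0] == key:
            out.append((name, key, right[k][1]))
            k += 1
    return out


def parallel_plan(product, prodatr, degree=DEGREE):
    """Each loop iteration is the work one subagent would do on its own partition."""
    result = []
    p_parts = hash_partition(product, degree)
    a_parts = hash_partition(prodatr, degree)
    for p_part, a_part in zip(p_parts, a_parts):
        result.extend(merge_join(sorted(p_part), sorted(a_part)))  # SORT + MSJOIN
    return result  # rows would flow to the co-ordinator through the LTQ


print(sorted(parallel_plan(PRODUCT, PRODATR)))
```

Because both inputs are partitioned with the same hash on prod_id, matching rows always land in the same partition, so each agent can join its partition without looking at anyone else's.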
Intra-partition parallelism architecture

Single query involves
  1 coordinating agent
  n sub agents
  m prefetchers (shared)
  All executing in parallel on available processors
Combination of...
  Data parallelism
    Each agent works on subset of data
    Data dynamically assigned so user not required to partition data
  Functional parallelism
    Each agent works on different query function, e.g. scan, sort
User can control "degree" of parallelism
Also benefits I/O bound uniprocessors

Diagram:
  Compile Time: SQL Query -> Query Optimizer -> Best Query Plan -> Threaded Code
  Run Time: multiple agents and shared prefetchers
DB2 10.1 Multi-core query parallelism

Improved scalability
  Within the current architecture
  Scale near-linearly to degree 32
Achieved by:
  1. Improved load balance
     New rebalance (REBAL) access plan operator
  2. More efficient parallelization techniques
     Move LTQ higher in the access plan
  3. Reduce latch contention
Load imbalance results in poor scalability
REBAL redistributes rows to ensure all subagents do equal work
Optimizer performs load balance analysis to determine REBAL placement

               6.77122e+06
                 NLJOIN
                 (   6)
                 713706
                   63
        /---------+----------\
    292.2                  23173.3
    REBAL                   FETCH
    (   7)                  (   9)
    325.265                 2456.85
      11                       2
      |                   /---+----\
    292.2             23173.3   6.77122e+07
    TBSCAN            IXSCAN   TABLE: DB2USER
    (   8)            (  10)    DAILY_SALES
    325.265           1605.23       Q1
      11                 1
      |                  |
     2922           6.77122e+07
TABLE: DB2USER     INDEX: SYSIBM
    PERIOD       SQL091218161022180
      Q2                 Q1

Figure: Multi-core Query Parallelism, "Before" and "After" charts plotted against degree.
More efficient parallelization techniques
  Partial-final UNIQUE
  GRPBY on unique key
    Can perform complete GRPBY without a partitioned SORT
  Improved access plan parallelization transformation costing
  Improved exploitation of stream partitioning
    Avoid partitioned SORT
Reduce latch contention
  Dynamic straw scan unit (straw gulp size)
  Improved NLJOIN inner access
  Improved HSJOIN
  Improved partitioned SORT
  Prefetcher queues
  Various others
DB2 10.1 Multi-core query parallelism externals

Support mixed workloads
  Parallelize report queries in an OLTP system
  Reduce parallel infrastructure overhead on OLTP queries
    Pre DB2 10.1 there is a 10-15% impact just by setting INTRA_PARALLEL=ON
      In ESE only. DPF unconditionally enables parallel infrastructure
    DB2 10.1: Use Workload Manager (WLM) to toggle INTRA_PARALLEL and maximum DEGREE for a workload
Improved automatic degree determination (degree=ANY)
  Avoid parallelizing queries that won't benefit
Improved automatic runtime degree reduction
Controlling query parallelism

WLM workload control:
  An OLTP workload that doesn't use parallelism
    MAXIMUM DEGREE = 1 -> INTRA_PARALLEL=NO
      CREATE WORKLOAD banking_wl APPLNAME('banking') MAXIMUM DEGREE 1;
  A BI workload using parallelism
    MAXIMUM DEGREE > 1 -> INTRA_PARALLEL=YES
    Also specifies the degree upper limit
    The application specifies the requested degree using existing external controls
      CREATE WORKLOAD report_wl APPLNAME('cognos') MAXIMUM DEGREE 8;
      ALTER WORKLOAD report_wl MAXIMUM DEGREE 4;
Application control:
  CALL SYSPROC.ADMIN_SET_INTRA_PARALLEL('YES')
    Toggles intra-partition parallelism at transaction boundaries
    Must not have open cursors across transaction boundaries, e.g. WITH HOLD cursors
Pre-DB2 10.1 intra-partition parallelism external controls

Parameter | Value | Default | Scope | Comment
INTRA_PARALLEL | NO, YES | NO | Instance | DBM configuration
MAX_QUERYDEGREE | ANY, 1~32,767 | ANY | Instance | DBM configuration. Valid only if INTRA_PARALLEL is ON
DFT_DEGREE | ANY, 1~32,767 | | Database | DB configuration. Initial value for CURRENT DEGREE special register or package bind DEGREE option
CURRENT DEGREE | ANY, 1~32,767 | DFT_DEGREE | Application | Special register. The degree of parallelism considered by the SQL compiler for dynamic SQL access plans
Bind DEGREE | ANY, 1~32,767 | DFT_DEGREE | Package | DB2 bind option. The degree of parallelism considered by the SQL compiler for static SQL access plans
SET RUNTIME DEGREE command | 1~32,767 | | Application | CLP command. The degree of parallelism allowed at runtime for any access plans (dynamic or static SQL)
Appendix
Additional material
Star schemas Dimension tables

Contain descriptive information to augment fact rows
Used to filter fact rows
Query results are aggregated on dimension attributes
Contain a primary key
  possibly multiple columns
  generated, meaningless numeric value
Typically contain far fewer rows than the fact table
May be represented as a hierarchy of tables or a snowflake
  e.g. product is further normalized to product, brand and category
  but this requires extra joins

Diagram: Product snowflaked into Brand, Product and Category tables.
Star schemas
Fact table
Contains numeric measures of business information
  Queries perform computation (sum, avg, etc.) on measures
Contains primary key columns from each dimension
  Represent foreign keys referencing each parent dimension
  Can have explicit referential integrity, but not necessary for DB2
May have a primary key
  Composite of the foreign keys, or
  Single, generated, meaningless numeric value
Number of rows depends on fact granularity
  hourly, daily, etc.
  finer granularity -> more rows
  coarser granularity -> limits drill down ability
Typically, local predicates aren't applied directly
Star schemas
Data marts
  Can contain multiple fact tables
  Each fact usually denotes a separate star
  Dimensions can be shared across stars
    e.g. Daily_Sales and Daily_Forecast facts can share the Store and Product dimensions
  Queries may join multiple fact tables
ZZJOIN(1) operator
An n-ary join method that joins together the dimension table/snowflakes and the fact table.
  Drives the process of forming probe rows from dimension tables/snowflakes
  Probes the fact table to find matching fact table rows
  Uses the feedback from the fact table to advance to next rows on the temporary table over the dimension tables/snowflakes
Feedback predicates identified in explain information
  New EXPLAIN_PREDICATE.HOW_APPLIED value: FEEDBACK
  Displayed in the ZZJOIN operator details in db2exfmt

Predicates:
----------
2) Feedback Predicate used in Join,
        Comparison Operator:            Equal (=)
        Subquery Input Required:        No
        Filter Factor:                  0.25

        Predicate Text:
        --------------
        (Q3.D2FK = Q1.D2PK)

3) Feedback Predicate used in Join,
        Comparison Operator:            Equal (=)
        Subquery Input Required:        No
        Filter Factor:                  0.25

        Predicate Text:
        --------------
        (Q3.D1FK = Q2.D1PK)
ZZJOIN (2) operator

Only required for all-probes list-prefetch.
Uses the join columns to locate the matching row in the temporary table so that the required non-join columns from the dimension table can be retrieved.
Makes use of an efficient random access method such as FIS or IOT to retrieve the dimension table columns required for subsequent operations.
Also known as backjoin
  Indicated in explain by BACKJOIN argument of ZZJOIN operator
Star schema plans in DB2 pre-10.1

Type of plan | How does the plan work? | Prerequisite
Hub join | Cartesian product of dimensions. Each row in Cartesian product probes the multi-column fact table index. | Multi-column index on the fact table on columns that join with the dimensions.
Semi-join with index ANDing | Pre-filtering of the fact table by dimensions (semi-joins). Index ANDing the results of the dimension filtering. Completing the dimension join. | Indexes on the fact table on each of the columns that joins with the dimensions (typically, the foreign keys).
Regular (2-way) join | Most likely plan is to: join the most filtering dimension with the fact table first, then join in the rest of the dimensions using a suitable join method such as hash join. Other plans are possible. | None.
