Deep Dive into DB2 10 Query Performance Optimization: Star Schemas and Multi-core Query Parallelism

John Hornibrook IBM Canada

Information Management

2012 IBM Corporation

Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM'S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
  CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
  ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.


Agenda

New DB2 10.1 features
  Star schema query optimization
    Zig-zag join
  Multi-core query parallelism
    Intra-partition query parallelism
    Existing functionality, significantly improved


Star Schema Query Optimization

Provides improved performance for star schema queries
  Star schemas are typically found in data marts or some data warehouses
Introduces a new star schema join method: zig-zag join
  Complementary to existing star schema join methods
Improves existing star schema detection algorithms
  Supports a wider range of queries


Star Schemas
Logical DB design resembles a star
Central table contains business facts
  Sales prices, cost, quantities, etc.
Surrounding tables contain dimensional data
  Time, location, characteristics, etc.
Each dimension is a parent of the fact table
  1:N from a dimension to the fact

Schema diagram (fact table in the center, dimensions around it):
  Daily Sales (fact): perkey, prodkey, storekey, promokey, custkey, quantity_sold, price, cost
  Customer: custkey, name, address
  Product: prodkey, category, upc_number
  Period: perkey, year, month
  Store: storekey, storenumber, region
  Promotion: promokey, promotype, promodesc


Star joins
Queries performed against star schemas
  SELECT ITEM_DESC, SUM(QUANTITY_SOLD), AVG(PRICE), AVG(COST)
  FROM PERIOD, DAILY_SALES, PRODUCT, STORE
  WHERE PERIOD.PERKEY = DAILY_SALES.PERKEY
    AND PRODUCT.PRODKEY = DAILY_SALES.PRODKEY
    AND STORE.STOREKEY = DAILY_SALES.STOREKEY
    AND CALENDAR_DATE BETWEEN '01/01/2005' AND '04/28/2005'
    AND STORE_NUMBER = '03'
    AND CATEGORY = 72
  GROUP BY ITEM_DESC

Aggregate on dimension attribute, sum on fact measures
Join fact to some subset of the dimensions
Join fact foreign keys to dimension primary keys
Constrain on dimension attributes


Star join dilemma


No single dimension may filter the fact table well
But a combination of dimensions may filter well
  840,000 category 72 products sold during Jan. to April 2005
  53,000 category 72 products sold during Jan. to April 2005 in store #3
How do we filter with a combination of dimensions?

Diagram: Daily Sales (750M rows) joined to three filtered dimensions:
  Period: CALENDAR_DATE BETWEEN '01/01/2005' AND '04/28/2005'
  Product: CATEGORY=72
  Store: STORE_NUMBER='03'
  Fact row counts shown on the join edges: 50M, 20M, 30M


Star join solutions

Specialized join methods (pre-DB2 10.1):
  Semi-join with index ANDing
    Use combinations of fact table indexes to avoid accessing data pages
  Hub join
    Use Cartesian product of dimension rows to provide better fact keys
Query must meet star join criteria
Both methods can be built and competed
  Regular join plans are still built and competed with either method
Costing decides
  Specialized methods aren't always the winners


Semi-join index ANDing star join


Produce filtered fact table (Daily_Sales) with foreign key indexes
  Execute a "semi-join" with each dimension that filters the fact table
  "AND" the RID bitmap from each semi-join with the next semi-join
Retrieve fact table columns via RIDs

Plan diagram: FETCH on Daily Sales, fed by three semi-join NLJOINs:
  semi-join NLJOIN: Product with Daily Sales
  semi-join NLJOIN: Period with Daily Sales
  semi-join NLJOIN: Store with Daily Sales

RID bitmap (each semi-join eliminates bits):
  101101101001111011011001
  100100101001001010011001
  000100001000000010001000
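
The semi-join and RID ANDing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DB2's implementation: the fact table is a dict keyed by RID, each single-column index is a dict from key value to a list of RIDs, and the helper names (`build_index`, `semi_join_rids`, `index_anding`) are hypothetical.

```python
def build_index(fact_rows, column):
    """Build a single-column index: key value -> list of RIDs (hypothetical helper)."""
    index = {}
    for rid, row in fact_rows.items():
        index.setdefault(row[column], []).append(rid)
    return index


def semi_join_rids(fact_index, dimension_keys):
    """Semi-join: RIDs of fact rows whose foreign key matches a qualifying dimension key."""
    rids = set()
    for key in dimension_keys:
        rids.update(fact_index.get(key, ()))
    return rids


def index_anding(fact_indexes, qualifying_keys):
    """AND the RID sets of all semi-joins; only surviving RIDs need a data page fetch."""
    rid_set = None
    for column, keys in qualifying_keys.items():
        rids = semi_join_rids(fact_indexes[column], keys)
        rid_set = rids if rid_set is None else rid_set & rids
    return sorted(rid_set or ())


# Made-up fact rows keyed by RID.
fact = {
    0: {"perkey": 1, "prodkey": 10, "storekey": 100},
    1: {"perkey": 1, "prodkey": 11, "storekey": 100},
    2: {"perkey": 2, "prodkey": 10, "storekey": 101},
    3: {"perkey": 1, "prodkey": 10, "storekey": 101},
    4: {"perkey": 3, "prodkey": 10, "storekey": 100},
}
indexes = {col: build_index(fact, col) for col in ("perkey", "prodkey", "storekey")}
# Dimension keys that survive the local predicates on each dimension.
qualifying = {"perkey": [1], "prodkey": [10], "storekey": [100]}
surviving_rids = index_anding(indexes, qualifying)
```

Each dimension on its own keeps three or four of the five rows here, while the ANDed result keeps a single RID, which mirrors the bitmap shrinking in the diagram.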


Hub star join


Form a Cartesian join of filtering dimensions
  Cartesian join -> no join predicates
  Cartesian join result should be small to be effective
Join the Cartesian join result to the fact table using a multi-column fact table index

Plan diagram: nested NLJOINs build the Cartesian product of Product, Store and Period, and a final NLJOIN probes Daily Sales.
  Product supplies PRODKEY 10, 20; Store supplies STOREKEY 30, 40; Period supplies PERIODKEY 50
  Cartesian product rows:
    PRODKEY  STOREKEY  PERIODKEY
    10       30        50
    20       30        50
    10       40        50
    20       40        50
  Probe fact table with multi-column index on: PRODKEY, STOREKEY, PERIODKEY


Hub star join


Works well if Cartesian result is small
Cartesian may contain many key combinations that don't exist in the fact table
  Results in unnecessary fact table index probes

Diagram: NLJOIN probes Daily Sales with every row of the Cartesian product:
  PRODKEY  STOREKEY  PERIODKEY
  10       30        50
  20       30        50
  10       40        50
  20       40        50
  10       30        60
  20       30        60
  10       40        60
  20       40        60
  10       30        70
  20       30        70
  10       40        70
  20       40        70
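
A minimal sketch of the hub join probing pattern, to make the cost of unproductive probes concrete. It assumes the multi-column fact index can be modelled as a dict from key tuple to matching rows; `hub_join` and the sample index contents are hypothetical, and only the dimension key values come from the slide.

```python
from itertools import product


def hub_join(dimension_keys, fact_index):
    """Probe the fact index once per row of the Cartesian product of dimension keys.

    Returns (matching rows, total probes, probes that found nothing).
    """
    rows, probes, hits = [], 0, 0
    for combo in product(*dimension_keys):
        probes += 1
        matched = fact_index.get(combo, [])
        if matched:
            hits += 1
        rows.extend(matched)
    return rows, probes, probes - hits


# Dimension keys from the slide: PRODKEY, STOREKEY, PERIODKEY.
dims = [[10, 20], [30, 40], [50, 60, 70]]
# Made-up fact index contents: only two of the twelve combinations exist.
fact_index = {(10, 30, 50): ["r1", "r2"], (20, 40, 70): ["r3"]}
rows, probes, wasted = hub_join(dims, fact_index)
```

With these sample values, ten of the twelve probes find nothing, which is the weakness the zigzag join is designed to address.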


DB2 10.1 Star Schema Highlights


Introduces a new zigzag join method that builds on the zigzag join technology available in Red Brick, which has a proven performance advantage in the industry.
Provides consistent performance for warehouse queries.
Adds a new star detection method that is more reliable.
Supports star schema queries in single and multiple subject areas with snowflakes.
Exploits indexes even when there is a gap in the probing key, reducing the number of indexes that need to be created.
Works seamlessly for range partitioned tables and in serial, SMP and DPF environments.
Can use MDC block indexes on the fact table to enable zigzag join.
Recommends multi-column indexes to enable zigzag join through explain diagnostics and the index advisor in Optim Query Tuner (OQT).


Enhancing the star detection in DB2 pre-10.1


DB2 (pre-10.1) recognizes a star
  By analysis of sizes of tables and join predicates.
  A star is detected after application of local filtering and snowflake joins.
The new star detection in DB2 10.1:
  Only requirement: joining dimension column(s) must be unique
  Detects multiple stars per query block
  Allows a star to be detected with fewer restrictions
  Much more reliable
The new star detection method also enables pre-DB2 10.1 star schema plans.
Pre-DB2 10.1 detection is invoked if the new star detection fails to detect any star.


Comparison of old and new star detection methods:


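No.  | Requirement/Restriction                     | Before DB2 10.1                                | DB2 10.1
-----|---------------------------------------------|------------------------------------------------|---------------------------------------------
1    | Minimum of three base tables                | Necessary to form a star.                      | Necessary to form a star.
2    | Minimum of two equijoin predicates          | Necessary to form a star.                      | Necessary to form a star.
3    | Multi-column index on fact table            | Used by the Cartesian Hub plan, if available.  | Used by the Zigzag join plan, if available.
4    | Number of fact tables allowed               | One                                            | Unlimited
5    | Non-deterministic or side-effect predicates | (see rows 5-8 below)                           | (see rows 5-8 below)
6    | Non-equijoin predicates                     | (see rows 5-8 below)                           | (see rows 5-8 below)
7    | Sub-query predicates                        | (see rows 5-8 below)                           | (see rows 5-8 below)
8    | Correlation among tables in a snowflake     | (see rows 5-8 below)                           | (see rows 5-8 below)
9    | Simple XML predicates                       | Excluded from the star.                        | Can be included in the star.
10   | Derived (non-base) tables                   | Excluded from the star.                        | Can be included in the star.

Rows 5-8, before DB2 10.1: Star cannot be formed in the query block in the presence of this SQL feature.
Rows 5-8, DB2 10.1: Star can be formed in the query block in the presence of these features and may include the feature in the star.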


The new zigzag join method for star schema based queries
How does it work?
  First forms the virtual Cartesian product of dimensions.
  Avoids most non-productive probes from the Cartesian product into the fact table.
  The fact table index provides feedback to the dimensions.
  Zigzags through the dimensions and the fact table.
Prerequisite:
  A multi-column index on the fact table on columns that join with the dimensions.
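
The feedback idea can be sketched as follows. This is a simplified model, not DB2's implementation: each dimension is a sorted list of unique join keys, the multi-column fact index is a sorted list of key tuples, and the names `next_combo` and `zigzag_join` are hypothetical. After each probe, the next key found in the fact index is fed back so that the dimensions skip straight to the smallest key combination that could still match.

```python
from bisect import bisect_left, bisect_right


def next_combo(dims, target):
    """Smallest combination of dimension keys (in index order) that is >= target."""
    combo = []
    for level, keys in enumerate(dims):
        pos = bisect_left(keys, target[level])
        if pos < len(keys) and keys[pos] == target[level]:
            combo.append(keys[pos])          # exact match, keep going to the next column
            continue
        while True:
            if pos < len(keys):
                combo.append(keys[pos])      # first larger key at this level
                combo.extend(d[0] for d in dims[level + 1:])  # restart lower levels
                return tuple(combo)
            if not combo:
                return None                  # dimensions exhausted
            level -= 1                       # backtrack and bump the previous level
            keys = dims[level]
            pos = bisect_right(keys, combo.pop())
    return tuple(combo)


def zigzag_join(dims, fact_index):
    """Return (matching fact keys, number of fact index probes)."""
    results, probes = [], 0
    if not fact_index or any(not d for d in dims):
        return results, probes
    combo = tuple(d[0] for d in dims)
    while combo is not None:
        probes += 1
        start = bisect_left(fact_index, combo)
        end = bisect_right(fact_index, combo, start)
        results.extend(fact_index[start:end])
        if end == len(fact_index):
            break                            # no more fact keys to give feedback
        combo = next_combo(dims, fact_index[end])  # feedback from the fact index
    return results, probes


# Dimension keys as in the hub join slides; fact keys are made up.
dims = [[10, 20], [30, 40], [50, 60, 70]]
fact_index = sorted([(10, 30, 50), (10, 30, 50), (15, 30, 50), (20, 30, 60), (20, 40, 70)])
matches, probes = zigzag_join(dims, fact_index)
```

On this sample data the sketch needs four probes, compared with the twelve probes a full Cartesian product of the same dimension keys would issue.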


Using a multi-column index in a zigzag join


Prerequisite
  Columns that participate in the join are included in the index
  Index columns from at least two dimension tables are completely covered by join predicates
Consider this star schema based query:
  D1 has primary key A
  D2 has a composite primary key (B,C)
  D3 has primary key D
  These PK columns are used in equi-join operations with the fact table

Diagram: Fact joins D1 on A, D2 on (B,C) and D3 on D.

Fact table index definition         | Qualified? | Why?
------------------------------------|------------|-----------------------------------------------
(A,D), (A,B,C), (B,C,D), (C,B,D)    | YES        | The index completely covers two dimensions.
(A,B,C,D), (A,C,B,D)                | YES        | The index completely covers three dimensions.
(A,B), (C,D)                        | NO         | The index does not completely cover the dimension D2.
(B,A,C)                             | NO         | The columns B and C in the composite index are not in contiguous positions in the index.
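
The qualification rules in the table can be expressed as a small check. This is one reading of the slide's rules, not the optimizer's actual eligibility test, which may apply further conditions; `covers` and `qualifies_for_zigzag` are hypothetical names.

```python
def covers(index_columns, dimension_key):
    """True if every column of the dimension key is in the index, in contiguous positions."""
    try:
        positions = sorted(index_columns.index(col) for col in dimension_key)
    except ValueError:
        return False  # some key column is missing from the index
    return positions[-1] - positions[0] == len(positions) - 1


def qualifies_for_zigzag(index_columns, dimension_keys):
    """Assumed rule from the slide: at least two dimensions must be completely covered."""
    return sum(covers(index_columns, key) for key in dimension_keys) >= 2


# D1 has key A, D2 has composite key (B, C), D3 has key D.
dimension_keys = [("A",), ("B", "C"), ("D",)]
verdicts = {
    index: qualifies_for_zigzag(index, dimension_keys)
    for index in [
        ("A", "D"), ("A", "B", "C"), ("B", "C", "D"), ("C", "B", "D"),
        ("A", "B", "C", "D"), ("A", "C", "B", "D"),
        ("A", "B"), ("C", "D"), ("B", "A", "C"),
    ]
}
```

The resulting verdicts reproduce the YES and NO rows of the table for these nine index definitions.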


Zigzag join with index key gap processing


Gap processing allows a single multi-column index to be used for a bigger set of queries.
  Greatly reduces the number of fact table indexes
  E.g., a fact table index on (A, C, B) allows zigzag join when there is no join on C

Diagram: Fact joins D1 on A and D2 on B.

Gap processing is implemented using new jump scan technology
Explain facility indicates when gap processing is used
  New JUMPSCAN argument on IXSCAN operator
  Gap columns identified

  Arguments:
  ---------
  JUMPSCAN: (JumpScan Plan)
          TRUE

  Gap Info:             Status
  ---------             ------
  Index Column 0:       No Gap
  Index Column 1:       Positioning Gap
  Index Column 2:       No Gap
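
A rough sketch of how a jump scan can use an index on (A, C, B) when there are predicates on A and B only. It assumes the index is a sorted list of tuples and that B is numeric (so `float("inf")` can act as an upper sentinel); the function name `jump_scan` is hypothetical and the real DB2 mechanism is more involved.

```python
from bisect import bisect_left, bisect_right


def jump_scan(index, a, b):
    """Find keys with A == a and B == b in a sorted (A, C, B) index, with no predicate on C.

    For each distinct value of the gap column C under prefix a, position directly
    on (a, c, b), then jump past the remaining (a, c, *) keys.
    Returns (matching keys, number of positioning probes).
    """
    matches, probes = [], 0
    pos = bisect_left(index, (a,))
    while pos < len(index) and index[pos][0] == a:
        c = index[pos][1]                                   # next distinct gap value
        lo = bisect_left(index, (a, c, b), pos)
        hi = bisect_right(index, (a, c, b), lo)
        probes += 1
        matches.extend(index[lo:hi])
        pos = bisect_right(index, (a, c, float("inf")), hi)  # jump to the next C value
    return matches, probes


# Made-up index keys in (A, C, B) order.
index = sorted([(1, 10, 5), (1, 10, 7), (1, 20, 5), (1, 30, 9), (2, 10, 5)])
found, probes = jump_scan(index, a=1, b=5)
```

The scan touches one position per distinct C value instead of reading every key with A = 1, which is why one index can serve queries that skip a column.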



Multi-column index recommendations


New explain diagnostic message recommending multi-column fact table indexes
The optimizer analyzes primary/unique keys and equi-join predicates in the query and detects that:
  the query is based on a star schema, and
  a multi-column index does not exist, or a different multi-column index might provide better performance

  Extended Diagnostic Information:
  --------------------------------
  Diagnostic Identifier: 1
  Diagnostic Details:    EXP0256I Analysis of the query shows that the query might execute faster if an additional index was created. Schema name: "STAR". Table name: "FACT". Column list: "(F3, F2, F1, F0)".

Optim Query Tuner provides a workload-based index advisor that uses the above feature to determine a consolidated set of index recommendations.


Understanding ZZJOIN plan components


Plan diagram (top to bottom), with the annotation for each operator:

ZZJOIN(2)
  Performs back-join to get dimension table columns required for subsequent operations if fact table access is all-probes list-prefetch.
FETCH, RIDSCAN, SORT
  Perform data prefetch of the fact table for an all-probes list-prefetch.
ZZJOIN(1)
  Performs the zigzag join operation
    1) Last leg is the fact table
    2) Preceding legs are dimensions
TBSCAN (one per dimension leg)
  Scans either: 1) Index over temp or 2) Fast integer sort array
TEMP (one per dimension leg)
  Builds either: 1) Index over temp or 2) Fast integer sort
plan for snowflake 1, plan for snowflake 2
  Snowflake plans could either be: 1) Access of a single table or 2) Joins of multiple tables
access plan for fact table
  Could be one of the following: 1) Index scan 2) Single-probe list-prefetch 3) All-probes list-prefetch


Accessing a dimension in a zigzag join plan

A dimension leg must have TBSCAN-TEMP on top of the base dimension access plan.

Plan diagram: ZZJOIN(1) with one TBSCAN over TEMP leg per snowflake (plan for snowflake 1, plan for snowflake 2), plus the access plan for the fact table.

The TEMP operator shows the following information (new operator argument):
  RANDOM_ACCESS (Random access on the temp table is available using the fast integer sort method or index over temp).
To simplify the query plans in the following discussion, please assume the TBSCAN-TEMP operators exist on top of the base dimension access plan.


Fast integer sort and index-over-temp


Two new dimension access methods are implemented to ensure efficient random access of the dimensions by the zigzag join operator.
  Index over temp (IOT): an index is created over the TEMP operator using the dimension join columns.
    Additional columns may be included in the index as include columns.
  Fast integer sort (FIS): a data structure is built using the join key from the dimension.
    This method has an extension to allow additional columns if the join key is of type INTEGER.
In order for the optimizer to pick fast integer sort, the dimension must not have a composite key and the joining column must be of type INTEGER or BIGINT.
  If the join column is of type BIGINT, fast integer sort can be used only if no other dimension column is required for subsequent operations.
The TBSCAN operator (input to the ZZJOIN(1) operator) shows the following:
  IDXOVTMP: (A temporary index will be created and used on this temp)
    TRUE - the scan builds an index over the temporary table for random access.
    FALSE - the scan builds a fast integer sort structure for random access.
  The feedback predicates applicable to that dimension are displayed in the form of start-stop key conditions.
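
The eligibility rules for fast integer sort stated above can be summarized as a small predicate. This only encodes the rules as worded on the slide; the optimizer's real decision is cost-based and may weigh other factors, and the function and parameter names are hypothetical.

```python
def can_use_fast_integer_sort(join_key_types, needs_other_columns):
    """Return True if the slide's stated rules would allow fast integer sort (FIS).

    join_key_types: SQL type names of the dimension's join key columns.
    needs_other_columns: True if non-key dimension columns are needed later in the plan.
    """
    if len(join_key_types) != 1:
        return False                      # composite keys are not eligible
    key_type = join_key_types[0]
    if key_type == "INTEGER":
        return True                       # extension allows additional columns
    if key_type == "BIGINT":
        return not needs_other_columns    # BIGINT only when no other column is needed
    return False                          # any other type falls back to index over temp


choices = {
    "integer key, extra columns": can_use_fast_integer_sort(["INTEGER"], True),
    "bigint key, extra columns": can_use_fast_integer_sort(["BIGINT"], True),
    "bigint key, key only": can_use_fast_integer_sort(["BIGINT"], False),
    "composite key": can_use_fast_integer_sort(["INTEGER", "INTEGER"], False),
}
```

When the predicate is false, the index-over-temp method described above remains available, corresponding to IDXOVTMP being TRUE in the explain output.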

Fact table index access strategies

Index scan and data page fetch
Single-probe list-prefetch
All-probes list-prefetch


Fact table index access


IXSCAN-FETCH plan: The index scan accesses the index over the fact table to retrieve RIDs from the fact table matching the input probe values. These fact table RIDs are then used to fetch the necessary fact table data.

Plan diagram: ZZJOIN with three legs:
  Any access on D1
  Any access on D2
  FETCH over IXSCAN on FACT


Fact table access using single-probe list-prefetch plan


The list prefetch plan executes for every probe row from the combination of dimension tables/snowflakes. The index scan over the fact table finds fact table RIDs matching the input probe values. The SORT, RIDSCAN and FETCH operators sort RIDs according to data page ids and start off list prefetchers to get the fact table data.

Plan diagram: ZZJOIN with three legs:
  Any access on D1
  Any access on D2
  FETCH (on FACT) over RIDSCAN over SORT over IXSCAN


Fact table access using all-probes list-prefetch plan


All matching RIDs from all the probes are sorted together in the order of the fact table data pages, and the list prefetchers are started to retrieve the necessary fact table data.
  The benefit of sorting all the RIDs in this fashion is that it helps achieve better prefetching and can lower the number of physical I/Os.
A back-join with each of the dimension tables is necessary to retrieve the dimension table columns required for subsequent operations
  Dimension columns do not flow through the list-prefetch operation
  Back-join represented as a 2nd ZZJOIN operator

Plan diagram (top to bottom): ZZJOIN(2) over FETCH over RIDSCN over SORT over ZZJOIN(1), whose legs are:
  Any access on D1
  Any access on D2
  IXSCAN on FACT
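
To illustrate why sorting all RIDs by data page can lower physical I/O, here is a toy model. It assumes a RID is a (page, slot) pair and that only one page is buffered at a time, so a read is counted whenever the page changes; both are simplifying assumptions and `page_reads` is a hypothetical helper.

```python
def page_reads(rids, sort_first):
    """Count page reads for a list of (page, slot) RIDs with a one-page buffer."""
    order = sorted(rids) if sort_first else list(rids)
    reads = 0
    last_page = None
    for page, _slot in order:
        if page != last_page:   # a different page must be brought in
            reads += 1
            last_page = page
    return reads


# RIDs in the order the index probes happened to return them (made-up values).
rids = [(7, 1), (2, 3), (7, 2), (2, 1), (5, 0), (2, 2)]
unsorted_reads = page_reads(rids, sort_first=False)
sorted_reads = page_reads(rids, sort_first=True)
```

In this toy case the unsorted order reads a page six times while the sorted order reads each of the three distinct pages once; a real buffer pool would soften the difference, but the direction of the effect is the same.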

Multi-core Query Parallelism

Also known as intra-partition parallelism
Supported in DB2 since V5
Query parallelism within a database partition
  Parallelism achieved without the use of the database partitioning feature
  Does not require any form of data partitioning
Exploits symmetric multi-processor and/or multi-core processors
DB2 10.1:
  Extend the existing implementation
  Remove scalability bottlenecks


Multi-core Query Parallelism Use Cases


Large OLTP reporting systems
  Reporting jobs can often be a large part of the batch processing
  Workloads are normally running on large multi-processor machines
    SMP, with multiple cores, sometimes with hyper-threading
  Improve multi-core query parallelism to reduce the time the reporting jobs take within the batch windows
C-Class warehouse workloads
  Targeting warehouses and marts that are up to 4-5 TB
  Will be running on x or p servers with anywhere from 8 to 32 cores
  Simple setup using ESE (i.e. not database partitioned)
  Improve query response through multi-core parallelism


Current intra-partition parallelism architecture

Combination of data and functional parallelism
Data parallelism
  Dynamically partition data
  Assign partition to query task
  Easier to load balance
  User not required to partition data, e.g. range, hash, etc.
  Data dynamically assigned to query tasks
    Assign range of pages or rows (range is a fixed size prior to DB2 10.1)
    Assign new range when range is consumed
    Provides dynamic load balancing
  Supports table and index scans


Dynamic data allocation: straw scans

Degree=4
  Subagent 1: Pages 0-1
  Subagent 2: Pages 2-3
  Subagent 3: Pages 4-5
  Subagent 4: Pages 6-7
  Subagent 3: Pages 8-9
  Subagent 2: etc...
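
The straw scan allocation shown above can be simulated with threads sharing a single "next page" counter. This is a sketch under assumptions, not DB2 code: `straw_scan` is a hypothetical function, the range size is fixed at two pages as in the diagram, and which subagent claims which range depends on thread scheduling.

```python
import threading


def straw_scan(num_pages, degree, range_size=2):
    """Let `degree` subagents claim fixed-size page ranges from a shared counter.

    Returns a dict: subagent id -> list of (first_page, last_page) ranges it claimed.
    """
    next_page = 0
    lock = threading.Lock()
    claimed = {agent: [] for agent in range(1, degree + 1)}

    def subagent(agent_id):
        nonlocal next_page
        while True:
            with lock:                       # claim the next range atomically
                if next_page >= num_pages:
                    return                   # nothing left to scan
                start = next_page
                next_page = min(start + range_size, num_pages)
                end = next_page
            # A real subagent would scan pages start..end-1 here.
            claimed[agent_id].append((start, end - 1))

    threads = [threading.Thread(target=subagent, args=(agent,)) for agent in claimed]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return claimed


claimed = straw_scan(num_pages=10, degree=4)
all_ranges = sorted(r for ranges in claimed.values() for r in ranges)
```

Because a subagent asks for a new range only when it has consumed its current one, faster subagents naturally take more ranges, which is the dynamic load balancing the slide refers to.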


Functional parallelism

Divide query task by function
Assign functional task to different execution units
Doesn't require data partitioning
Harder to load balance
  Must ensure execution units are equally busy
DB2 implementation
  Single co-ordinator process services application requests
  Multiple sub-agent processes return data through local table queue
  Only 1 parallelized functional unit (section)


Functional parallelism
Query contains only 2 subsections and 1 local table queue
Runtime operators coordinated using latches, semaphores, shared memory control blocks

Diagram: the co-ordinator runs RETURN (9) over LTQ (8). Each of Subagents 1 to 4 runs the same subsection:

              LTQ
              (8)
               |
            MSJOIN
              (7)
          /----+----\
      TBSCAN       TBSCAN
       (3)          (6)
        |            |
      SORT         SORT
       (2)          (5)
        |            |
      TBSCAN       TBSCAN
       (1)          (4)
        |            |
     PRODUCT      PRODATR


Intra-partition parallelism example


  select p.name, p.prod_id, pa.attribute
  from product p, prodatr pa
  where p.prod_id = pa.prod_id;

Access plan operators (top to bottom) with their annotations:
  LTQ (8): Results returned via shared memory table queue to co-ordinator agent
  MSJOIN (7): Join processed in parallel by each agent by joining corresponding partitions
  TBSCAN (3), TBSCAN (6): Each agent scans a sort partition
  SORT (2), SORT (5): Hash partitioned sorts on prod_id, one partition per agent
  TBSCAN (1) on PRODUCT, TBSCAN (4) on PRODATR: Parallel table scans ("straw" scans)
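
The partitioned sort and per-agent merge join can be sketched as below. The sketch runs the agents one after another rather than in parallel, uses Python's built-in `hash` as a stand-in for the partitioning function, and all function names are hypothetical; it only shows why joining corresponding partitions gives the same result as one big join.

```python
def hash_partition(rows, key, degree):
    """Split rows into `degree` partitions by hashing the join key."""
    parts = [[] for _ in range(degree)]
    for row in rows:
        parts[hash(row[key]) % degree].append(row)
    return parts


def merge_join(left, right, key):
    """Sort both inputs on the key, then merge-join them (handles duplicate keys)."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            k = j
            while k < len(right) and right[k][key] == lk:
                out.append({**left[i], **right[k]})
                k += 1
            i += 1   # keep j so the next left row with the same key matches again
    return out


def partitioned_merge_join(left, right, key, degree):
    """Each 'agent' joins only its own pair of partitions; results are concatenated."""
    lparts = hash_partition(left, key, degree)
    rparts = hash_partition(right, key, degree)
    results = []
    for agent in range(degree):
        results.extend(merge_join(lparts[agent], rparts[agent], key))
    return results


# Made-up rows for the PRODUCT and PRODATR tables of the example query.
product = [{"prod_id": 1, "name": "bolt"}, {"prod_id": 2, "name": "nut"}, {"prod_id": 3, "name": "gear"}]
prodatr = [
    {"prod_id": 1, "attribute": "steel"}, {"prod_id": 1, "attribute": "m8"},
    {"prod_id": 3, "attribute": "brass"}, {"prod_id": 4, "attribute": "orphan"},
]
joined = partitioned_merge_join(product, prodatr, "prod_id", degree=2)
```

Rows with equal prod_id always land in the same partition number on both sides, so no agent ever needs rows held by another agent.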


Intra-partition parallelism architecture


Single query involves
  1 coordinating agent
  n sub agents
  m prefetchers (shared)
  All executing in parallel on available processors
Combination of...
  Data parallelism
    Each agent works on subset of data
    Data dynamically assigned so user not required to partition data
  Functional parallelism
    Each agent works on different query function, e.g. scan, sort
User can control "degree" of parallelism
Also benefits I/O bound uniprocessors

Diagram: at compile time, the SQL query passes through the query optimizer, which produces the best query plan as threaded code; at run time, multiple agents execute the plan, served by shared prefetchers.


DB2 10.1 Multi-core query parallelism


Improved scalability
  Within the current architecture
  Scale near-linearly to degree 32
Achieved by:
  1. Improved load balance
     New rebalance (REBAL) access plan operator
  2. More efficient parallelization techniques
     Move LTQ higher in the access plan
  3. Reduce latch contention


Improved scalability
Load imbalance results in poor scalability
REBAL redistributes rows to ensure all subagents do equal work
Optimizer performs load balance analysis to determine REBAL placement

                  6.77122e+06
                    NLJOIN
                    (   6)
                    713706
                      63
            /---------+----------\
        292.2                  23173.3
        REBAL                   FETCH
        (   7)                  (   9)
        325.265                 2456.85
          11                       2
          |                  /---+----\
        292.2           23173.3    6.77122e+07
        TBSCAN          IXSCAN   TABLE: DB2USER
        (   8)          (  10)    DAILY_SALES
        325.265         1605.23       Q1
          11               1
          |                |
         2922         6.77122e+07
    TABLE: DB2USER   INDEX: SYSIBM
        PERIOD     SQL091218161022180
          Q2               Q1

Chart: Multi-core Query Parallelism, "Before" and "After" panels plotted against degree.
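
As a toy illustration of what rebalancing achieves, the sketch below redistributes a skewed set of per-subagent row lists round-robin. Round-robin is an assumption chosen for simplicity; the slide does not say how REBAL distributes rows, and `rebalance` is a hypothetical function.

```python
def rebalance(rows_per_subagent):
    """Redistribute rows round-robin so every subagent gets a near-equal share."""
    degree = len(rows_per_subagent)
    outputs = [[] for _ in range(degree)]
    position = 0
    for rows in rows_per_subagent:
        for row in rows:
            outputs[position % degree].append(row)
            position += 1
    return outputs


# Skewed input: subagent 0 produced almost all qualifying rows (made-up values).
skewed = [list(range(9)), [], [100], []]
balanced = rebalance(skewed)
sizes = [len(rows) for rows in balanced]
```

After redistribution the subagent workloads differ by at most one row, so the operator above the rebalance point (the NLJOIN in the plan shown) is driven evenly by all subagents.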


Improved scalability
More efficient parallelization techniques
  Partial-final UNIQUE
  GRPBY on unique key
    Can perform complete GRPBY without a partitioned SORT
  Improved access plan parallelization transformation costing
  Improved exploitation of stream partitioning
    Avoid partitioned SORT
Reduce latch contention
  Dynamic straw scan unit (straw gulp size)
  Improved NLJOIN inner access
  Improved HSJOIN
  Improved partitioned SORT
  Prefetcher queues
  Various others


DB2 10.1 Multi-core query parallelism externals


Support mixed workloads
  Parallelize report queries in an OLTP system
  Reduce parallel infrastructure overhead on OLTP queries
    Pre DB2 10.1 there is a 10-15% impact just by setting INTRA_PARALLEL=YES
      In ESE only. DPF unconditionally enables parallel infrastructure
DB2 10.1:
  Use Workload Manager (WLM) to toggle INTRA_PARALLEL and maximum DEGREE for a workload
Improved automatic degree determination
  degree=ANY
  Avoid parallelizing queries that won't benefit
  Improved automatic runtime degree reduction


Controlling query parallelism


WLM workload control:
  An OLTP workload that doesn't use parallelism
    MAXIMUM DEGREE = 1 implies INTRA_PARALLEL=NO
      CREATE WORKLOAD banking_wl APPLNAME('banking') MAXIMUM DEGREE 1;
  A BI workload using parallelism
    MAXIMUM DEGREE > 1 implies INTRA_PARALLEL=YES
    Also specifies the degree upper limit
    The application specifies the requested degree using existing external controls
      CREATE WORKLOAD report_wl APPLNAME('cognos') MAXIMUM DEGREE 8;
      ALTER WORKLOAD report_wl MAXIMUM DEGREE 4;

Application control:
  CALL SYSPROC.ADMIN_SET_INTRA_PARALLEL('YES')
  Toggles intra-partition parallelism at transaction boundaries
    Must not have open cursors across transaction boundaries, e.g. WITH HOLD cursors


Pre-DB2 10.1 intra-partition parallelism external controls


Parameter                  | Value          | Default    | Scope       | Comment
---------------------------|----------------|------------|-------------|--------------------------------------------
INTRA_PARALLEL             | NO, YES        | NO         | Instance    | DBM configuration
MAX_QUERYDEGREE            | ANY, 1~32,767  | ANY        | Instance    | DBM configuration. Valid only if INTRA_PARALLEL is ON
DFT_DEGREE                 | ANY, 1~32,767  |            | Database    | DB configuration. Initial value for CURRENT DEGREE special register or package bind DEGREE option
CURRENT DEGREE             | ANY, 1~32,767  | DFT_DEGREE | Application | Special register, the degree of parallelism considered by the SQL compiler for dynamic SQL access plans
Bind DEGREE                | ANY, 1~32,767  | DFT_DEGREE | Package     | DB2 bind option, the degree of parallelism considered by the SQL compiler for static SQL access plans
SET RUNTIME DEGREE command | 1~32,767       |            | Application | CLP command, the degree of parallelism allowed at runtime for any access plans (dynamic or static SQL)


Appendix

Additional material


Star schemas

Dimension tables
  Contain descriptive information to augment fact rows
  Used to filter fact rows
  Query results are aggregated on dimension attributes
  Contain a primary key
    possibly multiple columns
    generated, meaningless numeric value
  Typically contain far fewer rows than the fact table
  May be represented as a hierarchy of tables or a snowflake
    e.g. product is further normalized to product, brand and category
    but this requires extra joins

Diagram: Product snowflake made up of the Brand, Product and Category tables.


Star schemas
Fact table
  Contains numeric measures of business information
    Queries perform computation (sum, avg, etc.) on measures
  Contains primary key columns from each dimension
    Represent foreign keys referencing each parent dimension
    Can have explicit referential integrity, but not necessary for DB2
  May have a primary key
    Composite of the foreign keys, or
    Single, generated, meaningless numeric value
  Number of rows depends on fact granularity
    hourly, daily, etc.
    finer granularity -> more rows
    coarser granularity -> limits drill-down ability
  Typically, local predicates aren't applied directly


Star schemas

Data Marts
  Can contain multiple fact tables
  Each fact usually denotes a separate star
  Dimensions can be shared across stars
    e.g. Daily_Sales and Daily_Forecast facts can share the Store and Product dimensions
  Queries may join multiple fact tables


ZZJOIN(1) operator
An n-ary join method that joins together the dimension tables/snowflakes and the fact table.
  Drives the process of forming probe rows from dimension tables/snowflakes
  Probes the fact table to find matching fact table rows
  Uses the feedback from the fact table to advance to the next rows on the temporary table over the dimension tables/snowflakes.
Feedback predicates identified in explain information
  New EXPLAIN_PREDICATE.HOW_APPLIED value: FEEDBACK
  Displayed in the ZZJOIN operator details in db2exfmt

  Predicates:
  ----------
  2) Feedback Predicate used in Join,
     Comparison Operator:      Equal (=)
     Subquery Input Required:  No
     Filter Factor:            0.25

     Predicate Text:
     --------------
     (Q3.D2FK = Q1.D2PK)

  3) Feedback Predicate used in Join,
     Comparison Operator:      Equal (=)
     Subquery Input Required:  No
     Filter Factor:            0.25

     Predicate Text:
     --------------
     (Q3.D1FK = Q2.D1PK)


ZZJOIN (2) operator


Only required for all-probes list-prefetch.
Uses the join columns to locate the matching row in the temporary table so that the required non-join columns from the dimension table can be retrieved.
Makes use of an efficient random access method such as FIS or IOT to retrieve the dimension table columns required for subsequent operations.
Also known as back-join
  Indicated in explain by the BACKJOIN argument of the ZZJOIN operator


Star schema plans in DB2 pre-10.1


Type of plan: Hub join
  How does the plan work?
    Cartesian product of dimensions.
    Each row in the Cartesian product probes the multi-column fact table index.
  Prerequisite
    Multi-column index on the fact table on columns that join with the dimensions.

Type of plan: Semi-join with index ANDing
  How does the plan work?
    Pre-filtering of the fact table by dimensions (semi-joins).
    Index ANDing the results of the dimension filtering.
    Completing the dimension join.
  Prerequisite
    Indexes on the fact table on each of the columns that joins with the dimensions (typically, the foreign keys).

Type of plan: Regular (2-way) join
  How does the plan work?
    Most likely plan is to:
      Join the most filtering dimension with the fact table first.
      Join in the rest of the dimensions using a suitable join method such as hash join.
    Other plans are possible.
  Prerequisite
    None.
