Oracle Join Algorithms

http://oracle-online-help.blogspot.com/2007/03/nested-loops-hash-join-and-sort-merge.html
Nested loop (loop over loop)

In this algorithm, an outer loop is formed over a small number of entries, and then, for each entry, an inner loop is processed.
Ex:
Select tab1.*, tab2.* from tab1, tab2 where tab1.col1=tab2.col2;
It is processed like this:
BEGIN
  FOR i IN (SELECT * FROM tab1) LOOP
    FOR j IN (SELECT * FROM tab2 WHERE col2 = i.col1) LOOP
      DBMS_OUTPUT.PUT_LINE(i.col1 || ' ' || j.col2);  -- display the joined row
    END LOOP;
  END LOOP;
END;
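The same loop-over-loop idea can be sketched outside the database. This is a minimal Python sketch, not Oracle's implementation. It assumes each table is a list of dicts. `nested_loop_join` and the sample rows are hypothetical and only mirror the tab1.col1 = tab2.col2 example above.

```python
from typing import Any, Iterator

Row = dict[str, Any]


def nested_loop_join(outer: list[Row], inner: list[Row],
                     outer_key: str, inner_key: str) -> Iterator[tuple[Row, Row]]:
    """Hypothetical nested loop join: rescan the inner input for every outer row."""
    for o in outer:                          # outer (driving) loop
        for i in inner:                      # inner (driven) loop
            if o[outer_key] == i[inner_key]:
                yield o, i                   # each match is returned immediately


# Made-up sample rows for illustration only.
tab1 = [{"col1": 1, "a": "x"}, {"col1": 2, "a": "y"}]
tab2 = [{"col2": 2, "b": "p"}, {"col2": 1, "b": "q"}, {"col2": 2, "b": "r"}]

for row1, row2 in nested_loop_join(tab1, tab2, "col1", "col2"):
    print(row1["a"], row2["b"])
```

This prints "x q", then "y p" and "y r". Results come out in the order of the outer (driving) input.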
The steps involved in a nested loop join are:
a) Identify the outer (driving) table.
b) Assign the inner (driven) table to the outer table.
c) For every row of the outer table, access the rows of the inner table.
In the execution plan it appears like this:
NESTED LOOPS
  outer_loop
  inner_loop
When does the optimizer use nested loops?
The optimizer uses a nested loop when joining tables that return a small number of rows with an efficient driving condition. It is important to have an index on the join column of the inner table, because this table is probed again for every new value coming from the outer table.
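The role of the inner-table index can be sketched as follows. This is a minimal Python sketch under stated assumptions. A plain dict stands in for the index (Oracle would use a B-tree such as E_DEPTNO). `build_index` and `indexed_nested_loop_join` are hypothetical helpers, and the EMP/DEPT-shaped rows are made up. Each outer row now costs one lookup instead of a full scan of the inner table.

```python
from collections import defaultdict
from typing import Any, Iterator

Row = dict[str, Any]


def build_index(rows: list[Row], key: str) -> dict[Any, list[Row]]:
    """Hypothetical stand-in for an index on the inner table's join column.

    A Python dict is used here; a real Oracle index would be a B-tree.
    """
    index: dict[Any, list[Row]] = defaultdict(list)
    for r in rows:
        index[r[key]].append(r)
    return index


def indexed_nested_loop_join(outer: list[Row], inner_index: dict[Any, list[Row]],
                             outer_key: str) -> Iterator[tuple[Row, Row]]:
    for o in outer:
        # One index lookup per outer row instead of a full scan of the inner table.
        for i in inner_index.get(o[outer_key], []):
            yield o, i


# Made-up rows shaped like SCOTT's DEPT (outer) and EMP (inner).
dept = [{"deptno": 10, "dname": "ACCOUNTING"},
        {"deptno": 20, "dname": "RESEARCH"},
        {"deptno": 40, "dname": "OPERATIONS"}]
emp = [{"ename": "KING", "deptno": 10},
       {"ename": "SMITH", "deptno": 20},
       {"ename": "CLARK", "deptno": 10}]

e_deptno = build_index(emp, "deptno")
for d, e in indexed_nested_loop_join(dept, e_deptno, "deptno"):
    print(e["ename"], d["dname"])
```

DEPT drives the loop here, as D does in the nested loop plan in the comment below. Department 40 simply finds nothing in the index.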
The optimizer may not use a nested loop when:
1. The number of rows in both tables is quite high.
2. The inner query always returns the same set of records.
3. The access path of the inner table is independent of the data coming from the outer table.
Note: You will see nested loops used more often with the FIRST_ROWS optimizer mode, because that mode works on the model of showing results to the user instantly, as they are fetched. A nested loop does not need to cache any data before it is returned to the user. A hash join does need to, as explained below.
Hash join
Hash joins are used when joining large tables. The optimizer uses the smaller of the two tables to build a hash table in memory. It then scans the larger table and compares the hash value of each of its rows with this hash table to find the joined rows.
The hash join algorithm is divided into two parts:
1. Build an in-memory hash table on the smaller of the two tables.
2. Probe this hash table with the hash value of each row of the second table.
In simpler terms, it works like this:

Build phase
For each row RW1 in small (left/build) table loop
  Calculate hash value on RW1 join key
  Insert RW1 in appropriate hash bucket
End loop;
Probe phase
For each row RW2 in big (right/probe) table loop
  Calculate the hash value on RW2 join key
  For each row RW1 in the matching hash bucket loop
    If RW1 joins with RW2
      Return RW1, RW2
  End loop;
End loop;
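The build and probe phases can be sketched in Python. This is a minimal, hypothetical in-memory version, not Oracle's implementation. Python's built-in `hash()` stands in for Oracle's hash function. `n_buckets` is an assumed fixed bucket count, and the sketch ignores memory limits and spilling to temporary space. The equality check in the probe loop is needed because different keys can land in the same bucket.

```python
from typing import Any, Iterator

Row = dict[str, Any]


def hash_join(build: list[Row], probe: list[Row], build_key: str, probe_key: str,
              n_buckets: int = 8) -> Iterator[tuple[Row, Row]]:
    """Hypothetical in-memory hash join following the build/probe pseudocode."""
    # Build phase: place each row of the (smaller) build input in a hash bucket.
    buckets: list[list[Row]] = [[] for _ in range(n_buckets)]
    for rw1 in build:
        buckets[hash(rw1[build_key]) % n_buckets].append(rw1)

    # Probe phase: hash each probe row and scan only the matching bucket.
    for rw2 in probe:
        for rw1 in buckets[hash(rw2[probe_key]) % n_buckets]:
            if rw1[build_key] == rw2[probe_key]:   # different keys may share a bucket
                yield rw1, rw2


# Made-up rows: DEPT is the small build input, EMP the larger probe input.
dept = [{"deptno": 10, "dname": "ACCOUNTING"}, {"deptno": 20, "dname": "RESEARCH"}]
emp = [{"ename": "SMITH", "deptno": 20}, {"ename": "KING", "deptno": 10},
       {"ename": "FORD", "deptno": 20}, {"ename": "ADAMS", "deptno": 99}]

for d, e in hash_join(dept, emp, "deptno", "deptno"):
    print(e["ename"], d["dname"])
```

With `n_buckets=1`, every row lands in the same bucket. The join still returns the same rows, only with more comparisons.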
When does the optimizer use a hash join?
The optimizer uses a hash join when joining big tables or a big fraction of small tables.
Unlike a nested loop, a hash join does not return output instantly, because it is blocked until the hash table has been built.
Note: You may see more hash joins with the ALL_ROWS optimizer mode, because that mode works on the model of showing results only after all the rows of at least one of the tables have been hashed into the hash table.
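The difference in time to first row can be made concrete with Python generators. This is only an analogy under stated assumptions: `CountingScan` is a hypothetical row source that counts rows pulled from it, and both joins are simplified equi-joins on integer keys. The nested loop produces its first row after reading one outer row. The hash join has to read its entire build input before it can return anything.

```python
from typing import Iterable, Iterator


class CountingScan:
    """Hypothetical row source that counts how many rows a join has pulled from it."""

    def __init__(self, keys: list[int]) -> None:
        self.keys = keys
        self.reads = 0

    def __iter__(self) -> Iterator[int]:
        for k in self.keys:
            self.reads += 1
            yield k


def nested_loop(outer: Iterable[int], inner: list[int]) -> Iterator[tuple[int, int]]:
    for o in outer:
        for i in inner:
            if o == i:
                yield o, i


def hash_join(build: Iterable[int], probe: Iterable[int]) -> Iterator[tuple[int, int]]:
    table: dict[int, list[int]] = {}
    for b in build:                  # the whole build input is consumed first
        table.setdefault(b, []).append(b)
    for p in probe:
        for b in table.get(p, []):
            yield b, p


keys = list(range(10_000))

outer = CountingScan(keys)
next(nested_loop(outer, keys))
print("nested loop, outer rows read before first row:", outer.reads)  # 1

build = CountingScan(keys)
next(hash_join(build, keys))
print("hash join, build rows read before first row:", build.reads)    # 10000
```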
Sort merge join
A sort merge join is used to join two independent data sources. It performs better than a nested loop when the volume of data in the tables is large, but in general not as well as a hash join.
It performs better than a hash join when the join columns are already sorted, so that no sorting is required.
(A merge join basically sorts all relevant rows in the first table by the join key, also sorts the relevant rows in the second table by the join key, and then merges these sorted rows.
Take an example! At a garage sale you can buy 400 books. The deal is to take all or none. You take all. Now you have to find the books that you already have at home. How would you go about it? Probably you'd do a merge join: first you sort your books by the primary key (author, title), then you sort the 400 books by their primary key (author, title). Now you start at the top of both piles. If the value of the left pile's primary key is higher, you take a book from the right pile, and vice versa. When both values are equal, you have found a duplicate. To demonstrate a MERGE JOIN, two tables need to be created.)
The full operation is done in two parts:
1. Sort join operation
   sort input1 on the join key
   sort input2 on the join key
   get first row RW1 from input1
   get first row RW2 from input2
2. Merge join operation
   while not at the end of either input loop
     if RW1 joins with RW2
       return (RW1, RW2)
       get next row RW2 from input2
     else if RW1 < RW2
       get next row RW1 from input1
     else
       get next row RW2 from input2
   end loop
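The two parts can be sketched in Python. This is a minimal, hypothetical sort merge equi-join on a single key, not Oracle's implementation. The simplified pseudocode above only advances RW2 after a match. This sketch also handles duplicate keys on both sides by pairing each run of equal keys. The `presorted` flag is an assumption that mirrors the note that the sort step is skipped for already-sorted data.

```python
from typing import Any

Row = dict[str, Any]


def sort_merge_join(left: list[Row], right: list[Row], key: str,
                    presorted: bool = False) -> list[tuple[Row, Row]]:
    """Hypothetical sort merge equi-join on one key column."""
    # Sort join: order both inputs by the join key (skipped when already sorted).
    if not presorted:
        left = sorted(left, key=lambda r: r[key])
        right = sorted(right, key=lambda r: r[key])

    out: list[tuple[Row, Row]] = []
    i = j = 0
    # Merge join: walk both sorted inputs from the top, like the two book piles.
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Equal keys: find the run of this key on the right side,
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # then pair it with every left row carrying the same key.
            while i < len(left) and left[i][key] == lk:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


# Made-up EMP/DEPT-shaped rows, deliberately unsorted.
emp = [{"ename": "SMITH", "deptno": 20}, {"ename": "KING", "deptno": 10},
       {"ename": "FORD", "deptno": 20}]
dept = [{"deptno": 20, "dname": "RESEARCH"}, {"deptno": 10, "dname": "ACCOUNTING"},
        {"deptno": 30, "dname": "SALES"}]

for e, d in sort_merge_join(emp, dept, "deptno"):
    print(e["ename"], d["dname"])
```

The output comes out ordered by the join key (KING, then SMITH and FORD). That ordering is one reason a merge join can pair well with an ORDER BY on the join column.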
Note: If the data is already sorted, the sorting in the first step is skipped.
An important point to understand: in a nested loop, the driven (inner) table is read as many times as there are rows coming from the outer table, whereas in a sort merge join each of the tables involved is accessed at most once. So a sort merge join can outperform a nested loop when the data set is large.
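The "read many times" versus "read at most once" difference can be counted directly. This is a toy Python sketch under loud assumptions. `Table` is a hypothetical key list that counts rows read. The nested loop has no index on the inner table. The merge assumes both inputs are already sorted with unique keys, and sorting cost is ignored.

```python
from typing import Iterator


class Table:
    """Hypothetical table of join keys that counts every row read from it."""

    def __init__(self, keys: list[int]) -> None:
        self.keys = keys
        self.rows_read = 0

    def scan(self) -> Iterator[int]:
        for k in self.keys:
            self.rows_read += 1
            yield k


def nested_loop(outer: Table, inner: Table) -> int:
    matches = 0
    for o in outer.scan():
        for i in inner.scan():       # the inner table is read again for every outer row
            if o == i:
                matches += 1
    return matches


def merge(left: Table, right: Table) -> int:
    # Assumes both inputs are already sorted and have unique keys.
    matches = 0
    ls, rs = left.scan(), right.scan()
    lk, rk = next(ls, None), next(rs, None)
    while lk is not None and rk is not None:
        if lk == rk:
            matches += 1
            lk, rk = next(ls, None), next(rs, None)
        elif lk < rk:
            lk = next(ls, None)
        else:
            rk = next(rs, None)
    return matches


a, b = Table(list(range(100))), Table(list(range(100)))
print(nested_loop(a, b), a.rows_read, b.rows_read)   # 100 100 10000

c, d = Table(list(range(100))), Table(list(range(100)))
print(merge(c, d), c.rows_read, d.rows_read)         # 100 100 100
```

Both joins find the same 100 matches. The unindexed nested loop reads the inner table 100 times over, while the merge reads each input once.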
When does the optimizer use a sort merge join?
a) When the join condition is an inequality (such as <, <=, >, >=). A hash join cannot be used for inequality conditions, and if the data set is large, a nested loop is usually too expensive.
b) When sorting is required anyway for some reason other than the join, such as an ORDER BY, the optimizer may prefer a sort merge join over a hash join because it is cheaper.
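Why ordering helps with inequality joins, where hashing cannot, can be sketched in Python. Once the right input is sorted, "all keys greater than x" is a contiguous tail of the list. A hash bucket only groups equal keys. This is a minimal, hypothetical sort-based join for `left.key < right.key`, not Oracle's implementation.

```python
def less_than_join(left: list[int], right: list[int]) -> list[tuple[int, int]]:
    """Hypothetical sort-based join for the condition left.key < right.key."""
    left_sorted, right_sorted = sorted(left), sorted(right)   # sort step on both inputs
    out: list[tuple[int, int]] = []
    start = 0
    for lk in left_sorted:
        # Because left is ascending, the first right key greater than lk can
        # only move forward, so `start` never goes back.
        while start < len(right_sorted) and right_sorted[start] <= lk:
            start += 1
        out.extend((lk, rk) for rk in right_sorted[start:])
    return out


print(less_than_join([3, 1], [2, 4, 1]))   # [(1, 2), (1, 4), (3, 4)]
```

Sorting both sides is what lets the `start` pointer move in one direction only. That mirrors the single forward pass of the merge phase.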
Note: Sort merge joins can be seen with both the ALL_ROWS and FIRST_ROWS optimizer modes, because a sort merge join works on the model of first sorting both data sources and then returning results. So if the data set is large and FIRST_ROWS is the optimizer goal, the optimizer may prefer a sort merge join over a nested loop because of the data volume. And if ALL_ROWS is the optimizer goal and an inequality condition is used in the SQL, the optimizer may use a sort merge join instead of a hash join.
Posted by Sachin at Friday, March 02, 2007

Labels: CBO, Tuning
28 comments:
Sachin said...
I wanted to put some examples in the post itself, but missed it earlier.
Here they are:
SQL> conn scott/*****
Connected.
SQL> create table e as select * from emp;
Table created.
SQL> create table d as select * from dept;
Table created.
SQL> create index e_deptno on e(deptno);
Index created.
Gather stats for D as it is:
SQL> exec dbms_stats.gather_table_stats('SCOTT','D')
PL/SQL procedure successfully completed.
Set artificial stats for E:
SQL> exec dbms_stats.set_table_stats(ownname => 'SCOTT', tabname => 'E', numrows => 100, numblks => 100, avgrlen => 124);
Set artificial stats for the E_DEPTNO index:
SQL> exec dbms_stats.set_index_stats(ownname => 'SCOTT', indname => 'E_DEPTNO', numrows => 100, numlblks => 10);
Check out the plan:

A) With fewer rows (100 in E), you will see a nested loop being used.
SQL> select e.ename,d.dname from e, d where e.deptno=d.deptno;
Execution Plan
----------------------------------------------------------
Plan hash value: 3204653704
-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |   100 |  2200 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | E        |    25 |   225 |     1   (0)| 00:00:01 |
|   2 |   NESTED LOOPS               |          |   100 |  2200 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | D        |     4 |    52 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | E_DEPTNO |    33 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
B) Let us set some more artificial stats to see which plan gets used:
1000000, numblks => 10000, avgrlen => 124);

SQL> exec dbms_stats.set_table_stats(ownname => 'SCOTT', tabname => 'D', numrows => 1000000, numblks => 10000, avgrlen => 124);
Now both E and D have 1,000,000 rows, and the index on E(DEPTNO) reflects the same.
The plan changes!
SQL> select e.ename,d.dname from e, d where e.deptno=d.deptno;
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   250G|  5122G|       |  3968K(100)| 13:13:45 |
|*  1 |  HASH JOIN         |      |   250G|  5122G|    20M|  3968K(100)| 13:13:45 |
|   2 |   TABLE ACCESS FULL| E    |  1000K|  8789K|       |  2246   (3)| 00:00:27 |
|   3 |   TABLE ACCESS FULL| D    |  1000K|    12M|       |  2227   (2)| 00:00:27 |
-----------------------------------------------------------------------------------
C) Now, to test MERGE JOIN, we set a moderate number of rows and add some ordering.
10000, numblks => 1000, avgrlen => 124);

SQL> exec dbms_stats.set_table_stats(ownname => 'SCOTT', tabname => 'D', numrows => 1000, numblks => 100, avgrlen => 124);
SQL> select e.ename,d.dname from e, d where e.deptno=d.deptno order by e.deptno;
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |  2500K|    52M|   167  (26)| 00:00:02 |
|   1 |  MERGE JOIN                  |          |  2500K|    52M|   167  (26)| 00:00:02 |
|   2 |   TABLE ACCESS BY INDEX ROWID| E        | 10000 | 90000 |   102   (1)| 00:00:02 |
|   3 |    INDEX FULL SCAN           | E_DEPTNO | 10000 |       |   100   (0)| 00:00:02 |
|*  4 |   SORT JOIN                  |          |  1000 | 13000 |    25   (4)| 00:00:01 |
|   5 |    TABLE ACCESS FULL         | D        |  1000 | 13000 |    24   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Hope these examples help in learning ...
