
Access Path Selection in a Relational Database Management System
(summarized in section 2)
Processing an SQL statement
parsing, optimization, code generation,
execution
an SQL statement may have many query
blocks (nesting)
Optimizer
validates parsed query
collects statistics on referenced relations &
columns
discovers available access paths for each
relation
checks for type errors in expressions
Access path selection:
determines order of evaluation of query blocks
a tree of alternate path choices is created for
each query block with more than one relation
minimum cost access path is chosen from the
tree
the result of the optimizer is passed to the code
generation and execution components
RSS (Research Storage System)
storage manager for System R
Maintains physical storage, access paths,
locking, logging, and recovery
Relations are stored as a collection of tuples
tuples are stored on 4K pages; pages are
organized into segments
segments completely contain one or more
relations
tuples are accessed via a scan: sequential
scan or index scan
indexes are B-trees with linked leaves
sequential scan touches all the pages of a
segment that contains a relation once
index scans touch all the leaf pages of the
index once; relation pages >=1 times
if index and data tuples are in the same
order, the data is clustered
scans may take a set of predicates to apply
to a tuple before returning it
predicates are of the form (column op value)
Cost computation
cost = page fetches + W*(RSI calls)
cost = IO costs + W * CPU costs
an index that matches a boolean factor of
the query is an efficient access path
Statistics
NCARD(T): cardinality of the relation T
TCARD(T): number of pages used for T
P(T): fraction of pages in a segment used
for T

ICARD(I): number of distinct keys in index
I
NINDX(I): number of pages in index I
Selectivity
column = value : F = 1/ICARD(column) if
there is an index. F = 1/10 otherwise
column1 = column2: F =
1/MAX(ICARD(column1),
ICARD(column2)) if both columns have an index;
F = 1/ICARD(column i) if only column i has
an index; F = 1/10 otherwise
column > value: F = (high key value -
value) / (high key - low key)
column between value1 and value2: F=
(value2 - value1)/ (high key - low key)
column IN (list of values): F = (# of items
in list) * (selectivity for column = value)
max is 1/2
columnA IN subquery: F = (card. of
subquery) / (product of card. of subquery relations)
(pred1) OR (pred2): F = F(pred1) +
F(pred2) - F(pred1) * F(pred2)
(pred1) AND (pred2): F = F(pred1) *
F(pred2)
NOT pred: F = 1 - F(pred)
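The selectivity rules above can be sketched as small Python functions. This is only an illustration: the function names are made up for this sketch, and an index statistic is passed as an optional `icard` argument (None meaning "no index").

```python
# Illustrative helpers; names are hypothetical, formulas follow the list above.
DEFAULT_EQ = 1 / 10  # default used when no index statistics are available


def sel_eq(icard=None):
    """column = value: 1/ICARD if an index exists, else 1/10."""
    return 1 / icard if icard else DEFAULT_EQ


def sel_gt(value, low_key, high_key):
    """column > value: linear interpolation over the key range."""
    return (high_key - value) / (high_key - low_key)


def sel_between(value1, value2, low_key, high_key):
    """column BETWEEN value1 AND value2."""
    return (value2 - value1) / (high_key - low_key)


def sel_in_list(n_items, icard=None):
    """column IN (list): n * F(column = value), capped at 1/2."""
    return min(n_items * sel_eq(icard), 1 / 2)


def sel_or(f1, f2):
    return f1 + f2 - f1 * f2


def sel_and(f1, f2):
    return f1 * f2


def sel_not(f):
    return 1 - f
```

For example, `sel_in_list(3, icard=4)` hits the 1/2 cap, because 3 * 1/4 would otherwise exceed it.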
QCARD
QCARD is (product of card. of all relations) *
(product of F(pred i))
RSICARD is the expected number of calls
to RSI
(product of card. of all relations) *
(product of F(sargable pred i))
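As a rough worked example of the two estimates, the sketch below multiplies relation cardinalities by selectivity factors. The function names and the numbers are assumptions chosen for illustration only.

```python
from math import prod


def qcard(cardinalities, factors):
    """Expected result size: product of relation cardinalities
    times the product of all selectivity factors."""
    return prod(cardinalities) * prod(factors)


def rsicard(cardinalities, sargable_factors):
    """Expected RSI calls: same shape as qcard, but only the
    sargable predicates (those applied inside the scan) count."""
    return prod(cardinalities) * prod(sargable_factors)


# Hypothetical query block: two relations, three predicates,
# of which the first two are sargable.
cards = [1000, 200]
all_factors = [0.5, 0.25, 0.5]
sargable = all_factors[:2]

estimated_rows = qcard(cards, all_factors)
estimated_rsi_calls = rsicard(cards, sargable)
```

Because the sargable factors are a subset of all the factors, RSICARD is never smaller than QCARD.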
An interesting order is an order specified
by the GROUP BY or ORDER BY clause
Single relation cost: cheapest access path
which produces the interesting order or
cheapest access path plus sorting cost of
result
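The single-relation choice can be sketched as a comparison between the cheapest path that already delivers the wanted order and the cheapest path overall plus a sort. The tuple layout, the path names, and the `sort_cost` parameter are assumptions of this sketch, not part of the original notes.

```python
def best_single_relation_plan(paths, sort_cost, wanted_order=None):
    """paths: list of (name, cost, order_produced or None).
    Returns (name, total_cost, needs_sort)."""
    cheapest = min(paths, key=lambda p: p[1])
    if wanted_order is None:
        # No interesting order requested: take the cheapest path as is.
        return cheapest[0], cheapest[1], False

    # Option A: cheapest path overall, then sort the result.
    with_sort = (cheapest[0], cheapest[1] + sort_cost, True)

    # Option B: cheapest path that already produces the wanted order.
    ordered = [p for p in paths if p[2] == wanted_order]
    if ordered:
        best_ordered = min(ordered, key=lambda p: p[1])
        if best_ordered[1] <= with_sort[1]:
            return best_ordered[0], best_ordered[1], False
    return with_sort


# Hypothetical access paths for one relation.
example_paths = [("segment scan", 100, None), ("index on DNO", 140, "DNO")]
```

With these made-up costs, the index path wins when sorting is expensive, and the segment scan plus sort wins when sorting is cheap.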
Cost Table (p. 515)
index pages fetched plus data pages fetched
plus W times RSI tuple retrieval calls.
unique index matching an equal predicate:
1+1+W
clustered index I matching one or more
boolean factors:
F(preds) *(NINDX(I) + TCARD) + W *
RSICARD
etc
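The two cost-table rows shown above can be written out directly from cost = page fetches + W * (RSI calls). The function names and the sample statistics are hypothetical; only the formulas come from the notes.

```python
def cost(page_fetches, rsi_calls, w):
    """Generic cost: I/O (page fetches) plus weighted CPU (RSI calls)."""
    return page_fetches + w * rsi_calls


def cost_unique_index_eq(w):
    """Unique index matching an equal predicate:
    one index page + one data page + one RSI call."""
    return cost(1 + 1, 1, w)


def cost_clustered_index(f_preds, nindx, tcard, rsicard, w):
    """Clustered index matching one or more boolean factors:
    F(preds) * (NINDX(I) + TCARD) + W * RSICARD."""
    return cost(f_preds * (nindx + tcard), rsicard, w)


# Hypothetical statistics: 40 index pages, 400 data pages,
# selectivity 0.25, 100 expected RSI calls, W = 0.5.
unique_cost = cost_unique_index_eq(w=0.5)
clustered_cost = cost_clustered_index(0.25, 40, 400, 100, w=0.5)
```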
Joins
nested loops and merging scans
merging scans require sorts on the join
column -- another interesting order
n-way joins can be done by a succession of
2-way joins; not necessarily using the same
technique. Results may be pipelined if a sort
is not required.
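The two join methods can be illustrated on in-memory lists of tuples. This is a minimal sketch, not the System R implementation: real scans go through the RSS, and here the merging-scans version simply sorts both inputs itself to make the sort requirement explicit.

```python
def nested_loops_join(outer, inner, ko, ki):
    """For each outer tuple, scan the whole inner relation."""
    return [o + i for o in outer for i in inner if o[ko] == i[ki]]


def merging_scans_join(outer, inner, ko, ki):
    """Both inputs are sorted on the join column, then scanned in step."""
    outer = sorted(outer, key=lambda t: t[ko])
    inner = sorted(inner, key=lambda t: t[ki])
    result = []
    j = 0  # start of the inner group that may match the current outer key
    for o in outer:
        # Skip inner tuples whose key is smaller than the outer key.
        while j < len(inner) and inner[j][ki] < o[ko]:
            j += 1
        # Emit every inner tuple in the matching group; j stays put so
        # the next outer tuple with the same key sees the group again.
        k = j
        while k < len(inner) and inner[k][ki] == o[ko]:
            result.append(o + inner[k])
            k += 1
    return result


# Hypothetical relations: EMP(id, name, dno) and DEPT(dno, dname).
emp = [(1, "a", 10), (2, "b", 20), (3, "c", 10)]
dept = [(10, "x"), (20, "y"), (30, "z")]
```

Both functions return the same set of joined tuples; the merging-scans output additionally comes back ordered on the join column, which is why that order counts as interesting.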
Join ordering
n! permutations of relation join orders
joining the (k+1)-st relation to the previous k
relations is independent of the order in which
the first k were joined
avoid Cartesian products when possible;
make them as late as possible
Construct a tree
construct a tree of possible join orderings:
keep the cheapest order that produces an
interesting ordering.
First find the best way to access each single
relation for each interesting ordering and
unordered.
Next, find the best way of joining any
relation to each of these.
Repeat until all relations have been added to
each branch
Choose the cheapest strategy that has an
interesting ordering, or the cheapest strategy
plus a sort.
Total number of solutions to store: 2^n
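The tree construction can be sketched as dynamic programming over subsets of relations. This is a simplified illustration: it keeps one best plan per subset and ignores interesting orders, the cost model (`join_cost`) is a toy assumption, and all names and statistics below are hypothetical.

```python
from itertools import combinations
from math import prod


def best_join_order(relations, access_cost, join_cost, connected):
    """Return ((cost, order), number_of_stored_solutions).
    best maps each subset of relations to its cheapest (cost, order)."""
    best = {frozenset([r]): (access_cost[r], (r,)) for r in relations}
    for size in range(2, len(relations) + 1):
        for combo in combinations(relations, size):
            subset = frozenset(combo)
            candidates = []
            for r in combo:
                rest = subset - {r}  # the relations already joined
                cost, order = best[rest]
                candidates.append(
                    (connected(rest, r), cost + join_cost(rest, r), order + (r,))
                )
            # Prefer extensions linked by a join predicate; fall back to
            # a Cartesian product only when nothing else is available.
            linked = [c for c in candidates if c[0]]
            _, cost, order = min(linked or candidates, key=lambda c: c[1])
            best[subset] = (cost, order)
    return best[frozenset(relations)], len(best)


# Toy statistics and join predicates (assumptions for illustration).
card = {"A": 1000, "B": 10, "C": 100}
edges = {frozenset("AB"), frozenset("BC")}


def connected(rest, r):
    return any(frozenset((x, r)) in edges for x in rest)


def join_cost(rest, r):
    # Toy nested-loops-style cost that ignores selectivity.
    return prod(card[x] for x in rest) * card[r]


(plan_cost, plan_order), stored = best_join_order(
    list(card), card, join_cost, connected
)
```

The dictionary holds one entry per non-empty subset of the relations, which is where the 2^n bound on stored solutions comes from; tracking interesting orders would multiply that by the number of such orders.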
