complexity
➢ Either case is infeasible
Solution: Set-at-a-time approach
Uses the following mechanism:
➢ Positional representation of occurrences of XML
elements and string values
➢ Element
• 3-tuple (DocId, StartPos:EndPos, LevelNum)
➢ String
• 3-tuple (DocId, StartPos, LevelNum)
Positional Representation
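A minimal sketch of how such a numbering could be produced and used, assuming one counter tick per element open and close. String values are omitted, and the names `Pos`, `number_elements`, `is_ancestor` and `is_parent` are illustrative, not taken from the slides.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

class Pos(NamedTuple):
    doc_id: int
    start: int
    end: int
    level: int

def number_elements(xml_text, doc_id=1):
    """Assign (DocId, StartPos:EndPos, LevelNum) to each element, depth first."""
    root = ET.fromstring(xml_text)
    counter = 0
    numbered = []

    def visit(elem, level):
        nonlocal counter
        counter += 1              # position of the opening tag
        start = counter
        for child in elem:
            visit(child, level + 1)
        counter += 1              # position of the closing tag
        numbered.append((elem.tag, Pos(doc_id, start, counter, level)))

    visit(root, 1)
    numbered.sort(key=lambda item: item[1].start)
    return numbered

def is_ancestor(a, d):
    # a contains d: same document and d's interval lies strictly inside a's
    return a.doc_id == d.doc_id and a.start < d.start and d.end < a.end

def is_parent(a, d):
    # parent-child is containment plus a level difference of exactly one
    return is_ancestor(a, d) and d.level == a.level + 1

doc = dict(number_elements("<book><title/><author><name/></author></book>"))
print(doc["book"], doc["name"])
print(is_ancestor(doc["book"], doc["name"]), is_parent(doc["book"], doc["name"]))
```

With this encoding, both structural relationships reduce to integer comparisons on two tuples, so no tree navigation is needed at join time.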
Structural Joins
• Set-at-a-time approach
• Uses positional representation of XML
elements.
• I/O and CPU optimal
Structural Join
• Goal: join two lists based on either a parent-child or
an ancestor-descendant relationship
• Input :
➢ AList (a.DocID, a.StartPos : a.EndPos, a.LevelNum)
➢ DList (d.DocID, d.StartPos : d.EndPos, d.LevelNum)
• Tree-merge Algorithm
• Stack-tree Algorithm
Algorithm Tree-Merge-Anc
• Output : ordered by ancestors
• Algorithm :
Loop through list of ancestors in increasing order of
startPos
➢ For each ancestor, skip over unmatchable descendants
➢ Check for ancestor-descendant relationship
( or parent-child relationship )
➢ Append result to output list
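The loop above can be sketched as follows. This is an illustrative version, assuming both lists are sorted by (DocId, StartPos). The `Node` type, the `parent_child` flag and the sample positions are assumptions made for the example.

```python
from typing import NamedTuple

class Node(NamedTuple):
    name: str
    doc_id: int
    start: int
    end: int
    level: int

def tree_merge_anc(alist, dlist, parent_child=False):
    """Join sorted AList and DList; output is ordered by ancestor."""
    out = []
    begin = 0  # first descendant that can still match the current ancestor
    for a in alist:
        # Skip descendants that start before a: they cannot match a
        # or any later ancestor.
        while begin < len(dlist) and \
                (dlist[begin].doc_id, dlist[begin].start) < (a.doc_id, a.start):
            begin += 1
        # Scan the descendants that start inside a's interval.
        i = begin
        while i < len(dlist) and dlist[i].doc_id == a.doc_id \
                and dlist[i].start < a.end:
            d = dlist[i]
            if d.end < a.end and (not parent_child or d.level == a.level + 1):
                out.append((a.name, d.name))
            i += 1
    return out

# Hypothetical positions: a1 contains everything, a2 contains d1, a3 contains d2.
alist = [Node("a1", 1, 1, 20, 1), Node("a2", 1, 2, 7, 2), Node("a3", 1, 10, 15, 2)]
dlist = [Node("d1", 1, 3, 4, 3), Node("d2", 1, 11, 12, 3)]
print(tree_merge_anc(alist, dlist))
print(tree_merge_anc(alist, dlist, parent_child=True))
```

Note that the inner scan restarts from `begin` for every ancestor, so nested ancestors rescan the same descendants.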
Case for Tree-Merge-Anc
Analysis
• Ancestor-descendant relationship
➢ |Output List| = O(|AList| * |DList|)
➢ Worst-case time complexity is O( |AList| * |DList| ), which is
optimal because the output itself can be that large
➢ But poor I/O performance
• Parent-child relationship
➢ |Output List| = O(|AList| + |DList|)
➢ Worst-case time complexity is still O (|AList| * |DList| ),
even though the output is only linear in size
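The parent-child gap between output size and running time can be made concrete with a small experiment. The sketch below assumes a single document and a hand-built worst case: n ancestors nested inside each other, with n leaf descendants that are all children of the innermost one. The function names and the position formulas are assumptions for illustration.

```python
def nested_worst_case(n):
    """n nested ancestors a_1 ... a_n and n leaves under a_n, as (start, end, level)."""
    alist = [(i, 4 * n + 1 - i, i) for i in range(1, n + 1)]
    dlist = [(n + 2 * j - 1, n + 2 * j, n + 1) for j in range(1, n + 1)]
    return alist, dlist

def tree_merge_anc_parent_child(alist, dlist):
    """Tree-Merge-Anc for parent-child; also counts descendant comparisons."""
    out = []
    comparisons = 0
    begin = 0
    for a_start, a_end, a_level in alist:
        # Skip descendants that start before the current ancestor.
        while begin < len(dlist) and dlist[begin][0] < a_start:
            begin += 1
        i = begin
        while i < len(dlist) and dlist[i][0] < a_end:
            comparisons += 1
            d_start, d_end, d_level = dlist[i]
            if d_end < a_end and d_level == a_level + 1:
                out.append((a_start, d_start))
            i += 1
    return out, comparisons

alist, dlist = nested_worst_case(100)
pairs, comparisons = tree_merge_anc_parent_child(alist, dlist)
print(len(pairs), comparisons)
```

Every ancestor scans every descendant, yet only the innermost ancestor produces parent-child pairs, so the work grows quadratically while the output grows linearly.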
Tree-Merge-Desc Algorithm
• Output : ordered by descendants
• Algorithm :
Loop over Descendants list in increasing order of startPos
➢ For each descendant, skip over unmatchable ancestors
➢ Check for ancestor-descendant relationship
( or parent-child relationship )
➢ Append result to output list
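A comparable sketch for the descendant-ordered variant, again assuming both lists are sorted by (DocId, StartPos). The `Node` type, the `parent_child` flag and the sample positions are assumptions for illustration.

```python
from typing import NamedTuple

class Node(NamedTuple):
    name: str
    doc_id: int
    start: int
    end: int
    level: int

def tree_merge_desc(alist, dlist, parent_child=False):
    """Join sorted AList and DList; output is ordered by descendant."""
    out = []
    begin = 0  # first ancestor that can still match the current descendant
    for d in dlist:
        # Skip leading ancestors that end before d starts: they cannot
        # contain d or any later descendant.
        while begin < len(alist) and \
                (alist[begin].doc_id, alist[begin].end) < (d.doc_id, d.start):
            begin += 1
        # Scan the ancestors that start before d.
        i = begin
        while i < len(alist) and \
                (alist[i].doc_id, alist[i].start) < (d.doc_id, d.start):
            a = alist[i]
            if d.end < a.end and (not parent_child or d.level == a.level + 1):
                out.append((a.name, d.name))
            i += 1
    return out

# Hypothetical positions: a1 contains everything, a2 contains d1, a3 contains d2.
alist = [Node("a1", 1, 1, 20, 1), Node("a2", 1, 2, 7, 2), Node("a3", 1, 10, 15, 2)]
dlist = [Node("d1", 1, 3, 4, 3), Node("d2", 1, 11, 12, 3)]
print(tree_merge_desc(alist, dlist))
```

In this sample, a2 is scanned again for d2 even though it ended before d2 started, because the skip step only advances past a leading run of finished ancestors and a1 is still open.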
Case for Tree-Merge-Desc
Analysis
• Ancestor-descendant relationship
➢ |Output List| = O( |AList| * |DList| )
➢ Time complexity: O( |AList| * |DList| )
• Parent-child relationship
➢ |Output List| = O(|AList| + |DList|)
➢ Space and Time complexities are
O (|AList| * |DList| )
• Tree-Merge algorithms are not I/O optimal
• Repetitive accesses to Anc or Desc list
Motivation for Stack-Tree Algorithm
• Basic idea: depth first traversal of XML tree
➢ takes linear time with stack of size equal to tree depth
➢ all ancestor-descendant relationships appear on stack
during traversal
• Main problem: we do not want to traverse the whole
database, just the nodes in AList or DList
• Solution : Stack-Tree algorithm
➢ Stack: sequence of nested nodes from AList, each one a
descendant of the node below it
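The basic idea can be illustrated with a plain depth-first traversal that keeps the current path on a stack. This sketch walks the whole document, which is exactly the cost the Stack-Tree algorithm avoids. The function name and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET

def all_ancestor_descendant_pairs(xml_text):
    """Depth-first walk; the stack always holds the ancestors of the current node."""
    root = ET.fromstring(xml_text)
    pairs = []
    stack = []
    max_depth = 0

    def visit(elem):
        nonlocal max_depth
        # Everything currently on the stack is an ancestor of elem.
        for anc in stack:
            pairs.append((anc.tag, elem.tag))
        stack.append(elem)
        max_depth = max(max_depth, len(stack))
        for child in elem:
            visit(child)
        stack.pop()

    visit(root)
    return pairs, max_depth

pairs, depth = all_ancestor_descendant_pairs("<a><b><c/></b><d/></a>")
print(pairs, depth)
```

The stack never grows beyond the depth of the tree, and each node is pushed and popped once.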
Stack-Tree-Desc
• Initialize start pointers (a*, d*, s->top)
• While input lists are not empty or stack is not empty
➢ if new nodes (a* and d*) are not descendants of current
top of stack, pop the stack
• else
➢ if a* starts before d*, push a* on stack
and increment a*
➢ else
append (s, d*) to output for each node s on the stack
and increment d*
Dlist : d1, d2
• Final output is :
(a1 , d1) , (a1 , d2) , (a2 , d1) , (a3 , d2)
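A minimal sketch of Stack-Tree-Desc for a single document. The start and end positions below are not given in the slides; they are assumed values chosen so that the resulting pairs match the final output listed above. The `Node` type and the `parent_child` flag are also assumptions.

```python
from typing import NamedTuple

class Node(NamedTuple):
    name: str
    start: int
    end: int
    level: int

def stack_tree_desc(alist, dlist, parent_child=False):
    """Single pass over both sorted lists; output is ordered by descendant."""
    out = []
    stack = []          # nested AList nodes, outermost at the bottom
    ai = di = 0
    infinity = float("inf")
    # Once DList is exhausted no further output is possible.
    while di < len(dlist):
        a_start = alist[ai].start if ai < len(alist) else infinity
        d = dlist[di]
        if stack and a_start > stack[-1].end and d.start > stack[-1].end:
            # Neither next node lies inside the top of the stack.
            stack.pop()
        elif a_start < d.start:
            # The next ancestor candidate opens before d: push it.
            stack.append(alist[ai])
            ai += 1
        else:
            # Every node on the stack contains d.
            for s in stack:
                if not parent_child or d.level == s.level + 1:
                    out.append((s.name, d.name))
            di += 1
    return out

alist = [Node("a1", 1, 20, 1), Node("a2", 2, 7, 2), Node("a3", 10, 15, 2)]
dlist = [Node("d1", 3, 4, 3), Node("d2", 11, 12, 3)]
print(stack_tree_desc(alist, dlist))
```

Each AList node is pushed and popped at most once and each DList node is visited once, so neither input list is rescanned. Because the output is ordered by descendant, the same four pairs appear in a different order than in the list above.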
Stack-Tree-Anc Analysis
• Requires careful handling of lists.
• Time complexity (for both ancestor-descendant and parent-child
relationships): O(|AList| + |DList| + |OutputList|)
• Careful buffer management needed.
Thank You !