ADBIS 2003
Presentation Outline
- Metric indexing
- M-tree
  - basic concepts
  - motivation for the M-tree revision
  - fat-factor
  - multi-way insertion
  - slim-down algorithm
Multimedia Indexing
Reasons for indexing of multimedia databases:
- implementation of a query mechanism
- fast retrieval
Vector model:
multimedia document → feature vector
O_i = (251, 250, 251, 251, 249, ...)
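The slides do not say how a multimedia document is turned into a feature vector. As a purely hypothetical illustration, the sketch below maps a grayscale image to a normalized intensity histogram. The function name `grayscale_histogram` and the `bins` parameter are our assumptions, not part of the presented method.

```python
from typing import List, Sequence


def grayscale_histogram(pixels: Sequence[Sequence[int]], bins: int = 8) -> List[float]:
    """Map a grayscale image (rows of 0..255 intensities) to a feature vector.

    Hypothetical extractor: the slides do not say how O_i is computed; a
    normalized intensity histogram is only one common choice.
    """
    counts = [0] * bins
    total = 0
    for row in pixels:
        for value in row:
            if not 0 <= value <= 255:
                raise ValueError(f"intensity out of range: {value}")
            # Integer arithmetic spreads 0..255 over `bins` bins (bins <= 256);
            # 255 always lands in the last bin.
            counts[value * bins // 256] += 1
            total += 1
    if total == 0:
        raise ValueError("empty image")
    # Normalizing makes images of different sizes comparable.
    return [c / total for c in counts]


# A tiny 2x3 "image" whose bright pixels echo the values shown above.
image = [[251, 250, 251], [251, 249, 10]]
feature_vector = grayscale_histogram(image, bins=4)
```

Any other extractor works equally well for metric indexing, as long as a suitable distance function is defined on its output.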
Metric Indexing
Feature vectors are indexed according to their mutual distances. As a dissimilarity measure, a distance function d(O_i, O_j) is specified that satisfies the metric axioms:
$$
\begin{aligned}
d(O_i, O_i) &= 0 && \text{(reflexivity)}\\
d(O_i, O_j) &> 0 \quad \text{for } O_i \neq O_j && \text{(positivity)}\\
d(O_i, O_j) &= d(O_j, O_i) && \text{(symmetry)}\\
d(O_i, O_k) + d(O_k, O_j) &\geq d(O_i, O_j) && \text{(triangle inequality)}
\end{aligned}
$$
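Since a metric index works only with the values of d, any function satisfying the axioms can be plugged in. The sketch below implements the L2 (Euclidean) distance and a hypothetical helper, `satisfies_metric_axioms`, that tests the four axioms on a finite sample of objects. Such a test can only refute, never prove, that d is a metric.

```python
import itertools
import math
from typing import Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def satisfies_metric_axioms(objects, d, eps: float = 1e-9) -> bool:
    """Empirically check the four metric axioms on a finite sample.

    Passing does not prove d is a metric; failing disproves it.
    """
    for o in objects:
        if abs(d(o, o)) > eps:  # reflexivity
            return False
    for oi, oj in itertools.permutations(objects, 2):
        if oi != oj and d(oi, oj) <= 0:  # positivity
            return False
        if abs(d(oi, oj) - d(oj, oi)) > eps:  # symmetry
            return False
    for oi, oj, ok in itertools.product(objects, repeat=3):
        if d(oi, ok) + d(ok, oj) + eps < d(oi, oj):  # triangle inequality
            return False
    return True


points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
l2_ok = satisfies_metric_axioms(points, l2)
# Squared Euclidean distance is a common non-metric: on a line,
# d(0, 2) = 4 > d(0, 1) + d(1, 2) = 2 breaks the triangle inequality.
line = [(0.0,), (1.0,), (2.0,)]
squared_ok = satisfies_metric_axioms(line, lambda a, b: l2(a, b) ** 2)
```

The squared-distance case matters in practice: pruning based on the triangle inequality would silently return wrong answers with such a function.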
Metric structures:
Main-memory structures: metric tree, vp-tree, mvp-tree
Persistent structures: M-tree, Slim-tree (a modification of the M-tree)
M-tree at a glance
- indexes objects of a general metric space (not only vector spaces)
- at the time, the only persistent and balanced metric tree
- does not directly use dimensions, only the distances between objects (any vector coordinates are handled by a user-defined metric)
- the correctness of the M-tree hierarchy is guaranteed by the triangle inequality axiom of d
- the hierarchy consists of nested metric regions
- resists the curse of dimensionality better (depending on the metric)
- the node hierarchy allows similarity queries to be implemented natively
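To show how the triangle inequality makes the hierarchy usable for similarity queries, here is a minimal in-memory sketch of a range query over nested metric regions. A routing entry stores a routing object O_r and a covering radius r(O_r) such that every object O below it has d(O_r, O) ≤ r(O_r). Then d(Q, O) ≥ d(Q, O_r) − d(O_r, O) ≥ d(Q, O_r) − r(O_r), so if d(Q, O_r) > r_Q + r(O_r), no object in that subtree can answer the query and it is skipped. The class and function names are hypothetical; node capacity, splitting, stored parent distances and disk pages are deliberately omitted, so this is not the M-tree implementation itself.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


@dataclass
class LeafNode:
    objects: List[Point]  # ground entries grnd(O)


@dataclass
class RoutingEntry:
    routing_object: Point   # O_r
    covering_radius: float  # r(O_r): every object below lies within this distance
    child: "Node"


@dataclass
class InnerNode:
    entries: List[RoutingEntry]


Node = Union[LeafNode, InnerNode]


def range_query(node: Node, q: Point, r_q: float, d: Metric,
                stats: Dict[str, int]) -> List[Point]:
    """Return all ground objects O with d(q, O) <= r_q; count visited nodes."""
    stats["nodes"] = stats.get("nodes", 0) + 1
    if isinstance(node, LeafNode):
        return [o for o in node.objects if d(q, o) <= r_q]
    result: List[Point] = []
    for entry in node.entries:
        # Triangle inequality: if d(q, O_r) > r_q + r(O_r), every O in the
        # subtree has d(q, O) > r_q, so the whole subtree is skipped.
        if d(q, entry.routing_object) <= r_q + entry.covering_radius:
            result.extend(range_query(entry.child, q, r_q, d, stats))
    return result


# Two well-separated leaf regions under one root.
root = InnerNode([
    RoutingEntry((0.0, 0.0), 1.0, LeafNode([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])),
    RoutingEntry((10.0, 10.0), 1.0, LeafNode([(10.0, 10.0), (11.0, 10.0)])),
])
stats: Dict[str, int] = {}
hits = range_query(root, (0.5, 0.5), 0.8, math.dist, stats)
# The far region is pruned, so only the root and one leaf are read.
```

The more the regions on one level overlap, the more subtrees pass the pruning test, which is why overlap drives retrieval cost.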
[Figure: an example M-tree, a hierarchy of nested metric regions with routing entries rout(O_i) and ground entries grnd(O_j)]
M-tree, fat-factor
The retrieval efficiency of an M-tree is strongly affected by the amount of overlap among the metric regions on each level of the M-tree. Thus, the volume of metric regions needs to be minimized.
Since volume is not defined in general metric spaces, the amount of overlap in an M-tree can instead be measured by the fat-factor, introduced for Slim-trees.
The fat-factor is the proportion of disk accesses needed to process point queries for all the ground objects. It lies in the interval [0, 1], where 0 is the best and 1 the worst.
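As we recall the Slim-tree definition (Traina et al.), the fat-factor of a tree T with height h, n ground objects and m nodes is fat(T) = (I_C − h·n) / (n·(m − h)), where I_C is the total number of node accesses of point queries for all n objects. The value is 0 when every point query reads exactly one node per level and 1 when every point query reads all m nodes. The sketch below computes it on small hand-built trees, assuming one node per disk page; all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union

Point = Tuple[float, ...]


@dataclass
class Leaf:
    objects: List[Point]  # ground objects


@dataclass
class Inner:
    # Each entry: (routing object O_r, covering radius r(O_r), child node).
    entries: List[Tuple[Point, float, "Node"]]


Node = Union[Leaf, Inner]


def point_query_accesses(node: Node, o: Point) -> int:
    """Nodes read by a point query for o: every region containing o is entered."""
    if isinstance(node, Leaf):
        return 1
    return 1 + sum(point_query_accesses(child, o)
                   for center, radius, child in node.entries
                   if math.dist(o, center) <= radius)


def ground_objects(node: Node) -> Iterator[Point]:
    if isinstance(node, Leaf):
        yield from node.objects
    else:
        for _, _, child in node.entries:
            yield from ground_objects(child)


def node_count(node: Node) -> int:
    if isinstance(node, Leaf):
        return 1
    return 1 + sum(node_count(child) for _, _, child in node.entries)


def height(node: Node) -> int:
    # The tree is assumed balanced, so following the first child is enough.
    return 1 if isinstance(node, Leaf) else 1 + height(node.entries[0][2])


def fat_factor(root: Node) -> float:
    """fat(T) = (I_C - h*n) / (n * (m - h)); 0 is best, 1 is worst."""
    objs = list(ground_objects(root))
    n, h, m = len(objs), height(root), node_count(root)
    if n == 0 or m == h:  # empty tree or a single path: no overlap possible
        return 0.0
    i_c = sum(point_query_accesses(root, o) for o in objs)
    return (i_c - h * n) / (n * (m - h))


# Disjoint regions, fully overlapping regions, and partly overlapping regions.
tight = Inner([((0.0, 0.0), 1.0, Leaf([(0.0, 0.0), (1.0, 0.0)])),
               ((10.0, 0.0), 1.0, Leaf([(10.0, 0.0), (11.0, 0.0)]))])
overlapping = Inner([((0.0, 0.0), 5.0, Leaf([(0.0, 0.0), (4.0, 0.0)])),
                     ((3.0, 0.0), 5.0, Leaf([(3.0, 0.0), (1.0, 0.0)]))])
partial = Inner([((0.0, 0.0), 2.0, Leaf([(0.0, 0.0), (2.0, 0.0)])),
                 ((3.0, 0.0), 1.0, Leaf([(3.0, 0.0), (4.0, 0.0)]))])
```

In `partial`, only the object (2.0, 0.0) lies in both regions, so its point query costs one extra access: I_C = 9, h·n = 8, and the fat-factor is 1/4.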
[Figure: the example M-tree and the insertion of a new object O_new]
… restarted)
- does not directly increase the insertion costs (e.g., it can run during idle time)
Disadvantages:
Slimming the leaf level (Level 0)
Slimming Level 1
The slimmed M-tree
Experimental results
- synthetic datasets of clustered tuples
- metric used: L2 (Euclidean)
- dimensionality: 2 to 50
- number of tuples: 20,000 to 1,000,000
- index sizes: 1 to 400 MB
- node capacity: 20
- M-tree height: 3 to 5
Experiments were performed on an Intel Pentium 4 (2.53 GHz) with 512 MB DDR333 RAM, running Windows XP Professional.
Building Costs
Conclusions
New M-tree building techniques were proposed that improve M-tree retrieval efficiency. These techniques are especially beneficial in query-intensive MDBMS scenarios. Multi-way insertion improves M-tree retrieval efficiency by up to 50%, and the slim-down algorithm by up to 300%.
References
T. Skopal, J. Pokorny, M. Kratky, V. Snasel. Revisiting M-tree Building Principles. ADBIS 2003, Dresden.
P. Ciaccia, M. Patella, P. Zezula. M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997, Athens.
C. Traina Jr., A. Traina, B. Seeger, C. Faloutsos. Slim-Trees: High Performance Metric Trees Minimizing Overlap between Nodes. LNCS 1777, 2000.