Mtree

Revisiting M-tree Building Principles
Tomáš Skopal¹, Jaroslav Pokorný², Michal Krátký¹, Václav Snášel¹

¹ Department of Computer Science, VŠB-Technical University of Ostrava, Czech Republic
² Department of Software Engineering, Charles University in Prague, Czech Republic
ADBIS 2003
Presentation Outline
Metric Indexing
M-tree
  basic concepts
  motivation for the M-tree revision
  fat-factor
  multi-way insertion
  slim-down algorithm
Experimental results
Conclusions
Multimedia Indexing
Reasons for indexing multimedia databases:
  implementation of the querying mechanism
  fast retrieval
Vector model:
  multimedia document -> feature vector
  Oi = (251, 250, 251, 251, 249, ...)
Multimedia Indexing, cont.

Every indexing model must follow a retrieval semantics. The multimedia indexing model must support similarity queries. There are two types of similarity queries:
range queries
  return the documents that are more similar to the query than a given threshold
k nearest neighbour (k-NN) queries
  return the k most similar documents
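As a point of reference, both query types can be answered without any index by scanning every document. The sketch below is only such a baseline, not part of the presentation: it assumes feature vectors are tuples of numbers, expresses similarity as a Euclidean distance (smaller means more similar), and uses the hypothetical names `range_query` and `knn_query`.

```python
import heapq
import math


def euclidean(a, b):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(objects, query, radius, dist=euclidean):
    """Return all objects whose distance to `query` is at most `radius`."""
    return [o for o in objects if dist(o, query) <= radius]


def knn_query(objects, query, k, dist=euclidean):
    """Return the k objects closest to `query`, nearest first."""
    return heapq.nsmallest(k, objects, key=lambda o: dist(o, query))


data = [(0, 0), (1, 0), (3, 4), (6, 8)]
near = range_query(data, (0, 0), 1.0)
top3 = knn_query(data, (0, 0), 3)
```

Every query here costs one distance computation per stored object, which is the cost a metric index tries to avoid.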
Metric Indexing
Feature vectors are indexed according to the distances between them. As a dissimilarity measure, a distance function d(Oi,Oj) is specified such that the metric axioms are satisfied:
d(Oi,Oi) = 0                        reflexivity
d(Oi,Oj) > 0 for Oi ≠ Oj            positivity
d(Oi,Oj) = d(Oj,Oi)                 symmetry
d(Oi,Ok) + d(Ok,Oj) ≥ d(Oi,Oj)      triangular inequality
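Whether a candidate distance function satisfies the four axioms can be checked by brute force on a small sample. The following sketch is illustrative only: the function name `first_violated_axiom` and the tolerance `eps` are assumptions, and passing the check on a sample does not prove that a function is a metric.

```python
import itertools
import math


def first_violated_axiom(points, dist, eps=1e-9):
    """Return the name of the first axiom violated on `points`, or None."""
    for a in points:
        if abs(dist(a, a)) > eps:
            return "reflexivity"
    for a, b in itertools.permutations(points, 2):
        if a != b and dist(a, b) <= 0:
            return "positivity"
        if abs(dist(a, b) - dist(b, a)) > eps:
            return "symmetry"
    for a, b, c in itertools.product(points, repeat=3):
        # d(a,c) + d(c,b) must not be smaller than d(a,b).
        if dist(a, c) + dist(c, b) + eps < dist(a, b):
            return "triangular inequality"
    return None


sample = [(0, 0), (1, 0), (2, 0)]
euclid_result = first_violated_axiom(sample, math.dist)
squared_result = first_violated_axiom(sample, lambda a, b: math.dist(a, b) ** 2)
```

On this sample the Euclidean distance passes, while the squared Euclidean distance fails the triangular inequality (1 + 1 < 4), so it could not be used to guarantee a correct metric hierarchy.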
Metric structures:
Main-memory structures: metric tree, vp-tree, mvp-tree
Persistent structures: M-tree, Slim-tree (a modification of the M-tree)
M-tree at a glance
indexes objects of a general metric space (not only vector spaces)
so far the only persistent and balanced metric tree
does not use dimensions directly, only the distances between objects (any vector coordinates are handled by a user-defined metric)
the correctness of the M-tree hierarchy is guaranteed by the triangular inequality axiom of d; the hierarchy consists of nested metric regions
is more resistant to the curse of dimensionality (depending on the metric)
the hierarchy of nodes allows similarity queries to be implemented natively
Structure of the M-tree

The M-tree nodes contain items of two types:
ground objects in the leaves, representing the data objects
routing objects in the inner nodes, representing the metric regions
[Figure: example M-tree with routing objects in the inner nodes and ground objects in the leaves]
Similarity queries in the M-tree

A range query is specified by a query object Oq and a query radius rq.
A k-NN query is based on a modified range query (using a dynamic radius) and a priority queue.
During the range query evaluation, the M-tree is traversed in LIFO order and only the relevant (i.e. intersecting) metric regions (or rather their nodes) are processed further.

[Figure: a range query (Oq, rq) evaluated on the example M-tree]
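The range query evaluation described on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the `Node` layout and the tuple form of routing entries are hypothetical, a region is treated as intersecting the query when d(Or, Oq) <= rq + r(Or), and the stored parent distances that a real M-tree uses to avoid distance computations are left out.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    is_leaf: bool
    # Leaf: list of data objects. Inner node: list of
    # (routing_object, covering_radius, child_node) tuples.
    entries: list = field(default_factory=list)


def range_query(root, query, rq, dist=math.dist):
    """Return (matching objects, number of nodes visited)."""
    result, visited = [], 0
    stack = [root]  # LIFO traversal
    while stack:
        node = stack.pop()
        visited += 1
        if node.is_leaf:
            result.extend(o for o in node.entries if dist(o, query) <= rq)
        else:
            for routing, radius, child in node.entries:
                # Follow only regions that intersect the query ball.
                if dist(routing, query) <= rq + radius:
                    stack.append(child)
    return result, visited


left = Node(True, [(0, 0), (1, 1)])
right = Node(True, [(10, 10), (11, 10)])
root = Node(False, [((0, 0), 1.5, left), ((10, 10), 1.0, right)])
hits, visited = range_query(root, (0.5, 0), 1.0)
```

In this toy tree the right-hand leaf is never read, because its region lies farther from the query than rq plus its covering radius.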
M-tree, revision motivation

Internet-based applications:
  huge public multimedia databases (web pages, digital libraries, etc.)
  millions of users
Which means:
  focus on higher retrieval efficiency
    thousands of users query at any given moment
  the building costs may increase
    index updates are much less frequent than queries
M-tree, fat-factor
The retrieval efficiency of an M-tree is strongly affected by the amount of overlap among the metric regions on each level of the M-tree. Thus, there is a need to minimize the volume of the metric regions.
Since volume does not exist in general metric spaces, the amount of overlap in the M-tree can be measured by the fat-factor, introduced for Slim-trees.
The fat-factor is a proportion of the disk accesses needed for processing point queries for all the ground objects. The fat-factor lies in the interval [0, 1] (0 is the best, 1 the worst).
[Figure: example M-tree hierarchy]
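The slide describes the fat-factor only informally. The sketch below uses the formula as given for Slim-trees, to the best of our recollection: fat(T) = (Ic - H*N) / N * 1 / (M - H), where Ic is the total number of node accesses over point queries for all N ground objects, H is the tree height and M the number of nodes. The `Node` layout and the helper names are hypothetical, and the formula is undefined for a tree with M = H (for example a single-node tree).

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    is_leaf: bool
    # Leaf: list of data objects. Inner node: list of
    # (routing_object, covering_radius, child_node) tuples.
    entries: list = field(default_factory=list)


def point_query_accesses(root, obj, dist=math.dist):
    """Count the nodes read by a point query (range query with radius 0)."""
    accesses, stack = 0, [root]
    while stack:
        node = stack.pop()
        accesses += 1
        if not node.is_leaf:
            stack.extend(child for routing, radius, child in node.entries
                         if dist(routing, obj) <= radius)
    return accesses


def fat_factor(root, objects, height, n_nodes, dist=math.dist):
    """(Ic - H*N) / N / (M - H); 0 means no overlap, 1 the worst case."""
    n = len(objects)
    total = sum(point_query_accesses(root, o, dist) for o in objects)
    return (total - height * n) / n / (n_nodes - height)


def two_leaf_tree(radius):
    """Root with two leaves whose regions both use the given radius."""
    left = Node(True, [(0, 0), (1, 0)])
    right = Node(True, [(10, 0), (11, 0)])
    return Node(False, [((0, 0), radius, left), ((10, 0), radius, right)])


objects = [(0, 0), (1, 0), (10, 0), (11, 0)]
slim = fat_factor(two_leaf_tree(1.0), objects, height=2, n_nodes=3)
fat = fat_factor(two_leaf_tree(20.0), objects, height=2, n_nodes=3)
```

With tight radii every point query reads exactly one node per level, so the value is 0; with inflated radii every query reads all three nodes and the value is 1.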
M-tree, single-way insertion

During an object insertion, only a single subtree is processed further on the current level of the M-tree. A heuristic criterion: the node is chosen that spatially contains the inserted object and/or whose routing object is the nearest.
O(log(n)) complexity
[Figure: single-way insertion of a new object Onew into an example M-tree]
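The single-way descent can be sketched as below. This is a minimal illustration, not the authors' code: the `Routing` and `Node` classes are hypothetical, the tie-breaking rule for the case where no region contains the new object (smallest radius enlargement) is an assumption borrowed from the original M-tree design, and node splitting on overflow is omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # objects or Routing entries


@dataclass
class Routing:
    obj: tuple
    radius: float
    child: Node


def choose_entry(node, new_obj, dist=math.dist):
    """Pick one routing entry; return (distance to its object, entry)."""
    scored = [(dist(e.obj, new_obj), e) for e in node.entries]
    covering = [(d, e) for d, e in scored if d <= e.radius]
    if covering:
        # Some region already contains the object: take the nearest one.
        return min(covering, key=lambda p: p[0])
    # Assumed fallback: the region needing the smallest enlargement.
    return min(scored, key=lambda p: p[0] - p[1].radius)


def insert_single_way(root, new_obj, dist=math.dist):
    """Follow one path from the root to a leaf; no overflow handling."""
    node = root
    while not node.is_leaf:
        d, entry = choose_entry(node, new_obj, dist)
        entry.radius = max(entry.radius, d)  # keep the region covering
        node = entry.child
    node.entries.append(new_obj)
    return node


leaf_a = Node(True, [(0, 0), (1, 0)])
leaf_b = Node(True, [(10, 0), (11, 0)])
root = Node(False, [Routing((0, 0), 1.0, leaf_a), Routing((10, 0), 1.0, leaf_b)])
target = insert_single_way(root, (2, 0))
```

Only one node per level is examined, which is where the logarithmic cost comes from; the price is that the covering radius of the chosen region may grow, as it does here from 1.0 to 2.0.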
M-tree, multi-way insertion

During an object insertion, a point query for the inserted object is executed and the routing objects of all relevant non-full leaves are checked. The nearest one is chosen. If no such leaf exists, the single-way insertion is performed.
O(n) complexity
[Figure: multi-way insertion of a new object Onew into an example M-tree]
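The leaf selection of the multi-way insertion can be sketched as follows. The class layout, the `capacity` parameter and the function names are hypothetical; the function returns `None` when no suitable leaf exists, in which case a caller would fall back to the single-way insertion described earlier.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # objects or Routing entries


@dataclass
class Routing:
    obj: tuple
    radius: float
    child: Node


def choose_leaf_multi_way(root, new_obj, capacity, dist=math.dist):
    """Point query for new_obj; return the best non-full leaf or None."""
    if root.is_leaf:
        return root if len(root.entries) < capacity else None
    candidates, stack = [], [root]
    while stack:
        node = stack.pop()
        for entry in node.entries:
            d = dist(entry.obj, new_obj)
            if d > entry.radius:
                continue  # region does not contain the new object
            if not entry.child.is_leaf:
                stack.append(entry.child)
            elif len(entry.child.entries) < capacity:
                candidates.append((d, entry.child))
    if not candidates:
        return None
    # Choose the leaf whose routing object is nearest to the new object.
    return min(candidates, key=lambda p: p[0])[1]


leaf_a = Node(True, [(0, 0)])
leaf_b = Node(True, [(4, 0), (5, 0)])
root = Node(False, [Routing((0, 0), 5.0, leaf_a), Routing((4, 0), 5.0, leaf_b)])
roomy = choose_leaf_multi_way(root, (3, 0), capacity=3)
tight = choose_leaf_multi_way(root, (3, 0), capacity=2)
outside = choose_leaf_multi_way(root, (100, 0), capacity=3)
```

Because the chosen leaf already contains the new object on every level of its path, no covering radius has to grow. The cost is that several subtrees may be visited, which is why the slide gives O(n) as the worst case.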
M-tree, slim-down algorithm

A post-processing method inspired by Slim-trees, reducing the fat-factor of an M-tree.
In principle, it is a redistribution of ground objects, as well as routing objects, within an existing M-tree. The redistribution is level-based, i.e. for each object an attempt is made to find the best node on the same level (using an algorithm similar to a point query). If such a node is found, the object is moved and the original node can be slimmed. The slimming starts at the leaf level and continues upwards through the higher levels.
Advantages:
  a significant reduction of the fat-factor as well as of the retrieval costs
  a stable algorithm (it can be interrupted at any time and then resumed or restarted)
  does not directly increase the insertion costs, e.g. it can run during idle time
Disadvantages:
Slim-down algorithm, example

Let us have a correct but poorly built M-tree. The regions are highly overlapping and the fat-factor is high. Two ground objects can be moved to more appropriate leaves.
Slimming the leaf level (Level 0)

The radii of the metric regions representing the leaves, as well as those of the higher-level nodes, have been reduced. The fat-factor is now lower as well. Two level-1 routing objects can be moved to more appropriate level-1 nodes.
Slimming Level 1

Again, the radii of the metric regions as well as the fat-factor have been reduced. The root level cannot be slimmed because no parent node exists.
The slimmed M-tree
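The leaf-level step of the example above can be sketched roughly as below. This is a heavily simplified illustration and not the authors' algorithm: leaves are modelled as a flat list of hypothetical `Leaf` records, the "best" node is assumed to be a non-full leaf whose region already contains the object and whose routing object is strictly nearer than the current one, and propagating the reduced radii to higher levels is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Leaf:
    routing: tuple  # routing object of the parent entry
    radius: float   # covering radius of the leaf's region
    objects: list   # ground objects stored in the leaf


def slim_down_leaves(leaves, capacity, dist=math.dist):
    """Move objects to better-fitting leaves, then shrink all radii."""
    moved = 0
    for src in leaves:
        for obj in list(src.objects):
            d_src = dist(src.routing, obj)
            better = []
            for dst in leaves:
                if dst is src or len(dst.objects) >= capacity:
                    continue
                d_dst = dist(dst.routing, obj)
                # Assumed criterion: already covered and strictly nearer.
                if d_dst <= dst.radius and d_dst < d_src:
                    better.append((d_dst, dst))
            if better:
                target = min(better, key=lambda p: p[0])[1]
                src.objects.remove(obj)
                target.objects.append(obj)
                moved += 1
    for leaf in leaves:
        # Slim each region down to its farthest remaining object.
        leaf.radius = max((dist(leaf.routing, o) for o in leaf.objects),
                          default=0.0)
    return moved


a = Leaf((0, 0), 9.0, [(0, 0), (1, 0), (9, 0)])
b = Leaf((10, 0), 2.0, [(10, 0), (11, 0)])
moved = slim_down_leaves([a, b], capacity=4)
```

In this toy case the object (9, 0) moves from the first leaf to the second, and the first region shrinks from radius 9 to radius 1, so the two regions no longer overlap.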
Experimental results
synthetic datasets of clustered tuples
metric used: L2 (Euclidean)
dimensionality: 2-50
number of tuples: 20,000-1,000,000
index sizes: 1-400 MB
node capacity: 20
M-tree height: 3-5
Experiments were performed on an Intel Pentium 4, 2.53 GHz, 512 MB DDR333, under Windows XP Pro.
Building Costs
Fat-factor, node utilization
Range queries costs
100-NN queries costs
Conclusions
New M-tree building techniques were proposed, improving the M-tree retrieval efficiency.
These techniques are beneficial especially for modeling query-intensive MDBMS scenarios.
The multi-way insertion improves the M-tree retrieval efficiency by up to 50%.
The slim-down algorithm improves the M-tree retrieval efficiency by up to 300%.
References
T. Skopal, J. Pokorný, M. Krátký, V. Snášel. Revisiting M-tree Building Principles. ADBIS 2003, Dresden.
P. Ciaccia, M. Patella, P. Zezula. M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997, Athens.
C. Traina Jr., A. Traina, B. Seeger, C. Faloutsos. Slim-Trees: High Performance Metric Trees Minimizing Overlap Between Nodes. LNCS 1777, 2000.
