
FLANN

Fast Library for Approximate Nearest Neighbors
Marius Muja and David G. Lowe
University of British Columbia

Presented by
Mohammad Sadegh Riazi
Rice University

Outline
Introduction
  What is FLANN?
  Which programming languages does it support?
Applications
Approaches
  Randomized k-d Tree Algorithm
  The Priority Search K-Means Tree Algorithm
Experiments
  Data Dimensionality
  Search Precision
  Automatic Selection of the Optimal Algorithm
Scaling Nearest Neighbor Search
What we are going to do
References
What is FLANN?

FLANN is a library for performing fast approximate nearest neighbor searches in
high-dimensional spaces. It includes a system that automatically chooses the
best algorithm and the optimal parameters for a given dataset.

Which programming languages does it support?
Written in C++
Bindings are provided for:
C
MATLAB
Python

Applications
Cluster analysis
Pattern recognition
Statistical classification
Computational geometry
Data compression
Databases

Approaches
Multiple Randomized k-d Tree Algorithm
Multiple trees are searched in parallel
The splitting dimension is chosen at random from the N_D dimensions with the
highest variance
N_D is fixed to 5
Usually 20 trees are used
Gives the best performance on most data sets
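
The tree construction described on this slide can be sketched roughly as follows. This is an illustrative pure-Python sketch, not FLANN's implementation: the function names, the nested-dict tree layout, the `leaf_size` parameter, and the choice of the median as the split value are all assumptions made for the example. Only the random choice among the 5 highest-variance dimensions and the use of about 20 trees come from the slide. Searching the forest is not shown.

```python
import random
from statistics import pvariance

N_D = 5  # number of top-variance dimensions to choose from (value from the slide)


def choose_split_dimension(points, rng):
    """Pick a split dimension at random among the N_D highest-variance ones."""
    dims = range(len(points[0]))
    variances = [pvariance([p[d] for p in points]) for d in dims]
    top = sorted(dims, key=lambda d: variances[d], reverse=True)[:N_D]
    return rng.choice(top)


def build_tree(points, rng, leaf_size=1):
    """Recursively build one randomized k-d tree as nested dicts.

    Assumption: points are split at the median along the chosen dimension.
    """
    if len(points) <= leaf_size:
        return {"leaf": points}
    dim = choose_split_dimension(points, rng)
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    return {
        "dim": dim,
        "value": ordered[mid][dim],
        "left": build_tree(ordered[:mid], rng, leaf_size),
        "right": build_tree(ordered[mid:], rng, leaf_size),
    }


def build_forest(points, n_trees=20, seed=0):
    """Build several trees; each differs because its split dimensions are random."""
    rng = random.Random(seed)
    return [build_tree(points, rng) for _ in range(n_trees)]
```

Because each tree draws its split dimensions independently, the trees partition the same data differently, which is what makes searching several of them worthwhile.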

Approaches
The Priority Search K-Means Tree Algorithm
Partition the data into K distinct regions using k-means clustering
Recursively partition each region until a leaf node holds no more than K points
Pick the initial centers by random selection or with the Gonzales algorithm
I_max is the maximum number of k-means iterations used to form the regions
Performs better than the k-d trees at higher precisions
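
A minimal sketch of the recursive partitioning described on this slide, assuming plain Lloyd's k-means with randomly selected initial centers (the Gonzales initialization is not shown). The function names and the dict-based tree layout are hypothetical, chosen for illustration; `k` and `i_max` correspond to K and I_max above, and `i_max` is assumed to be at least 1. The priority search over the tree is not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, i_max, rng):
    """Lloyd's k-means with random initial centers, at most i_max iterations."""
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(i_max):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged before reaching i_max
        centers = new_centers
    return centers, clusters


def build_kmeans_tree(points, k, i_max, rng):
    """Split into k regions recursively until a node holds at most k points."""
    if len(points) <= k:
        return {"leaf": points}
    centers, clusters = kmeans(points, k, i_max, rng)
    children = []
    for center, cluster in zip(centers, clusters):
        if not cluster:
            continue
        if len(cluster) == len(points):
            # Degenerate split (e.g. all points identical): stop recursing.
            node = {"leaf": cluster}
        else:
            node = build_kmeans_tree(cluster, k, i_max, rng)
        children.append({"center": center, "node": node})
    return {"children": children}
```

A small `i_max` makes the build faster at the cost of less well-formed regions, which is why it is exposed as a tunable parameter.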

Approaches
Complexity Comparison

Experiments
Data Dimensionality

Dimensionality has a great impact on nearest neighbor matching performance
Whether performance decreases or increases depends strongly on the type of data
For random data samples, performance decreases sharply as dimensionality
increases

Experiments
Data Dimensionality

However, for image patches and other real-world data, performance increases as
dimensionality increases. This can be explained by the fact that the
dimensions are correlated: each dimension gives some information about the
others, so the exact nearest neighbor is more likely to be found within a few
search iterations.

Experiments
Search Precision

The desired search precision determines the degree of speedup
If we accept a precision as low as 60%, we can achieve a speedup of three
orders of magnitude over exact linear search
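
To make the precision figure concrete, here is a small sketch of how it could be measured, assuming precision means the fraction of queries for which the approximate search returned the exact nearest neighbor. The helper names are hypothetical; the exact answers come from a brute-force linear scan.

```python
def exact_nn(dataset, query):
    """Index of the exact nearest neighbor, found by a linear scan."""
    return min(
        range(len(dataset)),
        key=lambda i: sum((x - y) ** 2 for x, y in zip(dataset[i], query)),
    )


def precision(approx_ids, exact_ids):
    """Fraction of queries whose approximate neighbor is the exact one."""
    hits = sum(1 for a, e in zip(approx_ids, exact_ids) if a == e)
    return hits / len(exact_ids)


# Five queries, three of which got the exact neighbor: 60% precision.
print(precision([3, 7, 1, 4, 9], [3, 7, 2, 4, 8]))  # 0.6
```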

Experiments
Automatic Selection of the Optimal Algorithm

The algorithm is itself a parameter
Each algorithm can perform differently on different data sets
Each algorithm has its own internal parameters
The best choice depends on dimensionality, the size and structure of the data
(correlation?), and the desired precision
The k-means tree and the randomized k-d trees give the best performance on
most data sets

Experiments
How to find the best internal parameters

First, use a global grid search to find a promising region of the parameter
space
Then, optimize locally using the Nelder-Mead downhill simplex method
The optimization can be run on the whole data set or on a portion of it
Repeat this for all available algorithms

Select the best algorithm together with its internal parameters
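
The two-stage search can be illustrated with a toy example. The sketch below is not FLANN's tuning code: `toy_cost` is a made-up stand-in for a measured cost (such as search time at the desired precision), the grid values are arbitrary, and the parameters are treated as continuous although real index parameters are typically integers. The Nelder-Mead routine is a minimal textbook version with the usual reflection, expansion, contraction, and shrink steps.

```python
import itertools


def grid_search(cost, axes):
    """Global step: evaluate the cost on every grid point, return the best."""
    return min(itertools.product(*axes), key=cost)


def nelder_mead(cost, start, step=1.0, iterations=200):
    """Local step: minimal Nelder-Mead downhill simplex around `start`."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(iterations):
        simplex.sort(key=cost)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line from the worst vertex through the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if cost(reflected) < cost(best):
            expanded = along(2.0)
            simplex[-1] = expanded if cost(expanded) < cost(reflected) else reflected
        elif cost(reflected) < cost(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if cost(contracted) < cost(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [(b + x) / 2 for b, x in zip(best, v)] for v in simplex[1:]
                ]
    return min(simplex, key=cost)


def toy_cost(p):
    """Hypothetical smooth cost with its minimum at (3.2, 11.5)."""
    return (p[0] - 3.2) ** 2 + (p[1] - 11.5) ** 2


coarse = grid_search(toy_cost, [[1, 2, 4, 8], [4, 8, 16, 32]])
refined = nelder_mead(toy_cost, coarse)
```

The grid search only has to land near the right region; the simplex step then refines the coarse point without needing gradients, which suits a cost that is measured rather than computed analytically.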

Scaling Nearest Neighbor Search

We can achieve better performance using larger-scale data sets
Problem: the data may NOT fit into the memory of a single machine
Solutions:
Dimensionality reduction
Keeping the data on disk and loading it into memory as needed (poor performance)
Distributing the data across multiple machines

Scaling Nearest Neighbor Search
Distribute nearest neighbor matching among N machines using a MapReduce-like
algorithm
Each machine only has to index and search 1/N of the whole data set
The final result of the nearest neighbor search (NNS) is obtained by merging
the partial results from all the machines in the cluster once they have
completed the search
The implementation uses the Message Passing Interface (MPI) specification
The query is sent from a client to one of the computers in the MPI cluster
(the master server)
The master server broadcasts the query to all of the processes in the cluster
Each process runs the NNS in parallel on its own fraction of the data
When the search is complete, an MPI reduce operation merges the results back
to the master process, and the final result is returned to the client
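
The broadcast, local search, and merge steps above can be simulated in a single process. The sketch below uses no MPI at all: the shards are plain lists standing in for the data held by each machine, the local search is an exact linear scan rather than an approximate index, and all function names are hypothetical.

```python
import heapq


def local_search(shard, query, k):
    """'Map' step: k nearest neighbors within one machine's shard.

    shard is a list of (point_id, vector); returns sorted (distance, point_id) pairs.
    """
    scored = (
        (sum((x - y) ** 2 for x, y in zip(vec, query)), pid) for pid, vec in shard
    )
    return heapq.nsmallest(k, scored)


def merge_results(partials, k):
    """'Reduce' step: keep the k globally closest candidates."""
    return heapq.nsmallest(k, (pair for part in partials for pair in part))


def distributed_search(shards, query, k):
    """Send the query to every shard, then merge the partial results."""
    partials = [local_search(shard, query, k) for shard in shards]  # parallel in a real cluster
    return merge_results(partials, k)


# Ten 1-D points split across three simulated machines.
data = [(i, [float(i)]) for i in range(10)]
shards = [data[0::3], data[1::3], data[2::3]]
result = distributed_search(shards, [4.2], k=3)
```

Each shard returns at most k candidates, so the merge step handles at most N times k pairs regardless of how large the full data set is.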

Scaling Nearest Neighbor Search
Implementation using the Message Passing Interface (MPI)

What we are going to do
Develop and simulate an approach for preprocessing the input queries to get
better performance
Try to group the input data so that we do not need to search through all the
data in the tree
Trade-off between throughput and latency

References
[1] Marius Muja and David G. Lowe, "Scalable Nearest Neighbor Algorithms for
High Dimensional Data", IEEE Transactions on Pattern Analysis and Machine
Intelligence (PAMI), Vol. 36, 2014.

[2] Marius Muja and David G. Lowe, "Fast Matching of Binary Features",
Conference on Computer and Robot Vision (CRV), 2012.

[3] Marius Muja and David G. Lowe, "Fast Approximate Nearest Neighbors with
Automatic Algorithm Configuration", International Conference on Computer
Vision Theory and Applications (VISAPP'09), 2009.

Thank you for your attention

Questions?
