
Scale Invariant Feature Transform (SIFT)

Outline

What is SIFT
Algorithm overview
Object Detection
Summary

Overview
Introduced by David Lowe in 1999
Generates image features (keypoints) that are:
invariant to image scaling and rotation
partially invariant to changes in illumination and 3D camera viewpoint
numerous: many can be extracted from typical images
highly distinctive

Algorithm overview
Scale-space extrema detection
Uses difference-of-Gaussian function

Keypoint localization
Sub-pixel location and scale fit to a model

Orientation assignment
1 or more for each keypoint

Keypoint descriptor
Created from local image gradients

Scale space
Definition: $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$
where $G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2) / 2\sigma^2}$ and $*$ denotes convolution

Scale space
Keypoints are detected using scale-space extrema in the difference-of-Gaussian function D
D definition:
$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$

Efficient to compute: the smoothed images L are needed anyway, so D is a single image subtraction
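The definitions of L and D above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names, the kernel radius of 3 sigma, the reflect padding, and the default k = sqrt(2) are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Sampled, normalized 1D Gaussian (radius of 3*sigma is an assumption)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), using the separability of G."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image.astype(float), radius, mode="reflect")
    # Convolve along rows, then along columns; 'valid' trims the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def difference_of_gaussian(image, sigma=1.6, k=2 ** 0.5):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    return gaussian_blur(image, k * sigma) - gaussian_blur(image, sigma)

# Example: DoG response of a random test image (same shape as the input).
image = np.random.default_rng(0).random((32, 32))
D = difference_of_gaussian(image)
```

A constant image gives a DoG response of zero everywhere, since both blurred copies equal the input.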

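Relationship of D to $\sigma^2 \nabla^2 G$
Close approximation to the scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$
Diffusion equation:
$\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G$
Approximate $\partial G / \partial \sigma$:
$\frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$
giving,
$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1) \sigma^2 \nabla^2 G$

When D has scales differing by a constant factor it already incorporates the $\sigma^2$ scale normalization required for scale-invariance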
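The approximation of the DoG by (k - 1) times the scale-normalized Laplacian can be checked numerically at a single point. The sketch below uses the closed form of the Laplacian of the 2D Gaussian defined earlier; the values sigma = 1.6 and k = 1.01 and the function names are choices made only for this check.

```python
import numpy as np

def gaussian_2d(x, y, sigma):
    """G(x, y, sigma) as defined on the scale-space slide."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def scale_normalized_log(x, y, sigma):
    """sigma^2 times the Laplacian of G, using the closed-form Laplacian."""
    r2 = x**2 + y**2
    laplacian = gaussian_2d(x, y, sigma) * (r2 - 2 * sigma**2) / sigma**4
    return sigma**2 * laplacian

sigma, k = 1.6, 1.01      # k close to 1 keeps the finite difference tight
x, y = 0.0, 0.0           # evaluate at the kernel centre
dog = gaussian_2d(x, y, k * sigma) - gaussian_2d(x, y, sigma)
approx = (k - 1) * scale_normalized_log(x, y, sigma)
relative_error = abs(dog - approx) / abs(approx)
```

The relative error shrinks as k approaches 1, which is what the finite-difference argument predicts.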

Scale space construction

[Figure: scale-space construction; Gaussian images labeled with scales kσ, 2σ, 2kσ, 2k²σ]
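The scale labels in the construction figure can be generated with a small helper. This sketch assumes the convention from Lowe's paper that k = 2^(1/s) for s intervals per octave, so that sigma doubles from one octave to the next; the slides do not state this explicitly, and the function name and defaults are hypothetical.

```python
def octave_sigmas(sigma0=1.6, intervals=2, num_octaves=4):
    """Return, per octave, the Gaussian scales sigma0 * 2**o * k**i."""
    k = 2 ** (1.0 / intervals)   # k**intervals == 2, so each octave doubles sigma
    return [[sigma0 * (2 ** o) * (k ** i) for i in range(intervals + 1)]
            for o in range(num_octaves)]

# Example: with intervals=2 the first octave is sigma, k*sigma, 2*sigma.
scales = octave_sigmas()
```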

Scale space images

[Figure: Gaussian-smoothed images for the first, second, third, and fourth octaves]

Difference-of-Gaussian images

[Figure: difference-of-Gaussian images for the first, second, third, and fourth octaves]

Frequency of sampling
There is no minimum sampling frequency that guarantees all extrema are detected
Best frequency determined experimentally

Prior smoothing for each octave
Increasing $\sigma$ increases robustness, but costs efficiency
$\sigma = 1.6$ is a good tradeoff
Doubling the image size initially increases the number of keypoints

Finding extrema
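Sample point is selected only if it is a minimum or a maximum of its neighbors:
the 8 surrounding points in the same DoG image and the 9 points in each of the two adjacent scales (26 in total)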
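The extremum test can be sketched as a comparison of the centre of a 3x3x3 cube against its 26 neighbors. The indexing convention [scale, row, column] and the function name are assumptions for this example, and the sample is required to lie away from the border of the stack.

```python
import numpy as np

def is_extremum(dog_stack, k, i, j):
    """True if dog_stack[k, i, j] is the strict max or min of its 3x3x3 cube.

    dog_stack is assumed to be indexed as [scale, row, column], and
    (k, i, j) must not lie on the border of the stack.
    """
    cube = dog_stack[k - 1:k + 2, i - 1:i + 2, j - 1:j + 2]
    centre = cube[1, 1, 1]
    neighbours = np.delete(cube.ravel(), 13)   # drop the centre (index 13 of 27)
    return bool(centre > neighbours.max() or centre < neighbours.min())

# Example: a single bright sample in an otherwise flat stack is a maximum.
stack = np.zeros((3, 5, 5))
stack[1, 2, 2] = 1.0
found = is_extremum(stack, 1, 2, 2)
```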

Extrema in this image
[Figure: example image and its DoG scale space]

Localization
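3D quadratic function is fit to the local sample points
Start with Taylor expansion with sample point as the origin:
$D(X) = D + \frac{\partial D^T}{\partial X} X + \frac{1}{2} X^T \frac{\partial^2 D}{\partial X^2} X$
where $X = (x, y, \sigma)^T$
Take the derivative with respect to X, and set it to 0, giving
$0 = \frac{\partial D}{\partial X} + \frac{\partial^2 D}{\partial X^2} \hat{X}$
$\hat{X} = -\left(\frac{\partial^2 D}{\partial X^2}\right)^{-1} \frac{\partial D}{\partial X}$ is the location of the keypoint
This is a 3x3 linear system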

Localization
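$\begin{bmatrix} \frac{\partial^2 D}{\partial \sigma^2} & \frac{\partial^2 D}{\partial \sigma \partial y} & \frac{\partial^2 D}{\partial \sigma \partial x} \\ \frac{\partial^2 D}{\partial \sigma \partial y} & \frac{\partial^2 D}{\partial y^2} & \frac{\partial^2 D}{\partial y \partial x} \\ \frac{\partial^2 D}{\partial \sigma \partial x} & \frac{\partial^2 D}{\partial y \partial x} & \frac{\partial^2 D}{\partial x^2} \end{bmatrix} \begin{bmatrix} \sigma \\ y \\ x \end{bmatrix} = -\begin{bmatrix} \frac{\partial D}{\partial \sigma} \\ \frac{\partial D}{\partial y} \\ \frac{\partial D}{\partial x} \end{bmatrix}$

Derivatives approximated by finite differences, example:
$\frac{\partial D}{\partial \sigma} = \frac{D_{k+1}^{i,j} - D_{k-1}^{i,j}}{2}$
$\frac{\partial^2 D}{\partial \sigma^2} = D_{k+1}^{i,j} - 2 D_{k}^{i,j} + D_{k-1}^{i,j}$
$\frac{\partial^2 D}{\partial \sigma \partial y} = \frac{(D_{k+1}^{i+1,j} - D_{k-1}^{i+1,j}) - (D_{k+1}^{i-1,j} - D_{k-1}^{i-1,j})}{4}$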

If $\hat{X}$ is > 0.5 in any dimension, the extremum lies closer to a different sample point, so the process is repeated about that point
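The 3x3 system and its finite-difference derivatives can be sketched as follows. The stack layout D[scale, row, column], the (sigma, y, x) ordering of the offset, and the function name are assumptions of this example; the re-centring step for offsets above 0.5 is left out to keep it short.

```python
import numpy as np

def localize(D, k, i, j):
    """Solve the 3x3 system for the offset (sigma, y, x) at sample (k, i, j).

    D is assumed to be a DoG stack indexed as D[scale, row, column].
    """
    c = D[k, i, j]
    # First derivatives by central differences.
    grad = 0.5 * np.array([
        D[k + 1, i, j] - D[k - 1, i, j],
        D[k, i + 1, j] - D[k, i - 1, j],
        D[k, i, j + 1] - D[k, i, j - 1],
    ])
    # Second derivatives along each axis.
    d_ss = D[k + 1, i, j] - 2 * c + D[k - 1, i, j]
    d_yy = D[k, i + 1, j] - 2 * c + D[k, i - 1, j]
    d_xx = D[k, i, j + 1] - 2 * c + D[k, i, j - 1]
    # Mixed second derivatives.
    d_sy = 0.25 * ((D[k + 1, i + 1, j] - D[k - 1, i + 1, j])
                   - (D[k + 1, i - 1, j] - D[k - 1, i - 1, j]))
    d_sx = 0.25 * ((D[k + 1, i, j + 1] - D[k - 1, i, j + 1])
                   - (D[k + 1, i, j - 1] - D[k - 1, i, j - 1]))
    d_yx = 0.25 * ((D[k, i + 1, j + 1] - D[k, i - 1, j + 1])
                   - (D[k, i + 1, j - 1] - D[k, i - 1, j - 1]))
    hessian = np.array([[d_ss, d_sy, d_sx],
                        [d_sy, d_yy, d_yx],
                        [d_sx, d_yx, d_xx]])
    return -np.linalg.solve(hessian, grad)

# Example: a quadratic peak at (scale, row, col) = (1.2, 1.9, 2.3),
# sampled on an integer grid and localized from the sample (1, 2, 2).
s, y, x = np.meshgrid(np.arange(3), np.arange(5), np.arange(5), indexing="ij")
D = -((s - 1.2) ** 2 + (y - 1.9) ** 2 + (x - 2.3) ** 2)
offset = localize(D, 1, 2, 2)
```

For an exactly quadratic D the finite differences are exact, so the recovered offset matches the true sub-sample position of the peak.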

Filtering
Contrast (use prev. equation):
$D(\hat{X}) = D + \frac{1}{2} \frac{\partial D^T}{\partial X} \hat{X}$
If $|D(\hat{X})| < 0.03$, throw it out

Edge-iness:
Use ratio of principal curvatures to throw out poorly defined peaks
Curvatures come from Hessian:
$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$
Ratio of $\mathrm{Tr}(H)^2$ and $\mathrm{Det}(H)$:
$\mathrm{Tr}(H) = D_{xx} + D_{yy}$
$\mathrm{Det}(H) = D_{xx} D_{yy} - (D_{xy})^2$
If ratio > $(r+1)^2 / r$, throw it out (SIFT uses r=10)
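Both filters can be sketched as small predicates. The function names are hypothetical. The 0.03 threshold is taken from the slide and, as in Lowe's paper, presumes pixel values scaled to [0, 1]. Rejecting a non-positive determinant is also taken from Lowe's paper rather than from the slides.

```python
import numpy as np

def passes_contrast(d_centre, grad, offset, threshold=0.03):
    """Keep the keypoint if |D(X_hat)| = |D + 0.5 * grad . X_hat| >= threshold."""
    d_hat = d_centre + 0.5 * float(np.dot(grad, offset))
    return abs(d_hat) >= threshold

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep the keypoint if Tr(H)^2 / Det(H) <= (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:          # curvatures of opposite sign: not a well-defined peak
        return False
    return trace ** 2 / det <= (r + 1) ** 2 / r

# Example: an isotropic peak passes, a strongly elongated one does not.
blob_ok = passes_edge_test(1.0, 1.0, 0.0)
edge_ok = passes_edge_test(20.0, 1.0, 0.0)
```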

Orientation assignment
Computing the descriptor relative to the keypoint's orientation achieves rotation invariance
Gradient orientation is precomputed along with magnitude for all levels (useful in descriptor computation)
$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$
$\theta(x, y) = \operatorname{atan2}(L(x, y+1) - L(x, y-1),\ L(x+1, y) - L(x-1, y))$

Multiple orientations are assigned to keypoints from an orientation histogram
Significantly improves stability of matching
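The gradient formulas and the orientation histogram can be sketched as below. This is a simplified illustration: the 36 bins and the 80% peak ratio are taken from Lowe's paper rather than from the slides, the function names are hypothetical, and the Gaussian weighting window, local-peak detection, and peak interpolation of the full method are omitted.

```python
import numpy as np

def gradient_mag_ori(L):
    """Central-difference magnitude m and orientation theta for interior pixels.

    L is a smoothed image indexed as L[y, x]; the one-pixel border is dropped.
    """
    dx = L[1:-1, 2:] - L[1:-1, :-2]      # L(x+1, y) - L(x-1, y)
    dy = L[2:, 1:-1] - L[:-2, 1:-1]      # L(x, y+1) - L(x, y-1)
    return np.sqrt(dx ** 2 + dy ** 2), np.arctan2(dy, dx)

def dominant_orientations(mag, theta, num_bins=36, peak_ratio=0.8):
    """Histogram orientations weighted by magnitude and return the bin centres
    (radians) of every bin within peak_ratio of the highest one."""
    hist, edges = np.histogram(theta, bins=num_bins, range=(-np.pi, np.pi),
                               weights=mag)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres[hist >= peak_ratio * hist.max()]

# Example: a horizontal intensity ramp has a single orientation near 0.
ramp = np.tile(np.arange(8, dtype=float), (8, 1))
mag, theta = gradient_mag_ori(ramp)
orientations = dominant_orientations(mag, theta)
```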

Keypoint images

Descriptor

Descriptor has 3 dimensions $(x, y, \theta)$
Orientation histogram of gradient magnitudes
Position and orientation of each gradient sample rotated relative to keypoint orientation

Descriptor

Weight magnitude of each sample point by Gaussian weighting function
Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects)

Descriptor
Best results achieved with 4x4x8 = 128 descriptor size (4x4 grid of histograms, 8 orientation bins each)
Normalize to unit length
Reduces effect of illumination change
Cap each element to 0.2, normalize again
Reduces effect of non-linear illumination changes
0.2 determined experimentally
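The normalize, cap, and renormalize steps can be sketched in a few lines. The function name is hypothetical, and the input is assumed to be a non-zero vector of non-negative histogram values.

```python
import numpy as np

def normalize_descriptor(vec, cap=0.2):
    """Unit-normalize, cap each element at `cap`, then unit-normalize again."""
    vec = np.asarray(vec, dtype=float)
    vec = vec / np.linalg.norm(vec)      # removes overall contrast changes
    vec = np.minimum(vec, cap)           # limits the influence of large gradients
    return vec / np.linalg.norm(vec)

# Example: one dominant bin is damped relative to the others.
raw = np.ones(128)
raw[0] = 10.0
descriptor = normalize_descriptor(raw)
```

In this example the first element starts ten times larger than the rest, and after capping it is only about three times larger.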

Object Detection
Create a database of keypoints from training images
Match keypoints to a database
Nearest neighbor search
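Nearest neighbor matching can be sketched as a brute-force search over Euclidean distances. The function name and array layout (one descriptor per row) are assumptions of this example. Practical systems typically use an approximate search structure, which is not shown here.

```python
import numpy as np

def nearest_neighbours(query, database):
    """For each query descriptor, return the index of its closest database
    descriptor and the Euclidean distance to it (brute force)."""
    # Pairwise squared distances via broadcasting: shape (num_query, num_db).
    diff = query[:, None, :] - database[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    idx = np.argmin(dist2, axis=1)
    return idx, np.sqrt(dist2[np.arange(len(query)), idx])

# Example: a toy database of four 4-dimensional descriptors.
database = np.eye(4)
query = np.array([[0.9, 0.1, 0.0, 0.0],
                  [0.0, 0.0, 0.2, 1.0]])
idx, dist = nearest_neighbours(query, database)
```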

PCA-SIFT

Different descriptor (same keypoints)


Apply PCA to the gradient patch
Descriptor size is 20 (instead of 128)
More robust, faster
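The PCA step can be sketched as fitting principal directions on flattened gradient patches and projecting each patch onto the first 20. This is a generic PCA via SVD, not the PCA-SIFT authors' code: the function names, the patch dimensionality, and the random training data are placeholders for illustration.

```python
import numpy as np

def fit_pca(patches, num_components=20):
    """patches: (num_samples, num_features) array of flattened gradient patches.
    Returns the mean and the top principal directions (as rows)."""
    mean = patches.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, vt[:num_components]

def project(patch, mean, components):
    """Project one flattened patch onto the principal directions."""
    return components @ (patch - mean)

# Example with placeholder data: 50 random "patches" of 64 values each.
rng = np.random.default_rng(0)
patches = rng.normal(size=(50, 64))
mean, components = fit_pca(patches)
descriptor = project(patches[0], mean, components)
```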

Summary

Scale space
Difference-of-Gaussian
Localization
Filtering
Orientation assignment
Descriptor, 128 elements
