Low Bit Rate Mobile Visual Search

Institute of Digital Media, The School of CS & EE, Peking University, China
Aug. 18, 2011, NTU, Singapore
[Figure labels: Rigid, Articulate, Textured, Contour Defined; Product Packaging, Logos, Faces, Cars, Unpackaged Products, Pets, Landmarks]
Moment of truth for computer vision: instance matching, not class recognition
Pervasive Mobile Visual Search
Developed Systems
Mobot, Neven Vision, Ideixis, Snaptell, Point and Found, Google Goggles
Side information is useful for mobile visual search
Highly efficient visual vocabulary coding
Visual Search
Mobile Visual Search
Side Information
Desirable search precision & fast speed over extremely large-scale datasets
Good scalability in terms of upstream query coding complexity
Pervasive Side (Context) Information
GPS
Access Points
RFID Tags
Visually Easily Recognized Barcode, Labels or Logos

A mobile photo query has to be transmitted from a mobile device to a server, often over a relatively slow wireless link (3G). The quality of user experience heavily depends on how much information has to be transmitted, especially in augmented reality scenarios.

Compact local feature descriptors
e.g. CHoG (Stanford), DAISY (EPFL), PCA-SIFT descriptors (CMU)
Compact image signatures
e.g. miniBoW, Aggregated Local Descriptors (INRIA)
Previous works still demand 2KB to 4KB of visual descriptors per query image, which is unsuitable for augmented reality. Within an unstable wireless network (e.g. a 3G network), the delivery cost still occasionally delays the query.
Transform coding of SIFT/SURF descriptors (Chandrasekhar et al., VCIP 09)
Direct compression of oriented image patches (M. Makar et al., ICASSP 09)
Descriptor designed for compressibility: CHoG (Chandrasekhar et al., CVPR 09)
Tree Histogram Coding (Chen et al., DCC 09)
Compression of spatial layout of local features (Tsai et al., Mobimedia 10)

Compact Descriptor for Visual Search
An ongoing MPEG Ad Hoc Group
Standard schedule
April 2010: initialize the CDVS Ad Hoc Group
July 2011: Final CFP
Evaluation set
Mobile query, especially landmark queries
Major Requirements of Compact Descriptors
Compactness
Discriminability

All state-of-the-art compact descriptors are designed based only on visual content statistics
Incorporate the cheaply available mobile context (e.g. GPS or base station tags) to supervise the descriptor design

Effective and efficient low bit rate mobile visual search (say, hundreds of bits per visual query)
Coding/compression of an originally high-dimensional image signature (e.g. BoW)
M is learnt from the mobile context
User enters a given region
The mobile end downloads or pre-stores the region-specific description function
The mobile user takes a query
The mobile end extracts the initial BoW and compresses it into a region-specific compact descriptor
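The query-side steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the region-specific description function reduces to a list of retained visual words (`region_codebook`), that the initial BoW is a word-to-count mapping, and that each retained word is sent as a fixed-width index. All function names are hypothetical.

```python
def compress_bow(bow, region_codebook):
    """Keep only the region's compact codewords that occur in the query.

    bow: dict mapping visual-word id -> count (the initial BoW histogram).
    region_codebook: list of word ids retained for the current region
                     (hypothetical stand-in for the downloaded function).
    Returns positions in region_codebook, i.e. occurrence indices.
    """
    return [pos for pos, word in enumerate(region_codebook)
            if bow.get(word, 0) > 0]


def descriptor_bits(indices, codebook_size):
    """Bits needed if each index is sent with a fixed width (an assumption)."""
    bits_per_index = max(1, (codebook_size - 1).bit_length())
    return len(indices) * bits_per_index


query_bow = {3: 2, 17: 1, 42: 5, 99: 1}   # toy initial BoW
codebook = [42, 7, 3, 64]                 # toy region-specific codebook
indices = compress_bow(query_bow, codebook)
print(indices, descriptor_bits(indices, len(codebook)))
```

Words outside the region codebook (17 and 99 in the toy query) are simply dropped, which is where the bit savings come from in this sketch.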
Geographical region division
Learning compact descriptor within each region

Crawled from Flickr and Panoramio websites
Beijing, New York City, Barcelona, Singapore, Florence
The geographical distribution of Flickr and Panoramio photos in Beijing
Notations: in total m regions; current geotagged photo x; region assignment y_i with i = 1, ..., m


Expectation Step
Maximization Step
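The formulas for the two steps are not preserved in this text, so the following is only a generic sketch of how an expectation-maximization loop can assign geotagged photos to m regions. It assumes an isotropic Gaussian per region with a fixed, shared `sigma` and uses only 2-D coordinates; the visual cues used in the talk's partition are not modelled. `em_regions` and its parameters are hypothetical names.

```python
import math


def em_regions(points, init_means, sigma=1.0, iters=20):
    """Partition 2-D geotags into len(init_means) regions with a toy EM.

    Assumes an isotropic Gaussian per region with a fixed, shared sigma;
    only the region centres and mixing weights are re-estimated.
    """
    m = len(init_means)
    means = [tuple(mu) for mu in init_means]
    weights = [1.0 / m] * m
    resp = []
    for _ in range(iters):
        # Expectation step: posterior of each region given each photo.
        resp = []
        for px, py in points:
            logs = []
            for w, (mx, my) in zip(weights, means):
                d2 = (px - mx) ** 2 + (py - my) ** 2
                logs.append(math.log(w) - d2 / (2 * sigma ** 2)
                            if w > 0 else float("-inf"))
            top = max(logs)  # subtract the max to avoid underflow
            scores = [math.exp(v - top) for v in logs]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # Maximization step: update region centres and mixing weights.
        for j in range(m):
            nj = sum(r[j] for r in resp)
            if nj > 0:
                means[j] = (
                    sum(r[j] * p[0] for r, p in zip(resp, points)) / nj,
                    sum(r[j] * p[1] for r, p in zip(resp, points)) / nj,
                )
            weights[j] = nj / len(points)
    # Hard region assignment: the most responsible region per photo.
    labels = [max(range(m), key=lambda j: r[j]) for r in resp]
    return labels, means


photos = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centres = em_regions(photos, [(1.0, 1.0), (4.0, 4.0)])
print(labels, centres)
```

A fixed `sigma` keeps the sketch short; a fuller mixture model would also re-estimate the covariance of each region.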
The geographical distribution of Flickr and Panoramio photos in Beijing City
The visual-aware partition of Beijing into geographical regions. Different colors denote different clusters.
For the region to which the query belongs, compact codewords are learnt for transmission
The learnt compact codebook and the extracted descriptors in exemplar queries. Left: the compact codebook in the query's assigned region; Middle: the query, where color highlights denote the detected descriptors on the query; Right: the transmitted words. In practice, we transmit only their occurrence indices.
Learning a compression matrix M
The compact codebook U is generated from the initial vocabulary V via U = M^T V
Use a scalable vocabulary tree (SVT) to build the initial high-dimensional vocabulary: hierarchical quantization of local features
A good codebook U should minimize the ranking loss, in which each image I_x carries a ranking position weight with respect to I_q, such that a higher rank corresponds to a larger weight.

with sample loss, Boosting
[Diagram labels: Error Weighting, Ranking, Decoded Signature]
Given a region, aim to minimize the overall cost:

Find the i-th best codeword
Error weighting update
Compression function update
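The three steps above follow the general pattern of a boosting loop, which can be sketched as below. This is an illustration under stated assumptions, not the method from the talk: `coverage_loss` is a toy stand-in for the ranking loss, the exponential reweighting is borrowed from AdaBoost-style schemes, and all names are hypothetical. It also assumes `k` does not exceed the number of candidate words.

```python
import math


def coverage_loss(sample_words, codebook):
    """Toy stand-in for the ranking loss: share of a query's words not kept."""
    kept = len(sample_words & set(codebook))
    return 1.0 - kept / len(sample_words)


def boost_codebook(candidates, samples, k, loss=coverage_loss):
    """Greedily pick k codewords, reweighting poorly served samples."""
    weights = [1.0 / len(samples)] * len(samples)
    selected = []
    for _ in range(k):
        # Find the i-th best codeword under the current sample weights.
        remaining = [w for w in candidates if w not in selected]
        best = min(remaining, key=lambda w: sum(
            wt * loss(s, selected + [w]) for wt, s in zip(weights, samples)))
        # Compression function update: the codebook grows by one word.
        selected.append(best)
        # Error weighting update: samples with higher loss gain weight.
        errors = [loss(s, selected) for s in samples]
        weights = [wt * math.exp(e) for wt, e in zip(weights, errors)]
        total = sum(weights)
        weights = [wt / total for wt in weights]
    return selected


training_queries = [{1, 2}, {1, 3}, {1, 2}, {4}]  # toy sample queries
print(boost_codebook([1, 2, 3, 4], training_queries, 2))
```

In the toy data, word 1 is picked first; the reweighting then raises the importance of the query `{4}`, which no selected word serves yet, so word 4 is preferred over word 2 in the second round.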

For more details, please refer to Location Discriminative Vocabulary Coding for Mobile Landmark Search, Int. Journal of Computer Vision, In Press
A chicken-and-egg problem, solved via iterative co-optimization

Coding length of region i
The maximal coding length of all regions
Compactness Learning Process over Beijing City
We normalize the min vs. max ratio of descriptor lengths and map the ratio to the saturation of red color. The green points denote the distribution of geo-tagged photos. In general, a less saturated map corresponds to more optimal descriptors.
The geographical visualization of the descriptor compactness in Beijing city through iterative co-optimization (T = 1 to 20), which is discussed later

Over 1 M geographically tagged photos from Flickr and Panoramio websites (Beijing, New York City, Barcelona, Singapore, etc.)
From the geographical map of each city, we choose the 30 densest regions and 30 random regions
Ask volunteers to manually identify one or more dominant views. All near-duplicate landmark photos of a given view are labeled in its own region and nearby regions

Nister et al., 2006

Jegou et al., 2010
Chandrasekhar et al., 2009a
Chen et al., 2009
Our alternative approach

Rate: bit rate of the final descriptor
Distortion: mAP performance
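For reference, mean average precision (mAP), the distortion measure named above, is commonly computed as sketched below. The exact variant used in the experiments (for example, a cut-off at the top N results) is not stated here, so this shows only the standard definition; the function names are illustrative.

```python
def average_precision(ranked, relevant):
    """Average of precision values at the ranks where relevant items appear."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(results):
    """results: list of (ranked list, set of relevant items), one per query."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)


queries = [
    (["a", "x", "b"], {"a", "b"}),  # relevant items at ranks 1 and 3
    (["x", "c"], {"c"}),            # relevant item at rank 2
]
print(mean_average_precision(queries))
```

Plotting this value against the descriptor bit rate gives a rate-distortion curve, so a method is better when it reaches a higher mAP at the same number of bits.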
Case study of illumination changes, scale changes, blurred photographing, occlusions, and partial landmark queries. Top: Vocabulary Boosting; Middle: Original BoW features or Tree Histogram Coding; Bottom: IDF Thresholding (top 20%).
mAP with respect to different regions
mAP variances in different regions: we draw two-dimensional lattices to divide regions with respect to different image volumes and descriptor bits, then average the mAP for regions falling into each lattice.
ensure interoperability of visual search applications and databases,
enable high level of performance of implementations conformant to the standard,
simplify design of descriptor extraction and matching for visual search applications,
enable hardware support for descriptor extraction and matching in mobile devices,
reduce load on wireless networks carrying visual search related information.
Meeting #   Date                       Action
97          July 18-22, 2011           Final CfP issued
98          Nov. 28 - Dec. 02, 2011    Initial evaluation of proposals and assignment of cross-checks
99          Feb. 06-10, 2012           WD1
101         July 16-20, 2012           CD
103         Jan. 21-25, 2013           DIS
105         July, 2013                 FDIS
Requirement: Sufficiency
Description: Descriptors shall be self-contained, in the sense that no other data are necessary for matching.
Fulfillment information: Y/N

Requirement: Format independence
Description: Descriptors shall be independent of the image format.
Fulfillment information: Y/N

Requirement: Robustness
Description: High matching accuracy shall be achieved at least for images of textured rigid objects, landmarks, and printed documents. The matching accuracy shall be robust to changes in vantage point, camera parameters, lighting conditions, as well as in the presence of partial occlusions.
Fulfillment information: * Retrieval and pair-wise matching accuracy results obtained for different descriptor lengths.

Requirement: Compactness
Description: Shall minimize lengths/size of image descriptors.

Requirement: Scalability
Description: 1. Shall allow adaptation of descriptor lengths to support the required performance level and database size. 2. Shall enable design of web-scale visual search applications and databases.
Fulfillment information: Range of descriptor lengths and database sizes supported.

Requirement: Extraction complexity
Description: Shall allow descriptor extraction with low complexity (in terms of memory and computation).
Fulfillment information: * Timing information for descriptor extraction operations.

Requirement: Matching complexity
Description: 1. Shall allow matching of descriptors with low complexity (in terms of memory and computation). 2. If decoding of descriptors is required for matching, such decoding shall also be possible with low complexity.
Fulfillment information: * Timing information for retrieval and pair-wise matching operations.

Requirement: Localization
Description: 1. Shall support visual search algorithms that identify and localize matching regions of the query image and the database image. 2. Shall support visual search algorithms that provide an estimate of a geometric transformation between matching regions of the query image and the database image.
Fulfillment information: * Localization accuracy results.

(*) Results obtained using databases and test procedures specified in the evaluation framework [1].
The Framework of Multi-channel Compact Visual Descriptor (MCVD)
The encoded compact descriptor includes two parts:
Channel Identification + Channel Dependent Visual Descriptor
Note: A channel refers to a subdivision (partition) of the reference database. Within each channel, we may generate a channel-dependent (extremely) compact descriptor.
The channel learning can be based on contextual tags (e.g. GPS or RFID) or can be based solely on the visual statistics.
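The two-part layout (channel identification followed by a channel-dependent descriptor) can be illustrated with a small bit-packing sketch. The actual MCVD bitstream syntax is not given here, so the fixed-width fields, the use of codeword indices as the descriptor body, and the function names are all assumptions made for illustration.

```python
def index_bits(n):
    """Bits needed to address n distinct values (at least 1)."""
    return max(1, (n - 1).bit_length())


def encode_mcvd(channel_id, num_channels, word_ids, codebook_size):
    """Channel id first, then one fixed-width index per occurring codeword."""
    bits = format(channel_id, "0{}b".format(index_bits(num_channels)))
    width = index_bits(codebook_size)
    for w in sorted(set(word_ids)):
        bits += format(w, "0{}b".format(width))
    return bits


def decode_mcvd(bits, num_channels, codebook_size):
    """Inverse of encode_mcvd: returns (channel_id, sorted codeword indices)."""
    head = index_bits(num_channels)
    width = index_bits(codebook_size)
    channel_id = int(bits[:head], 2)
    body = bits[head:]
    words = [int(body[i:i + width], 2) for i in range(0, len(body), width)]
    return channel_id, words


payload = encode_mcvd(2, 8, [5, 1], 16)
print(payload, len(payload), decode_mcvd(payload, 8, 16))
```

Because the server learns the channel from the first field, it can interpret the remaining indices against that channel's own small codebook, which is what lets the per-channel descriptor stay short.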
Where standardization takes place
GPS
Access Points
RFID Tags
Standardization Efforts
Visually Recognized Barcode, Logos, etc.
Justification of Channel in MCVD
[Diagram labels: ChannelNum = 1; Channel Division; Channel 1 Compact Codebook; Channel 2 Compact Codebook; Boosting Compact Codebook; Channel 1 Model; Channel 2 Model; Channel K Model]
Two typical MCVD cases with or without contextual information, with comparison to the state of the art, over a million-scale landmark dataset.
Comparison of mAP with respect to the upstream query bit rate for product/CD/book cover benchmark databases (publicly available).
Bandwidth
Battery
Search
Phone battery: voltage 4.0 V, capacity 1400 mAh (or 20.2K Joules)
Sending images: 20.2K Joules / 52.4 Joules ~ 385 queries in total
Sending MCVD: 20.2K Joules / 8.1 Joules ~ 2494 queries in total
Average energy consumption comparison through the 3G wireless link, between transmitting the entire query image and extracting and transmitting MCVD and other compact descriptors.
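The battery figures above can be checked with a few lines of arithmetic. The conversion is energy (J) = voltage (V) x capacity (Ah) x 3600 s/h, so 4.0 V x 1.4 Ah gives about 20,160 J, which the slide rounds to 20.2K Joules. The sketch below uses that rounded value for the per-charge counts so that the results match the slide; the function names are illustrative only.

```python
def battery_energy_joules(voltage_v, capacity_mah):
    """Energy in joules: volts * ampere-hours * 3600 seconds per hour."""
    return voltage_v * (capacity_mah / 1000.0) * 3600.0


def queries_per_charge(battery_joules, joules_per_query):
    """How many queries one full charge can pay for."""
    return battery_joules / joules_per_query


exact = battery_energy_joules(4.0, 1400)  # unrounded battery energy
rounded = 20200.0                         # the slide's rounded 20.2K Joules
print(round(exact))
print(round(queries_per_charge(rounded, 52.4)))  # sending the whole image
print(round(queries_per_charge(rounded, 8.1)))   # extracting and sending MCVD
```

With the rounded battery energy this reproduces the 385 and 2494 queries quoted above, roughly a 6.5-fold difference in queries per charge.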

Our relevant papers
Selected Papers
Location Discriminative Vocabulary Coding for Mobile Landmark Search, Int. Journal of Computer Vision, In Press
Learning Compact Visual Descriptor for Low Bit Rate Mobile Landmark Search, IJCAI 2011, Barcelona, Spain, Jul. 2011
Towards Low Bit Rate Mobile Visual Search with Multiple Channel Coding, ACM MM 2011, Arizona, USA, Nov. 2011
MPEG Input Contributions
Compact Descriptors for Visual Search, m18542, MPEG 94th
PKUBench: A Contextual Rich Benchmark for Mobile Visual Search, m19188, MPEG 95th
Multiple Channel Compact Visual Descriptor with Adaptive Channel Learning, m19985, MPEG 96th
Topic-level Sampling Towards Optimized Locality Sensitive Vocabulary Coding, m21199, MPEG 97th
Bandwidth
Battery
Search
Visual Search
Side Information
Mobile Visual Search
Thanks! Q&A
