
Institute of Digital Media, The School of CS & EE, Peking University, China
Aug. 18, 2011, NTU, Singapore

[Figure: object categories along two axes (rigid vs. articulated, textured vs. contour-defined): product packaging, logos, faces, cars, unpackaged products, pets, landmarks]

Moment of truth for computer vision
Instance matching, not class recognition

Pervasive Mobile Visual Search
Developed systems
Mobot, Neven Vision, Ideixis, SnapTell, Point and Find, Google Goggles

Side information is useful for mobile visual search
Highly efficient visual vocabulary coding
Visual Search

Mobile Visual Search
Side Information

Desirable search precision and fast speed over extremely large-scale datasets

Good scalability in terms of upstream query coding complexity

Pervasive Side (Context) Information

GPS

Access points

RFID tags

Visually easily recognized barcodes, labels or logos


A mobile photo query has to be transmitted from a mobile device to a server, often over a relatively slow wireless link (3G)
The quality of the user experience depends heavily on how much information has to be transmitted, especially in augmented reality scenarios


Compact local feature descriptors
e.g. CHoG (Stanford), DAISY (EPFL), PCA-SIFT descriptors (CMU)

Compact image signatures


e.g. miniBoW, aggregated local descriptors (INRIA)

Previous works still demand 2 KB to 4 KB of visual descriptors per query image, which is unsuitable for augmented reality
Over an unstable wireless network (e.g. a 3G network), the delivery cost still occasionally delays the query

Transform coding of SIFT/SURF descriptors (Chandrasekhar et al., VCIP 09)
Direct compression of oriented image patches (M. Makar et al., ICASSP 09)
Descriptor designed for compressibility: CHoG (Chandrasekhar et al., CVPR 09)
Tree Histogram Coding (Chen et al., DCC 09)
Compression of the spatial layout of local features (Tsai et al., Mobimedia 10)


Compact Descriptor for Visual Search
An ongoing MPEG Ad Hoc Group

Standard schedule
April 2010: the CDVS Ad Hoc Group is initiated
July 2011: final CfP

Evaluation set
Mobile query, especially landmark queries

Major Requirement of Compact Descriptors


Compactness
Discriminability


All state-of-the-art compact descriptors are designed based only on visual content statistics

Incorporate the cheaply available mobile context (e.g. GPS or base station tags) to supervise the descriptor design


Effective and efficient low bit rate mobile visual search (say, hundreds of bits per visual query)
Coding/compression of an originally high-dimensional image signature (e.g. BoW)

M is learnt from the mobile context


User enters a given region
The mobile end downloads or pre-stores the region-specific description function
The mobile user takes a query photo
The mobile end extracts the initial BoW and compresses it into a region-specific compact descriptor
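The client-side steps listed here can be sketched as a small routine. This is only an illustration under stated assumptions: regions are modeled as hypothetical bounding boxes, the region-specific description function is modeled as a hypothetical list of kept word indices, and the initial BoW histogram is assumed to be already extracted. None of the names below come from the slides.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical region table: region id -> (lat_min, lat_max, lon_min, lon_max).
REGIONS: Dict[int, Tuple[float, float, float, float]] = {
    0: (39.90, 39.95, 116.35, 116.42),
    1: (39.95, 40.00, 116.35, 116.42),
}

# Hypothetical region-specific description functions: the indices of the
# initial-vocabulary words kept in each region's compact codebook.
REGION_WORDS: Dict[int, List[int]] = {0: [2, 5, 7], 1: [0, 1, 9]}


def find_region(lat: float, lon: float) -> Optional[int]:
    """Return the id of the region containing the GPS position, if any."""
    for rid, (lat_min, lat_max, lon_min, lon_max) in REGIONS.items():
        if lat_min <= lat < lat_max and lon_min <= lon < lon_max:
            return rid
    return None


def compress_query(bow: List[int], lat: float, lon: float) -> Tuple[int, List[int]]:
    """Map an initial BoW histogram to (region id, occurrence flags of kept words)."""
    rid = find_region(lat, lon)
    if rid is None:
        raise ValueError("query location is outside all known regions")
    occurrence = [1 if bow[w] > 0 else 0 for w in REGION_WORDS[rid]]
    return rid, occurrence
```

For example, `compress_query([0, 0, 3, 0, 0, 1, 0, 0, 0, 0], 39.92, 116.40)` falls into hypothetical region 0 and keeps only the flags of words 2, 5 and 7.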

Geographical region division
Learning a compact descriptor within each region


Crawled from Flickr and Panoramio websites

Beijing, New York City, Barcelona, Singapore, Florence

The geographical distribution of Flickr and Panoramio photos in Beijing

Notation: $m$ regions in total; current geo-tagged photo $x$; region assignment $y_i$ with $i \in [1, m]$


Expectation Step

Maximization Step
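The equations for the two steps did not survive extraction. As a generic illustration only, the sketch below runs EM for an isotropic Gaussian mixture over 2-D photo coordinates. Modeling the regions as a Gaussian mixture is an assumption, the function name `em_gmm` is hypothetical, and the visual-aware part of the partition mentioned in the talk is not modeled here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def em_gmm(points: List[Point], means: List[Point], iters: int = 50):
    """Fit an isotropic 2-D Gaussian mixture; return (means, variances, priors, labels)."""
    m, n = len(means), len(points)
    variances = [1.0] * m
    priors = [1.0 / m] * m
    resp = [[0.0] * m for _ in range(n)]
    for _ in range(iters):
        # Expectation step: posterior probability of each region for each photo.
        for j, (x, y) in enumerate(points):
            dens = []
            for i in range(m):
                d2 = (x - means[i][0]) ** 2 + (y - means[i][1]) ** 2
                norm = 2.0 * math.pi * variances[i]
                dens.append(priors[i] * math.exp(-d2 / (2.0 * variances[i])) / norm)
            total = sum(dens) or 1e-300
            resp[j] = [d / total for d in dens]
        # Maximization step: re-estimate priors, means and variances.
        new_means = []
        for i in range(m):
            w = sum(resp[j][i] for j in range(n)) or 1e-300
            mx = sum(resp[j][i] * points[j][0] for j in range(n)) / w
            my = sum(resp[j][i] * points[j][1] for j in range(n)) / w
            d2 = sum(
                resp[j][i] * ((points[j][0] - mx) ** 2 + (points[j][1] - my) ** 2)
                for j in range(n)
            )
            variances[i] = max(d2 / (2.0 * w), 1e-6)  # floor avoids collapse
            priors[i] = w / n
            new_means.append((mx, my))
        means = new_means
    # Hard region assignment: the most probable region for each photo.
    labels = [max(range(m), key=lambda i: r[i]) for r in resp]
    return means, variances, priors, labels
```

Each photo ends up with a hard region label, which plays the role of the region assignment in the notation used for the partition.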


The geographical distribution of Flickr and Panoramio photos in Beijing

The visual-aware partition of Beijing into geographical regions. Different colors denote different clusters.

For the region that the query belongs to, compact codewords are learnt for transmission

The learnt compact codebook and the extracted descriptors in exemplar queries. Left: the compact codebook in the query's assigned region; Middle: the query, where color highlights denote the detected descriptors on the query; Right: the transmitted words. We only transmit their occurrence index in practice.
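Transmitting only the occurrence index of codewords can be illustrated with a bit mask. The exact bitstream format is not given in the slides, so the one-bit-per-codeword layout and the function names below are assumptions made for this sketch.

```python
from typing import Iterable, List


def pack_occurrences(word_ids: Iterable[int], codebook_size: int) -> bytes:
    """Pack the set of occurring codeword indices into a mask of codebook_size bits."""
    mask = 0
    for w in set(word_ids):
        if not 0 <= w < codebook_size:
            raise ValueError(f"word index {w} outside codebook of size {codebook_size}")
        mask |= 1 << w
    return mask.to_bytes((codebook_size + 7) // 8, "big")


def unpack_occurrences(payload: bytes) -> List[int]:
    """Recover the sorted list of occurring codeword indices from the mask."""
    mask = int.from_bytes(payload, "big")
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]
```

With a hypothetical 100-word compact codebook the payload is 13 bytes (104 bits) regardless of how many words occur, which is in the range of "hundreds of bits per query" targeted by the talk.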

Learning a compression matrix $\mathbf{M}_{M \times K}$, $K \ll M$

The compact codebook $U$ is generated via $U = \mathbf{M}^{T} V$
Use a scalable vocabulary tree (SVT) to build the initial high-dimensional vocabulary $V$
Hierarchical quantization of local features
A good codebook $U$ should minimize the ranking loss

where the weight term is the ranking position weight of $I_x$ with respect to $I_q$, such that a higher rank corresponds to a larger weight.
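The projection by the transposed compression matrix can be made concrete with a few lines of code. This is a minimal sketch, assuming the high-dimensional input is a BoW histogram `v` over the initial vocabulary and that the compression matrix has one row per initial word and one column per compact word. The function name and the example matrix are hypothetical, and the ranking loss itself is not implemented.

```python
from typing import List


def compress(M: List[List[float]], v: List[float]) -> List[float]:
    """Return u = M^T v, where M has len(v) rows and K columns."""
    if len(M) != len(v):
        raise ValueError("M must have one row per entry of v")
    k = len(M[0])
    return [sum(M[i][j] * v[i] for i in range(len(v))) for j in range(k)]


# Hypothetical 5 x 2 matrix: compact word 0 merges initial words 0 and 1,
# compact word 1 keeps initial word 3, and words 2 and 4 are dropped.
M_EXAMPLE = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]
```

Here `compress(M_EXAMPLE, [4.0, 0.0, 2.0, 7.0, 1.0])` maps a 5-bin histogram to the 2-bin signature `[4.0, 7.0]`.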



with sample loss, Boosting
Error Weighting Ranking Decoded Signature

Given a region, aim to minimize the overall cost:



Find the i-th best codeword

Error weighting update

Compression function update
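The three steps of one boosting round can be sketched as a greedy loop. The slide equations are not recoverable, so this sketch swaps in a toy surrogate loss (the squared histogram mass left outside the selected words) in place of the ranking loss, and models the compression function as a plain list of selected word indices. Both choices, and the name `boost_codewords`, are assumptions for illustration.

```python
from typing import List, Tuple


def boost_codewords(samples: List[List[float]], k: int) -> Tuple[List[int], List[float]]:
    """Greedily pick k word indices; return (selected indices, final sample weights)."""
    n, dim = len(samples), len(samples[0])
    weights = [1.0 / n] * n
    selected: List[int] = []
    for _ in range(k):
        # Find the next best codeword: the unselected word that removes the
        # most weighted squared error over the training samples.
        candidates = [w for w in range(dim) if w not in selected]
        best = max(
            candidates,
            key=lambda w: sum(weights[j] * samples[j][w] ** 2 for j in range(n)),
        )
        # Error weighting update: samples that remain poorly represented
        # receive a larger weight in the next round.
        kept = selected + [best]
        residual = [sum(s[w] ** 2 for w in range(dim) if w not in kept) for s in samples]
        total = sum(residual)
        weights = [r / total for r in residual] if total > 0 else [1.0 / n] * n
        # Compression function update: the chosen word joins the codebook.
        selected = kept
    return selected, weights
```

The reweighting is what distinguishes this from picking the k most frequent words: a word that matters only for currently badly served samples can still be chosen in a later round.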



For more details, please refer to "Location Discriminative Vocabulary Coding for Mobile Landmark Search", Int. Journal of Computer Vision, in press

A chicken-and-egg problem
Solved via an iterative co-optimization


Coding length of region i
The maximal coding length of all regions

Compactness Learning Process over Beijing City

We normalize the min vs. max ratio of descriptor lengths and map the ratio to the saturation of the red color. The green points denote the distribution of geo-tagged photos. In general, a less saturated map corresponds to more optimal descriptors.

The geographical visualization of the descriptor compactness in Beijing through iterative co-optimization (T = 1 to 20), which is discussed later


Over 1 million geo-tagged photos from the Flickr and Panoramio websites (Beijing, New York City, Barcelona, Singapore, etc.)

From the geographical map of each city, we choose the 30 densest regions and 30 random regions
Volunteers are asked to manually identify one or more dominant views. All near-duplicate landmark photos of a given view are labeled in the region it belongs to and in nearby regions


Nister et al. 2006


Jegou et al., 2010

Chandrasekhar et al., 2009a

Chen et al., 2009

Our alternative approach



Rate: bit rate of the final descriptor
Distortion: mAP performance
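Since distortion is measured by mAP, a short reference computation may help. The sketch below uses the standard definition of average precision over a full ranked list. The slides do not state whether a rank cut-off is applied, so none is used here, and the function names are illustrative.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision values at the ranks where relevant items appear."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

A rate-distortion curve is then obtained by evaluating `mean_average_precision` once per descriptor bit rate.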


Case study of illumination changes, scale changes, blurred photographing, occlusions, and partial landmark queries. Top: Vocabulary Boosting; Middle: original BoW features or Tree Histogram Coding; Bottom: IDF thresholding (top 20%).

mAP with respect to different regions

mAP variances in different regions: we draw two-dimensional lattices that divide the regions by image volume and descriptor bits, then average the mAP of the regions falling into each lattice cell.

ensure interoperability of visual search applications and databases,
enable a high level of performance of implementations conformant to the standard,
simplify design of descriptor extraction and matching for visual search applications,
enable hardware support for descriptor extraction and matching in mobile devices,
reduce load on wireless networks carrying visual search related information.

Meeting #   Date                       Action
97          July 18-22, 2011           Final CfP issued
98          Nov. 28 - Dec. 02, 2011    Initial evaluation of proposals and assignment of cross-checks
99          Feb. 06-10, 2012           WD1
101         July 16-20, 2012           CD
103         Jan. 21-25, 2013           DIS
105         July 2013                  FDIS

Requirement / Description / Fulfillment information

Sufficiency
Description: Descriptors shall be self-contained, in the sense that no other data are necessary for matching.
Fulfillment information: Y/N

Format independence
Description: Descriptors shall be independent of the image format.
Fulfillment information: Y/N

Robustness
Description: High matching accuracy shall be achieved at least for images of textured rigid objects, landmarks, and printed documents. The matching accuracy shall be robust to changes in vantage point, camera parameters, lighting conditions, as well as in the presence of partial occlusions.
Fulfillment information: *Retrieval and pairwise matching accuracy results obtained for different descriptor lengths.

Compactness
Description: Shall minimize lengths/size of image descriptors.

Scalability
Description: 1. Shall allow adaptation of descriptor lengths to support the required performance level and database size. 2. Shall enable design of web-scale visual search applications and databases.
Fulfillment information: Range of descriptor lengths and database sizes supported.

Extraction complexity
Description: Shall allow descriptor extraction with low complexity (in terms of memory and computation).
Fulfillment information: *Timing information for descriptor extraction operations.

Matching complexity
Description: 1. Shall allow matching of descriptors with low complexity (in terms of memory and computation). 2. If decoding of descriptors is required for matching, such decoding shall also be possible with low complexity.
Fulfillment information: *Timing information for retrieval and pairwise matching operations.

Localization
Description: 1. Shall support visual search algorithms that identify and localize matching regions of the query image and the database image. 2. Shall support visual search algorithms that provide an estimate of a geometric transformation between matching regions of the query image and the database image.
Fulfillment information: *Localization accuracy results.

(*) Results obtained using databases and test procedures specified in the evaluation framework [1].

The Framework of Multi-channel Compact Visual Descriptor (MCVD)

The encoded compact descriptor includes two parts:

Channel Identification + Channel-Dependent Visual Descriptor
Note: A channel refers to a subdivision (partition) of the reference database. Within each channel, we may generate a channel-dependent (extremely) compact descriptor.
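The two-part layout can be sketched as a bit string made of a channel ID followed by the channel-dependent payload. The slides do not specify field widths or a syntax, so the fixed-width ID field and the function names below are assumptions for illustration only.

```python
from typing import Tuple


def _id_width(num_channels: int) -> int:
    """Number of bits needed to address num_channels channels (at least 1)."""
    return max(1, (num_channels - 1).bit_length())


def encode_mcvd(channel_id: int, num_channels: int, descriptor_bits: str) -> str:
    """Concatenate a fixed-width channel ID with a channel-dependent bit string."""
    if not 0 <= channel_id < num_channels:
        raise ValueError("channel_id out of range")
    if set(descriptor_bits) - {"0", "1"}:
        raise ValueError("descriptor_bits must contain only '0' and '1'")
    return format(channel_id, f"0{_id_width(num_channels)}b") + descriptor_bits


def decode_mcvd(bits: str, num_channels: int) -> Tuple[int, str]:
    """Split an encoded string back into (channel id, descriptor bits)."""
    width = _id_width(num_channels)
    return int(bits[:width], 2), bits[width:]
```

With 20 hypothetical channels the ID costs 5 bits, so the overhead of channel identification is small next to a descriptor of a few hundred bits. The server uses the decoded ID to pick the matching channel model before interpreting the payload.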

The channel learning can be based on contextual tags (e.g. GPS or RFID) or solely on the visual statistics.

Where standardization takes place
[Figure labels: GPS; access points; RFID tags; visually recognized barcodes, logos, etc.; standardization efforts]

Justification of Channel in MCVD
[Figure labels: ChannelNum = 1; Channel Division; Channel 1 Model, Channel 2 Model, ..., Channel K Model; Channel 1 Compact Codebook, Channel 2 Compact Codebook; Boosting Compact Codebook]

Two typical MCVD cases, with or without contextual information, compared with the state of the art over a million-scale landmark dataset.

Comparison of mAP with respect to the upstream query bit rate for product/CD/book cover benchmark databases (publicly available).

Bandwidth

Battery

Search


Phone battery: voltage 4.0 V, capacity 1400 mAh (about 20.2K Joules)
Sending images: 20.2K Joules / 52.4 Joules ≈ 385 queries in total
Sending MCVD: 20.2K Joules / 8.1 Joules ≈ 2494 queries in total
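The battery figures follow from energy = voltage × charge. The short check below reproduces them. The function names are illustrative, and the per-query energies (52.4 J and 8.1 J) are taken from the slide as given.

```python
def battery_energy_joules(voltage_v: float, capacity_mah: float) -> float:
    """Energy in joules: volts x ampere-hours x 3600 seconds per hour."""
    return voltage_v * (capacity_mah / 1000.0) * 3600.0


def queries_per_charge(battery_j: float, energy_per_query_j: float) -> float:
    """How many queries one full charge can pay for."""
    return battery_j / energy_per_query_j


exact_j = battery_energy_joules(4.0, 1400.0)    # 4.0 V x 1.4 Ah x 3600 s/h
rounded_j = 20200.0                             # the slide's rounded 20.2K Joules
image_queries = queries_per_charge(rounded_j, 52.4)
mcvd_queries = queries_per_charge(rounded_j, 8.1)
```

The exact product is 20,160 J, which the slide rounds to 20.2K Joules. With the rounded value, the two quotients round to 385 and 2494 queries, so sending MCVD allows roughly 6.5 times as many queries per charge as sending the full image.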

Average energy consumption comparison over the 3G wireless link, between transmitting the entire query image and extracting and transmitting MCVD and other compact descriptors.

Our relevant papers
Selected Papers
Location Discriminative Vocabulary Coding for Mobile Landmark Search, Int. Journal of Computer Vision, in press
Learning Compact Visual Descriptor for Low Bit Rate Mobile Landmark Search, IJCAI 2011, Barcelona, Spain, Jul. 2011
Towards Low Bit Rate Mobile Visual Search with Multiple Channel Coding, ACM MM 2011, Arizona, USA, Nov. 2011

MPEG Input Contributions
Compact Descriptors for Visual Search, m18542, MPEG 94th
PKUBench: A Contextual Rich Benchmark for Mobile Visual Search, m19188, MPEG 95th
Multiple Channel Compact Visual Descriptor with Adaptive Channel Learning, m19985, MPEG 96th
Topic-level Sampling Towards Optimized Locality Sensitive Vocabulary Coding, m21199, MPEG 97th

Bandwidth

Battery

Search

Visual Search

Side Information

Mobile Visual Search

Thanks! Q&A
