of Digital Media, The School of CS & EE, Peking University, China Aug. 18, 2011, NTU, Singapore
Rigid Textured
Faces Cars
Landmarks
Moment of truth for computer vision
Instance matching, not class recognition
Pervasive Mobile Visual Search
Developed Systems
Mobot, Neven Vision, Ideixis, Snaptell, Point and Found, Google Goggles
Side information is useful for mobile visual search
Highly efficient visual vocabulary coding
Visual Search
Mobile Visual Search
Side Information
Pervasive Side (Context) Information
GPS
Access Points
RFID Tags
Visually easily recognized barcodes, labels, or logos
A mobile photo query has to be transmitted from the mobile device to a server, often over a relatively slow wireless link (e.g. 3G).
The quality of the user experience depends heavily on how much information has to be transmitted, especially in reality augmentation scenarios.
Compact local feature descriptors
e.g. CHoG (Stanford), DAISY (EPFL), PCA-SIFT descriptors (CMU)
Previous works still demand 2 KB to 4 KB of visual descriptors per query image, which is unsuitable for reality augmentation.
Within an unstable wireless network (e.g. a 3G network), the delivery cost still occasionally delays the query.
Transform coding of SIFT/SURF descriptors (Chandrasekhar et al., VCIP 09)
Direct compression of oriented image patches (M. Makar et al., ICASSP 09)
Descriptor designed for compressibility: CHoG (Chandrasekhar et al., CVPR 09)
Tree Histogram Coding (Chen et al., DCC 09)
Compression of spatial layout of local features (Tsai et al., Mobimedia 10)
Compact Descriptor for Visual Search
An ongoing MPEG AD HOC Group
Standard schedule
April 2010: CDVS Ad Hoc Group initiated
July 2011: Final CfP
Evaluation set
Mobile query, especially landmark queries
All state-of-the-art compact descriptors are designed based only on visual content statistics.
Incorporate the cheaply available mobile context (e.g. GPS or base station tags) to supervise the descriptor design.
Effective and efficient low-bit-rate mobile visual search (say, hundreds of bits per visual query)
Coding/compression of an originally high-dimensional image signature (e.g. BoW)
The user enters a given region.
The mobile end downloads or pre-stores the region-specific description function.
The mobile user takes a query photo.
The mobile end extracts the initial BoW and compresses it into a region-specific compact descriptor.
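The query-side steps above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the region table `REGIONS`, its bounding boxes, the codeword indices, and the helper names `find_region` and `compress` are all hypothetical, and the "description function" is assumed here to be nothing more than a per-region list of retained codeword indices.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical region table (assumption): region id -> (bounding box, retained codeword indices).
# Bounding box is (min_lat, min_lon, max_lat, max_lon).
REGIONS: Dict[str, Tuple[Tuple[float, float, float, float], List[int]]] = {
    "region_a": ((39.90, 116.38, 39.92, 116.40), [3, 7, 11, 42]),
    "region_b": ((39.98, 116.30, 40.00, 116.32), [5, 8, 13]),
}


def find_region(lat: float, lon: float) -> Optional[str]:
    """Return the id of the first region whose bounding box contains the GPS fix."""
    for region_id, (box, _) in REGIONS.items():
        min_lat, min_lon, max_lat, max_lon = box
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            return region_id
    return None


def compress(bow: Dict[int, int], codewords: List[int]) -> List[int]:
    """Keep only the region's codewords that occur in the query.

    Returns positions inside the region codebook, not the original word ids,
    so each transmitted index needs fewer bits.
    """
    return [pos for pos, word in enumerate(codewords) if bow.get(word, 0) > 0]


# Toy query: initial BoW histogram as {word id: count}.
query_bow = {7: 2, 42: 1, 5: 4}
region = find_region(39.91, 116.39)
descriptor = compress(query_bow, REGIONS[region][1]) if region else []
print(region, descriptor)
```

Only the short index list leaves the phone in this sketch; the server is assumed to hold the same region table so it can map the indices back to codewords.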
Crawled from Flickr and Panoramio websites
Expectation Step
Maximization Step
The geographical distribution of Flickr and Panoramio photos in Beijing City
For the region the query belongs to, compact codewords are learnt for transmission.
The learnt compact codebook and the extracted descriptors in exemplar queries. Left: the compact codebook in the query's assigned region; Middle: the query, where color highlights denote the descriptors detected on the query; Right: the transmitted words. In practice, we transmit only their occurrence indices.
The compact codebook U is generated via U = M^T V.
Use a scalable vocabulary tree (SVT) to build the initial high-dimensional vocabulary V (hierarchical quantization of local features).
A good codebook U should minimize the ranking loss, in which each returned image I_x carries a ranking-position weight with respect to the query I_q, such that a higher rank corresponds to a larger weight.
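Taking the codebook relation above to be U = M^T V, a toy numerical sketch may help fix the shapes. Everything here is an assumption made for illustration: V is stored as N rows (one per original word) of dimension D, M is an N x K matrix, and projecting a BoW histogram h as M^T h is one plausible reading rather than a confirmed step of the method. The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain row-by-column matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def compact_codebook(M: Matrix, V: Matrix) -> Matrix:
    """U = M^T V, with M of shape (N x K) and V of shape (N x D), so U is (K x D)."""
    return matmul(transpose(M), V)


def compact_signature(M: Matrix, bow: List[float]) -> List[float]:
    """Assumed projection of an N-bin BoW histogram onto the K compact words: M^T h."""
    return [sum(M[n][k] * bow[n] for n in range(len(bow))) for k in range(len(M[0]))]


# Toy data: N = 4 original words in D = 2 dimensions, merged into K = 2 compact words.
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
M = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

U = compact_codebook(M, V)
signature = compact_signature(M, [2.0, 0.0, 1.0, 3.0])
print(U, signature)
```

In this toy case M simply merges words 0 and 3 into one compact word and words 1 and 2 into another; a learnt M would instead be chosen to keep the ranking loss low.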
with sample loss, Boosting
Error Weighting Ranking Decoded Signature
Find the ith best code word
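The boosting idea of picking one codeword per round while re-weighting the sample queries can be sketched as a greedy loop. This is a generic forward-selection sketch under stated assumptions, not the authors' algorithm: `boost_select`, the caller-supplied `sample_loss` function, and the re-weighting rule (weights proportional to each sample's current loss) are all hypothetical choices.

```python
from typing import Callable, List, Sequence


def boost_select(
    num_words: int,
    num_samples: int,
    sample_loss: Callable[[int, Sequence[int]], float],
    budget: int,
) -> List[int]:
    """Greedily pick `budget` codewords, one per round (hypothetical helper)."""
    # Error weighting over the sample queries, uniform at the start.
    weights = [1.0 / num_samples] * num_samples
    selected: List[int] = []
    for _ in range(budget):
        best_word, best_loss = None, float("inf")
        for word in range(num_words):
            if word in selected:
                continue
            trial = selected + [word]
            # Weighted loss over all sample queries if `word` were added.
            total = sum(weights[s] * sample_loss(s, trial) for s in range(num_samples))
            if total < best_loss:
                best_word, best_loss = word, total
        if best_word is None:
            break  # no candidate words left
        selected.append(best_word)
        # Re-weight: samples that are still served badly get more attention next round.
        losses = [sample_loss(s, selected) for s in range(num_samples)]
        norm = sum(losses)
        if norm > 0:
            weights = [loss / norm for loss in losses]
    return selected


# Toy loss (assumption): sample s is served well only once the word it needs is selected.
NEEDED_WORD = [2, 2, 0]


def toy_loss(sample: int, words: Sequence[int]) -> float:
    return 0.0 if NEEDED_WORD[sample] in words else 1.0


chosen = boost_select(num_words=4, num_samples=3, sample_loss=toy_loss, budget=2)
print(chosen)
```

In the toy run, word 2 is picked first because it serves two of the three samples; the weight then shifts entirely to the remaining sample, so word 0 is picked second.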
Coding length of region i
The maximal coding length of all regions
We normalize the min vs. max ratio of descriptor lengths and map the ratio to the saturation of the red color. The green points denote the distribution of geo-tagged photos. In general, a less saturated map corresponds to better-optimized descriptors.
The geographical visualization of descriptor compactness in Beijing city through iterative co-optimization (T = 1 to 20), which is discussed later.
Over 1 M geo-tagged photos from the Flickr and Panoramio websites (Beijing, New York City, Barcelona, Singapore, etc.).
From the geographical map of each city, we choose the 30 densest regions and 30 random regions.
Volunteers manually identify one or more dominant views. All near-duplicate landmark photos of a given view are labeled in the region it belongs to and in nearby regions.
Nister et al. 2006
Jegou et al., 2010
Case study of illumination changes, scale changes, blurred photographing, occlusions, and partial landmark queries. Top: Vocabulary Boosting; Middle: Original BoW features or Tree Histogram Coding; Bottom: IDF Thresholding (top 20%).
mAP variance across regions: we draw a two-dimensional lattice that divides the regions by image volume and descriptor bits, then average the mAP of the regions falling into each lattice cell.
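The lattice averaging described above amounts to two-dimensional binning. The sketch below shows one way to do it; the bin edges, the toy region tuples, and the function name `lattice_average` are assumptions for illustration and do not come from the slides.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Each region is (image volume, descriptor bits, mAP).
Region = Tuple[float, float, float]


def lattice_average(
    regions: Sequence[Region],
    volume_edges: List[float],
    bit_edges: List[float],
) -> Dict[Tuple[int, int], float]:
    """Average mAP over the regions that fall into each (volume bin, bits bin) cell."""
    cells: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for volume, bits, m_ap in regions:
        # bisect_right gives the index of the bin that contains the value.
        cell = (bisect_right(volume_edges, volume), bisect_right(bit_edges, bits))
        cells[cell].append(m_ap)
    return {cell: sum(values) / len(values) for cell, values in cells.items()}


# Toy regions and edges (assumed values).
toy_regions = [(500, 100, 0.5), (800, 120, 0.75), (5000, 300, 0.25)]
averages = lattice_average(toy_regions, volume_edges=[1000], bit_edges=[200])
print(averages)
```

Cells with no regions are simply absent from the result, which keeps sparse lattices cheap to store.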
ensure interoperability of visual search applications and databases,
enable high level of performance of implementations conformant to the standard,
simplify design of descriptor extraction and matching for visual search applications,
enable hardware support for descriptor extraction and matching in mobile devices,
reduce load on wireless networks carrying visual search-related information.
Meeting #   Action
97          Final CfP issued
98          Initial evaluation of proposals and assignment of cross-checks
Description
Descriptors shall be self-contained, in the sense that no other data are necessary for matching.
Descriptors shall be independent of the image format.
High matching accuracy shall be achieved at least for images of textured rigid objects, landmarks, and printed documents. The matching accuracy shall be robust to changes in vantage point, camera parameters, and lighting conditions, as well as in the presence of partial occlusions.
Compactness: Shall minimize lengths/size of image descriptors.
Scalability:
1. Shall allow adaptation of descriptor lengths to support the required performance level and database size.
2. Shall enable design of web-scale visual search applications and databases.
Shall allow descriptor extraction with low complexity (in terms of memory and computation). (*Timing information for descriptor extraction operations.)
1. Shall allow matching of descriptors with low complexity (in terms of memory and computation).
2. If decoding of descriptors is required for matching, such decoding shall also be possible with low complexity.
(*Timing information for retrieval and pair-wise matching operations.)
Localization:
1. Shall support visual search algorithms that identify and localize matching regions of the query image and the database image.
2. Shall support visual search algorithms that provide an estimate of a geometric transformation between matching regions of the query image and the database image.
(*Localization accuracy results.)
(*) Results obtained using databases and test procedures specified in the evaluation framework [1].
The Framework of Multi-channel Compact Visual Descriptor (MCVD)
The encoded compact descriptor includes two parts:
Channel Identification + Channel-Dependent Visual Descriptor
Note: A channel refers to a subdivision (partition) of the reference database. Within each channel, we may generate a channel-dependent (extremely) compact descriptor.
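The two-part layout (channel identification followed by a channel-dependent descriptor) can be illustrated with a toy bit-packing routine. The bit layout is an assumption made for this sketch: a fixed-width channel id, then one occurrence bit per word of that channel's codebook. The functions `encode_mcvd` and `decode_mcvd` are hypothetical and do not reflect any standardized syntax.

```python
from typing import List, Tuple


def _id_width(num_channels: int) -> int:
    """Bits needed to address every channel (at least one bit)."""
    return max(1, (num_channels - 1).bit_length())


def encode_mcvd(channel_id: int, num_channels: int, word_bits: List[int]) -> str:
    """Pack the channel id and the per-word occurrence bits into one bit string."""
    if not 0 <= channel_id < num_channels:
        raise ValueError("channel_id out of range")
    header = format(channel_id, "0{}b".format(_id_width(num_channels)))
    payload = "".join("1" if bit else "0" for bit in word_bits)
    return header + payload


def decode_mcvd(bits: str, num_channels: int) -> Tuple[int, List[int]]:
    """Split a bit string back into (channel id, occurrence bits)."""
    width = _id_width(num_channels)
    return int(bits[:width], 2), [int(c) for c in bits[width:]]


# Toy example: channel 5 of 8, with a 4-word channel codebook.
packet = encode_mcvd(5, 8, [1, 0, 0, 1])
print(packet, decode_mcvd(packet, 8))
```

Because the payload length depends on the channel's own codebook, the receiver has to read the channel id first before it can interpret the remaining bits.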
The channel learning can be based on contextual tags (e.g. GPS or RFID) or solely on visual statistics.
Where standardization takes place
GPS
Access Points
RFID Tags
Justification of Channel in MCVD
Channel Num = 1
Channel Division
Channel 1 Model
Channel 2 Model
Channel K Model
Two typical MCVD cases, with or without contextual information, compared with the state of the art on a million-scale landmark dataset.
Bandwidth
Battery
Search
Phone battery: voltage 4.0 V, capacity 1400 mAh (about 20.2K joules)
Sending images: 20.2K joules / 52.4 joules ~ 385 queries in total
Sending MCVD: 20.2K joules / 8.1 joules ~ 2494 queries in total
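The battery figures can be checked with a short calculation. The per-query energies (52.4 J and 8.1 J) and the battery rating are taken from the slide; the function names are hypothetical, and the conversion uses the standard relation energy = voltage x charge, with 1 mAh equal to 3.6 coulombs.

```python
VOLTAGE_V = 4.0
CAPACITY_MAH = 1400.0
JOULES_PER_IMAGE_QUERY = 52.4  # from the slide
JOULES_PER_MCVD_QUERY = 8.1    # from the slide


def battery_energy_joules(voltage_v: float, capacity_mah: float) -> float:
    """mAh -> Ah (divide by 1000), Ah -> coulombs (times 3600), then E = V * Q."""
    return voltage_v * (capacity_mah / 1000.0) * 3600.0


def queries_per_charge(energy_j: float, joules_per_query: float) -> int:
    """Number of queries one full charge can pay for, rounded to the nearest integer."""
    return round(energy_j / joules_per_query)


exact_energy = battery_energy_joules(VOLTAGE_V, CAPACITY_MAH)
slide_energy = 20200.0  # the slide's rounded 20.2K joules

print(round(exact_energy))
print(queries_per_charge(slide_energy, JOULES_PER_IMAGE_QUERY))
print(queries_per_charge(slide_energy, JOULES_PER_MCVD_QUERY))
```

The unrounded energy is 20160 J. Using the slide's rounded 20.2K J reproduces its 385 and 2494 queries, while the unrounded value gives 2489 MCVD queries. Either way, MCVD allows roughly 6.5 times as many queries per charge.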
Our Relevant Papers
Selected Papers
Location Discriminative Vocabulary Coding for Mobile Landmark Search, Int. Journal of Computer Vision, in press
Learning Compact Visual Descriptor for Low Bit Rate Mobile Landmark Search, IJCAI 2011, Barcelona, Spain, Jul. 2011
Towards Low Bit Rate Mobile Visual Search with Multiple Channel Coding, ACM MM 2011, Arizona, USA, Nov. 2011
MPEG Input Contributions
Compact Descriptors for Visual Search, m18542, MPEG 94th
PKUBench: A Contextual Rich Benchmark for Mobile Visual Search, m19188, MPEG 95th
Multiple Channel Compact Visual Descriptor with Adaptive Channel Learning, m19985, MPEG 96th
Topic-level Sampling Towards Optimized Locality Sensitive Vocabulary Coding, m21199, MPEG 97th
Thanks! Q&A