Some slides were adapted from those by James Allan, Jamie Callan,
Zhang Gang, Ken Hoganson, Weiyi Meng, and Clement Yu
Roadmap for Today
‣ Recap
Parallel Information Retrieval
‣ neural networks
‣ genetic algorithms
Roadmap for Today
‣ Recap
Distributed Information Retrieval (DIR)
‣ What is a collection?
- A single source, e.g., the Durham Herald Sun (what time period?)
A General Model for Distributed Information Retrieval
[Figure: a distributed search engine keeps a resource description of each DB, sends the query to the selected DBs (Search Engine 1, Search Engine 2, ..., Search Engine n), receives results from the selected DBs, and returns merged results.]
Four Steps:
1. Find out what each DB contains
2. Decide which DBs to search
3. Search selected DBs
4. Merge results returned by DBs
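The four steps can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the method of any particular system: the `DATABASES` contents, the function names, the document-frequency resource description, and the raw-score merge are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical toy collections: DB name -> list of documents (plain strings).
DATABASES = {
    "news": ["durham council votes on budget", "local election results in durham"],
    "sports": ["duke wins basketball game", "durham bulls baseball schedule"],
    "science": ["neural networks for retrieval", "genetic algorithms overview"],
}

def describe(docs):
    """Step 1: build a resource description (term -> document frequency)."""
    description = Counter()
    for doc in docs:
        description.update(set(doc.split()))
    return description

def select(query, descriptions, k=2):
    """Step 2: rank DBs by summed document frequency of the query terms
    and keep at most k DBs with a nonzero score."""
    terms = query.split()
    scores = {name: sum(desc[t] for t in terms) for name, desc in descriptions.items()}
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return [name for name in ranked[:k] if scores[name] > 0]

def search(query, docs):
    """Step 3: score each document by the number of distinct query terms it contains."""
    terms = set(query.split())
    hits = [(len(terms & set(doc.split())), doc) for doc in docs]
    return [(score, doc) for score, doc in hits if score > 0]

def merge(result_lists):
    """Step 4: merge by raw score (assumes scores are comparable across DBs)."""
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=lambda hit: (-hit[0], hit[1]))

def distributed_search(query, databases=DATABASES, k=2):
    descriptions = {name: describe(docs) for name, docs in databases.items()}
    selected = select(query, descriptions, k)
    return merge(search(query, databases[name]) for name in selected)
```

Calling `distributed_search("durham baseball")` searches only the two DBs whose descriptions mention the query terms. The raw-score merge in step 4 is the weakest assumption here, since real engines rarely produce comparable scores.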
Primary Motivations for Distributed IR
- To increase speed
‣ Site Description
‣ Collection Selection
‣ Searching
‣ Metrics
Site Description
‣ Contents
‣ Search Engine
‣ Services
Collection Selection
- cooperative
- uncooperative
Some Result Merging Possibilities
‣ Generality
‣ Effectiveness
‣ Efficiency
‣ Consistency of results
‣ Recap
The Problem
[Figure: the user sends a query to, and receives a result from, a user interface that sits on top of a query dispatcher and a result merger.]
A More Efficient Metasearch Engine
[Figure: the user sends a query to, and receives a result from, a user interface; behind it are a database selector, a query dispatcher, and a result merger.]
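The components named on this slide (database selector, query dispatcher, result merger) could be wired together roughly as follows. This is a minimal sketch under assumptions: the three component engines, their `*.example` result URLs, the `useful` argument standing in for a real selection algorithm, and the round-robin merge are all hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

# Hypothetical component engines: each maps a query to a ranked list of URLs.
def engine_a(query):
    return [f"a.example/{query}/1", f"a.example/{query}/2"]

def engine_b(query):
    return [f"b.example/{query}/1"]

def engine_c(query):
    return []  # pretend this engine has nothing relevant

ENGINES = {"A": engine_a, "B": engine_b, "C": engine_c}

def database_selector(query, engines, useful):
    """Keep only the engines believed useful for the query.
    Here the caller supplies the names; a real selector would estimate them."""
    return {name: fn for name, fn in engines.items() if name in useful}

def query_dispatcher(query, engines):
    """Send the query to every selected engine in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}

def result_merger(results):
    """Round-robin interleave of the ranked lists, dropping duplicates."""
    merged, seen = [], set()
    for tier in zip_longest(*results.values()):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

def metasearch(query, useful=("A", "B")):
    selected = database_selector(query, ENGINES, useful)
    return result_merger(query_dispatcher(query, selected))
```

The efficiency gain comes from the selector: engine C is never contacted, so the dispatcher does less work and the merger has fewer lists to combine.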
‣ Recap
Retrieval
(A Higher Level View)
- text retrieval
- document retrieval
- fact retrieval
Comparing IR to databases
                  Databases                                                      IR
Data              Structured                                                     Unstructured
Fields            Clear semantics (SSN, age)                                     No fields (other than text)
Queries           Defined (relational algebra, SQL)                              Free text ("natural language"), Boolean
Recoverability    Critical (concurrency control, recovery, atomic operations)    Downplayed, though still an issue
Matching          Exact (results are always "correct")                           Imprecise (need to measure effectiveness)
Copyright © James Allan
Information Retrieval Contrasted With Data Retrieval
• Why?
[Figure: a taxonomy of IR models, organized by user task]
User Task
‣ Retrieval: Adhoc, Filtering
- Classic Models: Boolean, vector, probabilistic
- Set Theoretic: Fuzzy, Extended Boolean
- Algebraic: Generalized Vector, Lat. Semantic Index, Neural Networks
- Probabilistic: Inference Network, Belief Network
- Structured Models: Non-Overlapping Lists, Proximal Nodes
‣ Browsing
- Browsing: Flat, Structure Guided, Hypertext
Kuropka’s Classification of IR Models (2005)
‣ set-theoretic
- Standard Boolean
- Extended Boolean
- Fuzzy Set
‣ algebraic
- Vector Space
- Generalized Vector Space
- Topic-based Vector Space
- Balanced Topic-based Vector Space
‣ probabilistic
- Binary Independence
- Language
- Retrieval by Logical Imaging
- Inference Network
- Belief Network
Roadmap for Today
‣ Metasearch Engines
‣ Recap
Recap: Parallel Information Retrieval
‣ Opportunities
- resilience to failures
Optional: If Time Permits and There is Enough Interest
‣ Open Problems
Distributed IR:
The State of the Art
‣ Representing collections by terms and frequencies is effective.
‣ Controlled vocabularies and schemas are not necessary.
‣ Collections and documents can be ranked with one algorithm (using different statistics).
- e.g., GlOSS, inference networks
‣ Rankings from different collections can be merged efficiently:
- with precisely normalized scores (Infoseek's method), or
- without precisely normalized document scores,
- with only minimal effort, and
- with only minimal communication between client and server.
‣ Large scale distributed retrieval can be accomplished now.
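One simple way to make scores from different collections comparable before merging is min-max normalization. The sketch below is an illustrative assumption, not a method prescribed by these slides; the function names and the `(doc, score)` input format are hypothetical.

```python
def min_max_normalize(hits):
    """Rescale (doc, score) pairs from one collection into [0, 1]."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    low, high = min(scores), max(scores)
    if high == low:
        # All scores equal: treat every hit as equally good.
        return [(doc, 1.0) for doc, _ in hits]
    return [(doc, (score - low) / (high - low)) for doc, score in hits]

def merge_rankings(rankings):
    """Merge per-collection rankings into one list of (collection, doc),
    ordered by normalized score; ties are broken by collection and doc name."""
    merged = []
    for collection, hits in rankings.items():
        for doc, score in min_max_normalize(hits):
            merged.append((score, collection, doc))
    merged.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(collection, doc) for _, collection, doc in merged]
```

With `{"db1": [("d1", 12), ("d2", 3)], "db2": [("d3", 9), ("d4", 6), ("d5", 3)]}` as input, the top document of each collection ties at the head of the merged list even though the raw score ranges differ. The client needs only the returned scores, so no extra communication with the servers is required.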