TECHNICAL SEMINAR
ON
Presented by:
K. Shiva Kumar,
16d35a0507
3/23/2019 2
Introduction
Web Clustering Engine
Why a web clustering engine?
Conventional search engines are not very
efficient for "ambiguous" queries.
The results that a conventional search engine
returns for an ambiguous query mix the different
meanings together in one list, so irrelevant
items appear among the relevant ones.
Advantages of cluster hierarchy
It provides shortcuts to the items that relate to the
same meaning.
Issues in the implementation of clusters
Short input data descriptions.
Meaningful labels.
Selection of a similarity measure.
Grouping of objects into clusters.
Overlapping clusters.
Unknown number of clusters.
Architecture
Search Results Acquisition
Provides input for the rest of the system.
Preprocessing of Search Results
Convert the search results into "features".
Steps:
Language Identification
Tokenization
Stemming
Feature selection
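The steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming English-only snippets (so language identification is skipped), a tiny hypothetical stop-word list, and a crude suffix stripper standing in for a real stemmer such as Porter's. The names STOP_WORDS, tokenize, stem, and preprocess are made up for this example.

```python
import re

# Hypothetical stop-word list; a real engine would use a larger, per-language list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(token):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters of the word after stripping.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(snippet):
    """Tokenize, drop stop words (a simple form of feature selection), then stem."""
    return [stem(t) for t in tokenize(snippet) if t not in STOP_WORDS]

print(preprocess("Clustering engines group the search results"))
# ['cluster', 'engin', 'group', 'search', 'result']
```

The stemmed forms need not be dictionary words; they only have to map related word forms to the same feature.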
Cluster Construction and Labeling
Search results are input to the clustering algorithm.
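The slides do not name a specific clustering algorithm, so the following is only a sketch under stated assumptions: a single-pass threshold clustering over token lists, with Jaccard overlap as the similarity measure and a simple frequency-based label. The threshold value, the function names, and the toy "jaguar" snippets are all hypothetical. The number of clusters is not fixed in advance; a new cluster is opened whenever a result is not similar enough to any existing one.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity between two token lists (treated as sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def single_pass_cluster(docs, threshold=0.4):
    """Assign each document to its most similar cluster, or open a new one."""
    clusters = []  # each cluster is a list of indices into docs
    for i, doc in enumerate(docs):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = max(jaccard(doc, docs[j]) for j in cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters

def label(cluster, docs):
    """Pick the term that is frequent inside the cluster and rare outside it."""
    inside = Counter(t for j in cluster for t in set(docs[j]))
    outside = Counter(t for j in range(len(docs)) if j not in cluster
                      for t in set(docs[j]))
    return max(sorted(inside), key=lambda t: inside[t] - outside[t])

# Toy, already preprocessed results for the ambiguous query "jaguar".
docs = [
    ["jaguar", "car", "dealer"],
    ["jaguar", "cat", "habitat"],
    ["jaguar", "car", "price"],
    ["jaguar", "cat", "diet"],
]
clusters = single_pass_cluster(docs)
print(clusters)                             # [[0, 2], [1, 3]]
print([label(c, docs) for c in clusters])   # ['car', 'cat']
```

This sketch produces flat, non-overlapping clusters; it does not address the overlapping or hierarchical cases.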
How to represent a feature/text
Vector Space Model (VSM).
A document d is represented
in the VSM as a vector
[w_t0, w_t1, …, w_tn],
where w_ti is the weight of term t_i in d.
Example:
d -> "polly had a dog and
the dog had polly"
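As a worked illustration of the example document, the sketch below builds its vector using raw term counts as the weights. The slides do not say which weighting scheme is used, so raw term frequency is an assumption here (weighted schemes such as tf-idf are common alternatives), and the function name term_frequency_vector is hypothetical.

```python
from collections import Counter

def term_frequency_vector(text, vocabulary=None):
    """Return (vocabulary, vector), where vector[i] counts vocabulary[i] in text."""
    counts = Counter(text.lower().split())
    if vocabulary is None:
        # Assumption: with no shared vocabulary, use this document's own terms.
        vocabulary = sorted(counts)
    return vocabulary, [counts[term] for term in vocabulary]

vocab, vector = term_frequency_vector("polly had a dog and the dog had polly")
print(vocab)   # ['a', 'and', 'dog', 'had', 'polly', 'the']
print(vector)  # [1, 1, 2, 2, 2, 1]
```

To compare several documents, the same vocabulary would be passed to every call so that all vectors share the same dimensions.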
Visualization
One prominent approach is based on hierarchical folders.
Clusty, CREDO, Lingo3G – Hierarchical folder
visualization approach.
Grokker – Nesting and zooming approach.
KarTOO – Graph-based interface.
Conclusion
A number of advances must still be made in cluster labels, the
coherence of the cluster structure, performance evaluation studies,
and advanced visualization techniques. Only then can web clustering
engines fully deliver on the promise of being the PageRank of the future.
References
http://clusty.com
http://credo.fub.it
www.google.com
http://credino.demi.uniud.it
Thank You…