
A

TECHNICAL SEMINAR
ON

Presented By :
K. Shiva Kumar,
16d35a0507.

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


INDEX
 Introduction
 Why a web clustering engine
 Advantages of cluster hierarchy
 Issues in implementation of clusters
 Architecture
 Conclusion

3/23/2019 2
Introduction
 Web Clustering Engine

 Clustering is the act of grouping similar objects into sets.

Why a web clustering engine
 Conventional search engines are not very efficient at handling ambiguous queries.
 For such a query, the results for its different meanings are mixed together in a single list, so irrelevant items appear among the relevant ones.

Advantages of cluster hierarchy
 It provides shortcuts to the items that relate to the same meaning.

 It allows better topic understanding.

 It favors systematic exploration of search results.

Issues in implementation of clusters
 Short input data descriptions.
 Meaningful cluster labels.
 Selection of a similarity measure.
 Grouping of objects into clusters.
 Overlapping clusters.
 Unknown number of clusters.
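
To make the similarity-measure issue concrete, here is a minimal sketch of two commonly used measures, cosine similarity over term counts and Jaccard similarity over term sets. The function names and the sample snippets are illustrative assumptions, not part of the original slides, and real engines may weight terms differently.

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty snippet is similar to nothing
    return dot / (norm_a * norm_b)

def jaccard_similarity(a: set, b: set) -> float:
    """Share of distinct terms that two snippets have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical snippets returned for the ambiguous query "jaguar".
s1 = "jaguar car dealer".split()
s2 = "jaguar car review".split()
s3 = "jaguar big cat habitat".split()

same_sense = cosine_similarity(Counter(s1), Counter(s2))   # about 0.667
other_sense = cosine_similarity(Counter(s1), Counter(s3))  # about 0.289
overlap = jaccard_similarity(set(s1), set(s2))             # 0.5
```

Because snippets are short, a single shared word (here the query term itself) already produces a non-zero score, which is one reason the choice of measure matters.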

Architecture

Search Results Acquisition
 Provides input for the rest of the system.

 Delivers 50 to 500 results.

 Uses public search engines such as Google and Yahoo as sources.

Preprocessing of Search Results
 Convert the search results into “features”.
 Steps:
 Language identification
 Tokenization
 Stemming
 Feature selection
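
The four preprocessing steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: the stop-word lists are tiny hand-made samples, language identification is a naive stop-word count, and the stemmer is a crude suffix stripper rather than a real algorithm such as Porter's. All names are hypothetical.

```python
import re

# Tiny illustrative stop-word lists (assumption, far from complete).
STOPWORDS = {
    "en": {"the", "a", "an", "and", "of", "in", "to", "is", "are", "for"},
    "de": {"der", "die", "das", "und", "ist", "ein", "eine", "zu", "mit"},
}

def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def identify_language(text: str) -> str:
    """Pick the language whose stop words occur most often (ties go to 'en')."""
    tokens = set(tokenize(text))
    return max(STOPWORDS, key=lambda lang: len(tokens & STOPWORDS[lang]))

def stem(token: str) -> str:
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def extract_features(snippet: str) -> list:
    """Language identification, tokenization, feature selection, stemming."""
    stop = STOPWORDS[identify_language(snippet)]
    selected = [t for t in tokenize(snippet) if t not in stop and len(t) > 1]
    return [stem(t) for t in selected]

features = extract_features("The dogs are barking in the garden")
# features == ["dog", "bark", "garden"]
```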

Cluster Construction and Labeling
 Search results are input to the clustering algorithm.

 Data-centric clustering algorithm.

 Each created cluster should be aptly labeled.
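
As a rough sketch of this stage, the code below groups snippets first and derives labels afterwards, which is the general idea behind a data-centric approach. The slides do not name a specific algorithm, so the single-pass clustering used here, the 0.3 threshold, and every function name are assumptions chosen for brevity.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_results(snippets, query_terms, threshold=0.3):
    """Single pass: join the most similar cluster or start a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for i, text in enumerate(snippets):
        # Query terms occur in every result, so they are left out.
        vec = Counter(t for t in text.lower().split() if t not in query_terms)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # add the counts to the centroid
            best["members"].append(i)
    return clusters

def label_cluster(cluster, n_terms=2):
    """Label a cluster with its most frequent terms (ties broken by name)."""
    ranked = sorted(cluster["centroid"].items(), key=lambda tc: (-tc[1], tc[0]))
    return " ".join(term for term, _ in ranked[:n_terms])

# Hypothetical search results for the ambiguous query "jaguar".
snippets = [
    "jaguar car dealer price",
    "jaguar car review price",
    "jaguar cat habitat rainforest",
    "jaguar cat diet habitat",
]
clusters = cluster_results(snippets, query_terms={"jaguar"})
labels = [label_cluster(c) for c in clusters]
# labels == ["car price", "cat habitat"]
```

Labels built from frequent terms are easy to compute but not always meaningful to a reader, which is the labeling issue raised earlier in the talk.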

How to represent a feature/text
Vector Space Model (VSM).
A document d is represented in the VSM as a vector of term weights [wt0, wt1, …, wtn].
 Example:
d -> “polly had a dog and the dog had polly”
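
A minimal sketch of how the example document could be turned into such a vector, assuming raw term frequency as the weight and an alphabetically sorted vocabulary. Both choices are assumptions; the slides do not say which weighting scheme is meant, and schemes such as tf-idf are also common.

```python
from collections import Counter

def tf_vector(document: str, vocabulary: list) -> list:
    """Return one term-frequency weight per vocabulary term."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]

d = "polly had a dog and the dog had polly"
vocabulary = sorted(set(d.split()))
# vocabulary == ["a", "and", "dog", "had", "polly", "the"]
vector = tf_vector(d, vocabulary)
# vector == [1, 1, 2, 2, 2, 1]
```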

Visualization
One prominent approach is based on hierarchical folders.
 Clusty, CREDO, Lingo3G – hierarchical folder visualization approach.
 Grokker – nesting and zooming approach.
 KarTOO – graph-based interface.
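
The hierarchical folder idea can be illustrated with a small text rendering of a cluster tree. This is a sketch only: the nested-dictionary layout, the folder names, and the result indices are hypothetical and do not reflect how any of the engines above are implemented.

```python
def render_folders(node: dict, depth: int = 0) -> list:
    """Render a nested cluster hierarchy as indented folder lines."""
    lines = []
    for label, child in node.items():
        indent = "  " * depth
        if isinstance(child, dict):
            # Inner folder: print its label, then its sub-folders.
            lines.append(f"{indent}+ {label}")
            lines.extend(render_folders(child, depth + 1))
        else:
            # Leaf folder: show how many search results it holds.
            lines.append(f"{indent}+ {label} ({len(child)})")
    return lines

# Hypothetical hierarchy: leaf values are indices of search results.
hierarchy = {
    "jaguar": {
        "Cars": {"Dealers": [0, 4], "Reviews": [1]},
        "Animals": [2, 3],
    }
}
tree = render_folders(hierarchy)
print("\n".join(tree))
```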

Conclusion
A number of advances must still be made in cluster labels, coherence of the cluster structure, performance evaluation studies, and advanced visualization techniques. With these advances, web clustering engines could entirely fulfill the promise of being the PageRank of the future.

References
 http://clusty.com
 http://credo.fub.it
 www.google.com
 http://credino.demi.uniud.it

Thank You…
