
CS784: Recent Trends in Web Search

Sreenivasa Pavan Kuppili


May 17, 2008

Abstract

Web Search is an integral part of our lives today. The advent of Google in the late 90's revolutionized search. In this decade, there has been a lot of research on Web search. The landscape is quite diverse today, with people exploring various directions like social search, clustering, personalized search, local search, vertical search, object search, question-answering, etc. In this paper, we look at some of the key ideas in these various fields (we focus on text or information search, not on audio or video search). The goal is to give a flavor of the various directions currently being explored in Web Search.

1 Introduction

The early Web Search engines have their origins in information retrieval research. However, the Web is far more diverse and bigger than the specialized document collections used in Information Retrieval research. Besides, anyone is allowed to author articles on the Web. Traditional information retrieval algorithms do not work well on the Web. As a result, Web Search engines had to evolve to meet the new requirements. One of the seminal papers in this direction is [1], in which the link structure of the Web was exploited to measure the global importance of pages, independent of the query. The advent of Google revolutionized search and made it ubiquitous. Perhaps more importantly, the advertising model adopted by companies like Overture and Google proved to be very effective. The commercial success of Google made search a very hot topic, both in academia and in industry. From the early 2000's to date, there has been plenty of research on Web Search, and there has been a plethora of Web Search startups trying to be the next Google or to capture niche markets. As can be expected, people have started suggesting improvements to Web Search along many different dimensions. The Web Search landscape is quite diverse today. There are several visions about the future of Web Search, with no clear winner at this point. In this paper, we will look at the various directions in which search is evolving: some of these are social search, clustering of documents and results, personalized search, local search, vertical search, object search, question-answering communities, the Semantic Web, etc. We look at the key ideas in these different fields.

The paper is organized as follows. In section 2, we look at the different classes of search queries and how search engines have been evolving to meet the requirements of these different classes of user queries. In section 3, we examine the ideas in object-level search and in vertical search. Web 2.0 and mass collaboration are very popular today, and in section 4, we examine how user communities and social tagging can be used to improve the quality of Web Search. In section 5, we examine how search results can be personalized. Personalization can be seen as a way of biasing the search results towards the user's preferences. In section 6, we examine how clustering of documents or queries might be useful in the context of Web Search. In section 7, we see how question-answering communities have the potential to become good complements to Web search engines, and examine current efforts at automatically answering questions by using the Web as a knowledge repository. In section 8, we look at the Semantic Web and one of the first generation of Semantic Web search engines. In section 9, we look at the different kinds of user interfaces, besides the traditional ranked list of documents. Finally, we end with our conclusions in section 10.
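The link analysis of [1] referenced above can be made concrete with a small sketch. The following is a minimal, illustrative power-iteration implementation of query-independent PageRank over a toy link graph; the graph, damping factor, and iteration count are arbitrary choices for illustration, not details from the paper.

```python
# Minimal PageRank sketch: importance flows along hyperlinks, and a
# damping factor d models the random surfer occasionally jumping to
# a uniformly random page.
def pagerank(links, d=0.85, iters=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                for q in outs:
                    new[q] += d * rank[p] / len(outs)
            else:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

# Toy graph: "a" is pointed to by both other pages, so it is globally
# the most important page, independent of any query.
ranks = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
```

Real engines compute this at Web scale with sparse-matrix machinery; the sketch only illustrates the fixed-point structure of the computation.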
2 Taxonomy of Web Search queries

Let us look at how web queries can be broadly classified, and how search engines have evolved to meet the requirements of these different kinds of queries. [2] says that queries can be broadly classified into three categories: 1. Navigational: the main goal of the query is to go to a site (e.g., "Greyhound bus"). 2. Informational: the goal is to acquire some information present on one or more pages. 3. Transactional: the goal is to perform some activity (like booking a flight). While the distinction can be blurred for some queries, it is useful to see the evolution of search engines as a response to these different kinds of queries. The first generation of search engines (like AltaVista, Excite, WebCrawler) was primarily focused on answering informational queries. Search engines post Google (after 1998) are good at answering navigational and informational queries. Link analysis and anchor text [3] are important aspects of doing a good job on navigational queries. Today, we are seeing the emergence of the third generation of search engines, where the search engines try to help the user accomplish his task as easily as possible. For instance, if you type "San Francisco" in a search engine today, you do not just get back a bunch of pages. Instead you might get a map of San Francisco, weather information, possibly a site to book flights (or even a form to automatically search for flight reservations) and hotel reservations in San Francisco, etc. The search engines today are trying to minimize the user effort as the user goes about trying to finish his task. To achieve this, search engines need to do semantic analysis of the webpages in the background, and need to understand the intent of the user query. This is a very important area of current research: extracting information from the webpages in the background (for instance, using Information Extraction techniques to extract information like names, organizations, etc.), and trying to understand the user intent at query time. We will see in the next few sections how keyword search is often very ambiguous, and how search engines use additional information to disambiguate the user intent. Search engines have also started inspecting the so-called "deep web", which is the information locked in various databases and available through interfaces like web forms [4]. The Semantic Web might also be viewed as an attempt to solve these transactional queries. In the next few years, we might see the search engines evolve to try to satisfy the transactional queries in a better way.

3 Object-level Vertical Search

Vertical search is an important new paradigm. The basic premise is that a search engine can be customized to a particular domain like health, finance, etc., and the vertical search engine can perform better than a general purpose search engine in that domain. Another new paradigm is searching at an object level rather than at the webpage level. Oftentimes, users are interested in information about a particular object, and in today's search engines, the user needs to extract this object information from the returned pages. So, object-level search is an attempt to bring the search engine closer to the user's needs. For instance, if you take a domain like academia, the various objects can be researchers, papers, conferences, etc. A user might be interested in questions like who are the most popular researchers in Machine Learning. It's difficult for a search engine like Google to answer this query, while this is a tailor-made query for an object-level vertical search engine. The object-level vertical search architecture discussed in [11] contains three main components: 1. extracting the domain-specific objects and relationships from the crawled webpages, 2. object aggregation, and 3. object-level ranking. The first component is the process of looking at the crawled webpages and extracting objects. HMM's were typically used for information extraction. The current state of the art is to use conditional random fields, which typically outperform HMM's because of their ability to capture non-independent and overlapping fields. This paper uses hierarchical conditional random fields for object extraction. The second component is the object aggregation stage, which includes entity resolution (deciding that two objects point to the same real world entity) and object disambiguation (deciding that two objects with the same name refer to different real world entities). In this component, they use the web as a repository and look for co-occurrences to decide on entity resolution and object disambiguation. For example, if two papers appear on the website of a small research lab, and if there is an author with the same name in the two papers, it is highly likely that the two names refer to the same person. The third component is deciding the popular objects.
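The aggregation step just described can be illustrated with a small sketch. Here each name mention is represented by the set of terms it co-occurs with (standing in for the web pages the system mines), and two mentions are merged when their contexts overlap strongly. The Jaccard measure, the threshold, and the toy data are illustrative assumptions, not the actual method of [11].

```python
# Hedged sketch of co-occurrence-based entity resolution: two name
# mentions are taken to denote the same real-world entity when the
# contexts they co-occur with overlap strongly.
def jaccard(a, b):
    """Overlap of two term sets, in [0, 1]."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def same_entity(ctx1, ctx2, threshold=0.3):
    """ctx1, ctx2: sets of terms co-occurring with each name mention."""
    return jaccard(ctx1, ctx2) >= threshold

# Two "J. Smith" mentions sharing co-occurring terms (e.g., mined from
# the same small lab's website) are resolved to one researcher; a
# third mention with a disjoint context is kept separate.
smith_a = {"data mining", "vldb", "ai lab"}
smith_b = {"data mining", "ai lab", "sigmod"}
smith_c = {"protein folding", "genomics"}
```

The same co-occurrence signal, inverted, supports object disambiguation: a low overlap between two identically named mentions is evidence that they are different entities.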
Here, we can exploit the link structure to decide the popularity by a technique similar to PageRank. The main difference is that here the links might have different semantics, while in deciding webpage popularity by PageRank, all the links have the same semantics. As an example, one link coming into a paper might be an "authored-by" link, and a second link might be a "citation" link. A "citation" link is definitely more important than an "authored-by" link, because the importance of a paper depends strongly on the number of citations, and is almost independent of the number of authors. So, in this paper, a popularity propagation factor (PPF) is added to each type of link to capture the importance of that type of link. So with this architecture, we have an object-level vertical search engine in place. Now the users can do keyword-style searches, and the more advanced users can also do SQL-styled queries to get more accurate and focused answers. (This paper is similar in spirit to the DBLife project [12], an ongoing research project at UW-Madison, which is an object-level community portal in the database field.) This work has been implemented in two working systems: Libra Academic Search (http://libra.msra.cn) and Windows Live Product Search (http://products.live.com). Vertical search engines are definitely very popular today, and we will surely see more of vertical search in the next few years, as the different startups try to capture niche markets.

4 Social Search

For lack of better algorithms, image and video search is still at a stage where we search by the metadata rather than the actual content. It is quite common in image and video search to search according to the annotations added by users. There have also been efforts to automatically get people to label images by representing the problem as a game [13]. In [14], the authors consider the problem of using social annotations to improve the quality of web search. Today, there are a lot of popular social bookmarking sites like del.icio.us, and these annotations can potentially be leveraged to improve the search quality. Two main ways of using social annotations are: 1. Just like anchor text was utilized to provide additional information about webpages, the social annotations often provide a multi-faceted description of the webpage, and we can compute a similarity between the annotations and the query terms. One can also exploit the relation that annotations on the same webpage might be semantically similar. The paper proposes Social Similarity Ranking (SSR), an iterative algorithm to compute the similarities among the various annotations. So, one can match the query terms not only with the actual annotations, but also with annotations which are semantically similar to the actual annotations of the page. 2. The annotations can also be used as a measure of page importance. The paper proposes SocialPageRank, an iterative algorithm which mutually enhances the relations among popular web pages, up-to-date users, and hot social annotations to measure page importance. SocialPageRank can be viewed as a complement to traditional algorithms like PageRank. While PageRank exploits the link structure of how the webpages are linked by the page creators, SocialPageRank captures the importance of webpages from the annotators' point of view. The results in the paper show promise that similarity ranking based on social annotations, and page ranking by SocialPageRank, can be integrated with the current approaches to yield better Web Search performance. The authors do point out several challenges in this approach. For instance, a lot of webpages might have no social annotations. Another major problem is malicious users. As social annotations become an important part of Web search, they will be accompanied by users who will try to insert spurious annotations for their own ends. One can monitor and identify the malicious users, and employ linguistic and statistical techniques to filter out these spurious annotations, and this is an important direction of future research.

[25] advocates combining social search and social navigation among users of a particular community. The paper argues that the resulting interface seamlessly integrates the search and navigation tasks. Social navigation means how the browsing behavior of past users guides other users towards particular websites. Social navigation in an information space was originally introduced by Dourish and Chalmers as moving towards a cluster of people, or selecting objects because others have been examining them [26]. The classic example of social navigation in a hypertext environment was provided later by Wexelblat and Maes in the Footprints system, which visualizes usage paths in a Web site [27]. For instance, a community of students searching for papers on a topic can follow the
trails of past students to locate papers. On the other hand, social search (in the context of community users) is the way in which the ranking of search results can be modified to suit the particular needs of a community. This can be viewed as another way of adding context to a query. The query can be disambiguated by the interests of that community. For instance, for a computer science community, Sun probably means the computer company, and not the English newspaper. This is essentially done in this paper by keeping track of the community's previous searches, and promoting the previously user-clicked results. This makes use of the fact that there can be a lot of query repetitions within a community [28]. The navigation and search are integrated in the following ways: while presenting search results, the interface can present both previous navigation and search results as visual cues to help the user choose which links to click among the results. For instance, the visual cues might indicate related queries for a particular result (other queries which led to the same result), the last time the result was encountered (giving an idea of the freshness of the result), the relevance of the result for this query (how often the community users select this result for the given query), the number of annotations for the result by the community members, etc. Besides, the browsing interface is complemented with search information. For example, when the user has navigated to a document, the interface can give the list of queries by the community users which led them to this document. Using these queries, the current user can find similar documents. Thus, this paper aims at bridging the domains of navigation and search, allowing users to move more fluidly and naturally between information sources via a richer form of linkages. This paper talks about searches of users within a single community. However, in social networks like Facebook, a user might belong to multiple communities, and the solution needs to be extended to deal with this case. Nonetheless, the paper is a proof-of-concept of how the searches and browsing by members within a community might be leveraged to improve the searching and browsing experience of other users of the community.

5 Personalization

One of the problems with today's Web Search is that user queries usually have a very small number of keywords, and it is hard to ascertain the exact user intent. A number of studies have shown that search queries often tend to be short, underspecified [5], and consequently ambiguous [6]. One way to add some context to the query is personalization. The context of the query can help disambiguate the multiple interpretations of the query. There are many ways in which personalization can be achieved. One can imagine extracting information about a user's preferences from his search history (or other information like the information on his local machine, his currently opened files, etc.), and focusing on that part of the Web which the user is possibly interested in. One way of achieving personalization is by customizing the PageRank for each user. For instance, [1] suggests that the random jumps in the random surfer model can be limited to some subset of pages (like a user's bookmarks, or previously visited pages) to bias the PageRank towards the user's preferences. For instance, if a person views a lot of Machine Learning pages, the personalized PageRank might give higher weight to the homepage of Michael Jordan, the researcher, than to pages about Michael Jordan, the basketball player. If a personalized PageRank is computed at the granularity of pages, it runs into problems of scalability. [24] talks about how to scale personalized web search using graph-theoretic results: how to decompose a personalized PageRank vector into multiple partial vectors, and how to share these partial vectors across multiple personalized views to achieve scalability.

[7] presents a way of analyzing a user's click history to improve search performance. This work essentially builds on Topic-Sensitive PageRank, proposed in [8]. Topic-Sensitive PageRank computes multiple PageRank values, one for each first-level topic in the Open Directory [23]. These values are computed by changing the random jump in the PageRank computation to go only to pages within the same topic. Once we have these topic-specific PageRank values, at search time, based on an estimate of which topic the query is about, the search engine can use the appropriate PageRank vector. In this paper [7], the probability of the query belonging to a topic is computed based on both the query and the user's topic interest vector T (his interest in the different topics). It can be noted
that here the user's preferences are being modelled at the topic level rather than at the page level (this is computationally much more feasible). T is estimated by using the user's click history and a topic-driven searcher model. In the topic-driven searcher model, a user chooses a topic t with probability according to his topic interest vector T, and issues a query from that topic. The search engine returns pages ranked by the Topic-Sensitive PageRank (TSPR), and the user clicks them according to the model in [9]. Now, we can connect the vector V (where V(p) is the probability that a user clicks a page p) with the topic-sensitive PageRank and the topic interest vector by V(p) = Σ_{i=1}^{m} T(i) · [TSPR_i(p)]^{9/4}. Vector V is known from the user's click history, and TSPR can be calculated as in [8]. The missing vector T is estimated using maximum likelihood estimation. Now at search time, given a query q, we need to compute which topic it belongs to. So, we need to find Pr(T(i)|q), which by Bayes' rule is proportional to Pr(T(i)) · Pr(q|T(i)). So, the user's topic interest will serve as a prior in the Bayes model. Combining this with TSPR, the personalized page rank for a query q becomes PPR_T(p) = Σ_{i=1}^{m} T(i) · Pr(q|T(i)) · TSPR_i(p). So, this is a way of incorporating the user's preferences to get a personalized PageRank, and this will help bias the search results towards the user's preferences.

6 Clustering

Clustering of web documents can potentially be used in many ways. While its impact in improving the quality of search has been fairly limited, it has primarily been used for improving browsing and the user interface. For instance, a search engine like Clusty [18] presents the results in the form of clusters. On typing "Michael Schumacher" in Clusty, we get back several clusters like "Formula One", "Ferrari", "Pictures", "Retirement", etc., and the user can look at only the documents in the clusters in which he is interested, or jump to relevant clusters. Clusters can also be potentially useful in improving the search quality. One area in which clustering has been proposed as a useful tool is the question-answering community. Finding similar queries or questions can be very important in search engines and question-answering communities. [10] addresses this problem. They combine two similarity features: one is the standard keyword or string similarity (augmented with phrase identification). The second similarity metric is based on user logs. If two queries result in similar document clicks, then the queries can be considered similar. This is an important complement to the first similarity metric, because now similar queries need not even share words in common. For instance, if "atomic bomb", "Nagasaki", "Manhattan Project", and "Hiroshima" all lead to a click on a document titled "Atomic Bomb", we can infer that the queries are similar despite having no words in common. In this paper, they use a linear combination of the above two similarity metrics to get a similarity between queries, and at least theoretically this should perform better than either similarity metric in isolation. However, tuning the parameters is left as a manual knob, and that is a disadvantage. Nonetheless, this is a good step towards identifying similar queries and questions. Once we have a good similarity metric between questions, one can cluster the questions and identify the most frequently asked questions (FAQ's), and these can be better answered or verified by human experts, or, if possible, one can indicate to the machine that these are important queries or questions. The second similarity metric of using document clicks to find similarity between queries can also be used to disambiguate queries with the same word but different intent. For instance, two users might have typed "crane", but based on the documents they look at, we can see that one person intended crane the bird, and another intended crane the machine. This kind of query disambiguation can be useful in a search engine. For instance, the search engine can provide clusters for a query, with each cluster containing documents for one interpretation of the query.

7 Question-Answering

One could imagine an ideal search engine as an omniscient oracle: a user plugs in a question, and the search engine returns the answer. Currently, there are two kinds of question answering going on in the Web. One is question-answering portals where the answers are generated by humans. For instance, today we have sites like Ask.com and answers.yahoo.com, and the many community-specific forums which fit into this pattern. The main advantage of these question-answering communities is that they can answer questions which are difficult for
a search engine to answer. For instance, if something is wrong with your computer, you can pose the specific problem on a forum and get the answer, while it is very hard for a search engine to diagnose your computer problem. The key problem here is how to find similar questions, so that when a user poses a question he can be directed to other similar questions for which answers are available. In section 6, we have seen one way of clustering similar questions. However, this is still an important question with no satisfactory solution, and is a hurdle before the question-answering communities can really take off. The second way of question-answering in today's web is doing it automatically. The process of retrieving answers to questions is known as Natural Language Question Answering (NLQA). [15] describes a probabilistic way of question-answering on the web. They propose an algorithm which uses a combination of proximity and question-type features to answer questions. The algorithm proceeds in several phases. In the first phase, the question is converted to an appropriate query, so that it can be submitted to a search engine like Google. In the second phase, the algorithm identifies the type of answer expected (like person, place, distance, organization, etc.). The question-type identification can be done by using either Machine Learning or hand-coded heuristics. The paper claims that the heuristics currently perform better than Machine Learning in identifying the question types. In the third phase, the query is submitted to a search engine, which returns several documents, some of which hopefully contain the answer to the question. The fourth phase is identifying which passages or sentences in the documents contain the answers. The purpose of identifying sentences is to reduce the computational complexity of the next phrase-ranking stage. The similarity between a sentence and the query can be computed using standard measures like the N-gram approach (normalized by the sentence length), or the vector space model (for instance, a slight modification of the Okapi ranking function [16]). In the fifth phase, the relevant passages or sentences are split up into constituent phrases, each of which is a potential answer candidate. This splitting into phrases is done by using an off-the-shelf chunker [17]. Finally, in the sixth phase, the algorithm ranks the phrases extracted in the previous phase and presents them to the user. This is the most important phase. The ranking uses two components. The first component is proximity between the text and the question. The phrase that contains most query terms gets a high score. A phrase that is near a phrase that contains most query terms gets a slightly lower score, and phrases further away get lower scores. The second score uses linguistic analysis. One can use part-of-speech tagging to tag each word in the phrase; let's call that the signature of the phrase. Then one can estimate the probability that the phrase is of the same type as the question type, given the signature of the phrase. For instance, suppose the question type is "Person", and the phrase is "Hugo Young", which has the signature "NNP, NNP". We can then compute the probability Pr(Person | NNP, NNP). Let's call this probability value the signature score. The total score is the product of this signature score and the proximity score. While automatic question-answering is not at the level of being practically deployed, this is an interesting direction of future research. Automatic question-answering might also help in bootstrapping the question-answer communities.

8 Semantic Web Search

Ideally, we would like the search engine to be able to perform tasks for us, or to perfectly solve the transactional queries. For instance, if a user wants to plan a trip to New York, using today's search he gets multiple pages back, and it involves considerable effort on the part of the user to compare various flight tickets, look at various hotels, etc., before he can finish his task. It would be cool if the machine could automatically go to the websites, understand the information, process it, and reduce the user's effort. The Semantic Web is an effort in this direction. Essentially, each web page will be like a structured database, and a machine can potentially read and process all this information. The Semantic Web is facing several significant challenges, like dealing with semantic heterogeneity, incentives, efficient execution, etc. Nonetheless, it is starting to gain momentum in some fields, and it might lead to pockets of a machine-readable Web. Swoogle [19] is a search engine for the Semantic Web. Swoogle is not meant to answer complex queries exploiting the semantic structure and data in the Semantic Web. It is more like a metadata search engine. It can be used for finding appropriate ontologies, finding instance data corresponding to an ontology, and finding the characteristics of the Semantic Web: how it is connected, how the ontologies are referenced, what the importance level of a semantic web document (SWD) is, etc. Swoogle crawls the Web for SWD's using 1. specific queries (for instance, type extensions like ".rdf") on a search engine like Google, 2. crawling user-supplied sites, and 3. following semantic links from SWD's. Once Swoogle crawls the SWD's, it extracts metadata from them. Swoogle looks at relationships at the SWD granularity, and not at the RDF triple granularity (which is much harder and more expensive). It captures relationships like: 1. SWD A imports all the content in SWD B. 2. SWD A uses some of the terms in SWD B without importing. 3. A extends the definitions of terms defined by B. 4. A makes assertions about the individuals defined by B. Using these various kinds of links, Swoogle uses a rational random surfing model to compute the importance of the SWD's in a manner similar to PageRank. The model is rational because we assume that the user is rational: for instance, if SWD A imports the content of B, it is expected that from A the user goes to B, because B is a part of A. Different weights are assigned to the 4 kinds of links, which are supposed to reflect the probability of a rational user traversing those links. Once we have these weights in place, we compute the rank of individual SWD's in a manner similar to PageRank. This ranking gives the importance of individual SWD's. For indexing and keyword search, Swoogle takes the SWD's, reduces them to their RDF triples, extracts the URIrefs, and treats an SWD as a bag of URIrefs (as opposed to the bag-of-words approach in Web search engines). Once we have these bags of URIrefs, traditional information retrieval techniques like the N-gram approach are used for keyword search to return ontologies and data instances matching the keywords.

9 Search Interfaces

The user interface is certainly an important part of a search engine. The clean, uncluttered interface of Google was one of the reasons why it became popular. However, the Google way of returning results as a ranked list of documents is only one among several possible alternatives. (Today, the interface of Google is actually a mix of documents, images, videos, maps, book results, etc.; nonetheless, it is still a ranked list.) Some other search engines have alternative ways of presenting results. For instance, the search results can be categorized by meaningful and stable categories. The categories can be drawn from thesauri, glossaries, or ontologies. The categories can even be simple ones like document type, size of the document, etc. Flamenco [20] uses multiple sets of hierarchical categories to aid user browsing and searching. Exalead [21] uses topical categories as well as categories based on document type and geography. Instead of static categories, one can also dynamically categorize the results. For instance, Clusty presents the results in the form of a dynamic hierarchy of clusters. Findex [22] presents results categorized by a flat set of clusters. There are other ways of visually representing the results. For instance, Themescape uses a topographic-map metaphor to plot keywords extracted from a corpus (showing pictorially how the keywords are related). Another interesting way of presenting results is in the form of a map. This is relevant in fields like academia. For instance, the results can be presented in the form of a graph with nodes corresponding to the papers, and edges corresponding to citations. These graphs can also visually help the user identify the major topics and fields. A graph interface can be useful in other contexts too. For instance, one can imagine the search results of a general search engine being a small subgraph which connects two different webpages, which together yield the information requested by the user. So, there are multiple ways in which the results can be presented to the user, and probably the future search engines will dynamically choose the most appropriate way of presenting results, based on the query and the user.

10 Conclusions

We have looked at some of the important ways in which Web Search is currently being explored. Vertical search is going to be important in the next few years as the various startups try to capture vertical markets. Clustering is more of an academic topic at the moment. Social search can improve the search quality by using annotations as additional metadata, using annotations to create an alternate view of the importance of pages, or leveraging community information to improve the search quality of other members in the community. Social search has to deal with problems like spurious tagging. Personalization is likely to be important in the near future. Personalization helps add context to the user queries and results in better results. The Semantic Web is still in a rudimentary stage, and it faces several challenges before it can become a useful tool at the Web level. However, the potential rewards of a Semantic Web are huge. Question-answer communities can turn out to be a good complement to search engines. Humans can provide focused and accurate answers to questions (which are difficult for a search engine to answer).

[10] Ji-Rong Wen, Jian-Yun Nie, and Hong-Jiang Zhang. "Clustering User Queries of a Search Engine." In WWW10, 2001.
[11] Zaiqing Nie, Ji-Rong Wen, and Wei-Ying Ma. "Object-level Vertical Search." In CIDR, 2007.
[12] Pedro DeRose, Warren Shen, Fei Chen, Yoonkyong Lee, Doug Burdick, AnHai Doan, and Raghu Ramakrishnan. "DBLife: A Community Information Management Platform for the Database Research Community." In CIDR, 2007.
[13] Luis von Ahn and Laura Dabbish. "Labeling images with a computer game." In SIGCHI, 2004.
[14] S. Bao, X. Wu, B. Fei, G. Xue, Z. Su, and Y. Yu. "Optimizing Web
gine to answer). Automatic question-answering is still in Search using Social Annotations.” In WWW, 2007.
the academic realm at the moment. It is an interesting area [15] D. Radev, W. Fan, H. Qi, H. Wu, A. Grewal. “Probabilistic Ques-
tion Answering on the Web” In WWW, 2002.
of research, and can also help bootstrap question-answer
[16] S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and
communities. So, there are several interesting directions M. Gatford. “Okapi at TREC-4.” In Proc. of the Fourth TREC,
in which the field of Web search is evolving. It is hard 1996.
to say what the next paradigm shift in search is going to [17] Andrei Mikheev. “Document centered approach to text normaliza-
tion.” In SIGIR, 2000.
be. Probably in the next few years, the most important [18] Clusty. http://www.clusty.com
improvement will be a better semantic understanding of [19] Li Ding, Tim Finin, Anupam Joshi, Yun Peng, R. Scott Cost, Joel
what the web pages mean, and understanding the user in- Sachs, Rong Pan, Pavan Reddivari, Vishal Doshi. “Swoogle: A
tent. This should help the the search engines better an- Semantic Web Search and Metadata Engine.” In CIKM, 2004.
swer user queries, particularly transactional queries. The [20] K. Yee, K. Swearingen, K. Li, and M. Hearst. “Faceted metadata
for image searching and browsing.” In SIGCHI, 2003
ultimate search engine will be like an Oracle, which has [21] Exalead. http://www.exalead.com
a complete understanding of the content on the Web, us-
[22] Kaiki. “Findex: search result categories help when document rank-
ing which it can give accurate and focused answers to a ing fails.”’ In SIGCHI, 2005.
user query. Semantic Web is a small step in that direction. [23] http://dmoz.org
Possibly, advances in AI, Machine Learning, and Natu- [24] Glen Jeh and Jennifer Widom. “Scaling Personalized Web Search.”
ral Language Processing might help search engines move In WWW, 2003.
towards that goal. [25] Jill Freyne, Rosta Farzan, Peter Brusilovsky, Barry Smith, and
Maurice Coyle. “Collecting Community Wisdom: Integrating So-
cial Search and Social Navigation.” In IUI, 2007.
[26] P. Dourish and M. Chalmers “Running out of space: models of
References information navigation.” Proceedings of HCI, 1994.
[1] L. Page, S. Brin, R. Motwani, and T. Winograd. “The PageRank [27] A. Wexelblat and P. Maes “Footprints: History rich tools for infor-
citation ranking: Bringing order to the Web.” Technical Report, mation foraging.” In CHI, 1999.
Stanfoed Digital Library Technologies Project, 1998. [28] B. Smyth, E. Balfe, J. Freyne, P. Briggs, M. Coyle, and O. Boy-
[2] Andrei Broder. “A taxonomy of Web Search.” In SIGIR, 2002. dell. “Exploiting Query Repitition and Regularity in an Adaptive
Community-based Web Seach Engine.” In User Modelling and
[3] Nick Craswell, David Hawking, and Stephen Robertson. “Effec- User-Adapted Interaction, 2004.
tive site finding using link anchor information.” In SIGIR, 2001.
[4] http://googlewebmastercentral.blogspot.com/2008/04/crawling-
through-html-forms.html
[5] B. Jansen, A. Spink, and T. Saracevic. “Real life, real users, and
real needs: A study and analysis of user queries on the Web.” In-
formation Processing and Management, 2000.
[6] Robert Krovetz and W. Bruce Croft. “Lexical ambiguity and infor-
mation retreival.” In Information Systems, 1992.
[7] Feng Qiu and Junghoo Cho. “Automatic Identification of User In-
terest For Personalized Search.” In WWW, 2006.
[8] T. Haveliwala. “Topic-Sensitive PageRank.” In Proc of the
Eveleventh Intl. World Wide Web Conf, 2002.
[9] J. Cho and S. Roy. “Impact of Web search engines on page popu-
larity.” In Proc. of WWW, 2004.
