
Handbook of Research on

Computer Mediated
Communication
Sigrid Kelsey
Louisiana State University, USA
Kirk St.Amant
East Carolina University, USA
Hershey New York
Information Science Reference
Volume I
Acquisitions Editor: Kristin Klinger
Development Editor: Kristin Roth
Senior Managing Editor: Jennifer Neidig
Managing Editor: Jamie Snavely
Assistant Managing Editor: Carole Coulson
Copy Editor: Katie Smalley, Lanette Ehrhardt
Typesetter: Christopher Hrobak
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.
Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue, Suite 200
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: cust@igi-global.com
Web site: http://www.igi-global.com
and in the United Kingdom by
Information Science Reference (an imprint of IGI Global)
3 Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856
Fax: 44 20 7379 0609
Web site: http://www.eurospanbookstore.com
Copyright 2008 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by
any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does
not indicate a claim of ownership by IGI Global of the trademark or registered trademark.
Library of Congress Cataloging-in-Publication Data
Handbook of research on computer mediated communication / Sigrid Kelsey and Kirk St. Amant, editors.
p. cm.
Summary: "This book provides academics and practitioners with an authoritative collection of research on the implications and social
effects computers have had on communication. With 65 chapters of innovative research compiled in this comprehensive reference source,
this handbook of research is a must-have addition to every library collection"--Provided by publisher.
ISBN 978-1-59904-863-5 (hbk.) -- ISBN 978-1-59904-864-2 (e-book)
1. Computer-assisted instruction. 2. Communication and technology. 3. Information technology--Social aspects. I. Kelsey, Sigrid. II. St.
Amant, Kirk, 1970-
LB1028.5.H3163 2008
378.1'734--dc22
2008001871
British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.
All work contributed to this book set is original material. The views expressed in this book are those of the authors, but not necessarily of
the publisher.
If a library purchased a print copy of this publication, please go to http://www.igi-global.com/agreement for information on activating
the library's complimentary electronic access to this publication.
Chapter LIII
Seek and Ye Shall Find
Suely Fragoso
Unisinos, Brazil
Copyright 2008, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
ABSTRACT
This chapter proposes that search engines apply a verticalizing pressure on the WWW many-to-many information distribution model, forcing it to revert to a distributive model similar to that of the mass media. The argument starts with a critical descriptive examination of the history of search mechanisms for the Internet. Parallel to this there is a discussion of the increasing ties between the search engines and the advertising market. The chapter then presents questions concerning the concentration of traffic on the Web around a small number of search engines, which are in the hands of an equally limited number of enterprises. This reality is accentuated by the confidence that users place in the search engines and by the ongoing acquisition of collaborative systems and smaller players by the large search engines. This scenario demonstrates the verticalizing pressure that the search engines apply to the majority of WWW users, bringing the Web back toward the mass distribution model.
INTRODUCTION
The 20th Century was the century of mass communication, during which cinema, radio and television flourished using the irradiative (one-to-many) distribution model. In its last decades, however, a new practice emerged from military institutions and university campuses: computer mediated communication (CMC). At first sight, this did not seem to be more than the transposition into a new technological environment of some pre-existent modes of interpersonal (one-to-one) communication, such as the postal system or the telephone. This proved not to be the case, however, as CMC developed into an epidemic (many-to-many) mode of communication, which was mostly due to its technological configuration (networked) and its cultural environment (both the university background and the proximity between the hacker community and the counterculture movements).
With the popularization of the Internet, and in particular with the implementation of the World Wide Web, the possibilities of many-to-many communication were extended to an unprecedented number of people. In the context of the 1990s, defined by the apparently insuperable hegemony of the model of mass communication, it was almost impossible not to welcome the subversive potential of CMC. Although the absolute numbers obscure the fact that only a very small portion of the world's population has full access to digital communication networks, it is indisputable that CMC has exponentially increased the number of individuals who are capable of adopting the role of producers of communicational processes on a large scale, thus provoking a rearrangement of the media landscape. Without detracting from the merits of this new communication mode, it is just as important to be aware of the negative consequences of the many-to-many model.
First of all, a large number of senders implies an increased number of messages. In a paper that has become a classic on the subject, Lawrence and Giles (1999, p. 2) estimated that there were 800 million indexable¹ pages available on the Web in 1999. One year later, Murray (2000, p. 3) calculated that the number of indexable pages had already exceeded two billion. In January of 2005, Gulli and Signorini (2005, p. 1) calculated the existence of no less than 11.5 billion pages. As if the sheer magnitude of these numbers were not enough, it is worth remembering that the Web is essentially dynamic and self-organized. In the year 2000, when the daily increase in the number of pages was estimated at 7.3 million (Murray, 2000, p. 3), Arasu, Cho, Garcia-Molina, Paepcke, and Raghavan (2001, p. 3) reported that the half-life of pages with a .com domain did not exceed 10 days². In addition to this it is necessary to consider the immense variety of languages used on Web pages (text, sound, static and dynamic images) and the dynamism of the pages' content.
The scenario thus constituted is of such exuberance that it brings to the foreground the crucial difference between the multiplication of the number of people capable of publishing on the World Wide Web and the visibility of each of these people. The question cannot be reduced to one of the quality or pertinence of the material presented, given that there are many different conceptions of pertinence. Even under the highly improbable hypothesis that all of the millions of terabytes on the Web are of interest to everyone, the problem of excess is still not resolved. In the absence of gatekeeping control at the point of entry, as is the norm in analogue communication media, the many-to-many environment of the Web favors the emergence of selection and filtering mechanisms at the point of exit. In this scenario, search engines constitute an obvious and apparently innocuous solution.
Nevertheless, as this chapter proposes, search engines apply a verticalizing pressure that places the epidemic format of the WWW at risk. Understanding how search engines have been transformed into forces of massification requires a review of some key moments in the history of Internet search. Thus, the next section of this chapter intends not only to identify the technological advances made over the years, but also to describe and discuss how the increasing connections between search tools and the advertising market ended up placing the search engines themselves in jeopardy, until a new generation of search tools emerged.
BACKGROUND: A BRIEF (AND INCOMPLETE) HISTORY OF SEARCH ENGINES
The need for guidance in the midst of the profusion of material available on the Internet dates back to before the emergence of the World Wide Web. In 1990, the first indexer, called Archie³, appeared. It collected information on files available through anonymous FTP servers and kept this up to date by checking the information on a thirty-day schedule. Users of Archie searched for character sequences (regular expressions) in the names of the files or directories available in the index. Initially intended for departmental use, Archie was announced to the world when it covered just over 200 servers (Deutsch, 1990).
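Archie's query model is simple to picture: a periodically refreshed index of file and directory names, searched with regular expressions. A minimal sketch in Python of that style of lookup; the server names and file paths below are invented for illustration and do not reproduce Archie's actual code or data:

import re

# Toy stand-in for Archie's index: server -> list of file paths gathered
# from anonymous FTP listings (all entries are invented for illustration).
ARCHIE_INDEX = {
    "ftp.example.edu": ["/pub/gnu/emacs-18.59.tar.Z", "/pub/tex/dvips.tar.Z"],
    "archive.example.org": ["/mirrors/x11/xterm.tar.Z", "/docs/rfc/rfc1118.txt"],
}

def archie_search(pattern: str):
    """Return (server, path) pairs whose path matches the regular expression."""
    regex = re.compile(pattern)
    return [
        (server, path)
        for server, paths in ARCHIE_INDEX.items()
        for path in paths
        if regex.search(path)
    ]

if __name__ == "__main__":
    for server, path in archie_search(r"emacs.*\.tar\.Z$"):
        print(f"{server}:{path}")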
The ease of locating files available for FTP using Archie inspired the creation of a similar indexer for Gopher, which was called Veronica⁴. Veronica was a database that collated the menus from Gopher servers, allowing queries to be made by topic (using keywords) instead of by server (as was inherent to the system). Soon afterwards Jughead⁵ appeared, which had the merit of introducing the possibility of performing Boolean searches.
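The Boolean searches that Jughead introduced can be sketched with an inverted index and set operations: AND, OR and NOT become intersection, union and difference. The documents and keywords below are invented placeholders, not Gopher data:

# Minimal Boolean retrieval sketch: an inverted index maps each keyword to
# the set of menu items (documents) containing it.
INDEX = {
    "weather":  {"doc1", "doc3"},
    "forecast": {"doc1", "doc2"},
    "gopher":   {"doc2", "doc3"},
}
ALL_DOCS = set().union(*INDEX.values())

def lookup(term):
    return INDEX.get(term, set())

# "weather AND forecast"
print(lookup("weather") & lookup("forecast"))              # {'doc1'}
# "weather OR gopher"
print(lookup("weather") | lookup("gopher"))                # {'doc1', 'doc2', 'doc3'}
# "forecast AND NOT gopher"
print(lookup("forecast") & (ALL_DOCS - lookup("gopher")))  # {'doc1'}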
Another system, which was more advanced in many aspects and which brought together the properties of Gopher and of the search tools that operated on it, had been in operation since the previous year. This was WAIS (Wide Area Information Server), which had been developed in a joint initiative by four companies⁶. With WAIS, it was possible to carry out queries on remote databases, the results of which were organized in decreasing order of frequency of the keywords. WAIS clients were created for various operating systems, including Windows, Macintosh and Unix, but its proprietary nature limited its popularity. In fact it could be said that, at the time, various good ideas and their implementations succumbed because an insistence on commercialization contradicted the public nature of the Internet. Even so, it is doubtful that CERN (http://www.cern.ch) had any idea of the scale of the consequences of its decision to drop, in 1993, any claim to ownership of the basic code of the global hypertext that had been started by Tim Berners-Lee in 1989 (CERN, 1993) and that would become the World Wide Web as we know it today. Combined with the decision to make the WWW a public domain system, the launch of Mosaic⁷, originally an X Window System browser, and its later conversion for Windows and Macintosh platforms helped popularize the Web on a scale that was unprecedented for other information systems.
Shortly after Mosaic was launched, the first spider began crawling the Web. This was the World Wide Web Wanderer⁸, the first Web robot. The Wanderer traversed the Web by mapping each page of a site and proceeding to one of the pages linked to from that page, then mapping that new page, and so on in succession⁹, while recording the addresses it found in a database. The initial idea was to map the entire Web (Gray, 1995) and was based on the premise that all pages would be connected to at least one other, so that it would only be a matter of time until the Wanderer explored the entire Web¹⁰.
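The traversal described above amounts to a depth-first walk over the link graph while every visited address is recorded. A minimal sketch of that idea over an invented in-memory link graph (no real network access, and not the Wanderer's actual code):

# Depth-first crawl sketch in the spirit of the Wanderer: follow one link,
# map that page, follow one of its links, and so on, recording each address.
# LINK_GRAPH stands in for the Web; its pages and links are invented.
LINK_GRAPH = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": ["http://a.example/"],
}

def crawl_depth_first(start_url):
    database, stack = [], [start_url]   # the "database" of addresses found
    seen = set()
    while stack:
        url = stack.pop()               # LIFO order = depth-first traversal
        if url in seen:
            continue
        seen.add(url)
        database.append(url)
        # Push outlinks; the last one pushed is explored next.
        stack.extend(LINK_GRAPH.get(url, []))
    return database

print(crawl_depth_first("http://a.example/"))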
Despite the controversy caused by the impact of the WWW Wanderer on the network servers, before the end of 1993 there were at least three other bots crawling the Web: JumpStation, the World Wide Web Worm and RBSE. The Worm¹¹ indexed the titles and addresses of pages, while JumpStation¹² innovated in also storing the page headers. Both presented the results in the order in which they found them. RBSE¹³ was the first Web bot to implement a ranking system based on relative relevance to the search term used (Mauldin, 1997; Wall, 2006).
Still in 1993, the first indexer designed specifically for the Web appeared: the Archie-Like Indexing for the Web, or AliWeb¹⁴. Strongly influenced by Archie, AliWeb did not use a crawler but compiled its database from information supplied directly by Webmasters. This allowed the system to store descriptions of the pages supplied by the page creators themselves but, on the other hand, the goodwill of third parties was necessary to keep the data up to date and of high quality.
Also working with a database that was constructed without the use of crawlers, the first queryable Web directory, Galaxy¹⁵, appeared the following year. As it listed only the URLs that had been directly provided, Galaxy could organize these addresses in categories and subcategories, allowing users to restrict their search to a subarea of the database, which accelerated the process and made it more accurate.
It was not long until a bot appeared that was capable of combining the recording of complete page content with the functionality of automated crawling. To achieve this, WebCrawler¹⁶ adopted a vectorial indexing¹⁷ system. This strategy was a great success: after six months of use, WebCrawler had indexed thousands of documents and carried out almost a quarter of a million searches, attributed to more than 23,000 different users (Pinkerton, 1994). By November of its first year, the number of searches carried out had risen to the million mark (Pinkerton, n.d.). Soon the University of Washington system could no longer support the search tool, a problem that would only be resolved with the sale of WebCrawler.
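Vectorial indexing of the kind WebCrawler adopted (see Endnote 17) represents each document and each query as a vector of term weights and ranks documents by their spatial closeness to the query, typically measured with cosine similarity. A bare-bones sketch using raw term counts as weights; the documents are invented and the weighting scheme is deliberately simplified:

import math
from collections import Counter

DOCS = {
    "doc1": "web crawler indexes the full text of web pages",
    "doc2": "gopher menus are indexed by veronica",
}

def vectorize(text):
    """Represent a text as a bag-of-words vector of term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query):
    q = vectorize(query)
    scores = {name: cosine(q, vectorize(text)) for name, text in DOCS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank("web pages"))   # doc1 scores highest: it is spatially closest to the query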
Other search systems improved further on the combination of functionality and scope that had been inaugurated by WebCrawler. One of the most significant of these was Lycos¹⁸, which, in addition to organizing search results by relevance, allowed searches by prefix and gave weight to the proximity between words (Mauldin, 1997). One of the initial appeals of Lycos was the size of its database: by August 1994, Lycos had identified 394,000 documents; by January 1995, the catalog had reached 1.5 million documents; and by November 1996, Lycos had indexed over 60 million documents, more than any other Web search engine at the time (Mauldin, 1997). The weight of this database was alleviated by the strategy of not storing the complete content of the pages but keeping only a summary, which was automatically constructed taking into account the 100 most frequent keywords in each page combined with the title, the header and the first 20 lines or 10% of the document. These summaries could be viewed together with the list of results and helped the user decide which of the pages found to visit first.
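A rough sketch of that summarizing strategy: keep the title, the most frequent keywords and the opening of the document instead of its full text. The thresholds follow the figures quoted above, but the exact rules (and the "whichever is smaller" reading of "20 lines or 10%") are assumptions, and the sample document is invented; header extraction is omitted for brevity:

from collections import Counter

def summarize(title: str, body: str, top_k: int = 100, head_lines: int = 20):
    """Build a Lycos-style surrogate: title + most frequent keywords + opening excerpt."""
    lines = body.splitlines()
    # First 20 lines or 10% of the document, whichever is smaller (assumption).
    cutoff = min(head_lines, max(1, len(lines) // 10))
    opening = "\n".join(lines[:cutoff])
    keywords = [w for w, _ in Counter(body.lower().split()).most_common(top_k)]
    return {"title": title, "keywords": keywords, "opening": opening}

doc = "line one about search engines\n" * 40   # invented placeholder document
print(summarize("Example page", doc)["opening"])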
Another important difference of Lycos was the way its crawler worked. This was neither depth-first nor breadth-first, but followed a strategy that Mauldin called best-first. To define which page was the best, and thus the next to be crawled, the Lycos spider took into account the number of links that each page received from other servers (inlinks).
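A best-first frontier can be sketched by replacing the stack (depth-first) or queue (breadth-first) with a priority queue ordered by how many inlinks each candidate page has accumulated so far. This is only an illustration of the general heuristic, not Lycos's implementation, and the link graph is invented:

import heapq

LINK_GRAPH = {   # invented pages and links
    "A": ["B", "C"], "B": ["C"], "C": ["D"], "D": [],
}

def crawl_best_first(start):
    inlinks = {page: 0 for page in LINK_GRAPH}
    frontier = [(0, start)]          # (-inlink_count, page); heapq is a min-heap
    visited, order = set(), []
    while frontier:
        _, page = heapq.heappop(frontier)
        if page in visited:          # skip stale queue entries
            continue
        visited.add(page)
        order.append(page)
        for target in LINK_GRAPH.get(page, []):
            inlinks[target] += 1
            if target not in visited:
                # Re-push with updated priority; older entries are skipped above.
                heapq.heappush(frontier, (-inlinks[target], target))
    return order

print(crawl_best_first("A"))   # pages with more inlinks are crawled earlier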
In the mid-1990s, the capacity of the Web to attract significant volumes of traffic started to gain the attention of new investors. The search systems were considered particularly interesting by the advertising industry, which initially wanted to place banners and small advertisements on their home pages. Soon the search system operators discovered that increasing the number of viewers was the way to attract more advertising revenue. With a view to generating traffic of their own and increasing the length of stay of users within their domains, many assumed the format of a portal and started offering a variety of services. One of the first and most successful Web portals was, without doubt, Yahoo!.
Yahoo! started modestly, as a list of the favorite sites of two first-year doctoral students at Stanford University (Yang & Filo, 1994). The practice of publishing lists of favorites on the Web was common enough at the time, and the big differential of Yang and Filo's index was the availability of brief descriptions of the pages listed. As the number of entries grew, the list became unwieldy, so the authors created a tree structure (categories and subcategories), giving Yahoo! a directory profile. To meet the growing popularity of the list they also added a search tool and started accepting submissions from Web sites that wanted to appear in their database. Less than a year after it was first published, the Yahoo! page celebrated its one millionth page view, with visitors coming from almost 100 thousand distinct addresses (Yahoo! Media Relations, 2005).
Having arrived late, AltaVista¹⁹ had to confront fierce competition. It was, however, much faster than the other search engines available at the time and promised Webmasters that it would update any information received within 24 hours. It was the first tool that permitted queries to be formulated in natural language, searches of newsgroups, and searches for words associated with images, titles or other HTML fields. It was also the first tool to provide searches for inlinks, a possibility that tended to pass unnoticed by the ordinary user but one that had important implications for marketing. In addition to this, AltaVista added a tips field below the search form, which increased user loyalty to the tool (Sonnenreich & Macinta, 1998).
At this point, new forms of integrating advertising with the search results, adapted to the push²⁰ nature of the Web, began to become popular. Paid inclusion, where a Webmaster paid a search engine or directory to guarantee that the site concerned was included in the database, was already commonplace when a more advanced version appeared. This was paid placement, which consisted of paying the search engine provider to ensure that the site concerned appeared among the most highly ranked results in response to a specific word (or words). In 1997, GoTo (Idealab!, 1997) inaugurated a new sales model, introducing the pay-per-click system, in which advertisers only paid the search engine when the link to the site concerned was used. Search engines rapidly became the principal vehicles for online advertising (Future Now, Inc., 2003, p. 15).
The very success of the search business generated competition, and soon there were dozens of different search engines available on the Net. Each one of these worked with its own interface and algorithms, and their databases covered different portions of the Web. As a result, queries made on different systems produced different results, and the user was faced with repeating the same query on different tools to obtain the widest range of responses. To meet this problem there arose the metasearch tools, which allowed various systems to be queried at the same time. The first two metasearch tools appeared almost simultaneously, in 1995. Savvy Search²¹ ran searches on up to 20 other search engines at a time and included access to some themed directories. However, it simply ignored the advanced search options of the various search engines. MetaCrawler²², which became more popular, on the other hand, took on the variations of syntax between the advanced options of the various engines: it accepted a query in its own syntax and converted this user input into the corresponding command for each system. The results obtained were converted to a standard format in the page that was presented as a result (Selberg & Etzioni, 1995).
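The mechanics just described can be sketched as follows: translate one query into each engine's expected form, collect the result lists, and merge them into a single ranking. The "engines" below are invented local functions standing in for remote services; the merging rule is an assumption for illustration, not MetaCrawler's actual algorithm:

# Metasearch sketch: fan one query out to several engines and merge the results.
def engine_alpha(query):                       # pretends to require '+'-joined terms
    return [f"http://alpha.example/{'+'.join(query.split())}/1",
            "http://shared.example/page"]

def engine_beta(query):                        # pretends to require a quoted phrase
    return ["http://shared.example/page",
            f'http://beta.example/?q="{query}"']

ENGINES = [engine_alpha, engine_beta]

def metasearch(query):
    scores = {}
    for engine in ENGINES:
        for rank, url in enumerate(engine(query)):
            # Simple merge: reward URLs returned by several engines and ranked high.
            scores[url] = scores.get(url, 0) + 1.0 / (rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

print(metasearch("search engine history"))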
From the point of view of the original search engines the metasearches were a bad idea, given that they took audience away from their pages and as a consequence reduced the interest of advertisers. With users, however, they were a great success, in particular MetaCrawler, which quickly outgrew the capacity of the servers on the campus of the University of Washington and was licensed to go2net, which later became part of InfoSpace. Under the management of InfoSpace, MetaCrawler found a business model compatible with a metasearch engine and started to provide the results from various search engines together with the original advertising from each site. The great commercial drive for metasearch tools came from pay-per-click advertising, which allowed for differentiation between traffic originating from the original engine and that deriving from the meta-engine.
Parallel to the manipulation of search results deriving from paid placement there also appeared the phenomenon of search spam²³. From the point of view of the search engine providers, it was important to avoid such spam, as incorrect or badly ordered results would drive users away and, with them, the advertisers. To this end the search engines developed more and more sophisticated indexing and classifying algorithms. At the same time, however, the number of paid inclusions shown in the search results was growing ever larger. Soon the dissemination of these practices began to compromise the confidence of users in search engines in general.
At this point the market battle seemed to turn on the size of the databases of the various search systems. Impressive numbers were exhibited as an argument for the existence of a large number of users. Due to the high costs involved in compiling databases large enough to compete, the survival of the smaller search tools became almost impossible. Many were bought by larger search engines, which were interested both in increasing the size of their databases and, very importantly, in access to the crawlers and classification systems of the smaller companies, which, as was the standard practice, were trade secrets. The competition to obtain a larger slice of the advertising market was intense, but the potential profit was huge. The users, however, had been placed in the background, a mere statistic to present to the advertisers.
In the academic world, a classification system was under development that would bring back to the center one of the more interesting characteristics of Lycos: the popularity heuristic (Mauldin, 1997). This strategy was improved for use in BackRub, which classified the results in terms of the number of back links that each site received. The project rapidly grew and was renamed Google²⁴. Initially, Brin and Page (1998) did not appear to be planning to develop a company based on their new search engine, to the degree that they tried to sell it, unsuccessfully, in 1998. One year later, Google was still a beta version, but it had acquired a reputation as a new search engine that supplied more reliable results than other systems: it not only had no paid inclusions among its results but also used an innovative classification algorithm, the details of which were open to inspection²⁵. Other strengths of Google were the speed of its searches and the simplicity of its interface (starting with the absence of banners and other advertising material, which meant that the home page loaded much faster than that of other search engines). Soon Google was able to confront the competition with respect to database size and started displaying the number of indexed pages immediately beneath the query field.
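Ranking by back links was formalized in the PageRank algorithm published by Brin and Page (see Endnote 25 and the Brin, Page, Motwani, & Winograd, 1998 report). The compact power-iteration sketch below illustrates the published formulation over an invented link graph; it is not Google's production system:

def pagerank(graph, damping=0.85, iterations=50):
    """graph: page -> list of outlinked pages. Returns page -> score."""
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in graph.items():
            if not outlinks:                       # dangling page: spread evenly
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Invented four-page web: C receives the most inlinks and ends up ranked highest.
web = {"A": ["C"], "B": ["C"], "C": ["D"], "D": ["A"]}
print(sorted(pagerank(web).items(), key=lambda kv: -kv[1]))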
At the end of 2000, Google started to provide some paid results but, contrary to the majority of systems, it did not mix these in with the natural results. By this time Google had established itself as the best search engine in the opinion of the general public, which happily accepted the graphic differentiation between paid and natural results. Other search engines had to deal with the superior relevance of the results provided by Google and the loyalty that this quality generated among its users, to the degree that some large systems such as Yahoo! signed agreements to include results deriving from Google in their own pages. By the end of 2003 it was estimated that two thirds of all search queries carried out on the Web returned results originating from Google (Thies, 2005).
In September of 1999, Microsoft's MSN Search started to apply its own classification system to the data obtained from various databases (Sullivan, 1999), setting in motion the process of separating itself from the subcontractors that had until then driven its searches (LookSmart, Inktomi/Yahoo!). In 2003 Microsoft announced plans to build its own crawler (Sullivan, 2003), which would only be officially launched two years later (Sullivan, 2005). A little more than a year after this, Microsoft launched Windows Live Search, a new search tool that had a more customizable interface and that allowed the user some control over the ordering of the results (limited to classification by the most recent, most popular, or most exact) (Murray, 2006).
As has been shown, the present generation of search engines was born as a way out of the previous entanglement between search tools and advertising, which had placed the entire search engine business at risk. The next section of this chapter will show that the recovery of user confidence in the search engines did not, however, result in redemption for the many-to-many distribution model of the World Wide Web. The concentration of the search engines still implies a verticalizing pressure on the distribution model, leaving it more similar to mass communication than to the many-to-many standard that guided its creation.
THE POWER OF MONEY
Approaching the Internet from the perspective of political economy, van Couvering (2004) saw the Net as an important new medium, the structure of which followed the same irradiative model that characterizes mass communication media. In fact, the global reach of the search tools and their concentration in the hands of a limited number of businesses, the majority from the United States of America, reinforces the image of a scenario extremely similar to that of the great traditional media conglomerates²⁶. The concentration of search engines in the hands of a few groups accelerated after the .com bubble burst in 2000. Its strength is even more evident when one moves on from the number of players involved to the relationships that exist between them. Among the 11 main search engines identified in January of 2007, the results of all of them derived from just four sources: Google, Ask.com, MSN and Yahoo! (Figure 1).
Figure 1. Relations between search engines in January 2007 (Adapted from Bruce Clay, Inc., 2007)
Evidently, there are a variety of small search tools that are not represented in the diagrams and are not considered in van Couvering's market analysis. The majority of these are experimental or thematic and operate on small databases, often in incubators or universities. It would not be the first time that one of these took over the leadership of the search industry, as has already happened with, for example, AltaVista and Google. Nevertheless, the increasing consolidation of the search business makes this kind of event less and less likely. As the capital for the search industry derives mostly from advertising, survival in the current market depends upon the ability to attract high numbers of users. The users, in their turn, tend to stay with the tools with which they are most familiar.
Year after year, Google, Yahoo! and MSN appear among the 10 most visited sites in all of the nations covered by Nielsen/Netratings (http://www.nielsen-netratings.com). More than 80% of searches are concentrated in these three companies. The users, in their turn, use these tools to access even the best known sites, making the search engine the focal point of the online experience for all users, from the most inexperienced to the savviest (Nielsen/Netratings, 2006).
The other two methods of accessing sites, typing the URL directly into the address box and following links from one site to another, are used much less frequently. For the majority of users, the whole of the Web is restricted to the content of the databases of the big search engines. Despite being significant in size, these databases only cover a portion of the WWW. Even if the deep Web²⁷ is not considered, Gulli and Signorini calculated that in 2005 the database of the main search engine covered no more than 76.2% of the Web (Google); the scope of Yahoo! was 69.3%, of MSN 61.9% and of Ask 57.6% (Gulli & Signorini, 2005, p. 2). The degree of overlap between the databases of the four most popular systems is also significant (Figure 3).
Figure 3. Graphical representation of the percentage of the indexed Web in the databases of the largest search companies, with their respective intersections (Adapted from Gulli & Signorini, 2005, p. 2)
Even when indexed, many sites never appear in the search results. One of the reasons for this is the time restriction that the tools effectively place on searches. This is meant to avoid the user giving up on the search and switching to a different engine, and means that after a specified time interval the search is interrupted, regardless of its coverage (Google even shows the length of time dedicated to the search with the query results found). This time restriction is irrelevant, however, when it is taken into account that, despite announcing enormous quantities of results to the user, the large search engines do not, in fact, make available more than the first 1,000 of these results. In addition to this, despite the use of declustering algorithms, more than one page from the same site often appears in the results that are presented (Fragoso, 2006).
The majority of users do not notice the limit on the number of pages effectively shown by the search engines, as they restrict their attention to the first few. Empirical verification has shown that no more than 10% of users advance beyond the third page of results, with 62% selecting a result from the first page (iProspect, 2006). The result is an accentuated channeling of traffic to a few addresses, converging on those that rank best with the search engines.
It is evident that search engines cannot avoid producing selections or establishing hierarchies; after all, this is their main purpose. It is true that their operation does not represent a reclosing of the opportunity of being a sender and thus does not threaten freedom of expression on the WWW. It is, however, necessary to be aware that the search engines function as true Web gatekeepers, with the added complication that they operate according to criteria that are carefully maintained as trade secrets. Finally, it has to be said that the search results can be highly inconsistent. Searches made with the same parameters at different moments produce different results, most frequently in Google (Fragoso, 2006). However, users tend to feel in control of their searches and trust the search engines, which many consider to be a fair and unbiased source of information (Fallows, 2005, p. 2).
Given the concentration of the search business in the hands of very few players, the confidence that users place in the results of searches is the last piece required for the Web to revert to a vertical distribution model. Before closing the diagnosis, it is important to take into consideration some alternative possibilities wherein the many-to-many distribution model that guided the design of the WWW could be preserved.
FUTURE TRENDS: CAN ALTERNATIVE SOLUTIONS SURVIVE THE PRESSURE?
The first years of the 2000s saw the rediscovery of the potential of collaborative systems such as social bookmarking²⁸. This practice, which gave rise to important tools such as Yahoo!, reappeared with the improvement of collaborative tagging, which consists of the association of keywords with the listed site. Tools based on social tagging allow queries on databases compiled from data supplied by the users themselves, taking as a basis the tags that the community members have chosen to associate with the indexed items. At the time of writing, one of the most popular social tagging sites is Del.icio.us (http://del.icio.us), but innumerable others exist.
Collaborative systems are typical of the so-called Web 2.0 and depend upon the subversive influence of the long tail, a characteristic long known in statistics and one that has been recently popularized²⁹. The concept of the long tail applies to the Web, the linked structure of which forms a pattern in which a few sites are highly connected while the majority of sites receive very few inlinks³⁰. Contrary to the algorithms based on the popularity of the sites that have the largest number of inlinks, the long tail hypothesis brings into view the enormous power of small sites, whose cumulative audience can be greater than that of a major portal.
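The arithmetic behind the long-tail argument can be made concrete with a toy Zipf-like audience distribution. The exponent, site count and head/tail split below are invented purely to illustrate the shape of the claim; they are not measured data:

# Toy illustration of the long-tail argument: under a Zipf-like audience
# distribution, the many small sites together can outweigh the few big ones.
N_SITES = 1_000_000
audience = [1.0 / rank for rank in range(1, N_SITES + 1)]   # Zipf with exponent 1

head = sum(audience[:100])        # the 100 most linked-to sites
tail = sum(audience[100:])        # everything else: the "long tail"
total = head + tail
print(f"head share: {head / total:.1%}, tail share: {tail / total:.1%}")
# With these invented parameters the tail accounts for well over half the audience.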
The acquisition of collaborative sites by large search companies³¹ demonstrates that the latter are not unaware of the long tail phenomenon.
Figure 2. Percentages of total searches carried out by U.S. users on various engines in November of 2006. Searches restricted to the content of the site that the user was currently viewing (internal searches) were not included. Google includes all of the sites of the Google brand (Google.com, Google.com.br, Google Images, etc.). Yahoo! includes all of the sites of the Yahoo! brand (Yahoo.com, Yahoo.com.br or Yahoo.local); it does not include searches on sites that belong to Yahoo!, such as AltaVista or AllTheWeb. MSN includes all of the sites of the MSN brand, such as MSN Search, but not Windows Live Search (which corresponded to about 0.02% of the total). AOL includes all of the sites of the AOL brand. Ask includes searches on Ask.com but not on the other sites of Ask/IAC (MyWay.com, iWon and My Search). The category Others includes all searches on sites that are not mentioned above or named in the graph. No site not named in the graph had more than 2.5% of the public. Recreated from Sullivan, 2006.
The trend also points toward the absorption of specialized search tools, which concentrate on local searches, specific themes or the dynamic Web³².
The tentacles of the big players are not restricted to other search tools. Even Google, originally an alternative to the portal model, is moving in the direction of diversification of its activities. The plethora of services offered today by Google is so varied that its extent manages to pass unnoticed by most of its users. In addition to specialized searches (Google Finance, Froogle), these include services such as Google Checkout, Google Calendar, Google Talk and Gmail, and applications such as Google Web Accelerator, Google Earth, Picasa and Google Desktop. At this time the accumulation of searches based on Google, its partners and its subsidiaries indicates a monopolistic profile that has given the company the reputation of being the Microsoft of the Internet (Maney, 2005; Mohney, 2003).
In a highly unregulated setting, Google and its powerful competitors have begun to examine possible convergence. At the end of 2006, Google, Yahoo! and Microsoft announced their first joint action, the adoption of Google's Sitemaps protocol as a common standard for the three companies. With this unification, Webmasters no longer have to report their pages separately to each of the databases of Google, Yahoo! and MSN but can now do this in one unified action (Mills, 2006). In practice this means that a portion of the database of each of the three companies will be a mirror of a portion of that of the other two.
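The shared standard referred to here is the Sitemaps XML format (sitemaps.org): a single file listing a site's URLs that any participating engine can fetch. A minimal sketch that writes such a file; the URLs are invented, and optional fields such as lastmod or changefreq are left out (check the current protocol specification before relying on the details):

from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Return a minimal Sitemaps-protocol XML document listing the given URLs."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

# Invented example URLs; a real sitemap would list the site's actual pages.
print(build_sitemap(["http://www.example.com/", "http://www.example.com/about"]))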
Overall, despite the decentralizing potential of collaborative systems and the continuous emergence of small players dedicated to specialized searching, the verticalization of the WWW continues to increase as the search business increasingly concentrates in the hands of very few players.
CONCLUSION
CMC increased, in an unprecedented manner, the number of individuals who are capable of adopting the role of producers of communicational processes on a large scale. It cannot be denied that this has resulted in an enormous democratization of international communication and has challenged the vertical model of information distribution that characterized the practices of mass media throughout the 20th Century. On the other hand, the increase in the number of senders implied a proportional increase in the number of messages. In the first decade of the 21st century, there are billions of available Web pages, using a true plethora of languages (text, sound, static and dynamic images). At any given moment there are pages being updated, new pages being created and others, not necessarily in less significant quantity, being abandoned or removed. In the midst of such exuberance the lack of information ceased (or appeared to cease) to be a problem. Nevertheless, when everything that there is to be known about something appears to be available, the excess of information reveals its most negative aspect.
In this scenario search engines perform the
essential role of selecting and ordering, providing
an apparently innocuous solution. While their
operation does not threaten freedom of expression
on the WWW, as they do not close the opportunity
for anyone to be a sender, it is necessary to be
aware that they control the visibility of what has
been published and thus determine what may well
be seen and what is not likely to be found.
The history of search engines reveals an entanglement with advertising that reached the point of placing the search engine business at risk. Currently, the concentration of traffic in the hands of a few tools which, in turn, belong to a still smaller group of businesses, reveals an economic concentration of the digital media that is unprecedented even in the mass media. Despite the criteria for ordering search results being carefully maintained trade secrets, the available evidence indicates that users trust the search engines, which are seen as fair and unbiased.
Nowadays, the epidemic information distribution model of the World Wide Web no longer corresponds to the experience of a good portion of its users, whose attention is conditioned to the addresses that are classified highest among the results that can be obtained from the large search tools. The small specialized search tools and the collaborative systems constitute an effective hope for decentralization. Each new service that starts to stand out, however, simply becomes the next target for the capital behind the large tools, which continue to quickly absorb these small enterprises into their plethora of services. The result is that the traffic flow on the WWW is already highly concentrated and the visibility of addresses rests in the hands of a small group of enterprises.
Meanwhile, users trust the search engines wholeheartedly, providing the final condition for the Web to revert to a vertical distribution model, the behavior of which is tending toward being even more centralized and biased than that of the mass communication media.
REFERENCES
Anderson, C. (2004). The long tail. Wired Magazine, 12(10). Retrieved February 18, 2008, from http://www.wired.com/wired/archive/12.10/tail.html
Arasu, A., Cho, J., Garcia-Molina, H., Paepcke, A., & Raghavan, S. (2001). Searching the Web. ACM Transactions on Internet Technology, 1(1), 2-43. Retrieved February 18, 2008, from http://portal.acm.org
Barabási, A.-L. (2002). Linked. New York: Plume.
Bergman, M. K. (2001). The deep Web: Surfacing hidden value. The Journal of Electronic Publishing, 7(1). Retrieved February 18, 2008, from http://www.press.umich.edu/jep/07-01/bergman.html
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the Seventh International Conference on World Wide Web. Retrieved February 18, 2008, from http://infolab.stanford.edu/~backrub/google.html
Brin, S., Page, L., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the Web (Tech. Rep., Computer Science Department, Stanford University). Retrieved February 18, 2008, from http://dbpubs.stanford.edu:8090/pub/1999-66
Bruce Clay, Inc. (2006). The histogram of the search engine relationship chart. Retrieved February 18, 2008, from http://www.bruceclay.com/serc_histogram/histogram.htm
Bruce Clay, Inc. (2007). The search engine relationship chart. Retrieved February 18, 2008, from http://www.bruceclay.com/searchenginerelationshipchart.htm
CERN. (1993). Software freely available. Retrieved February 18, 2008, from http://www.w3.org/History/1993/WWW/Conditions/FreeofCharge.html
Cohen, L. (n.d.). Internet tutorials. University at Albany, SUNY. Retrieved February 18, 2008, from http://www.internettutorials.net/
Deutsch, P. (1990, September 11). An Internet archive server server (was about Lisp). E-mail message sent to the newsgroup comp.sys.next. Retrieved February 18, 2008, from http://groups.google.com/group/comp.archives/msg/a77343f9175b24c3?output=gplain
Deutsch, P., Emtage, A., & Heelan, B. (1990). Archie: An electronic directory service for the Internet. Retrieved February 18, 2008, from http://tecfa.unige.ch/pub/documentation/Internet-Resources/short-guides/whatis.archie
Fallows, D. (2005). Search engine users: Internet searchers are confident, satisfied and trusting, but they are also unaware and naïve. Pew Internet & American Life Project. Retrieved February 18, 2008, from http://www.pewinternet.org/
Fragoso, S. (2006). Sampling the Web: Discussing strategies for the selection of Brazilian Web sites for quanti-qualitative analysis. In M. Consalvo & C. Haythornthwaite (Eds.), AoIR Internet Research Annual (Vol. 4, pp. 195-208). New York: Peter Lang.
Future Now, Inc. (2003). What converts search engine traffic: Understanding audience, vehicle, message and perspective to optimize your ROI. Retrieved February 18, 2008, from http://jobfunctions.bnet.com/whitepaper.aspx?&tags=e-business%2fe-commerce&docid=161804
Gray, M. (1995). Measuring the growth of the Web: June 1993 to June 1995 (MIT Tech. Rep.). Retrieved February 18, 2008, from http://www.mit.edu/people/mkgray/growth/
Gulli, A., & Signorini, A. (2005). The indexable Web is more than 11.5 billion pages. In Proceedings of the International Conference on the WWW 2005. Retrieved February 18, 2008, from http://www.cs.uiowa.edu/~asignori/Web-size/size-indexable-Web.pdf
iProspect, Inc. (2006, April). Search engine user behavior study. Retrieved February 18, 2008, from http://www.iprospect.com
Kahle, B. (1991). An information system for corporate users: Wide area information servers. WAIS Corporate Paper Version 3. Retrieved February 18, 2008, from ftp://think.com in /pub/wais/wais-overview-docs.sit.hqx
Koster, M. (1993). ANNOUNCEMENT: ALIWEB (Archie-Like Indexing for the WEB). E-mail message sent to the newsgroup comp.infosystems. Retrieved February 18, 2008, from http://groups.google.com/group/comp.infosystems.www/msg/4b58ee36a52f21ee?oe=UTF-8&output=gplain
Koster, M. (1994). ALIWEB: Archie-like indexing in the Web. In Proceedings of the First International Conference on the World Wide Web. Retrieved February 18, 2008, from http://www.informatik.uni-stuttgart.de/menschen/sommersn_public/aliWeb-paper.html
Lawrence, S., & Giles, L. (1999). Accessibility and distribution of information on the Web. Nature, 400, 107-109. Summarized version retrieved February 18, 2008, from http://wwwmetrics.com
Maney, K. (2005, August 31). Google: The next Microsoft? Noooo! Cyberspeak. USA Today. Retrieved February 18, 2008, from http://www.usatoday.com/tech/columnist/kevinmaney/2005-08-30-google-microsoft_x.htm
Mauldin, M. L. (1997, January-February). Lycos: Design choices in an Internet search service. IEEE Expert, pp. 8-11. Retrieved February 18, 2008, from IEEE Expert Online, http://www.fuzine.com/lti/pub/ieee97.html
Mills, E. (2006, November 15). Google, Yahoo, Microsoft adopt same Web index tool. CNET News.com. Retrieved February 18, 2008, from http://www.cnet.com/?tag=hdrgif
Mohney, D. (2003, September 1). Is Google the next Microsoft? The Inquirer. Retrieved February 18, 2008, from http://www.theinquirer.net/default.aspx?article=11305
Murray, B. (2000). Sizing the Internet: A Cyveillance white paper. Retrieved February 18, 2008, from http://www.cyveillance.com
Murray, R. (2006, October 6). Search wars salvo: Microsoft launches Live Search. Search Insider, MediaPost. Retrieved February 18, 2008, from http://publications.mediapost.com/index.cfm?fuseaction=Articles.showArticleHomePage&art_aid=49199
Nielsen/Netratings. (2006, January 18). Top search terms reveal Web users rely on search engines to navigate their way to common Web sites. Nielsen/Netratings press release. Retrieved February 18, 2008, from http://www.nielsen-netratings.com
Pinkerton, B. (n.d.). WebCrawler timeline. Retrieved February 18, 2008, from http://thinkpink.com/bp/WebCrawler/History.html
Pinkerton, B. (1994). Finding what people want: Experiences with the WebCrawler. In Proceedings of the Second International WWW Conference. Retrieved February 18, 2008, from http://thinkpink.com/bp/WebCrawler/WWW94.html
Salient Marketing. (n.d.). History of search engines. Retrieved February 18, 2008, from http://www.salientmarketing.com/seo-resources/search-engine-history.html
Selberg, E., & Etzioni, O. (1995). Multi-service search and comparison using the MetaCrawler. In Proceedings of the Fourth International World Wide Web Conference. Retrieved February 18, 2008, from http://www.w3.org/Conferences/WWW4/Papers/169/
Sonnenreich, W., & Macinta, T. (1998). A history of search engines. Indianapolis, IN: Wiley.
Sullivan, D. (2003, July 1). Microsoft's MSN Search to build crawler-based search engine. SearchEngineWatch. Retrieved February 18, 2008, from http://searchenginewatch.com/showPage.html?page=2230291
Sullivan, D. (2005, February 1). MSN Search officially switches to its own technology. SearchEngineWatch SearchDay. Retrieved February 18, 2008, from http://searchenginewatch.com/searchday/article.php/3466721
Sullivan, D. (2006, August 22). Nielsen NetRatings search engine ratings. SearchEngineWatch report. Retrieved February 18, 2008, from http://searchenginewatch.com/reports/article.php/2156451
Thies, D. (2005). The search engine marketing kit. Retrieved February 18, 2008, from http://www.sitepoint.com/books/sem1/
Van Couvering, E. (2004). New media? The political economy of Internet search engines. Paper presented at the Communication Technology Policy Section, Conference of the International Association of Media & Communications Researchers (IAMCR), Porto Alegre, Brazil.
White, D. M. (1950). The "gate keeper": A case study in the selection of news. Journalism Quarterly, 27(4), 383-390.
Yahoo! Media Relations. (2005). The history of Yahoo!: How it all started. Retrieved February 18, 2008, from http://docs.yahoo.com/info/pr/index.html
KEY TERMS
Boolean Search: A search that uses Boolean logic operators (such as AND, OR and NOT) to formulate conditions relating to the keywords or phrases to be located in a document or set of documents.
Collaborative Systems: Also known as social systems; systems constituted by the collaboration of their users.
Crawler: A program or script which methodically browses a collection of documents (such as the Web), collecting data about its elements.
Directory: A directory service is a system which stores and organizes information about a database, generally in a hierarchical format.
Gatekeeping: Developed by White (1950), the concept of gatekeeping denominates the selection process which determines which information will be made public in the mass media.
Mass Media: Communication systems in which messages are delivered to very large audiences. The association with broadcasting led to the use of the expression mostly in relation to radio and television, but printed material, in particular newspapers and magazines, can also be considered mass media.
Search Engine: A program which performs searches for keywords or expressions in documents (in this case, Web documents) and returns a list of results.
World Wide Web: An online hypertextual space in which documents are identified by addresses called URLs (Uniform Resource Locators) and are connected to one another by selectable links.
ENDNOTES
1. The expression indexable pages designates the content of the Web that is normally accessible to search engines. Nonindexable pages comprise the deep Web, which consists of pages that do not give or receive links, dynamic content generated from databases and material of restricted access.
2. In other words, in 10 days half the pages observed were no longer to be found at the previously checked address.
3. Alan Emtage, Bill Heelan and Peter Deutsch, McGill University, Montreal, Canada, 1990.
4. Steve Foster and Fred Barrie, University of Nevada System Computing Services Group, 1992.
5. Rhett Jones, University of Utah Computer Centre, 1993.
6. Thinking Machines Corporation, Apple Computer, KPMG and Dow Jones Co., 1992.
7. Marc Andreessen and Eric Bina, University of Illinois at Urbana-Champaign, 1993.
8. Matthew Gray, MIT, 1993.
9. This type of progress is known as depth-first and implies that the crawler will return to the initial page various times, which places a great demand on the servers, compromising their performance. Another possible approach is breadth-first, in which the crawler follows all the links from one page and only then proceeds to the links of the following page.
10. The belief that all pages are within the reach of whoever (or whatever) follows the links continued until recently, when it was mathematically demonstrated that the directional nature of the hyperlinks in the Web implies that it is necessarily fragmented. In the process some better connected sites gain in accessibility while others can form small inaccessible clusters (Barabási, 2002, p. 167).
12. Jonathon Fletcher, University of Stirling, 1993.
13. David Eichmann, Repository Based Software Engineering Program, University of Houston, 1993.
14. Martijn Koster, NEXOR, 1993.
15. MCC Research Consortium, University of Texas, Austin, 1994.
16. Brian Pinkerton, University of Washington, 1994.
17. In the vectorial indexing model, natural language documents are represented through vectors (in this case, keywords, which function as indexing terms to which vector characteristics are applied). The system evaluates the relevance of documents by measuring their spatial relation to the keywords used in the search.
18. Michael Mauldin, Carnegie Mellon University, 1994.
19. Digital Research Laboratories, Palo Alto, California, 1995.
20. In which the content is not pulled by the user but requested by (pushed to) him or her.
21. Daniel Dreilinger, Colorado State University, 1995.
22. Eric Selberg and Oren Etzioni, University of Washington, 1995.
23. Search spam consists of configuring a site to deceive the search engine in order to obtain a better ranking.
24. Larry Page and Sergey Brin, Stanford University, 1998.
25. The PageRank algorithm was published in the article "The Anatomy of a Large-Scale Hypertextual Web Search Engine," presented at the Seventh International Conference on World Wide Web, Brisbane, Australia, 1998.
26. It must be emphasized, however, that these are not the same large groups as those involved in traditional media.
27. The deep Web is estimated to be between 500 (Cohen, 2006) and 2,000 (Bergman, 2001) times larger than the indexable Web.
28. In this chapter the terms social and collaborative are used interchangeably to denote the collective practice of creating a list of bookmarks (social or collaborative bookmarking) or tags (social or collaborative tagging).
29. Its popularization is attributed to an article by Chris Anderson (2004) published in Wired ("The Long Tail") and, more recently, a book by the same author, The Long Tail: Why the Future of Business is Selling Less of More (Hyperion, 2006).
30. Graphically represented, the many small weakly connected sites form the so-named long tail.
31. For example, Blogger was bought by Google in 2003; Del.icio.us and Flickr by Yahoo!, both in 2005.
32. For example, the French portals trouvez.com (http://www.trouvez.com) and Mozbot (http://www.mozbot.fr), as well as Swissguide (http://www.swissguide.ch), which work in partnership with Google. Another example is Cadê, a Brazilian portal (http://www.cade.com.br), which now belongs to Yahoo!. The same happens with thematic searches such as those of Civil Engineer (http://www.icivilengineer.com) and Insectclopedia (http://www.insectclopedia.com), both currently linked to Google.
