
Search engine basics

A search engine comprises three basic parts:

1. The spider (also called a robot or crawler) is software that visits sites on the Internet (each search engine does this differently). The spider reads what is there, follows links at the site, and ultimately brings that data back.
2. The search engine index (catalog or database), where everything the spider found is stored.
3. The search engine software, which sifts through everything in the index to find matches and then ranks or sorts them into a list of results, or hits.

Important points to consider about search engines

Spiders are programmed to return to websites on a regular basis, but the revisit interval varies widely from engine to engine. When you use a search engine, you are searching its index or database, not the web pages themselves. This is important to remember because no search engine operates in real time: results reflect pages as they were when the spider last visited them.
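The three parts can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real engine is built: the "web" is a hypothetical in-memory dictionary rather than pages fetched over HTTP, and the ranking step simply counts query-word occurrences. It also shows why no search engine operates in real time: `search()` only consults the snapshot the spider brought back, so a page that changes after the crawl keeps its old contents in the index until the spider returns.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real spider would fetch these pages over HTTP instead.
WEB = {
    "a.example": ("search engines use spiders", ["b.example", "c.example"]),
    "b.example": ("the index stores what the spider found", ["a.example"]),
    "c.example": ("ranking sorts matches into hits", []),
}


def crawl(web, seed, limit=100):
    """Part 1, the spider: visit pages breadth-first from `seed`,
    follow their links, and bring back a snapshot of each page's text."""
    seen, queue, snapshot = {seed}, deque([seed]), {}
    while queue and len(snapshot) < limit:
        url = queue.popleft()
        if url not in web:  # dead link: nothing to bring back
            continue
        text, links = web[url]
        snapshot[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return snapshot


def build_index(snapshot):
    """Part 2, the index: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in snapshot.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, snapshot, query):
    """Part 3, the search software: find pages containing every query word,
    then rank them (here, by how often the query words occur on the page)."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))

    def score(url):
        tokens = snapshot[url].lower().split()
        return sum(tokens.count(w) for w in words)

    return sorted(matches, key=lambda url: (-score(url), url))


if __name__ == "__main__":
    web = dict(WEB)  # work on a copy so WEB itself is left unchanged
    snapshot = crawl(web, "a.example")
    index = build_index(snapshot)
    print(search(index, snapshot, "spider"))   # ['b.example']
    # The live page changes after the crawl...
    web["c.example"] = ("now this page mentions spiders too", [])
    # ...but searches still run against the stale index.
    print(search(index, snapshot, "spiders"))  # ['a.example']
```

The last two lines of the demo are the point: the live page now mentions "spiders", but until the spider revisits it, the index does not know that.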

When searching, note the following:

1. Use more than one search engine.
2. Weigh the validity, accuracy, currency, and overall quality of the information before using it.
3. Don't use Boolean queries unless you know exactly what you are looking for and are very comfortable with that search engine's Boolean rules (no, they are not all the same).
4. Be aware that search engines give more weight to popular and/or pay-for-placement web pages.
5. Learn the search syntax of the search engine you use (never assume). Most search engines use double quotes (" ") to enclose a phrase, and the plus (+) and minus (-) signs to mark terms that must be included and must be excluded, respectively. But these are by no means universal rules (e.g., when using international or metasearch engines). The default operator for all major US search engines is now AND; as of February 2002, no major search engine used OR as its default operator. However, most search engines will let you use OR in the simple search box: Yahoo and Google permit OR searches there, but you must capitalize the OR.
6. Keep in mind that because HTML does not have a "date" tag, "date" can mean many things: the creation date, the last-modified date of the page, or the date the search engine found the page. I do not recommend searching by date except when using weblog, news, or newsgroup search engines.
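The syntax conventions in these tips (quoted phrases, + and -, an implicit AND, and a capitalized OR) can be made concrete with a small query parser. This is a hypothetical sketch over a few made-up documents, following only the conventions described here; as the tips warn, real engines differ in the details, so treat it as an illustration rather than any engine's actual behavior.

```python
import re

# One token: an optionally signed "quoted phrase", or an optionally signed word.
TOKEN = re.compile(r'([+-]?)"([^"]+)"|([+-]?)(\S+)')


def words(text):
    """Lowercase a string and split it into word tokens."""
    return tuple(re.findall(r"\w+", text.lower()))


def parse(query):
    """Turn a query into (required, excluded).

    required: a list of OR-groups; every group must match (AND is the default).
    excluded: phrases introduced with '-', which must not appear.
    """
    required, excluded, pending_or = [], [], False
    for m in TOKEN.finditer(query):
        if m.group(2) is not None:           # a quoted phrase
            sign, term = m.group(1), m.group(2)
        else:                                # a single word
            sign, term = m.group(3), m.group(4)
        if sign == "" and term == "OR":      # only a capitalized OR is an operator
            pending_or = bool(required)
            continue
        phrase = words(term)
        if not phrase:
            continue
        if sign == "-":
            excluded.append(phrase)
        elif pending_or and sign == "":
            required[-1].append(phrase)      # join the previous group: A OR B
        else:
            required.append([phrase])        # '+' or plain term: must match
        pending_or = False
    return required, excluded


def contains(tokens, phrase):
    """True if `phrase` occurs as consecutive words in `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def search(docs, query):
    """Return the ids of documents that satisfy the parsed query."""
    required, excluded = parse(query)
    hits = []
    for doc_id, text in docs.items():
        tokens = words(text)
        matched = all(any(contains(tokens, p) for p in group) for group in required)
        blocked = any(contains(tokens, p) for p in excluded)
        if matched and not blocked:
            hits.append(doc_id)
    return sorted(hits)
```

Because AND is the default here, a leading + changes nothing in this sketch; it would matter on an engine whose default operator is OR. A lowercase "or" is treated as an ordinary search word, which is why the capitalization rule matters.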

TYPES OF SEARCH TOOLS

There are many types of search tools that you can use to locate information on the World Wide Web. Various search tools are developed by different companies and have different search features and techniques. They search different and overlapping parts of the WWW; no single search tool searches ALL of the websites (there are millions of websites, and more are added every day). The search tools include:

1. Search engines
2. Web subject directories
3. Metasearch engines
4. The Invisible Web (also known as the "Deep Web")

Search engines

The search engine interface allows you to search for certain words or phrases found on web pages.

Advantages: Search engines contain millions of web pages. You retrieve results that match the word(s) you are looking for. They can be useful for searching for unique or specific topics.

Disadvantages: Depending on the search engine, you may get thousands or millions of results. Many of the results may not be exactly what you are looking for, especially if you are using broad or common terms.

Examples: AltaVista (http://www.altavista.com), Lycos (http://www.lycos.com), Google (http://www.google.com)

Web Directories

Often confused with search engines, Web Directories (also called Subject Directories) are lists of websites organized into numerous subject categories and sub-categories. Users click on a topic of interest and then browse through the list of resources in that category, much like using a card catalogue in a library. Directories provide a more focused and organized approach to locating resources. People (often librarians or subject specialists) select, organize, annotate, and often evaluate (especially the Academic & Professional Directories) the resources included in these tools. Directories can be browsed by subject or searched by keyword.

Advantages: There may be a higher degree of accuracy when using web directories for researching broad subjects or topics.

Disadvantages: Directories usually contain fewer websites than a search engine. Web directories may not be as useful as search engines for researching specific or obscure topics.

Examples: Yahoo Directory (www.dir.yahoo.com), Google Directory (www.directory.google.com)

Metasearch Engines

Metasearch engines are similar to search engines but are used to search more than one search engine at a time. Some metasearch engines will also show you a small number of the "best" websites from each search engine, based on criteria established by the metasearch engine.

Advantages: You can search several search engines at one time.

Disadvantages: You may retrieve inappropriate websites, depending on how each individual search engine interprets the search.

Examples: Clust (www.clust.com), Jux2 (www.jux2.com), Dogpile (www.dogpile.com)

Note: Jux2 lets users query three search engines (Google, Yahoo, and Live Search/MSN) and then shows:

1. The best results from all three search engines, and the total hits for each
2. What only Google found, and what is missing from Google
3. What only Yahoo found, and what is missing from Yahoo
4. What only Live/MSN found, and what is missing from Live/MSN
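A metasearch engine like those described above has to do two things with the result lists it gets back: merge them into one ranking, and compare them. How Jux2 or Dogpile actually chose their "best" results is not described here, so the sketch below swaps in reciprocal rank fusion (RRF), a common and simple merging method in which each engine adds 1/(k + rank) to a URL's score, with k = 60 as the conventional default. The engines' result lists are made up, "hits" is just the length of each list, and "missing from Google" is taken to mean URLs that another engine found but Google did not.

```python
def fuse(result_lists, k=60):
    """Merge ranked lists with reciprocal rank fusion (RRF).

    Each engine adds 1 / (k + rank) to the score of every URL it returns,
    so URLs ranked highly by several engines rise to the top.
    """
    scores = {}
    for results in result_lists.values():
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


def compare(result_lists):
    """For each engine: result count, URLs only it found, and URLs it missed."""
    found = {engine: set(urls) for engine, urls in result_lists.items()}
    report = {}
    for engine, mine in found.items():
        others = set().union(*(s for e, s in found.items() if e != engine))
        report[engine] = {
            "hits": len(result_lists[engine]),
            "only": sorted(mine - others),
            "missing": sorted(others - mine),
        }
    return report


if __name__ == "__main__":
    # Hypothetical result lists (best result first) for one query.
    results = {
        "Google": ["a.example", "b.example", "c.example"],
        "Yahoo": ["b.example", "a.example", "d.example"],
        "Live": ["b.example", "e.example"],
    }
    print(fuse(results))
    for engine, row in compare(results).items():
        print(engine, row)
```

RRF only needs each engine's ordering, not its internal scores, which suits a metasearch engine that cannot see how each engine ranked its pages.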

Invisible Web

The Invisible Web consists of websites that are hidden from use by the general public. Also known as the Deep Web, it includes specialized databases and directories.

Advantages: Can be useful for specific topics or unique terms.

Disadvantages: The Invisible Web will not be located by conventional search engines and directories. You must know the URL or search using a search tool specifically created for searching parts of the Invisible Web.

Examples: Complete Planet (http://www.completeplanet.com)

GOOGLE SEARCH ENGINE

Google first gained fame and widespread use because of its single-minded focus on search, exemplified by its "clean" interface, and its PageRank "weighted link popularity." In simple terms, Google gives each webpage a rank based on the number of other pages linking to it and the "importance" of those pages, where a page's importance is itself derived from the links pointing to it. While PageRank is imperfect, it works better than most other approaches to ranking search results and, indeed, is one of the primary reasons for Google's success. Some of Google's features that helped to create this very successful and powerful search tool are:

1. Cached versions of webpages: Google was the first search engine to offer this option, which lets users peek into its vast database.
2. Automatic conversion of non-HTML file types to HTML: Google was not the first to do this, but it has certainly been the most successful.
3. Backlinks (the link: syntax): unfortunately, Google now limits the number of backlinks it shows, greatly reducing the usefulness of this option.
4. Larger limits on the size of indexed pages: Google seems to have increased these limits. I found an indexed PDF document over 764K, a text file over 1000K, and a webpage over 366K. Very few webpages are larger than 500K. Google does not offer HTML versions of very large PDF or Word documents.
5. Continuous index updates: Google refreshes its index continuously, not on a schedule (this is a good thing).
6. Database size: Google stopped advertising the size of its database in 2005, but it is one of the largest, if not the largest, search databases.
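The "weighted link popularity" idea behind PageRank, described at the start of this section, can be sketched with the textbook power-iteration formulation: each page's score is a small baseline share plus a damped portion of the scores of the pages linking to it, each divided by how many links that page has. The damping factor 0.85 comes from the original PageRank paper; spreading the score of pages with no outgoing links evenly over all pages is a common convention, not something this section specifies. Google's production ranking uses many more signals and is not public, so this only illustrates the idea.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=200):
    """Score pages by weighted link popularity using power iteration.

    links maps each page to the pages it links to. Apart from pages with
    no outgoing links (handled below), each round computes
    PR(p) = (1 - d) / N + d * sum(PR(q) / L(q) for every page q linking to p),
    where L(q) is the number of distinct pages q links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    if n == 0:
        return {}
    outgoing = {p: set(links.get(p, ())) for p in pages}
    rank = {p: 1.0 / n for p in pages}  # start from a uniform score
    for _ in range(max_iter):
        # Pages with no outgoing links spread their score evenly over all pages.
        dangling = sum(rank[p] for p in pages if not outgoing[p])
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its damped score among the pages it links to.
        for page in pages:
            for target in outgoing[page]:
                new[target] += damping * rank[page] / len(outgoing[page])
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical three-page web: A and B both link to C; C links back to A.
    for page, score in sorted(pagerank({"A": ["C"], "B": ["C"], "C": ["A"]}).items()):
        print(page, round(score, 4))
```

In this small example C scores highest because two pages link to it, and A outranks B even though each has at most one inbound link: A's single link comes from the highly ranked C, while nothing links to B. That is the "importance of the linking pages" effect in action.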

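Google's backlinks feature mentioned above (the link: syntax) answers the question "which pages link to this one?" Because a crawler records each page's outgoing links, answering it amounts to inverting the link graph. A minimal sketch, assuming a crawled page-to-links map; the `limit` parameter is hypothetical and stands in for the cap Google places on how many backlinks it shows, and ignoring a page's links to itself is a design choice of this sketch.

```python
from collections import defaultdict


def backlinks(links, limit=None):
    """Invert a page -> outgoing-links map into page -> pages linking to it.

    `limit` (hypothetical) caps how many backlinks are reported per page;
    None reports them all. Links from a page to itself are ignored.
    """
    incoming = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            if target != source:
                incoming[target].add(source)
    return {page: sorted(sources)[:limit] for page, sources in incoming.items()}


if __name__ == "__main__":
    # Hypothetical crawl results: page -> pages it links to.
    crawled = {
        "a.example": ["c.example"],
        "b.example": ["c.example", "a.example"],
        "c.example": ["c.example"],
    }
    print(backlinks(crawled))
    print(backlinks(crawled, limit=1))
```

Capping the list, as the second call does, shows why a limited backlink report is less useful: the full set of linking pages still exists in the index, but only part of it is shown.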