Abstract
A major concern in the implementation of a distributed Web crawler is the choice of a strategy for partitioning the Web among the nodes in the system. Our goal in selecting this strategy is to minimize the overlap between the activities of individual nodes. We propose a topic-oriented approach, in which the Web is partitioned into general subject areas with a crawler assigned to each. We examine design alternatives for a topic-oriented distributed crawler and analyze the performance of the implemented crawler. The experimental evaluation demonstrates the feasibility of the approach and addresses the issues of communication overhead and duplicate content detection.
Introduction
A crawler is a program that gathers resources from the Web. Web crawlers are widely used to gather pages for indexing by Web search engines, but they may also be used to gather information for Web data mining, for question answering, and for locating pages with specific content. A crawler maintains a pending queue of URLs that it intends to visit. At each stage of the crawling process, a URL is removed from the pending queue, the corresponding page is retrieved, URLs are extracted from this page, and some or all of these URLs are inserted back into the pending queue for future processing. For performance, crawlers often use asynchronous I/O to allow multiple pages to be downloaded simultaneously, or are structured as multithreaded programs in which each thread executes the basic steps of the crawling process concurrently with the others.
Existing System
In this section we outline the design of a topic-oriented collaborative crawler that uses a text classifier to assign pages to nodes. Given the contents of a Web page, the classifier assigns the page to one of n distinct subject categories. Each subject category is associated with a local crawler. When the classifier assigns a page to a remote node, the local crawler transfers it to that node for further processing.
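The routing step of this existing design can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the classifier is passed in as a plain function from page text to a category index, category i is assumed to be owned by node i, and the names TopicRouter and forward are hypothetical. The paper's real classifier is trained on ODP data, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.ToIntFunction;

/** Hypothetical routing step of one node in a topic-oriented collaborative crawler. */
class TopicRouter {
    private final int localNode;                       // this node's index; assumed to own category localNode
    private final ToIntFunction<String> classifier;    // page text -> category index in [0, n)
    private final BiConsumer<Integer, String> forward; // transfers a page to a remote node
    private final List<String> localPages = new ArrayList<>();

    TopicRouter(int localNode, ToIntFunction<String> classifier,
                BiConsumer<Integer, String> forward) {
        this.localNode = localNode;
        this.classifier = classifier;
        this.forward = forward;
    }

    /** Classifies a page, then keeps it locally or transfers it to the node owning its category. */
    void route(String pageText) {
        int category = classifier.applyAsInt(pageText);
        if (category == localNode) {
            localPages.add(pageText);            // processed by the local crawler
        } else {
            forward.accept(category, pageText);  // assumption: category i is owned by node i
        }
    }

    List<String> localPages() {
        return List.copyOf(localPages);
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        // Toy classifier (not ODP-trained): 0 = SPORTS if "score" appears, otherwise 1 = BUSINESS.
        TopicRouter sportsNode = new TopicRouter(0,
                text -> text.contains("score") ? 0 : 1,
                (node, text) -> sent.add(node + ":" + text));
        sportsNode.route("final score 3-1");
        sportsNode.route("quarterly earnings report");
        System.out.println(sportsNode.localPages()); // [final score 3-1]
        System.out.println(sent);                    // [1:quarterly earnings report]
    }
}
```

With the toy classifier, the sports node keeps the sports page and forwards the business page to node 1. Passing the transport in as a callback keeps the sketch independent of any particular messaging mechanism between nodes.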
A topic-oriented collaborative crawler may be viewed as a set of broad-topic focused crawlers that partition the Web between them. The breadth of the subject categories depends on the value of n. For n in the range 10-20, two of the subject categories might be BUSINESS and SPORTS. For larger n, the subject categories will be narrower, such as INVESTING, FOOTBALL and HOCKEY. The authors of the paper used the Open Directory Project (ODP) to train their system to classify pages. According to the authors, ODP is a self-regulated organization maintained by volunteer experts who categorize URLs into a hierarchical class directory.
Proposed System
In this project we implement a system of three crawlers, each associated with a particular topic that it crawls. The three topics we currently have crawlers for are Sports, Business, and Science. When started, each crawler is assigned its topic and given a start page from which to crawl that topic. When the crawler visits a page, it first checks that the page belongs to the category it is crawling. If the page is on topic, it is added to the good-links list and all of its links are added to the open list. If the page is off topic, it is added to the bad-links list and none of its links are added to the open list. Each web crawler keeps track of its own data: when a client wants information about sports, it simply asks the sports crawler. Each crawler's data can be used entirely on its own, independently of the other crawlers. If a system needs information from all the crawlers, it simply asks each crawler for its information. This design avoids a single central location for all the data, so if one crawler becomes unavailable, the data held by the other crawlers remains available to the system. We also needed to be able to classify pages. Unlike the researchers, we chose a simpler implementation instead of training our system to learn which page content belongs to which topic. We decided to simply scan each page for a set of keywords associated with each topic.
Based on the number of keyword matches on the page, we then decide whether the page matches the topic. Another idea we took from the paper is avoiding a central storage location: in our system, a client that wants information about a topic sends a message directly to the crawler that is crawling that topic.
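One crawler of the proposed system can be sketched in Java as below. The keyword set, the minMatches threshold, and the in-memory map that stands in for downloading pages are assumptions made for illustration; the report does not specify the keyword lists or the cut-off, and a real crawler would fetch pages over HTTP and extract their links. The class name TopicCrawler and its methods are hypothetical. The sketch keeps the open list, good-links list, and bad-links list described above, and answers client queries from its own data only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical single-topic crawler: keyword classifier plus open, good and bad link lists. */
class TopicCrawler {
    /** A downloaded page: its text and the URLs it links to. */
    record Page(String text, List<String> links) {}

    private final String topic;
    private final Set<String> keywords;      // lower-case topic keywords (assumed)
    private final int minMatches;            // assumed "on topic" threshold
    private final Map<String, Page> web;     // stand-in for fetching pages over HTTP
    private final Deque<String> openList = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();
    private final List<String> goodLinks = new ArrayList<>();
    private final List<String> badLinks = new ArrayList<>();

    TopicCrawler(String topic, Set<String> keywords, int minMatches, Map<String, Page> web) {
        this.topic = topic;
        this.keywords = keywords;
        this.minMatches = minMatches;
        this.web = web;
    }

    String topic() {
        return topic;
    }

    /** Counts the words in the text that are topic keywords. */
    int score(String text) {
        int matches = 0;
        for (String word : text.toLowerCase().split("\\W+")) {
            if (keywords.contains(word)) {
                matches++;
            }
        }
        return matches;
    }

    /** Crawls breadth-first from startUrl, visiting at most maxPages URLs. */
    void crawl(String startUrl, int maxPages) {
        if (seen.add(startUrl)) {
            openList.add(startUrl);
        }
        int visited = 0;
        while (!openList.isEmpty() && visited < maxPages) {
            String url = openList.poll();
            visited++;
            Page page = web.get(url);
            if (page == null) {
                continue;                    // page could not be retrieved
            }
            if (score(page.text()) >= minMatches) {
                goodLinks.add(url);          // on topic: follow its links
                for (String link : page.links()) {
                    if (seen.add(link)) {
                        openList.add(link);
                    }
                }
            } else {
                badLinks.add(url);           // off topic: its links are not followed
            }
        }
    }

    /** Clients query this crawler directly; there is no central store. */
    List<String> query() {
        return List.copyOf(goodLinks);
    }

    List<String> badLinks() {
        return List.copyOf(badLinks);
    }

    public static void main(String[] args) {
        Map<String, Page> web = Map.of(
                "a", new Page("Match score and league table", List.of("b", "c")),
                "b", new Page("Stock market earnings", List.of("d")),
                "c", new Page("Player transfer news: league and match report", List.of()),
                "d", new Page("League match score", List.of()));
        TopicCrawler sports = new TopicCrawler("Sports",
                Set.of("match", "score", "league", "player"), 2, web);
        sports.crawl("a", 10);
        System.out.println(sports.topic() + " good: " + sports.query());   // Sports good: [a, c]
        System.out.println(sports.topic() + " bad: " + sports.badLinks()); // Sports bad: [b]
    }
}
```

In main, page b is off topic, so it goes to the bad-links list and its link to d is never followed, even though d itself would match the Sports keywords. This is the trade-off of pruning at off-topic pages: it keeps each crawler focused, at the cost of missing on-topic pages reachable only through off-topic ones.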
INTRODUCTION
The purpose of this section is to provide the reader with general background information about the Web crawling software.
Purpose
This document is the Software Requirement Specification (SRS) for the Topic Oriented Distributed Web Crawler. This SRS describes the functional and performance requirements of the Web Crawler. Initially, the search engine crawls and indexes web pages through the command prompt and displays the results in Java applets.
Scope
Within a four-month time constraint, we have studied Web crawlers and worked on their design, implementation, and the integration of modules (the system consists of three main modules). To gain insight into how existing crawlers work, we made a comparative study of the features that several engines offer. We also surveyed the existing crawlers that work in the background of various search engines, to understand what additional capabilities are expected from current crawling applications. The planning and requirement-gathering stages form the basis for further analysis and design; hence they have been allotted a period of 2 months.
INTENDED AUDIENCE AND READING SUGGESTIONS
This document is meant for users, developers, project managers, testers, and documentation writers. The SRS aims to explain, in a simple manner, the basic idea behind the BATSS Search Engine and how the developers aim to achieve their goals. It also introduces users to the main features of the BATSS Search Engine and what makes it different from other search engines such as Google and Yahoo! To gain insight into how existing search engines work, we studied the features that several engines offer, and we surveyed existing search engines to understand what additional capabilities are expected from current search engines.
SOFTWARE AND HARDWARE REQUIREMENTS
Software Requirements
Operating System : Any 32-bit OS
Platform : Windows
Language : Java
Technology : Swing
Hardware Requirements
Processor : Pentium 4
RAM : 512 MB
Hard Disk : 256 MB
Monitor : VGA Color (256)