
CERTIFICATE

This is to certify that the project entitled Search Engine, submitted by
Shivam Kainth, Himank Chandra and Dheeraj Sonkhla, students of
University Institute of Information Technology, as a part of the course
curriculum for B.Tech (I.T.), HPU Shimla, is a record of the students' own
study carried out under supervision and guidance. This report has not been
submitted to any other university or institution for the award of any
degree. The students, bearing University Roll Nos. 61543, 61545 and 61529
respectively, have satisfactorily completed their project in partial
fulfillment of the B.Tech course in Information Technology (I.T.),
7th semester. During this period they have shown exemplary dedication
and discipline towards the work assigned to them.

Guided By:

Assistant Prof: Balvir Thakur

ACKNOWLEDGEMENT

No task is a single man's effort. Various factors, situations and persons
integrate to provide the background for the accomplishment of a task. Behind
this work lie the kind help, assistance and valuable advice of many
persons to whom I will remain indebted. This report can never be claimed as
my individual effort.

I am privileged to express my deep gratitude to those persons whose
co-operation and suggestions helped me a lot to complete this project work.
This acknowledgement is not a mere formality; to me it is an opportunity to
show a deep sense of gratitude and obligation to all the people who provided
me with inspiration, guidance and help during the preparation of the project.
Although it is not possible to thank every member individually, I am
grateful to all who helped me during my training period.

CONTENTS
1. Introduction

2. Requirements

2.1 System Specifications

3. Design Approach

3.1.1 Flow chart

3.1.2 DFD

4. Project Modules

5. Implementation

5.1.1 Concepts

5.1.2 Techniques

6. Output Screens

7. Conclusion

8. Future Enhancements

9. Bibliography

ABSTRACT

AIM

The aim of our project was to explore new avenues in the field of search
engines. This work uses PHP and MySQL to create a replica of a search engine.

SCOPE

This project aspires to be a simulation of a search engine; if coupled with
a website, it can turn out to be a full-fledged working module.

FEATURES

o Client-server model based on MySQL.
o Intermediate administrator for indexing.
o Client side is platform independent.

MODULES

This project consists of three modules.

1) Server Side storage

2) Intermediate Administrator

3) Client Side Web Browser

Modules Description:

1) Server

The following are the server-side functionalities:

o Save query logs
o Find latest searches

2) Intermediate Administrator

o Privilege to add sites
o Can check database status
o Index additional sites

3) Client

The following are the client-side functionalities:

o Search keywords
o Select the category to search in
o Visit external links
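
The two server-side functions named above (saving query logs and finding the latest searches) can be sketched as follows. This is only an illustration, written in Python with the standard-library sqlite3 module standing in for the project's PHP and MySQL. The table name query_log and the function names are assumptions made for this example, not taken from the project.

```python
import sqlite3

def open_log(path=":memory:"):
    """Open a database and create the (hypothetical) query_log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS query_log ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " query TEXT NOT NULL,"
        " results INTEGER NOT NULL)"
    )
    return conn

def save_query(conn, query, results):
    """Record one search together with the number of results it returned."""
    conn.execute(
        "INSERT INTO query_log (query, results) VALUES (?, ?)",
        (query.strip().lower(), results),
    )
    conn.commit()

def latest_searches(conn, limit=5):
    """Return the most recent distinct queries, newest first."""
    rows = conn.execute(
        "SELECT query FROM query_log"
        " GROUP BY query ORDER BY MAX(id) DESC LIMIT ?",
        (limit,),
    )
    return [row[0] for row in rows]
```

Queries are normalised to lower case before they are stored, so that "PHP tutorial" and "php tutorial" count as the same search when the latest searches are listed.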

1. INTRODUCTION

1.1 PURPOSE
The purpose of the project was to explore new avenues in web development,
such as PHP and MySQL, along with traditional concepts like HTML5 and
CSS3. WampServer is a Windows web development environment. It
allows you to create web applications with Apache2, PHP and a MySQL
database. It also comes with PHPMyAdmin and SQLiteManager to easily
manage your databases. WAMP is sometimes used as an abbreviated
name for the software stack Windows, Apache, MySQL, PHP.

1.2 SCOPE

This project aspires to be a simulation of a search engine.

o It can be used by any webmaster to index their website.
o The system was developed exclusively for offline searches, yet it can be
integrated with an online website.
o It can index both static and dynamic pages.

1.3 OVERVIEW

The system includes an automated crawler, which can follow links found on a
site, and an indexer, which builds an index of all the search terms found in
the pages. It is written in PHP and uses MySQL as its back-end database.
It also offers:

o Support for AND, OR and phrase searches
o An option to add and group sites into categories
o The possibility to limit searching to a given category and its subcategories
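
To make the idea of an index with AND, OR and phrase searches concrete, here is a minimal sketch in Python rather than the project's PHP. It keeps the index in memory instead of MySQL, and all function names (tokenize, build_index, search_and, search_or, search_phrase) are hypothetical. The index records the position of every word so that a phrase search can check that the words are adjacent.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to {url: [positions]} for a dict of url -> page text."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(url, []).append(pos)
    return index

def search_and(index, query):
    """URLs containing every word of the query."""
    sets = [set(index.get(w, {})) for w in tokenize(query)]
    return set.intersection(*sets) if sets else set()

def search_or(index, query):
    """URLs containing at least one word of the query."""
    return set().union(*(index.get(w, {}) for w in tokenize(query)))

def search_phrase(index, query):
    """URLs where the query words occur next to each other, in order."""
    words = tokenize(query)
    hits = set()
    for url in search_and(index, query):
        starts = index[words[0]][url]
        if any(all(p + i in index[w][url] for i, w in enumerate(words))
               for p in starts):
            hits.add(url)
    return hits
```

A phrase search first narrows the candidates with the AND search and only then compares word positions, so pages that lack any of the words are never inspected.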

1.4 ABOUT PHP AND WHAT PHP CAN DO

PHP (recursive acronym for PHP: Hypertext Preprocessor) is a widely-used
open source general-purpose scripting language that is especially
suited for web development and can be embedded into HTML. The PHP
code is enclosed in special start and end processing
instructions <?php and ?> that allow you to jump into and out of "PHP
mode."

What distinguishes PHP from something like client-side JavaScript is that
the code is executed on the server, generating HTML which is then sent to
the client. The client would receive the results of running that script, but
would not know what the underlying code was. You can even configure
your web server to process all your HTML files with PHP, and then there's
really no way that users can tell what you have up your sleeve.

PHP is mainly focused on server-side scripting, so you can do anything any
other CGI program can do, such as collect form data, generate dynamic
page content, or send and receive cookies. But PHP can do much more.

PHP CAN BE USED TO CREATE THREE TYPES OF PROGRAMS:

o Server-side scripting. This is the most traditional and main target field for
PHP. You need three things to make this work: the PHP parser (CGI or
server module), a web server and a web browser. You need to run the web
server, with a connected PHP installation. You can access the PHP
program output with a web browser, viewing the PHP page through the
server. All these can run on your home machine if you are just
experimenting with PHP programming. See the installation
instructions section of the PHP manual for more information.

o Command line scripting. You can make a PHP script to run it without any
server or browser. You only need the PHP parser to use it this way. This
type of usage is ideal for scripts regularly executed using cron (on *nix or
Linux) or Task Scheduler (on Windows). These scripts can also be used
for simple text processing tasks. See the section on command line usage
in the PHP manual for more information.
o Writing desktop applications. PHP is probably not the very best language
to create a desktop application with a graphical user interface, but if you
know PHP very well, and would like to use some advanced PHP features
in your client-side applications you can also use PHP-GTK to write such
programs. You also have the ability to write cross-platform applications
this way. PHP-GTK is an extension to PHP, not available in the main
distribution.

FEATURES OF PHP

Development of PHP webpages

In order to develop and run PHP web pages, three vital components need
to be installed on your computer system: a web server, a database and the
PHP parser.

Web Server: PHP will work with virtually all web server software,
including Microsoft's Internet Information Server (IIS), but the most
often used is the freely available Apache server.

MySQL
MySQL is a fast, easy to use relational database. It is currently the most
popular open-source database. It is very commonly used in conjunction
with PHP scripts to create powerful and dynamic server-side applications.

MySQL is an open source database product that was created by MySQL
AB, a company founded in 1995 in Sweden. In 2008, MySQL AB
announced that it had agreed to be acquired by Sun Microsystems for
approximately $1 billion.

MySQL Features
o Relational Database Management System (RDBMS): MySQL is a
relational database management system.
o Easy to use: MySQL is easy to use. You need only a basic
knowledge of SQL. You can build and interact with MySQL with only a
few simple SQL statements.
o It is secure: MySQL consists of a solid data security layer that protects
sensitive data from intruders. Passwords are encrypted in MySQL.
o Client/server architecture: MySQL follows a client/server
architecture. There is a database server (MySQL) and arbitrarily many
clients (application programs), which communicate with the server; that
is, they query data, save changes, etc.
o Free to download: MySQL is free to use and you can download it from
the official MySQL website.
o It is scalable: MySQL can handle almost any amount of data, up to as
much as 50 million rows or more. The default file size limit is about 4 GB.
However, you can increase this number to a theoretical limit of 8 TB of
data.
o Compatible with many operating systems: MySQL runs
on many operating systems, such as Novell NetWare, Windows, Linux,
many varieties of UNIX (such as Sun Solaris, AIX and DEC UNIX),
OS/2, FreeBSD and others. MySQL clients can run on the same computer
as the server or on another computer
(communicating via a local network or the Internet).

WAMP SERVER

WAMP is sometimes used as an abbreviated name for the software stack
Windows, Apache, MySQL, PHP. It is derived from LAMP, which
stands for Linux, Apache, MySQL and PHP. As the name implies, while
LAMP is used on Linux servers, WAMP is used on Windows servers.
Because WordPress isn't usually installed on Windows servers, WAMP
has become popular among developers as a method of installing
WordPress on their personal computers.

The A in WAMP stands for Apache. Apache is server software that is
used to serve webpages. Whenever someone types in your WordPress
website's URL, Apache is the software that serves your WordPress
site.

The M in WAMP stands for MySQL. MySQL is a database
management system. Its job in the software stack is to store all of your
website's content, user profiles, comments, etc.

The P in WAMP stands for PHP. PHP is the programming language
that WordPress is written in. It is also the piece that holds the entire
software stack together. It runs as a process in Apache and
communicates with the MySQL database to dynamically build your
webpages.

The WAMP software stack can be downloaded from the WampServer
project's download page. For Microsoft Windows users, it comes in an
easy installation package with a control panel.

JavaScript
JavaScript is an object-based, lightweight scripting language.
It was first implemented by Netscape (with help from Sun Microsystems).
JavaScript was created by Brendan Eich at Netscape in 1995 for the
purpose of allowing code in web pages (performing logical operations on
the client side).

Where is it used?

It is used to create interactive websites. It is mainly used for:

o Client-side validation
o Dynamic drop-down menus
o Displaying date and time
o Building small but complete client-side programs

Features of JavaScript

o JavaScript is an object-based scripting language; it provides predefined objects.
o It gives the user more control over the browser.
o It handles dates and times.
o It can detect the user's browser and OS.
o It is lightweight.
o JavaScript is a scripting language and it is not Java.
o JavaScript is an interpreted scripting language.
o JavaScript is case sensitive.
o Statements in JavaScript are terminated with a semicolon (;). Automatic
semicolon insertion makes it optional in most cases, but writing it
explicitly is recommended.
o The syntax of most JavaScript control statements is the same as the syntax
of control statements in the C language.

An important part of JavaScript is the ability to create new functions
within scripts. A function is declared in JavaScript using the function
keyword.

Database: PHP will work with virtually all database software, including
Oracle and Sybase, but the most commonly used is the freely available MySQL
database.

PHP Parser: In order to process PHP script instructions, a parser must
be installed to generate HTML output that can be sent to the web
browser.

2. SYSTEM REQUIREMENTS
2.1 System Specifications
System specifications are a very important aspect of project work. To
ensure that the project works properly, a developer must be aware of the
minimum requirements that a computer system has to fulfil in terms of
both hardware and software.

We determined the following minimum system requirements for our project
to work efficiently.

HARDWARE REQUIREMENTS:

o Intel Atom processor or above
o RAM 256 MB or above

SOFTWARE REQUIREMENTS:

o WampServer (for offline use)
o Windows XP SP3 or above
o Compatible web browser (Firefox, Chrome, IE)
o Sublime Text editor
o MySQL 5.5

3. Design Approach

Design is the first step in the development phase. It applies various
techniques and principles for the purpose of defining a device, a process
or a system in sufficient detail to permit its physical realization.

Once the software requirements have been analyzed and specified,
software development involves the technical activities of design, coding,
implementation and testing that are required to build and verify the
software.

The design activities are of main importance in this phase, because in
this activity decisions ultimately affecting the success of the software
implementation and its ease of maintenance are made. These decisions
have the final bearing upon the reliability and maintainability of the system.
Design is the only way to accurately translate the customer's requirements
into finished software.

Design is the place where quality is fostered in development.
Software design is the process through which requirements are translated
into a representation of software. Software design is conducted in two
steps. The first, preliminary design, is concerned with the transformation
of requirements into data.

DFD:

A data flow diagram (DFD) is a graphical representation of the "flow"
of data through an information system, modelling its process aspects. A
DFD is often used as a preliminary step to create an overview of the
system without going into great detail, which can later be elaborated.
DFDs can also be used for the visualization of data processing (structured
design).

A DFD shows what kind of information will be input to and output from
the system, how the data will advance through the system, and where the
data will be stored. It does not show information about the timing of
processes or about whether processes will operate in sequence
or in parallel, unlike a flowchart, which also shows this information.

The following is the level 1 DFD created for the project. It shows the
basic working of our project (Fig 3.1): an input unit, a storage unit and
an output results unit.

Fig: 3.1 Level 1 DFD

The following DFD shows the flow of the controls available to an
administrator.

Using these controls the admin can manage various settings related to
the search engine, e.g. adding new sites for indexing, configuring the
keywords and setting the depth to which the crawler crawls a website.

Fig: 3.2
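
The crawl depth that the administrator controls can be illustrated with a depth-limited, breadth-first crawl. The sketch below is written in Python, not the project's PHP, and is only an assumption about how such a limit might work. The names LinkParser and crawl are hypothetical, and the page download is left to a caller-supplied fetch function so that the example stays self-contained.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch, max_depth):
    """Breadth-first crawl of one site, following links up to max_depth.

    fetch is a caller-supplied function url -> HTML text (or None).
    Returns {url: depth} for every page that was fetched.
    """
    host = urlparse(start_url).netloc
    depth = {start_url: 0}
    queue = deque([start_url])
    fetched = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        fetched[url] = depth[url]
        if depth[url] >= max_depth:
            continue  # deepest allowed level: index it, but follow no links
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]
            # Stay on the same host and skip pages already queued.
            if urlparse(link).netloc == host and link not in depth:
                depth[link] = depth[url] + 1
                queue.append(link)
    return fetched
```

With max_depth set to 0 only the start page is fetched; each increment lets the crawler follow links one level further from it.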

4. Project Modules

4.1 Admin Module

The admin module consists of different functions. The admin has full control
over the data that is stored in the database.
The admin can decide which website to index and to what depth it is
indexed. The admin can also check the database logs, re-index a website
and optimize tables, and can keep track of the total number of
sites in the database and the number of keywords indexed.
The following are some of the modules used in our project
development.

Module 1:

Module 2:

Module 3:

This module is the login page for the Admin.

Module 4:

Module 5:

4.2 Server Module
The following are some of the server-side modules used in our project
work:

Module 1:
This module deals with the creation of tables. In this module we created
various tables, including tables for storing keywords and for the
sites added to the project.
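
As an illustration of the kind of tables this module creates, the following Python sketch uses the standard-library sqlite3 module as a stand-in for MySQL. The table and column names (sites, keywords, site_keyword, weight) and the helper functions are assumptions made for this example; the report does not list the project's actual schema.

```python
import sqlite3
from collections import Counter

# Hypothetical schema: one table for sites, one for keywords, and a link
# table recording how often each keyword occurs on each site.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sites (
    site_id INTEGER PRIMARY KEY,
    url     TEXT NOT NULL UNIQUE,
    title   TEXT
);
CREATE TABLE IF NOT EXISTS keywords (
    keyword_id INTEGER PRIMARY KEY,
    keyword    TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS site_keyword (
    site_id    INTEGER NOT NULL REFERENCES sites(site_id),
    keyword_id INTEGER NOT NULL REFERENCES keywords(keyword_id),
    weight     INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (site_id, keyword_id)
);
"""

def create_tables(conn):
    conn.executescript(SCHEMA)

def add_site(conn, url, title, words):
    """Store a site and the number of times each keyword occurs in it."""
    conn.execute(
        "INSERT OR IGNORE INTO sites (url, title) VALUES (?, ?)", (url, title))
    site_id = conn.execute(
        "SELECT site_id FROM sites WHERE url = ?", (url,)).fetchone()[0]
    for word, count in Counter(words).items():
        conn.execute(
            "INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (word,))
        keyword_id = conn.execute(
            "SELECT keyword_id FROM keywords WHERE keyword = ?",
            (word,)).fetchone()[0]
        # Re-indexing a site overwrites the stored count for that keyword.
        conn.execute(
            "INSERT OR REPLACE INTO site_keyword (site_id, keyword_id, weight)"
            " VALUES (?, ?, ?)", (site_id, keyword_id, count))
    conn.commit()
    return site_id

def status(conn):
    """Counts of the kind an administrator might check as database status."""
    sites = conn.execute("SELECT COUNT(*) FROM sites").fetchone()[0]
    keywords = conn.execute("SELECT COUNT(*) FROM keywords").fetchone()[0]
    return {"sites": sites, "keywords": keywords}
```

Keeping keywords in their own table means each word is stored once, however many sites contain it, and the link table carries the per-site occurrence count.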

4.3 Search Module
This module is where we select the various aspects of our search. As an
option, we can limit the search to a particular domain, or we
can search across all the indexed sites together.
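
Limiting a search to a particular domain can be sketched as a filter over the result URLs. The example below is in Python, not the project's PHP, and the function name limit_to_domain is hypothetical. It is one possible way to apply such a restriction, not a description of the project's code.

```python
from urllib.parse import urlparse

def limit_to_domain(results, domain=None):
    """Keep only result URLs on the given domain (or its subdomains).

    With domain=None the results from all sites are returned unchanged.
    """
    if not domain:
        return list(results)
    domain = domain.lower()
    kept = []
    for url in results:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            kept.append(url)
    return kept
```

The check compares whole host labels, so a search limited to example.test keeps docs.example.test but not notexample.test.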

This module is the CSS of our search page. It is used to enhance the UI
of our search engine. Each link is displayed as a row in a table, so it
is quite easy to manipulate the elements later.
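
Displaying each link as a table row can be sketched as below. The example is in Python rather than the project's PHP, and the function name render_results and the CSS class name results are assumptions for illustration. Titles and URLs are escaped so that characters such as & do not break the generated HTML.

```python
from html import escape

def render_results(results):
    """Build an HTML table with one row per (title, url) search result."""
    rows = []
    for title, url in results:
        rows.append(
            '<tr><td><a href="{0}">{1}</a></td></tr>'.format(
                escape(url, quote=True), escape(title)
            )
        )
    return '<table class="results">\n' + "\n".join(rows) + "\n</table>"
```

Because every result sits in its own row, a stylesheet can target the table by its class and style or rearrange the rows without changes to the code that produces them.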

5. Implementation

Implementation is the stage where the theoretical design is turned into a
working system. It is the most crucial stage in achieving a successful new
system and in giving the users confidence that the new system
will work efficiently and effectively.

The system can be implemented only after it is found to work according
to the specification.

Implementation involves careful planning, investigation of the current
system and its constraints on implementation, design of methods to achieve
the changeover and an evaluation of changeover methods. Apart from
planning, the major tasks in preparing for the implementation are the
education and training of the users.

The more complex the system being implemented, the more involved will
be the system analysis and design effort.

The implementation phase comprises several activities. The required
hardware and software are acquired. The system may require some
software to be developed; for this, programs are written, tested and put
to use. The user then changes over to the new, fully tested system and the
old system is discontinued.

Online implementation:

To implement this search engine online, we need a web server to host
the application.

Different providers can be used for hosting services, e.g. HostGator,
Amazon Web Services (AWS), etc.

The hosting provider gives the developer the control panel
credentials. With these credentials the developer gets
administrative control. From the control panel one can manage the database
of the search engine and can also manage the homepage through FTP
services.

End users are provided with a web URL and they can enter their query in
the search box provided.

6. Output Screens
Output 1:

Admin login screen.

Output 2:

Admin Control panel screen.

Fig: Adding website for indexing and controlling depth setting.

Fig: Checking the status of the database (keywords, links, sites etc.)

Fig: Occurrence of the indexed keywords

Fig: Tables in the database.

Fig: View of the keywords in the table

Fig: Indexed websites in the database.

Fig: Database

Fig: Homepage of the Search Engine

Fig: Search results

Fig: Search results

Fig: Search results

7. Conclusions
The package was designed in such a way that future modifications can be
made easily. The following conclusions can be drawn from the
development of the project:

o It provides a friendly graphical user interface, which proves to be better
when compared to the existing system.
o Adding new websites to the database is very easy.
o Updating the data in the database is easy.
o The system can easily adopt new features in the future as new
needs arise.

8. Future Enhancements

My team members and I have worked hard to present our project
at its best. Still, we know that our project can be done in a
better way and can be enhanced by adding new features such as:

o Image search
o Categorization
o More administrative controls, such as settings
o Storing the user's likes and dislikes in the database on the basis of
his/her search history
o Many more new features as time passes

9. Bibliography
Websites referred:

www.w3schools.com
www.javatpoint.com
www.tutorialspoint.com
