
PROJECT REPORT

ON

Development of Application to Enhance The Web Page
Search Results

Submitted for partial fulfillment of the degree of

BACHELOR OF ENGINEERING

(Computer Science & Engineering)

By
Khushbu Wandhe
Komal Sahare
Monica Pardhi
Rashmeet Sabharwal

VIII Semester B.E. CSE
Department of Computer Science & Engineering

Under the Guidance of
Prof. N.M. Nirkhi
Department of Computer Science & Engineering

Department of Computer Science & Engineering
G.H.Raisoni College of Engineering, Nagpur.
(An Autonomous Institution Under UGC Act 1956)
2011-2012






CERTIFICATE

This is to certify that the dissertation entitled

Development of Application to Enhance the Web Page Search Results

is a bona fide work submitted to the Rashtrasant Tukdoji Maharaj University, Nagpur

by

Khushbu Wandhe
Komal Sahare
Monica Pardhi
Rashmeet Sabharwal

in partial fulfillment of the degree of BACHELOR OF ENGINEERING in Computer Science &
Engineering, during the academic year 2011-2012 under my guidance.

Prof. N.M. Nirkhi
Department of Computer Sc. & Engineering
G.H.Raisoni College of Engineering,
Nagpur.

Head
Department of Computer Sci & Engineering
G.H.Raisoni College of Engineering,
Nagpur.

Dr.P.R.Bajaj
Director,
G.H.Raisoni College of Engineering,
Nagpur.








Acknowledgement

It is our great pleasure to express our sincere gratitude to everyone who helped us, directly
or indirectly, to complete this project work, prepared for the course of B.E. Computer Science &
Engineering (8th Sem) at G.H.R.C.E., Nagpur.

We wish to express our gratitude to our guide, Prof. N.M. Nirkhi, and to the Head of Department
for giving us every possible help and excellent suggestions for improving our programming.

We also thank the staff of G.H.R.C.E., who helped us from time to time throughout this project
work.




Projectee:

Khushbu Wandhe
Komal Sahare
Monica Pardhi
Rashmeet Sabharwal

















INDEX

Sr. No.   NAME OF TOPIC

1         Introduction
2         Requirement Analysis
3         Process View
4         Design
5         Implementation & Testing
6         Future Plans
7         Bibliography
























LIST OF FIGURES

FIG. No.   NAME OF FIGURE

1          Partition of Search Results
2          Selecting a Cluster
3          Initialization
4          Authentication
5          File Menu
6          Search Menu
7          Search for Keyword
8          Result of Search
9          Selected Page Opened in Browser

























































INTRODUCTION

INTRODUCTION

Web page clustering puts web pages together in groups, based on similarity or other
relationship measures. Tightly coupled pages, that is, pages in the same cluster, are treated as
a single item in the data analysis steps that follow. A complete data mining analysis could be
performed using web page information as it appears in web logs, but when the number of
pages taken into account increases (for example, on a large-scale corporate web server or a
server using dynamic web pages) this process becomes hard or even infeasible. Web page
clustering is a reasonable way to deal with this issue.
Traditional Web page clustering algorithms use the full text of the documents to generate
feature vectors. Such methods often produce unsatisfactory results because Web pages contain
much noisy information, such as decoration, interaction elements, and advertisements. The
varying length of Web pages is another significant factor that degrades performance.
In this report, we investigate the use of several summarization techniques to tackle these issues
when clustering Web pages. Compared with the full-text representation of the Web pages, our
experimental results indicate that our proposed approach effectively solves the problems of
noisy information and varying length, and thus significantly boosts the clustering performance.
Information on the web is usually accessed through search engines and thematic web
directories. Search engines such as Google return a ranked list that is not organized by
concept and does not connect information extracted from several web pages. There are,
however, search engines, for example Vivisimo, that show a cluster hierarchy in addition to
the list of relevant documents. When thematic web directories are used, the documents are
shown classified in taxonomies, and the search process uses that taxonomy.
In this context, document clustering algorithms are very useful for tasks such as
automatic grouping before and after the search, search by similarity, and visualization of
search results in a structured way.






























































Requirement
analysis

REQUIREMENT ANALYSIS


Why Visual Basic .NET Framework

Microsoft Visual Basic .NET is one of the fastest and easiest ways to create applications for
Microsoft Windows. It provides a complete set of tools that simplify rapid application
development for experienced and inexperienced users alike.

The Graphical User Interface (GUI) designer provided by Visual Basic .NET avoids writing
numerous lines of code to describe the appearance and location of interface elements. VB.NET
has evolved from the original BASIC language. It contains several hundred statements,
functions and keywords. Beginners can create useful applications by learning just a few of the
keywords, yet the power of the language allows professionals to accomplish anything that can
be accomplished using any other Windows programming language.
VB.NET provides a graphical environment in which you visually design the forms and controls
that become the building blocks of your applications. VB.NET supports many useful tools that
help you to be more productive. For a GUI application of this kind, development time in
VB.NET is typically shorter than in lower-level languages.

Features of Visual Basic.NET

The Timer control responds to the passage of time. Timers are independent of the user, and
can be programmed to take actions at regular intervals. A typical use is checking the system
clock to see whether it is time to perform some task. Timers are also useful for other kinds of
background processing. Each Timer control has an Interval property that specifies the number
of milliseconds that pass between one timer event and the next. Unless it is disabled, a timer
continues to raise its timer event at roughly equal intervals of time. At run time the timer is
invisible, so its position and size are irrelevant.

Because VB.NET provides functions of this kind, including ones that help in capturing images,
this project is developed using Visual Basic .NET.











Web Browser
A web browser is a software application for retrieving, presenting, and traversing
information resources on the World Wide Web. An information resource is identified by a
Uniform Resource Identifier (URI) and may be a web page, image, video, or other piece of
content.

Mini Database
A mini database is a collection of web pages that the project needs in order to run as a
demonstration. When this project is integrated with a website or website-based software,
this database will not be required, as the tool will itself access the internet and display the
web pages.

Hardware Requirements:
1. Pentium-based PC or processor
2. More than 128 MB of RAM for better performance.










































































Process view



Why cluster web page search results

With current search engines getting ever more powerful, is there any need for a new
type of search engine? The answer is a resounding yes. With current search engines, the largest
problem is the ordered list of results coupled with the enormous size of the internet. The first
few links in any search may not be the ones desired; in fact, the first several hundred or
thousand may be on topics completely unrelated to what the user was searching for. The
solution with current search engines is to refine the search. This often involves adding more
keywords, altering the keywords or using advanced Boolean features. This can be a
time-consuming process, and even after it the user may not find the pages they are looking
for, as they may still be thousands down the ordered list. A change is called for: a new way of
visualizing the result set is required, and web search clustering is one way to do this. By
dynamically generating a series of clusters that can be used as filters on the result set, a user can
very quickly get an overview of the entire result set and the information it contains, and can
filter to the topic they require with ease.

How to cluster web page search results

The clusters are formed by partitioning the result set into clusters of pages, where the
pages within each cluster are in some way related. The aim is to generate clusters which contain
pages about the same topic, thus dynamically partitioning the result set into topics of potential
interest to the user. Ideally, the semantic properties of the content of the pages should be used
for clustering, but this is intractable. There are many syntactic properties which may indicate
that two pages could be considered related: pages may have common words, common phrases,
common in-links, common out-links, or even common words or phrases in the in-linking or
out-linked pages. The state-of-the-art technology for search result clustering and the
contributions made by this project form these clusters solely from properties such as these,
and little to no attempt is made to understand the semantics of the natural language within
the pages or to devise the topics based on such understanding.















Fig 1: Partition of Search Results

As shown in figure 1, there are two main tasks for web search clustering. The first is to
partition the search results into a set of clusters; the second is to generate an accurate
description or name for each cluster. Both tasks are important in enabling users to find what
they need easily, but this report focuses primarily on the first problem.


Fig 2: Selecting a Cluster


Information

There are two main kinds of information available for clustering web documents:
1. Textual information
2. Link information

Textual information is the raw data contained in the pages; this may be in the form of
individual words or phrases of arbitrary length. Textual information can be found in many
sources: it may occur in the page directly as plain text, or as hidden text in the alt text of
images, in meta tags such as keywords or page description, in the page title, and in the URL.
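
The text sources listed above can be collected with a small parser. The following is only a sketch in Python (not the project's VB.NET code), assuming well-formed HTML; the class name TextSources and its attributes are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser


class TextSources(HTMLParser):
    """Collects the page title, image alt texts and keyword/description meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.alt_texts = []
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img" and attributes.get("alt"):
            # Hidden text attached to an image.
            self.alt_texts.append(attributes["alt"])
        elif tag == "meta" and attributes.get("name") in ("keywords", "description"):
            self.meta[attributes["name"]] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept here; body text is ignored in this sketch.
        if self._in_title:
            self.title += data


sample_page = (
    '<html><head><title>Jaguar Cars</title>'
    '<meta name="keywords" content="jaguar, car"></head>'
    '<body><img src="x.png" alt="red jaguar"><p>Hello</p></body></html>'
)
sources = TextSources()
sources.feed(sample_page)
```

After feeding the sample page, the title, alt texts and meta keywords are available as separate fields, so each source can be weighted or ignored independently.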

This Project

In this project, textual information is used: the clustering algorithm uses all phrases of
arbitrary length shared by two or more documents. Link information was considered carefully
throughout the project, but the overhead of the approximately forty-fold increase in page
downloads and processing time proved too large.
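
As a rough illustration of "phrases shared by two or more documents", the sketch below indexes word sequences and keeps those that occur in at least two pages. It is written in Python rather than the project's VB.NET, it caps the phrase length at an assumed max_len instead of handling truly arbitrary lengths, and the function name shared_phrases is hypothetical.

```python
from collections import defaultdict


def shared_phrases(docs, max_len=3):
    """Map each phrase of 1..max_len words to the set of documents containing it,
    keeping only phrases that appear in two or more documents."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for start in range(len(words) - n + 1):
                phrase = " ".join(words[start:start + n])
                index[phrase].add(doc_id)
    return {phrase: ids for phrase, ids in index.items() if len(ids) >= 2}


docs = {
    "a": "jaguar car dealer",
    "b": "jaguar car review",
    "c": "jaguar habitat",
}
shared = shared_phrases(docs)
```

Each surviving phrase, together with its set of documents, is a candidate cluster: here "jaguar car" groups pages a and b, while "jaguar" alone covers all three.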

Fast String Algorithm

The algorithm preprocesses the target string (key) that is being searched for, but not the
string being searched in (unlike some algorithms that preprocess the string to be searched and
can then amortize the expense of the preprocessing by searching repeatedly). The execution
time of this algorithm, while still linear in the size of the string being searched, can have a
significantly lower constant factor than many other search algorithms: it doesn't need to check
every character of the string to be searched, but rather skips over some of them. Generally the
algorithm gets faster as the key being searched for becomes longer. Its efficiency derives from
the fact that with each unsuccessful attempt to find a match between the search string and the
text it is searching, it uses the information gained from that attempt to rule out as many
positions of the text as possible where the string cannot match.












- - - - - - - X - - - - - - -
A N P A N M A N - - - - - - -
- A N P A N M A N - - - - - -
- - A N P A N M A N - - - - -
- - - A N P A N M A N - - - -
- - - - A N P A N M A N - - -
- - - - - A N P A N M A N - -
- - - - - - A N P A N M A N -
- - - - - - - A N P A N M A N


The X in position 8 excludes all 8 of the possible starting positions shown.
When the Fast String Algorithm checks whether a match exists at a particular position, it
works backwards. If it starts a search at the beginning of a text for the word
"ANPANMAN", for instance, it checks the eighth position of the text to see if it contains an
"N". If it finds the "N", it moves to the seventh position to see if that contains the last "A" of the
word, and so on until it checks the first position of the text for an "A".
Why the Fast String Algorithm takes this backward approach is clearer when we consider
what happens if the verification fails, for instance, if instead of an "N" in the eighth position,
we find an "X". The "X" doesn't appear anywhere in "ANPANMAN", and this means there is
no match for the search string at the very start of the text, or at the next seven positions
following it, since those would all fall across the "X" as well. After checking just one
character, "X", against the eight characters of the word "ANPANMAN", we're able to skip
ahead and start looking for a match ending at the sixteenth position of the text.
This explains why the best-case performance of the algorithm, for a text of length n and
a fixed pattern of length m, is on the order of n/m character comparisons: in the best case,
only one in m characters needs to be checked. This also explains the somewhat
counter-intuitive result that the longer the pattern we are looking for, the faster the algorithm
will usually be able to find it.
The algorithm precomputes two tables to process the information it obtains in each
failed verification: one table calculates how many positions ahead to start the next search based
on the value of the character that caused the mismatch; the other makes a similar calculation
based on how many characters were matched successfully before the match attempt failed.
(Because these two tables return results indicating how far ahead in the text to "jump", they are
sometimes called "jump tables", which should not be confused with the more common meaning
of jump tables in computer science.) The algorithm will shift by the larger of the two jump
values when a mismatch occurs.


The first table
- - - - A M A N - - - - - - -
A N P A N M A N - - - - - - -
- A N P A N M A N - - - - - -
- - A N P A N M A N - - - - -
- - - A N P A N M A N - - - -
- - - - A N P A N M A N - - -
- - - - - A N P A N M A N - -
- - - - - - A N P A N M A N -


The mismatch "A" in position 5 (3 back from the last letter of the needle) excludes the first 6 of
the possible starting positions shown.
Populate the first table as follows. For each i less than the length of the search string,
construct the pattern consisting of the last i characters of the string preceded by a mismatched
character, right-align the pattern and string, and record the fewest characters the pattern must
shift left for a match.

For instance, for the search string ANPANMAN, the table would be as follows. (In the Pattern
column, [^N]MAN signifies a substring consisting of a character that is not 'N' followed by the
characters 'MAN'.)

i   Pattern        Left Shift
0   [^N]           1
1   [^A]N          8
2   [^M]AN         3
3   [^N]MAN        6
4   [^A]NMAN       6
5   [^P]ANMAN      6
6   [^N]PANMAN     6
7   [^A]NPANMAN    6

i = 0: The next letter to the left in 'ANPANMAN' is not N (it is A), so the pattern [^N] must
shift one position left for a match; Left_Shift = 1.
i = 1: [^A]N is not a substring of ANPANMAN, so Left_Shift is the number of letters in
'ANPANMAN'; Left_Shift = 8.
i = 2: [^M]AN matches ANPANMAN three positions to the left (as 'PAN'); Left_Shift = 3.
i = 3: [^N]MAN is not a substring of 'ANPANMAN', but shifted 6 positions to the left its last
two characters 'AN' line up with the start of the string ('[^N]MANPANMAN'); Left_Shift = 6.
i = 4 to 7: the same reasoning applies; Left_Shift = 6.
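
The rule for populating the first table can be checked by brute force. The sketch below is in Python (not the project's VB.NET) and simply tries every left shift in turn, so it favours clarity over speed; the function names left_shift and first_table are hypothetical.

```python
def left_shift(pattern, i):
    """Smallest left shift for a mismatch that follows i matched characters."""
    m = len(pattern)
    suffix_start = m - i              # index where the matched suffix begins
    mismatch_pos = suffix_start - 1   # pattern position that failed to match
    for shift in range(1, m + 1):
        ok = True
        # Every matched character must agree with the string `shift` places to
        # the left, unless that place falls off the left end of the string.
        for k in range(suffix_start, m):
            if k - shift >= 0 and pattern[k - shift] != pattern[k]:
                ok = False
                break
        # The character in front of the suffix must differ from the one that failed.
        p = mismatch_pos - shift
        if ok and p >= 0 and pattern[p] == pattern[mismatch_pos]:
            ok = False
        if ok:
            return shift
    return m


def first_table(pattern):
    """Left shifts for i = 0 .. len(pattern) - 1 matched characters."""
    return [left_shift(pattern, i) for i in range(len(pattern))]


table = first_table("ANPANMAN")
```

For ANPANMAN this yields the shifts 1, 8, 3, 6, 6, 6, 6, 6, in agreement with the table above.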

The second table
The second table is easier to calculate: start at the last character of the sought string and move
towards the first character. Each time you move left, if the character you are on is not in the
table already, add it; its Shift value is its distance from the rightmost character. All other
characters receive a count equal to the length of the search string.
Example: For the string ANPANMAN, the second table would be as shown (for clarity, entries
are shown in the order they would be added to the table). The final N, whose distance would be
zero, is not recorded; the entry for N comes from the second N from the right.

Character              Shift
A                      1
M                      2
N                      3
P                      5
all other characters   8
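
The second table on its own is already enough to drive a working search. The sketch below, in Python rather than the project's VB.NET, builds that table and uses it in the simplified variant usually called Boyer-Moore-Horspool, which always shifts by the table entry of the text character aligned with the last pattern position. It does not use the first table, and the names second_table and horspool_search are hypothetical.

```python
def second_table(pattern):
    """Shift per character: distance from the rightmost character of the pattern.
    The final character itself (distance zero) is not recorded."""
    m = len(pattern)
    table = {}
    for distance in range(1, m):
        table.setdefault(pattern[m - 1 - distance], distance)
    return table


def horspool_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    table = second_table(pattern)
    pos = 0
    while pos <= n - m:
        # Verify backwards, starting from the last character of the pattern.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Characters missing from the table shift by the full pattern length.
        pos += table.get(text[pos + m - 1], m)
    return -1


shifts = second_table("ANPANMAN")
found_at = horspool_search("XXANPANMANXX", "ANPANMAN")
```

In the usage above, the first alignment fails on "M", whose table entry is 2, so the pattern moves two places right and matches at index 2.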












































































design
CODING FOR SEARCH BUTTON:

Private Sub btnSearch_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnSearch.Click
    ' Pad the keyword with spaces so that only whole words match.
    Dim SerchKey As String = " " & Trim(txtSearchMain.Text) & " "
    Dim index As Integer
    Dim filename As String
    Dim foundnowords As Integer

    MessageBox.Show("co=" + itemcount.ToString())
    page = 0
    ' Scan every page in the mini database and list those that contain the keyword.
    For index = 0 To itemcount - 1
        page = page + 1
        pb.Value = page
        filename = databaselist.FileListBox1.Items(index).ToString()
        ReadFile(filename)
        foundnowords = SearchForWord(ReadText, SerchKey)
        If foundnowords >= 1 Then
            listAvailable.Items.Add(filename)
        End If
    Next
    lbltot.Text = listAvailable.Items.Count.ToString()
End Sub

SECOND SEARCH BUTTON:

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
databaselist.FileListBox1.Path = My.Application.Info.DirectoryPath & "\database"
itemcount = databaselist.FileListBox1.Items.Count
pb.Maximum = itemcount
'lbltot.Text = itemcount
End Sub

Private Sub ReadFile(ByVal file As String)
    ' Loads the whole page from the database folder into ReadText.
    Dim apppath As String = My.Application.Info.DirectoryPath
    ReadText = ""
    ' Using closes the reader even if reading fails part-way.
    Using reader As New System.IO.StreamReader(apppath & "\database\" & file)
        While reader.Peek <> -1
            ReadText = ReadText & reader.ReadLine & vbCrLf
        End While
    End Using
End Sub
Private Function SearchForWord(ByVal FileText As String, ByVal Findtext As String) As Integer
    ' Counts case-insensitive occurrences of Findtext in FileText.
    ' A page counts as a match only when the key occurs at least twice;
    ' otherwise 0 is returned.
    Dim i, mlen, flen As Integer
    Dim count As Integer = 0
    Dim key As String = LCase(Findtext)
    mlen = Len(FileText)
    flen = Len(Findtext)
    For i = 1 To mlen - flen + 1
        If LCase(Mid(FileText, i, flen)) = key Then
            count = count + 1
        End If
    Next
    If count >= 2 Then
        Return count
    End If
    Return 0
End Function

Private Sub btnSearchCat_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnSearchCat.Click
    Dim mycount As Integer = 0
    sm = True
    EnterString = txtCat.Text
    ' Count the commas to find out how many categories were entered.
    For Each c As Char In EnterString
        If c = ","c Then
            mycount = mycount + 1
        End If
    Next

    If (mycount < 5) Then
        ' Peel one category off the front of the string per call.
        Do
            EnterString = BuildWord(EnterString)
        Loop While EnterString.Length > 1

        For cn As Integer = 0 To wordcount
            FindSite(cn)
            MakeHtmlFile(cn)
            MsgBox("Search Complete For " + word(cn), MsgBoxStyle.Information, "Result")
        Next cn

        Process.Start(My.Application.Info.DirectoryPath & "\Main.html")
    Else
        MessageBox.Show("Enter five or fewer categories, separated by commas.")
    End If
End Sub
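Private Function BuildWord(ByVal str As String) As String
    ' Stores the first category of str in word(wordcount) and returns the rest,
    ' which still starts with its separating comma.

    ' Drop the separator left at the front by the previous call.
    If str.StartsWith(",") Then
        str = str.Substring(1)
    End If

    Dim commaPos As Integer = str.IndexOf(","c)
    If commaPos < 0 Then
        ' Last category: nothing is left to parse.
        word(wordcount) = word(wordcount) & str.Trim()
        EnterString = ""
    Else
        ' Trim so that the stored category carries no stray spaces into the search key.
        word(wordcount) = word(wordcount) & str.Substring(0, commaPos).Trim()
        wordcount = wordcount + 1
        EnterString = str.Substring(commaPos)
    End If

    Return EnterString
End Function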
Private Sub FindSite(ByVal cn As Integer)
If (cn = 0) Then


Dim serchfound As Integer = 0
Dim numofsite As Integer
numofsite = listAvailable.Items.Count
pb.Maximum = numofsite
searchpage = 0
For i = 0 To numofsite - 1
searchpage = searchpage + 1
pb.Value = searchpage
filestr = listAvailable.Items(i)
ReadFile(filestr)
' Label4.Text = fileStr
serchfound = SearchForWord(ReadText, " " & word(0) & " ")
If serchfound < 1 Then
Else
Categry1(noofsite) = listAvailable.Items(searchpage - 1)
noofsite = noofsite + 1
End If
Next i

End If

If cn = 1 Then


Dim serchfound As Integer = 0
Dim numofsite1 As Integer
numofsite1 = listAvailable.Items.Count
pb.Maximum = numofsite1
searchpage = 0
For i = 0 To numofsite1 - 1
searchpage = searchpage + 1
pb.Value = searchpage
filestr = listAvailable.Items(i)
ReadFile(filestr)
' Label4.Text = fileStr
serchfound = SearchForWord(ReadText, " " & word(1) & " ")
If serchfound < 1 Then
Else
Categry2(noofsite1) = listAvailable.Items(searchpage - 1)
noofsite1 = noofsite1 + 1
End If
Next i


End If

If cn = 2 Then
Dim serchfound As Integer = 0
Dim numofsite2 As Integer
numofsite2 = listAvailable.Items.Count
pb.Maximum = numofsite2
searchpage = 0
For i = 0 To numofsite2 - 1
searchpage = searchpage + 1
pb.Value = searchpage
filestr = listAvailable.Items(i)
ReadFile(filestr)
' Label4.Text = fileStr
serchfound = SearchForWord(ReadText, " " & word(2) & " ")
If serchfound < 1 Then
Else
Categry3(noofsite2) = listAvailable.Items(searchpage - 1)
noofsite2 = noofsite2 + 1
End If
Next i
End If

If cn = 3 Then


Dim serchfound As Integer = 0
Dim numofsite3 As Integer
numofsite3 = listAvailable.Items.Count
pb.Maximum = numofsite3
searchpage = 0
For i = 0 To numofsite3 - 1
searchpage = searchpage + 1
pb.Value = searchpage
filestr = listAvailable.Items(i)
ReadFile(filestr)
' Label4.Text = fileStr
serchfound = SearchForWord(ReadText, " " & word(3) & " ")
If serchfound < 1 Then
Else
Categry4(noofsite3) = listAvailable.Items(searchpage - 1)
noofsite3 = noofsite3 + 1
End If
Next i
End If


If cn = 4 Then


Dim serchfound As Integer = 0
Dim numofsite4 As Integer
numofsite4 = listAvailable.Items.Count
pb.Maximum = numofsite4
searchpage = 0
For i = 0 To numofsite4 - 1
searchpage = searchpage + 1
pb.Value = searchpage
filestr = listAvailable.Items(i)
ReadFile(filestr)
' Label4.Text = fileStr
serchfound = SearchForWord(ReadText, " " & word(4) & " ")
If serchfound < 1 Then
Else
Categry5(noofsite4) = listAvailable.Items(searchpage - 1)
noofsite4 = noofsite4 + 1
End If
Next i
End If


End Sub
Private Sub MakeHtmlFile(ByVal cn As Integer)
    ' Writes Main.html, the index page that links to one sub-page per category.
    Dim path As String = My.Application.Info.DirectoryPath
    Dim i As Integer
    ' False = overwrite the index left over from an earlier search.
    Using Writer As New System.IO.StreamWriter(path & "\Main.html", False)
        Writer.WriteLine("<html>")
        Writer.WriteLine("<body>")
        Writer.WriteLine("<p align=center><b><font color=#FF0000>YOUR RESULT </font></b></p>")
        ' At most five categories (SubCat1.html to SubCat5.html) are supported.
        For i = 0 To Math.Min(wordcount, 4)
            Writer.WriteLine("<p><font size=5 ><a href=""file:///" & path & "/SubCat" & (i + 1).ToString() & ".html"">" & word(i) & "</a></font></p>")
        Next i
        Writer.WriteLine("</body>")
        Writer.WriteLine("</html>")
    End Using

    ' Regenerate the sub-page of every category entered so far.
    For i = 0 To Math.Min(wordcount, 4)
        Select Case i
            Case 0
                SubCat1()
            Case 1
                SubCat2()
            Case 2
                SubCat3()
            Case 3
                SubCat4()
            Case 4
                SubCat5()
        End Select
    Next i
    sm = False
End Sub
' Shared writer for the five category pages. WriteSubCatPage is a helper introduced in
' this listing so that the five SubCat routines no longer repeat the same code.
Private Sub WriteSubCatPage(ByVal fileName As String, ByVal heading As String, ByVal label As String, ByVal pages As Array, ByVal count As Integer)
    Dim path As String = My.Application.Info.DirectoryPath
    ' False = overwrite any page left over from an earlier search.
    Using Writer As New System.IO.StreamWriter(path & "\" & fileName, False)
        Writer.WriteLine("<html>")
        Writer.WriteLine("<body>")
        Writer.WriteLine("<p align=center><b><font color=#FF0000>SEARCH RESULT FOR " & heading & "</font></b></p>")
        If count > 0 Then
            ' Category name followed by the number of matching pages.
            Writer.WriteLine("<p><font size=5> " & label & count.ToString() & "</font></p>")
            ' One link per matching page in the database folder.
            For i As Integer = 0 To count - 1
                Writer.WriteLine("<p align=left><font color=#0000FF><b><a href=""" & path & "\database\" & pages.GetValue(i).ToString() & """>Page" & (i + 1).ToString() & "</a></b></font></p>")
            Next
        End If
        Writer.WriteLine("</body>")
        Writer.WriteLine("</html>")
    End Using
End Sub

Public Sub SubCat1()
    WriteSubCatPage("SubCat1.html", UCase(word(0)), word(0), Categry1, noofsite)
End Sub

Public Sub SubCat2()
    WriteSubCatPage("SubCat2.html", word(1), UCase(word(1)), Categry2, noofsite1)
End Sub

Public Sub SubCat3()
    WriteSubCatPage("SubCat3.html", word(2), word(2), Categry3, noofsite2)
End Sub

Public Sub SubCat4()
    WriteSubCatPage("SubCat4.html", word(3), word(3), Categry4, noofsite3)
End Sub

Public Sub SubCat5()
    WriteSubCatPage("SubCat5.html", word(4), word(4), Categry5, noofsite4)
End Sub

Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
databaselist.Show()
End Sub

Private Sub Button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button3.Click
Me.Hide()
End Sub

Private Sub btnBoth_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnBoth.Click
    ' Run the keyword search first, then the category search on its results.
    btnSearch_Click(sender, e)
    btnSearchCat_Click(sender, e)
End Sub
End Class



CODING FOR LOGIN FORM

Public Class LoginForm1

    ' The user name and password are hard-coded; both must be "ABC".
    Private Sub OK_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles OK.Click
        If UsernameTextBox.Text = "ABC" AndAlso PasswordTextBox.Text = "ABC" Then
            MessageBox.Show("Login Successfully Done")
            Me.Hide()
            MAIN.Show()
        Else
            MessageBox.Show("Invalid Log In")
        End If
    End Sub

    Private Sub Cancel_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Cancel.Click
        Me.Close()
    End Sub

End Class




CODING FOR MAIN MENU

Public Class MAIN
Dim childform(1) As databaselist
Dim childform1(1) As Form1

Dim chil As Integer = 0
Dim SorucePath As String = ""
Dim Filename As String = ""

    Private Sub CreateWebDataToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles CreateWebDataToolStripMenuItem.Click
        ' Ask the user for the HTML file to add to the database folder.
        Using objOpenFileDialog As New OpenFileDialog
            With objOpenFileDialog
                .Filter = "HTML files (*.htm;*.html)|*.htm;*.html|All files (*.*)|*.*"
                .FilterIndex = 1
                .Title = "Select a File"
            End With

            ' Stop here if the user cancels the dialog.
            If objOpenFileDialog.ShowDialog() <> Windows.Forms.DialogResult.OK Then
                Return
            End If
            SorucePath = objOpenFileDialog.FileName
        End Using

        Filename = FindFilename()

        Try
            ' Copy the selected file into the application's database folder.
            IO.File.Copy(SorucePath, My.Application.Info.DirectoryPath & "\database\" & Filename)
            MsgBox("File created")
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
    End Sub

    ' Returns the file name portion of SorucePath (the text after the last "\").
    Private Function FindFilename() As String
        Return IO.Path.GetFileName(SorucePath)
    End Function

    Private Sub FindAllWebDataToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles FindAllWebDataToolStripMenuItem.Click
        ' Open a new database list window as an MDI child of the main menu.
        Dim child As New databaselist()
        child.MdiParent = Me
        child.Show()
    End Sub

    Private Sub SearchWebPageToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles SearchWebPageToolStripMenuItem.Click
        ' Open a new search window as an MDI child of the main menu.
        Dim child As New Form1()
        child.MdiParent = Me
        child.Show()
    End Sub

    Private Sub SearchWebDataToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles SearchWebDataToolStripMenuItem.Click
        ' Open the application's database folder in Windows Explorer.
        Dim folder As String = IO.Path.Combine(My.Application.Info.DirectoryPath, "Database")
        Process.Start("explorer.exe", folder)
    End Sub

    Private Sub MAIN_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Form1.Hide()
        databaselist.Hide()
    End Sub

    Private Sub ExitToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ExitToolStripMenuItem.Click
        Application.Exit()
    End Sub

    Private Sub FileMenuToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles FileMenuToolStripMenuItem.Click

    End Sub
End Class



CODING FOR DATABASE DIALOG BOX

Public Class databaselist
Public itemcount As Integer

    Private Sub databaselist_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        ' List the files in the application's database folder and show how many there are.
        FileListBox1.Path = My.Application.Info.DirectoryPath & "\database"
        itemcount = FileListBox1.Items.Count
        'pb.Maximum = itemcount
        lbltot.Text = itemcount.ToString()
    End Sub

    Private Sub FileListBox1_SelectedIndexChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles FileListBox1.SelectedIndexChanged

    End Sub

    Private Sub lbl_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles lbl.Click

    End Sub
End Class


CODING FOR TIMER

Public Class Splash
Dim w As Integer = -10
Dim per As Integer = 0
Dim temp As Integer = 0
    ' Widens Button1 by one pixel per tick and updates the percentage label.
    Private Sub Timer1_Tick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Timer1.Tick
        w = w + 1
        Button1.Width = w

        ' Every fifth pixel counts as one percent.
        If w Mod 5 = 0 Then
            per = per + 1
            lblPer.Text = per.ToString() & "%"
            If per = 100 Then
                lblPer.Text = "100% Complete"
            End If
        End If

        ' Once the width passes 490, stop this timer and start the second one.
        If w > 490 Then
            Timer1.Stop()
            Label2.Visible = True
            Timer2.Start()
        End If
    End Sub

    Private Sub Splash_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Timer1.Start()
    End Sub

    ' After ten ticks, show the login form and hide the splash screen.
    Private Sub Timer2_Tick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Timer2.Tick
        temp = temp + 1
        If temp = 10 Then
            LoginForm1.Show()
            Me.Hide()
            Timer2.Stop()
        End If
    End Sub

    Private Sub PictureBox1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles PictureBox1.Click

    End Sub
End Class



Implementation & Testing

MODULES:

In the first module, the user interfaces will be created: all the forms will be built and
the flow between them will be decided.
The second module will be the user entry module, in which the details of every person
using the system are entered.
Hardware circuitry will be developed alongside the first, second and third modules.
In the last module, the software and hardware will be tested and changed where necessary.






Fig.3:- Initialization

Fig.4:- Authentication

Fig.5:- File Menu

Fig.6:- Search Menu

Fig.7:- Search for Keyword

Fig.8:- Result of Search

Fig.9:- Selected Page opened in Browser



FUTURE PLANS


In Social Networking Sites
A clustering-based search engine would help users find the exact communities, fan pages,
or discussion boards they are looking for.
Various Fields in the Corporate World
It could also be used in fields such as banking systems, colleges, business firms,
census systems and so on.
Mobile Systems
The software could also be implemented on mobile devices, which serve a large group of
internet users.



BIBLIOGRAPHY


Daniel Wayne Crabtree, Improvements To Web Page Clustering Method.
IEEE papers based on Web Page Clustering.
John M. Pierre, Practical Issues for Automated Categorization of Web Pages, September 2000.
