Master of Technology
in
Computer Science and Engineering
by
Swapna T R
M050181CS
Guided by
Abdul Nazeer A K
ACKNOWLEDGEMENT
Swapna T R
Contents
1 Abstract
2 Introduction
3 Instance Based Learning Methods
4 About WEKA
5 Conclusion
6 Snapshots
7 References
1 Abstract
set. A number of classified documents from the training set are removed prior
work.
2 Introduction
of documents in digital form and the ensuing need to organize them. In the
gories based on an instance-based algorithm. The classes are "hit" and "miss".
3 Instance Based Learning Methods
Instance based learning method is defined as the generalizing of the new in-
ods are sometimes called Lazy Learning because they delay the processing
until a new instance must be classified. Each time a new query instance is
to assign a target function value for the new instance. The search is for the best
match, a similar match, or a close match, but not necessarily an exact match.
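The lazy behaviour described above can be illustrated with a minimal sketch. The class name `LazyLearner`, its method names, and the use of squared distance as the similarity measure are assumptions made for illustration; they do not come from the project code. Storing an instance does no processing, and all the work of finding the closest match is delayed until a query arrives.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lazy learner: storing is trivial, all work happens at query time. */
class LazyLearner {
    private final List<double[]> features = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();

    /** "Training" only stores the instance; no model is built here. */
    void store(double[] x, String label) {
        features.add(x.clone());
        labels.add(label);
    }

    /** Returns the label of the closest stored instance (a close match, not necessarily exact). */
    String classify(double[] query) {
        if (features.isEmpty()) {
            throw new IllegalStateException("no stored instances");
        }
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < features.size(); i++) {
            double d = squaredDistance(features.get(i), query);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return labels.get(best);
    }

    /** Squared distance is enough for ranking; the similarity measure is an assumption. */
    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }
}
```

A caller would invoke `store` once per training instance and `classify` once per query, so the cost of each query grows with the number of stored instances.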
any past design that had previously been stored in the database. If it matches
the closest design match. There can be many similar instances retrieved and
the best attributes of each design can be combined and used to design a
3.1 Common Instance Based Learning Methods
instances is retrieved from memory and used to classify the query instance (target
function).
The following are the most common Instance based learning methods:
• k-Nearest Neighbor
proximation of the target function for each query instance that has to be
function that applies in the neighbourhood of the new query instance and
instance space. This has a significant advantage when the target function is
very complex, but can still be described by a collection of less complex local
approximations.
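As a rough illustration of a local approximation, the sketch below estimates the target function at a query point from nearby training points only. The choice of a Gaussian-weighted average, the class name `LocalApproximation`, and the `bandwidth` parameter are assumptions made for this example; the report does not specify a particular local model.

```java
/** Hypothetical local approximation: a distance-weighted average of stored target values. */
class LocalApproximation {
    /**
     * Estimates f(query) from one-dimensional training points xs with targets ys.
     * Each point is weighted by a Gaussian kernel of its distance to the query,
     * so points outside the neighbourhood of the query contribute almost nothing.
     */
    static double estimate(double[] xs, double[] ys, double query, double bandwidth) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double d = xs[i] - query;
            double w = Math.exp(-(d * d) / (2.0 * bandwidth * bandwidth));
            weightedSum += w * ys[i];
            weightTotal += w;
        }
        if (weightTotal == 0.0) {
            throw new IllegalArgumentException("query is too far from all training points");
        }
        return weightedSum / weightTotal;
    }
}
```

A separate estimate is computed for every query, so a complex target function is never fitted globally; it is described only through these local estimates.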
3.2 K-Nearest Neighbor Algorithm
the training data set that are similar to a new observation, say $(u_1, u_2, \ldots, u_p)$,
that we wish to classify, and to use these observations to classify the
observation into a class C. If we knew the function f, we would simply compute
is to look for observations in our training data that are near it, and then to
The Euclidean distance between the points $(x_1, x_2, \ldots, x_p)$ and $(u_1, u_2, \ldots, u_p)$ is $\sqrt{(x_1 - u_1)^2 + (x_2 - u_2)^2 + \cdots + (x_p - u_p)^2}$.
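The distance computation and the neighbor vote can be sketched in plain Java as follows. This is an illustrative sketch, not the WEKA implementation used in the project: the class name `KnnSketch`, the method names, and the tie-breaking rule are assumptions. The new observation is assigned the class held by the majority of its k nearest training observations under Euclidean distance.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical k-nearest-neighbor classifier using Euclidean distance and majority vote. */
class KnnSketch {
    /** Euclidean distance between two points of the same dimension p. */
    static double euclidean(double[] x, double[] u) {
        double sum = 0.0;
        for (int j = 0; j < x.length; j++) {
            double diff = x[j] - u[j];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Classifies u by majority vote among the k training points nearest to it. */
    static String classify(double[][] train, String[] labels, double[] u, int k) {
        // Sort the training indices by distance to the new observation u.
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> euclidean(train[i], u)));

        // Count votes among the k nearest; on a tie, the class that reached
        // the winning count first (that is, from nearer neighbors) is kept.
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        int limit = Math.min(k, order.length);
        for (int n = 0; n < limit; n++) {
            String label = labels[order[n]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }
}
```

With the report's two classes, `labels` would hold the strings "hit" and "miss", and `train` would hold whatever numeric attributes are extracted from each message.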
4 About WEKA
WEKA is a comprehensive tool bench for machine learning and data min-
that can apply to your dataset. It also includes a variety of tools for trans-
dataset, feed it into a learning scheme and analyze the resulting classifier and
its performance.
We can also call packages that are implemented in WEKA from our own
Java source code. For this project I have used IBk, which is WEKA's
5 Conclusion
In this project I have extensively studied WEKA's framework. It has
package is central to the WEKA system. It forms the base for every other
class. I implemented a message classifier, with which I could train the system
instance is given, it is also correctly classified using K-Nearest Neighbor.
6 Snapshots
Figure 1: WEKA’s Command prompt
Figure 2: Classifying a Message File
Figure 3: Already classified Message File
Figure 4: The training file. The message appears in bold
7 References
[2] www.cs.waikato.ac.nz/ml/weka/