Classifying Farsi texts using SVM algorithms and examining feature reduction methods

In text classification, the words of a text are usually treated as its features, so text
classifiers must deal with a very large number of features. Different approaches have been
proposed to reduce them. In this paper we compare the methods used in text classification
and introduce the best one. Naive Bayes, Rocchio, KNN, regression, decision trees, neural
networks, SVM, rule-based methods and evolutionary methods are among those methods. SVM,
a supervised learning method, is one of the best methods used in text classification. It
maps the data from the original space into another vector space with a different (usually
higher) number of dimensions, in which linear learning algorithms can be applied. The
method is computationally expensive; its advantage is that it does not depend on the number
of samples in the experimental set, so it can work well with few samples and a large number
of features.

Keywords: Feature selection, Text classification, SVM method, Feature extraction, Vector
space.

charkari@modares.ac.ir
zaman_ma@modares.ac.ir
CHI MI IG DF

KNN Rocchio
SVM SVM

XML HTML
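A = (a_jk), an m x n matrix over the features f_1 ... f_m and the documents d_1 ... d_n

Word frequency weighting: a_jk = f_jk

tf*idf weighting: a_jk = f_jk * log(N / n_j)

tfc weighting (a variant of tf*idf)

ltc weighting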
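
The tf*idf weighting can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation used in the paper: f_jk is taken to be the raw count of term j in document k, N the number of documents, and n_j the number of documents that contain term j. The names `tfidf_matrix` and `tfc_normalize` are hypothetical, and the result is stored one row per document (the transpose of A).

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, rows) with rows[k][j] = f_jk * log(N / n_j)."""
    counts = [Counter(doc.split()) for doc in docs]   # f_jk for each document k
    vocab = sorted(set(term for c in counts for term in c))
    n_docs = len(docs)                                # N
    # n_j: number of documents containing term j (assumed definition)
    doc_freq = {t: sum(1 for c in counts if t in c) for t in vocab}
    rows = [
        [c[t] * math.log(n_docs / doc_freq[t]) for t in vocab]
        for c in counts
    ]
    return vocab, rows


def tfc_normalize(row):
    """Scale one document vector to unit length (tfc-style normalization)."""
    norm = math.sqrt(sum(w * w for w in row))
    return [w / norm for w in row] if norm > 0 else list(row)
```

A term that occurs in every document gets weight 0 under this definition, which is why tf*idf favors terms that discriminate between documents.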
lemma lemma post-tag

(d)
d*c c

Rocchio
SVM KNN

Rocchio

cj
d d
C CJ
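
As an illustration of the Rocchio method: in its simplest form, one prototype vector is built for each category c_j, and a document d is assigned to the category whose prototype is most similar to it. The sketch below is a simplified centroid variant (it averages the positive examples only and omits the negative-example term of the full Rocchio formula) and uses cosine similarity as an assumed similarity measure. `train_rocchio`, `classify_rocchio` and `cosine` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_rocchio(vectors, labels):
    """Average the training vectors of each category into one prototype."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def classify_rocchio(prototypes, vec):
    """Return the category whose prototype is most similar to `vec`."""
    return max(prototypes, key=lambda c: cosine(prototypes[c], vec))
```

Training is a single pass over the data and classification compares d with one vector per category, which makes this one of the cheapest of the compared methods.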

KNN
K
K

dj
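
A minimal sketch of KNN classification: a document d_j receives the majority label among its K nearest training documents. Euclidean distance and simple majority voting are assumptions made for this example, and `knn_classify` is a hypothetical name.

```python
import math
from collections import Counter


def knn_classify(train_vectors, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Sort all training documents by their distance to the query vector.
    neighbours = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    # Count the labels of the k closest documents and return the most common.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]
```

Unlike Rocchio, nothing is learned in advance: every classification scans the whole training set, so the cost grows with the number of stored documents.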

DNF
C d

SVM

Decision Boundary

x w.x=b N
b w
dmin

QP

phi

phi phi
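
The decision boundary w.x = b can be illustrated as follows: a vector x is classified by the side of the hyperplane on which it falls, and its distance to the boundary is |w.x - b| / ||w||. The sketch assumes w and b are already known (finding them is the QP problem, which is not shown) and that dmin denotes the smallest distance from any point to the boundary. The quadratic map `phi` is a hypothetical example of a feature map, chosen only to show how a boundary that is not linear in the original space becomes linear after mapping.

```python
import math


def decision_value(w, b, x):
    """Value of w.x - b; its sign gives the side of the boundary."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane w.x = b."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_boundary(w, b, x):
    """Geometric distance |w.x - b| / ||w|| from x to the hyperplane."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


def min_distance(w, b, points):
    """Smallest distance from any of the points to the boundary (dmin)."""
    return min(distance_to_boundary(w, b, x) for x in points)


def phi(x):
    """Hypothetical feature map: (x1, x2) -> (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)
```

With this `phi`, the unit circle x1^2 + x2^2 = 1 in the original plane becomes the hyperplane w = (1, 1, 0), b = 1 in the mapped space, so `classify((1, 1, 0), 1, phi(x))` separates points inside the circle from points outside it.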

[3]

IG (Information Gain), DF (Document Frequency)
CHI (Chi-Square), MI (Mutual Information)
DF IG
svm
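
Two of the feature selection measures, DF and CHI, can be sketched as follows. The example assumes each document is a set of terms with one category label and uses the usual 2x2 contingency form of the chi-square statistic; it is not the paper's implementation, and `document_frequency`, `chi_square` and `select_features` are hypothetical names.

```python
def document_frequency(docs, term):
    """DF: number of documents (each a set of terms) that contain `term`."""
    return sum(1 for terms in docs if term in terms)


def chi_square(docs, labels, term, category):
    """Chi-square statistic between a term and a category."""
    a = b = c = d = 0
    for terms, label in zip(docs, labels):
        present = term in terms
        in_category = label == category
        if present and in_category:
            a += 1      # term present, document in category
        elif present:
            b += 1      # term present, document in another category
        elif in_category:
            c += 1      # term absent, document in category
        else:
            d += 1      # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, category, top_n):
    """Keep the top_n terms with the highest chi-square score."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(
        vocab,
        key=lambda t: chi_square(docs, labels, t, category),
        reverse=True,
    )
    return ranked[:top_n]
```

Only the selected terms are then kept as dimensions of the document vectors passed to the classifier, which is how feature reduction shrinks the input space.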

vector

SVM Weka vector

[ ] F. Sebastiani, "Machine learning in automated text categorization," Journal of ACM
[ ] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Machine
[ ] T. Joachims, "Text categorization with support vector machines: Learning with many
[ ] Y. Yang and J. Pedersen, "A comparative study on feature selection in text categorization," in proceedings of the international conference on machine learning (ICML-
[ ]
[ ] L. Galavotti, F. Sebastiani, and M. Simi, "Feature selection and negative evidence in
[ ] T. Joachims, "Transductive inference for text classification using support vector machines," in proceedings of ICML- international conference on machine
tegorization: A Survey," in International Conference on