LINGUISTIC QUANTIFIERS
FOR TEXT CATEGORIZATION
Warsaw, POLAND
Notation
Document representation
Classification problem
Ξ: D1 × C → {0, 1} ⇒ Ξ: D × C → {0, 1}
Two phases:
Special cases
Learning phase
Classification phase
\[
\mathrm{sim}(d, p) = \frac{\sum_{i=1}^{M} w_i u_i}{\sqrt{\sum_{i=1}^{M} w_i^2}\,\sqrt{\sum_{i=1}^{M} u_i^2}}
\]
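A minimal sketch of the similarity measure, assuming it is the usual cosine similarity between a document vector d = [w1,...,wM] and a category profile p = [u1,...,uM]. The function name `cosine_similarity` and the handling of all-zero vectors are illustrative choices, not taken from the slides.

```python
import math


def cosine_similarity(d, p):
    """Cosine similarity between two equal-length weight vectors."""
    if len(d) != len(p):
        raise ValueError("vectors must have the same length")
    # Numerator: sum of w_i * u_i over all M terms.
    dot = sum(w * u for w, u in zip(d, p))
    # Denominator: product of the Euclidean norms of d and p.
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_p = math.sqrt(sum(u * u for u in p))
    if norm_d == 0.0 or norm_p == 0.0:
        # Assumption: an all-zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_d * norm_p)
```

In the classification phase, a document would then be compared against each category profile and assigned according to the resulting scores.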
Centroids
d = [w1,...,wM]
\[
w_j = \frac{f_j \log(S / n_j)}{\max_{k} \left( f_k \log(S / n_k) \right)}
\]
fj - frequency of term tj in the document d
nj - category frequency of this term
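The weighting can be sketched as follows. The slides do not define S; this sketch assumes it is the total number of categories, so that log(S/nj) acts as an inverse category frequency. It also assumes the denominator is the largest raw weight in the document, which scales all weights into [0, 1]. The name `term_weights` is hypothetical.

```python
import math


def term_weights(freqs, cat_freqs, num_categories):
    """Normalized term weights for one document.

    freqs[j]       -- f_j, frequency of term t_j in the document
    cat_freqs[j]   -- n_j, category frequency of term t_j
    num_categories -- S (assumed here to be the number of categories)
    """
    raw = [
        f * math.log(num_categories / n) if n > 0 else 0.0
        for f, n in zip(freqs, cat_freqs)
    ]
    top = max(raw, default=0.0)
    if top <= 0.0:
        # Assumption: if no term carries positive weight, return all zeros.
        return [0.0] * len(raw)
    # Divide by the largest raw weight so the top term gets weight 1.
    return [r / top for r in raw]
```

A term that occurs in every category gets log(S/nj) = 0 and therefore weight 0, regardless of how frequent it is in the document.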
PROPOSED ALGORITHM (2)
\[
\underset{x \in X}{Q}\, S(x)
\]
\[
\underset{x \in X}{Q}\, (F(x), S(x)) \quad \text{or} \quad \underset{x \in X}{QF}\, S(x)
\]
X = {x1, ..., xN}
µS : X → [0,1], µF : X → [0,1], µQ : [0,1] → [0,1]
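\[
\mathrm{truth}\Big(\underset{x \in X}{Q}\, S(x)\Big) = \mu_Q\Big(\frac{1}{N} \sum_{i=1}^{N} \mu_S(x_i)\Big)
\]
\[
\mathrm{truth}\Big(\underset{x \in X}{Q}\, (F(x), S(x))\Big) = \mu_Q\left(\frac{\sum_{i=1}^{N} \big(\mu_F(x_i) \wedge \mu_S(x_i)\big)}{\sum_{i=1}^{N} \mu_F(x_i)}\right)
\]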
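The two truth formulas can be sketched as below. Two things here are assumptions rather than content of the slides: the conjunction ∧ is taken to be the minimum, and `most` is one commonly used piecewise-linear membership function for the quantifier "most". The function names are hypothetical.

```python
def most(y):
    """Illustrative membership function for the quantifier 'most' (assumed shape)."""
    if y >= 0.8:
        return 1.0
    if y <= 0.3:
        return 0.0
    return 2.0 * y - 0.6


def truth_qs(mu_q, mu_s):
    """Truth of 'Q x in X are S': mu_Q applied to the mean of mu_S(x_i)."""
    n = len(mu_s)
    if n == 0:
        raise ValueError("X must be non-empty")
    return mu_q(sum(mu_s) / n)


def truth_qfs(mu_q, mu_f, mu_s):
    """Truth of 'Q F x in X are S': mu_Q of sum(min(F, S)) / sum(F)."""
    if len(mu_f) != len(mu_s):
        raise ValueError("membership lists must have the same length")
    denom = sum(mu_f)
    if denom == 0:
        # Assumption: with no element satisfying F, the truth is taken as 0.
        return 0.0
    # Conjunction is modelled with min (an assumed choice of t-norm).
    numer = sum(min(f, s) for f, s in zip(mu_f, mu_s))
    return mu_q(numer / denom)
```

For example, with membership degrees [1.0, 1.0, 0.5, 0.5] for S, the mean is 0.75 and `truth_qs(most, ...)` evaluates to about 0.9 under the assumed definition of "most".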
PROPOSED ALGORITHM (3)
Then,
Classically: