Rule-Based Classifier
Name         | Blood Type | Give Birth | Can Fly | Live in Water | Class
hawk         | warm       | no         | yes     | no            | ?
grizzly bear | warm       | yes        | no      | no            | ?
Rule coverage and accuracy:
– Coverage of a rule: fraction of records that satisfy the antecedent of the rule
– Accuracy of a rule: fraction of covered records whose class matches the consequent of the rule
Example rule: (Status=Single) → No
Coverage = 40%, Accuracy = 50%
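As a minimal sketch (not from the source), coverage and accuracy can be computed directly from a record set; the toy records below are chosen only so that the rule reproduces the 40%/50% figures above and are not the original data:

```python
# Toy records (illustrative only) and the rule (Status=Single) -> No
records = [
    {"Status": "Single",   "Class": "No"},
    {"Status": "Single",   "Class": "Yes"},
    {"Status": "Married",  "Class": "No"},
    {"Status": "Divorced", "Class": "Yes"},
    {"Status": "Married",  "Class": "Yes"},
]

def rule_coverage_accuracy(records, antecedent, consequent):
    """Coverage: fraction of records satisfying the antecedent.
    Accuracy: fraction of covered records whose class equals the consequent."""
    covered = [r for r in records
               if all(r.get(a) == v for a, v in antecedent.items())]
    coverage = len(covered) / len(records)
    accuracy = (sum(r["Class"] == consequent for r in covered) / len(covered)
                if covered else 0.0)
    return coverage, accuracy

print(rule_coverage_accuracy(records, {"Status": "Single"}, "No"))  # (0.4, 0.5)
```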
Name          | Blood Type | Give Birth | Can Fly | Live in Water | Class
lemur         | warm       | yes        | no      | no            | ?
turtle        | cold       | no         | no      | sometimes     | ?
dogfish shark | cold       | yes        | no      | yes           | ?
Exhaustive rules
– Classifier has exhaustive coverage if it accounts for every possible combination of attribute values
– Each record is covered by at least one rule
Name   | Blood Type | Give Birth | Can Fly | Live in Water | Class
turtle | cold       | no         | no      | sometimes     | ?
Rule-based ordering
– Individual rules are ranked based on their quality
Class-based ordering
– Rules that belong to the same class appear together
Direct Method:
– Extract rules directly from data
– e.g., RIPPER
Indirect Method:
– Extract rules from other classification models (e.g., decision trees, neural networks, etc.)
– e.g., C4.5rules
Aspects of Sequential Covering:
– Rule Growing
– Instance Elimination
– Rule Evaluation
– Stopping Criterion
– Rule Pruning
Rule Growing
Two common strategies:
– General-to-specific: start from the empty rule {} => Class=Yes, which covers (Yes: 3, No: 4), and add the conjunct that improves the rule most. Candidate conjuncts include Refund=No (Yes: 3, No: 4), Status=Single (Yes: 2, No: 1), Status=Divorced (Yes: 1, No: 0), Status=Married (Yes: 0, No: 3), and Income>80K (Yes: 3, No: 1).
– Specific-to-general: start from very specific rules such as (Refund=No, Status=Single, Income=85K) => Yes and (Refund=No, Status=Single, Income=90K) => Yes, and generalize them to (Refund=No, Status=Single) => Yes.
RIPPER Algorithm:
– Start from an empty rule: {} => class
– Add the conjunct that maximizes FOIL's information gain measure (see the sketch below):
  R0: {} => class (initial rule)
  R1: {A} => class (rule after adding conjunct)
  Gain(R0, R1) = p1 × [ log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) ]
  where p0, n0: number of positive and negative instances covered by R0; p1, n1: number of positive and negative instances covered by R1
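A small sketch of how FOIL's information gain could be computed from these coverage counts (the function name and example counts are my own, taken from the rule-growing figure above):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain when specializing rule R0 into R1.
    p0, n0: positive/negative instances covered by R0
    p1, n1: positive/negative instances covered by R1"""
    if p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Example: {} covers (Yes: 3, No: 4); adding Status=Single leaves (Yes: 2, No: 1)
print(foil_gain(p0=3, n0=4, p1=2, n1=1))
```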
Instance Elimination
Why do we need to eliminate instances?
– Otherwise, the next rule is identical to the previous rule
– Remove positive instances to ensure that the next rule is different
– Remove negative instances to prevent underestimating the accuracy of the rule
[Figure: rules R1, R2, R3 covering regions of positive (class = +) and negative (class = −) instances]
Metrics (sketched below):
– Accuracy = f+ / n
– Laplace = (f+ + 1) / (n + k)
– M-estimate = (f+ + k·p+) / (n + k)
– FOIL's information gain = p1 × [ log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) ]
where
  n: number of instances covered by the rule
  f+: number of positive instances covered by the rule
  k: number of classes
  p+: prior probability of the positive class
  p0, n0: number of positive and negative instances covered by the original rule
  p1, n1: number of positive and negative instances covered by the new rule
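A minimal sketch of the first three metrics above (function names and the example numbers are mine, not from the source):

```python
def rule_accuracy(f_pos, n):
    # f_pos: positive instances covered by the rule, n: instances covered
    return f_pos / n

def laplace(f_pos, n, k):
    # k: number of classes
    return (f_pos + 1) / (n + k)

def m_estimate(f_pos, n, k, p_pos):
    # p_pos: prior probability of the positive class
    return (f_pos + k * p_pos) / (n + k)

# Example: a rule covering 8 records, 6 of them positive, with 2 classes
print(rule_accuracy(6, 8), laplace(6, 8, 2), m_estimate(6, 8, 2, 0.5))
```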
Stopping criterion
– Compute the gain
– If gain is not significant, discard the new rule
Rule Pruning
– Similar to post-pruning of decision trees
– Reduced Error Pruning (a small sketch follows):
  – Remove one of the conjuncts in the rule
  – Compare error rate on the validation set before and after pruning
  – If error improves, prune the conjunct
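A greedy sketch of this procedure, assuming a rule is a list of conjuncts and that an error_rate(rule, data) helper (my own assumption, not from the source) returns the validation error:

```python
def reduced_error_prune(rule, validation_set, error_rate):
    """Repeatedly drop the conjunct whose removal reduces validation error.
    rule: list of conjuncts; error_rate: assumed helper returning a float."""
    best = list(rule)
    improved = True
    while improved and len(best) > 1:
        improved = False
        base_err = error_rate(best, validation_set)
        for i in range(len(best)):
            candidate = best[:i] + best[i + 1:]
            if error_rate(candidate, validation_set) < base_err:  # error improves
                best, improved = candidate, True
                break
    return best
```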
Indirect Methods
[Figure: a decision tree with root P and subtrees Q and R being converted into an equivalent rule set]
Bayes Classifier
Given:
– A doctor knows that heart disease causes stiff neck 50% of
the time
– Prior probability of any patient having heart disease is
1/50,000
– Prior probability of any patient having stiff neck is 1/20
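The question these facts set up is the probability of heart disease given a stiff neck; a quick check of the arithmetic with Bayes' theorem:

```python
# P(HeartDisease | StiffNeck) = P(StiffNeck | HeartDisease) * P(HeartDisease) / P(StiffNeck)
p_sn_given_hd = 0.5      # heart disease causes stiff neck 50% of the time
p_hd = 1 / 50000         # prior probability of heart disease
p_sn = 1 / 20            # prior probability of stiff neck

p_hd_given_sn = p_sn_given_hd * p_hd / p_sn
print(p_hd_given_sn)     # 0.0002
```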
Bayesian Classifiers
Approach:
– compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes' theorem:
  P(C | A1 A2 … An) = P(A1 A2 … An | C) · P(C) / P(A1 A2 … An)
Normal distribution:
P(Ai | cj) = 1 / sqrt(2π σij²) · exp( −(Ai − μij)² / (2 σij²) )
Example: for Income = 120 and a class whose sample mean is 110, the exponent uses (120 − 110)².
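A sketch of this class-conditional density, assuming the per-class sample mean and variance have already been estimated (the variance value in the example is illustrative, not from the text):

```python
import math

def gaussian_likelihood(a_i, mu_ij, var_ij):
    """P(Ai | cj) under a normal distribution with class-conditional
    mean mu_ij and variance var_ij (sigma_ij squared)."""
    return (math.exp(-((a_i - mu_ij) ** 2) / (2 * var_ij))
            / math.sqrt(2 * math.pi * var_ij))

# e.g. density of Income = 120 for a class whose sample mean is 110
print(gaussian_likelihood(120, 110, 2975.0))
```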
Original:    P(Ai | C) = Nic / Nc
Laplace:     P(Ai | C) = (Nic + 1) / (Nc + c)
m-estimate:  P(Ai | C) = (Nic + m·p) / (Nc + m)
where c: number of classes, p: prior probability, m: parameter
Name          | Give Birth | Can Fly | Live in Water | Have Legs | Class
human         | yes        | no      | no            | yes       | mammals
python        | no         | no      | no            | no        | non-mammals
salmon        | no         | no      | yes           | no        | non-mammals
whale         | yes        | no      | yes           | no        | mammals
frog          | no         | no      | sometimes     | yes       | non-mammals
komodo        | no         | no      | no            | yes       | non-mammals
bat           | yes        | yes     | no            | yes       | mammals
pigeon        | no         | yes     | no            | yes       | non-mammals
cat           | yes        | no      | no            | yes       | mammals
leopard shark | yes        | no      | yes           | no        | non-mammals
turtle        | no         | no      | sometimes     | yes       | non-mammals
penguin       | no         | no      | sometimes     | yes       | non-mammals
porcupine     | yes        | no      | no            | yes       | mammals
eel           | no         | no      | yes           | no        | non-mammals
salamander    | no         | no      | sometimes     | yes       | non-mammals
gila monster  | no         | no      | no            | yes       | non-mammals
platypus      | no         | no      | no            | yes       | mammals
owl           | no         | yes     | no            | yes       | non-mammals
dolphin       | yes        | no      | yes           | no        | mammals
eagle         | no         | yes     | no            | yes       | non-mammals

A: attributes, M: mammals, N: non-mammals

Test record:
Give Birth | Can Fly | Live in Water | Have Legs | Class
yes        | no      | yes           | no        | ?

P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042
P(A | M) P(M) = 0.06 × 7/20 = 0.021
P(A | N) P(N) = 0.0042 × 13/20 = 0.0027
Since P(A | M) P(M) > P(A | N) P(N) => Mammals
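A sketch reproducing this naive Bayes calculation directly from the counts in the table (the counts and priors come from the table above; the code structure is my own):

```python
# Class-conditional probabilities for the test record
# (Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no),
# using counts from the training table: 7 mammals, 13 non-mammals.
p_a_given_m = (6/7) * (6/7) * (2/7) * (2/7)        # ~0.06
p_a_given_n = (1/13) * (10/13) * (3/13) * (4/13)   # ~0.0042

p_m, p_n = 7/20, 13/20                             # class priors

score_m = p_a_given_m * p_m                        # ~0.021
score_n = p_a_given_n * p_n                        # ~0.0027
print("mammals" if score_m > score_n else "non-mammals")
```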
Support Vector Machines
Find a linear hyperplane (decision boundary) that will separate the data
[Figure: candidate decision boundaries B1 and B2, each with a pair of margin hyperplanes (b11, b12 and b21, b22) defining its margin]
For boundary B1:
– Decision boundary: w · x + b = 0
– Margin hyperplanes b11 and b12: w · x + b = +1 and w · x + b = −1
– f(x) = 1 if w · x + b ≥ 1; −1 if w · x + b ≤ −1
– Margin = 2 / ||w||²
Support Vector Machines
We want to maximize: Margin = 2 / ||w||²
– This is equivalent to minimizing: L(w) = ||w||² / 2
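As a practical illustration (not part of the source slides), this maximum-margin problem can be solved with a linear-kernel SVM from scikit-learn; the toy data below is made up:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable classes (made up for illustration)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],    # class +1
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])   # class -1
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # large C approximates a hard-margin SVM
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("2 / ||w|| =", 2 / np.linalg.norm(w))   # geometric margin width
```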
Ensemble Methods
General procedure (a small sketch follows):
– Step 1: Create multiple data sets D1, D2, …, Dt-1, Dt from the original training data D
– Step 2: Build multiple classifiers C1, C2, …, Ct-1, Ct, one per data set
– Step 3: Combine the classifiers into a single classifier C*
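A minimal sketch of this three-step procedure using bootstrap sampling and majority vote (i.e., bagging); the base learner is passed in as an assumed helper:

```python
import random
from collections import Counter

def bagging_predict(train, x, build_classifier, t=10, seed=0):
    """Step 1: draw t bootstrap samples D1..Dt from the training data;
    Step 2: build one classifier per sample with build_classifier(Di)
            (assumed helper returning a callable model);
    Step 3: combine the classifiers by majority vote on x."""
    rng = random.Random(seed)
    votes = []
    for _ in range(t):
        d_i = [rng.choice(train) for _ in train]   # bootstrap sample of same size
        c_i = build_classifier(d_i)
        votes.append(c_i(x))
    return Counter(votes).most_common(1)[0][0]
```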
– Bagging
– Boosting

Boosting
Original Data 1 2 3 4 5 6 7 8 9 10
Boosting (Round 1) 7 3 2 8 7 9 4 10 6 3
Boosting (Round 2) 5 4 9 4 2 5 1 7 4 2
Boosting (Round 3) 4 4 8 10 4 5 4 6 3 4
εi = (1/N) · Σ(j=1..N) wj · I( Ci(xj) ≠ yj ),  where I(p) = 1 if the predicate p is true and 0 otherwise

Importance of a classifier:
αi = (1/2) · ln( (1 − εi) / εi )
Example: AdaBoost
Weight update:
wi(j+1) = ( wi(j) / Zj ) · exp(−αj) if Cj(xi) = yi, and ( wi(j) / Zj ) · exp(αj) if Cj(xi) ≠ yi,
where Zj is the normalization factor

If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated

Classification:
C*(x) = arg max_y Σ(j=1..T) αj · δ( Cj(x) = y )
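A compact sketch of this bookkeeping (error, importance α, weight update, weighted vote) for the two-class case with labels in {−1, +1}, where the sign of the weighted sum is equivalent to the arg max above. The base learner is an assumed helper; the standard weighted error is used without the extra 1/N factor shown in the formula, and reverting the weights stands in for the resampling step:

```python
import math

def adaboost(X, y, base_learner, T=10):
    """base_learner(X, y, w) is an assumed helper returning a classifier h
    with h(x) in {-1, +1}, trained on the weighted data."""
    n = len(X)
    w = [1.0 / n] * n
    classifiers, alphas = [], []
    for _ in range(T):
        h = base_learner(X, y, w)
        err = sum(w_i for w_i, x_i, y_i in zip(w, X, y) if h(x_i) != y_i)
        if err > 0.5:                       # error above 50%: revert weights
            w = [1.0 / n] * n
            continue
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
        # decrease weights of correctly classified, increase misclassified
        w = [w_i * math.exp(-alpha if h(x_i) == y_i else alpha)
             for w_i, x_i, y_i in zip(w, X, y)]
        z = sum(w)                          # normalization factor Z_j
        w = [w_i / z for w_i in w]
        classifiers.append(h)
        alphas.append(alpha)

    def predict(x):
        score = sum(a * h(x) for a, h in zip(alphas, classifiers))
        return 1 if score >= 0 else -1
    return predict
```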
Illustrating AdaBoost
Round 1 (B1): predictions + + + − − − − − − −, α = 1.9459; example weights shown: 0.0094, 0.0094, 0.4623
Round 2 (B2): predictions − − − − − − − − + +, α = 2.9323; example weights shown: 0.3037, 0.0009, 0.0422
Round 3 (B3): predictions + + + + + + + + + +, α = 3.8744; example weights shown: 0.0276, 0.1819, 0.0038
Overall: + + + − − − − − + +