Professional Documents
Culture Documents
Rule-Based Classifier
Classify records by using a collection of
ifthen rules
Rule: (Condition)
where
Condition is a conjunctions of attributes
y is the class label
human
python
salmon
whale
frog
komodo
bat
pigeon
cat
leopard shark
turtle
penguin
porcupine
eel
salamander
gila monster
platypus
owl
dolphin
eagle
Blood Type
warm
cold
cold
warm
cold
cold
warm
warm
warm
cold
cold
warm
warm
cold
cold
cold
warm
warm
warm
warm
Give Birth
yes
no
no
yes
no
no
yes
no
yes
yes
no
no
yes
no
no
no
no
no
yes
no
Can Fly
no
no
no
no
no
no
yes
yes
no
no
no
no
no
no
no
no
no
yes
no
yes
Live in Water
no
no
yes
yes
sometimes
no
no
no
no
yes
sometimes
sometimes
no
yes
sometimes
no
no
no
yes
no
Class
mammals
reptiles
fishes
mammals
amphibians
reptiles
mammals
birds
mammals
fishes
reptiles
birds
mammals
fishes
amphibians
reptiles
mammals
birds
mammals
birds
hawk
grizzly bear
Blood Type
warm
warm
Fishes
Birds
Mammals
Reptiles
Amphibians
Give Birth
Can Fly
Live in Water
Class
no
yes
yes
no
no
no
?
?
Accuracy of a rule:
Fraction of records
that satisfy both the
antecedent and
consequent of a rule
Taxable
Income Class
Yes
Single
125K
No
No
Married
100K
No
No
Single
70K
No
Yes
Married
120K
No
No
Divorced 95K
Yes
No
Married
No
Yes
Divorced 220K
No
Single
85K
Yes
No
Married
75K
No
10
No
Single
90K
Yes
60K
No
10
(Status=Single)
No
lemur
turtle
dogfish shark
Blood Type
warm
cold
cold
Birds
Fishes
Mammals
Reptiles
Amphibians
Give Birth
Can Fly
Live in Water
Class
yes
no
yes
no
no
no
no
sometimes
yes
?
?
?
Exhaustive rules
Classifier has exhaustive coverage if it accounts for
every possible combination of attribute values
Each record is covered by at least one rule
Refund
Yes
No
NO
Marita l
Status
{Single,
Divorced}
NO
{Married}
NO
Taxable
Income
< 80K
> 80K
YES
Taxable
Income Cheat
Yes
Single
125K
No
No
Married
100K
No
No
Single
70K
No
Yes
Married
120K
No
No
Divorced 95K
No
Married
Yes
Divorced 220K
No
No
Single
85K
Yes
No
Married
75K
No
10
No
Single
90K
Yes
(Status=Married)
No
Refund
Yes
No
NO
{Single,
Divorced}
Marita l
Status
NO
Taxable
Income
< 80K
NO
{Married}
> 80K
YES
60K
10
Initial Rule:
(Refund=No)
No
Yes
No
turtle
Blood Type
cold
Give Birth
Can Fly
no
no
Live in Water
Class
sometimes
Birds
Fishes
Mammals
Reptiles
Amphibians
Class-based ordering
Rules that belong to the same class appear together
Rule-based Ordering
Class-based Ordering
(Refund=Yes) ==> No
(Refund=Yes) ==> No
Indirect Method:
Extract rules from other classification models (e.g.
decision trees, neural networks, etc).
e.g: C4.5rules
(ii) Step 1
R1
R2
(iii) Step 2
(iv) Step 3
Rule Pruning
Rule Growing
Two common strategies
{}
Yes: 3
No: 4
Refund=No,
Status=Single,
Income=85K
(Class=Yes)
Refund=
No
Status =
Single
Status =
Divorced
Status =
Married
Yes: 3
No: 4
Yes: 2
No: 1
Yes: 1
No: 0
Yes: 0
No: 3
(a) General-to-specific
...
Income
> 80K
Yes: 3
No: 1
Refund=No,
Status=Single,
Income=90K
(Class=Yes)
Refund=No,
Status = Single
(Class = Yes)
(b) Specific-to-general
Instance Elimination
Why do we need to
eliminate instances?
R3
R1
+
class = +
+
class = -
+ +
++
+
+
+
++
+ + +
+ +
-
+ +
-
+
-
R2
Rule Evaluation
nc
n
Metrics:
Accuracy
Laplace
nc 1
n k
nc kp
M-estimate
n k
n : Number of instances
covered by rule
nc : Number of instances
covered by rule
k : Number of classes
p : Prior probability
Rule Pruning
Similar to post-pruning of decision trees
Reduced Error Pruning:
Remove one of the conjuncts in the rule
Compare error rate on validation set before and after
pruning
If error improves, prune the conjunct
Indirect Methods
P
No
Yes
Q
No
Rule Set
R
Yes
No
Yes
Q
No
Yes
Example
Name
human
python
salmon
whale
frog
komodo
bat
pigeon
cat
leopard shark
turtle
penguin
porcupine
eel
salamander
gila monster
platypus
owl
dolphin
eagle
Give Birth
yes
no
no
yes
no
no
yes
no
yes
yes
no
no
yes
no
no
no
no
no
yes
no
Lay Eggs
no
yes
yes
no
yes
yes
no
yes
no
no
yes
yes
no
yes
yes
yes
yes
yes
no
yes
Can Fly
no
no
no
no
no
no
yes
yes
no
no
no
no
no
no
no
no
no
yes
no
yes
no
no
yes
yes
sometimes
no
no
no
no
yes
sometimes
sometimes
no
yes
sometimes
no
no
no
yes
no
yes
no
no
no
yes
yes
yes
yes
yes
no
yes
yes
yes
no
yes
yes
yes
yes
no
yes
Class
mammals
reptiles
fishes
mammals
amphibians
reptiles
mammals
birds
mammals
fishes
reptiles
birds
mammals
fishes
amphibians
reptiles
mammals
birds
mammals
birds
Give
Birth?
No
Yes
(Give Birth=Yes)
Yes
Mammals
()
Reptiles
Amphibians
RIPPER:
No
(Live in Water=Yes)
(Have Legs=No)
Sometimes
Fishes
Fishes
Live In
Water?
Mammals
Birds
Yes
Birds
Reptiles
Can
Fly?
Amphibians
Fishes
No
Reptiles
()
Mammals
Birds
0
0
0
3
0
Mammals
0
1
0
0
6
0
0
0
2
0
Mammals
2
0
1
1
4
RIPPER:
PREDICTED CLASS
Amphibians Fishes Reptiles Birds
ACTUAL Amphibians
0
0
0
CLASS Fishes
0
3
0
Reptiles
0
0
3
Birds
0
0
1
Mammals
0
2
1
Instance-Based Classifiers
Set of Stored Cases
Atr1
...
AtrN
Class
A
B
B
C
A
C
B
Unseen Case
Atr1
...
AtrN
Nearest neighbor
Uses k closest points (nearest neighbors) for
performing classification
Training
Records
Choose k of the
nearest records
Test
Record
Nearest-Neighbor Classifiers
Unknown record
1 nearest-neighbor
Voronoi Diagram
d ( p, q )
(p
q)
vs
100000000000
011111111111
000000000001
d = 1.4142
d = 1.4142
Bayes Classifier
A probabilistic framework for solving
classification problems
Conditional Probability: P(C | A) P( A, C )
P( A | C )
Bayes theorem:
P(C | A)
P( A | C ) P(C )
P( A)
P ( A)
P ( A, C )
P (C )
P( S | M ) P( M )
P( S )
0.5 1 / 50000
0.0002
1 / 20
Bayesian Classifiers
Consider each attribute and class label as
random variables
Given a record with attributes (A1, A2,,An)
Bayesian Classifiers
Approach:
compute the posterior probability P(C | A1, A2, , An) for all
values of C using the Bayes theorem
P (C | A A
1
A)
n
P( A A
A | C ) P (C )
P( A A
A)
1
How to Estimate
Probabilities
from
Data?
l
l
s
ca
10
g
te
a
c
i
r
ca
g
te
a
c
i
r
t
on
in
u
o
u
s
s
a
cl
Tid
Refund
Marital
Status
Taxable
Income
Evade
Yes
Single
125K
No
No
Married
100K
No
No
Single
70K
No
Yes
Married
120K
No
No
Divorced
95K
Yes
No
Married
60K
No
Yes
Divorced
220K
No
No
Single
85K
Yes
No
Married
75K
No
10
No
Single
90K
Yes
s
uProbabilities
How togo Estimate
from
Data?
o
u
o
s
in
g
a
c
i
r
c
Tid
1
e
at
Refund
Yes
a
c
i
r
e
at
Marital
Status
Single
t
n
o
Taxable
Income
125K
as
l
c
Evade
No
No
Married
100K
No
No
Single
70K
No
Yes
Married
120K
No
No
Divorced
95K
Yes
No
Married
60K
No
Yes
Divorced
220K
No
No
Single
85K
Yes
No
Married
75K
No
10
No
Single
90K
Yes
Normal distribution:
1
2
P( A | c )
i
( Ai
ij
2
ij
ij
10
1
e
2 (54.54)
( 120 110 ) 2
2 ( 2975 )
0.0072
)2
Example
of
Nave
Bayes
Classifier
Given a Test Record:
X
(Refund
P(X|Class=No) = P(Refund=No|Class=No)
P(Married| Class=No)
P(Income=120K| Class=No)
= 4/7 4/7 0.0072 = 0.0024
P(X|Class=Yes) = P(Refund=No| Class=Yes)
P(Married| Class=Yes)
P(Income=120K| Class=Yes)
= 1 0 1.2 10-9 = 0
=> Class = No