
Naïve Bayes Classifier

Ke Chen

COMP24111 Machine Learning

Outline
Background
Probability Basics
Probabilistic Classification
Naïve Bayes
Example: Play Tennis
Relevant Issues
Conclusions

Background

There are three approaches to establishing a classifier:

a) Model a classification rule directly
   Examples: k-NN, decision trees, perceptron, SVM

b) Model the probability of class membership given the input data
   Example: perceptron with the cross-entropy cost

c) Make a probabilistic model of the data within each class
   Examples: naïve Bayes, model-based classifiers

a) and b) are examples of discriminative classification.

c) is an example of generative classification.

b) and c) are both examples of probabilistic classification.



Probability Basics

Prior, conditional and joint probability for random variables

Prior probability: P(X)
Conditional probability: P(X1 | X2), P(X2 | X1)
Joint probability: X = (X1, X2), P(X) = P(X1, X2)
Relationship: P(X1, X2) = P(X2 | X1) P(X1) = P(X1 | X2) P(X2)
Independence: P(X2 | X1) = P(X2), P(X1 | X2) = P(X1), P(X1, X2) = P(X1) P(X2)

Bayesian Rule

P(C | X) = P(X | C) P(C) / P(X)

i.e.  Posterior = (Likelihood × Prior) / Evidence
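As a quick illustration, the rule can be applied directly once the three quantities on the right-hand side are known. A minimal sketch with made-up numbers (not from the slides):

```python
# Hypothetical values for illustration only:
# suppose P(X|C) = 0.8 (likelihood), P(C) = 0.3 (prior), P(X) = 0.5 (evidence).
likelihood = 0.8
prior = 0.3
evidence = 0.5

posterior = likelihood * prior / evidence  # Bayes rule: P(C|X) = P(X|C)P(C) / P(X)
print(posterior)  # 0.48
```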

Probability Basics

Quiz: We have two six-sided dice. When they are rolled, the following events could occur: (A) die 1 lands on side 3, (B) die 2 lands on side 1, and (C) the two dice sum to eight. Answer the following questions:

1) P(A) = ?
2) P(B) = ?
3) P(C) = ?
4) P(A | B) = ?
5) P(C | A) = ?
6) P(A, B) = ?
7) P(A, C) = ?
8) Does P(A, C) equal P(A) P(C)?
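One way to check the answers is to enumerate all 36 equally likely outcomes; a minimal sketch (not part of the original slides):

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))  # all 36 (die1, die2) pairs

def prob(event):
    """Probability of an event given as a predicate over (die1, die2)."""
    return Fraction(sum(event(d1, d2) for d1, d2 in outcomes), len(outcomes))

A = lambda d1, d2: d1 == 3          # die 1 lands on side 3
B = lambda d1, d2: d2 == 1          # die 2 lands on side 1
C = lambda d1, d2: d1 + d2 == 8     # the two dice sum to eight

print(prob(A))                                  # 1/6
print(prob(C))                                  # 5/36
print(prob(lambda a, b: A(a, b) and C(a, b)))   # P(A, C) = 1/36, while P(A)P(C) = 5/216
```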

Probabilistic Classification

Establishing a probabilistic model for classification

Discriminative model: model the posterior P(C | X) directly, where C = c1, ..., cL and X = (X1, ..., Xn)

[Diagram: a discriminative probabilistic classifier takes an input x = (x1, x2, ..., xn) and outputs the posterior probabilities P(c1 | x), P(c2 | x), ..., P(cL | x).]

Probabilistic Classification

Establishing a probabilistic model for classification (cont.)

Generative model: model the class-conditional probabilities P(X | C), where C = c1, ..., cL and X = (X1, ..., Xn)

[Diagram: one generative probabilistic model per class. For an input x = (x1, x2, ..., xn), the model for Class 1 produces P(x | c1), the model for Class 2 produces P(x | c2), ..., and the model for Class L produces P(x | cL).]

Probabilistic Classification

MAP classification rule

MAP: Maximum A Posteriori
Assign x to c* if
P(C = c* | X = x) > P(C = c | X = x),  c ≠ c*,  c = c1, ..., cL

Generative classification with the MAP rule

Apply the Bayesian rule to convert class-conditional probabilities into posterior probabilities:
P(C = ci | X = x) = P(X = x | C = ci) P(C = ci) / P(X = x)
                  ∝ P(X = x | C = ci) P(C = ci)   for i = 1, 2, ..., L
Then apply the MAP rule.

Naïve Bayes

Bayes classification
P(C | X) ∝ P(X | C) P(C) = P(X1, ..., Xn | C) P(C)
Difficulty: learning the joint probability P(X1, ..., Xn | C)

Naïve Bayes classification

Assumption: all input attributes are conditionally independent given the class!
P(X1, X2, ..., Xn | C) = P(X1 | X2, ..., Xn; C) P(X2, ..., Xn | C)
                       = P(X1 | C) P(X2, ..., Xn | C)
                       = P(X1 | C) P(X2 | C) ··· P(Xn | C)

MAP classification rule: for x = (x1, x2, ..., xn), assign x to c* if

[P(x1 | c*) ··· P(xn | c*)] P(c*) > [P(x1 | c) ··· P(xn | c)] P(c),  c ≠ c*,  c = c1, ..., cL

Naïve Bayes

Naïve Bayes Algorithm (for discrete input attributes)

Learning Phase: Given a training set S,
  For each target value ci (ci = c1, ..., cL)
    P̂(C = ci) ← estimate P(C = ci) with examples in S;
    For every attribute value xjk of each attribute Xj (j = 1, ..., n; k = 1, ..., Nj)
      P̂(Xj = xjk | C = ci) ← estimate P(Xj = xjk | C = ci) with examples in S;
  Output: conditional probability tables; for each Xj, Nj × L elements

Test Phase: Given an unknown instance x = (a1, ..., an),
  look up the tables to assign the label c* to x if
  [P̂(a1 | c*) ··· P̂(an | c*)] P̂(c*) > [P̂(a1 | c) ··· P̂(an | c)] P̂(c),  c ≠ c*,  c = c1, ..., cL
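A minimal sketch of both phases in Python, assuming the training set is given as a list of (attribute-value dict, label) pairs; the function and variable names below are illustrative, not from the slides:

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Learning phase: estimate P(C = ci) and P(Xj = xjk | C = ci) by relative frequency."""
    class_counts = Counter(label for _, label in examples)
    cond_counts = defaultdict(Counter)  # (attribute, class) -> Counter over attribute values
    for attrs, label in examples:
        for attr, value in attrs.items():
            cond_counts[(attr, label)][value] += 1
    priors = {c: n / len(examples) for c, n in class_counts.items()}
    cond_probs = {key: {v: n / class_counts[key[1]] for v, n in counter.items()}
                  for key, counter in cond_counts.items()}
    return priors, cond_probs

def classify(instance, priors, cond_probs):
    """Test phase: apply the MAP rule using the looked-up probabilities."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attr, value in instance.items():
            score *= cond_probs.get((attr, c), {}).get(value, 0.0)
        scores[c] = score
    return max(scores, key=scores.get)
```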

Example

Example: Play Tennis

[Training data table: 14 example days described by Outlook, Temperature, Humidity and Wind, each labelled Play=Yes or Play=No.]


Example

Learning Phase

Outlook      | Play=Yes | Play=No
Sunny        |   2/9    |   3/5
Overcast     |   4/9    |   0/5
Rain         |   3/9    |   2/5

Temperature  | Play=Yes | Play=No
Hot          |   2/9    |   2/5
Mild         |   4/9    |   2/5
Cool         |   3/9    |   1/5

Humidity     | Play=Yes | Play=No
High         |   3/9    |   4/5
Normal       |   6/9    |   1/5

Wind         | Play=Yes | Play=No
Strong       |   3/9    |   3/5
Weak         |   6/9    |   2/5

P(Play=Yes) = 9/14
P(Play=No) = 5/14
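Each table entry is a relative-frequency estimate, e.g. P(Outlook=Sunny | Play=Yes) = (number of Yes days that are Sunny) / (number of Yes days) = 2/9. A small sketch of that computation, with the counts read off the fractions above:

```python
from fractions import Fraction

# Counts implied by the table: 9 Play=Yes days, of which 2 have Outlook=Sunny.
n_yes = 9
n_sunny_and_yes = 2

p_sunny_given_yes = Fraction(n_sunny_and_yes, n_yes)
print(p_sunny_given_yes)  # 2/9
```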


Example

Test Phase

Given a new instance,
x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)

Look up the tables:
P(Outlook=Sunny | Play=Yes) = 2/9
P(Outlook=Sunny | Play=No) = 3/5
P(Temperature=Cool | Play=Yes) = 3/9
P(Temperature=Cool | Play=No) = 1/5
P(Humidity=High | Play=Yes) = 3/9
P(Humidity=High | Play=No) = 4/5
P(Wind=Strong | Play=Yes) = 3/9
P(Wind=Strong | Play=No) = 3/5
P(Play=Yes) = 9/14
P(Play=No) = 5/14

MAP rule
P(Yes | x) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No | x)  ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206

Given that P(Yes | x) < P(No | x), we label x as No.
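A quick arithmetic check of those two numbers (a sketch, not from the slides):

```python
# Unnormalised posterior scores for the instance
# x = (Sunny, Cool, High, Strong), using the learned tables above.
score_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)
score_no = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)
print(round(score_yes, 4), round(score_no, 4))  # 0.0053 0.0206 -> predict Play=No
```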


Relevant Issues

Violation of the Independence Assumption

For many real-world tasks, P(X1, ..., Xn | C) ≠ P(X1 | C) ··· P(Xn | C)
Nevertheless, naïve Bayes works surprisingly well anyway!

Zero conditional probability problem

If no training example contains the attribute value Xj = ajk, then P̂(Xj = ajk | C = ci) = 0
In this circumstance, P̂(x1 | ci) ··· P̂(ajk | ci) ··· P̂(xn | ci) = 0 during test
As a remedy, conditional probabilities are estimated with
P̂(Xj = ajk | C = ci) = (nc + m p) / (n + m)
  nc: number of training examples for which Xj = ajk and C = ci
  n:  number of training examples for which C = ci
  p:  prior estimate (usually, p = 1/t for t possible values of Xj)
  m:  weight given to the prior (number of "virtual" examples, m ≥ 1)
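A minimal sketch of this smoothed estimate (the function name is illustrative, not from the slides):

```python
def m_estimate(n_c, n, t, m=1.0):
    """Smoothed conditional probability P(Xj = ajk | C = ci) = (nc + m*p) / (n + m),
    with prior estimate p = 1/t for t possible values of attribute Xj."""
    p = 1.0 / t
    return (n_c + m * p) / (n + m)

# Example: an attribute value never seen with a class (nc = 0) no longer gets probability 0.
print(m_estimate(n_c=0, n=9, t=3))  # (0 + 1/3) / 10 = 0.0333...
```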

Relevant Issues

Continuous-valued Input Attributes

Uncountably many possible values for an attribute
Conditional probability is modelled with the normal distribution:
P̂(Xj | C = ci) = (1 / (√(2π) σji)) exp( −(Xj − μji)² / (2 σji²) )
  μji: mean (average) of the attribute values Xj of examples for which C = ci
  σji: standard deviation of the attribute values Xj of examples for which C = ci

Learning Phase: for X = (X1, ..., Xn), C = c1, ..., cL
  Output: n × L normal distributions and P(C = ci), i = 1, ..., L

Test Phase: for an instance x = (x1, ..., xn)
  Calculate conditional probabilities with all the normal distributions
  Apply the MAP rule to make a decision
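A minimal sketch of the per-class, per-attribute Gaussian used in both phases, assuming the attribute values are given as a plain Python list (the function names are illustrative):

```python
import math

def fit_gaussian(values):
    """Learning phase for one attribute and one class: estimate mean and standard deviation."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return mu, sigma

def gaussian_pdf(x, mu, sigma):
    """Test phase: P(Xj = x | C = ci) under the fitted normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
```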

Conclusions

Naïve Bayes is based on the conditional independence assumption

Training is very easy and fast: it only requires considering each attribute in each class separately
Testing is straightforward: just look up tables or calculate conditional probabilities with the fitted normal distributions

A popular generative model

Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
Many successful applications, e.g., spam mail filtering
A good candidate as a base learner in ensemble learning
Apart from classification, naïve Bayes can do more
