
Naïve Bayes Classifier

Ke Chen

COMP24111 Machine Learning

Outline
Background
Probability Basics
Probabilistic Classification
Naïve Bayes
Example: Play Tennis
Relevant Issues
Conclusions

Background

There are three approaches to establishing a classifier:

a) Model a classification rule directly
   Examples: k-NN, decision trees, perceptron, SVM

b) Model the probability of class membership given the input data
   Example: perceptron with the cross-entropy cost

c) Make a probabilistic model of the data within each class
   Examples: naïve Bayes, model-based classifiers

a) and b) are examples of discriminative classification.

c) is an example of generative classification.

b) and c) are both examples of probabilistic classification.



Probability Basics

Prior, conditional and joint probability for random variables

Prior probability: P(X)
Conditional probability: P(X1 | X2), P(X2 | X1)
Joint probability: X = (X1, X2), P(X) = P(X1, X2)
Relationship: P(X1, X2) = P(X2 | X1) P(X1) = P(X1 | X2) P(X2)
Independence: P(X2 | X1) = P(X2), P(X1 | X2) = P(X1), P(X1, X2) = P(X1) P(X2)

Bayesian Rule

P(C | X) = P(X | C) P(C) / P(X)

i.e.  Posterior = (Likelihood × Prior) / Evidence
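As a quick illustration, the rule can be applied directly once the three quantities on the right-hand side are known. A minimal sketch with made-up numbers (not from the slides):

```python
# Hypothetical values for illustration only:
# suppose P(X|C) = 0.8 (likelihood), P(C) = 0.3 (prior), P(X) = 0.5 (evidence).
likelihood = 0.8
prior = 0.3
evidence = 0.5

posterior = likelihood * prior / evidence  # Bayes rule: P(C|X) = P(X|C)P(C) / P(X)
print(posterior)  # 0.48
```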

Probability Basics

Quiz: We have two six-sided dice. When they are rolled, the following events could occur: (A) die 1 lands on side 3, (B) die 2 lands on side 1, and (C) the two dice sum to eight. Answer the following questions:

1) P(A) = ?
2) P(B) = ?
3) P(C) = ?
4) P(A | B) = ?
5) P(C | A) = ?
6) P(A, B) = ?
7) P(A, C) = ?
8) Does P(A, C) equal P(A) P(C)?
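One way to check the answers is to enumerate all 36 equally likely outcomes; a minimal sketch (not part of the original slides):

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))  # all 36 (die1, die2) pairs

def prob(event):
    """Probability of an event given as a predicate over (die1, die2)."""
    return Fraction(sum(event(d1, d2) for d1, d2 in outcomes), len(outcomes))

A = lambda d1, d2: d1 == 3          # die 1 lands on side 3
B = lambda d1, d2: d2 == 1          # die 2 lands on side 1
C = lambda d1, d2: d1 + d2 == 8     # the two dice sum to eight

print(prob(A))                                  # 1/6
print(prob(C))                                  # 5/36
print(prob(lambda a, b: A(a, b) and C(a, b)))   # P(A, C) = 1/36, while P(A)P(C) = 5/216
```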

Probabilistic Classification

Establishing a probabilistic model for classification

Discriminative model: model the posterior P(C | X) directly, where C = c1, ..., cL and X = (X1, ..., Xn)

[Diagram: a discriminative probabilistic classifier takes an input x = (x1, x2, ..., xn) and outputs the posterior probabilities P(c1 | x), P(c2 | x), ..., P(cL | x).]

Probabilistic Classification

Establishing a probabilistic model for classification (cont.)

Generative model: model the class-conditional probabilities P(X | C), where C = c1, ..., cL and X = (X1, ..., Xn)

[Diagram: one generative probabilistic model per class. For an input x = (x1, x2, ..., xn), the model for Class 1 produces P(x | c1), the model for Class 2 produces P(x | c2), ..., and the model for Class L produces P(x | cL).]

Probabilistic Classification

MAP classification rule

MAP: Maximum A Posteriori
Assign x to c* if
P(C = c* | X = x) > P(C = c | X = x),  c ≠ c*,  c = c1, ..., cL

Generative classification with the MAP rule

Apply the Bayesian rule to convert class-conditional probabilities into posterior probabilities:
P(C = ci | X = x) = P(X = x | C = ci) P(C = ci) / P(X = x)
                  ∝ P(X = x | C = ci) P(C = ci)   for i = 1, 2, ..., L
Then apply the MAP rule.

Naïve Bayes

Bayes classification
P(C | X) ∝ P(X | C) P(C) = P(X1, ..., Xn | C) P(C)
Difficulty: learning the joint probability P(X1, ..., Xn | C)

Naïve Bayes classification

Assumption: all input attributes are conditionally independent given the class!
P(X1, X2, ..., Xn | C) = P(X1 | X2, ..., Xn; C) P(X2, ..., Xn | C)
                       = P(X1 | C) P(X2, ..., Xn | C)
                       = P(X1 | C) P(X2 | C) ··· P(Xn | C)

MAP classification rule: for x = (x1, x2, ..., xn), assign x to c* if

[P(x1 | c*) ··· P(xn | c*)] P(c*) > [P(x1 | c) ··· P(xn | c)] P(c),  c ≠ c*,  c = c1, ..., cL

Naïve Bayes

Naïve Bayes Algorithm (for discrete input attributes)

Learning Phase: Given a training set S,
  For each target value ci (ci = c1, ..., cL)
    P̂(C = ci) ← estimate P(C = ci) with examples in S;
    For every attribute value xjk of each attribute Xj (j = 1, ..., n; k = 1, ..., Nj)
      P̂(Xj = xjk | C = ci) ← estimate P(Xj = xjk | C = ci) with examples in S;
  Output: conditional probability tables; for each Xj, Nj × L elements

Test Phase: Given an unknown instance x = (a1, ..., an),
  look up the tables to assign the label c* to x if
  [P̂(a1 | c*) ··· P̂(an | c*)] P̂(c*) > [P̂(a1 | c) ··· P̂(an | c)] P̂(c),  c ≠ c*,  c = c1, ..., cL
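A minimal sketch of both phases in Python, assuming the training set is given as a list of (attribute-value dict, label) pairs; the function and variable names below are illustrative, not from the slides:

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Learning phase: estimate P(C = ci) and P(Xj = xjk | C = ci) by relative frequency."""
    class_counts = Counter(label for _, label in examples)
    cond_counts = defaultdict(Counter)  # (attribute, class) -> Counter over attribute values
    for attrs, label in examples:
        for attr, value in attrs.items():
            cond_counts[(attr, label)][value] += 1
    priors = {c: n / len(examples) for c, n in class_counts.items()}
    cond_probs = {key: {v: n / class_counts[key[1]] for v, n in counter.items()}
                  for key, counter in cond_counts.items()}
    return priors, cond_probs

def classify(instance, priors, cond_probs):
    """Test phase: apply the MAP rule using the looked-up probabilities."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attr, value in instance.items():
            score *= cond_probs.get((attr, c), {}).get(value, 0.0)
        scores[c] = score
    return max(scores, key=scores.get)
```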

Example

Example: Play Tennis

[Training data table: 14 example days described by Outlook, Temperature, Humidity and Wind, each labelled Play=Yes or Play=No.]


Example

Learning Phase

Outlook      | Play=Yes | Play=No
Sunny        |   2/9    |   3/5
Overcast     |   4/9    |   0/5
Rain         |   3/9    |   2/5

Temperature  | Play=Yes | Play=No
Hot          |   2/9    |   2/5
Mild         |   4/9    |   2/5
Cool         |   3/9    |   1/5

Humidity     | Play=Yes | Play=No
High         |   3/9    |   4/5
Normal       |   6/9    |   1/5

Wind         | Play=Yes | Play=No
Strong       |   3/9    |   3/5
Weak         |   6/9    |   2/5

P(Play=Yes) = 9/14
P(Play=No) = 5/14
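Each table entry is a relative-frequency estimate, e.g. P(Outlook=Sunny | Play=Yes) = (number of Yes days that are Sunny) / (number of Yes days) = 2/9. A small sketch of that computation, with the counts read off the fractions above:

```python
from fractions import Fraction

# Counts implied by the table: 9 Play=Yes days, of which 2 have Outlook=Sunny.
n_yes = 9
n_sunny_and_yes = 2

p_sunny_given_yes = Fraction(n_sunny_and_yes, n_yes)
print(p_sunny_given_yes)  # 2/9
```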


Example

Test Phase

Given a new instance,
x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)

Look up the tables:
P(Outlook=Sunny | Play=Yes) = 2/9
P(Outlook=Sunny | Play=No) = 3/5
P(Temperature=Cool | Play=Yes) = 3/9
P(Temperature=Cool | Play=No) = 1/5
P(Humidity=High | Play=Yes) = 3/9
P(Humidity=High | Play=No) = 4/5
P(Wind=Strong | Play=Yes) = 3/9
P(Wind=Strong | Play=No) = 3/5
P(Play=Yes) = 9/14
P(Play=No) = 5/14

MAP rule
P(Yes | x) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No | x)  ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206

Given that P(Yes | x) < P(No | x), we label x as No.
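A quick arithmetic check of those two numbers (a sketch, not from the slides):

```python
# Unnormalised posterior scores for the instance
# x = (Sunny, Cool, High, Strong), using the learned tables above.
score_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)
score_no = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)
print(round(score_yes, 4), round(score_no, 4))  # 0.0053 0.0206 -> predict Play=No
```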


Relevant Issues

Violation of the Independence Assumption

For many real-world tasks, P(X1, ..., Xn | C) ≠ P(X1 | C) ··· P(Xn | C)
Nevertheless, naïve Bayes works surprisingly well anyway!

Zero conditional probability problem

If no training example contains the attribute value Xj = ajk, then P̂(Xj = ajk | C = ci) = 0
In this circumstance, P̂(x1 | ci) ··· P̂(ajk | ci) ··· P̂(xn | ci) = 0 during test
As a remedy, conditional probabilities are estimated with
P̂(Xj = ajk | C = ci) = (nc + m p) / (n + m)
  nc: number of training examples for which Xj = ajk and C = ci
  n:  number of training examples for which C = ci
  p:  prior estimate (usually, p = 1/t for t possible values of Xj)
  m:  weight given to the prior (number of "virtual" examples, m ≥ 1)
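A minimal sketch of this smoothed estimate (the function name is illustrative, not from the slides):

```python
def m_estimate(n_c, n, t, m=1.0):
    """Smoothed conditional probability P(Xj = ajk | C = ci) = (nc + m*p) / (n + m),
    with prior estimate p = 1/t for t possible values of attribute Xj."""
    p = 1.0 / t
    return (n_c + m * p) / (n + m)

# Example: an attribute value never seen with a class (nc = 0) no longer gets probability 0.
print(m_estimate(n_c=0, n=9, t=3))  # (0 + 1/3) / 10 = 0.0333...
```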

Relevant Issues

Continuous-valued Input Attributes

Uncountably many possible values for an attribute
Conditional probability is modelled with the normal distribution:
P̂(Xj | C = ci) = (1 / (√(2π) σji)) exp( −(Xj − μji)² / (2 σji²) )
  μji: mean (average) of the attribute values Xj of examples for which C = ci
  σji: standard deviation of the attribute values Xj of examples for which C = ci

Learning Phase: for X = (X1, ..., Xn), C = c1, ..., cL
  Output: n × L normal distributions and P(C = ci), i = 1, ..., L

Test Phase: for an instance x = (x1, ..., xn)
  Calculate conditional probabilities with all the normal distributions
  Apply the MAP rule to make a decision
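A minimal sketch of the per-class, per-attribute Gaussian used in both phases, assuming the attribute values are given as a plain Python list (the function names are illustrative):

```python
import math

def fit_gaussian(values):
    """Learning phase for one attribute and one class: estimate mean and standard deviation."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return mu, sigma

def gaussian_pdf(x, mu, sigma):
    """Test phase: P(Xj = x | C = ci) under the fitted normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
```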

Conclusions

Naïve Bayes is based on the conditional independence assumption

Training is very easy and fast: it only requires considering each attribute in each class separately
Testing is straightforward: just look up tables or calculate conditional probabilities with the fitted normal distributions

A popular generative model

Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
Many successful applications, e.g., spam mail filtering
A good candidate as a base learner in ensemble learning
Apart from classification, naïve Bayes can do more
