Professional Documents
Culture Documents
Naïve Bayes
Source: Lecture Notes for Chapter 4 &5 from
Introduction to Data Mining, by Tan, Steinbach, Kumar
Classification: Definition
• Given a collection of records (training
set )
– Each record contains a set of attributes, one of the
attributes is the class.
• Find a model for class attribute as a
function of the values of other attributes.
• Goal: previously unseen records should
be assigned a class as accurately as
possible.
– A test set is used to determine the accuracy of the model. Usually,
the given data set is divided into training and test sets, with training
set used to build the model and test set used to validate it.
Illustrating Classification Task
Examples of Classification Task
• Predicting tumor cells as benign or malignant
• Bayes theorem:
Example of Bayes Theorem
• Given:
– A doctor knows that meningitis causes stiff neck 50% of the
time
– Prior probability of any patient having meningitis is 1/50,000
– Prior probability of any patient having stiff neck is 1/20
● P(X|Class=No) = P(Refund=No|Class=No)
× P(Married| Class=No)
× P(Income=120K| Class=No)
= 4/7 × 4/7 × 0.0072 = 0.0024
library(e1071)
library(caTools)
iris$spl=sample.split(iris,SplitRatio=0.7)
# By using the sample.split() we are creating a vector with values TRUE and FALSE
and by setting
#the SplitRatio to 0.7, we are splitting the original Iris dataset of 150 rows to 70%
training
#and 30% testing data.
train=subset(iris, iris$spl==TRUE) #the subset of iris dataset for which spl==TRUE
test=subset(iris, iris$spl==FALSE)