
Supervised Machine Learning -

Classification
Chapter # 6

Business Analytics Using R - A Practical Approach by Dr. Umesh R. Hodeghatta & Umesha Nayak
Introduction
Classification and prediction are two important methods of data
analysis used to find patterns in data.

Classification predicts the categorical class (or discrete values), whereas regression and other models predict continuous-valued functions.

This chapter focuses on basic classification techniques. It explains some classification methods, including naïve Bayes, decision trees, and other algorithms. It also provides examples using R packages available to perform the classification tasks.

6.1 What Is Classification? What Is Prediction?
1. Classification is a two-step process. In the first step, a
model is constructed by analyzing the database and
the set of attributes that define the class variable.
2. A classification problem is a supervised machine-
learning problem. The training data is a sample from
the database, and the class attribute is already known.
3. In a classification problem, the class of a
categorical variable Y is determined by a set of input
variables {x1, x2, x3, …}. The variable
we would like to predict is typically called the class
variable C, and it can take values from the set
{c1, c2, c3, …}.
4. The observed or measured variables X1, X2, … Xn are
the attributes, or input variables, also called
explanatory variables.
5. In classification, we want to determine the relationship
between the Class variable and the inputs, or
explanatory variables.
6. Typically, models represent the classification rules or
mathematical formulas. Once these rules are created
by the learning model, this model can be used to
predict the class of future data for which the class is
unknown.
6.2 Probabilistic Models for Classification
● Probabilistic classifiers, and in particular the naïve Bayes classifier, are among the
most popular classifiers used by the machine-learning community.
● The naïve Bayes classifier is a simple probabilistic classifier based on Bayes’
theorem and is widely applied in natural language processing and image
processing. It is one of the most basic classification techniques, with
applications such as e-mail spam detection, e-mail sorting, and sentiment
analysis.
● Even though naïve Bayes is a simple technique, it provides good performance
in many complex real-world problems.
● Probabilistic classification is based on approximating the
joint distribution under an assumption of independence and then decomposing
this probability into a product of conditional probabilities. The conditional
probability of event A, given event B—denoted by P(A|B)—represents the
chance of event A occurring, given that event B also occurs.
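
Bayes’ theorem, on which the classifier is built, relates the two conditional probabilities:

P(A|B) = P(B|A) × P(A) / P(B)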
6.2 Probabilistic Models for Classification
Let C1 correspond to the class Approved, and C2 correspond to class Denied. Using
the naïve Bayes classifier, we want to classify an unknown label sample X:

X = (Age >40, Purchase Frequency = Medium, Credit Rating = Excellent)

To classify a record, first compute the chance of the record belonging to each of the
classes by computing P(Ci|X1, X2, …, Xp) from the training records.

Then classify it into the class with the highest probability. In this example, there
are two classes, so we need to compute P(X|C1)P(C1) and P(X|C2)P(C2).
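
Under the naïve assumption that the attributes are conditionally independent given the class, P(X|Ci) factorizes into a product of per-attribute conditional probabilities estimated from the training data:

P(X|Ci) = P(Age > 40 | Ci) × P(Purchase Frequency = Medium | Ci) × P(Credit Rating = Excellent | Ci)

The record X is then assigned to the class Ci with the larger value of P(X|Ci)P(Ci).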

6.2.2 Naïve Bayes Classifier Using R
● Let’s try building the model by using R. We’ll use the same example. The data sample sets have the
attributes Age, Purchase Frequency, and Credit Rating.
● The class label attribute has two distinct classes: Approved or Denied.
● The objective is to predict the class label for the new sample, where Age > 40, Purchase Frequency =
Medium, Credit_Rating = Excellent.

Step no. 1: Install “e1071” package (one time only)
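
The install command itself (the slide screenshot is not reproduced here) is the standard one-time call:

install.packages("e1071")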

6.2.2 Naïve Bayes Classifier Using R
Step no. 2: Load “e1071” package
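
The corresponding command is:

library(e1071)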

6.2.2 Naïve Bayes Classifier Using R
Step no. 3: Generate naiveBayes model

nbmodel <- naiveBayes(Approval ~ PurchaseFrequency + CreditRating + Age, data = credit_rating_data)

Here the naiveBayes() function is applied to build the model (nbmodel): Approval is the classification variable, PurchaseFrequency, CreditRating, and Age are the input variables, and credit_rating_data is the training data set.
6.2.2 Naïve Bayes Classifier Using R
Step no. 4: View NaiveBayes Model
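
The slide screenshot is not reproduced here; a minimal way to inspect the fitted model is to print it, which shows the a-priori class probabilities and the conditional probabilities for each attribute:

print(nbmodel)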

6.2.2 Naïve Bayes Classifier Using R
Step no. 5: Load Data to be predicted
creditratingpredicted <- read.csv("CreditRating_to_be_predicted.csv")

Step no. 6: View Test Data
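
A minimal sketch for this step, assuming the CSV loaded in step 5:

head(creditratingpredicted)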

6.2.2 Naïve Bayes Classifier Using R
Step no. 7: Predict output of test data

The predict() function is called with the model name and the name of the test data set, and it returns the predictions.
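
A minimal sketch of this step, assuming the model and test data objects created above (the result name nbpredictions is illustrative):

# Predict the class label for each record in the test data
nbpredictions <- predict(nbmodel, creditratingpredicted)
nbpredictions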

6.2.3 Advantages and Limitations of the Naïve Bayes Classifier

● The naïve Bayes classifier is the simplest classifier of all.


● It is computationally efficient and gives the best
performance when the underlying assumption of
independence is true. The more records you have, the better
the performance of naïve Bayes.

● The main problem with the naïve Bayes classifier is that the
classification depends on posterior probabilities: when a predictor category
is not present in the training data for a class, the model assigns it zero
conditional probability, which drives the entire posterior for that class to
zero (a common remedy is sketched below).
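
One common way to mitigate this zero-frequency problem in the e1071 implementation is Laplace smoothing via the laplace argument; a sketch using the same training data:

# laplace = 1 adds one pseudo-count to every attribute value/class combination
nbmodel_smoothed <- naiveBayes(Approval ~ PurchaseFrequency + CreditRating + Age,
                               data = credit_rating_data, laplace = 1)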
6.3 Decision Trees
● A decision tree builds a classification model by using a tree structure.
● A decision tree structure consists of a root node, branches, and leaf nodes. Leaf nodes hold the class labels, each branch denotes the outcome of a decision-tree test, and the internal nodes denote the decision points.

6.3.1 Recursive Partitioning Decision-Tree Algorithm

The basic strategy for building a decision tree is a recursive divide-and-conquer approach. The following are the steps involved:
1. The tree starts as a single node from the training set.
2. The node attribute or decision attribute is selected based on the information
gain, or entropy measure.
3. A branch is created for each known value of the test attribute.
4. This process continues until all the attributes are considered for decision.

The tree stops when the following occur:


○ All the samples belong to the same class.
○ No more attributes remain in the samples for further partitioning.
○ There are no samples for the attribute branch.

6.3.2 Information Gain
● In order to select the decision-tree node and attribute to split the tree, we measure the information
provided by that attribute.
● Such a measure is referred to as a measure of the goodness of split.
● The attribute with the highest information gain is chosen as the test attribute for the node to split.
● This attribute minimizes the information needed to classify the samples in the recursive partition
nodes. This approach of splitting minimizes the number of tests needed to classify an object and
guarantees that a simple tree is formed.
● Many algorithms use entropy to calculate the homogeneity of a sample, as illustrated with the sample data below.

ERPNO   AGE   GENDER   SMOKER
13141   25    M        Y
13142   24    F        N
13143   23    F        N
13144   25    F        N
13145   26    M        Y
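
For example, the entropy of the SMOKER attribute in this sample (2 Y and 3 N out of 5 records) can be computed with a short R sketch:

# class proportions of Y and N in the SMOKER column
p <- c(2/5, 3/5)
# entropy = -sum(p * log2(p)) ≈ 0.971 bits
entropy <- -sum(p * log2(p))
entropy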

6.3.3 Example of a Decision Tree

6.3.4 Induction of a Decision Tree
● In this example, CreditRating has the highest information gain, so it is used as the root node and branches are grown for
each attribute value.
● The next branch node is chosen from the remaining two attributes, Age and PurchaseFrequency. Both have almost the
same information gain, so either can be used as the split node for the branch; here Age is taken as the split node.
● The rest of the branches are partitioned with the remaining samples. For Age < 35, the decision is clear, whereas for the
other Age category the PurchaseFrequency attribute has to be examined before making the loan-approval decision. This
involves calculating the information gain for the remaining samples and identifying the next split.

6.3.5 Decision Tree classifier using rpart (recursive partitioning)

Step no. 1: Install rpart.plot and load the rpart and rpart.plot packages
install.packages("rpart.plot")

6.3.5 Decision Tree classifier using rpart (recursive partitioning)

Step no. 2: Create Decision Tree Model

treemodel <- rpart(Approval ~ PurchaseFrequency + CreditRating + Age, data = credit_rating_data, method = "class", minsplit = 2)

Here the rpart() function is applied to build the model (treemodel): Approval is the classification variable, PurchaseFrequency, CreditRating, and Age are the input variables, credit_rating_data is the data set, method = "class" specifies classification, and minsplit = 2 means a node must contain at least two observations before a split is attempted.

6.3.5 Decision Tree classifier using rpart (recursive partitioning)

Step no. 3: View the decision tree model
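
A minimal sketch for this step: printing the fitted object shows the split rules in text form, and rpart.plot() from the rpart.plot package draws the tree:

print(treemodel)
rpart.plot(treemodel)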

6.4.1 K-Nearest Neighbor
