
Decision Tree Classifier

Classification and Prediction


The Classification Problem
(informal definition)

Given a collection of annotated data (in this case, five instances of Katydids and five of Grasshoppers), decide what type of insect an unlabeled example is.

Katydid or Grasshopper?
For any domain of interest, we can measure features:

• Color {Green, Brown, Gray, Other}
• Has Wings?
• Abdomen Length
• Thorax Length
• Antennae Length
• Mandible Size
• Spiracle Diameter
• Leg Length
My_Collection

We can store features in a database. The classification problem can now be expressed as:

• Given a training database (My_Collection), predict the class label of a previously unseen instance.

Insect ID  Abdomen Length  Antennae Length  Insect Class
1          2.7             5.5              Grasshopper
2          8.0             9.1              Katydid
3          0.9             4.7              Grasshopper
4          1.1             3.1              Grasshopper
5          5.4             8.5              Katydid
6          2.9             1.9              Grasshopper
7          6.1             6.6              Katydid
8          0.5             1.0              Grasshopper
9          8.3             6.6              Katydid
10         8.1             4.7              Katydid
11         5.1             7.0              ???????   (previously unseen instance)

[Scatter plot: Antenna Length (y-axis, 1-10) vs. Abdomen Length (x-axis, 1-10), with the Grasshopper and Katydid instances plotted as two groups]
We will also use this larger dataset as a motivating example…

[Scatter plot: Antenna Length vs. Abdomen Length for the larger Grasshopper/Katydid dataset]

Each of these data objects is called…
• an exemplar
• a (training) example
• an instance
• a tuple
Decision Tree Classifier (Ross Quinlan)

[Scatter plot of Antenna Length vs. Abdomen Length, partitioned by the axis-parallel splits below]

Abdomen Length > 7.1?
  no  → Antenna Length > 6.0?
          no  → Grasshopper
          yes → Katydid
  yes → Katydid
Decision trees predate computers. A biological key for insects:

Antennae shorter than body?
  Yes → Grasshopper
  No  → 3 Tarsi?
          No  → Cricket
          Yes → Foretiba has ears?
                  Yes → Katydids
                  No  → Camel Cricket

Decision Tree Classification
• Decision tree
– A flow-chart-like tree structure
– Internal node denotes a test on an attribute
– Branch represents an outcome of the test
– Leaf nodes represent class labels or class distribution
• Decision tree generation consists of two phases
– Tree construction
• At start, all the training examples are at the root
• Partition examples recursively based on selected attributes
– Tree pruning
• Identify and remove branches that reflect noise or outliers
• Use of decision tree: Classifying an unknown sample
– Test the attribute values of the sample against the decision tree
How do we construct the decision tree?
• Basic algorithm (a greedy algorithm)
– Tree is constructed in a top-down recursive divide-and-conquer manner
– At start, all the training examples are at the root
– Attributes are categorical (if continuous-valued, they can be discretized
in advance)
– Examples are partitioned recursively based on selected attributes.
– Test attributes are selected on the basis of a heuristic or statistical
measure (e.g., information gain)
• Conditions for stopping partitioning
– All samples for a given node belong to the same class
– There are no remaining attributes for further partitioning – majority
voting is employed for classifying the leaf
– There are no samples left
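The greedy procedure above can be sketched in a few lines of Python, assuming categorical attributes and entropy-based information gain as the selection measure. All names here (`build_tree`, `info_gain`) are illustrative, not from any library:

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Impurity of a set of class labels.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # Expected reduction in entropy from partitioning on attr.
    total = entropy(labels)
    n = len(rows)
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        total -= (len(subset) / n) * entropy(subset)
    return total

def build_tree(rows, labels, attrs):
    # Stopping conditions from the slide: a pure node, or no
    # attributes left (fall back to majority voting).
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute with the highest gain, then
    # recurse on each partition (divide and conquer).
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node[(best, value)] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best])
    return node
```

Internal nodes are dicts keyed by (attribute, value) pairs; leaves are class labels.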
Person  Hair Length  Weight  Age  Class
Homer   0”           250     36   M
Marge   10”          150     34   F
Bart    2”           90      10   M
Lisa    6”           78      8    F
Maggie  4”           20      1    F
Abe     1”           170     70   M
Selma   8”           160     41   F
Otto    10”          180     38   M
Krusty  6”           200     45   M

Comic   8”           290     38   ?
Of the 3 features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified… So we simply recurse! This time we find that we can split on Hair Length, and we are done!

We don't need to keep the data around, just the test conditions:

Weight <= 160?
  yes → Hair Length <= 2?
          yes → Male
          no  → Female
  no  → Male

How would these people be classified?
It is trivial to convert Decision Trees to rules…

Weight <= 160?
  yes → Hair Length <= 2?
          yes → Male
          no  → Female
  no  → Male

Rules to Classify Males/Females

If Weight greater than 160, classify as Male


Elseif Hair Length less than or equal to 2, classify as Male
Else classify as Female
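The rule set above maps directly onto code; the thresholds 160 and 2 come from the worked example:

```python
# The Male/Female rules, written as a plain if/elif/else chain.
def classify(weight, hair_length):
    if weight > 160:
        return "Male"
    elif hair_length <= 2:
        return "Male"
    else:
        return "Female"
```

Applying it to the unlabeled record (Comic, 8”, 290 lb) gives "Male".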
The worked examples we have seen were performed on small datasets. However, with small datasets there is a great danger of overfitting the data…

When you have few datapoints, there are many possible splitting rules that perfectly classify the data, but will not generalize to future datasets.

Wears green?
  Yes → Female
  No  → Male

For example, the rule "Wears green?" perfectly classifies the data, but so does "Mother's name is Jacqueline?", and so does "Has blue shoes"…
Avoid Overfitting in Classification
• The generated tree may overfit the training data
– Too many branches, some may reflect anomalies due to
noise or outliers
– Result is in poor accuracy for unseen samples
• Two approaches to avoid overfitting
– Prepruning: Halt tree construction early—do not split a
node if this would result in the goodness measure falling
below a threshold
• Difficult to choose an appropriate threshold
– Postpruning: Remove branches from a “fully grown”
tree—get a sequence of progressively pruned trees
• Use a set of data different from the training data to
decide which is the “best pruned tree”
Advantages/Disadvantages of Decision Trees

• Advantages:
– Easy to understand (Doctors love them!)
– Easy to generate rules
• Disadvantages:
– May suffer from overfitting.
– Classifies by rectangular partitioning (so does
not handle correlated features very well).
– Can be quite large – pruning is necessary.
– Does not handle streaming data easily.
Classification and Prediction

• What is classification? What is regression?


• Issues regarding classification and prediction
• Classification by decision tree induction
• Scalable decision tree induction
Classification
• Definition: learning a target function f that maps each attribute set X to one of the predefined class labels y.
• Input: a collection of records.
• Record: a tuple (X, y), where X is the attribute set and y is the class label.
• Type of attribute: must be discrete. If continuous, it should be converted to discrete.
5/14/2019
Classification vs. Prediction
• Classification:
– predicts categorical class labels
– classifies data (constructs a model) based on the training set and
the values (class labels) in a classifying attribute and uses it in
classifying new data
• Regression:
– models continuous-valued functions, i.e., predicts unknown or
missing values
• Typical Applications
– credit approval
– target marketing
– medical diagnosis
– treatment effectiveness analysis

Why Classification? A motivating application
• Credit approval
– A bank wants to classify its customers based on whether they are
expected to pay back their approved loans
– The history of past customers is used to train the classifier
– The classifier provides rules, which identify potentially reliable
future customers
– Classification rule:
• If age = “31...40” and income = high then credit_rating = excellent
– Future customers
• Paul: age = 35, income = high → excellent credit rating
• John: age = 20, income = medium → fair credit rating
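The classification rule above can be written as a function. The "fair" default for customers the rule does not cover is an assumption, generalized from John's outcome on the slide:

```python
# Credit-approval rule: if age = "31...40" and income = high,
# then credit_rating = excellent; otherwise fair (assumed default).
def credit_rating(age, income):
    if 31 <= age <= 40 and income == "high":
        return "excellent"
    return "fair"
```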

Descriptive and Predictive modeling

• Descriptive: A model that can be used as


an explanatory tool to distinguish between
objects of different classes.
• Predictive: A model that can be used to
predict the class label of unknown records.

Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
– Each tuple/sample is assumed to belong to a predefined class, as
determined by the class label attribute
– The set of tuples used for model construction: training set
– The model is represented as classification rules, decision trees, or
mathematical formulae
• Model usage: for classifying future or unknown objects
– Estimate accuracy of the model
• The known label of test samples is compared with the classified result
from the model
• Accuracy rate is the percentage of test set samples that are correctly
classified by the model
• Test set is independent of training set, otherwise over-fitting will
occur

Classification Process (1): Model Construction

Training Data → Classification Algorithms → Classifier (Model)

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’
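The learned rule can be checked against the training data directly; a minimal sketch (the `tenured` helper is illustrative):

```python
# The rule from the slide: rank = 'professor' OR years > 6.
def tenured(rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"

training = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]
# The rule reproduces every label in the training set.
for name, rank, years, label in training:
    assert tenured(rank, years) == label
```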
Classification Process (2): Use the Model in Prediction

• Accuracy = ?

Testing Data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Mellisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen Data: (Jeff, Professor, 4) → Tenured?
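Accuracy is the fraction of test samples whose predicted label matches the known label. Here the rule learned in step (1) misclassifies Mellisa (Associate Prof, 7 years, not tenured), giving 3/4 = 75%:

```python
# The rule from model construction: rank = 'professor' OR years > 6.
def tenured(rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"

test_set = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Mellisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]
correct = sum(tenured(r, y) == label for _, r, y, label in test_set)
accuracy = correct / len(test_set)  # 0.75
```

For the unseen record (Jeff, Professor, 4), the model predicts tenured = "yes".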
Ways of attribute splitting

(a) Multi-way split on a categorical attribute (Marital status): {single}, {married}, {divorced}

(b) Binary split on a categorical attribute (Marital status): {Married, Divorced} vs. {single}

(c) Split on a continuous attribute (Annual income): {<10K}, {>10K, <25K}, {>25K, <50K}
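As noted earlier, a continuous attribute must be discretized before categorical splitting. A minimal sketch using the income ranges in (c); the top bin and the boundary handling are assumptions, since the slide does not specify them:

```python
from bisect import bisect_right

edges = [10_000, 25_000, 50_000]
bins = ["<10K", "10K-25K", "25K-50K", ">=50K"]

def income_bin(income):
    # bisect_right places a value exactly on an edge into the higher
    # bin; this boundary choice is an assumption.
    return bins[bisect_right(edges, income)]
```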
Training Dataset

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no
Output: A Decision Tree for “buys_computer”

age?
  <=30   → student?
             no  → no
             yes → yes
  31..40 → yes
  >40    → credit rating?
             excellent → no
             fair      → yes
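Why does age sit at the root? Computing the information gain of each attribute on the training dataset shows it gives the largest reduction in entropy:

```python
from collections import Counter
from math import log2

data = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30", "high", "no", "fair", "no"), ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"), (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"), (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"), (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"), (">40", "medium", "no", "excellent", "no"),
]
attrs = ["age", "income", "student", "credit_rating"]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(col):
    # Information gain of splitting the dataset on column col.
    labels = [row[-1] for row in data]
    g = entropy(labels)
    for v in set(row[col] for row in data):
        sub = [row[-1] for row in data if row[col] == v]
        g -= len(sub) / len(data) * entropy(sub)
    return g

best_attr = attrs[max(range(4), key=gain)]  # "age"
```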
