
DAT630


Classification

Basic Concepts, Decision Trees, and Model Evaluation
Introduction to Data Mining, Chapter 4

08/09/2015

Krisztian Balog | University of Stavanger

Classification

- Classification is the task of assigning objects to one of several predefined categories
- Examples
  - Credit card transactions: legitimate or fraudulent?
  - Emails: SPAM or not?
  - Patients: high or low risk?
  - Astronomy: star, galaxy, nebula, etc.
  - News stories: finance, weather, entertainment, sports, etc.

Why?

- Descriptive modeling
  - Explanatory tool to distinguish between objects of different classes
- Predictive modeling
  - Predict the class label of previously unseen records
  - Automatically assign a class label when presented with the attributes of the record

The task

Attribute set (x) -> Classification Model -> Class label (y)

- Input is a collection of records (instances)
- Each record is characterized by a tuple (x, y)
  - x is the attribute set
  - y is the class label (category or target attribute)
- Classification is the task of learning a target function f (classification model) that maps each attribute set x to one of the predefined class labels y
General approach

Induction: a learning algorithm learns a model from records whose class labels are known (the training set).
Deduction: the model is applied to records with unknown class labels (the test set).

Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Objectives for Learning Alg.

- Should fit the input data (the training set) well
- Should correctly predict class labels for unseen data (the test set)

Learning Algorithms

- Decision trees
- Rule-based
- Naive Bayes
- Support Vector Machines
- Random forests
- k-nearest neighbors
- …

Machine Learning vs. Data Mining

- Similar techniques, but different goal
- Machine learning is focused on developing and designing learning algorithms
  - More abstract, e.g., features are given
- Data Mining is applied Machine Learning
  - Performed by a person who has a goal in mind and uses Machine Learning techniques on a specific dataset
  - Much of the work is concerned with data (pre)processing and feature engineering
Today

- Decision trees
- Binary class labels
  - Positive or Negative
- The learning algorithm should fit the input data well and correctly predict class labels for unseen data. How to measure this?

Evaluation

- Measuring the performance of a classifier
- Based on the number of records correctly and incorrectly predicted by the model
- Counts are tabulated in a table called the confusion matrix
- Compute various performance metrics based on this matrix

Confusion Matrix

                        Predicted class
                        Positive               Negative
Actual    Positive      True Positives (TP)    False Negatives (FN)
class     Negative      False Positives (FP)   True Negatives (TN)

Confusion Matrix Example

"Is the man innocent?"

                        Predicted class
                        Convicted              Freed
Actual    Guilty        True Positive          False Negative
class     Innocent      False Positive         True Negative

Type I Error (False Positive): convicting an innocent person, i.e., raising a false alarm (miscarriage of justice)
Type II Error (False Negative): letting a guilty person go free, i.e., failing to raise an alarm (error of impunity)
Evaluation Metrics

- Summarizing performance in a single number
- Accuracy
  Accuracy = Number of correct predictions / Total number of predictions = (TP + TN) / (TP + FP + TN + FN)
- Error rate
  Error rate = Number of wrong predictions / Total number of predictions = (FP + FN) / (TP + FP + TN + FN)
- We seek high accuracy, or equivalently, low error rate

Exercise

- Create the confusion matrix
- Compute Accuracy and Error rate
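The two metrics can be sketched in a few lines of Python; the example counts below are illustrative, not from the slides:

```python
# Accuracy and error rate from confusion-matrix counts.
def accuracy(tp, fp, tn, fn):
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + fp + tn + fn)

def error_rate(tp, fp, tn, fn):
    """Fraction of wrong predictions; equals 1 - accuracy."""
    return (fp + fn) / (tp + fp + tn + fn)

# Example counts: TP=40, FP=5, TN=45, FN=10 (100 predictions total)
print(accuracy(40, 5, 45, 10))    # 0.85
print(error_rate(40, 5, 45, 10))  # 0.15
```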

Decision Trees: Motivational Example

How does it work?

- Ask a series of questions about the attributes of the test record
- Each time we receive an answer, a follow-up question is asked, until we reach a conclusion about the class label of the record
Decision Tree Model

The general approach instantiated with decision trees: the learning algorithm induces a decision tree from the training set, and the tree is then applied to the test set.

Decision Tree

- Root node: no incoming edges, zero or more outgoing edges
- Internal node: exactly one incoming edge, two or more outgoing edges

Decision Tree Example

Training Data (two categorical attributes, one continuous attribute, and the class):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc):

Refund?
  Yes -> NO
  No  -> MarSt?
           Married -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

- Leaf (or terminal) nodes have exactly one incoming edge and no outgoing edges

Another Example

An alternative tree that fits the same training data:

MarSt?
  Married -> NO
  Single, Divorced -> Refund?
                        Yes -> NO
                        No  -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

There could be more than one tree that fits the same data!

Apply Model to Test Data

Test record:

Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree:
- Refund = No: follow the No branch to MarSt
- Marital Status = Married: follow the Married branch
- We reach the leaf NO: assign Cheat to "No"
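The walkthrough above can be mirrored in code. This is the example tree hand-coded as nested conditionals, a sketch rather than a general tree implementation:

```python
# The example tree (Refund -> MarSt -> TaxInc) as nested conditionals.
def predict_cheat(refund, marital_status, taxable_income):
    if refund == "Yes":
        return "No"                 # Refund = Yes: leaf NO
    if marital_status == "Married":
        return "No"                 # Married: leaf NO
    # Single or Divorced: split on taxable income at 80K
    return "No" if taxable_income < 80_000 else "Yes"

# The test record above: Refund = No, Married, 80K
print(predict_cheat("No", "Married", 80_000))  # No
```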

Decision Tree Induction

- There are exponentially many decision trees that can be constructed from a given set of attributes
- Finding the optimal tree is computationally infeasible (NP-hard)
- Greedy strategies are used
  - Grow a decision tree by making a series of locally optimal decisions about which attribute to use for splitting the data

Hunt's algorithm

- Let Dt be the set of training records that reach a node t, and y = {y1, …, yc} the class labels
- General Procedure
  - If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt
  - If Dt is an empty set, then t is a leaf node labeled by the default class, yd
  - If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets. Recursively apply the procedure to each subset.
- Example (the Refund / Marital Status / Taxable Income data): starting from a single node, the tree is grown by splitting first on Refund, then on Marital Status, and finally on Taxable Income (< 80K vs. >= 80K), until the Cheat and Don't Cheat records are separated
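The general procedure can be sketched recursively. Here `choose_split` is a hypothetical helper standing in for the attribute-test selection discussed in the next slides; it is assumed to return a function mapping a record to a branch key, or None when no further split is useful:

```python
# A minimal sketch of Hunt's algorithm. Records are (attributes, label) pairs.
from collections import Counter

def hunt(records, choose_split, default="No"):
    if not records:                    # empty D_t: leaf labeled with default class
        return default
    labels = [y for _, y in records]
    if len(set(labels)) == 1:          # all records in one class: leaf node
        return labels[0]
    split = choose_split(records)      # hypothetical attribute-test selector
    if split is None:                  # no useful split left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    branches = {}
    for x, y in records:
        branches.setdefault(split(x), []).append((x, y))
    # recurse on each subset; keep the split function for later prediction
    return (split, {k: hunt(v, choose_split, default) for k, v in branches.items()})
```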
Tree Induction Issues

- Determine how to split the records
  - How to specify the attribute test condition?
  - How to determine the best split?
- Determine when to stop splitting
How to Specify Test Condition?

- Depends on attribute types
  - Nominal
  - Ordinal
  - Continuous
- Depends on number of ways to split
  - 2-way split
  - Multi-way split

Splitting Based on Nominal Attributes

- Multi-way split: use as many partitions as distinct values.
  CarType -> Family | Sports | Luxury
- Binary split: divides values into two subsets; need to find the optimal partitioning.
  CarType -> {Sports, Luxury} | {Family}   or   {Family, Luxury} | {Sports}

Splitting Based on Ordinal Attributes

- Multi-way split: use as many partitions as distinct values.
  Size -> Small | Medium | Large
- Binary split: divides values into two subsets; need to find the optimal partitioning.
  Size -> {Small, Medium} | {Large}   or   {Small} | {Medium, Large}

Splitting Based on Continuous Attributes

- Different ways of handling
  - Discretization to form an ordinal categorical attribute
    - Static: discretize once at the beginning
    - Dynamic: ranges can be found by equal interval bucketing, equal frequency bucketing (percentiles), or clustering
  - Binary decision: (A < v) or (A >= v)
    - Consider all possible splits and find the best cut
    - Can be more compute intensive

Examples: (i) binary split "Taxable Income > 80K?" (Yes/No); (ii) multi-way split "Taxable Income?" into < 10K, [10K, 25K), [25K, 50K), [50K, 80K), > 80K
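The "consider all possible splits and find the best cut" step can be sketched as follows. Scoring candidate cuts by weighted Gini impurity is an assumption here; any of the impurity measures introduced later would work:

```python
# Find the best binary cut (A < v) for a continuous attribute by trying
# midpoints between consecutive sorted values.
def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_cut(values, labels):
    """Return (threshold, weighted_gini) of the best split A < v."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between equal attribute values
        v = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < v]
        right = [y for x, y in pairs if x >= v]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (v, score)
    return best
```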

Determining the Best Split

Before splitting: 10 records of class 0, 10 records of class 1.

Candidate test conditions:
- Own Car?      Yes: C0: 6, C1: 4        No: C0: 4, C1: 6
- Car Type?     Family: C0: 1, C1: 3     Sports: C0: 8, C1: 0     Luxury: C0: 1, C1: 7
- Student ID?   c1 … c20: each child holds a single record (C0: 1, C1: 0 or C0: 0, C1: 1)

Which test condition is the best?

- Greedy approach: nodes with homogeneous class distribution are preferred
- Need a measure of node impurity:
  - C0: 5, C1: 5: non-homogeneous, high degree of impurity
  - C0: 9, C1: 1: homogeneous, low degree of impurity

Impurity Measures

- Measuring the impurity of a node
- P(i|t) = fraction of records belonging to class i at a given node t
- c is the number of classes

Entropy(t) = - Σ_{i=0..c-1} P(i|t) log2 P(i|t)

Gini(t) = 1 - Σ_{i=0..c-1} P(i|t)^2

Classification error(t) = 1 - max_i P(i|t)

Entropy

- Maximum (log2 c) when records are equally distributed among all classes, implying least information
- Minimum (0.0) when all records belong to one class, implying most information


Exercise (Entropy)

C1: 0, C2: 6    P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
                Entropy = - 0 log2 0 - 1 log2 1 = - 0 - 0 = 0

C1: 1, C2: 5    P(C1) = 1/6, P(C2) = 5/6
                Entropy = - (1/6) log2 (1/6) - (5/6) log2 (5/6) = 0.65

C1: 2, C2: 4    P(C1) = 2/6, P(C2) = 4/6
                Entropy = - (2/6) log2 (2/6) - (4/6) log2 (4/6) = 0.92
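The three impurity measures are easy to check numerically. A small sketch that reproduces the exercise values:

```python
# Impurity measures over a list of per-class record counts.
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def classification_error(counts):
    n = sum(counts)
    return 1.0 - max(counts) / n

print(round(entropy([1, 5]), 2))  # 0.65
print(round(entropy([2, 4]), 2))  # 0.92
print(entropy([0, 6]))            # 0.0 (pure node)
```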

GINI

Gini(t) = 1 - Σ_{i=0..c-1} P(i|t)^2

- Maximum (1 - 1/c) when records are equally distributed among all classes, implying least interesting information
- Minimum (0.0) when all records belong to one class, implying most interesting information

Exercise: compute the Gini index for nodes with counts (C1: 0, C2: 6), (C1: 1, C2: 5), and (C1: 2, C2: 4)

Exercise (Gini)

C1: 0, C2: 6    P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
                Gini = 1 - P(C1)^2 - P(C2)^2 = 1 - 0 - 1 = 0

C1: 1, C2: 5    P(C1) = 1/6, P(C2) = 5/6
                Gini = 1 - (1/6)^2 - (5/6)^2 = 0.278

C1: 2, C2: 4    P(C1) = 2/6, P(C2) = 4/6
                Gini = 1 - (2/6)^2 - (4/6)^2 = 0.444

Classification Error

Classification error(t) = 1 - max_i P(i|t)

- Maximum (1 - 1/c) when records are equally distributed among all classes, implying least interesting information
- Minimum (0.0) when all records belong to one class, implying most interesting information
Exercise (Classification Error)

C1: 0, C2: 6    P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
                Error = 1 - max(0, 1) = 1 - 1 = 0

C1: 1, C2: 5    P(C1) = 1/6, P(C2) = 5/6
                Error = 1 - max(1/6, 5/6) = 1 - 5/6 = 1/6

C1: 2, C2: 4    P(C1) = 2/6, P(C2) = 4/6
                Error = 1 - max(2/6, 4/6) = 1 - 4/6 = 1/3

Comparison of Impurity Measures

For a 2-class problem, all three measures reach their maximum at a uniform class distribution (P = 0.5) and their minimum (0) at a pure node.

Gain = goodness of a split

Before splitting: a node with counts C0: N00, C1: N01 and impurity M0.
Split A? (Yes/No) yields nodes N1 (C0: N10, C1: N11) and N2 (C0: N20, C1: N21) with impurities M1 and M2, combined (weighted) into M12.
Split B? (Yes/No) yields nodes N3 (C0: N30, C1: N31) and N4 (C0: N40, C1: N41) with impurities M3 and M4, combined into M34.

Compare Gain = M0 - M12 vs. M0 - M34: the split with the larger gain is preferred.

Information Gain

- When Entropy is used as the impurity measure, it's called information gain
- Measures how much we gain by splitting a parent node

Δ_info = Entropy(p) - Σ_{j=1..k} [N(v_j) / N] * Entropy(v_j)

where k is the number of attribute values, N(v_j) is the number of records associated with the child node v_j, and N is the total number of records at the parent node.
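A sketch of the formula, evaluated on the "Own Car?" split from the earlier slide (parent with counts 10/10, children 6/4 and 4/6):

```python
# Information gain of a split, given class counts at the parent and children.
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def information_gain(parent, children):
    """parent: class counts at the parent; children: list of class-count lists."""
    n = sum(parent)
    weighted = sum(sum(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

gain = information_gain([10, 10], [[6, 4], [4, 6]])
print(round(gain, 3))  # 0.029: a weak split, barely better than no split
```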
Gain Ratio

Gain ratio = Δ_info / Split info

Split info = - Σ_{i=1..k} P(v_i) log2 P(v_i)

- If the attribute produces a large number of splits, its split info will also be large, which in turn reduces its gain ratio
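Split info can be sketched over the child-node sizes; note how it penalizes the 20-way Student ID split from the earlier "best split" slide relative to the binary Own Car? split:

```python
# Split info over the sizes of the child nodes produced by a split.
from math import log2

def split_info(child_sizes):
    n = sum(child_sizes)
    return -sum((s / n) * log2(s / n) for s in child_sizes if s > 0)

print(split_info([10, 10]))  # 1.0 bit for a balanced binary split
print(split_info([1] * 20))  # log2(20), about 4.32, for the Student ID split
```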

Stopping Criteria for Tree Induction

- Stop expanding a node when all the records belong to the same class
- Stop expanding a node when all the records have similar attribute values
- Early termination

Summary: Decision Trees

- Inexpensive to construct
- Extremely fast at classifying unknown records
- Easy to interpret for small-sized trees
- Accuracy is comparable to other classification techniques for many simple data sets

Practical Issues of Classification

Underfitting and Overfitting

Example data: 500 circular and 500 triangular data points.
- Circular points: 0.5 <= sqrt(x1^2 + x2^2) <= 1
- Triangular points: sqrt(x1^2 + x2^2) < 0.5 or sqrt(x1^2 + x2^2) > 1

Underfitting: when the model is too simple, both training and test errors are large.
Overfitting: when the model is too complex, it fits the training data very well but generalizes poorly to unseen data.

How to Address Overfitting

- Pre-Pruning (Early Stopping Rule)
  - Stop the algorithm before it becomes a fully-grown tree
  - Typical stopping conditions for a node:
    - Stop if all instances belong to the same class
    - Stop if all the attribute values are the same
  - More restrictive conditions:
    - Stop if the number of instances is less than some user-specified threshold
    - Stop if the class distribution of instances is independent of the available features
    - Stop if expanding the current node does not improve impurity measures (e.g., Gini or information gain)

How to Address Overfitting…

- Post-pruning
  - Grow the decision tree to its entirety
  - Trim the nodes of the decision tree in a bottom-up fashion
  - If generalization error improves after trimming, replace the sub-tree by a leaf node
  - The class label of the leaf node is determined from the majority class of instances in the sub-tree

Methods for estimating performance

- Holdout
  - Reserve 2/3 for training and 1/3 for testing (validation set)
- Cross validation
  - Partition data into k disjoint subsets
  - k-fold: train on k-1 partitions, test on the remaining one
  - Leave-one-out: k = n
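The k-fold scheme can be sketched as an index partitioner (the helper name is my own, not from the slides): split the record indices into k disjoint folds, then train on k-1 of them and test on the remaining one:

```python
# Minimal k-fold cross-validation index generator.
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]  # k disjoint subsets
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Leave-one-out is the special case k = n: each test set holds one record.
for train, test in k_fold_indices(6, 3):
    print(train, test)
```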

Expressivity

- Each split involves a single attribute, so the decision boundary consists of axis-parallel segments that partition the attribute space into rectangular regions
- Example (attributes x, y in [0, 1]): the tree splits on x < 0.43 at the root, then on y < 0.47 and y < 0.33 in the two subtrees, and labels each leaf + or - by its class counts
- A diagonal boundary such as x + y < 1 cannot be captured by a single test on one attribute and requires many axis-parallel splits to approximate
Use-case: Web Robot Detection

Assignment 1

Tasks

- Given a training data set and a test set
- Task 1: Build a decision tree classifier
  - You have to build it from scratch
  - You are free to pick your programming language
  - Submit code and predicted class labels for the test set
  - Accuracy has to reach a certain threshold
- Task 2: Submit a short report describing
  - What processing steps you applied
  - Which are the most important features of the dataset (based on the decision tree built)
- Task 3 (optional): Use any classifier from the scikit-learn Python machine learning library
  - Submit code and predicted class labels for the test set

Online evaluation

- A real-time leaderboard will be available for the submissions (updated after each git push)
- Two tracks, with results reported separately
  - Decision tree track (for everyone)
  - Open track (optional)
- Best teams for each track get +5 points at the exam (all members)
- Online evaluation will be available from next week

Practicalities

- Data set and specific instructions will be made available today
- Work in groups of 2-3
- Deadlines
  - Forming groups by 11/9 (this Friday!)
  - Predictions due 28/9
  - Report due 5/10
- Note: I'm away this Friday, but the practicum will be held. Get started on Assignment 1!
