LESSON 21
Introduction to Decision Trees
Decision Trees
• Decision trees are used for classification, where each path is a set of
decisions leading to one class.
• Decision trees are also used in decision analysis, where the tree visually
represents the decision-making process.
• A path from the root to a leaf node represents a set of rules which leads
to a classification.
• Each leaf node of the decision tree pertains to one of the possible
decisions.
• It can be seen that there are six leaf nodes corresponding to the
identification of the salt.
[Figure: decision tree for identifying a salt's basic radical. The root test
is "salt + water + dil HCl gives precipitate"; each internal node is a
yes/no test, and the leaves correspond to the basic radical groups, ending
with a final split between Group V and Group VI.]
[Figure: a single decision node on colour with a four-way split into
classes A, B, C, and D.]
• Each path from the root to a leaf node defines one path in the decision
tree, and this corresponds to a rule. For example, one rule would be:
If ((salt + water + dil HCl gives precipitate = n) and (salt + water + dil
H2S gives precipitate = n) and (salt + water + NH4Cl + NH4OH gives white
precipitate = y)) then (salt ∈ Group III basic radical)
• The different leaf nodes have different path lengths. The leaf node
belonging to the Group I basic radical requires just one test, i.e. only
one decision node on its path; Group II requires two tests, and so on.
• In this way, forming the decision tree allows the problem to be solved
systematically.
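The rule extracted above can be written directly as code. The sketch below is a hypothetical helper, not part of the source: the dictionary keys naming the tests and the label for the remaining groups are illustrative assumptions, and the later tests of the full scheme are omitted.

```python
def classify_basic_radical(tests):
    """Classify a salt's basic radical group from qualitative test results.

    `tests` maps a test name to True (gives a precipitate) or False.
    Each if/else path below corresponds to one root-to-leaf rule
    of the decision tree. Test names are illustrative.
    """
    if tests["dil HCl"]:                  # first decision node
        return "Group I basic radical"
    if tests["dil HCl + H2S"]:            # second test, on the 'no' branch
        return "Group II basic radical"
    if tests["NH4Cl + NH4OH"]:            # third test: white precipitate
        return "Group III basic radical"
    return "later group"                  # remaining tests omitted in this sketch

# The example rule from the text: HCl no, H2S no, NH4Cl + NH4OH yes -> Group III
print(classify_basic_radical({"dil HCl": False,
                              "dil HCl + H2S": False,
                              "NH4Cl + NH4OH": True}))
```

Note that the rule's conjunction of conditions is exactly the sequence of branches taken on the way to the leaf.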
[Figure: binary decision tree on colour. "Is colour green?": yes gives A,
no leads to "Is colour red?": yes gives B, no leads to "Is colour blue?":
yes gives C, no gives D.]
2. The decision at each node is made using attributes whose values may
be numerical or categorical. Numerical values may be integers or
real numbers. By categorical attributes, we mean attributes described
using words; for example, a colour attribute may take values such as
brown, red, and green.
3. The decision taken at a node may involve just one attribute or
several attributes. If one attribute is involved, it is called a
univariate decision and results in an axis-parallel split. If more
than one attribute is involved, it is called a multivariate split.
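The two kinds of decision can be illustrated on a numeric point. This is a minimal sketch; the thresholds and weights are made-up values for illustration only:

```python
# A univariate decision tests a single attribute against a threshold,
# giving an axis-parallel split. A multivariate decision combines
# several attributes, e.g. by a weighted sum (an oblique split).

def univariate_split(x, attribute=0, threshold=5.0):
    """Axis-parallel split: compare one attribute to a threshold."""
    return "left" if x[attribute] <= threshold else "right"

def multivariate_split(x, weights=(0.6, 0.4), threshold=5.0):
    """Oblique split: compare a weighted sum of attributes to a threshold."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return "left" if score <= threshold else "right"

point = (4.0, 9.0)
print(univariate_split(point))    # only x[0] = 4.0 matters -> "left"
print(multivariate_split(point))  # 0.6*4.0 + 0.4*9.0 = 6.0 > 5.0 -> "right"
```

The same point lands on different sides of the two splits, which is why oblique splits can carve up the space more flexibly than axis-parallel ones.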
• The internal nodes represent the decisions and each leaf node represents
a class.
• The decision at each node may involve more than one feature, in which
case it is called multivariate, or just one feature, in which case it is
called univariate.
• Each node may have two or more outcomes. If every node has exactly two
outcomes, the tree is a binary tree.
• It is to be noted that the first decision node has a three-way split. The
decision node based on whether Ajay has exams or not is a binary split
and the node on the weather has a three-way split.
• Every leaf node of the tree is associated with a class, and every path
from the root to a leaf gives a classification rule, for example: If
(Has money = 50-150) and (Has exams = true) then (Goes to a movie =
false).
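The movie example, with its three-way root split on money, binary split on exams, and three-way split on weather, can be sketched as a function. The weather outcomes below are illustrative assumptions, since the source leaves those leaves unlabeled:

```python
def goes_to_movie(money, has_exams, weather):
    """Decide whether Ajay goes to a movie.

    Root: three-way split on money. In the 50-150 range, a binary
    split on exams follows, then a three-way split on weather.
    The weather outcomes are assumptions for illustration.
    """
    if money < 50:
        return False
    if money > 150:
        return True
    # money in the 50-150 range: check exams
    if has_exams:
        return False
    # three-way split on the weather (assumed outcomes)
    if weather == "sunny":
        return True
    if weather == "rainy":
        return False
    return True

# The rule from the text: money in 50-150 and has exams -> no movie
print(goes_to_movie(100, True, "sunny"))  # False
```

Each return statement is one leaf, and the conditions accumulated on the way to it are one classification rule.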
[Figure: decision tree for Ajay's movie decision. The root splits three
ways on "has money": if <50, goes to a movie = false; if >150, goes to a
movie = true; if 50-150, a binary split on "has exams" follows (yes: goes
to a movie = false; no: a three-way split on "weather?").]

Computer Programs for Learning Decision Trees
• Some of them are
– ID3
– C4.5, a program developed by Quinlan as an improvement of ID3
– CART
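ID3 and C4.5 grow the tree greedily, at each node choosing the attribute with the highest information gain (C4.5 normalizes this to a gain ratio, and CART uses the Gini index instead). A minimal sketch of the information-gain computation on a toy dataset; the data values are made up for illustration:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting on one categorical attribute."""
    n = len(labels)
    after = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels)
                  if row[attribute] == value]
        after += len(subset) / n * entropy(subset)
    return entropy(labels) - after

# Toy data: attribute 0 perfectly predicts the class, attribute 1 does not.
rows = [("red", "big"), ("red", "small"), ("green", "big"), ("green", "small")]
labels = ["A", "A", "B", "B"]
print(information_gain(rows, labels, 0))  # 1.0 bit: attribute 0 is chosen
print(information_gain(rows, labels, 1))  # 0.0 bits: attribute 1 is useless
```

The learner would split on attribute 0 here, since it removes all uncertainty about the class.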
4. It is robust and performs quite well even when the data deviates from
the true model.
3. Some problems are hard to solve using decision trees, such as the XOR
problem, the parity problem, and the multiplexer problem. In these
cases, the decision tree becomes prohibitively large.
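The XOR difficulty can be seen with the same information-gain criterion ID3 uses: on XOR data, neither attribute alone reduces the class entropy, so a greedy, univariate tree builder gets no guidance from any single split. A self-contained check (the entropy helper is repeated here so the sketch stands on its own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr):
    """Information gain from a univariate split on one attribute."""
    n = len(labels)
    after = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        after += len(subset) / n * entropy(subset)
    return entropy(labels) - after

# XOR: the class is 1 exactly when the two binary attributes differ.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]
print(gain(rows, labels, 0))  # 0.0: a split on x0 alone tells us nothing
print(gain(rows, labels, 1))  # 0.0: a split on x1 alone tells us nothing
```

Since every first split looks equally useless, the greedy builder must still split and then repeat the work in every branch, which is why such trees grow large.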