
SGN-2556 Pattern Recognition

Computer exercise 4

Tree-based classifiers
25.04.2006

It is natural and intuitive to classify a pattern through a sequence of questions, in which
the next question asked depends on the answer to the current question. Such a sequence
of questions is displayed in a directed decision tree or simply tree, where by convention
the first or root node is displayed at the top, connected by successive (directional) links
or branches to other nodes. These are similarly connected until we reach terminal or leaf
nodes, which have no further links (see the figure below).


Figure 1: Classification in a basic decision tree proceeds from top to bottom (a).
Two-dimensional two-category example with decision regions marked R1 and R2 (b).
Decision boundaries are perpendicular to the feature axes.

Much of the work in designing trees focuses on deciding which property test or query
should be performed at each node. For nonmetric data one might well consider logical
combinations of properties, such as using (size = medium) AND (NOT (color = yellow))? as
a query. For numerical data, there is a simple way to visualize the decision boundaries
that are produced by decision trees. Suppose that the query at each node has the form
“Is x_i > a?” for some threshold value a. This leads to hyperplane decision boundaries
that are perpendicular to the coordinate axes and to decision regions of the form illustrated
in Fig. 1. The fundamental principle underlying tree creation is that of simplicity. We seek
a property query T at each node N that makes the data reaching the immediate descendant
nodes as “pure as possible”. The most popular measure is the entropy impurity:
i(N) = − Σ_j P(ω_j) log_2 P(ω_j),                                        (1)

where P(ω_j) is the fraction of patterns at node N that are in category ω_j. By the property
of entropy, if all the patterns are of the same category, the impurity is 0; otherwise it is
positive, with the greatest value occurring when the different classes are equally likely. An
obvious heuristic is to choose the query that decreases the impurity as much as possible.
The drop in impurity is defined by

∆i(N) = i(N) − P_L i(N_L) − (1 − P_L) i(N_R),                            (2)


where N_L and N_R are the left and right descendant nodes, i(N_L) and i(N_R) are their
impurities, and P_L is the fraction of patterns at node N that will go to N_L when property
query T is used. The best query is then the choice of T that maximizes ∆i(T).
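
For concreteness, equations (1) and (2) can be evaluated directly from the class labels reaching a node. A minimal MATLAB sketch (the variable names labels and left below are our own, not part of the exercise files):

    % labels: 1 x n vector of class labels of the patterns reaching node N
    % left  : logical 1 x n vector, true for the patterns that query T sends to N_L
    P  = @(l) histc(l(:), unique(l)) / numel(l);   % class fractions P(w_j) at a node
    iN = @(l) -sum(P(l) .* log2(P(l)));            % entropy impurity, Eq. (1)
    P_L    = mean(left);                           % fraction of patterns going left
    drop_i = iN(labels) - P_L*iN(labels(left)) - (1 - P_L)*iN(labels(~left));   % Eq. (2)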

1 CART (Classification And Regression Trees) method


1) Download the file CARTdata.mat that contains a feature matrix (2 features, 16
samples taken from two classes) and a vector of targets. Visualize the data.
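
A possible way to carry out this step; the variable names features and targets and the 0/1 label coding are assumptions about the contents of CARTdata.mat (they can be checked with whos('-file', 'CARTdata.mat')):

    load CARTdata.mat                    % assumed to provide 'features' (2 x 16) and 'targets' (1 x 16)
    c1 = (targets == 0);                 % assumed label coding; adjust to the actual target values
    plot(features(1, c1),  features(2, c1),  'bo'); hold on
    plot(features(1, ~c1), features(2, ~c1), 'rx');
    xlabel('x_1'); ylabel('x_2'); legend('class 1', 'class 2');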

2) Train a binary CART tree using the entropy impurity. Use the CART.m function for
that.

3) Using the test_CART.m function and the obtained decision tree, build a decision surface
D over the range of the given data. Plot the decision boundary superimposed on the
data points (Hint: use the contour command to plot the contour of matrix D. Set
the number of contour lines equal to 1 to get a single decision boundary). A possible
approach is sketched below.
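
The plotting part might look as follows, assuming the predictions of the trained tree over a regular grid have already been collected into the matrix D (the grid construction is only a sketch; the actual interface of test_CART.m may differ):

    % Regular grid over the range of the data; D(i,j) should hold the class label
    % predicted by the trained tree for the point [X(i,j); Y(i,j)].
    x = linspace(min(features(1,:)), max(features(1,:)), 100);
    y = linspace(min(features(2,:)), max(features(2,:)), 100);
    [X, Y] = meshgrid(x, y);
    % ... fill D using test_CART.m and the trained tree ...
    contour(X, Y, D, 1, 'k');            % a single contour line = the decision boundary
    hold on
    plot(features(1, c1), features(2, c1), 'bo', features(1, ~c1), features(2, ~c1), 'rx');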

4) Draw the decision tree for the given data as a block diagram (similarly to Fig. 1(a))
using the structure array tree returned by the CART.m function. Include the tree in your
report. You can access a field (or a substructure) of a particular structure using the
structure name followed by a period and the field name (e.g. struct1.field1, struct1.substruct1.subsubstruct1).
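
Structure field access is illustrated below on a made-up structure; the actual field names of the tree returned by CART.m can be listed with fieldnames(tree) or disp(tree):

    s.dim         = 1;     % made-up fields, for illustration only
    s.child.label = 2;
    s.dim                  % returns 1
    s.child.label          % returns 2
    fieldnames(s)          % lists the top-level field names of s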

5) In 2), the tree was grown fully (i.e. until no further splits were possible). This typically
leads to overfitting. Simplify the decision tree obtained in (4) by pruning all the
neighboring leaf nodes (those linked to a common parent node, one level above) whose
elimination gives only a very small increase in impurity (for this particular example, take
values less than 10^−2); see the sketch below.
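
For each pair of sibling leaves, the quantity to check is the increase in impurity caused by merging them back into their parent. A sketch using the impurity function iN from the earlier snippet (labels_L and labels_R are assumed to be row vectors holding the class labels of the patterns reaching the two leaves):

    labels_P = [labels_L, labels_R];          % patterns reaching the common parent node
    P_L      = numel(labels_L) / numel(labels_P);
    increase = iN(labels_P) - P_L*iN(labels_L) - (1 - P_L)*iN(labels_R);
    if increase < 1e-2
        % prune: remove the two leaves and turn the parent node into a leaf
    end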

6) Are there any other redundancies in the tree, which might be simplified?

7) Repeat step 2) using the Gini and misclassification impurity measures. Compare the
three measures in terms of the complexity of the resulting decision tree (number of
nodes) and in terms of the misclassification rate on the given dataset.
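
The only change needed here is the impurity measure itself. Possible one-line versions, using the same class-fraction computation as before; the formulas are the standard textbook forms (Gini impurity 1 − Σ_j P²(ω_j), misclassification impurity 1 − max_j P(ω_j)):

    P        = @(l) histc(l(:), unique(l)) / numel(l);   % class fractions P(w_j)
    gini     = @(l) 1 - sum(P(l).^2);                    % Gini impurity
    misclass = @(l) 1 - max(P(l));                       % misclassification impurity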

8) Consider the nonmetric data from the file letters.mat, sampled from three categories
and consisting of five features and twenty patterns (see the table below). Train a tree
for this data using the entropy impurity. Check its misclassification rate using the
test_CART_letters.m function.

9) Train a tree using only the patterns belonging to the first and second classes. Simplify it
and convert the information in your tree into a single logical expression that describes
the first category. Repeat the same for the second category. (Hint: use the char
command to convert integers into ASCII characters.)
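
The hint refers to MATLAB's built-in conversion between numeric character codes and characters:

    char(65)            % returns 'A'  (code 65 is the letter A)
    double('A')         % returns 65   (the inverse conversion)
    char([65 69 72])    % returns 'AEH'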

Sample   Category   A–D   E–G   H–J   K–L   M–N
   1        ω1       A     E     H     K     M
   2        ω1       B     E     I     L     M
   3        ω1       A     G     I     L     N
   4        ω1       B     G     H     K     M
   5        ω1       A     G     I     L     M
   6        ω2       B     F     I     L     M
   7        ω2       B     F     J     L     N
   8        ω2       B     E     I     L     N
   9        ω2       C     G     J     K     N
  10        ω2       C     G     J     L     M
  11        ω2       D     G     J     K     M
  12        ω2       B     F     I     L     M
  13        ω3       D     E     H     K     N
  14        ω3       A     E     H     K     N
  15        ω3       D     E     H     L     N
  16        ω3       D     F     J     L     N
  17        ω3       A     F     H     K     N
  18        ω3       D     E     J     L     M
  19        ω3       C     F     J     L     M
  20        ω3       D     F     H     L     M
