Date: 5-11-2009
Time: 14.00-17.00
General Remarks
1. You are allowed to consult 1 A4 sheet with notes written on both sides.
2. Always show how you arrived at the result of your calculations.
3. If you are a native speaker of Dutch, answers in Dutch are preferred.
4. There are six questions, for which you can score a total of 100 points.
items
{A, B}
{A, B, D}
{B, D, E}
{B, C, D, E}
{A, B, C}
a) Use the Apriori algorithm to compute all frequent itemsets and their
support, with a minimum support of 2. It is important that you clearly
indicate the steps of the algorithm.
b) Give all closed frequent itemsets.
c) Use the frequent itemsets to construct a Krimp code table, and compute
how often each itemset in the code table is used in covering the database.
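
The level-wise procedure asked for in part a) can be sketched as follows. This is a minimal illustration, not a model answer: the names `DB`, `apriori` and `support` are hypothetical, and support is taken to be the absolute number of transactions containing an itemset, which matches the minimum support of 2 stated above.

```python
from itertools import combinations

# The five transactions from the table above.
DB = [{"A", "B"}, {"A", "B", "D"}, {"B", "D", "E"},
      {"B", "C", "D", "E"}, {"A", "B", "C"}]


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        count = support(frozenset([item]))
        if count >= min_support:
            level[frozenset([item])] = count

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Candidate generation: unions of two frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        # Counting: keep only candidates that reach the minimum support.
        counted = {c: support(c) for c in candidates}
        level = {c: n for c, n in counted.items() if n >= min_support}
    return frequent


frequent = apriori(DB, 2)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), count)
```

Each pass of the `while` loop corresponds to one level of the algorithm (generate, prune, count), which is the step structure the question asks to make explicit.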
object   1   2   3   4
x1       2   8   6   2
x2       2   6   8   4
a) Cluster this data into two clusters, using the k-means algorithm. To initialize the algorithm, put objects 1 and 3 in one cluster, and objects 2 and
4 in the other cluster. Show the steps of the algorithm clearly. Give the
value of the k-means error function after convergence.
b) What is the value of the error function in the optimal solution for k = 4?
c) The k-means algorithm can be viewed as a special case of model-based
clustering with normal components. Which constraints have to be imposed
on the cluster covariance matrices to get a distance measure that is similar
to the one used by k-means?
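
The iteration in part a) can be sketched in a few lines. This is an illustration under stated assumptions: the columns of the data table are read as objects 1 to 4 in that order, the error function is taken to be the sum of squared Euclidean distances from each object to its cluster mean, and the names `POINTS`, `sq_dist` and `kmeans` are hypothetical.

```python
# Objects 1..4 as (x1, x2) pairs; assumes the table columns are in object order.
POINTS = [(2, 2), (8, 6), (6, 8), (2, 4)]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, assignment):
    """Run k-means from an initial assignment (one cluster index per object).

    Assumes no cluster ever becomes empty; an empty cluster would raise
    ZeroDivisionError in the centroid step.
    """
    k = max(assignment) + 1
    while True:
        # Update step: each centroid is the mean of its current members.
        centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            centroids.append(tuple(sum(coord) / len(members)
                                   for coord in zip(*members)))
        # Assignment step: move each object to its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            # Converged: report the sum of squared distances to the centroids.
            error = sum(sq_dist(p, centroids[a])
                        for p, a in zip(points, assignment))
            return assignment, centroids, error
        assignment = new_assignment


# Initial clustering from the question: objects 1 and 3 together, 2 and 4 together.
final_assignment, final_centroids, final_error = kmeans(POINTS, [0, 1, 0, 1])
print(final_assignment, final_centroids, final_error)
```

Printing the centroids and assignment inside the loop would show the individual steps that the question asks for.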
t1: 60 | 40
    t2: 30 | 10
        t4: 30 | 5   (leaf)
        t5: 0 | 5    (leaf)
    t3: 30 | 30
        t6: 0 | 20   (leaf)
        t7: 30 | 10
            t8: 25 | 5   (leaf)
            t9: 5 | 5    (leaf)
In each node, the number of observations with class 0 is given in the left
part, and the number of observations with class 1 in the right part. The leaf
nodes have been drawn as rectangles.
a) Compute the impurity of nodes t1, t2 and t3 according to the resubstitution
error. Give the impurity reduction achieved by the first split.
b) Compute the sequence T1 > T2 > ... > {t1}, where T1 is the smallest
minimizing subtree of Tmax for α = 0. For each tree in the sequence, give
the interval of α values for which it is the smallest minimizing subtree of
Tmax.
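
The quantities needed in parts a) and b) can be computed with a short sketch. Several things here are assumptions rather than part of the exam text: the parent-child links in `CHILDREN` are inferred from the class counts (each node's counts are the sum of its children's), the resubstitution error of a node is taken as the fraction of its cases outside the majority class, and `g` uses the standard cost-complexity definition g(t) = (R(t) - R(T_t)) / (|leaves of T_t| - 1). All function and variable names are hypothetical.

```python
from fractions import Fraction

# (class 0 count, class 1 count) per node, as given in the tree above.
COUNTS = {"t1": (60, 40), "t2": (30, 10), "t3": (30, 30),
          "t4": (30, 5), "t5": (0, 5), "t6": (0, 20),
          "t7": (30, 10), "t8": (25, 5), "t9": (5, 5)}
# Assumed tree structure, inferred from the counts.
CHILDREN = {"t1": ("t2", "t3"), "t2": ("t4", "t5"),
            "t3": ("t6", "t7"), "t7": ("t8", "t9")}
N = sum(COUNTS["t1"])  # total number of observations


def impurity(node):
    """Resubstitution error of a node: minority fraction within the node."""
    return Fraction(min(COUNTS[node]), sum(COUNTS[node]))


def impurity_reduction(node):
    """Impurity of the node minus the weighted impurity of its children."""
    n = sum(COUNTS[node])
    weighted = sum(Fraction(sum(COUNTS[c]), n) * impurity(c)
                   for c in CHILDREN[node])
    return impurity(node) - weighted


def leaves(node, children=CHILDREN):
    """Leaf nodes of the branch rooted at node."""
    if node not in children:
        return [node]
    return [leaf for c in children[node] for leaf in leaves(c, children)]


def g(node, children=CHILDREN):
    """Weakest-link value: error increase per leaf removed by pruning at node."""
    def risk(t):
        # Misclassified cases in t as a fraction of all observations.
        return Fraction(min(COUNTS[t]), N)
    branch = leaves(node, children)
    return (risk(node) - sum(risk(t) for t in branch)) / (len(branch) - 1)


for t in ("t1", "t2", "t3"):
    print(t, impurity(t))
print("reduction of first split:", impurity_reduction("t1"))
for t in CHILDREN:
    print("g(%s) =" % t, g(t))
```

Passing a reduced `children` dictionary to `g` (with the pruned node removed) recomputes the values for the next tree in the sequence. Exact fractions are used so that ties and zero reductions are not blurred by floating-point rounding.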