Mining Frequent Itemsets
Given a transaction database and a
support threshold, mining frequent
itemsets means finding the complete set of
frequent itemsets
Mining frequent itemsets is essential for
many data mining tasks, e.g. association
rule mining
Mining frequent itemsets and association
rules over them often generates a large
number of frequent itemsets and rules
Harms efficiency
Makes the results hard to understand
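To make the problem statement above concrete, here is a minimal brute-force sketch in Python. It is not the method presented in these slides. The function name `frequent_itemsets` and the toy transactions are assumptions chosen for illustration, and support is taken as an absolute transaction count.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Return {itemset: support count} for every itemset with support >= min_sup.

    Brute force: enumerates candidates level by level, so it is only
    suitable for tiny examples.
    """
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_sup:
                result[cand] = support
                found_any = True
        if not found_any:
            # If no itemset of this size is frequent, no larger one can be.
            break
    return result

# Hypothetical toy database (not from the slides).
tdb = [frozenset("acd"), frozenset("bce"), frozenset("abce"), frozenset("be")]
freq = frequent_itemsets(tdb, min_sup=2)
for itemset, sup in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), sup)
```

Even on four transactions the result already contains nine itemsets, which hints at why the number of frequent itemsets and rules grows so quickly on real data.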
From Frequent Itemsets to
Frequent Closed Itemsets
Mining frequent closed itemsets has the
same power as mining the complete set of
frequent itemsets, but it substantially
reduces the number of redundant rules generated
Increases both efficiency and effectiveness
Example: TDB = {(a1a2…a100), (a1a2…a50)}, min_sup=1, min_conf=50%
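The effect of this example can be checked on a scaled-down version. The sketch below assumes 10 and 5 items instead of 100 and 50, so that brute-force enumeration stays feasible. It treats an itemset as closed when it equals the intersection of all transactions that contain it. The helper names `support` and `closure` are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def closure(itemset, transactions):
    """Intersection of all transactions containing the itemset (assumes support >= 1)."""
    covering = [t for t in transactions if itemset <= t]
    return frozenset.intersection(*covering)

# Scaled-down stand-in for (a1a2...a100) and (a1a2...a50): an assumption for illustration.
items_long = frozenset(f"a{i}" for i in range(1, 11))
items_short = frozenset(f"a{i}" for i in range(1, 6))
tdb = [items_long, items_short]
min_sup = 1

all_items = sorted(items_long)
frequent = {}
for size in range(1, len(all_items) + 1):
    for combo in combinations(all_items, size):
        x = frozenset(combo)
        s = support(x, tdb)
        if s >= min_sup:
            frequent[x] = s

# An itemset is closed if no proper superset has the same support,
# which is equivalent to the itemset being equal to its closure.
closed = {x: s for x, s in frequent.items() if closure(x, tdb) == x}

# Confidence of the rule (short itemset) -> (remaining items of the long itemset).
rule_conf = support(items_long, tdb) / support(items_short, tdb)

print(len(frequent), len(closed), rule_conf)
```

With 10 items there are 2^10 - 1 = 1023 frequent itemsets but only two closed ones. By the same argument, the full example has 2^100 - 1 frequent itemsets and still only two frequent closed itemsets.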
Optimization 1: Compress Transactional
& Conditional Databases Using FP-trees
FP-tree compresses databases for
frequent itemsets
Partition-based projection
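As a rough illustration of how an FP-tree compresses a transaction database, the following sketch builds a prefix tree over frequency-ordered transactions and compares the node count with the number of item occurrences. It is a simplified sketch, not the CLOSET implementation: header tables and node links are omitted, and `FPNode`, `build_fp_tree`, `count_nodes`, and the toy transactions are assumptions for illustration.

```python
from collections import Counter

class FPNode:
    """One node of a simplified FP-tree (no header table or node links)."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # First pass: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in counts.items() if c >= min_sup}
    root = FPNode(None)
    # Second pass: insert each transaction with its frequent items sorted
    # by descending count (ties broken alphabetically) so prefixes are shared.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root

def count_nodes(node):
    """Number of nodes below the given node."""
    return sum(1 + count_nodes(c) for c in node.children.values())

# Toy transactions chosen for illustration.
tdb = ["acdef", "abe", "cef", "acdf", "cef"]
root = build_fp_tree(tdb, min_sup=2)
print(count_nodes(root))
```

In this toy case the five transactions contain 17 occurrences of frequent items, while the tree stores them in 10 nodes because shared prefixes such as c, e, f are merged.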
[Figure: plot against support threshold (0.7% to 1.5%)]
Scalability With Support Threshold
on Dataset Connect-4
[Figure: runtime in seconds (log scale, 1 to 10000) versus support threshold (40% to 100%) for A-CLOSE, CLOSET, and ChARM]
Scalability With Support Threshold
on Dataset Pumsb
[Figure: runtime in seconds (0 to 300) versus support threshold (75% to 95%) for A-CLOSE, CLOSET, and ChARM]
Size Scaleup on Datasets
[Figure: runtime in seconds (0 to 300) versus replication factor (0 to 10) for T25I20D100-1000K (1%), Connect4 (70%), and Pumsb (85%)]
Conclusions
CLOSET is an FP-tree-based database
projection method for efficient mining of
frequent closed itemsets in large
databases
Applying FP-tree structure
Developing techniques to identify frequent
closed itemsets quickly
Exploring a partition-based projection
mechanism for scalable mining
CLOSET can be straightforwardly extended
to mine max-patterns
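One way to see the link between closed itemsets and max-patterns: every maximal frequent itemset is closed, so the max-patterns are the closed itemsets with no proper superset among the closed itemsets. The sketch below applies that filter as a post-processing step. It is an illustration of this relationship only, not the extension of CLOSET that the slide refers to. The function name `max_patterns` and the input dictionary are hypothetical.

```python
def max_patterns(closed_itemsets):
    """Keep the closed itemsets that have no proper superset among the closed itemsets."""
    return {
        x: sup
        for x, sup in closed_itemsets.items()
        if not any(x < y for y in closed_itemsets)
    }

# Hypothetical set of frequent closed itemsets with their support counts.
closed = {
    frozenset("cf"): 4,
    frozenset("e"): 4,
    frozenset("a"): 3,
    frozenset("cef"): 3,
    frozenset("ae"): 2,
    frozenset("acdf"): 2,
}

maximal = max_patterns(closed)
for itemset, sup in sorted(maximal.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(itemset)), sup)
```

The quadratic comparison is fine for a handful of itemsets. A miner that targets max-patterns directly would prune during the search instead of filtering afterwards.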
References
R. Agarwal, C. Aggarwal and V.V.V. Prasad. A tree projection algorithm
for generation of frequent itemsets. In Journal of Parallel and
Distributed Computing, (to appear), 2000.
R. Agrawal and R. Srikant. Fast algorithms for mining association
rules. In Proc. VLDB’94, Chile, September 1994.
R.J. Bayardo. Efficiently mining long patterns from databases. In
Proc. SIGMOD’98, WA, June 1998.
J. Han, J. Pei and Y. Yin. Mining frequent patterns without candidate
generation. In Proc. SIGMOD’00, TX, May 2000.
H. Mannila, H. Toivonen and A.I. Verkamo. Efficient algorithms for
discovering association rules. In Proc. KDD’94, WA, July 1994.
N. Pasquier, Y. Bastide, R. Taouil and L. Lakhal. Discovering frequent
closed itemsets for association rules. In Proc. ICDT’99, Israel, January
1999.
N. Pasquier, Y. Bastide, R. Taouil and L. Lakhal. Efficient
mining of association rules using closed itemset lattices. In
Information Systems, Vol. 24, No. 1, 1999.
M.J. Zaki and C. Hsiao. ChARM: An efficient algorithm for closed
association rule mining. In Tech. Rep. 99-10, Computer Science,
Rensselaer Polytechnic Institute, 1999.