
International Journal of Advanced Engineering Research and Technology (IJAERT)

Volume 2 Issue 2, May 2014, ISSN No.: 2348 8190


61


www.ijaert.org

Efficient Mining of Frequent Item Set using Recursive
Algorithm
Bhumika H. Patel
Department of Computer Science and Engineering, PIET, Limda
Gujarat Technological University
Vadodara, India


Abstract - Frequent pattern (or item set) mining is the
extraction of interesting collections of items from a dataset.
Frequent item sets are used to obtain collections of items
according to the user's requirements. Researchers have proposed
various algorithms, such as Apriori, Eclat, RElim and SaM. When
one item is selected as the prefix for mining frequent item sets,
the ordering of the items poses a problem that affects
performance. The RElim algorithm was introduced for frequent
item set mining. In this paper, two approaches to solving this
ordering problem are considered. The results of these two
approaches are compared with the execution of RElim.

Index Terms - Data mining, Frequent item set mining, RElim,
Support.

I. INTRODUCTION

In today's business world, data mining is gaining popularity
because of its contribution to organizational profits. The core
idea of data mining is to extract useful and previously unknown
information or patterns from large datasets. Data mining is
currently used in a wide range of profiling practices, such as
scientific discovery, marketing, fraud detection and
surveillance [11]. Frequent item set mining finds the item sets
that occur frequently, and together, in a set of transactions.
Various algorithms, such as Apriori [1,2], FP-Growth [4],
Eclat [5], RElim [6] and SaM [7], have been proposed since
Agrawal et al. first introduced the problem of deriving
categorical association rules from transactional databases [2].
Frequent itemset mining is widely studied in data mining because
of its broad applications in mining association rules,
correlations, graph patterns, constraints based on frequent
patterns and many other data mining tasks.
Let I = {i1, ..., in} be a set of n distinct items and let DB be
a database consisting of m transactions {t1, ..., tm} such that
each transaction ti is a subset of I. An itemset or pattern x is
a subset of I; if |x| = k, it is called a k-itemset. The support
count of x, Sup(x), is the number of transactions in DB that
contain the itemset x. If Sup(x) is no less than a
user-specified threshold, called Minsup, x is called a frequent
pattern. The aim of frequent pattern mining is to find all
patterns in a given database DB that satisfy Minsup. As the
minimum threshold decreases, the number of frequent patterns
increases. Eliminating infrequent patterns effectively during
mining is therefore one of the main issues in frequent pattern
mining. Our work addresses one aspect of this issue: which item
to select as the less frequent one when two or more items have
the same frequency.
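The definitions of Sup(x) and Minsup above can be illustrated with a short Python sketch. The function names `support` and `is_frequent` and the toy database are assumptions introduced here for illustration only; they do not come from the paper.

```python
def support(itemset, db):
    """Sup(x): number of transactions in db that contain every item of itemset."""
    items = set(itemset)
    return sum(1 for transaction in db if items <= set(transaction))


def is_frequent(itemset, db, minsup):
    """x is frequent if Sup(x) is no less than Minsup."""
    return support(itemset, db) >= minsup


# Hypothetical toy database (not the database of Figure 1).
db = [
    {"a", "d"},
    {"a", "c", "d", "e"},
    {"b", "d"},
    {"b", "c", "d"},
    {"b", "c"},
]

sup_d = support({"d"}, db)              # d occurs in 4 transactions
sup_bc = support({"b", "c"}, db)        # b and c occur together in 2
ae_frequent = is_frequent({"a", "e"}, db, minsup=2)  # only 1 transaction
```

Lowering `minsup` can only add itemsets to the frequent set, which is why the number of frequent patterns grows as the threshold decreases.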
The rest of this paper is organized as follows. Section 2
describes existing work related to frequent item set mining.
Section 3 presents the limitation of the existing algorithm.
Section 4 describes the proposed work, Section 5 compares the
algorithms, and Section 6 concludes the paper.

II. RELATED WORK

In this section, we describe a few existing frequent item set
mining algorithms, namely: (i) Can-tree, (ii) CP-tree, (iii) RElim.

A. Can-tree:
In [10] a tree structure called Can-tree is proposed. The
Can-tree algorithm requires only a single scan of the database.
In this algorithm, items are ordered according to a canonical
standard (e.g. alphabetical) chosen by the user. A change in
item frequency therefore does not affect the order of items in
the Can-tree, so new transactions are inserted into the tree
without swapping any tree nodes.
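The canonical-order insertion described above can be sketched as a simple prefix tree in Python. This is a minimal illustration under the assumption that the canonical order is alphabetical; the names `Node` and `insert` are hypothetical and are not taken from [10].

```python
class Node:
    """Prefix-tree node: a count and children keyed by item."""

    def __init__(self):
        self.count = 0
        self.children = {}


def insert(root, transaction):
    """Insert one transaction, with its items in canonical (alphabetical) order.

    The order never depends on item frequencies, so earlier insertions
    never have to be rearranged when new transactions arrive.
    """
    node = root
    for item in sorted(set(transaction)):
        node = node.children.setdefault(item, Node())
        node.count += 1


root = Node()
for t in (["b", "c", "a"], ["a", "b"], ["c"]):
    insert(root, t)
```

After these three insertions the path a, b is shared by the first two transactions, so both nodes on it carry a count of 2.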

B. CP-tree:
In [9] a new dynamic tree structure called CP-tree is proposed.
This structure inserts all transactions in accordance with a
predefined item order, which is maintained in a list called the
I-list. After some of the transactions have been inserted, if
the item order of the I-list differs from the current
frequency-descending item order by a predefined degree, the
CP-tree is restructured through a method called branch sorting.
The item order is then updated with the current list.

C. RElim:
In [6] the RElim algorithm is proposed, which uses an array-list
structure to find frequent item sets. Figure 1 shows the
preprocessing steps that RElim requires. Step 1 shows the
original database. In step 2, the frequency of each item is
determined by scanning the database. In step 3, the items in
each transaction are sorted in ascending order of frequency. In
step 4, the transactions are sorted lexicographically with
respect to this item order.
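Steps 1 to 4 can be sketched in Python as follows. This is an illustrative sketch only: the function name `preprocess` and the toy database are assumptions, and ties between equally frequent items are broken alphabetically here simply to make the output deterministic.

```python
from collections import Counter


def preprocess(db):
    """Steps 2-4: count items, sort items per transaction, sort transactions."""
    # Step 2: frequency of each item (each item counted once per transaction).
    freq = Counter(item for t in db for item in set(t))
    # Item order: ascending frequency; alphabetical tie-break is an assumption.
    order = sorted(freq, key=lambda item: (freq[item], item))
    rank = {item: r for r, item in enumerate(order)}
    # Step 3: sort the items of each transaction by ascending frequency.
    sorted_tx = [sorted(set(t), key=rank.__getitem__) for t in db]
    # Step 4: sort the transactions lexicographically w.r.t. the item order.
    sorted_tx.sort(key=lambda t: [rank[item] for item in t])
    return freq, order, sorted_tx


# Step 1: hypothetical toy database (not the database of Figure 1).
db = [["a", "d"], ["a", "c", "d", "e"], ["b", "d"], ["b", "c", "d"], ["b", "c"]]
freq, order, sorted_tx = preprocess(db)
```

In this toy database b and c both occur three times, which is exactly the kind of tie whose resolution the proposed work examines.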


Fig. 1: (1) Database in original form, (2) item frequencies, (3) transactions with
sorted items, (4) lexicographically sorted transactions

In step 5, the data structure used by RElim is created. This
data structure is an array of lists, one per item, arranged in
frequency-descending order of the items. Each list has a counter
that gives the number of transactions starting with the
corresponding leading item and a pointer to the head of the
list. The list elements themselves contain a successor pointer
and a pointer to the transaction.
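A simplified version of this data structure can be sketched in Python. The sketch makes several assumptions: Python lists stand in for the linked lists with successor pointers, the class name `TxList` and function name `build_list_array` are hypothetical, and the input transactions are a toy example whose items are already sorted as in steps 3 and 4.

```python
from dataclasses import dataclass, field


@dataclass
class TxList:
    """One list of the array: the leading item, its counter and its elements."""
    item: str
    count: int = 0                                  # transactions starting with item
    suffixes: list = field(default_factory=list)    # transactions minus leading item


def build_list_array(sorted_tx, order):
    """Build the list array; `order` lists items from least to most frequent."""
    by_item = {item: TxList(item) for item in order}
    for t in sorted_tx:
        head, *rest = t
        by_item[head].count += 1
        by_item[head].suffixes.append(tuple(rest))
    # Frequency-descending layout, so the least frequent item is rightmost.
    return [by_item[item] for item in reversed(order)]


# Hypothetical preprocessed transactions and item order.
order = ["e", "a", "b", "c", "d"]
sorted_tx = [["e", "a", "c", "d"], ["a", "d"], ["b", "c"], ["b", "c", "d"], ["b", "d"]]
array = build_list_array(sorted_tx, order)
```

Storing each transaction without its leading item mirrors the way the leading item is implied by the list a transaction sits in.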


Fig. 2: (5) Data structure used by RElim

The basic operations of the RElim algorithm are illustrated
in Figure 3. RElim starts by eliminating the least frequent item
from the list array; the corresponding array elements are
transferred to the conditional database of that item. The item
to be processed is the one associated with the last (rightmost)
list (in the example, item e).


Fig. 3: Basic operations of RElim

If the counter associated with the list, which states the
support of the item, exceeds the minimum support, the item set
consisting of this item and the prefix of the conditional
database is reported as frequent. In addition, the list is
traversed and its elements are copied to construct a new list
array, which represents the conditional database of
transactions containing the item. In this operation, the leading
item of each transaction (suffix) is used as an index into the
list array to find the list to which the transaction has to be
added, and the leading item is then removed (see Figure 3 on the
right). The resulting conditional database is processed
recursively to find all frequent item sets containing the list
item.
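The recursion described above can be sketched compactly in Python. This is a simplified illustration, not the implementation from [6]: Python lists replace the linked lists, duplicate transactions are stored separately, an item is reported when its count is no less than the minimum support (following the definition in the introduction), and the names `relim` and `mine` as well as the toy database are assumptions.

```python
from collections import Counter


def relim(lists, order, minsup, prefix=(), found=None):
    """Recursive elimination over a list array.

    lists: item -> suffixes of the transactions whose leading item is `item`.
    order: items in processing order (least frequent first).
    """
    if found is None:
        found = {}
    lists = {item: list(lists.get(item, [])) for item in order}  # work on a copy
    for pos, item in enumerate(order):
        suffixes = lists[item]
        if len(suffixes) >= minsup:
            # Item set = prefix of the conditional database + this item.
            found[prefix + (item,)] = len(suffixes)
            # Conditional database: index each suffix by its leading item.
            rest = order[pos + 1:]
            cond = {i: [] for i in rest}
            for s in suffixes:
                if s:
                    cond[s[0]].append(s[1:])
            relim(cond, rest, minsup, prefix + (item,), found)
        # Eliminate the item: reassign its suffixes to the remaining lists.
        for s in suffixes:
            if s:
                lists[s[0]].append(s[1:])
        lists[item] = []
    return found


def mine(db, minsup):
    """Preprocess db (ties broken alphabetically, an assumption) and run relim."""
    freq = Counter(i for t in db for i in set(t))
    order = sorted(freq, key=lambda i: (freq[i], i))
    rank = {item: r for r, item in enumerate(order)}
    lists = {item: [] for item in order}
    for t in db:
        items = sorted(set(t), key=rank.__getitem__)
        if items:
            lists[items[0]].append(tuple(items[1:]))
    return relim(lists, order, minsup)


# Hypothetical toy database (not the database of Figure 1).
db = [["a", "d"], ["a", "c", "d", "e"], ["b", "d"], ["b", "c", "d"], ["b", "c"]]
frequent = mine(db, minsup=2)
```

Because every less frequent item has already been eliminated when an item is processed, each remaining transaction that contains the item has it as its leading item, so the length of its list equals its support in the current conditional database.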

III. LIMITATION OF EXISTING ALGORITHM

The limitation of the RElim algorithm is that its performance
decreases when the dataset has a large number of attributes.
With more attributes, each transaction also contains more items,
so it becomes difficult to select the prefix among items that
have the same frequency.

IV. PROPOSED WORK

The frequent item set mining problem can be solved using many
approaches, one of which is the RElim algorithm. As discussed in
the previous section, this algorithm has a limitation. To
overcome it, we propose two different approaches. RElim uses an
array-list data structure, and array-based FI mining algorithms
take less running time than tree-based algorithms. RElim's
operations are based on three processing steps: deleting items,
recursive processing, and reassigning transactions. Here we
consider the item deletion step, where there is scope for
improvement in terms of time. All the preprocessing steps are
therefore the same as in RElim; the difference lies in the order
in which items are chosen for pruning when they have the same
frequency.
In the proposed method, there are two approaches for choosing
an item for elimination when items have the same frequency:
alphabetical order, and order of occurrence of an item in the
database.
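The two tie-breaking rules can be sketched in Python as follows. This sketch reflects one reading of the text, namely that items are still ranked by ascending frequency and only ties are resolved alphabetically or by first occurrence; the function name `item_order`, its `tie_break` parameter and the toy database are assumptions introduced for illustration.

```python
from collections import Counter


def item_order(db, tie_break="alphabetical"):
    """Elimination order: ascending frequency, ties resolved by tie_break."""
    freq = Counter(item for t in db for item in set(t))
    # Position at which each item first occurs while scanning the database.
    first_seen = {}
    for t in db:
        for item in t:
            first_seen.setdefault(item, len(first_seen))
    if tie_break == "alphabetical":
        key = lambda item: (freq[item], item)
    elif tie_break == "occurrence":
        key = lambda item: (freq[item], first_seen[item])
    else:
        raise ValueError(f"unknown tie_break: {tie_break!r}")
    return sorted(freq, key=key)


# Hypothetical toy database in which a, b and e all occur twice.
db = [["e", "d", "b"], ["a", "d"], ["b", "d", "c"], ["e", "a"]]
alphabetical = item_order(db, "alphabetical")
occurrence = item_order(db, "occurrence")
```

With this database the alphabetical rule places a before b and e among the tied items, while the occurrence rule places e first because it appears first in the database, so the two rules select different prefixes.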
Suppose that the database and preprocessing steps 1 to 4 shown
in Figure 1 are unchanged, and that the data structure shown in
Figure 2 is also the same. To begin step 6, we need to consult
the list to select an item as the prefix for pruning. In Figure
2, item e and item a have the same frequency. It is therefore
unclear whether item a or item e should be selected as the
prefix, and all of the recursive processing and reassignment of
transactions depends heavily on this prefix.
If we consider the first approach, i.e. alphabetical order, the
item-order list will be {a,b,c,d,e}, as shown in Figure 3, and
item a is selected as the prefix. Its array elements are
transferred to the conditional database, and the leading item of
each transaction is used as an index into the list array to find
the list it has to be added to (in our case only c). The support
count of that list is incremented by the number of transactions
added to it (in our case c is incremented by 2).

Fig. 3: Operations of modified RElim using Method-I

If we consider the second approach, i.e. order of occurrence,
the item-order list will be {e,d,b,a,c}, as shown in Figure 4.

Fig. 4: Operations of modified RElim using Method-II

Item e is selected as the prefix, and its array elements are
transferred to the conditional database. The leading items b and
c are used as indexes into the list array to find the lists the
transactions have to be added to, and the support counts of both
lists are incremented by 1, as one transaction bd is added to
the list of b and one transaction cbd is added to the list of c.

V. COMPARING ALGORITHMS

Tree-based algorithms require more time to find frequent
itemsets, whereas RElim requires less execution time because it
uses only three steps: deleting items, recursive processing, and
reassigning transactions. With the modified approach, the
algorithm is expected to take less time to execute, and each
item receives its due importance when frequent itemsets are
generated.


VI. CONCLUSION

This paper provides a brief introduction to the algorithms
used in the area of frequent item set mining. The RElim
algorithm is based on an array-list structure and is easy to
implement. Modified RElim extends the existing RElim by
maintaining an item list for items with the same frequency.
Building on the comparison of such items, as future work I plan
to analyse the behaviour of various interestingness measures in
mining frequent itemsets.

ACKNOWLEDGMENT

My most sincere thanks go to my advisor, Asst. Prof. Neha
Pandya. I thank her for giving me the opportunity to work in
the area of FI mining, and for her guidance, encouragement and
support during the initial development of this project. I would
also like to thank her for the time she spared for me from her
extremely busy schedule.

REFERENCES

[1] R. Agrawal and R. Srikant. Fast algorithms for mining
association rules. In VLDB'94, pp. 487-499.
[2] R. Agrawal, T. Imielinski, and A. Swami. Mining
association rules between sets of items in large databases. In
Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data,
Washington, D.C., May 1993, pp. 207-216.
[3] J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns
without candidate generation: a frequent-pattern tree approach.
Data Mining and Knowledge Discovery, 2003.
[4] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without
candidate generation. In: Proc. Conf. on the Management of
Data (SIGMOD'00, Dallas, TX). ACM Press, New York, NY, USA
2000.
[5] M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New
algorithms for fast discovery of association rules. Proc. 3rd Int.
Conf. on Knowledge Discovery and Data Mining (KDD'97,
Newport Beach, CA), 283-296. AAAI Press, Menlo Park, CA,
USA 1997.
[6] C. Borgelt. Keeping things simple: finding frequent item sets by
recursive elimination. Proc. Workshop Open Software for Data
Mining (OSDM'05 at KDD'05, Chicago, IL), 66-70. ACM
Press, New York, NY, USA 2005.
[7] C. Borgelt. Simple algorithms for frequent item set mining.
Springer-Verlag, Berlin, Germany 2010.
[8] J. Han and M. Kamber, 2000. Data Mining: Concepts and
Techniques. Morgan Kaufmann.
[9] S.K. Tanbeer, C.F. Ahmed, B.-S. Jeong, and Y.-K. Lee. Efficient
single-pass frequent pattern mining using a prefix-tree.
Information Sciences 179 (2009) 559-583.
[10] C.K.-S. Leung, Q.I. Khan, Z. Li, and T. Hoque. CanTree: A
canonical-order tree for incremental frequent-pattern mining.
KAIS, 11 (3), pp. 287-311, Apr. 2007.
[11] R. Somkumar. A study on various data mining approaches of
association rules. Int. J. Comput. Sci. Eng. Vol. 2, pp. 141-144.
[12] C.L. Blake and C.J. Merz. UCI Repository of Machine Learning
Databases. Dept. of Information and Computer Science,
University of California at Irvine, CA, USA 1998.
http://www.ics.uci.edu/mlearn/MLRepository.html
[13] R. Kohavi, C.E. Bradley, B. Frasca, L. Mason, and Z. Zheng.
KDD-Cup 2000 Organizers' Report: Peeling the Onion.
SIGKDD Exploration 2(2):86-93. ACM Press, New York, NY,
USA 2000.
[14] Synthetic Data Generation Code for Associations and Sequential
Patterns. Intelligent Information Systems, IBM Almaden
Research Center.
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
