Synopsis

ABSTRACT
Association rule mining plays an important role in many data mining processes. Its applications span fields such as market basket analysis, medical diagnosis and share market prediction. Many authors and researchers now focus on the validation of association rules. Optimization algorithms such as the genetic algorithm, ant colony optimization (ACO) and particle swarm optimization are used for this validation. Some authors validate rules with trigonometric functions, which may be monotonic or non-monotonic. Continuing this line of work, sine- and cosine-based constraint functions have been used; these functions operate over a limited interval. A genetic algorithm is used to select and validate these functions. The genetic algorithm is a well-known optimization technique in association rule mining.
Multiple constraints and meta-heuristic functions play a major role in efficient association rule mining. The multiple constraints are applied as inbound and outbound conditions and evaluate the rules for real-time databases. Many researchers currently apply meta-heuristic functions in data mining for pattern and rule extraction. These functions optimize the rules and reduce rule redundancy. Particle swarm optimization, ant colony optimization and several neural network models are used to optimize rule mining. This paper reviews rule mining techniques that use constraint functions based on multiple conditions and meta-heuristic functions for optimization.
A variety of algorithms are used for rule mining, such as the Apriori algorithm and tree-based algorithms. Some of these algorithms perform well but generate negative association rules and suffer from the multi-scan problem. This dissertation proposes IMLMS-GA, an association rule mining method based on the min-max algorithm and the MLMS formula. The method uses multilevel multiple support and encodes the data table as 0 and 1. This division reduces the time needed to scan the database. The proposed algorithm combines MLMS with the min-max algorithm. The support length key is a vector value given by the transaction data set.
I. INTRODUCTION
1.1 INTRODUCTION
Association rule mining has been applied to the market domain, where specific problems have been studied, such as managing aspects of a shopping mall, and an architecture has been used that makes it possible to construct agents capable of adapting the association rules. Data mining refers to extracting knowledge from large quantities of data. Association rule mining can discover interesting associations among a large set of data items. Finding interesting relationships among large numbers of business transaction records can help in many business decision-making processes. Association rule mining is an important task in data mining, and frequent item set mining is a key step in many association rule mining algorithms. A great deal of work has been done on mining association rules. When the dataset is large, the number of rules generated may be very large, and many of them are not interesting to users, so it is common to set parameters that reduce the number of rules generated; support and confidence are the two most common. An association rule R is of the form $A \rightarrow B$, where A and B are disjoint subsets of the attribute set I. The support for the rule R is the number of database records that contain $A \cup B$ (often expressed as a proportion of the total number of records). The confidence in the rule R is the ratio:
\[ \mathrm{confidence}(R) = \frac{\text{support for } R}{\text{support for } A} \]
Mining begins by finding the subsets of I for which the support exceeds the required threshold. Such subsets are referred to as “large”, “frequent” or “interesting” sets.
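The two measures defined above can be illustrated with a minimal Python sketch. The function names and the toy transactions are assumptions made for illustration only; support is expressed here as a proportion of the total number of records.

```python
from typing import FrozenSet, List

def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(A u B) / support(A); returns 0.0 when A never occurs."""
    sup_a = support(antecedent, transactions)
    if sup_a == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / sup_a

# Hypothetical market-basket data, not taken from the synopsis.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(frozenset({"bread", "milk"}), transactions)
rule_confidence = confidence(frozenset({"bread"}), frozenset({"milk"}), transactions)
```

For this toy data the rule bread -> milk appears in two of four records, and bread alone appears in three, so the confidence is two thirds.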
1.2 ASSOCIATION RULE

The association rule mining problem was first stated formally by Agrawal et al. [Agrawal et al. 1993]. Let $I = \{I_1, I_2, \ldots, I_m\}$ be a set of m distinct attributes, let T be a transaction that contains a set of items such that $T \subseteq I$, and let D be a database of transaction records T. An association rule is an implication of the form $X \rightarrow Y$, where $X, Y \subset I$ are sets of items called item sets and $X \cap Y = \emptyset$. X is called the antecedent and Y the consequent; the rule means that X implies Y. There are two basic measures for association rules: support (S) and confidence (C).
1.2.1 ASSOCIATION RULE MINING IN LARGE DATABASE

Association rule mining, used to mine relationships between items in the sales transactions of a large database, is recognized as a significant area of database research. Different techniques are used to measure a large database; pruning strategies and interestingness measures are among them. A large database consists of many fields, each with its own processing, and these differ by field of work. Consider the customer transactions of a large database: each transaction records the items purchased by a customer in a visit, the time of purchase, the category of payment, the net amount and so on, so maintaining a huge number of customer transactions is a tedious process. An efficient algorithm is therefore needed for association rule mining. The Apriori algorithm is a common choice for association rule mining in large databases.
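As a rough illustration of the level-wise idea behind Apriori, the following Python sketch generates frequent item sets from an absolute support count. It is a textbook-style simplification under assumed names (`apriori`, `min_support_count`), not the implementation used in this work, and it rescans the transactions once per level.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

def apriori(transactions: List[FrozenSet[str]],
            min_support_count: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent item set together with its support count."""
    # Level 1: count single items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, n in counts.items() if n >= min_support_count}
    frequent = {s: counts[s] for s in current}

    k = 2
    while current:
        # Join step: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count step: one pass over the transactions for this level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c for c, n in counts.items() if n >= min_support_count}
        frequent.update({c: counts[c] for c in current})
        k += 1
    return frequent
```

The prune step is what limits the number of candidates counted at each level; the repeated count step is the multi-scan cost that later sections aim to reduce.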
1.2.2 ASSOCIATION RULE MINING IN DISTRIBUTED DATABASE
Databases or data warehouses may store a large amount of data to be mined, and mining association rules in such large databases may require extensive processing power. Distributed systems address this problem. Because many large databases are themselves distributed, it is more feasible to mine them with distributed algorithms on a distributed system. Distributed computation of large item sets encounters its own complications, which are addressed by different distributed algorithms, such as:
> Distributed association rule learning.
> Distributed hierarchical clustering.
> Collective PCA and PCA-based clustering.
> Collective decision tree learning.
> Collective Bayesian network learning.
Centralized data mining to discover useful patterns in distributed databases is not always feasible, because merging data sets from different sites incurs huge network communication costs. Distributed higher-order association rule mining
algorithm is to
1.2.3 ASSOCIATION RULE MINING IN RELATIONAL DATABASE

An ever-growing number of organizations are installing large data warehouses built on relational database technology, and there is a huge demand for mining knowledge from these warehouses. Association rule mining is used to support decision making in this setting. For example, relational association rules and supervised learning methods help to identify the probability of illness for a certain disease. Such an interface can be extended simply by adding new symptom types for the given disease and by defining new relations between these symptoms.
Figure 1.1: A Multi-relational Database Environment (image labelled RDBMS).
1.2.4 ASSOCIATION RULE MINING IN SPATIAL DATABASE
Spatial data mining is the discovery of interesting patterns from large geospatial databases. It refers to the extraction of knowledge, spatial associations or other interesting patterns that are not explicitly stored in spatial databases. In this setting, association rules capture spatial relations among spatial objects. A spatial database contains objects that are described by a spatial location and/or extension as well as by several non-spatial attributes. Spatial data mining algorithms have to consider the neighbours of objects in order to mine useful knowledge. This is necessary because the attributes of the neighbours of an object of interest may have a significant influence on the object itself. Applying data mining techniques in spatial databases to census data, and more generally to official data, has great potential for supporting sound public policy and sustaining the effective functioning of an independent society. Spatial data mining approaches and procedures have been proposed for mining hidden knowledge, spatial relations and other patterns not explicitly stored in spatial databases.
Spatial data mining is used in:
> NASA Earth Observing System (EOS) for Earth science data.
> Census Bureau, Dept. of Commerce, for census data.
> Dept. of Transportation (DOT) for traffic data.
> National Inst. of Health (NIH) for cancer clusters.
II. RELATED WORK
(Literature Review)
2) PREVIOUS WORK DONE
1. “AN IMPROVED ASSOCIATION RULES MINING METHOD”
In this paper, the authors present a directed item sets graph to store the information of frequent item sets of transaction databases, and give a trifurcate linked list storage structure for the graph. They then develop an algorithm for mining maximal frequent item sets (MFI) based on this structure. As a result, the database is scanned only once, and both the storage efficiency of the data structure and the time efficiency of the mining algorithm improve. Among earlier approaches, Max-Miner and Pincer-Search search the item sets lattice in a breadth-first manner to find MFI. The former uses a look-ahead strategy to prune branches from the lattice by quickly identifying long frequent item sets. The latter combines bottom-up and top-down searches. The Depth-Project algorithm searches the item sets lattice in a depth-first manner to find MFI.
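To make the notion of a maximal frequent item set concrete, the sketch below filters a collection of frequent item sets down to those with no frequent proper superset. It is a brute-force illustration under an assumed function name, not the directed item sets graph method of the reviewed paper and not Max-Miner or Pincer-Search.

```python
from typing import FrozenSet, Iterable, List

def maximal_itemsets(frequent: Iterable[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Keep only the item sets that are not a proper subset of another frequent set."""
    frequent = list(frequent)
    return [s for s in frequent
            if not any(s < other for other in frequent)]
```

Because every subset of a frequent item set is also frequent, the maximal sets are a compact summary from which all frequent item sets can be regenerated, although not their individual support values.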
2. “MINING FREQUENT PATTERNS FROM UNIVARIATE UNCERTAIN DATA”
This paper presents U2P-Miner, a new algorithm for mining frequent U2 patterns from univariate uncertain data, where each attribute in a transaction is associated with a quantitative interval and a probability density function. The algorithm runs in two phases. First, it constructs a U2P-tree that compresses the information in the target database. Then it uses the U2P-tree to discover frequent U2 patterns: potential frequent U2 patterns are derived by combining base intervals and verified by traversing the U2P-tree. The authors also develop two techniques to speed up mining. Because the method is based on a tree-traversing strategy, it is reported to be both efficient and scalable. The authors evaluate U2P-Miner on synthetic and real datasets, and their experimental results show that it outperforms three widely used algorithms, namely the modified Apriori, modified H-mine and modified depth-first backtracking algorithms, each of which was adapted for the comparison. The H-mine algorithm constructs an H-struct for the database and maintains a linked array, called a header table, that links the occurrences of each item in the transactions. The modified H-mine algorithm is similar to UH-mine, a variant of H-mine for item set uncertain data.
3. “ASSOCIATION RULES MINING WITH MULTIPLE CONSTRAINTS”
This paper proposes an algorithm for mining association rules with multiple constraints that copes with two different kinds of constraints simultaneously. It consists of three phases. First, the frequent 1-itemsets are generated. Second, the properties of the given constraints are exploited to prune the search space or to save constraint checking in the conditional databases. Third, for each item set that may satisfy the constraints, the algorithm generates its conditional database and performs the three phases on it recursively. Experimental results show that the proposed method outperforms the revised FP-growth algorithm. Discovering all frequent item sets that satisfy constraints is difficult for two reasons. First, minimum support and maximum support cannot be tested simultaneously because, when valid, one always holds for subsets while the other always holds for supersets. Second, despite their selective power, some constraints cannot be used to filter candidate item sets until a very late stage of mining, depending on the type of constraint and the search space traversal strategy. Efficient algorithms have been proposed for this problem, but most of them cope with only one constraint.
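The subset/superset asymmetry between minimum and maximum support can be seen in a small Python sketch. The helper names and the toy data are assumptions for illustration; this is not the algorithm of the reviewed paper.

```python
from typing import FrozenSet, List

def support_count(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def meets_min_support(itemset, transactions, min_count: int) -> bool:
    # If this holds for an item set, it also holds for all of its subsets.
    return support_count(itemset, transactions) >= min_count

def meets_max_support(itemset, transactions, max_count: int) -> bool:
    # If this holds for an item set, it also holds for all of its supersets.
    return support_count(itemset, transactions) <= max_count

# Hypothetical transactions used only to exercise the two checks.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
```

Adding an item can only lower a support count, so a minimum-support check can prune upward from a failing set, while a maximum-support check can only be settled by growing the set.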
4. “MINING POSITIVE AND NEGATIVE ASSOCIATION RULES FROM INTERESTING FREQUENT AND INFREQUENT ITEM SETS”
The aim of this study is to develop a new model for mining interesting negative and positive association rules from a transactional data set. The proposed model integrates two algorithms, the Positive Negative Association Rule (PNAR) algorithm and the Interesting Multiple Level Minimum Supports (IMLMS) algorithm, into a new approach (PNAR_IMLMS) for mining both negative and positive association rules from the interesting frequent and infrequent item sets mined by the IMLMS model. The experimental results show that the PNAR_IMLMS model provides significantly better results than the previous model. The purpose of association rule mining is to find certain associations between a set of items in a database.
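A negative rule of the form A -> not B can be scored from ordinary supports, using the standard identity sup(A and not B) = sup(A) - sup(A u B). The sketch below shows only that arithmetic; the function names are hypothetical and it does not reproduce the PNAR_IMLMS model.

```python
from typing import FrozenSet, List, Tuple

def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def negative_rule_measures(a: FrozenSet[str], b: FrozenSet[str],
                           transactions: List[FrozenSet[str]]) -> Tuple[float, float]:
    """Support and confidence of the negative rule A -> not B."""
    sup_a = support(a, transactions)
    sup_ab = support(a | b, transactions)
    sup_neg = sup_a - sup_ab                      # records with A but without B
    conf_neg = sup_neg / sup_a if sup_a > 0 else 0.0
    return sup_neg, conf_neg
```

The confidence of A -> not B is therefore one minus the confidence of A -> B, which is why an infrequent pair can still yield a strong negative rule.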
5. “MULTI-LEVEL ASSOCIATION RULE MINING BASED ON CLUSTERING PARTITION”
In this paper the authors describe a method that combines concept hierarchies with generalized processing of the data and uses a SOFM neural network to cluster the generalized transaction database. By introducing an internal threshold, the method avoids the need to set a minimum support threshold. Local frequent item sets are generated and used as global candidate item sets, from which the global frequent item sets are generated, thereby improving the efficiency and accuracy of multi-level association rule mining. A simulated case shows that the method can efficiently mine both single-layer and cross-layer association rules, and that the resulting rules are new, easy to understand and meaningful.
6. “A WEB BASED RECOMMENDATION USING ASSOCIATION RULE AND CLUSTERING”
This paper describes a web-based recommendation system built on association rules and clustering. Such a system includes a browsing history database containing information about the web pages a user has browsed. The paper predicts user navigation patterns in web usage mining (WUM) from web log data using association rules and clustering. In the first stage, potential users are separated; in the second stage, clustering groups users with similar interests; and in the third stage, association rules and clustering are used to predict the user's future requests. The authors report encouraging experimental results.
7. “INTERESTINGNESS MEASURES FOR MULTI-LEVEL ASSOCIATION RULES”
In this paper the authors propose two approaches for measuring multi-level association rules to help evaluate their interestingness. These measures, diversity and peculiarity, can help identify the rules from multi-level datasets that are potentially useful. Association rule mining is widely used when querying databases, especially transactional ones, to obtain useful associations or correlations among sets of items. Much work has focused on efficiency, effectiveness and redundancy. There has also been a focus on the quality of rules from single-level datasets, for which many interestingness measures have been proposed.
III. PROPOSED WORK
This section presents the proposed algorithm for optimizing association rule mining. The algorithm addresses the problem of negative rule generation and also optimizes the rule generation process. Negative association rule mining is a great challenge for large datasets. While generating valid rules, existing algorithms also generate a series of negative rules, which degrades the performance of association rule mining. Various multi-objective association rule mining algorithms have been proposed for rule generation, but none of them solves this problem. This work proposes MLMS-GA, an association rule mining algorithm that uses the min-max algorithm. MLMS (multi-level minimum support) is used for constraint validation. The scan of the database is divided into multiple levels, a frequent level and an infrequent level, according to MLMS. Frequent data is logically assigned 1 and infrequent data is logically assigned 0 for the MLMS process. This division reduces the number of uninteresting items in the given database. The proposed algorithm combines MLMS with the min-max algorithm and uses a level weight to separate frequent and infrequent items. The weight value acts as the support length key, a vector value given by the transaction data set. The support value is passed as a vector to find the nearest level among the MLMS candidate keys. Once an MLMS candidate key is found, the nearest level is divided into two levels: one takes the higher-order value and the other takes the infrequent minimum support value for the rule generation process. This level selection also reduces the number of passes over the data set. After the lower and higher levels for the given support value are found, the values of the level weight vector are compared. Here the level length vector works as the fitness function for the selection process of the min-max algorithm. The steps of the algorithm are presented below, followed by a flow chart of the complete process.
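One possible reading of the 1/0 labelling described above is sketched here in Python: each item set length has its own minimum support, and an item set is labelled 1 (frequent) when its support reaches the threshold for its length and 0 (infrequent) otherwise. The function name, the per-length threshold table and its values are assumptions for illustration, not the exact MLMS formula of this work.

```python
from typing import Dict, FrozenSet, List

def mlms_label(itemset: FrozenSet[str],
               transactions: List[FrozenSet[str]],
               min_support_by_length: Dict[int, float]) -> int:
    """Return 1 if the item set meets the minimum support for its length, else 0."""
    sup = sum(1 for t in transactions if itemset <= t) / len(transactions)
    threshold = min_support_by_length[len(itemset)]
    return 1 if sup >= threshold else 0

# Hypothetical thresholds: longer item sets are allowed a lower support.
example_thresholds = {1: 0.6, 2: 0.5, 3: 0.25}
```

With length-dependent thresholds, a pair of items can be kept as frequent at a support that would be rejected for a single item.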
Steps of algorithm (MLMS-GA)

1. Scan the database using the following steps.
Notation used in the pseudo code: D is the dataset, K is the MLMS level, and Ls is the generated candidate set.
    K = MLMS_dataset(D)
    n = number of multiple-level blocks
    For i = 1 to n loop
        scan_k(ki ∈ K)
        Li = gen_itemsets(ki)
    For (i = 2; Li^j ≠ ∅, j = 1, 2, ..., n; i++)
        Ci = ∪ (j = 1, 2, ..., n) Li^j
    End;
    For i = 1 to n
        scan_kmap(ki ∈ K)
        For all items c ∈ CG generate block(c, ki)
    End;
    LG = {c ∈ CG}
2. Generate the multiple support vector values for the selection process.
    For all transactions in LG, generate the count table TC
    L1 = {frequent 1-itemsets};
    C2 = L1 ⋈ L1;
    L2 = {c ∈ C2 | sup(c) ≥ MinSupNum};
    For (k = 3; L(k-1) ≠ ∅; k++) do begin
        For (j = k; j ≤ n; j++) do
            Generate CIV_ij^(k-1);
        Ck = candidate_gen(L(k-1))
        Lk = {c ∈ Ck | sup(c) ≥ MinSupNum};
    End
3. Generate the set of rules.
    Return L = ∪k Lk;
    candidate_gen(frequent itemsets L(k-1))
        a. for all (k-1)-itemsets l ∈ L(k-1) do
        b. for all ij ∈ L(k-1) do
        c. // S is the result of formula (2)
           if for every r such that S[r] ≥ k-1 then
4. Check the MLMS value of the table.
5. If the rule is not MLMS, go to the selection process.
6. Else the optimized rule is generated.
   Exit
7.
a) Data Encoding
The min-max algorithm needs a data encoding technique to represent the data. Here the binary encoding technique is used.
b) Fitness Function
The population selection of the min-max algorithm is governed by the design of the fitness function:
\[ M(S) = \frac{A_i}{w_i} + \frac{B_i}{1 - w_i} \]
where $A_i$ is the frequent item support, $w_i$ is the level weight value of MLMS, and $B_i$ is the value of the infrequent data.
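The fitness function can be sketched in Python under the assumption that it reads M(S) = A_i / w_i + B_i / (1 - w_i), with the level weight w_i strictly between 0 and 1. Both that reading of the formula and the parameter names are assumptions made for illustration.

```python
def fitness(frequent_support: float, infrequent_value: float, level_weight: float) -> float:
    """Assumed form of M(S): A_i / w_i + B_i / (1 - w_i)."""
    if not 0.0 < level_weight < 1.0:
        # Both denominators must be non-zero, so the weight is restricted to (0, 1).
        raise ValueError("level_weight must lie strictly between 0 and 1")
    return frequent_support / level_weight + infrequent_value / (1.0 - level_weight)
```

Under this reading, a small weight amplifies the frequent-support term and a weight close to 1 amplifies the infrequent term.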
The selection strategy is based on individual fitness and concentration: pi is the probability of selecting an individual whose fitness value is greater than one, and M(S) denotes those values whose fitness is less than one but close to 1. The min-max operators determine the search capability and convergence of the algorithm. They perform selection, crossover and mutation on the population and generate the new population. The algorithm restores each chromosome in the population to its corresponding rule and then calculates the selection probability pi for each rule from the formula above. Single-point crossover is used. The algorithm divides the multiple-level domain of each attribute into a group and classifies the cut points of each continuous attribute into one group. Crossover is carried out between the corresponding groups of two individuals at a certain rate. Any bit in a chromosome is mutated at a certain rate, that is, by changing “0” to “1” or “1” to “0”. The block diagram below shows the complete process of the proposed algorithm using the min-max algorithm.
Block diagram steps, in order:
1. Load data
2. Set constraints parameter
3. Start scanning
4. Get frequent items
5. Partition all frequent items into multiple levels
6. Decision: is level = 0? (yes / no)
7. Set population
8. Selection
9. Crossover
10. Mutation with probability P = 0.007
11. Decision: is the rule optimal? (yes / no)
12. Optimized rule set
Figure 4.1: Proposed model of the association rule mining algorithm for large databases.
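The selection, crossover and mutation steps of the block diagram can be sketched on binary chromosomes as follows. Roulette-wheel selection is an assumption here, since the text only says that selection is based on fitness; the function names are hypothetical, and the mutation rate of 0.007 is the value shown in the diagram.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]
MUTATION_RATE = 0.007  # value given in the block diagram

def roulette_select(population: Sequence[Chromosome],
                    fitnesses: Sequence[float],
                    rng: random.Random) -> Chromosome:
    """Pick one chromosome with probability proportional to its fitness."""
    pick = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return chromosome
    return population[-1]

def single_point_crossover(a: Chromosome, b: Chromosome,
                           point: int) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two parents after position `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each bit (0 to 1, 1 to 0) independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]
```

A generation would typically select two parents, cross them at a random point and pass both children through `mutate(child, MUTATION_RATE, rng)`; passing a seeded `random.Random` keeps a run repeatable.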
REFERENCES
[1] Xiaobing Liu, Kun Zhai, Witold Pedrycz, “An improved association rules mining method”, Expert Systems with Applications, 2012, pp. 1362-1374.
[2] Ying-Ho Liu, “Mining frequent patterns from univariate uncertain data”, Data & Knowledge Engineering, 2012, pp. 47-68.
[3] Li Guang-yuan, Cao Dan-yang, Guo Jian-wei, “Association Rules Mining with Multiple Constraints”, Elsevier Ltd., 2011, pp. 1678-1683.
[4] Idheb Mohamad Ali O. Swesi, Azuraliza Abu Bakar, Anis Suhailis Abdul Kadir, “Mining Positive and Negative Association Rules from Interesting Frequent and Infrequent Item sets”, IEEE 9th International Conference on Fuzzy Systems and Knowledge Discovery, 2012, pp. 650-655.
[5] Huang QingLan, Duan LongZhen, “Multi-level association rule mining based on clustering partition”, IEEE Third International Conference on Intelligent System Design and Engineering Applications, 2013, pp. 982-986.
[6] Vidhu Singhal, Gopal Pandey, “A Web Based Recommendation Using Association Rule and Clustering”, International Journal of Computer & Communication Engineering Research, 2013, pp. 1-5.
[7] Gavin Shaw, Yue Xu, Shlomo Geva, “Interestingness Measures for Multi-Level Association Rules”, Proceedings of the 14th Australasian Document Computing Symposium, 2009, pp. 2-9.
Signature of Candidate Signature of Guide

(Ms. Sonali P. Mahindre) Prof. / Dr. G. R. Bamnote
Department of Computer Sc. & Engineering
PRMIT&R, Badnera
