Data Mining-Association Mining 1

Association Rule Mining
Association Rules
Finds interesting association or correlation
relationships among large sets of data items
Business Decision Making
Example: Market Basket Analysis
Items likely to be purchased
Advertising strategy, Catalog Design, Store
layout
Association Rules
Forming Association rules
Universe: all items
Boolean vector: one value per item, indicating its presence or absence in a basket
Example:
Computer => Accounting_Software
[support = 5%, confidence = 60%]
Minimum support and confidence thresholds
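The Boolean vector representation can be sketched as follows. This is a minimal illustration, not part of the original slides: the item names and the helper `to_boolean_vector` are hypothetical.

```python
# Hypothetical universe of items (names are illustrative only).
universe = ["computer", "accounting_software", "printer", "antivirus"]

def to_boolean_vector(basket, universe):
    """Return one boolean per item in the universe: True if the basket contains it."""
    items = set(basket)
    return [item in items for item in universe]

basket = ["computer", "accounting_software"]
vector = to_boolean_vector(basket, universe)
print(vector)  # [True, True, False, False]
```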
Basic Concepts
I = {i1, i2, ..., im}: set of items
D: set of database transactions
T: a transaction, containing a set of items such that T ⊆ I
Association rule: A => B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅
Support: percentage of transactions in D containing
both A and B, P(A ∪ B)
Confidence: percentage of transactions in D
containing A that also contain B, P(B|A)
P(B|A) = support_count(A ∪ B) / support_count(A)
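Support and confidence can be computed directly from these definitions. The sketch below assumes transactions are stored as Python sets; the database `D` and the function names are made up for illustration.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(a, b, transactions):
    """Fraction of transactions containing both A and B."""
    return support_count(a | b, transactions) / len(transactions)

def confidence(a, b, transactions):
    """Fraction of transactions containing A that also contain B."""
    return support_count(a | b, transactions) / support_count(a, transactions)

# Hypothetical toy database of five transactions.
D = [
    {"computer", "accounting_software"},
    {"computer", "printer"},
    {"computer", "accounting_software", "printer"},
    {"printer"},
    {"computer"},
]
A, B = {"computer"}, {"accounting_software"}
print(support(A, B, D))     # 0.4
print(confidence(A, B, D))  # 0.5
```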
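Itemset: a set of items
k-itemset: an itemset containing k items
Occurrence frequency of an itemset: number of transactions containing it
Also called frequency, support_count (absolute support) or
count
Itemset satisfies minimum support when count >=
min_sup * number of transactions in D
This threshold is the Minimum Support Count
Frequent Itemset: an itemset that satisfies minimum support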
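As a small worked example of the minimum support count, the sketch below finds the frequent 1-itemsets of a made-up database. The data and the 50% threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical database of four transactions.
D = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}, {"b"}]
min_sup = 0.5  # relative minimum support (assumed value)

# count >= min_sup * |D|; counts are integers, so rounding up is equivalent.
min_count = math.ceil(min_sup * len(D))

counts = Counter(item for t in D for item in t)
frequent_1 = {item for item, n in counts.items() if n >= min_count}
print(min_count)           # 2
print(sorted(frequent_1))  # ['a', 'b']
```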
Association Rule Mining Process
Find all frequent itemsets
Generate strong association rules from frequent
itemsets
Satisfy Minimum Support and Minimum
Confidence
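The second phase, generating strong rules from frequent itemsets, might look like the sketch below. It assumes the support counts of all frequent itemsets (and hence of all their subsets) are already known; `generate_rules` and the counts are hypothetical.

```python
from itertools import combinations

def generate_rules(support_counts, min_conf):
    """Return (antecedent, consequent, confidence) for every strong rule."""
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = count / support_counts[a]
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules

# Hypothetical support counts of the frequent itemsets.
support_counts = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 3,
    frozenset({"a", "b"}): 3,
}
rules = generate_rules(support_counts, min_conf=0.8)
for a, b, conf in rules:
    print(sorted(a), "=>", sorted(b), conf)  # ['b'] => ['a'] 1.0
```

Only b => a is kept here: a => b has confidence 3/4 = 0.75, which is below the assumed 0.8 threshold.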
Itemsets
Complete Itemsets
Closed Frequent Itemset
X is closed in a data set S if there exists no proper
super-itemset Y such that Y has the same support
count as X in S
X is a closed frequent itemset if X is both closed and frequent in S
Maximal Frequent Itemset
X is frequent and there exists no super-itemset Y
such that X ⊂ Y and Y is frequent in S
Example:
T = { {a1, a2, ..., a100}, {a1, a2, ..., a50} }, min_sup = 1
Closed frequent itemsets: {a1, a2, ..., a100}: 1 and
{a1, a2, ..., a50}: 2
Maximal frequent itemset: {a1, a2, ..., a100}
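A scaled-down version of this example (4 and 2 items instead of 100 and 50) can be checked by brute force. The sketch below is illustrative only: it enumerates every itemset, which is feasible only for tiny data, and all function names are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration of all frequent itemsets with their counts."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for t in transactions if s <= t)
            if count >= min_count:
                result[s] = count
    return result

def closed_itemsets(freq):
    """Frequent itemsets with no proper superset of equal support count."""
    return {x: c for x, c in freq.items()
            if not any(x < y and freq[y] == c for y in freq)}

def maximal_itemsets(freq):
    """Frequent itemsets with no frequent proper superset."""
    return {x: c for x, c in freq.items()
            if not any(x < y for y in freq)}

T = [{"a1", "a2", "a3", "a4"}, {"a1", "a2"}]
freq = frequent_itemsets(T, min_count=1)
closed = closed_itemsets(freq)
maximal = maximal_itemsets(freq)
print(len(freq))  # 15 frequent itemsets, but only 2 closed and 1 maximal
```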
Types of Association Rules

Types of Values
Boolean, Quantitative Association Rule
Dimensions of data
Single Dimensional, Multi-dimensional
Level of abstraction
Multilevel association rules
Based on kinds of rules
Association rules, Correlation rules, Strong
gradient relationships
Based on completeness of patterns
Complete, Closed, Maximal, top-k, constrained,
approximate
Mining Single Dimensional Boolean Association Rules
Apriori Algorithm: Finding Frequent Itemsets using
Candidate Generation
Uses prior knowledge of frequent itemset
properties
Level-wise search
k-itemsets used for exploring (k+1)-itemsets
Frequent 1-itemsets: L1
L1 is used to find L2, L2 to find L3, and so on
Apriori Property
Reduces Search space
All non empty subsets of a frequent itemset must also
be frequent
If P(I) < min_sup then P(I ∪ A) < min_sup
Anti-monotone property: if a set cannot pass a test,
all of its supersets will fail the test as well.
Any subset of a frequent itemset must be frequent
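The property can be observed on a small made-up database: adding an item to an itemset never increases its support count. The data and the threshold of 3 below are assumptions for illustration.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

# Hypothetical database of four transactions.
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
min_count = 3  # assumed minimum support count

count_c = support_count({"c"}, D)        # itemset I = {c}
count_bc = support_count({"b", "c"}, D)  # superset I ∪ {b}

print(count_c, count_bc)  # 2 1
# {c} is infrequent, so its superset {b, c} must be infrequent as well.
print(count_c < min_count and count_bc < min_count)  # True
```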
Apriori property application

Join Step
To find Lk, join Lk-1 with itself to generate the candidate set Ck
li[j]: the j-th item in li (items within an itemset are kept sorted)
Members of Lk-1 are joinable if their first (k-2)
items are common
Members l1 and l2 of Lk-1 are joinable if
(l1[1]=l2[1]) ∧ (l1[2]=l2[2]) ∧ ... ∧ (l1[k-2]=l2[k-2]) ∧ (l1[k-1] < l2[k-1])
Resulting itemset is l1[1], l1[2], ..., l1[k-1], l2[k-1]
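The join condition can be sketched with itemsets stored as sorted tuples. The function `join_step` and the sample L2 below are hypothetical, chosen only to show which pairs are joinable.

```python
def join_step(l_prev):
    """Join L(k-1) with itself; itemsets are sorted tuples of length k-1."""
    candidates = []
    for l1 in l_prev:
        for l2 in l_prev:
            # First k-2 items equal, and last item of l1 < last item of l2.
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                candidates.append(l1 + (l2[-1],))
    return candidates

# Hypothetical frequent 2-itemsets.
L2 = [("a", "b"), ("a", "c"), ("a", "e"), ("b", "c"), ("b", "d"), ("b", "e")]
C3 = join_step(L2)
print(C3)
# [('a', 'b', 'c'), ('a', 'b', 'e'), ('a', 'c', 'e'),
#  ('b', 'c', 'd'), ('b', 'c', 'e'), ('b', 'd', 'e')]
```

The strict inequality on the last item ensures that each candidate is generated exactly once.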
Prune Step
Ck is a superset of Lk
Determine the count of each candidate in Ck by scanning the database
To reduce the size of Ck: if any (k-1)-subset of a candidate is
not in Lk-1, the candidate can be removed from Ck
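A minimal sketch of the prune step, assuming itemsets are sorted tuples. The function `prune_step` and the sample sets are hypothetical.

```python
from itertools import combinations

def prune_step(candidates, l_prev):
    """Keep only candidates whose (k-1)-subsets are all in L(k-1)."""
    frequent_prev = set(l_prev)
    return [c for c in candidates
            if all(s in frequent_prev for s in combinations(c, len(c) - 1))]

# Hypothetical frequent 2-itemsets and the 3-item candidates joined from them.
L2 = [("a", "b"), ("a", "c"), ("a", "e"), ("b", "c"), ("b", "d"), ("b", "e")]
C3 = [("a", "b", "c"), ("a", "b", "e"), ("a", "c", "e"),
      ("b", "c", "d"), ("b", "c", "e"), ("b", "d", "e")]

pruned = prune_step(C3, L2)
print(pruned)  # [('a', 'b', 'c'), ('a', 'b', 'e')]
```

For instance, ('a', 'c', 'e') is dropped because its subset ('c', 'e') is not in L2, so no database scan is needed for it.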
The Apriori Algorithm

Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k
L1 = {frequent items};
for (k = 2; Lk-1 != ∅; k++) do begin
    Ck = candidates generated from Lk-1;
    for each transaction t in database do
        increment the count of all candidates in Ck
        that are contained in t
    Lk = candidates in Ck with min_support
end
return ∪k Lk;
The Apriori Algorithm: An Example

Apriori Algorithm
Input: Database of transactions D, min_sup
Output: L, frequent itemsets
L1 = find_frequent_1-itemsets(D);
for (k = 2; Lk-1 ≠ ∅; k++)
{
    Ck = apriori_gen(Lk-1, min_sup);
    for each transaction t ∈ D
    {
        Ct = subset(Ck, t)
        for each candidate c ∈ Ct
            c.count++;
    }
    Lk = {c ∈ Ck | c.count >= min_sup}
}
return L = ∪k Lk;
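The counting pass inside the loop (the role of subset(Ck, t) and c.count++) can be sketched as below. The database, the candidates, and the name `count_candidates` are assumptions for illustration; this simple version tests every candidate against every transaction.

```python
def count_candidates(candidates, transactions):
    """Scan the database once and count each candidate contained in a transaction."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts

# Hypothetical database of nine transactions.
D = [{"a", "b", "e"}, {"b", "d"}, {"b", "c"}, {"a", "b", "d"}, {"a", "c"},
     {"b", "c"}, {"a", "c"}, {"a", "b", "c", "e"}, {"a", "b", "c"}]
C3 = [frozenset({"a", "b", "c"}), frozenset({"a", "b", "e"})]
min_count = 2  # assumed minimum support count

counts = count_candidates(C3, D)
L3 = {c for c, n in counts.items() if n >= min_count}
print(sorted(counts.values()))  # [2, 2]
```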
Apriori Algorithm
procedure apriori_gen(Lk-1, min_sup)
for each itemset l1 ∈ Lk-1
    for each itemset l2 ∈ Lk-1
        if (l1[1]=l2[1]) ∧ (l1[2]=l2[2]) ∧ ... ∧ (l1[k-2]=l2[k-2]) ∧ (l1[k-1] < l2[k-1])
        {
            c = l1 join l2; // Join step
            if has_infrequent_subset(c, Lk-1) then
                delete c; // Prune step
            else add c to Ck;
        }
return Ck
procedure has_infrequent_subset(c, Lk-1)
for each (k-1)-subset s of c
    if s is not an element of Lk-1 then return TRUE;
return FALSE;
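Putting the procedures together, one possible Python rendering of the pseudocode is sketched below. It is not a tuned implementation: itemsets are sorted tuples, min_sup is taken as an absolute count (`min_count`), and the sample database is made up.

```python
from itertools import combinations

def find_frequent_1_itemsets(transactions, min_count):
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    return {c: n for c, n in counts.items() if n >= min_count}

def has_infrequent_subset(c, l_prev):
    """True if some (k-1)-subset of candidate c is not in L(k-1)."""
    return any(s not in l_prev for s in combinations(c, len(c) - 1))

def apriori_gen(l_prev):
    candidates = []
    for l1 in l_prev:
        for l2 in l_prev:
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = l1 + (l2[-1],)                      # join step
                if not has_infrequent_subset(c, l_prev):
                    candidates.append(c)                # survives prune step
    return candidates

def apriori(transactions, min_count):
    """Return a dict mapping each frequent itemset to its support count."""
    frequent = find_frequent_1_itemsets(transactions, min_count)
    result = dict(frequent)
    while frequent:
        counts = {c: 0 for c in apriori_gen(frequent)}
        for t in transactions:
            for c in counts:
                if t.issuperset(c):
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
    return result

# Hypothetical database of nine transactions.
D = [{"a", "b", "e"}, {"b", "d"}, {"b", "c"}, {"a", "b", "d"}, {"a", "c"},
     {"b", "c"}, {"a", "c"}, {"a", "b", "c", "e"}, {"a", "b", "c"}]
L = apriori(D, min_count=2)
print(len(L))               # 13
print(L[("a", "b", "c")])   # 2
```

On this data the search stops at k = 4: the only joinable pair yields ('a', 'b', 'c', 'e'), which is pruned because ('a', 'c', 'e') is not frequent.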
