
Association Rule Mining

Association Rules
Finds interesting association or correlation
relationships among large sets of data
Supports business decision making
Example: Market Basket Analysis
Items likely to be purchased together
Advertising strategy, catalog design, store
layout
Association Rules
Forming Association rules
Universe: the set of all items
Each transaction is represented as a Boolean vector
Example:
Computer ⇒ Accounting_Software
[support = 5%, confidence = 60%]
Minimum support and minimum confidence thresholds
Basic Concepts
I = {i1, i2, ..., im}: set of items
D: set of database transactions
T: a transaction containing a set of items, with T ⊆ I
Association rule: A ⇒ B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅
Support: percentage of transactions in D containing
both A and B, P(A ∪ B)
Confidence: percentage of transactions in D
containing A that also contain B, P(B|A)
confidence(A ⇒ B) = support_count(A ∪ B) / support_count(A)
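The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `support` and `confidence` and the four-transaction database `D` are assumptions made up for the example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(B|A) as support(A ∪ B) / support(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


# Hypothetical toy database (not from the slides).
D = [
    frozenset({"computer", "accounting_software"}),
    frozenset({"computer", "printer"}),
    frozenset({"computer", "accounting_software", "printer"}),
    frozenset({"printer"}),
]

# Rule: computer => accounting_software
rule_support = support({"computer", "accounting_software"}, D)          # 2 of 4 transactions
rule_confidence = confidence({"computer"}, {"accounting_software"}, D)  # 2 of the 3 with "computer"
```

In this toy data the rule has 50% support and roughly 67% confidence, so it would pass thresholds such as min_sup = 5% and min_conf = 60%.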

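Itemset: a set of items
k-itemset: an itemset containing k items
Occurrence frequency of an itemset: the number of transactions that contain it
Also called frequency, support_count (absolute support), or
count
Itemset satisfies minimum support when count >=
min_sup * number of transactions in D
Minimum support count: min_sup * number of transactions in D
Frequent itemset: an itemset that satisfies minimum support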
Association Rule Mining Process
Find all frequent itemsets
Generate strong association rules from frequent
itemsets
Satisfy Minimum Support and Minimum
Confidence
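The second step, generating strong rules from frequent itemsets, might look like the sketch below. It assumes the frequent itemsets and their support counts have already been found and stored in a dictionary; the name `generate_rules` and the bread/milk counts are hypothetical. Because every non-empty subset of a frequent itemset is itself frequent, each antecedent's count can be looked up in the same dictionary.

```python
from itertools import combinations


def generate_rules(support_counts, min_conf):
    """Return (antecedent, consequent, confidence) for every strong rule.

    `support_counts` maps each frequent itemset (a frozenset) to its
    support count and is assumed to contain all frequent itemsets.
    """
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = count / support_counts[a]
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules


# Hypothetical support counts (not from the slides).
counts = {
    frozenset({"bread"}): 4,
    frozenset({"milk"}): 3,
    frozenset({"bread", "milk"}): 3,
}
rules = generate_rules(counts, min_conf=0.8)
# bread => milk has confidence 3/4 and is dropped;
# milk => bread has confidence 3/3 and is kept.
```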
Itemsets
Complete Itemsets
Closed Frequent Itemset
X is closed in a data set S if there exists no proper
super-itemset Y such that Y has the same support
count as X in S
X is a closed frequent itemset if it is both closed and frequent in S
Maximal Frequent Itemset
X is frequent and there exists no super-itemset Y
such that X ⊂ Y and Y is frequent in S
Example:
T = { {a1, a2, ..., a100}, {a1, a2, ..., a50} }, min_sup = 1
Closed frequent itemsets: both {a1, a2, ..., a100}: 1 and
{a1, a2, ..., a50}: 2
Maximal frequent itemset: {a1, a2, ..., a100}
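A scaled-down version of this example can be checked by brute force. The sketch below uses four items instead of one hundred, since enumerating every subset of a 100-item transaction is infeasible; the helper names are assumptions, and the exhaustive enumeration is only suitable for tiny item sets.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force every non-empty itemset and keep the frequent ones."""
    items = sorted(set().union(*transactions))
    result = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            s = frozenset(combo)
            count = sum(1 for t in transactions if s <= t)
            if count >= min_count:
                result[s] = count
    return result


def closed_itemsets(freq):
    """Frequent itemsets with no proper superset of equal support count."""
    return {x: c for x, c in freq.items()
            if not any(x < y and freq[y] == c for y in freq)}


def maximal_itemsets(freq):
    """Frequent itemsets with no frequent proper superset."""
    return {x: c for x, c in freq.items()
            if not any(x < y for y in freq)}


# Scaled-down analogue: {a1..a4} plays the role of {a1..a100},
# and {a1, a2} plays the role of {a1..a50}.
T = [frozenset({"a1", "a2", "a3", "a4"}), frozenset({"a1", "a2"})]
freq = frequent_itemsets(T, min_count=1)
closed = closed_itemsets(freq)
maximal = maximal_itemsets(freq)
```

Here all 15 non-empty subsets are frequent, but only two are closed ({a1, a2} with count 2 and {a1, a2, a3, a4} with count 1) and only the largest is maximal, mirroring the pattern in the example above.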

Types of Association Rules


Types of Values
Boolean, Quantitative Association Rule
Dimensions of data
Single Dimensional, Multi-dimensional
Level of abstraction
Multilevel association rules
Based on kinds of rules
Association rules, Correlation rules, Strong
gradient relationships
Based on completeness of patterns
Complete, Closed, Maximal, top-k, constrained,
approximate
Mining Single Dimensional Boolean Association Rules
Apriori Algorithm: finding frequent itemsets using
candidate generation
Uses prior knowledge of frequent itemset
properties
Level-wise search
k-itemsets are used to explore (k+1)-itemsets
Frequent 1-itemsets form L1
L1 is used to find L2, L2 to find L3, and so on
Apriori Property
Reduces Search space
All non empty subsets of a frequent itemset must also
be frequent
If P(I) < min_sup then P(I ∪ A) < min_sup
Anti-monotone property: if a set cannot pass a test,
all of its supersets will fail the test as well.
Any subset of a frequent itemset must be frequent
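The anti-monotone property can be checked numerically on a small database. The sketch below is illustrative only: the transactions, the item names, and the `support_count` helper are assumptions, and the threshold is expressed as an absolute count.

```python
def support_count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t)


# Hypothetical toy database (not from the slides).
D = [
    frozenset({"I1", "I2", "I5"}),
    frozenset({"I2", "I4"}),
    frozenset({"I2", "I3"}),
    frozenset({"I1", "I2", "I4"}),
]
min_count = 2

base = frozenset({"I5"})          # appears in one transaction: infrequent
base_count = support_count(base, D)

# Adding any item to an infrequent itemset cannot raise its count,
# so every superset of `base` is infrequent as well.
all_items = set().union(*D)
supersets_fail = all(
    support_count(base | {item}, D) <= base_count < min_count
    for item in all_items
)
```

This is why Apriori never needs to count a candidate that contains an infrequent subset.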

Apriori property application


Join Step
To find Lk, join Lk-1 with itself to generate the candidate set Ck
li[j] denotes the j-th item in li
Members of Lk-1 are joinable if their first (k-2)
items are common
Members l1 and l2 of Lk-1 are joinable if
(l1[1]=l2[1]) ∧ (l1[2]=l2[2]) ∧ ... ∧ (l1[k-2]=l2[k-2]) ∧ (l1[k-1] < l2[k-1])
Resulting itemset is l1[1], l1[2], ..., l1[k-1], l2[k-1]
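The join condition could be sketched as follows, assuming each itemset is stored as a tuple whose items are in sorted order. The function name `join` and the sample L2 are assumptions for illustration, not taken from the slides.

```python
def join(prev_frequent):
    """Join L(k-1) with itself to produce candidate k-itemsets.

    Each itemset is a sorted tuple of length k-1. Two itemsets are
    joinable when their first k-2 items agree and the last item of the
    first is smaller than the last item of the second.
    """
    candidates = []
    for l1 in prev_frequent:
        for l2 in prev_frequent:
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                candidates.append(l1 + (l2[-1],))
    return candidates


# Hypothetical frequent 2-itemsets.
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
C3 = join(L2)
```

The strict inequality on the last item ensures each candidate is generated exactly once and stays sorted.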

Prune Step
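Ck is a superset of Lk
Determine the count of each candidate in Ck; candidates whose count is at least the minimum support count belong to Lk
To reduce the size of Ck: if any (k-1)-subset of a candidate is
not in Lk-1, the candidate can be removed from Ck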
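A minimal sketch of the prune step, under the assumption that itemsets are sorted tuples; the name `prune` and the sample data are hypothetical.

```python
from itertools import combinations


def prune(candidates, prev_frequent):
    """Drop every candidate that has a (k-1)-subset missing from L(k-1)."""
    prev = set(prev_frequent)
    return [
        c for c in candidates
        if all(s in prev for s in combinations(c, len(c) - 1))
    ]


# Hypothetical frequent 2-itemsets and the 3-item candidates joined from them.
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
C3 = [("I1", "I2", "I3"), ("I1", "I2", "I5"), ("I1", "I3", "I5"),
      ("I2", "I3", "I4"), ("I2", "I3", "I5"), ("I2", "I4", "I5")]
pruned = prune(C3, L2)
# ("I1", "I3", "I5") is removed because ("I3", "I5") is not in L2, and so on.
```

Only the surviving candidates need to be counted against the database, which is where the savings come from.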

The Apriori Algorithm


Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k
L1 = {frequent items};
for (k = 2; Lk-1 != ∅; k++) do begin
    Ck = candidates generated from Lk-1;
    for each transaction t in database do
        increment the count of all candidates in Ck
        that are contained in t
    Lk = candidates in Ck with min_support
end
return ∪k Lk;

The Apriori Algorithm: An Example


Apriori Algorithm
Input: Database of transactions D, min_sup
Output: L, frequent itemsets in D
L1 = find_frequent_1-itemsets(D);
for (k = 2; Lk-1 ≠ ∅; k++)
{
    Ck = apriori_gen(Lk-1, min_sup);
    for each transaction t ∈ D
    {
        Ct = subset(Ck, t);    // candidates contained in t
        for each candidate c ∈ Ct
            c.count++;
    }
    Lk = {c ∈ Ck | c.count >= min_sup}
}
return L = ∪k Lk;
Apriori Algorithm
procedure apriori_gen(Lk-1, min_sup)
for each itemset l1 ∈ Lk-1
    for each itemset l2 ∈ Lk-1
        if (l1[1]=l2[1]) ∧ (l1[2]=l2[2]) ∧ ... ∧ (l1[k-2]=l2[k-2]) ∧ (l1[k-1] < l2[k-1])
        {
            c = l1 join l2;    // Join step
            if has_infrequent_subset(c, Lk-1) then
                delete c;      // Prune step
            else add c to Ck;
        }
return Ck;
procedure has_infrequent_subset(c, Lk-1)
for each (k-1)-subset s of c
    if s is not an element of Lk-1 then return TRUE;
return FALSE;
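Putting the procedures together, one possible runnable rendering in Python is sketched below. It is an illustration under stated assumptions rather than a definitive implementation: itemsets are sorted tuples, the threshold `min_count` is an absolute support count, and the nine-transaction database is a toy data set that does not come from the slides.

```python
from itertools import combinations


def apriori_gen(prev_frequent):
    """Join L(k-1) with itself, then prune candidates with an infrequent subset."""
    prev = set(prev_frequent)
    candidates = []
    for l1 in prev_frequent:
        for l2 in prev_frequent:
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:          # join step
                c = l1 + (l2[-1],)
                if all(s in prev for s in combinations(c, len(c) - 1)):
                    candidates.append(c)                         # survives pruning
    return candidates


def apriori(transactions, min_count):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    counts = {}
    for t in transactions:                                       # find L1
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    current = {c: n for c, n in counts.items() if n >= min_count}
    frequent = dict(current)
    while current:                                               # level-wise search
        candidates = apriori_gen(sorted(current))
        counts = {c: 0 for c in candidates}
        for t in transactions:                                   # one database scan per level
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
    return frequent


# Toy database (an assumption for illustration).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
result = apriori(D, min_count=2)
```

With a minimum support count of 2, this run yields five frequent 1-itemsets, six frequent 2-itemsets, and two frequent 3-itemsets. The only 4-item candidate, (I1, I2, I3, I5), is pruned because its subset (I1, I3, I5) is not frequent, so the loop stops.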
