Association Rules
Usually applied to market baskets but other
applications are possible
Useful Rules contain novel and actionable
information: e.g. On Thursdays grocery customers are
likely to buy diapers and beer together
Trivial Rules contain already known information: e.g.
People who buy maintenance agreements are the ones
who have also bought large appliances
Some novel rules may not be useful: e.g. New
hardware stores most commonly sell toilet rings
Applications
Examples.
Rule form: Body => Head [support, confidence].
buys(x, diapers) => buys(x, beers) [0.5%, 60%]
major(x, CS) ^ takes(x, DB) => grade(x, CS, A) [1%, 75%]
[Figure: customers who buy beer and customers who buy diapers]
Items Bought
A,B,C
A,C
A,D
B,E,F
For rule A => C:
support = support({A, C}) = 2/4 = 50%
confidence = support({A, C}) / support({A}) = 2/3 = 66.7%
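The two measures for the rule A => C can be checked directly against the four transactions in the table. The following is a minimal sketch; the function names `support` and `confidence` are illustrative choices, not something the slides define.

```python
# Transactions from the "Items Bought" table.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]


def support(itemset, db):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)


def confidence(body, head, db):
    """Support of body and head together, divided by support of body."""
    return support(set(body) | set(head), db) / support(body, db)


rule_support = support({"A", "C"}, transactions)          # 2 of 4 transactions
rule_confidence = confidence({"A"}, {"C"}, transactions)  # 2 of the 3 containing A
print(f"A => C [{rule_support:.1%}, {rule_confidence:.1%}]")
```

Swapping body and head changes only the confidence, because the support of the combined itemset is the same in both directions.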
Database D
Items
1 3 4
2 3 5
1 2 3 5
2 5

Scan D: C1
itemset   sup.
{1}       2
{2}       3
{3}       3
{4}       1
{5}       3

L1
itemset   sup.
{1}       2
{2}       3
{3}       3
{5}       3

C2
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

Scan D: C2
itemset   sup
{1 2}     1
{1 3}     2
{1 5}     1
{2 3}     2
{2 5}     3
{3 5}     2

L2
itemset   sup
{1 3}     2
{2 3}     2
{2 5}     3
{3 5}     2

C3
itemset
{1 3 5}
{2 3 5}

Scan D: L3
itemset   sup
{2 3 5}   2
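The example above follows the usual level-wise pattern (often called Apriori): scan the database to count candidates, keep the frequent ones, and build the next level's candidates from them. The sketch below reproduces that loop under two assumptions that the slides do not state explicitly: a minimum support count of 2 (consistent with {4}, support 1, being dropped) and a join-then-prune candidate step. All function names are illustrative.

```python
from itertools import combinations

# Database D from the example (items are the integers 1..5).
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
MIN_COUNT = 2  # assumed minimum support count


def count_support(candidates, db):
    """One scan of the database: count the transactions containing each candidate."""
    return {c: sum(c <= t for t in db) for c in candidates}


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune every
    candidate that has an infrequent (k-1)-subset."""
    joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
    return {
        c for c in joined
        if all(frozenset(s) in frequent for s in combinations(c, k - 1))
    }


def frequent_itemsets(db, min_count):
    """Return [L1, L2, ...], each a dict mapping itemset -> support count."""
    candidates = {frozenset([i]) for t in db for i in t}
    levels, k = [], 1
    while candidates:
        counts = count_support(candidates, db)
        level = {c: n for c, n in counts.items() if n >= min_count}
        if not level:
            break
        levels.append(level)
        k += 1
        candidates = generate_candidates(set(level), k)
    return levels


levels = frequent_itemsets(D, MIN_COUNT)
for k, level in enumerate(levels, start=1):
    readable = sorted((sorted(c), n) for c, n in level.items())
    print(f"L{k}: {readable}")
```

With the prune step, {1 3 5} is discarded before the third scan because its subset {1 5} is not frequent, so this sketch counts only {2 3 5} at level 3. The resulting L1, L2 and L3 match the tables in the example.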
Method:
Candidate itemsets are stored in a hash-tree
Leaf node of hash-tree contains a list of itemsets and
counts
Interior node contains a hash table
Subset function: finds all the candidates contained in a
transaction
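A small hash-tree along the lines described above might look like the following. This is a sketch under stated assumptions: the bucket count, the leaf capacity, and the names `Node`, `HashTree`, and `count_transaction` are all illustrative, and all candidates are assumed to be inserted before any transaction is counted.

```python
class Node:
    """A leaf holds itemsets with counts; an interior node holds a hash table."""

    def __init__(self):
        self.is_leaf = True
        self.itemsets = {}   # leaf payload: sorted item tuple -> count
        self.children = {}   # interior payload: bucket number -> Node


class HashTree:
    def __init__(self, k, buckets=3, max_leaf=2):
        # buckets and max_leaf are illustrative values, not taken from the slides.
        self.k, self.buckets, self.max_leaf = k, buckets, max_leaf
        self.root = Node()

    def _bucket(self, item):
        return hash(item) % self.buckets

    def insert(self, itemset):
        self._insert(self.root, tuple(sorted(itemset)), 0)

    def _insert(self, node, itemset, depth):
        if not node.is_leaf:
            child = node.children.setdefault(self._bucket(itemset[depth]), Node())
            self._insert(child, itemset, depth + 1)
            return
        node.itemsets[itemset] = 0
        # Split an overfull leaf into an interior node, unless all k items are used up.
        if len(node.itemsets) > self.max_leaf and depth < self.k:
            pending, node.itemsets, node.is_leaf = node.itemsets, {}, False
            for s in pending:
                self._insert(node, s, depth)

    def count_transaction(self, transaction):
        """Subset function: increment every stored candidate contained in the transaction."""
        items = sorted(transaction)
        present = set(items)
        seen_leaves = set()

        def visit(node, start):
            if node.is_leaf:
                if id(node) not in seen_leaves:  # a leaf may be reached by several paths
                    seen_leaves.add(id(node))
                    for s in node.itemsets:
                        if present.issuperset(s):
                            node.itemsets[s] += 1
                return
            for i in range(start, len(items)):
                child = node.children.get(self._bucket(items[i]))
                if child is not None:
                    visit(child, i + 1)

        visit(self.root, 0)

    def counts(self):
        """Collect itemset -> count from every leaf."""
        out, stack = {}, [self.root]
        while stack:
            node = stack.pop()
            out.update(node.itemsets)
            stack.extend(node.children.values())
        return out


tree = HashTree(k=2)
for candidate in [(1, 2), (1, 3), (1, 5), (2, 3), (2, 5), (3, 5)]:
    tree.insert(candidate)
for transaction in [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]:
    tree.count_transaction(transaction)
print(tree.counts())
```

The point of the tree is that a transaction only visits the leaves its own items hash to, so most candidates are never compared against it. The final subset check at the leaf is still needed because different items can share a bucket.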
We need a measure of
dependent or correlated events
P(B|A)/P(B) is called the lift
of rule A => B
X 1 1 1 1 0 0 0 0
Y 1 1 0 0 0 0 0 0
Z 0 1 1 1 1 1 1 1
Rule    Support   Confidence
X=>Y    25%       50%
X=>Z    37.5%     75%
Y=>Z    12.5%     50%

Rule    Support   Lift
X=>Y    25%       2
X=>Z    37.5%     0.86
Y=>Z    12.5%     0.57
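The support, confidence, and lift figures can be recomputed from the X, Y, Z indicator rows. The sketch below does so; the helper name `rule_measures` is an illustrative choice.

```python
# Indicator rows from the slide: one column per transaction.
X = [1, 1, 1, 1, 0, 0, 0, 0]
Y = [1, 1, 0, 0, 0, 0, 0, 0]
Z = [0, 1, 1, 1, 1, 1, 1, 1]


def rule_measures(a, b):
    """Return (support, confidence, lift) for the rule a => b."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x * y for x, y in zip(a, b)) / n   # both present in the same transaction
    conf = p_ab / p_a                             # P(B|A)
    return p_ab, conf, conf / p_b                 # lift = P(B|A) / P(B)


for name, (a, b) in {"X=>Y": (X, Y), "X=>Z": (X, Z), "Y=>Z": (Y, Z)}.items():
    sup, conf, lift = rule_measures(a, b)
    print(f"{name}: support={sup:.1%} confidence={conf:.1%} lift={lift:.2f}")
```

X=>Z has the highest support and confidence of the three rules, yet its lift is below 1, because Z is present in 7 of the 8 transactions regardless of X. Only X=>Y has a lift above 1.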
Extensions
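[Figure: concept hierarchy. Food splits into milk and bread; milk into skim and 2%; bread into wheat and white; Fraser and Sunset appear at the lowest level]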
Uniform Support
Multi-level mining with uniform support
Level 1 (min_sup = 5%): Milk [support = 10%]
Level 2 (min_sup = 5%): 2% Milk [support = 6%], Skim Milk [support = 4%]
Reduced Support
Multi-level mining with reduced support
Level 1 (min_sup = 5%): Milk [support = 10%]
Level 2 (min_sup = 3%): 2% Milk [support = 6%], Skim Milk [support = 4%]
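The difference between the two schemes comes down to which threshold each level is compared against. A minimal sketch using the supports from the two slides (the dictionary layout and the name `frequent_items` are assumptions made for illustration):

```python
# Supports and hierarchy levels as given on the slides.
supports = {"Milk": 0.10, "2% Milk": 0.06, "Skim Milk": 0.04}
level = {"Milk": 1, "2% Milk": 2, "Skim Milk": 2}


def frequent_items(min_sup_by_level):
    """Items whose support reaches the threshold set for their own level."""
    return [item for item, s in supports.items()
            if s >= min_sup_by_level[level[item]]]


uniform = frequent_items({1: 0.05, 2: 0.05})  # same threshold at every level
reduced = frequent_items({1: 0.05, 2: 0.03})  # lower threshold at the deeper level
print("uniform:", uniform)
print("reduced:", reduced)
```

Under uniform support Skim Milk (4%) falls below the 5% threshold and is lost; under reduced support the 3% level-2 threshold keeps it.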
Multi-level Association:
Redundancy Filtering
Some rules may be redundant due to ancestor
relationships between items.
Example
milk => wheat bread [support = 8%, confidence = 70%]
2% milk => wheat bread [support = 2%, confidence = 72%]
Examples
Financial: stock price, inflation
Biomedical: blood pressure
Meteorological: precipitation
Time series
Record changes of certain (typically numeric)
values over time
E.g. stock price movements, blood pressure
Event series
Series can be represented in two ways:
As a sequence (string) of events.
Empty space if no events occur at a certain time
Hard to represent multiple events
Similarity/Difference with
Association Rules
Similarities:
Groups of events: frequent item sets
Associations: association rules
Differences:
Notion of (time) windows:
People who rent Star Wars tend to rent Empire Strikes
Back within one week
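The time-window notion can be made concrete by measuring confidence only over pairs of events that fall within the window. The sketch below uses a rental log that is entirely hypothetical (invented for illustration, not data from the slides), and the function name `windowed_confidence` is likewise an assumption.

```python
# Hypothetical rental log: (customer, title, day number). Invented for illustration only.
rentals = [
    ("c1", "Star Wars", 1), ("c1", "Empire Strikes Back", 4),
    ("c2", "Star Wars", 2), ("c2", "Empire Strikes Back", 20),
    ("c3", "Star Wars", 5), ("c3", "Empire Strikes Back", 12),
    ("c4", "Empire Strikes Back", 3),
]
WINDOW = 7  # days, i.e. "within one week"


def windowed_confidence(log, first, then, window):
    """Among customers who rent `first`, the fraction who rent `then`
    at most `window` days afterwards."""
    by_customer = {}
    for customer, title, day in log:
        by_customer.setdefault(customer, []).append((title, day))

    renters = followers = 0
    for events in by_customer.values():
        first_days = [d for t, d in events if t == first]
        if not first_days:
            continue
        renters += 1
        then_days = [d for t, d in events if t == then]
        if any(0 < d2 - d1 <= window for d1 in first_days for d2 in then_days):
            followers += 1
    return followers / renters if renters else 0.0


conf = windowed_confidence(rentals, "Star Wars", "Empire Strikes Back", WINDOW)
print(f"confidence within {WINDOW} days: {conf:.1%}")
```

Without the window, customer c2 would also count as a follower; the window is what separates this from an ordinary association rule over each customer's full history.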
Episodes
A partially ordered sequence of events
Serial (B follows A)
Parallel (B follows A OR A follows B)
General (order between A & B unknown or immaterial but A & B precede C)
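The three episode types can be expressed as simple tests on one window of events. This is a sketch assuming a window is a list of event labels in time order; the function names and the two sample windows are hypothetical.

```python
def occurs_serial(window, a, b):
    """Serial episode: some b occurs after an a."""
    return a in window and b in window[window.index(a) + 1:]


def occurs_parallel(window, a, b):
    """Parallel episode: a and b both occur, in either order."""
    return a in window and b in window


def occurs_general(window, a, b, c):
    """General episode: a and b (in any order) both occur before some c."""
    if c not in window:
        return False
    last_c = len(window) - 1 - window[::-1].index(c)  # position of the last c
    before = window[:last_c]
    return a in before and b in before


# Hypothetical windows of event labels in time order.
w1 = ["A", "C", "B"]
w2 = ["B", "A", "C"]

print(occurs_serial(w1, "A", "B"), occurs_serial(w2, "A", "B"))
print(occurs_parallel(w1, "A", "B"), occurs_parallel(w2, "A", "B"))
print(occurs_general(w1, "A", "B", "C"), occurs_general(w2, "A", "B", "C"))
```

Every window that contains the serial episode also contains the parallel one, but not the other way round, which is why the serial test is the stricter of the two.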
Sub-episode / super-episode
If A, B & C occur within a time window:
Episode rules
Used to emphasize the effect of events on
episodes
Support/confidence as defined in association
rules