
Association Rules – Market Basket Analysis
• What is it?
• What is the business usage?
• Strength of association – intuitive understanding
• Calculating values of support, confidence and lift
• Algorithm to get rules
• Market basket analysis using R
• Assignment and sample solution
Created by – Gopal Prasad Malakar
Background
WHAT IT IS?



Does it appear familiar?

• How is it showing suggested text?


– Historical Frequency of past searches
– More frequent ones
• What is the use?



Any similarity with previous
example?

• What is the use?


Association rule
• Proposed by Agrawal et al in 1993.
• It is an important data mining model studied extensively by the
database and data mining community.
• Initially used for Market Basket Analysis to find how items purchased by
customers are related
• Market Basket analysis is a modelling technique based upon the theory
that if you buy a certain group of items, you are more (or less) likely to
buy another group of items
• Assumes all data are categorical.
• The concept is easily extendable to many scenarios
• Can you guess some?



Usage of
ASSOCIATION RULES / MARKET
BASKET ANALYSIS



Association rules
So what's the use?
• How to display on shelves
• Cross sell – subsequent purchase campaign
• How to develop bundles of products
– For banking (credit card + home loan)
– For telecom products / services (400 Minutes Call + 100 SMS + 1 GB data)
Association rules
• Search engine support
– Indexed tables for quick search (90% of searches are for 10% of items)
• Medical patient histories – better treatment based on association of symptoms
• Unusual combinations of insurance claims can be a warning of fraud
• Web site layout
– Bill details, Pay Bill



Terms associated
ASSOCIATION RULES & ITS
TYPES



What is an association rule?
• Rule form: LHS -> RHS
– IF a customer buys diapers, THEN they also buy beer
• diapers -> beer (actionable)
– “Transactions that purchase bread and butter also purchase milk”
• bread & butter -> milk
– One needs to be careful when discovering associations



Association Rules
• Association rule types:
– Actionable Rules – contain high-quality, actionable information
– Trivial Rules – information already well-known by those familiar
with the business
– Inexplicable Rules – no explanation and do not suggest action
• Trivial and Inexplicable Rules occur most often



Association Rules
• Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of
also purchasing one of three types of candy bars [Forbes, Sept 8, 1997]
• Customers who purchase maintenance agreements are very likely to
purchase large appliances (trivial)
• When I take tea during the match, Sachin Tendulkar gets out…
(inexplicable)

• Sometimes what we think is inexplicable leads to a new discovery
• We will discuss a story in the next slide
• So be careful about the frequency of inexplicable rules



Inexplicable, think of…



Strength of the rule – how good is
the rule?
SUPPORT, CONFIDENCE & LIFT



Intuitive
• Any answer ?



Process
• Transaction data are analysed to generate association rules
• One transaction may contain one or more items
TID Items
T100 Apple, Banana, Cherry, Mango
T200 Apple, Mango
T300 Banana, Mango
T400 Mango, Banana, Cherry
T500 Banana, Mango
T600 Apple, Banana
T700 Apple, Cherry, Mango
• Let's figure out
– Some rules
– And how good these rules are
Support
• Support measures how often the collection of items in an association occurs together, as a percentage of all the transactions
• Support of rule X -> Y = n(X and Y) / n
• X and Y can be sets of items
• n(X and Y) is the number of transactions having all items of X and Y
• n is the total number of transactions

TID   Items
T100  Apple, Banana, Cherry, Mango
T200  Apple, Mango
T300  Banana, Mango
T400  Mango, Banana, Cherry
T500  Banana, Mango
T600  Apple, Banana
T700  Apple, Cherry, Mango

Person who bought this item (LHS) -> also bought the following item (RHS)

Rule              Support
Banana -> Mango   4/7 = 57%
Cherry -> Mango   3/7 = 43%
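The support calculation can be checked with a few lines of code. This is a minimal sketch in Python, used here only for illustration; the function name `support` and the list-of-sets layout are assumptions, not part of the deck.

```python
# Transactions copied from the slide's table.
transactions = [
    {"Apple", "Banana", "Cherry", "Mango"},  # T100
    {"Apple", "Mango"},                      # T200
    {"Banana", "Mango"},                     # T300
    {"Mango", "Banana", "Cherry"},           # T400
    {"Banana", "Mango"},                     # T500
    {"Apple", "Banana"},                     # T600
    {"Apple", "Cherry", "Mango"},            # T700
]

def support(lhs, rhs, transactions):
    """Share of all transactions that contain every item of lhs and rhs."""
    items = set(lhs) | set(rhs)
    n_both = sum(1 for basket in transactions if items <= basket)
    return n_both / len(transactions)

print(f"{support({'Banana'}, {'Mango'}, transactions):.0%}")  # 57%
print(f"{support({'Cherry'}, {'Mango'}, transactions):.0%}")  # 43%
```

Because support only counts baskets containing all items of both sides, swapping LHS and RHS gives the same value.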
Confidence
• Confidence of the rule X -> Y (“Y given X”) is a measure of how likely it is that Y occurs when X has occurred
• Confidence of rule X -> Y = n(X and Y) / n(X)
• It is like conditional probability
• n(X) – total number of baskets having X

TID   Items
T100  Apple, Banana, Cherry, Mango
T200  Apple, Mango
T300  Banana, Mango
T400  Mango, Banana, Cherry
T500  Banana, Mango
T600  Apple, Banana
T700  Apple, Cherry, Mango

Person who bought this item (LHS) -> also bought the following item (RHS)

Rule              Support   Confidence
Banana -> Mango   57%       4/5 = 80%
Cherry -> Mango   43%       3/3 = 100%
Confidence
• Confidence of the rule X -> Y (“Y given X”) is a measure of how likely it is that Y occurs when X has occurred
• Confidence of rule X -> Y = n(X and Y) / n(X)
• It is like conditional probability
• n(X) – total number of baskets having X

TID   Items
T100  Apple, Banana, Cherry, Mango
T200  Apple, Mango
T300  Banana, Mango
T400  Mango, Banana, Cherry
T500  Banana, Mango
T600  Apple, Banana
T700  Apple, Cherry, Mango

Person who bought this item (LHS) -> also bought the following item (RHS)

Rule              Support   Confidence
Mango -> Banana   57%       4/6 = 66.7%
Mango -> Cherry   43%       3/6 = 50%
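The confidence values for both rule directions can be reproduced with a short sketch. It assumes the seven transactions from the slides stored as Python sets; the helper names `n` and `confidence` are illustrative only.

```python
# Transactions copied from the slide's table.
transactions = [
    {"Apple", "Banana", "Cherry", "Mango"},  # T100
    {"Apple", "Mango"},                      # T200
    {"Banana", "Mango"},                     # T300
    {"Mango", "Banana", "Cherry"},           # T400
    {"Banana", "Mango"},                     # T500
    {"Apple", "Banana"},                     # T600
    {"Apple", "Cherry", "Mango"},            # T700
]

def n(items, transactions):
    """Number of baskets that contain every item in `items`."""
    return sum(1 for basket in transactions if set(items) <= basket)

def confidence(lhs, rhs, transactions):
    """n(X and Y) / n(X) for the rule lhs -> rhs."""
    return n(set(lhs) | set(rhs), transactions) / n(lhs, transactions)

for lhs, rhs in [("Banana", "Mango"), ("Mango", "Banana"),
                 ("Cherry", "Mango"), ("Mango", "Cherry")]:
    print(f"{lhs} -> {rhs}: {confidence({lhs}, {rhs}, transactions):.0%}")
# Banana -> Mango: 80%
# Mango -> Banana: 67%
# Cherry -> Mango: 100%
# Mango -> Cherry: 50%
```

The numerator is the same in both directions; only the denominator n(X) changes, which is why the confidence depends on the direction of the rule.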
So
• X -> Y and Y -> X have the same support but can have different confidence



Lift
• Are support and confidence enough?
• Lift (improvement) tells us how much better a rule is at predicting the result than just assuming the result in the first place
• Lift = P(X&Y) / (P(X).P(Y))
       = [P(X&Y) / P(X)] * [1 / P(Y)]
       = confidence of X -> Y / support of Y

TID   Items
T100  Apple, Banana, Cherry, Mango
T200  Apple, Mango
T300  Banana, Mango
T400  Mango, Banana, Cherry
T500  Banana, Mango
T600  Apple, Banana
T700  Apple, Cherry, Mango

✓ Intuitively, does knowing X help in predicting Y?
✓ Think of the rule Banana -> Mango
✓ If you know banana is present, you can be 80% sure of mango.
✓ If you didn’t know banana is present, you can still be 6/7 = 85.7% sure of mango.
✓ So lift is 80% / 85.7% = 0.93
✓ So is your prediction better?
Lift
• Are support and confidence enough?
• Lift (improvement) tells us how much better a rule is at predicting the result than just assuming the result in the first place
• Lift = P(X&Y) / (P(X).P(Y))
• Does knowing X help in predicting Y?

TID   Items
T100  Apple, Banana, Cherry, Mango
T200  Apple, Mango
T300  Banana, Mango
T400  Mango, Banana, Cherry
T500  Banana, Mango
T600  Apple, Banana
T700  Apple, Cherry, Mango

(LHS) -> (RHS)    Support   Confidence   Lift
Banana -> Mango   57%       80%          80% / (6/7) = 0.93
Cherry -> Mango   43%       100%         100% / (6/7) = 1.17

✓ When lift > 1 then the rule is better at predicting the result than guessing
✓ When lift < 1, the rule is doing worse than informed guessing.
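
The lift formula can be checked the same way. This is a minimal sketch, assuming the seven transactions from the slides; the helper names `p` and `lift` are illustrative, not from the deck.

```python
# Transactions copied from the slide's table.
transactions = [
    {"Apple", "Banana", "Cherry", "Mango"},  # T100
    {"Apple", "Mango"},                      # T200
    {"Banana", "Mango"},                     # T300
    {"Mango", "Banana", "Cherry"},           # T400
    {"Banana", "Mango"},                     # T500
    {"Apple", "Banana"},                     # T600
    {"Apple", "Cherry", "Mango"},            # T700
]

def p(items, transactions):
    """Share of baskets that contain every item in `items`."""
    hits = sum(1 for basket in transactions if set(items) <= basket)
    return hits / len(transactions)

def lift(lhs, rhs, transactions):
    """P(X & Y) / (P(X) * P(Y)) for the rule lhs -> rhs."""
    both = p(set(lhs) | set(rhs), transactions)
    return both / (p(lhs, transactions) * p(rhs, transactions))

print(round(lift({"Banana"}, {"Mango"}, transactions), 2))  # 0.93
print(round(lift({"Cherry"}, {"Mango"}, transactions), 2))  # 1.17 (exactly 7/6)
```

Note that the formula is symmetric in X and Y, so Mango -> Banana has the same lift as Banana -> Mango even though the confidences differ.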



How to derive rule?
BASIC ALGORITHM



Process
• Generate all rules from the data with
– support greater than a specified minimum (why?) and
– confidence greater than a specified minimum.
• Note the complexity
• If you have just two items A and B, then there are two possible rules
– A -> B
– B -> A
• If you have three items A, B and C, here are the possibilities



Process
• With A, B and C

#    X      Y
1    A      B
2    A      C
3    B      A
4    B      C
5    C      A
6    C      B
7    A      B, C
8    B      A, C
9    C      A, B
10   A, B   C
11   B, C   A
12   A, C   B

• Each side of a rule is a combination of 1 to d-1 items, where d is the number of items.
• The generic formula for the number of possible rules is 3^d - 2^(d+1) + 1
• So for two items, it is 3^2 - 2^3 + 1 = 2
• For three items, 3^3 - 2^4 + 1 = 12

d                3    4    5     10      100          1000
Possible rules   12   50   180   57002   5.15*10^47   1.32*10^477

• As the number of possibilities explodes, it is important to keep only those rules which exceed
– Threshold of support
– Threshold of confidence
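
The rule-count formula can be verified by brute force for small d. The sketch below enumerates every rule with a non-empty LHS and a non-empty RHS that share no items; the function names are illustrative assumptions.

```python
from itertools import combinations

def enumerate_rules(items):
    """List every (lhs, rhs) pair of non-empty, disjoint item tuples."""
    items = list(items)
    rules = []
    for k in range(1, len(items)):                  # LHS size 1 .. d-1
        for lhs in combinations(items, k):
            rest = [i for i in items if i not in lhs]
            for m in range(1, len(rest) + 1):       # RHS size 1 .. d-k
                for rhs in combinations(rest, m):
                    rules.append((lhs, rhs))
    return rules

def rule_count(d):
    """Closed-form count from the slide: 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1

for d in (2, 3, 4, 5):
    assert len(enumerate_rules(range(d))) == rule_count(d)

print(len(enumerate_rules("ABC")))  # 12
print(rule_count(10))               # 57002
```

Enumeration is only practical for a handful of items, which is exactly why the support and confidence thresholds are needed.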
Algorithm
• Start with all 1-item sets that are frequent
• In the next step, combine frequent items to form 2-item sets
• Check their minimum support
• Keep only those 2-item sets which exceed the minimum support and minimum confidence requirements
• Iterate to add a third item
• ….



How to derive rule?
BASIC ALGORITHM - DEMO



Example – finding frequent item sets
Min support = 0.5 (a count of at least 2 out of 4 transactions)

TID   Items
T1    1, 3, 4
T2    2, 3, 5
T3    1, 2, 3, 5
T4    2, 5

Notation – {item set} : count, C = candidate item sets, F = frequent item sets
1. scan T -> C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3   ({4} is dropped)
          -> F1: {1}:2, {2}:3, {3}:3, {5}:3
          -> C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
2. scan T -> C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2   ({1,2} and {1,5} are dropped)
          -> F2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
          -> C3: {2,3,5}
3. scan T -> C3: {2,3,5}:2 -> F3: {2,3,5}
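
The three scans above can be traced in code. This is a minimal sketch of a level-wise (Apriori-style) search, not a tuned implementation; the function name `frequent_item_sets` and the use of an absolute `min_count` instead of a support fraction are assumptions made for illustration.

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]  # T1..T4
min_count = 2  # min support 0.5 of 4 transactions

def frequent_item_sets(transactions, min_count):
    """Return {item set: count} for every item set meeting min_count."""
    # Scan 1: count single items (C1) and keep the frequent ones (F1).
    counts = {}
    for basket in transactions:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        # Scan k: count the surviving candidates and keep the frequent ones.
        counts = {c: sum(1 for basket in transactions if c <= basket)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result

found = frequent_item_sets(transactions, min_count)
for item_set, count in sorted(found.items(),
                              key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(item_set), count)
```

The prune step is what keeps C3 down to {2, 3, 5}: candidates such as {1, 2, 3} are discarded without a scan because {1, 2} is not frequent.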



How to derive rule?
TWO VARIANTS - DEMO



Breadth first algorithm
{}

{1} {2} {3} {4}

{1,2} {1,3} {1,4} {2,3} {2,4} {3,4}

{1,2,3} {1,2,4} {1,3,4} {2,3,4}

You check support (frequency) width-wise, one level of item sets at a time



Depth first algorithm
{}

{1} {2} {3} {4}

{1,2} {1,3} {1,4} {2,3} {2,4} {3,4}

{1,2,3} {1,2,4} {1,3,4} {2,3,4}

You check support (frequency) depth-wise, following one branch of item sets to the bottom before moving to the next
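
The difference between the two variants is only the order in which the item sets are visited. The sketch below walks the lattice for items 1 to 4 in both orders; it is an illustration under assumptions (the names `children`, `breadth_first` and `depth_first` are hypothetical, and it also visits {1,2,3,4}, which the diagrams omit). A real miner would check support at each visit and stop extending infrequent item sets.

```python
from collections import deque

items = [1, 2, 3, 4]

def children(item_set):
    """Extend an item set with larger items only, so each set appears once."""
    start = item_set[-1] + 1 if item_set else items[0]
    return [item_set + (i,) for i in items if i >= start]

def breadth_first():
    """Visit all 1-item sets, then all 2-item sets, and so on."""
    order, queue = [], deque([()])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children(node))
    return order

def depth_first(node=()):
    """Follow one branch down to its longest item set before backtracking."""
    order = [node]
    for child in children(node):
        order.extend(depth_first(child))
    return order

print(breadth_first()[:6])  # [(), (1,), (2,), (3,), (4,), (1, 2)]
print(depth_first()[:6])    # [(), (1,), (1, 2), (1, 2, 3), (1, 2, 3, 4), (1, 2, 4)]
```

Both orders cover the same 16 item sets (including the empty set); the level-wise order matches the scan-by-scan procedure shown in the frequent item set example.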



With R
MARKET BASKET ANALYSIS
USING R

