MARKET BASKET ANALYSIS USING R TOOL

Gaurav Mittal
DOMS-NITT
What is Market Basket Analysis?
 Understanding behavior of shoppers
 What items are bought together
 What’s in each shopping cart/basket?
 Basket data consists of a collection of transactions, each recording the
transaction date and the items bought
 Itemset
 Retail organizations are interested in making well-founded
decisions and strategy based on analysis of transaction
data
 what to put on sale, how to place merchandise on shelves to
maximize profit, customer segmentation based on buying
patterns
Market Basket Analysis
 MBA uses this information to:
 Identify who customers are (not by name)
 Understand why they make certain purchases
 Gain insight about its merchandise (products):
 Fast and slow movers
 Products which are purchased together
 Products which might benefit from promotion
 Take action:
 Store layouts
 Which products to put on specials, promote, coupons…
 Combining all of this with a customer loyalty card it
becomes even more valuable
Examples
 Rule form: LHS → RHS
 IF a customer buys diapers, THEN they also buy beer
 diapers → beer
 “Transactions that purchase bread and butter also purchase
milk”
 bread ∧ butter → milk
 Customers who purchase maintenance agreements are very
likely to purchase large appliances
 When a new hardware store opens, one of the most
commonly sold items is toilet bowl cleaners
 Def: Market Basket Analysis (Association
Analysis) is a mathematical modeling
technique based upon the theory that if
you buy a certain group of items, you are
likely to buy another group of items.

 It is used to analyze customer
purchasing behavior and helps to
increase sales and maintain
inventory by focusing on point-of-sale
transaction data.
Definitions and Terminology
 Transaction is a set of items (Itemset).
 Confidence: a measure of the certainty or
trustworthiness associated with each discovered pattern.
 Support: a measure of how often the collection of items
in an association occurs together, as a percentage of all
transactions.
 Frequent itemset: an itemset that satisfies the minimum support
threshold.
 Strong Association rules: Rules that satisfy both a minimum
support threshold and a minimum confidence threshold
 In Association rule mining, we first find all frequent itemsets and
then generate strong association rules from the frequent
itemsets
Market Basket Analysis
General Concept: methods
 Method:
Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizza
Transaction 4: Milk, pretzels
Transaction 5: Cola, pretzels

Co-occurrence counts (number of transactions containing both items):

               Frozen Pizza   Milk   Cola   Potato Chips   Pretzels
Frozen Pizza        2          1      2          0            0
Milk                1          3      1          1            1
Cola                2          1      3          0            1
Potato Chips        0          1      0          1            0
Pretzels            0          1      1          0            2

Results:
We could derive the association rules:
If a customer purchases Frozen Pizza, then they will probably
purchase Cola.
If a customer purchases Cola, then they will probably purchase
Frozen Pizza.
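
The co-occurrence counts above can be reproduced with a few lines of base R. This is a minimal sketch, not the method used in the slides: the variable names (transactions, items, incidence, cooccurrence) are assumptions made for illustration.

```r
# The five example transactions from the slide above.
transactions <- list(
  c("Frozen Pizza", "Cola", "Milk"),
  c("Milk", "Potato Chips"),
  c("Cola", "Frozen Pizza"),
  c("Milk", "Pretzels"),
  c("Cola", "Pretzels")
)
items <- c("Frozen Pizza", "Milk", "Cola", "Potato Chips", "Pretzels")

# Binary incidence matrix: one row per transaction, one column per item.
incidence <- t(sapply(transactions, function(basket) as.integer(items %in% basket)))
colnames(incidence) <- items

# t(X) %*% X counts, for every pair of items, the transactions containing both.
# The diagonal holds the number of transactions containing each single item.
cooccurrence <- crossprod(incidence)
print(cooccurrence)
```

The largest off-diagonal count is the Frozen Pizza / Cola pair, which is why the two rules above are the ones suggested.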
Market Basket Analysis
General Concept: Measures
 Support: a measure of how often the collection of items
in an association occurs together, as a percentage of all
the transactions
 support = (number of transactions containing the item combination) / (total number of transactions)
 Let the rule be "If a customer purchases Cola, then they will purchase Frozen
Pizza"
 The support for this rule
= 2 (number of transactions that include both Cola and Frozen Pizza) /
5 (total number of transactions)
= 40%.

 Confidence: the confidence of the rule “B given A” is a
measure of how likely it is that B occurs
when A has occurred
 100% means that B always occurs if A has occurred
 Confidence of a rule = the support for the combination / the support for the
condition.
For the rule "If a customer purchases Milk, then they will purchase Potato
Chips"
 confidence = support for the combination (Potato Chips + Milk), 20% /
support for the condition (Milk), 60%
= 33%
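
The two measures can be checked against the five example transactions in base R. This is a sketch under assumptions: the helper functions support() and confidence() are hypothetical names written for this illustration, not functions from any package.

```r
# The five example transactions used in the slides.
transactions <- list(
  c("Frozen Pizza", "Cola", "Milk"),
  c("Milk", "Potato Chips"),
  c("Cola", "Frozen Pizza"),
  c("Milk", "Pretzels"),
  c("Cola", "Pretzels")
)

# Fraction of transactions that contain every item in `itemset`.
support <- function(itemset, baskets) {
  mean(sapply(baskets, function(basket) all(itemset %in% basket)))
}

# Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs).
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

# Rule "Cola -> Frozen Pizza": 2 of 5 transactions contain both items.
supp_cola_pizza <- support(c("Cola", "Frozen Pizza"), transactions)

# Rule "Milk -> Potato Chips": 20% / 60%.
conf_milk_chips <- confidence("Milk", "Potato Chips", transactions)

print(supp_cola_pizza)
print(conf_milk_chips)
```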
Association Rules Apply Elsewhere
 Retail – supermarkets, etc…
 Purchases made using credit/debit cards.
 Optional Telco Service purchases.
 Banking services.
 Unusual combinations of insurance claims can be
a warning of fraud.
 Medical patient histories.
 Restaurants and Fast-food Centre.
Preparing Data for MBA
 Determining scope of dataset (one or
many stores, what period, etc)
 Converting transaction data to itemsets
 Generalizing items to appropriate level
 Depends on objective of model
 Rolling up rare items to get adequate support
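
Rolling up rare items can be sketched in base R as a lookup from item to category. The item names, the categories and the variable names below are invented for illustration and are not taken from the slides' data.

```r
# Hypothetical item-to-category lookup (illustrative names only).
category_of <- c(
  "Cola"         = "Soft Drinks",
  "Lemonade"     = "Soft Drinks",
  "Potato Chips" = "Snacks",
  "Pretzels"     = "Snacks",
  "Frozen Pizza" = "Frozen Food"
)

# Hypothetical item-level baskets.
baskets <- list(
  c("Cola", "Pretzels"),
  c("Lemonade", "Potato Chips"),
  c("Frozen Pizza", "Cola")
)

# Replace each item with its category and drop duplicates within a basket.
rolled_up <- lapply(baskets, function(basket) unique(unname(category_of[basket])))

# At item level every drink/snack pair occurs only once; at category level
# the pair {Soft Drinks, Snacks} occurs in 2 of the 3 baskets.
pair_support <- mean(sapply(
  rolled_up,
  function(basket) all(c("Soft Drinks", "Snacks") %in% basket)
))
print(pair_support)
```

Which level to generalize to remains a modeling decision: categories raise support but hide item-specific patterns.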
INTRODUCTION TO R
 R is a programming language and software environment for
statistical computing and graphics.
 R is part of the GNU project. Its source code is freely available
under the GNU General Public License, and pre-compiled
binary versions are provided for various operating systems.
 R uses a command line interface, though several graphical
user interfaces are available.
 Comprehensive R Archive Network (CRAN) makes it easy to
benefit from others’ work and to share your work and get
feedback on potential improvements
 For computationally-intensive tasks, C, C++, and Fortran
code can be linked and called at run time.
 R provides a wide variety of statistical (linear and
nonlinear modeling, classical statistical tests, time-series
analysis, classification, clustering, and others) and
graphical techniques.
 Another of R's strengths is its graphical facilities, which
produce publication-quality graphs which can include
mathematical symbols.
 Although R is mostly used by statisticians and other
practitioners requiring an environment for statistical
computation and software development, it can also be used
as a general matrix calculation toolbox with comparable
benchmark results to GNU Octave and its proprietary
counterpart, MATLAB
THE R ENVIRONMENT
 R is an integrated suite of software facilities for data
manipulation, calculation and graphical display. It includes
 an effective data handling and storage facility,
 a suite of operators for calculations on arrays, in particular matrices,
 a large, coherent, integrated collection of intermediate tools for data analysis,
and
 graphical facilities for data analysis and display either on-screen or on
hardcopy.
 Packages
 The capabilities of R are extended through user-submitted packages, which
provide specialized statistical techniques, graphical devices, and
import/export capabilities for many external data formats.
 A statistical package is a suite of computer programs that are specialised
for statistical analysis. It enables people to obtain the results of standard
statistical procedures and statistical significance tests, without requiring
low-level numerical programming.
Process Methodology
 The data is obtained from the Excel sheet
provided by the customer.
 Each row contains:
 BUS_DT – Business Date
 REST_NO – Restaurant Number
 RTL_TRAN_NO – Transaction Number
 MENU_ITEM_KEY – Product Key Number
 MENU_ITEM_PLU – Menu Product Number
 MENU_ITEM_NAME – Product Name
 RCPT_DT_TMSTP – Date Of Transaction
 HALF_HOUR_KEY – The half hour in which the transaction occurred.
 COMBO_IND – Is the product offered with something else
 SERVICE_MODE_CODE – Eat In / Take Away
 CGY – Category
 RGLR_PRC – Regular Price
 DRV_PRC – Derived Price
 ITEM_QTY – Number of Products Ordered
Products offered at the store
 WHOPPER
 TENDERCRISP Chicken Sandwich
 Crown-shaped CHICKEN TENDERS
 French Fries
 Hamburger
 Cheeseburger
 DOUBLE CROISSAN'WICH
 BK BURGER SHOTS
 KRAFT Macaroni and Cheese
 Drinks
 The given data is converted to a new format that
contains all items purchased in a single transaction.
 Done by using the VLOOKUP function in Excel.
 The resulting data is restructured to remove the
multiple lines for the same transaction using an if…then
formula in Excel.
 The data is then ready to be fed to the statistical application.
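
The slides perform this restructuring in Excel. As an alternative, the same grouping can be sketched in base R. This is not the author's procedure: the column names follow the field list above, and the values are invented for illustration.

```r
# Toy long-format data: one row per item sold (values are made up).
sales <- data.frame(
  RTL_TRAN_NO = c(101, 101, 101, 102, 102, 103),
  MENU_ITEM_NAME = c("WHOPPER", "French Fries", "Drinks",
                     "Cheeseburger", "French Fries", "Hamburger"),
  stringsAsFactors = FALSE
)

# Group item names by transaction number: one basket per transaction.
baskets <- split(sales$MENU_ITEM_NAME, sales$RTL_TRAN_NO)

# Drop repeated items inside a basket (e.g. two orders of fries).
baskets <- lapply(baskets, unique)

print(baskets)
```

A list of this shape can be converted to an arules transactions object with as(baskets, "transactions") once the arules package is loaded.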
Working in R
 Download Rcmdr (R Commander), which is a GUI, and the arules
package (Apriori / association rules) from CRAN.
 The Rcmdr GUI can be used to load the data
into R, or the data can be loaded directly
using command-line functions.
 > Dataset <-
read.table("C:/Users/mittal/Documents/mittal.csv",
header=TRUE, sep=",", na.strings="NA", dec=".",
strip.white=TRUE)
 Loading the arules package
 > library("arules")
 To inspect the transactions (inspect() expects an arules
transactions object, not a plain data frame).
 > inspect(Dataset)
 Loading the Adult example data set that ships with arules
 > data("Adult")
 Next, we call the function apriori() to find all
rules (the default association type for
apriori()) with a minimum support of 1% and
a minimum confidence of 0.6.
 > rules <- apriori(Adult, parameter = list(support = 0.01,
+ confidence = 0.6))
 Asking for the rules
 > rules
 Getting the summary of the rules
 > summary(rules)
 Selecting rules by right-hand side and lift
 > rules_whopper <- subset(rules, subset = rhs %in%
+ "income=small" & lift > 1.2)
 > rules_hamburger <- subset(rules, subset = rhs %in%
+ "income=large" & lift > 1.2)
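
The same sequence of calls can be tried end to end on the five-transaction example from the earlier slides. This is a sketch that assumes the arules package is installed. The thresholds (support 0.3, confidence 0.6, minlen 2) and the variable names are choices made for this illustration, not values from the slides.

```r
library(arules)

# The five example transactions from the earlier slides.
transactions <- list(
  c("Frozen Pizza", "Cola", "Milk"),
  c("Milk", "Potato Chips"),
  c("Cola", "Frozen Pizza"),
  c("Milk", "Pretzels"),
  c("Cola", "Pretzels")
)

# Convert the list of baskets to an arules transactions object.
trans <- as(transactions, "transactions")

# Mine rules with at least two items; a support of 0.3 on five
# transactions means an itemset must occur in at least 2 of them.
rules <- apriori(
  trans,
  parameter = list(support = 0.3, confidence = 0.6, minlen = 2),
  control = list(verbose = FALSE)
)

# Keep only rules that predict Cola and have lift above 1.2.
cola_rules <- subset(rules, subset = rhs %in% "Cola" & lift > 1.2)

inspect(rules)
inspect(cola_rules)
```

Only the Frozen Pizza / Cola pair reaches the support threshold here, so the mined rules are the two rules derived by hand from the co-occurrence counts.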
The recommendations
 Whopper can be bundled with Coke, Minute
Maid orange juice, or French Toast Sticks.
 Cheeseburger can be bundled with
French fries or onion rings.
 French fries with HERSHEY®'S Fat Free Milk.
 Dutch Apple Pie with Bacon, Egg & Cheese
Biscuit Sandwich.
Challenges…!!!
 Could not load more than 799 rows of data.
 R software is usable for learning
purposes but difficult to use for industrial purposes,
where large amounts of data have to be analyzed.
 Limited guidance is available for
developing the analysis in R.
 New code has to be developed to
extend the database.
Thank You
