21/07/2016 predicting_customers

Predicting Repeat Buyers
Consumer brands often offer discounts to attract new shoppers to buy their products. The most valuable
customers are those who return after this initial incented purchase. With enough purchase history, it is possible
to predict which shoppers, when presented an offer, will buy a new item. However, identifying the shopper who
will become a loyal buyer prior to the initial purchase is a more challenging task.

So, here we are predicting which customers will become repeat buyers. The data used for prediction contains
the transaction history of customers and the offer details, i.e. the category, company and brand on which the offer
was granted.

The features that were calculated for the prediction are:
userid: unique id of each customer.
uniqe_category: Count of unique categories on which a purchase was made by the customer.
Totalcatgry_trans: Total transactions on the category.
OfferCatgry_trans: Total transactions on the category on which the offer was granted.
uniqe_cmpny: Count of unique companies on which a purchase was made by the customer.
Totalcmpny_trans: Total transactions on the company.
OfferCmpny_trans: Total transactions on the company on which the offer was granted.
uniqe_brand: Count of unique brands on which a purchase was made by the customer.
Totalbrand_trans: Total transactions on the brand.
Offerbrand_trans: Total transactions on the brand on which the offer was granted.
unique_CtgryCompanyBrand: Count of unique combinations of category, company and brand on which a purchase was made by the customer.
TotalCtgryCompanyBrand_trans: Total transactions on the combinations of category, company and brand.
OfferCtgryCompanyBrand_trans: Total transactions on the combination of category, company and brand on which the offer was granted.
has_bought_30: The number of times a shopper has bought from the offered combination of category, company and brand in the 30 days before the date the coupon was offered.
has_bought_60: The same count for the 60 days before the date the coupon was offered.
has_bought_90: The same count for the 90 days before the date the coupon was offered.
has_bought_120: The same count for the 120 days before the date the coupon was offered.
has_bought_150: The same count for the 150 days before the date the coupon was offered.
has_bought_180: The same count for the 180 days before the date the coupon was offered.
amt_range[-205:-144]: Number of items purchased within this amount range. A negative (-ve) sign denotes a return.
amt_range[-144:-83]: Number of items purchased within this amount range. A negative (-ve) sign denotes a return.
amt_range[-83:-22]: Number of items purchased within this amount range. A negative (-ve) sign denotes a return.
amt_range[-22:38]: Number of items purchased within this amount range. A negative (-ve) sign denotes a return.
qty_range[-31:-14]: Number of items purchased within this quantity range. A negative (-ve) sign denotes a return.
qty_range[-14:3]: Number of items purchased within this quantity range. A negative (-ve) sign denotes a return.
qty_range[3:20]: Number of items purchased within this quantity range.
qty_range[20:37]: Number of items purchased within this quantity range.
offers_per: Percentage of transactions on the offer.
total_returnTrans: Percentage of transactions on returned items.
total_returnQty: Total quantity of products returned.
repeater: The outcome variable to be predicted (t or f).
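
As a rough illustration of how a few of these features could be derived, the sketch below uses base R on a small made-up transaction table. The column names (`id`, `category`, `company`, `brand`, `date`, `offer_*`, `offerdate`), the toy values and the helper `count_in_window` are assumptions for illustration only; the feature-extraction code that actually produced newMetaData.csv is not shown in this report.

```r
# Toy transaction history (hypothetical columns and values).
trans <- data.frame(
  id       = c(1, 1, 1, 1, 2, 2),
  category = c(10, 10, 20, 10, 10, 30),
  company  = c(100, 100, 200, 100, 100, 300),
  brand    = c(5, 5, 6, 5, 5, 7),
  date     = as.Date(c("2013-01-05", "2013-03-20", "2013-03-25",
                       "2013-04-10", "2013-02-01", "2013-04-01"))
)

# Toy offer table: one offer per customer (hypothetical).
offers <- data.frame(
  id             = c(1, 2),
  offer_category = c(10, 10),
  offer_company  = c(100, 100),
  offer_brand    = c(5, 5),
  offerdate      = as.Date(c("2013-04-15", "2013-04-15"))
)

# uniqe_category: number of distinct categories each customer bought from.
uniqe_category <- tapply(trans$category, trans$id,
                         function(x) length(unique(x)))

# Attach each customer's offer to their transactions.
m <- merge(trans, offers, by = "id")

# TRUE where the purchase matches the offered category, company and brand.
same_product <- m$category == m$offer_category &
  m$company == m$offer_company &
  m$brand == m$offer_brand

# Days between the purchase and the offer date (positive = before the offer).
days_before <- as.numeric(m$offerdate - m$date)

# has_bought_N: matching purchases in the N days before the offer date.
count_in_window <- function(n) {
  hit <- same_product & days_before > 0 & days_before <= n
  tapply(hit, m$id, sum)
}

features <- data.frame(
  userid         = as.numeric(names(uniqe_category)),
  uniqe_category = as.vector(uniqe_category),
  has_bought_30  = as.vector(count_in_window(30)),
  has_bought_90  = as.vector(count_in_window(90))
)
print(features)
```

The other window sizes (60, 120, 150, 180) follow from calling the same helper with a different `n`.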

dataset <- read.csv('/home/rajat/newMetaData.csv') # read the dataset of feature variables
df1 = subset(dataset, dataset$repeater != "None") # drop rows whose outcome is missing ("None")
df1$repeater = factor(df1$repeater)
#splitting data into training and testing set
index = sort(sample(nrow(df1), nrow(df1)*.9))
training = df1[index,]
testing = df1[-index,]
trainingData = training[2:31]
#training model using RandomForest
library(randomForest)

## randomForest 4.6-12

## Type rfNews() to see new features/changes/bug fixes.

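# mtry: number of predictors tried at each split, taken as the square root
# of the predictor count (the outcome column is excluded from the count)
model = randomForest(repeater ~ . , data = trainingData, ntree = 95,
                     mtry = floor(sqrt(ncol(trainingData) - 1)))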
testingData = testing[2:30]
#predicting outcome variable on test data
predicted = predict(model, testingData)
#building confusionMatrix
library(caret)

## Loading required package: lattice

## Loading required package: ggplot2

##
## Attaching package: 'ggplot2'

## The following object is masked from 'package:randomForest':
##
##     margin

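# caret's signature is confusionMatrix(data, reference): predictions first, actual labels second.
# The output recorded here was generated with the arguments in the opposite order,
# confusionMatrix(testing$repeater, predicted, positive = "t"), so its "Prediction" rows hold
# the actual labels and its "Reference" columns hold the model's predictions.
confusionMatrix(predicted, testing$repeater, positive = "t")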

## Confusion Matrix and Statistics
##
##           Reference
## Prediction     f     t
##          f 11367   252
##          t  4153   234
##
##                Accuracy : 0.7248
##                  95% CI : (0.7178, 0.7317)
##     No Information Rate : 0.9696
##     P-Value [Acc > NIR] : 1
##
##                   Kappa : 0.0438
##  Mcnemar's Test P-Value : <2e-16
##
##             Sensitivity : 0.48148
##             Specificity : 0.73241
##          Pos Pred Value : 0.05334
##          Neg Pred Value : 0.97831
##              Prevalence : 0.03036
##          Detection Rate : 0.01462
##    Detection Prevalence : 0.27408
##       Balanced Accuracy : 0.60695
##
##        'Positive' Class : t
##
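
The table printed above has the actual labels (testing$repeater) in the rows labelled Prediction and the model's output in the columns labelled Reference, because the actual labels were passed to confusionMatrix() as its first argument. As a hedged sketch, the headline metrics can be recomputed from the four printed counts in the conventional orientation; the variable names below are illustrative and nothing here re-runs the model.

```r
# Counts copied from the printed table.
# Rows are the actual labels, columns are the model's predictions.
counts <- matrix(c(11367, 252,
                   4153,  234),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(actual = c("f", "t"),
                                 predicted = c("f", "t")))

tp <- counts["t", "t"]  # repeaters predicted as repeaters
fn <- counts["t", "f"]  # repeaters the model missed
fp <- counts["f", "t"]  # non-repeaters flagged as repeaters
tn <- counts["f", "f"]  # non-repeaters predicted as non-repeaters

accuracy    <- (tp + tn) / sum(counts)
recall      <- tp / (tp + fn)  # share of actual repeaters that were found
precision   <- tp / (tp + fp)  # share of predicted repeaters that were right
specificity <- tn / (tn + fp)

# Accuracy obtained by always predicting the majority class "f".
majority_baseline <- sum(counts["f", ]) / sum(counts)

metrics <- round(c(accuracy = accuracy, recall = recall,
                   precision = precision, specificity = specificity,
                   majority_baseline = majority_baseline), 4)
print(metrics)
```

Read this way, the model recovers roughly 5% of the actual repeaters, and its accuracy (0.7248) sits slightly below that of always predicting f (0.7259), which is consistent with the near-zero Kappa.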

Including Plots
