
International Journal of Engineering Research

Volume No. 5, Issue No. 4, pp: 264-268, 1 April 2016

ISSN: 2319-6890 (online), 2347-5013 (print)

A New Model for Profitable Pattern Mining


Jagrat Gupta1, Akhilesh Tiwari2
1,2 Department of Computer Science and Engineering, Madhav Institute of Technology & Science, Gwalior, India
1 jagrat.webtech@gmail.com, 2 atiwari.mits@gmail.com
Abstract: Profitable pattern mining is a research area that
focuses on serving business objectives. One of the most
prominent and least addressed of these objectives is profit.
The aim of this paper is to extract profitable rules accurately,
efficiently and in an optimized manner. To this end, rough set
theory is applied first, followed by a conventional association
rule mining algorithm and genetic-algorithm-based optimization.
The proposed model addresses the major issues found in earlier
work, making it more beneficial for business organizations in
the current scenario.
Keywords: Association rules, Data Mining, Genetic
Algorithm, Market Basket Analysis, Profitable Pattern
mining (PPM), Rough Set Theory (RST).
I. Introduction
Nowadays, the explosive growth in the amount of data gathered
by systems has created a need to analyze it and to discover
interesting, non-obvious information in it. This growth has been
made possible by technological advances and the availability of
storage. There is therefore an urgent need for tools and
techniques for analyzing immense data sets. Data mining emerged
as a new research area to meet this challenge.
Data mining, also called knowledge mining, knowledge
extraction, data archaeology or data dredging, is the process
of extracting valuable information from huge amounts of data.
It is one of the most important analysis steps of the Knowledge
Discovery in Databases (KDD) process [i]. A definition of KDD
given by Usama Fayyad in 1996 is: KDD or Data Mining is the
non-trivial process of identifying valid, novel, potentially
useful and ultimately understandable patterns in data. It was
also suggested that data mining should be used for the discovery
as well as the analysis stage of the KDD process. Researchers
have applied data mining to many different applications, but the
first application in this context was Market Basket Analysis
(MBA).
Market Basket Analysis, or supermarket analysis, is a modeling
technique widely used to identify purchasing relationships
between items. In short, MBA is a tool for business intelligence
decisions [ii]. For example, market basket analysis gives
essential sales information about groups of goods: customers who
buy Bread often also buy several products related to Bread, such
as Milk, Butter or Jam. Such related groups of goods should also
be located side by side in order to remind customers of related
items and to lead them through the center in a logical manner.
Typically, the relationship is represented in the form of rules
referred to as association rules.
Extraction of association rules is one of the most prominent
data mining tasks and was introduced by R. Agrawal [iii]. Such
rules describe the co-occurrence relationship among the set of
items in a dataset [iv]. For example:
{Bread} → {Milk or Butter or Jam}
The probability that a customer buys Bread together with Milk,
Butter or Jam is referred to as the Support of the rule. The
conditional probability that a customer purchases Milk, Butter
or Jam given that Bread has been purchased is referred to as the
Confidence. These rules and measures make it possible to analyze
various scenarios related to a market or organization.
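As an illustration of the two measures, the following minimal Python sketch computes support and confidence over a small list of baskets. The basket contents and the function names `support` and `confidence` are hypothetical and chosen only for this example; they do not come from the paper.

```python
# Hypothetical baskets, invented for illustration only.
baskets = [
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk", "Jam"},
    {"Milk"},
    {"Bread"},
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by
    the support of the antecedent alone."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


if __name__ == "__main__":
    print("support(Bread, Milk):", support(baskets, {"Bread", "Milk"}))
    print("confidence(Bread -> Milk):", confidence(baskets, {"Bread"}, {"Milk"}))
```

Here support is expressed as a fraction of all transactions; an absolute transaction count can be used instead, as long as the minimum-support threshold is expressed in the same unit.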
The literature indicates that researchers have addressed several
measures, but some measures still need more attention [v], [vi].
One such measure is Profit, which led to the evolution of
Profitable Pattern Mining. The next section describes the
interpretation and the objective of Profitable Pattern Mining.
II. Profitable Pattern Mining
Nowadays, a prominent issue is whether a customer purchases
an item recommended by the organization. Many factors play a
role here, such as items stocked, competitors' offers, prices,
promotions, recommendations by individuals, psychological
issues and individual interest. It is difficult to implement all
of these factors in a single model, but some of them can be
taken into consideration to build a model for the enhancement
of profit. These considerations led to the evolution of profit
mining. The major concern in the profit mining problem is to
determine an item that is of interest to the customer at an
affordable price and is also profitable for the organization.
The prominent objective of Profitable Pattern Mining (PPM) is to
develop a model that generates profitable rules as well as
recommender rules that recommend target items to future
customers [vii]. It is a new technique and an extension of
association rule mining that aims to extract the patterns that
contribute maximum profit to the organization. Fig. 1 shows the
level-wise hierarchy of relationships from the KDD process to
profit mining.

doi : 10.17950/ijer/v5s4/409

Fig. 1 Hierarchy of Profitable Pattern Mining


The literature indicates that a major obstacle in applying
association rule mining is the gap between statistics-based
pattern extraction (decisions based on statistical measures)
and value-based decision making (decisions based on the
economic values attached to the item sets). Profit pattern
mining reduces this gap.
Because the field of Profitable Pattern Mining is new, the
initial contributions have focused on incorporating optimization
techniques to reduce the search space [viii], utility-based data
mining techniques that treat profit as utility [ix], and
uncertainty measurement techniques to extract rules that contain
uncertainty but are profitable from a business point of view [x].
The next section describes the most prominent contributions in
the field of profitable pattern mining.
III. Literature Review
In 2002, Ke Wang, Senqiang Zhou and Jiawei Han presented the
concept of profit mining to close the gap between statistics-based
pattern mining and value-based decision making [vii]. Given a set
of past transactions and pre-selected target items, Wang et al.
set out to build a model for recommending target items and
promotion strategies to new customers, with the goal of
maximizing the overall profit. The aim of profit mining is to
suggest the right item at the right price. If the price is too
high, the customer will go away without producing any profit; if
the price is too low, or if the item is not profitable, the
profit will not be maximized. The major issues in this context
are profit-generating patterns, shopping on unavailability, and
the optimality and interpretability of the recommender.
In 2007, Yaohua Chen, Yan Zhao and Yiyu Yao proposed a
profit-based business model that evaluates the interestingness
of rules [xi]. Chen et al. introduced two types of marketing
strategies for increasing profit, classified according to the
two factors in the profit model:
Price-based strategy: based on increasing the unit profit.
Volume-based strategy: based on increasing the sales volume.
The relationship between unit profit and sales volume is
represented graphically in Fig. 2.

Fig. 2 Relationships between Unit Profit and Sales Volume


The final results describing the notion of profit for rules of
type A → B (B is dependent on the sales of A) are summarized in
the following table:

Table I
Summarization of Profit Results
Item sets   Price Based Strategy   Volume Based Strategy
A           Profit Decreases       Profit Increases
B           Profit Increases       Profit Decreases

In 2010, Sandhu, P.S. et al. proposed an efficient approach
based on utility and weight factors for mining association
rules [xii]. In this approach, weightage (W-gain) and utility
(U-gain) constraints are applied to the set of association
rules, and a combined Utility Weighted Score (UW-Score) is
computed for every rule mined. Sandhu et al. then selected a
subset of valuable association rules based on the UW-Score.
Their experimental results show that the approach is effective
in generating high-utility association rules that can be
profitably applied for business growth.
In 2014, Sameer Kumar Vishnoi and Vivek Badhe proposed a
model for profit patterns based on a genetic algorithm [viii].
In that work, profit pattern mining is combined with a genetic
algorithm to generate profit-oriented patterns that help in
future business expansion and fulfill business objectives.
Vishnoi et al. proposed two types of profit, value profit and
percentage of profit, as well as two quality measures,
completeness and interestingness. They applied classical
association rule mining followed by genetic algorithm
techniques. Their experimental results show that the approach is
effective in generating optimized profitable patterns that can
be profitably applied for business growth.

IV. Proposed Methodology
The proposed computational model and design for mining
profitable patterns using Rough Set Theory (RST) and a Genetic
Algorithm (GA) is represented through the following approach:

Fig. 3 Proposed Framework of Profit Mining

Pseudo code of proposed framework:
Step 1. Start.
Step 2. Load the dataset (D) and the CO_QUA (Cost-Quantity) table.
Step 3. Convert the dataset (D) into bit vector format.
Step 4. Apply traditional Rough Set Theory to perform the reduct operation.
Step 5. Obtain the reduct dataset from the dataset.
Step 6. Apply the Apriori algorithm, by calling the function Apriori, to the reduct dataset for rule generation with the defined support and confidence parameters.
Step 7. Store the output of the Apriori algorithm in the rule set.
Step 8. Apply the genetic algorithm to the rule set.
    Selection: random selection.
    Crossover: single-point crossover (after the 2nd position from the left).
    Mutation: bit flipping (2nd position from the left), performed only on chromosomes that cannot be represented through the representation scheme.
    Check fitness: calculate the fitness value from the defined fitness function.
    Check the termination condition.
Step 9. Store the result of the genetic algorithm as the final result, which contains the optimized (profitable) rules.
Step 10. Map the final rule set into the desired format.
Step 11. Stop.

To incorporate profitability into association rule mining, three
types of profit are defined: margin profit, weighted profit and
percentage profit.
For each rule A → B, the fitness function is defined as:

$$\text{Fitness} = \frac{w_1 C + w_2 I}{w_1 + w_2}$$

where
C is the completeness and
I is the interestingness.
The values of w1 and w2 are calculated as:
w1 = [expression missing in the source text]
w2 = [expression missing in the source text]
Completeness (C) and Interestingness (I) for A → B are defined
as follows:
Completeness (C): the rule completeness measure is defined by
the following formula:
Completeness = [expression missing in the source text]
Interestingness (I): the rule interestingness measure is defined
by the following formula:
Interestingness = [expression missing in the source text]
where N is the total number of transactions in the dataset.
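The crossover step and the fitness check of Step 8 can be sketched in Python as follows. This is only a sketch under stated assumptions: chromosomes are assumed to be bit strings, the fitness is assumed to be the weighted average of completeness C and interestingness I with weights w1 and w2, and the formulas for C, I, w1 and w2 themselves are not reproduced, so the numeric inputs below are placeholders. The function names `single_point_crossover` and `fitness` are hypothetical.

```python
def single_point_crossover(parent_a, parent_b, point=2):
    """Swap the tails of two equal-length bit strings after `point`
    positions from the left (the pseudo code uses the 2nd position)."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def fitness(completeness, interestingness, w1, w2):
    """Weighted average of the two rule-quality measures.
    Assumed form; the inputs are supplied by the caller."""
    if w1 + w2 == 0:
        raise ValueError("w1 + w2 must be non-zero")
    return (w1 * completeness + w2 * interestingness) / (w1 + w2)


if __name__ == "__main__":
    # Placeholder chromosomes and measure values, for illustration only.
    print(single_point_crossover("1110", "1000"))
    print(fitness(0.5, 1.0, w1=1.0, w2=1.0))
```

A rule would then be kept for the next generation only if its fitness reaches the chosen minimum fitness value.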

CO_QUA Table: The CO_QUA (Cost-Quantity) table contains 6 tuples
with 4 attributes: item, cost price, sales price and quantity.
The table is as follows:

Table II
CO_QUA Table
Items   Cost Price   Sales Price   Weight (Quantity)
A       10           15            2
B       8            16            1
C       13           16            3
D       12           14            1
E       10           13            2
F       10           11            1
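To show how the three profit types could be derived from the CO_QUA table, the following Python sketch computes them per item. The formulas are assumptions made for this example, since the text names the profit types without defining them: margin profit is taken as sales price minus cost price, weighted profit as margin profit times the weight (quantity), and percentage profit as margin profit relative to cost price. The names `CO_QUA` (as a dictionary) and `item_profits` are hypothetical.

```python
# CO_QUA table from Table II: item -> (cost price, sales price, weight).
CO_QUA = {
    "A": (10, 15, 2),
    "B": (8, 16, 1),
    "C": (13, 16, 3),
    "D": (12, 14, 1),
    "E": (10, 13, 2),
    "F": (10, 11, 1),
}


def item_profits(item):
    """Return the three assumed profit figures for one item."""
    cost, sales, weight = CO_QUA[item]
    margin = sales - cost                 # assumed: sales price - cost price
    return {
        "margin": margin,
        "weighted": margin * weight,      # assumed: margin * quantity
        "percentage": 100 * margin / cost,  # assumed: margin as % of cost
    }


if __name__ == "__main__":
    for name in CO_QUA:
        print(name, item_profits(name))
```

Under these assumptions, the profit of a rule A → B could be obtained by combining the figures of the items it contains, for example by summing them.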

V. Illustration and Results
Dataset (D) in bit vector format:

Table III
Bit Vector Formatted Dataset
TID   A   B   C   D   E   F
1     1   1   0   0   0   0
2     1   1   1   1   0   0
3     0   0   0   0   1   0
4     1   1   1   1   1   0
5     1   1   0   0   0   0
6     0   0   1   1   0   0
7     1   0   0   0   0   1
8     0   0   0   0   1   0
9     1   0   1   1   0   0
10    0   0   1   1   0   1

Incorporation of Rough Set Theory: Rough set theory (the reduct
operation) is applied to the above bit vector formatted dataset.
It yields two reducts, abcef and abdef, either of which can be
used for further computation. Here the reduct abcef is chosen.
The reduct dataset is:

Table IV
Reduct Dataset
TID   A   B   C   E   F
1     1   1   0   0   0
2     1   1   1   0   0
3     0   0   0   1   0
4     1   1   1   1   0
5     1   1   0   0   0
6     0   0   1   0   0
7     1   0   0   0   1
8     0   0   0   1   0
9     1   0   1   0   0
10    0   0   1   0   1

Incorporation of Apriori algorithm: Conventional association
rule mining is applied to the above reduct dataset to extract
the association rules. The minimum support is taken as 3 and the
minimum confidence as 50 %. Under these settings, the following
rules are extracted:

Table V
Association Rule Set
S.N.   Rules   Confidence
1      A → B   66.66 %
2      B → A   100 %
3      A → C   50 %
4      C → A   60 %

Incorporation of Genetic Algorithm:
Representation scheme of chromosomes: Chromosome representation
is the initial stage of incorporating genetic-based
optimization. The following representation scheme is opted for
representing the chromosomes:
11   Present & Antecedent
10   Present & Consequent
00   Absent

1st Generation Results: The 1st generation results are obtained
after applying the selection, crossover and mutation operations.
The minimum fitness value is taken as 1.30.

Table VI
Rules after 1st Generation
S.N.   Rules   Confidence
1.     A → B   66.66 %
2.     B → A   100 %

2nd Generation Results: The 2nd generation results are obtained
by processing the rules extracted from the 1st generation.

Table VII
Rules after 2nd Generation
S.N.   Rules   Confidence
1.     B → A   100 %
2.     A → B   66.66 %

Because the 2nd generation rules are the same as the 1st
generation rules, further processing is stopped here. Hence the
final, most optimized profitable rules are:

Table VIII
Final Profitable Rules
S.N.   Rules   Confidence
1.     B → A   100 %
2.     A → B   66.66 %
VI. Conclusion
This paper not only reviews the existing profitable pattern
mining techniques contributed by researchers but also addresses
the optimization problem more efficiently and provides a new
model for profitable pattern mining that is optimized, accurate
and efficient.
References
i. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, Elsevier India, 2001.
ii. Gajalakshmi V. and M.S. Murali Dhar, A Survey on Algorithms for Market Basket Analysis, International Journal of Advance Research in Computer Science and Management Studies, 2013.
iii. R. Agrawal, T. Imielinski and A. Swami, Mining association rules between sets of items in large databases, in Proceedings of the ACM SIGMOD Int'l Conf. on Management of Data, pp. 207-216, 1993.
iv. Margaret H. Dunham, Yongqiao Xiao, Le Gruenwald and Zahid Hossain, A Survey of Association Rules, 2001.
v. Chunhua Ju, Fuguang Bao, Chonghuan Xu and Xiaokang Fu, A Novel Method of Interestingness Measures for Association Rules Mining Based on Profit, Hindawi Publishing Corporation, Discrete Dynamics in Nature and Society, Volume 2015.
vi. Liqiang Geng and Howard J. Hamilton, Interestingness Measures for Data Mining: A Survey, ACM Computing Surveys, Vol. 38, No. 3, Article 9, September 2006.
vii. Ke Wang, Senqiang Zhou and Jiawei Han, Profit Mining: From Patterns to Actions, Springer Verlag Berlin, C.S. Jensen et al. (Eds.): EDBT 2002, LNCS 2287, pp. 70-87, 2002.
viii. Sameer Kumar Vishnoi and Vivek Badhe, Association rule mining for profit pattern using Genetic algorithm, International Journal of Emerging Technology and Advanced Engineering, Volume 4, Issue 5, May 2014.
ix. Hong Yao, Howard J. Hamilton and Liqiang Geng, A Unified Framework for Utility Based Measures for Mining Item Sets, Second International Workshop on Utility Based Data Mining held in conjunction with the KDD conference, August 2006.
x. Vivek Badhe, Dr. R.S. Thakur and Dr. G.S. Thakur, Vague Set Theory for Profit Patterns and Decision Making in Uncertain Data, International Journal of Advanced Computer Science and Applications, Vol. 6, No. 6, 2015.
xi. Yaohua Chen, Yan Zhao and Yiyu Yao, A Profit-based Business Model for Evaluating Rule Interestingness, Proceedings of the 20th Canadian Conference on Artificial Intelligence (CAI07), pp. 296-307, 2007.
xii. Sandhu, P.S., Dhaliwal, D.S., Panda, S.N. and Bisht, A., An Improvement in Apriori Algorithm Using Profit and Quantity, ICCNT, IEEE conference publication, 2010.
