
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 8, AUGUST 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG


An Efficient Blocking algorithm for Privacy Preserving Data Mining


R. Sugumar, C. Jayakumar, and A. Rengarajan

Abstract: Managing huge volumes of personal data and sharing these data has proved beneficial for data mining applications. Privacy-preserving data mining (PPDM) is one of the newest trends in privacy and security research. In many cases, users are unwilling to provide personal information unless the privacy of sensitive information is guaranteed. Privacy-preserving data mining has become an important problem in recent years because of the large amount of consumer data tracked by automated systems on the internet. In this paper, we design a blocking algorithm that provides better privacy than the k-anonymity method. In blocking-based algorithms, the idea is to substitute the value of an item supporting the rule we want to hide with a meaningless symbol.

Index Terms: Association rule mining, Apriori algorithm, Data mining, K-anonymity, Records

1 INTRODUCTION

In recent years, large amounts of data about individuals have become available to corporations as well as public entities. This has led to serious concerns about the misuse and privacy of such data. Privacy-preserving data mining has become an important problem in recent years because of the large amount of consumer data tracked by automated systems on the internet. In addition, advances in hardware technology have made it feasible to track information about individuals from transactions in everyday life. For example, a simple transaction such as using a credit card results in automated storage of information about the user's buying behavior. In many cases, users are not willing to supply such personal data unless its privacy is guaranteed. Therefore, in order to ensure effective data collection, it is important to design methods that can mine the data with a guarantee of privacy.

One interesting method for privacy-preserving data mining is the k-anonymity model. In the k-anonymity model, domain generalization hierarchies are used to transform and replace each record value with a corresponding generalized value. The problem of privacy-preserving data mining has become more significant in recent years because of the growing capability to accumulate private data about users and the ever-increasing sophistication of the data mining algorithms that leverage this information. A number of techniques, such as statistical disclosure control, distributed data privacy, randomization, and k-anonymity, have been proposed in recent years in order to perform data mining in a privacy-preserving way. In addition, the problem has been discussed in the database community, the statistical disclosure control community, and the cryptography community.

The rest of the paper is organized as follows. Section 2 discusses related work on anonymity models. Section 3 gives the general problem formulation and the basic definitions of association rule mining. Section 4 presents the proposed blocking algorithm for sensitive item modification, together with the experimental results that evaluate its effectiveness. Conclusions are given in Section 5.

2 RELATED WORK: ANONYMITY MODELS


K-anonymization techniques have been the focus of intense research in the last few years. To ensure anonymization of data while minimizing the information loss that results from data modifications, several extended models have been proposed. They are discussed below.

2.1 K-anonymity

K-anonymity is one of the most classic models. It is a technique that prevents joining attacks by generalizing and/or suppressing portions of the released microdata so that no individual can be uniquely distinguished from a group of size k. A data set is k-anonymous (k ≥ 1) if each record in the data set is indistinguishable from at least (k − 1) other records within the same data set. Consider the following database (Table 1):

    first      last       age   race
    Harry      Stone      34    Afr-Am
    John       Reyser     36    Cauc
    Beatrice   Stone      34    Afr-Am
    John       Delagado   22    Hisp

R. Sugumar is an Assistant Professor with R.M.D. Engineering College, Chennai-601206, India. C. Jayakumar is a Professor with R.M.K. Engineering College, Chennai-601206, India. A. Rengarajan is an Assistant Professor with Sree Sastha Institute of Engineering and Technology, Chennai-600123, India.

2011 Journal of Computing Press, NY, USA, ISSN 2151-9617


Table 1 can be 2-anonymized with suppression as follows (Table 2):

    first   last    age   race
    *       Stone   34    Afr-Am
    John    *       *     *
    *       Stone   34    Afr-Am
    John    *       *     *
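The k-anonymity condition can be checked mechanically. The following is a minimal sketch, assuming that all four attributes (first, last, age, race) are treated as quasi-identifiers and that "*" marks a suppressed value; the function name `is_k_anonymous` is introduced here for illustration only and is not part of the paper.

```python
from collections import Counter

# Table 1: the original records (first, last, age, race).
ORIGINAL = [
    ("Harry", "Stone", 34, "Afr-Am"),
    ("John", "Reyser", 36, "Cauc"),
    ("Beatrice", "Stone", 34, "Afr-Am"),
    ("John", "Delagado", 22, "Hisp"),
]

# Table 2: the same records after suppression; "*" is a suppressed value.
RELEASED = [
    ("*", "Stone", 34, "Afr-Am"),
    ("John", "*", "*", "*"),
    ("*", "Stone", 34, "Afr-Am"),
    ("John", "*", "*", "*"),
]


def is_k_anonymous(records, k):
    """Return True if every record is identical to at least k - 1 others.

    Assumption: every column is a quasi-identifier, so two records are
    indistinguishable exactly when their full tuples are equal.
    """
    group_sizes = Counter(records)
    return all(size >= k for size in group_sizes.values())


print(is_k_anonymous(ORIGINAL, 2))  # False: every original record is unique
print(is_k_anonymous(RELEASED, 2))  # True: two groups of two records each
```

The released table is 2-anonymous but not 3-anonymous, since each group of indistinguishable records has exactly two members.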

Tables 1 and 2 give an example of k-anonymity. The larger the value of k, the better the privacy is protected. K-anonymity can ensure that individuals cannot be uniquely identified by linking attacks.

3 PROBLEM FORMULATION

3.1 Formulation of Association Rule

Association rule hiding refers to the process of modifying the original database in such a way that certain sensitive association rules disappear without seriously affecting the data and the non-sensitive rules. Association rule mining is defined as follows. Let I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (itemsets for short) X and Y are called the antecedent (left-hand side, or LHS) and the consequent (right-hand side, or RHS) of the rule, respectively.

For example, let T = {T1, T2, T3, T4, T5} and I = {crème, sugar, coffee, beer, bread, chips, cheese, milk, oranges, apples, eggs}.

Table 3. Transactional database

    Tid   Items
    1     {crème, sugar, coffee, beer}
    2     {bread, chips, cheese, milk}
    3     {oranges, sugar, crème, beer}
    4     {apples, beer, crème, sugar}
    5     {eggs, milk, coffee, sugar}

The support of an itemset X, denoted Support(X), is the fraction of transactions in the database that contain X. In Table 3, the itemset {milk, sugar} has a support of 1 / 5 = 0.2, since it occurs in 20% of all transactions (1 out of 5 transactions).

The confidence of a rule is defined as Confidence(X ⇒ Y) = Support(X ∪ Y) / Support(X). In Table 3, the rule {milk, sugar} ⇒ {coffee} has a confidence of 0.2 / 0.2 = 1.0, which means that the rule is correct for 100% of the transactions containing milk and sugar: only transaction 5 contains both items, and it also contains coffee.

3.2 Apriori Algorithm

Apriori is a classic algorithm for learning association rules. It is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of visits to a website). Apriori is the most popular algorithm for finding all frequent itemsets. It makes use of the downward closure property: every subset of a frequent itemset is itself frequent. Apriori is a bottom-up search that moves upward level by level in the itemset lattice. Before reading the database at each level, it prunes the candidate sets that cannot be frequent. The algorithm uses two functions at every iteration, candidate generation and pruning. It moves upward in the lattice starting from level 1 until it reaches a level k at which no candidate set remains after pruning.

Table 4. Apriori algorithm

    L1 := {frequent 1-itemsets};
    k := 2;                          // k represents the pass number
    while (Lk-1 ≠ ∅)
        Ck := new candidates of size k generated from Lk-1;
        for all transactions t ∈ D
            increment the count of all candidates in Ck that are contained in t;
        Lk := all candidates in Ck with minimum support;
        k := k + 1;

The first pass of the algorithm calculates single-item frequencies to determine the frequent 1-itemsets. Each subsequent pass k discovers the frequent itemsets of size k. To do this, the frequent itemsets Lk-1 found in the previous iteration (the set of all frequent (k-1)-itemsets) are joined to generate the candidate k-itemsets Ck. Next, the support of the candidates in Ck is calculated through one sweep of the transaction list.

The rule hiding problem can now be stated. Given a transactional database D, a minimum support threshold SUPmin, a minimum confidence threshold CONFmin, the set of association rules AR that can be mined from D, and a set of sensitive rules ARsen ⊆ AR to be hidden, generate a new database D′ such that the rules in ARnon-sen = AR − ARsen can still be mined from D′ under the same SUPmin and CONFmin. No normal rule in ARnon-sen should be falsely hidden (lost rules), and no extra fake rule (ghost rules) should be mistakenly mined after the rule hiding process.
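The support and confidence definitions and the level-wise Apriori search of Section 3 can be tried out on the five-transaction example database. The sketch below is illustrative only: the function names (`support`, `confidence`, `apriori`) and the threshold 0.6 are assumptions made for this example, and the garbled item name in the transaction table is read as "creme".

```python
from itertools import combinations

# The five example transactions (item "crme" in the source read as "creme").
TRANSACTIONS = [
    {"creme", "sugar", "coffee", "beer"},
    {"bread", "chips", "cheese", "milk"},
    {"oranges", "sugar", "creme", "beer"},
    {"apples", "beer", "creme", "sugar"},
    {"eggs", "milk", "coffee", "sugar"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Support(lhs U rhs) / Support(lhs); lhs must occur at least once."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, found level by level."""
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support([i], transactions) >= min_support}
    frequent = {}
    k = 2  # pass number: size of the candidates generated next
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, transactions)
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Pruning (downward closure): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        # One sweep over the transactions keeps the frequent candidates.
        level = {c for c in candidates
                 if support(c, transactions) >= min_support}
        k += 1
    return frequent


print(support({"milk", "sugar"}, TRANSACTIONS))                 # 0.2
print(confidence({"milk", "sugar"}, {"coffee"}, TRANSACTIONS))  # 1.0
for itemset, sup in sorted(apriori(TRANSACTIONS, 0.6).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a minimum support of 0.6, only creme, sugar, and beer are frequent items, and every combination of them (three pairs and one triple) is frequent as well, because transactions 1, 3, and 4 contain all three.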


4 PROPOSED SOLUTION

4.1 Blocking Algorithm

In blocking-based algorithms, the idea is to substitute the value of an item supporting the rule we want to hide with a meaningless symbol. We describe here the results of a blocking algorithm that reduces loss of data and minimizes the undesirable side effects by selecting which items to change in the appropriate transactions, while maximizing the desirable side effects. The database is modified in such a way that an adversary cannot recover its original values. The proposed solution requires the following steps.

1st step:
o For each sensitive rule RS (rule RS has left itemset IL and right itemset IR), compute how many 0s and 1s have to be blocked in order to reduce the confidence of RS.

2nd step:
o Find the set of transactions TR that support RS, and the set of transactions TLpR that partially support RS (they partially support the left itemset and do not support the right itemset).
o For each transaction in TR, find the rules Rcommon that have at least one item in common with IR; for each transaction in TLpR, find the rules Rcommon NBRS that have at least one item in common with IL. Assign a weight w to each Rcommon of the first kind and a weight w to each Rcommon of the second kind. Assign a priority PT to each transaction such that PT is large if transaction Ti has many Rcommon rules with large w, and a priority value PT to each Ti such that PT is small if transaction T has many Rcommon rules with large w.

3rd step:
o Sort the transactions T ∈ TR starting from those with the lowest PTi, and sort the transactions T ∈ TLpR starting from those with the highest PTi.

4th step:
o For the first N1 sorted transactions T ∈ TR, block an item i ∈ IR; for the first N0 sorted transactions T ∈ TLpR, block an item i ∈ IL.

5th step:
o Update the values minconf(Ri) and minsup(Ri) for all other rules that have been affected.

The experimental results of the blocking algorithm follow.

Figure 1: Large itemsets remaining after the hiding process (x-axis: safety margin, 10% to 60%; y-axis: large itemsets remaining, 0 to 700; series: BA, CRA)

Figure 2: Rules changed (%) after the process (x-axis: safety margin, 10% to 60%; y-axis: rules changed, 0% to 100%; series: BA, CRA)

Figure 3: Databases with an average of 13 items per transaction (x-axis: database transactions, 2500 to 10000; y-axis: time in seconds, 0 to 140; series: BA, CRA)
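The blocking steps of Section 4.1 can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it blocks only 1s (an item of the right itemset in transactions that fully support the rule), it omits the weights, the blocking of 0s in partially supporting transactions, and the update of other rules, and it uses transaction length as a hypothetical stand-in for the priority PT. The minimum-confidence bound used here, the certain support of the whole rule divided by the possible support of the left itemset, is likewise an assumption of this example.

```python
UNKNOWN = "?"  # the meaningless symbol that replaces a blocked value

# Example transactions; each becomes a dict item -> 1 (absent items are 0).
TRANSACTIONS = [
    ["creme", "sugar", "coffee", "beer"],
    ["bread", "chips", "cheese", "milk"],
    ["oranges", "sugar", "creme", "beer"],
    ["apples", "beer", "creme", "sugar"],
    ["eggs", "milk", "coffee", "sugar"],
]
DB = [{item: 1 for item in t} for t in TRANSACTIONS]


def min_count(itemset, db):
    """Transactions that certainly contain every item (all values are 1)."""
    return sum(all(t.get(i, 0) == 1 for i in itemset) for t in db)


def max_count(itemset, db):
    """Transactions that possibly contain every item (values 1 or '?')."""
    return sum(all(t.get(i, 0) in (1, UNKNOWN) for i in itemset) for t in db)


def min_confidence(lhs, rhs, db):
    """Lower bound on the rule's confidence in a database with unknowns."""
    possible_lhs = max_count(lhs, db)
    return min_count(lhs | rhs, db) / possible_lhs if possible_lhs else 0.0


def block_rule(db, lhs, rhs, threshold):
    """Block right-itemset values until min confidence drops below threshold."""
    db = [dict(t) for t in db]  # work on a copy, keep the input intact
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    target = sorted(rhs)[0]  # hypothetical choice of the item to block
    # Transactions that support the rule; shortest first (stand-in priority).
    supporting = sorted(
        (t for t in db if all(t.get(i, 0) == 1 for i in lhs | rhs)),
        key=len,
    )
    for t in supporting:
        if min_confidence(lhs, rhs, db) < threshold:
            break
        t[target] = UNKNOWN
    return db


blocked = block_rule(DB, {"creme", "sugar"}, {"beer"}, threshold=0.5)
print(sum(v == UNKNOWN for t in blocked for v in t.values()))  # 2
```

In this run, two "?" symbols are enough: the rule {creme, sugar} ⇒ {beer} starts with confidence 3/3, and after blocking beer in two of the three supporting transactions its minimum confidence is 1/3, below the hypothetical threshold of 0.5.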


4.2 Privacy Breaches Definitions


If an item i, some of whose values are hidden by ?s, is contained in a sensitive rule, a privacy breach occurs if the adversary can infer this with c% confidence. For a rule R with maxconf(R) > MCT, where MCT is the minimum confidence threshold, a privacy breach occurs if it can be estimated, with c% confidence, that R is either a sensitive or a ghost rule. For a blocked item i in a specific transaction T, a privacy breach occurs if the adversary can estimate with c% confidence that its original value is either 0 or 1.
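What the adversary sees in a blocked database is an interval of possible confidences rather than a single value. The sketch below computes such an interval under one simple, conservative bound, which is an assumption of this example rather than a formula given in the paper: the lower end divides the certain support of the whole rule by the possible support of the left itemset, and the upper end divides the possible support of the whole rule by the certain support of the left itemset, capped at 1. The database contents and the value of MCT are hypothetical.

```python
UNKNOWN = "?"

# A hypothetical released database in which two values of "beer" are blocked.
BLOCKED_DB = [
    {"creme": 1, "sugar": 1, "coffee": 1, "beer": UNKNOWN},
    {"bread": 1, "chips": 1, "cheese": 1, "milk": 1},
    {"oranges": 1, "sugar": 1, "creme": 1, "beer": UNKNOWN},
    {"apples": 1, "beer": 1, "creme": 1, "sugar": 1},
    {"eggs": 1, "milk": 1, "coffee": 1, "sugar": 1},
]
MCT = 0.5  # hypothetical minimum confidence threshold


def count(itemset, db, allowed):
    """Transactions whose value for every item of `itemset` is in `allowed`."""
    return sum(all(t.get(i, 0) in allowed for i in itemset) for t in db)


def confidence_interval(lhs, rhs, db):
    """Return (minconf, maxconf) of lhs => rhs in a database with unknowns."""
    lhs = frozenset(lhs)
    both = lhs | frozenset(rhs)
    min_lhs, max_lhs = count(lhs, db, (1,)), count(lhs, db, (1, UNKNOWN))
    min_both, max_both = count(both, db, (1,)), count(both, db, (1, UNKNOWN))
    low = min_both / max_lhs if max_lhs else 0.0
    if max_both == 0:
        high = 0.0  # the rule cannot hold in any completion of the data
    elif min_lhs == 0:
        high = 1.0  # no certain support for lhs: the bound is uninformative
    else:
        high = min(1.0, max_both / min_lhs)
    return low, high


low, high = confidence_interval({"creme", "sugar"}, {"beer"}, BLOCKED_DB)
print(round(low, 3), high)
print("maxconf above MCT:", high > MCT)
```

Here the rule's confidence lies somewhere between 1/3 and 1, so maxconf exceeds MCT while minconf does not. This is the situation the second definition addresses: the adversary cannot tell from the interval alone whether the rule was sensitive, and a breach occurs only if that can be estimated with c% confidence.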


4.3 Goals that an algorithm has to achieve:


o To place a relatively small number of ?s while significantly reducing the confidence of sensitive rules.
o To minimize the undesirable side effects (rules and itemsets lost) by selecting which items to change in the appropriate transactions, and to maximize the desirable side effects.
o To modify the database in such a way that an adversary cannot recover its original values.
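The side effects named in these goals can be measured by comparing the rule sets mined before and after hiding. The following sketch assumes the rules have already been mined under the same thresholds; the function name `side_effects` and the example rules are hypothetical and serve only to show the set arithmetic.

```python
def side_effects(rules_before, rules_after, sensitive):
    """Classify rules by comparing the sets mined before and after hiding."""
    non_sensitive = rules_before - sensitive
    return {
        "hidden": sensitive - rules_after,       # sensitive rules removed
        "not_hidden": sensitive & rules_after,   # sensitive rules still mined
        "lost": non_sensitive - rules_after,     # normal rules falsely hidden
        "ghost": rules_after - rules_before,     # fake rules newly mined
    }


# Hypothetical rule sets, written as "LHS => RHS" strings for brevity.
before = {"creme,sugar => beer", "beer => sugar", "milk => coffee"}
sensitive = {"creme,sugar => beer"}
after = {"beer => sugar", "eggs => milk"}

report = side_effects(before, after, sensitive)
for kind, rules in report.items():
    print(kind, sorted(rules))
```

In this made-up example the sensitive rule is hidden, at the cost of one lost rule and one ghost rule. A hiding algorithm that meets the goals above would leave the last three sets empty.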

5 CONCLUSION

In this paper, we have proposed a blocking algorithm for association rule hiding. This work describes a method that reduces loss of data and minimizes the undesirable side effects by selecting which items to change in the appropriate transactions, while maximizing the desirable side effects. The purpose of the blocking algorithm for privacy-preserving data mining is to hide certain crucial information so that it cannot be discovered through association rule mining.

The Authors:

R. Sugumar received his undergraduate degree in Computer Science and Engineering from Madras University in 2003 and his postgraduate degree in Computer Science and Engineering from Dr. M.G.R. Educational and Research Institute, Chennai, in 2007. He is currently doing his research in the Faculty of Computer Science Engineering at Bharath University, Chennai-73. He has more than 5 publications in national conferences and international journal proceedings, and more than 8 years of teaching experience. His areas of interest include data mining, data structures, database management systems, distributed systems, and operating systems. He is currently working as an Assistant Professor in the Department of Information Technology at R.M.D. Engineering College, Chennai, India.

C. Jayakumar has more than 14 years of teaching and research experience. He completed his postgraduate degree (ME) in Computer Science and Engineering at the College of Engineering, Guindy, and his Ph.D. in Computer Science and Engineering at Anna University, Chennai. He has published more than 35 research papers in high-impact-factor international journals and in national and international conferences, and has visited many countries, including the USA and Singapore. He has been guiding a number of research scholars in the areas of ad hoc networks, security in sensor networks, mobile databases, and data mining under Anna University Chennai, Anna University of Technology, Sathayabama University, and Bharathiyar University. He has conducted various national conferences, staff development programs, workshops, and seminars in association with industry partners such as Infosys and TCS. He has received a grant of Rs 22 lakhs from AICTE for an RPS project and a staff development program. He has chaired various international and national conferences, and has been an advisor and technical committee member for many international and national conferences. He is currently working as a Professor in the Department of Computer Science and Engineering, RMK Engineering College.



A. Rengarajan received his undergraduate degree in Computer Science and Engineering from Madras University in 2003 and his postgraduate degree in Computer Science and Engineering from Dr. M.G.R. Educational and Research Institute, Chennai, in 2007. He is currently doing his research in the Faculty of Computer Science Engineering at Bharath University, Chennai-73. He has more than 5 publications in national conferences and international journal proceedings, and more than 8 years of teaching experience. His areas of interest include data mining, data structures, database management systems, distributed systems, and operating systems. He is currently working as an Assistant Professor in the Department of Information Technology at R.M.D. Engineering College, Chennai, India.
