1. INTRODUCTION
In recent years, large amounts of data about individuals have become available to corporations as well as public entities. This has led to serious concerns about the misuse and privacy of such data. Privacy-preserving data mining has become an important problem because of the large amount of consumer data tracked by automated systems on the internet. In addition, advances in hardware technology have made it feasible to track information about individuals from transactions in everyday life. For example, a simple transaction such as using a credit card results in automated storage of information about the user's buying behavior. In many cases, users are not willing to supply such personal data unless its privacy is guaranteed. Therefore, to ensure effective data collection, it is important to design methods that can mine the data with a guarantee of privacy.
One interesting method for privacy-preserving data mining is the k-anonymity model. In the k-anonymity model, domain generalization hierarchies are used to transform and replace each record value with a corresponding generalized value. The problem of privacy-preserving data mining has become more significant in recent years because of the growing capability to accumulate private data about users and the ever-increasing sophistication of the data mining algorithms that leverage this information. A number of techniques, such as statistical disclosure control, distributed data privacy, randomization and k-anonymity, have been proposed in recent years to carry out data mining operations in a privacy-preserving way. In addition, the problem has been discussed in the database community, the statistical disclosure control community and the cryptography community.
The rest of the paper is organized as follows. In Section 2, association rule hiding and the related works are discussed. Section 3 gives the general problem formulation and the basic definitions of association rule mining. In Section 4, the proposed blocking algorithm for sensitive item modification is given. The effectiveness of the algorithm is evaluated and the experimental results of the proposed technique are discussed in Section 5. Conclusions are given in Section 6.
2.1 K-anonymity
K-anonymity is one of the classic models. It is a technique that prevents joining attacks by generalizing and/or suppressing portions of the released microdata so that no individual can be uniquely distinguished from a group of size k. A data set is k-anonymous (k ≥ 1) if each record in the data set is indistinguishable from at least (k-1) other records within the same data set. The following database:
first      last       age   race
Harry      Stone      34    Afr-Am
John       Reyser     36    Cauc
Beatrice   Stone      34    Afr-Am
John       Delagado   22    Hisp
R. Sugumar is an Assistant Professor with R.M.D. Engineering College, Chennai-601206, India. C. Jayakumar is a Professor with R.M.K. Engineering College, Chennai-601206, India. A. Rengarajan is an Assistant Professor with Sree Sastha Institute of Engineering and Technology, Chennai-600123, India.
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 8, AUGUST 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG
Table 1
Can be 2-Anonymized with suppression as follows:
age
34
*
34
*
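The rule $\{\text{milk}, \text{sugar}\} \Rightarrow \{\text{coffee}\}$ has a confidence of 0.2 / 0.2 = 1.0 in the database: the only transaction containing both milk and sugar (transaction 5) also contains coffee, which means that the rule is correct for 100% of the transactions containing milk and sugar.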
Table 1 and 2. Example of k-anonymity
The larger the value of k, the better the privacy is protected. K-anonymity can ensure that individuals cannot be uniquely identified by linking attacks.
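The definition of k-anonymity can be checked mechanically by counting how often each combination of attribute values occurs. The following is a minimal sketch, not part of the original paper. The function name `is_k_anonymous` is hypothetical, all four columns are treated as quasi-identifiers, and the suppressed table is an assumed illustration: only its age column (34, *, 34, *) is taken from the text, the other cells are made up so that the example is complete.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

COLUMNS = ("first", "last", "age", "race")

# Table 1 as given in the text.
original = [
    dict(zip(COLUMNS, values)) for values in [
        ("Harry", "Stone", 34, "Afr-Am"),
        ("John", "Reyser", 36, "Cauc"),
        ("Beatrice", "Stone", 34, "Afr-Am"),
        ("John", "Delagado", 22, "Hisp"),
    ]
]

# ASSUMPTION: a hypothetical suppression pattern. Only the age column
# (34, *, 34, *) comes from the text; the other cells are illustrative.
suppressed = [
    dict(zip(COLUMNS, values)) for values in [
        ("*", "Stone", 34, "Afr-Am"),
        ("John", "*", "*", "*"),
        ("*", "Stone", 34, "Afr-Am"),
        ("John", "*", "*", "*"),
    ]
]

print(is_k_anonymous(original, COLUMNS, 2))    # False: every record is unique
print(is_k_anonymous(suppressed, COLUMNS, 2))  # True: two groups of two records
```

In practice only the attributes that an attacker could link to external data would be passed as `quasi_identifiers`; using all columns here simply keeps the sketch short.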
3. PROBLEM FORMULATION
3.1 Formulation of Association Rule
Association rule hiding refers to the process of modifying the original database in such a way that certain sensitive association rules disappear without seriously affecting the data and the non-sensitive rules. Association rule mining is defined as follows. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of n binary attributes called items. Let $D = \{t_1, t_2, \ldots, t_m\}$ be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The sets of items (item sets, for short) X and Y are called the antecedent (left-hand side, or LHS) and the consequent (right-hand side, or RHS) of the rule, respectively. For example, T = {T1, T2, T3, T4, T5} and I = {crème, sugar, coffee, beer, bread, chips, cheese, milk, oranges, apples, eggs}. The support of an item set X, denoted Support(X), is the proportion of transactions in the database that contain X. The confidence of a rule is defined as $\text{Confidence}(X \Rightarrow Y) = \text{Support}(X \cup Y) / \text{Support}(X)$.
bought by customers, or details of a website frequentation). The Apriori algorithm is the most popular algorithm for finding all the frequent sets. It makes use of the downward closure property. Apriori is a bottom-up search that moves upward level-wise in the lattice. Before reading the database at each level, it prunes many of the sets that are unlikely to be frequent sets. The Apriori frequent item set discovery algorithm uses two functions, candidate generation and pruning, at every iteration. It moves upward in the lattice starting from level 1 until level k, where no candidate set remains after pruning.
Table 3. Apriori Algorithm
  L1 := {frequent 1-itemsets};
  k := 2;  // k represents the pass number
  While (Lk-1 is not empty)
    Ck = new candidates of size k generated from Lk-1
    For all transactions t in D
      Increment the count of all candidates in Ck that are contained in t
    Lk = all candidates in Ck with minimum support
    k = k + 1
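The level-wise search described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `apriori` and the parameter `min_count` (minimum support expressed as an absolute number of transactions) are assumptions, and the data is the five-transaction example database of this paper.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset contained in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items to obtain the frequent 1-itemsets (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2  # pass number
    while current:
        previous = set(current)
        # Candidate generation: join L(k-1) with itself, keep unions of size k.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Pruning (downward closure): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
        }
        # One sweep over the transaction list to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent

TRANSACTIONS = [
    {"crème", "sugar", "coffee", "beer"},
    {"bread", "chips", "cheese", "milk"},
    {"oranges", "sugar", "crème", "beer"},
    {"apples", "beer", "crème", "sugar"},
    {"eggs", "milk", "coffee", "sugar"},
]

frequent = apriori(TRANSACTIONS, min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With `min_count=2` the search stops after the third pass, because the single frequent 3-itemset cannot be joined into any candidate of size 4.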
Table 3. Transactional Database
T id   Items
1      {crème, sugar, coffee, beer}
2      {bread, chips, cheese, milk}
3      {oranges, sugar, crème, beer}
4      {apples, beer, crème, sugar}
5      {eggs, milk, coffee, sugar}
The first pass of the algorithm calculates single item frequencies to determine the frequent 1-itemsets. Each subsequent pass k discovers the frequent itemsets of size k. To do this, the frequent itemsets Lk-1 found in the previous iteration are joined to generate the candidate itemsets Ck. Next, the support for the candidates in Ck is calculated through one sweep of the transaction list. In this way, the set of candidate k-itemsets is created from Lk-1, the set of all frequent (k-1)-itemsets.
Consider a transactional database $D$, a minimum support threshold $SUP_{min}$, a minimum confidence threshold $CONF_{min}$, the set of association rules $AR$ that can be mined from $D$, and a set of sensitive association rules $AR_{sen} \subseteq AR$ to be hidden. The problem is to generate a new database $D'$ such that the rules in $AR_{non\text{-}sen} = AR - AR_{sen}$ can still be mined from $D'$ under the same $SUP_{min}$ and $CONF_{min}$. No normal rules in $AR_{non\text{-}sen}$ should be falsely hidden (lost rules), and no extra fake rules (ghost rules) should be mistakenly mined after the rule hiding process.
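The side effects named in this formulation reduce to set differences between the rules mined before and after hiding. The sketch below is only an illustration: the function name `hiding_side_effects`, the key `still_visible` (sensitive rules that remain minable) and the three rule sets are hypothetical and not taken from the paper's experiments.

```python
def hiding_side_effects(rules_before, rules_after, sensitive_rules):
    """Compare the rule sets mined from D and from D'.

    Rules may be any hashable values; plain strings are used here.
    """
    rules_before = set(rules_before)        # AR, mined from D
    rules_after = set(rules_after)          # rules mined from D'
    sensitive_rules = set(sensitive_rules)  # AR_sen
    non_sensitive = rules_before - sensitive_rules  # AR_non-sen = AR - AR_sen
    return {
        "still_visible": sensitive_rules & rules_after,  # hiding did not succeed
        "lost_rules": non_sensitive - rules_after,       # falsely hidden rules
        "ghost_rules": rules_after - rules_before,       # fake rules created
    }

# Hypothetical rule sets, made up for illustration only.
ar = {"sugar -> beer", "crème -> beer", "crème -> sugar"}
ar_sen = {"sugar -> beer"}
ar_after = {"crème -> sugar", "beer -> milk"}

effects = hiding_side_effects(ar, ar_after, ar_sen)
print(effects["still_visible"])  # set()
print(effects["lost_rules"])     # {'crème -> beer'}
print(effects["ghost_rules"])    # {'beer -> milk'}
```

An ideal hiding process would return three empty sets.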
From the transactional database in Table 3, the item set {milk, sugar} has a support of 1 / 5 = 0.2, since it occurs in 20% of all transactions (1 out of 5 transactions).
4. PROPOSED SOLUTION
4.1 Blocking Algorithm
In blocking-based algorithms, the idea is to substitute the value of an item supporting the rule we want to hide with a meaningless symbol. We describe here the results of a blocking algorithm that reduces loss of data and minimizes the undesirable side effects by selecting the items in the appropriate transactions to change, while maximizing the desirable side effects. The aim is to modify the database in such a way that an adversary cannot recover its original values. The following steps are required for the proposed solution.
1st step:
o For each sensitive rule RS (rule RS has left itemset IL and right itemset IR), compute how many 0s and 1s have to be blocked in order to reduce the confidence of RS.
2nd step:
o Find the set of transactions TR that support RS and the set of transactions TLpR that partially support RS (they partially support the left itemset and do not support the right itemset).
o For each transaction in TR, find the rules Rcommon that have at least one item in common with IR, and for each transaction in TLpR, find the rules Rcommon that have at least one item in common with IL. Assign a weight w to each Rcommon found for TR and a weight w to each Rcommon found for TLpR.
o Assign a priority PT to each transaction in TR such that PT is large if the transaction has many Rcommon rules with large w, and a priority PT to each transaction in TLpR such that PT is small if the transaction has many Rcommon rules with large w.
[Figure: large itemsets remained versus safety margin (10%, 20%, 40%, 60%), for BA and CRA]
[Figure: rules changed (%) versus safety margin (10%, 20%, 40%, 60%), for BA and CRA]
3rd step:
o Sort the transactions T in TR starting from those with the lowest PT, and sort the transactions T in TLpR starting from those with the highest PT.
4th step:
o For the first N1 sorted transactions T in TR, block an item i in IR, and for the first N0 sorted transactions T in TLpR, block an item i in IL.
5th step:
o Update the values minconf(Ri) and minsup(Ri) for all other rules that have been affected.
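The steps above can be sketched for the simplest case, blocking 1s of a consequent item in the transactions that fully support a sensitive rule. This is a heavily simplified illustration under stated assumptions, not the authors' algorithm: it omits the blocking of 0s in partially supporting transactions and the update of the other rules, every common rule gets a weight of 1, and the names `supports`, `pessimistic_confidence` and `block_rule`, the threshold 0.6 and the non-sensitive rule {coffee} => {beer} are all hypothetical. The confidence bound used here treats every blocked value in the antecedent as a possible 1 and every blocked value in the consequent as a possible 0.

```python
UNKNOWN = None  # stands for the meaningless blocking symbol, e.g. "?"

TRANSACTIONS = [
    {"crème", "sugar", "coffee", "beer"},
    {"bread", "chips", "cheese", "milk"},
    {"oranges", "sugar", "crème", "beer"},
    {"apples", "beer", "crème", "sugar"},
    {"eggs", "milk", "coffee", "sugar"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))
# Binary view of the database: item -> 1 (present) or 0 (absent).
db = [{item: int(item in t) for item in ITEMS} for t in TRANSACTIONS]

def supports(txn, itemset, allow_unknown=False):
    """True if every item is 1 in txn (or 1/blocked when allow_unknown is set)."""
    for item in itemset:
        value = txn.get(item, 0)
        if value == 1 or (allow_unknown and value is UNKNOWN):
            continue
        return False
    return True

def pessimistic_confidence(db, left, right):
    """Lower bound on the confidence of left => right given blocked values."""
    certain_both = sum(supports(t, left | right) for t in db)
    possible_left = sum(supports(t, left, allow_unknown=True) for t in db)
    return certain_both / possible_left if possible_left else 0.0

def block_rule(db, left, right, other_rules, min_conf):
    """Block a consequent item, in place, until the bound drops below min_conf."""
    def priority(txn):
        # Number of other rules sharing an item with the consequent that this
        # transaction supports (ASSUMPTION: each such rule has weight 1).
        return sum(1 for l, r in other_rules
                   if (l | r) & right and supports(txn, l | r))

    # Fully supporting transactions, lowest priority first, so that the
    # transactions supporting fewer other rules are modified first.
    supporting = sorted((t for t in db if supports(t, left | right)), key=priority)
    target = sorted(right)[0]  # ASSUMPTION: always block the same consequent item
    blocked = 0
    for txn in supporting:
        if pessimistic_confidence(db, left, right) < min_conf:
            break
        txn[target] = UNKNOWN
        blocked += 1
    return blocked

sensitive_left, sensitive_right = {"sugar"}, {"beer"}
other_rules = [({"coffee"}, {"beer"})]  # hypothetical non-sensitive rule
blocked = block_rule(db, sensitive_left, sensitive_right, other_rules, min_conf=0.6)
print(blocked)                                                      # 1
print(pessimistic_confidence(db, sensitive_left, sensitive_right))  # 0.5
```

In this run the beer value of transaction 3 is blocked, because transaction 1 also supports the non-sensitive rule and is therefore sorted last; one blocked value is enough to bring the bound from 0.75 down to 0.5.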
[Figure: time in seconds versus number of database transactions (2500, 5000, 7500, 10000), for BA and CRA]
The Authors:
R. Sugumar received his undergraduate degree in Computer Science and Engineering from Madras University in 2003 and his postgraduate degree in Computer Science and Engineering from Dr. M.G.R. Educational and Research Institute, Chennai, in 2007. He is currently pursuing his research in the Faculty of Computer Science Engineering at Bharath University, Chennai-73. He has more than 5 publications in national conferences and international journal proceedings. He has more than 8 years of teaching experience. His areas of interest include data mining, data structures, database management systems, distributed systems and operating systems. He is currently working as an Assistant Professor in the Department of Information Technology at R.M.D. Engineering College, Chennai, India.
C. Jayakumar has more than 14 years of teaching and research experience. He received his postgraduate degree (M.E.) in Computer Science and Engineering from the College of Engineering, Guindy, and his Ph.D. in Computer Science and Engineering from Anna University, Chennai. He has published more than 35 research papers in high-impact-factor international journals and in national and international conferences, and has visited many countries, including the USA and Singapore. He has been guiding a number of research scholars in the areas of ad hoc networks, security in sensor networks, mobile databases and data mining under Anna University Chennai, Anna University of Technology, Sathayabama University and Bharathiyar University. He has conducted various national conferences, staff development programs, workshops and seminars in association with industries such as Infosys and TCS. He has received a grant of Rs 22 lakhs from AICTE for an RPS project and a staff development program. He has chaired various international and national conferences, and has been an advisor and technical committee member for many international and national conferences. He is currently working as a Professor in the Department of Computer Science and Engineering, RMK Engineering College.
5. CONCLUSION
In this paper we have proposed a blocking algorithm for association rule hiding. This work describes a method that reduces loss of data and minimizes the undesirable side effects by selecting the items in the appropriate transactions to change, while maximizing the desirable side effects. The purpose of the blocking algorithm for privacy-preserving data mining is to hide certain crucial information so that it cannot be discovered through association rule mining.
A. Rengarajan received his undergraduate degree in Computer Science and Engineering from Madras University in 2003 and his postgraduate degree in Computer Science and Engineering from Dr. M.G.R. Educational and Research Institute, Chennai, in 2007. He is currently pursuing his research in the Faculty of Computer Science Engineering at Bharath University, Chennai-73. He has more than 5 publications in national conferences and international journal proceedings. He has more than 8 years of teaching experience. His areas of interest include data mining, data structures, database management systems, distributed systems and operating systems. He is currently working as an Assistant Professor in the Department of Information Technology at R.M.D. Engineering College, Chennai, India.