
1. INTRODUCTION
According to [1], information processing technology and storage capacity have improved significantly, and as a result large volumes of data are generated by organizations of all kinds. Data mining techniques are applied to extract hidden patterns from these huge data sets. Unexpected and interesting patterns are the main result of data mining, since expected patterns do not provide any new information. Hence data mining is also known as knowledge mining; a good analogy is mining gold from rocks, where the rocks are the tons of data and the gold is the important information extracted from them.
Let us take an example to clearly understand the problem of privacy during data mining. Suppose a hospital holds a data set of all its patients, containing sensitive information about each individual. Now consider a third party such as a drug manufacturer. This third party would like to obtain the hospital's data set to analyze the current trend of diseases, the most affected age groups, the drugs in demand, details of the referring doctors, and so on. But this data set needs to be edited properly before it is released, because the third party may not be trusted. If the third party is able to identify a record completely, this is a breach of the individual's privacy, and the third party may also use the sensitive information for malicious purposes.
One solution to this privacy breach would be to send no details to the third party at all. But there are advantages in releasing the data for knowledge mining: in the above example, the drug manufacturer, after mining the data set, may learn which diseases are prevalent and can focus further research on them. Hence the process of data mining cannot be denied, and its advantages are enormous.

But privacy also plays an important role, and it is an increasing matter of concern given the growing issues related to information privacy.
Another solution would be to alter the data set in such a way that sensitive information is not released, yet the data can still be mined for useful patterns. This is by far the best approach, because the privacy of the individual remains intact while the data can still be mined and analyzed for the various advantages it has to offer. It provides a win-win situation for both parties (in this case the hospital and the drug manufacturer), allowing them to carry on with their work and to expect more effective solutions to current problems.
According to [2], there are more than twenty state-of-the-art techniques dealing with privacy preserving data mining. Research on PPDM started in the year 2000 with the paper by R. Agrawal and R. Srikant [3]. Research has been fruitful since then, with various techniques such as data perturbation [4], association rule mining [5], histogram-based approaches, decision tree techniques [6], cryptographic techniques [7], and the k-anonymity technique [8], among others, having dealt with the issue of privacy.
According to [9], the standard PPDM approach assumes a single level of trust in data miners (the third parties). Under this assumption, a data owner generates only one perturbed copy of the data with a fixed amount of uncertainty. This assumption is limiting in applications where a data owner trusts different data miners at different levels. This is a believable scenario, as there may be more than one drug manufacturer who wants the hospital's data in the example mentioned above, and a particular data miner (third party) may be more trusted than the others. Hence a less perturbed copy should be sent to a more trusted data miner and a more perturbed copy to a less trusted one, so more than one perturbed copy must be released for data mining. A malicious data miner could then gain access to multiple perturbed copies through various means.
By utilizing the diversity across differently perturbed copies, the data miner may be able to produce a more accurate reconstruction of the original data than the data owner intends to allow. We refer to this attack as a diversity attack. It includes the colluding-attack scenario, where adversaries combine their copies to mount an attack, as well as the scenario where a single adversary utilizes public information to perform the attack on its own. Preventing diversity attacks is the key challenge in solving the problem, and this defines our research problem: trust issues concerning multiple parties in privacy preserving data mining.
The rest of the report is organized as follows. Section 2, the literature survey, discusses data mining concepts and privacy preserving data mining in detail. Section 3 discusses the existing system, Section 4 suggests improvements to the existing system and thus introduces our proposed system, and Section 5 concludes the report.

2. LITERATURE SURVEY
Here we look at the definition and concepts of data mining and all the preliminary information required to understand the report.
2.1 DATA MINING
Data mining can be seen as a process for extracting hidden and valid knowledge from huge databases [10]. Data mining extracts knowledge that was previously unknown [11]; generally, the more unexpected the knowledge, the more interesting it is. There is no benefit in mining a data set to extract knowledge that is obvious, and the extracted knowledge needs to be valid. Moreover, extracting knowledge from a data set with only a small number of records is not a viable option. A doubt may arise here: is data mining similar to statistical data analysis? Among all traditional forms of data analysis, statistical analysis is the most similar to data mining. Many data mining tasks, such as building predictive models and discovering associations, can also be done through statistical analysis. An advantage of data mining is its assumption-free approach, whereas statistical analysis still needs some predefined hypothesis. Additionally, statistical analysis is largely restricted to numerical attributes, while data mining can handle both numerical and categorical attributes. Moreover, data mining techniques are generally easy to use.
2.1.1 Data Mining Steps

Essential steps of data mining include data cleaning, data integration, data selection, data mining, pattern evaluation and knowledge presentation [12, 13]. Each of these steps is briefly discussed as follows.

Data Cleaning - It refers to the removal of natural noise and inconsistent data from the database. Words and numbers may be misspelt or entered erroneously in a database for various reasons, including typographical errors. Missing values are either replaced by the most likely value or the whole record is deleted. An attribute Soft Drink may have values such as pepsi, cola or pepsi cola, which may all refer to the same drink; they need to be made consistent before any data mining technique is applied.

Data Integration - It is also known as data transformation. It is the process of combining two or more data sources into a uniform data set. Different data sources may use different models, such as the relational model or the object-oriented relational model, and a two-dimensional data set is created from these various sources. Sometimes a particular attribute is named differently in different data sources; for example, one data source may name an attribute Income while another names it Salary. These anomalies are resolved in the data integration phase.

Data Selection - In order to perform a particular data mining task, all relevant attributes are selected from a warehouse data set. The new set comprising these selected attributes is used for data mining.

Data Mining - It is the essential process which extracts previously unknown patterns and trends from a huge data set without making any predefined hypothesis.

Pattern Evaluation and Knowledge Presentation - Some extracted patterns may be obvious and unappealing; however, other patterns may be counter-intuitive, interesting and useful.

Fig 1: Steps in a Knowledge Discovery in Databases (KDD) process

2.1.2 Data Mining Tasks

There are some tasks which make use of data mining techniques although they themselves are not data mining; such tasks are often mistakenly considered data mining, perhaps due to their close link to it. There are many data mining tasks, such as classification, association rule mining, clustering, outlier analysis, evolution analysis, characterization and discrimination. We briefly discuss some of them as follows.

Classification and Prediction - A data set may have an attribute called the class attribute, which refers to the category of each record. For example, a patient data set may have a class attribute called diagnosis along with several other non-class attributes that describe various properties and conditions of the patient. Records having class attribute values are known as labelled records. Classification is the process of building a classifier from a set of pre-classified records. Classifiers help to analyze data sets better and can be expressed in different ways, such as sets of rules or decision trees.

Fig 2: Decision tree example

Association Rule Mining - The primary objective of association rule mining is to obtain frequent item sets and association rules. If a set of items appears in a number of transactions greater than a user-defined threshold, then the set is known as a frequent item set. If the appearance of one set of items in a transaction makes the appearance of another set of items in the same transaction highly expected, then this relationship is known as an association rule. An association rule can be represented as X => Y, where X and Y are mutually exclusive subsets of items and X, Y ⊆ I, with I being the set of all items in the data set. For example, the rule computer => software [1%, 50%] says that if a transaction contains computer then there is a 50% chance that it also contains software, and that 1% of all transactions contain both of them.
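The support and confidence figures in a rule such as computer => software [1%, 50%] can be computed by simple counting. Below is a minimal sketch over a small made-up transaction list (the items and the resulting percentages are illustrative, not taken from the report):

```python
# Support and confidence for a candidate association rule X => Y.
# The transaction list below is invented for illustration.

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Chance a transaction contains Y, given that it contains X."""
    return support(x | y, transactions) / support(x, transactions)

transactions = [
    {"computer", "software"},
    {"computer"},
    {"bread", "milk"},
    {"computer", "software", "mouse"},
]

# Rule {computer} => {software}: 2 of 4 transactions contain both items
# (support 50%), and 2 of the 3 transactions containing computer also
# contain software (confidence ~66.7%).
print(support({"computer", "software"}, transactions))        # 0.5
print(confidence({"computer"}, {"software"}, transactions))
```

A rule is reported only when both values clear the user-defined thresholds.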

Clustering - It is the process of arranging similar records into groups so that records belonging to the same cluster have high similarity, while records belonging to different clusters have high dissimilarity. Partitioning, hierarchical, density-based, grid-based and model-based methods are a few of the clustering methods.
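As an illustration of the partitioning family of methods mentioned above, here is a minimal k-means-style sketch on toy one-dimensional data; the data points, starting centers and choice of two clusters are invented for the example:

```python
# A bare-bones k-means-style partitioning of 1-D points into k clusters.
import numpy as np

def kmeans_1d(points, centers, iters=10):
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    for _ in range(iters):
        # Assign each point to its nearest center ...
        labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
        # ... then move each center to the mean of its assigned points.
        centers = np.array([points[labels == k].mean()
                            for k in range(len(centers))])
    return labels, centers

labels, centers = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.9],
                            centers=[0.0, 10.0])
print(labels)   # the three points near 1 and the three near 9 form clusters
```

Records in the same cluster end up close to their shared center (high similarity), while the two centers themselves stay far apart (high dissimilarity).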

2.1.3 AN APPLICATION OF DATA MINING - MEDICAL DATA ANALYSIS

Generally, medical data sets contain a wide variety of bio-medical data which is distributed among parties. Various data mining steps, such as data cleaning, data preprocessing and semantic integration, can be used for the construction of a warehouse and for useful analysis of the medical databases. Data sets of patient records can also be analyzed through data mining for various other purposes, such as the prediction of diseases for new patients.

2.2 PRIVACY PRESERVING DATA MINING

Nowadays, data mining is a widely accepted technique across a huge range of organizations, which depend on it heavily in their everyday activities. Its benefits are well acknowledged and can hardly be overestimated. During the process of data mining, the data, which typically contains sensitive individual information such as medical and financial details, is often exposed to several parties, including collectors, owners, users and miners. Disclosure of such sensitive information can cause a breach of individual privacy.

An intruder or malicious data miner can learn sensitive attribute information, such as the disease type or income of a certain individual, through re-identification of a record in a data set. It is also not unlikely for an intruder to have sufficient supplementary knowledge, such as the ethnic background, religion, marital status or number of children of an individual.
Public concern is mainly caused by the so-called secondary use of personal information without the consent of the subject. In other words, customers feel strongly that their personal information should not be sold to other organizations without their prior consent. The IBM Multinational Consumer Privacy Survey, performed in 1999 in Germany, the USA and the UK, illustrates public concern over privacy [14]. 80% of the respondents felt that consumers have lost all control over how personal information is collected and used by companies, and 94% were concerned about the possible misuse of their personal information. The survey also shows that, regarding confidence that their personal information is properly handled, customers have the most trust in health care providers and banks and the least trust in credit card agencies and internet companies.

A Harris Poll survey illustrates the growing public awareness of and apprehension regarding privacy, from results obtained in 1999, 2000, 2001 and 2003 [15]. The public awareness is shown in Table 1.
                1999    2000    2001    2003
Concerned        78%     88%     92%     90%
Unconcerned      22%     12%      8%     10%

Table 1: Harris Poll survey results on privacy concern

Given the enormous benefits of data mining, yet the high public concern regarding individual privacy, the implementation of privacy preserving data mining techniques has become the need of the moment. A privacy preserving data mining technique protects individual privacy while still allowing the extraction of useful knowledge from data.
There are several different methods that can be used to enable privacy preserving data mining. One particular class of such techniques modifies the collected data before its release, in an attempt to protect individual records from being re-identified. An intruder, even with supplementary knowledge, cannot be certain about the correctness of a re-identification when the data set has been modified. This class of privacy preserving techniques relies on the fact that data sets used for data mining purposes do not necessarily contain 100% accurate data; in fact this is almost never the case, due to the existence of natural noise in data sets. In the context of data mining, it is important to maintain the patterns in the data set. Additionally, maintenance of statistical parameters, namely the means and covariances of attributes, is important in the context of statistical databases.

High data quality and strong privacy/security are two important requirements that a good privacy preserving technique needs to satisfy. There is no single agreed-upon definition of privacy, and therefore measuring privacy/security is a challenging task.


3. TRUST ISSUES IN PRIVACY PRESERVING DATA MINING


According to [9], Privacy Preserving Data Mining (PPDM) addresses the problem of developing accurate models about aggregated data without access to precise information in individual data records. But these techniques make a tacit assumption that the data owner has a single level of trust in data miners. In [9] this assumption is relaxed and the scope of perturbation-based PPDM is extended to Multi-Level Trust PPDM (MLT-PPDM); this paper is our existing system. In this setting, the more trusted a data miner is, the less perturbed the data it can access. A malicious data miner may therefore gain access to differently perturbed copies of the same data through various means, and may combine these diverse copies to jointly infer additional information about the original data that the data owner does not intend to release.

This type of attack is known as a diversity attack. Preventing such attacks is the key challenge in providing MLT-PPDM services. The challenge is addressed by properly correlating the perturbation across copies at different trust levels, which makes the solution robust against diversity attacks with respect to the privacy goal. The data owner must also be able to generate perturbed copies for arbitrary trust levels on demand; this feature offers maximum flexibility.

Data perturbation, as already discussed in previous sections, is a widely used PPDM approach. It introduces uncertainty about individual values before the data is published or released to third parties for data mining purposes [15]. But the single-trust-level assumption is limiting. Consider a two-trust-level scenario as a motivating example.

The government or a business might do internal data mining but also want to release the data to the public, perturbing the public copy more. The mining department, which receives the less perturbed internal copy, also has access to the more perturbed public copy. It is desirable that this department gains no more power in reconstructing the original data by utilizing both copies than it has with the internal copy alone.

Conversely, if the internal copy is leaked to the public, then the public obviously has all the power of the mining department. However, it is desirable that the public cannot reconstruct the original data more accurately by using both copies than by using only the leaked internal copy.

This new dimension of MLT-PPDM poses new challenges: in contrast to the original scenario, multiple perturbed copies of the same data are available to the data miners, and a malicious data miner will try to reconstruct the original data through a diversity attack. This challenge is addressed by MLT-PPDM services. The focus is on the additive perturbation approach, where random Gaussian noise is added to original data of arbitrary distribution, and a systematic solution is provided. Through a one-to-one mapping, the data owner generates distinctly perturbed copies of its data according to the different trust levels.

A question may arise here: why use random perturbation among all the techniques? The first category of PPDM techniques is Secure Multiparty Computation (SMC), which makes use of cryptographic techniques. However, these techniques are rarely deployed, as they are extraordinarily expensive and impractical for real use. Various other solutions have been proposed: solutions to build decision trees over horizontally partitioned data were proposed in [16]; for vertically partitioned data, algorithms have been proposed to address association rule mining [17], k-means clustering [18] and frequent pattern mining problems [19]. The work of [20] uses a secure coprocessor for privacy preserving collaborative data mining and analysis.
3.1 PRELIMINARIES
3.1.1 Jointly Gaussian

Let G_1 through G_L be L Gaussian random variables. They are said to be jointly Gaussian if and only if each of them is a linear combination of multiple independent Gaussian random variables. Equivalently, G_1 through G_L are jointly Gaussian if and only if any linear combination of them is also a Gaussian random variable.

A vector formed by jointly Gaussian random variables is called a jointly Gaussian vector. For a jointly Gaussian vector G = (G_1, ..., G_L)^T, the probability density function is given, for any real vector g, by

    f_G(g) = (2π)^{-L/2} |K_G|^{-1/2} exp( -(1/2) (g - μ_G)^T K_G^{-1} (g - μ_G) )

where μ_G and K_G are the mean vector and covariance matrix of G, respectively.

If multiple random variables are jointly Gaussian, then conditional on a subset of them, the remaining variables are still jointly Gaussian. Specifically, partition a jointly Gaussian vector as G = (G_1^T, G_2^T)^T, with mean vectors μ_1 and μ_2 and covariance blocks K_11, K_12, K_21 and K_22. Then the distribution of G_1 conditioned on G_2 = g_2 is also jointly Gaussian, with mean μ_1 + K_12 K_22^{-1} (g_2 - μ_2) and covariance matrix K_11 - K_12 K_22^{-1} K_21.
3.1.2 Additive Perturbation

A widely used and accepted way to perturb data is additive perturbation [15]. This approach adds to the original data X some random noise Z to obtain the perturbed copy Y as follows:

    Y = X + Z    (1)

We assume that X, Y and Z are all N-dimensional vectors, where N is the number of attributes in X. Let X_j, Y_j and Z_j be the j-th entries of X, Y and Z, respectively. The covariance matrix of X is the N × N matrix

    K_X = E[(X - μ_X)(X - μ_X)^T]    (2)

which is a diagonal matrix if the attributes are uncorrelated.

The noise Z is assumed to be independent of X and is a jointly Gaussian vector with zero mean and covariance matrix K_Z chosen by the data owner; we write this as Z ~ N(0, K_Z). The covariance matrix K_Z is the N × N matrix

    K_Z = E[Z Z^T]    (3)

It is then straightforward to verify that μ_Y = μ_X and K_Y = K_X + K_Z.

Huang et al. [21] point out that there must be some correlation in the added noise, otherwise it may be filtered out. Hence K_Z = σ² K_X for some constant σ² denoting the perturbation magnitude.
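As a sanity check on equations (1)-(3) and the choice K_Z = σ² K_X, the following sketch (numpy assumed; the data set, mean vector and σ² are invented for illustration) perturbs a synthetic two-attribute data set and confirms that the perturbed copy keeps the mean and has covariance (1 + σ²) K_X:

```python
# Additive perturbation Y = X + Z with correlated noise K_Z = sigma^2 * K_X.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic original data: n records with N = 2 attributes and covariance K_X.
K_X = np.array([[1.0, 0.5],
                [0.5, 2.0]])
mu_X = np.array([10.0, 20.0])
n = 5000
X = rng.multivariate_normal(mu_X, K_X, size=n)

# Noise correlated with the data, as recommended by [21]: Z ~ N(0, sigma^2 K_X).
sigma2 = 0.25
Z = rng.multivariate_normal(np.zeros(2), sigma2 * K_X, size=n)

Y = X + Z   # the perturbed copy released to the data miner

# The perturbation preserves the mean and inflates the covariance to
# (1 + sigma^2) * K_X, so statistical structure survives for mining.
print(Y.mean(axis=0))            # close to mu_X
print(np.cov(Y, rowvar=False))   # close to 1.25 * K_X
```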

3.1.3 Linear Least Squares Error Estimation

Given a perturbed copy of the data, a malicious data miner may attempt to reconstruct the original data as accurately as possible. Among the family of linear reconstruction methods, where estimates can only be linear functions of the perturbed copy, Linear Least Squares Error (LLSE) estimation has the minimum square error between the estimated values and the original values [22]. The LLSE estimate of X given Y is

    X̂(Y) = μ_X + K_X (K_X + K_Z)^{-1} (Y - μ_X)    (4)
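To make the LLSE estimator concrete, the following sketch (numpy assumed; data and parameters invented) applies the gain K_X (K_X + K_Z)^{-1} record by record and checks that the resulting estimate is closer to X, on average, than the raw perturbed copy:

```python
# LLSE reconstruction of X from a single perturbed copy Y.
import numpy as np

rng = np.random.default_rng(1)

K_X = np.array([[1.0, 0.5],
                [0.5, 2.0]])
mu_X = np.array([10.0, 20.0])
sigma2 = 0.25
K_Z = sigma2 * K_X

n = 5000
X = rng.multivariate_normal(mu_X, K_X, size=n)
Y = X + rng.multivariate_normal(np.zeros(2), K_Z, size=n)

# Gain matrix of the LLSE estimator, applied to each record (row) of Y.
G = K_X @ np.linalg.inv(K_X + K_Z)
X_hat = mu_X + (Y - mu_X) @ G.T

err_llse = np.mean((X_hat - X) ** 2)   # adversary's reconstruction error
err_raw = np.mean((Y - X) ** 2)        # error of using Y as-is

print(err_llse, err_raw)   # the LLSE estimate is the more accurate of the two
```

This is exactly why the data owner must reason about what an adversary can reconstruct, rather than only about the noise that was added.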


3.1.4 Kronecker Product

The Kronecker product is a binary matrix operator that maps two matrices of arbitrary dimensions into a larger matrix with a special block structure. Given an n × m matrix A = (a_ij) and a p × q matrix B, their Kronecker product, denoted A ⊗ B, is the np × mq matrix whose (i, j)-th block is the p × q matrix a_ij B:

    A ⊗ B = ( a_11 B ... a_1m B ; ... ; a_n1 B ... a_nm B )
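numpy exposes this operator directly as `np.kron`; a small sketch of the block structure:

```python
# Kronecker product of a 2x2 matrix A with a 2x2 matrix B: the result is a
# 4x4 matrix whose (i, j)-th 2x2 block equals A[i, j] * B.
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

K = np.kron(A, B)
print(K)
```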

3.2 PROBLEM FORMULATION

It is true that the data owner may consider releasing only the mean and covariance of the original data. We remark that simply releasing the mean and covariance does not provide the same utility as the perturbed data: for many real applications, knowing only the mean and covariance is not sufficient to apply data mining techniques such as clustering, principal component analysis and classification. By releasing the data set through random perturbation, the data owner allows the data miner to exploit more statistical information without releasing the exact values of sensitive attributes.

In the multilevel trust setting, the data owner releases M perturbed copies Y_1, ..., Y_M, where Y_i = X + Z_i. Stacking the copies into Y = (Y_1^T, ..., Y_M^T)^T and the noises into Z = (Z_1^T, ..., Z_M^T)^T, let H be the (N·M) × N matrix

    H = (I_N, I_N, ..., I_N)^T

where I_N represents the N × N identity matrix. Then

    Y = HX + Z    (5)
3.3 THREAT MODEL

We assume malicious data miners who always attempt to reconstruct as accurate an estimate of the original data as possible from the perturbed copies they observe. For a data miner who observes the stacked copies Y = HX + Z, with μ_Y = H μ_X and K_Y = H K_X H^T + K_Z, the LLSE estimate is

    X̂(Y) = μ_X + K_XY K_Y^{-1} (Y - μ_Y)    (6)

where K_XY = E[(X - μ_X)(Y - μ_Y)^T], and its error covariance matrix is

    E[(X̂(Y) - X)(X̂(Y) - X)^T] = K_X - K_XY K_Y^{-1} K_XY^T    (7)

For an adversary who observes only a single copy Y_i (1 ≤ i ≤ M) and computes the LLSE estimate X̂(Y_i), the error covariance matrix is

    E[(X̂(Y_i) - X)(X̂(Y_i) - X)^T] = K_X - K_X (K_X + K_{Z_i})^{-1} K_X    (8)

3.4 DISTORTION

We define the distortion D between two data sets as the average expected square difference between them. For example, the distortion between the original data X and the perturbed copy Y is given by

    D(X, Y) = (1/N) E[(Y - X)^T (Y - X)]

Based on the above definition, we refer to Y_i as a perturbed copy of X at distortion level D_i if and only if D(X, Y_i) = D_i.
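For the correlated noise K_Z = σ² K_X of Section 3.1.2, the distortion works out analytically to σ² Tr(K_X) / N; the sketch below (numpy assumed, parameters invented) checks this empirically:

```python
# Empirical check of D(X, Y) = (1/N) E[(Y - X)^T (Y - X)] = sigma^2 Tr(K_X) / N
# when the noise covariance is K_Z = sigma^2 * K_X.
import numpy as np

rng = np.random.default_rng(2)

K_X = np.array([[1.0, 0.5],
                [0.5, 2.0]])   # N = 2 attributes, Tr(K_X) = 3
sigma2 = 0.25
n = 20000

# Y - X is exactly the noise Z, so the distortion involves only Z.
Z = rng.multivariate_normal(np.zeros(2), sigma2 * K_X, size=n)
D_empirical = np.mean(np.sum(Z ** 2, axis=1)) / 2
D_analytic = sigma2 * np.trace(K_X) / 2    # 0.25 * 3 / 2 = 0.375

print(D_empirical, D_analytic)
```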

3.5 PRIVACY GOAL AND DESIGN SPACE

Using equation (8), we express the privacy of a single copy Y_i, i.e. D(X, X̂(Y_i)), as follows:

    D(X, X̂(Y_i)) = (1/N) Tr( K_X - K_X (K_X + K_{Z_i})^{-1} K_X )    (9)

where Tr(·) is the trace of the matrix. For ease of analysis, we initially assume that the data owner wants to release M copies.

We say the privacy goal is achieved with respect to the M perturbed copies if, for any subset S of them,

    D(X, X̂(S)) = min_{Y_i ∈ S} D(X, X̂(Y_i))    (10)

where S is the set of perturbed copies an adversary uses to reconstruct the original data. Intuitively, achieving the privacy goal requires that, given the copy with the least privacy among any subset of the M perturbed copies, the remaining copies in that subset contain no extra information about X.

Consider the case of two perturbed copies Y_1 and Y_2 with noises Z_1 and Z_2. The privacy goal in (10) requires that

    D(X, X̂(Y_1, Y_2)) = D(X, X̂(Y_1))    (11)

where Y_1 is the less perturbed copy.

3.6 PROPOSED SOLUTION

One way to satisfy (11) is to generate Z_2 so that Y_1 and Z_2 - Z_1 are independent. We rewrite Y_2 as

    Y_2 = X + Z_2 = Y_1 + (Z_2 - Z_1)    (12)

If Y_1 and Z_2 - Z_1 are independent, then Y_2 is nothing but a perturbed version of Y_1: all information in Y_2 useful for estimating X is inherited from Y_1, Y_2 carries no extra innovative information to improve the estimation accuracy, and (11) is satisfied. Independence requires E[(Z_2 - Z_1) Z_1^T] = 0, i.e. E[Z_2 Z_1^T] = E[Z_1 Z_1^T] = σ_1² K_X. The covariance matrix of the stacked noise (Z_1^T, Z_2^T)^T is then given by

    K_Z = ( σ_1² K_X   σ_1² K_X ;  σ_1² K_X   σ_2² K_X )    (13)

3.6.1 Corner-Wave Property

The privacy goal in (10) is achieved if the noise covariance matrix satisfies the corner-wave property. Specifically, we say that an M × M square matrix has the corner-wave property if, for each i from 1 to M, all the entries to the right of and below the (i, i)-th entry are the same as that entry. We assume the trust levels are ordered so that

    σ_i² < σ_{i+1}²  for all i = 1, ..., M-1    (14)

Generalizing (13), the noise covariance matrix is given by the Kronecker product

    K_Z = Σ ⊗ K_X,  where Σ_ij = σ²_{min(i,j)}    (15)

so that Σ is an M × M corner-wave matrix of perturbation magnitudes, and the covariance matrix of the stacked copies Y = HX + Z is

    K_Y = H K_X H^T + K_Z    (16)
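The corner-wave covariance of (15) is easy to build: form the M × M matrix Σ with Σ_ij = σ²_{min(i,j)} and take its Kronecker product with K_X. A sketch (numpy assumed; the trust-level values are invented) that also verifies the corner-wave pattern and that the result is a valid covariance matrix:

```python
# Building the corner-wave noise covariance K_Z = Sigma ⊗ K_X for M = 3
# trust levels.
import numpy as np

K_X = np.array([[1.0, 0.5],
                [0.5, 2.0]])
sigma2 = [0.1, 0.5, 1.0]   # strictly increasing, per (14)
M = len(sigma2)

# Sigma[i, j] = sigma^2 of the more trusted (smaller-noise) of levels i, j.
Sigma = np.array([[sigma2[min(i, j)] for j in range(M)] for i in range(M)])
K_Z = np.kron(Sigma, K_X)   # (M*N) x (M*N) noise covariance

print(Sigma)

# Corner-wave check: every entry to the right of and below (i, i) equals
# the (i, i)-th entry itself.
for i in range(M):
    assert (Sigma[i, i:] == Sigma[i, i]).all()
    assert (Sigma[i:, i] == Sigma[i, i]).all()
```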

3.6.2 Batch Generation

Two algorithms are defined for the batch setting, in which all trust levels are known in advance. Algorithm 1 generates the noises Z_1 to Z_M in parallel, while Algorithm 2 generates them sequentially.

Algorithm 1: Parallel Generation
1. Input: X, K_X and σ_1² to σ_M²
2. Output: Y_1 to Y_M
3. Construct K_Z according to (15)
4. Generate Z ~ N(0, K_Z), with K_Y according to (16)
5. Generate Y = HX + Z
6. Output Y

Algorithm 2: Sequential Generation
1. Input: X, K_X and σ_1² to σ_M²
2. Output: Y_1 to Y_M
3. Construct Z_1 ~ N(0, σ_1² K_X)
4. Generate Y_1 = X + Z_1
5. Output Y_1
6. For i = 2 to M, construct noise Z_i = Z_{i-1} + W_i, where W_i ~ N(0, (σ_i² - σ_{i-1}²) K_X) is independent of all previous noise
7. Generate Y_i = X + Z_i
8. Output Y_i

The main disadvantage of the batch generation approach is that it requires the data owner to foresee all possible trust levels a priori.
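The sequential step can be sketched for the simplest case of one attribute (K_X = 1) and two trust levels; the variances and sample size below are invented. The incremental construction Z_2 = Z_1 + fresh noise reproduces exactly the corner-wave cross-covariance E[Z_1 Z_2] = σ_1² K_X of (15):

```python
# Sequential generation for scalar records and M = 2 trust levels.
import numpy as np

rng = np.random.default_rng(3)

n = 100000                 # number of records; N = 1 attribute with K_X = 1
K_X = 1.0
sigma2 = [0.1, 1.0]        # level 1 (more trusted) gets the smaller noise

X = rng.normal(5.0, np.sqrt(K_X), size=n)
Z1 = rng.normal(0.0, np.sqrt(sigma2[0] * K_X), size=n)
# Fresh, independent increment with variance (sigma2_2 - sigma2_1) * K_X.
Z2 = Z1 + rng.normal(0.0, np.sqrt((sigma2[1] - sigma2[0]) * K_X), size=n)

Y1, Y2 = X + Z1, X + Z2    # copies for the more and the less trusted miner

# Distortions ~0.1 and ~1.0, and E[Z1 Z2] ~ 0.1 = sigma2_1 * K_X.
print(np.mean((Y1 - X) ** 2), np.mean((Y2 - X) ** 2), np.mean(Z1 * Z2))
```

Because Z_2 - Z_1 is drawn independently of Y_1, the less trusted copy Y_2 carries no information about X beyond what Y_1 already reveals.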
3.6.3 On-Demand Generation

To give the data owner maximum flexibility, perturbed copies can also be generated on demand. We assume that there are L existing copies and that the data owner generates M - L new copies, so that there are M copies in total. Since all noises are jointly Gaussian, the new noises conditioned on the existing noises Z_1 to Z_L are still jointly Gaussian; following Section 3.1.1, their conditional Gaussian mean is

    μ_new = K_{new,old} K_{old}^{-1} z_old    (17)

and their conditional covariance is

    K_new = K_{new,new} - K_{new,old} K_{old}^{-1} K_{old,new}    (18)

where K_{old}, K_{new,new} and K_{new,old} are the corresponding blocks of K_Z in (15) and z_old is the realization of the existing noise.

Algorithm 3: On-Demand Generation
1. Input: X, K_X, the existing noises Z_1 to Z_L, and the values of σ_{L+1}² to σ_M²
2. Output: new copies Y_{L+1} to Y_M
3. Construct K_Z according to (15)
4. Extract the blocks of K_Z needed in (17) and (18)
5. Generate the new noises Z_{L+1} to Z_M from the conditional distribution given by (17) and (18)
6. for i from L+1 to M do
7.   Generate Y_i = X + Z_i
8.   Output Y_i
9. end for

Algorithm 3 offers more flexibility, since the data owner does not need to fix all trust levels in advance.
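The conditional mean (17) and covariance (18) reduce to simple scalars in the one-attribute, one-existing-copy case. The sketch below (numpy assumed; the variance values are invented) draws a new, more perturbed noise on demand and checks that it has the target variance and the corner-wave cross-covariance with the existing noise:

```python
# On-demand generation for scalar records: one existing noise z_old with
# variance s_old, one new noise with larger target variance s_new.
import numpy as np

rng = np.random.default_rng(4)

n = 100000
s_old, s_new = 0.1, 1.0                      # per-record noise variances
z_old = rng.normal(0.0, np.sqrt(s_old), n)   # the already-released noise

# Corner-wave target: Cov(z_new, z_old) = s_old. For scalars, (17) and (18)
# become:
cond_mean = (s_old / s_old) * z_old          # K_no K_oo^{-1} z_old
cond_var = s_new - s_old ** 2 / s_old        # K_nn - K_no K_oo^{-1} K_on

z_new = cond_mean + rng.normal(0.0, np.sqrt(cond_var), n)

print(np.var(z_new), np.mean(z_new * z_old))   # ~1.0 and ~0.1
```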


4. PROPOSED SYSTEM
The following system configuration is required for the execution of the proposed system.

H/W System Configuration:

Processor        - Pentium III
Speed            - 1.1 GHz
RAM              - 256 MB (min)
Hard Disk        - 20 GB
Floppy Drive     - 1.44 MB
Keyboard         - Standard Windows keyboard
Mouse            - Two- or three-button mouse
Monitor          - SVGA

S/W System Configuration:

Operating System       : Windows 95/98/2000/XP
Application Server     : Tomcat 5.0/6.x
Front End              : HTML, Java, JSP
Server-side Script     : Java Server Pages
Client-side Script     : JavaScript
Database Connectivity  : MySQL

System Architecture:

[Figure: the data owners supply the original data to the MLT-PPDM algorithms, which apply random rotation perturbation; the perturbed data is then released to the data miners.]

Implementation Modules:

1) Data owners (users)
2) Multilevel trust in PPDM (manager)
3) Admin

Data owners
The bank customers are the data owners. They can register themselves with their account number and create a username and password. Users can view the original data they provided when they opened their accounts.

Multilevel trust in PPDM
Develop the algorithms and code to execute the existing system and further enhance it.

Admin
The admin can also log in and view the original data stored in the database.

We would first like to implement the existing system and then further enhance it. The existing system considers only linear attacks; we would like to continue the research by looking into non-linear attacks that derive the original data and recover more information.

5. CONCLUSION

The scope of PPDM is extended to multilevel-trust PPDM by relaxing the implicit assumption of single-level trust. MLT-PPDM allows data owners to generate differently perturbed copies of their data at will. The key challenge lies in preventing data miners from combining copies at different trust levels to jointly reconstruct the original data more accurately than is possible from any single copy. This challenge is addressed by properly correlating the noise across trust levels. The on-demand generation algorithm additionally provides maximum flexibility. We would like to extend the existing system from linear attacks to non-linear attacks.

REFERENCES
[1] M.Z. Islam, "Privacy Preserving Data Mining through Noise Addition," 2008.
[2] S. Taneja, S. Khanna, S. Tilwalia, and Ankita, "A Review on Privacy Preserving Data Mining: Techniques and Research Challenges," IJCSIT, vol. 5, no. 2, pp. 2310-2315, 2014.
[3] R. Agrawal and R. Srikant, "Privacy-Preserving Data Mining," Proc. ACM SIGMOD '00, pp. 439-450, 2000.
[4] H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar, "On the Privacy Preserving Properties of Random Data Perturbation Techniques," Proc. Third IEEE Int'l Conf. Data Mining (ICDM), 2003.
[5] D. Karthikeswarant, V.M. Sudha, V.M. Suresh, and A.J. Sultan, "A Pattern Based Framework for Privacy Preservation through Association Rule Mining," Proc. Int'l Conf. Advances in Engineering, Science and Management (ICAESM 2012), IEEE, 2012.
[6] H.C. Huang and W.C. Fang, "Integrity Preservation and Privacy Protection for Medical Images with Histogram-Based Reversible Data Hiding," IEEE, 2011.
[7] Y. Lindell and B. Pinkas, "Privacy Preserving Data Mining," J. Cryptology, vol. 15, no. 3, 2002.
[8] L. Sweeney, "k-Anonymity: A Model for Protecting Privacy," Int'l J. Uncertainty, Fuzziness and Knowledge-Based Systems, 2002.
[9] Y. Li, M. Chen, Q. Li, and W. Zhang, "Enabling Multilevel Trust in Privacy Preserving Data Mining," IEEE Trans. Knowledge and Data Engineering, vol. 24, no. 9, pp. 1598-1613, Sept. 2012.
[10] P. Cabena, P. Hadjinian, R. Stadler, J. Verhees, and A. Zanasi, Discovering Data Mining: From Concept to Implementation. Prentice Hall PTR, USA, 1998.
[11] A. Cavoukian, "Data Mining: Staking a Claim on Your Privacy," Information and Privacy Commissioner, Ontario, 1998. Available from http://www.ipc.on.ca/docs/datamine.pdf, accessed 21 May 2008.
[12] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, San Diego, CA, USA, 2001.
[13] R. Groth, Data Mining: A Hands-On Approach for Business Professionals. Prentice Hall PTR, USA, 1998.
[14] IBM Multi-National Privacy Survey Consumer Report. Available from http://www1.ibm.com/services/les/privacy survey oct991.pdf, visited on 01.07.03.
[15] D. Agrawal and C.C. Aggarwal, "On the Design and Quantification of Privacy Preserving Data Mining Algorithms," Proc. 20th ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS '01), pp. 247-255, May 2001.
[16] Y. Lindell and B. Pinkas, "Privacy Preserving Data Mining," Proc. Int'l Cryptology Conf. (CRYPTO), 2000.
[17] J. Vaidya and C.W. Clifton, "Privacy Preserving Association Rule Mining in Vertically Partitioned Data," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2002.
[18] J. Vaidya and C. Clifton, "Privacy-Preserving K-Means Clustering over Vertically Partitioned Data," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2003.
[19] A.W.-C. Fu, R.C.-W. Wong, and K. Wang, "Privacy-Preserving Frequent Pattern Mining across Private Databases," Proc. Fifth IEEE Int'l Conf. Data Mining (ICDM), 2005.
[20] B. Bhattacharjee, N. Abe, K. Goldman, B. Zadrozny, V.R. Chillakuru, M. del Carpio, and C. Apte, "Using Secure Coprocessors for Privacy Preserving Collaborative Data Mining and Analysis," Proc. Second Int'l Workshop Data Management on New Hardware (DaMoN '06), 2006.
[21] Z. Huang, W. Du, and B. Chen, "Deriving Private Information from Randomized Data," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2005.
[22] K. Shanmugan and A. Breipohl, Random Signals: Detection, Estimation, and Data Analysis. John Wiley & Sons, 1988.
