
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 10, OCTOBER 2010, ISSN 2151-9617

HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 41

Security Information Hiding in Data Mining on the Basis of Privacy Preserving Technique
Dr.R.Dhanapal, Gayathri Subramanian, M.R.Raja Gopal, K.Hemamalini

Abstract—Data mining has attracted a great deal of attention in the information industry and in society as a whole in recent years, due to the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge. The information and knowledge gained can be used for applications ranging from market analysis, fraud detection, and customer retention to production control and science exploration. With more and more information accessible in electronic form and available on the web, and with increasingly powerful data mining tools being developed and put into use, data mining may pose a threat to our privacy and data security. The real privacy concerns are with unconstrained access to individual records, such as credit card, banking, and customer ID records, by applications that must access privacy-sensitive information. In this paper we investigate the issue of privacy in data mining when data is shared before mining, and the means to shield it, using Unified Modeling Language diagrams. We describe the definition of privacy preservation, the problem statement, privacy preserving data mining techniques, and the architecture of the proposed work. We propose an integrated framework for Privacy Preserving Data Mining that ensures that the mining process will not trespass on privacy up to a certain degree of security.

Index Terms—Association Rules, Clustering, Confidence, Data Snooping, Data Sanitization, Privacy, Privacy Preserving Data
Mining, Sensitive Data, Unified Modeling Language.


1 INTRODUCTION
Data mining technology provides a number of advantages: using automated tools to analyze corporate, research and development, biological, and financial data, as well as data from the retail industry, the telecommunication industry, and other scientific applications, can help to find ways to increase the efficiency of an organization or industry, or to support medical applications. Privacy preserving data mining [1,2] is a novel research direction in data mining and statistical databases [3], where data mining algorithms are analyzed for the side-effects they incur in data privacy. Knowledge can equally well compromise data privacy, as we will indicate: mining may disclose knowledge about individuals or groups that could be against privacy policies, especially if there is potential dissemination of discovered information. Another issue that arises from this concern is the appropriate use of data mining. Due to the value of data, databases of all sorts of content are regularly sold because of the competitive advantage that can be attained from them. The main objective in privacy preserving data mining is to develop algorithms for modifying the original data in some way, so that the private data and private knowledge remain private even after the mining process. The problem that arises when confidential information can be derived from released data by unauthorized users is also commonly called the “database inference” problem.
Using the UML methodology, the privacy model can be portrayed with several diagrams, such as logical diagrams, use case diagrams, scenario and activity diagrams, and collaboration and distribution diagrams. Through the analysis of different privacy preserving research project work scenarios, we were able to define the use case type, applying the appropriate UML diagrams. Conventional research project record maintenance poses ample obstacles to intruders, because those seeking to inspect records must have authorization and can view records only in person. Moreover, because paper records were decentralized (a single project's records may be scattered across a number of places), illegitimate access in the event of a security breach would be limited.
The remainder of this paper is organized as follows: Section 2 offers an overview of privacy preserving data mining. In this section we also analyze the different problems in data mining and the existing solutions. Section 3 discusses the problem statement and PPDM techniques for the research project services. Section 4 presents the block diagram and PPDM techniques for the research lab services. Section 5 discusses the implementation details using UML diagrams. Section 6 concludes this paper with a brief summary.

————————————————
• Prof. Dr. R. Dhanapal is with the Department of Computer Applications, Easwari Engineering College, affiliated to Anna University of Technology, Chennai – 600 089, Tamil Nadu, India.
• Gayathri Subramanian is with the Department of Computer Science, R B Gothi Jain College for Women, affiliated to University of Madras, Chennai – 600 052, Tamil Nadu, India, and is a research scholar pursuing a Ph.D in Computer Science at Dravidian University.
• M.R. Raja Gopal is with the Department of Computer Science, Swami Dayananda College of Arts and Science, Manjakkudi, and is a research scholar pursuing a Ph.D in Computer Science at Dravidian University.
• K. Hemamalini is a second-year MCA student at Easwari Engineering College, affiliated to Anna University of Technology, Chennai.

2 LITERATURE SURVEY
Security is an important issue with any data collection that is shared and/or is intended to be used for strategic decision making. In addition, when data is collected for customer profiling, user behavior understanding, correlating personal data with other information, etc., large amounts of sensitive and private information about individuals or companies are gathered and stored. This becomes controversial given the confidential nature of some of this data and the potential for illegal access to the information. Moreover, data mining could disclose new implicit knowledge; some important information could be withheld, while other information could be widely distributed and used without control. Multifaceted issues, such as those involved in Privacy Preserving Data Mining (PPDM), cannot simply be addressed by restricting data collection or even by restricting the secondary use of information technology [4, 5, 6]. A fairly accurate solution could be adequate, depending on the application, since the appropriate level of privacy can be interpreted in different contexts [7, 8]. In some applications (e.g., association rules, classification, or clustering), an appropriate balance between the need for privacy and knowledge discovery should be found. Preserving privacy when data are shared for mining is a challenging problem. The usual methods in database security, such as access control and authentication, which have been adopted to manage access to data successfully, present some limitations in the context of data mining. While access control and authentication protections can safeguard against direct disclosures, they do not address disclosures based on inferences that can be drawn from released data [9, 10, 11]. Preventing this sort of inference is beyond the reach of the existing methods [16, 19]. In this paper we address the issue of privacy preserving Data Snooping for a scenario in which the parties owning confidential databases wish to run a Data Snooping algorithm on the union of their databases, without revealing any sensitive information.

3 PRIVACY PRESERVING DATA MINING STATEMENT
Privacy preserving data mining analysis is an amalgamation of the data of heterogeneous users without disclosing the private and sensitive details of the users.

3.1. Problem Statement
The problem is to specify a comprehensible but formal approach to early privacy preserving analysis in the context of component-based software development, in order to evaluate and compare each of the techniques on a common platform and to design, develop, and implement functionality such as a user-friendly framework and portability.

3.2. Classification of Privacy Preserving Techniques
There are many approaches which have been adopted for privacy preserving data mining. We can classify them based on the following dimensions:
• Data distribution
• Data modification
• Data mining algorithm
• Data or rule hiding
• Privacy preservation
The first dimension refers to the distribution of data. Some of the approaches have been developed for centralized data. Distributed data scenarios can also be classified as horizontal data distribution, where different sites hold different records, and vertical data distribution, where different sites hold different attributes of the same records. The second dimension refers to the data modification scheme. In general, data modification is used in order to modify the original values of a database that needs to be released to the public and in this way to ensure high privacy protection. Methods of modification include:
• Perturbation, which is accomplished by the alteration of an attribute value by a new value (e.g., changing a 1-value to a 0-value, or adding noise),
• Blocking, which is the replacement of an existing attribute value with a “?”,
• Aggregation or merging, which is the combination of several values into a coarser category,
• Swapping, which refers to interchanging values of individual records, and
• Sampling, which refers to releasing data for only a sample of a population.
The third dimension refers to the data mining algorithm for which the data modification is taking place. This is actually something that is not known beforehand, but it facilitates the analysis and design of the data hiding algorithm. The fourth dimension refers to whether raw data or aggregated data should be hidden. The complexity of hiding aggregated data in the form of rules is of course higher, and for this reason, mostly heuristics have been developed. The last dimension, which is the most important, refers to the privacy preservation technique used for the selective modification of the data. Selective modification is required in order to achieve higher utility for the modified data given that the privacy is not jeopardized. The techniques that have been applied for this reason are:
• Heuristic-based techniques like adaptive modification, which modifies only selected values that minimize the utility loss rather than all available values.
• Cryptography-based techniques like secure multiparty computation, where a computation is secure if at the end of the computation no party knows anything except its own input and the results.
• Reconstruction-based techniques, where the original distribution of the data is reconstructed from the randomized data.
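The five data modification methods named in Section 3.2 (perturbation, blocking, aggregation, swapping, and sampling) can be illustrated with a minimal Python sketch. This is not the implementation used in this work; the function names, the flip probability, the bucket width, and the sample records are all assumptions chosen only for illustration.

```python
import random


def perturb(bits, flip_prob, rng):
    """Perturbation: flip each 0/1 value with probability flip_prob."""
    return [1 - b if rng.random() < flip_prob else b for b in bits]


def block(values, sensitive):
    """Blocking: replace every sensitive value with "?"."""
    return ["?" if v in sensitive else v for v in values]


def aggregate(numbers, width=10):
    """Aggregation: map exact numbers to coarser ranges such as "30-39"."""
    result = []
    for n in numbers:
        low = (n // width) * width
        result.append(f"{low}-{low + width - 1}")
    return result


def swap(values, rng):
    """Swapping: interchange values among records (random permutation)."""
    shuffled = list(values)  # copy so the original column is untouched
    rng.shuffle(shuffled)
    return shuffled


def sample(records, k, rng):
    """Sampling: release only k records drawn from the population."""
    return rng.sample(records, k)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed (assumption) for repeatable runs
    print(perturb([0, 1, 1, 0], 0.25, rng))
    print(block(["lab-a", "lab-b", "lab-c"], {"lab-b"}))
    print(aggregate([34, 7, 40]))
    print(swap([10, 20, 30, 40], rng))
    print(sample(["p1", "p2", "p3", "p4"], 2, rng))
```

Each function returns a new list and leaves its input unchanged, so the original and the modified data can be compared when estimating the utility lost to a given modification.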

4 SHARING RESEARCH PROJECT WORK DATA
Investigating and analyzing the prevalence and frequency of various kinds of research project work is important for understanding research projects and managing them. Such analyses have considerable impact on policy decisions. An obvious precondition to carrying out such studies is to have the necessary data available.
First, similar project data has to be collected from several research lab providers, here projects 1 to n. It has to be subjected to data sanitization and then integrated. Only the data that is required for pattern evaluation and knowledge mining is selected by filtering. These heterogeneous data are converted to the desired format. This course of action is tremendously time consuming and labor intensive. Privacy concerns are a major hindrance to streamlining these efforts. Infringing privacy can cause significant harm to individuals, both materially and psychologically. Privacy is addressed nowadays by preventing dissemination rather than by integrating privacy constraints into the data sharing process. Privacy preserving integration and sharing of research data has become vital to enabling scientific innovation.

[Figure: block diagram with the components Project 1, Project 2, ..., Project n; Data Warehouse; cleaning and data integration; database and data warehouse (dwh) server; select and transformation; filtering; unpreserved data; privacy preserving data (PPDM technique); Data Snooping engine; patterns; pattern evaluator; knowledge base; GUI; knowledge.]
Fig. 1. System Architecture

4.1. Unpreserved Data
Data of several project works are collected in research labs and kept for records. These project data are collected from several research labs. Similar project data will be in different formats. These data are given as input to the database server. The data will be converted into a desired format and stored in these database servers. We get one more data input from the data warehouse and send it to the data warehouse servers. From the output of the database servers we select only the desired data and transform it to the desired format on which we need to run the Data Snooping engine. We use different Data Snooping techniques like classification, association, clustering, etc. on the unpreserved input data. The extracted patterns are sent to the pattern evaluator and the interesting patterns are visually shown for further analysis.

4.2. Privacy Preserved Data Mining
A large amount of data on various similar projects is collected from various research labs and held in reserve for records. These similar project data are collected from several research labs or research projects. The project data will be in dissimilar formats. These data are keyed in to the database server. This input data will be converted into the preferred format and stored in these database servers. The second data input, coming from the data warehouse, is sent to the data warehouse servers. The data warehouse server contains a collection of data about various things. From the database server pool, we choose only the most wanted data and transform it to the desired format. This transformed data is the input data on which we need to run the Data Snooping techniques. Instead of sending the data directly for Data Snooping, we obscure the data by using different privacy preserving Data Snooping techniques with the intention of preserving the sensitive information. This privacy preserved, obscured data is then subjected to the various Data Snooping techniques like classification, association, clustering, etc. The extracted patterns are sent to the pattern evaluator and the interesting patterns are visually shown for further analysis.

5 UML PPDM MODELS
Figure 2 shows an essential part of the use case diagram specifying the behavior of a PPDM system. An actor outside the box represents an external entity interacting with the system. The use cases within the box represent system functionalities offered to the external actors, where each use case can include or be extended by other use cases. The use case diagram is complemented by textual use cases with a varying degree of formality, from an informal, casual description to the use of a semiformal template specifying details of each use case.

5.1. PPDM System Use Cases
Under the current state of technological development, which has obliterated the distinction between researcher project work data kept in private and in public, we are incapable of shielding project privacy. The project records are kept privately in various research labs. In emerging project systems, the role of the researcher or administrator as guardian of research work privacy is under grave assault. Relationships between research and development organizations and researchers have been transformed so that researchers may no longer be able to control project information in the manner they once did. Furthermore, new information technologies have enhanced the significance and potential uses of project data; as a result, third-party demands for access have increased, with attendant risks to project privacy. The highly sensitive information of the project work has to be protected and then mined for effective data dredging. The UML (Unified Modeling Language) methodology allows the PPDM model to be described with diagrams such as use case diagrams and class and class structure diagrams.
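The privacy preserved flow of Section 4.2 (integrate the project data, select the wanted fields, obscure the sensitive ones, and only then mine) can be sketched in Python as follows. This is a hedged illustration, not the system of Fig. 1: the function names, the record fields `topic` and `owner`, the use of blocking as the obscuring step, and the frequency count that stands in for the Data Snooping engine are all assumptions.

```python
from collections import Counter


def integrate(sources):
    """Merge records from several labs into one list with lower-cased keys."""
    merged = []
    for records in sources:
        for rec in records:
            merged.append({key.lower(): value for key, value in rec.items()})
    return merged


def select(records, fields):
    """Keep only the fields that are wanted for mining."""
    return [{f: rec.get(f) for f in fields} for rec in records]


def obscure(records, sensitive):
    """Obscure sensitive fields by blocking: replace their values with "?"."""
    return [
        {key: ("?" if key in sensitive else value) for key, value in rec.items()}
        for rec in records
    ]


def mine(records, field):
    """Stand-in for the mining engine: frequency of each value of one field."""
    return Counter(rec[field] for rec in records)


def run_pipeline(sources, fields, sensitive, target):
    """Integrate, select, obscure, then mine, in that order."""
    prepared = obscure(select(integrate(sources), fields), sensitive)
    return mine(prepared, target)


if __name__ == "__main__":
    # Two hypothetical labs that store the same kind of record differently.
    lab_a = [{"Topic": "clustering", "Owner": "alice"}]
    lab_b = [{"topic": "clustering", "owner": "bob"},
             {"topic": "association", "owner": "carol"}]
    print(run_pipeline([lab_a, lab_b], ["topic", "owner"], {"owner"}, "topic"))
```

The mining step only ever receives the obscured records, which mirrors the point of Section 4.2: the sensitive field is hidden before any pattern extraction takes place.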
Fig. 2. Use case diagram of data mining system

5.2. Use Case Description
This gives us a comprehensive portrayal of how the system will be used. It provides an outline of the intended functionality of the system. The PPDM main success scenario (basic flow) that can be extracted from the use case diagram shown in Figure 2 is understandable by laymen as well as professionals. The class diagram shown in Figure 3 shows the static structure of the objects, their internal structure, and their relationships.

Fig. 3. Class diagram

6 CONCLUSION
The work presented here indicates the ever increasing interest of researchers in the area of securing sensitive data and knowledge from malicious users. The conclusions that we have reached from reviewing this area show that privacy issues can be effectively considered only within the limits of certain data mining algorithms. In this paper we define privacy preservation in data mining and the implications of benchmark privacy principles in knowledge discovery, and we advocate a few policies for PPDM based on these privacy principles. These are vital for the development and deployment of technical solutions and will let vendors and developers make solid advances in the future of PPDM. Our exploration concludes that our Privacy Preserving Data Mining framework is reusable, customizable, and effective, meets privacy requirements, and guarantees well-founded Data Snooping results while shielding vulnerable information (e.g., sensitive knowledge and individuals' privacy).

7 ACKNOWLEDGMENTS
The authors would like to thank the reviewers for their constructive suggestions and comments.

REFERENCES
[1] Chris Clifton and Donald Marks, Security and privacy implications of data mining, In Proceedings of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (1996), 15–19.
[2] Daniel E. O’Leary, Knowledge Discovery as a Threat to Database Security, In Proceedings of the 1st International Conference on Knowledge Discovery and Databases (1991), 107–516.
[3] Nabil Adam and John C. Wortmann, Security-Control Methods for Statistical Databases: A Comparative Study, ACM Computing Surveys 21 (1989), no. 4, 515–556.
[4] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu. Hippocratic Databases. In Proc. of the 28th Conference on Very Large Data Bases, Hong Kong, China, August 2002.
[5] L. Brankovic and V. Estivill-Castro. Privacy Issues in Knowledge Discovery and Data Mining. In Proc. of Australian Institute of Computer Ethics Conference (AICEC99), Melbourne, Victoria, Australia, July 1999.
[6] S. R. M. Oliveira and O. R. Zaiane. Foundations for an Access Control Model for Privacy Preservation in Multi-Relational Association Rule Mining. In Proc. of the IEEE ICDM Workshop on Privacy, Security, and Data Mining, pages 19-26, Maebashi City, Japan, December 2002.
[7] C. Clifton, W. Du, M. Atallah, M. Kantarcioglu, X. Lin, and J. Vaidya. Distributed Data Mining to Protect Information Privacy. Proposal to the National Science Foundation, December 2001.
[8] C. Clifton. Using Sample Size to Limit Exposure to Data Mining. Journal of Computer Security, 8(4):281-307, November 2000.
[9] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.
[10] C. Farkas and S. Jajodia. The Inference Problem: A Survey. SIGKDD Explorations, 4(2):6-11, December 2002.

Dr.R.Dhanapal obtained his Ph.D in Computer Science from Bharathidasan University, India. He is currently Professor in the Department of Computer Applications, Easwari Engineering College, affiliated to Anna University of Technology Chennai, Tamil Nadu, India. He has 25 years of teaching, research and administrative experience. Besides being Professor, he is also a prolific writer, having authored twenty one books on various topics in Computer Science. His books have been prescribed as text books in Bharathidasan University and autonomous colleges affiliated to Bharathidasan University. He has served as Chairman of the Board of Studies in Computer Science of Bharathidasan University and as a member of the Board of Studies in Computer Science of several universities and autonomous colleges. He is a member of the standing committee on Artificial Intelligence and Expert Systems of IASTED, Canada, and a Senior
Member of the International Association of Computer Science and Information Technology (IACSIT), Singapore. He has visited the USA, Japan, Malaysia, and Singapore to present papers at international conferences and to demonstrate the software developed by him. He is the recipient of the prestigious ‘Life-time Achievement’ and ‘Excellence’ Awards. He is serving as Principal Investigator of UGC sponsored innovative, major and minor research projects worth about 1.6 crore. He is a recognized supervisor for research programmes in Computer Science leading to Ph.D and MS by research in several universities including Anna University of Technology Chennai, Bharathiar University, and Manonmaniam Sundaranar University. He has 47 papers to his credit in international and national journals.
Mrs.S.Gayathri Subramanian, M.Sc., M.Phil., is Head of the Department of Computer Science, R.B.Gothi Jain College for Women, Redhills, Chennai - 52, India, and is pursuing research leading to a Ph.D at Dravidian University, Andhra Pradesh, India, under the guidance and supervision of Prof.Dr.R.Dhanapal. Current Research Interest: Data Mining.
Mr.M.R.RajaGopal, M.Sc., MCA., MBA., M.Phil., is Head of the Department of Computer Science, Swami Dayananda College of Arts and Science, Manjakkudi, and is pursuing research leading to a Ph.D at Dravidian University, Andhra Pradesh, India, under the guidance and supervision of Prof.Dr.R.Dhanapal. Current Research Interest: Data Mining.
Ms.K.Hemamalini is a second-year MCA student at Easwari Engineering College, affiliated to Anna University of Technology, Chennai, India, working under the guidance and supervision of Prof.Dr.R.Dhanapal. Current Research Interest: Data Mining.
