You are on page 1of 17

ABSTRACT

The advent of cloud computing, data owners are motivated to outsource their
complex data management systems from local sites to commercial public cloud for great
flexibility and economic savings. But for protecting data privacy, sensitive data has to be
encrypted before outsourcing, which obsoletes traditional data utilization based on plaintext
keyword search. Thus, enabling an encrypted cloud data search service is of paramount
importance. Considering the large number of data users and documents in cloud, it is crucial
for the search service to allow multi-keyword query and provide result similarity ranking to
meet the effective data retrieval need. Related works on searchable encryption focus on single
keyword search or Boolean keyword search, and rarely differentiate the search results. In this
paper, for the first time, we define and solve the challenging problem of privacy-preserving
multi-keyword ranked search over encrypted cloud data (MRSE), and establish a set of strict
privacy requirements for such a secure cloud data utilization system to become a reality.
Among various multi-keyword semantics, we choose the efficient principle of coordinate
matching, i.e., as many matches as possible, to capture the similarity between search query
and data documents, and further use inner product similarity to quantitatively formalize
such principle for similarity measurement. We first propose a basic MRSE scheme using
secure inner product computation, and then significantly improve it to meet different privacy
requirements in two levels of threat models. Thorough analysis investigating privacy and
efficiency guarantees of proposed schemes is given, and experiments on the real-world
dataset further show proposed schemes indeed introduce low overhead on computation and
communication.

PROBLEM DESCRIPTION
Our contributions are summarized as follows:
1. For the first time, we explore the problem of multi keyword ranked search over encrypted
cloud data, and establish a set of strict privacy requirements for such a secure cloud data
utilization system.
2. We propose two MRSE schemes based on the similarity measure of coordinate matching
while meeting different privacy requirements in two different threat models.
3. We investigate some further enhancements of our ranked search mechanism to support
more search semantics and dynamic data operations.
4. Thorough analysis investigating privacy and efficiency guarantees of the proposed
schemes is given, and experiments on the real-world data set further show the proposed
schemes indeed introduce low overhead on computation and communication.

INTRODUCTION
THE GENERIC PRIVACY PRESERVING PROBLEM
The problem of learning something without revealing ones own data is not new. It
was proposed way back in 1982 by Yao [5]. It has become very important as data has started
to grow a million times faster and along with it the demands to keep it for oneself only. When
the problem was proposed the web had just come out of its infancy, today it is mature and
large and spread to the remotest corners of the world. The authors marked back then that
explosive progress in networking, storage, and processor technologies has led to the creation
of ultra large databases that record unprecedented amount of transactional information [1].
We are more concerned in privacy preservation with context to data mining algorithms. This
is one point where the privacy can be trapped. Suggestion from paper that data mining and
data warehousing go hand-in- hand: Most tools operate by gathering all data into a central
site, then running an algorithm against that data [6]. However, privacy concerns can prevent
building a centralized warehouse data may be distributed among several custodians, none
of which are allowed to transfer their data to another site. It should be noted that what data
mining algorithms produce is knowledge, and that data mining results rarely violate privacy,
as they generally reveal high-level knowledge rather than disclosing instances of data.
However, the concern among privacy advocates is well founded, as bringing data together to
support data mining makes misuse easier. The problem is not data mining, but the way data
mining is done [7]. 1. The Solutions to the problem PPDM is a new era of research in data
mining, where data mining algorithms are analysed for possible infringement in privacy.
PPDM research usually takes one of the three philosophical approaches: (1) data hiding, in
which sensitive raw data like identifiers, name, addresses, etc. were transformed, jammed, or
trimmed out from the original database, in order for the users of the data not to be able to
compromise another persons privacy; (2) secure multiparty computation, where distributed
data are encrypted before released or shared for computations; and (3) rule hiding, in which
sensitive knowledge extracted from the data mining process be excluded for use, because
private information may be derived from the released knowledge; thus, no party knows no
matter which except its own inputs and the results. The crucial goal of PPDM is to develop
efficient algorithms that allow one to extract relevant knowledge from a large amount of data,
while prevent sensitive data and information from leak or deduction [10]. The third approach
can be broadly called cryptographic approach to solve PPDM problems. A very nicely
classifies all the proposed solutions into various categories depending on what methods are

used in [8]. Various ways to handle PPDM problems including the cryptographic approaches
available in [4][3]. 2. The Quantifiers of efficient solution The most general parameters for
analysing efficient solution are overall performance in all the areas. A framework is but
required to get exact comparative measures. Attempts have been made in past to generalize a
framework. One is proposed by Bertino et al. The framework they identified was based on the
following evaluation dimensions [9]: -- Efficiency. The ability of a privacy preserving
algorithm to execute with good performance in terms of all the resources implied by the
algorithm; Scalability. This factor evaluates the efficiency trend of a PPDM algorithm for
increasing sizes of the data from which relevant information is mined while ensuring privacy;
Data quality after the application of a privacy preserving technique. Considered both as the
quality of data themselves and the quality of the data mining results after the hiding strategy
is applied; Hiding failure. The portion of sensitive information that is not hidden by the
application of a privacy preservation technique; Privacy level offered by a privacy
preserving technique. It estimates the degree of uncertainty, according to which sensitive
information can still be predicted even if it has been hidden. Such framework allows one to
assess the different features of a privacy preserving algorithm according to a variety of
evaluation criteria. 3 Privacy Preserving Techniques the most used technique yet has been
secure multiparty computation. As mentioned earlier, the basic privacy preservation problem
is a classical multiparty problem. Cryptography-based SMC has the highest accuracy in data
mining and good privacy preservation capability as well; however, it has strict usages as it is
only applicable to a distributed data environment [6].
CLOUD BASED PRIVACY DATA SHARING USING DATAMINING:
The rapid expansion of data, the data owners tend to store their data into the cloud to
release the burden of data storage and maintenance [1]. However, as the cloud customers and
the cloud server are not in the same trusted domain, our outsourced data may be under the
exposure to the risk. Thus, before sent to the cloud, the sensitive data needs to be encrypted to
protect for data privacy and combat unsolicited accesses. Unfortunately, the traditional
plaintext search methods cannot be directly applied to the encrypted cloud data any more.
The traditional information retrieval (IR) has already provided multi-keyword ranked search
for the data user. In the same way, the cloud server needs provide the data user with the
similar function, while protecting data and search privacy.

It is meaningful storing it into the cloud server only when data can be easily searched and
utilized. In the literature, searchable encryption techniques [2-4] are able to provide secure
search over encrypted data for users. They build a searchable inverted index that stores a list
of mapping from keywords to the corresponding set of files which contain this keyword.
When data users input a keyword, a trapdoor is generated for this keyword and then
submitted to the cloud server Some researchers study the problem on secure and ranked
search over outsourced cloud data. Wang et al., [5] propose a secure ranked keyword search
scheme. Their solution combines inverted index with order-preserving symmetric encryption
(OPSE). In terms of ranked search, the order of retrieved files is determined by numerical
relevance scores, which can be calculated by TFIDF. The relevance score is encrypted by
OPSE to ensure security.
It enhances system usability and saves communication overhead. This solution only supports
single keyword ranked search. Cao et al., [6] propose a method that adopts similarity measure
of coordinate matching to capture the relevance of files to the query. They use inner
product similarity to measure the score of each file. This solution supports exact multikeyword ranked search. It is practical, and the search is flexible. Sun et al., [7] proposed a
MDB-tree based scheme which supports ranked multi-keyword search. This scheme is very
efficient, but the higher efficiency will lead to lower precision of the search results in this
scheme. These methods employ a spellcheck mechanism, such as, search for wireless
instead of wireiess, or the data format may not be the same e.g., data-mining versus
datamining. Chuah et al., [8] propose a privacy-aware bed-treemethod to support fuzzy
multi-keyword search. This approach uses edit distance to build fuzzy keyword sets. Bloom
filters are constructed for every keyword. Then, it constructs the index tree for all files where
each leaf node a hash value of a keyword. Li et al., [9] exploit edit distance to quantify
keywords similarity and construct storageefficient fuzzy keyword sets. Specially, the
wildcard-based fuzzy set construction approach is designed to save storage overhead. Wang
et al., [10] employ wildcard-based fuzzy set to build a private trie-traverse searching index. In
the searching phase, if the edit distance between retrieval keywords and ones from the fuzzy
sets is less than a predetermined set value, it is considered similar and returns the
corresponding files. These fuzzy search methods support tolerance of minor typos and format
inconsistencies, but do not support semantic fuzzy search. Considering the existence of
polysemy and synonymy [10], the model that supports multi-keyword ranked search and
semantic search is more reasonable. In this work, we will solve the problem of multikeyword

latent semantic ranked search over encrypted cloud data and retrieve the most relevant files.
We define a new scheme named Latent Semantic Analysis (LSA)-based multi-keyword
ranked search which supports multi-keyword latent semantic ranked search. By using LSA,
the proposed scheme could return not only the exact matching files, but also the files
including the terms latent semantically associated to the query keyword. For example, when
the user inputs the keyword automobile to search files, the proposed method returns not
only the files containing automobile, but also the files including the term car We take a
large matrix of term-document association data and construct a semantic space wherein terms
and documents are closely associated are placed near one another. To meet the challenge of
supporting such multi-keyword semantic without privacy breaches, we propose the idea: the
multi-keyword ranked search (MRSE) using Latent Semantic Analysis.

LITERATURE REVIEW
PRIVACY PRESERVING KEYWORD SEARCHES ON REMOTE ENCRYPTED
DATA:
Consider the problem: a user U wants to store his files in an encrypted form on a remote file
server S. Later the user U wants to efficiently retrieve some of the encrypted files containing
specific keywords, keeping the keywords themselves secret and not to endanger the security
of the remotely stored files. For example, a user may want to store old e-mail messages
encrypted on a server managed by Yahoo or another large vendor, and later retrieve certain
messages while travelling with a mobile device. In [2], solutions for this problem under welldefined security requirements are offered. The schemes are efficient as no public-key
cryptosystem is involved. Indeed, the approach is independent of the encryption method
chosen for the remote files. They are incremental too. In that, user U can submit new files
which are secure against previous queries but still searchable against future queries. From
this, the main theme taken is of storing data remotely on other server and retrieving that data
from anywhere via mobile, laptop etc.
CRYPTOGRAPHIC CLOUD STORAGE:
When the benefits of using a public cloud infrastructure are clear, it introduces significant
security and privacy risks. In fact, it seems that the biggest obstacle to the adoption of cloud
storage (and cloud computing in general) is concern over the confidentiality and integrity of
data. In [3], an overview of the benefits of a cryptographic storage service, for example,
reducing the legal exposure of both customers and cloud providers, and achieving regulatory
compliance is provided. Besides this, cloud services that could be built on top of a
cryptographic storage service such as secure backups, archival, health record systems, secure
data exchange and e-discovery is stated briefly.
EFFICIENT AND SECURE MULTI-KEYWORD SEARCH ON ENCRYPTED
CLOUD DATA:
On one hand, users who do not necessarily have prior knowledge of the encrypted cloud
data, have to post process every retrieved file in order to find ones most matching their
interest; On the other hand, invariably retrieving all files containing the queried keyword
further incurs unnecessary network traffic, which is absolutely undesirable in todays pay-asyou-use cloud paradigm. This work has defined and solved the problem of effective yet

secure ranked keyword search over encrypted cloud data [4]. Ranked search greatly enhances
system usability by returning the matching files in a ranked order regarding to certain
relevance criteria (e.g., keyword frequency) thus making one step closer towards practical
deployment of privacy-preserving data hosting services in Cloud Computing. For the first
time, the work has defined and solved the challenging problem of privacy-preserving multikeyword ranked search over encrypted cloud data (MRSE), and establishes a set of strict
privacy requirements for such a secure cloud data utilization system to become a reality. The
proposed ranking method proves to be efficient to return highly relevant documents
corresponding to submitted search terms. The idea of proposed ranking method is used in our
proposed system in order to enhance the security of data on Cloud Service Provider.
Providing Privacy Preserving in Cloud Computing: Privacy is an important issue for cloud
computing, both in terms of legal compliance and user trust and needs to be considered at
every phase of design. The [5] work tells the importance of protecting individuals privacy in
cloud computing and provides some privacy preserving technologies used in cloud
computing services. Work tells that it is very important to take privacy into account while
designing cloud services, if these involve the collection, processing or sharing of personal
data. From this work, main theme taken is of preserving privacy of data. This work only
describes privacy of data but doesnt allow indexed search as well as doesnt hide users
identity. Thus, these two drawbacks are overcome in our proposed system. F. Privacy
Preserving Data Sharing With Anonymous ID Assignment: In this work, an algorithm for
anonymous sharing of private data among N parties is developed. This technique is used
iteratively to assign these nodes ID numbers ranging from 1 to N. This assignment is
anonymous in that the identities received are unknown to the other members of the group. In
[6], existing and new algorithms for assigning anonymous IDs are examined with respect to
trade-offs between communication and computational requirements. These new algorithms
are built on top of a secure sum data mining operation using Newtons identities and Sturms
theorem. The main idea taken from this work is of assigning anonymous ID to the user on the
cloud.
ENABLING EFFICIENT FUZZY KEYWORD SEARCH OVER ENCRYPTED DATA
IN CLOUD COMPUTING:
In this work, main idea is to formalize and solve the problem of effective fuzzy keyword
search over encrypted cloud data while maintaining keyword privacy [7]. This basic idea is
taken but it is for multi-keyword raked search (MRSE scheme) in our proposed system. In

[8], design of secure cloud storage service which addresses the reliability issue with near
optimal overall performance is proposed.
ACHIEVING SECURE, SCALABLE, AND FINE-GRAINED DATA
Access Control in Cloud Computing: Achieving finegrainedness, scalability, and data
confidentiality of access control simultaneously is a problem which actually still remains
unresolved. The work [9] addresses this challenging open issue by, on one hand, defining and
enforcing access policies based on data attributes, and, on the other hand, allowing the data
owner to delegate most of the computation tasks involved in fine-grained data access control
to untrusted cloud servers without disclosing the underlying data contents. In [10], authors
have proposed a privacy-preserving public auditing system for data storage security in Cloud
Computing scheme is proposed. It utilizes the homomorphic linear authenticator and random
masking to guarantee that the TPA would not learn any knowledge about the data content
stored on the cloud server during the efficient auditing process, which eliminates the burden
of cloud user from the tedious and possibly expensive auditing task, it also alleviates the
users fear of his/her outsourced data leakage.

SYSTEM ANALYSIS
PREVIOUS WORK
The large number of data users and documents in cloud, it is crucial for the
search service to allow multi-keyword query and provide result similarity ranking to meet the
effective data retrieval need. The searchable encryption focuses on single keyword search or
Boolean keyword search, and rarely differentiates the search results.

DISADVANTAGES

Single-keyword search without ranking

Boolean- keyword search without ranking

Single-keyword search with ranking

Proposed System:
We define and solve the challenging problem of privacy-preserving multi-keyword
ranked search over encrypted cloud data (MRSE), and establish a set of strict privacy
requirements for such a secure cloud data utilization system to become a reality. Among
various multi-keyword semantics, we choose the efficient principle of coordinate matching.
we propose the problem of Secured Multikeyword search (SMS) over encrypted cloud data
(ECD), and construct a group of privacy policies for such a secure cloud data utilization
system. From number of multi-keyword semantics, we select the highly efficient rule of
coordinate matching, i.e., as many matches as possible, to identify the similarity between
search query and data , and for further matching we use inner data correspondence to
quantitatively formalize such principle for similarity measurement. We first propose a basic
Secured multi keyword ranked search scheme
using secure inner product computation, and then improve it to meet different privacy
requirements. The Ranked result provides top k retrieval results. Also we propose an alert
system which will generate alerts when un-authorized user tries to access the data from cloud,
the alert will generate in the form of mail and message.

Advantages of Proposed System:


1. Multi-keyword ranked search over encrypted cloud data (MRSE)
2. Coordinate matching by inner product similarity.
3. Secured Multi-keyword Ranked Search: To design search schemes which allow multikeyword query and provide result similarity ranking for valuable data retrieval,
instead of returning undifferentiated results.
4. Privacy: To prevent cloud server from learning additional information from dataset
and index, and to meet privacy requirements. Effectiveness with high performance:
Above goals on functionality and privacy should be achieved with low
communication and computation overhead.

5.

MODULE DESCRIPTION
Encrypt Module:
This module is used to help the server to encrypt the document using RSA Algorithm
and to convert the encrypted document to the Zip file with activation code and then activation
code send to the user for download.
Client Module:
This module is used to help the client to search the file using the multiple key words
concept and get the accurate result list based on the user query. The user is going to select the
required file and register the user details and get activation code in mail from the
customerservice404 email before enter the activation code. After user can download the
Zip file and extract that file.
Multi-keyword Module:
This module is used to help the user to get the accurate result based on the multiple
keyword concepts. The users can enter the multiple words query, the server is going to split
that query into a single word after search that word file in our database. Finally, display the
matched word list from the database and the user gets the file from that list. The search query
is also described as a binary vector where each bit means whether corresponding keyword
appears in this search request, so the similarity could be exactly measured by inner product of
query vector with data vector. However, directly outsourcing data vector or query vector will
violate index privacy or search privacy. To meet the challenge of supporting such multikeyword semantic without privacy breaches, we propose a basic SMS scheme using secure
inner product computation, which is adapted from a secure k-nearest neighbour (kNN)
technique, and then improve it step by step to achieve various privacy requirements in two
levels of threat models.
1) Showing the problem of Secured Multi-keyword search
over encrypted cloud data
2) Propose two schemes following the principle of
coordinate matching and inner product similarity.

Admin Module:
This module is used to help the server to view details and upload files with the
security. Admin uses the log key to the login time. Before the admin logout, change the log
key. The admin can change the password after the login and view the user downloading
details and the counting of file request details on flowchart. The admin can upload the file
after the conversion of the Zip file format.

File upload Module:


This module is used to help the server to view details and upload files with the security.
Admin uses the log key to the login time. Before the admin logout, change the log key. The
admin can change the password after the login and view the user downloading details and the
counting of file request details on flowchart. The admin can upload the file after the
conversion of the Zip file format.

Ranking Result:
When any User request for the data then Ranking is done on requested data using k-nearest
neighbor algorithm. For Ranking co-ordinate matching principle is used. After ranking
user gets the expected results of the query.

SYSTEM FLOW DIAGRAM


User

Keyword input

Cloud server
verification

Co-ordinate Matching
File Verification
Triple DES Encryption/Decryption

Mrse keyword search


User Verification
File info

Result

Keyword matched file

End

CONCLUSION
In this paper, for the first time we define and solve the problem of multi-keyword
ranked search over encrypted cloud data, and establish a variety of privacy requirements.
Among various multi-keyword semantics, we choose the efficient principle of coordinate
matching, i.e., as many matches as possible, to effectively capture similarity between query
keywords and outsourced documents, and use inner product similarity to quantitatively
formalize such a principle for similarity measurement.
For meeting the challenge of supporting multi-keyword semantic without privacy
breaches, we first propose a basic MRSE scheme using secure inner product computation,
and significantly improve it to achieve privacy requirements in two levels of threat models.
Thorough analysis investigating privacy and efficiency guarantees of proposed schemes is
given, and experiments on the real-world dataset show our proposed schemes introduce low
overhead on both computation and communication. As our future work, we will explore
supporting other multi-keyword semantics (e.g., weighted query) over encrypted data,
integrity check of rank order in search result and privacy guarantees in more stronger threat
model.

FUTURE ENHANCEMENT
In Future we have focus a new way to solve the problems by introducing Secure
Keyword Search mechanism. Secure keyword search displays the top list of files with fewer
results as output. Using fewer results as output we increase the file retrieval accuracy and
reduce the communication overhead. Secure Keyword Search starts in ranked list files only.
Ranked list files can identify with relevance score and statistical measure. To achieve our
design goals on both system security and usability, to bring together the advance of both
crypto and IR community to design the ranked searchable symmetric encryption (RSSE)
scheme.

REFERENCES
[1] L. M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, A break in the clouds:
towards a cloud definition, ACM SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp.
5055, 2009.
[2] S. Kamara and K. Lauter, Cryptographic cloud storage, in RLCPS, January 2010,
LNCS. Springer, Heidelberg.
[3] A. Singhal, Modern information retrieval: A brief overview, IEEE Data Engineering
Bulletin, vol. 24, no. 4, pp. 3543, 2001.
[4] I. H. Witten, A. Moffat, and T. C. Bell, Managing gigabytes: Compressing and indexing
documents and images, Morgan Kaufmann Publishing, San Francisco, May 1999.
[5] D. Song, D. Wagner, and A. Perrig, Practical techniques for searches on encrypted data,
in Proc. of S&P, 2000.
[6]

E.-J.

Goh,

Secure

indexes,

Cryptology

ePrint

Archive,

2003,

http://

eprint.iacr.org/2003/216.
[7] Y.-C. Chang and M. Mitzenmacher, Privacy preserving keyword searches on remote
encrypted data, in Proc. of ACNS, 2005.
[8] R. Curtmola, J. A. Garay, S. Kamara, and R. Ostrovsky, Searchable symmetric
encryption: improved definitions and efficient constructions, in Proc. of ACM CCS, 2006.
[9] D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G. Persiano, Public key encryption with
keyword search, in Proc. of EUROCRYPT, 2004.
[10] M. Bellare, A. Boldyreva, and A. ONeill, Deterministic and efficiently searchable
encryption, in Proc. of CRYPTO, 2007.

You might also like