
Secure Auditing and Deduplicating Data in Cloud

Akhila N P
Hinduja D K
Pallavi M C

PRELIMINARY STUDY
The two main disadvantages of the existing system are:
It is very difficult to audit huge files and large amounts of data in the cloud using integrity auditing.
Data loss occurs, and the cloud stores many duplicate files.
Proposed System:
In this project, we implement secure auditing and deduplication of cloud data.
To obtain these two functionalities, we propose two secure systems, SecCloud and SecCloud+.
In the SecCloud system, the use of MapReduce helps to reduce the computation load when generating data tags before uploading, and the system also audits the integrity of the data stored in the cloud.
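As a rough illustration of per-block tag generation (a simplified sketch, not the SecCloud construction itself: SecCloud distributes this work with MapReduce and uses cryptographic authenticator tags, whereas plain SHA-256 digests are used here only to show the flow), the following Java fragment splits a file into fixed-size blocks and hashes each one:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.ArrayList;
    import java.util.List;

    public class BlockTagger {
        // Split the file into 4 KB blocks and compute one SHA-256 digest per block.
        // In SecCloud this per-block work would be handed to MapReduce workers.
        public static List<byte[]> tagBlocks(String path)
                throws IOException, NoSuchAlgorithmException {
            List<byte[]> tags = new ArrayList<byte[]>();
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            FileInputStream in = new FileInputStream(path);
            try {
                byte[] block = new byte[4096];
                int read;
                while ((read = in.read(block)) != -1) {
                    sha.update(block, 0, read);
                    tags.add(sha.digest());   // digest() also resets for the next block
                }
            } finally {
                in.close();
            }
            return tags;
        }
    }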


SecCloud cannot prevent the cloud servers from knowing the content of the files that have been stored. In other words, the functionality of integrity auditing and secure deduplication is only imposed on plaintext files.
Hence, in this project, we propose SecCloud+, which additionally guarantees file confidentiality while providing integrity auditing and deduplication on encrypted files.
The design goal of file confidentiality requires preventing the cloud servers from accessing the contents of files. Specifically, we require the goal of file confidentiality to be resistant to dictionary attacks: even if the adversaries have pre-knowledge of the dictionary, which includes all possible files, they still cannot recover the target files.

ADVANTAGES OF PROPOSED SYSTEM


It provides integrity auditing along with the removal of duplicate files.
Duplicate files are mapped to a single stored copy by mapping them to the file that already exists in the cloud (a sketch of such a deduplication index is given below).
It provides integrity auditing and secure deduplication directly on encrypted data.
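A minimal sketch of how a tag-based deduplication index could look, assuming purely for illustration that the file tag is a hex-encoded digest string and that the index is an in-memory map (a real cloud server would persist it):

    import java.util.HashMap;
    import java.util.Map;

    public class DedupIndex {
        // Maps a file tag to the storage location of the single stored copy.
        private final Map<String, String> tagToLocation = new HashMap<String, String>();

        // Returns the existing location if the tag is already known (duplicate),
        // otherwise records the new copy and returns null.
        public String putIfAbsent(String fileTag, String newLocation) {
            String existing = tagToLocation.get(fileTag);
            if (existing != null) {
                return existing;            // duplicate: map the upload to the stored copy
            }
            tagToLocation.put(fileTag, newLocation);
            return null;                    // first copy: actually store the file
        }
    }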

Literature Survey
1) Enabling Public Verifiability and Data Dynamics for Storage Security
in Cloud Computing:
Cloud Computing moves databases and application software to centralized large data centres, where the management of the data and services may not be fully trustworthy. This work studies the problem of ensuring the integrity of data stored in Cloud Computing. The introduction of a third-party auditor (TPA) eliminates the involvement of the client in auditing whether the data stored in the cloud is indeed intact, which is also important for achieving economies of scale in Cloud Computing. The support for data dynamics via the most general forms of data operation, such as block modification, deletion and insertion, is a significant step towards practicality, since services in Cloud Computing are not limited to archival or backup data, whereas prior work on ensuring remote data integrity often lacks support for either public verifiability or dynamic data operations.

2) Proofs of Ownership in Remote Storage Systems:


Cloud storage systems are becoming increasingly popular. A promising technology that keeps users' costs down is deduplication, which stores only a single copy of repeating data.
Client-side deduplication attempts to identify deduplication opportunities already at the client and to save the bandwidth of uploading copies of existing files to the server, but it can be harmful.
This work identifies attacks that exploit client-side deduplication, allowing an attacker to gain access to files of arbitrary size belonging to other users, based only on a very small hash signature of these files.
More specifically, an attacker who knows the hash signature of a file can convince the storage service that it owns that file, so the server lets the attacker download the entire file. A naive hash-only ownership check, sketched below, illustrates the vulnerable pattern.
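The vulnerable pattern can be sketched as follows; this is a hypothetical server-side check written for illustration, not code from the paper:

    import java.util.HashSet;
    import java.util.Set;

    public class NaiveOwnershipCheck {
        // Hashes of files the server already stores.
        private final Set<String> storedFileHashes = new HashSet<String>();

        public void recordStoredFile(String fileHash) {
            storedFileHashes.add(fileHash);
        }

        // Insecure: possession of the short hash alone is treated as ownership,
        // so anyone who learns the hash can later download the whole file.
        // Proofs of ownership replace this with a challenge over the file content.
        public boolean claimOwnership(String fileHash) {
            return storedFileHashes.contains(fileHash);
        }
    }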

3) DupLESS: Server-Aided Encryption for Deduplicated Storage:

Cloud storage service providers such as Dropbox perform deduplication to save space by storing only one copy of each uploaded file.
Message-locked encryption (the most prominent manifestation of which is convergent encryption) resolves this tension; however, it is inherently subject to brute-force attacks that can recover files falling into a known set.
Here, an architecture is proposed that provides secure deduplicated storage resisting brute-force attacks, and it is realized in a system called DupLESS.
In DupLESS, clients encrypt under message-based keys obtained from a key server, as sketched below.
It enables clients to store encrypted data with an existing service, to have the service perform deduplication on their behalf, and still to achieve strong confidentiality guarantees.
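The client-side idea can be loosely sketched as below. This is not the actual DupLESS protocol: the KeyServer interface here is hypothetical, and in DupLESS the key is obtained through an oblivious protocol so the key server never learns the file. The sketch only shows why message-derived keys keep deduplication possible on encrypted data:

    import java.security.MessageDigest;
    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class MessageLockedClient {

        // Hypothetical key-server interface, for illustration only: given the file
        // hash it returns a message-derived AES key (assumed 16 bytes).
        public interface KeyServer {
            byte[] deriveKey(byte[] fileHash) throws Exception;
        }

        // Because the key depends only on the file content, identical files yield
        // identical ciphertexts, so the storage server can still deduplicate them.
        public static byte[] encrypt(byte[] file, KeyServer keyServer) throws Exception {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(file);
            byte[] key = keyServer.deriveKey(hash);
            Cipher aes = Cipher.getInstance("AES/CTR/NoPadding");
            aes.init(Cipher.ENCRYPT_MODE,
                     new SecretKeySpec(key, "AES"),
                     new IvParameterSpec(new byte[16]));   // fixed IV keeps encryption deterministic
            return aes.doFinal(file);
        }
    }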

4) Provable Data Possession at Untrusted Stores :


This paper proposes a model for provable data possession (PDP), which allows a client that has stored data at an untrusted server to verify that the server possesses the original data without retrieving it.
The model generates probabilistic proofs of possession by sampling random sets of blocks from the server, which drastically reduces I/O costs (the sampling step is sketched below).
The client maintains a constant amount of metadata to verify the proof.
The challenge/response protocol transmits a small, constant amount of data, which minimizes network communication. Thus, the PDP model for remote data checking supports large data sets in widely distributed storage systems.
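As a rough idea of the sampling step only (the homomorphic verifiable tags of the actual PDP scheme, and the verification of the server's response, are omitted), the following sketch shows how a challenge over a small random subset of blocks could be formed:

    import java.security.SecureRandom;
    import java.util.LinkedHashSet;
    import java.util.Set;

    public class PdpChallenge {
        // Pick c distinct block indices at random out of totalBlocks (c <= totalBlocks).
        // The server must answer over exactly these blocks, so it cannot silently
        // discard data without a noticeable probability of being caught.
        public static Set<Integer> sampleBlocks(int totalBlocks, int c, SecureRandom rnd) {
            Set<Integer> challenge = new LinkedHashSet<Integer>();
            while (challenge.size() < c) {
                challenge.add(rnd.nextInt(totalBlocks));
            }
            return challenge;
        }
    }

In the paper's analysis, challenging on the order of a few hundred randomly chosen blocks is already enough to detect corruption of about 1% of the file with high probability, which is why I/O and communication stay small.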

Identification, formulation and analysis of engineering problems
Secure deduplication:
Deduplication is a technique in which the server stores only a single copy of a file, regardless of how many clients ask to store that file, so that both the disk space of the cloud servers and the network bandwidth are saved.
Encryption and decryption:
This provides data confidentiality in deduplication. A user derives a key from the data content and encrypts the data copy with that key. In addition, the user derives a tag for the data copy, and the tag is used to detect duplicates: if two data copies are the same, then their tags are the same. A minimal sketch of this convergent-encryption idea appears after this section.
Integrity auditing:
The goal of this work is to provide the capability of verifying the correctness of remotely stored data.
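A minimal sketch of the convergent-encryption idea above, assuming for illustration that the key is the SHA-256 digest of the content, that AES in a deterministic mode is the cipher, and that the tag is the digest of the ciphertext (other derivations are possible):

    import java.security.MessageDigest;
    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class ConvergentEncryption {
        // The key is derived from the content itself, so identical files produce
        // identical ciphertexts; the tag over the ciphertext is what the server
        // compares to detect duplicates without ever seeing the plaintext.
        public static byte[][] encryptAndTag(byte[] data) throws Exception {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] key = sha.digest(data);                        // 32-byte content-derived key
            Cipher aes = Cipher.getInstance("AES/CTR/NoPadding");
            aes.init(Cipher.ENCRYPT_MODE,
                     new SecretKeySpec(key, 0, 16, "AES"),        // first 128 bits as AES key
                     new IvParameterSpec(new byte[16]));          // fixed IV keeps it deterministic
            byte[] ciphertext = aes.doFinal(data);
            byte[] tag = sha.digest(ciphertext);                  // duplicate-detection tag
            return new byte[][] { ciphertext, tag };
        }
    }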

Software Requirement Specification


1. Functional Requirements:
Functional requirements define the functions of the software system and describe how the system should behave when given particular inputs or conditions. They cover calculations, data manipulation, processing and other significant functionality.
The functional requirements of this system are as follows; a minimal sketch of the upload/download flow is given after the list.
The data is first divided into small packets.
A hash tag is generated for every packet.
The hash tags are compared.
The data owner encrypts the data using AES and uploads it to the server.
When downloading data from the cloud, data auditing happens first; if the data is intact, it is downloaded and decrypted using AES.
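A minimal sketch of the AES upload/download path described above, with the audit step reduced to recomputing and comparing a SHA-256 tag (the real system's auditing protocol is more involved):

    import java.security.MessageDigest;
    import java.util.Arrays;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;

    public class OwnerSideCrypto {
        // Upload path: encrypt the packet with AES before sending it to the server.
        public static byte[] encrypt(byte[] packet, SecretKey key) throws Exception {
            Cipher aes = Cipher.getInstance("AES");   // provider-default mode, illustration only
            aes.init(Cipher.ENCRYPT_MODE, key);
            return aes.doFinal(packet);
        }

        // Download path: audit first (recompute the tag and compare it with the tag
        // recorded at upload time), then decrypt only if the audit passes.
        public static byte[] auditAndDecrypt(byte[] ciphertext, byte[] expectedTag,
                                             SecretKey key) throws Exception {
            byte[] freshTag = MessageDigest.getInstance("SHA-256").digest(ciphertext);
            if (!Arrays.equals(freshTag, expectedTag)) {
                throw new SecurityException("audit failed: stored data was modified");
            }
            Cipher aes = Cipher.getInstance("AES");
            aes.init(Cipher.DECRYPT_MODE, key);
            return aes.doFinal(ciphertext);
        }

        public static void main(String[] args) throws Exception {
            SecretKey key = KeyGenerator.getInstance("AES").generateKey();
            byte[] packet = "hello cloud".getBytes("UTF-8");
            byte[] ct = encrypt(packet, key);
            byte[] tag = MessageDigest.getInstance("SHA-256").digest(ct);
            System.out.println(new String(auditAndDecrypt(ct, tag, key), "UTF-8"));
        }
    }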

2. Non-Functional Requirements:

Non-functional requirements are requirements that do not belong directly to the specific functions provided by the system. They give criteria that can be used to judge the operation of the system rather than specific behaviours.
They describe emergent system properties such as reliability, response time and storage occupancy. They may also place constraints on the system, for instance on the capacity of the input/output devices and the data representations used in the system interfaces.
Non-functional requirements arise from user needs, budget constraints, organizational policies, and the need for interoperability with other software and hardware systems.
The following non-functional requirements are worth considering.
Security: the system should allow secure communication between the data owner and the receiver.
Reliability: the system should be trustworthy, should not degrade the performance of the existing system, and should not cause the system to hang.

HARDWARE REQUIREMENTS

Processor : Pentium IV with 2.6 GHz
Hard Disk : 150 GB
RAM       : 1 GB

SOFTWARE REQUIREMENTS

Platform         : J2EE (Servlet, JSP)
Java Version     : JDK 1.6
Database Server  : MySQL Server
Web Server       : Tomcat 6.0
Operating System : Windows XP

HIGH LEVEL DESIGN


System Architecture

THANK YOU
