
Data Security in Cloud Computing

Mr. ANIL KUMAR MYSA


07053456
AMK0144@londonmet.ac.uk

Supervisor: NICHOLAS IOANNIDES


n.ioannides@londonmet.ac.uk

A Dissertation submitted in partial fulfillment

of the requirements of London Metropolitan University

for the degree of Master of Science (MSc) in Computer Networking

Faculty of Computing

January 2010

Abstract
Cloud computing has become a significant technology trend and many experts expect it to reshape information technology processes and the IT marketplace in the next few years. Security, and data security in particular, becomes more important when using cloud computing at all service levels: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). A major reason for the lack of effective data security is simply the limitations of current encryption capabilities. In this paper, novel techniques such as the key derivation method and homomorphic tokens are discussed to achieve stronger data security, storage correctness and fast localization of data errors. This is followed by a discussion of the performance evaluation and an outlook into the future.

Summary

Cloud computing is not an innovation in itself but a means of constructing IT services that use advanced computational power and improved storage capabilities; it has drawn the attention of major industrial companies, scientific communities and user groups. Critics argue that cloud computing is not secure enough once data leaves a company's local area network. Encryption is a well-known approach to addressing these types of security threats: for protection in the cloud, the enterprise would need to encrypt all data and communications.

This work approaches an effective, flexible distributed scheme with explicit dynamic data support to ensure the correctness of users' data in the cloud by utilizing a homomorphic token system with distributed verification of erasure-coded data, and works on the integration of storage correctness insurance and data error localization.

It further seeks secure and efficient dynamic operations on data blocks, including data update, delete and append. The overall goal is to design an efficient mechanism for dynamic data verification and operation that achieves storage correctness, fast localization of data errors and dynamic data support, while minimizing the effect brought by data errors or server failures.

Basic coding tools such as homomorphic tokens, which are needed for file distribution across cloud servers, are reviewed; the proposed approaches are analysed for the computational, storage and communication overhead of a data access operation; and, to prevent revoked users from gaining access to outsourced data through eavesdropping, efficient key management methods such as the key derivation hierarchy are presented.

Contents Page no.

Chapter 1 Introduction 11

1.1 Statement of Problem 11

1.2 Aims and Objectives 13

1.3 Approach and Methodology 15

1.4 Chapter Preview 16

Chapter 2.0 Literature review 17

2.1 Cloud Computing 18

Definitions

Models

Levels

Cloud Storage

2.2 Cloud Computing Security Issues 19

2.2a Confidentiality

2.2b Authentication

2.2c Authorization

2.2d Integrity

2.2e Availability

2.3 Technical Security Issues 19

2.4 Requirements of Data Security in Cloud Computing 22

2.5 Infrastructure Security

Network Level

Host Level

Application Level

Infrastructure Responsibilities and Challenges

Chapter 3.0 First Approach 23

Confidentiality 24

3.1 Encryption Techniques 26

3.2 Proof Of Retrievability 28

3.3 29

3.4 31

3.5 Conclusion 32

3.6 Summary 32

Chapter 4.0 Second Approach 33

Integrity

Message Authentication Code

Data Verifiability 33

4.1 Conclusion 38

4.2 Summary 38

Chapter 5.0 Third Approach 39

Availability 39

5.1 Major Threats 41

5.2 Network Based Attacks 42

5.3 CSP’s Own Availability 44

5.4 45

5.4.1 45

5.4.2 45

5.4.3 45

5.5 Conclusion 47

5.6 Summary 48

Chapter 6.0 Critical Appraisal, Recommendations and Future Work 49

6.1 Future Work 51

Chapter 7 Conclusions 52

Appendix A Scientific Article 53

Appendix B Project Proposal

References and Bibliography

List of Figures Page no.

Fig1. Cloud Data Storage Architecture 18

Fig2. Key Derivation Hierarchy 25

Fig3. Handling Updates to Data Blocks 30

Fig4. HDFS Architecture 33

Fig5. Cloud Computing Data Security Model 35

Definition of Terms

HDFS Architecture: The Hadoop Distributed File System (HDFS) is a distributed

file system designed to run on commodity hardware. It has many similarities with

existing distributed file systems. However, the differences from other distributed

file systems are significant. HDFS is highly fault-tolerant and is designed to be

deployed on low-cost hardware. HDFS provides high throughput access to

application data and is suitable for applications that have large data sets. HDFS

relaxes a few POSIX requirements to enable streaming access to file system data.

HDFS was originally built as infrastructure for the Apache Nutch web search

engine project. HDFS is part of the Apache Hadoop Core project.[1]

Reed–Solomon error correction is an error correcting code that works by

oversampling a polynomial constructed from the data. The polynomial is evaluated at

several points, and these values are sent or recorded. Sampling the polynomial more

often than is necessary makes the polynomial over-determined. As long as it receives

"many" of the points correctly, the receiver can recover the original polynomial even

in the presence of a "few" bad points.[2]
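As a concrete illustration of the oversampling idea (a toy sketch added here, not part of any cited scheme), the following Python fragment encodes a short message as the coefficients of a polynomial over a small prime field, evaluates it at more points than strictly necessary, and recovers the message from any k of the n points by Lagrange interpolation; this is the erasure-decoding case, which is the mode of redundancy used in the later chapters.

```python
# Toy Reed-Solomon-style erasure coding over the prime field GF(p).
# Illustration only: real deployments use GF(2^w) and proper RS decoders.
p = 257  # small prime, so every byte value 0..255 fits in the field

def poly_eval(coeffs, x):
    """Evaluate a polynomial with the given coefficients at x (mod p)."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % p
    return y

def encode(message, n):
    """Treat the k message symbols as coefficients; output n > k evaluations."""
    return [(x, poly_eval(message, x)) for x in range(1, n + 1)]

def decode(points, k):
    """Recover the k coefficients from any k surviving (x, y) points
    via Lagrange interpolation (erasure decoding, no error correction)."""
    xs, ys = zip(*points[:k])
    coeffs = [0] * k
    for i in range(k):
        # Build the i-th Lagrange basis polynomial incrementally.
        basis, denom = [1], 1
        for j in range(k):
            if j == i:
                continue
            new = [0] * (len(basis) + 1)      # multiply basis by (x - xs[j])
            for d, c in enumerate(basis):
                new[d] = (new[d] - c * xs[j]) % p
                new[d + 1] = (new[d + 1] + c) % p
            basis = new
            denom = (denom * (xs[i] - xs[j])) % p
        scale = (ys[i] * pow(denom, -1, p)) % p
        for d, c in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * c) % p
    return coeffs

message = [72, 101, 108, 108]              # k = 4 symbols
shares = encode(message, n=7)              # 7 evaluations, any 4 suffice
survivors = [shares[1], shares[3], shares[4], shares[6]]
assert decode(survivors, k=4) == message   # 3 lost points are tolerated
```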

Universal hashing is a randomized algorithm for selecting a hash function F with the

following property: for any two distinct inputs x and y, the probability that F(x)=F(y)

(i.e., that there is a hash collision between x and y) is the same as if F was a random

function. Thus, if F has function values in a range of size r, the probability of any

particular hash collision should be at most 1/r. There are universal hashing methods

that give a function F that can be evaluated in a handful of computer instructions.[3]
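To make this concrete, the classic construction h_{a,b}(x) = ((a·x + b) mod p) mod r, with a and b drawn at random, is one such family; the short sketch below (an illustration added here, not part of the cited definition) shows that selecting F amounts to picking a and b.

```python
import random

# Carter-Wegman style universal hash family: h_{a,b}(x) = ((a*x + b) mod p) mod r.
# p must be a prime larger than any input key; r is the range (table) size.
p = 2_147_483_647          # a Mersenne prime, larger than any 31-bit input
r = 1024                   # range size: collision probability is about 1/r

def pick_hash():
    """Randomly select one member F of the universal family."""
    a = random.randrange(1, p)
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % r

F = pick_hash()
x, y = 42, 4242
print(F(x), F(y))          # over the random choice of a, b, Pr[F(x) == F(y)] is roughly 1/r
```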

Homomorphic encryption is a form of encryption where one can perform a specific

algebraic operation on the plaintext by performing a (possibly different) algebraic

operation on the ciphertext. Depending on one's viewpoint, this can be seen as a

positive or negative attribute of the cryptosystem. Homomorphic encryption schemes

are malleable by design and are thus unsuited for secure data transmission. On the

other hand, the homomorphic property of various cryptosystems can be used to create

secure voting systems, collision-resistant hash functions and private information

retrieval schemes.[4]
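A small illustration of the property (added here for clarity, with toy parameters that are far too small and no padding, so it is not a secure construction) is textbook RSA, which is multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the two plaintexts.

```python
# Textbook RSA is multiplicatively homomorphic: Enc(m1) * Enc(m2) = Enc(m1 * m2) mod n.
# Toy parameters for illustration only; real RSA needs large primes and padding.
p, q, e = 61, 53, 17
n = p * q                          # public modulus
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)                # private exponent

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

m1, m2 = 7, 11
c_product = (enc(m1) * enc(m2)) % n      # operate on ciphertexts only
assert dec(c_product) == (m1 * m2) % n   # decrypts to the product of the plaintexts
```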

Byzantine fault is an arbitrary fault that occurs during the execution of an algorithm

by a distributed system. It encompasses those faults that are commonly referred to as

"crash failures" and "send and omission failures". When a Byzantine failure has

occurred, the system may respond in any unpredictable way, unless it is designed to

have Byzantine fault tolerance.

These arbitrary failures may be loosely categorized as follows:

• a failure to take another step in the algorithm, also known as a crash failure;

• a failure to correctly execute a step of the algorithm; and

• arbitrary execution of a step other than the one indicated by the algorithm.[5]

Glossary

ACM Access Control Matrix

MAC Message Authentication Code

HDFS Hadoop Distributed File System

SAAS Software as a Service

PAAS Platform as a Service

IAAS Infrastructure as a Service

SLA Service Level Agreement

SOA Service Oriented Architecture

ACKNOWLEDGMENT

To my supervisor, Dr. Nicholas Ioannides. Thank you for your insight, patience, encouragement and guidance, for your constant willingness to answer the frequent questions I had during the course of my research, for your continuing support which has helped me enthusiastically compile this work, and for your added humour which left a smile on my face and enabled me to work with light-heartedness. I hope that all your future endeavours will be filled with affluence. I am indeed honoured and privileged to call myself your student.

To my parents, Mum and Dad: your prayers worked. Thanks a lot.

Chapter 1 Introduction

1.1 Statement of Problem

Cloud computing is not an innovation in itself but a means of constructing IT services that use advanced computational power and improved storage capabilities; it has drawn the attention of major industrial companies, scientific communities and user groups. From the provider's view, the main focus of cloud computing is on redundant hardware connected so that downtime on any device in the network can be absorbed.

Critics argue that cloud computing is not secure enough once data leaves a company's local area network. It is up to the clients to decide on the vendors, depending on how willing those vendors are to implement secure policies and be subject to third-party verifications. [6]

Encryption is a well-known approach to addressing these types of security threats: for protection in the cloud, the enterprise would need to encrypt all data and communications. While it is not that difficult to add encryption software initially to the application environment, the new configuration requires ongoing management and maintenance. And in order to run the application in the cloud, the enterprise needs to deliver the encryption keys to the cloud to decrypt the data, creating additional security risks by exposing the keys in the operating environment. [7]

Recent advances in cryptography could mean that future cloud computing services will not only be able to encrypt documents to keep them safe in the cloud, but also make it possible to search and retrieve this information without first decrypting it. Encrypted search architectures and tools have been developed by groups at several universities and companies. Though there are a variety of different approaches, most technologies encrypt the data files as well as tags, called metadata, that describe the contents of those files, and issue a master key to the user. The token used to search through encrypted data contains functions that are able to find matches to the metadata attached to certain files and then return the encrypted files to the user. Once the user has a file, he can use his master decryption key to decrypt it. [8]

Firstly, traditional cryptographic primitives for the purpose of data security protection cannot be directly adopted, because users lose control of their data under cloud computing. Therefore, verification of correct data storage in the cloud must be conducted without explicit knowledge of the whole data. Considering the various kinds of data each user stores in the cloud and the demand for long-term continuous assurance of their data safety, the problem of verifying the correctness of data storage in the cloud becomes even more challenging. Secondly, cloud computing is not just a third-party data warehouse. The data stored in the cloud may be frequently updated by the users, including insertion, deletion, modification, appending, reordering, etc. Ensuring storage correctness under dynamic data updates is hence of paramount importance. However, this dynamic feature also makes traditional integrity insurance techniques futile and entails new solutions. Last but not least, the deployment of cloud computing is powered by data centres running in a simultaneous, cooperative and distributed manner. An individual user's data is redundantly stored in multiple physical locations to further reduce the data integrity threats. Therefore, distributed protocols for storage correctness assurance will be of most importance in achieving a robust and secure cloud data storage system in the real world. However, more research effort is needed to achieve flexible access control to large-scale dynamic data. [9]

1.2 Aims and Objectives:

The aim of the project is to study and evaluate the major security concerns of confidentiality, integrity and availability in cloud computing, in order to achieve secure data in the Infrastructure as a Service (IaaS) model.

Objectives:

Academic objectives: This project will introduce an effective and flexible distributed scheme with two salient features, utilizing a homomorphic token with distributed verification of erasure-coded data to achieve the integration of storage correctness insurance and data error localization, and designing a data security model.

To determine an efficient key management method for data block encryption.

To prevent revoked users from getting access to outsourced data through eavesdropping.

To analyse the computational, storage and communication overhead of a data access operation.

To review the basic tools from coding theory that are needed for file distribution across cloud servers.

Personal objectives:

To gain sound knowledge of cryptographic tools and encryption key methods for dynamic data storage.

To understand the issues and problems associated with data storage in cloud computing.

To gather information on cloud security issues from journals, conference papers and IT magazines.

To understand both the technical and non-technical security issues in cloud computing.

1.3 Approach and Methodology:

The approach is to develop a new scheme which integrates several advanced techniques for secure and efficient access to large-scale outsourced data in cloud computing: encrypting every data block with a different symmetric key and adopting a key derivation method; providing fine-grained access control to the outsourced data with flexible and efficient management, reducing the burden on the owner, who needs only a few secrets for key derivation and does not need to access the storage server except for data updates; and constructing the key hierarchies, key derivation procedures and mechanisms to handle dynamics in outsourced data blocks.

A further aim is an effective, flexible distributed scheme with explicit dynamic data support to ensure the correctness of users' data in the cloud, utilizing a homomorphic token system with distributed verification of erasure-coded data and working on the integration of storage correctness insurance and data error localization. The scheme should also support secure and efficient dynamic operations on data blocks, including data update, delete and append. Overall, the goal is to design an efficient mechanism for dynamic data verification and operation that achieves storage correctness, fast localization of data errors and dynamic data support, while minimizing the effect brought by data errors or server failures.

Basic tools from coding theory which are needed for file distribution are reviewed, along with the homomorphic token system, which belongs to the universal hash function family, preserves homomorphic properties and integrates with the verification of erasure-coded data; a challenge-response protocol is also derived to verify the storage correctness as well as to identify misbehaving servers. [9]

1.4 Chapter Preview: The following chapters provide an overview of the project; issues such as the project overview, statement of the problem, contribution and report outline are discussed.

Chapter 1 introduces the project. Chapter 2 explores the background and literature review. Chapter 3, the first approach, explains how to achieve secure and efficient access to outsourced data. Chapter 4 gives the security model for cloud computing. Chapter 5 addresses ensuring data storage security in cloud computing. Chapter 6 gives the critical appraisal and future work. Chapter 7 presents the conclusions.

Chapter 2 Literature Review

2.1 Cloud Computing

The concept of cloud computing addresses the next evolutionary step of distributed computing. The goal of this computing model is to make better use of distributed resources, put them together in order to achieve higher throughput, and be able to tackle large-scale computation problems. Cloud computing is not a completely new concept for the development and operation of web applications. It allows for the most cost-effective development of scalable web portals on highly available and fail-safe infrastructures. The evolution of cloud computing makes it possible to handle such massive data as an on-demand service [10].

There are three categories of cloud computing:

Software as a Service (SaaS): software offered by a third-party provider, available on demand, usually via the Internet, and configurable remotely. Examples include online word processing and spreadsheet tools, CRM services and web content delivery services (Salesforce CRM, Google Docs, etc.).

Platform as a Service (PaaS): allows customers to develop new applications using APIs deployed and configurable remotely. The platforms offered include development tools, configuration management and deployment platforms. Examples are Microsoft Azure, Force.com and Google App Engine.

Infrastructure as a Service (IaaS): provides virtual machines and other abstracted hardware and operating systems which may be controlled through a service API. Examples include Amazon EC2 and S3, Terremark Enterprise Cloud, Windows Live SkyDrive and Rackspace Cloud.

Clouds may also be divided into:

Public: available publicly; any organisation may subscribe.

Private: services built according to cloud computing principles, but accessible only within a private network.

Partner: cloud services offered by a provider to a limited and well-defined number of parties. [9][11]

Fig1. Cloud Data Storage Architecture

2.2 Cloud Computing Security Issues

Privileged user access: Information transmitted from the client through the Internet poses a certain degree of risk because of issues of data ownership; enterprises should spend time getting to know their providers and their regulations as much as possible, before assigning some trivial applications first to test the water. [6]

Regulatory compliance: Clients are accountable for the security of their solution, as they can choose between providers that allow auditing by third-party organizations that check levels of security and providers that do not.

Service Level Agreement (SLA): The vendor has to provide assurance to convince the customer on security issues in cloud computing. The SLA should cover performance management, to be reviewed regularly by the two parties in order to minimize and resolve unplanned incidents; customer duties and responsibilities; service qualities; third-party claims for breaches; exclusions; and adequate provisions for disaster recovery and business continuity planning to protect the service. [12]

2.3 Technical Security Issues:

The current browser-based authentication protocols for the cloud are not secure, as the browser is unable to issue XML-based security tokens by itself, and federated identity management systems store security tokens within the browser, where they are only protected by the Same Origin Policy; integrating TLS and the SOP in a better fashion would improve this.

A promising countermeasure approach is to perform a service instance integrity check prior to using a service instance for incoming requests in the cloud system, which protects against the cloud malware injection attack; for the metadata spoofing attack, a hash-based integrity verification of the metadata description files prior to usage is required. By strengthening these security capabilities and integrating the security-aware web service frameworks into the web browser, cloud computing security can be improved. [13]

Data location: Depending on contracts, some clients might never know in what country or what jurisdiction their data is located.

Data segregation: Encrypted information from multiple companies may be stored on the same hard disk, so a mechanism to separate data should be deployed by the provider.

Recovery: Every provider should have a disaster recovery protocol for user data.

Investigative support: If a client suspects faulty activity from the provider, it may not have many legal ways to pursue an investigation.

Long-term viability: Refers to the ability to retract a contract and all data if the current provider is bought out by another firm. [6]

Security, and data security in particular, become more important when using cloud computing at all levels: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). A major reason for the lack of effective data security is simply the limitations of current encryption capabilities; adequately detailed data lineage (mapping) is simply not possible in today's cloud computing offerings. Another major concern is data residue left behind and possibly becoming available to unauthorized parties.

These concerns with data security do not negate the advantages of utilizing storage as a service in the cloud for non-sensitive, unregulated data. If customers do want to store organizational data in the cloud, they must take explicit action, or at least verify, that the provider can and will adequately protect their data stored in the cloud.

We know how to effectively encrypt data in transit and we know how to effectively encrypt data at rest, but encrypted data cannot be processed; to do any of those important activities the data must be decrypted, hence a security concern, especially if that data is in the cloud and is beyond the data owner's direct control.

Even efforts to effectively manage data that is encrypted are extremely complex and troublesome due to the currently inadequate capabilities of key management products. Key management in an intra-organizational context is difficult enough to do effectively; key management in the cloud is frankly beyond current capabilities and will require significant advances in both encryption and key management capabilities to be viable, and claims that key management products are currently effective are naïve at best.

Due to the nature of cloud computing, such as multi-tenancy, and the volume of data likely to be put in the cloud, data security capabilities are important for the future of cloud computing. Because of that, coupled with today's inadequate encryption and key management capabilities, cryptographic research efforts such as predicate encryption are underway to limit the amount of data that can be decrypted for processing in the cloud. Recently announced capabilities of fully homomorphic encryption to process encrypted data should be a huge benefit to cloud computing. Similarly, research into large-scale multi-tenancy key management should also be encouraged, as it would be of enormous benefit to cloud computing. [13]

2.4 Requirements of Data Security in Cloud Computing

Secure data storage management is an important aspect of quality of service; cloud computing inevitably poses new challenging security threats for a number of reasons. Traditional cryptographic primitives for the purpose of data security protection cannot be directly adopted, because users lose control of their data under cloud computing. According to the Security Alliance Guidance, a secure outsourcing service should be evaluated on at least (1) strong encryption and scalable key management, (2) user provisioning and deprovisioning, and (3) system availability and performance. Securing outsourced data for multi-user access can be achieved only if both data and metadata are properly protected.

Therefore, verification of correct data storage in the cloud must be conducted without explicit knowledge of the whole data. Considering the various kinds of data each user stores in the cloud and the demand for long-term continuous assurance of their data safety, the problem of verifying the correctness of data storage in the cloud becomes even more challenging. Secondly, cloud computing is not just a third-party data warehouse. The data stored in the cloud may be frequently updated by the users, including insertion, deletion, modification, appending, reordering, etc. This dynamic feature makes traditional integrity insurance techniques futile and entails new solutions. Last but not least, the deployment of cloud computing is powered by data centres running in a simultaneous, cooperative and distributed manner. An individual user's data is redundantly stored in multiple physical locations to further reduce the data integrity threats. Therefore, distributed protocols for storage correctness assurance will be of most importance in achieving robust and secure cloud data storage.

Many earlier approaches adopted asymmetric encryption at the individual data block level, which makes the key management mechanism of secure file systems very cumbersome. While these techniques can be useful to ensure storage correctness without users possessing their data, they cannot address all the security threats in cloud data storage, since they all focus on a single-server scenario and do not consider dynamic operations. Distributed protocols have been proposed for ensuring storage correctness across multiple servers or peers; again, none of these distributed schemes is aware of dynamic data operations, and as a result their applicability in cloud data storage can be drastically limited [9][4][15].

Chapter 3

3.0 Title: Secure and Efficient Access to Outsourced Data

To enable secure and efficient access to outsourced data, investigators have tried to integrate key derivation mechanisms [16, 17, 18, 19] with encryption-based data access control. [20] proposes a generic method that uses only hash functions to derive a descendant's key in a hierarchy. The method can handle updates locally and avoid propagation. Although the proposed key derivation tree structure can be viewed as a special case of access hierarchies, the analysis shows that the proposed method serves the studied application better. In [21], the authors divide users into groups based on their access rights to the data. The users are then organized into a hierarchy and further transformed into a tree structure to reduce the number of encryption keys. This method also helps to reduce the number of keys that are given to each user during the initiation procedure. In [22], data records are organized into groups based on the users that can access them. Since the data in the same group are encrypted by the same key, changes to user access rights will lead to updates in data organization. While a creative idea in this approach is to allow servers to conduct a second-level encryption (over-encryption) to control access, repeated access revocation and grant may lead to a very complicated hierarchy structure for key management. In [23], the approach stores multiple copies of the same data record encrypted by different keys. At the same time, when access rights change, re-encryption and data updates on the server must be conducted. These operations will cause extra overhead on the server and do not fit the proposed approach. An experimental evaluation of these approaches can be found in [24].

The basic idea is to generate the data block encryption keys through a hierarchy. Every key in the hierarchy can be derived by combining its parent node and some public information. As the derivation procedure uses a one-way function, the secret keys of the parent node and sibling nodes cannot be calculated. In this way the data owner needs to maintain only the root nodes of the hierarchy, and during the key distribution procedure the owner can send the secrets in the hierarchy to end users based on their access rights. The end users will derive the leaf nodes in the hierarchy to decrypt the data blocks.

Fig2. Key Derivation Hierarchy

Assuming that the outsourced data contain n blocks with 2^(p-1) <= n < 2^p, a binary tree structure with height p can be built. The data owner will choose a root secret K_{0,1}, where the first index of a key represents its level in the hierarchy and the second index represents its sequence within that level, and so on. The data owner chooses a public hash function h(); for any node K_{i,j} in the hierarchy, its left child can be calculated as K_{i+1,2j-1} = h(K_{i,j} || (2j-1) || K_{i,j}), i.e. by sandwiching the sequence number of the child node between two copies of the parent's key and then applying the hash function. The right child of K_{i,j}, namely K_{i+1,2j}, can be calculated similarly. A node can calculate the secrets of all its descendants by applying this function repeatedly; on reaching level p of the hierarchy, the hash results can be used as keys to encrypt the data blocks.
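A minimal sketch of this key derivation hierarchy is given below. It assumes SHA-256 as the public hash function h() and a particular byte encoding of the "sandwich" h(parent || child_sequence || parent); both are illustrative assumptions rather than the exact specification in the text.

```python
import hashlib

def h(*parts):
    """Public one-way hash h() applied to the concatenation of its inputs."""
    md = hashlib.sha256()
    for part in parts:
        md.update(part)
    return md.digest()

def child_key(parent_key, child_seq):
    """Derive a child key by sandwiching the child's sequence number
    between two copies of the parent key, then hashing."""
    return h(parent_key, str(child_seq).encode(), parent_key)

def leaf_key(root_key, leaf_seq, height):
    """Walk from the root K_{0,1} down to leaf number leaf_seq at level `height`.
    Node K_{i,j} has children K_{i+1,2j-1} and K_{i+1,2j}."""
    key, j = root_key, 1
    for level in range(1, height + 1):
        bit = (leaf_seq - 1) >> (height - level) & 1   # 0 = left child, 1 = right child
        j = 2 * j - 1 + bit
        key = child_key(key, j)
    return key

# The owner keeps only the root secret; block i is encrypted under leaf key i.
root = b"owner root secret K_{0,1}"
height = 4                      # supports up to 2**4 = 16 data blocks
block_keys = [leaf_key(root, i, height) for i in range(1, 17)]
```

With this construction the owner only distributes the internal-node secrets that cover the blocks a user may read, and the user derives the leaf keys locally.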

3.1 Data Access Procedure

To prevent revoked users from getting access to outsourced data through eavesdropping, the service provider will conduct over-encryption when it sends data blocks to end users; the service provider and the end users share a pseudo-random bit sequence generator P().

Notation: O denotes the data owner, S the service provider and U an end user.

Since only U and O know the shared key K_OU, O will be able to authenticate the sender. The request index is increased by 1 every time U sends out a request, and it is used by O to defend against replay attacks. The request contains the index numbers of the data blocks that U wants to access. The Message Authentication Code (MAC) protects the integrity of the packet.

When O receives this message, it will authenticate the sender and verify the integrity. It will then examine its access control matrix and make sure that U is authorised to read all blocks in the request. If the request passes this check, the owner will determine the smallest set of keys K' in the key hierarchy such that (1) K' can derive the keys that are used to encrypt the requested data blocks, and (2) U is authorized to know all keys that can be derived from K'; this set can be determined by an algorithm.
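The text does not spell the algorithm out, but one natural way to compute K' (a hypothetical sketch over the binary hierarchy above; the node numbering and helper names are illustrative) is to start from each requested leaf and climb towards the root for as long as every leaf in the subtree remains authorized for U, then keep the distinct subtree roots reached.

```python
def minimal_key_set(requested, authorized, height):
    """Hypothetical sketch: find the smallest set of hierarchy nodes K' that
    (1) covers every requested leaf and (2) only spans leaves U may read.
    Leaves are numbered 1..2**height; a node is named (level, seq)."""
    authorized = set(authorized)

    def subtree_leaves(level, seq):
        span = 2 ** (height - level)
        first = (seq - 1) * span + 1
        return range(first, first + span)

    chosen = set()
    for leaf in requested:
        if leaf not in authorized:
            raise PermissionError(f"user not authorized for block {leaf}")
        level, seq = height, leaf
        # climb while the whole subtree of the parent is still authorized
        while level > 0:
            parent = (level - 1, (seq + 1) // 2)
            if all(l in authorized for l in subtree_leaves(*parent)):
                level, seq = parent
            else:
                break
        chosen.add((level, seq))
    return chosen

# Example: U may read blocks 1-8 and 13; it requests blocks 2, 3 and 13.
print(minimal_key_set(requested=[2, 3, 13], authorized=list(range(1, 9)) + [13], height=4))
# -> {(1, 1), (4, 13)}: one key covering blocks 1-8 plus the single key for block 13
```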

The owner will then generate the reply to the end user. The ACM index is used by O to label the freshness of the Access Control Matrix (ACM); this index is increased by 1 every time O changes some end user's access rights. The updated ACM index is sent to S by O to prevent revoked users from using old certificates to access data blocks. The seed is a random number used to initialise P() so that U can decrypt the over-encryption conducted by S. U will use K' to derive the data block encryption keys. The reply also includes a certificate (cert) for the service provider.

The user U will then send the request, together with the cert, to the service provider. When S receives this packet, it can verify that the cert was generated by O, since only they know the secret key K_OS. S will make sure that the user name and request index in the cert match the values in the packet. If the ACM index in the cert is smaller than the value that S has received from O, some changes to the access control matrix have happened and S will notify U to get a new cert. Otherwise, the service provider will retrieve the encrypted data blocks and conduct the over-encryption as follows. Using the seed as the initial state of P(), the function will generate a long sequence of pseudo-random bits. S will use this bit sequence as a one-time pad and conduct the XOR operation to encrypt the blocks. The computation results will be sent to U.

When U receives the data blocks, it will use the seed to generate the same pseudo-random bit sequence and use K' to derive the encryption keys; the data blocks are then recovered.
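The following sketch illustrates the over-encryption step. It instantiates the shared pseudo-random bit sequence generator P() with SHA-256 in counter mode, which is an illustrative assumption rather than the scheme's prescribed generator; S XORs the keystream over the already-encrypted blocks, and U removes the same keystream using the seed received from O.

```python
import hashlib

def prg(seed: bytes, length: int) -> bytes:
    """Shared pseudo-random bit generator P(): SHA-256 in counter mode.
    Illustrative choice; any generator agreed by S and the end users works."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def over_encrypt(encrypted_blocks, seed):
    """S uses the PRG output as a one-time pad and XORs it over the blocks.
    Applying the same function again with the same seed removes the layer."""
    data = b"".join(encrypted_blocks)
    pad = prg(seed, len(data))
    mixed = bytes(a ^ b for a, b in zip(data, pad))
    out, pos = [], 0
    for blk in encrypted_blocks:          # split back into the original block sizes
        out.append(mixed[pos:pos + len(blk)])
        pos += len(blk)
    return out

blocks = [b"\x11" * 16, b"\x22" * 16]        # already encrypted under the leaf keys
seed = b"fresh per-request seed from O"      # never transmitted in plaintext to others
sent = over_encrypt(blocks, seed)            # what S transmits to U
assert over_encrypt(sent, seed) == blocks    # U strips the over-encryption layer
```

Because XOR with the same keystream is an involution, one function serves both S (adding the layer) and U (removing it).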

When an end user U loses access to some data blocks, the access control matrix at O will be updated. This will be sent to S through a secure channel. If U presents the old cert to S, it will be rejected since the ACM index value is invalid. However, U could still get access to the data blocks by eavesdropping on the traffic between S and other end users if it has kept a copy of the key set K'. To defend against such attacks, the service provider conducts over-encryption before sending out the data blocks. Since, for every data request, the seed is dynamically generated by O and never transmitted in plaintext, U will not be able to regenerate the bit sequence of other end users. Therefore, unless U keeps a copy of the data blocks from a previous access, it will not be able to get the information.

3.2 Dynamics in User Access Rights

In lazy revocation it is assumed that it is acceptable for revoked users to read unmodified data blocks, but they must not be able to read updated blocks. Lazy revocation trades a degree of security for reduced re-encryption and data access overhead.

When the access right of user U to data block Di is revoked, the access control matrix in O will be updated and the ACM index increased. At the same time, O will label this data block to show that some user's access right has been revoked since its last content update. Until Di is next updated, the owner will not change the block on the outsourced storage. Since the ACM index value has been changed, U can no longer use its old cert to access Di. However, when another user gets the encrypted Di through the network, U can eavesdrop on the traffic. If the service provider does not conduct over-encryption, the data will be transmitted in the same format whoever the reader is; therefore, if U has kept a copy of the encryption key, it will still get access to Di. This result, however, is the same as if U had kept a copy of Di before its access right was revoked.

When the owner needs to change the data block from Di to D'i, it will check the label and find that some users' access rights have been revoked. Therefore it cannot encrypt the updated data block with the current key. The solution is that the owner encrypts a control block with the secret K_{p,i} and stores it in the original slot of Di; the control block indicates where the updated data is stored. When a user receives this control block from the service provider, it will submit it to the owner. The owner will derive the new key and send it back to the user. At the same time a new cert will be generated so that the user can get the new block from the service provider. A revoked user will be able to get access to the control block; however, the owner will not send the new encryption key and cert to it. Therefore the revoked user cannot get access to the updated data.

3.3 Dynamics in Outsourced Data

When a data block Di is deleted from the outsourced data, the owner will use a special control block to replace Di. The special block will be encrypted by K_{p,i} and stored at the original slot for Di on the service provider. At the same time, the owner will label its access control matrix to show that the block no longer exists. The end users can still access this control block, but they will not get any useful information from its contents.

Updating a block is conducted as follows. When the owner needs to update Di, it will use K_{p,i} to encrypt a control block and store it in the i-th slot of the outsourced data. The control block will contain (1) 2^p + i, which is the index of the block in which D'i is stored; (2) x, which is the number of times that Di has been updated; and (3) a verification value computed with the owner's secret K_verify, which is used to protect the integrity of the control block. The owner will encrypt D'i with a key derived from the secret K'_{0,1} and x, and store the result in the block with index number 2^p + i.

Fig3. Handling Updates to Data Blocks

When user U needs to access the updated data block D'i, it will first get the encrypted control block from S and submit it to the data owner. The owner will use the secret K_verify to examine the integrity of the control block. It will then use K'_{0,1} and x to derive the encryption key of D'i. The owner will return the encryption key and a new cert to U through the secure communication channel between them. U will then get D'i from the service provider. This method has several properties: all metadata is stored in the control block on the service provider, so the data owner only needs to store the two secrets K'_{0,1} and K_verify; since K_verify is known only to the owner, attackers cannot generate fake control blocks; and every time an updated data block is accessed, the control block for Di is first retrieved from the service provider.

Data blocks that are always accessed together should be given sequential block index numbers so that the owner can derive a smaller access key set K' for users. The owner can also reserve some empty slots in the outsourced data and later insert new data into these positions based on their access patterns.

3.4 Analysis of Overhead:

The proposed approach introduces very limited storage overhead. The key derivation mechanism allows the owner O to store only the root keys of the hierarchies. The end user U does not need to pre-calculate and store all data block encryption keys; on the contrary, it can calculate the keys on the fly while conducting the data block decryption operations. The service provider S needs to store an extra copy of the updated data blocks; when the data update rate is very low in the application environment, the extra storage overhead at S is also low compared to the size of the outsourced data. [15]

3.5 Conclusion:

In this paper the authors propose a mechanism to achieve secure and efficient access to outsourced data in owner-write-users-read applications, assuming that the outsourced data has a very large scale, and try to reduce the overhead at the data owner and service provider. It is proposed to encrypt every data block with a different key so that flexible cryptography-based access control can be achieved. Through the adoption of the key derivation method, the owner needs to maintain only a few secrets. Analysis shows that the key derivation procedure based on hash functions introduces very limited overhead. Over-encryption and/or lazy revocation are used to prevent revoked users from getting access to updated data blocks; mechanisms to handle both updates to outsourced data and changes in user access rights are described, and the computational, storage and communication overhead of the approach, as well as its scalability and safety, are analysed. [15]

3.6 Summary

Several advanced techniques are integrated to achieve secure and efficient access to large-scale outsourced data in cloud computing: every data block is encrypted with a different symmetric key and a key derivation method is adopted; fine-grained access control to the outsourced data is provided with flexible and efficient management, reducing the burden on the owner, who needs only a few secrets for key derivation and does not need to access the storage server except for data updates; and the key hierarchy, key derivation procedures and mechanisms to handle dynamics in outsourced data blocks are constructed. The overhead of the proposed approach was investigated for data retrieval from scientific databases [15].

Chapter 4.0 Data Security Model for Cloud Computing:

In the cloud computing environment the traditional access control mechanism has serious shortcomings, and the new architecture is built with technologies such as Hadoop and HBase, which enhance the performance of cloud systems but bring in risks at the same time.

By analysing HDFS, the data security needs of cloud computing can be identified, starting with the client authentication requirements at login. The vast majority of cloud computing applications are accessed through a browser client, so verifying the user's identity is a primary need. If the NameNode is attacked or fails there will be disastrous consequences for the system, so the effectiveness and efficiency of the NameNode is key to the success of data protection, and enhancing the NameNode's security is very important.

Fig4. HDFS Architecture

As the DataNode is the data storage node, there is the possibility of failure, and the availability of data cannot be guaranteed. Currently each data storage block in HDFS has at least three replicas, which is the HDFS backup strategy. When it comes to how to ensure the safe reading and writing of data, HDFS has not provided any detailed explanation, so the need to ensure rapid recovery and to make the reading and writing of data fully controllable cannot be ignored. In addition, access control, file encryption and similar demands on a cloud computing model for data security must be taken into account.

All data security techniques are built on the three basic principles of confidentiality, integrity and availability. Confidentiality refers to hiding the actual data or information; in cloud computing the data is stored in data centres, so security and confidentiality become even more important. Integrity means that data in any state must be guaranteed against unauthorized deletion, modification or damage.

The data model of cloud computing can be described mathematically as follows.

Fig5. Cloud Computing Data Security Model

The model uses a three-level defence structure in which each layer performs its own duty to ensure the data security of the cloud.

The first layer is responsible for user authentication, issuing the appropriate digital certificates to users and managing user permissions.

The second layer is responsible for encrypting users' data and protecting the privacy of users in a defined way.

The third layer provides fast recovery of user data and is the last layer of protection. With this three-level structure, user authentication is used to ensure that data is not tampered with; an authenticated user can manage the data through operations such as add, modify and delete. If the user authentication system is deceived by illegal means and a malign user enters the system, file encryption and privacy protection provide the next level of defence: at this layer user data is encrypted, so even if the key is illegally accessed, through privacy protection the malign user will still be unable to obtain effective access to the information, which is very important for protecting business users' trade secrets in cloud computing. Finally, the rapid file restoration layer, through a fast recovery algorithm, enables user data to be recovered to the maximum extent even in case of damage.

Hence the cloud computing model for data security is designed [25].

4.1 Conclusion

Data security becomes more important in cloud computing; with an analysis of the HDFS architecture, the data security requirements of cloud computing are identified and a mathematical model is designed. [25]

4.2 Summary:

The cloud computing environment is a dynamic environment in which the user's data is transmitted from the data centre to the user's client and the user's data changes all the time. HDFS, used in large-scale cloud computing, is a typical distributed file system architecture. All data security techniques are built on confidentiality, integrity and availability; taking these into consideration, a mathematical data model is designed [25].

Chapter 5.0 Ensuring Data Storage Security in Cloud Computing

[27] described a formal "proof of retrievability" (POR) model for ensuring remote data integrity. Their scheme combines spot-checking and error-correcting code to ensure both possession and retrievability of files on archive service systems. [28] built on this model and constructed a random linear function based homomorphic authenticator which enables an unlimited number of queries and requires less communication overhead. [29] proposed an improved framework for POR protocols that generalizes both [27] and [28]; later, in their subsequent work, [29] extended the POR model to distributed systems. However, all these schemes focus on static data. The effectiveness of their schemes rests primarily on the pre-processing steps that the user conducts before outsourcing the data file F. Any change to the contents of F, even of a few bits, must propagate through the error-correcting code, thus introducing significant computation and communication complexity. [30] define the "provable data possession" (PDP) model for ensuring possession of files on untrusted storage. Their scheme utilizes public-key-based homomorphic tags for auditing the data file, thus providing public verifiability; however, it requires computation overhead that can be expensive for an entire file. In their subsequent work, [31] described a PDP scheme that uses only symmetric key cryptography. This method has lower overhead than their previous scheme and allows for block updates, deletions and appends to the stored file, which has also been supported in this work. However, their scheme focuses on the single-server scenario and does not address small data corruptions, leaving both the distributed scenario and the data error recovery issue unexplored. [32] aimed to ensure data possession of multiple replicas across a distributed storage system. They extended the PDP scheme to cover multiple replicas without encoding each replica separately, providing guarantees that multiple copies of the data are actually maintained. In other related work, [33] presented a P2P backup scheme in which blocks of a data file are dispersed across m+k peers using an (m+k, m)-erasure code. Peers can request random blocks from their backup peers and verify the integrity using separate keyed cryptographic hashes attached to each block. Their scheme can detect data loss from free-riding peers, but does not ensure that all data is unchanged. [34] proposed verifying data integrity using an RSA-based hash to demonstrate uncheatable data possession in peer-to-peer file sharing networks. However, their proposal requires exponentiation over the entire data file, which is clearly impractical for the server whenever the file is large. [35] proposed allowing a TPA to keep online storage honest by first encrypting the data and then sending a number of pre-computed symmetric-keyed hashes over the encrypted data to the auditor. However, their scheme only works for encrypted files, and auditors must maintain long-term state. [36] proposed ensuring file integrity across multiple distributed servers using erasure coding and block-level file integrity checks; however, their scheme only considers static data files and does not explicitly study the problem of data error localization, which this approach considers.

In this approach, an effective and flexible distributed scheme with explicit dynamic data support to ensure the correctness of users' data in the cloud is proposed. Erasure-correcting code in the file distribution preparation provides redundancy and guarantees data dependability, by which this construction drastically reduces the communication overhead. To achieve the storage correctness insurance as well as data error localization, the homomorphic token with distributed verification of erasure-coded data is utilized.

The main idea is as follows. Before file distribution, the user pre-computes a certain number of short verification tokens on each individual vector G^(j) (j ∈ {1, ..., n}), each token covering a random subset of data blocks. To ensure storage correctness, the user challenges the provider with a set of randomly generated block indices; the server computes a short signature over the specified blocks and returns it to the user, who compares it with the pre-computed tokens. The requested response values for the integrity check must also form a valid codeword determined by the secret matrix P.

5.1 Challenge Token Preparation

Suppose the user wants to challenge the cloud servers t times to ensure the correctness of data storage. The user must then pre-compute t verification tokens for each G^(j) (j ∈ {1, ..., n}), using a PRF f(.), a PRP Φ(.), a challenge key k_chal and a master permutation key K_PRP. To generate the i-th token for server j, the user acts as follows:

Derive a random challenge value α_i of GF(2^p) by α_i = f_{k_chal}(i), and a permutation key K^(i)_PRP based on K_PRP.

Compute the set of r randomly chosen indices {I_q = Φ_{K^(i)_PRP}(q) | 1 <= q <= r}.

Calculate the token as v_i^(j) = Σ_{q=1}^{r} α_i^q · G^(j)[I_q].

Here v_i^(j), an element of GF(2^p) of small size, is the response the user expects to receive from server j when the user challenges it on the specified data blocks.

Once all tokens are computed, the final step before file distribution is to blind each parity block g_i^(j) in the parity vectors with a pseudo-random value generated from the secret key k_j, where k_j is the secret key for parity vector G^(j) (j ∈ {1, ..., n}), before the vectors are dispersed across the cloud servers S1, S2, ..., Sn.
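A compact sketch of the token pre-computation is shown below. It works over a small prime field instead of GF(2^p) and uses HMAC-based stand-ins for the PRF f(.) and the PRP Φ(.); these substitutions, and the toy parameters, are illustrative assumptions rather than the primitives prescribed by the scheme.

```python
import hmac, hashlib

P = (1 << 31) - 1   # small prime field stands in for GF(2^p) in this sketch

def prf(key: bytes, i: int) -> int:
    """PRF f_key(i): derives the random challenge value alpha_i."""
    digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P

def prp_indices(key: bytes, i: int, r: int, num_blocks: int):
    """PRP-like selection of r distinct block indices for the i-th challenge."""
    ranked = sorted(range(num_blocks),
                    key=lambda q: hmac.new(key, f"{i}:{q}".encode(), hashlib.sha256).digest())
    return ranked[:r]

def precompute_token(vector, i, k_chal, k_prp, r):
    """v_i = sum over q of alpha_i**q * vector[phi(q)]  (mod P), as in Section 5.1."""
    alpha = prf(k_chal, i)
    idx = prp_indices(k_prp, i, r, len(vector))
    return sum(pow(alpha, q + 1, P) * vector[idx[q]] for q in range(r)) % P

# Example: one encoded vector G^(j) held by server j, and t = 2 pre-computed tokens.
G_j = [3, 14, 15, 92, 65, 35, 89, 79]        # toy data blocks of server j
k_chal, k_prp = b"challenge key", b"permutation key"
tokens = [precompute_token(G_j, i, k_chal, k_prp, r=4) for i in range(2)]
```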

5.2 Correctness Verification and Error Localization

The user reveals α_i as well as the i-th permutation key K^(i)_PRP to each server. The server storing vector G^(j) aggregates those r rows specified by K^(i)_PRP into a linear combination R_i^(j) = Σ_{q=1}^{r} α_i^q · G^(j)[Φ_{K^(i)_PRP}(q)]. Upon receiving R_i^(j) from all the servers, the user takes away the blind values from the responses of the parity servers. As all the servers operate over the same subset of indices, the linear aggregation of these r specified rows has to be a codeword in the encoded file matrix.
Once an inconsistency among the storage servers has been detected, the pre-computed verification tokens are relied upon to further determine where the potential data errors lie. Each response R_i^(j) is computed in exactly the same way as the token v_i^(j); thus the user can simply find out which server is misbehaving by verifying, for each of the n servers, whether R_i^(j) equals v_i^(j). The algorithm gives the details of correctness verification and error localization.
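Continuing the sketch from Section 5.1 (and reusing its prf, prp_indices and precompute_token helpers along with the field modulus P and the two keys), the fragment below shows how each server's response is formed and how a mismatch with the pre-computed token points at the misbehaving server. Parity blinding is omitted, and r is set to the full length of the toy vectors so the single simulated corruption is certain to be caught; real spot-checking uses a smaller r and detects corruption probabilistically over repeated challenges.

```python
def server_response(vector, i, alpha, k_prp, r):
    """Server j aggregates the r specified rows of its vector into R_i^(j)."""
    idx = prp_indices(k_prp, i, r, len(vector))
    return sum(pow(alpha, q + 1, P) * vector[idx[q]] for q in range(r)) % P

def localize_errors(servers, tokens, i, k_chal, k_prp, r):
    """Compare each server's response with its pre-computed token; any mismatch
    flags that server as misbehaving for this challenge round."""
    alpha = prf(k_chal, i)
    return [j for j, vec in enumerate(servers)
            if server_response(vec, i, alpha, k_prp, r) != tokens[j][i]]

# Three toy servers; r covers all 8 blocks, so a single corruption is surely hit.
servers = [[3, 14, 15, 92, 65, 35, 89, 79],
           [2, 71, 82, 81, 82, 84, 59, 4],
           [16, 18, 3, 39, 88, 74, 98, 94]]
tokens = [[precompute_token(v, i, k_chal, k_prp, r=8) for i in range(2)] for v in servers]
servers[2][5] += 1                                   # simulated data corruption on server 2
print(localize_errors(servers, tokens, i=0, k_chal=k_chal, k_prp=k_prp, r=8))  # -> [2]
```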

5.3 File Retrieval and Error Recovery

Since the layout considered here is systematic, the user can reconstruct the original file by downloading the data vectors from the first m servers, assuming that they return the correct response values. This verification scheme is based on random spot-checking, so the storage correctness assurance is probabilistic; by choosing the system parameters (e.g. r, t, l) appropriately and conducting enough rounds of verification, file retrieval can be guaranteed. Whenever data corruption is detected, the comparison of pre-computed tokens and received response values guarantees the identification of the misbehaving servers. The user can always ask the servers to send back the blocks of the r rows specified in the challenge and regenerate the correct blocks by erasure correction. The newly recovered blocks can then be redistributed to the misbehaving servers to maintain the correctness of storage.

Algorithm for Error Recovery

5.4 Providing Dynamic Data Operation Support:

5.4.1 Update Operation

Due to the linear property of the Reed-Solomon code, a user can perform the update operation and generate the updated parity blocks by using Δf_ij only, without involving any other unchanged blocks. In the general update matrix ΔF, zero elements are used to denote the unchanged blocks. To maintain the corresponding parity vectors as well as remain consistent with the original file layout, the user can multiply ΔF by the generator matrix A, so that the update information ΔG = ΔF · A is generated for both the data vectors and the parity vectors, where ΔG^(j) denotes the update information for the parity vector G^(j).
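The linearity argument can be seen in the small sketch below, which uses ordinary integer matrices and an arbitrary full-rank generator matrix A purely for demonstration (the actual scheme works over GF(2^p) with a Reed-Solomon derived dispersal matrix): a sparse ΔF multiplied by A produces exactly the deltas the servers need to apply.

```python
import numpy as np

# Toy demonstration of the linear update property G = F . A  =>  ΔG = ΔF . A.
# Real scheme: arithmetic over GF(2^p) with a Reed-Solomon generator matrix A.
F = np.array([[1, 2, 3],
              [4, 5, 6]])            # logical data file matrix (2 rows of 3 blocks)
A = np.array([[1, 1, 1, 1],
              [1, 2, 3, 4],
              [1, 4, 9, 16]])        # arbitrary full-rank generator matrix for the demo
G = F @ A                            # encoded file vectors stored on the servers

delta_F = np.zeros_like(F)
delta_F[0, 1] = 7 - F[0, 1]          # update a single block: f_{0,1} becomes 7
delta_G = delta_F @ A                # only ΔF is needed to refresh the stored vectors

F_new = F + delta_F
assert np.array_equal(F_new @ A, G + delta_G)   # servers can apply ΔG in place
```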

5.4.2 Delete and Insert Operation: Deletion is a special case of the update operation, in which the original data blocks are replaced with zeros or some predetermined special blocks, by setting the corresponding Δf_ij in ΔF so that the block becomes zero (or the special block); the updated parity information has to be blinded using the same method specified in the update operation.

An insert operation may affect many rows in the logical data file matrix F, and a substantial number of computations are required to renumber all the subsequent blocks as well as to recompute the challenge-response tokens.

5.4.3 Append Operation: Suppose the user wants to append m blocks at the end of the file F. With the secret matrix P, the user can directly calculate the append blocks for each parity server. When the user is ready to append the new blocks, both the file blocks and the corresponding parity blocks are generated; the total length of each vector G^(j) will increase and fall into the range [l, l_max]. Therefore the user will update the affected tokens by adding the contribution of the newly appended blocks to the old v_i whenever those blocks are selected by a pre-computed challenge. The parity blinding is similar to that introduced in the update operation.

Through detailed security and performance analysis it is shown that this scheme is highly efficient and resilient to Byzantine failures, malicious data modification attacks and even server colluding attacks [9].

5.5 Conclusion

In this paper, the problem of data security in cloud data storage, which is essentially a distributed storage system, is investigated. To ensure the correctness of users' data in cloud data storage, an effective and flexible distributed scheme with explicit dynamic data support, including block update, delete and append, is proposed. It relies on erasure-correcting code in the file distribution preparation to provide redundant parity vectors and guarantee data dependability. By utilizing the homomorphic token with distributed verification of erasure-coded data, the scheme achieves the integration of storage correctness insurance and data error localization: whenever data corruption has been detected during the storage correctness verification across the distributed servers, the simultaneous identification of the misbehaving server(s) is almost guaranteed. Through detailed security and performance analysis, it is shown that this scheme is highly efficient and resilient to Byzantine failure, malicious data modification attack, and even server colluding attacks. [9]

5.6 Summary

The authors proposed an effective and flexible distributed scheme with explicit dynamic data support to ensure the correctness of users' data in the cloud by utilizing a homomorphic token system with distributed verification of erasure-coded data, and worked on the integration of storage correctness insurance and data error localization. Further, the proposed scheme supports secure and efficient dynamic operations on data blocks, including data update, delete and append. The analysis shows that the proposed scheme is efficient and resilient against Byzantine failure, malicious data modification attack and even server colluding attacks.

The authors aim to design an efficient mechanism for dynamic data verification and operation that achieves storage correctness, fast localization of data errors and dynamic data support, while minimizing the effect brought by data errors or server failures.

The authors reviewed basic tools from coding theory which are needed for file distribution, and the homomorphic token system, which belongs to the universal hash function family, preserves homomorphic properties and is perfectly integrated with the verification of erasure-coded data; they also derived a challenge-response protocol to verify the storage correctness as well as to identify misbehaving servers. [9]

Chapter 6

Critical Appraisal, Recommendations and Suggestions for Future Work

The approach proposed in Chapter 3 encrypts every data block with a different key so that cryptography-based access control can be achieved flexibly. The owner has to maintain only a few secrets by adopting the key derivation method, and the analysis shows that the key derivation procedure using hash functions introduces very limited computational overhead. The approach provides fine-grained access control to outsourced data with flexible and efficient management, and the owner does not need to access the storage server except for data updates. A comprehensive mechanism is introduced to handle dynamics in user access rights and updates to outsourced data, and this mechanism does not depend on any specific encryption algorithm, so end users can make their own choices based on the requirements of the application. The key derivation tree structure allows a data consumer to use a few keys to generate all the secrets it needs. The key distribution and update problem is beyond this approach, which considered only the simple case of outsourced data with a single owner; it could be extended to scenarios in which the data has multiple owners and each of them can choose data blocks independently.

To maintain data consistency, the update operations should be executed in an orderly manner when owners want to change the data contents; this can be achieved through a semaphore flag at the service provider, which is not discussed in this approach [15].

The approach proposed in Chapter 4 treats the cloud computing environment as a dynamic environment in which the user's data is transmitted from the data centre to the user's client and changes all the time; HDFS, used in large-scale cloud computing, is a typical distributed file system architecture. All data security techniques are built on confidentiality, integrity and availability, and taking these into consideration a mathematical data model is designed [25].

The approach proposed in Chapter 5 utilizes the homomorphic token with distributed verification of erasure-coded data to achieve the integration of storage correctness insurance and data error localization, that is, to identify misbehaving servers; it further supports secure and efficient dynamic operations on data blocks, including data update, delete and append. This construction drastically reduces the communication and storage overhead as compared to traditional replication-based file distribution techniques. Extensive security and performance analysis shows that the proposed scheme is highly efficient and resilient against Byzantine failure, malicious data modification attack and even server colluding attacks.

It is assumed that the point-to-point communication channels between each cloud server and the user are authenticated and reliable, which can be achieved in practice with little overhead, but multipoint communication is not considered. The issue of data privacy is not addressed, as in cloud computing data privacy is orthogonal to the proposed approach. An efficient insert operation is difficult to support in the given approach, as it may affect many rows in the logical data file matrix and a substantial number of computations are required to renumber all the subsequent blocks as well as to re-compute the challenge-response tokens [9].

6.1 Future work

To study the semaphore flag at the service provider, drawing on operating systems and distributed database techniques for access to shared resources, and to work on new key management schemes for write-many-read applications.

To work on an efficient insert operation for dynamic data, on publicly verifiable models and dynamic cloud data storage, and on fine-grained data error localization.

Comparing the three approaches shows that none of them on its own can be relied upon to secure data storage in cloud computing; the area is full of challenges, is of paramount importance and is still in its infancy, and further work on a data model architecture should be considered to secure data in the cloud.

Chapter 7

Conclusions

This dissertation reviewed the basic coding tools, such as homomorphic tokens, that are needed for file distribution across cloud servers, and studied the analyses of the proposed approaches with respect to the computational, storage and communication overhead of a data access operation. To prevent revoked users from gaining access to outsourced data through eavesdropping, efficient key management methods such as the key derivation hierarchy were presented, and a mathematical data model for data security in the cloud was described.

Appendix-A

Securing Data Storage in Cloud Computing

Anil Kumar Mysa E-mail: akm0144@londonmet.ac.uk

Supervisor: Dr. Nicholas Ioannides
Computer Networking
London Metropolitan University
London, U.K.

ABSTRACT

Cloud computing has become a significant technology trend and many experts expect it to reshape information technology processes and the IT market place in the next few years. Security, and data security in particular, becomes more important when using cloud computing at all levels: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). A major reason for the lack of effective data security is simply the limitations of current encryption capabilities. In this paper novel techniques such as the key derivation method and homomorphic tokens are discussed, and backup-assisted revocation schemes are proposed, to achieve stronger data security, storage correctness and fast localization of data errors. This is followed by a discussion of the performance evaluation and an outlook into the future.

INTRODUCTION

Cloud computing is not an innovation but a means of constructing IT services that use advanced computational power and improved storage capabilities, and it has drawn the attention of major industrial companies and scientific communities as well as user groups. The main focus of cloud computing, from the provider's view, is extraneous hardware connected to support downtime on any device in the network. Critics argue that cloud computing is not secure enough once data leaves the company's local area network. It is up to the clients to decide on vendors, depending on how willing they are to implement secure policies and be subject to third party verifications.

Encryption is a well known approach to addressing these types of security threats; for protection in the cloud, the enterprise would need to encrypt all data and communications. While it is not that difficult to add encryption software initially to the application environment, the new configuration requires ongoing management and maintenance. And in order to run the application in the cloud, the enterprise needs to deliver the encryption keys to the cloud to decrypt the data, creating additional security risks by exposing the keys in the operating environment. Recent advances in cryptography could mean that future cloud computing services will not only be able to encrypt documents to keep them safe in the cloud but also make it possible to search and retrieve this information without first decrypting it.

Encrypted search architectures and tools have been developed by groups at several universities and companies. Though there are a variety of different approaches, most technologies encrypt the data file as well as tags called metadata that describe the contents of those files and issue a master key to the
user. The token used to search through encrypted data contains functions that are able to find matches to the metadata attached to certain files and then return the encrypted files to the user. Once the user has the file, he can use his master decryption key to decrypt it.

Firstly, traditional cryptographic primitives for the purpose of data security protection cannot be directly adopted, due to the users' loss of control over data under cloud computing. Therefore, verification of correct data storage in the cloud must be conducted without explicit knowledge of the whole data. Considering the various kinds of data each user stores in the cloud and the demand for long term continuous assurance of their data safety, the problem of verifying the correctness of data storage in the cloud becomes even more challenging. Secondly, cloud computing is not just a third party data warehouse. The data stored in the cloud may be frequently updated by the users, including insertion, deletion, modification, appending, reordering, etc. Ensuring storage correctness under dynamic data update is hence of paramount importance. However, this dynamic feature also makes traditional integrity insurance techniques futile and entails new solutions. Last but not least, the deployment of cloud computing is powered by data centres running in a simultaneous, cooperated and distributed manner. Individual users' data is redundantly stored in multiple physical locations to further reduce the data integrity threats. Therefore, distributed protocols for storage correctness assurance will be of most importance in achieving a robust and secure cloud data storage system in the real world. However, more research efforts are needed to achieve flexible access control to large scale dynamic data.

To enable secure and efficient access to outsourced data, investigators have tried to integrate key derivation mechanisms [16, 17, 18, 19] with encryption-based data access control. [20] proposes a generic method that uses only hash functions to derive a descendant's key in a hierarchy. The method can handle updates locally and avoid propagation. Although the proposed key derivation tree structure can be viewed as a special case of access hierarchies, the analysis shows that the proposed method serves the studied application better. In [21], the authors divide users into groups based on their access rights to the data. The users are then organized into a hierarchy and further transformed into a tree structure to reduce the number of encryption keys. This method also helps to reduce the number of keys that are given to each user during the initiation procedure. In [22], data records are organized into groups based on the users that can access them. Since the data in the same group are encrypted by the same key, changes to user access rights will lead to updates in the data organization. While a creative idea in this approach is to allow servers to conduct a second level of encryption (over-encryption) to control access, repeated access revocation and grant may lead to a very complicated hierarchy structure for key management. In [23], the approach stores multiple copies of the same data record encrypted by different keys; when access rights change, re-encryption and data updates to the server must be conducted. These operations cause extra overhead on the server and do not fit into the proposed approach. An experimental evaluation of these approaches can be found in [24].

The basic idea is to generate the data block encryption keys through a hierarchy. Every key in the hierarchy can be derived by combining its parent node and some public information. As the derivation procedure uses a one-way function, the secret keys of the parent node and sibling nodes cannot be calculated. In this way the data owner needs to maintain only the root nodes of the hierarchy, and during the key distribution procedure the owner can send the secrets in the hierarchy to end users based on their access rights. The end users will derive the leaf nodes in the hierarchy to decrypt the data blocks.
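The hierarchy idea just described can be sketched as follows. The exact derivation in [15] combines the parent key with the child's sequence number; the concrete construction below (hashing the parent key with the child index) is an assumption used only for illustration.

```python
import hashlib

def child_key(parent_key: bytes, child_seq: int) -> bytes:
    """One-way derivation of a child node's key from its parent node's key."""
    return hashlib.sha256(parent_key + child_seq.to_bytes(4, "big")).digest()

def leaf_keys(node_key: bytes, level: int, height: int) -> list[bytes]:
    """All block-encryption keys (leaves) derivable from one node of the hierarchy."""
    if level == height:
        return [node_key]
    return (leaf_keys(child_key(node_key, 0), level + 1, height) +
            leaf_keys(child_key(node_key, 1), level + 1, height))

root = b"K_0_1-root-secret"                       # the only secret the data owner keeps
all_block_keys = leaf_keys(root, 0, 3)            # 8 leaf keys for 8 data blocks
subtree_keys = leaf_keys(child_key(root, 0), 1, 3)   # a user given this internal node
assert subtree_keys == all_block_keys[:4]             # can derive exactly its 4 leaves
```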

[Figure: Key Derivation Hierarchy]

Assuming that the outsourced data contains n blocks and 2^(p-1) <= n < 2^p, a binary tree structure of height p can be built. The data owner chooses a root secret K0,1, where the first index of a key represents its level in the hierarchy and the second index represents its sequence within that level, and so on. The data owner also chooses a public hash function h( ); for any node Ki,j in the hierarchy, its left child can be calculated by sandwiching the sequence number of the child node with the parent's key and then applying the hash function, and the right child of Ki,j is calculated in the same way. A node can calculate the secrets of all its descendants by applying this function repeatedly, and on reaching level p of the hierarchy the hash results can be used as the keys to encrypt the data blocks.

Data Access Procedure

To prevent revoked users from getting access to outsourced data through eavesdropping, the service provider will conduct over-encryption when it sends data blocks to end users; the service provider and the end users share a pseudo random bit sequence generator P( ).

Representing:
O: data owner,
S: service provider,
U: end user.

Since only U and O know Kou, O will be able to authenticate the sender of a request. The request index will be increased by 1 every time U sends out a request, and it is used by O to defend against replay attacks. The request contains the index numbers of the data blocks that U wants to access, and the Message Authentication Code (MAC) protects the integrity of the packet.

When O receives this message, it will authenticate the sender and verify the integrity of the message; it will then examine its access control matrix and make sure that U is authorised to read all the blocks in the request. If the request passes this check, the owner will determine the smallest set of keys K' in the key hierarchy such that (1) K' can derive the keys that are used to encrypt the requested data blocks, and (2) U is authorized to know all the keys that can be derived from K'; this set can be determined by an algorithm. The owner will then generate the reply to the end user.

The ACM index is used by O to label the freshness of the Access Control Matrix (ACM). This index will be increased by 1 every time O changes some end user's access rights. The updated ACM index will be sent to S by O to prevent revoked users from using old certificates to access data blocks. The seed is a random number used to initiate P( ) so that U can decrypt the over-encryption conducted by S. U will use K' to derive the data block encryption keys.

The certificate that O issues for the service provider has a fixed format (given in the original as a figure). The user U will send the request together with this cert to the service provider. When S receives this packet, it can verify that the cert was generated by O, since only they know the secret key Kos. S will make sure that the user name and the request index in the cert match the values in the packet. If the ACM index in the cert is smaller than the value that S has received from O, some changes to the access control matrix have happened and S will notify U to obtain a new cert. Otherwise, the service provider will retrieve the encrypted data blocks and conduct the over-encryption as follows. Using the seed as the initial state of P( ), the function generates a long sequence of pseudo random bits. S uses this bit sequence as a one time pad and applies the xor operation to encrypt the blocks. The results of this computation are sent to U.

When U receives the data blocks, it will use the seed to generate the same pseudo random bit sequence and use K' to derive the encryption keys; the data blocks are then recovered.

When an end user U loses access to some data blocks, the access control matrix at O will be updated, and the update will be sent to S through a secure channel. If U presents the old cert to S, it will be rejected since the ACM index value is invalid. However, U can still get access to the data blocks by eavesdropping on the traffic between S and other end users if it has kept a copy of the key set K'. To defend against such attacks the service provider can conduct over-encryption before sending out the data blocks. Since for every data request the seed is dynamically generated by O and never transmitted in plaintext, U will not be able to regenerate the bit sequence of other end users. Therefore, unless U keeps a copy of the data blocks from a previous access, it will not be able to get the information.

Dynamics in User Access Rights

In lazy revocation it is assumed that it is acceptable for revoked users to read unmodified data blocks; however, they must not be able to read updated blocks. Lazy revocation trades re-encryption and data access overhead for a degree of security.

When the access rights of user U to data block Di are revoked, the access control matrix in O will be updated and the ACM index increased. At the same time, O will label this data block to show that some user's access right has been revoked since its last content update. Until Di is next updated, the owner will not change the block on the outsourced storage. Since the ACM index value has been changed, U can no longer use its old cert to access Di. However, when another user gets the encrypted Di through the network, U can eavesdrop on the traffic; since the service provider refuses to conduct over-encryption, the data will be transmitted in the same format whoever the reader is, so U only needs to have kept a copy of the encryption key to get access to Di. This result, however, is the same as if U had kept a copy of Di itself before its access right was revoked.

When the owner needs to change the data block from Di to D'i, it will check the label and find that some user's access rights have been revoked; therefore it cannot encrypt the updated data block with the current key. The solution to this drawback is that the owner will encrypt a control block with the secret Kp,i, indicating where the updated data is stored. When a user receives this control block from the service provider, it will submit it to the owner. The owner will derive the new key and send it back to the user, and at the same time a new cert will be generated so that the user can get the new block from the service provider. A revoked user will be able to get access to the control block; however, the owner will not send the new encryption key and cert to it, and therefore the revoked user cannot get access to the updated data.
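A minimal sketch of the over-encryption step described above: the service provider expands the per-request seed into a pseudo-random bit stream and XORs it onto the already-encrypted blocks, and the end user, who knows the same seed, strips the layer off again. The stream generator below (SHA-256 in counter mode) is an illustrative stand-in; the approach only requires some shared pseudo random bit sequence generator P( ).

```python
import hashlib

def prg_stream(seed: bytes, length: int) -> bytes:
    """Expand a seed into `length` pseudo-random bytes (SHA-256 in counter mode)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def over_encrypt(block: bytes, seed: bytes) -> bytes:
    """XOR one-time-pad layer applied by S; applying it a second time removes it."""
    pad = prg_stream(seed, len(block))
    return bytes(b ^ p for b, p in zip(block, pad))

encrypted_block = b"\x12\x34\x56\x78" * 8       # block already encrypted under its data key
seed = b"per-request-seed-from-owner"
sent_to_user = over_encrypt(encrypted_block, seed)   # what travels over the network
recovered = over_encrypt(sent_to_user, seed)         # the user removes the extra layer
assert recovered == encrypted_block
```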

Dynamics in Outsourced Data

When a data block Di is deleted from the outsourced data, the owner will use a special control block to replace Di. The special block will be encrypted by Kp,i and stored at the original slot for Di on the service provider. At the same time, the owner will label its access control matrix to show that the block no longer exists. The end users can still access this control block but they will not get any useful information from its contents.

The updating of a data block can be conducted as follows. When the owner needs to update Di, it will use Kp,i to encrypt the control block and store it in the i-th block of the outsourced data. The control block will contain (1) (2^p + i), which is the index of the block in which D'i is stored, (2) x, which is the number of times that Di has been updated, and (3) a value that is used to protect the integrity of the control block. The owner will encrypt D'i with a newly derived key and store the result in the block with index number (2^p + i).

Handling updates to data blocks

When user U needs to access the updated data block D'i, it will first get the encrypted control block from S and submit it to the data owner. The owner will use the secret k_verify to examine the integrity of the control block. It will then use K'0,1 and x to derive the encryption key of D'i. The owner will return the encryption key and a new cert to U through the secure communication channel between them, and U will then get D'i from the service provider. This method has several properties. All the metadata is stored in the control block on the service provider, so the data owner only needs to store the two secrets K'0,1 and k_verify; and since k_verify is known only to the owner, attackers cannot generate fake control blocks whenever the data block Di is requested from the service provider.

The data blocks that are always accessed together should be given sequential block index numbers, so that the owner can derive a smaller access key set K' for users. The owner can also reserve some empty slots in the outsourced data and later insert new data into these positions based on their access patterns.

Analysis of Overhead:

The proposed approach introduces very limited storage overhead. The key derivation mechanism allows the owner O to store only the root keys of the hierarchies. The end user U does not need to pre-calculate and store all the data block encryption keys; on the contrary, it can calculate the keys on the fly when it is conducting the data block decryption operations. The service provider S needs to store an extra copy of the updated data blocks; when the data update rate in the application environment is very low, this extra storage overhead at S is also low compared with the size of the outsourced data [15].

Data Security Model for Cloud Computing:

In the cloud computing environment the traditional access control mechanism has serious shortcomings, so a new architecture is built, combined with Hadoop and HBase technologies, which enhances the performance of cloud systems but brings in risks at the same time.

By analyzing HDFS, the data security needs of cloud computing can be divided as follows. The first is the client authentication requirement at login: the vast majority of cloud computing is accessed through a browser client, and establishing the user's identity is the primary need of cloud computing applications.

If the namenode is attacked or fails, there will be disastrous consequences for the system, so the effectiveness and efficiency of the namenode in cloud computing is key to the success of data protection, and enhancing the namenode's security is very important.

[Figure: HDFS Architecture]

As the datanode is the data storage node, there is the possibility of failure and the availability of data cannot be guaranteed. Currently each data storage block in HDFS has at least three replicas, which is the HDFS backup strategy. When it comes to how to ensure safe reading and writing of data, HDFS has not given any detailed explanation, so the need to ensure rapid recovery and to make reading and writing operations on data fully controllable cannot be ignored. In addition, access control, file encryption and similar demands of the cloud computing model for data security must be taken into account.

All data security techniques are built on the three basic principles of confidentiality, integrity and availability. Confidentiality refers to hiding the actual data or information; in cloud computing, where the data is stored in data centres, security and confidentiality are all the more important. Integrity requires guaranteeing that data in any state is not subject to unauthorized deletion, modification or damage.

The data model of cloud computing can be described in mathematical form [the formula is given as a figure in the original].

[Figure: Cloud Computing Data Security Model]

The model uses a three-level defence system structure in which each layer performs its own duty to ensure the data security of the cloud. The first layer is responsible for user authentication: appropriate digital certificates are issued to the user and user permissions are managed. The second layer is responsible for encrypting users' data and protects the privacy of users in a certain way. The third layer provides fast recovery of user data and is the last layer of protection in the three-level structure.

User authentication is used to ensure that data is not tampered with: an authenticated user can manage the data through operations such as add, modify and delete. If the user authentication system is deceived by illegal means and a malign user enters the system, file encryption and privacy protection provide the next level of defence. In this layer the user data is encrypted, so even if the key is illegally accessed, the malign user will still be unable to obtain effective access to the information through the privacy protection, which is very important for protecting business users' trade secrets in cloud computing. Finally, the rapid restoration of files layer, through a fast recovery algorithm, allows the user data to achieve maximum recovery even in case of damage. Hence the cloud computing model for data security is designed [25].

Ensuring Data Storage Security in Cloud Computing

In this approach an effective and flexible distributed scheme with explicit dynamic data support is proposed to ensure the correctness of users' data in the cloud. An erasure-correcting code is used in the file distribution preparation to provide redundancy and to guarantee data dependability, by which this construction drastically reduces the communication overhead. To achieve storage correctness insurance as well as data error localization, a homomorphic token with distributed verification of erasure-coded data is utilized.
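The redundancy provided by erasure coding can be illustrated with the simplest possible code, a single XOR parity block that lets any one lost data vector be rebuilt; the actual scheme uses an (m, k) Reed-Solomon code over GF(2^p), so the following is only a toy stand-in.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data_vectors = [b"AAAA", b"BBBB", b"CCCC"]   # m = 3 data servers
parity = xor_blocks(data_vectors)            # one extra parity server adds redundancy

# One server fails (an erasure): rebuild its vector from the survivors plus the parity.
lost = 1
survivors = [v for i, v in enumerate(data_vectors) if i != lost]
recovered = xor_blocks(survivors + [parity])
assert recovered == data_vectors[lost]
```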

The main idea is as follows. Before file distribution the user pre-computes a certain number of short verification tokens on each individual vector G^(j) (j ∈ {1, ..., n}), each token covering a random subset of data blocks. To ensure the storage correctness, the user later challenges the provider with a set of randomly generated block indices; the server computes a short signature over the specified blocks and returns it to the user, who compares it with the pre-computed token. The requested response values for the integrity check must also be a valid codeword determined by the secret matrix P.

Challenge Token Preparation

Suppose the user wants to challenge the cloud servers t times to ensure the correctness of data storage. The user must then pre-compute t verification tokens for each G^(j) (j ∈ {1, ..., n}) using a PRF f(.), a PRP φ(.), a challenge key k_chal and a master permutation key K_PRP. To generate the i-th token for server j, the user acts as follows: derive a random challenge value αi of GF(2^p) by αi = f_kchal(i) and a permutation key K^i_PRP based on K_PRP; compute the set of r randomly chosen indices; and calculate the token [the index set and token formulas are given as figures in the original]. v_i^(j), which is an element of GF(2^p) of small size, is the response the user expects to receive from server j when the user challenges it on the specified data blocks.

Once all tokens are computed, the final step before file distribution is to blind each parity block of G^(j), where k_j is the secret key for the parity vector G^(j) (j ∈ {1, ..., n}); the vectors are then distributed across the cloud servers S1, S2, S3, ..., Sn.

5.2 Correctness Verification and Error Localization

The user reveals αi as well as the i-th permutation key K^i_PRP to each server. The server storing vector G^(j) aggregates those r rows specified by the index K^i_PRP into a linear combination R_i^(j). Upon receiving R_i^(j) from all the servers, the user takes away the blind values. As all the servers operate over the same subset of indices, the linear aggregation of these r specified rows has to be a codeword in the encoded file matrix. Once an inconsistency in the storage has been detected by relying on the pre-computed verification tokens, the next step is to determine where the potential data errors lie. Each response R_i^(j) is computed in exactly the same way as the token v_i^(j), so the user can simply find which server is misbehaving by verifying the corresponding n equations. The algorithm gives the details of correctness verification and error localization.
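The token computation and its later comparison against a server's response can be sketched as follows. The exact formulas (the PRP φ, the field GF(2^p) and the blinding step) are given as figures in the original and are not reproduced; this sketch works over integers modulo a prime and uses a keyed shuffle as a stand-in permutation, purely to show that the pre-computed token and the server's aggregated response agree when the stored blocks are intact.

```python
import hashlib
import random

P = 2**31 - 1          # toy prime field standing in for GF(2^p)

def alpha(k_chal: bytes, i: int) -> int:
    """PRF-style challenge value alpha_i derived from the challenge key."""
    return int.from_bytes(hashlib.sha256(k_chal + i.to_bytes(4, "big")).digest(), "big") % P

def challenged_indices(k_prp: bytes, i: int, r: int, total: int) -> list[int]:
    """Stand-in PRP: a keyed shuffle selecting the r rows covered by token i."""
    rng = random.Random(hashlib.sha256(k_prp + i.to_bytes(4, "big")).digest())
    rows = list(range(total))
    rng.shuffle(rows)
    return rows[:r]

def token(vector: list[int], k_chal: bytes, k_prp: bytes, i: int, r: int) -> int:
    a = alpha(k_chal, i)
    return sum(pow(a, q + 1, P) * vector[idx]
               for q, idx in enumerate(challenged_indices(k_prp, i, r, len(vector)))) % P

# User: pre-compute the i-th token for server j's vector G^(j), then distribute the file.
G_j = [7, 11, 13, 17, 19, 23, 29, 31]
k_chal, k_prp = b"challenge-key", b"permutation-key"
v_i = token(G_j, k_chal, k_prp, i=1, r=4)

# Server: on challenge, recompute the same aggregation over its stored copy and respond.
response = token(G_j, k_chal, k_prp, i=1, r=4)
print("consistent" if response == v_i else "misbehaving server")
```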

5.3 File Retrieval and Error Recovery

Since the layout considered here is systematic, the user can reconstruct the original file by downloading the data vectors from the first m servers, assuming that they return the correct response values. The verification scheme is based on random spot-checking, so the storage correctness assurance is probabilistic; by choosing the system parameters (e.g. r, t, l) appropriately and conducting enough rounds of verification, file retrieval can be guaranteed. Whenever data corruption is detected, the comparison of pre-computed tokens and received response values guarantees the identification of misbehaving servers. The user can always ask the servers to send back the blocks of the r rows specified in the challenge and regenerate the correct blocks by erasure correction; the newly recovered blocks can then be redistributed to the misbehaving servers to maintain the correctness of the storage.

Providing Dynamic Data Operation Support:

Update Operation: Due to the linear property of the Reed-Solomon code, a user can perform the update operation and generate the updated parity blocks by using Δf_ij only, without involving any other unchanged block. A general update matrix ΔF is formed, in which zero elements denote the unchanged blocks. To maintain the corresponding parity vectors as well as to stay consistent with the original file layout, the user can multiply ΔF by A, and the update information for both the data vectors and the parity vectors is thus generated; the corresponding part of the product denotes the update information for the parity vector G^(j).

5.4.2 Delete and Insert Operation: Deletion is a special case of the update operation in which the original data blocks are replaced with zeros or some predetermined special blocks, by setting Δf_ij in ΔF accordingly; the updated parity information then has to be blinded using the same method specified in the update operation.

An insert operation may affect many rows in the logical data file matrix F, and a substantial number of computations are required to renumber all the subsequent blocks as well as to re-compute the challenge-response tokens.

5.4.3 Append Operation: If the user wants to append m blocks at the end of file F, then with the secret matrix P the user can directly calculate the append blocks for each parity server. When the user is ready to append new blocks, both the file blocks and the corresponding parity blocks are generated, and the total length of each vector G^(j) will be increased and will fall into the range [l, l_max]. The user will therefore update the affected tokens by adding the contribution of the appended blocks to the old v_i where necessary; the parity blinding is similar to that introduced in the update operation.
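The linearity argument above is easy to check with a toy example: if the parity is produced by multiplying the file matrix by a fixed generator matrix, then multiplying only the sparse update matrix ΔF by the same generator yields exactly the change in the parity, so unchanged blocks never need to be touched. The small integer matrices below are illustrative only; the scheme itself works over GF(2^p) with a Reed-Solomon generator.

```python
import numpy as np

A = np.array([[1, 1],
              [1, 2]])               # toy generator (parity) part of the encoding matrix

F = np.array([[5, 7],
              [3, 4]])               # current data file matrix
delta_F = np.array([[0, 0],
                    [2, 0]])         # only one block changes; zeros mark unchanged blocks

parity_old = F @ A
parity_new_full = (F + delta_F) @ A          # recomputing the parity from scratch
parity_new_delta = parity_old + delta_F @ A  # updating the parity with Delta F alone

assert (parity_new_full == parity_new_delta).all()
```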

Through detailed security and performance analysis it is shown that this scheme is highly efficient and resilient to Byzantine failures, malicious data modification attacks and even server colluding attacks [9].

Conclusion

The proposed approach in chapter 3 is to encrypt every data block with a different key so as to achieve cryptography-based access control flexibly. The owner has to maintain only a few secrets by adopting key derivation methods, and the analysis shows that the key derivation procedure using a hash function introduces very limited computational overhead. The approach provides fine-grained access control to outsourced data with flexible and efficient management, and does not need to access the storage server except for data updates. A comprehensive mechanism is introduced to handle dynamics in user access rights and updates to outsourced data; this mechanism does not depend on any specific encryption algorithm, so end users can make their own choices based on the requirements of the application. The key derivation tree structure allows a data consumer to use a few keys to generate all the secrets it needs.

The key distribution and update problem is beyond this approach, which considers only the simple case of outsourced data with a single owner; it can be extended to scenarios in which the data has multiple owners, each of whom can choose data blocks independently. To maintain data consistency, the update operations should be executed in order when owners want to change the data contents; this can be achieved through a semaphore flag at the service provider, which is not discussed in this approach [15].

The proposed approach from chapter 4 treats the cloud computing environment as a dynamic environment in which the user's data travels from the data centre to the user's client and changes all the time. HDFS, used in large-scale cloud computing, is a typical distributed file system architecture. All data security techniques are built on confidentiality, integrity and availability, and taking these into consideration a mathematical data model is designed [25].

The proposed approach in chapter 5 utilised the homomorphic token with distributed verification of erasure-coded data to achieve the integration of storage correctness insurance and data error localization, that is, to identify misbehaving servers; it further supports secure and efficient dynamic operations on data blocks, including data update, delete and append. This construction drastically reduces the communication and storage overhead compared with traditional replication-based file distribution techniques. Extensive security and performance analysis shows that the proposed scheme is highly efficient and resilient against Byzantine failure, malicious data modification attacks and even server colluding attacks.

The point-to-point communication channels between each cloud server and the user are assumed to be authenticated and reliable, which can be achieved in practice with little overhead, but multipoint communication is not considered. The issue of data privacy is not addressed, as in cloud computing data privacy is orthogonal to the proposed approach. An efficient insert operation is difficult to support in the given approach, as it may affect many rows in the

logical data file matrix F, and a substantial number of computations are required to renumber all the subsequent blocks as well as to re-compute the challenge-response tokens [9].

Future work:

To study the semaphore flag at the service provider, drawing on operating systems and distributed database techniques for access to shared resources, and to work on new key management schemes for write-many-read applications. To work on an efficient insert operation for dynamic data, on publicly verifiable models and dynamic cloud data storage, and on fine-grained data error localization.

Comparing the three approaches shows that none of them on its own can be relied upon to secure data storage in cloud computing; the area is full of challenges, is of paramount importance and is still in its infancy, and further work on a data model architecture should be considered to secure data in the cloud.


References and Bibliography

[1] http://www.apache.org/docs/current/hdfs_design.html
[2] http://www.ia.org/wiki/Reed–Solomon_error_correction
[3] http://www.ia.org/wiki/Universal_hashing
[4] http://www.ia.org/wiki/Homomorphic_encryption
[5] http://www.a.org/wiki/Byzantine_fault_tolerance
[6] Traian Andrei, "Cloud Computing Challenges and Related Security Issues".
[7] Ellen Rubin, http://www.cloudswitch.com/page/making-cloud-computing-secure-for-the-enterprise
[8] David Talbot, "Searching an Encrypted Cloud", http://www.technologyreview.com/
[9] Cong Wang, Qian Wang, Kui Ren and Wenjing Lou, "Ensuring Data Storage Security in Cloud Computing", 978-1-4244-3876-1/09/$25.00 ©2009 IEEE.
[10] Bhaskar Prasad Rimal, Eunmi Choi and Ian Lumb, "A Taxonomy and Survey of Cloud Computing Systems", Fifth International Joint Conference on INC, IMS and IDC (NCM '09), 25–27 Aug. 2009, pages 44–51. Digital Object Identifier 10.1109/NCM.2009.218.
[11] http://www.enisa.europa.eu/
[12] www.isaca.org
[13] Meiko Jensen, Jorg Schwenk, Nils Gruschka and Luigi Lo Iacono, "On Technical Security Issues in Cloud Computing", IEEE International Conference on Cloud Computing (CLOUD '09), 21–25 Sept. 2009, pages 109–116. Digital Object Identifier 10.1109/CLOUD.2009.60.
[13] Tim Mather, Kumaraswamy and Shahed Ali, "Privacy and Security in Cloud Computing", pages 61–71.
[14] Cloud Security Alliance guide.
[15] Weichao Wang, Rodney Owens, Zhiwei Li and Bharat Bhargava, "Secure and Efficient Access to Outsourced Data", CCSW '09, November 13, 2009, Chicago, Illinois, USA. Copyright 2009 ACM 978-1-60558-784-4/09/11.
[16] T. Chen, Y. Chung, and C. Tian, "A novel key management scheme for dynamic access control in a user hierarchy", IEEE Annual International Computer Software and Applications Conference, pages 396–401, 2004.
[17] H. Chien and J. Jan, "New hierarchical assignment without public key cryptography", Computers & Security, 22(6):523–526, 2003.
[18] C. Lin, "Hierarchical key assignment without public-key cryptography", Computers & Security, 20(7):612–619, 2001.
[19] S. Zhong, "A practical key management scheme for access control in a user hierarchy", Computers & Security, 21(8):750–759, 2002.
[20] M. J. Atallah, M. Blanton, N. Fazio, and K. B. Frikken, "Dynamic and efficient key management for access hierarchies", ACM Trans. Inf. Syst. Secur., 12(3):1–43, 2009.
[21] E. Damiani, S. D. C. di Vimercati, S. Foresti, S. Jajodia, S. Paraboschi, and P. Samarati, "Key management for multi-user encrypted databases", in Proceedings of the ACM Workshop on Storage Security and Survivability, pages 74–83, 2005.
[22] S. D. C. di Vimercati, S. Foresti, S. Jajodia, S. Paraboschi, and P. Samarati, "Over-encryption: management of access control evolution on outsourced data", in Proceedings of the International Conference on Very Large Data Bases, pages 123–134, 2007.
[23] S. D. C. di Vimercati, S. Foresti, S. Jajodia, S. Paraboschi, and P. Samarati, "A data outsourcing architecture combining cryptography and access control", in Proceedings of the ACM Workshop on Computer Security Architecture, pages 63–69, 2007.
[24] E. Damiani, S. De Capitani di Vimercati, S. Foresti, S. Jajodia, S. Paraboschi, and P. Samarati, "An Experimental Evaluation of Multi-Key Strategies for Data Outsourcing", IFIP International Federation for Information Processing, Volume 232, New Approaches for Security, Privacy and Trust in Complex Environments, pages 385–396, Springer, 2007.
[25] Dai Yuefa, Wu Bo, Gu Yaqiang, Zhang Quan and Tang Chaojing, "Data Security Model for Cloud Computing".
[26] A. Juels and J. Burton S. Kaliski, "PORs: Proofs of Retrievability for Large Files", Proc. of CCS '07, pp. 584–597, 2007.
[27] H. Shacham and B. Waters, "Compact Proofs of Retrievability", Proc. of Asiacrypt '08, Dec. 2008.
[28] K. D. Bowers, A. Juels, and A. Oprea, "Proofs of Retrievability: Theory and Implementation", Cryptology ePrint Archive, Report 2008/175, 2008, http://eprint.iacr.org/.
[29] K. D. Bowers, A. Juels, and A. Oprea, "HAIL: A High-Availability and Integrity Layer for Cloud Storage", Cryptology ePrint Archive, Report 2008/489, 2008, http://eprint.iacr.org/.
[30] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable Data Possession at Untrusted Stores", Proc. of CCS '07, pp. 598–609, 2007.
[31] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, "Scalable and Efficient Provable Data Possession", Proc. of SecureComm '08, pp. 1–10, 2008.
[32] R. Curtmola, O. Khan, R. Burns, and G. Ateniese, "MR-PDP: Multiple-Replica Provable Data Possession", Proc. of ICDCS '08, pp. 411–420, 2008.
[33] M. Lillibridge, S. Elnikety, A. Birrell, M. Burrows, and M. Isard, "A Cooperative Internet Backup Scheme", Proc. of the 2003 USENIX Annual Technical Conference (General Track), pp. 29–41, 2003.
[34] D. L. G. Filho and P. S. L. M. Barreto, "Demonstrating Data Possession and Uncheatable Data Transfer", Cryptology ePrint Archive, Report 2006/150, 2006, http://eprint.iacr.org/.
[35] M. A. Shah, M. Baker, J. C. Mogul, and R. Swaminathan, "Auditing to Keep Online Storage Services Honest", Proc. 11th USENIX Workshop on Hot Topics in Operating Systems (HOTOS '07), pp. 1–6, 2007.
[36] T. S. J. Schwarz and E. L. Miller, "Store, Forget, and Check: Using Algebraic Signatures to Check Remotely Administered Storage", Proc. of ICDCS '06, pp. 12–12, 2006.
