
ISBN:978-1534910799

Proceedings of ICAER-2016

ENHANCEMENT OF PERFORMANCE AND SECURITY IN BIG DATA PROCESSING

Aswin Jose 1

1 Final year, M.E., Computer Science and Engineering, Dhaanish Ahmed College of Engineering and Technology, Chennai - 600045, India, aswin1906@gmail.com

Abstract--- In the proposed system, we build a Big Data application on a NoSQL database. Each record and its associated data are usually stored together in a single document in the database, which simplifies data access and reduces the need for joins or complex transactions. Stored documents are schema-free yet similar to each other; this flexibility is particularly helpful for modelling unstructured data. All sensitive data are collected and processed in the server environment for storage in the NoSQL database. Before storage, the sensitive data are encrypted using the Proxy Re-Encryption algorithm, which can encrypt both sensitive records and reporting data such as streams of bytes. The NoSQL database is agnostic to data types: it stores data in any format with little storage overhead, which leads to faster data access from multiple access points. An external user collects data only with proper approval and must decrypt the data using a key; any duplicate data are filtered out using MapReduce. MongoDB stores the data as key-value pairs.
Keywords--- NoSQL, structured data, Big Data, MapReduce, secure sharing, sensitive data, JWT authentication, data streaming, proxy re-encryption
I. INTRODUCTION
On the Internet and in various news media, software now summarizes all types of opinions in a real-time fashion, including updated, cross-referenced discussions by critics. This type of summarization
program is an excellent example of Big Data processing, as the information comes from multiple,
heterogeneous, autonomous sources with complex and evolving relationships, and keeps growing.
Along with the above example, the era of Big Data has arrived. Every day, 2.5 quintillion bytes of data
are created, and 90 percent of the data in the world today were produced within the past two years.
The capability for data generation has never been so powerful and enormous since the invention of
information technology in the early 19th century. As another example, on 4 October 2012, the first
presidential debate between President Barack Obama and Governor Mitt Romney triggered more than
10 million tweets within 2 hours. Among all these tweets, the specific moments that generated the most
discussions actually revealed the public interests, such as the discussions about Medicare and vouchers.
Such online discussions provide a new means to sense the public interests and generate feedback in
real-time, and are more appealing than generic media, such as radio or TV broadcasting.
Another example is Flickr, a public picture sharing site, which received 1.8 million photos per day, on
average, from February to March 2012.
Assuming the size of each photo is 2 megabytes (MB), this requires 3.6 terabytes (TB) storage every
single day. Indeed, as an old saying goes, "a picture is worth a thousand words"; the billions of
pictures on Flickr are a treasure tank for us to explore human society, social events, public affairs,
disasters, and so on, if only we had the power to harness the enormous amount of data. The above examples
demonstrate the rise of Big Data applications where data collection has grown tremendously and is
beyond the ability of commonly used software tools to capture, manage, and process within a tolerable
elapsed time. The most fundamental challenge for Big Data applications is to explore the large
volumes of data and extract useful information or knowledge for future actions. In many situations,
the knowledge extraction process has to be very efficient and close to real time because storing all
observed data is nearly infeasible. For example, the square kilometer array (SKA) in radio astronomy
consists of 1,000 to 1,500 15-meter dishes in a central 5-km area. It provides 100 times more sensitive
vision than any existing radio telescope, answering fundamental questions about the Universe.
II. RELATED WORK
In this section, we focus on previous work on relevant topics such as encryption, access control,
trusted computing, and data security destruction technology in services. Computation services refer
primarily to operations (such as encrypting data, conversion, or function encryption) on data used
by participants, which can invigorate dead data.
We consider users' preferences to be sensitive data. When Alice submits a query ("sportswear"),
the Search Engine Service Provider (SESP) first looks for Alice's preference on the big data
platform.
Regarding encryption technology, the Attribute-Based Encryption (ABE) algorithm includes
Key-Policy ABE (KP-ABE) and Ciphertext-Policy ABE (CP-ABE). ABE decryption rules are
contained in the encryption algorithm, avoiding the costs of frequent key distribution in
ciphertext access control. However, when the access control strategy changes dynamically, the data
owner is required to re-encrypt the data. A semi-trusted agent with a proxy key can re-encrypt
ciphertext; however, the agent cannot obtain the corresponding plaintext or compute the decryption
key of either party in the authorization process. The Fully Homomorphic Encryption (FHE)
mechanism permits specific algebraic operations on ciphertext that yield a still-encrypted result.
More specifically, retrieval and comparison of the encrypted data produce correct results, but the
data are never decrypted throughout the entire process. The FHE scheme requires very substantial
computation and is not always easy to implement with existing technology. With a view toward
data privacy protection in the cloud, ciphertext retrieval solutions in the cloud have also been
proposed.
Regarding access control, a new cryptographic access control scheme, Attribute-Based Access
Control for Cloud Storage (AB-ACCS), has been proposed. Each user's private key is labeled with a
set of attributes, and data is encrypted with an attribute condition so that a user can decrypt the
data only if their attributes satisfy the data's condition. Distributed systems with Information Flow
Control (DIFC) use tags to track data based on a set of simple data-tracking rules. DIFC allows
untrusted software to use private data but uses trusted code to control whether the private data
can be revealed. The authors consider the complexity of fine-grained access control for a large
number of users in the cloud and propose a secure and efficient revocation scheme based on a
modified CP-ABE algorithm. This algorithm is used to establish fine-grained access control in which
users are revoked according to Shamir's theory of secret sharing. With Single Sign-On (SSO),
any authorized user can log in to the cloud storage system using a standard common application
interface.
III. TRUSTED COMPUTING AND PROCESS PROTECTION
Trusted Computing Group (TCG) introduced the Trusted Platform Module (TPM) in its
existing architecture, to ensure that a general trusted computing platform using TPM security
features is credible. In academia, the main research idea includes first building a trusted terminal
platform based on a security chip, and then establishing trust between platforms through remote
attestation. Then, trust is extended to the network. Integrity measurement is the primary technical
means of building a trusted terminal platform. Research on virtual platform measurement
technology includes the HIMA and HyperSentry [13] metric architectures. Using virtual platform
isolation features, HIMA measures the integrity of a virtual machine by monitoring the virtual
machine's memory. HyperSentry
completes the integrity measurement using a hardware mechanism. TCG issued a Trusted
Network Connection (TNC) architecture specification version 1.0 in 2005, characterized by having
terminal integrity as a decision condition for network access control. Chinese scholars have
conducted research on trusted network connections based on the TNC architecture. Beginning by
establishing the trust of the terminal platform, Feng et al. proposed a trustworthiness-based trust
model and provided a method of building a trust chain dynamically with information flow. Zhang
et al. [17] proposed a transparent, backward-compatible approach that protects the privacy and
integrity of customers' virtual machines on commodity virtualized infrastructures. Dissolver is a
prototype system based on a Xen VMM and a Confidentiality and High-Assurance Equipped
Operating System (CHAOS). It ensures that the user's text data exist only in a private operating
space and that the user's key exists only in the memory space of the VMM. Data in the memory
and the user's key are destroyed at a user-specified time.
IV. SENSITIVE DATA STORAGE
H-PRE involves three types of algorithms: traditional identity-based encryption (including
Setup_IBE, KeyGen_IBE, Enc_IBE, and Dec_IBE), re-encryption (including KeyGen_RE, ReEnc, and
ReDec functions), and traditional public key cryptosystems (including KeyGen_PKE, Enc_PKE, and
Dec_PKE). The basic H-PRE process is simple. The data owner encrypts sensitive data using a local
security plug-in and then uploads the encrypted data to a big data platform. The data are
transformed into ciphertext that can be decrypted by a specified user after PRE services. If an SESP
is the specified user, then the SESP can decrypt the data using its own private key to obtain the
corresponding clear text. We complete the following steps to implement the H-PRE algorithm.
The data owner encrypts data locally, first using the Advanced Encryption Standard (AES)
symmetric encryption algorithm to encrypt the submission data and then using the PRE algorithm to
encrypt the symmetric key of the data. These results are all stored in the distributed data store. In the
meantime, if the data owner shares the sensitive data with other users, the data owner must authorize
the sensitive data locally and generate the PRE key, which is stored in the authorization key server.
On the big data platform, the PRE server re-encrypts and transforms the original cipher using
the PRE key. Then, PRE ciphertext, which can be decrypted by the (authorized) data users, is
generated. If a data user wants to use data on the big data platform, the data user sends a data
request to the platform and then queries whether corresponding data exist in the shared space. If such
data exist, the data user accesses and downloads them. The operation on the big data platform is
independent of and transparent to users. Moreover, the computing resources of the big data platform
are more powerful than those of the client. Hence, we can put the PRE computational overhead on
the big data platform to improve the user experience. The PRE system includes data submission,
storage (sharing), and data extraction operations.
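The paper gives no code for this submission path, so the following TypeScript sketch illustrates the client-side flow under stated assumptions: AES-256-GCM from Node's built-in crypto module stands in for the AES step, and preEncryptKey is a purely hypothetical placeholder for the PRE wrapping of the symmetric key (a real deployment would call an actual PRE library here):

```typescript
import { randomBytes, createCipheriv } from 'crypto';

// Hypothetical stand-in for the PRE wrap of the AES key (the paper's PRE
// services); NOT a real PRE implementation, placeholder only.
function preEncryptKey(aesKey: Buffer, ownerPublicKey: string): Buffer {
  return aesKey; // a real system would call a PRE library here
}

// Encrypt the submission data locally, then wrap the AES key for upload.
function encryptForSubmission(data: Buffer, ownerPublicKey: string) {
  const aesKey = randomBytes(32);            // fresh 256-bit symmetric key
  const iv = randomBytes(12);                // GCM nonce
  const cipher = createCipheriv('aes-256-gcm', aesKey, iv);
  const ciphertext = Buffer.concat([cipher.update(data), cipher.final()]);
  const authTag = cipher.getAuthTag();       // integrity tag
  const wrappedKey = preEncryptKey(aesKey, ownerPublicKey); // PRE-wrapped key
  return { ciphertext, iv, authTag, wrappedKey }; // uploaded to the platform
}
```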
Data extraction operations: after receiving the data download request, the Web browser invokes
the security plug-in and provides data download services for the data user, in accordance with the
following detailed steps. The browser (1) queries whether there is an authorization for the data user
on the PRE server of the big data platform and, if an authorization is in effect, proceeds to Step (2);
(2) uses the download plug-in to send data download requests to the big data platform, which then
finds the PRE ciphertext data in the data center; (3) pushes the PRE ciphertext to the secure data
plug-in on the big data platform.
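As a rough illustration of these three steps, the sketch below drives the flow from the client side; every endpoint URL and field name is a hypothetical assumption, since the paper names no concrete API:

```typescript
// Sketch of the three download steps against assumed endpoints.
async function downloadData(userId: string, dataId: string): Promise<ArrayBuffer> {
  // (1) ask the PRE server whether an authorization is in effect
  const auth = await fetch(`/pre-server/authorizations/${userId}/${dataId}`);
  if (!auth.ok) throw new Error('No authorization in effect for this user');
  // (2) request the PRE ciphertext located in the data center
  const res = await fetch(`/platform/data/${dataId}`);
  // (3) the platform pushes the PRE ciphertext; receive it as raw bytes
  return res.arrayBuffer();
}
```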

V. WORKING METHODS
As outlined in Section IV, H-PRE combines traditional identity-based encryption (Setup_IBE,
KeyGen_IBE, Enc_IBE, Dec_IBE), re-encryption (KeyGen_RE, ReEnc, ReDec), and traditional public
key cryptosystems (KeyGen_PKE, Enc_PKE, Dec_PKE). We complete the following steps to
implement the H-PRE algorithm; an illustrative toy sketch follows these steps.
5.1. Setup_IBE(k):
Given a security parameter k, randomly generate a primary security parameter mk and
calculate the system parameter set params using a bilinear map and a hash function.
5.2. KeyGen_IBE(mk, params, id):
When the user requests a private key from the key generation center, the key generation
center obtains the legal identity (id) of the user and generates the public and private keys
(pk_id, sk_id) for the user using params and mk.
5.3. KeyGen_PKE(params):
When a user submits a request, the key management center not only generates the
identity-based public and private keys, but also generates the public and private keys of the
traditional public key system (pk'_id, sk'_id).
5.4. Enc_IBE(pk_id, sk_id, params, m):
When the user encrypts data, the data owner encrypts the clear text (m) into the
ciphertext (c = (c1, c2)) using the user's own (pk_id, sk_id) and a random number (r ∈_R Z_q).
5.5. KeyGen_RE(sk_idi, sk'_idi, pk_idj, params):
When the data owner (user i) grants user j permissions, user i computes the PRE key
(rk_idi→idj) using sk_idi, sk'_idi, and pk_idj, completing the transformation from user i to user j.
5.6. ReEnc(c_i, rk_idi→idj, params):
The proxy server of the big data platform transforms the original ciphertext c_i into the
PRE ciphertext (c_j = (c_j1, c_j2)) using the PRE key, without obtaining the corresponding
plaintext.
5.7. ReDec(c_j, sk_idj, params):
This is the function for decrypting the PRE ciphertext. After receiving the PRE ciphertext
(c_j = (c_j1, c_j2)) from the proxy server of the big data platform, user j recovers the clear text
(m' = m) using his or her own private key.
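To make the re-encryption idea concrete, here is a toy, BBS98-style ElGamal proxy re-encryption in TypeScript. It is a minimal sketch for intuition only: it is not the H-PRE scheme above, the tiny group parameters are insecure by construction, and all names are illustrative.

```typescript
// Toy BBS98-style ElGamal proxy re-encryption. Illustrative only:
// the tiny prime and fixed randomness here are NOT secure.
const p = 467n;            // small prime modulus (demo assumption): p = 2q + 1
const q = 233n;            // prime order of the subgroup generated by g
const g = 4n;              // generator of the order-q subgroup mod p

// Modular exponentiation by repeated squaring.
function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// Modular inverse via Fermat's little theorem (mod must be prime).
function modInv(a: bigint, mod: bigint): bigint {
  return modPow(a, mod - 2n, mod);
}

// KeyGen: secret a in Z_q, public pk = g^a mod p.
function keyGen(a: bigint) {
  return { sk: a, pk: modPow(g, a, p) };
}

// Enc(pk_i, m): c = (m * g^r, pk_i^r).
function enc(pk: bigint, m: bigint, r: bigint): [bigint, bigint] {
  return [(m * modPow(g, r, p)) % p, modPow(pk, r, p)];
}

// Dec(sk_i, c): recover g^r = c2^(1/a_i), then m = c1 / g^r.
function dec(sk: bigint, [c1, c2]: [bigint, bigint]): bigint {
  const gr = modPow(c2, modInv(sk, q), p); // exponents live mod q
  return (c1 * modInv(gr, p)) % p;
}

// Re-encryption key i -> j: rk = a_j / a_i mod q.
function reKeyGen(skI: bigint, skJ: bigint): bigint {
  return (skJ * modInv(skI, q)) % q;
}

// ReEnc: c2' = c2^rk turns pk_i^r into pk_j^r; the proxy never sees m.
function reEnc(rk: bigint, [c1, c2]: [bigint, bigint]): [bigint, bigint] {
  return [c1, modPow(c2, rk, p)];
}

// Demo: Alice encrypts, the proxy re-encrypts, Bob decrypts.
const alice = keyGen(123n);
const bob = keyGen(57n);
const m = 42n;
const c = enc(alice.pk, m, 99n);       // ciphertext under Alice's key
const rk = reKeyGen(alice.sk, bob.sk); // delegation key Alice -> Bob
const cBob = reEnc(rk, c);             // proxy transform, plaintext unseen
console.log(dec(bob.sk, cBob) === m);  // true
```

The design choice mirrors the paper's flow: the proxy holds only rk and transforms ciphertexts without ever learning m or either party's decryption key.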
VI. MODULE INTEGRATION
6.1. User Type Creation
A common user creates a profile based on a role. Each user's details are held in the Mongo
database, and access to every user's details is restricted according to that role; MongoDB saves the
data in JSON format. User details are handled by a Node server, which acts as the intermediary
between the client and the database. In this project, the user interface (AngularJS) is designed to
communicate with the database through server-side code written in Node.js.
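A minimal sketch of this module using the official MongoDB Node.js driver follows; the connection URI, database, collection, and field names are illustrative assumptions:

```typescript
import { MongoClient } from 'mongodb';

// Store a role-based user profile as a JSON document.
async function createUser(name: string, role: 'admin' | 'internal' | 'external') {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  try {
    const users = client.db('bigdata_app').collection('users');
    // MongoDB stores the document schema-free, in BSON/JSON form.
    await users.insertOne({ name, role, createdAt: new Date() });
  } finally {
    await client.close();
  }
}
```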

6.2. Sensitive Data Creation

Requests are handled by server-side code written in Node.js; the Node package manager (npm)
supplies the toolkit used to handle requests on the server side. An internal user customizes the
requested details based on the external request; the internal user is not allowed to expose all the
details requested by the external user. The requested details are consolidated in the UI and the
filtered data are forwarded to the admin. The admin collects all the sensitive data and updates the
document database. Before the data are written to the database, they are encrypted using Proxy
Re-Encryption.
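The sketch below illustrates the encrypt-then-store write path. AES-256-GCM from Node's crypto module stands in for the Proxy Re-Encryption step purely for illustration, and the collection and field names are assumptions:

```typescript
import { randomBytes, createCipheriv } from 'crypto';
import { Collection } from 'mongodb';

// Encrypt a sensitive payload before the admin writes it to the document DB.
async function storeSensitive(docs: Collection, payload: Buffer, key: Buffer) {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const encrypted = Buffer.concat([cipher.update(payload), cipher.final()]);
  await docs.insertOne({
    data: encrypted,          // stored as a stream of bytes (BSON Binary)
    iv,
    tag: cipher.getAuthTag(), // integrity tag checked on decryption
    approved: false,          // awaits admin approval before release
  });
}
```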
6.3. Mongo Data Customization

MongoDB allows bulk amounts of data to be stored in JSON format and can handle
unstructured, semi-structured, and structured data. It is robust and lightweight for data
manipulation. In this project, the sensitive data are customized and collected in MongoDB for
further manipulation. During CRUD operations, the data are filtered by a unique id based on the
user's request. The admin user also customizes the data for approval, which is likewise handled in
MongoDB.
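A short sketch of such an id-filtered read and approval update, with assumed collection and field names:

```typescript
import { Collection, ObjectId } from 'mongodb';

// Filter CRUD operations by the document's unique id, as described above.
async function approveRecord(docs: Collection, id: string) {
  // Read the single record the user asked for.
  const record = await docs.findOne({ _id: new ObjectId(id) });
  if (!record) throw new Error('No record with that id');
  // Admin marks the record approved for external collection.
  await docs.updateOne({ _id: new ObjectId(id) }, { $set: { approved: true } });
}
```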
6.4. Data Collection
An external user collects the sensitive data only with admin approval and is allowed to collect
it as a buffered stream of data. Before using the data fetched from the document database, the user
must decrypt it with the key phrase.
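A sketch of the decryption step on the collected bytes, assuming AES-256-GCM and an scrypt-derived key from the user's key phrase (both assumptions, since the paper does not name the symmetric primitives):

```typescript
import { createDecipheriv, scryptSync } from 'crypto';

// Decrypt a buffered stream of bytes fetched from the document database.
function decryptStream(encrypted: Buffer, iv: Buffer, tag: Buffer, phrase: string): Buffer {
  const key = scryptSync(phrase, 'demo-salt', 32); // 256-bit key from phrase
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);                        // verify integrity
  return Buffer.concat([decipher.update(encrypted), decipher.final()]);
}
```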
VII. CONCLUSION
In this project, data storage is handled in a document-type database, MongoDB. The admin user
collects all the sensitive data from different perspectives; data collection includes PDF files, JSON
files, etc., with PDF files collected in the form of streams of bytes. Prior to saving data in the
database, the admin encrypts it using Proxy Re-Encryption. An external user collects the data,
subject to admin approval, using the decryption key. The document database can thus store bulk
amounts of data in less memory space while the data are collected with asynchronous calls to
handle multiple events.
REFERENCES
[1] Guangyan Huang, Jing He, Chi-Hung Chi, Wanlei Zhou, and Yanchun Zhang, A Data as a Product Model for Future Consumption of Big Stream Data in Clouds, IEEE International Conference, 2015.
[2] Kyounghyun Park, Minh Chau Nguyen, and Heesun Won, Web-based collaborative big data analytics on big data as a service platform, 2015.
[3] E. Y. Chang, H. Bai, and K. Zhu, Parallel Algorithms for Mining Large-Scale Rich-Media Data, Proc. 17th ACM Int'l Conf. Multimedia (MM '09), pp. 917-918, 2009.
[4] L. Bonnet, A. Laurent, M. Sala, B. Laurent, and N. Sicard, Reduce, you say: What NoSQL can do for data aggregation and BI in large repositories, Database and Expert Systems Applications (DEXA), 2011.
[5] A. Machanavajjhala and J. P. Reiter, Big Privacy: Protecting Confidentiality in Big Data, ACM Crossroads, vol. 19, no. 1, pp. 20-23, 2012.
[6] M. Stonebraker, Why Enterprises Are Uninterested in NoSQL (MONGO). [Online] Available at: http://cacm.acm.org/blogs/blog-cacm/99512-whyenterprises-are-uninterested-in-nosql/fulltext [Accessed: 02 Oct 2010].
[7] Jinxin Huang, Lin Niu, Jie Zhan, Xiaosheng Peng, Junyang Bai, and Shijie Cheng, Technical aspects and case study of big data based condition monitoring of power apparatuses, 2014.
[8] E. Brewer, A certain freedom: thoughts on the CAP theorem [Report], New York: ACM, 2010.
[9] InformationWeek, Surprise: 44% of Business IT Pros Never Heard of NoSQL. [Online] Available at: http://www.informationweek.com/software/informationmanagement/surprise-44-of-business-it-pros-neverhe/227500077 [Accessed: 02 Oct 2013].


[10] Oracle, Oracle: Big Data for the Enterprise [White Paper], Redwood Shores: Oracle, 2011.
[11] J. Han, E. Haihong, G. Le, and J. Du, Survey on NoSQL database, Pervasive Computing and the Applications (ICPCA), 2011.
[12] MongoDB, Big Data: Examples and Guidelines for the Enterprise Decision Maker [White Paper], New York: MongoDB, 2013.
[13] S. De Capitani di Vimercati, S. Foresti, S. Jajodia, S. Paraboschi, and P. Samarati, Integrity for join queries in the cloud, IEEE Trans. Cloud Comput., vol. 1, no. 2, pp. 187-200, Jul.-Dec. 2013.
[14] J. Pokorny, NoSQL databases: a step to database scalability in web environment, International Journal of Web Information Systems, vol. 9, no. 1, pp. 69-82, 2013.
[15] E. Birney, The Making of ENCODE: Lessons for Big-Data Projects, Nature, vol. 489, pp. 49-51, 2012.
[16] Philosophy of Big Data: Expanding the Human-Data Relation with Big Data Science Services, Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference, 2015.
[17] Jie Zhan, Jinxin Huang, Lin Niu, Xiaosheng Peng, Diyuan Deng, and Shijie Cheng, Study of the key technologies of electric power big data and its application prospects in smart grid, 2014.
