
Big Data: A Review of Analytics Methods & Techniques
Yojna Arora
PhD scholar
Gyan Vihar School of Engineering & Technology
Suresh Gyan Vihar University, Rajasthan, India
yojanaI83@gmail.com

Dr Dinesh Goyal
Professor
Gyan Vihar School of Engineering & Technology
Suresh Gyan Vihar University, Rajasthan, India
dinesh8dg@gmail.com

Abstract - In the current scenario, data is considered to be one of the biggest assets. One who has the maximum relevant data is considered to be rich in the information industry. But the collection of data alone is not enough; it needs to be analyzed. This huge amount of data, termed Big Data, cannot be analyzed by traditional tools and techniques. Rather, it requires more advanced techniques that can make data retrieval, management and storage much faster. This paper gives an introduction to Big Data, along with a detailed comparative study of various Big Data techniques which have already been implemented. At the end, various issues which still exist are listed.

Keywords - Big Data, Analytics, Hadoop, MapReduce

I. INTRODUCTION
The combination of a growing torrent of data and on-demand computing has launched the Big Data era.

A. Big Data Definitions
Big Data is defined as the amount of data just beyond technology's capability to store, manage and process efficiently [1].

Big Data is data that is too fast, too big or too hard for existing tools to process [2].

Big Data is a term defining data that has three characteristics: first, the great volume of data; second, the data cannot be structured into tables; and third, velocity, which means data is generated rapidly and thus needs to be processed and analyzed fast [3].

Big Data concerns large-volume, complex, growing data sets with multiple autonomous sources, as stated by Xindong Wu in [4].

In [5], Big Data is defined as a large amount of data which requires new technologies and architectures so that it becomes possible to extract value from it through a capturing and analysis process.

Big Data is a massive volume of both structured and unstructured data that is so large that it is difficult to process using traditional database and software techniques [6].

Big Data concerns massive, heterogeneous, autonomous sources with distributed and decentralized control [7].

Big Data consists of data sets that are growing exponentially and that are too large, too raw or too unstructured for analysis using relational database techniques [8].

Sources of Big Data include: transactions, scientific experiments, genomic investigations, logs, events, emails, social media, sensors, RFID scans, texts, geospatial data, audio data, medical records, surveillance, images, and videos [9].

It can be summarised as: Big Data is a huge amount of data coming from heterogeneous sources at such a high speed that it is not possible for the existing tools and techniques to analyze and extract value from it. Thus, new technology is required to analyze, manage and store this large amount of Big Data.

B. Characteristics of Big Data
There are many definitions of Big Data framed differently in the past by various researchers, but all of them revolve around the five characteristics of Big Data. These 5 V's of Big Data are [10]:

1) Variety: The first characteristic of Big Data is Variety, which addresses the various sources that generate Big Data. Their data is classified into three categories:

a) Structured data: Structured data concerns all data which can be stored in tables with rows and columns. These data are considered to be the most organised, but they account for only 5-10% of the total data available.

b) Semi-structured data: Semi-structured data is information that does not reside in tables but possesses some properties which make it convertible into structured data, such as data coming from web server logs, XML documents etc. Comparatively less organised than structured data, it also makes up only 5-10% of the data available.

978-1-5090-5256-1116/$31.00 ©2016 IEEE 225


c) Unstructured data: Unstructured data constitutes the biggest source of Big Data, that is 80-90%. It includes data in the form of text, images, video, voices, web pages, emails, word documents and all other multimedia content. These data are very difficult to store in a database. Just like structured and semi-structured data, these types of data are both machine and human generated.

d) Multi-structured data: Data which is a mix of structured, semi-structured and unstructured data, for example operating system logs.

2) Volume: Volume is the characteristic which makes data Big Data. It denotes the large amount of data being generated every second. The range of data has greatly increased, crossing from terabytes to petabytes, exabytes and now zettabytes. Big Data can be measured in terms of:
• Records per area
• Transactions
• Tables

3) Velocity: As explained earlier, data is coming from multiple sources in huge amounts. Velocity is the characteristic of Big Data which refers to the high rate at which data is being generated. Various applications based on data rate are [7]:
a) Batch: Batch means running the query in a scheduled and sequential way without any intervention. Execution is on a batch of input.
b) Real time: Real-time data is defined as information which is delivered immediately after its collection. There is no delay in the timeliness of the information provided.
c) Interactive: Interactive means executing tasks which require frequent user interaction.
d) Streaming: The method of processing data as it comes in is called streaming. Insight into the data is required as it arrives.

4) Value: It is necessary to fetch meaningful information or patterns from this huge amount of Big Data which can be used for analysis or for determining results when queries are applied. Thus, Value is the characteristic which denotes fetching meaning from Big Data. Value can be extracted from Big Data as:
• Statistical
• Events
• Correlation
• Hypothetical

5) Veracity: The fifth V of Big Data ensures the correctness and accuracy of information. When dealing with Big Data, along with maintaining its privacy and security, it is also important to take care of data quality, data governance and metadata management. Factors which should be considered are:
• Trustworthiness
• Authenticity
• Accountability
• Availability

Fig 1. 5 V's of Big Data

C. Applications of Big Data
The many Big Data applications explained in [4] [11] include:
• Healthcare
• Public sector administration
• Retail
• Manufacturing
• Personal location data
• Fact-based decision making
• Improved customer experience
• Improved sales
• New product innovation
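Under Variety, semi-structured data such as web server logs is described as convertible into structured data. The following is a minimal sketch of that conversion. It assumes log lines in the Apache Common Log Format, an assumption since the paper names no specific format, and the function names are hypothetical. A regular expression turns each line into a row with fixed, named columns, and lines that do not fit are skipped.

```python
import re

# ASSUMPTION: input lines follow the Apache Common Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

COLUMNS = ["host", "user", "time", "method", "path", "protocol", "status", "size"]


def parse_log_line(line):
    """Turn one semi-structured log line into a structured record (dict), or None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None  # line does not follow the assumed format
    record = match.groupdict()
    record["status"] = int(record["status"])
    # The format uses "-" when no response body was sent
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def to_table(lines):
    """Parse many lines into table rows with a fixed column order."""
    rows = []
    for line in lines:
        record = parse_log_line(line)
        if record is not None:
            rows.append(tuple(record[c] for c in COLUMNS))
    return COLUMNS, rows


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
        "this line is not a log entry",
    ]
    columns, rows = to_table(sample)
    print(columns)
    print(rows)
```

Once in this row form, the data could be loaded into an ordinary table with rows and columns, which is what distinguishes structured data in the list above.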

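The batch and streaming modes listed under Velocity differ in when results become available. The sketch below computes the same statistic, a mean, both ways. The function names and the choice of statistic are illustrative assumptions. The batch version waits for the whole stored input. The streaming version updates its answer incrementally as each value arrives and keeps only constant-size state.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch: the whole input is collected first, then processed in one run."""
    if not values:
        raise ValueError("empty batch")
    return sum(values) / len(values)


class StreamingMean:
    """Streaming: update the result as each value arrives, with O(1) state."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Incremental mean: m_n = m_(n-1) + (x_n - m_(n-1)) / n
        self.mean += (value - self.mean) / self.count
        return self.mean


def stream_means(values: Iterable[float]) -> Iterator[float]:
    """Yield the up-to-date mean after every arriving value."""
    acc = StreamingMean()
    for v in values:
        yield acc.update(v)


if __name__ == "__main__":
    readings = [4.0, 8.0, 6.0, 2.0]
    print(batch_mean(readings))          # 5.0, available only after all data is stored
    print(list(stream_means(readings)))  # [4.0, 6.0, 6.0, 5.0], one answer per arrival
```

Both approaches reach the same final value. The streaming form trades the ability to revisit old data for an answer that is always current.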
226 2016 2nd International Conference on Contemporary Computing and Informatics (ic3i)
II. LITERATURE REVIEW OF EXISTING WORK

Table 1. Comparison of various Big Data analytics techniques

Author's Name | Aim | Technique Applied/Explained | Key Features | Findings/Result
Venkateswara Reddy Eluri, Amina Salim & Mohd AL-Jabri [12] | To integrate Big Data and data mining | K-means clustering & Canopy clustering | Mahout API works on top of MapReduce | K-means clustering is better for globular data
R.A. Fadnavis and Samrudhi Tabhane [13] | To explain the programming paradigm of Hadoop and MapReduce | Hadoop architecture & ecosystem | Hadoop provides fault tolerance and distributed processing | Hadoop provides a complete infrastructure to handle Big Data
Fatos Xhafa, Victor Naranjo and Santi Caballe [14] | To implement the Yahoo S4 software framework | MapReduce architecture | Software chain | Able to do real-time analytics for system monitoring and decision making
Poonam Vashisht and Vishal Gupta [15] | To explain Big Data analytics methods | Community detection, social influence analysis & link prediction | Video analytics architecture | Analytics on audio and video was done
Kyounghyun Park, Minh Chau Nguyen and Heesun Won [16] | To explain a Big Data analytics platform | Hadoop YARN | Web server portal, analytics portal & management tools | Big Data analytics platform generated to support Big Data
Sruthika and N. Tajunisha [17] | To explain the meaning of analytics & Big Data analytics | Descriptive, predictive & prescriptive analytics | Hadoop architecture | Various traditional and new analytics techniques are explained
Bichitra Mandal, Ramesh Kumar Sahoo and Srinivas Sethi [18] | To explain all the details about Big Data | Hadoop architecture | Programming paradigm: MapReduce | Application of Hadoop in terms of word processing is explained
Parth Chandarana and M. Vijayalakshmi [19] | To explain the characteristics & features of a Big Data framework | Batch-oriented, OLTP, e-commerce & stream processing | Various types of data sources are identified | The framework helps in social media and interactive ad hoc query processing
Dawei Jiang et al. [20] | To tackle Big Data's data variety problem | New framework epiC (extensible system) | Concurrent programming model for parallel computations | Programs are parallelized and the run-time system takes care of fault tolerance
Zoltan Prekopcsak et al. [21] | To implement a new framework, Radoop | Hadoop and the RapidMiner tool | Programming model of Hadoop & user interface of RapidMiner | Radoop is an excellent tool for Big Data analytics
Alexander Alexandrov, Rico Bergmann and Stephan Ewen [22] | To ensure parallel data analysis | Data warehousing, information extraction and integration | Stratosphere framework | Easy programming of analytical applications
Xindong Wu, Xingquan Zhu, Gong Qing Wu and Wei Ding [23] | To implement a model which will characterize features of the Big Data revolution | Information → Mining → User Interest Modelling → Security & Privacy | HACE theorem | Data-driven revolution model implemented
Barna Saha and Divesh Srivastava [24] | To identify various issues related to Big Data quality | Statistical & logical methods of data quality monitoring | Data quality issues and challenges were discussed in detail | Rules were discovered in semi-structured data based on value and structure
Jinsong Zhang, Yan Chen and Taoying Li [25] | To explain major challenges & innovation scope in Big Data | Information integration, visualization, technical innovation, product innovation etc. | - | Data capture, storage, data management and analytics are identified as Big Data challenges
Seref Sagiroglu and Duygu Sinanc [26] | To get an overview of Big Data | Big Data's content, methods & tools explained | Various advantages and challenges listed | Future scope: implementation of better tools & methods for analyzing data
Janusz Weilki [27] | To explain the growing role of data & information | Hadoop & MapReduce | - | Big Data's notion, tools, challenges & future scope discussed
Du Zhang [28] | To explain multidimensional issues and challenges in Big Data & Big Data analysis | Data mining methods are explained to remove inconsistencies and increase data quality | Spatial, temporal, text & functional inconsistencies | Data quality of Big Data is increased by removing inconsistencies
Katharina Ebner, Thilo Buhnen and Nils Urbach [29] | To explain the methods of handling Big Data in the corporate industry | Eight contingency factors identified, grouped into three categories: strategy, resource & operating environment | Understanding of decision-making methods | Organizational decision makers need to start thinking about whether and how to facilitate Big Data analytics
Youssef M. Essa [30] | To implement a new framework, MRAM, under JADE | Mobile agent & MapReduce paradigm | Hadoop provides distributed parallel processing & the mobile agent provides mobility | MRAM improves Big Data analysis & overcomes drawbacks of Hadoop
Sung Hwan Kim, Nam Uk Kim and Tai Myoung Chung [31] | To extract relevant information from Big Data | Attribute selection method in data mining | Maintain & protect important attributes | The relevance between attributes of data sets is really important
Zibin Zheng, Jieming Zhu and Michael R. Lyu [32] | To study Big Data analytics | Service-generated Big Data and Big Data as a service | Quality metric & security considered | The quality of service-oriented systems is enhanced
Avita Katal, Mohammad Wazid and R H Goudar [33] | To explain the importance of Big Data in the modern world | Hadoop | Good practices followed for Big Data | Definitions, tools, issues & challenges related to Big Data are discussed
Marcus R. Wigan and Roger Clarke [34] | To explain various applications of Big Data | Consumer profile database, loyalty cards, social media | - | Various applications of Big Data are explained
Yuri Demchenko, Paolo Grosso and Cees de Laat [35] | To study the impact of Big Data on modern and future scientific data infrastructure | Scientific Data Lifecycle Management (SDLM) model | Relationship of SDLM with modern e-Science | SDLM is implemented, which helps in data management
Dan Garlasu and Oana Grigoriu [36] | To implement Big Data analytics | Grid computing: a kind of distributed computing | Grid centre: computing element, storage element, worker node | High storage capability & high processing power is achieved by using a grid computing system
Stephen Kaisler, Frank Armour and J. Alberto [37] | To explain various issues in Big Data | Storage, management & processing | Big Data analytics framework and design methodology | Various issues, challenges and factors that may lead to future data analysis are identified
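Table 1 cites K-means clustering [12], which is run there through Apache Mahout on top of MapReduce. As a single-machine sketch of the algorithm itself, the code below implements Lloyd's iterations. It is not Mahout's API. It assumes 2-D points and, for determinism, uses the first k points as the initial centroids, a simplification of the random or k-means++ seeding that real systems typically use.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain K-means (Lloyd's algorithm) on 2-D points.

    ASSUMPTION: initial centroids are simply the first k points.
    """
    if k <= 0 or k > len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(pts, 2))  # ([(0.0, 0.5), (10.0, 10.5)], [0, 0, 1, 1])
```

Because each step only needs distances to the current centroids and per-cluster sums, the assignment and update steps map naturally onto the map and reduce phases. This is what makes the algorithm suitable for Mahout on Hadoop.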

III. BIG DATA ANALYTICS APPROACH [34]


Big Data is important because it helps to gather, store and manipulate vast amounts of data at the right speed and at the right time. Big Data analytics is an approach to identify hidden patterns and uncover information from huge amounts of Big Data so that it may help in taking better decisions. The two analytics approaches are:

1) Reactive: In the reactive approach, knowledge is gained from the past, which may have some usefulness or purpose in the future. The use of business intelligence in this approach ends up generating standard business reports, ad hoc reports, OLAP, alerts and notifications.

2) Proactive: The proactive approach means looking forward, i.e. proactive Big Data analytics is required for proactive decision making such as optimisation, text mining and modelling. It reveals the patterns and trends used for future decision making, and can be used to extract important information from terabytes of data for that purpose. Proactive Big Data analytics is not easy; its impact is more a culture change than just a change in decision making. It can even help decision makers gain more insight and knowledge about the problem and the analytics to be applied to it.
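The contrast between the two approaches can be illustrated with a small sketch on hypothetical monthly sales figures. The reactive function summarises what has already happened, as a standard business report would. The proactive function fits an ordinary least-squares linear trend and projects it forward. The straight-line trend is only an assumed stand-in for the modelling techniques mentioned above, and the function names are illustrative.

```python
from typing import Dict, List


def reactive_report(sales: List[float]) -> Dict[str, float]:
    """Reactive: summarise past data, as in a standard business report."""
    if not sales:
        raise ValueError("no sales data")
    best = max(range(len(sales)), key=lambda i: sales[i])
    return {
        "total": sum(sales),
        "average": sum(sales) / len(sales),
        "best_month": best + 1,  # 1-based month number
    }


def proactive_forecast(sales: List[float], months_ahead: int) -> List[float]:
    """Proactive: fit y = a + b*t by least squares and extrapolate the trend."""
    n = len(sales)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(sales) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, sales))
    var = sum((t - t_mean) ** 2 for t in ts)
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n, n + months_ahead)]


if __name__ == "__main__":
    history = [100.0, 110.0, 120.0, 130.0]
    print(reactive_report(history))        # looks back: total, average, best month
    print(proactive_forecast(history, 2))  # [140.0, 150.0], looks forward
```

The reactive output can only describe the four past months. The proactive output is a claim about months that have not happened yet, which is why it supports decisions before they are needed.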

Fig 2. Big Data Stack (Collection, Big Analytics, Fast Data)

A. Value of Big Data

  | Then | Now
1 | Data is in structured form | Data is raw, complex and in unstructured form
2 | It is supported by relational databases | It is supported by click streams
3 | Predictive models were built | All data is analyzed and models are run on it
4 | Data mining methods & techniques applied, e.g. clustering, classification | Big Data methods and techniques applied, e.g. Hadoop, MapReduce
5 | Large data sets can be stored in a data warehouse | Larger data sets can be stored on a cluster of computers
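Rows 4 and 5 above contrast traditional methods with Hadoop and MapReduce running on a cluster of computers, and several entries in Table 1 ([13], [18], [30]) rest on the same model. The following single-process sketch imitates the three MapReduce phases for a word count, the usual introductory example. The function names are hypothetical, and this is not Hadoop's Java API. Hadoop distributes the same map, shuffle and reduce phases across cluster nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (key, value) pair for every word."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # On a cluster each document could be mapped on a different node;
    # here the phases simply run one after another in a single process.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs big tools", "data tools"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2}
```

The map and reduce functions only ever see one document or one key at a time. That independence is what lets the framework parallelize them, and it is also why, as noted among the issues, not every problem can be expressed this way.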

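Issue 1 in the next section calls for data sampling methods that reduce the volume of data to be analyzed. One well-known option is reservoir sampling (Algorithm R), shown here as a sketch rather than as anything proposed in the reviewed work. It keeps a uniform random sample of fixed size k from a stream of unknown length in a single pass, using only O(k) memory. The function name is illustrative.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return k items chosen uniformly at random from the stream (Algorithm R).

    If the stream has fewer than k items, all of them are returned.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            # Item i (0-based) enters with probability k / (i + 1),
            # replacing a uniformly chosen slot
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 records out of a million without storing the million
    print(reservoir_sample(range(1_000_000), 5, random.Random(42)))
```

Passing a seeded random.Random makes a run reproducible. Sampling only reduces volume, though, so analyses that need rare events may still require the full data.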
IV. ISSUES IDENTIFIED IN BIG DATA PROCESSING

Issue 1: Developing data sampling methods which can help in reducing the volume of data for analysis.

Issue 2: Parameter scaling: as the size of data sets increases, the algorithms are not able to adapt accordingly. Data mining and machine learning algorithms are examples of algorithms which cannot scale up to zettabytes.

Issue 3: Not all problems are MapReducible, so the Big Data problems which are MapReducible need to be identified.

Issue 4: Improving statistical and machine learning algorithms to make them more robust.

Issue 5: Various technical issues arise while processing Big Data, such as fault tolerance, heterogeneity, quality and scalability.

Issue 6: When classified broadly, security, data management and sharing of Big Data, as explained in [7], are the three major issues.

Issue 7: Issues related to changing technology and the lack of people with appropriate analytical skills also need consideration.

Issue 8: Maintaining the quality of data extracted for decision making is important. Some of the issues related to data quality are mentioned in [38]. Also, [7] explains various inconsistencies found in data and techniques to repair them.

Issue 9: Difficulties associated with the analysis of ever-increasing unstructured data still persist.

Issue 10: Data protection and access control, explained in [31], are the security issues.

Issue 11: Cost is also a major concern in dealing with Big Data.

V. CONCLUSION
Big Data has become one of the greatest assets in the current scenario, and handling it is a difficult task. This paper explains the definition and characteristics of Big Data along with the Big Data stack. It also includes a detailed comparative study of various Big Data analytics techniques. The paper ends by mentioning a few important issues which can be considered as future aspects for further research.

REFERENCES

[1] Stephen Kaisler, Frank Armour and J. Alberto, "Big Data: Issues and Challenges Moving Forward", 46th Hawaii International Conference on System Sciences, IEEE, 2012.

[2] Sam Padden, "From database to Big Data", IEEE Computer Society, 2012.

[3] Dan Garlasu, "Data Implementation Based on Grid Computing".

[4] Avita Katal, Mohammad Wazid and R H Goudar, "Big Data: Issues, Challenges, Tools and Good Practices", IEEE, 2013.

[5] Seref Sagiroglu and Duygu Sinanc, "Big Data: A Review", IEEE, 2013.

[6] Yuri Demchenko, Paolo Grosso and Cees de Laat, "Addressing Big Data Issues in Scientific Data Infrastructure", IEEE, 2013.

[7] Parth Chandarana and M Vijayalakshmi, "Big Data Analytics Framework", International Conference on Circuits, Systems, Communication and Information Technology Applications, IEEE, 2014.

[8] Rich Adduci, Dave Blue and Guy Chiarello, "Big Data: Big Opportunities to Create Business Value", EMC2.

[9] M. Schroeck, R. Shockley, J. Smart, D. Romero-Morales and P. Tufano, "Analytics: The Real-World Use of Big Data: How Innovative Enterprises Extract Value from Uncertain Data", Executive Report, IBM Institute for Business Value and Said Business School at the University of Oxford, 2012.

[10] Parth Chandarana and M Vijayalakshmi, "Big Data Analytics Framework", International Conference on Circuits, Systems, Communication and Information Technology Applications, IEEE, 2014.

[11] Janusz Weilki, "Implementation of Big Data Concept in Organizations: Possibilities, Impediments and Challenges", Proceedings of the 2013 Federated Conference on Computer Science and Information Systems, pp. 985-989, IEEE, 2013.

[12] Venkateswara Reddy and Amina Salim, "A Comparative Study of Various Clustering Techniques on Big Data Sets using Apache Mahout", 3rd MEC International Conference on Big Data and Smart City, IEEE, 2016.
[13] R A Fadnavis and Samrudhi Tabhane, "Big Data Processing using Hadoop", IJCSIT, Vol I, 2015.

[14] Fatos Xhafa, Victor Naranjo and Santi Caballe, "A Software Chain Approach to Big Data Stream Processing and Analytics", International Conference on Complex, Intelligent and Software Intensive Systems, IEEE, 2015.

[15] Poonam Vashisht and Vishal Gupta, "Big Data Analytics Techniques: A Survey", IEEE, 2015.

[16] Kyounghyun Park, Minh Chau Nguyen and Heesun Won, "Web-based Collaborative Big Data Analytics on Big Data as a Service Platform", ICACT, July 2015.

[17] Sruthika and N. Tajunisha, "A Study on Evolution of Data Analytics to Big Data Analytics & its Research Scope", 2nd International Conference on Innovations in Information, Embedded and Communication Systems, IEEE, 2015.

[18] Bichitra Mandal, Ramesh Kumar Sahoo and Srinivas Sethi, "Architecture of Efficient Word Processing using Hadoop for Big Data Applications", International Conference on Man and Machine Interfacing, IEEE, 2015.

[19] Parth Chandarana and M Vijayalakshmi, "Big Data Analytics Framework", International Conference on Circuits, Systems, Communication and Information Technology Applications, IEEE, 2014.

[20] Dawei Jiang, Gang Chen, Beng Chin Ooi and Sai Wu, "epiC: An Extensible and Scalable System for Processing Big Data", Proceedings of the VLDB Endowment, Vol 7, No.

[21] Zoltan Prekopcsak, Gabor Makrai and Tamas Henk, "Radoop: Analyzing Big Data with RapidMiner and Hadoop".

[22] Alexander Alexandrov, Rico Bergmann and Stephan Ewen, "The Stratosphere Platform for Big Data Analytics", Springer, 2014.

[23] Xindong Wu, Xingquan Zhu, Gong Qing Wu and Wei Ding, "Data Mining with Big Data", IEEE Transactions on Knowledge and Data Engineering, Vol 26, No. 1, January 2014.

[24] Barna Saha and Divesh Srivastava, "Data Quality: The Other Face of Big Data", IEEE, 2014.

[25] Jinsong Zhang, Yan Chen and Taoying Li, "Opportunities of Innovation under Challenges of Big Data", 10th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), IEEE, 2013.

[26] Seref Sagiroglu and Duygu Sinanc, "Big Data: A Review", IEEE, 2013.

[27] Janusz Weilki, "Implementation of Big Data Concept in Organizations: Possibilities, Impediments and Challenges", Proceedings of the 2013 Federated Conference on Computer Science and Information Systems, pp. 985-989, IEEE, 2013.

[28] Du Zhang, "Inconsistencies in Big Data", Proceedings of the IEEE International Conference on Cognitive Informatics and Cognitive Computing, IEEE, 2013.

[29] Katharina Ebner, Thilo Buhnen and Nils Urbach, "Think Big with Big Data: Identifying Suitable Big Data Strategies in Corporate Environment", 47th Hawaii International Conference on System Sciences, IEEE, 2014.

[30] Youssef M. Essa, "Mobile Agent Based New Framework for Improving Big Data Analysis", International Conference on Cloud Computing and Big Data, IEEE, 2013.

[31] Sung Hwan Kim, Nam Uk Kim and Tai Myoung Chung, "Attribute Relationship Evaluation Methodology for Big Data Security", IEEE, 2013.

[32] Zibin Zheng, Jieming Zhu and Michael R. Lyu, "Service-generated Big Data and Big Data as a Service: An Overview", IEEE International Congress on Big Data, 2013.

[33] Avita Katal, Mohammad Wazid and R H Goudar, "Big Data: Issues, Challenges, Tools and Good Practices", IEEE, 2013.

[34] Marcus R. Wigan and Roger Clarke, "Big Data's Big Unintended Consequences", IEEE Computer Society, 2013.

[35] Yuri Demchenko, Paolo Grosso and Cees de Laat, "Addressing Big Data Issues in Scientific Data Infrastructure", IEEE, 2013.

[36] Dan Garlasu, "Data Implementation Based on Grid Computing".

[37] Stephen Kaisler, Frank Armour and J. Alberto, "Big Data: Issues and Challenges Moving Forward", 46th Hawaii International Conference on System Sciences, IEEE, 2012.

[38] Barna Saha and Divesh Srivastava, "Data Quality: The Other Face of Big Data", IEEE, 2014.

