
BIG DATA ANALYTICS: A SURVEY

Gautam, Research Scholar, UIET, Rohtak
Dr. Chavi Rana, Assistant Professor, UIET, Rohtak

Abstract
Storing, managing and processing huge amounts of data is very difficult. The term 'Big Data'
describes various techniques and technologies to store, distribute, manage and analyze huge
amounts of data with different structures. Big data consists of structured, unstructured and
semi-structured data, so problems arise because conventional data management methods are
incapable of handling it. To process these huge amounts of data in an inexpensive and efficient
way, parallelism is used. Big Data is data that is large in volume and complex, and this
complexity requires new architectures, techniques, algorithms and analytics to manage it and
extract knowledge from it. Hadoop is a framework for processing large amounts of data; it
provides better storage capacity for large datasets and performs parallel processing of big
data, which gives better computational power to all tasks. It works in batch processing mode
and is the core platform for structuring Big Data; it also solves the problem of making the
data useful for analytics purposes. In this paper, we provide a brief overview of Big Data
management involving Hadoop and highlight research efforts and the challenges to big data.

Index Terms: Big Data, Hadoop, Map Reduce, HDFS, Hadoop Component.

1. Introduction:
1.1. Big Data: Definition

Big data is a term used to describe the exponential growth and availability of structured,
unstructured and semi-structured data whose size (volume), complexity (variability), and rate
of growth (velocity) make it difficult or even impossible to manage and analyze using
conventional software tools and technologies. As the amount of data to be processed increases,
the time needed to produce results also increases. Retrieving information from big data is
still a complex and time-consuming task. Big data provides tremendous opportunities for
enterprise information management and decision making. Recent studies show that big data is
not limited to business needs but also helps with research and scientific problems.

The Big Data problem is characterized by the 3V features:

Volume - a huge amount of data. The volume of big data can be measured in megabytes,
gigabytes, terabytes or petabytes.
Velocity - a high data ingestion rate, or the speed at which the data can be analyzed.
Variety - a mix of structured, semi-structured and unstructured data.

These 3V features pose a challenge to data processing systems, since these systems either cannot scale to the huge data volume in a cost-effective way or fail to handle data of varied types. The solutions to the Big Data problem are largely based on the MapReduce framework [9] and its open source implementation, Hadoop. Hadoop handles the data volume challenge successfully.

1.2. Hadoop

Hadoop is an open-source software framework used for distributed storage and processing of big data using the MapReduce programming model. It is open source software from the Apache Software Foundation and is Linux based. It is used by famous websites like Google, Yahoo, Facebook, Amazon and many more. When the data size increases, existing algorithms have problems managing it, so the main problem is to store and process that huge amount of data. Hadoop solves this problem because it stores and processes huge amounts of data in less time. Hadoop is a framework for processing large amounts of data; it provides better storage capacity for large datasets and performs parallel processing of big data, which gives better computational power to all tasks. The modules present in Hadoop are designed with the fundamental assumption that hardware failures are common occurrences and should be handled automatically by the framework. Hadoop works in batch processing mode and has two major components: HDFS (Hadoop Distributed File System) [12] for huge data storage and MapReduce for processing huge datasets.

The core of Hadoop consists of two parts, the storage part and the processing part.
a) Storage part: The storage part of Hadoop is HDFS (Hadoop Distributed File System), which stores huge amounts of data with a high degree of throughput. This data is stored across the nodes of a cluster.
b) Processing part: The processing part of Hadoop is MapReduce, a software framework which processes large amounts of data on clusters.

Hadoop distributes the data to the nodes of the cluster so that they process it in parallel, and this approach also takes advantage of data locality. This allows the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking.

Fig. 1. Hadoop architecture
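The data-locality idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Hadoop code: the block names, node names, slot counts and the `assign_tasks` helper are all hypothetical, and the greedy policy shown here is a simplification chosen for illustration rather than the scheduler Hadoop actually uses.

```python
def assign_tasks(block_locations, free_slots):
    """Greedy, locality-first task placement (illustrative only).

    block_locations: dict mapping a block id to the list of nodes storing it.
    free_slots: dict mapping a node name to its number of free task slots.
    Returns a dict: block id -> (node, "local" or "remote").
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    deferred = []

    # First pass: run each task on a node that already holds its block.
    for block, nodes in block_locations.items():
        local = next((n for n in nodes if slots.get(n, 0) > 0), None)
        if local is not None:
            assignment[block] = (local, "local")
            slots[local] -= 1
        else:
            deferred.append(block)

    # Second pass: remaining tasks go to any free node (data must be moved).
    for block in deferred:
        node = next((n for n, s in slots.items() if s > 0), None)
        if node is None:
            raise RuntimeError("no free slots left in the cluster")
        assignment[block] = (node, "remote")
        slots[node] -= 1

    return assignment


if __name__ == "__main__":
    placement = assign_tasks(
        {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]},
        {"n1": 1, "n2": 1, "n3": 0, "n4": 1},
    )
    for block, (node, kind) in placement.items():
        print(block, node, kind)
```

Tasks placed as "local" read their input from the node's own disk, while "remote" tasks must fetch it over the network, which is the cost that data locality tries to avoid.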

1.3. HDFS

Hadoop Distributed File System (HDFS) is the storage component of Hadoop, which stores huge amounts of structured, unstructured and semi-structured data. HDFS is a Java-based file system that is reliable and manageable. It has great features such as high availability, fault tolerance, load balancing, security, flexible access, easy management and high data throughput. HDFS has a master/slave architecture. A small Hadoop cluster has a single master and multiple worker nodes, called slave nodes, as shown in Fig. 2. The master node consists of a JobTracker, TaskTracker, NameNode and DataNode [14], whereas a slave or worker node acts as both a DataNode and a TaskTracker. [23]

Fig. 2. HDFS Architecture

1.4. Hadoop MapReduce

MapReduce is a Java-based programming paradigm for processing huge amounts of data stored in HDFS. MapReduce is the heart of the Hadoop framework and provides scalability across thousands of nodes in a Hadoop cluster. It provides parallel processing of data. Every MapReduce job performs two tasks: one is the Map task and the other is the Reduce task. The Map task takes a set of data, processes it at node level and generates the output. The Reduce task takes the output of the Map task as its input and combines it into a smaller set of tuples (it reduces the large dataset into a smaller one) based on the transformations and logic applied. The advantage of MapReduce is that it is easy to scale data processing over multiple computing nodes.
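The Map and Reduce tasks described above can be sketched with the classic word-count example. The code below is a single-process simulation written in plain Python, not Hadoop's Java API; the function names `map_task`, `shuffle`, `reduce_task` and `run_job` are hypothetical and chosen only for this illustration. In a real cluster the map calls would run in parallel on the nodes holding the input, and the framework itself would perform the grouping step.

```python
from collections import defaultdict


def map_task(line):
    """Map: turn one input line into (key, value) pairs, here (word, 1)."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: combine all values for one key into a smaller result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_task(line)]
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "Big Hadoop data"]))
```

Because each `map_task` call depends only on its own line and each `reduce_task` call only on its own key, both phases can be spread over many nodes, which is the scaling property the text refers to.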

Map stage: The map stage's job is to process the input data, as shown in Fig. 3. Generally the input data is in the form of a file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to the map function, which processes the data and creates several small chunks of data.

Reduce stage: The Reducer's job is to process the data that comes from the map stage. After processing, it produces a new set of output, which is stored in the Hadoop Distributed File System (HDFS).

Fig. 3. MapReduce Architecture

2. Literature Survey:

This paper provides a detailed review of different approaches used in Big Data in recent years. The table provides an extensive survey of this research, with the names of the authors and the year of publication in descending order, along with the proposed work and the approaches used, as shown below:

Authors | Publication Year | Proposed Work | Technique used
Daniele Apiletti, Elena Baralis, Tania Cerquitelli, Paolo Garza, Fabio Pulvirenti, Luca Venturini [33] | 2017 | Reviews Hadoop- and Spark-based scalable algorithms for the mining problem in the Big Data domain, with both theoretical and experimental comparative analyses. | Spark algorithms for mining in Big Data are used.
Dinesh J. Prajapati, Sanjay Garg, N.C. Chauhan [34] | 2017 | The proposed method initially extracts multilevel association rules, including level-crossing, for each zone using DMFPM. Multilevel consistent and inconsistent rules are then evaluated and compared based on different experimental results that lead to the final conclusions. | Uses DMFPM to extract multilevel association rules, including level-crossing, for each zone.
Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot, Nathalie Villa-Vialaneix [35] | 2017 | Proposed a selective review that deals with scaling random forests to Big Data problems and also describes how out-of-bag error is addressed. | Addressing the out-of-bag error problem.
Navroop Kaur, Sandeep K. Sood [39] | 2017 | Presented a resource management system which solves the problems of selecting and allocating appropriate resources to big data, using the 4 V's characteristics. | Uses Cod and SOM to estimate properties of big data.
M. Bakratsas, P. Basaras, D. Katsaros, L. Tassiulas [36] | 2017 | Investigate the relative performance and benefits of SSDs versus hard disk drives (HDDs) when they are used as storage for Hadoop's MapReduce. | Evaluate SSDs and HDDs by executing an algorithm on real social network data.
Ziliang Zong, Rong Ge, Qijun Gu [37] | 2017 | Presented the design of the Marcher system and demonstrated its measurement tools for obtaining power consumption data in applications. | Designed the Marcher system and its tools.
Guangchen Ruan, Hui Zhang [38] | 2017 | Proposed a framework that integrates scalable computing, information visualization and user interfaces to explore large-scale multi-modal data streams, which combine to reveal an effective and efficient way to perform closed-loop big data analysis with visualization and scalable computing. | A parallel mining algorithm running on HPC is used.
Feras A. Batarseh, Eyad Abdel Latif [40] | 2016 | Study of healthcare data collected from various sources, so that the quality and best practices of the field can be assessed using big data tools. | Assesses QoS by examining historical health data with an analytical infrastructure.
Dawei Jiang, Sai Wu, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, Jun Xu [2] | 2016 | Presents epiC, an extensible system to address Big Data's data variety challenge. They also present the design and implementation of epiC's concurrent programming model and two customized data processing models. | Introduce a new programming model system called epiC.
Marcos D. Assunção, Rodrigo N. Calheiros, Silvia Bianchi, Marco A.S. Netto, Rajkumar Buyya [3] | 2015 | Discusses environments for carrying out analytics on Clouds for Big Data applications. Through a survey they find possible gaps in technology and provide future directions on Cloud-supported Big Data computing. | Defines various methods used in different research: data management, development, visualization and business models.
Tao Xu, Dongsheng Wang, Guodong Liu [20] | 2015 | Presented an efficient system for managing PB level structured data called Banian. Banian overcomes the storage problem. | Used a PB level structured data system called Banian.
Chao Wang, Xi Li, Peng Chen, Aili Wang, Xuehai Zhou, Hong Yu [19] | 2015 | Proposed an FPGA-based acceleration solution with the MapReduce framework. The combination of these two, namely hardware acceleration and the MapReduce execution flow, can enhance the task of aligning short length reads to a known reference genome. | Used an FPGA-based acceleration solution with the MapReduce framework.
Claudio A. Ardagna, Ernesto Damiani, Fulvio Frati, Davide Rebeccani [25] | 2015 | Presented a score-based benchmark for distributed databases which supports adopters. The proposed benchmark is independent of the specific NoSQL database, the configuration of the database and the deployment environment. | Used a score-based benchmark for NoSQL databases.
Qinghua Lu, Zheng Li, Maria Kihl, Liming Zhu, Weishan Zhang [26] | 2015 | Presented the conceptual framework CF4BDA to analyze the existing work done on BDA applications, involving the lifecycle of BDA applications and the objects involved in BDA applications in the cloud. | Framework CF4BDA to analyze the work on BDA applications.
Sreedhar C., N. Kasiviswanath, P. Chenna Reddy [1] | 2015 | The primary purpose of their work is to provide a comprehensive survey on Big Data management and an overview of various algorithms related to job scheduling in Hadoop. | Delay and Genetic scheduling algorithms are used.
Hongbing Wang, Chao Yu, Lei Wan, Qi Yu [24] | 2015 | Proposed heterogeneous and trust-based service selection by developing a novel multi-objective optimization approach to make a trade-off decision between a service's trust value and the user's QoS preference to rank candidates. | Heterogeneous and trust-based selection by an optimization approach.
Simon Fong, Raymond Wong, Athanasios V. Vasilakos [23] | 2015 | Presented algorithms to collect big data, which is present in large degree, and tested them for performance evaluation using accelerated particle swarm optimization (APSO), a type of swarm search that enhanced analytical accuracy within reasonable processing time. | Accelerated Particle Swarm Optimization (APSO) algorithms to collect big data.
Marco Viceconti, Peter Hunter, Rod Hose [21] | 2015 | Proposed that big data analytics can be successfully combined with VPH technology to give desirable medical solutions. | Use VPH technology combined with big data analytics.
Syed Akhter Hossain [29] | 2015 | Described the nascent field of big data analytics in education, with discussion of big data prospects, challenges and the way forward. Also focuses on research and development issues for educationists and practitioners of big data analytics. | The nascent field of big data analytics in education and development issues for educationists of big data analytics are defined.
Alun Evans, Javi Agenjo, Josep Blat [28] | 2015 | Presented a web-based application for analytic visualization of on-set 3D media data and metadata, which combines research from several fields of image processing and 3D graphics. | Use WebGL on the web and metadata visualization.
Yanhao Huang, Xiaoxin Zhou [22] | 2015 | Proposed the structure, basic elements, calculations and multi-dimensional reasoning method of the new knowledge model. Research shows it is more powerful and adapts to various requirements of electric power big data. | The knowledge model is established and various knowledge calculations are done.
Xue-Wen Chen, Xiaotong Lin [30] | 2014 | Presented an overview of deep learning, and also highlighted current research efforts and the challenges to big data, as well as future trends. | Unprecedented challenges to harnessing data and information are presented.
Matturdi Bardi, Zhou Xianwei, Li Shuai, Lin Fuhong [32] | 2014 | Reviewed the various benefits and challenges of security and privacy in Big Data and also presented some possible methods and techniques to ensure Big Data security and privacy. | Big data security and privacy techniques are defined.
Suman Arora, Dr. Madhu Goel [7] | 2014 | Studied and analyzed various scheduling techniques which enhance performance when using Hadoop. | Speculative execution and the Copy Compute Splitting technique of Hadoop.
Shifeng Fang, Li Da Xu, Yunqiang Zhu, Jiaerheng Ahati, Huan Pei, Jianwu Yan, Zhihui Liu [17] | 2014 | Introduces a novel IIS that combines Internet of Things (IoT), Cloud Computing, Geoinformatics, geographical information system (GIS) and e-Science for environmental monitoring and management, with a case study on climate change and its ecological effects in a particular region. | Combine IoT, Cloud, GIS and e-Science for environmental monitoring and management.
Chang Liu, Jinjun Chen, Chi Yang, Rajiv Ranjan, Ramamohanarao Kotagiri [16] | 2014 | Presented types of fine-grained data updates and a scheme that can fully support authorized auditing and fine-grained update requests. Also propose an enhancement that can reduce communication overheads for verifying small updates. | Describe a scheme for supporting variable-sized data blocks.
Daisuke Takaishi, Hiroki Nishiyama, Nei Kato, Ryu Miura [18] | 2014 | Proposed a new mobile sink routing and data gathering method with the help of network clustering based on a modified expectation maximization technique. | Use the EM algorithm for clustering.
Andrea Marinoni, Arianna Dagliati, Riccardo Bellazzi, Paolo Gamba [27] | 2013 | Provided a study of the connection between air pollution and clinical records. Correlations among black particulate concentration and micro- and macro-vascular disease can then be drawn properly. | The connection between various micro- and macro-vascular diseases can be drawn by creating an approach.
Xiongpai Qin, Xiaoyun Zhou [4] | 2013 | Reviewed big data benchmark work of the last several years and analyzed its characteristics. | Use MRBench for evaluating the MapReduce framework.
Rakesh Varma [6] | 2013 | The objective of the research is to study MapReduce and various scheduling algorithms which enhance performance. | For managing Big Data, various scheduling algorithms and LATE speculative execution are used.
Daniel Warneke [15] | 2011 | Discusses the opportunities and challenges for parallel data processing in clouds and presents Nephele. They also evaluate MapReduce processing and compare the results with the data processing framework Hadoop. | Use Nephele, a new data processing framework.
Mengjie Zhou, Haoji Hu, Minqi Zhou [14] | 2010 | Proposed an SLCA (Smallest Lowest Common Ancestor) based keyword search implementation for large-scale XML data sets on a MapReduce cluster. | SLCA-based keyword search implementation for large-scale data.
Jasmin Azemovic, Denis Music [13] | 2010 | Presented research on using different data types for storing unstructured data within a database; this research is inspired by the current situation of the information society. | Define various ways of storing unstructured data.
Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anand Kesari [5] | 2010 | Outline the S4 architecture and describe applications in real-life deployments. These include large scale applications for data mining and machine learning. | For dealing with unbounded streams of data, the S4 architecture is used.
Bi Shuoben, Xu Yin, Lü Guonian, Pei Anping, Jiao Feng [12] | 2009 | Introduces the single-dimensional Boolean association rule based on the Apriori algorithm, and the data mining algorithm of the multi-dimensional association rule based on the BUC algorithm. | Single-dimensional Boolean association rule on the Apriori algorithm and the BUC algorithm.
Hui Fang, Ming Yang, Ruqing Yang [11] | 2007 | Proposed an approach to localize the vehicle position with respect to a global map. It is based on the texture of the ground over which the vehicle moves. | Use a dimensioning technique for localizing on the global map.
Seema Metikurke, Vijay K. Vaishnavi [10] | 2006 | Describes a grid-enabled approach for automatic web page classification that applies the vector space model information retrieval strategy. | Grid-enabled approach for automatic web page classification.
John H. Phan, Chang F. Quo, May D. Wang [9] | 2004 | Reports the results of the first phase of development of a novel system that uses unsupervised methods of clustering to discover relationships of genes; knowledge-based supervised classification is used to get accurate prediction in cancer diagnosis. | Use unsupervised methods of clustering to discover relationships of genes, and knowledge-based supervised classification.
Sushant Goel, Hema Sharda, David Taniar [8] | 2003 | Distribute the scheduling responsibilities to the nodes where data is actually located and also propose a new serializability criterion, Parallel Database Quasi-Serializability (PDQS). | A new serializability criterion, Parallel Database Quasi-Serializability (PDQS), is used.

3. Challenges:

Big data is a very huge amount of data, so a set of challenges arises from difficulties regarding management, storage, security and processing. The main challenges include data preparation, data integration, scheduling, visualisation and user interaction, and security and privacy. Efficient handling of big data streams is a big challenge which involves various programming models. Efficient distributed storage and search are required for effective online analysis, which in turn requires effective techniques for data mining. New protocols and interfaces are required which are able to manage structured, semi-structured and unstructured data. There are many research challenges in big data visualisation, so more efficient techniques are required for real-time visualization. Security and privacy is also a big issue in big data. Security is crucial in any organization, so strong mechanisms for the privacy of data are needed.

4. Future Direction:

This work can be extended by developing a new job scheduling algorithm which considers all the parameters and can therefore produce better performance. The scheduling approach should also be smart enough to make real-time responses to a changing environment. In addition, the user profile (similar users) and usage profile (invoked services) should be taken into account, and some related collaborative filtering techniques can be considered for integration with our service selection approach.

5. Conclusion:

A survey of different big data approaches of recent years is presented. It is found that solutions to the Big Data problem are largely based on the MapReduce framework and its open source implementation, Hadoop. Hadoop handles the data volume challenge successfully. Big data management includes different tools, techniques and various algorithms for job scheduling in Hadoop. This paper helps a novice who wants to pursue his/her career in the field of big data.

6. References:

[1] Sreedhar C., N. Kasiviswanath, P. Chenna Reddy, "A Survey on Big Data Management and Job Scheduling" International Journal of Computer Applications (0975 - 8887), Volume 130 - No.13, November 2015.
[2] Dawei Jiang, Sai Wu, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, Jun Xu, "epiC: an extensible and scalable system for processing Big Data" The VLDB Journal (2016) 25:3–26, DOI 10.1007/s00778-015-0393-2.
[3] Marcos D. Assunção, Rodrigo N. Calheiros, Silvia Bianchi, Marco A.S. Netto, Rajkumar Buyya, "Big Data computing and clouds: Trends and future directions" J. Parallel Distrib. Comput. 79–80 (2015) 3–15.
[4] Xiongpai Qin and Xiaoyun Zhou, "A Survey on Benchmarks for Big Data and Some More Considerations" H. Yin et al. (Eds.): IDEAL 2013, LNCS 8206, pp. 619–627, Springer-Verlag Berlin Heidelberg 2013.
[5] Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anand Kesari, "S4: Distributed Stream Computing Platform" 2010 IEEE International Conference on Data Mining Workshops.
[6] Rakesh Varma, "Survey on MapReduce and Scheduling Algorithms in Hadoop" International Journal of Science and Research (IJSR), ISSN (Online): 2319-7064, Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438.
[7] Suman Arora and Dr. Madhu Goel, "Survey Paper on Scheduling in Hadoop" Volume 4, Issue 5, May 2014, ISSN: 2277 128X, International Journal of Advanced Research in Computer Science and Software Engineering.
[8] Sushant Goel, Hema Sharda and David Taniar, "Distributed Scheduler for High Performance Data-Centric Systems" IEEE, 2003.
[9] John H. Phan, Chang F. Quo, and May D. Wang, "Comparative Study of Microarray Data for Cancer Research" Proceedings of the 26th Annual International Conference of the IEEE EMBS, San Francisco, CA, USA, September 1-5, 2004.
[10] Seema Metikurke and Vijay K. Vaishnavi, "Grid-Enabled Automatic Web Page Classification" 2006 IEEE International Conference on Fuzzy Systems, Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada, July 16-21, 2006.
[11] Hui Fang, Ming Yang and Ruqing Yang, "Ground Texture Matching based Global Localization for Intelligent Vehicles in Urban Environment" Proceedings of the 2007 IEEE Intelligent Vehicles Symposium, Istanbul, Turkey, June 13-15, 2007.
[12] Bi Shuoben, Xu Yin, Lü Guonian, Pei Anping, Jiao Feng, "Study on Data Mining in First Period of Jiangzhai Site Based on the Association Algorithms" 2009 International Conference on Artificial Intelligence and Computational Intelligence, 2009 IEEE, DOI 10.1109/AICI.2009.
[13] Jasmin Azemovic, Denis Music, "Comparative analysis of efficient methods for storing unstructured data into database with accent on performance" 2010 2nd International Conference on Education Technology and Computer (ICETC), IEEE.
[14] Mengjie Zhou, Haoji Hu and Minqi Zhou, "Searching XML Data by SLCA on a MapReduce Cluster" 2010 IEEE.

[15] Daniel Warneke, "Exploiting Dynamic Resource Allocation for Efficient Parallel Data Processing in the Cloud" IEEE Transactions on Parallel and Distributed Systems, Vol. 22, No. 6, June 2011.
[16] Chang Liu, Jinjun Chen, Chi Yang, Rajiv Ranjan and Ramamohanarao Kotagiri, "Authorized Public Auditing of Dynamic Big Data Storage on Cloud with Efficient Verifiable Fine-Grained Updates" IEEE Transactions on Parallel and Distributed Systems, Vol. 25, No. 9, September 2014.
[17] Shifeng Fang, Li Da Xu, Yunqiang Zhu, Jiaerheng Ahati, Huan Pei, Jianwu Yan, and Zhihui Liu, "An Integrated System for Regional Environmental Monitoring and Management Based on Internet of Things" IEEE Transactions on Industrial Informatics, Vol. 10, No. 2, May 2014.
[18] Daisuke Takaishi, Hiroki Nishiyama, Nei Kato and Ryu Miura, "Toward Energy Efficient Big Data Gathering in Densely Distributed Sensor Networks" 2014 IEEE.
[19] Chao Wang, Xi Li, Peng Chen, Aili Wang, Xuehai Zhou and Hong Yu, "Heterogeneous Cloud Framework for Big Data Genome Sequencing" IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 12, No. 1, January/February 2015.
[20] Tao Xu, Dongsheng Wang and Guodong Liu, "Banian: A Cross-Platform Interactive Query System for Structured Big Data" Tsinghua Science and Technology, ISSN 1007-0214 07/11, pp. 62-71, Volume 20, Number 1, February 2015.
[21] Marco Viceconti, Peter Hunter and Rod Hose, "Big Data, Big Knowledge: Big Data for Personalized Healthcare" IEEE Journal of Biomedical and Health Informatics, Vol. 19, No. 4, July 2015.
[22] Yanhao Huang and Xiaoxin Zhou, "Knowledge Model for Electric Power Big Data Based on Ontology and Semantic Web" CSEE Journal of Power and Energy Systems, Vol. 1, No. 1, March 2015.
[23] Simon Fong, Raymond Wong, and Athanasios V. Vasilakos, "Accelerated PSO Swarm Search Feature Selection for Data Stream Mining Big Data" IEEE, 2015.
[24] Hongbing Wang, Chao Yu, Lei Wan and Qi Yu, "Effective BigData-Space Service Selection over Trust and Heterogeneous QoS Preferences" IEEE, 2015.
[25] Claudio A. Ardagna, Ernesto Damiani, Fulvio Frati, Davide Rebeccani, "A Configuration-Independent Score-Based Benchmark for Distributed Databases" IEEE Transactions on Services Computing, DOI 10.1109/TSC.2015.2485985, 2015.
[26] Qinghua Lu, Zheng Li, Maria Kihl, Liming Zhu and Weishan Zhang, "CF4BDA: A Conceptual Framework for Big Data Analytics Applications in the Cloud" IEEE, October 27, 2015.
[27] Andrea Marinoni, Arianna Dagliati, Riccardo Bellazzi, Paolo Gamba, "Inferring Air Quality Maps from Remotely Sensed Data to Exploit Georeferenced Clinical Onsets: The Pavia 2013 Case" IEEE.
[28] Alun Evans, Javi Agenjo, Josep Blat, "Combined 2D and 3D Web-Based Visualisation of On-Set Big Media Data" 978-1-4799-8339-1/15, 2015 IEEE.
[29] Syed Akhter Hossain, "Big Data Analytics in Education: Prospects and Challenges" 978-1-4673-7231-2/15, 2015 IEEE.
[30] Xue-Wen Chen and Xiaotong Lin, "Big Data Deep Learning: Challenges and Perspectives" IEEE, May 16, 2014.
[31] Zhi-Hua Zhou, Nitesh V. Chawla, Yaochu Jin, Graham J. Williams, "Big Data Opportunities and Challenges: Discussions from Data Analytics Perspectives" IEEE Computational Intelligence Magazine, November 2014.
[32] Matturdi Bardi, Zhou Xianwei, Li Shuai, Lin Fuhong, "Big Data security and privacy: A review" China Communications, Supplement No.2, 2014.
[33] Daniele Apiletti, Elena Baralis, Tania Cerquitelli, Paolo Garza, Fabio Pulvirenti, Luca Venturini, "Frequent itemsets mining for big data: A Comparative Analysis" IEEE, 2017.
[34] Dinesh J. Prajapati, Sanjay Garg, N.C. Chauhan, "MapReduce Based Multilevel Consistent and Inconsistent Association Rule Detection from Big Data Using Interestingness Measures" vol-9, September 2017.
[35] Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot, Nathalie Villa-Vialaneix, "Random Forests for Big Data" Vol-23, 2017.
[36] M. Bakratsas, P. Basaras, D. Katsaros, L. Tassiulas, "Hadoop MapReduce Performance on SSDs for Analyzing Social Networks" IEEE 2017.
[37] Ziliang Zong, Rong Ge, Qijun Gu, "Marcher: A Heterogeneous System Supporting Energy-Aware High Performance Computing and Big Data Analytics" Volume 8, 2017.
[38] Guangchen Ruan and Hui Zhang, "Closed-loop Big Data Analysis with Visualization and Scalable Computing" 2017.
[39] Navroop Kaur, Sandeep K. Sood, "Efficient Resource Management System Based on 4Vs of Big Data Streams" Volume 13, 2017.
[40] Feras A. Batarseh, Eyad Abdel Latif, "Assessing the Quality of Service Using Big Data Analytics: With Application to Healthcare" Volume 4, June 2016.
[41] Gantz J, Reinsel D. The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the Far East [J]. IDC iView: IDC Analyze the Future, 2012.
[42] Weiss R, Zgorski L. Obama Administration Unveils "Big Data" Initiative: Announces $200 Million in New R&D Investments [J]. Office of Science and Technology Policy, Washington, DC, 2012.
[43] Anderson C. The end of theory: the data deluge makes the scientific method obsolete. Wired Magazine 16.07 [J]. 2008.
[44] Mayer-Schönberger V, Cukier K. Big data: A revolution that will transform how we live, work, and think [M]. Houghton Mifflin Harcourt, 2013.
[45] Ardagna C A, Damiani E. Business Intelligence meets Big Data: An Overview on Security and Privacy [J]. 2013: 11-27.
[46] Manyika J, Chui M, Brown B, et al. Big data: The next frontier for innovation, competition, and productivity [J]. 2011.
[47] Laney D. 3-D Data Management: Controlling Data Volume, Velocity and Variety. META Group Original Research Note, 2001. Accessed online.
[48] Beyer M. Gartner says solving big data challenge involves more than just managing volumes of data. Gartner [J]. 2011.
[49] Beyer M A, Laney D. The importance of 'big data': a definition [J]. Stamford, CT: Gartner, 2012.
[50] Lefevre C. LHC: the guide (English version) [R]. 2009.
[51] Mangelsdorf J. Supercomputing the climate: Nasa's big data mission [J]. 2012.
[52] Kalil T. Big data is a big deal [J]. The White House, 2012.
[53] Sheet F. Big Data Across the Federal Government [J]. (2012-03-29) [2013-03-06]. http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_fact_sheet_final.pdf.
[54] Lampitt A. 'The real story of how Big Data analytics helped Obama win' [J]. Info World, 2013, 14.
Brumfiel G. Down the petabyte highway [J]. Nature, 2011, 469(20): 282-283.
Data P. The Emergence of a New Asset Class [C]//World Economic Forum Report. 2011.