BIG DATA ANALYTICS
(Weather Analysis and Prediction)
End Semester MINI PROJECT report submitted in partial fulfillment of the
requirements for the completion of the seventh semester of the
UNDER GRADUATE PROGRAM in Electronics and
Communication Engineering (B.Tech in ECE).
Submitted by:
Gaurav (IEC2012021)
Satish Kumar (IEC2012049)
Vatsal Mishra (IEC2012068)
Akash Kumar Salil (IEC2012033)
Varsheindra Gautam (IEC2012071)
CANDIDATES' DECLARATION
We hereby declare that the work presented in this project report, entitled
BIG DATA ANALYTICS (Weather Analysis and Prediction), submitted
in partial fulfillment of the requirements for the completion of the 7th semester
of the UNDER GRADUATE PROGRAM (B.Tech in ECE), is an authentic
record of our original work carried out from July 2015 to November 2015
under the guidance of Dr. Satish Kumar Singh and Dr. Rajat Kumar
Singh. Due acknowledgements have been made in the text to all other
material used. The project was done in full compliance with the requirements
and constraints of the prescribed curriculum.
Place: Allahabad
Date: 18/11/2015
Supervisors:
Dr. Satish Kumar Singh
Dr. Rajat Kumar Singh
ACKNOWLEDGEMENT
We owe a special debt of gratitude to Dr. Satish Kumar Singh and Dr. Rajat
Kumar Singh for their constant support and guidance throughout the course
of our work. Their sincerity, thoroughness and perseverance have been a
constant source of inspiration for us. It is only through their cognizant efforts
that our endeavours have seen the light of day.
TABLE OF CONTENTS
1. Introduction
2. Motivation
3. Problem definition and scope
4. Literature survey and analysis of recent similar work
5. Approach and Proposed methodology
6. Hardware and Software Requirements
7. References
INTRODUCTION
We live in the data age. It is not easy to measure the total volume of data
stored electronically, but an IDC estimate put the size of the digital
universe at 0.18 zettabytes in 2006 and forecast a tenfold growth to
1.8 zettabytes by 2011. A zettabyte is 10^21 bytes, or equivalently one
thousand exabytes, one million petabytes, or one billion terabytes. That is
roughly the same order of magnitude as one disk drive for every person in
the world. Online searches, store purchases, Facebook posts, tweets,
Foursquare check-ins, cell phone usage, and so on are creating a flood of
data that, when organized, categorized, and analyzed, reveals trends and
habits about ourselves and society at large.
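The "one disk drive for every person" claim can be checked with quick arithmetic. The world population of roughly 7 billion and the ~250 GB consumer drive size are rough assumptions for that era, used only for illustration:

```python
# Back-of-the-envelope check of the "one disk drive per person" claim.
zettabyte = 10**21                      # bytes in a zettabyte
digital_universe = 1.8 * zettabyte      # IDC's 2011 forecast
world_population = 7 * 10**9            # assumed rough figure

bytes_per_person = digital_universe / world_population
print(round(bytes_per_person / 10**9))  # ~257 GB per person
```

About 257 GB per person, which is indeed on the order of a single consumer hard drive of that time.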
This flood of data comes from many sources.
Big data is the term for a collection of data sets so large and complex that they
become difficult to process using on-hand database management tools or
traditional data processing applications. The challenges include capture,
curation, storage, search, sharing, transfer, analysis, and visualization. Big
data refers to the explosion in the quantity (and sometimes quality) of
available and potentially relevant data, largely the result of recent and
unprecedented advances in data recording and storage technology.
To define big data in competitive terms, we must think about what it takes
to compete in the business world. Big data is traditionally characterized as a
rushing river: large amounts of data flowing at a rapid pace. To be
competitive with customers, big data creates products which are valuable
and unique. To be competitive with suppliers, big data is freely available
with no obligations or constraints. To be competitive with new entrants, big
data is difficult for newcomers to handle. To be competitive with substitutes,
big data creates products which preclude other products from satisfying the
same need.
MOTIVATION
The use of big data will become a key basis of competition and growth for
individual firms. From the standpoint of competitiveness and the potential
capture of value, all companies need to take big data seriously. In most
industries, established competitors and new entrants alike will leverage
data-driven strategies to innovate, compete, and capture value from deep
and up-to-real-time information. Indeed, we found early examples of such
use of data in every sector we examined.
The use of big data will underpin new waves of productivity growth and
consumer surplus. For example, we estimate that a retailer using big data to
the full has the potential to increase its operating margin by more than 60
percent. Big data offers considerable benefits to consumers as well as to
companies and organizations. For instance, services enabled by
personal-location data can allow consumers to capture $600 billion in
economic surplus.
While the use of big data will matter across sectors, some sectors are set for
greater gains. We compared the historical productivity of sectors in the
United States with the potential of these sectors to capture value from big
data (using an index that combines several quantitative metrics), and found
that the opportunities and challenges vary from sector to sector. The
computer and electronic products and information sectors, as well as finance
and insurance, and government are poised to gain substantially from the use
of big data.
There will be a shortage of talent necessary for organizations to take
advantage of big data. By 2018, the United States alone could face a
shortage of 140,000 to 190,000 people with deep analytical skills as well as
1.5 million managers and analysts with the know-how to use the analysis of
big data to make effective decisions.
Several issues will have to be addressed to capture the full potential of big
data. Policies related to privacy, security, intellectual property, and even
liability will need to be addressed in a big data world. Organizations need
not only to put the right talent and technology in place but also to structure
workflows and incentives to optimize the use of big data. Access to data
will also be critical.
PROBLEM DEFINITION:
What matters when dealing with Big Data?
- Smart sampling of data: reducing the original data while not losing its
  statistical properties
- Finding similar terms
- Efficient multi-dimensional indexing
- Incremental updating of models: crucial for streaming data
- Distributed linear algebra: dealing with large sparse matrices
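The smart-sampling point above can be illustrated with reservoir sampling, a standard technique (not necessarily the one used in this project) for drawing a fixed-size uniform sample from a stream whose length is not known in advance:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of k items from an arbitrarily long stream."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k/(i+1); this preserves uniformity.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(10**6), 5)
print(sample)  # five values drawn uniformly from 0..999999
```

Only k items are ever held in memory, which is exactly the property needed when the full data set is too large to store.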
In this project we deal with weather prediction, following the approach
described below.

Fig. 1: Map/Reduce logical data flow
Hadoop block diagram:
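The Map/Reduce flow above can be sketched in miniature. The snippet below simulates the map, shuffle, and reduce phases in plain Python for a toy weather task, finding the maximum temperature per year; the record format ("year,temperature") and the values are assumptions for illustration, not the project's actual data:

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (year, temperature) pair for each input record.
    year, temp = line.strip().split(",")
    yield year, float(temp)

def reducer(year, temps):
    # Reduce phase: keep the maximum temperature observed for each year.
    return year, max(temps)

records = ["2001,31.2", "2001,35.6", "2002,29.8", "2002,33.1"]

# Shuffle phase: group all mapper output by key (year).
groups = defaultdict(list)
for line in records:
    for year, temp in mapper(line):
        groups[year].append(temp)

results = dict(reducer(y, ts) for y, ts in sorted(groups.items()))
print(results)  # {'2001': 35.6, '2002': 33.1}
```

In a real Hadoop job the mapper and reducer run on different nodes and the framework performs the shuffle; the logical data flow, however, is the same.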
Curve fitting:
Capturing the trend in the data by assigning a single function across the entire
range. The example below uses a straight-line function, y = a_0 + a_1 x.
Given points (x_1, y_1), ..., (x_n, y_n), setting y_i = a_0 + a_1 x_i for each
point gives the system

  \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
  \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}
  =
  \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix},
  \quad \text{i.e.} \quad X a = y.

This is a Vandermonde matrix. We can also obtain the matrix for a least-squares
fit by premultiplying both sides by the transpose of the first matrix, which gives

  X^T X a = X^T y,

so

  a = (X^T X)^{-1} X^T y.

As before, given the points (x_1, y_1), ..., (x_n, y_n), this yields the
least-squares estimates of a_0 and a_1.
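The least-squares result above can be checked numerically. A minimal sketch in plain Python, solving the 2x2 normal equations for a straight-line fit (the data values are made up for illustration):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a0 + a1*x by solving the 2x2 normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations X^T X a = X^T y for a line:
    #   [n   sx ] [a0]   [sy ]
    #   [sx  sxx] [a1] = [sxy]
    det = n * sxx - sx * sx
    a0 = (sy * sxx - sx * sxy) / det
    a1 = (n * sxy - sx * sy) / det
    return a0, a1

a0, a1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # data lies exactly on y = 1 + 2x
print(a0, a1)  # 1.0 2.0
```

For higher-degree polynomials the same derivation applies with a wider Vandermonde matrix, at which point a library solver is preferable to hand-written formulas.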
SCOPE
Analyzing big data allows analysts, researchers, and business users to make
better and faster decisions using data that was previously inaccessible or
unusable. The 5 Key Big Data Use Cases:
The use of big data will become a key basis of competition and growth for
individual firms. In most industries, established competitors and new entrants
alike will leverage data-driven strategies to innovate, compete, and capture
value from deep and up-to-real-time information. Indeed, we found early
examples of such use of data in every sector we examined. There is much
future research that could come out of this project. The potential research
can be broken up into two main areas: experimentation and exploration.
In the area of experimentation a more comprehensive performance study
could be done. There are many parameters in Hadoop that can be
customized that could potentially increase the performance of the Map
Reduce process. Experiments could also be conducted with larger clusters
and more demanding Map Reduce tasks that require much larger data sets.
In the area of exploration a more in depth study of Map Reduce could be
conducted. This could involve writing programs that make use of the Map
Reduce process that Hadoop provides. An attempt to make an efficient
solution to an NP-complete problem would be an interesting application.
Results:
[Plot: weather analysis results for Berkeley, years 2020-2023]
[Plot: weather analysis results for Delhi, years 2001-2010]
HARDWARE AND SOFTWARE REQUIREMENTS
Hardware:
RAM - 4 GB
Hard Disk - 80 GB
Keyboard - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Software:
Ubuntu 14.04
SSH Server
Java 6 or greater
Hadoop 1.x
REFERENCES
[1] Hadoop. http://www.cloudera.com/what-is-hadoop/.
[2] Hadoop Overview. http://wiki.apache.org/hadoop/ProjectDescription/
[3] MapReduce. http://hadoop.apache.org/common/docs/mapred_tutorial.html
[4] Tom White. Hadoop: The Definitive Guide. O'Reilly Media, Inc.,
Sebastopol, CA, first edition, June 2009.
[5] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data
Processing on Large Clusters. Commun. ACM, 51(1):107-113, 2008.
Figures :
Fig. 1 : http://resources.appistry.com/pressappistry-cloudiqstorage-now-generally-available/
Fig. 2 : http://www.tutorialspoint.com/hadoop/hadoop_hdfs_overview.htm
Fig. 3 : http://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm
Fig. 4 : http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/