
Big Data Analytics

Dawn of Big Data


Data becomes big data when its volume, velocity, or variety exceeds the ability of IT
systems to ingest, store, analyze, and process it. Many organizations have the equipment and
expertise to handle large quantities of structured data, but with the increasing volume and
faster flows of data, they lack the ability to mine it and derive actionable intelligence in a
timely way.
In defining big data, it is also important to understand the mix of unstructured and multi-structured data that makes up the volume of information.
Big Data Characteristics

(i) Volume - The quantity of data generated is central in this context. The size of the
data determines its value and potential, and whether it can be considered Big Data at all.
The name Big Data itself contains a term related to size, hence this characteristic.
(ii) Variety - The next aspect of Big Data is its variety, meaning the category to which the
data belongs. Data analysts need to know this category so that the people who work closely
with the data can use it effectively to their advantage, which in turn upholds the importance
of Big Data.
(iii) Velocity - Velocity in this context refers to the speed at which data is generated and
processed to meet the demands and challenges that lie ahead on the path of growth and
development.

Big Data comes mainly in two forms:
1. Structured
2. Unstructured (there is also semi-structured data, e.g. XML)
Structured data has semantic meaning attached to it, whereas unstructured data has no
predefined meaning attached. A few examples of unstructured data:
1. Calls, texts, tweets, browsing through various websites each day, and messages exchanged
by several means.
2. Social media usage by several million people exchanging data in various forms also
forms a part of Big Data.
3. Card transactions for various payments, made in large numbers every second
across the world, also constitute Big Data.
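The difference between the forms of data can be illustrated with a minimal Python sketch. The sample records, field names (`card_id`, `amount`, `currency`) and XML tags below are invented for illustration and are not taken from any real data set.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, so every field can be addressed by name.
# (Hypothetical card transaction record.)
structured = io.StringIO("card_id,amount,currency\n1001,25.50,USD\n")
row = next(csv.DictReader(structured))
amount = float(row["amount"])

# Semi-structured: tags describe the content, but the layout may vary per record.
# (Hypothetical XML message.)
semi = ET.fromstring(
    "<message><from>alice</from><body>See you at the show</body></message>"
)
sender = semi.findtext("from")

# Unstructured: free text with no schema; it has to be parsed or mined.
unstructured = "Just paid for two tickets, see you at the show!"
word_count = len(unstructured.split())

print(amount, sender, word_count)
```

The structured and semi-structured records can be queried directly by field or tag name, while the free text only yields information after some form of text processing.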
Tools for Analysing Big Data:
Hadoop is a free, Java-based programming framework that supports the processing of large
data sets in a distributed computing environment. It is part of the Apache project sponsored
by the Apache Software Foundation.
Hadoop makes it possible to run applications on systems with thousands of nodes involving
thousands of terabytes.
The Hadoop framework is used by major players including Google, Yahoo and IBM, largely
for applications involving search engines and advertising.
Hadoop is a popular choice when you need to filter, sort, or pre-process large amounts of new
data in place and distill it to generate denser data that theoretically contains more
information. Pre-processing involves filtering new data sources to make them suitable for
additional analysis in a data warehouse.
For example, a concert promoter might want to analyze Twitter feeds to determine how
attendees liked the staging, set list, costumes, and warm-up band associated with a new Lady
Gaga tour. They might begin by collecting tweets related to the artist using hashtags like
#Gaga, #concert, #Palladium, etc. The sentiment of each tweet can be determined by
parsing the text and comparing it with lists of positive and negative English words.
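The word-comparison approach described here can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the word lists `POSITIVE` and `NEGATIVE`, the sample tweets, and the function names are all assumptions made for the example, and a real analysis would use a much larger sentiment lexicon.

```python
import re

# Hypothetical, deliberately tiny word lists (assumption for illustration).
POSITIVE = {"love", "great", "amazing", "awesome", "good"}
NEGATIVE = {"hate", "boring", "awful", "bad", "terrible"}

def sentiment_score(tweet: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", tweet.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives

def label(tweet: str) -> str:
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(tweet)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Invented sample tweets using the hashtags mentioned in the text.
tweets = [
    "I love the staging, amazing set list! #Gaga #concert",
    "The warm-up band was boring and the sound was awful #Palladium",
    "At the #Gaga show tonight",
]

for tweet in tweets:
    print(label(tweet), "|", tweet)
```

Counting whole words only is a deliberate simplification: it ignores negation ("not good"), sarcasm, and word forms such as "loved", which is why larger lexicons or trained models are normally preferred.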
In conjunction with MapReduce (a software framework for easily writing applications
that process vast amounts of data), Hadoop can process a huge amount of data in parallel
on multiple servers, then re-combine it into a unified answer set or integrate it with other
types of enterprise data. The resulting data set can be imported into a data warehouse for data
mining and predictive analytics.
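The map, group, and combine pattern behind MapReduce can be illustrated with a single-process Python sketch that counts hashtags. It does not use the Hadoop API and runs on one machine; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample tweets are hypothetical. On a real cluster, the map and reduce steps would run in parallel on many servers.

```python
from collections import defaultdict

def map_phase(tweet):
    """Map: emit a (hashtag, 1) pair for every hashtag in one tweet."""
    return [(word.lower(), 1) for word in tweet.split() if word.startswith("#")]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def run_job(tweets):
    """Run the three phases in sequence and return the unified answer set."""
    mapped = [pair for tweet in tweets for pair in map_phase(tweet)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Invented sample input.
counts = run_job([
    "Great show #Gaga #concert",
    "On my way #Palladium #Gaga",
    "#concert tonight",
])
print(counts)
```

Because each tweet is mapped independently and each key is reduced independently, both phases can be split across servers, which is what allows the work to be re-combined into one result at the end.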
Cloudera Inc. is an American software company that provides Apache Hadoop-based
software, support, services, and training to business customers.
Use of Analytics
1. Banking: fraud detection, cross-selling products, customer retention
2. Healthcare: preventive healthcare, drug discovery
3. Retail: inventory management, promotional analysis
4. Transportation: traffic management, smarter roads
5. Government: citizen services, crime prevention