Learning Objectives
Recognize the importance of data,
managerial issues and life cycle
Data
What is data?
Importance of data?
Data quality?
Result of dirty data?
Database
Collection of storage objects like tables.
TABLE
EMPNO   NAME    SALARY  DEPTNO
10      ARUN    30000   10
20      KIRAN   40000   20
FIELD = a column of the table (e.g. NAME)
RECORD = a row of the table (e.g. 10, ARUN, 30000, 10)
Data Views
Keys
Primary, Foreign, Candidate, Alternate
Indexes
SQL (Structured Query Language)
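To make the terms above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It recreates the sample employee table and shows a primary key, an index, a view, and two SQL queries. The names `emp`, `idx_emp_deptno`, and `emp_public` are illustrative assumptions, not part of the original slides, and foreign, candidate, and alternate keys are not shown.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table: each column is a field, each row is a record.
# EMPNO is the primary key, so it uniquely identifies a record.
cur.execute("""
    CREATE TABLE emp (
        empno  INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary INTEGER,
        deptno INTEGER
    )
""")
cur.executemany(
    "INSERT INTO emp (empno, name, salary, deptno) VALUES (?, ?, ?, ?)",
    [(10, "ARUN", 30000, 10), (20, "KIRAN", 40000, 20)],
)

# Index: speeds up lookups by department (name is hypothetical).
cur.execute("CREATE INDEX idx_emp_deptno ON emp (deptno)")

# View: a stored query that hides the salary column (name is hypothetical).
cur.execute("CREATE VIEW emp_public AS SELECT empno, name, deptno FROM emp")

# SQL queries against the view and the base table.
rows = cur.execute(
    "SELECT empno, name, deptno FROM emp_public ORDER BY empno"
).fetchall()
high_paid = cur.execute(
    "SELECT name FROM emp WHERE salary > ?", (35000,)
).fetchall()
conn.commit()
```

The view exposes only the non-sensitive fields, which is one common reason to define data views on top of a table.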
Data Warehousing
A data warehouse is a repository of historical data,
organized by subject, that supports decision makers
in the organization. It includes:
Online analytical processing (OLAP), which involves
the analysis of accumulated data by end users.
A multidimensional data structure, which allows data
to be represented in a three-dimensional matrix
(or data cube).
Unlike the data tables in a database, which are
designed to optimize storage, the data tables in a
warehouse are designed to respond to analytical
queries.
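As a rough illustration of the data cube idea, the sketch below stores sales figures in a structure keyed by three dimensions (product, region, quarter) and then aggregates along chosen dimensions, which is the kind of roll-up an OLAP query performs. The fact rows and the `roll_up` helper are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, sales amount).
facts = [
    ("laptop", "north", "Q1", 100),
    ("laptop", "south", "Q1", 150),
    ("phone", "north", "Q1", 200),
    ("phone", "north", "Q2", 250),
    ("laptop", "north", "Q2", 120),
]

# The cube: one cell per (product, region, quarter) combination.
cube = defaultdict(int)
for product, region, quarter, amount in facts:
    cube[(product, region, quarter)] += amount


def roll_up(cube, keep):
    """Sum the cube over every dimension whose index is not in `keep`."""
    result = defaultdict(int)
    for key, amount in cube.items():
        result[tuple(key[i] for i in keep)] += amount
    return dict(result)


# Total sales per product (dimension 0 kept, region and quarter summed out).
sales_by_product = roll_up(cube, keep=(0,))
# Total sales per region and quarter (product summed out).
sales_by_region_quarter = roll_up(cube, keep=(1, 2))
```

A real warehouse would precompute or index such aggregates so that analytical queries stay fast as the data grows.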
Data warehousing entails an ETL process:
Extracting data from various sources
Transforming it to fit operational needs
Loading it into the end target (Data mart)
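The three ETL steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a production pipeline: the two sources, their field names, the third employee record, and the `emp_mart` target table are all hypothetical, and an in-memory SQLite database stands in for the data mart.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export with inconsistent spacing and casing.
csv_source = "empno,name,salary\n10, arun ,30000\n20,Kiran,40000\n"
# Hypothetical source 2: records from another system with different field names.
api_source = [{"id": 30, "full_name": "MEERA", "pay": "35000"}]


def extract():
    """Extract: read raw rows from each source without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(csv_source)))
    return csv_rows, api_source


def transform(csv_rows, api_rows):
    """Transform: map both sources onto one schema (empno, name, salary)."""
    unified = []
    for r in csv_rows:
        unified.append((int(r["empno"]), r["name"].strip().upper(), int(r["salary"])))
    for r in api_rows:
        unified.append((int(r["id"]), r["full_name"].strip().upper(), int(r["pay"])))
    return unified


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emp_mart "
        "(empno INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
    )
    conn.executemany("INSERT INTO emp_mart VALUES (?, ?, ?)", rows)
    conn.commit()


mart = sqlite3.connect(":memory:")
load(transform(*extract()), mart)
loaded = mart.execute(
    "SELECT empno, name, salary FROM emp_mart ORDER BY empno"
).fetchall()
```

Keeping the three steps as separate functions makes it easier to add a new source or a new cleaning rule without touching the load logic.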
BIG DATA
Background :
For decades, companies have been
making business decisions based on
transactional data stored in relational
databases.
In recent years, companies have
realized that non-traditional, less
structured data in the form of
weblogs, social media, email, and sensors
is a trove of potential treasure, as it
can be mined for useful insights.
4 Vs of big data
Volume
Velocity
Variety
Value
Volume
Machine-generated data is produced in
much larger quantities than traditional data.
For example, a single jet engine can
generate 10 TB of data in 30 minutes.
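To put the jet-engine figure in perspective, the short calculation below converts it to a sustained data rate. It assumes decimal units (1 TB = 10^12 bytes) and a constant rate over the 30 minutes. Both are assumptions made for illustration, and the per-day figure is a hypothetical extrapolation that is not in the original slides.

```python
# Assumption: decimal units, 1 TB = 10**12 bytes, 1 GB = 10**9 bytes.
TB = 10**12
GB = 10**9

bytes_generated = 10 * TB      # 10 TB, as stated in the slide
seconds = 30 * 60              # 30 minutes

rate_bytes_per_s = bytes_generated / seconds
rate_gb_per_s = rate_bytes_per_s / GB

# Hypothetical extrapolation: the same constant rate sustained for 24 hours.
tb_per_day = rate_bytes_per_s * 24 * 60 * 60 / TB

print(f"{rate_gb_per_s:.2f} GB/s, {tb_per_day:.0f} TB per day")
```

Under these assumptions the rate works out to roughly 5.6 GB per second.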
Velocity
Social media data streams, while not
as massive as machine-generated data,
produce a large influx of opinions and
relationships valuable to customer
relationship management. Even at 140
characters per tweet, the high velocity
(or frequency) of Twitter data ensures
large volumes.
Variety
Traditional data formats tend to be
relatively well described and change
slowly. In contrast, non-traditional
data formats exhibit a dazzling rate of
change.
Value
The economic value of different data
varies significantly. Typically, there is
good information hidden amongst a
larger body of non-traditional data;
the challenge is identifying what is
valuable and then transforming and
extracting that data for analysis.
Examples
Retailers usually know who buys their products. Use of social media
and web log files from their ecommerce sites can help them
understand who didn't buy and why they chose not to. This can
enable much more effective micro customer segmentation and
targeted marketing campaigns, as well as improve supply chain
efficiencies.
Social media sites like Facebook and LinkedIn simply wouldn't exist
without big data. Their business model requires a personalized
experience on the web, which can only be delivered by capturing
and using all the available data about a user or member.