
UNIT 1

Data, data everywhere


I can't find the data I need
data is scattered over the network
many versions, subtle differences

I can't get the data I need
need an expert to get the data

I can't understand the data I found
available data is poorly documented

I can't use the data I found
results are unexpected
data needs to be transformed from one form to another

Introduction to Data Warehouse


Definition: A single, complete and consistent store of data obtained from a variety
of different sources made available to end users in a way they can understand and
use in a business context. [Barry Devlin]
A data warehouse is a powerful database model that significantly enhances the
user's ability to quickly analyze large, multidimensional data sets.
It cleanses and organizes data to allow users to make business decisions based on
facts.
Hence, the data in the data warehouse must have strong analytical characteristics.

Why A Data Warehouse?


The Data Access Crisis

The key to survival in the 1990s and beyond is being able to analyze, plan, and
react to changing business conditions much more rapidly.
Top managers, analysts, and knowledge workers in our enterprises need
more and better information.
Every day, organizations large and small create billions of bytes of data about
all aspects of their business: millions of individual facts about their customers,
products, operations and people.
But for the most part, this data is locked up in a maze of computer systems and is
exceedingly difficult to get at.
This phenomenon has been described as "data in jail".

Why Data Warehousing?


Which are our lowest/highest margin customers?
Who are my customers and what products are they buying?
What is the most effective distribution channel?
What product promotions have the biggest impact on revenue?
Which customers are most likely to go to the competition?
What impact will new products/services have on revenue and margins?

A data warehouse offers the following benefits:
Improved user access: data is meant for analysis, benchmarking, prediction, etc.
Better consistency of data: information contained in the data warehouse is standardized.
All-in-one: gives a business the "big picture" view that is needed to analyze the business,
make plans, track competitors and more.
Advanced query processing: a data warehouse is structured to process analytical queries faster
and more effectively, leading to efficiency and increased productivity.
Retention of data history: provides a reliable history of all changes, additions and
deletions, which helps ensure the integrity of the data.

Operational vs. Informational Systems


Operational systems:
help the everyday operation of the enterprise
are the backbone systems of any enterprise
include order entry, inventory, manufacturing, payroll and accounting

Informational systems:
deal with analyzing data and making decisions, i.e. about how the enterprise will operate now and in the future
have a different focus from operational ones
also have a different scope
Where operational data needs are normally focused on a single area, informational
data needs often span a number of different areas and require large amounts of related
operational data.

Data Warehouse
According to the definition by Inmon (the father of data warehousing):
An enterprise structured repository of subject-oriented, time-variant, historical data used for
information retrieval and decision support. The data warehouse stores atomic and summary
data.
OR
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection
of data in support of management's decision-making process.
Ralph Kimball provided a more concise definition of a data warehouse:
A data warehouse is a copy of transaction data specifically structured for query and analysis.
This is a functional view of a data warehouse.
Unlike Inmon, Kimball did not address how the data warehouse is built; rather, he focused on
the functionality of a data warehouse.

Data Warehouse Properties


[Figure: the four properties of a data warehouse: subject-oriented, integrated, non-volatile, time-variant]

Subject-Oriented
Subject-Oriented: A data warehouse can be used to analyze a particular
subject area. For example, "sales" can be a particular subject.
Data is categorized and stored by business subject rather than by application
[Figure: OLTP applications (equity plans, shares, insurance, savings, loans) feed one data warehouse subject: customer financial information]
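
The regrouping by subject can be illustrated with a minimal sketch. The row layout, the `app` and `customer_id` field names and the `by_subject` helper are all assumptions made for illustration; they do not come from the slides.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from separate OLTP applications.
oltp_rows = [
    {"app": "savings", "customer_id": 1, "balance": 500.0},
    {"app": "loans", "customer_id": 1, "balance": -2000.0},
    {"app": "insurance", "customer_id": 2, "balance": 0.0},
    {"app": "shares", "customer_id": 2, "balance": 1500.0},
]


def by_subject(rows):
    """Regroup application-oriented rows under the customer subject."""
    subject = defaultdict(dict)
    for row in rows:
        # Key by customer first, then by the application the row came from.
        subject[row["customer_id"]][row["app"]] = row["balance"]
    return dict(subject)


customer_financials = by_subject(oltp_rows)
```

Here `customer_financials[1]` holds everything known about customer 1 across applications, which is the view an analyst asks for, rather than one table per application.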

Integrated
Integrated: A data warehouse integrates data from multiple data sources. For
example, source A and source B may have different ways of identifying a product,
but in a data warehouse, there will be only a single way of identifying a product.
Data on a given subject is defined and stored once.

[Figure: OLTP applications (savings, current accounts, loans) map to a single Customer definition in the data warehouse]
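
As a rough illustration of integration, the sketch below maps two source-specific product identifiers onto one warehouse key. The field names (`sku`, `product_code`), the `KEY_MAP` table and the key `P001` are hypothetical; a real warehouse would typically keep such a mapping in a reference table.

```python
# Hypothetical product records from two sources with different identifiers.
source_a = [{"sku": "A-100", "name": "Widget", "qty": 3}]
source_b = [{"product_code": "W1", "name": "Widget", "qty": 5}]

# Assumed mapping from (source, source identifier) to a single warehouse key.
KEY_MAP = {("A", "A-100"): "P001", ("B", "W1"): "P001"}


def integrate(rows_a, rows_b):
    """Sum quantities per warehouse key, regardless of the originating source."""
    totals = {}
    for rec in rows_a:
        key = KEY_MAP[("A", rec["sku"])]
        totals[key] = totals.get(key, 0) + rec["qty"]
    for rec in rows_b:
        key = KEY_MAP[("B", rec["product_code"])]
        totals[key] = totals.get(key, 0) + rec["qty"]
    return totals


warehouse_totals = integrate(source_a, source_b)
```

After integration the product is identified only by `P001`, so a query on quantities no longer needs to know which source a record came from.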

Time-Variant
Time-Variant: Historical data is kept in a data warehouse. For example, one can
retrieve data from 3 months, 6 months or 12 months ago, or even older data, from a data
warehouse. This contrasts with a transaction system, where often only the most
recent data is kept. For example, a transaction system may hold the most recent
address of a customer, whereas a data warehouse can hold all addresses associated
with a customer.
Data is stored as a series of snapshots, each representing a period of time.

Time      Data
Jan-97    January
Feb-97    February
Mar-97    March
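
The customer address example can be sketched as follows. The `AddressHistory` class, its method names and the sample addresses are assumptions for illustration only; the point is that every address is kept with the date it became valid, so a query can ask for the address as of any date.

```python
from datetime import date


class AddressHistory:
    """Keeps every address a customer has had, not just the latest one."""

    def __init__(self):
        self._rows = []  # (customer_id, valid_from, address)

    def record(self, customer_id, valid_from, address):
        self._rows.append((customer_id, valid_from, address))

    def all_addresses(self, customer_id):
        """Return (valid_from, address) pairs in chronological order."""
        rows = sorted(r for r in self._rows if r[0] == customer_id)
        return [(r[1], r[2]) for r in rows]

    def address_as_of(self, customer_id, as_of):
        """Return the address valid on the given date, or None if unknown."""
        candidates = [
            (valid_from, address)
            for cid, valid_from, address in self._rows
            if cid == customer_id and valid_from <= as_of
        ]
        return max(candidates)[1] if candidates else None


history = AddressHistory()
history.record(7, date(1997, 1, 1), "12 Oak St")
history.record(7, date(1997, 3, 1), "9 Elm Ave")
```

A transaction system would typically overwrite the first address with the second; here both remain queryable through `address_as_of`.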

Non-volatile
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a
data warehouse should never be altered.
In other words, data in the data warehouse is not updated or deleted.
Operational: Insert, Update, Delete, Read
Warehouse: Load, Read
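
One way to picture the load/read contrast is a store that exposes only those two operations. The `AppendOnlyStore` class below is a hypothetical sketch, not a description of any particular product.

```python
class AppendOnlyStore:
    """A store that supports only load and read, with no update or delete."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Copy each row so later changes by the caller cannot alter stored data.
        self._rows.extend(dict(r) for r in rows)

    def read(self):
        # Return copies so readers cannot modify the stored history.
        return [dict(r) for r in self._rows]


store = AppendOnlyStore()
store.load([{"month": "Jan-97", "sales": 100}])
store.load([{"month": "Feb-97", "sales": 120}])

snapshot = store.read()
snapshot[0]["sales"] = 0  # changes only the reader's copy
```

Because `read` hands out copies and there is no update or delete method, the January figure stays at 100 no matter what a reader does with the result.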

Data Warehouse Architecture


A Data Warehouse Architecture (DWA) is a way of representing the overall structure
of data, communication, processing and presentation that exists for end-user
computing within the enterprise.
The architecture is made up of a number of interconnected parts:
Source system: finding the right source
Source data transport layer: transferring data from source to warehouse (e.g. using FTP)
Data quality control and data profiling layer: checking the quality of the data and
correcting it if needed
Metadata management layer: collecting and managing metadata
Data integration layer: formatting and cleansing
Data processing layer: data staging (analysis, filtering), complex programming
End-user reporting layer: final reports
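
The layers listed above can be sketched as a chain of small functions. This is a toy illustration under stated assumptions: the function names, the `region` and `amount` fields and the sample rows are invented, each layer is reduced to one line of logic, and the metadata management layer is left out.

```python
def transport(source_rows):
    """Source data transport: copy rows out of the source system."""
    return [dict(r) for r in source_rows]


def quality_check(rows):
    """Data quality control: drop rows with a missing amount."""
    return [r for r in rows if r.get("amount") is not None]


def integrate(rows):
    """Data integration: normalise formatting (here, region names)."""
    return [{**r, "region": r["region"].strip().upper()} for r in rows]


def process(rows):
    """Data processing: aggregate amounts per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return totals


def report(totals):
    """End-user reporting: render one line per region."""
    return [f"{region}: {total}" for region, total in sorted(totals.items())]


source = [
    {"region": " north", "amount": 10},
    {"region": "North ", "amount": 5},
    {"region": "south", "amount": None},
    {"region": "south", "amount": 7},
]

lines = report(process(integrate(quality_check(transport(source)))))
```

With the sample rows, `lines` is `["NORTH: 15", "SOUTH: 7"]`: the row with a missing amount is dropped by the quality layer, and the two spellings of "north" are merged by the integration layer.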

[Figure: Data Warehouse Architecture (DWA)]

Data Warehouse Options


Scope: can be narrow or wide depending on the need
Data redundancy: 3 levels
Virtual data warehouse: end users have direct access to the data stores, using tools enabled at the
data access layer.
Central data warehouse: a single physical database contains all of the data for a specific functional
area.
Distributed data warehouse: components are distributed across several physical databases.

Types of end users:
Executives and managers
Power users (business analysts, financial analysts, engineers)
Support users (clerical staff, administrators)

Developing Data Warehouse


Developing a data warehouse requires careful planning, requirements definition, design,
prototyping and implementation.
The first and most important element is a planning process that determines what kind of
data warehouse strategy the organization is going to start with.
Developing strategy:
Who is the audience? What is the scope? What type of data warehouse should we build?
That is, we need to find out the needs of the audience, the scope, and the type of data warehouse required.
Choose from the number of available strategies.

Evolving DWA:
Evolve the data warehouse with the architectural framework in mind.

Designing Data Warehouses:


Designing a data warehouse is different from designing an operational system.
It means thinking in terms of much broader, and more difficult to define, business concepts
than designing an operational system does.

Managing Data Warehouses:


Data warehouses require careful management and marketing.

Goals
Provide Easy Access to Corporate Data
user access tools must be easy to use
access should be graphical
access should be manageable by the end users of the data warehouse
the process of getting and analysing data must be fast
Provide Clean and Reliable Data for Analysis
the data environment must be stable
source conflicts must be resolved
historical analysis must be possible
