Overview
SivaSatish.K
Data Warehousing
Data & Information
History
Reasons:
In the OLTP approach, historical data is archived so that the system holds
only the most recent information, and it is difficult to keep that
information updated everywhere in an OLTP system. Unlike OLTP, a DW stores
historical data so that trends can be compared over a period of time
(nowadays, often terabytes of data).
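A minimal sketch of the difference described above, assuming a toy in-memory store: the OLTP-style dictionary overwrites the current value, while the warehouse-style list appends a dated snapshot on every load, so a trend can still be computed. The 2005 figure, the snapshot dates and the function name `year_over_year` are hypothetical; only the shipper "ABC 123" and its 1.2 million $ revenue come from these slides.

```python
from datetime import date

# OLTP-style store: one current value per shipper; an update overwrites it.
oltp_revenue = {"ABC 123": 1.0}  # revenue in million $ (1.0 is hypothetical)
oltp_revenue["ABC 123"] = 1.2    # the earlier value is no longer available

# Warehouse-style store: each load appends a dated snapshot, so history is kept.
# The snapshot dates and the 2005 value are hypothetical.
dw_revenue = [
    ("ABC 123", date(2005, 12, 31), 1.0),
    ("ABC 123", date(2006, 12, 31), 1.2),
]


def year_over_year(rows, shipper):
    """Return (year, change vs. previous snapshot) pairs for one shipper."""
    points = sorted((snap, value) for name, snap, value in rows if name == shipper)
    return [
        (later[0].year, round(later[1] - earlier[1], 2))
        for earlier, later in zip(points, points[1:])
    ]


print(year_over_year(dw_revenue, "ABC 123"))  # [(2006, 0.2)]
```

The OLTP dictionary can only answer "what is the revenue now?", whereas the appended history also answers "how did it change?".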
Because a DW has a single database at the end, data can be transferred
from many sources, and those sources can be relational databases, flat
files or Unix files. An ETL tool reads these sources and transfers the
data from source to target.
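The source-to-target flow above can be sketched with Python's standard `sqlite3` and `csv` modules. This is an illustration under stated assumptions, not how any particular ETL tool works: the table names `shipments` and `dw_revenue`, the helper functions and all revenue figures are hypothetical, and an in-memory string stands in for the flat file.

```python
import csv
import io
import sqlite3


def extract_relational(conn):
    """Extract (shipper, revenue) rows from a relational source table."""
    return conn.execute("SELECT shipper, revenue FROM shipments").fetchall()


def extract_flat_file(text):
    """Extract (shipper, revenue) rows from comma-separated flat-file content."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["shipper"], int(row["revenue"])) for row in reader]


def transform(rows):
    """Standardize shipper names so rows from different sources match."""
    return [(shipper.strip().upper(), revenue) for shipper, revenue in rows]


def load(target, rows):
    """Load the transformed rows into the single target table."""
    target.executemany("INSERT INTO dw_revenue (shipper, revenue) VALUES (?, ?)", rows)
    target.commit()


# Hypothetical relational source.
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE shipments (shipper TEXT, revenue INTEGER)")
source_db.execute("INSERT INTO shipments VALUES ('ABC 123', 700000)")

# Hypothetical flat-file source (a delimited Unix file could be read the same way).
flat_file = "shipper,revenue\n abc 123,500000\nXYZ 456,300000\n"

# One target database at the end.
target_db = sqlite3.connect(":memory:")
target_db.execute("CREATE TABLE dw_revenue (shipper TEXT, revenue INTEGER)")

load(target_db, transform(extract_relational(source_db)) + transform(extract_flat_file(flat_file)))

totals = target_db.execute(
    "SELECT shipper, SUM(revenue) FROM dw_revenue GROUP BY shipper ORDER BY shipper"
).fetchall()
print(totals)  # [('ABC 123', 1200000), ('XYZ 456', 300000)]
```

The transform step matters because " abc 123" from the flat file and "ABC 123" from the database would otherwise be counted as two different shippers in the target.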
Continued…..
Normalized Approach
Dimensional Approach or De-normalized Approach
Normalized vs. Dimensional Approach:
One-dimensional examples:
– For SVVD (PAX-ANX-054 W), there are 4572
containers. Here, SVVD is the dimension and
4572 (the container quantity) is the measure, i.e. the fact.
– For shipper name “ABC 123”, revenue is $1.2
million. Here, shipper is the dimension and
revenue is the measure, i.e. the fact that corresponds
to the dimension.
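A minimal star-schema sketch of the first example, assuming SQLite as the store: the SVVD lives in a dimension table and the container quantity is the measure in a fact table. The table and column names, the split of 4572 into two fact rows, and the second voyage "XYZ-BBB-001 E" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: one row per voyage (SVVD).
CREATE TABLE dim_voyage (voyage_key INTEGER PRIMARY KEY, svvd TEXT NOT NULL);
-- Fact table: the measure (container quantity) plus a key into the dimension.
CREATE TABLE fact_shipment (
    voyage_key INTEGER REFERENCES dim_voyage(voyage_key),
    container_qty INTEGER NOT NULL
);
INSERT INTO dim_voyage VALUES (1, 'PAX-ANX-054 W'), (2, 'XYZ-BBB-001 E');
INSERT INTO fact_shipment VALUES (1, 4000), (1, 572), (2, 150);
""")


def containers_per_voyage(conn):
    """Aggregate the container-quantity measure along the SVVD dimension."""
    return conn.execute("""
        SELECT v.svvd, SUM(f.container_qty)
        FROM fact_shipment AS f
        JOIN dim_voyage AS v ON v.voyage_key = f.voyage_key
        GROUP BY v.svvd
        ORDER BY v.svvd
    """).fetchall()


print(containers_per_voyage(conn))  # [('PAX-ANX-054 W', 4572), ('XYZ-BBB-001 E', 150)]
```

The shipper/revenue example has the same shape: a shipper dimension table and a revenue measure in the fact table.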
Continued….
Multidimensional example:
– What is the gross weight of 47GP containers
traveling from New York to Singapore with
shipment direction “E” in 2006?
– In this example, gross weight is the only
measure (fact), while container type (47GP), load
port (New York), discharge port (Singapore),
shipment direction (E) and year (2006) are the
dimensions.
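The question above can be sketched as a single star-schema query in SQLite, where each dimension becomes a join plus a filter and the measure is summed. All table names, keys and weights below are hypothetical; the port dimension is joined twice (once for load port, once for discharge port), and shipment direction is kept directly on the fact row for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_container_type (type_key INTEGER PRIMARY KEY, type_code TEXT);
CREATE TABLE dim_port (port_key INTEGER PRIMARY KEY, port_name TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_cargo (
    type_key INTEGER, load_port_key INTEGER, discharge_port_key INTEGER,
    date_key INTEGER, direction TEXT, gross_weight_kg INTEGER
);
INSERT INTO dim_container_type VALUES (1, '47GP'), (2, '20GP');
INSERT INTO dim_port VALUES (1, 'New York'), (2, 'Singapore'), (3, 'Rotterdam');
INSERT INTO dim_date VALUES (20060115, 2006), (20070310, 2007);
INSERT INTO fact_cargo VALUES
    (1, 1, 2, 20060115, 'E', 12000),  -- matches every filter
    (1, 1, 2, 20060115, 'E', 8000),   -- matches every filter
    (1, 1, 2, 20070310, 'E', 5000),   -- different year
    (2, 1, 2, 20060115, 'E', 3000),   -- different container type
    (1, 1, 3, 20060115, 'E', 4000),   -- different discharge port
    (1, 1, 2, 20060115, 'W', 2000);   -- different direction
""")


def gross_weight(conn, type_code, load_port, discharge_port, direction, year):
    """Sum the gross-weight measure for one combination of dimension values."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(f.gross_weight_kg), 0)
        FROM fact_cargo AS f
        JOIN dim_container_type AS t ON t.type_key = f.type_key
        JOIN dim_port AS lp ON lp.port_key = f.load_port_key
        JOIN dim_port AS dp ON dp.port_key = f.discharge_port_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        WHERE t.type_code = ? AND lp.port_name = ? AND dp.port_name = ?
          AND f.direction = ? AND d.year = ?
        """,
        (type_code, load_port, discharge_port, direction, year),
    ).fetchone()
    return row[0]


print(gross_weight(conn, "47GP", "New York", "Singapore", "E", 2006))  # 20000
```

Changing any one argument (for example the year to 2007) slices the same fact table along a different dimension value.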
Characteristics of a Data Warehouse:
Subject-oriented
Integrated
Non-volatile
Time-variant
Continued……
Pictorial Representation:
[Figure: data warehouse architecture. Components shown: CPF and Gemstone sources (Sybase), CDW (ETL), CDW Oracle warehouse, ODS (Sybase server), OLAP cubes, Cognos repository (Sybase), EQP data mart, shipment (Shpmt) mart, functional data mart, and middleware if required.]
Data testing includes:
Initial data testing, data validation, data quality,
data integration and data volume testing.
– Data validation checks null/not-null values,
duplicate records, the behavior of surrogate
keys/primary keys, data types, formats, etc.
– Data quality testing ensures that the information
in dimension and fact tables is accurate.
– Initial testing means verifying the data in the CDW
master tables, which are populated through
OSCAR and its web application (the front-end
system).
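The data validation checks listed above can be sketched as small Python functions over rows represented as dictionaries. This is a hedged illustration, not a description of any specific testing tool: the column names (`surrogate_key`, `svvd`, `shipper`, `sail_date`), the expected date format and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime


def null_violations(rows, not_null_columns):
    """Return (row index, column) pairs where a NOT NULL column is empty."""
    return [
        (i, col)
        for i, row in enumerate(rows)
        for col in not_null_columns
        if row.get(col) in (None, "")
    ]


def duplicate_keys(rows, key_columns):
    """Return key tuples that occur more than once, in first-seen order."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def surrogate_key_problems(rows, column="surrogate_key"):
    """Surrogate keys are expected to be unique positive integers."""
    keys = [row[column] for row in rows]
    problems = []
    if len(keys) != len(set(keys)):
        problems.append("duplicate surrogate keys")
    if any(not isinstance(k, int) or k <= 0 for k in keys):
        problems.append("non-positive or non-integer surrogate keys")
    return problems


def bad_formats(rows, column, fmt="%Y-%m-%d"):
    """Return indexes of rows whose date value does not match the expected format."""
    bad = []
    for i, row in enumerate(rows):
        try:
            datetime.strptime(row[column], fmt)
        except (TypeError, ValueError):
            bad.append(i)
    return bad


# Hypothetical rows loaded into a dimension table.
rows = [
    {"surrogate_key": 1, "svvd": "PAX-ANX-054 W", "shipper": "ABC 123", "sail_date": "2006-03-01"},
    {"surrogate_key": 2, "svvd": "PAX-ANX-054 W", "shipper": "ABC 123", "sail_date": "2006-03-01"},
    {"surrogate_key": 2, "svvd": None, "shipper": "XYZ 456", "sail_date": "01/03/2006"},
]

print(null_violations(rows, ["svvd", "shipper"]))  # [(2, 'svvd')]
print(duplicate_keys(rows, ["svvd", "shipper"]))   # [('PAX-ANX-054 W', 'ABC 123')]
print(surrogate_key_problems(rows))                # ['duplicate surrogate keys']
print(bad_formats(rows, "sail_date"))              # [2]
```

Each check returns the offending rows or keys rather than a pass/fail flag, so a failed test points straight at the data that needs fixing.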
Continued….
Overview
Why an ETL tool?
http://en.wikipedia.org/wiki/Data_warehouse