Presented By:
NIKHIL DEBBARMA
M. Tech (2nd Semester)
CSE, NITA
Outline
v What is Data Warehousing?
v Purpose of Data Warehousing
v Introduction, Definitions, and Terminology
v Comparison with Traditional Databases
v Characteristics of Data Warehouses
v Classification of Data Warehouses
v Multi-dimensional Schemas
v Building a Data Warehouse
v Functionality of a Data Warehouse
v Warehouse vs. Data Views
v Implementation difficulties and open issues
What is Data Warehousing?
A process of transforming
data into information and
making it available to users
in a timely enough manner
to make a difference
Data Warehouse
v A data warehouse is a
ÿ subject-oriented
ÿ integrated
ÿ time-varying
ÿ non-volatile
ÿ Diverse Sources
ÿ Diverse Formats
Data Warehouse
v Time-variant:
All data in the data warehouse is identified with a
particular time period.
Operational v/s Information (DW) System
Warehouse Architecture
Comparison with Traditional Databases
[Figure: three-dimensional data cube with dimensions Product (P123 to P126), Region (Reg 1 to Reg 3), and Fiscal Quarter (Qtr 1 to Qtr 4)]
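As a rough illustration of the multi-dimensional view pictured on this slide, the cube can be sketched as a mapping from (product, region, quarter) coordinates to a measure. The sales figures and the helper names `roll_up` and `slice_cube` are assumptions made for this sketch; they do not come from the slides.

```python
from collections import defaultdict

# Illustrative (made-up) sales figures; key = (product, region, quarter).
cube = {
    ("P123", "Reg 1", "Qtr 1"): 10,
    ("P123", "Reg 2", "Qtr 1"): 20,
    ("P124", "Reg 1", "Qtr 1"): 5,
    ("P124", "Reg 1", "Qtr 2"): 15,
    ("P125", "Reg 3", "Qtr 2"): 30,
}

DIMENSIONS = ("product", "region", "quarter")


def roll_up(cells, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in positions)] += value
    return dict(totals)


def slice_cube(cells, dimension, member):
    """Keep only the cells whose `dimension` coordinate equals `member`."""
    i = DIMENSIONS.index(dimension)
    return {key: value for key, value in cells.items() if key[i] == member}


by_product = roll_up(cube, keep=("product",))   # totals per product
qtr1 = slice_cube(cube, "quarter", "Qtr 1")     # only the Qtr 1 plane
qtr1_by_region = roll_up(qtr1, keep=("region",))
```

A traditional relational table would hold the same rows; the cube view simply makes aggregation along any chosen dimension the primary operation.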
Data Modeling for Data Warehouses
ÿ Fact table
[ Each tuple is a recorded fact. This fact contains some
measured or observed variable(s) and identifies them with
pointers to dimension tables. The fact table contains the
data, and the dimensions identify each tuple in the data.
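A minimal sketch of this idea, assuming two small dimension tables and a single measure: each fact tuple stores keys that point into the dimension tables, plus the measured value. The table names, keys, and numbers are hypothetical.

```python
# Dimension tables: surrogate key -> descriptive attributes (illustrative values).
product_dim = {1: {"name": "P123"}, 2: {"name": "P124"}}
region_dim = {1: {"name": "Reg 1"}, 2: {"name": "Reg 2"}}

# Fact table: each tuple holds pointers (keys) into the dimension tables
# and one measured variable, here the hypothetical "units_sold".
sales_fact = [
    {"product_id": 1, "region_id": 1, "units_sold": 10},
    {"product_id": 1, "region_id": 2, "units_sold": 4},
    {"product_id": 2, "region_id": 1, "units_sold": 7},
]


def describe(fact):
    """Resolve a fact's dimension keys into readable labels."""
    return (
        product_dim[fact["product_id"]]["name"],
        region_dim[fact["region_id"]]["name"],
        fact["units_sold"],
    )


described = [describe(fact) for fact in sales_fact]
```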
Multi-dimensional Schemas
v Two common multi-dimensional schemas are
ÿ Star schema:
[ Consists of a fact table with a single table for each
dimension
ÿ Snowflake Schema:
[ It is a variation of star schema, in which the dimensional
tables from a star schema are organized into a hierarchy by
normalizing them.
Multi-dimensional Schemas
v Star schema:
ÿ Consists of a fact table with a single table for each
dimension.
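One way to picture a star schema is with Python's built-in `sqlite3` module: a central fact table references one table per dimension. The table names, column names, and figures below are assumptions chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_quarter (quarter_id INTEGER PRIMARY KEY, name TEXT);

-- The fact table sits at the centre and points at each dimension table.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    quarter_id INTEGER REFERENCES dim_quarter(quarter_id),
    amount     REAL
);

INSERT INTO dim_product VALUES (1, 'P123'), (2, 'P124');
INSERT INTO dim_region  VALUES (1, 'Reg 1'), (2, 'Reg 2');
INSERT INTO dim_quarter VALUES (1, 'Qtr 1'), (2, 'Qtr 2');
INSERT INTO fact_sales  VALUES (1, 1, 1, 100.0), (1, 2, 1, 50.0), (2, 1, 2, 75.0);
""")

# A typical star query: join the fact table to one dimension and aggregate.
sales_by_region = conn.execute("""
    SELECT r.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_region AS r ON r.region_id = f.region_id
    GROUP BY r.name
    ORDER BY r.name
""").fetchall()
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.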
Multi-dimensional Schemas
v Snowflake Schema:
ÿ It is a variation of star schema, in which the
dimensional tables from a star schema are organized
into a hierarchy by normalizing them.
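The normalization step can be sketched with `sqlite3` as well. Here a hypothetical product dimension is split so that the category attribute moves into its own table, forming a two-level hierarchy; all names and values are assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The product dimension is normalized: category details live in their own table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);

INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO dim_product  VALUES (1, 'P123', 1), (2, 'P124', 1), (3, 'P125', 2);
INSERT INTO fact_sales   VALUES (1, 10.0), (2, 20.0), (3, 5.0);
""")

# Reaching the category level now takes two joins instead of one.
sales_by_category = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

The trade-off visible here is less redundancy in the dimension data at the cost of an extra join per hierarchy level.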
Multi-dimensional Schemas
v Fact Constellation:
ÿ A fact constellation is a set of fact tables that share
some dimension tables. However, fact constellations limit
the possible queries for the warehouse.
Multi-dimensional Schemas
v Indexing
ÿ Data warehouses also use indexing to support
high-performance access.
ÿ A technique called bitmap indexing constructs a bit
vector for each value in the domain being indexed.
ÿ Bitmap indexing works very well for domains of low
cardinality.
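A minimal sketch of bitmap indexing, using Python integers as bit vectors: one vector is built per distinct value, and bit i is set when row i holds that value. The column contents and the function names are assumptions for illustration.

```python
def build_bitmap_index(column):
    """Return {value: bitmap}; bit i of a bitmap is set when row i holds that value."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def matching_rows(bitmap):
    """List the row numbers whose bit is set in `bitmap`."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


# Illustrative low-cardinality columns for six rows.
region = ["Reg 1", "Reg 2", "Reg 1", "Reg 3", "Reg 2", "Reg 1"]
quarter = ["Qtr 1", "Qtr 1", "Qtr 2", "Qtr 2", "Qtr 1", "Qtr 1"]

region_index = build_bitmap_index(region)
quarter_index = build_bitmap_index(quarter)

# A conjunctive predicate (region = 'Reg 1' AND quarter = 'Qtr 1')
# becomes a bitwise AND of two bit vectors.
hits = matching_rows(region_index["Reg 1"] & quarter_index["Qtr 1"])
```

With only a few distinct values, the index needs only a few bit vectors, which is why low-cardinality domains suit this technique.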
Building A Data Warehouse
v The builders of a data warehouse should take a
broad view of the anticipated use of the
warehouse.
ÿ The design should support ad-hoc querying.
ÿ An appropriate schema should be chosen that reflects
the anticipated usage.
Building A Data Warehouse
v The design of a data warehouse involves the
following steps:
ÿ Acquisition of data for the warehouse.
ÿ Ensuring that data storage meets the query
requirements efficiently.
OLAP Cube
OLAP Operations
OLAP Server
v An OLAP server is a high-capacity, multi-user data
manipulation engine specifically designed to
support and operate on multi-dimensional data
structures.
v Available OLAP servers are
ÿ MOLAP server
ÿ ROLAP server
ÿ HOLAP server
Presentation
Warehouse vs. Data Views
v Views and data warehouses are alike in that they both have
read-only extracts from the databases.
v However, data warehouses differ from views in the
following ways:
ÿ Data warehouses exist as persistent storage instead of being
materialized on demand.
ÿ Data warehouses are not usually relational, but rather
multi-dimensional.
ÿ Data warehouses can be indexed for optimization.
ÿ Data warehouses provide specific support of functionality.
ÿ Data warehouses deal with huge volumes of data that are
generally drawn from more than one database.
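The persistence and indexing differences can be sketched with `sqlite3`. This example uses a relational table as a stand-in for warehouse storage purely to contrast it with a view; the table names and figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (region TEXT, amount REAL);
INSERT INTO orders VALUES ('Reg 1', 100.0), ('Reg 2', 50.0);

-- A view stores only the query; it is evaluated each time it is read.
CREATE VIEW v_sales AS
    SELECT region, SUM(amount) AS total FROM orders GROUP BY region;

-- A warehouse-style table stores the extracted rows themselves.
CREATE TABLE wh_sales AS
    SELECT region, SUM(amount) AS total FROM orders GROUP BY region;

-- Stored data can carry its own index, independent of the source table.
CREATE INDEX idx_wh_sales_region ON wh_sales(region);

-- The operational source changes after the load.
INSERT INTO orders VALUES ('Reg 1', 25.0);
""")

view_total = conn.execute(
    "SELECT total FROM v_sales WHERE region = 'Reg 1'"
).fetchone()[0]
stored_total = conn.execute(
    "SELECT total FROM wh_sales WHERE region = 'Reg 1'"
).fetchone()[0]
```

After the source table changes, the view reflects the new row while the stored extract keeps the value it had at load time, until the next refresh.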
Advantages of Warehousing
v High query performance
v Queries not visible outside warehouse
v Local processing at sources unaffected
v Can operate when sources unavailable
v Can query data not stored in a DBMS
v Extra information at warehouse
ÿ Modify, summarize (store aggregates)
ÿ Add historical information
Difficulties of implementing Data Warehouses
v OLAP tools
ÿ SQL Server Analysis Services
ÿ Oracle Express Server
v Reporting tools
ÿ MS Excel Pivot Chart
ÿ VB Applications
Tools
v Data Extraction - SAS
v Data Cleaning - Apertus, Trillium
v Data Storage - ORACLE, SYBASE
v Optimizers - Advanced Parallel Optimizer
Bitmap Indices
Star Index
Tools
v Development tools to create applications
IBM Visualizer, ORACLE CDE
v Relational OLAP
Informix Metacube
Why we use Oracle for our warehouse