Architecture
Objectives
Define Data Warehouse Architecture
Define Data Warehouse and Data Mart
Present a Data Warehouse Architectural Framework
Demo – Data Enterprise Integration Server
Information Systems Architecture
Information Systems Architecture is the process of making the key choices that are essential to the development of an information system. Architecture includes:
◦ Guiding principles
◦ Approaches/philosophies
◦ “Logical” representations of a system
◦ Hardware/operating system
◦ Computing model: client/server vs. traditional vs. Web-based
◦ Tools and technologies
When making these choices, it is key that they are:
◦ Requirements-driven
◦ Informed by operational, technical and financial feasibility
◦ Made within an architectural framework
Architecture Drivers
Architecture has many drivers:
[Figure: architecture at the center, shaped by corporate politics, the business plan, system qualities, emerging technologies, current systems and end-user requirements]
How Is Architecture Different from Design?
It isn't: architecture can be considered ‘high-level’ design.
Architecture includes those aspects of the design that are essential to the information system.
Architecture example:
◦ Users must be able to self-serve (guiding principle)
◦ “We will use a ‘hub and spoke’ design where data will be placed in a central data warehouse, then propagated to one or more data marts.” (approach)
◦ We will normalize data in the central warehouse and use a dimensional design in the data marts (approach)
◦ We will use Oracle 8i as our DBMS (technical architecture)
Architecture vs. Design
Not architecture:
◦ The Order subject area will be composed of the following tables: order_fact, customer_dim, product_dim and time_dim
◦ The customer_dim table will have the following attributes…
The Value of Architecture
Communication:
◦ To business sponsors and business users
◦ Between members of the project team
Planning:
◦ Cross-check for the project plan
◦ Ensure that all important components of the data warehouse are accounted for
Flexibility and growth:
◦ Thinking about the overall architecture reduces the risk associated with the ‘success’ of the data warehouse
Learning
Productivity and reuse
What’s Different About DW Architecture?
Transaction processing systems: growth is (relatively) predictable.
Example:
◦ A company uses SAP for order processing
◦ They are opening a new retail store
◦ They predict (based on experience) 2000 transactions per week
◦ To process this volume, we need 3 workstations to capture the transactions
◦ Peak time each day is 11:00 to 2:00, when 50% of transactions occur
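Because transaction-processing volume is predictable, sizing the store comes down to arithmetic. The sketch below works through the example. It assumes two figures the example does not give: a 7-day trading week and a capture rate of 20 transactions per workstation per hour. `workstations_needed` is an illustrative helper, not part of any SAP tool.

```python
import math

def workstations_needed(weekly_txns: int, trading_days: int, peak_share: float,
                        peak_hours: float, txns_per_station_hour: float) -> int:
    """Size capture workstations for the busiest window of a predictable OLTP load."""
    daily = weekly_txns / trading_days            # average transactions per day
    peak_rate = daily * peak_share / peak_hours   # transactions per hour in the peak window
    return math.ceil(peak_rate / txns_per_station_hour)

# From the example: 2000 transactions/week, 50% of them in the 3-hour 11:00-2:00 window.
# ASSUMPTIONS (not in the source): 7 trading days, 20 transactions per workstation-hour.
print(workstations_needed(2000, 7, 0.5, 3, 20))  # 3
```

Under these assumptions the peak load is about 48 transactions per hour, which rounds up to the 3 workstations quoted. A data warehouse offers no comparable basis for this kind of calculation.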
What’s Different About Data Warehouse Architecture?
Success drives explosive growth:
◦ More users
◦ More (complex) queries
◦ More data
Performance is unpredictable:
◦ Unpredictable queries
◦ Unpredictable use patterns
[Figure: growth over time of a data warehouse compared with SAP R/3 and Siebel]
The Great Data Warehouse Architecture Debate
Bill Inmon: “The enterprise data warehouse”
Ralph Kimball: “Data marts”
“If you build it, they will come”
The compromise: “hub and spoke” or “federated” models
What is a Data Mart?
A data mart is a collection of subject areas organized for decision support based on the specific needs of a given user group. Each mart may differ widely from the others (as we will see).
Typically, data marts are built on the dimensional data model:
◦ Facts: things that the organization wants to measure, such as revenue, orders, shipments and purchases
◦ Dimensions: the means by which the organization wants to analyze the measures (facts), such as by customer, by time or by product, or BY ANY COMBINATION of them!
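A minimal Python sketch of the fact/dimension idea. It borrows the table names from the Order subject area above (`order_fact`, `customer_dim`, `product_dim`, `time_dim`), but the rows and columns are hypothetical. Each fact row carries dimension keys and a revenue measure. The illustrative helper `revenue_by` sums that measure by whatever combination of dimension attributes is requested.

```python
from collections import defaultdict

# Hypothetical dimension tables: surrogate key -> descriptive attributes.
customer_dim = {1: {"customer": "Acme"}, 2: {"customer": "Globex"}}
product_dim = {10: {"product": "Widget"}, 11: {"product": "Gadget"}}
time_dim = {100: {"quarter": "Q1"}, 101: {"quarter": "Q2"}}

# Fact table: foreign keys into each dimension plus a numeric measure.
order_fact = [
    {"customer_key": 1, "product_key": 10, "time_key": 100, "revenue": 500.0},
    {"customer_key": 1, "product_key": 11, "time_key": 101, "revenue": 300.0},
    {"customer_key": 2, "product_key": 10, "time_key": 101, "revenue": 200.0},
]

# Attribute name -> (foreign-key column in the fact table, dimension table).
DIMENSIONS = {
    "customer": ("customer_key", customer_dim),
    "product": ("product_key", product_dim),
    "quarter": ("time_key", time_dim),
}

def revenue_by(*attrs: str) -> dict:
    """Sum the revenue fact grouped by any combination of dimension attributes."""
    totals = defaultdict(float)
    for row in order_fact:
        group = []
        for attr in attrs:
            key_col, dim_table = DIMENSIONS[attr]
            group.append(dim_table[row[key_col]][attr])  # resolve the attribute via the key
        totals[tuple(group)] += row["revenue"]
    return dict(totals)

print(revenue_by("customer"))            # {('Acme',): 800.0, ('Globex',): 200.0}
print(revenue_by("product", "quarter"))  # {('Widget', 'Q1'): 500.0, ('Gadget', 'Q2'): 300.0, ('Widget', 'Q2'): 200.0}
```

No aggregation path is fixed in advance. Every new question is just a different argument list to the same fact table.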
What is a Data Mart?
There are two kinds of data marts: dependent and independent.
A dependent data mart is one whose source is a data warehouse.
An independent data mart is one whose source is the legacy applications environment. All dependent data marts are fed by the same source: the data warehouse. Each independent data mart is fed uniquely and separately by the legacy applications environment.
Dependent data marts are architecturally and structurally sound.
Independent data marts have a number of significant issues.
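One commonly cited issue with independent marts is that each applies its own extraction rules. In the sketch below, which uses hypothetical data and helper names, two departments each build a West-region mart. The dependent marts inherit the single rule applied when the warehouse was loaded. The independent marts treat cancelled orders differently, so they report different totals for the same question.

```python
# Hypothetical legacy extract: (order_id, region, revenue, status).
legacy_orders = [
    ("o1", "East", 100.0, "shipped"),
    ("o2", "West", 250.0, "cancelled"),
    ("o3", "West", 150.0, "shipped"),
]

# The warehouse integrates the legacy data once, applying one agreed rule.
warehouse = [r for r in legacy_orders if r[3] != "cancelled"]

def dependent_mart(region):
    """Dependent mart: sourced from the warehouse, so it inherits the shared rule."""
    return [r for r in warehouse if r[1] == region]

def independent_mart(region, drop_cancelled):
    """Independent mart: fed directly from legacy data, with its own rule."""
    return [
        r for r in legacy_orders
        if r[1] == region and not (drop_cancelled and r[3] == "cancelled")
    ]

def total(rows):
    return sum(r[2] for r in rows)

sales_dep, finance_dep = dependent_mart("West"), dependent_mart("West")
sales_ind = independent_mart("West", drop_cancelled=False)   # sales team's own rule
finance_ind = independent_mart("West", drop_cancelled=True)  # finance team's own rule

print(total(sales_dep), total(finance_dep))  # 150.0 150.0
print(total(sales_ind), total(finance_ind))  # 400.0 150.0
```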
Data Warehouse vs. Data Marts
What comes first?
From the Data Warehouse to Data Marts
[Figure: layers running from organizationally structured data (the data warehouse: normalized, detailed, more history) through departmentally structured data to individually structured information (less history)]
Data Warehouse and Data Marts
◦ Data mart (OLAP): lightly summarized, departmentally structured
◦ Data warehouse: organizationally structured, atomic, detailed data
Data Mart Centric
[Figure: data sources, data marts and data warehouse]
Problems with Data Mart Centric Solution
[Figure: data sources, data warehouse and data marts]
Data Warehouse Architectures
◦ Generic two-level architecture
◦ Independent data mart
◦ Dependent data mart and operational data store
◦ Logical data mart and real-time data warehouse
◦ Three-layer architecture
Generic two-level data warehousing architecture
[Figure: extract, transform and load (ETL) into one company-wide warehouse]
Independent data mart data warehousing architecture
◦ Data marts: mini-warehouses, limited in scope
Dependent data mart with operational data store: a three-level architecture
◦ ODS provides an option for obtaining current data
◦ Single ETL for the enterprise data warehouse (EDW)
◦ Dependent data marts loaded from the EDW
◦ Simpler data access
Logical data mart and real-time warehouse architecture
◦ ODS and data warehouse are one and the same
◦ Near real-time ETL for the data warehouse
◦ Data marts are NOT separate databases, but logical views of the data warehouse
◦ Easier to create new data marts
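The idea that a logical data mart is a view rather than a separate database can be sketched with Python's built-in `sqlite3` module. The `sales` table and `west_sales_mart` view are hypothetical, and a production warehouse would use its own DBMS. The principle carries over: the mart needs no separate load, so new warehouse rows appear in it immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Widget", 100.0), ("West", "Widget", 150.0), ("West", "Gadget", 50.0)],
)

# The logical data mart is only a stored query over the warehouse table.
conn.execute("""
    CREATE VIEW west_sales_mart AS
    SELECT product, SUM(revenue) AS revenue
    FROM sales
    WHERE region = 'West'
    GROUP BY product
""")

# A row loaded into the warehouse is visible through the mart at once.
conn.execute("INSERT INTO sales VALUES ('West', 'Widget', 25.0)")
print(conn.execute("SELECT product, revenue FROM west_sales_mart ORDER BY product").fetchall())
# [('Gadget', 50.0), ('Widget', 175.0)]
```

Creating another mart means issuing another `CREATE VIEW`. This is why new marts are easier to create in this architecture.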
Three-layer data architecture for a data warehouse
The Major Data Warehouse Architectures
[Figure: the major architectures, including the federated architecture, drawn from data staging, data marts, a data warehouse and end-user access/applications; source of sponsorship and expert influence shown as factors]
• Developed independently.
• No conformed dimensions (i.e., the data marts do not share the same categories and labels for data elements, which would allow data across data marts to be combined).
• Built to a business unit or functional area.
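A small sketch of why conformed dimensions matter when combining data across marts (often called drill-across). The marts, labels and `drill_across` helper are hypothetical. When each mart labels products its own way, there is nothing to join on. Once both marts reference the keys of one shared product dimension, their measures line up.

```python
# Non-conformed: each independently built mart uses its own product labels.
sales_mart = {"WIDGET-A": 500.0, "GADGET-B": 300.0}  # revenue by product code
inventory_mart = {"Widget A": 40, "Gadget B": 12}     # units on hand by product name
print(set(sales_mart) & set(inventory_mart))          # set(): nothing to combine on

# Conformed: one shared product dimension whose keys both marts reference.
product_dim = {"P1": "Widget A", "P2": "Gadget B"}
sales_mart_c = {"P1": 500.0, "P2": 300.0}
inventory_mart_c = {"P1": 40, "P2": 12}

def drill_across(mart_a, mart_b, dim):
    """Combine measures from two marts on the dimension keys they share."""
    shared = sorted(mart_a.keys() & mart_b.keys())
    return {dim[k]: (mart_a[k], mart_b[k]) for k in shared}

print(drill_across(sales_mart_c, inventory_mart_c, product_dim))
# {'Widget A': (500.0, 40), 'Gadget B': (300.0, 12)}
```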
Best Practice #1
Use a data model that is optimized for information retrieval:
◦ dimensional model
◦ denormalized
◦ hybrid approach
◦ Capture/Extract
◦ Scrub or data cleansing
◦ Transform
◦ Load and Index
Steps in data reconciliation
Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
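A minimal sketch of capture/extract. It assumes the source table is available as a list of Python dicts, whereas a real extract would read from the operational DBMS. `capture` is a hypothetical helper that copies a chosen subset of rows and columns and records when the snapshot was taken.

```python
from datetime import datetime, timezone

# Hypothetical operational table, represented as a list of dicts.
source_customers = [
    {"id": 1, "name": "Acme", "region": "East", "internal_note": "call Tue", "active": True},
    {"id": 2, "name": "Globex", "region": "West", "internal_note": "", "active": False},
    {"id": 3, "name": "Initech", "region": "West", "internal_note": "VIP", "active": True},
]

def capture(rows, columns, predicate):
    """Copy the chosen rows and columns, and record when the snapshot was taken."""
    taken_at = datetime.now(timezone.utc)
    snapshot = [{col: row[col] for col in columns} for row in rows if predicate(row)]
    return taken_at, snapshot

# Chosen subset: active customers only, without the internal_note column.
taken_at, extract = capture(source_customers, ["id", "name", "region"], lambda r: r["active"])
print(extract)
# [{'id': 1, 'name': 'Acme', 'region': 'East'}, {'id': 3, 'name': 'Initech', 'region': 'West'}]

# The snapshot is a copy: later updates to the source do not change it.
source_customers[0]["name"] = "Acme Corp"
print(extract[0]["name"])  # Acme
```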
Steps in data reconciliation (cont.)
Transform: convert data from the format of the operational system to the format of the data warehouse
Record-level:
◦ Selection: data partitioning
◦ Joining: data combining
◦ Aggregation: data summarization
Field-level:
◦ Single-field: from one field to one field
◦ Multi-field: from many fields to one, or from one field to many
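The transformation categories above can be sketched in plain Python. Everything about the source is hypothetical: its layout (US-style dates, amounts in cents, separate first and last name fields), the 500-cent selection rule and the `regions` lookup used for joining. Each comment names the category the line illustrates.

```python
from collections import defaultdict

# Hypothetical operational rows: (order_id, cust_id, first, last, 'MM/DD/YYYY', amount_cents)
orders = [
    ("o1", "c1", "Ada", "Lovelace", "01/15/2024", 1250),
    ("o2", "c2", "Alan", "Turing", "01/20/2024", 300),
    ("o3", "c1", "Ada", "Lovelace", "02/03/2024", 600),
]
regions = {"c1": "East", "c2": "West"}  # a second source, used for joining

def to_iso(us_date):
    """Single-field: one field to one field (MM/DD/YYYY -> YYYY-MM-DD)."""
    month, day, year = us_date.split("/")
    return f"{year}-{month}-{day}"

def split_date(iso_date):
    """Multi-field, one to many: a date becomes separate year and month fields."""
    year, month, _ = iso_date.split("-")
    return int(year), int(month)

def transform(rows):
    out = []
    for order_id, cust_id, first, last, us_date, cents in rows:
        if cents < 500:  # record-level selection: keep only one partition of the rows
            continue
        iso_date = to_iso(us_date)
        year, month = split_date(iso_date)
        out.append({
            "order_id": order_id,
            "customer_name": f"{first} {last}",  # multi-field, many to one
            "order_date": iso_date,
            "year": year,
            "month": month,
            "amount": cents / 100,               # single-field unit conversion
            "region": regions[cust_id],          # record-level joining with another source
        })
    return out

def monthly_totals(rows):
    """Record-level aggregation: summarize amounts by (year, month)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["year"], r["month"])] += r["amount"]
    return dict(totals)

clean = transform(orders)
print(monthly_totals(clean))  # {(2024, 1): 12.5, (2024, 2): 6.0}
```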
Steps in data reconciliation (cont.)
Load/Index: place transformed data into the warehouse and create indexes
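A sketch of the load and index step using the standard-library `sqlite3` module. The `order_fact` columns, the rows and the index names are assumptions. The indexes are created after the bulk insert. This is a common practice, because maintaining indexes row by row during a large load is usually slower.

```python
import sqlite3

# Hypothetical transformed rows: (order_id, region, order_date, amount).
rows = [
    ("o1", "East", "2024-01-15", 12.5),
    ("o3", "East", "2024-02-03", 6.0),
    ("o4", "West", "2024-02-10", 9.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_fact (order_id TEXT PRIMARY KEY, region TEXT, order_date TEXT, amount REAL)"
)

# Load: bulk-insert the rows in one transaction (the context manager commits it).
with conn:
    conn.executemany("INSERT INTO order_fact VALUES (?, ?, ?, ?)", rows)

# Index: build indexes on the columns that queries are expected to filter on.
conn.execute("CREATE INDEX idx_order_fact_region ON order_fact (region)")
conn.execute("CREATE INDEX idx_order_fact_date ON order_fact (order_date)")

print(conn.execute("SELECT SUM(amount) FROM order_fact WHERE region = 'East'").fetchone())  # (18.5,)
```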
Data Quality Assurance
Data cleansing:
◦ The process of validating and enriching the data as it is published to the DW
◦ Also, a software development tool for building data cleansing processes (a data cleansing tool)
◦ Many production DWs have only very rudimentary data quality assurance processes
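A minimal sketch of a cleansing step that validates and enriches each record before it is published to the warehouse. The reference list of states, the ZIP-to-city lookup and the `cleanse` helper are hypothetical. In practice a data cleansing tool would typically supply such rules and reference data.

```python
import re

VALID_STATES = {"NY", "CA", "TX"}                               # hypothetical reference list
ZIP_TO_CITY = {"10001": "New York", "94105": "San Francisco"}  # hypothetical enrichment lookup

def cleanse(record):
    """Standardize, validate and enrich one record; return (clean_record, errors)."""
    errors = []
    rec = dict(record)
    rec["state"] = rec.get("state", "").strip().upper()        # standardize before validating
    if rec["state"] not in VALID_STATES:
        errors.append(f"invalid state {record.get('state')!r}")
    if not re.fullmatch(r"\d{5}", rec.get("zip", "")):
        errors.append(f"invalid zip {record.get('zip')!r}")
    else:
        rec["city"] = ZIP_TO_CITY.get(rec["zip"], "UNKNOWN")    # enrichment from reference data
    return rec, errors

clean, errs = cleanse({"state": " ny", "zip": "10001"})
print(clean, errs)  # {'state': 'NY', 'zip': '10001', 'city': 'New York'} []
bad, errs = cleanse({"state": "ZZ", "zip": "1000"})
print(errs)         # ["invalid state 'ZZ'", "invalid zip '1000'"]
```

Records that come back with errors could be rejected or routed for review instead of being loaded.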
DW appliances, packaged solutions that provide all of the required software and hardware, are beginning to offer very promising price/performance.
Production experience is limited so far, so this is not yet a ‘best practice’.