Data Warehouse
Architecture Best Practices
December 5, 2005
Agenda
Introductions
Business Intelligence Background
Architecture Best Practices
Questions & Answers
Data Warehouse
Architecture Best Practices
Introductions
cohesion
institute
Presenter Biography
R. Michael Pickering
President and Chief Architect,
Cohesion Systems Consulting Inc.
previously, Managing Consultant, BI&W, Oracle
Consulting (Canada)
before that, Red Brick Systems, Inc.
over 8 years DW experience
Manulife Reinsurance, Bell Canada, USDA, Kraft
Foods, LCBO, Telecom Argentina, Nortel Networks,
Procter & Gamble, Bayer, Syncrude, OMoHLTC…
Mr. Pickering has had DW articles published
in The Handbook of Data Management
Audience Survey
By a show of hands, please indicate your
experience with:
normalization
dimensional modeling
operational data store
data consolidation
Extract Transform Load (ETL)
metadata architecture
DW appliances
Data Warehouse
Architecture Best Practices
Business Intelligence
Background
BUT...
Overarching Goal
The overarching goal of
business intelligence is to
provide the information
necessary to MANAGE a
business
This means providing
information in support of
management decision making,
which is why BI is also called
“Decision Support”
March 21, 2009 DW Architecture Best Practices 9
Business of BI
In some cases, legislation such as Sarbanes-
Oxley or Basel II makes some kind of BI
fundamental to doing business
Many leading companies use BI to achieve
competitive advantage
E.g. Walmart, Dell, Amazon.com, Kraft, American
Express, etc…
Good Architecture
‘It’s not easy to describe a good
design, but I’ll know it when I
see it’
BI Architecture Requirements
must recognize change as a
constant
take incremental development
approach
existing applications must
continue to work
need to allow more data and
new types of data to be added
Data Warehouse
Architecture Best Practices
Architecture
Best Practices
Dimensional Model
Advantages
simplicity
humans can navigate and remember
software can navigate
deterministically
business process explicitly separated
(Data Mart)
not so many keys (keys = # of
attendant tables)
Best Practice #1
Use a data model that is optimized
for information retrieval
dimensional model
denormalized
hybrid approach
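Illustration (not from the original deck): a minimal sketch of a dimensional model, using Python's built-in sqlite3 module. The table names, column names and sample rows are all hypothetical. It is meant only to show why retrieval is simple in this style of model: the query needs one join per dimension it uses.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Cheddar', 'Cheese'), (2, 'Cola', 'Beverage');
INSERT INTO dim_date VALUES (20051201, '2005-12-01', '2005-12'),
                            (20051202, '2005-12-02', '2005-12');
INSERT INTO fact_sales VALUES (1, 20051201, 10, 50.0),
                              (2, 20051201, 4, 8.0),
                              (1, 20051202, 6, 30.0);
""")


def sales_by_category(connection):
    """Total sales amount per product category (fact joined to one dimension)."""
    rows = connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)


totals = sales_by_category(conn)
print(totals)
```

A normalized source schema would typically need more joins to answer the same question, which is the trade-off this best practice is pointing at.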
Best Practice #2
Carefully design the data
acquisition and cleansing processes
for your DW
Ensure the data is processed
efficiently and accurately
Consider acquiring ETL and Data
Cleansing tools
Use them well!
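Illustration (not from the original deck): a minimal sketch of a cleansing step inside a hand-written ETL job, assuming a hypothetical customer feed and a hypothetical country lookup table. A commercial ETL or data cleansing tool would replace most of this, but the idea is the same: standardize what can be fixed, and route what cannot be fixed to a reject list rather than loading it silently.

```python
# Hypothetical lookup that maps free-text country values to standard codes.
COUNTRY_CODES = {"canada": "CA", "ca": "CA", "usa": "US", "united states": "US", "us": "US"}


def cleanse_customer(raw):
    """Return a cleaned record, or None if the row must be rejected."""
    # Collapse repeated whitespace and normalize capitalization.
    name = " ".join(raw.get("name", "").split()).title()
    country = COUNTRY_CODES.get(raw.get("country", "").strip().lower())
    if not name or country is None:
        return None
    return {"name": name, "country": country}


def run_etl(source_rows):
    """Split source rows into rows to load and rows to reject for review."""
    loaded, rejected = [], []
    for row in source_rows:
        clean = cleanse_customer(row)
        if clean is None:
            rejected.append(row)
        else:
            loaded.append(clean)
    return loaded, rejected


source = [
    {"name": "  acme   foods ", "country": "Canada"},
    {"name": "Widget Co", "country": "USA "},
    {"name": "", "country": "US"},
    {"name": "Mystery Ltd", "country": "Atlantis"},
]
loaded, rejected = run_etl(source)
print(len(loaded), "loaded;", len(rejected), "rejected")
```

Keeping the rejected rows, instead of dropping them, makes it possible to report on data quality and fix problems at the source.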
Data Model
Already discussed the benefits of a
dimensional model
No matter whether dimensional
modeling or any other design
approach is used, the data model
must be documented
Metadata Architecture
The strategy for sharing data model
and other metadata should be
formalized and documented
Metadata management tools should
be considered & the overall
metadata architecture should be
carefully planned
Best Practice #3
Design a metadata architecture
that allows sharing of metadata
between components of your DW
consider metadata standards such as
OMG’s Common Warehouse
Metamodel (CWM)
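Illustration (not from the original deck): a minimal sketch of one way to share data model metadata between components, by exporting a data dictionary from the database catalog as JSON. This is not CWM and does not use any metadata management tool; the JSON layout is an assumption made for the example, and SQLite stands in for the warehouse database.

```python
import json
import sqlite3


def export_data_dictionary(connection):
    """Describe every table and column in the database as a JSON string."""
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    dictionary = {}
    for (table_name,) in tables:
        columns = connection.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
        dictionary[table_name] = [
            {"column": col[1], "type": col[2], "primary_key": bool(col[5])}
            for col in columns
        ]
    return json.dumps(dictionary, indent=2)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT)")
metadata_json = export_data_dictionary(conn)
print(metadata_json)
```

An ETL job or a reporting layer could read the same file, so that each component works from one description of the data model rather than maintaining its own copy.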
Alternative Architecture
Approaches
Bill Inmon: “Corporate Information
Factory”
Hub and Spoke philosophy
“JBOC” – just a bunch of cubes
Let it evolve naturally
What We Want
(Architectural Principle)
In most cases, business and IT
agree that the data warehouse
should provide a ‘single version of
the truth’
Any approach that can result in
disparate data marts or cubes is
undesirable
This is known as data silos or…
Enterprise DW Architecture
how to design an enterprise data
warehouse and ensure a ‘single
version of the truth’?
according to Kimball:
start with an overall data architecture
phase
use “Data Warehouse Bus” design to
integrate multiple data marts
use incremental approach by building one
data mart at a time
Dimension Granularity
conformed dimensions will usually
be granular
makes it easy to integrate with various
base level fact tables
easy to extend fact table by adding
new facts
no need to drop or reload fact tables,
and no keys have to be changed
Conforming Dimensions
by adhering to standards, the separate
data marts can be plugged together
e.g. customer, product, time
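Illustration (not from the original deck): a minimal sketch of how a conformed dimension lets two data marts be "plugged together". The product dimension, the two fact tables and their values are hypothetical. Each fact table is summarized separately by the shared dimension attribute, and the results are then merged on that attribute.

```python
from collections import defaultdict

# Conformed product dimension shared by both data marts (hypothetical data).
dim_product = {1: "Cheese", 2: "Beverage"}

# Fact tables from two separate data marts, both keyed by product_key.
fact_sales = [(1, 50.0), (2, 8.0), (1, 30.0)]   # (product_key, sales_amount)
fact_shipments = [(1, 12), (2, 5), (2, 3)]      # (product_key, units_shipped)


def total_by_category(fact_rows):
    """Sum one fact table's measure by the conformed category attribute."""
    totals = defaultdict(float)
    for product_key, measure in fact_rows:
        totals[dim_product[product_key]] += measure
    return dict(totals)


def drill_across(*fact_tables):
    """Summarize each fact table separately, then merge on category."""
    per_fact = [total_by_category(rows) for rows in fact_tables]
    categories = sorted(set().union(*per_fact))
    return {c: tuple(t.get(c, 0.0) for t in per_fact) for c in categories}


report = drill_across(fact_sales, fact_shipments)
print(report)
```

The merge only works because both marts use the same product keys and the same category values; if each mart had defined its own product dimension, there would be nothing reliable to merge on.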
Conforming Facts
Data Consolidation
a current trend in BI/DW is ‘data
consolidation’
from a software vendor
perspective, it is tempting to
simplify this:
‘we can keep all the tables for all your
disparate applications in one physical
database’
Data Integration
To truly achieve ‘a single version of
the truth’, we must do more than simply
consolidate application databases
Must integrate data models and
establish common terms of reference
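Illustration (not from the original deck): a minimal sketch of the difference between consolidating and integrating. The two source extracts, their field names and the status vocabulary are hypothetical. Consolidation would simply store all four source rows side by side; integration maps them onto one customer key and one set of status terms.

```python
# Hypothetical extracts from two applications describing the same customers.
billing_rows = [{"cust_no": "0042", "status": "A"}, {"cust_no": "0077", "status": "C"}]
crm_rows = [{"customer_id": 42, "state": "active"}, {"customer_id": 91, "state": "closed"}]

# Common terms of reference agreed for the warehouse (assumed for illustration).
STATUS_TERMS = {"A": "ACTIVE", "C": "CLOSED", "active": "ACTIVE", "closed": "CLOSED"}


def integrate(billing, crm):
    """Map both sources onto one customer key and one status vocabulary."""
    customers = {}
    for row in billing:
        # Billing stores the key as zero-padded text; convert to the common integer key.
        customers[int(row["cust_no"])] = STATUS_TERMS[row["status"]]
    for row in crm:
        # Assumption: billing wins when both sources describe the same customer.
        customers.setdefault(row["customer_id"], STATUS_TERMS[row["state"]])
    return customers


integrated = integrate(billing_rows, crm_rows)
print(integrated)
```

Four source rows become three customers, because customer 42 appears in both applications under different key formats and different status codes.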
Best Practice #4
Take an approach that consolidates
data into ‘a single version of the
truth’
Data Warehouse Bus
conformed dimensions & facts
OR?
Role of an ODS in DW
Architecture
In the case where an ODS is a
necessary component of the overall
DW, it should be carefully integrated
into the overall architecture
Can also be used for:
Staging area
Master/reference data management
Etc…
Best Practice #5
Consider implementing an ODS only
when information retrieval requirements
are near the bottom of the data
abstraction pyramid and/or when there
are multiple operational sources that
need to be accessed
Must ensure that the data model is
integrated, not just consolidated
May consider 3NF data model
Avoid at all costs a ‘data dumping ground’
Capacity Planning
DW workloads are typically very
demanding, especially for I/O capacity
Successful implementations tend to grow
very quickly, both in number of users
and data volume
Rules of thumb do exist for sizing the
hardware platform to provide adequate
initial performance
typically based on estimated ‘raw’ data size
of proposed database, e.g. 100-150 GB per
modern CPU
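Illustration (not from the original deck): the rule of thumb above, worked as arithmetic. The function name and the 1,200 GB example size are assumptions; the 100-150 range per CPU is the figure quoted on the slide, and it is only a starting point for initial sizing.

```python
import math


def estimate_cpus(raw_data_gb, gb_per_cpu_low=100, gb_per_cpu_high=150):
    """Return (min_cpus, max_cpus) for a given raw data size in GB."""
    # More GB handled per CPU means fewer CPUs, so the high ratio gives the minimum.
    min_cpus = math.ceil(raw_data_gb / gb_per_cpu_high)
    max_cpus = math.ceil(raw_data_gb / gb_per_cpu_low)
    return min_cpus, max_cpus


# Hypothetical warehouse with 1,200 GB of raw data.
cpu_range = estimate_cpus(1200)
print(cpu_range)
```

For 1,200 GB of raw data this gives a range of 8 to 12 CPUs for initial performance; growth in users and data volume has to be planned for separately.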
Scale Out
There is an increasing trend in IT to ‘scale out’
processing capacity by deploying many small,
commodity servers rather than a single large
SMP system
This strategy tends to work well for relatively
simple applications such as network or web
servers
For very complex workloads such as a data
warehouse, this strategy is much more difficult
to effectively implement
Especially so for the database server itself
Best Practice #6
Create a capacity plan for your BI
application & monitor it carefully
Consider future additional performance
demands
Establish standard performance benchmark
queries and regularly run them
Implement capacity monitoring tools
Build scalability into your architecture
May need to allow for scaling both up and
out!
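Illustration (not from the original deck): a minimal sketch of a standard benchmark query set that can be rerun on a schedule. SQLite stands in for the warehouse database, and the table, the data and the two queries are hypothetical. The point is only that the same named queries are timed the same way every run, so the results can be compared over time.

```python
import sqlite3
import time


def run_benchmarks(connection, queries, repeats=3):
    """Run each named query `repeats` times; return the best elapsed seconds per query."""
    results = {}
    for name, sql in queries.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            connection.execute(sql).fetchall()
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)
    return results


# Hypothetical stand-in for the warehouse: a small in-memory fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (product_key INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(i % 10, float(i)) for i in range(1000)],
)

BENCHMARK_QUERIES = {
    "total_sales": "SELECT SUM(amount) FROM fact_sales",
    "sales_by_product": "SELECT product_key, SUM(amount) FROM fact_sales GROUP BY product_key",
}

timings = run_benchmarks(conn, BENCHMARK_QUERIES)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Storing each run's timings alongside the data volume at the time gives an early warning when performance starts to fall behind growth.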
DW Appliances
DW appliances, consisting of
packaged solutions providing all
required software and hardware, are
beginning to offer very promising
price/performance
production experience is limited so
far, so this is not yet a ‘best practice’
Data Warehouse
Architecture Best Practices
Q&A