Data Warehouse

Operational Source Systems

Operational source systems capture the transactions of the business. They deal with one record at a time, mainly inserts and updates of a single record in a database that is kept in a highly normalized form following the ER model. Generating reports from an operational source system is theoretically possible, but it requires numerous joins across tables. This complicates matters for the end user (limiting slicing and dicing of data) and compromises query performance.

Slicing and Dicing

Business users want to separate and combine data in the warehouse in endless combinations. This process is called slicing and dicing. It is done by generating SQL queries constrained by the attributes of the dimension tables.
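As a minimal sketch of what such a query looks like, the example below builds a tiny in-memory star schema with Python's `sqlite3` module. The table names, columns, and rows are all hypothetical. The `WHERE` clause "slices" on one store region and "dices" on a set of product categories; both constraints are attributes of dimension tables, not of the fact table.

```python
import sqlite3

# In-memory star schema: one fact table plus two dimension tables.
# All table names, column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT, category TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales_fact  (product_key INTEGER, store_key INTEGER, dollar_sales REAL);
INSERT INTO product_dim VALUES (1, 'FizzCo', 'Soft Drinks'), (2, 'CrunchCo', 'Snacks'), (3, 'FizzCo', 'Snacks');
INSERT INTO store_dim   VALUES (10, 'North'), (20, 'South');
INSERT INTO sales_fact  VALUES (1, 10, 100.0), (2, 10, 40.0), (3, 10, 25.0),
                               (1, 20, 70.0),  (2, 20, 55.0);
""")

def sales_by_brand(region, categories):
    """Slice on one store region, dice on a set of categories, group by brand."""
    placeholders = ", ".join("?" for _ in categories)
    sql = f"""
        SELECT p.brand, SUM(f.dollar_sales)
        FROM sales_fact f
        JOIN product_dim p ON p.product_key = f.product_key
        JOIN store_dim   s ON s.store_key   = f.store_key
        WHERE s.region = ? AND p.category IN ({placeholders})
        GROUP BY p.brand
        ORDER BY p.brand
    """
    return conn.execute(sql, [region, *categories]).fetchall()

print(sales_by_brand("North", ["Snacks"]))  # [('CrunchCo', 40.0), ('FizzCo', 25.0)]
```

Changing the region or the category list re-cuts the same facts without touching the fact table, which is the point of constraining on dimension attributes.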

Meta Data

A repository is used to keep all the metadata of the data warehouse, which falls into the following categories:


• Operational source metadata
o Source schemas
• Staging metadata
o Transformation rules
o Target table layouts
o Staging file layouts
o Conformed dimensions
o Fact definitions
o ETL transmission schedules
o Run log results
o Custom programming code
• Presentation metadata
o Partition settings
o Indexes
o View definitions
o DBMS-level security and grants

Dimensional Model

Dimensional models are relational models, like ER models, but they have a lower degree of normalization and are based on the star schema rather than on the ER model. The salient features of dimensional modeling are:
• Understandability
• Query performance
• Resilience to change
o Adding a new attribute to a dimension table: older dimension records should not contain NULL for the new attribute
o Adding a new dimension to the star schema: older fact table records should not contain NULL for the new dimension
o Adding a new fact to the fact table: NULL values are placed for the new fact in older fact table records
• Slicing and dicing is possible and easier
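The three "resilience to change" rules can be sketched with SQLite's `ALTER TABLE ... ADD COLUMN`, which fills existing rows with the column's default. This is a minimal illustration under assumptions: the tables are hypothetical, and the `'Not Available'` label and the `0` key pointing at a "No Promotion" row are assumed conventions for avoiding NULLs, not something the notes prescribe.

```python
import sqlite3

# Hypothetical star schema with one existing dimension row and one existing fact row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim   (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
CREATE TABLE sales_fact    (product_key INTEGER, quantity INTEGER);
INSERT INTO product_dim   VALUES (1, 'Cola 330ml');
INSERT INTO promotion_dim VALUES (0, 'No Promotion');  -- assumed default row
INSERT INTO sales_fact    VALUES (1, 12);
""")

# New dimension attribute: older dimension rows get a descriptive default, not NULL.
conn.execute("ALTER TABLE product_dim ADD COLUMN package_type TEXT NOT NULL DEFAULT 'Not Available'")

# New dimension: older fact rows point at the 'No Promotion' row, not NULL.
conn.execute("ALTER TABLE sales_fact ADD COLUMN promotion_key INTEGER NOT NULL DEFAULT 0")

# New fact: older fact rows simply hold NULL for it.
conn.execute("ALTER TABLE sales_fact ADD COLUMN discount_amount REAL")

print(conn.execute("SELECT * FROM product_dim").fetchall())  # [(1, 'Cola 330ml', 'Not Available')]
print(conn.execute("SELECT * FROM sales_fact").fetchall())   # [(1, 12, 0, None)]
```

None of the changes requires reloading or rewriting existing rows, and existing queries keep working unchanged.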

Data Staging Area

• Flat files or a relational database (ER model in third normal form, 3NF) for data storage


• No query or presentation services
• ETL process
o Extracting: from different sources
o Transforming: cleaning, conforming, creating surrogate keys for each dimension record, and building aggregates
o Loading and indexing: loading data into the data marts
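The ETL steps can be sketched as a small pipeline. Everything here is hypothetical: the two source layouts, the cleaning rules (trim, upper-case store codes, title-case product names), and the table and index names. Surrogate-key assignment is left out to keep the sketch short.

```python
import sqlite3
from collections import defaultdict

# Extract: rows as they might arrive from two hypothetical source systems,
# each with its own field names and formatting quirks.
pos_rows = [
    {"store": " s1 ", "product": "COLA", "qty": "3"},
    {"store": "S2", "product": "cola", "qty": "2"},
]
web_rows = [{"shop_id": "S1", "item": "Chips ", "units": 4}]

def extract():
    """Map each source's field names onto one common (store, product, qty) layout."""
    for r in pos_rows:
        yield r["store"], r["product"], r["qty"]
    for r in web_rows:
        yield r["shop_id"], r["item"], r["units"]

def transform(rows):
    """Clean (trim, cast to int) and conform codes to a single spelling."""
    for store, product, qty in rows:
        yield store.strip().upper(), product.strip().title(), int(qty)

def build_aggregate(rows):
    """Pre-compute total quantity per product across all stores."""
    totals = defaultdict(int)
    for _store, product, qty in rows:
        totals[product] += qty
    return dict(totals)

def load(rows):
    """Load the cleaned rows into an in-memory data mart table and index it."""
    mart = sqlite3.connect(":memory:")
    mart.execute("CREATE TABLE sales_fact (store TEXT, product TEXT, qty INTEGER)")
    mart.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    mart.execute("CREATE INDEX idx_sales_product ON sales_fact (product)")
    return mart

clean_rows = list(transform(extract()))
product_totals = build_aggregate(clean_rows)
mart = load(clean_rows)
print(clean_rows)      # [('S1', 'Cola', 3), ('S2', 'Cola', 2), ('S1', 'Chips', 4)]
print(product_totals)  # {'Cola': 5, 'Chips': 4}
```

Conforming is what lets `'COLA'` and `'cola'` from different sources land on the same product before aggregation.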

Presentation Server

• Relational database – Dimensional Model – A set of data marts connected together


• Direct query services
• User understandability
• Query performance – high performance retrieval of data

Data Mart – Star Schema

• Every data mart should be centered on a business process of the organization rather than on a department
• Every data mart is based on the most granular, atomic data, i.e. measurements are taken at the most granular level of the dimensions
• Based on the star schema – dimensional modeling
• A logical subset of the complete data warehouse
• Represents one business process or subject area
• Data marts are tied together by conformed dimensions and conformed facts to form the complete data warehouse
• Fact table
o A highly normalized table of dimension keys and facts. The dimensions of a fact table are not correlated with each other, and the fact table captures the many-to-many relationship between them
o Stores measurements of the business, i.e. numeric facts
o Contains a set of foreign keys (FKs), each of which is the primary key (PK) of one dimension table; these FKs, or a subset of them, make up the fact table's multipart primary key
• Dimension table
o Dimension tables are flat, denormalized tables
o Has a unique, single-part PK
o Represents a dimension along which the facts in the fact table are measured
o Holds the textual attributes of the dimension
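The fact/dimension split above translates directly into DDL. The sketch below uses SQLite with hypothetical table and column names: three flat dimension tables with single-part integer PKs, and a fact table whose FKs form its multipart PK. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, so the sketch turns it on and shows a fact row with an unknown product key being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
-- Dimension tables: flat, single-part PK, textual attributes.
CREATE TABLE date_dim    (date_key    INTEGER PRIMARY KEY, day_name TEXT, month_name TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, brand TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name TEXT, region TEXT);

-- Fact table: one FK per dimension plus numeric facts; the FKs form the PK.
CREATE TABLE sales_fact (
    date_key     INTEGER NOT NULL REFERENCES date_dim (date_key),
    product_key  INTEGER NOT NULL REFERENCES product_dim (product_key),
    store_key    INTEGER NOT NULL REFERENCES store_dim (store_key),
    quantity     INTEGER,
    dollar_sales REAL,
    PRIMARY KEY (date_key, product_key, store_key)
);
INSERT INTO date_dim    VALUES (1, 'Monday', 'January');
INSERT INTO product_dim VALUES (1, 'Cola 330ml', 'FizzCo');
INSERT INTO store_dim   VALUES (1, 'Main Street', 'North');
INSERT INTO sales_fact  VALUES (1, 1, 1, 3, 4.50);
""")

try:
    # Product key 99 does not exist in product_dim, so this fact row is rejected.
    conn.execute("INSERT INTO sales_fact VALUES (1, 99, 1, 1, 1.50)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```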
Fact Table

Numerical performance measurements (facts) that result from a business process, taken at the intersection of all the dimensions, are stored in the fact table.

The list of dimensions defines the grain of the fact table. All the measurements (facts) in a fact table must be at the same grain. We access a fact table through the dimension tables attached to it.

Fact tables represent many-to-many relationships between dimensions. However, not all of the dimensions in a fact table are necessarily needed to guarantee the uniqueness of a row; a subset of them can form the primary key.

D1 D2 D3
A B {c, C}
A b {c, C}
a b {c, C}
a B {c, C}

In the above fact table only dimensions D1 and D2 make up the primary key (a many-to-many relationship exists only between D1 and D2). In this case dimension D3 can take on only a single value, either c or C, for each value of the composite primary key (D1, D2).

This means ONLY ONE of the following two rows can exist in the fact table, NOT BOTH; otherwise the PK constraint would be violated:

D1 D2 D3
A B C
A B c
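A database enforces this directly. In the minimal SQLite sketch below (table name hypothetical), only `d1` and `d2` form the primary key. All four (D1, D2) combinations from the first table can be stored, but a second row for (A, B) with a different D3 value is rejected with an `IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only D1 and D2 form the primary key; D3 is a dimension outside the key.
conn.execute("CREATE TABLE fact (d1 TEXT, d2 TEXT, d3 TEXT, PRIMARY KEY (d1, d2))")

# One row per (D1, D2) combination is fine; SQLite compares text case-sensitively.
conn.execute("INSERT INTO fact VALUES ('A', 'B', 'C')")
conn.executemany("INSERT INTO fact VALUES (?, ?, ?)",
                 [("A", "b", "c"), ("a", "b", "C"), ("a", "B", "c")])

try:
    conn.execute("INSERT INTO fact VALUES ('A', 'B', 'c')")  # same (D1, D2), other D3
    second_row_accepted = True
except sqlite3.IntegrityError:
    second_row_accepted = False

print(second_row_accepted)                                    # False
print(conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0])  # 4
```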
Dimension Tables

Dimension attributes serve as the primary source of:
• Query constraints
• Groupings
• Report labels

They are identified by the BY words in a business request, as in "dollar sales by week by brand".
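As a sketch of how the BY words map onto SQL: in "dollar sales by week by brand", `week` and `brand` are dimension attributes that become `GROUP BY` columns and row labels, while "dollar sales" is the fact being summed. Query constraints would appear in a `WHERE` clause in the same way. The schema and data below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, week TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE sales_fact  (date_key INTEGER, product_key INTEGER, dollar_sales REAL);
INSERT INTO date_dim    VALUES (1, '2024-W01'), (2, '2024-W01'), (3, '2024-W02');
INSERT INTO product_dim VALUES (1, 'FizzCo'), (2, 'CrunchCo');
INSERT INTO sales_fact  VALUES (1, 1, 10.0), (2, 1, 5.0), (2, 2, 8.0), (3, 1, 7.0);
""")

# "Dollar sales by week by brand": each BY word names a dimension attribute
# that becomes a GROUP BY column and a label in the report.
report = conn.execute("""
    SELECT d.week, p.brand, SUM(f.dollar_sales) AS dollar_sales
    FROM sales_fact f
    JOIN date_dim    d ON d.date_key    = f.date_key
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY d.week, p.brand
    ORDER BY d.week, p.brand
""").fetchall()

for week, brand, dollars in report:
    print(f"{week}  {brand:<8}  {dollars:6.2f}")
```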

Dimensional Design process

• Choose a single legacy source of data – single-source data mart


Select the business process supported by a single-source data collection system
o Raw material purchasing
o Orders
o Shipments
o Invoicing
o Inventory
o General Ledger

• Fact table grain – Declare the grain of the business process


Decide what a single record (row) in the fact table represents. Most fact table records represent one of the following:

o A transaction, or an individual line item in a transaction


One sale transaction
One ATM transaction
One boarding pass to get on a flight
Each line item of an invoice
o Periodic snapshot
Daily product sales total in a store
Daily snapshot of inventory level for each product in every store
Monthly account balance
o Accumulating snapshot

The lower (more atomic) the grain of the fact table, the more robust the design. As a rule, data or measurements should be captured at the lowest possible grain of each dimension.
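One way to see why a low grain is more robust: atomic rows can always be rolled up to any coarser grain, but not the other way round. The sketch below uses hypothetical line-item data and a hypothetical `roll_up` helper to derive two coarser grains from the same atomic rows.

```python
from collections import defaultdict

# Atomic grain: one row per line item (date, store, product, quantity). Hypothetical data.
line_items = [
    ("2024-01-01", "S1", "P1", 2),
    ("2024-01-01", "S1", "P1", 1),
    ("2024-01-01", "S1", "P2", 5),
    ("2024-01-02", "S1", "P1", 4),
]

def roll_up(rows, key_fn):
    """Aggregate atomic rows to a coarser grain chosen by key_fn."""
    totals = defaultdict(int)
    for date, store, product, qty in rows:
        totals[key_fn(date, store, product)] += qty
    return dict(totals)

# Daily product sales per store, derived from the atomic rows.
daily_by_product = roll_up(line_items, lambda d, s, p: (d, s, p))
# Daily store totals: an even coarser grain, from the same atomic rows.
daily_by_store = roll_up(line_items, lambda d, s, p: (d, s))
print(daily_by_store)  # {('2024-01-01', 'S1'): 8, ('2024-01-02', 'S1'): 4}
```

Starting from `daily_by_store` instead, there is no way to recover that product P1 sold 3 units on 2024-01-01 across two line items.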

• Dimensions in the fact table


How do business people describe the measurements that result from the business process?

The fact table grain itself helps in choosing the primary dimensions. For example, a line item in a POS retail sale transaction leads to the date, customer, product, and store dimensions. The best dimensions are those that take on a single value in the context of each measurement, which implies that the dimensions are at their lowest possible grain. Dimensions other than the primary dimensions that result from the grain can be added to the fact table; these are called supplemental dimensions. A supplemental dimension should take on a single value for each combination of the primary dimensions, so that it does not violate the primary key constraint of the fact table.
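The single-value rule for a supplemental dimension can be checked on candidate data before changing the design. The sketch below is a hypothetical helper (`takes_single_value`), with hypothetical rows in which date, product, and store stand in for the primary dimensions. It groups rows by the primary-dimension combination and confirms that the candidate column has exactly one value in each group.

```python
from collections import defaultdict

def takes_single_value(rows, key_cols, candidate_col):
    """True if candidate_col has exactly one value for every combination of key_cols."""
    values_per_key = defaultdict(set)
    for row in rows:
        values_per_key[tuple(row[c] for c in key_cols)].add(row[candidate_col])
    return all(len(values) == 1 for values in values_per_key.values())

# Hypothetical candidate rows; date, product and store stand in for the primary dimensions.
primary = ("date", "product", "store")
rows = [
    {"date": "D1", "product": "P1", "store": "S1", "promotion": "None", "cashier": "Ann"},
    {"date": "D1", "product": "P2", "store": "S1", "promotion": "BOGO", "cashier": "Ann"},
    {"date": "D1", "product": "P1", "store": "S1", "promotion": "None", "cashier": "Bob"},
]
print(takes_single_value(rows, primary, "promotion"))  # True: one value per key combination
print(takes_single_value(rows, primary, "cashier"))    # False: two values for (D1, P1, S1)
```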

The dimensions in a fact table are not correlated with each other, which keeps the fact table highly normalized. If you suspect that two dimensions are related, consider combining them into one dimension table.

• Choose facts
What are we measuring? Business performance measures.
Facts should always be true to the grain of the business process:
o Transaction grain – transaction amount
o Transaction line-item grain – amount, quantity, discounts, …
o Snapshot grain – many facts

Conformed Dimensions
• Conformed dimensions are either identical to, or strict mathematical subsets of, the most granular, detailed dimension
• Both row and column subsetting of a dimension are possible
o Row subset: a subset of the rows of the master dimension table
o Column subset: leaving out some attributes of the master dimension table
• A conformed dimension means the same thing to every fact table it is joined to, i.e. across different data marts
• A conformed dimension is the master table for that dimension, such as a customer master table or a product master list, built by taking data from all the sources. Because it is built from different sources, it may carry a different hierarchy for each subject area or source
• Defined at the most granular, atomic level, hence very detailed
• The grain of the customer dimension is an individual customer
• They have an anonymous surrogate key, and they tie the data marts together
• In general they are separate physical tables in different data marts
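Row and column subsetting can be sketched on a hypothetical product master list. The row subset keeps whole rows for one category. The column subset keeps only brand and category, de-duplicated, which is what a data mart at brand grain would join to. Because both are derived from the master, they stay conformed with it.

```python
# Hypothetical master (most granular) product dimension: one row per product.
product_master = [
    {"product_key": 1, "product": "Cola 330ml", "brand": "FizzCo", "category": "Soft Drinks"},
    {"product_key": 2, "product": "Lemonade 1l", "brand": "FizzCo", "category": "Soft Drinks"},
    {"product_key": 3, "product": "Salted Chips", "brand": "CrunchCo", "category": "Snacks"},
]

# Row subset: only the rows a snacks-only data mart needs, with all columns kept.
snacks_products = [row for row in product_master if row["category"] == "Snacks"]

# Column subset: drop product-level attributes and de-duplicate, giving a
# brand-level dimension for a mart whose grain is brand rather than product.
brand_dim = sorted({(row["brand"], row["category"]) for row in product_master})

print(brand_dim)  # [('CrunchCo', 'Snacks'), ('FizzCo', 'Soft Drinks')]
```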

Degenerate Dimensions
• Occur in transaction or transaction line-item oriented fact table designs
• In transaction systems there is a parent-child relationship (for example an invoice and its line items), and the parent has a key such as the invoice number
• In dimensional modeling we keep these parent keys in the fact table, where they are called degenerate dimensions (degenerate keys); they sit in the fact table without a join to anything
• They often become part of the primary key of the fact table
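A minimal SQLite sketch of a degenerate dimension, with hypothetical names: `invoice_number` sits in the fact table, joins to no dimension table, and forms part of the primary key. The key `(invoice_number, product_key)` assumes a product appears at most once per invoice; a line number could be used instead. Line items can still be grouped back into invoices directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
-- invoice_number is a degenerate dimension: it lives in the fact table,
-- joins to no dimension table, and is part of the primary key.
CREATE TABLE invoice_line_fact (
    invoice_number TEXT    NOT NULL,
    product_key    INTEGER NOT NULL REFERENCES product_dim (product_key),
    quantity       INTEGER,
    amount         REAL,
    PRIMARY KEY (invoice_number, product_key)
);
INSERT INTO product_dim VALUES (1, 'Cola 330ml'), (2, 'Salted Chips');
INSERT INTO invoice_line_fact VALUES
    ('INV-1001', 1, 2, 3.00), ('INV-1001', 2, 1, 1.50), ('INV-1002', 1, 1, 1.50);
""")

# Group line items back into invoices without joining to any invoice dimension.
invoice_totals = conn.execute("""
    SELECT invoice_number, SUM(amount)
    FROM invoice_line_fact
    GROUP BY invoice_number
    ORDER BY invoice_number
""").fetchall()
print(invoice_totals)  # [('INV-1001', 4.5), ('INV-1002', 1.5)]
```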

Conformed Facts
• Needed for reports that drill across data marts
• Conformed facts are defined in the same dimensional context and with the same units of measurement from data mart to data mart. For example, revenue and profit need to be reported over the same time periods and the same geographies in two different fact tables. Month-end revenue and billing-cycle revenue are not the same measurement, so they should be treated as different facts, even within one fact table
• Facts are measurements taken at each combination of dimensions. A fact is something that is not known in advance
• In ER terms, a fact is a NON-KEY numeric measurement that belongs to a many-to-many relationship
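A drill-across report can be sketched as two separate queries merged on a conformed attribute. The two data marts below are hypothetical, and both are assumed to store amounts in dollars against the same month labels, i.e. the facts are already conformed. Each fact table is aggregated to the shared grain (month) on its own, and only then are the results merged.

```python
from collections import defaultdict

# Two hypothetical data marts whose facts are already conformed: both store
# amounts in dollars and both use the same conformed month labels.
sales_mart = [("2024-01", "North", 120.0), ("2024-01", "South", 80.0), ("2024-02", "North", 90.0)]
returns_mart = [("2024-01", 20.0), ("2024-02", 5.0)]

def total_by_month(rows, value_idx):
    """Aggregate one fact table to the shared grain: month (column 0)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[0]] += row[value_idx]
    return totals

# Drill across: query each fact table separately, then merge the two
# answer sets on the conformed month attribute.
revenue = total_by_month(sales_mart, 2)
refunds = total_by_month(returns_mart, 1)
drill_across = {
    month: (revenue.get(month, 0.0), refunds.get(month, 0.0))
    for month in sorted(revenue.keys() | refunds.keys())
}
print(drill_across)  # {'2024-01': (200.0, 20.0), '2024-02': (90.0, 5.0)}
```

If one mart stored cents and the other dollars, the merged columns would not be comparable; converting them to the same unit beforehand is part of conforming the facts.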
Additive facts
Additive facts are those that can be summed up across all the dimensions in the fact table

Semi-additive facts


Semi-additive facts can be summed across some, but not all, of the dimensions of the fact table.
Example: a fact table recording the DAILY INVENTORY LEVEL OF ALL PRODUCTS IN ALL STORES

Product (Dimension)   Store (Dimension)   Date (Dimension)   Quantity on Hand (Fact)
P1                    S1                  MONDAY             20
P2                    S1                  MONDAY             19
P1                    S2                  MONDAY             10
P2                    S2                  MONDAY             9

P1                    S1                  TUESDAY            15
P2                    S1                  TUESDAY            14
P1                    S2                  TUESDAY            5
P2                    S2                  TUESDAY            4

In the above table, the fact Quantity on Hand is additive across the product dimension: if we freeze store and date to, say, S1 and MONDAY, then 20 + 19 = 39 gives the TOTAL INVENTORY LEVEL OF ALL THE PRODUCTS IN STORE S1 ON MONDAY, which is a valid summation.

The fact is also additive across the store dimension: if we freeze product and date to, say, P1 and MONDAY, then 20 + 10 = 30 gives the TOTAL INVENTORY LEVEL OF PRODUCT P1 ON MONDAY IN ALL STORES, which is a valid summation.

The fact is NOT additive across the date dimension: if we freeze store and product to, say, S1 and P1, then 20 + 15 = 35 is not a valid summation, because it adds up the inventory levels of P1 in S1 on two different days. At best these levels can be averaged to get an average inventory level, (20 + 15) / 2 = 17.5.
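The same three checks can be run on the table above. This sketch stores the snapshot as a dictionary keyed by (product, store, day). It sums across product and across store, but averages across date, as the semi-additive rule requires.

```python
from collections import defaultdict

# The daily inventory snapshot above: (product, store, day) -> quantity on hand.
inventory = {
    ("P1", "S1", "MONDAY"): 20, ("P2", "S1", "MONDAY"): 19,
    ("P1", "S2", "MONDAY"): 10, ("P2", "S2", "MONDAY"): 9,
    ("P1", "S1", "TUESDAY"): 15, ("P2", "S1", "TUESDAY"): 14,
    ("P1", "S2", "TUESDAY"): 5, ("P2", "S2", "TUESDAY"): 4,
}

by_store_day = defaultdict(int)      # additive across product
by_product_day = defaultdict(int)    # additive across store
levels_over_time = defaultdict(list) # NOT additive across date: collect, then average

for (product, store, day), qty in inventory.items():
    by_store_day[(store, day)] += qty
    by_product_day[(product, day)] += qty
    levels_over_time[(product, store)].append(qty)

avg_level = {key: sum(levels) / len(levels) for key, levels in levels_over_time.items()}

print(by_store_day[("S1", "MONDAY")])    # 39
print(by_product_day[("P1", "MONDAY")])  # 30
print(avg_level[("P1", "S1")])           # 17.5
```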

Non-Additive facts
Non-additive facts cannot be summed up across any of the dimensions of the fact table
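The notes give no example here. A common one, used in this sketch as an assumption, is a ratio such as a margin percentage. Adding the percentages of two rows is meaningless. The usual remedy is to store the additive parts (profit and revenue), sum those, and compute the ratio last. The data is hypothetical.

```python
# Hypothetical line items; margin percentage is a ratio, hence non-additive.
rows = [
    {"revenue": 100.0, "profit": 50.0},  # 50% margin
    {"revenue": 300.0, "profit": 30.0},  # 10% margin
]
margins = [r["profit"] * 100 / r["revenue"] for r in rows]

# Summing the ratio itself gives a meaningless number.
wrong_total = sum(margins)

# Summing the additive components first, then dividing, gives the true margin.
right_margin = sum(r["profit"] for r in rows) * 100 / sum(r["revenue"] for r in rows)

print(wrong_total, right_margin)  # 60.0 20.0
```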

Enhanced Facts

Sometimes we can add facts to the fact table that do not come directly from the grain definition, to produce more meaningful reports. These facts are correlated with the original facts of the fact table. Care should be taken when introducing these enhanced facts: they must not violate the grain of the fact table.

Surrogate keys
• Surrogate key is a simple integer key, used as primary key of the dimension table
• It is completely meaningless and tells nothing about the record in the dimension table
• Other names used are meaningless key, integer key, non-natural key, artificial key, and synthetic key
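A minimal sketch of surrogate key assignment, using a hypothetical `SurrogateKeyGenerator` helper. Each distinct natural (source) key gets the next integer from a counter, and the integer says nothing about the record. One generator would typically serve one dimension.

```python
import itertools

class SurrogateKeyGenerator:
    """Hands out meaningless integer keys, one per distinct natural key (hypothetical helper)."""

    def __init__(self, start=1):
        self._next = itertools.count(start)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Reuse the surrogate if this natural key was already seen.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next)
        return self._by_natural_key[natural_key]

product_keys = SurrogateKeyGenerator()
print(product_keys.key_for("SKU-9917-B"))  # 1
print(product_keys.key_for("SKU-0042-A"))  # 2
print(product_keys.key_for("SKU-9917-B"))  # 1 (same natural key, same surrogate)
```

This one-key-per-natural-key lookup fits dimensions whose changes are overwritten. A Type 2 slowly changing dimension deliberately issues a new surrogate for each new version of the same natural key.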

Snowflaking and Dimension Table Hierarchies

Snowflaking means removing low-cardinality textual attributes from dimension tables and placing them in secondary dimension tables, that is, normalizing the dimension table.

Dimensions have textual attributes. Almost every dimension table contains multiple hierarchies of attributes along with some unrelated attributes. For example, a product dimension may have a finance hierarchy of attributes, a marketing hierarchy over the same attributes, and some unrelated attributes. Within one hierarchy there is a many-to-one relationship between attributes, and all the lower nodes can be rolled up to the root node of the hierarchy.
Example product hierarchy:
o Department attribute (one department has many categories)
o Category attribute (one category has many brands)
o Brand attribute (one brand has many products)
o Product
So if a product dimension table has 100 rows, one per product, each carrying its brand, category, and department, the department attribute will likely have the lowest cardinality and the product attribute the highest. In such cases designers tend to move the department, category, and brand attributes out of the dimension table into separate tables, thus normalizing the original dimension table. This is called snowflaking.
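The sketch below starts from a hypothetical flat product dimension. It measures the cardinality of each attribute and then splits the hierarchy into one lookup per level, which is what snowflaking does. It assumes the hierarchy really is many-to-one at each level (each brand belongs to one category, each category to one department). After snowflaking, a roll-up to department needs one lookup, i.e. one join, per level.

```python
# Flat (denormalized) product dimension; hypothetical rows.
flat_product_dim = [
    ("Cola 330ml",   "FizzCo",   "Soft Drinks", "Beverages"),
    ("Lemonade 1l",  "FizzCo",   "Soft Drinks", "Beverages"),
    ("Orange Juice", "SunPress", "Juices",      "Beverages"),
    ("Salted Chips", "CrunchCo", "Snacks",      "Food"),
    ("Pretzels",     "TwistCo",  "Snacks",      "Food"),
]
columns = ("product", "brand", "category", "department")

# Cardinality of each attribute: department is lowest, product highest.
cardinality = {col: len({row[i] for row in flat_product_dim}) for i, col in enumerate(columns)}

# Snowflaked form: each level keeps only a pointer to its parent level.
product_table = {product: brand for product, brand, _, _ in flat_product_dim}
brand_table = {brand: category for _, brand, category, _ in flat_product_dim}
category_table = {category: dept for _, _, category, dept in flat_product_dim}

def department_of(product):
    """Roll a product up to its department: one lookup (join) per hierarchy level."""
    return category_table[brand_table[product_table[product]]]

print(cardinality)                # {'product': 5, 'brand': 4, 'category': 3, 'department': 2}
print(department_of("Pretzels"))  # Food
```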

A single dimension table may contain multiple hierarchies. For example, a store dimension table can have:
o A geographic hierarchy: State, County, Zip code
o A second, business-specific geographic hierarchy: Region, District

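Slowly Changing Dimensions


The natural key of the dimension record does not change, but some other attribute does. There are three classic ways of handling the change:
• Type 1: overwrite the old value in the existing record
• Type 2: create a new record (with a new surrogate key) and keep the old record as history
• Type 3: create another field in the same record and keep both the old and the current values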
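The three types can be sketched as functions over a hypothetical customer dimension row whose natural key `customer_id` never changes while `city` does. The `is_current` flag and the `previous_city` column are assumed naming conventions. Real Type 2 designs usually also carry effective and expiry dates, which are omitted here for brevity. None of the functions mutates its input.

```python
# Hypothetical customer dimension row; "C042" is the natural key, 1 the surrogate key.
original = {"customer_key": 1, "customer_id": "C042", "city": "Boston"}

def scd_type1(row, new_city):
    """Type 1: overwrite the value; history is lost."""
    return [dict(row, city=new_city)]

def scd_type2(row, new_city, next_key):
    """Type 2: keep the old row as history and add a new row with a new surrogate key."""
    old = dict(row, is_current=False)
    new = dict(row, customer_key=next_key, city=new_city, is_current=True)
    return [old, new]

def scd_type3(row, new_city):
    """Type 3: keep the prior value in an extra column of the same row."""
    return [dict(row, previous_city=row["city"], city=new_city)]

print(scd_type1(original, "Denver"))
print(scd_type2(original, "Denver", next_key=2))
print(scd_type3(original, "Denver"))
```

Type 2 is the only one of the three that keeps full history, which is why it needs the surrogate key: two rows now share the natural key C042.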
