
10/15/08 Sudarshan

Review

Today

Star Schema

Fact table

Dimensions

Drilling Down & Roll up

Slicing & Dicing

Implementation techniques for OLAP

Bit map indexes

Join indexes

File org.

Architecture

Characteristics

Relational OLAP

Multidimensional OLAP

ROLAP vs. MOLAP

What is Star Schema?

Star Schema is a relational database schema for representing multidimensional data.

It is the simplest form of data warehouse schema that contains one or more dimensions and fact tables.

It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions.

The center of the star schema consists of a large fact table that points towards the dimension tables.

The advantages of a star schema are easy slicing of the data, improved query performance and easy understanding of the data.

Steps in designing Star Schema

Identify a business process for analysis (like sales).

Identify measures or facts (sales dollar).

Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).

List the columns that describe each dimension (region name, branch name).

Determine the lowest level of summary in a fact table (sales dollar).

Important aspects of Star Schema & Snowflake Schema

In a star schema every dimension will have a primary key.

In a star schema, a dimension table will not have any parent table, whereas in a snowflake schema a dimension table will have one or more parent tables.

In a star schema, hierarchies for the dimensions are stored in the dimension table itself, whereas in a snowflake schema hierarchies are broken into separate tables. These hierarchies help to drill down the data from the topmost level of the hierarchy to the lowest level.

Fact

Facts are numeric measurements (values) that represent a specific business activity.

Example: sales figures are numeric measurements that represent product and/or service sales.

Facts commonly used in business data analysis are units, costs, prices and revenues.

Facts are stored in a FACT table, i.e., the center of the star schema.

Fact Table

The centralized table in a star schema is called the FACT table; it contains the facts and is connected to the dimensions. A fact table typically has two types of columns:

  those that contain facts and

  those that are foreign keys to dimension tables.

The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.

A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables). A fact table usually contains facts with the same level of aggregation.

Many OLAP applications are based on a fact table.

For example, a supermarket application might be based on a table

  Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)

The table can be viewed as multidimensional:

  Market_Id, Product_Id, Time_Id are the dimensions that represent specific supermarkets, products, and time intervals

  Sales_Amt is a function of the other three
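
To make "Sales_Amt is a function of the other three" concrete, here is a minimal Python sketch, assuming that the three dimension ids together identify at most one row (the composite key). The rows and the helper name `sales_amt` are made up for illustration; a combination with no row is simply an empty cell of the cube.

```python
# Hypothetical Sales rows: (Market_Id, Product_Id, Time_Id, Sales_Amt).
sales_rows = [
    ("M1", "P1", "T1", 1000),
    ("M1", "P1", "T2", 2003),
    ("M2", "P1", "T1", 1503),
    ("M1", "P2", "T1", 6003),
]

# The three dimension ids are the coordinates of a cell in a 3-D cube;
# Sales_Amt is the value stored in that cell.
cube = {(m, p, t): amt for (m, p, t, amt) in sales_rows}

def sales_amt(market_id, product_id, time_id):
    """Sales_Amt as a function of the other three columns (None = empty cell)."""
    return cube.get((market_id, product_id, time_id))

print(sales_amt("M1", "P1", "T2"))  # 2003
print(sales_amt("M2", "P2", "T1"))  # None
```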


Fact Table (Conclusion)

Central table

Mostly raw numeric items

Narrow rows, a few columns at most

Large number of rows (millions to a billion)

Access via dimensions

Dimension

Qualifying characteristics that provide additional perspective to a given fact.

Example: sales might be compared by product from region to region and from one time period to the next.

Here sales have product, location and time dimensions.

Such dimensions are stored in a DIMENSION TABLE.

Dimension Tables

The dimensions of the fact table are further described with dimension tables

Fact table:

  Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)

Dimension Tables:

  Market (Market_Id, City, State, Region)

  Product (Product_Id, Name, Category, Price)

  Time (Time_Id, Week, Month, Quarter)
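
The schema above can be tried out in SQLite through Python's standard `sqlite3` module. This is a minimal sketch: the column types, the sample rows and the final query are illustrative assumptions, not part of the original example. It declares the fact table's primary key as the composite of its three foreign keys and runs a star join that touches every dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Market  (Market_Id  TEXT PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Product (Product_Id TEXT PRIMARY KEY, Name TEXT, Category TEXT, Price REAL);
CREATE TABLE Time    (Time_Id    TEXT PRIMARY KEY, Week INTEGER, Month INTEGER, Quarter INTEGER);

-- Fact table: one foreign key per dimension plus the measure.
-- Its primary key is the composite of all its foreign keys.
CREATE TABLE Sales (
    Market_Id  TEXT REFERENCES Market(Market_Id),
    Product_Id TEXT REFERENCES Product(Product_Id),
    Time_Id    TEXT REFERENCES Time(Time_Id),
    Sales_Amt  REAL,
    PRIMARY KEY (Market_Id, Product_Id, Time_Id)
);
""")

# Made-up dimension and fact rows.
conn.executemany("INSERT INTO Market VALUES (?, ?, ?, ?)", [
    ("M1", "Albany", "NY", "East"), ("M2", "Buffalo", "NY", "East"),
    ("M3", "Austin", "TX", "South")])
conn.execute("INSERT INTO Product VALUES ('P1', 'Soap', 'Household', 2.5)")
conn.execute("INSERT INTO Time VALUES ('T1', 1, 1, 1)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("M1", "P1", "T1", 100.0), ("M3", "P1", "T1", 40.0)])

# Star join: the fact table joined with each dimension it references.
rows = conn.execute("""
    SELECT M.Region, P.Category, T.Quarter, SUM(S.Sales_Amt)
    FROM Sales S
    JOIN Market M  ON M.Market_Id  = S.Market_Id
    JOIN Product P ON P.Product_Id = S.Product_Id
    JOIN Time T    ON T.Time_Id    = S.Time_Id
    GROUP BY M.Region, P.Category, T.Quarter
    ORDER BY M.Region
""").fetchall()
print(rows)  # [('East', 'Household', 1, 100.0), ('South', 'Household', 1, 40.0)]
```

Because of the composite primary key, inserting a second Sales row for the same (Market_Id, Product_Id, Time_Id) raises `sqlite3.IntegrityError`.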

Attributes

Each dimension table contains attributes.

Attributes are used to search, filter and classify facts.

Example: for Sales, we can identify some attributes for each dimension:

  Product Dimension: product ID, description, product type

  Location Dimension: region, state, city

  Time Dimension: year, quarter, month, week and date

Attribute Hierarchy

An attribute hierarchy (AH) provides a top-down data organization.

It is used for aggregation and drill-down/roll-up data analysis.

Example: location dimension attributes can be organized in a hierarchy by region, state and city.

An AH provides the capability to perform drill-down and roll-up searches.

It allows the data warehouse (DW) and OLAP systems to have a defined path.

A Concept Hierarchy: Dimension (location)

Levels, from top to bottom: all -> region -> country -> city -> office

  all
    Europe
      Germany: Frankfurt, ...
      Spain: ...
    North_America
      Canada: Vancouver (offices: L. Chan, M. Wind, ...), Toronto, ...
      Mexico: ...

Multidimensional Data

Sales volume as a function of product, month, and region

[Figure: data cube with axes Product, Region and Month]

Dimensions: Product, Location, Time

Hierarchical summarization paths:

  Product: Industry -> Category -> Product

  Location: Region -> Country -> City -> Office

  Time: Year -> Quarter -> Month -> Day, with Week as an alternative level above Day

A Sample Data Cube

[Figure: data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum) and Country (U.S.A., Canada, Mexico, sum). The highlighted cell is the total annual sales of TV in U.S.A.]

Star Schema

A single fact table and, for each dimension, one dimension table

Does not capture hierarchies directly

[Figure: a fact table (date, custno, prodno, cityname, ...) linked to the dimension tables time, prod, cust and city]

Example of Star Schema: Figure 1.6

In the example, the sales fact table is connected to the dimensions location, product, time and organization. It shows that data can be sliced across all dimensions and that it is also possible to aggregate the data across multiple dimensions. "Sales dollar" in the sales fact table can be calculated across all dimensions independently or in combination, as explained below:

  Sales dollar value for a particular product

  Sales dollar value for a product in a location

  Sales dollar value for a product in a year within a location

  Sales dollar value for a product in a year within a location sold or serviced by an employee

Example of Star Schema

time (time_key, day, day_of_the_week, month, quarter, year)

location (location_key, street, city, province_or_state, country)

Sales Fact Table (time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales)
  Measures: units_sold, dollars_sold, avg_sales

item (item_key, item_name, brand, type, supplier_type)

branch (branch_key, branch_name, branch_type)

Aggregation

Many OLAP queries involve aggregation of the data in the fact table.

For example, to find the total sales (over time) of each product in each market, we might use

  SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
  FROM Sales S
  GROUP BY S.Market_Id, S.Product_Id

The aggregation is over the entire time dimension and thus produces a two-dimensional view of the data.
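
A minimal sketch of this query, run with Python's standard `sqlite3` module against a few made-up Sales rows (the data are an assumption for illustration). Summing over Time_Id leaves one value per (Market_Id, Product_Id) pair, i.e., a two-dimensional view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Market_Id TEXT, Product_Id TEXT, Time_Id TEXT, Sales_Amt REAL)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("M1", "P1", "T1", 1000), ("M1", "P1", "T2", 2003),  # same cell, two time periods
    ("M2", "P1", "T1", 1503),
    ("M1", "P2", "T1", 6003),
])

# Aggregating over the whole Time dimension collapses the 3-D table to 2-D.
rows = conn.execute("""
    SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Market_Id, S.Product_Id
    ORDER BY S.Market_Id, S.Product_Id
""").fetchall()
print(rows)  # [('M1', 'P1', 3003.0), ('M1', 'P2', 6003.0), ('M2', 'P1', 1503.0)]
```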

Aggregation Over Time

The output of the previous query:

  SUM(Sales_Amt)          Market_Id
  Product_Id      M1      M2      M3      M4
  P1              3003    1503    ...     ...
  P2              6003    2402    ...     ...
  P3              4503    3       ...     ...
  P4              7503    7000    ...     ...
  P5              ...     ...     ...     ...

Typical OLAP Operations

Roll up (drill-up): summarize data

  by climbing up a hierarchy or by dimension reduction

Drill down (roll down): reverse of roll-up

  from a higher-level summary to a lower-level summary or detailed data, or introducing new dimensions

Slice and dice:

  project and select

Pivot (rotate):

  reorient the cube for visualization, e.g., turn a 3D cube into a series of 2D planes

Other operations

  drill across: involving (across) more than one fact table

  drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
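
The operations above can be sketched on a toy cube held in a Python dict. The cell values, the dimension names and the helper functions are hypothetical; `roll_up` implements only the dimension-reduction form of roll-up, not climbing a hierarchy, and `pivot` works on a 2-D view.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, channel, region) -> sales.
DIMS = ("product", "channel", "region")
cells = {
    ("Telecomm", "Retail", "India"): 10, ("Telecomm", "Direct", "Europe"): 7,
    ("Video", "Retail", "India"): 4,     ("Audio", "Special", "Far East"): 3,
    ("Telecomm", "Retail", "Europe"): 5,
}

def roll_up(cube, drop):
    """Dimension reduction: sum the measure over the dimension named `drop`."""
    i = DIMS.index(drop)
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:i] + key[i + 1:]] += value
    return dict(out)

def slice_(cube, dim, value):
    """Fix one dimension to one value; the result has one dimension fewer."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def dice(cube, **allowed):
    """Select a sub-cube: keep cells whose coordinates lie in the given sets."""
    idx = {DIMS.index(d): vals for d, vals in allowed.items()}
    return {k: v for k, v in cube.items()
            if all(k[i] in vals for i, vals in idx.items())}

def pivot(view2d):
    """Rotate a 2-D view by swapping its two axes."""
    return {(b, a): v for (a, b), v in view2d.items()}

telecomm = slice_(cells, "product", "Telecomm")  # (channel, region) -> sales
print(telecomm)
print(roll_up(cells, "region"))                  # (product, channel) -> sales
print(dice(cells, product={"Telecomm", "Video"}, region={"India"}))
print(pivot(telecomm))                           # (region, channel) -> sales
```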

Drilling Down and Rolling Up

Some dimension tables form an aggregation hierarchy

  Market_Id -> City -> State -> Region

Executing a series of queries that moves down a hierarchy (e.g., from aggregation over regions to that over states) is called drilling down

  Requires the use of the fact table or of information more specific than the requested aggregation (e.g., cities)

Executing a series of queries that moves up the hierarchy (e.g., from states to regions) is called rolling up

Drilling Down

Drilling down on market: from Region to State

  Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
  Market (Market_Id, City, State, Region)

  1. SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
     FROM Sales S, Market M
     WHERE M.Market_Id = S.Market_Id
     GROUP BY S.Product_Id, M.Region

  2. SELECT S.Product_Id, M.State, SUM(S.Sales_Amt)
     FROM Sales S, Market M
     WHERE M.Market_Id = S.Market_Id
     GROUP BY S.Product_Id, M.State
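
A runnable sketch of the two drill-down queries using Python's `sqlite3`; the markets, states and amounts are made up. The hypothetical helper `sales_by` only changes the Market column used in SELECT and GROUP BY, which is all that drilling down from Region to State requires here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Market (Market_Id TEXT, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Sales  (Market_Id TEXT, Product_Id TEXT, Time_Id TEXT, Sales_Amt INTEGER);
INSERT INTO Market VALUES ('M1', 'Albany', 'NY', 'East'), ('M2', 'Buffalo', 'NY', 'East'),
                          ('M3', 'Boston', 'MA', 'East'), ('M4', 'Austin', 'TX', 'South');
INSERT INTO Sales VALUES ('M1', 'P1', 'T1', 10), ('M2', 'P1', 'T1', 20),
                         ('M3', 'P1', 'T1', 5),  ('M4', 'P1', 'T1', 7);
""")

def sales_by(level):
    """Total sales per product at one level of the Market hierarchy."""
    # Column names cannot be bound as SQL parameters, so whitelist them.
    if level not in ("Region", "State", "City"):
        raise ValueError(level)
    return conn.execute(f"""
        SELECT S.Product_Id, M.{level}, SUM(S.Sales_Amt)
        FROM Sales S, Market M
        WHERE M.Market_Id = S.Market_Id
        GROUP BY S.Product_Id, M.{level}
        ORDER BY M.{level}
    """).fetchall()

print(sales_by("Region"))  # [('P1', 'East', 35), ('P1', 'South', 7)]
print(sales_by("State"))   # [('P1', 'MA', 5), ('P1', 'NY', 30), ('P1', 'TX', 7)]
```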

Rolling Up

Rolling up on market, from State to Region

  If we have already created a table, State_Sales, using

    SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) AS Sales_Amt
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.State

  then we can roll up from there to:

    SELECT T.Product_Id, R.Region, SUM(T.Sales_Amt)
    FROM State_Sales T, (SELECT DISTINCT State, Region FROM Market) R
    WHERE R.State = T.State
    GROUP BY T.Product_Id, R.Region

  Market is reduced to its distinct (State, Region) pairs so that a state containing several markets is not counted once per market.
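
The roll-up can be checked with Python's `sqlite3` on made-up data. One subtlety shows up in the sketch: State_Sales has one row per (product, state), but Market has one row per market, so joining the two on State alone would repeat a state's total once for every market in that state. The sketch joins with the distinct (State, Region) pairs and compares the result with that naive join and with a direct aggregation of Sales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Market (Market_Id TEXT, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Sales  (Market_Id TEXT, Product_Id TEXT, Time_Id TEXT, Sales_Amt INTEGER);
INSERT INTO Market VALUES ('M1', 'Albany', 'NY', 'East'), ('M2', 'Buffalo', 'NY', 'East'),
                          ('M3', 'Boston', 'MA', 'East'), ('M4', 'Austin', 'TX', 'South');
INSERT INTO Sales VALUES ('M1', 'P1', 'T1', 10), ('M2', 'P1', 'T1', 20),
                         ('M3', 'P1', 'T1', 5),  ('M4', 'P1', 'T1', 7);

-- Step 1: materialize the state-level aggregate once.
CREATE TABLE State_Sales AS
  SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) AS Sales_Amt
  FROM Sales S, Market M
  WHERE M.Market_Id = S.Market_Id
  GROUP BY S.Product_Id, M.State;
""")

def query(sql):
    return conn.execute(sql).fetchall()

# Step 2: roll up from State_Sales to regions without re-reading Sales.
from_states = query("""
    SELECT T.Product_Id, R.Region, SUM(T.Sales_Amt)
    FROM State_Sales T, (SELECT DISTINCT State, Region FROM Market) R
    WHERE R.State = T.State
    GROUP BY T.Product_Id, R.Region ORDER BY R.Region""")

# Joining directly with Market: NY (two markets) is counted twice.
naive = query("""
    SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt)
    FROM State_Sales T, Market M
    WHERE M.State = T.State
    GROUP BY T.Product_Id, M.Region ORDER BY M.Region""")

# Cross-check: the same aggregate computed straight from the fact table.
from_facts = query("""
    SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.Region ORDER BY M.Region""")

print(from_states)  # [('P1', 'East', 35), ('P1', 'South', 7)]
print(naive)        # [('P1', 'East', 65), ('P1', 'South', 7)]
```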

Rollup and Drill Down

[Figure: hierarchy Sales Channel -> Region -> Country -> State -> Location Address -> Sales Representative. Rolling up moves toward a higher level of aggregation; drilling down moves toward low-level details.]

Slicing and Dicing

[Figure: cube with dimensions Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special) and Regions (India, Far East, Europe); the Telecomm slice is highlighted.]

Snowflake Schema

A snowflake schema is a term that describes a star schema structure normalized through the use of outrigger tables, i.e., dimension table hierarchies are broken into simpler tables. In the star schema example we had 4 dimensions (location, product, time, organization) and a fact table (sales).

Snowflake schema

Represent the dimensional hierarchy directly by normalizing tables.

Easy to maintain and saves storage

[Figure: the fact table (date, custno, prodno, cityname, ...) linked to the dimension tables time, prod, cust and city; city is further linked to a separate region table]

Example of Snowflake Schema

Example of Snowflake Schema

time (time_key, day, day_of_the_week, month, quarter, year)

location (location_key, street, city_key)
  city (city_key, city, province_or_state, country)

Sales Fact Table (time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales)
  Measures: units_sold, dollars_sold, avg_sales

item (item_key, item_name, brand, type, supplier_key)
  supplier (supplier_key, supplier_type)

branch (branch_key, branch_name, branch_type)
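
A minimal sketch of the snowflaked location dimension in SQLite via Python's `sqlite3`: city-level attributes live in a separate `city` table, so reaching `country` from the fact table takes one more join than in a star schema. The subset of tables and columns (the other dimensions and the avg_sales measure are omitted), the types and the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked location dimension: city attributes normalized into their own table.
CREATE TABLE city     (city_key INTEGER PRIMARY KEY, city TEXT,
                       province_or_state TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
CREATE TABLE sales    (time_key INTEGER, item_key INTEGER, branch_key INTEGER,
                       location_key INTEGER REFERENCES location(location_key),
                       units_sold INTEGER, dollars_sold REAL);
INSERT INTO city VALUES (1, 'Vancouver', 'British Columbia', 'Canada'),
                        (2, 'Toronto', 'Ontario', 'Canada');
INSERT INTO location VALUES (10, '1 Main St', 1), (11, '9 King St', 2), (12, '5 Bay St', 2);
INSERT INTO sales VALUES (1, 1, 1, 10, 3, 30.0), (1, 1, 1, 11, 2, 20.0),
                         (1, 1, 1, 12, 1, 15.0);
""")

# Country and province are stored once per city, not once per location row,
# at the cost of the extra join sales -> location -> city.
rows = conn.execute("""
    SELECT c.country, c.city, SUM(s.dollars_sold)
    FROM sales s
    JOIN location l ON l.location_key = s.location_key
    JOIN city c     ON c.city_key     = l.city_key
    GROUP BY c.country, c.city
    ORDER BY c.city
""").fetchall()
print(rows)  # [('Canada', 'Toronto', 35.0), ('Canada', 'Vancouver', 30.0)]
```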

Indexing Techniques

Exploiting indexes to reduce scanning of data is of crucial importance

Bitmap Indexes

Join Indexes

Other Issues

  Text indexing

  Parallelizing and sequencing of index builds and incremental updates

Indexing Techniques

Bitmap index:

  Index on a particular column

  Each value in the column has a bit vector: bit operations are fast

  The length of the bit vector is the number of records in the base table

  The i-th bit is set if the i-th row of the base table has that value in the indexed column

  Not suitable for high-cardinality domains
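
A minimal Python sketch of a bitmap index, assuming rows are numbered from 0 and using Python's arbitrary-precision integers as bit vectors (a convenience for the sketch, not how database products store them). The helper names and the column data are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Return {value: bit vector}; bit i is set iff row i holds that value."""
    index = defaultdict(int)
    for i, value in enumerate(column):
        index[value] |= 1 << i
    return dict(index)

def rows_of(bits):
    """Decode a bit vector into the list of row numbers whose bit is set."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]

gender = ["M", "F", "F", "M", "F"]
idx = build_bitmap_index(gender)
print(rows_of(idx["F"]))             # [1, 2, 4]
print(rows_of(idx["M"] | idx["F"]))  # [0, 1, 2, 3, 4]
print(len(idx))                      # 2: one bit vector per distinct value
```

Since there is one vector per distinct value, a column with a million distinct values would need a million vectors, which is why plain bitmaps suit low-cardinality columns.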

BitMap Indexes

Example: the attribute sex has values M and F. A table of 100 million people needs 2 lists of 100 million bits.
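
The space arithmetic behind this example, as a small Python helper (the name is hypothetical) that counts one bit per row for each distinct value and ignores compression: 2 x 100 million bits is 200 million bits, i.e., 25 MB. The same formula shows why high-cardinality columns are a problem.

```python
def bitmap_index_bytes(num_rows, num_distinct_values):
    """Uncompressed bitmap index size: one bit per row for every distinct value."""
    return num_rows * num_distinct_values // 8

print(bitmap_index_bytes(100_000_000, 2))      # 25000000 bytes, i.e. 25 MB
print(bitmap_index_bytes(100_000_000, 1_000))  # 12500000000 bytes, i.e. 12.5 GB
```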

Customer

Query: select * from customer where gender = 'F' and vote = 'Y'

[Figure: bitmap indexes on the gender (M/F) and vote columns of the Customer table]

Bit Map Index

Base Table
  Cust  Region  Rating
  C1    N       H
  C2    S       M
  C3    W       L
  C4    W       H
  C5    S       L
  C6    W       L
  C7    N       H

Region Index
  Row ID  N  S  E  W
  1       1  0  0  0
  2       0  1  0  0
  3       0  0  0  1
  4       0  0  0  1
  5       0  1  0  0
  6       0  0  0  1
  7       1  0  0  0

Rating Index
  Row ID  H  M  L
  1       1  0  0
  2       0  1  0
  3       0  0  1
  4       1  0  0
  5       0  0  1
  6       0  0  1
  7       1  0  0

Customers where Region = W And Rating = M
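
The same query evaluated in Python, treating each index column as a 7-bit integer with row 1 (C1) as the leftmost bit. The base-table rows are the ones above; the helper names are hypothetical. With this data no customer in region W has rating M, so the AND is all zeros; the Rating = L case is included only to show a non-empty result.

```python
# Base table: (Cust, Region, Rating).
base = [("C1", "N", "H"), ("C2", "S", "M"), ("C3", "W", "L"), ("C4", "W", "H"),
        ("C5", "S", "L"), ("C6", "W", "L"), ("C7", "N", "H")]
REGION, RATING = 1, 2  # column positions in `base`

def bitmap(col, value):
    """Bit vector for `value` in column `col`; row 1 is the leftmost bit."""
    bits = 0
    for row in base:
        bits = (bits << 1) | (row[col] == value)
    return bits

def customers(bits):
    """Names of the customers whose bit is set."""
    n = len(base)
    return [base[i][0] for i in range(n) if bits >> (n - 1 - i) & 1]

region_w = bitmap(REGION, "W")
rating_m = bitmap(RATING, "M")
print(format(region_w, "07b"), format(rating_m, "07b"))  # 0011010 0100000
print(customers(region_w & rating_m))                    # []
print(customers(region_w & bitmap(RATING, "L")))         # ['C3', 'C6']
```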

BitMap Indexes

Comparison, join and aggregation operations are reduced to bit arithmetic with dramatic improvement in processing time

Significant reduction in space and I/O (30:1)

Adapted for higher-cardinality domains as well

Compression (e.g., run-length encoding) exploited

Products that support bitmaps: Model 204, TargetIndex (Redbrick), IQ (Sybase), Oracle 7.3
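
A minimal sketch of run-length encoding a bit vector in Python with `itertools.groupby`. It shows only the basic idea of storing long runs of identical bits as (bit, length) pairs; practical bitmap compression schemes are more elaborate (often byte- or word-aligned so that bitwise operations can run on the compressed form), which this sketch does not attempt.

```python
from itertools import groupby

def rle_encode(bits):
    """Compress a bit string such as '000111' into (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original bit string."""
    return "".join(bit * length for bit, length in runs)

vec = "0" * 40 + "1" * 3 + "0" * 57  # a sparse 100-bit vector
runs = rle_encode(vec)
print(runs)                     # [('0', 40), ('1', 3), ('0', 57)]
print(rle_decode(runs) == vec)  # True
```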

Join Indexes

Pre-computed joins

A join index between a fact table and a dimension table correlates a dimension tuple with the fact tuples that have the same value on the common dimensional attribute

  e.g., a join index on the city dimension of a calls fact table

  correlates, for each city, the calls (in the calls table) from that city
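
A minimal Python sketch of a join index on the city attribute of a calls fact table. The fact rows are hypothetical, and the index is keyed by the city value itself rather than by a row id in a separate city dimension table, a simplification. Once built, finding the calls from a city is a dictionary lookup instead of a scan or join.

```python
from collections import defaultdict

# Hypothetical calls fact table: (call_id, city, minutes); list position = row id.
calls = [(101, "Toronto", 5), (102, "Vancouver", 12), (103, "Toronto", 3),
         (104, "Frankfurt", 8), (105, "Toronto", 1)]

def build_join_index(fact_rows, key_pos):
    """Map each dimension value to the row ids of the fact tuples that share it."""
    index = defaultdict(list)
    for row_id, row in enumerate(fact_rows):
        index[row[key_pos]].append(row_id)
    return dict(index)

city_index = build_join_index(calls, 1)
print(city_index["Toronto"])                            # [0, 2, 4]
print(sum(calls[r][2] for r in city_index["Toronto"]))  # 9 minutes in total
```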

Join Indexes

Join indexes can also span multiple dimension tables

  e.g., a join index on the city and time dimensions of the calls fact table
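
Extending the same idea to two dimensions, a minimal Python sketch whose key is a (city, month) pair; the fact rows and names are hypothetical. A single lookup then returns the calls that satisfy both dimension conditions.

```python
from collections import defaultdict

# Hypothetical calls fact table: (city, month, minutes); list position = row id.
calls = [("Toronto", "Jan", 5), ("Toronto", "Feb", 2),
         ("Vancouver", "Jan", 7), ("Toronto", "Jan", 4)]

# Join index spanning the city and time dimensions.
city_time_index = defaultdict(list)
for row_id, (city, month, _minutes) in enumerate(calls):
    city_time_index[(city, month)].append(row_id)

print(city_time_index[("Toronto", "Jan")])  # [0, 3]
```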

Star Join Processing

Use join indexes to join dimension and fact tables

[Figure: the Calls fact table is joined in turn with the Time, Location and Plan dimension tables via join indexes, giving C+T, then C+T+L, then C+T+L+P]

Bitmapped Join Processing

[Figure: bitmaps over the Calls fact table derived from the Time, Location and Plan dimensions are combined with AND]

OLAP Is FASMI

Nigel Pendse, Richard Creath - The OLAP Report

Fast

Analysis

Shared

Multidimensional

Information

Warehouse Products

Computer Associates -- CA-Ingres

Hewlett-Packard -- Allbase/SQL

Informix -- Informix, Informix XPS

Microsoft -- SQL Server

Oracle -- Oracle7, Oracle Parallel Server

Red Brick -- Red Brick Warehouse

SAS Institute -- SAS

Software AG -- ADABAS

Sybase -- SQL Server, IQ, MPP


Warehouse Server Products

Oracle 8

Informix

  Online Dynamic Server

  XPS -- Extended Parallel Server

  Universal Server for object-relational applications

Sybase

  Adaptive Server 11.5

  Sybase MPP

  Sybase IQ

Warehouse Server Products

Red Brick Warehouse

Tandem Nonstop

IBM

  DB2 MVS

  Universal Server

  DB2 400

Teradata
