
Data Warehousing

An Introduction
What does it take to run a business?

TRANSACT
- Purchase raw materials
- Arrangements with channel partners
- Sell product/service to customers
What does it take to run a business?

TRACK
- Inventory
- Customer details, feedback
- Product codes, manufacturing defects, R&D
What does it take to run a business?

ADMINISTRATE
- Payroll
- Personnel
- Budgeting
- Accounting
So what does it take?

The answer is a four-letter word:

OLTP: Online Transaction Processing systems

- Handle individual modules of the business
- Provide high availability
- Operated by customers/employees
- Hold current information
- Handle the day-to-day aspects of the business
But what about the future?

Strategic decisions…

- Which product lines will bring long-term profit?
- How can I increase market share?
- How can I improve customer retention?
- How do I cut production costs?
- What marketing strategies work best for my products?
Information vs. Knowledge

The information you need is everywhere. Turning it into knowledge takes:

- Coherence
- Structure
- Organization
- Focus
Data warehousing

Data warehouse → Business information → Business intelligence

“Store of data collected from other systems that becomes the foundation for decision support and data analysis”
OLTP systems and Data warehouses

OLTP                                Data warehouse
- Transactions (many end users)     - Analysis (few users)
- Application-oriented              - Subject-oriented
- Real-time data                    - Historic information
- Volatile                          - Non-volatile
- Scattered                         - Integrated
- Processes few records             - Processes huge amounts of data
The data warehouse: how it works

ETL → Data warehouse

ETL: Extraction, Transformation, Load

- Picks up data from various sources
- Integrates the data and organizes it according to business rules
- End users query the warehouse for various reports
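A minimal sketch of the three ETL steps above, assuming two hypothetical sources (a CSV export and a JSON feed with different field names) and using an in-memory SQLite database from Python's standard library as a stand-in for the warehouse. The source data, table name, and the normalization rule are all illustrative assumptions, not part of any specific ETL tool.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: a CSV export from a sales OLTP system.
SALES_CSV = """order_id,product,city,units
1,Pepsi,Bangalore,10
2,pepsi ,Bombay,5
"""

# Hypothetical source 2: JSON records from another system with different field names.
SALES_JSON = '[{"id": 3, "item": "PEPSI", "location": "Bangalore", "qty": 7}]'


def extract():
    """Extract: read both sources into one list of dicts with common keys."""
    rows = list(csv.DictReader(io.StringIO(SALES_CSV)))
    for rec in json.loads(SALES_JSON):
        rows.append({"order_id": rec["id"], "product": rec["item"],
                     "city": rec["location"], "units": rec["qty"]})
    return rows


def transform(rows):
    """Transform: apply example business rules (trim, normalize case, cast types)."""
    return [(int(r["order_id"]), r["product"].strip().title(),
             r["city"].strip(), int(r["units"])) for r in rows]


def load(rows, conn):
    """Load: populate the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(order_id INTEGER PRIMARY KEY, product TEXT, city TEXT, units INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract()), warehouse)

# End-user query: all three spellings of the product were integrated as 'Pepsi'.
total = warehouse.execute(
    "SELECT SUM(units) FROM sales WHERE product = 'Pepsi'").fetchone()[0]
print(total)  # 22
```

Without the transform step, the three spellings ("Pepsi", "pepsi ", "PEPSI") would land as separate products and the query would undercount.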
More organization: Data marts

Data mart: A scaled-down version of a data warehouse, to be used by a target group

ETL → Data warehouse → data marts (Fin, Mktg, Sales)
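One way to picture a data mart is as a subset of the warehouse shaped for one target group. The sketch below assumes a hypothetical integrated table `warehouse_sales` in SQLite and derives a Sales mart and a Finance mart from it with `CREATE TABLE ... AS SELECT`; real data marts are often separate databases, which this sketch does not model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical integrated warehouse table.
conn.execute("CREATE TABLE warehouse_sales "
             "(month TEXT, region TEXT, product TEXT, units INTEGER, revenue REAL)")
conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?, ?, ?)", [
    ("2024-06", "South", "Pepsi", 10, 200.0),
    ("2024-06", "West", "Pepsi", 5, 100.0),
    ("2024-07", "South", "Pepsi", 8, 160.0),
])

# Sales data mart: only the columns and granularity the sales team needs.
conn.execute("""CREATE TABLE mart_sales AS
    SELECT region, product, SUM(units) AS units
    FROM warehouse_sales GROUP BY region, product""")

# Finance data mart: revenue summarized by month.
conn.execute("""CREATE TABLE mart_fin AS
    SELECT month, SUM(revenue) AS revenue
    FROM warehouse_sales GROUP BY month""")

print(conn.execute("SELECT * FROM mart_fin ORDER BY month").fetchall())
```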
Architecture approaches

Top down: ETL → Data warehouse → data marts (Sales, Mktg, Fin)

Bottom up: ETL → data marts (Sales, Mktg, Fin) → Data warehouse
Data models

Data model: A conceptual model of the information needed to support a business function or process. In a relational database, data models are built from tables.

- Entity Relationship (ER) model
- Dimensional model
Entity-relationship models

Query: How much Pepsi was sold in Bangalore in June?

Entities involved: Product (Pepsi), Market (India, Blore), Time (date), Units sold

- ER models build primary key/foreign key relationships between the entities in the DB
- Answering the query therefore requires multiple joins between DB tables
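To make the "multiple joins" point concrete, here is a small normalized (ER-style) schema in SQLite, with hypothetical table names (`product`, `city`, `orders`, `order_line`) and made-up rows. Answering the Pepsi question means chaining three joins across four tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE city    (city_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders  (order_id INTEGER PRIMARY KEY,
                      city_id INTEGER REFERENCES city(city_id),
                      order_date TEXT);
CREATE TABLE order_line (order_id INTEGER REFERENCES orders(order_id),
                         product_id INTEGER REFERENCES product(product_id),
                         units INTEGER);

INSERT INTO product VALUES (1, 'Pepsi'), (2, 'Lays');
INSERT INTO city VALUES (1, 'Bangalore', 'India'), (2, 'Bombay', 'India');
INSERT INTO orders VALUES (100, 1, '2024-06-03'), (101, 1, '2024-07-01'),
                          (102, 2, '2024-06-10');
INSERT INTO order_line VALUES (100, 1, 12), (100, 2, 4), (101, 1, 9), (102, 1, 6);
""")

# "How much Pepsi was sold in Bangalore in June?" walks PK/FK links across tables.
units = conn.execute("""
    SELECT SUM(ol.units)
    FROM order_line ol
    JOIN orders  o ON o.order_id   = ol.order_id
    JOIN product p ON p.product_id = ol.product_id
    JOIN city    c ON c.city_id    = o.city_id
    WHERE p.name = 'Pepsi' AND c.name = 'Bangalore'
      AND strftime('%m', o.order_date) = '06'
""").fetchone()[0]
print(units)  # 12
```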
Entity-relationship models

More suited to OLTP systems

- Efficient in accessing a small number of records
- Good for very specific queries, such as a number or a customer name
- Minimal data redundancy
- Can store volatile information
Dimensional models

Dimensional models optimize the placement of data to enable high-performance access.

Dimension tables (descriptive data):
- Customer: Customer ID (PK), Name, Address, Phone, Email
- Date: Date ID (PK), Time, Day, Month, Year

Fact table (numeric/additive data):
- Customer ID (FK), Date ID (FK), Units sold, Price
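The tables above can be sketched as SQLite DDL. Column names follow the slide; the snake_case spelling, the `time_of_day` name, the integer date key format, and the sample rows are assumptions for illustration. Because the fact table's measures are additive, they can be summed along any attribute of a dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes.
CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY,
                           name TEXT, address TEXT, phone TEXT, email TEXT);
CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY,
                       time_of_day TEXT, day INTEGER, month INTEGER, year INTEGER);
-- Fact table: foreign keys to the dimensions plus numeric measures.
CREATE TABLE sales_fact (customer_id INTEGER REFERENCES customer_dim(customer_id),
                         date_id INTEGER REFERENCES date_dim(date_id),
                         units_sold INTEGER, price REAL);

INSERT INTO customer_dim VALUES (1, 'Asha', 'Bangalore', '555-0100', 'asha@example.com');
INSERT INTO date_dim VALUES (20240603, '10:00', 3, 6, 2024), (20240701, '11:30', 1, 7, 2024);
INSERT INTO sales_fact VALUES (1, 20240603, 12, 20.0), (1, 20240701, 9, 20.0);
""")

# Sum the additive measure along the Date dimension's month attribute.
by_month = conn.execute("""
    SELECT d.year, d.month, SUM(f.units_sold)
    FROM sales_fact f JOIN date_dim d ON d.date_id = f.date_id
    GROUP BY d.year, d.month ORDER BY d.year, d.month
""").fetchall()
print(by_month)  # [(2024, 6, 12), (2024, 7, 9)]
```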
Dimensional models

More suited to data warehouses

- Can process a large number of records fast
- Good for data analysis and trend-spotting
- Has data redundancy, but that's not the focus
- Holds historic/non-volatile information
Dimensional models: Star schema

Single fact table, with one dimension table for each dimension
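A star-schema version of the earlier Pepsi question, as a sketch with hypothetical `product_dim`, `market_dim`, and `time_dim` tables around one `sales_fact` table. Every dimension sits a single join away from the fact table, so the query shape stays the same whichever dimensions you filter on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table per dimension.
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE market_dim  (market_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
-- A single fact table holding one foreign key per dimension plus the measure.
CREATE TABLE sales_fact  (product_key INTEGER, market_key INTEGER,
                          time_key INTEGER, units INTEGER);

INSERT INTO product_dim VALUES (1, 'Pepsi');
INSERT INTO market_dim  VALUES (1, 'Bangalore', 'India'), (2, 'Bombay', 'India');
INSERT INTO time_dim    VALUES (1, 3, 6, 2024), (2, 1, 7, 2024);
INSERT INTO sales_fact  VALUES (1, 1, 1, 12), (1, 1, 2, 9), (1, 2, 1, 6);
""")

# Each join goes directly from the fact table to one dimension.
units = conn.execute("""
    SELECT SUM(f.units)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN market_dim  m ON m.market_key  = f.market_key
    JOIN time_dim    t ON t.time_key    = f.time_key
    WHERE p.name = 'Pepsi' AND m.city = 'Bangalore' AND t.month = 6
""").fetchone()[0]
print(units)  # 12
```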
Dimensional models: Snowflake schema

An expanded version of the star schema in which dimension tables are normalized into related tables

Example: a fact table surrounded by dimensions D1 to D4. Dimension D2 (Customer ID, Area, City, Country) is normalized into lookup tables:
- Area (Area ID): K’mangala 01, Jayanagar 02
- City (City ID): Bangalore 01, Bombay 02
- Country (Country ID): India 01, USA 02
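A sketch of the normalized customer dimension above, assuming hypothetical table names (`customer`, `area`, `city`, `country`) and sample rows that mirror the slide's lookup values. Rolling sales up to country now has to walk the whole chain of joins, which is the trade-off a snowflake makes for reduced redundancy in the dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The customer dimension is normalized into a chain: customer -> area -> city -> country.
CREATE TABLE country  (country_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE city     (city_id INTEGER PRIMARY KEY, name TEXT,
                       country_id INTEGER REFERENCES country(country_id));
CREATE TABLE area     (area_id INTEGER PRIMARY KEY, name TEXT,
                       city_id INTEGER REFERENCES city(city_id));
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                       area_id INTEGER REFERENCES area(area_id));
CREATE TABLE sales_fact (customer_id INTEGER REFERENCES customer(customer_id),
                         units INTEGER);

INSERT INTO country  VALUES (1, 'India'), (2, 'USA');
INSERT INTO city     VALUES (1, 'Bangalore', 1), (2, 'Bombay', 1);
INSERT INTO area     VALUES (1, 'Koramangala', 1), (2, 'Jayanagar', 1);
INSERT INTO customer VALUES (10, 'Asha', 1), (11, 'Ravi', 2);
INSERT INTO sales_fact VALUES (10, 12), (11, 5), (10, 3);
""")

# Fact -> customer -> area -> city -> country: four joins to reach one attribute.
by_country = conn.execute("""
    SELECT co.name, SUM(f.units)
    FROM sales_fact f
    JOIN customer cu ON cu.customer_id = f.customer_id
    JOIN area a      ON a.area_id      = cu.area_id
    JOIN city ci     ON ci.city_id     = a.city_id
    JOIN country co  ON co.country_id  = ci.country_id
    GROUP BY co.name
""").fetchall()
print(by_country)  # [('India', 20)]
```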
Extraction, Transformation, Load

The most important process in data warehousing

- Extract: data from multiple sources, in multiple formats
- Transform: mappings between source and target tables; apply business rules to meet end-user requirements
- Load: populate the target tables in the data warehouse
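The "mappings between source and target tables" step can be sketched as a lookup from source columns to target columns, each paired with a business rule. The source column names (`CUST_NM`, `ORD_DT`, `AMT_PAISE`), the date format, and the rules are hypothetical; real ETL tools express such mappings in their own formats.

```python
from datetime import datetime

# Hypothetical source-to-target mapping for one source system.
# Each entry: source column -> (target column, business rule to apply).
MAPPING = {
    "CUST_NM":   ("customer_name", lambda v: v.strip().title()),
    "ORD_DT":    ("order_date",
                  lambda v: datetime.strptime(v, "%d/%m/%Y").date().isoformat()),
    "AMT_PAISE": ("amount_inr", lambda v: int(v) / 100),
}


def map_record(source_row: dict) -> dict:
    """Apply every mapping rule to one extracted source row."""
    return {target: rule(source_row[src]) for src, (target, rule) in MAPPING.items()}


row = map_record({"CUST_NM": "  asha rao ", "ORD_DT": "03/06/2024", "AMT_PAISE": "125050"})
print(row)  # {'customer_name': 'Asha Rao', 'order_date': '2024-06-03', 'amount_inr': 1250.5}
```

Keeping the mapping as data rather than scattered code makes a change in business rules a one-line edit.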
ETL: The challenges

- Identifying business rules
- Data cleansing
- Completing the load within the available time window
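Data cleansing can be sketched as a function that standardizes values, rejects rows that cannot be loaded, and drops duplicates. The alias table, the "missing key means reject" rule, and the sample rows are assumptions chosen for illustration.

```python
# Hypothetical lookup that standardizes inconsistent city spellings across sources.
CITY_ALIASES = {"blore": "Bangalore", "bengaluru": "Bangalore"}


def cleanse(rows):
    """Return (clean_rows, rejected_rows) after basic cleansing."""
    clean, rejected, seen = [], [], set()
    for row in rows:
        if not row.get("customer_id"):
            rejected.append(row)  # missing key: cannot be loaded
            continue
        city = row.get("city", "").strip()
        city = CITY_ALIASES.get(city.lower(), city.title())
        key = (row["customer_id"], city)
        if key in seen:
            continue  # duplicate of an earlier record once standardized
        seen.add(key)
        clean.append({"customer_id": row["customer_id"], "city": city})
    return clean, rejected


raw = [
    {"customer_id": "C1", "city": "Blore"},
    {"customer_id": "C1", "city": "bangalore "},
    {"customer_id": "", "city": "Bombay"},
    {"customer_id": "C2", "city": "bombay"},
]
clean, rejected = cleanse(raw)
print(clean)     # [{'customer_id': 'C1', 'city': 'Bangalore'}, {'customer_id': 'C2', 'city': 'Bombay'}]
print(rejected)  # [{'customer_id': '', 'city': 'Bombay'}]
```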
The solution: Data staging

Interim process (stages) to speed up ETL

ETL (data extraction) → Staging DB → ETL (apply business rules, populate) → Data warehouse
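A two-stage sketch of the flow above using SQLite: an attached in-memory database plays the staging DB, and the main database plays the warehouse. Raw rows are copied into staging untouched (fast, so the source system is released quickly); business rules run afterwards in a set-based `INSERT ... SELECT`. Table names and the rules themselves are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                      # the warehouse
conn.execute("ATTACH DATABASE ':memory:' AS staging")   # a separate staging DB

# Stage 1: fast extraction. Copy OLTP rows as-is; no business rules yet.
conn.execute("CREATE TABLE staging.orders_raw (order_id INTEGER, city TEXT, units TEXT)")
conn.executemany("INSERT INTO staging.orders_raw VALUES (?, ?, ?)",
                 [(1, " bangalore", "10"), (2, "Bombay ", "5"), (3, "Bangalore", "x")])

# Stage 2: apply business rules while populating the warehouse from staging.
conn.execute("CREATE TABLE main.fact_orders (order_id INTEGER, city TEXT, units INTEGER)")
conn.execute("""
    INSERT INTO main.fact_orders
    SELECT order_id,
           -- rule: trim and capitalize city names
           upper(substr(trim(city), 1, 1)) || lower(substr(trim(city), 2)),
           CAST(units AS INTEGER)
    FROM staging.orders_raw
    -- rule: load only rows whose quantity is all digits
    WHERE units <> '' AND units NOT GLOB '*[^0-9]*'
""")
conn.commit()

rows = conn.execute("SELECT * FROM main.fact_orders ORDER BY order_id").fetchall()
print(rows)  # [(1, 'Bangalore', 10), (2, 'Bombay', 5)]
```

Because the raw rows stay in staging, the rejected row can still be queried there, and the warehouse load can be rerun if a business rule changes.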
Data staging: The benefits

- Completes data transfer within the available time window
- Improves performance of the data load
- Identifies the need for changes in business rules
- Creates a backup of OLTP data
- Enables queries on source data
OLAP

On-Line Analytical Processing

A category of applications and technologies for collecting, managing, processing and presenting multidimensional data for analysis and management purposes

- For analytical purposes
- Presents data in hierarchies
- Supports ad-hoc queries
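"Presents data in hierarchies" usually means you can roll a measure up or drill it down along a dimension's levels. Below is a minimal pure-Python sketch of a roll-up along a hypothetical time hierarchy (year, quarter, month); the fact rows are made up, and real OLAP engines do this with precomputed aggregates or SQL extensions rather than a loop like this.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, month, units).
FACTS = [
    (2024, "Q2", "Apr", 30), (2024, "Q2", "May", 25), (2024, "Q2", "Jun", 40),
    (2024, "Q3", "Jul", 35),
]

# A time hierarchy, from the coarsest level to the finest.
LEVELS = ("year", "quarter", "month")


def roll_up(facts, depth):
    """Sum units at the given hierarchy depth (1 = year, 2 = quarter, 3 = month)."""
    totals = defaultdict(int)
    for row in facts:
        totals[row[:depth]] += row[-1]  # key = the first `depth` hierarchy levels
    return dict(totals)


for depth, level in enumerate(LEVELS, start=1):
    print(level, roll_up(FACTS, depth))
```

Moving from depth 1 to depth 3 is a drill-down; moving the other way is a roll-up.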
OLAP Types

ROLAP: Relational OLAP. Data for reports is fetched from an RDBMS.

MOLAP: Multidimensional OLAP. Data is presented in a multidimensional cube.

DOLAP: Desktop OLAP. Used for desktop publishing; snapshot data for presentation.

HOLAP: Hybrid OLAP. Multidimensional analysis of data from an MDB or an RDBMS.
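To make the MOLAP "cube" idea concrete, here is a toy in-memory cube as a Python dict keyed by (product, city, month), with slice and dice operations. It only illustrates the multidimensional view; it is not the storage format or API of any MOLAP product, and the cells are made up.

```python
# Hypothetical 3-D cube: (product, city, month) -> units sold.
cube = {
    ("Pepsi", "Bangalore", "Jun"): 12, ("Pepsi", "Bangalore", "Jul"): 9,
    ("Pepsi", "Bombay", "Jun"): 6,     ("Lays", "Bangalore", "Jun"): 4,
}
DIMS = ("product", "city", "month")


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value and drop it from the keys."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates fall inside the allowed value sets."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}


june = slice_cube(cube, "month", "Jun")
pepsi_blore = dice_cube(cube, product={"Pepsi"}, city={"Bangalore"})
print(june)
print(pepsi_blore)
```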
