
OVERVIEW OF DATA WAREHOUSING

SUSHIL KULKARNI

I  DATABASE VS DATA WAREHOUSE
 PROBEMS OF DATABASES
N  WHAT IS DATA WAREHOUSE?
T  DATA ACCESS SYSTEMS AND DIFFERENCES
 DATA WAREHOUSE SCOPE
E
 TYPES OF DATA WAREHOUSES
N  BIG PICTURE
 END USERS
S
 DATA MARTS
I  DATA IN DATA WAREHOUSE
 DATABASE STRUCTURE
O
 OLTP AND OLAP
N  ARCHITECTURE
 ETL PROCESS
S SUSHIL KULKARNI
DATABASE VS DATA WAREHOUSE

• To accelerate decision making, users need:
1. The right information,
2. At the right time,
3. Easily accessible
• Problems with databases:
1. Fragmentation
2. Operational vs. informational processing

PROBLEMS!

• IT business requires:
1. Integrated information
2. A company-wide view of high-quality information
3. A fixed network with changing users
• The information systems department must separate informational processing systems from operational systems to improve performance

PROBLEMS!

• No single system of data.
• Viewing the databases as a whole is difficult
• The organization wants to analyze its activities in a balanced way
• Customer relationship management

DEFINITION OF DATA WAREHOUSE

SO WHAT IS A DATA WAREHOUSE?

• Subject-oriented: organized around subjects such as customers, patients, students, products, time.
• Integrated: gathered centrally from
1. Several internal systems of record
2. Sources external to the organization

WHAT IS A DATA WAREHOUSE?

• Time-variant: used to study trends and changes.
• Non-updatable: cannot be updated by end users.

DATA SYSTEMS

DATA SYSTEMS AVAILABLE

• OPERATIONAL SYSTEMS
• INFORMATION SYSTEMS

OPERATIONAL SYSTEMS

Used to run a business in real time based on current data and process large volumes of relatively simple read/write transactions, while providing fast response.

• Examples:
1. Sales order processing
2. Reservation systems
3. Patient registration

INFORMATION SYSTEMS

• Designed to support decision-making based on
1. Historical data
2. Prediction data
• Designed for complex queries or data-mining applications.
• Examples:
1. Sales trend analysis
2. Customer segmentation
3. Human resources planning

DIFFERENCE

Characteristics | Operational Systems                                           | Informational Systems
----------------+---------------------------------------------------------------+---------------------------------------------------------------
Purpose         | Real-time data entry                                          | Read and analyze historical data
Primary users   | Clerks, salespersons, administrators                          | Managers, business analysts, customers
Scope of usage  | Narrow, planned, and simple updates and queries               | Broad, ad hoc, complex queries and analysis
Design goal     | Performance: throughput and availability                      | Ease of flexible access and use
Volume          | Many, constant updates and queries on one or a few table rows | Periodic batch updates and queries requiring many or all rows

DATA WAREHOUSE SCOPE

• Broad: required for companies; very costly; may be divided according to departments.
• Narrow: required for personal information

TYPES OF DATA WAREHOUSE

• Point-to-point: end users are allowed to access operational databases directly using any tools

TYPES OF DATA WAREHOUSE

• Central Data Warehouses
1. EIS: Executive Information System
2. DSS: Decision Support System
3. Reporting
• Distributed Data Warehouse: certain components of the DW are distributed across a number of different physical databases

BIG PICTURE

END USERS

• Executives and managers
• "Power" users (business and financial analysts, engineers, etc.)
• Support users (clerical, administrative, etc.)

DATA MARTS

• Create many data marts (DMs)
• Limited scope
• Independent ETL process, or derived from the DW
• Examples:
1. Financial DM
2. Marketing DM
3. Supply chain DM

D.M. PICTURE

DATA IN DATA WAREHOUSE

• One version of the truth across the enterprise, with meaningful records
• For IT staff: clean, consistent, and documented formatted data.
• For engineers or analysts: convenient data in a common format, exportable to other common formats

DATA IN DATA WAREHOUSE

• Production data: data from different operational systems on heterogeneous platforms
• Internal data: private data of the organization, such as spreadsheets, documents, customer profiles

DATA IN DATA WAREHOUSE

• External data: data from external sources, such as statistics relating to the organization's industry produced by external agencies
• Example: the DW of a car rental company contains data on the current production schedules of the leading automobile manufacturers

DATA IN DATA WAREHOUSE

• Archived data: data from the current business and old data stored in archive files

DATA IN DATA WAREHOUSE

• Methods of archiving data:
1. Recent data is archived to a separate archival database that may be online
2. Old data is archived to flat files on disk storage
3. The oldest data is archived to tape cartridges or microfilm, or kept offline

DATABASE STRUCTURE

A DW is made up of three separate databases:
1. Interim data store
2. Metadata repository
3. Production DW

OLTP AND OLAP

OLTP

• Online transaction processing
• Standard normalized structure
• Designed for transactions: insert, update, delete

OLAP

• Online analytical processing, star schema [See Table]
• Read only
• Historical data
• Aggregated data

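As a rough illustration of the star schema and the read-only, aggregated style of OLAP queries, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (sales_fact, dim_product, dim_date) and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table and two dimension tables."""
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        CREATE TABLE sales_fact (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );
        """
    )
    # Hypothetical sample rows, for illustration only.
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", "Hardware"), (2, "Editor", "Software")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(10, 2023, 4), (11, 2024, 1)])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                     [(1, 10, 1000.0), (1, 11, 1500.0), (2, 11, 200.0)])


def sales_by_category_and_year(conn):
    """Read-only analytical query: join facts to dimensions, then aggregate."""
    return conn.execute(
        """
        SELECT p.category, d.year, SUM(f.amount)
        FROM sales_fact AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
result = sales_by_category_and_year(conn)
```

The fact table holds only keys and measures, while descriptive attributes live in the dimension tables; an OLTP design would instead be normalized around individual insert, update and delete transactions.
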
ARCHITECTURE

ARCHITECTURE AND END-TO-END PROCESS

BACK END TOOLS AND UTILITIES

• Tools are used to extract and load data
• Data is extracted from foreign sources through gateways and interfaces
• Examples: EDA/SQL, ODBC, Oracle Open Connect, Sybase Enterprise Connect, Informix Enterprise Gateway

PROCESS OF BRINGING DATA TO DATA WAREHOUSE

ETL PROCESS

CLEANING

• Large volumes of data from multiple sources are involved
• High probability of errors and anomalies in the data
• Tools that help to detect data anomalies and correct them can have a high payoff

CLEANING

• Examples where data cleaning becomes necessary are:
1. Inconsistent field lengths
2. Inconsistent descriptions
3. Inconsistent value assignments
4. Missing entries and violation of integrity constraints
• Different classes of data cleaning tools are used when extracting and loading data:
1. Data migration tools
2. Data scrubbing tools
3. Data auditing tools

DATA MIGRATION

• Data migration tools allow simple transformation rules to be specified
• Example: “replace the string gender by sex”.
• Warehouse Manager from Prism is an example of a popular tool of this kind.

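A simple transformation rule of this kind can be sketched in a few lines of Python. This sketch assumes the rule applies to field names (it could equally apply to values); the function name apply_rename_rules and the sample record are hypothetical, and it does not reflect how any particular product works.

```python
def apply_rename_rules(record, rules):
    """Return a copy of the record with field names replaced per the rules.

    rules maps an old field name to its new name; fields without a rule
    are copied unchanged.
    """
    return {rules.get(field, field): value for field, value in record.items()}


# The rule from the slide: replace the string "gender" by "sex".
RULES = {"gender": "sex"}

source_row = {"id": 7, "gender": "F"}
migrated_row = apply_rename_rules(source_row, RULES)
```
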
DATA SCRUBBING

• Data scrubbing tools use domain-specific knowledge (for example, of postal addresses) to do the scrubbing of data.
• They use parsing and fuzzy matching techniques to accomplish cleaning from multiple sources.
• Tools: Integrity and Trillium

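To make parsing and fuzzy matching concrete, here is a minimal sketch using Python's standard difflib module. The abbreviation table, the 0.9 threshold and the function names are assumptions chosen for illustration; commercial scrubbing tools use far richer domain knowledge.

```python
import difflib

# Hypothetical domain knowledge: a few postal abbreviations.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize_address(raw):
    """Parsing step: lowercase, drop punctuation, expand known abbreviations."""
    tokens = raw.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def same_address(a, b, threshold=0.9):
    """Fuzzy-matching step: compare normalized strings by similarity ratio."""
    ratio = difflib.SequenceMatcher(
        None, normalize_address(a), normalize_address(b)
    ).ratio()
    return ratio >= threshold
```

With this sketch, "12 Main St." and "12 main street" normalize to the same string and are treated as one address, while a misspelling such as "12 Main Stret" still matches because its similarity ratio stays above the threshold.
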
DATA AUDITING

• Data auditing tools make it possible to discover rules and relationships by scanning data.
• Example: a tool may discover a suspicious pattern (based on statistical analysis) that a certain car dealer has never received any complaints.

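One way such a statistical check could work is sketched below. It assumes each sale independently leads to a complaint at the overall complaint rate, and flags dealers whose zero complaints would be very unlikely under that assumption. The function name, the alpha cut-off and the sample figures are all hypothetical.

```python
def suspicious_zero_complaint_dealers(dealers, alpha=0.01):
    """Flag dealers with zero complaints when that outcome is improbable.

    dealers maps a dealer name to a (sales, complaints) pair.
    """
    total_sales = sum(sales for sales, _ in dealers.values())
    total_complaints = sum(complaints for _, complaints in dealers.values())
    rate = total_complaints / total_sales  # overall complaints per sale

    flagged = []
    for name, (sales, complaints) in dealers.items():
        # Probability of seeing no complaints at all in this many sales.
        p_zero = (1 - rate) ** sales
        if complaints == 0 and p_zero < alpha:
            flagged.append(name)
    return flagged


# Hypothetical sample data.
dealers = {"A": (200, 10), "B": (150, 9), "C": (300, 0), "D": (5, 0)}
flagged = suspicious_zero_complaint_dealers(dealers)
```

Dealer D also has no complaints, but with only five sales that is unremarkable, so only dealer C is flagged for a closer look.
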
LOADING

• Additional preprocessing is required:
1. Checking integrity constraints
2. Sorting, summarization, aggregation
3. Other computation to build the derived tables stored in the warehouse
• Batch load utilities are used for this purpose. In addition to populating the warehouse, a load utility must allow the system administrator to monitor status, to cancel, suspend and resume a load, and to restart after failure with no loss of data integrity.

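The preprocessing steps listed above can be sketched as a small Python function. This is only an illustration under assumed names (load_batch, product_id, day, amount); it leaves out the monitoring, suspend/resume and restart features a real batch load utility must provide.

```python
from collections import defaultdict


def load_batch(rows, valid_product_ids):
    """Check integrity, sort, and build a derived summary for one batch."""
    accepted, rejected = [], []
    for row in rows:
        # Integrity checks: the product key must exist and amount must be set.
        ok = row["product_id"] in valid_product_ids and row["amount"] is not None
        (accepted if ok else rejected).append(row)

    # Sort before loading, as bulk loaders commonly expect ordered input.
    accepted.sort(key=lambda r: (r["product_id"], r["day"]))

    # Aggregation: a derived table of total amount per product.
    totals = defaultdict(float)
    for row in accepted:
        totals[row["product_id"]] += row["amount"]
    return accepted, rejected, dict(totals)


# Hypothetical batch.
batch = [
    {"product_id": 2, "day": 1, "amount": 5.0},
    {"product_id": 1, "day": 2, "amount": 3.0},
    {"product_id": 9, "day": 1, "amount": 1.0},   # unknown product key
    {"product_id": 1, "day": 1, "amount": None},  # missing entry
    {"product_id": 1, "day": 1, "amount": 4.0},
]
accepted, rejected, totals = load_batch(batch, valid_product_ids={1, 2})
```
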
REFRESH

• Refreshing a warehouse consists of propagating updates on source data to correspondingly update the base data and derived data stored in the warehouse.
• Two sets of issues: when to refresh, and how to refresh.

SUMMARIZATION

• Summaries require a lot of space to store, as well as computer time and resources to compute. Some of the summaries may contain figures that explain the summary.
• The advantage is that the data warehouse does not have to calculate the summaries when they are requested.

METADATA

• Administrative metadata
• Business metadata includes business terms and definitions
• Operational metadata includes information that is collected during the operation of the warehouse

The ETL Process

• Capture
• Scrub or data cleansing
• Transform
• Load and Index

ETL = Extract, transform, and load


Steps in data reconciliation

Capture = extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse

• Static extract = capturing a snapshot of the source data at a point in time
• Incremental extract = capturing changes that have occurred since the last static extract

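The two kinds of extract can be contrasted with a short Python sketch. It assumes every source row carries an updated_at value (here a plain integer timestamp); that column name and the function names are hypothetical.

```python
def static_extract(source_rows):
    """Snapshot of the chosen source data as it stands right now."""
    return [dict(row) for row in source_rows]


def incremental_extract(source_rows, last_extract_time):
    """Only the rows that changed after the previous extract."""
    return [dict(row) for row in source_rows
            if row["updated_at"] > last_extract_time]


# Hypothetical source table.
source = [
    {"id": 1, "updated_at": 100},
    {"id": 2, "updated_at": 250},
]
snapshot = static_extract(source)
changes = incremental_extract(source, last_extract_time=200)
```
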
Steps in data reconciliation

Scrub = cleanse: uses pattern recognition and AI techniques to upgrade data quality

• Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
• Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data

Steps in data reconciliation

Transform = convert data from the format of the operational system to the format of the data warehouse

• Record-level:
  Selection: data partitioning
  Joining: data combining
  Aggregation: data summarization
• Field-level:
  Single-field: from one field to one field
  Multi-field: from many fields to one, or one field to many

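The three record-level operations can be sketched on plain Python lists of dictionaries. The sample orders, the customer lookup and the function names are hypothetical and serve only to show what selection, joining and aggregation each do.

```python
from collections import defaultdict

# Hypothetical source data.
orders = [
    {"order_id": 1, "customer_id": "c1", "region": "east", "amount": 50.0},
    {"order_id": 2, "customer_id": "c2", "region": "west", "amount": 20.0},
    {"order_id": 3, "customer_id": "c1", "region": "east", "amount": 30.0},
]
customers = {"c1": "Asha", "c2": "Ravi"}


def select(rows, predicate):
    """Selection: partition the data by keeping only matching rows."""
    return [row for row in rows if predicate(row)]


def join(rows, customer_names):
    """Joining: combine each order with data from another source."""
    return [{**row, "customer_name": customer_names[row["customer_id"]]}
            for row in rows]


def aggregate(rows, key, value):
    """Aggregation: summarize a value per distinct key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


east_orders = select(orders, lambda row: row["region"] == "east")
joined = join(orders, customers)
totals_by_customer = aggregate(orders, "customer_id", "amount")
```
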
Steps in data reconciliation

Load/Index = place transformed data into the warehouse and create indexes

• Refresh mode: bulk rewriting of target data at periodic intervals
• Update mode: only changes in source data are written to the data warehouse

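The difference between the two load modes can be sketched with a dictionary standing in for the warehouse table. This is an assumption-laden illustration: the function names are hypothetical, a change of None is used here to mean a deleted source row, and index creation is left out.

```python
def refresh_load(target, source_snapshot):
    """Refresh mode: throw away the target and rewrite it in bulk."""
    target.clear()
    target.update({key: dict(row) for key, row in source_snapshot.items()})


def update_load(target, changes):
    """Update mode: write only the rows that changed in the source."""
    for key, row in changes.items():
        if row is None:          # assumed convention for a deleted row
            target.pop(key, None)
        else:
            target[key] = dict(row)


# Hypothetical warehouse table keyed by row id.
warehouse = {1: {"qty": 5}}
refresh_load(warehouse, {1: {"qty": 6}, 2: {"qty": 1}})
after_refresh = {key: dict(row) for key, row in warehouse.items()}
update_load(warehouse, {2: {"qty": 3}, 1: None})
```
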
Single-field transformation

• In general, some transformation function translates data from the old form to the new form
• Algorithmic transformation uses a formula or logical expression
• Table lookup is another approach


Multi-field transformation

• M:1: from many source fields to one target field
• 1:M: from one source field to many target fields

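A minimal Python sketch of both directions follows. The field names (first and last name, a product code of the form "HW-1234") and the function names are hypothetical.

```python
def combine_name(first_name, last_name):
    """M:1, many source fields (first, last) to one target field."""
    return f"{first_name} {last_name}".strip()


def split_product_code(code):
    """1:M, one source field to many target fields.

    Assumes codes look like "<category>-<item number>".
    """
    category, _, item_number = code.partition("-")
    return {"category": category, "item_number": item_number}
```
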
THANKS!
