
DATA WAREHOUSING

Shumais Ul Haq
Data Warehouse

■ Evolved as a part of Business Intelligence


■ Integrate and transform enterprise data into information
■ Historical data from operational systems
■ External data
■ Resolve conflicts
■ Transform data
■ Information delivery
Data Warehouse

“A Data Warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management’s decisions.” – Bill Inmon
Data Warehouse

■ The data in the data warehouse is


– Separate
– Available
– *Integrated
– *Time Stamped
– *Subject Oriented
– *Nonvolatile
– Accessible
Data Warehouse

■ Subject-Oriented Data

■ Banking: Data sets for consumer loans, savings accounts
■ Insurance Company: Data sets for individual, life, car, etc.
■ Data sets are organized for a particular operational system

■ Data Warehouse: Data stored by business subjects


■ Data sets, related to a business subject, combined
■ What are Business Subjects?
Data Warehouse

■ Integrated Data

■ Informed Decisions
■ Various Applications and Various Data Sources
■ External Sources
Data Warehouse

■ Time-Variant Data

■ Operational System stores current values


■ Data Warehouse stores historical data
– Buying Pattern
– Area Sales
– Sales Promotions

■ Every event is time stamped


■ Analyses Past
■ Shows Current
■ Forecasts Future
Data Warehouse

■ Nonvolatile Data

■ Data moved into warehouse at intervals


■ Not updated each time an order is processed
■ Not meant for looking up current values
■ Operational systems can handle modifications
■ Data warehouse can handle only loading
■ Data is for query and analysis and not for operations
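
The contrast between an operational system that overwrites current values and a warehouse that only receives time-stamped loads can be sketched as follows. This is a minimal illustration, not a real warehouse design: the account ID, the row layout, and the `load_snapshot` helper are all hypothetical.

```python
from datetime import date

# Operational system: keeps only the current value, overwritten in place.
operational_balance = {"ACC-1": 100}

# Warehouse: append-only list of time-stamped rows (hypothetical layout).
warehouse_rows = []

def load_snapshot(snapshot_date, source):
    """Append one time-stamped row per account; existing rows are never modified."""
    for account, balance in source.items():
        warehouse_rows.append(
            {"account": account, "balance": balance, "as_of": snapshot_date}
        )

load_snapshot(date(2024, 1, 31), operational_balance)   # load at an interval
operational_balance["ACC-1"] = 250                      # operational update in place
load_snapshot(date(2024, 2, 29), operational_balance)   # next interval load

# The warehouse retains both values; the operational system only the latest.
history = [(row["as_of"].isoformat(), row["balance"]) for row in warehouse_rows]
print(history)
```

The operational dictionary ends up holding only the latest balance, while the warehouse list keeps one row per load date, which is what makes analysis of the past possible.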
Data Warehouse

■ Granularity
– Granular data is key to reusability
– Same data might be used by different departments
– Marketing: Weekly Sales by Location
– Sales: Weekly Sales by Sales Person by Location
– Finance: Quarterly Revenue by Product
– Lower granularity
– Allows for flexibility
– Historical events and activities can be reshaped according to various needs
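
As a rough sketch of why granular data is reusable, the same detailed sales rows can be rolled up three different ways to serve the three departments listed above. The rows, field names, and the `roll_up` helper are assumptions made up for this illustration.

```python
from collections import defaultdict

# Hypothetical granular sales facts: one row per sale.
sales = [
    {"week": 1, "quarter": "Q1", "location": "Lahore", "sales_person": "Ali", "product": "Phone", "amount": 100},
    {"week": 1, "quarter": "Q1", "location": "Lahore", "sales_person": "Sara", "product": "Laptop", "amount": 200},
    {"week": 1, "quarter": "Q1", "location": "Karachi", "sales_person": "Ali", "product": "Phone", "amount": 50},
    {"week": 2, "quarter": "Q1", "location": "Lahore", "sales_person": "Ali", "product": "Laptop", "amount": 300},
]

def roll_up(rows, keys):
    """Sum the amount for each distinct combination of the given key fields."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["amount"]
    return dict(totals)

# Marketing: weekly sales by location
marketing = roll_up(sales, ["week", "location"])
# Sales: weekly sales by sales person by location
sales_view = roll_up(sales, ["week", "sales_person", "location"])
# Finance: quarterly revenue by product
finance = roll_up(sales, ["quarter", "product"])

print(marketing)
print(sales_view)
print(finance)
```

Had only the quarterly totals been stored, the weekly views could not be derived from them; keeping the detail leaves all three roll-ups available.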
Data Warehouse

■ Low Granularity; High Detail
– Details of every phone call
– Date
– Time
– Who
– Cellular
– Special rate
– Long Distance
– 200 bytes x 200 records = 40,000 bytes

■ High Granularity; Low Detail
– Monthly summary of calls
– Month
– Total Calls
– Average Length
– Total Cellular
– Total Long Distance
– Total Special Rate
– 200 bytes x 1 = 200 bytes

Did Adeel call his mom in Lahore last week?
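
The trade-off in the phone call example can be sketched in a few lines: the detail records cost more storage but can answer the question above, while the monthly summary cannot. The 200-byte record size and record counts come from the slide; the call rows, field names, and dates are invented for illustration.

```python
from datetime import date

BYTES_PER_RECORD = 200                   # figure from the slide
detail_bytes = BYTES_PER_RECORD * 200    # 200 call records in a month
summary_bytes = BYTES_PER_RECORD * 1     # one monthly summary record

# Hypothetical low-granularity (high-detail) call records.
calls = [
    {"date": date(2024, 3, 4), "caller": "Adeel", "callee": "Mom", "city": "Lahore", "long_distance": True},
    {"date": date(2024, 3, 5), "caller": "Adeel", "callee": "Office", "city": "Karachi", "long_distance": True},
    {"date": date(2024, 3, 20), "caller": "Adeel", "callee": "Friend", "city": "Islamabad", "long_distance": False},
]

# High-granularity (low-detail) summary derived from the same records.
summary = {
    "month": "2024-03",
    "total_calls": len(calls),
    "total_long_distance": sum(1 for c in calls if c["long_distance"]),
}

def did_call(rows, caller, callee, city, start, end):
    """Answer a detail question; this needs the individual call records."""
    return any(
        r["caller"] == caller and r["callee"] == callee
        and r["city"] == city and start <= r["date"] <= end
        for r in rows
    )

called_mom = did_call(calls, "Adeel", "Mom", "Lahore", date(2024, 3, 1), date(2024, 3, 7))
print(detail_bytes, summary_bytes, called_mom)
```

The summary record has no callee or date field at all, so the same question cannot be answered from it, whatever query is written.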


Data Warehouse
■ Top Down Approach
Data Warehouse
■ Bottom Up Approach
Data Warehouse
■ A Combined Approach
Data Warehouse
■ Centralized
– Enterprise level requirement
– Lowest level of granularity
– May contain some summarized data
■ Independent Data Marts
– Each department develops its own data mart
– No single version of truth
– Inconsistent
– Data Definitions
– Standards
– Hard to analyze data
■ Federated
– Legacy systems, extracted data sets, data marts, etc.
– Not a good idea to start from scratch
– Integrate data
– Shared keys
– Global meta data
– Distributed Queries
Data Warehouse
■ Hub and Spoke
– Centralized but with derived data marts
– Data marts
– Handle most of the queries
– Obtain data from the centralized data warehouse
– Developed for various purposes
– Contain data according to varying requirements

■ Data-Mart Bus
– Analyze business requirements
– Build data (super)marts
– Data provide an enterprise view
Data Warehouse

■ Production Data
– Taken from various operational systems
– Data is selected based on requirements for the data warehouse
– Inconsistent
– Different kinds of hardware
– Supported by various databases and operating systems
– Disparity
– Standardization, Transformation and Integration are huge challenges
Data Warehouse

■ Internal Data
– Held by employees in private files
– Sources
– Spreadsheets
– Text files
– Documents
– Cannot be ignored
– Need to assess what to include
– IT department must work with users to collect it
Data Warehouse

■ Archived Data
– Operational systems are intended for day-to-day business
– Old data is periodically moved and stored in archive files
– Archival policies depend on the organization
– Legacy systems are archived
Data Warehouse

■ External Data
– Industry statistics
– Market data
– Standards
Data Warehouse

■ Data Staging Component


– Data is obtained from various operational systems
– Data needs to be changed and converted for querying and analysis
– Separate area for Extract, Transform and Load
– Can't we just put the data into the warehouse and perform these functions there?
Data Warehouse

■ Data Extraction
– Different
– Sources
– Machines
– Formats
– Buy tools or Develop tools
– Data is extracted to a separate physical environment from which moving it to the data warehouse is easier
Data Warehouse

■ Data Transformation
– Clean
– Corrections
– Resolution of conflicts
– Default Values
– Duplicates
– Filter and purge
– Assign surrogate keys
– Appropriate summarization
■ Standardization
– Data types
– Lengths
– Semantic Standardization
– Homonyms
– Synonyms
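
A few of the transformation steps above (cleaning, default values, synonym standardization, duplicate removal, conflict resolution, and surrogate key assignment) might look like this in miniature. The source rows, the synonym table, and the matching rule (same cleaned name and gender means same customer) are assumptions for illustration only; real matching rules are usually far stricter.

```python
# Hypothetical raw rows arriving from two source systems.
raw_customers = [
    {"source": "billing", "cust_no": "B-17", "name": " alice khan ", "gender": "F", "city": None},
    {"source": "crm", "cust_no": "C-903", "name": "Alice Khan", "gender": "female", "city": "Lahore"},
    {"source": "crm", "cust_no": "C-904", "name": "Bilal Ahmed", "gender": "M", "city": "Karachi"},
]

# Synonyms: different source codes that mean the same thing.
GENDER_SYNONYMS = {"f": "F", "female": "F", "m": "M", "male": "M"}
DEFAULT_CITY = "UNKNOWN"  # default value for missing data

def clean(row):
    """Standardize one source row: trim and case names, map synonyms, fill defaults."""
    return {
        "name": row["name"].strip().title(),
        "gender": GENDER_SYNONYMS[row["gender"].lower()],
        "city": row["city"] or DEFAULT_CITY,
    }

def transform(rows):
    """Deduplicate cleaned rows, resolve conflicts, and assign surrogate keys."""
    customers = {}
    for row in rows:
        cleaned = clean(row)
        match_key = (cleaned["name"], cleaned["gender"])  # assumed matching rule
        existing = customers.get(match_key)
        if existing is None:
            customers[match_key] = cleaned
        elif existing["city"] == DEFAULT_CITY:
            # Conflict resolution: prefer a known value over the default.
            existing["city"] = cleaned["city"]
    # Surrogate keys: warehouse-generated integers, independent of source keys.
    return [{"customer_key": i, **c} for i, c in enumerate(customers.values(), start=1)]

dim_customer = transform(raw_customers)
print(dim_customer)
```

The surrogate key replaces the two incompatible source keys (`B-17`, `C-903`), so the warehouse row no longer depends on either source system's numbering.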
Data Warehouse

■ Data Loading
– Initial loading
– Big operation
– Large amounts of Data
– Takes time
– Refresh
– Frequency?
Data Warehouse

■ Data Storage Component


– Separate from operational systems
– Large volumes of historic data
– Data structured for analysis and not for retrieval
– Read only
– Should be in a format that is readily accessible from a multitude of tools
Data Warehouse

■ Operational Metadata
– Information about the operational data sources
– Data structures
– Field lengths
■ Extraction and Transformation Metadata
– Everything related to Extraction
– Frequencies
– Methods
– Business rules
– Transformations
■ End-User Metadata
– Map of the data warehouse
– Allows end users to use business terminology
■ Importance
– Metadata connects the parts of the data warehouse
– Provides information about the content and structures to the developers
– Makes content recognizable to the users in their own terms
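
One way to picture how metadata connects the parts of the warehouse is a pair of small lookup tables: one mapping business terms to physical columns, and one recording how each column was extracted and transformed. Every table, column, and rule name below is hypothetical.

```python
# Hypothetical end-user metadata: business terms mapped to warehouse columns.
END_USER_METADATA = {
    "Customer Name": {"table": "dim_customer", "column": "cust_nm"},
    "Weekly Sales": {"table": "fact_sales", "column": "wk_sales_amt"},
}

# Hypothetical extraction and transformation metadata for one warehouse column.
EXTRACTION_METADATA = {
    "fact_sales.wk_sales_amt": {
        "source": "orders.amount",
        "frequency": "weekly",
        "method": "incremental extract",
        "business_rule": "exclude cancelled orders",
        "transformation": "sum by week",
    },
}

def describe(business_term):
    """Resolve a business term to its physical column plus any recorded lineage."""
    target = END_USER_METADATA[business_term]
    physical = f'{target["table"]}.{target["column"]}'
    lineage = EXTRACTION_METADATA.get(physical, {})
    return {"physical": physical, **lineage}

weekly_sales_info = describe("Weekly Sales")
print(weekly_sales_info)
```

An end user asks for "Weekly Sales" in business terms, while a developer reading the same entry sees the physical column and where it came from.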
