Shumais Ul Haq
Data Warehouse
■ Subject-Oriented Data
■ Integrated Data
■ Informed Decisions
■ Various Applications and Various Data Sources
■ External Sources
Data Warehousing
■ Time-Variant Data
■ Nonvolatile Data
■ Granularity
– Granular data is key to reusability
– Same data might be used by different departments
– Marketing: Weekly Sales by Location
– Sales: Weekly Sales by Sales Person by Location
– Finance: Quarterly Revenue by Product
– Lower granularity
– Allows for flexibility
– Historical events and activities can be reshaped according to various needs
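
The reuse of granular data can be illustrated with a minimal sketch. The sample rows, the column names, and the 13-week quarter rule are assumptions made for illustration only; they are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical granular facts: one row per sale (all values are made up).
sales = [
    {"week": 1, "location": "North", "sales_person": "Ann", "product": "Widget", "amount": 100.0},
    {"week": 1, "location": "North", "sales_person": "Bob", "product": "Gadget", "amount": 50.0},
    {"week": 1, "location": "South", "sales_person": "Ann", "product": "Widget", "amount": 70.0},
    {"week": 14, "location": "North", "sales_person": "Bob", "product": "Widget", "amount": 30.0},
]


def roll_up(rows, keys):
    """Sum 'amount' for each distinct combination of the given key columns."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["amount"]
    return dict(totals)


def with_quarter(row):
    """Add a 'quarter' column, assuming 13-week quarters (an illustrative rule)."""
    return {**row, "quarter": (row["week"] - 1) // 13 + 1}


# The same detailed rows are reshaped for three different audiences.
marketing_view = roll_up(sales, ["week", "location"])
sales_view = roll_up(sales, ["week", "sales_person", "location"])
finance_view = roll_up([with_quarter(r) for r in sales], ["quarter", "product"])
```

Because the detail rows are kept, each view can be produced on demand; had only the marketing summary been stored, the sales and finance views could not be derived from it.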
■ Data-Mart Bus
– Analyze business requirements
– Build data (super)marts
– Data provide an enterprise view
■ Production Data
– Taken from various operational systems
– Data is selected based on requirements for the data warehouse
– Inconsistent
– Different kinds of hardware
– Supported by various databases and operating systems
– Disparity
– Standardization, Transformation and Integration are huge challenges
■ Internal Data
– Held by employees in private files
– Sources
– Spreadsheets
– Text files
– Documents
– Cannot be ignored
– Need to assess what to include
– IT department must work with users to collect it
■ Archived Data
– Operational systems are intended for day-to-day business
– Old data is periodically moved and stored in archive files
– Archival policies depend on the organization
– Legacy systems are archived
■ External Data
– Industry statistics
– Market data
– Standards
■ Data Extraction
– Different sources, machines, and formats
– Buy tools or Develop tools
– Data is extracted to a separate physical environment, from which moving it to the data warehouse is easier
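
A minimal sketch of extraction into a separate staging area, assuming two hypothetical sources in different formats (CSV text and JSON text). The source names, field names, and sample records are invented for illustration.

```python
import csv
import io
import json

# Two hypothetical sources that hold similar records in different formats.
csv_source = "id,name\n1,Alice\n2,Bob\n"
json_source = '[{"id": 3, "name": "Carol"}]'


def extract_csv(text, source_name):
    """Read CSV text into dict rows, tagging each row with its source."""
    return [{"source": source_name, **row} for row in csv.DictReader(io.StringIO(text))]


def extract_json(text, source_name):
    """Read a JSON array of objects into dict rows, tagging each with its source."""
    return [{"source": source_name, **record} for record in json.loads(text)]


# The staging area here is just a list that is kept apart from the warehouse tables.
staging = extract_csv(csv_source, "orders_csv") + extract_json(json_source, "crm_json")
```

Note that the CSV rows carry `id` as a string while the JSON row carries it as an integer. The sketch leaves that mismatch in place on purpose, since reconciling data types belongs to the transformation step.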
■ Data Transformation
– Clean
– Corrections
– Resolution of conflicts
– Default values
– Duplicates
– Filter and purge
– Assign surrogate keys
– Appropriate summarization
■ Standardization
– Data types
– Lengths
– Semantic Standardization
– Homonyms
– Synonyms
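
Several of the transformation and standardization tasks above can be sketched in a few lines. The synonym table, the field names, the default value, and the sample rows are all hypothetical; real rules would come from the business requirements. Homonym resolution is not covered here, because it needs knowledge of what each source means by a term.

```python
import itertools

# Hypothetical synonym table: different source terms that mean the same thing.
SYNONYMS = {"cust": "customer", "client": "customer"}

# Hypothetical extracted rows with stray whitespace, mixed case, a missing
# value, and a duplicate natural key.
raw_rows = [
    {"natural_key": "A-1", "type": " Cust ", "country": None},
    {"natural_key": "A-1", "type": "customer", "country": None},
    {"natural_key": "B-7", "type": "Client", "country": "PK"},
]


def transform(rows, default_country="UNKNOWN"):
    """Clean, standardize, de-duplicate, and assign surrogate keys."""
    next_key = itertools.count(1)
    seen = set()
    result = []
    for row in rows:
        if row["natural_key"] in seen:
            continue  # drop duplicates; the first occurrence wins
        seen.add(row["natural_key"])
        kind = row["type"].strip().lower()        # clean
        kind = SYNONYMS.get(kind, kind)           # resolve synonyms
        country = row["country"] or default_country  # apply default value
        result.append({
            "surrogate_key": next(next_key),      # warehouse-owned key
            "natural_key": row["natural_key"],
            "type": kind,
            "country": country,
        })
    return result


clean_rows = transform(raw_rows)
```

Keeping the natural key next to the surrogate key is a design choice in this sketch: it lets later loads match incoming source records to rows already in the warehouse.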
■ Data Loading
– Initial loading
– Big operation
– Large amounts of Data
– Takes time
– Refresh
– Frequency?
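
The difference between the initial load and a later refresh can be sketched as follows. This assumes one possible strategy, in which a refresh applies only the rows that changed since the last load; a full reload at fixed intervals is another option. The function names, the `id` and `qty` fields, and the daily frequency are illustrative assumptions.

```python
from datetime import datetime, timedelta


def initial_load(source_rows):
    """Build the warehouse table from scratch (the one-time, large load)."""
    return {row["id"]: dict(row) for row in source_rows}


def refresh(warehouse, changed_rows):
    """Apply changed rows to the warehouse; return (inserted, updated) counts."""
    inserted = updated = 0
    for row in changed_rows:
        if row["id"] in warehouse:
            updated += 1
        else:
            inserted += 1
        warehouse[row["id"]] = dict(row)
    return inserted, updated


def refresh_due(last_load, now, frequency):
    """Return True when at least 'frequency' has elapsed since the last load."""
    return now - last_load >= frequency


warehouse = initial_load([{"id": 1, "qty": 5}, {"id": 2, "qty": 8}])
counts = refresh(warehouse, [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}])
due = refresh_due(datetime(2024, 1, 1), datetime(2024, 1, 2), timedelta(days=1))
```

The `frequency` argument makes the open question on the slide explicit: how often to refresh is a policy decision, and the code only checks whether the chosen interval has elapsed.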