Data Warehousing
A data warehouse is an organized collection of internally and externally generated data that the business uses to make accurate decisions.
It is not just a database. It is an organized collection of databases that is designed specifically to support management decision making.
Data Warehousing
slide 2 of 33
Integrated
Data throughout the data warehouse is stored using consistent technologies and rules so that the exchange of data works flawlessly.
Time-variant
Much of the data is stored with date/time information so that trends can be followed.
Nonupdatable
Data cannot be updated by end users; data comes from operational systems.
Informational systems need to be separated from operational systems to improve data management.
Operational systems collect and manage data from business operations. Informational systems collect and manage information used to make decisions.
Operational Systems
Operational systems process large quantities of relatively simple data on a day-to-day basis. They are necessary to run the business daily, and they account for the bulk of an organization's data collection. Primary users are clerks, salespeople, and store/office managers.
Informational Systems
Designed to support decision making based on historical point-in-time and prediction data.
Example: Based on the Labor Day weekend sales in the last five years, what should the next two Labor Day weekend sales be, assuming a 20% increase in advertising?
Queries of informational systems are much more complex than queries of operational systems.
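The difference in query complexity can be illustrated with a minimal sketch. It assumes a hypothetical `sales` table with invented figures, held in an in-memory SQLite database. The first query is the kind an operational system runs (look up one known record); the second is closer to an informational query (summarize history by year so a trend can be followed). It does not attempt the forecasting step in the example question.

```python
import sqlite3

# Hypothetical table and figures, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (order_id, sale_date, amount) VALUES (?, ?, ?)",
    [
        (1, "2021-09-04", 120.0),
        (2, "2021-09-05", 80.0),
        (3, "2022-09-03", 150.0),
        (4, "2022-09-04", 100.0),
        (5, "2023-09-02", 200.0),
    ],
)

# Operational-style query: fetch a single, known record.
operational_row = conn.execute(
    "SELECT amount FROM sales WHERE order_id = ?", (3,)
).fetchone()

# Informational-style query: aggregate historical data by year.
yearly_totals = conn.execute(
    "SELECT substr(sale_date, 1, 4) AS sale_year, SUM(amount) "
    "FROM sales GROUP BY sale_year ORDER BY sale_year"
).fetchall()

print(operational_row)
print(yearly_totals)
```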
Data Warehouse Architectures
Generic Two-Level Architecture
Independent Data Mart
Dependent Data Mart and Operational Data Store
Logical Data Mart and @ctive Warehouse
Three-Layer Architecture
Limitations of Independent DM
Problems with independent data marts include:
Separate end-user tools for each data mart may require costly redundancy.
Data marts may not be consistent with one another, limiting the ease with which an organizational manager can get comprehensive information.
Queries cannot be customized to compare data from other data marts.
A dependent data mart addresses the first two limitations by using a central data warehouse that feeds subject-related data into data marts.
Logical Data Mart
A logical data mart has different relational views of a single physical data warehouse, instead of separate data marts.
Like partitioning a single hard drive. New data marts can be created quickly. Data marts are kept up-to-date since the source data is in the same system.
http://www.teradata.com/t/page/116324/
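As a rough illustration of data marts defined as relational views, the sketch below uses an in-memory SQLite database. The table `warehouse_sales`, the views `marketing_mart` and `finance_mart`, and all values are hypothetical and not taken from any particular product. It shows that a new mart is just a view definition, and that a row added to the warehouse is visible through the view immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One physical warehouse table (hypothetical schema).
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("East", "Marketing", 100.0),
        ("West", "Marketing", 50.0),
        ("East", "Finance", 75.0),
    ],
)

# Each logical data mart is a view over the same table, not a copy.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT region, amount FROM warehouse_sales WHERE department = 'Marketing'"
)
conn.execute(
    "CREATE VIEW finance_mart AS "
    "SELECT region, amount FROM warehouse_sales WHERE department = 'Finance'"
)

# A row loaded into the warehouse shows up in the view with no extra copy step.
conn.execute("INSERT INTO warehouse_sales VALUES ('West', 'Finance', 25.0)")
finance_rows = conn.execute(
    "SELECT region, amount FROM finance_mart ORDER BY region"
).fetchall()
print(finance_rows)
```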
Data are never physically altered or deleted once they have been added to the store.
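One way to picture a store in which data are never altered or deleted is an append-only collection of dated records. The sketch below is an assumption-laden illustration: the `PriceRecord` and `AppendOnlyStore` names, fields, and values are hypothetical. A change is recorded as a new dated row, so earlier values remain available for trend analysis.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a stored record cannot be modified in place
class PriceRecord:
    product_id: int
    price: float
    effective_date: date


class AppendOnlyStore:
    """Hypothetical store that only supports adding and reading records."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def history(self, product_id):
        # Every version ever recorded for the product, in insertion order.
        return [r for r in self._records if r.product_id == product_id]

    def current(self, product_id):
        # The most recent version by effective date.
        return max(self.history(product_id), key=lambda r: r.effective_date)


store = AppendOnlyStore()
store.append(PriceRecord(1, 10.0, date(2024, 1, 1)))
store.append(PriceRecord(1, 12.5, date(2024, 6, 1)))  # a price change adds a row

print(len(store.history(1)))
print(store.current(1).price)
```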
Cleanse
Duplicate, missing, and misspelled data is corrected.
Transform
Data is converted from the format used in the operational systems to the format used by the enterprise data warehouse.
Load
Transformed data is loaded into the enterprise data warehouse.
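The cleanse, transform, and load steps above can be sketched as three small functions. This is a minimal illustration under stated assumptions: the field names, the `SPELLING_FIXES` table, the `"UNKNOWN"` placeholder for missing values, and the dollars-to-cents target format are all hypothetical choices, not part of the original material.

```python
# Hypothetical correction table; a real system would rely on reference data.
SPELLING_FIXES = {"Califronia": "California"}


def cleanse(rows):
    """Fix known misspellings, fill missing states, and drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        state = row.get("state") or "UNKNOWN"  # assumed placeholder for missing data
        state = SPELLING_FIXES.get(state, state)
        fixed = {"id": row["id"], "state": state, "amount": row["amount"]}
        key = (fixed["id"], fixed["state"], fixed["amount"])
        if key not in seen:
            seen.add(key)
            cleaned.append(fixed)
    return cleaned


def transform(rows):
    """Convert the operational format (dollar strings) to an assumed
    warehouse format (integer cents, renamed key)."""
    return [
        {
            "customer_id": r["id"],
            "state": r["state"],
            "amount_cents": round(float(r["amount"]) * 100),
        }
        for r in rows
    ]


def load(rows, warehouse):
    """Append the transformed rows to the warehouse (here, a plain list)."""
    warehouse.extend(rows)
    return warehouse


operational_rows = [
    {"id": 1, "state": "Califronia", "amount": "19.99"},
    {"id": 1, "state": "Califronia", "amount": "19.99"},  # duplicate
    {"id": 2, "state": None, "amount": "5.00"},           # missing state
]

warehouse = load(transform(cleanse(operational_rows)), [])
print(warehouse)
```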
Data Reconciliation
Typical operational data is:
Transient, not historical
Not normalized (perhaps due to denormalization for performance)
Restricted in scope, not comprehensive
Sometimes poor quality, with inconsistencies and errors
Single-Field Transformation
In general, a transformation function translates data from the old form to the new form.
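Two common shapes such a function might take are a formula and a table lookup. The sketch below shows one of each; the function names and the `STATE_CODES` table are illustrative assumptions rather than anything defined in the original material.

```python
def celsius_to_fahrenheit(celsius):
    """Algorithmic transformation: the new value is computed by a formula."""
    return celsius * 9 / 5 + 32


# Hypothetical lookup table mapping the old form to the new form.
STATE_CODES = {"California": "CA", "Texas": "TX"}


def state_to_code(state_name):
    """Table-lookup transformation: the new value is found in a mapping."""
    return STATE_CODES[state_name]


print(celsius_to_fahrenheit(100))
print(state_to_code("Texas"))
```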
Multi-field Transformation
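Only the heading survives here, so the following is a sketch of one plausible reading: a multi-field transformation either combines several source fields into one target field or splits one source field into several target fields. The function names, field names, and the `BRAND-SIZE` product code layout are hypothetical.

```python
def combine_name(first_name, last_name):
    """Many source fields to one target field."""
    return f"{last_name}, {first_name}"


def split_product_code(product_code):
    """One source field to many target fields.

    Assumes a hypothetical code layout of the form BRAND-SIZE.
    """
    brand, size = product_code.split("-", 1)
    return {"brand": brand, "size": size}


print(combine_name("Ada", "Lovelace"))
print(split_product_code("ACME-450"))
```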
Derived Data
Objectives
Ease of use for decision support applications
Fast response to predefined user queries
Customized data for particular target audiences
Ad-hoc query support
Data mining capabilities
Characteristics
Detailed (mostly periodic) data
Aggregate (for summary)
Distributed (to departmental servers)
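The relationship between detailed and aggregate derived data can be sketched as follows. The sample rows and the helper names `summarize_by_month` and `extract_for_department` are hypothetical; the point is only that summaries and audience-specific subsets are computed from the detailed data rather than entered separately.

```python
from collections import defaultdict

# Hypothetical detailed rows, for illustration only.
detailed_sales = [
    {"month": "2024-01", "department": "Toys", "amount": 100.0},
    {"month": "2024-01", "department": "Garden", "amount": 40.0},
    {"month": "2024-02", "department": "Toys", "amount": 60.0},
]


def summarize_by_month(rows):
    """Aggregate data: total amount per month."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["month"]] += row["amount"]
    return dict(totals)


def extract_for_department(rows, department):
    """Customized data: the subset relevant to one target audience."""
    return [row for row in rows if row["department"] == department]


monthly_totals = summarize_by_month(detailed_sales)
toys_rows = extract_for_department(detailed_sales, "Toys")
print(monthly_totals)
print(len(toys_rows))
```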
Excellent for ad-hoc queries, but poorly suited to online transaction processing.
Data Visualization
The representation of data in graphical and multimedia formats for human analysis.
Trends and patterns are often easier to see and act on when presented in a visual format.