Definition
Data warehouses are populated mostly by periodic migrations of data from operational systems. A second source is external databases, which are frequently purchased. Examples of such data include lists of income and demographic information. This purchased information is linked with internal customer data to build a better customer profile.

A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions.
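The linking step described above can be sketched in Python. This is a minimal sketch under stated assumptions: the record types (`Customer`, `Demographics`, `CustomerProfile`), the function `build_profiles`, and the use of postal code as the linking key are all hypothetical. Real purchased lists may be matched on other keys, and matching is often much less clean than an exact lookup.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical internal record, as migrated from an operational system.
@dataclass
class Customer:
    customer_id: int
    name: str
    postal_code: str
    total_purchases: float


# Hypothetical purchased demographic record; keying it by postal code is an assumption.
@dataclass
class Demographics:
    postal_code: str
    median_income: int
    avg_household_size: float


@dataclass
class CustomerProfile:
    customer_id: int
    name: str
    total_purchases: float
    median_income: Optional[int]
    avg_household_size: Optional[float]


def build_profiles(customers, demographics):
    """Link internal customer rows to purchased demographics on postal code."""
    by_postal = {d.postal_code: d for d in demographics}
    profiles = []
    for c in customers:
        d = by_postal.get(c.postal_code)  # None when the purchased list has no match
        profiles.append(CustomerProfile(
            customer_id=c.customer_id,
            name=c.name,
            total_purchases=c.total_purchases,
            median_income=d.median_income if d else None,
            avg_household_size=d.avg_household_size if d else None,
        ))
    return profiles


customers = [
    Customer(1, "Asha", "10001", 1250.0),
    Customer(2, "Ben", "94105", 310.5),
    Customer(3, "Chen", "60601", 75.0),
]
demographics = [
    Demographics("10001", 72000, 2.1),
    Demographics("94105", 118000, 1.8),
]

for profile in build_profiles(customers, demographics):
    print(profile)
```

Customers with no match in the purchased list are kept, with the demographic fields left empty. This way the profile never silently drops internal data.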
Subject Oriented
OLTP databases usually hold information about small subsets of the organization. For example, a retailer might have separate order entry systems and databases for retail, catalog, and outlet sales. Each system supports queries about the information it captures. However, if someone wants details of all sales, these separate systems are not adequate. To address this kind of situation, a data warehouse database should be subject-oriented: organized into subject areas such as sales, rather than around OLTP data sources.
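A minimal Python sketch of the idea follows. It assumes three hypothetical channel extracts with deliberately different layouts (`retail_orders`, `catalog_orders`, `outlet_sales`); the field names are invented for illustration. Each extract is reshaped into one `Sale` record type. A question that spans every channel, such as total sales per customer, then becomes a single pass over one sales subject area instead of three separate system queries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    channel: str
    customer_id: int
    amount: float


# Hypothetical extracts from three separate order-entry systems, each with its own layout.
retail_orders = [{"cust": 1, "amount": 40.0}, {"cust": 2, "amount": 15.5}]
catalog_orders = [(1, 99.0)]  # (customer_id, order_total)
outlet_sales = [{"customer": 3, "value_cents": 1250}]


def to_sales_subject_area():
    """Reshape each source into one subject-oriented collection of Sale records."""
    sales = [Sale("retail", o["cust"], o["amount"]) for o in retail_orders]
    sales += [Sale("catalog", cid, total) for cid, total in catalog_orders]
    sales += [Sale("outlet", o["customer"], o["value_cents"] / 100) for o in outlet_sales]
    return sales


def total_by_customer(sales):
    """Total sales per customer across every channel."""
    totals = defaultdict(float)
    for s in sales:
        totals[s.customer_id] += s.amount
    return dict(totals)


sales = to_sales_subject_area()
print(total_by_customer(sales))  # {1: 139.0, 2: 15.5, 3: 12.5}
```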
[Figure: Subject-oriented sales information (OLTP systems vs. data warehouse)]

A data warehouse is organized around major subjects such as customers, products, and sales. Data are organized by subject rather than by application. For example, an insurance company using a data warehouse would organize its data by customer, premium, and claim instead of by product line (auto, life, property, etc.).

Integrated
A data warehouse is usually constructed by integrating multiple, heterogeneous sources, such as relational databases, flat files, and OLTP files. When data resides in many separate applications in the operational environment, the encoding of data is often inconsistent. Consider the retailer described above. Its retail system uses a 7-digit numeric code for products. The outlet system uses a code of 9 alphanumeric characters, and the catalog system uses 4 letters and 4 digits. To create a useful subject area, the source data must be integrated. There is no need to change the coding in the source systems, but there must be some mechanism to transform the data coming into the data warehouse and assign a common coding scheme.

[Figure: OLTP systems (Retail Sales System, product code 9999999; Outlet Sales System, product code XXXXXXXXX; Catalog Sales System, product code XXXX99.99) feeding the Sales Subject Area of the data warehouse through a common code or a mapping of the various source codes]

Nonvolatile
Unlike operational databases, warehouses primarily support reporting, not data capture. A data warehouse is always a physically separate store of data. Because of this separation, a data warehouse does not require transaction processing, recovery, concurrency control, and similar mechanisms. Data are not updated or changed in any way once they enter the data warehouse. They are only loaded, refreshed, and accessed for queries.

[Figure: OLTP users read and write (R/W); data warehouse users only read (read-only)]

Time Variant
Data are stored in a data warehouse to provide a historical perspective. Every key structure in the data warehouse contains, implicitly or explicitly, an element of time. A data warehouse generally holds 5-10 years of data, which is used for comparisons, trends, and forecasting.
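The Integrated, Nonvolatile, and Time Variant properties above can be illustrated together with a small load step. This is a hypothetical Python sketch, not a production ETL design. The mapping table `CODE_MAP`, the sample source codes, the common codes such as `P-001`, and the `SalesWarehouse` class are all invented for illustration. Source product codes are translated on the way in (integrated), and every row carries a date (time-variant). The store only supports appending and reading (nonvolatile).

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping tables: each source system keeps its own product coding,
# and only data entering the warehouse is translated to a common code.
CODE_MAP = {
    "retail": {"1234567": "P-001", "7654321": "P-002"},  # 7-digit numeric codes
    "outlet": {"AB12CD34E": "P-001"},                    # 9 alphanumeric characters
    "catalog": {"ABCD1234": "P-002"},                    # 4 letters and 4 digits
}


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified
class SalesFact:
    sale_date: date    # explicit time element in every row (time-variant)
    product_code: str  # common warehouse code (integrated)
    channel: str
    amount: float


class SalesWarehouse:
    """Append-only store: rows are loaded and queried, never updated (nonvolatile)."""

    def __init__(self):
        self._rows = []

    def load(self, channel, source_rows):
        """Map source codes to the common code and append; an unmapped code raises KeyError."""
        mapping = CODE_MAP[channel]
        for sale_date, source_code, amount in source_rows:
            self._rows.append(SalesFact(sale_date, mapping[source_code], channel, amount))

    def rows(self):
        return tuple(self._rows)  # read-only view for queries

    def total_by_product_and_year(self):
        totals = {}
        for r in self._rows:
            key = (r.product_code, r.sale_date.year)
            totals[key] = totals.get(key, 0.0) + r.amount
        return totals


wh = SalesWarehouse()
wh.load("retail", [(date(2023, 3, 1), "1234567", 20.0), (date(2024, 3, 1), "7654321", 5.0)])
wh.load("outlet", [(date(2023, 7, 9), "AB12CD34E", 12.0)])
wh.load("catalog", [(date(2024, 1, 15), "ABCD1234", 30.0)])
print(wh.total_by_product_and_year())  # {('P-001', 2023): 32.0, ('P-002', 2024): 35.0}
```

Because the same product arrives under different source codes, only the common code lets the outlet and retail sales of `P-001` be added together. Letting an unmapped code fail loudly, rather than loading it under a default, is one way to surface the source-system data problems that integration tends to expose.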
Operational Systems vs Data Warehousing Systems

Operational                                       Data Warehouse
Holds current data                                Holds historic data
Data is dynamic                                   Data is largely static
Read/write accesses                               Read-only accesses
Repetitive processing                             Ad hoc complex queries
Transaction driven                                Analysis driven
Application oriented                              Subject oriented
Used by clerical staff for day-to-day operations  Used by top managers for analysis
Normalized data model (ER model)                  Denormalized data model (dimensional model)
Must be optimized for writes and small queries    Must be optimized for queries involving a large portion of the warehouse
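A small star schema built with Python's standard `sqlite3` module makes the normalized vs. denormalized (dimensional) row of the comparison concrete. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are hypothetical. The product dimension stores the category name directly instead of in a separate normalized table. The query then scans and aggregates the whole fact table, which is the kind of workload the data warehouse column describes.

```python
import sqlite3

# In-memory database; table names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold descriptive, denormalized attributes
    -- (category is stored directly in dim_product, not in its own table).
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);

    -- Fact table: one row per sale, with a foreign key to each dimension.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'),
                                   (2, 'Phone', 'Electronics'),
                                   (3, 'Desk', 'Furniture');
    INSERT INTO dim_date VALUES (20230101, 2023, 1), (20240401, 2024, 2);
    INSERT INTO fact_sales VALUES (20230101, 1, 900.0), (20230101, 3, 250.0),
                                  (20240401, 2, 600.0), (20240401, 1, 1100.0);
""")

# Analysis query: aggregate the whole fact table by dimension attributes.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_key = p.product_key
    JOIN dim_date AS d ON f.date_key = d.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()

for category, year, total in rows:
    print(category, year, total)
```

An operational (ER) design would more likely split category into its own table and would be tuned for many small inserts and updates. The star layout instead trades some redundancy for simple joins over large scans.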
Advantages of Data Warehousing
- Potential high return on investment
- Competitive advantage
- Increased productivity of corporate decision makers

Problems with Data Warehousing
- Underestimation of resources for data loading
- Hidden problems with source systems
- Required data not captured
- Increased end-user demands
- High maintenance
- Long-duration projects
- Complexity of integration
*The details of the architecture will be discussed in the next article (in the meantime, read chapter 3 of textbook 1).

The above notes have been compiled from the following sources:
1. Corey M et al., Oracle8i Data Warehousing, TMH, 2001.
2. Connolly T and Begg C, Database Systems, second edition, AW, 1998.
3. Ramakrishnan R and Gehrke J, Database Management Systems, second edition, MGH, 2000.
4. http://cisnet.baruch.cuny.edu/holowczak/classes/9440/datawarehousing/
You may also refer to http://system-services.com/ftp/dwintro.doc