
DATA WAREHOUSING CONCEPTS

1. Data Warehouse: a storage area for processed data integrated from different sources.
2. A data warehouse allows its users to extract the data they need for business analysis & strategic decision making.
3. Definitions of a data warehouse:
• Bill Inmon: A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.
• Ralph Kimball: A warehouse is a copy of transaction data specifically structured for query and analysis.
4. Conceptually, a data warehouse is a home for ‘secondhand’ data that originates either in other corporate
applications or in some data source external to your company.
5. Formally, a data warehouse is a stand-alone repository of information, integrated from several heterogeneous
operational databases.
6. A data warehouse is a process, not a product: the process of creating a well-designed information management
solution that enables informational & analytical processing without the barriers of geography and organization.
7. The source for a data warehouse is data extracted from operational databases.
8. OLTP: Online Transaction Processing. OLTP systems cover most of the day-to-day operations of an organization or
enterprise. They hold very current data and adopt an entity-relationship (ER) data model and an application-oriented
database design.
9. OLAP: Online Analytical Processing. OLAP systems (DW) are used for data analysis and decision making by knowledge
workers such as managers, executives and analysts. They hold historical data, provide facilities for summarization and
aggregation, and store and manage information at different levels of granularity. OLAP typically adopts a
subject-oriented database design.
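To make the OLTP/OLAP contrast in items 8 and 9 concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns and the helper functions are hypothetical, and a single in-memory database stands in for what would normally be two separate systems: the first two functions do OLTP-style work (short transactions on individual current rows), while the last one does OLAP-style work (a read-only summary over history).

```python
import sqlite3

# Hypothetical "orders" table; all names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
)

# OLTP style: short transactions that insert or update individual, current rows.
def place_order(conn, customer, amount, order_date):
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO orders (customer, amount, order_date) VALUES (?, ?, ?)",
            (customer, amount, order_date),
        )
    return cur.lastrowid

def correct_amount(conn, order_id, amount):
    with conn:
        conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (amount, order_id))

# OLAP style: a read-only query that summarizes history across many rows.
def revenue_by_year(conn):
    rows = conn.execute(
        "SELECT substr(order_date, 1, 4) AS year, SUM(amount) "
        "FROM orders GROUP BY year ORDER BY year"
    )
    return dict(rows.fetchall())

first = place_order(conn, "alice", 10.0, "2022-03-01")
place_order(conn, "bob", 5.0, "2023-01-15")
place_order(conn, "alice", 7.5, "2023-06-30")
correct_amount(conn, first, 12.0)
print(revenue_by_year(conn))  # {'2022': 12.0, '2023': 12.5}
```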
10. Data warehouses and OLAP tools are based on a multidimensional data model. This model views data in the form of a
data cube.
11. Each numerical measure depends on a set of dimensions, which provide the context for the measure (for example, units
sold can be viewed by time, item and location).
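A data cube can be pictured as a mapping from dimension coordinates to a measure. The sketch below is an assumption-laden toy, not how any particular OLAP engine stores data: the dimension names, members and values are hypothetical, and a plain Python dict plays the role of the cube. Summing over the dimensions you drop shows how the same measure takes its meaning from the dimensions you keep.

```python
from collections import defaultdict

# Toy data cube: each cell maps a (time, item, location) coordinate to a
# numeric measure (units sold). Dimension names and values are hypothetical.
DIMENSIONS = ("time", "item", "location")
cube = {
    ("Q1", "phone", "North"): 120,
    ("Q1", "laptop", "North"): 40,
    ("Q1", "phone", "South"): 90,
    ("Q2", "phone", "North"): 150,
    ("Q2", "laptop", "South"): 60,
}

def aggregate(cube, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, value in cube.items():
        totals[tuple(coords[i] for i in positions)] += value
    return dict(totals)

# The same measure, given different context by different sets of dimensions.
print(aggregate(cube, ["time"]))  # {('Q1',): 250, ('Q2',): 210}
print(aggregate(cube, []))        # {(): 460}
```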
12. Data warehouse approaches according to:
• Bill Inmon: CIF (Corporate Information Factory). The enterprise data warehouse (EDW) is built in normalized form, and
the data marts (DM/DDS) are then created from the EDW.
Advantages:
1. To minimize disk storage.
2. A normalized model is more suitable for some reports.
3. To increase flexibility.
4. To reduce data integration.
5. To eliminate data redundancy, making it quicker to update the DW.
• Ralph Kimball: Dimensional modeling. Facts are surrounded by their dimensions. Good query performance and flexibility.
Advantages:
1. Flexibility, e.g., we can accommodate changes in the requirements with minimal changes to the data
model.
2. Performance: we can query faster than with a normalized model.
3. It is quicker and simpler to develop than a normalized DW, and easier to maintain.
13. Early-arriving fact rows, also called late-arriving dimension rows: a fact row arrives in the warehouse before the
dimension row it refers to.
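An early-arriving fact (equivalently, a late-arriving dimension) is a fact row that references a dimension member the warehouse has not loaded yet. One common way to handle it, often called an inferred member, is sketched below under stated assumptions: the table names, the `is_inferred` flag and the helper functions are hypothetical, and `sqlite3` stands in for the warehouse database. The fact is loaded against a placeholder dimension row, which is later overwritten in place when the real dimension row arrives, so the fact keeps its surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,       -- surrogate key
    customer_id  TEXT UNIQUE NOT NULL,      -- natural (business) key
    name         TEXT,
    is_inferred  INTEGER NOT NULL DEFAULT 0 -- 1 = placeholder row
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    amount       REAL
);
""")

def customer_key_for(conn, customer_id):
    """Look up the surrogate key; if the dimension row has not arrived yet,
    insert an inferred placeholder row and return its new key."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_id, name, is_inferred) VALUES (?, 'Unknown', 1)",
        (customer_id,),
    )
    return cur.lastrowid

def load_fact(conn, customer_id, amount):
    key = customer_key_for(conn, customer_id)
    conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (key, amount))

def load_dimension(conn, customer_id, name):
    """Update the row in place when it already exists (e.g. as a placeholder),
    so facts loaded earlier keep pointing at the same surrogate key."""
    cur = conn.execute(
        "UPDATE dim_customer SET name = ?, is_inferred = 0 WHERE customer_id = ?",
        (name, customer_id),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO dim_customer (customer_id, name) VALUES (?, ?)", (customer_id, name)
        )

load_fact(conn, "C042", 99.0)         # the fact row arrives first
load_dimension(conn, "C042", "Acme")  # the dimension row arrives later
```

Other strategies exist (for example, parking the fact in a suspense table until its dimension arrives); this sketch shows only the placeholder approach.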
14. Difference between a data mart and a data warehouse:
• A data mart is small; a DW is big.
• A data mart is one star; a DW is a collection of stars.
• A data warehouse could be a normalized model that stores the EDW, whereas a data mart is a dimensional model
containing 1-4 stars for a specific department (implemented in either a relational DB or a multidimensional DB).
15. The purpose of a multidimensional database (MDB) is performance and easier data exploration. It is also called a
cube and is much faster than a relational DB at returning an aggregate. An MDB is very easy to navigate: users can drill
up and down the hierarchies and across attributes to explore the data.
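Part of why a multidimensional store answers aggregate and drill queries quickly is that totals along a hierarchy can be precomputed. The following is a minimal in-memory sketch of that idea, not how any real MDB product works internally: the facts, the Year > Quarter > Month hierarchy and the `drill_down` helper are all hypothetical. After one pass over the facts, drilling up or down is a dictionary lookup.

```python
from collections import defaultdict

# Hypothetical daily sales facts: (date as "YYYY-MM-DD", amount).
facts = [
    ("2023-01-05", 100), ("2023-01-20", 50), ("2023-02-11", 70),
    ("2023-07-03", 30), ("2024-01-09", 80),
]

# A simple Year > Quarter > Month hierarchy, each level derived from the date.
def year(d):
    return d[:4]

def quarter(d):
    return f"{d[:4]}-Q{(int(d[5:7]) - 1) // 3 + 1}"

def month(d):
    return d[:7]

LEVELS = {"year": year, "quarter": quarter, "month": month}
CHILD_LEVEL = {"year": "quarter", "quarter": "month"}

# Precompute totals at every level, plus which members sit under which parent,
# so navigating the hierarchy is a lookup instead of a rescan of the facts.
aggregates = {level: defaultdict(int) for level in LEVELS}
children = defaultdict(set)
for d, amount in facts:
    for level, key_fn in LEVELS.items():
        aggregates[level][key_fn(d)] += amount
    for parent, child in CHILD_LEVEL.items():
        children[(parent, LEVELS[parent](d))].add(LEVELS[child](d))

def drill_down(level, member):
    """Totals for the members one level below `member`."""
    child = CHILD_LEVEL[level]
    return {m: aggregates[child][m] for m in sorted(children[(level, member)])}

print(aggregates["year"]["2023"])        # drill up: 250
print(drill_down("year", "2023"))        # {'2023-Q1': 220, '2023-Q3': 30}
print(drill_down("quarter", "2023-Q1"))  # {'2023-01': 150, '2023-02': 70}
```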
16. Star Schema: a model with a single object in the middle, radially connected to other surrounding objects. The
object in the center of the star is the fact table. This fact table consists of basic business measurements and can contain
millions of rows. The objects surrounding the fact table are called dimension tables. These dimension tables contain
business attributes that can be used as search criteria, and they are relatively small.
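A star schema can be sketched in a few SQL statements. The example below runs on Python's built-in `sqlite3` module; every table, column and row is hypothetical. The fact table in the center holds one foreign key per dimension plus the numeric measures, and a typical query joins the star, filters and groups on dimension attributes, and sums the facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, descriptive attributes used as search criteria.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);
-- Fact table in the center: one foreign key per dimension plus numeric measures.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    revenue     REAL
);
INSERT INTO dim_date    VALUES (20230101, 2023, 1), (20230201, 2023, 2);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Desk', 'Furniture');
INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Leeds');
INSERT INTO fact_sales  VALUES
    (20230101, 1, 1, 10, 20.0),
    (20230101, 2, 2, 1, 150.0),
    (20230201, 1, 2, 5, 10.0);
""")

# A typical star join: filter and group on dimension attributes, sum the facts.
query = """
SELECT p.category, s.city, SUM(f.revenue)
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_store   s ON s.store_key   = f.store_key
JOIN dim_date    d ON d.date_key    = f.date_key
WHERE d.year = 2023
GROUP BY p.category, s.city
ORDER BY p.category, s.city
"""
result = conn.execute(query).fetchall()
print(result)  # [('Furniture', 'Leeds', 150.0), ('Stationery', 'Leeds', 10.0), ('Stationery', 'Pune', 20.0)]
```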
17. Snowflake Schema: an extension of the star schema in which each point of the star expands into more points. In a
snowflake schema the dimension tables are more normalized.
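In a snowflake schema, a dimension such as product is normalized so that repeated attributes (here, category and department) move to their own table. The sketch below uses Python's built-in `sqlite3` module with hypothetical tables and rows. The trade-off it shows is that reaching a category-level attribute now costs one more join than it would in a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: category attributes live in their own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT, department TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT,
                           category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE fact_sales   (product_key INTEGER REFERENCES dim_product(product_key), revenue REAL);
INSERT INTO dim_category VALUES (1, 'Pens', 'Stationery'), (2, 'Paper', 'Stationery');
INSERT INTO dim_product  VALUES (1, 'Gel pen', 1), (2, 'Ball pen', 1), (3, 'A4 ream', 2);
INSERT INTO fact_sales   VALUES (1, 3.0), (2, 2.0), (3, 5.0), (1, 3.0);
""")

# Fact -> product -> category: one extra join compared with a star schema.
rows = conn.execute("""
SELECT c.category, SUM(f.revenue)
FROM fact_sales f
JOIN dim_product  p ON p.product_key  = f.product_key
JOIN dim_category c ON c.category_key = p.category_key
GROUP BY c.category
ORDER BY c.category
""").fetchall()
print(rows)  # [('Paper', 5.0), ('Pens', 8.0)]
```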
18. Fact Constellation: when there is a need for more than one fact table, the schema is called a fact constellation,
a schema in which several fact tables share some dimension tables among them. It is also called a galaxy schema. In
simple terms, the same dimensions are shared among more than one fact table.
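A fact constellation can be illustrated with two fact tables that reference the same dimension tables. The sketch below is a hypothetical example on Python's built-in `sqlite3` module: sales and shipping facts share a date dimension. Each fact table is summarized separately to the shared grain (month) before the results are joined through the common dimension, which avoids double counting when one side has several rows per month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions shared by both fact tables.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
-- Two fact tables that reference the same dimensions.
CREATE TABLE fact_sales    (date_key INTEGER, product_key INTEGER, revenue REAL);
CREATE TABLE fact_shipping (date_key INTEGER, product_key INTEGER, units_shipped INTEGER);
INSERT INTO dim_date      VALUES (1, '2023-01'), (2, '2023-02');
INSERT INTO dim_product   VALUES (1, 'Pen');
INSERT INTO fact_sales    VALUES (1, 1, 20.0), (2, 1, 30.0), (2, 1, 5.0);
INSERT INTO fact_shipping VALUES (1, 1, 12), (2, 1, 18);
""")

# Summarize each fact table at the shared grain first, then combine the
# results through the common date dimension.
rows = conn.execute("""
WITH s AS (SELECT date_key, SUM(revenue) AS revenue FROM fact_sales GROUP BY date_key),
     h AS (SELECT date_key, SUM(units_shipped) AS shipped FROM fact_shipping GROUP BY date_key)
SELECT d.month, s.revenue, h.shipped
FROM dim_date d
JOIN s ON s.date_key = d.date_key
JOIN h ON h.date_key = d.date_key
ORDER BY d.month
""").fetchall()
print(rows)  # [('2023-01', 20.0, 12), ('2023-02', 35.0, 18)]
```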
19. Types of metadata: 1) build-time metadata, 2) usage metadata, 3) control metadata.
20. ROLAP (Relational OLAP): data does not need to be stored dimensionally to be viewed dimensionally; it is stored in
relational form.
21. MOLAP (Multidimensional OLAP): data must be stored multidimensionally to be viewed multidimensionally.
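The ROLAP/MOLAP distinction is about where the dimensional view comes from. In this hypothetical sketch (table, members and helper names are illustrative, with `sqlite3` and a nested Python list as stand-ins for real engines), the ROLAP side keeps facts in an ordinary relational table and translates each dimensional question into SQL at query time, while the MOLAP side loads the same facts up front into a dense array with one axis per dimension, so a question becomes a direct cell lookup. Both return the same answers.

```python
import sqlite3

# Hypothetical sales facts: (region, quarter, amount).
facts = [("North", "Q1", 10), ("North", "Q2", 15), ("South", "Q1", 7), ("South", "Q1", 3)]

# ROLAP-style: data stays relational; dimensional questions become SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", facts)

def rolap_total(region, quarter):
    sql = "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ? AND quarter = ?"
    return conn.execute(sql, (region, quarter)).fetchone()[0]

# MOLAP-style: data is stored multidimensionally, one array axis per dimension.
REGIONS = ["North", "South"]
QUARTERS = ["Q1", "Q2"]
cube = [[0] * len(QUARTERS) for _ in REGIONS]
for region, quarter, amount in facts:
    cube[REGIONS.index(region)][QUARTERS.index(quarter)] += amount

def molap_total(region, quarter):
    return cube[REGIONS.index(region)][QUARTERS.index(quarter)]

print(rolap_total("South", "Q1"), molap_total("South", "Q1"))  # 10 10
```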
22. Steps in building a data warehouse:
• Analyzing business needs
• Extracting operational data
• Transforming and loading data
• Query optimization
• Presentation of data
• Continuing refinement of the data warehouse

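The extract, transform, load and presentation steps above can be sketched as a tiny pipeline. Everything here is a hypothetical illustration: the source rows, the country-code mapping, the `fact_orders` table and the function names are assumptions, and an in-memory `sqlite3` database stands in for the warehouse. The transform step shows the integration work (standardizing codes and types) that lets data from different sources be combined.

```python
import sqlite3

# Hypothetical operational extract: rows as they might arrive from an OLTP
# system, with inconsistent formatting that the transform step cleans up.
def extract():
    return [
        {"order_id": 1, "country": " india ", "amount": "120.50"},
        {"order_id": 2, "country": "INDIA", "amount": "79.50"},
        {"order_id": 3, "country": "Uk", "amount": "40"},
    ]

def transform(rows):
    """Standardize codes and types so data from different sources integrates cleanly."""
    country_codes = {"INDIA": "IN", "UK": "GB"}  # assumed mapping, for illustration
    return [
        (r["order_id"], country_codes[r["country"].strip().upper()], float(r["amount"]))
        for r in rows
    ]

def load(conn, rows):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    with conn:  # INSERT OR REPLACE keeps reloads of the same rows idempotent
        conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows)

def present(conn):
    """A simple summary query standing in for the presentation layer."""
    return conn.execute(
        "SELECT country, SUM(amount) FROM fact_orders GROUP BY country ORDER BY country"
    ).fetchall()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
summary = present(conn)
print(summary)  # [('GB', 40.0), ('IN', 200.0)]
```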