
Data warehouse

From Wikipedia, the free encyclopedia


[Figure: Data Warehouse Overview]

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating analytical reports for knowledge workers throughout the enterprise. Examples of reports could range from annual and quarterly comparisons and trends to detailed daily sales analyses.

The data stored in the warehouse is uploaded from the operational systems (such as marketing, sales, etc., shown in the figure to the right). The data may pass through an operational data store for additional operations before it is used in the DW for reporting.
Contents

1 Types of systems
2 Software tools
3 Benefits
4 Generic data warehouse environment
5 History
6 Information storage
6.1 Facts
6.2 Dimensional vs. normalized approach for storage of data
7 Design methodologies
7.1 Bottom-up design
7.2 Top-down design
7.3 Hybrid design
8 Data warehouses versus operational systems
9 Evolution in organization use
10 See also
11 References
12 Further reading
13 External links

Types of systems
Data mart
A data mart is a simple form of a data warehouse that is focused
on a single subject (or functional area), such as sales, finance or
marketing. Data marts are often built and controlled by a single
department within an organization. Given their single-subject
focus, data marts usually draw data from only a few sources.
The sources could be internal operational systems, a central
data warehouse, or external data.[1] Denormalization is the norm
for data modeling techniques in this system.
Online analytical processing (OLAP)
OLAP is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is an effectiveness measure. OLAP applications are widely used in data mining. OLAP databases store aggregated, historical data in multi-dimensional schemas (usually star schemas). OLAP systems typically have data latency of a few hours, as opposed to data marts, where latency is expected to be closer to one day.
Online transaction processing (OLTP)
OLTP is characterized by a large number of short online transactions (INSERT, UPDATE, DELETE). OLTP systems emphasize very fast query processing and maintaining data integrity in multi-access environments. For OLTP systems, effectiveness is measured by the number of transactions per second. OLTP databases contain detailed and current data. The schema used to store transactional databases is the entity model (usually 3NF).[2] Normalization is the norm for data modeling techniques in this system.
Predictive analysis
Predictive analysis is about finding and quantifying hidden patterns in the data using complex mathematical models that can be used to predict future outcomes. Predictive analysis is different from OLAP in that OLAP focuses on historical data analysis and is reactive in nature, while predictive analysis focuses on the future. These systems are also used for CRM (customer relationship management).[3]
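The OLTP/OLAP contrast above can be made concrete with a small sketch. This is a hypothetical illustration using Python's built-in sqlite3 module; the `orders` table, its columns, and the sample values are all invented for the example:

```python
# Hypothetical illustration of the OLTP / OLAP contrast using sqlite3.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL, day TEXT)")

# OLTP-style work: many short transactions, each touching a handful of rows.
cur.execute("INSERT INTO orders (customer, amount, day) VALUES (?, ?, ?)",
            ("alice", 19.99, "2015-09-01"))
cur.execute("UPDATE orders SET amount = 24.99 WHERE id = 1")
con.commit()

# OLAP-style work: few queries, each scanning and aggregating many rows.
cur.executemany("INSERT INTO orders (customer, amount, day) VALUES (?, ?, ?)",
                [("bob", 5.00, "2015-09-01"), ("alice", 7.50, "2015-09-02")])
total, = cur.execute("SELECT SUM(amount) FROM orders").fetchone()
print(round(total, 2))  # 37.49
```

The same table serves both styles here only because it is tiny; the sections that follow describe why real systems separate the two workloads.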

Software tools

The typical extract-transform-load (ETL)-based data warehouse uses staging, data integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer, often storing this transformed data in an operational data store (ODS) database. The integrated data are then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups often called dimensions and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema. The access layer helps users retrieve data.[4]
This definition of the data warehouse focuses on data
storage. The main source of the data is cleaned,
transformed, cataloged and made available for use by
managers and other business professionals for data
mining, online analytical processing, market
research and decision support.[5] However, the means
to retrieve and analyze data, to extract, transform and
load data, and to manage the data dictionary are also
considered essential components of a data
warehousing system. Many references to data
warehousing use this broader context. Thus, an
expanded definition for data warehousing
includes business intelligence tools, tools to extract,
transform and load data into the repository, and tools to
manage and retrieve metadata.
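The staging, integration, and warehouse layers described above can be sketched in miniature. This is a minimal, hedged sketch in plain Python; the source names ("crm", "orders"), field names, and conforming rules are all invented assumptions, not any particular product's behavior:

```python
# A minimal sketch of the three ETL-based warehouse layers described in the
# text: staging (raw copies), integration (conforming, the ODS role), and
# the warehouse layer (facts and dimensions). All names are hypothetical.

# Staging layer: raw rows extracted verbatim from two disparate sources.
staging = {
    "crm": [{"cust": "ACME Corp", "region": "emea"}],
    "orders": [{"cust": "acme corp", "amount": "120.50"}],
}

# Integration layer: conform the disparate records to one schema.
def conform(row):
    return {"customer": row["cust"].strip().upper(),
            "amount": float(row.get("amount", 0)),
            "region": row.get("region", "unknown").upper()}

ods = [conform(r) for rows in staging.values() for r in rows]

# Warehouse layer: split conformed data into a dimension and a fact table.
dim_customer = sorted({r["customer"] for r in ods})
fact_sales = [(dim_customer.index(r["customer"]), r["amount"])
              for r in ods if r["amount"] > 0]
print(fact_sales)  # [(0, 120.5)]
```

Note how the two spellings of the customer name conform to a single dimension member; that conformance step is the heart of the integration layer.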

Benefits

A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:

- Congregate data from multiple sources into a single database so a single query engine can be used to present data.
- Mitigate the problem of database isolation level lock contention in transaction processing systems caused by attempts to run large, long-running analysis queries in transaction processing databases.
- Maintain data history, even if the source transaction systems do not.
- Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
- Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
- Present the organization's information consistently.
- Provide a single common data model for all data of interest regardless of the data's source.
- Restructure the data so that it makes sense to the business users.
- Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
- Add value to operational business applications, notably customer relationship management (CRM) systems.
- Make decision-support queries easier to write.

Generic data warehouse environment

The environment for data warehouses and marts includes the following:

- Source systems that provide data to the warehouse or mart;
- Data integration technology and processes that are needed to prepare the data for use;
- Different architectures for storing data in an organization's data warehouse or data marts;
- Different tools and applications for the variety of users;
- Metadata, data quality, and governance processes that must be in place to ensure that the warehouse or mart meets its purposes.

In regard to the source systems listed above, Rainer states, "A common source for the data in data warehouses is the company's operational databases, which can be relational databases."[6]

Regarding data integration, Rainer states, "It is necessary to extract data from source systems, transform them, and load them into a data mart or warehouse."[6]

Rainer discusses storing data in an organization's data warehouse or data marts.[6]

Metadata are data about data. IT personnel need information about data sources; database, table, and column names; refresh schedules; and data usage measures.[6]
Today, the most successful companies are those that
can respond quickly and flexibly to market changes and
opportunities. A key to this response is the effective and
efficient use of data and information by analysts and
managers.[6] A data warehouse is a repository of
historical data that are organized by subject to support
decision makers in the organization.[6] Once data are
stored in a data mart or warehouse, they can be
accessed.

History
The concept of data warehousing dates back to the late
1980s[7] when IBM researchers Barry Devlin and Paul
Murphy developed the "business data warehouse". In
essence, the data warehousing concept was intended
to provide an architectural model for the flow of data
from operational systems to decision support
environments. The concept attempted to address the
various problems associated with this flow, mainly the
high costs associated with it. In the absence of a data
warehousing architecture, an enormous amount of
redundancy was required to support multiple decision
support environments. In larger corporations it was
typical for multiple decision support environments to
operate independently. Though each environment
served different users, they often required much of the
same stored data. The process of gathering, cleaning
and integrating data from various sources, usually from
long-term existing operational systems (usually referred

to as legacy systems), was typically in part replicated


for each environment. Moreover, the operational
systems were frequently reexamined as new decision
support requirements emerged. Often new
requirements necessitated gathering, cleaning and
integrating new data from "data marts" that were
tailored for ready access by users.
Key developments in early years of data warehousing were:

- 1960s – General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.[8]
- 1970s – ACNielsen and IRI provide dimensional data marts for retail sales.[8]
- 1970s – Bill Inmon begins to define and discuss the term Data Warehouse.[citation needed]
- 1975 – Sperry Univac introduces MAPPER (MAintain, Prepare, and Produce Executive Reports), a database management and reporting system that includes the world's first 4GL. It was the first platform designed for building Information Centers (a forerunner of contemporary enterprise data warehousing platforms).
- 1983 – Teradata introduces a database management system specifically designed for decision support.
- 1983 – Martyn Richard Jones[9] of Sperry Corporation defines the Sperry Information Center approach, which, while not a true DW in the Inmon sense, did contain many of the characteristics of DW structures and processes as defined previously by Inmon, and later by Devlin. It was first used at TSB England & Wales. A subset of this work found its way into the much later papers of Devlin and Murphy.
- 1984 – Metaphor Computer Systems, founded by David Liddle and Don Massaro, releases Data Interpretation System (DIS). DIS was a hardware/software package and GUI for business users to create a database management and analytic system.
- 1988 – Barry Devlin and Paul Murphy publish the article "An architecture for a business and information system", in which they introduce the term "business data warehouse".[10]
- 1990 – Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system specifically for data warehousing.
- 1991 – Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.
- 1992 – Bill Inmon publishes the book Building the Data Warehouse.[11]
- 1995 – The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
- 1996 – Ralph Kimball publishes the book The Data Warehouse Toolkit.[12]
- 2000 – Daniel Linstedt releases the Data Vault, enabling real-time, auditable data warehouses.
- 2012 – Bill Inmon develops and makes public technology known as textual disambiguation. Textual disambiguation applies context to raw text and reformats the raw text and context into a standard database format. Once raw text is passed through textual disambiguation, it can easily and efficiently be accessed and analyzed by standard business intelligence technology. Textual disambiguation is accomplished through the execution of textual ETL. It is useful wherever raw text is found, such as in documents, Hadoop, email, and so forth.

Information storage

Facts

A fact is a value or measurement which represents a fact about the managed entity or system.

Facts, as reported by the reporting entity, are said to be at the raw level. For example, if a BTS (base transceiver station) received 1,000 requests for traffic channel allocation, allocated 820, and rejected the remaining, it would report three facts or measurements to a management system:

- tch_req_total = 1000
- tch_req_success = 820
- tch_req_fail = 180

Facts at the raw level are further aggregated to higher levels in various dimensions to extract more service- or business-relevant information from them. These are called aggregates or summaries or aggregated facts. For example, if there are three BTSs in a city, then the facts above can be aggregated from the BTS level to the city level in the network dimension.
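The BTS-to-city aggregation just described can be sketched directly. This is a hedged illustration in plain Python: the first row reuses the figures from the text, while the two additional BTS rows are invented for the example:

```python
# Aggregating raw traffic-channel facts from BTS level to city level along
# the network dimension. bts1 uses the figures from the text; bts2 and bts3
# are invented.
bts_facts = [
    {"bts": "bts1", "tch_req_total": 1000, "tch_req_success": 820, "tch_req_fail": 180},
    {"bts": "bts2", "tch_req_total": 600,  "tch_req_success": 540, "tch_req_fail": 60},
    {"bts": "bts3", "tch_req_total": 400,  "tch_req_success": 380, "tch_req_fail": 20},
]

# The city-level aggregated fact is the sum of each measure over all BTSs.
city_fact = {
    measure: sum(row[measure] for row in bts_facts)
    for measure in ("tch_req_total", "tch_req_success", "tch_req_fail")
}
print(city_fact)
# {'tch_req_total': 2000, 'tch_req_success': 1740, 'tch_req_fail': 260}
```

The same pattern extends to other dimensions (e.g. aggregating hourly facts up to daily facts along the time dimension).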

Dimensional vs. normalized approach for storage of data

There are several leading approaches to storing data in a data warehouse; the two most important are the dimensional approach and the normalized approach.
The dimensional approach refers to Ralph Kimball's approach, in which it is stated that the data warehouse should be modeled using a dimensional model/star schema. The normalized approach, also called the 3NF model (third normal form), refers to Bill Inmon's approach, in which it is stated that the data warehouse should be modeled using an E-R model/normalized model.
In a dimensional approach, transaction data are
partitioned into "facts", which are generally numeric
transaction data, and "dimensions", which are the
reference information that gives context to the facts.
For example, a sales transaction can be broken up into
facts such as the number of products ordered and the
price paid for the products, and into dimensions such
as order date, customer name, product number, order
ship-to and bill-to locations, and salesperson
responsible for receiving the order.
A key advantage of a dimensional approach is that the data warehouse is easier for the user to understand and to use. Also, the retrieval of data from the data warehouse tends to operate very quickly.[citation needed] Dimensional structures are easy to understand for business users, because the structure is divided into measurements/facts and context/dimensions. Facts are related to the organization's business processes and operational system, whereas the dimensions surrounding them contain context about the measurement (Kimball, Ralph 2008).
The main disadvantages of the dimensional approach
are the following:
1. In order to maintain the integrity of facts and
dimensions, loading the data warehouse with
data from different operational systems is
complicated.
2. It is difficult to modify the data warehouse
structure if the organization adopting the
dimensional approach changes the way in
which it does business.
In the normalized approach, the data in the data warehouse are stored following, to a degree, database normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.). The normalized structure divides data into entities, which creates several tables in a relational database. When applied in large enterprises the result is dozens of tables that are linked together by a web of joins. Furthermore, each of the created entities is converted into separate physical tables when the database is implemented (Kimball, Ralph 2008)[citation needed]. The main advantage of this approach is that it is straightforward to add information into the database. Some disadvantages of this approach are that, because of the number of tables involved, it can be difficult for users to join data from different sources into meaningful information and to access the information without a precise understanding of the sources of data and of the data structure of the data warehouse.
Both normalized and dimensional models can be
represented in entity-relationship diagrams as both
contain joined relational tables. The difference between
the two models is the degree of normalization (also
known as Normal Forms). These approaches are not
mutually exclusive, and there are other approaches.
Dimensional approaches can involve normalizing data
to a degree (Kimball, Ralph 2008).
In Information-Driven Business,[13] Robert Hillard proposes an approach to comparing the two approaches based on the information needs of the business problem. The technique shows that normalized models hold far more information than their dimensional equivalents (even when the same fields are used in both models), but this extra information comes at the cost of usability. The technique measures information quantity in terms of information entropy and usability in terms of the Small Worlds data transformation measure.[14]
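A minimal star schema of the kind the dimensional approach produces can be sketched with sqlite3. All table names, keys, and rows here are hypothetical, chosen only to show the characteristic shape: one fact table of numeric measures keyed to surrounding dimension tables:

```python
# A small star-schema sketch: one fact table joined to dimension tables.
# Table and column names are invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER,
                         units INTEGER, revenue REAL);
INSERT INTO dim_date VALUES (1, '2015-08'), (2, '2015-09');
INSERT INTO dim_product VALUES (10, 'widget'), (11, 'gadget');
INSERT INTO fact_sales VALUES (1, 10, 3, 30.0), (1, 11, 1, 25.0),
                              (2, 10, 5, 50.0);
""")

# The typical dimensional query: join facts to a dimension, group by context.
rows = cur.execute("""
    SELECT d.month, SUM(f.revenue)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.month ORDER BY d.month
""").fetchall()
print(rows)  # [('2015-08', 55.0), ('2015-09', 50.0)]
```

A fully normalized (3NF) design would split the same information across many more tables, which is the usability trade-off Hillard's comparison quantifies.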

Design methodologies

Bottom-up design
In the bottom-up approach, data marts are first created
to provide reporting and analytical capabilities for
specific business processes. These data marts can
then be integrated to create a comprehensive data
warehouse. The data warehouse bus architecture is
primarily an implementation of "the bus", a collection
of conformed dimensions and conformed facts, which
are dimensions that are shared (in a specific way)
between facts in two or more data marts.

Top-down design
The top-down approach is designed using a normalized
enterprise data model. "Atomic" data, that is, data at
the lowest level of detail, are stored in the data
warehouse. Dimensional data marts containing data
needed for specific business processes or specific
departments are created from the data warehouse.[15]

Hybrid design

Data warehouses (DW) often resemble the hub and spokes architecture. Legacy systems feeding the warehouse often include customer relationship management and enterprise resource planning, generating large amounts of data. To consolidate these various data models, and facilitate the extract transform load process, data warehouses often make use of an operational data store, the information from which is parsed into the actual DW. To reduce data redundancy, larger systems often store the data in a normalized way. Data marts for specific reports can then be built on top of the DW.
The DW database in a hybrid solution is kept in third normal form to eliminate data redundancy. A normal relational database, however, is not efficient for business intelligence reports where dimensional modelling is prevalent. Small data marts can shop for data from the consolidated warehouse and use the filtered, specific data for the fact tables and dimensions required. The DW provides a single source of information from which the data marts can read, providing a wide range of business information. The hybrid architecture allows a DW to be replaced with a master data management solution where operational, not static, information could reside.
The Data Vault Modeling components follow hub and
spokes architecture. This modeling style is a hybrid
design, consisting of the best practices from both third
normal form and star schema. The Data Vault model is
not a true third normal form, and breaks some of its
rules, but it is a top-down architecture with a bottom up
design. The Data Vault model is geared to be strictly a
data warehouse. It is not geared to be end-user
accessible, which when built, still requires the use of a
data mart or star schema based release area for
business purposes.

Data warehouses versus operational systems
Operational systems are optimized for preservation
of data integrity and speed of recording of business
transactions through use of database normalization and
an entity-relationship model. Operational system
designers generally follow the Codd rules of database
normalization in order to ensure data integrity. Codd
defined five increasingly stringent rules of
normalization. Fully normalized database designs (that
is, those satisfying all five Codd rules) often result in
information from a business transaction being stored in
dozens to hundreds of tables. Relational databases are
efficient at managing the relationships between these
tables. The databases have very fast insert/update
performance because only a small amount of data in
those tables is affected each time a transaction is
processed. Finally, in order to improve performance,
older data are usually periodically purged from
operational systems.
Data warehouses are optimized for analytic access patterns. Analytic access patterns generally involve selecting specific fields and rarely if ever 'select *', which is more common in operational databases. Because of these differences in access patterns, operational databases (loosely, OLTP) benefit from the use of a row-oriented DBMS, whereas analytics databases (loosely, OLAP) benefit from the use of a column-oriented DBMS. Unlike operational systems, which maintain a snapshot of the business, data warehouses generally maintain an infinite history, which is implemented through ETL processes that periodically migrate data from the operational systems over to the data warehouse.
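The row-oriented versus column-oriented distinction above can be illustrated with a toy sketch in plain Python (the records and field names are invented): the same analytic query touches one contiguous column in the columnar layout, but must visit every whole record in the row layout.

```python
# Toy illustration of row-oriented vs. column-oriented storage.

# Row-oriented: each record stored together, as an OLTP engine would.
rows = [{"id": i, "customer": f"c{i}", "amount": float(i), "note": "..."}
        for i in range(1, 6)]

# Column-oriented: each field stored contiguously, as an analytics engine would.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# Analytic query "SELECT SUM(amount)": reads one column, ignores the rest.
total_columnar = sum(columns["amount"])

# The same query against row storage must visit every whole record.
total_rowwise = sum(r["amount"] for r in rows)

print(total_columnar, total_rowwise)  # 15.0 15.0
```

Real column stores add compression and vectorized execution on top of this layout, but the access-pattern argument is the same.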

Evolution in organization use

These terms refer to the level of sophistication of a data warehouse:

Offline operational data warehouse
Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems, and the data is stored in an integrated, reporting-oriented data store.
Offline data warehouse
Data warehouses at this stage are updated from data in the operational systems on a regular basis, and the data warehouse data are stored in a data structure designed to facilitate reporting.
On-time data warehouse
Online integrated data warehousing represents the real-time stage: data in the warehouse is updated for every transaction performed on the source data.
Integrated data warehouse
These data warehouses assemble data from different areas of business, so users can look up the information they need across other systems.[16]

See also

- Accounting intelligence
- Anchor Modeling
- Business intelligence
- Business intelligence tools
- Data integration
- Data mart
- Data mining
- Data presentation architecture
- Data scraping
- Data warehouse appliance
- Database management system
- Decision support system
- Data Vault Modeling
- Executive information system
- Extract, transform, load
- Master data management
- Online analytical processing
- Online transaction processing
- Operational data store
- Semantic warehousing
- Snowflake schema
- Software as a service
- Star schema
- Slowly changing dimension

References

1. Data Mart Concepts, http://docs.oracle.com/html/E10312_01/dm_concepts.htm
2. OLTP vs OLAP, http://datawarehouse4u.info/OLTP-vs-OLAP.html
3. Predictive Analysis, http://olap.com/category/bisolutions/predictive-analytics
4. Patil, Preeti S.; Srikantha Rao; Suryakant B. Patil (2011). "Optimization of Data Warehousing System: Simplification in Reporting and Analysis". IJCA Proceedings on International Conference and Workshop on Emerging Trends in Technology (ICWET) (Foundation of Computer Science) 9 (6): 33–37.
5. Marakas & O'Brien 2009.
6. Rainer, R. Kelly (2012-05-01). Introduction to Information Systems: Enabling and Transforming Business, 4th Edition (Kindle Edition). Wiley. pp. 127, 128, 130, 131, 133.
7. "The Story So Far". 2002-04-15. Retrieved 2008-09-21.
8. Kimball 2002, pg. 16.
9. "Jones, Martyn Richard". martynjones.eu. Retrieved 2014-10-01.
10. "An architecture for a business and information system". IBM Systems Journal.
11. Inmon, Bill (1992). Building the Data Warehouse. Wiley. ISBN 0-471-56960-7.
12. Kimball, Ralph (1996). The Data Warehouse Toolkit. Wiley. ISBN 0-471-15337-0.
13. Hillard, Robert (2010). Information-Driven Business. Wiley. ISBN 978-0-470-62577-4.
14. "Information Theory & Business Intelligence Strategy – Small Worlds Data Transformation Measure – MIKE2.0, the open source methodology for Information Development". Mike2.openmethodology.org. Retrieved 2013-06-14.
15. Gartner, Of Data Warehouses, Operational Data Stores, Data Marts and Data Outhouses, Dec 2005.
16. "Data Warehouse".

Further reading

- Davenport, Thomas H. and Harris, Jeanne G. Competing on Analytics: The New Science of Winning (2007) Harvard Business School Press. ISBN 978-1-4221-0332-6
- Ganczarski, Joe. Data Warehouse Implementations: Critical Implementation Factors Study (2009) VDM Verlag. ISBN 3-639-18589-7, ISBN 978-3-639-18589-8
- Kimball, Ralph and Ross, Margy. The Data Warehouse Toolkit, Second Edition (2002) John Wiley and Sons, Inc. ISBN 0-471-20024-7
- Linstedt, Graziano, Hultgren. The Business of Data Vault Modeling, Second Edition (2010) Dan Linstedt. ISBN 978-1-4357-1914-9
- William Inmon. Building the Data Warehouse (2005) John Wiley and Sons. ISBN 978-81-265-0645-3

External links

- Ralph Kimball articles
- International Journal of Computer Applications
- Data Warehouse Introduction
- Time to Reconsider the Data Warehouse (Global Association of Risk Professionals)



This page was last modified on 30 September 2015, at 04:07. Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. Wikipedia is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


Oracle9i Data Warehousing Guide
Release 2 (9.2)
Part Number A96520-01


1 Data Warehousing Concepts

This chapter provides an overview of the Oracle data warehousing implementation. It includes:

What is a Data Warehouse?

Data Warehouse Architectures

Note that this book is meant as a supplement to standard texts about data warehousing. This book
focuses on Oracle-specific material and does not reproduce in detail material of a general nature.
Two standard texts are:

The Data Warehouse Toolkit by Ralph Kimball (John Wiley and Sons, 1996)

Building the Data Warehouse by William Inmon (John Wiley and Sons, 1996)

What is a Data Warehouse?

A data warehouse is a relational database that is designed for query and analysis rather than for
transaction processing. It usually contains historical data derived from transaction data, but it can
include data from other sources. It separates analysis workload from transaction workload and
enables an organization to consolidate data from several sources.
In addition to a relational database, a data warehouse environment includes an extraction,
transportation, transformation, and loading (ETL) solution, an online analytical processing
(OLAP) engine, client analysis tools, and other applications that manage the process of gathering
data and delivering it to business users.
See Also: Chapter 10, "Overview of Extraction, Transformation, and Loading"

A common way of introducing data warehousing is to refer to the characteristics of a data
warehouse as set forth by William Inmon:

Subject Oriented

Integrated

Nonvolatile

Time Variant

Subject Oriented

Data warehouses are designed to help you analyze data. For example, to learn more about your
company's sales data, you can build a warehouse that concentrates on sales. Using this
warehouse, you can answer questions like "Who was our best customer for this item last year?"
This ability to define a data warehouse by subject matter, sales in this case, makes the data
warehouse subject oriented.
Integrated

Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming conflicts
and inconsistencies among units of measure. When they achieve this, they are said to be
integrated.
Nonvolatile

Nonvolatile means that, once entered into the warehouse, data should not change. This is logical
because the purpose of a warehouse is to enable you to analyze what has occurred.
Time Variant

In order to discover trends in business, analysts need large amounts of data. This is very much in
contrast to online transaction processing (OLTP) systems, where performance requirements
demand that historical data be moved to an archive. A data warehouse's focus on change over
time is what is meant by the term time variant.
Contrasting OLTP and Data Warehousing Environments

Figure 1-1 illustrates key differences between an OLTP system and a data warehouse.

Figure 1-1 Contrasting OLTP and Data Warehousing Environments

Text description of the illustration dwhsg005.gif

One major difference between the types of system is that data warehouses are not usually in
third normal form (3NF), a type of data normalization common in OLTP environments.
Data warehouses and OLTP systems have very different requirements. Here are some examples
of differences between typical data warehouses and OLTP systems:

Workload

Data warehouses are designed to accommodate ad hoc queries. You might not know the
workload of your data warehouse in advance, so a data warehouse should be optimized to
perform well for a wide variety of possible query operations.
OLTP systems support only predefined operations. Your applications might be
specifically tuned or designed to support only these operations.

Data modifications

A data warehouse is updated on a regular basis by the ETL process (run nightly or
weekly) using bulk data modification techniques. The end users of a data warehouse do
not directly update the data warehouse.
In OLTP systems, end users routinely issue individual data modification statements to the
database. The OLTP database is always up to date, and reflects the current state of each
business transaction.

Schema design

Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query performance.
OLTP systems often use fully normalized schemas to optimize update/insert/delete
performance, and to guarantee data consistency.

Typical operations

A typical data warehouse query scans thousands or millions of rows. For example, "Find
the total sales for all customers last month."
A typical OLTP operation accesses only a handful of records. For example, "Retrieve the
current order for this customer."

Historical data

Data warehouses usually store many months or years of data. This is to support historical
analysis.
OLTP systems usually store data from only a few weeks or months. The OLTP system
stores only historical data as needed to successfully meet the requirements of the current
transaction.
Data Warehouse Architectures

Data warehouses and their architectures vary depending upon the specifics of an organization's
situation. Three common architectures are:

Data Warehouse Architecture (Basic)

Data Warehouse Architecture (with a Staging Area)

Data Warehouse Architecture (with a Staging Area and Data Marts)

Data Warehouse Architecture (Basic)

Figure 1-2 shows a simple architecture for a data warehouse. End users directly access data
derived from several source systems through the data warehouse.

Figure 1-2 Architecture of a Data Warehouse


In Figure 1-2, the metadata and raw data of a traditional OLTP system are present, as is an
additional type of data: summary data. Summaries are very valuable in data warehouses because
they pre-compute long-running operations. For example, a typical data warehouse query is to
retrieve something like August sales. In Oracle, a summary is called a materialized view.
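SQLite has no materialized views, but the idea of a precomputed summary can be sketched by materializing an aggregate into its own table. The sales data below is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2015-08", 100.0), ("2015-08", 250.0), ("2015-07", 75.0)],
)

# Pre-compute the long-running aggregation once, in the spirit of a
# materialized view: the result is stored, not re-derived per query.
conn.execute("""
    CREATE TABLE sales_summary AS
    SELECT sale_month, SUM(amount) AS total_sales
    FROM sales
    GROUP BY sale_month
""")

# "August sales" now reads one precomputed row instead of scanning
# the full fact table.
august = conn.execute(
    "SELECT total_sales FROM sales_summary WHERE sale_month = '2015-08'"
).fetchone()[0]
print(august)  # 350.0
```

A real materialized view also handles refresh as new data arrives, which this sketch omits.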
Data Warehouse Architecture (with a Staging Area)

In Figure 1-2, you need to clean and process your operational data before putting it into the
warehouse. You can do this programmatically, although most data warehouses use a staging area
instead. A staging area simplifies building summaries and general warehouse management.
Figure 1-3 illustrates this typical architecture.

Figure 1-3 Architecture of a Data Warehouse with a Staging Area

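The role of a staging area can be sketched in plain Python: hypothetical rows are extracted from a source system, cleaned and validated in the staging step, and only then loaded into the warehouse. This illustrates the concept only, not any particular ETL tool:

```python
# Hypothetical rows extracted from an operational source system.
source_rows = [
    {"customer": " Alice ", "amount": "100.0"},
    {"customer": "Bob",     "amount": None},     # bad record: no amount
    {"customer": "carol",   "amount": "250.5"},
]

# Staging area: clean, validate, and standardize before loading.
staged = []
for row in source_rows:
    if row["amount"] is None:            # reject records that fail validation
        continue
    staged.append({
        "customer": row["customer"].strip().title(),  # standardize names
        "amount": float(row["amount"]),               # coerce to numeric type
    })

# Load step: only cleaned rows reach the warehouse tables.
warehouse = list(staged)
print(warehouse)
```

Keeping this work in a separate staging area means failed or partial loads never leave half-cleaned data visible to warehouse queries.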

Data Warehouse Architecture (with a Staging Area and Data Marts)

Although the architecture in Figure 1-3 is quite common, you may want to customize your
warehouse's architecture for different groups within your organization. You can do this by adding
data marts, which are systems designed for a particular line of business. Figure 1-4 illustrates an
example where purchasing, sales, and inventories are separated. In this example, a financial
analyst might want to analyze historical data for purchases and sales.

Figure 1-4 Architecture of a Data Warehouse with a Staging Area and Data Marts



Note:

Data marts are an important part of many warehouses, but they are not the
focus of this book.

See Also:

Data Mart Suites documentation for further information regarding data marts

Copyright 1996, 2002 Oracle Corporation. All Rights Reserved.


data warehouse definition

Posted by: Margaret Rouse, WhatIs.com

A data warehouse is a federated repository for all the data that an enterprise's
various business systems collect. The repository may be physical or logical.

Data warehousing emphasizes the capture of data from diverse sources for
useful analysis and access, but does not generally start from the point of view
of the end user, who may need access to specialized, sometimes local
databases. The latter idea is known as the data mart.
There are two approaches to data warehousing: top-down and bottom-up. The
top-down approach spins off data marts for specific groups of users after the
complete data warehouse has been created. The bottom-up approach builds
the data marts first and then combines them into a single, all-encompassing
data warehouse.
Typically, a data warehouse is housed on an enterprise mainframe server or,
increasingly, in the cloud. Data from various online transaction processing
(OLTP) applications and other sources is selectively extracted for use by
analytical applications and user queries.

The term data warehouse was coined by William H. Inmon, who is known as
the Father of Data Warehousing. Inmon described a data warehouse as being
a subject-oriented, integrated, time-variant and nonvolatile collection of data
that supports management's decision-making process.
This was first published in February 2015

All Rights Reserved, Copyright 2005 - 2015, TechTarget

Data Warehousing

Data warehousing incorporates data stores and conceptual, logical, and physical
models to support business goals and end-user information needs. A data warehouse
(DW) is the foundation for a successful BI program.
Creating a DW requires mapping data between sources and targets, then capturing the
details of the transformation in a metadata repository. The data warehouse provides a
single, comprehensive source of current and historical information.
Data warehousing techniques and tools include DW appliances, platforms,
architectures, data stores, and spreadmarts; database architectures, structures,
scalability, security, and services; and DW as a service.
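The idea of capturing source-to-target mappings in a metadata repository can be sketched in Python. The source fields, target columns, and transforms below are all hypothetical, invented only to illustrate the shape of such metadata:

```python
# Hypothetical source-to-target mappings, as might be recorded in a
# metadata repository during warehouse design.
mappings = [
    {"source": "crm.cust_nm",  "target": "dim_customer.name",
     "transform": lambda v: v.strip().title()},
    {"source": "erp.sale_amt", "target": "fact_sales.amount",
     "transform": float},
]

def apply_mappings(source_record):
    """Transform one source record into target columns using the metadata."""
    return {
        m["target"]: m["transform"](source_record[m["source"]])
        for m in mappings
        if m["source"] in source_record
    }

row = apply_mappings({"crm.cust_nm": " alice ", "erp.sale_amt": "19.99"})
print(row)  # {'dim_customer.name': 'Alice', 'fact_sales.amount': 19.99}
```

Driving the transformation from declarative metadata, rather than hard-coding it, is what lets a repository document lineage: each target column can be traced back to its source field and transform.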

This Data Warehousing site aims to help people get a good high-level
understanding of what it takes to implement a successful data warehouse
project. A lot of the information is from my personal experience as a business
intelligence professional, both as a client and as a vendor.
This site is divided into six main areas:
- Tools: The selection of business intelligence tools and the selection of the
data warehousing team. Tools covered are:
Database, Hardware
ETL (Extraction, Transformation, and Loading)
OLAP
Reporting
Metadata

- Steps: This section contains the typical milestones for a data warehousing
project, from requirement gathering and query optimization to production
rollout and beyond. I also offer my observations on the data warehousing field.
- Business Intelligence: Business intelligence is closely related to data
warehousing. This section discusses business intelligence, as well as the
relationship between business intelligence and data warehousing.
- Concepts: This section discusses several concepts particular to the data
warehousing field. Topics include:
Dimensional Data Model
Star Schema
Snowflake Schema
Slowly Changing Dimension
Conceptual Data Model
Logical Data Model
Physical Data Model
Conceptual, Logical, and Physical Data Model
Data Integrity
What is OLAP
MOLAP, ROLAP, and HOLAP
Bill Inmon vs. Ralph Kimball
- Business Intelligence Conferences: Lists upcoming conferences in the
business intelligence / data warehousing industry.
- Glossary: A glossary of common data warehousing terms.

This site is updated frequently to reflect the latest technology, information, and
reader feedback. Please bookmark this site now.

Copyright 2015 1keydata.com All Rights Reserved

