Data Warehouse:
The Building Blocks

Chapter Objectives

º Review the formal definition of a data warehouse
º Discuss the defining features
º Distinguish between data warehouses and data marts
º Study each component or building block that makes up a data warehouse
º Introduce metadata and highlight its significance

º Information delivery system
º Integrate and transform enterprise data into information
± suitable for strategic decision making
º Take all the historic data from the various operational systems
± Combine this internal data with any relevant data from outside sources
± Pull them together

º Need different components or building blocks
± Arranged together in the most optimal way
± Arranged in a suitable architecture

Bill Inmon

º The father of data warehousing
º "A Data Warehouse is a subject oriented, integrated, nonvolatile, and time variant collection of data in support of management's decisions."

Sean Kelly

º Another leading data warehouse practitioner
º The data in the data warehouse is:
± Separate
± Available
± Integrated
± Time stamped
± Subject oriented
± Nonvolatile
± Accessible

Defining Features

º What about the nature of the data in the data warehouse?
º How is this data different from the data in any operational system?
º Why does it have to be different?
º How is the data content in the data warehouse used?

º Some Key Defining Features of the Data Warehouse
± Subject-Oriented Data
± Integrated Data
± Time-Variant Data
± Nonvolatile Data
± Data Granularity

Subject-Oriented Data

º Data is stored by subjects, not by applications
º The subjects are critical for the enterprise
± Sales, shipments, and inventory for a manufacturing company
± Figure 2-1
º There is no application flavor
º The data in a data warehouse cuts across applications

Integrated Data

º Need to pull together all the relevant data from the various systems
± Data from internal operational systems
± Data from outside sources
º Before the data can be stored in a data warehouse:
± Remove the inconsistencies
± Standardize the various data elements
± Go through a process of transformation, consolidation, and integration of the source data

º Some of the items that would need standardization:
± Naming conventions
± Codes
± Data attributes
± Measurements
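
The standardization items above can be sketched in a few lines. This is only an illustration under stated assumptions: the two source systems, their field names, the gender code table, and the choice of kilograms as the common unit are all hypothetical and do not come from the slides.

```python
# Hypothetical mappings: every field name and code value here is an assumption.
FIELD_NAMES = {          # naming conventions: many source names, one warehouse name
    "cust_nm": "customer_name", "CustomerName": "customer_name",
    "sex": "gender", "gender_cd": "gender",
    "wt_lb": "weight_lb",
}
GENDER_CODES = {"m": "M", "f": "F", "1": "M", "2": "F", "male": "M", "female": "F"}
LB_PER_KG = 2.20462      # measurements: convert pounds to kilograms


def standardize(record: dict) -> dict:
    """Map one source record onto common names, codes, and units."""
    out = {}
    for field, value in record.items():
        name = FIELD_NAMES.get(field, field)
        if name == "gender":
            value = GENDER_CODES[str(value).strip().lower()]
        elif name == "weight_lb":
            name, value = "weight_kg", round(float(value) / LB_PER_KG, 2)
        out[name] = value
    return out


order_system = {"cust_nm": "Ann Lee", "sex": "F", "wt_lb": 22.0462}
billing_system = {"CustomerName": "Ann Lee", "gender_cd": 2, "weight_kg": 10.0}
```

After standardization, the two records describe the same customer in the same terms, which is what allows them to be integrated.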

Time-Variant Data

º For an operational system,
± the stored data contains the current values
º The data in the data warehouse is meant for analysis and decision making.
± The user needs data not only about the current purchase, but also about past purchases.
º A data warehouse has to contain historical data, not just current values.

º The time-variant nature of the data
± Allows for analysis of the past
± Relates information to the present
± Enables forecasts for the future
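
One way to picture time-variant data is an "as of" lookup over dated snapshots. The sketch below assumes a hypothetical price history kept in date order; the values and the function name are illustrative only.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history: (effective date, price), sorted by date.
HISTORY = [
    (date(2023, 1, 1), 9.99),
    (date(2023, 7, 1), 10.49),
    (date(2024, 1, 1), 10.99),
]


def price_as_of(history, day):
    """Return the value that was in effect on the given day."""
    dates = [effective for effective, _ in history]
    position = bisect_right(dates, day)   # snapshots dated on or before `day`
    if position == 0:
        raise LookupError("no value recorded on or before that date")
    return history[position - 1][1]
```

An operational system would hold only the latest price; keeping every dated value is what makes questions about the past answerable.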

Nonvolatile Data

º The data in the data warehouse is not intended to run the day-to-day business.
± You do not update the data warehouse every time you process a single order.
º Data from the operational systems is moved into the data warehouse at specific intervals.
º Figure 2-3: data is loaded, not updated

Data Granularity

º The analysis begins at a high level and moves down to lower levels of detail
± Start by looking at summary data
± Look at the breakdown
º Data granularity in a data warehouse refers to the level of detail
± The lower the level of detail, the finer the data granularity
± The lowest level of detail means a lot of data in the data warehouse
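
The move from summary data down to finer detail can be sketched as grouping the same rows at different levels. The sales rows and the `summarize` helper below are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, month, product, units sold).
SALES = [
    (2023, 1, "widget", 10),
    (2023, 1, "gadget", 5),
    (2023, 2, "widget", 7),
    (2024, 1, "widget", 12),
]


def summarize(rows, level):
    """Total units grouped by the first `level` columns.

    level 1 = by year, level 2 = by year and month, level 3 = full detail.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row[:level]] += row[-1]
    return dict(totals)


by_year = summarize(SALES, 1)    # coarse summary
by_month = summarize(SALES, 2)   # one level of drill-down
```

Each finer level produces more rows, which is the storage cost the last bullet refers to.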

Data Warehouses and Data Marts

º In 1998, Bill Inmon stated, "The single most important issue facing the IT manager this year is whether to build the data warehouse first or the data mart first."

º Before deciding to build a data warehouse, you need to ask:
± Top-down or bottom-up approach?
± Enterprise-wide or departmental?
± Which first: data warehouse or data mart?
± Build a pilot or go with a full-fledged implementation?
± Dependent or independent data marts?

º Figure 2-5
º Two different basic approaches
1) Overall data warehouse feeding dependent data marts
2) Several departmental or local data marts combining into a data warehouse

Top-Down Approach: Advantages

º A truly corporate effort, an enterprise view of data
º Inherently architected, not a union of disparate data marts
º Single, central storage of data about the content
º Centralized rules and control
º May see quick results if implemented with iterations

Top-Down Approach: Disadvantages

º Takes longer to build even with an iterative method
º High exposure to risk of failure
º Needs a high level of cross-functional skills
º High outlay without proof of concept

Bottom-Up Approach: Advantages

º Faster and easier implementation of manageable pieces
º Favorable return on investment and proof of concept
º Less risk of failure
º Inherently incremental; can schedule important data marts first
º Allows project team to learn and grow

Bottom-Up Approach: Disadvantages

º Each data mart has its own narrow view of data
º Permeates redundant data in every data mart
º Perpetuates inconsistent and irreconcilable data
º Proliferates unmanageable interfaces

A Practical Approach

1. Plan and define requirements at the overall corporate level
2. Create a surrounding architecture for a complete warehouse
3. Conform and standardize the data content
4. Implement the data warehouse as a series of supermarts, one at a time
º Supermarts are carefully architected data marts

º A data mart is a logical subset of the complete data warehouse
º A data warehouse is a conformed union of all data marts
º Individual data marts are targeted to particular business groups
º The collection of all the data marts forms an integrated whole, called the enterprise data warehouse

º Architecture is the proper arrangement of the components
º Build a data warehouse with software and hardware components
º Arrange the building blocks for maximum benefit
º May lay special emphasis on one component

º Figure 2-6: building blocks or components
± Source Data Component
± Data Staging Component
± Data Storage Component: stores and manages the data, keeping track of it by means of the metadata repository
± Information Delivery Component
± Metadata Component
± Management and Control Component

Source Data Component

º Production Data
º Internal Data
º Archived Data
º External Data

Production Data

º Data from the various operational systems
± on different hardware platforms
± supported by different database systems and operating systems
± from many vertical applications
º No conformance of data among the various operational systems
º The significant and disturbing characteristic of production data is disparity
± Standardize, transform, convert, and integrate the disparate data

Internal Data

º Data from users' "private" spreadsheets, documents, customer profiles, and sometimes even departmental databases
º Adds complexity to the process of transforming and integrating the data
± Determine strategies for collecting data from spreadsheets
± Find ways of taking data from textual documents
± Tie into departmental databases to gather pertinent data from these sources

Archived Data

º Operational systems periodically take the old data and store it in archived files
º Many different methods of archiving
± A separate archival database
± Flat files on disk storage
± Tape cartridges or microfilm, possibly kept off-site
º A data warehouse keeps historical snapshots of data
± Look into your archived data sets
± Useful for discerning patterns and analyzing trends

External Data

º Data from external sources for information that most executives use
± Statistics relating to their industry produced by external agencies
± Market share data of competitors
± Standard values of financial indicators for their business
º To spot industry trends and compare performance against other organizations
º Usually, data from outside sources does not conform to your formats

Data Staging Component

º Three major functions need to be performed for getting the data ready
± Extract the data
± Transform the data
± Load the data into the data warehouse storage
º ETT:
± E (Extraction)
± T (Transformation)
± T (Transportation)

º Provide a place and an area with a set of functions to clean, change, combine, convert, deduplicate, and prepare source data for storage and use in the data warehouse

Data Extraction

º Deal with numerous data sources
º Tools for data extraction
± Purchasing outside tools
± Developing in-house programs
º Extract the source data into
± a group of flat files,
± or a data-staging relational database,
± or a combination of both

Data Transformation

º Perform a number of individual tasks
± Cleaning
± Standardization
± Combining
± Purging and separating out
± Sorting and merging
± Assignment of surrogate keys
º Results: a collection of integrated data that is cleaned, standardized, and summarized
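
A few of the transformation tasks above (cleaning, deduplication, surrogate key assignment) can be sketched as follows. The row layout, the source names, and the `transform` function are hypothetical; a real staging area would use whatever rules the business defines.

```python
from itertools import count


def transform(rows):
    """Clean names, drop duplicate extracts, and assign surrogate keys."""
    surrogate = count(1)      # warehouse-generated keys, independent of source IDs
    by_natural_key = {}
    for row in rows:
        natural_key = (row["source"], row["source_id"])
        if natural_key in by_natural_key:
            continue          # the same source record was extracted twice
        by_natural_key[natural_key] = {
            "customer_key": next(surrogate),
            "source": row["source"],
            "source_id": row["source_id"],
            # Cleaning: collapse stray whitespace and normalize casing.
            "name": " ".join(row["name"].split()).title(),
        }
    return list(by_natural_key.values())


extracted = [
    {"source": "orders", "source_id": "A17", "name": "  ann   LEE "},
    {"source": "orders", "source_id": "A17", "name": "Ann Lee"},
    {"source": "billing", "source_id": "9", "name": "bob ray"},
]
cleaned = transform(extracted)
```

The surrogate key is generated inside the staging step, so the warehouse does not depend on how any one source system numbers its records.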

Data Loading

º Two distinct groups of tasks
± The initial loading of the data into the data warehouse
± Refresh cycles: extract the changes to the source data, transform the data revisions, and feed the incremental data revisions on an ongoing basis
º Figure 2-7
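
The two groups of loading tasks can be contrasted in a small sketch. It assumes each hypothetical source row carries an `updated` date, and it appends revisions as new snapshot rows rather than overwriting, in keeping with the nonvolatile, time-variant nature of the warehouse. The function names and row layout are illustrative only.

```python
from datetime import date


def initial_load(source_rows, load_date):
    """Initial load: copy every source row, stamped with the load date."""
    return [dict(row, loaded_on=load_date) for row in source_rows]


def refresh(warehouse, source_rows, last_refresh, load_date):
    """Refresh cycle: append only the rows changed since the last refresh."""
    changes = [row for row in source_rows if row["updated"] > last_refresh]
    warehouse.extend(dict(row, loaded_on=load_date) for row in changes)
    return len(changes)


source = [
    {"id": 1, "balance": 100, "updated": date(2024, 1, 5)},
    {"id": 2, "balance": 50, "updated": date(2024, 1, 9)},
]
warehouse = initial_load(source, date(2024, 1, 31))

# The operational system later changes one row.
source[0] = {"id": 1, "balance": 80, "updated": date(2024, 2, 10)}
applied = refresh(warehouse, source,
                  last_refresh=date(2024, 1, 31), load_date=date(2024, 2, 29))
```

After the refresh, the warehouse holds both the January and February values for the changed row.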

Data Storage Component

º A separate repository
± To keep large volumes of historical data for analysis
± To keep the data in structures suitable for analysis
º Data warehouses are "read-only" data repositories
± The data is stable and it represents snapshots at specified periods
º The database in a data warehouse must be open
± Must be open to different tools
± RDBMSs or MDDBs

Information Delivery Component

º Who are the users?
± The novices, the casual users, the business analysts, and the power users
º Different methods of information delivery
± Ad hoc reports, complex queries, multidimensional analysis, statistical analysis, EIS feed, data-mining applications
º Information delivery mechanism
± Online, internet, intranet, e-mail

Metadata Component

º The data about the data in the data warehouse
º Similar to a data dictionary, but much more than a data dictionary
º (Covered later, in a separate section)

Management and Control Component

º Sits on top of all the other components
± Coordinates the services and activities
± Controls the data transformation and the data transfer into the data warehouse storage
± Moderates the information delivery to the users
± Monitors the movement of data into the staging area and from there into the data warehouse storage
º The metadata is the source of information for the management module

Metadata in the Data Warehouse

º The Yellow Pages
± A directory with data about the institutions
º Types of Metadata
± Operational Metadata
± Extraction and Transformation Metadata
± End-User Metadata

Operational Metadata

º Contains all of the following information about the operational data sources
± Data for the data warehouse comes from several operational systems
± The data elements have various field lengths and data types
± You split records, combine parts of records from different source files, and deal with multiple coding schemes and field lengths

Extraction and Transformation Metadata

º Contains data about the extraction of data from the source systems
± the extraction frequencies,
± the extraction methods,
± and the business rules for the data extractions
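
As a rough sketch, the extraction metadata listed above could be recorded one entry per source system. The class, its field names, and the sample entries are assumptions made for illustration; they are not a prescribed metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionMetadata:
    """One hypothetical metadata entry describing how a source is extracted."""
    source_system: str
    frequency: str                      # e.g. "daily", "monthly"
    method: str                         # e.g. "incremental", "full"
    business_rules: list = field(default_factory=list)


catalog = [
    ExtractionMetadata("order_entry", "daily", "incremental",
                       ["skip cancelled orders"]),
    ExtractionMetadata("general_ledger", "monthly", "full"),
]


def sources_due(entries, frequency):
    """List the source systems scheduled at the given frequency."""
    return [entry.source_system for entry in entries if entry.frequency == frequency]
```

A scheduler or an administrator could consult such a catalog, for example `sources_due(catalog, "daily")`, instead of reading extraction programs to find out what runs when.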

End-User Metadata

º The navigational map
± Enables the end-users to find information
± Allows the end-users to use their own business terminology
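
The "navigational map" idea can be pictured as a lookup from business terms to physical locations. The terms, table names, and column names below are all hypothetical.

```python
# Hypothetical end-user metadata: business term -> (table, column).
BUSINESS_TERMS = {
    "revenue": ("sales_fact", "net_amt"),
    "customer": ("customer_dim", "cust_nm"),
}


def locate(term):
    """Translate a business term into the warehouse column that holds it."""
    table, column = BUSINESS_TERMS[term.strip().lower()]
    return f"{table}.{column}"
```

With such a map, a user can ask for "Revenue" without knowing that the warehouse stores it as `sales_fact.net_amt`.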

Significance of Metadata

º Acts as the glue that connects all parts of the data warehouse
º Provides information about the contents and structures to the developers
º Opens the door to the end-users and makes the contents recognizable in their own terms
