
DWH Architecture

The data warehouse (DWH) architecture defines the components and the structure of the data warehouse. It also defines the interfaces between the components and the external environment. All data warehouse applications consist of several separate logical layers. Each layer interacts with the other layers by passing the required data.

Source Layer

This layer contains all the separate OLTP and legacy systems that provide the business data. These systems are physically separate from each other. This layer could contain data from different operating systems, databases and files.

Landing Area

This is the layer where source data is temporarily stored on the ETL server before it is transformed. We could store the source data in files or load it into database tables. Generally we store the source data in files. We could use a database to house the source data when a DWH with a very low tolerance level needs a sophisticated and fast way of reprocessing rejected data (one of the DWH exception handling processes). Please note that adding a database at the landing area adds the cost of extra infrastructure and maintenance.
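As a minimal sketch of reject handling at the landing area, the Python snippet below reads a landed extract and separates valid rows from rejects so the rejects can be reprocessed later. The column names (customer_id, amount), the validation rule and the function name split_landing_rows are assumptions made for illustration; they are not part of any particular ETL tool.

```python
import csv
import io


def split_landing_rows(landing_file):
    """Read a landed CSV extract and separate valid rows from rejects.

    Hypothetical rule: a row is rejected when customer_id is empty
    or amount cannot be parsed as a number.
    """
    accepted, rejected = [], []
    for row in csv.DictReader(landing_file):
        try:
            if not row["customer_id"]:
                raise ValueError("missing customer_id")
            row["amount"] = float(row["amount"])
            accepted.append(row)
        except ValueError as exc:
            # Keep the original values plus a reason, so the row can be
            # corrected and reprocessed in a later run.
            rejected.append({**row, "reject_reason": str(exc)})
    return accepted, rejected


# Stand-in for a file landed on the ETL server (illustrative data only).
landing = io.StringIO(
    "customer_id,amount\n"
    "C1,10.50\n"
    ",7.00\n"
    "C3,abc\n"
)
accepted, rejected = split_landing_rows(landing)
```

In this sketch the rejects are held in memory; in practice they would be written to a reject file or, where a landing database is justified, to a reject table.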

Staging Area

This is the layer where the cleansed and transformed data is temporarily stored. We generally use files to store the intermediate data during transformation and cleansing. Once the data is ready to be loaded to the warehouse, we load it into the staging database. One advantage of using the staging database is that it adds a point in the ETL flow from which we can restart the load. The other advantages are that we can directly use the bulk load utilities provided by the databases and ETL tools while loading the data into the warehouse/mart, and that it provides a point in the data flow where we can audit the data.

Warehouse/Mart Layer

This is the layer where we store the transformed data. The data here is used for business analysis.

Presentation Layer

This layer contains the tools and processes used to read the data from the warehouse/mart and present it to the end user in a graphical format.

The DWH architecture can be affected by various factors, namely the requirements, the geographical location of the source systems and the budget.
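The role of the staging database as a restart and audit point, described for the staging area above, can be sketched as follows. This is an illustration only, using an in-memory SQLite database; the table names (stg_sales, fact_sales) and the functions load_staging and load_warehouse are hypothetical.

```python
import sqlite3


def load_staging(conn, rows):
    """Load cleansed rows into the staging table, replacing any previous batch."""
    conn.execute("DELETE FROM stg_sales")
    conn.executemany(
        "INSERT INTO stg_sales (sale_id, amount) VALUES (?, ?)", rows
    )
    conn.commit()


def load_warehouse(conn):
    """Load from staging into the warehouse and return audit row counts.

    The load reads only from staging, so it can be rerun after a failure
    without going back to the source systems.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO fact_sales (sale_id, amount) "
            "SELECT sale_id, amount FROM stg_sales"
        )
    staged = conn.execute("SELECT COUNT(*) FROM stg_sales").fetchone()[0]
    loaded = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    return staged, loaded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (sale_id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, amount REAL)")

load_staging(conn, [(1, 10.5), (2, 7.0), (3, 3.25)])
first = load_warehouse(conn)
# Simulated restart: rerun the warehouse load from staging, no re-extraction.
second = load_warehouse(conn)
```

Comparing the staged and loaded counts gives a simple audit check, and rerunning load_warehouse leaves the warehouse table unchanged because rows are keyed on sale_id.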

Requirements impact on DWH architecture

When the business is not clear about the overall requirements for all or multiple subject areas, it is safer to understand the requirements for a single subject area and develop the data mart for that subject area. This leads to a standalone data mart. Later, once the requirements for another subject area are clear, we tend to build another standalone data mart, and so on. This creates multiple standalone data marts, known as a stovepipe or siloed structure. Here the same data is loaded once for every data mart that needs it, and it may not be consistent across data marts.

Care must be taken when we build the first data mart to avoid a stovepipe architecture in the future, when the requirements for other subject areas become clear. We do so by designing dimension tables in the data mart that can be shared with other data marts. These dimension tables are called conformed dimension tables. Using the conformed dimensions we can integrate multiple data marts to form a data warehouse. This approach is called the bottom-up approach, suggested by Ralph Kimball.

When the overall business requirements are clear for all subject areas, we first build an Enterprise Data Warehouse (EDW), which is a third normal form (3NF) schema. From the EDW we build the various subject-specific data marts. This approach is called the top-down approach, suggested by Bill Inmon.

Geographical location of the source systems

When the business operational systems (source systems) are located at different geographical locations, we have two options:

1. Build the local warehouse/mart and, using the local warehouse/mart data, build the global warehouse. This is called the Federated architecture or the Distributed architecture. Here we perform the ETL operations at both the local and the corporate locations.
2. Extract data from the various locations, collect it at a central location and build the mart/warehouse from the data collected there. This is called the Centralized architecture.

Budget

Each layer in the DWH could mean more hardware, more software licenses and eventually a higher maintenance cost. When the business is conservative in its spending, we may not want to go for databases in the landing area and staging area. This would be more cost effective, but we would lose the advantages of having a database in the landing area and staging area.
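To illustrate the conformed dimensions mentioned under the requirements discussion, the sketch below shows two data marts (sales and support) sharing one customer dimension, which lets a single query combine both. It uses an in-memory SQLite database; all table names, column names and data values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed dimension: one definition of customer shared by both marts.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);

-- Fact table of the (hypothetical) sales data mart.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer,
    amount REAL
);

-- Fact table of the (hypothetical) support data mart.
CREATE TABLE fact_support (
    customer_key INTEGER REFERENCES dim_customer,
    tickets INTEGER
);

INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 75.0);
INSERT INTO fact_support VALUES (1, 2), (2, 5), (2, 1);
""")

# Each fact table is aggregated separately, then the results are joined
# through the shared dimension keys.
rows = conn.execute("""
SELECT d.name, s.total_amount, t.total_tickets
FROM dim_customer d
JOIN (SELECT customer_key, SUM(amount) AS total_amount
      FROM fact_sales GROUP BY customer_key) s
  ON s.customer_key = d.customer_key
JOIN (SELECT customer_key, SUM(tickets) AS total_tickets
      FROM fact_support GROUP BY customer_key) t
  ON t.customer_key = d.customer_key
ORDER BY d.name
""").fetchall()
```

Because both marts use the same customer keys, the results line up per customer. With standalone marts that each keep their own copy of the customer data, this kind of combined query would first require reconciling the copies.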
