Broadly speaking, Inmon favors the Snowflake Schema while Kimball relies on the Star Schema.
Kimball views the data warehouse as a collection of data marts. Data marts are focused on delivering
business objectives for individual departments in the organization, and the data warehouse is formed from the
conformed dimensions of those data marts. Hence a unified view of the enterprise can be obtained from
dimensional modeling done at the local, departmental level.
He follows a bottom-up approach, i.e. first create individual data marts from the existing sources and then
combine them into a data warehouse.
Inmon believes in creating a data warehouse on a subject-by-subject-area basis. Hence the development of
the data warehouse can start with the subject areas where the need first arises; point-of-sale (POS) data, for
example, can be added later if management decides it is necessary.
He follows a top-down approach, i.e. first create the data warehouse from the existing sources and then
create individual data marts from it.
Kimball: create data marts first, then combine them to form a data warehouse.
Views:
• Stores the SQL statement in the database and lets you use it as a table. Every time you access the
view, the SQL statement executes.
• A view is a pseudo-table: the result set is not stored in the database, it is just a saved query.
Materialized Views:
• Stores the result of the SQL in table form in the database. The SQL statement executes only once;
after that, every time you run the query the stored result set is used. Pros include fast query results.
• These are similar to views, but the results are stored permanently in the database; they are useful for
aggregation and summarization of data.
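The difference in freshness between the two can be sketched with SQLite (table and view names here are made up for illustration). SQLite has no materialized views, so the snapshot is imitated with CREATE TABLE AS, which is essentially what a materialized view stores:

```python
import sqlite3

# Illustrative sketch: a plain view re-runs its query, a stored snapshot does not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EAST", 100), ("EAST", 50), ("WEST", 70)])

# A plain view stores only the SQL text; the query executes on every access.
conn.execute("CREATE VIEW v_sales AS "
             "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

# Simulated materialized view: the result set itself is stored as a table.
conn.execute("CREATE TABLE mv_sales AS "
             "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

conn.execute("INSERT INTO sales VALUES ('EAST', 25)")

view_total = conn.execute(
    "SELECT total FROM v_sales WHERE region = 'EAST'").fetchone()[0]
mv_total = conn.execute(
    "SELECT total FROM mv_sales WHERE region = 'EAST'").fetchone()[0]
print(view_total, mv_total)  # the view sees the new row, the snapshot does not
```

After the extra insert, the view reflects the new total while the stored snapshot still holds the result as of its last refresh.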
What is Junk Dimension? What is the difference between Junk Dimension and Degenerate Dimension?
Junk Dimension:
A junk dimension is a dimension formed from columns that are rarely used or have no analytical value on
their own, typically low-cardinality flags and indicators grouped together into a single table.
Degenerate Dimension:
A Degenerate Dimension is data that is dimensional in nature but stored in a fact table.
Example:
If we take only the empno and ename columns from the EMP table and store them directly in the fact table,
rather than in a separate dimension table, they form a degenerate dimension.
By using a Sorter Transformation with SAL as the sorted port (descending) and a Filter Transformation to pass only the first 10 records.
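The Sorter-plus-Filter logic can be sketched in plain Python (the sample employee data is invented for illustration):

```python
# Plain-Python equivalent of Sorter (descending on SAL) + Filter (first 10 rows).
employees = [{"empno": i, "sal": 1000 + 100 * (i % 13)} for i in range(1, 31)]

# Sorter Transformation: sort on the SAL port, descending.
sorted_rows = sorted(employees, key=lambda r: r["sal"], reverse=True)

# Filter Transformation: pass only the first 10 rows through.
top10 = sorted_rows[:10]
print([r["sal"] for r in top10])
```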
The process of making operational data available to business managers and decision support systems is called Data
Warehousing.
What is the purpose of using UNIX commands in Informatica? Which UNIX commands are generally used
with Informatica?
Informatica is often run on UNIX-based servers, so data loads frequently have to be performed there.
Commands such as grep, egrep, and rm are commonly used, so knowledge of UNIX is an advantage.
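A typical use is scanning a session or reject log on the server. The file name sess_load.log below is an invented example, not a standard Informatica file name:

```shell
# Hypothetical log check; the file name sess_load.log is made up for the demo.
printf 'OK row 1\nERROR row 2\nOK row 3\nFATAL row 4\n' > sess_load.log

# grep: count lines that errored out during the load
err_count=$(grep -c 'ERROR' sess_load.log)

# egrep (grep -E): match either ERROR or FATAL in one pass
bad_count=$(grep -E -c 'ERROR|FATAL' sess_load.log)
echo "errors=$err_count bad=$bad_count"

# rm: clean up the log once checked
rm -f sess_load.log
```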
Create two data flows: one for new rows and the other for changed rows. Generate a primary key for each new
row. Insert new rows into the target and update changed rows in the target, overwriting the existing rows.
Transformations used:
2 Unconnected Lookups
–> Expression –> Router –> Update Strategy –> target (instance).
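The two data flows above can be sketched in plain Python. The lookup step decides new vs. changed, the router splits the rows, new rows get a generated key, and changed rows overwrite the target value (the natural ids and names are invented):

```python
# Sketch of the insert/update flows; "target" stands in for the target table.
target = {"A100": {"key": 1, "name": "Alice"}}   # keyed by natural id
next_key = 2                                     # key generator for new rows

source = [("A100", "Alicia"),   # changed row
          ("B200", "Bob")]      # new row

for nat_id, name in source:
    existing = target.get(nat_id)        # unconnected-lookup step
    if existing is None:                 # router group: new row
        target[nat_id] = {"key": next_key, "name": name}   # insert with new key
        next_key += 1
    else:                                # router group: changed row
        existing["name"] = name          # update, overwriting the old value
```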
What is the difference between SQL Overriding in Source Qualifier and Lookup Transformation?
The major difference is that we can use any type of join in the SQL override of a Source Qualifier, but in a
Lookup we can use only an equi-join in the SQL override.
How will you update the row without using Update Strategy Transformation?
You can set the session-level property “Treat Source Rows As” to UPDATE (or INSERT); the records are then
updated without using an Update Strategy in the mapping.
In the target, there is an Update Override option for updating records using the non-key columns. Using this,
we can update records without an Update Strategy Transformation.
Performance tuning is done in several stages. We check in the following order:
Target, Source, Mapping, Session, System, and depending on which level has the bottleneck, we rectify it.
Normalization:
Normalization is the process of removing redundancy. OLTP systems use normalization.
Denormalization:
Denormalization is the process of allowing redundancy. OLAP/DWH systems use denormalized designs while
storing a greater level of detailed data (each and every transaction).
A fact table consists of measurements of the business requirements and foreign keys to the dimension tables,
as per the business rules.
Basically, the fact table consists of the index keys of the dimension/lookup tables and the measures.
Since the table holds only keys and measures, the fact table is in normal form.
E-R modeling is used for normalizing the OLTP database design. It revolves around entities and their
relationships to capture the overall process of the system.
In E-R modeling the data is in normalized form, so queries require more joins, which may adversely affect
system performance.
In dimensional modeling the data is denormalized, so queries require fewer joins and system performance
improves.
A Dimension table which is used by more than one fact table is known as a Conformed Dimension.
Conformed facts are facts that have the same name and definition in separate tables, so they can be combined
and compared mathematically. When the facts and dimensions are related in this consistent way, the schema is
called a conformed schema.
Every company has a methodology of its own, but to name a few, the SDLC and AIM methodologies are
commonly used standards; others include AMM, World Class Methodology, and many more.
Regarding methodologies specific to data warehousing, there are mainly two:
1. Ralph Kimball – first the data marts, then the data warehouse combined from them. Most designs follow
his two kinds of schemas: the Star Schema or the Snowflake Schema.
2. Bill Inmon – first the Enterprise Data Warehouse, then data marts derived from the EDWH.
Depending on the requirements of the company, its DWH team will choose one of the above models.
1. Top-down Method
The top-down approach means first building the Enterprise DWH and then preparing the individual
departments' data (data marts) from it.
First loads into the Data Warehouse, and then loads into the Data Marts.
2. Bottom-up Method
The bottom-up approach means first cleansing and transforming each department's data and loading it into
individual data marts, which are then combined into the enterprise data warehouse.
First loads into the Data Marts, and then loads into the Data Warehouse.
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to
define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to
the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a
family structure.
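The month-to-quarter-to-year example can be sketched directly (the monthly figures are invented sample data):

```python
# Rolling monthly figures up a time hierarchy: month -> quarter -> year.
monthly = {"2023-01": 10, "2023-02": 12, "2023-03": 8,
           "2023-04": 20, "2023-05": 5}

def quarter_of(month_key):
    """Map a 'YYYY-MM' key to its quarter, e.g. '2023-02' -> '2023-Q1'."""
    year, month = month_key.split("-")
    return f"{year}-Q{(int(month) - 1) // 3 + 1}"

# Aggregate month level -> quarter level.
quarterly = {}
for month_key, value in monthly.items():
    q = quarter_of(month_key)
    quarterly[q] = quarterly.get(q, 0) + value

# Aggregate quarter level -> year level.
yearly = {}
for q, value in quarterly.items():
    year = q.split("-")[0]
    yearly[year] = yearly.get(year, 0) + value

print(quarterly, yearly)
```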
Data Validation is to make sure that the loaded data is accurate and meets the business requirements.
There are three object types: Dimension, Measure, and Detail; in Business Objects (BO) these are called
object types. A View is nothing but an alias that can be used to resolve loops in the universe. An “Alias” is
still different from a View in the universe: a View exists at the database level, but an Alias is a different name
given to the same table to resolve loops in the universe.
1. Character
2. Date
3. Long text
4. Number
Dimension, Measure, and Detail are object types; the data types are character, date, long text, and number.
What is a Surrogate Key? Where do we use it, and why?
A surrogate key is a system-generated artificial primary key value. It is mainly used for “critical columns” in
the DWH – columns whose values can be updated in the source OLTP systems while the warehouse must
preserve their history. Surrogate keys are what join the dimension tables to the fact tables, and they are the
solution to the critical-column problem.
Example: a customer purchases different items in different locations; for this situation we have to maintain
historical data.
By using surrogate keys we can introduce a new row into the data warehouse each time, to maintain the
historical data.
A surrogate key is a unique identification key; it is an artificial alternative to the production key. The
production key may be alphanumeric or a composite key, but the surrogate key is always a single numeric
key. Assume the production key is an alphanumeric field: an index on it occupies more space, so it is not
advisable to join or index on it, because data warehouse fact tables generally hold historical data and are
linked to many dimension tables. With a numeric field, performance is higher.
Surrogate Key is a substitution for the natural primary key. It is just a unique identifier or number for each row that
can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for
each row in the table.
Data warehouses typically use a surrogate key, also known as an artificial or identity key, for the dimension
tables' primary keys. A Sequence Generator, an Oracle sequence, or SQL Server identity values can be used
to generate the surrogate key.
It is useful because the natural primary key (e.g. “Customer Number” in the “Customer Table”) can change,
and this makes updates more difficult.
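The history-keeping role of the surrogate key can be sketched as follows. The customer numbers and cities are invented, and itertools.count stands in for a sequence generator:

```python
from itertools import count

# Sketch: the warehouse assigns its own numeric key and keeps history when an
# attribute of a natural (production) key changes. All names are illustrative.
key_seq = count(1)        # stands in for a Sequence Generator / Oracle sequence
customer_dim = []         # one row per version of a customer

def load_customer(cust_no, city):
    """Add a dimension row; a repeated cust_no gets a fresh surrogate key."""
    customer_dim.append({"cust_sk": next(key_seq),
                         "cust_no": cust_no, "city": city})

load_customer("C-001", "Pune")
load_customer("C-002", "Delhi")
load_customer("C-001", "Mumbai")  # customer moved: history kept as a new row

# Fact rows would store the small integer cust_sk, not the alphanumeric cust_no.
```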
What is Workflow?
A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and
loading data.
What are Worklets?
Create a worklet when you want to reuse a set of workflow logic in several workflows. Use the Worklet Designer to
create and edit worklets.
Where to use Worklets?
You can run worklets inside a workflow. The workflow that contains the worklet is called the “parent workflow”.
You can also nest a worklet in another worklet.
You can monitor workflows and tasks in the workflow monitor. View details about workflow or task in Gantt View
or Task View.
Actions:
You can run, stop, abort, and resume workflows from the Workflow Monitor.
Data Mart vs. Data Warehouse:
• Data Mart: a scaled-down version of the data warehouse that addresses only one subject, like the Sales
department, HR department, etc.
• Data Warehouse: a database management system that facilitates on-line analytical processing by allowing
the data to be viewed in different dimensions or perspectives to provide business intelligence.
• Data Mart: one fact table with multiple dimension tables. Data Warehouse: more than one fact table and
multiple dimension tables.
• Small organizations prefer a Data Mart; bigger organizations prefer a Data Warehouse.
Dimension Table vs. Fact Table:
• Structure of a Dimension Table: surrogate key, one or more fields that compose the natural key (nk), and a
set of attributes. Structure of a Fact Table: foreign keys (fk), degenerate dimensions, and measurements.
• In a schema, more dimension tables are present than fact tables; the fact table is larger in size than the
dimension tables.
• In the dimension table, the surrogate key is used to prevent primary key (pk) violations (to store historical
data).
• The dimension table provides entry points to the data; in the fact table, the degenerate dimension fields act
as the primary key.
• Dimension field values are in numeric and text representation; fact field values are always in numeric or
integer form.
OLTP vs. OLAP:
• OLTP: Normalized. OLAP: Denormalized.
• OLTP cannot solve extract and complex problems; with OLAP, extract and complex problems can be
easily solved.
Cubes are multidimensional views of the Data Warehouse or Data Marts. A cube is designed in a logical way
to support drill-up, drill-down, slice-and-dice, etc., which enables business users to understand the trends of
the business. It is good to design the cube on a star schema so as to facilitate its effective use. Every part of
the cube is a logical representation of a combination of facts and dimension attributes.
1. Replicate
2. Transparent
3. Linked
In a linked cube, the data cells can be linked to another analytical database: if an end user clicks on a data
cell, they are actually linking through to the other analytic database.
Example:
You may have 5 GB of data for a report; if you specify a cube size of 2 GB, then when the cube exceeds
2 GB a second cube is automatically created to store the remaining data.
Aggregate table contains the measure values, aggregated/grouped/summed up to some level of hierarchy.
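Building an aggregate table amounts to grouping detail rows up to some hierarchy level and summing the measures. A sketch with invented sales data, rolling daily rows up to the month level per store:

```python
# Detail (transaction-level) rows; dates and amounts are sample data.
detail = [
    {"date": "2023-01-05", "store": "S1", "amount": 100},
    {"date": "2023-01-20", "store": "S1", "amount": 40},
    {"date": "2023-01-11", "store": "S2", "amount": 75},
    {"date": "2023-02-02", "store": "S1", "amount": 60},
]

# Aggregate table: (month, store) -> summed measure.
agg = {}
for row in detail:
    key = (row["date"][:7], row["store"])   # roll the date up to its month
    agg[key] = agg.get(key, 0) + row["amount"]

print(agg)
```

Queries at the month level can then read the small aggregate table instead of scanning every transaction.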
RDBMS vs. DWH:
• RDBMS: Normalized. DWH: Denormalized.
• RDBMS: less time for query execution. DWH: more time for query execution.
• RDBMS: has insert, delete, and update transactions. DWH: will not have many inserts, deletes, or updates.
1. Informatica PowerCenter
2. Ab Initio
3. DataStage
4. BO Data Integrator
5. SAS ETL
6. MS DTS
7. Sunopsis
Dimensional modeling is a design concept used by many data warehouse designers to build their data
warehouses. In this design model all the data is stored in two types of tables – fact tables and dimension
tables. The fact table contains the facts/measurements of the business, e.g. sales, revenue, profit, etc., and the
dimension table contains the descriptive attributes that give those measures their context.
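A minimal star-schema sketch in SQLite (the table names dim_product and fact_sales and all values are invented): the fact table holds a foreign key and a measure, and the dimension table supplies the descriptive context at query time.

```python
import sqlite3

# Minimal star schema: one fact table keyed to one dimension table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE fact_sales (product_key INTEGER, revenue INTEGER)")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [(1, 100), (1, 150), (2, 80)])

# Measures come from the fact table; descriptive context from the dimension.
rows = conn.execute(
    "SELECT d.name, SUM(f.revenue) "
    "FROM fact_sales f JOIN dim_product d ON f.product_key = d.product_key "
    "GROUP BY d.name ORDER BY d.name").fetchall()
print(rows)
```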
How to find the number of success, rejected and bad records in the same mapping?
- First we separate the data using an Expression Transformation, which flags each row with 1 or 0. The
condition is as follows:
FLAG=1 is considered invalid data and FLAG=0 is considered valid data. The rows are then routed into
the next transformation using a Router Transformation, where we add two user groups, one for FLAG=1
(invalid data) and the other for FLAG=0 (valid data).
The FLAG=1 data is forwarded to an Expression Transformation. Here we take one variable port and two
output ports: one to increment the count and the other to flag the row.
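The flag-and-route logic above can be sketched in plain Python (the validity rule and sample rows are invented for illustration):

```python
# Sketch: an expression flags each row, a "router" splits the stream, and
# counters give the success and rejected totals.
rows = [{"empno": 1, "sal": 500}, {"empno": 2, "sal": None},
        {"empno": 3, "sal": 800}, {"empno": None, "sal": 300}]

def flag(row):
    # Expression Transformation: FLAG=1 for invalid data, FLAG=0 for valid data.
    return 1 if row["empno"] is None or row["sal"] is None else 0

valid, invalid = [], []
for row in rows:
    (invalid if flag(row) else valid).append(row)   # Router Transformation

success_count = len(valid)      # rows loaded into the target
reject_count = len(invalid)     # incremented once per flagged row downstream
print(success_count, reject_count)
```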