
1. Where do we use connected and unconnected lookups?

If only one return port is needed, go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one return port is needed, go for a connected lookup.

2. What are the various test procedures used to check whether the data is loaded in the backend, the performance of the mapping, and the quality of the data loaded in Informatica?

The best procedure is to take the help of the Debugger, where we can monitor each and every process of the mapping and see how the data is loading, based on condition breakpoints.

3. What is the difference between ETL tools and OLAP tools?

An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some process of cleansing the data.

Eg: Informatica, DataStage, etc.

OLAP tools are meant for reporting. In OLAP, data is available in a multidimensional model, so that you can write simple queries to extract data from the database.

Eg: Business Objects, Cognos, etc.

ETL tools are used to extract the data from different sources, and OLAP tools are used to analyze the data.

ETL tools are used to extract, transform, and load the data into the data warehouse / data mart.

OLAP tools are used to create cubes/reports for business analysis from the data warehouse / data mart.

4. What is ODS (operation data source)?

ODS - Operational Data Store.

The ODS comes between the staging area and the data warehouse. The data in the ODS is at a low level of granularity.

Once data is populated in the ODS, aggregated data is loaded into the EDW from the ODS.

The ODS holds transactional data and acts as the source of the warehouse: data from the ODS is staged, transformed, and then moved to the data warehouse.

An ODS is an updateable set of integrated operational data used for enterprise-wide tactical decision-making. It contains live data, not snapshots, and retains minimal history.

Can we look up a table from a Source Qualifier transformation, i.e., an unconnected lookup?

You cannot look up from a Source Qualifier directly. However, you can override the SQL in the Source Qualifier to join with the lookup table to perform the lookup.
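For illustration, a minimal SQL-override sketch (the orders and customers tables and their columns are hypothetical) that performs the lookup as a join inside the Source Qualifier:

    -- SQL override in the Source Qualifier: the join replaces a Lookup transformation.
    SELECT o.order_id,
           o.customer_id,
           o.amount,
           c.customer_name   -- the "looked-up" value
    FROM   orders o
    LEFT OUTER JOIN customers c
           ON c.customer_id = o.customer_id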

5. What are the different Lookup methods used in Informatica?

The Lookup transformation is mainly of 2 types:

1) Connected 2) Unconnected lookup

Connected lookup:

1) It receives its input values directly from the pipeline.

2) It can use either a dynamic or a static cache.

3) It can return multiple values (through multiple return ports).

4) It supports user-defined default values.

Unconnected lookup:

1) It receives its input values from a :LKP expression in another transformation.

2) It can use only a static cache.

3) It returns only a single value.

4) It does not support user-defined default values.

6. What is a mapping, session, worklet, workflow, mapplet?

Mapping: A mapping represents the flow of data from sources to targets.
Session: A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets.
Worklet: A worklet is an object that represents a set of tasks and can be called within a workflow.
Workflow: A workflow is the pipeline that passes or flows the data from source to target; it controls the execution of tasks.
Mapplet: A mapplet is a set of transformations that we can make for reusability. It is a whole piece of logic.

7. What is the difference between Power Center & Power Mart?

Power Mart is designed for:

• Low-range warehouses, local repositories only, mainly desktop environments.
• We can connect to only a single repository.

Power Center is designed for:

• High-end warehouses, global as well as local repositories.
• ERP support.
• We can connect to single or multiple repositories; generally used in big enterprises.

8. What are the various tools? - Name a few

The various ETL tools are as follows:

Informatica
DataStage
Business Objects Data Integrator

OLAP tools are as follows:

Cognos
Business Objects

9. What are snapshots? What are materialized views?

Materialized view:

Answer 1. A materialized view is a view whose data is also physically stored. With an ordinary view, the database stores only the query, and each time we call the view it extracts the data from the base tables. In a materialized view, the result data is stored in its own table.

Answer 2. A materialized view stores pre-calculated data; it is a physical representation and it occupies space.

Snapshot:

Answer 1. A snapshot is a table that contains the results of a query of one or more tables or views, often located on a remote database.

Answer 2. A snapshot is data as of a specific point in time.
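As a hedged sketch in Oracle-style SQL (the sales table and its columns are hypothetical), a materialized view, together with the materialized view log that enables fast refresh, could look like this:

    -- The log on the master table records row changes, enabling fast (incremental) refresh.
    CREATE MATERIALIZED VIEW LOG ON sales
        WITH ROWID (sale_date, amount) INCLUDING NEW VALUES;

    -- The materialized view physically stores the pre-aggregated result.
    -- COUNT(*) and COUNT(amount) are required by Oracle for fast-refreshable aggregates.
    CREATE MATERIALIZED VIEW mv_daily_sales
        BUILD IMMEDIATE
        REFRESH FAST ON COMMIT
    AS
    SELECT sale_date,
           SUM(amount)   AS total_amount,
           COUNT(amount) AS amount_count,
           COUNT(*)      AS row_count
    FROM   sales
    GROUP  BY sale_date;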

10. What is partitioning? What are the types of partitioning?

Partitioning is a part of physical data warehouse design that is carried out to improve performance and simplify stored-data management. Partitioning is done to break up a large table into smaller, independently manageable components because it:

1. Reduces the work involved in adding new data.
2. Reduces the work involved in purging old data.

Two types of partitioning are:

1. Horizontal partitioning.
2. Vertical partitioning (which reduces efficiency in the context of a data warehouse).
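A minimal sketch of horizontal (range) partitioning in Oracle-style SQL; the table and its columns are hypothetical:

    -- Each partition holds one year of rows and can be managed independently.
    CREATE TABLE sales_fact (
        sale_id   NUMBER,
        sale_date DATE,
        amount    NUMBER
    )
    PARTITION BY RANGE (sale_date) (
        PARTITION p2022 VALUES LESS THAN (TO_DATE('01-01-2023', 'DD-MM-YYYY')),
        PARTITION p2023 VALUES LESS THAN (TO_DATE('01-01-2024', 'DD-MM-YYYY')),
        PARTITION p_max VALUES LESS THAN (MAXVALUE)
    );

Purging a year of old data then becomes a cheap ALTER TABLE ... DROP PARTITION instead of a large DELETE, which is exactly the maintenance benefit described above.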

11. What are the modules in Power Mart?

1. Power Mart Designer
2. Server
3. Server Manager
4. Repository
5. Repository Manager

12. What is a staging area? Do we need it? What is the purpose of a staging area?

The staging area is a place where you hold temporary tables on the data warehouse server. Staging tables are connected to the work area or fact tables. We basically need a staging area to hold the data and perform data cleansing and merging before loading the data into the warehouse.

In the absence of a staging area, the data load would have to go from the OLTP system to the OLAP system directly, which would severely hamper the performance of the OLTP system. This is the primary reason for the existence of a staging area. In addition, it offers a platform for carrying out data cleansing.

Depending on the complexity of the business rules, we may require a staging area; its basic purpose is to clean the OLTP source data and gather it in one place.
13. How to determine what records to extract?

The data modeler will provide the ETL developer with the tables that are to be extracted from the various sources.

When addressing a table, some key must reflect the need for a record to get extracted. Mostly it will be from the time dimension (e.g., date >= 1st of the current month) or a transaction flag (e.g., Order Invoiced Status). A foolproof approach is adding an archive flag to each record, which gets reset when the record changes.

Draw the inference from the slowly changing dimensions involved, based on whether Type 1, 2, or 3 tables are defined.
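As an illustration of the time-dimension and transaction-flag conditions above, a hedged extraction predicate in Oracle-style SQL (the orders table and its columns are hypothetical):

    -- Extract records dated from the 1st of the current month, or any invoiced order.
    SELECT *
    FROM   orders
    WHERE  order_date >= TRUNC(SYSDATE, 'MM')
       OR  invoiced_flag = 'Y';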

14. What are the various transformations available?

Transformations play an important role in a data warehouse. A transformation is used when data is moved from source to destination; which transformations are applied depends on the criteria. Some of the transformations available in Informatica are listed below.

The various types of transformations in Informatica:


• Source Qualifier
• Aggregator
• Sequence Generator
• Sorter
• Router
• Filter
• Lookup
• Update Strategy
• Joiner
• Normalizer
• Expression
• Rank
• Stored Procedure

15. What is a three-tier data warehouse?

A three-tier data warehouse contains three tiers: a bottom tier, a middle tier, and a top tier.
The bottom tier deals with retrieving related data or information from various information repositories by using SQL.
The middle tier contains two types of servers:
1. ROLAP server
2. MOLAP server
The top tier deals with presentation or visualization of the results.
The 3 tiers are:
1. Data tier - bottom tier - consists of the database
2. Application tier - middle tier - consists of the analytical server
3. Presentation tier - tier that interacts with the end-user

16. How can we use mapping variables in Informatica? Where do we use them?

After creating a variable, we can use it in any expression in a mapping or a mapplet. Variables can also be used in a Source Qualifier filter, in user-defined joins or extract overrides, and in the Expression Editor of reusable transformations. Their values can change automatically between sessions.
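For instance, a hedged sketch of a Source Qualifier source filter that uses a mapping variable for delta extraction ($$LAST_RUN_DATE is a hypothetical variable maintained across sessions; the table and column are also hypothetical):

    -- Source filter: pick up only rows changed since the previous run.
    ORDERS.UPDATED_AT > TO_DATE('$$LAST_RUN_DATE', 'YYYY-MM-DD HH24:MI:SS')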

17. What are the various methods of getting incremental records or delta records from the source systems?

Getting incremental records from source systems to the target can be done by using incremental aggregation.

One foolproof method is to maintain a field called 'Last Extraction Date' and then impose a condition in the code saying 'current_extraction_date > last_extraction_date'.

Using mapping parameters and variables, or Type 1 logic, we can define where the parameter starts and how the variable changes as deltas arrive from the OLTP systems.
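A hedged SQL sketch of the 'Last Extraction Date' method (the etl_control table and all other names are hypothetical):

    -- High-watermark extraction: only rows modified since the last successful run.
    SELECT *
    FROM   source_orders
    WHERE  last_modified_date > (SELECT last_extraction_date
                                 FROM   etl_control
                                 WHERE  table_name = 'SOURCE_ORDERS');

    -- After a successful load, advance the watermark.
    UPDATE etl_control
    SET    last_extraction_date = SYSDATE
    WHERE  table_name = 'SOURCE_ORDERS';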

18. Can we use procedural logic inside Informatica? If yes, how? If not, how can we use external procedural logic in Informatica?

We can use the External Procedure transformation to call external procedures. Both COM and Informatica procedures are supported by the External Procedure transformation.

Can we override a native SQL query within Informatica? Where do we do it? How do we do it?

We can override a SQL query in the SQL Override property of a Source Qualifier (see the sketch under question 4 for an example).

19. What is the latest version of Power Center / Power Mart?

The latest version is 9.


20. How do we call shell scripts from Informatica?

You can use a Command task to call shell scripts in the following ways:
1. Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run shell commands.
2. Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell command for a Session task.

In short: there is a task named Command task; using it you can run shell scripts, DOS commands, or BAT files.

21. What are active transformations / passive transformations?

Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition. A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.

Active transformations:

Advanced External Procedure
Aggregator
Application Source Qualifier
Filter
Joiner
Normalizer
Rank
Router
Update Strategy

Passive transformations:

Expression
External Procedure
Mapplet Input
Lookup
Sequence Generator
XML Source Qualifier
Mapplet Output

22. When do we analyze the tables? How do we do it?

When the data in the data warehouse changes frequently, we need to analyze the tables. Analyzing tables computes/updates the table statistics, which helps the optimizer boost the performance of your SQL.
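A hedged example of how this is done in Oracle-style SQL (the schema and table names are hypothetical):

    -- Classic syntax:
    ANALYZE TABLE sales_fact COMPUTE STATISTICS;

    -- The recommended approach on recent Oracle versions (run from SQL*Plus):
    EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'DWH', tabname => 'SALES_FACT');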

23. Compare ETL & manual development?

There are pros and cons to both tool-based ETL and hand-coded ETL. Tool-based ETL provides maintainability, ease of development, and a graphical view of the flow. It also reduces the learning curve for the team.

Hand-coded ETL is good when there is minimal transformation logic involved. It is also good when the sources and targets are in the same environment. However, depending on the skill level of the team, it can extend the overall development time.

Can anyone please explain why and where exactly we use the Lookup transformation?

You can use the Lookup transformation to perform many tasks, including:

♦ Get a related value. For example, your source includes employee ID, but
you want to include the employee name in your target table to make your
summary data easier to read.

♦ Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).

♦ Update slowly changing dimension tables. You can use a Lookup transformation to determine whether rows already exist in the target.

The Lookup transformation is used mainly for slowly changing dimensions and for getting related values.

The Lookup transformation is generally used when data that is required in the warehouse is not present in the mapping's own sources; more importantly, a lookup is used to compare values.

Ex 1) The transactional data has only a short name and custid, but the complete name (with first and last) is required by the business user, and there is a separate table (either in the source or the target database) that has the first and last names in it.

Ex 2) You need to compare the prices of the existing goods with their previous prices (referred to as Type 3); a lookup table containing the OLAP data could be handy.

In which real-time scenarios is the Update Strategy transformation used? If we can set DML operations in the session properties, then what is the use of the Update Strategy transformation?

We can apply update logic in two ways:

1. Mapping level.

2. Session level.

The importance of the Update Strategy transformation in both cases is as follows.

In real time, if we want to update the existing record with the same source data, you can go for session-level update logic.

If you want to apply different sets of rules for updating or inserting a record, even when that record already exists in the warehouse table, you can go for a mapping-level Update Strategy transformation, typically together with a Router transformation that routes rows to the different operations.

EX: If employee 'X1234' is getting a bonus, update the allowance with 10% less; if not, insert the record with the new bonus into the warehouse table.

Let's suppose we have some 10,000-odd records in the source system, and when we load them into the target, how do we ensure that all 10,000 records loaded to the target don't contain any garbage values?

24. How do we test it? We can't check every record, as the number of records is huge.

Select count(*) from both the source table and the target table and compare the results.
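A hedged sketch of such a reconciliation in Oracle-style SQL (the table names are hypothetical):

    -- Compare row counts between source and target.
    SELECT (SELECT COUNT(*) FROM src_orders) AS source_count,
           (SELECT COUNT(*) FROM tgt_orders) AS target_count
    FROM   dual;

    -- Spot-check content, not just counts: rows in source missing from target.
    SELECT order_id, amount FROM src_orders
    MINUS
    SELECT order_id, amount FROM tgt_orders;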

25. What is an entity relation? How does it work with data warehousing / ETL modeling?

An entity is nothing but an object; it has characteristics. We call it an entity in terms of the logical view; the entity is called a table in terms of the physical view.

An entity relationship is nothing but maintaining a primary key / foreign key relation between the tables, for keeping the data consistent and satisfying normal form.

There are 4 types of entity relationships:

1. One-to-One,

2. One-to-Many,

3. Many-to-One,

4. Many-to-Many.

In data warehouse modeling, the entity relationship is the relationship between dimension and fact tables (i.e., primary key / foreign key relations between these tables).

The fact table gets data from the dimension tables because it contains the primary keys of the dimension tables as foreign keys, which are used to get summarized data for each record.
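A minimal DDL sketch of this fact-dimension relationship in Oracle-style SQL (all names are hypothetical):

    -- Dimension tables carry the primary keys...
    CREATE TABLE date_dim    (date_key    NUMBER PRIMARY KEY, calendar_date DATE);
    CREATE TABLE product_dim (product_key NUMBER PRIMARY KEY, product_name  VARCHAR2(100));

    -- ...and the fact table references them as foreign keys.
    CREATE TABLE sales_fact (
        date_key     NUMBER REFERENCES date_dim (date_key),
        product_key  NUMBER REFERENCES product_dim (product_key),
        sales_amount NUMBER
    );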

26. Where do we use connected and unconnected lookups?

If only one return port is needed, go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one return port is needed, go for a connected lookup.

27. Explain the process of extracting data from source systems, storing it in the ODS, and how data modeling is done.

There are various ways of extracting data from source systems; for example, you can use a DATA step or an import process. It depends on your input data styles and on what kind of file/database the data resides in. Storing your data in an ODS can be done through an ODS statement, an export statement, or a FILE statement, which again depends on the file and data format you want your output to be in.

IDP is the portal for the display of reports, stored processes, information maps, and a whole bunch of things ideally required for dashboard reporting.

IMS is the GUI that helps you convert your technical data and map it to business data (change names, add filters, add new columns, etc.).

28. What is the difference between ETL tools and OLAP tools?

An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some process of cleansing the data.

Ex: Informatica, DataStage, etc.

OLAP tools are meant for reporting. In OLAP, data is available in a multidimensional model, so that you can write simple queries to extract data from the database.

Ex: Business Objects, Cognos, etc.

ETL tools are used to extract the data from different sources, and OLAP tools are used to analyze the data.

ETL tools are used to extract, transform, and load the data into the data warehouse / data mart.

OLAP tools are used to create cubes/reports for business analysis from the data warehouse / data mart.

29. What are the various tools? - Name a few

1) ETL tools

• IBM WebSphere Information Integration (Ascential DataStage)
• Ab Initio
• Informatica

2) OLAP tools

• Business Objects
• Cognos
• Hyperion
• Microsoft Analysis Services
• MicroStrategy

3) Reporting tools

• Business Objects (Crystal Reports)
• Cognos
• Actuate

30. What is the difference between Power Center & Power Mart?

Power Mart is designed for:

low-range warehouses
local repositories only
mainly desktop environments

Power Center is designed for:

high-end warehouses
global as well as local repositories
ERP support

Power Center: we can connect to single or multiple repositories; generally used in big enterprises.
Power Mart: we can connect to only a single repository.

Informatica Power Center can be used to maintain a global repository, but Informatica Power Mart cannot. For more, you can analyze the architecture of Informatica.

Power Mart:

We can register only local repositories.

Partitioning is not available.

Does not support ERP.

PowerCenter:

Repositories can be promoted to global.

Partitioning is available.

Supports ERP.

31. What is an entity relation? How does it work with data warehousing / ETL modeling?

An entity is nothing but an object; it has characteristics. We call it an entity in terms of the logical view; the entity is called a table in terms of the physical view.

An entity relationship is nothing but maintaining a primary key / foreign key relation between the tables, for keeping the data consistent and satisfying normal form.

There are 4 types of entity relationships:

1. One-to-One,

2. One-to-Many,

3. Many-to-One,

4. Many-to-Many.

In data warehouse modeling, the entity relationship is the relationship between dimension and fact tables (i.e., primary key / foreign key relations between these tables).

The fact table gets data from the dimension tables because it contains the primary keys of the dimension tables as foreign keys, which are used to get summarized data for each record.


32. What is a staging area? Do we need it? What is the purpose of a staging area?

The staging area is a place where you hold temporary tables on the data warehouse server. Staging tables are connected to the work area or fact tables. We basically need a staging area to hold the data and perform data cleansing and merging before loading the data into the warehouse.

In the absence of a staging area, the data load would have to go from the OLTP system to the OLAP system directly, which would severely hamper the performance of the OLTP system. This is the primary reason for the existence of a staging area. In addition, it offers a platform for carrying out data cleansing.

The staging area is a temporary schema used to:

1. Do flat mapping, i.e., dumping all the OLTP data into it without applying any business rules. Pushing data into staging takes less time because no business rules or transformations are applied to it.

2. Perform data cleansing and validation, using a tool such as First Logic.

33. What are active transformations / passive transformations?

An active transformation can change the number of rows it outputs, while a passive transformation does not change the number of rows and passes through the same number of rows that it was given as input.

Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition. A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.

Active transformations:

Advanced External Procedure
Aggregator
Application Source Qualifier
Filter
Joiner
Normalizer
Rank
Router
Update Strategy

Passive transformations:

Expression
External Procedure
Mapplet Input
Lookup
Sequence Generator
XML Source Qualifier
Mapplet Output

34. How do we extract SAP data using Informatica? What is ABAP? What are IDocs?

To extract SAP data: go to the Source Analyzer, click on Sources, and you will get the option 'Import from SAP'. Click on it, then give your SAP access user, client, password, and a filter criterion such as the table name (so it will take less time). After connecting, import the SAP source.

One important thing: after finishing the mapping, save it and generate the ABAP code for the mapping. Only then will the workflow run fine.

ABAP is SAP's programming language; the generated ABAP program is what actually extracts the data inside SAP. IDocs (Intermediate Documents) are SAP's standard containers for exchanging data between SAP and external systems.

35. Where do we use semi and non-additive facts?

Additive: a measure that can participate in arithmetic calculations (e.g., SUM) using all or any dimensions.

Ex: sales profit

Semi-additive: a measure that can participate in arithmetic calculations using only some dimensions.

Ex: an account balance, which can be summed across accounts but not across time

Non-additive: a measure that cannot participate in arithmetic calculations using any dimension.

Ex: temperature

36. What is a mapping, session, worklet, workflow, and mapplet?

Session: A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets.
Mapplet: A mapplet is a set of transformations that we can make for reusability. It is a whole piece of logic.
Workflow: A workflow is the pipeline that passes or flows the data from source to target.

A mapping represents the data flow from sources to targets.

A mapplet creates or configures a set of transformations.

A workflow is a set of instructions that tells the Informatica Server how to execute the tasks.

A worklet is an object that represents a set of tasks.

A session is a set of instructions to move data from sources to targets.

Mapping - represents the flow and transformation of data from source to


target.
Mapplet - a group of transformations that can be called within a mapping.
Session - a task associated with a mapping to define the connections and
other configurations for that mapping.
Workflow - controls the execution of tasks such as commands, emails and
sessions.
Worklet - a workflow that can be called within a workflow.


37. What is a three-tier data warehouse?

A three-tier data warehouse contains three tiers: a bottom tier, a middle tier, and a top tier.
The bottom tier deals with retrieving related data or information from various information repositories by using SQL.
The middle tier contains two types of servers:
1. ROLAP server
2. MOLAP server
The top tier deals with presentation or visualization of the results.

The 3 tiers are:

1. Data tier - bottom tier - consists of the database
2. Application tier - middle tier - consists of the analytical server
3. Presentation tier - tier that interacts with the end-user

38. What are the various methods of getting incremental records or delta records from the source systems?

Getting incremental records from source systems to the target can be done by using incremental aggregation.

One foolproof method is to maintain a field called 'Last Extraction Date' and then impose a condition in the code saying 'current_extraction_date > last_extraction_date' (see the sketch under question 17).

39. Compare ETL & manual development?

There are pros and cons to both tool-based ETL and hand-coded ETL. Tool-based ETL provides maintainability, ease of development, and a graphical view of the flow. It also reduces the learning curve for the team.

Hand-coded ETL is good when there is minimal transformation logic involved. It is also good when the sources and targets are in the same environment. However, depending on the skill level of the team, it can extend the overall development time.

40. Can Informatica load heterogeneous targets from heterogeneous sources?

Yes! Informatica can load heterogeneous targets from heterogeneous sources.

41. What are snapshots? What are materialized views & where do
we use them? What is a materialized view log?

A materialized view is a view whose data is also physically stored; with an ordinary view the database stores only the query, and each time we call the view it extracts the data from the base tables, whereas a materialized view stores the result data in its own table.

A snapshot is a table that contains the results of a query of one or more tables or views, often located on a remote database. A snapshot is data as of a specific point in time.

A materialized view stores precalculated data; it is a physical representation and it occupies space.

A materialized view log is a table associated with the master table that records changes to the master table's rows, so that dependent materialized views can be fast-refreshed incrementally instead of being rebuilt (see the sketch under question 9).

42. What is Full load & Incremental or Refresh load?

By full load, or one-time load, we mean that all the data in the source table(s) is processed. This usually contains historical data. Once the historical data is loaded, we keep doing incremental loads to process the data that came after the one-time load.

A full load is the entire data dump that takes place the very first time. Afterwards, to keep the target data synchronized with the source data, there are two further techniques:

Refresh load - where the existing data is truncated and reloaded completely.

Incremental load - where the delta, or difference between target and source data, is loaded at regular intervals. The timestamp of the previous delta load has to be maintained.

Full load: completely erasing the contents of one or more tables and reloading them with fresh data.

Incremental load: applying ongoing changes to one or more tables based on a predefined schedule.

When you load the data warehouse initially, it is a full load; loading data into the warehouse every day after that is an incremental load.
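A hedged SQL sketch contrasting the two techniques (all table names are hypothetical):

    -- Refresh load: truncate and reload completely.
    TRUNCATE TABLE tgt_orders;
    INSERT INTO tgt_orders
    SELECT * FROM src_orders;

    -- Incremental load: apply only rows changed since the previous run.
    INSERT INTO tgt_orders
    SELECT *
    FROM   src_orders
    WHERE  updated_at > (SELECT last_load_ts
                         FROM   etl_control
                         WHERE  table_name = 'ORDERS');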

43. What is the metadata extension?

Informatica allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. For example, when you create a mapping, you can store your contact information with the mapping. You associate information with repository metadata using metadata extensions.

Informatica client applications can contain the following types of metadata extensions:

• Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.
• User-defined. You create user-defined metadata extensions using PowerCenter/PowerMart. You can create, edit, delete, and view user-defined metadata extensions. You can also change the values of user-defined extensions.

For this purpose Informatica also came out with a product called Informatica SuperGlue.

Informatica Metadata Exchange (MX) provides a set of relational views that allow easy SQL access to the Informatica metadata repository. The Repository Manager generates these views when you create or upgrade a repository.

MX views provide information to help you analyze the following types of metadata stored in the repository:

• Database definition metadata
• Source metadata
• Target metadata
• Mapping and transformation metadata
• Session and workflow metadata

SuperGlue is also a good tool to try.

44. How do we call shell scripts from Informatica?

You can use a Command task to call shell scripts in the following ways:
1. Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run shell commands.
2. Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell command for a Session task.

In short: there is a task named Command task; using it you can run shell scripts, DOS commands, or BAT files.


Do we need an ETL tool? When do we go for the tools in the market?

ETL tools are meant to extract, transform, and load data into the data warehouse for decision making. Before the evolution of ETL tools, the ETL process was done manually, using SQL code created by programmers. This task was tedious and cumbersome in many cases, since it involved many resources, complex coding, and more work hours. On top of that, maintaining the code posed a great challenge to the programmers.

These difficulties are eliminated by ETL tools, since they are very powerful and offer many advantages in all stages of the ETL process, from extraction, data cleansing, data profiling, transformation, and debugging to loading into the data warehouse, when compared to the old method.

1. Normally, ETL tool stands for Extract, Transform, Load.

2. ETL tools help you extract the data from different ODSs/databases.

3. If you have a requirement like this, you need to get an ETL tool; otherwise, you do not need one.
What is Informatica metadata and where is it stored?

Informatica metadata contains all the information about the source tables, the target tables, and the transformations, so that it is useful and easy to perform transformations during the ETL process.

The Informatica metadata is stored in the Informatica repository.
