If only one return port is needed, go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one return port is needed, go for a connected lookup.
The ODS comes between the staging area and the data warehouse. The data in the ODS is at a low level of granularity.
Once data is populated in the ODS, aggregated data is loaded into the EDW through the ODS.
ODS is the Operational Data Store, which holds what is also called transactional data.
The ODS is the source for the warehouse: data from the ODS is staged, transformed, and then moved to the data warehouse.
You cannot look up from a Source Qualifier directly. However, you can override the SQL in the Source Qualifier to join with the lookup table and perform the lookup.
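As a rough sketch, assuming hypothetical EMP (source) and DEPT (lookup) tables, such an override could look like this:
  -- SQL override in the Source Qualifier: join the source to the lookup table
  -- (EMP and DEPT are hypothetical names used only for illustration)
  SELECT EMP.EMPNO, EMP.ENAME, EMP.DEPTNO, DEPT.DNAME
  FROM EMP
  LEFT OUTER JOIN DEPT
    ON EMP.DEPTNO = DEPT.DEPTNO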
Connected lookup: part of the mapping pipeline; it receives input values directly from other transformations and can return more than one column to the flow.
Unconnected lookup: sits outside the pipeline and is called from another transformation through a :LKP expression; it returns a single value through its one return port.
Informatica
DataStage
Business Objects Data Integrator
Materialized view: a view whose result set is also physically stored, so queries can read the stored data instead of recomputing it.
Snapshot: a table that contains the results of a query of one or more tables or views, often located on a remote database.
The staging area is a place where you hold temporary tables on the data warehouse server. Staging tables are connected to the work area or fact tables. We basically need a staging area to hold the data and to perform data cleansing and merging before loading the data into the warehouse.
In the absence of a staging area, the data load would have to go from the OLTP system to the OLAP system directly, which would severely hamper the performance of the OLTP system. This is the primary reason for the existence of a staging area. In addition, it also offers a platform for carrying out data cleansing.
The data modeler will provide the ETL developer with the tables that are to be extracted from the various sources.
When addressing a table, some dimension key must reflect the need for a record to get extracted. Mostly it will be from the time dimension (e.g. date >= 1st of current month) or a transaction flag (e.g. Order Invoiced Stat). A foolproof approach is adding an archive flag to the record, which gets reset when the record changes, as sketched below.
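A minimal sketch of such extraction predicates, in Oracle-flavored SQL and assuming a hypothetical ORDERS table with an ARCHIVE_FLAG column:
  -- Extract records from the current month, or records whose archive flag
  -- was reset because the record changed (ORDERS/ARCHIVE_FLAG are assumptions)
  SELECT *
  FROM ORDERS
  WHERE ORDER_DATE >= TRUNC(SYSDATE, 'MM')
     OR ARCHIVE_FLAG = 'N';

  -- After a successful load, mark the extracted records as archived again
  UPDATE ORDERS SET ARCHIVE_FLAG = 'Y' WHERE ARCHIVE_FLAG = 'N';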
Draw the inference from the slowly changing dimensions, based on whether the tables are defined as Type 1, 2, or 3.
A three-tier data warehouse contains three tiers: a bottom tier, a middle tier, and a top tier.
The bottom tier deals with retrieving related data or information from various information repositories by using SQL.
The middle tier contains two types of servers:
1. ROLAP server
2. MOLAP server
The top tier deals with presentation or visualization of the results.
The 3 tiers are:
1. Data tier - bottom tier - consists of the database
2. Application tier - middle tier - consists of the analytical server
3. Presentation tier - top tier - interacts with the end-user
18. Can we use procedural logic inside Informatica? If yes, how? If not, how can we use external procedural logic in Informatica?
You can use a Command task to call shell scripts in the following ways:
1. Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run shell commands.
2. Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell command for a Session task.
There is a task named Command task; using it you can write or call shell scripts, DOS commands, or BAT files.
Active transformations
Passive transformations
Expression
External Procedure
Mapplet Input
Lookup
Sequence Generator
XML Source Qualifier
Mapplet Output
There are pros and cons to both tool-based ETL and hand-coded ETL. Tool-based ETL provides maintainability, ease of development, and a graphical view of the flow. It also reduces the learning curve for the team.
Can anyone please explain why and where exactly we use lookup transformations?
You can use the Lookup transformation to perform many tasks, including:
♦ Get a related value. For example, your source includes employee ID, but
you want to include the employee name in your target table to make your
summary data easier to read.
Ex1) In the transactional data we have only the name and custid, but the complete name (with first and last) is required by the business user, and there is a separate table (either in the source or target database) that has the first and last names in it; the relational equivalent is sketched below.
Ex2) You need to compare the prices of the existing goods with their previous prices (referred to as Type 3); a lookup table containing the OLAP data could be handy.
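The relational equivalent of Ex1 is a simple join; a minimal sketch, assuming hypothetical TRANSACTIONS and CUSTOMER_NAMES tables:
  -- Enrich transactional rows with the customer's full name
  -- (table and column names are assumptions for illustration)
  SELECT T.CUSTID,
         N.FIRST_NAME || ' ' || N.LAST_NAME AS FULL_NAME,
         T.AMOUNT
  FROM TRANSACTIONS T
  LEFT OUTER JOIN CUSTOMER_NAMES N
    ON T.CUSTID = N.CUSTID;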
1. Mapping level.
2. Session level.
In real time, if you want to update the existing record with the same source data, you can go for session-level update logic.
If you want to apply a different set of rules for updating or inserting a record, even when that record already exists in the warehouse table, you can go for a mapping-level Update Strategy transformation. This typically means using a Router transformation to perform the different activities.
EX: If employee 'X1234' is getting a bonus, update the allowance with 10% less; if not, insert the record with the new bonus into the warehouse table, as sketched below.
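In plain SQL, the same insert-or-update rule could be sketched with a MERGE; EMP_BONUS and SRC_BONUS are hypothetical warehouse and source tables:
  MERGE INTO EMP_BONUS T
  USING (SELECT EMPID, BONUS, ALLOWANCE FROM SRC_BONUS) S
     ON (T.EMPID = S.EMPID)
  WHEN MATCHED THEN
    UPDATE SET T.ALLOWANCE = S.ALLOWANCE * 0.9  -- existing record: allowance 10% less
  WHEN NOT MATCHED THEN
    INSERT (EMPID, BONUS, ALLOWANCE)
    VALUES (S.EMPID, S.BONUS, S.ALLOWANCE);     -- new record with the new bonus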
Let's suppose we have some 10,000-odd records in the source system; when we load them into the target, how do we ensure that all 10,000 records loaded to the target don't contain any garbage values?
Select count(*) from both the source table and the target table and compare the results.
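For example (SRC_TABLE and TGT_TABLE are placeholder names):
  -- Compare row counts between source and target
  SELECT COUNT(*) FROM SRC_TABLE;
  SELECT COUNT(*) FROM TGT_TABLE;

  -- Optionally, list rows present in the source but missing or altered in the target
  SELECT * FROM SRC_TABLE
  MINUS
  SELECT * FROM TGT_TABLE;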
There are various ways of extracting data from source systems. For example, you can use a DATA step or an import process. It depends on your input data styles and on what kind of file/database the data resides in. Storing your data in an ODS can be done through an ODS statement, an export statement, or a FILE statement, again depending on the file and data format you want your output to be in.
IDP is the portal for display of reports, stored processes, information maps, and a whole bunch of things ideally required for dashboard reporting.
IMS is the GUI that helps you convert your technical data and map it to business data (change names, add filters, add new columns, etc.).
28. What is the difference between an ETL tool and OLAP tools?
An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some process of cleansing the data.
ETL tools are used to extract the data from different sources, and OLAP tools are used to analyze the data.
ETL tools are used to extract, transform, and load the data into the data warehouse / data mart.
OLAP tools are used to create cubes/reports for business analysis from the data warehouse / data mart.
1) ETL Tools
2) OLAP Tools
• Business Objects
• Cognos
• Hyperion
• Microsoft Analysis Services
• MicroStrategy
3) Reporting Tools
30. What is the difference between Power Center & Power Mart?
Power Center is designed for:
High-end warehouses
Global as well as local repositories
ERP support
Power Mart is designed for low-end, departmental warehouses, with local repositories only and no ERP support.
Informatica Power Center is used to maintain the global repository, but this is not the case with Informatica Power Mart. For more, you can analyse the architecture of Informatica.
PowerMart: no partitioning, no ERP support.
PowerCenter: partitioning is available; supports ERP.
The entity relationship is nothing but maintaining a primary key / foreign key relation between the tables, to keep the data consistent and satisfy normal form.
There are 4 types of entity relationships:
1. One-to-One
2. One-to-Many
3. Many-to-One
4. Many-to-Many
The fact table gets its data from the dimension tables: it contains the primary keys of the dimension tables as foreign keys, through which summarized data is obtained for each record, as sketched below.
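A minimal star-schema sketch, assuming hypothetical SALES_FACT, DATE_DIM, and PRODUCT_DIM tables:
  -- Fact table holding dimension primary keys as foreign keys
  CREATE TABLE SALES_FACT (
    DATE_KEY     NUMBER NOT NULL REFERENCES DATE_DIM (DATE_KEY),
    PRODUCT_KEY  NUMBER NOT NULL REFERENCES PRODUCT_DIM (PRODUCT_KEY),
    SALES_AMOUNT NUMBER
  );

  -- Summarized data comes from joining the fact table to its dimensions
  SELECT D.MONTH_NAME, P.PRODUCT_NAME, SUM(F.SALES_AMOUNT) AS TOTAL_SALES
  FROM SALES_FACT F
  JOIN DATE_DIM D    ON F.DATE_KEY = D.DATE_KEY
  JOIN PRODUCT_DIM P ON F.PRODUCT_KEY = P.PRODUCT_KEY
  GROUP BY D.MONTH_NAME, P.PRODUCT_NAME;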
Go to the Source Analyzer and click on sources; you will get the option 'Import from SAP'.
Click on this, then give your SAP access user, client, password, and a filter criterion such as the table name (so it will take less time). After connecting, import the SAP source.
Now, one important thing: after finishing the mapping, save it and generate the ABAP code for the mapping. Only then will the workflow run fine.
Ex: temperature
One foolproof method is to maintain a field called 'Last Extraction Date' and
then impose a condition in the code saying 'current_extraction_date >
last_extraction_date'.
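A sketch of this watermark pattern, assuming a hypothetical ETL_CONTROL table that stores the last extraction date per source:
  -- Pull only rows changed since the previous run
  SELECT *
  FROM SOURCE_TABLE
  WHERE LAST_UPDATED > (SELECT LAST_EXTRACTION_DATE
                        FROM ETL_CONTROL
                        WHERE TABLE_NAME = 'SOURCE_TABLE');

  -- On success, advance the watermark
  UPDATE ETL_CONTROL
  SET LAST_EXTRACTION_DATE = SYSDATE
  WHERE TABLE_NAME = 'SOURCE_TABLE';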
41. What are snapshots? What are materialized views & where do
we use them? What is a materialized view log?
A materialized view is a view in which the data is also physically stored, i.e., with the ordinary view concept the database stores only the query, and each time we call the view it extracts the data from the base tables. In a materialized view, however, the data is stored in its own table. A materialized view log is a table on the master table that records its changes, so the materialized view can be fast-refreshed.
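In Oracle, for example, the log and the view could be created roughly like this (EMP is a placeholder master table assumed to have a primary key):
  -- Materialized view log: records changes to the master table
  CREATE MATERIALIZED VIEW LOG ON EMP WITH PRIMARY KEY;

  -- Materialized view: physically stores the query result and can be
  -- fast-refreshed from the log on demand
  CREATE MATERIALIZED VIEW EMP_MV
  REFRESH FAST ON DEMAND
  AS SELECT EMPNO, ENAME, SAL FROM EMP;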
By full load or one-time load we mean that all the data in the source table(s) is processed. This usually contains historical data. Once the historical data is loaded, we keep doing incremental loads to process the data that came in after the one-time load.
Full load is the entire data dump taking place the very first time. Afterwards, to keep the target data synchronized with the source data, there are 2 further techniques:
Refresh load - where the existing data is truncated and reloaded completely.
Incremental load - where the delta or difference between target and source data is loaded at regular intervals. The timestamp of the previous delta load has to be maintained.
Full load: completely erasing the contents of one or more tables and reloading them with fresh data.
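Sketched in SQL with placeholder SRC and TGT tables (and an assumed UPDATED_AT timestamp column), the two techniques look like this:
  -- Refresh load: truncate and reload completely
  TRUNCATE TABLE TGT;
  INSERT INTO TGT SELECT * FROM SRC;

  -- Incremental load: append only rows newer than the previous delta load
  INSERT INTO TGT
  SELECT * FROM SRC
  WHERE UPDATED_AT > (SELECT MAX(UPDATED_AT) FROM TGT);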
Informatica allows end users and partners to extend the metadata stored in
the repository by associating information with individual objects in the
repository. For example, when you create a mapping, you can store your
contact information with the mapping. You associate information with
repository metadata using metadata extensions.
ETL tools are meant to extract, transform, and load the data into the data warehouse for decision making. Before the evolution of ETL tools, the above-mentioned ETL process was done manually, using SQL code created by programmers. This task was tedious and cumbersome in many cases, since it involved many resources, complex coding, and more work hours. On top of that, maintaining the code posed a great challenge to the programmers.
These difficulties are eliminated by ETL tools, since they are very powerful and offer many advantages in all stages of the ETL process, from extraction, data cleansing, data profiling, transformation, and debugging to loading into the data warehouse, when compared to the old method.
3. If you have a requirement like this, you need to get an ETL tool; otherwise, you do not need any ETL tool.
What is Informatica metadata and where is it stored?
Informatica metadata contains all the information about the source tables, target tables, and transformations, so that it is useful and easy to perform transformations during the ETL process. The metadata is stored in the Informatica repository, which resides in a relational database.