When an organization's data is consolidated at a single point of access, it is called enterprise data warehousing. Data can be presented to the server with a global view through a single source store, and periodic analysis can be run against that same source. It gives better results, but the time required is high.
What is the difference between a database, a data warehouse and a data mart?
A database holds a set of logically related data and is normally small in size compared to a data warehouse. A data warehouse is a collection of all sorts of data, from which data is extracted only according to the customer's needs. A data mart, on the other hand, is a set of data designed to cater to the needs of a particular domain; for instance, an organization may keep separate chunks of data for its different departments, i.e. sales, finance, marketing, etc.
When all related relationships and nodes are covered by a single organizational point, it is called a domain. Data management can be improved through domains.
The repository server controls the complete repository, which includes tables, charts and various procedures. Its main function is to ensure repository integrity and consistency, while the powerhouse server governs the execution of the various processes among the factors of the server's database repository.
There can be any number of repositories in Informatica, but ultimately it depends on the number of ports.
Partitioning a session means creating separate, parallel execution sequences within the session. Its main purpose is to improve the server's operation and efficiency: transformations, extractions and the other outputs of individual partitions are carried out in parallel.
To create indexes after the load process, command tasks at the session level can be used. Index-creation scripts can be aligned with the session's workflow or the post-session implementation sequence. This type of index creation cannot be controlled at the transformation level after the load process.
Explain sessions. Explain how batches are used to combine executions?
A set of instructions that needs to be executed to move data from a source to a target is called a session. A session can be run using the Session Manager or the pmcmd command. Batch execution can be used to combine session executions either in a serial or a parallel manner; a batch can contain different sessions running in parallel or serially.
One can group any number of sessions, but migration is easier if the number of sessions in a batch is kept small.
A value that changes during the session's execution is called a mapping variable. Upon completion, the Informatica server stores the end value of the variable and reuses it when the session restarts. Values that do not change during the session's execution are called mapping parameters. The mapping procedure defines the mapping parameters and their usage; values are assigned to these parameters before the session starts.
How can one identify whether a mapping is correct or not without connecting a session?
One can find out whether a mapping is correct or not, without connecting a session, with the help of the debugging option.
Can one use a mapping parameter or variable created in one mapping in any other reusable transformation?
Yes, because a reusable transformation does not belong to any particular mapplet or mapping.
Aggregator transformations process data in chunks of instructions during each run. Intermediate values are stored in a local buffer memory; if extra memory is required, the aggregator uses cache files to store the transformation values.
Briefly describe the lookup transformation?
Lookup transformations are transformations that have access rights to an RDBMS-based data set. The server makes the access faster by using lookup tables to look at explicit table data or the database. The resulting data is obtained by matching the lookup condition for all lookup ports delivered during the transformation.
Dimensions that are used to play different roles while remaining in the same database domain are called role-playing dimensions.
The types of metadata include source definitions, target definitions, mappings, mapplets and transformations.
When data moves from one code page to another, data loss cannot occur provided both code pages contain the same character set: all the characters of the source page must be available in the target page. If all the characters of the source page are not present in the target page, the target page is only a subset and data loss will definitely occur during transformation, because the two code pages are not compatible.
All mappings cannot be validated simultaneously, because only one mapping can be validated at a time.
It allows one to do aggregate calculations such as sums, averages, etc. It is unlike the expression transformation in that it can perform calculations on groups of rows rather than on a single row.
In this transformation, values are calculated in a single row before being written to the target. It can be used to perform non-aggregate calculations, and conditional statements can also be tested before output results go to the target tables.
What do you mean by filter transformation?
A filter transformation filters rows in a mapping based on a specified condition; rows that do not meet the condition are dropped.
A joiner transformation combines two related heterogeneous sources living in different locations, while a source qualifier transformation can only combine data emerging from a common source.
It is used for looking up data in a relational table through a mapping. The lookup definition is imported from any relational database source to which the client and server can connect. One can use multiple lookup transformations in a mapping.
It is a multiple-input-group transformation which can be used to combine data from different sources. It works like the UNION ALL statement in SQL, which is used to combine the result sets of two SELECT statements.
The incremental aggregation option can be enabled when a session is created for a mapping that performs aggregation. PowerCenter then passes new data through the mapping and uses the historical cache data to perform the new aggregation calculations incrementally.
What is the difference between a connected lookup and an unconnected lookup?
When the inputs are taken directly from other transformations in the pipeline, it is called a connected lookup. An unconnected lookup does not take inputs directly from other transformations, but it can be used in any transformation and can be invoked as a function using the :LKP expression, so an unconnected lookup can be called multiple times in a mapping.
What is a mapplet?
A reusable object created using the Mapplet Designer is called a mapplet. It contains a set of transformations and permits one to reuse that transformation logic in multiple mappings.
What does update strategy mean, and what are its different options?
Informatica processes data row by row. By default every row is marked for insert into the target table. The update strategy is used whenever a row has to be updated or inserted based on some condition; the condition must be specified in the update strategy transformation for the processed row to be marked as update or insert.
This happens when the update strategy transformation flags a row with DD_Reject, or when the row violates a database constraint; such rows are written to the reject (bad) file.
A surrogate key is a replacement for the natural primary key. It is a unique identifier for each row in the table. It is very beneficial because the natural primary key can change, which eventually makes updates more difficult. Surrogate keys are always numeric (integer) values.
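As a rough illustration (the table, sequence and column names below are made up for the example, not taken from any project in this document), a surrogate key in an Oracle target can be populated from a sequence instead of reusing the natural key:

CREATE TABLE emp_dim (
    emp_key  NUMBER PRIMARY KEY,    -- surrogate key
    empno    NUMBER,                -- natural key from the source
    ename    VARCHAR2(30),
    sal      NUMBER(7,2)
);

CREATE SEQUENCE emp_key_seq START WITH 1 INCREMENT BY 1;

-- each load takes the next surrogate value, so changes to the natural key never break the dimension
INSERT INTO emp_dim (emp_key, empno, ename, sal)
VALUES (emp_key_seq.NEXTVAL, 7369, 'SMITH', 800);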
To perform session partitioning one needs to configure the session to partition the source data and then install the Informatica server on a machine with multiple CPUs.
Which files are created during session runs by the Informatica server?
During session runs the files created are the errors log, the bad (reject) file, the workflow log and the session log.
It is a set of instructions that guides the PowerCenter server about how and when to move data from sources to targets.
This task permits one or more shell commands (UNIX) or DOS commands (Windows) to run during the workflow.
The command task can be used anywhere in the workflow to run shell commands.
What is meant by pre- and post-session shell commands?
A command task can be called as the pre- or post-session shell command for a session task. One can run it as a pre-session command, a post-session success command or a post-session failure command.
A user-defined event can be described as a flow of tasks in the workflow. Events can be created and then raised as the need arises.
A workflow is a set of instructions that tells the server how to implement tasks.
1. Task Designer
2. Task Developer
3. Workflow Designer
Tell me any other tool for scheduling purposes other than the Workflow Manager and pmcmd?
A third-party scheduling tool such as Control-M can be used instead of the Workflow Manager.
What is a Worklet?
When workflow tasks are grouped in a set, it is called a worklet. Workflow tasks include timer, decision, command, event wait, email, session, link, assignment and control tasks.
What is the use of the Target Designer?
The Target Designer is used to create and import target definitions.
The throughput option can be found in the Workflow Monitor: right-click on the session, click on Get Run Properties, and the throughput figures appear under Source/Target Statistics.
The target load order is specified on the basis of the source qualifiers in a mapping. If there are multiple source qualifiers linked to different targets, one can define the order in which the Informatica server loads data into the targets.
Aggregator Transformation
This is an active transformation which allows you to calculate summaries for a group of records. An aggregator transformation is created with the following components.
Group by
This component defines the group for a specific port (or ports) which participates in the aggregation.
Aggregate Expression
Use aggregate functions to build the aggregate expression, which can be developed either in variable ports or in output ports only.
Sorted input
Group-by ports are sorted using a sorter transformation, and the aggregator receives the sorted data as input to improve the performance of data aggregation.
Keep the sorter transformation before the aggregator transformation to perform the sorting on the group-by ports.
Aggregate Cache
Unsorted inputs
The aggregate cache contains the group-by ports, the non-group-by input ports and the output ports which contain the aggregate expressions.
This transformation offers even more functionality than SQL's GROUP BY statement, since one can apply conditional logic to groups within the aggregator transformation. Many different aggregate functions can be applied to individual output ports within the transformation, and nested aggregate functions can be coded as well. Below is a list of these aggregate functions:
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE
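For comparison, the SQL below shows what a grouped aggregation with conditional logic and a nested aggregate look like outside Informatica (emp is the usual sample table, used only as an illustration):

-- conditional aggregation: only salaries above 1000 contribute to the SUM
SELECT deptno,
       SUM(CASE WHEN sal > 1000 THEN sal ELSE 0 END) AS high_sal_total,
       COUNT(*)                                      AS emp_count
FROM   emp
GROUP  BY deptno;

-- nested aggregate, the SQL counterpart of MAX(SUM(sales)): the largest per-department total
SELECT MAX(dept_total)
FROM   (SELECT SUM(sal) AS dept_total FROM emp GROUP BY deptno);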
To create ports, you can either drag the ports into the aggregator transformation or create them in the Ports tab of the aggregator.
Aggregate Cache: The integration service stores the group values in the index cache and row
data in the data cache.
Aggregate Expression: You can enter expressions in the output port or variable port.
Group by Port: This tells the integration service how to create groups. You can configure input,
input/output or variable ports for the group.
Sorted Input: This option can be used to improve the session performance. You can use this option only when the input to the aggregator transformation is sorted on the group-by ports.
Examples: MAX(SUM(sales))
Conditional clauses
You can reduce the number of rows processed in the aggregation by specifying a conditional clause, for example SUM(SALARY, SALARY > 1000).
This will include only the salaries which are greater than 1000 in the SUM calculation.
Input port
Output port
Variable port (V)
Rank Port (R)
Rank Port
Variable Port
A port which allows you to develop an expression to store data temporarily for the rank calculation is known as a variable port. Variable ports support writing the expressions that are required for the rank calculation.
Top or bottom
Number of Ranks
Top/Bottom Specifies whether you want the top or bottom ranking for a column.
Case-Sensitive String Comparison Specifies whether the Data Integration Service uses case-sensitive
string comparisons when it ranks strings. Clear this option to have the Data Integration Service ignore
case for strings.
Cache Directory- Local directory where the Data Integration Service creates the index cache files and
data cache files. Default is the CacheDir system parameter.
Rank Data Cache Size Data cache size for the transformation. Default is Auto.
Rank Index Cache Size Index cache size for the transformation. Default is Auto.
Tracing Level Amount of detail that appears in the log for this transformation. You can choose terse,
normal, verbose initialization, or verbose data. Default is normal.
Defining Groups
Like the Aggregator transformation, the Rank transformation lets you group information.
Example: If you want to select the 10 most expensive items by manufacturer, you would first define a
group for each manufacturer.
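For reference, the same "top 10 per group" result can be produced in plain SQL with an analytic function (the table and column names below are only illustrative):

SELECT manufacturer, item_name, price
FROM  (SELECT manufacturer, item_name, price,
              RANK() OVER (PARTITION BY manufacturer ORDER BY price DESC) AS price_rank
       FROM   items)
WHERE  price_rank <= 10;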
You will see one port, RANKINDEX, already there. This port stores the ranking of each record and can be used to populate the target as well.
Add all the additional ports from the source input which are going to be used in the following transformations.
Open the Ports tab and first check the Group By option for the desired column (for example deptno in our case).
Also check the Rank (R) option for the port on which you want to do the ranking, for example salary in our case.
We can set the Group By indicator for multiple ports, but ranking can be done on a single port only.
Go to the Properties tab, select the Top/Bottom value as Top and set the Number of Ranks property as needed.
Click OK.
Lookup Transformation
The Lookup transformation is used to look up a source, source qualifier, or target to get the
relevant data. You can look up flat file and relational tables. The Lookup transformation in
Informatica works on similar lines as the joiner, with a few differences. For example, lookup
does not require two sources. Lookup transformations can be connected and unconnected. They
extract the data from the lookup table or file based on the lookup condition.
This is a passive transformation which allows you to perform a lookup on relational tables, flat files, synonyms and views.
When the mapping contains a lookup transformation, the integration service queries the lookup data and compares it with the lookup input port values.
The lookup transformation supports horizontal merging such as equi-joins and non-equi-joins.
For each input row, the Integration Service queries the lookup source or cache based on the
lookup ports and the condition in the transformation.
If the transformation is uncached or uses a static cache, the Integration Service returns values
from the lookup query.
If the transformation uses a dynamic cache, the Integration Service inserts the row into the cache
when it does not find the row in the cache. When the Integration Service finds the row in the
cache, it updates the row in the cache or leaves it unchanged. It flags the row as insert, update, or
no change.
The Integration Service passes return values from the query to the next transformation. If the
transformation uses a dynamic cache, you can pass rows to a Filter or Router transformation to
filter new rows to the target.
If there is no match for the lookup condition, the Integration Service returns the default value for
all output ports.
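Behaviourally, a connected lookup with a static cache resembles a left outer join against the lookup table: matched rows return the lookup values and unmatched rows receive the default value. The table and column names below are illustrative only:

SELECT s.order_id,
       s.cust_id,
       NVL(c.cust_name, 'UNKNOWN') AS cust_name    -- default value when the lookup condition finds no match
FROM   stg_orders s
LEFT   OUTER JOIN cust_lkp c
       ON c.cust_id = s.cust_id;                   -- the lookup condition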
In the expression transformation create an expression for NULL handling (for example an IIF(ISNULL(port), default, port) style expression) for all columns except INS_UPD_User.
Port Name
Exp_Correct
Exp_Exception
In this router T/R, copy the Exp_Exception group ports to an expression T/R and connect that expression T/R's ports to the Excp_Target.
From the router T/R copy the Exp_Correct group ports to the expression T/R.
From the Transformation menu select the Lookup transformation.
Port Name
New Flag
Update Flag
> Copy the Update Flag ports to the expression and connect Target
1. INS_Upd_date
2. INS_Upd_User
Create the transformation type expression to handle the null values; create the T/R type router to pass the correct data to one expression T/R and to pass the exception data to another expression T/R.
NOTE: define the null handling on all the ports except following ports
Emp_insert_DATE
EMP_I/p date_DATE
INS_Upd_User
Note: In the expression T/R write an expression to handle the nulls on all ports except rp-details-last-modified-date and rp-details-last-modified-by.
ISNULL(end-date) AND
There are three sources to load the data into fact table
1. Client order
2. Client allocation
3. Client execution
From the Tools menu select Source Analyzer; from the Sources menu select Create and enter the name stg-transaction-detail-fact.
Column name           Data type   Precision   Scale   Not null   Key type
Branch                Varchar2    4
Account number        Varchar2    10
Deport                Varchar2    2
Client-flag           Varchar2    3
Counterparty-flag     Varchar2    3
Full date             Date
Emp-web-SSO-ID        Varchar2    200
Market                Varchar2    50
Product-ISIN          Varchar2    200
Client-order-amount   Number      20
Allocation-amount     Number      20
Execution-amount      Number      20
Pending-amount        Number      20
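A rough DDL sketch of this staging table, with datatypes inferred from the list above (the exact names, sizes and constraints in the project may differ):

CREATE TABLE stg_transaction_detail_fact (
    branch               VARCHAR2(4),
    account_number       VARCHAR2(10),
    deport               VARCHAR2(2),
    client_flag          VARCHAR2(3),
    counterparty_flag    VARCHAR2(3),
    full_date            DATE,
    emp_web_sso_id       VARCHAR2(200),
    market               VARCHAR2(50),
    product_isin         VARCHAR2(200),
    client_order_amount  NUMBER(20),
    allocation_amount    NUMBER(20),
    execution_amount     NUMBER(20),
    pending_amount       NUMBER(20)
);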
Double click on the expression T/R, select the Ports tab and uncheck the output ports (except client order amount, allocation amount and execution amount).
Branch I
D-branch
(C-branch)
Full date O
Create an output port with the name full- date with the following expression
Create lookup T/Rs which perform lookups on date-dim, market-dim, account-dim, product-dim and employee-dim.
From expression transformation (from source) copy the 13 ports to the expression T/R
Create the transformation type lookup which performs a lookup on the target table (transaction-detail-fact).
From the expression T/R copy the following ports to the lookup transformation:
(Branch, account-no, deport, client-flag, counterparty-flag, full-date, market-code, product-ISIN, EMP-web-SSO-ID)
New -flag O
True,false
Update -flag O
True,false
From the Expression T/R copy all ports to the router T/R
New-flag group condition: new-flag = true
Update-flag group condition: update-flag = true
SQL Transformation:
Go to the Tools menu, click on Target Designer and import the target table.
EMP-SQL:
EMP no    Number
Job       Varchar2
Dept no   Number
Sal       Number
Comm      Number
MGR       Number
Below is the query used in the SQL transformation:
SELECT empno, ename, job, deptno, hiredate, sal, comm, MGR,
Parameter
Click ok
Select empno from the SQ and link it to the SQL T/R input port.
Select all the output columns from the SQL T/R and link them to the target.
At the session level provide the relational connection information (BSR-Reader) to the SQL T/R.
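A plausible form of the full query entered in the SQL transformation, assuming empno is bound as an input parameter in query mode (the ?empno? placeholder is the standard PowerCenter binding syntax; the WHERE clause itself is an assumption):

SELECT empno, ename, job, deptno, hiredate, sal, comm, mgr
FROM   emp
WHERE  empno = ?empno?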
The integration service raises user-defined events during the workflow run.
User-defined events are declared on the Events tab of the workflow properties.
The Event-Raise task is always used in conjunction with an Event-Wait task: the Event-Raise task sends a signal to the Event-Wait task that a particular set of predetermined events has occurred.
A user-defined event is defined as the completion of the tasks from the Start task to the Event-Raise task.
An Event-Wait task waits for a specific event to occur before starting the next task in the workflow.
Attribute: User defined event    Value: Dim-load-complete
From the Workflow Designer select Tasks, then select Command and Event Wait.
Double click on the command task and select the Commands tab.
Name: Success    Command: Copy e:/result.txt to c:/ batch 4pm
Assignment task:
It allows you to assign values or expressions to user-defined workflow variables.
User-defined workflow variables are declared on the Variables tab of the workflow properties.
Decision task:
The decision task allows you to define a condition which is evaluated by the integration service and returns true or false.
From the Tools menu select Workflow Designer; from the Workflow menu select Create.
Enter the workflow name w-daily-weekly-wad.
Select the Variables tab and from the toolbar click on Add a New Variable.
Bellow
Click ok
$assign.Status = succeeded
Attribute    Value
$Decision.Condition = true
1. ETL Stage1
2. ETL Stage2
It is the process of studying and analyzing the source data. We can detect records with null values, duplicate records, inconsistent data and data-definition issues.
Key Points Sequence Generator:
4) Cycle: if checked, the sequence generator returns to the start value when the end value is reached; otherwise it stops.
5) Number of Cached Values: enables storing multiple values when the same sequence generator is used in multiple sessions simultaneously.
6) Reset: if checked, each session returns to the start value; otherwise each new session continues from the last stored value.
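The Cycle and Number of Cached Values properties correspond closely to the options of a database sequence; an Oracle sequence built for the same purpose might look like this (the name and limits are illustrative only):

CREATE SEQUENCE order_key_seq
    START WITH 1
    INCREMENT BY 1
    MAXVALUE 99999
    CYCLE           -- like the Cycle option: wrap back to the start value once the end value is reached
    CACHE 1000;     -- like Number of Cached Values: pre-allocate values for concurrent use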
Scenario 2:
Workflow Scenario:
A workflow has 5 sessions: run the first 4 sessions in parallel, and if all four sessions succeed then execute the 5th session.
The Integration Service inserts new records and updates existing records in the cache. Use a dynamic LOOKUP cache for eliminating duplicates or for implementing slowly changing dimensions (Type 1).
The dynamic LOOKUP cache is used when you perform a LOOKUP on the target table.
The dynamic LOOKUP transformation allows the in-memory image of the target LOOKUP table to be kept synchronized with its physical table in the database.
NewLookupRow = 0: the integration service does not update or insert the row in the cache.
Key Points:
The LOOKUP transformation's associated port matches a LOOKUP input port with the corresponding port in the LOOKUP cache.
Ignore NULL Inputs for Updates should be checked for any port where NULL input values are to be ignored during updates.
Ignore in Comparison should be checked for any port that is not to be compared.
The NewLookupRow flag indicates the type of row manipulation of the cache. If an input row creates an insert in the LOOKUP cache, the flag is set to 1; if an input row creates an update of the LOOKUP cache, the flag is set to 2; if no change is detected, the flag is set to 0. A filter or router T/R can be used with an update strategy to set the proper row flag when updating a target table.
Procedure:
emp no = EmpNO
Ename Ename1
Job Job1
MGR MGR1
Hiredate Hiredate1
sal sal1
comm comm1
Deptno Deptno1
From the LOOKUP transformation copy the following ports to the router transformation (NewLookupRow, empkey, empno, ..deptno)
From the New output group copy the ports (except NewLookupRow) to an update strategy transformation and develop the expression DD_INSERT.
Update Flow:
From the Update output group copy the ports to an update strategy transformation and develop the following expression: DD_UPDATE
Create the following two stored procedures in the target database account.
Create or Replace procedure Emp_Create_Index
    (V_Table_Name     in varchar2,
     V_Index_Col_Name in varchar2,
     V_Index_Name     in varchar2)
as
begin
    Execute Immediate 'CREATE INDEX ' || V_Index_Name || ' ON ' || V_Table_Name ||
                      ' (' || V_Index_Col_Name || ')';
end;
/
Procedure 2:
Create or Replace procedure SP_Drop_Index
    (V_Index_Name in varchar2)
as
begin
    Execute Immediate 'DROP INDEX ' || V_Index_Name;
end;
/
SQL> Create table emp_TGT as select * from scott.emp where 1=2;
Table created.
Mapping Designing:
Create a target definition with the name emp-TGT (using the Target Designer tool).
SP_drop_Index
Execute the following stored procedure in the source database account(username scott)
V_empNo IN Number;
as
begin
SELECT SAL+NVL(COMM, 0), Sal *0.1, sal *0.4
INTO
END;
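The fragment above appears to be missing its header and OUT parameters; a plausible completed form, assuming empno is the input and the total salary, tax and HRA are returned through OUT parameters (the procedure name and parameter names are assumptions based on the ports listed below), would be:

Create or Replace procedure SP_Emp_Sal_Calc        -- name assumed for illustration
    (V_EmpNo     IN  Number,
     V_Total_Sal OUT Number,
     V_Tax       OUT Number,
     V_HRA       OUT Number)
as
begin
    SELECT SAL + NVL(COMM, 0), SAL * 0.1, SAL * 0.4
    INTO   V_Total_Sal, V_Tax, V_HRA
    FROM   EMP
    WHERE  EMPNO = V_EmpNo;
END;
/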
From the source qualifier copy the required ports to the expression transformation.
V_Tax      Decimal  7  2
V_HRA      Decimal  7  2
TOTAL_SAL  Decimal  7  2
TAX        Decimal  7  2    (expression: V_Tax)
HRA        Decimal  7  2    (expression: V_HRA)
SHORTCUTS:
> You can create a shortcut to a shared folder in the same repository.
> When you create a shortcut in the same repository, it is known as a local shortcut.
Open the Repository Manager client; from the Folder menu select Create and enter the folder name AXEDW (any name).
Create a source definition for emp in the Source Analyzer tool.
From the shareable folder drag the source definition emp and drop it on the Source Analyzer workspace belonging to the destination folder. Click on Yes.
Version Control:
> By using version control we maintain the history of the metadata objects.
> You can perform the following change-management tasks to create and manage multiple versions of objects in the repository.
1. Checkin
2. Check out
Check in:
Check Out:
Check in:
Select the mapping m_customer_dimension, right click on the mapping, select Versioning and click on Check In.
Name : Date : Goal
Click on Apply All
Check out:
Select the mapping m_customer_dimension, right click on the mapping, select Versioning and click on Check Out.
Comments
From the Versions menu select the two versions of the mapping, right click on Compare and select Selected Versions.
Click on Save File, enter the file name 4pm_compare and click on Save.
> Partition points mark the boundaries between threads in a pipeline; the integration service redistributes rows of data at partition points.
> You can edit partitions to increase the number of transformation threads and increase session performance.
Types of Partition:
> With key range partitioning, the integration service distributes rows of data based on a port that you define as the partition key, as in the sketch below.
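Conceptually, each key-range partition reads one slice of the source data, much as if the same query were run with a different range predicate per partition (the SAL ranges below are only an example):

-- partition 1
SELECT * FROM emp WHERE sal <  2000;
-- partition 2
SELECT * FROM emp WHERE sal >= 2000 AND sal < 4000;
-- partition 3
SELECT * FROM emp WHERE sal >= 4000;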
1. Designer
2. Testing(Q A)
3. Pre Population
4. Production
Procedure:
> Drag and drop your source and target emp_partition into the Mapping Designer.
> Copy all the columns to SQ_emp and connect them to the target.
> Select the source qualifier (SQ_emp) and click on Edit Partition Point.
Name Destination
Partition 1
Partition 2
Partition 3
Click on Ok
Click on Edit Keys, select the SAL column, click on Add and click OK.
Business Requirements:
Scenario 1:
MARKET HYD
Product IBMshare
Pending Amount 0
Scenario 2:
Client Name
MARKET HYD
Product IBMshare
Scenario 4:
DM_T_PRODUCT_DIM
DM_T_ACCOUNT_DIM
DM_T_EMPLOYEE_DIM
TRANSACTION_DETAIL_FACT
ETL Stage1 Implementation:
The source is defined with flat files; the following are the delimited flat files which provide the data for extraction.
Accounts.Txt
Market.Txt
Product.Txt
Client_Allocation.Txt
Client_Execution.Txt
Client_Order.Txt
Employee.Txt
List of Table:
T_Product
T_Account
T_Employee
T_Market
Client_Order
Client_Allocation
Client_Execution
T_date
Design the simple pass-through mappings which migrate the data from source to staging.
MT_stg_Account_Flatfile_Ora(source to stage)
MT_stg_Client_Allocation_Flatfile_Ora(source to stage)
MT_stg_Client_Execution_Flatfile_Ora(source to stage)
MT_stg_Client_Order_Flatfile_Ora(source to stage)
MT_stg_Employee_Flatfile_Ora(source to stage)
MT_stg_Market_Flatfile_Ora(source to stage)
MT_stg_Product_Flatfile_Ora(source to stage)
Product_SRC
Account_SRC
Employee_SRC
Market_SRC
Client_Order_SRC
Client_Allocation_SRC
Client_Execution_SRC
T_Product
T_Account
T_Employee
T_Market
Client_Order
Client_Allocation
Client_Execution
T_date
Create the following stored procedure to populate the data into the stage table T_date
as
begin
for i in 12000
Loop
v_start_date : = v_start_date + 1;
end Loop;
end;
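A plausible completed form of this procedure, assuming a starting date variable and one row inserted into T_date per iteration (the procedure name, the start date, the column name and the 12000-day span are assumptions inferred from the fragment above):

Create or Replace procedure SP_Load_T_Date                           -- name assumed for illustration
as
    v_start_date DATE := TO_DATE('01-01-1990', 'DD-MM-YYYY');        -- assumed start date
begin
    for i in 1 .. 12000
    Loop
        INSERT INTO T_date (full_date) VALUES (v_start_date);        -- column name assumed
        v_start_date := v_start_date + 1;
    end Loop;
    commit;
end;
/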
Source System:
Define the staging database source system with the following table
T_Account
T_Product
T_Market
T_Employee
Client_Order
Client_Execution
T_Date
DM_T_Account_DIM
DM_T_Account_DIM_EXCEP
DM_T_DATE_DIM
DM_T_EMPLOYEE_DIM
DM_T_EMPLOYEE_DIM_EXCEP
DM_T_Product_DIM
DM_T_Product_DIM_EXCEP
DM_T_MARKET_DIM
DM_T_MARKET_DIM_EXCEP
TRANSACTION_DETAIL_FACT
> For the product dimension and the exception product target, change the product key and the exception key as required, load the product target first, and generate the SQL.
A Type 1 dimension keeps only current data in the target; it does not keep any history.
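In SQL terms, a Type 1 load amounts to an upsert that overwrites the existing row. A rough sketch of that logic against the EMP_DIM_Type1 target described below (the MERGE statement and the emp_key_seq sequence are only an illustration of what the mapping implements, not part of the project):

MERGE INTO emp_dim_type1 d
USING (SELECT empno, ename, job, sal FROM emp) s
ON (d.empno = s.empno)
WHEN MATCHED THEN
    UPDATE SET d.ename = s.ename, d.job = s.job, d.sal = s.sal    -- overwrite: no history kept
WHEN NOT MATCHED THEN
    INSERT (emp_key, empno, ename, job, sal)
    VALUES (emp_key_seq.NEXTVAL, s.empno, s.ename, s.job, s.sal); -- new surrogate key for new rows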
Procedure:
Create a target definition with the name EMP_DIM_Type1 (Emp key, Emp no, ename, job, sal).
Note:
Create a mapping with the name M_EMPLOYEE_DIM_Type1 and drop in the source definition EMP.
Click Done.
From the source qualifier (SQ_EMP) copy the port empno to the lookup transformation.
Double click on the lookup transformation, select the Condition tab and from the toolbar click on Add a New Condition.
From SQ_EMP copy the following ports to the expression transformation: empno, ename, job, sal.
From the lookup transformation copy the port emp key to the expression transformation.
Create the transformation type expression , update strategy and sequence generator
From the router transformation, from the New Record output group, copy the ports (emp no, ename, job, sal) to the expression transformation; from the expression transformation copy the ports to the update strategy transformation.
From the update strategy transformation connect the ports to the target; from the sequence generator transformation connect the NEXTVAL port to the EMP key of the target table.
Create the transformation type expression and update strategy. From the router transformation, from the Update Record output group, copy the following ports to the expression transformation (Emp key, emp no, ename, job, sal).
From the expression transformation copy the ports to the update strategy transformation.
From the update strategy transformation connect the ports to the target; from the Repository menu click on Save.
Create a session with the name S_M_EMPLOYEE_DIM_Type1, double click the session and select the Properties tab.
Attribute Value
Select the Mapping tab, set the reader and writer connections, click Apply and click OK.
SQL> commit;
A Type 2 dimension stores the complete history in the target: for each update it inserts a new record in the target.
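In SQL terms, the Type 2 logic built below boils down to inserting a new row with a fresh surrogate key and an incremented version rather than overwriting the old one (column names follow the EMP-type2 target definition; the statement and the emp_key_seq sequence are only an illustration of the mapping's logic):

-- new version row for a changed employee (empno 7369 used as an example)
INSERT INTO emp_type2 (emp_key, empno, ename, sal, job, deptno, version)
SELECT emp_key_seq.NEXTVAL,
       s.empno, s.ename, s.sal, s.job, s.deptno,
       NVL((SELECT MAX(t.version) FROM emp_type2 t WHERE t.empno = s.empno), 0) + 1
FROM   emp s
WHERE  s.empno = 7369;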
Create a target definition with the name EMP-type2 (emp key, emp no, ename, sal, job, deptno, version).
From the Transformation menu select Create, select the transformation type Lookup, enter the name LKP_TRG and click on Create.
From SQ_EMP copy the port EMPNO to the lookup transformation.
Double click on the lookup transformation, select the Condition tab and from the toolbar click on Add a New Condition.
Emp no = Empno1
Create the transformation type expression; from the source qualifier copy the following ports to the expression transformation.
Create the transformation type router; from the expression T/R copy the following ports to the router transformation (EMPno, Ename, job, sal, deptno, version, new-flag, update-flag).
Create the transformation type expression, update strategy and sequence generator transformations.
From the router T/R, from the New Record output group, copy the following ports to the expression transformation: emp no, ename, job, sal, deptno.
Version Decimal 5 0
From the sequence generator T/R connect the NEXTVAL port to the EMP key of the target table.
Create the transformation type expression and update strategy. From the router T/R, from the Update Record output group, copy the following ports to the expression T/R:
(EMPno, Ename, job, sal, deptno, version)
From the expression T/R copy the ports to the update strategy T/R and develop the following update strategy expression: DD_INSERT
Select