Informatica Question & Answer Set

DWBIConcepts.com

Master Informatica Questions and Answer Set
Version 2.5
The one stop master manual of Informatica interview questions and answers
www.dwbiconcepts.com Community of DWBI Professionals
www.dwbiconcepts.com All rights reserved.

Copyright Notice

Informatica Master Question and Answer Set is copyright DWBIConcepts 2013.

All rights reserved. No part of this book shall be reproduced, stored in a retrieval system,
or transmitted by any means electronic, mechanical, photocopying, recording, or otherwise
without written permission from the publisher. No patent liability is assumed
with respect to the use of the information contained herein. Although every precaution
has been taken in the preparation of this book, the publisher and author assume no
responsibility for errors or omissions. Neither is any liability assumed for damages
resulting from the use of the information contained herein.

Trademarks
All terms mentioned in this book that are known to be trademarks or service marks have
been appropriately capitalized. New Riders Publishing cannot attest to the accuracy of
this information. Use of a term in this book should not be regarded as affecting the
validity of any trademark or service mark.
Warning and disclaimer
Every effort has been made to make this book as complete and as accurate as possible,
but no warranty of fitness is implied. The information is provided on an "as is" basis. The
author and the publisher shall have neither liability nor responsibility to any person or
entity with respect to any loss or damages arising from the information contained in this
book.

How this book should be used

This book contains questions and answers on Informatica PowerCenter and allied tools
that are commonly asked in job interviews. It is written for candidates who are preparing
for such interviews. We suggest that candidates start preparing from this material at
least one week in advance, so that they can finish reading the entire content before
appearing for the interview. A candidate who is stuck on a question or answer, or who
has a doubt, can interact with the experts on the DWBIConcepts forum.
To help readers, we have tagged certain questions as shown below:

Common / Frequently Asked Questions

Harder Questions

Additional Information


Table of Contents
COPYRIGHT NOTICE
TRADEMARKS
WARNING AND DISCLAIMER
HOW THIS BOOK SHOULD BE USED
1. AGGREGATOR TRANSFORMATION
1. WHAT IS AN AGGREGATOR TRANSFORMATION?
2. HOW AN EXPRESSION TRANSFORMATION DIFFERS FROM AGGREGATOR TRANSFORMATION?
3. DOES AN AGGREGATOR TRANSFORMATION SUPPORT ONLY AGGREGATE EXPRESSIONS?
4. GIVE ONE EXAMPLE FOR EACH OF CONDITIONAL AGGREGATION, NON-AGGREGATE EXPRESSION AND NESTED AGGREGATION.
5. HOW DOES AGGREGATOR TRANSFORMATION HANDLE NULL VALUES?
6. WHAT ARE THE PERFORMANCE CONSIDERATIONS WHEN WORKING WITH AGGREGATOR TRANSFORMATION?
7. WHAT ARE THE USES OF INDEX AND DATA CACHE?
8. WHAT DIFFERS WHEN WE CHOOSE SORTED INPUT FOR AGGREGATOR TRANSFORMATION?
9. UNDER WHAT CONDITIONS SELECTING SORTED INPUT IN AGGREGATOR WILL STILL NOT BOOST SESSION PERFORMANCE?
10. UNDER WHAT CONDITION SELECTING SORTED INPUT IN AGGREGATOR MAY FAIL THE SESSION?
11. SUPPOSE WE DO NOT GROUP BY ON ANY PORTS OF THE AGGREGATOR WHAT WILL BE THE OUTPUT.
12. WHAT IS THE EXPECTED VALUE IF THE COLUMN IN AN AGGREGATOR TRANSFORMATION IS NEITHER A GROUP BY NOR AN AGGREGATE EXPRESSION?
13. WHAT IS INCREMENTAL AGGREGATION?
14. SORTED INPUT FOR AGGREGATOR TRANSFORMATION WILL IMPROVE PERFORMANCE OF MAPPING. HOWEVER, IF SORTED INPUT IS USED FOR NESTED AGGREGATE EXPRESSION OR INCREMENTAL AGGREGATION, THEN THE MAPPING MAY RESULT IN SESSION FAILURE. EXPLAIN WHY?
15. HOW CAN WE DELETE DUPLICATE RECORD USING INFORMATICA AGGREGATOR?
16. SCENARIO IMPLEMENTATION 1
2. EXPRESSION TRANSFORMATION
1. WHAT IS AN EXPRESSION TRANSFORM?
2. HOW MANY TYPES OF PORTS ARE THERE IN EXPRESSION TRANSFORM?
3. WHAT IS THE EXECUTION ORDER OF THE PORTS IN AN EXPRESSION?
4. DESCRIBE THE APPROACH FOR THE REQUIREMENT. SUPPOSE THE INPUT IS:
5. HOW CAN WE IMPLEMENT AGGREGATION OPERATION WITHOUT USING AN AGGREGATOR TRANSFORMATION IN INFORMATICA?
3. FILTER TRANSFORMATION
1. WHAT IS A FILTER TRANSFORMATION AND WHY IT IS AN ACTIVE ONE?
2. WHAT IS THE DIFFERENCE BETWEEN SOURCE QUALIFIER TRANSFORMATIONS SOURCE FILTER OPTION AND FILTER TRANSFORMATION?
4. JOINER TRANSFORMATION
1. WHAT IS A JOINER TRANSFORMATION AND WHY IT IS AN ACTIVE ONE?
2. STATE THE LIMITATIONS WHERE WE CANNOT USE JOINER IN THE MAPPING PIPELINE.
3. OUT OF THE TWO INPUT PIPELINES OF A JOINER, WHICH ONE WILL WE SET AS THE MASTER PIPELINE?
4. WHAT ARE THE DIFFERENT TYPES OF JOINS AVAILABLE IN JOINER TRANSFORMATION?
5. DEFINE THE VARIOUS JOIN TYPES OF JOINER TRANSFORMATION.
6. DESCRIBE THE IMPACT OF NUMBER OF JOIN CONDITIONS AND JOIN ORDER IN A JOINER.
7. HOW DOES JOINER TRANSFORMATION TREAT NULL VALUE MATCHING?
8. WHEN WE CONFIGURE THE JOIN CONDITION, WHAT ARE THE GUIDELINES WE NEED TO FOLLOW TO MAINTAIN THE SORT ORDER?
9. WHAT ARE THE TRANSFORMATIONS THAT CANNOT BE PLACED BETWEEN THE SORT ORIGIN AND THE JOINER TRANSFORMATION SO THAT WE DO NOT LOSE THE INPUT SORT ORDER?
10. WHAT IS THE USE OF SORTED INPUT IN JOINER TRANSFORMATION?
11. CAN WE JOIN TWO TABLES BASED ON A JOIN COLUMN HAVING DIFFERENT DATA TYPE?
12. IMPLEMENTATION SCENARIO1 - JOINER TRANSFORMATION IS JOINING TWO TABLES S1 AND S2. S1 HAS 10,000 ROWS AND S2 HAS 1000 ROWS. WHICH TABLE YOU WILL SET MASTER FOR BETTER PERFORMANCE OF JOINER TRANSFORMATION? WHY?
5. LOOKUP TRANSFORMATION
1. WHAT IS A LOOKUP TRANSFORM?
2. WHAT ARE THE DIFFERENCES BETWEEN CONNECTED AND UNCONNECTED LOOKUP?
3. WHAT ARE THE DIFFERENT LOOKUP CACHE(S)?
4. IS LOOKUP AN ACTIVE OR PASSIVE TRANSFORMATION?
5. WHAT IS THE DIFFERENCE BETWEEN STATIC AND DYNAMIC LOOKUP CACHE?
6. WHAT ARE THE USES OF INDEX AND DATA CACHES?
7. WHAT IS PERSISTENT LOOKUP CACHE?
8. WHAT TYPE OF JOIN DOES LOOKUP SUPPORT?
9. EXPLAIN HOW LOOKUP TRANSFORMATION WORKS LIKE SQL LEFT OUTER JOIN.
10. WHERE AND WHY DO WE USE UNCONNECTED LOOKUP INSTEAD OF CONNECTED LOOKUP?
11. HOW CAN WE IDENTIFY PERSISTENT CACHE FILES IN INFORMATICA SERVER?
12. HOW TO CONFIGURE A LOOKUP ON A FLAT FILE WITH HEADER?
13. WHAT IS THE DIFFERENCE BETWEEN PERSISTENT CACHE AND SHARED CACHE?
14. DESCRIBE HOW TO RETURN MULTIPLE PORT VALUES FROM UNCONNECTED LOOKUP IN INFORMATICA.
15. HOW TO MAKE THE PERSISTENT LOOKUP CACHE IN SYNC WITH LOOKUP TABLE?
16. IF WE USE PERSISTENT CACHE FOR A DYNAMIC LOOKUP, WILL THE CACHE FILE BE UPDATED OR INSERTED AS REQUIRED?
17. IS THERE ANYTHING WRONG IN SHARING A PERSISTENT CACHE BETWEEN STATIC AND DYNAMIC LOOKUP?
18. WHAT IS THE DIFFERENCE BETWEEN THE TWO UPDATE PROPERTIES - UPDATE ELSE INSERT, INSERT ELSE UPDATE IN DYNAMIC LOOKUP CACHE?
19. IF THE DEFAULT VALUE FOR THE LOOKUP RETURN PORT IS NOT SET, WHAT WILL BE THE OUTPUT WHEN THE LOOKUP CONDITION FAILS?
20. HOW CAN WE ENSURE DATA IS NOT DUPLICATED IN THE TARGET WHEN THE SOURCE HAS DUPLICATE RECORDS, USING LOOKUP TRANSFORMATION?
6. NORMALIZER TRANSFORMATION
1. WHAT IS A NORMALIZER TRANSFORMATION?
3. WHAT ARE LEVELS IN NORMALIZER TRANSFORMATION?
4. WHAT IS THE PURPOSE OF GCID AND GK IN A NORMALIZER TRANSFORMATION?
7. RANK TRANSFORMATION
1. WHAT IS A RANK TRANSFORM?
2. HOW DOES A RANK TRANSFORM DIFFER FROM AGGREGATOR TRANSFORM FUNCTIONS MAX AND MIN?
3. HOW DOES A RANK CACHE WORKS?
4. WHAT IS A RANK PORT AND RANKINDEX?
5. HOW CAN YOU GET RANKS BASED ON DIFFERENT GROUPS?
6. WHAT HAPPENS IF TWO RANK VALUES MATCH?
7. WHAT ARE THE RESTRICTIONS OF RANK TRANSFORMATION?
8. HOW DOES RANK TRANSFORMATION HANDLE STRING VALUES?
9. WHAT IS DENSE RANK AND DOES INFORMATICA SUPPORTS DENSE RANK?
10. HOW DO WE ACHIEVE DENSE_RANK IN INFORMATICA?
11. SOURCE TABLE HAS 5 ROWS. RANK IN RANK TRANSFORMATION IS SET TO 10. HOW MANY ROWS THE RANK TRANSFORMATION WILL OUTPUT?
12. HOW YOU WILL LOAD UNIQUE RECORD INTO TARGET FLAT FILE FROM SOURCE FLAT FILES HAS DUPLICATE DATA?
8. ROUTER TRANSFORMATION
1. WHAT IS THE DIFFERENCE BETWEEN ROUTER AND FILTER?
2. WHAT IS THE MINIMUM NUMBER OF GROUPS WE CAN DECLARE IN A ROUTER TRANSFORMATION?
9. SEQUENCE GENERATOR TRANSFORMATION
1. WHAT IS A SEQUENCE GENERATOR TRANSFORMATION?
2. DEFINE THE PROPERTIES AVAILABLE IN SEQUENCE GENERATOR TRANSFORMATION IN BRIEF.
5. WHAT ARE THE CHANGES WE OBSERVE WHEN WE PROMOTE A NON-REUSABLE SEQUENCE GENERATOR TO A REUSABLE ONE? AND WHAT HAPPENS IF WE SET THE NUMBER OF CACHED VALUES TO 0 FOR A REUSABLE TRANSFORMATION?
6. HOW SEQUENCE GENERATOR IN THE MAPPING IS HANDLED WHEN WE MIGRATE THE MAPPING FROM ONE ENVIRONMENT TO ANOTHER?
8. HOW DO I GET A SEQUENCE GENERATOR TO "PICK UP" WHERE ANOTHER "LEFT OFF"?
10. STORED PROCEDURE TRANSFORMATION
1. WHAT IS A STORED PROCEDURE TRANSFORMATION?
2. HOW MANY TYPES OF STORED PROCEDURE TRANSFORMATION ARE THERE?
3. HOW DO WE CALL AN UNCONNECTED STORED PROCEDURE TRANSFORMATION?
4. HOW DO WE SET THE EXECUTION ORDER OF PRE-POST LOAD STORED PROCEDURE?
5. HOW DO WE SET THE CALL TEXT FOR STORED PROCEDURE TRANSFORMATION?
6. HOW DO WE RECEIVE OUTPUT/RETURN PARAMETERS FROM UNCONNECTED STORED PROCEDURE?
11. SORTER TRANSFORMATION
1. WHAT IS A SORTER TRANSFORMATION?
2. WHY IS SORTER AN ACTIVE TRANSFORMATION?
3. HOW DOES SORTER HANDLE CASE SENSITIVE SORTING?
4. HOW DOES SORTER HANDLE NULL VALUES?
5. HOW DOES A SORTER CACHE WORKS?
6. HOW TO DELETE DUPLICATE RECORDS OR RATHER TO SELECT DISTINCT ROWS FOR FLAT FILE SOURCES?
12. UNION TRANSFORMATION
1. WHAT IS A UNION TRANSFORMATION?
2. WHAT ARE THE RESTRICTIONS OF UNION TRANSFORMATION?
3. HOW COME UNION TRANSFORMATION IS ACTIVE?
13. UPDATE STRATEGY TRANSFORMATION
1. WHAT IS UPDATE STRATEGY TRANSFORM?
2. WHAT ARE UPDATE STRATEGY CONSTANTS?
3. HOW CAN WE UPDATE A RECORD IN TARGET TABLE WITHOUT USING UPDATE STRATEGY?
4. WHAT IS DATA DRIVEN?
5. WHAT HAPPENS WHEN DD_UPDATE IS DEFINED IN UPDATE STRATEGY AND TREAT SOURCE ROWS AS INSERT IS SELECTED IN SESSION?
6. WHAT ARE THE THREE AREAS WHERE THE ROWS CAN BE FLAGGED FOR PARTICULAR TREATMENT?
7. BY DEFAULT OPERATION CODE FOR ANY ROW IN INFORMATICA WITHOUT BEING ALTERED IS INSERT. THEN STATE WHEN DO WE NEED DD_INSERT?
8. WHAT IS THE DIFFERENCE BETWEEN UPDATE STRATEGY AND FOLLOWING UPDATE OPTIONS IN TARGET?
9. WHAT IS THE USE OF FORWARD REJECT ROWS IN MAPPING?
14. JAVA TRANSFORMATION
15. SOURCE QUALIFIER TRANSFORMATION
1. WHAT IS A SOURCE QUALIFIER? WHAT ARE THE TASKS WE CAN PERFORM USING A SOURCE QUALIFIER AND WHY IT IS AN ACTIVE TRANSFORMATION?
2. WHAT HAPPENS TO A MAPPING IF WE ALTER THE DATA TYPES BETWEEN SOURCE AND ITS CORRESPONDING SOURCE QUALIFIER?
3. SUPPOSE WE HAVE USED THE SELECT DISTINCT AND THE NUMBER OF SORTED PORTS PROPERTY IN THE SOURCE QUALIFIER AND THEN WE ADD CUSTOM SQL QUERY. EXPLAIN WHAT WILL HAPPEN.
4. DESCRIBE THE SITUATIONS WHERE WE WILL USE THE SOURCE FILTER, SELECT DISTINCT AND NUMBER OF SORTED PORTS PROPERTIES OF SOURCE QUALIFIER TRANSFORMATION.
5. WHAT WILL HAPPEN IF THE SELECT LIST COLUMNS IN THE CUSTOM OVERRIDE SQL QUERY AND THE OUTPUT PORTS ORDER IN SOURCE QUALIFIER TRANSFORMATION DO NOT MATCH?
6. WHAT HAPPENS IF IN THE SOURCE FILTER PROPERTY OF SQ TRANSFORMATION WE INCLUDE KEYWORD WHERE SAY, WHERE CUSTOMERS.CUSTOMER_ID > 1000.
7. DESCRIBE THE SCENARIOS WHERE WE GO FOR JOINER TRANSFORMATION INSTEAD OF SOURCE QUALIFIER TRANSFORMATION.
8. WHAT IS THE MAXIMUM NUMBER WE CAN USE IN NUMBER OF SORTED PORTS FOR SYBASE SOURCE SYSTEM?
9. WHAT IS USE OF SOURCE QUALIFIER IN INFORMATICA? CAN WE CREATE A MAPPING WITHOUT A SOURCE QUALIFIER?
10. SUPPOSE WE HAVE TWO TABLES OF SAME DATABASE TYPE, RESIDING IN DIFFERENT DATABASE INSTANCE. IF A DATABASE LINK IS AVAILABLE, HOW CAN WE JOIN THE TWO TABLES USING A SOURCE QUALIFIER IN INFORMATICA PROVIDED THERE ARE VALID JOIN COLUMNS.
11. WHAT IS THE MEANING OF OUTPUT IS DETERMINISTIC PROPERTY IN SOURCE QUALIFIER TRANSFORMATION?
16. MISCELLANEOUS
1. WHAT ARE THE NEW FEATURES OF INFORMATICA 9.X IN DEVELOPER LEVEL?
2. NAME THE TRANSFORMATIONS WHICH CONVERTS ONE TO MANY ROWS I.E. INCREASES THE I/P: O/P ROW COUNT. ALSO WHAT IS THE NAME OF ITS REVERSE TRANSFORMATION?
3. HOW MANY WAYS WE CAN FILTER RECORDS?
4. WHAT ARE THE TRANSFORMATIONS THAT USE CACHE FOR PERFORMANCE?
5. WHAT IS THE FORMULA FOR CALCULATION OF LOOKUP/RANK/AGGREGATOR INDEX & DATA CACHES?
6. WHAT IS THE DIFFERENCE BETWEEN INFORMATICA POWERCENTER AND EXCHANGE AND MART?
7. HOW DO WE HANDLE DELIMITER CHARACTER AS A PART OF THE DATA IN A DELIMITED SOURCE FILE?
8. WE HAVE JUST RECEIVED SOURCE FILES FROM UNIX. WE WANT TO STAGE THAT DATA TO ETL PROCESS. WHAT ARE THE POINTS WE NEED TO LOOK FOR?
9. WHAT IS THE DIFFERENCE BETWEEN JOINER AND LOOKUP. PERFORMANCE WISE WHICH ONE IS BETTER TO USE.
10. WHAT IS THE B2B IN INFORMATICA? HOW CAN WE USE IT IN INFORMATICA?
11. WHAT IS CDC, SCD AND MD5 IN INFORMATICA?
12. HOW CAN WE IMPLEMENT AN SCD TYPE2 MAPPING WITHOUT USING A LOOKUP TRANSFORMATION?
13. HOW DOES JOINER AND LOOKUP TRANSFORMATION TREAT NULL VALUE MATCHING?
14. DOES MICROSOFT SQL SERVER SUPPORTS BULK LOADING? IF YES, WHAT HAPPENS WHEN YOU SPECIFY BULK MODE AND DATA DRIVEN FOR SQL SERVER TARGET
15. HOW CAN YOU UTILIZE COM COMPONENTS IN INFORMATICA?
16. WHAT IS SQL TRANSFORMATION IN INFORMATICA?
17. WHAT IS A XML SOURCE QUALIFIER?
18. WHAT IS THE METADATA EXTENSIONS TAB IN INFORMATICA?
19. DESCRIBE SOME OF THE ETL BEST PRACTICES
20. IS THERE A SCOPE OF CLOUD COMPUTING IN DATA WAREHOUSING TECHNOLOGY?
17. MAPPING
2. WHAT ARE MAPPING PARAMETERS AND VARIABLES?
4. WHAT ARE THE DEFAULT VALUES FOR VARIABLES?
5. WHAT DOES FIRST COLUMN OF BAD FILE (REJECTED ROWS) INDICATES?
6. OUT OF 100000 SOURCE ROWS SOME ROWS GET DISCARD AT TARGET, HOW WILL YOU TRACE THEM AND WHERE IT GETS LOADED?
7. WHAT IS REJECT LOADING?
8. WHY INFORMATICA WRITER THREAD MAY REJECT A RECORD?
9. WHY TARGET DATABASE CAN REJECT A RECORD?
10. DESCRIBE VARIOUS STEPS FOR LOADING REJECT FILE?
11. VARIABLE V1 HAS VALUES SET AS 5 IN DESIGNER (DEFAULT), 10 IN PARAMETER FILE, AND 15 IN REPOSITORY. WHILE RUNNING SESSION WHICH VALUE INFORMATICA WILL READ?
12. WHAT ARE SHORTCUTS? WHERE IT CAN BE USED? WHAT ARE THE ADVANTAGES?
13. CAN WE HAVE AN INFORMATICA MAPPING WITH TWO PIPELINES, WHERE ONE FLOW IS HAVING A TRANSACTION CONTROL TRANSFORMATION AND ANOTHER NOT. EXPLAIN WHY?
14. HOW CAN WE IMPLEMENT REVERSE PIVOTING USING INFORMATICA TRANSFORMATIONS?
15. IS IT POSSIBLE TO UPDATE A TARGET TABLE WITHOUT ANY KEY COLUMN IN TARGET?
18. MAPPLET
1. WHAT IS A MAPPLET?
2. WHAT IS THE DIFFERENCE BETWEEN REUSABLE TRANSFORMATION AND MAPPLET?
3. WHAT ARE THE TRANSFORMATIONS THAT ARE NOT SUPPORTED IN MAPPLET?
4. IS IT POSSIBLE TO CONVERT REUSABLE TRANSFORMATION TO A NON-REUSABLE ONE?
5. WHAT IS THE USE OF MAPPLET & WORKLET IN PROJECT?
6. IS IT POSSIBLE TO HAVE A MAPPLET WITHIN A MAPPLET AND WORKLET WITHIN A WORKLET?
19. SESSION
1. WHAT IS SESSION AND BATCHES?
2. WHAT ARE VARIOUS SESSION TRACING LEVELS?
3. CAN WE COPY A SESSION TO NEW FOLDER OR NEW REPOSITORY?
4. IS IT POSSIBLE TO STORE ALL THE INFORMATICA SESSION LOG INFORMATION IN A DATABASE TABLE? NORMALLY THE SESSION LOG IS STORED AS A BINARY COMPRESSION .BIN FILE IN SESSLOGS DIRECTORY. CAN WE STORE THE SAME INFORMATION IN DATABASE TABLES FOR FUTURE ANALYSIS?
5. CAN WE CALL A SHELL SCRIPT FROM SESSION PROPERTIES?
6. CAN WE CHANGE THE SOURCE AND TARGET TABLE NAMES IN SESSION LEVEL?
7. HOW TO WRITE FLAT FILE COLUMN NAMES IN TARGET?
8. WHAT ARE THE ERROR TABLES PRESENT IN INFORMATICA?
9. WHAT ARE THE ALTERNATE WAYS TO STOP A SESSION WITHOUT USING STOP ON ERRORS OPTION SET TO 1 IN SESSION PROPERTIES?
10. SUPPOSE A SESSION FAILS AFTER LOADING OF 10,000 RECORDS IN THE TARGET. HOW CAN WE LOAD THE RECORDS FROM 10,001 WHEN WE RUN THE SESSION NEXT TIME?
11. DEFINE THE TYPES OF COMMIT INTERVALS APART FROM USER DEFINED?
12. SUPPOSE SESSION IS CONFIGURED WITH COMMIT INTERVAL OF 10,000 ROWS AND SOURCE HAS 50,000 ROWS EXPLAIN THE COMMIT POINTS FOR SOURCE BASED COMMIT & TARGET BASED COMMIT. ASSUME APPROPRIATE VALUE WHEREVER REQUIRED?
13. HOW TO CAPTURE PERFORMANCE STATISTICS OF INDIVIDUAL TRANSFORMATION IN THE MAPPING AND EXPLAIN SOME IMPORTANT STATISTICS THAT CAN BE CAPTURED?
14. HOW CAN WE PARAMETERIZE SUCCESS OR FAILURE EMAIL LIST?
15. IS IT POSSIBLE THAT A SESSION FAILED BUT STILL THE WORKFLOW STATUS IS SHOWING SUCCESS?
16. WHAT IS BUSY PERCENTAGE?
17. CAN WE WRITE A PL/SQL BLOCK IN PRE AND POST SESSION OR IN TARGET QUERY OVERRIDE?
18. WHENEVER A SESSION RUNS DOES THE DATA GETS OVERWRITTEN IN A FLAT FILE TARGET? IS IT POSSIBLE TO KEEP THE EXISTING DATA AND ADD THE NEW DATA TO THE TARGET FILE?
19. CAN WE USE THE SAME SESSION TO LOAD A TARGET TABLE IN DIFFERENT DATABASES HAVING SAME TARGET DEFINITION?
20. HOW DO YOU REMOVE THE CACHE FILES AFTER THE TRANSFORMATION?
21. WHY DOESN'T A RUNNING SESSION QUIT WHEN ORACLE OR SYBASE RETURN FATAL ERRORS?
20. WORKFLOW
1. WHAT IS THE DIFFERENCE BETWEEN STOP AND ABORT OPTIONS IN WORKFLOW?
2. RUNNING INFORMATICA WORKFLOW CONTINUOUSLY HOW TO RUN A WORKFLOW CONTINUOUSLY UNTIL A CERTAIN CONDITION IS MET?
3. HOW DO WE SEND EMAILS FROM INFORMATICA AFTER THE SUCCESSFUL COMPLETION OF ONE SESSION? THE EMAIL WILL CONTAIN THE JOB NAME/ SESSION START TIME AND SESSION END TIME IN THE MESSAGE BODY.
5. HOW CAN WE SEND TWO SEPARATE EMAILS AFTER A SUCCESSFUL SESSION RUN?
6. WHAT IS COLD START IN INFORMATICA?
8. WE KNOW THERE ARE 3 OPTIONS FOR SESSION RECOVERY STRATEGY - RESTART TASK, FAIL TASK AND CONTINUE RUNNING THE WORKFLOW, RESUME FROM LAST CHECKPOINT WHENEVER A SESSION FAILS. HOW DO WE RESTART A WORKFLOW AUTOMATICALLY WITHOUT ANY MANUAL INTERVENTION IN THE EVENT OF SESSION FAILURE?
9. WHAT IS THE DIFFERENCE REAL-TIME AND CONTINUOUS WORKFLOWS?
12. HOW DO WE SEND A SESSION FAILURE MAIL WITH THE WORKFLOW OR SESSION LOG AS ATTACHMENT?
13. EXPLAIN DEADLOCK IN INFORMATICA AND HOW DO WE RESOLVE IT?
15. HOW CAN WE PASS A VALUE FROM ONE WORKFLOW TO ANOTHER?
21. ADMINISTRATION
1. WHAT IS LOAD MANAGER?
2. WHAT IS DTM PROCESS? HOW MANY THREADS IT CREATES TO PROCESS DATA, EXPLAIN EACH THREAD IN BRIEF?
3. CAN YOU CREATE A FOLDER WITHIN DESIGNER?
4. HOW DO YOU TAKE CARE OF SECURITY USING A REPOSITORY MANAGER?
5. WHAT ARE THE DIFFERENT USES OF A REPOSITORY MANAGER?
6. WHAT ARE 2 MODES OF DATA MOVEMENT IN INFORMATICA SERVER?
7. WHAT IS CODE PAGE USED FOR?
8. WHAT IS CODE PAGE COMPATIBILITY?
9. WHAT IS DEFAULT BLOCK BUFFER SIZE?
10. WHAT IS DEFAULT LM SHARED MEMORY SIZE?
11. DEFINE SERVER CONCEPTS WITH RESPECT TO MEMORY BUFFERS
12. WHAT ARE THE TWO PROGRAMS THAT COMMUNICATE WITH THE INFORMATICA SERVER?
22. COMMAND LINE ARGUMENTS
1. WHAT IS PMCMD COMMANDS?
2. WHAT IS PMREP COMMANDS?
3. HOW DO WE START & STOP SESSION FROM PMCMD COMMAND LINE?
23. METADATA REPOSITORY
1. IS THERE ANY METADATA QUERY TO FIND THE LIST OF INFORMATICA FOLDER NAME, WORKFLOW NAMES WHICH ARE MIGRATED IN A PARTICULAR QUARTER?
3. WRITE A METADATA QUERY TO IDENTIFY THE SESSIONS HAVING TRUNCATE OPTION ENABLED
4. WHERE CAN I FIND A HISTORY / METRICS OF THE LOAD SESSIONS THAT HAVE OCCURRED IN INFORMATICA?
5. HOW TO EXTRACT THE WORKFLOW MONITOR RECORD INFORMATION FROM INFORMATICA METADATA REPOSITORY?
24. REPOSITORY MANAGER
1. DESCRIBE THE STEPS FOR EXPORT AND IMPORT?
2. WHAT ARE THE VARIOUS METHODS OF CODE MIGRATION OR WHICH IS THE BEST WAY OF DEPLOYMENT?
3. WHAT ARE THE VARIOUS OPTIONS FOR ETL CODE MIGRATION
4. WHAT IS LABELING IN INFORMATICA?
5. SUPPOSE HAVING INFORMATICA VERSION CONTROL IN PLACE, CAN WE REVERT BACK AN OBJECT TO A STATE OF TWO PREVIOUS VERSION.
6. WHAT DO WE MEAN BY TEAM BASED DEVELOPMENT IN INFORMATICA?
25. SCENARIO QUESTIONS
1. SUPPOSE WE HAVE TEN SOURCE FLAT FILES OF SAME STRUCTURE. HOW CAN WE LOAD ALL THE FILES IN TARGET DATABASE IN A SINGLE BATCH RUN USING A SINGLE MAPPING?
2. SUPPOSE WE HAVE TWO SOURCE QUALIFIER TRANSFORMATIONS SQ1 AND SQ2 CONNECTED TO TARGET TABLES TGT1 AND TGT2 RESPECTIVELY. HOW DO YOU ENSURE TGT2 IS LOADED AFTER TGT1?
3. SUPPOSE WE HAVE A SOURCE QUALIFIER TRANSFORMATION THAT POPULATES TWO TARGET TABLES. HOW DO YOU ENSURE TGT2 IS LOADED AFTER TGT1?
4. SUPPOSE WE HAVE THE EMP TABLE AS OUR SOURCE. IN THE TARGET WE WANT TO VIEW THOSE EMPLOYEES WHOSE SALARY ARE GREATER THAN OR EQUAL TO THE AVERAGE SALARY FOR THEIR DEPARTMENTS. DESCRIBE YOUR MAPPING APPROACH.
5. HOW CAN WE PERFORM CHANGED DATA CAPTURE BASED ON LOAD SEQUENCE NUMBER (INTEGER) COLUMN PRESENT IN THE SOURCE TABLE?
7. HOW CAN WE LOAD X RECORDS (USER DEFINED RECORD NUMBERS) OUT OF N RECORDS FROM SOURCE DYNAMICALLY, WITHOUT USING FILTER AND SEQUENCE GENERATOR TRANSFORMATION?
8. SUPPOSE WE HAVE N NUMBER OF ROWS IN THE SOURCE AND WE HAVE TWO TARGET TABLES. HOW CAN WE LOAD N/2 I.E. FIRST HALF THE SOURCE DATA INTO ONE TARGET AND THE REMAINING HALF INTO THE NEXT TARGET?
9. SUPPOSE WE HAVE A FLAT FILE WHICH HAS A HEADER RECORD WITH FILE CREATION DATE, AND DETAILED DATA RECORDS. DESCRIBE THE APPROACH TO LOAD THE 'FILE CREATION DATE' COLUMN ALONG WITH EACH AND EVERY DETAILED RECORD.
11. SUPPOSE WE HAVE A FLAT FILE WHICH CONTAINS JUST A NUMERIC VALUE. WE NEED TO POPULATE THIS VALUE IN ONE COLUMN OF THE TARGET TABLE FOR EVERY SOURCE RECORD. HOW CAN WE ACHIEVE THIS?
12. HOW WILL YOU LOAD A SOURCE FLAT FILE INTO A STAGING TABLE WHEN THE FILE NAME IS NOT FIXED? THE FILE NAME IS LIKE SALES_2013_02_22.TXT, I.E. DATE IS APPENDED AT THE END OF THE FILE AS A PART OF FILE NAME.
13. SOLVE THE BELOW SCENARIO USING INFORMATICA AND DATABASE SQL.
14. SUPPOSE WE HAVE A COLUMN IN SOURCE WITH VALUES AS BELOW:
15. CAN WE PASS THE VALUE OF A MAPPING VARIABLE BETWEEN 2 PIPELINES UNDER THE SAME MAPPING? IF NOT HOW CAN WE ACHIEVE THIS?
18. IMPLEMENT SLOWLY CHANGING DIMENSION OF TYPE 2 WHICH WILL LOAD CURRENT RECORD IN CURRENT TABLE AND OLD DATA IN LOG TABLE.
26. PERFORMANCE TUNING
1. WHICH ONE IS FASTER CONNECTED OR UNCONNECTED LOOKUP?
2. HOW WE CAN IMPROVE PERFORMANCE OF INFORMATICA NORMALIZATION TRANSFORMATION.
3. HOW TO IMPROVE THE SESSION PERFORMANCE?
4. HOW DO YOU IDENTIFY THE BOTTLENECKS IN MAPPINGS?
5. HOW DO YOU HANDLE PERFORMANCE ISSUES IN INFORMATICA? WHERE CAN YOU MONITOR THE PERFORMANCE?
6. WHAT ARE PERFORMANCE COUNTERS?
7. HOW CAN WE INCREASE SESSION PERFORMANCE?

Topic Matrix:

Serial Number  Topics                  Questions
1              Aggregator              17
2              Expression              10
3              Filter                  2
4              Joiner                  12
5              Lookup                  20
6              Normalizer              4
7              Rank                    12
8              Router                  5
9              Sequence Generator      8
10             Stored Procedure        6
11             Sorter                  6
12             Union                   3
13             Update Strategy         10
14             Java                    2
15             Source Qualifier        12
16             Miscellaneous           20
17             Mapping                 12
18             Mapplet                 6
19             Session                 22
20             Workflow                15
21             Administration          12
22             Command Line Arguments  3
23             Metadata Repository     5
24             Repository Manager      6
25             Scenario Questions      18
26             Performance Tuning      8

1. Aggregator Transformation

1. What is an Aggregator Transformation?

Answer:
An aggregator is an Active, Connected transformation which performs aggregate calculations like AVG,
COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM and VARIANCE.

2. How an Expression Transformation differs from Aggregator Transformation?

Answer:
An Expression Transformation performs calculations on a row-by-row basis, whereas an Aggregator
Transformation performs calculations on groups of rows.
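
The contrast can be sketched in Python. This is only an illustration of the two behaviours, not Informatica code; the sample rows, the column names and the 10% raise are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical sample rows, assumed for illustration only.
rows = [
    {"dept": "HR", "salary": 100.0},
    {"dept": "HR", "salary": 200.0},
    {"dept": "IT", "salary": 300.0},
]

# Expression-style processing: one output row for every input row.
raised = [{**r, "new_salary": r["salary"] * 1.1} for r in rows]

# Aggregator-style processing: one output row for every group.
totals = defaultdict(float)
for r in rows:
    totals[r["dept"]] += r["salary"]

print(len(raised))   # same number of rows as the input
print(dict(totals))  # one total per department
```

The row count is unchanged by the row-level calculation, while the group-level calculation collapses the input to one row per department.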

3. Does an Aggregator Transformation support only aggregate expressions?

Answer:
Apart from aggregate expressions, aggregator transformation supports non-aggregate expressions and
conditional clauses.

4. Give one example for each of Conditional Aggregation, Non-Aggregate expression and
Nested Aggregation.

Answer:
Use conditional clauses in the aggregate expression to reduce the number of rows used in the
aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
SUM( SALARY, JOB = 'CLERK' )

Use non-aggregate expressions in group-by ports to modify or replace groups.
IIF( PRODUCT = 'Brown Bread', 'Bread', PRODUCT )

A nested aggregation expression includes one aggregate function within another aggregate
function.
MAX( COUNT( PRODUCT ) )
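
For readers more familiar with general-purpose languages, the three expressions above can be mimicked in Python. This is a rough sketch of the semantics only; the sample data is assumed, and the nested aggregation is applied here to the regrouped product names.

```python
from collections import Counter

# Hypothetical sample data, assumed for illustration only.
employees = [
    {"job": "CLERK", "salary": 1000},
    {"job": "CLERK", "salary": 1200},
    {"job": "MANAGER", "salary": 3000},
]
products = ["Brown Bread", "Bread", "Milk", "Milk", "Bread"]

# Conditional aggregation, like SUM(SALARY, JOB = 'CLERK'):
# only rows that satisfy the condition take part in the sum.
clerk_total = sum(e["salary"] for e in employees if e["job"] == "CLERK")

# Non-aggregate expression on a group-by port, like
# IIF(PRODUCT = 'Brown Bread', 'Bread', PRODUCT): it replaces the group value.
groups = ["Bread" if p == "Brown Bread" else p for p in products]

# Nested aggregation, like MAX(COUNT(PRODUCT)): the size of the largest group.
largest_group = max(Counter(groups).values())

print(clerk_total, largest_group)
```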

5. How does Aggregator Transformation handle NULL values?

Answer:
By default, the Aggregator transformation treats null values as NULL in aggregate functions. We
can, however, configure it to treat null values in aggregate functions as zero instead.
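
The practical effect of the two settings is easiest to see with an average. The Python sketch below uses `None` to stand in for NULL and assumes that NULL rows are skipped by the aggregate function and that an all-NULL input yields NULL; the function name and sample values are made up for the example, so the exact behaviour should be checked against the product documentation.

```python
def avg(values, nulls_as_zero=False):
    """Average a column that may contain None (standing in for NULL)."""
    if nulls_as_zero:
        # "Treat nulls as zero": every NULL becomes 0 and is counted.
        values = [0 if v is None else v for v in values]
    present = [v for v in values if v is not None]
    # Assumption: an all-NULL (or empty) input gives NULL, not a number.
    return sum(present) / len(present) if present else None

marks = [10, None, 20]
print(avg(marks))                      # NULL row is skipped
print(avg(marks, nulls_as_zero=True))  # NULL row counts as 0
```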

6. What are the performance considerations when working with Aggregator Transformation?

Answer:
Filter out unnecessary data before aggregating it. Place a Filter transformation in the mapping
before the Aggregator transformation to reduce unnecessary aggregation.
Connect only the necessary input/output ports to subsequent transformations, thereby reducing
the size of the data cache.
Use Sorted Input, which reduces the amount of data cached and improves session performance.

Aggregator performance improves dramatically if records are sorted before they reach the aggregator and
the Sorted Input option under the aggregator properties is checked. The record set should be sorted on the
columns that are used in the Group By operation.

It is often a good idea to sort the record set at the database level, e.g. inside
a Source Qualifier transformation, unless there is a chance that the already sorted records from
the Source Qualifier can become unsorted again before reaching the aggregator.

7. What are the uses of index and data cache?

Answer:
Group data is stored in the index cache files, whereas row data is stored in the data cache files.

8. What differs when we choose Sorted Input for Aggregator Transformation?

Answer:
The Integration Service creates the index and data caches in memory to process the Aggregator
transformation. If it requires more space than is allocated for the index and data cache sizes in the
transformation properties, it stores overflow values in cache files, i.e. it pages to disk.
One way to increase session performance is to increase the index and data cache sizes in the transformation
properties.
When we check Sorted Input, however, the Integration Service processes the Aggregator transformation
in memory and does not use cache files.
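
Why sorted input needs so little cache can be sketched in Python. This is a conceptual illustration, not how the Integration Service is implemented; the function names and sample rows are assumptions made up for the example.

```python
from itertools import groupby
from operator import itemgetter

def aggregate_unsorted(rows):
    """Every group must stay in the cache until the last row has been read."""
    cache = {}
    for key, value in rows:
        cache[key] = cache.get(key, 0) + value
    return list(cache.items())

def aggregate_sorted(rows):
    """Input sorted by key: a group is complete as soon as the key changes,
    so only the current group has to be held in memory."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)

rows = [("A", 1), ("A", 2), ("B", 5), ("C", 3), ("C", 4)]
print(aggregate_unsorted(rows))
print(list(aggregate_sorted(rows)))

# If the input is not really sorted, the streaming version splits a group.
print(list(aggregate_sorted([("A", 1), ("B", 2), ("A", 3)])))
```

The last call shows the risk of declaring input as sorted when it is not: the key "A" is emitted twice, as two separate groups.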

9. Under what conditions selecting Sorted Input in aggregator will still not boost session performance?

Answer:
The Incremental Aggregation session option is enabled.
The aggregate expression contains nested aggregate functions.
The session property Treat Source Rows As is set to Data Driven.

10. Under what condition selecting Sorted Input in aggregator may fail the session?

Answer:
If the input data is not sorted correctly, the session will fail.
Even if the input data is properly sorted, the session may fail if the sort-by ports and the group-by
ports of the aggregator are not in the same order.

11. Suppose we do not group by on any ports of the aggregator what will be the output.

Answer:

If an input port is used neither as a group-by port nor in an aggregate expression, the Integration
Service returns only the value from the last input row for that column.

For example, if 100 rows come from the source and there are no group-by ports, the aggregator outputs
only the last record (the 100th record).

12. What is the expected value if the column in an aggregator transformation is neither a
group by nor an aggregate expression?

Answer:
The Integration Service produces one row for each group based on the group-by ports. A column that is
neither part of the group-by key nor an aggregate expression returns the corresponding value from the
last record received for the group.
However, if we explicitly use the FIRST function, the Integration Service returns the value from the
first row of the group. In other words, the default behaviour is that of the LAST function.
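
The last-row behaviour can be pictured with a short Python sketch. It keeps whole rows for simplicity; the function name, the `use_first` switch and the sample rows are hypothetical and exist only for this illustration.

```python
def one_row_per_group(rows, key, use_first=False):
    """Return one row per group, keeping the first or the last row seen."""
    result = {}
    for row in rows:
        k = row[key]
        if use_first and k in result:
            continue  # FIRST-like: keep the earliest row of the group
        result[k] = row  # LAST-like (default): later rows overwrite earlier ones
    return list(result.values())

rows = [
    {"dept": 10, "name": "Ann"},
    {"dept": 10, "name": "Bob"},
    {"dept": 20, "name": "Cid"},
]
print(one_row_per_group(rows, "dept"))
print(one_row_per_group(rows, "dept", use_first=True))
```

The same idea is what makes an aggregator usable for removing duplicates: grouping on the duplicated columns leaves one row per distinct key.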

13. What is Incremental Aggregation?

Answer:
We can enable the Incremental Aggregation session option for a session that includes an Aggregator
Transformation. When the Integration Service performs incremental aggregation, it passes only the changed
source data through the mapping and uses the historical cache data to perform the aggregate calculations
incrementally.
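
Conceptually, incremental aggregation folds each run's new rows into totals saved by earlier runs instead of recomputing everything. The Python sketch below illustrates the idea with a running SUM; the function name and the data are assumptions, and a plain dictionary stands in for the historical cache.

```python
def incremental_sum(historical_cache, new_rows):
    """Fold only the new rows into the totals saved from earlier runs."""
    cache = dict(historical_cache)  # copy, so the saved cache is not mutated
    for key, amount in new_rows:
        cache[key] = cache.get(key, 0) + amount
    return cache

# First run: the cache is empty, so all rows are aggregated.
run1 = incremental_sum({}, [("A", 10), ("B", 5)])
# Second run: only the changed/new source rows are passed in.
run2 = incremental_sum(run1, [("A", 7), ("C", 1)])
print(run1)
print(run2)
```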

14. Sorted input for aggregator transformation will improve performance of mapping. However,
if sorted input is used for nested aggregate expression or incremental aggregation,
then the mapping may result in session failure. Explain why?

Answer:
In a nested aggregation there are multiple levels of sorting, because each aggregate function requires
its own sorting pass. After the first level of aggregation, the sort order of the group-by column
may get jumbled, so Informatica must sort the data again internally before the second level of aggregation.
However, if we have indicated that the input is already sorted, Informatica will not do this sorting,
which results in failure.

In incremental aggregation, the aggregate calculations are stored in a historical cache on the server, where
the data may not be in sorted order. With sorted input, the records arrive presorted for that particular
run, but the data in the historical cache may still be unsorted.

15. How can we delete duplicate record using Informatica Aggregator?

Answer:
One way to handle duplicate records in a source batch run is to use an Aggregator Transformation and check
the Group By checkbox on the ports that contain the duplicated data. This gives us the flexibility to
select either the last or the first of the duplicate records.

16. Scenario Implementation 1

Suppose in our Source Table we have data as given below:
Student Name   Subject Name       Marks
Sam            Maths              100
Tom            Maths              80
Sam            Physical Science   80
John           Maths              75
Sam            Life Science       70
John           Life Science       100
John           Physical Science   85
Tom            Life Science       100
Tom            Physical Science   85

We want to load our Target Table as:
Student Name   Maths   Life Science   Physical Science
Sam            100     70             80
John           75      100            85
Tom            80      100            85

Describe your approach.
Answer:
Here our scenario is to convert many rows to one row, and the transformation which will help us to achieve
this is Aggregator.
Our Mapping will look like this:

We will sort the source data on STUDENT_NAME ascending, followed by SUBJECT ascending.

Then, with STUDENT_NAME as the GROUP BY port, the output subject ports are populated as:
MATHS: MAX( MARKS, SUBJECT = 'Maths' )
LIFE_SC: MAX( MARKS, SUBJECT = 'Life Science' )
PHY_SC: MAX( MARKS, SUBJECT = 'Physical Science' )
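
To see what the conditional MAX expressions do, here is a small Python sketch of the same rows-to-columns pivot. It is an illustration only; the function name pivot_marks and the dictionary layout are assumptions made for this example.

```python
# Maps each subject value to the output port it feeds (port names as in the answer).
SUBJECT_PORTS = {
    "Maths": "MATHS",
    "Life Science": "LIFE_SC",
    "Physical Science": "PHY_SC",
}


def pivot_marks(rows):
    """rows: iterable of (student, subject, marks). Returns {student: {port: max marks}}."""
    result = {}
    for student, subject, marks in rows:
        # One output row per student (the GROUP BY port), all subject ports start as NULL.
        ports = result.setdefault(student, {p: None for p in SUBJECT_PORTS.values()})
        port = SUBJECT_PORTS.get(subject)
        # MAX(MARKS, SUBJECT = '...'): only rows of the matching subject are considered.
        if port is not None and (ports[port] is None or marks > ports[port]):
            ports[port] = marks
    return result


source = [
    ("Sam", "Maths", 100), ("Tom", "Maths", 80), ("Sam", "Physical Science", 80),
    ("John", "Maths", 75), ("Sam", "Life Science", 70), ("John", "Life Science", 100),
    ("John", "Physical Science", 85), ("Tom", "Life Science", 100),
    ("Tom", "Physical Science", 85),
]
target = pivot_marks(source)
```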

Source:
100 XYZ AAA
100 XYZ BBB
100 XYZ CCC

The expected output data:

100 XYZ AAA BBB CCC

Which transformations are used for this?
Answer:
Use an Aggregator transformation with a variable port.

2. Expression Transformation

1. What is an Expression Transform?

Answer:
Expression is a Passive, Connected transformation used to calculate values in a single row before writing to
the target. We can use the Expression transformation to perform any non-aggregate calculation, and to test
conditional statements before passing the results to target tables or other transformations.
For example, we might need to adjust employee salaries, concatenate first and last names, or convert strings
to numbers.

2. How many types of ports are there in Expression transform?

Answer:
There are three types of ports: INPUT, OUTPUT and VARIABLE.

3. What is the execution order of the ports in an expression?

Answer:
Within each group, ports are evaluated top to bottom in their physical order, and the groups are processed
in this sequence:
First, all input ports receive their values.
Then all variable ports are evaluated (top to bottom, in their physical order in the expression).
Last, all output expressions are evaluated to push values to the output ports.
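
The evaluation order can be modelled with a short Python sketch. This is a simplified illustration, not how the Integration Service is implemented: the function run_expression and the port names IN, V_PREV, V_CURR, O_PREV and O_CURR are hypothetical, and uninitialised variables are modelled as None for simplicity.

```python
def run_expression(rows, variable_ports, output_ports):
    """Evaluate ports row by row: inputs first, then variables in order, then outputs.

    variable_ports and output_ports are lists of (name, function(ports) -> value).
    Variable values persist from one row to the next.
    """
    state = {name: None for name, _ in variable_ports}
    results = []
    for row in rows:
        ports = dict(state)
        ports.update(row)                       # 1. input ports receive their values
        for name, expr in variable_ports:       # 2. variable ports, top to bottom
            ports[name] = expr(ports)
            state[name] = ports[name]
        # 3. output ports are evaluated last, after every variable port
        results.append({name: expr(ports) for name, expr in output_ports})
    return results


variable_ports = [
    ("V_PREV", lambda p: p["V_CURR"]),  # V_CURR still holds the previous row's value here
    ("V_CURR", lambda p: p["IN"]),
]
output_ports = [
    ("O_PREV", lambda p: p["V_PREV"]),
    ("O_CURR", lambda p: p["V_CURR"]),
]
out = run_expression([{"IN": 10}, {"IN": 20}, {"IN": 30}], variable_ports, output_ports)
```

Because V_PREV sits above V_CURR, it reads the value left over from the previous row; swapping the two variable ports would make both outputs equal to the current row.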

We can use this to our advantage, for example by placing lookups in variable ports and then using those
variables "later" in the execution cycle.

4. Describe the approach for the requirement. Suppose the input is:

Col1 Col2
10 a
20 b
30 c
40
50 d

The desired output is:
Col1 Col2
10 a
20 a,b
30 a,b,c
40 a,b,c
50 a,b,c,d

Answer: Use an Expression transformation:

Port Name   Port Type   Expression
Col1        I/O
Col2        I
V_Seq       V           CUME(1)
V_Col2      V           IIF( V_Seq = 1, Col2, IIF( ISNULL(Col2), Prev_Col2, Prev_Col2 || ',' || Col2 ) )
Prev_Col2   V           V_Col2
Out_Col2    O           Prev_Col2

Keep in mind the string length (precision) of the variable and output ports.

The CUME function returns a running total of its argument: each time it is evaluated, it adds the current
value to the total so far. So CUME(1) returns 1 on the first call, 2 on the second call, 3 on the third and
so on. Since Informatica processes data row by row, CUME(1) returns 1 for the first row, 2 for the second
row and so on, which makes it usable as a row counter.
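
The port logic above can be traced with a small Python sketch. It is an illustration only; the function name running_concat is hypothetical, the counter v_seq stands in for CUME(1), and the extra guard for a NULL in the very first rows is an addition that the port expressions do not contain.

```python
def running_concat(rows):
    """rows: list of (col1, col2), where col2 may be None. Returns (col1, out_col2) rows."""
    prev_col2 = None  # plays the role of the Prev_Col2 variable port, kept across rows
    v_seq = 0         # stands in for CUME(1)
    out = []
    for col1, col2 in rows:
        v_seq += 1
        if v_seq == 1 or prev_col2 is None:
            v_col2 = col2                    # nothing accumulated yet
        elif col2 is None:
            v_col2 = prev_col2               # NULL input: carry the list forward unchanged
        else:
            v_col2 = prev_col2 + "," + col2  # append the new value
        prev_col2 = v_col2
        out.append((col1, prev_col2))
    return out


result = running_concat([(10, "a"), (20, "b"), (30, "c"), (40, None), (50, "d")])
```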

5. How can we implement aggregation operation without using an Aggregator Transfor-
mation in Informatica?

Answer:
We use a basic property of the Expression transformation: with variable ports we can access the previous
row's data as well as the row currently being processed. All we need are Sorter, Expression and Filter
transformations to achieve aggregation at the Informatica level.
For a detailed explanation, see the article "Aggregation without Aggregator".
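
One possible shape of such a flow, sketched in Python for a SUM per key. This is an assumption about how the pieces could fit together, not a transcription of the referenced article: in particular, the second sort used here to isolate each group's final running total is this sketch's own choice, and the function name is hypothetical.

```python
def aggregate_without_aggregator(rows):
    """rows: list of (key, amount). Returns a sorted list of (key, total), one per key."""
    # Sorter: order the rows by the group key.
    ordered = sorted(rows, key=lambda r: r[0])

    # Expression: variable ports remember the previous key and the running total.
    prev_key, running, seq = object(), 0, 0
    staged = []
    for key, amount in ordered:
        running = running + amount if key == prev_key else amount
        prev_key = key
        seq += 1
        staged.append((key, seq, running))

    # Second Sorter (assumption of this sketch): reverse the order so that each
    # group's final running total is the first row seen for that key.
    staged.sort(key=lambda r: r[1], reverse=True)

    # Expression + Filter: keep only the first row of each key.
    prev_key, out = object(), []
    for key, _, total in staged:
        if key != prev_key:
            out.append((key, total))
        prev_key = key
    return sorted(out)


totals = aggregate_without_aggregator([("A", 10), ("B", 5), ("A", 20), ("C", 1), ("B", 5)])
```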

Source
Col1 Col2
A W
B R
C E
A R
B E

Target
Col1   READ   WRITE   EXECUTE
A      1      1       0
B      1      0       1
C      0      0       1

In this scenario the Source values R, W and E in Col2 mean read, write and execute respectively.
Answer:
Take an Expression transformation followed by an Aggregator transformation.

In the Expression transformation:
Port Name   Port Type   Expression
Col1        I/O
Col2        I/O
Read        O           IIF( Col2 = 'R', 1, 0 )
Write       O           IIF( Col2 = 'W', 1, 0 )
Execute     O           IIF( Col2 = 'E', 1, 0 )

In the Aggregator transformation:
Port Name   Port Type   Expression
Col1        I/O         GROUP BY
Read        I/O         MAX( Read )
Write       I/O         MAX( Write )
Execute     I/O         MAX( Execute )


Source data is like below:

Id name1 name2
10 A B
10 C D
20 E F

Desired Target data is like below

Id name
10 AB
10 CD
20 EF

Answer:
Use an Expression transformation to concatenate both values: name = name1 || name2


Suppose we have a field in the source file named DATA. We need to validate its first 9 characters: the first
2 characters must be alphabetic (A-Z) and the next 7 characters must be alphanumeric (A-Z or 0-9). For
valid records the first 9 characters of DATA are written as output; records that do not match the condition
should be marked as Invalid. How do we implement this?
E.g.
DATA          OUTPUT
AB345GH6756   AB345GH67
CD56789PJ     CD56789PJ
56CHJK97889   Invalid
DG//*67DF     Invalid

Answer:
Use the logic below in an output port of an Expression transformation:

IIF( REG_MATCH( SUBSTR(DATA, 1, 2), '[[:alpha:]]{2}' ) = 1
AND REG_MATCH( SUBSTR(DATA, 3, 7), '[[:alnum:]]{7}' ) = 1, SUBSTR(DATA, 1, 9), 'Invalid' )
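
The same check can be tried out quickly in Python. This is only an equivalent sketch for experimenting with the rule; the function name validate_data is hypothetical, and Python's re module stands in for REG_MATCH.

```python
import re


def validate_data(data):
    """Return the first 9 characters if they are 2 letters + 7 alphanumerics, else 'Invalid'."""
    if data is None:
        return "Invalid"
    head, tail = data[0:2], data[2:9]   # like SUBSTR(DATA, 1, 2) and SUBSTR(DATA, 3, 7)
    if re.fullmatch(r"[A-Za-z]{2}", head) and re.fullmatch(r"[A-Za-z0-9]{7}", tail):
        return data[:9]
    return "Invalid"


samples = ["AB345GH6756", "CD56789PJ", "56CHJK97889", "DG//*67DF"]
outputs = [validate_data(s) for s in samples]
```

Values shorter than 9 characters fail automatically, because the second slice then has fewer than 7 characters.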

How do we convert a Date field coming as data type string from a flat file?
Answer:
Use the date conversion functions:
IIF( IS_DATE( Column1, 'YYYYMMDD' ) = 1, TO_DATE( Column1, 'YYYYMMDD' ), NULL )

In the above example we have assumed that the date field is in YYYYMMDD format. If the format is something
else (e.g. YYYY-MM-DD), we need to specify that format in both functions instead.
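
The validate-then-convert pattern looks like this in Python, as a rough analogue only. The function name to_date_or_none is hypothetical, and Python format codes such as %Y%m%d are used in place of Informatica format strings.

```python
from datetime import datetime


def to_date_or_none(value, fmt="%Y%m%d"):
    """Return a datetime if value matches fmt, otherwise None (like returning NULL)."""
    try:
        return datetime.strptime(value, fmt)
    except (TypeError, ValueError):
        # TypeError: value is None; ValueError: value does not match the format
        return None


ok = to_date_or_none("20130215")
wrong_format = to_date_or_none("2013-02-15")
other_format = to_date_or_none("2013-02-15", fmt="%Y-%m-%d")
```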


Source:
Col1 Col2
1 B
2 C
3 D
4 E

Target
Col1 Col2 Col3 Col4
1 B 2 C
3 D 4 E

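Describe the approach for the above scenario, where the 1st source record is loaded to target Col1, Col2,
the 2nd record is loaded to Col3, Col4, the 3rd record again goes to Col1, Col2, and so on.
Answer:
Use an Expression transformation:

Port Name     Port Type   Expression
Col1          I
Col2          I
V_ID          V           1 - MOD( Col1, 2 )
V_Prev_Col1   V           V_Col1
V_Prev_Col2   V           V_Col2
V_Col1        V           Col1
V_Col2        V           Col2
O_ID          O           V_ID
O_Col1        O           V_Prev_Col1
O_Col2        O           V_Prev_Col2
O_Col3        O           Col1
O_Col4        O           Col2

This assumes Col1 holds consecutive row numbers starting at 1, so V_ID is 1 on every second row.
V_Prev_Col1 and V_Prev_Col2 are placed above V_Col1 and V_Col2 so that they read the previous row's values
before those variables are overwritten with the current row.

Next use a Filter transformation with the condition O_ID = 1.
Next map O_Col1, O_Col2, O_Col3, O_Col4 to Col1, Col2, Col3, Col4 of the target respectively.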
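
The pairing of consecutive rows can be checked with a short Python sketch. It assumes, as the scenario's data suggests, that Col1 holds consecutive numbers starting at 1; the function name pair_rows and the v_prev_* names are hypothetical and simply hold the previous row's values.

```python
def pair_rows(rows):
    """rows: list of (col1, col2) with col1 = 1, 2, 3, ... Returns 4-column target rows."""
    v_col1 = v_col2 = None
    out = []
    for col1, col2 in rows:
        v_id = 1 - col1 % 2                      # 1 on even rows, 0 on odd rows
        v_prev_col1, v_prev_col2 = v_col1, v_col2  # values remembered from the previous row
        v_col1, v_col2 = col1, col2              # remember the current row for the next one
        if v_id == 1:                            # Filter: only every second row passes
            out.append((v_prev_col1, v_prev_col2, col1, col2))
    return out


target = pair_rows([(1, "B"), (2, "C"), (3, "D"), (4, "E")])
```

With an odd number of source rows the last row has no partner and is dropped by the filter; whether that is acceptable depends on the requirement.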

3. Filter Transformation

1. What is a Filter Transformation and why is it an Active one?

Answer:
A Filter transformation is an Active and Connected transformation that filters rows in a mapping.
Only the rows that meet the filter condition pass through the Filter transformation to the next
transformation in the pipeline. TRUE and FALSE are the implicit return values of any filter condition we
set. If the filter condition evaluates to NULL, the row is treated as FALSE. The numeric equivalent of FALSE
is zero (0), and any non-zero value is the equivalent of TRUE.
As an Active transformation, the Filter transformation may change the number of rows that pass through it.
Rows that are discarded do not appear in the session log or reject files.
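
The TRUE/FALSE/NULL rules can be illustrated with a minimal Python sketch. The names passes_filter and filter_rows and the SAL port are hypothetical; None stands in for NULL.

```python
def passes_filter(condition_value):
    """NULL (None) and 0 count as FALSE; any other numeric value counts as TRUE."""
    return condition_value is not None and condition_value != 0


def filter_rows(rows, condition):
    """Keep only the rows for which the condition evaluates to TRUE."""
    return [row for row in rows if passes_filter(condition(row))]


rows = [{"SAL": 500}, {"SAL": 2000}, {"SAL": None}]
# The condition returns NULL when SAL is NULL, as a comparison with NULL would.
kept = filter_rows(rows, lambda r: None if r["SAL"] is None else r["SAL"] > 1000)
```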

2. What is the difference between the Source Qualifier transformation's Source Filter option and
the Filter transformation?

Answer:
SQ Source Filter                                    Filter Transformation
Filters rows when they are read from a source.      Filters rows from within a mapping.

Can only filter rows from relational sources.       Filters rows coming from any type of source
                                                    system, at the mapping level.

Limits the row set extracted from a source.         Limits the row set sent to a target.

Reduces the number of rows used throughout the      To maximize session performance, place the
mapping and hence provides better performance.      Filter transformation as close to the sources
                                                    as possible, to filter out unwanted data early
                                                    in the flow of data from sources to targets.

The filter condition only uses standard SQL, as     Can define a condition using any statement or
it runs in the database.                            transformation function that returns either a
                                                    TRUE or FALSE value.

4. Joiner Transformation

1. What is a Joiner Transformation and why is it an Active one?

Answer:
A Joiner is an Active and Connected transformation used to join two source data streams coming from the same
or heterogeneous databases or files.
The Joiner transformation joins sources that have at least one matching column, using a condition that
matches one or more pairs of columns between the two sources.
In the Joiner transformation we must configure the Join Condition and the Join Type, and optionally the
Sorted Input option to improve Integration Service performance.
The join condition contains ports from both input sources that must match for the Integration Service to
join two rows. Depending on the join condition and the type of join selected, the Integration Service either
adds the row to the result set or discards it. For this reason, the number of rows in the Joiner output may
not equal the number of rows in the Joiner input, which is why the Joiner is considered an Active
transformation.

2. State the limitations where we cannot use Joiner in the mapping pipeline.

Answer:

The Joiner transformation accepts input from most transformations. However, it has the following
limitations:

Joiner transformation cannot be used when either of the input pipelines contains an Update Strategy
transformation.
Joiner transformation cannot be used if we connect a Sequence Generator transformation directly before
the Joiner transformation.

3. Out of the two input pipelines of a joiner, which one will we set as the master pipeline?

Answer:

During a session run, the Integration Service compares each row of the master source against the
detail source. The master and detail sources need to be configured for optimal performance.
When the Integration Service processes an unsorted Joiner transformation, it blocks the detail source while
it caches rows from the master source. Once it has read and cached all master rows, it unblocks the detail
source and reads the detail rows. So if we choose the source with fewer rows as the master, the cache will
be smaller, which improves performance.
For a Sorted Joiner transformation, use the source with fewer duplicate key values as the master source for
optimal performance and disk storage. When the Integration Service processes a sorted Joiner transfor-
mation, it caches rows for one hundred keys at a time. If the master source contains many rows with the
same key value, the Integration Service must cache more rows, and performance can be slowed.

Blocking logic is possible if the master and detail inputs to the Joiner transformation originate from
different sources. Otherwise, the Integration Service does not use blocking logic; instead, it stores more
rows in the cache.

4. What are the different types of Joins available in Joiner Transformation?

Answer:
In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The
Joiner transformation is similar to an SQL join except that data can originate from different types of sources.
The Joiner transformation supports the following types of joins:
Normal
Master Outer
Detail Outer
Full Outer

A normal or master outer join performs faster than a full outer or detail outer join.

5. Define the various Join Types of Joiner Transformation.

Answer:
In a normal join, the Integration Service discards all rows of data from the master and detail source
that do not match, based on the join condition.
A master outer join keeps all rows of data from the detail source and the matching rows from the
master source. It discards the unmatched rows from the master source.
A detail outer join keeps all rows of data from the master source and the matching rows from the
detail source. It discards the unmatched rows from the detail source.
A full outer join keeps all rows of data from both the master and detail sources.
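
The four join types can be compared side by side with a simplified Python sketch. It is an illustration only, with a single key port and hypothetical names; the master rows are cached first and the detail rows are then streamed, which mirrors the caching behaviour described for an unsorted Joiner.

```python
def joiner(master, detail, join_type="normal"):
    """master, detail: lists of (key, value). Returns (key, master_value, detail_value) rows."""
    cache = {}
    for key, m_val in master:                    # cache the master rows
        cache.setdefault(key, []).append(m_val)
    matched_keys = set()
    out = []
    for key, d_val in detail:                    # stream the detail rows
        if key is not None and key in cache:     # NULL keys never match
            matched_keys.add(key)
            out.extend((key, m_val, d_val) for m_val in cache[key])
        elif join_type in ("master outer", "full outer"):
            out.append((key, None, d_val))       # keep the unmatched detail row
    if join_type in ("detail outer", "full outer"):
        for key, m_vals in cache.items():        # keep the unmatched master rows
            if key not in matched_keys:
                out.extend((key, m_val, None) for m_val in m_vals)
    return out


master = [(10, "Sales"), (20, "HR")]     # hypothetical master rows (the smaller source)
detail = [(10, "Sam"), (30, "Tom")]      # hypothetical detail rows
normal = joiner(master, detail, "normal")
master_outer = joiner(master, detail, "master outer")
detail_outer = joiner(master, detail, "detail outer")
full_outer = joiner(master, detail, "full outer")
```

Only the master side is held in memory here, which is why the source with fewer rows is the better candidate for master.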

6. Describe the impact of number of join conditions and join order in a Joiner.

Answer:
We can define one or more conditions based on equality between the specified master and detail sources.
Both ports in a condition must have the same data type.
If we need to use two ports with non-matching data types in the join condition, we must convert the data
types so that they match. The Designer validates data types in a join condition.
Each additional port in the join condition increases the time needed to join the two sources.
The order of the ports in the join condition can affect the performance of the Joiner transformation. If we
use multiple ports in the join condition, the Integration Service compares the ports in the order we specify.

Only the equality operator is available in the Joiner join condition.

7. How does Joiner transformation treat NULL value matching?

Answer:
The Joiner transformation does not match null values.

For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not
consider them a match and does not join the two rows.
To join rows with null values, replace null input with default values in the Ports tab of the joiner, and then
join on the default values.

If a result set includes fields that do not contain data in either of the sources, the Joiner transfor-
mation populates the empty fields with null values. If we know that a field will return a NULL and
we do not want to insert NULLs in the target, set a default value on the Ports tab for the corre-
sponding port.
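
The effect of NULL keys, and of replacing them with a default value before the join, can be sketched in Python. This is an illustration with hypothetical names; None stands in for NULL and -1 is an arbitrary default chosen for the example.

```python
def join_on_key(master, detail, null_default=None):
    """Inner join of (key, value) lists. NULL keys match only after a default replaces them."""
    def norm(key):
        return null_default if key is None else key

    cache = {}
    for key, m_val in master:
        key = norm(key)
        if key is not None:                 # a NULL key is never cached as a match candidate
            cache.setdefault(key, []).append(m_val)
    out = []
    for key, d_val in detail:
        key = norm(key)
        if key is not None:                 # a NULL key never matches anything
            out.extend((key, m_val, d_val) for m_val in cache.get(key, []))
    return out


master = [(None, "M1"), (1, "M2")]
detail = [(None, "D1"), (1, "D2")]
without_default = join_on_key(master, detail)
with_default = join_on_key(master, detail, null_default=-1)
```

The default must be a value that cannot occur as a real key, otherwise unrelated rows would be joined.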

8. When we configure the join condition, what are the guidelines we need to follow to main-
tain the sort order?
Suppose we configure Sorter transformations in the master and detail pipelines with the following sorted
ports in order: ITEM_NO, ITEM_NAME and PRICE.

Answer:
If we have sorted both the master and detail pipelines on the ports ITEM_NO, ITEM_NAME and PRICE, in that
order, we must ensure that:
We use ITEM_NO in the first join condition.
If we add a second join condition, it uses ITEM_NAME.
If we want to use PRICE in a join condition apart from ITEM_NO, we also use ITEM_NAME in the second join
condition.
If we skip ITEM_NAME and join on ITEM_NO and PRICE, we lose the input sort order and the Integration
Service fails the session.
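
Put another way, the join ports must form a leading prefix of the sorted ports, in the same order. A small Python check of that rule, with a hypothetical function name:

```python
def keeps_sort_order(sort_ports, join_ports):
    """True if join_ports is a non-empty leading prefix of sort_ports, in the same order."""
    n = len(join_ports)
    return 0 < n <= len(sort_ports) and list(join_ports) == list(sort_ports[:n])


SORT_PORTS = ["ITEM_NO", "ITEM_NAME", "PRICE"]
only_item_no = keeps_sort_order(SORT_PORTS, ["ITEM_NO"])
all_three = keeps_sort_order(SORT_PORTS, ["ITEM_NO", "ITEM_NAME", "PRICE"])
skipping_name = keeps_sort_order(SORT_PORTS, ["ITEM_NO", "PRICE"])
```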

9. What are the transformations that cannot be placed between the sort origin and the Join-
er transformation so that we do not lose the input sort order?

Answer:
The best option is to place the Joiner transformation directly after the sort origin to maintain sorted data.
However do not place any of the following transformations between the sort origin and the Joiner transfor-
mation:
Custom
Unsorted Aggregator
Normalizer
Rank
Union transformation
XML Parser transformation
XML Generator transformation
Mapplet [if it contains any one of the above mentioned transformations]

10. What is the use of sorted input in joiner transformation?

Answer:
It is recommended to Join sorted data when possible. We can improve session performance by con-
figuring the Joiner transformation to use sorted input. When we configure the Joiner transformation
to use sorted data, it improves performance by minimizing disk input and output. We see
great performance improvement when we work with large data sets.
For an unsorted Joiner transformation, designate the source with fewer rows as the master source, for
optimal performance and disk storage. During a session, the Joiner transformation compares each row of the
master source against the detail source. The fewer unique rows in the master, the fewer iterations of the
join comparison occur, which speeds up the join process.

11. Can we join two tables based on a join column having different data types?
For example, table 1 has EMPNO (string) and table 2 has EMPNUM (number).
Answer:
Yes, it is possible. With a Joiner, we first convert one of the columns explicitly in an Expression
transformation (for example with TO_INTEGER or TO_CHAR) so that both ports have the same data type, and
then join.

12. Implementation Scenario 1 - A Joiner transformation is joining two tables, S1 and S2. S1 has
10,000 rows and S2 has 1,000 rows. Which table will you set as master for better performance
of the Joiner transformation? Why?

Answer:
Set table S2 as the master, because the Informatica server has to keep the master table in the cache.
Caching 1,000 rows instead of 10,000 rows needs less memory and gives better performance.

5. Lookup Transformation

1. What is a Lookup transform?

Answer:
The Lookup transformation is used to look up data in a flat file, relational table, view or synonym. The
Informatica server queries the lookup source based on the lookup ports in the transformation. It compares
Lookup transformation port values to lookup source column values based on the lookup condition. The result
is passed to other transformations and the target.
Uses:
Get a related value
Perform a calculation
Update slowly changing dimension tables

2. What are the differences between Connected and Unconnected Lookup?

Answer:

The differences are shown in the table below:
Connected Lookup                                    Unconnected Lookup
Participates in the data flow and receives input    Receives input values from the result of a :LKP
directly from the pipeline.                         expression in another transformation.

Can use both dynamic and static cache.              Cache cannot be dynamic.

Can return more than one column value (output       Can return only one column value, i.e. one
port).                                              return port.

Caches all lookup columns.                          Caches only the lookup/output ports in the
                                                    lookup condition and the return port.

Supports user-defined default values (i.e. the      Does not support user-defined default values.
value to return when the lookup condition is
not satisfied).

3. What are the different lookup cache(s)?

Answer:
Informatica lookups can be cached or uncached (no cache). A cached lookup can be either static or dynamic.
A static cache is not modified once it is built; its data remains the same during the session run.
A dynamic cache, on the other hand, is refreshed during the session run by inserting or updating records in
the cache based on the incoming source data.
By default, an Informatica lookup cache is static.
A lookup cache can also be classified as persistent or non-persistent, based on whether Informatica retains
the cache after the session run completes or deletes it.

4. Is lookup an active or passive transformation?

Answer:
From Informatica 9.x onwards, the Lookup transformation can be configured as an Active transformation; see
the article "How to configure lookup as active transformation".
In earlier versions of Informatica, Lookup is a passive transformation.

5. What is the difference between Static and Dynamic Lookup Cache?

Answer:
We can configure a Lookup transformation to cache the underlying lookup table. With a static (read-only)
lookup cache, the Integration Service caches the lookup table at the beginning of the session and does not
update the cache while it processes the Lookup transformation. Rows are not added to the cache dynamically.
With a dynamic lookup cache, the Integration Service inserts or updates rows in the lookup cache as it
passes rows to the target, so the cache stays synchronized with the target.
To see why we may need to make a lookup cache dynamic, read the article on dynamic lookup.
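
The difference shows up when the same new key arrives twice in one run. The Python class below is a simplified model, not the Integration Service's implementation; the class name LookupCache and its methods are hypothetical.

```python
class LookupCache:
    """Toy lookup cache: built once, and optionally kept up to date while rows flow."""

    def __init__(self, lookup_rows, dynamic=False):
        self.cache = dict(lookup_rows)   # built at the start of the session
        self.dynamic = dynamic

    def lookup(self, key, value):
        """Return the cached value for key (None if absent), then maintain the cache."""
        found = self.cache.get(key)
        if self.dynamic:
            self.cache[key] = value      # insert a new key or update an existing one
        return found


static_cache = LookupCache([(1, "old")])
dynamic_cache = LookupCache([(1, "old")], dynamic=True)

# Key 2 is not in the lookup table and arrives twice from the source.
static_hits = [static_cache.lookup(2, "x"), static_cache.lookup(2, "x")]
dynamic_hits = [dynamic_cache.lookup(2, "x"), dynamic_cache.lookup(2, "x")]
```

With the static cache both rows look new, so both would be inserted into the target; with the dynamic cache the second row is recognised as already present.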

6. What are the uses of index and data caches?

Answer:
The index cache stores the values of the lookup condition columns, and the data cache stores the lookup
records (the output column values) that go with them.

7. What is Persistent Lookup Cache?

Answer:
If the cache generated for a Lookup needs to be preserved for subsequent use, a persistent cache is used:
the Integration Service does not delete the index and data files. It is useful only if the lookup table
remains constant.
Lookups are cached by default in Informatica, and the lookup cache can be either non-persistent or
persistent. After a successful session run, the Integration Service saves or deletes the lookup cache files
depending on whether the lookup cache is marked as persistent.

8. What type of join does Lookup support?

Answer:
A Lookup behaves like a SQL LEFT OUTER JOIN.

9. Explain how lookup transformation works like SQL Left Outer Join.

Answer:
Lookup means that if the source input column value matches the lookup table comparison column value, the
transformation returns the corresponding values from the lookup table; otherwise it returns NULL.

Let's consider the EMP table as the source and the DEPT table as the lookup. We want to extract the location
of each employee based on his or her department number. If the location details are not available in the
DEPT table, we still want all the other employee information coming from the source EMP table, with NULL as
the location, loaded into our target table.

The equivalent SQL query looks like this:

SELECT EMP.*, DEPT.LOC
FROM EMP LEFT OUTER JOIN DEPT
ON EMP.DEPTNO = DEPT.DEPTNO

Hence the Lookup is associated with the source table as a Left Outer Join.
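
The same behaviour in a few lines of Python, as an illustration only. The function name lookup_location and the sample rows are hypothetical; None stands in for NULL.

```python
def lookup_location(emp_rows, dept_rows):
    """emp_rows: (empno, ename, deptno); dept_rows: (deptno, loc). Every EMP row is kept."""
    dept_loc = {deptno: loc for deptno, loc in dept_rows}   # the lookup cache
    # dict.get returns None when the department is missing, like a failed lookup
    return [(empno, ename, deptno, dept_loc.get(deptno)) for empno, ename, deptno in emp_rows]


emp = [(7369, "SMITH", 20), (7499, "ALLEN", 30), (7999, "NEW", 50)]
dept = [(20, "DALLAS"), (30, "CHICAGO")]
loaded = lookup_location(emp, dept)
```

One difference from a real join is worth keeping in mind: this sketch returns a single value per source row even if the lookup side has duplicate keys, whereas a SQL join would return one row per match.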

10. Where and why do we use Unconnected Lookup instead of Connected Lookup?

Answer:
The best part of an unconnected lookup is that we can call it conditionally rather than for every row: the
lookup is invoked from an Expression transformation only when some condition is met. This can improve the
performance of a flow.
We may think of an unconnected lookup as a function in a procedural language. It takes multiple parameters
as input, returns one value, and can be called repeatedly. In the same way, an unconnected lookup can be
used wherever we need the same lookup repeatedly, in a single transformation or in several.

With an unconnected lookup we also avoid caching the same data multiple times. It is also good coding
practice.

11. How can we Identify Persistent Cache Files in Informatica Server?

Answer:
Cache files are generated in the cache directory of the Informatica server for transformations like
Aggregator, Joiner, Lookup, Rank and Sorter.
Two types of cache files are generated, data files and index files, the exception being the Sorter
transformation.
The most important point is that Informatica automatically deletes all the generated .dat and .idx cache
files after a session run finishes.
So the files present in the cache directory are basically the persistent cache files of Lookup
transformations, the Aggregator cache files of incremental aggregation sessions, or files left behind by
session runs that did not complete successfully.
Informatica-generated cache files are named:
PMAGG*.idx, PMAGG*.dat, PMJNR*.idx, PMJNR*.dat, PMLKP*.idx, PMLKP*.dat.
When handling large caches, Informatica often creates multiple index and data files due to paging and
appends a number to the end of the file names, e.g. PMAGG*.dat0, PMAGG*.idx0, PMAGG*.dat1, PMAGG*.idx1.
So if we have followed a naming convention for the Lookup persistent cache name, e.g. table_name_PC, or
the table names follow a convention like GDW_, we can use shell commands accordingly to identify the cache
files on the server.
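
As an alternative to shell commands, the same identification can be scripted. The Python sketch below only classifies file names; the _PC suffix is the naming convention used as an example above, and the function name and sample file names are hypothetical.

```python
import re

# Names generated by Informatica: PMAGG*, PMJNR*, PMLKP* with .idx/.dat and an optional number.
GENERATED = re.compile(r"^(PMAGG|PMJNR|PMLKP).*\.(idx|dat)\d*$")
# Any other index or data file; its base name is checked against our own convention.
NAMED = re.compile(r"^(?P<name>.+)\.(idx|dat)\d*$")


def find_cache_files(file_names, name_suffix="_PC"):
    """Split file names into (generated cache files, named persistent cache files)."""
    generated, named = [], []
    for file_name in file_names:
        if GENERATED.match(file_name):
            generated.append(file_name)
        else:
            match = NAMED.match(file_name)
            if match and match.group("name").endswith(name_suffix):
                named.append(file_name)
    return generated, named


files = ["PMLKP123.idx", "PMAGG77.dat0", "CUSTOMER_DIM_PC.dat", "CUSTOMER_DIM_PC.idx", "notes.txt"]
generated, named = find_cache_files(files)
```

In practice the list of names would come from listing the cache directory, for example with os.listdir; the directory path depends on the installation and is not assumed here.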

In this context you can revisit the articles on Lookup Persistent Cache and Incremental Aggregation.

12. How to configure a Lookup on a flat file with header?

Answer:
When we try to create a lookup transformation, we have the option to select the location of the Lookup Ta-
ble from any of Source, Target, Source Qualifier, Import from Relational Table or Import from Flat File.
So after selecting the flat file as lookup from the desired location, the edit Transformation tab of the lookup
will have the Flat file information to choose between Delimited or Fixed width and advanced properties to
modify like Column Delimiters, Code Page and obviously Number of initial rows to skip.
Set Number of initial rows to skip as 1. Set the Lookup condition as required.
Apart from that go to the Mapping tab of the corresponding session and select the lookup transformation to
configure the Lookup source file directory and filename and Lookup source file type i.e. Direct or Indirect.

13. What is the difference between persistent cache and shared cache?

Answer:
A persistent cache is a type of Informatica lookup cache in which the cache file is stored on disk. We
can configure the session to re-cache if necessary. It should be used only if we are sure that the lookup
table will not change between sessions, so it is mostly used when the mapping looks up static tables.

If the persistent cache is shared across mappings, we call it a shared (named) cache, and we provide a name
for the cache file.
If the lookup table is used in more than one transformation or mapping, the cache built for the first
lookup can be used for the others. It can be used across mappings.
For a shared cache we have to give the name in the Cache File Name Prefix property, and use the same name in
every lookup where we want to use the cache.
Unshared cache: if the lookup table is used in more than one transformation within the mapping, the cache
built for the first lookup can be used for the others. It cannot be used across mappings.

14. Describe how to return multiple port values from unconnected lookup in Informatica.

Answer:
By default, an Informatica unconnected Lookup supports only one return port.
As a workaround, we can write a Lookup SQL override that concatenates the required port values into a
single string and use that string as the return port value.
We then call the unconnected Lookup from an Expression transformation and use separate output ports to
retrieve the individual lookup values from the concatenated return value, using the SUBSTR and INSTR
functions to extract each column value.
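
The concatenate-then-split idea, sketched in Python. The delimiter, the function names and the sample values are assumptions made for this illustration; find and slicing play the roles of INSTR and SUBSTR.

```python
DELIM = "|"  # assumed delimiter; it must not occur in the column values themselves


def lookup_concatenated(cache, key):
    """Stands in for the unconnected lookup: returns 'LOC|MANAGER' or None when not found."""
    return cache.get(key)


def split_return_value(value):
    """Split the single return value back into its two columns."""
    if value is None:
        return None, None
    pos = value.find(DELIM)                # like INSTR: position of the delimiter
    if pos < 0:
        return value, None                 # no delimiter: treat as a single column
    return value[:pos], value[pos + 1:]    # like two SUBSTR calls around the delimiter


cache = {10: "NEW YORK|KING"}              # as if built by the SQL override
found = split_return_value(lookup_concatenated(cache, 10))
missing = split_return_value(lookup_concatenated(cache, 99))
```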

15. How to make the persistent lookup cache in sync with lookup table?

Answer:
To make the persistent cache in sync with the lookup table, simply enable the Re-cache option of the Lookup
transformation so that the lookup cache is rebuilt from the lookup table. While loading the target
dimension table, we can make the lookup cache dynamic and persistent with re-cache enabled, so that once
the dimension is loaded the persistent cache file is in sync and available during the Fact table load.

16. If we use persistent cache for a dynamic lookup, will the cache file be updated or inserted
as required?

Answer:
Having a persistent cache does not affect in any way how the dynamic cache inserts and updates rows in the
cache file. The cache file simply gets a proper name, assigned through the persistent named cache, and it
can be reused later.

17. Is there anything wrong in sharing a persistent cache between static and dynamic lookup?

Answer:
Static & Dynamic lookup cannot share the same persistent cache.

18. What is the difference between the two update properties - update else insert, insert else
update in dynamic lookup cache?

Answer:
In Dynamic Cache:
Update else Insert: if the incoming record already exists in the lookup cache, the record is updated in
the cache and in the target; otherwise it is inserted.
Insert else Update: if the incoming record does not exist in the lookup cache, the record is inserted in
the cache and in the target; otherwise it is updated.
These options affect performance. If we know the nature of the source data, we can set the option
accordingly: if most of the source data is destined for insert we select Insert else Update, otherwise we
go for Update else Insert. Likewise, if the source contains many duplicate records we go for Update else
Insert, and if it contains only a few potential duplicates we go for Insert else Update.

19. If the default value for the lookup return port is not set, what will be the output when the
lookup condition fails?

Answer:
NULL will be returned from lookup transformation on lookup condition failure.

20. How can we ensure data is not duplicated in the target when the source has duplicate
records, using lookup transformation?

Answer:
Using a dynamic lookup cache we can ensure that duplicate records are not inserted in the target. We build a
dynamic lookup cache on the target table, associate the input ports with the lookup ports, and check the
Insert Else Update option. This eliminates the duplicate records coming from the source, so only unique
records are loaded into the target.
For more details, see the article on Dynamic Lookup Cache.
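
A simplified Python model of this de-duplication. The flag new_lookup_row is loosely modelled on the idea of a per-row insert indicator and is an assumption of this sketch, as are the function name and the sample keys.

```python
def load_unique(source_rows, target_keys):
    """Return only the rows whose key is neither in the target nor seen earlier in this run."""
    cache = set(target_keys)            # dynamic lookup cache built on the target table
    inserted = []
    for key, value in source_rows:
        new_lookup_row = 0 if key in cache else 1   # 1 means the key was not cached yet
        if new_lookup_row == 1:
            cache.add(key)                          # the cache is updated as rows flow
            inserted.append((key, value))           # only these rows are sent for insert
    return inserted


source = [(1, "a"), (2, "b"), (2, "b again"), (3, "c")]
rows_for_target = load_unique(source, target_keys=[1])
```

Key 1 is skipped because it is already in the target, and the second occurrence of key 2 is skipped because the first occurrence was added to the cache during the same run.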

6. Normalizer Transformation

1. What is a Normalizer transformation?

Answer:
The Normalizer transformation normalizes records from COBOL and relational sources, allowing us to
organize the data according to our needs. A Normalizer transformation can appear anywhere in a data flow
when we normalize a relational source. We use a Normalizer transformation instead of the Source Qualifier
transformation when we normalize a COBOL source. When we drag a COBOL source into the Mapping Designer
workspace, the Normalizer transformation appears, creating input and output ports for every column in the
source.

2. Suppose in our Source Table we have data as given below:
Student Name   Math   Life Science   Physical Science
Sam            100    70             80
John           75     100            85
Tom            80     100            85
We want to load our Target Table as:
Student Name   Subject Name       Marks
Sam            Math               100
Sam            Life Science       70
Sam            Physical Science   80
John           Math               75
John           Life Science       100
John           Physical Science   85
Tom            Math               80
Tom            Life Science       100
Tom            Physical Science   85
Describe your approach.
Answer:
Here, to convert the columns to rows, we have to use the Normalizer transformation followed by an
Expression transformation to decode the column taken into consideration. For more details on how the
mapping is built, see the article "Working with Normalizer".

3. What are levels in Normalizer transformation?

Answer:
The VSAM Normalizer transformation is the Source Qualifier for a COBOL source definition. A COBOL source can
contain multiple-occurring data (groups of columns of the same type) and multiple types of records in the
same file, and levels are mainly used to describe this. The Normalizer tab defines the structure of the
source data. A group of columns might define a record in a COBOL source, or it might define a group of
multiple-occurring fields in the source.
The column level number identifies groups of columns in the data. Level numbers define a data hierarchy.
Columns in a group have the same level number and display sequentially below a group-level column. A
group-level column has a lower level number, and it contains no data.

4. What is the purpose of GCID and GK in a Normalizer transformation?

Answer:
Let's take an example. The source data is:

Name     FOOD   HOUSERENT   TRANSPORT
Saurav   1000   2000        500
Jenny    2000   2500        700

When we set the OCCURS property of the Normalizer to 3, the Normalizer creates 3 input ports to get data
from the source. Say the 3 columns FOOD, HOUSERENT and TRANSPORT are connected to the 3 input ports of
the Normalizer. Then the GCID takes the values 1, 2 and 3, corresponding to the connected input columns
FOOD, HOUSERENT and TRANSPORT. The Normalizer generates 3 output rows for each source row, one for each
input column value.

GK, on the other hand, holds a sequence value that runs from 1 to the number of source records. It is the
sequence number of the source record being processed.
The table below helps to visualize the Normalizer output in the GCID and GK fields:

Name     EXPENSEHEAD   GCID_EXPENSEHEAD   EXPENSE   GK_EXPENSEHEAD
Saurav   FOOD          1                  1000      1
Saurav   HOUSERENT     2                  2000      1
Saurav   TRANSPORT     3                  500       1
Jenny    FOOD          1                  2000      2
Jenny    HOUSERENT     2                  2500      2
Jenny    TRANSPORT     3                  700       2

7. Rank Transformation

1. What is a Rank Transform?

Answer:
Rank is an Active Connected transformation used to select a set of top or bottom values of data. It basically
filters the required number of records from the top or from the bottom.

2. How does a Rank Transform differ from Aggregator Transform functions MAX and MIN?

Answer:
Like the Aggregator transformation, the Rank transformation also groups information. The Rank Transform
allows us to select a group of top or bottom values, not just one value as in case of Aggregator MAX, MIN
functions.

3. How does a Rank Cache work?

Answer:
During a session, the Integration Service compares an input row with rows in the data cache. If the input row
out-ranks a cached row, the Integration Service replaces the cached row with the input row. If we configure
the Rank transformation to rank based on different groups, the Integration Service ranks incrementally for
each group it finds. The Integration Service creates an index cache to store the group information and a
data cache for the row data.

4. What is a RANK port and RANKINDEX?

Answer:
Rank port is an input/output port used to specify the column for which we want to rank the
source values. By default Informatica creates an output port RANKINDEX for each Rank trans-
formation. It stores the ranking position for each row in a group.

5. How can you get ranks based on different groups?

Answer:
Rank transformation lets us group information. We can configure one of its input/output ports as a group by
port. For each unique value in the group port, the transformation creates a group of rows falling within the
rank definition (top or bottom, and a particular number in each rank).

6. What happens if two rank values match?

Answer:
If two rank values match, they receive the same value in the rank index and the transformation skips the
next value.

7. What are the restrictions of Rank Transformation?

Answer:

We can connect ports from only one transformation to the Rank transformation.
We can select the top or bottom rank.
We need to select the Number of records in each rank.
We can designate only one Rank port in a Rank transformation.

8. How does Rank transformation handle string values?

Answer:
Rank transformation can return the strings at the top or the bottom of a session sort order.
When the Integration Service runs in Unicode mode, it sorts character data in the session using
the selected sort order associated with the Code Page of Integration Service which may be
French, German, etc. When the Integration Service runs in ASCII mode, it ignores this setting and
uses a binary sort order to sort character data.

9. What is Dense Rank and does Informatica support Dense Rank?

Answer:
With a normal RANK, when multiple rows share the same rank, the next rank in the sequence is not
consecutive. DENSE RANK, on the other hand, always assigns consecutive ranks.

Take the following example: let's say we want to see the top 2 highest salaries of each department.

DEPTNO SAL RANK DENSE_RANK
10 400 1 1
10 400 1 1
10 300 3 2
10 100 4 3
20 550 1 1
20 550 1 1
20 150 3 2
30 200 1 1
40 600 1 1

So the normal RANK generates a result set in which ranks can be skipped (here RANK = 2 is missing for
department 10) because multiple records share the same rank. DENSE RANK, on the other hand, generates
consecutive ranks.

The Informatica Rank transformation performs a simple RANK, not a DENSE RANK, so with it we may miss
consecutive ranks.

10. How do we achieve DENSE_RANK in Informatica?

Answer:
In order to achieve the DENSE RANK functionality in Informatica we use a combination of Sorter, Expression
and Filter transformations. Based on the previous example data set, let's say we want to get the top
2 highest salaries of each department as per DENSE RANK.

Use a SORTER transformation with the sort keys:
DEPTNO ASC, SAL DESC

After the Sorter place an EXPRESSION transformation. Ports are evaluated from top to bottom, so V_COMP
still sees the previous row's values in V_DEPT_PREV and V_SAL_PREV, and its own previous value:

PORT_NAME     TYPE   EXPRESSION
DEPTNO        I/O
SAL           I/O
V_COMP        V      IIF (DEPTNO <> V_DEPT_PREV, 1, IIF (SAL <> V_SAL_PREV, V_COMP + 1, V_COMP))
RANK          O      V_COMP
V_DEPT_PREV   V      DEPTNO
V_SAL_PREV    V      SAL

Next use a FILTER transformation.
FILTER CONDITION: RANK <= 2
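
The Sorter, Expression and Filter steps can be traced in a few lines of Python. This is only a sketch
under our own assumptions: the function name dense_rank_top_n and the tuple layout (deptno, sal) are
hypothetical, and the prev_dept and prev_sal variables play the role of the variable ports that remember
the previous row.

```python
# Illustrative sketch only: Sorter -> Expression -> Filter, in plain Python.
def dense_rank_top_n(rows, n=2):
    """rows: (deptno, sal) tuples. Returns (deptno, sal, dense_rank) rows with rank <= n."""
    # Sorter: DEPTNO ascending, SAL descending.
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
    out = []
    prev_dept = prev_sal = None
    rank = 0
    for dept, sal in ordered:
        if dept != prev_dept:
            rank = 1          # new department: restart the ranking
        elif sal != prev_sal:
            rank += 1         # same department, new salary: next consecutive rank
        prev_dept, prev_sal = dept, sal
        if rank <= n:         # Filter: keep only the top n dense ranks
            out.append((dept, sal, rank))
    return out


data = [(10, 400), (10, 400), (10, 300), (10, 100),
        (20, 550), (20, 550), (20, 150), (30, 200), (40, 600)]
top2 = dense_rank_top_n(data)
for row in top2:
    print(row)
```

For department 10 the salary 300 receives dense rank 2 and is kept, while 100 receives dense rank 3 and
is filtered out.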

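11. The source table has 5 rows and the number of ranks in the Rank transformation is set to 10. How many
rows will the Rank transformation output?

Answer:
5 rows. The source has fewer rows than the requested number of ranks, so every row is returned.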

12. How will you load unique records into a target flat file when the source flat files contain duplicate
data?

Answer:
In the Rank transformation, use the group by ports to group the records and set the number of ranks to 1.
The Rank transformation then returns one row from each group, so every row in the output is unique.

8. Router Transformation

1. What is the difference between Router and Filter?

Answer:
The following differences can be noted:

Router: divides the incoming records into multiple groups based on the group conditions. The groups need
not be mutually exclusive (different groups may contain the same record).
Filter: restricts or blocks the incoming record set based on one given condition.

Router: does not itself block any record. If a record does not match any of the routing conditions, it is
routed to the default group.
Filter: does not have a default group. If a record does not match the filter condition, it is blocked.

Router: acts like a CASE ... WHEN statement in SQL (or a switch statement in C).
Filter: acts like a WHERE condition in SQL.

In the Filter transformation the records are filtered based on one condition and the rejected rows are
discarded. In the Router multiple conditions can be defined, and the rows that match none of them are
available from the default group.
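
The difference in behaviour can be sketched in Python. This is an illustration under our own assumptions,
not Informatica code: the helper names filter_rows and route_rows and the group names are hypothetical.

```python
# Illustrative sketch only: Filter vs Router semantics in plain Python.
def filter_rows(rows, condition):
    """Filter: rows failing the single condition are dropped."""
    return [r for r in rows if condition(r)]


def route_rows(rows, groups):
    """Router: a row goes to every group whose condition it meets;
    rows meeting no condition land in the DEFAULT group."""
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for r in rows:
        matched = False
        for name, condition in groups.items():
            if condition(r):
                routed[name].append(r)
                matched = True
        if not matched:
            routed["DEFAULT"].append(r)
    return routed


serials = [1, 2, 3, 4, 5, 6]
kept = filter_rows(serials, lambda n: n % 2 == 0)
routed = route_rows(serials, {
    "EVEN": lambda n: n % 2 == 0,
    "MULTIPLE_OF_3": lambda n: n % 3 == 0,
})
print(kept)
print(routed)
```

The value 6 appears in both user-defined groups, while 1 and 5 match no condition and end up in the
default group instead of being lost.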

2. What is the minimum number of groups we can declare in a Router transformation?

Answer:
We can define a minimum of 1 group condition for a Router transformation. It automatically creates another
group called Default to pass the records that do not satisfy the condition of any user-defined group.

3. Loading Multiple Target Tables Based on Conditions: Suppose we have some serial numbers in a flat file
source. We want to load the serial numbers into two target files, one containing the EVEN serial numbers
and the other containing the ODD ones.
Answer:
After the Source Qualifier place a Router Transformation. Create two Groups namely EVEN and ODD, with
filter conditions as:
MOD(SERIAL_NO,2)=0
MOD(SERIAL_NO,2)=1
Then output the two groups into two flat file targets.

4. Suppose we have a source table and we want to load three target tables based on source rows such that
the first row moves to the first target table, the second row to the second target table, the third row to
the third target table, the fourth row again to the first target table, and so on. Describe your approach.
Answer:
We clearly need a Router transformation to route the source data to the three target tables. The question
is what the group filter conditions should be.
First we need an Expression Transformation that carries all the source table columns plus one additional
i/o port, say SEQ_NUM, which receives a sequence number for each source row from the NEXTVAL port of a
Sequence Generator with Start Value 0 and Increment By 1.
The filter conditions for the three Router groups are then:
MOD(SEQ_NUM,3)=1 connected to 1st target table
MOD(SEQ_NUM,3)=2 connected to 2nd target table
MOD(SEQ_NUM,3)=0 connected to 3rd target table
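
A small Python sketch shows how the MOD conditions distribute the rows. The function name round_robin is
hypothetical, and itertools.count merely stands in for the NEXTVAL port, assuming it delivers 1, 2, 3
and so on.

```python
# Illustrative sketch only: round-robin routing with MOD, in plain Python.
from itertools import count


def round_robin(rows, n_targets=3):
    """Send row 1 to target 1, row 2 to target 2, ..., then start over."""
    targets = [[] for _ in range(n_targets)]
    seq = count(start=1)  # stands in for NEXTVAL: 1, 2, 3, ...
    for row in rows:
        seq_num = next(seq)
        # MOD = 1 -> 1st target, MOD = 2 -> 2nd target, MOD = 0 -> last target.
        index = (seq_num % n_targets) - 1  # -1 selects the last list
        targets[index].append(row)
    return targets


t1, t2, t3 = round_robin(["r1", "r2", "r3", "r4", "r5", "r6", "r7"])
print(t1, t2, t3)
```

With seven rows, the first target receives rows 1, 4 and 7, the second receives rows 2 and 5, and the
third receives rows 3 and 6.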

5. How can we distribute and load n source records equally into two target tables, so that each has
n/2 records?
Answer:
After Source Qualifier use an expression transformation.
In the expression transformation create a counter variable

V_COUNTER = V_COUNTER + 1 (Variable port)
O_COUNTER = V_COUNTER (o/p port)

This counter variable will get incremented by 1 for every new record which comes in.

Router Transformation:

Group_ODD: MOD(O_COUNTER, 2) = 1
Group_EVEN: MOD(O_COUNTER, 2) = 0

Half of the records (all the odd-numbered ones) go to Group_ODD and the rest go to Group_EVEN.

Finally the target tables.

9. Sequence Generator Transformation

1. What is a Sequence Generator Transformation?

Answer:
A Sequence Generator is a Passive and Connected transformation that generates numeric values.
It is used to create unique primary key values, replace missing primary keys, or cycle through a sequential
range of numbers.
By default this transformation contains only two OUTPUT ports, namely CURRVAL and NEXTVAL. We cannot
edit or delete these ports, nor can we add ports to this transformation. We can create approximately
two billion unique numeric values, with the widest range being from 1 to 2147483647.

2. Define the Properties available in Sequence Generator transformation in brief.

Answer:
Sequence Generator properties and their descriptions:

Start Value: Start value of the generated sequence that we want the Integration Service to use if we use
the Cycle option. If we select Cycle, the Integration Service cycles back to this value when it reaches
the end value. Default is 0.

Increment By: Difference between two consecutive values from the NEXTVAL port. Default is 1.

End Value: Maximum value generated by the Sequence Generator. After reaching this value the session will
fail if the Sequence Generator is not configured to cycle. Default is 2147483647.

Current Value: Current value of the sequence. Enter the value we want the Integration Service to use as
the first value in the sequence. Default is 1.

Cycle: If selected, when the Integration Service reaches the configured end value for the sequence, it
wraps around and starts the cycle again, beginning with the configured Start Value.

Number of Cached Values: Number of sequential values the Integration Service caches at a time. Default
value for a standard Sequence Generator is 0. Default value for a reusable Sequence Generator is 1,000.

Reset: Restarts the sequence at the current value each time a session runs. This option is disabled for
reusable Sequence Generator transformations.

3. Suppose we have a source table populating two target tables. We connect the NEXTVAL port of the
Sequence Generator to the surrogate keys of both the target tables.
Will the surrogate keys in both the target tables be the same? If not, how can we flow the same sequence
values into both of them?
Answer:
When we connect the NEXTVAL output port of the Sequence Generator directly to the surrogate key col-
umns of the target tables, the Sequence number will not be the same.
A block of sequence numbers is sent to one target table's surrogate key column. The second target receives
a block of sequence numbers from the Sequence Generator transformation only after the first target table
has received its block.
Suppose we have 5 rows coming from the source, so the targets will have the sequence values as TGT1
(1,2,3,4,5) and TGT2 (6,7,8,9,10) [assuming Start Value 0, Current Value 1 and Increment By 1].
Now suppose the requirement is that we need the same surrogate keys in both the targets.
Then the easiest way to handle the situation is to put an Expression transformation in between the
Sequence Generator and the target tables. The Sequence Generator will pass unique values to the Expression
transformation, and then the rows are routed from the Expression transformation to the targets.
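
The effect can be imitated in Python. This is a simplified sketch, not a model of how the Integration
Service actually schedules blocks: itertools.count stands in for the NEXTVAL port and the row values are
made up.

```python
# Illustrative sketch only: why a directly shared NEXTVAL gives different keys per target.
from itertools import count

rows = ["a", "b", "c", "d", "e"]

# Direct connection: each target draws its own values from the generator.
nextval = count(start=1)
tgt1_keys = [next(nextval) for _ in rows]
tgt2_keys = [next(nextval) for _ in rows]

# Via an Expression transformation: one value per row, forwarded to both targets.
nextval = count(start=1)
keyed_rows = [(next(nextval), row) for row in rows]
tgt1_shared = [key for key, _ in keyed_rows]
tgt2_shared = [key for key, _ in keyed_rows]

print(tgt1_keys, tgt2_keys)
print(tgt1_shared, tgt2_shared)
```

In the first case the second target continues where the first one stopped; in the second case both
targets see the key that was attached to the row once.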

4. Suppose we have 100 records coming from the source, and we use a Sequence Generator to populate a
target column.
Suppose the Current Value is 0 and the End Value of the Sequence Generator is set to 80. What will happen?
Answer:
End Value is the maximum value the Sequence Generator will generate. After it reaches the End value the
session fails with the following error message:
TT_11009 Sequence Generator Transformation: Overflow error.
The session failure can be avoided if the Sequence Generator is configured to Cycle through the sequence,
i.e. whenever the Integration Service reaches the configured end value for the sequence, it wraps around
and starts the cycle again, beginning with the configured Start Value.
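
The End Value and Cycle behaviour can be imitated with a Python generator. This is a sketch under our own
assumptions: the names sequence_generator, load and SequenceOverflow are hypothetical, and the exception
only stands in for the TT_11009 error.

```python
# Illustrative sketch only: End Value and Cycle behaviour in plain Python.
class SequenceOverflow(Exception):
    """Stands in for the TT_11009 overflow error."""


def sequence_generator(current=1, start=0, end=2147483647, increment=1, cycle=False):
    """Yield NEXTVAL values; fail past End Value unless Cycle is set."""
    value = current
    while True:
        if value > end:
            if not cycle:
                raise SequenceOverflow("Sequence Generator Transformation: Overflow error.")
            value = start  # wrap around to the configured Start Value
        yield value
        value += increment


def load(n_rows, **props):
    """Draw one sequence value per source row."""
    gen = sequence_generator(**props)
    return [next(gen) for _ in range(n_rows)]


cycled = load(100, current=0, end=80, cycle=True)
print(cycled[78:84])
```

Calling load(100, current=0, end=80) without cycle=True raises SequenceOverflow before all 100 rows are
served, whereas the cycled run wraps back to the Start Value and completes.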

5. What are the changes we observe when we promote a non-reusable Sequence Generator
to a reusable one? And what happens if we set the Number of Cached Values to 0 for a
reusable transformation?

Answer:
When we convert a non-reusable sequence generator to reusable one we observe that the Number of
Cached Values is set to 1000 by default.
And the Reset property is disabled.
When we try to set the Number of Cached Values property of a Reusable Sequence Generator to 0 in the
Transformation Developer we encounter the following error message:
The number of cached values must be greater than zero for reusable sequence transformation.

6. How Sequence Generator in the mapping is handled when we migrate the mapping from
one environment to another?

Answer:
While promoting Informatica objects using the Copy Folder Wizard we can choose either to retain the
existing values or to replace them with the values from the source folder.
Generally we retain the current values for the Sequence Generator transformation in the destination
folder; otherwise we may end up with duplicate values in the sequence-generated column, which may result
in session failure.
The Informatica metadata query below lists the current value of each Sequence Generator transformation:
SELECT
OPB_SUBJECT.SUBJ_NAME AS "FOLDER NAME",
OPB_MAPPING.MAPPING_NAME AS "MAPPING NAME",
REP_WIDGET_INST.INSTANCE_NAME AS "SEQ NAME",
OPB_WIDGET_ATTR.ATTR_VALUE AS "CURRENT VALUE"
FROM REP_WIDGET_INST
INNER JOIN OPB_MAPPING ON
(REP_WIDGET_INST.MAPPING_ID = OPB_MAPPING.MAPPING_ID)
INNER JOIN OPB_WIDGET_ATTR ON
(REP_WIDGET_INST.WIDGET_TYPE = OPB_WIDGET_ATTR.WIDGET_TYPE AND
REP_WIDGET_INST.WIDGET_ID = OPB_WIDGET_ATTR.WIDGET_ID)
INNER JOIN OPB_SUBJECT ON
(OPB_MAPPING.SUBJECT_ID = OPB_SUBJECT.SUBJ_ID )
WHERE
REP_WIDGET_INST.WIDGET_TYPE_NAME like 'Sequence%'
AND OPB_WIDGET_ATTR.ATTR_ID = 4 --Current Value
ORDER BY OPB_MAPPING.MAPPING_NAME

7. Consider we have two mappings that populate a single target table from two different source systems.
Both the mappings have a Sequence Generator transformation to generate the surrogate key in the target
table. How can we ensure that the surrogate key generated is consistent and does not generate duplicate
values when populating data from two different mappings?
Answer:
We should use a Reusable Sequence Generator in both the mappings to generate the target surrogate keys.

8. How do I get a Sequence Generator to "pick up" where another "left off"?

Answer:
Use an unconnected Lookup on the sequence ID column of the target table. Set the lookup policy on multiple
match to return the last value; the input port is an ID and the lookup condition is SEQ_ID >= input_ID.
Then, in an Expression transformation, connect a NEW self-resetting Sequence Generator to a new input
port (input_seq) and set up a variable port, v_seq. The variable port's expression should read:
IIF( v_seq = 0 OR ISNULL(v_seq), :LKP.lkp_sequence(1), v_seq). Then set up an output port whose expression
reads: v_seq + input_seq (input_seq coming from the resetting Sequence Generator). This completes an
"append" without a break in the sequence numbers.
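
The idea reduces to "highest existing key plus a counter that restarts at 1". A minimal Python sketch,
with the hypothetical function name append_keys and made-up key values:

```python
# Illustrative sketch only: append keys after the highest existing key.
from itertools import count


def append_keys(existing_keys, new_rows):
    """Give each new row a key that continues after the current maximum."""
    # Unconnected lookup: fetched once, then held in the variable port v_seq.
    v_seq = max(existing_keys, default=0)
    input_seq = count(start=1)  # self-resetting generator: 1, 2, 3, ...
    return [(v_seq + next(input_seq), row) for row in new_rows]


loaded = append_keys([1, 2, 3, 7], ["x", "y"])
print(loaded)
```

Because the counter restarts at 1 on every run, the new keys always begin right after the largest key
already present in the target.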
10. Stored Procedure Transformation

1. What is a Stored Procedure Transformation?

Answer:

Stored Procedure is a Passive transformation used to execute stored procedures pre-built on the database
through Informatica ETL. It can also be used to call functions to return calculated values.

2. How many types of Stored Procedure transformation are there?

Answer:
There are two types of Stored Procedure transformation based on calling, Connected and Uncon-
nected. Based on the execution order they can be classified as Source Pre Load, Source Post Load,
Normal, Target Pre Load and Target Post Load.
Normal Stored Procedure transformation can be configured as both connected and unconnected whereas
Pre-Post Load Stored Procedures are unconnected ones.

3. How do we call an Unconnected Stored Procedure transformation?

Answer:
The unconnected Stored Procedure transformation is called from an Expression transformation using the
:SP reference qualifier: :SP.<Stored_Procedure_Name>(Argument1, Argument2).
Conditional execution of a Stored Procedure is possible with an unconnected Stored Procedure, unlike the
connected one.

4. How do we set the Execution order of Pre-Post Load Stored Procedure?

Answer:

We set the execution order using the Stored Procedure Plan from the mapping property.

5. How do we set the Call Text for Stored Procedure transformation?

Answer:
Once we specify the Stored Procedure Type other than Normal, the Call Text Attribute in the Properties tab
gets enabled. Here we have to specify how the procedure has to be called along with arguments to be
passed. E.g. <Stored_Procedure_Name>(Argument1, Argument2).

6. How do we receive output/return parameters from Unconnected Stored Procedure?

Answer:
Configure the expression to send any input parameters and capture any output parameters or return value.
You must know whether the parameters shown in the Expression Editor are input or output parameters. You
insert variables or port names between the parentheses in the exact order that they appear in the stored
procedure itself. The datatypes of the ports and variables must match those of the parameters passed to the
stored procedure.
For example, when you click the stored procedure, something similar to the following appears:
:SP.GET_NAME_FROM_ID()
This particular stored procedure requires an integer value as an input parameter and returns a string value
as an output parameter. How the output parameter or return value is captured depends on the number of
output parameters and whether the return value needs to be captured.
If the stored procedure returns a single output parameter or a return value (but not both), you should use
the reserved variable PROC_RESULT as the output variable. In the previous example, the expression would
appear as:
:SP.GET_NAME_FROM_ID(inID, PROC_RESULT)
InID can be either an input port for the transformation or a variable in the transformation. The value of
PROC_RESULT is applied to the output port for the expression.
If the stored procedure returns multiple output parameters, you must create variables for each output
parameter. For example, if you created a port called varOUTPUT2 for the stored procedure expression, and a
variable called varOUTPUT1, the expression would appear as:
:SP.GET_NAME_FROM_ID (inID, varOUTPUT1, PROC_RESULT)
The value of the second output port is applied to the output port for the expression, and the value of the
first output port is applied to varOUTPUT1. The output parameters are returned in the order they are
declared in the stored procedure itself.
With all these expressions, the datatypes for the ports and variables must match the datatypes for the in-
put/output variables and return value.

11. Sorter Transformation

1. What is a Sorter Transformation?

Answer:
Sorter is an Active Connected transformation used to sort data in ascending or descending order according
to specified sort keys. The Sorter transformation contains only input/output ports.

2. Why is Sorter an Active Transformation?

Answer:
This is because we can select the distinct option in the sorter property. When the Sorter transformation is
configured to treat output rows as distinct, it assigns all ports as part of the sort key. The Inte-
gration Service discards duplicate rows compared during the sort operation. The number of Input
Rows will vary as compared with the Output rows and hence it is an Active transformation.

3. How does Sorter handle Case Sensitive sorting?

Answer:
The Case Sensitive property determines whether the Integration Service considers case when sorting data.
When we enable the Case Sensitive property, the Integration Service sorts uppercase characters higher than
lowercase characters.

4. How does Sorter handle NULL values?

Answer:
We can configure the way the Sorter transformation treats null values. Enable the property Null Treated
Low if we want to treat null values as lower than any other value when it performs the sort operation. Disa-
ble this option if we want the Integration Service to treat null values as higher than any other value.

5. How does a Sorter Cache work?

Answer:
The Integration Service passes all incoming data into the Sorter Cache before Sorter transfor-
mation performs the sort operation.
The Integration Service uses the Sorter Cache Size property to determine the maximum amount
of memory it can allocate to perform the sort operation. If it cannot allocate enough memory, the Integra-
tion Service fails the session. For best performance, configure Sorter cache size with a value less than or
equal to the amount of available physical RAM on the Integration Service machine.

If the amount of incoming data is greater than the amount of Sorter cache size, the Integration Service tem-
porarily stores data in the Sorter transformation work directory. The Integration Service requires disk space
of at least twice the amount of incoming data when storing data in the work directory.

6. How can we delete duplicate records, or rather select distinct rows, for flat file sources?

Answer:
Since the source is a flat file, we cannot select the Distinct option in the Source Qualifier; it is
disabled for flat file sources. The next approach is to use a Sorter transformation and check the
Distinct option. When we select the Distinct option, all the columns are selected as sort keys, in
ascending order by default.
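
In Python terms the Distinct option behaves roughly like sorting a de-duplicated set of rows on all
columns. The function name sort_distinct and the sample rows are our own, purely for illustration.

```python
# Illustrative sketch only: Sorter with the Distinct option, in plain Python.
def sort_distinct(rows):
    """All columns act as sort keys (ascending); duplicate rows are dropped."""
    return sorted(set(rows))


flat_file_rows = [("B", 2), ("A", 1), ("B", 2), ("A", 3), ("A", 1)]
unique_rows = sort_distinct(flat_file_rows)
print(unique_rows)
```

Two rows count as duplicates only when every column matches, which is why ("A", 1) and ("A", 3) are both
kept.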

12. Union Transformation

1. What is a Union Transformation?

Answer:
Union is an Active, Connected non-blocking multiple input group transformation used to merge data from
multiple pipelines or sources into one pipeline branch. Similar to the UNION ALL SQL statement, the Union
transformation does not remove duplicate rows.

2. What are the restrictions of Union Transformation?

Answer:
All input groups and the output group must have matching ports. The precision, data type, and scale
must be identical across all groups.
We can create multiple input groups, but only one default output group.
The Union transformation does not remove duplicate rows.
We cannot use a Sequence Generator or Update Strategy transformation upstream from a Union
transformation.
The Union transformation does not generate transactions.

3. How come union transformation is active?

Answer:
Active transformations are those that may change the number or position of rows in the data
stream. Any transformation that splits or combines data streams or reduces, expands or sorts da-
ta is an active transformation because it cannot be guaranteed that when data passes through the
transformation the number of rows and their position in the data stream are always unchanged.
Union is an active transformation because it combines two or more data streams into one. Though the total
number of rows passing into the Union is the same as the total number of rows passing out of it, and the se-
quence of rows from any given input stream is preserved in the output, the positions of the rows are not
preserved, i.e. row number 1 from input stream 1 might not be row number 1 in the output stream. Union
does not even guarantee that the output is repeatable.
Seen from a single input group, the number of input rows does not match the number of output rows.
Consider two sources with 10 and 20 rows respectively: each input group passes in fewer rows than the 30
rows that come out. We could compare this with a Joiner performing a Full Outer Join on 10 and 20 rows
with no matching columns, which also gives all the rows as output.
Why the Union transformation is Active remains a debatable topic. The Union transformation is derived from
the Multigroup External transformation. As the Multigroup External transformation is Active, the Union
transformation can also be termed Active.
13. Update Strategy Transformation

1. What is Update Strategy transform?

Answer:
The Update Strategy transformation flags source rows for insert, update, delete or reject at the targets.

2. What are Update Strategy Constants?

Answer:
DD_INSERT - 0
DD_UPDATE - 1
DD_DELETE - 2
DD_REJECT - 3

3. How can we update a record in target table without using Update strategy?

Answer:
A target table can also be updated without using an Update Strategy. For this, we need to define the key
of the target table at the Informatica level and then connect the key and the field we want to update in
the mapping target. At the session level, we should set Treat Source Rows As to Update and select the
Update as Update option for the target.
Let's assume we have a target table "Customer" with fields "Customer ID", "Customer Name" and "Customer
Address". Suppose we want to update "Customer Address" without an Update Strategy. Then we have to define
"Customer ID" as the primary key at the Informatica level and connect the Customer ID and Customer Address
fields in the mapping. If the session properties are set correctly as described above, then the mapping
will update only the customer address field for all matching customer IDs.

4. What is Data Driven?

Answer:
The Update Strategy transformation flags source rows for insert, update, delete or reject at the targets.
Treat Source Rows As Data Driven: this is the default session property option when a mapping contains an
Update Strategy transformation.
The Integration Service follows the instructions coded in the mapping to flag the rows for insert, update,
delete or reject.


5. What happens when DD_UPDATE is defined in update strategy and Treat source rows as
INSERT is selected in Session?

Answer:
If in Session anything other than DATA DRIVEN is mentioned then Update strategy in the mapping is ignored.

6. What are the three areas where the rows can be flagged for particular treatment?

Answer:
In Mapping Update Strategy
In Session - Treat Source Rows As
In Session - Target Insert / Update / Delete Options.

7. By default operation code for any row in Informatica without being altered is INSERT.
Then state when do we need DD_INSERT?

Answer:
When we handle data insertion, updating, deletion and/or rejection in a single mapping, we use
Update Strategy transformation to flag the rows for Insert, Update, Delete or Reject. We flag it
by either providing the values 0, 1, 2, 3 respectively or by DD_INSERT, DD_UPDATE, DD_DELETE
or DD_REJECT in the Update Strategy transformation. By default the transform has the value '0'
and hence it performs insertion.
Suppose we want to insert into or update the target table in a single pipeline. Then we can write the
expression below in the Update Strategy transformation to insert or update based on the incoming row.
IIF (ISNULL(LKP_EMPLOYEE_ID), DD_INSERT, DD_UPDATE)

If we can use more than one pipeline, then it's not a problem. For the insert part we don't even need an
explicit Update Strategy transformation (DD_INSERT); we can map it straight away.
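
The insert-else-update decision can be traced in Python. This is only a sketch: the function name
flag_row and the sample IDs are hypothetical, and a set of existing IDs stands in for the lookup on the
target table.

```python
# Illustrative sketch only: flagging rows for insert or update in plain Python.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_row(employee_id, target_ids):
    """Mimic the IIF(..., DD_INSERT, DD_UPDATE) expression on a lookup result."""
    # A lookup miss is represented by None, like a NULL lookup return value.
    lkp_employee_id = employee_id if employee_id in target_ids else None
    return DD_INSERT if lkp_employee_id is None else DD_UPDATE


existing = {101, 102}
flags = {emp: flag_row(emp, existing) for emp in (101, 103)}
print(flags)
```

Employee 101 already exists in the target and is flagged for update, while employee 103 is new and is
flagged for insert.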

8. What is the difference between the Update Strategy transformation and the following update options in
the target: Update as Update, Update as Insert, Update else Insert? Even if we do not use an Update
Strategy we can still update the target by setting, for example, Update as Update and setting Treat Source
Rows As to Update. So what is the difference here?
Answer:
The operations for the following options are done at the database level:
Update as Update
Update as Insert
Update else Insert
For these options a 'select' statement is issued on the target table and the result is compared with the
source. If the record already exists it does an update, otherwise it does an insert. With the Update
Strategy, on the other hand, the operations are decided at the Informatica level itself.
The Update Strategy also gives a conditional update option: based on some condition you can update, insert
or even reject the rows. Such conditional options are not available in target-based updates (which either
update, or update else insert, based on the keys defined at the Informatica level).

9. What is the use of Forward Reject rows in Mapping?

Answer:
If DD_REJECT is selected in the Update Strategy, then we need to select this option to generate the Reject/
Bad file.

10. Suppose we have a source employee table and we want to load employees who belong to department 10 to
Target 1, 20 to Target 2 and 30 to Target 3. Describe the approach without using FILTER or ROUTER
Transformations.
Answer:
We will use three separate Update Strategy transformations, one before each of the target tables (T1, T2,
T3), and provide the conditions below in their expression editors:
UPD_T1: IIF (DEPTNO = 10, DD_INSERT, DD_REJECT)
UPD_T2: IIF (DEPTNO = 20, DD_INSERT, DD_REJECT)
UPD_T3: IIF (DEPTNO = 30, DD_INSERT, DD_REJECT)

14. Java Transformation

Source:
Col1 Col2
A 3
B 2
C 2

Target:
Col1 Col2
A 3
A 3
A 3
B 2
B 2
C 2
C 2

Answer:
Using an Active Java transformation in Informatica we can generate as many output rows per input row as the
requirement demands. Here is the Java code, where Col1 and Col2 are the input ports and Out_Col1 and
Out_Col2 are the output ports:

// Emit Col2 copies of the current input row.
for (int i = 0; i < Col2; i++) {
    Out_Col1 = Col1;
    Out_Col2 = Col2;
    generateRow();
}
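
To check the expected target against the sample source, the same loop can be mimicked in plain Python. This is a sketch only; replicate_rows is a hypothetical helper and the list append stands in for the row that the Java transformation would emit.

```python
def replicate_rows(rows):
    """Emit each (col1, col2) row col2 times, like generating a row inside a loop."""
    out = []
    for col1, col2 in rows:
        for _ in range(col2):
            out.append((col1, col2))
    return out

source = [("A", 3), ("B", 2), ("C", 2)]
target = replicate_rows(source)
```

For the sample source this yields seven rows: three for A, two for B and two for C.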

How can I replace the characters (e.g. A to Z) in a particular string with their ASCII values?
E.g. Input string: AB123C1; Output string: 6566123671
Answer:
If the INPUT string has a fixed size of 9 characters, use the code below as the expression in an output port of an
Informatica Expression transformation.
Alternatively, you can use an Informatica User-Defined Function with the INPUT string as an argument:

IIF( IS_NUMBER( SUBSTR( INPUT, 1, 1 ) ) = 1, SUBSTR( INPUT, 1, 1 ),
TO_CHAR( ASCII( SUBSTR( INPUT, 1, 1 ) ) ) ) ||
TO_CHAR( ASCII( SUBSTR( INPUT, 9, 1 ) ) ) )
As per the requirement we want to convert only the characters in the input string to their ASCII equivalents, not
the digits.

If the requirement were to convert a single character to its ASCII equivalent in Informatica, then
the built-in ASCII function of Informatica would have been sufficient, e.g. ASCII(inp_chr).
But since this is a string and we need the ASCII equivalent of each character in the string, we have to
parse each character, so the concept of a loop comes into the picture. So use an Informatica Java transformation.

Use a Passive Informatica Java transformation:

Create the input port as INPUT and the output port of the Java transformation as OUTPUT.
On the Java Code tab of the Java transformation use the Java code below:

String inp = INPUT;
String out = "";

// Walk the string one character at a time.
for (int i = 0; i < inp.length(); i++) {
    char c = inp.charAt(i);
    if (!Character.isDigit(c)) {
        // Non-digit: append its numeric character code.
        out = out + (int) c;
    } else {
        // Digit: copy it through unchanged.
        out = out + c;
    }
}
OUTPUT = out;
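
The same logic can be verified quickly outside Informatica. The Python sketch below is an illustration only (replace_letters_with_codes is a hypothetical name) and is applied to the sample input from the question.

```python
def replace_letters_with_codes(text):
    """Keep digits as they are; replace every other character with its character code."""
    return "".join(ch if ch.isdigit() else str(ord(ch)) for ch in text)

result = replace_letters_with_codes("AB123C1")
```

Here A, B and C become 65, 66 and 67 while the digits pass through, giving 6566123671 as in the question.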

15. Source Qualifier Transformation

1. What is a Source Qualifier? What are the tasks we can perform using a Source Qualifier
and why is it an ACTIVE transformation?

Answer:
A Source Qualifier is an Active and Connected transformation that reads the rows from a relational database
or flat file source.
We can configure the SQ to join (both INNER and OUTER JOIN) data originating from the same source database.
We can use a source filter to reduce the number of rows the Integration Service queries.
We can specify a number for sorted ports, and the Integration Service adds an ORDER BY clause to the default SQL query.
We can choose the Select Distinct option for relational databases, and the Integration Service adds a SELECT DISTINCT clause to the default SQL query.
We can also write a Custom/User Defined SQL query, which overrides the default query in the Source Qualifier, by changing the default settings of the transformation properties for relational databases.
We also have the option to write Pre as well as Post SQL statements to be executed before and after the Source Qualifier query in the source database.
Since the transformation provides the Select Distinct property, the Integration Service can add a SELECT DISTINCT
clause to the default SQL query, which in turn affects the number of rows returned by the database to the
Integration Service; hence it is an Active transformation.
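
How these properties shape the generated query can be pictured with a small Python sketch. It is not the Integration Service's actual logic: build_default_query is a hypothetical helper, and the CUSTOMERS table with its NAME and CITY columns is assumed purely for illustration.

```python
def build_default_query(table, ports, source_filter=None,
                        sorted_ports=0, select_distinct=False):
    """Assemble a SELECT the way the properties above are described to shape it."""
    columns = [f"{table}.{p}" for p in ports]
    sql = "SELECT DISTINCT " if select_distinct else "SELECT "
    sql += ", ".join(columns) + f" FROM {table}"
    if source_filter:
        # The filter text is supplied without the WHERE keyword.
        sql += f" WHERE {source_filter}"
    if sorted_ports > 0:
        # The first N ports become the ORDER BY list.
        sql += " ORDER BY " + ", ".join(columns[:sorted_ports])
    return sql

query = build_default_query("CUSTOMERS", ["CUSTOMER_ID", "NAME", "CITY"],
                            source_filter="CUSTOMERS.CUSTOMER_ID > 1000",
                            sorted_ports=2, select_distinct=True)
```

With Select Distinct set, duplicate rows never reach the pipeline, which is why the row count can change.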

2. What happens to a mapping if we alter the data types between a Source and its corresponding Source Qualifier?
Answer:
The Source Qualifier transformation displays the Informatica data types. The transformation data types
determine how the source database binds data when the Integration Service reads it.
Now if we alter the data types in the Source Qualifier transformation, or the data types in the Source definition
and the Source Qualifier transformation do not match, the Designer marks the mapping as invalid when
we save it.

3. Suppose we have used the Select Distinct and the Number of Sorted Ports property in the
Source Qualifier and then we add Custom SQL Query. Explain what will happen.

Answer:
Whenever we add a Custom SQL or SQL override query, it overrides the User-Defined Join, Source Filter,
Number of Sorted Ports, and Select Distinct settings in the Source Qualifier transformation. Hence only the
user-defined SQL query will be fired in the database and all the other options will be ignored.

4. Describe the situations where we will use the Source Filter, Select Distinct and Number of
Sorted Ports properties of Source Qualifier transformation.

Answer:
The Source Filter option is used basically to reduce the number of rows the Integration Service queries, so as to
improve performance.
The Select Distinct option is used when we want the Integration Service to select unique values from a source.
Filtering out unnecessary data earlier in the data flow will improve performance.
The Number Of Sorted Ports option is used when we want the source data to be sorted, so that following
transformations such as Aggregator or Joiner, when configured for sorted input, perform better.

5. What will happen if the SELECT list COLUMNS in the Custom override SQL Query and the
OUTPUT PORTS order in Source Qualifier transformation do not match?

Answer:
A mismatch between the order of the columns selected in the SQL Query override of the Source Qualifier and the
order of its connected output ports may produce unexpected values in the ports if the data types happen to
match; otherwise it will lead to session failure.
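
The risk is easier to see with a toy model in which query columns are paired with ports purely by position, as the answer implies. The Python sketch below is illustrative only; bind_by_position and the EMPNO/ENAME/SAL ports are assumptions.

```python
def bind_by_position(output_ports, result_row):
    """Pair query columns with ports purely by position, ignoring column names."""
    return dict(zip(output_ports, result_row))

ports = ["EMPNO", "ENAME", "SAL"]
# Override written as SELECT ename, empno, sal ... (first two columns swapped).
row_from_override = ("SMITH", 7369, 800)
bound = bind_by_position(ports, row_from_override)
```

Here the EMPNO port ends up holding a name and the ENAME port a number. With compatible data types such a swap goes unnoticed, and with incompatible ones the session fails.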

6. What happens if we include the keyword WHERE in the Source Filter property of the SQ transformation,
say WHERE CUSTOMERS.CUSTOMER_ID > 1000?

Answer:
We use the Source Filter to reduce the number of source records. If we include the string WHERE in the source
filter, the Integration Service fails the session. In the above case, the correct syntax will be
CUSTOMERS.CUSTOMER_ID > 1000

7. Describe the scenarios where we go for Joiner transformation instead of Source Qualifier
transformation.

Answer:
To join data from heterogeneous sources, as well as to join flat files, we use the Joiner transformation.
Use the Joiner transformation when we need to join the following types of sources:
Join data from different Relational Databases.
Join data from different Flat Files.
Join relational sources and flat files.

8. What is the maximum number we can use in Number of Sorted Ports for Sybase source
system?

Answer:
Sybase supports a maximum of 16 columns in an ORDER BY clause. So if the source is Sybase, do not sort
more than 16 columns.

9. What is use of Source Qualifier in Informatica? Can we create a mapping without a source
qualifier?

Answer:
The Source Qualifier is used to convert the data types of the heterogeneous source objects supported by
Informatica to native Informatica data types, after which Informatica processes the subsequent objects
in the mapping with consistent Informatica data types.
Also, for relational tables the Source Qualifier helps to join multiple tables from the same database and
allows Pre or Post SQL operations.
We cannot create a mapping without a Source Qualifier; it is the first transformation in Informatica that is
attached to the source table or source flat file instance.

10. Suppose we have two tables of the same database type, residing in different database instances.
If a Database Link is available, how can we join the two tables using a Source
Qualifier in Informatica, provided there are valid join columns?

Answer:
Source Qualifier Override:-

SELECT e.empno, e.ename, s.salary, s.comm
FROM emp e, sal@dblinkname s
WHERE e.empno=s.empno

It is advisable to create a public synonym in the database for the remote table so that we can avoid using the
TableName@DBLinkName syntax.

11. What is the meaning of the Output is Deterministic property in the Source Qualifier transformation?

Answer:
Output is deterministic means we are informing Informatica that the output does not change (for
the same input) across session runs. Why is this required? Consider that the source is relational
and we have enabled the session for recovery. The session fails and we resume the session. In this
case, if we have set the source as deterministic, the session would have created a cache (on disk) of
the source during the normal run to be used for recovery. This saves time during recovery because we need not
issue the SQL command to the source database again.

If this was not set, the source data cache is not created during the normal run and the SQL will be reissued
during recovery. In some cases, if this property is not set you will not be able to enable recovery for the session.

How do we delete duplicate rows present in a relational database using Informatica? Suppose we have duplicate
records in the Source System and we want to load only the unique records in the Target System, eliminating the
duplicate rows. What will be the approach?
Answer:
Assuming that the source system is a relational database, to eliminate duplicate records we can check
the Select Distinct option of the Source Qualifier of the source table and load the target accordingly.

16. Miscellaneous

1. What are the new features of Informatica 9.x at the developer level?

Answer:
From a developer's perspective, some of the new features in Informatica 9.x are as follows:
Lookup can now be configured as an active transformation - it can return multiple rows on a successful match.
You can now write a SQL override on an un-cached lookup also. Previously you could do it only on a
cached lookup.
You can control the size of your session log. In a real-time environment you can control the session
log file size or time.
Database deadlock resilience feature - this ensures that your session does not immediately fail if
it encounters a database deadlock; it will now retry the operation. You can configure the number of
retry attempts.
Cache can be updated based on a condition or expression.
New interface for the admin console, from now on called Informatica Administrator. (Create connection
objects, grant permission on database connections, deploy or configure deployment units from the
Informatica Administrator.)
PowerCenter licensing is now based on the number of CPUs and repositories.

2. Name the transformations which convert one row to many rows, i.e. increase the I/P : O/P
row count. Also, what is the name of the reverse transformation?

Answer:
Normalizer and Router transformations are two Active transformations which can produce more output rows
than input rows.
The Aggregator transformation performs the reverse action of the Normalizer transformation.

3. In how many ways can we filter records?

Answer:
Source Qualifier
Filter transformation
Router transformation
Update strategy

4. What are the transformations that use cache for performance?

Answer:
Aggregator, Sorter, Lookups, Joiner and Rank transformations use cache.

5. What is the formula for calculation of Lookup/Rank/Aggregator index & data caches?

Answer:
Index cache size = Total no. of rows * size of the column in the lookup condition (50 * 4)

Aggregator/Rank transformation Data Cache size = (Total no. of rows * size of the column in the
lookup condition) + (Total no. of rows * size of the connected output ports)

Aggregator Index Cache: #Groups * ((sum of column sizes) + 7)
Aggregator Data Cache: #Groups * ((sum of column sizes) + 7)

Lookup Index Cache: #Rows in lookup table * ((sum of column sizes) + 16)
Lookup Data Cache: #Rows in lookup table * ((sum of column sizes) + 8)

Joiner Index Cache: #Master rows * ((sum of column sizes) + 16)
Joiner Data Cache: #Master rows * ((sum of column sizes) + 8)

Rank Index Cache: #Groups * ((sum of column sizes) + 7)
Rank Data Cache: #Groups * ((#Ranks * ((sum of column sizes) + 10)) + 20)
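
As a worked example, the Lookup formulas can be applied to made-up numbers. The Python sketch below takes the formulas at face value and reads the "column size" term as the sum of the relevant column sizes, which is an assumption. The helper name and the sample sizes (1000 rows, one 4-byte condition column, output columns of 20 and 10 bytes) are hypothetical.

```python
def lookup_cache_sizes(rows, condition_col_sizes, output_col_sizes):
    """Apply the Lookup index/data cache formulas quoted above (sizes in bytes)."""
    index_cache = rows * (sum(condition_col_sizes) + 16)
    data_cache = rows * (sum(output_col_sizes) + 8)
    return index_cache, data_cache

index_bytes, data_bytes = lookup_cache_sizes(1000, [4], [20, 10])
```

For these inputs the index cache comes to 1000 * (4 + 16) = 20000 bytes and the data cache to 1000 * (30 + 8) = 38000 bytes.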

6. What is the difference between Informatica PowerCenter, PowerExchange and PowerMart?

Answer:
PowerCenter:
PowerCenter can have many repositories.
It supports the Global Repository and networked local repositories.
PowerCenter can connect to all native legacy source systems such as Mainframe, ERP, CRM, EAI
(TIBCO, MSMQ, JMQ).
High Availability and load sharing on multiple servers in the grid.
Informatica session-level Partitioning is available.
Informatica Pushdown Optimizer is available.
PowerMart:
PowerMart supports only one repository.
PowerMart can connect to Relational and flat file sources.
PowerExchange:
PowerExchange Client and PowerExchange ODBC are PowerExchange interfaces to extract and load
data for a variety of data types on a variety of platforms (relational, non-relational, and changed data)
in batch mode or real time using PowerCenter.
The PowerExchange Client for PowerCenter is installed with PowerCenter and integrates
PowerExchange (separate license for the required source system; check Sources->Import from
PowerExchange) and PowerCenter to extract relational, non-relational, and changed data.

7. How do we handle delimiter character as a part of the data in a delimited source file?

Answer:
For delimited files the delimiter is the separator that identifies the data values of the fields present in
the file.
So ideally, if the data file contains the delimiter character as a part of the data in a field value,
either the field value is enclosed within double or single quotes, or an escape character precedes
the delimiter that is actually to be treated as a normal character.

To handle such flat files in Informatica, use the following options as per the data file format while defining
the file structure.

1. Set Optional Quotes to Double or Single Quote. The column delimiters within the quote characters are
ignored.

2. Set the Escape Character used to escape the delimiter or quote character.
A delimiter character in an unquoted string, or a quote character in a quoted string, that is preceded by the
escape character is treated as a regular character.
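
The two conventions can be tried out with Python's standard csv module, which has equivalent quotechar and escapechar settings. This is only an analogy to the Informatica flat-file options, and the sample lines are made up.

```python
import csv
import io

def parse_delimited(text, delimiter=",", quotechar='"', escapechar=None):
    """Split delimited text into rows, honouring quotes and an optional escape character."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quotechar=quotechar, escapechar=escapechar)
    return [row for row in reader]

# The comma inside the name is protected by quotes in one line and by a backslash in the other.
quoted_rows = parse_delimited('101,"Smith, John",NY\n')
escaped_rows = parse_delimited('101,Smith\\, John,NY\n', escapechar="\\")
```

Both inputs parse to the same three fields, with the embedded comma kept as data.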

8. We have just received source files from UNIX. We want to stage that data for the ETL process.
What are the points we need to look for?

Answer:
When a source flat file is loaded to a staging database table, we generally focus on the items below:

Define the proper file format for the input file (Delimited/Fixed-width), Code Page etc.
Check any processing date in the Header information against sysdate or some other business
logic.
Check the count of detail records in the file against the Trailer information, if any.
Check that the sum of any measure fields of the detail records matches the Header/Trailer information, if any.
In case of Indirect Loading we can add the filename and the record number in the file as columns in
the staging table.
Basically everything depends on your business requirement.
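
The header and trailer checks can be sketched in Python. The file layout used here is entirely hypothetical (pipe-delimited lines tagged H, D and T, with the trailer carrying the record count and the amount total); real files will differ.

```python
from decimal import Decimal

def validate_staged_file(lines, expected_date):
    """Check header date, trailer record count and trailer amount total.

    Assumed layout (hypothetical): 'H|<date>', 'D|<id>|<amount>', 'T|<count>|<total>'.
    """
    header, *details, trailer = [line.split("|") for line in lines]
    errors = []
    if header[0] != "H" or header[1] != expected_date:
        errors.append("header date mismatch")
    if trailer[0] != "T" or int(trailer[1]) != len(details):
        errors.append("record count mismatch")
    total = sum((Decimal(d[2]) for d in details), Decimal("0"))
    if trailer[0] == "T" and total != Decimal(trailer[2]):
        errors.append("amount total mismatch")
    return errors

good_file = ["H|2013-01-31", "D|1|10.50", "D|2|4.50", "T|2|15.00"]
bad_file = ["H|2013-01-31", "D|1|10.50", "T|2|15.00"]
good_errors = validate_staged_file(good_file, "2013-01-31")
bad_errors = validate_staged_file(bad_file, "2013-01-31")
```

Decimal is used for the totals so that monetary amounts are compared exactly rather than as floating-point values.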

9. What is the difference between Joiner and Lookup? Performance-wise, which one is better
to use?

Answer:
Joiner:
Only = operator can be used in join condition
Supports normal, full outer, left/right outer join
Active transformation
No SQL override
Connected transformation
No dynamic cache
Heterogeneous source

Lookup:
=, <, <=, >, >=, != operators can be used in join condition
Supports left outer join
Earlier a Passive transformation; from version 9 onwards it can be an Active transformation (can return more
than one record in case of multiple matches)
Supports SQL override
Connected/Unconnected
Supports dynamic cache update
Relational/FlatFile source/target
Pipeline Lookup

The selection between these two transformations depends entirely on the project requirement. It is
debatable which of the two performs better.

10. What is the B2B in Informatica? How can we use it in Informatica?

Answer:
B2B allows us to parse and read unstructured data such as PDF, Excel, HTML etc. It has the capability to read
binary data such as messages, EBCDIC files etc. and has a very large list of supported formats.

B2B Data Transformation Studio is the developer tool with which the parsing (reading) of the unstructured
data is done. B2B mostly gives the output as an XML file.

B2B Data Transformation is integrated with Informatica PowerCenter using a transformation called "Unstructured
Data Transformation". This transformation can receive the output of B2B Data Transformation Studio and
load it into any target supported by PowerCenter.

11. What is CDC, SCD and MD5 in Informatica?

Answer:
CDC - Changed Data Capture: how only the changed data is captured from the source system.
SCD - Slowly Changing Dimension: how history data is maintained in the dimension tables.
MD5 - MD5 checksum encoding: it generates a 32-character hexadecimal code, which can be used to decide
the Insert/Update strategy for target records.
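
The MD5 idea can be sketched with Python's hashlib. This is an illustration under stated assumptions, not a description of any particular mapping: the helper names, the pipe separator and the sample column values are all hypothetical.

```python
import hashlib

def row_md5(values, sep="|"):
    """Return the 32-character hex MD5 of the concatenated column values."""
    joined = sep.join("" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def decide_action(source_values, stored_md5):
    """stored_md5 is the checksum kept on the target row, or None if no row exists."""
    if stored_md5 is None:
        return "INSERT"
    return "UPDATE" if row_md5(source_values) != stored_md5 else "NO CHANGE"

stored = row_md5(["Alice", "NY", 1000])
```

Comparing one checksum per row avoids comparing every column individually. A separator between the values keeps ("ab", "c") and ("a", "bc") from hashing to the same string.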

12. How can we implement an SCD Type2 mapping without using a lookup transformation?

Answer:
The entire implementation will be the same as that using a Lookup. The only change is to replace the
Lookup transformation with a Joiner transformation. In the Joiner transformation the Source table will be
used as Master and the Target table as Detail. The join condition will be the same as the lookup condition,
and the join type will be Detail Outer Join.

13. How do the Joiner and Lookup transformations treat NULL value matching?

Answer:
A NULL value is not equal to another NULL value in a Joiner, whereas the Lookup transformation matches NULL
values.
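
The two matching rules can be written down as tiny Python predicates, with None standing in for NULL. The function names are hypothetical and the sketch only restates the behaviour described in the answer.

```python
def joiner_match(a, b):
    """Joiner-style equality: a NULL (None) never matches anything, even another NULL."""
    return a is not None and b is not None and a == b

def lookup_match(a, b):
    """Lookup-style equality: two NULLs (None) are treated as a match."""
    return a == b
```

So a row with a NULL key finds no partner in a Joiner, but can find one in a Lookup.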

14. Does Microsoft SQL Server support bulk loading? If yes, what happens when you specify
bulk mode and data driven for a SQL Server target?

Answer:
Yes, MS SQL Server supports bulk loading. But if we select Treat Source Rows As Data Driven with the Target
Load Type as Bulk, the session will fail. We have to select Normal load with data driven source rows.

15. How can you utilize COM components in Informatica?

Answer:
By writing C++, VB or VC++ code in an External Procedure transformation

16. What is SQL transformation in Informatica?

Answer:
A SQL transformation can process SQL queries midstream in an Informatica pipeline. It supports
most DDL, DML, DCL and TCL statements.
For quick reference, following are some important notes:
We can configure the SQL transformation in two modes, which makes it Active or Passive.
Query mode, which is Active, fires the SQL query defined in the transformation in the database.
Script mode, which is Passive, calls external SQL scripts to be executed.
Query mode can be configured to handle a Static SQL Query (i.e. the SQL query stays the same and uses bind
variables) or a Dynamic SQL Query (i.e. different query statements for each input row).
With a Dynamic Query, substituting the entire SQL query through the Query_Port is called a Full
Query, while substituting only a portion of the query statement is called a Partial Query.
We can configure the SQL transformation to connect to a database with a Static Connection (i.e.
selecting a particular connection object) or a Dynamic Connection (i.e. based on the logic it
dynamically selects the connection object to connect to a database).
We can also pass the entire database connection information (i.e. username, password, connect string,
code page), which is called a Full Database Connection.

17. What is an XML source qualifier?

Answer:
The XML source qualifier represents the data elements that the Informatica server reads when it runs a
session with XML sources.

18. What is the metadata extensions tab in Informatica?

Answer:
PowerCenter allows end users and partners to extend the metadata stored in the repository by associating
information with individual objects in the repository. That is why it is called a Metadata Extension.
For example, when we create a mapping, we can store information such as the mapping functionality,
business user information and CR information. Similarly for a Session we can store schedule information and
the contact person for a failed session. We basically associate the information with repository metadata using
metadata extensions.

When we create reusable metadata extensions for a repository object using the Repository Manager, the
metadata extension becomes part of the properties of that type of object. For example, we can create a
reusable metadata extension for source definitions called SourceCreator. When we create or edit any source
definition in the Designer, the SourceCreator extension appears on the Metadata Extensions tab. Anyone
who creates or edits a source can enter the name of the person that created the source into this field.

PowerCenter Client applications can contain the following types of metadata extensions:
Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. We
can view and change the values of vendor-defined metadata extensions, but we cannot create,
delete, or redefine them.
User-defined. We create user-defined metadata extensions using PowerCenter. We can create, edit,
delete, and view user-defined metadata extensions. We can also change the values of user-defined
extensions.
All metadata extensions exist within a domain. We see the domains when we create, edit, or view metadata
extensions. Vendor-defined metadata extensions exist within a particular vendor domain. If we use third-
party applications or other Informatica products, we may see domains such as Ariba or PowerExchange for
Siebel. We cannot edit vendor-defined domains or change the metadata extensions in them.

User-defined metadata extensions exist within the User Defined Metadata Domain. When we create
metadata extensions for repository objects, we add them to this domain.

Both vendor and user-defined metadata extensions can exist for the repository objects- Source definitions,
Target definitions, Transformations, Mappings, Mapplets, Sessions, Tasks, Workflows, Worklets.

19. Describe some of the ETL Best Practices

Answer:
A lot of best practices may be applicable to one tool and pointless for another. At a very high level, and
in a tool-independent way:
Naming conventions for ETL objects
Naming conventions for Database objects
Parameterization of connections (so that things are easy for moving from 1 environment to other)
Maintaining of ETL job log - ideally automated maintenance through logging of job run
Handling of rejected records (and logging)
Data reconciliation
Meta data management- e.g. - maintaining Meta data columns in tables (Use of Audit columns e.g.
load date/ load user/ batch id etc.)
Error reporting
ETL job Performance evaluation
Following generic coding standards
Documentation
Decomposing complex logic into multiple ETL stages - load balancing (pushdown optimization wherever
applicable) etc.
Removal of unwanted ports from different transformations used in a mapping
Using Shortcuts for source, target and lookups
Using mapplets and worklets as and when required
Write some comments for every transformation
Use the DECODE function rather than nested IF-THEN-ELSE (IIF) logic
Make sure that sorted data is passed into the Aggregator transformation
If the target table has indexes, loading data into it will decrease performance; in
such situations, use pre SQL to drop the index before loading the data into the target table and, once
the data is loaded, re-create the index using post SQL.

20. Is there scope for cloud computing in data warehousing technology?

Answer:
This is not only possible; in fact, this is the way to go for many of the providers of the modern day BI tools.
There are certain advantages and benefits of using cloud computing for Business Intelligence applications
and this is a big topic of discussion today. I will quickly touch upon a few points that will substantiate the
need of Cloud BI and in the future I will try to make a comprehensive article post in this website with more
details. First, if you see the current state of BI - there are these typical characteristics:
High Infrastructure requirement, leading to high upfront investment
High development cost (needs special talent) as well as high maintenance cost
Unpredictable workload (data volume), and skewed business growth pattern
All these lead to the issues of longer cycle time and limited adoption of BI solutions. Now cloud platform,
as opposed to typical in-house software platform, is basically an alternative delivery method for the
software service. When you deliver the software or platform or infrastructure (as a service) through
cloud, you can instantly start to get the following benefits:
Lower entry cost
Lower maintenance cost (pay as you use)
Faster deployment
Reduced risk
Lower TCO (total cost of ownership)
Multiple deployment models etc.
Moreover, small and medium enterprises (SMEs) can easily adapt to this model given their typical
constraints of small business. Companies like Pentaho are already in with their products in the SaaS
(software as a service) model of cloud computing. But cloud models like SaaS have some typical problems (e.g.
no flexibility of design, security concerns etc.).
As opposed to the SaaS model, we have another cloud model called PaaS (Platform as a service), which has
the benefit of design flexibility. PaaS is very suitable for custom applications and even enterprise-level BI
applications. This cloud service is being offered by almost everyone in the BI market: BusinessObjects,
SAS, Microsoft Azure (check here: http://en.wikipedia.org/wiki/SQL_Azure), Vertica, Greenplum etc.

17. Mapping

Suppose we have a source port called ename with data type varchar(20) and the corresponding target port
as ename with varchar(20). The data type is now altered to varchar(50) in both source and target database.
Describe the changes required to modify the mapping.
Answer:
Reimport the source and target definitions. Next open the mapping, right-click on the source port ename
and use the "Propagate Attribute" option. This option allows us to change the properties of one port across
multiple transformations without manually modifying the port in each and every transformation. We can choose
the direction of propagation (forward / backward / both) and can also select the attributes to propagate, e.g.
data type, scale, precision etc.

2. What are mapping parameters and variables?

Answer:
A mapping parameter is a user-definable constant that takes up a value before running a session.
It can be used in SQ expressions, Expression transformations etc.

A mapping variable is defined similarly to a parameter, except that the value of the variable is subject
to change. It picks up its value in the following order:
From the session parameter file
As stored in the repository object in the previous run
As defined in the initial values in the Designer
Data type default values
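
The precedence order can be expressed as a small Python function. It is a sketch only: resolve_variable is a hypothetical name, and None is used to mean "not set at this level".

```python
def resolve_variable(param_file_value=None, repository_value=None,
                     initial_value=None, datatype_default=None):
    """Return the first value that is set, in the precedence order listed above."""
    for value in (param_file_value, repository_value, initial_value):
        if value is not None:
            return value
    return datatype_default

start_value = resolve_variable(param_file_value=10, repository_value=15, initial_value=5)
```

With 10 in the parameter file, 15 saved in the repository and 5 as the initial value, this order picks 10. Drop the parameter file entry and it falls back to 15.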

3. Which types of variables or parameters can be declared in a parameter file?
$, $$, $$$ - can all of them be declared or not?
Answer:
There is a difference between a variable and a parameter.
A variable, as the name suggests, holds a value which can change within a session run.
Parameters are fixed and their values don't change during the session run.
$ - for session-level parameters, which can be declared in parameter files.
$$ - for mapping-level parameters, which can be declared in parameter files.
$$$ - for built-in Informatica system variables, which cannot be declared in parameter files,
e.g. $$$SessStartTime. These are constant throughout the mapping and cannot be changed.

Read this article to get a detailed understanding:
http://www.dwbiconcepts.com/etl/14-etl-informatica/74-stop-hardcoding-follow-parameterization-technique.html

4. What are the default values for variables?

Answer:
String = Null
Number = 0
Date = 1/1/1753

5. What does the first column of the bad file (rejected rows) indicate?

Answer:
First column - Row indicator (0, 1, 2, 3)
Second column - Column indicator (D, O, N, T)

6. Out of 100000 source rows some rows get discarded at the target. How will you trace them and
where do they get loaded?

Answer:
Rejected records are loaded into bad files. Each record has a row indicator and column indicators.
The row indicator is one of 0 (insert), 1 (update), 2 (delete), 3 (reject) and
the column indicator is one of D (valid), O (overflow), N (null), T (truncated).
Normally data may get rejected for different reasons arising from the transformation logic.

7. What is Reject loading?

Answer:
During a session, the Informatica server creates a reject file for each target instance in the mapping. If the
writer or the target rejects data, the Informatica server writes the rejected row into the reject file. The reject file
and session log contain information that helps you determine the cause of the reject. You can correct reject
files and load them to relational targets using the Informatica reject load utility. The reject loader also
creates another reject file for the data that the writer or target rejects during the reject loading.
Reject Loading
During a session, the server creates a reject file for each target instance in the mapping. If the writer or the
target rejects data, the server writes the rejected rows into the reject file. You can correct those rejected
data and reload them to relational targets using the reject loading utility. (You cannot load rejected data
into a flat file target.) Each time you run a session, the server appends the rejected data to the reject file.
Locating the Bad Files
$PMBadFileDir/Filename.bad
When you run a partitioned session, the server creates a separate reject file for each partition.
Reading Rejected data
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
To help us find the reason for a rejection, there are two main things.
Row indicator - The row indicator tells the writer what to do with the row of wrong data.

Row indicator   Meaning   Rejected by
0               Insert    Writer or target
1               Update    Writer or target
2               Delete    Writer or target
3               Reject    Writer

If the row indicator is 3, the writer rejected the row because an update strategy expression marked it
for reject.
Column indicator - The first column indicator is followed by the first column of data and then another column
indicator. Column indicators appear after every column of data and define the type of the data preceding them.

Column indicator   Meaning      Writer treats as
D                  Valid data   Good data. The target accepts it unless a database error occurs, such as finding a duplicate key.
O                  Overflow     Bad data.
N                  Null         Bad data.
T                  Truncated    Bad data.

NOTE: NULL columns appear in the reject file with commas marking their column.
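
The indicator codes lend themselves to simple lookup tables. The Python sketch below decodes only the row indicator and the column indicator that immediately follows it, using the example line shown earlier. describe_reject_row is a hypothetical helper, and the plain comma split assumes no quoted commas in the file.

```python
ROW_INDICATORS = {"0": ("Insert", "Writer or target"), "1": ("Update", "Writer or target"),
                  "2": ("Delete", "Writer or target"), "3": ("Reject", "Writer")}
COLUMN_INDICATORS = {"D": "Valid data", "O": "Overflow", "N": "Null", "T": "Truncated"}

def describe_reject_row(line):
    """Decode the row indicator and the column indicator that follows it."""
    fields = line.split(",")
    meaning, rejected_by = ROW_INDICATORS[fields[0]]
    return {"row_indicator": meaning, "rejected_by": rejected_by,
            "first_column_indicator": COLUMN_INDICATORS[fields[1]]}

info = describe_reject_row("3,D,1,D,D,0,D,1094345609,D,0,0.00")
```

For the example line the row indicator is 3, so the row was rejected by the writer rather than by the target database.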

Correcting Reject File
Use the reject file and the session log to determine the cause for rejected data. Keep in mind that correcting
the reject file does not necessarily correct the source of the reject. Correct the mapping and target database
to eliminate some of the rejected data when you run the session again. Trying to correct target-rejected
rows before correcting writer-rejected rows is not recommended, since they may contain misleading column
indicators. For example, a series of N indicators might lead you to believe the target database does not
accept NULL values, so you decide to change those NULL values to zero. However, if those rows also had a 3
in the row indicator column, the row was rejected by the writer because of an update strategy expression, not
because of a target database restriction. If you try to load the corrected file to the target, the writer will again
reject those rows, and they will contain inaccurate 0 values in place of NULL values.
8. Why may the Informatica writer thread reject a record?

Answer:
Data overflowed column constraints
An update strategy expression marked the row for reject

9. Why can the target database reject a record?

Answer:
Data contains a NULL column
Database errors, such as key violations

10. Describe the various steps for loading a reject file.

Answer:
After correcting the rejected data, rename the reject file to reject_file.in.
The reject loader uses the data movement mode configured for the server. It also uses the code page of the
server/OS. Hence, do not change these in the middle of reject loading.
Run the reject loader utility: pmrejldr pmserver.cfg [folder name] [session name]

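11. Variable v1 has its value set to 5 in the Designer (default), 10 in the parameter file, and 15 in the
repository. While running the session, which value will Informatica read?

Answer:
Informatica reads the value 10 from the parameter file. The Integration Service resolves the start value of a
mapping variable in this order: the value in the parameter file, the value saved in the repository, the initial
value set in the Designer, and finally the datatype default.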

12. What are shortcuts? Where can they be used? What are the advantages?

Answer:
There are two types of shortcut, local and global. A local shortcut is used within a local repository and a
global shortcut refers to an object in the global repository.
The advantage is reusing an object without creating multiple copies of it. For example, to use one source
definition in 10 mappings in 10 different folders, we create 10 shortcuts to it instead of creating 10 copies
of the source.

13. Can we have an Informatica mapping with two pipelines, where one flow has a
Transaction Control transformation and the other does not? Explain why.

Answer:
No, it is not possible. Whenever we have a Transaction Control transformation in a mapping, the session
commit type is User Defined, whereas for a pipeline without a Transaction Control transformation the session
expects the commit type to be either Source based or Target based.
Hence we cannot have both pipelines in a single mapping; rather, we have to develop a separate mapping for
each pipeline.

14. How can we implement Reverse Pivoting using Informatica transformations?

Answer:
Pivoting can be done using the Normalizer transformation. For reverse pivoting we need to use an Aggregator
transformation, as below.

From,
Col1 Col2
A    10
B    20

To,
A    B
10   20

This can be done using one Expression transformation and one Aggregator transformation.

In the Expression transformation, create two output ports, o_col_a and o_col_b:
o_col_a = IIF(Col1 = 'A', Col2, 0)
o_col_b = IIF(Col1 = 'B', Col2, 0)

Next, in the Aggregator transformation, take the MAX() of o_col_a and o_col_b and map them to the target
columns A and B.
(We may need to take SUM() instead of MAX() if we have multiple A or B rows.)
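
The same two-step idea (a conditional expression per output column, then an aggregate) can be sketched outside
Informatica. The Python below is only an illustration of the logic, not Informatica code. The function name
reverse_pivot and the aggregate parameter are hypothetical, and the keys "A" and "B" are the ones from the
example above.

```python
def reverse_pivot(rows, aggregate=max):
    """Turn (key, value) rows into one record with a column per key.

    Mirrors the Expression step (value if the key matches, else 0)
    followed by the Aggregator step (MAX by default, SUM if requested).
    """
    # Expression step: one derived column per target column.
    o_col_a = [value if key == "A" else 0 for key, value in rows]
    o_col_b = [value if key == "B" else 0 for key, value in rows]
    # Aggregator step: collapse all rows into a single output row.
    # A leading 0 keeps the aggregate defined when there are no input rows.
    return {"A": aggregate([0] + o_col_a), "B": aggregate([0] + o_col_b)}
```

Calling reverse_pivot([("A", 10), ("B", 20)]) gives one record with A = 10 and B = 20. Passing aggregate=sum
corresponds to the SUM() variant for inputs that contain several A or B rows.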

15. Is it possible to update a Target table without any key column in target?

Answer:
Yes, it is possible to update the target table, either by defining keys at the Informatica level in the
Warehouse Designer or by using an Update Override.

18. Mapplet

1. What is a Mapplet?

Answer:
Mapplets are reusable objects that represent a collection of transformations.

2. What is the difference between Reusable transformation and Mapplet?

Answer:
A reusable transformation is any Informatica transformation that is created in the Transformation Developer,
or promoted from non-reusable to reusable in the Mapping Designer, and that can be used in multiple
mappings. When we add a reusable transformation to a
mapping, we actually add an instance of the transformation. Since the instance of a reusable transformation
is a pointer to that transformation, when we change the transformation in the Transformation Developer, its
instances reflect these changes.
A mapplet is a reusable object created in the Mapplet Designer which contains a set of transformations and
lets us reuse the transformation logic in multiple mappings. A mapplet can contain as many transformations
as we need. As with a reusable transformation, when we use a mapplet in a mapping we use an instance of the
mapplet, and any change made to the mapplet in the Mapplet Designer is inherited by all instances of the
mapplet.

3. What are the transformations and objects that are not supported in a Mapplet?

Answer:
Normalizer
Cobol sources
XML sources
XML Source Qualifier
Target definitions
Pre- and post-session stored procedures
Other mapplets

4. Is it possible to convert a reusable transformation to a non-reusable one?

Answer:
Reusable transformations are created in the Transformation Developer.
Another way is to promote a non-reusable transformation in a mapping or mapplet to a reusable one.

Converting a non-reusable transformation into a reusable transformation is not reversible.

But we can use a reusable transformation as a non-reusable one in any mapping or mapplet: drag
the reusable transformation from the Repository Navigator and press the Ctrl key just before dropping
the object in the Mapplet or Mapping Designer.

The same applies to creating a non-reusable session from a reusable one in the Worklet or Workflow Designer.

5. What is the use of Mapplets and Worklets in a project?

Answer:
Mapplets and worklets allow us to create reusable objects and thus make our Informatica code reusable.

Just like a procedure or function in a procedural language, we can build a mapplet or worklet to encapsulate
business logic that can be used again and again in different mappings and workflows.

A mapplet is created in the PowerCenter Designer and reused in mappings. A worklet is created in the
Workflow Manager and reused in workflows.

6. Is it possible to have a mapplet within a mapplet and a worklet within a worklet?

Answer:
Informatica does not support a mapplet within a mapplet, but it supports a worklet within a
worklet.

19. Session

1. What are Sessions and Batches?

Answer:
SESSION - A session is a set of instructions that tells the Informatica Server / Integration Service how and
when to move data from sources to targets. After creating the session, we can use either the Server Manager
or the command line program pmcmd to start or stop the session.
BATCHES - A batch provides a way to group sessions for either serial or parallel execution by the Informatica
Server. There are two types of batches:
SEQUENTIAL - Runs sessions one after the other.
CONCURRENT - Runs sessions at the same time.

2. What are the various session tracing levels?

Answer:
Normal (default) - Logs initialization and status information, errors encountered, and rows skipped
due to transformation errors. Summarizes session results, but not at the row level.
Terse - Logs initialization information, error messages, and notification of rejected data.
Verbose Initialization - In addition to normal tracing, logs additional initialization information, the
names of the index and data files used, and detailed transformation statistics.
Verbose Data - In addition to verbose initialization tracing, records row-level logs.

3. Can we copy a session to a new folder or a new repository?

Answer:
Yes, we can copy a session to a new folder or repository, provided the corresponding mapping is already in
that folder or repository.

4. Is it possible to store all the Informatica session log information in a database table?
Normally the session log is stored as a compressed binary .bin file in the SessLogs directory.
Can we store the same information in database tables for future analysis?

Answer:
It is not possible to store all the session log information in a table. Along with error-related
information, we can get some other session-related information from metadata repository views like
REP_SESS_LOG.
To capture error data, we can configure the session as below:
Go to Session -> Config Object -> Error Handling section.

Give the settings:
Error Log Type: Relational Database.
Error Log DB Connection: The database connection where we want to store the error tables.
Error Log Table Name Prefix: Prefix for the error tables. By default, Informatica creates 4 different error
tables. If we provide a prefix here, the error tables will be created with that prefix in the database.
Log Row Data: Logs the row data at the point where the error happened.
Log Source Row Data: Captures the source data for the error record.
Data Column Delimiter: Error data is stored in a single column of the database table. We can specify
the delimiter for the source data here.

List of error tables created by Informatica:

PMERR_DATA. Stores data and metadata about a transformation row error and its corresponding source
row.
PMERR_MSG. Stores metadata about an error and the error message.
PMERR_SESS. Stores metadata about the session.
PMERR_TRANS. Stores metadata about the source and transformation ports, such as name and data type,
when a transformation error occurs.

The above tables are specifically used to store information about exception (error) records, e.g. records
in the reject file.
We can use them as a base for an error handling strategy. But they do not contain all the information that is
present in the session log, such as performance details (thread busy percentage) or details of the
transformations invoked in the session. We can also check the contents of the REP_SESS_LOG view under the
Informatica repository schema; however, that too does not contain all the information.

5. Can we call a shell script from session properties?

Answer:
Yes. The Integration Service can execute shell commands at the beginning or at the end of the session. The
Workflow Manager provides the following types of shell commands for each Session task:
Pre-session command
Post-session success command
Post-session failure command
Use any valid UNIX command or shell script for UNIX nodes, or any valid DOS command or batch file for Windows
nodes. Configure the session to run the pre- or post-session shell commands.

6. Can we change the source and target table names at the session level?

Answer:
Yes, we can change the source and target table names at the session level. Go to the session and navigate to
the Mapping tab. Select the source or target to be changed: for a target, enter the new table name in
Target Table Name; for a source, use Source Table Name.
A more flexible method is to parameterize the source and target table names. We can then
run the same mapping concurrently using different parameter files. We have to enable concurrent run mode
at the workflow level.

7. How do we write the column names to a flat file target?

Answer:
There are two options available in the session properties to take care of this requirement. Go to the Mapping
tab, open the target properties, and set the header option to either Output Field Names or Use header command
output.
Option 1 creates the output file with a header record whose column headings are the same as
the target port names.
Option 2 lets us supply our own command to generate the header record text. We can use an echo command
here, for example:
echo '"Employee ID"|"Department ID"'
The second option is recommended, as it gives more flexibility in writing the column names.

8. What are the error tables present in Informatica?

Answer:
PMERR_DATA - Stores data and metadata about a transformation row error and its corresponding
source row.
PMERR_MSG - Stores metadata about an error and the error message.
PMERR_SESS - Stores metadata about the session.
PMERR_TRANS - Stores metadata about the source and transformation ports, such as name and data
type, when a transformation error occurs.

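9. What are the alternate ways to stop a session without setting the Stop on errors option
to 1 in the session properties?

Answer:
We can use the ABORT() function in an Expression transformation to stop the execution of a
session based on a user-defined condition. The ERROR() function can also be used in the expression, but it only
marks the row as an error and skips it, so the session stops only when the error threshold is reached.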

10. Suppose a session fails after loading 10,000 records into the target. How can we load the
records from 10,001 onward when we run the session next time?

Answer:
If we configure the session for Normal load rather than Bulk load, set the Recovery Strategy
in the session properties, and select the option Resume from last checkpoint, then we can
run the session from the last commit point.
In this case, if we specify the Commit Interval as 10,000 and the Integration Service issued a commit after
loading 10,000 records, then we can load the records from 10,001.
If 9,999 rows were loaded and the session failed, the Integration Service did not issue any commit because the
Commit Interval is 10,000, so we cannot perform recovery. In this case, truncate the target table and
restart the session.
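
The arithmetic behind the two cases above can be written down as a short sketch. This is Python used purely for
illustration and is not how the Integration Service is implemented. It assumes that a commit is issued exactly at
every multiple of the commit interval, and the function name resume_from is hypothetical.

```python
def resume_from(rows_loaded, commit_interval):
    """Return the first row to load on recovery, or None if nothing was committed.

    Assumes one commit at every exact multiple of commit_interval.
    """
    committed = (rows_loaded // commit_interval) * commit_interval
    if committed == 0:
        # No commit was issued, so there is no checkpoint to resume from.
        return None
    return committed + 1
```

With a commit interval of 10,000, a failure after 10,000 loaded rows resumes at row 10,001, while a failure after
9,999 rows returns None, which corresponds to truncating the target and restarting the session.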

11. Define the types of commit intervals apart from user defined.

Answer:
The different commit types are:
Target-based commit. The Informatica Server commits data based on the number of target rows and the key
constraints on the target table. The commit point also depends on the buffer block size and the commit
interval.
Source-based commit. The Informatica Server commits data based on the number of source rows. The
commit point is the commit interval you configure in the session properties.

12. Suppose a session is configured with a commit interval of 10,000 rows and the source has 50,000
rows. Explain the commit points for source-based commit and target-based commit. Assume
appropriate values wherever required.

Answer:
Target-based commit (assume the buffer first fills at 7,500 rows and next at 15,000):
Commits at 15000, 22500, 30000, 40000, 50000

Source-based commit (does not depend on rows held in the buffer):
Commits at 10000, 20000, 30000, 40000, 50000

13. How do we capture the performance statistics of individual transformations in a mapping?
Explain some important statistics that can be captured.

Answer:
Use the tracing level Verbose Data.

14. How can we parameterize the success or failure email list?

Answer:
We can parameterize the email user list and modify the values in the parameter file.
Use $PMSuccessEmailUser and $PMFailureEmailUser.

We can also use the pmrep command to update the email addresses:

updateemailaddr
-d <folder_name>
-s <session_name>
-u <success_email_address>
-f <failure_email_address>

15. Is it possible that a session failed but the workflow status still shows success?

Answer:
Yes. If the workflow completes successfully, it shows an execution status of success irrespective of whether
any session within the workflow failed. The workflow status is independent of session
failure. Only if we set the session general option Fail parent if this task fails in the Workflow Designer
will the workflow status display as failed on session failure.

16. What is Busy Percentage?

Answer:
The duration for which a thread was occupied, compared to the total run time of the mapping.

Let's say we have one writer thread, which is internally responsible for writing data to the target table or
file. If our mapping runs for 100 seconds but the time taken to write the data to
the target is only 20 seconds (because the rest of the time was spent reading and transforming the data), then
the busy percentage of the writer thread is 20%.
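
The writer-thread example can be checked with a one-line calculation. The sketch below is Python for illustration
only. It uses the relation busy percentage = (run time - idle time) * 100 / run time, and the function name
busy_percentage is hypothetical.

```python
def busy_percentage(run_time, idle_time):
    """Share of the run time (in percent) during which a thread was occupied."""
    if run_time <= 0:
        raise ValueError("run_time must be positive")
    return (run_time - idle_time) * 100 / run_time
```

For the writer thread above, the mapping runs for 100 seconds and the thread is idle for 80 of them, so
busy_percentage(100, 80) evaluates to 20.0.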

17. Can we write a PL/SQL block in pre- and post-session SQL or in a target query override?

Answer:
Yes, we can. Remember always to put a backslash (\) before any semicolon (;) we use in the PL/SQL block.

18. Whenever a session runs, does the data get overwritten in a flat file target? Is it possible
to keep the existing data and add the new data to the target file?

Answer:
Normally the target file data is overwritten with every session run, unless we select the Append if Exists
option (8.x onwards) in the target session properties, which appends the new data to the existing data in the
flat file target.

19. Can we use the same session to load a target table in different databases having the same
target definition?

Answer:
Yes, we can use the same session to load the same target definition in different databases with the help of
parameterization, i.e. using different parameter files with different values for the parameterized target
connection object $DBConnection_TGT and for the owner/schema name (Table Name Prefix) with
$Param_Tgt_Tablename. To load two different database target
tables from a single workflow with this session, we can consider using concurrent workflow instances with
different parameter files.
We can even load two instances of the same target connected in the same pipeline. At the session level, use
a different relational connection object, created for each database, for each instance.

20. How do you remove the cache files after the transformation?

Answer:
After the session completes, the DTM releases the cache memory and deletes the cache files. If we use a
persistent cache or incremental aggregation, the cache files are saved.

21. Why doesn't a running session quit when Oracle or Sybase returns fatal errors?

Answer:
The session will only quit when its threshold, Stop on errors, is set to 1. Otherwise the session will
continue to run.

22. If we have written a source override query in the Source Qualifier at the mapping level but have
modified the query in the session-level SQL override, how does the Integration Service behave?

Answer:
The Integration Service treats the session-level query as final during the session run. If the two queries
differ, the Integration Service executes the session-level query and ignores the
mapping-level query.

20. Workflow

1. What is the difference between the STOP and ABORT options in a Workflow?

Answer:
When we issue the STOP command on an executing session task, the Integration Service stops
reading data from the source. It continues processing, writing and committing the data to the targets. If
the Integration Service cannot finish processing and committing data, we can issue the ABORT
command.
In contrast, the ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish
processing and committing data within the timeout period, it kills the DTM process and terminates the session.
We can stop or abort tasks and worklets within a workflow from the Workflow Monitor, from a Control
task in the workflow, or from a Command task by using the pmcmd stop or abort command. We can also
call the ABORT function at the mapping level.
When we stop or abort a task, the Integration Service stops processing the task and any other tasks in the
path of the stopped or aborted task. The Integration Service, however, continues processing concurrent tasks
in the workflow. If the Integration Service cannot stop the task, we can abort the task.
The Integration Service aborts any workflow if the Repository Service process shuts down.

2. How do we run a workflow continuously until a certain condition is met?

Answer:
We can schedule a workflow to run continuously. A continuous workflow starts as soon as the
Integration Service initializes. If we schedule a real-time session to run as a continuous workflow,
the Integration Service starts the next run of the workflow as soon as it finishes the first. When
the workflow stops, it restarts immediately.
Alternatively, for a normal batch scenario we can create a conditional continuous workflow as below.
Suppose wf_Bus contains the business session that we want to run continuously until a certain condition is
met, for example the presence of a file or a particular value of a workflow variable.
Modify the workflow so that the Start task is followed by a Decision task which evaluates a condition to TRUE or
FALSE. Based on this condition the workflow will run or stop.
Next, use a link to connect the business session for $Decision.Condition=TRUE.
For the other branch, use a Command task for $Decision.Condition=FALSE.
In the Command task, create a command to call a dummy workflow using pmcmd, e.g.
"C:\Informatica\PowerCenter8.6.0\server\bin\pmcmd.exe" startworkflow -sv IS_info_repo8x -d Domain_hp -u info_repo8x -p info_repo8x -f WorkFolder wf_dummy
Next, create the dummy workflow and name it wf_dummy. Place a Command task after the Start task.
Within the Command task, put the pmcmd command as
"C:\Informatica\PowerCenter8.6.0\server\bin\pmcmd.exe" startworkflow -sv IS_info_repo8x -d Domain_sauravhp -u info_repo8x -p info_repo8x -f WorkFolder wf_Bus
In this way we can run a workflow continuously. The basic concept is to use two workflows and
make them call each other.
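
When the two workflows call each other, the only parts of the pmcmd call that change are the workflow name and
the connection details, so it can help to build the command in one place. The Python sketch below is an
illustration under assumptions: it only assembles the argument list, using the startworkflow options that appear
in the commands above (-sv, -d, -u, -p, -f), and does not run anything. The function name
build_startworkflow_command is hypothetical.

```python
def build_startworkflow_command(pmcmd_path, service, domain, user, password, folder, workflow):
    """Assemble the pmcmd startworkflow argument list used in the examples above."""
    return [
        pmcmd_path, "startworkflow",
        "-sv", service,    # Integration Service name
        "-d", domain,      # domain name
        "-u", user,        # repository user
        "-p", password,    # repository password
        "-f", folder,      # folder that contains the workflow
        workflow,          # workflow to start, e.g. wf_dummy or wf_Bus
    ]
```

On a machine where pmcmd is installed, the returned list could be passed to subprocess.run. Keeping it as a list
avoids quoting problems with the spaces and backslashes in a Windows installation path.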

3. How do we send emails from Informatica after the successful completion of a session?
The email should contain the job name, session start time and session end time in the
message body.

Answer:
First, the "mail" utility must be configured on the Informatica server (UNIX/Windows).
After that, we use the Informatica Email task. We can create an Email task and call it at the session level in
On Success Email. Here we can use Informatica pre-built variables such as mapping name (%m), session start
time (%b) etc.

4. How do we pass a value calculated in a mapping variable to the email message? The email will be sent in HTML
format with a predefined message in which one value will be populated from a mapping variable.
Suppose the predefined message is:
<html> <body>
The last transaction service ID is: <informatica_variable>
</body> </html>
In place of <informatica_variable>, the value of the mapping variable at the end of the session should appear.
Answer:
We cannot use a mapping variable at the workflow or session level; it is local to a mapping. Instead, we have to
use a workflow variable for this purpose. However, we cannot pass the value of the mapping variable to the
workflow variable directly from the mapping. A workaround is:
1) Write the calculated value to a flat file from the mapping, say "value.txt".
2) Create a shell script, say "mail.sh", to send the mail. Read the value from "value.txt" into a variable
in "mail.sh" and use this variable in the body of the mail.
3) Create a Command task at the workflow level and call "mail.sh" in it.
4) Place this Command task downstream of the session and link it on the session's success.
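
Step 2 is described as a shell script. The same read-and-substitute logic is sketched here in Python, purely as
an illustration, and it stops at building the HTML body rather than sending mail. The file name value.txt and
the message text come from the steps above. The function name build_mail_body and the {value} placeholder are
hypothetical.

```python
from pathlib import Path

# Predefined message from the question, with a placeholder for the value.
TEMPLATE = (
    "<html> <body>\n"
    "The last transaction service ID is: {value}\n"
    "</body> </html>"
)


def build_mail_body(value_file):
    """Read the value written by the mapping and place it in the HTML message."""
    value = Path(value_file).read_text().strip()
    return TEMPLATE.format(value=value)
```

A script called from the Command task would pass the path of value.txt to build_mail_body and hand the returned
text to whichever mail utility is configured on the server.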

5. How can we send two separate emails after a successful session run?

Answer:
The problem is that we cannot call two Email tasks from one session, i.e. from the session-level On Success Email.
So, for the second email we can create another Email task following the session and connect the two with a
link whose execution condition is status=SUCCEEDED.

6. What is Cold Start in Informatica?

Answer:
In general terms, cold start means starting a program from the very beginning, without being able to continue
the processing that was occurring when the system was interrupted.

With respect to Informatica, we can resume a stopped or failed real-time session. To resume a session, we
must restart or recover the session. The Integration Service can recover a session automatically if we enabled
the session for automatic task recovery. When we restart a session, the Integration Service resumes
the session based on the real-time source. Depending on the real-time source, it restarts the session with or
without recovery.

We can restart a task or workflow in cold start mode. When we restart a stopped or failed task or workflow
that has recovery enabled in cold start mode, the Integration Service discards the recovery information and
restarts the task or workflow.

For example, suppose a workflow failed midway and we do not want to recover data because we manually cleaned
up the data in the impacted target tables. If workflow recovery is enabled, we can opt for a cold start,
which skips the recovery task. A cold start removes any recovery data stored when the session failed.
The Cold Start Task, Cold Start Workflow and Cold Start Workflow from Task commands can be executed
from the Workflow Manager, the Workflow Monitor, or the pmcmd command line program.
If we restart a session in cold start mode, targets may receive duplicate rows,
so avoid cold start and restart the session with recovery to prevent data duplication.
If recovery is not enabled in a session, there is no difference between cold start and restart.

7. We have a list of 10 people who are emailed after a session failure. Can we edit the email list dynamically,
i.e. can we add or delete email IDs without touching the mapping?
Answer:
We can parameterize the email user list and modify the values in the parameter file. Use $PMSuccessEmailUser and
$PMFailureEmailUser. We can also use the pmrep command to update the email addresses:
updateemailaddr -d <folder_name> -s <session_name> -u <success_email_address> -f <failure_email_address>
Alternatively, we can create a distribution list (DL) and use that DL in the session failure email. Every address
listed in the DL will receive the mail, and we can later add or remove addresses in the DL as required.

8. We know there are 3 options for the session recovery strategy: Restart task, Fail task and
continue running the workflow, and Resume from last checkpoint.
How do we restart a workflow automatically, without any manual intervention, in the
event of session failure?

Answer:
Select the Automatically recover terminated tasks option in the workflow properties. We can also specify the
maximum number of automatic attempts in the workflow property Maximum automatic recovery attempts.

9. What is the difference between real-time and continuous workflows?
Answer:
A real-time workflow is triggered by a source XML message, whereas a continuous workflow is one that runs
continuously, for example by using two workflows that call each other from the command line.

Suppose we have two workflows in the same folder: workflow 1 (wf1) with two sessions (s1, s2) and workflow 2
(wf2) with three sessions (s3, s4, s5), like below
wf1: s1, s2
wf2: s3, s4, s5

How can we run s1 first, then s3, then s2, and then s4 and s5, without using the pmcmd command or a UNIX script?
Answer:
Use a Command task or post-session command to create a touch file, and use an Event Wait task to wait for the
file (Filewatch Name).

A combination of Command tasks and Event Wait tasks solves the problem.

WF1 ---> S1 ---> CMD1 ---> EW2 ---> S2 ---> CMD3
WF2 ---> EW1 ---> S3 ---> CMD2 ---> EW3 ---> S4 ---> S5
Start both workflows. Session s1 starts and, after successful execution, calls Command task cmd1, which
creates a touch file, say s3.txt.
Control in wf1 then passes to Event Wait ew2. Meanwhile, in wf2, Event Wait ew1 detects s3.txt and starts
session s3. After s3 succeeds, control passes to Command
task cmd2, which creates a touch file, say s2.txt, and then passes control to Event Wait ew3.
At the same time, Event Wait ew2 in wf1 detects s2.txt and
passes control to session s2. After s2 completes, it triggers Command task cmd3, which
creates the file s4.txt, and workflow wf1 ends. Event Wait ew3 then detects
s4.txt and starts session s4, which on success triggers the last session s5,
and workflow wf2 completes.
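
The handshake between the two workflows can be simulated to confirm that it produces the required order. The
Python sketch below is only an analogy, not Informatica code. Each workflow is a thread, each touch file is a
threading.Event, a Command task is a call to set(), and an Event Wait task is a call to wait(). All names are
hypothetical.

```python
import threading


def run_workflows():
    """Simulate wf1 and wf2 and return the order in which the sessions ran."""
    order = []
    s3_file = threading.Event()  # stands for touch file s3.txt
    s2_file = threading.Event()  # stands for touch file s2.txt
    s4_file = threading.Event()  # stands for touch file s4.txt

    def wf1():
        order.append("s1")       # session s1
        s3_file.set()            # cmd1 creates s3.txt
        s2_file.wait()           # ew2 waits for s2.txt
        order.append("s2")       # session s2
        s4_file.set()            # cmd3 creates s4.txt

    def wf2():
        s3_file.wait()           # ew1 waits for s3.txt
        order.append("s3")       # session s3
        s2_file.set()            # cmd2 creates s2.txt
        s4_file.wait()           # ew3 waits for s4.txt
        order.append("s4")       # session s4
        order.append("s5")       # session s5

    # Start wf2 first on purpose: the order must not depend on start order.
    threads = [threading.Thread(target=wf2), threading.Thread(target=wf1)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return order
```

Because every session is gated by the event set in the previous step, the result is s1, s3, s2, s4, s5 regardless
of which workflow is started first, which is the point of pairing each Command task with an Event Wait.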

12. How do we send a session failure mail with the workflow or session log as an attachment?

Answer:
Design an Informatica Email task to send an email in the event of session failure and use the email
variable %g to attach the corresponding session log.
Email variables:
(%g) - Attaches the session log.
(%a<>) - Attaches any file; the absolute path must be given inside <>.

13. Explain deadlock in Informatica. How do we resolve it?

Answer:
At the database level, a deadlock normally occurs when two concurrent user sessions try to
apply a DML command to the same row in a table. For example, say the query below is executed by
user1 in session1:

update emp set deptno=20 where deptno=10;

Before user1 commits the transaction, if user2 in session2 executes a similar query on the same rows, as below,
it causes a deadlock error.

update emp set deptno=30 where deptno=10;
In Informatica, a deadlock normally occurs when two sessions update or delete records in a table in
parallel (parallel insert is not a problem). One option to avoid deadlocks is to identify those sessions and
make them sequential. Another option is to make use of session-level properties such as the deadlock retry
limit and the deadlock recovery option.

14. Busy Percentage is given by (run time - idle time) * 100 / run time.
If a thread has 0 idle time, its Busy Percentage is at its highest. So do we need to tune that thread
component?
Why is it like that? Does it mean we need to tune the thread whose busy percentage (BP) is higher, or the
one with more idle time?
Answer:
Three people are each asked to run 1 mile, and each is allotted 20 minutes. The first person completes
the mile in 5 minutes and stands idle for the remaining 15 minutes of the allotted time. The second person
completes it in 10 minutes and sits idle for the remaining 10 minutes. The last one takes all 20 minutes and is
idle for 0 minutes. Who is the worst performer?

Isn't it the last person, who had no idle time? It's the same for a thread with 0 idle time: the thread with
the highest busy percentage is the bottleneck and the one to tune.

15. How can we pass a value from one workflow to another?

Answer:
Pass the workflow variable value to a session variable in the pre-session variable assignment and then on to a
mapping parameter.
Next, develop a mapping that generates a parameter file containing the desired value as a workflow variable, so
that it can be passed to the next workflow through this parameter file.

Alternatively, develop the mapping to store the value in a flat file or database table. Then create another
mapping in the next workflow that reads it, and pass the value to the session in the post-session variable
assignment and then to the workflow level if required.

21. Administration

1. What is the Load Manager?

Answer:
The Load Manager performs the following tasks:
Manages session and batch scheduling.
Locks the session and reads the session properties.
Reads the parameter file.
Expands the server and session variables and parameters.
Verifies permissions and privileges.
Validates source and target code pages.
Creates the session log file.
Creates the Data Transformation Manager (DTM), which executes the session.

2. What is the DTM process? How many threads does it create to process data? Explain each thread
in brief.

Answer:
After the Load Manager performs validations for the session, it creates the DTM process. The DTM process is
the second process associated with the session run. Its primary purpose is to create and manage the threads
that carry out the session tasks. The DTM allocates process memory for the session and divides it into
buffers; this is also known as buffer memory. It creates the main thread, which is called the master thread.
The master thread creates and manages all other threads. If we partition a session, the DTM creates a set of
threads for each partition to allow concurrent processing. When the Informatica Server writes messages to the
session log, it includes the thread type and thread ID. The DTM creates the following types of threads:
MASTER THREAD - Main thread of the DTM process. Creates and manages all other threads.
MAPPING THREAD - One thread for each session. Fetches session and mapping information.
PRE- AND POST-SESSION THREAD - One thread each to perform pre-session and post-session operations.
READER THREAD - One thread for each partition for each source pipeline.
WRITER THREAD - One thread for each partition, if a target exists in the source pipeline, to write to the
target.
TRANSFORMATION THREAD - One or more transformation threads for each partition.

3. Can you create a folder within the Designer?

Answer:
Not possible. Folders are created in the Repository Manager.

4. How do you take care of security using a repository manager?

Answer:
Using repository privileges, folder permissions and locking.
Repository privileges (Session operator, Use designer, Browse repository, Create session and batches,
Administer repository, Administer server, Super user)
Folder permissions (owner, groups, users)
Locking (Read, Write, Execute, Fetch, Save)

5. What are the different uses of a repository manager?

Answer:
The Repository Manager is used to create the repository, which contains the metadata that Informatica uses to
transform data from source to target. It is also used to create Informatica users and folders, and to copy,
back up and restore the repository.

6. What are the 2 modes of data movement in the Informatica Server?

Answer:
The data movement mode depends on whether the Informatica Server (IS) should process single-byte or
multi-byte character data. The mode selection can affect the enforcement of code page relationships and code
page validation in the Informatica Client and Server.
Unicode - The IS allows 2 bytes for each character and uses an additional byte for each non-ASCII character
(such as Japanese characters).
ASCII - The IS holds all data in a single byte.
The IS data movement mode can be changed in the Informatica Server configuration parameters. The change takes
effect once you restart the Informatica Server.

7. What is Code Page used for?
Answer:
A code page contains the encoding to specify characters in a set of one or more languages. An encoding is
the assignment of a number to a character in the character set. A code page is used to identify characters
that might be in different languages. If you are importing Japanese data into a mapping, then you must select
the Japanese code page for the source data.

8. What is Code Page Compatibility?
Answer:
Compatibility between code pages is used for accurate data movement when the Informatica Server runs in
the Unicode data movement mode. If the code pages are identical, then there will not be any data loss. One
code page can be a subset or superset of another. For accurate data movement, the target code page must
be a superset of the source code page.
Superset - A code page is a superset of another code page when it contains all the characters encoded in the
other code page. It also contains additional characters not contained in the other code page.
Subset - A code page is a subset of another code page when all characters in the code page are encoded in
the other code page.

9. What is the default buffer block size?

Answer: 64K (64,000 bytes)

10. What is the default LM shared memory size?

Answer: 2MB (2,000,000 bytes)

11. Define Server Concepts with respect to memory buffers

Answer:
The Informatica Server uses three system resources: CPU, shared memory and buffer memory. It uses shared
memory, buffer memory and cache memory to hold session information and to move data between session threads.
LM Shared Memory - The Load Manager uses both process memory and shared memory. The LM keeps the Informatica
Server list of sessions and batches, and the schedule queue, in process memory. Once a session starts, the LM
uses shared memory to store session details for the duration of the session run or session schedule. This
shared memory appears as the configurable parameter LMSharedMemory, and the server allots 2,000,000
bytes by default. This allows you to schedule or run approximately 10 sessions at one time.
DTM Buffer Memory - The DTM process allocates buffer memory to the session based on the DTM buffer
pool size setting in the session properties. By default, it allocates 12,000,000 bytes of memory to the
session. The DTM divides this memory into buffer blocks as configured in the buffer block size setting
(default: 64,000 bytes per block).

12. What are the two programs that communicate with the Informatica Server?

Answer:
Informatica provides Server Manager and pmcmd programs to communicate with the Informatica Server:
Server Manager - A client application used to create and manage sessions and batches, and to monitor and
stop the Informatica Server. You can use information provided through the Server Manager to troubleshoot
sessions and improve session performance.
pmcmd - A command-line program that allows you to start and stop sessions and batches, stop the
Informatica Server, and verify if the Informatica Server is running.

22. Command Line Arguments

1. What is the pmcmd command?

Answer:
pmcmd is a command-line program used to communicate with the Informatica Server. It does not replace the
Server Manager, since there are many tasks that you can perform only with the Server Manager.
Some of the operations that you can perform using pmcmd are starting, stopping and aborting a session.

2. What is the pmrep command?

Answer:
You can use pmrep to create or delete repository users and groups. You can also use pmrep to modify the
repository privileges assigned to users and groups.

3. How do we start & stop session from pmcmd command line?

Answer:

Use the following syntax to ping the Informatica Server on a UNIX system:
pmcmd ping [{user_name | %user_env_var} {password | %password_env_var}]
    [hostname:]portno
Use the following syntax to start a session or batch on a UNIX system:
pmcmd start {user_name | %user_env_var} {password | %password_env_var}
    [hostname:]portno [folder_name:]{session_name | batch_name}
    [:pf=param_file] session_flag wait_flag
Use the following syntax to stop a session or batch on a UNIX system:
pmcmd stop {user_name | %user_env_var} {password | %password_env_var}
    [hostname:]portno [folder_name:]{session_name | batch_name} session_flag
Use the following syntax to stop the Informatica Server on a UNIX system:
pmcmd stopserver {user_name | %user_env_var} {password | %password_env_var}
    [hostname:]portno

23. Metadata Repository

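1. Is there any metadata query to find the list of Informatica folder names and workflow names
which were migrated in a particular quarter?

Answer:
The SQL below gives the list of folders, workflows and their last saved dates. The WHERE clause restricts the
list to one quarter; the dates shown (January to March 2013) are only an example.

SELECT W.SUBJECT_AREA FOLDER_NAME, W.WORKFLOW_NAME, W.WORKFLOW_LAST_SAVED
FROM REP_WORKFLOWS W
WHERE TO_DATE (W.WORKFLOW_LAST_SAVED, 'MM/DD/YYYY HH24:MI:SS') >= TO_DATE ('01/01/2013', 'MM/DD/YYYY')
AND TO_DATE (W.WORKFLOW_LAST_SAVED, 'MM/DD/YYYY HH24:MI:SS') < TO_DATE ('04/01/2013', 'MM/DD/YYYY')
ORDER BY TO_DATE (W.WORKFLOW_LAST_SAVED, 'MM/DD/YYYY HH24:MI:SS') DESC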

2. How can I run Metadata Queries in Informatica PowerCenter?

Answer:

Informatica metadata is stored in a repository database. This can be the same database where we have
our source / staging / target tables, or it may be a completely different database (which is the usual case).
We can execute user-defined metadata queries only on this database.
We may need to ask the Informatica administrator for the database login credentials. We need a username and
password with read access to the database. After that we can connect to the database and run the
metadata queries.

3. Write a metadata query to identify the sessions having truncate option enabled

Answer:

select
task_name,
'Truncate Target Table' ATTR,
decode(attr_value,1,'Yes','No') Value
from OPB_EXTN_ATTR OEA,
REP_ALL_TASKS RAT
where
OEA.SESSION_ID=rat.TASK_ID
and attr_id=9

4. Where can I find a history / metrics of the load sessions that have occurred in
Informatica?

Answer:
The tables which house this information are OPB_LOAD_SESSION, OPB_SESSION_LOG and
OPB_SESS_TARG_LOG. OPB_LOAD_SESSION contains the single session entries; OPB_SESSION_LOG contains
a historical log of all session runs that have taken place; OPB_SESS_TARG_LOG keeps track of the errors and
of the target tables which have been loaded. Keep in mind that these tables are tied together by Session_ID.
If a session is deleted from OPB_LOAD_SESSION, its history is not necessarily deleted from OPB_SESSION_LOG
or from OPB_SESS_TARG_LOG. Unfortunately, this leaves unidentified session IDs in these tables. However,
when you join the tables together, you can get the start and completion times of each session.

5. How to extract the workflow monitor record information from the Informatica metadata
repository?

Answer:

SELECT DISTINCT
    FOLDER_NAME, WORKFLOW_NAME, SESSION_NAME,
    START_DATE, START_TIME, END_DATE, END_TIME, DURATION "DURATION IN DD:HH:MI:SS",
    SOURCE_ROWS, TARGET_ROWS, REJECTED_ROWS, REJECTED_STATUS, STATUS, FAILED_REASON
FROM
( SELECT
    t.SUBJECT_AREA FOLDER_NAME, t.WORKFLOW_NAME, t.SESSION_NAME,
    DECODE(t.RUN_STATUS_CODE, 2,NULL, TO_CHAR(t.ACTUAL_START,'DD-MON-YYYY')) START_DATE,
    DECODE(t.RUN_STATUS_CODE, 2,NULL, TO_CHAR(t.ACTUAL_START,'HH24:MI:SS AM')) START_TIME,
    DECODE(t.RUN_STATUS_CODE, 2,NULL, TO_CHAR(t.SESSION_TIMESTAMP,'DD-MON-YYYY')) END_DATE,
    DECODE(t.RUN_STATUS_CODE, 2,NULL, TO_CHAR(t.SESSION_TIMESTAMP,'HH24:MI:SS PM')) END_TIME,
    DECODE(t.RUN_STATUS_CODE, 2,NULL,
        TRUNC((((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)/60)/24)||':'
        || (TRUNC(((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)/60) -
            24*(TRUNC((((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)/60)/24)))||':'
        || (TRUNC((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60) -
            60*(TRUNC(((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)/60))) ||':'
        || (TRUNC(86400*(SESSION_TIMESTAMP-ACTUAL_START)) -
            60*(TRUNC((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)))) DURATION ,
    DECODE(t.RUN_STATUS_CODE, 2,NULL, t.SUCCESSFUL_SOURCE_ROWS) SOURCE_ROWS ,
    DECODE(t.RUN_STATUS_CODE, 2,NULL, t.SUCCESSFUL_ROWS) TARGET_ROWS,
    DECODE(t.RUN_STATUS_CODE, 2,NULL, t.FAILED_ROWS) REJECTED_ROWS,
    DECODE(t.RUN_STATUS_CODE, 2,NULL,
        CASE WHEN t.SUCCESSFUL_SOURCE_ROWS <> t.SUCCESSFUL_ROWS THEN 'VALIDATE THE MISMATCH' END) REJECTED_STATUS,
    DECODE(t.RUN_STATUS_CODE, 1,'Succeeded', 2,'Disabled', 3,'Failed',
        4,'Stopped', 5,'Aborted', 6,'Running', 7,'Suspending', 8,'Suspended',
        9,'Stopping', 10,'Aborting', 11,'Waiting', 15,'Terminated') AS STATUS,
    REPLACE(REPLACE(t.FIRST_ERROR_MSG,CHR(10),' '),'No errors encountered.','') AS FAILED_REASON,
    RANK() OVER (PARTITION BY session_name ORDER BY t.SESSION_TIMESTAMP DESC) rnk
  FROM REP_SESS_LOG t WHERE t.SUBJECT_AREA='<<informatica_folder_name>>'
) sess_run
WHERE sess_run.rnk = 1
ORDER BY START_DATE, START_TIME

Don't forget to put the Informatica folder name in the SUBJECT_AREA filter above. You might also need to
make some other small adjustments to better suit your purpose or Informatica version.
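
The DURATION expression in the query is easier to follow with plain numbers. The sketch below repeats the
same arithmetic on an elapsed time given in seconds; the function name fmt_duration is hypothetical and the
input values are made up for illustration.

```shell
# Hypothetical helper mirroring the DURATION expression: split elapsed seconds into DD:HH:MI:SS.
fmt_duration() {
    total=$1
    days=$(( total / 86400 ))                       # whole days
    hours=$(( total / 3600 - 24 * days ))           # whole hours left after removing the days
    mins=$(( total / 60 - 60 * (total / 3600) ))    # whole minutes left after removing the hours
    secs=$(( total - 60 * (total / 60) ))           # seconds left after removing the minutes
    echo "$days:$hours:$mins:$secs"
}

fmt_duration 93784   # prints 1:2:3:4 (1 day, 2 hours, 3 minutes, 4 seconds)
fmt_duration 59      # prints 0:0:0:59
```

As in the SQL, the parts are not zero-padded, so a two-hour run appears as 0:2:0:0.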

24. Repository Manager

1. Describe the steps for export and import.

Answer:
Open the folder which contains the mapping.
Check out the mapping to be exported.
Click Repository --> Export Objects and save the file to your local drive.
Open the folder into which you want to import the mapping.
Click Repository --> Import Objects, select the mapping XML file and click Import.
Once the mapping is imported into the new folder, save it and check it in.

2. What are the various methods of code migration or which is the best way of deployment?

Answer:
The best way is, arguably, XML export and import, as it is very easy.
But it all depends on the requirement: if we want to migrate some workflows with their dependent objects
in one shot, then the suggested way is XML export and import.

If we need to migrate only a few small objects (say some Designer or Workflow Manager objects), then we
can copy them through the Repository Manager, through the Designer (for Designer objects) or through the
Workflow Manager (for Workflow Manager objects). For this we have to be connected to both repositories
while copying.

Sometimes we may need to migrate an entire project and want a complete log of the deployment. In that case we
can create a Deployment Group using the Deployment Wizard.
We might use pmrep to automate exporting objects on a daily or weekly basis. To deploy with pmrep, we
must create a control file with all the specifications that the Copy Wizard requires. A deployment control
file is an XML file, defined by the depcntl.dtd file, that we use with the DeployFolder and
DeployDeploymentGroup pmrep commands to deploy a folder or deployment group.
We can create a deployment control file manually to provide parameters for deployment, or we can create
it with the Copy Wizard. A manually created deployment control file must conform to the depcntl.dtd file
that is installed with the PowerCenter Client, and it must include the location of the depcntl.dtd file.

One good thing is that we can roll back a deployment to purge the deployed versions from the target
repository or folder. When we roll back a deployment, we roll back all the objects in a deployment group that
we deployed at a specific date and time. We cannot roll back part of a deployment.

In the PowerCenter Client, we can export repository objects to an XML file and then import repository
objects from the XML file. Use the following client applications to export and import repository objects:
Repository Manager: You can export and import both Designer and Workflow Manager objects.
Designer: You can export and import Designer objects.
Workflow Manager: You can export and import Workflow Manager objects.
pmrep: You can export and import both Designer and Workflow Manager objects. You might use
pmrep to automate exporting objects on a daily or weekly basis.

3. What are the various options for ETL code migration?

Answer:
There are a couple of options available for code migration.

If you have a versioned repository, as the first step check in all the workflows and dependent objects. Then
there are several ways to achieve the migration.

Option 1. Export the workflow from the Repository Manager as XML using the Export Objects option, and then
import it into QA using the Repository Manager Import Objects option.

Option 2. If your Dev and QA folders are in the same repository, you can simply drag and drop. Open both the
Dev and QA folders in the Repository Manager and drag the objects from Dev to QA.

Option 3. Create a Deployment Group using the Repository Manager, add all the workflows you need to migrate
to the Deployment Group, and migrate the Deployment Group.

Option 4. You also have the option to migrate the entire folder.

When to use these options:

Option 1. Use this option when only a few workflows need to be migrated. If you do not have an
Informatica versioned repository, the exported XML files can be used to keep your versions.

Option 2. Use this option when you have a small number of workflows to migrate.

Option 3. Use this option when a large number of objects are migrated together. It keeps the list of migrated
objects as a group, and a rollback, if required, is easy with this approach.

Option 4. Mostly used when you migrate a project to QA for the first time with a large number of workflows.

4. What is labeling in Informatica?

Answer:
We see the label concept in many places, for example in a mailbox, where we sometimes group mails under
different labels, such as marking some mails as personal.

In Informatica, a label is a global object that you can associate with any versioned object or group of
versioned objects in a repository. You may want to apply labels to versioned objects to achieve the following
results:

- Track versioned objects during development.
- Improve query results.
- Associate groups of objects for deployment.
- Associate groups of objects for import and export.

For example, you might apply a label to sources, targets, mappings, and sessions associated with a workflow
so that you can deploy the workflow to another repository without breaking any dependency.

You can apply the label to multiple versions of an object, or you can specify that the label applies to
only one version of the object.

You can create and modify labels in the Label Browser. From the Repository Manager, click Versioning > Labels
to browse for a label.

Informatica version control is a team-based development methodology in which we create copies
of the actual objects to track modifications using the check-in and check-out options.

5. Suppose Informatica Version Control is in place. Can we revert an object to its state
two versions back?

Answer:
From the Version History of the Object, open the required version of the Object in Workspace.
Next export the xml metadata of the Object.
Next Check out the Object.
Then import the metadata exported earlier.
Save and Check In the Object.

6. What do we mean by Team based development in Informatica?

Answer:
Team based development is nothing but version control for the metadata objects.

If we have the team-based development option, we can enable version control for the repository. A
versioned repository stores multiple versions of an object. Each version is a separate object with unique
properties. The PowerCenter version control feature allows us to efficiently develop, test, and deploy
metadata into production.

During development, we can perform the following change management tasks to create and manage
multiple versions of objects in the repository:
Check out and check in versioned objects.
Compare objects.
Track changes to an object.
Delete or purge a version.
Use global objects such as queries, deployment groups, and labels to group versioned objects.

25. Scenario Questions

1. Suppose we have ten source flat files of same structure. How can we load all the files in
target database in a single batch run using a single mapping?

Answer:
After we create a mapping to load data into the target database from the source flat file definition, we move
on to the session properties of the source.
To load a set of source files, we need to create a file, say final.txt, containing the names of the source
flat files (ten files in our case) and set the Source filetype option to Indirect. Next we point the session
to this file list final.txt through the Source file directory and Source filename properties.
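
The file list itself can be produced by a small script. The following is a minimal sketch, assuming the
source files sit in one directory and share a name pattern; the function name make_file_list, the demo
directory and the emp_*.dat pattern are all hypothetical.

```shell
# Hypothetical helper: write one source file name per line into a file list.
make_file_list() {
    src_dir=$1      # directory holding the source flat files
    pattern=$2      # shell pattern matching them, e.g. 'emp_*.dat' (assumed naming)
    list_file=$3    # the file list (final.txt in the text above)
    for f in "$src_dir"/$pattern; do
        # Skip the unexpanded pattern when nothing matches.
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done > "$list_file"
}

# Demo with three throw-away files instead of ten.
demo_dir=$(mktemp -d)
for i in 01 02 03; do
    echo "row $i" > "$demo_dir/emp_$i.dat"
done
make_file_list "$demo_dir" 'emp_*.dat' "$demo_dir/final.txt"
cat "$demo_dir/final.txt"
```

The sketch writes full paths so that the entries do not depend on the directory configured in the session.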

2. Suppose we have two Source Qualifier transformations SQ1 and SQ2 connected to Target
tables TGT1 and TGT2 respectively. How do you ensure TGT2 is loaded after TGT1?

Answer:
If we have multiple Source Qualifier transformations connected to multiple targets, we can designate the
order in which the Integration Service loads data into the targets.
In the Mapping Designer, we need to configure the Target Load Plan based on the Source Qualifier
transformations in the mapping to specify the required loading order.

The Target Load Plan defines the order in which the Informatica Server loads the data into the targets. This
avoids integrity constraint violations.

3. Suppose we have a Source Qualifier transformation that populates two target tables. How
do you ensure TGT2 is loaded after TGT1?

Answer:

In the Workflow Manager, we can configure constraint-based load ordering for a session. The Integration
Service orders the target load on a row-by-row basis. For every row generated by an active source, the
Integration Service loads the corresponding transformed row first to the primary key table, then to the
foreign key table.

Hence if we have one Source Qualifier transformation that provides data for multiple target tables having
primary and foreign key relationships, we will go for Constraint based load ordering.

4. Suppose we have the EMP table as our source. In the target we want to view those employees
whose salaries are greater than or equal to the average salary for their departments.
Describe your mapping approach.

Answer:
Our mapping is built in the following steps.

To start with the mapping we need the following transformations:
After the Source qualifier of the EMP table place a Sorter transformation. Sort based on DEPTNO port.

Next we place a Sorted Aggregator Transformation. Here we will find out the AVERAGE SALARY for each
(GROUP BY) DEPTNO.
When we perform this aggregation, we lose the data for individual employees.
To maintain employee data, we must pass a branch of the pipeline to the Aggregator Transformation and
pass a branch with the same sorted source data to the Joiner transformation to maintain the original data.
When we join both branches of the pipeline, we join the aggregated data with the original data.

So next we need a Sorted Joiner Transformation to join the sorted aggregated data with the original data,
based on DEPTNO. Here we take the aggregated pipeline as the Master and the original data flow as the
Detail pipeline.

After that we need a Filter Transformation to filter out the employees having salary less than average salary
for their department.
Filter Condition: SAL >= AVG_SAL

Finally we place the Target table instance.
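
The data flow of this mapping (aggregate per department, join back to the detail rows, then filter) can be
tried out on a flat file. This is only a sketch outside Informatica; the function name above_dept_avg, the
EMPNO,DEPTNO,SAL file layout and the sample rows are assumptions.

```shell
# Hypothetical input layout: EMPNO,DEPTNO,SAL (comma separated, no header row).
above_dept_avg() {
    # The file is read twice. Pass 1 plays the Aggregator branch (sum and count per DEPTNO).
    # Pass 2 plays the Joiner plus Filter: keep rows where SAL >= AVG_SAL of the department.
    awk -F, '
        NR == FNR { sum[$2] += $3; cnt[$2]++; next }
        $3 >= sum[$2] / cnt[$2]
    ' "$1" "$1"
}

emp_file=$(mktemp)
cat > "$emp_file" <<'EOF'
7369,20,800
7566,20,2975
7788,20,3000
7499,30,1600
7521,30,1250
EOF
above_dept_avg "$emp_file"
```

For the sample rows, department 20 averages about 2258 and department 30 averages 1425, so employees 7566,
7788 and 7499 pass the filter.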

5. How can we perform changed data capture based on a load sequence number (integer) column
present in the source table?

Answer:
Create a mapping variable with the integer data type and aggregation type MAX. Set the value of this
mapping variable in any of these transformations: Expression, Filter, Router or Update Strategy.
Use the SETMAXVARIABLE( $$Variable, load_seq_column ) function. This function assigns the maximum
sequence number of that particular load to the variable $$Variable.

This function executes only if a row is marked as insert. SETMAXVARIABLE ignores all other row types, and
the current value remains unchanged. For each record, the function sets the current value of the mapping
variable to the higher of two values: the current value of the variable or the value from the source column.
At the end of a successful session, the Integration Service saves the final current value to the repository.
When used with a session that contains multiple partitions, the Integration Service generates different
current values for each partition. At the end of the session, it saves the highest current value across all
partitions to the repository. Unless overridden, it uses the saved value as the initial value of the variable
for the next session run.

Since the maximum sequence number of the previous load is captured in this mapping variable and saved in
the repository, we can use the variable as a filter in the Source Qualifier query. The next time we run the
workflow, it extracts only those records having a load sequence number greater than the saved value.
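
The behaviour across two runs can be imitated with a state file standing in for the value saved in the
repository. This is a rough sketch, not Informatica code: the function name incremental_extract, the
LOAD_SEQ,PAYLOAD file layout and the state file are assumptions.

```shell
# Hypothetical layout: LOAD_SEQ,PAYLOAD per line. The state file plays the role of the
# persisted mapping variable $$Variable.
incremental_extract() {
    src=$1
    state_file=$2
    last=0
    if [ -f "$state_file" ]; then
        last=$(cat "$state_file")
    fi
    # Source Qualifier filter: only rows with a load sequence above the saved value.
    awk -F, -v last="$last" '($1 + 0) > (last + 0)' "$src"
    # SETMAXVARIABLE equivalent: keep the higher of the saved value and the values just read.
    awk -F, -v max="$last" '($1 + 0) > (max + 0) { max = $1 } END { print max }' "$src" > "$state_file"
}

work=$(mktemp -d)
printf '1,a\n2,b\n3,c\n' > "$work/src.csv"
incremental_extract "$work/src.csv" "$work/last_seq.txt"   # first run: all three rows
printf '4,d\n' >> "$work/src.csv"
incremental_extract "$work/src.csv" "$work/last_seq.txt"   # second run: only 4,d
```

Unlike the Integration Service, the sketch updates the saved value unconditionally; a real session saves it
only after a successful run.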

6. In my mapping I have 3 tables that we are joining.
In the source query we want to filter the data based on a value that is stored in one of our target tables. Is
there a way of pulling that one particular value from the target table and using it in the filter in the
Source Qualifier? Basically the value is a load sequence number that gets incremented with each session run,
so when the session runs again we only pull records that are greater than that load sequence number.
Answer:
There are different options to solve the problem.

Option 1: Assumption - Source and target tables cannot be accessed using a single DB connection, and the
"load sequence number" is modified by the current process.

In this case you can use a mapping variable in the mapping and set its value to the highest/current value
using the SETMAXVARIABLE function. This value is stored in the Informatica repository, and the same value
can be used in the Source Qualifier filter for the next session run. If the workflow fails, the value of the
mapping variable is not incremented.

Steps
Define a mapping variable with aggregation type MAX.
Use the SETMAXVARIABLE($$Variable, load_seq_column) function, where load_seq_column holds the current load
sequence number, to store the value in the repository.
Use the variable $$Variable in the Source Qualifier filter.
We can provide a default value for the variable and change it during code migration to set the
starting value.

Option 2: Assumption - Source and target tables cannot be accessed using a single DB connection, and the
"load sequence number" is modified by a different process.

In this case you can create a mapping parameter and pass the value to it through a parameter file.

Steps

Create a workflow to get the latest "load sequence number" and create a parameter file.
This workflow writes a flat file which contains the parameter value. E.g.
[wf_DAILY_INCR_LOAD]
$$Variable=100

In the actual mapping
Define a mapping parameter $$Variable and use $$Variable in the Source Qualifier.

Each time, run the workflow which creates the parameter file before the actual workflow is run.
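
For illustration, a file in the layout shown above could also be produced by a small script. This is a
sketch only: the function name write_param_file is hypothetical, and the value is passed in as an argument
here, whereas in practice it would come from the target table.

```shell
# Hypothetical helper: write a parameter file with a section header and one parameter line.
write_param_file() {
    workflow=$1    # section name, e.g. wf_DAILY_INCR_LOAD
    value=$2       # latest load sequence number (supplied by the caller in this sketch)
    out=$3         # parameter file to create
    # Single quotes keep the "$$" of the parameter name literal.
    printf '[%s]\n$$Variable=%s\n' "$workflow" "$value" > "$out"
}

param_file=$(mktemp)
write_param_file wf_DAILY_INCR_LOAD 100 "$param_file"
cat "$param_file"
```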

Option 3: Assumption- Source and Target table can be accessed using a single DB connection.

If both your source and target tables are connected using a single DB Connection, we can write the filter to
get the latest data in the Source Qualifier itself joining all the tables.

7. How can we load x records (user defined record numbers) out of N records from source
dynamically, without using filter and sequence generator transformation?

Answer:

Take a mapping parameter say $$CNT to pass the number of records we want to load dynamically by
changing in the parameter file each time before session run.
Next after the Source Qualifier use an Expression transformation and create one output port say
CNTR with value CUME (1).
Next use an Update Strategy with condition IIF ($$CNT >= CNTR, DD_INSERT, DD_REJECT).
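
Outside Informatica, the same logic is a running counter compared with a limit. The sketch below is only an
analogy: the function name load_first_n is hypothetical, and the awk record number NR stands in for the
CNTR port.

```shell
# Hypothetical helper: pass through only the first $1 lines of file $2.
load_first_n() {
    # NR plays the role of the CUME(1) running counter CNTR;
    # rows beyond the limit are dropped, like DD_REJECT.
    awk -v cnt="$1" 'NR <= cnt + 0' "$2"
}

rows=$(mktemp)
printf 'r1\nr2\nr3\nr4\nr5\n' > "$rows"
load_first_n 2 "$rows"   # prints r1 and r2
```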

8. Suppose we have n number of rows in the Source and we have two target tables. How
can we load n/2 i.e. first half the source data into one target and the remaining half into
the next target?

Answer:
Use an Expression transformation with an output port ROWNUM with the expression CUME(1).
Next use a Router with 2 groups having the below conditions:
MOD( ROWNUM, 2 ) = 0
MOD( ROWNUM, 2 ) = 1
Connect the groups to the corresponding target instances. Note that this sends alternate rows to each target,
so each target receives half of the rows rather than the first and second halves in source order.
Alternatively, to send the first half of the rows to one target and the remaining rows to the other, follow
the implementation steps below.
First place the Source table and its corresponding Source Qualifier in the mapping.
Next split the data into two flows: one going to an Expression Transformation with all the ports, and
the other flow with any one column going to an Aggregator Transformation.
In the Aggregator, add a numeric output port, say CNT, with the expression COUNT (1), and do not
group by any input port.
Propagate this output column CNT to an Expression Transformation. In this Expression Transformation,
create another numeric output port JN with expression value 1.
Now let's go back to the first Expression Transformation having all the source columns. Introduce a
Sequence Generator transformation with the RESET property enabled and propagate
the NEXTVAL port to the Expression Transformation. Also add one more numeric output
port JN with expression value 1.
Now take a Joiner Transformation and check the property Sorted Input.
Bring in all the columns from the Expression Transformation next to the Source Qualifier. The other
flow to the Joiner is from the Expression Transformation with the two columns CNT and JN. The join condition
is based on the JN ports.
Next, after the Joiner, place a Router Transformation. Create one group, say FST, with the condition
NEXTVAL <= (CNT/2).
Next introduce two target tables, first and second. Propagate the columns of the FST group of the
Router to the first target. Next propagate the columns of the Default group of the Router
to the second target.
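
The second approach (count the rows, number them, and compare each row number with half the count) can be
tried on a flat file. A minimal sketch, with a hypothetical function name split_halves and made-up file
names:

```shell
# Hypothetical helper: rows numbered up to CNT/2 go to the first file, the rest to the second.
split_halves() {
    src=$1
    first=$2
    second=$3
    # Make sure both outputs exist even if one of them receives no rows.
    : > "$first"
    : > "$second"
    # Aggregator branch: total row count (CNT).
    cnt=$(awk 'END { print NR }' "$src")
    # Router: NR stands in for NEXTVAL.
    awk -v cnt="$cnt" -v first="$first" -v second="$second" '
        NR <= cnt / 2 { print > first; next }
        { print > second }
    ' "$src"
}

work=$(mktemp -d)
printf '1\n2\n3\n4\n5\n' > "$work/src.txt"
split_halves "$work/src.txt" "$work/first.txt" "$work/second.txt"
```

With an odd number of rows, as in the demo, the extra row lands in the second target.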

9. Suppose we have a flat file which has a header record with the file creation date, and
detailed data records. Describe the approach to load the 'file creation date' column along
with each and every detailed record.

Answer:
We can use the below shell command as a pre-session command to write the header information to another
flat file.
head -1 Sourc_File.dat > header.txt
Next use this flat file header.txt as a Lookup in the mapping.
Create an output port in an Expression transformation with the value 'H', or whatever tag identifies the
header record in the source data file.
Use this port in the Lookup condition, get the file creation date as the return field, and populate it in
your target table.
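
The intended result, every detail record carrying the date from the header, can be previewed on the command
line. This sketch assumes a comma-separated file whose first line looks like H,<creation date>; that layout
and the function name stamp_with_header_date are assumptions, not part of the original answer.

```shell
# Hypothetical helper: append the header date to every detail record.
stamp_with_header_date() {
    # Line 1 is the header; its second field is taken as the file creation date (assumed layout).
    awk -F, 'NR == 1 { created = $2; next } { print $0 "," created }' "$1"
}

data_file=$(mktemp)
printf 'H,2013-02-22\nD,1,abc\nD,2,def\n' > "$data_file"
stamp_with_header_date "$data_file"
```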

10. Suppose we have the below two tables. What will be the output if we select Table 1 as the source and
use a Joiner and a Lookup transformation on Table 2 based on column ID?

Table 1     Table 2
ID          ID    Name
10          10    A
            10    B
            10    C

Answer:
When we use a Joiner transformation with an inner join on column ID, we will get 3 rows as output.
When we use a passive Lookup transformation, we will get 1 row as output. In this case of multiple lookup
matches, the lookup returns either the first or the last match, as configured in the Lookup Policy on Multiple Match
property of the transformation.
When we use an active Lookup transformation, we will get 3 rows as output, as an active lookup returns all the
matching values on multiple lookup matches.
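
The three row counts can be checked with a small Python sketch. It only imitates the behaviour described above; the function names and the policy values "first", "last" and "all" are labels chosen for this illustration, not Informatica settings.

```python
table1 = [{"ID": 10}]
table2 = [
    {"ID": 10, "Name": "A"},
    {"ID": 10, "Name": "B"},
    {"ID": 10, "Name": "C"},
]

def inner_join(left, right, key):
    """Joiner with inner join: one output row per matching pair."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def lookup(left, right, key, policy="first"):
    """Lookup sketch: 'first'/'last' mimic a passive lookup, 'all' an active one."""
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if not matches:
            out.append({**l, "Name": None})  # no match: lookup port is NULL
        elif policy == "all":
            out.extend({**l, **m} for m in matches)
        else:
            chosen = matches[0] if policy == "first" else matches[-1]
            out.append({**l, **chosen})
    return out

joined = inner_join(table1, table2, "ID")
passive_first = lookup(table1, table2, "ID", "first")
passive_last = lookup(table1, table2, "ID", "last")
active_all = lookup(table1, table2, "ID", "all")
print(len(joined), len(passive_first), len(active_all))
```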

11. Suppose we have a flat file which contains just a numeric value. We need to populate this
value in one column of the target table for every source record. How can we achieve this?

Answer:
Use an Expression transformation and create a decimal output port, say DUMMY, holding a very high number,
along with the other I/O ports from the source table.
Say, DUMMY = 99999999999 [Note: use a number that can never appear in the
lookup flat file.]
Now use a Lookup transformation based on the flat file that holds the numeric value. Say the column name in the lookup
is VALUE.
Map DUMMY from the Expression to the Lookup and use the lookup condition
DUMMY != VALUE
Next, use the VALUE column returned by the Lookup to populate the target column.
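
Why the inequality condition works can be seen in a short Python sketch. It is an illustration only: the sample value 42 and the column name ID are made up, and lookup_value is a hypothetical helper standing in for the Lookup transformation.

```python
DUMMY = 99999999999  # a number assumed never to appear in the lookup file

def lookup_value(lookup_rows, dummy=DUMMY):
    """Return the first VALUE that satisfies the condition DUMMY != VALUE."""
    for value in lookup_rows:
        if dummy != value:
            return value
    return None

lookup_file = [42]                 # the flat file holding just one numeric value
source = [{"ID": 1}, {"ID": 2}, {"ID": 3}]

# Because DUMMY never equals VALUE, every source row matches the single lookup row.
target = [{**row, "VALUE": lookup_value(lookup_file)} for row in source]
print(target)
```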

12. How will you load a source flat file into a staging table when the file name is not fixed?
The file name is like sales_2013_02_22.txt, i.e. date is appended at the end of the file as a
part of file name.
Answer:
The generic file name is like- sales_YYYY_MM_DD.txt
One option is to rename the file in a pre-session command. We use an OS-level command to rename the
file to a fixed name, then set the Informatica source file name to this fixed name and load the file.
E.g. in Unix:
$> mv sales_*.txt sales.txt
Another option is to use indirect loading (source file type Indirect) with a fixed file name. The fixed-name file is a
file list whose content is the actual file name to be processed.
E.g. in Unix:
$> ls sales_*.txt > sales.txt

13. Solve the below scenario using Informatica and Database SQL.
Source
PRODUCT_ID PRODUCT_NAME PRODUCT_PRICE
10 Lux 100
10 Dove 200
20 Cinthol 400
20 Dettol 500
30 Fiama 600

Target

Answer:
Using Informatica:

In one pipeline, calculate SUM(PRODUCT_PRICE) grouped by PRODUCT_ID using an Aggregator transformation.

In the other flow bring all the data through unchanged, then join the first flow with the second using an Informatica
Joiner transformation, with PRODUCT_ID as the join column and inner join as the join type.
PRODUCT_ID PRODUCT_NAME PRODUCT_PRICE SUM_PRODUCT_PRICE
10 Lux 100 300
10 Dove 200 300
20 Cinthol 400 900
20 Dettol 500 900
30 Fiama 600 600

Using SQL:

SELECT M.*, N.SUM_PRODUCT_PRICE
FROM SOURCE M,
(SELECT SUM(PRODUCT_PRICE) SUM_PRODUCT_PRICE, PRODUCT_ID
FROM SOURCE
GROUP BY PRODUCT_ID) N
WHERE M.PRODUCT_ID = N.PRODUCT_ID
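
The same aggregate-then-join logic can be traced in plain Python with the sample rows from the question. This is only a sketch of the data flow: the dictionary plays the role of the Aggregator output and the list comprehension plays the role of the inner join on PRODUCT_ID.

```python
from collections import defaultdict

source = [
    (10, "Lux", 100),
    (10, "Dove", 200),
    (20, "Cinthol", 400),
    (20, "Dettol", 500),
    (30, "Fiama", 600),
]

# Pipeline 1: SUM(PRODUCT_PRICE) grouped by PRODUCT_ID.
totals = defaultdict(int)
for product_id, _name, price in source:
    totals[product_id] += price

# Pipeline 2 joined to pipeline 1 on PRODUCT_ID (inner join).
target = [
    (product_id, name, price, totals[product_id])
    for product_id, name, price in source
]

for row in target:
    print(row)
```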

14. Suppose we have a source with records as below:
EMPNO ENAME SAL
1 Tom 100
2 Jack 200
3 Peter 150
4 Donald 230
999 TEST 999
6 Eric 300

If we encounter EMPNO = 999, then the whole record set should not be loaded into the target table. Describe the
approach.
Answer:
From the Source create two flows:

1: Source -> Expression -> Sorter
2: Source -> Filter -> Expression -> Sorter

1.1 In the Expression create an output field dummy_M as 'X'
1.2 Sort on dummy_M

2.1 In the Filter set the Filter Condition as EMPNO = 999
2.2 In the Expression create an output field dummy_D as 'X'
2.3 Sort on dummy_D

3. Next use a Joiner transformation:

Set the first flow as Master and the second flow as Detail.
Set the Join Condition as dummy_M = dummy_D
Set the Join Type as Detail Outer Join, which keeps every row of the first flow.
Use Sorted Input.

4. Next use a Filter transformation:

Set the Filter Condition as ISNULL(dummy_D)

If the source contains EMPNO = 999, every row finds a match and dummy_D is never NULL, so no row passes the
filter. Otherwise dummy_D is NULL for every row and all rows are loaded.

And finally your Target.
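
The two-flow logic can be sketched in Python to confirm the all-or-nothing behaviour. It is a simulation under assumptions: rows are plain dictionaries, the outer join is emulated with a loop, and load_unless_sentinel is a hypothetical name.

```python
def load_unless_sentinel(rows, sentinel=999):
    """Return the rows to load: everything, or nothing if the sentinel EMPNO is present."""
    # Flow 1 (master): every source row, tagged with dummy_M = 'X'.
    master = [{**row, "dummy_M": "X"} for row in rows]
    # Flow 2 (detail): only the sentinel rows, tagged with dummy_D = 'X'.
    detail = [{"dummy_D": "X"} for row in rows if row["EMPNO"] == sentinel]

    # Outer join that keeps every master row; unmatched rows get dummy_D = None.
    joined = []
    for m in master:
        matches = [d for d in detail if d["dummy_D"] == m["dummy_M"]]
        if matches:
            joined.extend({**m, **d} for d in matches)
        else:
            joined.append({**m, "dummy_D": None})

    # Final filter: keep only rows where dummy_D is NULL.
    return [
        {k: r[k] for k in ("EMPNO", "ENAME", "SAL")}
        for r in joined
        if r["dummy_D"] is None
    ]

source = [
    {"EMPNO": 1, "ENAME": "Tom", "SAL": 100},
    {"EMPNO": 2, "ENAME": "Jack", "SAL": 200},
    {"EMPNO": 3, "ENAME": "Peter", "SAL": 150},
    {"EMPNO": 4, "ENAME": "Donald", "SAL": 230},
    {"EMPNO": 999, "ENAME": "TEST", "SAL": 999},
    {"EMPNO": 6, "ENAME": "Eric", "SAL": 300},
]

with_sentinel = load_unless_sentinel(source)
clean_source = [r for r in source if r["EMPNO"] != 999]
without_sentinel = load_unless_sentinel(clean_source)
print(len(with_sentinel), len(without_sentinel))
```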

15. Can we pass the value of a mapping variable between 2 pipelines under the same map-
ping? If not how can we achieve this?

Answer:
We cannot pass the value of an Informatica mapping variable between two pipelines in the same mapping. Mapping
variables are values that can change between sessions. The Integration Service saves the latest value of a mapping
variable to the repository only at the end of each successful session run. When we have two pipelines
under the same mapping, the mapping has a single session, so the value of the mapping variable is
saved to the repository only when this session succeeds, that is, when both pipelines have finished
executing.
An alternative way to handle this scenario is as below:
1. Split the pipelines into two different mappings, say map1 and map2.
2. Create a mapping variable, say var1, in map1 and set its value using the SETVARIABLE()
function. Our goal is to pass the value of var1 at the end of the successful session run to map2.
3. Create a mapping variable, say var2, in map2 and use it in the mapping wherever the value of
var1 from the first mapping is required.
4. Create the workflow with a workflow variable, say "wfvar".
5. Create two non-reusable sessions, say ses1 and ses2, for map1 and map2 respectively.
6. In the Post-session on success variable assignment of ses1, assign the value of mapping variable var1 to
the workflow variable wfvar.
7. In the Pre-session variable assignment of ses2, assign the value of workflow variable wfvar to the
mapping variable var2.

With this approach, we can pass the value from the first session to the second session.

16. Suppose we have a huge (size in GB) flat file as source. The flat file contains 22 columns, out of which 4
columns are considered key columns: CUST_SRC_ID, PRODUCT_ID, FF_ID, SNM_ID.
There is one more column in the flat file relevant to the discussion, DATE_ID, which stores a date in YYYY-MM-DD
format.
The flat file contains duplicate records based on the above 4 columns (that is, the records are not entirely
duplicated; some values may differ in other columns).
Now the requirement is to choose all the unique records from the flat file based on the uniqueness of the
above-mentioned keys. If there are duplicate records, we must select the record for which the DATE_ID
column contains the latest value. So suppose we get the following records in the flat file:
CUST_SRC_ID PRODUCT_ID FF_ID SNM_ID DATE_ID OTHER COLUMNS
123 P1 F1 S1 2013-01-02 X, Y, Z
123 P1 F1 S1 2013-01-06 P, Q, R
123 P1 F1 S1 2013-01-02 S, T, U

In the above case we want the following row in the target:
CUST_SRC_ID PRODUCT_ID FF_ID SNM_ID DATE_ID OTHER COLUMNS
123 P1 F1 S1 2013-01-06 P, Q, R

How can we achieve this in a single mapping?
Answer:
Use a Sorter transformation after the Source Qualifier. The sort keys will be in the order below:
CUST_SRC_ID Ascending order
PRODUCT_ID Ascending order
FF_ID Ascending order
SNM_ID Ascending order
DATE_ID Descending order
Next use an Expression transformation and create three variable ports and one output port in the order below:
V_Keys = CUST_SRC_ID || PRODUCT_ID || FF_ID || SNM_ID
V_FLAG = IIF (V_Keys != V_Keys_PREV, 1, 0)
V_Keys_PREV = V_Keys
O_FLAG = V_FLAG (output port)
Now use a Filter transformation with the filter condition below:
O_FLAG=1
After sorting, the first record of every group of unique keys has the latest date, because
we sorted on DATE_ID descending. When V_FLAG is evaluated, V_Keys_PREV still holds the key of the previous
row, so the first record of every group (with the latest date) gets O_FLAG = 1 and the rest get 0. We filter
out those unwanted duplicate records using the Filter transformation.
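
The sort-and-flag logic can be followed in a minimal Python sketch that uses the sample rows from the question. One deliberate difference, stated as an assumption: the sketch compares the keys as a tuple instead of a concatenated string, which avoids accidental collisions such as '12' || '3' and '1' || '23'. The column name OTHER and the function name latest_per_key are hypothetical.

```python
KEY_COLUMNS = ("CUST_SRC_ID", "PRODUCT_ID", "FF_ID", "SNM_ID")

def latest_per_key(rows):
    """Keep, for each key combination, the row with the latest DATE_ID."""
    # Sorter: keys ascending, DATE_ID descending (two stable sorts).
    ordered = sorted(rows, key=lambda r: r["DATE_ID"], reverse=True)
    ordered = sorted(ordered, key=lambda r: tuple(r[c] for c in KEY_COLUMNS))

    kept = []
    previous_keys = None                           # plays the role of V_Keys_PREV
    for row in ordered:
        keys = tuple(row[c] for c in KEY_COLUMNS)  # V_Keys
        flag = 1 if keys != previous_keys else 0   # V_FLAG / O_FLAG
        previous_keys = keys
        if flag == 1:                              # Filter: O_FLAG = 1
            kept.append(row)
    return kept

rows = [
    {"CUST_SRC_ID": "123", "PRODUCT_ID": "P1", "FF_ID": "F1", "SNM_ID": "S1",
     "DATE_ID": "2013-01-02", "OTHER": "X, Y, Z"},
    {"CUST_SRC_ID": "123", "PRODUCT_ID": "P1", "FF_ID": "F1", "SNM_ID": "S1",
     "DATE_ID": "2013-01-06", "OTHER": "P, Q, R"},
    {"CUST_SRC_ID": "123", "PRODUCT_ID": "P1", "FF_ID": "F1", "SNM_ID": "S1",
     "DATE_ID": "2013-01-02", "OTHER": "S, T, U"},
]

result = latest_per_key(rows)
print(result)
```

Dates in YYYY-MM-DD format sort correctly as plain strings, which is why no date parsing is needed here.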

17. Suppose we have a flat file with just one column, as given below:
C1
L1
C2
L2
C3
L3

where values starting with C denote a company name and values starting with L denote the location of that company.
We have to load this data into the target table (using Informatica) as:
C1, L1
C2, L2
C3, L3

Answer:
One way to achieve this requirement is as follows.

1. After the SQ, in an Expression generate the following (this is tricky; use variable port logic):
a unique sequence number for each group
a unique number for each record within the group
a duplicate of the input column
After the Expression the output will be as below:
Col1, Col2, Col3, Col4
1,1,C1,C1
1,2,L1,L1
2,1,C2,C2
2,2,L2,L2
3,1,C3,C3
3,2,L3,L3

2. Add an Aggregator with
group by on the first column
aggregate expression MAX(Col3, Col2 = 1) for the company name
aggregate expression MAX(Col3, Col2 = 2) for the location
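
The numbering and conditional aggregation can be sketched in Python. This is an illustration only: it assumes that a value starting with 'C' opens a new group, as the question states, and pair_rows is a hypothetical name.

```python
def pair_rows(values):
    """Turn alternating company/location rows into (company, location) pairs."""
    group_no = 0
    groups = {}
    for value in values:
        if value.startswith("C"):
            group_no += 1          # Col1: new group number for each company row
            seq = 1                # Col2: position 1 within the group
        else:
            seq = 2                # Col2: position 2 within the group
        groups.setdefault(group_no, {})[seq] = value

    # Aggregator: group by Col1, pick the value where Col2 = 1 and where Col2 = 2.
    return [(g.get(1), g.get(2)) for _, g in sorted(groups.items())]

pairs = pair_rows(["C1", "L1", "C2", "L2", "C3", "L3"])
print(pairs)
```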

18. Implement slowly changing dimension of Type 2 which will load current record in Current
table and old data in Log table.

Answer:
Use a Joiner transformation to join the Source and the Current table with a Full Outer Join.
Next use an Expression transformation to mark each row as new or old, and correspondingly
assign a value such as 0 (new) or 1 (old) to a new output port.
Pass all the columns to a Router transformation and route on the new port.
If the flag is 0, use an Update Strategy transformation with DD_INSERT to insert into the Current table.
If the flag is 1, use an Update Strategy transformation with DD_UPDATE to update the Current table.
For rows flagged 1, also load the existing data from the Current table into the Log table.
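
A rough Python sketch of the routing is given below. It rests on several assumptions that the answer does not spell out: the business key is a column called ID, the Current table is modelled as a dictionary keyed on ID, a row counts as old only when it exists and its values differ, and CITY is a made-up attribute. The name apply_changes is hypothetical.

```python
def apply_changes(source_rows, current, log, key="ID"):
    """Insert new rows, update changed rows, and archive the previous version."""
    for row in source_rows:
        existing = current.get(row[key])
        if existing is None:
            # Flag 0: new row, insert into the Current table.
            current[row[key]] = dict(row)
        elif existing != row:
            # Flag 1: old row, archive the existing version, then update Current.
            log.append(existing)
            current[row[key]] = dict(row)

current = {1: {"ID": 1, "CITY": "Pune"}}
log = []
apply_changes(
    [{"ID": 1, "CITY": "Delhi"}, {"ID": 2, "CITY": "Goa"}],
    current,
    log,
)
print(current)
print(log)
```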

26. Performance Tuning

1. Which one is faster Connected or Unconnected Lookup?

Answer:
There can be some very specific situations where an unconnected lookup adds some performance benefit to the
total execution.

If you call the unconnected lookup based on some condition (e.g. calling it from an Expression
transformation only when a specific condition is met, as opposed to a connected lookup, which is
called for every row), then you might save some calls to the unconnected lookup, thereby marginally improving
performance.

The improvement will be more apparent if your data volume is really huge. Keep the Pre-build Lookup
Cache option set to Always disallowed for the lookup, so that the lookup is not even
cached unless it is called. This technique has other disadvantages, however; check
http://www.dwbiconcepts.com/etl/14-etl-informatica/46-tuning-informatica-lookup.html , especially the
points under the following subheadings:
- Effect of choosing connected OR Unconnected Lookup, and
- WHEN TO set Pre-build Lookup Cache OPTION (AND WHEN NOT TO)

2. How can we improve the performance of the Informatica Normalizer transformation?

Answer:
As such there is no way to improve the performance of a session by using a Normalizer. The Normalizer is a
transformation used to pivot or normalize datasets and has nothing to do with performance. In fact, the Normalizer
does not impact performance much (apart from taking a little more memory).

3. How to improve the Session performance?

Answer:
Run concurrent sessions
Partition the session (PowerCenter)
Tune parameters: DTM buffer pool, buffer block size, index cache size, data cache size, commit
interval, tracing level (Normal, Terse, Verbose Initialization, Verbose Data)
The session has memory to hold 83 sources and targets. If there are more, the DTM buffer size can be increased.
The Informatica server uses the index and data caches for the Aggregator, Rank, Lookup and Joiner
transformations. The server stores the transformed data from these transformations in the data cache
before returning it to the data flow, and stores group information for those transformations in the index
cache. If the allocated data or index cache is not large enough to store the data, the server stores
the data in a temporary disk file as it processes the session data. Each time the server pages to
disk, performance slows. This can be seen from the counters. Since the data cache is generally larger
than the index cache, it should be allocated more memory than the index cache.
Remove the staging area
Turn off session recovery
Reduce error tracing

4. How do you identify the bottlenecks in Mappings?

Answer:
Bottlenecks can occur in
Targets - The most common performance bottleneck occurs when the Informatica server writes to a target
database. You can identify a target bottleneck by configuring the session to write to a flat file target. If
the session performance increases significantly when you write to a flat file, you have a target
bottleneck.
Solution:
Drop or disable indexes or constraints
Perform bulk load (bypasses the database log)
Increase commit interval (recovery is compromised)
Tune the database for RBS, dynamic extension, etc.

Sources - Add a Filter transformation after each SQ with the condition set to FALSE so that no records pass
through. If the time taken is about the same, there is a source bottleneck. You can also identify a source problem
with a Read Test Session, where we copy the mapping, keep only the sources and SQ, remove all transformations,
and connect to a flat file target. If the performance is about the same, there is a source bottleneck.

Using database query - Copy the read query directly from the log. Execute the query against the source
database with a query tool. If the time it takes to execute the query and the time to fetch the first row
are significantly different, then the query can be modified using optimizer hints.
Solution:
Optimize queries using hints.
Use indexes wherever possible.

Mapping - If both source and target are OK, then the problem could be in the mapping. Add a Filter
transformation before each target with the condition set to FALSE; if the time taken is about the same, there is a
mapping bottleneck. Alternatively, look at the performance monitor in the session property sheet and view the counters.
Solutions:
High error-row counts and high rows-in-lookup-cache counts indicate a mapping bottleneck.
Optimize single-pass reading.
Optimize Lookup transformation:
o Caching the lookup table: When caching is enabled, the Informatica server caches the lookup
table and queries the cache during the session. When this option is not enabled, the server queries
the lookup table on a row-by-row basis. The cache types are static, dynamic, shared, unshared and persistent.

o Optimizing the lookup condition: When multiple conditions are used, place the conditions with
an equality sign first.

o Indexing the lookup table: For a cached lookup, index the lookup table on the columns in the ORDER BY clause;
the session log contains the ORDER BY statement. For an un-cached lookup, since the server issues a
SELECT statement for each row passing into the Lookup transformation, it is better to index the lookup
table on the columns in the condition.

Optimize Filter transformation: You can improve efficiency by filtering early in the data flow. Instead
of using a Filter transformation halfway through the mapping to remove a sizable amount of data,
use a source qualifier filter to remove those same rows at the source. If it is not possible to move the filter
into the SQ, move the Filter transformation as close to the Source Qualifier as possible to remove
unnecessary data early in the data flow.

Optimize Aggregator transformation:
o Group by simpler columns, preferably numeric columns.
o Use Sorted Input. Sorted input decreases the use of aggregate caches. The server assumes
all input data is sorted and performs aggregate calculations as it reads.
o Use incremental aggregation in the session property sheet.

Optimize Seq. Generator transformation:
o Try creating a reusable Seq. Generator transformation and use it in multiple mappings.
o The Number of Cached Values property determines the number of values the Informatica server
caches at one time.

Optimize Expression transformation:
o Factor out common logic.
o Minimize aggregate function calls.
o Replace common sub-expressions with local variables.
o Use operators instead of functions.

Sessions: If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck.
You can identify a session bottleneck by using the performance details. The Informatica server creates
performance details when you enable Collect Performance Data on the General tab of the session
properties. Performance details display information about each Source Qualifier, target definition, and
individual transformation. All transformations have some basic counters that indicate the number of input
rows, output rows, and error rows. Any value other than zero in the readfromdisk and writetodisk
counters for Aggregator, Joiner, or Rank transformations indicates a session bottleneck. Low
BufferInput_efficiency and BufferOutput_efficiency counters also indicate a session bottleneck. Small
cache size, low buffer memory, and small commit intervals can cause session bottlenecks.

System (Networks)

5. How do you handle performance issues in Informatica? Where can you monitor the per-
formance?

Answer:
There are several aspects to performance handling. Some of them are:
Source tuning
Target tuning
Repository tuning
Session performance tuning
Incremental Change identification in source side.
Software, hardware (Use multiple servers) and network tuning.
Bulk Loading
Use the appropriate transformation.
To monitor this:
Set performance detail criteria
Enable performance monitoring
Monitor the session at runtime and/or check the performance monitor file.

6. What are performance counters?

Answer:
The performance details provide counters that help you understand the session and mapping efficiency. Each Source
Qualifier, target definition, and individual transformation appears in the performance details, along with counters
that display performance information about each transformation.
Understanding Performance Counters
All transformations have some basic counters that indicate the number of input rows, output rows, and error rows.
Source Qualifiers, Normalizers, and targets have additional counters that indicate the efficiency of data moving into
and out of buffers. You can use these counters to locate performance bottlenecks. Some transformations have counters
specific to their functionality. For example, each Lookup transformation has a counter that indicates the number
of rows stored in the lookup cache. When you read performance details, the first column displays the
transformation name as it appears in the mapping, the second column contains the counter name, and the third column
holds the resulting number or efficiency percentage. When you partition a source, the Informatica Server
generates one set of counters for each partition. The following performance counters illustrate two partitions for an
Expression transformation:
Transformation   Counter                  Value
EXPTRANS [1]     Expression_input rows    8
                 Expression_output rows   8
EXPTRANS [2]     Expression_input rows    16
                 Expression_output rows   16
Note: When you partition a session, the number of aggregate or rank input rows may be different from the
number of output rows from the previous transformation.

7. How can we increase Session Performance?

Answer:
Minimum log (Terse)
Partitioning source data
Performing ETL for each partition, in parallel. (For this, multiple CPUs are needed)
Adding indexes
Changing commit Level
Using Filter transformation to remove unwanted data movement
Increasing buffer memory, when large volume of data
Multiple lookups can reduce performance. Identify the largest lookup table and tune the
expressions.
At session level, the causes are small cache size, low buffer memory and small commit interval

At system level,
Windows NT/2000: use the Task Manager
UNIX: vmstat, iostat
Hierarchy of optimization
Target
Source
Mapping
Session
System
Optimizing Target Databases:
Drop indexes /constraints
Increase checkpoint intervals
Use bulk loading /external loading
Turn off recovery
Increase database network packet size
Source level
Optimize the query (e.g. the GROUP BY clause)
Use conditional filters
Connect to RDBMS using IPC protocol
Mapping
Optimize data type conversions
Eliminate transformation errors
Optimize transformations/ expressions
Session
Concurrent batches
Partition sessions
Reduce error tracing
Tune session parameters
System
Improve network speed
Use multiple Informatica servers on separate systems
Reduce paging

8. What would be the best approach to update a huge table (more than 200 million records) using Informatica?
The table does not contain any primary key. However, there are a few indexes defined on it. The target table
is partitioned. On the other hand, the source table contains only a few records (less than a thousand) that will
go to the target and update it. Is there any better approach than just using an Update Strategy
transformation?
Answer:
Since the target busy percentage is 99.99%, it is very clear that the bottleneck is on the target, so we need to
tune the target. There are a couple of options:
1. Since the target table is partitioned on time_id, include time_id in the WHERE clause of the SQL fired by
Informatica. To do that, define the time_id column as a primary key in the target definition. With this,
the update query will have time_id in the WHERE clause.
2. With an Informatica Update Strategy, an update SQL statement is fired for every row marked for update.
To avoid multiple update statements, INSERT all the records that are meant to be updated
into a temporary table. Then use a correlated SQL statement to update the records in the actual table (the 200M
table). This query can be fired as a post-session SQL. Sample SQL:
UPDATE TGT_TABLE U
SET (U.COLUMNS_LIST /*Column list to be updated*/) =
(SELECT I.COLUMNS_LIST /*Column list to be updated*/
FROM UPD_TABLE I
WHERE I.KEYS = U.KEYS AND I.TIME_ID = U.TIME_ID)
WHERE EXISTS
(SELECT 1 FROM UPD_TABLE I
WHERE I.KEYS = U.KEYS AND I.TIME_ID = U.TIME_ID)
TGT_TABLE - Actual table with 200M records
UPD_TABLE - Table with the records meant for UPDATE (1K records)
We need to make sure that the indexes are up to date and statistics are collected. Since this has more to do with DB
performance, you may also need the help of a DBA to check the DB throughput, SQL cost, etc.
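
The staged-update idea can be tried end to end with Python's built-in sqlite3 module. This is a small-scale sketch under assumptions: the column names KEY_ID and AMOUNT and the sample rows are made up, only one column is updated, and SQLite stands in for the real target database, so the exact syntax may differ there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TGT_TABLE (KEY_ID INTEGER, TIME_ID TEXT, AMOUNT INTEGER);
CREATE TABLE UPD_TABLE (KEY_ID INTEGER, TIME_ID TEXT, AMOUNT INTEGER);
INSERT INTO TGT_TABLE VALUES (1, '2013-01', 10);
INSERT INTO TGT_TABLE VALUES (2, '2013-01', 20);
INSERT INTO TGT_TABLE VALUES (3, '2013-02', 30);
-- The session only INSERTs the changed rows into the staging table.
INSERT INTO UPD_TABLE VALUES (2, '2013-01', 99);
""")

# One set-based correlated UPDATE instead of one UPDATE per row.
conn.execute("""
UPDATE TGT_TABLE
SET AMOUNT = (SELECT I.AMOUNT
              FROM UPD_TABLE I
              WHERE I.KEY_ID = TGT_TABLE.KEY_ID
                AND I.TIME_ID = TGT_TABLE.TIME_ID)
WHERE EXISTS (SELECT 1
              FROM UPD_TABLE I
              WHERE I.KEY_ID = TGT_TABLE.KEY_ID
                AND I.TIME_ID = TGT_TABLE.TIME_ID)
""")

rows = conn.execute(
    "SELECT KEY_ID, TIME_ID, AMOUNT FROM TGT_TABLE ORDER BY KEY_ID"
).fetchall()
print(rows)
```

The WHERE EXISTS clause matters: without it, rows that have no match in the staging table would have their column set to NULL.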
