
Sl. No. Questions and Answers

1. Can we have multiple conditions in a Filter?
Yes. We can place multiple conditions in one filter condition, but we cannot give multiple conditions in separate groups.

2. What are the flags called in the Update Strategy?
DD_INSERT (0), DD_UPDATE (1), DD_DELETE (2), DD_REJECT (3).

3. What different things can you do using pmcmd?
pmcmd is a command-line program used to contact the Informatica server (in earlier versions it was called pmrepserver). Using pmcmd you can run a session or workflow without using the Workflow Manager.
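For illustration, a typical pmcmd call to start a workflow looks roughly like the following; the service, domain, user, folder and workflow names are placeholders, and the exact options can vary by PowerCenter version:

    pmcmd startworkflow -sv IntSvc_Dev -d Domain_Dev -u admin -p admin_pwd -f MyFolder wf_daily_load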

4. What kind of test plan? What kind of validation do you do?

5. What is the usage of an unconnected/connected Lookup?

6. What is the difference between Connected and Unconnected Lookups?
A Lookup is used to get a related value from the source or target. There are two types of lookups: connected and unconnected. If only a single return value is needed, an unconnected lookup is enough; to return multiple columns, use a connected lookup. A dynamic cache and user-defined default values are supported only in a connected lookup.
A connected Lookup receives input from the pipeline, can use a dynamic or static cache, and can return multiple columns as output; if there is no match for the lookup condition, the PowerCenter server returns the default value, which the user can define.
An unconnected Lookup receives input from the result of a :LKP expression in another transformation, uses a static cache, and returns a single value; if there is no match for the lookup condition, the PowerCenter server returns NULL, and a user-defined default value cannot be used.
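As a small illustration (the transformation and port names are hypothetical), an unconnected Lookup named lkp_dept_name could be called from an output port of an Expression transformation like this:

    -- return the department name for the given DEPT_ID, or 'UNKNOWN' when DEPT_ID is null
    IIF(ISNULL(DEPT_ID), 'UNKNOWN', :LKP.lkp_dept_name(DEPT_ID))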

7. If you have data coming from different sources, what transformation will you use in your Designer?
For heterogeneous sources we need to use the Joiner transformation.

8. What are the different ports in Informatica?
Input, output, input/output, variable, group by, rank, and lookup ports.

9. What is a variable port? Why is it used?

10. What is the difference between an active and a passive transformation?

A variable port is used within a transformation; it is local to that particular transformation. We can store temporary results in it to perform calculations, and its value is retained from row to row within that transformation.

An active transformation can change the number of rows that pass through it, e.g. Source Qualifier, Filter, Router, Joiner, Aggregator, Union, Update Strategy. A passive transformation does not change the number of rows that pass through it, e.g. Lookup, Sequence Generator, Stored Procedure, External Procedure.
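A minimal sketch of a variable port in an Expression transformation (port names are hypothetical): the variable port keeps a running count across the rows the transformation processes, and an output port exposes it.

    v_row_count (variable port, initial value 0) = v_row_count + 1
    o_row_count (output port)                    = v_row_count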

11. What is a Mapplet?

12. What is an Aggregator transformation?

13. What is a Router transformation? How is it different from a Filter transformation?

A Mapplet is a set of transformations that can be reused.

The Aggregator transformation allows us to perform calculations such as averages and sums. It differs from the Expression transformation in that you can use aggregate functions and group-by ports; the Expression transformation performs calculations on a row-by-row basis. The PowerCenter server performs aggregate calculations as it reads, and stores the necessary group and row data in an aggregate cache. We can improve performance by using the Sorted Input option; if Sorted Input is set, the data must arrive in sorted order, otherwise the session fails.

The Router transformation is similar to the Filter transformation because both transformations are used to test a condition. A Filter transformation tests the data for one condition and drops the rows that do not meet the condition. A Router transformation can test data for one or more conditions and gives you the option to route rows that do not meet any of the conditions to a default output group. A Router transformation contains one input group, one or more output groups, and one default group.

14. What are connected and unconnected transformations?
A connected transformation is connected to other transformations in the pipeline. An unconnected transformation is not connected to other transformations in the mapping; it is called from within another transformation and returns a value to that transformation.

15. What is a Normalizer transformation?
Normalization is the process of organizing data. In database terms, this includes creating normalized tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy. The Normalizer transformation normalizes records from COBOL and relational sources. Use a Normalizer transformation instead of a Source Qualifier transformation for COBOL sources.

16. How do you use a sequence created in Oracle in Informatica?
1. We can call it in the Source Qualifier transformation's SQL override (sequence_name.NEXTVAL).
2. By using a Stored Procedure transformation we can get the value of the sequence.
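A hedged sketch of option 1, assuming a sequence named EMP_SEQ and a table named EMPLOYEES already exist in Oracle; the SELECT in the Source Qualifier's SQL override simply adds the sequence column:

    SELECT EMP_SEQ.NEXTVAL, E.EMP_ID, E.EMP_NAME
    FROM EMPLOYEES E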

17. What are Source Qualifier transformations?

18. What are caches and their types in Informatica?

19. What is incremental aggregation?

When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier transformation represents the rows that the Integration Service reads when it runs a session.

The PowerCenter server builds a cache in memory for the Rank, Aggregator, Joiner, Lookup, and Sorter transformations in a mapping. It allocates memory for the cache based on the amount defined in the transformation or session properties. Types of caches: index and data caches, and lookup caches (static, dynamic, shared, persistent, recache from source).

Incremental aggregation is used with the Aggregator transformation. Once the Aggregator transformation is placed in a mapping, we need to check the Incremental Aggregation property in the session properties so that the data is aggregated incrementally. The first time you run an incremental aggregation session, the Integration Service processes the source. At the end of the session, the Integration Service stores the aggregated data in two cache files, the index and data cache files, saved in the cache file directory. The next time you run the session, the Integration Service aggregates the new rows with the cached aggregated values in the cache files. When you run a session with an incremental Aggregator transformation, the Integration Service creates a backup of the Aggregator cache files in $PMCacheDir at the beginning of a session run. The Integration Service promotes the backup cache to the initial cache at the beginning of a session recovery run. The Integration Service cannot restore the backup cache file if the session aborts.

20

What is Reject loading?

By default, the Integration Service process creates a reject file for each target in the session. The reject file contains rows of data that the writer does not write to targets. The writer may reject a row in the following circumstances: It is flagged for reject by an Update Strategy or Custom transformation. It violates a database constraint such as primary key constraint. A field in the row was truncated or overflowed, and the target database is configured to reject truncated or overflowed data. By default, the Integration Service process saves the reject file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager, and names the reject file target_table_name.bad. Note: If you enable row error logging, the Integration Service process does not create a reject file.

21

What are sessions and batches?

Session: a session is a set of instructions that tells the Informatica server how and when to move data from sources to targets. After creating the session, we can use either the Server Manager or the command-line program pmcmd to start or stop the session.
Batches: a batch provides a way to group sessions for either serial or parallel execution by the Informatica server. There are two types of batches: sequential (run sessions one after the other) and concurrent (run sessions at the same time).

22. What is the significance of the Source Qualifier transformation?
When you add a relational or a flat file definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier transformation represents the rows that the Integration Service reads when it runs a session. The following tasks can be done using the SQ transformation:
1. Join data coming from the same database
2. Filter the rows when the Informatica server reads the source data
3. Use an outer join instead of a normal join

23. What are the 2 modes of data movement in the Informatica Server?
ASCII and Unicode.

24

Why we use lookup transformations?

By using lookup transformation we can do the following things: 1. Get a related value 2. Perform Calculation 3. Update slowly changing dimension tables - We can use lookup transformation to determine whether the records already exist in the target or not.

25. What are conformed dimensions?
Conformed dimensions can be shared by multiple fact tables.

26. What is data warehousing?
A data warehouse is a relational database designed for query and analysis purposes. It contains historical data derived from transaction data, and it also includes data from other sources.

27. What is a reusable transformation? What is a mapplet? Explain the difference between them.
A reusable transformation is a single transformation that can be used in any other mapping. A mapplet is a reusable object that contains multiple transformations; the set of transformation logic embedded in a mapplet can be used as reusable logic in any number of mappings.

28. What happens when you use the delete, update, reject, or insert statement in your Update Strategy?
It performs the action specified in the Update Strategy transformation, and it also depends on the "Treat source rows as" option in the session properties.

29. Where do you define users and privileges in Informatica?
In the Repository Manager.

30. When you run the session, does the debugger load the data to the target?
It loads the data, but if we select the option to discard target data it does not load the data.

31. Can you use a flat file and a table (relational) as sources together?
Yes.

32. Suppose I need to separate the data for delete and insert to the target depending on a condition; which transformation would you use?
Router or Filter.

33. What is the difference between the lookup data cache and index cache?
Data cache: output column data other than the condition columns. Index cache: the condition columns.

34. What is an indicator file and how can it be used?
It is one of the output files of Informatica; when the session runs, it generates the indicator file. If you use a flat file as a target, you can configure the Integration Service to create an indicator file for target row type information. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject. The Integration Service process names this file target_name.ind and stores it in the same directory as the target file.

35. What is a Filter transformation? What options do you have in a Filter transformation?
The Filter transformation is used to filter the data based on the condition specified in the filter condition.

36. What happens to the discarded rows in a Filter transformation?

The discarded rows are ignored by the Informatica server.
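For example (a hedged sketch with hypothetical port names), a filter condition such as

    SALARY > 0 AND NOT ISNULL(DEPT_ID)

passes only the rows for which the expression evaluates to TRUE; rows that evaluate to FALSE are silently dropped and do not appear in the session reject file.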

37. What are the two programs that communicate with the Informatica Server?
Informatica provides the Server Manager and pmcmd programs to communicate with the Informatica Server:
Server Manager: a client application used to create and manage sessions and batches, and to monitor and stop the Informatica Server. You can use information provided through the Server Manager to troubleshoot sessions and improve session performance.
pmcmd: a command-line program that allows you to start and stop sessions and batches, stop the Informatica Server, and verify whether the Informatica Server is running.

38. What can you do with the Designer?
The Designer has tools to help you build mappings and mapplets so you can specify how to move and transform data between sources and targets. The Designer helps you create source definitions, target definitions, and transformations to build the mappings.

39

What are the different types of tracing levels you have in transformations?

Normal: the Integration Service logs initialization and status information, errors encountered, and skipped rows due to transformation row errors. It summarizes session results, but not at the level of individual rows.
Terse: the Integration Service logs initialization information, error messages, and notification of rejected data.
Verbose initialization: in addition to normal tracing, the Integration Service logs additional initialization details, names of index and data files used, and detailed transformation statistics.
Verbose data: in addition to verbose initialization tracing, the Integration Service logs each row that passes into the mapping and provides detailed transformation statistics. When you configure the tracing level to verbose data, the Integration Service writes row data for all rows in a block when it processes a transformation.

40. What is a Mapplet and how do you create a Mapplet?
A Mapplet is reusable transformation logic. We create a mapplet in the Mapplet Designer.

41. If the data source is in the form of an Excel spreadsheet, how do you use it?
1. Install the Microsoft Excel ODBC driver.
2. Create a data source for the driver.
3. Define ranges in the Excel sheet and set the datatypes for all the columns.
4. Import the Excel sheet into the Source Analyzer.

42. When do you use a connected lookup and when do you use an unconnected lookup?
A connected Lookup transformation is part of the mapping pipeline; with it we can receive multiple return values. An unconnected Lookup transformation is separate from the flow; we call it in an Expression transformation using the :LKP qualifier, and we can reuse the same lookup in multiple transformations.

43. How many values does the Informatica server return when it passes through a connected Lookup and an unconnected Lookup?
An unconnected lookup returns a single return port; a connected lookup can return one or more output values.

44. What kind of modifications can you perform with each transformation?
Expression - performs calculations
Aggregator - finds aggregate values, nested aggregates
Filter - filters the records
Router - filters on multiple conditions
Stored Procedure - calls an Oracle procedure
Source Qualifier - we can override the query and filter the records

45. Expressions in transformations: explain briefly how you use them.
Use the Expression transformation to calculate values in a single row before you write to the target. For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. Use the Expression transformation to perform any non-aggregate calculations. You can also use the Expression transformation to test conditional statements before you output the results to target tables or other transformations.

46

If a flat file (which comes through FTP as a source) has not arrived, what happens?

We will get a fatal error and the session will fail.

47. What does the Load Manager do?
The Load Manager is the primary Informatica server process; it is the component that dispatches Session, Command, and predefined Event-Wait tasks. It performs the following tasks: manages session and batch scheduling; locks the session and reads session properties; reads the parameter file; expands the server and session variables and parameters; verifies permissions and privileges; validates source and target code pages; creates the session log file; and creates the Data Transformation Manager (DTM), which executes the session.

48. What is a cache?
It stores temporary results while running the session.

49. What is an Expression transformation?
The Expression transformation is a passive transformation used for row-level calculations.

50. I have two sources, S1 having 100 records and S2 having 10,000 records, and I want to join them using a Joiner transformation. Which of these two sources (S1, S2) should be the master to improve performance? Why?
The master should be S1. In general, the master table should contain fewer rows and the detail table more rows: the cache is built for the master table rows, and for each master row the detail records are processed, so a smaller master improves cache performance.

51. I have a source and I want to generate sequence numbers using mappings in Informatica, but I don't want to use a Sequence Generator transformation. Is there any other way to do it?
1. We can use an Oracle sequence.
2. In an Expression transformation, declare a variable port and increment it by 1 for every row processed.
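A minimal sketch of option 2 (port names are hypothetical): in an Expression transformation, a variable port increments once per row and an output port carries the generated number to the target.

    v_seq (variable port, initial value 0) = v_seq + 1
    o_seq (output port)                    = v_seq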

52. What is a bad file?
A bad file is a file that contains the rejected rows.

53. What is the first column of the bad file?
The row indicator (0 = insert, 1 = update, 2 = delete, 3 = reject).

54. What are the contents of the cache directory on the server?
Data cache files (output column data other than the condition columns) and index cache files (the condition columns).
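For illustration (the values are hypothetical), a line in a .bad file begins with the row indicator, and each column value is followed by a column indicator (D = valid data, O = overflow, N = null, T = truncated):

    0,D,1001,D,Smith,D,,N

Here 0 marks the row for insert, the first two columns hold valid data, and the last column is null.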

55. Is a Lookup an active or a passive transformation?
Passive.

56. What is a mapping?

A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. Mappings represent the data flow between sources and targets. When the Integration Service runs a session, it uses the instructions configured in the mapping to read, transform, and write data. Every mapping must contain the following components: Source definition. Describes the characteristics of a source table or file. Transformation. Modifies data before writing it to targets. Use different transformation objects to perform different functions. Target definition. Defines the target table or file. Links. Connect sources, targets, and transformations so the Integration Service can move the data as it transforms it.

57. What are the types of transformations?
Active and passive. An active transformation can change the number of rows that pass through it; a passive transformation cannot change the number of rows that pass through it.

58. If a Sequence Generator (with increment of 1) is connected to, say, 3 targets and each target uses the NEXTVAL port, what value will each target get?
If the 3 targets use the NEXTVAL output port from the same transformation instance through a single flow, the 3 targets get the same value. If the NEXTVAL port feeds the 3 targets through 3 different flows, the targets get different values; each target sees every third number (e.g. 1, 4, 7).

59. Have you used the Abort and Decode functions?
ABORT can be used to stop transforming at a given row. Generally, you use ABORT within an IIF or DECODE function to set rules for aborting a session.
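A hedged sketch of the kind of expression meant here (the port name and message are hypothetical):

    IIF(ISNULL(CUSTOMER_ID), ABORT('CUSTOMER_ID must not be null'), CUSTOMER_ID)

If the condition is met, ABORT stops processing at that row and aborts the session; otherwise the port value passes through unchanged.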

60. What do you know about the Informatica server architecture?
Load Manager, DTM, Reader, Writer, Transformer. The Load Manager is the first process started when the session runs. It manages session and batch scheduling, locks the session and reads the session properties, reads the parameter file, validates the parameter values and session variables, creates session logs, and creates the DTM process. After the Load Manager performs validations for the session, it creates the DTM process. The DTM process is the second process associated with the session run. The primary purpose of the DTM process is to create and manage threads that carry out the session tasks. The DTM allocates process memory for the session and divides it into buffers; this is also known as buffer memory. It creates the main thread, called the master thread, which creates and manages all other threads. If we partition a session, the DTM creates a set of threads for each partition to allow concurrent processing.

61. What are the default values for variables?
String - empty string, numeric - 0, date - 1/1/1753.

62. How many ways can you filter the records?
Filter, Router, Source Qualifier, Rank, or Update Strategy transformations.

63

How do you identify the bottlenecks in Mappings?

We should look for performance bottlenecks in the following order: target, source, mapping, session, system.
Identifying target bottlenecks: the most common performance bottleneck occurs when the Informatica server writes to a target database. You can identify target bottlenecks by configuring the session to write to a flat file target. If session performance increases significantly when you write to a flat file, you have a target bottleneck. If your session already writes to a flat file target, you probably do not have a target bottleneck. You can optimize session performance by writing to a flat file target local to the Informatica server. Causes of a target bottleneck may include small checkpoint intervals, small database network packet size, or problems during heavy loading operations.
Identifying source bottlenecks: if the session reads from a flat file source, we probably do not have a source bottleneck; you can improve session performance by setting the number of bytes the Informatica server reads per line. If the session reads from a relational source, we can use a Filter transformation, a read test mapping, or a database query to identify source bottlenecks.
Using a read test session, follow these steps to create a read test mapping: 1. Make a copy of the original mapping. 2. In the copied mapping, keep only the sources, source qualifiers, and any custom joins or queries. 3. Remove all other transformations. 4. Connect the source qualifiers to a file target. Use the read test mapping in a test session. If the test session performance is similar to the original session, you have a source bottleneck.

64

How to improve the Session performance?

If we do not have a source, target, or mapping bottleneck, you may have a session bottleneck. You can identify a session bottleneck by using the performance details. The Integration Service creates performance details when you enable Collect Performance Data in the Performance settings on the session properties. Performance details display information about each transformation. All transformations have some basic counters that indicate the number of input rows, output rows, and error rows. Small cache size, low buffer memory, and small commit intervals can cause session bottlenecks. 1. we can implement partition concept at session level 2. we can increase the cache size, commit intervals 3. by running concurrent sessions 4. by optimizing the transformations 5. reduce the error records.

65. What is the Business Components folder? Where does it exist?
The Business Components node is available inside each repository folder.

66. What are shortcuts? Where are they used?
By dragging an object from one folder to another folder we can create shortcuts; to create shortcuts the source folder must be shared. After dragging the object we can rename it, but we cannot change anything else. Whenever the original object changes, all the changes are inherited by the shortcut as well.

67. While importing the relational source definition from the database, what metadata of the source do you import?
Source name, database name, column names, datatypes, and constraints.

68. How many ways can you update a relational source definition and what are they?
1. We can reimport the source definition.
2. Manually edit the definition by adding new columns or updating the existing columns.

69. What are the unsupported repository objects for a mapplet?
1. Normalizer transformation
2. XML transformations
3. XML targets
4. COBOL sources
5. Target definitions
6. Other mapplets
7. Pre- and post-session stored procedures
8. Joiner transformation
9. Non-reusable sequence generators

70. What are mapping parameters and mapping variables?
A mapping parameter represents a constant value that you can define before running a session; it retains the same value throughout the entire session. When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session. Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.

71. Can you use the mapping parameters or variables created in one mapping in another mapping?
No. We can use them only when the same mapping parameters and variables are also created in the other mapping.

72. Can you use the mapping parameters or variables created in one mapping in any other reusable transformation?
Yes.

73. How can you improve session performance with an Aggregator transformation?
By using the Sorted Input option.

74. What are the differences between the Joiner transformation and the Source Qualifier transformation?
The Joiner transformation is used to join heterogeneous sources; the Source Qualifier can join only tables in the same database.

75

In which conditions can we not use the Joiner transformation (limitations of the Joiner transformation)?

1. If an input pipeline comes from an Update Strategy transformation.
2. If an input pipeline comes directly from a Sequence Generator transformation.

76. What are the settings that you use to configure the Joiner transformation?
The join type (normal, master outer, detail outer, full outer), the join condition, and which source is marked as master and which as detail.

77. What are the join types in the Joiner transformation?
Normal, master outer, detail outer, and full outer.

78. How does the Informatica server sort string values in the Rank transformation?
If the data movement mode is ASCII, the string sort uses the binary sort order. If the data movement mode is Unicode, the string sort uses the sort order defined in the session properties.

79. What is the rank index in the Rank transformation?

The Designer automatically creates a rank index port for each Rank transformation. The PowerCenter server uses the rank index port to store the ranking position for each row in a group. It is an output port only.

80

What is the Router transformation?

The Router transformation is similar to the Filter transformation because both transformations are used to test a condition. A Filter transformation tests the data for one condition and drops the rows that do not meet the condition. A Router transformation can test data for one or more conditions and gives you the option to route rows that do not meet any of the conditions to a default output group. A Router transformation contains one input group, one or more output groups, and one default group.

81

What are the types of groups in a Router transformation?
1. Input group.
2. Output groups: two types, user-defined groups and the default group (we cannot change the default group).

82. Why do we use the Stored Procedure transformation?
To call stored procedures from the database. Stored Procedure transformations run in either connected or unconnected mode.

83. What are the types of data that pass between the Informatica server and the stored procedure?
We can send data to the stored procedure and also get data back from it. There are three types of data that pass between the Informatica server and the stored procedure: input/output parameter values, a return value, and a status code.

84

What is the status code?

Status codes provide error handling for the Integration Service during a workflow. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. You cannot see this value. The Integration Service uses it to determine whether to continue running the session or stop. You configure options in the Workflow Manager to continue or stop the session in the event of a stored procedure error.

85. What are the tasks that the Source Qualifier performs?
When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier transformation represents the rows that the Integration Service reads when it runs a session. The following tasks can be done using the SQ transformation:
1. Join data coming from the same database
2. Filter the rows when the Informatica server reads the source data
3. Use an outer join instead of a normal join
4. Specify sorted ports
5. Select distinct values

86. What is the default join that the Source Qualifier provides?
Normal (inner) join.

87. What are the basic requirements to join two sources in a Source Qualifier?
The two sources should come from the same database, and the datatypes of the columns in the join condition must match.

88. What is the Update Strategy transformation?
For handling changes to existing rows we go for the Update Strategy. When you design a data warehouse, you need to decide what type of information to store in targets; as part of the target table design, you need to determine whether to maintain all the historic data or just the most recent changes.

89. Describe the two levels at which the update strategy is set.
Mapping level and session level. At the mapping level we use the Update Strategy transformation to flag rows for insert, update, delete, or reject. At the session level we set the "Treat source rows as" property to insert, update, delete, or data driven.

90. What is Data Driven?
If the mapping for the session contains an Update Strategy transformation, this field is marked Data Driven by default. If you do not choose Data Driven when a mapping contains an Update Strategy, the Workflow Manager displays a warning, and when you run the session the Integration Service does not follow the instructions in the Update Strategy transformation to determine how to flag rows.
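As a hedged sketch (the lookup port name is hypothetical), a typical Update Strategy expression at the mapping level flags each row for insert or update depending on whether a Lookup found the row in the target:

    IIF(ISNULL(lkp_CUSTOMER_KEY), DD_INSERT, DD_UPDATE)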

91. What are the options in the target session properties for the Update Strategy transformation?
Insert - select this option to insert a row into a target table.
Delete - select this option to delete a row from a table.
Update - you have the following options: Update as Update (update each row flagged for update if it exists in the target table), Update as Insert (insert each row flagged for update), Update else Insert (update the row if it exists, otherwise insert it).
Truncate table - select this option to truncate the target table before loading data.

92. What are the mappings that we use for slowly changing dimension tables?
Type 1, Type 2, and Type 3 mappings.

93. What are the different types of Type 2 dimension mapping?
Type 2 version data mapping, Type 2 flag current mapping, and Type 2 effective date range mapping.

94. How can you recognize whether or not the newly added rows in the source get inserted in the target?
We can check by using the version, flag, or effective date of the particular records.

95. What are the two types of processes that Informatica runs for a session?
Load Balancer: when you run a workflow, the Load Balancer dispatches the Session, Command, and predefined Event-Wait tasks within the workflow. The Load Balancer matches task requirements with resource availability to identify the best node to run a task. It dispatches the task to an Integration Service process running on the node; it may dispatch tasks to a single node or across nodes.
DTM process: the Integration Service starts a DTM process to run each Session and Command task.

96. Can you generate reports in Informatica?
Yes.

97. What is the Metadata Reporter?
We can generate reports by using the Metadata Reporter.

98. Define mapping and session.
Mapping: a mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. Mappings represent the data flow between sources and targets. When the Integration Service runs a session, it uses the instructions configured in the mapping to read, transform, and write data. Session: a session is a set of instructions that tells the Informatica server how and when to move data from sources to targets.

99. Which tool do you use to create and manage sessions and batches and to monitor and stop the Informatica server?
The Informatica Server Manager.

100. Why do we use partitioning of the session in Informatica?

101. To achieve session partitioning, what are the necessary tasks you have to do?

Partitioning is used to increase session performance: when running sessions, the PowerCenter server can achieve high performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel. To accomplish this, use the following session and server configuration: configure the session with multiple partitions, and install the PowerCenter server on a machine with multiple CPUs. You can configure the partition type at most transformations in the pipeline; the PowerCenter server can partition data using round-robin, hash, key-range, database, or pass-through partitioning.

102. How does the Informatica server increase session performance through partitioning the source?
When you run a session that partitions relational or Application sources, the Integration Service creates a separate connection to the source database for each partition. It then creates an SQL query for each partition. You can customize the query for each source partition by entering filter conditions in the Transformation view on the Mapping tab, and you can also override the SQL query for each source partition using the Transformations view on the Mapping tab.

103. Why do you use repository connectivity?
Each time you edit or schedule a session, the Informatica server communicates directly with the repository to check whether or not the session and users are valid. All the metadata of sessions and mappings is stored in the repository.

104. What are the tasks that the Load Manager process performs?
The Load Manager is the primary Informatica server process. It manages workflow and batch scheduling, locks the workflow and reads workflow properties, reads the parameter file, expands the server and workflow variables and parameters, verifies permissions and privileges, validates source and target code pages, creates the workflow log file, and creates the Data Transformation Manager (DTM), which executes the session.

105. What is the DTM process?
After the Load Manager performs validations for the session, it creates the DTM process. The DTM process is the second process associated with the session run. The primary purpose of the DTM process is to create and manage threads that carry out the session tasks. The DTM allocates process memory for the session and divides it into buffers; this is also known as buffer memory. It creates the main thread, called the master thread, which creates and manages all other threads. If we partition a session, the DTM creates a set of threads for each partition to allow concurrent processing.

106. What are the different threads in the DTM process?
Master thread - the main thread of the DTM process; creates and manages all other threads.
Mapping thread - one thread per session; fetches session and mapping information.
Pre- and post-session threads - one thread each to perform pre- and post-session operations.
Reader thread - one thread for each partition for each source pipeline.
Writer thread - one thread for each partition, if a target exists in the pipeline, to write to the target.
Transformation thread - one or more transformation threads for each partition.

107. What are the data movement modes in Informatica?
ASCII and Unicode.

108. What are the output files that the Informatica server creates while the session is running?
The PowerCenter server creates the following output files: PowerCenter server log, workflow log file, session log file, session details file, performance details file, reject files, row error logs, recovery tables and files, control file, post-session email, output file, and cache files.

109. In which circumstances does the Informatica server create reject files?
When a row is flagged for reject by an Update Strategy or Custom transformation; when it violates a database constraint such as a primary key constraint; or when a field in the row was truncated or overflowed and the target database is configured to reject truncated or overflowed data.

110. Can you copy a session to a different folder or repository?
Yes.

111. What is a batch? Describe the types of batches.
Grouping sessions is called a batch. There are two types of batches: sequential and concurrent.

112. Can you copy batches?
No.

113. How many sessions can you create in a batch?
Any number of sessions.

114. When does the Informatica server mark a batch as failed?
When one of its sessions fails and that session's property indicates it should run only if the previous session completed successfully.

115. What command is used to run a batch?
The pmcmd command.

116. What are the different options used to configure sequential batches?
1. Run the session only if the previous session completed successfully.
2. Always run the session.

117. In a sequential batch, can you run a session if the previous session fails?
Yes, by setting the "always run the session" property.

118. Can you start batches within a batch?
No.

119. Can you start a session inside a batch individually?
Yes.

120. How can you stop a batch?
By using the pmcmd command.

121. What are the session parameters?
Session parameters represent values you can change between session runs, such as database connections or source and target files. You use a session parameter in session or workflow properties and define the parameter value in a parameter file. You can also create workflow variables in the workflow properties and define their values in a parameter file. Examples: database connections, cache file directory, source file names, target file names, reject file names.

122. What is a parameter file?
A parameter file contains the mapping parameters, mapping variables, and session parameters.

123. How can you access a remote source in your session?
1. We need to create a database connection for a relational source.
2. We need to create an FTP connection for flat files.
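A hedged sketch of a parameter file; the folder, workflow, session, connection and parameter names are hypothetical:

    [MyFolder.WF:wf_daily_load.ST:s_m_load_customers]
    $$LoadDate=2024-01-31
    $DBConnection_Source=ORA_SRC_DEV
    $InputFile_Customers=/data/incoming/customers.dat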

124. What is the difference between partitioning of relational targets and partitioning of file targets?

If we create partitions on a relational target, the server creates a separate database connection for each partition, and each partition runs its own SQL and writes to the target. If we create partitions on a file target, the server creates multiple threads to write to the target file.

125. What are the transformations that restrict the partitioning of sessions?

Normalizer transformation, XML targets, and the Joiner transformation (on the master pipeline).

126. How do you do performance tuning in Informatica?

After finding the bottlenecks in the mapping:
Optimizing the target: if it is a relational target, drop indexes and key constraints during the load, increase checkpoint intervals, use bulk loading, use external loading, minimize deadlocks, increase the database network packet size, and optimize Oracle target databases. If the target is a flat file, move the file to a disk local to the server.
Optimizing the source: optimize the query, use conditional filters, and increase the network packet size.
Optimizing the mapping: use Source Qualifier/Filter transformations to filter the data early, remove unnecessary datatype conversions, minimize aggregate function calls, and tune the transformations.
Optimizing the session: reduce error tracing, increase the commit interval, increase the index and data cache sizes, allocate enough buffer memory, run sessions and workflows concurrently, remove staging areas, and use pushdown optimization.

127

Define the Informatica repository.
The Informatica repository is a relational database that stores information, or metadata, used by the Informatica server and client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica server to perform the transformations, and connect strings for sources and targets. The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version. Use the Repository Manager to create the repository: the Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica server and client tools use.

128. What are the types of metadata stored in the repository?
The following types of metadata are stored in the repository: database connections, global objects, mappings, mapplets, multidimensional metadata, reusable transformations, sessions and batches, shortcuts, source definitions, target definitions, and transformations.

129. What is the PowerCenter repository?
The PowerCenter repository is used to store Informatica's metadata. Information such as mapping names, locations, target definitions, source definitions, transformations, and flows is stored as metadata in the repository.

130. How can you work with a remote database in Informatica? Did you work directly with it or import the source/target objects into the local machine?
We have to create a remote connection for this, but working directly against a remote database is not advisable in Informatica; instead we import the source/target objects into the local machine.

131

what is incremental aggregation?

Incremental aggregation is used with the Aggregator transformation. Once the Aggregator transformation is placed in a mapping, we need to check the Incremental Aggregation property in the session properties so that the data is aggregated incrementally. The first time you run an incremental aggregation session, the Integration Service processes the source. At the end of the session, the Integration Service stores the aggregated data in two cache files, the index and data cache files, saved in the cache file directory. The next time you run the session, the Integration Service aggregates the new rows with the cached aggregated values in the cache files. When you run a session with an incremental Aggregator transformation, the Integration Service creates a backup of the Aggregator cache files in $PMCacheDir at the beginning of a session run. The Integration Service promotes the backup cache to the initial cache at the beginning of a session recovery run. The Integration Service cannot restore the backup cache file if the session aborts.

132

What are the scheduling options to run a session?

You can schedule a workflow to run continuously, repeat at a given time or interval, or you can manually start a workflow. The Integration Service runs a scheduled workflow as configured. By default, the workflow runs on demand. You can change the schedule settings by editing the scheduler; if you change the schedule settings, the Integration Service reschedules the workflow according to the new settings. Scheduling options: Run on server initialization - the Integration Service runs the workflow as soon as the service is initialized; Run on demand - the Integration Service runs the workflow when we start it manually.

133. What is tracing level and what are the types of tracing levels?
Normal: the Integration Service logs initialization and status information, errors encountered, and skipped rows due to transformation row errors; it summarizes session results, but not at the level of individual rows.
Terse: the Integration Service logs initialization information, error messages, and notification of rejected data.
Verbose initialization: in addition to normal tracing, the Integration Service logs additional initialization details, names of index and data files used, and detailed transformation statistics.
Verbose data: in addition to verbose initialization tracing, the Integration Service logs each row that passes into the mapping and provides detailed transformation statistics.

134. What is the difference between the Stored Procedure transformation and the External Procedure transformation?
In a Stored Procedure transformation, the procedure is compiled and executed in a relational data source; you need a database connection to import the stored procedure into the mapping. In an External Procedure transformation, the procedure or function is executed outside the data source; it has to be built as a DLL to be accessed in the mapping, and no database connection is needed.

135. Explain recovering sessions.
If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of the mapping, session, and Informatica server configuration. Use one of the following methods to complete the session: run the session again if the Informatica server has not issued a commit; truncate the target tables and run the session again if the session is not recoverable; consider performing recovery if the Informatica server has issued at least one commit.

136. If a session fails after loading 10,000 records into the target, how can you load the records from the 10,001st record when you run the session the next time?
By using the Perform Recovery option in the session properties.

137. Explain Perform Recovery.
When the Informatica server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The Informatica server then reads all sources again and starts processing from the next row ID. For example, if the Informatica server commits 10,000 rows before the session fails, when you run recovery the Informatica server bypasses the rows up to 10,000 and starts loading with row 10,001. By default, Perform Recovery is disabled in the Informatica server setup. You must enable Recovery in the Informatica server setup before you run a session so the Informatica server can create and/or write entries in the OPB_SRVR_RECOVERY table.

138

How do you recover a standalone session?
A standalone session is a session that is not nested in a batch. If a standalone session fails, you can run recovery using a menu command or pmcmd; these options are not available for batched sessions. To recover a session using the menu: 1. In the Server Manager, highlight the session you want to recover. 2. Select Server Requests > Stop from the menu. 3. With the failed session highlighted, select Server Requests > Start Session in Recovery Mode from the menu. To recover a session using pmcmd: 1. From the command line, stop the session. 2. From the command line, start recovery.

139. How can you recover a session in sequential batches?
If you configure a session in a sequential batch to stop on failure, you can run recovery starting with the failed session; the Informatica server completes the session and then runs the rest of the batch. Use the Perform Recovery session property. To recover sessions in sequential batches configured to stop on failure: 1. In the Server Manager, open the session property sheet. 2. On the Log Files tab, select Perform Recovery, and click OK. 3. Run the session. 4. After the batch completes, open the session property sheet. 5. Clear Perform Recovery, and click OK. If you do not clear Perform Recovery, the next time you run the session the Informatica server attempts to recover the previous session. If you do not configure a session in a sequential batch to stop on failure, and the remaining sessions in the batch complete, recover the failed session as a standalone session.

140

How to recover sessions in concurrent batches?

If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However, if a session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the session as a standalone session. To recover a session in a concurrent batch: 1.Copy the failed session using Operations-Copy Session. 2.Drag the copied session outside the batch to be a standalone session. 3.Follow the steps to recover a standalone session. 4.Delete the standalone copy.

141

How can you complete unrecoverable sessions?
Under certain circumstances, when a session does not complete, you need to truncate the target tables and run the session from the beginning. Run the session from the beginning when the Informatica server cannot run recovery or when running recovery might result in inconsistent data.

142. In what circumstances does the Informatica server result in an unrecoverable session?
The Source Qualifier transformation does not use sorted ports; you change the partition information after the initial session fails; Perform Recovery is disabled in the Informatica server configuration; the sources or targets change after the initial session fails; the mapping contains a Sequence Generator or Normalizer transformation; or a concurrent batch contains multiple failed sessions.

143. If I have made modifications to my table in the back end, are they reflected in the Informatica warehouse, Mapping Designer, or Source Analyzer?
No, they are not reflected automatically; we need to reimport the objects or add the changes manually.

144. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can you map these ports directly to the target?
Only if we can join the three sources in the Source Qualifier can we map the ports to the target.

145. What are the server variables?
$PMCacheDir, $PMBadFileDir, $PMSourceFileDir, $PMTargetFileDir, $PMSessionLogDir, $PMWorkflowLogDir.

146. What are folders?
Folders provide a way to organize and store all metadata in the repository, including mappings, schemas, and sessions. Folders are designed to be flexible, to help you logically organize the repository. Each folder has a set of configurable properties that help you define how users access the folder. For example, you can create a folder that allows all repository users to see objects within the folder, but not to edit them. Or, you can create a folder that allows users to share objects within the folder. You can create shared and non-shared folders.

147. Multiple servers

Sl. No. Questions
1. What is a Data Warehouse?
2. What is the conventional definition of a DWH? Explain each term.
3. Draw the architecture of a Datawarehousing system.
4. What are the goals of the Data warehouse?
5. What are the approaches in constructing a Datawarehouse and the datamart?
6. What is a Data Mart?
7. Can a datamart be independent?
8. What are the sources for a datawarehouse?
9. What is the difference between a database, a data warehouse and a data mart?
10. What is OLAP (On-Line Analytical Processing)?
11. What do you mean by Multidimensional Analysis?
12. What is the difference between OLAP, ROLAP, MOLAP and DOLAP?
13. What is the difference between OLAP and OLTP?
14. What are the different types of OLAP? Give an example.
15. Which is the suitable data model for a datawarehouse? Why?
16. What is a Star Schema?
17. What are the benefits of a STAR SCHEMA?
18. What are Additive Facts, or what is meant by an Additive Fact?
19. What is a Snowflake Schema?
20. What is a Galaxy schema?
21. What are a Dimension and a Fact?
22. What are the different types of Dimensions?
23. Are the dimension tables normalized? If so, when?
24. What are a Transaction fact table and a Centipede fact table?
25. What are the different types of Facts?
26. What are the types of Factless fact tables?
27. What is Granularity?
28. Is the Fact table normalized?
29. Can 2 Fact tables share the same dimension tables?
30. Give examples of the fact tables, dimension tables, and datamarts/DWH used in your project. Explain what data each contains.
31. What are fact constellations?
32. What is a Factless fact table?
33. What is metadata?
34. What is data quality?
35. How do you achieve data quality?
36. What are Parsing and Data Mining?
37. What are surrogate keys?
38. Name a few data modelling tools.
39. What are materialized views?
40. Can you insert into materialized views?
41. What is the definition of ad hoc queries?
42. What are ODS (Operational Data Store), DSS (Decision Support System), Data Staging Area, and Data Presentation Area?
43. What is Market-Basket analysis?
44. What are the SCD types?
45. What is a Hypercube?
47. Explain the performance improvement techniques in DW.
48. Explain slice and dice.
Answers

1. A data warehouse is a relational database designed for query and analysis purposes. It contains historical data derived from transaction data, and it also includes data from other sources.

2. Characteristics of a data warehouse:
Subject-oriented: the data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.
Time-variant: the changes to the data in the data warehouse are tracked and recorded so that reports can be produced showing changes over time.
Non-volatile: data in the data warehouse is never over-written or deleted; once committed, the data is static, read-only, and retained for future reporting.
Integrated: the data warehouse contains data from most or all of an organization's operational systems, and this data is made consistent.

4. The main goals of our data warehouse are to: 1. understand the users' needs by business area and job responsibilities; 2. determine the decisions the business users want to make with the help of the data warehouse; 3. choose the most effective, actionable subset of the OLTP data to present in the data warehouse; 4. make sure the data is accurate and can be trusted, labeling it consistently across the enterprise; 5. continuously monitor the accuracy of the data and the content of the delivered reports; 6. publish the data on a regular basis.

5. The top-down approach and the bottom-up approach. In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. Data marts contain atomic data and, if necessary, summarized data. These data marts can eventually be unioned together to create a comprehensive data warehouse.

6. A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject, that may be distributed to support business needs. Data marts are analytical data stores designed to focus on specific business functions for a specific community within an organization. Data marts are often derived from subsets of data in a data warehouse, though in the bottom-up data warehouse design methodology the data warehouse is created from the union of organizational data marts.

7. Data marts can be used by small organizations, so a data mart can be independent.

8. OLTP systems and other operational sources.

9. Database: 1. it is a collection of data (online transaction processing); 2. normalized form; 3. complex joins; 4. more DML operations; 5. holds current data. Data warehouse: 1. it is a relational database for query and analytical purposes; 2. partially normalized/denormalized form; 3. fewer joins to retrieve data; 4. read-mostly data; 5. holds current and historical data.

10. OLAP (On-Line Analytical Processing): 1. online analytical processing; 2. read-only data; 3. partially normalized/denormalized tables; 4. holds current and historical data; 5. records are based on surrogate keys; 6. records are not deleted; 7. simplified data model.

11. Multidimensional analysis is a data analysis process that groups data into two basic categories: data dimensions and measurements.

12. OLAP - Online Analytical Processing; when the data is held in a normalized store, complex joins have to be performed to retrieve it. ROLAP - Relational OLAP, which provides multidimensional analysis of data stored in a relational database (RDBMS). MOLAP - Multidimensional OLAP, which provides analysis of data stored in a multi-dimensional data cube. HOLAP - Hybrid OLAP, a combination of both ROLAP and MOLAP, which can provide multidimensional analysis simultaneously of data stored in a multidimensional database and in a relational database (RDBMS). DOLAP - Desktop OLAP or Database OLAP, which provides multidimensional analysis locally on the client machine on data collected from relational or multidimensional database servers.

13. OLAP: 1. online analytical processing; 2. read-only data; 3. partially normalized/denormalized tables; 4. holds current and historical data; 5. records are based on surrogate keys; 6. records are not deleted; 7. simplified data model. OLTP: 1. online transaction processing; 2. continuously updated data; 3. normalized form; 4. holds current data; 5. records are maintained on primary key fields; 6. tables or records can be deleted; 7. complex data model.

14. In OLAP, there are mainly two different types: Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP). Hybrid OLAP (HOLAP) refers to technologies that combine MOLAP and ROLAP.

15. The star schema. It is a denormalized model, so there is no need to use complicated joins and queries return results quickly.

16. The star schema is the simplest data warehouse design. The main feature of the schema is a table at the center called the fact table, surrounded by dimension tables. In a star schema, fact tables are in third normal form while dimension tables are denormalized.

17. It is a denormalized model, so there is no need to use complicated joins and queries return results quickly.

18. Additive facts are facts that can be summed up across all of the dimensions in the fact table.

19. A snowflake schema is a more complex variation of the star schema design. The main difference is that the dimension tables in a snowflake schema are normalized, so they have a typical relational database design. It is used when a dimension table becomes very big and the star schema cannot represent the complexity of the data structure.

20. The galaxy schema is more complex than the star and snowflake schemas because it contains multiple fact tables. This allows dimension tables to be shared among many fact tables. It is very hard to manage.
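As an illustration of an additive fact in a star schema (the table and column names are hypothetical), the sales amount can be summed across any dimension, here the calendar year of a date dimension:

    SELECT d.CALENDAR_YEAR, SUM(f.SALES_AMOUNT) AS TOTAL_SALES
    FROM SALES_FACT f
    JOIN DATE_DIM d ON d.DATE_KEY = f.DATE_KEY
    GROUP BY d.CALENDAR_YEAR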

21. A dimension contains the descriptive information for a fact table. The size of a dimension table is smaller than that of a fact table, and a schema contains more dimension tables than fact tables. A surrogate key is used to prevent primary key violations, and column values may be numeric or textual. A fact contains measurements. The size of a fact table is larger than a dimension table, a schema contains fewer fact tables than dimension tables, and the column values are always numeric.

22. Junk dimension: when you consolidate lots of small dimensions, instead of having hundreds of mini identifier tables with only a few records each cluttering the database, all records from these small dimension tables are loaded into one dimension table, which we call the junk dimension. Conformed dimension: a dimension used by multiple fact tables, for example a time or date dimension. Degenerate dimension. Slowly changing dimensions.

23. If the schema is a snowflake, then the dimensions are in normalized form.

25. Additive facts, semi-additive facts, and non-additive facts.

26. A fact table which does not contain any facts is called a factless fact table. The first type of factless fact table is a table that records an event; many event-tracking tables in dimensional data warehouses turn out to be factless. A second kind of factless fact table is called a coverage table.

27. Part of designing a fact table is determining the granularity of the fact table.

28. Yes, the fact table will be in third normal form.

29. Yes.

30. The rating identity instrument fact table is the fact table; date_dim, all_org_dim, and instrument_dim are dimensions.

31. For each star schema it is possible to construct a fact constellation schema (for example by splitting the original star schema into more star schemas, each of which describes facts at another level of the dimension hierarchies). The fact constellation architecture contains multiple fact tables that share many dimension tables. The main shortcoming of the fact constellation schema is a more complicated design, because many variants for particular kinds of aggregation must be considered and selected; moreover, the dimension tables are still large.

32. A fact table which does not contain any facts is called a factless fact table.

33. Metadata is information about data.

34. Data quality is the reliability and effectiveness of data, particularly in a data warehouse. Data quality assurance (DQA) is the process of verifying the reliability and effectiveness of data. Maintaining data quality requires going through the data periodically and scrubbing it.

35. To achieve good quality information, a database must have a good system design, accurate and complete data, a user-friendly interface, and data validation.

37. Surrogate keys are sequence-generated numbers; they are always of a numeric datatype.

38. ERwin, Rational Rose.

39. A materialized view is a table that actually contains rows but behaves like a view; that is, the data in the table changes when the data in the underlying tables changes, refreshed on a periodic basis.

40. No, we do not insert records into materialized views; instead a query refresh runs at certain intervals and the materialized view gets its data from the source tables.

41. Queries that are not predefined or anticipated, and usually run only once, are called ad hoc queries. These are typical in the data warehousing environment.

42. ODS: the Operational Data Store, which holds transactional data; the ODS is a source of the warehouse, and data from the ODS is staged, transformed, and then moved to the data warehouse. DSS: gathers and presents data from a wide range of sources, typically for business purposes; DSS applications are systems and subsystems that help people make decisions based on data pulled from a wide range of sources, and this data is used for analytical and reporting purposes. Data Staging Area: the data warehouse staging area is a temporary location where data from source systems is copied.

44. There are three types of Slowly Changing Dimensions. Type 1 SCD: the new information overrides the current information; no history is kept. Type 2 SCD: a new record is added to the table; history is available. Type 3 SCD: maintains partial history.

47. We need to find the bottleneck of the mapping and then proceed to improve the performance of the warehouse: by using partitions, query tuning, and indexing.
