Transform -- the process of converting the extracted data from its previous form into the required form.
Load -- the process of writing the data into the target database.
ETL is used to migrate data from one database to another, to build data marts and data warehouses, and to convert databases from one format to another.
It is used to retrieve data from various operational databases, transform it into useful information, and finally load it into the data warehousing system.
1. Informatica
2. Ab Initio
3. DataStage
4. BODI
5. Oracle Warehouse Builder
Report generation
In report generation, OLAP (online analytical processing) is used. It is a set of specifications that allows client applications to retrieve data for analytical processing.
It is a specialized tool that sits between a database and the user in order to provide various analyses of the data stored in the database.
An OLAP tool is a reporting tool which generates reports that are useful for decision support for top-level management.
1. Business Objects
2. Cognos
3. MicroStrategy
4. Hyperion
5. Oracle Express
6. Microsoft Analysis Services
OLTP vs OLAP
OLTP: Application oriented (e.g., a purchase order entry is part of an application's functionality) -- OLAP: Subject oriented
OLTP: Used to run the business -- OLAP: Used to analyze the business
OLTP: Detailed data -- OLAP: Summarized data
OLTP: Repetitive access -- OLAP: Ad-hoc access
OLTP: Current data -- OLAP: Historical data
OLTP: Clerical user -- OLAP: Knowledge user
OLTP: Record-by-record loading -- OLAP: Bulk loading
OLTP: Time invariant -- OLAP: Time variant
OLTP: Normalized data -- OLAP: De-normalized data
OLTP: E-R schema -- OLAP: Star schema
3.
What are the types of datawarehousing?
EDW (Enterprise data warehouse)
It provides a central database for decision support throughout the enterprise.
It is a collection of data marts.
DATAMART
It is a subset of the data warehouse.
It is a subject-oriented database which supports the needs of individual departments in an organization.
It is called a high-performance query structure.
It supports a particular line of business, like sales, marketing, etc.
ODS (Operational data store)
It is defined as an integrated view of operational databases designed to support operational monitoring.
It is a collection of operational data sources designed to support transaction processing.
Data is refreshed near real-time and used for business activity.
It is an intermediate layer between OLTP and OLAP which helps to create instant reports.
Entity -- Table
Attribute -- Column
Primary Key
Alternate Key
Rule
Relationship -- Foreign Key
Definition -- Comment
Star schema
Snow flake schema
Star flake schema (or) Hybrid schema
Multi star schema
What is Star Schema?
The star schema is a logical database design which contains a centrally located fact table surrounded by one or more dimension tables.
Since the database design looks like a star, it is called a star schema.
The dimension table contains primary keys and the textual descriptions.
It contains de-normalized business information.
A fact table contains a composite key and measures.
The measures are key performance indicators which are used to evaluate the enterprise performance in the form of success and failure.
Eg: Total revenue, product sale, discount given, number of customers.
To generate a meaningful report, the report should contain at least one dimension table and one fact table, as in the query sketched below.
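For illustration, a minimal report query against a hypothetical star schema (the sales_fact, product_dim and time_dim table and column names below are assumed, not part of these notes):

SELECT d.product_name,
       t.year,
       SUM(f.total_revenue) AS total_revenue,
       SUM(f.quantity_sold) AS quantity_sold
FROM   sales_fact f
JOIN   product_dim d ON f.product_key = d.product_key
JOIN   time_dim    t ON f.time_key    = t.time_key
GROUP BY d.product_name, t.year;

The fact table supplies the measures (revenue, quantity) and the dimension tables supply the textual descriptions used to group and label the report.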
The advantages of a star schema:
Fewer joins
Improved query performance
Slicing down
Easy understanding of data.
Disadvantage:
Requires more storage space.
In a snowflake schema, by contrast, dimension hierarchies are broken out into separate tables. These hierarchies help to drill down the data from the topmost level to the lowermost level.
Star flake schema (or) Hybrid Schema
A hybrid schema is a combination of the star and snowflake schemas.
Multi Star schema
Multiple fact tables sharing a set of dimension tables.
Conformed Dimensions are nothing but reusable dimensions.
The dimensions which you use multiple times or in multiple data marts are common across those data marts.
Measure Types (or) Types of Facts
Additive - Measures that can be summed up across all dimensions.
Semi Additive - Measures that can be summed up across some dimensions and not others.
Non Additive - Measures that cannot be summed up across any dimension.
Surrogate Key
Joins between fact and dimension tables should be based on surrogate keys
Users should not obtain any information by looking at these keys
These keys should be simple integers
A sample data warehouse schema
WHY DO WE NEED A STAGING AREA FOR DWH?
The staging area is needed to clean operational data before loading it into the data warehouse.
Cleaning here means merging data which comes from different sources.
It is the area where most of the ETL is done.
Data Cleansing
It is used to remove duplications.
It is used to correct wrong email addresses.
It is used to identify missing data.
It is used to convert the data types.
It is used to capitalize names & addresses.
Types of Dimensions:
The main types of dimensions are:
Conformed Dimensions
Junk Dimensions (Garbage Dimensions)
Degenerate Dimensions
Slowly Changing Dimensions
Garbage Dimension or Junk Dimension
Conformed is something which can be shared by multiple fact tables or multiple data marts.
A junk dimension is a grouping of flag values.
A degenerate dimension is something dimensional in nature but exists in the fact table (e.g., Invoice No).
It is neither a fact nor strictly a dimension attribute, but it is useful for some kinds of analysis; it is kept as an attribute in the fact table and called a degenerate dimension.
Degenerate dimension: a column in the key section of the fact table that does not have an associated dimension table but is used for reporting and analysis; such a column is called a degenerate dimension or line-item dimension.
For example, we have a fact table with customer_id, product_id, branch_id, employee_id, bill_no, and date in the key section and price, quantity, and amount in the measure section. In this fact table, bill_no from the key section is a single value; it has no associated dimension table. Instead of creating a separate dimension table for that single value, we can include it in the fact table to improve performance. So here the column bill_no is a degenerate dimension or line-item dimension.
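A hedged, Oracle-style sketch of such a fact table (hypothetical names and data types; bill_no is the degenerate dimension because it has no dimension table of its own):

CREATE TABLE sales_fact (
    customer_id  NUMBER,        -- FK to customer dimension
    product_id   NUMBER,        -- FK to product dimension
    branch_id    NUMBER,        -- FK to branch dimension
    employee_id  NUMBER,        -- FK to employee dimension
    bill_no      NUMBER,        -- degenerate dimension: no dimension table of its own
    sale_date    DATE,          -- FK to date dimension
    price        NUMBER(10,2),  -- measure
    quantity     NUMBER,        -- measure
    amount       NUMBER(12,2)   -- measure
);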
Informatica Architecture
The Power Center domain
It is the primary unit of administration.
An installation can have a single domain or multiple domains.
It is a collection of nodes and services.
Nodes
A node is the logical representation of a machine in a domain.
One node in the domain acts as a gateway node to receive service requests from clients and route them to the appropriate service and node.
Integration Service:
Integration Service does all the real job. It extracts data from sources, processes it
as per the business logic and loads data to targets.
Repository Service:
Repository Service is used to fetch the data from the repository and send it back to the requesting components (mostly client tools and the Integration Service).
Power Center Repository:
Repository is nothing but a relational database which stores all the metadata created
in Power Center.
Power Center Client Tools:
The Power Center Client consists of multiple tools.
Power Center Administration Console:
This is simply a web-based administration tool you can use to administer the Power
Center installation.
Q. How can you define a transformation? What are different types of
transformations available in Informatica?
A. A transformation is a repository object that generates, modifies, or passes data. The
Designer provides a set of transformations that perform specific functions. For example,
an Aggregator transformation performs calculations on groups of data. Below are the
various transformations available in Informatica:
Aggregator
Custom
Expression
External Procedure
Filter
Input
Joiner
Lookup
Normalizer
Rank
Router
Sequence Generator
Sorter
Source Qualifier
Stored Procedure
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier
Q. What is a source qualifier? What is meant by Query Override?
A. Source Qualifier represents the rows that the PowerCenter Server reads from a
relational or flat file source when it runs a session. When a relational or a flat file source
definition is added to a mapping, it is connected to a Source Qualifier transformation.
The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier has the capability to override this default query by changing the default settings of the transformation properties. The list of selected ports, and the order in which they appear in the default query, should not be changed in the overridden query.
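A minimal sketch of what an overridden Source Qualifier query might look like, assuming hypothetical CUSTOMERS and ORDERS source tables (the selected columns and their order must match the Source Qualifier ports):

SELECT CUSTOMERS.CUSTOMER_ID,
       CUSTOMERS.CUSTOMER_NAME,
       ORDERS.ORDER_DATE,
       ORDERS.ORDER_AMOUNT
FROM   CUSTOMERS, ORDERS
WHERE  CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
AND    ORDERS.ORDER_DATE >= TO_DATE('01-01-2010', 'DD-MM-YYYY');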
Q. What is aggregator transformation?
A. The Aggregator transformation allows performing aggregate calculations, such as
averages and sums. Unlike Expression Transformation, the Aggregator transformation
can only be used to perform calculations on groups. The Expression transformation
permits calculations on a row-by-row basis only.
Aggregator Transformation contains group by ports that indicate how to group the data.
While grouping the data, the aggregator transformation outputs the last row of each
group unless otherwise specified in the transformation properties.
Various group by functions available in Informatica are : AVG, COUNT, FIRST, LAST,
MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.
Q. What is Incremental Aggregation?
A. Whenever a session is created for a mapping with an Aggregator transformation, the session
option for Incremental Aggregation can be enabled. When PowerCenter performs
incremental aggregation, it passes new source data through the mapping and uses
historical cache data to perform new aggregation calculations incrementally.
Q. How Union Transformation is used?
A. The union transformation is a multiple input group transformation that can be used to
merge data from various sources (or pipelines). This transformation works just like
UNION ALL statement in SQL, that is used to combine result set of two SELECT
statements.
Q. Can two flat files be joined with Joiner Transformation?
A. Yes, joiner transformation can be used to join data from two flat file sources.
Q. What is a look up transformation?
A. This transformation is used to lookup data in a flat file or a relational table, view or
synonym. It compares lookup transformation ports (input ports) to the source column
values based on the lookup condition. Later returned values can be passed to other
transformations.
Q. Can a lookup be done on Flat Files?
A. Yes.
Q. What is a mapplet?
A. A mapplet is a reusable object that is created using mapplet designer. The mapplet
contains set of transformations and it allows us to reuse that transformation logic in
multiple mappings.
Q. What does reusable transformation mean?
A. Reusable transformations can be used multiple times in a mapping. The reusable
transformation is stored as a metadata separate from any other mapping that uses the
transformation. Whenever any changes to a reusable transformation are made, all the
mappings where the transformation is used will be invalidated.
Q. What is update strategy and what are the options for update strategy?
A. Informatica processes the source data row-by-row. By default every row is marked to
be inserted in the target table. If the row has to be updated/inserted based on some
logic Update Strategy transformation is used. The condition can be specified in Update
Strategy to mark the processed row for update or insert.
Following options are available for update strategy:
DD_INSERT: If this is used the Update Strategy flags the row for insertion. Equivalent
numeric value of DD_INSERT is 0.
DD_UPDATE: If this is used the Update Strategy flags the row for update. Equivalent
numeric value of DD_UPDATE is 1.
DD_DELETE: If this is used the Update Strategy flags the row for deletion. Equivalent
numeric value of DD_DELETE is 2.
DD_REJECT: If this is used the Update Strategy flags the row for rejection. Equivalent
numeric value of DD_REJECT is 3.
Global Repository: The global repository can contain shared objects across the repositories in a domain. The objects are shared through global shortcuts.
Local Repository: A local repository is within a domain and it is not a global repository. A local repository can connect to a global repository using global shortcuts and
can use objects in its shared folders.
Versioned Repository: This can either be local or global repository but it allows
version control for the repository. A versioned repository can store multiple copies, or
versions of an object. This feature allows efficiently developing, testing and deploying
metadata in the production environment.
Q. What is a code page?
A. A code page contains encoding to specify characters in a set of one or more
languages. The code page is selected based on source of the data. For example if source
contains Japanese text then the code page should be selected to support Japanese text.
When a code page is chosen, the program or application for which the code page is set,
refers to a specific set of data that describes the characters the application recognizes.
This influences the way that application stores, receives, and sends character data.
Q. Which all databases PowerCenter Server on Windows can connect to?
A. PowerCenter Server on Windows can connect to following databases:
IBM DB2
Informix
Microsoft Access
Microsoft Excel
Microsoft SQL Server
Oracle
Sybase
Teradata
Q. Which all databases PowerCenter Server on UNIX can connect to?
A. PowerCenter Server on UNIX can connect to following databases:
IBM DB2
Informix
Oracle
Sybase
Teradata
Q. How to execute PL/SQL script from Informatica mapping?
A. Stored Procedure (SP) transformation can be used to execute PL/SQL Scripts. In SP
Transformation PL/SQL procedure name can be specified. Whenever the session is
executed, the session will call the pl/sql procedure.
Pre and Post Session Thread - one thread each to perform pre- and post-session operations.
Reader Thread - one thread for each partition for each source pipeline.
Writer Thread - one thread for each partition, if a target exists in the source pipeline, to write to the target.
Transformation Thread - one or more transformation threads for each partition.
Q. What is Session and Batches?
Session - A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets. After creating the session, we can use either the Server Manager or the command line program pmcmd to start or stop the session.
Batches - A batch provides a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:
1. Sequential - runs sessions one after the other.
2. Concurrent - runs sessions at the same time.
Q. How many ways you can update a relational source definition and what
are they?
A. Two ways
1. Edit the definition
2. Reimport the definition
Q. What is a transformation?
A. It is a repository object that generates, modifies or passes data.
Q. What are the designer tools for creating transformations?
A. Mapping designer
Transformation developer
Mapplet designer
Q. In how many ways can you create ports?
A. Two ways
1. Drag the port from another transformation
2. Click the add button on the ports tab.
Q. What are reusable transformations?
A. A transformation that can be reused is called a reusable transformation
They can be created using two methods:
1. Using transformation developer
2. Create normal one and promote it to reusable
Q. Is aggregate cache in aggregator transformation?
A. The aggregator stores data in the aggregate cache until it completes aggregate
calculations. When you run a session that uses an aggregator transformation, the
Informatica server creates index and data caches in memory to process the
transformation. If the Informatica server requires more space, it stores overflow values
in cache files.
Q. What are the settings that you use to configure the joiner transformation?
Master and detail source
Type of join
Condition of the join
Q. What are the join types in joiner transformation?
A. Normal (Default) -- only matching rows from both master and detail
Master outer -- all detail rows and only matching rows from master
Detail outer -- all master rows and only matching rows from detail
Full outer -- all rows from both master and detail (matching or non matching)
Q. What are the joiner caches?
A. When a Joiner transformation occurs in a session, the Informatica Server reads all the
records from the master source and builds index and data caches based on the master
rows. After building the caches, the Joiner transformation reads records
from the detail source and performs joins.
Q. What are the types of lookup caches?
Static cache: You can configure a static or read-only cache for any lookup table. By
default Informatica server creates a static cache. It caches the lookup table and lookup
values in the cache for each row that comes into the transformation. When the lookup
condition is true, the Informatica server does not update the cache while it processes the
lookup transformation.
Dynamic cache: If you want to cache the target table and insert new rows into cache
and the target, you can create a look up transformation to use dynamic cache. The
Informatica server dynamically inserts data to the target table.
Persistent cache: You can save the lookup cache files and reuse them the next time
the Informatica server processes a lookup transformation configured to use the cache.
Recache from database: If the persistent cache is not synchronized with the lookup
table, you can configure the lookup transformation to rebuild the lookup cache.
Shared cache: You can share the lookup cache between multiple transformations. You can
share an unnamed cache between transformations in the same mapping.
Q. What is Transformation?
A: A transformation is a repository object that generates, modifies, or passes data. Each transformation performs a specific function. There are two types of transformations:
1. Active
An active transformation can change the number of rows that pass through it. Eg: Aggregator, Filter, Joiner, Normalizer, Rank, Router, Source Qualifier, Update Strategy, ERP Source Qualifier, Advanced External Procedure.
2. Passive
Does not change the number of rows that pass through it. Eg: Expression, External
Procedure, Input, Lookup, Stored Procedure, Output, Sequence Generator, XML Source
Qualifier.
Q. What are Options/Type to run a Stored Procedure?
A: Normal: During a session, the stored procedure runs where the
transformation exists in the mapping on a row-by-row basis. This is useful for calling the
stored procedure for each row of data that passes through the mapping, such as running
a calculation against an input port. Connected stored procedures run only in normal
mode.
Pre-load of the Source. Before the session retrieves data from the source, the stored
procedure runs. This is useful for verifying the existence of tables or performing joins of
data in a temporary table.
Post-load of the Source. After the session retrieves data from the source, the stored
procedure runs. This is useful for removing temporary tables.
Pre-load of the Target. Before the session sends data to the target, the stored
procedure runs. This is useful for verifying target tables or disk space on the target
system.
Post-load of the Target. After the session sends data to the target, the stored
procedure runs. This is useful for re-creating indexes on the database. It must contain at
least one Input and one Output port.
Q. What kinds of sources and of targets can be used in Informatica?
Sources may be Flat file, relational db or XML.
Target may be relational tables, XML or flat files.
Q: What is Session Process?
A: The Load Manager process starts the session, creates the DTM process, and sends post-session email when the session completes.
Q. What is DTM process?
A: The DTM process creates threads to initialize the session, read, write, transform
data and handle pre and post-session operations.
Q. What is the different type of tracing levels?
Tracing level represents the amount of information that Informatica Server writes
in a log file. Tracing levels store information about mapping and transformations. There
are 4 types of tracing levels supported
1. Normal: It specifies the initialization and status information and summarization of the
success rows and target rows and the information about the skipped rows due to
transformation errors.
2. Terse: Specifies initialization information, error messages, and notification of rejected data (less detail than Normal).
3. Verbose Initialization: In addition to Normal tracing, specifies the location of the data cache files and index cache files that are created, and detailed transformation statistics for each transformation within the mapping.
4. Verbose Data: Along with Verbose Initialization, records each and every row processed by the Informatica server.
Q. TYPES OF DIMENSIONS?
A dimension table consists of the attributes about the facts. Dimensions store
the textual descriptions of the business.
Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact table
to which they are joined.
Eg: The date dimension table connected to the sales facts is identical to the date
dimension connected to the inventory facts.
Junk Dimension:
A junk dimension is a collection of random transactional codes flags and/or text
attributes that are unrelated to any particular dimension. The junk dimension is
simply a structure that provides a convenient place to store the junk attributes.
Eg: Assume that we have a gender dimension and marital status dimension. In
the fact table we need to maintain two keys referring to these dimensions.
Instead of that create a junk dimension which has all the combinations of gender
and marital status (cross join gender and marital status table and create a junk
table). Now we can maintain only one key in the fact table.
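One possible way to build such a junk dimension in SQL (an Oracle-style sketch with hypothetical gender_dim and marital_status_dim tables; the cross join produces every combination and a surrogate junk_key identifies each one):

CREATE TABLE junk_dim AS
SELECT ROW_NUMBER() OVER (ORDER BY g.gender_code, m.marital_status_code) AS junk_key,
       g.gender_code,
       m.marital_status_code
FROM   gender_dim g
CROSS JOIN marital_status_dim m;

The fact table then stores only junk_key instead of one key per flag.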
Degenerated Dimension:
A degenerate dimension is a dimension which is derived from the fact table and does not have its own dimension table.
Eg: A transactional code in a fact table.
Slowly changing dimension:
Slowly changing dimensions are dimension tables that have slowly increasing
data as well as updates to existing data.
Q. What are the output files that the Informatica server creates during the
session running?
Informatica server log: Informatica server (on UNIX) creates a log for all status and
error messages (default name: pm.server.log). It also creates an error log for error
messages. These files will be created in Informatica home directory
Session log file: Informatica server creates session log file for each session. It writes
information about session into log files such as initialization process, creation of sql
commands for reader and writer threads, errors encountered and load summary. The
amount of detail in session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in mapping.
Session detail includes information such as table name, number of rows written or
rejected. You can view this file by double clicking on the session in monitor window.
Performance detail file: This file contains information known as session performance
details which helps you determine where performance can be improved. To generate this file
select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to
targets.
Control file: Informatica server creates control file and a target file when you run a
session that uses the external loader. The control file contains the information about
the target flat file such as data format and loading instructions for the external
loader.
Post session email: Post session email allows you to automatically communicate
information about a session run to designated recipients. You can create two
different messages. One if the session completed successfully the other if the session
fails.
Indicator file: If you use the flat file as a target, you can configure the Informatica
server to create indicator file. For each target row, the indicator file contains a
number to indicate whether the row was marked for insert, update, delete or reject.
Output file: If session writes to a target file, the Informatica server creates the
target file based on file properties entered in the session property sheet.
Cache files: When the Informatica server creates memory cache it also creates cache
files.
For the following circumstances Informatica server creates index and data cache
files:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
Q. What is meant by lookup caches?
A. The Informatica server builds a cache in memory when it processes the first row of data in a cached lookup transformation.
You specify the target load order based on source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.
The Target Load Plan defines the order in which data is extracted from the source qualifier transformations. It is set from the Mappings menu > Target Load Plan.
It is a web-based application that enables you to run reports against repository metadata. With the Metadata Reporter you can access information about your repository without having knowledge of SQL, the transformation language, or the underlying tables in the repository.
The indicator file is placed in a directory local to the Informatica Server; the server waits for the indicator file to appear before running the session.
The Audit Table is nothing but a table which contains your workflow names and session names. It contains information about workflow and session status and their details.
WKFL_RUN_ID
WKFL_NME
START_TMST
END_TMST
ROW_INSERT_CNT
ROW_UPDATE_CNT
ROW_DELETE_CNT
ROW_REJECT_CNT
Q. If a session fails after loading 10000 records into the target, how can we load the 10001st record when we run the session the next time?
Select the Recovery Strategy in session properties as "Resume from the last checkpoint". Note: set this property before running the session.
The staging area is nothing but where we apply our logic to the data extracted from the source, cleanse it, and put it into a meaningful and summarized form for the data warehouse.
Q. What is constraint based loading
Constraint based load order defines the order of loading the data into the
multiple targets based on primary and foreign keys constraints.
Q. Why union transformation is active transformation?
The only condition for a transformation to become active is that the row number changes. Now, how can a row number change? There are 2 conditions:
1. Either the number of rows coming in and going out is different.
eg: in case of a filter we have the data like
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
and we have a filter condition like dept=4, then the output would be like
id name dept row_num
1 aa 4 1
3 cc 4 2
So the row number changed and it is an active transformation.
2. Or the order of the rows changes.
eg: when the Union transformation pulls in data, suppose we have 2 sources
source1:
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
source2:
id name dept row_num
4 aaa 4 4
5 bbb 3 5
6 ccc 4 6
It never restricts the data from any source, so the data can come in any order:
id name dept row_num old row_num
1 aa 4 1 1
4 aaa 4 2 4
5 bbb 3 3 5
2 bb 3 4 2
3 cc 4 5 3
6 ccc 4 6 6
So the row numbers are changing. Thus we say that Union is an active transformation.
Q. What is the use of a batch file in Informatica? How many types of batch file are there in Informatica?
With a batch, we can run sessions either sequentially or concurrently.
A grouping of sessions is known as a batch.
Two types of batches:
1) Sequential: runs sessions one after another.
2) Concurrent: runs the sessions at the same time.
If you have sessions with source-target dependencies you have to go for a sequential batch to start the sessions one after another. If you have several independent sessions you can use concurrent batches, which run all the sessions at the same time.
Q. What is joiner cache?
When the Joiner transformation runs in a session, the Integration Service caches the master source: it builds index and data caches based on the master rows, then reads the detail source and performs the join.
To truncate a target table from the session, SQL statements can be executed using the database connection after a pipeline is run; write a post-SQL statement such as TRUNCATE TABLE <table name>, or use the truncate target table option in the session properties.
Q. What is polling in informatica?
It displays the updated information about the session in the monitor window.
The monitor window displays the status of each session when you poll the
Informatica server.
Q. How will I stop my workflow after 10 errors?
Set the "Stop on errors" option in the session properties (Config Object tab) to 10.
Q. What is data driven?
Data driven is available at the session level. It means that, when we use an Update Strategy transformation, the Integration Service decides row by row how to insert/update rows in the database.
Data driven is nothing but instructing, per source row, what action should be taken on the target (update, delete, reject, insert). If we use the Update Strategy transformation in a mapping then we select the data driven option in the session.
Q. How to run a workflow in unix?
Use the pmcmd command line utility, for example:
pmcmd startworkflow -sv <integration_service> -d <domain> -u <user> -p <password> -f <folder> <workflow_name>
Constraint Based Load order defines loading the data into multiple targets depending on the primary key - foreign key relationship.
To set the option: double-click the session > Config Object tab > check Constraint Based Load Ordering.
Q. Difference between the top-down (W. H. Inmon) and bottom-up (Ralph Kimball) approaches?
Top-down approach: As per W. H. Inmon, first we need to build the data warehouse and after that we need to build up the data marts, but this makes the DWH somewhat difficult to maintain.
Bottom-up approach: As per Ralph Kimball, first we need to build up the data marts and then we need to build up the data warehouse.
This approach is the one most used in real time while creating the data warehouse.
Q. What are the different caches used in informatica?
Static cache
Dynamic cache
Shared cache
Persistent cache
$ ls -lrt (lists files sorted by modification time, oldest first)
Create an Oracle source with however many columns you want and write the join query in the SQL query override. But the column order and data types should be the same as in the SQL query.
Q. How to call an unconnected lookup in an expression transformation?
:LKP.LKP_NAME(PORTS)
CONNECTED LOOKUP:
>> It participates in the data pipeline.
>> It can have multiple inputs and multiple outputs.
>> It supports static and dynamic caches.
UNCONNECTED LOOKUP:
>> It does not participate in the data pipeline.
>> It can have multiple inputs and a single output.
>> It supports a static cache only.
Q. Types of partitioning in Informatica?
There are 5 partition types.
Lookup transformation
Aggregator transformation
Rank transformation
Sorter transformation
Joiner transformation
Based on NEW LOOKUP ROW, the Informatica server indicates which row is an insert and which one is an update.
NewLookupRow = 0 ... no change
NewLookupRow = 1 ... insert
NewLookupRow = 2 ... update
Q. How will you check the bottle necks in informatica? From where do
you start checking?
You start as per this order
1. Target
2. Source
3. Mapping
4. Session
5. System
Q. What is incremental aggregation?
When the aggregator transformation executes, the output data gets stored in a temporary location called the aggregator cache. The next time the mapping runs, the aggregator transformation runs only for the new records loaded after the first run. These output values get incremented with the values in the aggregator cache. This is called incremental aggregation. In this way we can improve performance.
Incremental aggregation means applying only the captured changes in the
source to aggregate calculations in a session.
When the source changes only incrementally and if we can capture those
changes, then we can configure the session to process only those changes. This
allows informatica server to update target table incrementally, rather than
forcing it to process the entire source and recalculate the same calculations each
time you run the session. By doing this obviously the session performance
increases.
Q. How can I explain my project architecture in an interview? Tell me your project flow from source to target.
Project architecture is like:
1. Source systems: like Mainframe, Oracle, PeopleSoft, DB2.
2. Landing tables: these are tables that act like the source. Used for easy access, for backup purposes, and as reusable sources for other mappings.
3. Staging tables: from the landing tables we extract the data into staging tables after all validations are done on the data.
4. Dimension/Fact tables: these are the tables used for analysis and for making decisions by analyzing the data.
5. Aggregation tables: these tables have summarized data useful for managers who want to view month-wise sales, year-wise sales, etc.
6. Reporting layer: phases 4 and 5 are used by reporting developers to generate reports. I hope this answer helps you.
Q. What type of transformation is not supported by mapplets?
Normalizer transformation
COBOL sources, joiner
XML source qualifier transformation
XML sources
Target definitions
Pre & Post Session stored procedures
Other mapplets
Suppose you are loading data into employee_target today and your target already has the data for employees with hire dates up to 31-12-2009. So you now pick up the source data for those hired from 1-1-2010 to date. You need not take the data before that date; if you did, it would be an overhead to load data into the target that already exists there. So in the source qualifier you filter the records by hire date, and you can also parameterize the hire date, which controls from which date you want to load data into the target.
This is the concept of Incremental Loading.
Q. What is target update override?
By Default the integration service updates the target based on key columns. But
we might want to update non-key columns also, at that point of time we can
override the
UPDATE statement for each target in the mapping. The target override affects
only when the source rows are marked as update by an update strategy in the
mapping.
Q. What is the Mapping parameter and Mapping variable?
Mapping parameter: A mapping parameter is a constant value that is defined before the mapping runs. A mapping parameter lets the mapping be reused with various constant values.
Mapping variable: A mapping variable represents a value that can change during the mapping run. The value is stored in the repository; the Integration Service retrieves it from the repository and uses the incremented value for the next run.
Q. What is rank and dense rank in Informatica, with an example, and give the SQL query for both ranks (a query sketch follows the example below).
For eg: the file contains records with the column values
100
200 (repeated row)
200
300
400
500
the rank function gives output as
1
2
2
4
5
6
and dense rank gives
1
2
2
3
4
5
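A query sketch for both ranks, assuming the values above are stored in a column sal of a hypothetical table t:

SELECT sal,
       RANK()       OVER (ORDER BY sal) AS rank_val,
       DENSE_RANK() OVER (ORDER BY sal) AS dense_rank_val
FROM   t;

RANK() leaves gaps after ties (1, 2, 2, 4, ...), while DENSE_RANK() does not (1, 2, 2, 3, ...).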
A parameter file can be created in a text editor such as WordPad or Notepad. You can define the following values in a parameter file:
Mapping parameters
Mapping variables
Session parameters
Q. What is session override?
Session override is an option in informatica at session level. Here we can
manually give a sql query which is issued to the database when the session
runs. It is nothing but over riding the default sql which is generated by a
particular transformation at mapping level.
Q. What are the diff. b/w informatica versions 8.1.1 and 8.6.1?
Little change in the Administrator Console. In 8.1.1 we can do all the creation of
IS and repository Service, web service, Domain, node, grid ( if we have licensed
version),In 8.6.1 the Informatica Admin console we can manage both Domain
page and security page. Domain Page means all the above like creation of IS
and repository Service, web service, Domain, node, grid ( if we have licensed
version) etc. Security page means creation of users, privileges, LDAP
configuration, Export Import user and Privileges etc.
Q. What are the uses of a Parameter file?
A parameter file is one which contains the values of mapping variables.
Type this in Notepad and save it:
foldername.sessionname
$$inputvalue1=
Parameter files are created with an extension of .prm.
These are created to pass values that can be changed for a Mapping Parameter and a Session Parameter during the mapping run.
Mapping Parameters:
A parameter is defined in the parameter file for which a parameter has already been created in the mapping, with data type, precision and scale.
The Mapping parameter file syntax (xxxx.prm).
[FolderName.WF:WorkFlowName.ST:SessionName]
$$ParameterName1=Value
$$ParameterName2=Value
After that we have to select the properties Tab of Session and Set Parameter file
name including physical path of this xxxx.prm file.
Session Parameters:
The Session Parameter files syntax (yyyy.prm).
[FolderName.SessionName]
Q. So many times I saw "$PM parser error". What is meant by PM?
PM: Power Mart
1) Parsing error will come for the input parameter to the lookup.
2) Informatica is not able to resolve the input parameter CLASS for your lookup.
3) Check the Port CLASS exists as either input port or a variable port in your
expression.
4) Check data type of CLASS and the data type of input parameter for your
lookup.
Q. What is a candidate key?
A candidate key is a combination of attributes that can be used to uniquely identify a database record without any extraneous data. Each table may have one or more candidate keys. One of these candidate keys is selected as the table's primary key; the others are called alternate keys.
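A small sketch (Oracle-style, hypothetical table) showing one candidate key chosen as the primary key and the remaining candidate keys declared as alternate (unique) keys:

CREATE TABLE employees (
    emp_id    NUMBER        PRIMARY KEY,   -- candidate key chosen as primary key
    emp_ssn   VARCHAR2(11)  UNIQUE,        -- remaining candidate key = alternate key
    emp_email VARCHAR2(100) UNIQUE,        -- another alternate key
    emp_name  VARCHAR2(50)
);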
Q. What is the difference between a Bitmap and a Btree index?
A bitmap index is used for columns with few distinct (repeating) values.
ex: Gender: male/female; Account status: Active/Inactive.
A B-tree index is used for columns with many distinct (mostly unique) values.
ex: empid.
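For example (Oracle syntax, hypothetical employees table and index names):

CREATE BITMAP INDEX idx_emp_gender ON employees (gender);   -- few distinct values
CREATE INDEX idx_emp_empid ON employees (empid);             -- B-tree (the default) for many distinct values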
* * * * * /bin/execute/this/script.sh
1. every minute
2. of every hour
3. of every day of the month
4. of every month
5. and every day in the week.
In short: this script is being executed every minute, without exception.
Execute every Friday 1AM
So if we want to schedule the script to run at 1AM every
Friday, we would need the following cronjob:
0 1 * * 5 /bin/execute/this/script.sh
Get it? The script is now being executed when the system
clock hits:
1. minute: 0
2. of hour: 1
3. of day of month: * (every day of month)
4. of month: * (every month)
5. and weekday: 5 (=Friday)
Execute on weekdays 1AM
So if we want to schedule the script to run at 1AM on every weekday (Monday through Friday), we would need the following cronjob:
0 1 * * 1-5 /bin/execute/this/script.sh
Get it? The script is now being executed when the system
clock hits:
1. minute: 0
2. of hour: 1
3. of day of month: * (every day of month)
4. of month: * (every month)
5. and weekday: 1-5 (=Monday till Friday)
Execute at 10 past every hour on the 1st of every month
Here's another one, just for practice:
10 * 1 * * /bin/execute/this/script.sh
Fair enough, it takes some getting used to, but it offers great flexibility.
Q. Can anyone tell me the difference between persistent and dynamic caches? Under which conditions do we use these caches?
Dynamic:
1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes rows to the target.
2) With a dynamic cache, we can update the cache with new data as well.
3) A dynamic cache is not reusable.
(When we need updated cache data, that is when we need a dynamic cache.)
Persistent:
1) The PowerCenter Server saves or deletes lookup cache files after a successful session based on the Lookup Cache Persistent property.
2) With a persistent cache, we are not able to update the cache with new data.
3) A persistent cache is reusable.
(When we need the previous cache data, that is when we need a persistent cache.)
Sorter Transformations
Aggregator Transformations
Filter Transformation
Union Transformation
Joiner Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Expression Transformation
Lookup Transformation
Select id, count (*) from seq1 group by id having count (*)>1;
Below are the ways to eliminate the duplicate records:
1. By enabling the option in Source Qualifier transformation as select
distinct.
2. By enabling the option in sorter transformation as select distinct.
3. By enabling all the values as group by in Aggregator transformation.
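If the duplicates need to be removed in the database itself, one common Oracle-style approach (assuming the seq1 table used in the query above, with its duplicated id values) is:

DELETE FROM seq1 a
WHERE  a.rowid > (SELECT MIN(b.rowid)   -- keep only the first physical row per id
                  FROM   seq1 b
                  WHERE  b.id = a.id);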
Q. Can anyone give idea on how do we perform test load in informatica? What
do we test as part of test load in informatica?
With a test load, the Informatica Server reads and transforms data without committing it to targets. The Informatica Server does everything as if running the full session: it writes data to relational targets, but rolls back the data when the session completes. So, you can enable the collect performance details property and analyze how efficient your mapping is. If the session is running for a long time, you may like
to find out the bottlenecks that are existing. It may be bottleneck of type target, source,
mapping etc.
The basic idea behind test load is to see the behavior of Informatica Server with your
session.
Q. What is ODS (Operational Data Store)?
An ODS is an integrated view of operational data designed to support operational monitoring; data is refreshed near real-time and it acts as an intermediate layer between OLTP and the data warehouse.
Domains
Nodes
Services
Q. WHAT IS VERSIONING?
It is used to keep a history of the changes made to mappings and workflows.
1. Check in: You check in when you are done with your changes so that everyone can see
those changes.
2. Check out: You check out from the main stream when you want to make any change to
the mapping/workflow.
3. Version history: It will show you all the changes made and who made it.
Column indicator:
D - Valid
o - Overflow
n - Null
t - Truncate
When the data has nulls or overflow, it will be rejected instead of being written to the target.
The rejected data is stored in reject files. You can check the data and reload it into the target using the reject reload utility.
Q. Difference between STOP and ABORT?
Stop - If the Integration Service is executing a Session task when you issue the stop
command, the Integration Service stops reading data. It continues processing and
writing data and committing data to targets. If the Integration Service cannot finish
processing and committing data, you can issue the abort command.
Abort - The Integration Service handles the abort command for the Session task like the
stop command, except it has a timeout period of 60 seconds. If the Integration Service
cannot finish processing and committing data within the timeout period, it kills the DTM
process and terminates the session.
The Integration Service increments the generated key (GK) sequence number each time it processes a source row. When the source row contains a multiple-occurring column or a multiple-occurring group of columns, the Normalizer transformation returns a row for each occurrence. Each row contains the same generated key value.
The Normalizer transformation has a generated column ID (GCID) port for each multiple-occurring column. The GCID is an index for the instance of the multiple-occurring data. For example, if a column occurs 3 times in a source record, the Normalizer returns a value of 1, 2 or 3 in the generated column ID.
TABLES
VIEWS
INDEXES
SYNONYMS
SEQUENCES
TABLESPACES
Q. WHAT IS @@ERROR?
The @@ERROR automatic variable returns the error code of the last Transact-SQL statement. If there was no error, @@ERROR returns zero. Because @@ERROR is reset after each Transact-SQL statement, it must be saved to a variable if it is needed for further processing after checking it.
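A small T-SQL sketch of saving @@ERROR to a variable right after a statement (a hypothetical update on the emp table used elsewhere in these notes):

DECLARE @err INT;

UPDATE emp SET basicsal = basicsal * 1.10 WHERE deptno = 10;

SET @err = @@ERROR;            -- capture immediately; the next statement resets @@ERROR

IF @err <> 0
    PRINT 'Update failed with error code ' + CAST(@err AS VARCHAR(10));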
Correlated subquery runs once for each row selected by the outer query. It
contains a reference to a value from the row selected by the outer query.
Nested subquery runs only once for the entire nesting (outer) query. It does
not contain any reference to the outer query row.
For example,
Correlated Subquery:
Select e1.empname, e1.basicsal, e1.deptno from emp e1 where e1.basicsal =
(select max(basicsal) from emp e2 where e2.deptno = e1.deptno)
Nested Subquery:
Select empname, basicsal, deptno from emp where (deptno, basicsal) in (select
deptno, max(basicsal) from emp group by deptno)
Q. HOW DOES ONE ESCAPE SPECIAL CHARACTERS WHEN BUILDING SQL QUERIES?
The LIKE keyword allows for string searches. The _ wild card character is used to match exactly one character, % is used to match zero or more occurrences of any characters. These characters can be escaped in SQL. Example:
SELECT name FROM emp WHERE id LIKE '%\_%' ESCAPE '\';
Use two quotes for every one displayed. Example:
SELECT 'Frank''s Oracle site' FROM DUAL;
SELECT 'A ''quoted'' word.' FROM DUAL;
SELECT 'A ''''double quoted'''' word.' FROM DUAL;
Surrogate key:
Query processing is fast.
It is numeric only.
The developer generates the surrogate key using a sequence generator transformation.
Eg: 12453
Primary key:
Query processing is slower.
Can be alphanumeric.
The source system supplies the primary key.
Eg: C10999
Q. How does the server recognize the source and target databases?
If it is relational, by using an ODBC connection.
If it is a flat file, by using an FTP connection.
B-tree index
B-tree cluster index
Hash cluster index
Reverse key index
Bitmap index
Function Based index
Q. How do you identify an empty line in a flat file in UNIX? How to remove it?
grep -v '^$' filename
Q. How do you send the session report (.txt) to the manager after the session is completed?
Email variables: %a (attach the file), %g (attach the session log file)
Q. How to check all the running processes in UNIX?
$ ps -ef
Q. How can I display only and only hidden files in the current directory?
ls -a | grep '^\.'
Q. How to display the first 10 lines of a file?
# head -10 logfile
Q. How to display the last 10 lines of a file?
# tail -10 logfile
Q. How did you schedule sessions in your project?
1. Run once - set two parameters, date and time, for when the session should start.
2. Run every - the Informatica server runs the session at regular intervals as configured; parameters: days, hours, minutes, end on, end after, forever.
3. Customized repeat - repeat every 2 days, daily frequency in hours/minutes, every week, every month.
Q. What is lookup override?
This feature is similar to entering a custom query in a Source Qualifier transformation.
When entering a Lookup SQL Override, you can enter the entire override, or generate
and edit the default SQL statement.
The lookup query override can include WHERE clause.
Q. What is Sql Override?
The Source Qualifier provides the SQL Query option to override the default query. You
can enter any SQL statement supported by your source database. You might enter your
own SELECT statement, or have the database perform aggregate calculations, or call a
stored procedure or stored function to read the data and perform some tasks.
Q. How to get sequence value using Expression?
v_temp = v_temp+1
o_seq = IIF(ISNULL(v_temp), 0, v_temp)
Q. How to get Unique Record?
Source > SQ > SRT > EXP > FLT OR RTR > TGT
In Expression:
flag = DECODE(TRUE, eid = pre_eid, 'Y', 'N')
flag_out = flag
pre_eid = eid
TC_COMMIT_AFTER: The Integration Service writes the current row to the target,
commits the transaction, and begins a new transaction. The current row is in the
committed transaction.
TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction,
begins a new transaction, and writes the current row to the target. The current row is in
the new transaction.
TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target,
rolls back the transaction, and begins a new transaction. The current row is in the rolled
back transaction.
A view is a virtual table defined by a T-SQL SELECT command and can come from one to many different base tables or even other views.
Q. What is Index?
An index is a physical structure containing pointers to the data. Indices are created in an
existing table to locate rows more quickly and efficiently. It is possible to create an index
on one or more columns of a table, and each index is given a name. The users cannot
see the indexes; they are just used to speed up queries. Effective indexes are one of the
best ways to improve performance in a database application. A table scan happens when
there is no index available to help a query. In a table scan SQL Server examines every
row in the table to satisfy the query results. Table scans are sometimes unavoidable, but on large tables, scans have a severe impact on performance. Clustered indexes define the physical sorting of a database table's rows in the storage media. For this reason, each database table may
have only one clustered index. Non-clustered indexes are created outside of the
database table and contain a sorted list of references to the table itself.
Q. What is the difference between clustered and a non-clustered index?
A clustered index is a special type of index that reorders the way records in the table are
physically stored. Therefore table can have only one clustered index. The leaf nodes of a
clustered index contain the data pages. A nonclustered index is a special type of index in
which the logical order of the index does not match the physical stored order of the rows
on disk. The leaf node of a nonclustered index does not consist of the data pages.
Instead, the leaf nodes contain index rows.
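For example (SQL Server syntax, hypothetical index names on the emp table used elsewhere in these notes):

CREATE CLUSTERED INDEX ix_emp_empid ON emp (empid);       -- only one per table; orders the data pages themselves
CREATE NONCLUSTERED INDEX ix_emp_deptno ON emp (deptno);  -- leaf level holds index rows that point to the data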
Q. What is Cursor?
Cursor is a database object used by applications to manipulate data in a set on a row-by
row basis, instead of the typical SQL commands that operate on all the rows in the set at
one time.
In order to work with a cursor we need to perform some steps in the following order (a T-SQL sketch follows this list):
Declare cursor
Open cursor
Fetch row from the cursor
Process fetched row
Close cursor
Deallocate cursor
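A T-SQL sketch of those steps, assuming a hypothetical cursor over the emp table used elsewhere in these notes:

DECLARE @empname VARCHAR(50), @basicsal DECIMAL(10,2);

DECLARE emp_cursor CURSOR FOR
    SELECT empname, basicsal FROM emp;          -- declare cursor

OPEN emp_cursor;                                 -- open cursor
FETCH NEXT FROM emp_cursor INTO @empname, @basicsal;   -- fetch first row

WHILE @@FETCH_STATUS = 0
BEGIN
    -- process the fetched row
    PRINT @empname + ' earns ' + CAST(@basicsal AS VARCHAR(20));
    FETCH NEXT FROM emp_cursor INTO @empname, @basicsal;
END

CLOSE emp_cursor;                                -- close cursor
DEALLOCATE emp_cursor;                           -- deallocate cursor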
Q. What is the difference between a HAVING CLAUSE and a WHERE CLAUSE?
1. Specifies a search condition for a group or an aggregate. HAVING can be used only
with the SELECT statement.
2. HAVING is typically used in a GROUP BY clause. When GROUP BY is not used, HAVING
behaves like a WHERE clause.
3. Having Clause is basically used only with the GROUP BY function in a query. WHERE
Clause is applied to each row before they are part of the GROUP BY function in a query.
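For example (using the emp table from the subquery examples above):

-- WHERE filters individual rows before grouping; HAVING filters groups after aggregation
SELECT deptno, MAX(basicsal) AS max_sal
FROM   emp
WHERE  basicsal > 1000          -- applied to each row first
GROUP BY deptno
HAVING MAX(basicsal) > 5000;    -- applied to each group afterwards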
RANK CACHE
Sample Rank Mapping
When the Power Center Server runs a session with a Rank transformation, it compares
an input row with rows in the data cache. If the input row out-ranks a Stored row, the
Power Center Server replaces the stored row with the input row.
Example: PowerCenter caches the first 5 rows if we are finding the top 5 salaried employees. When the 6th row is read, it compares it with the 5 rows in the cache and places it in the cache if needed.
1) RANK INDEX CACHE:
The index cache holds group information from the group by ports. If we are Using
Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.
All Group By Columns are in RANK INDEX CACHE. Ex. DEPTNO
2) RANK DATA CACHE:
It holds row data until the Power Center Server completes the ranking and is generally
larger than the index cache. To reduce the data cache size, connect only the necessary
input/output ports to subsequent transformations.
All Variable ports if there, Rank Port, All ports going out from RANK Transformations are
stored in RANK DATA CACHE.
Example: All ports except DEPTNO In our mapping example.
Aggregator Caches
1. The Power Center Server stores data in the aggregate cache until it completes Aggregate
calculations.
2. It stores group values in an index cache and row data in the data cache. If the Power
Center Server requires more space, it stores overflow values in cache files.
Note: The Power Center Server uses memory to process an Aggregator transformation
with sorted ports. It does not use cache memory. We do not need to configure cache
memory for Aggregator transformations that use sorted ports.
1) Aggregator Index Cache:
The index cache holds group information from the group by ports. If we are using
Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.
JOINER CACHES
Joiner always caches the MASTER table. We cannot disable caching. It builds Index
cache and Data Cache based on MASTER table.
1) Joiner Index Cache:
All Columns of MASTER table used in Join condition are in JOINER INDEX CACHE.
Example: DEPTNO in our mapping.
Unconnected Lookup
Cache Comparison
Persistence and Dynamic Caches
Dynamic
1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes rows to the target.
2) With a dynamic cache, we can update the cache with new data as well.
3) A dynamic cache is not reusable.
(When we need updated cache data, that is when we need a dynamic cache.)
Persistent
1) A Lookup transformation can use a non-persistent or persistent cache. The PowerCenter Server saves or deletes lookup cache files after a successful session based on the Lookup Cache Persistent property.
2) With a persistent cache, we are not able to update the cache with new data.
3) A persistent cache is reusable.
(When we need the previous cache data, that is when we need a persistent cache.)
Informatica - Transformations
In Informatica, Transformations help to transform the source data according to
the requirements of target system and it ensures the quality of the data being
loaded into target.
Transformations are of two types: Active and Passive.
Active Transformation
An active transformation can change the number of rows that pass through it
from source to target. (i.e) It eliminates rows that do not meet the condition in
transformation.
Passive Transformation
A passive transformation does not change the number of rows that pass through
it (i.e) It passes all rows through the transformation.
Transformations can be Connected or Unconnected.
Connected Transformation
Connected transformation is connected to other transformations or directly to
target table in the mapping.
Unconnected Transformation
An unconnected transformation is not connected to other transformations in the
mapping. It is called within another transformation, and returns a value to that
transformation.
==============================================================================
Normalizer Transformation
Normalizer Transformation is an Active and Connected transformation.
It is used mainly with COBOL sources where most of the time data is stored in denormalized format.
Also, Normalizer transformation can be used to create multiple rows from a single row of
data.
==============================================================================
Rank Transformation
Rank transformation is an Active and Connected transformation.
It is used to select the top or bottom rank of data.
For example,
To select top 10 Regions where the sales volume was very high
or
To select 10 lowest priced products.
==============================================================================
Router Transformation
Router is an Active and Connected transformation. It is similar to filter transformation.
The only difference is, filter transformation drops the data that do not meet the condition
whereas router has an option to capture the data that do not meet the condition. It is
useful to test multiple conditions.
It has input, output and default groups.
For example, if we want to filter data like where State='Michigan', State='California', State='New York' and all other states, it is easy to route the data to different tables.
==============================================================================
Sequence Generator Transformation
Sequence Generator transformation is a Passive and Connected transformation. It is
used to create unique primary key values or cycle through a sequential range of
numbers or to replace missing keys.
It has two output ports to connect transformations. By default it has two
fields CURRVAL and NEXTVAL (You cannot add ports to this transformation).
NEXTVAL port generates a sequence of numbers by connecting it to a transformation or
target. CURRVAL is the NEXTVAL value plus one or NEXTVAL plus the Increment By
value.
==============================================================================
Sorter Transformation
Sorter transformation is a Connected and an Active transformation.
It allows sorting data either in ascending or descending order according to a specified
field.
Also used to configure for case-sensitive sorting, and specify whether the output rows
should be distinct.
==============================================================================
Source Qualifier Transformation
Source Qualifier transformation is an Active and Connected transformation. When adding
a relational or a flat file source definition to a mapping, it is must to connect it to a
Source Qualifier transformation.
The Source Qualifier performs the various tasks such as
Overriding Default SQL query,
Filtering records;
join data from two or more tables etc.
==============================================================================
Stored Procedure Transformation
Stored Procedure transformation is a Passive and Connected &
Unconnected transformation. It is useful to automate time-consuming tasks and it is also
used in error handling, to drop and recreate indexes and to determine the space in
database, a specialized calculation etc.
The stored procedure must exist in the database before creating a Stored Procedure
transformation, and the stored procedure can exist in a source, target, or any database
with a valid connection to the Informatica Server. Stored Procedure is an executable
script with SQL statements and control statements, user-defined variables and
conditional statements.
==============================================================================
Update Strategy Transformation
Update strategy transformation is an Active and Connected transformation.
It is used to update data in target table, either to maintain history of data or recent
changes.
You can specify how to treat source rows in table, insert, update, delete or data driven.
==============================================================================
XML Source Qualifier Transformation
XML Source Qualifier is a Passive and Connected transformation.
XML Source Qualifier is used only with an XML source definition.
It represents the data elements that the Informatica Server reads when it executes a
session with XML sources.
==============================================================================
Constraint-Based Loading
In the Workflow Manager, you can specify constraint-based loading for a session. When
you select this option, the Integration Service orders the target load on a row-by-row
basis. For every row generated by an active source, the Integration Service loads the
corresponding transformed row first to the primary key table, then to any foreign key
tables. Constraint-based loading depends on the following requirements:
Active source: Related target tables must have the same active source.
Key relationships: Target tables must have key relationships.
Target connection groups: Targets must be in one target connection group.
Treat rows as insert: Use this option when you insert into the target. You cannot use
updates with constraint-based loading.
Active Source:
When target tables receive rows from different active sources, the Integration Service
reverts to normal loading for those tables, but loads all other targets in the session using
constraint-based loading when possible. For example, a mapping contains three distinct
pipelines. The first two contain a source, source qualifier, and target. Since these two
targets receive data from different active sources, the Integration Service reverts to
normal loading for both targets. The third pipeline contains a source, Normalizer, and
two targets. Since these two targets share a single active source (the Normalizer), the
Integration Service performs constraint-based loading: loading the primary key table
first, then the foreign key table.
Key Relationships:
When target tables have no key relationships, the Integration Service does not perform
constraint-based loading.
Similarly, when target tables have circular key relationships, the Integration Service
reverts to a normal load. For example, you have one target containing a primary key and
a foreign key related to the primary key in a second target. The second target also
contains a foreign key that references the primary key in the first target. The Integration
Service cannot enforce constraint-based loading for these tables. It reverts to a normal
load.
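A sketch of such a circular relationship, using hypothetical table and column names:
CREATE TABLE T_A (A_ID NUMBER PRIMARY KEY, B_ID NUMBER);
CREATE TABLE T_B (B_ID NUMBER PRIMARY KEY, A_ID NUMBER REFERENCES T_A (A_ID));
ALTER TABLE T_A ADD CONSTRAINT FK_A_B FOREIGN KEY (B_ID) REFERENCES T_B (B_ID);
No row-by-row load order can satisfy both constraints, so the Integration Service reverts to a normal load.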
Target Connection Groups:
The Integration Service enforces constraint-based loading for targets in the same target
connection group. If you want to specify constraint-based loading for multiple targets
that receive data from the same active source, you must verify the tables are in the
same target connection group. If the tables with the primary key-foreign key relationship
are in different target connection groups, the Integration Service cannot enforce
constraint-based loading when you run the workflow. To verify that all targets are in the
same target connection group, complete the following tasks:
Verify all targets are in the same target load order group and receive data from the
same active source.
Use the default partition properties and do not add partitions or partition points.
Define the same target type for all targets in the session properties.
Define the same database connection name for all targets in the session properties.
Choose normal mode for the target load type for all targets in the session properties.
Treat Rows as Insert:
Use constraint-based loading when the session option Treat Source Rows As is set to
insert. You might get inconsistent data if you select a different Treat Source Rows As
option and you configure the session for constraint-based loading.
When the mapping contains Update Strategy transformations and you need to load data
to a primary key table first, split the mapping using one of the following options:
Load primary key table in one mapping and dependent tables in another mapping. Use
constraint-based loading to load the primary table.
Perform inserts in one mapping and updates in another mapping.
Constraint-based loading does not affect the target load ordering of the mapping. Target
load ordering defines the order in which the Integration Service reads the sources in each target
load order group in the mapping. A target load order group is a collection of source
qualifiers, transformations, and targets linked together in a mapping.
Example
The following mapping is configured to perform constraint-based loading:
In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys
referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign
key.
Since these tables receive records from a single active source, SQ_A, the Integration
Service loads rows to the targets in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has no foreign key dependencies and
contains a primary key referenced by T_2 and T_3. The Integration Service then loads
T_2 and T_3, but since T_2 and T_3 have no dependencies, they are not loaded in any
particular order. The Integration Service loads T_4 last, because it has a foreign key that
references a primary key in T_3. After loading the first set of targets, the Integration
Service begins reading source B. If there are no key relationships between T_5 and T_6,
the Integration Service reverts to a normal load for both targets.
If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive
data from a single active source, the Aggregator AGGTRANS, the Integration Service
loads rows to the tables in the following order:
T_5
T_6
T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database
connection for each target, and you use the default partition properties. T_5 and T_6 are
in another target connection group together if you use the same database connection for
each target and you use the default partition properties. The Integration Service includes
T_5 and T_6 in a different target connection group because they are in a different target
load order group from the first four targets.
Enabling Constraint-Based Loading:
When you enable constraint-based loading, the Integration Service orders the target
load on a row-by-row basis. To enable constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the Treat Source
Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load
Ordering.
3. Click OK.
When you use a mapplet in a mapping, the Mapping Designer lets you set the target
load plan for sources within the mapplet.
Setting the Target Load Order
You can configure the target load order for a mapping containing any type of target
definition. In the Designer, you can set the order in which the Integration Service sends
rows to targets in different target load order groups in a mapping. A target load order
group is the collection of source qualifiers, transformations, and targets linked together
in a mapping. You can set the target load order if you want to maintain referential
integrity when inserting, deleting, or updating tables that have the primary key and
foreign key constraints.
The Integration Service reads sources in a target load order group concurrently, and it
processes target load order groups sequentially.
To specify the order in which the Integration Service sends data to targets, create one
source qualifier for each target within a mapping. To set the target load order, you then
determine in which order the Integration Service reads each source in the mapping.
The following figure shows two target load order groups in one mapping:
In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and
T_ITEMS. The second target load order group includes all other objects in the mapping,
including the TOTAL_ORDERS target. The Integration Service processes the first target
load order group, and then the second target load order group.
When it processes the second target load order group, it reads data from both sources at
the same time.
To set the target load order:
1. Create a mapping that contains multiple target load order groups.
2. Click Mappings > Target Load Plan. The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that receive data from each source qualifier.
3. Select a source qualifier from the list.
4. Click the Up and Down buttons to move the source qualifier within the load order.
5. Repeat steps 3 to 4 for other source qualifiers you want to reorder.
6. Click OK.
8. Create variable $$var_max of MAX aggregation type and initial value 1500.
9. Create variable $$var_min of MIN aggregation type and initial value 1500.
10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT is visible when datatype is INT or SMALLINT.
11. Create variable $$var_set of MAX aggregation type.
12. Create 5 output ports: out_TOTAL_SAL, out_MAX_VAR, out_MIN_VAR, out_COUNT_VAR and out_SET_VAR.
13. Open the expression editor for out_TOTAL_SAL. Build SAL + COMM as we did earlier. To add $$Bonus to it, select the variable tab and pick the parameter from the mapping parameters: SAL + COMM + $$Bonus
14. Open Expression editor for out_max_var.
15. Select the variable function SETMAXVARIABLE from left side pane. Select
$$var_max from variable tab and SAL from ports tab as shown below.
SETMAXVARIABLE($$var_max,SAL)
17. Open Expression editor for out_min_var and write the following expression:
SETMINVARIABLE($$var_min,SAL). Validate the expression.
18. Open Expression editor for out_count_var and write the following expression:
SETCOUNTVARIABLE($$var_count). Validate the expression.
19. Open Expression editor for out_set_var and write the following expression:
SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate.
20. Click OK. Expression Transformation below:
21. Link all ports from expression to target and Validate Mapping and Save it.
22. See mapping picture on next page.
PARAMETER FILE
A parameter file is a list of parameters and associated values for a workflow, worklet, or
session.
Parameter files provide flexibility to change these variables each time we run a workflow
or session.
We can create multiple parameter files and change the file we use for a session or
workflow. We can create a parameter file using a text editor such as WordPad or
Notepad.
Enter the parameter file name and directory in the workflow or session properties.
A parameter file contains the following types of parameters and variables:
Workflow variable: References values and records information in a workflow.
Worklet variable: References values and records information in a worklet. Use
predefined worklet variables in a parent workflow, but we cannot use workflow variables
from the parent workflow in a worklet.
Session parameter: Defines a value that can change from session to session, such as a
database connection or file name.
Mapping parameter and Mapping variable
USING A PARAMETER FILE
Parameter files contain several sections preceded by a heading. The heading identifies
the Integration Service, Integration Service process, workflow, worklet, or session to
which we want to assign parameters or variables.
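Typical section headings look like the following (the folder, workflow, worklet and session names are placeholders):
[Global]
[Folder_Name.WF:Workflow_Name]
[Folder_Name.WF:Workflow_Name.WT:Worklet_Name]
[Folder_Name.WF:Workflow_Name.ST:Session_Name]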
Make session and workflow.
Give connection information for source and target table.
Run workflow and see result.
Sample Parameter File for Our example:
In the parameter file, folder and session names are case sensitive.
Create a text file in notepad with name Para_File.txt
[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0
CONFIGURING PARAMETER FILE
We can specify the parameter file name and directory in the workflow or session
properties.
To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.
To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field.
4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
5. Click OK.
Mapplet
Solution3:
1. Import one flat file definition and make the mapping as per need.
2. Now make a notepad file that contains the location and name of each 10 flat files.
Sample:
D:\EMP1.txt
E:\EMP2.txt
E:\FILES\DWH\EMP3.txt and so on
3. Now make a session and in Source file name and Source File Directory location fields,
give the name and location of above created file.
4. In Source file type field, select Indirect.
5. Click Apply.
6. Validate Session
7. Make Workflow. Save it to repository and run.
Incremental Aggregation
When we enable the session option Incremental Aggregation, the Integration Service
performs incremental aggregation: it passes source data through the mapping and uses
historical cache data to perform the aggregation calculations incrementally.
When using incremental aggregation, you apply captured changes in the source to
aggregate calculations in a session. If the source changes incrementally and you can
capture changes, you can configure the session to process those changes. This allows
the Integration Service to update the target incrementally, rather than forcing it to
process the entire source and recalculate the same data each time you run the session.
For example, you might have a session using a source that receives new data every day.
You can capture those incremental changes because you have added a filter condition to
the mapping that removes pre-existing data from the flow of data. You then enable
incremental aggregation.
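A sketch of such a filter condition, assuming the source has a LOAD_DATE column and the mapping defines a date mapping variable $$LAST_RUN_DATE (names are illustrative):
Filter condition: LOAD_DATE > $$LAST_RUN_DATE
In an Expression transformation, SETMAXVARIABLE($$LAST_RUN_DATE, LOAD_DATE) can then be used to advance the variable after each successful run, so the next run picks up only the newly arrived rows.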
When the session runs with incremental aggregation enabled for the first time on March
1, you use the entire source. This allows the Integration Service to read and store the
necessary aggregate data. On March 2, when you run the session again, you filter out all
the records except those time-stamped March 2. The Integration Service then processes
the new data and updates the target accordingly. Consider using incremental
aggregation in the following circumstances:
You can capture new source data. Use incremental aggregation when you can capture
new source data each time you run the session. Use a Stored Procedure or Filter
transformation to process new data.
Incremental changes do not significantly change the target. Use incremental aggregation
when the changes do not significantly change the target. If processing the incrementally
changed source alters more than half the existing target, the session may not benefit
from using incremental aggregation. In this case, drop the table and recreate the target
with complete source data.
Note: Do not use incremental aggregation if the mapping contains percentile or median
functions. The Integration Service uses system memory to process these functions in
addition to the cache memory you configure in the session properties. As a result, the
Integration Service does not store incremental aggregation values for percentile and
median functions in disk caches.
Integration Service Processing for Incremental Aggregation
(i)The first time you run an incremental aggregation session, the Integration Service
processes the entire source. At the end of the session, the Integration Service stores
aggregate data from that session run in two files, the index file and the data file. The
Integration Service creates the files in the cache directory specified in the Aggregator
transformation properties.
(ii)Each subsequent time you run the session with incremental aggregation, you use the
incremental source changes in the session. For each input record, the Integration
Service checks historical information in the index file for a corresponding group. If it
finds a corresponding group, the Integration Service performs the aggregate operation
incrementally, using the aggregate data for that group, and saves the incremental
change. If it does not find a corresponding group, the Integration Service creates a new
group and saves the record data.
(iii)When writing to the target, the Integration Service applies the changes to the
existing target. It saves modified aggregate data in the index and data files to be used
as historical data the next time you run the session.
(iv) If the source changes significantly and you want the Integration Service to continue
saving aggregate data for future incremental changes, configure the Integration Service
to overwrite existing aggregate data with new aggregate data.
Each subsequent time you run a session with incremental aggregation, the Integration
Service creates a backup of the incremental aggregation files. The cache directory for
the Aggregator transformation must contain enough disk space for two sets of the files.
(v)When you partition a session that uses incremental aggregation, the Integration
Service creates one set of cache files for each partition.
The Integration Service creates new aggregate data, instead of using historical data,
when you perform one of the following tasks:
Save a new version of the mapping.
Configure the session to reinitialize the aggregate cache.
Move the aggregate files without correcting the configured path or directory for the files
in the session properties.
Change the configured path or directory for the aggregate files without moving the files
to the new location.
Delete cache files.
Decrease the number of partitions.
When the Integration Service rebuilds incremental aggregation files, the data in the
previous files is lost.
Note: To protect the incremental aggregation files from file corruption or disk failure,
periodically back up the files.
Preparing for Incremental Aggregation:
When you use incremental aggregation, you need to configure both mapping and session
properties:
Implement mapping logic or filter to remove pre-existing data.
Configure the session for incremental aggregation and verify that the file directory has
enough disk space for the aggregate files.
Configuring the Mapping
Before enabling incremental aggregation, you must capture changes in source data. You
can use a Filter or Stored Procedure transformation in the mapping to remove preexisting source data during a session.
Configuring the Session
Use the following guidelines when you configure the session for incremental aggregation:
(i) Verify the location where you want to store the aggregate files.
The index and data files grow in proportion to the source data. Be sure the cache
directory has enough disk space to store historical data for the session.
When you run multiple sessions with incremental aggregation, decide where you want
the files stored. Then, enter the appropriate directory for the process variable,
$PMCacheDir, in the Workflow Manager. You can enter session-specific directories for the
index and data files. However, by using the process variable for all sessions using
incremental aggregation, you can easily change the cache directory when necessary by
changing $PMCacheDir.
Changing the cache directory without moving the files causes the Integration Service to
reinitialize the aggregate cache and gather new aggregate data.
In a grid, Integration Services rebuild incremental aggregation files they cannot find.
When an Integration Service rebuilds incremental aggregation files, it loses aggregate
history.
(ii) Verify the incremental aggregation settings in the session properties.
You can configure the session for incremental aggregation in the Performance settings
on the Properties tab.
You can also configure the session to reinitialize the aggregate cache. If you choose to
reinitialize the cache, the Workflow Manager displays a warning indicating the
Integration Service overwrites the existing cache and a reminder to clear this option
after running the session.
TRUNCATING PARTITION
Truncating a partition will delete all rows from the partition.
To truncate a partition give the following statement
Alter table sales truncate partition p5;
LISTING INFORMATION ABOUT PARTITION TABLES
To see how many partitioned tables are there in your schema give the following
statement
Select * from user_part_tables;
To see on partition level partitioning information
Select * from user_tab_partitions;
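For example, to see the partitions of a particular table (here the SALES table used above), filter on TABLE_NAME:
Select partition_name, high_value from user_tab_partitions where table_name = 'SALES';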
TASKS
The Workflow Manager contains many types of tasks to help you build workflows and
worklets. We can create reusable tasks in the Task Developer.
Types of tasks:
Task Type       Reusable or not
Session         Yes
Email           Yes
Command         Yes
Event-Raise     No
Event-Wait      No
Timer           No
Decision        No
Assignment      No
Control         No
SESSION TASK
A session is a set of instructions that tells the Power Center Server how and when to
move data from sources to targets.
To run a session, we must first create a workflow to contain the Session task.
We can run as many sessions in a workflow as we need. We can run the Session tasks
sequentially or concurrently, depending on our needs.
The Power Center Server creates several files and in-memory caches depending on the
transformations and options used in the session.
EMAIL TASK
The Workflow Manager provides an Email task that allows us to send email during a
workflow.
Created by Administrator usually and we just drag and use it in our mapping.
Steps:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select an Email task and enter a name for the task. Click Create.
3. Click Done.
4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5. Click the Properties tab.
6. Enter the fully qualified email address of the mail recipient in the Email User Name field.
7. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.
8. Click the Open button in the Email Text field to open the Email Editor.
9. Click OK twice to save your changes.
Example: To send an email when a session completes:
Steps:
1. Create a workflow wf_sample_email
2. Drag any session task to workspace.
3. Edit Session task and go to Components tab.
4. See On Success Email Option there and configure it.
5. In Type select reusable or Non-reusable.
6. In Value, select the email task to be used.
7. Click Apply -> Ok.
Example1: Use an event wait task and make sure that session s_filter_example runs
when abc.txt file is present in D:\FILES folder.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click ok.
2. Task -> Create -> Select Event Wait. Give name. Click create and done.
3. Link Start to Event Wait task.
4. Drag s_filter_example to workspace and link it to event wait task.
5. Right click on event wait task and click EDIT -> EVENTS tab.
6. Select Pre Defined option there. In the blank space, give directory and filename to watch.
Example: D:\FILES\abc.txt
7. Workflow validate and Repository Save.
Example 2: Raise a user defined event when session s_m_filter_example succeeds.
Capture this event in event wait task and run session S_M_TOTAL_SAL_EXAMPLE
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click ok.
2. Workflow -> Edit -> Events Tab and add events EVENT1 there.
3. Drag s_m_filter_example and link it to START task.
4. Click Tasks -> Create -> Select EVENT RAISE from list. Give name ER_Example. Click Create and then done.
5. Link ER_Example to s_m_filter_example.
6. Right click ER_Example -> EDIT -> Properties Tab -> Open Value for User Defined Event
and Select EVENT1 from the list displayed. Apply -> OK.
7. Click link between ER_Example and s_m_filter_example and give the condition
$S_M_FILTER_EXAMPLE.Status=SUCCEEDED
8. Click Tasks -> Create -> Select EVENT WAIT from list. Give name EW_WAIT. Click Create
and then done.
9. Link EW_WAIT to START task.
10. Right click EW_WAIT -> EDIT -> EVENTS tab.
11. Select User Defined there. Select the Event1 by clicking Browse Events button.
12. Apply -> OK.
13. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
14. Workflow -> Validate
15. Repository -> Save.
Run workflow and see.
TIMER TASK
The Timer task allows us to specify the period of time to wait before the Power Center
Server runs the next task in the workflow. The Timer task has two types of settings:
Absolute time: We specify the exact date and time or we can choose a user-defined
workflow variable to specify the exact time. The next task in workflow will run as per the
date and time specified.
Relative time: We instruct the Power Center Server to wait for a specified period of
time after the Timer task, the parent workflow, or the top-level workflow starts.
Example: Run session s_m_filter_example 1 min after the timer task starts (relative time).
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_timer_task_example -> Click ok.
2. Click Tasks -> Create -> Select TIMER from list. Give name TIMER_Example. Click Create
and then done.
3. Link TIMER_Example to START task.
4. Right click TIMER_Example-> EDIT -> TIMER tab.
5. Select Relative Time Option and Give 1 min and Select From start time of this task
Option.
6. Apply -> OK.
7. Drag s_m_filter_example and link it to TIMER_Example.
8. Workflow-> Validate and Repository -> Save.
DECISION TASK
The Decision task allows us to enter a condition that determines the execution of the
workflow, similar to a link condition.
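An illustrative use, with hypothetical task and session names: create a Decision task Dec_CheckStatus whose condition is
$s_m_filter_example.Status = SUCCEEDED
and then reference the predefined variable $Dec_CheckStatus.Condition in the conditions of the links leaving the Decision task, so different branches run depending on whether the condition evaluated to true or false.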
CONTROL TASK
Option              Description
Fail Me             Fails the control task.
Fail Parent         Marks the status of the WF or worklet that contains the Control task as failed.
Stop Parent         Stops the WF or worklet that contains the Control task.
Abort Parent        Aborts the WF or worklet that contains the Control task.
Fail Top-Level WF   Fails the workflow that is running.
Stop Top-Level WF   Stops the workflow that is running.
Abort Top-Level WF  Aborts the workflow that is running.
Example: Drag any 3 sessions and if any one of them fails, then abort the top-level workflow.
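One way to wire this (task and session names are illustrative): link each of the three sessions to a Control task cntl_abort_wf, give each link the condition $s_session_name.Status = FAILED, and set the Control task option to Abort Top-Level WF.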
ASSIGNMENT TASK
To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables.
Scheduler
We can schedule a workflow to run continuously, repeat at a given time or interval, or
we can manually start a workflow. The Integration Service runs a scheduled workflow as
configured.
By default, the workflow runs on demand. We can change the schedule settings by
editing the scheduler. If we change schedule settings, the Integration Service
reschedules the workflow according to the new settings.
For each folder, the Workflow Manager lets us create reusable schedulers so we
can reuse the same set of scheduling settings for workflows in the folder.
When we delete a reusable scheduler, all workflows that use the deleted
scheduler become invalid. To make the workflows valid, we must edit them and replace
the missing scheduler.
There are 3 scheduling options:
1. Run on Demand
2. Run Continuously
3. Run on Server initialization
1. Run on Demand:
Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously:
Integration Service runs the workflow as soon as the service initializes. The Integration
Service then starts the next run of the workflow as soon as it finishes the previous run.
3. Run on Server initialization
Integration Service runs the workflow as soon as the service is initialized. The
Integration Service then starts the next run of the workflow according to settings in
Schedule Options.
Schedule options for Run on Server initialization:
Customized Repeat: Integration Service runs the workflow on the dates and
times specified in the Repeat dialog box.
Start options for Run on Server initialization:
Start Date
Start Time
End options for Run on Server initialization:
End After: IS stops scheduling the workflow after the set number of
Workflow runs.
Forever: IS schedules the workflow as long as the workflow does not fail.
To configure the scheduler for a workflow:
In the Scheduler tab, choose Non-reusable. Select Reusable if we want to select an existing reusable scheduler for the workflow.
Click the right side of the Scheduler field to edit scheduling settings for the non-reusable scheduler.
Click Ok.
Points to Ponder:
To remove a workflow from its schedule, right-click the workflow in the Navigator
window and choose Unscheduled Workflow.
You can push transformation logic to the source or target database using pushdown
optimization. When you run a session configured for pushdown optimization,
the Integration Service translates the transformation logic into SQL queries and
sends the SQL queries to the database. The source or target database executes the
SQL queries to process the transformations.
The amount of transformation logic you can push to the database depends on the
database, transformation logic, and mapping and session configuration. The Integration
Service processes all transformation logic that it cannot push to a database.
Use the Pushdown Optimization Viewer to preview the SQL statements and mapping
logic that the Integration Service can push to the source or target database. You can
also use the Pushdown Optimization Viewer to view the messages related to pushdown
optimization.
The following figure shows a mapping containing transformation logic that can be pushed
to the database: the mapping passes only rows with an ID greater than 1005. The
Integration Service can push the transformation logic to the database. It generates an
SQL statement of the following form to process the transformation logic:
INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC, n_PRICE) SELECT ...
The Integration Service generates and executes SQL statements against the source or target
based on the transformation logic it can push to the database.
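A possible shape of the complete statement, with assumed source table and column names (the actual SQL depends on the source definition and the mapping):
INSERT INTO ITEMS (ITEM_ID, ITEM_NAME, ITEM_DESC, n_PRICE)
SELECT ITEM_ID, ITEM_NAME, ITEM_DESC, PRICE
FROM ITEMS_SRC          -- hypothetical source table
WHERE ITEM_ID > 1005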
When you run a session with large quantities of data and full pushdown optimization, the
database server must run a long transaction, and long transactions can cause database
performance issues.
To minimize database performance issues for long transactions, consider using source-side or target-side pushdown optimization.
Integration Service Behavior with Full Optimization
When you configure a session for full optimization, the Integration Service analyzes the
mapping from the source to the target or until it reaches a downstream transformation it
cannot push to the target database. If the Integration Service cannot push all
transformation logic to the target database, it tries to push all transformation logic to the
source database. If it cannot push all transformation logic to the source or target, the
Integration Service pushes as much transformation logic as possible to the source database,
processes intermediate transformations that it cannot push to any database, and then
pushes the remaining transformation logic to the target database. The Integration
Service generates and executes an INSERT SELECT, DELETE, or UPDATE statement for
each database to which it pushes transformation logic.
For example, a mapping contains the following transformations:
The Rank transformation cannot be pushed to the source or target database. If you
configure the session for full pushdown optimization, the Integration Service pushes the
Source Qualifier transformation and the Aggregator transformation to the source,
processes the Rank transformation, and pushes the Expression transformation and
target to the target database. The Integration Service does not fail the session if it can
push only part of the transformation logic to the database.
Active and Idle Databases
During pushdown optimization, the Integration Service pushes the transformation logic
to one database, which is called the active database. A database that does not process
transformation logic is called an idle database. For example, a mapping contains two
sources that are joined by a Joiner transformation. If the session is configured for
source-side pushdown optimization, the Integration Service pushes the Joiner
transformation logic to the source in the detail pipeline, which is the active database.
The source in the master pipeline is the idle database because it does not process
transformation logic.
The Integration Service uses pushdown criteria to determine which database is active
or idle. Pushdown optimization can be used with the following databases:
IBM DB2
Microsoft SQL Server
Netezza
Oracle
Sybase ASE
Teradata
Databases that use ODBC drivers
When you push transformation logic to a database, the database may produce different
output than the Integration Service. In addition, the Integration Service can usually push
more transformation logic to a database if you use a native driver, instead of an ODBC
driver.
Comparing the Output of the Integration Service and Databases
The Integration Service and databases can produce different results when processing the
same transformation logic. The Integration Service sometimes converts data to a
different format when it reads data. The Integration Service and database may also
handle null values, case sensitivity, and sort order differently.
The database and Integration Service produce different output when the following
settings and conversions are different:
For example, the Integration Service pushes the following expression to the database:
TO_DATE( DATE_PROMISED, 'MM/DD/YY' )
The database interprets the date string in the DATE_PROMISED port based on the
specified date format string MM/DD/YY. The database converts each date string, such as
01/22/98, to the supported date value, such as Jan 22 1998 00:00:00.
If the Integration Service pushes a date format to an IBM DB2, a Microsoft SQL Server,
or a Sybase database that the database does not support, the Integration Service stops
pushdown optimization and processes the transformation.
The Integration Service converts all dates before pushing transformations to an Oracle or
Teradata database. If the database does not support the date format after the date
conversion, the session fails.
HH24 date format. You cannot use the HH24 format in the date format
string for Teradata. When the Integration Service generates SQL for a
Teradata database, it uses the HH format string instead.
Blank spaces in date format strings. You cannot use blank spaces in
the date format string in Teradata. When the Integration Service
generates SQL for a Teradata database, it substitutes the space with B.
Handling subsecond precision for a Lookup transformation. If you
enable subsecond precision for a Lookup transformation, the database
and Integration Service perform the lookup comparison using the
subsecond precision, but return different results. Unlike the Integration
Service, the database does not truncate the lookup results based on
subsecond precision. For example, you configure the Lookup
transformation to show subsecond precision to the millisecond. If the
lookup result is 8:20:35.123456, a database returns 8:20:35.123456, but
the Integration Service returns 8:20:35.123.
SYSDATE built-in variable. When you use the SYSDATE built-in
variable, the Integration Service returns the current date and time for the
node running the service process. However, when you push the
transformation logic to the database, the SYSDATE variable returns the
current date and time for the machine hosting the database. If the time
zone of the machine hosting the database is not the same as the time
zone of the machine running the Integration Service process, the results
can vary.