PREPARED BY:
Ammar Hasan
CONTENTS
CHAPTER 1: TOOL KNOWLEDGE
1.1 Informatica PowerCenter
1.2 Product Overview
1.2.1 PowerCenter Domain
1.2.2 Administration Console
1.2.3 PowerCenter Repository
1.2.4 PowerCenter Client
1.2.5 Repository Service
1.2.6 Integration Service
1.2.7 Web Services Hub
1.2.8 Data Analyzer
1.2.9 Metadata Manager
CHAPTER 2: REPOSITORY MANAGER
CHAPTER 3: DESIGNER
3.10 Union Transformation
3.11 Sorter Transformation
3.12 Rank Transformation
3.13 Aggregator Transformation
3.14 Joiner Transformation
3.15 Source Qualifier
3.16 Lookup Transformation
3.16.1 Lookup Types
3.16.2 Lookup Transformation Components
3.16.3 Connected Lookup Transformation
3.16.4 Unconnected Lookup Transformation
3.16.5 Lookup Cache Types: Dynamic, Static, Persistent, Shared
3.17 Update Strategy
3.18 Dynamic Lookup Cache Use
3.19 Lookup Query
3.20 Lookup and Update Strategy Examples
Example to Insert and Update without a Primary Key
Example to Insert and Delete based on a condition
3.21 Stored Procedure Transformation
3.21.1 Connected Stored Procedure Transformation
3.21.2 Unconnected Stored Procedure Transformation
3.22 Sequence Generator Transformation
3.23 Mapplets: Mapplet Input and Mapplet Output Transformations
3.24 Normalizer Transformation
3.25 XML Sources Import and Usage
3.26 Mapping Wizards
3.26.1 Getting Started
3.26.2 Slowly Changing Dimensions
3.27 Mapping Parameters and Variables
3.28 Parameter File
3.29 Indirect Flat File Loading
CHAPTER 4: WORKFLOW MANAGER
Chapter 1: Informatica PowerCenter
Data Cleanse and Match Option features powerful, integrated cleansing and
matching capabilities to correct and remove duplicate customer data.
Data Federation Option enables a combination of traditional physical and virtual
data integration in a single platform.
Data Masking Option protects sensitive, private information by masking it in
flight to produce realistic-looking data, reducing the risk of security and compliance
breaches.
Grid Option reduces the administrative overhead of supporting grid computing.
Partitioning Option enables parallel processing of partitioned data for improved performance.
Pushdown Optimization Option enables transformation processing, where appropriate, to be pushed down into any relational database to make the best use of existing database assets.
Team-Based Development Option facilitates collaboration among development, quality assurance, and production administration teams and across geographically disparate teams.
Service Manager: The Service Manager is built in to the domain to support the
domain and the application services. The Service Manager runs on each node in the
domain. The Service Manager starts and runs the application services on a machine.
Application services: A group of services that represent PowerCenter server-based functionality, including the following:
Repository Service: Manages connections to the PowerCenter repository.
Integration Service: Runs sessions and workflows.
Web Services Hub: Exposes PowerCenter functionality to external clients
through web services.
SAP BW Service: Listens for RFC requests from SAP NetWeaver BW and
initiates workflows to extract from or load to SAP BW.
Global repository: The global repository is the hub of the repository domain. Use
the global repository to store common objects that multiple developers can use
through shortcuts. These objects may include operational or Application source
definitions, reusable transformations, mapplets, and mappings.
Local repositories: A local repository is any repository within the domain that is
not the global repository. Use local repositories for development. From a local
repository, you can create shortcuts to objects in shared folders in the global
repository. These objects include source definitions, common dimensions and
lookups, and enterprise standard transformations. You can also create copies of
objects in non-shared folders.
PowerCenter supports versioned repositories. A versioned repository can store
multiple versions of an object. PowerCenter version control allows you to efficiently
develop, test, and deploy metadata into production.
Designer:
Use the Designer to create mappings that contain transformation instructions for the
Integration Service.
The Designer has the following tools that you use to analyze sources, design target
schemas, and build source-to-target mappings:
Source Analyzer: Import or create source definitions.
Target Designer: Import or create target definitions.
Transformation Developer: Develop transformations to use in mappings.
You can also develop user-defined functions to use in expressions.
Mapplet Designer: Create sets of transformations to use in mappings.
Mapping Designer: Create mappings that the Integration Service uses to
extract, transform, and load data.
Data Stencil
Use the Data Stencil to create mapping templates that can be used to generate
multiple mappings. Data Stencil uses the Microsoft Office Visio interface to create
mapping templates. It is not usually used by developers.
Repository Manager
Use the Repository Manager to administer repositories. You can navigate through
multiple folders and repositories, and complete the following tasks:
Manage users and groups: Create, edit, and delete repository users and
user groups. We can assign and revoke repository privileges and folder
permissions.
Perform folder functions: Create, edit, copy, and delete folders. Work
we perform in the Designer and Workflow Manager is stored in folders. If we
want to share metadata, we can configure a folder to be shared.
We create repository objects using the Designer and Workflow Manager Client tools.
We can view the following objects in the Navigator window of the Repository
Manager:
Source definitions: Definitions of database objects (tables, views, synonyms) or
files that provide source data.
Target definitions: Definitions of database objects or files that contain the target
data.
Mappings: A set of source and target definitions along with transformations
containing business logic that you build into the transformation. These are the
instructions that the Integration Service uses to transform and move data.
Reusable transformations: Transformations that we use in multiple mappings.
Mapplets: A set of transformations that you use in multiple mappings.
Sessions and workflows: Sessions and workflows store information about how and
when the Integration Service moves data. A workflow is a set of instructions that
describes how and when to run tasks related to extracting, transforming, and loading
data. A session is a type of task that you can put in a workflow. Each session
corresponds to a single mapping.
Workflow Manager
Use the Workflow Manager to create, schedule, and run workflows. A workflow is a
set of instructions that describes how and when to run tasks related to extracting,
transforming, and loading data.
The Workflow Manager has the following tools to help us develop a workflow:
Task Developer: Create tasks we want to accomplish in the workflow.
Worklet Designer: Create a worklet in the Worklet Designer. A worklet is an object
that groups a set of tasks. A worklet is similar to a workflow, but without scheduling
information. We can nest worklets inside a workflow.
Workflow Designer: Create a workflow by connecting tasks with links in the
Workflow Designer. You can also create tasks in the Workflow Designer as you
develop the workflow.
When we create a workflow in the Workflow Designer, we add tasks to the workflow.
The Workflow Manager includes tasks, such as the Session task, the Command task,
and the Email task so you can design a workflow. The Session task is based on a
mapping we build in the Designer.
We then connect tasks with links to specify the order of execution for the tasks we
created. Use conditional links and workflow variables to create branches in the
workflow.
Workflow Monitor
Use the Workflow Monitor to monitor scheduled and running workflows for each
Integration Service.
We can view details about a workflow or task in Gantt Chart view or Task view. We
can run, stop, abort, and resume workflows from the Workflow Monitor. We can view
sessions and workflow log events in the Workflow Monitor Log Viewer.
The Workflow Monitor displays workflows that have run at least once. The Workflow
Monitor continuously receives information from the Integration Service and
Repository Service. It also fetches information from the repository to display historic
information.
PowerCenter Client: Use the Designer and Workflow Manager to create and
store mapping metadata and connection object information in the repository. Use the
Workflow Monitor to retrieve workflow run status information and session logs
written by the Integration Service. Use the Repository Manager to organize and
secure metadata by creating folders, users, and groups.
Integration Service (IS): When we start the IS, it connects to the repository to
schedule workflows. When we run a workflow, the IS retrieves workflow task and
mapping metadata from the repository. IS writes workflow status to the repository.
Web Services Hub: When we start the Web Services Hub, it connects to the
repository to access web-enabled workflows. The Web Services Hub retrieves
workflow task and mapping metadata from the repository and writes workflow status
to the repository.
SAP BW Service: Listens for RFC requests from SAP NetWeaver BW and initiates
workflows to extract from or load to SAP BW.
Repository Connectivity:
PowerCenter applications such as the PowerCenter Client, the Integration Service,
pmrep, and infacmd connect to the repository through the Repository Service.
The following process describes how a repository client application connects to the
repository database:
1) The repository client application sends a repository connection request to the
master gateway node, which is the entry point to the domain. This is node B
in the diagram.
2) The Service Manager sends back the host name and port number of the node
running the Repository Service. If you have the high availability option, you
can configure the Repository Service to run on a backup node. Node A in
above diagram.
3) The repository client application establishes a link with the Repository Service
process on node A. This communication occurs over TCP/IP.
Understanding Metadata
The repository stores metadata that describes how to extract, transform, and load
source and target data. PowerCenter metadata describes several different kinds of
repository objects. We use different PowerCenter Client tools to develop each kind of
object.
If we enable version control, we can store multiple versions of metadata objects in
the repository.
We can also extend the metadata stored in the repository by associating information
with repository objects. For example, when someone in our organization creates a
source definition, we may want to store the name of that person with the source
definition. We associate information with repository metadata using metadata
extensions.
Administering Repositories
We use the PowerCenter Administration Console, the Repository Manager, and the
pmrep and infacmd command line programs to administer repositories.
This is not normally used by an Informatica developer and is not in the scope of our
training.
Chapter 2: Repository Manager
CHAPTER 2: REPOSITORY MANAGER
We can navigate through multiple folders and repositories and perform basic
repository tasks with the Repository Manager. This is an administration tool and
is used by the Informatica administrator.
2. Enter the name of the repository and a valid repository user name.
3. Click OK.
Before we can connect to the repository for the first time, we must configure the
connection information for the domain that the repository belongs to.
3. Click the Add button. The Add Domain dialog box appears.
4. Enter the domain name, gateway host name, and gateway port number.
5. Click OK to add the domain connection.
The repository creates the following types of locks:
In-use lock: Placed on objects we want to view.
Write-intent lock: Placed on objects we want to modify.
Execute lock: Locks objects we want to run, such as workflows and sessions.
Steps:
1.
2.
3.
4.
3. Click ok.
Chapter 3: Designer
CHAPTER 3: DESIGNER
The Designer has tools to help us build mappings and mapplets so we can specify
how to move and transform data between sources and targets. The Designer helps
us create source definitions, target definitions, and transformations to build the
mappings.
The Designer lets us work with multiple tools at one time and to work in multiple
folders and repositories at the same time. It also includes windows so we can view
folders, repository objects, and tasks.
Designer Tools:
Source Analyzer: Use to import or create source definitions for flat file, XML,
COBOL, Application, and relational sources.
Target Designer: Use to import or create target definitions.
Transformation Developer: Use to create reusable transformations.
Mapplet Designer: Use to create mapplets.
Mapping Designer: Use to create mappings.
Designer Windows:
Overview Window
Designer Tasks:
Add a repository.
Print the workspace.
View date and time an object was last saved.
Open and close a folder.
Create shortcuts.
Check out and in repository objects.
Search for repository objects.
Enter descriptions for repository objects.
View older versions of objects in the workspace.
Revert to a previously saved object version.
Copy objects.
Export and import repository objects.
Work with multiple objects, ports, or columns.
Rename ports.
Use shortcut keys.
6) Click Next. Follow the directions in the wizard to manipulate the column
breaks in the file preview window. Move existing column breaks by dragging
them. Double-click a column break to delete it.
7) Click Next and enter column information for each column in the file.
8) Click Finish.
9) Click Repository > Save.
Delimited file properties (Required/Optional):
Delimiters: Required
Treat Consecutive Delimiters as One: Optional
Escape Character: Optional
Remove Escape Character From Data: Optional
Use Default Text Length: Optional
Text Qualifier: Required
Target flat files are handled the same way as described in the sections above.
Just make sure to select Tools -> Target Designer instead of Source Analyzer.
The rest is the same.
3.3 MAPPINGS
A mapping is a set of source and target definitions linked by transformation objects
that define the rules for data transformation. Mappings represent the data flow
between sources and targets. When the Integration Service runs a session, it uses
the instructions configured in the mapping to read, transform, and write data.
Mapping Components:
3.4 TRANSFORMATIONS
A transformation is a repository object that generates, modifies, or passes data. You
configure logic in a transformation that the Integration Service uses to transform
data. The Designer provides a set of transformations that perform specific functions.
For example, an Aggregator transformation performs calculations on groups of data.
Transformations in a mapping represent the operations the Integration Service
performs on the data. Data passes through transformation ports that we link in a
mapping or mapplet.
Types of Transformations:
Active: An active transformation can change the number of rows that pass through
it, such as a Filter transformation that removes rows that do not meet the filter
condition.
Passive: A passive transformation does not change the number of rows that pass
through it, such as an Expression transformation that performs a calculation on data
and passes all rows through the transformation.
Drag a port from another transformation. When we drag a port from another
transformation the Designer creates a port with the same properties, and it
links the two ports. Click Layout > Copy Columns to enable copying ports.
Click the Add button on the Ports tab. The Designer creates an empty port
you can configure.
Input port: The system default value for null input ports is NULL. It displays
as a blank in the transformation. If an input value is NULL, the Integration
Service leaves it as NULL.
Output port: The system default value for output transformation errors is
ERROR. The default value appears in the transformation as
ERROR('transformation error'). If a transformation error occurs, the
Integration Service skips the row. The Integration Service notes all input rows
skipped by the ERROR function in the session log file.
Input/output port: The system default value for null input is the same as
input ports, NULL. The system default value appears as a blank in the
transformation. The default value for output transformation errors is the same
as output ports.
Note: Variable ports do not support default values. The Integration Service initializes
variable ports according to the datatype.
Note: The Integration Service ignores user-defined default values for unconnected
transformations.
Tracing levels:
Normal: Logs initialization and status information, errors encountered, and rows skipped due to transformation row errors. Summarizes session results, but not at the row level.
Terse: Logs initialization information, error messages, and notification of rejected data.
Verbose Initialization: In addition to Normal tracing, logs additional initialization details and the names of index and data files used.
Verbose Data: In addition to Verbose Initialization tracing, logs each row that passes into the mapping.
Change the tracing level to a Verbose setting only when we need to debug a
transformation that is not behaving as expected.
To add a slight performance boost, we can also set the tracing level to Terse.
Note: We can edit the source definition by dragging the table in Source Analyzer
only.
Shortcut use:
If we select the Paste option, a copy of the EMP table definition will be created.
Suppose there are 10 developers: 5 use a shortcut to EMP and 5 copy the
definition of EMP.
Now suppose the definition of EMP changes in the database.
We reimport the EMP definition, and the old definition is replaced.
Developers who were using shortcuts will see that the changes are
reflected in their mappings automatically.
Developers using copies will have to reimport manually.
So, for maintenance and ease, we use shortcuts to source and target
definitions in our folder, and shortcuts to other reusable transformations and
mapplets.
Creating Mapping:
1.
2.
3.
4.
5.
6.
7.
Creating Session:
Now we will create session in workflow manager.
1.
2.
3.
4.
5.
Creating Workflow:
1.
2.
3.
4.
5.
6.
7.
8.
Use the Expression transformation to calculate values in a single row before we write
to the target. For example, we might need to adjust employee salaries, concatenate
first and last names, or convert strings to numbers.
Use the Expression transformation to perform any non-aggregate calculations.
Example: Addition, Subtraction, Multiplication, Division, Concat, Uppercase
conversion, lowercase conversion etc.
We can also use the Expression transformation to test conditional statements before
we output the results to target tables or other transformations. Examples: IIF,
DECODE.
Calculating Values
To use the Expression transformation to calculate values for a single row, we must
include the following ports:
Input or input/output ports for each value used in the calculation: For
example: To calculate Total Salary, we need salary and commission.
Output port for the expression: We enter one expression for each output
port. The return value for the output port needs to match the return value of
the expression.
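For example, a minimal sketch of the expression for a TOTAL_SAL output port, assuming input ports SAL and COMM as in the example below, and treating a null COMM as zero:
IIF(ISNULL(COMM), SAL, SAL + COMM)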
Import the source table EMP in the Shared folder. If it is already there, then don't
import it.
In shared folder, create the target table Emp_Total_SAL. Keep all ports as in
EMP table except Sal and Comm in target table. Add Total_SAL port to store
the calculation.
Create the necessary shortcuts in the folder.
Creating Mapping:
1.
2.
3.
4.
5.
Create Session and Workflow as described earlier. Run the workflow and
see the data in target table.
As COMM is null for some employees, Total_SAL will be null for those rows. Now open the
mapping and the Expression transformation. Select the COMM port and, in Default Value,
give 0. Now apply the changes. Validate the mapping and save.
Refresh the session and validate workflow again. Run the workflow and see
the result again.
Now use ERROR in the Default Value of COMM to skip rows where COMM is null.
Syntax: ERROR('Any message here')
Similarly, we can use the ABORT function to abort the session if COMM is null.
Syntax: ABORT('Any message here')
Make sure to double click the session after doing any changes in mapping. It will
prompt that mapping has changed. Click OK to refresh the mapping. Run workflow
after validating and saving the workflow.
We can filter rows in a mapping with the Filter transformation. We pass all the rows
from a source transformation through the Filter transformation, and then enter a
filter condition for the transformation. All ports in a Filter transformation are
input/output and only rows that meet the condition pass through the Filter
transformation.
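For example, a filter condition that passes only department 10 employees earning more than 2000 could look like this (a sketch using the EMP columns from the examples in this chapter):
DEPTNO = 10 AND SAL > 2000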
Import the source table EMP in the Shared folder. If it is already there, then don't
import it.
In shared folder, create the target table Filter_Example. Keep all fields as in
EMP table.
Create the necessary shortcuts in the folder.
Creating Mapping:
1.
2.
3.
4.
5.
Create Session and Workflow as described earlier. Run the workflow and
see the data in target table.
Mapping A uses three Filter transformations while Mapping B produces the same
result with one Router transformation.
A Router transformation consists of input and output groups, input and output ports,
group filter conditions, and properties that we configure in the Designer.
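A sketch of group filter conditions for a Router that splits EMP rows by department (the group names are assumptions for illustration):
Group DEPT_10: DEPTNO = 10
Group DEPT_20: DEPTNO = 20
Rows that satisfy no group filter condition are routed to the default group.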
Creating Mapping:
1.
2.
3.
4.
5.
Sample Mapping:
The Union transformation is a multiple input group transformation that you can use
to merge data from multiple pipelines or pipeline branches into one pipeline branch.
It merges data from multiple sources similar to the UNION ALL SQL statement to
combine the results from two or more SQL statements.
We can create multiple input groups, but only one output group.
We can connect heterogeneous sources to a Union transformation.
All input groups and the output group must have matching ports. The
precision, datatype, and scale must be identical across all groups.
The Union transformation does not remove duplicate rows. To remove
duplicate rows, we must add another transformation such as a Router or Filter
transformation.
We cannot use a Sequence Generator or Update Strategy transformation
upstream from a Union transformation.
Creating Mapping:
1.
2.
3.
4.
5.
We can specify any amount between 1 MB and 4 GB for the Sorter cache
size.
If it cannot allocate enough memory, the PowerCenter Server fails the
session.
For best performance, configure Sorter cache size with a value less than
or equal to the amount of available physical RAM on the PowerCenter
Server machine.
Informatica recommends allocating at least 8 MB (8,388,608 bytes) of
physical memory to sort data using the Sorter transformation.
2. Case Sensitive:
The Case Sensitive property determines whether the PowerCenter Server
considers case when sorting data. When we enable the Case Sensitive property,
the PowerCenter Server sorts uppercase characters higher than lowercase
characters.
3. Work Directory
Directory PowerCenter Server uses to create temporary files while it sorts data.
4. Distinct:
Check this option if we want to remove duplicates. Sorter will sort data according
to all the ports when it is selected.
Creating Mapping:
1.
2.
3.
4.
5.
The Rank transformation allows us to select only the top or bottom rank of data. It
allows us to select a group of top or bottom values, not just one value.
During the session, the PowerCenter Server caches input data until it can perform
the rank calculations.
Rank transformation ports (number required):
Input port (I): Minimum 1
Output port (O): Minimum 1
Variable port (V): Not needed
Rank port (R): Only 1
Rank Index
The Designer automatically creates a RANKINDEX port for each Rank transformation.
The PowerCenter Server uses the Rank Index port to store the ranking position for
each row in a group.
For example, if we create a Rank transformation that ranks the top five salaried
employees, the rank index numbers the employees from 1 to 5.
The RANKINDEX is an output port only.
We can pass the rank index to another transformation in the mapping or
directly to a target.
We cannot delete or edit it.
Defining Groups
Rank transformation allows us to group information. For example: If we want to
select the top 3 salaried employees of each Department, we can define a group for
department.
By defining groups, we create one set of ranked rows for each group.
We define a group in Ports tab. Click the Group By for needed port.
We cannot Group By on port which is also Rank Port.
1> Example: Finding Top 5 Salaried Employees
Creating Mapping:
1.
2.
3.
4.
5.
6.
RANK CACHE
When the PowerCenter Server runs a session with a Rank transformation, it
compares an input row with rows in the data cache. If the input row out-ranks a
stored row, the PowerCenter Server replaces the stored row with the input row.
Example: PowerCenter caches the first 5 rows if we are finding the top 5 salaried
employees. When the 6th row is read, it is compared with the 5 rows in the cache and
placed in the cache if needed.
All variable ports (if any), the rank port, and all ports going out from the Rank
transformation are stored in the RANK DATA CACHE.
Example: all ports except DEPTNO in our mapping example.
Components of the Aggregator transformation:
Aggregate expression
Group by port
Sorted Input
Aggregate cache
Conditional Clauses
We can use conditional clauses in the aggregate expression to reduce the number of
rows used in the aggregation. The conditional clause can be any clause that
evaluates to TRUE or FALSE.
SUM( COMMISSION, COMMISSION > QUOTA )
Non-Aggregate Functions
We can also use non-aggregate functions in the aggregate expression.
IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0 )
The PowerCenter Server stores data in the aggregate cache until it completes
aggregate calculations.
It stores group values in an index cache and row data in the data cache. If
the PowerCenter Server requires more space, it stores overflow values in
cache files.
1> Example: To calculate MAX, MIN, AVG and SUM of salary of EMP table.
Creating Mapping:
1.
2.
3.
4.
5.
6.
7.
8.
9.
2> Example: To calculate MAX, MIN, AVG and SUM of salary of EMP table for
every DEPARTMENT
Open the mapping made above. Edit the Aggregator transformation.
Go to Ports Tab. Select Group By for DEPTNO.
Click Apply -> Ok.
Mapping -> Validate and Repository Save.
Refresh the session by double clicking. Save the changed and run workflow to see
the new result.
Creating Mapping:
1>
2>
3>
4>
5>
6>
7>
8>
Specify the join condition in Condition tab. See steps on next page.
Set Master in Ports tab. See steps on next page.
Mapping -> Validate
Repository -> Save.
JOIN CONDITION:
The join condition contains ports from both input sources that must match for the
PowerCenter Server to join two rows.
Example: DEPTNO=DEPTNO1 in above.
1. Edit Joiner Transformation -> Condition Tab
2. Add condition
If we join Char and Varchar datatypes, the PowerCenter Server counts any spaces
that pad Char values as part of the string. So if you try to join the following:
Char (40) = abcd and Varchar (40) = abcd
Then the Char value is abcd padded with 36 blank spaces, and the PowerCenter
Server does not join the two fields because the Char field contains trailing spaces.
Note: The Joiner transformation does not match null values.
JOIN TYPES
In SQL, a join is a relational operator that combines data from multiple tables into a
single result set. The Joiner transformation acts in much the same manner, except
that tables can originate from different databases or flat files.
Types
of Joins:
Normal
Master Outer
Detail Outer
Full Outer
Note: A normal or master outer join performs faster than a full outer or detail outer
join.
Example: In EMP, we have employees with DEPTNO 10, 20, 30 and 50. In DEPT, we
have DEPTNO 10, 20, 30 and 40. DEPT will be MASTER table as it has less rows.
Normal Join:
With a normal join, the PowerCenter Server discards all rows of data from the master
and detail source that do not match, based on the condition.
All employees of 10, 20 and 30 will be there as only they are matching.
Master Outer Join:
This join keeps all rows of data from the detail source and the matching rows from
the master source. It discards the unmatched rows from the master source.
JOINER CACHES
Joiner always caches the MASTER table. We cannot disable caching. It builds Index
cache and Data Cache based on MASTER table.
1> Joiner Index Cache:
All Columns of MASTER table used in Join condition are in JOINER INDEX
CACHE.
Example: DEPTNO in our mapping.
2> Joiner Data Cache:
Master column not in join condition and used for output to other
transformation or target table are in Data Cache.
Example: DNAME and LOC in our mapping example.
Join data originating from the same source database: We can join two
or more tables with primary key-foreign key relationships by linking the
sources to one Source Qualifier transformation.
Filter rows when the PowerCenter Server reads source data: If we
include a filter condition, the PowerCenter Server adds a WHERE clause to the
default query.
Specify an outer join rather than the default inner join: If we include a
user-defined join, the PowerCenter Server replaces the join information
specified by the metadata in the SQL query.
Specify sorted ports: If we specify a number for sorted ports, the
PowerCenter Server adds an ORDER BY clause to the default SQL query.
Select only distinct values from the source: If we choose Select Distinct,
the PowerCenter Server adds a SELECT DISTINCT statement to the default
SQL query.
Create a custom query to issue a special SELECT statement for the
PowerCenter Server to read source data: For example, you might use a
custom query to perform aggregate calculations.
All of the above are possible in the Properties tab of the Source Qualifier transformation.
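For example, if we connect only EMPNO, ENAME, and SAL and set the source filter SAL > 2000, the query the PowerCenter Server issues would look roughly like this (a sketch, not the exact generated text):
SELECT EMP.EMPNO, EMP.ENAME, EMP.SAL
FROM EMP
WHERE EMP.SAL > 2000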
SAMPLE MAPPING TO BE MADE:
Creating Mapping:
1>
2>
3>
4>
5>
6>
7>
SQ PROPERTIES TAB
Validate the mapping. Save it. Now refresh session and save the changes. Now
run the workflow and see output.
We can specify equi join, left outer join and right outer join only. We
cannot specify full outer join. To use full outer join, we need to write
SQL Query.
Steps:
1> Open the Source Qualifier transformation, and click the Properties tab.
2> Click the Open button in the User Defined Join field. The SQL Editor Dialog
box appears.
3>Enter the syntax for the join.
Syntax
Equi Join
DEPT.DEPTNO=EMP.DEPTNO
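Left Outer Join (a hedged sketch; for outer joins PowerCenter expects the Informatica join syntax, in braces, in the User Defined Join field):
{ EMP LEFT OUTER JOIN DEPT ON DEPT.DEPTNO = EMP.DEPTNO }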
In mapping above, we are passing only SAL and DEPTNO from SQ_EMP to
Aggregator transformation. Default query generated will be:
4. The SQL Editor displays the default query the PowerCenter Server uses to
select source data.
5. Click Cancel to exit.
Note: If we do not cancel the SQL query, the PowerCenter Server overrides
the default query with the custom SQL query.
We can enter an SQL statement supported by our source database. Before entering
the query, connect all the input and output ports we want to use in the mapping.
Example: As in our case we can't use a full outer join in the user-defined join, we can
write an SQL query for a FULL OUTER JOIN:
SELECT DEPT.DEPTNO, DEPT.DNAME, DEPT.LOC, EMP.EMPNO, EMP.ENAME,
EMP.JOB, EMP.SAL, EMP.COMM, EMP.DEPTNO
FROM EMP FULL OUTER JOIN DEPT ON DEPT.DEPTNO=EMP.DEPTNO
WHERE SAL>2000
We also added WHERE clause. We can enter more conditions and write
more complex SQL.
We can write any query. We can join as many tables in one query as
required if all are in same database. It is very handy and used in most of the
projects.
Important Points:
When creating a custom SQL query, the SELECT statement must list the
port names in the order in which they appear in the transformation.
Example: DEPTNO is the top column and DNAME is second in our SQ mapping.
So when we write the SQL query, the SELECT statement must have DEPTNO
first, DNAME second, and so on: SELECT DEPT.DEPTNO, DEPT.DNAME ...
Once we have written a custom query like the one above, this query will
always be used to fetch data from the database. In our example, we used
WHERE SAL>2000. If we now use a Source Filter and give the condition
SAL>1000, or any other, it will not work. Informatica will always
use the custom query only.
Make sure to test the query in the database before using it in the SQL
Query field. If the query does not run in the database, it won't work in
Informatica either.
Also, always connect to the database and validate the SQL in the SQL query
editor.
Passive Transformation
Can be Connected or Unconnected. Dynamic lookup is connected.
Use a Lookup transformation in a mapping to look up data in a flat file or a
relational table, view, or synonym.
We can import a lookup definition from any flat file or relational database to
which both the PowerCenter Client and Server can connect.
We can use multiple Lookup transformations in a mapping.
The PowerCenter Server queries the lookup source based on the lookup ports in the
transformation. It compares Lookup transformation port values to lookup source
column values based on the lookup condition. Pass the result of the lookup to other
transformations and a target.
We can use the Lookup transformation to perform following:
Get a related value: EMP has DEPTNO but DNAME is not there. We use
Lookup to get DNAME from DEPT table based on Lookup Condition.
Perform a calculation: We want only those employees whose SAL >
AVG(SAL). We will write a lookup override query.
Update slowly changing dimension tables: Most important use. We can
use a Lookup transformation to determine whether rows already exist in the
target.
Connected or Unconnected
Relational or Flat File
Cached or Uncached
Relational Lookup:
When we create a Lookup transformation using a relational table as a lookup source,
we can connect to the lookup source using ODBC and import the table definition as
the structure for the Lookup transformation.
We can override the default SQL statement if we want to add a WHERE clause
or query multiple tables.
We can use a dynamic lookup cache with relational lookups.
Unconnected Lookup:
An unconnected Lookup transformation receives input values from the result of a
:LKP expression in another transformation. The PowerCenter Server queries the
lookup source based on the lookup ports and condition, and passes one output
value (the return port) back to the transformation that calls it.
1. Lookup Source:
We can use a flat file or a relational table for a lookup source. When we create a
Lookup t/f, we can import the lookup source from the following locations:
Any relational source or target definition in the repository
Any flat file source or target definition in the repository
Any table or file that both the PowerCenter Server and Client machine can
connect to
The lookup table can be a single table, or we can join multiple tables in the same
database using a lookup SQL override in Properties Tab.
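A hedged sketch of such an override; the LOCATIONS table and its columns are assumptions for illustration, and note the AS aliases, which the lookup override requires (see the TE_7001 note later in this chapter):
SELECT DEPT.DEPTNO AS DEPTNO, DEPT.DNAME AS DNAME, L.CITY AS LOC
FROM DEPT, LOCATIONS L
WHERE DEPT.LOC_ID = L.LOC_ID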
2. Ports:
Input port (I): Connected and Unconnected. Minimum 1 needed.
Output port (O): Connected and Unconnected. Minimum 1 needed.
Lookup port (L): Connected and Unconnected. Minimum 1 needed.
Return port (R): Unconnected only. 1 only.
3. Properties Tab
Option (Lookup Type):
Lookup SQL Override: Relational
Lookup Table Name: Relational
Lookup Caching Enabled: Flat File, Relational
Lookup Policy on Multiple Match: Flat File, Relational
Lookup Condition: Flat File, Relational
Connection Information: Relational
Source Type: Flat File, Relational
Lookup Cache Directory Name: Flat File, Relational
Lookup Cache Persistent: Flat File, Relational
Dynamic Lookup Cache: Flat File, Relational
Recache From Lookup Source: Flat File, Relational
Insert Else Update: Relational
Update Else Insert: Relational
Lookup Data Cache Size: Flat File, Relational
Lookup Index Cache Size: Flat File, Relational
Flat file lookup properties also include: Datetime Format, Thousand Separator,
Decimal Separator, Case-Sensitive String Comparison, Null Ordering, and Sorted Input.
4: Condition Tab
We enter the Lookup Condition. The PowerCenter Server uses the lookup condition to
test incoming values. We compare transformation input values with values in the
lookup source or cache, represented by lookup ports.
Tip: If we include more than one lookup condition, place the conditions with an equal
sign first to optimize lookup performance.
Note:
1. We can use only the = operator in case of a dynamic cache.
2. The PowerCenter Server fails the session when it encounters multiple keys for
a Lookup transformation configured to use a dynamic cache.
Creating Mapping:
1.
2.
3.
4.
5.
6.
8. As DEPT is the Source definition, click Source and then Select DEPT.
10> Now Pass DEPTNO from SQ_EMP to this Lookup. DEPTNO from SQ_EMP will
be named as DEPTNO1. Edit Lookup and rename it to IN_DEPTNO in ports
tab.
11> Now go to CONDITION tab and add CONDITION.
DEPTNO = IN_DEPTNO and Click Apply and then OK.
Link the mapping as shown below:
12> We are not passing IN_DEPTNO and DEPTNO to any other transformation
from LOOKUP; we can edit the lookup transformation and remove the
OUTPUT check from them.
13> Mapping -> Validate
14> Repository -> Save
We use Connected Lookup when we need to return more than one column from
Lookup table.
There is no use of Return Port in Connected Lookup.
If we use a flat file lookup, the IS always caches the lookup source.
We set the Cache type in Lookup Properties.
2. Dynamic Cache
To cache a target table or flat file source and insert new rows or update existing
rows in the cache, use a Lookup transformation with a dynamic cache.
The IS dynamically inserts or updates data in the lookup cache and passes data
to the target.
The target table is also our lookup table. This is not good for performance if the table is huge.
3. Persistent Cache
If the lookup table does not change between sessions, we can configure the
Lookup transformation to use a persistent lookup cache.
The IS saves and reuses cache files from session to session, eliminating the time
required to read the lookup table.
5. Shared Cache
We can share the lookup cache between multiple Lookup transformations. An unnamed
cache can be shared between transformations in the same mapping, and a named
persistent cache can be shared across mappings.
Till now, we have only inserted rows in our target tables. What if we want to
update, delete or reject rows coming from source based on some condition?
Example: If Address of a CUSTOMER changes, we can update the old address or
keep both old and new address. One row is for old and one for new. This way we
maintain the historical data.
Update Strategy is used with Lookup Transformation. In DWH, we create a Lookup
on target table to determine whether a row already exists or not. Then we insert,
update, delete or reject the source record as per business need.
In PowerCenter, we set the update strategy at two different levels:
1. Within a session
2. Within a Mapping
Operation, Constant, Numeric Value:
INSERT, DD_INSERT, 0
UPDATE, DD_UPDATE, 1
DELETE, DD_DELETE, 2
REJECT, DD_REJECT, 3
Steps:
1. Create Update Strategy Transformation
2. Pass all ports needed to it.
3. Set the Expression in Properties Tab.
4. Connect to other transformations or target.
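For example, a sketch of an Update Strategy expression that inserts rows not yet in the target and updates the rest, assuming a lookup output port LKP_EMPNO that is NULL when the row does not exist in the target:
IIF(ISNULL(LKP_EMPNO), DD_INSERT, DD_UPDATE)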
To use a Dynamic Lookup Cache, first edit the Lookup transformation -> Properties Tab ->
select the Dynamic Lookup Cache option.
Also select the Insert Else Update or Update Else Insert option.
Associated Port:
Associate lookup ports with either an input/output port or a sequence ID. Each
lookup port is associated with a source port so that the Integration Service can
compare the changes. We can also generate a sequence (1, 2, 3, and so on) with it.
The Sequence-ID option is available when the datatype is Integer or Small Integer.
Ignore in Comparison:
When we do not want to compare a column in the source with the target, we can use
this option. Example: HIREDATE will always be the same, so there is no need to compare it.
In the above:
The topmost port is NewLookupRow. It is hidden.
All lookup table ports have PREV_ prefixed to them.
ENAME has been associated with PREV_ENAME, and so on for the others.
The PREV_COMM port has been checked for Ignore Null Inputs for Updates.
PREV_HIREDATE has been checked for Ignore in Comparison.
Creating Mapping:
1.
2.
3.
4.
5.
Create Session and Workflow as usual. First time all rows will be inserted.
Now Change the data of target table in Oracle and Run workflow again.
You can see how the data is updated as per the properties selected.
We pass the data from the lookup cache, not from the source, to the Filter. This is because
the cache is updated regularly and contains the most up-to-date data.
Example:
Source (EMPNO, Name, SAL, DEPTNO):
9000, Amit Kumar, 9000, 10
9001, Rahul Singh, 9500, 20
9002, Sanjay, 8000, 30
9003, Sumit Singh, 7000, 20
Initial Cache (NewLookupRow, EMPNO, Name, SAL, DEPTNO):
0, 9000, Amit Kumar, 8000, 10
0, 9001, Rahul Singh, 9500, 20
Updated Cache (NewLookupRow, EMPNO, Name, SAL, DEPTNO):
2, 9000, Amit Kumar, 9000, 10
0, 9001, Rahul Singh, 9500, 20
1, 9002, Sanjay, 8000, 30
1, 9003, Sumit Singh, 7000, 20
Also, in the lookup SQL override above, if we do not write AS, the lookup will not work and
error TE_7001 is displayed. It is mandatory to write AS after each column in the lookup override.
Passive Transformation
Connected and Unconnected Transformation
Stored procedures are stored and run within the database.
Stored Procedures:
Connect to Source database and create the stored procedures given below:
CREATE OR REPLACE procedure sp_agg (in_deptno in number, max_sal out number,
min_sal out number, avg_sal out number, sum_sal out number)
As
Begin
select max(Sal),min(sal),avg(sal),sum(sal) into max_sal,min_sal,avg_sal,sum_sal
from emp where deptno=in_deptno group by deptno;
End;
/
Creating Mapping:
1.
2.
3.
4.
5.
6. Drag DEPTNO from SQ_DEPT to the stored procedure input port and also to
DEPTNO port of target.
7. Connect the ports from procedure to target as shown below:
Creating Mapping:
1.
2.
3.
4.
5.
6.
7.
8.
9.
Click OK and connect the port from expression to target as in mapping below:
PROC_RESULT use:
If the stored procedure returns multiple output parameters, you must create
variables for each output parameter.
Example: DEPTNO as Input and MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL
as output then:
1. Create four variable ports in expression VAR_MAX_SAL,
VAR_MIN_SAL, VAR_AVG_SAL and VAR_SUM_SAL.
2. Create four output ports in expression OUT_MAX_SAL,
OUT_MIN_SAL, OUT_AVG_SAL and OUT_SUM_SAL.
3. Call the procedure in the last variable port, say VAR_SUM_SAL.
:SP.SP_AGG (DEPTNO, VAR_MAX_SAL,VAR_MIN_SAL, VAR_AVG_SAL,
PROC_RESULT)
Example 2:
DEPTNO as Input and MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL as O/P.
Stored Procedure to drop index in Pre Load of Target
Stored Procedure to create index in Post Load of Target
Stored procedures are given below to drop and create index on target.
Make sure to create target table first.
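A minimal sketch of such procedures, assuming an Oracle target; the table name EMP_TARGET and index name IDX_TGT_EMPNO are placeholders for illustration:
-- Called in Target Pre Load: drop the index before the load
CREATE OR REPLACE PROCEDURE sp_drop_index
AS
BEGIN
  EXECUTE IMMEDIATE 'DROP INDEX idx_tgt_empno';
END;
/
-- Called in Target Post Load: recreate the index after the load
CREATE OR REPLACE PROCEDURE sp_create_index
AS
BEGIN
  EXECUTE IMMEDIATE 'CREATE INDEX idx_tgt_empno ON emp_target (empno)';
END;
/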
Creating Mapping:
1.
2.
3.
4.
5.
NEXTVAL:
Use the NEXTVAL port to generate sequence numbers by connecting it to a
transformation or target.
For example, we might connect NEXTVAL to two target tables in a mapping to
generate unique primary key values.
The sequence for table 1 is generated first; only after table 1 has been loaded is the
sequence for table 2 generated.
CURRVAL:
CURRVAL is NEXTVAL plus the Increment By value.
We typically only connect the CURRVAL port when the NEXTVAL port is
already connected to a downstream transformation.
If we connect the CURRVAL port without connecting the NEXTVAL port,
the Integration Service passes a constant value for each row.
When we connect the CURRVAL port in a Sequence Generator
transformation, the Integration Service processes one row in each block.
We can optimize performance by connecting only the NEXTVAL port in a
mapping.
Creating Mapping:
1.
2.
3.
4.
5.
6.
Sequence Generator settings (Required/Optional):
Start Value: Required
Increment By: Required
End Value: Optional
Current Value: Optional
Cycle: Optional
Reset: Optional
POINTS:
If Current Value is 1 and End Value is 10 with no Cycle option, and there are 17 records
in the source, the session will fail.
If we connect just CURRVAL only, the value will be the same for all records.
If Current Value is 1, End Value is 10, the Cycle option is set, and Start Value is 0,
then for 17 source records the sequence is 1, 2, ... 10, 0, 1, 2, 3, ...
To make the above sequence run as 1-10, 1-10, give Start Value as 1. Start Value is
used along with the Cycle option only.
If Current Value is 1, End Value is 10, the Cycle option is set, and Start Value is 1,
then for 17 source records the session runs and generates 1-10, then 1-7. The value 7 is
saved in the repository. If we run the session again, the sequence starts from 8.
Use the Reset option if we want to start the sequence from Current Value every time.
3.23 MAPPLETS
Mapplet Input:
Mapplet input can originate from a source definition and/or from an Input
transformation in the mapplet. We can create multiple pipelines in a mapplet.
Mapplet Output:
The output of a mapplet is not connected to any target table.
We must use Mapplet Output transformation to store mapplet output.
A mapplet must contain at least one Output transformation with at least one
connected port in the mapplet.
Example1: We will join EMP and DEPT table. Then calculate total salary. Give
the output to mapplet out transformation.
Steps:
1.
2.
3.
4.
5.
6.
7.
We can use a mapplet in a mapping by just dragging the mapplet from the mapplet folder
in the left pane, as we drag source and target tables.
When we use the mapplet in a mapping, the mapplet object displays only the
ports from the Input and Output transformations. These are referred to as the
mapplet input and mapplet output ports.
Make sure to give correct connection information in session.
Creating Mapping
1.
2.
3.
4.
5.
6.
7.
Example2: We will join EMP and DEPT table. The ports of DEPT table will be
passed to mapplet in mapping. We will use MAPPLET_INPUT to pass ports of
DEPT to joiner. Then calculate total salary. Give the output to mapplet out
transformation.
Steps:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapplet Designer.
3. Click Mapplets-> Create-> Give name. Ex: mplt_example1
4. Drag EMP table.
5. Transformation -> Create -> Select Mapplet Input for list->Create -> Done
6. Edit Mapplet Input.
7. Go to ports tab and add 3 ports DEPTNO, DNAME and LOC.
8. Use Joiner transformation as described earlier to join them.
9. Transformation -> Create -> Select Expression for list -> Create -> Done
10. Pass all ports from joiner to expression and then calculate total salary as
described in expression transformation.
11. Now Transformation -> Create -> Select Mapplet Out from list -> Create
-> Give name and then done.
12. Pass all ports from expression to Mapplet output.
13. Mapplet -> Validate
14. Repository -> Save
Creating Mapping
1.
2.
3.
4.
5.
6.
7.
8.
Creating Mapping
1.
2.
3.
4.
5.
6.
7.
8.
9.
Source (Roll_Number, Name, ENG, HINDI, MATHS):
100, Amit, 78, 67, 90
101, Rahul, 67, 87, 78
102, Jessie, 56, 89, 97
Target (Roll_Number, Name, Marks):
100, Amit, 78
100, Amit, 67
100, Amit, 90
101, Rahul, 67
101, Rahul, 87
101, Rahul, 78
102, Jessie, 56
102, Jessie, 89
102, Jessie, 97
Steps:
1. Open Shared Folder -> Tools -> Source Analyzer
2. Sources -> Import XML Definition.
3. Browse for location where XML file is present. To import the definition, we
should have XML file in our local system on which we are working.
4. Select the file and click open.
5. The message "Option for Override Infinite Length is not set. Do you want to set it?" is
displayed.
6. Click Yes.
7. Check Override all infinite lengths with value and give value as 2.
8. Do not modify other options and Click Ok.
9. Click NEXT and then click FINISH
10. Definition has been imported and can be used in mapping as we select other
sources.
SESSION PROPERTIES
Steps:
1. Open the folder where we want to create the mapping.
2. In the Mapping Designer, click Mappings > Wizards > Getting Started.
3. Enter a mapping name and select Simple Pass Through, and click next.
4. Select a source definition to use in the mapping.
5. Enter a name for the mapping target table and click Finish.
6. To save the mapping, click Repository > Save.
Handling Keys: When we use the Slowly Growing Target option, the Designer
creates an additional column in target, PM_PRIMARYKEY. In this column, the
Integration Service generates a primary key for each row written to the target,
incrementing new key values by 1.
Steps:
1. Open the folder where we want to create the mapping.
2. In the Mapping Designer, click Mappings > Wizards > Getting Started.
3. Enter a mapping name and select Slowly Growing Target, and click next.
4. Select a source definition to be used in the mapping.
5. Enter a name for the mapping target table. Click Next.
6. Select the column or columns from the Target Table Fields list that we want
the Integration Service to use to look up data in the target table. Click Add.
These columns are used to compare source and target.
7. Click Finish.
8. To save the mapping, click Repository > Save.
Note: The Fields to Compare for Changes field is disabled for the Slowly
Growing Targets mapping.
If row exists in source and not in target, then the row is inserted in target.
If row exists in source and target but there is some change, the row in target
table is updated.
Use this mapping when we do not want a history of previous dimension data.
Handling Keys: When we use the SCD Type 1 option, the Designer creates an
additional column in the target, PM_PRIMARYKEY, whose value is incremented by 1 for each row.
Steps:
1. Open the folder where we want to create the mapping.
2. In the Mapping Designer, click Mappings > Wizards > Slowly Changing
Dimension.
3. Enter a mapping name and select Type 1 Dimension, and click Next.
4. Select a source definition to be used by the mapping.
5. Enter a name for the mapping target table. Click Next.
6. Select the column or columns we want to use as a lookup condition from the
Target Table Fields list and click add.
7. Select the column or columns we want the Integration Service to compare for
changes, and click add.
8. Click Finish.
9. To save the mapping, click Repository > Save.
Configuring Session: In the session properties, click the Target Properties settings
on the Mappings tab. To ensure the Integration Service loads rows to the target
properly, select Insert and Update as Update for each relational target.
Flow1: New record is inserted into target table.
Flow2: Changed record is updated into target table.
Note: In the Type 1 Dimension mapping, the Designer uses two instances of the
same target definition to enable inserting and updating data in the same target
table. Generate only one target table in the target database.
When we use this option, the Designer creates two additional fields in the target:
1. PM_PRIMARYKEY: The Integration Service generates a primary key for
each row written to the target.
2. PM_VERSION_NUMBER: The IS generates a version number for each row
written to the target.
Steps:
1. Follow Steps 1-7 as we did in SCD Type1, except Select Type 2 Dimension in
Step 3.
2. Click Next. Select Keep the 'Version' Number in Separate Column.
3. Click Finish.
4. To save the mapping, click Repository > Save.
Note: Designer uses two instances of the same target definition to enable the two
separate data flows to write to the same target table. Generate only one target table
in the target database.
Configuring Session: In the session properties, click the Target Properties settings
on the Mappings tab. To ensure the Integration Service loads rows to the target
properly, select Insert for each relational target.
Flow1: New record is inserted into target table.
Flow2: Changed record is inserted into target table.
When we use this option, the Designer creates two additional fields in the target:
1. PM_PRIMARYKEY: The Integration Service generates a primary key for
each row written to the target.
2. PM_CURRENT_FLAG: The Integration Service flags the current row "1" and
all previous versions "0".
Steps:
1. Follow Steps 1-7 as we did in SCD Type1, except Select Type 2 Dimension in
Step 3.
2. Click Next. Select Mark the 'Current' Dimension Record with a Flag.
3. Click Finish.
4. To save the mapping, click Repository > Save.
Note: In the Type 2 Dimension/Flag Current mapping, the Designer uses three
instances of the same target definition to enable the three separate data flows to
write to the same target table. Generate only one target table in the target database.
Configuring Session: In the session properties, click the Target Properties settings
on the Mappings tab. To ensure the Integration Service loads rows to the target
properly, select Insert and Update as Update for each relational target.
Flow1: New record is inserted into target table.
Flow2: Changed record is inserted into target table.
Flow2: Current Flag of changed record is updated in target table.
When we use this option, the Designer creates 3 additional fields in the target:
1. PM_PRIMARYKEY: The Integration Service generates a primary key for
each row written to the target.
2. PM_BEGIN_DATE: For each new and changed record, it is populated with
SYSDATE. This Sysdate is the date on which ETL process runs.
3. PM_END_DATE: It is populated as NULL when record is inserted. A new
record is inserted when a record changes. However, PM_END_DATE of
changed record is updated with SYSDATE.
Steps:
1. Follow Steps 1-7 as we did in SCD Type1, except Select Type 2 Dimension in
Step 3.
2. Click Next. Select Mark the Dimension Records with their Effective Date
Range.
3. Click Finish.
4. To save the mapping, click Repository > Save.
Configuring Session: It is same as we did in SCD Type Flag Current.
Flow1: New record is inserted into target table with PM_BEGIN_DATE as SYSDATE.
Flow2: Changed record is inserted into target with PM_BEGIN_DATE as SYSDATE.
Flow2: END_DATE of changed record is updated in target table.
When we use this option, the Designer creates the following additional fields in the target:
1. PM_PRIMARYKEY: The Integration Service generates a primary key for
each row written to the target.
2. PM_PREV_ColumnName: The Designer generates a previous column
corresponding to each column for which we want historical data. The IS keeps
the previous version of record data in these columns.
3. PM_EFFECT_DATE: An optional field. The IS uses the system date to
indicate when it creates or updates a dimension.
Steps:
1. Follow Steps 1-7 as we did in SCD Type1, except Select Type 3 Dimension in
Step 3.
2. Click Next. Select Effective Date if desired.
3. Click Finish.
4. To save the mapping, click Repository > Save.
Configuring Session: It is same as we did in SCD Type Flag Current.
Flow1: New record is inserted into target table.
Flow2: Changed record is updated in the target table.
MAPPING PARAMETERS
Example: When we want to extract records of a particular month during ETL process,
we will create a Mapping Parameter of data type and use it in query to compare it
with the timestamp field in SQL override.
MAPPING VARIABLES
Unlike mapping parameters, mapping variables are values that can change
between sessions.
The Integration Service saves the latest value of a mapping variable to the
repository at the end of each successful session.
We can override a saved value with the parameter file.
We can also clear all saved values for the session in the Workflow Manager.
We might use a mapping variable to perform an incremental read of the source. For
example, we have a source table containing timestamped transactions and we want
to evaluate the transactions on a daily basis. Instead of manually entering a session
override to filter source data each time we run the session, we can create a mapping
variable, $$IncludeDateTime. In the source qualifier, create a filter to read only rows
whose transaction date equals $$IncludeDateTime, such as:
TIMESTAMP = $$IncludeDateTime
In the mapping, use a variable function to set the variable value to increment one
day each time the session runs. If we set the initial value of $$IncludeDateTime to
8/1/2004, the first time the Integration Service runs the session, it reads only rows
dated 8/1/2004. During the session, the Integration Service sets $$IncludeDateTime
to 8/2/2004. It saves 8/2/2004 to the repository at the end of the session. The next
time it runs the session, it reads only rows from August 2, 2004.
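A sketch of the variable expression that could be used in an output or variable port to advance the variable by one day (the same functions are used in the example later in this section):
SETVARIABLE($$IncludeDateTime, ADD_TO_DATE($$IncludeDateTime, 'DD', 1))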
We can use variable functions in the following transformations:
Expression
Filter
Router
Update Strategy
Variable Values (default value by datatype):
Numeric: 0
String: Empty string
Datetime: 1/1/1
Start Value:
The start value is the value of the variable at the start of the session. The
Integration Service looks for the start value in the following order:
1. Value in parameter file
2. Value saved in the repository
3. Initial value
4. Default value
Current Value:
The current value is the value of the variable as the session progresses. When a
session starts, the current value of a variable is the same as the start value. The
final current value for a variable is saved to the repository at the end of a successful
session. When a session fails to complete, the Integration Service does not update
the value of the variable in the repository.
Note: If a variable function is not used to calculate the current value of a mapping
variable, the start value of the variable is saved to the repository.
Variable Functions
Variable functions determine how the Integration Service calculates the current value
of a mapping variable in a pipeline.
SetMaxVariable: Sets the variable to the maximum value of a group of values. It
ignores rows marked for update, delete, or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It
ignores rows marked for update, delete, or reject. Aggregation type set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the
variable value when a row is marked for insertion, and subtracts one when the row is
marked for deletion. It ignores rows marked for update or reject. Aggregation type
set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it
compares the final current value of the variable to the start value of the variable.
Based on the aggregate type of the variable, it saves a final value to the repository.
Creating Mapping
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give name. Ex: m_mp_mv_example
4. Drag EMP and target table.
5. Transformation -> Create -> Select Expression for list -> Create -> Done.
6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to Expression.
7. Create Parameter $$Bonus and Give initial value as 200.
8. Create variable $$var_max of MAX aggregation type and initial value 1500.
9. Create variable $$var_min of MIN aggregation type and initial value 1500.
10. Create variable $$var_count of COUNT aggregation type and initial value 0.
COUNT is visible when datatype is INT or SMALLINT.
11. Create variable $$var_set of MAX aggregation type.
12.
13. Create 5 output ports: out_TOTAL_SAL, out_MAX_VAR, out_MIN_VAR,
out_COUNT_VAR and out_SET_VAR.
14. Open the expression editor for out_TOTAL_SAL. As we did earlier, enter
SAL + COMM; to add $$Bonus, select the Variables tab and pick the
parameter under Mapping Parameters. The final expression is SAL + COMM + $$Bonus.
15. Open Expression editor for out_max_var.
16. Select the variable function SETMAXVARIABLE from left side pane. Select
$$var_max from variable tab and SAL from ports tab as shown below.
SETMAXVARIABLE($$var_max,SAL)
17. Open Expression editor for out_min_var and write the following expression:
SETMINVARIABLE($$var_min,SAL). Validate the expression.
18. Open Expression editor for out_count_var and write the following expression:
SETCOUNTVARIABLE($$var_count). Validate the expression.
19. Open Expression editor for out_set_var and write the following expression:
SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate.
20. Click OK. Expression Transformation below:
21. Link all ports from expression to target and Validate Mapping and Save it.
22. See mapping picture on next page.
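For quick reference, the expressions entered in steps 14-19 above are:

out_TOTAL_SAL:  SAL + COMM + $$Bonus
out_MAX_VAR:    SETMAXVARIABLE($$var_max, SAL)
out_MIN_VAR:    SETMINVARIABLE($$var_min, SAL)
out_COUNT_VAR:  SETCOUNTVARIABLE($$var_count)
out_SET_VAR:    SETVARIABLE($$var_set, ADD_TO_DATE(HIREDATE, 'MM', 1))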
Chapter 4
Workflow Manager
To move data from sources to targets, the Integration Service uses the following
components:
Integration Service process
Load Balancer
Data Transformation Manager (DTM) process
Valid Workflow:
Example of loop:
Once we create links between tasks, we can specify conditions for each link to
determine the order of execution in the workflow.
If we do not specify conditions for each link, the Integration Service runs the
next task in the workflow by default.
Use predefined or user-defined workflow variables in the link condition.
Steps:
1. In the Workflow Designer workspace, double-click the link you want to
specify.
2. The Expression Editor appears.
3. In the Expression Editor, enter the link condition. The Expression Editor
provides predefined workflow variables, user-defined workflow variables,
variable functions, and Boolean and arithmetic operators.
4. Validate the expression using the Validate button.
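For example, to let the next task run only when a session named s_m_filter_example succeeds (the session name is just an illustration), the link condition could be:

$s_m_filter_example.Status = SUCCEEDED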
System variables:
Use the SYSDATE and WORKFLOWSTARTTIME system variables within a workflow.
Task-specific variables:
The Workflow Manager provides a set of task-specific variables for each task in the
workflow. The Workflow Manager lists task-specific variables under the task name in
the Expression Editor.
Task-specific variable: Description (Task type)
Condition: Result of the decision condition expression. NULL if the task fails. (Decision task)
EndTime: Date and time when a task ended. (All tasks)
ErrorCode: Last error code for the associated task. 0 if there is no error. (All tasks)
ErrorMsg: Last error message for the associated task. Empty string if there is no error. (All tasks)
FirstErrorCode: Error code for the first error message in the session. 0 if there is no error. (Session)
FirstErrorMsg: First error message in the session. Empty string if there is no error. (Session)
PrevTaskStatus: Status of the previous task in the workflow that the Integration Service ran. Can be ABORTED, FAILED, STOPPED, SUCCEEDED. (All tasks)
SrcFailedRows: Total number of rows the Integration Service failed to read from the source. (Session)
SrcSuccessRows: Total number of rows successfully read from the sources. (Session)
StartTime: Date and time when a task started. (All tasks)
Status: Status of the previous task in the workflow. Can be ABORTED, DISABLED, FAILED, NOTSTARTED, STARTED, STOPPED, SUCCEEDED. (All tasks)
TgtFailedRows: Total number of rows the Integration Service failed to write to the target. (Session)
TgtSuccessRows: Total number of rows successfully written to the target. (Session)
TotalTransErrors: Total number of transformation errors. (Session)
6. Enter the default value for the variable in the Default field.
7. To validate the default value of the new workflow variable, click the Validate
button.
8. Click Apply to save the new workflow variable.
9. Click OK to close the workflow properties.
Naming Convention:
$DBConnectionName: database connection parameter
$InputFileName: source file parameter
$OutputFileName: target file parameter
$LookupFileName: lookup file parameter
$BadFileName: reject file parameter
The source file, target file, lookup file, and reject file parameters are used for flat files.
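A minimal parameter file sketch showing how such parameters might be assigned (the folder, workflow, session, and connection names are made up for illustration):

[MyFolder.WF:wf_example.ST:s_m_filter_example]
$DBConnectionSource=Oracle_Dev
$InputFileName=D:\FILES\emp_data.txt
$OutputFileName=D:\FILES\emp_out.txt
$$IncludeDateTime=08/02/2004

Built-in session parameters start with a single $, while user-defined mapping parameters and variables start with $$.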
Reusable or not:
The Session, Email, and Command tasks can be made reusable; the Assignment, Control, Decision, Event-Raise, Event-Wait, and Timer tasks are always non-reusable.
A session is a set of instructions that tells the PowerCenter Server how and
when to move data from sources to targets.
To run a session, we must first create a workflow to contain the Session task.
We can run as many sessions in a workflow as we need. We can run the
Session tasks sequentially or concurrently, depending on our needs.
The PowerCenter Server creates several files and in-memory caches
depending on the transformations and options used in the session.
The Workflow Manager provides an Email task that allows us to send email
during a workflow.
The Email task is usually created by the Administrator; we can simply drag it into
our workflow and use it, or create and configure our own Email task as needed.
We can also set the option to send an email on success or failure in the
Components tab of a Session task.
2. Open Workflow Designer. Workflow -> Create -> Give name and click OK.
3. Start is displayed. Drag a session, say s_m_filter_example, and a Command task.
4. Link Start to the Session task and the Session task to the Command task.
5. Double-click the link between the Session and Command tasks and give the
condition in the editor as:
6. $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
7. Workflow -> Validate
8. Repository -> Save
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click OK.
2. Task -> Create -> Select Event Wait. Give name. Click Create and Done.
3. Link Start to the Event Wait task.
4. Drag s_filter_example to the workspace and link it to the Event Wait task.
5. Right-click the Event Wait task and click Edit -> Events tab.
6. Select the Pre-Defined option there. In the blank space, give the directory and
file name to watch. Example: D:\FILES\abc.txt
7. Workflow -> Validate and Repository -> Save.
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click OK.
2. Workflow -> Edit -> Events tab and add event EVENT1 there.
3. Drag s_m_filter_example and link it to the START task.
4. Click Tasks -> Create -> Select EVENT RAISE from the list. Give name
ER_Example. Click Create and then Done.
5. Link ER_Example to s_m_filter_example.
6. Right click ER_Example -> EDIT -> Properties Tab -> Open Value for User
Defined Event and Select EVENT1 from the list displayed. Apply -> OK.
7. Click link between ER_Example and s_m_filter_example and give the
condition $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
8. Click Tasks -> Create -> Select EVENT WAIT from list. Give name EW_WAIT.
Click Create and then done.
9. Link EW_WAIT to START task.
10. Right click EW_WAIT -> EDIT-> EVENTS tab.
11. Select User Defined there. Select the Event1 by clicking Browse Events
button.
12. Apply -> OK.
13. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
14. Workflow -> Validate
15. Repository -> Save.
16. Run workflow and see.
The Decision task allows us to enter a condition that determines the execution
of the workflow, similar to a link condition.
The Decision task has a pre-defined variable called
$Decision_task_name.condition that represents the result of the decision
condition.
The PowerCenter Server evaluates the condition in the Decision task and sets
the pre-defined condition variable to True (1) or False (0).
We can specify one decision condition per Decision task.
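As a sketch (the task and session names are illustrative): a Decision task named dec_check_loads could use the decision condition

$s_load_dim.Status = SUCCEEDED AND $s_load_fact.Status = SUCCEEDED

and the link leaving the Decision task could then test

$dec_check_loads.Condition = 1

so that the downstream task runs only when both sessions succeeded.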
We can use the Control task to stop, abort, or fail the top-level workflow or
the parent workflow based on an input link condition.
A parent workflow or worklet is the workflow or worklet that contains the
Control task.
We give the condition to the link connected to Control Task.
Control Option: Description
Fail Me: Marks the Control task as failed.
Fail Parent: Marks the status of the workflow or worklet that contains the Control task as failed.
Stop Parent: Stops the workflow or worklet that contains the Control task.
Abort Parent: Aborts the workflow or worklet that contains the Control task.
Fail Top-Level WF: Fails the workflow that is running.
Stop Top-Level WF: Stops the workflow that is running.
Abort Top-Level WF: Aborts the workflow that is running.
Example: Drag any 3 sessions and if anyone fails, then Abort the top level workflow.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_control_task_example -> Click ok.
2. Drag any 3 sessions to workspace and link all of them to START task.
3. Click Tasks -> Create -> Select CONTROL from list. Give name cntr_task.
Click Create and then done.
4. Link all sessions to the control task cntr_task.
5. Double-click the link between cntr_task and any session, say s_m_filter_example,
and give the condition: $S_M_FILTER_EXAMPLE.Status = FAILED.
6. Repeat above step for remaining 2 sessions also.
7. Right click cntr_task-> EDIT -> GENERAL tab. Set Treat Input Links As to
OR. Default is AND.
8. Go to the PROPERTIES tab of cntr_task and select the value Abort Top-Level
WF for Control Option. Click Apply and OK.
9. Workflow Validate and repository Save.
10. Run workflow and see the result.
We can use user-defined workflow variables in our link conditions as needed,
and we can also calculate or set the value of a variable in an Assignment task.
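For instance (the variable and task names are illustrative), an Assignment task could set a user-defined workflow variable with an expression such as

$$Run_Count = $$Run_Count + 1

and a later link condition could then test $$Run_Count > 3 to decide whether the next task runs.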
4.4 SCHEDULERS
We can schedule a workflow to run continuously, repeat at a given time or interval,
or we can manually start a workflow. The Integration Service runs a scheduled
workflow as configured.
By default, the workflow runs on demand. We can change the schedule settings
by editing the scheduler. If we change schedule settings, the Integration Service
reschedules the workflow according to the new settings.
For each folder, the Workflow Manager lets us create reusable schedulers so
we can reuse the same set of scheduling settings for workflows in the folder.
Use a reusable scheduler so we do not need to configure the same set of
scheduling settings in each workflow.
When we delete a reusable scheduler, all workflows that use the deleted
scheduler become invalid. To make the workflows valid, we must edit them
and replace the missing scheduler.
1. Run on Demand:
Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously:
Integration Service runs the workflow as soon as the service initializes. The
Integration Service then starts the next run of the workflow as soon as it finishes
the previous run.
3. Run on Server initialization
Integration Service runs the workflow as soon as the service is initialized. The
Integration Service then starts the next run of the workflow according to settings
in Schedule Options.
Schedule options for Run on Server initialization:
Start Date
Start Time
Some Points:
To remove a workflow from its schedule, right-click the workflow in the
Navigator window and choose Unschedule Workflow.
To reschedule a workflow on its original schedule, right-click the workflow in
the Navigator window and choose Schedule Workflow.
4.5 WORKLETS
Some Points:
We cannot run two instances of the same worklet concurrently in the same
workflow.
We cannot run two instances of the same worklet concurrently across two
different workflows.
Each worklet instance in the workflow can run once.
4.6 PARTITIONING
2. Number of Partitions
We can define up to 64 partitions at any partition point in a pipeline.
When we increase or decrease the number of partitions at any partition point,
the Workflow Manager increases or decreases the number of partitions at all
partition points in the pipeline.
Increasing the number of partitions or partition points increases the number
of threads.
The number of partitions we create equals the number of connections to the
source or target. For one partition, one database connection will be used.
3. Partition types
The Integration Service creates a default partition type at each partition
point.
If we have the Partitioning option, we can change the partition type. This
option is purchased separately.
The partition type controls how the Integration Service distributes data
among partitions at partition points.
2. PROPERTIES TAB
Property (Required/Optional):
Write Backward Compatible Session Log File (Optional): Select to write the session log to a file.
Session Log File Name (Optional)
Session Log File Directory (Required)
Parameter File Name (Optional)
Enable Test Load (Optional)
Number of Rows to Test (Optional)
$Source Connection Value (Optional)
Commit Type (Required)
Commit Interval (Required)
Recovery Strategy (Required)
RECOVERY STRATEGY
Workflow recovery allows us to continue processing the workflow and workflow tasks
from the point of interruption. We can recover a workflow if the Integration Service
can access the workflow state of operation.
The Integration Service recovers tasks in the workflow based on the recovery
strategy of the task.
By default, the recovery strategy for Session and Command tasks is to fail the
task and continue running the workflow.
We can configure the recovery strategy for Session and Command tasks.
The strategy for all other tasks is to restart the task.
6. COMPONENTS TAB
In the Components tab, we can configure pre-session and post-session commands
and the emails to send on session success or failure.
1. GENERAL TAB
Field (Required/Optional): Description
Name (Required): Name of the workflow.
Comments (Optional): Comment that describes the workflow.
Integration Service (Required): Integration Service that runs the workflow by default.
Suspension Email (Optional): Email message that the Integration Service sends when a task fails and the Integration Service suspends the workflow.
Disabled (Optional): Disables the workflow from the schedule.
Suspend on Error (Optional): The Integration Service suspends the workflow when a task in the workflow fails.
2. PROPERTIES TAB
Properties tab has the following options:
Parameter File Name
Write Backward Compatible Workflow Log File: Select to write workflow
log to a file. It is Optional.
Workflow Log File Name
Workflow Log File Directory
Save Workflow Log By: Required. Options are By Runs and By Timestamp.
Save Workflow Log For These Runs: Required. Specifies how many logs need to
be saved for a workflow.
Enable HA Recovery: Not required.
Automatically recover terminated tasks: Not required.
Maximum automatic recovery attempts: Not required.
3. SCHEDULER TAB
The Scheduler Tab lets us schedule a workflow to run continuously, run at a
given interval, or manually start a workflow.
4. VARIABLE TAB
It is used to declare User defined workflow variables.
5. EVENTS TAB
Before using the Event-Raise task, declare a user-defined event on the Events
tab.