
Pushdown Optimization
Jason Hamby

Informatica confidential. For discussion purposes only.

Agenda

Pushdown Optimization Overview and Benefits

How it works

How to Configure Pushdown Optimization

What Is and What Is Not Supported

What can and cannot be pushed down

Limitations: details of rules

When is Pushdown Optimization appropriate

Demo


Overview


Pushdown Optimization Overview


Push transformation processing to data sources
Benefits
- Reduce data moved when source and target are the same
- Utilize database-specific processing that may be more optimal
- Maintain metadata and lineage in PowerCenter


Customer Scenario
Batch transformation and load: staging and target tables in the same target database
Transformation and load from a real-time status table to a data warehouse in the same database

[Diagram: Data Sources -> Step 1 -> Staging -> Step 2 -> Warehouse; Staging and Warehouse are in the Target Database]


Solution Overview
Pushdown optimization is an option that the user selects
The SQL to be processed in the DB is generated automatically
A session may be partially or completely pushed down
[Diagram: Data Sources feed Staging (Step 1) and Staging feeds Warehouse (Step 2) inside the Target Database; the DI Server, with its Optimizer and Metadata Repository, sends SQL to the Target Database]


How Does It Work


How It Works
Available as a session property
Pushdown Optimization Options
Partial pushdown optimization to source
Partial pushdown optimization to target
Full pushdown optimization

Integration Service analyzes the mapping and generates one or more SQL statements based on the mapping transformation logic
Integration Service executes SQL against the database instead of processing the transformation logic itself


How It Works (cont'd)

Integration Service analyzes the mapping and session to determine the transformation logic it can push to the database
Integration Service processes transformation logic that it cannot push down to the database
Generated SQL is not saved in the repository
Results are displayed in the session mapping tab (in Workflow Manager):
- Transformations that can and cannot be pushed down
- Generated SQL
- Reason why certain transformations cannot be pushed down


Configuration (from Workflow Mgr)


Viewing the Result


Preview from Session Mapping Tab

[Screenshot callouts: Transformations Pushed to Source or Target Database; Generated SQL Statement]


What Is and What Is Not Supported


Supported Databases
Teradata (V2R5 or above)
Oracle (9i or above)
DB2 (v8 or above)
SQL Server (7 and above)
Sybase (ASE 12.5)
ODBC source/target


Supported Transformations
To Source: Aggregator, Expression, Filter, Joiner, Lookup, Sorter, Union
To Target: Expression, Lookup


Unsupported Transformations

Custom Transformation

Router

External Procedure

Sequence Generator

XML

Stored Procedure

Normalizer

TCT

Rank

Update Strategy


Partial Source Pushdown


Condition:
One or more transformations can be processed in source database

Virtual source transformations pushed to source


Generated SQL:
SELECT ... FROM s WHERE (filter/join condition) GROUP BY ...

[Diagram: Extract, Transform, Load pipeline from Source DB to Target]
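
As a rough illustration of the generated SELECT above, the sketch below folds a filter and an aggregator into one query and runs it with Python's built-in sqlite3 module. The table `s`, its columns, and the helper `build_source_sql` are hypothetical; the exact SQL that PowerCenter generates is not shown in these slides, and the ORDER BY is added here only to make the output deterministic.

```python
import sqlite3


def build_source_sql(table, group_col, agg_expr, filter_cond):
    """Assemble a SELECT that combines a filter and an aggregator.

    Hypothetical helper for illustration only; inputs are trusted literals.
    """
    return (
        f"SELECT {group_col}, {agg_expr} FROM {table} "
        f"WHERE {filter_cond} GROUP BY {group_col} ORDER BY {group_col}"
    )


conn = sqlite3.connect(":memory:")  # stands in for the source database
conn.execute("CREATE TABLE s (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO s VALUES (?, ?)",
    [("east", 10), ("east", 5), ("west", 7), ("west", -1)],
)

sql = build_source_sql("s", "region", "SUM(amount)", "amount > 0")
rows = conn.execute(sql).fetchall()  # only aggregated rows leave the database
print(sql)
print(rows)
```

Four source rows go in, but only the filtered and aggregated rows are extracted, which is the data-movement saving the slide describes.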


Partial Target Pushdown


Condition:
One or more transformations can be processed in target
database

Virtual target transformations pushed to target


Generated SQL:
INSERT INTO t (...) VALUES (?+1, SOUNDEX(?))

[Diagram: Extract, Transform, Load pipeline from Source to Target DB]
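
The parameterized INSERT above can be tried out with a small sketch using Python's built-in sqlite3 module. The table `t`, its columns, and the sample rows are hypothetical, and `UPPER` is used in place of `SOUNDEX` because SQLite does not include `SOUNDEX` in a default build.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the target database
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")

# Rows as they might arrive from the part of the mapping that is
# processed outside the database (hypothetical sample data).
incoming = [(1, "alice"), (2, "bob")]

# The expression logic (? + 1, UPPER(?)) is evaluated by the database
# while it inserts each bound row.
conn.executemany("INSERT INTO t (id, name) VALUES (? + 1, UPPER(?))", incoming)
conn.commit()

loaded = conn.execute("SELECT id, name FROM t ORDER BY id").fetchall()
print(loaded)
```

The rows still travel to the target, but the expression work happens in the target database rather than in the client process.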


Full Pushdown
Condition:
Source and target are in the same RDBMS
All transformations can be processed in database

Data not extracted outside of DB


Generated SQL:
INSERT INTO t (...) SELECT ... FROM s

[Diagram: Extract, Transform, Load pipeline from Source DB to Target DB]
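
A minimal sketch of the INSERT ... SELECT pattern, again using Python's built-in sqlite3 module with one in-memory database standing in for the shared RDBMS. The tables `s` and `t`, their columns, and the filter are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # source and target share one database
conn.execute("CREATE TABLE s (id INTEGER, name TEXT, active INTEGER)")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO s VALUES (?, ?, ?)",
    [(1, "alice", 1), (2, "bob", 0), (3, "carol", 1)],
)

# Filter and expression logic run inside the database; no rows are
# fetched into the client process during the load.
conn.execute(
    "INSERT INTO t (id, name) SELECT id, UPPER(name) FROM s WHERE active = 1"
)
conn.commit()

loaded = conn.execute("SELECT id, name FROM t ORDER BY id").fetchall()
print(loaded)
```

The only data that crosses the connection is the final verification query, which is there for the example and not part of the load.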


Design (Two-Pass)
Pass 1:
Start from the source and traverse transformations downstream, building a SQL query (SELECT statement).
Stop if a transformation cannot be processed in the source database and settle for partial pushdown to source.
If the target is reached, full pushdown can be done with an INSERT ... SELECT statement.


Design (Two-Pass)
Pass 2:
Bypass if pass 1 results in full pushdown optimization.
Start from the target and traverse transformations upstream, building a SQL statement (INSERT, DELETE, or UPDATE) for partial pushdown to target.
Stop if a transformation cannot be processed in the target database or has already been pushed to the source database.
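
The two passes can be sketched as follows. This is a simplified model, not the optimizer's actual implementation: the pipeline is assumed to be a linear list of transformation type names in source-to-target order, the supported sets are taken from the Supported Transformations slide, and the function `plan_pushdown` and its return shape are hypothetical.

```python
# Transformation types from the "Supported Transformations" slide.
SOURCE_SUPPORTED = {
    "Aggregator", "Expression", "Filter", "Joiner", "Lookup", "Sorter", "Union",
}
TARGET_SUPPORTED = {"Expression", "Lookup"}


def plan_pushdown(pipeline, same_database):
    """Classify a linear pipeline using the two-pass approach.

    Returns (mode, pushed_to_source, pushed_to_target). Hypothetical
    sketch: a real mapping is a graph and is subject to many more rules.
    """
    # Pass 1: walk downstream from the source until a transformation
    # cannot be processed in the source database.
    to_source = []
    for transformation in pipeline:
        if transformation not in SOURCE_SUPPORTED:
            break
        to_source.append(transformation)

    # Target reached and both ends share a database: full pushdown.
    if len(to_source) == len(pipeline) and same_database:
        return "full", to_source, []

    # Pass 2: walk upstream from the target. Slicing off the prefix
    # already pushed to the source enforces the "already pushed" stop rule.
    to_target = []
    remaining = pipeline[len(to_source):]
    for transformation in reversed(remaining):
        if transformation not in TARGET_SUPPORTED:
            break
        to_target.insert(0, transformation)

    mode = "partial" if (to_source or to_target) else "none"
    return mode, to_source, to_target


print(plan_pushdown(["Filter", "Aggregator", "Expression"], same_database=True))
print(plan_pushdown(["Filter", "Router", "Expression"], same_database=True))
```

In the second call the Router stops pass 1 after the Filter, and pass 2 picks up the Expression from the target side, so the Router is the only transformation left outside the database.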


Considerations
Error handling is subject to DBMS error handling
No row-level error logging
For mappings that generate long transactions:
- Require more database resources (locks and log space)
- No partial commit: the entire transaction is rolled back when an error is encountered
Results when executing in PowerCenter vs. pushed to the DB may differ based on DB configuration:
- Case sensitivity
- How null is treated in sort order
- Formats (numeric value conversion to char; date conversion to char)
- Data precision
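
The error-handling difference can be seen in a small sketch with Python's built-in sqlite3 module. The tables and sample rows are hypothetical, and SQLite's behavior stands in for whatever the actual DBMS does: a set-based load fails as a unit on one bad row, while a row-by-row load can keep the good rows and record the rejected one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE t_pushed (id INTEGER, name TEXT NOT NULL)")
conn.execute("CREATE TABLE t_rowwise (id INTEGER, name TEXT NOT NULL)")
conn.executemany("INSERT INTO s VALUES (?, ?)", [(1, "a"), (2, None), (3, "c")])
conn.commit()

# Pushed down: one set-based statement; a single bad row fails the whole load.
try:
    conn.execute("INSERT INTO t_pushed SELECT id, name FROM s")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()
pushed_count = conn.execute("SELECT COUNT(*) FROM t_pushed").fetchone()[0]

# Row by row: good rows load, and the bad row is recorded with its error.
rejected = []
for row in conn.execute("SELECT id, name FROM s ORDER BY id").fetchall():
    try:
        conn.execute("INSERT INTO t_rowwise VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append((row, str(exc)))
conn.commit()
rowwise_count = conn.execute("SELECT COUNT(*) FROM t_rowwise").fetchone()[0]

print(pushed_count, rowwise_count, rejected)
```

The pushed-down load leaves the target empty and gives no per-row detail, which is the trade-off to weigh for long-running loads with dirty data.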


Limitations
A transformation will not be pushed down, or stops the optimization, if:
- A Source Qualifier, lookup, or update contains a SQL override
  The optimizer does not parse user-defined SQL overrides (i.e. lookup, update, DSQ)
  The DSQ SQL override limitation will be removed in GA by using temporary views
- It uses a mapping variable
- It contains a variable port
- It overrides default values for input/output ports
- An expression uses a function that has no equivalent function in the database
- It is part of a data profiling session
- Debugging is turned on
- An external loader is used (can only push to source, not to target)
- Row error logging is enabled
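
A few of the transformation-level rules above can be expressed as a simple checker that reports why a transformation would block pushdown. Everything here is a hypothetical model for illustration: the `Transformation` fields, the `DB_FUNCTIONS` set, and the function names are assumptions, not PowerCenter metadata or its real function mapping.

```python
from dataclasses import dataclass, field

# Hypothetical set of functions assumed to have a database equivalent.
DB_FUNCTIONS = {"UPPER", "LOWER", "SUBSTR", "ROUND"}


@dataclass
class Transformation:
    """Minimal stand-in for the properties the rules look at."""
    name: str
    has_sql_override: bool = False
    uses_mapping_variable: bool = False
    has_variable_port: bool = False
    overrides_default_values: bool = False
    functions: set = field(default_factory=set)


def pushdown_blockers(t):
    """Return the reasons (possibly none) why t cannot be pushed down."""
    reasons = []
    if t.has_sql_override:
        reasons.append("contains a SQL override")
    if t.uses_mapping_variable:
        reasons.append("uses a mapping variable")
    if t.has_variable_port:
        reasons.append("contains a variable port")
    if t.overrides_default_values:
        reasons.append("overrides default values for input/output ports")
    for fn in sorted(t.functions - DB_FUNCTIONS):
        reasons.append(f"function {fn} has no database equivalent")
    return reasons


clean = Transformation("exp_clean", functions={"UPPER"})
blocked = Transformation(
    "exp_blocked", has_variable_port=True, functions={"UPPER", "MY_CUSTOM_FN"}
)
print(pushdown_blockers(clean))
print(pushdown_blockers(blocked))
```

Returning a list of reasons rather than a boolean mirrors the session mapping tab, which reports why a transformation could not be pushed down.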


Limitations

A transformation will not be pushed down, or stops the optimization, if:
- The mapping is too complex, i.e. it has too many pipeline branches (max 64 two-way branches, 43 three-way branches, or 32 four-way branches)
- Partitioning is configured where:
  - The partition type is not pass-through
  - There are different partition types for transformations in the pipeline and the optimizer cannot remerge the partitions
- Multiple match for lookup is configured (except for error report)
Limited by the single SQL statement generated at the target (INSERT INTO). The optimizer does not use temp tables or views (in FCS; GA will use temporary views)
Generated SQL cannot be modified


Appropriate Use of Pushdown Optimization


Pushdown Optimization is ideal where:
- Source and target are located in the same database
- Transformations processed in the source DB, such as filters and aggregators, reduce the amount of data moved
Processing within PowerCenter is used when:
- The operation cannot be done in the database (i.e. using SQL)
- The source or target is not a database
