You are on page 1of 3

Lookup is one of the frequently used transformations in Informatica PowerCenter.

This transformation is
considered to be relatively simple, but at times this can become complex because of the amount of data to be
cached, number of columns used in the lookup condition, checks on columns fetched from lookup table etc... In
this article lets see how we can take care of couple of these issues using CHECKSUM number.

What is Checksum
A checksum is a value used to verify the integrity of a file or a data. Checksums are typically used to compare
two sets of data to make sure they are the same. If the checksums don't match those of the original file or data,
the data may have been altered.

Informatica Implementation
Lets consider one of the complex lookup scenario, which we use in SCD Type 2 mapping for our demonstration
purpose. You can check out the complete SCD Type 2 Implementation from one of our previous article.
In normal SCD Type 2 mapping we use a Lookup to fetch all the columns (attributes) from the dimension tables
and compare the attributes with the records from the source. If there is a changed value in any of the
attributes, the record will be set for update else insert. If you are loading a dimension with hundreds of
attribute, you can imagine the complexity involved in the lookup in terms of number of columns in the lookup,
lookup

cache

size

and

comparison

of

the

attributes.

Lets consider the Customer Dimension load, we will be identifying the update based on any change in any of
the columns CUST_NAME, ADDRESS1, ADDRESS2, CITY, STATE or ZIP else the record will be inserted into
the Dimension Table.

Datamodel needs
We need to create an additional database table column to store the CHECKSUM number. Checksum number
created by Informatica is 32 character hexadecimal value. So add the column below to the Dimension Table.

CHK_SUM_NB VARCHAR2(32)

This Column in the target table will be populated for every record inserted into the Dimension Table, using the
function provided in the next section.

Checksum number generation

Informatica provides the function MD5() for Checksum generation. This function returns a unique 32 character
hexadecimal value. Below expression generates the checksum number for the Customer Dimension.

MD5(CUST_NAME || ADDRESS1 || ADDRESS2 || CITY || STATE || TO_CHAR(ZIP))

Note : This value needs to be populated for every record inserted into the Dimension Table by the mapping.

Lookup Transformation
Now we can create a simple lookup transformation to fetch the checksum number (CHK_SUM_NB, which is
already stored in the Dimension table T_DIM_CUST) and the surrogate key (CUST_KEY) from the dimension
table.

You can see the lookup is really simple, with one input and two output ports.
Now we just need a simple expression to identify the records for INSERT and UPDATE as below
IIF(ISNULL(CUST_KEY),'INSERT',
IIF(NOT ISNULL(CUST_KEY) AND CHK_SUM_NB <> NEW_CHK_SUM_NB, 'UPDATE'))
Note : NEW_CHK_SUM_NB is the Checksum for the incoming record from source table. Calculate the
Checksum in an Expression Transformation using MD5() function. MD5(CUST_NAME || ADDRESS1 ||
ADDRESS2 || CITY || STATE || TO_CHAR(ZIP))

Design Advantages

Lookup Transformation made simple and the mapping as well.

Less number of columns in the lookup cache, hence improved lookup performance.

Avoid complex expression logic to make the mapping simple.

Hope you guys enjoyed this and you will be able to implement this in your upcoming projects. Please leave us a
comment in case you have any questions of difficulties implementing this.

People who read this also read :

You might also like