
INFORMATICA

Overview

A data warehouse is a collection of subject-oriented databases. It also comprises a series of processes, procedures and tools (hardware and software).
From the data warehouse, data flows to various customized databases. If this data is periodically extracted from the data warehouse and loaded into a local database, that local database is called a data mart.
Complete Warehouse Solution Architecture

[Figure: Warehouse solution architecture, moving from data to information to knowledge in three stages. Data Sources: legacy data, operational data and external data sources. Data Management: Extract, Transform, Load into the organizationally structured Enterprise Data Warehouse, with metadata. Access: departmentally structured Sales, Inventory and Purchase data marts. Bottom labels: Asset Assembly (and Management), Asset Exploitation.]
Use of Informatica in Data Warehousing
Overview

The data in the data warehouse comes from various sources running on different platforms. An ETL tool is used to integrate data from these sources and load it into the data warehouse.
INFORMATICA is an ETL tool used for extracting data, transforming it and loading it into the data warehouse. INFORMATICA has two products that carry out this ETL process:
PowerCenter
PowerMart
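The extract, transform and load steps described above can be illustrated with a minimal Python sketch. It is not Informatica code: the sample data, the function names and the in-memory SQLite database standing in for the warehouse are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data: two "platforms" that deliver the same kind of
# record in different shapes (a CSV export and a list of dicts).
LEGACY_CSV = "cust_id,amount\n1,10.50\n2,4.00\n"
OPERATIONAL_ROWS = [{"customer": 1, "amt_cents": 250}]


def extract():
    """Read raw records from every source."""
    legacy = list(csv.DictReader(io.StringIO(LEGACY_CSV)))
    return legacy, OPERATIONAL_ROWS


def transform(legacy, operational):
    """Convert both shapes into one common (customer_id, amount) form."""
    rows = [(int(r["cust_id"]), float(r["amount"])) for r in legacy]
    rows += [(r["customer"], r["amt_cents"] / 100) for r in operational]
    return rows


def load(conn, rows):
    """Write the unified rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    load(conn, transform(*extract()))
    return conn
```

Calling run_etl() returns a connection whose sales table holds the rows from both sources in one common format.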
Overview

[Figure: The server reads source data from the source, follows instructions held in the repository, and writes transformed data to the target.]
Components
INFORMATICA PowerCenter has the following components:
•ODBC
•PowerCenter Server: An application that reads, transforms and writes data to the target.
•PowerCenter Client: The client has five different tools:
The Source Analyzer: Used to add source definitions to the repository.
The Warehouse Designer: Used to create targets and add their definitions to the repository.
The Transformation Developer: Used to create reusable transformations.
The Mapplet Designer: Used to create mapplets.
The Mapping Designer: Used to create mappings from sources to targets.
Connectivity And Set Up
Configuring Server Manager
• Informatica Server name
• Type of network protocol to access the server
– TCP/IP or IPX/SPX
• Port number on which the client
communicates (for TCP/IP) - 4001
• Address of machine on which the server runs
(for IPX/SPX)
• Timeout – number of seconds the Server Manager waits
for a response from the Informatica Server
• Default directories for session files and
caches, e.g. $PMRootDir, $PMSessionLogDir,
$PMBadFileDir
• Defining Database Connections
• Defining FTP connections
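As a rough illustration of how the connection settings above depend on each other, the following Python sketch models them as a small record with a consistency check. The class, its field names and the 60-second default timeout are hypothetical; this is not an Informatica API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerRegistration:
    """Hypothetical record of the settings listed above."""
    server_name: str
    protocol: str                  # "TCP/IP" or "IPX/SPX"
    port: Optional[int] = None     # used with TCP/IP
    address: Optional[str] = None  # used with IPX/SPX
    timeout_seconds: int = 60      # assumed default, for illustration only

    def validate(self):
        """Return a list of problems; an empty list means the settings agree."""
        problems = []
        if self.protocol == "TCP/IP":
            if self.port is None:
                problems.append("TCP/IP requires a port number")
        elif self.protocol == "IPX/SPX":
            if self.address is None:
                problems.append("IPX/SPX requires a machine address")
        else:
            problems.append("protocol must be TCP/IP or IPX/SPX")
        if self.timeout_seconds <= 0:
            problems.append("timeout must be a positive number of seconds")
        return problems
```

For example, ServerRegistration("pm_server", "TCP/IP", port=4001).validate() returns an empty list, while an IPX/SPX registration without an address reports one problem.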
Features

•INFORMATICA Server: Reads data from sources, transforms it as instructed by the repository metadata and writes it to the target.
•Repository Manager: Used to create and manage repositories.
A repository is a database containing a set of instructions that specify where to get data from (source), how to process or transform it, and where to write it (target). This set of instructions is called metadata.
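The idea that metadata is a set of instructions, separate from the data, can be sketched in Python. The dictionary layout, the operation names and the run function are assumptions invented for this example and do not reflect the actual repository schema.

```python
# Hypothetical "repository" entry: instructions only, no data.
METADATA = {
    "source": "orders",
    "transform": [("upper", "country"), ("round", "total")],
    "target": "orders_clean",
}

# Operations the toy "server" knows how to perform.
OPERATIONS = {"upper": str.upper, "round": round}


def run(metadata, tables):
    """Read the source, apply each instruction in order, write the target."""
    # Copy each row so the source table is left untouched.
    rows = [dict(row) for row in tables[metadata["source"]]]
    for op_name, column in metadata["transform"]:
        for row in rows:
            row[column] = OPERATIONS[op_name](row[column])
    tables[metadata["target"]] = rows
    return rows
```

Changing the METADATA entry changes what the run does without touching the engine code, which is the role the text assigns to the repository.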
With the Repository Manager you can create repository users and groups, assign privileges and permissions, manage folders and locks, and import and export from ODBC data sources.
•Designer: Used to create mappings and target tables.
•Server Manager: Used to create sessions and configure the schedule on which they run.
Repository User Management

Multiple developers can use the same repository to create and manage multiple projects or the same project.
Informatica allows a separate user profile to be created for each developer, with its own username and password.
Privileges such as Administer Server, Create Sessions and Use Designer can be assigned to each user of the repository.
Groups of users can be created, and privileges can be granted to the groups.
A user can be a member of one or more groups.
Access can be restricted to individual folders within a repository.
Permissions of the following types can be granted on folders to the Owner, the Owner's group and Repository users:
• Read: Allows viewing the folder and the objects within it.
• Write: Allows creating and editing objects within the folder.
• Execute: Allows executing or scheduling a session in the folder.
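The three permission types and three audiences above resemble a Unix-style permission model. The Python sketch below shows one way such a check could work; the Folder class and the rule for picking the applicable permission set are assumptions, not Informatica's documented behaviour.

```python
from enum import Flag


class Permission(Flag):
    READ = 1
    WRITE = 2
    EXECUTE = 4


class Folder:
    """Hypothetical folder with one permission set per audience."""

    def __init__(self, owner, owner_group, owner_perm, group_perm, repo_perm):
        self.owner = owner
        self.owner_group = owner_group
        self.owner_perm = owner_perm  # granted to the Owner
        self.group_perm = group_perm  # granted to the Owner's group
        self.repo_perm = repo_perm    # granted to all other Repository users

    def allowed(self, user, user_groups, needed):
        """Pick the permission set that applies to the user, then test it."""
        if user == self.owner:
            granted = self.owner_perm
        elif self.owner_group in user_groups:
            granted = self.group_perm
        else:
            granted = self.repo_perm
        return (granted & needed) == needed
```

A folder created with Read, Write and Execute for the owner, Read and Execute for the owner's group and Read for everyone else would let a group member schedule a session but not edit objects.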
Designer
• Creation of mappings

MAPPING

A mapping is a type of metadata that you create to specify how to move and transform data between sources and targets.
- Stored in the repository
Mapping

A mapping describes how to move and transform data from sources to targets. A mapping includes:
Source
Target
Transformations
[Figure: Sample Mapping]
Transformations

A transformation is a component of a mapping that describes how the Informatica Server should transform data.
There are two categories of transformations, depending on their scope:
• Standard Transformation: It is created in a mapping and exists only within that mapping. It cannot be used in other mappings.
• Reusable Transformation: It is created and stored independently in the repository. It can be used by all mappings.
The following are the types of transformations:
Expression – Calculates a value or modifies text. Operates on individual rows.
Aggregator – Performs aggregate calculations. Operates on sets of rows.
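The difference between row-level and set-level operation can be seen in a short Python sketch. The sample rows and function names are hypothetical; the code only mirrors the behaviour described for the Expression and Aggregator transformations.

```python
from collections import defaultdict

ROWS = [
    {"region": "east", "price": 10.0, "qty": 2},
    {"region": "east", "price": 5.0, "qty": 1},
    {"region": "west", "price": 8.0, "qty": 3},
]


def expression(rows):
    """Row-level: each output row depends only on its own input row."""
    return [{**r, "line_total": r["price"] * r["qty"]} for r in rows]


def aggregator(rows, group_by, value):
    """Set-level: one result per group, computed from all rows in the group."""
    sums = defaultdict(float)
    for r in rows:
        sums[r[group_by]] += r[value]
    return dict(sums)
```

Here expression(ROWS) returns as many rows as it receives, while aggregator(expression(ROWS), "region", "line_total") collapses them to one total per region.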
Source Qualifier – Filters records as they are read; applies to relational sources only. Can also order the records queried by the Informatica Server.
Filter – Filters records sent to the targets. Applicable to any source.
Stored Procedure – Calls a stored procedure.
External Procedure/Advanced External Procedure – Calls a procedure in a shared library (e.g. a DLL) or in the COM layer of Windows NT.
Sequence Generator – Generates primary keys.
Rank – Limits records to a top or bottom range.
Normalizer – Normalizes records, including those read from COBOL sources.
Lookup – Gets related values.
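Rough Python analogues of three of these transformations (Sequence Generator, Rank and Lookup) are sketched below. The sample data and function names are assumptions, and the real transformations offer more options than shown here.

```python
import heapq
from itertools import count

EMPLOYEES = [
    {"name": "Asha", "dept_id": 10, "salary": 900},
    {"name": "Ben", "dept_id": 20, "salary": 1200},
    {"name": "Chen", "dept_id": 10, "salary": 1100},
]
DEPARTMENTS = {10: "Sales", 20: "Purchase"}  # lookup table: dept_id -> name


def add_keys(rows, start=1):
    """Sequence Generator analogue: attach a running surrogate key."""
    seq = count(start)
    return [{**r, "emp_key": next(seq)} for r in rows]


def top_n(rows, n, column):
    """Rank analogue: keep only the n rows with the largest value."""
    return heapq.nlargest(n, rows, key=lambda r: r[column])


def lookup_department(rows, table):
    """Lookup analogue: fetch a related value, None when there is no match."""
    return [{**r, "dept_name": table.get(r["dept_id"])} for r in rows]
```

Chaining the three, lookup_department(top_n(add_keys(EMPLOYEES), 2, "salary"), DEPARTMENTS) keeps the two highest-paid employees, each with a generated key and a department name.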
Update Strategy – Determines whether to insert, update, delete or reject data.
Joiner – Joins records from different databases or flat file systems.
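A minimal Python sketch of both ideas follows. The decision rules in update_strategy (for example, rejecting rows without a key) are illustrative assumptions rather than Informatica defaults, and joiner shows a plain inner join on a shared key.

```python
def update_strategy(row, existing_keys):
    """Return the action for one row; the rules are illustrative assumptions."""
    if row.get("id") is None:
        return "reject"  # no key, so the row cannot be written
    if row.get("deleted"):
        return "delete" if row["id"] in existing_keys else "reject"
    return "update" if row["id"] in existing_keys else "insert"


def joiner(master_rows, detail_rows, key):
    """Inner-join records from two separate sources on a shared key."""
    index = {m[key]: m for m in master_rows}
    return [{**index[d[key]], **d} for d in detail_rows if d[key] in index]
```

For instance, a row whose id already exists in the target is flagged "update", and a detail row with no matching master record is dropped by the join.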
Every mapping needs at least one Source Qualifier transformation, or a Normalizer transformation for COBOL sources.
Ports

A port represents a single column of data. Every source definition, target definition and transformation contains a collection of ports.
There are four types of ports:
Input port – Receives data.
Output port – Provides data.
Input/Output port – Passes data through.
Variable port – Used to store components of an expression.
Source definitions contain only output ports, since they provide data.
Target definitions contain only input ports, since they receive data.
Transformations contain a combination of input, output and input/output ports, since they can pass data through as it is or modify it, depending on the type.
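The port rules above can be expressed as a small validity check, sketched here in Python. The names are hypothetical, and allowing variable ports in every transformation is an assumption, since the text does not say which transformations support them.

```python
from enum import Enum


class PortType(Enum):
    INPUT = "input"
    OUTPUT = "output"
    INPUT_OUTPUT = "input/output"
    VARIABLE = "variable"


# Which port types each kind of object may contain, per the rules above.
# Assumption: any transformation may hold any of the four port types.
ALLOWED = {
    "source": {PortType.OUTPUT},
    "target": {PortType.INPUT},
    "transformation": set(PortType),
}


def invalid_ports(kind, ports):
    """Return the names of ports whose type is not allowed for this kind."""
    return [name for name, ptype in ports.items() if ptype not in ALLOWED[kind]]
```

A source definition that declares an input port would be reported by invalid_ports, while a transformation mixing input and output ports passes the check.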
Transformation Language

The Transformation Language is used to write expressions for transformations. It consists of functions (similar to SQL functions) used to modify or validate the data.
Expressions can be written in the following types of transformations:
Aggregator
Expression
Filter
Rank
Update Strategy
The Transformation Language consists of the following components:
• Functions: e.g. AVG, COUNT, ISNULL, SUBSTR, IIF
• Operators: e.g. addition, subtraction, multiplication, division
• Constants: e.g. built-in constants such as TRUE
• Variables: e.g. SYSDATE, which represents the current date
• Return values
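To give a feel for how such functions combine in an expression, here are approximate Python analogues of IIF, ISNULL and SUBSTR. They are sketches only: None stands in for NULL, substr handles just positive 1-based start positions, and the real functions may differ in edge cases.

```python
def iif(condition, true_value, false_value=None):
    """IIF analogue: pick one of two values."""
    return true_value if condition else false_value


def isnull(value):
    """ISNULL analogue: None stands in for a NULL column value."""
    return value is None


def substr(text, start, length=None):
    """SUBSTR analogue for positive, 1-based start positions only."""
    if text is None:
        return None  # a NULL input yields a NULL result
    begin = start - 1
    end = None if length is None else begin + length
    return text[begin:end]


def clean_code(raw):
    """Roughly: IIF(ISNULL(raw), 'UNKNOWN', SUBSTR(raw, 1, 3))"""
    return iif(isnull(raw), "UNKNOWN", substr(raw, 1, 3))
```

With these helpers, clean_code returns the first three characters of a value, or the constant "UNKNOWN" when the value is missing.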
Mapplets

A mapplet is a reusable object created in a repository that represents a set of transformations.
Summary

Basic steps to create a project:
Create the database that will contain the repository.
Create the data model for the target.
Create repositories.
Create folders within the repositories.
Import the source definitions.
Create the targets that will receive data.
Create mappings between sources and targets, including transformations that modify the data.
Create source and target connections in the Server Manager.
Create sessions for transferring data between source and target.
Schedule and run the sessions.
