Informatica has a Service Oriented Architecture (SOA).

A service-oriented architecture (SOA) is an architectural pattern in computer
software design in which application components provide services to other
components via a communications protocol, typically over a network. The principles
of service-orientation are independent of any vendor, product or technology.
An Informatica installation can be single-node or multi-node. These notes cover the
multi-node case.
The Informatica architecture is divided into two groups of components:
1. Server components:
a. Repository Server
b. Informatica Server
2. Client components:
a. Designer
b. Workflow Manager
c. Workflow Monitor
d. Repository Manager

Domain: A domain is the primary unit for management and administration of
services in PowerCenter. It is a collection of nodes and services.
The Informatica domain consists of one or more servers, one or more installations of
the Informatica software, and at least one relational database.
Domain Basics

The most basic description of an Informatica domain is this: There is a node, and
there is a domain database. The node is where stuff runs. The database is where the
node keeps stuff.
The domain database is a standard relational database. Informatica supports
Oracle, IBM DB2, Sybase, and Microsoft SQL Server for domain databases.
The node connects to the domain database through JDBC. It does not use native
database connectivity, and it does not use ODBC. In all but a few rare situations, the
supported JDBC drivers are the ones that Informatica has licensed from DataDirect,
and they are the ones that you should use for the Informatica domain.
The domain database is like a 'backbone' that supports all of the moving parts in
the domain. It holds metadata for services that run on the node. It also stores
records of jobs that ran on that node, as well as other data.
There are two types of services in a domain:
Service Manager: The Service Manager manages domain operations such as
authentication, authorization, and logging. It also runs application services on
the nodes and manages users and groups.
Application Services: Application services are the server-specific services, such as
the Integration Service, Repository Service, and Reporting Service. These
services run on different nodes depending on the configuration.
Node: A node is the logical representation of a machine in a domain.
Nodes and Domains Architecture

When you install and run the Informatica services, the installation is known as a
node. The node becomes part of an Informatica domain. A domain is a grouping of
one or more nodes. The domain forms the environment upon which the Informatica
service processes run. One of the nodes in the domain connects to a relational
database. The database holds the tables for the domain configuration repository.
You can add nodes to the domain when you install the node, or you can change an
existing node and add it to another domain.
Two Kinds of Nodes

There are two kinds of nodes, the gateway node and the worker node. The gateway
nodes can run management services for the domain, and they are also the ones
that communicate with the domain database. The worker node can run application
services, but cannot communicate with the domain database.
Only one gateway node can run the management services at a time, and only one
node can talk to the domain database at a time, regardless of how many nodes are
in the domain. The node that performs these tasks is the master gateway node.
While there is no upper limit on the number of nodes in a domain, each domain has
a minimum of one gateway node.

Worker Nodes

A worker node runs a Service Manager process, and it can run application services.
The worker node cannot run the extra management processes, nor does it
communicate with the domain database. The benefit is that a worker node does not
require the extra resources for management; the cost is that it cannot take over as
a master gateway node.

Gateway Nodes

A gateway node can run application services, and it runs its own Service Manager
process. When you start the master gateway node, the management services log
their startup progress to node.log and catalina.out. You can find both of these log
files in INFA_HOME/tomcat/logs.
A node in the domain can be a worker node or a gateway node. A gateway node can
connect to the domain database and change the metadata stored in the domain
tables. A gateway node can also be a master gateway node. The master gateway
node runs management services for the rest of the domain and it is the one node in
the domain that updates the domain database. A worker node runs service
processes and it also connects to the master gateway node. A domain can have any
number of worker nodes and gateway nodes, but it can have only one master
gateway node at a time.
A running domain requires a running master gateway node. If the domain has more
than one gateway node and the master gateway fails, the other gateway nodes elect
a new master gateway node.

What Happens When a Node Goes Down? Fail Over Mechanism

There are many ways to answer this question, and the answer becomes more
complicated on domains that host a larger number of nodes. First, Informatica waits
for the configured resilience timeout (the time it waits after it loses a connection,
for example on a network failure).
After that, one of four scenarios applies, depending on the layout of the domain.

Single-node domain
When the one node in a single-node domain goes down, the domain dies. Any
services running on that node go down as well.

Two-node domain with a worker node and a gateway node


In a domain with one gateway node and one worker node, if the lone gateway node
goes down, so does the worker node and any application services. If the worker
node goes down, then the gateway node will continue running.

Two-node domain with two gateway nodes


In a domain with two gateway nodes, if the master gateway node goes down, the
second gateway node takes over for the first one.

Multiple-node domain with more than two gateway nodes


In a domain with more than two gateway nodes, where the master gateway dies,
the remaining gateway nodes hold an election to select the next master gateway
node.
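The four scenarios above can be condensed into a small decision function. This is only a sketch of the reasoning, not Informatica code: the function name `after_node_failure`, the role strings, and the election rule (the alphabetically first surviving gateway wins) are assumptions made for illustration, since the notes do not say how the election is decided.

```python
def after_node_failure(nodes, master, failed):
    """Return (domain_running, master) after the node `failed` goes down.

    `nodes` maps a node name to its role, "gateway" or "worker".
    """
    if failed != master:
        # Losing a worker or a non-master gateway leaves the master in place.
        return True, master
    # The master is gone: only the remaining gateway nodes can replace it.
    candidates = sorted(
        name for name, role in nodes.items()
        if role == "gateway" and name != failed
    )
    if not candidates:
        # No gateway left, so the domain (and every worker in it) goes down.
        return False, None
    # Assumed election rule for this sketch: the first name wins.
    return True, candidates[0]


# Gateway plus worker: losing the lone gateway takes the domain down.
print(after_node_failure({"gw1": "gateway", "wk1": "worker"}, "gw1", "gw1"))
# Two gateways: the second one takes over.
print(after_node_failure({"gw1": "gateway", "gw2": "gateway"}, "gw1", "gw1"))
```

The single-node case is the same as the gateway-plus-worker case with the worker removed: no candidate remains, so the domain stops.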

Repository Service

The repository service maintains the connections from PowerCenter clients to the
PowerCenter repository. It is a separate multi-threaded process, and it fetches,
inserts and updates the metadata inside the repository. It is also responsible for
maintaining consistency inside the repository metadata.
Integration Service

The Integration Service is the execution engine for Informatica; in other words, it is
the entity that executes the tasks we create in Informatica. This is how it
works:

1. A user executes a workflow.
2. Informatica instructs the Integration Service to execute the workflow.
3. The Integration Service reads the workflow details from the repository.
4. The Integration Service starts executing the tasks inside the workflow.
5. Once execution is complete, the status of each task is updated to failed,
succeeded, or aborted.
6. After execution completes, the session log and workflow log are generated.

The Integration Service is responsible for loading data into the target systems.
It also combines data from different sources; for example, it can combine data from
an Oracle table and a flat file source.

So, in summary, the Informatica Integration Service is a process that resides on the
Informatica server and waits for tasks to be assigned for execution. When we
execute a workflow, the Integration Service receives a notification to execute it.
The Integration Service then reads the workflow to learn which tasks (such as
mappings) it has to execute and when. It then reads the task details from the
repository and proceeds with the execution.
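The flow described above (read the workflow from the repository, run its tasks, record a status for each, produce a log) can be mimicked with a toy model. Everything here is hypothetical: the repository is a plain dictionary, tasks are ordinary Python callables, and `run_workflow` is an illustrative name, not part of any Informatica API.

```python
def run_workflow(repository, workflow_name):
    # Read the workflow definition (an ordered task list) from the repository.
    tasks = repository[workflow_name]
    statuses, log = {}, []
    # Run each task in order and record its final status.
    for task_name, task_fn in tasks:
        try:
            task_fn()
        except Exception as exc:
            statuses[task_name] = "failed"
            log.append(f"{task_name}: {exc}")
        else:
            statuses[task_name] = "succeeded"
            log.append(f"{task_name}: ok")
    # The log lines stand in for the session and workflow logs.
    return statuses, log


def load_customers():
    pass  # stand-in for a session that loads data without errors


def load_orders():
    raise ValueError("bad row")  # stand-in for a session that fails


repo = {"wf_demo": [("s_customers", load_customers), ("s_orders", load_orders)]}
print(run_workflow(repo, "wf_demo"))
```

A real workflow can also end in the aborted state and can run tasks concurrently; this sketch keeps only the sequential succeeded/failed path.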

The Integration Service has three components:
Integration Service Process
Load Balancer
Data Transformation Manager

Integration Service Process
In the admin community it is called the pmserver process.
The Integration Service starts one or more Integration Service
processes (ISP) to run and monitor workflows.
When you run a workflow, the Integration Service process starts
and locks the workflow, runs the workflow tasks, and starts the
process to run sessions.
The Integration Service process passes the parameter file and session
information to the DTM process, which uses them to retrieve the
required metadata from the repository.

Load Balancer
It is used to distribute the load among the nodes.
The Load Balancer dispatches tasks to achieve optimal
performance.
It may dispatch tasks to a single node or across the nodes in a
grid.

The Load Balancer supports three dispatch modes:

Round Robin
In round-robin mode, the Load Balancer allocates tasks to the
available nodes in turn.
It is the most preferable method of load dispatching.
Adaptive
In adaptive mode, the Load Balancer considers the
node with the most available CPU and allocates the
process to it. The Load Balancer takes CPU power into
account when running in adaptive mode. After
the partition groups are formed in the DTM, the Load
Balancer sends a message to the Integration
Service to distribute the different partition groups
(worker DTM processes) across the nodes of the grid.
Metrics
In metric-based mode, the Load Balancer checks all resource provision thresholds on
each available node and excludes a node if dispatching a task would cause the
thresholds to be exceeded. The Load Balancer continues to evaluate nodes until it
finds a node that can accept the task. This mode prevents overloading nodes when
tasks have uneven computing requirements.
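
To make the difference between the three dispatch modes concrete, here is a minimal sketch of each selection rule. The node fields (`load`, `max_load`, `cpu_free`) and the function names are assumptions chosen for the example; the real Load Balancer uses its own metrics and thresholds.

```python
from itertools import cycle


def round_robin(node_names):
    # Round robin: hand out nodes in a fixed rotating order.
    return cycle(node_names)


def metric_based(nodes, task_load):
    # Metric-based: skip any node whose threshold the task would exceed.
    for node in nodes:
        if node["load"] + task_load <= node["max_load"]:
            return node["name"]
    return None  # no node can accept the task right now


def adaptive(nodes):
    # Adaptive: pick the node with the most available CPU.
    return max(nodes, key=lambda n: n["cpu_free"])["name"]


# Hypothetical two-node grid used only for this example.
grid = [
    {"name": "node1", "load": 8, "max_load": 10, "cpu_free": 0.2},
    {"name": "node2", "load": 3, "max_load": 10, "cpu_free": 0.7},
]

rr = round_robin([n["name"] for n in grid])
print([next(rr) for _ in range(3)])   # node1, node2, node1
print(metric_based(grid, task_load=4))  # node1 would exceed its threshold
print(adaptive(grid))                   # node2 has the most free CPU
```

Round robin ignores the state of the nodes entirely, which is why the other two modes exist: one guards against exceeding thresholds, the other chases the least busy CPU.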

Data Transformation Manager
The Data Transformation Manager (DTM) process performs the following tasks:
Retrieves and validates session information from the repository.
Creates the session log.
Validates source and target code pages.
Reads the parameter file and expands workflow variables.
Sends post-session email.
Sends a request to start worker DTM processes on other nodes
when the session is configured to run on a grid.
The DTM process starts six kinds of threads:
Master thread
Main thread of the DTM process.
It creates and manages the other threads.
Reader thread
It fetches data from the source into Informatica
memory.
One thread for each partition for each source pipeline.
Transformation thread
It transforms the data according to the business
requirement.
Writer thread
It loads the transformed data into the target.
One thread for each partition, if a target exists in the source
pipeline, to write to the target.
Mapping thread
One thread for each session.
Fetches session and mapping information.
Pre- and post-session threads
One thread each to perform pre-session and post-session operations.
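
The reader, transformation, and writer threads form a pipeline: each one hands rows to the next. The following sketch shows that hand-off with Python's standard `threading` and `queue` modules for a single partition. It is an analogy, not the DTM implementation; the function names and the use of in-memory queues are assumptions for illustration.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the data on each queue


def reader(rows, out_q):
    # Reader thread: fetch rows from the source into memory.
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def transformer(in_q, out_q, fn):
    # Transformation thread: apply the business rule to each row.
    while True:
        row = in_q.get()
        if row is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put(fn(row))


def writer(in_q, target):
    # Writer thread: load the transformed rows into the target.
    while True:
        row = in_q.get()
        if row is SENTINEL:
            break
        target.append(row)


def run_session(rows, fn):
    # Plays the role of the master thread: creates and joins the others.
    q1, q2, target = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=reader, args=(rows, q1)),
        threading.Thread(target=transformer, args=(q1, q2, fn)),
        threading.Thread(target=writer, args=(q2, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target


print(run_session([1, 2, 3], lambda x: x * 10))  # prints [10, 20, 30]
```

With more than one partition, a session would run one such reader and writer per partition, as the list above describes; this sketch keeps a single partition so that row order stays deterministic.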

Admin Console:
The Administration Console is a web application that you use
to manage a domain and security. You can use it to perform the following tasks:
Add or delete nodes.
Manage application services. Manage all application services in the
domain, such as the Integration Service and Repository Service.
Configure nodes. Configure node properties, such as the backup
directory and resources. You can also shut down and restart
nodes.
View log events. Use the Log Viewer to view domain, Integration
Service, Web Services Hub, and Repository Service log events.
Manage domain objects. Create and manage objects such as services,
nodes, licenses, and folders. Folders allow you to organize domain
objects and to manage security by setting permissions for domain
objects.
View and edit domain object properties. You can view and edit properties for all
objects in the domain, including the domain object.
