
What is the relation between EME, GDE and Co>Operating System?

EME stands for Enterprise Meta Environment, GDE for Graphical Development Environment, and the Co>Operating System can be described as the Ab Initio server. The relation between the three is as follows.
The Co>Operating System is the Ab Initio server. It is installed on a particular operating system platform, which is called the native OS. The EME is like the repository in Informatica: it holds the metadata, transformations, DB config files, and source and target information. The GDE is the end-user environment where we develop graphs (like mappings in Informatica). The designer uses the GDE to design graphs and saves them to the EME or to a sandbox. The GDE is on the user side, whereas the EME is on the server side.

What is the use of aggregation when we have rollup?

As we know, the Rollup component in Ab Initio is used to summarise groups of data records. Then where would we use Aggregate?
Aggregate and Rollup can both summarise data, but Rollup is much more convenient to use, and it makes it much clearer how a particular summarisation is being done. Rollup also offers extra functionality such as input and output filtering of records. Aggregate and Rollup perform the same action, but Rollup can display intermediate results in main memory, whereas Aggregate does not support intermediate results.

What kinds of layouts does Ab Initio support?

Basically, Ab Initio supports serial and parallel layouts, and a graph can have both at the same time. A parallel layout depends on the degree of data parallelism: if the multifile system is 4-way parallel, a component in the graph can run 4-way parallel when its layout is defined to match that degree of parallelism.

How can you run a graph infinitely?

To run a graph infinitely, the end script of the graph should call the .ksh file of the graph. Thus, if the name of the graph is abc.mp, the end script of the graph should contain a call to abc.ksh. In this way the graph will run infinitely.

How do you add default rules in transformer?

Double-click the transform parameter on the Parameters tab of the component properties; this opens the Transform Editor. In the Transform Editor, click the Edit menu and select Add Default Rules from the dropdown. It shows two options: 1) Match Names 2) Wildcard.

Do you know what a local lookup is?

If your lookup file is a multifile that is partitioned/sorted on a particular key, then the local lookup function can be used rather than the plain lookup function call. The lookup is then local to a particular partition, depending on the key.

A Lookup File consists of data records which can be held in main memory. This lets the transform function retrieve the records much faster than retrieving them from disk, and it allows the transform component to process the data records of multiple files quickly.
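
The idea of a keyed in-memory lookup, and of a lookup that only searches the partition owning the key, can be sketched in Python. This is not Ab Initio DML: the function names, the sample records, and the CRC-based choice of partition are all hypothetical and chosen only for illustration.

```python
from zlib import crc32

N_PARTITIONS = 2  # assumed degree of parallelism for this sketch


def partition_of(key: str, n_partitions: int = N_PARTITIONS) -> int:
    """Pick a partition from the key (a stand-in for partitioning by key)."""
    return crc32(key.encode("utf-8")) % n_partitions


def build_lookup(records, key_field):
    """Hold the whole (small) dataset in memory, indexed by its key."""
    return {rec[key_field]: rec for rec in records}


def build_local_lookups(records, key_field, n_partitions=N_PARTITIONS):
    """Split the lookup data by key so each partition holds only its own keys."""
    parts = [dict() for _ in range(n_partitions)]
    for rec in records:
        parts[partition_of(rec[key_field], n_partitions)][rec[key_field]] = rec
    return parts


countries = [
    {"code": "IN", "name": "India"},
    {"code": "US", "name": "United States"},
    {"code": "DE", "name": "Germany"},
]

lookup = build_lookup(countries, "code")
local_lookups = build_local_lookups(countries, "code")


def local_lookup(key):
    """Search only the partition that owns this key; None if the key is absent."""
    return local_lookups[partition_of(key)].get(key)
```

The local variant only works because the data being processed and the lookup data are partitioned on the same key, which is the condition the answer above describes.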

What is the difference between look-up file and look-up, with a relevant example?

Generally, a Lookup File represents one or more serial files (flat files). The amount of data is small enough to be held in memory. This allows transform functions to retrieve records much more quickly than they could retrieve them from disk.
A lookup is a component of an Ab Initio graph where we can store data and retrieve it by using a key parameter.
A lookup file is the physical file where the data for the lookup is stored.

How many components in your most complicated graph?

It depends on the type of components you use. Usually, avoid using very complicated transform functions in a graph.

Explain what is lookup?

A lookup is basically a specific dataset which is keyed. It can be used to map values according to the data present in a particular file (serial or multifile). The dataset can be static or dynamic (as when the lookup file is generated in a previous phase and used as a lookup file in the current phase). Sometimes hash joins can be replaced by a reformat with a lookup, if one of the inputs to the join contains a small number of records with a slim record length.
Ab Initio has built-in functions to retrieve values using the key for the lookup.

What is a ramp limit?

The limit parameter contains an integer that represents a number of reject events.
The ramp parameter contains a real number that represents a rate of reject events in the number of records processed.
Number of bad records allowed = limit + (number of records * ramp).
Ramp is basically a percentage value (from 0 to 1). The two together provide the threshold value of bad records.
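
The threshold formula above can be worked through with a small Python sketch. The function names are hypothetical, and treating the threshold as exceeded only when the reject count is strictly greater than the allowed number is an assumption of this sketch, not something the answer states.

```python
def allowed_rejects(limit: int, ramp: float, records_processed: int) -> float:
    """Number of bad records allowed = limit + (number of records * ramp)."""
    return limit + records_processed * ramp


def threshold_exceeded(rejects: int, limit: int, ramp: float,
                       records_processed: int) -> bool:
    """Assumption: the threshold is exceeded only when rejects go above it."""
    return rejects > allowed_rejects(limit, ramp, records_processed)


# With limit = 10 and ramp = 0.01, processing 1000 records allows
# 10 + 1000 * 0.01 bad records, i.e. about 20.
example_threshold = allowed_rejects(10, 0.01, 1000)
```

With ramp set to 0 the threshold is just the fixed limit; with limit set to 0 it is purely proportional to the number of records processed.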

Have you worked with packages?

Multistage transform components use packages by default. However, a user can create his own set of functions in a transform function and include it in other transform functions.

Have you used rollup component? Describe how.

If the user wants to group records on particular field values, then Rollup is the best way to do that. Rollup is a multistage transform function and it contains the following mandatory functions.
1. initialise
2. rollup
3. finalise
You also need to declare a temporary variable if you want to get counts for a particular group.

For each group, it first calls the initialise function once, then calls the rollup function for each of the records in the group, and finally calls the finalise function once after the last rollup call.
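
The calling sequence described above can be imitated in Python. This is only a sketch of the control flow, not Ab Initio DML: the record fields (customer, amount), the shape of the temporary variable, and the run_rollup driver are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def initialise(first_record):
    """Called once per group: set up the temporary variable."""
    return {"count": 0, "total": 0}


def rollup(temp, record):
    """Called once per record in the group: update the temporary variable."""
    temp["count"] += 1
    temp["total"] += record["amount"]
    return temp


def finalise(temp, last_record):
    """Called once per group, after the last rollup call: build the output."""
    return {"customer": last_record["customer"],
            "count": temp["count"],
            "total": temp["total"]}


def run_rollup(records, key):
    """Drive the three functions group by group (groups formed on `key`)."""
    output = []
    ordered = sorted(records, key=itemgetter(key))
    for _, group in groupby(ordered, key=itemgetter(key)):
        temp, last = None, None
        for rec in group:
            if temp is None:
                temp = initialise(rec)
            temp = rollup(temp, rec)
            last = rec
        output.append(finalise(temp, last))
    return output


sales = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 7},
]
summary = run_rollup(sales, "customer")
```

The count field in the temporary variable plays the role of the temporary variable the answer says is needed to get counts per group.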

How do you add default rules in transformer?

Add Default Rules opens the Add Default Rules dialog. Select one of the following:
Match Names: generates a set of rules that copies input fields to output fields with the same name.
Use Wildcard (.*) Rule: generates one rule that copies input fields to output fields with the same name.
1) If it is not already displayed, display the Transform Editor Grid.
2) Click the Business Rules tab if it is not already displayed.
3) Select Edit > Add Default Rules.

In the case of Reformat, if the destination field names are the same as, or a subset of, the source fields, then there is no need to write anything in the reformat xfr, unless you want a real transform beyond reducing the set of fields or splitting the flow into a number of flows to achieve the functionality.

What is the difference between partitioning with key and round robin?

Partition by Key (hash partition) is a partitioning technique used to partition data when the keys are diverse. If a single key value is present in large volume, there can be large data skew. Still, this method is used more often for parallel data processing.
Round-robin partitioning is another technique, which distributes the data uniformly across the destination data partitions. The skew is zero when the number of records is divisible by the number of partitions. A real-life example is how a pack of 52 cards is distributed among 4 players in a round-robin manner.
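
The two techniques can be contrasted with a short Python sketch built on the card example. It is an illustration only: the helper names and the use of a CRC as the hash are assumptions, and real partition components choose their own hash.

```python
from zlib import crc32


def partition_round_robin(records, n):
    """Deal records out one at a time, like cards to players."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def partition_by_key(records, key, n):
    """Send every record with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        index = crc32(str(rec[key]).encode("utf-8")) % n
        parts[index].append(rec)
    return parts


# A pack of 52 cards, 13 per suit.
deck = [{"card": i, "suit": "SHDC"[i % 4]} for i in range(52)]

hands = partition_round_robin(deck, 4)      # 13 cards per partition, no skew
by_suit = partition_by_key(deck, "suit", 4)  # sizes depend on how suits hash
```

Round robin guarantees even sizes but scatters equal keys across partitions; partitioning by key keeps equal keys together, which key-based components such as joins and rollups rely on, at the cost of possible skew.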

How do you improve the performance of a graph?

There are many ways the performance of the graph can be improved.
1) Use a limited number of components in a particular phase
2) Use optimum max core values for sort and join components
3) Minimise the number of sort components
4) Minimise sorted join components and, if possible, replace them with in-memory join/hash join
5) Use only the required fields in the sort, reformat and join components
6) Use phasing/flow buffers in case of merge and sorted joins
7) If the two inputs are huge then use sorted join, otherwise use hash join with the proper driving port
8) For large datasets, don't use broadcast as the partitioner
9) Minimise the use of regular expression functions like re_index in the transform functions
10) Avoid repartitioning of data unnecessarily

Try to run the graph in MFS for as long as possible. For this, the input files should be partitioned, and if possible the output file should also be partitioned.
How do you truncate a table?

From Ab Initio, use the Run SQL component with the DDL statement "truncate table", or use the Truncate Table component in Ab Initio.

Have you ever encountered an error called depth not equal?

When two components are linked together and their layouts do not match, this problem can occur during compilation of the graph. A solution is to use a partitioning component in between if there is a change in layout.

What is the function you would use to transfer a string into a decimal?

In this case no specific function is required if the size of the string and the decimal is the same. A decimal cast with the size in the transform function will suffice. For example, if the source field is defined as string(8) and the destination as decimal(8) (say the field name is field):
out.field :: (decimal(8)) in.field;
If the destination field size is smaller than the input, then the string_substring function can be used as follows. Say the destination field is decimal(5):
out.field :: (decimal(5)) string_lrtrim(string_substring(in.field, 1, 5)); /* string_lrtrim is used to trim leading and trailing spaces */

What are primary keys and foreign keys?

In an RDBMS, the relationship between two tables is represented as a primary key and foreign key relationship. The primary key table is the parent table and the foreign key table is the child table. The criterion for both tables is that there should be a matching column.

What is the difference between clustered and non-clustered indices? and why do you use a clustered index?
What is an outer join?

An outer join is used when one wants to select all the records from a port, whether or not they satisfy the join criteria.

What are Cartesian joins?

A Cartesian join joins two tables without a join key. The key should be {}.

What is the purpose of having stored procedures in a database?

The main purpose of stored procedures is to reduce network traffic. All the SQL statements execute in a cursor on the server, so the speed is high.
Why might you create a stored procedure with the with recompile option?

Recompile is useful when the tables referenced by the stored procedure undergo a lot of modification, deletion, or addition of data. Because of the heavy modification activity, the execution plan becomes outdated and the stored procedure's performance goes down. If we create the stored procedure with the recompile option, SQL Server won't cache a plan for it, and it will be recompiled every time it is run.

What is a cursor? Within a cursor, how would you update fields on the row just fetched?

The Oracle engine uses work areas for its internal processing in order to execute SQL statements; such a work area is called a cursor. There are two types of cursor: implicit and explicit. An implicit cursor is used for internal processing, and an explicit cursor is opened by the user for the data required.

How would you find out whether a SQL query is using the indices you expect?

The explain plan can be reviewed to check the execution plan of the query. This shows whether the expected indexes are used or not.

How can you force the optimizer to use a particular index?

Use hints /*+ */; these act as directives to the optimizer.
select /*+ index(a index_name) full(b) */ * from table1 a, table2 b where b.col1 = a.col1 and b.col2 = sid and b.col3 = 1;
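When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or explicit transactions, and why?

Explicit transactions are preferable, because the user controls where the unit of work begins and ends, so all of the statements can be committed or rolled back together. Implicit handling is used for the engine's internal processing.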

Describe the elements you would review to ensure multiple scheduled batch jobs do not collide with each other.
Every job depends upon another job. For example, if the first job's result is successful then the next job will execute; otherwise it will not run.

Describe the process steps you would perform when defragmenting a data table. This table contains mission critical data.

There are several ways to do this:
1) We can move the table within the same tablespace or to another tablespace and rebuild all the indexes on the table. The "alter table move" activity reclaims the fragmented space in the table. Then run "analyze table table_name compute statistics" to capture the updated statistics.
2) A reorg could be done by taking a dump of the table, truncating the table, and importing the dump back into the table.

Explain the difference between the truncate and delete commands.

TRUNCATE is a DDL command, whereas DELETE is a DML command. A rollback cannot be performed after a TRUNCATE statement, whereas it can be performed after a DELETE statement. A WHERE clause cannot be used with TRUNCATE, whereas it can be used with DELETE.

What is the difference between a DB config and a CFG file?

A .dbc file has the information required for Ab Initio to connect to the database in order to extract or load tables or views, while a .cfg file is the table configuration file created by db_config when using components like Load DB Table.

Describe the Grant/Revoke DDL facility and how it is implemented.

Basically, this is part of the DBA's responsibilities. GRANT gives permissions, for example GRANT CREATE TABLE, CREATE VIEW, and many more. REVOKE cancels the granted permissions. So both the GRANT and REVOKE commands depend on the DBA.


What is a destructor?

What is XML-RPC?

What is new about Web services?

What is a Web service?

What kinds of services does an operating system provide?

What is logic?

What is an algorithm?

What is a constant?

What is a variable?

What is an assignment statement used for?

What are the four basic types of data?

What is a conditional loop best suited for?

What is an incremented loop best suited for?

What are relational operators used for?

What relational operators do you know? (C)

What does grep() stand for? (Unix interview question)

What does RPG stand for?

What does Lisp stand for?

What does HTML stand for?

What does Fortran stand for?

What does DOS stand for?

What does CGI stand for?

What does CORBA stand for?

What does Cobol stand for?

What does Case stand for?

What does BASIC stand for?

What does ASCII stand for?

What does Algol stand for?

What does SQL stand for?
What is the latest version that is available in Ab-initio?
How to take the input data from an excel sheet?

How will you test a dbc file from command prompt ?

Which one is faster for processing fixed length dmls or delimited dmls and why ?

What are the continuous components in Ab Initio?

What is meant by fancing in abinitio ?

What is the relation between EME , GDE and Co-operating system ?

What is the use of aggregation when we have rollup as we know rollup component in abinitio is used to summirize group of
data record. then where we will use aggregation ?

Describe the process steps you would perform when defragmenting a data table. This table contains mission critical data.

Explain the difference between the truncate and delete commands.

When running a stored procedure definition script how would you guarantee the definition could be rolled back in the
event of problems.

Describe the Grant/Revoke DDL facility and how it is implemented.

Describe how you would ensure that database object definitions (Tables, Indices, Constraints, Triggers, Users, Logins,
Connection Options, and Server Options etc) are consistent and repeatable between multiple database instances (i.e.: a
test and production copy of a database).

What is the difference between a DB config and a CFG file?

What about DML changes dynamically?

What is backward compatibility in abinitio?

What are kinds of layouts does ab initio supports

How do you add default rules in transformer?

Have you used rollup component? Describe how.

What are primary keys and foreign keys?

What is an outer join?

What are Cartesian joins?

What is the purpose of having stored procedures in a database?

What is a cursor? Within a cursor, how would you update fields on the row just fetched?

How would you find out whether a SQL query is using the indices you expect?

How can you force the optimizer to use a particular index?

When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or explicit transactions,
and why.

Describe the elements you would review to ensure multiple scheduled batch jobs do not collide with each other.

What is semi-join

How to get DML using Utilities in UNIX?

What is driving port? When do you use it?

What is local and formal parameter

What are BROADCAST and REPLICATE?


Explain what is lookup?
Have you worked with packages?

How to create repository in abinitio for stand alone system(LOCAL NT)?

What is the difference between .dbc and .cfg file?

What does dependency analysis mean in Ab Initio?

What do you have to give the value for the Record Required parameter for a natural join?

When do you use Partition by Expression?

What is Adhoc File System? Give me a scenario where you used it.

What are the different commands that you used when writing wrappers?

What do the hidden files in a sandbox represent and what does start.ksh represent?

How can we test the abintio manually and automation?

What is the difference between sandbox and EME, can we perform checkin and checkout through sandbox/ Can anybody
explain checkin and checkout?

What does layout means in terms of Ab Initio

What are different things that you have to consider when loading data into a table?

How to Create Surrogate Key using Ab Initio?

Can anyone give me an example of a real-time start script in the graph?

What are differences between different GDE versions(1.10,1.11,1.12,1.13and 1.15)? What are differences between different
versions of Co-op?

Do you know what a local lookup is?

How many components in your most complicated graph?

How to handle if DML changes dynamically in abinitio

Explain what is lookup?

Have you worked with packages?

How to run the graph without GDE?

What are the different versions and releases of ABinitio (GDE and Co-op version)

What is the Difference between DML Expression and XFR Expression ?

How Does MAXCORE works?

What is $mpjret? Where it is used in ab-initio?

How do you convert 4-way MFS to 8-way mfs?


What is skew and skew measurement?
What is the importance of EME in abinitio?

How do you add default rules in transformer?

What is difference between file and table in abinitio

How to create a computer program that computes the monthly interest charge on a credit card account?

What is .abinitiorc and What it contain?

What do you mean by .profile in Abinitio and what does it contains?

What is data mapping and data modelling?

What is the difference between partitioning with key and round robin?

Can anyone tell me what happens when the graph runs? i.e. the Co>Operating System will be at the host and we are running the graph at some other place.

How would you do performance tuning for an already built graph? Can you let me know some examples?

How to execute the graph from start to end stages? Tell me and how to run graph in non-Abinitio system?

What are the most commonly used components in an Ab Initio graph? Can anybody give me a practical example of a transformation of data, say customer data in a credit card company, into meaningful output based on business rules?

Can we load multiple files?

Can anyone please explain the environment variables with an example?

Explain the differences between api and utility mode?

Please let me know whether we have ab initio GDE version 1.14 and what is the latest GDE version and Co-op version?

What are the Graph parameter?

How to find the number of arguments defined in graph..

What is the difference between rollup and scan?

How to work with parameterized graphs?

Please give us insight on Enterprise Meta Environment, and some possible questions on that.

What are delta table and master table?

What error would you get when you use Partition by Round Robin and Join?

Do you know what a local lookup is?

How many components in your most complicated graph?

How to handle if DML changes dynamically in abinitio


How do you count the number of records in a flat file?
How do you connect EME to Abinitio Server?

Have you ever encountered an error called depth not equal? (This occurs when you extensively create graphs; it is a trick question.)

What is the difference between a DB config and a CFG file?

Do you know what a local lookup is?

What is the difference between look-up file and look-up, with a relevant example?

Have you worked with packages?

In which scenarios would you use Partition by Key and also, Partition by Round Robin and differences between the both?

What are the different dimension tables that you used and some columns in the fact table?

What is the difference between a Scan component and a RollUp component?

How do we handle it if the DML changes dynamically?

What is m_dump

What is the syntax of m_dump command?

Have you used rollup component? Describe how.

How do you improve the performance of a graph?

How many components are there in your most complicated graph?

What is the function you would use to transfer a string into a decimal?

For data parallelism, we can use partition components. For component parallelism, we can use replicate component. Like
this which component(s) can we use for pipeline parallelism?

What is AB_LOCAL expression where do you use it in ab-initio?

What is meant by Co>Operating System and why is it special for Ab Initio?

How do you retrieve data from a database as a source, and which component is used for this?

How can you run a graph infinitely?

What is the syntax of m_dump command?

How to do we run sequences of jobs ,, like output of A JOB is Input to B How do we co-ordinate the jobs

How do you truncate a table?


What is a ramp limit?
What is the difference between dbc and cfg? When do you use these two?

What are the compilation errors you came across while executing your graphs?

What is depth_error?

Difference between conventional loading and direct loading ? When it is used in real time .

During the execution of graph, let us say you lost the network connection, would you have to start the process all over
again or does it start from where it stopped?

What are the different types of partitions and scenarios.

What does dependency analysis mean in Ab Initio?

What does unused port in join component do?

Define Multi file system. Can you create multifile system on the same server? Also, if you have a table that has Name,
Address, Status, Position attributes, can Name and Address be on one partition and Status and Position in the other
partition?

What is a sandbox? Did the co-operating system version 2.8 have sandbox, if not how would you store the respective files?

How did you do version control? Which tool did you use?

How do you troubleshoot performance issues in graph?

What are the usual errors that you encounter during ETL process apart from compilation process?

Were you involved in production support? What were the different kinds of problems that you encountered?

How do you count the number of records in a multifile system without using GDE?

What does Scan and Rollup component do and give a scenario where you used them?

Did you ever use user-defined functions or packages? If yes, give a scenario.

What is difference between Redefine Format and Reformat components?

Sometimes you have to use dynamic length strings. Can you give me one circumstance where you need it?

Why might you create a stored procedure with the with recompile option?

How many parallelisms are in Abinitio? Please give a definition of each.

How do you schedule graphs in Ab Initio, like a workflow schedule in Informatica? And where must we use Unix shell scripting in
Ab Initio?

How to Improve Performance of graphs in Ab initio? Give some examples or tips.

Ab Initio Questions and Answers:

1 :: What does dependency analysis mean in Ab Initio?


Dependency analysis answers questions about data lineage: where the data comes from, which applications produce and
depend on this data, and so on.
We can retrieve the maximum (surrogate key) from the existing data,the by using scan or next_in_sequence/reformat we can
generate further sequence for new records.


2 :: When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or
explicit transactions, and why?
Implicit transactions are used for internal processing, while explicit transactions are opened by the user for the data required.


3 :: Describe the Grant/Revoke DDL facility and how it is implemented?


Basically, this is part of the DBA's responsibilities. GRANT gives permissions, for example GRANT CREATE TABLE, CREATE VIEW
and many more. REVOKE cancels the granted permissions. So both the GRANT and REVOKE commands depend on the DBA.


4 :: What is the difference between rollup and scan?


Rollup cannot generate cumulative summary records; for that we use Scan.


5 :: Describe the elements you would review to ensure multiple scheduled batch jobs do not collide with
each other?
Because every job depends on another job: for example, if the first job finishes successfully then the next job will execute;
otherwise it will not run.


6 :: How can i run the 2 GUI merge files?


Do you mean merging GUI map files in WR? If so, merging GUI map files in the GUI map editor won't create a corresponding test
script, and without a test script you can't run a file. So it is impossible to run a file by merging 2 GUI map files.


7 :: Describe how you would ensure that database object definitions (Tables, Indices, Constraints, Triggers,
Users, Logins, Connection Options, and Server Options etc) are consistent and repeatable between multiple
database instances (i.e.: a test and production copy of a database)?
Take an entire database backup and restore it in a different instance. Take statistics of all valid and invalid objects and match.
Periodically refresh


8 :: How would you find out whether a SQL query is using the indices you expect?
Explain plan can be reviewed to check the execution plan of the query. This would guide if the expected indexes are used or not.


9 :: How to create repository in abinitio for stand alone system(LOCAL NT)?


If you are installing Ab Initio on a stand-alone machine, it is not necessary to create the repository. The installer
creates it automatically for you under the abinitio folder (where you are installing Ab Initio).


10 :: When running a stored procedure definition script how would you guarantee the definition could be
rolled back in the event of problems?
There are quite a few factors that determine the approach, such as what type of version control is used, what the size of the
change is, what the impact of the change is, whether it is a new procedure or a replacement for an existing one, and so on.
If it is new, then just drop the wrong one. If it is a replacement, then how big is the change and what will be the possible
impact? Depending on that, you can have the entire database backed up, or just create a script for your original procedure before
changing it, or you can just do an ed, change the file back to the original and reapply. You may also rename the old procedure
as old and then work on the new one, and so on.
A few issues to keep in mind are synonyms, dependencies, grants, any job calling the procedure at the time of change, and so on.
In a nutshell, the scenario can vary and the solution can vary too.

11 :: Explain the difference between the truncate and delete commands?


Truncate:
It is a DDL command, used to remove all rows from a table or cluster. Since it is a DDL command, it auto-commits and a rollback
can't be performed. It is faster than delete.
Delete:
It is a DML command, generally used to delete one or more records from a table or cluster. A rollback can be performed to
retrieve the deleted records. To make the deletion permanent, the "commit" command should be used.


12 :: Describe the process steps you would perform when defragmenting a data table. This table contains
mission critical data?
There are several ways to do this:
1) We can move the table in the same or another tablespace and rebuild all the indexes on the table.
alter table <table_name> move tablespace <tablespace_name>;
This activity reclaims the fragmented space in the table.
analyze table <table_name> compute statistics;
This captures the updated statistics.
2) Reorg could be done by taking a dump of the table, truncating the table and importing the dump back into the table.


13 :: How can you force the optimizer to use a particular index?


Use hints, /*+ <hint> */; these act as directives to the optimizer.


14 :: What is a cursor? Within a cursor, how would you update fields on the row just fetched?
The Oracle engine uses work areas for internal processing in order to execute a SQL statement; such a work area is called a cursor.
There are two types of cursors: implicit and explicit. An implicit cursor is used for internal processing, and an explicit cursor
is opened by the user for the data required. To update the row just fetched, declare the cursor with FOR UPDATE and use
UPDATE ... WHERE CURRENT OF <cursor_name>.


15 :: Why might you create a stored procedure with the with recompile option?
Recompile is useful when the tables referenced by the stored proc undergo a lot of modification/deletion/addition of data. Due to
the heavy modification activity the execution plan becomes outdated and hence the stored proc performance goes down. If we create
the stored proc with the recompile option, SQL Server won't cache a plan for this stored proc and it will be recompiled every time
it is run.


16 :: What is the purpose of having stored procedures in a database?


The main purpose of a stored procedure is to reduce network traffic; all the SQL statements execute within a cursor, so execution is fast.


17 :: What are Cartesian joins?


A Cartesian join will get you a Cartesian product. A Cartesian join is when you join every row of one table to every row of another
table. You can also get one by joining every row of a table to every row of itself.
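
The definition above can be illustrated with a small sketch in Python. The two sample tables and the helper name `cartesian_join` are invented for illustration; the point is only that every row on one side is paired with every row on the other, with no join condition.

```python
from itertools import product

# Hypothetical sample tables, invented for illustration only.
employees = [("E1", "Asha"), ("E2", "Ravi"), ("E3", "Mina")]
departments = [("D1", "Sales"), ("D2", "Ops")]


def cartesian_join(left, right):
    """Pair every row of `left` with every row of `right` (no join condition)."""
    return [l + r for l, r in product(left, right)]


rows = cartesian_join(employees, departments)      # 3 x 2 rows
self_join = cartesian_join(departments, departments)  # a table joined to itself

print(len(rows), len(self_join))
```

The row count of the result is always the product of the two input row counts, which is why an accidental Cartesian join on large tables is expensive.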


18 :: What is an outer join?


An outer join is used when one wants to select all the records from a port - whether it has satisfied the join criteria or not.


19 :: What are primary keys and foreign keys?


In an RDBMS the relationship between two tables is represented as a primary key and foreign key relationship. The primary
key table is the parent table and the foreign key table is the child table. The criterion for both tables is that there should be a
matching column.


20 :: Have you used rollup component? Describe how?


If the user wants to group the records on particular field values, then rollup is the best way to do that. Rollup is a multi-stage
transform function and it contains the following mandatory functions:
1. initialise
2. rollup
3. finalise
You also need to declare one temporary variable if you want to get counts of a particular group.
For each group, it first calls the initialise function once, followed by a rollup function call for each of the records in the
group, and finally calls the finalise function once after the last rollup call.
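
The call order described above (initialise once per group, rollup once per record, finalise once at the end) can be sketched in Python. This is not Ab Initio DML: the record fields `cust` and `amount`, the driver `run_rollup`, and the simplified function signatures are all assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def initialise():
    """Create the temporary variable for a new group."""
    return {"count": 0, "total": 0}


def rollup(temp, record):
    """Fold one input record into the group's temporary variable."""
    temp["count"] += 1
    temp["total"] += record["amount"]
    return temp


def finalise(temp, key):
    """Build the single output record for the group."""
    return {"cust": key, "count": temp["count"], "total": temp["total"]}


def run_rollup(records, key_field):
    """Hypothetical driver: initialise once, rollup per record, finalise once."""
    out = []
    ordered = sorted(records, key=itemgetter(key_field))
    for key, group in groupby(ordered, key=itemgetter(key_field)):
        temp = initialise()
        for record in group:
            temp = rollup(temp, record)
        out.append(finalise(temp, key))
    return out


records = [
    {"cust": "A", "amount": 10},
    {"cust": "B", "amount": 5},
    {"cust": "A", "amount": 7},
]
result = run_rollup(records, "cust")
print(result)
```

The temporary variable is what carries the running count and total between rollup calls, which is why it has to be declared when per-group counts are needed.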


21 :: How do you convert 4-way MFS to 8-way mfs?


To convert a 4-way to an 8-way partition we need to change the layout in the partitioning component. There will be separate
parameters for each type of partitioning, e.g. AI_MFS_HOME, AI_MFS_MEDIUM_HOME, AI_MFS_WIDE_HOME etc.
The appropriate parameter needs to be selected in the component layout for the type of partitioning.


26 :: What is the Difference between DML Expression and XFR Expression?


The main difference between DML and XFR is that DML represents the format of the metadata, while XFR represents the
transform functions, which contain the business rules.

27 :: How Does MAXCORE works?


Maxcore is a value (it will be in KB). Whenever a component is executed, it will take up to the amount of memory we specified for execution.

28 :: What is the syntax of m_dump command?


The general syntax is "m_dump metadata data [action]".

29 :: Can anyone give me an exaple of realtime start script in the graph?


Here is a simple example of using a start script in a graph.
In the start script, give:
export DT=`date '+%m%d%y'`
Now the variable DT will have today's date before the graph is run.
Now, somewhere in a graph transform, we can use this variable as:
out.process_dt :: $DT;
which provides the value from the shell.

30 :: What are differences between different GDE versions (1.10, 1.11, 1.12, 1.13 and 1.15)? What are differences
between different versions of Co-op?

1.10 is a non-key version and the rest are key versions.
A lot of components were added and revised in the following versions.

31 :: How to run the graph without GDE?


In the GDE, choose Run > Deploy > As Script; it creates a .bat file in your host directory, which you can then run from the command prompt.


32 :: What is local and formal parameter?


Both are graph-level parameters, but for a local parameter you need to initialize the value at the time of declaration, whereas for a
formal parameter there is no need to initialize the value; the graph will prompt for that parameter at run time.


33 :: What are BROADCAST and REPLICATE?


Broadcast - Takes data from multiple inputs, combines it and sends it to all the output ports.
Eg - You have 2 incoming flows (this can be data parallelism or component parallelism) on the Broadcast component, one with 10
records and the other with 20 records. Then all the outgoing flows (there can be any number of flows) will have 10 + 20 = 30 records.
Replicate - It replicates the data for a particular partition and sends it out to multiple out ports of the component, but maintains
the partition integrity.
Eg - Your incoming flow to Replicate has a data parallelism level of 2, with one partition having 10 records and the other having 20
records. Now suppose you have 3 output flows from Replicate. Then each flow will have 2 data partitions with 10 and 20 records respectively.
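
The two examples above can be reproduced with a small Python simulation. This only models record counts, using plain lists as partitions; the function names `broadcast` and `replicate` are illustrative stand-ins, not the Ab Initio components themselves.

```python
def broadcast(input_partitions, out_partitions):
    """Combine all input partitions and send the full set to every output partition."""
    combined = [rec for part in input_partitions for rec in part]
    return [list(combined) for _ in range(out_partitions)]


def replicate(input_partitions, out_flows):
    """Copy the input to every output flow, keeping each partition separate."""
    return [[list(part) for part in input_partitions] for _ in range(out_flows)]


# Two incoming partitions with 10 and 20 records, as in the text.
parts = [list(range(10)), list(range(10, 30))]

b = broadcast(parts, 3)
r = replicate(parts, 3)

broadcast_counts = [len(p) for p in b]
replicate_counts = [[len(p) for p in flow] for flow in r]
print(broadcast_counts)
print(replicate_counts)
```

Broadcast ends with 30 records in every output partition, while Replicate ends with three flows that each still hold a 10-record and a 20-record partition.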


34 :: What is the importance of EME in abinitio?


EME is a repository in Ab Initio; it is used for check-in and check-out of graphs and also maintains graph versions.


35 :: What is m_dump?
The m_dump command prints the data in a formatted way.
m_dump <dml> <file.dat>

22 :: What is AB_LOCAL expression where do you use it in ab-initio?


ablocal_expr is a parameter of the itable component of Ab Initio. ABLOCAL() is replaced by the contents of ablocal_expr, which we
can make use of in parallel unloads. There are two forms of the AB_LOCAL() construct: one with no arguments and one with a single
argument, a table name (the driving table).
The AB_LOCAL() construct is used because some complex SQL statements contain grammar that is not recognized by the Ab Initio
parser when unloading in parallel. You can use the ABLOCAL() construct in this case to prevent the Input Table component from
parsing the SQL (it will get passed through to the database). It also specifies which table to use for the parallel clause.


23 :: What is the latest version that is available in Ab-initio?


The latest version of GDE is 1.15 and of the Co>Operating System is 2.14.


24 :: What is $mpjret? Where it is used in ab-initio?


You can use $mpjret in the end script, like:
if [ 0 -eq $mpjret ]; then
echo "success"
else
mailx -s "[graphname] failed" mailid
fi

25 :: I am unable to connect sever database(oracle) from GDE(db config file) local system.i set all these?
First check the properties in Internet Options, and then check from the command prompt with: telnet <abinitio ip_add>.


36 :: What is the difference between a Scan component and a RollUp component?


Rollup is for group-by and Scan is for successive (running) totals. Basically, when we need to produce a cumulative summary we use
Scan; Rollup is used to aggregate data.
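
A minimal Python sketch of the distinction, using an invented list of amounts for a single group: a rollup-style aggregation emits one summary value per group, while a scan-style running total emits one value per input record.

```python
from itertools import accumulate

# Hypothetical amounts for one group, invented for illustration.
amounts = [10, 20, 30]

# Rollup-style: one summary record for the whole group.
rollup_output = [sum(amounts)]

# Scan-style: one output per input record, carrying the running total.
scan_output = list(accumulate(amounts))

print(rollup_output)
print(scan_output)
```

The last value of the scan output equals the rollup output, which is why a scan followed by keeping the last record per group is sometimes used in place of a rollup.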


37 :: What is skew and skew measurement?


Skew is the measure of data flow to each partition.
Suppose the input comes from 4 files and the total size is about 1 GB (100 MB + 200 MB + 300 MB + 500 MB). Taking the average
partition size as 1000 MB / 4 = 250 MB, the skew of the first partition is (100 - 250) / 500 = -150 / 500, which is a negative value.
Calculate it yourself for 200, 500 and 300.
A positive value of skew is always desirable.
Skew is an indirect measure of the graph.

38 :: How to get DML using Utilities in UNIX?


If your source is a COBOL copybook, then there is a Unix command which generates the required DML in Ab Initio:
cobol-to-dml.


The Latin term ab initio means from the beginning .


"Ab Initio Software LLC" is a company which excels in solving extreme data processing problems.
Many IT people never heard of Ab Initio. Why? Well, first, Ab Initio never advertise themselves. They get lots of
business by referral - in fact so much that they don't need any advertising. Second, because Ab Initio only works
with few clients who have extreme data processing problems. Ab Initio is not common, and they don't sell
software. They sell solutions - and license the tools to provide those solutions. So it is more a solutions company,
not a software company.
Most of those people who have heard about Ab Initio think about it as an ETL provider. This is wrong. Yes, Ab Initio
has excellent tools for ETL (Extract, Transform, Load). But for some problems they provide solutions which have
nothing to do with databases. In fact, in many situations they recommend to STOP using database at all for
performance reasons.
If you are a small or medium client - Ab Initio is an overkill. But if you have thousands of transactions per second,
big databases, very active web site, or huge transactional or accounting system - Ab Initio is a savior. Their pricing
model is a bit unusual, but the long term costs are reasonable.
You can read a short description on wikipedia, but as of today (20098) this description doesn't give a good honest
representation of the company (in my opinion).

http://en.wikipedia.org/wiki/Ab_Initio

http://www.abinitio.com

http://www.patents.com/Ab-Initio-SoftwareCorporation/Lexington/MA/301339/company/

http://www.bi-nerd.com/ab-initio-the-dark-horse-of-etl/

Patents: US6654907.pdf, US7047232.pdf, US7164422.pdf, US7167850.pdf

http://www.linkedin.com/companies/ab-initio

Ab Initio is a private company, its main offices are in Lexington, Massachusetts (near Boston, USA - since 1994),
but they have offices all over the world (as you can see on their web site). They have very good talented devoted
people. I've heard that when you are calling their customer service - there is a 75% chance that you will speak
with a Ph.D.. It may very well be true. The company was formed by former employees of the Thinking Machines
Corporation. Some key people: Craig W. Stanfill, Richard A. Shapiro, Stephen A. Kukolich.
Ab Initio also uses its own people as well as independent consulting firms to build proof of concept for a client, and
then to guide clients in using their tools.
Unfortunately Ab Initio provides very little information about their solutions to general public. So not getting into
details, most of AI functionality can be scripted using several commands which you can give from prompt (with
many options):

m_* commands ( for example, m_shutdown, m_mkfs, m_cp, etc. ) are used for
administering

mp ... (some options) - to define, establish, and run jobs

air ... (some options) - to work with EME (basically a specialized version control
system)

The scripts can be easily integrated to work with external schedulers.


Around 1997 Ab Initio introduced the Graphical Development Environment - a very powerful desktop
software. You place components on the screen, connect them, define what they do and how. So your application is
a graph. You can create components which consist of other components which consist of other components, etc. so effectively you can drill deeply into the diagram. I've seen this tool generating powerful data processing
application in less than 10 minutes. You can run the application right from the IDE, or save it as a set of scripts
(ksh for unix). The scripts will call misc. component libraries. The libraries are written in C++.
Some of the key elements of the system:

"Co>Operating System"

"Component Library"

"Graphical Development Environment" (GDE)

"Enterprise Meta>Environment" (EME)

"Data Profiler"

"Conduct>It"

Main power of Ab Initio - parallelism - is achieved via its "Co>Operating System" which provides the facilities for
parallel execution (multiple CPUs and/or multiple boxes), platform independent data transport, check pointing, and
process monitoring. A lot of attention is devoted to monitoring resources (CPU, memory). multi-file, multidirectory.
Component Library - a set of software modules to perform sorting, data transforming, and high speed data loading
and unloading tasks.
Ab Initio tools incorporate best practices, such as check-pointing, rerunnability, tagging everything with unique Ids, etc.
Unfortunately Ab Initio doesn't advertise or publish any information. So there are just bits and pieces here and
there. Here is an interesting blog:

http://www.geekinterview.com/Interview-Questions/Data-Warehouse/Abinitio

Question

Answer
==============================================
============
Phases vs Checkpoints

Phases - are used to break the graph into pieces. Temporary files created
during a phase will be deleted after its completion. Phases are used to
effectively separately manage resource-consuming (memory, CPU, disk)
parts of the application.

Checkpoints - created for recovery purposes. These are points where
everything is written to disk. You can recover to the latest saved point - and
rerun from it.
You can have phase breaks with or without checkpoints.

xfr

A new sandbox will have many directories: mp, dml, xfr, db, ... . xfr is a
directory where you put files with extension .xfr containing your own
custom functions (and then use : include "somepath/xfr/yourfile.xfr").
Usually XFR stores mapping.

three types of parallelism

1) Data parallelism - partitioning of data into parallel streams for
parallel processing.
2) Component parallelism - different branches of the graph execute
simultaneously.
3) Pipeline parallelism - components connected in sequence run at the
same time on different records.

Multi-File System

MFS

m_mkfs - create a multifile (m_mkfs ctrlfile mpfile1 ... mpfileN)


m_ls - list all the multifiles
m_rm - remove the multifile
m_cp - copy a multifile
m_mkdir - to add more directories to existing directory structure

Memory requirements of a graph

Each partition of a component uses: ~ 8 MB + max-core (if any)
Add size of lookup files used in phase (if multiple components use
same lookup only count it once)
Multiply by degree of parallelism. Add up all components in a phase;
that is how much memory is used in that phase.
Select the largest-memory phase in the graph

How to calculate a SUM

SCAN
ROLLUP
SCANWITHROLLUP
Scan followed by Dedup sort and select the last
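
Returning to the memory estimate for a graph given above, the arithmetic can be sketched in Python. The component list, max-core values and lookup sizes below are invented. The source does not say whether lookup files are multiplied by the degree of parallelism; this sketch assumes they are added once per phase, which is only one possible reading.

```python
BASE_MB = 8  # approximate per-partition overhead quoted in the text


def component_memory_mb(max_core_mb, parallelism):
    """(~8 MB + max-core) per partition, times the degree of parallelism."""
    return (BASE_MB + max_core_mb) * parallelism


def phase_memory_mb(components, lookup_sizes_mb):
    """Sum all components in the phase; each distinct lookup is counted once.

    Assumption: lookup size is added once per phase, not per partition.
    """
    total = sum(component_memory_mb(mc, par) for mc, par in components)
    return total + sum(lookup_sizes_mb)


# Hypothetical graph: (max_core_mb, parallelism) per component.
phases = {
    "phase0": {"components": [(100, 4), (0, 4)], "lookups": [50]},
    "phase1": {"components": [(200, 2)], "lookups": []},
}

per_phase = {
    name: phase_memory_mb(p["components"], p["lookups"])
    for name, p in phases.items()
}
peak_phase = max(per_phase, key=per_phase.get)
print(per_phase, peak_phase)
```

The graph's memory requirement is taken as the largest phase, since only one phase runs at a time.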

dedup sort with null key

If we don't use any key in the sort component while using the dedup sort,
then the output depends on the keep parameter.
first - only the first record
last - only last record
unique_only - there will be no records in the output file.

join on partitioned flow

file1 (A,B,C) , file2 (A,B,D). We partition both files by "A", and then join by
"A,B". Is it OK? Or should we partition by "A,B"? Not clear.
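
On the question above of partitioning by "A" and then joining on "A,B": records that agree on (A, B) necessarily agree on A, so a key-based partition on A alone places every matching pair in the same partition. The Python sketch below checks this on invented data; the helper names and the hash-modulo partitioning are assumptions for illustration, not Ab Initio's actual partitioner.

```python
def partition_by_key(records, key_fields, n):
    """Send each record to a partition chosen from the hash of its key fields."""
    parts = [[] for _ in range(n)]
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        parts[hash(key) % n].append(rec)
    return parts


def inner_join(left, right, key_fields):
    """Simple hash join on the given key fields."""
    index = {}
    for r in right:
        index.setdefault(tuple(r[f] for f in key_fields), []).append(r)
    out = []
    for l in left:
        for r in index.get(tuple(l[f] for f in key_fields), []):
            out.append({**l, **r})
    return out


# Hypothetical sample data.
file1 = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "y", "C": 11},
    {"A": 2, "B": "x", "C": 12},
    {"A": 3, "B": "z", "C": 13},
]
file2 = [
    {"A": 1, "B": "x", "D": 100},
    {"A": 2, "B": "x", "D": 101},
    {"A": 2, "B": "q", "D": 102},
    {"A": 3, "B": "z", "D": 103},
]

serial_result = inner_join(file1, file2, ["A", "B"])

# Partition both inputs by A only, then join each partition pair on (A, B).
parts1 = partition_by_key(file1, ["A"], 2)
parts2 = partition_by_key(file2, ["A"], 2)
parallel_result = []
for p1, p2 in zip(parts1, parts2):
    parallel_result.extend(inner_join(p1, p2, ["A", "B"]))


def sort_key(rec):
    return (rec["A"], rec["B"])


print(sorted(parallel_result, key=sort_key) == sorted(serial_result, key=sort_key))
```

Partitioning by "A,B" would also be correct and may spread the data more evenly when a few values of A dominate; partitioning by B alone on one side and A on the other would not be safe.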
checkin, checkout

You can do checkin/checkout using the wizard right from the GDE using
versions and tags

how to have different passwords for QA and production

parameterize the .dbc file - or use an environment variable.

How to get records 50-75 out of 100

use scan and filter
m_dump <dml> <mfs file> -start 50 -end 75
use next_in_sequence() function and filter by expression component
(next_in_sequence() >50 && next_in_sequence() <75)
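
The sequence-number-and-filter approach above can be sketched in Python. The record values and the helper `select_range` are invented. Two details are deliberate assumptions: each record gets exactly one sequence number (rather than calling the sequence function twice per record), and the bounds are treated as inclusive so that records 50 and 75 are both kept.

```python
def select_range(records, start, end):
    """Keep records whose 1-based sequence number lies in [start, end]."""
    return [
        rec
        for seq, rec in enumerate(records, start=1)
        if start <= seq <= end
    ]


# Hypothetical input of 100 records.
records = [f"rec{n}" for n in range(1, 101)]
subset = select_range(records, 50, 75)
print(len(subset), subset[0], subset[-1])
```

With strict comparisons (> 50 and < 75) the result would instead cover records 51 to 74, so the choice of operator depends on whether the endpoints are wanted.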

How to convert a serial file into MFS

create MFS, then use partition component

project parameters vs. sandbox parameters

When you check out a project into your sandbox - you get project
parameters. Once in your sandbox - you can refer to them as sandbox
parameters.

BadStraightflow

error you get when connecting mismatching components (for example,
connecting a serial flow directly to an mfs flow without using a partition
component)

merging graphs

You cannot merge two Ab Initio graphs. You can use the output of one graph
as input for another. You can also copy/paste the contents between graphs.
See also about using .plan

partitioning, repartitioning, departitioning

partitioning - dividing a single flow of records (serial file, mfs) into
multiple flows.
departitioning - removing partitioning (gather and merge components)
re-partitioning - changing the number of partitions (eg, from 2 to 4 flows)

lookup file

for large amounts of data use MFS lookup file (instead of serial)

indexing

No indexes as such. But there is an "output indexing" using reformat and


doing necessary coding in transform part.

Environment project

Environment project - special public project that exists in every Ab Initio
environment. It contains all the environment parameters required by the
private or public projects which constitute AI Standard Environment.

Aggregate vs Rollup

Aggregate - old component
Rollup - newer, extended, recommended to use instead of Aggregate
(built-in functions like sum, count, avg, min, max, product, ...)

EME, GDE, Co-operating system

EME = Enterprise Meta>Environment. Functions (repository,
version control, statistical analysis, dependency analysis). It is on
the server side and holds all the projects (metadata of
transformations, config info, source and target info: graph dml xfr
ksh sql, etc.). This is where you checkin/checkout. The /Project dir of
EME contains common directories for all application sandboxes
connected to it. It also helps in dependency analysis of code. Ab
Initio has a series of air commands to manipulate repository objects.

GDE = Graphical Development Environment (on the client box)

Co-operating system = Ab Initio server installed on top of the native
(unix) OS on the server

fencing

Fencing means job controlling on a priority basis.
In AI it actually refers to customized phase breaking. A well fenced graph
means that no matter what the source data volume is, the process will not end
up in deadlocks. It actually limits the number of simultaneous processes.

Fencing - changing a priority of a job
Phasing - managing the resources to avoid deadlocks.
For example, limiting the number of simultaneous processes
(by breaking the graph into phases, only 1 of which can run at any given
time)

Continuous components

Continuous components - produce useful output file while running
continuously. For example, Continuous rollup, Continuous update, batch
subscribe


deadlock

Deadlock is when two or more processes are requesting the same resource.
To avoid use phasing and resource pooling.

environment

AB_HOME - where co>operating system is installed
AB_AIR_ROOT - default location for EME datastore
sandboxes standard environment: AI_SORT_MAX_CORE, AI_HOME, AI_SERIAL, AI_MFS, etc.
from unix prompt: env | grep AI

wrapper
script

unix script to run graphs

multistage component

A multistage component is a component which transforms input records in 5
stages (1. input select, 2. temporary initialization, 3. processing, 4. output
selection, 5. finalize). So it is a transform component which has packages.
Examples: scan, normalize, denormalize sorted, rollup.

Dynamic DML

Dynamic DML is used if the input metadata can change. Example: at
different times, different input files are received for processing which have
different DML. In that case we can use a flag in the DML; the flag is read
first in the input file received, and the corresponding DML is used according
to the flag.

fan in, fan out

fan out - partition component (increase parallelism)
fan in - departition component (decrease parallelism)

lock

a user can lock the graph for editing so that others will see the message and
can not edit the same graph.

join vs lookup

Lookup is good for speed for small files (it will load the whole file in memory). For
large files use join. You may need to increase the maxcore limit to handle
big joins.

multi
update

multi update executes SQL statements - it treats each input record as a


completely separate piece of work.

scheduler

We can use Autosys, Control-M, or any other external scheduler.
We can take care of dependencies in many ways. For example, if
scripts should run sequentially, we can arrange for this in Autosys, or
we can create a wrapper script and put several sequential
commands there (command1.ksh && command2.ksh, etc).
We can even create a special graph in Ab Initio to execute individual
scripts as needed.

Api and Utility modes in input table

These are database interfaces (api - uses SQL, utility - bulk loads, whatever
the vendor provides)

lookup file

lookup file component. Functions: lookup, lookup_count,
lookup_next, lookup_match, lookup_local.
Lookups are always used in combination with the reformat
components.

Calling stored proc in DB

You can call stored proc (for example, from input component). In fact, you
can even write SP in Ab Initio. Make it "with recompile" to assure good
performance.

Frequently used functions

string_ltrim, string_lrtrim, string_substring, reinterpret_as, today(), now()

data validation

is_valid, is_null, is_blank, is_defined

driving port

When joining inputs (in0, in1, ...) one of the ports is used as "driving" (by
default - in0). The driving input is usually the largest one, whereas the smallest
can have the "Sorted-Input" parameter set to "Input need not be sorted"
because it will be loaded completely in memory.

Ab Initio vs Informatica for ETL

Ab Initio benefits: parallelism built in, multifile system, handles huge
amounts of data, easy to build and run. Generates scripts which can be
easily modified as needed (if something couldn't be done in the ETL tool itself).
The scripts can be easily scheduled using any external scheduler - and easily
integrated with other systems.
Ab Initio doesn't require a dedicated administrator.
Ab Initio doesn't have built-in CDC capabilities (CDC = Change Data
Capture).
Ab Initio allows you to attach error / reject files to each transformation and
capture and analyze the message and data separately (as opposed to
Informatica which has just one huge log). Ab Initio provides immediate
metrics for each component.

override
key

override key option is used when we need to join 2 fields which have
different field names.

control
file

control file should be in the multifile directory (contains the addresses of the
serial files)

max-core

max-core parameter (for example, sort 100 MBytes) specifies the amount of
memory used by a component (like Sort or Rollup) - per partition - before
spilling to disk. Usually you don't need to change it - just use default value.
Setting it too high may degrade the performance because of OS swapping
and degrading of the performance of other components.

Input Parameters

graph > select parameters tab > click "create" - and create a parameter.
Usage: $paramname. Edit > parameters. These parameters will be
substituted during run time. You may need to declare your parameter scope
as formal.

Error
Trapping

Each component has reject, error, and log ports. Reject captures rejected
records, Error captures corresponding error, and log captures the execution
statistics of the component. You can control reject status of each component
by setting reject threshold to either Never Abort, Abort on first reject, or
setting ramp/limit. You can also use force_error() function in transform
function.


How to see
resource usage

In GDE goto options View > Tracking Details - will see each
component's CPU and memory usage, etc.

assign keys component

Easy and saves development time. Need to understand how to feed
parameters, and you can't control it easily.

Join in DB vs join in Ab Initio

Scenario 1 (preferred): we run a query which joins 2 tables in the DB
and gives us the result in just 1 DB component.
Scenario 2 (much slower): we use 2 database components,
extract all data - and join them in Ab Initio.

Join with DB

not recommended if number of records is big. It is better to retrieve
the data out - and then join in Ab Initio.
Data Skew

Parameter showing how unevenly data is distributed between
partitions.
skew = (partition size - avg. partition size) * 100 / (size of the largest
partition)
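
The skew formula above can be worked through in Python. The partition sizes (in MB) are invented sample values, and `skew_percent` is a hypothetical helper that simply applies the formula to each partition.

```python
def skew_percent(partition_sizes):
    """Apply skew = (size - average size) * 100 / largest size to each partition."""
    avg = sum(partition_sizes) / len(partition_sizes)
    largest = max(partition_sizes)
    return [(size - avg) * 100 / largest for size in partition_sizes]


# Hypothetical partition sizes in MB.
sizes = [100, 200, 300, 500]
skews = skew_percent(sizes)
print(skews)
```

Here the average is 275 MB and the largest partition is 500 MB, so the smallest partition comes out at -35% and the largest at +45%. A perfectly balanced layout would give 0 for every partition.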

dbc vs cfg

.dbc - database configuration file (dbname, nodes, version, user/pwd) - resides in the db directory
.cfg - any type of config file, for example remote connection config
(name of remote server, user/pwd to connect to db, location of OS on
remote machine, connection method). The .cfg file resides in the config dir.

compilation errors

depth not equal, data format error, etc.
depth error: we get this error when two components are connected
together but their layouts don't match

types of partitions

broadcast, pbyexpression, pbyroundrobin, pbykey, pwithloadbalance
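
Three of the partitioning styles listed above can be sketched in Python to show how they distribute records. These functions are illustrative stand-ins, not the Ab Initio components; the record layout (`id`, `cust`) and the hash-modulo rule for key partitioning are assumptions.

```python
def partition_round_robin(records, n):
    """Deal records out one at a time, giving near-equal partition sizes."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def partition_by_key(records, key, n):
    """Send records with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[hash(rec[key]) % n].append(rec)
    return parts


def broadcast(records, n):
    """Give every partition a full copy of the input."""
    return [list(records) for _ in range(n)]


# Hypothetical input: 10 records spread over 3 customer ids.
records = [{"id": i, "cust": i % 3} for i in range(10)]

rr_sizes = [len(p) for p in partition_round_robin(records, 3)]
by_key = partition_by_key(records, "cust", 3)
bc_sizes = [len(p) for p in broadcast(records, 3)]
print(rr_sizes, [len(p) for p in by_key], bc_sizes)
```

Round robin balances sizes but ignores content, key partitioning keeps related records together (which joins and rollups on that key need), and broadcast multiplies the data volume by the number of partitions.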

unused port

when joining, used records go to the output port, unused records - to


the unused port

tuning
performance

Go parallel using partitioning. Round-robin partitioning gives
good balance.

Use Multi-file system (MFS).

Use Ad Hoc MFS to read many serial files in parallel, and use
concat component.

Once data is partitioned - do not switch it to serial and back.
Repartition instead.

Do not access large files via NFS - use FTP instead

use lookup local rather than lookup (especially for big lookups).

Use rollup and Filter as soon as possible to reduce number of


records. Ideally do it in the source (database ?) before you get
the data.

Remove unnecessary components. For example, instead of using Filter by Expression, you can implement the same function in Reformat/Join/Rollup. Another example: when joining data from 2 files, use the union function instead of adding an additional component for removing duplicates.

Use Gather instead of Concatenate.

It is faster to do a sort after a partition than to do a sort before a partition.

Try to avoid using a join with the "db" component.

When getting data from a database, make sure your queries are fast (use indexes, etc.). If possible, do the necessary selection / aggregation / sorting in the database before getting data into Ab Initio.

Tune max-core for optimal performance (for Sort it depends on the size of the input file).

Note: if an in-memory join cannot fit its non-driving inputs in the provided max-core, it will drop all the inputs to disk, and in-memory does not make sense.

Using phase breaks lets you allocate more memory to individual components, thus improving performance.

Use a checkpoint after Sort to land data on disk.

Use the in-memory feature of Join and Rollup.

When joining a very small dataset to a very large dataset, it is more efficient to broadcast the small dataset to the MFS using the Broadcast component, or to use the small file as a lookup. But for a large dataset, don't use Broadcast as a partitioner.

Use Ab Initio layout instead of database default to achieve parallel loads.

Change the AB_REPORT parameter to increase the monitoring duration.

Use catalogs for reusability.

Components like Join/Rollup should have the option "Input must be sorted" set if they are placed after a Sort component.

Minimize the number of Sort components. Minimize usage of sorted Join components and, if possible, replace them with in-memory Join/hash join. Use only the required fields in the Sort, Reformat and Join components. Use "Sort within Groups" instead of just Sort when the data was already presorted.

Use phasing/flow buffers in case of merge sorted joins.

Minimize the use of regular expression functions like re_index in the transform functions.

Avoid repartitioning data unnecessarily. When splitting records into more than two flows, use Reformat rather than the Broadcast component.

For combining records from 2 flows, use the Concatenate component ONLY when the records have to follow some specific order. If no order is required, it is preferable to use the Gather component.

Instead of putting many Reformat components consecutively, use the output indexes parameter in the first Reformat component and mention the condition there.

delta table

A delta table maintains the sequencer of each data table.

Master (or base) table - a table on top of which we create a view.

scan vs rollup

rollup - performs aggregate calculations on groups; scan - calculates cumulative totals.

packages

Used in multistage components or transform components.

Reformat vs "Redefine Format"

Reformat - derives new data by adding/dropping fields.

Redefine Format - renames fields.

Conditional DML

DML which is separated based on a condition.

SORTWITHINGROUP

The prerequisite for using sortwithingroup is that the data is already sorted by the major key. sortwithingroup outputs the data once it has finished reading the major key group. It is like an implicit phase.

passing a condition as a parameter

Define a Formal Keyword Parameter of type string. For example, you call it FilterCondition, and you want it to do filtering on COUNT > 0.
In your graph, in your "Filter by expression" component, enter the following condition: $FilterCondition
Now on your command line or in a wrapper script give the following command (the condition is quoted so that the shell does not treat > as a redirection):
YourGraphname.ksh -FilterCondition 'COUNT > 0'

Passing file name as a parameter

#!/bin/ksh
# Run the project setup script for the environment
typeset PROJ_DIR=$(cd $(dirname $0)/..; pwd)
. $PROJ_DIR/ab_project_setup.ksh $PROJ_DIR
# Expect exactly two file names on the command line
if [ $# -eq 2 ];
then
    INPUT_FILE_PARAMETER_1=$1
    INPUT_FILE_PARAMETER_2=$2
    cd $AI_RUN
    # This graph uses the first input file
    ./my_graph1.ksh "$INPUT_FILE_PARAMETER_1"
    # This graph uses the second input file
    ./my_graph2.ksh "$INPUT_FILE_PARAMETER_2"
    exit 0
else
    echo "Insufficient parameters"
    exit 1
fi
-------------------------------------
#!/bin/ksh
# Run the project setup script for the environment
typeset PROJ_DIR=$(cd $(dirname $0)/..; pwd)
. $PROJ_DIR/ab_project_setup.ksh $PROJ_DIR
# Export the first script argument as INPUT_FILE_NAME
export INPUT_FILE_NAME=$1
cd $AI_RUN
# This graph uses the input file
./my_graph1.ksh
# This graph also uses the input file
./my_graph2.ksh
exit 0
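
Both parameter entries above pass values on the command line of a graph's .ksh script. A value that contains spaces or shell metacharacters, such as the condition COUNT > 0, needs quoting to arrive as one argument. The sketch below shows the effect with count_args, a hypothetical stand-in for the graph script that only reports what it received.

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for a deployed graph script:
# it prints the number of arguments and then each argument in brackets.
count_args() {
    echo "$# arguments"
    for arg in "$@"; do
        echo "[$arg]"
    done
}

# Quoted: the whole condition arrives as a single value after the name.
count_args -FilterCondition 'COUNT > 0'

# Unquoted, the shell would treat "> 0" as a redirection into a file
# named 0 and pass only COUNT as the value, so it stays commented out:
# count_args -FilterCondition COUNT > 0
```

The same quoting applies to file names that contain spaces.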
How to remove header and trailer lines?

Use conditional DML, where you can separate detail from header and trailer. For validations use Reformat with count: 3 (out0: header, out1: detail, out2: trailer).

How to create a multi file system on Windows

First method: in GDE go to Run > Execute Command and run
m_mkfs c:control c:dp1 c:dp2 c:dp3 c:dp4

Second method: double-click on the file component, and in the Ports tab double-click on partitions - there you can enter the number of partitions.

Vector

A vector is simply an array. It is an ordered set of elements of the same type (the type can be any type, including a vector or a record).

Dependency Analysis

Dependency analysis answers questions about data lineage, that is, where the data comes from, which applications produce and depend on this data, etc.

Surrogate key

There are many ways to create a surrogate key. For example, you can use the next_in_sequence() function in your transform. Or you can use the "Assign key values" component. Or you can write a stored procedure and call it.
Note: if you use partitions, then do something like this:

(next_in_sequence()-1)*no_of_partition()+this_partition()
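
To see why the partitioned formula yields keys that do not collide, here is a small shell sketch that only simulates the arithmetic. It assumes the per-partition sequence starts at 1 and that partitions are numbered from 0. surrogate_key and its arguments are hypothetical names, not Ab Initio functions.

```shell
#!/bin/sh
# surrogate_key SEQ NPARTS PART
#   SEQ    - sequence number within one partition (assumed to start at 1)
#   NPARTS - number of partitions
#   PART   - number of this partition (assumed to start at 0)
surrogate_key() {
    echo $(( ($1 - 1) * $2 + $3 ))
}

# Simulate 3 records in each of 4 partitions and list the keys in order.
for part in 0 1 2 3; do
    for seq in 1 2 3; do
        surrogate_key "$seq" 4 "$part"
    done
done | sort -n | tr '\n' ' '
echo
```

Each partition takes every fourth value, so the 12 simulated records get the keys 0 through 11 with no duplicates.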

.abinitiorc

This is a config file for Ab Initio, found in the user's home directory and in $AB_HOME/Config. It sets the Ab Initio home path, configuration variables (AB_WORK_DIR, AB_DATA_DIR, etc.), login info (id, encrypted password), login methods for hosts for execution (like the EME host, etc.), etc.

.profile

Your ksh init file (environment, aliases, path variables, history file settings, command prompt settings, etc.)

data mapping, data modelling

How to execute the graph

From GDE - the whole graph or by phases. From a checkpoint. Also using ksh scripts.

Write Multiple Files

A component that writes into multiple local files simultaneously.

Testing

Run the graph - see the results. Use components from Validate category.

Sandbox vs EME

Sandbox is your private area where you develop and test. Only one project and one version can be in the sandbox at any time. The EME Datastore contains all versions of the code that have been checked into it (source control).

Layout

Where the data-files are and where the components are running. For
example, for data - serial or partitioned (multi-file). The layout is defined by
the location of the file (or a control file for the multifile). In the graph the
layout can propagate automatically (for multifile you have to provide
details).

Latest versions

April 2009: GDE ver. 1.15.6, Co>Operating System ver. 2.14.

Graph parameters

Menu Edit > Parameters - allows you to specify private parameters for the graph. They can be of 2 types - local and formal.

Plan>It

You can define pre- and post-processes, triggers. Also you can specify
methods to run on success or on failure of the graphs.

Frequently used components

input file / output file
input table / output table
lookup / lookup_local
reformat
gather / concatenate
join
runsql
join with db
compression components
filter by expression
sort (single or multiple keys)
rollup
trash
partition by expression / partition by key

running on hosts

The Co>Operating System is layered on top of the native OS (Unix). When running from GDE, GDE generates a script (according to the "run" settings). The Co>Operating System will execute the scripts on different machines (using the specified host settings and connection methods, like rexec, telnet, rsh, rlogin) and then return error or success codes back.

conventional loading vs direct loading

This is basically an Oracle question regarding the SQLLDR (SQL*Loader) utility.
Conventional load - uses insert statements. All triggers will fire, all constraints will be checked, all indexes will be updated.
Direct load - data is written directly block by block. Can load into a specific partition. Some constraints are checked, indexes may be disabled - you need to specify native options to skip index maintenance.

semi-join

Ab Initio online help gives 3 examples of joins: inner join, outer join, and semi join.

For inner join, the 'record_requiredN' parameter is true for all "in" ports.

For outer join, it is false for all the "in" ports.

For semi join, it is true for both ports (like inner join), but the dedup option is set only on one side.
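
As a rough analogy outside Ab Initio, the effect of a semi join (keep each record of one input that has a match in the other, without multiplying it when the other side has duplicate keys) can be sketched with awk. The function name semi_join and the sample files are made up for illustration. This is not how the Join component is configured.

```shell
#!/bin/sh
# semi_join LEFT RIGHT
# Prints each LEFT record whose key (first field) occurs at least once in
# RIGHT. Duplicate keys in RIGHT do not multiply the output, which mirrors
# deduplicating that side. Assumes RIGHT is not empty.
semi_join() {
    awk 'NR == FNR { seen[$1] = 1; next } $1 in seen' "$2" "$1"
}

tmp=$(mktemp -d)
printf '%s\n' '1 alice' '2 bob' '3 carol' > "$tmp/customers"
printf '%s\n' '1 book' '1 pen' '3 lamp' > "$tmp/orders"

# Customers that have at least one order.
semi_join "$tmp/customers" "$tmp/orders"
rm -r "$tmp"
```

Customer 1 has two orders but is printed once, and customer 2, who has no orders, is dropped.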
