Ab Initio Software: Part 1
24 August 2007
Course Structure
Day 1
Day 2
Finger Exercises
Intermediate Exercises
Ab Initio's focus
Moving Data
Move small and large volumes of data efficiently
Deal with the complexity of business data
High Performance
Scalable solutions
Better productivity
Ab Initio Software
Ab Initio software is a general-purpose data processing platform
for mission-critical applications such as:
Data warehousing
Batch processing
Click-stream analysis
Real-time applications
Data movement
Data transformation
A Network of Networks
(Figure: the Ab Initio architecture, in layers from top to bottom. User-defined Components, 3rd Party Components, and the Ab Initio EME; The Ab Initio Co>Operating System; the Native Operating System (Unix, Windows, OS/390).)
(Figure: the parts of a graph: components, datasets, flows, ports, record format metadata, and expression metadata.)
Components
Components may run on any computer running the Co>Operating
System.
Datasets
A dataset is a source or destination of data. It can be a simple file, a
database table, a SAS dataset, ...
Datasets may reside on any machine running the Co>Operating
System.
Datasets may reside on other machines if connected by FTP or
database middleware.
Data is always described by record format metadata, written in DML (Data Manipulation Language).
A dataset is made up of records; a record consists of fields. The analogous database terms are rows and columns.
(Figure: a dataset of records (rows), each made up of fields (columns).)
id    first_name  last_name
0345  John        Smith
0212  Sam         Spade
0322  Elvis       Jones
0492  Sue         West
0121  Mary        Forth
0221  Bill        Black
A Sandbox Environment
Setting up a standard working environment helps a development
team work together.
Sandbox Parameters
Environment Overview
We will make use of environment variables (shortcuts and parameters) during class.
Double-click a component to bring up its Properties page.
(Figure: the sample records, annotated to show the field type, field length, and field name in each line of the record format.)
record
decimal(4) id;
string(6) first_name;
string(6) last_name;
string(5) newfield;
end
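As a rough illustration of how a fixed-width record format maps characters to fields, here is a small Python model (not Ab Initio code) of the first three fields above: each field's type and length decide where it starts and ends. LAYOUT and parse_record are hypothetical names for this sketch, and newfield is left out.

```python
# Hypothetical Python model of a fixed-width record format (not DML).
# Each entry is (field name, field type, field length), as in
# "decimal(4) id;" -> ("id", "decimal", 4).
LAYOUT = [
    ("id", "decimal", 4),
    ("first_name", "string", 6),
    ("last_name", "string", 6),
]


def parse_record(line: str) -> dict:
    """Slice one fixed-width line into named fields, left to right."""
    record = {}
    pos = 0
    for name, ftype, length in LAYOUT:
        raw = line[pos:pos + length]
        pos += length
        if ftype == "decimal":
            record[name] = int(raw)  # digits stored as text, e.g. "0345" -> 345
        else:
            record[name] = raw       # fixed-width strings keep their space padding
    return record


if __name__ == "__main__":
    print(parse_record("0345John  Smith "))
```

The string fields keep their trailing spaces, which is why trimming comes up later when names are concatenated.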
Field Names
Names consist of letters, digits, and underscores:
a-z, A-Z, 0-9, _
Note: no spaces, hyphens, $, #, or % characters.
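The naming rule can be checked mechanically. This Python sketch tests only the character rule stated above; any further DML restrictions (reserved words, for example) are not modelled, and is_valid_field_name is a hypothetical helper.

```python
import re

# Letters, digits and underscores only, as the rule above states.
# Assumption: only this character rule is checked; other DML naming
# restrictions (reserved words, for example) are not modelled.
FIELD_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_field_name(name: str) -> bool:
    """True when every character is a letter, digit or underscore."""
    return FIELD_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["first_name", "pur_date", "last-name", "amt$", "last name"]:
        print(candidate, is_valid_field_name(candidate))
```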
View Attributes.
record
decimal(4) id;
string(6) first_name;
string(6) last_name;
date("YYYY-DD-MM") newfield;
end;
Expressions in DML
Computations are expressed in the algebraic syntax of C, Pascal, etc.
Field names act as variables.
Arithmetic operators: +, -, *, ...
Comparison operators: >, <, ==, !=, ...
Many built-in functions: string_concat, string_trim, today, date_day_of_week, ...
(See the Data Manipulation Language Reference for more information on
expressions and built-in functions.)
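To make "field names act as variables" concrete, the sketch below evaluates a few expressions against one record held in a Python dict. These are Python stand-ins for DML-style expressions, not DML syntax; the function names are hypothetical.

```python
# Python stand-ins for DML-style expressions (not DML syntax): the record's
# field names play the role of variables.
record = {"id": 345, "first_name": "John  ", "last_name": "Smith "}


def id_plus_million(r: dict) -> int:
    return r["id"] + 1000000                 # arithmetic, like id + 1000000


def id_above(r: dict, limit: int) -> bool:
    return r["id"] > limit                   # comparison, like id > 300


def concat_names(r: dict) -> str:
    return r["first_name"] + r["last_name"]  # concatenation keeps the padding


if __name__ == "__main__":
    print(id_plus_million(record), id_above(record, 300), repr(concat_names(record)))
```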
Type in an expression, or use the Expression Editor.
(Figure: the Expression Editor, with panes for Fields, Functions, Operators, and the Expression text.)
Use the Record Format Editor (New) to create a description of this data:
lastname, firstname, pur_date, and amt. Then use View Data to verify
the description is correct.
Hint: a newline delimiter is written as \n
Simple Components
(figure-02)
Expression Parameter
Keys
A key identifies a single field or a set of fields (a composite key) used to organize a dataset, for example to sort, group, or join its records.
Single field:
{id}
Multiple field:
{last_name; first_name}
Modifiers:
{id descending}
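A key can be modelled as an ordered list of (field, descending) pairs. This Python sketch, using hypothetical names, sorts the sample people on such a key; passing [("first_name", False)] corresponds to the key change asked for in Exercise 3. It relies on Python's stable sort rather than on anything Ab Initio-specific.

```python
from operator import itemgetter
from typing import List, Tuple

# Hypothetical model of a DML key: a list of (field, descending) pairs, so
# {last_name; first_name} -> [("last_name", False), ("first_name", False)]
# and {id descending}     -> [("id", True)].
Key = List[Tuple[str, bool]]

PEOPLE = [
    {"id": 345, "first_name": "John", "last_name": "Smith"},
    {"id": 212, "first_name": "Sam", "last_name": "Spade"},
    {"id": 322, "first_name": "Elvis", "last_name": "Jones"},
    {"id": 492, "first_name": "Sue", "last_name": "West"},
    {"id": 121, "first_name": "Mary", "last_name": "Forth"},
    {"id": 221, "first_name": "Bill", "last_name": "Black"},
]


def sort_by_key(records: list, key: Key) -> list:
    """Sort on a (possibly composite) key.

    Python's sort is stable, so sorting on the last key field first and the
    first key field last gives the same order as comparing fields left to right.
    """
    result = list(records)
    for field, descending in reversed(key):
        result.sort(key=itemgetter(field), reverse=descending)
    return result


if __name__ == "__main__":
    for r in sort_by_key(PEOPLE, [("first_name", False)]):
        print(r["first_name"], r["last_name"])
```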
Sorting (mp/figure-03.mp)
Exercise 3: Sorting
Using example graph figure-03.mp, change the key parameter of
the Sort component to sort the data by first_name.
Data Transformation
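(Figure: a Reformat component turns the input record 0345,090263John,Smith; into a new layout: first_name is dropped, the birth date is reformatted, the remaining fields are reordered, and id is computed as id+1000000.)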
Output record format:
record
decimal(7) id;
string(8) last_name;
date("YYYY.MM.DD") bday;
end
Output record: 1000345Smith   1963.09.02
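The Reformat example can be sketched in Python under two assumptions the slide does not state: the input birth date "090263" is in MMDDYY order (which matches the 1963.09.02 output), and two-digit years belong to the 1900s. The input field names and the reformat function are hypothetical.

```python
# Sketch of the Reformat example. Assumptions (not stated on the slide):
# - the input bday "090263" is in MMDDYY order, matching the 1963.09.02 output;
# - two-digit years belong to the 1900s.
def reformat(rec: dict) -> str:
    new_id = int(rec["id"]) + 1000000        # id+1000000, written as decimal(7)
    mm, dd, yy = rec["bday"][0:2], rec["bday"][2:4], rec["bday"][4:6]
    bday = f"19{yy}.{mm}.{dd}"               # YYYY.MM.DD
    last = rec["last_name"].ljust(8)[:8]     # string(8): pad (or cut) to 8 characters
    # first_name is dropped simply by not writing it to the output
    return f"{new_id:07d}{last}{bday}"


if __name__ == "__main__":
    row = {"id": "0345", "bday": "090263", "first_name": "John", "last_name": "Smith"}
    print(repr(reformat(row)))
```

The fixed widths of the output record format are what produce the three pad spaces after "Smith".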
Transformation Functions
A transform function specifies the business rules used to create
the output record.
(Figure: the input record a = 9, b = 45, c = QF passes through the transform and produces the output record x = 44, y = 9, z = RG.)
out :: trans(in) =
begin
out.x :: in.b - 1;
out.y :: in.a;
out.z :: fn(in.c);
end;
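A transform function builds each output field from input fields. The Python model below mirrors trans() line by line. The slide never defines fn, so the version here is hypothetical: shifting each character up by one code point is chosen only because it reproduces the example QF to RG.

```python
def fn(s: str) -> str:
    # Hypothetical: the slide never defines fn. Shifting each character up by one
    # code point is chosen only because it reproduces the example QF -> RG.
    return "".join(chr(ord(c) + 1) for c in s)


def trans(rec: dict) -> dict:
    """Build the output record field by field from the input record."""
    return {
        "x": rec["b"] - 1,   # out.x :: in.b - 1;
        "y": rec["a"],       # out.y :: in.a;
        "z": fn(rec["c"]),   # out.z :: fn(in.c);
    }


if __name__ == "__main__":
    print(trans({"a": 9, "b": 45, "c": "QF"}))
```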
Then modify the transform to trim the spaces from the first name before concatenating it with the last name, to get "John Smith" rather than "John   Smith".
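The effect of trimming can be seen in a small Python sketch, assuming the names are space-padded string(6) values. Python's rstrip stands in for whichever DML trim function the exercise uses; full_name is a hypothetical helper.

```python
def full_name(first_name: str, last_name: str) -> str:
    """Drop trailing pad spaces from each fixed-width name, then join with one space."""
    return first_name.rstrip() + " " + last_name.rstrip()


if __name__ == "__main__":
    first, last = "John  ", "Smith "      # string(6) values, space-padded
    print(repr(first + " " + last))       # untrimmed: the padding survives
    print(repr(full_name(first, last)))
```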
Data Aggregation
(Figure: the records are sorted by city, then rolled up to one record per city with the amounts summed.)
Input:
id    name   city      amount
0345  Smith  Bristol   56
0212  Spade  London    8
0322  Jones  Compton   12
0492  West   London    23
0121  Forth  Bristol   7
0221  Black  New York  42

Sorted by city:
id    name   city      amount
0345  Smith  Bristol   56
0121  Forth  Bristol   7
0322  Jones  Compton   12
0212  Spade  London    8
0492  West   London    23
0221  Black  New York  42

Rolled up by city:
city      amount
Bristol   63
Compton   12
London    31
New York  42
Aggregation functions: avg, count, first, last, max, min, product, sum
(Figure: Rollup Wizard)
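The rollup (sort on the key, then reduce each key group) can be sketched in Python with itertools.groupby. The sample records come from the Data Aggregation figure; the amounts 8 for Spade and 7 for Forth are read from the figure's totals (London 31, Bristol 63). Passing agg=max or agg=len imitates other aggregation functions; rollup and VISITS are hypothetical names, not Ab Initio APIs.

```python
from itertools import groupby
from operator import itemgetter

# Sample records from the Data Aggregation figure.
VISITS = [
    {"id": 345, "name": "Smith", "city": "Bristol", "amount": 56},
    {"id": 212, "name": "Spade", "city": "London", "amount": 8},
    {"id": 322, "name": "Jones", "city": "Compton", "amount": 12},
    {"id": 492, "name": "West", "city": "London", "amount": 23},
    {"id": 121, "name": "Forth", "city": "Bristol", "amount": 7},
    {"id": 221, "name": "Black", "city": "New York", "amount": 42},
]


def rollup(records, key_field, value_field, agg=sum):
    """Sort on the key, then reduce each key group's values with agg."""
    ordered = sorted(records, key=itemgetter(key_field))
    return [
        (key, agg([r[value_field] for r in group]))
        for key, group in groupby(ordered, key=itemgetter(key_field))
    ]


if __name__ == "__main__":
    print(rollup(VISITS, "city", "amount"))           # sum per city
    print(rollup(VISITS, "city", "amount", agg=max))  # max per city
```

groupby only merges adjacent records, which is why the input is sorted on the key first.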
Joining Data
(Figure: the first input, in0, is joined with the second input, in1, on id.)
in0:
id    name   city      amount
0345  Smith  Bristol   56
0212  Spade  London    8
0322  Jones  Compton   12
0492  West   London    23
0121  Forth  Bristol   7
0221  Black  New York  42

in1:
id    dt      cost
0322  970402  1242.50
0345  970924  923.75
0121  961211  12392.00
0492  971123  234.12
0666  950616  2312.10

Output, after both inputs are sorted by id and joined:
id    city      amount  dt
0121  Bristol   7       1996/12/11
0212  London    8       1900/01/01
0221  New York  42      1900/01/01
0322  Compton   12      1997/04/02
0345  Bristol   56      1997/09/24
0492  London    23      1997/11/23
in0:
record
decimal(4) id;
string(6) name;
string(8) city;
decimal(3) amount;
end
in1:
record
decimal(4) id;
date("YYMMDD") dt;
decimal(9.2) cost;
end
out:
record
decimal(4) id;
string(8) city;
decimal(3) amount;
date("YYYY/MM/DD") dt;
end
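The Joining Data example can be sketched in Python as a merge over inputs already sorted by id. The sketch keeps every in0 record and never outputs in1 records without an in0 partner (such as 0666), which is the behaviour the sample output implies; the slide does not show the join-type setting for this example, so treat that as an assumption. The amounts 8 and 7 are read from the figure, merge_join and to_long_date are hypothetical names, and two-digit years are assumed to be 19xx. The dt rule uses the same 1900/01/01 default that the Prioritized Assignment slide assigns to out.dt.

```python
from typing import List, Optional

# Inputs from the Joining Data figure, already sorted by id.
IN0 = [
    {"id": 121, "name": "Forth", "city": "Bristol", "amount": 7},
    {"id": 212, "name": "Spade", "city": "London", "amount": 8},
    {"id": 221, "name": "Black", "city": "New York", "amount": 42},
    {"id": 322, "name": "Jones", "city": "Compton", "amount": 12},
    {"id": 345, "name": "Smith", "city": "Bristol", "amount": 56},
    {"id": 492, "name": "West", "city": "London", "amount": 23},
]
IN1 = [
    {"id": 121, "dt": "961211", "cost": "12392.00"},
    {"id": 322, "dt": "970402", "cost": "1242.50"},
    {"id": 345, "dt": "970924", "cost": "923.75"},
    {"id": 492, "dt": "971123", "cost": "234.12"},
    {"id": 666, "dt": "950616", "cost": "2312.10"},
]


def to_long_date(yymmdd: str) -> str:
    # YYMMDD -> YYYY/MM/DD; assumption: two-digit years are in the 1900s.
    return f"19{yymmdd[0:2]}/{yymmdd[2:4]}/{yymmdd[4:6]}"


def merge_join(in0: List[dict], in1: List[dict]) -> List[dict]:
    """Walk both id-sorted inputs together; keep every in0 record.

    Assumes ids are unique within in1. in1 records with no in0 partner
    (here 0666, left over after the last in0 record) are never output.
    """
    out, j = [], 0
    for left in in0:
        # Skip in1 records whose id is below the current in0 id.
        while j < len(in1) and in1[j]["id"] < left["id"]:
            j += 1
        match: Optional[dict] = None
        if j < len(in1) and in1[j]["id"] == left["id"]:
            match = in1[j]
        out.append({
            "id": left["id"],
            "city": left["city"],
            "amount": left["amount"],
            # priority 1: the matched in1.dt; priority 2: the "1900/01/01" default
            "dt": to_long_date(match["dt"]) if match is not None else "1900/01/01",
        })
    return out


if __name__ == "__main__":
    for row in merge_join(IN0, IN1):
        print(row)
```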
Prioritized Assignment
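Destination, priority, and source:
out.dt :1: in1.dt;
out.dt :2: "1900/01/01";
The priority-1 rule is tried first; if it cannot produce a value (for example, when in1 has no record with a matching key), the priority-2 rule supplies the default date.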
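Prioritized assignment amounts to "try each rule in priority order and keep the first value produced". In this Python sketch, "cannot produce a value" is modelled as a rule returning None; prioritized and out_dt are hypothetical names.

```python
from typing import Callable, Optional, Sequence

Rule = Callable[[], Optional[str]]


def prioritized(rules: Sequence[Rule]) -> Optional[str]:
    """Try rules in priority order; return the first value that is not None."""
    for rule in rules:
        value = rule()
        if value is not None:
            return value
    return None


def out_dt(in1: Optional[dict]) -> Optional[str]:
    # in1 is None when there is no in1 record for this key.
    return prioritized([
        lambda: in1["dt"] if in1 is not None else None,  # :1: in1.dt
        lambda: "1900/01/01",                             # :2: "1900/01/01"
    ])


if __name__ == "__main__":
    print(out_dt({"dt": "1997/09/24"}), out_dt(None))
```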
Joining (mp/figure-06.mp)
(Figure, shown over several animation frames: in0, with fields a and b, and in1, with fields a, r, and q, are aligned by key a, with join-type = Full (outer join). G has records in both inputs and produces the output G 24 NY. H appears only in in0, so the priority-2 rules supply its values and the output is H 89 XX. K appears only in in1, with q = IL.)
out :: join(in0, in1) =
begin
out.a :: in0.a;
out.x :1: in1.r + 20;
out.x :2: in0.b + 10;
out.q :1: in1.q;
out.q :2: "XX";
end;
Lookup Files
DML provides a facility for looking up records in a dataset based
on a key:
lookup(file-name, key-expression)
Using Last-Visits as a lookup file
Input record format:
record
decimal(4) id;
string(6) name;
string(8) city;
decimal(3) amount;
end
Output record format:
record
decimal(4) id;
string(8) city;
decimal(3) amount;
date("YYYY/MM/DD") dt;
end
Transform function:
out :: lookup_info(in) =
begin
out.id :: in.id;
out.city :: in.city;
out.amount :: in.amount;
out.dt :1: lookup("Last-Visits", in.id).dt;
out.dt :2: "1900/01/01";
end;
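A Python stand-in for lookup(): the lookup file is loaded into a dict keyed on id, and each input record is matched by direct key access, so in this sketch the input flow does not need to be sorted. Assumption: Last-Visits holds the in1 records from the Joining Data example (the slide only says that Last-Visits is used as a lookup file and has a dt field). LAST_VISITS, build_index, to_long_date, and this Python lookup_info are hypothetical.

```python
from typing import Dict, List

# Assumption: the lookup file holds the in1 records from the join example.
LAST_VISITS: List[dict] = [
    {"id": 322, "dt": "970402", "cost": "1242.50"},
    {"id": 345, "dt": "970924", "cost": "923.75"},
    {"id": 121, "dt": "961211", "cost": "12392.00"},
    {"id": 492, "dt": "971123", "cost": "234.12"},
    {"id": 666, "dt": "950616", "cost": "2312.10"},
]


def build_index(records: List[dict]) -> Dict[int, dict]:
    """Index the lookup file by its key field, id."""
    return {r["id"]: r for r in records}


def to_long_date(yymmdd: str) -> str:
    # YYMMDD -> YYYY/MM/DD; assumption: two-digit years are in the 1900s.
    return f"19{yymmdd[0:2]}/{yymmdd[2:4]}/{yymmdd[4:6]}"


def lookup_info(rec: dict, index: Dict[int, dict]) -> dict:
    found = index.get(rec["id"])  # stands in for lookup("Last-Visits", in.id)
    return {
        "id": rec["id"],
        "city": rec["city"],
        "amount": rec["amount"],
        # :1: the looked-up dt; :2: "1900/01/01" when there is no match
        "dt": to_long_date(found["dt"]) if found is not None else "1900/01/01",
    }


if __name__ == "__main__":
    index = build_index(LAST_VISITS)
    for rec in [{"id": 345, "name": "Smith", "city": "Bristol", "amount": 56},
                {"id": 212, "name": "Spade", "city": "London", "amount": 8}]:
        print(lookup_info(rec, index))
```

Compared with the join, the lookup trades the sorted-input requirement for holding the whole lookup file in memory.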
Enable Debugger
Isolate Components
Q&A
Any questions?
Capgemini
WORLDWIDE HEADQUARTERS 6400 SHAFER COURT ROSEMONT, ILLINOIS USA 60018
Tel. 847.384.6100 Fax 847.384.0500 WWW.Capgemini.COM