24 August 2007
Course Structure
Finger Exercises
Day 1
Part 2: Building Applications & Parallelism
Day 2
Intermediate Exercises
Ab Initio's focus
Moving Data
Better productivity
Ab Initio Software
Ab Initio software is a general-purpose data processing platform for mission-critical applications such as:
Data warehousing, batch processing, click-stream analysis, real-time applications, data movement, and data transformation.
Multi-CPU machines are often called SMPs (for Symmetric Multi Processors). Specially-built networks of machines are often called MPPs (for Massively Parallel Processors).
A Network of Networks
Data transformation.
GDE
Component Library
Shell
User-defined Components
The Ab Initio Co>Operating System
Native Operating System (Unix, Windows, OS/390)
Components
Dataset
Datasets
Flows
Ports
Expression metadata
Components
Components may run on any computer running the Co>Operating System. Different components do different jobs. The particular work a component accomplishes depends upon its parameter settings. Some parameters are data transformations, that is, business rules applied to one or more inputs to produce the required output.
Datasets
A dataset is a source or destination of data. It can be a simple file, a database table, a SAS dataset, ... Datasets may reside on any machine running the Co>Operating System.
A dataset is made up of records; a record consists of fields. The analogous database terms are rows and columns.
Records
Fields
COBOL copybooks
Other third-party products SAS datasets
A Sandbox Environment
Setting up a standard working environment helps a development team work together. The Sandbox capability allows an application to be designed to be trivially portable. Managing the Sandbox contents is a project administrative function.
Sandbox Parameters
$AI_XFR - transform files
$AI_MP - graphs
$AI_DB - database config files
$AI_SERIAL - serial source data, other serial data
$AI_MFS - Ab Initio multifile directory; in training it will also contain partition directories (more about this later!)
$AI_LOG - a location to place logging files, etc.
Environment Overview
We will make use of environment variables (shortcuts, parms) during class. The goal is a development environment that allows a graph or set of graphs to be migrated to any other environment with absolutely no changes.
Field name
Field type
Field length
record
  decimal(4) id;
  string(6) first_name;
  string(6) last_name;
  string(5) newfield;
end
Field Names
Names consist of letters, digits, and underscores: a-z, A-Z, 0-9, _
Case matters! ABC and abc are different! Some words are reserved (record, end, date, ...).
A field length is either a number for fixed-length fields, or the delimiter that terminates the field for variable-length fields.
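The fixed-length case can be illustrated outside Ab Initio. The following is a minimal Python sketch, not DML and not how the Co>Operating System is implemented: it slices a line according to the record format shown above. The sample data line and the names RECORD_FORMAT and parse_fixed are hypothetical.

```python
from decimal import Decimal

# Field layout copied from the record format above: (name, type, length).
RECORD_FORMAT = [
    ("id", "decimal", 4),
    ("first_name", "string", 6),
    ("last_name", "string", 6),
    ("newfield", "string", 5),
]

def parse_fixed(line, record_format=RECORD_FORMAT):
    """Slice one fixed-length record into a dict of field values."""
    record, pos = {}, 0
    for name, kind, length in record_format:
        raw = line[pos:pos + length]
        if len(raw) < length:
            raise ValueError(f"record too short for field {name!r}")
        # Decimal fields become numbers; string fields keep their padding.
        record[name] = Decimal(raw.strip()) if kind == "decimal" else raw
        pos += length
    return record

sample = "0345John  Smith abcde"  # hypothetical data, 4 + 6 + 6 + 5 characters
print(parse_fixed(sample))
```

Variable-length fields would instead be split at their delimiter; that case is left out of this sketch.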
View Attributes.
record
  decimal(4) id;
  string(6) first_name;
  string(6) last_name;
  date("YYYY-DD-MM") newfield;
end;
Expressions in DML
Computations are expressed in the algebraic syntax of C, Pascal, etc. Field names act as variables.
(See the Data Manipulation Language Reference for more information on expressions and built-in functions.)
Type in an expression...
Expression Editor
Fields
Functions
Operators
Expression text
Use the Record Format Editor (New) to create a description of this data: lastname, firstname, pur_date, and amt. Then use View Data to verify the description is correct. Hint: Newline delimiters are written: \n
Simple Components
In these components the record format metadata does not change from input to output
(figure-02)
Expression Parameter
Keys
A key identifies a single field or set of fields (a composite key) used to organize a dataset in some way.
Single field: {id}
Multiple field: {last_name; first_name}
Modifiers: {id descending}
Used for sorting, grouping, partitioning. (See the Data Manipulation Language Reference for more information on keys. Note: keys are also called collators.)
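To make the three key forms concrete, here is a small Python sketch that emulates sorting by a single-field key, a composite key, and a key with the descending modifier. It only illustrates the ordering a key describes; the sample records and the function sort_by_key are hypothetical.

```python
from operator import itemgetter

# Hypothetical sample records.
records = [
    {"id": 345, "last_name": "Smith", "first_name": "John"},
    {"id": 212, "last_name": "Spade", "first_name": "Sam"},
    {"id": 322, "last_name": "Jones", "first_name": "Ann"},
    {"id": 492, "last_name": "Smith", "first_name": "Adam"},
]

def sort_by_key(records, key):
    """Sort by a key given as (field, descending) pairs, most significant first."""
    result = list(records)
    # Python's sort is stable, so sorting on the least significant field first
    # and the most significant field last yields the composite ordering.
    for field, descending in reversed(key):
        result.sort(key=itemgetter(field), reverse=descending)
    return result

by_id = sort_by_key(records, [("id", False)])                      # {id}
by_name = sort_by_key(records, [("last_name", False),
                                ("first_name", False)])            # {last_name; first_name}
by_id_desc = sort_by_key(records, [("id", True)])                  # {id descending}

print([r["id"] for r in by_name])
```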
Sorting (mp/figure-03.mp)
Exercise 3: Sorting
Using example graph figure-03.mp, change the key parameter of the Sort component to sort the data by first_name.
Data Transformation
Input record format:
record
  decimal(",") id;
  date("MMDDYY") bday;
  string(",") first_name;
  string(";") last_name;
end
Reorder
1000345Smith
1963.09.02
Transformation Functions
A transform function specifies the business rules used to create the output record. Each field of the output record must successfully be assigned a value. Partial output records are not allowed! The Transform Editor is used to create a transform function in a graphical manner.
(See the Data Manipulation Language Reference for more information on transform functions.)
out :: reformat(in) =
begin
  out.id :: in.id + 1000000;
  out.last_name :: string_concat("Mac", in.last_name);
end;
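The behaviour of this transform can be emulated in Python to see what it does to one record. This is a sketch of the semantics only: the driver run_reformat and the check for unassigned fields are assumptions that mimic the rule that partial output records are not allowed.

```python
OUTPUT_FIELDS = ("id", "last_name")

def reformat(in_rec):
    """Python analogue of the transform function above."""
    out = {}
    out["id"] = in_rec["id"] + 1000000
    out["last_name"] = "Mac" + in_rec["last_name"]
    return out

def run_reformat(records, transform, output_fields):
    """Apply the transform to each record, rejecting partial output records."""
    for rec in records:
        out = transform(rec)
        missing = [f for f in output_fields if out.get(f) is None]
        if missing:
            raise ValueError(f"output fields not assigned: {missing}")
        yield out

inputs = [{"id": 345, "last_name": "Smith"}]
results = list(run_reformat(inputs, reformat, OUTPUT_FIELDS))
print(results)
```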
Example: applying a transform to one record.
Input record (fields a, b, c): 9 45 QF
out :: trans(in) =
begin
  out.x :: in.b - 1;
  out.y :: in.a;
  out.z :: fn(in.c);
end;
Output record (fields x, y, z): 44 9 RG
Data Aggregation
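(Figure: the input amounts 56, 8, 12, 23, 7, 42 are sorted by city into 56, 7, 12, 8, 23, 42 and then aggregated into one total per city: 63, 12, 31, 42. For example, Bristol = 63 and Compton = 12.)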
By default, Rollup reads grouped (sorted) records from the input port, aggregates them as indicated by key and transform parameters, and writes the resulting aggregate record on the out port.
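The grouped-input behaviour can be sketched in Python with itertools.groupby, which likewise aggregates runs of consecutive records sharing a key. This emulates the idea rather than the Rollup component itself; the city and amount values are the ones used in this section's example data, and the function name rollup is only illustrative.

```python
from itertools import groupby
from operator import itemgetter

# (city, amount) pairs taken from the example data in this section.
records = [
    ("Bristol", 56), ("London", 8), ("Compton", 12),
    ("London", 23), ("Bristol", 7), ("New York", 42),
]

def rollup(grouped_records, key, aggregate):
    """Emit one (key, aggregate) pair per run of consecutive equal keys."""
    return [(k, aggregate(group)) for k, group in groupby(grouped_records, key=key)]

# The input must be grouped (here: sorted) by the key before aggregation.
by_city = sorted(records, key=itemgetter(0))
totals = rollup(by_city, key=itemgetter(0),
                aggregate=lambda group: sum(amount for _, amount in group))
print(totals)
# [('Bristol', 63), ('Compton', 12), ('London', 31), ('New York', 42)]
```

If the input were not grouped, the same city would appear in several runs and produce several partial totals, which is why the sort step comes first.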
Rollup Wizard
Joining Data
in0 data:
id    name   city      amount
0345  Smith  Bristol   56
0212  Spade  London    8
0322  Jones  Compton   12
0492  West   London    23
0121  Forth  Bristol   7
0221  Black  New York  42

in1 data:
id    dt      cost
0322  970402  1242.50
0345  970924  923.75
0121  961211  12392.00
0492  971123  234.12
0666  950616  2312.10

out data (amount and dt fields shown):
amount  dt
56      1997/09/24
8       1900/01/01
12      1997/04/02
23      1997/11/23
7       1996/12/11
42      1900/01/01
in0:
record
  decimal(4) id;
  string(6) name;
  string(8) city;
  decimal(3) amount;
end
in1:
record
  decimal(4) id;
  date("YYMMDD") dt;
  decimal(9.2) cost;
end
out:
record
  decimal(4) id;
  string(8) city;
  decimal(3) amount;
  date("YYYY/MM/DD") dt;
end
Prioritized Assignment
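Each assignment names a destination, a priority, and a source, written as destination :priority: source. Several assignments may target the same destination (for example, out.dt) with different priorities.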
In DML, a missing value (say, if there is no in1 record) causes an assignment to fail. If an assignment for a left hand side fails, the next priority assignment is tried. There must be one successful assignment for each output field.
Joining (mp/figure-06.mp)
in0 fields: a, b, and a third field. in1 fields: a, q, r. out fields: a, x, q.
Align inputs by a

out :: join(in0, in1) =
begin
  out.a :: in0.a;
  out.x :1: in1.r + 20;
  out.x :2: in0.b + 10;
  out.q :1: in1.q;
  out.q :2: "XX";
end;

Case 1: both inputs have a record with a = G.
in0: G 234 42
in1: G NY 4
All priority-1 assignments succeed (out.x = 4 + 20).
out: G 24 NY

Case 2: in0 has a record with a = H, but the next in1 record has a = K, so there is no matching in1 record.
in0: H 79 23
in1: K IL 8
The priority-1 assignments to out.x and out.q fail, so the priority-2 assignments are used (out.x = 79 + 10).
out: H 89 XX

A result record is emitted and written out as long as all output fields have been successfully computed.
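The walkthrough of the G and H records can be reproduced with a short Python emulation of prioritized assignment. This is a sketch under stated assumptions, not the Join component: every in0 record is emitted whether or not it has a match, in1 keys are unique, unmatched in1 records (such as K) are dropped, and the name c for the third in0 field is hypothetical. The helpers field, prioritized, join_transform, and join are illustrative names.

```python
class AssignmentFailed(Exception):
    """Raised when a source value is missing, mimicking a failed assignment."""

def field(rec, name):
    """Read a field, failing if the whole record is missing."""
    if rec is None:
        raise AssignmentFailed(name)
    return rec[name]

def prioritized(*sources):
    """Try each source in priority order and return the first that succeeds."""
    for source in sources:
        try:
            return source()
        except AssignmentFailed:
            continue
    raise AssignmentFailed("no assignment succeeded for this output field")

def join_transform(in0, in1):
    return {
        "a": prioritized(lambda: field(in0, "a")),
        "x": prioritized(lambda: field(in1, "r") + 20,   # priority 1
                         lambda: field(in0, "b") + 10),  # priority 2
        "q": prioritized(lambda: field(in1, "q"),        # priority 1
                         lambda: "XX"),                  # priority 2
    }

def join(in0_records, in1_records, key="a"):
    """Align inputs by key; in1 is None when an in0 record has no match."""
    in1_by_key = {rec[key]: rec for rec in in1_records}
    return [join_transform(rec, in1_by_key.get(rec[key])) for rec in in0_records]

in0 = [{"a": "G", "b": 234, "c": 42}, {"a": "H", "b": 79, "c": 23}]
in1 = [{"a": "G", "q": "NY", "r": 4}, {"a": "K", "q": "IL", "r": 8}]
joined = join(in0, in1)
print(joined)
# [{'a': 'G', 'x': 24, 'q': 'NY'}, {'a': 'H', 'x': 89, 'q': 'XX'}]
```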
Run the application, and examine the results. The Unmatched Last Visits dataset should be empty.
Change the necessary parameters, run the application, and examine the results.
Lookup Files
DML provides a facility for looking up records in a dataset based on a key:
lookup(file-name, key-expression)
The GDE provides a Lookup File component as a special dataset with no ports.
Transform function:
out :: lookup_info(in) =
begin
  out.id :: in.id;
  out.city :: in.city;
  out.amount :: in.amount;
  out.dt :1: lookup("Last-Visits", in.id).dt;
  out.dt :2: "1900/01/01";
end;
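A keyed lookup with a default can be emulated in Python with an in-memory index. This is only a sketch of the idea: it assumes the Last-Visits data is the id and dt data shown earlier for in1, that the first record per key wins, and that dates are converted from YYMMDD to YYYY/MM/DD on output. The helper make_lookup is a hypothetical name.

```python
from datetime import datetime

# Assumed contents of the Last-Visits lookup file (id, dt as YYMMDD).
LAST_VISITS = [
    {"id": "0322", "dt": "970402"},
    {"id": "0345", "dt": "970924"},
    {"id": "0121", "dt": "961211"},
    {"id": "0492", "dt": "971123"},
    {"id": "0666", "dt": "950616"},
]

def make_lookup(records, key_field):
    """Index the records once; return a function that finds a record by key."""
    index = {}
    for rec in records:
        index.setdefault(rec[key_field], rec)  # keep the first record per key
    return index.get  # returns None when the key is absent

lookup = make_lookup(LAST_VISITS, "id")

def lookup_info(in_rec):
    match = lookup(in_rec["id"])
    if match is not None:
        # Priority 1: the date from the lookup file, reformatted.
        dt = datetime.strptime(match["dt"], "%y%m%d").strftime("%Y/%m/%d")
    else:
        # Priority 2: the default used when no record is found.
        dt = "1900/01/01"
    return {"id": in_rec["id"], "city": in_rec["city"],
            "amount": in_rec["amount"], "dt": dt}

print(lookup_info({"id": "0345", "city": "Bristol", "amount": 56}))
print(lookup_info({"id": "0212", "city": "London", "amount": 8}))
```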
Enable Debugger
Isolate Components
Q&A
Any questions?
Capgemini
WORLDWIDE HEADQUARTERS 6400 SHAFER COURT ROSEMONT, ILLINOIS USA 60018 Tel. 847.384.6100 Fax 847.384.0500 WWW.Capgemini.COM