
Dynamic Script Generation (DSG) is the latest buzz in the Ab Initio world and one of its finest features. It comes with many advantages that were not available in earlier versions of the Ab Initio Co>Operating System, and it is available in Co>Operating System version 2.14.46 and above. This feature enables the use of Ab Initio PDL (Parameter Definition Language) and Component Folding.

If we enable this feature by changing the script generation method to Dynamic in Run Settings, we can run a graph from the command prompt without a deployed script. Here the .mp file itself works as an executable, so you no longer need to check in the ksh to the EME run directory. On the production server, when we run the .mp file using the air sandbox run command, it generates a reduced script on the fly, which contains the commands to set up the host environment. Unlike an earlier .mp file, a DSG-enabled graph (.mp) file is a text file, so you can open it and view its content in an editor.

Component Folding: It is a feature of the Co>Operating System that combines a group of components and runs them as a single process.


Prerequisites of Component Folding:

1. The graph must be DSG enabled.
2. The components must be foldable.
3. The components must be in the same phase and layout.
4. The components must be connected by a straight flow.

Now the question: does this improve performance? Yes, in most cases it brings a significant performance boost over the traditional approach of execution.

How it works (Advantages):

1. When this feature is enabled by checking the folding option in Run Settings, the Co>Operating System runtime folds all the foldable components into a single process. As a result, the number of processes is reduced when the graph executes. On any system, every process carries overheads such as forking the new process, scheduling, and memory consumption. These overheads vary from OS to OS. On some systems, like Mainframe-MVS, creating and maintaining processes is very costly compared to the different flavors of UNIX.

2. Another major benefit of component folding is the reduction of interpretation time for the DML between processes, because the graph ends up with multitool folded processes communicating with other multitool or unitool processes.

3. Apart from that, an increase in the number of processes results in more interprocess communication. Data movement between two or more processes consumes not only time but memory too. In a CFG (Continuous Flow Graph), interprocess communication is always very high, so it is worth enabling component folding in a CFG.

Disadvantages of Component Folding:

1. Pipeline Parallelism: As component folding folds different components into a single process, it hinders the pipeline parallelism of Ab Initio. Suppose the flow of our graph is Input File -> Filter By Expression -> Reformat -> Output File. In the traditional method, pipeline parallelism lets FBE and Reformat execute concurrently. Once these two components are folded together, there is no chance of parallel execution between them.
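The trade-off between folding and pipeline parallelism can be illustrated with a small Python analogy. This is only a sketch, not Ab Initio code: threads stand in for component processes, a queue stands in for the flow between them, and the names keep, reformat, run_unfolded and run_folded are hypothetical. The unfolded version runs the two stages concurrently but pays for moving every record across a stage boundary; the folded version has no hand-off cost but processes each record in a single sequential loop.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the record stream


def keep(record):
    """Hypothetical Filter By Expression: keep even values only."""
    return record % 2 == 0


def reformat(record):
    """Hypothetical Reformat: turn the value into a labelled string."""
    return f"rec-{record}"


def run_unfolded(records):
    """Two stages run as separate workers joined by a flow (a queue)."""
    flow = queue.Queue()
    out = []

    def filter_stage():
        for r in records:
            if keep(r):
                flow.put(r)  # each kept record crosses a stage boundary
        flow.put(SENTINEL)

    def reformat_stage():
        while (r := flow.get()) is not SENTINEL:
            out.append(reformat(r))

    workers = [
        threading.Thread(target=filter_stage),
        threading.Thread(target=reformat_stage),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return out


def run_folded(records):
    """Both stages folded into one loop: no queue, but no concurrency."""
    return [reformat(r) for r in records if keep(r)]
```

Both functions return the same records in the same order; only the execution model differs. Which one is faster depends on how the hand-off cost compares with the work done per record, which is why benchmarking both settings is worthwhile.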

2. Address Space: In a 32-bit OS, the maximum address space for a process is 4 GB. So if we combine 4 different components into a single process by folding, the OS will allow only 4 GB of address space for all 4 together, instead of 4 x 4 = 16 GB (maximum) for 4 independent processes. So we should avoid folding components whose memory use is very high, as in the case of in-memory Rollup, Join, and Reformat with lookup. Some components, like Sort and in-memory Join, buffer data internally. Combining them in a single process will result in writing to disk (higher I/O). Set the AB_MULTITOOL_MAXCORE variable to limit the maximum allowable memory for the folded component group.

Excluding any component from Component Folding:

Sometimes you will want to prevent components from being folded, either to allow pipeline parallelism or to access more address space. In that case you need to exclude those components from folding. Set the AB_FOLD_COMPONENTS_EXCLUDE_MPNAMES configuration variable to the space-separated mpnames of the components in your $HOME/.abinitiorc or the system-wide $AB_HOME/config/abinitiorc file. e.g.

export AB_FOLD_COMPONENTS_EXCLUDE_MPNAMES="hash-rollup reformat-transform"

The other way to prevent two components from being folded is to right-click the flow between them and uncheck the Allow Component Folding option.

Everything has its cost, so it is always worth benchmarking before taking a decision. Prevent and allow component folding for the components of your graph, and tune it for the highest performance.

CPU tracking report of folded components in a graph:

To report the execution detail of a folded graph on the console, we need to override the AB_REPORT variable with the show-folding option, as in

AB_REPORT=show-folding flows times interval=180 scroll=true spillage totals file-percentages

The folded components are displayed as a multitool process in the CPU tracking information. The CPU time for a folded component is shown twice: once for the component itself and again as a multitool component.
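The address-space arithmetic described under Address Space above can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not an Ab Initio tool: the 4 GB per-process ceiling is the figure quoted in the text for a 32-bit OS, and the function names and the sample memory figures are hypothetical.

```python
GIB = 1024 ** 3
# Per-process ceiling on a 32-bit OS, as quoted in the text (assumed exact here).
ADDRESS_SPACE_LIMIT = 4 * GIB


def address_space_ceiling(process_count):
    """Maximum total address space available to a given number of processes."""
    return process_count * ADDRESS_SPACE_LIMIT


def fits_when_folded(component_bytes):
    """True if the combined memory need of a folded group fits in one process."""
    return sum(component_bytes) <= ADDRESS_SPACE_LIMIT


# Four components as independent processes versus folded into one process.
unfolded_ceiling = address_space_ceiling(4)  # 16 GiB in total
folded_ceiling = address_space_ceiling(1)    # 4 GiB shared by all four
```

For example, a hypothetical group needing 2 GiB, 2 GiB and 1 GiB fits comfortably as three separate processes but exceeds the single-process ceiling once folded, which is the situation where excluding a component from folding helps.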

Parameter Definition Language (PDL):


PDL is used to put logic for inline computation into a parameter value. It provides high flexibility in terms of interpretation and supports both $ and ${} substitution. To use it, set the parameter interpretation to PDL and write the DML expression within $[ ]. The execution time of PDL is always shorter than that of traditional shell-interpreted parameters. For all new development, PDL is highly recommended as a replacement for crude shell scripting in parameters wherever possible. The major drawback of shell-interpreted parameters is the lack of support in EME dependency analysis, whereas the EME understands PDL since it is the native language of Ab Initio. PDL also avoids the overhead of invoking a shell for every parameter evaluation, which can significantly increase graph pre-processing time.

PDL also comes with numerous metaprogramming functions, like add_field, make_transform and add_rule, which help in handling metadata and transformation rules at run time. The definition and use of those functions are well documented in the online help. We can use the majority of the Ab Initio DML functions as well. I would recommend looking at the metaprogramming section for starters, then playing with the parameters editor.

Some examples of PDL:

Suppose a graph has a conditional component that runs based on the existence of a file called emp.dat. The FILE_NAME parameter is defined as /home/xyz/emp.dat, and a conditional parameter called EXIST is defined as

$[if (file_information($FILE_NAME).found) 1 else 0]

We can also define a type or a transform function for use in a parameter with the help of the AB_DML_DEFS parameter. e.g. suppose AB_DML_DEFS is defined as

out :: sqrt(in) = begin out :: math_sqrt(in); end;

Now a parameter called SQRT is defined as

$[sqrt (16)]

The resolved value of this parameter will be 4.

Ensure your host run settings are checked for dynamic script generation, and read the 2.14 patchset notes for a description of any hint.
