
Job Design Approach

Agenda
- Introduction
- Framework
- Scheduling Approach
- Restartability
- Reusability
- Templates
- Modularity and Maintainability
- Performance Considerations

2002. Infosys Technologies Ltd.

Introduction
Job design is influenced by the following factors:
- Framework
- Scheduling Approach
- Restartability
- Reusability/Templates
- Modularity and Maintainability
- Performance Considerations
- Metadata Management


Framework
- Reprocessing
- System Health Tables
- ACR Balancing
- Logs, Errors & Warnings


Framework
Reprocessing
- Records are rejected as errors according to the defined business rules, and those records should be reconsidered on the next run of the Job.
- Reprocessing is required (or enforced) when the quality of the data is not good enough.
- Reprocessing influences the Job design and framework in several ways:
  - Error records must be retained so that they can be corrected, which calls for a landing/work table.
  - The Job needs logic to handle duplicate records with the same natural key.
  - The ACR log file should include the count of reprocessed records.
  - End users should be able to identify error records and correct them.
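The reprocessing flow on this slide can be sketched in a few lines. This is only an illustration in Python, not a DataStage job: the record layout, the `natural_key` field name, the `is_valid` rule, and the choice that the latest record for a key wins are all assumptions made for the example.

```python
from typing import Callable

Record = dict  # hypothetical layout, e.g. {"natural_key": "A1", "amount": 10}


def run_load(new_records: list[Record],
             error_table: list[Record],
             is_valid: Callable[[Record], bool]) -> tuple[list[Record], list[Record], int]:
    """Return (loaded, still_in_error, reprocessed_count) for one run."""
    # Records rejected in earlier runs are reconsidered with the new input.
    candidates = error_table + new_records

    # Keep one record per natural key; a later record replaces an earlier one
    # (assumption: the most recent version is the corrected one).
    latest: dict = {}
    for rec in candidates:
        latest[rec["natural_key"]] = rec

    loaded = [r for r in latest.values() if is_valid(r)]
    errors = [r for r in latest.values() if not is_valid(r)]

    # Number of retained error records reconsidered in this run,
    # which is the figure the ACR log would need to report.
    reprocessed = len(error_table)
    return loaded, errors, reprocessed
```

The returned `errors` list plays the role of the landing/work table for the next run.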


Framework
System Health Tables
- Jobs should provide the information needed to maintain, track, and control data loading.
- System Health Tables hold the start and end time of each Job, the number of records read, written, and bypassed, and the start and end of the batch.
- System Health Tables directly or indirectly influence the Job design and framework:
  - Jobs must generate the necessary files with the necessary information.
  - Jobs must capture enough detail, such as link counts.
  - Reusable and common Jobs will be identified.
  - Scheduling and sequencing will be influenced.
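As a rough illustration of what one System Health row might carry, the sketch below models the fields listed on this slide as a Python dataclass. The class name, the field names, the `batch_id` column, and the balancing rule in `is_balanced` are assumptions for the example, not the project's actual table definition.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobRunStats:
    """One row of a hypothetical system health table."""
    job_name: str
    batch_id: int
    start_time: datetime
    end_time: datetime
    records_read: int
    records_written: int
    records_bypassed: int

    def elapsed_seconds(self) -> float:
        """Wall-clock duration of the Job run."""
        return (self.end_time - self.start_time).total_seconds()

    def is_balanced(self) -> bool:
        # Assumed control rule: every record read is either written or bypassed.
        return self.records_read == self.records_written + self.records_bypassed
```

A Job (or its after-job routine) would need to supply the link counts that populate `records_read`, `records_written`, and `records_bypassed`.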


Framework
A few common tables from the CSL/ABI projects:
- DTMT_PRCS: stores information about business processes.
- DTMT_PGM_CNTL: stores all control table entries.
- DTMT_PGM_ERR: stores information about errors that occurred during program execution.
- DTMT_PGM_EXEC_H: stores the execution history of every program run.
- DTMT_REC_ERR_LOG (staging table): holds error records awaiting correction.
- DTMT_SRC: contains source file names.
- DTMT_PGM: contains details about all the programs.


Framework
Logs, Errors, and Warnings
- DataStage jobs should have provisions to maintain logs, errors, and warnings.
- Logs are required to help with debugging and to keep track of processing.
- Errors and warnings need to be logged to support validation of business rules and data.
- Restartability plays a vital role in loading errors and warnings.
- Reusable/common Jobs can be identified.


Scheduling
The scheduling approach affects the Job designs. Scheduling can be done in two ways:

1. Use DataStage Sequencers to sequence the Jobs and use Control-M only for scheduling. Sequences should be built with restart points.
   - Pro: sequencing complexity is abstracted inside the Sequencers.
   - Pro: scheduling is simplified, because only the starting point needs to be scheduled.
   - Con: complexity and additional effort in building the Sequencers.
   - Con: sequencing and Job designs are tightly coupled.

2. Use Control-M for both sequencing and scheduling. Break the required functionality into restartable Jobs.
   - Pro: simplified Job design; sequencing and Job designs are loosely coupled.
   - Pro: flexibility to break up or join Jobs without a major effect on sequencing.
   - Pro: no additional overhead of maintaining restart points.
   - Con: the complexity of sequencing is shifted to scheduling.


Scheduling Sequencer Approach


Scheduling: Control-M Approach

- The jobs and scripts in a project are scheduled through Control-M. The dependencies between jobs (successor/predecessor), within a module or across modules, are tracked in a spreadsheet (.xls) that is submitted to the Control-M team.
- The dependencies are set up in Control-M using triggers, so that a job starts only after all of its predecessors have completed successfully.
- A trigger can be the successful completion of a job, the presence of a particular file, and so on.
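The predecessor rule described here (a job starts only after all of its predecessors have completed successfully) can be illustrated with a small Python sketch. It does not use or imitate any Control-M interface; the job names and the `predecessors` mapping are hypothetical, and only job-completion triggers are modelled.

```python
from graphlib import TopologicalSorter


def runnable_jobs(predecessors: dict[str, set[str]],
                  succeeded: set[str]) -> list[str]:
    """Jobs not yet run whose predecessors have all completed successfully."""
    return sorted(
        job for job, preds in predecessors.items()
        if job not in succeeded and preds <= succeeded
    )


def execution_order(predecessors: dict[str, set[str]]) -> list[str]:
    """One valid order in which every job follows all of its predecessors."""
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical dependency sheet: job -> set of predecessor jobs.
deps = {
    "JOB_A": set(),
    "JOB_B": {"JOB_A"},
    "JOB_C": {"JOB_A"},
    "JOB_D": {"JOB_B", "JOB_C"},
}
```

With `deps` as above, `runnable_jobs(deps, {"JOB_A"})` returns `JOB_B` and `JOB_C`, while `JOB_D` stays blocked until both of them have succeeded.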

Sample Control-M request spreadsheet:

Requester Name: Brian Turbes
Contact Information: 612-304-0476, brian.turbes@target.com
Application: ADW Gift Registry
Description of Request: New job setup for application ADWGR
Requested Migration Date: Test 2/10/2005, Prod 04/01/2005

Spreadsheet columns: Time Window for Job Start (optional); Dependencies (job names or line number); Table Name (if table exists); Job Name (if job exists); Action Requested (Add, Change, Delete); Server / Account (Test, Prod); Path Name, Script Name, Parameters; Days Scheduled (M, T, W, Th, F, Sa, Su); Holidays.

Start / Dependency  Table Name      Job Name    Action  Server / Account (Test)
2am                 START_OF_CYCLE  ADWGR0010T  Add     grmetltest/adwgradm
ADWGR0010T          START_OF_CYCLE  ADWGR0020T  Add     grmetltest/adwgradm
ADWGR0020T          START_OF_CYCLE  ADWGR0080T  Add     grmetltest/adwgradm
ADWGR0080T          LANDING_JOBS    ADWGR1005T  Add     grmetltest/adwgradm
ADWGR1005T          LANDING_JOBS    ADWGR1005B  Add     grmetltest/adwgradm
ADWGR1005B          LANDING_JOBS    ADWGR1005L  Change  grmetltest/adwgradm
ADWGR0080T          LANDING_JOBS    ADWGR1008T  Add     grmetltest/adwgradm
ADWGR1008T          LANDING_JOBS    ADWGR1008B  Add     grmetltest/adwgradm
ADWGR1008B          LANDING_JOBS    ADWGR1008L  Change  grmetltest/adwgradm
ADWGR1008L          LANDING_JOBS    ADWGR1010T  Add     grmetltest/adwgradm
ADWGR1010T          LANDING_JOBS    ADWGR1010B  Add     grmetltest/adwgradm
ADWGR1010B          LANDING_JOBS    ADWGR1010L  Change  grmetltest/adwgradm
ADWGR0080T          LANDING_JOBS    ADWGR1015T  Add     grmetltest/adwgradm
ADWGR1015T          LANDING_JOBS    ADWGR1015B  Add     grmetltest/adwgradm
ADWGR1015B          LANDING_JOBS    ADWGR1015L  Change  grmetltest/adwgradm
ADWGR0080T          LANDING_JOBS    ADWGR1020T  Add     grmetltest/adwgradm

Path Name, Script Name, Parameters, by job:
ADWGR0010T: /opt/scripts/test/adwetlrun.ksh -f ADWGR0010T_parms.dat ADWGR ADWGR0010TtableEtlPrcsGrp ADWGR0010T adwgrcur
ADWGR0020T: /opt/scripts/test/adwetlrun.ksh -f ADWGR0020T_parms.dat ADWGR ADWGR0020TtableEtlPrcs ADWGR0020T adwgrcur
ADWGR0080T: /opt/scripts/test/adwetlrun.ksh -f ADWGR0080T_parms.dat ADWGR ADWGR0080TtablePrcsCntl ADWGR0080T adwgrcur
ADWGR1005T: /opt/scripts/test/adwetlrun.ksh -f ADWGR1005T_parms.dat ADWGR ADWGR1005TtableGftrgE ADWGR1005T
ADWGR1005B: /opt/scripts/test/adwacrrun.ksh ADWGR1005B ADWGR1005B ADW3407 adwgrcur ADWGR
ADWGR1005L: /opt/scripts/test/adwetlrun.ksh -f ADWGR1005L_parms.dat ADWGR ADWGR0030TtableEtlSubPrcs.ADWGR1005 ADWGR1005L adwgrcur
ADWGR1008T: /opt/scripts/test/adwetlrun.ksh -f ADWGR1008T_parms.dat ADWGR ADWGR1008Tdss1008GftrgCust ADWGR1008T
ADWGR1008B: /opt/scripts/test/adwacrrun.ksh ADWGR1008B ADWGR1008B ADW3401 adwgrcur ADWGR
ADWGR1008L: /opt/scripts/test/adwetlrun.ksh -f ADWGR1008L_parms.dat ADWGR ADWGR0030TtableEtlSubPrcs.ADWGR1008 ADWGR1008L adwgrcur
ADWGR1010T: /opt/scripts/test/adwetlrun.ksh -f ADWGR1010T_parms.dat ADWGR ADWGR1010TtableGftrgCustE ADWGR1010T adwgrcur
ADWGR1010B: /opt/scripts/test/adwacrrun.ksh ADWGR1010B ADWGR1010B ADW3409 adwgrcur ADWGR
ADWGR1010L: /opt/scripts/test/adwetlrun.ksh -f ADWGR1010L_parms.dat ADWGR ADWGR0030TtableEtlSubPrcs.ADWGR1010 ADWGR1010L adwgrcur
ADWGR1015T: /opt/scripts/test/adwetlrun.ksh -f ADWGR1015T_parms.dat ADWGR ADWGR1015TtableGftrgBabyE ADWGR1015T adwgrcur
ADWGR1015B: /opt/scripts/test/adwacrrun.ksh ADWGR1015B ADWGR1015B ADW3402 adwgrcur ADWGR
ADWGR1015L: /opt/scripts/test/adwetlrun.ksh -f ADWGR1015L_parms.dat ADWGR ADWGR0030TtableEtlSubPrcs.ADWGR1015 ADWGR1015L adwgrcur
ADWGR1020T: /opt/scripts/test/adwetlrun.ksh -f ADWGR1020T_parms.dat ADWGR ADWGR1020TtableGftrgCharE ADWGR1020T


Restartability
- Restartability influences Job design by determining how Jobs are broken up.
- Restartability is very important in ETL: each Job should be restartable.
- Restartability plays a vital role in:
  - Loading tables with history
  - Sequence number generation
  - Reprocessing
  - Loading error/warning tables
  - Loading System Health Tables
- If Sequencers are used for sequencing, Sequencer routines and shell scripts serve as placeholders that maintain the restart points.
- If Control-M is used for sequencing, breaking up Jobs and identifying common Jobs is key.
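One way to picture a restart point is a small checkpoint file that records which steps of a sequence have already finished. The Python sketch below illustrates that idea under stated assumptions: the JSON checkpoint format, the step names, and the `run_with_restart` helper are hypothetical and are not DataStage Sequencer features.

```python
import json
from pathlib import Path
from typing import Callable


def run_with_restart(steps: list[tuple[str, Callable[[], None]]],
                     checkpoint: Path) -> list[str]:
    """Run steps in order, skipping those a previous failed run completed.

    Returns the names of the steps executed in this run.
    """
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    executed = []
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier run; do not repeat it
        action()      # an exception here leaves the checkpoint in place
        done.append(name)
        checkpoint.write_text(json.dumps(done))
        executed.append(name)
    # The whole sequence succeeded, so the next run starts from the beginning.
    checkpoint.unlink(missing_ok=True)
    return executed
```

If a step fails, rerunning the same call resumes at the failed step, which is why each step must itself be safe to rerun (for example, when generating sequence numbers or loading history tables).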


Reusability
Reusability is very important in software projects. DataStage supports reusability in the following forms:
- Shared Containers
- Build Ops
- Common Jobs
- Routines
- Templates

A Container is a group of stages and links that performs a particular task. It packages complex logic into one unit and acts as a stage. Shared Containers are the best form of reusability in DataStage. Typical candidates for a Shared Container are:
- Sequence ID generation logic
- Error/warning generation and loading
- Loading landing tables with common functionality
- Common business rules and logic


Reusability

Build Ops
- Build Ops provide the flexibility to write custom logic.
- Build Ops can be used to provide common functionality within or across modules when that functionality is complex to achieve with standard DataStage stages.
- Ease of coding: handling complex conditions (for example, many nested if-else statements) or computing many stage variables is much easier in a BuildOp than in a Transformer stage.
- Coding freedom: a BuildOp allows data structures such as arrays and strings, loop statements such as for and while, and many other ordinary coding constructs. It also allows the use of header files and their built-in functions. For example, including string.h provides APT_String, which can be used for string declarations and other string operations.
- These coding features are otherwise not easy to use in DataStage.


Reusability
- A Common Job performs common tasks across the project or its modules, taking different parameters for different contexts.
- Common Jobs should be defined as multiple-instance Jobs so that several instances can run in parallel.
- Routines help with pre-Job and post-Job activities such as copying input files to different directories, generating ACR files, and writing log files.
- A clear definition of which activities belong to shell scripts, DataStage Jobs, Routines, Sequences, and the generic shell script is key to a clean separation and consistency across the project. This will influence the Job designs.
- The job template should contain generic annotations that act as guidelines when creating Jobs.
- All parameters that are common across Jobs should be defined in the job templates.
- Stage properties that are common or mandatory should be set in the job templates.
- Templates act as a design pattern and guideline, achieving consistency and strict enforcement of dos and don'ts.
- Identifying common patterns and defining templates for them will achieve consistency.
- A few reusable components will evolve as the project progresses, but enough effort should be spent up front to bring out reusable components.
- Piloting a module is another way of bringing out reusable components.
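To make the idea of a parameterized, multiple-instance Common Job concrete, the sketch below builds the command line for one instance. It assumes the DataStage `dsjob` client with its `-run`, `-jobstatus`, and `-param` options and the `job.invocation_id` naming for multi-instance jobs; check these against the documentation for your DataStage version. The project, job, and parameter names are hypothetical, and the sketch only builds the argument list without running anything.

```python
def dsjob_run_command(project: str, job: str, invocation_id: str,
                      params: dict[str, str]) -> list[str]:
    """Build the argument list for one instance of a multi-instance job."""
    cmd = ["dsjob", "-run", "-jobstatus"]
    for name, value in params.items():
        # Each job parameter is passed as a separate -param name=value pair.
        cmd += ["-param", f"{name}={value}"]
    # A distinct invocation id per context lets instances run in parallel.
    cmd += [project, f"{job}.{invocation_id}"]
    return cmd


# Hypothetical usage: the same common job, run for two different source files.
commands = [
    dsjob_run_command("MYPROJ", "CopyInputFile", src, {"SRC_FILE": f"{src}.dat"})
    for src in ("customers", "orders")
]
```

Each entry in `commands` could then be handed to the scheduler or to a wrapper shell script, keeping the job itself free of context-specific values.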


Modularity and Maintainability


- Modularity and maintainability are another factor influencing Job design.
- Reusable components and restartability bring the required modularity and maintainability.
- A proper balance needs to be struck between modularity and the number of I/O operations in a Job, keeping restartability in mind.
- Performance and maintainability should be properly balanced. For example, reducing the number of Transformers in a Job will improve performance, but this should not come at the cost of maintainability.


Performance Considerations
- Identifying the correct stage for the required functionality is key in Job design.
- The order of stages in a Job should be decided with performance in mind; for example, avoid repartitioning.
- Using temporary tables, work tables, or datasets may improve performance by reducing the load on Jobs, and this will influence Job design.
- Make sure all the environment variables that can influence performance are part of the template.
- Consider the volume of data when choosing a stage.
- Detailed points that influence Job performance are covered under performance tuning.


Metadata Management
- Job design is influenced by metadata management considerations.
- Jobs should not be driven by reject links. To avoid reject links, Lookups should select a dummy column from the reference link, and that column should be checked in a later stage such as a Transformer.
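The dummy-column technique can be shown with plain Python dictionaries standing in for the Lookup and Transformer stages. This is a sketch of the idea only: the column names `cust_id`, `cust_name`, and `ref_dummy` are hypothetical, and the lookup is assumed to pass unmatched rows through with empty reference columns rather than rejecting them.

```python
def lookup(rows: list[dict], reference: dict[str, dict], key: str) -> list[dict]:
    """Lookup that never rejects: unmatched rows get None in reference columns."""
    result = []
    for row in rows:
        ref = reference.get(row[key], {})
        result.append({**row,
                       "cust_name": ref.get("cust_name"),
                       # Dummy column selected from the reference link.
                       "ref_dummy": ref.get("ref_dummy")})
    return result


def transform(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Downstream check on the dummy column, replacing the reject link."""
    matched = [r for r in rows if r["ref_dummy"] is not None]
    unmatched = [r for r in rows if r["ref_dummy"] is None]
    return matched, unmatched


# Hypothetical reference data; ref_dummy is populated on every reference row.
reference = {"C1": {"cust_name": "Alice", "ref_dummy": 1}}
rows = [{"cust_id": "C1"}, {"cust_id": "C9"}]
matched, unmatched = transform(lookup(rows, reference, "cust_id"))
```

Because every row flows down the main link, the handling of unmatched rows is decided explicitly in the later stage instead of by the reject link.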
