Production Control - OPC (Operations Planning and Control)
Prepared by: Vishal Kulkarni (Team Lead, RMG Operations)

Overview
This training provides an overview of production control on mainframes and of OPC (Operations Planning and Control). You will learn techniques for effective monitoring and effective product utilization.

Prerequisites
Mainframe MVS, ISPF, JCL
Training Agenda
- Production Control
- OPC (Production Control)
- Open Forum & Assessment

Production Scheduling
- Overview of Scheduling
- Terminology
Overview of Scheduling
Scheduling allows for centralized management of the production application environment and automation of operator activities across multiple platforms and operating systems in a distributed environment.
Centralized job scheduling controls applications so that they run at the correct time and in the proper order of execution. It also provides monitoring services to ensure that applications terminate normally, and problem management with error reports for the jobs that do not terminate normally.

Many scheduling tools are available from third parties, for example:
- Computer Associates CA7 (Mainframes)
- Autosys (Unix)
- BMC Control M (Mainframes)
- IBM OPC/ESA (Mainframes)
- Appworx (Windows)

Let Us Understand the Terminology

JOB
A controlled executable process.
For example, a mainframe job that has executable steps:

    //ACCT1 JOB 123,RAJ
    //STEP1 EXEC PGM=PAY1
    //STEP2 EXEC PGM=PAY2

JOB STREAM
A set of jobs together with their dependency information and run cycles. Dependencies may be internal or external. A run cycle is a set of dates that controls when the job stream runs in production.

SUBMIT TIME
The time on or after which a job stream or job may be launched for execution.
Note: Even after the submit time has passed, the job may not start executing, because it may be waiting for internal or external dependencies or for resources (such as initiators or online regions).

ABEND / FAILURE
Once a job starts executing, it ends in one of three ways: it ends normally, it fails, or it ends abnormally (ABEND).
Note: There are various types of ABEND, identified by the job completion code, for example S0C7 (data exception, typically invalid numeric data) and SB37 (out of space on the output data set).

PREDECESSOR
A job stream or job that must complete successfully before another job or job stream can run.

SUCCESSOR
A job or job stream that depends on the successful completion of a prior job or job stream.
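
The submit time and predecessor rules above can be sketched in a few lines of Python. This is only an illustration of the terminology, not how OPC is implemented; the `Job` class, the `is_ready` function, and the job names PAY1 and PAY2 are made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Job:
    """Hypothetical job record: a name, a submit time, and predecessor names."""
    name: str
    submit_time: datetime
    predecessors: list = field(default_factory=list)


def is_ready(job, now, completed):
    """A job may launch only when its submit time has passed AND
    every predecessor has completed successfully."""
    if now < job.submit_time:
        return False
    return all(name in completed for name in job.predecessors)


pay1 = Job("PAY1", datetime(2006, 7, 17, 6, 0))
pay2 = Job("PAY2", datetime(2006, 7, 17, 6, 0), predecessors=["PAY1"])
now = datetime(2006, 7, 17, 6, 30)

# Submit time has passed, but the predecessor PAY1 has not completed yet.
ready_before = is_ready(pay2, now, completed=set())
# Once PAY1 has completed, the successor PAY2 becomes eligible.
ready_after = is_ready(pay2, now, completed={"PAY1"})
```

Here `ready_before` is False even though the submit time has passed, which is the situation the note under SUBMIT TIME describes.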

TRIGGERING
Triggers cause jobs to be scheduled dynamically, based solely on the completion of an event.
The triggering event may itself be date/time scheduled, or it may be triggered automatically by another event.
Once defined, triggers are perpetual: they remain in effect until the user deletes or modifies them. Schedule triggers can be used as a more efficient alternative to date/time schedules.

SCHEDULE
As a noun: the current or long-term plan. As a verb: to determine the input arrival date and time of an occurrence or operation.

WAITING LIST / READY LIST
The waiting list is a list of jobs that have been submitted but still have uncompleted predecessors. The ready list shows the operations that are ready to be started at a workstation.
Operations will be included in the waiting list if the JCL is not submitted by the controller, and the tracker has been started with HOLDJOB(YES).
RERUN A function that lets an application or part of an application that ended in error be run again.
Restart and Cleanup A recovery function that ensures the restart of a job and the related cleanup actions, for example, deleting or un-cataloging data sets created in a job run.
Calendar
The data that defines the operations department's work time in terms of work days and free days.

OPC has been scheduling and controlling batch workloads in data centers since 1977.
OPC can usually handle an overnight workload consisting of 100,000 production jobs.
OPC databases contain information about the work that is to be run, when it should run, and the resources that are needed and available.

OPC Overview

Long-term plan:
The long-term plan usually covers a time range of four to twelve weeks.

Current plan:
The current plan usually covers 24 hours and is a detailed production schedule. OPC uses it to submit jobs to the appropriate processor at the appropriate time.
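
As a rough picture of how the two plans relate, the sketch below cuts a 24-hour window out of a longer list of planned occurrences. It is a simplification for illustration only; the plan contents, the job stream names, and the `extract_current_plan` function are assumptions and do not reflect OPC's actual daily planning process.

```python
from datetime import datetime, timedelta

# Hypothetical long-term plan: (job stream name, input arrival time) pairs.
long_term_plan = [
    ("PAYROLL", datetime(2006, 7, 17, 6, 0)),
    ("BILLING", datetime(2006, 7, 17, 22, 0)),
    ("WEEKLYSMF", datetime(2006, 7, 20, 6, 0)),
]


def extract_current_plan(plan, start, hours=24):
    """Keep only the occurrences whose input arrival falls in the window
    [start, start + hours)."""
    end = start + timedelta(hours=hours)
    return [(name, when) for name, when in plan if start <= when < end]


current_plan = extract_current_plan(long_term_plan, datetime(2006, 7, 17, 0, 0))
current_names = [name for name, _ in current_plan]
```

With the sample data, only PAYROLL and BILLING fall inside the 24-hour window; WEEKLYSMF stays in the long-term plan until a later day.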
OPC consists of a Controller and trackers.
Controller : The Controller manages the databases and the plans and causes the work to be submitted at the appropriate time.
Tracker : The tracker records details of job starts and ends and passes that information to the Controller, which updates the current plan with the job statuses.
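
The division of work between the Controller and the tracker can be pictured with a small sketch. Everything here is hypothetical: the class names, the status values (WAITING, STARTED, COMPLETE, ERROR), the job names, and the rule that any non-zero return code means an error are assumptions made for the example, not OPC behavior.

```python
class Controller:
    """Holds the current plan and updates it from tracker events."""

    def __init__(self, job_names):
        # Every job starts out waiting in the current plan.
        self.current_plan = {name: "WAITING" for name in job_names}

    def record_event(self, job_name, event, return_code=None):
        if event == "START":
            self.current_plan[job_name] = "STARTED"
        elif event == "END":
            # Assumption for this sketch: return code 0 is success,
            # anything else is treated as ended in error.
            ok = return_code == 0
            self.current_plan[job_name] = "COMPLETE" if ok else "ERROR"


class Tracker:
    """Runs on the managed system and reports job starts and ends."""

    def __init__(self, controller):
        self.controller = controller

    def job_started(self, job_name):
        self.controller.record_event(job_name, "START")

    def job_ended(self, job_name, return_code):
        self.controller.record_event(job_name, "END", return_code)


controller = Controller(["ACCT1", "ACCT2"])
tracker = Tracker(controller)

tracker.job_started("ACCT1")
tracker.job_ended("ACCT1", 12)      # ends with return code 12
tracker.job_started("ACCT2")
status_after_start = controller.current_plan["ACCT2"]
tracker.job_ended("ACCT2", 0)       # ends normally
```

The point of the sketch is the direction of flow: the tracker only reports events, and the Controller alone owns and updates the plan.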

OPC Terminology
- Plan
- Job stream
- Workstation
- Special resources
- Dependencies
- Calendars
- Business processing cycle

Plan:
OPC builds operating plans from the descriptions of the production workload. First, a long-term plan (LTP) is created, which shows (usually for one or two months) the job streams (applications) that should be run each day and the dependencies between job streams. Then a more detailed current plan is created. The current plan is used by OPC to submit and control jobs (operations).

Job stream:
A job stream is a description of a unit of production work. It includes a list of the jobs (related tasks) associated with that unit of work.
For example, a payroll job stream might include a manual task in which an operator prepares a job; several computer-processing tasks in which programs are run to read a database, update employee records, and write payroll information to an output file; and a print task that prints paychecks.

Workstations:
OPC supports a range of work process types, called workstations, that map the processing needs of any task in your production workload. Each workstation supports one type of activity. This gives you the flexibility to schedule, monitor, and control any data center activity, including:
- Job setup (manual and automatic)
- Jobs
- Started tasks
- Print jobs
- Manual preprocessing or postprocessing activity
Special resources
A special resource can be used to serialize access to a dataset or to limit the number of file transfers on a network link.
The resource does not have to represent a physical object in your configuration, although it often does.
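
The idea of limiting concurrent use through a special resource can be sketched as a simple counter. This is an illustrative model only; the `SpecialResource` class, the resource name FTP.LINK, and the job names are invented for the example and are not OPC definitions.

```python
class SpecialResource:
    """A named resource with a fixed quantity that jobs must hold to run."""

    def __init__(self, name, quantity):
        self.name = name
        self.quantity = quantity
        self.holders = set()

    def try_allocate(self, job_name):
        """Grant the resource if a unit is free; otherwise the job waits."""
        if len(self.holders) >= self.quantity:
            return False
        self.holders.add(job_name)
        return True

    def release(self, job_name):
        """Free the unit held by a job that has finished."""
        self.holders.discard(job_name)


# At most two file transfers may use the (hypothetical) network link at once.
link = SpecialResource("FTP.LINK", quantity=2)
first = link.try_allocate("XFER1")
second = link.try_allocate("XFER2")
third = link.try_allocate("XFER3")       # no unit free, must wait
link.release("XFER1")
third_retry = link.try_allocate("XFER3")  # a unit is free again
```

Setting the quantity to 1 gives the serialization case, such as exclusive access to a dataset.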
Dependencies
In OPC, we can specify dependencies for jobs when a specific processing order is required.
Most data processing activities need to occur in a specific order.
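
A processing order that respects dependencies can be derived with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the job names are hypothetical and loosely follow the payroll example, and this is not a description of how OPC itself resolves dependencies.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of predecessors that must complete first.
predecessors = {
    "SETUP": set(),
    "READDB": {"SETUP"},
    "UPDATE": {"READDB"},
    "WRITE": {"UPDATE"},
    "PRINT": {"WRITE"},
}

# static_order() yields every job only after all of its predecessors.
run_order = list(TopologicalSorter(predecessors).static_order())
```

For this simple chain the order is SETUP, READDB, UPDATE, WRITE, PRINT. If the dependencies contained a loop, `static_order()` would raise `graphlib.CycleError`, which is a useful reminder that circular dependencies can never be satisfied.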
Calendars
OPC uses calendar information so that job streams are not scheduled to run on days when processing resources are not available (for example, Sundays and holidays). This information is stored in an OPC calendar.
OPC supports multiple calendars for enterprises where different departments have different work days and free days, or where multiple data centers in different states or regions are controlled from a single site.
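
The work day / free day idea can be sketched as follows. The `Calendar` class and its methods are hypothetical, and moving a run forward to the next work day is only one possible way of treating a free day, chosen here as an assumption for the example.

```python
from datetime import date, timedelta


class Calendar:
    """Work days and free days for one department or data center."""

    def __init__(self, free_weekdays, holidays):
        self.free_weekdays = set(free_weekdays)  # Monday=0 ... Sunday=6
        self.holidays = set(holidays)

    def is_work_day(self, day):
        return day.weekday() not in self.free_weekdays and day not in self.holidays

    def next_work_day(self, day):
        """Return the first work day on or after the given date."""
        for _ in range(366):  # bounded search so a bad calendar cannot loop forever
            if self.is_work_day(day):
                return day
            day += timedelta(days=1)
        raise ValueError("no work day found within a year")


# Sundays are free days, and 25 December 2006 (a Monday) is a holiday.
calendar = Calendar(free_weekdays=[6], holidays=[date(2006, 12, 25)])
sunday_is_work_day = calendar.is_work_day(date(2006, 12, 24))
run_date = calendar.next_work_day(date(2006, 12, 24))
```

A run planned for Sunday 24 December 2006 skips both the Sunday and the holiday and lands on Tuesday 26 December. A second `Calendar` instance with different free days would model a department or region with its own working pattern.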

OPC Architecture
OPC consists of a Controller and one or more trackers. The Controller runs on an OS/390 system. The Controller manages the OPC databases and the long-term and current plans. The Controller schedules work and causes jobs to be submitted to the appropriate system at the appropriate time.
Trackers are installed on every system managed by the Controller. The tracker is the link between the Controller and the managed system. The tracker submits jobs when the Controller instructs it to do so, and it passes job start and job end information back to the Controller.

OPC MONITORING & CONTROLLING
- Logon Procedure
- Communication with OPC

OPC Monitoring & Controlling
The Production Monitor is responsible for the activities that follow the failure of a batch job on OPC/ESA. Once a job failure is notified in OPC, the Production Monitor must act on it.
Check the job restart / recovery instructions, which can be found in the Operator Instructions in OPC, and follow them.
The Operator Instructions for a job describe the different scenarios for a failed job, such as restarting the job, resubmitting the job, force completing the job, or calling the analyst, depending on the criticality of the job.

Logon Procedure to OPC

Prerequisites
1. You have a TSO user ID that is authorized to access the scheduler subsystem.
2. The scheduler subsystem is started.
3. You are authorized to update these scheduler databases:
   - Workstation
   - Calendar
   - Application description
   - Job description
4. You are authorized to use these scheduler functions:
   - Long-term planning
   - Daily planning
   - Workstation communication
   - Modify current plan
   - Query current plan

To perform most tasks, you use the scheduler panels, which run under Interactive System Productivity Facility (ISPF).
From the ISPF panel, select the option for OPC (system defined).
You can use the ISPF quick return command (=) as a fast path through the panels. For example, to return to the ready list from wherever you are in the panels, enter =4.1.0 on the command line.

Communication with OPC
There are different ways of logging on to OPC; the entry point can be defined in ISPF. Once you have entered the OPC environment, your screen will look like the one below.

----------------- OPERATIONS PLANNING AND CONTROL -------------
Option ===>

Welcome to TWS 8.1. You are communicating with OPCP via Server A00OPSP

Select one of the following options and press ENTER.

0 OPTIONS        - Define OPC dialog user parameters and options

1 DATABASE       - Display or update OPC data base information
2 LTP            - Long Term Plan query and update
3 DAILY PLANNING - Produce daily plans, real and trial
4 WORK STATIONS  - Work station communication
5 MCP            - Modify the Current Plan
6 QCP            - Query the status of work in progress
7 OLD OPERATIONS - Restart old operations from the DB2 repository

9 SERVICE FUNC   - Perform OPC service functions

U RM UTILITIES   - Royal Mail Group In-House Utilities

X EXIT           - Exit from the OPC dialog

OPC MAIN-MENU Fig-1

In the panel above, select the option that matches your requirement, such as checking for errors, checking workstations, or modifying the current plan.

ABEND HANDLING Part-I
- Checking for Job Failure in OPC
- Checking for Operator Instruction

Checking for Failed Jobs in OPC
To check for failures in OPC, select option 5 from the OPC main menu (Fig-1). The Modifying the Current Plan panel is displayed.

------------------------- MODIFYING THE CURRENT PLAN ----------------
Option ===>

Select one of the following:

1 ADD            - Add a new occurrence to the current plan
2 LIST           - List existing occurrences for further processing

3 OPERATIONS     - List existing operations for further processing
4 ERROR HANDLING - Handle operations in error
5 WORK STATIONS  - Change status and open interval of work stations

6 JOB SETUP      - Prepare JCL for jobs in the current plan

7 SPECRES        - Special resource monitor

9 DEFINE EL      - Define alternative error list layouts

Fig-2

This panel shows the options for modifying the current plan. To check for errors, select option 4 (ERROR HANDLING - Handle operations in error). This leads to the next panel, shown below. Type OPCESA as the layout ID.

--------------- SPECIFYING ENDED IN ERROR LIST CRITERIA -----------
Command ===>

Specify selection criteria below and press ENTER to create a list of
operations that have ended in error.

LAYOUT ID          ===> OPCESA__          Id of layout, * for a list

JOBNAME            ===> ________
APPLICATION ID     ===> ________________
OWNER ID           ===> ________________
AUTHORITY GROUP ID ===> ________
WORK STATION NAME  ===> ____
ERROR CODE         ===> ____
GROUP DEFINITION   ===> ________________

CLEAN UP TYPE      ===> ____              Types list: A M I N or blank
CLEAN UP RESULT    ===> __                Results list: C E or blank

Fig-3

----- HANDLING OPERATIONS ENDED IN ERROR (left part) Row 1 to 2 of
Command ===>                                          Scroll ===> CSR

Scroll right, enter the EXTEND command to get extended row command
information, enter the HIST command to select operation history list or
enter any of the row commands below:
I,O,J,L,RC,C,MH,MR,SJR or RER,ARC,WOC,CMP,MOD,DEL,RG,DG or CG

LAYOUT ID ===> OPCESA__ Change to switch layout id

Cmd Ended time     Application ws   no. Jobname  Errc
''' 06/07/17 06.30 MXPMW115    CM70  20 CHKPWSMF 0012
''' 06/07/17 06.16 MXPMW110    CM70  30 CRPPWBGB 0008
******************************* Bottom of data ********************

Fig-4

Fig-4 shows the panel that lists the jobs in a failed state. The columns to note are the command field (Cmd), the application name, the workstation name, the job name, and the job completion code (Errc).

Checking the Operator Instruction
Once you find a failure in OPC (option 5.4), check the Operator Instructions to resolve it or to pass it on to the right person. To do this, key O in the Cmd field of the panel in Fig-4. A sample screen is shown in Fig-5.

------------------- OPERATOR INSTRUCTION -------------------------
Command ===>

Application : MXPMW115 Wky smf job Cre POH3/5
Operation   : CM70 020 Wky smf job SAP SMF data

*********************************************************** Top of Data *****
DOCBOX-EXPORT: 28 Jun 2006 10:55:45

Job : CHKPWSMF - Wky smf job SAP SMF data
=========================================
Callout: N Rerun: Y Online Affected: Y

JOB NOTES
=========
Build weekly SMF creation job in dynam.

RECOVERY INSTRUCTIONS
=====================
Restart from the top in all cases.

Fig-5

In this panel, the line beginning with Callout shows the criticality of the job, and the RECOVERY INSTRUCTIONS section holds the restart instructions.

The Operator Instructions have three parts:
1) Criticality
2) Job notes
3) Restart / Recovery instructions
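
If the Operator Instruction text is ever exported for reporting, the criticality line can be read with a small parser. This is a sketch that assumes the line always has the layout shown in the sample panel (`Callout: N Rerun: Y Online Affected: Y`); the `parse_criticality` function is hypothetical and not part of OPC.

```python
import re


def parse_criticality(text):
    """Return the three criticality flags as booleans (True for Y)."""
    flags = {}
    pattern = r"(Callout|Rerun|Online Affected):\s*([YN])"
    for key, value in re.findall(pattern, text):
        flags[key] = value == "Y"
    return flags


# The criticality line as it appears in the sample Operator Instruction panel.
sample_line = "Callout: N Rerun: Y Online Affected: Y"
flags = parse_criticality(sample_line)
```

For the sample job CHKPWSMF this gives Callout False, Rerun True, and Online Affected True, which matches the recovery instruction to restart the job rather than call out.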

Criticality:
There are three components under criticality: Callout, Rerun, and Online Affected.
Callout - Values can be YES/NO. If YES, the on-call analyst for this application must be called.
Rerun - Values can be YES/NO. Indicates whether the job can be rerun.
Online Affected - Values can be YES/NO. If the value is YES, this is a critical job and the call must be made immediately.

Job Notes:
Job notes describe how the job is to be handled on failure. They are mostly instructions for the on-call personnel (Production Support Analyst) who will take care of the job.
They also contain information about the nature of the job, such as application details and run details.

Recovery Instructions:
This section gives the recovery steps for the failed job, such as restart the job, restart from a particular step, or call the analyst.

ABEND HANDLING Part-II
- Restart & Rerun
- Force complete (Job & Schedule)
- Deleting (Job & Schedule)
- Manual Hold
- Manual Release

OPC Scheduling - PART-I
- Modifying the Current Plan
- Adding an Occurrence to the Current Plan
- Adding an Application to the Current Plan
- Changing Details of an Operation in the Current Plan
- Rerunning an Occurrence from a Specific Operation

ABEND Handling on OPC PART-II
There are different scenarios to follow once a failure is detected in OPC. After any failure, check the Operator Instructions. They may direct you to:
- Callout = Y: invoke the call list and call the analyst from it.
- Rerun = Y: restart the job from the top.
- Restart from a particular step: edit the JCL, add the restart step, and restart the job.
- Force complete the job.
- Check for other resources (for example, the availability of online regions).

1) RESTARTING the job:
Once you find a failure in OPC (option 5.4), check the Operator Instructions to resolve it or to pass it on to the right person. To do this, key O in the Cmd field of the error list panel (Fig-4).

ABEND HANDLING Different Scenarios

-------- HANDLING OPERATIONS ENDED IN ERROR(left part) Row 1 to 2 of 2
Command ===>                                          Scroll ===> CSR
Scroll right or enter the SUPPRESS command to suppress full row command information, enter the HIST command to select operation history list or enter any of the row commands below:
OPERATION RELATED COMMANDS : I query information, O browse operator instructions, J edit JCL, C complete, MH manual hold, MR manual release, SJR simple job restart, RC restart and cleanup, L Browse joblog
OCCURRENCE RELATED COMMANDS: RER rerun, ARC attempt automatic recovery, WOC reset to waiting, CMP complete, MOD modify, DEL delete, RG Remove from Group, DG Delete Group, or CG Complete Group
LAYOUT ID ===> OPCESA__ Change to switch layout id
Cmd Ended time     Application ws   no. Jobname  Errc
SJR 06/07/17 06.30 MXPMW115    CM70  20 CHKPWSMF 0012
''' 06/07/17 06.16 MXPMW110    CM70  30 CRPPWBGB 0008
******************************* Bottom of data *******************************

If the Operator Instructions say to restart the job, go back to the error list panel and type SJR in the Cmd column. The job is then restarted.

If the Operator Instructions say to restart from a particular step, key J in the Cmd column of the error list panel (Fig-4). This opens the JCL. Add the RESTART parameter to the JOB statement (RESTART=stepname, or RESTART=stepname.procstepname when the step is inside a procedure), press F3 (save and exit), and then key SJR in the Cmd column. The job is then restarted. For example, to restart a job coded as //ACCT1 JOB 123,RAJ from STEP2, the JOB statement becomes //ACCT1 JOB 123,RAJ,RESTART=STEP2.

Completing (Job & Schedule)
To force complete a job in OPC, go to option 5.4:
Key C to complete the job.
Key CMP to force complete the schedule.

-------- HANDLING OPERATIONS ENDED IN ERROR(left part) Row 1 to 2 of 2
Command ===>                                          Scroll ===> CSR
Scroll right or enter the SUPPRESS command to suppress full row command information, enter the HIST command to select operation history list or enter any of the row commands below:
OPERATION RELATED COMMANDS : I query information, O browse operator instructions, J edit JCL, C complete, MH manual hold, MR manual release, SJR simple job restart, RC restart and cleanup, L Browse joblog
OCCURRENCE RELATED COMMANDS: RER rerun, ARC attempt automatic recovery, WOC reset to waiting, CMP complete, MOD modify, DEL delete, RG Remove from Group, DG Delete Group, or CG Complete Group
LAYOUT ID ===> OPCESA__ Change to switch layout id
Cmd Ended time Application ws no. Jobname Errc
''' 06/07/17 06.16 MXPMW110    CM70  30 CRPPWBGB 0008
******************************* Bottom of data *******************************

To restart the job, key SJR in the Cmd column.
To complete the job, key C in the Cmd column.
To view the Operator Instructions, key O in the Cmd column.

Deleting (Job & Schedule)
You can delete an application occurrence from the current plan by entering D in the row Command field for the occurrence in the Modifying Occurrences in the Current Plan panel.
The scheduler then displays a confirmation panel showing all external dependencies for the Occurrence. After your confirmation, all operations in the occurrence are deleted.
Additionally, you can delete occurrences on a fault-tolerant workstation in the same way.

Manual Hold & Manual Release
Sometimes you must delay the start of an operation because of a situation beyond your control.
For example, the application programmer is manually editing some production files to incorporate an urgent program fix. In such situations, when the operations concerned are already in the current plan and waiting only for a certain time or for predecessors to be complete, you must do something to stop the operation from being started when the scheduling criteria are satisfied. You can do one of the following:
- Manually HOLD the operation by using the MCP panel, or the ready list if the operation predecessors are already complete.
- Modify the job to include a deliberate error; for example, a comma at the end of the job card for a z/OS job. The job is submitted when all the scheduling criteria are met but does not actually execute until the syntax error is corrected.
- Modify the occurrence to include an extra operation on a general workstation, which becomes a predecessor for the operation you need to delay.

The manual HOLD command, MH, can be issued for an operation on a computer workstation with automatic reporting or on any workstation with no reporting.
The scheduler does not start any operation that has been manually placed in HOLD by a panel user, even though the status of the operation will change when the operation start criteria make the operation eligible to be started.
When you no longer want the operation held, you can issue the RELEASE command, MR, and the operation extended status code changes to reflect the current situation. If all start criteria for this operation are met, the operation can start immediately.

MODIFY THE CURRENT PLAN (5.3)
From the Main Menu panel:
- Option 5 is for dynamically amending the schedules.
- Option 6 is for browsing the schedule status.

Select option 5 (MODIFY THE CURRENT PLAN) from the main TWS menu. The following panel will be displayed. From this menu you can:
1 - Add applications into the current plan
2 - Perform amendments on applications or jobs
3 - Perform enhanced amendments on jobs (HOLD, NOP, EXECUTE, etc.)
4 - Perform functions on jobs that ended in error (uses a LAYOUT option)
5 - Amend work stations (useful for machine manipulation)
6 - Not generally used
7 - Check the status of special resources (very useful to Operations)
From this panel you can select two options. Option 3 allows you to view information about your jobs in the current plan.
ACTION: Select option 3.
From this panel you can select by job name, by application name, or by a combination of both to view the information on your jobs.
ACTION: Type your job name or application ID in the relevant field and press ENTER.

MODIFY THE CURRENT PLAN (5.2)
Select option 5 (MODIFY THE CURRENT PLAN) from the main TWS menu. The following panel will be displayed.
From this panel you can select two options. Option 2 allows you to view information about your jobs in the current plan.
ACTION: Select option 2.

Adding an Occurrence to the Current Plan
Go to option 5.1 of the OPC main menu (Add a new occurrence to the current plan). This is an ad hoc run, which can be added on request from the analyst.
In this panel, enter the schedule name, the input arrival time, the deadline time, and so on.