MainframeMaster

JCL Job Restart

Job restart lets you run a failed (or cancelled) job again starting from a specific step instead of from the beginning. You put the RESTART= parameter on the JOB statement and give the step name (or step.procstep for steps inside a procedure). The system skips all steps before that point so you do not re-execute completed work or duplicate output. For programs that support checkpoint/restart, you can also restart from a checkpoint ID. The RD parameter controls whether automatic restart is allowed and whether checkpoints are taken.

Explain Like I'm Five: What Is Job Restart?

Imagine a list of chores: make bed, brush teeth, eat breakfast, go to school. If you finished the first two and then had to stop, tomorrow you do not start over from "make bed"—you start from "eat breakfast." Job restart is like that: the job has a list of steps; if it failed at step 5, you tell the system "restart at step 5" (or the next one) so steps 1–4 are not run again. That saves time and avoids doing the same work twice.

RESTART= on the JOB Statement

RESTART= is a parameter of the JOB statement. It tells the system which step to run first when the job is submitted. All steps before that step are skipped. The job must be the same job (same JCL structure) as the one that failed or was stopped; you are not creating a new job from scratch, you are resuming. Syntax: RESTART=stepname for a step in the job, or RESTART=stepname.procstepname for a step inside a cataloged procedure invoked by stepname.

RESTART= forms
Form | Meaning
RESTART=stepname | Restart at the beginning of stepname, a job step (EXEC statement) in this job.
RESTART=stepname.procstepname | Restart at procstepname inside the cataloged procedure invoked by stepname.
RESTART=(stepname,checkpoint-id) | Restart from a checkpoint taken within stepname; checkpoint-id identifies the checkpoint.
RESTART omitted | Job runs from the first step (normal run).

Restarting at a Job Step

When the step you want is a direct step in your JCL (not inside a procedure), use RESTART=stepname. The step name is the name on the EXEC statement (the first position after //). For example, if your job has //STEP1 EXEC ..., //STEP2 EXEC ..., //STEP3 EXEC ..., then RESTART=STEP3 means STEP1 and STEP2 are skipped and execution begins at STEP3. STEP3 and any following steps run normally (subject to COND and other parameters).

```jcl
//MYJOB JOB (ACCT),'JOB RESTART',RESTART=STEP2
//STEP1 EXEC PGM=PROG1
//DD1 DD DSN=FILE1,DISP=SHR
//STEP2 EXEC PGM=PROG2
//DD2 DD DSN=FILE2,DISP=OLD
//STEP3 EXEC PGM=PROG3,COND=(4,LT)
```

With RESTART=STEP2, STEP1 is skipped and STEP2 and STEP3 run. Use this after a run where STEP1 completed and STEP2 failed; fix the problem and resubmit with RESTART=STEP2 so STEP1 is not repeated. If STEP2 also completed and only STEP3 failed, RESTART=STEP3 avoids repeating STEP2 as well.

Restarting at a Step Inside a Procedure

When the step you want is inside a cataloged procedure, you must identify both the EXEC step that calls the procedure and the step name within the procedure. Use RESTART=stepname.procstepname. The first name (stepname) is the label of the EXEC that invokes the procedure (e.g. //STEP1 EXEC PROC=MYPROC). The second (procstepname) is the step name as defined inside the procedure. The system then restarts at that procedure step; any steps in the procedure before procstepname are skipped for that invocation.

```jcl
//BATCH JOB 100,RESTART=STEP1.STEP07
//STEP1 EXEC PROC=PAYPROC
//STEP2 EXEC PGM=REPORT
```

Here, STEP1 invokes PAYPROC. Restart at STEP07 inside PAYPROC is specified by RESTART=STEP1.STEP07. Steps before STEP07 within the procedure are skipped; STEP2 runs after the procedure completes.
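To make the two-part name concrete, here is a minimal sketch of what the PAYPROC member might contain. Only the step name STEP07 comes from the example; the other step names (STEP06, STEP08) and the program names (PAYCALC, PAYPRINT, PAYARCH) are hypothetical and chosen purely for illustration.

```jcl
//PAYPROC  PROC
//* Hypothetical contents of the cataloged procedure PAYPROC.
//* STEP06 is skipped when the job carries RESTART=STEP1.STEP07.
//STEP06   EXEC PGM=PAYCALC
//* Execution resumes at the beginning of this step.
//STEP07   EXEC PGM=PAYPRINT
//* Later procedure steps run as usual after STEP07.
//STEP08   EXEC PGM=PAYARCH
```

The name before the period (STEP1) is the label on the EXEC statement in the job, and the name after it (STEP07) is the label inside the procedure.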

Checkpoint Restart: RESTART=(stepname,checkpoint-id)

Some programs write checkpoints (save state) so that if the job fails, you can resume from that point instead of from a step boundary. To restart from a checkpoint, specify RESTART=(stepname,checkpoint-id). Stepname is the job step (or stepname.procstepname) in which the checkpoint was taken, and checkpoint-id identifies which checkpoint to use (e.g. C0000007). The resubmitted job also needs a SYSCHK DD statement that points to the checkpoint data set, placed before the first EXEC statement. The program must support checkpoint/restart (e.g. the CHKPT macro in assembler, or COBOL checkpoint/restart). Not all programs support this; step-level restart (RESTART=stepname) is more common.
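A checkpoint restart might be coded roughly as follows. This is a sketch under stated assumptions: the first RESTART subparameter names the step in which the checkpoint was taken, the checkpoint data set name MY.CHKPT.DATA is hypothetical, and the data set is assumed to be cataloged. The program names are placeholders.

```jcl
//MYJOB    JOB (ACCT),'CHKPT RESTART',RESTART=(STEP2,C0000007)
//* SYSCHK identifies the checkpoint data set written by the earlier run.
//* MY.CHKPT.DATA is a hypothetical, cataloged data set name.
//SYSCHK   DD DSN=MY.CHKPT.DATA,DISP=OLD
//* STEP1 is skipped on this run.
//STEP1    EXEC PGM=PROG1
//* Execution resumes inside STEP2 at checkpoint C0000007.
//* The DD statements that PROG2 needs are omitted from this sketch.
//STEP2    EXEC PGM=PROG2
```

The SYSCHK DD statement sits between the JOB statement and the first EXEC statement, so it belongs to the job as a whole and not to any one step.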

RD Parameter and Restart Behavior

The RD (Restart Definition) parameter controls whether the system may restart the job (or step) automatically and whether checkpoints are taken. It has four values: RD=R allows automatic restart and checkpoints; RD=RNC allows automatic step restart but suppresses checkpoints; RD=NR allows checkpoints but no automatic restart; RD=NC allows neither. RD on the JOB statement overrides RD on every EXEC statement in the job. When you want to prevent automatic restart and checkpointing (e.g. for a one-off test), code RD=NC. When you want normal restart capability, do not set RD=NC. See the RD quick reference for details of each value.

```jcl
//RUN JOB (X,Y),'RUN',RD=NC
//STEP1 EXEC PGM=PGM1
```

RD=NC on the JOB statement suppresses automatic restart and checkpoints for the whole job. Use it when you do not want the system to restart the job automatically or when you are debugging.
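RD can also be coded per step when the JOB statement carries no RD of its own. The following sketch shows two steps with different settings; PGM2 is a hypothetical program name added for illustration.

```jcl
//RUN      JOB (X,Y),'RUN'
//* RD=R: automatic step restart is allowed and checkpoints may be taken.
//STEP1    EXEC PGM=PGM1,RD=R
//* RD=NR: checkpoints may be taken, but there is no automatic restart.
//STEP2    EXEC PGM=PGM2,RD=NR
```

Because an RD on the JOB statement takes precedence over the EXEC-level values, this per-step arrangement only has an effect when the JOB statement omits RD.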

Step-by-Step: Restarting After a Failure

  1. Identify the step where the job failed (from the job log or JES output). Decide which step to restart from: usually the failed step, that is, the first step after the last successful one.
  2. Ensure the cause of the failure is fixed (e.g. data set available, correct DISP, program or resource fixed). Restarting without fixing the cause may cause the same failure.
  3. Add or change the JOB statement to include RESTART=stepname (or RESTART=stepname.procstepname if the step is inside a procedure). Use the exact step name as in your JCL.
  4. Submit the job. The system will skip all steps before the restart step and begin execution there. Check that DD statements and data sets for the restart step and later steps are correct (e.g. DISP=OLD or SHR for existing data sets that were created in a previous step).

Data Sets and Restart

On a restart run, the skipped steps do not execute, so they do not recreate or update their output data sets. Any step at or after the restart point that reads those data sets depends on what the earlier run left behind. You must either (1) ensure those data sets still exist from the previous run (e.g. they were cataloged and not deleted) and refer to them with DISP=SHR or OLD as appropriate, or (2) restart from a step that does not depend on outputs of skipped steps.
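The dependency can be illustrated with a short sketch. The data set name MY.STEP1.OUTPUT, the ddnames, and the space allocation are hypothetical; the point is only that STEP1 catalogs its output so that a later restart at STEP2 can still find it.

```jcl
//MYJOB    JOB (ACCT),'RESTART DATA',RESTART=STEP2
//* STEP1 is skipped on this run. On the earlier run it ended normally,
//* so its output data set was cataloged and still exists.
//STEP1    EXEC PGM=PROG1
//OUT1     DD DSN=MY.STEP1.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(5,1))
//* STEP2 reads the data set left behind by the earlier run of STEP1.
//STEP2    EXEC PGM=PROG2
//IN1      DD DSN=MY.STEP1.OUTPUT,DISP=SHR
```

Had STEP1 written a temporary data set (for example &&TEMP with DISP=(NEW,PASS)), it would not have survived the end of the earlier job, and a restart at STEP2 would have nothing to read.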

Best Practices

Test Your Knowledge

1. RESTART=STEP3 on the JOB statement means:

  • Run only STEP3
  • Start the job at STEP3 and skip earlier steps
  • Run STEP3 three times
  • Invalid

2. To restart at a step inside a cataloged procedure you use:

  • RESTART=STEP07
  • RESTART=STEP1.STEP07
  • RESTART=PROC(STEP07)
  • Only RESTART=STEP1

3. RD=NC is used to:

  • Force restart
  • Disable restart and checkpoint
  • Name the checkpoint
  • Run step 0