Job restart lets you run a failed (or cancelled) job again starting from a specific step instead of from the beginning. You put the RESTART= parameter on the JOB statement and give the step name (or step.procstep for steps inside a procedure). The system skips all steps before that point so you do not re-execute completed work or duplicate output. For programs that support checkpoint/restart, you can also restart from a checkpoint ID. The RD parameter controls whether the system may restart the job automatically and whether checkpoints are written.
Imagine a list of chores: make bed, brush teeth, eat breakfast, go to school. If you finished the first two and then had to stop, tomorrow you do not start over from "make bed"—you start from "eat breakfast." Job restart is like that: the job has a list of steps; if it failed at step 5, you tell the system "restart at step 5" (or the next one) so steps 1–4 are not run again. That saves time and avoids doing the same work twice.
RESTART= is a parameter of the JOB statement. It tells the system which step to run first when the job is submitted. All steps before that step are skipped. The job must be the same job (same JCL structure) as the one that failed or was stopped; you are not creating a new job from scratch, you are resuming. Syntax: RESTART=stepname for a step in the job, or RESTART=stepname.procstepname for a step inside a cataloged procedure invoked by stepname.
| Form | Meaning |
|---|---|
| RESTART=stepname | Restart at the beginning of stepname. Stepname is a job step (EXEC) in this job. |
| RESTART=stepname.procstepname | Restart at procstepname inside the procedure invoked by stepname. For cataloged procedures. |
| RESTART=(stepname,checkid) | Restart from a checkpoint. Stepname (or stepname.procstepname) is the step in which the checkpoint was taken; checkid identifies the checkpoint. |
| Omit RESTART | Job runs from the first step (normal run). |
When the step you want is a direct step in your JCL (not inside a procedure), use RESTART=stepname. The step name is the label on the EXEC statement, which begins in column 3, immediately after the //. For example, if your job has //STEP1 EXEC ..., //STEP2 EXEC ..., //STEP3 EXEC ..., then RESTART=STEP3 means STEP1 and STEP2 are skipped and execution begins at STEP3. STEP3 and any following steps run normally (subject to COND and other parameters).
```
//MYJOB    JOB (ACCT),'JOB RESTART',RESTART=STEP2
//STEP1    EXEC PGM=PROG1
//DD1      DD DSN=FILE1,DISP=SHR
//STEP2    EXEC PGM=PROG2
//DD2      DD DSN=FILE2,DISP=OLD
//STEP3    EXEC PGM=PROG3,COND=(4,LT)
```
With RESTART=STEP2, STEP1 is skipped. STEP2 and STEP3 run. Use this after a run where STEP1 completed and STEP2 or STEP3 failed; fix the problem and resubmit with RESTART=STEP2 so STEP1 is not repeated.
When the step you want is inside a cataloged procedure, you must identify both the EXEC step that calls the procedure and the step name within the procedure. Use RESTART=stepname.procstepname. The first name (stepname) is the label of the EXEC that invokes the procedure (e.g. //STEP1 EXEC PROC=MYPROC). The second (procstepname) is the step name as defined inside the procedure. The system then restarts at that procedure step; any steps in the procedure before procstepname are skipped for that invocation.
```
//BATCH    JOB 100,RESTART=STEP1.STEP07
//STEP1    EXEC PROC=PAYPROC
//STEP2    EXEC PGM=REPORT
```
Here, STEP1 invokes PAYPROC. Restart at STEP07 inside PAYPROC is specified by RESTART=STEP1.STEP07. Steps before STEP07 within the procedure are skipped; STEP2 runs after the procedure completes.
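To show where the second name in RESTART=STEP1.STEP07 comes from, here is what a hypothetical PAYPROC member in a procedure library might look like. Only the step name STEP07 comes from the example above; STEP05, STEP06 and the program names PAYCALC, PAYSORT and PAYPRINT are invented for illustration.

```
//PAYPROC  PROC
//* Hypothetical cataloged procedure: these EXEC labels are the
//* procstepnames that the second part of RESTART= refers to
//STEP05   EXEC PGM=PAYCALC
//STEP06   EXEC PGM=PAYSORT
//STEP07   EXEC PGM=PAYPRINT
```

With RESTART=STEP1.STEP07, STEP05 and STEP06 are skipped, STEP07 runs, and then STEP2 of the BATCH job runs.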
Some programs write checkpoints (saved state) so that, if the job fails, you can resume from that point instead of from a step boundary. For a checkpoint restart you code RESTART=(stepname,checkid). The first subparameter is the step (or stepname.procstepname) in which the checkpoint was taken, not a job name; checkid identifies which checkpoint to use (e.g. C0000007, the form of a system-generated checkpoint ID). The resubmitted job also needs a SYSCHK DD statement that identifies the checkpoint data set. The program must support checkpoint/restart (e.g. the CHKPT macro in assembler or the RERUN clause in COBOL). Not all programs do; step-level restart (RESTART=stepname) is more common.
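A minimal sketch of a resubmitted job for a checkpoint restart, assuming the earlier run of MYJOB took checkpoint C0000007 in STEP2 and wrote it to a data set named MY.CHKPT.DATA. The data set name and program names are hypothetical. RESTART names the step and the checkpoint; SYSCHK tells the system where the checkpoint records are.

```
//MYJOB    JOB (ACCT),'CKPT RESTART',RESTART=(STEP2,C0000007)
//* SYSCHK names the checkpoint data set from the earlier run;
//* it goes before the first EXEC (after any JOBLIB DD)
//SYSCHK   DD DSN=MY.CHKPT.DATA,DISP=OLD
//STEP1    EXEC PGM=PROG1
//STEP2    EXEC PGM=PROG2
//* STEP2's DD statements, as in the original run (omitted here)
```

STEP1 is skipped, and PROG2 resumes from the saved state rather than from the start of STEP2. This only works if checkpoints were actually written, so the original run must not have suppressed them with RD=NC or RD=RNC.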
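The RD (restart definition) parameter controls whether the system can restart the job (or step) automatically after an abend and whether checkpoints are written. Its values are R (automatic restart allowed, checkpoints allowed), RNC (automatic step restart allowed, no checkpoints), NR (no automatic restart, checkpoints allowed) and NC (no automatic restart, no checkpoints). RD on the JOB statement applies to every step and overrides any RD on EXEC statements. RD governs automatic restart: you can still resubmit the job with RESTART=stepname, although with RNC or NC there are no checkpoints to restart from. When you want to prevent automatic restart and checkpointing (e.g. for a one-off test), you can code RD=NC. When you want normal restart capability, do not code RD=NC. See the RD quick reference for details.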
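```
//RUN      JOB (X,Y),'RUN',RD=NC
//STEP1    EXEC PGM=PGM1
```
RD=NC on the JOB statement suppresses both automatic restart and checkpoints for the whole job. Use it when you do not want the job restarted automatically, for example while debugging. (RD=RNC, by contrast, suppresses only checkpoints; automatic step restart is still allowed.)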
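RD can also be coded on individual EXEC statements when the JOB statement has no RD. A small sketch with a hypothetical job name (RUN2) and program names:

```
//RUN2     JOB (X,Y),'RUN2'
//* No RD on the JOB statement, so each EXEC-level RD applies
//STEP1    EXEC PGM=PGM1,RD=NC
//STEP2    EXEC PGM=PGM2,RD=R
```

Here STEP1 is neither checkpointed nor restarted automatically, while STEP2 allows both. If RD were added to the JOB statement of RUN2, the system would ignore both EXEC-level values and apply the JOB value to every step.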
Skipped steps do not run in the restarted job, so they create or update nothing in the new run. Any step that runs after the restart point may still need data sets produced by those skipped steps. You must either: (1) make sure those data sets still exist from the previous run (they were not deleted, for example because they are cataloged) and refer to them with DISP=SHR or DISP=OLD as appropriate, or (2) restart from a step that does not depend on outputs of skipped steps.
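As an illustration of option (1), here is a hypothetical job in which STEP1 completed in the first run and cataloged its output, so a restart at STEP2 can still find that data set. The job, data set and program names, and the SPACE and UNIT values, are assumptions chosen for the example.

```
//PAYJOB   JOB (ACCT),'RESTART DATA',RESTART=STEP2
//* STEP1 completed in the first run and cataloged its output
//STEP1    EXEC PGM=EXTRACT
//OUTDD    DD DSN=MY.EXTRACT.DATA,DISP=(NEW,CATLG,DELETE),
//            SPACE=(TRK,(10,5)),UNIT=SYSDA
//* STEP2 finds the data set through the catalog on restart
//STEP2    EXEC PGM=REPORT
//INDD     DD DSN=MY.EXTRACT.DATA,DISP=SHR
```

By contrast, if STEP1 had written a temporary data set such as &&EXTRACT with DISP=(NEW,PASS), it would have been deleted when the first run ended, and a restart at STEP2 would have nothing to read.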
1. RESTART=STEP3 on the JOB statement means:
2. To restart at a step inside a cataloged procedure you use:
3. RD=NC is used to: