What is the DFSORT execution flow?

The DFSORT execution flow is: (1) the step starts, SYSIN is read, and the control statements are parsed; (2) input phase: SORTIN is read, INCLUDE/OMIT filters the records, and INREC reformats the ones that are kept; (3) sort/merge phase: records are sorted or merged (this phase is skipped for OPTION COPY); (4) output phase: OUTREC is applied and records are written to SORTOUT and any OUTFIL datasets; (5) datasets are closed, summary messages are written, and the step ends.

When does DFSORT read SYSIN?

DFSORT reads SYSIN at the beginning of the step, before it reads any data from SORTIN. All control statements must be parsed so DFSORT knows the processing options, sort/merge keys, and reformatting rules for the rest of the step.

Can DFSORT run with an empty input file?

Yes. If SORTIN is empty (zero records), DFSORT completes normally. It writes zero records to SORTOUT and prints messages indicating 0 records processed. The step does not abend.

In what order does DFSORT process records?

Records are read from SORTIN (input phase), filtered by INCLUDE/OMIT, optionally reformatted by INREC, then passed to the sort or merge phase. After ordering, records are reformatted by OUTREC (output phase) and written to SORTOUT and any OUTFIL datasets. This order is fixed, regardless of the order in which the statements appear in SYSIN.

When does a DFSORT step end?

The step ends after the output phase completes: all records are written, output datasets are closed, summary messages (e.g. ICE054I, which reports the input and output record counts) are written to SYSOUT, and the program returns. The next JCL step (if any) then runs.

DFSORT Execution Flow

This page walks through the execution flow of a DFSORT step: what happens from the moment the step starts until it ends. Knowing this order helps you understand when datasets are opened, when control statements take effect, and how to interpret messages and abends.


Step Start and SYSIN

When the job scheduler runs your step (e.g. EXEC PGM=SORT), the DFSORT program is loaded. The first thing DFSORT does is open and read the SYSIN dataset. SYSIN contains your control statements (SORT FIELDS=, INCLUDE, OMIT, INREC, OUTREC, OUTFIL, OPTION, MERGE, etc.). DFSORT parses these statements to determine:

Whether to sort, merge, or copy (and if sort/merge, what keys)
Which records to keep or drop (INCLUDE/OMIT)
How to reformat records before and after the sort (INREC, OUTREC)
What options to apply (OPTION COPY, EQUALS, SIZE, etc.)
Any OUTFIL definitions for multiple or split outputs

If SYSIN is missing, empty, or contains invalid syntax, DFSORT issues an error message and terminates, normally with return code 16 (or with an abend, depending on installation options). So the very first part of execution is "read and understand the control statements." Nothing is read from SORTIN until this is done.
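As an illustration, a minimal sort step might look like the sketch below. The step name, dataset names, space values, and field positions are hypothetical placeholders, not taken from a real job. The control statements that follow SYSIN DD * are what DFSORT parses before it touches SORTIN.

```jcl
//SORTSTEP EXEC PGM=SORT
//* SYSOUT receives the DFSORT (ICE) messages
//SYSOUT   DD SYSOUT=*
//* Hypothetical input and output dataset names
//SORTIN   DD DSN=HLQ.CUSTOMER.INPUT,DISP=SHR
//SORTOUT  DD DSN=HLQ.CUSTOMER.SORTED,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* Assumed layout: a 10-byte character key in columns 1-10
  SORT FIELDS=(1,10,CH,A)
/*
```

A line with an asterisk in column 1 inside SYSIN is a DFSORT comment statement; lines that begin with //* are JCL comments.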

Opening Datasets

After SYSIN is parsed, DFSORT opens the input and output datasets. For a sort or copy it opens the dataset(s) allocated to the SORTIN DD name; for a merge it opens SORTIN01, SORTIN02, and so on. It also opens the SORTOUT dataset and any OUTFIL DD names that were defined in the control statements. Opening happens before the input phase so that read and write operations are ready. If a required DD is missing (e.g. no SORTIN or no SORTOUT), DFSORT typically issues an error message naming the missing DD and terminates at this point.

Input Phase

The input phase runs next. DFSORT reads records from SORTIN (or from the SORTINnn datasets for MERGE). For each record:

Read: One record is read from the input dataset.
INCLUDE / OMIT: If INCLUDE or OMIT was specified, the condition is evaluated against the record as it was read. If the record is to be dropped, processing continues with the next input record.
INREC: If INREC was specified, each kept record is reformatted. The result becomes the "current" record for the rest of the step and is passed to the sort/merge phase (or directly to the output phase if OPTION COPY is used).

This continues until all input records have been read. So the input phase is "read everything, optionally filter and reformat." The set of records that survive and their layout are now fixed for the next phase.
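DFSORT applies INCLUDE/OMIT to the original input record and runs INREC afterwards, which affects the column positions each statement must use. The fragment below is a small sketch of that effect. The record layout is an assumption made up for illustration: 80-byte fixed-length input with a key in columns 1-10, an amount in columns 21-28, and a state code in columns 41-42.

```jcl
//SYSIN    DD *
* INCLUDE tests the ORIGINAL input record: state code in 41-42
  INCLUDE COND=(41,2,CH,EQ,C'NY')
* INREC shrinks each kept record to 18 bytes: key + amount
  INREC BUILD=(1,10,21,8)
* SORT sees the REFORMATTED record, where the key is in 1-10
  SORT FIELDS=(1,10,CH,A)
/*
```

After INREC the state code is no longer in the record, which is harmless here because the INCLUDE test has already run. Any later statement (SORT, SUM, OUTREC, OUTFIL) has to use the 18-byte layout.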

Sort or Merge Phase

Next, DFSORT runs the sort phase or merge phase (or skips it if OPTION COPY was specified).

SORT: Records are reordered by the keys in SORT FIELDS=. DFSORT may use in-memory sorting and/or sortwork datasets. The result is one stream of records in key order.
MERGE: The pre-sorted input streams are merged by key. At each step DFSORT compares the current record of every input stream, writes the one that comes first in key order to the output stream, and advances that stream. No full re-sort is done.
OPTION COPY: No reordering. Records are already in the order they will be written; the sort/merge phase is skipped.
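For comparison with a sort step, a merge step could be sketched as follows. The dataset names and key position are hypothetical, and both inputs are assumed to be already sorted on the same key in the same sequence.

```jcl
//MERGE1   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Each pre-sorted input gets its own SORTINnn DD
//SORTIN01 DD DSN=HLQ.SALES.EAST.SORTED,DISP=SHR
//SORTIN02 DD DSN=HLQ.SALES.WEST.SORTED,DISP=SHR
//SORTOUT  DD DSN=HLQ.SALES.MERGED,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* Same key the inputs were sorted on: columns 1-10, ascending
  MERGE FIELDS=(1,10,CH,A)
/*
```

A copy step would use a single SORTIN DD and OPTION COPY in place of the MERGE statement.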

If SUM was specified, records with equal keys are collapsed during or immediately after this phase: the numeric fields named in SUM FIELDS= are added together, or, with SUM FIELDS=NONE, duplicates are simply dropped. The stream going to the output phase then has one record per key (or as defined by SUM). After this phase, the logical stream of records is in final order and ready to be written.
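A short sketch of SUM alongside SORT, assuming a hypothetical layout with the key in columns 1-10 and a zoned-decimal amount in columns 21-28:

```jcl
//SYSIN    DD *
* Assumed layout: key in 1-10, zoned-decimal amount in 21-28
  SORT FIELDS=(1,10,CH,A)
* Collapse records with equal keys, adding up the amount field
  SUM FIELDS=(21,8,ZD)
/*
```

Replacing the SUM statement with SUM FIELDS=NONE would keep one record per key without totaling anything.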

Output Phase

The output phase runs next. For each record in the ordered stream:

OUTREC: If OUTREC was specified, the record is reformatted. The result is what will be written to the primary output.
Write to SORTOUT: The (possibly reformatted) record is written to the dataset allocated to SORTOUT.
OUTFIL: If OUTFIL was specified, DFSORT may also write to additional datasets, each with its own INCLUDE/OMIT and BUILD logic. OUTFIL works on the record as it stands after OUTREC. So one input record can produce one SORTOUT record and zero or more OUTFIL records, depending on the OUTFIL definitions.

This continues until all records in the ordered stream have been written. So the output phase is "reformat if needed, then write to SORTOUT and OUTFIL."
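The fragment below sketches how OUTREC and OUTFIL interact. It shows only the extra DD statements and SYSIN, and everything in it is hypothetical: the DD names NYOUT and REST, the dataset names, and a layout with the key in columns 1-10 and a state code in columns 41-42 of the sorted record.

```jcl
//NYOUT    DD DSN=HLQ.SALES.NY,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//REST     DD DSN=HLQ.SALES.REST,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
* OUTREC builds a 12-byte record: key (1-10) + state (41-42)
  OUTREC BUILD=(1,10,41,2)
* OUTFIL tests the record AFTER OUTREC: state is now in 11-12
  OUTFIL FNAMES=NYOUT,INCLUDE=(11,2,CH,EQ,C'NY')
* SAVE catches every record no other OUTFIL group selected
  OUTFIL FNAMES=REST,SAVE
/*
```

Because OUTFIL runs last, its INCLUDE condition uses the 12-byte layout produced by OUTREC, not the original input positions.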

Closing and Step End

After the last record is written, DFSORT closes all datasets: SORTIN, SORTOUT, and any OUTFIL and sortwork datasets. It then writes summary messages to SYSOUT (the DD used for messages, often SYSOUT=*). These messages typically include:

How many records were read from input
How many records were sorted/merged (or copied)
How many records were written to output
Any warnings or errors (message IDs ending in A report errors, e.g. ICE046A for sort capacity exceeded; IDs ending in I are informational)

The message prefix ICE identifies DFSORT. After the messages are written, the DFSORT program returns control to the job scheduler. The step is complete; the next step in the job (if any) runs, or the job ends.
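To make the hand-off to the next step concrete, here is a sketch of a two-step job fragment. MYPROG, the INFILE DD name, and the dataset names are hypothetical; the COND parameter is one common way to skip the second step when the sort did not end cleanly.

```jcl
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=HLQ.ORDERS.INPUT,DISP=SHR
//SORTOUT  DD DSN=HLQ.ORDERS.SORTED,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* STEP2 starts only after STEP1 has ended.
//* COND=(0,NE,STEP1) bypasses STEP2 unless STEP1 ended with RC=0.
//STEP2    EXEC PGM=MYPROG,COND=(0,NE,STEP1)
//INFILE   DD DSN=HLQ.ORDERS.SORTED,DISP=SHR
```

By the time STEP2 opens the sorted dataset, DFSORT has already closed it and written its summary messages.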

Execution Order in Sequence

In sequence: (1) Step starts, load DFSORT. (2) Read and parse SYSIN. (3) Open SORTIN, SORTOUT, OUTFIL DDs. (4) Input phase: read each record from SORTIN, apply INCLUDE/OMIT, apply INREC to the kept records, pass them to the next phase. (5) Sort/merge phase: reorder (or skip for COPY), optionally SUM. (6) Output phase: for each record, apply OUTREC, write to SORTOUT and OUTFIL. (7) Close datasets, write summary messages, step ends.
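The processing sequence does not depend on where each statement appears in SYSIN. As a sketch, the statements below are deliberately listed in the reverse of their processing order. The layout is hypothetical: key in columns 1-10, amount in columns 21-28, state code in columns 41-42 of the input record.

```jcl
//SYSIN    DD *
* Listed last-to-first on purpose; DFSORT still runs them as
* INCLUDE, then INREC, then SORT, then OUTREC.
  OUTREC BUILD=(1,10,C' ',11,8)
  SORT FIELDS=(1,10,CH,A)
  INREC BUILD=(1,10,21,8)
  INCLUDE COND=(41,2,CH,EQ,C'NY')
/*
```

Here INCLUDE uses input positions, SORT and OUTREC use the 18-byte record built by INREC, and the final record is 19 bytes: key, a blank, and the amount.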

What If Input Is Empty?

If SORTIN has zero records, the input phase reads nothing, the sort/merge phase has nothing to order, and the output phase writes nothing. DFSORT still completes normally. SORTOUT will be an empty dataset, and any OUTFIL datasets will contain only their headers and trailers, if defined. Messages will show 0 records read and 0 written. The same empty output results when SORTIN has records but INCLUDE/OMIT filters out every one of them. Both cases are valid and often occur when a prior step produced no data.
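If an empty output should be flagged rather than pass silently, DFSORT's NULLOUT option can raise the return code. The fragment below is a minimal sketch with a hypothetical key position; the behavior described above corresponds to a return code of 0.

```jcl
//SYSIN    DD *
* Ask DFSORT to end with return code 4 when SORTOUT gets no records
  OPTION NULLOUT=RC4
  SORT FIELDS=(1,10,CH,A)
/*
```

A later step can then test that return code and decide whether to run.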

What If There Is an Error?

Errors can occur at different points. SYSIN errors (syntax, invalid key, etc.) are usually detected during the SYSIN parse; the step may terminate before reading SORTIN. Dataset errors (a missing DD, or out-of-space abends such as B37, D37, and E37) can occur during open or during read/write. Data errors (e.g. invalid numeric data in a SUM field) may be detected during the input, sort, or output phase. When DFSORT detects a severe error, it typically writes an error message to SYSOUT and terminates with return code 16 or an abend, so the step ends with a non-zero completion code. The execution flow can therefore "short-circuit" at any stage if something goes wrong.

Explain It Like I'm Five

Think of DFSORT as a worker with a checklist. First the worker reads the checklist (SYSIN): what to do, what order to put things in, what to throw away. Then the worker opens the inbox (SORTIN) and the outbox (SORTOUT). Then the worker takes each paper from the inbox, maybe throws it away or fixes it up (INCLUDE/OMIT, INREC), then puts the kept papers in order (sort or merge) or doesn't (copy). Then the worker writes each paper into the outbox, maybe copying it in a new format (OUTREC). When everything is done, the worker closes the boxes and writes a note saying how many papers were handled. That note is the ICE message. If something is wrong on the checklist or with the boxes, the worker might stop early and leave an error note.

Exercises

In what order does DFSORT use SYSIN, SORTIN, and SORTOUT during the step?
If your job has two steps—step 1 runs DFSORT, step 2 runs a program that reads the DFSORT output—when does step 2 start?
Where in the execution flow is INCLUDE applied relative to OUTREC?
Why does DFSORT read SYSIN before reading SORTIN?

Quiz

Test Your Knowledge

1. When does DFSORT read the SYSIN control statements?

After writing SORTOUT
At the start of the step, before reading SORTIN
Only if OPTION COPY is used
At the end of the sort phase

2. What happens if SORTIN is empty?

The step abends immediately
DFSORT writes an empty SORTOUT and completes normally
DFSORT waits for data
SYSIN is ignored

3. When are ICE messages (e.g. ICE000I) written to SYSOUT?

Only at step end
Throughout the step and at step end
Only if an error occurs
Before SYSIN is read

4. In execution order, when does the sort phase run?

Before the input phase
After the input phase and before the output phase
After OUTREC
Only when MERGE is used

5. What does DFSORT do after the last record is written to SORTOUT?

Immediately ends the step
Closes datasets, writes summary messages, then ends the step
Re-reads SYSIN
Starts the sort again