This page walks through the execution flow of a DFSORT step: what happens from the moment the step starts until it ends. Knowing this order helps you understand when datasets are opened, when control statements take effect, and how to interpret messages and abends.
When the job scheduler runs your step (e.g. EXEC PGM=SORT), the DFSORT program is loaded. The first thing DFSORT does is open and read the SYSIN dataset. SYSIN contains your control statements (SORT FIELDS=, INCLUDE, OMIT, INREC, OUTREC, OUTFIL, OPTION, MERGE, etc.). DFSORT parses these statements to determine which operation to perform (sort, merge, or copy), which key fields and order to use, which records to keep or drop, and how records are to be reformatted.
If SYSIN is missing, empty, or contains invalid syntax, DFSORT can abend or issue error messages and stop. So the very first part of execution is "read and understand the control statements." Nothing is read from SORTIN until this is done.
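The "parse first, read data later" idea can be illustrated with a toy model. The sketch below is Python, not DFSORT; `parse_sysin` and `KNOWN_STATEMENTS` are hypothetical names, and real control-statement syntax (continuation lines, comments, labels) is much richer than this.

```python
# Toy model only: this is not how DFSORT parses, just the ordering idea.
KNOWN_STATEMENTS = {"SORT", "MERGE", "OPTION", "INCLUDE", "OMIT",
                    "INREC", "OUTREC", "OUTFIL", "SUM"}

def parse_sysin(text):
    """Return a list of (statement, operands) pairs, or raise ValueError."""
    statements = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, _, operands = line.partition(" ")
        if name not in KNOWN_STATEMENTS:
            raise ValueError(f"unrecognized control statement: {name}")
        statements.append((name, operands.strip()))
    if not statements:
        raise ValueError("SYSIN is empty")
    return statements

sysin = """
  SORT FIELDS=(1,10,CH,A)
  INCLUDE COND=(11,2,CH,EQ,C'NY')
"""
parsed = parse_sysin(sysin)  # must succeed before any input record is touched
```

In this model a bad or empty SYSIN raises before any data is read, which mirrors the point above: a control-statement problem stops the step before SORTIN is processed.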
After SYSIN is parsed, DFSORT opens the input and output datasets. It opens the dataset(s) allocated to the SORTIN ddname (or SORTIN01, SORTIN02, and so on for MERGE inputs). It also opens the SORTOUT dataset and any OUTFIL ddnames that were defined in the control statements. Opening happens before the input phase so that read/write operations are ready. If a required DD is missing (e.g. no SORTIN or no SORTOUT), DFSORT typically fails at this point with an error message naming the undefined ddname, and the step ends with a non-zero return code.
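The input phase runs next. DFSORT reads records from SORTIN (or from the merge inputs for MERGE). For each record, it first applies the INCLUDE or OMIT test (if specified) and discards records that fail it, then applies INREC reformatting (if specified) to the records that remain, and passes the result to the next phase. Because filtering comes before INREC, INCLUDE/OMIT conditions refer to positions in the original input record, not the reformatted one.
This continues until all input records have been read. So the input phase is "read everything, optionally filter and reformat." The set of records that survive and their layout are now fixed for the next phase.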
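As a rough illustration of the input phase, here is a toy Python model. It is not DFSORT code: `input_phase`, `include`, and `inrec` are hypothetical names. It applies the INCLUDE/OMIT test before INREC, which is the order DFSORT uses.

```python
def input_phase(records, include=None, inrec=None):
    """Yield records that pass the filter, reformatted if requested."""
    for record in records:
        if include is not None and not include(record):
            continue  # dropped here, so it never reaches the sort phase
        if inrec is not None:
            record = inrec(record)  # reformat only the records that were kept
        yield record

sortin = ["NY0003alice", "CA0001bob  ", "NY0001carol"]
kept = list(input_phase(
    sortin,
    include=lambda r: r[0:2] == "NY",  # similar in spirit to INCLUDE COND=(1,2,CH,EQ,C'NY')
    inrec=lambda r: r[2:6] + r[0:2],   # similar in spirit to INREC BUILD=(3,4,1,2)
))
```

Here `kept` holds only the two NY records, each shortened to six bytes. That is the sense in which the surviving records and their layout are fixed before sorting begins.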
Next, DFSORT runs the sort phase or merge phase (or skips it if OPTION COPY was specified).
If SUM was specified, records with equal keys are collapsed during or immediately after this phase: the numeric fields named on SUM are added together (or, with SUM FIELDS=NONE, duplicates are simply dropped), so that the stream going to the output phase normally has one record per key. SUM adds fields; it does not compute minimum or maximum values. After this phase, the logical stream of records is in final order and ready to be written.
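The sort-then-collapse behaviour can be sketched with a small Python model. This is an illustration under simplifying assumptions, not DFSORT logic: records are `(key, amount)` tuples, and `sort_phase` and `sum_phase` are hypothetical names. Overflow handling is ignored.

```python
def sort_phase(records, copy=False):
    """Order records by key; copy=True keeps input order, like OPTION COPY."""
    return list(records) if copy else sorted(records, key=lambda r: r[0])

def sum_phase(ordered):
    """Collapse adjacent records with equal keys, adding their amounts."""
    result = []
    for key, amount in ordered:
        if result and result[-1][0] == key:
            # Same key as the previous record: fold the amount into it.
            result[-1] = (key, result[-1][1] + amount)
        else:
            result.append((key, amount))
    return result

records = [("B", 5), ("A", 1), ("B", 2), ("A", 10)]
ordered = sort_phase(records)  # records with equal keys become adjacent
summed = sum_phase(ordered)    # one record per key
```

Collapsing only adjacent records works because the sort has already placed equal keys next to each other.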
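The output phase runs next. For each record in the ordered stream, DFSORT applies OUTREC reformatting (if specified), then writes the record to SORTOUT and to any OUTFIL datasets for which it qualifies.
This continues until all records in the ordered stream have been written. So the output phase is "reformat if needed, then write to SORTOUT and OUTFIL."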
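A minimal Python sketch of the output phase follows. It is a toy model with hypothetical names (`output_phase`, `outrec`, `outfils`). It assumes every record goes to SORTOUT and that each OUTFIL dataset has its own selection test. Real OUTFIL processing offers far more (its own reformatting, headers, trailers, splitting).

```python
def output_phase(ordered, outrec=None, outfils=None):
    """Return {ddname: [records]}; outfils maps a ddname to a predicate."""
    outfils = outfils or {}
    outputs = {"SORTOUT": []}
    for ddname in outfils:
        outputs[ddname] = []
    for record in ordered:
        if outrec is not None:
            record = outrec(record)  # reformat once, before any write
        outputs["SORTOUT"].append(record)
        for ddname, wanted in outfils.items():
            if wanted(record):
                outputs[ddname].append(record)
    return outputs

outputs = output_phase(
    ["A001", "B002", "A003"],
    outrec=lambda r: r + "|OK",
    outfils={"OUT1": lambda r: r.startswith("A")},
)
```

In this model the OUTFIL predicate sees the record after OUTREC has been applied, which matches the order in which the statements are processed.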
After the last record is written, DFSORT closes all datasets: SORTIN, SORTOUT, and any OUTFIL and sortwork datasets. It then writes summary messages to SYSOUT (the DD used for messages, often SYSOUT=*). These messages typically include the number of records read and the number written.
The message prefix ICE identifies DFSORT. After the messages are written, the DFSORT program returns control to the job scheduler. The step is complete; the next step in the job (if any) runs, or the job ends.
In sequence: (1) Step starts, load DFSORT. (2) Read and parse SYSIN. (3) Open SORTIN, SORTOUT, OUTFIL DDs. (4) Input phase: read each record from SORTIN, apply INCLUDE/OMIT, apply INREC to the kept records, pass them to the next phase. (5) Sort/merge phase: reorder (or skip for COPY), optionally SUM. (6) Output phase: for each record, apply OUTREC, write to SORTOUT and OUTFIL. (7) Close datasets, write summary messages, step ends.
If SORTIN has zero records, the input phase reads nothing, the sort/merge phase has nothing to order, and the output phase writes nothing. By default, DFSORT still completes normally. SORTOUT will be empty (OUTFIL datasets may still contain headers or trailers if these were defined). Messages will show 0 records read and 0 written. This is valid and often occurs when a prior step produced no data. A similar result, with a non-zero input count but 0 records written, occurs when INCLUDE/OMIT filters out every record.
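The whole flow, including the empty-input case, can be traced with a compact toy model in Python. `run_step` and its summary dictionary are hypothetical and only mimic the record counts DFSORT reports. The return code of 0 for empty input reflects default behaviour.

```python
def run_step(sortin, include=None, key=None, copy=False):
    """Toy end-to-end model: filter, order, write, then summarize."""
    kept = [r for r in sortin if include is None or include(r)]  # input phase
    ordered = kept if copy else sorted(kept, key=key)            # sort phase (skipped for copy)
    sortout = list(ordered)                                      # output phase
    summary = {"read": len(sortin), "written": len(sortout), "rc": 0}
    return sortout, summary

normal_out, normal_summary = run_step(["b", "c", "a"])
empty_out, empty_summary = run_step([])
filtered_out, filtered_summary = run_step(["b", "a"], include=lambda r: r == "z")
```

With an empty input every phase still runs, each over nothing, and the summary reports zero records read and written. When the filter rejects everything, the read count is non-zero but the written count is still zero.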
Errors can occur at different points. SYSIN errors (syntax, invalid key, etc.) are usually detected during the SYSIN parse; the step may fail before reading SORTIN. Dataset errors (a missing DD, or space abends such as B37 or D37) can occur during open or during read/write. Data or logic errors (e.g. invalid format, SUM overflow) may be detected during the input, sort, or output phase. When DFSORT detects a severe error, it typically writes an ICE error message to SYSOUT and ends with return code 16, or abends if an abend was requested by option or installation default. So the execution flow can "short-circuit" at any stage if something goes wrong.
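The short-circuit behaviour can be sketched as a list of stages run in order, stopping at the first failure. This is a toy Python model: `run_stages` and the stage names are hypothetical, and the message text is invented for illustration. Only the return code of 16 for an unsuccessful run comes from DFSORT.

```python
def run_stages(stages):
    """Run (name, action) pairs in order; stop at the first failure."""
    completed = []
    for name, action in stages:
        try:
            action()
        except RuntimeError as err:
            # Later stages are never attempted once one stage fails.
            return {"completed": completed, "failed": name,
                    "message": str(err), "rc": 16}
        completed.append(name)
    return {"completed": completed, "failed": None, "message": "", "rc": 0}

def open_datasets():
    # Illustrative failure: pretend the SORTIN DD statement is missing.
    raise RuntimeError("SORTIN not defined")

result = run_stages([
    ("parse SYSIN", lambda: None),
    ("open datasets", open_datasets),
    ("input phase", lambda: None),
    ("sort phase", lambda: None),
    ("output phase", lambda: None),
])
```

Here `result["completed"]` contains only the SYSIN parse. The open failed, so the input, sort, and output stages never ran, and this tells you how far a failing step got.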
Think of DFSORT as a worker with a checklist. First the worker reads the checklist (SYSIN): what to do, what order to put things in, what to throw away. Then the worker opens the inbox (SORTIN) and the outbox (SORTOUT). Then the worker takes each paper from the inbox, maybe throws it away or fixes it up (INCLUDE/OMIT, then INREC), then puts the kept papers in order (sort or merge) or doesn't (copy). Then the worker writes each paper into the outbox, maybe copying it in a new format (OUTREC). When everything is done, the worker closes the boxes and writes a note saying how many papers were handled. That note is the ICE message. If something is wrong on the checklist or with the boxes, the worker might stop early and leave an error note.
1. When does DFSORT read the SYSIN control statements?
2. What happens if SORTIN is empty?
3. When are ICE messages (e.g. ICE000I) written to SYSOUT?
4. In execution order, when does the sort phase run?
5. What does DFSORT do after the last record is written to SORTOUT?