Understanding DFSORT architecture means understanding how a sort/merge step is put together: the JCL that runs it, the datasets it uses, and the control statements that define the operation. This page gives you a high-level picture of the components and how data flows through DFSORT.
Here, architecture means the main building blocks of a DFSORT run: the program that runs, the datasets it reads and writes, and the instructions (control statements) that tell it what to do. We are not describing IBM's internal code design—we are describing the way you assemble a DFSORT job and how data moves through it.
A DFSORT step is built from three layers that work together.
The JCL defines the step and the DD statements. The EXEC statement says which program to run (usually PGM=SORT or PGM=ICEMAN). The DD statements tell the system where the input data is, where the output goes, where the control statements are, and where to put messages. Optional DDs (e.g. STEPLIB, SORTWK01) can point to the sort product library and to work datasets. So the JCL is the "plumbing": it connects the step to the right program and to the right datasets.
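DFSORT uses fixed DD names so it knows what each dataset is for:
SORTIN: the input dataset for a SORT.
SORTIN01, SORTIN02, ...: the input datasets for a MERGE.
SORTOUT: the main output dataset.
SYSIN: the control statements.
SORTWKnn: optional work datasets.
SYSOUT: DFSORT messages, typically coded as SYSOUT=* so messages go to the job log.
The control statements are the "brain" of the step. They tell DFSORT whether to SORT or MERGE, which keys to use, whether to filter (INCLUDE/OMIT), how to reformat before the sort (INREC) or after it (OUTREC), whether to write multiple outputs (OUTFIL), and which options apply (OPTION). So: JCL wires the step to datasets; control statements define the operation on that data.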
Data moves through DFSORT in three logical stages. The exact order depends on which control statements you use, but the idea is always: read input, process it, write output.
DFSORT reads records from SORTIN (for SORT) or from SORTIN01, SORTIN02, ... (for MERGE). During input, it can filter records with INCLUDE or OMIT and reformat the records it keeps with INREC.
So the "input" to the sort/merge engine is not necessarily the raw SORTIN record—it can be a filtered and reformatted version.
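As a rough sketch of the input stage, the step below filters and shortens records before sorting them. The dataset names, the 'NY' code in positions 1-2, and the assumption that MY.INPUT holds 80-byte fixed-length records are illustrative, not taken from a real job.

```jcl
//FILTER   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.FILTERED,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
* Keep only records with 'NY' in positions 1-2 of the input record
  INCLUDE COND=(1,2,CH,EQ,C'NY')
* Keep the first 40 bytes; the sort sees this 40-byte record
  INREC BUILD=(1,40)
* Key positions refer to the record as rebuilt by INREC
  SORT FIELDS=(3,20,CH,A)
/*
```

INCLUDE tests positions in the original input record, while SORT FIELDS refers to positions in the record produced by INREC. In this sketch the two layouts line up because INREC only truncates the record.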
For SORT, DFSORT orders all records by the key(s) you specify in SORT FIELDS. It may use memory, work datasets (SORTWKnn DD statements or dynamically allocated work datasets), or both to hold and order the data. For MERGE, it assumes each input is already sorted by the same key and merges the streams into one sorted output. For COPY (OPTION COPY), there is no reordering: records are passed through (after any input filtering and reformatting) to the output stage.
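A minimal sketch of a MERGE step, assuming two hypothetical datasets (MY.SORTED.A and MY.SORTED.B) that are each already in ascending order on positions 1-20:

```jcl
//MERGE1   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.A,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.B,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5))
//SYSIN    DD *
* Both inputs must already be sorted on this key
  MERGE FIELDS=(1,20,CH,A)
/*
```

The key in MERGE FIELDS has to match the order the inputs are already in; MERGE combines the sorted streams, it does not sort them.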
After sort/merge (and optional SUM for duplicate handling or aggregation), records are prepared for output. OUTREC can change the layout (reorder fields, add constants, edit numbers). OUTFIL can create multiple outputs: different datasets, different INCLUDE/OMIT or reformat per output, SPLIT, report layouts, etc. So you can have one SORTOUT and several OUTFIL-defined outputs from a single step.
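The output stage can be sketched with two OUTFIL statements that split one sorted stream into two datasets. The DD names OUTNY and OUTREST, the dataset names, and the code in positions 21-22 are hypothetical choices for illustration.

```jcl
//SPLIT1   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//OUTNY    DD DSN=MY.OUTPUT.NY,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//OUTREST  DD DSN=MY.OUTPUT.REST,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
  SORT FIELDS=(1,20,CH,A)
* Records with 'NY' in positions 21-22 go to the OUTNY DD
  OUTFIL FNAMES=OUTNY,INCLUDE=(21,2,CH,EQ,C'NY')
* SAVE writes the records no other OUTFIL selected
  OUTFIL FNAMES=OUTREST,SAVE
/*
```

FNAMES connects each OUTFIL statement to a DD name in the JCL, so this sketch needs no SORTOUT DD.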
The following JCL and control statements show the minimal architecture for a sort:
```jcl
//SORT1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  SORT FIELDS=(1,20,CH,A)
/*
```
JCL: One step (SORT1), program SORT, SYSOUT for messages, SORTIN for input, SORTOUT for output, SYSIN for control statements.
Control: One statement, SORT FIELDS, so records are sorted on positions 1-20, treated as character data, in ascending order. There is no INREC, INCLUDE, OMIT, OUTREC, or OUTFIL, so the flow is: read from SORTIN → sort by (1,20,CH,A) → write the same record to SORTOUT.
OPTION controls global behavior. For example, OPTION COPY means "do not sort": records are only copied (with optional INCLUDE/OMIT, INREC, OUTREC, OUTFIL). OPTION EQUALS preserves the original input order of records with equal keys, while NOEQUALS does not guarantee that order. So in the architecture, OPTION sits alongside SORT/MERGE and modifies how the sort/merge phase (or copy path) behaves.
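A small sketch of the copy path, assuming a hypothetical input in which records with an asterisk in position 1 should be dropped:

```jcl
//COPY1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.COPY,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
* No reordering: records pass through in input order
  OPTION COPY
* Drop records that have an asterisk in position 1
  OMIT COND=(1,1,CH,EQ,C'*')
/*
```

For a real sort, OPTION EQUALS could be coded next to a SORT FIELDS statement in the same way when records with equal keys must keep their input order.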
Other statements fit into the flow: RECORD can define record length or type; SUM applies after sort/merge to collapse duplicates or aggregate; JOINKEYS and REFORMAT define a join between two inputs, which is a different flow (two inputs, match by key, build output from REFORMAT). So the architecture expands when you use these—same idea (input → process → output), but with more inputs or different processing.
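The join flow can be sketched as follows. The dataset names, the 10-byte key in positions 1-10 of both files, and the field positions in REFORMAT are assumptions made for illustration; the DD names SORTJNF1 and SORTJNF2 are the ones DFSORT associates with FILE=F1 and FILE=F2.

```jcl
//JOIN1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTJNF1 DD DSN=MY.CUSTOMERS,DISP=SHR
//SORTJNF2 DD DSN=MY.ORDERS,DISP=SHR
//SORTOUT  DD DSN=MY.JOINED,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
* Match the two inputs on positions 1-10, ascending
  JOINKEYS FILE=F1,FIELDS=(1,10,A)
  JOINKEYS FILE=F2,FIELDS=(1,10,A)
* Build each output record from fields of both inputs
  REFORMAT FIELDS=(F1:1,80,F2:11,30)
* Write the joined records without sorting them again
  OPTION COPY
/*
```

With no JOIN statement coded, only records whose keys match in both files are written. Each output record here is 110 bytes: 80 from the first file followed by 30 from the second.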
DFSORT is like a machine with three parts: a door where papers (records) come in, a desk where someone sorts or stacks them in order, and a door where the sorted papers go out. The JCL tells the computer where the papers are and where to put the result. The control statements are the instructions on the desk: "sort by name," "throw away the blue ones," "only write the first three columns." So the architecture is: where things go in, what the machine does in the middle, and where things come out.
1. What defines the input and output of a DFSORT step?
2. Where do DFSORT control statements come from?
3. In a SORT (not MERGE) job, which DD is used for the single input file?
4. What is the logical order of processing in a typical DFSORT sort job?
5. What role does the OPTION statement play in DFSORT architecture?