What is DFSORT architecture?

DFSORT architecture is the way a sort/merge step is built: a JCL step that runs the sort program (e.g. PGM=SORT), DD statements that define input (SORTIN or SORTINnn), output (SORTOUT, OUTFIL), control input (SYSIN), and messages (SYSOUT), plus optional work datasets. Control statements in SYSIN drive the processing (SORT, MERGE, INCLUDE, INREC, OUTREC, etc.).

What are the main components of a DFSORT job?

The main components are: (1) the EXEC statement that runs the sort program, (2) DD statements for SORTIN/SORTOUT (or SORTIN01, SORTIN02... for merge), SYSIN (control statements), and SYSOUT (messages), (3) optional SORTWKnn or dynamic work datasets, and (4) the control statements in SYSIN that define the operation (SORT FIELDS, MERGE, INCLUDE, OMIT, INREC, OUTREC, OUTFIL, SUM, OPTION, etc.).

How does data flow through DFSORT?

Data flows in three logical stages: (1) Input—records are read from SORTIN (or merge inputs), optionally filtered (INCLUDE/OMIT) and reformatted (INREC). (2) Sort or merge—records are sorted by key or merged from multiple inputs. (3) Output—records are optionally reformatted (OUTREC) and written to SORTOUT and/or OUTFIL datasets.

What is the difference between SORTIN and SYSIN in DFSORT?

SORTIN is the DD that points to the input data to be sorted (the actual records). SYSIN is the DD that contains the control statements (SORT FIELDS, INCLUDE, OUTREC, etc.) that tell DFSORT how to process the data. Both are required for a typical sort step.

Can DFSORT write to more than one output?

Yes. The primary output is SORTOUT. In addition, OUTFIL control statements can define multiple output datasets or reports (e.g. FNAMES, SPLIT, different INCLUDE/OMIT or reformatting per output). So one DFSORT step can produce several files or reports.

DFSORT Architecture Overview

Understanding DFSORT architecture means understanding how a sort/merge step is put together: the JCL that runs it, the datasets it uses, and the control statements that define the operation. This page gives you a high-level picture of the components and how data flows through DFSORT.

What We Mean by "Architecture"

Here, architecture means the main building blocks of a DFSORT run: the program that runs, the datasets it reads and writes, and the instructions (control statements) that tell it what to do. We are not describing IBM's internal code design—we are describing the way you assemble a DFSORT job and how data moves through it.

The Three Layers: JCL, Datasets, and Control Statements

A DFSORT step is built from three layers that work together.

1. JCL (Job Control Language)

The JCL defines the step and the DD statements. The EXEC statement says which program to run (usually PGM=SORT or PGM=ICEMAN). The DD statements tell the system where the input data is, where the output goes, where the control statements are, and where to put messages. Optional DDs (e.g. STEPLIB, SORTWK01) can point to the sort product library and to work datasets. So the JCL is the "plumbing": it connects the step to the right program and to the right datasets.

2. Datasets (DD Names)

DFSORT uses fixed DD names so it knows what each dataset is for:

SORTIN — The single input dataset for a SORT operation. Records read from here are sorted (and optionally filtered/reformatted) before output.
SORTIN01, SORTIN02, ... — Used for MERGE. Each is a pre-sorted input; DFSORT merges them into one sorted stream.
SORTOUT — The primary output dataset. After sort or merge (and optional OUTREC), records are written here unless you use OUTFIL to redirect or add outputs.
SYSIN — The dataset (or in-stream data) that contains the control statements. DFSORT reads and parses these to know what to do.
SYSOUT — Where DFSORT writes messages (e.g. ICE000I). Usually SYSOUT=* so messages go to the job log.
SORTWK01, SORTWK02, ... — Optional work datasets that a SORT uses when the data does not fit in memory; a MERGE or COPY does not need them. DFSORT can also allocate work datasets dynamically.
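
As a rough illustration of how these DD names appear in a MERGE step, the sketch below wires two pre-sorted inputs to one output. The step name, dataset names, and key position are placeholders chosen for the example, not values from a real system.

```jcl
//* Hypothetical merge step: dataset names are placeholders.
//MERGE1   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Each SORTINnn input must already be in order on the merge key.
//SORTIN01 DD DSN=MY.INPUT.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.INPUT.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
  MERGE FIELDS=(1,20,CH,A)
/*
```

There is no SORTIN DD and no SORTWKnn DD here, because a merge reads only the SORTINnn inputs and does not use work datasets.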

3. Control Statements (SYSIN)

The control statements are the "brain" of the step. They tell DFSORT whether to SORT or MERGE, which keys to use, whether to filter (INCLUDE/OMIT), how to reformat before sort (INREC) or after (OUTREC), whether to write multiple outputs (OUTFIL), and options (OPTION). So: JCL wires the step to datasets; control statements define the operation on that data.

Logical Data Flow: Input → Process → Output

Data moves through DFSORT in three logical stages. Which steps run depends on the control statements you code, but DFSORT applies them in a fixed order regardless of how they are arranged in SYSIN: read input, process it, write output.

Input Phase

DFSORT reads records from SORTIN (for SORT) or from SORTIN01, SORTIN02, ... (for MERGE). During input, it can:

Filter — INCLUDE keeps only records that meet conditions; OMIT drops records that meet conditions. Only records that pass the filter participate in the sort or merge.
Reformat — INREC builds a new record layout before the sort. That lets you drop fields or build keys so that less data goes into the sort, which can improve performance.

So the "input" to the sort/merge engine is not necessarily the raw SORTIN record—it can be a filtered and reformatted version.
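
The input-phase behavior described above can be sketched with a SYSIN fragment like the following. The field positions, the status flag in byte 23, and the record layout are assumptions made up for this illustration.

```jcl
//* Sketch only: the field layout below is an assumption.
//SYSIN    DD *
* Keep only records whose status flag in byte 23 is 'A'.
  INCLUDE COND=(23,1,CH,EQ,C'A')
* Rebuild each kept record as state (21-22) followed by name (1-20).
  INREC BUILD=(21,2,1,20)
* These positions refer to the reformatted 22-byte record.
  SORT FIELDS=(1,2,CH,A,3,20,CH,A)
/*
```

Because INREC runs before the sort, the SORT statement addresses the new layout: the state now sits in bytes 1-2 and the name in bytes 3-22. The records reaching the sort are 22 bytes long rather than the original length.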

Sort or Merge Phase

For SORT, DFSORT orders all records by the key(s) you specify in SORT FIELDS. It may use memory and/or work datasets (SORTWKnn or dynamic) to hold and order the data. For MERGE, it assumes each input is already sorted by the same key and merges the streams into one sorted output. For COPY (OPTION COPY), there is no reordering—records are just passed through (after input filtering/reformat) to the output stage.
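
For comparison, a copy-only step might look like the sketch below. The step name and dataset names are placeholders.

```jcl
//* Hypothetical copy step: no keys and no work datasets.
//COPY1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.COPY,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
  OPTION COPY
/*
```

INCLUDE, OMIT, INREC, OUTREC, and OUTFIL could be added to the same SYSIN; the records would still leave in the order they arrived.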

Output Phase

After sort/merge (and optional SUM for duplicate handling or aggregation), records are prepared for output. OUTREC can change the layout (reorder fields, add constants, edit numbers). OUTFIL can create multiple outputs: different datasets, different INCLUDE/OMIT or reformat per output, SPLIT, report layouts, etc. So you can have one SORTOUT and several OUTFIL-defined outputs from a single step.
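
As one possible illustration of multiple outputs, the sketch below sorts once and routes records to two datasets with OUTFIL. The DD names NYOUT and OTHER, the dataset names, and the state code in bytes 21-22 are all assumptions for the example.

```jcl
//* Hypothetical step: one sort pass, two OUTFIL outputs.
//SPLIT1   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//NYOUT    DD DSN=MY.OUTPUT.NY,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//OTHER    DD DSN=MY.OUTPUT.OTHER,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
  SORT FIELDS=(1,20,CH,A)
* Records with 'NY' in bytes 21-22 go to the NYOUT DD.
  OUTFIL FNAMES=NYOUT,INCLUDE=(21,2,CH,EQ,C'NY')
* SAVE catches every record not selected by another OUTFIL.
  OUTFIL FNAMES=OTHER,SAVE
/*
```

No SORTOUT DD is coded in this sketch because every OUTFIL statement names its own DD through FNAMES.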

Simple Architecture Example

The following JCL and control statements show the minimal architecture for a sort:

```jcl
//SORT1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  SORT FIELDS=(1,20,CH,A)
/*
```

JCL: One step (SORT1) runs program SORT, with SYSOUT for messages, SORTIN for input, SORTOUT for output, and SYSIN for the control statements.

Control: One statement, SORT FIELDS, sorts the records by positions 1–20, character format, ascending. There is no INCLUDE, OMIT, INREC, OUTREC, or OUTFIL, so the flow is: read from SORTIN → sort by (1,20,CH,A) → write the unchanged records to SORTOUT.

Where OPTION and Other Statements Fit

OPTION controls global behavior. For example, OPTION COPY means "do not sort": records are only copied, with optional INCLUDE/OMIT, INREC, OUTREC, and OUTFIL processing. OPTION EQUALS preserves the input order of records that have equal keys, while NOEQUALS does not guarantee that order. So in the architecture, OPTION sits alongside SORT/MERGE and modifies how the sort/merge phase (or copy path) behaves.
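
A minimal SYSIN sketch of OPTION next to SORT, assuming the same 20-byte character key used elsewhere on this page:

```jcl
//SYSIN    DD *
* EQUALS keeps records with identical keys in their input order.
  OPTION EQUALS
  SORT FIELDS=(1,20,CH,A)
/*
```

The SORT statement still defines the key. OPTION only changes how ties on that key are ordered.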

Other statements fit into the flow: RECORD can define record length or type; SUM applies after sort/merge to collapse duplicates or aggregate; JOINKEYS and REFORMAT define a join between two inputs, which is a different flow (two inputs, match by key, build output from REFORMAT). So the architecture expands when you use these—same idea (input → process → output), but with more inputs or different processing.
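
To make the join flow more concrete, here is a hedged sketch of a JOINKEYS step. The dataset names, the 10-byte key, and the field positions in REFORMAT are invented for illustration.

```jcl
//* Hypothetical join step: names and positions are placeholders.
//JOIN1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTJNF1 DD DSN=MY.CUSTOMER,DISP=SHR
//SORTJNF2 DD DSN=MY.ORDERS,DISP=SHR
//SORTOUT  DD DSN=MY.JOINED,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
* Match records on bytes 1-10 of each file (paired records only).
  JOINKEYS FILE=F1,FIELDS=(1,10,A)
  JOINKEYS FILE=F2,FIELDS=(1,10,A)
* Build each output record from F1 bytes 1-40 and F2 bytes 11-50.
  REFORMAT FIELDS=(F1:1,40,F2:11,40)
* Write joined records in the order the join produces them.
  OPTION COPY
/*
```

Here the two inputs arrive through SORTJNF1 and SORTJNF2 instead of SORTIN, which is what makes this a different flow from a plain sort or merge. The joined records then pass through the usual output stage to SORTOUT.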

Explain It Like I'm Five

DFSORT is like a machine with three parts: a door where papers (records) come in, a desk where someone sorts or stacks them in order, and a door where the sorted papers go out. The JCL tells the computer where the papers are and where to put the result. The control statements are the instructions on the desk: "sort by name," "throw away the blue ones," "only write the first three columns." So the architecture is: where things go in, what the machine does in the middle, and where things come out.

Exercises

List the DD names used for: (a) input data in a SORT, (b) output data, (c) control statements, (d) messages.
In what order does DFSORT logically process data (input, sort/merge, output)? What can happen in the input phase and in the output phase?
What is the role of SYSIN in the architecture? What is the role of SORTIN?
If you use OPTION COPY, does DFSORT still use a "sort phase"? Why or why not?

Quiz

Test Your Knowledge

1. What defines the input and output of a DFSORT step?

Control statements only
JCL DD statements only
Both JCL DD statements and control statements
Only the EXEC statement

2. Where do DFSORT control statements come from?

JCL EXEC parameters
SYSIN DD
PARM on EXEC
SYSOUT

3. In a SORT (not MERGE) job, which DD is used for the single input file?

SORTIN01
SORTIN
INPUT
SYSIN

4. What is the logical order of processing in a typical DFSORT sort job?

Output then input then sort
Input, then sort/merge, then output
Sort then input then output
Output and input in parallel

5. What role does the OPTION statement play in DFSORT architecture?

It defines the input file
It modifies global behavior (e.g. COPY, EQUALS)
It defines the output file
It replaces SORT FIELDS