MainframeMaster

DFSORT Architecture Overview

Understanding DFSORT architecture means understanding how a sort/merge step is put together: the JCL that runs it, the datasets it uses, and the control statements that define the operation. This page gives you a high-level picture of the components and how data flows through DFSORT.

What We Mean by "Architecture"

Here, architecture means the main building blocks of a DFSORT run: the program that runs, the datasets it reads and writes, and the instructions (control statements) that tell it what to do. We are not describing IBM's internal code design—we are describing the way you assemble a DFSORT job and how data moves through it.

The Three Layers: JCL, Datasets, and Control Statements

A DFSORT step is built from three layers that work together.

1. JCL (Job Control Language)

The JCL defines the step and the DD statements. The EXEC statement says which program to run (usually PGM=SORT or PGM=ICEMAN). The DD statements tell the system where the input data is, where the output goes, where the control statements are, and where to put messages. Optional DDs (e.g. STEPLIB, SORTWK01) can point to the sort product library and to work datasets. So the JCL is the "plumbing": it connects the step to the right program and to the right datasets.

2. Datasets (DD Names)

DFSORT uses fixed DD names so it knows what each dataset is for:

  • SORTIN — The single input dataset for a SORT operation. Records read from here are sorted (and optionally filtered/reformatted) before output.
  • SORTIN01, SORTIN02, ... — Used for MERGE. Each is a pre-sorted input; DFSORT merges them into one sorted stream.
  • SORTOUT — The primary output dataset. After sort or merge (and optional OUTREC), records are written here unless you use OUTFIL to redirect or add outputs.
  • SYSIN — The dataset (or in-stream data) that contains the control statements. DFSORT reads and parses these to know what to do.
  • SYSOUT — Where DFSORT writes messages (e.g. ICE000I). Usually SYSOUT=* so messages go to the job log.
  • SORTWK01, SORTWK02, ... — Optional work datasets for sort/merge when data does not fit in memory. DFSORT can also dynamically allocate work.
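To make the DD names concrete, here is a minimal sketch of a MERGE step that uses SORTIN01 and SORTIN02 instead of SORTIN. The dataset names (MY.SORTED.A, MY.SORTED.B, MY.MERGED), the step name, and the key position are hypothetical, and both inputs are assumed to be already in ascending order on columns 1–20.

```jcl
//* Sketch only: dataset names and key position are hypothetical.
//MERGE1   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Each SORTINnn input must already be sorted on the merge key
//SORTIN01 DD DSN=MY.SORTED.A,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.B,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
  MERGE FIELDS=(1,20,CH,A)
/*
```

Note that there is no SORTIN DD here: the DD names present in the step should match the operation the control statements request.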

3. Control Statements (SYSIN)

The control statements are the "brain" of the step. They tell DFSORT whether to SORT or MERGE, which keys to use, whether to filter (INCLUDE/OMIT), how to reformat before sort (INREC) or after (OUTREC), whether to write multiple outputs (OUTFIL), and options (OPTION). So: JCL wires the step to datasets; control statements define the operation on that data.

Logical Data Flow: Input → Process → Output

Data moves through DFSORT in three logical stages: read input, process it, write output. Which stages do extra work depends on the control statements you code, but the processing sequence itself is fixed by DFSORT, not by the order in which the statements appear in SYSIN.

Input Phase

DFSORT reads records from SORTIN (for SORT) or from SORTIN01, SORTIN02, ... (for MERGE). During input, it can:

  • Filter — INCLUDE keeps only records that meet conditions; OMIT drops records that meet conditions. Only records that pass the filter participate in the sort or merge.
  • Reformat — INREC builds a new record layout before the sort. That lets you drop fields or build keys so that less data goes into the sort, which can improve performance.

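So the "input" to the sort/merge engine is not necessarily the raw SORTIN record; it can be a filtered and reformatted version. When INREC is used, the positions in SORT FIELDS (and in later statements such as OUTREC) refer to the reformatted record, not the original layout.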
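The input phase can be sketched as follows. This is an illustration under assumptions, not a recommended layout: the input is assumed to be fixed-length 80-byte records with a state code in columns 51–52 and a key in columns 31–40, and all dataset names are hypothetical.

```jcl
//* Sketch only: dataset names and field positions are hypothetical.
//FILTER1  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT.SHORT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
* Filter first: positions refer to the original SORTIN record
  INCLUDE COND=(51,2,CH,EQ,C'NY')
* Then rebuild: 10-byte key from 31-40, followed by columns 1-20
  INREC BUILD=(31,10,1,20)
* The sort sees the 30-byte INREC record, so the key is now at 1-10
  SORT FIELDS=(1,10,CH,A)
/*
```

Only the records with 'NY' in columns 51–52 reach the sort, and each of them is 30 bytes long instead of 80, which is the "less data goes into the sort" effect described above.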

Sort or Merge Phase

For SORT, DFSORT orders all records by the key(s) you specify in SORT FIELDS. It may use memory and/or work datasets (SORTWKnn or dynamically allocated) to hold and order the data. For MERGE, it assumes each input is already sorted by the same key(s) and merges the streams into one sorted output. For COPY (OPTION COPY or SORT FIELDS=COPY), there is no reordering: records are passed through (after input filtering and reformatting) to the output stage.
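As an illustration of the sort phase, the sketch below sorts on two keys and supplies explicit work datasets. The dataset names, field positions, work-space sizes, and the UNIT=SYSDA unit name are assumptions; many installations let DFSORT allocate work space dynamically, in which case the SORTWKnn DDs can be left out.

```jcl
//* Sketch only: names, positions and space values are hypothetical.
//SORT2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//* Optional work datasets, used when the data does not fit in memory
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,5))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(10,5))
//SYSIN    DD *
* Keys: cols 1-20 CH ascending, then cols 21-28 ZD descending
  SORT FIELDS=(1,20,CH,A,21,8,ZD,D)
/*
```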

Output Phase

After sort/merge (and optional SUM for duplicate handling or aggregation), records are prepared for output. OUTREC can change the layout (reorder fields, add constants, edit numbers). OUTFIL can create multiple outputs: different datasets, different INCLUDE/OMIT or reformat per output, SPLIT, report layouts, etc. So you can have one SORTOUT and several OUTFIL-defined outputs from a single step.
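A minimal sketch of the output phase with two OUTFIL outputs follows. The DD names NYOUT and RESTOUT, the dataset names, and the field positions are hypothetical; the input is assumed to be fixed-length records with a state code in columns 21–22.

```jcl
//* Sketch only: DD names, dataset names and positions are hypothetical.
//SPLIT1   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//NYOUT    DD DSN=MY.OUTPUT.NY,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//RESTOUT  DD DSN=MY.OUTPUT.REST,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
  SORT FIELDS=(1,20,CH,A)
* First output: only 'NY' records, trimmed to columns 1-20
  OUTFIL FNAMES=NYOUT,INCLUDE=(21,2,CH,EQ,C'NY'),BUILD=(1,20)
* Second output: every record not selected by another OUTFIL
  OUTFIL FNAMES=RESTOUT,SAVE
/*
```

Both outputs come from a single pass over the sorted records. Because each OUTFIL names its own DD with FNAMES, this sketch does not need a SORTOUT DD.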

Simple Architecture Example

The following JCL and control statements show the minimal architecture for a sort:

```jcl
//SORT1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  SORT FIELDS=(1,20,CH,A)
/*
```

JCL: One step (SORT1) runs program SORT, with SYSOUT for messages, SORTIN for input, SORTOUT for output, and SYSIN for the control statements.

Control: One statement, SORT FIELDS, sorts records by positions 1–20, character format, ascending. There is no INREC, INCLUDE, OMIT, OUTREC, or OUTFIL, so the flow is: read from SORTIN → sort by (1,20,CH,A) → write the unchanged record to SORTOUT.

Where OPTION and Other Statements Fit

OPTION controls global behavior. For example, OPTION COPY means "do not sort": records are only copied (with optional INCLUDE/OMIT, INREC, OUTREC, OUTFIL). OPTION EQUALS preserves the original input order of records with equal keys, while NOEQUALS does not guarantee it. So in the architecture, OPTION sits alongside SORT/MERGE and modifies how the sort/merge phase (or copy path) behaves.
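The two OPTION settings mentioned above could be used as in the following sketch. The step names, dataset names, and field positions are hypothetical; the first step copies with a filter and the second sorts while keeping equal-key records in their input order.

```jcl
//* Sketch only: dataset names and field positions are hypothetical.
//COPY1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT.NY,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
* No SORT or MERGE statement: records keep their input order
  OPTION COPY
  INCLUDE COND=(21,2,CH,EQ,C'NY')
/*
//* Second step: sort that keeps input order among equal keys
//SORT3    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT.SORTED,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
  OPTION EQUALS
  SORT FIELDS=(1,20,CH,A)
/*
```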

Other statements fit into the flow: RECORD can define record length or type; SUM applies after sort/merge to collapse duplicates or aggregate; JOINKEYS and REFORMAT define a join between two inputs, which is a different flow (two inputs, match by key, build output from REFORMAT). So the architecture expands when you use these—same idea (input → process → output), but with more inputs or different processing.
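The two-input join flow might look like the sketch below. It assumes two fixed-length 80-byte inputs that share a key in columns 1–10; the dataset names and the fields chosen in REFORMAT are hypothetical. With no JOIN statement coded, only records whose keys appear in both inputs are written.

```jcl
//* Sketch only: dataset names and field positions are hypothetical.
//JOIN1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* FILE=F1 reads SORTJNF1 and FILE=F2 reads SORTJNF2
//SORTJNF1 DD DSN=MY.FILE1,DISP=SHR
//SORTJNF2 DD DSN=MY.FILE2,DISP=SHR
//SORTOUT  DD DSN=MY.JOINED,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2))
//SYSIN    DD *
* Match the two inputs on columns 1-10, ascending
  JOINKEYS FILE=F1,FIELDS=(1,10,A)
  JOINKEYS FILE=F2,FIELDS=(1,10,A)
* Joined record: all 80 bytes of F1 plus columns 11-30 of F2
  REFORMAT FIELDS=(F1:1,80,F2:11,20)
* Main task: copy the joined records to SORTOUT without re-sorting
  OPTION COPY
/*
```

The joined records then flow through the same output stage as any other run, so statements such as OUTREC or OUTFIL could still be applied to them.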

Explain It Like I'm Five

DFSORT is like a machine with three parts: a door where papers (records) come in, a desk where someone sorts or stacks them in order, and a door where the sorted papers go out. The JCL tells the computer where the papers are and where to put the result. The control statements are the instructions on the desk: "sort by name," "throw away the blue ones," "only write the first three columns." So the architecture is: where things go in, what the machine does in the middle, and where things come out.

Exercises

  1. List the DD names used for: (a) input data in a SORT, (b) output data, (c) control statements, (d) messages.
  2. In what order does DFSORT logically process data (input, sort/merge, output)? What can happen in the input phase and in the output phase?
  3. What is the role of SYSIN in the architecture? What is the role of SORTIN?
  4. If you use OPTION COPY, does DFSORT still use a "sort phase"? Why or why not?

Quiz

Test Your Knowledge

1. What defines the input and output of a DFSORT step?

  • Control statements only
  • JCL DD statements only
  • Both JCL DD statements and control statements
  • Only the EXEC statement

2. Where do DFSORT control statements come from?

  • JCL EXEC parameters
  • SYSIN DD
  • PARM on EXEC
  • SYSOUT

3. In a SORT (not MERGE) job, which DD is used for the single input file?

  • SORTIN01
  • SORTIN
  • INPUT
  • SYSIN

4. What is the logical order of processing in a typical DFSORT sort job?

  • Output then input then sort
  • Input, then sort/merge, then output
  • Sort then input then output
  • Output and input in parallel

5. What role does the OPTION statement play in DFSORT architecture?

  • It defines the input file
  • It modifies global behavior (e.g. COPY, EQUALS)
  • It defines the output file
  • It replaces SORT FIELDS