What is SORTIN in DFSORT?

SORTIN is the standard DD name for the input dataset in a DFSORT step. Your JCL defines a SORTIN DD that points to the file containing the records to be sorted or copied. For a MERGE, the pre-sorted inputs are instead allocated to SORTIN01, SORTIN02, and so on.

What is SORTOUT in DFSORT?

SORTOUT is the standard DD name for the primary output dataset. DFSORT writes the sorted, merged, or copied records to the dataset specified by the SORTOUT DD. SORTIN and SORTOUT must point to different datasets.

What are DFSORT control statements?

Control statements are the instructions you put in the SYSIN dataset that tell DFSORT what to do. Examples include SORT FIELDS= (how to sort), INCLUDE/OMIT (which records to keep or drop), INREC/OUTREC (how to reformat records), and OUTFIL (how to write output).

What is the difference between INREC and OUTREC in DFSORT?

INREC reformats records before the sort/merge phase; OUTREC reformats records after the sort/merge phase. Use INREC when you need to build or change sort keys or reduce record size before sorting. Use OUTREC when you need to format the final output (e.g. add spaces, edit numbers, reorder fields).

What do CH, PD, BI, and ZD mean in DFSORT?

They are field format types: CH = Character (alphanumeric), PD = Packed Decimal, BI = Binary, ZD = Zoned Decimal. You specify them in SORT FIELDS, INREC, OUTREC, and elsewhere so DFSORT knows how to interpret and compare the data in each field.

DFSORT Terminology Glossary - Key Terms Explained

JCL and Dataset Terms

SORTIN

SORTIN is the DD name that points to the input dataset for a DFSORT step. DFSORT reads the records it will sort or copy from the dataset allocated to SORTIN. In a simple sort job you have one input file, so you define one DD named SORTIN; several input datasets can be concatenated under that one DD. For a MERGE, each pre-sorted input gets its own DD: SORTIN01, SORTIN02, and so on. The name is fixed by convention; if you use a different DD name, DFSORT will not use it as input unless you use special options. So in practice you always see SORTIN DD ... in JCL for the main input file.

SORTOUT

SORTOUT is the DD name for the primary output dataset. After DFSORT processes the records (sort, merge, copy, with any INCLUDE/OMIT, INREC, OUTREC, etc.), it writes the result to the dataset allocated to SORTOUT. There is only one SORTOUT for the main output. If you need multiple output files or split output, you use OUTFIL control statements to define additional outputs; the main stream still goes to SORTOUT unless you redirect it. SORTIN and SORTOUT must never point to the same dataset, or data can be overwritten while being read.

SYSIN

SYSIN is the DD name for the control statement input. Instead of reading data records, the SYSIN dataset contains the instructions that tell DFSORT what to do: SORT FIELDS=, INCLUDE, OMIT, INREC, OUTREC, OUTFIL, SUM, OPTION, and so on. You can use instream data (DD * or DD DATA) or a cataloged dataset. DFSORT reads SYSIN until it finds a line with /* (for instream) or end-of-file. Everything between the start of SYSIN and that terminator is parsed as control statements.

SYSOUT

SYSOUT is the DD name for the job log or message output. DFSORT writes informational and error messages (e.g. ICE000I, ICE001I) to the SYSOUT dataset. You typically code SYSOUT DD SYSOUT=* so messages go to the job's standard output. SYSOUT does not contain your data; it contains the messages that tell you how many records were read, sorted, and written, and any errors or warnings.

DD statement

A DD (Data Definition) statement is a JCL statement that defines a dataset or stream used by the program. In a DFSORT step, you use DD statements to allocate SORTIN (input data), SORTOUT (output data), SYSIN (control statements), and SYSOUT (messages). Each DD has a name (e.g. SORTIN) and parameters (e.g. DSN=, DISP=, SPACE=, DCB=). The program references these by DD name; DFSORT is written to look for the names SORTIN, SORTOUT, SYSIN, and SYSOUT.

Control Statement Names

SORT (and SORT FIELDS)

SORT is the control statement that tells DFSORT to sort the input records. You usually code SORT FIELDS=(position,length,format,direction,...) to define the sort keys. To copy records without sorting, code SORT FIELDS=COPY or OPTION COPY instead of a key list. SORT is the statement name; the actual sort keys are given in the FIELDS= parameter. If you code none of SORT, MERGE, or OPTION COPY, DFSORT has no operation to perform and the step normally ends with an error.

MERGE

MERGE is the control statement used to combine two or more already sorted datasets into one sorted result. In MERGE FIELDS= you code the same keys, in the same order and direction, that the input files are already sorted by. DFSORT does not re-sort the data; it interleaves the input streams by key. So MERGE differs from SORT: SORT accepts input in any order and sorts it; MERGE requires pre-sorted inputs and merges them. The term "merge" in general programming means combining two ordered lists into one; in DFSORT the MERGE statement is the way you request that behavior.
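The merge idea described above can be modelled in a few lines of Python. This is only a sketch of the concept, not DFSORT syntax; the two input lists and their contents are hypothetical.

```python
import heapq

# Two hypothetical inputs, each already in ascending key order.
sortin01 = [("A001", "x"), ("C003", "y"), ("E005", "z")]
sortin02 = [("B002", "p"), ("D004", "q")]

# heapq.merge interleaves the inputs by key; it never re-sorts them.
merged = list(heapq.merge(sortin01, sortin02, key=lambda rec: rec[0]))
merged_keys = [key for key, _ in merged]
print(merged_keys)
```

If either input were out of order, the result would be out of order too, which is why a merge requires pre-sorted inputs.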

INCLUDE

INCLUDE is a control statement that specifies which records to keep. You give a condition (e.g. a field equals a value, or is in a range). Only records that satisfy the condition are passed to the sort/merge and then to output; all others are dropped. So INCLUDE is a filter: "include only records where this is true." A run can have only one INCLUDE (or OMIT) statement; to test several things, combine the conditions with AND and OR inside its COND= parameter. INCLUDE is the opposite of OMIT.

OMIT

OMIT is the control statement that specifies which records to drop. You give a condition; records that satisfy it are removed and never sorted or written. So OMIT means "exclude these records." INCLUDE and OMIT are mutually exclusive in a given job: you use one or the other to filter, not both. OMIT is useful when you want to remove a small set of records (e.g. test records, header lines) and keep everything else.
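The filter semantics of INCLUDE and OMIT can be modelled outside DFSORT. The Python sketch below is not DFSORT syntax; the records, the helper names `field`, `include`, and `omit`, and the rule "positions 1-3 equal HDR" are all hypothetical.

```python
# Hypothetical 10-byte records; positions are 1-based, as in DFSORT.
records = ["HDR2024   ", "ALICE 0100", "BOB   0250", "HDR2025   "]

def field(record, start, length):
    """Return `length` characters starting at 1-based position `start`."""
    return record[start - 1:start - 1 + length]

def include(recs, condition):
    """Keep only the records that satisfy the condition."""
    return [r for r in recs if condition(r)]

def omit(recs, condition):
    """Drop the records that satisfy the condition; keep the rest."""
    return [r for r in recs if not condition(r)]

def is_header(record):
    return field(record, 1, 3) == "HDR"

kept = omit(records, is_header)         # drop headers, keep the data
headers = include(records, is_header)   # keep the headers only
print(kept)
print(headers)
```

The two functions are mirror images: the same condition either selects the records to keep or selects the records to discard.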

INREC

INREC (Input Record) is the control statement that reformats records before the sort/merge phase. You can change the record layout, pick only certain fields, add constants, convert dates, build new keys, or shorten records. It runs before the sort so that the sort sees the keys and data you want, and so that shorter records save memory and improve performance. INREC never changes the number of records, not even with IFTHEN clauses; it changes only the content or layout of each record before sorting. To drop records, use INCLUDE or OMIT.

OUTREC

OUTREC (Output Record) is the control statement that reformats records after the sort/merge phase. You use it to build the final output layout: reorder fields, add spaces, edit numeric fields with edit masks, add sequence numbers, do FINDREP or overlay, and so on. OUTREC applies to the main SORTOUT stream. So the order of processing is: read input → INREC (if present) → sort/merge → OUTREC (if present) → write to SORTOUT. INREC shapes what gets sorted; OUTREC shapes what gets written.
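The processing order described above (INREC, then sort, then OUTREC) can be sketched as a small Python pipeline. It only models the sequence of steps; the record layout and the functions `inrec` and `outrec` are hypothetical stand-ins for the control statements.

```python
# Hypothetical input: name in positions 1-5, amount in positions 6-9,
# filler in positions 10-15.
sortin = ["CAROL0300FILLER", "ALICE0100FILLER", "BOB  0250FILLER"]

def inrec(record):
    """Before the sort: keep only the 9 bytes that are needed."""
    return record[0:9]

def outrec(record):
    """After the sort: rebuild the layout as amount, space, name."""
    return record[5:9] + " " + record[0:5]

shortened = [inrec(r) for r in sortin]               # INREC step
ordered = sorted(shortened, key=lambda r: r[0:5])    # sort on positions 1-5
sortout = [outrec(r) for r in ordered]               # OUTREC step
print(sortout)
```

Note that the sort key refers to the layout produced by `inrec`, not to the original input layout, which mirrors how key positions must match the reformatted record.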

OUTFIL

OUTFIL (Output File) is the control statement that defines additional or alternative outputs. You can create multiple output datasets from one DFSORT step: for example, one OUTFIL for a sorted file, another for a report, another for records that meet a condition. Each OUTFIL can have its own INCLUDE/OMIT, reformatting (OUTREC-like build), FNAMES (DD name), SPLIT, and report options. So OUTFIL extends the idea of "one output" to many. The main output is still SORTOUT unless you use OUTFIL to take over the whole stream.
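The "one pass, many outputs" idea can be illustrated with a short Python model. It is not OUTFIL syntax; the output names `EASTOUT` and `BIGOUT`, the records, and the filter rules are hypothetical.

```python
# Hypothetical sorted records: (region, amount).
sorted_records = [("EAST", 10), ("EAST", 40), ("WEST", 25), ("WEST", 5)]

# Each hypothetical output has its own name and its own filter.
outfil_rules = {
    "EASTOUT": lambda r: r[0] == "EAST",
    "BIGOUT": lambda r: r[1] >= 25,
}

# Every rule is evaluated independently against every record.
outputs = {name: [r for r in sorted_records if keep(r)]
           for name, keep in outfil_rules.items()}
print(outputs)
```

Because each output applies its own test, a record such as ("EAST", 40) lands in both outputs in this sketch.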

SUM

SUM is the control statement used to remove duplicates or total numeric fields. The key that decides which records count as duplicates comes from SORT FIELDS= or MERGE FIELDS=, not from SUM itself. SUM FIELDS=(position,length,format,...) names the numeric fields to add together: DFSORT collapses records that have the same key into one record holding the totals. SUM FIELDS=NONE collapses them without totaling anything, which gives plain deduplication. SUM does not compute minimums or maximums. So SUM gives you deduplication and simple aggregation without writing a separate program. If a total would overflow its field, the records involved are left unsummed, so check the messages when totals can grow large.
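Collapsing records with equal keys can be sketched in Python as follows. This models the concept only; the records and the function names `sum_by_key` and `first_per_key` are hypothetical, and the input is assumed to be sorted by key already.

```python
from itertools import groupby

# Hypothetical records as (key, amount), already sorted by key.
records = [("A001", 10), ("A001", 5), ("B002", 7), ("C003", 1), ("C003", 2)]

def sum_by_key(sorted_records):
    """Collapse each run of equal keys into one record with amounts added."""
    return [(key, sum(amount for _, amount in group))
            for key, group in groupby(sorted_records, key=lambda r: r[0])]

def first_per_key(sorted_records):
    """Keep one record per key and discard the other duplicates."""
    return [next(group)
            for _, group in groupby(sorted_records, key=lambda r: r[0])]

totals = sum_by_key(records)
unique = first_per_key(records)
print(totals)
print(unique)
```

This sketch always keeps the first record of each run. In DFSORT, which duplicate survives depends on whether the EQUALS option is in effect.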

JOINKEYS

JOINKEYS is the control statement used to join two datasets (like a database join). You code one JOINKEYS statement per input, identified as F1 and F2, each naming that file's join key, and optionally a REFORMAT statement to specify which fields from each file appear in the result. Without further statements you get an inner join; a JOIN statement with UNPAIRED selects left, right, or full outer joins. So JOINKEYS is the term for "match records from two files on a key and produce a combined result." It is an advanced feature: the two inputs are allocated to their own DD names (by default SORTJNF1 and SORTJNF2) rather than to SORTIN, and the control statements need careful ordering.
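An inner join on a key can be modelled in Python as below. This is a conceptual sketch with hypothetical data and a hypothetical `inner_join` function; it uses a dictionary lookup for brevity, which is not how DFSORT matches the two files internally.

```python
# Hypothetical files: F1 holds (customer_id, name),
# F2 holds (customer_id, amount).
f1 = [("C01", "ALICE"), ("C02", "BOB"), ("C03", "CAROL")]
f2 = [("C01", 100), ("C03", 300), ("C03", 50), ("C04", 75)]

def inner_join(left, right):
    """Pair every left record with every right record that has the same key."""
    by_key = {}
    for key, value in right:
        by_key.setdefault(key, []).append(value)
    return [(key, name, amount)
            for key, name in left
            for amount in by_key.get(key, [])]

joined = inner_join(f1, f2)
print(joined)
```

C02 (no match in F2) and C04 (no match in F1) are the unpaired records; an outer join would keep them instead of dropping them.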

OPTION

OPTION is the control statement that sets processing options. Examples: EQUALS/NOEQUALS (whether to preserve order of equal keys), SKIPREC (skip N input records), STOPAFT (stop after N input records), VLSHRT/NOVLSHRT (variable-length short record handling), COPY (copy only, no sort), MSGPRT, and sizing/tuning options like SIZE, FILSZ, DYNALLOC. So OPTION is where you turn on or off behaviors that affect the whole run; the exact options available depend on your DFSORT version and documentation.

Sort Keys and Field Formats

Sort key

A sort key is the field (or combination of fields) that DFSORT uses to determine the order of records. You define it in SORT FIELDS=(start,length,format,direction,...). The first key is the major key; if two records are equal on that key, the next key is used, and so on. So "sort key" means "what we compare to put records in order." The same idea applies in MERGE FIELDS=: the merge keys are the fields that the inputs are already sorted by.

CH (Character)

CH stands for Character. A field specified as CH is treated as alphanumeric (EBCDIC) data. Sort order is determined by the collating sequence (typically EBCDIC): character-by-character comparison. CH is used for names, IDs, and any non-numeric data. Length is in bytes. Example: (1,20,CH,A) means positions 1–20, character format, ascending.

PD (Packed Decimal)

PD stands for Packed Decimal. The field is stored in packed decimal format: each byte holds two decimal digits (except the last, which holds one digit and the sign). DFSORT compares these as signed numbers. PD is common for financial and numeric fields in mainframe files. You must specify the length in bytes (e.g. 4 bytes for up to 7 digits + sign). Sort order is numeric: -1 before 0 before 1.
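The packed layout described above can be decoded by hand. The following Python function is a minimal sketch; the name `unpack_pd` and the sample values are hypothetical, and it assumes the usual convention that sign nibbles X'D' and X'B' mean negative.

```python
def unpack_pd(data: bytes) -> int:
    """Decode a packed decimal field into a Python int."""
    digits = []
    for byte in data[:-1]:
        digits.append(byte >> 4)      # high nibble: one digit
        digits.append(byte & 0x0F)    # low nibble: one digit
    digits.append(data[-1] >> 4)      # last byte: high nibble is a digit
    sign_nibble = data[-1] & 0x0F     # last byte: low nibble is the sign
    value = int("".join(str(d) for d in digits))
    # X'D' and X'B' mark negative values; other sign codes are positive.
    return -value if sign_nibble in (0x0D, 0x0B) else value

positive = unpack_pd(bytes.fromhex("0012345C"))
negative = unpack_pd(bytes.fromhex("0012345D"))
print(positive, negative)
```

The 4-byte sample holds 7 digits plus the sign, matching the length rule given above.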

BI (Binary)

BI stands for Binary. The field is treated as an unsigned binary value, so comparing two BI fields gives the same result as comparing their bytes from left to right. BI is used for integer fields (counts, offsets) that the application stores in binary and that are never negative. For signed (two's-complement) binary integers, use the FI (fixed-point) format instead. Length is typically 2 or 4 bytes. Sort order is numeric.
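Whether a binary field is read as unsigned or as signed changes its value, and therefore its sort position. The Python sketch below reads the same hypothetical 2-byte field both ways; in DFSORT terms the unsigned reading corresponds to BI and the signed reading to FI.

```python
raw = bytes.fromhex("FFFE")  # a hypothetical 2-byte field

unsigned_value = int.from_bytes(raw, byteorder="big", signed=False)
signed_value = int.from_bytes(raw, byteorder="big", signed=True)
print(unsigned_value)  # 65534
print(signed_value)    # -2
```

Read as unsigned, this field sorts after every small positive number; read as signed, it sorts before zero.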

ZD (Zoned Decimal)

ZD stands for Zoned Decimal. Each byte holds one digit in its low nibble and a zone in its high nibble (normally X'F'); the zone of the last byte carries the sign. It is the common "display" or "zoned" numeric format on the mainframe. DFSORT compares ZD fields as signed numbers. You specify length in bytes (one per digit, with the sign sharing the last byte). Sort order is numeric.
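A zoned decimal field can be decoded with a few bit operations. The Python function below is a minimal sketch; the name `unzone_zd` and the sample bytes are hypothetical, and it assumes the digit sits in the low nibble of each byte, with the sign in the high nibble (zone) of the last byte and X'D' or X'B' meaning negative.

```python
def unzone_zd(data: bytes) -> int:
    """Decode a zoned decimal field into a Python int."""
    digits = "".join(str(byte & 0x0F) for byte in data)  # low nibble: digit
    sign_nibble = data[-1] >> 4                            # last zone: sign
    value = int(digits)
    return -value if sign_nibble in (0x0D, 0x0B) else value

positive = unzone_zd(bytes.fromhex("F1F2F3F4C5"))
negative = unzone_zd(bytes.fromhex("F1F2F3F4D5"))
print(positive, negative)
```

Each digit takes a full byte, so the 5-byte samples hold exactly 5 digits.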

A and D (Ascending and Descending)

A means Ascending (low to high): smaller values come first. D means Descending (high to low): larger values come first. You specify A or D for each sort key in SORT FIELDS=. So (1,10,CH,A) sorts that key ascending; (21,4,PD,D) sorts that key descending. Mixing A and D lets you do things like "sort by name ascending, then by amount descending."
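Mixing directions across keys can be modelled in Python as shown below. The records are hypothetical, and negating the amount is just one convenient way to reverse a numeric key in Python; it is not how DFSORT implements D.

```python
# Hypothetical records: (name, amount).
records = [("SMITH", 200), ("JONES", 50), ("SMITH", 900), ("JONES", 700)]

# Major key: name ascending. Minor key: amount descending.
# Negating the numeric key reverses its order while the name stays ascending.
ordered = sorted(records, key=lambda r: (r[0], -r[1]))
print(ordered)
```

The minor key only matters within a group of equal names, which is why both JONES records come before both SMITH records regardless of amount.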

Collating sequence

The collating sequence is the order in which characters are compared when sorting character (CH) data. On the mainframe the default is usually EBCDIC (Extended Binary Coded Decimal Interchange Code): the numeric value of each byte determines order. So 'A' (hex C1 in EBCDIC) comes before 'B' (hex C2). If you need a different order (e.g. ASCII, or a custom order for digits/letters), you can use the ALTSEQ control statement to define a translation table so that sort comparisons use your desired sequence. The table applies to fields coded with the AQ format (or to CH fields when the CHALT option is in effect).
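The effect of the collating sequence is easy to see by sorting the same words under two encodings. This Python sketch uses the standard `cp037` codec as a representative EBCDIC code page; the word list is hypothetical, and your system's code page may differ.

```python
words = ["apple", "Apple", "123"]

# Compare by the byte values each encoding assigns to the characters.
ascii_order = sorted(words, key=lambda w: w.encode("ascii"))
ebcdic_order = sorted(words, key=lambda w: w.encode("cp037"))
print(ascii_order)
print(ebcdic_order)
```

In ASCII, digits sort before uppercase letters and uppercase before lowercase; in EBCDIC the order is lowercase, then uppercase, then digits, so the two results are exact opposites here.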

Processing and Data Terms

Record

A record is one unit of data that DFSORT reads from input and writes to output. In fixed-length (FB) datasets, every record has the same length (LRECL). In variable-length (VB) datasets, each record has a 4-byte prefix (RDW) plus the data. DFSORT processes one record at a time for filtering and reformatting; for sort/merge it compares and reorders records by keys.

Fixed-length (FB) and variable-length (VB)

FB (Fixed Blocked) means all records have the same length (e.g. LRECL=80). VB (Variable Blocked) means each record has a 4-byte Record Descriptor Word (RDW) followed by the data, so record length can vary. DFSORT supports both; you specify RECFM=FB or RECFM=VB (or F, V) on your DD or in the dataset. Positions in control statements are counted from the start of the record, and for VB that includes the RDW: the RDW occupies positions 1–4, so the first data byte is position 5. When you code SORT FIELDS or INREC/OUTREC for a VB file, add 4 to the position a field has within the data.
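The RDW layout can be made concrete by walking through a buffer of variable-length records. The Python sketch below is a simplified model: the function name `split_vb_records` and the sample bytes are hypothetical, and it assumes the records are laid end to end with no block descriptor word and no spanned segments.

```python
import struct

def split_vb_records(buffer: bytes):
    """Return the data portion of each variable-length record in `buffer`."""
    records = []
    offset = 0
    while offset < len(buffer):
        # RDW: 2-byte big-endian length (it counts the RDW itself),
        # followed by 2 reserved bytes.
        (length,) = struct.unpack_from(">H", buffer, offset)
        if length < 4:
            raise ValueError("record length cannot be smaller than the RDW")
        records.append(buffer[offset + 4:offset + length])
        offset += length
    return records

buffer = b"\x00\x09\x00\x00HELLO" + b"\x00\x06\x00\x00HI"
data_parts = split_vb_records(buffer)
print(data_parts)
```

The length in the RDW includes the 4 bytes of the RDW itself, so a record with 5 data bytes carries a length of 9.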

Reformat

To reformat means to change the layout or content of a record. INREC reformats before sort; OUTREC reformats after sort. Reformatting can include: picking fields (e.g. 1,10 and 30,5), inserting constants (e.g. spaces, literals), converting dates, editing numbers, building sequence numbers, and overlay/IFTHEN logic. So when documentation says "reformat," it means "build a new record layout from the current record (and optionally constants)."

Control statement

A control statement is a single line (or a line plus its continuations) in the SYSIN dataset that tells DFSORT what to do. SORT FIELDS=, INCLUDE, OMIT, INREC, OUTREC, OUTFIL, SUM, OPTION, MERGE, JOINKEYS, and so on are all control statements. They are free-format within columns 2–71 (with continuation as needed); column 1 is left blank unless you code a label. Control statements are not JCL; they are data that the sort program reads and interprets.

Phase

In DFSORT documentation, phase often refers to a stage of processing: for example, "input phase" (read SORTIN), "sort phase" (reorder by keys), "output phase" (write SORTOUT). INREC runs in the input phase; OUTREC runs in the output phase. Understanding phases helps you know when each kind of reformatting and filtering is applied.

Explain It Like I'm Five

Imagine you have a stack of cards with names and numbers. SORTIN is the box the cards come from, and SORTOUT is the box you put the finished stack in. SYSIN is the instruction sheet that says "sort by name" or "only keep the red cards" or "write the answer in this order." A sort key is "what we look at to put the cards in order"—like "first by name, then by number." INREC is like fixing or shortening the cards before we sort them; OUTREC is like writing the final list the way we want it. INCLUDE means "only keep cards that match this rule"; OMIT means "throw away cards that match this rule." CH means we compare the words letter by letter; PD or BI means we compare numbers as numbers. The collating sequence is the order of the alphabet we use (e.g. A then B then C) when we compare words.

Exercises

What is the difference between SORTIN and SYSIN? What does each DD contain?
If you want to drop header records and keep the rest, do you use INCLUDE or OMIT? Why?
You need to add a sequence number to each record in the output. Should you use INREC or OUTREC? Explain.
What does CH mean for a sort field? How does sort order differ from a PD field of the same length?
What is a sort key? How do you specify multiple sort keys in SORT FIELDS?
