SORTIN and SORTOUT are the two DD names that define where your data comes from and where it goes in a DFSORT step. SORTIN is the input dataset; SORTOUT is the output dataset. This page explains what each one is, how to allocate them, how record length and format relate to INREC and OUTREC, and how MERGE changes the picture (SORTIN01, SORTIN02, … instead of SORTIN).
SORTIN is the DD name that points to the input dataset when you run a SORT step (or a single-input copy with OPTION COPY). DFSORT reads every record from the dataset allocated to SORTIN. The order in which records appear in SORTIN does not matter for a SORT—DFSORT will reorder them according to SORT FIELDS. For OPTION COPY, records are written to SORTOUT in the same order they were read (after INCLUDE/OMIT and INREC if specified).
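As a rough illustration of how SORTIN fits into a step, the sketch below shows a minimal sort job. The job card, the SORTOUT attributes, and the sort key (a 10-byte character field in columns 1-10) are assumptions chosen for the example; only the dataset names follow the ones used elsewhere on this page.

```jcl
//* Sketch only: job name, accounting info, and sort key are
//* placeholders, not taken from a real system.
//SORTJOB  JOB (ACCT),'SORT DEMO',CLASS=A,MSGCLASS=X
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT.DATA,DISP=SHR
//SORTOUT  DD DSN=MY.SORTED.DATA,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2)),DCB=(RECFM=FB,LRECL=80)
//SYSIN    DD *
* Sort ascending on an assumed 10-byte character key in columns 1-10.
  SORT FIELDS=(1,10,CH,A)
/*
```

Replacing the SORT statement with OPTION COPY would turn the same step into a straight copy, with records written in the order they were read.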
The dataset you allocate to SORTIN must already exist before the step runs. You typically use DISP=SHR so your job can read it while other jobs may also access it, or DISP=OLD if you need exclusive access. The record format (RECFM) and record length (LRECL) are taken from the dataset label (or from the DCB you specify if you override). DFSORT uses that information to read each record correctly. So for fixed-length 80-byte records, the input dataset has RECFM=FB and LRECL=80; for variable-length records, the dataset has RECFM=VB and each record begins with a 4-byte RDW followed by the data. DFSORT handles both.
Important: SORTIN is used only for a single-input SORT (or copy). When you do a MERGE, you do not use SORTIN; you use SORTIN01, SORTIN02, and so on. So "SORTIN" means "the one input for a SORT step."
SORTOUT is the DD name that points to the primary output dataset. After DFSORT has read the input, applied INCLUDE/OMIT and INREC, sorted or merged (optionally with SUM), and applied OUTREC, it writes the resulting records to the dataset allocated to SORTOUT. There is exactly one primary output stream, and that stream goes to SORTOUT.
Unlike SORTIN, the SORTOUT dataset is usually created by the step. You allocate it with DISP=(NEW,CATLG,DELETE) (or similar), SPACE= to reserve space, and DCB= to define the record format and length. The DCB must match what DFSORT will write. If you do not use OUTREC, the output records have the same layout (and length) as the records that went into the sort phase—which may be the same as input or the result of INREC. If you use OUTREC, the output record layout and length are determined by the OUTREC specification. So you must set SORTOUT's LRECL (and RECFM) to match the actual output record length. If you get it wrong, you can get truncated or padded records, or allocation/abend issues.
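SORTIN and SORTOUT should be different datasets. DFSORT reads from one and writes to the other; pointing both DD names at the same dataset risks overwriting input that has not been read yet (in a copy in particular) and leaves no intact input to rerun from if the step fails.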
Data flows from SORTIN to SORTOUT. In between, DFSORT may change the record (INREC before sort, OUTREC after sort). So the record length and format on SORTIN can differ from the record length and format on SORTOUT. You are responsible for making sure the SORTOUT DCB matches the final output.
If you are unsure of the output length, you can compute it from your OUTREC FIELDS (or INREC if no OUTREC): add the lengths of all the fields you output. For example, OUTREC FIELDS=(1,20,40,30) outputs bytes 1-20 and bytes 40-69, so 20 + 30 = 50 bytes, and SORTOUT should have LRECL=50 (and RECFM=FB for fixed).
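The length calculation above can be sketched as a complete step. The sort key and the output dataset name are assumptions for illustration, and the input is assumed to be fixed-length with at least 69 bytes per record so that both OUTREC fields exist.

```jcl
//* Hypothetical step: input is assumed to be FB with LRECL of 69 or more.
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT.DATA,DISP=SHR
//SORTOUT  DD DSN=MY.SHORT.DATA,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2)),DCB=(RECFM=FB,LRECL=50)
//SYSIN    DD *
* Sort key (1,10,CH,A) is an assumption for illustration.
  SORT FIELDS=(1,10,CH,A)
* Output bytes 1-20 and bytes 40-69: 20 + 30 = 50 bytes.
  OUTREC FIELDS=(1,20,40,30)
/*
```

Because OUTREC runs after the sort, the key positions in SORT FIELDS still refer to the input record layout, while LRECL=50 on SORTOUT describes the reformatted record.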
Input usually already exists, so you reference it by name and use SHR (or OLD):
//SORTIN DD DSN=MY.INPUT.DATA,DISP=SHR
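If the input is the output of a previous step in the same job, you might use a step-specific name and DISP=(SHR,PASS), or refer back to the earlier DD statement with a backward reference of the form DSN=*.stepname.ddname (e.g. DSN=*.STEP1.OUTPUT, where OUTPUT is a DD name in STEP1).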
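A minimal sketch of the step-to-step case follows. The program name MYPROG, the DD name OUTPUT, the temporary dataset name &&EXTRACT, UNIT=SYSDA, and the sort key are all hypothetical; the point is only the shape of the backward reference on SORTIN.

```jcl
//* STEP1 is assumed to be any program that writes to DD name OUTPUT.
//STEP1    EXEC PGM=MYPROG
//OUTPUT   DD DSN=&&EXTRACT,DISP=(NEW,PASS),
//            UNIT=SYSDA,SPACE=(CYL,(5,2)),
//            DCB=(RECFM=FB,LRECL=80)
//STEP2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Backward reference: *.stepname.ddname
//SORTIN   DD DSN=*.STEP1.OUTPUT,DISP=(OLD,DELETE)
//SORTOUT  DD DSN=MY.SORTED.DATA,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2)),DCB=(RECFM=FB,LRECL=80)
//SYSIN    DD *
* Sort key is an assumption for illustration.
  SORT FIELDS=(1,10,CH,A)
/*
```

Here the sort step uses DISP=(OLD,DELETE) so the temporary dataset is removed once it has been read; a disposition that passes it on again would suit a job where later steps still need it.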
Output is usually new; you create it with SPACE and DCB:
//SORTOUT DD DSN=MY.SORTED.DATA,DISP=(NEW,CATLG,DELETE),
//         SPACE=(CYL,(5,2)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
Here RECFM=FB and LRECL=80 match an 80-byte fixed-length output, and BLKSIZE=27920 is a multiple of 80 (349 records per block). If your OUTREC produces 100-byte records, use LRECL=100 and change BLKSIZE to a multiple of 100 (e.g. 27900), or omit BLKSIZE so the system chooses one; for RECFM=FB the block size must be a multiple of LRECL. SPACE=(CYL,(5,2)) allocates 5 cylinders primary and 2 secondary; adjust for your expected record count and record length.
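For the 100-byte case, the SORTOUT DD statement might look like the fragment below. The dataset name is a placeholder, and the fragment assumes system-determined block size is available, so BLKSIZE is left out rather than coded by hand.

```jcl
//* Sketch: 100-byte fixed-length output; dataset name is a placeholder.
//* BLKSIZE is omitted so the system can choose a block size that is a
//* multiple of LRECL. To code it explicitly, 27900 (279 x 100) works.
//SORTOUT  DD DSN=MY.SORTED.DATA100,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2)),DCB=(RECFM=FB,LRECL=100)
```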
For a MERGE step, there is no single SORTIN. Instead you have SORTIN01, SORTIN02, and optionally more (SORTIN03, …). Each is a separate input stream; each dataset must be pre-sorted by the same key (and format) as specified in MERGE FIELDS. DFSORT merges these streams into one and writes the result to SORTOUT. So for MERGE, the relationship is: multiple inputs (SORTIN01, SORTIN02, …) → one output (SORTOUT). SORTOUT is still the primary output; allocation rules are the same (DCB must match the merged output record layout).
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTOUT DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//         SPACE=(CYL,(5,2)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
The merged output has the same record layout as each input (unless you use OUTREC). So if each input has LRECL=80, the merged output has LRECL=80 and SORTOUT should be allocated with LRECL=80.
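Put together with a control statement, a merge step could be sketched as follows. The merge key (a 10-byte character field in columns 1-10, ascending) is an assumption; both inputs are assumed to be already sorted on that same key.

```jcl
//* Sketch only: the merge key is assumed, not taken from real data.
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
* Both inputs are assumed to be in ascending order on columns 1-10.
  MERGE FIELDS=(1,10,CH,A)
/*
```

Note that the step has no SORTIN DD at all; the numbered SORTINnn DD statements take its place.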
If your input is variable-length (RECFM=VB), each record includes a 4-byte RDW (Record Descriptor Word) at the front. The LRECL you specify for the dataset is the maximum record length (including the RDW). DFSORT reads and writes VB records correctly; it maintains the RDW. So for SORTIN with a VB dataset, you do not need to strip the RDW; DFSORT handles it. For SORTOUT, if the output is variable-length, allocate with RECFM=VB and the appropriate LRECL (max length including RDW). If you convert the records to fixed length (in DFSORT this is normally done through OUTFIL with the VTOF or CONVERT parameter rather than through the OUTREC statement alone), the output dataset would be RECFM=FB with the fixed LRECL.
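A variable-length sort might be sketched as below. The dataset names, the maximum record length of 204 (4-byte RDW plus up to 200 data bytes), and the key are assumptions, and every record is assumed to be long enough to contain the whole key.

```jcl
//* Sketch: VB input with an assumed maximum record length of 204
//* (4-byte RDW + up to 200 data bytes). Names are placeholders.
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.VB.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.VB.SORTED,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2)),DCB=(RECFM=VB,LRECL=204)
//SYSIN    DD *
* The RDW occupies positions 1-4, so the first data byte is position 5.
  SORT FIELDS=(5,10,CH,A)
/*
```

The key starts at position 5 rather than 1 because field positions in a VB record count the RDW; a key that begins in the first data byte is therefore coded as position 5.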
SORTIN is the inbox where the messy pile of papers (your data) sits. DFSORT takes the papers from that inbox one by one. SORTOUT is the outbox where DFSORT puts the papers after it has sorted them (or copied them, or merged them with papers from other inboxes). The inbox and outbox must be different boxes—you can't put the sorted papers back into the same box you took them from while you're still reading from it. And the outbox has to be the right size: if each sorted "paper" is 50 lines long, the outbox must be made to hold 50-line papers, not 80-line. That's what LRECL on SORTOUT is: making sure the outbox fits the papers DFSORT will put there.
1. What is SORTIN used for?
2. Can SORTIN and SORTOUT point to the same dataset?
3. If you use OUTREC to shorten records, what must match on SORTOUT?
4. For a MERGE step, which DD name(s) hold the input?
5. What does DISP=SHR on SORTIN mean?