DFSORT can do three different kinds of "move" from input to output: sort (reorder records by keys), copy (keep the same order, no reordering), and merge (combine two or more already-sorted streams into one). Choosing the right one affects correctness, performance, and the control statements you use. This page explains what each does, when to use it, and how they differ.
Sorting means reordering records so they appear in ascending or descending order by one or more sort keys. The input can be in any order—random, reverse, or partially sorted. DFSORT reads all the records, compares them by the keys you specify in SORT FIELDS=, and writes the output in the new order. So the output order is determined entirely by the sort keys and the direction (A or D), not by the input order.
You use sorting when the application or report needs data in a specific sequence: for example, by customer ID, by date, or by name then account number. Sorting is the most common DFSORT operation. It uses the most resources (CPU and often sortwork datasets) because DFSORT must compare and move records until the full dataset is ordered. The larger the file and the more keys, the more work sorting does.
In control statements, you request a sort by coding SORT FIELDS=(position,length,format,direction,...). You do not code OPTION COPY or MERGE. Example: to sort an 80-byte fixed-length file by positions 1–10 (character, ascending) and then by positions 11–14 (a 4-byte packed decimal field, descending), you would code:
  SORT FIELDS=(1,10,CH,A,11,4,PD,D)
The input order is irrelevant; the output will always be in that key order (subject to INCLUDE/OMIT and any SUM).
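For context, here is a minimal sketch of a complete job step that could carry this SORT statement. The dataset names (HYPO.INPUT.FILE, HYPO.SORTED.FILE) and the SPACE values are hypothetical placeholders, and the step assumes your installation lets DFSORT allocate sortwork datasets dynamically, so no SORTWKnn DD statements are coded.

```jcl
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Input in any order (hypothetical dataset name)
//SORTIN   DD DSN=HYPO.INPUT.FILE,DISP=SHR
//* Output in key order (hypothetical dataset name and space)
//SORTOUT  DD DSN=HYPO.SORTED.FILE,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5),RLSE)
//SYSIN    DD *
* Key 1: 10 bytes from position 1, character, ascending
* Key 2: 4 bytes from position 11, packed decimal, descending
  SORT FIELDS=(1,10,CH,A,11,4,PD,D)
/*
```

The control statement starts after column 1, and the lines with an asterisk in column 1 are DFSORT comment statements.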
Copying means passing records from input to output without changing their order. The first input record is the first output record, the second is the second, and so on (after any filtering). You use copy when you do not need to reorder: for example, you only want to filter records (INCLUDE/OMIT), reformat them (INREC, OUTREC), or write to multiple outputs (OUTFIL). SUM is not available in a copy application, because it works on the control fields of a SORT or MERGE, so removing duplicates with SUM requires one of those. Because there is no sort phase, copy is usually faster than a full sort, uses less memory, and needs no sortwork datasets.
To request a copy in DFSORT, code one of these: OPTION COPY, SORT FIELDS=COPY, or MERGE FIELDS=COPY. All three are equivalent.
Important: copy does not mean "no other processing." You can still use INCLUDE, OMIT, INREC, OUTREC, and OUTFIL. So you might run OPTION COPY with INCLUDE to keep only certain records and OUTREC to reformat them: the records stay in input order, but you have filtered and reformatted. The exception is SUM: it collapses records with equal SORT or MERGE control fields, so it needs a SORT or MERGE statement and cannot be combined with a copy.
  OPTION COPY
  INCLUDE COND=(1,1,CH,EQ,C'A')
  OUTREC FIELDS=(1,80)
This keeps only records with 'A' in position 1 and writes them in the same order they appeared in the input; no sort is performed.
Merging means combining two or more input streams that are already sorted by the same key into one output stream that is also sorted by that key. DFSORT does not re-sort the data; it does a merge pass: it looks at the next record from each input, compares the keys, and writes the "smallest" (or "largest" for descending) to the output. So the output is the combined, sorted result without running a full sort algorithm. Merging is efficient when your inputs are already in order: for example, two daily files, each sorted by transaction ID, that you want to combine into one file in transaction ID order.
You request a merge by coding MERGE FIELDS=(position,length,format,direction,...), with a key definition that matches the order in which your input datasets are already sorted. You must allocate one input DD per stream, named SORTINnn (for example SORTIN01 and SORTIN02), instead of the single SORTIN DD used for a sort or copy. Each input must be pre-sorted by that key. If any input is not in the correct order, the job will not produce a correctly ordered result, because the merge pass only ever compares the current record of each input. So MERGE is only correct when you can guarantee the sort order of each input.
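A merge step along these lines might look like the following sketch. The dataset names and SPACE values are hypothetical, and both inputs are assumed to be already sorted ascending on positions 1–10 with the same record format.

```jcl
//MRGSTEP  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Each input must already be in key order (hypothetical names)
//SORTIN01 DD DSN=HYPO.DAILY.FILE1,DISP=SHR
//SORTIN02 DD DSN=HYPO.DAILY.FILE2,DISP=SHR
//SORTOUT  DD DSN=HYPO.COMBINED,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5),RLSE)
//SYSIN    DD *
* Same key and direction the inputs were sorted with
  MERGE FIELDS=(1,10,CH,A)
/*
```

No SORTIN DD is coded here; the merge reads only the SORTINnn inputs.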
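When to choose MERGE over SORT: choose MERGE when every input is already sorted by the same key and you only need to combine the inputs; choose SORT when any input is unsorted or sorted by a different key.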
If you have one unsorted file, you use SORT. If you have one file already sorted and only need to filter or reformat, you might use OPTION COPY. If you have two or more sorted files to combine, MERGE is the right and usually fastest choice.
Use SORT when the input is unsorted (or sorted by a different key) and you need output in a specific key order. Use OPTION COPY when you do not need to change order and only need to filter, reformat, or split (e.g. INCLUDE, OUTREC, OUTFIL). Use MERGE when you have two or more datasets already sorted by the same key and you want one combined sorted dataset. Picking the wrong one can waste resources or produce wrong results: using SORT when the inputs are already sorted and only need to be merged works but is wasteful, while using MERGE when an input is not sorted produces incorrect output.
Suppose you have one file of 80-byte records, sorted by position 1–10 (customer ID). You want only records with position 11 = 'Y' and output in the same order (customer ID order).
Option 1 — OPTION COPY: Use OPTION COPY, INCLUDE COND=(11,1,CH,EQ,C'Y'). No SORT FIELDS. Records stay in customer ID order; filtering is applied. Fast and correct.
Option 2 — SORT: Use SORT FIELDS=(1,10,CH,A), INCLUDE same as above. Result is also in customer ID order, but DFSORT runs a full sort. Slower and unnecessary if the file is already in that order.
Option 3 — MERGE: Not applicable—you have only one file. MERGE is for combining two or more pre-sorted files.
So for this requirement, OPTION COPY is the right choice. If you had two files already sorted by customer ID and wanted one combined file in customer ID order, you would use MERGE with two input DDs and MERGE FIELDS=(1,10,CH,A).
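As a sketch, the OPTION COPY choice for this requirement could be coded as the job step below. The dataset names and SPACE values are hypothetical; the input is assumed to be the single file already in customer ID order.

```jcl
//COPYSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Input already sorted by customer ID (hypothetical name)
//SORTIN   DD DSN=HYPO.CUSTOMER.FILE,DISP=SHR
//SORTOUT  DD DSN=HYPO.CUSTOMER.FLAGGED,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5),RLSE)
//SYSIN    DD *
* No reordering: records keep their input sequence
  OPTION COPY
* Keep only records with 'Y' in position 11
  INCLUDE COND=(11,1,CH,EQ,C'Y')
/*
```

Because no sort phase runs, the output order is simply the input order with the non-matching records dropped.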
Imagine you have cards with names and numbers. Sorting is like taking a shuffled pile and putting the cards in order by name (or number): you look at every card and rearrange them. Copying is like taking the pile as it is and just moving it to another table, maybe throwing away some cards or copying only part of each card, but you don't change the order they're in. Merging is when you have two piles that are already in order (e.g. both by name), and you combine them into one pile that's still in order by always taking whichever top card comes first. So: sort = put in order; copy = keep the order you have; merge = mix two already-ordered piles into one ordered pile.
1. When should you use MERGE instead of SORT?
2. What does OPTION COPY do?
3. Can you use INCLUDE and OUTREC with OPTION COPY?
4. What is the main difference between SORT and MERGE in terms of input?
5. Which operation is usually fastest for combining two already-sorted files?