In DFSORT, SORT and MERGE are two different ways to produce sorted output. SORT is for a single input that may be in any order: DFSORT reads it, reorders by the key you specify, and writes the result. MERGE is for two or more inputs that are each already sorted by the same key: DFSORT combines them in one pass without re-sorting. Choosing the right one affects correctness, performance, and JCL (which DD names you use). This page explains the differences in input, algorithm, performance, and when to use each.
The following table summarizes the main differences. The most important distinction is: one unsorted input → SORT; multiple inputs already sorted by the same key → MERGE.
| Aspect | SORT | MERGE |
|---|---|---|
| Number of inputs | One | Two or more |
| Input DD name(s) | SORTIN | SORTIN01, SORTIN02, … |
| Input order requirement | Any (unsorted OK) | Each input must be pre-sorted by same key |
| Control statement | SORT FIELDS= | MERGE FIELDS= |
| Algorithm | Full sort (reorder all records) | Linear merge (combine sorted streams) |
| Typical use | One file to put in order | Combine already-sorted files |
SORT has exactly one input. In JCL you allocate a single DD named SORTIN. All records to be sorted come from that dataset. The records can be in any order—random, reverse, or partially sorted. DFSORT will read them and produce output ordered by the key you specify in SORT FIELDS=.
MERGE has two or more inputs. You allocate SORTIN01, SORTIN02, and optionally SORTIN03 through SORTIN16 (or more depending on the product). You do not use SORTIN in a MERGE step. Each of these inputs must already be sorted by the same key (and format and direction) that you specify in MERGE FIELDS=. If you have only one input, you use SORT, not MERGE.
With SORT, the input can be in any order. DFSORT is designed to reorder it. You can feed one big unsorted file and get back one sorted file. No prior sorting step is required.
With MERGE, every input must already be in key order. DFSORT does not fix the order; it expects that when it reads the next record from SORTIN01, that record has a key greater than or equal to the previous one (for ascending). If an input is out of order, the step cannot produce a correct merge: DFSORT normally detects the sequence break and ends the step with an out-of-sequence error (message ICE068A), and a merge that did not check would write records in the wrong order, for example a record with key 500 before a record with key 100. So before using MERGE, make sure each input was produced by a sort (or another process) that used the same key as MERGE FIELDS=. If you are not sure, run a SORT step on each input first with that key, then MERGE the results.
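The sort-each-input-then-merge approach can be sketched as a three-step job. This is a minimal sketch under stated assumptions: the input names MY.UNSORTED.PART1 and MY.UNSORTED.PART2, the step names, the temporary data set names (&&SRT1, &&SRT2), UNIT=SYSDA, and the space values are all hypothetical, and the JOB statement is omitted as in the other examples on this page.

```jcl
//* Step 1: sort the first input by the key MERGE will use
//SORT1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.UNSORTED.PART1,DISP=SHR
//SORTOUT  DD DSN=&&SRT1,DISP=(NEW,PASS),
//            UNIT=SYSDA,SPACE=(CYL,(10,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* Step 2: sort the second input by the same key
//SORT2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.UNSORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=&&SRT2,DISP=(NEW,PASS),
//            UNIT=SYSDA,SPACE=(CYL,(10,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* Step 3: merge the two sorted temporary data sets
//MERGE1   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=&&SRT1,DISP=(OLD,DELETE)
//SORTIN02 DD DSN=&&SRT2,DISP=(OLD,DELETE)
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5))
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*
```

The key point is that all three control statements name the same field, format, and direction, so the inputs to the final step are guaranteed to be in the order MERGE expects.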
SORT performs a full sort. DFSORT reads all (or batches of) records, compares them by the sort key, and reorders them so that the output is in ascending or descending order. This typically involves comparison and rearrangement of many records, with time complexity on the order of n log n for n records.
MERGE performs a linear merge. It does not reorder; it only combines. At any moment, DFSORT has the "current" record from each input stream. It compares the keys of those records, writes the one that comes first in the desired order to SORTOUT, and reads the next record from that stream. It repeats until all streams are exhausted. So the work is proportional to the total number of records (linear), as long as the inputs are already sorted. That is why MERGE is more efficient when you have pre-sorted data: it avoids the extra work of a full sort.
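To watch that compare-write-read cycle on a tiny scale, a merge can be run against in-stream data. The job below is an illustrative sketch: the step name, the record contents, and the three-byte key in bytes 1–3 are made up for the demonstration, and the output is sent to SYSOUT so it can be inspected in the job log.

```jcl
//MRGDEMO  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* First stream, already ascending on bytes 1-3
//SORTIN01 DD *
100 FIRST STREAM
300 FIRST STREAM
500 FIRST STREAM
/*
//* Second stream, also ascending on bytes 1-3
//SORTIN02 DD *
200 SECOND STREAM
400 SECOND STREAM
/*
//* Merged result goes to the job output for inspection
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
  MERGE FIELDS=(1,3,CH,A)
/*
```

Tracing it by hand: the current records are 100 and 200, so 100 is written and 300 is read; 200 beats 300, so 200 is written and 400 is read; then 300, then 400 (which exhausts SORTIN02), then 500. The merged order is 100, 200, 300, 400, 500, and each record was compared only against the head of the other stream.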
If you concatenate two sorted files into one and run SORT, DFSORT treats the combined file as unsorted and runs a full sort. If you instead feed the two files as SORTIN01 and SORTIN02 and run MERGE, DFSORT only merges: one pass, comparing the next record from each stream. So for the same total data size, MERGE uses less CPU and, unlike SORT, does not need sort work space (SORTWKnn data sets) when the inputs are already in order. The rule of thumb: if the data is already sorted by the key you need, use MERGE to combine it; if not, use SORT.
The control statement that drives the operation is different: SORT FIELDS= for sorting, MERGE FIELDS= for merging. The syntax of the FIELDS= parameter is the same for both. You specify the key as (start position, length, format, direction), and optionally additional keys. For example:
    SORT FIELDS=(1,10,CH,A)
    MERGE FIELDS=(1,10,CH,A)
Both use bytes 1–10 as character ascending. With SORT, that is the key to sort by. With MERGE, that is the key that all inputs must already be sorted by. So when you prepare inputs for MERGE (e.g. with prior SORT steps), use the same key in those SORT steps as you will use in MERGE FIELDS=.
A simple decision flow: (1) How many input datasets do you have? If one, use SORT (with SORTIN). (2) If two or more, are they already sorted by the same key you want for the output? If yes, use MERGE (with SORTIN01, SORTIN02, …). If no, either run SORT on each and then MERGE, or concatenate all of them into one SORTIN and run a single SORT. The single SORT is simpler to code, but it may use more resources than sort-then-merge when the data is large and some of the inputs were already sorted.
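The concatenation alternative can be sketched as follows. The data set names MY.UNSORTED.PART1 and MY.UNSORTED.PART2 and the step name are hypothetical, and the sketch assumes both inputs have the same record format and record length so they can be read as one SORTIN.

```jcl
//SORTALL  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Concatenation: DFSORT reads both data sets as one SORTIN input
//SORTIN   DD DSN=MY.UNSORTED.PART1,DISP=SHR
//         DD DSN=MY.UNSORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.SORTED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
```

The second DD statement has no name, which is what makes it a continuation of SORTIN rather than a separate input. Because this is a SORT, neither data set needs to be in key order beforehand.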
    //SORTSTEP EXEC PGM=SORT
    //SYSOUT   DD SYSOUT=*
    //SORTIN   DD DSN=MY.UNSORTED.DATA,DISP=SHR
    //SORTOUT  DD DSN=MY.SORTED.OUTPUT,DISP=(NEW,CATLG,DELETE),
    //            SPACE=(CYL,(10,5)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
    //SYSIN    DD *
      SORT FIELDS=(1,10,CH,A)
    /*
One input (SORTIN), one output (SORTOUT). Records in MY.UNSORTED.DATA can be in any order; SORT reorders them by bytes 1–10 character ascending.
    //MERGESTP EXEC PGM=SORT
    //SYSOUT   DD SYSOUT=*
    //SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
    //SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
    //SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
    //            SPACE=(CYL,(10,5)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
    //SYSIN    DD *
      MERGE FIELDS=(1,10,CH,A)
    /*
Two inputs (SORTIN01, SORTIN02), one output. MY.SORTED.PART1 and MY.SORTED.PART2 must each already be sorted by bytes 1–10 character ascending. MERGE combines them without re-sorting.
INCLUDE, OMIT, INREC, OUTREC, and OUTFIL work the same with SORT and MERGE. MERGE only changes the middle phase (merge instead of sort). So you can filter records with INCLUDE or OMIT, reformat before the sort/merge with INREC, and reformat after with OUTREC, regardless of whether you used SORT or MERGE. The choice of SORT vs MERGE is purely about number of inputs and whether they are already sorted.
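As an illustration of combining these statements with MERGE, the sketch below filters and trims records while merging. The two-byte state code in bytes 11–12, the value 'NY', the 40-byte output layout, and the output data set name MY.MERGED.NY are assumptions made up for the example; the input names are reused from the MERGE example above.

```jcl
//MRGFILT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED.NY,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5))
//SYSIN    DD *
* Keep only records with 'NY' in bytes 11-12 (hypothetical field)
  INCLUDE COND=(11,2,CH,EQ,C'NY')
* Inputs must already be ascending on bytes 1-10
  MERGE FIELDS=(1,10,CH,A)
* Write only the first 40 bytes of each merged record
  OUTREC FIELDS=(1,40)
/*
```

Replacing MERGE FIELDS= with SORT FIELDS= (and SORTIN01/SORTIN02 with a single SORTIN) would leave the INCLUDE and OUTREC statements unchanged. Dropping records with INCLUDE does not disturb the relative order of the records that remain, so the filtered inputs still satisfy the merge requirement.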
Imagine you have two stacks of cards, and each stack is already in order from A to Z. You want one big stack that is still A to Z. You don't shuffle everything. You merge: look at the top card of the first stack and the top card of the second stack. Whichever letter comes first, you put that card on the result pile. Then you look at the next top cards and do it again. That's MERGE. Now imagine you have one messy pile of cards in random order. You have to sort them: look at all of them and put them in A-to-Z order. That's SORT. So: one messy pile → SORT. Two (or more) piles that are already in order → MERGE to combine them.
1. You have one large unsorted file. Which do you use?
2. You have three files, each already sorted by customer ID. You want one combined file in the same order. Which is more efficient?
3. What is the main algorithmic difference between SORT and MERGE?
4. Which DD names do you use for a two-input MERGE?
5. What happens if you run MERGE but one input is not sorted by the MERGE key?