MainframeMaster

MERGE vs SORT

In DFSORT, SORT and MERGE are two different ways to produce sorted output. SORT is for a single input that may be in any order: DFSORT reads it, reorders by the key you specify, and writes the result. MERGE is for two or more inputs that are each already sorted by the same key: DFSORT combines them in one pass without re-sorting. Choosing the right one affects correctness, performance, and JCL (which DD names you use). This page explains the differences in input, algorithm, performance, and when to use each.

Quick Comparison Table

The following table summarizes the main differences. The most important distinction is: one unsorted input → SORT; multiple inputs already sorted by the same key → MERGE.

SORT vs MERGE at a glance

| Aspect | SORT | MERGE |
| --- | --- | --- |
| Number of inputs | One | Two or more |
| Input DD name(s) | SORTIN | SORTIN01, SORTIN02, … |
| Input order requirement | Any (unsorted OK) | Each input must be pre-sorted by same key |
| Control statement | SORT FIELDS= | MERGE FIELDS= |
| Algorithm | Full sort (reorder all records) | Linear merge (combine sorted streams) |
| Typical use | One file to put in order | Combine already-sorted files |

Input: One vs Many

SORT has exactly one input. In JCL you allocate a single DD named SORTIN, and all records to be sorted come from that DD (one dataset, or several datasets concatenated under it). The records can be in any order: random, reverse, or partially sorted. DFSORT will read them and produce output ordered by the key you specify in SORT FIELDS=.

MERGE has two or more inputs. You allocate SORTIN01, SORTIN02, and optionally SORTIN03 through SORTIN16 (or more depending on the product). You do not use SORTIN in a MERGE step. Each of these inputs must already be sorted by the same key (and format and direction) that you specify in MERGE FIELDS=. If you have only one input, you use SORT, not MERGE.

Order Requirement: Any vs Pre-Sorted

With SORT, the input can be in any order. DFSORT is designed to reorder it. You can feed one big unsorted file and get back one sorted file. No prior sorting step is required.

With MERGE, every input must already be in key order. DFSORT does not re-sort the inputs; the merge logic relies on each record read from SORTIN01 having a key greater than or equal to the previous one (for ascending). If an input is out of order, you do not get a valid merged result: a plain merge would write a record with key 500 before a record with key 100, and DFSORT normally detects the out-of-sequence record and ends the step with message ICE068A (OUT OF SEQUENCE SORTINnn) rather than repairing the order. So before using MERGE, you must ensure each input was produced by a sort (or another process) that used the same key as MERGE FIELDS=. If you are not sure, run a SORT step on each input first with that key, then MERGE the results.
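The "sort each input first, then MERGE" fallback described above could look like the following sketch. The dataset names, the temporary names &&SRT1 and &&SRT2, UNIT=SYSDA, and the SPACE values are placeholders chosen for illustration, and the sketch assumes the installation allocates sort work space dynamically, so no SORTWKnn DDs are coded. The key is the bytes 1–10 character ascending key used in this page's other examples.

```jcl
//* Hypothetical job steps: dataset names, UNIT, and SPACE are
//* placeholders, not values from a real system.
//* Step 1: put the first input into key order.
//SORT1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.UNSORTED.PART1,DISP=SHR
//SORTOUT  DD DSN=&&SRT1,DISP=(NEW,PASS),
//            UNIT=SYSDA,SPACE=(CYL,(10,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* Step 2: put the second input into the same key order.
//SORT2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.UNSORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=&&SRT2,DISP=(NEW,PASS),
//            UNIT=SYSDA,SPACE=(CYL,(10,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* Step 3: merge the two sorted results using the same key.
//MERGE1   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=&&SRT1,DISP=(OLD,DELETE)
//SORTIN02 DD DSN=&&SRT2,DISP=(OLD,DELETE)
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(20,10))
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*
```

The point to notice is that the FIELDS= list is identical in all three steps; only the verb (SORT or MERGE) and the input DD names change.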

Algorithm: Full Sort vs Linear Merge

SORT performs a full sort. DFSORT reads all (or batches of) records, compares them by the sort key, and reorders them so that the output is in ascending or descending order. This typically involves comparison and rearrangement of many records, with time complexity on the order of n log n for n records.

MERGE performs a linear merge. It does not reorder; it only combines. At any moment, DFSORT has the "current" record from each input stream. It compares the keys of those records, writes the one that comes first in the desired order to SORTOUT, and reads the next record from that stream. It repeats until all streams are exhausted. So for a fixed number of inputs, the work grows in proportion to the total number of records (linear), as long as the inputs are already sorted. That is why MERGE is more efficient when you have pre-sorted data: it avoids the extra work of a full sort.

Why Merge Is Faster When Data Is Already Sorted

If you concatenate two sorted files into one and run SORT, DFSORT treats the combined file as unsorted and runs a full sort. If you instead feed the two files as SORTIN01 and SORTIN02 and run MERGE, DFSORT only merges: one pass, comparing the next record from each stream. So for the same total data size, MERGE uses less CPU and, because a merge needs no intermediate storage, does not require sort work datasets (SORTWKnn). The rule of thumb: if the data is already sorted by the key you need, use MERGE to combine it; if not, use SORT.

Control Statement: SORT FIELDS= vs MERGE FIELDS=

The control statement that drives the operation is different: SORT FIELDS= for sorting, MERGE FIELDS= for merging. The syntax of the FIELDS= parameter is the same for both. You specify the key as (start position, length, format, direction), and optionally additional keys. For example:

```text
  SORT FIELDS=(1,10,CH,A)
  MERGE FIELDS=(1,10,CH,A)
```

Both use bytes 1–10 as character ascending. With SORT, that is the key to sort by. With MERGE, that is the key that all inputs must already be sorted by. So when you prepare inputs for MERGE (e.g. with prior SORT steps), use the same key in those SORT steps as you will use in MERGE FIELDS=.
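Additional keys go inside the same parentheses, four values per key, with the most significant key first. As an illustration with made-up field positions (bytes 1–8 character ascending, then bytes 21–25 zoned decimal descending), the preparatory SORT steps and the later MERGE step would carry matching key lists. The positions and formats here are assumptions for the example, not taken from a real record layout.

```jcl
//* In each preparatory SORT step (hypothetical key positions):
//SYSIN    DD *
  SORT FIELDS=(1,8,CH,A,21,5,ZD,D)
/*
//* In the later MERGE step, the same key list:
//SYSIN    DD *
  MERGE FIELDS=(1,8,CH,A,21,5,ZD,D)
/*
```

If any element differs between the two lists (position, length, format, or direction), the inputs are not in the order MERGE expects.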

When to Use SORT

  • You have one input dataset and records are unsorted or in the wrong order.
  • You want to order a single file by a key (e.g. customer ID, date, or a composite key).
  • You are producing sorted input for a later MERGE (e.g. sort each partition, then merge).
  • You have multiple files but they are not sorted; in that case you can concatenate them into SORTIN and run one SORT (or sort each and then MERGE).

When to Use MERGE

  • You have two or more input datasets and each is already sorted by the same key.
  • You want one combined output in that same key order without re-sorting (e.g. combining daily sorted files into one weekly file).
  • You ran separate SORT steps (e.g. one per partition or one per day) and now want to combine the results efficiently.
  • You are building one sorted master from multiple sorted feeds (e.g. from different regions or systems).

Decision Flow

A simple decision flow:

  1. How many input datasets do you have? If one, use SORT (with SORTIN).
  2. If two or more, are they already sorted by the same key you want for the output? If yes, use MERGE (with SORTIN01, SORTIN02, …).
  3. If no, either run SORT on each and then MERGE, or concatenate all into one SORTIN and run a single SORT. The single SORT is simpler but may use more resource than sort-then-merge if the data is large and was already partially sorted.
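For the "concatenate all into one SORTIN" branch, a minimal sketch follows. The dataset names are hypothetical, and it assumes both files share the same record format and length (FB, LRECL=80, as in this page's other examples) so they can be concatenated under one DD.

```jcl
//* Hypothetical dataset names; both files are read as one
//* logical SORTIN, so their existing order does not matter.
//SORTCAT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.FILE.A,DISP=SHR
//         DD DSN=MY.FILE.B,DISP=SHR
//SORTOUT  DD DSN=MY.SORTED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
```

The second DD statement has no name, which is how JCL expresses concatenation: DFSORT sees a single input stream and performs a full sort on it.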

JCL Examples Side by Side

SORT: one input

```jcl
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.UNSORTED.DATA,DISP=SHR
//SORTOUT  DD DSN=MY.SORTED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
```

One input (SORTIN), one output (SORTOUT). Records in MY.UNSORTED.DATA can be in any order; SORT reorders them by bytes 1–10 character ascending.

MERGE: two inputs

```jcl
//MERGESTP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*
```

Two inputs (SORTIN01, SORTIN02), one output. MY.SORTED.PART1 and MY.SORTED.PART2 must each already be sorted by bytes 1–10 character ascending. MERGE combines them without re-sorting.

Filtering and Reformatting: Same for Both

INCLUDE, OMIT, INREC, OUTREC, and OUTFIL work the same with SORT and MERGE. MERGE only changes the middle phase (merge instead of sort). So you can filter records with INCLUDE or OMIT, reformat before the sort/merge with INREC, and reformat after with OUTREC, regardless of whether you used SORT or MERGE. The choice of SORT vs MERGE is purely about number of inputs and whether they are already sorted.
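As a sketch of filtering combined with a merge, the step below keeps only records whose bytes 11–12 contain NY while merging two pre-sorted inputs. The filter field, its value, and the output dataset name are hypothetical; the input names and key match the MERGE example above.

```jcl
//* Hypothetical filter: keep only records with C'NY' in bytes 11-12.
//MERGEFLT EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED.NY,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5))
//SYSIN    DD *
  INCLUDE COND=(11,2,CH,EQ,C'NY')
  MERGE FIELDS=(1,10,CH,A)
/*
```

Changing the step to a sort would mean replacing SORTIN01 and SORTIN02 with a single SORTIN and MERGE with SORT; the INCLUDE statement would stay exactly as written.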

Explain It Like I'm Five

Imagine you have two stacks of cards, and each stack is already in order from A to Z. You want one big stack that is still A to Z. You don't shuffle everything. You merge: look at the top card of the first stack and the top card of the second stack. Whichever letter comes first, you put that card on the result pile. Then you look at the next top cards and do it again. That's MERGE. Now imagine you have one messy pile of cards in random order. You have to sort them: look at all of them and put them in A-to-Z order. That's SORT. So: one messy pile → SORT. Two (or more) piles that are already in order → MERGE to combine them.

Exercises

  1. You have four datasets; two are sorted by bytes 5–8 (packed decimal, ascending) and two are not sorted. How would you produce one output sorted by 5–8 PD A? (Hint: sort the two unsorted ones first, then merge all four.)
  2. Why can MERGE be faster than SORT when combining the same total number of records?
  3. Write the MERGE control statement to merge inputs that are each sorted by bytes 1–6 CH ascending and then bytes 7–10 PD descending.
  4. If you mistakenly use SORTIN for one of two inputs in a MERGE step, what will DFSORT do?

Quiz

Test Your Knowledge

1. You have one large unsorted file. Which do you use?

  • MERGE with SORTIN01
  • SORT with SORTIN
  • MERGE with SORTIN
  • SORT with SORTIN01 and SORTIN02

2. You have three files, each already sorted by customer ID. You want one combined file in the same order. Which is more efficient?

  • Concatenate the three into SORTIN and run SORT
  • Use MERGE with SORTIN01, SORTIN02, SORTIN03
  • Run SORT three times
  • MERGE and SORT are equally efficient

3. What is the main algorithmic difference between SORT and MERGE?

  • SORT compares two records; MERGE compares three
  • SORT reorders all records (full sort); MERGE only combines pre-sorted streams in order (linear merge)
  • MERGE uses more memory
  • There is no algorithmic difference

4. Which DD names do you use for a two-input MERGE?

  • SORTIN (twice)
  • SORTIN01 and SORTIN02
  • SORTIN and SORTIN02
  • SORTOUT only

5. What happens if you run MERGE but one input is not sorted by the MERGE key?

  • DFSORT sorts that input first automatically
  • MERGE does not re-sort that input, so you do not get a correctly ordered result; MERGE assumes inputs are already in order
  • Only that input is skipped
  • DFSORT abends