What is the difference between MERGE and SORT in DFSORT?

SORT takes one input (SORTIN), reorders records by a key, and writes to SORTOUT. MERGE takes two or more inputs (SORTIN01, SORTIN02, …) that are already sorted by the same key and merges them into one sorted stream without re-sorting. Use SORT for one unsorted file; use MERGE when you have multiple files already in key order.

When should I use MERGE instead of SORT?

Use MERGE when you have two or more input datasets that are each already sorted by the same key and you want one combined sorted output. Use SORT when you have a single input that is unsorted or in the wrong order. MERGE is more efficient for combining pre-sorted data because it does a linear merge instead of a full sort.

Do MERGE and SORT use the same FIELDS= syntax?

Yes. MERGE FIELDS= and SORT FIELDS= use the same syntax: (position, length, format, direction) for each key. The key you specify in MERGE FIELDS= must be the key that all MERGE inputs are already sorted by.

Can I use INCLUDE, OMIT, INREC, and OUTREC with MERGE?

Yes. MERGE only changes the sort phase (merge instead of sort). INCLUDE and OMIT filter records from each input; INREC reformats before the merge; OUTREC reformats after the merge. These control statements work the same with both SORT and MERGE.

Why is MERGE faster than SORT for combining sorted files?

MERGE performs a single pass over the inputs, always taking the next record in key order from one of the streams. For a fixed number of inputs, that work grows linearly with the total record count, O(n). SORT must compare and reorder all records, which is typically O(n log n). So when the data is already sorted, MERGE avoids the extra work of a full sort.

MERGE vs SORT

In DFSORT, SORT and MERGE are two different ways to produce sorted output. SORT is for a single input that may be in any order: DFSORT reads it, reorders by the key you specify, and writes the result. MERGE is for two or more inputs that are each already sorted by the same key: DFSORT combines them in one pass without re-sorting. Choosing the right one affects correctness, performance, and JCL (which DD names you use). This page explains the differences in input, algorithm, performance, and when to use each.

Quick Comparison Table

The following table summarizes the main differences. The most important distinction is: one unsorted input → SORT; multiple inputs already sorted by the same key → MERGE.

SORT vs MERGE at a glance
Aspect	SORT	MERGE
Number of inputs	One	Two or more
Input DD name(s)	SORTIN	SORTIN01, SORTIN02, …
Input order requirement	Any (unsorted OK)	Each input must be pre-sorted by same key
Control statement	SORT FIELDS=	MERGE FIELDS=
Algorithm	Full sort (reorder all records)	Linear merge (combine sorted streams)
Typical use	One file to put in order	Combine already-sorted files

Input: One vs Many

SORT has exactly one input. In JCL you allocate a single DD named SORTIN. All records to be sorted come from that dataset. The records can be in any order—random, reverse, or partially sorted. DFSORT will read them and produce output ordered by the key you specify in SORT FIELDS=.

MERGE has two or more inputs. You allocate SORTIN01, SORTIN02, and optionally further SORTINnn DDs. The maximum number depends on the product and release: 16 is a common limit on older systems, while current DFSORT releases document up to 100 merge inputs. You do not use SORTIN in a MERGE step. Each of these inputs must already be sorted by the same key (and format and direction) that you specify in MERGE FIELDS=. If you have only one input, you use SORT, not MERGE.

Order Requirement: Any vs Pre-Sorted

With SORT, the input can be in any order. DFSORT is designed to reorder it. You can feed one big unsorted file and get back one sorted file. No prior sorting step is required.

With MERGE, every input must already be in key order. DFSORT never re-sorts a merge input; it relies on each record read from SORTIN01 having a key greater than or equal to the previous one (for ascending). If an input is out of order, you cannot get a correctly merged file: a record with key 500 would come before a record with key 100. DFSORT documents message ICE068A (OUT OF SEQUENCE SORTINnn) for this case and terminates the merge rather than completing it. So before using MERGE, ensure each input was produced by a sort (or another process) that used the same key as MERGE FIELDS=. If you are not sure, run a SORT step on each input first with that key, then MERGE the results.
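The sort-then-merge approach described above might look like the following sketch. The data set names, the temporary data sets (&&SRT1, &&SRT2), and the key (bytes 1–10, character, ascending) are assumptions for illustration, and space and unit parameters will depend on your site.

```jcl
//* Sketch only: data set names and SPACE values are hypothetical.
//* Steps 1 and 2 put each raw input into key order.
//SORT1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.RAW.PART1,DISP=SHR
//SORTOUT  DD DSN=&&SRT1,DISP=(NEW,PASS),SPACE=(CYL,(10,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//SORT2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.RAW.PART2,DISP=SHR
//SORTOUT  DD DSN=&&SRT2,DISP=(NEW,PASS),SPACE=(CYL,(10,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* Step 3 merges the two sorted results using the same key.
//MERGE1   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=&&SRT1,DISP=(OLD,DELETE)
//SORTIN02 DD DSN=&&SRT2,DISP=(OLD,DELETE)
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5))
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*
```

The point of the sketch is that the FIELDS= operand is identical in all three steps; if the SORT steps used a different key, format, or direction, the MERGE inputs would not be in the sequence MERGE expects.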

Algorithm: Full Sort vs Linear Merge

SORT performs a full sort. DFSORT reads all (or batches of) records, compares them by the sort key, and reorders them so that the output is in ascending or descending order. This typically involves comparison and rearrangement of many records, with time complexity on the order of n log n for n records.

MERGE performs a linear merge. It does not reorder; it only combines. At any moment, DFSORT has the "current" record from each input stream. It compares the keys of those records, writes the one that comes first in the desired order to SORTOUT, and reads the next record from that stream. It repeats until all streams are exhausted. So, for a fixed number of inputs, the work is proportional to the total number of records (linear), as long as the inputs are already sorted. That is why MERGE is more efficient when you have pre-sorted data: it avoids the extra work of a full sort.

Why Merge Is Faster When Data Is Already Sorted

If you concatenate two sorted files into one and run SORT, DFSORT treats the combined file as unsorted and runs a full sort. If you instead feed the two files as SORTIN01 and SORTIN02 and run MERGE, DFSORT only merges: one pass, comparing the next record from each stream. So for the same total data size, MERGE uses less CPU and, because a merge needs no intermediate storage, typically no sort work data sets (SORTWKnn). The rule of thumb: if the data is already sorted by the key you need, use MERGE to combine it; if not, use SORT.
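For contrast, here is a sketch of the concatenation approach that the paragraph above advises against for pre-sorted data. The output data set name is hypothetical; the inputs reuse the names from this page's MERGE example.

```jcl
//* Sketch only: concatenating two already-sorted files into SORTIN.
//* DFSORT sees one unsorted stream and performs a full sort.
//RESORT   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.SORTED.PART1,DISP=SHR
//         DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.COMBINED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
```

The output is correct, but the existing order of each input is not exploited. Allocating the same two data sets as SORTIN01 and SORTIN02 and coding MERGE FIELDS=(1,10,CH,A) should give the same result with a single merge pass.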

Control Statement: SORT FIELDS= vs MERGE FIELDS=

The control statement that drives the operation is different: SORT FIELDS= for sorting, MERGE FIELDS= for merging. The syntax of the FIELDS= parameter is the same for both. You specify the key as (start position, length, format, direction), and optionally additional keys. For example:

text

  SORT FIELDS=(1,10,CH,A)
  MERGE FIELDS=(1,10,CH,A)

Both use bytes 1–10 as character ascending. With SORT, that is the key to sort by. With MERGE, that is the key that all inputs must already be sorted by. So when you prepare inputs for MERGE (e.g. with prior SORT steps), use the same key in those SORT steps as you will use in MERGE FIELDS=.
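Additional keys are appended to the same FIELDS= list, four values per key. The following is a minimal sketch of a two-key merge; the field positions (a secondary packed-decimal key in bytes 21–24) are assumptions chosen only for illustration.

```jcl
//SYSIN    DD *
* Primary key:   bytes 1-10, character, ascending.
* Secondary key: bytes 21-24, packed decimal, descending.
  MERGE FIELDS=(1,10,CH,A,21,4,PD,D)
/*
```

A SORT step that prepares inputs for this merge would code the same operand list: SORT FIELDS=(1,10,CH,A,21,4,PD,D).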

When to Use SORT

You have one input dataset and records are unsorted or in the wrong order.
You want to order a single file by a key (e.g. customer ID, date, or a composite key).
You are producing sorted input for a later MERGE (e.g. sort each partition, then merge).
You have multiple files but they are not sorted; in that case you can concatenate them into SORTIN and run one SORT (or sort each and then MERGE).

When to Use MERGE

You have two or more input datasets and each is already sorted by the same key.
You want one combined output in that same key order without re-sorting (e.g. combining daily sorted files into one weekly file).
You ran separate SORT steps (e.g. one per partition or one per day) and now want to combine the results efficiently.
You are building one sorted master from multiple sorted feeds (e.g. from different regions or systems).

Decision Flow

A simple decision flow: (1) How many input datasets do you have? If one, use SORT (with SORTIN). (2) If two or more, are they already sorted by the same key you want for the output? If yes, use MERGE (with SORTIN01, SORTIN02, …). If no, either run SORT on each and then MERGE, or concatenate all into one SORTIN and run a single SORT. The latter is simpler but may use more resources than sort-then-merge if the data is large and already partially sorted.

JCL Examples Side by Side

SORT: one input

jcl

//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.UNSORTED.DATA,DISP=SHR
//SORTOUT  DD DSN=MY.SORTED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*

One input (SORTIN), one output (SORTOUT). Records in MY.UNSORTED.DATA can be in any order; SORT reorders them by bytes 1–10 character ascending.

MERGE: two inputs

jcl

//MERGESTP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*

Two inputs (SORTIN01, SORTIN02), one output. MY.SORTED.PART1 and MY.SORTED.PART2 must each already be sorted by bytes 1–10 character ascending. MERGE combines them without re-sorting.

Filtering and Reformatting: Same for Both

INCLUDE, OMIT, INREC, OUTREC, and OUTFIL work the same with SORT and MERGE. MERGE only changes the middle phase (merge instead of sort). So you can filter records with INCLUDE or OMIT, reformat before the sort/merge with INREC, and reformat after with OUTREC, regardless of whether you used SORT or MERGE. The choice of SORT vs MERGE is purely about number of inputs and whether they are already sorted.
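As a rough illustration of filtering and reformatting in a MERGE step, the sketch below keeps only records with a hypothetical two-byte state code of NY in bytes 11–12, merges on bytes 1–10, and then rebuilds each record as the first 12 bytes padded with blanks to 80. The field layout, the NY value, and the output data set name are all assumptions.

```jcl
//* Sketch only: field positions and data set names are hypothetical.
//MERGEFLT EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED.SUBSET,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5))
//SYSIN    DD *
* Keep only records whose bytes 11-12 contain NY.
  INCLUDE COND=(11,2,CH,EQ,C'NY')
* Inputs must already be in this key order.
  MERGE FIELDS=(1,10,CH,A)
* After the merge, keep bytes 1-12 and pad with 68 blanks.
  OUTREC BUILD=(1,12,68X)
/*
```

Dropping records with INCLUDE does not disturb the sequence of the records that remain, so the inputs still satisfy the MERGE ordering requirement. Replacing MERGE with SORT (and SORTIN01/SORTIN02 with a single SORTIN) would leave the INCLUDE and OUTREC statements unchanged.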

Explain It Like I'm Five

Imagine you have two stacks of cards, and each stack is already in order from A to Z. You want one big stack that is still A to Z. You don't shuffle everything. You merge: look at the top card of the first stack and the top card of the second stack. Whichever letter comes first, you put that card on the result pile. Then you look at the next top cards and do it again. That's MERGE. Now imagine you have one messy pile of cards in random order. You have to sort them: look at all of them and put them in A-to-Z order. That's SORT. So: one messy pile → SORT. Two (or more) piles that are already in order → MERGE to combine them.

Exercises

You have four datasets; two are sorted by bytes 5–8 (packed decimal, ascending) and two are not sorted. How would you produce one output sorted by 5–8 PD A? (Hint: sort the two unsorted ones first, then merge all four.)
Why can MERGE be faster than SORT when combining the same total number of records?
Write the MERGE control statement to merge inputs that are each sorted by bytes 1–6 CH ascending and then bytes 7–10 PD descending.
If you mistakenly use SORTIN for one of two inputs in a MERGE step, what will DFSORT do?

Quiz

Test Your Knowledge

1. You have one large unsorted file. Which do you use?

MERGE with SORTIN01
SORT with SORTIN
MERGE with SORTIN
SORT with SORTIN01 and SORTIN02

2. You have three files, each already sorted by customer ID. You want one combined file in the same order. Which is more efficient?

Concatenate the three into SORTIN and run SORT
Use MERGE with SORTIN01, SORTIN02, SORTIN03
Run SORT three times
MERGE and SORT are equally efficient

3. What is the main algorithmic difference between SORT and MERGE?

SORT compares two records; MERGE compares three
SORT reorders all records (full sort); MERGE only combines pre-sorted streams in order (linear merge)
MERGE uses more memory
There is no algorithmic difference

4. Which DD names do you use for a two-input MERGE?

SORTIN (twice)
SORTIN01 and SORTIN02
SORTIN and SORTIN02
SORTOUT only

5. What happens if you run MERGE but one input is not sorted by the MERGE key?

DFSORT sorts that input first automatically
The merge cannot produce a correctly ordered result, and DFSORT does not repair it; MERGE assumes inputs are already in order
Only that input is skipped
DFSORT abends