Merge Performance

DFSORT MERGE is usually faster and uses fewer resources than SORT when you are combining data that is already sorted. MERGE does a linear pass over the input streams, while SORT reorders all records. This page explains why MERGE performs better for pre-sorted data, how sortwork and CPU usage compare, when sort-then-merge beats one big SORT, and what can hurt merge performance.

Why MERGE Is Faster Than SORT for Pre-Sorted Data

SORT takes one input and reorders it. To produce sorted output, DFSORT must compare records and rearrange them, which typically requires work proportional to n log n (where n is the number of records): many comparisons plus intermediate storage (sortwork). MERGE takes two or more inputs that are already in key order and only combines them. At each step it looks at the current record from each stream and writes whichever comes next in key order, so the work is proportional to n: a single pass over the data. That is why, for the same total number of records, MERGE uses less CPU and often less sortwork when the inputs are already sorted.
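
As a concrete sketch, a minimal DFSORT MERGE step looks like this (dataset names and key positions are hypothetical); both inputs must already be sorted on the key named in MERGE FIELDS=:

//MERGE1   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=PROD.SORTED.FILE1,DISP=SHR
//SORTIN02 DD DSN=PROD.SORTED.FILE2,DISP=SHR
//SORTOUT  DD DSN=PROD.MERGED.OUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,50),RLSE)
//SYSIN    DD *
* COMBINE TWO PRE-SORTED STREAMS IN ONE LINEAR PASS
  MERGE FIELDS=(1,10,CH,A)
/*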

SORT vs MERGE: performance aspects
AspectSORTMERGE
AlgorithmFull sort (reorder)Linear merge (combine streams)
Typical time complexityO(n log n)O(n)
SortworkOften substantialTypically less
When optimalSingle unsorted inputMultiple pre-sorted inputs

Sortwork and MERGE

Sortwork (SORTWKnn datasets) is temporary storage DFSORT uses during the sort phase. A full SORT often needs sortwork space proportional to the input size because it must hold and reorder records. MERGE, in contrast, only reads the next record from each input and writes whichever comes first in key order; it never needs to hold the entire dataset in sortwork to reorder it. So MERGE typically requires less sortwork than a SORT of the same total data (in some configurations, essentially none). The exact amount depends on your product version and options; when in doubt, allocate SORTWK DDs for a MERGE step as you would for a SORT, and the product will use only what it needs.
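
To make the contrast concrete, here is a sketch of a full SORT step with explicit work-space DDs (names and space figures are hypothetical and illustrative only); the MERGE step shown earlier moves the same volume of data without them:

//SORT1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROD.UNSORTED.DATA,DISP=SHR
//* WORK SPACE FOR THE SORT PHASE, SIZED RELATIVE TO THE INPUT
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(200,100))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(200,100))
//SORTOUT  DD DSN=PROD.SORTED.DATA,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(300,100),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*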

When Sort-Then-Merge Beats One Big SORT

Suppose you have a very large unsorted file. You could run one SORT over the whole file. Alternatively, you could split the file into several smaller files (e.g. by key range or by partition), run a SORT on each (in separate jobs, possibly in parallel), and then MERGE the sorted results. The second approach can be faster because: (1) each SORT operates on less data, so each step runs faster and needs fewer resources; (2) if the sort steps run in parallel, total elapsed time drops further; (3) the final MERGE is a single linear pass. So for very large data that can be split cleanly, sort-then-merge can reduce both elapsed time and peak resource use compared with one large SORT.
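
A sketch of the pattern in one job (dataset names are hypothetical; in practice the two SORT steps could run as separate jobs in parallel):

//* STEP 1: SORT THE FIRST PARTITION
//SORTA    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROD.BIG.PART1,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(100,50))
//SORTOUT  DD DSN=&&S1,DISP=(NEW,PASS),UNIT=SYSDA,
//            SPACE=(CYL,(100,50),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* STEP 2: SORT THE SECOND PARTITION
//SORTB    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROD.BIG.PART2,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(100,50))
//SORTOUT  DD DSN=&&S2,DISP=(NEW,PASS),UNIT=SYSDA,
//            SPACE=(CYL,(100,50),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* STEP 3: MERGE THE SORTED PARTITIONS IN ONE LINEAR PASS
//MERGEAB  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=&&S1,DISP=(OLD,DELETE)
//SORTIN02 DD DSN=&&S2,DISP=(OLD,DELETE)
//SORTOUT  DD DSN=PROD.BIG.SORTED,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(200,100),RLSE)
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*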

Number of Merge Inputs

With more merge inputs (e.g. 16 instead of 2), the merge still does one linear pass over the total records. The only extra cost is that for each output record, DFSORT must determine which input stream has the next record in key order. With 2 streams that is one comparison; with 16 streams it is more comparisons (or a small heap/priority structure). So there can be a modest increase in CPU when you have many inputs, but the overall work remains linear and much cheaper than a full re-sort. You should not avoid multi-file merge just because there are many inputs; the benefit of not re-sorting usually outweighs the cost of more comparisons per record.
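
For example, a four-input merge (hypothetical names) is still a single pass; for each output record DFSORT just picks the lowest key among four candidate records instead of two:

//MERGE4   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=PROD.SORTED.Q1,DISP=SHR
//SORTIN02 DD DSN=PROD.SORTED.Q2,DISP=SHR
//SORTIN03 DD DSN=PROD.SORTED.Q3,DISP=SHR
//SORTIN04 DD DSN=PROD.SORTED.Q4,DISP=SHR
//SORTOUT  DD DSN=PROD.SORTED.YEAR,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(200,100),RLSE)
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*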

What Can Hurt MERGE Performance

  • Inputs not pre-sorted: If any input is out of sequence, the merge cannot produce correct output; DFSORT typically detects the out-of-sequence record and fails the step. Fixing it (e.g. by re-sorting the inputs and rerunning) wastes the benefit of merge. Always ensure each input is sorted by the same key as MERGE FIELDS=.
  • Heavy INREC/OUTREC: Complex reformatting adds CPU for every record. Use only what you need; see the sketch after this list.
  • Very large records or many inputs: Large records mean more data movement; many inputs mean more comparisons per output record. Both are usually still far cheaper than one full SORT.
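
If you do need reformatting, keep it lean. Reusing the DD setup from the two-input MERGE example above, this control-statement sketch (key and field positions are hypothetical) merges on a 10-byte key and trims each output record to its first 20 bytes, reducing data movement on output:

//SYSIN    DD *
* ASSUMES FIXED-LENGTH RECORDS; KEEP ONLY THE BYTES THE NEXT STEP NEEDS
  MERGE FIELDS=(1,10,CH,A)
  OUTREC FIELDS=(1,20)
/*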

Concatenate-and-SORT vs MERGE

If you concatenate two sorted files into one SORTIN and run SORT, DFSORT treats the combined stream as unsorted and re-sorts everything. That uses more CPU and sortwork than feeding the two files as SORTIN01 and SORTIN02 and running MERGE. Whenever the data is already in the correct key order, use MERGE rather than SORT to combine it; the performance difference can be large for big files.
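
The two approaches side by side (hypothetical names; run one or the other, not both):

//* APPROACH 1: CONCATENATE AND SORT - RE-SORTS EVERYTHING
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROD.SORTED.FILE1,DISP=SHR
//         DD DSN=PROD.SORTED.FILE2,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(200,100))
//SORTOUT  DD DSN=PROD.COMBINED,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(300,100),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* APPROACH 2: MERGE - ONE LINEAR PASS, NO RE-SORT
//STEP2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=PROD.SORTED.FILE1,DISP=SHR
//SORTIN02 DD DSN=PROD.SORTED.FILE2,DISP=SHR
//SORTOUT  DD DSN=PROD.COMBINED,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(300,100),RLSE)
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*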

Explain It Like I'm Five

Imagine you have two stacks of cards already in A-to-Z order. To get one big A-to-Z stack, you don’t shuffle everything. You just keep picking the next card from the top of either stack (whichever comes first in the alphabet) and put it on the result. That’s fast—you only look at the top of each stack. That’s MERGE. If you mixed the two stacks into one big messy pile and then sorted the whole pile, you’d have to look at lots of cards and move them around. That’s SORT. So when your piles are already in order, merging is the fast way to combine them.

Exercises

  1. You have 4 sorted files of 1 million records each. Would you prefer to concatenate them and run SORT, or run MERGE with SORTIN01–SORTIN04? Why?
  2. What is the main reason MERGE typically needs less sortwork than SORT?
  3. When might splitting one large unsorted file into 8 parts, sorting each part, and then merging the 8 be faster than sorting the large file once?
  4. If you discover after a MERGE that one input was out of order, what are the implications for performance and correctness?

Quiz

Test Your Knowledge

1. Why is MERGE typically faster than SORT when combining pre-sorted data?

  • MERGE uses more memory
  • MERGE does a linear pass over the streams; SORT does a full reorder (e.g. O(n log n))
  • SORT always reads twice
  • MERGE skips INCLUDE

2. If you concatenate two sorted files into SORTIN and run SORT, vs feeding them as SORTIN01 and SORTIN02 and running MERGE, which usually uses more sortwork?

  • MERGE uses more
  • SORT usually uses more; it must reorder the combined data
  • They use the same
  • It depends only on record length

3. When might sort-then-merge be faster than one big SORT?

  • Never
  • When you can sort smaller partitions in parallel or in sequence and then merge; the merge is linear and the per-partition sorts are on smaller data
  • Only when you have two inputs
  • Only when using OPTION COPY

4. Does the number of merge inputs (e.g. 4 vs 16) significantly affect MERGE performance?

  • No; merge is always the same speed
  • More inputs mean more comparisons per output record (e.g. pick smallest of 16 vs 4); it can have some impact but merge is still linear
  • More inputs always make it faster
  • MERGE only allows 2 inputs

5. What can hurt MERGE performance?

  • Using OUTREC
  • Inputs that are not actually sorted (wrong order forces incorrect output; if you then fix by re-sorting, you lose the benefit)
  • Using SORTIN01
  • Using more than two inputs