What is the MERGE statement in DFSORT?

The MERGE statement tells DFSORT to merge two or more pre-sorted input streams into one sorted output. You use MERGE FIELDS=(position,length,format,direction,...) to define the key that all inputs are sorted by. Inputs are SORTIN01, SORTIN02, etc.; output is SORTOUT.

What is the difference between SORT and MERGE in DFSORT?

SORT takes one input (SORTIN), reorders records by the specified key, and writes to SORTOUT. MERGE takes two or more inputs (SORTIN01, SORTIN02, ...) that are already sorted by the same key and merges them into one sorted stream. MERGE does not re-sort; it only combines.

Do MERGE inputs have to be sorted?

Yes. Each input to MERGE must already be sorted by the same key (position, length, format, and direction) as specified in MERGE FIELDS. DFSORT does not re-sort merge inputs, so if any input is out of order, the merge cannot produce a correctly ordered output.

How many inputs can DFSORT MERGE handle?

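DFSORT supports multiple merge inputs. The exact maximum depends on the product level: older levels stopped at 16 inputs (SORTIN01 through SORTIN16), while current z/OS DFSORT documents SORTINnn suffixes 00 through 99, that is, up to 100 inputs. Check the Application Programming Guide for your installed level. You allocate one DD per input (SORTIN01, SORTIN02, SORTIN03, ...) and code MERGE FIELDS= with the common key.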

Can I use INCLUDE or OUTREC with MERGE?

Yes. INCLUDE and OMIT filter records during the input phase; INREC reformats before the merge; OUTREC reformats after the merge when writing. MERGE only changes how the sort phase works (merge instead of sort); the rest of the control statements apply the same way.

DFSORT MERGE

The MERGE control statement tells DFSORT to merge two or more pre-sorted input streams into one sorted output. Unlike SORT, which reorders a single unsorted input, MERGE assumes each input is already in order by the same key and combines them with a single pass. This page covers MERGE FIELDS= syntax, the required DD names (SORTIN01, SORTIN02, …), when to use MERGE instead of SORT, and how INREC, INCLUDE, OMIT, and OUTREC still apply.

When to Use MERGE Instead of SORT

Use SORT when you have one input dataset and the records are in no particular order (or in the wrong order). DFSORT reads all records, sorts them by the key you specify, and writes the result. Use MERGE when you have two or more input datasets that are each already sorted by the same key. DFSORT does not re-sort; it merges the streams by repeatedly comparing the current record of each input, writing the one that comes next in key order, and advancing that input. The combined output is therefore still sorted.

Merging is more efficient than sorting when the data is already in order. Each record is read once and compared only against the current records of the other inputs, so for a small number of inputs the work grows roughly linearly with the record count, whereas a full sort typically costs on the order of n log n comparisons.

Typical uses of MERGE: combining the results of several sort jobs that each sorted a partition of the data; merging daily files that were each sorted by the same key; or building one sorted master from multiple sorted feeds. If you have one big unsorted file, use SORT. If you have several smaller sorted files with the same key order, use MERGE.

MERGE FIELDS= Syntax

The syntax of MERGE FIELDS= is the same as SORT FIELDS=:

```text
MERGE FIELDS=(start,length,format,direction,...)
```

You specify the key that all input streams are sorted by: start (byte position, 1-based), length (key length in bytes), format (CH, PD, ZD, BI, etc.), and direction (A ascending, D descending). For multiple keys, you repeat the four values (e.g. primary key, then secondary key). Every input (SORTIN01, SORTIN02, …) must be sorted by this exact same key and direction. If one input was sorted ascending and another descending, or by a different position, the merge would produce incorrect order. So when you prepare the inputs (e.g. with prior SORT steps), use the same SORT FIELDS= (or equivalent) that you will use in MERGE FIELDS=.

DD Names for MERGE: SORTIN01, SORTIN02, …

For a SORT step you use a single input DD named SORTIN. For a MERGE step you do not use SORTIN. Instead you code one SORTINnn DD per input stream: SORTIN01, SORTIN02, SORTIN03, and so on. How many you may code depends on the product level: older levels stopped at SORTIN16, while current z/OS DFSORT documents suffixes 00 through 99 (up to 100 inputs). Example: to merge three files, you allocate SORTIN01, SORTIN02, and SORTIN03. SORTOUT, SYSIN, and SYSOUT are coded as in a SORT step; only the input side changes from one SORTIN to multiple SORTINnn.

Pre-Sorted Requirement

Every record in SORTIN01 must be in ascending (or descending) order by the MERGE key; the same holds for SORTIN02, SORTIN03, etc. DFSORT does not re-sort an input to correct its order. It relies on each record read from SORTIN01 having a key greater than or equal to the previous record from SORTIN01 (for ascending). If an input is out of order, a correct merge is impossible: a record with key 100 could land before a record with key 50, and DFSORT documents message ICE068A (OUT OF SEQUENCE) for a merge input it finds out of sequence. Either way, you do not get a usable result. So before running MERGE, ensure each input was produced by a sort (or other process) that used the same key as MERGE FIELDS=. If in doubt, run a SORT on each input first with the same key, then MERGE the results.
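The "sort each input first, then merge" advice could look like the following three-step job. This is a sketch under assumptions, not a definitive job: the data set names MY.UNSORTED.PART1 and MY.UNSORTED.PART2, the temporary data sets &&PART1 and &&PART2, and the UNIT and SPACE values are all hypothetical and will differ at your site.

```jcl
//* Sketch only: data set names below are hypothetical.
//* Step 1: sort the first input by the key the merge will use.
//SORT1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.UNSORTED.PART1,DISP=SHR
//SORTOUT  DD DSN=&&PART1,DISP=(NEW,PASS),
//            UNIT=SYSDA,SPACE=(CYL,(10,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* Step 2: sort the second input by the same key.
//SORT2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.UNSORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=&&PART2,DISP=(NEW,PASS),
//            UNIT=SYSDA,SPACE=(CYL,(10,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* Step 3: merge the two sorted temporaries; FIELDS= matches the sorts.
//MERGESTP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=&&PART1,DISP=(OLD,DELETE)
//SORTIN02 DD DSN=&&PART2,DISP=(OLD,DELETE)
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5))
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*
```

The point of the sketch is that the FIELDS= operand is identical in all three steps, which is what keeps the merge comparison consistent.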

Example: Two-Input Merge

Suppose you have two datasets, each sorted by bytes 1–10 character ascending. You want one combined dataset in the same order.

```jcl
//MERGESTP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*
```

MERGE FIELDS=(1,10,CH,A) says: all inputs are sorted by bytes 1–10, character, ascending. DFSORT merges the two streams and writes to SORTOUT. You do not allocate SORTIN; only SORTIN01 and SORTIN02.

Multiple Keys and Multiple Inputs

You can merge by a multi-part key, e.g. primary key bytes 1–10 CH A, secondary key bytes 11–14 PD D. All inputs must be sorted by that same multi-part key. Example:

```text
  MERGE FIELDS=(1,10,CH,A,11,4,PD,D)
```

Each input must have been sorted with the equivalent of SORT FIELDS=(1,10,CH,A,11,4,PD,D) so that the merge comparison is consistent.
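Putting the multi-part key together with more than two inputs, a complete step might look like the sketch below. It assumes three 80-byte fixed-length inputs that were each sorted by the same two-part key; MY.SORTED.PART3 is a hypothetical third data set, and the SPACE and DCB values are placeholders.

```jcl
//* Sketch only: MY.SORTED.PART3 is a hypothetical data set name.
//MERGE3   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTIN03 DD DSN=MY.SORTED.PART3,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
* Primary key:   bytes 1-10, character, ascending
* Secondary key: bytes 11-14, packed decimal, descending
  MERGE FIELDS=(1,10,CH,A,11,4,PD,D)
/*
```

Lines in SYSIN that start with an asterisk in column 1 are comment statements; they are there only to label the two key fields.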

Using INREC, INCLUDE, OMIT, and OUTREC with MERGE

MERGE only changes the sort phase (merge instead of sort). The input phase and output phase work the same. So you can use INCLUDE or OMIT to filter records from each input before they go into the merge. You can use INREC to reformat each record before the merge (e.g. shorten the record or build a new key); the merge then uses the reformatted record. You can use OUTREC to reformat each record after the merge when writing to SORTOUT. You can use OUTFIL to write additional outputs. So a MERGE step can still have complex filtering and reformatting; only the middle step is "merge streams" instead of "sort records."
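As an illustration of filtering and reformatting around a merge, the sketch below keeps only some records and shortens what is written. The record layout is an assumption made for the example: a 10-byte key in bytes 1 to 10 and a 2-byte code in bytes 11 to 12. The output data set name MY.MERGED.NY is hypothetical.

```jcl
//* Sketch only: record layout and output name are assumptions.
//MERGEFLT EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED.NY,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5))
//SYSIN    DD *
* Input phase: keep only records with 'NY' in bytes 11-12
  INCLUDE COND=(11,2,CH,EQ,C'NY')
* Merge phase: combine the surviving records on the common key
  MERGE FIELDS=(1,10,CH,A)
* Output phase: keep bytes 1-12 and pad with blanks to 80 bytes
  OUTREC BUILD=(1,12,80:X)
/*
```

Dropping records with INCLUDE does not disturb the order of the records that remain, so the inputs still satisfy the pre-sorted requirement. An INREC that moves or rewrites the key fields needs more care, because MERGE FIELDS= then refers to positions in the reformatted record and the reformatted inputs must still be in order by that key.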

MERGE vs SORT: Comparison

Input: SORT uses one DD (SORTIN). MERGE uses two or more (SORTIN01, SORTIN02, …).
Order assumption: SORT accepts any order. MERGE requires each input to be pre-sorted by the same key.
Algorithm: SORT does a full sort (compare and reorder). MERGE does a linear merge (compare the next record from each stream and write the smallest or largest).
Performance: For the same total data size, MERGE is typically faster when inputs are already sorted because it avoids the full sort.
Control statement: SORT FIELDS= vs MERGE FIELDS=. Syntax of FIELDS= is the same.

Explain It Like I'm Five

Imagine you have two stacks of cards that are each already in order (A to Z). You want one big stack still in order. You don't shuffle everything again—you merge: look at the top card of stack 1 and the top card of stack 2, pick the one that comes first in the alphabet, put it in the result pile, and repeat. That's what MERGE does. The MERGE statement is like saying: "My two (or more) piles are already sorted by the first 10 letters. Combine them by always taking the smallest next card from any pile." If one of the piles wasn't really in order, the big pile you build won't be in order either. So MERGE is for when you already have sorted piles and just want to combine them.

Exercises

1. You have three datasets, each sorted by bytes 20–25 packed decimal ascending. Write the MERGE control statement and list the DD names you need for input.
2. Why must all MERGE inputs be sorted by the same key? What could go wrong if one input was sorted by a different key?
3. When would you choose to run two SORT steps (one per partition) and then MERGE, instead of one SORT over the combined data?
4. Can you use INCLUDE COND= with MERGE? Where in the processing does it apply?

Quiz

Test Your Knowledge

1. When do you use MERGE instead of SORT?

When you have one unsorted input
When you have two or more inputs already sorted by the same key
When you want to copy only
When you use OUTREC

2. Which DD names are used for MERGE inputs?

SORTIN only
SORTIN01, SORTIN02, ...
SORTOUT
SYSIN

3. What happens if a MERGE input is not sorted by the key you specify?

DFSORT sorts it first automatically
The merged output may be incorrect; MERGE assumes inputs are already in order
Nothing
Only SORTIN01 must be sorted

4. Is MERGE FIELDS= syntax the same as SORT FIELDS=?

No, completely different
Yes; same (position,length,format,direction) for the merge key
Only for single key
MERGE has no FIELDS parameter

5. Why is MERGE often faster than SORT for combining sorted files?

MERGE uses less memory
MERGE does a linear pass comparing the next record from each stream; no full sort
MERGE skips INCLUDE
MERGE does not write to SORTOUT