MainframeMaster

MERGE Statement

The MERGE control statement tells DFSORT to merge two or more pre-sorted input streams into one sorted output. Unlike SORT, which reorders a single unsorted input, MERGE assumes each input is already in order by the same key and combines the inputs in a single pass. This page covers MERGE FIELDS= syntax, the required DD names (SORTIN01, SORTIN02, …), when to use MERGE instead of SORT, and how INREC, INCLUDE, OMIT, and OUTREC still apply.

When to Use MERGE Instead of SORT

Use SORT when you have one input dataset whose records are in no particular order (or in the wrong order). DFSORT reads all records, sorts them by the key you specify, and writes the result. Use MERGE when you have two or more input datasets that are each already sorted by the same key. DFSORT does not re-sort them; it repeatedly compares the next unread record of each input and writes the one that comes first in key order, so the combined output is still sorted. When the data is already in order, merging is more efficient than sorting because it needs only a single pass over the data (roughly linear time) instead of a full sort (typically n log n).

Typical uses of MERGE: combining the results of several sort jobs that each sorted a partition of the data; merging daily files that were each sorted by the same key; or building one sorted master from multiple sorted feeds. If you have one big unsorted file, use SORT. If you have several smaller sorted files with the same key order, use MERGE.

MERGE FIELDS= Syntax

The syntax of MERGE FIELDS= is the same as SORT FIELDS=:

```text
MERGE FIELDS=(start,length,format,direction,...)
```

You specify the key that all input streams are sorted by: start (byte position, 1-based), length (key length in bytes), format (CH, PD, ZD, BI, etc.), and direction (A ascending, D descending). For multiple keys, you repeat the four values (e.g. primary key, then secondary key). Every input (SORTIN01, SORTIN02, …) must be sorted by this exact same key and direction. If one input was sorted ascending and another descending, or by a different position, the merge would produce incorrect order. So when you prepare the inputs (e.g. with prior SORT steps), use the same SORT FIELDS= (or equivalent) that you will use in MERGE FIELDS=.

DD Names for MERGE: SORTIN01, SORTIN02, …

For a SORT step you use a single input DD named SORTIN. For a MERGE step you do not use SORTIN. Instead you code one SORTINnn DD per input stream: SORTIN01, SORTIN02, SORTIN03, and so on. The upper limit depends on the product and release (older sort products stopped at SORTIN16, while current DFSORT documentation describes nn values up to 99), so check the manual for your level before planning a wide merge. Example: to merge three files, you allocate SORTIN01, SORTIN02, and SORTIN03. SORTOUT, SYSIN, and SYSOUT are still required; only the input side changes from one SORTIN to multiple SORTINnn.
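The three-file case described above might be coded as in the following sketch. The step name MERGE3, the data set names, and the SPACE values are placeholders for illustration, and the key (bytes 1–10, character, ascending) is assumed; substitute whatever key your inputs were actually sorted by.

```jcl
//* Hypothetical names and space values; one SORTINnn DD per input.
//MERGE3   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTIN03 DD DSN=MY.SORTED.PART3,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//         SPACE=(CYL,(10,5))
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*
```

Note that there is no SORTIN DD in the step, and the MERGE statement itself does not change with the number of inputs; only the SORTINnn DDs do.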

Pre-Sorted Requirement

Within each input, the records must already be in the order named by MERGE FIELDS=. For an ascending key, every record read from SORTIN01 must have a key greater than or equal to the previous record from SORTIN01, and the same holds for SORTIN02, SORTIN03, and so on. DFSORT never re-sorts a merge input. If an input is out of order, you cannot expect a correct result: depending on the product level and options, the step may end with an out-of-sequence error (DFSORT message ICE068A), or the output may be misordered, for example with a record with key 100 ahead of a record with key 50. So before running MERGE, ensure each input was produced by a sort (or other process) that used the same key as MERGE FIELDS=. If in doubt, run a SORT on each input first with the same key, then MERGE the results.
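A sort-then-merge job along these lines could look like the sketch below. Everything site-specific is an assumption: the step names, the input data set names, the temporary data sets &&PART1 and &&PART2, UNIT=SYSDA, and the SPACE values. The point it illustrates is that both SORT steps and the MERGE step use the same FIELDS= value.

```jcl
//* Sort each input by the merge key (hypothetical data set names).
//SORT1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.UNSORTED.PART1,DISP=SHR
//SORTOUT  DD DSN=&&PART1,DISP=(NEW,PASS),
//         UNIT=SYSDA,SPACE=(CYL,(10,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//SORT2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.UNSORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=&&PART2,DISP=(NEW,PASS),
//         UNIT=SYSDA,SPACE=(CYL,(10,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//* Merge the two sorted temporaries using the identical key.
//MERGESTP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=&&PART1,DISP=(OLD,DELETE)
//SORTIN02 DD DSN=&&PART2,DISP=(OLD,DELETE)
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//         SPACE=(CYL,(10,5))
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*
```

If the inputs are already known to be in key order, the two SORT steps are unnecessary and only the final step is needed.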

Example: Two-Input Merge

Suppose you have two datasets, each sorted by bytes 1–10 character ascending. You want one combined dataset in the same order.

```jcl
//MERGESTP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//         SPACE=(CYL,(10,5)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*
```

MERGE FIELDS=(1,10,CH,A) says: all inputs are sorted by bytes 1–10, character, ascending. DFSORT merges the two streams and writes to SORTOUT. You do not allocate SORTIN; only SORTIN01 and SORTIN02.

Multiple Keys and Multiple Inputs

You can merge by a multi-part key, e.g. primary key bytes 1–10 CH A, secondary key bytes 11–14 PD D. All inputs must be sorted by that same multi-part key. Example:

```text
MERGE FIELDS=(1,10,CH,A,11,4,PD,D)
```

Each input must have been sorted with the equivalent of SORT FIELDS=(1,10,CH,A,11,4,PD,D) so that the merge comparison is consistent.

Using INREC, INCLUDE, OMIT, and OUTREC with MERGE

MERGE changes only the middle phase: streams are merged instead of records being sorted. The input phase and output phase work the same as for SORT. You can use INCLUDE or OMIT to filter records from each input before they go into the merge. You can use INREC to reformat each record before the merge (e.g. shorten the record or build a new key); the merge then uses the reformatted record, so MERGE FIELDS= positions must refer to the reformatted layout and the reformatted keys must still be in order within each input. You can use OUTREC to reformat each record after the merge when writing to SORTOUT, and OUTFIL to write additional outputs. A MERGE step can therefore carry the same filtering and reformatting as a SORT step; only the middle step is "merge streams" instead of "sort records."
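As an illustration, the SYSIN below combines filtering and output reformatting with a merge. The record layout is hypothetical: it assumes 80-byte fixed-length records with the merge key in bytes 1–10 and a two-character code in bytes 11–12.

```jcl
//SYSIN    DD *
* Keep only records whose code (bytes 11-12, assumed layout) is 'NY'.
  INCLUDE COND=(11,2,CH,EQ,C'NY')
* Each input is already in ascending order on bytes 1-10.
  MERGE FIELDS=(1,10,CH,A)
* After the merge, keep the key and code and pad with blanks to byte 80.
  OUTREC BUILD=(1,10,11,2,80:X)
/*
```

Dropping records with INCLUDE does not disturb the order of the records that remain, so the filtered inputs are still valid merge inputs. Padding the OUTREC result to 80 bytes keeps the output length consistent with an LRECL=80 SORTOUT.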

MERGE vs SORT: Comparison

  • Input: SORT uses one DD (SORTIN). MERGE uses two or more (SORTIN01, SORTIN02, …).
  • Order assumption: SORT accepts any order. MERGE requires each input to be pre-sorted by the same key.
  • Algorithm: SORT does a full sort (compare and reorder). MERGE does a linear merge (compare the next record from each stream and write the one with the lowest key for ascending order, or the highest for descending).
  • Performance: For the same total data size, MERGE is typically faster when inputs are already sorted because it avoids the full sort.
  • Control statement: SORT FIELDS= vs MERGE FIELDS=. Syntax of FIELDS= is the same.

Explain It Like I'm Five

Imagine you have two stacks of cards that are each already in order (A to Z). You want one big stack still in order. You don't shuffle everything again—you merge: look at the top card of stack 1 and the top card of stack 2, pick the one that comes first in the alphabet, put it in the result pile, and repeat. That's what MERGE does. The MERGE statement is like saying: "My two (or more) piles are already sorted by the first 10 letters. Combine them by always taking the smallest next card from any pile." If one of the piles wasn't really in order, the big pile you build won't be in order either. So MERGE is for when you already have sorted piles and just want to combine them.

Exercises

  1. You have three datasets, each sorted by bytes 20–25 packed decimal ascending. Write the MERGE control statement and list the DD names you need for input.
  2. Why must all MERGE inputs be sorted by the same key? What could go wrong if one input was sorted by a different key?
  3. When would you choose to run two SORT steps (one per partition) and then MERGE, instead of one SORT over the combined data?
  4. Can you use INCLUDE COND= with MERGE? Where in the processing does it apply?

Quiz

Test Your Knowledge

1. When do you use MERGE instead of SORT?

  • When you have one unsorted input
  • When you have two or more inputs already sorted by the same key
  • When you want to copy only
  • When you use OUTREC

2. Which DD names are used for MERGE inputs?

  • SORTIN only
  • SORTIN01, SORTIN02, ...
  • SORTOUT
  • SYSIN

3. What happens if a MERGE input is not sorted by the key you specify?

  • DFSORT sorts it first automatically
  • The merged output may be incorrect; MERGE assumes inputs are already in order
  • Nothing
  • Only SORTIN01 must be sorted

4. Is MERGE FIELDS= syntax the same as SORT FIELDS=?

  • No, completely different
  • Yes; same (position,length,format,direction) for the merge key
  • Only for single key
  • MERGE has no FIELDS parameter

5. Why is MERGE often faster than SORT for combining sorted files?

  • MERGE uses less memory
  • MERGE does a linear pass comparing the next record from each stream; no full sort
  • MERGE skips INCLUDE
  • MERGE does not write to SORTOUT