MainframeMaster

Pre-Sorted Dataset Merging

DFSORT MERGE combines two or more inputs that are each already sorted by the same key. For the merge to produce correct output, every input must be in key order: same key position, length, format, and direction. This page explains how to prepare and verify datasets for MERGE: matching the key across inputs, when to run a prior SORT, and how to avoid wrong order or inconsistent results.


What “Pre-Sorted” Means for MERGE

Pre-sorted means that within each input dataset, every record is in ascending (or descending) order by the merge key. The merge key is what you specify in MERGE FIELDS=: start position, length, format (CH, PD, ZD, BI, etc.), and direction (A or D). For example, if you code MERGE FIELDS=(1,10,CH,A), then each input must be sorted so that the bytes in positions 1–10 (interpreted as character) never decrease from one record to the next. DFSORT does not re-sort during a MERGE. It compares the current record from each input, writes the one whose key comes first, and then reads the next record from that same input. So if any input is out of order, the merged output will be wrong.
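To see why order matters, here is a minimal Python model of that behavior. It is not DFSORT. Python's `heapq.merge` works the same way conceptually: it trusts every input to be sorted and only compares the current head of each stream. Records are modeled as strings, and positions 1–10 map to the slice `[0:10]`. Python compares strings by code point rather than EBCDIC, which makes no difference for the uppercase sample keys used here.

```python
import heapq

def merge_key(record: str) -> str:
    # Models MERGE FIELDS=(1,10,CH,A): positions 1-10 are the slice [0:10].
    # Python compares str by code point, not EBCDIC; fine for these keys.
    return record[0:10]

def model_merge(*inputs):
    # heapq.merge never re-sorts: it repeatedly emits whichever input's
    # current record has the lowest key, trusting each input to be in order.
    return list(heapq.merge(*inputs, key=merge_key))

sortin01 = ["AAAAAAAAAA REC1", "CCCCCCCCCC REC2"]
sortin02 = ["BBBBBBBBBB REC3", "DDDDDDDDDD REC4"]
good = model_merge(sortin01, sortin02)
print(good)  # keys come out A, B, C, D

# The same second input, but with its two records out of order
bad_sortin02 = ["DDDDDDDDDD REC4", "BBBBBBBBBB REC3"]
bad = model_merge(sortin01, bad_sortin02)
print(bad)   # keys come out A, C, D, B: the B record lands last
```

The bad case is not an error in the merge logic. The merge did exactly what it always does; the input simply broke its assumption.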

Key Must Match Across All Inputs

All inputs must be sorted by the same key. That means:

  • Position and length: The key must be in the same byte range in each input (e.g. bytes 1–10 in every file). If one file has the key in 1–10 and another in 11–20, they are not comparable for merge.
  • Format: The key must be interpreted the same way. Character (CH) compares the key bytes by their byte values, while packed decimal (PD) and zoned decimal (ZD) compare them as signed numbers. If one input was sorted as CH and another as PD for the same bytes, the two orders can disagree; fields that can hold negative values are a common example. Use the same format in the prior SORT (or source process) as in MERGE FIELDS=.
  • Direction: All inputs must be in the same direction—all ascending (A) or all descending (D). If one stream is ascending and another descending, the merge will interleave them incorrectly.

So when you prepare inputs, use the exact same key specification (position, length, format, direction) in every SORT or process that produces those inputs, and then use that same specification in MERGE FIELDS=.
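One way to make "same key" concrete is to treat each key field as four values and compare them all. The sketch below is a hypothetical Python check, not a DFSORT feature. The `KeyField` class and the SORTINnn-to-key mapping are illustrative assumptions. The function reports which components of an input's sort key differ from the MERGE key.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class KeyField:
    # Hypothetical model of one key field as coded in SORT/MERGE FIELDS=
    start: int       # starting position (1-based)
    length: int      # length in bytes
    fmt: str         # format, e.g. "CH", "PD", "ZD", "BI"
    direction: str   # "A" (ascending) or "D" (descending)

def key_differences(merge_key: KeyField, input_key: KeyField) -> list:
    # Names of the components that do not match the MERGE key
    return [f.name for f in fields(KeyField)
            if getattr(merge_key, f.name) != getattr(input_key, f.name)]

merge_key = KeyField(1, 10, "CH", "A")
inputs = {
    "SORTIN01": KeyField(1, 10, "CH", "A"),   # matches
    "SORTIN02": KeyField(11, 10, "CH", "A"),  # key in different bytes
    "SORTIN03": KeyField(1, 10, "PD", "D"),   # different format and direction
}
for dd, key in inputs.items():
    print(dd, key_differences(merge_key, key) or "OK")
```

Any non-empty result means that input must be re-sorted with the merge key before it can be used.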

Preparation Steps in Order

Steps to prepare datasets for MERGE

| Step | Action | Detail |
|------|--------|--------|
| 1 | Decide the merge key | Same position, length, format, and direction for MERGE FIELDS= |
| 2 | Ensure each input is sorted by that key | Run SORT with the same SORT FIELDS= on any unsorted input |
| 3 | Use the same key in MERGE FIELDS= | Match exactly what each input was sorted by |
| 4 | Allocate SORTIN01, SORTIN02, … | One DD statement per pre-sorted input |
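A simple way to keep steps 1 to 3 consistent is to write the key down once and generate every control statement from it. The Python sketch below is a hypothetical helper that only builds text; it does not run DFSORT. It also numbers the SORTINnn DD statements for step 4. The tuple layout `(start, length, format, direction)` and the dataset names are assumptions for illustration.

```python
# Hypothetical helper: one key definition drives every control statement,
# so SORT FIELDS= and MERGE FIELDS= cannot drift apart.
MERGE_KEY = [(1, 6, "CH", "A")]  # (start, length, format, direction) per field

def fields_operand(key):
    parts = []
    for start, length, fmt, direction in key:
        parts += [str(start), str(length), fmt, direction]
    return "FIELDS=(" + ",".join(parts) + ")"

def sortin_dds(dsnames):
    # Step 4: one SORTINnn DD per pre-sorted input, numbered 01, 02, ...
    return [f"//SORTIN{n:02d} DD DSN={dsn},DISP=SHR"
            for n, dsn in enumerate(dsnames, start=1)]

# Control statements get a leading blank: they must not start in column 1.
sort_stmt = " SORT " + fields_operand(MERGE_KEY)
merge_stmt = " MERGE " + fields_operand(MERGE_KEY)
dd_lines = sortin_dds(["MY.SORTED.PART1", "MY.SORTED.PART2"])
print(sort_stmt)
print(merge_stmt)
print("\n".join(dd_lines))
```

The same `fields_operand` call also handles multi-field keys, so the prior SORT steps and the MERGE step always receive an identical FIELDS= operand.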

When to Run a Prior SORT

Run a SORT step on an input when:

  • The dataset is unsorted or you do not know its order. Sort it with SORT FIELDS= equal to the key you will use in MERGE FIELDS=.
  • The dataset was sorted by a different key (different position, length, format, or direction). Re-sort it with the merge key.
  • The dataset was produced by another system or job and you cannot guarantee order. Sort it with the merge key to be safe.

You do not need to sort an input again if you are certain it is already in the correct order by the same key as MERGE FIELDS=. For example, if a previous step in the same job produced the dataset with a SORT FIELDS=(1,10,CH,A) and your MERGE uses MERGE FIELDS=(1,10,CH,A), you can use it directly as SORTIN01 or SORTIN02.
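The rules above reduce to one test: reuse an input only when its known sort key equals the merge key exactly, and sort it otherwise. Here is a hypothetical Python sketch of that decision. `None` stands for "order unknown or not guaranteed", and the dataset names are made up.

```python
# Hypothetical planning helper for deciding which inputs need a prior SORT.
# A key is a tuple of (start, length, format, direction) fields;
# None means the input's order is unknown or cannot be guaranteed.
MERGE_KEY = ((1, 10, "CH", "A"),)

def plan(inputs):
    steps = {}
    for name, known_key in inputs.items():
        if known_key == MERGE_KEY:
            steps[name] = "use as-is"
        else:  # unsorted, unknown, external, or sorted by a different key
            steps[name] = "SORT with merge key first"
    return steps

inputs = {
    "PRIOR.STEP.OUTPUT": ((1, 10, "CH", "A"),),  # same key: reuse directly
    "OLD.EXTRACT":       ((1, 10, "CH", "D"),),  # different direction
    "VENDOR.FEED":       None,                   # order not guaranteed
}
for name, step in plan(inputs).items():
    print(f"{name}: {step}")
```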

Example: Two Unsorted Files to One Merged Output

Suppose you have two unsorted datasets, MY.PART1 and MY.PART2, and you want one output merged by bytes 1–6, character, ascending. First sort each part, then merge.

Step 1: Sort each input

```jcl
//* Step 1a: sort MY.PART1 by the merge key (bytes 1-6, CH, ascending)
//SORT1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.PART1,DISP=SHR
//SORTOUT  DD DSN=MY.SORTED.PART1,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  SORT FIELDS=(1,6,CH,A)
/*
//* Step 1b: sort MY.PART2 by exactly the same key
//SORT2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.SORTED.PART2,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,2)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  SORT FIELDS=(1,6,CH,A)
/*
```

Both SORT steps use the same key: 1,6,CH,A. So MY.SORTED.PART1 and MY.SORTED.PART2 are both in order by bytes 1–6 character ascending.

Step 2: MERGE the sorted outputs

```jcl
//* Step 2: merge the two pre-sorted datasets; key must match step 1
//MERGE1   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=MY.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=MY.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=MY.MERGED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5)),DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//SYSIN    DD *
  MERGE FIELDS=(1,6,CH,A)
/*
```

MERGE FIELDS=(1,6,CH,A) matches the SORT FIELDS= used in step 1. The merge step only combines the two streams; it does not re-sort.
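Here is the same two-step flow as a small Python model. It is a sketch under simplifying assumptions: records are strings, bytes 1–6 are the slice `[:6]`, and the fruit-name sample data is made up. `sorted` plays the role of SORT1 and SORT2, and `heapq.merge` plays MERGE1.

```python
import heapq

def key_1_6_ch(record: str) -> str:
    # Models SORT/MERGE FIELDS=(1,6,CH,A): positions 1-6 (slice [:6])
    return record[:6]

part1 = ["CHERRY rec-a", "APPLE  rec-b", "MANGO  rec-c"]  # unsorted sample data
part2 = ["BANANA rec-d", "PEAR   rec-e", "DATE   rec-f"]

# Step 1 (SORT1, SORT2): sort each part by the merge key
sorted_part1 = sorted(part1, key=key_1_6_ch)
sorted_part2 = sorted(part2, key=key_1_6_ch)

# Step 2 (MERGE1): combine the two sorted streams without re-sorting
merged = list(heapq.merge(sorted_part1, sorted_part2, key=key_1_6_ch))
for rec in merged:
    print(rec)
```

Because both parts were sorted with the identical key function, the merge only has to interleave them.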

Format and Direction Matter

Format: The format tells DFSORT how to compare the key bytes. CH compares them byte by byte as unsigned values, while PD compares them as signed packed decimal numbers, so the same bytes can sort differently under the two formats. For example, the one-byte packed fields X'5C' (+5) and X'7D' (-7) are in ascending order as CH, because X'5C' is the lower byte value. As PD they are in the opposite order, because -7 is less than +5. So when you prepare inputs, use the same format in every SORT as in MERGE FIELDS=.
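The CH/PD difference can be checked with a short Python sketch. `unpack_pd` is a simplified, hypothetical decoder. It reads two digits per byte and takes the sign from the last nibble, treating X'D' and X'B' as negative and other sign values as positive. It does not validate digits. Python's `bytes` comparison is unsigned and byte by byte, which stands in for CH here.

```python
def unpack_pd(field: bytes) -> int:
    # Simplified packed-decimal decoder: two digits per byte, except the
    # last byte, whose low nibble is the sign (X'D'/X'B' negative).
    # No validation of invalid digits is done.
    digits = []
    for byte in field[:-1]:
        digits += [byte >> 4, byte & 0x0F]
    digits.append(field[-1] >> 4)
    value = int("".join(str(d) for d in digits))
    return -value if (field[-1] & 0x0F) in (0x0B, 0x0D) else value

plus5 = bytes([0x5C])   # packed +5
minus7 = bytes([0x7D])  # packed -7

ch_order = sorted([minus7, plus5])                 # CH-like: raw byte values
pd_order = sorted([minus7, plus5], key=unpack_pd)  # PD-like: signed numbers
print([unpack_pd(f) for f in ch_order])  # [5, -7]
print([unpack_pd(f) for f in pd_order])  # [-7, 5]
```

The two inputs hold identical bytes but end up in opposite orders, which is exactly the mismatch a MERGE cannot repair.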

Direction: Ascending (A) means smallest key first; descending (D) means largest key first. MERGE assumes all streams are in the same direction. If one input is A and another D, the “next” record from the D stream is actually moving the wrong way relative to the A stream, and the merged sequence will be incorrect. Always use the same direction when sorting each input.
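A tiny Python model shows the effect. It is not DFSORT itself. The merge is told both streams are ascending, but one was actually sorted descending. The integers stand in for key values.

```python
import heapq

ascending = [1, 4, 7]    # sorted A
descending = [9, 5, 2]   # sorted D, but merged as if it were A

# The merge trusts both streams to be ascending, as MERGE FIELDS=(...,A) would
merged = list(heapq.merge(ascending, descending))
print(merged)                    # [1, 4, 7, 9, 5, 2]
print(merged == sorted(merged))  # False
```

Once the ascending stream is used up, the rest of the descending stream is copied as-is, so its records come out in reverse order.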

Verifying Order (When in Doubt)

If you are not sure whether a dataset is sorted correctly, the safest approach is to run a SORT step on it with the exact MERGE FIELDS= key. That guarantees correct order for the MERGE. Some shops add a separate validation step that reads the input and checks that each record's key is not lower than the previous record's key (for ascending order). Re-sorting with the merge key is usually the simpler option, because it guarantees the order instead of only detecting a problem.
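The check such a validation step performs is easy to state in code. The Python sketch below is a hypothetical illustration of the logic only. On the mainframe, this would be whatever program or utility step your shop uses; it is not a DFSORT operator. Equal keys are allowed, matching the "never decrease" rule for ascending order.

```python
def first_out_of_sequence(records, key, descending=False):
    # Return the 1-based number of the first record whose key breaks the
    # expected order, or None if the whole input is in sequence.
    previous = None
    for number, record in enumerate(records, start=1):
        current = key(record)
        if previous is not None:
            wrong = current > previous if descending else current < previous
            if wrong:
                return number
        previous = current
    return None

def key_1_10(rec):
    # Models positions 1-10 compared as characters
    return rec[0:10]

print(first_out_of_sequence(["AAAA", "BBBB", "BBBB", "CCCC"], key_1_10))  # None
print(first_out_of_sequence(["AAAA", "CCCC", "BBBB"], key_1_10))          # 3
```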

Multiple Keys

If you merge by a multi-part key (for example, a primary key in bytes 1–10 CH A and a secondary key in bytes 11–14 PD D), then every input must be sorted by that same multi-part key. When you run prior SORT steps, use SORT FIELDS=(1,10,CH,A,11,4,PD,D), or whatever positions and formats you use in MERGE FIELDS=. The entire key, including every position, length, format, and direction, must match across all inputs.
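A multi-part key with mixed directions can be modeled in Python as a tuple key. This sketch simplifies in two ways: each record is a pair of (bytes 1–10 as text, bytes 11–14 already decoded from packed decimal to an int), and the sample data is made up. Negating the number gives descending order for that field only. That trick works for numeric fields, not for character ones.

```python
import heapq

def merge_key(rec):
    primary, secondary = rec
    # 1,10,CH,A then 11,4,PD,D: negate the number for descending order
    return (primary, -secondary)

# Both inputs are pre-sorted with the full two-part key
sortin01 = sorted([("SMITH     ", 250), ("JONES     ", 100)], key=merge_key)
sortin02 = sorted([("SMITH     ", 900), ("JONES     ", 400)], key=merge_key)

merged = list(heapq.merge(sortin01, sortin02, key=merge_key))
for rec in merged:
    print(rec)
# ('JONES     ', 400)
# ('JONES     ', 100)
# ('SMITH     ', 900)
# ('SMITH     ', 250)
```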

Explain It Like I'm Five

You have two piles of cards. Each pile is already in order from A to Z. Before you can merge them into one big A-to-Z pile, you have to make sure both piles are really in A-to-Z order. If one pile was sorted by first name and the other by last name, when you merge you mix two different orderings and get a mess. So “pre-sorted for merge” means: both piles are sorted the same way (same rule, same direction). If you are not sure about a pile, sort it again with that same rule. Then you can merge them into one correct pile.

Exercises

  1. You will merge by bytes 20–25 PD ascending. One input was sorted by 20–25 CH ascending. Is it safe to use as-is? What should you do?
  2. Write the SORT FIELDS= you would use to prepare an unsorted file for MERGE FIELDS=(1,8,CH,A,9,4,ZD,D).
  3. Why must all MERGE inputs use the same key direction (all A or all D)?
  4. You have three inputs: two from a prior job that used SORT FIELDS=(1,10,CH,A), and one from an external system with unknown order. What steps do you take before the MERGE?

Quiz

Test Your Knowledge

1. What must match across all inputs before you MERGE them?

  • Only record length
  • The sort key (position, length, format, and direction) that each input is sorted by
  • Only the key position
  • Dataset names

2. Your MERGE inputs came from different jobs. How do you ensure they are sorted correctly for MERGE?

  • DFSORT checks automatically
  • Run a SORT step on each input using the same SORT FIELDS= as you will use in MERGE FIELDS=, then MERGE those outputs
  • Only the first input needs to be sorted
  • Use INCLUDE to fix order

3. If one input is sorted ascending and another descending by the same key, what happens when you MERGE?

  • DFSORT merges correctly by using the first input order
  • The merged output will not be correctly ordered; one stream is in the wrong direction
  • DFSORT reverses the descending one automatically
  • Only the ascending input is used

4. What is a safe way to prepare two unsorted files for a MERGE step?

  • Concatenate them and run one SORT
  • Run SORT on each file with the same SORT FIELDS= as the planned MERGE FIELDS=, then MERGE the two sorted outputs
  • Use MERGE with SORTIN01 and SORTIN02 and hope for the best
  • Use INREC to fix the key

5. Why is it important that key format (e.g. CH, PD) match across MERGE inputs?

  • It does not matter; DFSORT converts automatically
  • Different formats compare differently; if one input was sorted as CH and another as PD, the key order may not match and merged order can be wrong
  • Only length must match
  • Format only affects performance