MainframeMaster

File Comparison

Comparing two datasets is a common mainframe task: you need to know what was added, deleted, or changed between two versions of a file (e.g. before and after a load, or two extracts from different systems). DFSORT does not have a single "compare" verb, but you can achieve full file comparison using JOINKEYS. JOINKEYS joins two files on a key; by using UNPAIRED options and REFORMAT you can produce separate outputs for records only in the first file, only in the second file, and in both. This page explains how to compare two files by key (adds and deletes), how to use UNPAIRED,ONLY and UNPAIRED,F,F, how to split the join output into three files, and how to extend the approach to find content changes (not just key presence) by comparing full records or selected fields after the join.

Real World Use Cases
Progress0 of 0 lessons

Why Compare Two Files?

In batch processing you often have two versions of the same logical file: yesterday's extract and today's, or the output of a program run before and after a change. You need to know: which keys were added (in the new file but not the old), which were deleted (in the old but not the new), and optionally which records have the same key but different content (changes). This supports audit, reconciliation, and change-driven processing. DFSORT's JOINKEYS facility is designed for joining two files on a key; that same mechanism gives you a powerful way to compare two files by key and to feed a downstream step that compares content.

File comparison methods with DFSORT
What you wantDFSORT approachPurpose
Key-based (adds/deletes)JOINKEYS UNPAIREDFind records in one file but not the other by join key
Key + content (changes)JOINKEYS + REFORMAT + comparatorMatch on key, then compare full record or fields to find changed records
Same key, different dataJOINKEYS REFORMATOutput both versions side by side for downstream comparison

Key-Based Comparison: JOINKEYS

JOINKEYS reads two files (F1 and F2), both assumed to be in ascending order by a join key. It matches records with the same key and can output: (1) only matched pairs, (2) only unmatched records (records in F1 with no match in F2, and records in F2 with no match in F1), or (3) all records (matched and unmatched). For file comparison you typically use the unmatched or all options so you can identify adds and deletes.

JOINKEYS UNPAIRED options

The UNPAIRED control on the JOINKEYS statement determines which records appear in the join output. The main values for comparison are:

  • UNPAIRED,F,F — Include all records: every F1 record and every F2 record. Matched keys appear as paired records (depending on REFORMAT); unmatched F1 records are included; unmatched F2 records are included. So you get a single stream that you can later split by a source indicator.
  • UNPAIRED,ONLY — Include only unmatched records. Records that exist in F1 but not in F2 (deletes when F1=old, F2=new) and records that exist in F2 but not in F1 (adds). No matched records are written. This is the usual choice when you only care about the delta (adds and deletes).
  • UNPAIRED,F — Unmatched records from F1 only (records in F1 that have no match in F2).
  • UNPAIRED,NONE (default) — Only matched pairs. Unmatched records from either file are dropped. Not typically used for comparison; used for inner join.

For a classic "before vs after" comparison: treat F1 as the old file and F2 as the new file. Then records only in F1 are deletes (removed in the new), and records only in F2 are adds (new in the new). Use UNPAIRED,ONLY to get just those two groups, or UNPAIRED,F,F and then split by an indicator to get three files: only-F1, only-F2, and matched.

Ensuring Both Files Are in Key Order

JOINKEYS assumes both inputs are sorted in ascending order by the join key. If either file is not in that order, the join will miss matches or produce wrong unmatched output. So before the JOINKEYS step you must either: (1) sort both files by the comparison key and pass the sorted datasets as F1 and F2, or (2) use a single DFSORT step that does SORT FIELDS= on both (e.g. via two SORTIN-like inputs and a merge) so that the outputs are in key order. In practice, many shops run a separate SORT for each file and then run JOINKEYS with the two sorted results. The JOINKEYS step does not sort; it only joins.

JOINKEYS Control Statements for Comparison

You need: (1) JOINKEYS F1=... with the key for the first file, (2) JOINKEYS F2=... with the key for the second file (same key length and type as F1), (3) optionally REFORMAT to define the output record, and (4) optionally SORT or COPY and OUTFIL to format or split the join output. Example: compare two files on a 20-byte key at position 1, and output only unmatched records (adds and deletes).

text
1
2
3
4
5
6
JOINKEYS F1=SORTIN,FIELDS=(1,20,CH) JOINKEYS F2=SORTIN2,FIELDS=(1,20,CH) REFORMAT FIELDS=(F1:1,80,F2:1,80) JOIN UNPAIRED,ONLY SORT FIELDS=COPY OUTFIL FNAMES=DIFF

Here SORTIN is the first file (e.g. old), SORTIN2 is the second (e.g. new). Both must be sorted by 1,20,CH. REFORMAT FIELDS=(F1:1,80,F2:1,80) writes an 160-byte record: first 80 bytes from F1, next 80 from F2. For UNPAIRED,ONLY, when a record is only in F1 the F2 part is blank (or low values); when only in F2 the F1 part is blank. So you can tell adds vs deletes by checking which half has data. In practice you often add a one-byte indicator in REFORMAT (e.g. from a constant or from the join) and then use OUTFIL INCLUDE/OMIT or SPLITBY to write separate add and delete files.

Producing Three Outputs: Only in File1, Only in File2, In Both

To get three separate datasets—records only in the first file, only in the second, and in both—use UNPAIRED,F,F so the join output contains every record. In REFORMAT you must include a field that indicates which case each record is: only F1, only F2, or both. DFSORT does not automatically add such a field; you can use a user exit (MODS) to set it, or in some setups you can infer it from the presence of data in F1 vs F2 portions (e.g. if F2 portion is blank then record was only in F1). A common approach is to use REFORMAT to build a record that includes a one-byte code: e.g. 1=only F1, 2=only F2, 3=both. That requires the exit or post-processing. Alternatively, use two JOINKEYS steps: one with UNPAIRED,F to get only-F1 records, one with UNPAIRED with F2 to get only-F2 records, and one with UNPAIRED,NONE to get matched; then you have three outputs from three steps. Many shops use one JOINKEYS with UNPAIRED,F,F and a REFORMAT that includes F1 and F2 full records, then a second step (or ICETOOL) that checks whether the F1 part is blank (then it's an add) or F2 part is blank (then it's a delete) and routes to three files.

Example: REFORMAT with source indicator via constants

If your REFORMAT can distinguish records by checking blank/low-value in F1 or F2, you can use OUTFIL with INCLUDE. For example REFORMAT FIELDS=(F1:1,80,F2:1,80). For unmatched F1 records, F2 bytes are typically blank. For unmatched F2 records, F1 bytes are typically blank. So you could use OUTFIL with INCLUDE=(81,80,CH,EQ,C' ') for "only in F1" (F2 part all blanks) and OMIT with the same for "only in F2"—but that is fragile if data contains blanks. The robust way is to use a MODS exit to set a byte to 1/2/3 and then SPLITBY or INCLUDE/OMIT on that byte. For illustration, a simple split that works when unused half is blank:

text
1
2
3
4
5
6
7
8
JOINKEYS F1=SORTIN,FIELDS=(1,20,CH) JOINKEYS F2=SORTIN2,FIELDS=(1,20,CH) REFORMAT FIELDS=(F1:1,80,F2:1,80) JOIN UNPAIRED,F,F SORT FIELDS=COPY OUTFIL FNAMES=ONLYF1,INCLUDE=(81,80,CH,EQ,C' ') OUTFIL FNAMES=ONLYF2,INCLUDE=(1,80,CH,EQ,C' ') OUTFIL FNAMES=BOTH,OMIT=(81,80,CH,EQ,C' '),OMIT=(1,80,CH,EQ,C' ')

The exact OMIT/INCLUDE conditions depend on your record layout and how DFSORT fills the "missing" side for unmatched records (often blanks). Check your DFSORT documentation for REFORMAT and UNPAIRED behavior. When in doubt, use a user exit to set an explicit indicator and split on that.

Comparing Full Record Content (Changes)

To find not just adds and deletes but also "changed" records (same key, different content), you need to compare the full record or selected fields after the join. Step 1: JOINKEYS with UNPAIRED,NONE (or UNPAIRED,F,F and then keep only matched) so you have matched pairs. Step 2: REFORMAT so the output record contains the key and the full record (or fields to compare) from both F1 and F2—for example F1:1,80 and F2:1,80. Step 3: Use a second DFSORT step or a small program to compare the two halves byte-by-byte. If they differ, write the record to a "changes" file. DFSORT does not have a built-in "compare two fields" operator in INCLUDE/OMIT; you need a user exit (MODS) or a separate program/ICETOOL step to do the byte comparison and keep only differing records. So the pattern is: JOINKEYS (match on key) → REFORMAT (both versions in one record) → exit or next step (compare bytes, output only differences).

JCL for File Comparison

You need two input DDs for the two files (e.g. SORTIN and SORTIN2, or F1 and F2 as required by your DFSORT version) and one or more output DDs. Example JCL skeleton:

text
1
2
3
4
5
6
7
8
9
//STEP EXEC PGM=SORT //SORTIN DD DSN=OLD.FILE,DISP=SHR //SORTIN2 DD DSN=NEW.FILE,DISP=SHR //SORTOUT DD DSN=COMPARE.OUTPUT,DISP=(NEW,CATLG),... //SYSIN DD * JOINKEYS F1=SORTIN,FIELDS=(1,20,CH) JOINKEYS F2=SORTIN2,FIELDS=(1,20,CH) ... /*

Ensure OLD.FILE and NEW.FILE are both sorted by (1,20,CH) before this step. The join output goes to SORTOUT (or to OUTFIL DDs if you use multiple OUTFILs with FNAMES=).

Best Practices

  • Always sort both inputs by the same join key before JOINKEYS; otherwise results are wrong.
  • Use UNPAIRED,ONLY when you only need adds and deletes; it keeps the output small.
  • For three-way split (only F1, only F2, both), use a reliable indicator (e.g. MODS exit) rather than relying on blank detection unless your data guarantees it.
  • For content comparison (changes), combine JOINKEYS + REFORMAT with a comparator (exit or next step); document the comparison rule (e.g. which bytes are compared).
  • Validate record counts: e.g. only-F1 + only-F2 + 2*both should match total input records when using UNPAIRED,F,F, depending on how you count.

Explain It Like I'm Five

Imagine you have two lists of toy names: yesterday's list and today's. You want to know which toys are new (on today's list but not yesterday's), which are gone (on yesterday's but not today's), and which are on both lists. JOINKEYS is like a helper who lines up both lists by name and tells you: "this name only on the first list," "this name only on the second list," and "this name on both." So you get three piles: new toys, gone toys, and same toys. If you also want to know "same name but the color changed," you need another step that looks at both lists side by side and says when the rest of the line is different.

Exercises

  1. Write JOINKEYS control statements to compare two files on a 10-byte key at position 1 and output only unmatched records (adds and deletes). Assume F1=SORTIN, F2=SORTIN2.
  2. What is the difference between UNPAIRED,F,F and UNPAIRED,ONLY? When would you use each?
  3. Both input files have 80-byte records. You want one output file with records that exist in both files but with different content (same key, different bytes). Describe the two-step approach (JOINKEYS + what next?).
  4. Why can you not use JOINKEYS correctly if one of the input files is not sorted by the join key?

Quiz

Test Your Knowledge

1. What is the main DFSORT feature used to compare two files by key?

  • INCLUDE/OMIT only
  • JOINKEYS: it joins two files on a key and can output matched, unmatched from F1, or unmatched from F2 so you can see adds, deletes, and matches
  • SUM only
  • OUTFIL only

2. How do you get three outputs: records only in file 1, only in file 2, and in both?

  • Three separate SORT steps
  • One JOINKEYS step with REFORMAT and OUTFIL SPLITBY or multiple OUTFIL with INCLUDE on a match indicator field that you put in REFORMAT (e.g. 1,1 from F1 and a constant for F2; then INCLUDE/OMIT on that indicator)
  • MERGE only
  • SUM only

3. What does UNPAIRED,ONLY do in JOINKEYS?

  • Keeps only matched records
  • Outputs only records that do not have a match in the other file: records in F1 not in F2, and records in F2 not in F1 (adds and deletes)
  • Keeps the first record of each key
  • Deduplicates

4. Why must both input files be in key order for JOINKEYS comparison?

  • They do not need to be
  • JOINKEYS assumes both files are sorted (or at least ordered) by the join key; it reads both streams in key order and matches by advancing through both. If either file is not in key order, matches are missed or wrong
  • Only SORTIN must be sorted
  • Only for MERGE

5. How can you compare full records (not just keys) to find changed content?

  • JOINKEYS cannot do that
  • Use JOINKEYS to match on key, REFORMAT to include the full records (or the fields to compare) from both F1 and F2 in one output record, then use a later step or OUTREC/IFTHEN to compare bytes and flag or output only records where the content differs
  • Only with SUM
  • Only with INCLUDE