Comparing two datasets is a common mainframe task: you need to know what was added, deleted, or changed between two versions of a file (e.g. before and after a load, or two extracts from different systems). DFSORT does not have a single "compare" verb, but you can achieve full file comparison using JOINKEYS. JOINKEYS joins two files on a key; by using UNPAIRED options and REFORMAT you can produce separate outputs for records only in the first file, only in the second file, and in both. This page explains how to compare two files by key (adds and deletes), how to use UNPAIRED,ONLY and UNPAIRED,F,F, how to split the join output into three files, and how to extend the approach to find content changes (not just key presence) by comparing full records or selected fields after the join.
In batch processing you often have two versions of the same logical file: yesterday's extract and today's, or the output of a program run before and after a change. You need to know: which keys were added (in the new file but not the old), which were deleted (in the old but not the new), and optionally which records have the same key but different content (changes). This supports audit, reconciliation, and change-driven processing. DFSORT's JOINKEYS facility is designed for joining two files on a key; that same mechanism gives you a powerful way to compare two files by key and to feed a downstream step that compares content.
| What you want | DFSORT approach | Purpose |
|---|---|---|
| Key-based (adds/deletes) | JOINKEYS UNPAIRED | Find records in one file but not the other by join key |
| Key + content (changes) | JOINKEYS + REFORMAT + comparator | Match on key, then compare full record or fields to find changed records |
| Same key, different data | JOINKEYS REFORMAT | Output both versions side by side for downstream comparison |
JOINKEYS reads two files (F1 and F2), both assumed to be in ascending order by a join key. It matches records with the same key and can output: (1) only matched pairs, (2) only unmatched records (records in F1 with no match in F2, and records in F2 with no match in F1), or (3) all records (matched and unmatched). For file comparison you typically use the unmatched or all options so you can identify adds and deletes.
The UNPAIRED control on the JOINKEYS statement determines which records appear in the join output. The main values for comparison are:
For a classic "before vs after" comparison: treat F1 as the old file and F2 as the new file. Then records only in F1 are deletes (removed in the new), and records only in F2 are adds (new in the new). Use UNPAIRED,ONLY to get just those two groups, or UNPAIRED,F,F and then split by an indicator to get three files: only-F1, only-F2, and matched.
JOINKEYS assumes both inputs are sorted in ascending order by the join key. If either file is not in that order, the join will miss matches or produce wrong unmatched output. So before the JOINKEYS step you must either: (1) sort both files by the comparison key and pass the sorted datasets as F1 and F2, or (2) use a single DFSORT step that does SORT FIELDS= on both (e.g. via two SORTIN-like inputs and a merge) so that the outputs are in key order. In practice, many shops run a separate SORT for each file and then run JOINKEYS with the two sorted results. The JOINKEYS step does not sort; it only joins.
You need: (1) JOINKEYS F1=... with the key for the first file, (2) JOINKEYS F2=... with the key for the second file (same key length and type as F1), (3) optionally REFORMAT to define the output record, and (4) optionally SORT or COPY and OUTFIL to format or split the join output. Example: compare two files on a 20-byte key at position 1, and output only unmatched records (adds and deletes).
123456JOINKEYS F1=SORTIN,FIELDS=(1,20,CH) JOINKEYS F2=SORTIN2,FIELDS=(1,20,CH) REFORMAT FIELDS=(F1:1,80,F2:1,80) JOIN UNPAIRED,ONLY SORT FIELDS=COPY OUTFIL FNAMES=DIFF
Here SORTIN is the first file (e.g. old), SORTIN2 is the second (e.g. new). Both must be sorted by 1,20,CH. REFORMAT FIELDS=(F1:1,80,F2:1,80) writes an 160-byte record: first 80 bytes from F1, next 80 from F2. For UNPAIRED,ONLY, when a record is only in F1 the F2 part is blank (or low values); when only in F2 the F1 part is blank. So you can tell adds vs deletes by checking which half has data. In practice you often add a one-byte indicator in REFORMAT (e.g. from a constant or from the join) and then use OUTFIL INCLUDE/OMIT or SPLITBY to write separate add and delete files.
To get three separate datasets—records only in the first file, only in the second, and in both—use UNPAIRED,F,F so the join output contains every record. In REFORMAT you must include a field that indicates which case each record is: only F1, only F2, or both. DFSORT does not automatically add such a field; you can use a user exit (MODS) to set it, or in some setups you can infer it from the presence of data in F1 vs F2 portions (e.g. if F2 portion is blank then record was only in F1). A common approach is to use REFORMAT to build a record that includes a one-byte code: e.g. 1=only F1, 2=only F2, 3=both. That requires the exit or post-processing. Alternatively, use two JOINKEYS steps: one with UNPAIRED,F to get only-F1 records, one with UNPAIRED with F2 to get only-F2 records, and one with UNPAIRED,NONE to get matched; then you have three outputs from three steps. Many shops use one JOINKEYS with UNPAIRED,F,F and a REFORMAT that includes F1 and F2 full records, then a second step (or ICETOOL) that checks whether the F1 part is blank (then it's an add) or F2 part is blank (then it's a delete) and routes to three files.
If your REFORMAT can distinguish records by checking blank/low-value in F1 or F2, you can use OUTFIL with INCLUDE. For example REFORMAT FIELDS=(F1:1,80,F2:1,80). For unmatched F1 records, F2 bytes are typically blank. For unmatched F2 records, F1 bytes are typically blank. So you could use OUTFIL with INCLUDE=(81,80,CH,EQ,C' ') for "only in F1" (F2 part all blanks) and OMIT with the same for "only in F2"—but that is fragile if data contains blanks. The robust way is to use a MODS exit to set a byte to 1/2/3 and then SPLITBY or INCLUDE/OMIT on that byte. For illustration, a simple split that works when unused half is blank:
12345678JOINKEYS F1=SORTIN,FIELDS=(1,20,CH) JOINKEYS F2=SORTIN2,FIELDS=(1,20,CH) REFORMAT FIELDS=(F1:1,80,F2:1,80) JOIN UNPAIRED,F,F SORT FIELDS=COPY OUTFIL FNAMES=ONLYF1,INCLUDE=(81,80,CH,EQ,C' ') OUTFIL FNAMES=ONLYF2,INCLUDE=(1,80,CH,EQ,C' ') OUTFIL FNAMES=BOTH,OMIT=(81,80,CH,EQ,C' '),OMIT=(1,80,CH,EQ,C' ')
The exact OMIT/INCLUDE conditions depend on your record layout and how DFSORT fills the "missing" side for unmatched records (often blanks). Check your DFSORT documentation for REFORMAT and UNPAIRED behavior. When in doubt, use a user exit to set an explicit indicator and split on that.
To find not just adds and deletes but also "changed" records (same key, different content), you need to compare the full record or selected fields after the join. Step 1: JOINKEYS with UNPAIRED,NONE (or UNPAIRED,F,F and then keep only matched) so you have matched pairs. Step 2: REFORMAT so the output record contains the key and the full record (or fields to compare) from both F1 and F2—for example F1:1,80 and F2:1,80. Step 3: Use a second DFSORT step or a small program to compare the two halves byte-by-byte. If they differ, write the record to a "changes" file. DFSORT does not have a built-in "compare two fields" operator in INCLUDE/OMIT; you need a user exit (MODS) or a separate program/ICETOOL step to do the byte comparison and keep only differing records. So the pattern is: JOINKEYS (match on key) → REFORMAT (both versions in one record) → exit or next step (compare bytes, output only differences).
You need two input DDs for the two files (e.g. SORTIN and SORTIN2, or F1 and F2 as required by your DFSORT version) and one or more output DDs. Example JCL skeleton:
123456789//STEP EXEC PGM=SORT //SORTIN DD DSN=OLD.FILE,DISP=SHR //SORTIN2 DD DSN=NEW.FILE,DISP=SHR //SORTOUT DD DSN=COMPARE.OUTPUT,DISP=(NEW,CATLG),... //SYSIN DD * JOINKEYS F1=SORTIN,FIELDS=(1,20,CH) JOINKEYS F2=SORTIN2,FIELDS=(1,20,CH) ... /*
Ensure OLD.FILE and NEW.FILE are both sorted by (1,20,CH) before this step. The join output goes to SORTOUT (or to OUTFIL DDs if you use multiple OUTFILs with FNAMES=).
Imagine you have two lists of toy names: yesterday's list and today's. You want to know which toys are new (on today's list but not yesterday's), which are gone (on yesterday's but not today's), and which are on both lists. JOINKEYS is like a helper who lines up both lists by name and tells you: "this name only on the first list," "this name only on the second list," and "this name on both." So you get three piles: new toys, gone toys, and same toys. If you also want to know "same name but the color changed," you need another step that looks at both lists side by side and says when the rest of the line is different.
1. What is the main DFSORT feature used to compare two files by key?
2. How do you get three outputs: records only in file 1, only in file 2, and in both?
3. What does UNPAIRED,ONLY do in JOINKEYS?
4. Why must both input files be in key order for JOINKEYS comparison?
5. How can you compare full records (not just keys) to find changed content?