MainframeMaster

One-to-One Joins

A one-to-one join in DFSORT JOINKEYS means that for each value of the join key, there is at most one record in the first file (F1) and at most one record in the second file (F2). So each key produces at most one matched pair and thus at most one output row (for an inner join), or exactly one output row per F1 record in a left join and one per F2 in a right join. DFSORT does not have a special “one-to-one” control statement—the behavior comes from your data having unique keys in each file. This page explains what one-to-one means, how to ensure it (including deduplication before the join), and how it differs from one-to-many and many-to-many.

JOINKEYS / key multiplicity
Progress0 of 0 lessons

What One-to-One Means

In relational terms, the join key in file 1 is unique (primary key or unique key), and the join key in file 2 is unique. So there is a one-to-one relationship between the two files on that key. When you join them, each key that appears in both files produces exactly one row (one F1 record × one F2 record). Each key that appears only in one file (for outer joins) produces one row per unpaired record. So the output row count is predictable: for an inner join, it equals the number of keys that exist in both files; for a left join, it equals the number of records in F1; for a right join, the number in F2; for a full outer join, the number of distinct keys that appear in either file (with one row per key when keys are unique in each file).

Join multiplicity and output rows per key
TypeF1 records per keyF2 records per keyOutput rows per key (matched)
One-to-oneAt most one per keyAt most one per keyAt most one row per key (matched)
One-to-manyOne per keyMany per keyMany rows per key (one F1 × many F2)
Many-to-oneMany per keyOne per keyMany rows per key (many F1 × one F2)
Many-to-manyMany per keyMany per keyProduct of counts per key

Ensuring One-to-One: Deduplicate Before the Join

DFSORT JOINKEYS does not enforce uniqueness. If either file has duplicate keys, you will get multiple output rows per key (one per combination). To get one-to-one results, ensure each file has at most one record per join key before the join. A common approach is to run a separate SORT step on each file (or only on the file that has duplicates) with the same key as the join key and SUM FIELDS=NONE. That keeps one record per key (the first after sort, unless you use other options). Write the result to a temporary or permanent dataset and use it as SORTJNF1 or SORTJNF2.

Example: dedupe file 2 on join key, then one-to-one join

text
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
* Step 1: Dedupe FILE2 on key 1-10, keep one per key //STEP1 EXEC PGM=SORT //SORTIN DD DSN=MY.FILE2,DISP=SHR //SORTOUT DD DSN=MY.FILE2.UNIQ,DISP=(NEW,PASS),... //SYSIN DD * SORT FIELDS=(1,10,CH,A) SUM FIELDS=NONE /* * Step 2: Join FILE1 (unique on 1-10) with deduped FILE2 //STEP2 EXEC PGM=SORT //SORTJNF1 DD DSN=MY.FILE1,DISP=SHR //SORTJNF2 DD DSN=MY.FILE2.UNIQ,DISP=SHR //SORTOUT DD DSN=MY.JOINED,DISP=(NEW,CATLG),... //SYSIN DD * JOINKEYS F1=SORTJNF1,FIELDS=(1,10,CH,A) JOINKEYS F2=SORTJNF2,FIELDS=(1,10,CH,A) REFORMAT FIELDS=(F1:1,80,F2:11,70) SORT FIELDS=COPY /*

After step 1, MY.FILE2.UNIQ has at most one record per key. If FILE1 is already unique on 1–10, the join in step 2 is one-to-one: at most one output row per key.

When Both Files Are Already Unique

If your master file and lookup file are already unique on the join key (e.g. customer ID in both), you can run JOINKEYS directly. No dedupe step is needed. The result will be one-to-one: one row per key for inner join, one row per F1 for left join, and so on.

Inner Join: One Row per Matched Key

With an inner join (no JOIN statement) and one-to-one data, the output has exactly one row for each key that exists in both files. So the output record count equals the number of keys that appear in both F1 and F2. If F1 has 1000 unique keys and F2 has 800 unique keys and 700 keys are common, you get 700 output rows.

Left Join: One Row per F1 Record

With a left join (JOIN UNPAIRED,F1) and one-to-one data, every F1 record produces exactly one output row. So the output record count equals the number of records in F1. Matched keys get F1+F2 data; unmatched F1 keys get F1 data and blank F2. There is no row expansion because each F1 record matches at most one F2 record.

Right Join and Full Outer Join

With a right join (JOIN UNPAIRED,F2), output row count equals the number of records in F2 when data is one-to-one. With a full outer join (JOIN UNPAIRED,F1,F2), you get one row per F1 record plus one row per F2 record that has no match in F1—but because matched pairs produce one row each (not two), the total is: (number of F1 records) + (number of F2-only keys × records per key). With strict one-to-one (one record per key in each file), that simplifies to one row per distinct key that appears in either file (matched keys once, F1-only keys once, F2-only keys once).

What If I Don’t Dedupe and Have Duplicates?

If F1 has two records with key "X" and F2 has one record with key "X", you get two output rows for "X" (one-to-many). If both have two records each with key "X", you get four output rows (many-to-many). So the output can grow. Downstream programs that assume one row per key may break (duplicate keys, wrong totals). Always ensure or verify key uniqueness if you expect one-to-one behavior.

When to Use One-to-One Joins

  • Master–master or master–lookup: one record per entity in each file (e.g. customer master and customer attributes).
  • You need a fixed, predictable output row count (e.g. one row per key for reporting).
  • Downstream logic assumes at most one row per key (e.g. control break, key-based updates).

Explain It Like I'm Five

You have a list of kids (one line per kid) and a list of their favorite colors (one line per kid). You want to match them by name. If each list has each name only once, then for each name you get one line: the kid and their color. That’s one-to-one. If the color list had the same kid twice (two favorite colors?), you’d get two lines for that kid. So one-to-one means “each name appears only once on each list,” and then you get one line per name in the result.

Exercises

  1. File 1 has 500 records with unique customer IDs; file 2 has 400 records with unique customer IDs. How many output rows do you get with an inner join (one-to-one)? How many with a left join?
  2. Write a two-step JCL outline: step 1 dedupe a transaction file on key 1–6 (keep first record per key); step 2 run a left join of a master file (unique on 1–6) with the deduped transaction file.
  3. What happens to output row count if you forget to dedupe file 2 and it has three records per key while file 1 has one per key?
  4. When would you choose to dedupe only F2 instead of both F1 and F2 before a join?

Quiz

Test Your Knowledge

1. What defines a one-to-one join in DFSORT JOINKEYS?

  • You must use a special JOIN ONE2ONE statement
  • Each join key appears at most once in file 1 and at most once in file 2, so you get at most one output row per key
  • You must sort both files first with OPTION UNIQUE
  • Only inner joins can be one-to-one

2. If file 1 has one record with key "A001" and file 2 has one record with key "A001", how many output rows do you get for that key with an inner join?

  • Zero
  • One—one F1 record paired with one F2 record
  • Two
  • It depends on REFORMAT

3. How can you guarantee one-to-one behavior if your input files might have duplicate keys?

  • JOINKEYS automatically deduplicates
  • Preprocess one or both files with DFSORT SUM FIELDS=NONE (or equivalent) on the join key to keep one record per key before the join; then run JOINKEYS
  • Use REFORMAT with a special option
  • You cannot; you must use a program

4. In a one-to-one left join, how many output rows do you get?

  • Exactly the number of rows in file 2
  • Exactly the number of rows in file 1—one row per F1 record, with F2 data when the key matches (once) or blank when it does not
  • The sum of both file counts
  • At most the minimum of the two file counts

5. When is a one-to-one join the right choice?

  • Only for tiny files
  • When each entity (e.g. customer ID, product ID) appears once in each file—e.g. master and detail that are both unique on the key, or two extracts that are already deduplicated
  • Only for full outer join
  • When you want the maximum number of output rows