What is a one-to-one join in DFSORT?

A one-to-one join is a JOINKEYS join where each join key appears at most once in the first file and at most once in the second file. So for each key you get at most one output row (for inner join) or exactly one row per F1 record in a left join and one per F2 in a right join. DFSORT does not have a special “one-to-one” statement; the behavior comes from the data having unique keys in each file.

How do I get one-to-one join results with JOINKEYS?

Ensure each input file has at most one record per join key. If needed, run a prior step (e.g. SORT with SUM FIELDS=NONE on the join key) to remove duplicates. Then run JOINKEYS as usual (inner, left, right, or full outer). The result will have at most one row per key for matched keys, and one row per unpaired record for outer joins.

What happens if I join two files without ensuring uniqueness and both have duplicate keys?

You get a many-to-many (or one-to-many) result: one output row per combination of F1 and F2 records with the same key. For example one F1 record with key X and three F2 records with key X produce three output rows. So the output can have more rows than either file. If you expect one-to-one, duplicate keys will break that assumption.

Can I use INREC or SUM before the join to enforce one-to-one?

JOINKEYS reads from two separate files (e.g. SORTJNF1 and SORTJNF2). You cannot apply INREC or SUM inside the same JOINKEYS step to both. To enforce one-to-one, run a separate SORT step on one or both files first (e.g. SORT with SUM FIELDS=NONE on the join key), write the deduplicated result to temporary or permanent datasets, then use those as inputs to the JOINKEYS step.

Does one-to-one affect which JOIN statement I use (inner, left, right, full)?

No. One-to-one is about the multiplicity of keys in the data. You still choose inner (no JOIN), left (JOIN UNPAIRED,F1), right (JOIN UNPAIRED,F2), or full (JOIN UNPAIRED,F1,F2) based on whether you want to keep only matches or all from one or both files. One-to-one only affects how many rows you get per key (at most one for matched keys).

DFSORT One-to-One Joins - Single Match Per Key JOINKEYS

One-to-One Joins

A one-to-one join in DFSORT JOINKEYS means that for each value of the join key, there is at most one record in the first file (F1) and at most one record in the second file (F2). So each key produces at most one matched pair and thus at most one output row (for an inner join), or exactly one output row per F1 record in a left join and one per F2 in a right join. DFSORT does not have a special “one-to-one” control statement—the behavior comes from your data having unique keys in each file. This page explains what one-to-one means, how to ensure it (including deduplication before the join), and how it differs from one-to-many and many-to-many.

JOINKEYS / key multiplicity

What One-to-One Means

In relational terms, the join key in file 1 is unique (primary key or unique key), and the join key in file 2 is unique. So there is a one-to-one relationship between the two files on that key. When you join them, each key that appears in both files produces exactly one row (one F1 record × one F2 record). Each key that appears only in one file (for outer joins) produces one row per unpaired record. So the output row count is predictable: for an inner join, it equals the number of keys that exist in both files; for a left join, it equals the number of records in F1; for a right join, the number in F2; for a full outer join, the number of distinct keys that appear in either file (with one row per key when keys are unique in each file).

Join multiplicity and output rows per key
Type	F1 records per key	F2 records per key	Output rows per key (matched)
One-to-one	At most one per key	At most one per key	At most one row per key (matched)
One-to-many	One per key	Many per key	Many rows per key (one F1 × many F2)
Many-to-one	Many per key	One per key	Many rows per key (many F1 × one F2)
Many-to-many	Many per key	Many per key	Product of counts per key

Ensuring One-to-One: Deduplicate Before the Join

DFSORT JOINKEYS does not enforce uniqueness. If either file has duplicate keys, you will get multiple output rows per key (one per combination). To get one-to-one results, ensure each file has at most one record per join key before the join. A common approach is to run a separate SORT step on each file (or only on the file that has duplicates) with the same key as the join key and SUM FIELDS=NONE. That keeps one record per key (the first after sort, unless you use other options). Write the result to a temporary or permanent dataset and use it as SORTJNF1 or SORTJNF2.

Example: dedupe file 2 on join key, then one-to-one join

text

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
  * Step 1: Dedupe FILE2 on key 1-10, keep one per key
  //STEP1    EXEC PGM=SORT
  //SORTIN   DD DSN=MY.FILE2,DISP=SHR
  //SORTOUT  DD DSN=MY.FILE2.UNIQ,DISP=(NEW,PASS),...
  //SYSIN    DD *
    SORT FIELDS=(1,10,CH,A)
    SUM FIELDS=NONE
  /*
  * Step 2: Join FILE1 (unique on 1-10) with deduped FILE2
  //STEP2    EXEC PGM=SORT
  //SORTJNF1 DD DSN=MY.FILE1,DISP=SHR
  //SORTJNF2 DD DSN=MY.FILE2.UNIQ,DISP=SHR
  //SORTOUT  DD DSN=MY.JOINED,DISP=(NEW,CATLG),...
  //SYSIN    DD *
    JOINKEYS F1=SORTJNF1,FIELDS=(1,10,CH,A)
    JOINKEYS F2=SORTJNF2,FIELDS=(1,10,CH,A)
    REFORMAT FIELDS=(F1:1,80,F2:11,70)
    SORT FIELDS=COPY
  /*

After step 1, MY.FILE2.UNIQ has at most one record per key. If FILE1 is already unique on 1–10, the join in step 2 is one-to-one: at most one output row per key.

When Both Files Are Already Unique

If your master file and lookup file are already unique on the join key (e.g. customer ID in both), you can run JOINKEYS directly. No dedupe step is needed. The result will be one-to-one: one row per key for inner join, one row per F1 for left join, and so on.

Inner Join: One Row per Matched Key

With an inner join (no JOIN statement) and one-to-one data, the output has exactly one row for each key that exists in both files. So the output record count equals the number of keys that appear in both F1 and F2. If F1 has 1000 unique keys and F2 has 800 unique keys and 700 keys are common, you get 700 output rows.

Left Join: One Row per F1 Record

With a left join (JOIN UNPAIRED,F1) and one-to-one data, every F1 record produces exactly one output row. So the output record count equals the number of records in F1. Matched keys get F1+F2 data; unmatched F1 keys get F1 data and blank F2. There is no row expansion because each F1 record matches at most one F2 record.

Right Join and Full Outer Join

With a right join (JOIN UNPAIRED,F2), output row count equals the number of records in F2 when data is one-to-one. With a full outer join (JOIN UNPAIRED,F1,F2), you get one row per F1 record plus one row per F2 record that has no match in F1—but because matched pairs produce one row each (not two), the total is: (number of F1 records) + (number of F2-only keys × records per key). With strict one-to-one (one record per key in each file), that simplifies to one row per distinct key that appears in either file (matched keys once, F1-only keys once, F2-only keys once).

What If I Don’t Dedupe and Have Duplicates?

If F1 has two records with key "X" and F2 has one record with key "X", you get two output rows for "X" (one-to-many). If both have two records each with key "X", you get four output rows (many-to-many). So the output can grow. Downstream programs that assume one row per key may break (duplicate keys, wrong totals). Always ensure or verify key uniqueness if you expect one-to-one behavior.

When to Use One-to-One Joins

Master–master or master–lookup: one record per entity in each file (e.g. customer master and customer attributes).
You need a fixed, predictable output row count (e.g. one row per key for reporting).
Downstream logic assumes at most one row per key (e.g. control break, key-based updates).

Explain It Like I'm Five

You have a list of kids (one line per kid) and a list of their favorite colors (one line per kid). You want to match them by name. If each list has each name only once, then for each name you get one line: the kid and their color. That’s one-to-one. If the color list had the same kid twice (two favorite colors?), you’d get two lines for that kid. So one-to-one means “each name appears only once on each list,” and then you get one line per name in the result.

Exercises

File 1 has 500 records with unique customer IDs; file 2 has 400 records with unique customer IDs. How many output rows do you get with an inner join (one-to-one)? How many with a left join?
Write a two-step JCL outline: step 1 dedupe a transaction file on key 1–6 (keep first record per key); step 2 run a left join of a master file (unique on 1–6) with the deduped transaction file.
What happens to output row count if you forget to dedupe file 2 and it has three records per key while file 1 has one per key?
When would you choose to dedupe only F2 instead of both F1 and F2 before a join?

Quiz

Test Your Knowledge

1. What defines a one-to-one join in DFSORT JOINKEYS?

You must use a special JOIN ONE2ONE statement
Each join key appears at most once in file 1 and at most once in file 2, so you get at most one output row per key
You must sort both files first with OPTION UNIQUE
Only inner joins can be one-to-one

2. If file 1 has one record with key "A001" and file 2 has one record with key "A001", how many output rows do you get for that key with an inner join?

Zero
One—one F1 record paired with one F2 record
Two
It depends on REFORMAT

3. How can you guarantee one-to-one behavior if your input files might have duplicate keys?

JOINKEYS automatically deduplicates
Preprocess one or both files with DFSORT SUM FIELDS=NONE (or equivalent) on the join key to keep one record per key before the join; then run JOINKEYS
Use REFORMAT with a special option
You cannot; you must use a program

4. In a one-to-one left join, how many output rows do you get?

Exactly the number of rows in file 2
Exactly the number of rows in file 1—one row per F1 record, with F2 data when the key matches (once) or blank when it does not
The sum of both file counts
At most the minimum of the two file counts

5. When is a one-to-one join the right choice?

Only for tiny files
When each entity (e.g. customer ID, product ID) appears once in each file—e.g. master and detail that are both unique on the key, or two extracts that are already deduplicated
Only for full outer join
When you want the maximum number of output rows