A many-to-many join in DFSORT JOINKEYS occurs when both input files can have multiple records per join key. For each key value, DFSORT pairs every record from file 1 (F1) that has that key with every record from file 2 (F2) that has that key. So if key "X" has 3 records in F1 and 4 in F2, you get 3 × 4 = 12 output rows for that key alone. There is no separate "many-to-many" control statement—you use the same JOINKEYS and REFORMAT as for inner, left, or right joins; the multiplicity comes from the data. This page explains how many-to-many works, how to compute output row count, when it is useful, and how to avoid unwanted expansion or control output size.
For every value of the join key, the join produces one output row per pair of (F1 record, F2 record) that share that key. So:
Output rows per key = (number of F1 records with that key) × (number of F2 records with that key).
When both numbers are 1, you get 1 row (one-to-one). When one is 1 and the other is many, you get "many" rows (one-to-many). When both can be greater than 1, you get the product—that is many-to-many. The following table illustrates the formula.
| Scenario | F1 records per key | F2 records per key | Output rows per key |
|---|---|---|---|
| 1 F1, 1 F2 per key | 1 | 1 | 1 row (one-to-one) |
| 1 F1, 5 F2 per key | 1 | 5 | 5 rows (one-to-many) |
| 3 F1, 2 F2 per key | 3 | 2 | 6 rows (many-to-many) |
| 4 F1, 4 F2 per key | 4 | 4 | 16 rows (many-to-many) |
JOINKEYS does not pick "one best" match per key—it produces every possible pairing. So one F1 record with key "A" and three F2 records with key "A" yield three rows (that F1 with F2 record 1, same F1 with F2 record 2, same F1 with F2 record 3). If you have two F1 and three F2 records with key "A", you get 2 × 3 = 6 rows: (F1a,F2a), (F1a,F2b), (F1a,F2c), (F1b,F2a), (F1b,F2b), (F1b,F2c). That multiplicative effect is what makes many-to-many output large when both files have duplicates on the key.
For an inner join, total output rows = sum over all keys that appear in both files of (F1 count for that key × F2 count for that key). For a left join, add one row for each F1 record whose key does not appear in F2 (F2 portion blank). For a right join, add one row for each F2 record whose key does not appear in F1 (F1 portion blank). For a full outer join, add both sets of unpaired rows. So total output can be much larger than either input file—plan SORTOUT size and downstream processing accordingly.
Suppose file 1 is order lines (multiple lines per order ID) and file 2 is shipment lines (multiple shipments per order ID). You want every order line paired with every shipment for that order—a classic many-to-many by order ID. One order might have 5 order lines and 3 shipments; the join produces 5 × 3 = 15 rows for that order. REFORMAT defines how much of each F1 and F2 record goes into the output (e.g. order line fields plus shipment fields).
12345JOINKEYS F1=ORDERS,FIELDS=(1,10,CH,A) JOINKEYS F2=SHIP,FIELDS=(1,10,CH,A) REFORMAT FIELDS=(F1:1,80,F2:1,60) JOIN UNPAIRED,F1,F2 SORT FIELDS=COPY
Join key is positions 1–10 in both files (e.g. order ID). Each F1 record is paired with every F2 record that has the same key. Output record length is 80 + 60 = 140 bytes. JOIN UNPAIRED,F1,F2 keeps orders with no shipments and shipments with no orders (full outer). With many-to-many data, the number of output rows can be large.
If you expected one-to-one or one-to-many but both files have duplicate keys, you will see more rows than expected. For example one customer master and one account master per customer ID would normally give one row per customer; if either file has duplicates (e.g. duplicate customer IDs), you get multiple rows per customer. To fix that, deduplicate the file that should have unique keys before the join—e.g. use SUM FIELDS=NONE with the join key as the control field to keep one record per key, or run a prior step that selects one record per key.
To get one record per key in a file, you can run a separate DFSORT step: SORT FIELDS=(key position, length, format, A), then SUM FIELDS=NONE. That keeps one record per key (the first after sort). Alternatively use INREC/OUTREC and a stable sort with SUM so the "first" record is the one you want. After deduplication, the join becomes one-to-one or one-to-many and the output row count is predictable.
Many-to-many joins can use a lot of sortwork and produce large SORTOUT. DFSORT must read both files, match by key, and write one record per pair. If both files have high duplicate-key counts, the intermediate and output sizes grow quickly. Ensure SORTWK DD(s) and SORTOUT are sized for the expanded volume. Using SORTED and NOSEQCK on JOINKEYS when inputs are already sorted by the join key can reduce CPU; otherwise the join phase will sort or merge by key.
One-to-one: at most one record per key in each file → at most one output row per key. One-to-many: one file has one per key, the other has many per key → output rows per key = the "many" count. Many-to-many: both files can have many per key → output rows per key = F1 count × F2 count. The same JOINKEYS syntax handles all three; only the data (and optional deduplication) determines which you get.
You have a list of kids and a list of toys. Some kids have the same name (like two "Alex"), and some toys have the same type (like three "ball"). You want to write a line for every kid–toy pair that "matches" by something (e.g. favorite type). So one Alex and three balls of that type give you three lines (Alex + ball1, Alex + ball2, Alex + ball3). If you have two Alexes and three balls, you get 2 × 3 = 6 lines—every Alex with every ball. That’s many-to-many: lots of lines because both sides can repeat.
1. In a many-to-many join, file 1 has 3 records with key "X" and file 2 has 4 records with key "X". How many output rows do you get for key "X" with an inner join?
2. What defines a many-to-many join in DFSORT?
3. Why can many-to-many joins produce very large output?
4. How do you avoid accidental many-to-many expansion when you wanted one-to-one?
5. In a left join with many-to-many (e.g. 2 F1 and 3 F2 records for key X), how many output rows for key X?