MainframeMaster

Many-to-Many Joins

A many-to-many join in DFSORT JOINKEYS occurs when both input files can have multiple records per join key. For each key value, DFSORT pairs every record from file 1 (F1) that has that key with every record from file 2 (F2) that has that key. So if key "X" has 3 records in F1 and 4 in F2, you get 3 × 4 = 12 output rows for that key alone. There is no separate "many-to-many" control statement—you use the same JOINKEYS and REFORMAT as for inner, left, or right joins; the multiplicity comes from the data. This page explains how many-to-many works, how to compute output row count, when it is useful, and how to avoid unwanted expansion or control output size.

JOINKEYS / many-to-many
Progress0 of 0 lessons

How Many-to-Many Produces Rows

For every value of the join key, the join produces one output row per pair of (F1 record, F2 record) that share that key. So:

Output rows per key = (number of F1 records with that key) × (number of F2 records with that key).

When both numbers are 1, you get 1 row (one-to-one). When one is 1 and the other is many, you get "many" rows (one-to-many). When both can be greater than 1, you get the product—that is many-to-many. The following table illustrates the formula.

Output rows per key (inner join): F1 count × F2 count
ScenarioF1 records per keyF2 records per keyOutput rows per key
1 F1, 1 F2 per key111 row (one-to-one)
1 F1, 5 F2 per key155 rows (one-to-many)
3 F1, 2 F2 per key326 rows (many-to-many)
4 F1, 4 F2 per key4416 rows (many-to-many)

Why the Count Multiplies

JOINKEYS does not pick "one best" match per key—it produces every possible pairing. So one F1 record with key "A" and three F2 records with key "A" yield three rows (that F1 with F2 record 1, same F1 with F2 record 2, same F1 with F2 record 3). If you have two F1 and three F2 records with key "A", you get 2 × 3 = 6 rows: (F1a,F2a), (F1a,F2b), (F1a,F2c), (F1b,F2a), (F1b,F2b), (F1b,F2c). That multiplicative effect is what makes many-to-many output large when both files have duplicates on the key.

Total Output Row Count

For an inner join, total output rows = sum over all keys that appear in both files of (F1 count for that key × F2 count for that key). For a left join, add one row for each F1 record whose key does not appear in F2 (F2 portion blank). For a right join, add one row for each F2 record whose key does not appear in F1 (F1 portion blank). For a full outer join, add both sets of unpaired rows. So total output can be much larger than either input file—plan SORTOUT size and downstream processing accordingly.

Example: Orders and Shipments

Suppose file 1 is order lines (multiple lines per order ID) and file 2 is shipment lines (multiple shipments per order ID). You want every order line paired with every shipment for that order—a classic many-to-many by order ID. One order might have 5 order lines and 3 shipments; the join produces 5 × 3 = 15 rows for that order. REFORMAT defines how much of each F1 and F2 record goes into the output (e.g. order line fields plus shipment fields).

Example control statements

text
1
2
3
4
5
JOINKEYS F1=ORDERS,FIELDS=(1,10,CH,A) JOINKEYS F2=SHIP,FIELDS=(1,10,CH,A) REFORMAT FIELDS=(F1:1,80,F2:1,60) JOIN UNPAIRED,F1,F2 SORT FIELDS=COPY

Join key is positions 1–10 in both files (e.g. order ID). Each F1 record is paired with every F2 record that has the same key. Output record length is 80 + 60 = 140 bytes. JOIN UNPAIRED,F1,F2 keeps orders with no shipments and shipments with no orders (full outer). With many-to-many data, the number of output rows can be large.

When Many-to-Many Is Intended

  • Cross-product by key: you need every combination of records from the two files that share the same key (e.g. every order line with every shipment for that order).
  • Matrix or grid generation: keys are dimensions and you want one row per combination.
  • Audit or reconciliation: match multiple source records to multiple target records by key and produce a row per pair for review.

When Many-to-Many Is Unwanted

If you expected one-to-one or one-to-many but both files have duplicate keys, you will see more rows than expected. For example one customer master and one account master per customer ID would normally give one row per customer; if either file has duplicates (e.g. duplicate customer IDs), you get multiple rows per customer. To fix that, deduplicate the file that should have unique keys before the join—e.g. use SUM FIELDS=NONE with the join key as the control field to keep one record per key, or run a prior step that selects one record per key.

Deduplication Before the Join

To get one record per key in a file, you can run a separate DFSORT step: SORT FIELDS=(key position, length, format, A), then SUM FIELDS=NONE. That keeps one record per key (the first after sort). Alternatively use INREC/OUTREC and a stable sort with SUM so the "first" record is the one you want. After deduplication, the join becomes one-to-one or one-to-many and the output row count is predictable.

Performance and Resource Planning

Many-to-many joins can use a lot of sortwork and produce large SORTOUT. DFSORT must read both files, match by key, and write one record per pair. If both files have high duplicate-key counts, the intermediate and output sizes grow quickly. Ensure SORTWK DD(s) and SORTOUT are sized for the expanded volume. Using SORTED and NOSEQCK on JOINKEYS when inputs are already sorted by the join key can reduce CPU; otherwise the join phase will sort or merge by key.

Many-to-Many vs One-to-Many vs One-to-One

One-to-one: at most one record per key in each file → at most one output row per key. One-to-many: one file has one per key, the other has many per key → output rows per key = the "many" count. Many-to-many: both files can have many per key → output rows per key = F1 count × F2 count. The same JOINKEYS syntax handles all three; only the data (and optional deduplication) determines which you get.

Explain It Like I'm Five

You have a list of kids and a list of toys. Some kids have the same name (like two "Alex"), and some toys have the same type (like three "ball"). You want to write a line for every kid–toy pair that "matches" by something (e.g. favorite type). So one Alex and three balls of that type give you three lines (Alex + ball1, Alex + ball2, Alex + ball3). If you have two Alexes and three balls, you get 2 × 3 = 6 lines—every Alex with every ball. That’s many-to-many: lots of lines because both sides can repeat.

Exercises

  1. File 1 has 100 keys; for each key there are 2 F1 records. File 2 has the same 100 keys with 3 F2 records per key. How many output rows for an inner join?
  2. If you want at most one row per key, and both files have duplicates on the key, what step would you add before the join?
  3. Write REFORMAT for F1 record length 50 and F2 record length 40, output = F1 then F2. What is the output record length?
  4. When would total join output exceed the number of records in file 1? When would it exceed the number in file 2?

Quiz

Test Your Knowledge

1. In a many-to-many join, file 1 has 3 records with key "X" and file 2 has 4 records with key "X". How many output rows do you get for key "X" with an inner join?

  • 3
  • 4
  • 7
  • 12

2. What defines a many-to-many join in DFSORT?

  • A special JOINKEYS option like JOIN MANYTOMANY
  • The data: both files have at least one key that appears on multiple records; DFSORT uses the same JOINKEYS syntax and the multiplicity comes from the data
  • Using two REFORMAT statements
  • Sorting both files first by key

3. Why can many-to-many joins produce very large output?

  • Because DFSORT always doubles the record length
  • Because output rows per key = (F1 records for that key) × (F2 records for that key); if many keys have multiple records in both files, total rows can explode
  • Only when using JOIN UNPAIRED
  • Because REFORMAT duplicates every field

4. How do you avoid accidental many-to-many expansion when you wanted one-to-one?

  • Use JOIN ONCE
  • Deduplicate one or both files on the join key before the join (e.g. SUM FIELDS=NONE or a prior sort/copy with unique key) so each key appears at most once per file
  • Use INCLUDE to limit records
  • Many-to-many cannot be avoided

5. In a left join with many-to-many (e.g. 2 F1 and 3 F2 records for key X), how many output rows for key X?

  • 2 (one per F1 record)
  • 3 (one per F2 record)
  • 6 (2 × 3 pairs); unpaired F1 keys would add one row each with blank F2
  • 1