MainframeMaster

Inner Joins

An inner join in DFSORT JOINKEYS means you keep only records that have a matching key in both input files. Records that exist in one file but have no matching key in the other file are dropped. This is the default behavior when you use JOINKEYS and do not code JOIN UNPAIRED. You need two JOINKEYS statements (one per file) with the join key for each, and a REFORMAT statement to define the joined output layout. The two files can have different record lengths and key positions; duplicate keys in either file produce multiple output rows (one per combination). This page is a deep dive on inner joins: when to use them, control statement order, key position and format, REFORMAT layout, JCL DD names, and one-to-one vs one-to-many behavior.

JOINKEYS
Progress0 of 0 lessons

What an Inner Join Produces

With an inner join, the output contains one row for each matched pair of records. A “pair” is one record from file 1 (F1) and one record from file 2 (F2) that have the same value in the join key. If a key appears only in F1 or only in F2, no output row is produced for that key. So the result is a subset of both files: only the keys that exist in both files, with the combined data you specify in REFORMAT.

Default: No JOIN UNPAIRED

To get an inner join, you do not code JOIN UNPAIRED. As soon as you add JOIN UNPAIRED,F1 or JOIN UNPAIRED,F2 (or both), you are doing an outer join and unmatched records from one or both files are included. So for a pure inner join, the control cards are: two JOINKEYS, REFORMAT, and optionally SORT FIELDS=COPY or SORT FIELDS=—no JOIN statement.

Join type vs JOIN UNPAIRED
Join typeWhat to codeResult
Inner (default)No JOIN UNPAIREDOnly records with matching key in both files
Left outerJOIN UNPAIRED,F1All F1 records; F2 data when match exists
Right outerJOIN UNPAIRED,F2All F2 records; F1 data when match exists
Full outerJOIN UNPAIRED,F1,F2All records from both files

Control Statement Order and Example

The usual order is: first JOINKEYS for file 1 (F1), then JOINKEYS for file 2 (F2), then REFORMAT FIELDS=, then SORT FIELDS=COPY (or SORT FIELDS= if you want to re-sort the joined data). The two input files are read from the DD names you specify—default SORTJNF1 and SORTJNF2—or from F1= and F2= if you override.

Example: inner join with keys at position 1

text
1
2
3
4
JOINKEYS F1=SORTJNF1,FIELDS=(1,10,CH,A) JOINKEYS F2=SORTJNF2,FIELDS=(1,10,CH,A) REFORMAT FIELDS=(F1:1,80,F2:11,70) SORT FIELDS=COPY

File 1 (SORTJNF1) and file 2 (SORTJNF2) are joined on the first 10 bytes (character, ascending). Only records with the same 10-byte key in both files produce output. REFORMAT builds the output record: 80 bytes from file 1 (positions 1–80), then 70 bytes from file 2 (positions 11–80 of F2). Total output length is 80 + 70 = 150 bytes. SORT FIELDS=COPY keeps the order produced by the join without re-sorting.

Join Key in Different Positions

The join key does not have to be in the same position in both files. You specify the position and length for each file in its own JOINKEYS. For example, file 1 might have customer ID at 1–8 and file 2 at 15–22. Then you code FIELDS=(1,8,CH,A) for F1 and FIELDS=(15,8,CH,A) for F2. The values must match; the positions can differ. The format (CH, PD, etc.) and sort order (A or D) should be such that equal values compare equal—typically both keys use the same format and ascending order.

Example: key at different positions

text
1
2
3
4
JOINKEYS F1=MASTER,FIELDS=(1,8,CH,A) JOINKEYS F2=TRAN,FIELDS=(20,8,CH,A) REFORMAT FIELDS=(F1:1,100,F2:1,50) SORT FIELDS=COPY

The join key for the first file (MASTER) is bytes 1–8; for the second file (TRAN) it is bytes 20–27. Records with the same key value (e.g. same customer ID) are paired. Output is 100 bytes from F1 and 50 bytes from F2 (150 bytes total). Only records that match on that key appear in the output.

REFORMAT: Building the Joined Record

REFORMAT FIELDS= is required. It lists which parts of each file go into the output record. Each item is F1:position,length or F2:position,length. They are concatenated in order. So FIELDS=(F1:1,50,F2:1,40) produces a 90-byte record: first 50 bytes of the F1 record, then first 40 bytes of the F2 record. You can skip the key in the output (e.g. F2:11,70 to avoid duplicating the key from F2) or include it from one or both sides. The output record length is the sum of all the lengths in the REFORMAT list.

One-to-One vs One-to-Many

If each key appears at most once in each file, you get at most one output record per key (one-to-one). If file 1 has two records with key "A001" and file 2 has one record with key "A001", you get two output records: the single F2 record is paired with each of the two F1 records. If both files have two records with "A001", you get four output records (every combination). So the number of output rows per key is the product of the number of F1 records with that key and the number of F2 records with that key. Plan your REFORMAT and downstream processing for possible row expansion when duplicates exist.

JCL: Allocating the Two Files

By default, the first join file is read from SORTJNF1 and the second from SORTJNF2. In JCL you provide two DD statements pointing to your input datasets. You do not use SORTIN for JOINKEYS—SORTIN is for single-file sort/copy. Example JCL:

text
1
2
3
4
5
6
7
//STEP1 EXEC PGM=SORT //SORTJNF1 DD DSN=MY.MASTER,DISP=SHR //SORTJNF2 DD DSN=MY.TRAN,DISP=SHR //SORTOUT DD DSN=MY.JOINED,DISP=(NEW,CATLG),... //SYSIN DD * ... JOINKEYS and REFORMAT ... /*

If you use F1=MASTER and F2=TRAN in the JOINKEYS statements, then you would use DD names MASTER and TRAN instead of SORTJNF1 and SORTJNF2. SORTOUT is the joined output.

Pre-Sorted Input: SORTED and NOSEQCK

If both input files are already sorted by the join key in the same order (e.g. ascending), you can specify SORTED on the JOINKEYS statements so DFSORT does not re-sort them, which can improve performance. You may also specify NOSEQCK to skip sequence checking. The key order (A or D) and key definition must match the physical order of the files. See your DFSORT manual for exact syntax (e.g. JOINKEYS F1=...,FIELDS=(...),SORTED).

When to Use an Inner Join

Use an inner join when you want only records that exist in both files. Typical cases: match a transaction file to a master file and keep only transactions that have a valid master; match an order file to a product file and keep only orders for products that exist; combine two extracts that share a common key and you care only about keys present in both. If you need to keep records that have no match (e.g. “all customers and their orders, including customers with no orders”), use a left outer join (JOIN UNPAIRED,F1) instead.

Explain It Like I'm Five

You have two lists: one list of kids and their favorite color, and one list of kids and their favorite snack. You want one list that has kid, color, and snack—but only for kids who appear on both lists. So you look at the first list and for each name you look on the second list. If the name is on both lists, you write one line with the name, color, and snack. If a name is only on the first list or only on the second list, you skip it. That’s the inner join: only the names that are on both lists get a line in the result.

Exercises

  1. Write JOINKEYS control statements for an inner join of two files (DD names FILE1 and FILE2), both with a 6-byte character key at position 1. REFORMAT should output 60 bytes from file 1 and 40 bytes from file 2 (starting at position 1 of each).
  2. If file 1 has 100 records and file 2 has 80 records, and every key in file 1 exists in file 2 but 20 keys in file 2 do not exist in file 1, approximately how many output records do you get from an inner join? (Assume at most one record per key in each file.)
  3. What is the difference between inner join and left outer join when a record in file 1 has no match in file 2?
  4. Why do you not code JOIN UNPAIRED for an inner join?

Quiz

Test Your Knowledge

1. How do you request an inner join with JOINKEYS?

  • Code JOIN INNER
  • Do not code JOIN UNPAIRED—the default is inner join; only records with matching keys in both files are written
  • Code JOIN UNPAIRED,F1,F2
  • Use INCLUDE only

2. What happens to a record in file 1 that has no matching key in file 2?

  • It is written with blank F2 data
  • It is dropped—inner join keeps only records that have a match in both files
  • It is written to a separate file
  • It causes an error

3. If file 1 has two records with key "A001" and file 2 has one record with key "A001", how many output records do you get for that key with an inner join?

  • One
  • Two—each F1 record is matched with the F2 record, so you get two joined records (one-to-many)
  • Three
  • None

4. What DD names are used for the two join inputs if you do not specify F1= or F2=?

  • SORTIN and SORTOUT
  • SORTJNF1 and SORTJNF2
  • JOIN1 and JOIN2
  • F1 and F2

5. Why is REFORMAT required for an inner join with JOINKEYS?

  • It is optional
  • REFORMAT FIELDS= defines which fields from F1 and F2 go into the joined output record; without it DFSORT does not know the output layout
  • Only for outer joins
  • To sort the result