MainframeMaster

Conditional Joins

In SQL you might write a join with an extra condition in the ON or WHERE clause (e.g. join on customer ID and only where order date is in range). In DFSORT, JOINKEYS matches only on keys; there is no built-in "ON key AND condition". A conditional join in DFSORT means you restrict which records participate in the join or which joined rows appear in the output by using pre-join filtering, post-join filtering with OUTFIL INCLUDE/OMIT, or conditional output layout with REFORMAT IFTHEN. This page explains each approach and when to use it.

Advanced Tricks
Progress0 of 0 lessons

What Is a Conditional Join?

A conditional join is any join where the result is limited by something more than "keys match". For example: only join when file 1 status is active; only keep joined rows where the amount is greater than zero; or build the output record differently when a type field is "X" vs "Y". DFSORT does not support a single syntax like "JOIN ON key WHERE condition". Instead you combine JOINKEYS with one or more of: (1) pre-join filtering by running a prior step that filters one or both inputs, (2) post-join filtering by using OUTFIL INCLUDE/OMIT on the joined stream, or (3) REFORMAT IFTHEN to build the output record conditionally. Each approach is described below.

Pre-Join Filtering: Restricting Who Participates in the Join

JOINKEYS reads from two datasets (default DD names SORTJNF1 and SORTJNF2). It does not support INCLUDE or OMIT on those inputs in the same step. So if you want only certain records from file 1 (e.g. status = "A") or only certain records from file 2 (e.g. type = "ACTIVE") to participate in the join, you must filter first and then join the filtered data.

How to do it

  • Step 1: Run a DFSORT or COPY step that reads the full file 1. Use INCLUDE COND=(position,length,format,operator,value) to keep only records that meet your condition (e.g. status in position 10, length 1, character, equal to 'A'). Write the result to a temporary or permanent dataset.
  • Step 2: Run your join step. Allocate the filtered dataset to SORTJNF1 (or to the DD name you specify with JOINKEYS F1=ddname). Allocate the second file (filtered or not) to SORTJNF2. Code JOINKEYS and REFORMAT as usual. Only the records that passed the first step will be in the join.

The same idea applies to file 2: if only some F2 records should be used in the join, run a prior step that uses INCLUDE/OMIT on file 2 and write the result to a dataset, then point SORTJNF2 to that dataset. So "conditional" here means: the join sees only a subset of F1 and/or F2, defined by a prior filter step.

Example: Join Only Active Customers (Pre-Join Filter)

File 1 (customers): positions 1–10 customer ID, 11 status ('A' = active). File 2 (orders): 1–10 customer ID, 11–20 order amount. You want joined output only for active customers. First step: filter file 1 to active only. Second step: join.

text
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
* Step 1: Filter customers to active only //STEP1 EXEC PGM=SORT //SORTIN DD DSN=...CUSTOMER,DISP=SHR //SORTOUT DD DSN=&&ACTIVE,DISP=(NEW,PASS),... //SYSIN DD * OPTION COPY INCLUDE COND=(11,1,CH,EQ,C'A') /* * * Step 2: Join active customers with orders //STEP2 EXEC PGM=SORT //SORTJNF1 DD DSN=&&ACTIVE,DISP=(OLD,PASS) //SORTJNF2 DD DSN=...ORDERS,DISP=SHR //SORTOUT DD ... //SYSIN DD * JOINKEYS FIELDS=(1,10,CH,A) JOINKEYS FIELDS=(1,10,CH,A) REFORMAT FIELDS=(F1:1,11,F2:1,20) SORT FIELDS=COPY /*

In step 1, INCLUDE keeps only records with status 'A'. The result is written to &&ACTIVE. In step 2, SORTJNF1 is that filtered file, so only active customers are joined with orders. Orders for inactive customers never appear because their customer ID is not in F1.

Post-Join Filtering: Restricting Which Joined Rows Are Written

Sometimes you want every key match to be considered by the join, but you only want to keep certain joined rows in the output—for example, only where the order amount is greater than 100, or only where a date is in a given range. That is post-join filtering: the join runs normally, and then you use OUTFIL with INCLUDE= (or OMIT=) to write only the joined records that meet your condition.

The condition in OUTFIL INCLUDE/OMIT refers to the joined record layout—the layout produced by REFORMAT. So you must know the position and length of the field you are testing in that layout. REFORMAT FIELDS=(F1:1,50,F2:1,80) produces a 130-byte record: bytes 1–50 from F1, bytes 51–130 from F2. If the amount is in F2 at 11–18 (packed), in the joined record it is at 50+11 = 61, length 8. So INCLUDE=(61,8,PD,GT,+100) keeps only joined rows where that amount is greater than 100.

Example: Join Then Keep Only Rows Where Amount > 0

text
1
2
3
4
5
JOINKEYS F1=F1,FIELDS=(1,10,CH,A) JOINKEYS F2=F2,FIELDS=(1,10,CH,A) REFORMAT FIELDS=(F1:1,30,F2:1,40) OUTFIL FNAMES=VALID,INCLUDE=(31,8,PD,GT,+0) SORT FIELDS=COPY

REFORMAT builds a 70-byte record: F1 bytes 1–30, then F2 bytes 1–40. If the amount in F2 is at 21–28 (packed), in the joined record it is at 30+21=51, length 8. So INCLUDE=(51,8,PD,GT,+0) keeps only joined rows where that amount is greater than zero. Adjust the position (51 here) to match your actual REFORMAT layout. Only rows that pass the condition are written to the dataset allocated to VALID.

Conditional Output Layout with REFORMAT IFTHEN

REFORMAT FIELDS= can include IFTHEN clauses. That does not filter which rows are written; it controls how each output record is built. For example: when a type field in the joined data is "X", put certain fields in one order; when it is "Y", put them in another order or use a different edit. So the "condition" applies to the content of the joined record, and the layout (or value) of the output depends on it. This is useful when you need one physical record format for type A and another for type B, or when you want to set a flag or default based on what was joined.

IFTHEN in REFORMAT uses the same WHEN= and BUILD= (or OVERLAY=) syntax as in INREC/OUTREC. The positions in WHEN= refer to the joined record (after REFORMAT has assembled F1 and F2 segments). So you can test a field that came from F1 or F2 by its position in the reformatted record. For full details on IFTHEN see the IFTHEN clauses and REFORMAT deep dive tutorials.

Splitting Joined Output by Condition (Multiple OUTFILs)

You can combine a join with multiple OUTFIL statements so that different subsets of the joined rows go to different files. For example: one OUTFIL for region = "NORTH", one for region = "SOUTH", and one with SAVE for any other region. Each OUTFIL has its own INCLUDE= (or OMIT=) and FNAMES=. The joined record is evaluated against each OUTFIL in order; only records that match an OUTFIL's condition are written to that OUTFIL's dataset. SAVE on the last OUTFIL catches any joined row not written by a previous OUTFIL. So "conditional join" output can be split into several datasets based on field values in the joined record.

Strategy Summary

Conditional join approaches
ApproachWhen to useHow
Pre-join filterOnly certain records from F1 or F2 should participate in the join (e.g. status = active).Run a prior step: INCLUDE/OMIT on the file, write to temp dataset; use that as SORTJNF1 or SORTJNF2.
Post-join filterOnly joined rows that meet a condition should appear in the output (e.g. amount > 0, date in range).After JOINKEYS and REFORMAT, use OUTFIL with INCLUDE=(pos,len,fmt,op,value) on the joined record layout.
Conditional layoutThe output record layout or content should depend on a field in the joined data.Use REFORMAT FIELDS= with IFTHEN WHEN=... to build different segments based on conditions.

Explain It Like I'm Five

Imagine you have two stacks of cards: one with names and one with their scores. You want to match names and only keep the matches where the score is good. Matching names is the join. "Only keep good scores" is the condition. You can do it two ways: (1) Before matching, throw away all the score cards that are bad, then match—that's pre-join filter. (2) Match everyone, then when you write the result, only write a line when the score is good—that's post-join filter (OUTFIL INCLUDE). DFSORT doesn't let you say "match only if score is good" in one sentence; you either clean the stacks first or clean the result after.

Exercises

  1. You have F1 (customer ID 1–10, status 11) and F2 (customer ID 1–10, amount 11–18 PD). You want only joined rows where amount is between 100 and 500. Write the JOINKEYS, REFORMAT, and OUTFIL INCLUDE (use two conditions with AND if your DFSORT supports it, or one OUTFIL with a range condition).
  2. Why can you not code INCLUDE on JOINKEYS F1= to filter file 1 in the same step?
  3. You want joined output in two files: one for type "A" and one for type "B" (type is in the joined record at position 40, length 1). Write two OUTFIL statements with the appropriate INCLUDE conditions.

Quiz

Test Your Knowledge

1. What is a "conditional join" in DFSORT?

  • A join that only runs when JCL COND is true
  • A join where you only keep output rows that meet extra conditions (e.g. date in range, status active), or where only records that meet conditions participate in the join
  • JOINKEYS with REFORMAT only
  • Inner join only

2. You want to join F1 and F2 on customer ID but only keep rows where the joined amount is greater than 100. How do you do it?

  • INCLUDE on JOINKEYS (not supported)
  • Run the join, then use OUTFIL with INCLUDE=(position,length,PD,GT,+100) (or the correct position of amount in the joined record) so only joined rows with amount > 100 are written
  • Use two JOINKEYS
  • REFORMAT cannot include amount

3. You only want to join records from file 1 where status is "A" (active). How can you do that?

  • Code INCLUDE on JOINKEYS F1 (not valid)
  • Run a prior DFSORT step that copies file 1 with INCLUDE COND=(status_pos,1,CH,EQ,C'A') to a temporary dataset, then use that dataset as SORTJNF1 in the join step
  • Use JOIN UNPAIRED,F1 only
  • Use REFORMAT with IFTHEN only

4. What is REFORMAT IFTHEN used for in a join?

  • To filter which records are joined (no; that is pre- or post-join filtering)
  • To build different output record layouts based on conditions in the joined data (e.g. when F2 type is "X" put field A in one place, when "Y" put it elsewhere)
  • To choose inner vs outer join
  • To sort the joined output

5. Can you use multiple OUTFILs after a join to split joined output by condition?

  • No, only one output from a join
  • Yes. After JOINKEYS and REFORMAT, you can have OUTFIL FNAMES=OUT1,INCLUDE=(...), OUTFIL FNAMES=OUT2,INCLUDE=(...), etc., so different subsets of the joined rows go to different files
  • Only with SAVE
  • Only for inner join