MainframeMaster

Lookup Tables

A lookup table is a reference dataset—usually relatively small—that is keyed by a code or ID (e.g. product code, state code, account type). In batch processing you often need to enrich a main file (transactions, orders, detail records) with descriptive data from such a table: for example, add product name to each order line by matching product code. In DFSORT you do this with JOINKEYS: join the main file with the lookup on the key and use REFORMAT to pull the lookup fields into the output. This page explains how to set up a lookup join, when to use inner vs left join, and how to avoid duplicate rows when the lookup has duplicate keys.

Advanced Tricks
Progress0 of 0 lessons

What Is a Lookup Table?

A lookup table (or reference file) is a dataset that holds rows of the form (key, value1, value2, …)—for example, product code and product name, or state code and state full name. The main file you are processing has the key (e.g. product code on each order line) but not the description. The goal is to produce an output record that contains both the main record data and the looked-up fields (e.g. order line plus product name). In DFSORT there is no separate "lookup" verb; you use a join. The main file is one input (typically F1) and the lookup table is the other (typically F2). You match on the key and use REFORMAT to define the output as main-record fields plus lookup fields.

Basic Setup: Main File as F1, Lookup as F2

By convention the main file (the one you want to keep every record of) is the first file (F1), and the lookup is the second file (F2). In JCL you allocate the main file to SORTJNF1 and the lookup to SORTJNF2. In control statements you code two JOINKEYS (one for each file) with FIELDS= specifying the join key position and format for that file. The key must be in the same order in both files (e.g. ascending character). Then REFORMAT FIELDS=(F1:pos,len,...,F2:pos,len,...) defines the output: which segments from the main record and which from the lookup record go into the joined row. After the join you usually use SORT FIELDS=COPY because the order of the joined output is already determined by the merge; if you need a different order you can specify SORT FIELDS=.

Inner Join vs Left Join for Lookups

If you use only JOINKEYS and REFORMAT without JOIN UNPAIRED, you get an inner join: only main records that have a matching key in the lookup appear in the output. Main records whose key is not in the lookup are dropped. That is useful when you only want rows with valid lookup data (e.g. exclude invalid product codes). If you want to keep every main record and still add lookup data when available, use JOIN UNPAIRED,F1 (left join). Then every F1 (main) record is written; when the key matches an F2 (lookup) row, the REFORMAT output includes the lookup fields; when there is no match, the F2 portion of the REFORMAT is blank-filled (or low values for numeric fields). So you get one output row per main record, with lookup fields blank when the code is not in the lookup table.

Join type for lookup
Join typeWhen to use
Inner join (no JOIN UNPAIRED)Only main records that have a matching key in the lookup appear; unknown codes are dropped.
Left join (JOIN UNPAIRED,F1)Every main record appears; lookup fields are filled when the key matches, blank when it does not.

Example: Add Product Name to Order Lines (Left Join)

Main file (orders): positions 1–8 order ID, 9–16 product code (CH), 17–24 quantity. Lookup: 1–8 product code (CH), 9–48 product name (CH). Key is product code. We want one output row per order with order ID, product code, quantity, and product name. Use left join so orders with unknown product codes still appear with a blank product name.

text
1
2
3
4
5
JOINKEYS FIELDS=(9,8,CH,A) JOINKEYS FIELDS=(1,8,CH,A) REFORMAT FIELDS=(F1:1,24,F2:9,40) JOIN UNPAIRED,F1 SORT FIELDS=COPY

F1 (main) key is 9–8 (product code). F2 (lookup) key is 1–8. REFORMAT takes F1 bytes 1–24 (order ID, product code, quantity) and F2 bytes 9–48 (product name, 40 bytes). JOIN UNPAIRED,F1 keeps every order; when product code is in the lookup, product name is added; when not, the 40-byte product name area is blank. Allocate main file to SORTJNF1 and lookup to SORTJNF2 (or use F2= if you use a different DD name for the lookup).

Ensuring One Row per Main Record: Deduplicate the Lookup

JOINKEYS matches every F1 record with every F2 record that has the same key. So if the lookup has two rows for the same product code, one order line with that code will produce two output rows. For a true "lookup" you usually want at most one row per key in the lookup. To achieve that, deduplicate the lookup before the join: run a prior step that reads the lookup, sorts by the key, and uses SUM FIELDS=NONE (or keeps one record per key with INREC/OUTREC if you need to choose which duplicate to keep). Write the result to a temporary dataset and use that as SORTJNF2. Then each main record will match at most one lookup row and you get one output row per main record (for left join) or one per matched pair (for inner join).

Multiple Lookups (Chaining)

DFSORT JOINKEYS joins two files per step. To enrich with two lookup tables (e.g. product name and warehouse name), run two join steps. Step 1: main file + first lookup (e.g. product) → output1. Step 2: output1 + second lookup (e.g. warehouse) on the appropriate key → final output. The key in step 2 is the key in the first join output (e.g. warehouse code), and REFORMAT in step 2 includes the existing fields from output1 plus the new lookup fields. So lookup tables are applied one at a time by chaining join steps.

Explain It Like I'm Five

You have a list of orders that say "product 7" or "product 12" and a small list that says "7 = Apples" and "12 = Oranges". A lookup is like having a friend read your order list and the small list: for each order they write down the order and then look up the number in the small list and write "Apples" or "Oranges" next to it. If an order has a number that is not in the small list, they can either skip that order (inner join) or still write the order and leave the name blank (left join). DFSORT does that by joining the two lists on the number and then building a new line that has the order plus the name from the small list.

Exercises

  1. Main file has customer ID at 1–10 and state code at 11–12. Lookup has state code at 1–2 and state name at 3–22. Write JOINKEYS and REFORMAT to add state name to each main record. Use left join. What is the REFORMAT layout length?
  2. Why might you deduplicate the lookup table before the join? What control statement do you use to keep one record per key?
  3. You need to add both product name and category name. Product lookup: product code, product name. Category lookup: category code, category name. Your main file has product code and category code. Describe the two-step join approach.

Quiz

Test Your Knowledge

1. What is a "lookup table" in the context of DFSORT?

  • A table stored inside DFSORT
  • A relatively small reference dataset (e.g. codes to descriptions) that you join with a larger main file so each main record can be enriched with fields from the lookup (e.g. product code to product name)
  • The SORT FIELDS statement
  • OUTFIL only

2. You have a main file (orders) and a lookup (product code, product name). You want every order row to show the product name. Some orders may have a product code not in the lookup. What join type do you use?

  • Inner join—then unmatched orders are dropped
  • Left join: JOIN UNPAIRED,F1 so every main (F1) record is kept; when the product code is not in the lookup, the F2 (lookup) portion is blank-filled
  • Right join only
  • Full outer join only

3. For a lookup table join, which file is usually F1 and which is F2?

  • It does not matter
  • Typically the main file (the one you want to keep every record of, or the larger file) is F1 and the lookup table is F2; then JOIN UNPAIRED,F1 keeps all main records and adds lookup data when the key matches
  • Lookup must always be F1
  • Main file must always be F2

4. What if the lookup table has duplicate keys (e.g. two rows for the same product code)?

  • DFSORT abends
  • You get one output row per main record per matching lookup row—so a main record with that key will appear multiple times (once per lookup duplicate). To avoid that, deduplicate the lookup first (e.g. SORT with SUM FIELDS=NONE on the key) so there is one row per key
  • DFSORT keeps only the first
  • REFORMAT ignores duplicates

5. Why is it often better to put the smaller file (the lookup) in memory or to have it as F2?

  • DFSORT requires lookup to be F2
  • JOINKEYS processing can be more efficient when the smaller file is used in a way that minimizes I/O; conceptually the "lookup" is the small reference and the "main" is the larger stream—so main as F1 and lookup as F2 is a natural and often efficient arrangement
  • F2 must be sorted; F1 need not be
  • Only F1 can use JOIN UNPAIRED