MainframeMaster

JOINKEYS Performance Tuning

JOINKEYS performance tuning means reducing CPU and elapsed time and avoiding resource failures when joining two files with DFSORT. The main levers are: sort avoidance (declare pre-sorted inputs with SORTED), sortwork (enough SORTWK or dynamic allocation with DYNALLOC), size estimates (FILSZ so DFSORT can plan allocation), and I/O layout (e.g. F1 and F2 on different volumes for parallel reads). This page covers these tuning options, how they interact, and practical recommendations for JOINKEYS steps.


Sort Avoidance: SORTED and NOSEQCK

The single most effective tuning for JOINKEYS is to avoid sorting when the data is already in order. On each JOINKEYS you can specify:

  • SORTED: The file is already in order by the fields named in FIELDS on that JOINKEYS statement (same positions, lengths, formats, and ascending or descending order). DFSORT copies the file instead of sorting it.
  • NOSEQCK: Used with SORTED, tells DFSORT not to check that the records are in sequence. Use it only when you are certain the order is correct; otherwise omit it so DFSORT can detect sequence errors.

If both files are pre-sorted by the join key, specifying SORTED on both JOINKEYS removes the sort phase for both, which can dramatically reduce CPU and elapsed time for large files.

  JOINKEYS F1=MASTER,FIELDS=(1,10,CH,A),SORTED,NOSEQCK
  JOINKEYS F2=DETAIL,FIELDS=(5,10,CH,A),SORTED,NOSEQCK
  REFORMAT FIELDS=(F1:1,80,F2:1,100)
  SORT FIELDS=COPY
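For context, here is a minimal sketch of a complete step in which only one input is pre-sorted. The data set names, space values, and step name are placeholders chosen for illustration, not taken from any real job. F1 keeps the default sequence check (SORTED without NOSEQCK), and F2 omits SORTED so that DFSORT sorts it.

```jcl
//JOINSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* MASTER and DETAIL match the F1= and F2= ddnames on JOINKEYS.
//* All data set names below are hypothetical.
//MASTER   DD DSN=HLQ.MASTER.SORTED,DISP=SHR
//DETAIL   DD DSN=HLQ.DETAIL.UNSORTED,DISP=SHR
//SORTOUT  DD DSN=HLQ.JOINED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,50),RLSE)
//SYSIN    DD *
* F1 is already in key order: skip the sort, keep the sequence check.
  JOINKEYS F1=MASTER,FIELDS=(1,10,CH,A),SORTED
* F2 is not in key order: no SORTED, so DFSORT sorts it.
  JOINKEYS F2=DETAIL,FIELDS=(5,10,CH,A)
  REFORMAT FIELDS=(F1:1,80,F2:1,100)
  SORT FIELDS=COPY
/*
```

Leaving the sequence check on for F1 costs some CPU, but it means an out-of-order input is reported rather than silently producing wrong matches.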

Sortwork: SORTWK and DYNALLOC

A JOINKEYS step can need sortwork in up to three places: the F1 sort (if F1 is not SORTED), the F2 sort (if F2 is not SORTED), and the main task, if it sorts the joined records rather than copying them. If you do not allocate enough space, the step can fail. You can:

  • Allocate a sufficient number of SORTWK (or SORTWK01, SORTWK02, …) DD statements with enough space for the expected data volume. IBM's documentation describes separate work DD names for the two input sorts (JNF1WKnn and JNF2WKnn); confirm the names against your DFSORT release.
  • Use OPTION DYNALLOC so DFSORT can dynamically allocate sortwork datasets when needed. That way you do not have to guess the exact number or size of SORTWK DDs. The exact syntax and where to specify it (e.g. in JNF1CNTL for the F1 sort) can vary by DFSORT version; refer to IBM documentation.

For very large joins, IBM tuning guides recommend estimating the total data volume (inputs and output) and providing enough work space—either via static SORTWKnn or via DYNALLOC limits—so that DFSORT does not run out of space.
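As a sketch of the dynamic-allocation approach, the fragment below puts an OPTION statement in the control DD for each input sort. It assumes your release accepts OPTION DYNALLOC in JNF1CNTL and JNF2CNTL; the device name SYSDA and the count of 8 work data sets are illustrative values, not recommendations.

```jcl
//* Assumption: JNF1CNTL and JNF2CNTL carry options for the F1 and
//* F2 sorts. SYSDA and 8 are example values only.
//JNF1CNTL DD *
  OPTION DYNALLOC=(SYSDA,8)
/*
//JNF2CNTL DD *
  OPTION DYNALLOC=(SYSDA,8)
/*
```

Many installations already enable dynamic allocation by default, in which case these statements mainly serve to raise or lower the number of work data sets for a particular step.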

Size Estimation: FILSZ

FILSZ (file size) tells DFSORT how many records will be processed. With that figure, DFSORT can plan how much sortwork and memory to request and how to organize the sort/merge. If you omit it or use a poor estimate, DFSORT may under-allocate (leading to failures or extra passes) or over-allocate (wasting DASD and memory). For JOINKEYS, size estimates may be specified in the appropriate control DD (e.g. JNF1CNTL) as documented for your DFSORT release. The usual form is FILSZ=En, where n is an estimated record count (E = estimate); note that the value is a number of records, not bytes. Check your installation’s DFSORT documentation for the exact syntax and DD name.
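A minimal sketch of supplying estimates for both inputs follows. The record counts are invented for illustration, and the placement in JNF1CNTL and JNF2CNTL is an assumption to verify for your release. If a control DD already holds an OPTION statement, the FILSZ operand can be added to that statement instead.

```jcl
//* Hypothetical estimates: about 5 million records in F1,
//* about 40 million in F2. E marks the value as an estimate.
//JNF1CNTL DD *
  OPTION FILSZ=E5000000
/*
//JNF2CNTL DD *
  OPTION FILSZ=E40000000
/*
```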

Parallel I/O and Volume Placement

JOINKEYS reads F1 and F2 in parallel (separate tasks). If both datasets reside on the same volume or share the same channel path, the two reads can contend for I/O and elapsed time may not improve much. Where possible, place F1 and F2 on different volumes or spread them across channels so that the parallel reads do not wait on the same device. Similarly, spreading SORTWK datasets across volumes can reduce I/O contention during the sort/merge phases.
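One way to spread work data sets is to name a different volume on each DD, as in the sketch below. The volume serials and space values are placeholders. In an SMS-managed environment, placement is normally decided by the storage class and ACS routines, so an explicit VOL=SER may be ignored or disallowed; treat this as an illustration of the idea rather than a pattern that works everywhere.

```jcl
//* VOL001 and VOL002 are placeholder volume serials.
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(300,100)),VOL=SER=VOL001
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(300,100)),VOL=SER=VOL002
```

For the input files themselves, placement is fixed when the data sets are created, so the same consideration applies to the job that writes F1 and F2.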

Blocksize

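For fixed-length records, use a blocksize that is a multiple of the record length and within the device maximum. On DASD the largest block is 32,760 bytes, and half-track blocking (no more than 27,998 bytes on a 3390) is a common choice because two blocks then fit on each track. Larger blocks reduce the number of I/O operations per unit of data and can improve throughput. This applies to input files, SORTOUT, and sortwork datasets where you control allocation.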
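As a worked example, assume fixed-length 180-byte output records (the length produced by REFORMAT FIELDS=(F1:1,80,F2:1,100)) written to a 3390 with half-track blocking, where a block can be at most 27,998 bytes. The data set name and space values are placeholders.

```jcl
//* RECFM=FB, LRECL=180, half-track blocking on a 3390:
//*   27998 / 180 = 155 records per block (rounded down)
//*   155 * 180   = 27900 bytes per block
//SORTOUT  DD DSN=HLQ.JOINED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,50),RLSE),
//            DCB=(RECFM=FB,LRECL=180,BLKSIZE=27900)
```

Where system-determined blocksize is available, omitting BLKSIZE (or coding BLKSIZE=0) usually lets the system pick an efficient value, which avoids hand calculation.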

Summary of Tuning Areas

JOINKEYS performance tuning areas
Area | Suggestion
Sort avoidance | Use SORTED (and NOSEQCK if safe) on JOINKEYS when the file is already in join-key order.
Sortwork | Allocate enough SORTWK or SORTWKnn; use DYNALLOC to allow dynamic allocation of additional work datasets.
Size estimate | Provide FILSZ or equivalent so DFSORT can estimate and allocate sortwork and memory appropriately.
I/O layout | Place F1 and F2 on different volumes/channels so parallel reads do not contend.
Blocksize | Use efficient blocksize (e.g. multiple of record length, within device max) for input and output.

OPTION Statement and JOINKEYS

General DFSORT options (e.g. SIZE, DYNALLOC, EQUALS, MSGPRT) apply to the overall step. For JOINKEYS, some options may need to be specified in a control DD used by the join (e.g. JNF1CNTL) rather than in SYSIN or DFSPARM, depending on the product version. Always refer to your site’s IBM DFSORT documentation or installation notes for where to specify DYNALLOC, FILSZ, and other tuning parameters in JOINKEYS jobs.

Explain It Like I'm Five

Making two piles of cards (F1 and F2) and then matching them is faster if the piles are already in order—you don’t have to sort them again (SORTED). If you have a big table (sortwork), you can spread the cards out and not get stuck. And if two people each sort one pile at the same time on different tables (different volumes), you finish sooner. Tuning is like giving the right instructions and enough space so the job runs as fast as possible without running out of room.

Exercises

  1. When would you use SORTED without NOSEQCK? When would you use both?
  2. What problem does DYNALLOC solve when you are not sure how much sortwork a JOINKEYS step will need?
  3. Why does placing F1 and F2 on different volumes help elapsed time?
  4. Look up your DFSORT documentation: where do you specify FILSZ or DYNALLOC for a JOINKEYS step—SYSIN, DFSPARM, or another DD?

Quiz

Test Your Knowledge

1. Which JOINKEYS option has the biggest impact when your inputs are already sorted by the join key?

  • REFORMAT
  • SORTED—it avoids sorting one or both files, saving CPU and I/O
  • JOIN UNPAIRED
  • FIELDS=

2. What does OPTION DYNALLOC do in a DFSORT (or JOINKEYS) step?

  • Skips the join
  • Allows DFSORT to dynamically allocate additional sortwork datasets when needed, instead of relying only on pre-allocated SORTWK DDs
  • Allocates SORTOUT only
  • Disables parallel processing

3. Why might you specify FILSZ (file size estimate) in a JOINKEYS step?

  • To limit output record count
  • So DFSORT can estimate resource needs (e.g. sortwork, memory) and tune the run; a better estimate can improve allocation and performance
  • To set SORTOUT length
  • Only for single-file SORT

4. What is the trade-off of using NOSEQCK with SORTED?

  • No trade-off
  • You save CPU by not checking that the file is in order, but if the file is not in order you may get incorrect join results or unpredictable behavior; use only when the sort order is guaranteed
  • NOSEQCK is required with SORTED
  • NOSEQCK increases memory

5. How can you improve elapsed time for a JOINKEYS step when both inputs are large?

  • Only by using a faster CPU
  • Use SORTED for pre-sorted inputs; put F1 and F2 on different volumes so parallel reads do not contend; size sortwork and memory so DFSORT does not thrash
  • Use JOIN UNPAIRED,F1,F2
  • Reduce REFORMAT fields