JOINKEYS performance tuning means reducing CPU and elapsed time and avoiding resource failures when joining two files with DFSORT. The main levers are: sort avoidance (declare pre-sorted inputs with SORTED), sortwork (enough SORTWK or dynamic allocation with DYNALLOC), size estimates (FILSZ so DFSORT can plan allocation), and I/O layout (e.g. F1 and F2 on different volumes for parallel reads). This page covers these tuning options, how they interact, and practical recommendations for JOINKEYS steps.
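The single most effective tuning for JOINKEYS is to avoid sorting when the data is already in order. On each JOINKEYS statement you can specify:

- SORTED: the file is already in order by the join key, so DFSORT copies it instead of sorting it.
- NOSEQCK: used with SORTED, tells DFSORT not to check that the records really are in key order. This saves the cost of the check, but out-of-order records then go undetected and can produce an incorrect join.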
If both files are pre-sorted by the join key, specifying SORTED on both JOINKEYS removes the sort phase for both, which can dramatically reduce CPU and elapsed time for large files.
```
  JOINKEYS F1=MASTER,FIELDS=(1,10,CH,A),SORTED,NOSEQCK
  JOINKEYS F2=DETAIL,FIELDS=(5,10,CH,A),SORTED,NOSEQCK
  REFORMAT FIELDS=(F1:1,80,F2:1,100)
  SORT FIELDS=COPY
```
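For context, here is a minimal sketch of a complete job step around these statements. The step name, dataset names (HLQ.*), and space values are hypothetical placeholders. MASTER and DETAIL are the ddnames named by F1= and F2=, the sketch assumes the inputs are at least 80 and 100 bytes long respectively, and the 180-byte output length follows from the REFORMAT (80 bytes from F1 plus 100 from F2).

```jcl
//JOINSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Hypothetical inputs, both already in join-key order
//MASTER   DD DSN=HLQ.MASTER.SORTED,DISP=SHR
//DETAIL   DD DSN=HLQ.DETAIL.SORTED,DISP=SHR
//* Joined output: 80 bytes from F1 + 100 bytes from F2 = 180
//SORTOUT  DD DSN=HLQ.JOINED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,50),RLSE),
//            DCB=(RECFM=FB,LRECL=180)
//SYSIN    DD *
  JOINKEYS F1=MASTER,FIELDS=(1,10,CH,A),SORTED,NOSEQCK
  JOINKEYS F2=DETAIL,FIELDS=(5,10,CH,A),SORTED,NOSEQCK
  REFORMAT FIELDS=(F1:1,80,F2:1,100)
  SORT FIELDS=COPY
/*
```

Because both inputs are declared SORTED and the main task only copies the joined records, this sketch includes no SORTWKnn DD statements; if either input were unsorted, work space would be needed again.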
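JOINKEYS uses sortwork for the F1 sort (unless F1 is SORTED), the F2 sort (unless F2 is SORTED), and possibly for main-task processing of the joined records (for example, if the main task sorts them instead of using SORT FIELDS=COPY). If you do not allocate enough space, the step can fail. You can:

- allocate SORTWKnn DD statements with enough space for the data being sorted;
- specify OPTION DYNALLOC so DFSORT allocates work datasets dynamically, within the limits you set.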
For very large joins, IBM tuning guides recommend estimating the total data volume (inputs and output) and providing enough work space—either via static SORTWKnn or via DYNALLOC limits—so that DFSORT does not run out of space.
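One way to provide work space when the inputs are not pre-sorted is to let each subtask allocate its own sortwork dynamically. This sketch assumes your DFSORT release accepts an OPTION statement in the subtask control DDs JNF1CNTL and JNF2CNTL; the device name SYSDA and the limit of 8 work datasets are placeholder values to size for your own data.

```jcl
//* F1 subtask: allow up to 8 dynamically allocated work datasets
//* (assumes OPTION is accepted in JNF1CNTL on your release)
//JNF1CNTL DD *
  OPTION DYNALLOC=(SYSDA,8)
/*
//* F2 subtask: same idea for the second input
//JNF2CNTL DD *
  OPTION DYNALLOC=(SYSDA,8)
/*
//* Inputs are NOT pre-sorted, so SORTED is omitted on both JOINKEYS
//SYSIN    DD *
  JOINKEYS F1=MASTER,FIELDS=(1,10,CH,A)
  JOINKEYS F2=DETAIL,FIELDS=(5,10,CH,A)
  REFORMAT FIELDS=(F1:1,80,F2:1,100)
  SORT FIELDS=COPY
/*
```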
FILSZ (file size) gives DFSORT an estimate of how much data will be processed. With that estimate, DFSORT can plan how much sortwork and memory to request and how to organize the sort/merge. If you omit it or use a poor estimate, DFSORT may under-allocate (leading to failures or extra passes) or over-allocate (wasting DASD and memory). For JOINKEYS, the estimate may need to go in a subtask control DD (e.g. JNF1CNTL) rather than SYSIN, as documented for your DFSORT release. A common form is FILSZ=Ennnnnnn, where E marks the value as an estimate and nnnnnnn is a number of records, not bytes. Check your installation’s DFSORT documentation for the exact syntax and DD name.
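As an illustration, an estimate for the F1 subtask might look like the sketch below. It assumes your release accepts OPTION FILSZ in JNF1CNTL (confirm where your documentation says it belongs), and the figure of 5,000,000 records is a made-up example; the E prefix marks the value as an estimate rather than an exact count.

```jcl
//* Hypothetical estimate: roughly 5 million records in the F1 input
//* (assumes OPTION FILSZ is accepted in JNF1CNTL on your release)
//JNF1CNTL DD *
  OPTION FILSZ=E5000000
/*
```

An estimate that is somewhat high is usually safer than one that is far too low, since under-allocation is what leads to work-space failures.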
JOINKEYS reads F1 and F2 in parallel (separate tasks). If both datasets reside on the same volume or share the same channel path, the two reads can contend for I/O and elapsed time may not improve much. Where possible, place F1 and F2 on different volumes or spread them across channels so that the parallel reads do not wait on the same device. Similarly, spreading SORTWK datasets across volumes can reduce I/O contention during the sort/merge phases.
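For static sortwork, spreading the work datasets across volumes might look like the sketch below. The volume serials WORK01 through WORK03 and the space amounts are hypothetical. On SMS-managed storage, the installation's ACS routines may ignore or override an explicit VOL=SER, and whether step-level SORTWKnn DDs serve the F1/F2 subtasks, the main task, or both depends on your DFSORT release, so check local conventions first.

```jcl
//* Hypothetical volsers: one work dataset per volume to spread I/O
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(300)),VOL=SER=WORK01
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(300)),VOL=SER=WORK02
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(300)),VOL=SER=WORK03
```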
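For fixed-length records, use a blocksize that is a multiple of the record length and within the device maximum (32,760 bytes for standard DASD datasets). On 3390 DASD, half-track blocking (the largest multiple of the record length that fits in 27,998 bytes) usually makes the best use of each track, and coding BLKSIZE=0 lets the system determine an efficient value. Larger blocks reduce the number of I/O operations per unit of data and can improve throughput. This applies to input files, SORTOUT, and sortwork datasets where you control allocation.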
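For a fixed-length output such as the 180-byte joined records produced by the REFORMAT in the earlier example, the two approaches could be coded as sketched below (dataset names are hypothetical). BLKSIZE=0 requests a system-determined blocksize; the explicit value 27900 is the largest multiple of 180 that fits in a 3390 half track, since 155 × 180 = 27,900 and 156 × 180 = 28,080 exceeds 27,998.

```jcl
//* Option 1: let the system determine an efficient blocksize
//SORTOUT  DD DSN=HLQ.JOINED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,50),RLSE),
//            DCB=(RECFM=FB,LRECL=180,BLKSIZE=0)
//* Option 2 (commented out): explicit 3390 half-track blocking
//*SORTOUT DD DSN=HLQ.JOINED.OUTPUT,DISP=(NEW,CATLG,DELETE),
//*           UNIT=SYSDA,SPACE=(CYL,(100,50),RLSE),
//*           DCB=(RECFM=FB,LRECL=180,BLKSIZE=27900)
```

System-determined blocksize is generally the simpler choice, because the value adapts if the dataset lands on a different device type.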
| Area | Suggestion |
|---|---|
| Sort avoidance | Use SORTED (and NOSEQCK if safe) on JOINKEYS when the file is already in join-key order. |
| Sortwork | Allocate enough SORTWKnn work space, or use DYNALLOC so DFSORT can allocate work datasets dynamically. |
| Size estimate | Provide FILSZ or equivalent so DFSORT can estimate and allocate sortwork and memory appropriately. |
| I/O layout | Place F1 and F2 on different volumes/channels so parallel reads do not contend. |
| Blocksize | Use efficient blocksize (e.g. multiple of record length, within device max) for input and output. |
General DFSORT options (e.g. SIZE, DYNALLOC, EQUALS, MSGPRT) apply to the overall step. For JOINKEYS, some options may need to be specified in a control DD used by the join (e.g. JNF1CNTL) rather than in SYSIN or DFSPARM, depending on the product version. Always refer to your site’s IBM DFSORT documentation or installation notes for where to specify DYNALLOC, FILSZ, and other tuning parameters in JOINKEYS jobs.
Making two piles of cards (F1 and F2) and then matching them is faster if the piles are already in order—you don’t have to sort them again (SORTED). If you have a big table (sortwork), you can spread the cards out and not get stuck. And if two people each sort one pile at the same time on different tables (different volumes), you finish sooner. Tuning is like giving the right instructions and enough space so the job runs as fast as possible without running out of room.
1. Which JOINKEYS option has the biggest impact when your inputs are already sorted by the join key?
2. What does OPTION DYNALLOC do in a DFSORT (or JOINKEYS) step?
3. Why might you specify FILSZ (file size estimate) in a JOINKEYS step?
4. What is the trade-off of using NOSEQCK with SORTED?
5. How can you improve elapsed time for a JOINKEYS step when both inputs are large?