How do I tune DFSORT JOINKEYS performance?

Use SORTED (and NOSEQCK if safe) when inputs are already in join-key order. Allocate sufficient SORTWK or use DYNALLOC for sortwork. Provide FILSZ or size estimates so DFSORT can allocate efficiently. Use an appropriate blocksize for inputs and output. Place F1 and F2 on different volumes for parallel I/O. Refer to IBM tuning guides for SIZE and other options.

What is SORTED in JOINKEYS?

SORTED is an option on JOINKEYS that tells DFSORT the file is already in the same order as the join key. DFSORT then skips sorting that file, which improves performance. Use it whenever the input is guaranteed to be in join-key order (e.g. from a prior SORT step).

What is DYNALLOC in DFSORT?

OPTION DYNALLOC allows DFSORT to dynamically allocate additional sortwork datasets beyond the SORTWK DDs you provide. It is useful when the amount of work space needed is variable or when you want to avoid defining many static SORTWKnn. For JOINKEYS, options like DYNALLOC and FILSZ may be specified in the appropriate control DD (e.g. JNF1CNTL) as documented by IBM.

Why use FILSZ with JOINKEYS?

FILSZ provides an estimate of the number of records to be processed so DFSORT can plan sortwork and memory allocation. With a reasonable estimate, DFSORT can avoid under-allocating (which causes failures) or over-allocating (which wastes resources). For JOINKEYS, size estimates help the internal F1, F2, and join tasks allocate efficiently.

Does JOINKEYS use more memory than a simple SORT?

JOINKEYS runs multiple tasks (F1, F2, and the join), and each may use memory for sorting or merging, so overall memory and sortwork demand can be higher than for a single-file SORT of the same size. Tuning options (SIZE, DYNALLOC, FILSZ) help control resource use, as does SORTED, which avoids redundant sorts.

DFSORT JOINKEYS Performance Tuning - SORTED, Sortwork, DYNALLOC, SIZE

JOINKEYS Performance Tuning

JOINKEYS performance tuning means reducing CPU and elapsed time and avoiding resource failures when joining two files with DFSORT. The main levers are: sort avoidance (declare pre-sorted inputs with SORTED), sortwork (enough SORTWK or dynamic allocation with DYNALLOC), size estimates (FILSZ so DFSORT can plan allocation), and I/O layout (e.g. F1 and F2 on different volumes for parallel reads). This page covers these tuning options, how they interact, and practical recommendations for JOINKEYS steps.


Sort Avoidance: SORTED and NOSEQCK

The single most effective tuning for JOINKEYS is to avoid sorting when the data is already in order. On each JOINKEYS you can specify:

SORTED: This file is already in the order described by its JOINKEYS FIELDS entries (position, length, and ascending or descending order). JOINKEYS FIELDS entries do not take a format code; the keys are compared as binary. DFSORT will not sort the file.
NOSEQCK: Do not check that the file is in sequence. Use only when you are certain the order is correct; otherwise omit it so DFSORT can detect sequence errors.

If both files are pre-sorted by the join key, specifying SORTED on both JOINKEYS removes the sort phase for both, which can dramatically reduce CPU and elapsed time for large files.

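* BOTH INPUTS ALREADY IN KEY ORDER: NO SORT, NO SEQUENCE CHECK
  JOINKEYS F1=MASTER,FIELDS=(1,10,A),SORTED,NOSEQCK
  JOINKEYS F2=DETAIL,FIELDS=(5,10,A),SORTED,NOSEQCK
* JOINED RECORD: 80 BYTES FROM F1 FOLLOWED BY 100 BYTES FROM F2
  REFORMAT FIELDS=(F1:1,80,F2:1,100)
* MAIN TASK COPIES THE JOINED RECORDS WITHOUT RE-SORTING THEM
  SORT FIELDS=COPY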
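
When only one input is known to be in key order, a variant along the following lines keeps the sequence check on that file and lets DFSORT sort the other. This is a sketch, not a tested job: the ddnames and key positions follow the example above, and the choice of MASTER as the pre-sorted file is an assumption made for illustration.

```jcl
* MASTER IS IN KEY ORDER: SKIP THE SORT, KEEP THE SEQUENCE CHECK
  JOINKEYS F1=MASTER,FIELDS=(1,10,A),SORTED
* DETAIL IS IN ARBITRARY ORDER: DFSORT SORTS IT ON THE KEY
  JOINKEYS F2=DETAIL,FIELDS=(5,10,A)
  REFORMAT FIELDS=(F1:1,80,F2:1,100)
  OPTION COPY
```

Leaving NOSEQCK off costs a comparison per record but means an out-of-order MASTER file is reported as an error instead of silently producing a wrong join.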

Sortwork: SORTWK and DYNALLOC

JOINKEYS uses sortwork for the F1 sort (if not SORTED), the F2 sort (if not SORTED), and the join/merge phase. If you do not allocate enough space, the step can fail. You can:

Allocate a sufficient number of SORTWK (or SORTWK01, SORTWK02, …) DD statements with enough space for the expected data volume.
Use OPTION DYNALLOC so DFSORT can dynamically allocate additional sortwork datasets when needed. That way you do not have to guess the exact number or size of SORTWK DDs. The exact syntax and where to specify it (e.g. in JNF1CNTL for JOINKEYS) can vary by DFSORT version; refer to IBM documentation.

For very large joins, IBM tuning guides recommend estimating the total data volume (inputs and output) and providing enough work space—either via static SORTWKnn or via DYNALLOC limits—so that DFSORT does not run out of space.
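
As a rough illustration of dynamic allocation for the two input sorts, the fragment below places an OPTION statement in each subtask's control DD. It assumes your DFSORT release accepts OPTION in JNF1CNTL and JNF2CNTL; the unit name SYSDA and the count of 8 work data sets are placeholder values, not recommendations.

```jcl
//JNF1CNTL DD *
* WORK SPACE FOR THE F1 SORT: UP TO 8 DYNAMIC WORK DATA SETS
  OPTION DYNALLOC=(SYSDA,8)
/*
//JNF2CNTL DD *
* WORK SPACE FOR THE F2 SORT: UP TO 8 DYNAMIC WORK DATA SETS
  OPTION DYNALLOC=(SYSDA,8)
/*
```

Many installations already enable dynamic allocation by default, in which case these statements mainly serve to override the default unit or count.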

Size Estimation: FILSZ

FILSZ (file size) gives DFSORT an estimate of how many records will be processed. With that estimate, DFSORT can plan how much sortwork and memory to request and how to organize the sort/merge. If you omit it or use a poor estimate, DFSORT may under-allocate (leading to failures or extra passes) or over-allocate (wasting DASD and memory). For JOINKEYS, size estimates may be specified in the appropriate control DD (e.g. JNF1CNTL) as documented for your DFSORT release. The value is a record count, not a byte count: FILSZ=En supplies an estimate of n records (E = estimate). Check your installation's DFSORT documentation for the exact syntax and DD name.
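
A minimal sketch of per-file estimates follows. In the DFSORT documentation the FILSZ value counts records, so the figures below stand for roughly 5 million and 40 million records; both numbers are invented for the example, and the placement in JNF1CNTL and JNF2CNTL assumes your release accepts OPTION there.

```jcl
//JNF1CNTL DD *
* ABOUT 5 MILLION RECORDS EXPECTED IN F1 (E = ESTIMATE)
  OPTION FILSZ=E5000000
/*
//JNF2CNTL DD *
* ABOUT 40 MILLION RECORDS EXPECTED IN F2 (E = ESTIMATE)
  OPTION FILSZ=E40000000
/*
```

An estimate matters most when DFSORT cannot determine the input size on its own, for example when the input is on tape or supplied by a user exit.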

Parallel I/O and Volume Placement

JOINKEYS reads F1 and F2 in parallel (separate tasks). If both datasets reside on the same volume or share the same channel path, the two reads can contend for I/O and elapsed time may not improve much. Where possible, place F1 and F2 on different volumes or spread them across channels so that the parallel reads do not wait on the same device. Similarly, spreading SORTWK datasets across volumes can reduce I/O contention during the sort/merge phases.
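
Placement is normally decided when the input datasets are created, so one way to picture it is the upstream step that writes them. The DD statements below are a sketch with hypothetical dataset names, volume serials, and space values; on SMS-managed systems the storage class and ACS routines usually choose the volume and may not honor VOL=SER.

```jcl
//* HYPOTHETICAL NAMES AND VOLSERS; SMS MAY OVERRIDE VOL=SER
//MASTOUT  DD DSN=PROD.JOIN.MASTER,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,VOL=SER=VOLA01,SPACE=(CYL,(500,100),RLSE)
//DETLOUT  DD DSN=PROD.JOIN.DETAIL,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,VOL=SER=VOLB01,SPACE=(CYL,(900,200),RLSE)
```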

Blocksize

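For fixed-length records, use a blocksize that is a multiple of the record length and within the 32,760-byte limit for DASD datasets. Half-track blocking (up to 27,998 bytes per block on 3390 devices) is a common choice, and coding BLKSIZE=0 or omitting BLKSIZE lets the system pick an efficient value. Larger blocks reduce the number of I/O operations per unit of data and can improve throughput. This applies to input files and SORTOUT where you control allocation.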
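
The arithmetic can be worked through for a 180-byte fixed-length output record (80 bytes from F1 plus 100 bytes from F2, as in the REFORMAT example earlier). The largest multiple of 180 that fits in a 27,998-byte half track on a 3390 is 155 × 180 = 27,900. The DD statement below is a sketch; the dataset name and space values are hypothetical.

```jcl
//* FB OUTPUT, LRECL=180: 155 RECORDS PER BLOCK = 27900 BYTES
//SORTOUT  DD DSN=PROD.JOIN.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(900,200),RLSE),
//            DCB=(RECFM=FB,LRECL=180,BLKSIZE=27900)
```

Coding BLKSIZE=0 instead should let the system arrive at a comparable value without the manual calculation.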

Summary of Tuning Areas

JOINKEYS performance tuning areas
Area	Suggestion
Sort avoidance	Use SORTED (and NOSEQCK if safe) on JOINKEYS when the file is already in join-key order.
Sortwork	Allocate enough SORTWK or SORTWKnn; use DYNALLOC to allow dynamic allocation of additional work datasets.
Size estimate	Provide FILSZ or equivalent so DFSORT can estimate and allocate sortwork and memory appropriately.
I/O layout	Place F1 and F2 on different volumes/channels so parallel reads do not contend.
Blocksize	Use efficient blocksize (e.g. multiple of record length, within device max) for input and output.

OPTION Statement and JOINKEYS

General DFSORT options (e.g. SIZE, DYNALLOC, EQUALS, MSGPRT) apply to the overall step. For JOINKEYS, some options may need to be specified in a control DD used by the join (e.g. JNF1CNTL) rather than in SYSIN or DFSPARM, depending on the product version. Always refer to your site’s IBM DFSORT documentation or installation notes for where to specify DYNALLOC, FILSZ, and other tuning parameters in JOINKEYS jobs.
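
To make the split between the main task and the two subtasks concrete, here is a sketch of a complete step. It assumes OPTION is accepted in JNF1CNTL and JNF2CNTL on your release; the step name, dataset names, space values, and record counts are all hypothetical.

```jcl
//JOINSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//MASTER   DD DSN=PROD.JOIN.MASTER,DISP=SHR
//DETAIL   DD DSN=PROD.JOIN.DETAIL,DISP=SHR
//SORTOUT  DD DSN=PROD.JOIN.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(900,200),RLSE)
//* MAIN TASK: JOIN DEFINITION AND OPTIONS FOR THE JOINED RECORDS
//SYSIN    DD *
  JOINKEYS F1=MASTER,FIELDS=(1,10,A)
  JOINKEYS F2=DETAIL,FIELDS=(5,10,A)
  REFORMAT FIELDS=(F1:1,80,F2:1,100)
  OPTION COPY
/*
//* SUBTASK 1: OPTIONS THAT APPLY TO READING AND SORTING F1
//JNF1CNTL DD *
  OPTION DYNALLOC=(SYSDA,8),FILSZ=E5000000
/*
//* SUBTASK 2: OPTIONS THAT APPLY TO READING AND SORTING F2
//JNF2CNTL DD *
  OPTION DYNALLOC=(SYSDA,8),FILSZ=E40000000
/*
```

Keeping the per-file options in their own DDs means each input can be tuned separately, which helps when F1 and F2 differ greatly in size.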

Explain It Like I'm Five

Making two piles of cards (F1 and F2) and then matching them is faster if the piles are already in order—you don’t have to sort them again (SORTED). If you have a big table (sortwork), you can spread the cards out and not get stuck. And if two people each sort one pile at the same time on different tables (different volumes), you finish sooner. Tuning is like giving the right instructions and enough space so the job runs as fast as possible without running out of room.

Exercises

When would you use SORTED without NOSEQCK? When would you use both?
What problem does DYNALLOC solve when you are not sure how much sortwork a JOINKEYS step will need?
Why does placing F1 and F2 on different volumes help elapsed time?
Look up your DFSORT documentation: where do you specify FILSZ or DYNALLOC for a JOINKEYS step—SYSIN, DFSPARM, or another DD?

Quiz

Test Your Knowledge

1. Which JOINKEYS option has the biggest impact when your inputs are already sorted by the join key?

REFORMAT
SORTED—it avoids sorting one or both files, saving CPU and I/O
JOIN UNPAIRED
FIELDS=

2. What does OPTION DYNALLOC do in a DFSORT (or JOINKEYS) step?

Skips the join
Allows DFSORT to dynamically allocate additional sortwork datasets when needed, instead of relying only on pre-allocated SORTWK DDs
Allocates SORTOUT only
Disables parallel processing

3. Why might you specify FILSZ (file size estimate) in a JOINKEYS step?

To limit output record count
So DFSORT can estimate resource needs (e.g. sortwork, memory) and tune the run; a better estimate can improve allocation and performance
To set SORTOUT length
Only for single-file SORT

4. What is the trade-off of using NOSEQCK with SORTED?

No trade-off
You save CPU by not checking that the file is in order, but if the file is not in order you may get incorrect join results or unpredictable behavior; use only when the sort order is guaranteed
NOSEQCK is required with SORTED
NOSEQCK increases memory

5. How can you improve elapsed time for a JOINKEYS step when both inputs are large?

Only by using a faster CPU
Use SORTED for pre-sorted inputs; put F1 and F2 on different volumes so parallel reads do not contend; size sortwork and memory so DFSORT does not thrash
Use JOIN UNPAIRED,F1,F2
Reduce REFORMAT fields