When the input to DFSORT is extremely large, the main challenges are having enough sortwork (temporary disk space) for the sort to complete, giving DFSORT a good estimate of input size so it can plan its merge strategy, and tuning blocksizes and options so the job runs within time and space limits. This page covers how to size and allocate sortwork, the role of OPTION FILSZ and dynamic allocation, what happens when sortwork is insufficient, and practical approaches such as filtering or splitting the data when a single sort job would be too large.
DFSORT (and compatible sort products) uses temporary work datasets—sortwork—to hold data during the sort. Records are read from the input, written to sortwork in sorted or partially sorted runs, then merged until one sorted stream is produced. The total amount of sortwork must be large enough to hold the intermediate data. If it is too small, the sort cannot complete: you get a space-related failure (for DFSORT, typically message ICE046A, SORT CAPACITY EXCEEDED) or severe performance degradation as the product makes many merge passes. So the first step in handling extremely large files is to ensure enough sortwork. That usually means either many SORTWKnn DD statements (each pointing to a dataset with enough space) or dynamic allocation (OPTION DYNALLOC or equivalent) so the product allocates the work datasets itself based on the estimated input size.
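As a sketch, a job with explicitly allocated sortwork might look like the following. The dataset names, space amounts, and sort key are placeholders to adapt to your installation; size the SORTWKnn allocations so their combined space comfortably exceeds the input.

```jcl
//BIGSORT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROD.BIG.INPUT,DISP=SHR
//SORTOUT  DD DSN=PROD.BIG.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(500,100),RLSE)
//* Three work datasets; their total space must cover the
//* intermediate data produced while sorting
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(300,100))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(300,100))
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(300,100))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
```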
OPTION FILSZ lets you tell DFSORT how large the input is, as a number of records. In DFSORT, FILSZ=n asserts an exact record count (the sort terminates with an error if the actual count differs), while FILSZ=En supplies an estimate—the form you normally want for capacity planning. With that estimate, DFSORT can plan how much sortwork to request (when using dynamic allocation) or how to use the SORTWKnn datasets you provided. If you underestimate FILSZ, DFSORT may assume a smaller input and request or use too little sortwork, which can lead to failure or poor performance. If you overestimate, the product may allocate more than needed, which is usually safe but wastes space. Use a reasonable estimate based on the actual or expected input size; some sites derive it from the input dataset’s statistics or from a prior run.
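For example, the control statements below supply an estimate; the record count and sort key are illustrative. The E prefix marks the value as approximate rather than exact.

```jcl
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
* E80000000 = roughly 80 million records expected (estimate, not exact)
  OPTION FILSZ=E80000000
/*
```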
When dynamic allocation is used (e.g. OPTION DYNALLOC), DFSORT (or the sort product) allocates the sortwork datasets itself instead of relying on pre-defined SORTWKnn DDs. The product typically uses FILSZ (and sometimes SIZE or other options) to determine how many work datasets and how large each should be. That simplifies JCL: you do not have to code dozens of SORTWKnn statements or guess their size for each job. The product requests what it needs within the limits allowed by the installation. Check your product manual for the exact syntax and any installation requirements (e.g. SMS rules, unit names) for dynamically allocated sortwork.
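A minimal sketch combining DYNALLOC with a FILSZ estimate—the unit name, dataset count, and record count are assumptions to adapt to your installation:

```jcl
//BIGSORT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROD.BIG.INPUT,DISP=SHR
//SORTOUT  DD DSN=PROD.BIG.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(500,100),RLSE)
//* No SORTWKnn DDs: DFSORT allocates up to 4 work datasets on SYSDA,
//* sized from the FILSZ estimate
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
  OPTION DYNALLOC=(SYSDA,4),FILSZ=E80000000
/*
```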
For very large files, I/O can dominate elapsed time. Use a large blocksize for SORTIN, SORTOUT, and sortwork datasets when the system and DASD support it. Larger blocks reduce the number of I/O operations per unit of data and can significantly speed up the job. On z/OS DASD the maximum blocksize is 32,760 bytes; half-track blocking on 3390 devices (up to 27,998 bytes per block) is a common choice, and coding BLKSIZE=0 lets the system determine an optimal value. Avoid very small blocksizes for large files.
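For instance, an output DD blocked at half-track for 80-byte records on 3390 DASD might be coded as below; the dataset name and space figures are placeholders.

```jcl
//* 27920 bytes = 349 x 80-byte records, fitting within the
//* 27998-byte half-track capacity of a 3390
//* (coding BLKSIZE=0 instead lets the system pick the blocksize)
//SORTOUT  DD DSN=PROD.BIG.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(500,100),RLSE),
//            DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
```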
Some sort products support OPTION SIZE (or equivalent) to influence how much memory or how many merge paths are used. The exact meaning (e.g. main storage for sort, number of merge inputs) is product-specific. For extremely large sorts, tuning these options can reduce merge passes or improve CPU and I/O balance. Consult your DFSORT or sort product manual for SIZE, MXSORT, and merge-related options that apply to your release.
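As one DFSORT-specific illustration (check the equivalent option in your product), MAINSIZE controls how much main storage the sort may use; more storage generally means larger runs and fewer merge passes:

```jcl
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
* Let DFSORT use as much storage as installation limits allow
  OPTION MAINSIZE=MAX
/*
```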
If a single sort job cannot complete within the available sortwork or time window, consider reducing the amount of data or splitting the work. Use INCLUDE or OMIT to filter records so that only the needed subset is sorted (e.g. by date or key range). That reduces the effective input size. Alternatively, split the input by key range: run multiple jobs, each sorting a portion (e.g. keys A–M in one job, N–Z in another), then concatenate the sorted outputs or merge them with MERGE. That way each job uses less sortwork and may fit within limits.
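The split-by-key-range approach can be sketched as three steps—two jobs each sort half the key range, then a final MERGE combines the sorted parts. The key position, split character, and dataset names are assumptions for illustration.

```jcl
//* Job 1: sort only records whose key (cols 1-10) starts with A-M
//PART1    EXEC PGM=SORT
//SORTIN   DD DSN=PROD.BIG.INPUT,DISP=SHR
//SORTOUT  DD DSN=PROD.SORTED.PART1,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(300,60),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
  INCLUDE COND=(1,1,CH,LE,C'M')
/*
//* Job 2 is identical except INCLUDE COND=(1,1,CH,GT,C'M')
//* and SORTOUT pointing at PROD.SORTED.PART2.
//*
//* Final step: merge the two already-sorted parts
//MRG      EXEC PGM=SORT
//SORTIN01 DD DSN=PROD.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=PROD.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=PROD.SORTED.ALL,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(600,120),RLSE)
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*
```

Because the two key ranges are disjoint and ordered, simply concatenating PART1 and PART2 as input to a copy step would also produce sorted output; MERGE is shown because it works even when the partitions' key ranges overlap.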
Imagine sorting a huge pile of cards. You need lots of empty tables (sortwork) to put the cards on while you sort. If you don't have enough tables, you can't finish. So we tell the machine how many cards we have (FILSZ), and it gets enough tables (sortwork) for the job. If the pile is too big for one room, we can split it: sort one part, sort another part, then put the two sorted piles together.
1. What is the main resource constraint when sorting very large files with DFSORT?
2. What is OPTION FILSZ used for?
3. Why use dynamic allocation for sortwork when handling large files?
4. What can happen if sortwork is too small for the input?
5. Besides sortwork, what else should you consider for very large sorts?