When the input to DFSORT is extremely large, the main challenges are having enough sortwork (temporary disk space) for the sort to complete, giving DFSORT a good estimate of input size so it can plan its merge strategy, and tuning blocksizes and options so the job runs within time and space limits. This page covers how to size and allocate sortwork, the role of OPTION FILSZ and dynamic allocation, what happens when sortwork is insufficient, and practical approaches such as filtering or splitting the data when a single sort job would be too large.
DFSORT (and compatible sort products) uses temporary work datasets—sortwork—to hold data during the sort. Records are read from the input, written to sortwork in sorted or partially sorted runs, then merged until one sorted stream is produced. The total amount of sortwork must be large enough to hold the intermediate data. If it is too small, the sort cannot complete: you get a space-related failure (for DFSORT, typically message ICE046A, SORT CAPACITY EXCEEDED) or severe performance degradation as the product makes many merge passes. So the first step in handling extremely large files is to ensure enough sortwork. That usually means either many SORTWKnn DD statements (each pointing to a dataset with enough space) or dynamic allocation (OPTION DYNALLOC or equivalent) so the product allocates the work datasets itself based on the estimated input size.
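As a sketch, a job with explicitly allocated sortwork might look like the following. The dataset names, space amounts, and sort key are placeholders to adapt to your installation; size the SORTWKnn allocations so their combined space comfortably exceeds the input.

```jcl
//BIGSORT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROD.BIG.INPUT,DISP=SHR
//SORTOUT  DD DSN=PROD.BIG.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(500,100),RLSE)
//* Three work datasets; their total space must cover the
//* intermediate data produced while sorting
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(300,100))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(300,100))
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(300,100))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
```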
OPTION FILSZ lets you tell DFSORT how large the input is, as a number of records. In DFSORT, FILSZ=n asserts an exact record count (the sort terminates with an error if the actual count differs), while FILSZ=En supplies an estimate—the form you normally want for capacity planning. With that estimate, DFSORT can plan how much sortwork to request (when using dynamic allocation) or how to use the SORTWKnn datasets you provided. If you underestimate FILSZ, DFSORT may assume a smaller input and request or use too little sortwork, which can lead to failure or poor performance. If you overestimate, the product may allocate more than needed, which is usually safe but wastes space. Use a reasonable estimate based on the actual or expected input size; some sites derive it from the input dataset’s statistics or from a prior run.
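For example, the control statements below supply an estimate; the record count and sort key are illustrative. The E prefix marks the value as approximate rather than exact.

```jcl
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
* E80000000 = roughly 80 million records expected (estimate, not exact)
  OPTION FILSZ=E80000000
/*
```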
When dynamic allocation is used (e.g. OPTION DYNALLOC), DFSORT (or the sort product) allocates the sortwork datasets itself instead of relying on pre-defined SORTWKnn DDs. The product typically uses FILSZ (and sometimes SIZE or other options) to determine how many work datasets and how large each should be. That simplifies JCL: you do not have to code dozens of SORTWKnn statements or guess their size for each job. The product requests what it needs within the limits allowed by the installation. Check your product manual for the exact syntax and any installation requirements (e.g. SMS rules, unit names) for dynamically allocated sortwork.
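A minimal sketch combining DYNALLOC with a FILSZ estimate—the unit name, dataset count, and record count are assumptions to adapt to your installation:

```jcl
//BIGSORT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=PROD.BIG.INPUT,DISP=SHR
//SORTOUT  DD DSN=PROD.BIG.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(500,100),RLSE)
//* No SORTWKnn DDs: DFSORT allocates up to 4 work datasets on SYSDA,
//* sized from the FILSZ estimate
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
  OPTION DYNALLOC=(SYSDA,4),FILSZ=E80000000
/*
```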
For very large files, I/O can dominate elapsed time. Use a large blocksize for SORTIN, SORTOUT, and sortwork datasets when the system and DASD support it. Larger blocks reduce the number of I/O operations per unit of data and can significantly speed up the job. On z/OS DASD the maximum blocksize is 32,760 bytes; half-track blocking on 3390 devices (up to 27,998 bytes per block) is a common choice, and coding BLKSIZE=0 lets the system determine an optimal value. Avoid very small blocksizes for large files.
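For instance, an output DD blocked at half-track for 80-byte records on 3390 DASD might be coded as below; the dataset name and space figures are placeholders.

```jcl
//* 27920 bytes = 349 x 80-byte records, fitting within the
//* 27998-byte half-track capacity of a 3390
//* (coding BLKSIZE=0 instead lets the system pick the blocksize)
//SORTOUT  DD DSN=PROD.BIG.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(500,100),RLSE),
//            DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
```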
Some sort products support OPTION SIZE (or equivalent) to influence how much memory or how many merge paths are used. The exact meaning (e.g. main storage for sort, number of merge inputs) is product-specific. For extremely large sorts, tuning these options can reduce merge passes or improve CPU and I/O balance. Consult your DFSORT or sort product manual for SIZE, MXSORT, and merge-related options that apply to your release.
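As one DFSORT-specific illustration (check the equivalent option in your product), MAINSIZE controls how much main storage the sort may use; more storage generally means larger runs and fewer merge passes:

```jcl
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
* Let DFSORT use as much storage as installation limits allow
  OPTION MAINSIZE=MAX
/*
```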
If a single sort job cannot complete within the available sortwork or time window, consider reducing the amount of data or splitting the work. Use INCLUDE or OMIT to filter records so that only the needed subset is sorted (e.g. by date or key range). That reduces the effective input size. Alternatively, split the input by key range: run multiple jobs, each sorting a portion (e.g. keys A–M in one job, N–Z in another), then concatenate the sorted outputs or merge them with MERGE. That way each job uses less sortwork and may fit within limits.
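The split-by-key-range approach can be sketched as three steps—two jobs each sort half the key range, then a final MERGE combines the sorted parts. The key position, split character, and dataset names are assumptions for illustration.

```jcl
//* Job 1: sort only records whose key (cols 1-10) starts with A-M
//PART1    EXEC PGM=SORT
//SORTIN   DD DSN=PROD.BIG.INPUT,DISP=SHR
//SORTOUT  DD DSN=PROD.SORTED.PART1,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(300,60),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
  INCLUDE COND=(1,1,CH,LE,C'M')
/*
//* Job 2 is identical except INCLUDE COND=(1,1,CH,GT,C'M')
//* and SORTOUT pointing at PROD.SORTED.PART2.
//*
//* Final step: merge the two already-sorted parts
//MRG      EXEC PGM=SORT
//SORTIN01 DD DSN=PROD.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=PROD.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=PROD.SORTED.ALL,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(600,120),RLSE)
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*
```

Because the two key ranges are disjoint and ordered, simply concatenating PART1 and PART2 as input to a copy step would also produce sorted output; MERGE is shown because it works even when the partitions' key ranges overlap.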
Imagine sorting a huge pile of cards. You need lots of empty tables (sortwork) to put the cards on while you sort. If you don't have enough tables, you can't finish. So we tell the machine how many cards we have (FILSZ), and it gets enough tables (sortwork) for the job. If the pile is too big for one room, we can split it: sort one part, sort another part, then put the two sorted piles together.
1. What is the main resource constraint when sorting very large files with DFSORT?
2. What is OPTION FILSZ used for?
3. Why use dynamic allocation for sortwork when handling large files?
4. What can happen if sortwork is too small for the input?
5. Besides sortwork, what else should you consider for very large sorts?