When DFSORT sorts a large input, the data may not fit in the storage available to it. In that case DFSORT writes intermediate data to work (temporary) datasets on disk. You can let DFSORT allocate these dynamically, or you can supply them explicitly with the DD names SORTWK01, SORTWK02, and so on. This page explains what temporary datasets are for, when they are used, how to allocate them in JCL, and how explicit allocation compares with dynamic allocation.
During a sort, DFSORT holds records in storage and, when the data does not fit, writes intermediate results to disk. That disk space is the sort work space, made up of one or more work datasets. They are temporary: they exist only for the duration of the step and are then deleted. Using disk lets DFSORT sort files far larger than central storage. Work datasets are used by sort applications; merge and copy applications do not need them. The number and size of work datasets depend on the volume of data and on the options in effect (for example FILSZ/SIZE and the DYNALLOC settings).
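You have two ways to provide work space: let DFSORT allocate the work datasets dynamically (the DYNALLOC option), or code SORTWKnn DD statements (SORTWK01, SORTWK02, and so on) in the JCL for the step.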
Many shops use dynamic allocation for most jobs and reserve explicit SORTWKnn for large or performance-critical sorts where placement matters.
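As a minimal sketch of the dynamic route, the step below asks DFSORT to allocate its own work datasets with OPTION DYNALLOC and codes no SORTWKnn DD statements. The dataset names MY.INPUT.DATA and MY.OUTPUT.DATA, the device name SYSDA, the count of 4 work datasets, and the sort key are hypothetical placeholders. Your site may already set DYNALLOC as an installation default, in which case the OPTION statement may not be needed.

```jcl
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Hypothetical input and output dataset names
//SORTIN   DD DSN=MY.INPUT.DATA,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT.DATA,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(100,50),RLSE),UNIT=SYSDA
//* No SORTWKnn DD statements: DFSORT allocates work space itself
//SYSIN    DD *
  OPTION DYNALLOC=(SYSDA,4)
  SORT FIELDS=(1,10,CH,A)
/*
```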
When you allocate SORTWKnn explicitly, you typically code one DD statement per work dataset (SORTWK01, SORTWK02, and so on), each with a temporary dataset name, DISP=(NEW,DELETE,DELETE), a SPACE request, and a UNIT. For the name, use a job temporary such as DSN=&&TEMP1, or a name that is unique per run. Some sites use a single high-level qualifier for sort work (e.g. SORTWORK.TEMP). The dataset should not be cataloged.

```jcl
//SORTWK01 DD DSN=&&SORTWK1,DISP=(NEW,DELETE,DELETE),
//            SPACE=(CYL,(50,10)),UNIT=SYSDA
//SORTWK02 DD DSN=&&SORTWK2,DISP=(NEW,DELETE,DELETE),
//            SPACE=(CYL,(50,10)),UNIT=SYSDA
//SORTWK03 DD DSN=&&SORTWK3,DISP=(NEW,DELETE,DELETE),
//            SPACE=(CYL,(50,10)),UNIT=SYSDA
```
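Here three work datasets are created as job temporaries (&&). Each gets 50 cylinders of primary space and 10 cylinders of secondary space. UNIT=SYSDA requests the installation-defined group of direct-access (disk) volumes. Because both the normal and the abnormal disposition are DELETE, the datasets are deleted when the step ends, whether it completes normally or abends.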
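For context, the SORTWKnn statements sit alongside the usual DFSORT DD statements in the same step. A minimal sketch, again with hypothetical dataset names and sort key:

```jcl
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Hypothetical input and output dataset names
//SORTIN   DD DSN=MY.INPUT.DATA,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT.DATA,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(100,50),RLSE),UNIT=SYSDA
//* Explicit work datasets, deleted when the step ends
//SORTWK01 DD DSN=&&SORTWK1,DISP=(NEW,DELETE,DELETE),
//            SPACE=(CYL,(50,10)),UNIT=SYSDA
//SORTWK02 DD DSN=&&SORTWK2,DISP=(NEW,DELETE,DELETE),
//            SPACE=(CYL,(50,10)),UNIT=SYSDA
//SORTWK03 DD DSN=&&SORTWK3,DISP=(NEW,DELETE,DELETE),
//            SPACE=(CYL,(50,10)),UNIT=SYSDA
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
```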
The number of SORTWKnn datasets and the total space required depend on the data size and DFSORT's internal algorithm. There is no single formula that fits all jobs. In practice:
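The OPTION statement can influence storage and work-space usage: for example, MAINSIZE limits the storage DFSORT uses, and FILSZ (or SIZE) tells DFSORT the exact or estimated number of input records, which it takes into account when working out how much work space is needed. See the OPTION and performance-tuning tutorials for more detail.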
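A sketch of such an OPTION statement, assuming an input of roughly 2,000,000 records (a hypothetical figure): the E prefix on FILSZ marks the count as an estimate rather than an exact number, and MAINSIZE=MAX lets DFSORT use as much storage as the installation limits allow.

```jcl
//SYSIN    DD *
* Comment: estimated record count, maximum allowed storage
  OPTION MAINSIZE=MAX,FILSZ=E2000000
  SORT FIELDS=(1,10,CH,A)
/*
```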
DSN=&&name creates a temporary dataset. The system generates the actual dataset name, which is unique to the job, so it cannot collide with another job's work files; the dataset is never cataloged and is deleted no later than the end of the job (with DISP=(NEW,DELETE,DELETE), at the end of the step). This makes it ideal for sort work: you do not need to invent a unique name for each run, and cleanup is automatic. Alternatively, you can use a fixed name (e.g. USERID.SORT.WORK01) with DISP=(NEW,DELETE,DELETE); the dataset is deleted at step end, so it does not persist, but two runs using the same name at the same time can conflict. Using a cataloged permanent name for sort work is unusual and generally not recommended.
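The two naming styles can be compared as alternative ways of coding the same DD (use one or the other in a step, not both). USERID.SORT.WORK01 is the hypothetical fixed name mentioned above; with this DISP, both versions are deleted when the step ends.

```jcl
//* Option A: job temporary, the system generates a unique name
//SORTWK01 DD DSN=&&SORTWK1,DISP=(NEW,DELETE,DELETE),
//            SPACE=(CYL,(50,10)),UNIT=SYSDA
//* Option B: hypothetical fixed name, deleted at step end,
//* but concurrent runs using the same name may conflict
//SORTWK01 DD DSN=USERID.SORT.WORK01,DISP=(NEW,DELETE,DELETE),
//            SPACE=(CYL,(50,10)),UNIT=SYSDA
```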
When you sort a huge pile of cards, sometimes your desk (memory) is not big enough to hold all of them at once. So you use extra tables (work datasets) to put some of the cards down while you work on the rest. Those tables are temporary—you don't keep them when you're done. DFSORT does the same: it uses temporary work datasets when the data is too big for memory. You can either let DFSORT ask for those tables itself (dynamic allocation) or set up the tables yourself in the JCL (SORTWK01, SORTWK02, …). Either way, when the sort is finished, the work tables are cleared away.
1. What are SORTWK01, SORTWK02, ... used for?
2. Are SORTWKnn DD statements required for every DFSORT step?
3. What DISP is typically used for temporary sort work datasets?
4. Why might a shop use explicit SORTWKnn instead of dynamic allocation?
5. What happens to temporary work datasets when the step completes normally?