How do I debug a slow DFSORT step?

Start with the step SYSOUT: read the ICE messages for record counts, phases run, and any capacity or error messages. Compare CPU time to elapsed time to see if the step is I/O-bound or CPU-bound. Check FILSZ, SIZE, REGION, and sortwork allocation. Use DEBUG if you need to see record counts per phase or control statement interpretation. Then tune: add memory, increase sortwork, filter or shorten records, or avoid unnecessary sorts (COPY/MERGE).

What do ICE messages in DFSORT mean?

ICE messages are DFSORT product messages. Informational messages (suffix I) report normal processing: ICE000I heads the listing of the control statements DFSORT read, ICE054I reports the record counts in and out, and ICE052I marks the end of DFSORT processing. ICE046A indicates sort capacity exceeded (insufficient sortwork or memory). Other ICE codes indicate syntax errors, resource problems, or data issues. See your DFSORT Messages and Codes manual for the full list.

Why is my DFSORT step I/O-bound?

When data does not fit in the allocated memory (SIZE/REGION), DFSORT writes sorted runs to sortwork and performs merge passes. Each merge pass reads and writes sortwork, so I/O can dominate elapsed time. To reduce I/O: increase memory (SIZE, REGION), provide a realistic FILSZ so sortwork is allocated adequately, reduce record count (INCLUDE/OMIT) or record length (INREC), and ensure blocksize and number of sortwork datasets are tuned.

When should I use DEBUG in DFSORT?

Use DEBUG when you are diagnosing why a sort failed or produced wrong record counts, or when you want to see how control statements were applied. DEBUG produces extra diagnostic output in SYSOUT. It adds overhead, so enable it only during problem determination and remove it for production runs.

What is the first thing to check when a sort fails with ICE046A?

ICE046A means sort capacity exceeded. First check: (1) OPTION FILSZ—is the estimate too low so that DFSORT under-allocated sortwork? (2) Sortwork—do you have enough SORTWK datasets or DYNALLOC space? (3) OPTION SIZE and step REGION—is there enough memory so that less data spills to sortwork? Increase FILSZ, add or enlarge sortwork, or increase memory as needed.
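
As a rough illustration of the second check, the sketch below allocates sort work space explicitly with SORTWKnn DD statements. The step name, data set names, unit name, and space figures are hypothetical placeholders, not recommendations; size them from your own data volume and site standards. The JOB statement is omitted.

```jcl
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=HLQ.SORT.INPUT,DISP=SHR
//SORTOUT  DD DSN=HLQ.SORT.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(500,100),RLSE)
//* Hypothetical work data sets; the sizes are placeholders only
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(300,100))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(300,100))
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(300,100))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
```

If the step still ends with ICE046A after a change like this, the work space is probably still too small for the data, or the input has grown beyond what the job was sized for.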

Performance Debugging

When a DFSORT step is slow or fails, you need a systematic way to find the cause. Performance debugging involves reading the product messages (ICE messages) in SYSOUT, understanding what each phase of the sort did, and identifying whether the bottleneck is CPU, memory, or I/O (especially sortwork). This page explains how to analyze DFSORT job output, interpret common messages, use DEBUG when needed, and apply fixes so your sorts run faster or complete successfully.



Start with SYSOUT and ICE Messages

The first place to look when debugging a DFSORT step is the step SYSOUT (job log). DFSORT writes ICE messages there: product-specific messages that report what the sort did. Typical informational messages (e.g. ICE054I, which reports records in and out) include the number of records read from SORTIN, the number of records written to SORTOUT, and sometimes which phases ran (e.g. sort phase, merge phase). These numbers help you verify that the step processed the expected volume of data and that INCLUDE/OMIT did not drop more (or fewer) records than you intended. When the step fails, the last ICE message before the abend or error usually indicates the reason, for example ICE046A (sort capacity exceeded) or a syntax error code. Keep your DFSORT Messages and Codes (or equivalent) manual handy so you can look up any message you do not recognize.

Common ICE Messages and What They Mean

Different DFSORT versions may use slightly different message numbers; the following are representative. The suffix letter indicates severity: I is informational and A means action is required. ICE000I heads the listing of the control statements DFSORT read, ICE054I reports the record counts in and out, and ICE052I marks the end of DFSORT processing. ICE046A means the sort exceeded its capacity: it ran out of sortwork space or could not complete with the allocated memory. The fix is to increase sortwork (more or larger SORTWK datasets, or higher DYNALLOC), to provide a larger FILSZ so DFSORT allocates more work space, or to increase SIZE and REGION so more data is sorted in memory. ICE083A and similar often indicate a resource or allocation failure (e.g. could not allocate a dataset). Syntax errors (e.g. ICE1xx) point to a problem in SYSIN: an invalid keyword, a wrong field position, or conflicting options. Correct the control statements and rerun.

Representative ICE messages (check your product manual for exact codes)
Message	Meaning
ICE000I	Informational; heads the listing of the control statements DFSORT read for the run.
ICE054I	Informational; reports the number of records in and out.
ICE046A	Sort capacity exceeded; insufficient sortwork or memory—increase FILSZ, sortwork, or SIZE/REGION.
ICE083A	Resource or allocation failure; may indicate sortwork or system resource shortage.
Syntax / ICE1xx	Control statement error; check SYSIN for invalid syntax or conflicting options.

Identifying the Bottleneck: CPU vs I/O

To understand why a sort is slow, you need to know whether the step is limited by CPU or by I/O. The job report (SMF or the job log) usually shows CPU time and elapsed time for the step. If elapsed time is much larger than CPU time, the step spent a lot of time waiting—often for I/O. That suggests the sort is I/O-bound: reading input, writing to sortwork, or reading from sortwork during merge passes. In that case, increasing memory (SIZE, REGION) can reduce how much data is written to sortwork, and tuning sortwork (number of datasets, blocksize) can speed up that I/O. If CPU time is close to elapsed time and both are high, the step is likely CPU-bound: a lot of key comparison or INREC/OUTREC processing. Then you might reduce record count (INCLUDE/OMIT), simplify the sort key or reformat logic, or avoid an unnecessary sort (use COPY or MERGE where possible). DFSORT messages that mention merge passes or sortwork activity also indicate that a significant amount of I/O is happening; reducing spill to sortwork usually helps.

Using DEBUG for Diagnostics

The DEBUG control statement tells DFSORT to produce extra diagnostic output. Depending on your product and options, DEBUG may print how control statements were interpreted, how many records passed through each phase (e.g. after INCLUDE, after sort, after OUTREC), or sample record content. That is useful when you suspect wrong results: for example, you expect 100,000 records but get 80,000—DEBUG can show whether INCLUDE/OMIT dropped 20,000 or whether the sort phase or output phase is losing records. It can also help you confirm that field positions and formats in SYSIN match the actual record layout. DEBUG does not improve performance; it adds overhead and output. Use it during problem determination, then remove it for production. See the DEBUG statement tutorial for syntax and options specific to your installation.
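
A minimal sketch of where a DEBUG statement sits in SYSIN. ABEND is one documented DFSORT DEBUG operand; it asks DFSORT to abend, so that a dump can be taken, instead of ending with return code 16 when an error message is issued. It is shown only as an example of the syntax. Which operand helps depends on the problem you are chasing, and the sort key here is a hypothetical placeholder.

```jcl
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
* Diagnostic run only: remove the DEBUG statement before production
  DEBUG ABEND
/*
```

Keeping DEBUG on its own line, with a comment above it, makes it easy to spot and remove once the problem is resolved.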

Checking FILSZ, Sortwork, and Memory

When a sort fails with capacity exceeded (ICE046A) or runs slowly with heavy sortwork use, check three areas. First, FILSZ: this is the estimated size of the data to be sorted. If FILSZ is too low, DFSORT may allocate too little sortwork and then exceed that allocation. Provide a realistic or slightly high estimate (in the units your product expects). Second, sortwork itself: ensure you have enough SORTWK datasets (or sufficient DYNALLOC limit) and that each has enough space. If DFSORT dynamically allocates sortwork, a higher FILSZ often leads to more or larger work datasets. Third, memory: OPTION SIZE (or MOSIZE) and the step REGION in JCL limit how much memory the sort can use. If memory is too small, more data spills to sortwork, which increases I/O and can cause capacity problems. Increase SIZE and REGION within your system’s guidelines so that the sort can hold more data in memory. See the tutorials on FILSZ estimation, sortwork datasets, dynamic allocation, and memory usage for detailed tuning.
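
The three checks can be sketched in a single step, under stated assumptions: the record estimate, work data set count, unit name, and data set names are hypothetical, and MAINSIZE is used here as the DFSORT OPTION keyword for main storage, on the assumption that it corresponds to the memory setting this section describes. Confirm the keywords and limits against your product manual and site defaults. The JOB statement is omitted.

```jcl
//* REGION=0M requests the largest region the system allows
//SORTSTEP EXEC PGM=SORT,REGION=0M
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=HLQ.SORT.INPUT,DISP=SHR
//SORTOUT  DD DSN=HLQ.SORT.OUTPUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(500,100),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
* FILSZ=E5000000 : estimate of about 5 million input records
* DYNALLOC       : let DFSORT allocate up to 8 work data sets on SYSDA
* MAINSIZE=MAX   : use as much main storage as the installation permits
  OPTION FILSZ=E5000000,DYNALLOC=(SYSDA,8),MAINSIZE=MAX
/*
```

No SORTWKnn DD statements are coded, so work space comes from dynamic allocation. After a change like this, rerun and compare the SYSOUT messages and step timing with the previous run.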

Common Causes of Slow Sorts and Fixes

Unnecessary sort: The step does not need to reorder data (e.g. only filtering or reformatting). Fix: Use OPTION COPY so the sort phase is skipped. See Avoiding unnecessary sorts.
Re-sorting already sorted data: Inputs are already in key order but are concatenated and sorted again. Fix: Use MERGE with SORTIN01, SORTIN02, etc., instead of SORT with one SORTIN.
Too little memory: SIZE or REGION is small, so most data spills to sortwork. Fix: Increase OPTION SIZE (or MOSIZE) and step REGION so more data is sorted in memory.
Underestimated FILSZ: DFSORT allocates too little sortwork and then hits capacity or performs poorly. Fix: Set FILSZ to a realistic estimate of input (or data) size.
Too many or too few sortwork datasets: Either allocation fails or I/O is not balanced. Fix: Follow your site’s guidelines for SORTWK count and size; use DYNALLOC if allowed and set limits appropriately.
Large record length: Fewer records fit in memory, so more spill. Fix: Use INREC to shorten records before the sort if the application only needs a subset of fields.
No filtering: All input records are sorted even though only a subset is needed. Fix: Use INCLUDE/OMIT to reduce the record count before the sort phase.
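
Several of these fixes can be sketched as three independent steps. All data set names, field positions, and the C'NY' test value are hypothetical, and the example assumes fixed-length records (variable-length records would need the record descriptor word accounted for in the positions). The JOB statement is omitted.

```jcl
//* Step 1: filter only, order does not matter, so copy
//COPYSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=HLQ.ALL.RECORDS,DISP=SHR
//SORTOUT  DD DSN=HLQ.NY.RECORDS,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,50),RLSE)
//SYSIN    DD *
  OPTION COPY
  INCLUDE COND=(15,2,CH,EQ,C'NY')
/*
//* Step 2: inputs already in key order, so merge, not sort
//MERGSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=HLQ.SORTED.PART1,DISP=SHR
//SORTIN02 DD DSN=HLQ.SORTED.PART2,DISP=SHR
//SORTOUT  DD DSN=HLQ.SORTED.ALL,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(200,50),RLSE)
//SYSIN    DD *
  MERGE FIELDS=(1,10,CH,A)
/*
//* Step 3: a sort is needed, so filter and shorten records first
//TRIMSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=HLQ.ALL.RECORDS,DISP=SHR
//SORTOUT  DD DSN=HLQ.NY.SORTED,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,25),RLSE)
//SYSIN    DD *
* INCLUDE sees input positions; SORT sees INREC output positions
  INCLUDE COND=(15,2,CH,EQ,C'NY')
  INREC BUILD=(1,10,15,2,40,8)
  SORT FIELDS=(1,10,CH,A)
/*
```

In step 3 the INREC statement keeps only 20 bytes of each record (the 10-byte key, the 2-byte code, and one 8-byte field), so more records fit in memory during the sort. The key stays in positions 1 to 10 of the shortened record, which is why the SORT statement can still refer to 1,10.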

A Systematic Approach

When you are assigned to improve a slow or failing sort, follow a sequence. (1) Capture SYSOUT and read all ICE messages; note record counts and any error or capacity messages. (2) Compare CPU and elapsed time to see if the step is I/O-bound or CPU-bound. (3) If it failed, look up the ICE code and address the cause (e.g. increase sortwork or FILSZ for ICE046A). (4) If it is slow and I/O-bound, review FILSZ, SIZE, REGION, and sortwork; consider INREC to shorten records and INCLUDE/OMIT to reduce count. (5) If the step does not need to sort (order does not matter or data is already ordered), switch to COPY or MERGE. (6) Optionally add DEBUG for one run to verify record counts and control statement behavior, then remove it. (7) Re-run and compare SYSOUT and timing to confirm improvement.

Explain It Like I'm Five

When your sort is slow or breaks, it’s like when a game freezes: you have to find out why. First you look at the “scoreboard” (the messages in the job log)—that tells you how many cards were handled and if something went wrong. If the game is waiting a long time for the disk (I/O), you give it more “desk space” (memory) so it doesn’t have to put cards in drawers so often. If the game is doing too much work (CPU), you try to give it fewer cards or simpler rules. And if you didn’t really need to sort the cards at all—you only wanted to take out the red ones—you skip the sorting step entirely. So: read the messages, see if it’s waiting on I/O or busy with CPU, fix the thing that’s wrong (more space, fewer records, or no sort), and check the scoreboard again to see if it’s better.

Exercises

Your sort step completed but took 45 minutes; CPU time was 2 minutes. What does that suggest, and what would you check first?
A step fails with ICE046A. List three things you would verify or change in the JCL and SYSIN.
You suspect INCLUDE is dropping more records than intended. How would you use DEBUG (or message analysis) to confirm?

Quiz

Test Your Knowledge

1. Where do you look first when a DFSORT step is slower than expected?

Only at the input file size
At the step SYSOUT: ICE messages show record counts, phases, and often timing or resource use; they help identify whether the bottleneck is input size, sortwork I/O, or other factors
Only at SORTOUT
Only at the JCL

2. What does ICE046A usually indicate?

Input file not found
Sort capacity exceeded—typically insufficient sortwork space or memory; DFSORT could not complete the sort with the allocated resources
Invalid control statement
Output full

3. How can you tell if a sort is I/O-bound vs CPU-bound?

You cannot tell from DFSORT messages
SMF or performance data (e.g. CPU time vs elapsed time, I/O counts) and DFSORT messages (e.g. merge passes, sortwork usage) help: high elapsed time with low CPU often suggests I/O wait; high CPU with many records suggests CPU-bound comparison or processing
Only by reading the program
I/O-bound sorts always abend

4. When is the DEBUG statement useful for performance?

It speeds up every sort
DEBUG adds diagnostic output (e.g. record counts per phase, control statement interpretation); it helps you verify that the right number of records passed each phase and that control statements were applied as expected—useful when debugging wrong results or understanding where time is spent
DEBUG reduces memory use
Only for MERGE

5. Your sort runs but uses far more sortwork I/O than a similar job. What might you check?

Only SORTOUT BLKSIZE
FILSZ (underestimating can cause poor allocation), SIZE/REGION (too little memory means more spill to sortwork), record size (larger records mean fewer fit in memory), and whether you can reduce data with INCLUDE/OMIT or INREC before the sort
Only the number of SORTWK DDs
Only OPTION COPY