MainframeMaster

Avoiding Unnecessary Sorts

One of the most effective ways to improve DFSORT performance is to avoid sorting when you do not need to. Every sort consumes CPU, memory, and often sortwork I/O. If the output order does not matter, or if the data is already in the right order, you can use OPTION COPY or MERGE instead of SORT and save significant resources. This page explains when to skip the sort, when to use COPY, when to use MERGE, and how to reduce the workload when a sort is truly required.

Why Unnecessary Sorts Are Costly

A full SORT in DFSORT reads all input records, builds sorted runs (in memory and then on sortwork if needed), and merges those runs into one sorted stream. That process uses CPU for key comparison and record movement, memory for the sort itself, and often substantial I/O to sortwork datasets. If the result does not need to be sorted, or if it is already sorted, doing that work wastes time and resources. In large batch jobs, eliminating a single unnecessary sort can cut minutes from elapsed time and reduce load on the system. The key is to decide up front: does the downstream step or report actually require records in a specific order?

Use OPTION COPY When Order Does Not Matter

OPTION COPY tells DFSORT to skip the sort (and merge) phase. Records are read from SORTIN, optionally filtered by INCLUDE or OMIT, optionally reformatted by INREC or OUTREC, and written to SORTOUT in the same order they were read. No SORT FIELDS= or MERGE FIELDS= is used. So COPY is ideal when you only need to:

  • Copy data from one dataset to another without changing order.
  • Filter records (INCLUDE/OMIT) and write the result in input order.
  • Reformat records (INREC/OUTREC)—e.g. change column positions, add constants, convert dates—without reordering.
  • Split output (OUTFIL) or write multiple outputs (e.g. different formats or subsets) in input order.

In all these cases, using SORT would add a full sort phase for no benefit. COPY avoids that phase and is typically much faster, and it does not need sortwork datasets. The one limitation concerns SUM, which combines records that have equal keys: SUM is valid only with SORT or MERGE, because it relies on the control fields those statements define. If you need SUM, you need SORT FIELDS= (or MERGE FIELDS= for pre-sorted inputs) to establish the key.
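The splitting case from the list above can be sketched as a single copy step with OUTFIL. This is a minimal sketch, not a definitive job: the DD names ACTIVE and OTHER, the dataset names, and the SPACE values are hypothetical, and the status byte at position 50 is borrowed from the example that follows.

```text
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//ACTIVE   DD DSN=MY.ACTIVE,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//OTHER    DD DSN=MY.OTHER,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* One pass over SORTIN, no sort phase
  OPTION COPY
* Records with status 'A' go to the ACTIVE DD
  OUTFIL FNAMES=ACTIVE,INCLUDE=(50,1,CH,EQ,C'A')
* SAVE catches every record not selected by another OUTFIL
  OUTFIL FNAMES=OTHER,SAVE
/*
```

Both outputs keep the input order, and the input is read only once.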

Example: Filter and Reformat Without Sort

Suppose you need to select records where a status byte equals 'A' and write them with a few fields reordered and a constant added. The consumer does not care about order. Using COPY:

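```text
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT,DISP=(NEW,CATLG),...
//SYSIN    DD *
* Copy only; keep records whose status byte (position 50) is 'A'
  OPTION COPY
  INCLUDE COND=(50,1,CH,EQ,C'A')
* Rebuild the record: four input ranges followed by the constant X
  OUTREC FIELDS=(1,20,80,10,40,30,21,1,C'X')
/*
```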

INCLUDE keeps only records with the status byte at position 50 equal to 'A'. OUTREC builds the output record from four input ranges plus a constant. No SORT FIELDS= is given, and OPTION COPY ensures no sort phase runs. Records are written in the order they were read. If you had used SORT FIELDS=(1,20,CH,A) instead, DFSORT would have sorted all included records by the first 20 bytes, which is unnecessary when order does not matter.
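The same copy-only behavior can also be requested with SORT FIELDS=COPY in place of OPTION COPY. The fragment below shows only the SYSIN statements and assumes the same JCL and record layout as the example above.

```text
* FIELDS=COPY asks for a copy, so no sort key is defined
  SORT FIELDS=COPY
  INCLUDE COND=(50,1,CH,EQ,C'A')
  OUTREC FIELDS=(1,20,80,10,40,30,21,1,C'X')
```

Which form to use is mostly a matter of site convention; either way no sort phase runs.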

Use MERGE When Inputs Are Already Sorted

MERGE is designed for two or more inputs that are each already sorted by the same key. DFSORT then merges the streams in a linear fashion: it repeatedly picks the next record in key order from among the inputs and writes it to SORTOUT. No full re-sort is performed. If you concatenate those same inputs into a single SORTIN and run SORT, DFSORT will treat the combined stream as unsorted and will re-sort everything—wasting CPU and sortwork. So whenever you have multiple files already in key order and want one combined sorted file, use MERGE with SORTIN01, SORTIN02, and so on, and MERGE FIELDS= with the same key. MERGE is the right tool and avoids an unnecessary sort.

Example: Combining Pre-Sorted Files with MERGE

Two datasets, each sorted by employee ID in columns 1–10, need to be combined into one sorted file:

```text
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DSN=EMP.FILE1,DISP=SHR
//SORTIN02 DD DSN=EMP.FILE2,DISP=SHR
//SORTOUT  DD DSN=EMP.COMBINED,DISP=(NEW,CATLG),...
//SYSIN    DD *
* Both inputs must already be in ascending order on bytes 1-10
  MERGE FIELDS=(1,10,CH,A)
  OUTREC FIELDS=(1,80)
/*
```

MERGE FIELDS=(1,10,CH,A) merges the two streams on the first 10 bytes in ascending order. No SORT is used; the merge is linear. If you had allocated both files to a single SORTIN (e.g. with a concatenated DD) and used SORT FIELDS=(1,10,CH,A), DFSORT would have re-sorted the entire combined set, which is unnecessary when both inputs are already in that order.
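For contrast, the concatenated alternative described above would look roughly like the sketch below. It is shown only to make the difference visible and is the version to avoid when both inputs are already in key order; the dataset names follow the example, and the SPACE values are placeholders.

```text
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Concatenation: DFSORT sees one unsorted input stream
//SORTIN   DD DSN=EMP.FILE1,DISP=SHR
//         DD DSN=EMP.FILE2,DISP=SHR
//SORTOUT  DD DSN=EMP.COMBINED,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* Full sort of every record, although each input was in order
  SORT FIELDS=(1,10,CH,A)
/*
```

The output is the same as in the MERGE version, but this job pays for a complete sort phase to produce it.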

Avoid Redundant Sorts in the Pipeline

In a multi-step job, an earlier step may already produce output in the order the next step needs. For example, a COBOL program might write records in customer number order, or a DB2 unload might use ORDER BY. If the next DFSORT step only adds sequence numbers, changes formatting, or splits the file, it does not need to sort again. Use OPTION COPY in that step so it preserves the existing order. Similarly, if the next step is a MERGE that expects sorted inputs, and the previous step produced one of those inputs already sorted, do not run an extra SORT on that output before the MERGE. Checking the job flow and removing redundant sorts often yields large savings.
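A follow-on step of that kind might look like the sketch below. It assumes the previous step wrote fixed-length 80-byte records in customer number order; the step name, the dataset names, the SPACE values, and the choice to append an 8-digit sequence number at the end of the record are all illustrative.

```text
//STEP2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=CUST.SORTED,DISP=SHR
//SORTOUT  DD DSN=CUST.NUMBERED,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* Input is already in customer number order; keep that order
  OPTION COPY
* Append an 8-digit zoned-decimal sequence number after byte 80
  OUTREC BUILD=(1,80,SEQNUM,8,ZD)
/*
```

Because the step only copies, the sequence numbers follow the order the earlier step established.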

Reduce Sort Work When You Must Sort

When you truly need sorted output, you can still reduce the cost of the sort in several ways. First, use INCLUDE or OMIT to drop records that are not needed before they enter the sort phase. Fewer records mean less data to sort and less sortwork I/O. Second, use INREC to shorten or simplify records before the sort (e.g. keep only the key and a few needed fields). Smaller records allow more records to fit in memory and can reduce the number of runs written to sortwork. Third, sort only by the keys the application actually needs; adding extra sort keys increases comparison cost and can change the order in ways that are unnecessary. Fourth, ensure FILSZ and sortwork allocation are adequate so the sort does not fail or retry; see the tuning and sortwork topics for details.
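The first three techniques can be combined in one step, roughly as sketched below. The field positions reuse the layout from the COPY example (status byte at position 50, key in bytes 1-20), and the output dataset name and SPACE values are hypothetical.

```text
//STEP1    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT,DISP=SHR
//SORTOUT  DD DSN=MY.SORTED,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* 1. Drop unwanted records before the sort (input positions)
  INCLUDE COND=(50,1,CH,EQ,C'A')
* 2. Keep only the 20-byte key and one 30-byte field (50 bytes)
  INREC BUILD=(1,20,40,30)
* 3. Sort on the key alone; positions refer to the INREC record
  SORT FIELDS=(1,20,CH,A)
/*
```

DFSORT applies INCLUDE first and INREC second, whatever order the statements appear in, so the INCLUDE positions refer to the original record while SORT FIELDS= refers to the shortened one.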

Decision Guide

Use the following as a quick reference when designing a DFSORT step:

When to use COPY, MERGE, or SORT
| Scenario | Recommended action |
|---|---|
| Filter/reformat only; order does not matter | OPTION COPY with INCLUDE/OMIT, INREC, OUTREC, OUTFIL as needed. |
| Two or more inputs already sorted by same key | MERGE with SORTIN01, SORTIN02, … and MERGE FIELDS= with that key. |
| One unsorted input; output must be sorted | SORT with SORT FIELDS=; use INCLUDE/OMIT and INREC to reduce data if possible. |
| Input already in required order; need to add fields or reformat | OPTION COPY; no SORT or MERGE. |

Explain It Like I'm Five

Imagine you have a pile of cards and a friend asks you to “fix” them. If they only want you to take out the red cards and stack the rest, you don’t have to sort the stack—you just take out red and leave the rest in the order they’re in. That’s like COPY: you’re cleaning up or changing the cards without reordering. If your friend gives you two stacks that are already in order (say, by number) and wants one big stack in order, you don’t shuffle everything; you just merge the two stacks by always taking the smallest card from the top of either stack. That’s MERGE. Only when the cards are all mixed up and your friend needs them in a specific order do you do a full sort. So: only sort when you really need a new order; otherwise copy or merge.

Exercises

  1. Your job has a step that filters records with OMIT and writes to SORTOUT. The next program does not care about order. What change would you make to the DFSORT step and why?
  2. You have three files, each sorted by date. You want one file in date order. Write the control statements (MERGE FIELDS= and any DD names) you would use. Why is MERGE better than SORT here?
  3. A previous step produces a file sorted by account number. The next step must add a sequence number in columns 1–5. Should the next step use SORT or COPY? What control statements would you use?

Quiz

Test Your Knowledge

1. You need to filter records and reformat them, but the output order does not matter. What is the best approach?

  • Use SORT FIELDS= with a dummy key
  • Use OPTION COPY with INCLUDE/OMIT and INREC or OUTREC
  • Use MERGE with one input
  • Sort first, then filter in a second step

2. You have two datasets, each already sorted by employee ID. You want one combined file in employee ID order. What should you use?

  • SORT with concatenated SORTIN
  • MERGE with SORTIN01 and SORTIN02
  • Two separate SORT steps
  • COPY with SORTIN01

3. Why might filtering with INCLUDE/OMIT before the sort help performance when you do need to sort?

  • INCLUDE/OMIT has no effect on sort performance
  • Fewer records enter the sort phase, so less data is sorted and written to sortwork
  • INCLUDE/OMIT only affects SORTOUT
  • It increases memory use

4. A previous step in your job already produces output sorted by account number. The next step only needs to add a sequence number. What should the next step use?

  • SORT FIELDS=(1,10,CH,A) to “ensure” order
  • OPTION COPY with OUTREC to add sequence numbers; no sort needed
  • MERGE with one input
  • Two SORT steps for safety

5. What is a “redundant sort” in a batch pipeline?

  • A sort that uses SUM
  • A sort step that runs even though the data is already in the required order from a previous step
  • A sort with multiple keys
  • A sort that writes to tape