MainframeMaster

Performance Considerations for Filtering

Using INCLUDE or OMIT in DFSORT is not only about getting the right records; it also affects performance. Because INCLUDE and OMIT statements are applied during the input phase, before the sort, records that are filtered out never participate in the sort. That means less data to move to sortwork, fewer comparisons, and a smaller output, so filtering early (in SYSIN) usually reduces CPU time, elapsed time, and resource use. This page covers when to filter in SYSIN versus in OUTFIL, why the choice between INCLUDE and OMIT is mostly about clarity rather than raw speed, how condition complexity can affect cost, and why the fact that you cannot code both an INCLUDE and an OMIT statement in the same run means you must express your logic with one or the other (or with multiple OUTFIL outputs). Understanding these points helps you design efficient sort jobs and avoid unnecessary sortwork.


Filter Early: SYSIN INCLUDE/OMIT

INCLUDE and OMIT in the SYSIN control statements are applied as records are read from the input. A record that is filtered out (it fails the INCLUDE condition or satisfies the OMIT condition) is not passed to the sort phase at all, so the sort operates on a subset of the input. That subset is what gets written to sortwork, compared, and written to SORTOUT. The smaller that subset, the less I/O and CPU. As a rule, filter as early as possible: if you can express your requirement with one INCLUDE or one OMIT in SYSIN, do it there. Do not rely on a later step or OUTFIL to do the same filter if the goal is a single output; otherwise you sort the full input and then discard records, which wastes sortwork.
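As a minimal sketch of filtering in SYSIN, the step below keeps only status 'A' records before sorting. The data set names, the 8-byte character key at position 1, and the 1-byte status field at position 10 are assumptions made for illustration, not a real record layout.

```
//FILTER   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Hypothetical data set names
//SORTIN   DD DSN=MY.INPUT.DATA,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT.DATA,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* Keep only records whose status byte (assumed at position 10) is A.
* Records that fail this test never reach the sort phase.
  INCLUDE COND=(10,1,CH,EQ,C'A')
* Sort the surviving records on an assumed 8-byte character key.
  SORT FIELDS=(1,8,CH,A)
/*
```

With this layout, only the status 'A' records are written to sortwork and to SORTOUT; everything else is discarded as it is read.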

SYSIN vs OUTFIL Filtering

SYSIN INCLUDE/OMIT applies to the entire run: one logical filter, and only records that pass go to the sort. OUTFIL can also specify INCLUDE/OMIT, but that is applied per OUTFIL output, and it happens after the sort. So if you code only OUTFIL INCLUDE (and no SYSIN INCLUDE/OMIT), every input record is read, sorted, and then the OUTFIL filter is applied when building that output. The sort phase still processes all records. Use SYSIN INCLUDE/OMIT when you have a single filter for the whole job. Use OUTFIL INCLUDE/OMIT when you have multiple output files and each needs a different filter—e.g. one OUTFIL with INCLUDE for status=A, another with INCLUDE for status=B. Then you sort once and each OUTFIL copy gets only the records that pass its condition.

Where to filter

Where                | When                             | Benefit
SYSIN INCLUDE/OMIT   | Single filter for the whole run  | Filter before sort; fewer records sorted
OUTFIL INCLUDE/OMIT  | Different filter per output file | One sort; multiple filtered outputs
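The two approaches can also be combined. The sketch below sorts once and writes two filtered outputs, with an optional SYSIN pre-filter that drops records neither output needs. The DD names OUTA and OUTB, the data set names, and the field positions (key at 1 for 8 bytes, status at 10) are hypothetical.

```
//SPLIT    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT.DATA,DISP=SHR
//OUTA     DD DSN=MY.STATUS.A,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE)
//OUTB     DD DSN=MY.STATUS.B,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* Optional pre-filter: drop everything that neither output needs,
* so only status A and B records are sorted.
  INCLUDE COND=(10,1,CH,EQ,C'A',OR,10,1,CH,EQ,C'B')
  SORT FIELDS=(1,8,CH,A)
* One sort, two outputs; each OUTFIL filters after the sort.
  OUTFIL FNAMES=OUTA,INCLUDE=(10,1,CH,EQ,C'A')
  OUTFIL FNAMES=OUTB,INCLUDE=(10,1,CH,EQ,C'B')
/*
```

If the input contains many records with other status values, the pre-filter is what keeps them out of sortwork; the OUTFIL filters alone would not.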

INCLUDE vs OMIT: Which to Use?

From a performance perspective, INCLUDE and OMIT are equivalent: both are evaluated in the input phase, and when the two conditions are logical opposites, exactly the same records are dropped either way. The difference is logical: INCLUDE keeps records that satisfy the condition; OMIT drops records that satisfy the condition. Choose based on clarity. If the set you want to keep is small and easy to describe (e.g. "keep status = A or B"), use INCLUDE. If the set you want to drop is small and easy to describe (e.g. "drop invalid or test records"), use OMIT. Avoid double negatives (e.g. OMIT with a long NOT-like condition) so the next programmer can understand the intent quickly. Performance is not the deciding factor; maintainability is.
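To illustrate the clarity point, here are two logically equivalent ways to keep status A and B records, assuming a 1-byte status field at position 10 and an 8-byte key at position 1 (both hypothetical). The OMIT form is shown as a comment because only one of the two statements can be coded in a run.

```
* Positive form: name what you keep (status A or B).
  INCLUDE COND=(10,1,CH,EQ,C'A',OR,10,1,CH,EQ,C'B')
* Equivalent negative form, commented out because only one
* INCLUDE or OMIT statement may be coded per run:
*   OMIT COND=(10,1,CH,NE,C'A',AND,10,1,CH,NE,C'B')
  SORT FIELDS=(1,8,CH,A)
```

Both forms pass the same records to the sort, but the INCLUDE states the intent directly while the OMIT forces the reader to negate the condition in their head.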

Condition Complexity

Each record is tested against your COND= expression, so the cost per record is the cost of evaluating the conditions in it. A long chain of AND/OR with many field reads and comparisons adds CPU. So: (1) Keep the condition as simple as possible while still correct. (2) If the product short-circuits (stops as soon as the result is known), putting the condition most likely to decide the result first might reduce work, e.g. in an AND chain, a test that is false most of the time. This is implementation-dependent; when in doubt, write for clarity and measure. (3) Substring search (SS) over a long range can be more expensive than a fixed-position CH comparison; use the smallest search range that is correct.
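The sketch below shows what points (2) and (3) might look like in practice. The layout is assumed: status at position 10, and a 40-byte text area at positions 21 through 60 that may contain the string ERROR. Whether ordering the tests this way actually saves CPU depends on the implementation, so treat it as something to measure rather than a guarantee.

```
* Cheap test first: 1-byte compare at a fixed position.
* Broader test second: search positions 21-60 for ERROR.
  INCLUDE COND=(10,1,CH,EQ,C'A',AND,21,40,SS,EQ,C'ERROR')
* If ERROR can only start at position 21, a fixed-position
* compare avoids the search (alternative, not coded together):
*   INCLUDE COND=(10,1,CH,EQ,C'A',AND,21,5,CH,EQ,C'ERROR')
  SORT FIELDS=(1,8,CH,A)
```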

One INCLUDE or OMIT Per Run

You cannot code both INCLUDE and OMIT in the same DFSORT run for the same data path. You must choose one. If your requirement is "keep A and drop B," you have to express that as a single condition—e.g. INCLUDE with a condition that is true for A and false for B (e.g. "keep (region=North and amount>0) or (region=South)"), or OMIT with the opposite logic. Alternatively, use multiple OUTFIL outputs: one OUTFIL with INCLUDE for one subset, another OUTFIL with INCLUDE for another subset, and so on. Then the sort runs once and each output gets its own filter.
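The "keep (region=North and amount>0) or (region=South)" example above could be written as a single INCLUDE along these lines. The layout is an assumption for illustration: a 5-byte character region at position 11, a 7-byte zoned decimal amount at position 20, and an 8-byte sort key at position 1.

```
* Keep (region = NORTH and amount > 0) or (region = SOUTH).
* Assumed layout: region is 5 bytes CH at position 11,
* amount is 7 bytes zoned decimal at position 20.
  INCLUDE COND=((11,5,CH,EQ,C'NORTH',AND,20,7,ZD,GT,0),OR,
                11,5,CH,EQ,C'SOUTH')
  SORT FIELDS=(1,8,CH,A)
```

The inner parentheses make the grouping explicit, so the reader does not have to rely on AND/OR precedence to see which tests belong together.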

Combining with Other Optimizations

Filtering reduces the volume that the sort sees. You can combine that with other tuning: appropriate OPTION settings (e.g. EQUALS, SIZE), efficient SORT FIELDS (minimal key length), and enough sortwork space. The order in which you code the control statements in SYSIN does not change the processing order: DFSORT applies INCLUDE/OMIT before INREC and before the sort, so the filter always refers to field positions in the original input record. A well-written filter plus a well-tuned sort gives the best overall job performance.

Explain It Like I'm Five

Imagine you have to sort a big pile of cards, but only the red ones matter. If you take out all the non-red cards first and then sort the red ones, you have less work—you sort a small pile. If you sort the whole pile and then throw away the non-red cards, you did a lot of extra work. So we "filter" first: we only keep the red cards (or only throw away the non-red ones) before we sort. The computer does the same: it keeps or drops records before the sort step, so the sort step has less to do. When we have two boxes and we want red cards in one box and blue in the other, we can sort once and then put each card in the right box when we write the output—that's like OUTFIL with different filters for each output.

Exercises

  1. You need one output with only records where status (1 byte at 10) = 'A'. Should you use SYSIN INCLUDE or OUTFIL INCLUDE? Why?
  2. You need two outputs: one with status='A', one with status='B'. How can you do it with one sort and two OUTFIL specs?
  3. If 90% of records have amount=0 and you want to drop them, is it better to use INCLUDE (amount NE 0) or OMIT (amount EQ 0)? Discuss performance and clarity.
  4. Why can you not code both INCLUDE COND=(...) and OMIT COND=(...) in the same SYSIN?

Quiz

Test Your Knowledge

1. Why does filtering with INCLUDE/OMIT before the sort usually improve performance?

  • It does not
  • Fewer records participate in the sort phase, so less data to sort and write to sortwork
  • INCLUDE runs in parallel
  • OMIT is faster than INCLUDE

2. If you want to keep 5% of records and drop 95%, should you use INCLUDE or OMIT?

  • OMIT—fewer conditions to evaluate for the majority of records
  • INCLUDE—so only 5% of records are passed to the sort phase
  • Either is the same
  • Use OUTFIL only

3. What is a downside of using OUTFIL INCLUDE/OMIT instead of SYSIN INCLUDE/OMIT?

  • OUTFIL cannot filter
  • Filtering in OUTFIL happens after the sort, so all records are sorted first—more sortwork and CPU
  • OUTFIL is faster
  • No downside

4. Does the order of conditions in a long AND chain affect performance?

  • Yes—put the most selective condition first so short-circuit evaluation can skip the rest when possible
  • No—all conditions are always evaluated
  • Only for OR
  • Only for numeric fields

5. When might you filter in OUTFIL instead of (or in addition to) SYSIN INCLUDE/OMIT?

  • Never
  • When you have multiple OUTFIL outputs and each needs a different filter—e.g. one file with status=A, another with status=B
  • Only for reports
  • When INCLUDE is not supported