Does INCLUDE/OMIT improve DFSORT performance?

Yes. INCLUDE and OMIT are applied during the input phase, before the sort. Records that fail the condition are dropped and never reach the sort, so there is less sortwork I/O, less CPU spent on comparisons, and a smaller output. Filter as early as possible in the pipeline.

Should I use INCLUDE or OMIT for better performance?

Performance is effectively the same, because both are applied in the input phase. Choose based on clarity: use INCLUDE when the set you want to keep is easy to describe (e.g. status = A or B), and OMIT when the set to drop is easy to describe (e.g. invalid or test records). Avoid double negatives; keep whichever condition is simpler to write and maintain.

What is the difference between filtering in SYSIN vs OUTFIL?

SYSIN INCLUDE/OMIT filters before the sort: only records that pass are sorted. OUTFIL INCLUDE/OMIT filters after the sort: all records are sorted first, then each OUTFIL output can apply its own filter. For a single filtered output, use SYSIN to reduce sort workload. Use OUTFIL when you need different filters for different output files.

Does the complexity of the COND= expression affect performance?

Yes, but usually less than the benefit of filtering. Each condition is evaluated per record, so many conditions or expensive formats (e.g. long SS searches) add CPU. Still, filtering early with a moderate condition is usually better than sorting all records. Tune the condition only if measurement shows that filtering accounts for a significant share of CPU time.

Can I use both INCLUDE and OMIT in the same DFSORT run?

No. The INCLUDE and OMIT statements are mutually exclusive: you code one or the other in SYSIN. If you need both "include X" and "omit Y" logic, combine them into one condition, e.g. INCLUDE with X AND (not Y), where the negation is written by inverting the comparison (NE instead of EQ, for example) because COND offers AND and OR but no NOT operator. Alternatively, use multiple OUTFIL outputs with different filters.

Performance Considerations for Filtering - DFSORT INCLUDE OMIT Efficiency

Using INCLUDE or OMIT in DFSORT is not only about getting the right records; it also affects performance. Because INCLUDE and OMIT are applied during the input phase, before the sort, records that fail the condition are dropped and never participate in the sort. That means less data to move to sortwork, fewer comparisons, and a smaller output, so filtering early (in SYSIN) usually reduces CPU time, elapsed time, and resource use. This page covers when to filter in SYSIN versus in OUTFIL, why the INCLUDE vs OMIT choice is mostly about clarity rather than raw speed, how condition complexity can affect cost, and why you cannot use both an INCLUDE and an OMIT statement in the same run, so you must express your logic with one or the other (or with multiple OUTFIL outputs). Understanding these points helps you design efficient sort jobs and avoid unnecessary sortwork.

Filter Early: SYSIN INCLUDE/OMIT

INCLUDE and OMIT in the SYSIN control statements are applied as records are read from the input. A record that fails the condition is not passed to the sort phase at all, so the sort operates on a subset of the input. That subset is what gets written to sortwork, compared, and written to SORTOUT; the smaller it is, the less I/O and CPU the job uses. As a rule, filter as early as possible: if you can express your requirement with one INCLUDE or one OMIT in SYSIN, do it there. Do not rely on a later step or OUTFIL to do the same filter if the goal is a single output. Otherwise you sort the full input and then discard records, which wastes sortwork.
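As a minimal sketch of filtering in SYSIN, the step below keeps only records whose status byte is 'A' and sorts just those. The step name, the data set names, the 1-byte status field at position 10, and the 8-byte character key at position 1 are assumptions made for illustration, not taken from a real job.

```jcl
//FILTSORT EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=HYPO.INPUT.DATA,DISP=SHR
//SORTOUT  DD DSN=HYPO.OUTPUT.STATUSA,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(10,5),RLSE)
//SYSIN    DD *
* Hypothetical layout: key = cols 1-8, status = col 10.
* Only status 'A' records reach the sort phase.
  INCLUDE COND=(10,1,CH,EQ,C'A')
  SORT FIELDS=(1,8,CH,A)
/*
```

Because the filter is an INCLUDE statement rather than an OUTFIL operand, records with any other status are dropped as they are read and never go to sortwork.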

SYSIN vs OUTFIL Filtering

SYSIN INCLUDE/OMIT applies to the entire run: one logical filter, and only records that pass go to the sort. OUTFIL can also specify INCLUDE/OMIT, but that is applied per OUTFIL output, and it happens after the sort. So if you code only OUTFIL INCLUDE (and no SYSIN INCLUDE/OMIT), every input record is read, sorted, and then the OUTFIL filter is applied when building that output. The sort phase still processes all records. Use SYSIN INCLUDE/OMIT when you have a single filter for the whole job. Use OUTFIL INCLUDE/OMIT when you have multiple output files and each needs a different filter—e.g. one OUTFIL with INCLUDE for status=A, another with INCLUDE for status=B. Then you sort once and each OUTFIL copy gets only the records that pass its condition.

Where to filter
Where	When	Benefit
SYSIN INCLUDE/OMIT	Single filter for the whole run	Filter before sort; fewer records sorted
OUTFIL INCLUDE/OMIT	Different filter per output file	One sort; multiple filtered outputs
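The multiple-output case might look like the sketch below: one sort, two OUTFIL outputs, each with its own filter. The ddnames OUTA and OUTB (which would need matching DD statements in the step) and the field positions are hypothetical. The INCLUDE statement is an optional addition that drops, before the sort, any record that neither output wants.

```jcl
//SYSIN    DD *
* Hypothetical layout: key = cols 1-8, status = col 10.
* Optional pre-filter: drop records that no output needs.
  INCLUDE COND=(10,1,CH,EQ,C'A',OR,10,1,CH,EQ,C'B')
  SORT FIELDS=(1,8,CH,A)
* One sort, two outputs, each with its own filter.
  OUTFIL FNAMES=OUTA,INCLUDE=(10,1,CH,EQ,C'A')
  OUTFIL FNAMES=OUTB,INCLUDE=(10,1,CH,EQ,C'B')
/*
```

Without the INCLUDE statement the outputs are the same, but every input record is sorted before the OUTFIL filters discard the ones that match neither status.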

INCLUDE vs OMIT: Which to Use?

From a performance perspective, INCLUDE and OMIT are equivalent: both are evaluated in the input phase, and for logically opposite conditions exactly the same records are dropped either way. The difference is logical: INCLUDE keeps records that satisfy the condition; OMIT drops records that satisfy the condition. Choose based on clarity. If the set you want to keep is small and easy to describe (e.g. "keep status = A or B"), use INCLUDE. If the set you want to drop is small and easy to describe (e.g. "drop invalid or test records"), use OMIT. Avoid double negatives (e.g. OMIT with a long NOT-like condition) so the next programmer can understand the intent quickly. Performance is not the deciding factor; maintainability is.
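To make the equivalence concrete, here is a sketch of the same filter written both ways. It assumes a hypothetical 5-byte packed-decimal amount at position 21 and an 8-byte key at position 1. Alternative B is shown as a comment because only one of the two statements can be coded in a run.

```jcl
//SYSIN    DD *
* Alternative A: describe what to drop (zero amounts).
  OMIT COND=(21,5,PD,EQ,0)
  SORT FIELDS=(1,8,CH,A)
* Alternative B: describe what to keep. Use it instead of A,
* never together with it:
*   INCLUDE COND=(21,5,PD,NE,0)
/*
```

Both forms pass the same records to the sort, so the choice comes down to which one states the intent more directly.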

Condition Complexity

Each record is tested against your COND= expression, so the cost per record is the cost of evaluating the conditions in it. A long chain of AND/OR with many field reads and comparisons adds CPU. So: (1) Keep the condition as simple as possible while still correct. (2) If the product short-circuits (stops as soon as the result is known), ordering can reduce work: in an AND chain, put first the test that is false most of the time; in an OR chain, put first the test that is true most of the time. This is implementation-dependent; when in doubt, write for clarity and measure. (3) Substring search (SS) over a long range can be more expensive than a fixed-position CH comparison; use the smallest search range that is correct.
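The substring-search point can be illustrated with a short sketch. It assumes 80-byte records in which the code 'ERR', when present, always sits in positions 15-17; that layout is hypothetical. The two conditions select the same records only if the code cannot appear anywhere else in the record.

```jcl
//SYSIN    DD *
* Broad form: searches all 80 bytes of every record for 'ERR'.
*   INCLUDE COND=(1,80,SS,EQ,C'ERR')
* Narrow form: one fixed-position compare on cols 15-17.
  INCLUDE COND=(15,3,CH,EQ,C'ERR')
  SORT FIELDS=(1,8,CH,A)
/*
```

Whether the difference is measurable depends on record length and volume, so treat the narrow form as the default and measure before tuning further.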

One INCLUDE or OMIT Per Run

You cannot code both INCLUDE and OMIT in the same DFSORT run for the same data path. You must choose one. If your requirement is "keep A and drop B," you have to express that as a single condition—e.g. INCLUDE with a condition that is true for A and false for B (e.g. "keep (region=North and amount>0) or (region=South)"), or OMIT with the opposite logic. Alternatively, use multiple OUTFIL outputs: one OUTFIL with INCLUDE for one subset, another OUTFIL with INCLUDE for another subset, and so on. Then the sort runs once and each output gets its own filter.
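The "keep (region=North and amount>0) or (region=South)" requirement above could be written as a single INCLUDE roughly as follows. The 5-byte region field at position 30, the packed amount at position 21, and the uppercase region values are assumptions for illustration.

```jcl
//SYSIN    DD *
* Keep North records with a positive amount, plus all South
* records. The inner parentheses make the AND grouping explicit.
  INCLUDE COND=((30,5,CH,EQ,C'NORTH',AND,21,5,PD,GT,0),OR,
                30,5,CH,EQ,C'SOUTH')
  SORT FIELDS=(1,8,CH,A)
/*
```

A "drop" requirement is folded into the same statement by inverting a comparison, for example NE in place of EQ, rather than by adding a separate OMIT.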

Combining with Other Optimizations

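Filtering reduces the volume that the sort sees. You can combine that with other tuning: appropriate OPTION settings (e.g. NOEQUALS when the original order of equal-key records does not need to be preserved, or FILSZ/SIZE to tell DFSORT how many records to expect), efficient SORT FIELDS (minimal key length), and enough sortwork space. Also keep the processing order in mind: DFSORT applies INCLUDE/OMIT first, then INREC, then the sort, regardless of the order in which the statements appear in SYSIN. INCLUDE/OMIT conditions therefore refer to original input positions, while SORT FIELDS refers to positions after any INREC reformatting. A well-written filter plus a well-tuned sort gives the best overall job performance.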
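A sketch of a filter combined with other tuning is shown below. It assumes fixed-length records with a hypothetical layout, and the choice of NOEQUALS and of an INREC that shortens the record are illustrative assumptions, not recommendations for every job. DFSORT applies INCLUDE to the original record, then INREC, then the sort.

```jcl
//SYSIN    DD *
* Hypothetical fixed-length layout: key = cols 1-8,
* status = col 10, packed amount = cols 21-25.
* NOEQUALS assumes the order of equal-key records is irrelevant.
  OPTION NOEQUALS
* INCLUDE tests the original input positions.
  INCLUDE COND=(10,1,CH,EQ,C'A')
* INREC shortens each surviving record to key + amount (13 bytes).
  INREC BUILD=(1,8,21,5)
* SORT FIELDS refers to positions in the reformatted record.
  SORT FIELDS=(1,8,CH,A)
/*
```

Here the filter cuts the number of records and INREC cuts the bytes per record, so both reduce what is written to sortwork.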

Explain It Like I'm Five

Imagine you have to sort a big pile of cards, but only the red ones matter. If you take out all the non-red cards first and then sort the red ones, you have less work—you sort a small pile. If you sort the whole pile and then throw away the non-red cards, you did a lot of extra work. So we "filter" first: we only keep the red cards (or only throw away the non-red ones) before we sort. The computer does the same: it keeps or drops records before the sort step, so the sort step has less to do. When we have two boxes and we want red cards in one box and blue in the other, we can sort once and then put each card in the right box when we write the output—that's like OUTFIL with different filters for each output.

Exercises

You need one output with only records where status (1 byte at 10) = 'A'. Should you use SYSIN INCLUDE or OUTFIL INCLUDE? Why?
You need two outputs: one with status='A', one with status='B'. How can you do it with one sort and two OUTFIL specs?
If 90% of records have amount=0 and you want to drop them, is it better to use INCLUDE (amount NE 0) or OMIT (amount EQ 0)? Discuss performance and clarity.
Why can you not code both INCLUDE COND=(...) and OMIT COND=(...) in the same SYSIN?

Quiz

Test Your Knowledge

1. Why does filtering with INCLUDE/OMIT before the sort usually improve performance?

It does not
Fewer records participate in the sort phase, so less data to sort and write to sortwork
INCLUDE runs in parallel
OMIT is faster than INCLUDE

2. If you want to keep 5% of records and drop 95%, should you use INCLUDE or OMIT?

OMIT—fewer conditions to evaluate for the majority of records
INCLUDE—so only 5% of records are passed to the sort phase
Either is the same
Use OUTFIL only

3. What is a downside of using OUTFIL INCLUDE/OMIT instead of SYSIN INCLUDE/OMIT?

OUTFIL cannot filter
Filtering in OUTFIL happens after the sort, so all records are sorted first—more sortwork and CPU
OUTFIL is faster
No downside

4. Does the order of conditions in a long AND chain affect performance?

Yes—put the most selective condition first so short-circuit evaluation can skip the rest when possible
No—all conditions are always evaluated
Only for OR
Only for numeric fields

5. When might you filter in OUTFIL instead of (or in addition to) SYSIN INCLUDE/OMIT?

Never
When you have multiple OUTFIL outputs and each needs a different filter—e.g. one file with status=A, another with status=B
Only for reports
When INCLUDE is not supported

Related Concepts

INCLUDE statement

OUTFIL statement

OPTION statement
