The DFSORT OPTION keywords EQUALS and NOEQUALS control what happens when two or more records have the same sort key. With EQUALS, DFSORT preserves the relative order of such records—the order they had in the input. That is called a stable sort. With NOEQUALS, DFSORT does not guarantee that order; it may output them in a different sequence. NOEQUALS can allow a faster or more efficient algorithm. This page explains the difference in detail, when to use each, and how EQUALS/NOEQUALS affect SUM (deduplication) and multi-key sorting.
When you specify SORT FIELDS= (or MERGE FIELDS=), every record has a sort key: the bytes at the positions you specified, interpreted in the format you gave (CH, PD, etc.). When two records have the same key, they are equal on that key. For example, if you sort on bytes 1–10 as character data and two records both have "CUSTOMER1" followed by a blank in positions 1–10, those two records have equal keys. The question is: in the sorted output, which one comes first? EQUALS and NOEQUALS control that.
OPTION EQUALS tells DFSORT to use a stable sort. A stable sort has this property: when two records have equal keys, their relative order in the output is the same as their relative order in the input. So if record A was read before record B and both have the same key, then in the output A will still appear before B. That makes the result predictable: you can rely on input order as a tie-breaker when keys are equal. For example, if you sort by region and your input was already in date order within each region, with EQUALS the output will be in region order and within each region still in date order. That is useful when you have a secondary ordering in the input that you want to preserve when the primary key is equal.
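As a rough sketch of the region example, the control statements below assume a hypothetical record layout with a 4-byte region code in bytes 1–4 and a date later in the record. The sample records in the comment lines are invented for illustration only.

```jcl
* Hypothetical layout: region code in bytes 1-4 (character).
* Input order        Output with EQUALS
* EAST 2024-01-05    EAST 2024-01-05
* WEST 2024-01-06    EAST 2024-01-09
* EAST 2024-01-09    WEST 2024-01-06
* WEST 2024-01-12    WEST 2024-01-12
  OPTION EQUALS
  SORT FIELDS=(1,4,CH,A)
```

Because only the region is a sort key, the date order within EAST and within WEST comes purely from the input order, which EQUALS carries through.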
OPTION NOEQUALS tells DFSORT that it does not need to preserve the order of equal-key records, so when two records have the same key, DFSORT may output them in either order. That freedom can let DFSORT use a different internal algorithm, often one that is faster or needs fewer resources (for example, less CPU time or less sort work space). NOEQUALS is therefore a performance option: use it when you only care that records are grouped by key and not about their order within each group. If you do care (e.g. you want the first record in input order to be the first in output for each key), you must use EQUALS.
| Aspect | EQUALS | NOEQUALS |
|---|---|---|
| Order of equal-key records | Preserved (same as input relative order) | Not guaranteed |
| Stable sort | Yes | No |
| Performance | May use more resources (stable algorithm) | May be faster / less sortwork |
| Predictable for SUM dedup | Yes; first in group is well-defined | Which record kept may vary |
You code OPTION EQUALS or OPTION NOEQUALS in the control statements (SYSIN). You can combine them with other options in the same OPTION statement, for example:

    OPTION EQUALS
    SORT FIELDS=(1,10,CH,A)

or

    OPTION NOEQUALS,STOPAFT=10000
    SORT FIELDS=(20,4,PD,D)

The OPTION statement usually appears before or near the SORT or MERGE statement. If you omit EQUALS and NOEQUALS, the default depends on your installation; to be sure of behavior, code the one you want explicitly.
When you have multiple sort keys (e.g. primary bytes 1–10 CH A, secondary bytes 11–14 PD D), two records are “equal” only when they are equal on all keys. DFSORT first orders by the primary key; when the primary key is equal, it uses the secondary key; only when both are equal does EQUALS vs NOEQUALS decide the relative order. In other words, EQUALS/NOEQUALS apply only to records that are equal on every key you specified. If two records differ on any key, the key order determines the output order regardless of EQUALS.
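The two-key case can be sketched as follows. The field positions are illustrative assumptions: a 10-byte character primary key starting at byte 1 and a 4-byte packed-decimal secondary key starting at byte 11.

```jcl
* Primary key: bytes 1-10, character, ascending.
* Secondary key: 4 bytes from byte 11, packed decimal, descending.
  OPTION EQUALS
  SORT FIELDS=(1,10,CH,A,11,4,PD,D)
```

Here EQUALS only matters for records whose bytes 1–10 and bytes 11–14 both match; every other pair is ordered by the keys themselves.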
When you use SUM FIELDS=NONE (or SUM with fields) to collapse duplicate keys, DFSORT keeps one record per key (and optionally sums numeric fields). The record that is kept is the first in the sorted order for that key. With EQUALS, that “first” is well-defined: it is the record that was first in the input among all records with that key (because the sort is stable). So you can predict which record is kept. For example, if you sort by customer ID and your input has transactions in time order, the kept record is the first transaction for that customer. With NOEQUALS, the order among equal-key records is not guaranteed, so which record SUM keeps is not guaranteed. If you need predictable “keep first” or “keep last” behavior when removing duplicates, use EQUALS and make sure the input arrives in the order you want within each key (e.g. earliest transaction first). Note that SUM treats records as duplicates only when they are equal on every SORT field, so adding a time field to SORT FIELDS in the same step would change which records count as duplicates; order the input in an earlier step instead.
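A minimal sketch of deduplication with a predictable kept record, assuming a hypothetical layout with the customer ID in bytes 1–10 and input that already arrives in time order:

```jcl
* Hypothetical layout: customer ID in bytes 1-10 (character).
* Input is assumed to be in time order already.
  OPTION EQUALS
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=NONE
```

With these statements the record kept for each customer ID should be the one that appeared first in the input. Swapping EQUALS for NOEQUALS still keeps one record per ID, but without any promise about which one.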
In general, EQUALS (stable sort) may use slightly more CPU or sortwork than NOEQUALS because the algorithm must preserve order. The difference is often small, but for very large sorts with many equal keys, NOEQUALS can show a noticeable improvement. So if you do not need stable behavior, coding OPTION NOEQUALS can be a simple performance tuning step. If you need stable behavior, the predictability of EQUALS is usually worth the possible extra cost.
When you use MERGE, records from multiple streams are combined. For records with equal keys that come from different streams (or the same stream), the order in the merged output may be affected by EQUALS or NOEQUALS: with EQUALS, the product may preserve a well-defined order (e.g. by stream and position); with NOEQUALS, the order among equal-key records may not be guaranteed. If you merge and then use SUM or rely on order within key groups, specify EQUALS so the behavior is predictable. See your product documentation for MERGE-specific details.
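For illustration, a merge that asks for a predictable order among equal keys might look like the sketch below. The key position (bytes 1–10, character) is an assumption, and each input data set is assumed to be sorted on that key already, as MERGE requires.

```jcl
* Hypothetical layout: merge key in bytes 1-10 (character).
* Every input data set must already be in this key order.
  OPTION EQUALS
  MERGE FIELDS=(1,10,CH,A)
```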
Imagine you are lining up kids by the first letter of their name. Two kids named “Anna” and “Alex” both start with “A.” Who goes first? With EQUALS, we remember the order they were standing before: if Anna was in front of Alex, Anna stays in front. With NOEQUALS, the teacher can put either one first—we don’t care. So EQUALS means “when the sort key is the same, keep the same order they had.” NOEQUALS means “when the key is the same, any order is fine,” and that can make the job a bit faster.
1. What does OPTION EQUALS do?
2. When might you choose NOEQUALS over EQUALS?
3. You use SUM FIELDS=NONE to keep one record per key. Why does EQUALS matter?
4. What is a "stable" sort?
5. Is EQUALS or NOEQUALS the default in DFSORT?