When you sort a file, many records can have the same sort key (or the same combination of keys). How DFSORT handles those duplicates depends on what you ask for: (1) Keep all—by default, every record is written to the output; duplicates stay, and their order can be preserved with OPTION EQUALS (stable sort) or left unspecified with NOEQUALS. (2) Remove duplicates—use the SUM statement to collapse records with the same control (sort) fields; with SUM FIELDS=NONE you keep one record per unique key and drop the rest. (3) Aggregate—SUM can also add up numeric fields for each key and output one record per key with totals. So "handling duplicates" means either keeping them in a defined order (EQUALS) or reducing them (SUM). This page explains when to keep duplicates, when to remove them, and how to use EQUALS and SUM.
A plain SORT (no SUM) keeps every record. Records with the same sort key all appear in the output. So if you have 100 records and 20 have the same key, you still get 100 records out. The order of those 20 among themselves depends on OPTION EQUALS (preserve input order) or NOEQUALS (order not guaranteed). Use this when you need every record—e.g. all transactions per customer, or a report that lists every line—and you only care that they are sorted (and optionally in stable order within each key).
To remove duplicates and keep one record per key, add a SUM statement to the sort. SUM takes its control fields from the SORT statement: records whose sort keys are equal form one group. FIELDS= on SUM names only the numeric fields to total, so with SUM FIELDS=NONE nothing is summed and each group simply collapses to one record. After the sort, each unique key appears once. Which record of a group survives is predictable only when EQUALS is in effect, in which case it is the first record for that key in input order; with NOEQUALS it can be any record of the group. SUM has more options (e.g. summing amounts, overflow handling); see the SUM statement tutorial. For simple de-duplication, SORT plus SUM FIELDS=NONE is the usual approach.
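As a small illustration of de-duplication, the sketch below uses a made-up record layout (key in columns 1-4, a sequence tag in columns 6-7) and made-up sample records shown as comment lines. OPTION EQUALS is included so that the surviving record is predictable; the layout and data are assumptions for illustration only.

```
* Hypothetical input (key in cols 1-4, sequence tag in cols 6-7):
*   BBBB 01
*   AAAA 02
*   BBBB 03
*   AAAA 04
  OPTION EQUALS
  SORT FIELDS=(1,4,CH,A)
  SUM FIELDS=NONE
* With EQUALS in effect, the first input record for each key is kept:
*   AAAA 02
*   BBBB 01
```

Without EQUALS, the output would still contain one AAAA record and one BBBB record, but either record of each pair could be the one that survives.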
When you keep all duplicates, their order in the output may matter. OPTION EQUALS makes the sort stable: records with equal keys keep their input order. For example, if the input is in time order and you sort by department, with EQUALS the records within each department stay in time order. If EQUALS is not in effect (NOEQUALS is the IBM-supplied default, although an installation can change it), the order of records with the same key is not guaranteed. Use EQUALS when downstream processing depends on the order within duplicate keys.
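The department example can be made concrete with a short sketch. The record layout (department in columns 1-4, time in columns 6-10) and the sample records in the comment lines are hypothetical.

```
* Hypothetical input, already in time order
* (department in cols 1-4, time in cols 6-10):
*   SALE 09:00
*   ACCT 09:05
*   SALE 09:10
*   ACCT 09:20
  SORT FIELDS=(1,4,CH,A)
  OPTION EQUALS
* With EQUALS in effect, records keep their input order within each key:
*   ACCT 09:05
*   ACCT 09:20
*   SALE 09:00
*   SALE 09:10
```

With NOEQUALS, the ACCT records would still come before the SALE records, but the two records within each department could appear in either order.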
Keep all when: you need every record (e.g. detail report, all transactions); you are only sorting for order; or you will process duplicates in a later step. Remove duplicates when: you need one row per key (e.g. unique customer list, or one record per key for a join); or you want to aggregate (SUM with numeric fields). Choose EQUALS when you keep all and care about order within keys.
 SORT FIELDS=(1,10,CH,A)
 OPTION EQUALS
Sort by bytes 1–10; keep all records; preserve input order for records with the same key.
Sort by bytes 1–10 and keep one record per unique key (add OPTION EQUALS if the record kept must be the first input record for that key):
 SORT FIELDS=(1,10,CH,A)
 SUM FIELDS=NONE
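SUM does not name its own control fields; it groups records by the key given on the SORT statement. FIELDS= on SUM lists only the numeric fields to total, or NONE. See the SUM statement tutorial for full syntax (e.g. summary field positions, lengths, and formats).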
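The aggregation case (one record per key, with totals) can be sketched in the same style. The amount field here, an 8-byte zoned decimal value in columns 21-28, is an assumed layout chosen for illustration; adjust position, length, and format to the real record.

```
* Assumed layout: key in cols 1-10, amount in cols 21-28 (zoned decimal)
  SORT FIELDS=(1,10,CH,A)
* Group records by the sort key and total the amount field per group
  SUM FIELDS=(21,8,ZD)
```

Each key then appears once in the output, with columns 21-28 holding the total for that key. The other bytes of the output record come from whichever record of the group is kept, so specify OPTION EQUALS if those bytes matter. If a total might not fit in the field, see the overflow handling described in the SUM statement tutorial.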
Duplicates are like two people who have the same name. We can either keep everyone in line (and, with EQUALS, keep the order they arrived in) or say "one person per name" and keep only one person for each name (SUM). Keeping everyone is like a full list; keeping one per name is like a name-only list.
1. If you only sort (no SUM), what happens to records with duplicate sort keys?
2. How do you remove duplicate keys and keep only one record per key in DFSORT?
3. What does OPTION EQUALS do for duplicate keys?
4. When might you want to keep all duplicates (not use SUM)?
5. SUM FIELDS=NONE keeps one record per unique key. Which record is kept?