How do I remove duplicates in DFSORT?

Use the SUM statement with the same control (sort) fields as your sort key, and SUM FIELDS=NONE if you do not need to add any numeric fields. That collapses records with the same key into one record per key. See SUM statement for full syntax and options (e.g. overflow handling).

What is the difference between EQUALS and SUM for duplicates?

OPTION EQUALS preserves the order of records with equal keys (stable sort); it does not remove duplicates. SUM collapses records with equal control fields and can output one record per key (e.g. SUM FIELDS=NONE) or one record with summed/aggregated fields. Use EQUALS when you need order among duplicates; use SUM when you want to remove or aggregate duplicates.

When should I keep duplicate keys in the output?

Keep all records when you need every occurrence—e.g. all transactions per account, all line items per order, or any report that must list every record. Sort order (and OPTION EQUALS for stable order within keys) still applies. Use SUM only when you want to reduce to one record per key or to aggregate.

Which record does SUM FIELDS=NONE keep for each key?

Typically the first record in sort order for that key (the one that would appear first after the sort). Product documentation defines the exact rule. So with ascending sort, the "smallest" record in the group (by any tie-breaker keys) is the one retained.

Handling Duplicates - DFSORT Duplicate Keys SUM EQUALS

Q: How does DFSORT handle duplicate sort keys?

By default, all records are kept; duplicates remain in the output. Their relative order can be preserved with OPTION EQUALS (stable sort) or left unspecified with NOEQUALS. To remove duplicates and keep one record per key, use SUM FIELDS=NONE (or SUM with control fields and no summary fields).

Handling Duplicates

When you sort a file, many records can have the same sort key (or the same combination of keys). How DFSORT handles those duplicates depends on what you ask for: (1) Keep all—by default, every record is written to the output; duplicates stay, and their order can be preserved with OPTION EQUALS (stable sort) or left unspecified with NOEQUALS. (2) Remove duplicates—use the SUM statement to collapse records with the same control (sort) fields; with SUM FIELDS=NONE you keep one record per unique key and drop the rest. (3) Aggregate—SUM can also add up numeric fields for each key and output one record per key with totals. So "handling duplicates" means either keeping them in a defined order (EQUALS) or reducing them (SUM). This page explains when to keep duplicates, when to remove them, and how to use EQUALS and SUM.

SORT Statement Deep Dive

Keeping All Records (Default)

A plain SORT (no SUM) keeps every record. Records with the same sort key all appear in the output. So if you have 100 records and 20 have the same key, you still get 100 records out. The order of those 20 among themselves depends on OPTION EQUALS (preserve input order) or NOEQUALS (order not guaranteed). Use this when you need every record—e.g. all transactions per customer, or a report that lists every line—and you only care that they are sorted (and optionally in stable order within each key).

Removing Duplicates with SUM

To remove duplicates and keep one record per key, use the SUM statement. You specify the same fields as the sort key as the "control" fields, and with SUM FIELDS=NONE you do not sum any numeric fields—you just collapse the group to one record. The record that is kept is typically the first in sort order for that key. So after the sort, each unique key appears once. SUM has more options (e.g. summing amounts, overflow handling); see the SUM statement tutorial. For simple de-duplication, SUM FIELDS=NONE with the appropriate control fields is the usual approach.

Order Among Duplicates: EQUALS

When you keep all duplicates, their order in the output may matter. OPTION EQUALS makes the sort stable: records with equal keys keep their input order. So if the input was in time order and you sort by department, with EQUALS the records within each department stay in time order. If you do not specify EQUALS (NOEQUALS default), the order of records with the same key is not guaranteed. So use EQUALS when downstream processing depends on the order within duplicate keys.

When to Keep vs Remove

Keep all when: you need every record (e.g. detail report, all transactions); you are only sorting for order; or you will process duplicates in a later step. Remove duplicates when: you need one row per key (e.g. unique customer list, or one record per key for a join); or you want to aggregate (SUM with numeric fields). Choose EQUALS when you keep all and care about order within keys.

Example: Keep All, Stable Order

text

1
2
  SORT FIELDS=(1,10,CH,A)
  OPTION EQUALS

Sort by bytes 1–10; keep all records; preserve input order for records with the same key.

Example: One Record per Key

Sort by bytes 1–10 and keep one record per unique key (first in sort order):

text

1
2
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=NONE

Control fields for SUM must match the sort key; see SUM statement for full syntax (e.g. control field positions and lengths).

Explain It Like I'm Five

Duplicates are when two people have the same name. We can either keep everyone in line (and maybe keep the order they arrived—EQUALS) or say "one person per name" and only keep the first for each name (SUM). Keeping everyone is like a full list; keeping one per name is like a name-only list.

Exercises

You need a unique list of customer IDs (one record per ID). Do you use SUM or just SORT? What SUM syntax?
You need all transactions sorted by customer, and within customer by date. Do you want EQUALS? Why?
What is the difference between OPTION EQUALS and SUM FIELDS=NONE for duplicate keys?

Quiz

Test Your Knowledge

1. If you only sort (no SUM), what happens to records with duplicate sort keys?

They are dropped
All are kept; they appear in the output in some order (input order if OPTION EQUALS, else implementation-dependent)
Only the first is kept
DFSORT abends

2. How do you remove duplicate keys and keep only one record per key in DFSORT?

OPTION NODUPS
SUM FIELDS=NONE—collapse records with the same control fields and output one record per group (the first or with summary fields)
SORT FIELDS=NONE
INCLUDE DUPS=1

3. What does OPTION EQUALS do for duplicate keys?

Removes them
Preserves the input order of records with equal keys in the output (stable sort)
Keeps only the first
Sorts duplicates last

4. When might you want to keep all duplicates (not use SUM)?

Never
When every record is needed—e.g. all transactions for each customer, or a report that lists every line; you just want them sorted (and maybe in stable order with EQUALS)
Only for character keys
Only when EQUALS is on

5. SUM FIELDS=NONE keeps one record per unique key. Which record is kept?

Random
Typically the first record in sort order for that key (after the sort); product behavior is defined in the SUM documentation
The last
The one with the largest second key

Handling Duplicates

Keeping All Records (Default)

Removing Duplicates with SUM

Order Among Duplicates: EQUALS

When to Keep vs Remove

Example: Keep All, Stable Order

Example: One Record per Key

Explain It Like I'm Five

Exercises

Quiz

Test Your Knowledge

Related Concepts

SUM statement

Stable sorting

Key sequencing logic

Related Pages