What is the SUM statement in DFSORT?

The SUM statement collapses records that have the same sort key and optionally sums specified numeric fields. SUM FIELDS=NONE keeps one record per unique key (removes duplicates). SUM FIELDS=(position,length,format,...) keeps one record per key and sets the specified fields to the sum of those fields across all records in the group. SUM runs after the sort phase.

How do I remove duplicates in DFSORT?

Use SORT FIELDS= to define the key that identifies a duplicate, then SUM FIELDS=NONE. Records with the same key are collapsed to one record (typically the first of the group). So sort by the key, then SUM FIELDS=NONE to deduplicate.

SUM runs after the sort or merge phase. Records must be in key order so that records with the same key are adjacent. DFSORT then processes the sorted stream, grouping by key and applying SUM (collapse and/or sum numeric fields) to produce one record per group.

What formats can I sum with SUM FIELDS=?

You can sum numeric formats: PD (packed decimal), ZD (zoned decimal), BI (binary), FI (fixed-point), FL (floating-point). Each summed field is specified as (position,length,format). Character fields are not summed; use SUM FIELDS=NONE if you only want to remove duplicates.

What is VLSHRT vs NOVLSHRT in SUM?

When a summed value overflows the output field length, VLSHRT (default in some configurations) truncates the result to fit, avoiding an abend. NOVLSHRT causes DFSORT to terminate with an overflow error so you can increase the field size or fix the data. Use NOVLSHRT when you need to detect overflow.

SUM Statement - DFSORT SUM FIELDS Remove Duplicates and Sum Numeric

SUM Statement

The SUM control statement does two related jobs: it collapses records that have the same sort key into one record per key, and it can sum specified numeric fields across those records. With SUM FIELDS=NONE, you keep one record for each unique key—so you remove duplicates based on the sort key. With SUM FIELDS=(position,length,format,...), you keep one record per key and set the listed fields to the sum of those fields over all records in the group (other fields usually come from the first record). SUM runs after the sort or merge, so records must already be in key order. This page covers SUM FIELDS=NONE for deduplication, SUM FIELDS= for aggregation, the role of the sort key, and overflow handling (VLSHRT/NOVLSHRT).

Control Statements

What SUM Does

SUM is applied in the output phase, after the sort or merge has put all records in key order. At that point, records with the same key are adjacent. SUM reads this sorted stream and, for each group of records that share the same key, produces one output record. If you specify SUM FIELDS=NONE, that one record is essentially the first of the group (or a representative record); no numeric fields are added. If you specify SUM FIELDS=(pos,len,format,...), the listed numeric fields in the output record are set to the sum of those fields across all records in the group; other fields typically come from the first record. So SUM both deduplicates by key and can aggregate numeric fields (totals, counts via summing a 1, etc.).

SUM FIELDS=NONE: Removing Duplicates

The most common use of SUM is to remove duplicate records based on a key. You sort by that key, then use SUM FIELDS=NONE. Every group of records with the same key is collapsed to one record. Example:

text

1
2
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=NONE

Records are sorted by the first 10 bytes (character, ascending). Then SUM collapses: for each unique value of those 10 bytes, only one record is kept. So if you had three records with the same 10-byte key, you get one record in the output. The key in SUM is always the same as the sort key—the key that defines the group is the one you sorted on.

For multiple sort keys, the "key" for SUM is the full combination. So SORT FIELDS=(1,5,CH,A,10,3,CH,A) with SUM FIELDS=NONE keeps one record per unique (bytes 1–5, bytes 10–12) combination.

SUM FIELDS=(position,length,format,...): Summing Numeric Fields

When you want to add up numeric fields for each key group, you list those fields in SUM FIELDS=. Each entry is (position, length, format). The format must be a numeric type: PD (packed decimal), ZD (zoned decimal), BI (binary), FI (fixed-point), or FL (floating-point). DFSORT adds the values in that field across all records in the group and puts the total in the output record. Non-summed fields in the output usually retain the value from the first record of the group. Example:

text

1
2
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=(11,5,ZD,21,4,PD)

Groups are defined by the first 10 bytes. For each group, the 5-byte zoned-decimal field at positions 11–15 is summed, and the 4-byte packed-decimal field at positions 21–24 is summed. The output record has those summed values in 11–15 and 21–24; other positions come from the first record of the group. So you get one record per key with totals in the specified fields.

The Sort Key Is the Group Key

The key that defines "same group" for SUM is exactly the SORT FIELDS= (or MERGE FIELDS=) key. So you must sort by the field(s) you want to use for grouping. If you sort by customer ID, SUM collapses or sums by customer ID. If you sort by (region, dept), SUM groups by (region, dept). There is no separate "SUM key"—it is the sort key. That is why SUM is always used together with SORT (or MERGE): first you order by the key, then SUM processes the ordered stream and collapses/sums by that key.

Overflow: VLSHRT and NOVLSHRT

When you sum numeric fields, the total can exceed the size of the output field (e.g. the sum of many large numbers does not fit in 5 bytes). By default, DFSORT may truncate the result to fit (VLSHRT behavior), or it may terminate with an overflow error (NOVLSHRT). VLSHRT (variable-length short) means: if the sum overflows, shorten/truncate so it fits in the field—you may lose digits. NOVLSHRT means: do not truncate; if the sum overflows, abend or signal an error so you can fix the field length or data. Use NOVLSHRT when you need to guarantee that no overflow is silently truncated. The exact option (OPTION VLSHRT, OPTION NOVLSHRT, or on the SUM statement) depends on your DFSORT version; see your documentation.

SUM with INREC

If you use INREC, the record that the sort (and SUM) sees is the reformatted record. So the positions in SUM FIELDS= refer to the record after INREC, not the original input. Make sure the positions and lengths you give in SUM FIELDS= match the INREC output layout. The same applies to the sort key: SORT FIELDS= also refers to the INREC record.

SUM with OPTION COPY

You can use SUM with OPTION COPY (no sort). In that case, the "key" order is the input order. Records that are adjacent in the input and have the same key (you still define the key for SUM) are collapsed or summed. So the input must already be in the right order for grouping—or you accept that only adjacent duplicates are collapsed. Usually you sort first so that all records with the same key are adjacent, then use SUM.

Explain It Like I'm Five

Imagine you have a stack of cards, each with a name and a number. You sort the cards by name so all "Alice" cards are together and all "Bob" cards are together. SUM is like a helper who takes each group: for all the Alice cards, they make one new card that says "Alice" and either (1) just keeps one of the numbers (SUM FIELDS=NONE—remove duplicates), or (2) adds up all the numbers and writes the total (SUM FIELDS=). So you end up with one card per name, and if you summed, that card has the total. The "name" is the sort key—it decides who is in the same group.

Exercises

Write control statements to sort by bytes 1–8 (CH,A) and remove duplicates (one record per key).
You want one record per customer (bytes 1–10) with the total of a 4-byte packed amount at positions 21–24. Write SORT and SUM.
Why must the sort key and the SUM "group" key be the same? What would happen if you sorted by one key but wanted to sum by another?
What is the difference between VLSHRT and NOVLSHRT when a sum overflows the field?

Quiz

Test Your Knowledge

1. What does SUM FIELDS=NONE do?

Sums all numeric fields
Collapses records with the same sort key and keeps one record per key; no numeric summing
Deletes all records
Sorts by NONE

2. When is SUM applied?

Before the sort
During the input phase
After the sort phase, when records with the same key are adjacent
Only with OPTION COPY

3. SUM FIELDS=(25,4,PD) does what?

Sorts by position 25
For each group of records with the same sort key, sums the 4-byte packed-decimal field at position 25 and keeps one record with that sum
Copies 4 bytes from position 25
Omits position 25

4. What is the purpose of the sort key when using SUM?

It is ignored
It defines which records are grouped together for collapsing or summing
It only affects SUM FIELDS=NONE
It must be numeric

5. What does VLSHRT mean when summing?

Variable length short
If the sum overflows the field length, truncate (shorten) the result to fit
Sum only short fields
Validate length