The SUM control statement does two related jobs: it collapses records that have the same sort key into one record per key, and it can sum specified numeric fields across those records. With SUM FIELDS=NONE, you keep one record for each unique key—so you remove duplicates based on the sort key. With SUM FIELDS=(position,length,format,...), you keep one record per key and set the listed fields to the sum of those fields over all records in the group (other fields usually come from the first record). SUM runs after the sort or merge, so records must already be in key order. This page covers SUM FIELDS=NONE for deduplication, SUM FIELDS= for aggregation, the role of the sort key, and overflow handling (VLSHRT/NOVLSHRT).
SUM is applied in the output phase, after the sort or merge has put all records in key order. At that point, records with the same key are adjacent. SUM reads this sorted stream and, for each group of records that share the same key, produces one output record. If you specify SUM FIELDS=NONE, that one record is essentially the first of the group (or a representative record); no numeric fields are added. If you specify SUM FIELDS=(pos,len,format,...), the listed numeric fields in the output record are set to the sum of those fields across all records in the group; other fields typically come from the first record. So SUM both deduplicates by key and can aggregate numeric fields (totals, counts via summing a 1, etc.).
The most common use of SUM is to remove duplicate records based on a key. You sort by that key, then use SUM FIELDS=NONE. Every group of records with the same key is collapsed to one record. Example:
12SORT FIELDS=(1,10,CH,A) SUM FIELDS=NONE
Records are sorted by the first 10 bytes (character, ascending). Then SUM collapses: for each unique value of those 10 bytes, only one record is kept. So if you had three records with the same 10-byte key, you get one record in the output. The key in SUM is always the same as the sort key—the key that defines the group is the one you sorted on.
For multiple sort keys, the "key" for SUM is the full combination. So SORT FIELDS=(1,5,CH,A,10,3,CH,A) with SUM FIELDS=NONE keeps one record per unique (bytes 1–5, bytes 10–12) combination.
When you want to add up numeric fields for each key group, you list those fields in SUM FIELDS=. Each entry is (position, length, format). The format must be a numeric type: PD (packed decimal), ZD (zoned decimal), BI (binary), FI (fixed-point), or FL (floating-point). DFSORT adds the values in that field across all records in the group and puts the total in the output record. Non-summed fields in the output usually retain the value from the first record of the group. Example:
12SORT FIELDS=(1,10,CH,A) SUM FIELDS=(11,5,ZD,21,4,PD)
Groups are defined by the first 10 bytes. For each group, the 5-byte zoned-decimal field at positions 11–15 is summed, and the 4-byte packed-decimal field at positions 21–24 is summed. The output record has those summed values in 11–15 and 21–24; other positions come from the first record of the group. So you get one record per key with totals in the specified fields.
The key that defines "same group" for SUM is exactly the SORT FIELDS= (or MERGE FIELDS=) key. So you must sort by the field(s) you want to use for grouping. If you sort by customer ID, SUM collapses or sums by customer ID. If you sort by (region, dept), SUM groups by (region, dept). There is no separate "SUM key"—it is the sort key. That is why SUM is always used together with SORT (or MERGE): first you order by the key, then SUM processes the ordered stream and collapses/sums by that key.
When you sum numeric fields, the total can exceed the size of the output field (e.g. the sum of many large numbers does not fit in 5 bytes). By default, DFSORT may truncate the result to fit (VLSHRT behavior), or it may terminate with an overflow error (NOVLSHRT). VLSHRT (variable-length short) means: if the sum overflows, shorten/truncate so it fits in the field—you may lose digits. NOVLSHRT means: do not truncate; if the sum overflows, abend or signal an error so you can fix the field length or data. Use NOVLSHRT when you need to guarantee that no overflow is silently truncated. The exact option (OPTION VLSHRT, OPTION NOVLSHRT, or on the SUM statement) depends on your DFSORT version; see your documentation.
If you use INREC, the record that the sort (and SUM) sees is the reformatted record. So the positions in SUM FIELDS= refer to the record after INREC, not the original input. Make sure the positions and lengths you give in SUM FIELDS= match the INREC output layout. The same applies to the sort key: SORT FIELDS= also refers to the INREC record.
You can use SUM with OPTION COPY (no sort). In that case, the "key" order is the input order. Records that are adjacent in the input and have the same key (you still define the key for SUM) are collapsed or summed. So the input must already be in the right order for grouping—or you accept that only adjacent duplicates are collapsed. Usually you sort first so that all records with the same key are adjacent, then use SUM.
Imagine you have a stack of cards, each with a name and a number. You sort the cards by name so all "Alice" cards are together and all "Bob" cards are together. SUM is like a helper who takes each group: for all the Alice cards, they make one new card that says "Alice" and either (1) just keeps one of the numbers (SUM FIELDS=NONE—remove duplicates), or (2) adds up all the numbers and writes the total (SUM FIELDS=). So you end up with one card per name, and if you summed, that card has the total. The "name" is the sort key—it decides who is in the same group.
1. What does SUM FIELDS=NONE do?
2. When is SUM applied?
3. SUM FIELDS=(25,4,PD) does what?
4. What is the purpose of the sort key when using SUM?
5. What does VLSHRT mean when summing?