What is aggregation in DFSORT?

Aggregation means grouping records by a key and computing a value per group (e.g. sum, count). In DFSORT you do this with SORT FIELDS= (to define the group key) and SUM FIELDS= (to sum numeric fields). Records with the same sort key are grouped; SUM outputs one record per group with the summed values in the specified fields. Non-summed fields usually come from the first record of the group.

How do you sum multiple numeric fields in DFSORT?

List each field in SUM FIELDS= as (position, length, format). For example SUM FIELDS=(11,5,ZD,21,4,PD) sums the 5-byte zoned decimal at 11–15 and the 4-byte packed decimal at 21–24. Each group (defined by the sort key) produces one record with both totals. Ensure positions and formats match your record layout (after INREC if used).

Can you use SUM with OPTION COPY (no sort)?

Yes. With OPTION COPY, the input order is preserved. SUM still groups by the key you specify, but only adjacent records with the same key are grouped. So if the input is not sorted by that key, records with the same key in different places will not be summed together. Normally you sort first so all records with the same key are adjacent, then SUM.

What formats can SUM FIELDS= use?

Numeric formats only: PD (packed decimal), ZD (zoned decimal), BI (binary), FI (fixed-point), FL (floating-point). Specify (position, length, format) for each field. The format must match the actual data (e.g. PD for packed, ZD for zoned). Character or invalid format can cause incorrect results or abends.

How do aggregation (SUM) and report totals (OUTFIL TRAILER1) differ?

SUM produces one output record per group (e.g. one per department) with summed fields in the record itself—record-level aggregation. OUTFIL TRAILER1 with TOTAL= or COUNT= adds a separate trailer line at the end of the report with a grand total or count. So SUM = aggregated records in the dataset; TRAILER1 = summary line(s) at the end of a report. You can use both: SUM for section totals and TRAILER1 for grand total.

Aggregation Basics

Aggregation means grouping records by a key and computing a single value per group—for example, the sum of a numeric field, or a count. In DFSORT the main tool for this is the SUM statement with SUM FIELDS=(position, length, format,...). You first sort by the key that defines the group (SORT FIELDS=). Records with the same key are then adjacent; SUM adds the specified numeric fields across each group and outputs one record per group with those totals. Non-summed fields in the output usually come from the first record of the group. This page covers the idea of group-by and sum, the role of the sort key, which data types can be summed (PD, ZD, BI, FI, FL), summing multiple fields, overflow (VLSHRT vs NOVLSHRT), and using INREC to prepare data for aggregation.

SUM Statement

Progress0 of 0 lessons

What Aggregation Means

You have many detail records (e.g. one per transaction) and you want one record per group (e.g. per customer or per department) with a total (e.g. sum of amount). That is aggregation: group by a key, then compute a value (sum, count, and in some products min/max/avg) per group. In DFSORT, the group is defined by the SORT FIELDS= key. SUM FIELDS= then specifies which numeric fields to add up. So you get one output record per unique key value, with the summed fields containing the total and other fields typically copied from the first record of the group.

The Sort Key Is the Group Key

There is no separate “group by” key. The key that defines the group is exactly the SORT FIELDS= (or MERGE FIELDS=) key. So you must sort by the field(s) you want to aggregate on. If you sort by customer ID, SUM groups by customer ID. If you sort by (region, department), SUM groups by (region, department). After the sort, all records with the same key are adjacent; SUM reads that stream and, for each run of identical keys, produces one record with the summed values. So the order of operations is: sort by key → SUM (group by same key, sum specified fields) → output.

SUM FIELDS= Syntax and Numeric Formats

Each summed field is specified as (position, length, format). Position and length are the starting column and length of the field in the record (after INREC if used). Format must be a numeric type. Allowed formats typically include:

Numeric formats for SUM FIELDS=
Format	Name	Typical use
PD	Packed decimal	Common for amounts; compact
ZD	Zoned decimal	One digit per byte; sign in last byte
BI	Binary	Fullword/halfword integers
FI	Fixed-point	Fixed-point numeric
FL	Floating-point	Floating-point numeric

If you specify the wrong format (e.g. ZD for a packed field), the bytes are interpreted incorrectly and the sum will be wrong or you may get a S0C7. So the format must match how the data is actually stored.

Example: Sum One Field by Department

Input: records with department at positions 30–39 and a 3-byte zoned decimal “marks” field at positions 45–47. Requirement: one record per department with the sum of marks.

text

1
2
  SORT FIELDS=(30,10,CH,A)
  SUM FIELDS=(45,3,ZD)

Records are sorted by department (30–39). For each department, the 3-byte ZD field at 45–47 is summed. The output has one record per department; the record is based on the first record of the group, with positions 45–47 replaced by the sum. So you get department-level totals in the summed field.

Summing Multiple Fields

You can list several fields in SUM FIELDS=. Each is (position, length, format). For example SUM FIELDS=(11,5,ZD,21,4,PD) sums the 5-byte zoned decimal at 11–15 and the 4-byte packed decimal at 21–24. Both sums are computed for each group; the output record has the first group’s key and non-summed data (from the first record) and both totals in positions 11–15 and 21–24. So you can aggregate multiple numeric fields in one SUM statement.

Example: sum two fields

text

1
2
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=(11,5,ZD,21,4,PD)

One record per unique (1–10). The 5-byte ZD at 11–15 and the 4-byte PD at 21–24 are summed. Other positions (e.g. 1–10, 16–20, 25–80) come from the first record of the group.

Overflow: VLSHRT and NOVLSHRT

When you sum many large numbers, the total can exceed the size of the output field (e.g. the sum does not fit in 5 bytes). Two behaviors are available. VLSHRT (variable-length short): if the sum overflows, truncate it to fit in the field so the job continues; you may lose high-order digits. NOVLSHRT: do not truncate; if the sum overflows, DFSORT terminates with an error (e.g. ICE or abend) so you can increase the field length or fix the data. Use NOVLSHRT when you need to guarantee that no overflow is silently lost. The option can be specified as OPTION NOVLSHRT or on the SUM statement (e.g. SUM FIELDS=(...),NOVLSHRT); see your manual.

Example: sum with overflow check

text

1
2
3
  OPTION NOVLSHRT
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=(11,5,ZD),NOVLSHRT

If the sum of the 5-byte ZD field overflows 5 bytes, the step will fail instead of truncating. Use this when the result must be correct and you prefer to fix the layout or data rather than accept a truncated total.

INREC Before SUM: Reformat and Convert

If the input record is not in a form that SUM can use (e.g. the “amount” is in character or signed form), use INREC to reformat or convert before the sort and SUM. INREC runs before the sort; the record that SORT and SUM see is the INREC output. So the positions and lengths in SORT FIELDS= and SUM FIELDS= refer to the record after INREC. For example, if the input has a signed character amount at 20–25, you might use INREC to convert it to ZD in a fixed position, then SORT and SUM on that layout. That way SUM receives a proper numeric field.

Aggregation vs Report Totals

SUM produces aggregated records: one record per group with summed fields in the record. That is record-level aggregation. OUTFIL TRAILER1 (and TRAILER3 with SECTIONS) can add summary lines at the end of a report (or after each section) with COUNT= or TOTAL= for grand total or section total. So SUM = aggregated dataset (many groups, one row per group); TRAILER = extra line(s) with totals. You can use both: e.g. SUM to build department-level records, then OUTFIL to write a report with TRAILER1 for grand total.

Explain It Like I'm Five

Imagine you have a pile of receipts, each with a store name and an amount. You sort them so all “Store A” receipts are together and all “Store B” receipts are together. Aggregation is like having a helper take each pile: for Store A they add up all the amounts and write one line “Store A total: $500.” For Store B they do the same. So you end up with one line per store and the total for that store. In the computer, the “store” is the sort key, and SUM FIELDS= is the “add up the amounts” step. The helper has to know what kind of number is on the receipt (packed, zoned, etc.) so they add correctly—that’s the format. If the total is too big to fit in the box (overflow), VLSHRT means “chop it to fit”; NOVLSHRT means “stop and tell me so I can get a bigger box.”

Exercises

You have records with customer ID at 1–8 and amount (4-byte PD) at 21–24. Write SORT and SUM to get one record per customer with the sum of amount.
Sum two fields: a 5-byte ZD at 30 and a 4-byte PD at 40. Sort by bytes 1–10. Write the control statements.
Why would you specify NOVLSHRT when summing? What could go wrong if you use VLSHRT and the sum overflows?
If the input has a 6-byte character field that looks like a signed number (e.g. -02963), can you sum it directly with SUM FIELDS=(...,6,ZD)? What might you do instead?

Quiz

Test Your Knowledge

1. What defines the "group" when using SUM FIELDS=(...)?

The SUM FIELDS positions
The SORT FIELDS= key—records with the same sort key are in the same group and are summed together
INREC only
The first record

2. Which data types can you sum with SUM FIELDS=?

Only CH (character)
Numeric types: PD (packed decimal), ZD (zoned decimal), BI (binary), FI (fixed-point), FL (floating-point)
Only PD
Any type

3. What happens to non-summed fields in the output record?

They are set to zero
They usually come from the first record of the group (e.g. key and descriptive fields from the first record, summed fields from the total)
They are dropped
They are concatenated

4. Why would you use INREC before SUM?

To sort only
To reformat or convert data before the sort and SUM—e.g. convert character/signed data to ZD so SUM can add it correctly; positions in SUM FIELDS= then refer to the INREC output
To add headers
INREC is after SUM

5. What is the difference between VLSHRT and NOVLSHRT when a sum overflows the field?

They are the same
VLSHRT truncates the result to fit in the field (you may lose digits); NOVLSHRT causes an error/abend so you can fix the field size or data
NOVLSHRT truncates
Only VLSHRT is valid

Aggregation Basics

What Aggregation Means

The Sort Key Is the Group Key

SUM FIELDS= Syntax and Numeric Formats

Example: Sum One Field by Department

Summing Multiple Fields

Example: sum two fields

Overflow: VLSHRT and NOVLSHRT

Example: sum with overflow check

INREC Before SUM: Reformat and Convert

Aggregation vs Report Totals

Explain It Like I'm Five

Exercises

Quiz

Test Your Knowledge

Related Concepts

SUM statement

Removing duplicates with SUM

Report totals

Related Pages