Group aggregation means combining many records that share the same key into summary information: one row per key with a total, count, minimum, maximum, or average, without writing a COBOL or other program. DFSORT supports this in two main ways. First: SUM FIELDS= keeps one record per sort key and replaces specified numeric fields with their sum (or min/max) across all records in that group—so you get one aggregated data record per key. Second: OUTFIL SECTIONS= with TRAILER lets you write one summary line per group containing COUNT=, TOTAL=, MIN=, MAX=, AVG=, and fixed text; with NODETAIL you suppress the detail records and get only those summary lines. This tutorial shows how to do group aggregation using only DFSORT control statements.
Many batch jobs need "one row per department with total sales" or "one line per region with count and average." You could write a COBOL program that reads sorted input, detects control breaks, and accumulates totals. But if the only thing you need is sum, count, min, max, or average per key, DFSORT can do it with SUM or OUTFIL SECTIONS—no program to compile, test, or maintain. That reduces code and speeds up development. The tradeoff is that DFSORT aggregation is limited to what SUM and SECTIONS TRAILER support; for custom logic (e.g. weighted average, median) you still need a program.
Sort by the key that defines the groups (e.g. department, region). Then specify SUM FIELDS=(position,length,format,...) for each numeric field you want to aggregate. DFSORT keeps the first record of each key group and sets those fields to the sum of the values across all records in the group. So the output has one record per unique sort key, and the SUM fields contain the totals. You can specify multiple fields (e.g. total sales and total cost). SUM also supports MIN and MAX: SUM FIELDS=(pos,len,format,MIN) or (pos,len,format,MAX) to keep the minimum or maximum value in the group instead of the sum. Overflow handling (e.g. when the sum exceeds the field length) can be controlled with options like VLSHRT; see the SUM and overflow tutorials for details.
Input: fixed-length records with department in 1–4 (CH) and amount in 20–27 (PD). Output: one record per department with department and total amount.
12SORT FIELDS=(1,4,CH,A) SUM FIELDS=(20,8,PD)
All records with the same value in 1–4 are grouped. SUM FIELDS=(20,8,PD) keeps one record per department and sets the amount in 20–27 to the sum of all amounts in that group. The first record's non-SUM fields (e.g. 1–19) are kept; only the SUM fields are replaced. So you get one aggregated row per department with no program.
When you want a report-style output—one line per group with count, total, average, or min/max—use OUTFIL with SECTIONS=. Sort by the group key first. Then specify SECTIONS=(position,length,format,...) with that key. In TRAILER1=, TRAILER2=, or TRAILER3= you can include: COUNT=(output_position,length,format) for the number of records in the section; TOTAL=(input_position,length,format,edit_mask) for the sum; MIN=, MAX=, AVG= for minimum, maximum, and average. You can mix literals, the key value, and these aggregates in one trailer line. If you specify NODETAIL, the detail (data) records are not written—only the section headers (if any) and trailers. So the output is a list of summary lines, one per group. That is group aggregation without a program: you get count, total, and average (or min/max) per group in one pass.
Sorted by department (1–4). OUTFIL with SECTIONS and TRAILER3 containing a literal, the department, count, and total. NODETAIL so only the summary lines are written.
12345SORT FIELDS=(1,4,CH,A) OUTFIL FNAMES=REPORT, SECTIONS=(1,4,CH, TRAILER3=(1,10,C'DEPT ',5,4,1,X,10,8,COUNT=(10,8,ZD),18,20,TOTAL=(20,8,PD,M11))), NODETAIL
Each section is defined by 1–4 (department). TRAILER3 writes one line per department: "DEPT ", the department value, a blank, the record count in 10,8,ZD format, and the total of the amount field (20,8,PD) with edit mask M11. NODETAIL means no detail records—only these trailer lines. So you get one aggregated line per department with count and total.
Use SUM when the output you want is a dataset of records—one record per key with summed (or min/max) numeric fields. The record layout is the same as the input (except the SUM fields are replaced). Use OUTFIL SECTIONS TRAILER when the output you want is a report—lines that contain literals, the key, count, total, average, etc. SECTIONS is ideal for "one line per group" human-readable reports. SUM is ideal for creating a summary file that will be used as input to another step. You can also use both in the same job: one OUTFIL with SUM-like logic (via a different technique) or a separate SORT step with SUM to produce a summary file, and another OUTFIL with SECTIONS TRAILER to produce a report.
| Goal | Approach |
|---|---|
| One row per key with sum (or min/max) | SORT FIELDS=(key). SUM FIELDS=(pos,len,format,...). One aggregated data record per key. |
| One summary line per group (count, total, avg) | SORT by key. OUTFIL SECTIONS=(key) TRAILER3=(COUNT=, TOTAL=, AVG=,...). Use NODETAIL for summary-only. |
| Both detail and summary lines | Same as above but omit NODETAIL; detail records and TRAILER lines both written. |
Imagine you have a list of lemonade sales: 5 lines for Monday, 3 for Tuesday, 4 for Wednesday. "Group aggregation" means making a short list that says: Monday 5 sales, Tuesday 3 sales, Wednesday 4 sales. You don't need to write a special program—DFSORT can do it. One way: it squashes all Monday lines into one line that has the total (SUM). Another way: at the end of each day's lines it writes a line that says "Monday: 5 sales, total $20" (that's the trailer). So you get one line per day with the summary, without writing any code.
1. What is "group aggregation" in DFSORT?
2. How do you get one row per key with the sum of a numeric field?
3. How do you get one summary line per group with count and total (e.g. "DEPT 1000 Count: 5 Total: 5000")?
4. What is the difference between SUM and OUTFIL SECTIONS TRAILER for aggregation?
5. Can you get both a count and a sum (and average) per group without a program?