MainframeMaster

Group Aggregation Without Programs

Group aggregation means combining many records that share the same key into summary information: one row per key with a total, count, minimum, maximum, or average, without writing a COBOL or other program. DFSORT supports this in two main ways. First: SUM FIELDS= keeps one record per sort key and replaces specified numeric fields with their sum (or min/max) across all records in that group—so you get one aggregated data record per key. Second: OUTFIL SECTIONS= with TRAILER lets you write one summary line per group containing COUNT=, TOTAL=, MIN=, MAX=, AVG=, and fixed text; with NODETAIL you suppress the detail records and get only those summary lines. This tutorial shows how to do group aggregation using only DFSORT control statements.

Advanced Tricks
Progress0 of 0 lessons

Why Group Aggregation Without a Program?

Many batch jobs need "one row per department with total sales" or "one line per region with count and average." You could write a COBOL program that reads sorted input, detects control breaks, and accumulates totals. But if the only thing you need is sum, count, min, max, or average per key, DFSORT can do it with SUM or OUTFIL SECTIONS—no program to compile, test, or maintain. That reduces code and speeds up development. The tradeoff is that DFSORT aggregation is limited to what SUM and SECTIONS TRAILER support; for custom logic (e.g. weighted average, median) you still need a program.

SUM: One Record per Key with Sum (or Min/Max)

Sort by the key that defines the groups (e.g. department, region). Then specify SUM FIELDS=(position,length,format,...) for each numeric field you want to aggregate. DFSORT keeps the first record of each key group and sets those fields to the sum of the values across all records in the group. So the output has one record per unique sort key, and the SUM fields contain the totals. You can specify multiple fields (e.g. total sales and total cost). SUM also supports MIN and MAX: SUM FIELDS=(pos,len,format,MIN) or (pos,len,format,MAX) to keep the minimum or maximum value in the group instead of the sum. Overflow handling (e.g. when the sum exceeds the field length) can be controlled with options like VLSHRT; see the SUM and overflow tutorials for details.

Example: One Row per Department with Total Amount

Input: fixed-length records with department in 1–4 (CH) and amount in 20–27 (PD). Output: one record per department with department and total amount.

text
1
2
SORT FIELDS=(1,4,CH,A) SUM FIELDS=(20,8,PD)

All records with the same value in 1–4 are grouped. SUM FIELDS=(20,8,PD) keeps one record per department and sets the amount in 20–27 to the sum of all amounts in that group. The first record's non-SUM fields (e.g. 1–19) are kept; only the SUM fields are replaced. So you get one aggregated row per department with no program.

OUTFIL SECTIONS: One Summary Line per Group

When you want a report-style output—one line per group with count, total, average, or min/max—use OUTFIL with SECTIONS=. Sort by the group key first. Then specify SECTIONS=(position,length,format,...) with that key. In TRAILER1=, TRAILER2=, or TRAILER3= you can include: COUNT=(output_position,length,format) for the number of records in the section; TOTAL=(input_position,length,format,edit_mask) for the sum; MIN=, MAX=, AVG= for minimum, maximum, and average. You can mix literals, the key value, and these aggregates in one trailer line. If you specify NODETAIL, the detail (data) records are not written—only the section headers (if any) and trailers. So the output is a list of summary lines, one per group. That is group aggregation without a program: you get count, total, and average (or min/max) per group in one pass.

Example: One Line per Department with Count and Total

Sorted by department (1–4). OUTFIL with SECTIONS and TRAILER3 containing a literal, the department, count, and total. NODETAIL so only the summary lines are written.

text
1
2
3
4
5
SORT FIELDS=(1,4,CH,A) OUTFIL FNAMES=REPORT, SECTIONS=(1,4,CH, TRAILER3=(1,10,C'DEPT ',5,4,1,X,10,8,COUNT=(10,8,ZD),18,20,TOTAL=(20,8,PD,M11))), NODETAIL

Each section is defined by 1–4 (department). TRAILER3 writes one line per department: "DEPT ", the department value, a blank, the record count in 10,8,ZD format, and the total of the amount field (20,8,PD) with edit mask M11. NODETAIL means no detail records—only these trailer lines. So you get one aggregated line per department with count and total.

When to Use SUM vs SECTIONS TRAILER

Use SUM when the output you want is a dataset of records—one record per key with summed (or min/max) numeric fields. The record layout is the same as the input (except the SUM fields are replaced). Use OUTFIL SECTIONS TRAILER when the output you want is a report—lines that contain literals, the key, count, total, average, etc. SECTIONS is ideal for "one line per group" human-readable reports. SUM is ideal for creating a summary file that will be used as input to another step. You can also use both in the same job: one OUTFIL with SUM-like logic (via a different technique) or a separate SORT step with SUM to produce a summary file, and another OUTFIL with SECTIONS TRAILER to produce a report.

Strategy Summary

Group aggregation without programs
GoalApproach
One row per key with sum (or min/max)SORT FIELDS=(key). SUM FIELDS=(pos,len,format,...). One aggregated data record per key.
One summary line per group (count, total, avg)SORT by key. OUTFIL SECTIONS=(key) TRAILER3=(COUNT=, TOTAL=, AVG=,...). Use NODETAIL for summary-only.
Both detail and summary linesSame as above but omit NODETAIL; detail records and TRAILER lines both written.

Explain It Like I'm Five

Imagine you have a list of lemonade sales: 5 lines for Monday, 3 for Tuesday, 4 for Wednesday. "Group aggregation" means making a short list that says: Monday 5 sales, Tuesday 3 sales, Wednesday 4 sales. You don't need to write a special program—DFSORT can do it. One way: it squashes all Monday lines into one line that has the total (SUM). Another way: at the end of each day's lines it writes a line that says "Monday: 5 sales, total $20" (that's the trailer). So you get one line per day with the summary, without writing any code.

Exercises

  1. Input has region in 1–3 and quantity in 10–17 (PD). You want one record per region with total quantity. Write the SORT and SUM control statements.
  2. You need a report with one line per region: "Region: xxx Count: nnn Avg: aaa." What OUTFIL keywords do you use, and what must you do before it?
  3. When would you choose SUM over OUTFIL SECTIONS TRAILER for group aggregation?

Quiz

Test Your Knowledge

1. What is "group aggregation" in DFSORT?

  • Only sorting
  • Combining many records that share the same key into summary information—e.g. one row per key with a total, count, min, max, or average, without writing a COBOL or other program
  • Copying only
  • Merge only

2. How do you get one row per key with the sum of a numeric field?

  • INCLUDE only
  • Sort by the key, then SUM FIELDS=(position,length,format,...); DFSORT keeps one record per sort key and replaces the specified fields with the sum (or min/max) across all records in that group
  • OMIT only
  • OUTFIL only

3. How do you get one summary line per group with count and total (e.g. "DEPT 1000 Count: 5 Total: 5000")?

  • SUM only
  • Sort by the group key, then OUTFIL SECTIONS=(key) with TRAILER1/2/3= containing COUNT= and TOTAL= (and optionally MIN=, MAX=, AVG=); use NODETAIL to suppress detail records and get only the summary lines
  • INREC only
  • MERGE only

4. What is the difference between SUM and OUTFIL SECTIONS TRAILER for aggregation?

  • No difference
  • SUM keeps one data record per key and replaces specified numeric fields with the sum (or min/max); the output is still one record per key with the same layout. OUTFIL SECTIONS TRAILER writes a separate summary line (header/trailer) per group with COUNT, TOTAL, MIN, MAX, AVG—you can have both detail records and summary lines, or NODETAIL for summary only
  • SUM is only for duplicates
  • SECTIONS is only for headers

5. Can you get both a count and a sum (and average) per group without a program?

  • No
  • Yes: use OUTFIL SECTIONS= with TRAILER3= that includes COUNT=(out_pos,len,format), TOTAL=(pos,len,format,mask), and AVG=(pos,len,format,mask) in the same trailer; or use SUM for the sum and a separate pass/count for count—but TRAILER3 can have COUNT and TOTAL and AVG in one go
  • Only with COBOL
  • Only with INCLUDE