DFSORT does not provide a built-in AVG (average) function in the SUM statement. To get an average per group, you use the formula average = sum / count: you need the sum of the numeric field and the count of records in each group. You can get the sum with SUM FIELDS=. To get the count in the same aggregated record, a common technique is to add a field that is always 1 in every record (e.g. via INREC), then include that field in SUM FIELDS=—the sum of 1s is the record count. The output record then has both the total and the count; computing the average (total divided by count) may be done in OUTREC if your product supports arithmetic, or in a second step or program. This page covers the sum-and-count approach, the constant-1 trick, and options for doing the division.
The average of a numeric field over a group of records is the sum of that field divided by the number of records. So you need two values per group: the sum of the field and the count of records. SUM FIELDS= gives you the sum. It does not add a count field. So you must obtain the count by some other means and then perform the division.
A standard technique is to add a numeric field that contains the value 1 in every input record. When you SUM that field, the result is the number of records in the group (each record contributes 1). So you use INREC to build or overlay a field—for example a 1-byte or 2-byte packed or zoned decimal with value 1—and then list that field in SUM FIELDS= along with your real numeric field. The aggregated record then has: sum of your field (e.g. amount) and sum of the 1s (count). From that you can compute average = amount_sum / count.
Input: 80-byte records, key at 1–8, amount (5-byte ZD) at 21–25. You want one record per key with sum of amount, count, and (conceptually) average. Add a count field at a fixed position (e.g. 26–27 as 2-byte PD with value 1) via INREC, then sort and sum both amount and that field.
123INREC FIELDS=(1,80,26,2,PD,1,0) SORT FIELDS=(1,8,CH,A) SUM FIELDS=(21,5,ZD,26,2,PD)
The exact INREC syntax for “overlay position 26–27 with packed 1” is product-dependent (the example shows the idea: 26,2,PD,1,0 might be BUILD or OVERLAY with a constant). After SUM, positions 21–25 hold the sum of amount and 26–27 hold the sum of the 1s (the count). You then need to compute average = (21–25) / (26–27). If your DFSORT supports arithmetic in OUTREC (e.g. DIV or similar), you can add an OUTREC step that overlays an “average” field with that quotient. Otherwise, write this record and use a second step or program to compute the average.
| Step | What | How |
|---|---|---|
| 1 | Sum the numeric field | SUM FIELDS=(pos, len, format) for the amount (or other field) |
| 2 | Get count per group | INREC add a field = 1; SUM that field too—sum of 1s = count |
| 3 | Compute average | Average = sum / count; use OUTREC arithmetic if supported, else second step or program |
When you add a “constant 1” field with INREC, the positions in SORT FIELDS= and SUM FIELDS= refer to the record after INREC. So you must build a layout where the key, the amount, and the new count field are at known positions. For example: INREC FIELDS=(1,25,26,2,PD,1,0,28,53) might copy bytes 1–25, put a 2-byte packed 1 at 26–27, and copy the rest. Then SUM FIELDS=(21,5,ZD,26,2,PD) sums the amount at 21–25 and the count field at 26–27. Check your product manual for the exact OVERLAY or BUILD syntax for constants.
Once you have sum and count in the same record, average = sum / count. Some DFSORT or ICETOOL versions support arithmetic in OUTREC or OUTFIL BUILD (e.g. divide one field by another and put the result in a target position). If so, you can add an OUTREC that overlays the average field. If not, your options are: (1) a second SORT step that reads the aggregated file and uses INREC/OVERLAY with arithmetic (if supported in that product); (2) a short program (COBOL, REXX, etc.) that reads the record, computes sum/count, and writes it back; (3) leave the record with sum and count and let a downstream reporting or analytics tool compute the average. For report-only output, OUTFIL SECTIONS with TRAILER3 can give you section total and section count; the average for the section is total/count, which you may be able to format in the trailer or derive in the report writer.
When you need section subtotals (e.g. per department), you can use OUTFIL with SECTIONS= and TRAILER3=. In TRAILER3 you can specify COUNT= for the record count in that section and TOTAL= (or TOT=) for the sum of a numeric field. So each section gets a trailer line with total and count; the section average is total/count. Some products allow you to include a computed average in the trailer; otherwise you have the two numbers and can derive the average in reporting. This is useful when you are building a report and do not need the average in the same data record as the detail.
When you divide sum by count, the result may have decimal places. Decide how many digits you need and whether to round. If you are writing the average to a fixed-length field (e.g. 5 bytes ZD), ensure the quotient fits and consider rounding (product-dependent). If you use a program for the division, you can control rounding and precision there.
Imagine you have three jars of coins. Jar 1 has 6 coins, jar 2 has 10, jar 3 has 8. The “average per jar” is total coins (24) divided by number of jars (3), so 8. The computer does the same: first it adds up the numbers (sum) and counts how many records there are (count). To get the count, we trick it: we put a 1 in every record and add those 1s up—that’s the count. Then we divide the sum by the count to get the average. Sometimes the computer can do the division in the same job; sometimes we need a second step to do the division.
1. How do you get a record count per group when using SUM for aggregation?
2. What is the formula for average per group?
3. Can you compute the average in a single DFSORT step?
4. What is the purpose of adding a "constant 1" field before SUM when you need average?
5. For report-style output with section averages, what OUTFIL feature is often used?