MainframeMaster

AVG Calculation Techniques

DFSORT does not provide a built-in AVG (average) function in the SUM statement. To get an average per group, you use the formula average = sum / count: you need the sum of the numeric field and the count of records in each group. You can get the sum with SUM FIELDS=. To get the count in the same aggregated record, a common technique is to add a field that is always 1 in every record (e.g. via INREC), then include that field in SUM FIELDS=—the sum of 1s is the record count. The output record then has both the total and the count; computing the average (total divided by count) may be done in OUTREC if your product supports arithmetic, or in a second step or program. This page covers the sum-and-count approach, the constant-1 trick, and options for doing the division.

SUM Statement
Progress0 of 0 lessons

Average = Sum / Count

The average of a numeric field over a group of records is the sum of that field divided by the number of records. So you need two values per group: the sum of the field and the count of records. SUM FIELDS= gives you the sum. It does not add a count field. So you must obtain the count by some other means and then perform the division.

Getting the Count: Sum a Constant 1

A standard technique is to add a numeric field that contains the value 1 in every input record. When you SUM that field, the result is the number of records in the group (each record contributes 1). So you use INREC to build or overlay a field—for example a 1-byte or 2-byte packed or zoned decimal with value 1—and then list that field in SUM FIELDS= along with your real numeric field. The aggregated record then has: sum of your field (e.g. amount) and sum of the 1s (count). From that you can compute average = amount_sum / count.

Example: INREC add count field, then SUM

Input: 80-byte records, key at 1–8, amount (5-byte ZD) at 21–25. You want one record per key with sum of amount, count, and (conceptually) average. Add a count field at a fixed position (e.g. 26–27 as 2-byte PD with value 1) via INREC, then sort and sum both amount and that field.

text
1
2
3
INREC FIELDS=(1,80,26,2,PD,1,0) SORT FIELDS=(1,8,CH,A) SUM FIELDS=(21,5,ZD,26,2,PD)

The exact INREC syntax for “overlay position 26–27 with packed 1” is product-dependent (the example shows the idea: 26,2,PD,1,0 might be BUILD or OVERLAY with a constant). After SUM, positions 21–25 hold the sum of amount and 26–27 hold the sum of the 1s (the count). You then need to compute average = (21–25) / (26–27). If your DFSORT supports arithmetic in OUTREC (e.g. DIV or similar), you can add an OUTREC step that overlays an “average” field with that quotient. Otherwise, write this record and use a second step or program to compute the average.

Steps for average per group
StepWhatHow
1Sum the numeric fieldSUM FIELDS=(pos, len, format) for the amount (or other field)
2Get count per groupINREC add a field = 1; SUM that field too—sum of 1s = count
3Compute averageAverage = sum / count; use OUTREC arithmetic if supported, else second step or program

Record Layout After INREC

When you add a “constant 1” field with INREC, the positions in SORT FIELDS= and SUM FIELDS= refer to the record after INREC. So you must build a layout where the key, the amount, and the new count field are at known positions. For example: INREC FIELDS=(1,25,26,2,PD,1,0,28,53) might copy bytes 1–25, put a 2-byte packed 1 at 26–27, and copy the rest. Then SUM FIELDS=(21,5,ZD,26,2,PD) sums the amount at 21–25 and the count field at 26–27. Check your product manual for the exact OVERLAY or BUILD syntax for constants.

Computing the Division (Average)

Once you have sum and count in the same record, average = sum / count. Some DFSORT or ICETOOL versions support arithmetic in OUTREC or OUTFIL BUILD (e.g. divide one field by another and put the result in a target position). If so, you can add an OUTREC that overlays the average field. If not, your options are: (1) a second SORT step that reads the aggregated file and uses INREC/OVERLAY with arithmetic (if supported in that product); (2) a short program (COBOL, REXX, etc.) that reads the record, computes sum/count, and writes it back; (3) leave the record with sum and count and let a downstream reporting or analytics tool compute the average. For report-only output, OUTFIL SECTIONS with TRAILER3 can give you section total and section count; the average for the section is total/count, which you may be able to format in the trailer or derive in the report writer.

OUTFIL SECTIONS: Section Total and Count

When you need section subtotals (e.g. per department), you can use OUTFIL with SECTIONS= and TRAILER3=. In TRAILER3 you can specify COUNT= for the record count in that section and TOTAL= (or TOT=) for the sum of a numeric field. So each section gets a trailer line with total and count; the section average is total/count. Some products allow you to include a computed average in the trailer; otherwise you have the two numbers and can derive the average in reporting. This is useful when you are building a report and do not need the average in the same data record as the detail.

Rounding and Precision

When you divide sum by count, the result may have decimal places. Decide how many digits you need and whether to round. If you are writing the average to a fixed-length field (e.g. 5 bytes ZD), ensure the quotient fits and consider rounding (product-dependent). If you use a program for the division, you can control rounding and precision there.

Explain It Like I'm Five

Imagine you have three jars of coins. Jar 1 has 6 coins, jar 2 has 10, jar 3 has 8. The “average per jar” is total coins (24) divided by number of jars (3), so 8. The computer does the same: first it adds up the numbers (sum) and counts how many records there are (count). To get the count, we trick it: we put a 1 in every record and add those 1s up—that’s the count. Then we divide the sum by the count to get the average. Sometimes the computer can do the division in the same job; sometimes we need a second step to do the division.

Exercises

  1. Your record has customer at 1–8 and amount (4-byte PD) at 20–23. Describe how you would get one record per customer with sum of amount, count of records, and average amount (conceptually). What INREC would you add?
  2. Why is “sum of a field that is 1 in every record” equal to the record count for the group?
  3. If your DFSORT does not support division in OUTREC, what are two ways to still produce a record that contains the average?
  4. For a report with section subtotals, how do TRAILER3 COUNT= and TOTAL= help you show a section average?

Quiz

Test Your Knowledge

1. How do you get a record count per group when using SUM for aggregation?

  • SUM automatically adds a count field
  • Add a field that is always 1 in every record (e.g. via INREC), then include that field in SUM FIELDS=—the sum of 1s is the count
  • Use COUNT= in SUM
  • Only OUTFIL can count

2. What is the formula for average per group?

  • Sum only
  • Average = sum of the field / count of records in the group
  • Average = count / sum
  • SUM FIELDS= has an AVG option

3. Can you compute the average in a single DFSORT step?

  • No, always two steps
  • It depends on the product: some DFSORT/ICETOOL versions support arithmetic in OUTREC (e.g. divide sum by count); otherwise you need a second step or program to do the division
  • Yes, with SUM FIELDS=AVG
  • Only with ICETOOL

4. What is the purpose of adding a "constant 1" field before SUM when you need average?

  • To pad the record
  • So that when you SUM that field, the result is the number of records in the group (count)—each record contributes 1, so the sum is the count
  • To avoid overflow
  • Only for reporting

5. For report-style output with section averages, what OUTFIL feature is often used?

  • Only SUM
  • SECTIONS= with TRAILER3= can give section count and section total (COUNT=, TOTAL=); the average for the section is total/count—some products allow expressing this in the trailer or you compute it separately
  • INCLUDE only
  • OUTREC only