Summing numeric fields in DFSORT means adding up the values in one or more numeric fields for each group of records that share the same sort key, so that each group is reduced to a single record. You specify each field to sum as a position,length,format triple in SUM FIELDS=, and for multiple fields you list several triples one after another. Position is the starting byte (1-based) in the record, length is the number of bytes, and format tells DFSORT how to interpret those bytes (PD, ZD, BI, FI, or FL). The format must match how the data is actually stored; otherwise the sum will be wrong or the job may abend. This page covers position and length in depth, each numeric format and when to use it, summing multiple fields, the record layout after INREC, and how to avoid overflow or convert character data before summing.
Every field you sum is identified by where it starts in the record and how many bytes it occupies. Position is the starting byte number, counted from 1: byte 1 is the first byte of the record (for variable-length records, bytes 1–4 hold the record descriptor word, so the data starts at byte 5). Length is the number of consecutive bytes. So (11,5,ZD) means “the field starting at byte 11, 5 bytes long, in zoned decimal format”, that is, bytes 11 through 15.
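To make the 1-based arithmetic concrete, here is a small Python sketch of how a position and length select bytes from a record. It is an illustration, not DFSORT code. The helper name `field` and the sample record are hypothetical, and EBCDIC data is modeled with Python's built-in cp037 codec.

```python
def field(record: bytes, position: int, length: int) -> bytes:
    """Return the bytes of a (position, length) field using 1-based
    positions, as DFSORT does: (11, 5) covers bytes 11 through 15."""
    if position < 1 or length < 1:
        raise ValueError("position and length must be >= 1")
    start = position - 1          # 1-based byte number -> 0-based index
    end = start + length          # slice end is exclusive
    if end > len(record):
        raise ValueError("field extends past the end of the record")
    return record[start:end]


# 20-byte record: bytes 1-10 hold "DEPT-A    ", bytes 11-15 hold "00042"
rec = "DEPT-A    00042     ".encode("cp037")   # cp037 is an EBCDIC code page
print(field(rec, 11, 5).decode("cp037"))        # 00042
```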
If you use INREC, the record that SORT and SUM see is the record after INREC, so all positions in SORT FIELDS= and SUM FIELDS= refer to that reformatted record, not the original input. If your input has the amount at bytes 50–54 and INREC moves it to bytes 21–25, you would specify SUM FIELDS=(21,5,ZD) (or the new length with PD, if INREC also converted it to packed). Getting the position or length wrong, for example pointing at another field or including an extra byte, causes incorrect sums or data exceptions (S0C7). Always verify against your record layout or copybook.
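The following Python sketch models that situation under simple assumptions. A hypothetical `inrec_build` function stands in for an INREC statement along the lines of INREC BUILD=(1,20,50,5). It keeps bytes 1–20 and moves the amount from 50–54 to 21–25, which shows that after the reformat only position 21 finds the amount.

```python
def field(record: bytes, position: int, length: int) -> bytes:
    # 1-based position and byte length, as in SORT FIELDS= and SUM FIELDS=
    return record[position - 1 : position - 1 + length]


def inrec_build(record: bytes) -> bytes:
    """Hypothetical INREC-style reformat: input bytes 1-20 followed by
    input bytes 50-54, so the amount lands at bytes 21-25."""
    return field(record, 1, 20) + field(record, 50, 5)


# 60-byte input record: key in bytes 1-20, amount "01234" in bytes 50-54
original = ("KEY".ljust(20) + " " * 29 + "01234" + " " * 6).encode("cp037")
reformatted = inrec_build(original)

print(len(reformatted))                             # 25
print(field(reformatted, 21, 5).decode("cp037"))    # 01234
print(field(reformatted, 50, 5))                    # b'' (the new record is only 25 bytes)
```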
The same bytes in storage represent different numbers depending on the format. DFSORT uses the format to interpret the bytes as a numeric value before adding. You must specify the format that matches how the data is stored.
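As an illustration, this Python sketch reads the same four bytes, X'0000012C', in two ways. The function names are hypothetical. `as_bi` treats the bytes as unsigned binary, which is how DFSORT's BI format reads them.

```python
def as_pd(data: bytes) -> int:
    """Read bytes as packed decimal: two digits per byte, sign in the last half-byte."""
    nibbles = [n for b in data for n in (b >> 4, b & 0x0F)]
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError("not valid packed decimal")
    value = int("".join(map(str, digits)))
    return -value if sign in (0xB, 0xD) else value


def as_bi(data: bytes) -> int:
    """Read bytes as an unsigned big-endian binary integer."""
    return int.from_bytes(data, "big")


raw = bytes.fromhex("0000012C")   # the same four bytes, X'0000012C'
print(as_pd(raw))   # 12   (digits 0000012, sign C = positive)
print(as_bi(raw))   # 300  (0x12C)
```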
| Format | Name | Storage | Length example | Typical use |
|---|---|---|---|---|
| PD | Packed decimal | Two digits per byte; sign in the last half-byte (C/F positive, D negative) | 4 bytes = 7 digits + sign (n bytes hold 2n-1 digits) | COBOL COMP-3: amounts, quantities |
| ZD | Zoned decimal | One digit per byte; sign in the zone (high half-byte) of the last byte (C/F positive, D negative) | 5 bytes = 5 digits (the sign shares the last byte) | COBOL DISPLAY numeric fields |
| BI | Binary (unsigned) | Unsigned binary integer | 2, 4, or 8 bytes (halfword, fullword, doubleword) | Unsigned COMP/COMP-4 fields, counts, IDs |
| FI | Fixed-point (signed binary) | Signed two's complement integer | 2, 4, or 8 bytes (halfword, fullword, doubleword) | Signed COMP/COMP-4 fields (PIC S9(n) COMP) |
| FL | Floating-point | Floating-point | 4 or 8 bytes typically | COMP-1, COMP-2, scientific data |
Packed decimal stores two decimal digits per byte, except the rightmost half-byte, which holds the sign (C or F for positive, D for negative). In general, an n-byte packed field holds 2n-1 digits plus the sign, so a 4-byte packed field holds 7 digits. The length you specify is the byte length (e.g. 4), not the digit count. PD is very common for amounts and quantities in mainframe files (e.g. COBOL COMP-3). If the data is packed and you specify ZD or BI, the bytes will be misinterpreted and the sum will be wrong or you may get an abend.
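Here is a minimal Python sketch of the 2n-1 rule, assuming C/D sign half-bytes on output. The hypothetical `pd_encode` packs an integer into a field of a given byte length and refuses values with more digits than the field can hold.

```python
def pd_encode(value: int, length: int) -> bytes:
    """Encode an integer as packed decimal: two digits per byte, with the
    sign (C = positive, D = negative) in the last half-byte."""
    max_digits = 2 * length - 1              # n bytes hold 2n-1 digits + sign
    digits = str(abs(value))
    if len(digits) > max_digits:
        raise OverflowError(f"{value} needs {len(digits)} digits; {length} PD bytes hold {max_digits}")
    nibbles = [int(d) for d in digits.rjust(max_digits, "0")]
    nibbles.append(0xD if value < 0 else 0xC)
    # pair the half-bytes up into whole bytes
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


print(pd_encode(1234567, 4).hex().upper())   # 1234567C
print(pd_encode(-42, 4).hex().upper())       # 0000042D
try:
    pd_encode(12345678, 4)                   # 8 digits: too many for 4 bytes
except OverflowError as err:
    print(err)
```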
Zoned decimal uses one byte per digit. The low half-byte of each byte holds the digit, and the high half-byte (the zone) of the last byte holds the sign (C or F for positive, D for negative). So a 5-byte ZD field holds 5 digits: the sign does not take an extra byte, and the length always equals the digit count. ZD is common when numeric data is stored in display form (e.g. COBOL DISPLAY numeric). A character field made only of digits, such as "12345" (X'F1F2F3F4F5' in EBCDIC), is already valid unsigned ZD and can be summed as ZD. Character numbers that contain a sign character, commas, a decimal point, or trailing blanks cannot be summed correctly as ZD or PD. Convert them to ZD or PD in INREC first, then sum the converted field.
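The sketch below shows why plain digits work as ZD while edited numbers do not. It uses a hypothetical `zd_decode` in Python, with EBCDIC modeled by the cp037 codec. The sketch checks only the digit half-bytes and the sign of the last byte, so it is a simplification and not a copy of DFSORT's own validity rules.

```python
def zd_decode(data: bytes) -> int:
    """Zoned decimal: the digit is the low half-byte of every byte; the sign is
    the high half-byte (zone) of the LAST byte. Other zones are not checked."""
    digits = [b & 0x0F for b in data]
    sign = data[-1] >> 4
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError(f"not valid zoned decimal: X'{data.hex().upper()}'")
    value = int("".join(map(str, digits)))
    return -value if sign in (0xB, 0xD) else value


print(zd_decode("12345".encode("cp037")))   # 12345: X'F1F2F3F4F5', F zone = positive
print(zd_decode(bytes.fromhex("F1F2D3")))   # -123: D zone on the last byte = negative
print(zd_decode("-123".encode("cp037")))    # 123: the leading '-' is silently ignored
for bad in ("1,234", "12.5 "):
    try:
        zd_decode(bad.encode("cp037"))
    except ValueError as err:
        print(err)
```

The "-123" case is the dangerous one. In this sketch it does not fail; the minus sign is simply lost.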
Binary fields come in two forms. BI is unsigned binary: every bit is part of the value, so X'FFFF' is 65535. FI is signed fixed-point binary in two's complement, so the same X'FFFF' is -1. Both are a halfword (2 bytes), fullword (4 bytes), or doubleword (8 bytes). Use FI for signed COBOL COMP, COMP-4, or BINARY fields (PIC S9(n) COMP), and use BI only for fields that are genuinely unsigned. Binary summing is exact for integers. Specify the correct length and signedness, or you will read the wrong value.
FL (floating-point) is rare in typical batch reporting. Use it only when the field really holds floating-point data (for example COBOL COMP-1 or COMP-2), and check your DFSORT documentation for the lengths SUM accepts.
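A small Python sketch makes the difference between BI and FI easy to see. The helpers `decode` and `fits` are hypothetical. `fits` checks whether a total still fits in a field of the given length, assuming the unsigned range for BI and the two's-complement range for FI.

```python
def decode(data: bytes, fmt: str) -> int:
    """BI = unsigned binary, FI = signed two's-complement binary (big-endian)."""
    if fmt == "BI":
        return int.from_bytes(data, "big", signed=False)
    if fmt == "FI":
        return int.from_bytes(data, "big", signed=True)
    raise ValueError(f"format {fmt} not handled in this sketch")


def fits(total: int, length: int, fmt: str) -> bool:
    """True if total can be stored back in a field of `length` bytes."""
    bits = 8 * length
    if fmt == "BI":
        return 0 <= total < 2 ** bits
    return -(2 ** (bits - 1)) <= total < 2 ** (bits - 1)


halfwords = [bytes.fromhex("FFFE"), bytes.fromhex("0005")]
for fmt in ("FI", "BI"):
    total = sum(decode(h, fmt) for h in halfwords)
    print(fmt, total, "fits" if fits(total, 2, fmt) else "does not fit in 2 bytes")
# FI 3 fits
# BI 65539 does not fit in 2 bytes
```

The same two halfwords add up to a small positive total when they are read as signed values, and to an overflowing total when they are read as unsigned.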
Input: fixed-length records with department code in bytes 1–10 and a 5-byte zoned decimal “sales” amount in bytes 21–25. Requirement: one record per department with the sum of sales.
```
 SORT FIELDS=(1,10,CH,A)
 SUM FIELDS=(21,5,ZD)
```
Records are sorted by department (bytes 1–10). For each department, the 5-byte ZD field at bytes 21–25 is summed, and the output has one record per department with bytes 21–25 replaced by the total. Every other byte comes from the record DFSORT keeps for the group. That is the first record when EQUALS is in effect; without EQUALS, which record is kept is not guaranteed. So you get one total per department in the same record layout.
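The following Python sketch shows the whole sort-then-sum flow in one place. It models what SORT FIELDS=(1,10,CH,A) with SUM FIELDS=(21,5,ZD) does to a handful of records. The function names and sample data are hypothetical. In this model:

- Python's stable sort stands in for EQUALS.
- Byte-wise comparison of EBCDIC data stands in for CH ordering.
- Writing C/D signs on the totals is this sketch's own choice.
- On overflow the sketch simply raises an exception, which is not how DFSORT handles it.

```python
from itertools import groupby


def zd_decode(data: bytes) -> int:
    # Assumes valid zoned decimal: digit in each low half-byte,
    # sign in the high half-byte (zone) of the last byte.
    value = int("".join(str(b & 0x0F) for b in data))
    return -value if data[-1] >> 4 in (0xB, 0xD) else value


def zd_encode(value: int, length: int) -> bytes:
    digits = str(abs(value))
    if len(digits) > length:
        raise OverflowError(f"{value} does not fit in {length} ZD bytes")
    digits = digits.rjust(length, "0")
    sign = 0xD if value < 0 else 0xC          # C/D sign: this sketch's choice
    body = bytes(0xF0 | int(d) for d in digits[:-1])
    return body + bytes([(sign << 4) | int(digits[-1])])


def sort_and_sum(records, key_pos, key_len, sum_pos, sum_len):
    """Mimic SORT FIELDS=(key_pos,key_len,CH,A) + SUM FIELDS=(sum_pos,sum_len,ZD)."""
    key_sl = slice(key_pos - 1, key_pos - 1 + key_len)   # 1-based -> 0-based
    amt_sl = slice(sum_pos - 1, sum_pos - 1 + sum_len)
    out = []
    # sorted() is stable, so the first record of each group is the one kept,
    # which corresponds to running DFSORT with EQUALS
    for _, group in groupby(sorted(records, key=lambda r: r[key_sl]), key=lambda r: r[key_sl]):
        group = list(group)
        total = sum(zd_decode(r[amt_sl]) for r in group)
        kept = bytearray(group[0])
        kept[amt_sl] = zd_encode(total, sum_len)         # total overwrites the summed field
        out.append(bytes(kept))
    return out


def make(dept: str, sales: int) -> bytes:
    # bytes 1-10: department, 11-20: filler (EBCDIC blanks), 21-25: sales as ZD
    return dept.ljust(10).encode("cp037") + b"\x40" * 10 + zd_encode(sales, 5)


recs = [make("SALES", 100), make("HR", 25), make("SALES", 250), make("HR", -5)]
for rec in sort_and_sum(recs, 1, 10, 21, 5):
    print(rec[:10].decode("cp037").strip(), zd_decode(rec[20:25]))
# HR 20
# SALES 350
```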
You can sum as many numeric fields as you need in a single SUM FIELDS= by listing each as position,length,format. DFSORT adds each field independently per group and writes each total into the same positions in the output record. Bytes that are not summed keep the values from the retained record (the first of the group when EQUALS is in effect).
Record layout: key at 1–8, amount1 (4-byte PD) at 9–12, amount2 (4-byte PD) at 13–16, quantity (2-byte unsigned binary, BI) at 17–18. Requirement: one record per key with the sums of amount1, amount2, and quantity.
```
 SORT FIELDS=(1,8,CH,A)
 SUM FIELDS=(9,4,PD,13,4,PD,17,2,BI)
```
The output has one record per unique key in bytes 1–8. Positions 9–12 hold the sum of amount1, 13–16 the sum of amount2, and 17–18 the sum of quantity. Bytes 1–8 (the key) and any bytes after 18 come from the record kept for the group.
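The same idea extends to several fields. This Python sketch mimics SORT FIELDS=(1,8,CH,A) with SUM FIELDS=(9,4,PD,13,4,PD,17,2,BI) on the layout above. The helper names and sample keys are hypothetical, and only PD and BI are handled.

```python
from itertools import groupby


def decode(data: bytes, fmt: str) -> int:
    if fmt == "BI":                                   # unsigned binary
        return int.from_bytes(data, "big")
    if fmt == "PD":                                   # packed decimal
        nib = [n for b in data for n in (b >> 4, b & 0x0F)]
        value = int("".join(map(str, nib[:-1])))
        return -value if nib[-1] in (0xB, 0xD) else value
    raise ValueError(f"format {fmt} not handled in this sketch")


def encode(value: int, length: int, fmt: str) -> bytes:
    if fmt == "BI":
        return value.to_bytes(length, "big")          # OverflowError if it does not fit
    if fmt == "PD":
        width = 2 * length - 1
        digits = str(abs(value))
        if len(digits) > width:
            raise OverflowError(f"{value} does not fit in {length} PD bytes")
        nib = [int(d) for d in digits.rjust(width, "0")] + [0xD if value < 0 else 0xC]
        return bytes((hi << 4) | lo for hi, lo in zip(nib[0::2], nib[1::2]))
    raise ValueError(f"format {fmt} not handled in this sketch")


def sum_fields(records, key_pos, key_len, fields):
    """Mimic SORT FIELDS=(key_pos,key_len,CH,A) with SUM FIELDS=(p1,m1,f1,p2,m2,f2,...)."""
    key_sl = slice(key_pos - 1, key_pos - 1 + key_len)
    out = []
    for _, grp in groupby(sorted(records, key=lambda r: r[key_sl]), key=lambda r: r[key_sl]):
        grp = list(grp)
        kept = bytearray(grp[0])                      # first record of the group (stable sort)
        for pos, length, fmt in fields:               # each field is summed independently
            sl = slice(pos - 1, pos - 1 + length)
            kept[sl] = encode(sum(decode(r[sl], fmt) for r in grp), length, fmt)
        out.append(bytes(kept))
    return out


def make(key: str, amt1: int, amt2: int, qty: int) -> bytes:
    # bytes 1-8 key, 9-12 amount1 (PD), 13-16 amount2 (PD), 17-18 quantity (BI)
    return (key.ljust(8).encode("cp037") + encode(amt1, 4, "PD")
            + encode(amt2, 4, "PD") + encode(qty, 2, "BI"))


FIELDS = [(9, 4, "PD"), (13, 4, "PD"), (17, 2, "BI")]   # SUM FIELDS=(9,4,PD,13,4,PD,17,2,BI)
recs = [make("K1", 100, -7, 3), make("K2", 5, 5, 1), make("K1", 250, 10, 4)]
for rec in sum_fields(recs, 1, 8, FIELDS):
    totals = [decode(rec[p - 1 : p - 1 + m], f) for p, m, f in FIELDS]
    print(rec[:8].decode("cp037").strip(), totals)
# K1 [350, 3, 7]
# K2 [5, 5, 1]
```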
If the input record does not have the numeric fields in a summable form or in the positions you want, use INREC first. INREC runs before the sort, and SORT and SUM see the record after INREC. So you can move fields, convert character data to numeric (e.g. build a ZD or PD field), or rearrange the layout. The positions in SORT FIELDS= and SUM FIELDS= then refer to the INREC output. For example, if the input has a 6-byte character amount at 40–45, you might use INREC to convert it to a 5-byte ZD at positions 21–25, then code SORT FIELDS=(1,10,CH,A) and SUM FIELDS=(21,5,ZD). Always make sure the summed field is long enough for the largest possible total (see the overflow discussion below).
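INREC performs the conversion itself. DFSORT offers conversion options for this, such as the SFF (signed free form) input format with TO=ZD or TO=PD, but check your DFSORT documentation for the exact parameters. As a language-neutral illustration, the Python sketch below uses a hypothetical `char_to_zd` to turn display amounts into signed ZD. It also includes a hypothetical `digits_needed` helper for sizing the output field against the worst-case total.

```python
def char_to_zd(text: str, length: int) -> bytes:
    """Hypothetical stand-in for an INREC conversion: turn a display amount
    such as ' -1234' or '1,234' into a signed zoned-decimal field."""
    value = int(text.strip().replace(",", ""))   # ValueError if not an integer
    digits = str(abs(value))
    if len(digits) > length:
        raise OverflowError(f"{text!r} needs {len(digits)} digits; field has {length}")
    digits = digits.rjust(length, "0")
    sign = 0xD if value < 0 else 0xC              # C/D sign on the last byte
    return bytes(0xF0 | int(d) for d in digits[:-1]) + bytes([(sign << 4) | int(digits[-1])])


def digits_needed(largest_value: int, record_count: int) -> int:
    """Digits needed for the worst-case total of record_count amounts."""
    return len(str(abs(largest_value) * record_count))


print(char_to_zd(" -1234", 5).hex().upper())   # F0F1F2F3D4
print(char_to_zd("1,234", 5).hex().upper())    # F0F1F2F3C4
print(digits_needed(99999, 10_000))            # 9
```

With those numbers, an amount that fits in a 5-byte ZD field needs 9 digits once 10,000 of them are added. That calls for a 9-byte ZD field, or a 5-byte PD field, which also holds 9 digits.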
The sum is written back into the same byte range as the input field; the output field is never widened. If a total is too large to fit (for example, 10,000 records whose sum needs more digits than the field allows), the summary field overflows. DFSORT does not write a truncated total in that case. Instead, it leaves the two records involved unsummed, so the key shows up on more than one output record, and it issues a warning message. The OPTION parameter OVFLO (RC0, RC4, or RC16) controls the return code when overflow occurs. RC16 terminates the step, so an undersized field cannot slip through unnoticed. VLSHRT and NOVLSHRT are unrelated to overflow: they control what happens when a variable-length record is too short to contain the sort or sum fields. For reliable totals, size the field for the largest possible sum, widening it in INREC if necessary.
Imagine a stack of receipts. Each receipt has a store name and a number (the amount). You first sort the receipts so all “Store A” receipts are together. Then for Store A you add up all the amounts and write the total on one line. You do the same for Store B, and so on. In the computer, the “store” is the sort key, and “add up the amounts” is SUM FIELDS=. The computer has to know where the amount is on each receipt (position), how long it is (length), and whether the number is written in “packed”, “zoned”, or “binary” form (format). If it reads the wrong kind of number, the total will be wrong. If the total is too big to fit in the box, the computer does not squeeze in a chopped-off number. It leaves those receipts un-added and warns you, or, if you asked it to (OVFLO=RC16), it stops the job. Either way, the fix is a bigger box.
1. What does the "position" in SUM FIELDS=(position,length,format) refer to?
2. Why must the format in SUM FIELDS= match how the data is stored?
3. Can you sum a character field that contains digits (e.g. "01234") with SUM FIELDS=?
4. How do you sum three different numeric fields in one SUM statement?
5. What happens to the output field length when you sum many large numbers?