Reformatting before sort means changing the layout or content of each input record before the sort or merge phase runs. In DFSORT you do this with the INREC control statement. Why does it matter? Because the sort phase only sees the record you give it. If you shorten the record in INREC, the sort processes less data: less sortwork I/O, less memory, and often better performance. If your sort key is scattered across the input (for example, region code plus account number), you can gather it into a single field in INREC so the sort statement is simpler. This page explains why and when to reformat before sort, the exact processing flow (read → INCLUDE/OMIT → INREC → sort → OUTREC → write), how to use INREC FIELDS= and BUILD=, and which statements refer to reformatted positions (SORT FIELDS) and which still refer to input positions (INCLUDE/OMIT). Understanding this flow helps you design efficient sort jobs and avoid mistakes when referring to field positions.
Reformatting before sort gives you three main benefits: performance, correct sort keys, and simpler control statements.
DFSORT processes each record in a fixed order. Knowing this order is essential so that you use the right positions in SORT FIELDS, INCLUDE, and OMIT.
| Step | Phase | What happens |
|---|---|---|
| 1 | Read | Record read from SORTIN |
| 2 | INCLUDE/OMIT | Condition evaluated on the original input record |
| 3 | INREC | Record reformatted (if INREC present) |
| 4 | Sort/Merge | Records ordered by SORT FIELDS |
| 5 | OUTREC | Record reformatted for output (if OUTREC present) |
| 6 | Write | Record written to SORTOUT |
INREC runs after any INCLUDE/OMIT filtering and before the sort phase. The result of INREC is the record used by the sort and by OUTREC (and the record that is written if you do not use OUTREC). So SORT FIELDS, SUM FIELDS, and OUTREC positions refer to the reformatted record, while INCLUDE/OMIT conditions refer to the original input record. If you shorten the record in INREC, you cannot reference a byte position that no longer exists (e.g. byte 80 of the original) in SORT, SUM, or OUTREC; you must use the new layout.
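As a sketch of how the two position spaces coexist, the control statements below filter on a field that INREC then drops. The one-character status flag at input byte 30 is an assumption made for illustration; the rest follows the 80-byte layout used in the examples on this page.

```jcl
* INCLUDE is evaluated first, against the original 80-byte record.
* Assumption: input byte 30 holds a one-character status flag.
  INCLUDE COND=(30,1,CH,EQ,C'A')
* INREC then keeps input bytes 1-25 and 41-60 (45 bytes in total).
  INREC FIELDS=(1,25,41,20)
* SORT positions refer to the 45-byte reformatted record.
  SORT FIELDS=(1,8,CH,A)
```

Byte 30 is not part of the reformatted record, yet the filter still works, because the INCLUDE statement is processed before INREC.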
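Use INREC (reformat before sort) when you want to shorten records so the sort handles less data, place sort-key fields at convenient positions, or normalize the layout the sort sees.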
Do not use INREC when the only change you need is to the final report—for example, adding edit masks to numbers or sequence numbers for display. That belongs in OUTREC, which runs after the sort and does not affect how much data is sorted.
You define the reformatted record with INREC FIELDS=(item1,item2,...) or INREC BUILD=(item1,item2,...). Each item is either a field copy or a constant. Items are placed in the output record in the order listed; the output length is the sum of the lengths of all items.
A pair (position,length) means: copy length bytes from the input record starting at byte position. Positions are 1-based. So (1,20) copies input bytes 1–20 into the first 20 bytes of the reformatted record. (50,10) copies input bytes 50–59. The order in the list is the order in the output—so (50,10,1,20) puts the 10-byte block first, then the 20-byte block, for a 30-byte reformatted record with a different column order than the input.
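The reordering case can be sketched as follows. The SORT statement is an addition for illustration (DFSORT needs a SORT, MERGE, or copy request to run), and the choice of sort key is an assumption.

```jcl
* Reformatted record (30 bytes):
*   bytes  1-10  <- input bytes 50-59
*   bytes 11-30  <- input bytes  1-20
  INREC FIELDS=(50,10,1,20)
* Sort on the block that now sits at the front of the record.
  SORT FIELDS=(1,10,CH,A)
```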
C'literal' inserts the character literal (in EBCDIC) into the output; X'hex' inserts bytes from hex pairs. For example, C' ' is one space; X'40' is one EBCDIC space. Constants are useful for delimiters or padding between copied fields.
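A minimal sketch of constants mixed with copied fields is shown below. The field positions and the choice of delimiter are hypothetical; only the item syntax is the point.

```jcl
* Layout built below (40 bytes):
*   bytes  1-8   <- input bytes 1-8
*   byte   9     <- C'|' delimiter
*   bytes 10-19  <- input bytes 20-29
*   byte  20     <- X'40' (one EBCDIC space)
*   bytes 21-40  <- input bytes 41-60
  INREC FIELDS=(1,8,C'|',20,10,X'40',41,20)
  SORT FIELDS=(1,8,CH,A)
```

The constants count toward the record length just like copied fields: 8 + 1 + 10 + 1 + 20 = 40 bytes.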
For basic reformatting, FIELDS= and BUILD= are equivalent: the same list of items produces the same output. Many shops use BUILD= to make it clear that a new record is being built.
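For illustration, the same item list is written both ways below. The second form is commented out because only one INREC statement should be coded in a step.

```jcl
* Either statement builds the same 45-byte record; code only one.
  INREC BUILD=(1,25,41,20)
* INREC FIELDS=(1,25,41,20)
```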
Input is 80-byte fixed-length. You only need bytes 1–25 (ID and name) and bytes 41–60 (amount and date) for the sort and for the next step. Reformat before sort to drop the rest:
    INREC FIELDS=(1,25,41,20)
    SORT FIELDS=(1,8,CH,A)
INREC FIELDS=(1,25,41,20) copies input 1–25 (25 bytes) and 41–60 (20 bytes) into a 45-byte record. The sort phase therefore sorts 45-byte records instead of 80. SORT FIELDS=(1,8,CH,A) sorts by the first 8 bytes of the reformatted record, which correspond to input bytes 1–8 (e.g. ID). All position references in SORT FIELDS are to the 45-byte record.
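To show where these statements live, here is a sketch of a complete job step. The step name, data set names, unit, and space values are hypothetical placeholders. SORTOUT is coded without DCB parameters on the assumption that DFSORT derives the output record length from the 45-byte reformatted record; check your site's conventions before relying on that.

```jcl
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Hypothetical data set names; replace with your own.
//SORTIN   DD DSN=YOUR.INPUT.FILE,DISP=SHR
//SORTOUT  DD DSN=YOUR.OUTPUT.FILE,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
  INREC FIELDS=(1,25,41,20)
  SORT FIELDS=(1,8,CH,A)
/*
```

Control statements in SYSIN start in column 2 or later, which is why each one is indented.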
You want to sort by region (input bytes 10–11) and then by account number (input bytes 20–29). In the input those are not adjacent. In INREC you build a single 12-byte key at the start: region then account number. The sort then uses that key.
    INREC FIELDS=(10,2,20,10,1,80)
    SORT FIELDS=(1,2,CH,A,3,10,CH,A)
INREC FIELDS=(10,2,20,10,1,80) builds: bytes 1–2 = region (from input 10–11), bytes 3–12 = account (from input 20–29), bytes 13–92 = full original record (1–80). So the reformatted record is 92 bytes with the composite key at the front. SORT FIELDS=(1,2,CH,A,3,10,CH,A) sorts by that key: first 2 bytes ascending, then next 10 bytes ascending. Without INREC you would specify SORT FIELDS=(10,2,CH,A,20,10,CH,A) and the sort would process the 80-byte record as is. Note that copying the full record after the key makes the sort record longer (92 bytes instead of 80). With INREC you also have the option to shorten the record: after the 12-byte key, copy only the input fields you still need instead of the full 1,80.
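If the output should look like the original input, one possible follow-up is to strip the temporary key after the sort. The OUTREC statement below is an addition for illustration, not part of the example above.

```jcl
* Build a 92-byte record: 12-byte key + original 80 bytes.
  INREC FIELDS=(10,2,20,10,1,80)
  SORT FIELDS=(1,2,CH,A,3,10,CH,A)
* After the sort, keep only bytes 13-92, i.e. the original record.
  OUTREC FIELDS=(13,80)
```

Because both key parts are character fields sorted ascending and are now adjacent, SORT FIELDS=(1,12,CH,A) would give the same order.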
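After INREC, every control statement that is processed after it uses the reformatted record. That includes SORT FIELDS (or MERGE FIELDS), SUM FIELDS, and OUTREC. The INCLUDE and OMIT statements are the exception: they are evaluated before INREC, so their positions refer to the original input record.
When you design INREC, plan the new layout so that the key fields are at known positions. Then write SORT FIELDS against that layout, and keep INCLUDE/OMIT conditions written against the input layout.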
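The position shift can be sketched with a SUM statement. The assumption here, made only for illustration, is that input bytes 41–49 hold a 9-digit zoned-decimal amount.

```jcl
* Assumption: input bytes 41-49 hold a 9-digit zoned-decimal amount.
  INREC FIELDS=(1,25,41,20)
* After INREC the amount sits at bytes 26-34 of the 45-byte record.
  SORT FIELDS=(1,8,CH,A)
* SUM is processed after INREC, so it uses position 26, not 41.
  SUM FIELDS=(26,9,ZD)
```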
INREC runs before the sort; OUTREC runs after. So INREC shapes what the sort sees (and what gets written if you do not use OUTREC). OUTREC shapes only the final output. Use INREC to shorten records, build sort keys, or normalize the layout for the sort. Use OUTREC to format the sorted data for reports (edit masks, column order, sequence numbers). You can use both: INREC for a compact sort layout, OUTREC for the final file layout.
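A minimal sketch of using both statements together is shown below. The output layout (a sequence number in front of the sorted data) is a hypothetical choice.

```jcl
* Compact layout for the sort: input 1-25 and 41-60 (45 bytes).
  INREC BUILD=(1,25,41,20)
  SORT FIELDS=(1,8,CH,A)
* Final layout: 5-digit sequence number, a space, then the 45 bytes.
  OUTREC BUILD=(SEQNUM,5,ZD,C' ',1,45)
```

The sort works on 45-byte records; the sequence number and separator are added only after the records are in order, giving a 51-byte output record.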
Imagine you have a big worksheet with many columns, and you only need two columns to put the rows in order. Before sorting, you make a small copy of each row with just those two columns (and maybe a few more you need later). That small copy is like INREC: you reformat each row before sorting. Now you sort the small copies. Sorting is fast because each row is short. If you had sorted the big worksheet first and only then cut out the extra columns, you would have done a lot of extra work moving big rows. So we "reformat before sort" to make the sort step faster and to put the sort key in a simple place (e.g. always at the start of the small copy).
1. In the DFSORT processing flow, when is INREC applied relative to INCLUDE/OMIT and the sort phase?
2. You have 100-byte input records but only need bytes 1–20 and 60–80 for the sort and output. What is the main performance benefit of using INREC to shorten the record?
3. After INREC FIELDS=(1,10,50,5), what positions does SORT FIELDS= use?
4. Why would you build a composite sort key in INREC instead of in OUTREC?
5. What is the practical difference between INREC FIELDS= and INREC BUILD= for basic reformatting?