MainframeMaster

Reformatting Records Before Sort

Reformatting before sort means changing the layout or content of each input record before the sort or merge phase runs. In DFSORT you do this with the INREC control statement. It matters because the sort phase only sees the record you give it. If you shorten the record in INREC, the sort processes less data, which means less sortwork I/O, less memory, and often better performance. INREC also lets you place a key at a fixed position: if the key you want does not exist as a single field in the input (for example, region code plus account number), you can build it in INREC so the sort can use it.

This page explains why and when to reformat before sort, the processing flow (read → INCLUDE/OMIT → INREC → sort → OUTREC → write), how to use INREC FIELDS= and BUILD=, and which statements refer to reformatted positions (SORT FIELDS, SUM, OUTREC) and which refer to original input positions (INCLUDE/OMIT). Understanding this flow helps you design efficient sort jobs and avoid mistakes when referring to field positions.

Why Reformat Before Sort?

Reformatting before sort gives you three main benefits: performance, correct sort keys, and simpler control statements.

  • Performance. The sort phase reads records, compares keys, and writes data to sortwork. The larger each record, the more data is moved and compared. If you drop unneeded columns in INREC, the record length shrinks. For example, going from 80-byte to 40-byte records roughly halves the amount of data the sort handles. That usually means less elapsed time, less sortwork space, and less CPU.
  • Sort keys that do not exist in the input. Sometimes the key you want is a combination of fields (e.g. department and then employee ID) or a converted value (e.g. a date in a sortable format). In the input file those might be in different positions or formats. In INREC you build a single key field at a known position (e.g. bytes 1–15), and then SORT FIELDS=(1,15,CH,A) uses it. If you built that key only in OUTREC, the sort would never see it and could not order by it.
  • Consistent layout for SORT FIELDS. After INREC, every record the sort sees has the layout you defined. You can refer to "position 1–10" or "position 11–15" in SORT FIELDS (and in SUM and OUTREC) without tracking the original input positions. That makes those control statements easier to write and maintain. INCLUDE/OMIT is the exception: DFSORT evaluates it before INREC, so it still uses original input positions.

Processing Flow: When INREC Runs

DFSORT processes each record in a fixed order. Knowing this order is essential so that you use the right positions in SORT FIELDS, INCLUDE, and OMIT.

Record processing order
| Step | Phase | What happens |
|------|-------|--------------|
| 1 | Read | Record read from SORTIN |
| 2 | INCLUDE/OMIT | Condition evaluated on the original input record |
| 3 | INREC | Record reformatted (if INREC present) |
| 4 | Sort/Merge | Records ordered by SORT FIELDS, using reformatted positions |
| 5 | OUTREC | Record reformatted for output (if OUTREC present) |
| 6 | Write | Record written to SORTOUT |

INCLUDE/OMIT runs in step 2, on the record as it was read, so its conditions use original input positions. INREC runs in step 3, and its result is the record used in steps 4–5 (and step 6 if you do not use OUTREC). SORT FIELDS, SUM FIELDS, and OUTREC positions therefore refer to the reformatted record. If you shorten the record in INREC, those statements cannot reference a byte position that no longer exists (e.g. byte 80 of the original); they must use the new layout.
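As a sketch of where these statements sit in a job, the step below reads 80-byte records, shortens them with INREC, sorts them, and reorders the columns with OUTREC. The dataset names and SPACE values are hypothetical placeholders, and the JOB card is omitted. DFSORT applies the statements in its own fixed processing order, whatever order they are coded in SYSIN.

```text
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT.DATA,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT.DATA,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(5,5),RLSE)
//SYSIN    DD *
* Before the sort: keep input 1-25 and 41-60 (45 bytes).
  INREC FIELDS=(1,25,41,20)
* Sort on reformatted bytes 1-8.
  SORT FIELDS=(1,8,CH,A)
* After the sort: put reformatted 26-45 first, then 1-25.
  OUTREC FIELDS=(26,20,1,25)
/*
```

The OUTREC positions (26,20 and 1,25) refer to the 45-byte INREC record, not to the 80-byte input.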

When to Reformat Before Sort

Use INREC (reformat before sort) when:

  • You want to shorten the record so the sort uses less memory and sortwork. Copy only the fields needed for the sort and for downstream steps.
  • You need to build a sort key from multiple input fields or from a converted value (e.g. date, packed to character). Put the new key at a fixed position in the INREC output and reference it in SORT FIELDS.
  • You need to reorder fields so the key is at the start of the record or so the layout matches what SORT FIELDS and later statements (SUM, OUTREC) expect.
  • You need to insert constants (e.g. delimiters, filler, or a type code) so every record has a uniform structure before the sort.

Do not use INREC when the only change you need is to the final report—for example, adding edit masks to numbers or sequence numbers for display. That belongs in OUTREC, which runs after the sort and does not affect how much data is sorted.
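For contrast, here is a minimal sketch of a display-only change placed in OUTREC. It assumes the 80-byte input and the 45-byte INREC layout used elsewhere on this page; the 8-digit width of the sequence number is an arbitrary choice for illustration.

```text
* Sort layout: 45 bytes (input 1-25 and 41-60).
  INREC BUILD=(1,25,41,20)
  SORT FIELDS=(1,8,CH,A)
* After the sort, append an 8-byte zoned-decimal sequence
* number, so the output record is 53 bytes.
  OUTREC BUILD=(1,45,SEQNUM,8,ZD)
```

Because the sequence number is added in OUTREC, the sort still handles 45-byte records and the numbers follow the sorted order.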

INREC FIELDS= and BUILD=

You define the reformatted record with INREC FIELDS=(item1,item2,...) or INREC BUILD=(item1,item2,...). Each item is either a field copy or a constant. Items are placed in the output record in the order listed; the output length is the sum of the lengths of all items.

Field copy: (position, length)

A pair (position,length) means: copy length bytes from the input record starting at byte position. Positions are 1-based. So (1,20) copies input bytes 1–20 into the first 20 bytes of the reformatted record. (50,10) copies input bytes 50–59. The order in the list is the order in the output—so (50,10,1,20) puts the 10-byte block first, then the 20-byte block, for a 30-byte reformatted record with a different column order than the input.
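The reordering described above can be written as follows; the comment lines (an asterisk in column 1) record where each input field lands.

```text
* Reformatted 1-10  = input 50-59
* Reformatted 11-30 = input 1-20
  INREC FIELDS=(50,10,1,20)
```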

Constants: C'...' and X'...'

C'literal' inserts the character literal (in EBCDIC) into the output; X'hex' inserts bytes from hex pairs. For example, C' ' is one space; X'40' is one EBCDIC space. Constants are useful for delimiters or padding between copied fields.

For basic reformatting, FIELDS= and BUILD= are equivalent: the same list of items produces the same output. Many shops use BUILD= to make it clear that a new record is being built.
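A small sketch of constants mixed with field copies, written in both forms. The hyphen delimiter and the trailing space are arbitrary choices for illustration. Either statement builds the same 32-byte record.

```text
* Reformatted layout:
*   1-20  = input 1-20
*   21    = C'-' delimiter
*   22-31 = input 50-59
*   32    = X'40' (EBCDIC space)
* Form 1: FIELDS=
  INREC FIELDS=(1,20,C'-',50,10,X'40')
* Form 2: BUILD= (same result; code only one INREC per step)
  INREC BUILD=(1,20,C'-',50,10,X'40')
```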

Example: Shortening Records for Performance

Input is 80-byte fixed-length. You only need bytes 1–25 (ID and name) and bytes 41–60 (amount and date) for the sort and for the next step. Reformat before sort to drop the rest:

```text
  INREC FIELDS=(1,25,41,20)
  SORT FIELDS=(1,8,CH,A)
```

INREC FIELDS=(1,25,41,20) copies input 1–25 (25 bytes) and 41–60 (20 bytes) into a 45-byte record. The sort phase therefore sorts 45-byte records instead of 80. SORT FIELDS=(1,8,CH,A) sorts by the first 8 bytes of the reformatted record, which correspond to input bytes 1–8 (e.g. ID). All position references in SORT FIELDS are to the 45-byte record.

Example: Building a Composite Sort Key

You want to sort by region (input bytes 10–11) and then by account number (input bytes 20–29). In the input those are not adjacent. In INREC you build a single 12-byte key at the start: region then account number. The sort then uses that key.

```text
  INREC FIELDS=(10,2,20,10,1,80)
  SORT FIELDS=(1,2,CH,A,3,10,CH,A)
```

INREC FIELDS=(10,2,20,10,1,80) builds: bytes 1–2 = region (from input 10–11), bytes 3–12 = account (from input 20–29), bytes 13–92 = full original record (1–80). So the reformatted record is 92 bytes with the composite key at the front. SORT FIELDS=(1,2,CH,A,3,10,CH,A) sorts by that key: first 2 bytes ascending, then next 10 bytes ascending. Without INREC you could specify SORT FIELDS=(10,2,CH,A,20,10,CH,A) directly, and the sort would process the 80-byte record. Note that this INREC makes the record longer (92 bytes), so by itself it does not improve performance. To shorten the record as well, replace 1,80 with only the input fields you still need after the key.
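If the key is only needed for ordering, one option is to strip it again after the sort. The sketch below assumes the same 80-byte input. Because both key parts are adjacent, character, and ascending, they can also be named as a single 12-byte field.

```text
* Prefix the 12-byte key to the full record (92 bytes).
  INREC FIELDS=(10,2,20,10,1,80)
* Region and account as one 12-byte character key.
  SORT FIELDS=(1,12,CH,A)
* Drop the key: copy reformatted 13-92, which is the
* original 80-byte record.
  OUTREC FIELDS=(13,80)
```

The output records are then 80 bytes again, in region and account order.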

Positions After INREC: SORT FIELDS and INCLUDE/OMIT

Statements that DFSORT processes after INREC use positions in the reformatted record. INCLUDE/OMIT is processed before INREC, so it does not:

  • SORT FIELDS= uses positions and lengths in the INREC output. If the reformatted record is 50 bytes, valid positions are 1–50. You cannot reference original input position 80 after shortening the record.
  • INCLUDE COND= / OMIT COND= are evaluated on the original input record, before INREC runs. Use original input positions, even for bytes that INREC later drops. For example, if the input has a type code at byte 80, INCLUDE COND=(80,1,CH,EQ,C'A') keeps only type A regardless of whether INREC copies that byte.
  • SUM FIELDS= uses the reformatted layout for its sum fields; the control fields are the ones named in SORT FIELDS.

When you design INREC, plan the new layout so that the key and any sum fields are at known positions. Then write SORT FIELDS and SUM FIELDS against that layout, and write INCLUDE/OMIT against the input layout.
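The position mapping can be kept next to the statements as comments. The sketch below reuses the 45-byte layout from the shortening example. The split of input 41–60 into an 8-byte zoned-decimal amount and a 12-byte date is an assumption made for illustration only.

```text
* Assumed input: 1-8 ID, 9-25 name,
*                41-48 amount (ZD), 49-60 date.
  INREC BUILD=(1,25,41,20)
* Reformatted: 1-25 = input 1-25, 26-45 = input 41-60.
* The ID is still at 1-8.
  SORT FIELDS=(1,8,CH,A)
* The amount moved from input 41-48 to reformatted 26-33.
  SUM FIELDS=(26,8,ZD)
```

With these statements, records that share an ID are combined into one record whose amount field holds the total.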

INREC vs OUTREC: Quick Comparison

INREC runs before the sort; OUTREC runs after. So INREC shapes what the sort sees (and what gets written if you do not use OUTREC). OUTREC shapes only the final output. Use INREC to shorten records, build sort keys, or normalize the layout for the sort. Use OUTREC to format the sorted data for reports (edit masks, column order, sequence numbers). You can use both: INREC for a compact sort layout, OUTREC for the final file layout.

Explain It Like I'm Five

Imagine you have a big worksheet with many columns, and you only need two columns to put the rows in order. Before sorting, you make a small copy of each row with just those two columns (and maybe a few more you need later). That small copy is like INREC: you reformat each row before sorting. Now you sort the small copies. Sorting is fast because each row is short. If you had sorted the big worksheet first and only then cut out the extra columns, you would have done a lot of extra work moving big rows. So we "reformat before sort" to make the sort step faster and to put the sort key in a simple place (e.g. always at the start of the small copy).

Exercises

  1. Input is 100 bytes. You need to sort by bytes 5–14 (character) and only keep bytes 1–30 and 70–100 in the output. Write INREC to shorten the record and give the correct SORT FIELDS= for the reformatted record.
  2. Why can you not build a composite sort key in OUTREC and expect the sort to use it?
  3. After INREC FIELDS=(1,10,20,5), what is the length of the record seen by the sort? What position in the reformatted record holds the data that was at input positions 20–24?
  4. List the processing order from "read from SORTIN" to "write to SORTOUT" and say which steps use the original input record and which use the INREC record.

Quiz

Test Your Knowledge

1. In the DFSORT processing flow, when is INREC applied relative to INCLUDE/OMIT and the sort phase?

  • After the sort
  • After INCLUDE/OMIT but before sort
  • Before INCLUDE/OMIT and before sort
  • Only when OPTION COPY is used

2. You have 100-byte input records but only need bytes 1–20 and 61–80 for the sort and output. What is the main performance benefit of using INREC to shorten the record?

  • INREC runs faster than OUTREC
  • The sort phase processes 40-byte records instead of 100, so less sortwork I/O and memory
  • INCLUDE runs faster on shorter records
  • There is no benefit

3. After INREC FIELDS=(1,10,50,5), what positions does SORT FIELDS= use?

  • Original input positions 1–10 and 50–54
  • Reformatted record positions 1–15 only
  • Reformatted record positions; e.g. 1,10,CH,A for first 10 bytes
  • SORT FIELDS cannot be used with INREC

4. Why would you build a composite sort key in INREC instead of in OUTREC?

  • OUTREC cannot build keys
  • The sort phase must see the key to order by it; INREC runs before sort, OUTREC runs after
  • INREC is required for multi-key sorts
  • Building in OUTREC would change the output format

5. What is the practical difference between INREC FIELDS= and INREC BUILD= for basic reformatting?

  • FIELDS= shortens, BUILD= lengthens
  • No difference; both define the same output layout when given the same items
  • BUILD= runs after the sort
  • FIELDS= cannot use constants