What does reformatting before sort mean in DFSORT?

Reformatting before sort means using the INREC control statement to change each input record (length, field order, or content) before the sort or merge phase runs. SORT FIELDS, SUM, and OUTREC then see the reformatted record. INCLUDE/OMIT is processed before INREC, so it still sees the original input record. With INREC you can shorten records, build new keys, or reorder fields so that the sort operates on exactly the data you need.

Why reformat records before sort instead of after?

Reformatting before sort (INREC) reduces the amount of data the sort phase processes: shorter records mean less sortwork I/O and memory. It also lets you build sort keys that do not exist in the input (e.g. concatenate two fields). Reformatting after (OUTREC) only affects the final output file; the sort still processes the full record if you do not use INREC.

Does INREC change the number of records?

No. With INREC FIELDS= or BUILD=, one input record produces one reformatted record, so the record count stays the same. IFTHEN clauses only choose how each record is reformatted; they do not drop or duplicate records. To drop records, use INCLUDE/OMIT.

Do SORT FIELDS and INCLUDE/OMIT use input or reformatted positions?

After INREC, SORT FIELDS= uses positions in the reformatted record, not the original input. So if INREC produces a 50-byte record, valid positions are 1–50. INCLUDE/OMIT COND= is different: DFSORT evaluates it before INREC, so it uses positions in the original input record. Design INREC so that the key fields are at known positions in the reformatted layout, and write the filter against the input layout.

When should I use INREC vs OUTREC?

Use INREC when you need to change what gets sorted: shorten records, build or move sort keys, or normalize the layout before the sort. Use OUTREC when you need to change only the final output: column order for reports, edit masks, sequence numbers, or headers. You can use both in the same job—INREC for the sort, OUTREC for the written file.

Reformatting Records Before Sort

Reformatting before sort means changing the layout or content of each input record before the sort or merge phase runs. In DFSORT you do this with the INREC control statement. Why does it matter? Because the sort phase only sees the record you give it. If you shorten the record in INREC, the sort processes less data: less sortwork I/O, less memory, and often better performance. If you need to sort by a key that does not exist as a single field in the input (for example, region code plus account number), you build that key in INREC so the sort can use it. This page explains why and when to reformat before sort, the processing flow (read → filter → INREC → sort → OUTREC → write), how to use INREC FIELDS= and BUILD=, and which record layout SORT FIELDS and INCLUDE/OMIT positions refer to. Understanding this flow helps you design efficient sort jobs and avoid mistakes when referring to field positions.


Why Reformat Before Sort?

Reformatting before sort gives you three main benefits: performance, correct sort keys, and simpler control statements.

Performance. The sort phase reads records, compares keys, and writes data to sortwork. The larger each record, the more data is moved and compared. If you drop unneeded columns in INREC, the record length shrinks. For example, going from 80-byte to 40-byte records roughly halves the amount of data the sort handles. That usually means less elapsed time, less sortwork space, and less CPU.
Sort keys that do not exist in the input. Sometimes the key you want is a combination of fields (e.g. department and then employee ID) or a converted value (e.g. a date in a sortable format). In the input file those might be in different positions or formats. In INREC you build a single key field at a known position (e.g. bytes 1–15), and then SORT FIELDS=(1,15,CH,A) uses it. If you built that key only in OUTREC, the sort would never see it and could not order by it.
Consistent layout for the sort and later steps. After INREC, every record has the layout you defined. You can refer to "position 1–10" or "position 11–15" in SORT FIELDS, SUM, and OUTREC without going back to the original input positions. That makes those control statements easier to write and maintain. INCLUDE/OMIT is the exception: it is processed before INREC and keeps using input positions.

Processing Flow: When INREC Runs

DFSORT processes each record in a fixed order. Knowing this order is essential so that you use the right positions in SORT FIELDS, INCLUDE, and OMIT.

Record processing order
Step	Phase	What happens
1	Read	Record read from SORTIN
2	INCLUDE/OMIT	Condition evaluated on the original input record
3	INREC	Record reformatted (if INREC present)
4	Sort/Merge	Records ordered by SORT FIELDS (positions in the reformatted record)
5	OUTREC	Record reformatted for output (if OUTREC present)
6	Write	Record written to SORTOUT

INREC runs after INCLUDE/OMIT has filtered the input and before the sort. INCLUDE/OMIT conditions therefore refer to positions in the original input record, while SORT FIELDS, SUM, and OUTREC refer to the reformatted record (which is also what gets written if you do not use OUTREC). If you shorten the record in INREC, SORT FIELDS and OUTREC cannot reference a byte position that no longer exists (e.g. byte 80 of the original); they must use the new layout.
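The following sketch shows how the processing order affects positions. The layout is hypothetical: an 80-byte input with an ID in bytes 1–8 and a one-byte type code in byte 30. Lines starting with an asterisk in column 1 are comments.

* INCLUDE is evaluated first, on the original 80-byte input record.
  INCLUDE COND=(30,1,CH,EQ,C'A')
* INREC then keeps input bytes 1-25 and 41-60 (45-byte record).
  INREC FIELDS=(1,25,41,20)
* SORT refers to the 45-byte reformatted record.
  SORT FIELDS=(1,8,CH,A)

Input byte 30 is not copied into the reformatted record, yet the filter can still test it, because the filter has already run by the time INREC reformats the record.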

When to Reformat Before Sort

Use INREC (reformat before sort) when:

You want to shorten the record so the sort uses less memory and sortwork. Copy only the fields needed for the sort and for downstream steps.
You need to build a sort key from multiple input fields or from a converted value (e.g. date, packed to character). Put the new key at a fixed position in the INREC output and reference it in SORT FIELDS.
You need to reorder fields so the key is at the start of the record or so the layout matches what SORT FIELDS and later steps expect.
You need to insert constants (e.g. delimiters, filler, or a type code) so every record has a uniform structure before the sort.

Do not use INREC when the only change you need is to the final report—for example, adding edit masks to numbers or sequence numbers for display. That belongs in OUTREC, which runs after the sort and does not affect how much data is sorted.

INREC FIELDS= and BUILD=

You define the reformatted record with INREC FIELDS=(item1,item2,...) or INREC BUILD=(item1,item2,...). Each item is either a field copy or a constant. Items are placed in the output record in the order listed; the output length is the sum of the lengths of all items.

Field copy: (position, length)

A pair (position,length) means: copy length bytes from the input record starting at byte position. Positions are 1-based. So (1,20) copies input bytes 1–20 into the first 20 bytes of the reformatted record. (50,10) copies input bytes 50–59. The order in the list is the order in the output—so (50,10,1,20) puts the 10-byte block first, then the 20-byte block, for a 30-byte reformatted record with a different column order than the input.
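As a small sketch of the reordering case just described (the field meanings are not specified here; only the positions matter):

* Copy input bytes 50-59 first, then input bytes 1-20.
  INREC FIELDS=(50,10,1,20)
* Reformatted bytes  1-10 = input bytes 50-59
* Reformatted bytes 11-30 = input bytes 1-20 (record length 30)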

Constants: C'...' and X'...'

C'literal' inserts the character literal (in EBCDIC) into the output; X'hex' inserts bytes from hex pairs. For example, C' ' is one space; X'40' is one EBCDIC space. Constants are useful for delimiters or padding between copied fields.

For basic reformatting, FIELDS= and BUILD= are equivalent: the same list of items produces the same output. Many shops use BUILD= to make it clear that a new record is being built.
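A minimal sketch of a constant placed between two copied fields, assuming a hypothetical layout in which input bytes 1–20 and 50–59 are the fields to keep. The comma delimiter is chosen only for illustration.

* Reformatted bytes 1-20 = input 1-20, byte 21 = ',', bytes 22-31 = input 50-59.
  INREC BUILD=(1,20,C',',50,10)

Writing the same item list as INREC FIELDS=(1,20,C',',50,10) should give the same 31-byte record. Use one form or the other in a given step, not both.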

Example: Shortening Records for Performance

Input is 80-byte fixed-length. You only need bytes 1–25 (ID and name) and bytes 41–60 (amount and date) for the sort and for the next step. Reformat before sort to drop the rest:

  INREC FIELDS=(1,25,41,20)
  SORT FIELDS=(1,8,CH,A)

INREC FIELDS=(1,25,41,20) copies input 1–25 (25 bytes) and 41–60 (20 bytes) into a 45-byte record. The sort phase therefore sorts 45-byte records instead of 80. SORT FIELDS=(1,8,CH,A) sorts by the first 8 bytes of the reformatted record, which correspond to input bytes 1–8 (e.g. ID). All position references in SORT FIELDS are to the 45-byte record.

Example: Building a Composite Sort Key

You want to sort by region (input bytes 10–11) and then by account number (input bytes 20–29). In the input those are not adjacent. In INREC you build a single 12-byte key at the start: region then account number. The sort then uses that key.

  INREC FIELDS=(10,2,20,10,1,80)
  SORT FIELDS=(1,2,CH,A,3,10,CH,A)

INREC FIELDS=(10,2,20,10,1,80) builds: bytes 1–2 = region (from input 10–11), bytes 3–12 = account (from input 20–29), bytes 13–92 = full original record (1–80). So the reformatted record is 92 bytes with the composite key at the front. SORT FIELDS=(1,2,CH,A,3,10,CH,A) sorts by that key: first 2 bytes ascending, then next 10 bytes ascending. Note that this version makes each record 12 bytes longer than the input, so it does not reduce sort work by itself; without INREC, SORT FIELDS=(10,2,CH,A,20,10,CH,A) on the 80-byte record gives the same order. The pattern pays off when the key has to be converted first, or when you follow the key with only the input fields you still need instead of the whole record (1,80).
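A shortened variant of the same idea might look like the sketch below. It assumes, purely for illustration, that the only other data needed downstream is a 20-byte field at input bytes 30–49; that field is not part of the example above.

* Reformatted bytes 1-2 = region, 3-12 = account, 13-32 = input bytes 30-49.
  INREC FIELDS=(10,2,20,10,30,20)
* Region and account are adjacent, both character, both ascending,
* so one 12-byte key gives the same order as (1,2,CH,A,3,10,CH,A).
  SORT FIELDS=(1,12,CH,A)

Here the sort handles 32-byte records instead of 80-byte ones, and the key is a single field at the front.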

Positions After INREC: SORT FIELDS and INCLUDE/OMIT

After INREC, the control statements that DFSORT processes after it use the reformatted record. INCLUDE/OMIT is processed before INREC and does not. In detail:

SORT FIELDS=. Positions and lengths refer to the INREC output. If the reformatted record is 50 bytes, valid positions are 1–50. You cannot reference original input position 80 after shortening the record.
INCLUDE COND= / OMIT COND=. The condition is evaluated on the original input record, before INREC runs. So if the input has a type code at position 1, use something like INCLUDE COND=(1,1,CH,EQ,C'A') to keep only type A, regardless of where (or whether) INREC copies that byte.
SUM FIELDS=. If you use SUM, the sum fields are in the reformatted layout.

When you design INREC, plan the new layout so that the sort key and any sum fields are at known positions. Then write SORT FIELDS and SUM against that layout, and write INCLUDE/OMIT against the input layout.
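The sketch below puts these rules side by side. The layout is hypothetical: type code in input byte 1, customer ID in input bytes 11–18, and a 7-byte zoned-decimal amount in input bytes 41–47.

* Filter on the input layout: type code is input byte 1.
  INCLUDE COND=(1,1,CH,EQ,C'A')
* Reformatted record: bytes 1-8 = customer ID, bytes 9-15 = amount.
  INREC FIELDS=(11,8,41,7)
* Sort and sum on the reformatted layout.
  SORT FIELDS=(1,8,CH,A)
  SUM FIELDS=(9,7,ZD)

The reformatted record is 15 bytes long, so SORT and SUM may only reference positions 1–15, while INCLUDE is free to test any byte of the input record.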

INREC vs OUTREC: Quick Comparison

INREC runs before the sort; OUTREC runs after. So INREC shapes what the sort sees (and what gets written if you do not use OUTREC). OUTREC shapes only the final output. Use INREC to shorten records, build sort keys, or normalize the layout for the sort. Use OUTREC to format the sorted data for reports (edit masks, column order, sequence numbers). You can use both: INREC for a compact sort layout, OUTREC for the final file layout.
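A sketch of both statements in one job, assuming a hypothetical 80-byte input with an ID in bytes 1–8 and 20 bytes of amount and date data in bytes 41–60:

* Compact 28-byte sort layout: bytes 1-8 = ID, bytes 9-28 = input 41-60.
  INREC FIELDS=(1,8,41,20)
  SORT FIELDS=(1,8,CH,A)
* OUTREC positions refer to the 28-byte INREC record, not the input.
* Output: amount/date data first, one blank, then the ID (29 bytes).
  OUTREC FIELDS=(9,20,C' ',1,8)

The sort works on the 28-byte layout; the column order of the written file is decided only afterwards in OUTREC.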

Explain It Like I'm Five

Imagine you have a big worksheet with many columns, and you only need two columns to put the rows in order. Before sorting, you make a small copy of each row with just those two columns (and maybe a few more you need later). That small copy is like INREC: you reformat each row before sorting. Now you sort the small copies. Sorting is fast because each row is short. If you had sorted the big worksheet first and only then cut out the extra columns, you would have done a lot of extra work moving big rows. So we "reformat before sort" to make the sort step faster and to put the sort key in a simple place (e.g. always at the start of the small copy).

Exercises

Input is 100 bytes. You need to sort by bytes 5–14 (character) and only keep bytes 1–30 and 70–100 in the output. Write INREC to shorten the record and give the correct SORT FIELDS= for the reformatted record.
Why can you not build a composite sort key in OUTREC and expect the sort to use it?
After INREC FIELDS=(1,10,20,5), what is the length of the record seen by the sort? What position in the reformatted record holds the data that was at input positions 20–24?
List the processing order from "read from SORTIN" to "write to SORTOUT" and say which steps use the original input record and which use the INREC record.

Quiz

Test Your Knowledge

1. In the DFSORT processing flow, when is INREC applied relative to INCLUDE/OMIT and the sort phase?

After the sort
After INCLUDE/OMIT but before sort
Before INCLUDE/OMIT and before sort
Only when OPTION COPY is used

2. You have 100-byte input records but only need bytes 1–20 and 61–80 for the sort and output. What is the main performance benefit of using INREC to shorten the record?

INREC runs faster than OUTREC
The sort phase processes 40-byte records instead of 100, so less sortwork I/O and memory
INCLUDE runs faster on shorter records
There is no benefit

3. After INREC FIELDS=(1,10,50,5), what positions does SORT FIELDS= use?

Original input positions 1–10 and 50–54
Reformatted record positions 1–15 only
Reformatted record positions; e.g. 1,10,CH,A for first 10 bytes
SORT FIELDS cannot be used with INREC

4. Why would you build a composite sort key in INREC instead of in OUTREC?

OUTREC cannot build keys
The sort phase must see the key to order by it; INREC runs before sort, OUTREC runs after
INREC is required for multi-key sorts
Building in OUTREC would change the output format

5. What is the practical difference between INREC FIELDS= and INREC BUILD= for basic reformatting?

FIELDS= shortens, BUILD= lengthens
No difference; both define the same output layout when given the same items
BUILD= runs after the sort
FIELDS= cannot use constants