Field extraction in DFSORT INREC means copying only selected segments of the input record into the record that the sort (and later steps) will see. Instead of passing the full input record to the sort, you list (position,length) pairs in INREC FIELDS= or BUILD=: each pair specifies a starting position and length in the input record, and DFSORT copies that segment into the next available positions in the output record. The segments are concatenated in the order you list them—so you can reorder columns, drop unneeded data, or even repeat the same segment. This page explains how extraction works, how (position,length) relates to input vs output, how to extract multiple and non-contiguous fields, reordering and duplicate extraction, and how the output length is determined. Field extraction is the foundation of shortening records and building a clean layout before the sort.
In the context of INREC, field extraction means selecting specific ranges of bytes from the input record and placing them (in order) into a new record. You do not copy the whole record; you copy only the ranges you specify. Each range is given as (position, length): position is the starting byte number in the input (1-based), and length is how many consecutive bytes to copy. The first extracted segment goes at the start of the output, the next segment immediately after it, and so on. There are no gaps unless you insert constants (e.g. C' ') between segments. The result is a single record that is usually shorter than the input and that may have a different column order.
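Every field extraction is a pair of numbers, written on this page as (position,length): position is the 1-based starting byte and length is the number of bytes to copy. Inside FIELDS= or BUILD= the pairs are not parenthesized individually; they are listed one after another, separated by commas, within a single set of parentheses.
Both numbers always refer to the input record. The output record is built by appending each copied segment after the previous one, so the first pair fills output positions 1 through its length, the second fills the positions immediately following, and so on.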
You list all extraction pairs (and any constants) in INREC FIELDS=(item1,item2,...) or INREC BUILD=(item1,item2,...). Items are processed left to right. Example:
  INREC FIELDS=(1,15,40,20,5,10)
This extracts: (1) bytes 1–15 from the input (15 bytes), (2) bytes 40–59 (20 bytes), (3) bytes 5–14 (10 bytes). They are placed in the output in that order. Output length = 15 + 20 + 10 = 45 bytes. Output positions 1–15 = input 1–15; output 16–35 = input 40–59; output 36–45 = input 5–14. So you have reordered and shortened the record.
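The mapping described above can be checked with a small Python model. This is only a sketch of the byte arithmetic, not DFSORT itself; the `extract` helper and the 80-byte test record are assumptions made for illustration.

```python
def extract(record: str, pairs: list[tuple[int, int]]) -> str:
    """Concatenate (position, length) segments of record; positions are 1-based."""
    out = []
    for position, length in pairs:
        start = position - 1              # convert 1-based position to 0-based index
        out.append(record[start:start + length])
    return "".join(out)

# Hypothetical 80-byte input: each byte is the last digit of its own position.
record = "".join(str(i % 10) for i in range(1, 81))

result = extract(record, [(1, 15), (40, 20), (5, 10)])
print(len(result))       # 45
print(result[:15])       # output 1-15  = input 1-15
print(result[15:35])     # output 16-35 = input 40-59
print(result[35:])       # output 36-45 = input 5-14
```

Because every byte of the test record encodes its own position, the printed segments show directly which input bytes ended up where.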
| Specification | Meaning | Output |
|---|---|---|
| (1,10) | Extract 10 bytes starting at input position 1 | Output positions 1–10 |
| (50,5) | Extract 5 bytes starting at input position 50 | Next 5 bytes (e.g. 11–15 if first item was 1,10) |
| (1,10,1,10) | Same 10 bytes copied twice | 20 bytes total |
The order in which you list (position,length) pairs is the order in the output. So you can put a field that appears in the middle of the input at the start of the output—for example, put the sort key first. Example: input has ID at 1–8, name at 21–40, and amount at 41–48. You want the output to be: amount, then ID, then name (e.g. for a report or for a different sort key). Use:
  INREC FIELDS=(41,8,1,8,21,20)
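Output: bytes 1–8 = amount (from input 41–48), bytes 9–16 = ID (from input 1–8), bytes 17–36 = name (from input 21–40). Total 36 bytes. The sort can then reference the amount at positions 1–8, for example SORT FIELDS=(1,8,ZD,A) if the amount is zoned decimal, or PD in place of ZD if it is packed. The format in SORT FIELDS must match how the amount is actually stored.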
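As a rough illustration of the reordering, the following Python sketch applies the same three pairs to a sample record. The record contents, the filler at 9–20, and the `extract` helper are hypothetical, and the amount is held as character digits purely to keep the example readable.

```python
def extract(record, pairs):
    """Concatenate 1-based (position, length) segments of record."""
    return "".join(record[pos - 1:pos - 1 + length] for pos, length in pairs)

# Hypothetical 48-byte record: ID at 1-8, filler at 9-20, name at 21-40, amount at 41-48.
record = "ID000001" + " " * 12 + "JANE SMITH".ljust(20) + "00012345"

reordered = extract(record, [(41, 8), (1, 8), (21, 20)])
print(repr(reordered))
print(len(reordered))    # 36
```

The filler at input 9–20 never appears in the output because no pair covers it.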
You can use the same (position,length) more than once. That copies the same input bytes to multiple places in the output. For example, if the key is at input 10–19 and you want it at both the start of the record (for SORT FIELDS) and later (for a report column), you could use:
  INREC FIELDS=(10,10,1,80)
That puts the 10-byte key first, then the full 80-byte record, for a 90-byte output. The key appears at output 1–10 and again at output 20–29, because input 1–80 occupies output 11–90 and input 10–19 therefore lands 10 bytes further along. Alternatively, to place the key twice without copying the full record, (10,10,1,9,10,10,21,60) puts the key at output 1–10, input 1–9 at 11–19, the key again at 20–29, and input 21–80 at 30–89. Either way, the same segment is repeated simply by listing it again.
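When a segment is copied more than once, it helps to work out where each copy lands. The sketch below is a hypothetical Python helper, not a DFSORT feature: it maps every pair to its output range and then reports each output range that contains a given input range.

```python
def output_map(pairs):
    """Return (out_start, out_end, in_start, in_end) per pair, 1-based and inclusive."""
    rows, next_out = [], 1
    for position, length in pairs:
        rows.append((next_out, next_out + length - 1, position, position + length - 1))
        next_out += length
    return rows

def where_in_output(pairs, in_start, in_end):
    """List every output range that holds the whole input range in_start..in_end."""
    hits = []
    for out_start, _out_end, seg_start, seg_end in output_map(pairs):
        if seg_start <= in_start and in_end <= seg_end:
            offset = in_start - seg_start
            hits.append((out_start + offset, out_start + offset + (in_end - in_start)))
    return hits

# Key at input 10-19, listed first and then again as part of the full record.
print(where_in_output([(10, 10), (1, 80)], 10, 19))                      # [(1, 10), (20, 29)]
# Key listed explicitly twice, without copying the full record.
print(where_in_output([(10, 10), (1, 9), (10, 10), (21, 60)], 10, 19))   # [(1, 10), (20, 29)]
```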
The length of the reformatted record is the sum of the lengths of all (position,length) items, plus the length of any constants (e.g. C'x' adds 1). Positions do not affect output length; only the lengths do. So (1,10,50,5,90,10) produces 10 + 5 + 10 = 25 bytes. Make sure SORTOUT (and any downstream DCB) has LRECL at least this size when you do not use OUTREC to change it again.
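The length rule is simple enough to express as a short Python function. This is a sketch under the assumption that items are modelled as either (position, length) tuples or plain strings standing in for constants; `output_length` is a hypothetical name.

```python
def output_length(items):
    """Sum the lengths of (position, length) pairs and of constant strings."""
    total = 0
    for item in items:
        if isinstance(item, tuple):
            _position, length = item   # the position does not affect the length
            total += length
        else:
            total += len(item)         # a constant such as C'x' contributes its own length
    return total

print(output_length([(1, 10), (50, 5), (90, 10)]))   # 25
print(output_length([(1, 10), "x", (50, 5)]))        # 16
```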
Each (position,length) is independent. The ranges can overlap in the input. For example, (1,20,11,20) copies input 1–20 and then input 11–30. The two segments overlap (bytes 11–20 appear in both). DFSORT allows this; both copies are placed in the output. Usually you use non-overlapping ranges to extract distinct fields, but overlapping is valid if you need the same bytes in two places in the output.
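A quick Python model makes the overlap visible. The 30-byte sample record and the `extract` helper are assumptions for illustration only.

```python
import string

def extract(record, pairs):
    """Concatenate 1-based (position, length) segments of record."""
    return "".join(record[pos - 1:pos - 1 + length] for pos, length in pairs)

record = string.ascii_uppercase + "0123"   # hypothetical 30-byte input
out = extract(record, [(1, 20), (11, 20)])

print(len(out))                  # 40
print(out[10:20], out[20:30])    # KLMNOPQRST KLMNOPQRST
```

Input bytes 11–20 show up twice: once at the end of the first segment and once at the start of the second.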
You can mix (position,length) with constants. Constants are inserted at that position in the output. Example:
  INREC BUILD=(1,8,C',',9,30,C'|',39,10)
Output: 8 bytes (input 1–8), comma, 30 bytes (input 9–38), pipe, 10 bytes (input 39–48). So you are both extracting and inserting delimiters. Total length: 8+1+30+1+10 = 50 bytes. This is still "field extraction" in the sense that you are choosing which input ranges to copy; the constants are not extracted from the input.
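The same layout can be modelled in Python by treating each item as either a (position, length) tuple or a literal string. This is a sketch, with a hypothetical `build` helper and a made-up 48-byte record whose three fields are filled with A, B, and C.

```python
def build(record, items):
    """Build an output record from 1-based (position, length) pairs and literal strings."""
    parts = []
    for item in items:
        if isinstance(item, tuple):
            position, length = item
            parts.append(record[position - 1:position - 1 + length])
        else:
            parts.append(item)   # constant: copied as-is, not taken from the input
    return "".join(parts)

# Hypothetical 48-byte input: fields at 1-8, 9-38, and 39-48.
record = "A" * 8 + "B" * 30 + "C" * 10

line = build(record, [(1, 8), ",", (9, 30), "|", (39, 10)])
print(line)
print(len(line))   # 50
```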
The main reasons to extract only certain fields in INREC are: (1) Shorten the record so the sort uses less memory and sortwork and runs faster. (2) Reorder so the sort key or important columns are at fixed positions (e.g. 1–10) for SORT FIELDS, OUTREC, and OUTFIL. Note that the INCLUDE and OMIT statements are applied before INREC, so they still refer to positions in the original input record. (3) Drop sensitive or unneeded data so it never reaches sortwork or output. (4) Build a consistent layout when input has variable or awkward layouts: extract only what you need into a fixed layout.
Imagine a long strip of paper with lots of words. You have a highlighter and you only highlight a few pieces: the first word, a word in the middle, and the last word. Then you cut out only those highlighted pieces and tape them onto a new, shorter strip in the order you want. Field extraction is like that: you tell the computer which pieces (position and length) to take from the big record, and it builds a new record from just those pieces. The new strip is shorter and can have the pieces in a different order. So we "extract" only the fields we need.
1. What does INREC FIELDS=(20,10,1,8) do?
2. In INREC, do (position,length) pairs refer to the input record or the output record?
3. Can you extract the same input bytes more than once in one INREC FIELDS= list?
4. What is the output record length of INREC FIELDS=(1,5,30,12,50,8)?
5. Why extract fields in INREC instead of keeping the full record?