MainframeMaster

Field Extraction

Field extraction in DFSORT INREC means copying only selected segments of the input record into the record that the sort (and later steps) will see. Instead of passing the full input record to the sort, you list (position,length) pairs in INREC FIELDS= or BUILD=: each pair specifies a starting position and length in the input record, and DFSORT copies that segment into the next available positions in the output record. The segments are concatenated in the order you list them—so you can reorder columns, drop unneeded data, or even repeat the same segment. This page explains how extraction works, how (position,length) relates to input vs output, how to extract multiple and non-contiguous fields, reordering and duplicate extraction, and how the output length is determined. Field extraction is the foundation of shortening records and building a clean layout before the sort.

What Is Field Extraction?

In the context of INREC, field extraction means selecting specific ranges of bytes from the input record and placing them (in order) into a new record. You do not copy the whole record; you copy only the ranges you specify. Each range is given as (position, length): position is the starting byte number in the input (1-based), and length is how many consecutive bytes to copy. The first extracted segment goes at the start of the output, the next segment immediately after it, and so on. There are no gaps unless you insert constants (e.g. C' ') between segments. The result is a single record that is usually shorter than the input and that may have a different column order.

Syntax: (position, length)

Every field extraction uses a pair of numbers in parentheses: (position, length).

  • Position — The starting byte in the input record. Position 1 is the first byte. So (1,20) starts at the very beginning of the input; (50,10) starts at the 50th byte.
  • Length — How many bytes to copy from the input. So (1,20) copies 20 bytes (input positions 1–20); (50,10) copies 10 bytes (input positions 50–59).

Both numbers always refer to the input record. The output record is built by appending each copied segment one after the other. So the first (position,length) fills output positions 1 through length; the second fills the next length positions, and so on.
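The rule above can be modelled in a few lines of Python. This is only a sketch of the position/length arithmetic, not DFSORT code: the function name extract and the sample record are made up for illustration, and the error raised for an out-of-range pair is an assumption of this model rather than a description of DFSORT's own messages.

```python
def extract(record: bytes, pairs):
    """Concatenate (position, length) segments of record; positions are 1-based."""
    out = bytearray()
    for position, length in pairs:
        if position < 1 or length < 1:
            raise ValueError(f"({position},{length}) is not a valid pair")
        start = position - 1          # 1-based position -> 0-based index
        end = start + length          # exclusive end index
        if end > len(record):
            # Assumption of this model: reject pairs that run past the record.
            raise ValueError(f"({position},{length}) runs past the end of the record")
        out += record[start:end]      # append to the next free output positions
    return bytes(out)


record = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(extract(record, [(1, 3), (24, 3)]))   # b'ABCXYZ'
```

Both pairs index into the input; the output positions are never written down because they follow from the order of the list.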

INREC FIELDS= with Multiple Extractions

You list all extraction pairs (and any constants) in INREC FIELDS=(item1,item2,...) or INREC BUILD=(item1,item2,...). Items are processed left to right. Example:

  INREC FIELDS=(1,15,40,20,5,10)

This extracts: (1) bytes 1–15 from the input (15 bytes), (2) bytes 40–59 (20 bytes), (3) bytes 5–14 (10 bytes). They are placed in the output in that order. Output length = 15 + 20 + 10 = 45 bytes. Output positions 1–15 = input 1–15; output 16–35 = input 40–59; output 36–45 = input 5–14. So you have reordered and shortened the record.
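The input-to-output mapping in this example can be checked with a short Python sketch. The helper name output_layout is hypothetical; it only does the arithmetic described above (each segment starts where the previous one ended) and does not call DFSORT.

```python
def output_layout(pairs):
    """For each (position, length) pair, return
    (input_first, input_last, output_first, output_last), all 1-based."""
    layout = []
    next_out = 1                                  # next free output position
    for position, length in pairs:
        layout.append((position, position + length - 1,
                       next_out, next_out + length - 1))
        next_out += length                        # segments are appended with no gaps
    return layout


for row in output_layout([(1, 15), (40, 20), (5, 10)]):
    print("input %d-%d -> output %d-%d" % row)
# input 1-15 -> output 1-15
# input 40-59 -> output 16-35
# input 5-14 -> output 36-45
```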

Extraction examples

Specification | Meaning | Output
(1,10) | Extract 10 bytes starting at input position 1 | Output positions 1–10
(50,5) | Extract 5 bytes starting at input position 50 | Next 5 bytes (e.g. 11–15 if the first item was 1,10)
(1,10,1,10) | Same 10 bytes copied twice | 20 bytes total

Reordering Fields by Extraction Order

The order in which you list (position,length) pairs is the order in the output. So you can put a field that appears in the middle of the input at the start of the output—for example, put the sort key first. Example: input has ID at 1–8, name at 21–40, and amount at 41–48. You want the output to be: amount, then ID, then name (e.g. for a report or for a different sort key). Use:

  INREC FIELDS=(41,8,1,8,21,20)

Output: bytes 1–8 = amount (from input 41–48), bytes 9–16 = ID (from input 1–8), bytes 17–36 = name (from input 21–40). Total 36 bytes. The sort can then name the amount at its new position, for example SORT FIELDS=(1,8,PD,A) if the amount is packed decimal or SORT FIELDS=(1,8,ZD,A) if it is zoned decimal.

Extracting the Same Field More Than Once

You can use the same (position,length) more than once. That copies the same input bytes to multiple places in the output. For example, if the key is at input 10–19 and you want it at both the start of the record (for SORT FIELDS) and later (for a report column), you could use:

  INREC FIELDS=(10,10,1,80)

That puts the 10-byte key first, then the full 80-byte record, for 90 bytes in total. The key appears at output 1–10 and again at output 20–29, because the copy of input 1–80 occupies output 11–90 and the key sits at input 10–19. Alternatively, to duplicate the key in two separate places without copying the full record, (10,10,1,9,10,10,21,60) puts the key at output 1–10, input 1–9 at 11–19, the key again at 20–29, and input 21–80 at 30–89. The same segment is repeated simply by listing it again.
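To see where a duplicated field ends up, the following Python sketch walks a list of (position,length) pairs and reports every output range that holds a given input range. The name find_in_output is hypothetical and the function is only a model of the offset arithmetic; it reports a hit only when one listed segment contains the whole input range.

```python
def find_in_output(pairs, in_first, in_last):
    """Return 1-based output (first, last) ranges that hold input bytes
    in_first..in_last, for segments that contain that range completely."""
    hits = []
    next_out = 1                                   # next free output position
    for position, length in pairs:
        seg_last = position + length - 1
        if position <= in_first and in_last <= seg_last:
            out_first = next_out + (in_first - position)
            hits.append((out_first, out_first + (in_last - in_first)))
        next_out += length
    return hits


# Key at input 10-19, copied once on its own and once inside the full record.
print(find_in_output([(10, 10), (1, 80)], 10, 19))   # [(1, 10), (20, 29)]
```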

Output Record Length

The length of the reformatted record is the sum of the lengths of all (position,length) items, plus the length of any constants (e.g. C'x' adds 1). Positions do not affect output length; only the lengths do. So (1,10,50,5,90,10) produces 10 + 5 + 10 = 25 bytes. Make sure SORTOUT (and any downstream DCB) has LRECL at least this size when you do not use OUTREC to change it again.
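The length rule is simple enough to verify with a small Python sketch. The representation is an assumption made for illustration: a (position, length) pair is a tuple and a constant such as C',' is a bytes value. The helper name output_length is hypothetical.

```python
def output_length(items):
    """Sum the lengths of all items: tuples are (position, length) pairs,
    bytes values stand in for constants such as C','."""
    total = 0
    for item in items:
        if isinstance(item, bytes):
            total += len(item)        # a constant contributes its own length
        else:
            _position, length = item  # the position never affects the length
            total += length
    return total


print(output_length([(1, 10), (50, 5), (90, 10)]))                 # 25
print(output_length([(1, 8), b",", (9, 30), b"|", (39, 10)]))      # 50
```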

Overlapping vs Non-Overlapping Ranges

Each (position,length) is independent. The ranges can overlap in the input. For example, (1,20,11,20) copies input 1–20 and then input 11–30. The two segments overlap (bytes 11–20 appear in both). DFSORT allows this; both copies are placed in the output. Usually you use non-overlapping ranges to extract distinct fields, but overlapping is valid if you need the same bytes in two places in the output.

Combining Extraction with Constants

You can mix (position,length) with constants. Constants are inserted at that position in the output. Example:

  INREC BUILD=(1,8,C',',9,30,C'|',39,10)

Output: 8 bytes (input 1–8), comma, 30 bytes (input 9–38), pipe, 10 bytes (input 39–48). So you are both extracting and inserting delimiters. Total length: 8+1+30+1+10 = 50 bytes. This is still "field extraction" in the sense that you are choosing which input ranges to copy; the constants are not extracted from the input.
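As a rough illustration, the same build can be modelled in Python on a made-up 48-byte record. The function name build, the sample field values, and the use of bytes values for constants are all assumptions of this sketch; it does no bounds checking, so it relies on the sample record being long enough for every pair.

```python
def build(record: bytes, items):
    """Model of a BUILD list: tuples are (position, length) pairs taken from
    record (1-based), bytes values are constants inserted as-is."""
    out = bytearray()
    for item in items:
        if isinstance(item, bytes):
            out += item                                   # constant, not from input
        else:
            position, length = item
            out += record[position - 1:position - 1 + length]
    return bytes(out)


# Hypothetical layout: ID at 1-8, name at 9-38, amount at 39-48.
record = b"ID000001" + b"NAME".ljust(30) + b"0000012345"
result = build(record, [(1, 8), b",", (9, 30), b"|", (39, 10)])
print(len(result))        # 50
print(result[:9])         # b'ID000001,'
```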

Why Extract Fields?

The main reasons to extract only certain fields in INREC are: (1) Shorten the record so the sort uses less memory and sortwork and runs faster. (2) Reorder so the sort key or important columns are at fixed positions (e.g. 1–10) for SORT FIELDS, OUTREC, and OUTFIL. Note that the INCLUDE and OMIT statements are applied before INREC, so they still refer to positions in the original input record. (3) Drop sensitive or unneeded data so it never reaches sortwork or output. (4) Build a consistent layout when input has variable or awkward layouts: extract only what you need into a fixed layout.

Explain It Like I'm Five

Imagine a long strip of paper with lots of words. You have a highlighter and you only highlight a few pieces: the first word, a word in the middle, and the last word. Then you cut out only those highlighted pieces and tape them onto a new, shorter strip in the order you want. Field extraction is like that: you tell the computer which pieces (position and length) to take from the big record, and it builds a new record from just those pieces. The new strip is shorter and can have the pieces in a different order. So we "extract" only the fields we need.

Exercises

  1. Write INREC FIELDS= to extract input bytes 1–10, 25–30, and 70–80. What is the output length? What are the output positions of the data that was at input 25–30?
  2. You want the output to have: input bytes 50–60 first, then bytes 1–20. Write the INREC FIELDS= and give the output length.
  3. INREC FIELDS=(1,5,1,5) — what is the output length and what does it contain?
  4. Input is 100 bytes. You extract (1,20,60,15). What is the highest position you can reference in a later SORT FIELDS= on the reformatted record?

Quiz

Test Your Knowledge

1. What does INREC FIELDS=(20,10,1,8) do?

  • Copies bytes 1–8 then 20–29 in that order
  • Copies bytes 20–29 then 1–8 in that order
  • Copies 20 bytes starting at 1
  • Extracts only byte 8

2. In INREC, do (position,length) pairs refer to the input record or the output record?

  • Output record
  • Input record
  • Either, depending on OPTION
  • The sort key only

3. Can you extract the same input bytes more than once in one INREC FIELDS= list?

  • No
  • Yes—e.g. (1,10,1,10) puts the same 10 bytes twice in the output
  • Only with BUILD=
  • Only for the sort key

4. What is the output record length of INREC FIELDS=(1,5,30,12,50,8)?

  • 50 bytes
  • 57 bytes (the last input position referenced)
  • 5+12+8 = 25 bytes
  • Longest segment only

5. Why extract fields in INREC instead of keeping the full record?

  • INREC requires extraction
  • To shorten the record so the sort processes less data and runs faster
  • Extraction is only for OUTREC
  • To change the sort order