What are variable-length records in DFSORT?

Variable-length (VB) records have a 4-byte Record Descriptor Word (RDW) at the start of each record, followed by the data. The length of each record can vary up to the maximum (LRECL). DFSORT reads and writes VB records with the RDW intact. Positions in SORT FIELDS, INREC, OUTREC, INCLUDE, and OMIT count the RDW: it occupies positions 1-4, so the first data byte is position 5.

Does position 1 in SORT FIELDS include the RDW for VB?

Yes. For variable-length records, positions 1-4 are the RDW and the first data byte is position 5. So SORT FIELDS=(5,10,CH,A) sorts on the first 10 data bytes, whereas SORT FIELDS=(1,10,CH,A) would put the RDW and only the first 6 data bytes in the key.

What is LRECL for a VB dataset?

For RECFM=VB, LRECL is the maximum record length in bytes, and that maximum includes the 4-byte RDW. So the maximum data length is LRECL minus 4. For example, LRECL=100 means records can be up to 100 bytes total (4-byte RDW plus up to 96 bytes of data).

Can DFSORT sort variable-length records?

Yes. DFSORT supports variable-length (VB) input and output. Allocate SORTIN and SORTOUT with RECFM=VB and the appropriate LRECL. DFSORT maintains the RDW and sorts by the key you specify in SORT FIELDS; key positions count from the start of the record, so the first data byte is position 5.

When do I need the RECORD statement for variable-length data?

Use RECORD TYPE=V (optionally with LENGTH=n) when DFSORT cannot determine from the DD or dataset that records are variable-length, e.g. tape or dynamic allocation without DCB information. For disk datasets with DCB=(RECFM=VB,LRECL=n), RECORD is usually omitted.

Variable-Length Records - DFSORT VB and RDW

Variable-Length Records

In DFSORT, variable-length records are records whose length can differ from one record to the next. On the mainframe, variable-length datasets typically use RECFM=VB (variable blocked). Each record consists of a 4-byte RDW (Record Descriptor Word) followed by the data. The RDW contains the length of the entire record (including the RDW). DFSORT reads and writes these records with the RDW intact; it does not strip the RDW on input or add it on output. When you specify positions in control statements such as SORT FIELDS, INREC, OUTREC, INCLUDE, and OMIT, the RDW is counted: it occupies positions 1-4, and the first data byte is position 5. This page explains how variable-length records work, how the RDW is used, how to allocate SORTIN and SORTOUT for VB, and how position numbering affects your sort keys and build statements.


What Is a Variable-Length Record?

A variable-length record is one whose length can vary from record to record. In z/OS, variable-length datasets usually have RECFM=V or VB. The "V" means variable; the "B" means blocked (multiple records per block). Each record starts with a 4-byte Record Descriptor Word (RDW) that tells the system how long that record is. So the physical layout is: RDW (4 bytes) + data (variable bytes). The total length of the record (RDW + data) is stored in the RDW. Because the length can change from record to record, you can save space when some records are short and others are long, instead of padding every record to a fixed length.
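The layout described above (RDW, then data, record after record) can be modelled outside the mainframe. The following is a minimal Python sketch, not DFSORT code: `iter_vb_records` is a hypothetical helper, the sample bytes are ASCII rather than EBCDIC, and it assumes a plain run of RDW-prefixed records with no block descriptor words in between.

```python
import struct

def iter_vb_records(buf: bytes):
    """Yield the data portion of each record in a run of RDW-prefixed records."""
    offset = 0
    while offset < len(buf):
        # First 2 RDW bytes: total record length (big-endian), RDW included.
        (total_len,) = struct.unpack_from(">H", buf, offset)
        if total_len < 4 or offset + total_len > len(buf):
            raise ValueError(f"bad RDW at offset {offset}")
        # Skip the 4-byte RDW and return only the data bytes.
        yield buf[offset + 4 : offset + total_len]
        offset += total_len

# Two records: 3 data bytes (RDW length 7) and 5 data bytes (RDW length 9).
sample = b"\x00\x07\x00\x00ABC" + b"\x00\x09\x00\x00HELLO"
records = list(iter_vb_records(sample))
```

Each record's own RDW tells the reader where the next record starts, which is why no padding is needed between short and long records.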

The RDW (Record Descriptor Word)

The RDW is a 4-byte field at the beginning of each variable-length record. It contains the total length of the record in bytes (including the 4 bytes of the RDW itself). So if a record has 80 bytes of data, the RDW contains the value 84 (4 + 80). On z/OS, the first 2 bytes hold the length as a big-endian binary number, and the last 2 bytes are zero for ordinary (non-spanned) records. DFSORT and the access method use the RDW to know how many bytes to read for that record. When DFSORT writes variable-length output, it writes the RDW and then the data, so the output is also valid VB.
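To make the 80-byte example concrete, here is a small Python sketch that builds and reads an RDW. It assumes the usual z/OS layout for non-spanned records (2-byte big-endian length followed by 2 zero bytes); `make_rdw` and `record_length` are illustrative names, not part of DFSORT.

```python
import struct

RDW_LEN = 4  # the RDW itself is always 4 bytes

def make_rdw(data_len: int) -> bytes:
    """Build an RDW for a record with `data_len` bytes of data."""
    # Length field counts the RDW too; the last two bytes are zero.
    return struct.pack(">HH", data_len + RDW_LEN, 0)

def record_length(record: bytes) -> int:
    """Read the total record length (RDW included) from a record's RDW."""
    (total,) = struct.unpack_from(">H", record)
    return total

# 80 bytes of data gives an RDW value of 84 (hex 54).
record = make_rdw(80) + b" " * 80
```

Here `make_rdw(80)` produces the bytes `00 54 00 00`, and `record_length(record)` returns 84.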

Position Numbering: The RDW Counts

In DFSORT control statements, positions count from the first byte of the record, and for variable-length records that byte is the start of the RDW. The RDW occupies positions 1-4, so the first data byte is position 5, the second data byte is position 6, and so on. This applies to SORT FIELDS, INREC, OUTREC, INCLUDE, and OMIT. So if you code:

  SORT FIELDS=(5,10,CH,A)

you are sorting on the first 10 bytes of the data. Coding SORT FIELDS=(1,10,CH,A) instead would put the 4-byte RDW plus the first 6 data bytes in the key, which is rarely what you want. When you adapt control statements written for a fixed-length layout to a VB dataset with the same data layout, add 4 to every starting position.
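A short Python sketch can show how a 1-based position maps onto a VB record when the RDW occupies positions 1-4, which is how DFSORT counts positions for variable-length records. The `field` helper and the sample record are hypothetical, and the data is ASCII for readability.

```python
def field(record: bytes, position: int, length: int) -> bytes:
    """Return `length` bytes starting at 1-based `position`, RDW included."""
    start = position - 1
    return record[start : start + length]

# Data layout: 10-byte surname, 10-byte first name, 4-digit number.
data = b"SMITH".ljust(10) + b"JOHN".ljust(10) + b"0042"
rdw = (len(data) + 4).to_bytes(2, "big") + b"\x00\x00"
record = rdw + data

key_from_5 = field(record, 5, 10)   # first 10 data bytes
key_from_1 = field(record, 1, 10)   # RDW plus only 6 data bytes
```

`key_from_5` is the padded surname, while `key_from_1` starts with the four RDW bytes, so a key coded at position 1 would order records mostly by their length.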

LRECL for VB Datasets

When you allocate a variable-length dataset (e.g. SORTIN or SORTOUT), you specify LRECL (logical record length). For RECFM=VB, LRECL is the maximum record length, and that maximum includes the 4-byte RDW. So:

VB LRECL and data length
LRECL	Maximum data length
44	40 bytes (44 - 4)
100	96 bytes
256	252 bytes

Records can be shorter than the maximum; each record's actual length is in its RDW. You must set LRECL large enough for the longest record you will read or write.
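The arithmetic in the table can be written as a few lines of Python. This is only a sketch of the LRECL rule stated above; the function names are made up for illustration.

```python
RDW_LEN = 4  # every VB record starts with a 4-byte RDW

def max_data_length(lrecl: int) -> int:
    """Largest data portion that fits in a VB record of the given LRECL."""
    return lrecl - RDW_LEN

def fits(data_len: int, lrecl: int) -> bool:
    """True if a record with `data_len` data bytes fits within LRECL."""
    return data_len + RDW_LEN <= lrecl

def smallest_lrecl(data_lengths) -> int:
    """Smallest LRECL that holds the longest data portion in the list."""
    return max(data_lengths) + RDW_LEN
```

For example, `max_data_length(100)` is 96, and a dataset whose longest data portion is 96 bytes needs an LRECL of at least 100.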

SORTIN and SORTOUT for Variable-Length

For variable-length input, allocate SORTIN with RECFM=VB and LRECL equal to the maximum record length (including RDW). DFSORT uses each record's RDW to know how many bytes to read. For variable-length output, allocate SORTOUT with RECFM=VB and an LRECL at least as large as the longest record you will write. To produce fixed-length output from VB input, build a fixed-length record with OUTFIL using VTOF (or its alias CONVERT) together with BUILD, and allocate the output as RECFM=FB with that fixed LRECL. OUTFIL with FTOV converts in the other direction, from FB to VB.
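Conceptually, converting between VB and FB comes down to removing or adding the RDW and fixing the data length. The Python sketch below models that idea only; it is not how DFSORT is implemented, the helper names are hypothetical, and padding short records with blanks is an assumption chosen for the example.

```python
def vb_to_fb(record: bytes, fixed_len: int, pad: bytes = b" ") -> bytes:
    """Drop the 4-byte RDW, then pad or truncate the data to `fixed_len`."""
    data = record[4:]
    return data[:fixed_len].ljust(fixed_len, pad)

def fb_to_vb(record: bytes) -> bytes:
    """Prefix a fixed-length record with an RDW giving its total length."""
    rdw = (len(record) + 4).to_bytes(2, "big") + b"\x00\x00"
    return rdw + record

short = b"\x00\x07\x00\x00ABC"         # 3 data bytes
long_ = b"\x00\x0c\x00\x00ABCDEFGH"    # 8 data bytes
fixed = [vb_to_fb(r, 5) for r in (short, long_)]
```

The fixed-length output needs an LRECL equal to `fixed_len`, while the VB form of the same data needs 4 more bytes for the RDW.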

INREC and OUTREC with VB

When you use INREC or OUTREC with VB records, positions count the RDW: positions 1-4 are the RDW and position 5 is the first data byte. The record you build must itself begin with a 4-byte RDW, so a BUILD for VB data normally starts with 1,4 to carry the input RDW forward, followed by the data fields you want. You do not calculate the new length yourself; DFSORT sets the length in the RDW to match the record you built. Consult the DFSORT Application Programming Guide for the full rules on VB records with INREC and OUTREC.
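As a rough mental model of building a VB output record, the Python sketch below concatenates fields given as (position, length) pairs, requires the RDW as the first field, and then rewrites the length to match the result. It is a simplification under stated assumptions: `build_vb` is a hypothetical helper, it only accepts (1, 4) as the first field, and it ignores everything else DFSORT's reformatting statements can do.

```python
def build_vb(record: bytes, fields) -> bytes:
    """Build a new VB record from 1-based (position, length) pairs.

    Positions count the RDW, so the first pair must be (1, 4).
    """
    if not fields or fields[0] != (1, 4):
        raise ValueError("first field must be the RDW at (1, 4)")
    out = b"".join(record[p - 1 : p - 1 + n] for p, n in fields)
    # Replace the copied length with the length of the record just built.
    return len(out).to_bytes(2, "big") + out[2:]

data = b"SMITH".ljust(10) + b"JOHN".ljust(10) + b"0042"
record = (len(data) + 4).to_bytes(2, "big") + b"\x00\x00" + data

# Keep the RDW, then the number (positions 25-28), then the surname (5-14).
rebuilt = build_vb(record, [(1, 4), (25, 4), (5, 10)])
```

The rebuilt record is 18 bytes long (4 + 4 + 10), and its RDW reflects that new length rather than the input record's 28.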

RECORD Statement for Variable-Length

When DFSORT cannot determine from the DD or dataset that the data is variable-length (e.g. tape or dynamic allocation without RECFM), you can use the RECORD control statement to supply the format:

  RECORD TYPE=V,LENGTH=256

TYPE=V indicates variable-length records. LENGTH=256 specifies the maximum record length, including the RDW. This is needed when the DCB information is missing or incomplete. For normal disk datasets with DCB=(RECFM=VB,LRECL=n), you do not need RECORD.

Variable vs Fixed: When to Use VB

Use variable-length (VB) when records naturally have different lengths—e.g. free-form text, variable-length comments, or records that are only as long as needed. You save space and avoid padding. Use fixed-length (FB) when every record has the same length—e.g. 80-byte card images or fixed-format files. DFSORT supports both; just ensure SORTIN and SORTOUT have the correct RECFM and LRECL for the data you are reading and writing.

Short Records and VLSHRT

When records are variable-length, some records may be shorter than the maximum. If you specify a sort key that extends past the end of a short record (e.g. a 20-byte key at positions 5-24 when the record has only 15 data bytes), DFSORT with NOVLSHRT in effect (the usual default) treats this as an error and terminates. With OPTION VLSHRT, processing continues: short SORT or MERGE keys are padded with binary zeros, and an INCLUDE or OMIT comparison against a field that is not fully present is treated as false. See the VLSHRT tutorial for details. In general, ensure your sort key and build specifications do not assume more bytes than the shortest record you will process, or use INREC to normalize to a fixed-length key.
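The effect of padding a short key can be sketched in Python. This models one plausible treatment, padding the missing key bytes with binary zeros so that short records sort low, which matches my understanding of VLSHRT for SORT keys; treat that as an assumption. The helpers are hypothetical, and the comparison uses ASCII byte order rather than EBCDIC.

```python
def make_record(data: bytes) -> bytes:
    """Prefix data with an RDW holding the total record length."""
    return (len(data) + 4).to_bytes(2, "big") + b"\x00\x00" + data

def sort_key(record: bytes, position: int, length: int) -> bytes:
    """Extract a key at a 1-based position; pad short keys with X'00'."""
    key = record[position - 1 : position - 1 + length]
    return key.ljust(length, b"\x00")

records = [make_record(d) for d in (b"ZEBRA", b"AB", b"ABACUS")]

# Sort on a 6-byte key starting at position 5 (the first data byte).
ordered = sorted(records, key=lambda r: sort_key(r, 5, 6))
ordered_data = [r[4:] for r in ordered]
```

Because the padding byte is lower than any character, the 2-byte record sorts ahead of longer records that begin with the same characters.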

Explain It Like I'm Five

Imagine a stack of papers where each paper can be a different length. Each paper has a small label at the top that says "I am 5 lines long" or "I am 10 lines long." That label is like the RDW: it tells the computer how long that record is. The sort program reads the label, then reads that many bytes (including the label), and sorts the papers. The label counts as part of the paper, so "position 1" is the first byte of the label. The label takes up positions 1 to 4, and the actual writing starts at position 5. So when we talk about where the name or number is, we count the label's 4 bytes first.

Exercises

Your VB dataset has LRECL=100. What is the maximum number of data bytes in any one record?
Write a SORT FIELDS statement to sort a VB file by the first 15 bytes of data (after the RDW) as character ascending. Does position 1 refer to the RDW?
If SORTIN is VB and SORTOUT is FB, what must you do in your control statements and DD allocation so that the output is correct?
When would you code RECORD TYPE=V in SYSIN?

Quiz

Test Your Knowledge

1. What is the RDW in a variable-length record?

A record key
A 4-byte Record Descriptor Word at the start of each record that contains the record length
A sort field
Optional; only used in DFSORT

2. When you specify SORT FIELDS=(1,10,CH,A) for a VB input, does position 1 refer to the first byte of the RDW or the first byte of data?

The first byte of the RDW
The first byte of the data (the byte immediately after the 4-byte RDW)
It depends on OPTION
Position 1 is never used for VB

3. For a VB dataset, what does LRECL on the DD statement represent?

The length of the data only (excluding RDW)
The maximum record length, including the 4-byte RDW
The average record length
The block size

4. Does DFSORT strip the RDW when reading SORTIN or add it when writing SORTOUT?

DFSORT strips the RDW on input and adds it on output
DFSORT maintains the RDW: it reads and writes VB records with the RDW intact
DFSORT always converts VB to FB
The RDW is only used for MERGE

5. When would you use RECORD TYPE=V in DFSORT?

Always for VB datasets
When the input or output dataset does not have a proper DCB (e.g. tape or dynamic allocation without RECFM/LRECL) so DFSORT needs to be told the format
Only for OUTFIL
Only when using INREC