In DFSORT, variable-length records are records whose lengths can differ from one record to the next. On the mainframe, variable-length datasets typically use RECFM=VB (variable blocked). Each record consists of a 4-byte RDW (Record Descriptor Word) followed by the data, and the RDW contains the length of the entire record, including the RDW itself. DFSORT reads and writes these records with the RDW intact; it does not strip the RDW on input or add it on output, so the format is preserved. Because the RDW is part of the record DFSORT sees, positions in control statements such as SORT FIELDS, INREC, OUTREC, INCLUDE, and OMIT count the RDW: positions 1-4 are the RDW, and position 5 is the first data byte. This page explains how variable-length records work, how the RDW is used, how to allocate SORTIN and SORTOUT for VB, and how position numbering affects your sort keys and build statements.
A variable-length record is one whose length can vary from record to record. In z/OS, variable-length datasets usually have RECFM=V or VB. The "V" means variable; the "B" means blocked (multiple records per block). Each record starts with a 4-byte Record Descriptor Word (RDW) that tells the system how long that record is. So the physical layout is: RDW (4 bytes) + data (variable bytes). The total length of the record (RDW + data) is stored in the RDW. Because the length can change from record to record, you can save space when some records are short and others are long, instead of padding every record to a fixed length.
The RDW is a 4-byte field at the beginning of each variable-length record. It contains the total length of the record in bytes, including the 4 bytes of the RDW itself. So if a record has 80 bytes of data, the RDW contains the value 84 (4 + 80). On z/OS, the first two bytes of the RDW hold this length as a big-endian binary number; for ordinary (non-spanned) records the last two bytes are zero. DFSORT and the access method use the RDW to know how many bytes belong to that record. When DFSORT writes variable-length output, it writes the RDW and then the data, so the output is also valid VB.
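The RDW layout described above can be modeled in a few lines of Python. This is a minimal sketch, not DFSORT code: it assumes non-spanned records laid end to end in a byte buffer and ignores the block descriptor word (BDW) that precedes each block in a real VB dataset. The names `build_record` and `read_records` are made up for the illustration.

```python
import struct

RDW_LEN = 4  # the RDW is always 4 bytes


def build_record(data: bytes) -> bytes:
    """Prefix data with an RDW: 2-byte big-endian total length, then 2 zero bytes."""
    total = len(data) + RDW_LEN  # the length in the RDW includes the RDW itself
    return struct.pack(">HH", total, 0) + data


def read_records(buf: bytes):
    """Yield the data portion of each RDW-prefixed record in buf (BDW ignored)."""
    offset = 0
    while offset < len(buf):
        length, _reserved = struct.unpack_from(">HH", buf, offset)
        if length < RDW_LEN:
            raise ValueError(f"invalid RDW length {length} at offset {offset}")
        yield buf[offset + RDW_LEN : offset + length]
        offset += length  # the RDW length tells us where the next record starts


# An 80-byte data portion produces an RDW holding 84 (X'0054').
record = build_record(b"A" * 80)
assert record[:4] == b"\x00\x54\x00\x00"
```

Reading works the same way in reverse: the length in each RDW is the only thing that tells the reader where one record ends and the next begins.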
In DFSORT control statements, positions are counted from the first byte of the record as DFSORT sees it. For variable-length records, that record includes the RDW: positions 1-4 are the RDW, position 5 is the first data byte, position 6 is the second data byte, and so on. This applies to the positions you give in SORT FIELDS, INREC, OUTREC, INCLUDE, and OMIT. So if you code:
  SORT FIELDS=(5,10,CH,A)
you are sorting on the first 10 bytes of the data. Coding SORT FIELDS=(1,10,CH,A) instead would make the 4 RDW bytes part of the key, which is almost never what you want. This is the main difference from fixed-length records, where position 1 is the first data byte: when the same data layout moves from FB to VB, every position shifts up by 4.
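To make the position arithmetic concrete, here is a small Python sketch. It relies on the DFSORT convention that, for variable-length records, the RDW occupies positions 1-4 and the first data byte is position 5. The helper `field` and the sample record are hypothetical, used only for illustration.

```python
def field(record: bytes, position: int, length: int) -> bytes:
    """Return `length` bytes starting at 1-based `position`.

    `record` is a full VB record, so the RDW occupies positions 1-4
    and the first data byte is position 5.
    """
    start = position - 1  # convert the 1-based position to a 0-based offset
    return record[start : start + length]


# RDW X'00120000' says 18 bytes in total: 4 (RDW) + 14 (data).
record = b"\x00\x12\x00\x00" + b"SMITH     0042"

assert field(record, 1, 4) == b"\x00\x12\x00\x00"  # positions 1-4: the RDW
assert field(record, 5, 10) == b"SMITH     "       # positions 5-14: first 10 data bytes
assert field(record, 15, 4) == b"0042"             # positions 15-18: last 4 data bytes
```

In this model, a sort on the first 10 data bytes would be `sorted(records, key=lambda r: field(r, 5, 10))`, the analogue of a SORT FIELDS entry that starts at position 5.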
When you allocate a variable-length dataset (e.g. SORTIN or SORTOUT), you specify LRECL (logical record length). For RECFM=VB, LRECL is the maximum record length, and that maximum includes the 4-byte RDW. So:
| LRECL | Maximum data length |
|---|---|
| 44 | 40 bytes (44 - 4) |
| 100 | 96 bytes |
| 256 | 252 bytes |
Records can be shorter than the maximum; each record's actual length is in its RDW. You must set LRECL large enough for the longest record you will read or write.
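The LRECL arithmetic in the table can be checked with a short Python sketch. The function names are invented for this example; the only rule it encodes is that, for RECFM=VB, LRECL counts the 4-byte RDW.

```python
RDW_LEN = 4  # bytes occupied by the RDW in every VB record


def max_data_length(lrecl: int) -> int:
    """Largest data portion that fits in a VB record with this LRECL."""
    return lrecl - RDW_LEN


def required_lrecl(data_lengths) -> int:
    """Smallest VB LRECL that can hold the longest data portion seen."""
    return max(data_lengths) + RDW_LEN


# Reproduces the table rows above.
assert max_data_length(44) == 40
assert max_data_length(100) == 96
assert max_data_length(256) == 252

# If the longest data portion is 252 bytes, LRECL must be at least 256.
assert required_lrecl([10, 80, 252]) == 256
```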
For variable-length input, allocate SORTIN with RECFM=VB and LRECL equal to the maximum record length (including the RDW). DFSORT reads each record (RDW + data) and uses the RDW to know how many bytes belong to it. For variable-length output, allocate SORTOUT with RECFM=VB and an LRECL at least as large as the longest record you will write. Allocating the output as FB is not enough to convert VB records to fixed length; the conversion is done with OUTFIL. Code OUTFIL with VTOF (or CONVERT) and BUILD to build a fixed-length record from the input fields, and allocate the output dataset as RECFM=FB with that fixed LRECL. The RDW is not carried into the FB output, so the BUILD fields normally start at input position 5. OUTFIL with FTOV converts in the other direction, from FB to VB.
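What a VB-to-FB conversion does to each record can be sketched in Python. This is a model of the idea only, not of DFSORT internals: the RDW is dropped and the data is padded to the fixed length. The function name `vb_to_fb` and the choice of the EBCDIC blank (X'40') as the fill byte are assumptions made for the sketch.

```python
EBCDIC_BLANK = b"\x40"  # assumed fill byte for this sketch


def vb_to_fb(record: bytes, lrecl: int, fill: bytes = EBCDIC_BLANK) -> bytes:
    """Drop the 4-byte RDW and pad the data portion to a fixed length."""
    data = record[4:]  # the data starts at position 5 of the VB record
    if len(data) > lrecl:
        raise ValueError("data portion is longer than the fixed LRECL")
    return data.ljust(lrecl, fill)


# A 3-byte data portion (RDW says 7) padded to a fixed length of 5.
fixed = vb_to_fb(b"\x00\x07\x00\x00ABC", 5)
assert fixed == b"ABC\x40\x40"
```

Every output record has the same length, which is why the output dataset can be allocated as RECFM=FB with that LRECL.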
INREC and OUTREC follow the same position rule: for VB input, positions 1-4 are the RDW and position 5 is the first data byte. When you reformat VB records with BUILD, the first field you code must be the RDW, usually as 1,4 (or a longer field that starts at position 1); the data fields follow, starting from position 5. DFSORT then sets the length in the RDW to match the record you built, so you do not calculate it yourself. To keep the variable-length tail of each record, code a starting position without a length as the last field. For example, BUILD=(1,4,5,10,25) keeps the RDW, the 10 bytes at positions 5-14, and everything from position 25 to the end of the record. See the DFSORT Application Programming Guide for the full rules on VB records with INREC and OUTREC.
When DFSORT cannot determine from the DD or dataset that the data is variable-length (e.g. tape or dynamic allocation without RECFM), you can use the RECORD control statement to supply the format:
  RECORD TYPE=V,LENGTH=256
TYPE=V indicates variable-length records. LENGTH=256 specifies the maximum record length (typically including RDW). This is needed when the DCB is missing or incomplete. For normal disk datasets with DCB=(RECFM=VB,LRECL=n), you do not need RECORD.
Use variable-length (VB) when records naturally have different lengths—e.g. free-form text, variable-length comments, or records that are only as long as needed. You save space and avoid padding. Use fixed-length (FB) when every record has the same length—e.g. 80-byte card images or fixed-format files. DFSORT supports both; just ensure SORTIN and SORTOUT have the correct RECFM and LRECL for the data you are reading and writing.
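When records are variable-length, some records may be shorter than the maximum, and a control field can extend past the end of a short record (for example, a 20-byte key at position 5 in a record that has only 15 data bytes). What happens then is controlled by the VLSHRT option. With NOVLSHRT, the default as shipped, DFSORT terminates with an error when a record is too short to contain all the SORT, MERGE, INCLUDE, or OMIT fields. With VLSHRT, DFSORT continues: short SORT or MERGE fields are padded with binary zeros, and an INCLUDE or OMIT comparison on a short field is treated as not satisfied. See the VLSHRT tutorial for details, including how the option interacts with other statements such as SUM. In general, ensure your sort key and build specifications do not assume more bytes than the shortest record you will process, or use INREC to normalize to a fixed-length key.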
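The short-record case can be modeled in Python as follows. This is a hedged sketch of the documented VLSHRT behavior for SORT control fields (short fields padded with binary zeros) and of NOVLSHRT (termination), not a reproduction of DFSORT itself. The function `sort_key` and its `vlshrt` flag are hypothetical names.

```python
def sort_key(record: bytes, position: int, length: int, vlshrt: bool = True) -> bytes:
    """Extract a sort key from a VB record (RDW at positions 1-4).

    With vlshrt=True a short key is padded with binary zeros; with
    vlshrt=False a short record is an error, mirroring NOVLSHRT.
    """
    start = position - 1
    key = record[start : start + length]
    if len(key) < length:
        if not vlshrt:
            raise ValueError("record too short for control field")
        key = key.ljust(length, b"\x00")  # pad the missing bytes with X'00'
    return key


# 15 data bytes (RDW says 19), but the key asks for 20 bytes from position 5.
short = b"\x00\x13\x00\x00" + b"ABCDEFGHIJKLMNO"
assert sort_key(short, 5, 20) == b"ABCDEFGHIJKLMNO" + b"\x00" * 5
```

Because binary zero is lower than any character, a padded key collates ahead of a longer key with the same leading bytes when the sort is ascending.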
Imagine a stack of papers where each paper can be a different length. Each paper has a small label at the top that says "I am 5 lines long" or "I am 10 lines long." That label is like the RDW: it tells the computer how long that record is, and the length it gives includes the label itself. The sort program reads the label, then reads the rest of that record, and sorts the papers. The label is stuck to the paper, so when we count positions we count the label too: the label takes up positions 1 to 4, and the first thing actually written on the paper is at position 5. So when we talk about where the name or number is, we start counting at the label, not at the first line of writing.
1. What is the RDW in a variable-length record?
2. When you specify SORT FIELDS=(1,10,CH,A) for a VB input, does position 1 refer to the first byte of the RDW or the first byte of data?
3. For a VB dataset, what does LRECL on the DD statement represent?
4. Does DFSORT strip the RDW when reading SORTIN or add it when writing SORTOUT?
5. When would you use RECORD TYPE=V in DFSORT?