MainframeMaster

Parsing Delimited Files

Parsing delimited files in DFSORT means taking input records where fields are separated by a delimiter (comma, tab, pipe, etc.) and turning them into fixed-length fields so you can sort, filter, or write a fixed-format output. You use the PARSE operand of INREC or OUTFIL. For each field you specify the delimiter that ends it (ENDBEFR=) and the output length (FIXLEN=). DFSORT reads the record, splits it on the delimiters, and assigns each segment to a parsed field (e.g. %01, %02). You then use BUILD to place those fields at fixed positions. This page explains PARSE syntax, walks through a comma-delimited example, and shows how to handle quoted CSV fields.


What Is a Delimited File?

In a delimited file, each record has multiple values separated by a special character. In comma-separated (CSV) format, a record might look like: ABC,DEF,GHI,JKL. In pipe-delimited format: ABC|DEF|GHI|JKL. The record length can vary because each field can have a different length. Many mainframe sorts and reports expect fixed-length fields at fixed positions. Parsing is the step that reads the delimited record, finds each delimiter, extracts the value between delimiters, and writes it into a fixed-length area (with padding or truncation as needed). DFSORT's PARSE does that in one pass.

PARSE Basics: ENDBEFR and FIXLEN

For each field you want to extract, you define a parsed field with a name like %01, %02, etc. (syntax may vary by product). For each one you specify:

  • ENDBEFR= — The delimiter that ends the field. The field content is from the current position in the record up to (but not including) the next occurrence of this character. For comma-delimited use C','; for pipe use C'|'.
  • FIXLEN= — The length in bytes of the output field. The extracted value is left-aligned in a fixed-length area. If the value is shorter than FIXLEN, it is padded on the right with blanks; if longer, it is truncated on the right. FIXLEN is required so that the BUILD layout has known lengths.

Optional parameters (product-dependent) include STARTAFT= to begin extraction after a given string (e.g. to skip an opening quote) and PAIR=QUOTE for quoted CSV handling.

Common Delimiters

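Common ENDBEFR values

Delimiter   Typical spec                         Typical use
Comma       ENDBEFR=C','                         CSV files
Pipe        ENDBEFR=C'|'                         Pipe-delimited files
Tab         ENDBEFR=X'05' (EBCDIC tab, in hex)   Tab-delimited files

DFSORT character constants have no escape sequence such as \t, so a tab is written as a hexadecimal constant.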
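As an illustration of a non-printable delimiter, here is a minimal sketch for a tab-delimited record with two fields. It assumes EBCDIC data, where the horizontal tab is X'05' (for ASCII data it would be X'09'); the field lengths of 10 are arbitrary and only for illustration.

```text
* Assumption: EBCDIC input, so the tab delimiter is X'05'
  OPTION COPY
  OUTFIL PARSE=(%01=(ENDBEFR=X'05',FIXLEN=10),
                %02=(ENDBEFR=X'05',FIXLEN=10)),
         BUILD=(1:%01,11:%02)
```

The hex constant is used in exactly the same place as C',' or C'|'; nothing else in the PARSE definition changes.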

Example: Comma-Delimited to Fixed

Input record: ABC,DEF,GHI,JKL. You want four fixed-length fields: 3, 5, 8, and 8 bytes. Use OUTFIL with PARSE and BUILD. Because OUTFIL only formats output, the job step also needs a SORT, MERGE, or OPTION COPY statement to say how the records are processed.

  OPTION COPY
  OUTFIL PARSE=(%01=(ENDBEFR=C',',FIXLEN=3),
                %02=(ENDBEFR=C',',FIXLEN=5),
                %03=(ENDBEFR=C',',FIXLEN=8),
                %04=(ENDBEFR=C',',FIXLEN=8)),
         BUILD=(1:%01,4:%02,9:%03,17:%04)

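%01 gets "ABC" (3 bytes), %02 gets "DEF" (padded to 5: "DEF  "), %03 gets "GHI" (padded to 8), %04 gets "JKL" (padded to 8). The last field has no trailing comma, so it runs to the end of the record. BUILD places the fields at positions 1, 4, 9, and 17, giving a 24-byte output record. In BUILD, the number before the colon is the output column where the item starts; the length comes from FIXLEN. Syntax details may vary by product and release, so check your manual.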

INREC PARSE vs OUTFIL PARSE

INREC PARSE runs before the sort. Each delimited record is reformatted into fixed-length fields, and SORT FIELDS= then refers to positions in that reformatted record, so you can sort on a parsed field. The INCLUDE and OMIT statements, however, are processed before INREC, so INCLUDE COND= still sees the original delimited record; to filter on a parsed field, use INCLUDE= or OMIT= on OUTFIL, which is applied to the record after INREC. OUTFIL PARSE runs when building the output for that OUTFIL. The input to that OUTFIL may already be sorted; PARSE then converts the delimited record to fixed format for the output file. Use INREC when you need to sort or filter by the parsed fields; use OUTFIL when you only need the parsed layout in a specific output dataset.
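A minimal sketch of sorting and filtering on parsed fields, assuming pipe-delimited input with a region code followed by a name (for example EAST|WIDGET). The field lengths and the value 'EAST' are hypothetical. The filter is written as OUTFIL INCLUDE= because it is applied to the record after INREC has reformatted it.

```text
* Assumption: input looks like EAST|WIDGET (region, then name)
  INREC PARSE=(%01=(ENDBEFR=C'|',FIXLEN=8),
               %02=(ENDBEFR=C'|',FIXLEN=20)),
        BUILD=(1:%01,9:%02)
* Sort on the region, which is now in columns 1-8
  SORT FIELDS=(1,8,CH,A)
* Keep only one region; positions refer to the reformatted record
  OUTFIL INCLUDE=(1,8,CH,EQ,C'EAST')
```

Note that SORT and OUTFIL refer to the columns chosen in BUILD, not to the %nn names, which are only visible inside the INREC statement that defines them.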

Quoted CSV Fields

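In CSV, a field may be enclosed in double quotes so that a comma inside the field is not a delimiter: "Smith, John",25,NY. To extract the first field as Smith, John (without the quotes), define the field as starting after the opening quote and ending before the closing quote. Use STARTAFT=C'"' and ENDBEFR=C'"' so the parser skips the first quote and treats the next quote as the end of the field. Be aware that parsing of the next field then resumes at the comma that follows the closing quote, so a following field defined with ENDBEFR=C',' would come out empty; a common remedy is to end the quoted field with the two-character string C'",' instead. Example (syntax may vary):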

  %01=(STARTAFT=C'"',ENDBEFR=C'"',FIXLEN=20)

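Some DFSORT versions support PAIR=QUOTE for standard CSV: delimiters that fall between a pair of double quotes are not treated as the end of the field, while the quotes themselves remain part of the extracted value. See your product documentation for details.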
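Putting the pieces together, here is a sketch for the record "Smith, John",25,NY. It assumes the first field is always quoted and the other two never are; the FIXLEN values (20, 3, 2) are chosen only to fit this sample. The first field ends at the two-character string quote-comma, so the parser is positioned at 25 when the second field starts.

```text
* Assumption: field 1 is always quoted, fields 2 and 3 never are
  OPTION COPY
  OUTFIL PARSE=(%01=(STARTAFT=C'"',ENDBEFR=C'",',FIXLEN=20),
                %02=(ENDBEFR=C',',FIXLEN=3),
                %03=(ENDBEFR=C',',FIXLEN=2)),
         BUILD=(1:%01,21:%02,24:%03)
```

With these definitions the name lands in columns 1-20, the age in 21-23, and the state in 24-25. Records whose first field is not quoted would need different handling, for example PAIR=QUOTE where it is available.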

Variable-Length Input (VB)

If the input is variable-length (RECFM=VB), the data starts after the RDW (typically at position 5). PARSE operates on the record content. For OUTFIL with VB input and fixed output you may need VTOF (variable to fixed) and ensure the BUILD output length matches the output LRECL (e.g. pad with 133:X for a 133-byte fixed record). Product messages (e.g. ICE222A) often indicate when the built record length does not match the output DCB.
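The VB case can be sketched as follows, assuming comma-delimited VB input with two fields and an 80-byte fixed output dataset. The lengths are hypothetical, and ABSPOS=5 is written out only to make explicit that parsing begins after the 4-byte RDW; your release may start there by default.

```text
* Assumption: VB input, FB output with LRECL=80
  OPTION COPY
  OUTFIL VTOF,
         PARSE=(%01=(ABSPOS=5,ENDBEFR=C',',FIXLEN=10),
                %02=(ENDBEFR=C',',FIXLEN=10)),
         BUILD=(1:%01,11:%02,80:X)
```

The trailing 80:X pads the built record with blanks out to column 80 so that its length matches the output LRECL.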

Explain It Like I'm Five

Imagine a line of words separated by commas: "apple,banana,cherry". Parsing means we split the line at each comma. The first word is "apple", the second is "banana", the third is "cherry". We then put each word in a fixed-size box: if the box is 10 spaces, "apple" becomes "apple     " (five letters plus five spaces) and "banana" fits in the box with four spaces to spare. DFSORT does that: it looks for the comma (or pipe or tab), takes what's before it, and puts it in a box the size we say (FIXLEN). We do that for each field and then arrange the boxes in a row (BUILD) to make a fixed-format record.

Exercises

  1. Write a PARSE for a pipe-delimited record with three fields, with FIXLEN 5, 10, and 15, and BUILD to place them at positions 1, 6, and 16.
  2. Why is FIXLEN required for each parsed field?
  3. When would you use INREC PARSE instead of OUTFIL PARSE?

Quiz

Test Your Knowledge

1. What is PARSE in DFSORT used for?

  • Sorting only
  • Extracting fields from delimited records (comma, tab, pipe) and building fixed-length output—you define each field by a delimiter and output length (FIXLEN)
  • Only for VB files
  • Only in INCLUDE

2. What does ENDBEFR specify in PARSE?

  • Record length
  • The character that ends the field (the delimiter)—e.g. C',' for comma, C'|' for pipe. The field is the bytes before that delimiter
  • Only for BUILD
  • Sort order

3. Why is FIXLEN required in PARSE?

  • It is optional
  • FIXLEN sets the output length of the parsed field—the extracted value is placed in a fixed-length area (padded or truncated). DFSORT needs a fixed length for the BUILD layout
  • Only for INREC
  • Only for OUTFIL

4. Can you use PARSE in both INREC and OUTFIL?

  • Only OUTFIL
  • Yes—INREC PARSE runs before the sort so you can sort by parsed fields; OUTFIL PARSE runs on the record going to that OUTFIL. With OUTFIL you often need OPTION COPY (or SORT/MERGE)
  • Only INREC
  • Neither

5. How do you handle quoted fields in CSV (e.g. "Smith, John")?

  • FINDREP only
  • Use STARTAFT=C'"' and ENDBEFR=C'"' so the field is taken from between the quotes; some products support PAIR=QUOTE for CSV
  • PARSE cannot handle quotes
  • Only with ICETOOL