How do I parse a comma-delimited file in DFSORT?

Use PARSE with INREC or OUTFIL. Define each field as %nn=(ENDBEFR=C',',FIXLEN=length), then use BUILD to place the parsed fields at fixed positions. Example: OUTFIL PARSE=(%01=(ENDBEFR=C',',FIXLEN=10),%02=(ENDBEFR=C',',FIXLEN=20)),BUILD=(1:%01,11:%02). Do not put blanks between operands, because a blank ends the operand list in a control statement. As in any DFSORT step, OUTFIL must be accompanied by OPTION COPY or a SORT or MERGE statement.

What is ENDBEFR in DFSORT PARSE?

ENDBEFR= specifies the delimiter character that marks the end of the field. For example ENDBEFR=C',' means the field is everything from the current position up to (but not including) the next comma. So for CSV you use ENDBEFR=C',' for each field; for pipe-delimited use ENDBEFR=C'|'.

What is FIXLEN in PARSE?

FIXLEN= sets the output length of the parsed field in bytes. The value extracted from the delimited input is written into a fixed-length area: a shorter value is padded on the right with blanks, and a longer value is truncated on the right. FIXLEN is required for each parsed field you extract so that the BUILD layout has known, fixed lengths.

Can I parse tab-delimited or pipe-delimited files?

Yes. Use the appropriate delimiter in ENDBEFR. For pipe: ENDBEFR=C'|'. For tab, give the character in hexadecimal, because DFSORT does not interpret escape sequences such as \t inside a C'...' string: use ENDBEFR=X'05' for an EBCDIC tab (X'09' if the data is ASCII). The rest of the PARSE and BUILD logic is the same as for comma-delimited. Define each field with %nn=(ENDBEFR=delimiter,FIXLEN=length) and reference the parsed fields in BUILD.

How do I handle quoted CSV fields in DFSORT PARSE?

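For fields enclosed in double quotes, use STARTAFT=C'"' to start after the opening quote and ENDBEFR=C'"' to end before the closing quote. So %01=(STARTAFT=C'"',ENDBEFR=C'"',FIXLEN=20) extracts the first quoted field, and a comma inside the quotes is not treated as a delimiter. The comma that follows the closing quote still has to be consumed before the next field, for example by ending the field with the two-character string C'",' instead. DFSORT also provides PAIR=QUOTE, which tells the parser not to look for delimiter strings between a pair of double quotes.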

Parsing Delimited Files - DFSORT PARSE ENDBEFR FIXLEN

Parsing Delimited Files

Parsing delimited files in DFSORT means taking input records where fields are separated by a delimiter (comma, tab, pipe, etc.) and turning them into fixed-length fields so you can sort, filter, or write a fixed-format output. You use the PARSE feature with INREC or OUTFIL. For each field you specify the delimiter (ENDBEFR=) and the output length (FIXLEN=). The product reads the record, splits it on the delimiters, and assigns each segment to a parsed field (e.g. %01, %02). You then use BUILD to place those fields at fixed positions. This page explains PARSE syntax, comma- and pipe-delimited examples, and how to handle quoted CSV fields.

What Is a Delimited File?

In a delimited file, each record has multiple values separated by a special character. In comma-separated (CSV) format, a record might look like: ABC,DEF,GHI,JKL. In pipe-delimited format: ABC|DEF|GHI|JKL. The record length can vary because each field can have a different length. Many mainframe sorts and reports expect fixed-length fields at fixed positions. Parsing is the step that reads the delimited record, finds each delimiter, extracts the value between delimiters, and writes it into a fixed-length area (with padding or truncation as needed). DFSORT's PARSE does that in one pass.

PARSE Basics: ENDBEFR and FIXLEN

For each field you want to extract, you define a parsed field with a name like %01, %02, etc. (syntax may vary by product). For each one you specify:

ENDBEFR= is the delimiter that ends the field. The field content runs from the current position in the record up to (but not including) the next occurrence of this string. For comma-delimited use C','; for pipe use C'|'.
FIXLEN= is the length in bytes of the output field. The extracted value is placed in a fixed-length area. If the value is shorter than FIXLEN, it is padded on the right with blanks; if longer, it is truncated on the right. FIXLEN is required so that the BUILD layout has known lengths.

Optional parameters (product-dependent) include STARTAFT= to skip a character before the value (e.g. skip an opening quote) and PAIR=QUOTE for quoted CSV handling.

Common Delimiters

Common ENDBEFR values
Delimiter	Typical spec	Typical use
Comma	ENDBEFR=C','	CSV files
Pipe	ENDBEFR=C'|'	Pipe-delimited files
Tab	ENDBEFR=X'05' (EBCDIC) or X'09' (ASCII)	Tab-delimited files
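
As a sketch of the tab case, the statements below parse two tab-separated fields by giving the delimiter in hexadecimal. This assumes EBCDIC input, where the horizontal tab is X'05' (ASCII data would use X'09'). The field lengths and output positions are illustrative, not taken from a real file.

```text
  OPTION COPY
* X'05' is the EBCDIC horizontal tab; lengths are illustrative
  OUTFIL PARSE=(%01=(ENDBEFR=X'05',FIXLEN=10),
                %02=(ENDBEFR=X'05',FIXLEN=20)),
         BUILD=(1:%01,11:%02)
```

A hexadecimal constant is used because a character constant such as C'\t' would be read as the two characters backslash and t rather than as a tab.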

Example: Comma-Delimited to Fixed

Input record: ABC,DEF,GHI,JKL. You want four fixed-length fields: 3, 5, 8, and 8 bytes. Use OUTFIL with PARSE and BUILD. With OUTFIL you need a SORT or OPTION COPY (or MERGE) before the OUTFIL statement.

  OPTION COPY
  OUTFIL PARSE=(%01=(ENDBEFR=C',',FIXLEN=3),
               %02=(ENDBEFR=C',',FIXLEN=5),
               %03=(ENDBEFR=C',',FIXLEN=8),
               %04=(ENDBEFR=C',',FIXLEN=8)),
         BUILD=(1:%01,4:%02,9:%03,17:%04)

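%01 gets "ABC" (3 bytes), %02 gets "DEF" padded with two blanks to 5 bytes, %03 gets "GHI" padded to 8, and %04 gets "JKL" padded to 8. Because the last field has no trailing comma, %04 takes everything up to the end of the record. BUILD places the fields at output columns 1, 4, 9, and 17, giving a 24-byte record. In an item such as 4:%02, the number before the colon is the starting output column, not a length; the length comes from FIXLEN. Check your manual for the full BUILD syntax.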
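
To make the layout concrete, here is a sketch of the 24-byte record the statements above would be expected to build from ABC,DEF,GHI,JKL, shown under a column ruler. It assumes the input record holds exactly those characters, followed only by blanks or the end of the record.

```text
----+----1----+----2----
ABCDEF  GHI     JKL
```

Columns 1-3 hold %01, columns 4-8 hold %02, columns 9-16 hold %03, and columns 17-24 hold %04. The blanks after JKL extend to column 24 even though they are not visible here.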

INREC PARSE vs OUTFIL PARSE

INREC PARSE runs before the sort. The record is reformatted into fixed-length fields from the delimited input, and SORT FIELDS= then refers to positions in that fixed layout, so you can sort on a parsed field. Note that the INCLUDE and OMIT statements are processed before INREC and therefore still see the original delimited record; to filter on a parsed field, use INCLUDE= or OMIT= on OUTFIL, which sees the record that INREC built. OUTFIL PARSE runs when building the output for that OUTFIL. The input to that OUTFIL may already be sorted; PARSE then converts the delimited record to fixed format for the output file. Use INREC when you need to sort by the parsed fields; use OUTFIL when you only need the parsed layout in a specific output dataset.
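
A minimal sketch of the INREC case: the comma-delimited record is parsed first, and SORT then keys on the fixed position where BUILD placed the second field. It assumes fixed-length (FB) input; the lengths, positions, and ascending character sort are illustrative choices.

```text
* Parse two comma-delimited fields before the sort
  INREC PARSE=(%01=(ENDBEFR=C',',FIXLEN=3),
               %02=(ENDBEFR=C',',FIXLEN=5)),
        BUILD=(1:%01,4:%02)
* Sort on the reformatted record: %02 now sits in columns 4-8
  SORT FIELDS=(4,5,CH,A)
```

The sort key refers to positions in the record that INREC built, not to positions in the original delimited input.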

Quoted CSV Fields

In CSV, a field may be enclosed in double quotes so that a comma inside the field is not a delimiter: "Smith, John",25,NY. To extract the first field as "Smith, John", you need to define the field as starting after the opening quote and ending before the closing quote. Use STARTAFT=C'"' and ENDBEFR=C'"' so the parser skips the first quote and treats the next quote as the end of the field. Example (syntax may vary):

  %01=(STARTAFT=C'"',ENDBEFR=C'"',FIXLEN=20)

DFSORT also provides PAIR=QUOTE, which tells the parser not to look for delimiter strings between a pair of double quotes; see your product documentation for details.
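
One thing to watch with the quote-to-quote approach: if the scan for the next field resumes right after the closing quote, the comma that follows would immediately end the next field and leave it empty. A hedged sketch for the record "Smith, John",25,NY is to make the end string the two characters quote and comma, so that both are consumed. The field lengths and output positions here are assumptions for illustration.

```text
  OPTION COPY
* %01 = text between the opening quote and the closing ",
  OUTFIL PARSE=(%01=(STARTAFT=C'"',ENDBEFR=C'",',FIXLEN=20),
                %02=(ENDBEFR=C',',FIXLEN=3),
                %03=(ENDBEFR=C',',FIXLEN=2)),
         BUILD=(1:%01,21:%02,24:%03)
```

With that input, the output would be expected to hold Smith, John in columns 1-20, 25 in columns 21-23, and NY in columns 24-25. This sketch only suits records where the first field is always quoted.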

Variable-Length Input (VB)

If the input is variable-length (RECFM=VB), each record begins with a 4-byte RDW, so the data starts at position 5. PARSE operates on the data after the RDW. To write fixed-length output from VB input with OUTFIL, specify VTOF (variable to fixed) and make sure the BUILD output length matches the output LRECL, for example by padding with 133:X for a 133-byte fixed record. Messages such as ICE222A indicate that the built record length does not match the output LRECL.
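
A sketch of the VB-to-fixed case, assuming comma-delimited VB input and a 133-byte fixed output dataset. VTOF asks OUTFIL to produce fixed-length records, so the BUILD items describe only the data (no RDW), and 133:X pads the record with blanks through column 133. The field lengths are illustrative.

```text
  OPTION COPY
* VB input, FB output: BUILD describes the fixed record, no RDW
  OUTFIL VTOF,
         PARSE=(%01=(ENDBEFR=C',',FIXLEN=10),
                %02=(ENDBEFR=C',',FIXLEN=20)),
         BUILD=(1:%01,11:%02,133:X)
```

If the output dataset has a different LRECL, the final column in BUILD would need to change to match it.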

Explain It Like I'm Five

Imagine a line of words separated by commas: "apple,banana,cherry". Parsing means we split the line at each comma. The first word is "apple", the second is "banana", the third is "cherry". We then put each word in a fixed-size box: if the box is 10 spaces wide, "apple" goes in with five empty spaces after it, and "banana" goes in with four. DFSORT does that: it looks for the comma (or pipe or tab), takes what's before it, and puts it in a box the size we say (FIXLEN). We do that for each field and then arrange the boxes in a row (BUILD) to make a fixed-format record.

Exercises

Write a PARSE for a pipe-delimited record with three fields, with FIXLEN 5, 10, and 15, and BUILD to place them at positions 1, 6, and 16.
Why is FIXLEN required for each parsed field?
When would you use INREC PARSE instead of OUTFIL PARSE?

Quiz

Test Your Knowledge

1. What is PARSE in DFSORT used for?

Sorting only
Extracting fields from delimited records (comma, tab, pipe) and building fixed-length output—you define each field by a delimiter and output length (FIXLEN)
Only for VB files
Only in INCLUDE

2. What does ENDBEFR specify in PARSE?

Record length
The character that ends the field (the delimiter)—e.g. C',' for comma, C'|' for pipe. The field is the bytes before that delimiter
Only for BUILD
Sort order

3. Why is FIXLEN required in PARSE?

It is optional
FIXLEN sets the output length of the parsed field—the extracted value is placed in a fixed-length area (padded or truncated). DFSORT needs a fixed length for the BUILD layout
Only for INREC
Only for OUTFIL

4. Can you use PARSE in both INREC and OUTFIL?

Only OUTFIL
Yes—INREC PARSE runs before the sort so you can sort by parsed fields; OUTFIL PARSE runs on the record going to that OUTFIL. With OUTFIL you often need OPTION COPY (or SORT/MERGE)
Only INREC
Neither

5. How do you handle quoted fields in CSV (e.g. "Smith, John")?

FINDREP only
Use STARTAFT=C'"' and ENDBEFR=C'"' so the field is taken from between the quotes; some products support PAIR=QUOTE for CSV
PARSE cannot handle quotes
Only with ICETOOL