Parsing delimited files in DFSORT means taking input records where fields are separated by a delimiter (comma, tab, pipe, etc.) and turning them into fixed-length fields so you can sort, filter, or write a fixed-format output. You use the PARSE feature with INREC or OUTFIL. For each field you specify the delimiter (ENDBEFR=) and the output length (FIXLEN=). The product reads the record, splits it on the delimiters, and assigns each segment to a parsed field (e.g. %01, %02). You then use BUILD to place those fields at fixed positions. This page explains PARSE syntax, comma- and pipe-delimited examples, and how to handle quoted CSV fields.
In a delimited file, each record has multiple values separated by a special character. In comma-separated (CSV) format, a record might look like: ABC,DEF,GHI,JKL. In pipe-delimited format: ABC|DEF|GHI|JKL. The record length can vary because each field can have a different length. Many mainframe sorts and reports expect fixed-length fields at fixed positions. Parsing is the step that reads the delimited record, finds each delimiter, extracts the value between delimiters, and writes it into a fixed-length area (with padding or truncation as needed). DFSORT's PARSE does that in one pass.
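For each field you want to extract, you define a parsed field with a name like %01, %02, etc. (syntax may vary by product). For each one you specify:
- ENDBEFR=: the delimiter that ends the field. For comma use C','; for pipe use C'|'.
- FIXLEN=: the fixed output length of the parsed field. Shorter values are padded and longer values are truncated.

Optional parameters (product-dependent) include STARTAFT= to skip a character before the value (e.g. skip an opening quote) and PAIR=QUOTE for quoted CSV handling.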
| Delimiter | Typical spec | Typical use |
|---|---|---|
| Comma | ENDBEFR=C',' | CSV files |
| Pipe | ENDBEFR=C'\|' | Pipe-delimited files |
| Tab | ENDBEFR=X'05' (hex constant for the EBCDIC tab character; C'...' constants do not interpret escapes such as \t) | Tab-delimited files |
Input record: ABC,DEF,GHI,JKL. You want four fixed-length fields: 3, 5, 8, and 8 bytes. Use OUTFIL with PARSE and BUILD. Because OUTFIL only describes an output file, the control statements must also include a SORT, MERGE, or OPTION COPY statement.
    OPTION COPY
    OUTFIL PARSE=(%01=(ENDBEFR=C',',FIXLEN=3),
                  %02=(ENDBEFR=C',',FIXLEN=5),
                  %03=(ENDBEFR=C',',FIXLEN=8),
                  %04=(ENDBEFR=C',',FIXLEN=8)),
           BUILD=(1:%01,4:%02,9:%03,17:%04)
%01 gets "ABC" (3 bytes), %02 gets "DEF" padded with blanks to 5 bytes, %03 gets "GHI" padded to 8, and %04 gets "JKL" padded to 8. BUILD places them at positions 1, 4, 9, and 17. In BUILD, the number before the colon (as in 4:%02) is the output column where that field starts; the field's length comes from its FIXLEN. BUILD syntax can differ between sort products, so check your manual.
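For pipe-delimited input such as ABC|DEF|GHI|JKL, only the delimiter constant changes. The following is a minimal sketch under the same assumed field lengths (3, 5, 8, 8); it uses INREC rather than OUTFIL purely for illustration.

```text
* Sketch: same four-field layout, pipe as the delimiter.
* Field lengths 3, 5, 8, 8 are assumed for illustration.
  OPTION COPY
  INREC PARSE=(%01=(ENDBEFR=C'|',FIXLEN=3),
               %02=(ENDBEFR=C'|',FIXLEN=5),
               %03=(ENDBEFR=C'|',FIXLEN=8),
               %04=(ENDBEFR=C'|',FIXLEN=8)),
        BUILD=(1:%01,4:%02,9:%03,17:%04)
```

Because the last field has no trailing delimiter, its data should simply run to the end of the record before being padded or truncated to FIXLEN.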
INREC PARSE runs before the sort. The record is reformatted into fixed-length fields from the delimited input, and the sort then sees that fixed layout. SORT FIELDS= cannot name %01 directly; it refers to the position and length where BUILD placed the parsed field. Note that the INCLUDE and OMIT statements are processed before INREC, so they still see the original delimited record; to filter on a parsed field, use INCLUDE= or OMIT= on OUTFIL, which is applied after INREC. OUTFIL PARSE runs when building the output for that OUTFIL. The input to that OUTFIL may already be sorted; PARSE then converts the delimited record to fixed format for the output file. Use INREC when you need to sort by the parsed fields; use OUTFIL when you only need the parsed layout in a specific output dataset.
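As a minimal sketch of the INREC case, the statements below reuse the four-field comma layout from the OUTFIL example and then sort on the second field. The sort key (column 4, length 5) and the filter value are assumptions chosen for illustration.

```text
* Sketch: parse with INREC, then sort on the second parsed field.
* Columns follow the assumed layout: 1-3, 4-8, 9-16, 17-24.
  INREC PARSE=(%01=(ENDBEFR=C',',FIXLEN=3),
               %02=(ENDBEFR=C',',FIXLEN=5),
               %03=(ENDBEFR=C',',FIXLEN=8),
               %04=(ENDBEFR=C',',FIXLEN=8)),
        BUILD=(1:%01,4:%02,9:%03,17:%04)
  SORT FIELDS=(4,5,CH,A)
* OUTFIL INCLUDE= is applied after INREC, so it can test the
* rebuilt columns (the value ABC is only an illustration).
  OUTFIL INCLUDE=(1,3,CH,EQ,C'ABC')
```

SORT FIELDS= points at the rebuilt columns rather than at %02, since parsed field names are only usable inside the statement that defines them.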
In CSV, a field may be enclosed in double quotes so that a comma inside the field is not a delimiter: "Smith, John",25,NY. To extract the first field as Smith, John (without the quotes), define the field as starting after the opening quote and ending before the closing quote. Use STARTAFT=C'"' and ENDBEFR=C'"' so the parser skips the first quote and treats the next quote as the end of the field. Parsing then resumes after the closing quote, so the following field typically needs STARTAFT=C',' to skip the comma that comes next. Example (syntax may vary):
    %01=(STARTAFT=C'"',ENDBEFR=C'"',FIXLEN=20)
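DFSORT also provides PAIR=QUOTE, which tells PARSE not to treat a delimiter inside a pair of double quotes as the end of the field. Availability and exact behavior depend on your release, so see your product documentation.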
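Put together for the record "Smith, John",25,NY, a complete set of statements might look like the sketch below. The output lengths (20, 3, 2) and the resulting columns are assumptions for illustration, and the second field uses STARTAFT=C',' because parsing resumes just after the closing quote.

```text
* Sketch: quoted first field, then two plain comma fields.
* Lengths 20, 3, 2 are assumed; adjust to your data.
  OPTION COPY
  INREC PARSE=(%01=(STARTAFT=C'"',ENDBEFR=C'"',FIXLEN=20),
               %02=(STARTAFT=C',',ENDBEFR=C',',FIXLEN=3),
               %03=(ENDBEFR=C',',FIXLEN=2)),
        BUILD=(1:%01,21:%02,24:%03)
```

With this layout the name lands in columns 1-20, the age in 21-23, and the state in 24-25. This approach assumes the first field is always quoted; input that mixes quoted and unquoted values in the same field needs a different treatment, such as PAIR=QUOTE where available.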
If the input is variable-length (RECFM=VB), the data starts after the RDW (typically at position 5). PARSE operates on the record content. For OUTFIL with VB input and fixed output you may need VTOF (variable to fixed) and ensure the BUILD output length matches the output LRECL (e.g. pad with 133:X for a 133-byte fixed record). Product messages (e.g. ICE222A) often indicate when the built record length does not match the output DCB.
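A hedged sketch of that VB-to-FB case follows. The 133-byte output LRECL and the four-field comma layout are assumptions carried over from the earlier example; verify the operand combination against your DFSORT release.

```text
* Sketch: VB delimited input, 133-byte FB output.
* LRECL 133 and the field layout are assumed for illustration.
  OPTION COPY
  OUTFIL VTOF,
         PARSE=(%01=(ENDBEFR=C',',FIXLEN=3),
                %02=(ENDBEFR=C',',FIXLEN=5),
                %03=(ENDBEFR=C',',FIXLEN=8),
                %04=(ENDBEFR=C',',FIXLEN=8)),
         BUILD=(1:%01,4:%02,9:%03,17:%04,133:X)
```

The trailing 133:X places a blank in column 133, which pads the built record to the full 133 bytes. With VTOF, the BUILD list describes the fixed output record and does not include the RDW.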
Imagine a line of words separated by commas: "apple,banana,cherry". Parsing means we split the line at each comma. The first word is "apple", the second is "banana", the third is "cherry". We then put each word in a fixed-size box: if the box is 10 spaces, "apple" fills the first five and the other five stay blank, while "banana" fills six and leaves four blank. DFSORT does that: it looks for the comma (or pipe or tab), takes what's before it, and puts it in a box the size we say (FIXLEN). We do that for each field and then arrange the boxes in a row (BUILD) to make a fixed-format record.
1. What is PARSE in DFSORT used for?
2. What does ENDBEFR specify in PARSE?
3. Why is FIXLEN required in PARSE?
4. Can you use PARSE in both INREC and OUTFIL?
5. How do you handle quoted fields in CSV (e.g. "Smith, John")?