Pattern matching in DFSORT often means finding or excluding records that contain a given string anywhere within a field, not just records where the whole field equals the string. For example, you might want to keep records where the word "ERROR" appears anywhere in the first 80 bytes, or omit records that contain a comma in a certain range. This is done with the SS (substring search) format in the COND= operand of INCLUDE or OMIT. With SS, you specify a search range (start position and length) and a constant; DFSORT looks for that constant anywhere within the range. If the constant is found, the condition is true for EQ and false for NE. So "contains" is expressed as (start, length, SS, EQ, C'string'). This is different from CH (character) with EQ, which compares the entire field against the constant, byte for byte. This page explains SS substring search, how to combine multiple patterns with AND/OR, and how commas and other special characters inside constants are handled.
The condition for substring search has the form (start, length, SS, operator, constant). Start is the starting byte position (1-based). Length is the number of bytes in the search range; DFSORT looks for the constant anywhere within those bytes. SS is the format code for substring search. Operator is EQ (contains) or NE (does not contain); the ordering operators GT, GE, LT, and LE are not valid with SS. Constant is the string to search for, typically C'…' for character. The constant is normally shorter than the length; the length defines the window in which to search. Example: keep records that contain "ERROR" anywhere in the first 80 bytes of the record:
  INCLUDE COND=(1,80,SS,EQ,C'ERROR')
So if "ERROR" appears at position 10, 50, or anywhere else in 1–80, the record is kept, provided the whole string lies inside the range (it can start no later than byte 76). To omit records that contain that string instead:
  OMIT COND=(1,80,SS,EQ,C'ERROR')
Then only records that do not contain "ERROR" in the first 80 bytes are written to the output.
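For context, the INCLUDE statement above could sit in a job step like the following. This is a minimal sketch, not a definitive job: the step name and dataset names are hypothetical placeholders, the SPACE values are arbitrary, and it assumes fixed-length records of at least 80 bytes. OPTION COPY is used so the records are filtered without being sorted.

```jcl
//* Hypothetical step: copy only records containing ERROR in bytes 1-80.
//* Dataset names and SPACE values are placeholders, not real names.
//FILTER   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR.INPUT.LOG,DISP=SHR
//SORTOUT  DD DSN=YOUR.OUTPUT.ERRORS,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(TRK,(15,15),RLSE)
//SYSIN    DD *
  OPTION COPY
  INCLUDE COND=(1,80,SS,EQ,C'ERROR')
/*
```

Swapping INCLUDE for OMIT in the same step writes the complementary set of records.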
With CH (character format), EQ means the entire field must match the constant. The comparison is made over the full field length; a constant shorter than the field is padded on the right with blanks before comparing, so it is clearest to code the constant at the same length as the field. So (1,5,CH,EQ,C'HELLO') keeps only records where bytes 1–5 are exactly H-E-L-L-O. With SS, EQ means the constant appears somewhere within the search range. So (1,80,SS,EQ,C'HELLO') keeps any record that has "HELLO" as a substring in the first 80 bytes. Use CH when you need a fixed-position exact match; use SS when you need "contains" or "find anywhere."
| Format | Comparison | Meaning |
|---|---|---|
| CH | Exact match | Entire field must equal the constant; a shorter constant is blank-padded to the field length |
| SS | Substring search | Constant can appear anywhere within the field; field can be longer |
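The two formats can appear in the same COND=, which makes the contrast concrete. The sketch below assumes a hypothetical 80-byte record layout: it keeps records whose first five bytes are exactly HELLO (CH) and that also contain WORLD anywhere in bytes 6–80 (SS). The layout and the strings are illustrative only.

```jcl
//SYSIN    DD *
  OPTION COPY
* CH: bytes 1-5 must equal HELLO exactly (fixed position).
* SS: WORLD may start anywhere from byte 6 up to byte 76.
  INCLUDE COND=(1,5,CH,EQ,C'HELLO',AND,
                6,75,SS,EQ,C'WORLD')
/*
```

A record beginning "HELLO BIG WORLD" passes both tests; a record beginning " HELLO WORLD" fails the CH test because HELLO is shifted one byte to the right.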
You can limit the substring search to a specific part of the record by setting start and length accordingly. For example, to find "WARN" only in bytes 41–80 (e.g. the second half of an 80-byte fixed-length record):
  INCLUDE COND=(41,40,SS,EQ,C'WARN')
So only if "WARN" appears in that 40-byte region is the record kept. This avoids false hits in the first 40 bytes. Similarly, to omit records that have a comma anywhere in positions 1–50 (e.g. to ensure a region has no delimiter):
  OMIT COND=(1,50,SS,EQ,C',')
Any record with a comma in 1–50 is dropped.
To keep records that contain any of several strings, use OR between SS conditions. Each condition specifies the same or different search range and a different constant. Example: keep records that contain "ERROR" or "WARN" or "FAIL" in the first 80 bytes:
  INCLUDE COND=(1,80,SS,EQ,C'ERROR',OR,1,80,SS,EQ,C'WARN',OR,1,80,SS,EQ,C'FAIL')
If any one of the three substrings is found in 1–80, the record is kept. The same (start, length, SS, EQ, constant) is repeated for each pattern; only the constant changes. Note: a comma inside the quoted constant is part of the search string and is not confused with the commas that separate elements in COND=. The character that does need special coding is the apostrophe, which is written as two apostrophes inside the constant; see your DFSORT manual.
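With several patterns the statement quickly gets long. A common way to keep it readable is to continue it over several lines, breaking after a comma; each continuation line must stay within columns 2–71. The following is a sketch of the same three-pattern test laid out one condition per line.

```jcl
//SYSIN    DD *
  OPTION COPY
* Keep a record if any one of the three strings occurs in bytes 1-80.
  INCLUDE COND=(1,80,SS,EQ,C'ERROR',OR,
                1,80,SS,EQ,C'WARN',OR,
                1,80,SS,EQ,C'FAIL')
/*
```

Each line ends with a comma followed by a blank, which DFSORT treats as a signal that the statement continues on the next line.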
To keep records that contain all of several strings, use AND. You can search in the same range or in different ranges. Example: keep records that contain "ID=" in 1–40 and "OK" in 41–80:
  INCLUDE COND=(1,40,SS,EQ,C'ID=',AND,41,40,SS,EQ,C'OK')
Both substrings must be present in their respective regions. So AND narrows the set (all conditions true); OR widens it (at least one true).
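AND and OR can also be mixed in one COND=. DFSORT evaluates AND before OR, so parentheses are the safest way to say what you mean. The sketch below uses a hypothetical variation of the example above: keep records that contain "ID=" in 1–40 and either "OK" or "DONE" in 41–80. The string "DONE" is an illustrative assumption, not part of the original example.

```jcl
//SYSIN    DD *
  OPTION COPY
* ID= must be present in 1-40, plus at least one of OK / DONE in 41-80.
  INCLUDE COND=(1,40,SS,EQ,C'ID=',AND,
               (41,40,SS,EQ,C'OK',OR,
                41,40,SS,EQ,C'DONE'))
/*
```

Without the inner parentheses the test would read as ("ID=" and "OK") or "DONE", which would also keep records that contain "DONE" but no "ID=".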
In COND=, commas separate the elements of each condition and the AND/OR keywords. Inside the quotes of a constant, however, a comma is an ordinary character: C'A,B' searches for the three bytes A, comma, B, and the parser does not treat that comma as a delimiter. The character that needs special coding is the apostrophe, because it ends the constant; to search for one, write it twice (e.g. C'O''NEIL'). If a constant behaves unexpectedly, check the rules for character strings in your Application Programming Guide, and document any such cases in your shop standards.
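The DFSORT manuals describe characters inside C'…' as literal data, with the apostrophe coded as two apostrophes. They also describe a second use of SS in which the constant is longer than the field: DFSORT then searches the constant for the field value, and commas in the constant act as separators between the listed values. The sketch below shows all three cases under an assumed, hypothetical layout (free text in bytes 1–80 and a 3-byte code in bytes 81–83); verify the behavior against your own manual before relying on it.

```jcl
//SYSIN    DD *
  OPTION COPY
* 1st test: the comma is data; matches the bytes A , B in 1-80.
* 2nd test: doubled apostrophe; matches O'NEIL in 1-80.
* 3rd test: 3-byte field searched for inside the list constant;
*           true when bytes 81-83 are J69, L92, or J82.
  INCLUDE COND=(1,80,SS,EQ,C'A,B',OR,
                1,80,SS,EQ,C'O''NEIL',OR,
                81,3,SS,EQ,C'J69,L92,J82')
/*
```

In the list form the commas keep a value that straddles two entries (such as "9,L") from matching a real code, since a valid code field would not normally contain a comma.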
Imagine you have a long line of letters and you want to find the word "CAT" somewhere in that line. You don't care if it's at the start, the middle, or the end—you just want to know if "CAT" appears anywhere. That's what SS does: it looks for the little word (pattern) anywhere inside the big line (the search range). If it finds it, we keep that line. If we use "omit," we throw away any line that has that word. We can also say "keep lines that have CAT or DOG" (OR) or "keep lines that have both CAT and DOG" (AND). The computer just scans the bytes and checks.
1. How do you keep records that contain the string "ERROR" anywhere in the first 80 bytes?
2. What is the difference between CH,EQ and SS,EQ for the same position and length?
3. How do you omit records that contain a comma in positions 1–50?
4. To match multiple different substrings (e.g. keep if "ABC" OR "XYZ" appears in 1–30), how do you code it?
5. Can you use AND to require two different substrings in the same or different areas?