The collating sequence is the order in which character (byte) values are compared when you sort with CH (character) format. On z/OS the default is EBCDIC: every byte has a position in that sequence (among printable characters: space, then most punctuation, then lowercase a–z, then uppercase A–Z, then digits 0–9; the exact order is defined by the EBCDIC code page). When DFSORT compares two CH keys, it looks at the first byte: if one is "smaller" in the collating sequence, that record comes first in an ascending sort; if the first bytes are equal, it compares the second byte, and so on. So "ABC" comes before "ABD" because C is before D in the sequence. Numeric formats (PD, ZD, BI, FL) do not use the collating sequence; they compare by numeric value. This page explains what the collating sequence is, when it applies, and how EBCDIC affects CH sort order.
The collating sequence is a fixed ordering of all possible byte values (0x00 through 0xFF). When DFSORT compares two character keys, it asks: "Which byte comes first in this sequence?" The byte with the lower position in the sequence is "smaller." On the mainframe the default is EBCDIC, so the sequence is determined by the EBCDIC code page in use (e.g. 037, 1047). In that order, typically: control characters (0x00–0x3F), then space (0x40), then most punctuation and special characters, then lowercase a–z (0x81–0xA9), then uppercase A–Z (0xC1–0xE9), then digits 0–9 (0xF0–0xF9). So for CH keys, "A" < "B" < "Z" and "0" < "9" in the usual code pages, but also "a" < "A" and "Z" < "0", which is the opposite of ASCII.
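The ordering can be checked outside the mainframe. The following is a minimal sketch in Python, assuming code page 037 (Python's `cp037` codec); the helper name `ebcdic_byte` and the sample characters are illustrative only, and the code is not DFSORT itself.

```python
def ebcdic_byte(ch: str) -> int:
    """Return the EBCDIC (code page 037) byte value of a single character."""
    return ch.encode("cp037")[0]

# A few sample characters, deliberately out of order.
samples = ["9", "A", "a", " ", "0", "Z", "z", "."]

# Rank them by byte value, which is what a default CH comparison uses.
ordered = sorted(samples, key=ebcdic_byte)

for ch in ordered:
    print(f"{ch!r}: 0x{ebcdic_byte(ch):02X}")
```

The printed list runs from space (0x40) through lowercase and uppercase letters to the digits (0xF0–0xF9), matching the default EBCDIC sequence.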
The collating sequence is used only for CH (character) keys. When you specify SORT FIELDS=(1,10,CH,A), DFSORT compares the 10 bytes at positions 1–10 using the collating sequence: byte by byte, left to right. For PD, ZD, BI, or FL, the key is interpreted as a number and compared by numeric value—the collating sequence is not used. So the only time you need to think about collating sequence is when you have character sort keys.
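To see why the distinction matters, here is a small Python sketch that compares the same records once byte by byte (as a CH key would be) and once by numeric value. The `ch_key` helper, the 5-byte key, and the sample records are assumptions made for illustration; Python's `int()` only stands in for a numeric comparison and does not model PD, ZD, BI, or FL storage.

```python
def ch_key(record: str, start: int, length: int) -> bytes:
    """Return the key at 1-based position `start`, as EBCDIC (cp037) bytes."""
    return record[start - 1:start - 1 + length].encode("cp037")

# Left-justified digits padded with blanks, as they might appear in a text field.
records = ["10   ", "9    ", "100  "]

# Character comparison: bytes are compared left to right.
by_ch = sorted(records, key=lambda r: ch_key(r, 1, 5))

# Numeric comparison: the whole field is interpreted as one number.
by_value = sorted(records, key=lambda r: int(r))

print(by_ch)     # character order
print(by_value)  # numeric order
```

In the character comparison "10" sorts before "9" because the first byte "1" is lower than "9"; the numeric comparison puts 9 first.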
On z/OS, the default is EBCDIC, not ASCII. The numeric values of the bytes are different: for example, in ASCII the letter "A" is 0x41; in EBCDIC it is 0xC1. The relative order of letters (A before B) is the same in both for A–Z, but the groups are arranged differently: ASCII puts digits before uppercase letters and uppercase before lowercase, while EBCDIC puts lowercase before uppercase and digits after both. If a file comes from an ASCII system and is translated to EBCDIC on transfer, a CH sort on the mainframe follows EBCDIC order (or whatever collation is in effect), so the result can differ from the same sort on the source system unless you use a conversion or alternate sequence. For character data that stays in EBCDIC, CH sort order is the normal EBCDIC order.
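The difference is easy to reproduce by sorting the same strings under both encodings. This is a hedged sketch in Python, again assuming code page 037 via the `cp037` codec; the sample values are made up.

```python
names = ["apple", "Zebra", "42nd", "Apple"]

# Byte order under ASCII: digits, then uppercase, then lowercase.
ascii_order = sorted(names, key=lambda s: s.encode("ascii"))

# Byte order under EBCDIC (cp037): lowercase, then uppercase, then digits.
ebcdic_order = sorted(names, key=lambda s: s.encode("cp037"))

print("ASCII :", ascii_order)
print("EBCDIC:", ebcdic_order)
```

The value beginning with a digit moves from the front of the ASCII result to the end of the EBCDIC result, and the lowercase and uppercase entries swap places.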
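DFSORT supports an alternate collating sequence through the ALTSEQ control statement, which moves selected byte values to different positions in the sequence so that they compare in a different order. For example, you might want case-insensitive order (treat A and a as equal) or a custom order for special characters. The alternate sequence applies to keys defined with the AQ format, and to CH keys only when the CHALT option is in effect. See the ALTSEQ control statement in the product documentation for the exact syntax. Locale-based collation (the LOCALE option) may also be available depending on the product release and installation settings.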
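The idea behind an alternate sequence can be illustrated with a 256-entry translation table applied to the key before comparison. The Python sketch below is a simulation only, not DFSORT syntax: it assumes code page 037, and the names `ALT_TABLE` and `alt_key` are hypothetical. It maps each lowercase EBCDIC byte to its uppercase counterpart so that case is ignored when comparing.

```python
import string

# Start with the identity mapping: every byte collates as itself.
table = bytearray(range(256))

# Move each lowercase letter to the position of its uppercase letter.
for lower in string.ascii_lowercase:
    lower_byte = lower.encode("cp037")[0]
    upper_byte = lower.upper().encode("cp037")[0]
    table[lower_byte] = upper_byte

ALT_TABLE = bytes(table)

def alt_key(text: str) -> bytes:
    """Return the comparison key: EBCDIC bytes passed through the alternate table."""
    return text.encode("cp037").translate(ALT_TABLE)

words = ["bob", "Alice", "alice", "Bob"]

default_order = sorted(words, key=lambda w: w.encode("cp037"))
folded_order = sorted(words, key=alt_key)

print(default_order)  # plain EBCDIC order: lowercase words first
print(folded_order)   # case-insensitive order
```

Only the comparison key is translated; the records themselves are unchanged. Words that differ only in case get equal keys, and Python's stable sort leaves them in their input order.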
The collating sequence is like the order of the alphabet the sort uses. When we sort words, we say "A comes before B, B before C." The computer has a list of which character comes before which (A, B, C, … 0, 1, 2, …). That list is the collating sequence. On the mainframe it’s usually EBCDIC. When we sort by "letters" (CH), the sort uses that list to put things in order.
1. What determines the order when you sort with CH (character) in DFSORT?
2. In EBCDIC, do letters A–Z sort in the same order as in ASCII?
3. When does collating sequence apply in DFSORT?
4. Can you change the collating sequence in DFSORT?
5. What is the main impact of EBCDIC on digit order?