MainframeMaster

Collating Sequence

The collating sequence is the order in which character (byte) values are compared when you sort with CH (character) format. On z/OS the default is EBCDIC: every byte has a position in that sequence (space, then most special characters, then lowercase a–z, then uppercase A–Z, then digits 0–9; the character assigned to each byte is defined by the EBCDIC code page). When DFSORT compares two CH keys, it looks at the first byte: if one is "smaller" in the collating sequence, that record comes first in an ascending sort; if the first bytes are equal, it compares the second byte, and so on. So "ABC" comes before "ABD" because C is before D in the sequence. Numeric formats (PD, ZD, BI, FL) do not use the collating sequence; they compare by numeric value. This page explains what the collating sequence is, when it applies, and how EBCDIC affects CH sort order.


What Is the Collating Sequence?

The collating sequence is a fixed ordering of all possible byte values (0x00 through 0xFF). When DFSORT compares two character keys, it asks: "Which byte comes first in this sequence?" The byte with the lower position in the sequence is "smaller." On the mainframe the default is EBCDIC, where a byte's position is simply its binary value; the code page in use (e.g. 037, 1047) determines which character each byte value represents. In that order, typically: control characters (below 0x40), then space (0x40), then most special characters, then lowercase a–z (0x81–0xA9), then uppercase A–Z (0xC1–0xE9), then digits 0–9 (0xF0–0xF9). So for CH keys, "a" < "A" < "Z" < "0" < "9" in the usual code pages.
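To check the byte values for yourself, the following minimal sketch ranks a few characters by their EBCDIC code. It assumes Python's built-in cp037 codec as a stand-in for the EBCDIC code page; the helper name ebcdic_byte is made up for this illustration and is not part of DFSORT.

```python
# Sketch: rank a few characters by their EBCDIC byte value.
# Assumption: Python's "cp037" codec stands in for the EBCDIC code page.

def ebcdic_byte(ch: str) -> int:
    """Return the single cp037 byte value for one character (hypothetical helper)."""
    return ch.encode("cp037")[0]

# Sort the characters by byte value, the way a CH comparison ranks single bytes.
ordered = sorted("9Az 0aZ", key=ebcdic_byte)

for ch in ordered:
    print(f"{ch!r}: X'{ebcdic_byte(ch):02X}'")
# Space (X'40') prints first, then a, z, then A, Z, then 0, 9 (X'F9').
```

Lowercase letters land before uppercase, and digits come last, which is the opposite of what most people expect from ASCII systems.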

When DFSORT Uses It

The collating sequence is used only for CH (character) keys. When you specify SORT FIELDS=(1,10,CH,A), DFSORT compares the 10 bytes at positions 1–10 using the collating sequence: byte by byte, left to right. For PD, ZD, BI, or FL, the key is interpreted as a number and compared by numeric value—the collating sequence is not used. So the only time you need to think about collating sequence is when you have character sort keys.
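The difference between a byte-by-byte comparison and a numeric comparison can be sketched in Python. This is an illustration only, not how DFSORT is implemented: ch_key and pd_value are hypothetical helpers, cp037 stands in for EBCDIC, and the packed-decimal decoder assumes the usual sign nibbles (D or B for negative, anything else treated as positive).

```python
def ch_key(record: bytes, start: int, length: int) -> bytes:
    """Slice a key using a 1-based position and a length, as in SORT FIELDS."""
    return record[start - 1 : start - 1 + length]

def pd_value(field: bytes) -> int:
    """Decode packed decimal: two digits per byte, sign in the last nibble."""
    nibbles = field.hex()              # e.g. b"\x00\x5d" -> "005d"
    magnitude = int(nibbles[:-1])      # all nibbles except the sign
    return -magnitude if nibbles[-1] in "bd" else magnitude

# CH-style comparison: Python compares bytes left to right as unsigned values.
records = ["ABD".encode("cp037"), "ABC".encode("cp037")]
by_ch = sorted(records, key=lambda r: ch_key(r, 1, 3))
print([r.decode("cp037") for r in by_ch])      # ['ABC', 'ABD']

# The same byte-wise rule gives the wrong answer for packed decimal.
plus3, minus5 = bytes.fromhex("003C"), bytes.fromhex("005D")
by_bytes = sorted([minus5, plus3])             # raw byte order
by_value = sorted([minus5, plus3], key=pd_value)
print([pd_value(f) for f in by_bytes])         # [3, -5]
print([pd_value(f) for f in by_value])         # [-5, 3]
```

This is why the key format matters: the bytes X'003C' and X'005D' rank one way as characters and the other way as signed numbers.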

EBCDIC vs ASCII

On z/OS, the default is EBCDIC, not ASCII. The numeric values of the bytes are different: for example, in ASCII the letter "A" is 0x41; in EBCDIC it is 0xC1. The relative order of letters (A before B) is the same in both for A–Z, but the groups are arranged differently: ASCII places digits before uppercase letters and uppercase before lowercase, while EBCDIC places lowercase before uppercase and digits last. Symbols also fall in different places relative to letters. If a file comes from an ASCII system and is translated to EBCDIC on its way to the mainframe, a CH sort orders it by EBCDIC byte values, so the result can differ from the order the same data had on the ASCII system, unless you use a conversion or alternate sequence. For character data that stays in EBCDIC, CH sort order is the normal EBCDIC order.
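As a rough illustration, the same four strings can be sorted by their ASCII bytes and by their EBCDIC bytes. The sketch assumes Python's ascii and cp037 codecs as stand-ins for the two encodings; the sample values are made up.

```python
# Sketch: the same strings ordered by ASCII bytes and by EBCDIC (cp037) bytes.
words = ["apple", "Apple", "42", "Zebra"]

ascii_order = sorted(words, key=lambda w: w.encode("ascii"))
ebcdic_order = sorted(words, key=lambda w: w.encode("cp037"))

print(ascii_order)   # ['42', 'Apple', 'Zebra', 'apple']
print(ebcdic_order)  # ['apple', 'Apple', 'Zebra', '42']
```

"Apple" precedes "Zebra" in both results, but the digit string moves from first place to last and the lowercase word moves from last place to first.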

Changing the Sequence: ALTSEQ and Others

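DFSORT supports an alternate collating sequence through the ALTSEQ control statement, so that certain bytes are compared in a different order. ALTSEQ changes only the table used for comparison; the data in the records is not altered. The alternate sequence applies to keys with format AQ, or to CH keys when the CHALT option is in effect. For example, you might want case-insensitive order (treat A and a as equal) or a custom order for special characters. See the ALTSEQ control statement in the product documentation for the exact syntax. Locale-based collation (the LOCALE option) may also be available depending on the product and release.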
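The idea behind an alternate sequence can be sketched as a 256-entry lookup table applied to each key before comparison. The Python below is an analogy, not DFSORT syntax: ALT_TABLE and alt_key are hypothetical names, cp037 stands in for EBCDIC, and the table simply makes each lowercase letter collate at the position of its uppercase counterpart.

```python
import string

# Start from the identity table: every byte collates at its own value.
table = bytearray(range(256))

# Move each lowercase letter to the position of its uppercase counterpart.
for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase):
    table[lower.encode("cp037")[0]] = upper.encode("cp037")[0]
ALT_TABLE = bytes(table)

def alt_key(field: bytes) -> bytes:
    """Translate a key through the alternate table; the record is untouched."""
    return field.translate(ALT_TABLE)

names = [n.encode("cp037") for n in ["bob", "Alice", "alice", "Bob"]]

plain = sorted(names)                 # normal EBCDIC byte order
folded = sorted(names, key=alt_key)   # case-insensitive comparison

print([n.decode("cp037") for n in plain])    # ['alice', 'bob', 'Alice', 'Bob']
print([n.decode("cp037") for n in folded])   # ['Alice', 'alice', 'bob', 'Bob']
```

Python's sort keeps records with equal keys in their input order, which is why "Alice" stays ahead of "alice" here. In DFSORT the order of records with equal keys depends on whether the EQUALS option is in effect.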

Explain It Like I'm Five

The collating sequence is like the order of the alphabet the sort uses. When we sort words, we say "A comes before B, B before C." The computer has a list of which character comes before which (A, B, C, … 0, 1, 2, …). That list is the collating sequence. On the mainframe it’s usually EBCDIC. When we sort by "letters" (CH), the sort uses that list to put things in order.

Exercises

  1. Does the collating sequence affect a key specified as PD? Why or why not?
  2. In EBCDIC, why do fixed-length digit strings often sort in numeric order when using CH?
  3. When might you need an alternate collating sequence (e.g. ALTSEQ)?

Quiz

Test Your Knowledge

1. What determines the order when you sort with CH (character) in DFSORT?

  • Alphabetical order only
  • The collating sequence—typically EBCDIC on z/OS—defines the byte order used for comparison
  • Random order
  • Numeric order

2. In EBCDIC, do letters A–Z sort in the same order as in ASCII?

  • Yes, identical
  • The relative order A–Z is the same, but the numeric code values differ; also, in EBCDIC the digits 0–9 sort after the letters, not before them
  • No, Z comes first in EBCDIC
  • EBCDIC has no letters

3. When does collating sequence apply in DFSORT?

  • Only for PD
  • For character (CH) keys; numeric formats (PD, ZD, BI) compare by numeric value, not collating sequence
  • Only for the first key
  • Always

4. Can you change the collating sequence in DFSORT?

  • No
  • Yes—products may support ALTSEQ or LOCALE/collation options to alter or specify the sequence for CH comparison
  • Only for descending
  • Only for the second key

5. What is the main impact of EBCDIC on digit order?

  • Digits are random
  • In EBCDIC the codes for 0–9 (X'F0'–X'F9') are in ascending order, so same-length digit strings sort in numeric order when compared as CH
  • 9 comes before 0
  • Digits are not sortable