MainframeMaster

CH (Character) Format

In DFSORT, every sort key and many control fields are described by a format that tells the program how to interpret the bytes. CH stands for character (sometimes called alphanumeric). When you specify CH, DFSORT does not treat the field as a number; it compares the bytes in collating sequence—on z/OS typically EBCDIC—one byte at a time from left to right. CH is the right choice for names, IDs, codes, and any text data. Using CH for numeric data can produce wrong order (e.g. "10" before "9"). This page explains what CH is, how it behaves, when to use it, and how it differs from numeric formats like ZD and PD.

What CH Means

CH tells DFSORT: "this field is character data." The sort does not interpret the bytes as a number (packed, zoned, or binary). Instead, it compares the key byte by byte, from the first byte to the last, using the collating sequence of the job. On IBM z/OS that is usually EBCDIC (Extended Binary Coded Decimal Interchange Code). Whichever record has the smaller byte value at the first position where the two keys differ is ordered first in ascending sort; for descending, the order is reversed. So CH gives you lexicographic (dictionary-style) order based on the character set, not numeric order.
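
To make the idea of collating order concrete, here is a minimal Python sketch (not DFSORT itself) that sorts a few strings by their EBCDIC bytes. It assumes Python's `cp037` codec is a reasonable stand-in for the job's EBCDIC code page; `ebcdic_key` is a hypothetical helper name.

```python
# Minimal sketch: CH-style ordering as a sort on raw EBCDIC bytes.
# Assumption: Python's "cp037" codec stands in for the job's EBCDIC code page.

def ebcdic_key(text: str) -> bytes:
    """Return the EBCDIC bytes that a CH comparison would look at."""
    return text.encode("cp037")

names = ["Zed", "apple", "123", "Bob"]

ch_order = sorted(names, key=ebcdic_key)  # EBCDIC byte order
ascii_order = sorted(names)               # ASCII/Unicode order, for contrast

print(ch_order)     # ['apple', 'Bob', 'Zed', '123']
print(ascii_order)  # ['123', 'Bob', 'Zed', 'apple']
```

In EBCDIC, lowercase letters sort before uppercase letters, and digits sort last. That is the reverse of what people used to ASCII expect.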

Syntax: Where CH Appears

The most common place you use CH is in SORT FIELDS. The format is:

SORT FIELDS=(start,length,CH,direction)
  • start — Starting position of the field in the record (1-based). The first byte of the record is position 1.
  • length — Length of the field in bytes. DFSORT will compare this many bytes.
  • CH — Format: character. No numeric conversion; byte-by-byte comparison in collating sequence.
  • direction — A for ascending, D for descending.

Example: sort on a 20-byte name at position 1, ascending:

SORT FIELDS=(1,20,CH,A)

For multiple sort keys, you repeat the (position, length, format, direction) pattern. Example: sort by last name (1,20,CH,A) then first name (21,15,CH,A):

SORT FIELDS=(1,20,CH,A,21,15,CH,A)
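
The two-key statement above can be simulated outside the mainframe with a short Python sketch. It assumes `cp037` as the EBCDIC code page and fixed-width, blank-padded records. `make_record` and `ch_sort` are hypothetical helpers written for this illustration, not DFSORT features.

```python
# Sketch of SORT FIELDS=(1,20,CH,A,21,15,CH,A) on fixed-width records.
# Assumptions: "cp037" stands in for EBCDIC; make_record and ch_sort are
# hypothetical helpers, not DFSORT features.

def make_record(last: str, first: str) -> bytes:
    """Build a 35-byte record: last name in 1-20, first name in 21-35, blank-padded."""
    return (last.ljust(20) + first.ljust(15)).encode("cp037")

def ch_sort(records, fields):
    """Sort by (start, length, direction) triples; start is 1-based as in DFSORT."""
    def key(record: bytes):
        parts = []
        for start, length, direction in fields:
            field = record[start - 1:start - 1 + length]
            if direction == "D":
                # Invert each byte so an ascending sort yields descending order.
                field = bytes(255 - b for b in field)
            parts.append(field)
        return tuple(parts)
    return sorted(records, key=key)

records = [
    make_record("SMITH", "JOHN"),
    make_record("JONES", "MARY"),
    make_record("SMITH", "ANNA"),
]

by_name = ch_sort(records, [(1, 20, "A"), (21, 15, "A")])
for record in by_name:
    text = record.decode("cp037")
    print(text[:20].rstrip(), text[20:].rstrip())
# JONES MARY
# SMITH ANNA
# SMITH JOHN
```

The second key only matters when the first key ties: the two SMITH records are ordered by first name.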

How Comparison Works

With CH, DFSORT compares the two keys like this: start at the first byte of the field; if the bytes are equal, move to the next byte; repeat until a byte differs or the end of the field is reached. The key with the smaller byte value at the first differing position is ordered first (ascending). If all bytes are equal, the two records are equal for sort purposes (the order between them may depend on OPTION EQUALS/NOEQUALS). Because both keys span the same number of bytes, a shorter value is compared together with its padding: "ABC" followed by a blank sorts before "ABCD" in ascending order, because the EBCDIC blank (X'40') is lower than any letter or digit.
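
The comparison rule can be written out as a small Python sketch. It assumes both keys have the same length and uses `cp037` as the EBCDIC code page. `compare_ch` is a hypothetical helper for illustration.

```python
# Sketch of a byte-by-byte CH comparison.
# Assumptions: both keys have the same length; "cp037" stands in for EBCDIC.
# compare_ch is a hypothetical helper.

def compare_ch(key_a: bytes, key_b: bytes) -> int:
    """Return -1, 0, or 1 as key_a sorts before, equal to, or after key_b."""
    if len(key_a) != len(key_b):
        raise ValueError("this sketch compares keys of the same field length")
    for byte_a, byte_b in zip(key_a, key_b):
        if byte_a != byte_b:
            # The first differing byte decides the order.
            return -1 if byte_a < byte_b else 1
    return 0

abc = "ABC ".encode("cp037")   # "ABC" blank-padded to 4 bytes
abcd = "ABCD".encode("cp037")

print(compare_ch(abc, abcd))  # -1: blank (X'40') is lower than "D" (X'C4')
print(compare_ch(abcd, abc))  # 1
print(compare_ch(abc, abc))   # 0
```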

EBCDIC and digit order

In standard EBCDIC, the digits 0 through 9 have ascending code points (X'F0' through X'F9'). For fixed-length fields that contain only unsigned digits with leading zeros (e.g. an 8-digit date YYYYMMDD or a 5-digit ID), comparing byte by byte in EBCDIC therefore gives the same order as comparing the numbers. That is why some shops use CH for such fields. The correspondence fails when values are left-justified with different digit counts: for "9" and "10", the first byte of "10" is "1", which is less than "9" in EBCDIC, so "10" sorts before "9", which is wrong for numeric order. Negative numbers (with a sign in the last byte) and fields that mix leading spaces with leading zeros also break the correspondence. For reliable ordering of numeric data, use the numeric format that matches the storage (ZD, PD, BI).
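
A quick Python sketch shows where CH order and numeric order agree and where they part ways. It assumes `cp037` as the EBCDIC code page; `ch_sorted` is a hypothetical helper.

```python
# Sketch: when does CH order agree with numeric order?
# Assumption: "cp037" stands in for EBCDIC; ch_sorted is a hypothetical helper.

def ch_sorted(fields):
    """Sort text fields the way a CH key would: by their EBCDIC bytes."""
    return sorted(fields, key=lambda field: field.encode("cp037"))

# Fixed length, digits only, zero-padded: CH order matches numeric order.
padded = ch_sorted(["00010", "00009", "00100"])
print(padded)          # ['00009', '00010', '00100']

# Left-justified values with different digit counts: "10" lands before "9".
left_justified = ch_sorted(["9 ", "10"])
print(left_justified)  # ['10', '9 ']

# Blank padding mixed with zero padding: " 5" lands before "04".
mixed = ch_sorted(["04", " 5"])
print(mixed)           # [' 5', '04']
```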

When to Use CH

Use CH when the field is character data

| Use case | Reason |
|---|---|
| Names (person, product, etc.) | Data is text; byte order is the correct order. |
| Alphanumeric IDs or codes | IDs are compared as strings, not as numbers. |
| Address lines, descriptions | Plain text; no numeric meaning. |
| Fixed-length date as string (e.g. YYYYMMDD) | Can work if all values are same length and positive; ZD is safer for dates stored as numbers. |
| Keys that mix letters and digits | CH preserves lexicographic order; numeric formats do not apply. |

When Not to Use CH

  • Packed decimal (COMP-3) — Use PD. The bytes are not character digits; they are nibbles. CH gives meaningless order.
  • Zoned decimal (DISPLAY numeric) — Use ZD if you want numeric order, especially with sign or variable length.
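  • Binary integers — Use BI for unsigned values and FI for signed values. With signed (two's-complement) values, raw byte comparison places negatives after positives, so it does not match numeric value.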
  • Variable-length or signed numeric data — Use the numeric format that matches storage (ZD, PD, etc.) so 9 comes before 10 and negatives are correct.
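
To see why CH misorders packed-decimal data, the following Python sketch packs a few signed integers and sorts them by their raw bytes. `pack_decimal` is a hypothetical helper that assumes the common sign nibbles C (positive) and D (negative). It is an illustration of the storage layout, not a DFSORT routine.

```python
# Sketch: why CH misorders signed packed-decimal (COMP-3) fields.
# Assumption: pack_decimal is a hypothetical helper that uses sign nibbles
# C (positive) and D (negative).

def pack_decimal(value: int, num_bytes: int) -> bytes:
    """Encode an integer as packed decimal: two digits per byte, sign in the last nibble."""
    digits = str(abs(value)).rjust(2 * num_bytes - 1, "0")
    if len(digits) > 2 * num_bytes - 1:
        raise ValueError("value does not fit in the field")
    sign = "C" if value >= 0 else "D"
    return bytes.fromhex(digits + sign)

values = [9, -10, 5, -5]

# CH-style: order by the raw bytes of each packed field.
ch_order = sorted(values, key=lambda v: pack_decimal(v, 2))
# PD-style: order by the numeric value the bytes represent.
pd_order = sorted(values)

print(pack_decimal(-10, 2).hex())  # 010d
print(ch_order)                    # [5, -5, 9, -10]
print(pd_order)                    # [-10, -5, 5, 9]
```

The byte order follows the digit nibbles first and ignores what the sign nibble means, so -10 ends up last instead of first.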

CH in Other Control Statements

CH is not only for SORT FIELDS. Whenever DFSORT needs to know the format of a field—for comparison, building a new field, or writing output—you can specify CH. Examples:

  • INCLUDE / OMIT — When you code a comparison (e.g. a field against a constant), you specify the format of the field. If the field is character data and you want a character comparison, the format is CH and the constant is a character string.
  • INREC / OUTREC — When building or copying fields, you refer to input positions; the "format" of what you copy can be thought of as character when you are just moving bytes. For build items like constants, you are supplying character data.
  • OUTFIL — Report fields and constants are often character; CH is implicit when you output literal text or unmodified character fields.
  • MERGE FIELDS — Same idea as SORT FIELDS: use CH for character merge keys.
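
As an illustration of a CH comparison used for filtering, the Python sketch below mimics the effect of a condition such as `INCLUDE COND=(1,3,CH,EQ,C'ABC')`. It assumes `cp037` as the EBCDIC code page and a constant exactly as long as the field. `include_ch_eq` is a hypothetical helper, not part of DFSORT.

```python
# Sketch of a CH equality test used as a record filter.
# Assumptions: "cp037" stands in for EBCDIC; include_ch_eq is a hypothetical
# helper; the constant has the same length as the field.

def include_ch_eq(records, start: int, length: int, constant: str):
    """Keep records whose field at (start, length) equals the character constant."""
    wanted = constant.encode("cp037")
    if len(wanted) != length:
        raise ValueError("this sketch expects a constant as long as the field")
    return [r for r in records if r[start - 1:start - 1 + length] == wanted]

records = [line.encode("cp037") for line in ["ABC001", "XYZ002", "ABC003"]]
kept = include_ch_eq(records, 1, 3, "ABC")

kept_text = [r.decode("cp037") for r in kept]
print(kept_text)  # ['ABC001', 'ABC003']
```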

Collating Sequence and ALTSEQ

The order you get with CH depends on the collating sequence. By default that is the byte order of the job's character set (e.g. EBCDIC). You can supply an alternate collating sequence with the ALTSEQ statement to get case-insensitive sorts or a custom order. In DFSORT, the ALTSEQ table applies to fields coded with the AQ format, and to CH fields only when the CHALT option is in effect. Either way, the key is still compared byte by byte; the table only changes which byte value is considered "less than" or "greater than" another. So character comparison with ALTSEQ is still character comparison, not numeric.
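
An alternate collating sequence can be pictured as a 256-entry translation table applied to the key before comparison. The Python sketch below folds lowercase letters onto uppercase, which is roughly what ALTSEQ CODE= pairs such as 81C1 (X'81' treated as X'C1') express. It assumes `cp037` as the EBCDIC code page; `build_fold_table` and `folded_key` are hypothetical helpers.

```python
# Sketch: an alternate collating sequence as a 256-entry translation table.
# Assumptions: "cp037" stands in for EBCDIC; the table folds lowercase a-z
# onto uppercase A-Z for comparison only, leaving the data unchanged.
import string

def build_fold_table() -> bytes:
    """Map each lowercase EBCDIC letter to its uppercase code; other bytes map to themselves."""
    table = bytearray(range(256))
    for lower in string.ascii_lowercase:
        table[lower.encode("cp037")[0]] = lower.upper().encode("cp037")[0]
    return bytes(table)

FOLD = build_fold_table()

def folded_key(text: str) -> bytes:
    """Key bytes after translation; they are still compared byte by byte."""
    return text.encode("cp037").translate(FOLD)

names = ["bob", "ALICE", "Carl"]

plain_order = sorted(names, key=lambda n: n.encode("cp037"))
folded_order = sorted(names, key=folded_key)

print(plain_order)   # ['bob', 'ALICE', 'Carl']
print(folded_order)  # ['ALICE', 'bob', 'Carl']
```

Only the comparison key is translated. The names themselves come out of the sort unchanged.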

Explain It Like I'm Five

Imagine sorting word cards. We look at the first letter: if one card has "A" and another "B", the A card goes first. If the first letters are the same, we look at the second letter, and so on. We never try to "add up" the letters as a number—we just compare them in ABC order. CH does that with the bytes in your field: first byte, then second, and so on, using the computer's letter order (EBCDIC). So CH is for names and words. When the field is really a number (like money or quantity), we use a different rule (ZD or PD) so that 9 comes before 10.

Exercises

  1. Write a SORT FIELDS statement to sort on bytes 5–14 as character ascending, then bytes 15–18 as character descending.
  2. Your file has a 6-byte customer ID at position 1 (digits only, fixed length). Could you use CH? What could go wrong if one day the ID has a leading space or a letter?
  3. Sort a file by last name (positions 1–25) then first name (26–40), both CH ascending. Write the full SORT FIELDS control statement.
  4. Why does sorting "9" and "10" with CH put "10" before "9"? What format would you use to get numeric order?

Quiz

Test Your Knowledge

1. What does CH mean in a DFSORT SORT FIELDS specification?

  • Compact Hexadecimal
  • Character: the field is compared byte-by-byte in collating sequence (e.g. EBCDIC)
  • Check digit
  • Column header

2. For a 10-byte name field starting at position 1, which SORT FIELDS specification is correct for ascending order?

  • SORT FIELDS=(1,10,ZD,A)
  • SORT FIELDS=(1,10,CH,A)
  • SORT FIELDS=(1,10,PD,A)
  • SORT FIELDS=(CH,1,10,A)

3. When can sorting digits with CH produce the same order as numeric sort?

  • Never
  • When the field is fixed-length, contains only positive digits (no sign), and all values have the same length
  • Only when OPTION EQUALS is used
  • Only for the primary key

4. Why might "10" sort before "9" when using CH?

  • CH always sorts descending
  • CH compares byte by byte; the first byte of "10" is "1" and of "9" is "9"; in EBCDIC "1" is less than "9", so "10" comes first
  • It is a bug in DFSORT
  • Only when the field is packed decimal

5. Where can CH be used besides SORT FIELDS?

  • Only in SORT FIELDS
  • In INCLUDE/OMIT, INREC, OUTREC, OUTFIL, and MERGE when specifying field format (e.g. comparison or build type)
  • Only in MERGE
  • Only in JOINKEYS