MainframeMaster

Numeric vs Character Sorts

In DFSORT, every sort key has a format: you tell the sort whether the key is character (CH) or a kind of number (PD, ZD, BI, FI, FL). That choice controls how DFSORT compares the key bytes. With CH, bytes are compared in collating sequence (typically EBCDIC), left to right, byte by byte. With numeric formats, DFSORT interprets the bytes as a number (packed decimal, zoned decimal, binary, etc.) and compares the numeric value. Using the wrong format leads to wrong sort order: for example, sorting a signed packed-decimal amount as CH can put -100 after 20, and sorting a variable-length number as CH can put 9 after 10. This page explains the difference between numeric and character sorts, when each is correct, how EBCDIC order compares to numeric order, and how to choose and verify the right format for your data.

Why the Format Choice Matters

The same bytes in a record can be interpreted in different ways. For example, the byte sequence that represents the number 12345 in zoned decimal (one digit per byte, like COBOL DISPLAY) is different from the byte sequence that represents 12345 in packed decimal (two digits per byte plus sign, like COMP-3). If you tell DFSORT the key is CH, it does not interpret the bytes as a number at all—it compares them as character codes. So the "order" you get is the order of the byte values (EBCDIC order on the mainframe), not the numeric order. For true numeric order you must use a numeric format (ZD, PD, BI, etc.) that matches how the data is actually stored. Choosing character when the data is numeric, or numeric when the data is character, is one of the most common causes of incorrect sort order in DFSORT jobs.

Character (CH): Byte-by-Byte Collating Sequence

CH means the sort key is treated as character (alphanumeric) data. DFSORT compares the key one byte at a time, from left to right, using the collating sequence of the job—on z/OS this is usually EBCDIC. Whichever record has the smaller byte value at the first differing position is ordered first (for ascending). So "ABC" comes before "ABD" because the third byte C is less than D in EBCDIC. For digit characters, in EBCDIC the codes for 0 through 9 are in ascending order (0xF0 through 0xF9 in typical EBCDIC). So for fixed-length strings of digits with no sign, character order and numeric order can coincide: "00100", "00234", "99999" sort in the same order whether you think of them as characters or numbers. But as soon as the lengths differ, character order diverges from numeric order: the string "9" (one byte) is compared with "10" (two bytes); the first byte of "10" is "1", which in EBCDIC is less than "9", so "10" would sort before "9"—the opposite of numeric order. So CH is correct for names, IDs, codes, and text, but risky for numeric data unless it is fixed-length, positive, and consistently formatted.
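The byte-by-byte comparison described above can be imitated outside the mainframe. The following is a minimal sketch, not DFSORT itself: it assumes EBCDIC code page 037 (Python's `cp037` codec) and uses a hypothetical helper `ch_key` that pads each value with blanks to a fixed key length, the way a left-justified field would be stored.

```python
# Sketch of a CH comparison: encode to EBCDIC and compare raw bytes.
# Assumption: code page 037; your system's code page may differ.

def ch_key(text, length):
    """Left-justify, pad with blanks, and return the EBCDIC bytes."""
    return text.ljust(length).encode("cp037")

# Letters: in EBCDIC, lowercase < uppercase < digits (unlike ASCII).
by_name = sorted(["ABD", "ABC", "abc", "123"], key=lambda s: ch_key(s, 3))
print(by_name)   # ['abc', 'ABC', 'ABD', '123']

# Fixed-length, zero-padded digits: byte order equals numeric order.
fixed = sorted(["99999", "00234", "00100"], key=lambda s: ch_key(s, 5))
print(fixed)     # ['00100', '00234', '99999']

# Left-justified digits of different lengths: byte order is not numeric.
ragged = sorted(["9", "10", "2"], key=lambda s: ch_key(s, 2))
print(ragged)    # ['10', '2', '9']
```

Python compares `bytes` objects left to right by byte value, which is the same rule CH applies to the key.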

What CH Does and Does Not Do

CH does: compare bytes in collating sequence; give consistent alphabetical or alphanumeric order; work for any byte string (names, codes, dates stored as YYYYMMDD character). CH does not: interpret the key as a number; handle negative numbers correctly (the sign is just another byte); guarantee numeric order when lengths differ or when the data is stored in packed or binary form. So use CH when the key is genuinely character data. When the key is a number stored in a numeric format (zoned, packed, binary), use the matching format so that comparison is by numeric value.

Numeric Formats: ZD, PD, BI (Brief)

ZD (zoned decimal): Each byte holds one digit, and the zone (high) nibble of the last byte holds the sign. This is COBOL DISPLAY storage. DFSORT interprets the key as a signed decimal number and compares numeric value. So 1 sorts before 2, 10 sorts after 9, and negative numbers sort before positive (for ascending). PD (packed decimal): Each byte holds two digits (one per nibble); the last byte has one digit and the sign. This is COBOL COMP-3. DFSORT interprets the key as a signed packed decimal and compares numeric value. BI (binary): The key is an unsigned binary integer (e.g. a 2-byte halfword or 4-byte fullword). DFSORT compares the unsigned integer value; if the binary field can hold negative values, use FI (signed fixed-point) instead, because BI treats the sign bit as part of the magnitude and would sort negatives after positives. For all of these, the length in SORT FIELDS is the number of bytes of the key. The important point: with a numeric format, the comparison is by numeric value, not by the raw byte order. So 9 is always less than 10, and with a signed format (ZD, PD, FI) -5 is less than 100, regardless of how many bytes they occupy or how they are encoded.
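To make the storage layouts concrete, here is a small sketch of how each format could be decoded into a number before comparison. The function names are hypothetical, the input is assumed to contain valid digits, and the sign codes (0xB and 0xD negative, anything else positive) follow the usual z/Architecture convention; this illustrates the idea and is not DFSORT's actual implementation.

```python
# Illustrative decoders for ZD, PD, BI, and FI keys (names are hypothetical).
NEGATIVE_SIGNS = (0x0B, 0x0D)

def decode_zd(field):
    """Zoned: low nibble of each byte is a digit; zone of last byte is the sign."""
    digits = "".join(str(b & 0x0F) for b in field)
    negative = (field[-1] >> 4) in NEGATIVE_SIGNS
    return -int(digits) if negative else int(digits)

def decode_pd(field):
    """Packed: two digit nibbles per byte; the final nibble is the sign."""
    nibbles = []
    for b in field:
        nibbles.extend((b >> 4, b & 0x0F))
    sign = nibbles.pop()
    value = int("".join(str(n) for n in nibbles))
    return -value if sign in NEGATIVE_SIGNS else value

def decode_bi(field):
    """BI: unsigned big-endian binary."""
    return int.from_bytes(field, "big", signed=False)

def decode_fi(field):
    """FI: signed (two's complement) big-endian binary."""
    return int.from_bytes(field, "big", signed=True)

print(decode_zd(bytes.fromhex("F0F0F9")))   # 9
print(decode_zd(bytes.fromhex("F0F1D0")))   # -10
print(decode_pd(bytes.fromhex("00100C")))   # 100
print(decode_pd(bytes.fromhex("00020D")))   # -20
print(decode_bi(b"\xff\xff"), decode_fi(b"\xff\xff"))   # 65535 -1
```

The last line shows why the BI versus FI choice matters: the same two bytes are 65535 when read as unsigned and -1 when read as signed.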

Comparing Character vs Numeric: Same Digits, Different Order

Suppose you have three records with a 3-byte character (EBCDIC) field that contains the values 9, 10, and 11, left-justified and padded with blanks: record A = "9" followed by two spaces, record B = "10" followed by one space, record C = "11" followed by one space. With CH, DFSORT compares byte by byte. The first byte is 9 (0xF9) for A and 1 (0xF1) for both B and C, so B and C sort ahead of A. The second byte separates B from C: 0 (0xF0) for B, 1 (0xF1) for C. So ascending order is B, C, A, i.e. 10, 11, 9. That is not numeric order (9, 10, 11). (If the values were right-justified with leading blanks, CH would happen to match numeric order, because space (0x40) is lower than any digit; the match depends entirely on consistent padding.) If the same values were stored in a numeric field (e.g. ZD or PD), and you specified ZD or PD, DFSORT would compare the numeric values 9, 10, 11 and produce the order A, B, C (9, 10, 11). So for numeric order you must use a numeric format when the data is stored in that form; CH is only for when you want character (collating sequence) order.
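The contrast can be reproduced with a short sketch. It assumes the values are left-justified and blank-padded in a 3-byte field, uses code page 037 for the EBCDIC bytes, and lets Python's `int()` stand in for a numeric decode; none of this is DFSORT code.

```python
# Three 3-byte keys holding 9, 10, 11, left-justified and blank-padded.
records = ["9  ", "10 ", "11 "]

# CH-style: compare the EBCDIC bytes left to right.
ch_order = sorted(records, key=lambda r: r.encode("cp037"))

# Numeric-style: compare the values the bytes represent.
numeric_order = sorted(records, key=lambda r: int(r))

print([r.strip() for r in ch_order])        # ['10', '11', '9']
print([r.strip() for r in numeric_order])   # ['9', '10', '11']
```

The first byte decides the CH result: 0xF1 ("1") is lower than 0xF9 ("9"), so both two-digit values come ahead of 9.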

When Character (CH) Gives Wrong Numeric Order

CH gives wrong numeric order in these typical cases: (1) Variable-length or left-justified numbers: "9" vs "10". CH puts "10" before "9" because the first byte "1" < "9". (2) Packed decimal data: The bytes are not character digits; each byte holds two digit nibbles, and the low nibble of the last byte is the sign. CH compares the raw bytes and ignores what the sign means, so negative amounts are not placed before positive ones (for example, -100 can sort after 20), and equal values stored with different sign codes (such as C and F) do not compare equal. (3) Negative numbers in zoned form: The sign is in the zone nibble of the last byte (e.g. 0xD for negative). CH compares that byte as a character, so the order of negative vs positive may not match numeric order. (4) Leading spaces or leading zeros used inconsistently: " 99" vs "050". Character comparison is byte-by-byte, so the first differing byte decides; space (0x40) is lower than "0" (0xF0), so CH puts " 99" first even though 99 is greater than 50. To avoid these problems, use ZD for zoned (display) numeric data, PD for packed (COMP-3) data, and BI or FI for binary integer keys. Reserve CH for true character keys (names, codes, IDs, or fixed-format character dates like YYYYMMDD where you are happy with character order).
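The packed-decimal case is easy to check by hand. The sketch below builds 3-byte packed fields with hypothetical helpers `pack` and `unpack` (sign nibble C for positive, D for negative), then sorts them once by raw bytes, as CH would, and once by decoded value, as PD would.

```python
# Packed-decimal amounts sorted as raw bytes (CH-like) vs as numbers (PD-like).

def pack(value, length):
    """Build a packed field: zero-padded digits followed by a sign nibble."""
    digits = str(abs(value)).rjust(length * 2 - 1, "0")
    sign = "D" if value < 0 else "C"
    return bytes.fromhex(digits + sign)

def unpack(field):
    """Decode a packed field back to an int (B or D sign means negative)."""
    hex_text = field.hex().upper()
    value = int(hex_text[:-1])
    return -value if hex_text[-1] in "BD" else value

fields = [pack(a, 3) for a in (20, -100, 100, -20)]

as_ch = [unpack(f) for f in sorted(fields)]               # raw byte order
as_pd = [unpack(f) for f in sorted(fields, key=unpack)]   # numeric order

print(as_ch)   # [20, -20, 100, -100]
print(as_pd)   # [-100, -20, 20, 100]
```

In the raw-byte result the magnitudes are in order but the signs are ignored, which is why -100 lands last instead of first.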

When a Numeric Format Is Wrong (Character Data)

If the key is actually character (e.g. an alphanumeric product code or a name) and you specify PD or ZD, DFSORT will try to interpret those bytes as a number. For PD it expects a packed layout (two digits per byte); for ZD it expects one digit per byte with a sign in the last byte. If the bytes are letters or other characters, the interpretation is meaningless and the sort order can be wrong or unpredictable. For example, a customer ID like "AB12CD" must be sorted with CH so that the order is alphabetical/numeric by character. Using PD or ZD on "AB12CD" would misinterpret the bytes and produce incorrect order. So: use numeric formats only when the key is really stored in that numeric format; use CH for character keys.
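To see why a numeric format is meaningless on character data, consider what a naive zoned-decimal decode does to letters. The decoder below is a hypothetical illustration that simply takes the low nibble of each byte as a digit and the zone of the last byte as the sign; DFSORT's actual handling of invalid zoned data may differ. Code page 037 is assumed.

```python
# What a naive ZD interpretation makes of alphanumeric IDs (illustration only).

def naive_zd(field):
    digits = "".join(str(b & 0x0F) for b in field)
    negative = (field[-1] >> 4) in (0x0B, 0x0D)
    return -int(digits) if negative else int(digits)

decoded = {cid: naive_zd(cid.encode("cp037"))
           for cid in ("AB12CD", "12ABCD", "AB12CM")}

for cid, value in decoded.items():
    print(cid, value)
# AB12CD 121234
# 12ABCD 121234
# AB12CM -121234
```

Two different IDs collapse to the same "number", and a trailing M (0xD4) turns the value negative because its zone nibble looks like a minus sign. Under CH, all three IDs would simply be ordered by their characters.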

Fixed-Length Positive Digit Strings: When CH Matches Numeric

A special case is when the key is a fixed-length field containing only positive digits (e.g. 8-byte date YYYYMMDD or 5-digit ID). In EBCDIC, the character codes for 0–9 are in ascending order. So when you compare two same-length digit strings byte by byte, the character order is the same as the numeric order: "00100" before "00234" before "99999". In that situation, using CH will produce the same order as using ZD (if the data were zoned). So it is "acceptable" to use CH for such fields when you are sure the data will always be fixed-length, positive, and digit-only. However, if later someone adds a negative value, or a shorter value (e.g. "9" instead of "00009"), or a blank, the CH order can become wrong. For robustness, many shops prefer to use ZD (or PD) whenever the field is logically a number, even if CH would work today.
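One way to verify that CH is safe for a digit field is to scan a sample of keys before relying on it. The checker below is a sketch with a hypothetical name; it assumes the keys have already been extracted as EBCDIC bytes and accepts only fields of identical length made up entirely of the digit codes 0xF0 through 0xF9.

```python
# Check whether CH order would equal numeric order for a set of keys.

def ch_safe_as_number(keys):
    """True if all keys share one length and contain only EBCDIC digits."""
    if not keys:
        return True
    length = len(keys[0])
    return all(
        len(k) == length and all(0xF0 <= b <= 0xF9 for b in k)
        for k in keys
    )

clean = [s.encode("cp037") for s in ("00100", "00234", "99999")]
blank_padded = [s.encode("cp037") for s in ("00100", "    9")]
signed = [bytes.fromhex("F0F0F1F0F0"), bytes.fromhex("F0F0F0F2D0")]  # second is -20

print(ch_safe_as_number(clean))          # True
print(ch_safe_as_number(blank_padded))   # False
print(ch_safe_as_number(signed))         # False
```

A `False` result means at least one key contains a blank, a sign zone, or some other non-digit byte, so a numeric format (or a data cleanup) deserves a look.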

Choosing the Right Format: Decision Guide

  • Names, codes, alphanumeric IDs, text → Use CH. You want collating sequence order.
  • COBOL DISPLAY numeric (PIC 9(n) or S9(n) DISPLAY) → Use ZD. The data is zoned decimal (one digit per byte, sign in last byte).
  • COBOL COMP-3 (packed decimal) → Use PD. The data is packed (two digits per byte, sign in last nibble).
  • COBOL COMP or COMP-4 (binary) → Use FI for signed fields (PIC S9) or BI for unsigned fields (PIC 9); see product docs. The data is binary integer.
  • Fixed-format date as character (YYYYMMDD) → CH is fine and gives chronological order for dates in that form. Alternatively, if the date is stored in a numeric format, use that format.
  • Unsure how the field is stored → Check the program or copybook that writes the data (PIC clause and usage). Match the format to the storage type.

Examples: Correct vs Incorrect Format

Record layout: bytes 1–20 customer name (character), bytes 21–24 amount (COMP-3, packed decimal), bytes 25–32 date (YYYYMMDD character). You want sort by amount descending, then by date ascending.

  SORT FIELDS=(21,4,PD,D,25,8,CH,A)

Correct: amount is packed → use PD; date is character → use CH. Amount is compared as a number (largest first); date is compared as character, which for YYYYMMDD gives chronological order.

  SORT FIELDS=(21,4,CH,D,25,8,ZD,A)

Incorrect: amount is specified as CH but the data is packed, so the amount key is compared as raw bytes rather than as a signed number, and negative amounts will be misplaced. Date is specified as ZD but the data is character digits: ZD interprets the bytes as zoned decimal, which may happen to give the same order for clean digit data but treats any blank or non-digit byte as a digit or sign instead of comparing it as a character. For YYYYMMDD character, CH is the right choice. So the first example is correct; the second mixes formats and storage types and can produce wrong order.
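The effect of the correct statement can be previewed with a small simulation of the record layout. Everything here is a sketch under stated assumptions: the sample records are invented, `pack`, `unpack`, `make_record`, and `sort_key` are hypothetical helpers, and code page 037 stands in for the system's EBCDIC. Note that DFSORT positions are 1-based, so bytes 21-24 become the Python slice `[20:24]`.

```python
# Simulate SORT FIELDS=(21,4,PD,D,25,8,CH,A) on a few invented records.

def pack(value, length):
    digits = str(abs(value)).rjust(length * 2 - 1, "0")
    return bytes.fromhex(digits + ("D" if value < 0 else "C"))

def unpack(field):
    hex_text = field.hex().upper()
    value = int(hex_text[:-1])
    return -value if hex_text[-1] in "BD" else value

def make_record(name, amount, date):
    """20-byte name (CH) + 4-byte amount (PD) + 8-byte date (CH)."""
    return name.ljust(20).encode("cp037") + pack(amount, 4) + date.encode("cp037")

def sort_key(record):
    amount = unpack(record[20:24])   # bytes 21-24, decoded as packed decimal
    date = record[24:32]             # bytes 25-32, compared as raw bytes
    return (-amount, date)           # amount descending, then date ascending

records = [
    make_record("SMITH", 250, "20240115"),
    make_record("JONES", -75, "20231201"),
    make_record("BROWN", 250, "20231130"),
    make_record("ADAMS", 1000, "20240301"),
]

ordered = sorted(records, key=sort_key)
names = [r[:20].decode("cp037").strip() for r in ordered]
print(names)   # ['ADAMS', 'BROWN', 'SMITH', 'JONES']
```

The two records with amount 250 are separated by the date key, and the negative amount sorts last because the amount is compared as a signed number.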

EBCDIC Digit Order and Why It Sometimes Matches

In EBCDIC, the character codes for the digits 0 through 9 are typically in ascending order (e.g. 0xF0 for "0" through 0xF9 for "9"). So when you compare two strings of digits that have the same length, the first differing position has the smaller code for the smaller digit, and the order is the same as numeric order. That is why fixed-length positive numeric strings often sort correctly with CH. But the moment you compare "9" (one byte) with "10" (two bytes), the comparison is "9" vs "1" in the first byte—and "1" is less than "9", so "10" comes first. So CH does not "know" that 9 is less than 10; it only knows byte order. Numeric formats (ZD, PD, BI) actually decode the value and compare 9 and 10 correctly.

Explain It Like I'm Five

Imagine you have cards with numbers on them. If you sort them "by the way the letters look" (character), you look at the first digit, then the second, and so on. So "10" might come before "9" because you look at "1" first and "1" comes before "9" in the alphabet of digits. If you sort them "by the real number" (numeric), you know 9 is smaller than 10, so 9 comes first. The sort program needs you to say: "This field is letters (character)" or "This field is a real number (numeric)." If you say the wrong thing, the order gets mixed up—like putting 100 before 20, or 9 after 10. So we use character (CH) for names and codes, and numeric (PD, ZD) for numbers stored in those ways.

Exercises

  1. A 4-byte field contains a dollar amount in COMP-3. What format do you use in SORT FIELDS for correct numeric order? What happens if you use CH?
  2. You have a 10-byte customer ID that is alphanumeric (e.g. "CU00123456"). Should you use CH, ZD, or PD? Why?
  3. Give an example of two values that would sort in different order with CH than with ZD (same data, same length). (Hint: consider a negative number, whose sign is carried in the zone nibble of the last byte.)
  4. Date is stored as 8 bytes YYYYMMDD in character. Why is CH acceptable here? When might ZD or a date format be better?

Quiz

Test Your Knowledge

1. You have a 5-byte field containing the digits "00234" stored as EBCDIC (one character per byte). If you sort with CH, what order do you get compared with "00100" and "99999"?

  • Numeric order: 00100, 00234, 99999
  • Character order: byte-by-byte EBCDIC, so 00100, 00234, 99999 (same as numeric for same-length positive digits)
  • Random order
  • Descending only

2. A field is stored as COBOL COMP-3 (packed decimal). Which format must you use in SORT FIELDS for correct numeric order?

  • CH
  • ZD
  • PD
  • Either CH or PD

3. What is the main difference between CH and ZD when the data looks like "12345" (display digits)?

  • There is no difference
  • CH compares byte-by-byte in collating sequence; ZD interprets the bytes as a signed decimal number and compares numeric value
  • ZD is only for negative numbers
  • CH is faster

4. Why can sorting a packed-decimal amount as CH produce "wrong" order?

  • CH is always wrong for numbers
  • Packed decimal stores two digits per byte in nibbles, with the sign in the low nibble of the last byte; CH compares the raw bytes without interpreting the sign, so negative values (and values with different sign codes) do not sort in numeric order
  • CH only works for the first key
  • PD is required for all numbers

5. When is it acceptable to use CH for a field that contains only digits?

  • Never
  • When the field is fixed-length, positive, and padded the same way in every record (for example, with leading zeros), and you are sure no negatives or variable-length values will appear
  • Only for the primary key
  • Only with OPTION EQUALS