What are EBCDIC nuances in DFSORT?

EBCDIC nuances include: sort order follows EBCDIC byte order (not ASCII); uppercase and lowercase sort separately unless you normalize; different code pages (037, 273, etc.) assign different characters to bytes and can affect order; and special or national characters may sort in unexpected positions. Use ALTSEQ or INREC translation when you need a specific order (e.g. case-insensitive).

How does EBCDIC affect sort order?

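DFSORT compares sort key bytes using EBCDIC numeric order (or the alternate sequence if ALTSEQ is used with AQ-format fields). So the order of A–Z, a–z, 0–9, and special characters is determined by their EBCDIC code points. That order differs from ASCII; for example, in EBCDIC 037 lowercase letters (X'81'–X'A9') have lower byte values than uppercase letters (X'C1'–X'E9'), and digits (X'F0'–X'F9') come last, so "a" sorts before "A" and letters sort before digits. In ASCII the order is the reverse: digits, then uppercase, then lowercase.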

What is EBCDIC code page 037?

Code page 037 is a common US EBCDIC character set. It defines which character corresponds to each byte value (0–255). Other code pages (273 for German, 277 for Danish/Norwegian, etc.) are used for other languages and may assign different characters to the same byte. DFSORT compares byte values regardless of code page, so the code page determines how bytes are displayed and where a given character falls in the sort order.

How do I get case-insensitive sort in EBCDIC?

Normalize the sort key to one case before comparison. Use INREC with a translation that converts the key field to uppercase (TRAN=LTOU) or lowercase (TRAN=UTOL), or use ALTSEQ with an AQ-format sort field so that one case collates as the other. Then sort on the normalized field so "Apple" and "apple" sort together.

Why does my sort order differ from an ASCII system?

EBCDIC and ASCII use different code point orders. The same character has a different byte value, and the relative order of letters, digits, and symbols is not the same. So a file sorted on an ASCII system will not necessarily have the same order when compared byte-by-byte on an EBCDIC system. Convert the data or the key if you need consistent cross-platform order.

EBCDIC Nuances in DFSORT - Code Pages and Sort Order

EBCDIC Nuances

DFSORT runs on z/OS where the default character encoding is EBCDIC. The way bytes are interpreted and the order in which characters sort depend on the EBCDIC code page and on the fact that EBCDIC is not the same as ASCII. This page covers EBCDIC nuances that affect DFSORT: how EBCDIC sort order differs from ASCII, what code pages are and why they matter, how uppercase and lowercase letters sort in EBCDIC, and how to get the sort order you want (e.g. case-insensitive) when data is in EBCDIC.

EBCDIC vs ASCII Order

In EBCDIC, each character is represented by one byte (0–255). The numeric value of that byte determines the sort order when DFSORT compares keys byte by byte. The EBCDIC assignment is different from ASCII: for example, the letter A is X'C1' in EBCDIC 037 but X'41' in ASCII. Digits, lowercase letters, and special characters also have different code points. So the same text sorted on an ASCII system will not have the same byte order when stored in EBCDIC and sorted by DFSORT. If you exchange files with ASCII systems, the "sorted" order may look different when displayed, and if you compare files byte-by-byte they may not match even when logically equal.
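The difference can be reproduced off the mainframe. The following is a minimal Python sketch, not DFSORT itself: it assumes Python's built-in `cp037` codec is a faithful stand-in for code page 037 and models an ascending character-key comparison as a plain byte comparison. The sample values are made up for illustration.

```python
# Model a byte-by-byte key comparison in two encodings.
# Assumption: Python's "cp037" codec stands in for EBCDIC code page 037.
samples = ["apple", "Apple", "42", "APPLE"]

ascii_order = sorted(samples, key=lambda s: s.encode("ascii"))
ebcdic_order = sorted(samples, key=lambda s: s.encode("cp037"))

print("ASCII :", ascii_order)   # ['42', 'APPLE', 'Apple', 'apple']
print("EBCDIC:", ebcdic_order)  # ['apple', 'Apple', 'APPLE', '42']
```

For this sample the two orders are exact mirrors of each other: ASCII puts digits first and lowercase last, while code page 037 puts lowercase first and digits last.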

Code Pages

An EBCDIC code page defines which character (glyph) corresponds to each byte value. Code page 037 is common in the US (US English). Other code pages, such as 273 (German), 277 (Danish/Norwegian), and 297 (French), support other languages and assign different characters to some byte values. When you specify a sort key, DFSORT compares the bytes as they are; it does not "know" the code page. The code page matters when the data is written (which characters become which bytes) and when it is displayed (which bytes become which characters). If you move a dataset from a 037 system to a 273 system without conversion, the bytes and their sorted order stay the same, but some bytes display as different characters, so a national character or symbol can appear in a different position relative to the others. For consistent behavior, know which code page your data is in and use the same one when writing, sorting, and displaying it.

Common EBCDIC code pages (examples)
Code page	Region / use	Note
037	US English	Common on US mainframes
273	Germany	German EBCDIC
277	Denmark/Norway	Nordic EBCDIC
297	France	French EBCDIC
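To see which byte values are affected, the two mappings can be compared directly. This is a rough sketch that assumes Python's `cp037` and `cp273` codecs match the code pages of the same numbers; it lists every byte that decodes to a different character and checks that the unaccented letters and digits keep the same bytes.

```python
# Decode every byte value under two EBCDIC code pages and compare.
# Assumption: Python's "cp037" and "cp273" codecs stand in for
# code pages 037 and 273.
all_bytes = bytes(range(256))
us = all_bytes.decode("cp037")
de = all_bytes.decode("cp273")

# Byte values whose character differs between the two code pages.
differing = [b for b in range(256) if us[b] != de[b]]
for b in differing:
    print(f"X'{b:02X}'  037={us[b]!r}  273={de[b]!r}")

# Unaccented letters and digits keep the same byte values in both.
alnum = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
alnum_stable = all(ch.encode("cp037") == ch.encode("cp273") for ch in alnum)
print("A-Z, a-z, 0-9 identical:", alnum_stable)
```

Only symbols and national characters move between these two code pages, which is why plain alphanumeric keys sort the same way under either one.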

Uppercase and Lowercase in EBCDIC

In EBCDIC 037, lowercase letters (a–z) occupy X'81'–X'A9' and uppercase letters (A–Z) occupy X'C1'–X'E9', with gaps inside each range (for example, i is X'89' and j is X'91'). The two cases do not interleave: lowercase a (X'81') is less than uppercase A (X'C1') in numeric order, so when you sort by byte value, every key that starts with a lowercase letter comes before any key that starts with an uppercase letter. That is the reverse of ASCII, where uppercase letters sort before lowercase. So "Apple", "apple", and "APPLE" will not sort together in EBCDIC unless you normalize case, for example by translating the sort key (or the whole record) to uppercase in INREC so that the key bytes are the same for all three.
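The byte values for code page 037 can be checked with a short sketch. It assumes Python's `cp037` codec stands in for the code page; `hex037` is a helper name invented for this example.

```python
# Show where letters and digits sit in code page 037.
# Assumption: Python's "cp037" codec stands in for EBCDIC code page 037.
def hex037(ch: str) -> str:
    """Return the cp037 byte of a single character as two hex digits."""
    return ch.encode("cp037").hex().upper()

# Boundary letters of each run, plus the first and last digit.
for ch in "aijrszAIJRSZ09":
    print(f"{ch!r} -> X'{hex037(ch)}'")

# Lowercase < uppercase < digits by byte value.
first_char_order = sorted("A0a", key=lambda c: c.encode("cp037"))
print(first_char_order)  # ['a', 'A', '0']
```

The loop prints the boundaries of each run of letters, which makes the gaps between i and j and between r and s visible in both cases.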

Getting Case-Insensitive Sort

To get case-insensitive sort order, normalize the data that is used for comparison. One approach is to translate the sort key field to uppercase (or lowercase) in INREC using a translation option (in DFSORT, TRAN=LTOU converts lowercase to uppercase and TRAN=UTOL converts uppercase to lowercase) and then sort on that normalized field. Another is to use ALTSEQ to map lowercase bytes to the corresponding uppercase bytes (or vice versa) and give the sort field format AQ, so that when DFSORT compares, both cases compare equal. Either way, DFSORT compares the keys as if they were in a single case, so "Banana" and "banana" sort together. See the case-insensitive sorting and ALTSEQ tutorials for syntax.
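Both approaches can be modeled outside DFSORT. The sketch below is an illustration in Python, assuming the `cp037` codec stands in for code page 037; the record values and variable names are made up. The first sort folds the key to uppercase before comparing, and the second compares through a 256-entry table in which each lowercase byte takes the position of its uppercase byte.

```python
# Two ways to model a case-insensitive key comparison on EBCDIC data.
# Assumption: Python's "cp037" codec stands in for EBCDIC code page 037.
import string

records = ["banana", "Cherry", "apple", "Banana", "cherry", "Apple"]

# Approach 1: fold the key to uppercase before comparing
# (the role a lowercase-to-uppercase translation in INREC plays).
by_folded_key = sorted(records, key=lambda r: r.upper().encode("cp037"))

# Approach 2: leave the data alone and compare through a table that
# makes each lowercase byte collate as its uppercase byte
# (the role an alternate collating sequence plays).
table = bytearray(range(256))
for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase):
    table[lo.encode("cp037")[0]] = up.encode("cp037")[0]
alt = bytes(table)
by_alt_sequence = sorted(records, key=lambda r: r.encode("cp037").translate(alt))

# The same mapping as "fftt" hex pairs
# (ff = byte being moved, tt = byte whose position it takes).
pairs = ",".join(f"{ff:02X}{tt:02X}" for ff, tt in enumerate(alt) if ff != tt)

print(by_folded_key)
print(by_alt_sequence)
print(pairs)
```

Both sorts group each word with its other-case twin; ties keep their input order because Python's sort is stable. The `pairs` string has the shape of the hex pairs that an ALTSEQ CODE=(...) operand takes, so treat it as a starting point to check against the DFSORT documentation rather than a finished control statement.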

National and Special Characters

National characters (accented letters, umlauts, etc.) and symbols have code point values that vary by code page. In one code page a byte might be "é"; in another the same byte might be a different character. When you sort, the order of these characters relative to A–Z and 0–9 is determined by their byte values in the code page. If you need a specific linguistic order (e.g. dictionary order for a language), you may need ALTSEQ or a custom collating sequence that maps bytes to the desired comparison order. The default EBCDIC byte order is not always the same as dictionary or locale order.
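A custom collating sequence of this kind can be sketched as a byte-to-byte table. The example below is hypothetical: it assumes Python's `cp273` codec stands in for code page 273, and it assumes that folding each German umlaut onto its base letter is the order you want, which real dictionary rules may not match.

```python
# Model a custom collating sequence for German umlauts on cp273 data.
# Assumptions: Python's "cp273" codec stands in for code page 273, and
# folding each umlaut onto its base letter is the desired order.
FOLD = {"ä": "a", "ö": "o", "ü": "u", "Ä": "A", "Ö": "O", "Ü": "U"}

# Start from the identity mapping, then move each umlaut's byte
# to the position of its base letter's byte.
fold_table = bytearray(range(256))
for national, base in FOLD.items():
    fold_table[national.encode("cp273")[0]] = base.encode("cp273")[0]
fold_table = bytes(fold_table)

words = ["zebra", "äpfel", "bär", "apfel"]
raw_order = sorted(words, key=lambda w: w.encode("cp273"))
folded_order = sorted(
    words, key=lambda w: w.encode("cp273").translate(fold_table)
)

print("byte order  :", raw_order)
print("folded order:", folded_order)
```

With the table applied, "äpfel" and "apfel" compare equal and sit next to each other; in plain byte order the umlaut's own byte value decides where "äpfel" lands.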

Explain It Like I'm Five

The mainframe uses a different ABC than the one on your PC. So when the sorter puts things in order, it uses the mainframe ABC. Big A and little a are in different places in that ABC, so they don't sit together unless we first turn all letters into the same size (big or little) and then sort. And if we use a different "language" (code page), the same squiggle can mean a different letter, so the order can change.

Exercises

In EBCDIC 037, why do "Apple" and "apple" not sort next to each other when you sort by the first character?
What is the purpose of knowing the code page of your dataset when sorting?
Name one way to get case-insensitive sort order when data is EBCDIC.
Why might the same dataset sort "differently" when displayed on a system that uses a different code page?

Quiz

Test Your Knowledge

1. Why does EBCDIC sort order differ from ASCII?

It does not
EBCDIC assigns different numeric values (code points) to characters than ASCII; byte-by-byte comparison gives a different order (e.g. uppercase vs lowercase, digits vs letters)
Only with OPTION COPY
Only for numeric keys

2. What is an EBCDIC code page?

A type of sort key
A defined mapping of byte values to characters (e.g. 037 for US English, 273 for German); different code pages can assign different characters to the same byte and affect sort/collation
A DD name
Same as CH format

3. In EBCDIC, how do uppercase and lowercase letters typically compare?

Uppercase first, like ASCII
Uppercase and lowercase have different byte values and do not sort together; in EBCDIC lowercase letters sort before uppercase (e.g. a before A), the reverse of ASCII
They are equal
DFSORT ignores case

4. How can you get case-insensitive sort order when data is EBCDIC?

Use SORT FIELDS only
Translate the sort key (or the whole record) to one case (e.g. uppercase) in INREC with TRAN=LTOU, or use ALTSEQ with an AQ-format sort field to map lower to upper (or vice versa) so comparison is consistent
Use OPTION COPY
EBCDIC is always case-insensitive

5. Why might the same sort job produce different apparent order on different systems?

It cannot
Different EBCDIC code pages (or different ALTSEQ/collation settings) assign different byte values or sort orders to national characters and symbols; the same bytes can represent different characters or sort differently
Only due to record length
Only with MERGE