DFSORT runs on z/OS, where the default character encoding is EBCDIC. How bytes are interpreted, and the order in which characters sort, depend on the EBCDIC code page in use and on the fact that EBCDIC assigns byte values differently from ASCII. This page covers EBCDIC nuances that affect DFSORT: how EBCDIC sort order differs from ASCII, what code pages are and why they matter, how uppercase and lowercase letters sort in EBCDIC, and how to get the sort order you want (e.g. case-insensitive) when data is in EBCDIC.
In a single-byte EBCDIC code page, each character is represented by one byte (X'00'–X'FF', or 0–255). When DFSORT compares character keys byte by byte, the numeric value of each byte determines the sort order. EBCDIC assigns those values differently from ASCII: for example, the letter A is X'C1' in EBCDIC 037 but X'41' in ASCII. Digits, lowercase letters, and special characters also have different code points, and the groups fall in a different order: ASCII sorts digits, then uppercase, then lowercase, while EBCDIC sorts lowercase, then uppercase, then digits. So text sorted by byte value on an ASCII system will generally not be in the same order as the same text stored in EBCDIC and sorted by DFSORT. If you exchange files with ASCII systems, the "sorted" order may look different on each side, and a byte-by-byte comparison of the files will not match, even when they are logically equal, unless one of them is converted first.
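The difference can be checked off the mainframe. The following is a minimal sketch, assuming Python's built-in `cp037` codec is an acceptable stand-in for EBCDIC code page 037; the sample keys are made up for illustration, and sorting on the encoded bytes only imitates a byte-by-byte character compare.

```python
# Compare the code points of a few characters in ASCII and in EBCDIC 037.
samples = ["A", "a", "0", " "]

for ch in samples:
    ascii_byte = ch.encode("ascii")[0]
    ebcdic_byte = ch.encode("cp037")[0]
    print(f"{ch!r}: ASCII X'{ascii_byte:02X}'  EBCDIC X'{ebcdic_byte:02X}'")

# Sorting the same keys by raw byte value gives a different order
# under each encoding (illustrative keys, not real data).
keys = ["a1", "A1", "1A"]
ascii_order = sorted(keys, key=lambda k: k.encode("ascii"))
ebcdic_order = sorted(keys, key=lambda k: k.encode("cp037"))

print("ASCII :", ascii_order)
print("EBCDIC:", ebcdic_order)
```

In ASCII the key that starts with a digit comes first; in EBCDIC it comes last, because the digits occupy X'F0'–X'F9'.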
An EBCDIC code page defines which character (glyph) corresponds to each byte value. Code page 037 is common in the US (US English). Other code pages, such as 273 (German), 277 (Danish/Norwegian), and 297 (French), support other languages and assign different characters to some byte values. The basic letters A–Z and a–z and the digits 0–9 have the same byte values in all of these; the differences are in national characters and some special characters. When you specify a sort key, DFSORT compares the bytes as they are; it does not "know" the code page. The code page matters when the data is written (which characters become which bytes) and when it is displayed (which bytes become which characters). If you move a dataset from a 037 system to a 273 system without conversion, the bytes and their sort order do not change, but some bytes display as different characters, so the sorted output can appear to put the "same" characters in different places relative to others. For consistent behavior, know which code page your data is in, and make sure every system that writes, converts, or displays the data uses that code page.
| Code page | Region / use | Note |
|---|---|---|
| 037 | US English | Common on US mainframes |
| 273 | Germany | German EBCDIC |
| 277 | Denmark/Norway | Nordic EBCDIC |
| 297 | France | French EBCDIC |
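To see which byte values are affected, the code pages can be compared directly. This is a sketch that assumes Python's built-in `cp037` and `cp273` codecs match the code pages in the table; Python does not ship codecs for 277 or 297, so only the German code page is compared here.

```python
import string

# Collect every byte value that decodes to a different character
# under code page 037 (US) and code page 273 (German).
differing = []
for value in range(256):
    raw = bytes([value])
    us_char = raw.decode("cp037", errors="replace")
    de_char = raw.decode("cp273", errors="replace")
    if us_char != de_char:
        differing.append((value, us_char, de_char))

for value, us_char, de_char in differing:
    print(f"X'{value:02X}': 037 -> {us_char!r}   273 -> {de_char!r}")

# Basic letters and digits should encode to the same bytes in both pages.
invariant = string.ascii_letters + string.digits
same_bytes = all(ch.encode("cp037") == ch.encode("cp273") for ch in invariant)
print("Letters and digits identical in both code pages:", same_bytes)
```

Keys that contain only basic letters and digits therefore sort the same way regardless of which of these code pages the data is in; keys that contain any of the listed bytes are the ones to watch.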
In EBCDIC 037, lowercase letters (a–z) occupy one range of byte values (X'81'–X'A9', with gaps) and uppercase letters (A–Z) occupy a higher range (X'C1'–X'E9', with gaps). They do not interleave: lowercase a (X'81') is less than uppercase A (X'C1') in numeric order, so when you sort by byte value, every key that starts with a lowercase letter comes before any key that starts with an uppercase letter. That is the reverse of ASCII, where uppercase (X'41'–X'5A') sorts before lowercase (X'61'–X'7A'). So "Apple", "apple", and "APPLE" will not sort together in EBCDIC unless you normalize case, for example by translating the sort key (or the whole record) to uppercase in INREC so that the key bytes are the same for all three.
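A short sketch of the effect on whole words, again assuming Python's `cp037` codec as a stand-in for EBCDIC 037 and using made-up sample words:

```python
words = ["Apple", "apple", "APPLE", "Banana", "banana"]

# Order produced by comparing the EBCDIC 037 bytes of each word.
ebcdic_sorted = sorted(words, key=lambda w: w.encode("cp037"))

# Order produced by comparing the ASCII bytes of each word.
ascii_sorted = sorted(words, key=lambda w: w.encode("ascii"))

print("EBCDIC:", ebcdic_sorted)
print("ASCII :", ascii_sorted)
```

In the EBCDIC result, "banana" lands between "apple" and "Apple", which is the symptom that usually prompts a case-insensitive sort.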
To get case-insensitive sort order, normalize the data that is used for comparison. One approach is to translate the sort key field to uppercase (or lowercase) in INREC with a translation option (in DFSORT, TRAN=LTOU for lowercase to uppercase or TRAN=UTOL for uppercase to lowercase; other sort products may use a different keyword) and then sort on that normalized field. Another is to use ALTSEQ to map lowercase bytes to the corresponding uppercase bytes (or vice versa) so that both cases compare equal; the alternate sequence applies only to key fields sorted with format AQ (or CH when CHALT is in effect). Either way, the key bytes that DFSORT compares are in a single case, so "Banana" and "banana" sort together. See the case-insensitive sorting and ALTSEQ tutorials for syntax.
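The idea behind both approaches can be sketched outside DFSORT. The example below is an illustration only, not DFSORT syntax: it assumes Python's `cp037` codec, and the helper names `build_fold_table` and `folded_key` are hypothetical. The 256-byte table plays the role that an ALTSEQ lowercase-to-uppercase mapping (or a TRAN=LTOU translation of the key) would play.

```python
def build_fold_table():
    """Return a 256-byte table mapping EBCDIC lowercase to uppercase."""
    table = bytearray(range(256))
    # EBCDIC letters sit in three blocks: a-i, j-r, s-z.
    for first, last in [(0x81, 0x89), (0x91, 0x99), (0xA2, 0xA9)]:
        for value in range(first, last + 1):
            table[value] = value + 0x40  # e.g. X'81' (a) -> X'C1' (A)
    return bytes(table)


FOLD = build_fold_table()


def folded_key(text):
    """Encode text as EBCDIC 037 and fold the key bytes to one case."""
    return text.encode("cp037").translate(FOLD)


records = ["banana", "Apple", "apple", "Banana", "APPLE"]
case_insensitive = sorted(records, key=folded_key)
print(case_insensitive)
```

Only the comparison key is folded; the records themselves keep their original case. Python's sort is stable, so records with equal keys stay in input order; in DFSORT that is only guaranteed when EQUALS is in effect.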
National characters (accented letters, umlauts, etc.) and symbols have code point values that vary by code page. In one code page a byte might be "é"; in another the same byte might be a different character. When you sort, the order of these characters relative to A–Z and 0–9 is determined by their byte values in the code page. If you need a specific linguistic order (e.g. dictionary order for a language), you may need ALTSEQ or a custom collating sequence that maps bytes to the desired comparison order. The default EBCDIC byte order is not always the same as dictionary or locale order.
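As a rough illustration of why a custom collating sequence may be needed, the sketch below sorts three made-up words by their code page 037 bytes and then again through a hypothetical mapping that makes accented forms of "e" compare as plain "e". It assumes Python's `cp037` codec; the mapping is a simplification for the example, not a complete linguistic collation.

```python
words = ["zebra", "école", "eclair"]

# Default behavior: compare the raw EBCDIC 037 bytes.
byte_order = sorted(words, key=lambda w: w.encode("cp037"))

# Hypothetical collation table: accented variants of "e" are given
# the same comparison value as plain "e".
table = bytearray(range(256))
plain_e = "e".encode("cp037")[0]
for accented in "éèêë":
    table[accented.encode("cp037")[0]] = plain_e
COLLATE = bytes(table)

linguistic = sorted(words, key=lambda w: w.encode("cp037").translate(COLLATE))

print("Byte order   :", byte_order)
print("Mapped order :", linguistic)
```

In byte order the accented word comes first, because in code page 037 "é" has a lower byte value than any unaccented letter; with the mapping it falls next to the other words that start with "e".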
The mainframe uses a different ABC than the one on your PC. So when the sorter puts things in order, it uses the mainframe ABC. Big A and little a are in different places in that ABC, so they don't sit together unless we first turn all letters into the same size (big or little) and then sort. And if we use a different "language" (code page), the same squiggle can mean a different letter, so the order can change.
1. Why does EBCDIC sort order differ from ASCII?
2. What is an EBCDIC code page?
3. In EBCDIC, how do uppercase and lowercase letters typically compare?
4. How can you get case-insensitive sort order when data is EBCDIC?
5. Why might the same sort job produce different apparent order on different systems?