COBOL UTF-16 - Quick Reference

Overview

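UTF-16 is a Unicode encoding built on 16-bit code units. Characters in the Basic Multilingual Plane (BMP) take one code unit; characters outside it (U+10000 and above) take two code units, called a surrogate pair. Many COBOL runtimes store NATIONAL text as UTF-16 or UCS-2 internally.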
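
The surrogate-pair arithmetic can be sketched in plain COBOL. This is a minimal illustration under stated assumptions, not how any particular runtime implements NATIONAL storage: the program name SURRPAIR and all data names are hypothetical, and the code point is held as a decimal number rather than as NATIONAL data. It splits U+10348 (decimal 66376) into its two 16-bit code units.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SURRPAIR.
      * Illustrative only: splits one supplementary code point into
      * its UTF-16 surrogate pair using decimal arithmetic.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * U+10348 (Gothic letter hwair) is 66376 in decimal.
       01  CODE-POINT      PIC 9(7) VALUE 66376.
       01  CP-OFFSET       PIC 9(7).
       01  HIGH-PART       PIC 9(5).
       01  LOW-PART        PIC 9(5).
       01  HIGH-SURROGATE  PIC 9(5).
       01  LOW-SURROGATE   PIC 9(5).
       PROCEDURE DIVISION.
      * Subtract 65536 (hex 10000) to get a 20-bit offset.
           SUBTRACT 65536 FROM CODE-POINT GIVING CP-OFFSET
      * Upper 10 bits feed the high surrogate, lower 10 the low one.
           DIVIDE CP-OFFSET BY 1024
               GIVING HIGH-PART REMAINDER LOW-PART
      * 55296 is hex D800; 56320 is hex DC00.
           ADD 55296 TO HIGH-PART GIVING HIGH-SURROGATE
           ADD 56320 TO LOW-PART GIVING LOW-SURROGATE
           DISPLAY "HIGH SURROGATE: " HIGH-SURROGATE
           DISPLAY "LOW SURROGATE:  " LOW-SURROGATE
           STOP RUN.
```

For this input the two values are 55296 and 57160, which are the code units X"D800" and X"DF48". A UCS-2 consumer would treat them as two separate, unassigned characters.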

Key Points

  • Surrogate pairs for non-BMP
  • Byte order considerations in files
  • Conversions via NATIONAL-OF/DISPLAY-OF

Syntax and Usage

NATIONAL Conversions

cobol
      * NATIONAL conversions (often UTF-16 internally)
       01  WIDE-TEXT  PIC N(40).
       01  DISP-TEXT  PIC X(120).
      * With no CCSID argument, the conversions use the runtime's
      * default code page, which may not represent non-BMP characters.
           MOVE FUNCTION NATIONAL-OF("𐍈 – Gothic letter") TO WIDE-TEXT
           MOVE FUNCTION DISPLAY-OF(WIDE-TEXT) TO DISP-TEXT
           DISPLAY DISP-TEXT

File Considerations

  • Endianness (UTF-16LE/UTF-16BE) and BOM
  • Consumer expectations for BOM presence
  • Use consistent encoding per dataset
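
A BOM check on incoming data might look like the following sketch. It assumes the first two bytes of the dataset have already been read into a hypothetical field, FIRST-TWO-BYTES (hard-coded here so the program stands alone), and that the compiler accepts X"..." hexadecimal literals. The program and data names are illustrative.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BOMCHECK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical buffer holding the first two bytes of a dataset.
       01  FIRST-TWO-BYTES  PIC X(2) VALUE X"FEFF".
       01  BYTE-ORDER-NAME  PIC X(12).
       PROCEDURE DIVISION.
      * The BOM is U+FEFF; its byte order reveals the endianness.
           EVALUATE FIRST-TWO-BYTES
               WHEN X"FEFF"
                   MOVE "UTF-16BE" TO BYTE-ORDER-NAME
               WHEN X"FFFE"
                   MOVE "UTF-16LE" TO BYTE-ORDER-NAME
               WHEN OTHER
      * No BOM: the byte order must come from documentation.
                   MOVE "NO BOM" TO BYTE-ORDER-NAME
           END-EVALUATE
           DISPLAY "BYTE ORDER: " BYTE-ORDER-NAME
           STOP RUN.
```

When no BOM is present the bytes alone do not settle the byte order, which is why the encoding should be fixed per dataset and agreed with consumers.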

Best Practices

  • Compatibility - Prefer UTF-8 at boundaries when practical
  • Validation - Ensure surrogate pairs are well-formed
  • Documentation - Record encoding and BOM policies
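
The validation point can be sketched as a byte-level scan. This is one possible approach, not a standard routine: it assumes the data is UTF-16BE held in an alphanumeric buffer, that the program uses the native collating sequence (so bytes compare by value), and that the names SURRVAL, UTF16-BYTES, and the flag fields are hypothetical. It checks that every high surrogate (lead byte D8 to DB) is immediately followed by a low surrogate (lead byte DC to DF).

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SURRVAL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Sample UTF-16BE bytes: U+10348 (D800 DF48) then "A" (0041).
       01  UTF16-BYTES     PIC X(6) VALUE X"D800DF480041".
       01  BYTE-COUNT      PIC 9(4) VALUE 6.
       01  BYTE-INDEX      PIC 9(4) VALUE 1.
       01  LEAD-BYTE       PIC X.
       01  WELL-FORMED     PIC X VALUE "Y".
       01  EXPECT-LOW      PIC X VALUE "N".
       PROCEDURE DIVISION.
           PERFORM UNTIL BYTE-INDEX > BYTE-COUNT
               MOVE UTF16-BYTES(BYTE-INDEX:1) TO LEAD-BYTE
               EVALUATE TRUE
      * High surrogate: must not follow another high surrogate.
                   WHEN LEAD-BYTE >= X"D8" AND LEAD-BYTE <= X"DB"
                       IF EXPECT-LOW = "Y"
                           MOVE "N" TO WELL-FORMED
                       END-IF
                       MOVE "Y" TO EXPECT-LOW
      * Low surrogate: must follow a high surrogate.
                   WHEN LEAD-BYTE >= X"DC" AND LEAD-BYTE <= X"DF"
                       IF EXPECT-LOW = "N"
                           MOVE "N" TO WELL-FORMED
                       END-IF
                       MOVE "N" TO EXPECT-LOW
      * Ordinary code unit: must not interrupt a pair.
                   WHEN OTHER
                       IF EXPECT-LOW = "Y"
                           MOVE "N" TO WELL-FORMED
                       END-IF
                       MOVE "N" TO EXPECT-LOW
               END-EVALUATE
               ADD 2 TO BYTE-INDEX
           END-PERFORM
      * A high surrogate at the very end is also ill-formed.
           IF EXPECT-LOW = "Y"
               MOVE "N" TO WELL-FORMED
           END-IF
           DISPLAY "WELL-FORMED: " WELL-FORMED
           STOP RUN.
```

For UTF-16LE data the lead byte is the second byte of each code unit, so the reference modification would start at BYTE-INDEX + 1 instead.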

UTF-16 Quick Reference

| Aspect | Description | Example |
| --- | --- | --- |
| Encoding | 16-bit code units | UTF-16BE/LE |
| Non-BMP | Surrogate pairs | U+D800..U+DFFF |
| Conversions | NATIONAL-OF/DISPLAY-OF | DISPLAY-OF(WIDE) |

Test Your Knowledge

1. What distinguishes UTF-16 from UCS-2?

  • UTF-16 is 8-bit
  • UTF-16 supports surrogate pairs for non-BMP characters; UCS-2 does not
  • They are identical
  • UCS-2 is variable-length

2. Which COBOL feature commonly maps to UTF-16?

  • NATIONAL data items (PIC N)
  • COMP-3 fields
  • INDEX data
  • SCREEN SECTION only

3. What must be considered when writing UTF-16 to files?

  • Byte order (endianness) and BOM usage
  • Only record length
  • No special considerations
  • Use of COMP-3

4. How do you convert between UTF-16 NATIONAL and DISPLAY data?

  • NATIONAL-OF and DISPLAY-OF
  • RANDOM and CURRENT-DATE
  • NUMVAL and INTEGER
  • REVERSE and REWRITE

5. Which encoding is best for ASCII-heavy data interchange?

  • UTF-16
  • UCS-2
  • UTF-8
  • EBCDIC only

Frequently Asked Questions