COBOL UTF-8 - Quick Reference

Overview

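UTF-8 is a byte-oriented, variable-length Unicode encoding: each character takes one to four bytes, and ASCII characters take exactly one. In COBOL systems it is most commonly used at file, API, and database boundaries, while programs hold Unicode text in NATIONAL (UTF-16) items and convert to and from DISPLAY items as needed.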

Key Points

  • Use NATIONAL-OF/DISPLAY-OF to bridge NATIONAL and DISPLAY
  • Configure I/O layers to emit/accept UTF-8
  • Test round trips and special characters

Syntax and Usage

Conversion Flow

```cobol
      * Typical conversion path
       01  WIDE-NAME  PIC N(40).
       01  DISP-NAME  PIC X(120).

           MOVE FUNCTION NATIONAL-OF("Łódź") TO WIDE-NAME
           MOVE FUNCTION DISPLAY-OF(WIDE-NAME) TO DISP-NAME
      * External layer writes DISP-NAME as UTF-8
```
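The snippet above leaves the UTF-8 step to an external layer. Where the program itself has to produce or consume UTF-8 bytes, a minimal sketch follows. It assumes IBM Enterprise COBOL, where DISPLAY-OF and NATIONAL-OF accept an optional CCSID argument and CCSID 1208 designates UTF-8. The names UTF8-NAME and WIDE-BACK and the program name are hypothetical, and the sample text is written as a hexadecimal national literal so that it does not depend on the source code page.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8CONV.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical names; sizes are illustrative.
       01  WIDE-NAME  PIC N(40) USAGE NATIONAL.
       01  WIDE-BACK  PIC N(40) USAGE NATIONAL.
      * Up to 3 UTF-8 bytes per national position: 40 * 3 = 120.
       01  UTF8-NAME  PIC X(120).
       PROCEDURE DIVISION.
      * "Łódź" as UTF-16: U+0141 U+00F3 U+0064 U+017A
           MOVE NX"014100F30064017A" TO WIDE-NAME
      * NATIONAL (UTF-16) to UTF-8 bytes (CCSID 1208)
           MOVE FUNCTION DISPLAY-OF(WIDE-NAME, 1208) TO UTF8-NAME
      * UTF-8 bytes back to NATIONAL
           MOVE FUNCTION NATIONAL-OF(UTF8-NAME, 1208) TO WIDE-BACK
      * Round-trip check: both NATIONAL items should match.
           IF WIDE-BACK = WIDE-NAME
               DISPLAY "round trip OK"
           ELSE
               DISPLAY "round trip MISMATCH"
           END-IF
           GOBACK.
```

One caveat with this sketch: the MOVE into UTF8-NAME pads the field with the platform's native space, which on an EBCDIC host is X"40" rather than the UTF-8 space X"20". Code that writes the bytes to a file or API would need to track the real converted length instead of sending the whole padded field.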

File/DB Considerations

  • Ensure dataset or file encoding is UTF-8 (if supported)
  • Set DB client/server code pages to UTF-8 for text columns
  • Beware of the byte order mark (BOM, bytes EF BB BF): some tools write it at the start of UTF-8 files and others do not expect it
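As an illustration of the BOM point, the sketch below checks whether a record begins with the three-byte UTF-8 BOM and, if so, skips past it. The names IN-REC and TEXT-START are hypothetical, and the sample bytes are hard-coded here; in a real program they would come from the first record of the file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BOMCHECK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical buffer holding raw bytes read from a file.
       01  IN-REC      PIC X(200).
       01  TEXT-START  PIC 9(4) COMP VALUE 1.
       PROCEDURE DIVISION.
      * Sample data: BOM (EF BB BF) followed by UTF-8 "OK" (4F 4B)
           MOVE X"EFBBBF4F4B" TO IN-REC
      * If the record starts with the BOM, text begins at byte 4.
           IF IN-REC(1:3) = X"EFBBBF"
               MOVE 4 TO TEXT-START
           END-IF
           DISPLAY "text starts at byte " TEXT-START
           GOBACK.
```

Only the first record of a file needs this check, since a BOM is meaningful only at the very start of the data.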

Best Practices

  • Consistency - Use UTF-8 across system boundaries
  • Validation - Validate input for well-formed UTF-8
  • Monitoring - Log encoding/decoding errors
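The validation and monitoring practices can be sketched with the UVALID intrinsic function. This assumes a compiler level that provides it (IBM Enterprise COBOL V5 or later), where UVALID treats an alphanumeric argument as UTF-8 and returns zero when the data is well formed, or otherwise the position of the first invalid element. The names IN-TEXT and BAD-POS are hypothetical, and the sample bytes are deliberately malformed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8VAL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical names; IN-TEXT stands for bytes from a boundary.
       01  IN-TEXT   PIC X(4).
       01  BAD-POS   PIC 9(4) COMP.
       PROCEDURE DIVISION.
      * X"C3" opens a 2-byte sequence, but X"28" cannot continue it.
           MOVE X"41C32841" TO IN-TEXT
           COMPUTE BAD-POS = FUNCTION UVALID(IN-TEXT)
           IF BAD-POS = 0
               DISPLAY "well-formed UTF-8"
           ELSE
      * Log the failure instead of passing bad bytes downstream.
               DISPLAY "invalid UTF-8 near byte " BAD-POS
           END-IF
           GOBACK.
```

Running the check at the boundary, before any NATIONAL-OF conversion, keeps malformed input from spreading into NATIONAL items where the original byte position is no longer recoverable.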

UTF-8 Quick Reference

Aspect        Description               Example
Encoding      Variable-length bytes     UTF-8
Conversions   NATIONAL-OF/DISPLAY-OF    DISPLAY-OF(WIDE)
Boundaries    Files/APIs/DBs            HTTP, MQ, datasets

Test Your Knowledge

1. What is UTF-8?

  • A fixed-width 16-bit encoding
  • A variable-length Unicode encoding using bytes
  • An EBCDIC variant
  • A compression filter for COBOL

2. How do COBOL programs commonly handle UTF-8?

  • By storing UTF-8 directly in NATIONAL
  • By converting between NATIONAL (UCS-2/UTF-16) and DISPLAY text; UTF-8 used at boundaries
  • By disabling Unicode
  • By using COMP-3

3. Which functions help convert NATIONAL to DISPLAY (for UTF-8 encoding later)?

  • DISPLAY-OF and NATIONAL-OF
  • MOD and REM
  • RANDOM and CURRENT-DATE
  • NUMVAL and INTEGER

4. What is a best practice for files and databases using UTF-8?

  • Assume auto-conversion always works
  • Explicitly configure code pages and test round-trip conversions
  • Avoid diacritics
  • Store everything as binary

5. Which advantage does UTF-8 have for mixed-language text?

  • Fixed width for all characters
  • Compact representation for ASCII and broad compatibility
  • Limited character coverage
  • Requires surrogate pairs for BMP

Frequently Asked Questions