COBOL Tutorial

COBOL UTF-8 - Quick Reference


Overview

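UTF-8 is a byte-oriented, variable-length Unicode encoding: each character takes one to four bytes, and ASCII characters keep their single-byte values. In COBOL systems, it is commonly used at file, API, and database boundaries; programs convert between NATIONAL (UTF-16) and DISPLAY text as needed.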

Key Points

Use NATIONAL-OF/DISPLAY-OF to bridge NATIONAL and DISPLAY; in IBM Enterprise COBOL, pass CCSID 1208 when the DISPLAY side must hold UTF-8
Configure I/O layers to emit/accept UTF-8
Test round trips and special characters
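
A round-trip test for the points above can be sketched as follows. This is a minimal sketch, assuming IBM Enterprise COBOL for z/OS, where the second argument of DISPLAY-OF and NATIONAL-OF is a CCSID and 1208 denotes UTF-8. The program name and field names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8RT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical fields; sizes are chosen for illustration only.
       01  WS-ORIGINAL    PIC N(20).
       01  WS-ROUNDTRIP   PIC N(20).
       PROCEDURE DIVISION.
      * NX"..." supplies UTF-16 code units directly, so the test
      * does not depend on the code page of the source file.
      * 0141 00F3 0064 017A are the code units of the name Lodz
      * written with its Polish diacritics.
           MOVE NX"014100F30064017A" TO WS-ORIGINAL
      * NATIONAL -> UTF-8 bytes (CCSID 1208) -> NATIONAL again.
           MOVE FUNCTION NATIONAL-OF(
               FUNCTION DISPLAY-OF(WS-ORIGINAL, 1208), 1208)
               TO WS-ROUNDTRIP
           IF WS-ROUNDTRIP = WS-ORIGINAL
               DISPLAY "ROUND TRIP OK"
           ELSE
               DISPLAY "ROUND TRIP MISMATCH"
           END-IF
           GOBACK.
```

Nesting the two functions avoids an intermediate PIC X field, which a MOVE would pad with spaces from the native code page rather than UTF-8 spaces.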

Syntax and Usage

Conversion Flow

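      * Typical conversion path (assumes IBM Enterprise COBOL,
      * where CCSID 1208 is UTF-8)
       01  WIDE-NAME  PIC N(40).
       01  DISP-NAME  PIC X(120).
           MOVE FUNCTION NATIONAL-OF("Łódź") TO WIDE-NAME
      * Without a CCSID argument, DISPLAY-OF targets the default
      * code page rather than UTF-8, so 1208 is passed explicitly.
           MOVE FUNCTION DISPLAY-OF(WIDE-NAME, 1208) TO DISP-NAME
      * MOVE pads DISP-NAME with native-code-page spaces.
      * External layer writes DISP-NAME as UTF-8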

File/DB Considerations

Ensure dataset or file encoding is UTF-8 (if supported)
Set DB client/server code pages to UTF-8 for text columns
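Beware of the byte order mark (BOM, the bytes EF BB BF): some tools prepend it to UTF-8 files, while others treat it as data or reject it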
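
The BOM concern can be handled with a simple byte comparison. The following is a minimal sketch that checks whether a buffer of raw UTF-8 bytes starts with a BOM and, if so, records where the real text begins. The program name, buffer, and sample bytes are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BOMCHK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical record buffer holding raw UTF-8 bytes.
       01  WS-RECORD      PIC X(80).
       01  WS-TEXT-START  PIC 9(4) COMP VALUE 1.
       PROCEDURE DIVISION.
      * Simulated input: a BOM followed by the UTF-8 bytes for ABC.
           MOVE X"EFBBBF414243" TO WS-RECORD
      * The UTF-8 BOM is the three bytes EF BB BF.
           IF WS-RECORD(1:3) = X"EFBBBF"
               MOVE 4 TO WS-TEXT-START
           END-IF
           DISPLAY "TEXT STARTS AT BYTE " WS-TEXT-START
           GOBACK.
```

Comparing against a hexadecimal literal keeps the check independent of the program's native code page.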

Best Practices

Consistency - Use UTF-8 across system boundaries
Validation - Validate input for well-formed UTF-8
Monitoring - Log encoding/decoding errors
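
Validation and logging might look like the sketch below. It assumes IBM Enterprise COBOL for z/OS and its UVALID intrinsic function, which is documented as returning zero for well-formed input and otherwise the position of the first invalid element; other compilers may not provide it. The program name, field names, and sample bytes are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8VAL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical input buffer holding raw bytes to be checked.
       01  WS-INPUT       PIC X(4).
       01  WS-BAD-POS     PIC 9(4) COMP.
       PROCEDURE DIVISION.
      * C5 81 is a well-formed two-byte sequence; the byte FF
      * never occurs in well-formed UTF-8.
           MOVE X"C581FF41" TO WS-INPUT
           COMPUTE WS-BAD-POS = FUNCTION UVALID(WS-INPUT)
           IF WS-BAD-POS = 0
               DISPLAY "INPUT IS WELL-FORMED UTF-8"
           ELSE
      * Log the failure so encoding problems can be monitored.
               DISPLAY "INVALID UTF-8 AT POSITION " WS-BAD-POS
           END-IF
           GOBACK.
```

In a real program the DISPLAY statement would typically be replaced by whatever logging facility the installation uses.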

UTF-8 Quick Reference

Aspect	Description	Example
Encoding	Variable-length bytes	UTF-8
Conversions	NATIONAL-OF/DISPLAY-OF	DISPLAY-OF(WIDE, 1208)
Boundaries	Files/APIs/DBs	HTTP, MQ, datasets

Test Your Knowledge

1. What is UTF-8?

A fixed-width 16-bit encoding
A variable-length Unicode encoding using bytes
An EBCDIC variant
A compression filter for COBOL

2. How do COBOL programs commonly handle UTF-8?

By storing UTF-8 directly in NATIONAL
By converting between NATIONAL (UCS-2/UTF-16) and DISPLAY text; UTF-8 used at boundaries
By disabling Unicode
By using COMP-3

3. Which functions help convert NATIONAL to DISPLAY (for UTF-8 encoding later)?

DISPLAY-OF and NATIONAL-OF
MOD and REM
RANDOM and CURRENT-DATE
NUMVAL and INTEGER

4. What is a best practice for files and databases using UTF-8?

Assume auto-conversion always works
Explicitly configure code pages and test round-trip conversions
Avoid diacritics
Store everything as binary

5. Which advantage does UTF-8 have for mixed-language text?

Fixed width for all characters
Compact representation for ASCII and broad compatibility
Limited character coverage
Requires surrogate pairs for BMP

Related Concepts

UCS-2

Fixed-width BMP encoding.

UTF-16

Surrogate pairs and BMP/non-BMP.

Character Sets

Code pages, encodings, and conversions.
