COBOL Tutorial


COBOL Unicode support

Unicode support in COBOL lets you work with characters from many languages and symbol sets. The details depend on your compiler: UTF-8 is often used at the boundary (files, APIs), while NATIONAL data (UTF-16 in many implementations) is used for internal text. You may need to convert between encodings with intrinsic functions or compiler options. This page gives an overview; see the UTF-8 page and your compiler's documentation for details.

Key terms

Unicode and encoding terms

| Term      | Meaning                                                        |
|-----------|----------------------------------------------------------------|
| Unicode   | Character set covering many scripts and symbols                |
| UTF-8     | Variable-length encoding of Unicode; common for files and APIs |
| NATIONAL  | COBOL type for national/Unicode text (often UTF-16)            |
| Code page | Platform/compiler setting for character encoding               |

UTF-8 and NATIONAL

UTF-8 is a common encoding for Unicode in which each character takes one to four bytes. It is often used when reading or writing files and when talking to APIs. NATIONAL is a COBOL data category, declared with PIC N, for “national” characters; in many implementations it is stored as UTF-16. A common pattern is therefore: read UTF-8 into a buffer, convert to NATIONAL if you need to process the text as Unicode, process it, then convert back to display or UTF-8 for output. The conversion itself is typically done with intrinsic functions such as NATIONAL-OF and DISPLAY-OF, whose availability and arguments are compiler-specific.
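As a minimal sketch of that pattern, the program below converts a short UTF-8 byte string to NATIONAL and back. It assumes an IBM Enterprise COBOL-style compiler, where NATIONAL-OF and DISPLAY-OF accept a CCSID as the second argument and 1208 denotes UTF-8. The program name and data names are hypothetical, and other compilers may need different arguments or functions.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. U8CONV.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical sample: the UTF-8 bytes for "Caf" plus U+00E9.
      *> The field is sized to the data so no padding is converted.
       01  WS-UTF8-IN    PIC X(5) VALUE X'436166C3A9'.
      *> Four characters once decoded (assumed UTF-16 internally).
       01  WS-NATIONAL   PIC N(4) USAGE NATIONAL.
       01  WS-UTF8-OUT   PIC X(5).
       PROCEDURE DIVISION.
      *> Assumption: 1208 is the CCSID for UTF-8 (IBM convention).
           MOVE FUNCTION NATIONAL-OF(WS-UTF8-IN, 1208)
             TO WS-NATIONAL
      *> ... process WS-NATIONAL as Unicode text here ...
           MOVE FUNCTION DISPLAY-OF(WS-NATIONAL, 1208)
             TO WS-UTF8-OUT
           GOBACK.
```

Field sizing matters here: a UTF-8 buffer of n bytes can decode to as many as n characters, so real programs usually track the actual data length (for example with reference modification) instead of converting trailing padding.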

What to check in your environment

  • Compiler options – Code page and NATIONAL settings (e.g. CODEPAGE and NSYMBOL in IBM Enterprise COBOL).
  • File encoding – Whether files are opened as UTF-8 or another encoding.
  • Intrinsic functions – NATIONAL-OF, DISPLAY-OF, or similar for converting between character sets.
  • Database and APIs – If you read/write Unicode (e.g. UTF-8), ensure the client or API is configured for that encoding.
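For illustration only, compiler options can often be set in the source itself. The fragment below assumes IBM Enterprise COBOL, where a CBL (PROCESS) statement precedes the program and CODEPAGE and NSYMBOL are documented options. The CCSID value 1140, the program name, and the data name are just examples; other compilers use different directives.

```cobol
       CBL CODEPAGE(1140),NSYMBOL(NATIONAL)
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OPTDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> With NSYMBOL(NATIONAL), PIC N without a USAGE clause is
      *> treated as national data rather than DBCS.
       01  WS-NAME       PIC N(10).
       PROCEDURE DIVISION.
      *> LENGTH OF reports the size in bytes; FUNCTION LENGTH
      *> reports it in character positions.
           DISPLAY LENGTH OF WS-NAME
           DISPLAY FUNCTION LENGTH(WS-NAME)
           GOBACK.
```

Comparing the two displayed lengths is a quick way to see how many bytes your compiler uses per national character position.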

Typical flow

Read data from a UTF-8 file (or API) into a display or national field. If your processing needs NATIONAL data, convert with the appropriate function. Do your logic in NATIONAL or display as required. For output, convert to the form expected by the file or API (e.g. display/UTF-8) and write. Always verify the round trip and edge cases (e.g. BOM, invalid bytes) in your environment.

```cobol
      *> Conceptual: conversion and I/O depend on the compiler.
      *> May hold UTF-8 bytes in some setups.
       01  WS-DISPLAY    PIC X(100).
      *> National (e.g. UTF-16) for processing. Sized at 100 because
      *> 100 UTF-8 bytes can decode to as many as 100 characters.
       01  WS-NATIONAL   PIC N(100).

      *> 1. Read UTF-8 into WS-DISPLAY (environment-dependent)
      *> 2. Convert to national if needed: MOVE or NATIONAL-OF
      *> 3. Process WS-NATIONAL
      *> 4. Convert back: DISPLAY-OF or MOVE to WS-DISPLAY
      *> 5. Write WS-DISPLAY to the UTF-8 file
```

Step-by-step: working with Unicode

  • Confirm your compiler’s Unicode/code page options and NATIONAL support.
  • Decide where data is UTF-8 (e.g. files, APIs) and where it is NATIONAL or display.
  • Use the documented conversion (e.g. NATIONAL-OF, DISPLAY-OF) when moving between encodings.
  • Test with sample data including non-ASCII characters and check for BOM or encoding errors.
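The testing step above can be sketched as a small self-check: skip a UTF-8 byte order mark if one is present, convert the text to NATIONAL and back, and compare the bytes. This again assumes IBM Enterprise COBOL-style NATIONAL-OF and DISPLAY-OF with CCSID 1208 for UTF-8. The record contents, program name, and data names are hypothetical test values.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. U8CHECK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical test record: a UTF-8 BOM (EF BB BF) followed
      *> by the UTF-8 bytes for "Caf" plus U+00E9.
       01  WS-RECORD     PIC X(8) VALUE X'EFBBBF436166C3A9'.
       01  WS-START      PIC 9(4) COMP VALUE 1.
       01  WS-TEXT       PIC X(5).
       01  WS-NATIONAL   PIC N(4) USAGE NATIONAL.
       01  WS-BACK       PIC X(5).
       PROCEDURE DIVISION.
      *> Skip the BOM if the record starts with one.
           IF WS-RECORD(1:3) = X'EFBBBF'
               MOVE 4 TO WS-START
           END-IF
           MOVE WS-RECORD(WS-START:5) TO WS-TEXT
      *> Assumption: CCSID 1208 selects UTF-8 (IBM convention).
           MOVE FUNCTION NATIONAL-OF(WS-TEXT, 1208) TO WS-NATIONAL
           MOVE FUNCTION DISPLAY-OF(WS-NATIONAL, 1208) TO WS-BACK
      *> The round trip is clean only if the bytes are unchanged.
           IF WS-BACK = WS-TEXT
               DISPLAY 'ROUND TRIP OK'
           ELSE
               DISPLAY 'ROUND TRIP MISMATCH'
           END-IF
           GOBACK.
```

The fixed lengths keep the sketch short. With real input you would carry the actual byte count alongside the data and also feed in deliberately invalid byte sequences to see how your runtime reports them.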

Explain like I'm five

Unicode is a big list of characters (letters, symbols, many languages). UTF-8 is one way to write those characters down as bytes so they can be saved or sent (like in a file). Inside the program, the computer might use another form (NATIONAL). Converting is like translating so the program and the outside world can both read the same text.

Test Your Knowledge

1. Where is UTF-8 commonly used in COBOL systems?

  • Only in the DATA DIVISION
  • At file and API boundaries for exchanging text
  • Only for numeric data
  • Instead of PROCEDURE DIVISION

2. What is NATIONAL typically used for in COBOL?

  • Numeric data
  • Internal processing of national/Unicode character data
  • File names only
  • Report headers only
