Unicode support in COBOL lets you work with characters from many languages and symbols. How it works depends on your compiler: often UTF-8 is used at the boundary (files, APIs) and NATIONAL (e.g. UTF-16) for internal text. You may need to convert between encodings using intrinsic functions or compiler options. This page gives an overview; see UTF-8 and your product docs for details.
| Term | Meaning |
|---|---|
| Unicode | Character set covering many scripts and symbols |
| UTF-8 | Variable-length encoding of Unicode; common for files and APIs |
| NATIONAL | COBOL type for national/Unicode text (often UTF-16) |
| Code page | Platform/compiler setting for character encoding |
UTF-8 is a common encoding for Unicode in which each character takes one to four bytes. It is often used when reading or writing files, or when talking to APIs. NATIONAL in COBOL is a data type (PIC N) for “national” characters; in many implementations it is UTF-16. The usual pattern is: read UTF-8 into a buffer, convert to NATIONAL if you need to process the text as Unicode characters, process it, then convert back to display or UTF-8 for output. The exact conversion mechanism (e.g. the NATIONAL-OF and DISPLAY-OF functions) is compiler-specific.
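To make the difference between bytes and characters concrete, here is a minimal sketch. It assumes IBM Enterprise COBOL, where `ULENGTH` is a vendor extension that counts UTF-8 characters; other compilers may not provide it or may behave differently. The program name, field names, and sample value are made up for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. U8LEN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical sample: "caf" + e-acute as UTF-8 bytes.
      *> c, a, f take one byte each; e-acute takes two (C3 A9).
       01  WS-UTF8        PIC X(5) VALUE X'636166C3A9'.
       01  WS-BYTES       PIC 9(4).
       01  WS-CHARS       PIC 9(4).
       PROCEDURE DIVISION.
      *> LENGTH counts bytes for an alphanumeric item
           COMPUTE WS-BYTES = FUNCTION LENGTH(WS-UTF8)
      *> ULENGTH (IBM extension, assumed available) counts characters
           COMPUTE WS-CHARS = FUNCTION ULENGTH(WS-UTF8)
           DISPLAY 'Bytes:      ' WS-BYTES
           DISPLAY 'Characters: ' WS-CHARS
           GOBACK.
```

The sample value is four characters stored in five bytes, which is why a PIC X field sized by character count can be too small for UTF-8 data.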
Read data from a UTF-8 file (or API) into a display or national field. If the runtime expects NATIONAL for processing, convert with the appropriate function. Do your logic in NATIONAL or display as required. For output, convert to the form expected by the file or API (e.g. display/UTF-8) and write. Always verify round-trip conversion and edge cases (e.g. a BOM, invalid bytes) in your environment.
```cobol
      *> Conceptual: conversion and I/O depend on compiler
      *> May hold UTF-8 bytes in some setups
       01  WS-DISPLAY     PIC X(100).
      *> National (e.g. UTF-16) for processing
       01  WS-NATIONAL    PIC N(50).

      *> Read UTF-8 (environment-dependent)
      *> Convert to national if needed: MOVE or NATIONAL-OF
      *> Process WS-NATIONAL
      *> Convert back: DISPLAY-OF or MOVE to WS-DISPLAY
      *> Write WS-DISPLAY to UTF-8 file
```
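The conversion steps can be sketched more concretely as follows. This assumes IBM Enterprise COBOL, where `NATIONAL-OF` and `DISPLAY-OF` accept a CCSID as a second argument and 1208 denotes UTF-8; on other compilers the arguments or the available functions may differ. File I/O is left out, and the program name, field names, and sample bytes are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. U8RT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical sample: "caf" + e-acute as 5 UTF-8 bytes
       01  WS-UTF8-IN     PIC X(5) VALUE X'636166C3A9'.
      *> 4 characters, one UTF-16 code unit each
       01  WS-NATIONAL    PIC N(4) USAGE NATIONAL.
       01  WS-UTF8-OUT    PIC X(5).
       PROCEDURE DIVISION.
      *> UTF-8 (CCSID 1208, assumed) to national (UTF-16)
           MOVE FUNCTION NATIONAL-OF(WS-UTF8-IN, 1208)
             TO WS-NATIONAL
      *> ... process WS-NATIONAL here ...
      *> National back to UTF-8 bytes for output
           MOVE FUNCTION DISPLAY-OF(WS-NATIONAL, 1208)
             TO WS-UTF8-OUT
      *> Round-trip check: output bytes should match input bytes
           IF WS-UTF8-OUT = WS-UTF8-IN
               DISPLAY 'Round trip OK'
           ELSE
               DISPLAY 'Round trip mismatch'
           END-IF
           GOBACK.
```

The fields here are sized exactly for the sample so that no padding is involved. With real data, size the alphanumeric fields for the worst-case byte length and track the actual length, because padding spaces take part in the conversion too.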
Unicode is a big list of characters (letters, symbols, many languages). UTF-8 is one way to store characters from that list as bytes so they can be saved or sent (like in a file). Inside the program, the computer might use another form (NATIONAL). Converting is like translating so the program and the outside world can both read the same text.
1. Where is UTF-8 commonly used in COBOL systems?
2. What is NATIONAL typically used for in COBOL?